Thread: Parsing URLs from a file

  1. #1

    Thread Starter
    Hyperactive Member
    Join Date
    Oct 2006
    Posts
    419

    Parsing URLs from a file

    Hey,

    I am having a problem finding a way to get URLs from a file. I have no problems getting the HTML file or saving the information; it's just extracting the information from inside the file.

    All I want to do is search through the file to find links, e.g. find /tutorials/Maya/1 from a line that may be:
    <a href="/tutorials/Maya/1">Maya</a>

    I am also trying to find a way so that, once it has a list of links, it checks whether they contain the phrase:
    /tutorials/ (I think I would be able to do that if I knew how to do the first bit!).

    Can anyone help at all?

    Thanks,
    Lee.

  2. #2
    WiggleWiggle dclamp
    Join Date
    Aug 2006
    Posts
    3,527

    Re: Parsing URLs from a file

    This is what you are looking for:

    preg_match
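
    Since preg_match() stops at the first match, collecting every link on a page would more likely call for preg_match_all(). A minimal sketch, assuming the href values are always quoted; the function name extractHrefsWithRegex and the sample markup are made up for illustration:

```php
<?php
// Hypothetical helper: pull href values out of an HTML string with a regex.
function extractHrefsWithRegex(string $html): array
{
    // Matches href="..." or href='...' inside an <a ...> tag.
    // Unquoted attribute values are deliberately not handled here.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);

    // Group 2 holds the text between the quotes.
    return $matches[2];
}

// Sample input (an assumption, modelled on the line in the question).
$html = '<a href="/tutorials/Maya/1">Maya</a> <a href=\'/about\'>About</a>';
$links = extractHrefsWithRegex($html);
print_r($links);
```

    This only covers tidy markup; unquoted values would need a more elaborate pattern.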

  3. #3
    VBA Nutter visualAd
    Join Date
    Apr 2002
    Location
    Ickenham, UK
    Posts
    4,906

    Re: Parsing URLs from a file

    Quote Originally Posted by VBlee
    All I want to do is search through the file to find links, e.g. find /tutorials/Maya/1 from a line that may be:
    <a href="/tutorials/Maya/1">Maya</a>

    Although a regular expression can be used, it would be rather large because of the wide variety of ways in which HTML is written.

    HTML Code:
    <P><A HREF=www.vbforums.com>My Link</p>
    
    <P><A HREF="www.vbforums.com" >My Link<p>
    
    <P><A HREF='http://www.vbforums.com' >My Link<p>
    
    <P><A HREF="www.vbforums.com" >My Link</a><p>
    
    <p><a href=
    
    "www.vbforums.com" 
    
    >My Link</a><p>
    And any combination of the above. The best way of doing this is to use the loadHTML() method of the DOMDocument object. This will take into account any poorly formed HTML and inconsistencies in the markup. You can also be relatively sure that everything has been captured.
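
    As a rough illustration, the sketch below feeds a few of the malformed variants above to loadHTML() and collects whatever anchors the parser recovers. The variable names are assumptions, and libxml_use_internal_errors() is only there to keep the parser's warnings about the bad markup out of the output:

```php
<?php
// A few of the malformed variants from the list above, joined into one document.
$html = implode("\n", [
    '<P><A HREF=www.vbforums.com>My Link</p>',
    "<P><A HREF='http://www.vbforums.com' >My Link<p>",
    '<p><a href=',
    '',
    '"www.vbforums.com"',
    '',
    '>My Link</a><p>',
]);

// Collect parser warnings internally instead of printing one per bad tag.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Gather the href of every anchor the parser managed to build.
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $hrefs[] = $anchor->getAttribute('href');
}
print_r($hrefs);
```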

    If the HTML document is XHTML, you can just load the document using DOMDocument->load().

    You can then use the getElementsByTagName() and getAttribute() methods to get the values of the links' attributes.
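    PHP Code:
    // $dom is a DOMDocument that has already loaded the page (e.g. via loadHTML())
    $anchors = $dom->getElementsByTagName('a');

    foreach ($anchors as $anchor) {
        // Print each link target on its own line
        echo $anchor->getAttribute('href'), "\n";
    }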
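
    To cover the second half of the question (keeping only links that contain /tutorials/), the same loop can be wrapped in a small filter. This is a sketch under assumptions: findLinksContaining is a made-up helper name and the sample markup is invented for the example.

```php
<?php
// Hypothetical helper: return every href in $html that contains $needle.
function findLinksContaining(string $html, string $needle): array
{
    libxml_use_internal_errors(true);   // keep malformed-markup warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $found = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // strpos() returns false when the needle is absent, so compare strictly.
        if (strpos($href, $needle) !== false) {
            $found[] = $href;
        }
    }
    return $found;
}

// Sample input (an assumption for the example).
$html = '<p><a href="/tutorials/Maya/1">Maya</a> <a href="/forum/index">Forum</a></p>';
$tutorialLinks = findLinksContaining($html, '/tutorials/');
print_r($tutorialLinks);
```

    For a page saved to disk, the string passed in could come from file_get_contents() on that file.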


  4. #4

    Thread Starter
    Hyperactive Member
    Join Date
    Oct 2006
    Posts
    419

    Re: Parsing URLs from a file

    Thank you both so much. I had been looking at regular expressions, just getting my head round them really, but I was looking for another option such as visualAd's method.

    Thanks again, I will test/investigate them.
    Lee.
