-
May 7th, 2008, 02:26 PM
#1
Thread Starter
Hyperactive Member
Parsing URLs from a file
Hey,
I'm having trouble extracting URLs from a file. I have no problem fetching the HTML file or saving the information; it's just getting the links out of the file's contents.
All I want to do is search through the file to find links, e.g. find /tutorials/Maya/1 in a line that might be:
<a href="/tutorials/Maya/1">Maya</a>
I'm also trying to work out how, once I have a list of links, to check whether each one contains the phrase:
/tutorials/ (I think I could do that if I knew how to do the first bit!).
Can anyone help at all?
Thanks,
Lee.
If a post has been usefull then Rate it! 
-
May 7th, 2008, 11:53 PM
#2
Re: Parsing URLs from a file
This is what you are looking for:
preg_match
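A minimal sketch of the regular-expression route, assuming PHP 7 or later. It uses preg_match_all() rather than preg_match() so that every link in the page is returned, not just the first. The function names extractLinksWithRegex() and filterTutorialLinks() are hypothetical. The pattern handles double-quoted, single-quoted and unquoted href values (a PCRE branch-reset group `(?|...)` puts each variant into capture group 1), but it will not cope with every way HTML can be written.

```php
<?php
// Hypothetical helper: return every href value found in <a> tags.
// Covers href="...", href='...' and href=... (unquoted), case-insensitively,
// including whitespace or newlines around the '='.
function extractLinksWithRegex(string $html): array
{
    // (?|...) is a branch reset: all three alternatives capture into group 1.
    $pattern = '/<a\s[^>]*?\bhref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// Hypothetical helper: keep only links containing '/tutorials/'.
function filterTutorialLinks(array $links): array
{
    return array_values(array_filter($links, function ($link) {
        return strpos($link, '/tutorials/') !== false;
    }));
}

$html = '<p><a href="/tutorials/Maya/1">Maya</a> <a href=\'/about\'>About</a></p>';
print_r(filterTutorialLinks(extractLinksWithRegex($html)));
```

The filter is kept separate from the extraction so the same list of links can be checked for other phrases later.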
-
May 8th, 2008, 03:26 AM
#3
Re: Parsing URLs from a file
Although a regular expression can be used, it would be rather large because HTML can be written in so many different ways, for example:
HTML Code:
<P><A HREF=www.vbforums.com>My Link</p>
<P><A HREF="www.vbforums.com" >My Link<p>
<P><A HREF='http://www.vbforums.com' >My Link<p>
<P><A HREF="www.vbforums.com" >My Link</a><p>
<p><a href=
"www.vbforums.com"
>My Link</a><p>
Or any combination of the above. The best way of doing this is to use the loadHTML() method of the DOMDocument class. It copes with poorly formed HTML and inconsistencies in the markup, so you can also be relatively sure that every link has been captured.
If the HTML document is XHTML, you can just load it using DOMDocument->load().
You can then use the getElementsByTagName() and getAttribute() methods to get the values of the links' attributes.
PHP Code:
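$dom = new DOMDocument();
@$dom->loadHTML($html); // $html holds the page source; @ hides warnings about malformed markup
$anchors = $dom->getElementsByTagName('a'); // every <a> element, in document order
foreach ($anchors as $anchor) { echo $anchor->getAttribute('href'), "\n"; }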
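A fuller sketch that combines the DOMDocument approach with the /tutorials/ check from the original question, assuming PHP 7 or later. The function name getTutorialLinks() and the sample markup are hypothetical. Instead of the @ operator it uses libxml_use_internal_errors() to collect, then discard, the parser's warnings about sloppy HTML, and it returns early for an empty string because loadHTML() rejects empty input.

```php
<?php
// Hypothetical helper: return the href of every <a> whose link contains '/tutorials/'.
function getTutorialLinks(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() does not accept an empty string
    }

    $dom = new DOMDocument();
    // Buffer libxml errors instead of emitting warnings for malformed markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    // The HTML parser lower-cases tag and attribute names, so <A HREF=...> is found too.
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href'); // '' if the attribute is missing
        if (strpos($href, '/tutorials/') !== false) {
            $links[] = $href;
        }
    }
    return $links;
}

// Sample markup mixing the styles listed above (hypothetical URLs).
$html = <<<'HTML'
<p><a href="/tutorials/Maya/1">Maya</a></p>
<P><A HREF='/tutorials/Max/2' >Max<p>
<p><a href=
"/forum/index.php"
>Forum</a><p>
HTML;

print_r(getTutorialLinks($html));
```

To work on a saved page rather than a string, you could read it first with file_get_contents() and pass the result in.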
-
May 8th, 2008, 10:46 AM
#4
Thread Starter
Hyperactive Member
Re: Parsing URLs from a file
Thank you both so much. I had been looking at regular expressions and was just getting my head round them, but I was also looking for another option such as visualAd's method.
Thanks again, I will test/investigate them.
Lee.