-
Aug 31st, 2007, 08:56 AM
#1
Thread Starter
New Member
PHP Crawler?
Hello,
I'm trying to make a PHP crawler. It should crawl a webpage, gather all the images and links, store them in MySQL, and then move on to another link.
This is how far I have got:
Code:
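<?php
// Quote the array key; a bare url is treated as an undefined constant.
$site = $_GET['url'];
// Read the whole page; a single fread($f, 65535) may return only part of it.
$inputStream = file_get_contents($site);
if ($inputStream !== false && preg_match_all("/<a.*? href=\"(.*?)\".*?>(.*?)<\/a>/i", $inputStream, $matches)) {
    // strip_tags() expects a string, so apply it to each captured link text.
    $something = array_map('strip_tags', $matches[2]);
    print_r($matches);
}
?>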
Maybe someone could help me add the image crawling part and store the results.
Thank You
-
Sep 1st, 2007, 06:27 AM
#2
Addicted Member
Re: PHP Crawler?
For the route you have taken, you have to be really good with regular expressions. The regex you have assumes that every href attribute is enclosed in double quotes ("), never in single quotes ('), which is not always the case. If you are using PHP 5, you could make use of the DOM extension. Take a look here. It has a lot of functions that make life easier, such as getElementsByTagName.
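To illustrate the quoting problem, here is a minimal sketch that runs the regex from the first post and a DOM lookup over the same markup. The sample HTML and the example.com URLs are made up for the illustration, and the variable names ($regexLinks, $domLinks) are my own.

```php
<?php
// Hypothetical sample page: one double-quoted and one single-quoted href.
$html = '<html><body>'
      . '<a href="http://example.com/a">First</a>'
      . "<a href='http://example.com/b'>Second</a>"
      . '</body></html>';

// The regex from the first post only matches double-quoted href attributes.
preg_match_all("/<a.*? href=\"(.*?)\".*?>(.*?)<\/a>/i", $html, $matches);
$regexLinks = $matches[1];

// DOM approach: parse the markup, then ask for every <a> element.
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$domLinks = array();
foreach ($doc->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $domLinks[] = $anchor->getAttribute('href');
    }
}

print_r($regexLinks); // only the double-quoted link
print_r($domLinks);   // both links
```

The DOM version does not care how the attribute was quoted, because the parser has already normalised it by the time getAttribute is called.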
-
Sep 3rd, 2007, 02:10 AM
#3
Re: PHP Crawler?
More specifically, look at the loadHTML function, which loads HTML 4 documents (which don't conform to XML standards) into a DOM document.
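A rough sketch of how loadHTML could cover the image part of the question, assuming the page has already been fetched into a string. The markup below is invented and deliberately sloppy (unclosed tags, an unquoted attribute); storing the results in MySQL is left out.

```php
<?php
// Hypothetical HTML 4 style markup: unclosed <p> and <a>, one unquoted src.
$html = '<html><body><p>Gallery'
      . '<img src="one.png" alt="One">'
      . '<img src=two.jpg>'
      . "<a href='next.html'>Next page"
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the src of every image on the page.
$images = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    $images[] = $img->getAttribute('src');
}

// Gather the href of every link, which a crawler would visit next.
$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($images);
print_r($links);
```

The values come back exactly as written in the page, so relative paths such as next.html would still need to be resolved against the page URL before the crawler follows them.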