
Thread: PHP Crawler?

  #1

    Thread Starter, New Member
    Join Date: Feb 2007
    Posts: 14

    PHP Crawler?

    Hello,

    I'm trying to write a PHP crawler that fetches a web page, collects all of its images and links, stores them in MySQL, and then moves on to the next link.
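One common way to structure the "crawl a page, then move on to another link" loop is a breadth-first traversal: a queue of URLs still to visit plus a set of URLs already seen, so the crawler does not bounce forever between pages that link to each other. This is a minimal sketch assuming PHP 7+; `crawl()` and the `$fetchLinks` callback are hypothetical names, and the callback stands in for the real fetch-and-parse step (it is faked with an array here so the example runs without network access).

```php
<?php
/**
 * Hypothetical breadth-first crawler: visit $startUrl, then every URL it
 * links to, and so on, stopping after $maxPages pages.
 * $fetchLinks is an assumed callback: given a URL, it returns the absolute
 * URLs linked from that page.
 */
function crawl(string $startUrl, callable $fetchLinks, int $maxPages = 50): array
{
    $queue   = [$startUrl];          // URLs waiting to be visited (FIFO)
    $visited = [$startUrl => true];  // URLs already queued, to avoid loops
    $order   = [];                   // pages in the order they were crawled

    while ($queue && count($order) < $maxPages) {
        $url = array_shift($queue);  // take the oldest queued URL
        $order[] = $url;

        foreach ($fetchLinks($url) as $link) {
            if (!isset($visited[$link])) {
                $visited[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $order;
}

// Usage with a fake in-memory "web" instead of real HTTP requests
$web = [
    'http://a.example/'  => ['http://a.example/1', 'http://a.example/2'],
    'http://a.example/1' => ['http://a.example/', 'http://a.example/3'],
    'http://a.example/2' => [],
    'http://a.example/3' => ['http://a.example/1'],
];
$pages = crawl('http://a.example/', function ($url) use ($web) {
    return $web[$url] ?? [];
});
print_r($pages);
```

In a real crawler the callback would download the page and extract its links; the `$maxPages` cap matters because the web is effectively unbounded.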

    This is how far I have got:

    Code:
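    <?php
    // URL to crawl, passed as ?url=...; only http(s) URLs are accepted
    $site = isset($_GET['url']) ? $_GET['url'] : '';
    if (!filter_var($site, FILTER_VALIDATE_URL) || !preg_match('#^https?://#i', $site)) {
        die('Invalid URL');
    }

    // Read the whole page: a single fread() on a network stream can
    // return before all of the page has arrived
    $inputStream = file_get_contents($site);
    if ($inputStream === false) {
        die('Could not fetch ' . htmlspecialchars($site));
    }

    // $matches[1] holds the href values, $matches[2] the link text
    // (this pattern only matches double-quoted href attributes)
    if (preg_match_all('/<a.*? href="(.*?)".*?>(.*?)<\/a>/i', $inputStream, $matches)) {
        // strip_tags() expects a string, so apply it to each link text
        $linkTexts = array_map('strip_tags', $matches[2]);
        print_r($matches[1]);
        print_r($linkTexts);
    }
    ?>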
    Could someone help me add the image-crawling part and the part that stores the results?

    Thank You
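For the storing part, one option is PDO with prepared statements, which keeps quotes inside scraped URLs from breaking (or injecting into) the SQL. This is a minimal sketch assuming PHP 7.1+ with the PDO MySQL driver; the `links` and `images` tables, their columns, and `storeResults()` are hypothetical names chosen for illustration.

```php
<?php
/*
 * Hypothetical schema (MySQL syntax); table and column names are assumptions:
 *   CREATE TABLE links  (id INT AUTO_INCREMENT PRIMARY KEY,
 *                        page_url VARCHAR(2048), href VARCHAR(2048));
 *   CREATE TABLE images (id INT AUTO_INCREMENT PRIMARY KEY,
 *                        page_url VARCHAR(2048), src VARCHAR(2048));
 */

/**
 * Store the links and images found on $pageUrl. Values are bound through
 * prepared statements rather than concatenated into the SQL string.
 */
function storeResults(PDO $pdo, string $pageUrl, array $links, array $images): void
{
    $insertLink  = $pdo->prepare('INSERT INTO links (page_url, href) VALUES (?, ?)');
    $insertImage = $pdo->prepare('INSERT INTO images (page_url, src) VALUES (?, ?)');

    $pdo->beginTransaction();   // one commit per page instead of one per row
    try {
        foreach ($links as $href) {
            $insertLink->execute([$pageUrl, $href]);
        }
        foreach ($images as $src) {
            $insertImage->execute([$pageUrl, $src]);
        }
        $pdo->commit();
    } catch (Exception $e) {
        $pdo->rollBack();       // leave no half-stored page behind
        throw $e;
    }
}

// Usage (DSN and credentials are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=crawler;charset=utf8mb4', 'user', 'pass',
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// storeResults($pdo, 'http://example.com/', ['http://example.com/a'], ['http://example.com/logo.gif']);
```

Setting `PDO::ERRMODE_EXCEPTION` makes failed inserts throw, so the rollback path actually runs instead of errors being silently ignored.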

  #2
    Addicted Member
    Join Date: Feb 2006
    Location: Hyderabad, India
    Posts: 233

    Re: PHP Crawler?

    With the approach you have taken, you need to be really good with regular expressions. Your regex assumes that every href attribute is enclosed in double quotes ("), but attributes can also use single quotes ('), so it will miss some links. If you are using PHP 5, you could use DOM instead. Take a look here. It has lots of functions that make life easier, such as getElementsByTagName.
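To illustrate the DOM route: a minimal sketch, assuming PHP 7+ with the standard DOM extension, that loads the page with `DOMDocument::loadHTML()` and pulls `href` and `src` values out with `getElementsByTagName()`. `extractLinksAndImages()` is a hypothetical helper name. Because the parser reads the attributes, single-quoted values are handled the same way as double-quoted ones.

```php
<?php
/**
 * Hypothetical helper: collect every <a href> and <img src> in an HTML
 * string using PHP's DOM extension instead of a regular expression.
 */
function extractLinksAndImages(string $html): array
{
    $result = ['links' => [], 'images' => []];
    if (trim($html) === '') {
        return $result;             // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // don't print warnings for sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the caller's setting

    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $result['links'][] = $a->getAttribute('href');
        }
    }
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $result['images'][] = $img->getAttribute('src');
        }
    }
    return $result;
}

// Mixed quote styles: the second link uses single quotes
$html = '<p><a href="one.html">One</a> <a href=\'two.html\'>Two</a>'
      . ' <img src="logo.gif" alt="logo"></p>';
print_r(extractLinksAndImages($html));
```

The returned URLs are exactly as written in the page, so relative ones like `one.html` still need to be resolved against the page URL before a crawler can follow them.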

  #3
    visualAd, VBA Nutter
    Join Date: Apr 2002
    Location: Ickenham, UK
    Posts: 4,906

    Re: PHP Crawler?

    More specifically, look at DOMDocument::loadHTML(), which loads HTML 4 documents (which don't have to conform to XML rules) into a DOM document.
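A small sketch of `loadHTML()` coping with markup that is not well-formed XML: unquoted attribute values, an unclosed `<li>`, and a bare `<br>`. `loadHtml4()` is a hypothetical wrapper (PHP 7+ assumed); it turns on `libxml_use_internal_errors()` so any parser complaints are collected into `$messages` rather than printed as PHP warnings, then restores the previous setting.

```php
<?php
/**
 * Hypothetical wrapper: parse HTML 4 that an XML parser would reject and
 * return the DOM tree; parser messages are returned through $messages.
 */
function loadHtml4(string $html, &$messages): DOMDocument
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // collect errors instead of warning
    $doc->loadHTML($html);
    $messages = array_map(function ($error) {
        return trim($error->message);
    }, libxml_get_errors());
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Unquoted attributes, an unclosed <li> and a bare <br>: invalid XML, valid enough HTML
$doc = loadHtml4('<ul><li>Item <a href=page.html>one</a><li><img src=pic.gif><br></ul>', $messages);

echo $doc->getElementsByTagName('li')->length, "\n";
echo $doc->getElementsByTagName('a')->item(0)->getAttribute('href'), "\n";
echo count($messages), " parser message(s)\n";
```

The same document passed to `DOMDocument::loadXML()` would fail, which is why `loadHTML()` is the one to use for real-world pages.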
