It would be impossible to scrape a PDF page anyway, wouldn't it? It's not HTML. It's like, say, an .exe file: try to open that in Internet Explorer and you'll get a bunch of symbols.
If you open a PDF page in Internet Explorer with view-source:http://to.com/file.pdf, what do you get?
That's the answer too. Just use this if the file extension is PDF instead of html/php etc.:
Code:
view-source:http://www.bapio.co.uk/uploads/publications/1342172154.pdf
Here is what your scraper will see; it's not HTML code:
Code:%PDF-1.2 %âãÏÓ 9 0 obj << /Length 10 0 R /Filter /FlateDecode >> stream H‰ÍÑJÃ0†Ÿ ïð{§²fç$M“ínÒ-‚[&jeŠâÛÛ¤ñ~‚$ÉÉÿ}ÉÉ…¬Ij«¬ÌsÀ—‚Ç~€XÖ-],÷‚$Y—÷Ó)ü'N«u*1!œ„ÀVÙ?ŸÁ? žb1RbbœÒ‰ÉH²[¹™TD:#ž&Ø*ÙÌX®¦øiç»$qnf¬ƒ¿†¶]»ÀõËîãaÿ¶{ÿÂØ£‰›×q|JªLs]™QÒI¸¬jî„%¯Œ9Øé`ß঺¼ÅU»itezÛ$›’Ú¿OeBÆÄ’Ò¯á¸Råþ@zÜ—úóÿgª¼ø<õ¡ª endstream endobj 10 0 obj 246 endobj 4 0 obj << /Type /Page /Parent 5 0 R /Resources << /Font << /F0 6 0 R /F1 7 0 R >> /ProcSet 2 0 R >> /Contents 9 0 R
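To make that concrete, here is a rough Python sketch of how a scraper could notice it has been handed a PDF rather than HTML, going by the `%PDF-` header in the dump above instead of the file extension. It also tries to inflate the `/FlateDecode` streams with zlib, since that is why the body looks like random symbols. The function names and the tiny sample document are made up for illustration, and this is not a real PDF parser (no fonts, no object tables, no other filters); a proper PDF library is the sensible choice for actual text extraction.

```python
import re
import zlib
from typing import List


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    return data.startswith(b"%PDF-")


def flate_streams(data: bytes) -> List[bytes]:
    """Inflate every stream body that zlib accepts; skip anything else."""
    results = []
    # Grab everything between "stream" and "endstream" (naive, assumption:
    # the compressed bytes never contain the word "endstream").
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", data, re.DOTALL):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not zlib data (other filter, or not compressed at all)
    return results


# Hypothetical miniature "PDF" built in memory so the sketch runs offline.
payload = zlib.compress(b"BT (Hello) Tj ET")
sample = (
    b"%PDF-1.2\n9 0 obj\n<< /Length "
    + str(len(payload)).encode()
    + b" /Filter /FlateDecode >>\nstream\n"
    + payload
    + b"\nendstream\nendobj\n"
)

if looks_like_pdf(sample):
    for body in flate_streams(sample):
        print(body)
```

Even after inflating, what comes out is PDF drawing operators rather than markup, so an HTML scraper still has nothing to work with.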




