DOM Functions
Datacide
Jun 29th, 2006, 08:22 PM
Ok, this is what I want to do: I'm making a website, and in it I want to fetch the contents of a table cell from another site. For example, I want to get a picture from a table cell that has an ID of "DefaultImage". Is it possible to do this without loading the page and searching for that string?
Datacide
Jun 29th, 2006, 10:18 PM
Anyone have any ideas? Maybe not with PHP, but with JavaScript or something? I have totally no idea how to do this, and Google didn't really come up with anything. I just know how to do it in VB, using GetElementByID.
visualAd
Jun 30th, 2006, 02:34 AM
What version of PHP are you using?
Datacide
Jun 30th, 2006, 03:09 AM
Uhh... 5 I think. I found this but it's not that helpful...
http://ca3.php.net/manual/en/function.dom-domdocument-loadhtmlfile.php
http://ca3.php.net/manual/en/function.dom-domdocument-getelementbyid.php
visualAd
Jun 30th, 2006, 03:26 AM
If you are using PHP 5 it is quite simple. The DOM (http://www.php.net/DOM) extension which is included by default enables you to load HTML into a DOMDocument object and treat it as you would any other XML document.
$doc = new DOMDocument('1.0'); // 1.0 is the XML version
$doc->loadHTMLFile('http://www.example.com/test.html');
$tables = $doc->getElementsByTagName('table');
If you only have PHP 4, you'll need to use the DOM XML (http://www.php.net/DOMXML) extension. As well as not complying with the official W3C DOM specification, it is also not included by default. My advice if your host only has PHP 4, is to find another. :)
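To give a rough idea of what you can do with the node list once the document is loaded, here is a minimal sketch that walks every table and collects the text of each cell. The inline $html string is a made-up stand-in for the remote page, and loadHTML() is used in place of loadHTMLFile() only so the example runs without a network connection.

```php
<?php
// Hypothetical markup standing in for the remote page.
$html = '<html><body><table>'
      . '<tr><td><img src="pic.jpg" /></td><td>Some text</td></tr>'
      . '</table></body></html>';

$doc = new DOMDocument('1.0');
// The @ hides libxml warnings that sloppy real-world HTML tends to trigger.
@$doc->loadHTML($html);

$cells = array();
foreach ($doc->getElementsByTagName('table') as $table) {
    foreach ($table->getElementsByTagName('td') as $td) {
        // textContent is the combined text of the cell and its descendants.
        $cells[] = trim($td->textContent);
    }
}

print_r($cells);
```

The first cell only holds an image, so its text comes back empty; to get at the picture you would read the src attribute of the img element rather than the cell text.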
Datacide
Jun 30th, 2006, 03:39 AM
Ok, cool, so how do I get the contents of a table cell? The first cell contains a picture, the other one is just some text.
visualAd
Jun 30th, 2006, 03:45 AM
The easiest way to get a reference to the image element you need is to give it an ID and then use the getElementById function.
Datacide
Jun 30th, 2006, 03:48 AM
Ok... so let's say that the id of the cell is "usrpic", how do I save the image location to a variable?
EDIT: Actually the image and the cell don't have an ID, the <a href> surrounding the image does. Also, I need to get something from inside a <span> tag which also has no id, but has a unique class style name...
visualAd
Jun 30th, 2006, 04:22 AM
Using DOM you would first get the anchor by its ID, then get the img element inside it:
$a = $doc->getElementById('idofa');
$imgs = $a->getElementsByTagName('img');
$src = $imgs->item(0)->getAttribute('src');
You could be sneaky, however, and use a single XPath expression to pull out the info you need:
$doc = new DOMDocument('1.0');
$doc->loadHTMLFile('dom.html');
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//*[@id='blah']/img[1]/@src)");
You can see it work on dom.html (http://php5.codedv.com/examples/dom/dom.html) here:
http://php5.codedv.com/examples/dom/dom.php (source (http://php5.codedv.com/examples/dom/dom.php?source))
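The same trick should cover the span that only has a class name. This is a minimal sketch, assuming the class attribute holds exactly one class; the id "userlink", the class "nametext" and the inline markup are placeholders I made up, not values from your page.

```php
<?php
// Hypothetical markup: an anchor with an id wrapping the image,
// and a span identified only by its class.
$html = '<html><body>'
      . '<a id="userlink" href="somewhere"><img src="default.jpg" /></a>'
      . '<span class="nametext">Some user</span>'
      . '</body></html>';

$doc = new DOMDocument('1.0');
@$doc->loadHTML($html); // @ hides warnings about imperfect HTML

$xpath = new DOMXPath($doc);

// string(...) makes evaluate() return a plain string instead of a node list.
$src  = $xpath->evaluate("string(//a[@id='userlink']/img[1]/@src)");
$name = $xpath->evaluate("string(//span[@class='nametext'])");

echo $src, "\n";
echo $name, "\n";
```

If the span carries several classes, @class='nametext' will not match; in that case something like contains(concat(' ', normalize-space(@class), ' '), ' nametext ') is the usual workaround.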
Datacide
Jun 30th, 2006, 05:07 AM
Great thanks, I'll try that.
On a somewhat related topic, how can I load the HTML for a website into a variable, insert my own (JavaScript) and then echo it back? For example, take my website "http://www.bluecable.ca/index.php" and add the AdSense JavaScript to the top and then display the page?
Datacide
Jun 30th, 2006, 05:46 AM
Ok, here's my test code:
<?
$doc = new DOMDocument('1.0');
$doc->loadHTMLFile('http://somewhere.com/82336919');
$a = $doc->getElementById('ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage');
$imgs = $a->getElementsByTagName('img');
$src = $imgs->item(0)->getAttribute('src');
echo($src);
?>
And I get this message: Parse error: parse error, unexpected T_OBJECT_OPERATOR in /www/html/index.php on line 6
Here's the HTML for the other site:
<a id="ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage" href="somewhere"><img src="default.jpg" style="border-width:0px;" /></a>
visualAd
Jun 30th, 2006, 05:56 AM
Which line is line 6?
visualAd
Jun 30th, 2006, 06:03 AM
Also, you need to validate the document before using getElementById.
$doc->loadHTMLFile();
$doc->validate();
If it doesn't contain a doctype declaration, you'll need to search for the ID manually or use the XPath method.
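As a sketch of the manual search, the helper below tries getElementById first and falls back to an XPath query on the id attribute if that comes back empty. findById() is a name I made up for illustration, not a built-in function, and it assumes the id contains no single quote.

```php
<?php
// Hypothetical helper: look an element up by id without relying on a DTD.
function findById(DOMDocument $doc, $id)
{
    $el = $doc->getElementById($id);
    if ($el !== null) {
        return $el;
    }
    // Fallback: match the id attribute directly.
    // Assumes $id contains no single quote.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//*[@id='" . $id . "']");
    return $nodes->length > 0 ? $nodes->item(0) : null;
}

$doc = new DOMDocument('1.0');
// Made-up markup standing in for the remote page.
@$doc->loadHTML('<html><body><a id="pic" href="#"><img src="default.jpg" /></a></body></html>');

$src = '';
$a = findById($doc, 'pic');
if ($a !== null) {
    $imgs = $a->getElementsByTagName('img');
    if ($imgs->length > 0) {
        $src = $imgs->item(0)->getAttribute('src');
    }
}

echo $src, "\n";
```

Checking for null and for an empty node list before calling item(0) avoids a fatal error when the id or the image is missing from the page.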
Datacide
Jun 30th, 2006, 06:13 AM
Same error, but now it's line 7:
<?
$doc = new DOMDocument('1.0');
$doc->loadHTMLFile('http://somewhere.com/82336919');
$doc->validate();
$a = $doc->getElementById('ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage');
$imgs = $a->getElementsByTagName('img');
$src = $imgs->item(0)->getAttribute('src'); // < ERROR HERE
echo($src);
?>
visualAd
Jun 30th, 2006, 06:21 AM
What do print_r($imgs) and print_r($a) yield?
Datacide
Jun 30th, 2006, 06:44 AM
<?
$doc = new DOMDocument('1.0');
$doc->loadHTMLFile('http://example.com/82336919');
$doc->validate();
$a = $doc->getElementById('ctl00_Main_ctl00_UserBasicInformation1_hlDefaultImage');
$imgs = $a->getElementsByTagName('img');
print_r($imgs);
print_r($a);
?>
Returns:
Warning: domdocument(): Start tag expected, '<' not found in /home/virtual/site156/fst/var/www/html/index.php on line 2
Fatal error: Call to undefined function: loadhtmlfile() in /home/virtual/site156/fst/var/www/html/index.php on line 3
Is there a way to echo the PHP version number?
visualAd
Jun 30th, 2006, 06:58 AM
Use phpinfo().
That was just working, right? :confused:
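If you only want the version number rather than the whole phpinfo() page, a quick check along these lines should do. The $hasDom variable is just a name picked for this sketch.

```php
<?php
// phpversion() returns the running PHP version as a string.
echo 'PHP version: ', phpversion(), "\n";

// The PHP 5 DOM extension provides the DOMDocument class, so check both
// the version and the class before relying on it.
$hasDom = version_compare(PHP_VERSION, '5.0.0', '>=') && class_exists('DOMDocument');

echo $hasDom ? "DOM extension available\n" : "DOM extension not available\n";
```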
Datacide
Jun 30th, 2006, 07:05 AM
Damn, PHP Version 4.3.8 (http://www.tdotblog.com)
visualAd
Jun 30th, 2006, 07:10 AM
That might be why it's not working ;). Your host may support PHP 5 too. Try renaming the file with a .php5 extension; failing that, send them an email and threaten to go elsewhere if they don't add it :D
Datacide
Jun 30th, 2006, 07:23 AM
Nope, that didn't work. And since I make software for this company sometimes I get 10000 MB of space and 50000 MB of bandwidth for free, plus any domain names I want, 10000 email addresses, and unlimited databases(it seems), uh ya.. not going to be switching anytime soon. lol :p
Anyways, is there a way to do it in php4?
visualAd
Jun 30th, 2006, 12:53 PM
Yes, use the DOM XML extension. It appears to be available, and you can check that it is enabled by looking for a DOM XML section in the output of the phpinfo() function. Your only problem is that the HTML must be valid, well-formed XHTML; there does not appear to be a function which loads HTML from an external resource.
Seriously though, ask your hosting company to include PHP 5 as well as PHP 4. PHP 5 is now stable and runs very smoothly as both a CGI and an Apache module, and it can also run alongside PHP 4. You are probably in a better position to get them to do so than many of their other customers.
I also recommend you look into the PEAR XML (http://pear.php.net/package/XML_Parser) parser and the Tidy (http://uk.php.net/tidy) library (this can be used to tidy up poorly formed HTML and convert it to XML).
Whatever you decide on, you will need an extension or library that is not part of PHP's default installation. Unless you fancy doing a lot of coding and effectively reinventing the wheel by creating your own parser ;)
vbforums.com