Parse HTML


agmorgan
Aug 16th, 2007, 11:54 AM
I am writing an IRC bot that will respond to stock requests with the current price.
To do this I need a screen scraper.
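I download the page, then I need to look for the relevant content.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

URL url = new URL("http://bluebones.net/ticker/feed/?s=vod.L&n=uk");

// Read the whole response into one string.
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
StringBuilder sb = new StringBuilder();
String line = in.readLine();
while (line != null)
{
    sb.append(line);
    line = in.readLine();
}
in.close();
// Unescape the entity-encoded markup.
String xml = sb.toString().replaceAll("&lt;", "<").replaceAll("&gt;", ">").replaceAll("&quot;", "\"");
//Parse HTML here.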
I just can't seem to get my string containing the HTML into any DOM.
I have seen references to javax.swing.text.html.HTMLEditorKit and javax.xml.parsers.DocumentBuilder but I am struggling to make it work.
I don't want to use regex or any basic string manipulation to do it.
It seems so much harder to do stuff in Java than VB :(
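
For reference, a minimal sketch of the javax.xml.parsers.DocumentBuilder route mentioned above. It assumes the string is well-formed XML (or XHTML); DocumentBuilder will throw on ordinary tag-soup HTML. The class name XmlStringToDom, the helper firstText, and the sample markup are made up for illustration and are not the real feed format.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlStringToDom {
    // Parses a well-formed XML string into a DOM Document.
    // Throws a SAXException if the markup is not well-formed.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Wrapping the string in a StringReader avoids any charset guessing.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of the first element with the given tag name,
    // or null if no such element exists.
    public static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data, not the actual feed.
        String xml = "<quote><symbol>VOD.L</symbol><price>165.50</price></quote>";
        Document doc = parse(xml);
        System.out.println(firstText(doc, "price")); // prints 165.50
    }
}
```

If the feed itself is well-formed XML, builder.parse(url.openStream()) may work directly on the stream, and getTextContent() returns entity references such as &lt; already decoded, so the manual replaceAll step might not be needed.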

ComputerJy
Aug 17th, 2007, 02:32 AM
I hope this code (http://www.koders.com/java/fid2A84C4FE6F6679455AE29CBAC81D1390B9E2A205.aspx?s=parse+html) helps
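
For real-world HTML that is not well-formed, the JDK's own javax.swing.text.html parser is one option. The sketch below is not the linked code; it is a small illustration of the HTMLEditorKit.ParserCallback approach. It assumes, purely for the example, that the values of interest sit in table cells; the class name HtmlCellExtractor and the sample markup are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlCellExtractor {
    // Collects the trimmed, non-empty text found inside <td> elements.
    public static List<String> cellTexts(String html) throws IOException {
        final List<String> cells = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inCell = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TD) inCell = true;
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD) inCell = false;
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (!inCell) return;
                String text = new String(data).trim();
                if (!text.isEmpty()) cells.add(text);
            }
        };
        // The third argument tells the parser to ignore charset declarations,
        // which is appropriate when the input is already a decoded String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return cells;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample markup, not the actual page.
        String html = "<html><body><table><tr>"
                + "<td>VOD.L</td><td>165.50</td>"
                + "</tr></table></body></html>";
        System.out.println(cellTexts(html));
    }
}
```

This parser is callback-based rather than a DOM, so the callback has to track where it is in the document itself, but it needs no third-party libraries and avoids regex or string searching.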