Parse HTML

**agmorgan** · Aug 16th, 2007, 11:54 AM

I am writing an IRC bot that will respond to stock requests with the current price.
To do this I need a screen scraper.
I download the page, then I need to find the relevant content in it.

Code:

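import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class StockScraper { // class name is only illustrative
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://bluebones.net/ticker/feed/?s=vod.L&n=uk");
        StringBuilder page = new StringBuilder();

        // Read the response line by line; try-with-resources closes the stream.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line);
            }
        }

        // Undo the entity escaping (&lt; &gt; &quot;) so the markup can be parsed.
        String xml = page.toString()
                .replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"");
        // Parse HTML here.
    }
}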

I just can't seem to get my string containing the HTML into any DOM.
I have seen references to javax.swing.text.html.HTMLEditorKit and javax.xml.parsers.DocumentBuilder, but I am struggling to make either of them work.
I don't want to use regex or basic string manipulation to do it.
It seems so much harder to do things in Java than in VB.
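
For reference, the javax.xml.parsers.DocumentBuilder route mentioned above might look roughly like the sketch below. It only works if the string is well-formed XML or XHTML; ordinary HTML with unclosed tags will be rejected by the parser. The class name `DomSketch`, the `parse` helper, and the sample markup with its `span` element are all made up for illustration, since the actual structure of the feed is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
class DomSketch {

    // Parses a string of well-formed XML/XHTML into a W3C DOM Document.
    // Markup that is not well-formed is reported as an IllegalArgumentException.
    static Document parse(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // InputSource + StringReader lets the parser read from a String
            // instead of a file or stream.
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input; the real feed's element names may differ.
        String markup = "<html><body><span id=\"price\">158.25</span></body></html>";
        Document doc = parse(markup);

        // Look up elements by tag name and read the text inside the first one.
        NodeList spans = doc.getElementsByTagName("span");
        System.out.println(spans.item(0).getTextContent()); // prints 158.25
    }
}
```

In the scraper, the `xml` string built from the downloaded page would be passed to `parse` in place of the sample markup. If the page turns out not to be well-formed, the exception from `parse` is the signal that a more lenient HTML parser is needed instead.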

**ComputerJy** · Aug 17th, 2007, 02:32 AM

I hope this code helps
