
Thread: Parse HTML

  1. #1

    Thread Starter
    Frenzied Member agmorgan
    Join Date
    Dec 2000
    Location
    Lurking
    Posts
    1,383

    Parse HTML

    I am writing an IRC bot that will respond to stock requests with the current price.
    To do this I need a screen scraper.
    I download the page, then I need to look for the relevant content.
    Code:
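    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    // Download the feed and collect it into a single string.
    URL url = new URL("http://bluebones.net/ticker/feed/?s=vod.L&n=uk");
    StringBuilder sb = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        sb.append(line);
      }
    }
    // Unescape the entities in the feed so the markup can be parsed.
    String xml = sb.toString().replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"");
    // Parse HTML here.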
    I just can't seem to get my string containing the HTML into any DOM.
    I have seen references to javax.swing.text.html.HTMLEditorKit and javax.xml.parsers.DocumentBuilder, but I am struggling to make either of them work.
    I don't want to use regex or any basic string manipulation to do it.
    It seems so much harder to do this kind of thing in Java than in VB.

  2. #2
    Arabic Poster ComputerJy
    Join Date
    Nov 2005
    Location
    Happily misplaced
    Posts
    2,513

    Re: Parse HTML

    I hope this code helps
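    A minimal sketch of one way to do this, assuming the unescaped feed is well-formed XML: load the string into a DOM with javax.xml.parsers.DocumentBuilder and read elements by tag name. The class FeedDomSketch, its helper methods, and the sample quote/price markup are made up for illustration; the real feed's element names may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedDomSketch {
    // Parses a string of well-formed XML into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the trimmed text of the first element with the given tag name,
    // or null if no such element exists.
    static String firstText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample markup standing in for the downloaded feed.
        String xml = "<quote><symbol>VOD.L</symbol><price>135.25</price></quote>";
        Document doc = parse(xml);
        System.out.println(firstText(doc, "price"));
    }
}
```

    DocumentBuilder throws a SAXException on markup that is not well-formed, so this only suits feeds that are valid XML; ordinary HTML pages would need a more tolerant parser.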
    "I'm not normally a praying man, but if you're up there, save me... Superman!" - Homer Simpson
