Thread: Convert html string to HTMLDocument for parsing

  1. #1

    Thread Starter
    Frenzied Member stateofidleness's Avatar
    Join Date
    Jan 2009
    Posts
    1,780

    Convert html string to HTMLDocument for parsing

    Gents,
    I'm able to retrieve the source code of a web page and store it in a string variable. I would like to cast that string variable into an HTMLDocument if possible, to make parsing its elements much easier.

    How can I do this?

  2. #2
    eXtreme Programmer .paul.'s Avatar
    Join Date
    May 2007
    Location
    Chelmsford UK
    Posts
    26,422

    Re: Convert html string to HTMLDocument for parsing

    You can declare a WebBrowser control in code and use it without adding it to a form:

    vb Code:
    Dim wb As New WebBrowser
    wb.DocumentText = "your html string"
    Dim doc As HtmlDocument = wb.Document

  3. #3

    Thread Starter
    Frenzied Member stateofidleness's Avatar

    Re: Convert html string to HTMLDocument for parsing

    Ah, OK. I was close!

    The problem I'm seeing is that my variable DOES hold valid HTML (an entire page's worth), but when I set DocumentText to it, everything is stripped out and only "<HTML></HTML>" is left, so any later calls on the document throw a NullReferenceException.

    Here's my code:

    vb Code:
    Try
        'updates will hold the HTML source code (verified working)
        updates = getUpdates(classURL)
        'Now scrape it
        Dim wb As New WebBrowser
        With wb
            .DocumentText = updates
            .ScriptErrorsSuppressed = True
        End With
        MsgBox(wb.DocumentText)
        Dim doc As HtmlDocument = wb.Document

        Dim element = doc.GetElementById("forumLabel" & txtClassID.Text).InnerText
        If element IsNot Nothing Then
            MsgBox(element)
        End If

    Catch ex As Exception
        MsgBox("Appears something failed: " & ex.ToString)
    End Try

  4. #4
    eXtreme Programmer .paul.'s Avatar

    Re: Convert html string to HTMLDocument for parsing

    What does this display?

    vb Code:
    MsgBox(wb.DocumentText)

  5. #5

    Thread Starter
    Frenzied Member stateofidleness's Avatar

    Re: Convert html string to HTMLDocument for parsing

    That shows "<HTML></HTML>", but in the debugger, if I hover over "updates", I see all of the HTML.

  6. #6
    eXtreme Programmer .paul.'s Avatar

    Re: Convert html string to HTMLDocument for parsing

    OK, try this:

    vb Code:
    Dim wb As New WebBrowser
    wb.ScriptErrorsSuppressed = True
    Dim doc As HtmlDocument = wb.Document.OpenNew(True)
    doc.Write("your html string")
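
    A likely explanation for the "<HTML></HTML>" result earlier in the thread: as far as I know, assigning DocumentText does not parse the string immediately. The control loads it asynchronously and raises DocumentCompleted when it is done, so reading DocumentText or Document on the very next line tends to show an empty page. Below is a minimal sketch of waiting for that event before parsing. LoadHtml is a hypothetical helper name, not something from the posts above, and the sketch assumes it runs on the UI (STA) thread of a Windows Forms application. The DoEvents loop is only the simplest way to wait. Handling DocumentCompleted directly is usually cleaner in real code.

    ```vbnet
    Imports System.Windows.Forms

    Module HtmlParsingSketch

        ' Hypothetical helper (not from the thread): loads an HTML string into an
        ' off-screen WebBrowser and returns the document once loading has finished.
        ' Assumes it is called on the UI (STA) thread of a WinForms application.
        Function LoadHtml(html As String) As HtmlDocument
            Dim wb As New WebBrowser()
            wb.ScriptErrorsSuppressed = True

            ' Flag set by the DocumentCompleted handler when the load is done.
            Dim loaded As Boolean = False
            AddHandler wb.DocumentCompleted, Sub(sender, e) loaded = True

            ' Starts the asynchronous load of the string.
            wb.DocumentText = html

            ' Keep processing window messages until the load completes.
            ' No timeout here; a real implementation should add one.
            Do Until loaded
                Application.DoEvents()
            Loop

            ' The WebBrowser is deliberately not disposed here, because the
            ' returned HtmlDocument belongs to it.
            Return wb.Document
        End Function

    End Module
    ```

    With a helper like that, the lookup from post #3 could fetch the element first and only then read its text, for example: Dim el As HtmlElement = LoadHtml(updates).GetElementById("forumLabel" & txtClassID.Text), followed by an "If el IsNot Nothing" check before using el.InnerText. That way a missing element is detected instead of throwing.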
