Thread: Data extraction without WebBrowser

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Data extraction without WebBrowser

    My code is as follows. I make an HttpWebRequest to download the HTML and then load it into the WebBrowser. Once it is in the WebBrowser, I can do all sorts of data scraping. Is there a way to scrape the data without loading it into the WebBrowser? The problem I'm running into is that when I load a large number of web pages into the WebBrowser, memory usage keeps creeping up, everything slows down, and eventually my system freezes.

    Dim url As String = "http://www.amazon.com"
    Dim source_code As String
    Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(url), System.Net.HttpWebRequest)
    Dim response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
    Dim sr As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream())
    source_code = sr.ReadToEnd()

    ' Load the downloaded HTML into the WebBrowser control on the form
    WebBrowser1.DocumentText = source_code

    Dim li_tags As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("li")
    For Each li_tag As HtmlElement In li_tags
        If li_tag.GetAttribute("id") = "Product" Then
            ' ... extract data from the matching element here ...
        End If
    Next

  2. #2
    Frenzied Member
    Join Date
    Oct 2012
    Location
    Tampa, FL
    Posts
    1,187

    Re: Data extraction without WebBrowser

    You can parse the variable source_code yourself; it's just a string. You can look into regex, or you could load the string into an HTML document in memory and use something like the HTML Agility Pack to parse it.
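
    As a rough sketch of the second option: assuming the HtmlAgilityPack NuGet package is referenced, and reusing the "Product" id from your loop, parsing the string in memory could look something like this. The sample markup and the module name are made up for illustration.

    ```vbnet
    Imports System
    Imports HtmlAgilityPack

    Module HapSketch
        Sub Main()
            ' Stand-in for the string read from the HttpWebResponse (hypothetical markup).
            Dim source_code As String = "<ul><li id=""Product"">Widget</li><li id=""Other"">Skip me</li></ul>"

            ' Parse the string in memory; no WebBrowser control is involved.
            Dim doc As New HtmlAgilityPack.HtmlDocument()
            doc.LoadHtml(source_code)

            ' Descendants("li") enumerates every <li> element in the parsed tree.
            For Each li_tag As HtmlNode In doc.DocumentNode.Descendants("li")
                ' GetAttributeValue returns the supplied default ("") when the attribute is missing.
                If li_tag.GetAttributeValue("id", "") = "Product" Then
                    Console.WriteLine(li_tag.InnerText)
                End If
            Next
        End Sub
    End Module
    ```

    In a WinForms project you will probably need to write HtmlAgilityPack.HtmlDocument in full as above, because System.Windows.Forms has its own HtmlDocument class.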

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Re: Data extraction without WebBrowser

    I tried declaring a variable wb as a WebBrowser (Dim wb As WebBrowser = New WebBrowser). I don't understand why the memory creep still happens with a variable instead of the WebBrowser that sits on the form. Is there a way to dispose of the variable to release some memory? Is there any other way to store the HttpWebRequest results in a Document? I'm not familiar with the HTML Agility Pack. I'll have to look into that. Do you have any resources for the HTML Agility Pack?

  4. #4
    Frenzied Member
    Join Date
    Oct 2012
    Location
    Tampa, FL
    Posts
    1,187

    Re: Data extraction without WebBrowser

    So when and where is your error occurring? Are you disposing of the WebBrowser before creating a new one? How many WebBrowsers are you creating? I would think they would get garbage collected, but it depends on how you have coded your application. Dispose of objects when you are done with them.

    Your best resource is Google. You are creating a UI control (who knows how many times; you have not specified), which is very expensive for a job that does not need a UI.

    See this thread for some basic info on the HTML Agility Pack:

    http://stackoverflow.com/questions/8...l-agility-pack
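
    On the "dispose when you are done" point, here is a minimal sketch of the download step with Using blocks, so the response and reader are released even if something throws. DownloadSource and the module name are hypothetical names I picked for illustration, not anything from your project.

    ```vbnet
    Imports System.IO
    Imports System.Net

    Module DownloadSketch
        ' Hypothetical helper: returns the page source as a string, no UI control needed.
        Function DownloadSource(url As String) As String
            Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)

            ' Using calls Dispose automatically when each block exits.
            Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
                Using sr As New StreamReader(response.GetResponseStream())
                    Return sr.ReadToEnd()
                End Using
            End Using
        End Function
    End Module
    ```

    The string that comes back can then be handed to whatever parser you settle on.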

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Re: Data extraction without WebBrowser

    I tried wb.Dispose but it doesn't release any of the memory. I am only creating one WebBrowser variable inside a function, but I am calling that function hundreds of times. I tried reading up a little on the HTML Agility Pack but I still don't understand how to use it. I don't understand why a variable keeps building up memory and doesn't release it when it is disposed of. Is there any other easy way to store the HttpWebRequest results in a Document?

  6. #6

    Thread Starter
    Addicted Member
    Join Date
    Aug 2011
    Posts
    184

    Re: Data extraction without WebBrowser

    I added the HTML Agility Pack. Why am I getting this syntax error? Is the HtmlDocument from the HAP the same type as WebBrowser1.Document?

    Dim document1 As HtmlDocument = HtmlWeb.Load("http://www.google.com")
    Dim div_tags As HtmlElementCollection = document1.GetElementsByTagName("div")

    ******* error
    'GetElementsByTagName' is not a member of 'HtmlAgilityPack.HtmlDocument'.

  7. #7
    Frenzied Member
    Join Date
    May 2014
    Location
    Central Europe
    Posts
    1,372

    Re: Data extraction without WebBrowser

    That should not be too hard... http://bfy.tw/1TCQ
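
    For what it's worth, HtmlAgilityPack.HtmlDocument is a different class from the System.Windows.Forms.HtmlDocument that the WebBrowser exposes, so GetElementsByTagName and HtmlElementCollection don't carry over. A sketch of what I would expect to work instead, assuming the HtmlAgilityPack package is referenced (the module name and the "(no id)" placeholder are just for illustration):

    ```vbnet
    Imports System
    Imports HtmlAgilityPack

    Module HapLoadSketch
        Sub Main()
            ' Load is an instance method, so create an HtmlWeb object first.
            Dim web As New HtmlWeb()
            Dim document1 As HtmlAgilityPack.HtmlDocument = web.Load("http://www.google.com")

            ' SelectNodes takes an XPath expression and returns Nothing when nothing matches.
            Dim div_tags As HtmlNodeCollection = document1.DocumentNode.SelectNodes("//div")
            If div_tags IsNot Nothing Then
                For Each div_tag As HtmlNode In div_tags
                    ' Print each div's id, or a placeholder when it has none.
                    Console.WriteLine(div_tag.GetAttributeValue("id", "(no id)"))
                Next
            End If
        End Sub
    End Module
    ```

    The Nothing check matters because SelectNodes does not return an empty collection when the page has no matching tags.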
