Thread: Scraping a google search page for the top 200 search links for a keyword

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Jun 2011
    Posts
    137

    Scraping a google search page for the top 200 search links for a keyword

    I want to scrape the top 200 search links from a Google results page for a keyword.

    I am using HttpWebRequest.
    Is there a simpler way to do it?

    So far I have this.

    Code:
     ' Needs: Imports System.Net, System.IO, System.Text.RegularExpressions
     ' Escape the keyword so spaces and symbols do not break the URL.
     Dim query As String = Uri.EscapeDataString(TextBox1.Text)
     Dim request As HttpWebRequest = DirectCast(WebRequest.Create("http://www.google.com/search?num=100&q=" & query), HttpWebRequest)
     Dim page As String = ""
     ' Using blocks close the response and the reader when done.
     Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
         Using reader As New StreamReader(response.GetResponseStream())
             page = reader.ReadToEnd()
         End Using
     End Using
     Dim regexobj As New Regex("http://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\\^\\*\(\)_\-\=\+\\\/\?\.\:\;\,]*)?")
     Dim matches As MatchCollection = regexobj.Matches(page)
     For Each item As Match In matches
         ' Skip Google's own links.
         If Not item.Value.Contains("google") AndAlso Not item.Value.Contains("wj") Then
             ListBox1.Items.Add(item.Value)
         End If
     Next
    This is what I have tried, but it freezes the program and does not add more than 200 pages.
    Code:
     Dim url As Integer = 1
     Do Until url = 10
         For Each item As Match In matches
             If Not item.Value.Contains("google") AndAlso Not item.Value.Contains("wj") Then
                 ListBox1.Items.Add(item.Value & url)
             End If
         Next
         url = url - 1
     Loop
    How can I fix that?

    Any help would be appreciated.

    Thanks
    Last edited by polas; Sep 23rd, 2013 at 02:23 PM.

  2. #2
    Bad man! ident
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: Scraping a google search page for the top 200 search links for a keyword

    I would start off by actually reading your previous thread. If you are not going to pay attention to the advice offered, then what's the point in offering the same advice again?

    Calling Add forces the control to redraw itself each time; use AddRange instead. I would also advise using WebClient, the wrapper around HttpWebRequest. It even has its own asynchronous download methods that run on the thread pool.
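
    A rough sketch of how that advice could fit together, not a drop-in fix. It assumes .NET 4.5 or later (for Async/Await and WebClient.DownloadStringTaskAsync) and assumes Google accepts a start offset alongside num=100, which the original post does not use. The module name, the helper names and the simplified link regex are made up for illustration. Note that the loop counter counts up: in the posted loop url is decremented, so it never reaches 10 and the loop keeps adding items, which is the likely cause of the freeze.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Module SearchScraper
    ' Simplified link pattern (an assumption, not the regex from the post).
    Private ReadOnly LinkPattern As New Regex("https?://[^\s""'<>]+", RegexOptions.IgnoreCase)

    ' Pulls distinct links out of one page of HTML, skipping Google's own URLs.
    Public Function ExtractLinks(html As String) As List(Of String)
        Dim links As New List(Of String)
        For Each m As Match In LinkPattern.Matches(html)
            If Not m.Value.Contains("google") AndAlso Not links.Contains(m.Value) Then
                links.Add(m.Value)
            End If
        Next
        Return links
    End Function

    ' Builds the URL for one page of results. The start parameter is an assumption.
    Public Function BuildSearchUrl(keyword As String, start As Integer) As String
        Return "http://www.google.com/search?num=100&start=" & start.ToString() &
               "&q=" & Uri.EscapeDataString(keyword)
    End Function

    ' Downloads the requested number of result pages without blocking the caller.
    Public Async Function FetchLinksAsync(keyword As String, pages As Integer) As Task(Of List(Of String))
        Dim all As New List(Of String)
        Using client As New WebClient()
            ' The counter goes up, so the loop ends after the last page.
            For page As Integer = 0 To pages - 1
                Dim html As String = Await client.DownloadStringTaskAsync(BuildSearchUrl(keyword, page * 100))
                all.AddRange(ExtractLinks(html))
            Next
        End Using
        Return all
    End Function
End Module
```

    From the form, the links can then be added in one call, so the ListBox redraws once instead of once per item (Button1, TextBox1 and ListBox1 are assumed control names):

```vbnet
Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim links As List(Of String) = Await SearchScraper.FetchLinksAsync(TextBox1.Text, 2)
    ListBox1.BeginUpdate()
    ListBox1.Items.AddRange(links.ToArray())
    ListBox1.EndUpdate()
End Sub
```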
