Scraping a Google search page for the top 200 search links for a keyword
I want to scrape the top 200 search result links from Google for a given keyword.
I am using HttpWebRequest.
Is there a simpler way to do it?
So far I have this.
Code:
Dim request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("http://www.google.com/search?num=100&q=" & TextBox1.Text)
Dim response As System.Net.HttpWebResponse = request.GetResponse
Dim stream As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream())
Dim page As String = stream.ReadToEnd
Dim regexobj As Regex = New Regex("http://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\\^\\*\(\)_\-\=\+\\\/\?\.\:\;\,]*)?")
Dim matches As MatchCollection = regexobj.Matches(page)
For Each item As Match In matches
    If Not item.Value.Contains("google") And Not item.Value.Contains("wj") Then
        ListBox1.Items.Add(item.Value)
    End If
Next
This is what I have tried, but it freezes the program and does not add more than 200 pages.
Code:
Dim url As Integer = 1
Do Until url = 10
    For Each item As Match In matches
        If Not item.Value.Contains("google") And Not item.Value.Contains("wj") Then
            ListBox1.Items.Add(item.Value & url)
        End If
    Next
    url = url - 1
Loop
How can I fix that?
Any help would be appreciated.
Thanks
Re: Scraping a Google search page for the top 200 search links for a keyword
I would start off by actually reading your previous thread. If you are not going to pay attention to the advice offered, then what's the point in offering the same advice again?
Calling Add forces the control to redraw itself each time; collect the items and call AddRange once instead. I would also advise using WebClient, which is a wrapper around HttpWebRequest. It even has its own asynchronous download methods that run on the thread pool.
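A rough sketch of how that advice might look, not a definitive implementation. It assumes .NET 4.5 or later (for Async/Await and WebClient.DownloadStringTaskAsync), a form with Button1, TextBox1 and ListBox1 already created in the designer, and that Google accepts a "start" query parameter as the paging offset. The URL pattern is a simplified stand-in for the one in the question, and Google may change its markup or block automated requests. Note also that the second loop in the question starts url at 1 and subtracts 1 on every pass, so the condition url = 10 is never met and the loop keeps adding items, which is a likely cause of the freeze.

```vbnet
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions
Imports System.Windows.Forms

Public Class Form1
    ' Simplified URL pattern (an assumption, not the regex from the question).
    Private Shared ReadOnly LinkPattern As New Regex("https?://[^\s""'<>]+")

    ' Assumes Button1, TextBox1 and ListBox1 exist in the designer part of this form.
    Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim links As New List(Of String)()

        Using client As New WebClient()
            ' Two pages of 100 results each: start = 0, then start = 100.
            ' Using "start" as the paging offset is an assumption about the URL format.
            For start As Integer = 0 To 100 Step 100
                Dim url As String = "http://www.google.com/search?num=100&start=" &
                                    start.ToString() & "&q=" &
                                    Uri.EscapeDataString(TextBox1.Text)

                ' Await returns control to the UI while the page downloads,
                ' so the form stays responsive.
                Dim page As String = Await client.DownloadStringTaskAsync(url)

                For Each item As Match In LinkPattern.Matches(page)
                    ' Same "google" filter as the question, plus duplicate removal.
                    If Not item.Value.Contains("google") AndAlso
                       Not links.Contains(item.Value) Then
                        links.Add(item.Value)
                    End If
                Next
            Next
        End Using

        ' One AddRange call instead of one Add call per link.
        ListBox1.BeginUpdate()
        ListBox1.Items.AddRange(links.ToArray())
        ListBox1.EndUpdate()
    End Sub
End Class
```

The For loop with a fixed upper bound replaces the Do Until loop, so it always ends after two requests, and the list box is touched only once after all the links have been collected.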