-
May 13th, 2013, 09:10 PM
#1
Thread Starter
Lively Member
Web scraping
I figured I'd ask this question in a different way. Perhaps someone could point me in the right direction. How would I go about scraping a website? Say the user enters a value, such as a zip code, in a textbox on a user form. How could I take that value, enter it into a search engine or website, and extract the answer into a cell on a spreadsheet that I can reference as a database? That is, unless there's a better way to build a zip code database. I'm especially unsure what to do when the answer is nested within many tags, some with the same name. Thanks.
Last edited by pooky; May 13th, 2013 at 09:34 PM.
-
May 14th, 2013, 08:04 AM
#2
Hyperactive Member
Re: Web scraping
First off, we need to know what you are trying to achieve, as there could be multiple ways to accomplish this.
Have you looked into the WebBrowser control? How about the WebClient class? Do you need to see the navigation and the resulting page? Are you going to be manipulating the pages (clicks, etc.)?
The fastest method is to use a WebClient and call its DownloadString() method to get the HTML of the URL. I'm sure you can manipulate the search engine's URL (e.g., www.google.com/search=**). Once you have the string that method returns, just parse the HTML.
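The "build the URL, then download the page as a string" idea can be sketched as follows. This is a Python stand-in for illustration, not the .NET WebClient API; the endpoint `https://example.com/search`, the query parameter name `q`, and the function names are all assumptions that would need to be replaced with whatever the real site uses.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; substitute the real site's search URL.
BASE_URL = "https://example.com/search"


def build_search_url(zip_code: str, base_url: str = BASE_URL) -> str:
    """Return the search URL with the user's value URL-encoded into the query string.

    The parameter name "q" is an assumption; check the real site's URLs.
    """
    return f"{base_url}?{urlencode({'q': zip_code})}"


def download_string(url: str, timeout: float = 10.0) -> str:
    """Fetch the page body as text, similar in spirit to WebClient.DownloadString()."""
    request = Request(url, headers={"User-Agent": "zip-lookup-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Typical use would be `html = download_string(build_search_url("90210"))`, after which the returned string is handed to whatever parsing step fits the page.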
If you want to use the WebBrowser control (slower), you can call the Navigate() method and handle the DocumentCompleted event. In that event handler, you can parse webbrowser.DocumentText (the HTML). Or, if you know which tags hold the results, you can set an HtmlElementCollection variable = webbrowser.Document.GetElementsByTagName("taghere") and parse the collection.
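For the case raised in the question, where the answer sits inside many tags that share a name, collecting elements by tag name and then narrowing by an attribute is one workable approach. The sketch below shows that idea in Python with the standard library's `html.parser`; it is not the HtmlElementCollection API, and the class name `result`, the sample markup, and the helper names are invented for illustration.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text of every element with a given tag name and, optionally, CSS class."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.results = []
        self._depth = 0    # nesting depth of same-name tags inside a match
        self._buffer = []  # text pieces of the element being collected

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Already inside a match: only track nested tags of the same name.
            if tag == self.tag:
                self._depth += 1
            return
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def find_text(html, tag, css_class=None):
    """Return the text of each matching element, in document order."""
    collector = TagTextCollector(tag, css_class)
    collector.feed(html)
    return collector.results
```

Calling `find_text(page_html, "div", "result")` would skip every `div` that lacks the assumed `result` class, while the depth counter keeps nested `div` elements inside a match from ending the capture early.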
-
May 14th, 2013, 08:46 AM
#3
Addicted Member
Re: Web scraping
I've done this many times. Read up on the System.Net.HttpWebRequest and HttpWebResponse classes and their base classes, WebRequest and WebResponse. Once you get a response stream, you will need to parse out the info using whatever method makes the most sense, such as regular expressions. The caveat is that this tends to break whenever the website updates its HTML.
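A minimal sketch of the regular-expression approach, and of why it is brittle, might look like this in Python. The markup it targets (a `span` whose `id` is `city`) is made up for the example; a real page would need its own pattern, worked out from the actual HTML.

```python
import re

# Assumed markup: the value sits in <span id="city">...</span>.
# This pattern is tied to that exact structure, which is why it breaks
# as soon as the site renames the id or switches to another tag.
CITY_PATTERN = re.compile(
    r'<span[^>]*\bid="city"[^>]*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)


def extract_city(html):
    """Return the text inside the assumed city span, or None if the pattern no longer matches."""
    match = CITY_PATTERN.search(html)
    return match.group(1).strip() if match else None
```

Returning `None` instead of raising makes it easy to detect a changed page layout and log it, rather than writing a wrong value into the spreadsheet.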
You will also need to know what information the browser passes to the web server. I use a free program called Fiddler2 to see that.
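Once the traffic has been inspected, the request can be rebuilt to carry the same fields the browser sent. The Python sketch below assumes the site accepts a URL-encoded form POST; the URL, the field names `zip` and `submit`, and the helper name are hypothetical placeholders for whatever the captured request actually contains.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(url, fields, user_agent="Mozilla/5.0"):
    """Build (but do not send) a POST request that mimics a browser form submission.

    `fields` should hold the names and values seen in the captured request.
    """
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": user_agent,
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen()` to send it. Keeping construction separate from sending makes it easy to compare the body against what the browser sent before hitting the server.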