[RESOLVED] Web Scraping to allow redirect?
I have a fairly simple program that uses the WebClient to download a URL's source code and pulls out specific information using Split. However, if it hits a redirect, it doesn't follow it. I tried to correct this by using the WebBrowser instead, but it doesn't pick up the redirect either. As an example of what I mean, take Sears.com: when you run a search, it processes it and produces the results, but if only one item is returned, it redirects to that item's page.
(Note: Sears.com isn't the website I'm scraping; it's merely an example. It states in the Privacy use that web scraping is against the usage agreement.)
How exactly can I get the WebBrowser to follow the redirect? I tried a Do...Loop that checks the URL until it differs from the search URL and then scrapes the information but, as you can imagine, it doesn't work well. Any other ideas?
Thank you
Re: Web Scraping to allow redirect?
Quote:
It states in the Privacy use that web scraping is against the usage agreement
It does indeed, which rather begs the question of why you thought it was a good example to give.
Scraping is against the Ts and Cs of the vast majority of sites out there and, as such, is a topic whose discussion we generally discourage on this site.
Re: Web Scraping to allow redirect?
Quote:
Originally Posted by FunkyDexter
It does indeed, which rather begs the question of why you thought it was a good example to give.
Scraping is against the Ts and Cs of the vast majority of sites out there and, as such, is a topic whose discussion we generally discourage on this site.
It's an internal server that makes it incredibly difficult to query information, so scraping its search results seemed quite a bit simpler, or at least I thought it would be. I used Sears.com mainly because it works in a similar way.
I wasn't aware it was discouraged, though that's understandable given what you mentioned. I'm not too concerned about getting an answer; it was a side project that I thought would make a neat tool.
Thank you
Re: [RESOLVED] Web Scraping to allow redirect?
Since you've said you're working internally, I can sympathize with your needs. I have the same needs myself with some of our older applications that don't have an API and will never be updated. It makes more sense for us to scrape 3000 internal web pages than to have someone do it manually.
Once you have the source code you need, the first thing to do is set a watch on the variable that holds it and work out what triggers the redirect. Once you've established the pattern, it's just a matter of building the URL you expect from that source and initiating another download.
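A minimal sketch of that approach, written in Python purely for illustration (the original program uses .NET's WebClient). It assumes the redirect is done client-side, either with a `<meta http-equiv="refresh">` tag or a `window.location` assignment in the page source, because many HTTP clients (including Python's `urllib.request`) already follow ordinary HTTP 3xx redirects automatically. The regex patterns, `find_client_redirect`, `scrape_with_redirects` and the `fetch` parameter are all hypothetical; adapt the patterns to whatever your server actually emits.

```python
import re
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical pattern: <meta http-equiv="refresh" content="0; url=/item/123">
# (assumes http-equiv appears before content; adjust for your pages)
META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["'][^"']*?url=([^"'>\s]+)""",
    re.IGNORECASE,
)
# Hypothetical pattern: window.location = "/item/123";  or  window.location.href = '/item/123';
JS_LOCATION = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_client_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL the page redirects to client-side, or None."""
    for pattern in (META_REFRESH, JS_LOCATION):
        match = pattern.search(html)
        if match:
            # Relative targets are resolved against the current page's URL.
            return urljoin(base_url, match.group(1))
    return None


def scrape_with_redirects(
    url: str, fetch: Callable[[str], str], max_hops: int = 5
) -> Tuple[str, str]:
    """Download url, then keep downloading while the page points elsewhere.

    Returns (final_url, html). max_hops guards against redirect loops.
    """
    html = fetch(url)
    for _ in range(max_hops):
        target = find_client_redirect(html, url)
        if target is None or target == url:
            break  # no (new) redirect found: this is the page to scrape
        url = target
        html = fetch(url)  # "initiate another download" on the new URL
    return url, html
```

Passing `fetch` in as a function keeps the redirect logic testable without a live server. In real use it could wrap something like `urllib.request.urlopen(u).read().decode("utf-8", "replace")`, after which you would run your existing Split-based extraction on the returned HTML.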
Re: [RESOLVED] Web Scraping to allow redirect?
Quote:
It's an internal server
Fair enough. There are some legitimate reasons to scrape, but we tend to treat such questions with a healthy dose of cynicism because 9 times out of 10 someone is trying to do something they shouldn't, particularly when they're a brand-new user who hasn't had time to build up much trust yet.
Personally I'd still try to avoid scraping as a solution, even on an internal server. It's a flaky approach at best and prone to breaking when the marketing department decides to give the web pages' look a bit of a touch-up and doesn't bother telling IT. As it's an internal server, it's technically possible to stand up a web service to do what you want, which would provide a more robust solution. Of course, that may not be politically possible.