
Thread: [RESOLVED] Web Scraping to allow redirect?

  1. #1

    Thread Starter
    New Member
    Join Date
    Feb 2012
    Location
    Maine, USA
    Posts
    5

    [RESOLVED] Web Scraping to allow redirect?

    I have a fairly simple program that uses WebClient to download a URL's source code and pulls specific information out of it using Split. However, if it hits a redirect, it doesn't follow it. I've tried to correct this by using the WebBrowser control instead, but that isn't picking up the redirect either. As an example of what I mean, take Sears.com: when you run a search, it produces a results page, but if the search returns only one item, it redirects straight to that item.

    (Note: Sears.com isn't the website that I am scraping, it is a mere example. It states in the Privacy use that web scraping is against the usage agreement)

    How exactly would I allow the WebBrowser to accept the redirect? I tried to do a Do....Loop where it would check the URL until it's different from the search, and then scrape the information, but as you can figure, it doesn't work that well. Any other ideas?

    Thank you

  2. #2
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,902

    Re: Web Scraping to allow redirect?

    It states in the Privacy use that web scraping is against the usage agreement
    It does indeed, which rather begs the question of why you thought it was a good example to give.

    Scraping is against the Ts and Cs of the vast majority of sites out there and, as such, is a topic whose discussion we generally discourage on this site.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  3. #3

    Thread Starter
    New Member
    Join Date
    Feb 2012
    Location
    Maine, USA
    Posts
    5

    Re: Web Scraping to allow redirect?

    Quote Originally Posted by FunkyDexter View Post
    It does indeed, which rather begs the question of why you thought it was a good example to give.

    Scraping is against the Ts and Cs of the vast majority of sites out there and, as such, is a topic whose discussion we generally discourage on this site.
    It's an internal server that makes it incredibly difficult to query information from it, so web scraping using its search is quite a bit simpler, or at least I thought it would be. I used Sears.com mainly because I found it to work in a similar manner.

    I wasn't aware it was discouraged, though that's understandable given what you mentioned. I'm not too concerned about getting an answer. It was a side project that I thought would be a neat tool.

    Thank you

  4. #4
    Frenzied Member
    Join Date
    Oct 2012
    Location
    Tampa, FL
    Posts
    1,187

    Re: [RESOLVED] Web Scraping to allow redirect?

    Since you have stated you are working internally, I can sympathize with your needs. I sometimes have the same needs with some of our older applications that don't have an API and will never be updated. It makes more sense for us to scrape 3000 internal web pages than to have someone do it manually.

    Once you have the source code you need, the first thing you should do is create a watch on the variable and figure out what signals that a redirect is needed. Once you establish the pattern, it's just a matter of building the URL you expect from that variable and initiating another download.
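
The approach described above can be sketched roughly as follows. This is a minimal illustration in Python rather than .NET, and it rests on an assumption the thread does not confirm: that the redirect is done client-side, either with a meta refresh tag or a simple JavaScript `location` assignment (WebClient normally follows plain HTTP 3xx redirects on its own, so a redirect it misses is likely one of these). The names `find_client_redirect` and `resolve_final_page`, the `fetch` callable, and the `max_hops` limit are all hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute typically looks like "0; url=/item/123".
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", attr.get("content", ""), re.I)
        if match:
            self.target = match.group(1).strip()


# Assumed patterns: location = "...", location.href = "...",
# location.replace("...") and location.assign("...").
_JS_REDIRECT = re.compile(
    r"(?:window\.|document\.)?location(?:\.href)?\s*=\s*['\"]([^'\"]+)['\"]"
    r"|location\.(?:replace|assign)\(\s*['\"]([^'\"]+)['\"]\s*\)",
    re.I,
)


def find_client_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in the page, or None."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    target = finder.target
    if target is None:
        match = _JS_REDIRECT.search(html)
        if match:
            target = match.group(1) or match.group(2)
    # Resolve relative targets against the page that was downloaded.
    return urljoin(base_url, target) if target else None


def resolve_final_page(
    url: str, fetch: Callable[[str], str], max_hops: int = 5
) -> Tuple[str, str]:
    """Download a page and keep downloading while it asks for a redirect."""
    for _ in range(max_hops):
        html = fetch(url)
        next_url = find_client_redirect(html, url)
        if next_url is None or next_url == url:
            return url, html
        url = next_url
    raise RuntimeError("too many client-side redirects")
```

Here `fetch` stands in for whatever performs the download (the equivalent of WebClient's DownloadString), so the redirect detection can be checked against saved page source before it is pointed at the real server. If the internal site builds its redirect URL in a more elaborate script, the regular expression would need to be adapted to the pattern actually observed.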

  5. #5
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,902

    Re: [RESOLVED] Web Scraping to allow redirect?

    It's an internal server
    Fair enough. There are some legitimate reasons to scrape but we tend to treat such questions with a healthy dose of cynicism because 9 times out of 10 someone's trying to do something they shouldn't. Particularly when they're a brand new user who hasn't had time to build up much trust yet.

    Personally I'd still try to avoid scraping as a solution, even on an internal server. It's a flaky approach at best and prone to error when the marketing dept decide to give the web pages' look a bit of a touch up and don't bother telling IT. As it's an internal server, it's technically possible to stand up a web service to do what you want, which would provide a more robust solution. Of course, that may not be politically possible.
