Thread: Dealing with Screen Scrapers and Bots

  1. #1

    Thread Starter
    Hyperactive Member
    Join Date
    Jun 2008
    Posts
    407

    Dealing with Screen Scrapers and Bots

    Hi guys, I have a bot and screen scraper problem. I have successfully redirected them somewhere else, but they still show up in my stats. I run my redirection code in the PreInit stage of the master page. I use DiscountASP as my hosting provider. Is there a way to run my code before the page's PreInit stage?
    My Websites
    SharpMP3 - MP3 Design Articles www.sharpmp3.com
    Yobbers - Job Search www.yobbers.com
    Lets Trend - Methods For Riding Stock Trends www.letstrend.com

  2. #2
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: Dealing with Screen Scrapers and Bots

    Hey,

    Just to clarify, exactly what problem are you having?

    Gary

  3. #3

    Thread Starter
    Hyperactive Member
    Join Date
    Jun 2008
    Posts
    407

    Re: Dealing with Screen Scrapers and Bots

    When the screen scrapers and bots hit my site they mess with my stats, making it a little harder to figure out who's really visiting. I know it's them because the user agent usually says scraper, bot, or crawler.

  4. #4
    Frenzied Member brin351's Avatar
    Join Date
    Mar 2007
    Location
    Land Down Under
    Posts
    1,293

    Re: Dealing with Screen Scrapers and Bots

    It may not be a good idea to prevent bots from crawling the website, because that's how the site's pages get indexed for search engine searches. Don't you want the site to show up on search engines? Or is there another issue besides the stats?

    If it's just the stats issue, can you not just filter your stats query on the user agent to ignore the bots?
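
    A minimal sketch of that idea, in Python purely for illustration (the site in this thread runs on ASP.NET). The marker list, the function names, and the (user agent, visits) row layout are all assumptions, not taken from any real stats package:

```python
# Hypothetical substrings that suggest an automated client; tune to your own logs.
BOT_MARKERS = ("bot", "crawler", "scrape", "spider")


def is_probable_bot(user_agent):
    """Return True when the user agent is missing or contains a bot marker."""
    if not user_agent or not user_agent.strip():
        return True  # assumption: a blank user agent is treated as automated
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def human_visits(rows):
    """Keep only the (user_agent, visits) rows that do not look like bots."""
    return [(ua, visits) for ua, visits in rows if not is_probable_bot(ua)]
```

    This only cleans up the report. It does not stop anything from hitting the site, and a client that sends a browser-like user agent will still be counted as a human.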
    The problem with computers is their nature is pure logic. Just once I'd like my computer to do something deluded.

  5. #5
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: Dealing with Screen Scrapers and Bots

    Have to agree with brin here. Unless you know for a fact that the bot is coming from a malicious source, what is the problem?

    Gary

  6. #6

    Thread Starter
    Hyperactive Member
    Join Date
    Jun 2008
    Posts
    407

    Re: Dealing with Screen Scrapers and Bots

    I want the search engines. I want humans to come to my site, but not computers. I really don't want someone grabbing all the information on my site and stashing it somewhere. I don't mind people copying it and storing it, but I do have a problem with someone who has a computer set up to just come and randomly grab stuff. A lot of this traffic is from China and Russia, and I really have problems with the way they handle their computer systems. I'd like them to stay out of my site.

    I don't want things like this:
    User Agent | Browser Version | Platform | Visits
    ScrapeBox v1.0.0 | Unknown | Unknown | 21
    No User Agent (masked) | - | - | 35
    R6_CommentReader(www.radian6.com/crawler) | Unknown | Unknown | 13
    Moreoverbot/5.1 ( http://w.moreover.com; [email protected]) Mozilla/5.0 | Unknown | Unknown | 3
    Java/1.6.0 | Unknown | Unknown | 2
    Jakarta Commons-HttpClient/3.1 | Unknown | Unknown | 1
    Sogou web spider/4.0( http://www.sogou.com/docs/help/webmasters.htm#07) | Unknown | Unknown | 1

  7. #7
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: Dealing with Screen Scrapers and Bots

    Bottom line is, if you are going to put something up on your web site, then you are opening it up to the public. You are never going to be able to stop every single person that you don't want accessing your site. What sort of content are you worried about? You could also add a "members" section to your site, and put the content there, that way people have to log in before they can see parts of your site.

    Gary

  8. #8
    Frenzied Member brin351's Avatar
    Join Date
    Mar 2007
    Location
    Land Down Under
    Posts
    1,293

    Re: Dealing with Screen Scrapers and Bots

    jakkjakk,

    Gary has hit the nail on the head so to speak. If you put information in the public domain then it's open to anyone/thing to consume and you almost can't change that.

    If you google something like "stop screen scraping" you'll see this problem is mainly handled by analyzing log files for potential offenders and creating a blacklist of requesting user agents, IPs, etc.

    Stopping or rerouting blacklisted requests requires examining every request and checking it against your blacklist, so the lookup must be efficient. I've not tried to do it, but on shared hosting maybe an HTTP module could be a good approach. Or just make your pages hard for scrapers to read but easy for search engine bots... yeah, right.
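
    The per-request check could be sketched roughly like this, again in Python for illustration only. In ASP.NET the same logic would live in whatever handles the start of each request. Every name and list entry below is hypothetical, the IP addresses come from the reserved documentation ranges, and blocking a blank user agent is an assumption based on the "No User Agent" row in the stats above:

```python
# Hypothetical blacklist data; in practice this would be built from log analysis.
BLOCKED_IPS = frozenset({"203.0.113.7", "198.51.100.23"})
BLOCKED_AGENT_MARKERS = ("scrapebox", "java/", "commons-httpclient")
# Crawlers you want to keep, checked before the agent blacklist.
ALLOWED_AGENT_MARKERS = ("googlebot", "bingbot")


def should_block(ip, user_agent):
    """Decide whether a request should be rejected or rerouted."""
    if ip in BLOCKED_IPS:  # set membership: constant-time on average
        return True
    ua = (user_agent or "").strip().lower()
    if not ua:
        return True  # assumption: requests with no user agent are unwanted
    if any(marker in ua for marker in ALLOWED_AGENT_MARKERS):
        return False
    return any(marker in ua for marker in BLOCKED_AGENT_MARKERS)
```

    Keeping the IP list in a set keeps the lookup cheap on every request. The user agent header is trivial to fake, so this only catches scrapers that identify themselves honestly, and the allow list will also let through anything that merely claims to be a search engine.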

    Let us know if you come up with a good compromise. I sympathize with you because I've been asked to stop scraping of data and I couldn't stop it myself without compromising SEO or app scalability.

  9. #9
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: Dealing with Screen Scrapers and Bots

    Originally posted by brin351:
    Let us know if you come up with a good compromise. I sympathize with you because I've been asked to stop scraping of data and I couldn't stop it myself without compromising SEO or app scalability.

    You have also hit the nail on the head. Most likely, anything that you put in place to make it hard for the bots you have identified is also going to make it hard for legitimate bots, such as those used by Google.

    Gary
