Dealing with Screen Scrapers and Bots
Hi guys, I have a bot and screen scraper problem. I have successfully redirected them somewhere else, but they still show up in my stats. I run my redirection code in the pre-init section of the master page. I use DiscountASP as my hosting provider. Is there a way to run my code before the page's pre-init section?
Re: Dealing with Screen Scrapers and Bots
Hey,
Just to clarify, exactly what problem are you having?
Gary
Re: Dealing with Screen Scrapers and Bots
When the screen scrapers and bots hit my site they mess with my stats, making it a little harder to figure out who's actually visiting. I know it's them because the user agent usually says scraper, bot, or crawler.
Re: Dealing with Screen Scrapers and Bots
It may not be a good idea to prevent bots from crawling the website, because that's how the site's pages get indexed for search engine searches. Don't you want the site to show up on search engines? Or is there another issue besides the stats?
If it's just the stats issue, can you not just filter your stats on the user agent to ignore the bots?
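Something along these lines is what I have in mind. It is only a rough sketch in Python (not tied to any particular stats package), and the keyword list, function names, and sample rows are all made up for illustration:

```python
# Hypothetical keyword list; extend it from what you see in your own logs.
BOT_KEYWORDS = ("bot", "crawler", "spider", "scrape")


def is_probable_bot(user_agent):
    """Return True when the user agent is empty or contains a bot keyword."""
    if not user_agent or not user_agent.strip():
        return True  # a missing user agent is treated as automated traffic
    ua = user_agent.lower()
    return any(keyword in ua for keyword in BOT_KEYWORDS)


def human_visits(rows):
    """Keep only the (user_agent, visits) rows that do not look automated."""
    return [(ua, visits) for ua, visits in rows if not is_probable_bot(ua)]


if __name__ == "__main__":
    # Made-up sample rows, not real stats.
    sample = [
        ("ScrapeBox v1.0.0", 21),
        ("", 35),
        ("Mozilla/5.0 (Windows NT 6.1) Firefox/4.0", 120),
    ]
    for ua, visits in human_visits(sample):
        print(ua, visits)
```

Keyword matching like this only catches bots that identify themselves, so anything that fakes a browser user agent would still be counted.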
Re: Dealing with Screen Scrapers and Bots
Have to agree with brin here. Unless you know for a fact that the bot is coming from a malicious source, what is the problem?
Gary
Re: Dealing with Screen Scrapers and Bots
I want the search engines. I want humans to come to my site, but not computers. I really don't want someone grabbing all the information on my site and stashing it somewhere. I don't mind people copying it and storing it, but I do have a problem with someone who has a computer set up just to come and randomly grab stuff. A lot of this traffic is from China and Russia, and I really have problems with the way they handle their computer systems. I'd like them to stay out of my site.
I don't want things like this:
User Agent | Browser Version | Platform | Visits
ScrapeBox v1.0.0 | Unknown | Unknown | 21
No User Agent (masked) | - | - | 35
R6_CommentReader(www.radian6.com/crawler) | Unknown | Unknown | 13
Moreoverbot/5.1 ( http://w.moreover.com; [email protected]) Mozilla/5.0 | Unknown | Unknown | 3
Java/1.6.0 | Unknown | Unknown | 2
Jakarta Commons-HttpClient/3.1 | Unknown | Unknown | 1
Sogou web spider/4.0( http://www.sogou.com/docs/help/webmasters.htm#07) | Unknown | Unknown | 1
Re: Dealing with Screen Scrapers and Bots
Bottom line is, if you are going to put something up on your web site, then you are opening it up to the public. You are never going to be able to stop every single person that you don't want accessing your site. What sort of content are you worried about? You could also add a "members" section to your site and put the content there, so that people have to log in before they can see those parts of your site.
Gary
Re: Dealing with Screen Scrapers and Bots
jakkjakk,
Gary has hit the nail on the head, so to speak. If you put information in the public domain then it's open for anyone or anything to consume, and there is very little you can do to change that.
If you google something like "stop screen scraping" you'll see this problem is mainly handled by analyzing log files for potential offenders and creating a blacklist of requesting user agents, IPs, etc.
Stopping or rerouting blacklisted requests means examining every request and checking it against your blacklist, so the lookup must be efficient. I've not tried it, but on shared hosting an HTTP module might be a good approach. Or just make your pages hard for scrapers to read but easy for search engine bots :) yeah, right.
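To give an idea of what a cheap per-request check could look like, here is a minimal sketch. It is written in Python rather than as an actual HTTP module, and every entry, name, and pattern in it is a hypothetical placeholder; a real list would come out of your own log analysis:

```python
import ipaddress
import re

# Hypothetical blacklist entries (documentation-only address ranges).
BLOCKED_IPS = frozenset({"203.0.113.7", "198.51.100.23"})
BLOCKED_NETWORKS = tuple(ipaddress.ip_network(n) for n in ("192.0.2.0/24",))
# One precompiled pattern instead of many separate string comparisons.
BLOCKED_UA = re.compile(r"scrapebox|commentreader|httpclient", re.IGNORECASE)


def is_blacklisted(ip, user_agent):
    """Return True if the request's IP or user agent is on the blacklist."""
    # Exact IP matches are a constant-time set lookup.
    if ip in BLOCKED_IPS:
        return True
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        addr = None  # unparseable address: skip the range check
    if addr is not None and any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # Fall back to the user agent pattern; a missing user agent is not
    # blocked here, which is a design choice you may want to reverse.
    return bool(user_agent) and BLOCKED_UA.search(user_agent) is not None
```

The cheap set lookup runs first, and the range and pattern checks only run when it misses, which keeps the cost per request small as long as the lists stay short.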
Let us know if you come up with a good compromise. I sympathize with you because I've been asked to stop scraping of data, and I couldn't find a way to do it without compromising SEO or app scalability.
Re: Dealing with Screen Scrapers and Bots
Quote: Originally posted by brin351
Let us know if you come up with a good compromise. I sympathize with you because I've been asked to stop scraping of data, and I couldn't find a way to do it without compromising SEO or app scalability.
You have also hit the nail on the head :) Most likely anything that you put in place to make it hard for the bots you have identified is also going to make it hard for legitimate bots, such as those used by Google.
Gary