My visit log is getting jammed with visits from bots like Cyveillance and NameProtect. My website runs on ASP.NET. Is there anything I can do to block their access to my pages?
http://www.gulker.com/music_industry...llancebot.html
http://www.nameprotect.com/botinfo.html
If you do simple searches at google.com, you would come up with those links.
Quote:
Originally posted by hellswraith
http://www.gulker.com/music_industry...llancebot.html
http://www.nameprotect.com/botinfo.html
If you do simple searches at google.com, you would come up with those links.

Thanks, but I think you misunderstood. I was asking for an ASP.NET solution to keep certain bots off my site (not a description of the bots). I think it can be done with global.asax, but I am not sure ...
Thanks again
Ok, the links I gave you should get you started. The first says this:
Quote:
How to block Cyveillancebot:
If you are using Apache, have access to your .htaccess file, and the rewrite engine is enabled, then you may add the following lines to block the Cyveillance bot:
# Cyveillance
RewriteCond %{REMOTE_ADDR} ^63.148.99.(22[4-9]|2[3-5][0-9])$
# FILTER BOTS : 403-Forbidden
RewriteRule ^.* - [F,L]
These lines will return an HTTP status code of 403 (Forbidden) any time a request is made from any of Cyveillance's IP addresses.

Now, I understand you are not using Apache, so you will have to take the incoming IP address and see if it is one of those that belong to the bot. If it is, redirect to a nonexistent page and it will get an error.
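For what it's worth, the address test itself is easy to express in C#. This is only a rough sketch, assuming the range in the quoted rule (63.148.99.224 through 63.148.99.255) is still the one you want to block; `BotFilter` and `IsBlockedAddress` are names made up for illustration, not anything built into ASP.NET.

```csharp
using System;
using System.Net;

// Hypothetical helper (not part of ASP.NET): reports whether an address
// falls in the range matched by the quoted rule, 63.148.99.224 - 63.148.99.255.
public static class BotFilter
{
    public static bool IsBlockedAddress(string address)
    {
        IPAddress ip;
        if (!IPAddress.TryParse(address, out ip))
            return false;   // not a parseable address, so let it through

        byte[] b = ip.GetAddressBytes();
        if (b.Length != 4)
            return false;   // IPv6 is not covered by this sketch

        // The last octet is a byte, so the upper bound of 255 is implicit.
        return b[0] == 63 && b[1] == 148 && b[2] == 99 && b[3] >= 224;
    }
}
```

You would pass it whatever address ASP.NET reports for the current request, and the list of ranges would need to be kept up to date by hand.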
If you look at the second link, this is what you have to do:
Quote:
Practices the following best practices to ensure non-invasive crawling:
Honoring robots.txt files - to exclude the NPBot crawler, please use "NPBot" as the user-agent name in your robots.txt file (for more robots.txt information, see: http://www.robotstxt.org/wc/exclusion.html#robotstxt )

The robots.txt file is a pretty common thing to do. You don't need ASP.NET to do it, but you could include it in your ASP.NET project if you like.
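As a minimal example, a robots.txt at the root of the site that asks NPBot to stay away from everything could look like the following. It only helps with crawlers that choose to honor robots.txt, and blocking the whole site (`Disallow: /`) is my assumption about what you want.

```
User-agent: NPBot
Disallow: /
```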
Thanks for the responses! ... I don't like using robots.txt ... something about posting a "here's the stuff I don't want you to look at" list just seems kinda flawed to me :-) ... Cyveillance apparently ignores this anyway.
I think I can do something really simple with global.asax. Keep in mind that I am an asp.net idiot. The statement:
Request.UserHostAddress
crashes with an error ... something like "not legal in this context"
I'm still trying some things though ...
Request.UserHostAddress isn't available everywhere in Global.asax. The Request object only exists while a request for a page or another web resource is being processed, so a handler that fires outside of a request (application startup, for example) can throw that error. Per-request handlers such as Application_BeginRequest should have access to it. At least that is how I have found it. Alternatively, you could make a control that you drop into each page and have it check the address there, since a control can reach the Request object through its page.
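If you still want to try Global.asax, here is a rough sketch of a code-behind class that does the check in Application_BeginRequest. As far as I know that event is raised once per request, so Request should be usable there. The 63.148.99.224-255 range is taken from the Cyveillance rule quoted earlier, `IsBlocked` is a helper name I made up, and I have not run this against a live site.

```csharp
using System;
using System.Web;

// Sketch of a Global.asax code-behind class for classic ASP.NET (System.Web).
public class Global : HttpApplication
{
    // Assumed prefix, from the Cyveillance range quoted in this thread.
    private const string BlockedPrefix = "63.148.99.";

    // Hypothetical helper: true for 63.148.99.224 through 63.148.99.255.
    public static bool IsBlocked(string address)
    {
        if (address == null ||
            !address.StartsWith(BlockedPrefix, StringComparison.Ordinal))
            return false;

        int lastOctet;
        return int.TryParse(address.Substring(BlockedPrefix.Length), out lastOctet)
            && lastOctet >= 224 && lastOctet <= 255;
    }

    // ASP.NET wires this handler up by name; it runs at the start of each request.
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        if (IsBlocked(Request.UserHostAddress))
        {
            Response.StatusCode = 403;   // Forbidden, like the Apache [F] flag
            CompleteRequest();           // skip the rest of the pipeline
        }
    }
}
```

The nice part of doing it here is that the check covers every page without touching the pages themselves. A bot that changes IP addresses will of course get past it.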
Quote:
Keep in mind that I am an asp.net idiot.

I wasn't thinking that... just so you know.
I understand about the robots.txt file; I don't like it either. Other than protecting the pages behind some kind of authentication, I don't know of any other way to protect the page.
As funny as it sounds, I have come across some sites that take an HTML page and save it as a .jpg file or something. When the browser browses to that file, the page is displayed. I haven't dug into this very much though, just not enough time. I like the method though. If you think about the way most spiders work, they won't download the .jpg file because it is useless to them. I have found this mostly on porn type sites (always ahead of the curve). If I find a page like this again, I will post it here. I know it isn't an ASP.NET solution though.
I think the web control thing is probably the best way to go, just drop it on each page and you are done.