Thread: banning bots

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Feb 2001
    Location
    Crossroads
    Posts
    3,046

    banning bots

    My visit log is getting jammed with visits from bots like Cyveillance and NameProtect. My website runs on ASP.NET. Is there anything I can do to block their access to my pages?

  2. #2
    PowerPoster hellswraith's Avatar
    Join Date
    Jul 2002
    Location
    Washington St.
    Posts
    2,464
    http://www.gulker.com/music_industry...llancebot.html

    http://www.nameprotect.com/botinfo.html

    If you do a simple search at google.com, you will come up with those links.

  3. #3

    Thread Starter
    PowerPoster
    Join Date
    Feb 2001
    Location
    Crossroads
    Posts
    3,046
    Originally posted by hellswraith
    http://www.gulker.com/music_industry...llancebot.html

    http://www.nameprotect.com/botinfo.html

    If you do a simple search at google.com, you will come up with those links.
    Thanks, but I think you misunderstood. I was asking for an ASP.NET solution to keep certain bots off my site (not a description of the bots). I think it can be done with Global.asax, but I am not sure ...

    Thanks again

  4. #4
    PowerPoster hellswraith's Avatar
    Join Date
    Jul 2002
    Location
    Washington St.
    Posts
    2,464
    Ok, the links I gave you should get you started. The first says this:
    How to block Cyveillancebot:

    If you are using Apache, have access to your .htaccess file, and the rewrite engine is enabled, then you may add the following lines to block the Cyveillance bot:

    # Cyveillance
    RewriteCond %{REMOTE_ADDR} ^63\.148\.99\.(22[4-9]|2[3-5][0-9])$

    # FILTER BOTS : 403-Forbidden
    RewriteRule ^.* - [F,L]

    These lines will return an HTTP status code of 403 (Forbidden) any time a request is made from any of Cyveillance's IP addresses.
    Now, I understand you are not using Apache, so you will have to take the incoming IP address and see if it is one of the bot's addresses. If it is, redirect to a nonexistent page and the bot will get an error.
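The address check itself is small, whatever the platform. Here is a minimal sketch in Python (not ASP.NET code; the same logic would have to be ported to a page or control). It assumes the regex in the quoted rule is meant to cover 63.148.99.224 through 63.148.99.255, which is the /27 block used below. The names `BLOCKED_NETWORKS`, `is_blocked`, and `status_for` are made up for illustration, and the sketch returns 403 directly rather than redirecting.

```python
import ipaddress

# Assumption: the quoted .htaccess regex is meant to cover
# 63.148.99.224 through 63.148.99.255, i.e. this /27 block.
BLOCKED_NETWORKS = [ipaddress.ip_network("63.148.99.224/27")]


def is_blocked(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside a blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address: do not block on bad input.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)


def status_for(remote_addr: str) -> int:
    """Hypothetical helper: the HTTP status to send to this client."""
    return 403 if is_blocked(remote_addr) else 200
```

Calling `status_for` with the incoming client address at the top of each request would give the same effect as the Apache rule, provided the address list is kept current.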


    If you look at the second link, this is what you have to do:
    Practices the following best practices to ensure non-invasive crawling:
    Honoring robots.txt files - to exclude the NPBot crawler, please use "NPBot" as the user-agent name in your robots.txt file (for more robots.txt information, see: http://www.robotstxt.org/wc/exclusion.html#robotstxt)
    A robots.txt file is a pretty common way to do this. You don't need ASP.NET for it, but you could include the file in your ASP.NET project if you like.
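To make that concrete, the sketch below holds a two-line robots.txt that excludes only NPBot and checks it with Python's standard `urllib.robotparser`. The `ROBOTS_TXT` contents, the `can_fetch` wrapper, and the example.com URL are assumptions for illustration, and the rules only work against crawlers that choose to honor them.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: shut out NPBot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: NPBot
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `can_fetch("NPBot", "http://example.com/page.aspx")` is False, while a crawler with any other name is still allowed.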

  5. #5

    Thread Starter
    PowerPoster
    Join Date
    Feb 2001
    Location
    Crossroads
    Posts
    3,046
    Thanks for the responses! ... I don't like using robots.txt ... something about posting a "here's the stuff I don't want you to look at" list just seems kinda flawed to me :-) ... Cyveillance apparently ignores this anyway.

    I think I can do something really simple with Global.asax. Keep in mind that I am an asp.net idiot. The statement:

    Request.UserHostAddress

    crashes with an error ... something like "not legal in this context"

    I'm still trying some things though ...

  6. #6
    PowerPoster hellswraith's Avatar
    Join Date
    Jul 2002
    Location
    Washington St.
    Posts
    2,464
    Request.UserHostAddress fails in Global.asax because there is no Request object in scope where you are calling it. A request is made to a page or another web resource, while Global.asax handles application-level events, and handlers such as Application_Start run outside of any request (a per-request handler such as Application_BeginRequest may be worth a try, since a request exists by then). At least that is how I have found it. You could make a control that you drop into each page and that checks the Request object for the address, since a control has access to it via the Page object.

    Keep in mind that I am an asp.net idiot.
    I wasn't thinking that... just so you know.

    I understand about the robots.txt file; I don't like it either. Other than protecting the pages behind some kind of authentication, I don't know of any other way to protect the page.

    As funny as it sounds, I have come across some sites that take an HTML page and save it as a .jpg file or something similar. When the browser browses to that file, the page is displayed. I haven't dug into this very much though, just not enough time. I like the method though. If you think about the way most spiders work, they won't download the .jpg file because it is useless to them. I have found this mostly on porn type sites (always ahead of the curve). If I find a page like this again, I will post it here. I know it isn't an ASP.NET solution though.

    I think the web control thing is probably the best way to go, just drop it on each page and you are done.
