Preventing Spam with PHP on a web form?
menre
Jan 5th, 2010, 06:38 AM
Hi Folks,
I have a form on my website and people regularly spam my site. I have thought of using CAPTCHA, but don't really know how to do it. Could someone advise me on how to go about it, please? I would appreciate it if you could help me with some code or point me in the right direction.
My web form is just plain at the moment and I would like to add something like "enter the code you see below".
Thanks in advance.
Menre
SambaNeko
Jan 5th, 2010, 09:49 AM
I made a CAPTCHA on my site's comment forms and I don't think it did squat to deter spam. What really helped was a keyword filter, due to the fact that spam comments tend to repeat certain keywords ad nauseam. So I observed my spam for about a week and made a database table of keywords, and now I search new comments for those keywords; depending on the (subjectively determined) "severity" of the word, comments can be rejected with just one instance, or several.
If you want to try this method instead, I can go into some code.
menre
Jan 5th, 2010, 10:23 AM
Thanks for your response to my posting.
SambaNeko wrote: "If you want to try this method instead, I can go into some code."
I would appreciate some code, please. Could you post some for me to try that method?
Thanks,
Menre
StrangerInBeijing
Jan 5th, 2010, 10:30 AM
I was about to post about the same thing when I noticed yours....
I keep getting stupid messages like the one below on my contact form (http://www.lenocin.com/contact/). I don't like CAPTCHA very much (easy to implement, but a pain for users):
From: Joe
Email: JoefThompson@text2re.com
Subject: DbVgpfOsDoSj
Message: acomplia lose weight stop smoking generic acomplia acomplia blogs
sciguyryan
Jan 5th, 2010, 11:24 AM
As has been previously stated, CAPTCHAs can be an issue. Sometimes the simple ones can't deter bots, but the more complex ones prevent genuine users from getting through.
What you could do is a question-and-answer system. Like, say, asking "What is the result of one plus one?" or something like that. That normally throws off bots, and it is less difficult for end users.
SambaNeko
Jan 5th, 2010, 11:58 AM
First you'll need to set up a database table with at least 2 columns: "keyword" and "point value" (you can name them as you please; the code below calls them "word" and "points"). In the "keyword" column, put in the spammy words - like in StrangerInBeijing's post, "acomplia" is an obvious keyword - and in the "point value" column, put in a number between 0 and 10 to rank how "bad" the word is. The point value is needed because you may put in some words that aren't always "bad". In my table, for instance, I have "teens" as one word, which may be innocent but has also been used in porn spam often enough, so I give that word a low point value. Words like "acomplia", though, would always be bad - give these a high value.
Once you have your table set up, when a comment is submitted, check the body of the comment for your keywords (you can also check other fields, but the body is where most of the spam happens).
//connect to MySQL DB (mysqli shown here; the old mysql_* functions were removed in PHP 7)
//$db_host, $db_user, $db_pass and $db_name are placeholders for your own settings
$db = new mysqli($db_host, $db_user, $db_pass, $db_name);
//$comment is assumed to hold the submitted comment text
//set a variable to hold point values
$found = 0;
//set max value for bad words - should be equal to your highest point value in the db
$threshold = 10;
//get spam keywords
$sql_result = $db->query("SELECT word, points FROM spam_keywords");
//search for spam keywords
while($sql_row = $sql_result->fetch_assoc()){
    //make a regex pattern using the keyword; preg_quote() escapes any
    //regex special characters (and the "/" delimiter) in the keyword
    $pattern = "/".preg_quote($sql_row["word"], "/")."/i";
    //preg_match_all() returns the number of pattern matches
    //in $comment, which we multiply by the keyword's point value,
    //then add to $found
    $found += preg_match_all($pattern, $comment, $matches) * $sql_row["points"];
}
//check if we've hit the max value for bad words
if($found >= $threshold){
    //spam comment, reject it
}else{
    //comment is okay, continue processing as normal
}
And that's that. Anyone's welcome to ask questions or improve it. :)
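For trying the scoring out without a database, here is a minimal sketch of the same idea with the keywords held in a plain array. The function name spam_score() and the sample keywords and point values are made up for illustration; they stand in for whatever is in your own table.

```php
<?php
// Hypothetical helper: $keywords maps keyword => point value,
// standing in for rows fetched from the spam_keywords table.
function spam_score(string $comment, array $keywords): int
{
    $found = 0;
    foreach ($keywords as $word => $points) {
        // preg_quote() keeps regex special characters in a keyword literal
        $pattern = '/' . preg_quote((string) $word, '/') . '/i';
        // count case-insensitive occurrences, weighted by the point value
        $found += preg_match_all($pattern, $comment) * $points;
    }
    return $found;
}

// Sample data, made up for illustration
$keywords  = ['acomplia' => 10, 'teens' => 2];
$threshold = 10;

$spam = 'acomplia lose weight stop smoking generic acomplia acomplia blogs';
$ham  = 'Our youth club for teens meets on Friday.';

var_dump(spam_score($spam, $keywords) >= $threshold); // bool(true)
var_dump(spam_score($ham, $keywords) >= $threshold);  // bool(false)
```

With these sample values the spam message scores 30 (three hits at 10 points each) and the innocent one scores 2, so only the first reaches the threshold.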
I also do regex checks for HTML tags and URLs, and for a string of consonants at the beginning of the comment (spam will occasionally start off with some garble like the subject line in StrangerInBeijing's post - "DbVgpfOsDoSj" - so if there aren't any vowels within the first 5 or so characters, it's probably not a normal comment). Those are also done with preg_match_all()...
//regex patterns
$reg_html = "/(<([\w]+)[^>]*>)([^<]*)(<\/\\2>)/";
$reg_consonants = "/^[bcdfghjklmnpqrstvwxyz]{5}/i";
$reg_url = "/(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?/i";
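As a rough sketch of how patterns like these might be applied together: the helper name extra_spam_checks() is hypothetical, and the ^ anchor on the consonant pattern is an assumption made to match the "beginning of the comment" description. What to do with a tripped check (reject outright, or add points to the keyword score) is left open here.

```php
<?php
// Patterns as in the post, written with single quotes so PHP does no
// escape processing; the ^ anchor on the consonant pattern is an assumption.
$reg_html       = '/(<([\w]+)[^>]*>)([^<]*)(<\/\2>)/';
$reg_consonants = '/^[bcdfghjklmnpqrstvwxyz]{5}/i';
$reg_url        = '/(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?/i';

// Hypothetical helper: returns the names of the checks a comment trips.
function extra_spam_checks(string $comment, array $patterns): array
{
    $tripped = [];
    foreach ($patterns as $name => $pattern) {
        if (preg_match_all($pattern, $comment) > 0) {
            $tripped[] = $name;
        }
    }
    return $tripped;
}

$patterns = [
    'html'       => $reg_html,
    'consonants' => $reg_consonants,
    'url'        => $reg_url,
];

print_r(extra_spam_checks('DbVgpfOsDoSj', $patterns));
print_r(extra_spam_checks('<a href="http://example.com/">cheap pills</a>', $patterns));
print_r(extra_spam_checks('Nice article, thanks!', $patterns));
```

The first call should report only the consonant check, the second the HTML and URL checks, and the third nothing.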
Slyke
Jan 5th, 2010, 12:06 PM
Instead of a CAPTCHA, how about a simple question?
Like "Add X & Y", where X & Y can be any number between 1 & 100. The "Add" could also be changed to a minus or a times, and you could also randomly write the numbers as words (like "ten plus five", or "add ten and five") or just use the digits. Should be enough to confuse bots.
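A minimal sketch of that kind of question, using digits only (spelling the numbers out as words is not covered). The helper names make_question() and check_answer() are hypothetical, and random_int() needs PHP 7 or later. In a real form the answer would have to be kept server-side between requests, for example in $_SESSION, rather than sent to the browser.

```php
<?php
// Hypothetical helper: builds a random arithmetic question and its answer.
function make_question(): array
{
    $x = random_int(1, 100);
    $y = random_int(1, 100);
    // each operation word maps to the result it would give
    $ops = [
        'plus'  => $x + $y,
        'minus' => $x - $y,
        'times' => $x * $y,
    ];
    $op = array_rand($ops); // picks one of the keys at random
    return ['text' => "What is $x $op $y?", 'answer' => $ops[$op]];
}

// Hypothetical helper: compares the visitor's input with the stored answer.
function check_answer(int $expected, string $submitted): bool
{
    $submitted = trim($submitted);
    // accept only an optional minus sign followed by digits
    if (!preg_match('/^-?\d+$/', $submitted)) {
        return false;
    }
    return (int) $submitted === $expected;
}

// When rendering the form: show the text, keep the answer on the server
$question = make_question();
echo $question['text'], "\n";

// When handling the submission: compare against the stored answer
var_dump(check_answer(15, ' 15 '));    // bool(true)
var_dump(check_answer(15, 'fifteen')); // bool(false)
```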
I haven't tried this, but has anyone thought of a moving gif with noise, with static letters/numbers as a CAPTCHA before?
sciguyryan
Jan 5th, 2010, 12:11 PM
I suggested that too. It works pretty well. Question answers really throw off bots as long as they aren't too simple to be programmed in.
SambaNeko
Jan 5th, 2010, 01:28 PM
I like the word filter because it requires no extra work whatsoever for your user. No CAPTCHA, no math, no nothin'. Also as said, it has proven results for my site.
I feel that it also better reflects the rationale of blocking: I'm not trying to block a user's comment because he can't do simple math or can't figure out a CAPTCHA, I'm trying to block it because it has unwanted content in it. So why not just identify what that content is, and if you see it, block it? *shrug* Not an expert, just my opinion on the subject.
But maybe I could get behind Kitten CAPTCHA (http://arstechnica.com/old/content/2006/04/6554.ars).
sciguyryan
Jan 5th, 2010, 01:31 PM
The word filter does work but it has its limits, like spammers using spaces or symbols to throw off the word matching (unless you strip all symbols and spaces first, of course). I may use the keyword blocker and make it into a class - with some extensions, if you wouldn't mind of course.
The second idea is also very promising since it would be very difficult for a bot to guess the correct answers.
SambaNeko
Jan 5th, 2010, 01:53 PM
sciguyryan wrote: "(unless you strip all symbols and spaces first of course)"
I do (sorry, should've mentioned that), but you're right that there are cases where that wouldn't cut it too. Frankly though, the spam I've gotten on my site is always like Stranger's example - where it repeats its keyword(s) five times with little-to-no ambiguity.
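A minimal sketch of that stripping step, run on the comment before the keyword search. The helper name normalize_for_filter() is hypothetical, and keeping only ASCII letters and digits is an assumption; comments in other alphabets would need a different rule.

```php
<?php
// Hypothetical helper: lowercases the text and drops everything
// except ASCII letters and digits (spaces and symbols included).
function normalize_for_filter(string $text): string
{
    return preg_replace('/[^a-z0-9]/', '', strtolower($text));
}

$obfuscated = 'A c-o-m-p-l-i-a, buy a.c.o.m.p.l.i.a now';
$normalized = normalize_for_filter($obfuscated);

echo $normalized, "\n";                               // acompliabuyacomplianow
echo preg_match_all('/acomplia/', $normalized), "\n"; // 2
```

One trade-off to keep in mind: removing spaces can join innocent words into a keyword ("white ensemble" becomes "whiteensemble", which contains "teens"), so low point values for ambiguous words matter even more with this step.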
StrangerInBeijing
Jan 5th, 2010, 11:55 PM
Question: That spam...how does it get posted on one's site in the first place?
Some software that runs automatically (I know I can do that with WebDriver, for instance)?
And out of curiosity... what is the darn purpose of it in the first place?
Trying to think of some automated ways of preventing spam (or at least minimizing it)... thinking about looking inside Akismet (the WordPress plugin for preventing spam).
kows
Jan 6th, 2010, 12:44 AM
it's just a bot (like a search engine crawler, for example) that crawls through the internet. some exist to go through endless amount of links and parse out email addresses to add to spam lists; some for submitting contact forms advertising whatever; some are even made to register on certain types of forum software, and then randomly post a reply to a thread. usually, the thread is a few years old. I have no idea why those ones exist, though some forum-registering bots also make new threads/replies that advertise products of some kind.
there are also plenty of bots that scour every single website imaginable for any obvious exploits. when I used to rent a server that I ran multiple websites off of, the actual IP address of the server didn't return anything. I'd get hundreds, or thousands, of hits per day from random IP addresses trying to access different phpMyAdmin combinations (like /phpmyadmin, /pma, /pma2.2.4, etc) or other software that could be mistakenly left unprotected. I was quite surprised, to say the least.
SambaNeko
Jan 6th, 2010, 12:52 AM
Most spam will try to post links to the spammer's site. This benefits them both by people possibly clicking the link and visiting their site, and by adding their link to a reputable webpage, thus positively affecting their site's page rank.
If you're using WordPress, then yeah, don't bother rolling your own - there's a wealth of development already invested in plugins.
kows wrote: "some are even made to register on certain types of forum software"
There was a forum I was a moderator on; it died and like a year later I got notified of a new post. A bot had posted a spam thread and added at the end "SambaNeko said it was okay to post this." The nerve!
Nightwalker83
Jan 6th, 2010, 03:58 AM
StrangerInBeijing wrote: "Don't like captcha very much (easy to implement, but a pain for users)"
Yeah, some of the characters are really hard to make out. Not to mention that if the coding hasn't been done properly, the image won't appear and you can't register or whatever.
menre
Jan 11th, 2010, 03:35 PM
Hello Everyone,
Thanks for all your code and advice. There is a lot to work with from your suggestions. I am now putting things together bit by bit based on your contributions. I will let you know where I have got to soon, and ask more questions if I get stuck. This issue of stopping spam requires guidance from pros like you.
Thanks,
Menre
vbforums.com
Copyright Internet.com Inc., All Rights Reserved.