Over the past couple of months, a few people have asked me how the anti-spam technique works on my ColdFusion job board. It's actually quite simple - I create a comma-delimited list of numbers and then I replace one of those numbers with an input field. When submitting the form, the user has to complete the set, filling in the missing number.
This isn't a particularly strong anti-spam approach; it doesn't check any of the content for spam-related information. If a user comes along and spams the site manually, this won't help you. It's simply an easy way to help fight automated form submission by spiders and bots.
Every time the page loads, we select a random number as our anti-spam value. Then, we create a key against which we can compare the user-submitted answer. To keep things as simple as possible, I am creating a key by adding the anti-spam value to a multiple of 113. Since the anti-spam value is always smaller than 113, we can skip encryption and decryption and simply use the modulus operator to recover the original anti-spam value.
<!--- Param out form values. --->
<cfparam name="form.submitted" type="boolean" default="false" />
<cfparam name="form.name" type="string" default="" />

<!--- Spam inputs. --->
<cfparam name="form.key" type="numeric" default="0" />
<cfparam name="form.value" type="string" default="" />


<!--- Check to see if the form has been submitted. --->
<cfif form.submitted>

    <!---
        Check to see if the anti-spam value is correct. The key
        that was submitted back was a multiple of 113 plus our
        anti-spam value. As such, we can use the MODULUS operator
        to match the key against the submitted value.
    --->
    <cfif ((form.key % 113) neq form.value)>

        <!--- The key did not match. This is a bot. --->
        <p>
            Get your filthy paws off me you damned dirty bot!
        </p>

        <!--- For this demo, just quit the page request. --->
        <cfabort />

    </cfif>

</cfif>


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!---
    Select a new anti-spam value. We are going to present this
    number as an input box within a list of values between 1 and
    10. We don't want it to be on the outliers, so only select
    from an inner range of values.
--->
<cfset antiSpamNumber = randRange( 2, 9 ) />

<!---
    When we pass the form through, we need to pass a key that
    will help us determine if the submitted anti-spam value is
    valid. To keep things as simple as possible, we'll just add
    our anti-spam value to a multiple of 113. This way, we can
    use the modulus operator to figure out the original value.
--->
<cfset antiSpamKey = (
    (113 * randRange( 1, 20 )) +
    antiSpamNumber
    ) />

<!---
    Now that we have our anti-spam number, let's create the list
    of values + input that we will display on the form. To do
    this, we are simply going to replace our anti-spam number
    with an INPUT field within the list of values.
--->
<cfset antiSpamList = replace(
    "1, 2, 3, 4, 5, 6, 7, 8, 9, 10",
    antiSpamNumber,
    "<input type='text' name='value' size='2' />",
    "one"
    ) />


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!--- Reset the output buffer and set the mime type. --->
<cfcontent type="text/html" />

<cfoutput>

    <!DOCTYPE html>
    <html>
    <head>
        <title>Simple List-Based Anti-Spam Technique</title>
    </head>
    <body>

        <h1>
            Simple List-Based Anti-Spam Technique
        </h1>

        <form action="#cgi.script_name#" method="post">

            <!--- Form submission flag. --->
            <input type="hidden" name="submitted" value="true" />

            <!--- Our anti-spam key. --->
            <input type="hidden" name="key" value="#antiSpamKey#" />

            <p>
                Please enter your name:<br />
                <input type="text" name="name" />
            </p>

            <!--- Our anti-spam input. --->
            <p>
                Please complete this list:<br />

                <!---
                    This is our list of values including the one
                    input value that we need.
                --->
                #antiSpamList#
            </p>

            <p>
                <input type="submit" value="Submit" />
            </p>

        </form>

    </body>
    </html>

</cfoutput>
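To make the modulus step concrete, here is a small worked sketch of the arithmetic on its own. It is not part of the demo above; the fixed values (7 and 12) and the variable names `candidate` and `allMatch` are just illustrative assumptions.

```cfm
<cfscript>

    // Hypothetical fixed values: anti-spam number 7, multiplier 12.
    antiSpamNumber = 7;
    antiSpamKey = ( ( 113 * 12 ) + antiSpamNumber ); // 1356 + 7 = 1363

    // Dividing the key by 113 leaves the anti-spam number as the
    // remainder, because the number is always smaller than 113.
    writeOutput( antiSpamKey % 113 ); // Outputs 7.

    // Confirm that the same holds for every key the demo can
    // generate (multipliers 1..20, anti-spam numbers 2..9).
    allMatch = true;

    for ( multiplier = 1 ; multiplier <= 20 ; multiplier++ ) {

        for ( candidate = 2 ; candidate <= 9 ; candidate++ ) {

            if ( ( ( ( 113 * multiplier ) + candidate ) % 113 ) neq candidate ) {

                allMatch = false;

            }

        }

    }

</cfscript>
```

After the loop runs, `allMatch` should still be true, since no anti-spam number in the 2 to 9 range can ever reach 113 and spill over into the next multiple.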
Like I said before, this approach doesn't take form content into account. If a user fills out your form and submits content about where to find cheap computer RAM, this is not gonna help you. This kind of anti-spam is here only to help against automated, bot-driven form submissions. And, so far, it's been working out quite nicely.
As a math person, I like the approach! As you say, this won't stop everything, but it definitely will help keep most of the riffraff out! :)
If you want to keep out the manual spammers go for a CAPTCHA like this.
I like your new blog approach - video and code. Before you know it, you will be on Adobe TV.
Thanks - seems like a really light-weight way of doing this.
Ha ha ha, clearly the answer is... oh wait, gotta run to a meeting.
Thanks! I think the video + code is a really nice combination. Especially because the video allows you to tell the story around the code; plus, you get to start zooming in - so first you get the story, then you get the walk through, then you start to peer down into the code. You never have to mentally parse anything until you have the previous concept, so to speak.
I use strict JS field verification and CFFormProtect/Akismet - and I've obfuscated the links to the forms with JS too.
The result is that spambots can't bypass the verification, as they won't even see that a form exists without JS.
I still get the occasional spammed form from a human in China / Africa / wherever labour's cheap but it always gets caught by CFFormProtect so I get a spam report but the form recipient is none the wiser.
Human spammers are so annoying. I have to admit, I've occasionally deleted valid comments because I couldn't determine if they were spam or not. Spam really has hurt the web.
"...I've occasionally deleted valid comments because I couldn't determine if they were spam or not."
I doubt it helps when natural English-speakers struggle with the Queen's. Not too much trouble on a site like this, I guess, but I immediately move on to the next post on sites where someone has tried using 'txt spk' or ALL CAPS.
I try not to concentrate too much on the language they use; I know we all come from different places. Typically, I get suspicious when a comment seems slightly off-topic AND the user is linking to a non-personal site (like a domain name registry or an IT-service company). At that point, I'll typically look at the other comments that they have left; and, if a pattern emerges, I delete the comment (and maybe some of the earlier ones as well).
It's really mentally stressful :)
I can imagine!
We use what is called "honeypot captcha".
Since most spambots don't parse and evaluate CSS (they aren't that smart yet), we leave an input field on the form that is hidden with CSS. On form submission, we make sure it's empty. If it's filled in, you can be pretty sure it's a bot.
All you need is a warning message in case a human has CSS disabled indicating that they should leave the field blank if they see it.
It has worked so far, and it's completely non-intrusive to just about every user.
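For readers who want to try the honeypot idea described above, a minimal CFML sketch might look like the following. This is an assumption-laden illustration, not the commenter's actual code: the field name `website` is hypothetical, and the inline `display: none` style simply stands in for whatever CSS rule you would normally use.

```cfm
<!--- Param the form values so they always exist. --->
<cfparam name="form.submitted" type="boolean" default="false" />
<cfparam name="form.website" type="string" default="" />

<!---
    The "website" field is the (hypothetical) honeypot. A human
    never sees it, so it should come back empty. If it contains
    anything, assume a bot filled it in.
--->
<cfif ( form.submitted and len( trim( form.website ) ) )>

    <p>
        Your submission could not be accepted.
    </p>

    <cfabort />

</cfif>

<cfoutput>

    <form action="#cgi.script_name#" method="post">

        <input type="hidden" name="submitted" value="true" />

        <!---
            Hidden with CSS. The warning text is only seen by
            people who have CSS disabled.
        --->
        <p style="display: none ;">
            If you can see this field, please leave it blank:<br />
            <input type="text" name="website" value="" />
        </p>

        <p>
            <input type="submit" value="Submit" />
        </p>

    </form>

</cfoutput>
```

Giving the honeypot a tempting, ordinary-looking name is a design guess on my part; the idea is that a bot that blindly fills in every field is more likely to populate it.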
Great comments from all.
@Ben, I think you should weed out all spammers who use the words: labour, theatre, colour, flavour, honour, neighbour, rumour, labour, humour, favourite, honourable, behaviourism & saviour.
After all, FireFox sees them as misspelled (apparently "FireFox" is also a misspelling) so therefore they must be from a spammer, right Ross? ;-) j/k!
Randall "'Round here y'all speak 'Merican!"
Agreed - I definitely make use of the honey pot approach. Actually, this blog comment form uses that approach.
Ha ha ha :)
It's good that you're guarding against spam. Did you know that yours is the 15,683rd site in the world?
Ha ha ha, that's bananas. I don't even know how they can figure that stuff out (or if it's even meaningful).
Here's a "bookmarklet" you can use to check out lots of sites with serverinsiders:
(2) Open bookmarks, find where you want to put it, say New Bookmark.
(3) For the name, say "Server Stats", or some such.
(4) For the URL, Edit > Paste the string above.
(5) Save, close bookmarks.
Then browse to any site. If you'd like to see its serverinsiders stats, just select the Server Stats bookmark. The browser will open a new window with the URL formed to look at the stats for the site you were looking at.
Of course, you may get a 404 Not Found if they haven't accumulated stats for the site. And you can't go 3 levels deep (for example, there isn't any "myblog-blogspot-com.html"). Serverinsiders goes only 2 levels deep, including the top level domain name.
P.S.: google-com.html says that Google's the number 1 site on the Internet. Not a surprise, but it seems somehow odd to know something like that. It reminded me of your "How would they know?" response.
It appears to be based on traffic analysis and hits.
Yahoo's number 4. Microsoft's number 20. bing's number 22. Apple's number 54. CNN's number 61. c|net's number 66. whitehouse.gov's number 3363. The more I try to think of sites that might be number 2, the further off I get.
Good bookmarklet - Fandango is 849 :) Holy cow, AOL is 42... really? really? AOL? maybe in 1995 ;)
@WebManWalking Facebook.com is number two.
@JF: Makes sense. Thanks!
@Ben: You're forgetting about the newbie, non-tech crowd. AOL still works for them.
What surprises the heck out of me is that I overheard the familiar "You've Got Mail!" sound at work today.
Hey I just noticed the Similar Traffic Domains on the left side! #4, groups.yahoo.com, proves that they DO go 3 deep, just not for everyone. (Even maps.google.com didn't rate having their own page. Oh well.)
So @JT and @Ben, here's another bookmarklet:
"Server Stats 3 Deep":
Pretty trivial modification, but tested.
Ah Facebook - that makes sense. I can't believe I didn't think of that.
Still blows my mind. I do know people who still use their AOL email. Makes me think of the Oatmeal:
... that guy is just brilliant :D
You have a very inspiring way of exploring and sharing your thoughts. It is very uncommon nowadays, lots of sites and blogs having copy pasted or rewritten info. But here, no doubt, info is original and very well structured. Keep it up. !!