The Anti-Spam Technique Used On My ColdFusion Job Board

By Ben Nadel
Tags: ColdFusion

Over the past couple of months, a few people have asked me how the anti-spam technique works on my ColdFusion job board. It's actually quite simple - I create a comma-delimited list of numbers and then I replace one of those numbers with an input field. When submitting the form, the user has to complete the set, filling in the missing number.

This isn't a particularly strong anti-spam approach; it doesn't check any of the content for spam-related information. If a user comes along and spams the site manually, this won't help you. It's simply an easy way to help fight automated form submission by spiders and bots.

Every time the page loads, we select a random number as our anti-spam value. Then, we create a key against which we can compare the user-submitted answer. To keep things as simple as possible, I am creating the key by adding the anti-spam value to a multiple of 113. This way, rather than using encryption and decryption, we can simply use the modulus operator to find the original anti-spam value, since that value is always smaller than 113.
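
To make the arithmetic concrete, here is a minimal sketch of the encode and decode steps on their own. The values 7 and 3 are hypothetical, picked only for illustration; on the real form both numbers come from randRange().

```cfml
<cfscript>
	// Hypothetical values, chosen only to illustrate the arithmetic.
	antiSpamNumber = 7;   // The number hidden from the list.
	multiplier = 3;       // Stands in for randRange( 1, 20 ).

	// Encode: add the anti-spam value to a multiple of 113.
	antiSpamKey = ( ( 113 * multiplier ) + antiSpamNumber );   // 339 + 7 = 346

	// Decode: because the anti-spam value is smaller than 113, the
	// remainder after dividing by 113 is the original value.
	recovered = ( antiSpamKey % 113 );   // 346 % 113 = 7

	// Prints "key: 346, recovered: 7".
	writeOutput( "key: #antiSpamKey#, recovered: #recovered#" );
</cfscript>
```

Any multiplier gives the same remainder, which is why the key can change on every page load while still pointing back to the same hidden number.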

<!--- Param out form values. --->
<cfparam name="form.submitted" type="boolean" default="false" />
<cfparam name="form.name" type="string" default="" />

<!--- Spam inputs. --->
<cfparam name="form.key" type="numeric" default="0" />
<cfparam name="form.value" type="string" default="" />


<!--- Check to see if the form has been submitted. --->
<cfif form.submitted>


	<!---
		Check to see if the anti-spam value is correct. The key
		that was submitted back was a multiple of 113 plus our
		anti-spam value. As such, we can use the MODULUS operator
		to match the key against the submitted value.
	--->
	<cfif ((form.key % 113) neq form.value)>

		<!--- The key did not match. This is a bot. --->
		<p>
			Get your filthy paws off me you damned dirty bot!
		</p>

		<!--- For this demo, just quit the page request. --->
		<cfabort />

	</cfif>


</cfif>


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!---
	Select a new anti-spam value. We are going to present this
	number as an input box within a list of values between
	1 and 10. We don't want it to be on the outliers, so only
	select from an inner range of values.
--->
<cfset antiSpamNumber = randRange( 2, 9 ) />

<!---
	When we pass the form through, we need to pass a key that will
	help us determine if the submitted anti-spam value is valid. To
	keep things as simple as possible, we'll just add our anti-spam
	value to a multiple of 113. This way, we can use the modulus
	operator to figure out the original value.
--->
<cfset antiSpamKey = (
	(113 * randRange( 1, 20 )) +
	antiSpamNumber
	) />

<!---
	Now that we have our anti-spam number, let's create the list of
	values + input that we will display on the form. To do this, we
	are simply going to replace our anti-spam number with an INPUT
	field within the list of values.
--->
<cfset antiSpamList = replace(
	"1, 2, 3, 4, 5, 6, 7, 8, 9, 10",
	antiSpamNumber,
	"<input type='text' name='value' size='2' />",
	"one"
	) />


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!--- Reset the output buffer and set the mime type. --->
<cfcontent type="text/html" />

<cfoutput>

	<!DOCTYPE html>
	<html>
	<head>
		<title>Simple List-Based Anti-Spam Technique</title>
	</head>
	<body>

		<h1>
			Simple List-Based Anti-Spam Technique
		</h1>

		<form action="#cgi.script_name#" method="post">


			<!--- Form submission flag. --->
			<input type="hidden" name="submitted" value="true" />

			<!--- Our anti-spam key. --->
			<input type="hidden" name="key" value="#antiSpamKey#" />


			<p>
				Please enter your name:<br />

				<input type="text" name="name" />
			</p>


			<!--- Our anti-spam input. --->
			<p>
				Please complete this list:<br />

				<!---
					This is our list of values including the one
					input value that we need.
				--->
				#antiSpamList#
			</p>

			<p>
				<input type="submit" value="Submit" />
			</p>


		</form>

	</body>
	</html>

</cfoutput>
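
The modulus check in the listing accepts any key and value pair that happens to line up. For example, a key of 0 with a value of 0 passes, and a non-numeric key makes the cfparam tag throw an error instead of reaching the check. If that matters for your form, a stricter comparison might look like the sketch below. The isAntiSpamValid() helper and its argument names are hypothetical and not part of the demo. The bounds assume the same randRange( 2, 9 ) and randRange( 1, 20 ) calls used above.

```cfml
<cfscript>
	// Hypothetical helper, not part of the original listing.
	function isAntiSpamValid( required string submittedKey, required string submittedValue ) {

		var cleanKey = trim( submittedKey );
		var cleanValue = trim( submittedValue );

		// The form only ever hides a single digit between 2 and 9.
		if ( ! reFind( "^[2-9]$", cleanValue ) ) {
			return false;
		}

		// Keys are generated as ( 113 * randRange( 1, 20 ) ) + value, so
		// a legitimate key is a 3 or 4 digit number from 115 to 2269.
		if ( ! reFind( "^[0-9]{3,4}$", cleanKey ) ) {
			return false;
		}

		if ( ( cleanKey < 115 ) || ( cleanKey > 2269 ) ) {
			return false;
		}

		// Same modulus comparison as the original check.
		return ( ( cleanKey % 113 ) == cleanValue );
	}
</cfscript>
```

With this in place, the inner check could become `<cfif NOT isAntiSpamValid( form.key, form.value )>`. To have the helper see non-numeric keys, rather than an error from cfparam, form.key would need to be paramed as a string. None of this makes the technique strong; it only narrows what counts as a matching pair.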

Like I said before, this approach doesn't take form content into account. If a user fills out your form and submits content about where to find cheap computer RAM, this is not gonna help you. This kind of anti-spam is here only to help against automated, bot-driven form submissions. And, so far, it's been working out quite nicely.




Reader Comments

As a math person, I like the approach! As you say, this won't stop everything, but definitely will help keep most of the riff-raff out! :)

Reply to this Comment

I like your new blog approach - video and code. Before you know it, you will be on Adobe TV.

Reply to this Comment

@Paul,

Thanks - seems like a really light-weight way of doing this.

@Jose,

Ha ha ha, clearly the answer is... oh wait, gotta run to a meeting.

@Jon,

Thanks! I think the video + code is a really nice combination. Especially because the video allows you to tell the story around the code; plus, you get to start zooming in - so first you get the story, then you get the walk through, then you start to peer down into the code. You never have to mentally parse anything until you have the previous concept, so to speak.

Reply to this Comment

I use strict JS field verification and CFFormProtect/Akismet - and I've obfuscated the links to the forms with JS too.

The result is that spambots can't bypass the verification, as they won't see the existence of a form without JS.

I still get the occasional spammed form from a human in China / Africa / wherever labour's cheap but it always gets caught by CFFormProtect so I get a spam report but the form recipient is none the wiser.

Job done.

Reply to this Comment

@Ross,

Human spammers are so annoying. I have to admit, I've occasionally deleted valid comments because I couldn't determine if they were spam or not. Spam really has hurt the web.

Reply to this Comment

"...I've occasionally deleted valid comments because I couldn't determine if they were spam or not."

I doubt it helps when natural English-speakers struggle with the Queen's. Not too much trouble on a site like this, I guess, but I immediately move on to the next post on sites where someone has tried using 'txt spk' or ALL CAPS.

Reply to this Comment

@Ross,

I try not to concentrate on the language they use too much; I know we all come from different places. Typically, I get suspicious when the comment seems slightly off-topic AND the user is linking to a non-personal site (like a domain name registry or an IT-service company). At that point, I'll typically look at the other comments that they have left; and, if a pattern emerges, I delete the comment (and maybe some of the earlier ones as well).

It's really mentally stressful :)

Reply to this Comment

We use what is called "honeypot captcha".

Since most spambots don't parse and evaluate CSS (they aren't that smart yet), we leave a hidden input field on the form. On form submission we make sure it's empty. If it's filled, you can be pretty sure it's a bot.

All you need is a warning message in case a human has CSS disabled indicating that they should leave the field blank if they see it.

Worked so far and it's completely non-intrusive to just about every user.
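
As a rough illustration of the honeypot idea described in this comment, here is a minimal sketch. The field name "website" and the rejection message are hypothetical choices for this example, not details taken from the comment.

```cfml
<!--- Hypothetical field name; any name a bot is likely to fill in will do. --->
<cfparam name="form.submitted" type="boolean" default="false" />
<cfparam name="form.website" type="string" default="" />

<cfif ( form.submitted AND len( trim( form.website ) ) )>

	<!--- A human never sees this field, so a value suggests a bot. --->
	<p>
		Submission rejected.
	</p>

	<cfabort />

</cfif>

<cfoutput>

	<form action="#cgi.script_name#" method="post">

		<input type="hidden" name="submitted" value="true" />

		<!--- Hidden with CSS; the label covers visitors who have CSS disabled. --->
		<p style="display: none ;">
			<label for="website">Please leave this field blank:</label>
			<input type="text" id="website" name="website" value="" />
		</p>

		<p>
			<input type="submit" value="Submit" />
		</p>

	</form>

</cfoutput>
```

The label text doubles as the warning for anyone browsing without CSS, which is the one case where a human would see the field.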

Reply to this Comment

Great comments from all.

@Ben, I think you should weed out all spammers who use the words: labour, theatre, colour, flavour, honour, neighbour, rumour, labour, humour, favourite, honourable, behaviourism & saviour.

After all, FireFox sees them as misspelled (apparently "FireFox" is also a misspelling) so therefore they must be from a spammer, right Ross? ;-) j/k!

Randall "'Round here y'all speak 'Merican!"

Reply to this Comment

@JF,

Agreed - I definitely make use of the honey pot approach. Actually, this blog comment form uses that approach.

@Randall,

Ha ha ha :)

Reply to this Comment

@WebManWalking,

Ha ha ha, that's bananas. I don't even know how they can figure that stuff out (or if it's even meaningful).

Reply to this Comment

@Ben,

Here's a "bookmarklet" you can use to check out lots of sites with serverinsiders:

javascript:if((p=location.hostname.split(".")).length>=2)void(window.open("http://www.serverinsiders.com/domain/"+p[p.length-2]+"-"+p[p.length-1]+".html"));

Instructions:
(1) Edit > Copy the string above, the whole thing, from javascript: to semicolon inclusive.
(2) Open bookmarks, find where you want to put it, say New Bookmark.
(3) For the name, say "Server Stats", or some such.
(4) For the URL, Edit > Paste the string above.
(5) Save, close bookmarks.

Then browse to any site. If you'd like to see its serverinsiders stats, just select the Server Stats bookmark. The browser will open a new window with the URL formed to look at the stats for the site you were looking at.

Of course, you may get a 404 Not Found if they haven't accumulated stats for the site. And you can't go 3 levels deep (for example, there isn't any "myblog-blogspot-com.html"). Serverinsiders goes only 2 levels deep, including the top level domain name.

Reply to this Comment

@Ben,

P.S.: google-com.html says that Google's the number 1 site on the Internet. Not a surprise, but it seems somehow odd to know something like that. It reminded me of your "How would they know?" response.

It appears to be based on traffic analysis and hits.

Yahoo's number 4. Microsoft's number 20. bing's number 22. Apple's number 54. CNN's number 61. c|net's number 66. whitehouse.gov's number 3363. The more I try to think of sites that might be number 2, the further off I get.

Reply to this Comment

@WebManWalking,

Good bookmarklet - Fandango is 849 :) Holy cow, AOL is 42... really? really? AOL? maybe in 1995 ;)

Reply to this Comment

@Ben: You're forgetting about the newbie, non-tech crowd. AOL still works for them.

What surprises the heck out of me is that I overheard the familiar "You've Got Mail!" sound at work today.

Reply to this Comment

@JT,

Hey I just noticed the Similar Traffic Domains on the left side! #4, groups.yahoo.com, proves that they DO go 3 deep, just not for everyone. (Even maps.google.com didn't rate having their own page. Oh well.)

So @JT and @Ben, here's another bookmarklet:

"Server Stats 3 Deep":

javascript:if((p=location.hostname.split(".")).length>=3)void(window.open("http://www.serverinsiders.com/domain/"+p[p.length-3]+"-"+p[p.length-2]+"-"+p[p.length-1]+".html"));

Pretty trivial modification, but tested.

Reply to this Comment

You have a very inspiring way of exploring and sharing your thoughts. It is very uncommon nowadays, lots of sites and blogs having copy pasted or rewritten info. But here, no doubt, info is original and very well structured. Keep it up. !!

Reply to this Comment
