There have been some posts recently about how people stop spammers from submitting comments on their blogs and contact forms, so I thought I would share my approach, as it has been working nearly flawlessly. I wanted to keep mine simple. I don't care for CAPTCHA, as I find it hard to read; the de-spamming process shouldn't keep out people along with the spam bots. I wanted to keep it HTML: easy for humans, hard for computers. What I came up with was math.
Now, I know what you're thinking: "Math, harder for a computer?" No, not at all. Math will always be harder for humans. The difference here is that reading the math will be easy for humans and hard for robots. What I do is provide a mandatory math equation on every submission form:
To cut down on spam, please solve this math equation: ( 2 + 10 )?
To make this easy for humans and harder for computers, the source code for this equation looks like:
To cut down on spam, please solve this math equation: (
<span class="despamminq">&#45;</span> <span class="despammint">&#54;</span> <span class="despamminz">&#49;&#57;</span> <span class="despammina">&#49;&#51;</span> <span class="despamming34">&#57;</span> <span class="despamminzzz">&#45;</span> <span class="despamming01">&#50;</span> <span class="despammingj">&#43;</span> <span class="despammin4">&#49;&#55;</span> <span class="despamming4">&#43;</span> <span class="despammingg">&#54;</span> <span class="despamminnn">&#49;&#48;</span> <span class="despamminzzz">&#43;</span> <span class="despamminz">&#51;</span> <span class="despammint">&#49;&#49;</span> <span class="despammin09">&#49;&#54;</span>
)?<br />
Furthermore, if you were to copy and paste the equation from the web browser (at least in Firefox), it would look like:
To cut down on spam, please solve this math equation: ( - 6 19 13 9 - 2 + 17 + 6 10 + 3 11 16 )?
So, what is going on here? First of all, let me say that none of this is foolproof. Robots can figure it out, but so far it seems they haven't. The security here has several aspects:
All the character values in the above equation are ASCII encoded. That means that instead of being represented by the literal character, such as "A", each character is represented by its escaped ASCII value, such as "&#65;". This, of course, is only in the source code of the page. When viewing the web page, the user sees the easy-to-read evaluated value, "A".
Again, this is not foolproof. Robots can decode ASCII values. It just adds an obstacle that they have to figure out.
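As a rough illustration of the encoding step, here is a minimal sketch in Python (not the ColdFusion the site runs on). The function name `to_ascii_entities` is made up for this example; it simply turns each character into a decimal character reference, which a browser (or `html.unescape`) turns back into the readable character.

```python
import html


def to_ascii_entities(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


encoded = to_ascii_entities("A")
print(encoded)                 # &#65;  (what sits in the page source)
print(html.unescape(encoded))  # A      (what the visitor reads)
```

The second line of output is also a reminder of why this is only an obstacle: decoding takes a single library call.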
The next obstacle is that the equation is randomly dirty. As you can see from the pasted value, the equation contains much more than the two values and a single operator. It is interspersed with random numbers and operators. Again, not foolproof, just an obstacle.
The final obstacle is CSS. The way I get the equation to show properly is to hide many of the spans being displayed. In my example, there are several CSS classes that are displayed and several that are hidden. Allowing several classes to hide and several classes to show should make finding a pattern even harder for the spam bot.
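To make the effect of the hidden classes concrete, here is a small Python sketch. The class names (`hide-a`, `show-a`, and so on) are invented for the example and are not the ones used on this site; the point is only that the text in the markup and the text a visitor sees differ.

```python
def visible_text(spans, hidden_classes):
    """Join the text of every span whose class is not hidden by CSS."""
    return " ".join(
        text for css_class, text in spans if css_class not in hidden_classes
    )


# Hypothetical (class, text) pairs in document order.
spans = [
    ("hide-a", "-"), ("hide-b", "6"), ("show-a", "2"),
    ("show-b", "+"), ("hide-a", "17"), ("show-c", "10"),
]
hidden = {"hide-a", "hide-b"}

print(" ".join(text for _, text in spans))  # - 6 2 + 17 10  (raw markup text)
print(visible_text(spans, hidden))          # 2 + 10         (what CSS leaves on screen)
```

A bot would need to know which classes the stylesheet hides before it could reduce the first line to the second.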
Now, none of these is foolproof. Even with all of them combined, a robot could be programmed to figure it out. The point here is that these three obstacles combined make it tough. So far, with my new de-spamming methodology in place, I have not gotten a single spam post or contact form submission. I had a bot hit my comment page close to 70 times in one day and nothing got through.
I have a custom function that helps me create this de-spamming text. It takes the two values, the operator, the list of classes that are visible, and the list of classes that are hidden.
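A function with that shape might look roughly like the following Python sketch. This is not the ColdFusion function used on the site; the names `despam_equation`, `span`, and `to_entities`, the decoy counts, and the 1 to 20 decoy range are all assumptions made for illustration.

```python
import random


def to_entities(text):
    """Encode each character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def span(css_class, text):
    return f'<span class="{css_class}">{to_entities(text)}</span>'


def despam_equation(value1, operator, value2,
                    visible_classes, hidden_classes, rng=random):
    """Return obfuscated HTML for "value1 operator value2"."""
    def decoys():
        # Zero to three hidden tokens, each a random number or operator.
        return [
            span(rng.choice(hidden_classes),
                 rng.choice(["+", "-", str(rng.randint(1, 20))]))
            for _ in range(rng.randint(0, 3))
        ]

    parts = decoys()
    for token in (str(value1), operator, str(value2)):
        # The real token gets a visible class, then more decoys follow it.
        parts.append(span(rng.choice(visible_classes), token))
        parts.extend(decoys())
    return " ".join(parts)


print(despam_equation(2, "+", 10,
                      ["show-a", "show-b"],
                      ["hide-a", "hide-b", "hide-c"]))
```

The stylesheet would then set `display: none` on the hidden classes, so only the three real tokens remain on screen.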
Outputting the math equation is only part of the job; we also need a way to test the answer. Since my math equation is randomized, it rarely shows up the same way on two page loads. To make sure I only accept correct answers, I send the values and operator through the form as well. These values are encrypted so that only I can read them on the server.
Then on the server, I can decrypt the values and check them against the user's submitted solution.
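Here is a hedged Python sketch of that round trip. Note the swap: Python's standard library has no symmetric cipher, so instead of encrypting the values this example signs the expected answer with a keyed hash (HMAC). That is a different technique from the encryption described above, though it serves the same purpose: the page never reveals the answer, and only the server can check it. `SECRET_KEY`, `make_token`, and `check_answer` are hypothetical names, and only `+` and `-` are handled.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"replace-with-a-private-server-key"  # hypothetical; never sent to the page


def _sign(nonce: str, answer: int) -> str:
    message = f"{nonce}:{answer}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(value1: int, operator: str, value2: int) -> str:
    """Build the hidden-field value for the equation shown on the form."""
    answer = value1 + value2 if operator == "+" else value1 - value2
    nonce = secrets.token_hex(8)  # makes the token differ on every page load
    return f"{nonce}.{_sign(nonce, answer)}"


def check_answer(token: str, submitted: str) -> bool:
    """Return True only if the submitted answer matches the signed one."""
    nonce, _, signature = token.partition(".")
    try:
        answer = int(submitted)
    except ValueError:
        return False
    expected = _sign(nonce, answer)
    return hmac.compare_digest(signature.encode(), expected.encode())


token = make_token(2, "+", 10)
print(check_answer(token, "12"))  # True
print(check_answer(token, "13"))  # False
```

One limitation of this sketch: a token and its answer, once known, could be replayed, so a real deployment would probably also want an expiry time inside the signed message.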
This is not a foolproof solution, as I keep saying. But it does have randomness that makes it harder for a robot to crack. Humans should find this easy to use, as long as they can do a little math. Sweet, simple, and effective.
Comments (23)
Hey, this is pretty nifty. I just tested it and took a look at your source code. Cool stuff. I have to admit, I messed up the math question the first time :) Hey, it's been a while since I have done math :)
Posted by Kit "Kat" Claudio on Aug 7, 2006 at 12:18 PM
If you really want to go that extra mile, you may want to add in some CSS inheritance:
.spam1 .spam2 { display: none; }
.spam3 .spam4 { color: white; background-color: white; }
In that way, .spam2 is only hidden if it's inside of .spam1. Or, .spam4 is only white if it's inside of .spam3. Bots would have to implement a full-blown CSS parser to work around it.
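A rough Python sketch of the extra work that puts on a bot: it can no longer look at a span's own class, it has to track the ancestors too. The function `is_hidden` and the rule format are invented for this illustration and cover only simple descendant rules.

```python
def is_hidden(class_path, hiding_rules):
    """Decide whether an element is hidden by a "parent child" rule.

    class_path lists classes from the outermost ancestor to the element itself.
    hiding_rules is a list of (parent_class, child_class) pairs.
    """
    *ancestors, own = class_path
    return any(
        child == own and parent in ancestors
        for parent, child in hiding_rules
    )


rules = [("spam1", "spam2")]  # mirrors the ".spam1 .spam2" rule above

print(is_hidden(["spam1", "spam2"], rules))  # True:  spam2 inside spam1
print(is_hidden(["spam3", "spam2"], rules))  # False: spam2 inside something else
```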
You couldn't get too fancy (child/sibling selectors), due to shoddy CSS support in still-used browsers (*cough*Explorer*cough*), but it'll ratchet up the bar that much higher.
Or, for the truly insane, go with a Schneier-esque wheat-and-chaff presentation. Present 10 different equations, all similarly obfuscated, then a hint such as "what is the answer to the second blue equation?". (I've often wondered why this isn't done already with CAPTCHA.)
Posted by Rick O on Aug 7, 2006 at 2:33 PM
Rick, as always, you offer excellent insight and suggestions. I think the inheritance idea is fantastic. Let me work on implementing it. It would create a lot more classes, as I would want to have multiple parent classes (otherwise it would defeat the randomness). But, still, definitely doable.
Posted by Ben Nadel on Aug 7, 2006 at 2:36 PM
And don't forget:
http://www.thinkgeek.com/homeoffice/stickers/3185/
Posted by Rick O on Aug 7, 2006 at 2:36 PM
Very impressive. Math rocks!
Posted by Sami Hoda on Aug 7, 2006 at 3:19 PM
Math does rock :) If only we all spoke math, we would never misunderstand each other ;)
Posted by Ben Nadel on Aug 7, 2006 at 3:22 PM
I'm doing something similar on my blog's contact form, but I'm doing it with JavaScript. But the problem I'm running into is that spam bots often skip the form and post directly to the form processor. I'm thinking of the best way to make sure the form data actually comes from my form, otherwise forward to /dev/null
Posted by Jacob Munson on Aug 7, 2006 at 8:03 PM
Jacob, to overcome this, I am posting the answer to the form as a hidden, encrypted field. That way, if the bot were to post directly to the form processor, not only would it have to post an answer, it would also have to know how to properly encrypt the data so that I could decrypt it on the server and check it against the provided answer.
Posted by Ben Nadel on Aug 8, 2006 at 7:29 AM
I came up with a solution last night. I thought about doing what you did, Ben (encrypting the answer), but I ended up using ajax to set a session variable, and then in the form processor I check for that session variable, and if it's the right value. Time will tell if this will work well or not.
But I agree with your original post, captcha seems too cumbersome compared to this. Not to mention, I'm on a Linux server and I haven't been able to get the open source captcha components to work. I've heard that Alagad's works on headless Linux, but I'm too cheap to buy something like that.
To make it even easier for my users, I restricted the range of numbers to keep them small enough for simple math. And, I made the first number a smaller range than the second number (less than 10) to make it even easier.
Posted by Jacob Munson on Aug 8, 2006 at 9:50 AM
Jacob, I like the idea of the AJAX stuff. I am not sure how well the spam bots handle JavaScript, so it might be one more level of protection to exclude non-JS-capable browsers.
As far as the numbers, I like the restriction. I currently restrict to 20, which is not too bad, but still messes some people up.
Posted by Ben Nadel on Aug 8, 2006 at 9:57 AM
Jacob, why do you need to use ajax to set a session variable?
Andrew
Posted by Andrew Grosset on Aug 13, 2006 at 10:08 PM
OK, I get it!! On the assumption that the bot can't parse the JavaScript, the bot won't set the session variable whilst a user will!
Posted by Andrew Grosset on Aug 14, 2006 at 1:28 PM
This may be completely wrong but... why don't you use session comparisons?
For example, in a form, have a hidden field with the current sessionID in it. When the form is submitted, compare this value to the current sessionID; if it matches, execute as normal, and if not, throw it away.
Now the bot will use its cached version of the form session data which when compared to the 'real' current session will be incorrect unless the bot spams you instantly. Setting a session timeout relatively low will help.
I understand this is not the most comprehensive solution but seems to have worked so far on my blog.
Or am I missing something really obvious?
Posted by Ads on Aug 25, 2006 at 12:59 PM
Ads,
That is definitely a valid solution. The problem with it, while small, is that there are a good number of people (I don't want to say paranoid people, but...) who turn off their cookies. Without cookies, the session is not easily held from page to page, and I don't want to have complicated session handling. I want those people to be able to submit forms also.
Posted by Ben Nadel on Aug 25, 2006 at 1:07 PM
While trying to use your function, I get this error:
Element LIBRARY.TEXT is undefined in a Java object of type class [Ljava.lang.String; referenced as
The error occurred in C:\website\amncaptcha\DeSpamEquation.cfm: line 37
35 : // Add a hidden operator.
36 : LOCAL.Result.Append( "<span class=""#ListGetAt( ARGUMENTS.HiddenClasses, RandRange( 1, LOCAL.HiddenClassesLength ) )#"">" );
37 : LOCAL.Result.Append( VARIABLES.Library.Text.ToAsciiString( "-" ) );
38 : LOCAL.Result.Append( "</span> " );
39 :
Posted by Ameen on Sep 11, 2006 at 6:06 AM
Ameen,
ToAsciiString() is another function I have in my library. Sorry it was not posted above. Here it is (you will have to tweak your code to call it, as you probably do not have your UDFs broken up the way I do).
<cffunction
    name="ToAsciiString"
    access="public"
    returntype="string"
    output="no"
    hint="Returns the given string in ascii format. This can be used for making strings hard for web-spiders to read.">
    <!--- Define arguments. --->
    <cfargument name="Text" type="string" required="yes" />
    <cfscript>
        // Define the local scope.
        var LOCAL = StructNew();
        // Create a default safe string.
        LOCAL.AsciiText = "";
        // Loop over the characters in the string and convert each to its ascii entity (ex. "A" becomes "&##65;").
        for (LOCAL.CharIndex = 1 ; LOCAL.CharIndex LTE Len( ARGUMENTS.Text ) ; LOCAL.CharIndex = (LOCAL.CharIndex + 1)){
            LOCAL.AsciiText = (LOCAL.AsciiText & "&##" & Asc( Mid( ARGUMENTS.Text, LOCAL.CharIndex, 1 ) ) & ";" );
        }
        // Return the new ascii string.
        return( LOCAL.AsciiText );
    </cfscript>
</cffunction>
Posted by Ben Nadel on Sep 11, 2006 at 7:14 AM
Thanks Ben, it works perfectly.
Posted by Ameen on Sep 11, 2006 at 12:10 PM
Ameen,
Awesome. Glad to help :)
Posted by Ben Nadel on Sep 11, 2006 at 12:50 PM
The problem I have with this approach is that it also bars people using screen readers, e.g. blind people using JAWS, or other mechanisms like that. When you block the bots, you also block the blind and people with other disabilities.
This can create financial liabilities, as the folks developing the Sydney 2000 Olympics site learned to the tune of a $40,000 judgement plus legal costs when a blind user sued them under the anti-discrimination laws. With 6 weeks to go to the Olympics they had to rewrite much of the site.
So even if you don't think blind users amount to a significant proportion of your user base, you still have to watch out for anti-discrimination laws.
Anyway, I'd like to see if there can be a variation of this idea (which is a REALLY good idea by the way!) which would allow humans using screen readers to get around the blocking. Perhaps naming the fields something like "ThisfieldjustToTrickSpamBots_DoNotChange" or some such.
Posted by Mike Kear on Nov 27, 2006 at 9:00 PM
Mike, you raise excellent points. I am already trying to tackle these:
http://www.bennadel.com/index.cfm?dax=blog:405.view
Not quite there yet, but almost.
Posted by Ben Nadel on Nov 27, 2006 at 10:17 PM
<a href= http://forum.lixium.fr/cgi-bin/liste.eur?wellbut > wellbutrin sr </a> [url= http://forum.lixium.fr/cgi-bin/liste.eur?wellbut ] wellbutrin medication [/url]
Posted by wellbutrin medication on Nov 28, 2006 at 7:27 AM
Contact the advertisers, not the sender. Provide them with the email and the full header.
Quite often they do not really understand they are supporting the spammers.
Posted by FEDUP on May 24, 2007 at 6:45 PM
I think it would be funny to make the math a little harder, and put a calculator at the bottom for the user if they can't figure it out... hehe
I messed up the math the first time too.. calculus > addition ... :)
Posted by Annen on Jun 5, 2007 at 9:14 AM