I was thinking of functions to add to the imageUtils.cfc ColdFusion image manipulation component when I started to think about CAPTCHA. CAPTCHA is a cool thing, but one of its problems is that in order to fool bots, you sometimes have to make it unreadable even to humans. I wondered whether the fact that a CAPTCHA is usually a single image has anything to do with the ability of bots and spiders to beat it. To play around with this idea, I have created a method called EasyCaptcha(). This method takes the string you want to use in a CAPTCHA image, creates a separate image for each letter, and writes those images to the browser's response buffer. This way, I figured, it might be very easy for humans to read but more difficult for spiders and bots to figure out which images they need to run character recognition (OCR) on.
In addition to the text you are going to use, EasyCaptcha() also optionally takes the font size, the canvas background color, and the text color.
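EasyCaptcha() itself is a ColdFusion UDF, and its source isn't reproduced here. As a rough, language-neutral sketch of its first step, the Python below splits the CAPTCHA text into one rendering job per character. Each job carries the optional font size, background color, and text color. The names `split_captcha` and `LetterJob`, and the default values (30pt, white background, black text), are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LetterJob:
    """Everything needed to render one letter as its own small image."""
    char: str
    font_size: int
    background: str
    color: str


def split_captcha(text: str,
                  font_size: int = 30,          # hypothetical default
                  background: str = "#FFFFFF",  # hypothetical default
                  color: str = "#000000") -> List[LetterJob]:
    """Return one rendering job per character of the CAPTCHA text.

    Each job would later become a separate image, so no single image
    ever contains the whole word.
    """
    if not text:
        raise ValueError("CAPTCHA text must not be empty")
    return [LetterJob(ch, font_size, background, color) for ch in text]


if __name__ == "__main__":
    for job in split_captcha("Hello", font_size=24):
        print(job)
```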
Be careful: the EasyCaptcha() ColdFusion user-defined function relies on another UDF, GetTextDimensions(), to figure out how big to make the individual letter images. The images are written to the browser with no white space between them; I figured this would allow for the most flexibility in styling. To demonstrate, I have created a little test page that outputs EasyCaptcha() using two different styles.
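Writing the images with no white space between them means that any visible gap is purely a CSS decision. The Python sketch below is not ColdFusion and not the actual EasyCaptcha() output. It joins one `<img>` tag per letter with an empty separator, so the browser receives no whitespace text nodes between the inline images. The `render_letter_tags` function, the `captcha-letter` class name, and the image URLs are hypothetical placeholders for wherever the per-letter images are actually served from.

```python
from html import escape
from typing import Sequence


def render_letter_tags(image_urls: Sequence[str],
                       css_class: str = "captcha-letter") -> str:
    """Build adjacent <img> tags with no whitespace between them.

    Inline images separated by a space or newline render with a visible
    gap, so joining on "" leaves all spacing up to CSS. The alt text is
    left empty so the letter is not handed to bots in plain text.
    """
    tags = (
        f'<img src="{escape(url, quote=True)}" '
        f'class="{escape(css_class, quote=True)}" alt="">'
        for url in image_urls
    )
    return "".join(tags)


if __name__ == "__main__":
    urls = [f"/captcha/letter-{i}.png" for i in range(3)]
    print(render_letter_tags(urls))
```

With markup like this, leaving `.captcha-letter` without margins makes the letters read as one image. Adding a margin or border makes them look like separate tiles, which matches the two styles described in this post.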
Running the test page renders the CAPTCHA in both styles.
As you can see, we can present the CAPTCHA to the user as if it were a single image, or we can style it so that it looks like individual images. I am not sure whether one style is more effective than the other at defeating bots. Heck, I am not sure this will be effective at defeating bots at all, but I thought it would be a cool experiment. My hope is that this can create a CAPTCHA that is really easy for web users to read but much more difficult for bots to decipher.
Comments (13)
While it looks cool, I don't think it's a real "CAPTCHA" per se. There has to be some kind of background noise or crazy letter distortion that makes you guess at the word for at least 30 minutes.
Posted by Todd Rafferty on Feb 11, 2008 at 7:38 AM
Ha ha, 30 minutes :) I have seen CAPTCHAs that I couldn't solve even after repeated incorrect submissions.
Posted by Ben Nadel on Feb 11, 2008 at 8:29 AM
@Ben:
The reason CAPTCHAs use all the background noise and tilted letters is to try to prevent OCR programs from reading the images. Of course, spammers have supposedly now built code that can bypass some of the more common CAPTCHA images.
Anyway, splitting the images is interesting, but I wonder if splitting the images up in the *middle* of each letter might be more effective.
Granted, a computer could still possibly stitch the image back together, but you might be able to obfuscate that enough. Besides, if it's not a widely adopted CAPTCHA implementation, the odds of a spammer writing code to circumvent it are very slim.
Posted by Dan G. Switzer, II on Feb 11, 2008 at 8:52 AM
@Dan,
I did consider splitting the images mid-character at first. I was actually thinking of rendering the image, cutting it up into something like 10x10 pixel tiles, and then putting one pixel of space between each tile. But then I decided to just start out simple and see what people thought.
But I agree: as long as it's not a popular CAPTCHA method, the chances that someone will take the time to write an OCR algorithm for it, especially for us bloggers, are slim.
Posted by Ben Nadel on Feb 11, 2008 at 9:00 AM
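The tiling variation discussed above involves rendering the whole word once, cutting it into roughly 10x10 pixel pieces, and laying them out with a one-pixel gap. That comes down to computing a grid of crop rectangles. The Python sketch below does only that arithmetic. The actual cropping would be done by whatever image API is in use (in ColdFusion 8, `ImageCrop()` is one candidate). The names `tile_rects` and `layout_size` and their parameters are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def tile_rects(width: int, height: int, tile: int = 10) -> List[List[Rect]]:
    """Split a width x height image into rows of tile x tile crop rectangles.

    Tiles on the right and bottom edges are clipped so that no rectangle
    extends past the image boundary.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height and tile must be positive")
    rows = []
    for y in range(0, height, tile):
        row = []
        for x in range(0, width, tile):
            row.append((x, y, min(tile, width - x), min(tile, height - y)))
        rows.append(row)
    return rows


def layout_size(width: int, height: int, tile: int = 10,
                gap: int = 1) -> Tuple[int, int]:
    """Size of the reassembled CAPTCHA when tiles sit `gap` pixels apart."""
    cols = -(-width // tile)   # ceiling division
    rows = -(-height // tile)
    return width + (cols - 1) * gap, height + (rows - 1) * gap
```

Each rectangle would then be cropped out and written as its own image. The gap is just extra layout space between tiles, which is what `layout_size` accounts for.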
You underestimate the sheer power of being able to spam blogs with links to porn sites, etc. This is an SEO war, and bloggers are unfortunately aiding and abetting.
I read in the news once, I believe on MSNBC.com, that spammers were passing around this little porn widget. A little stripper would dance. Then she would stop, and a little message would appear: "If you'd like her to continue, type the phrase in the box." What was it? It was a CAPTCHA. Little did the players know that they were part of a social engineering scheme that helped spammers crack CAPTCHAs by building an image library.
Article:
http://www.msnbc.msn.com/id/21566341/
Posted by Todd Rafferty on Feb 11, 2008 at 9:12 AM
@Todd,
I hadn't heard about that scam! You have to admit that's rather clever. If your software can't beat the Turing test, then you just trick some humans into helping you. Could be a great plotline on The Sarah Connor Chronicles!
In all seriousness, any individual CAPTCHA technique is only going to be effective for a limited time, until spammers figure out how to circumvent it, which requires us to continually invent new techniques. It's an arms race, and new ideas like Ben's are exactly what we need to keep the other side at bay.
Rock on, Ben!
Posted by David Stamm on Feb 11, 2008 at 11:06 AM
@Todd,
That article is bananas!
@Dave,
It's not too hard to come up with something that will beat a bot... the problem is that so often it ALSO beats humans :) I have definitely come up against several CAPTCHAs that I could not seem to solve. The trick is to make something that is easy for the human brain and very hard for the computer one.
Posted by Ben Nadel on Feb 11, 2008 at 5:33 PM
Absolutely, the trick is to make something that is easy for the human brain and very hard for the computer one. Sadly, one of the great purposes of software and microchip development is to make a computer brain that works exactly like a human's does. There are lots of little advances that we all think are cool and useful, such as OCR software for scanning documents, stitching pictures together to make virtual 3-D tours, or even object recognition for airport baggage scanners, and they are in turn useful to those who want to make bots appear to be human. I'm not sure where the balance will come out, personally.
Now if only someone could require all bots to be built with Asimov's 3 Laws of Robotics...
Posted by Tom Mollerus on Feb 13, 2008 at 4:17 PM
Haha, that's a CAPTCHA? Bitmap comparison defeats it, OCR defeats it. There is nothing hard about it.
Posted by Not Me on Feb 29, 2008 at 12:12 PM
btw, I came across an interesting blog post:
http://www.codinghorror.com/blog/archives/001067.html
Posted by Todd Rafferty on Mar 6, 2008 at 8:26 AM
@Not Me,
I was hoping that the use of multiple images, not the characters themselves, would be what made it hard.
Posted by Ben Nadel on Mar 6, 2008 at 8:33 AM
I've been contemplating different CAPTCHA schemes for a while as well, including one similar to yours. Nice work on the project. I think it might be tougher if you a) split the letters, b) picked a different font for each letter, c) fuzzed the image somewhat, and d) scrambled them, but with numeric cues to let a human put them in the right order. I'm working on a project to attempt all of these and test their utility.
Posted by Chris Strasser on Aug 1, 2008 at 9:32 AM
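Idea (d) from Chris's comment is to scramble the letters but give each one a numeric cue so a human can put them back in order. It can be sketched as a shuffle that keeps each letter's original 1-based position as its label. This is a hypothetical illustration in Python, not code from Chris's project, and `scramble_with_cues` and `unscramble` are made-up names. In practice, the cue would be drawn onto or next to each letter image.

```python
import random
from typing import List, Optional, Tuple


def scramble_with_cues(text: str,
                       rng: Optional[random.Random] = None) -> List[Tuple[int, str]]:
    """Return (cue, letter) pairs in shuffled display order.

    The cue is the letter's 1-based position in the original text, so a
    person reads the letters in cue order rather than left to right.
    """
    if rng is None:
        rng = random.Random()
    pairs = list(enumerate(text, start=1))
    rng.shuffle(pairs)
    return pairs


def unscramble(pairs: List[Tuple[int, str]]) -> str:
    """What the human is expected to type: the letters sorted by their cue."""
    return "".join(letter for _, letter in sorted(pairs))
```

The cues only help if they are themselves hard for a bot to read. Rendered as plain text, they would make reassembly trivial.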
@Chris,
Sounds like a cool project. Please post your results here (or a link to them) when you are done.
Posted by Ben Nadel on Aug 1, 2008 at 1:30 PM