A while back, I wrote about how easy ColdFusion 8 is going to make image creation and manipulation. In my multipart series on ColdFusion 8's new CFImage tag, I briefly touched upon its CAPTCHA functionality. Since then, a few people have contacted me looking for a more detailed example of how CAPTCHA can actually be used. Before I show you the code, let's quickly cover the basics. With the CAPTCHA action of the CFImage tag, there are a few required attributes:
Height - The pixel height of the generated image. The height must be large enough to properly display all of the designated text at the given font size.
Width - The pixel width of the generated image. The width must be large enough to properly display all of the designated text at the given font size. The minimum width grows with the number of characters, so once you find a width that works for your text length, just stick with it. If the width is too small, ColdFusion will throw an error (which will actually tell you the minimum width for the given text).
Text - The text to display in the CAPTCHA image. The CAPTCHA action does not randomly select text for you; it only displays the text that you tell it to. It is still up to us as programmers to generate random text and tell it what to display.
As you adjust the height and width of the tag, ColdFusion 8 automatically spreads out the individual characters of the designated text to fill up the area. The text may or may not be evenly distributed; ColdFusion tries to move things around enough to make it bot-proof but still human-readable.
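Since the Text attribute only displays what we give it, generating the random string falls to us. Here is a minimal, language-agnostic sketch in Python rather than ColdFusion; the function name `random_captcha_text`, the default length of 8, and the letters-plus-digits alphabet are assumptions, not anything CFImage requires. The result is what you would pass to the Text attribute.

```python
import secrets
import string

# Assumption: uppercase letters plus digits; adjust the alphabet to taste.
CAPTCHA_ALPHABET = string.ascii_uppercase + string.digits


def random_captcha_text(length: int = 8) -> str:
    """Return a random string to hand to the CAPTCHA's Text attribute."""
    # secrets.choice draws from the operating system's strong random source,
    # so the next CAPTCHA cannot be predicted from earlier ones.
    return "".join(secrets.choice(CAPTCHA_ALPHABET) for _ in range(length))
```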
Additionally, there are some optional ColdFusion 8 CAPTCHA attributes:
Destination - By default, ColdFusion 8 writes the CAPTCHA image to a randomly named file in the image servlet and then serves that image to the browser (recommended, IMO). However, if you provide a destination file (an absolute or relative path), ColdFusion 8 will write the CAPTCHA image to that file rather than writing it to the browser. If you do this, you have to be careful not to overwrite a CAPTCHA image that a concurrent user might be viewing.
Difficulty - The degree to which the text is made bot-proof. By default, the difficulty is Low (options are Low, Medium, and High). The higher the difficulty, the less likely it is that either bots or humans can read it. The trade-off is your choice, but I feel that Medium is a nice level of security.
Overwrite - When using the Destination attribute, this is a flag to determine whether or not to overwrite the existing file at the given destination path. This defaults to No, and if you attempt to overwrite an image without this being set to Yes, ColdFusion 8 will throw an error.
Fonts - This is a comma-separated list of fonts that can be used to generate the CAPTCHA image. If you give it a list of several fonts, it will randomly pick a font for EACH character of the CAPTCHA text. Only certain fonts can be used (the rules for which are not entirely clear to me).
FontSize - The point (pt) size of the font to be used in the generated image. FontSize will affect the minimum width of the resulting image. It will also affect readability, so be sure to choose a font size that your users can read.
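If you do use Destination, one way to keep concurrent users from overwriting each other's images (and to avoid needing Overwrite at all) is to give every request its own file name. A minimal sketch in Python rather than ColdFusion; `unique_captcha_path` and the `captcha-<hex>.png` naming scheme are hypothetical.

```python
import uuid
from pathlib import Path


def unique_captcha_path(directory: str) -> Path:
    """Build a per-request file path so concurrent users never share a file."""
    # uuid4 is a random 122-bit identifier, so two requests landing on the
    # same file name is vanishingly unlikely.
    return Path(directory) / f"captcha-{uuid.uuid4().hex}.png"
```

Old files still accumulate in that directory, so some periodic cleanup would be needed alongside this.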
OK, so now that we have a basic understanding of the CFImage / CAPTCHA tag, let's take a look at the example:
In my example, I am using a medium difficulty CAPTCHA image. This seems to work well. Also, I have left a lot of code in the example so that you can see fully how you might go about selecting a random CAPTCHA value and then checking against it. A good deal of this code would be factored out and placed in some sort of function library. But, for the purposes of education, I don't like to have "black-boxed" functionality.
When using CAPTCHA, it is not enough to just provide the CAPTCHA image and an input for the user. We also need to check the user-entered value against the original CAPTCHA text, so we need to pass the original text along with the form submission. However, we don't want this text to be scrapable by bots. Therefore, I am encrypting the original text and submitting it via a hidden form field. Once the form is submitted, I decrypt it and check that it matches the user-entered value.
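Once the hidden field has been decrypted, the check itself is just a string comparison. A minimal sketch in Python, assuming decryption has already happened (Python's standard library has no counterpart to ColdFusion's Encrypt()/Decrypt(), so that step is left out); the name `captcha_matches` and the case-insensitive, whitespace-trimming rule are assumptions.

```python
import hmac


def captcha_matches(expected: str, entered: str) -> bool:
    """Compare the original CAPTCHA text with what the user typed.

    `expected` is assumed to be the already-decrypted hidden-field value.
    """
    # Assumption: ignore case and surrounding whitespace, since users often
    # type lowercase or add a stray space.
    a = expected.strip().upper().encode("utf-8")
    b = entered.strip().upper().encode("utf-8")
    # compare_digest takes the same time however many leading characters
    # match, so response timing reveals nothing about partial guesses.
    return hmac.compare_digest(a, b)
```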
Running the above page, we get a screen that looks like this:
[Screenshot: the example page showing the CAPTCHA image and its form]
As you can see, ColdFusion 8 is going to make creating CAPTCHA images really easy. In fact, the bulk of the code here goes into the HTML and form validation; actually creating the CAPTCHA image is just ONE LINE of code. Snazzy!
Comments (15)
Ben, I'm not a fan of Captcha images in general, as I don't think they're particularly accessible for many people. However, the basic idea serves a purpose. One area I think could be improved is general comprehension; an 8-character random string isn't as obvious as, say, a familiar word. E.g. looking at your example, at first it might seem that there is a letter L or number 1 between the P and the U. And there looks to be a letter C beneath the 9. It's probably obvious enough to us that that's not the case, but try explaining it to your grandmother, or to someone with reduced vision or learning difficulties. Partly this is due to the difficulty level you've selected, but I feel these sorts of artefacts are a problem with a lot of captcha images.
So, as we can set the text to be what we like, why not pick a random word? I use a similar method on one of my sites, except I don't embed it as an image, I just say:
The magic word is #session.magicword#, please enter the magic word.
This would be trivial for any bot to get around by parsing the form, but so far none of them have, although if it were widely used I'm sure it would happen. I actually saw this being done on another site (mysociety.org, they do good work), where it seems the magic word never changes! Yet it must work for them too. I actually have a dictionary file and I grab a random 4 letter word into the session each time.
However, the same principle could be applied to the captcha image: give it a random word of your choosing. I think this would raise comprehension significantly, as instead of trying to work out whether you should be entering 3PU9IWHS, 3PLU9C1WHLS, etc., you just have to enter 'tangible', 'artefact', etc. The shorter the word, the better, too!
Posted by duncan on Jul 30, 2007 at 5:06 AM
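Duncan's magic-word approach comes down to picking a random word of a fixed length from a dictionary file. A minimal sketch in Python, assuming a plain word list with one word per line (an open file works, since iterating it yields lines); `pick_magic_word` is a hypothetical name. The chosen word would then be stored in the session (his `session.magicword`) or passed to the CAPTCHA's Text attribute, as he suggests.

```python
import secrets


def pick_magic_word(words, length=4):
    """Pick a random alphabetic word of the given length from a word list."""
    # Strip newlines, keep only purely alphabetic words of the right length.
    candidates = [
        w.strip().lower()
        for w in words
        if len(w.strip()) == length and w.strip().isalpha()
    ]
    if not candidates:
        raise ValueError(f"no {length}-letter words in the list")
    return secrets.choice(candidates)
```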
@Duncan,
I too am not the hugest fan of CAPTCHA, for the very reasons you bring up. I am shocked by how often I get it wrong, and my eyesight is fairly good and I know the computer world. Picking a random word as opposed to random letters does make much more sense.
Thanks for the excellent suggestion.
Posted by Ben Nadel on Jul 30, 2007 at 8:59 AM
Hey Ben,
Another suggestion would be to do away with the encrypt function and pass a hashed version of the captcha text in the hidden field and compare that to the hashed user input as the validation test. This would simplify the process a bit.
Another benefit of using hashed values is that the encrypt function may produce characters that could break the HTML source, so the encrypted version would have to be escaped as the hidden form field value. Hashed values won't have this problem.
Posted by Brett on Aug 10, 2007 at 3:38 PM
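Brett's idea is to put a hash of the CAPTCHA text in the hidden field, then hash the user's entry and compare the two. A minimal sketch in Python; as a variation on his plain hash, it uses a keyed hash (HMAC-SHA256) with a server-side secret, because a plain hash of short text or a dictionary word can be reversed by hashing guesses. `SECRET_KEY`, `captcha_token`, and `check_captcha` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it never appears in the page.
SECRET_KEY = b"replace-with-a-long-random-secret"


def captcha_token(text: str) -> str:
    """Value for the hidden form field: a keyed hash of the CAPTCHA text."""
    normalized = text.strip().upper().encode("utf-8")
    # hexdigest() contains only 0-9 and a-f, so it can go straight into an
    # HTML attribute without escaping (Brett's second point).
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def check_captcha(token: str, entered: str) -> bool:
    """Hash the user's entry the same way and compare it to the hidden field."""
    return hmac.compare_digest(token, captcha_token(entered))
```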
@Brett,
I really like that idea. You're right - I really don't need the encryption at all. Dynamite drop-in.
Posted by Ben Nadel on Aug 10, 2007 at 3:52 PM
I was planning to use this feature on my new site, but saw a segment on WIRED Science last night about reCaptcha, and I'm thinking of using that instead.
It's a pretty interesting idea, they present a user with two words in the image. One that is "known" and one scanned in from a book that is "unknown".
The user doesn't know which is which.
To pass the captcha the user has to correctly type the "known" word.
Eventually, many different people will be shown the "unknown" word; if they all type the same thing, it becomes "known".
So by typing the scanned word in, the user helps to digitize old texts.
Because this is hosted on a site outside my own, I will probably add some type of check to see if their server is up and show an alternate captcha if their site is down.
Posted by Steve Savage on Dec 23, 2007 at 9:59 AM
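The known/unknown word scheme Steve describes can be modelled in a few lines. This is a simplified sketch in Python, not reCAPTCHA's actual implementation: the agreement threshold of 3, the in-memory dictionaries, the name `grade`, and the rule that only users who pass the known word get their unknown-word guess counted are all assumptions.

```python
from collections import Counter, defaultdict

# Assumption: how many matching answers turn an unknown word into a known one.
AGREEMENT_THRESHOLD = 3

unknown_guesses = defaultdict(Counter)  # scanned-word id -> answer counts
known_words = {}                        # scanned-word id -> agreed text


def grade(known_answer, known_entry, unknown_id, unknown_entry):
    """Pass or fail on the known word; record the guess for the unknown one."""
    if known_entry.strip().lower() != known_answer.lower():
        return False  # failed the word we can actually check
    guess = unknown_entry.strip().lower()
    counts = unknown_guesses[unknown_id]
    counts[guess] += 1
    if counts[guess] >= AGREEMENT_THRESHOLD:
        known_words[unknown_id] = guess  # enough people agree
    return True
```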
@Steve,
That is an interesting approach.
The other day, I was reading in PC Magazine about CAPTCHA, specifically about an online ticketing site (maybe it was TicketMaster, I can't remember). One mom was saying she went to buy Hannah Montana tickets the second they went on sale, and they were immediately sold out. The TicketMaster site had been bombarded by bots looking to buy out the Hannah Montana tickets.
Now, the TicketMaster site did have a CAPTCHA; the problem was that the bots had no problem with the CAPTCHA, and in the end, ironically, the CAPTCHA made the registration slower for real people than it did for the bots.
I guess this just goes to show that the bots that spam blog sites are clearly not top-of-the-line, while the ones used to make big money simply cannot be stopped. Anyway, I thought it was fascinating.
Posted by Ben Nadel on Dec 23, 2007 at 12:15 PM
I've read similar concerns - high end OCR, captcha-farms etc.
I agree that personal blogs don't seem to get hit by top end bots, so the graphical/audio captcha is probably sufficient to prevent bots on our sites.
One test that I did find interesting when looking into this a few years back was the "kitten" test, where the user is shown a group of 9 or more similar images, only two of which are kittens; the rest are other cats or animals. The placement and file names of the images are random, and different photos are used.
This type of image recognition is still something that computers have difficulty with.
The problems are:
this test is completely inaccessible for the blind,
requires a fairly large library of photos
assumes you know what a kitten looks like
I think bot-proof captchas can be made, but not without increasing the annoyance for users.
Posted by Steve Savage on Dec 23, 2007 at 1:33 PM
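The randomization Steve mentions (random placement, random file names) is what keeps a bot from simply learning which files are kittens. A minimal sketch in Python; `build_kitten_test` and the 2-of-9 defaults taken from his description are illustrative. Only the public names go to the page, the server keeps `answer`, and a submission passes if the set of names the user selected equals it.

```python
import secrets
import uuid


def build_kitten_test(kittens, others, n_kittens=2, n_total=9):
    """Return (grid, answer) for one randomized kitten test.

    grid   -- list of (public_name, server_photo) in display order
    answer -- set of public names that are kittens (kept server-side)
    """
    rng = secrets.SystemRandom()
    chosen = rng.sample(kittens, n_kittens) + rng.sample(others, n_total - n_kittens)
    rng.shuffle(chosen)  # random placement on the page
    grid, answer = [], set()
    for photo in chosen:
        # Random public file name so the URL reveals nothing about the photo.
        public_name = uuid.uuid4().hex + ".jpg"
        grid.append((public_name, photo))
        if photo in kittens:
            answer.add(public_name)
    return grid, answer
```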
When you generate the captcha place it in a session variable.
Then just check it against the entry on the form submission page or with ajax request.
This way there is nothing to read in the source from a bot.
Posted by Michael Bazaillion on Dec 30, 2007 at 10:25 PM
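Michael's suggestion keeps the answer on the server, so the page source contains nothing for a bot to read. A minimal sketch in Python, with a plain dict standing in for the session; `issue_captcha` and `verify_captcha` are hypothetical names. Removing the stored answer on the first check also means that a resubmitted form fails.

```python
def issue_captcha(session: dict, text: str) -> str:
    """Remember the CAPTCHA text server-side; only the image shows it."""
    session["captcha"] = text
    return text  # used to render the image, never written into the form


def verify_captcha(session: dict, entered: str) -> bool:
    """Check the entry, then discard the stored answer so it can't be reused."""
    expected = session.pop("captcha", None)
    if expected is None:
        return False  # no CAPTCHA was issued, or it was already used
    # Assumption: case-insensitive match on uppercase CAPTCHA text.
    return entered.strip().upper() == expected
```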
I'm kind of a noob, so forgive me if there's a stupidly simple answer to this that I'm missing. I'm building a captcha system for blog comments using the same basic idea you describe in this example.
But if the correct captcha answer stays in the form itself, what would stop a bot from simply going "back" and retrying the captcha until it's beaten? The answer would never change if the bot is using a cached version of the page.
Am I missing something?
Posted by Henrik on Jan 7, 2008 at 6:01 PM
@Henrik,
You are correct in this. However, if you look at the value that is being sent back and forth from the form, it is encrypted. So, even though the bot could go back and keep trying to resubmit the form, it would still have to decrypt the value. Or, I suppose it could just submit thousands of alternate values - a brute force attack.
But, to be honest, the types of bots that are trolling Blog sites are really not that advanced. There are plenty of insanely advanced bots out there that can easily read CAPTCHA text even faster than humans can, but those bots are directed at sites where money can be made (such as buying out tickets to concerts for later resale).
Bloggers, in general, don't get the cream of the crop bots :)
Posted by Ben Nadel on Jan 8, 2008 at 7:41 AM
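Henrik's replay concern and the brute-force scenario Ben mentions can both be blunted by remembering hidden-field values on the server: accept each one only once and cap the number of wrong guesses. A minimal sketch in Python; `check_submission`, the limit of 3 attempts, and the in-memory `attempts`/`used` stores are assumptions (a real site would need shared storage and expiry), and `expected` is assumed to be the already-decrypted text. The trade-off is that this reintroduces server-side state, which passing the text in the form otherwise avoids.

```python
MAX_ATTEMPTS = 3  # assumption: wrong guesses allowed per hidden-field value

attempts = {}  # hidden-field value -> failed tries so far
used = set()   # hidden-field values that have already passed once


def check_submission(token: str, expected: str, entered: str) -> bool:
    """Accept each token at most once and stop after too many wrong guesses."""
    if token in used or attempts.get(token, 0) >= MAX_ATTEMPTS:
        return False  # replayed, or brute-forced too many times
    if entered.strip().upper() == expected:
        used.add(token)
        return True
    attempts[token] = attempts.get(token, 0) + 1
    return False
```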
I'm sure you're right. Thanks for this awesome ColdFusion blog Ben. I've found answers on here more than once when I've gotten stuck on something.
Like you said, I suppose bloggers are hardly prime targets. Using this type of captcha system is probably sufficient for my purposes.
By the way, I just looked at a TicketMaster captcha last night. If anybody could write a piece of software that could read that, I'm very impressed.
Posted by Henrik on Jan 8, 2008 at 9:52 AM
@Henrik,
I suppose where there is money, there is a way :)
Posted by Ben Nadel on Jan 8, 2008 at 9:55 AM
I just took a look at Ticketmaster, and it looks like the logic to "clean up" the image is fairly simple:
Main obscuring method is straight lines running from one edge of the image to another (always used)
Second is random dots - noise (not always used)
Third is a slight deforming of the text (used rarely)
Background is typically white.
For lines:
Scan the edges of the image looking for black pixels that indicate the start of a line, trace the line across the image, and replace each pixel "in the line" with white pixels. A bit of code could deal with places where a line crosses a letter (e.g. don't delete if surrounded by black pixels).
For noise look for single non-white pixels surrounded by white pixels and replace them with white
Run the result through a simple OCR system - even something checking for basic letter shapes would have a good chance of getting it right.
Posted by Steve Savage on Jan 8, 2008 at 3:28 PM
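Steve's noise step (turn lone non-white pixels white) is easy to sketch. This is a minimal Python version operating on an image that is assumed to be already thresholded to a 2D list of 0 (white) and 1 (dark); `remove_isolated_pixels` is a hypothetical name, pixels outside the image count as white, and the line tracing and OCR steps are not shown.

```python
def remove_isolated_pixels(img):
    """Return a copy of a binary image (1 = dark, 0 = white) in which every
    dark pixel whose 8 neighbours are all white has been turned white."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            # Look at the original image, not `out`, so earlier removals
            # don't change later decisions.
            neighbours = (
                img[ny][nx]
                for ny in range(y - 1, y + 2)
                for nx in range(x - 1, x + 2)
                if (ny, nx) != (y, x) and 0 <= ny < h and 0 <= nx < w
            )
            if not any(neighbours):
                out[y][x] = 0
    return out
```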
@Steve,
Looks like TicketMaster was not ready to deal with bots. I've seen more complicated CAPTCHA on blog posts :)
Posted by Ben Nadel on Jan 9, 2008 at 7:23 AM
I've been using CAPTCHAs generated by cfimage recently, and came across a problem, which I've detailed on my blog (I've finally got round to setting one up!):
http://sebduggan.com/posts/cfimage-generating-unreadable-captchas
Basically, cfimage CAPTCHAs are PNGs with alpha transparency on their background layer - so you need to be careful what page background you put them on, or they can be completely unreadable! (Took me quite a while to work out what the problem was...)
Posted by Seb Duggan on Jan 15, 2008 at 4:52 AM
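Seb's point follows from how transparency works: wherever the PNG is transparent, the browser blends in the page's background color. A minimal sketch of that blend for a single pixel (the standard "over" operation for straight, non-premultiplied alpha) in Python; `composite_over` is a hypothetical name and the pixel values in the test are illustrative. A fully transparent pixel simply takes on the page color, so the contrast between the text and its surroundings depends on where you place the image.

```python
def composite_over(rgba, background):
    """Blend one straight-alpha RGBA pixel (0-255 channels) over an opaque
    RGB background color, returning the RGB color that ends up on screen."""
    r, g, b, a = rgba
    alpha = a / 255
    return tuple(
        round(alpha * fg + (1 - alpha) * bg)
        for fg, bg in zip((r, g, b), background)
    )
```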