Using CAPTCHA In ColdFusion 8
A while back, I wrote about how easy ColdFusion 8 is going to make image creation and manipulation. In my multipart series on ColdFusion 8's new CFImage tag, I briefly touched upon its CAPTCHA functionality. A few people have since contacted me looking for a more detailed example of how CAPTCHA can actually be used. Before I show you the code, let's quickly cover the basics. With the CAPTCHA action of the CFImage tag, there are a few required attributes:
Height - The pixel height of the generated image. The height must be large enough to properly display all of the designated text at the given font size.
Width - The pixel width of the generated image. The width must be large enough to properly display all of the designated text at the given font size. The minimum width grows with the number of characters, so once you find a width that works for your text length, just stick with it. If the width is too small, ColdFusion will throw an error (which will actually tell you what the minimum width is for the given text).
Text - The text to display in the CAPTCHA image. The CAPTCHA action does not randomly select text for you; it only displays the text that you tell it to. It is still up to us as programmers to generate random text and tell it what to display.
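Because CFImage only draws the text you hand it, you need your own generator. Below is a minimal sketch in Python (not ColdFusion) of that step, assuming the same alphabet the example further down uses (A-Z plus 2-9). It swaps in Python's `secrets` module instead of the array shuffle used in the post, and the helper name `make_captcha_text` is hypothetical.

```python
import secrets
import string

# Same alphabet as the post's example: A-Z plus the digits 2-9
# (0 and 1 are left out because they are easily confused with O and l).
CAPTCHA_ALPHABET = string.ascii_uppercase + "23456789"


def make_captcha_text(length: int = 8) -> str:
    """Return a random CAPTCHA string (hypothetical helper name).

    secrets.choice() draws from the operating system's cryptographically
    secure random source, so earlier CAPTCHAs do not reveal later ones.
    Unlike shuffling an array and taking the first N items, characters
    may repeat.
    """
    return "".join(secrets.choice(CAPTCHA_ALPHABET) for _ in range(length))
```

The returned string is what you would pass to the text attribute; in ColdFusion the equivalent would be building the string yourself before calling CFImage.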
As you adjust the height and width of the image, ColdFusion 8 automatically spreads out the individual characters of the designated text to fill up the area. The text may or may not be evenly distributed; ColdFusion moves things around enough to make the text hard for bots to read while keeping it human readable.
Additionally, there are some optional ColdFusion 8 CAPTCHA attributes:
Destination - By default, ColdFusion 8 writes the CAPTCHA image to a randomly named temporary file served by its image servlet and then writes that image to the browser (recommended IMO). However, if you provide a destination file (an absolute or relative path), ColdFusion 8 will write the CAPTCHA image to this file rather than writing it to the browser. If you do this, you have to be careful not to overwrite a CAPTCHA image that a concurrent user might be viewing.
Difficulty - The degree to which the text is made bot-proof. By default, the difficulty is Low (options are Low, Medium, and High). The higher the difficulty, the less likely it is that either bots or humans can read it. The trade-off is your choice, but I feel that Medium is a nice level of security.
Overwrite - When using the Destination attribute, this is a flag to determine whether or not to overwrite the existing file at the given destination path. This defaults to No, and if you attempt to overwrite an image without this being set to Yes, ColdFusion 8 will throw an error.
Fonts - This is a comma-separated list of fonts that can be used to generate the CAPTCHA image. If you give it a list of several fonts, it will randomly pick a font for EACH character of the CAPTCHA text. Only certain fonts can be used (the rules for which are not entirely clear to me).
FontSize - The point (pt) size of the font used in the generated image. FontSize affects the minimum width of the resulting image. It also affects readability, so be sure to choose a font size that your users can read.
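The concurrency warning on the Destination attribute can be handled by giving every request its own file name. Here is a minimal sketch in Python (not ColdFusion), assuming a hypothetical helper `captcha_destination` and a directory you choose; in ColdFusion the built-in CreateUUID() could play the role of `uuid.uuid4()`, with the resulting path passed to destination.

```python
import os
import uuid


def captcha_destination(directory: str) -> str:
    """Build a unique file path for one request's CAPTCHA image.

    uuid4() is random, so concurrent requests get different file names
    and one visitor's image never overwrites another's.
    """
    return os.path.join(directory, "captcha_" + uuid.uuid4().hex + ".png")
```

Nothing here deletes old files, so they would still need to be cleaned up by some other means.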
Ok, so now that we have a basic understanding of the CFImage / CAPTCHA tag, let's take a look at the example:
<!--- Kill extra output. --->
<cfsilent>

	<!--- Param FORM values. --->
	<cfparam name="FORM.captcha" type="string" default="" />
	<cfparam name="FORM.captcha_check" type="string" default="" />

	<cftry>
		<cfparam name="FORM.submitted" type="numeric" default="0" />

		<cfcatch>
			<cfset FORM.submitted = 0 />
		</cfcatch>
	</cftry>


	<!--- Set a flag to see if this user is a bot or not. --->
	<cfset blnIsBot = true />


	<!--- Check to see if the form has been submitted. --->
	<cfif FORM.submitted>

		<!---
			Decrypt the captcha check value. Since this was submitted
			via a FORM, we have to be careful about attempts to hack it.
			Always put a Decrypt() call inside of a CFTry / CFCatch block.
		--->
		<cftry>

			<!--- Decrypt the check value. --->
			<cfset strCaptcha = Decrypt(
				FORM.captcha_check,
				"bots-aint-sexy",
				"CFMX_COMPAT",
				"HEX"
				) />

			<!---
				Check to see if the user-submitted value is the same as
				the decrypted CAPTCHA value. Remember, ColdFusion is case
				INsensitive with the EQ operator.
			--->
			<cfif (strCaptcha EQ FORM.captcha)>

				<!---
					The user entered the correct text. Set the flag
					for future use.
				--->
				<cfset blnIsBot = false />

			</cfif>


			<!--- Catch any errors. --->
			<cfcatch>

				<!--- Make sure the bot flag is set. --->
				<cfset blnIsBot = true />

			</cfcatch>
		</cftry>

	</cfif>


	<!---
		Before we render the form, we have to figure out which CAPTCHA
		text we are going to display. For this, we have to come up with
		a random combination of letters/numbers. We are going to use an
		easy solution: shuffling an array of valid characters.
	--->

	<!---
		Create the array of valid characters. Leave out the numbers
		0 (zero) and 1 (one) as they can be easily confused with the
		characters o and l (respectively).
	--->
	<cfset arrValidChars = ListToArray(
		"A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z," &
		"2,3,4,5,6,7,8,9"
		) />

	<!--- Now, shuffle the array. --->
	<cfset CreateObject( "java", "java.util.Collections" ).Shuffle(
		arrValidChars
		) />

	<!---
		Now that we have a shuffled array, let's grab the first 8
		characters as our CAPTCHA text string.
	--->
	<cfset strCaptcha = (
		arrValidChars[ 1 ] &
		arrValidChars[ 2 ] &
		arrValidChars[ 3 ] &
		arrValidChars[ 4 ] &
		arrValidChars[ 5 ] &
		arrValidChars[ 6 ] &
		arrValidChars[ 7 ] &
		arrValidChars[ 8 ]
		) />

	<!---
		At this point, we have picked out the CAPTCHA string that we
		want the users to enter. However, we don't want to pass that
		text anywhere in the form, otherwise a spider might be able to
		scrape it. Therefore, we now want to encrypt this value into
		our check field.
	--->
	<cfset FORM.captcha_check = Encrypt(
		strCaptcha,
		"bots-aint-sexy",
		"CFMX_COMPAT",
		"HEX"
		) />

</cfsilent>

<cfoutput>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
	<title>ColdFusion 8 CFImage / CAPTCHA Demo</title>
</head>
<body>

	<h1>
		ColdFusion 8 CFImage / CAPTCHA Demo
	</h1>

	<form action="#CGI.script_name#" method="post">

		<!---
			This is the hidden field that will flag form
			submission for data validation.
		--->
		<input type="hidden" name="submitted" value="1" />

		<!---
			This is the hidden field that we will check the user's
			CAPTCHA text against. This is an encrypted field so that
			spiders / bots cannot use it to their advantage.
		--->
		<input type="hidden" name="captcha_check" value="#FORM.captcha_check#" />

		<p>
			<!---
				Output the CAPTCHA image to the browser. Here, we are
				using a difficulty of medium so that we don't fry the
				user's brain.
			--->
			<cfimage
				action="captcha"
				height="75"
				width="363"
				text="#strCaptcha#"
				difficulty="medium"
				fonts="verdana,arial,times new roman,courier"
				fontsize="28"
				/>
		</p>

		<label for="captcha">
			Please enter text in image:
		</label>

		<input type="text" name="captcha" id="captcha" value="" />

		<input type="submit" value="Submit" />

		<br />

		<!---
			Check to see if the form has been submitted so we can see
			if we need to show the validation.
		--->
		<cfif FORM.submitted>

			<h3>
				Bot Validation Results
			</h3>

			<!--- Check for a bot. --->
			<cfif blnIsBot>
				<p>
					You Are A Bot!!!
				</p>
			<cfelse>
				<p>
					You are not a bot :)
				</p>
			</cfif>

		</cfif>

	</form>

</body>
</html>
</cfoutput>
In my example, I am using a medium difficulty CAPTCHA image. This seems to work well. Also, I have left a lot of code in the example so that you can see fully how you might go about selecting a random CAPTCHA value and then checking against it. A good deal of this code would be factored out and placed in some sort of function library. But, for the purposes of education, I don't like to have "black-boxed" functionality.
When using CAPTCHA, it is not enough to just provide the CAPTCHA image and an input for the user. We also need to check the user-entered value against the original CAPTCHA text. Therefore, we need to pass along the original text with the form submission. However, we don't want this text to be scrapable by bots. So I encrypt the original text and submit it via a hidden form field. Once the form is submitted, I decrypt it and check that it matches the user-entered value.
Running the above page, we get a screen that looks like this:
As you can see, ColdFusion 8 is going to make creating CAPTCHA images really easy. In fact, the bulk of the code here goes into the HTML and form validation; actually creating the CAPTCHA image is just ONE LINE of code. Snazzy!
Want to use code from this post? Check out the license.
Ben, I'm not a fan of Captcha images in general, as I don't think they're particularly accessible for many people. However, the basic idea serves a purpose. One area I think could be improved is general comprehension; an 8-character random string isn't as obvious as, say, a familiar word. Looking at your example, for instance, at first it might seem that there is a letter L or number 1 between the P and the U, and it looks a little like there is a letter C beneath the 9. That's probably obvious enough to us, but try explaining it to your grandmother, or to someone with reduced vision or learning difficulties. Partly this is due to the difficulty level you've selected, but I feel these sorts of artefacts are a problem with a lot of captcha images.
So, as we can set the text to be what we like, why not pick a random word? I use a similar method on one of my sites, except I don't embed it as an image, I just say:
The magic word is #session.magicword#, please enter the magic word.
This would be trivial for any bot to get around by parsing the form, but so far none of them have, although if it were widely used I'm sure it would happen. I actually saw this being done on another site (mysociety.org, they do good work), where it seems the magic word never changes! Yet it must work for them too. I actually have a dictionary file and I grab a random 4 letter word into the session each time.
However, the same principle could be applied to the captcha image: give it a random word of your choosing. I think this would raise comprehension significantly, as instead of trying to work out whether you should be entering 3PU9IWHS, 3PLU9C1WHLS, etc., you just have to enter 'tangible', 'artefact', etc. The shorter the word, the better, too!
I too am not the hugest fan of CAPTCHA, for the very reasons you bring up. I am shocked how often I get it wrong, and my eyesight is fairly good and I know the computer world. Picking a random word as opposed to random letters makes much more sense.
Thanks for the excellent suggestion.
Another suggestion would be to do away with the encrypt function and pass a hashed version of the captcha text in the hidden field and compare that to the hashed user input as the validation test. This would simplify the process a bit.
Another benefit of using hashed values is that the encrypt function may produce characters that could break the HTML source, so the encrypted version would have to be escaped in the hidden form field's value. Hashed values won't have this problem.
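The hash-and-compare idea can be sketched in a few lines. This is Python rather than ColdFusion (where Hash() would fill the same role), it assumes SHA-256 as the digest, and the helper names `captcha_hash` and `is_human` are hypothetical. The hex digest contains only 0-9 and a-f, which is why it is safe to drop into a hidden field.

```python
import hashlib


def captcha_hash(text: str) -> str:
    # Upper-case first so the check stays case-insensitive, like the
    # EQ comparison in the original ColdFusion example.
    return hashlib.sha256(text.strip().upper().encode("utf-8")).hexdigest()


def is_human(user_input: str, hidden_hash: str) -> bool:
    # Hash what the user typed and compare it with the hash that was
    # placed in the hidden form field when the page was rendered.
    return captcha_hash(user_input) == hidden_hash
```

On its own, this still has the replay weakness raised later in the thread: a bot that learns one matching text/hash pair can resubmit it forever.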
I really like that idea. You're right - I really don't need the encryption at all. Dynamite drop-in.
I was planning to use this feature on my new site, but saw a segment on WIRED Science last night about reCaptcha, and I'm thinking of using that instead.
It's a pretty interesting idea, they present a user with two words in the image. One that is "known" and one scanned in from a book that is "unknown".
The user doesn't know which is which.
To pass the captcha the user has to correctly type the "known" word.
Eventually many different people will be shown the "unknown" word; if they all type the same thing, it becomes "known".
So by typing the scanned word in, the user helps to digitize old texts.
Because this is hosted on a site outside my own, I will probably add some type of check to see if their server is up and show an alternate captcha if their site is down.
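The known/unknown word scheme described above can be modeled as a small piece of bookkeeping. This is a toy Python sketch, not reCaptcha's actual code: the agreement threshold `AGREEMENT_NEEDED`, the rule that only users who pass get their unknown-word reading recorded, and all names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict

# Assumed threshold: how many passing users must type the same thing
# before an unknown word counts as "known".
AGREEMENT_NEEDED = 3


class WordPairChallenge:
    """Toy model of the known/unknown word scheme (not reCaptcha's real code)."""

    def __init__(self) -> None:
        # unknown image id -> how often each reading was typed
        self.votes: DefaultDict[str, Counter] = defaultdict(Counter)
        # unknown image id -> reading that reached the threshold
        self.solved: Dict[str, str] = {}

    def check(self, known_word: str, known_input: str,
              unknown_id: str, unknown_input: str) -> bool:
        # Only the known word decides whether the user passes.
        passed = known_input.strip().lower() == known_word.lower()
        if passed:
            # Record the unknown-word reading only from users who passed.
            tally = self.votes[unknown_id]
            tally[unknown_input.strip().lower()] += 1
            reading, count = tally.most_common(1)[0]
            if count >= AGREEMENT_NEEDED:
                self.solved[unknown_id] = reading
        return passed
```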
That is an interesting approach.
The other day, I was reading in PC Magazine about CAPTCHA, specifically about an online ticketing site (maybe it was TicketMaster, I can't remember). Anyway, one mom said she went to buy Hannah Montana tickets the second they went on sale, and they were immediately sold out. The TicketMaster site had been bombarded by bots looking to buy out the Hannah Montana tickets.
Now, the TicketMaster site did have a CAPTCHA; the problem was that the bots had no problem with the CAPTCHA, and in the end, ironically, the CAPTCHA made the registration slower for real people than it did for the bots.
I guess this just goes to show that the bots that spam Blog sites are clearly not the top of the line bots and the ones that are used to make big money simply cannot be stopped. Anyway, thought it was fascinating.
I've read similar concerns - high end OCR, captcha-farms etc.
I agree that personal blogs don't seem to get hit by top end bots, so the graphical/audio captcha is probably sufficient to prevent bots on our sites.
One test that I did find interesting when looking in to this a few years back was the "kitten" test, where the user is shown a group of 9 or more similar images, only two of which are kittens, the rest are other cats or animals. The placement of the images and file names are random, and different photos are used.
This type of image recognition is still something that computers have difficulty with.
The problems are:
this test is completely inaccessible for the blind,
requires a fairly large library of photos
assumes you know what a kitten looks like
I think bot proof captchas can be made, but not without increasing the annoyance for users
When you generate the captcha place it in a session variable.
Then just check it against the entry on the form submission page or with ajax request.
This way there is nothing to read in the source from a bot.
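Keeping the answer server-side can be sketched like this, with a plain dict standing in for the ColdFusion SESSION scope (an assumption for illustration, as are the helper names). Removing the answer when it is checked also means each challenge can only be tried once, which covers the "go back and retry" concern raised below.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + "23456789"


def new_challenge(session: dict) -> str:
    """Pick CAPTCHA text and remember it server-side only."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(8))
    session["captcha"] = text  # never written into the page source
    return text  # goes to the image renderer, not into a hidden field


def check_challenge(session: dict, user_input: str) -> bool:
    # pop() removes the answer, so each challenge can be tried only once;
    # resubmitting the same form (or going "back") finds nothing to match.
    expected = session.pop("captcha", None)
    return expected is not None and user_input.strip().upper() == expected
```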
I'm kind of a noob, so forgive me if there's a stupidly simple answer to this that I'm missing. I'm building a captcha system for blog comments using the same basic idea you describe in this example.
But if the correct captcha answer stays in the form itself, what would stop a bot from simply going "back" and retrying the captcha until it's beaten? The answer would never change if the bot uses a cached version of the page.
Am I missing something?
You are correct in this. However, if you look at the value that is being sent back and forth from the form, it is encrypted. So, even though the bot could go back and keep trying to resubmit the form, it would still have to decrypt the value. Or, I suppose it could just submit thousands of alternate values - a brute force attack.
But, to be honest, the types of bots that are trolling Blog sites are really not that advanced. There are plenty of insanely advanced bots out there that can easily read CAPTCHA text even faster than humans can, but those bots are directed at sites where money can be made (such as buying out tickets to concerts for later resale).
Bloggers, in general, don't get the cream of the crop bots :)
I'm sure you're right. Thanks for this awesome ColdFusion blog Ben. I've found answers on here more than once when I've gotten stuck on something.
Like you said, I suppose bloggers are hardly prime targets. Using this type of captcha system is probably sufficient for my purposes.
By the way, I just looked at a TicketMaster captcha last night. If anybody could write a piece of software that could read that, I'm very impressed.
I suppose where there is money, there is a way :)
I just took a look at Ticketmaster, and it looks like the logic to "clean up" the image is fairly simple:
The main obscuring method is straight lines running from one edge of the image to another (always used).
Second is random dots, i.e. noise (not always used).
Third is a slight deforming of the text (rarely used).
The background is typically white.
Scan the edges of the image looking for black pixels that indicate the start of a line, trace the line across the image, and replace each pixel "in the line" with a white pixel. A bit of code could deal with places where a line crosses a letter (e.g. don't delete a pixel if it is surrounded by black pixels).
For noise, look for single non-white pixels surrounded by white pixels and replace them with white.
Run the result through a simple OCR system - even something checking for basic letter shapes would have a good chance of getting it right.
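The noise step described above is the simplest part to illustrate. This Python sketch assumes the image has already been thresholded into a hypothetical 0/1 grid (1 = dark pixel) and only removes isolated dark pixels; line tracing and OCR are not attempted.

```python
from typing import List

Grid = List[List[int]]  # 1 = dark pixel, 0 = white pixel


def remove_isolated_pixels(img: Grid) -> Grid:
    """Whiten dark pixels whose neighbours are all white.

    Returns a new grid; the input is not modified. Pixels on the
    border only consider the neighbours that actually exist.
    """
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            # Look at the up-to-8 surrounding pixels.
            has_dark_neighbour = any(
                img[nr][nc] == 1
                for nr in range(max(r - 1, 0), min(r + 2, rows))
                for nc in range(max(c - 1, 0), min(c + 2, cols))
                if (nr, nc) != (r, c)
            )
            if not has_dark_neighbour:
                out[r][c] = 0
    return out
```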
Looks like TicketMaster was not ready to deal with bots. I've seen more complicated CAPTCHA on blog posts :)
I've been using CAPTCHAs generated by cfimage recently, and came across a problem, which I've detailed on my blog (I've finally got round to setting one up!):
Basically, cfimage CAPTCHAs are PNGs with alpha transparency on their background layer - so you need to be careful what page background you put them on, or they can be completely unreadable! (Took me quite a while to work out what the problem was...)
'If I the correct captcha answer stays in the form itself, what would stop a bot from simply going "back" and then retrying the captcha until it's beaten because it would never change if using a cached version of the page'
Good question, and not only for captcha pages but for all pages that submit a form: how do you stop people or bots from refreshing a page, or clicking back and resubmitting information?
I create a UUID string each time a user enters the page, store it in a database, and submit it along with the form. When the form is submitted, I check that the UUID exists in the database and has not been used, then mark it as used; therefore the form can only be submitted once. I then delete UUID strings in the database that are more than 2 days old.
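That one-time token flow can be sketched in Python, with an in-memory dict standing in for the database table (an assumption; the class and method names are hypothetical). The two-day expiry matches the comment.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class TokenRecord:
    created: datetime
    used: bool = False


class FormTokenStore:
    """In-memory stand-in for the database table described in the comment."""

    def __init__(self, max_age: timedelta = timedelta(days=2)) -> None:
        self.max_age = max_age
        self.tokens: Dict[str, TokenRecord] = {}

    def issue(self, now: Optional[datetime] = None) -> str:
        # Called when the form is rendered; the token goes in a hidden field.
        token = str(uuid.uuid4())
        self.tokens[token] = TokenRecord(created=now or datetime.now())
        return token

    def redeem(self, token: str) -> bool:
        # Accept the submission only if the token exists and is unused.
        record = self.tokens.get(token)
        if record is None or record.used:
            return False
        record.used = True  # a refresh or "back" + resubmit is now rejected
        return True

    def purge(self, now: Optional[datetime] = None) -> None:
        # Delete tokens older than max_age (two days by default).
        cutoff = (now or datetime.now()) - self.max_age
        self.tokens = {t: r for t, r in self.tokens.items() if r.created >= cutoff}
```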
Very nicely done, Ben!
And as you said, you just strip it back to what you need. Your comments in the code were very good too.
using it already.
Glad you found the code commenting nice - some people find it excessive :)
Great post. People find it excessive when they don't know what they're doing.
This was easier to implement than Lyla Captcha, which is a good solution as well. You can't beat the built-in functionality though.
Some cross pollinating on this topic:
If stuffing the hash on the form, be sure to add some salt before creating the hash, then salt the user's response before comparing the hashes:
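The salting suggestion can be sketched in Python. Here HMAC-SHA256 keyed with a server-side secret is used as the "salt" (the standard construction for mixing a secret into a hash), and `SERVER_SALT`, `salted_hash`, and `matches` are hypothetical names. `hmac.compare_digest` does the comparison in constant time.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; it must never appear in the page.
SERVER_SALT = secrets.token_bytes(32)


def salted_hash(text: str, salt: bytes = SERVER_SALT) -> str:
    # HMAC-SHA256 keyed with the secret salt. Without the salt, a bot that
    # reads the hidden field cannot precompute hashes of its guesses.
    normalized = text.strip().upper().encode("utf-8")
    return hmac.new(salt, normalized, hashlib.sha256).hexdigest()


def matches(user_input: str, hidden_value: str, salt: bytes = SERVER_SALT) -> bool:
    # Salt and hash the response the same way, then compare in constant time.
    return hmac.compare_digest(salted_hash(user_input, salt), hidden_value)
```

Salting stops precomputation, but a captured text/hash pair can still be replayed unless the server also tracks which challenges have been used.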
For the blind, speak the text:
The issue of using TTS is a thorny one. Here is an interesting blog post over at reCaptcha concerning their efforts to provide a TTS solution:
Do try the test here:
http://recaptcha.net/learnmore.html (click on the audio button).
I had a dismal success rate. You have to type too many words, and the speech is at way too high a rate. Oh well...
One of the biggest issues I see with this method is that even though it will stop most bots that randomly select your site, it doesn't stop people from writing custom automated scripts.
I think the trick here is to store your captcha_check value in the session instead of passing it as a form element.
Very nice and simple article. I had no idea how easy captcha was in ColdFusion. I will be implementing your code shortly. What is your policy of use for the code?
Again thanks for the inline comments in code and write up, very helpful indeed.
My policy for code is to use it as you see fit :) I'm just happy to put things out there that people find interesting from time to time.
I am trying to implement a Captcha include on a site that I work for. My problem is that when I enter an incorrect Captcha value, the page refreshes and all the entered info is gone. Is there a way to have the info that was entered in the form stay in the form after the page refreshes following an incorrectly entered captcha? I am a noob, and I have been beating my head against this for a while now and can't find a solution.
Haven't tested this. It probably needs some tweaking, but it should get you going. It assumes there is an input tag for the captcha with an id attribute of "captcha" (no quotes).
Add this attribute to the body tag:
On the form's post button, add this attribute:
onclick="setCookie('test', this.value); return true;"
I would think about a CAPTCHA form as no different from any other form that requires validation. When you are dealing with other forms in which a field is invalid, how do you handle it? Typically by re-rendering the FORM values in the inputs. There's no reason to think of a ColdFusion CAPTCHA form as any different from a standard ColdFusion form.
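Re-rendering submitted values boils down to "default the field, then escape it into the value attribute". A minimal Python sketch of that idea; the helper name `text_input` is hypothetical, and in ColdFusion the same roles would be played by cfparam and HTMLEditFormat().

```python
from html import escape
from typing import Mapping


def text_input(name: str, submitted: Mapping[str, str]) -> str:
    """Render an <input> pre-filled with the value the user last submitted.

    Missing fields default to "" (the analogue of <cfparam>), and the
    value is HTML-escaped so quotes or angle brackets cannot break the markup.
    """
    value = submitted.get(name, "")
    return f'<input type="text" name="{escape(name)}" value="{escape(value)}" />'
```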
I took some time and added an audio option to this. I can share the code once I get it cleaned up.
I manage this by cfparam'ing all my form items on the form page, then looping through form.fieldnames on my action page and doing an autopostback; see below:
Ben, you wrote: "By default, ColdFusion 8 writes the CAPTCHA image to a randomly named image in the image servlet and then writes that image to the browser (recommended IMO). However, if you provide a destination (an absolute or relative path) file, ColdFusion 8 will write the CAPTCHA image to this file rather than writing it to the browser. If you do this, you have to be careful not to overwrite a CAPTCHA image that a concurrent user might be viewing."
I'm having trouble with the "...will write the CAPTCHA image to this file RATHER THAN writing it to the browser." part.
<cfimage action="captcha" fontsize="24" fonts="Times New Roman" width="250" height="50" text="#rndString#" difficulty="low" destination="#fileDest#" overwrite="yes">
I want CF to create the image but NOT display it at the time of creation. Later, in a user-supplied form, THAT's when I want the image to be displayed. Yet, I get both images (with the first one being a broken image because the path is elsewhere).
How do you get the CAPTCHA image CREATED but not SHOWN?
(Reason: I have a content management system I wrote in CF that does allow people to put in their own source code. One customer put in their own FORM and wants CAPTCHA on it, but their form won't execute CF because it's inside of a textarea tag, so I need to use <img> instead of <cfimage> there, in order to get the captcha inside the <form> and near where they are asked to input the value in the image.)
I have been using captcha with cfimage, passing the encrypted or hashed value as a hidden field and comparing it to the user input; however, a spammer or bot can still bypass the form. They can scrape the encrypted hidden value from the form and learn its corresponding real value once, then have the bot always pass these 2 values to the form, and the script will always accept them, since it is sending a correct code and its correct encryption every time.
For example, if a captcha of 12345x was displayed and its encryption was 3&#^L$, the bot can pass both values to the form every time and they will go through. You shouldn't pass the encrypted captcha value as a hidden field; instead, it should always be a session variable or some other variable that cannot be seen in the page source.
I thought I was safe by encrypting the value, but it doesn't make any difference if the bot picks up both values only 1 time then uses them over and over again.
I'm curious as to why the encryption would validate.
Mm, I suggest making it a dynamic value. I would
1.) Encrypt the true value by whatever means.
2.) Base64-encode the result.
3.) Retrieve it, decode the Base64, and then decrypt it to see if it's valid.
You can just make random numbers with the rand function.
Yes, the value passed in the captcha is dynamic and encrypted.
But for the line that compares the passed value with the encrypted value below:
<cfif (strCaptcha EQ FORM.captcha)>
If the spammer can check the form once, scrape the encrypted captcha_check string, pick up its corresponding value from the cfimage, and send that as the FORM.captcha value, the above code will always validate; it doesn't matter that the captcha image changes with every form submission.
I've always found traditional captcha techniques cumbersome. Regarding encrypting/decrypting values to compare, there's a risk of having the encryption broken. I tend to just HASH() the value. Also, decrypting the value and matching it against a user's typed value is less efficient than just HASH()ing the user's value and comparing it against the HASH()ed captcha text.
At the end of the day, I use simple math equations instead of random strings. This has several benefits. 1) math is the same in every language so there's no localization or character set issue. 2) While bots are trying to hack and use the text shown in the image, your form submission is expecting the answer to the equation instead.
--just my two pence
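The math-equation variant can be sketched in Python: the question text is what would be drawn into the image, and the answer is what gets stored (ideally server-side) and checked. The operand range 1-9, addition only, and the helper names are assumptions for illustration.

```python
import secrets
from typing import Tuple


def make_math_challenge() -> Tuple[str, int]:
    """Return (question_text, expected_answer) for a simple sum."""
    a = secrets.randbelow(9) + 1  # 1..9
    b = secrets.randbelow(9) + 1  # 1..9
    return f"{a} + {b}", a + b


def check_math_answer(user_input: str, expected: int) -> bool:
    # Non-numeric input simply fails instead of raising an error.
    try:
        return int(user_input.strip()) == expected
    except ValueError:
        return False
```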
I've recently set up SSL on one of my sites that uses the captcha on this page, but the image is not coming up secure. I need to know how to make the cfimage image that is created here get sent to the browser over an encrypted connection. Is there a way to do this?
I received the following error message and am not sure why.
"cannot convert the value "1,1" to a boolean"
Would appreciate your help.
Thank you, Diana
Are you using checkboxes? It looks to me like you used a check box, got a fail, backed up (or came back around to run the same program) and the variables didn't clear out so now you have two values for one variable and that's why you have a "1,1" instead of just a "1".
Nice article. I have to implement the captcha with a fixed background image. I mean the image should not change each time the page is refreshed.
Is this possible?
Thanks for this, Ben. I continue to use your site a lot as I learn ColdFusion.
I recently used your captcha code on my web site, and it works great on my home server. However, when deployed to my live server, the captcha images sometimes do not render for some reason. Instead I just get an image placeholder, as if the image link were broken. If I refresh the page enough, eventually I will get one that works.
The page in question is http://www.smylconcepts.com/contact.cfm
Please let me know if you have any ideas about why this would be happening.
I'm also facing the same issue now in our production server. Have you found any solution for this issue ?
Unfortunately I do not have a solution for this yet, sorry!
Thank you very much for this post. I used cf_recaptcha previously, but it did not work well on SSL. I've never used CF8 captcha before, but this post and your comments made my life easier.
For my CAPTCHAs, I just grab GetTickCount() and take the 5 rightmost digits. I use that as my image text, and if people enter that 4- or 5-digit number, they're good to go. That seems to work for me, and the code is simple enough.
<cfimage action="captcha" text="#botchecker#"
fonts="verdana,arial,times new roman,courier"
Image Code: <cfinput type="text" name="bot_test_check">
Then I check to make sure they're not a bot before they're logged in, subscribed, or whatever it is they're trying to do.
<cfif form.bot_test_check eq botchecker>
Something along those lines.
I'm trying to implement this code but for some reason the image will not display. I'm using CF10 Dev on a Linux server.
So when looking at Firebug I get an error:
"NetworkError: 404 Not Found - http://<IP>/CFFileServlet/_cf_captcha/_captcha_img5608960163582804185.png"