My Non-Working Anti-Spam Solution And How It Works (Or Rather, Doesn't Work)
My new anti-spam ColdFusion solution has been in place since Friday and I have already gotten about 15 or 20 spam comments posted to my site. Sorry if anyone was offended by the steroid or penis enlargement emails they may have gotten. My bad. Even though my solution is not working, I thought I would share it with you as it points out how freakin' good these spam bots really are!
This was a time-based solution. It was meant to turn a spam bot's speed against it. It was also built to keep a form active for only a short period of time. These are the requirements that I put in place:
A form was only valid for 7 minutes. If the form was generated and then submitted more than 7 minutes later, you would get an error message and would have to re-submit the form.
A form must be active for at least 10 seconds before it could be submitted. If it was submitted without being active for 10 seconds, you would get an error message and would have to wait 10 seconds and re-submit.
The 7-minute validation was to ensure that spam bots were not storing form data and then submitting it later. The 10-second validation was to ensure that a spam bot couldn't just load the page and immediately submit data. On top of the general time constraints, all spam markers were encrypted so that they could not be hacked (easily). As clever as I thought this was, it is not working.
Ok, so here's how it was done. At the bottom of every action page (before the form was rendered), I generated a random anti-spam key and used it to encrypt the time constraints:
<!---
    Get the key that we are going to use to encrypt the time stamp.
    Rand() gives us a multi-digit number between zero and one.
--->
<cfset REQUEST.SpamKey = ToString( Rand() ) />

<!---
    Once we get this number, we are going to strip out the leading
    zero/one and decimal place. This should leave us with a
    digit-only string of characters.
--->
<cfset REQUEST.SpamKey = REQUEST.SpamKey.ReplaceFirst( "(\d*\.)?0*", "" ) />

<!---
    Get the time span for which this form is active. CreateTimeSpan()
    gives us a float value to represent the 7 minutes during which
    time this form can be submitted.
--->
<cfset REQUEST.ActiveTimeSpan = CreateTimeSpan(
    0,
    0,
    7, <!--- 7 Minutes. --->
    0
    ) />

<!---
    Get the time span for which this form MUST be active. This form
    is not allowed to be submitted before it has been live for 10
    seconds. CreateTimeSpan() gives us a float value representation
    of the time span.
--->
<cfset REQUEST.RequiredActiveTimeSpan = CreateTimeSpan(
    0,
    0,
    0,
    10 <!--- 10 Seconds. --->
    ) />

<!---
    Get the cut off date for this active form. To get this value, we
    will add the time span above (7 minutes) to the current time.
    NOTE: Since the time span created above is a float value, this
    addition will AUTOMATICALLY convert the result to a float rather
    than a string-style date.
--->
<cfset REQUEST.CutOffDate = (
    REQUEST.Environment.DateTime.Now +
    REQUEST.ActiveTimeSpan
    ) />

<!---
    Get the early cut off date for this active form (this is the time
    the form is required to exist). Like the above date, the result
    of this will be a float value, not a standard string-style date.
--->
<cfset REQUEST.RequiredCutOffDate = (
    REQUEST.Environment.DateTime.Now +
    REQUEST.RequiredActiveTimeSpan
    ) />

<!---
    Encrypt the cut off date using our randomly generated spam key
    above. Using the CFMX_COMPAT algorithm, we are going to convert
    this to a HEX value.
--->
<cfset REQUEST.CutOffDateEncrypted = Encrypt(
    REQUEST.CutOffDate,
    REQUEST.SpamKey,
    "CFMX_COMPAT",
    "HEX"
    ) />

<!---
    Encrypt the required cut off date using our randomly generated
    spam key above. Again, we are creating a HEX value for this
    encryption.
--->
<cfset REQUEST.RequiredCutOffDateEncrypted = Encrypt(
    REQUEST.RequiredCutOffDate,
    REQUEST.SpamKey,
    "CFMX_COMPAT",
    "HEX"
    ) />

<!---
    Encrypt the spam key that we generated above. Seeing as our spam
    key is totally random, once we encrypt it with our global
    encryption key (nutz_4_butts), I don't see how it can be cracked.
--->
<cfset REQUEST.SpamKeyEncrypted = Encrypt(
    REQUEST.SpamKey,
    "nutz_4_butts",
    "CFMX_COMPAT",
    "HEX"
    ) />
At this point, we should have our two time stamps and our random spam key encrypted and ready for use. In the page output, at the top of the form, I include these three values as hidden form fields:
<!--- The time stamp for the cut off date. --->
<input type="hidden" name="spam_key1" value="#REQUEST.CutOffDateEncrypted#" />

<!--- The time stamp for the required live date. --->
<input type="hidden" name="spam_key2" value="#REQUEST.RequiredCutOffDateEncrypted#" />

<!--- The spam key. --->
<input type="hidden" name="spam_key3" value="#REQUEST.SpamKeyEncrypted#" />
Once the form has been submitted, we need to check to see if the time constraints have been violated:
<!---
    Try to decrypt and check the anti-spam keys. It is important to
    put CFTry / CFCatch around any and all decryption actions since
    the Decrypt() method will throw an error if no string (empty
    string) is passed to it. Seeing as we cannot ensure the integrity
    of our data, this is a possibility.
--->
<cftry>

    <!---
        Decrypt the spam key using our global encryption key. This
        will give us back our randomly generated key.
    --->
    <cfset REQUEST.SpamKey = Decrypt(
        FORM.spam_key3,
        "nutz_4_butts",
        "CFMX_COMPAT",
        "HEX"
        ) />

    <!---
        Decrypt the cut off date using the spam key we just decrypted.
        This will give us back the time until which this form can be
        active.
    --->
    <cfset REQUEST.CutOffDate = Decrypt(
        FORM.spam_key1,
        REQUEST.SpamKey,
        "CFMX_COMPAT",
        "HEX"
        ) />

    <!---
        Decrypt the required cut off date using the spam key. This
        will give us the time for which the form MUST be active
        (otherwise, submission occurred too fast).
    --->
    <cfset REQUEST.RequiredCutOffDate = Decrypt(
        FORM.spam_key2,
        REQUEST.SpamKey,
        "CFMX_COMPAT",
        "HEX"
        ) />

    <!---
        Check to see if our anti-spam time constraints have been
        violated. For starters, we must check to see if the decrypted
        time stamps are even valid numeric dates. If they are, then
        check to see if "Now" falls between the required live time
        and active cut-off time.
    --->
    <cfif (
        (NOT IsNumericDate( REQUEST.CutOffDate )) OR
        (NOT IsNumericDate( REQUEST.RequiredCutOffDate )) OR
        (REQUEST.CutOffDate LT REQUEST.Environment.DateTime.Now) OR
        (REQUEST.RequiredCutOffDate GT REQUEST.Environment.DateTime.Now)
        )>

        <!--- This cut off date is not valid. --->
        <cfset REQUEST.FormErrors.Add(
            "There was a problem submitting the form, please try again"
            ) />

        <!---
            Add an additional check to see if the form was submitted
            too fast. If it was, give the user some feedback as to why
            the form submission did not work.
        --->
        <cfif (REQUEST.RequiredCutOffDate GT REQUEST.Environment.DateTime.Now)>
            <cfset REQUEST.FormErrors.Add(
                "The form must be active for 10 seconds before it can be submitted"
                ) />
        </cfif>

    </cfif>

    <!--- Catch any errors that occur. --->
    <cfcatch>

        <!---
            Something went wrong. Either there was an error in the
            code, or the data was invalid. Either way, the
            anti-spamming failed.
        --->
        <cfset REQUEST.FormErrors.Add(
            "There was a problem submitting the form, please try again"
            ) />

    </cfcatch>
</cftry>
Of course, if the form errors collection, REQUEST.FormErrors, has a size to it (has any messages), then the form cannot be submitted and the form is re-rendered with the error messages displayed.
Now, I think this is pretty clever and I am a bit shocked that spam messages are making it through. But, seeing as the spam messages are getting through, what does this mean about the spam bots:
Perhaps the spam bot can figure out how to decrypt my spam key and then generate its own time stamps. However, seeing as my spam key is randomly generated for every single page load, I don't see how this could possibly be done. How can you figure out how to decrypt a randomly generated number? I suppose if they found patterns in the time stamps, it could be done - I am no cryptographer.
If it's not a matter of decryption, then it must mean that the spam bot does not submit the form instantly. If the spam bot performed the submit via some sort of automated post (such as CFHttp), then it would probably violate the 10 second minimum requirement. If it stored the form and submitted it later, then it would probably violate the 7 minute active time. This must mean that some spam bots either physically load the page in a browser and wait some time before submitting it, or they wait some time before submitting it via CFHttp (or similar).
If that is the case, that is one smart bot! It looks like I am going to have to go back and add some hidden fields to try and get the spam bot to slip up and submit data that a human would NOT submit. Damn you spam bots, DAMN YOU!
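A minimal sketch of that hidden-field trap, assuming a hypothetical decoy field named contact_url that is hidden with CSS and that a human would leave blank. The field name, the inline style, and the reuse of the generic error message are assumptions for illustration, not code that is running on this site:

```cfml
<!---
    In the form: a decoy field (hypothetical name) that humans never
    see because it is hidden with CSS. The label is there for anyone
    whose browser or screen reader ignores the CSS.
--->
<div style="display: none ;">
    <label for="contact_url">Leave this field blank</label>
    <input type="text" id="contact_url" name="contact_url" value="" />
</div>

<!---
    On submission: param the decoy so that a missing field cannot
    throw an error.
--->
<cfparam name="FORM.contact_url" type="string" default="" />

<!---
    Any value in the decoy suggests that something other than a human
    filled out the form.
--->
<cfif Len( Trim( FORM.contact_url ) )>
    <cfset REQUEST.FormErrors.Add(
        "There was a problem submitting the form, please try again"
        ) />
</cfif>
```

The open question with any decoy is the name: it has to look tempting to a bot without matching anything that a browser's auto-fill would populate for a real person.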
I am not quite ready to give up. I am also not quite ready to rely on an outside anti-spamming service. It may come to that eventually, but I will not go quietly into that night.
That is a brilliant solution!
It is fascinating that it didn't work.
I'm curious to see what other solutions are possible without CAPTCHA or similar.
Thanks for liking it. I am also a bit surprised that it didn't work. However, I do believe that all the spam messages that have come in this weekend have come from one or two bots (I don't think it's a massive attack from all spam bots).
Furthermore, there doesn't seem to be any regularity to it. A few come in at a single time (span of a few minutes) and then nothing for a few hours. Not sure what the spamming strategy is these days.
So, it looks like it is stopping most spam, but not all. I will keep tinkering.
I'm using Jake Munson's CFFormProtect, and it's implemented this along with some of the other stuff you've talked about. It is blocking spam nicely - I've yet to have a false positive, and I've yet to have spam make it through.
If I recall correctly, you were checking the keyboard, right? And someone pointed out that some people have autofill? Well, that may be fine for checkout procedures, but how could they autofill the comments?
If anything, you can have failures emailed to you and manually add in legitimate comments.
I will check out Jacob's solution. I remember him posting about it. As far as autofill stuff, I don't know much about it as I never use autofill. In my previous solution, however, there were hidden fields that had names that contained key words like "email" and "name". I suspect that some autofill functionality searches not just on full field names but on partial field names?
Oh, I see. I thought (remembered incorrectly) you were worried about autofilling everything and not seeing the keyboard being pressed.
I guess the solution to autofilling hidden fields might be to name the fields something autofill wouldn't get. Of course, I don't know how effective that would be.
Yeah, I am gonna try to get a hidden field in there (in addition to the current solution). My only concern is that if the hidden field is named too strangely then the spam bots won't try to fill it in. Of course, we don't know till we know, right?
Can you give an indication of how the form is processed server side? You encrypt an anti-spam key, but you haven't said how your backend checks for the valid submission.
For example, a bot could download the form and then submit *the same form* once every minute. The first submission fails since it is within the first 10 seconds. Then the next several should get through, and every submission after the 7 minute mark would fail. Does your solution somehow invalidate the anti-spam key on each submission? Maybe I missed something in your description...
The bots accessing my sites must be stupid because they can't even figure out where to send the form when presented with the following code:
form action="" onsubmit="this.action='guest'+'book.cfm';"
Al covered my point. I've done this before too and noticed very quickly how spam came in bursts for the window of a form's active state. If you log the failures or session-ban them based on invalid submissions, it might help. But it might not if they are using a proxy (or many proxies). I often notice the same spam in series from different IP addresses.
But you should count on screen-scraping bots. If all you are doing is analyzing what is on the form and submission itself then, you are giving them everything they need to post. I've since added a session-based key, which I create on a per-article basis and remove from the session struct once a comment is successfully submitted. One might argue that it restricts users who don't have cookies enabled, but most people only disable regular cookies, not in-memory cookies. If you disable in-memory cookies a lot of stuff stops working on the web. That was a rare sacrifice I was willing to make. I still get spam though, which is why I'm reading your experiences and others'.
PS - the "must be active for 10 seconds" rule sucks when it takes longer than 7 minutes to type your message: you are redirected back to the form with a validation error, hit submit immediately, and then fail the 10 second check as well.
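For what it's worth, a rough sketch of the single-use, session-based key described above. It assumes that session management is enabled, that REQUEST.FormErrors already exists as in the post, and that processing runs before the form is rendered. The names SESSION.CommentKey and comment_key are hypothetical, and this version keeps one key per session rather than one per article:

```cfml
<!---
    When processing a submission: the posted key must match the key
    stored in the session. The names used here are hypothetical.
--->
<cfif (CGI.request_method EQ "POST")>

    <cfparam name="FORM.comment_key" type="string" default="" />

    <!--- Lock so that two simultaneous replays cannot both pass. --->
    <cflock scope="SESSION" type="exclusive" timeout="5">
        <cfif (
            StructKeyExists( SESSION, "CommentKey" ) AND
            Len( FORM.comment_key ) AND
            (FORM.comment_key EQ SESSION.CommentKey)
            )>

            <!--- Valid. Remove the key so the same form cannot be replayed. --->
            <cfset StructDelete( SESSION, "CommentKey" ) />

        <cfelse>

            <cfset REQUEST.FormErrors.Add(
                "There was a problem submitting the form, please try again"
                ) />

        </cfif>
    </cflock>

</cfif>

<!--- Before rendering the form: create a fresh one-time key. --->
<cfset SESSION.CommentKey = CreateUUID() />

<!--- In the form (inside CFOutput). --->
<input type="hidden" name="comment_key" value="#SESSION.CommentKey#" />
```

Because the key is deleted on the first successful match, a bot that saves the form and posts it once a minute would get at most one submission through before the saved key stops matching.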
There's another possibility - it might not be a bot.
There are wankers who think it's fun or worthwhile to submit stuff for the hell of it, or else they're really low-paid cottage industry workers spamming forms everywhere.
I know there are bots out there - I fight them a lot too, and I have reduced the incidence of it on a radio station site I look after, but I can't eliminate it completely. And the postings I get are slightly different each time (which is giving me heartburn looking for a pattern to search for to block). Since they're slightly different each time, I'm inclined to believe there are banks of people in some sweatshop somewhere cruising the net posting stuff.
What they gain from it, I'm at a loss to understand, because our guestbooks are moderated and none of it ever appears on the web.
Recently, I have started using my own home-grown spam control solution and have been quite successful.
I just check each form field and make sure no two fields have duplicate values. Spam submissions usually have the same info in more than one form field.
Secondly I check each field value against Ray's spam word list (ray.camdenfamily.com / tbspam.cfm) and if I find more than one I mark it as a spam submission.
It's very basic but does the job.
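The duplicate-value check could be sketched roughly like this. It is an assumption about how such a check might work, not the commenter's actual code; the variable names are hypothetical and REQUEST.FormErrors is borrowed from the post:

```cfml
<!--- Track each non-empty value we have seen so far (hypothetical names). --->
<cfset REQUEST.SeenValues = StructNew() />
<cfset REQUEST.HasDuplicateValues = false />

<cfloop collection="#FORM#" item="FieldName">

    <cfset FieldValue = Trim( FORM[ FieldName ] ) />

    <!--- Skip ColdFusion's own FIELDNAMES entry and any empty values. --->
    <cfif ( (FieldName NEQ "FIELDNAMES") AND Len( FieldValue ) )>

        <!---
            Struct keys are not case sensitive, so "Spam" and "SPAM"
            count as the same value here.
        --->
        <cfif StructKeyExists( REQUEST.SeenValues, FieldValue )>
            <cfset REQUEST.HasDuplicateValues = true />
        <cfelse>
            <cfset REQUEST.SeenValues[ FieldValue ] = true />
        </cfif>

    </cfif>

</cfloop>

<!--- The same text in more than one field is treated as spam. --->
<cfif REQUEST.HasDuplicateValues>
    <cfset REQUEST.FormErrors.Add(
        "There was a problem submitting the form, please try again"
        ) />
</cfif>
```

A form that legitimately repeats a value (a "confirm your email" field, for example) would need those fields excluded from the loop.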
I am not doing any checking in that respect. As the key is randomly generated for every form post, I do not store them. However, I like the idea of storing used keys (at least temporarily). I just wanted to try and stay away from any persistent scope usage.
Good point with the in-memory cookie storage. I will take that into account. And yeah, I know that the 10 second rule sucks when it takes a while to enter a post. I figured most posts wouldn't take 7 minutes to write, but I have run into it several times already (probably with this comment as well).
Interesting point. It's funny that you say that because there are many spam comments that come through that don't even have content. No links or anything... just random characters. What the heck kind of spam is that?!??! There's no point to it. It's like some sort of psychological warfare :)
Nice tip on the checking for duplicate values. That is a very cool idea!
I have tried checking for duplicates and that has worked very well (in fact, the custom tags I use for forms reject any form where more than half of the values are the same - assuming enough fields).
This doesn't work, however, when I have a single field form (like search or "sign up for newsletter").
I would love to see an approach that would work for these as well.
Additionally, I expect that the "check for duplicates" approach will be circumvented pretty soon.
After making you aware of the auto-fill problem in your previous solution, I changed some of the hidden field names to avoid it: 'contact_email' and 'contact_email2' became 'MESSAGE_341' and 'MESSAGE_342'. (The numbers are just random. I didn't want to use 'Message' and 'Message2' because I use this for many different forms and 'message' is a field name I use pretty often.) It has been over a week and after hundreds of spam attempts I'm happy to report that not one spam message got through. I find that most of the time they do not attempt to fill the two message fields I have created. I think (I hope) the reason is that I already have a 'message' field in my form which they would rather fill. They get caught because they fill the 'contact_url' field.
Yeah, I ended up adding back in a few hidden fields that shouldn't be used by an auto-form-filling app. We will see what works.
I was ignorant to your post here when I posted up my own solution to this problem yesterday. In my comments Steve Bryant pointed out that you had done something similar.
My method is different in 2 ways.
2; Because some bots might have cached the form, I generate a hash that represents the current day - if this doesn't match on the server side, the form isn't processed.
So far, the 4 sites we've implemented this on have gone from hundreds of form spams per week to zero. If I need my form to validate, I'll just resort to using a CAPTCHA.
My solution is here: http://www.cftopper.com/index.cfm?blogpostid=155
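The linked code is not reproduced here, so the following is only a guess at what a day-based hash check might look like. The salt value, the field name day_hash, and the variable names are all hypothetical:

```cfml
<!--- Hypothetical secret salt; it never appears in the page source. --->
<cfset REQUEST.DaySalt = "replace-with-your-own-secret" />

<!---
    Build a hash that represents the current day. The value changes
    at midnight (server time).
--->
<cfset REQUEST.DayHash = Hash(
    REQUEST.DaySalt & DateFormat( Now(), "yyyy-mm-dd" )
    ) />

<!--- On submission: the posted hash must match today's hash. --->
<cfif (CGI.request_method EQ "POST")>

    <cfparam name="FORM.day_hash" type="string" default="" />

    <cfif (FORM.day_hash NEQ REQUEST.DayHash)>
        <cfset REQUEST.FormErrors.Add(
            "There was a problem submitting the form, please try again"
            ) />
    </cfif>

</cfif>

<!--- In the form (inside CFOutput). --->
<input type="hidden" name="day_hash" value="#REQUEST.DayHash#" />
```

One side effect of this sketch: a form loaded just before midnight and submitted just after would be rejected, and a form cached by a bot stays valid for the rest of the day.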
I'll tell ya, Mark Kruger has an anti-sql injection filter on his blog that's a bear... I really wish folks would take the time to write them to send flagged messages to moderation instead of unceremoniously aborting.
Though Mark did loosen it up a bit recently because of some comments I left on an entry about cfqueryparam. :)
The method I have been using with good success is the "honey pot" method. It works for both standard and blind users, assuming that screen readers ignore CSS (and therefore show critical labels such as "Don't fill in these form fields").
In conjunction with that, I do have some keyword and red-flag filtering on the back end. But, no, I don't use any moderation - it either works or it doesn't. If I find comments that I consider spam, I can delete them manually.
Ahh, well my reasoning behind sending things to a moderation bin if they get flagged is two-fold - 1 it's to prevent false positives from frustrating users who want to participate and 2 to reduce the amount of spam that subscribers get in their email via the comment notifications. I don't worry too much about the spam I receive in my email, but I know others do.
Thanks for the clarification btw. :)
That is understandable. My spam blocking has been fairly good so far. I'll get someone coming in from time to time who will get a few posts by it, but I update my filtering when that happens. My spam filtering is a neural-net processor - a learning computer - the more contact it has with humans, the more it learns. Oh wait, maybe I am getting confused with the Terminator??
If someone does fail to submit a "valid" post, it will tell them something like, "There were errors trying to submit your comment". It can be frustrating, but it gives them a chance to clean it up. I try not to be too informative as I don't want people trying to get around it.