My new anti-spam ColdFusion solution has been in place since Friday and I have already gotten about 15 or 20 spam comments posted to my site. Sorry if anyone was offended by the steroid or penis enlargement emails they may have gotten. My bad. Even though my solution is not working, I thought I would share it with you as it points out how freakin' good these spam bots really are!
This was a time-based solution, meant to use the spam bots' own speed against them. It was also built so that a form stays active for only a short period of time. These are the requirements that I put in place: a form cannot be submitted until at least 10 seconds after it is rendered, and it expires 7 minutes after it is rendered.
The 7-minute limit was to ensure that spam bots were not storing form data and then submitting it later. The 10-second minimum was to ensure that a spam bot couldn't just load the page and immediately submit data. As clever as I thought this was, it is not working. On top of the time constraints, all of the spam markers were encrypted so that they could not be tampered with (easily).
OK, so here's how it was done. At the bottom of every action page (before the form was rendered), I generated a random anti-spam key and used it to encrypt the time constraints.
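For readers outside ColdFusion, here is a rough Python analogue of this step, not the original code. One deliberate swap: instead of encrypting the timestamps with the random key, it signs the key and both timestamps with an HMAC keyed by a server secret, which serves the same goal (values that cannot be altered without detection) using only the standard library. `SERVER_SECRET`, the field names, and `make_antispam_fields` are hypothetical.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical server-side secret; in practice load it from configuration.
SERVER_SECRET = b"change-me"

MIN_DELAY = 10       # seconds the form must be open before it can be posted
MAX_AGE = 7 * 60     # seconds after rendering at which the form expires


def make_antispam_fields(secret=SERVER_SECRET, now=None):
    """Build the hidden-field values for one page load.

    The values are signed with HMAC-SHA256 (a swap for the original
    encryption), so changing any of them invalidates the signature.
    """
    now = int(time.time() if now is None else now)
    key = secrets.token_hex(16)          # new random key on every page load
    not_before = now + MIN_DELAY
    not_after = now + MAX_AGE
    message = f"{key}|{not_before}|{not_after}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {
        "spam_key": key,
        "spam_not_before": str(not_before),
        "spam_not_after": str(not_after),
        "spam_sig": signature,
    }


fields = make_antispam_fields(now=1_171_200_000)
print(fields["spam_not_before"], fields["spam_not_after"])
# 1171200010 1171200420
```

The sketch carries a fourth value, the signature, alongside the key and the two timestamps, because HMAC verification needs it on the way back in.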
At this point, we have our two timestamps and our random spam key encrypted and ready for use. In the page output, at the top of the form, I include these three values as hidden form fields.
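Rendering those values is just a matter of emitting hidden inputs. A minimal Python sketch, assuming the values arrive as a dict of name/value strings; the helper name is hypothetical:

```python
from html import escape


def render_hidden_fields(fields):
    """Return one <input type="hidden"> tag per name/value pair.

    Names and values are HTML-escaped so a stray quote cannot break the markup.
    """
    tags = []
    for name, value in fields.items():
        tags.append(
            f'<input type="hidden" name="{escape(name)}" value="{escape(str(value))}" />'
        )
    return "\n".join(tags)


print(render_hidden_fields({"spam_key": "ab12", "spam_sig": 'x"y'}))
# <input type="hidden" name="spam_key" value="ab12" />
# <input type="hidden" name="spam_sig" value="x&quot;y" />
```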
Once the form has been submitted, we need to check whether the time constraints have been violated.
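A matching Python sketch of the server-side check, again with HMAC verification standing in for decryption. It collects messages in a list, much as REQUEST.FormErrors collects them, and an empty list means the post passes. `SERVER_SECRET` and the field names are hypothetical and must match whatever produced the form.

```python
import hashlib
import hmac
import time

SERVER_SECRET = b"change-me"  # hypothetical; must match the value used when rendering


def check_antispam_fields(form, secret=SERVER_SECRET, now=None):
    """Return a list of error messages; an empty list means the post passes."""
    now = int(time.time() if now is None else now)
    try:
        key = form["spam_key"]
        not_before = form["spam_not_before"]
        not_after = form["spam_not_after"]
        sig = form["spam_sig"]
    except KeyError:
        return ["Anti-spam fields are missing."]

    message = f"{key}|{not_before}|{not_after}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the signature
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return ["Anti-spam fields have been tampered with."]

    errors = []
    # Safe to convert: a valid signature means these are our own digit strings.
    if now < int(not_before):
        errors.append("The form was submitted too quickly.")
    if now > int(not_after):
        errors.append("The form has expired; please submit it again.")
    return errors
```

Nothing in this check stops the same valid form from being posted repeatedly inside its window; that is the gap AL raises in the comments.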
If the form errors collection, REQUEST.FormErrors, contains any messages, the form is not processed; instead, it is re-rendered with the error messages displayed.
Now, I think this is pretty clever, and I am a bit shocked that spam messages are making it through. But seeing as they are getting through, what does this say about the spam bots?
Perhaps the spam bot can figure out how to decrypt my spam key and then generate its own timestamps. However, since my spam key is randomly generated for every single page load, I don't see how this could possibly be done. How can you figure out how to decrypt a randomly generated number? I suppose if they found patterns in the timestamps it could be done, but I am no cryptographer.
If it's not a matter of decryption, then the spam bot must not be submitting the form instantly. If the bot submitted immediately via some sort of automated post (such as CFHttp), it would probably violate the 10-second minimum. If it stored the form and submitted it later, it would probably violate the 7-minute limit. So some spam bots must either actually load the page in a browser and wait before submitting it, or wait some time before submitting it via CFHttp (or something similar).
If that is the case, that is one smart bot! It looks like I am going to have to go back and add some hidden fields to try and get the spam bot to slip up and submit data that a human would NOT submit. Damn you spam bots, DAMN YOU!
I am not quite ready to give up. I am also not quite ready to rely on an outside anti-spamming service. It may come to that eventually, but I will not go quietly into that night.
Yikes! New Anti-Spam Technique Not Working!
That is a brilliant solution!
It is fascinating that it didn't work.
I'm curious to see what other solutions are possible without CAPTCHA or similar.
Posted by Steve Bryant on Feb 11, 2007 at 10:21 AM
Steve,
Thanks for liking it. I am also a bit surprised that it didn't work. However, I do believe that all the spam messages that have come in this weekend came from one or two bots (I don't think it's a massive attack from all spam bots).
Furthermore, there doesn't seem to be any regularity to it. A few come in at a single time (span of a few minutes) and then nothing for a few hours. Not sure what the spamming strategy is these days.
So, it looks like it is stopping most spam, but not all. I will keep tinkering.
Posted by Ben Nadel on Feb 11, 2007 at 10:25 AM
I'm using Jake Munson's CFFormProtect, which implements this along with some of the other techniques you've talked about. It is blocking spam nicely - I've yet to have a false positive, and I've yet to have spam make it through.
If I recall correctly, you were checking for keyboard input, right? And someone pointed out that some people use autofill? Well, that may be fine for checkout procedures, but how would they autofill the comments?
Posted by Sammy Larbi on Feb 11, 2007 at 11:15 AM
If anything, you can have failures emailed to you and manually add in legitimate comments.
Posted by Sammy Larbi on Feb 11, 2007 at 11:17 AM
Sammy,
I will check out Jake's solution. I remember him posting about it. As far as autofill goes, I don't know much about it since I never use it. In my previous solution, however, there were hidden fields with names that contained keywords like "email" and "name". I suspect that some autofill functionality matches not just full field names but partial field names?
Posted by Ben Nadel on Feb 11, 2007 at 11:22 AM
Oh, I see. I thought (remembered incorrectly) you were worried about autofilling everything and not seeing the keyboard being pressed.
I guess the solution to autofilling hidden fields might be to name the fields something autofill wouldn't get. Of course, I don't know how effective that would be.
Posted by Sammy Larbi on Feb 11, 2007 at 11:38 AM
Yeah, I am gonna try to get a hidden field in there (in addition to the current solution). My only concern is that if the hidden field is named too strangely, then the spam bots won't try to fill it in. Of course, we don't know till we know, right?
Posted by Ben Nadel on Feb 11, 2007 at 11:43 AM
Ben,
Can you give an indication of how the form is processed server side? You encrypt an anti-spam key, but you haven't said how your backend checks for the valid submission.
For example, a bot could download the form and then submit *the same form* once every minute. The first submission fails since it is within the first 10 seconds. Then 5 should get through, and the next and all after would exceed the 7-minute mark. Does your solution somehow invalidate the anti-spam key on each submission? Maybe I missed something in your description...
Posted by AL on Feb 11, 2007 at 11:48 AM
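One way to close the replay gap AL describes (and the "storing used keys (at least temporarily)" idea Ben mentions in reply) is to remember each key once it has been accepted. A minimal in-memory Python sketch; `UsedKeyStore` is hypothetical, and a site running on several servers would need shared storage instead of a per-process dict.

```python
import time


class UsedKeyStore:
    """Remember anti-spam keys that have already been accepted.

    Keys are forgotten once they are older than the form's lifetime:
    by then any form carrying them fails the expiry check anyway.
    """

    def __init__(self, lifetime=7 * 60):
        self.lifetime = lifetime
        self._seen = {}  # key -> time it was first accepted

    def _purge(self, now):
        cutoff = now - self.lifetime
        for key in [k for k, t in self._seen.items() if t < cutoff]:
            del self._seen[key]

    def claim(self, key, now=None):
        """Return True the first time a key is seen, False on any replay."""
        now = time.time() if now is None else now
        self._purge(now)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

A handler would call `claim(key)` only after the signature and time checks pass, and reject the post whenever it returns False.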
The bots accessing my sites must be stupid because they can't even figure out where to send the form when presented with the following code:
<form action="" onsubmit="this.action='guest'+'book.cfm';">
Of course the down-side here is that the user needs javascript turned on to be able to post comments when using this method.
Posted by Per Djurner on Feb 11, 2007 at 2:38 PM
Al covered my point. I've done this before too and noticed very quickly how spam came in bursts for the window of a form's active state. If you log the failures or session-ban them based on invalid submissions, it might help. But it might not if they are using a proxy (or many proxies). I often notice the same spam in series from different IP addresses.
But you should count on screen-scraping bots. If all you analyze is the form and the submission itself, then you are giving them everything they need to post. I've since added a session-based key, which I create on a per-article basis and remove from the session struct once a comment is successfully submitted. One might argue that it restricts users who don't have cookies enabled, but most people only disable regular cookies, not in-memory cookies. If you disable in-memory cookies, a lot of stuff stops working on the web. That was a rare sacrifice I was willing to make. I still get spam though, which is why I'm reading your experiences and others'.
PS - the "must be active for 10 seconds" rule sucks when it takes you longer than 7 minutes to type your message, you are redirected back to the form with a validation error, and you hit submit immediately.
Posted by Doug on Feb 11, 2007 at 4:04 PM
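Doug's per-article session key can be sketched the same way. Here a plain dict stands in for the session scope, and the function names are hypothetical; the key is checked on submit and discarded only after the comment is saved, as he describes.

```python
import secrets


def issue_comment_key(session, article_id):
    """Return the session's key for this article, creating one if needed."""
    keys = session.setdefault("comment_keys", {})
    if article_id not in keys:
        keys[article_id] = secrets.token_hex(16)
    return keys[article_id]


def is_valid_comment_key(session, article_id, submitted_key):
    """True only if the posted key matches the one stored for this article."""
    expected = session.get("comment_keys", {}).get(article_id)
    if expected is None:
        return False
    return secrets.compare_digest(expected.encode(), submitted_key.encode())


def consume_comment_key(session, article_id):
    """Forget the key once the comment has been saved successfully."""
    session.get("comment_keys", {}).pop(article_id, None)
```

Because the key lives server-side in the session, a bot that scrapes the form still needs the matching session cookie, which is the point of Doug's approach.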
There's another possibility - it might not be a bot.
There are wankers who think it's fun or worthwhile to submit stuff for the hell of it, or else they're really low-paid cottage-industry workers spamming forms everywhere.
I know there are bots out there - I fight them a lot too, and I have reduced the incidence on a radio station site I look after, but I can't eliminate it completely. And the postings I get are slightly different each time (which is giving me heartburn as I look for a pattern to block). Since they're slightly different each time, I'm inclined to believe there are banks of people in some sweatshop somewhere cruising the net and posting stuff.
What they gain from it, I'm at a loss to understand, because our guestbooks are moderated and none of it ever appears on the web.
Posted by Mike Kear on Feb 11, 2007 at 7:57 PM
Ben,
Recently, I have started using my own home-grown spam control solution and have been quite successful.
I just check each form field and make sure the values are not duplicates. Usually spam has the same info in more than one form field.
Second, I check each field value against Ray's spam word list (ray.camdenfamily.com / tbspam.cfm), and if I find more than one, I mark it as a spam submission.
It's very basic, but it does the job.
Posted by Pragnesh Vaghela on Feb 11, 2007 at 10:23 PM
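The two checks Pragnesh describes translate to a few lines. A minimal Python sketch; `spam_words` stands in for a list such as Ray's, and the plain substring match is an assumption that can misfire on innocent words that happen to contain a listed term.

```python
def looks_like_spam(form, spam_words):
    """Flag a submission using two simple heuristics:

    1. two or more non-empty fields carry the same value, or
    2. more than one distinct spam word appears across the fields.
    """
    values = [v.strip().lower() for v in form.values() if v.strip()]
    if len(values) != len(set(values)):
        return True

    # Naive substring match; a real list would need word-boundary handling.
    found = {w.lower() for w in spam_words for v in values if w.lower() in v}
    return len(found) > 1
```

Blank fields are skipped so that two empty optional inputs don't count as duplicates.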
@Al,
I am not doing any checking in that respect. Since the key is randomly generated for every form, I do not store the keys. However, I like the idea of storing used keys (at least temporarily). I just wanted to try and stay away from any persistent scope usage.
@Per Djurner,
That sounds good, but like you said, Javascript is required. I am just trying to find a non-Javascript way.
@Doug,
Good point with the in-memory cookie storage. I will take that into account. And yeah, I know that the 10 second rule sucks when it takes a while to enter a post. I figured most posts wouldn't take 7 minutes to write, but I have run into it several times already (probably with this comment as well).
@Mike,
Interesting point. It's funny that you say that because there are many spam comments that come through that don't even have content. No links or anything... just random characters. What the heck kind of spam is that?!??! There's no point to it. It's like some sort of psychological warfare :)
@Pragnesh,
Nice tip on the checking for duplicate values. That is a very cool idea!
Posted by Ben Nadel on Feb 12, 2007 at 7:38 AM
I have tried checking for duplicates and that has worked very well (in fact, the custom tags I use for forms reject any form where more than half of the values are the same - assuming enough fields).
This doesn't work, however, when I have a single field form (like search or "sign up for newsletter").
I would love to see an approach that would work for these as well.
Additionally, I expect that the "check for duplicates" approach will be circumvented pretty soon.
Posted by Steve Bryant on Feb 12, 2007 at 9:51 AM
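Steve's looser rule (more than half of the values identical, and only when there are enough fields) is also easy to express. In this Python sketch the cutoff of four fields is an assumed value, since he doesn't give one, and blank fields are ignored so that several empty optional inputs don't count as duplicates.

```python
from collections import Counter


def mostly_duplicates(form, min_fields=4):
    """True when more than half of the non-blank values are identical.

    Forms with fewer than min_fields non-blank values are not judged,
    since a search box or newsletter sign-up gives the rule nothing to compare.
    """
    values = [v.strip().lower() for v in form.values() if v.strip()]
    if len(values) < min_fields:
        return False
    _, top_count = Counter(values).most_common(1)[0]
    return top_count > len(values) / 2
```

The early return also shows the limitation Steve points out: single-field forms pass through unchecked.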
Ben,
After I made you aware of the auto-fill problem in your previous solution, I changed some of my hidden field names to avoid it: 'contact_email' and 'contact_email2' became 'MESSAGE_341' and 'MESSAGE_342'. (The numbers are random. I didn't use 'Message' and 'Message2' because I use this for many different forms, and 'message' is a field name I use pretty often.) It has been over a week, and after hundreds of spam attempts I'm happy to report that not one spam message has gotten through. I find that most of the time they do not attempt to fill the two message fields I created. I think (I hope) the reason is that my form already has a 'message' field, which they would rather fill. They get caught because they fill in the 'contact_url' field.
Posted by Shloime Henig on Feb 12, 2007 at 12:32 PM
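The honeypot approach Shloime describes comes down to one rule: fields a person never fills in must arrive empty. A minimal Python sketch using the field names from his comment; how the fields are hidden from people (for example with CSS) is left out, and the function name is hypothetical.

```python
# Field names taken from the comment above; a human never sees these
# inputs, so any value in them points to an automated submitter.
HONEYPOT_FIELDS = ("MESSAGE_341", "MESSAGE_342", "contact_url")


def tripped_honeypot(form, honeypots=HONEYPOT_FIELDS):
    """Return True if any honeypot field arrived with a non-blank value."""
    return any(form.get(name, "").strip() for name in honeypots)
```

Names like these are unlikely to match a browser's autofill heuristics, which addresses the false-positive worry Ben raises earlier in the thread.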
@Shloime,
Yeah, I ended up adding back in a few hidden fields that shouldn't be used by an auto-form-filling app. We will see what works.
Posted by Ben Nadel on Feb 12, 2007 at 3:40 PM
Hi Ben,
I was unaware of your post here when I posted my own solution to this problem yesterday. In my comments, Steve Bryant pointed out that you had done something similar.
My method is different in 2 ways.
1. I use JavaScript to change the location the form is pointing to. I make it clear that this won't work if your site must be compliant. (This is similar to Per's solution above.)
2. Because some bots might have cached the form, I generate a hash that represents the current day; if this doesn't match on the server side, the form isn't processed.
So far, the 4 sites we've implemented this on have gone from hundreds of form spams per week to zero. If I need my form to validate, I'll just resort to using a CAPTCHA.
My solution is here: http://www.cftopper.com/index.cfm?blogpostid=155
Best Regards,
Topper
Posted by Topper on Feb 13, 2007 at 7:02 AM
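Topper's daily hash can be sketched with an HMAC over the date, so a bot can't compute it without the server's secret. In this Python sketch `SERVER_SECRET` and the function names are hypothetical, and accepting yesterday's token as well is an added assumption (not from his description) so that a form loaded just before midnight still posts.

```python
import datetime
import hashlib
import hmac

# Hypothetical secret; without one, anyone could compute the day's hash.
SERVER_SECRET = b"change-me"


def day_token(day, secret=SERVER_SECRET):
    """HMAC of the ISO date, embedded in the form as a hidden field."""
    return hmac.new(secret, day.isoformat().encode(), hashlib.sha256).hexdigest()


def is_current_day_token(token, today=None, secret=SERVER_SECRET):
    """Accept today's token, plus yesterday's as a grace period (assumption)."""
    if today is None:
        today = datetime.date.today()
    candidates = (today, today - datetime.timedelta(days=1))
    return any(
        hmac.compare_digest(day_token(d, secret).encode(), token.encode())
        for d in candidates
    )
```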
@Topper,
That looks good. I am hesitant to go the Javascript route for the time being, but that might just be the best of all solutions. In all seriousness, who doesn't have Javascript enabled?
Posted by Ben Nadel on Feb 13, 2007 at 7:39 AM