A couple of months ago, after getting deluged with spam comments, I finally broke down and added comment moderation to my ColdFusion blog. My initial implementation included an email that I would receive with embedded action-links to "Approve Comment" and "Reject Comment". This worked great for me; but, I noticed that I would occasionally see a 404 Not Found error in my logs pointing to one of these moderation URLs.
At first, I thought it was probably just user-error on my part (perhaps double-clicking one of the URLs). But, upon looking into the logs, I saw that the requests were all coming from cache.google.com. It seems that Google (via Gmail) was - at least sometimes - attempting to spider URLs embedded within my emails.
If the URLs pointed to innocuous things, like a blog post, I wouldn't give it a second thought. But, the URLs showing up in my error logs were for destructive actions. If Google attempted to proactively spider one of my "Reject Comment" action-links, it could have, theoretically, removed legitimate content from my blog.
NOTE: I see no evidence that it ever tried to spider a comment-moderation URL before I consumed it. However, it's clear now that such a behavior may be possible. In fact, if you look at this Google Support thread - Gmail is opening and caching urls within emails without user intervention. How and why? - you can see people complaining that Google was actually consuming "unsubscribe links" and "one-time password links". Yikes!
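Fundamentally, my strategy for comment-moderation was flawed: I was using a GET method, embedded within my email, to perform a destructive action. A GET request is supposed to be "safe", meaning that it only reads data; so, any client (including a spider) is free to make one without asking. All destructive actions should really be performed behind something like a POST or DELETE method. This is proper web etiquette.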
To move my comment moderation away from a GET request, I've removed the direct action-links from my email and am now, instead, pointing to a web-based form. This form must then be submitted via POST in order for the comment moderation action to be applied.
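To make the GET-versus-POST split concrete, here is a minimal sketch of the idea in Python (not the ColdFusion code that actually runs this blog). The names CommentStore and handle_moderation are hypothetical, and the in-memory dictionaries are only a stand-in for real storage. The point is that a GET only renders the form, while the comment is changed only on POST:

```python
from dataclasses import dataclass, field


@dataclass
class CommentStore:
    """Hypothetical in-memory stand-in for pending-comment storage."""
    pending: dict = field(default_factory=dict)
    approved: dict = field(default_factory=dict)

    def apply(self, comment_id: int, action: str) -> None:
        # Remove the comment from the pending queue either way.
        text = self.pending.pop(comment_id)
        if action == "approve":
            self.approved[comment_id] = text
        # A "reject" simply discards the pending comment.


def handle_moderation(store, method, comment_id, action=None):
    """Return an (HTTP status, body) pair for a moderation request."""
    if comment_id not in store.pending:
        # Already moderated (or never existed).
        return 404, "Not Found"
    if method == "GET":
        # Safe request: render the confirmation form and change nothing.
        return 200, f"<form method='post'>Moderate comment {comment_id}</form>"
    if method == "POST":
        if action not in ("approve", "reject"):
            return 400, "Bad Request"
        # Only an explicit form submission mutates state.
        store.apply(comment_id, action)
        return 200, f"Comment {comment_id}: {action} applied"
    return 405, "Method Not Allowed"
```

With this shape, a spider that follows the emailed link only ever triggers the GET branch, so the pending comment is left untouched until the form is actually submitted.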
It's a small inconvenience for me because I now need to perform two actions for each moderation: clicking a link to the form and then submitting the form. But, in the long run, this is a more "correct" workflow for this type of request.
Here's another link where this was being discussed on Hacker News.
Just wondering why your comment moderation links weren't in a secured area of your site? Wouldn't you want to protect them from non-logged-in users too?
Great question! The links weren't behind a login per se; but, the URLs were signed and then validated on the server. So, it's not like anyone could guess the comment moderation links - only the person receiving the email could use them (and they only work for a given pending comment - it wasn't a magic-link sign-in or anything).
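For anyone curious what "signed and then validated" can look like, here is a rough sketch in Python. It is not the blog's actual ColdFusion implementation: the HMAC-SHA256 approach, the SECRET_KEY value, the URL shape, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real one would be long, random, and private.
SECRET_KEY = b"replace-with-a-long-random-secret"


def sign(comment_id: int) -> str:
    """Compute a hex HMAC-SHA256 signature over the comment ID."""
    message = str(comment_id).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def build_moderation_url(comment_id: int) -> str:
    """Build the emailed link (hypothetical URL shape) with its signature."""
    return (
        "https://example.com/moderate"
        f"?commentID={comment_id}&signature={sign(comment_id)}"
    )


def is_valid(comment_id: int, signature: str) -> bool:
    """Recompute the signature on the server and compare in constant time."""
    expected = sign(comment_id).encode("utf-8")
    return hmac.compare_digest(expected, signature.encode("utf-8"))
```

Because the signature covers the comment ID, a link issued for one pending comment does not validate for any other comment. Signing only stops people from guessing links, though; it does nothing to stop a spider that has already been handed a valid link, which is why the destructive action still belongs behind a POST.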