Who here hasn't had to validate an email address format at some point? Heck, I do that in practically every application that I build. And why do I do it? The truth is, I don't really care if people mess up entering their own email address. I mean sure, I validate that. But really, my main concern is NOT having the ColdFusion page request crap out. If you try to send mail through the CFMail tag with a badly formatted email address, it will throw an error.
Sometimes my email validation is not perfect, and I end up rejecting some email addresses that are actually valid. This got me thinking: what is a valid email address? Or rather, what does ColdFusion think a valid email address is? There are two ways to look at this: which addresses the CFMail tag will accept without throwing an error, and which addresses the IsValid() function considers valid.
To test this, I set up an array of email addresses and then tried sending mail to each one. In the example below, you will notice that I use ".ben" as the top-level domain a lot. That is because I don't actually want to send emails to valid addresses; I just want to see whether the send action works (yes, I did get several hundred undeliverable emails).
The results are kind of surprising. I was a bit shocked by how many emails can actually be sent through CFMail with completely horrible email addresses. Here are the results (I have modified the table for display):
As you can see, only THREE emails crashed the ColdFusion CFMail tag; it has hardly any issues. The salmon rows are the rows where IsValid() and CFMail disagree about what is a valid email address. Very interesting indeed. Honestly, I think my email validation can be MUCH MUCH simpler (if I only care about the CFMail tag crashing). I guess I could start using IsValid(), but then again, while I am no expert on email address formatting, IsValid() seems very relaxed about some of the addresses above.
Comments (8)
This would be a good enhancement request for Damon Cooper to consider for Scorpio. You can contact him, or I can file an enhancement request if you'd like. Great work, by the way.
Posted by Sami Hoda on Sep 12, 2006 at 9:00 PM
Sami,
I just emailed Damon. But I am a little fish and I never hear back from anyone. Feel free to get in contact with him if you think it will help things along.
Thanks!
Posted by Ben Nadel on Sep 12, 2006 at 10:44 PM
Perhaps more information will help.
The IsValid() function uses the following regular expression to determine if the email is valid:
^[a-zA-Z0-9-'\+~]+(\.[a-zA-Z0-9-'\+~]+)*@([a-zA-Z_0-9-]+\.)+[a-zA-Z]{2,7}$
The CFMail tag uses the parse() method of the Sun Java class javax.mail.internet.InternetAddress. Since the implementation uses JavaMail, this is how we generate the InternetAddress objects that we pass in for the addresses (to, from, cc, etc.).
The "strict" argument of parse() is set to true. The JavaDoc says of this:
"Parse the given sequence of addresses into InternetAddress objects. If strict is false, simple email addresses separated by spaces are also allowed. If strict is true, many (but not all) of the RFC822 syntax rules are enforced. In particular, even if strict is true, addresses composed of simple names (with no "@domain" part) are allowed. Such "illegal" addresses are not uncommon in real messages.
Non-strict parsing is typically used when parsing a list of mail addresses entered by a human. Strict parsing is typically used when parsing address headers in mail messages"
See the JavaDoc at http://java.sun.com/products/javamail/javadocs/javax/mail/internet/InternetAddress.html
Hope that clears it up for you.
Posted by Tom Jordahl on Sep 13, 2006 at 1:18 AM
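To see the pattern Tom posted in action outside of ColdFusion, here is a minimal Java sketch that applies it with java.util.regex. ColdFusion runs on Java, so the regex syntax carries over. The class and method names are hypothetical. The sketch assumes IsValid() requires the whole string to match, which the ^ and $ anchors imply. It is not ColdFusion's actual implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper: applies the IsValid("email") pattern as quoted in
// Tom's comment. This is NOT ColdFusion's internal code, just the same regex.
public class IsValidEmailSketch {
    // Pattern copied from the comment; backslashes doubled for a Java string literal.
    private static final Pattern EMAIL = Pattern.compile(
        "^[a-zA-Z0-9-'\\+~]+(\\.[a-zA-Z0-9-'\\+~]+)*@([a-zA-Z_0-9-]+\\.)+[a-zA-Z]{2,7}$");

    // True only if the entire address matches the pattern.
    public static boolean isValidEmail(String address) {
        return EMAIL.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("ben@domain.ben"));       // true
        System.out.println(isValidEmail("ben.nadel@domain.ben")); // true
        System.out.println(isValidEmail("ben@@domain.ben"));      // false: second @ cannot start a label
        System.out.println(isValidEmail("ben@domain"));           // false: no dot before the TLD
    }
}
```

Note that ".ben" passes simply because the TLD part only requires 2 to 7 ASCII letters, which is why the test addresses in the post are accepted.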
i've done some work w/javamail & what tom says is true (of course). even strict parsing passes a lot of addresses some folks would consider "bad", though by the same definition so does the RFC.
i guess complain to sun (http://java.sun.com/products/javamail/index.jsp). better yet, if you get on the javamail list, you can complain directly to bill shannon.
Posted by PaulH on Sep 13, 2006 at 3:20 AM
Hey guys, I really appreciate the information. I am not familiar at all with the javax classes. I see that I can create them using CreateObject(). That is kind of cool.
But please, I don't want to be misunderstood. I wasn't attacking email validation, and I DON'T want to complain to anyone. It's not important to me that some less-than-ideal addresses get through. As I said, I just don't want the page to crash, and from what I can see, I can relax my email validation a bit. That was really my main point.
But again, thanks for all the feedback.
Posted by Ben Nadel on Sep 13, 2006 at 7:26 AM
This is great information. Thanks for posting this, Ben and Tom.
However, it confirms my fears about using ColdFusion's email address validation (through isValid() or cfparam), since the regex Tom posted is a little crazy, IMO (despite my acceptance that an email validation regex should not follow RFC 822 to the letter). For example, it allows underscores in the hostname (which technically makes it an invalid domain name), and yet there's no support for internationalized domain names (or usernames). And what's with the seemingly arbitrary seven-character top-level domain cap, when the longest official TLDs (.museum and .travel) are six characters? Is it trying to support reserved TLDs like .example and .invalid? If so, what about .localhost, which is nine characters? There are a number of other issues I could point out as well.
In any case, that regex (and any others which control validation provided by isValid/cfparam) should be in the LiveDocs.
Posted by Steve on Apr 17, 2007 at 1:41 PM
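Steve's points can be checked directly against the same pattern. Below is a minimal Java sketch. It assumes the pattern Tom posted is exactly what IsValid("email") uses. The class name is hypothetical, and the addresses are made-up probes rather than real mailboxes.

```java
import java.util.regex.Pattern;

// Hypothetical probe of the quoted IsValid() pattern against the cases Steve raises.
public class TldCapProbe {
    private static final Pattern EMAIL = Pattern.compile(
        "^[a-zA-Z0-9-'\\+~]+(\\.[a-zA-Z0-9-'\\+~]+)*@([a-zA-Z_0-9-]+\\.)+[a-zA-Z]{2,7}$");

    public static boolean matches(String address) {
        return EMAIL.matcher(address).matches();
    }

    public static void main(String[] args) {
        // Underscore in a host label is accepted by [a-zA-Z_0-9-].
        System.out.println(matches("user@my_host.com"));      // true
        // Seven-letter reserved TLDs fit under the {2,7} cap.
        System.out.println(matches("user@test.example"));     // true
        System.out.println(matches("user@test.invalid"));     // true
        // "localhost" has nine letters, so it exceeds the cap.
        System.out.println(matches("user@test.localhost"));   // false
        // A non-ASCII (internationalized) label is outside [a-zA-Z_0-9-].
        System.out.println(matches("user@b\u00fccher.de"));   // false
    }
}
```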
Ahh, the email validation is not actually correct for many of those entries. See http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx
I recently got a support request for an email that looks like
name&name@domain.com
On research, the "&" actually appears to be valid, although it is commonly thought to be invalid.
This is causing some problems.
Posted by Stephen Cassady on Dec 6, 2007 at 3:14 PM
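On Stephen's "&" question: RFC 2822 (section 3.2.4) lists "&" among the "atext" characters allowed in an unquoted local part. The other atext characters are letters, digits, and symbols such as ! # $ % ' * + - / = ? ^ _ { | } ~. Below is a minimal Java sketch of a dot-atom local-part check. It assumes the local part is unquoted; quoted strings and comments are not handled. The class and method names are hypothetical.

```java
// Hypothetical sketch of an RFC 2822 dot-atom check for an unquoted local part.
public class DotAtomSketch {
    // Non-alphanumeric "atext" characters from RFC 2822 section 3.2.4.
    private static final String ATEXT_SPECIALS = "!#$%&'*+-/=?^_`{|}~";

    public static boolean isAtext(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
            || ATEXT_SPECIALS.indexOf(c) >= 0;
    }

    // dot-atom-text = 1*atext *("." 1*atext): no leading, trailing, or doubled dots.
    public static boolean isDotAtom(String local) {
        if (local.isEmpty() || local.startsWith(".") || local.endsWith(".") || local.contains("..")) {
            return false;
        }
        for (char c : local.toCharArray()) {
            if (c != '.' && !isAtext(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isDotAtom("name&name"));  // true: & is atext
        System.out.println(isDotAtom("name.name"));  // true
        System.out.println(isDotAtom("name..name")); // false: doubled dot
        System.out.println(isDotAtom("name name"));  // false: space is not atext
    }
}
```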
@Stephen,
I just recently heard about a big problem caused by the & character. It involved something major, like New York Times email addresses. I can't remember where I heard it.
Just tried running this:
#IsValid( "email", "name&name@domain.com" )#
... and it returns NOT valid.
Posted by Ben Nadel on Dec 10, 2007 at 8:14 AM
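The NOT-valid result follows from the local-part character class in the pattern Tom posted, [a-zA-Z0-9-'\+~], which has no "&". Here is a small, hypothetical Java diagnostic, not part of ColdFusion, that reports the first local-part character that class would reject. It assumes that pattern is what IsValid() uses.

```java
// Hypothetical diagnostic based on the local-part class of the quoted IsValid() pattern.
public class LocalPartDiagnostic {
    // Non-alphanumeric characters allowed by [a-zA-Z0-9-'\+~].
    private static final String ISVALID_EXTRAS = "-'+~";

    public static boolean allowedByIsValid(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
            || ISVALID_EXTRAS.indexOf(c) >= 0;
    }

    // Returns the first local-part character the class rejects (dots are
    // treated as separators, as in the pattern), or -1 if all are accepted.
    public static int firstRejectedChar(String address) {
        int at = address.lastIndexOf('@');
        String local = at >= 0 ? address.substring(0, at) : address;
        for (char c : local.toCharArray()) {
            if (c != '.' && !allowedByIsValid(c)) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int c = firstRejectedChar("name&name@domain.com");
        System.out.println(c == -1 ? "all accepted" : "rejected: " + (char) c); // rejected: &
    }
}
```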