Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

Ask Ben: Extracting Parts Of An Email Address In Javascript

By Ben Nadel on

How can I use Javascript to grab the different parts of an email address that someone enters in a form?

Grabbing parts of the email address all comes down to understanding how email addresses are formed. This is different from validating an email address, but related. Email addresses are made up of three parts: the name, the domain, and the extension. Forgive me if those are not the "official" names.

If you take the email address "hot.n.sexy@college.girls.com" for example, the parts are as follows:

name: hot.n.sexy
domain: college.girls
extension: com

Emails are always in this format; the name and the domain are separated by a "@" and the domain and the extension are separated by a ".". Now, if you have been following my blog, you will know that I love regular expressions and that's exactly how we are going to do this. Let's make a function that takes an email address and returns a structure with the three parts of the email:

    function GetEmailParts( strEmail ){
        // Set up a default structure with null values
        // in case our email matching fails.
        var objParts = {
            user: null,
            domain: null,
            ext: null
        };

        // Get the parts of the email address by leveraging
        // the String::replace method. Notice that we are
        // matching on the whole string using ^...$ notation.
        strEmail.replace(
            new RegExp( "^(.+)@(.+)\\.(\\w+)$" , "i" ),

            // Send the match to the sub-function.
            function( $0, $1, $2, $3 ){
                objParts.user = $1;
                objParts.domain = $2;
                objParts.ext = $3;
            }
        );

        // Return the "potentially" updated parts structure.
        return( objParts );
    }

In the above, we are matching the whole email address and grouping the three parts in a regular expression. Those parts are then passed to the sub-function at the point of matching: $0 is the entire matched email, $1 is the name, $2 is the domain, and $3 is the extension. If the pattern is not matched, then the sub-function never runs, the "objParts" object never gets updated, and the function returns it with all nulls.

Note: This does NOT validate an email address. This merely looks for a very simple form of email address.
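
As a rough sketch of the same idea, the parts can also be pulled out with String.match() instead of calling replace() for its side effect. The function name getEmailParts and the sample addresses below are made up for illustration; the pattern is the same one used in the function above.

```javascript
// Hypothetical variant of the function above, using match() instead of replace().
function getEmailParts(email) {
  // Default structure with null values in case the pattern does not match.
  const parts = { user: null, domain: null, ext: null };

  // Same pattern as above: everything before "@", then everything up to
  // the last ".", then the trailing word characters.
  const match = String(email).match(/^(.+)@(.+)\.(\w+)$/);

  if (match) {
    parts.user = match[1];
    parts.domain = match[2];
    parts.ext = match[3];
  }

  return parts;
}

console.log(getEmailParts("jane.doe@mail.example.com"));
// { user: 'jane.doe', domain: 'mail.example', ext: 'com' }

console.log(getEmailParts("not-an-email"));
// { user: null, domain: null, ext: null }
```

Because both quantified groups are greedy, the domain runs up to the last "." in the address, which is why "mail.example" ends up in the domain and only "com" in the extension.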



Reader Comments

Although you did disclaim that this is not a good email address validator, note that it will also run slowly when tested against certain input constructions.

The regex can be improved (in both its validation strength and performance) as follows:

new RegExp("^([a-z\\d._%-]+)@((?:[a-z\\d-]+\\.)+)([a-z]{2,6})$", "i")

Or, without using the RegExp constructor:

/^([a-z\d._%-]+)@((?:[a-z\d-]+\.)+)([a-z]{2,6})$/i

Or, if you want to preserve the performance improvements but don't care about its validation strength (note that it would take thousands of characters to write an email validation regex that followed the official RFC 822 standard to the letter):

/^([^@]+)@((?:[^.]+\.)+)(.+)$/i

Or, since I don't know why you'd want to separate the top-level domain (e.g., ".com") from the other segments of the domain name, you could capture just two backreferences as follows:

/^([^@]+)@(.+)$/i

Reply to this Comment

Oops. The "i" (case insensitive operator) is not necessary at the end of the last two regexes, since they don't use any a-z characters.
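
To compare how these patterns split an address, here is a small sketch that runs three of them against the same input. The variable names and sample addresses are invented for illustration. One thing it shows is that, as written, the second group of the first two patterns keeps its trailing dot.

```javascript
// The patterns from the comment above (the "i" flag is dropped where it is not needed).
const strict = /^([a-z\d._%-]+)@((?:[a-z\d-]+\.)+)([a-z]{2,6})$/i;
const loose = /^([^@]+)@((?:[^.]+\.)+)(.+)$/;
const twoPart = /^([^@]+)@(.+)$/;

const sample = "Jane.Doe@mail.example.com";

const strictMatch = strict.exec(sample);
console.log(strictMatch.slice(1));
// [ 'Jane.Doe', 'mail.example.', 'com' ]

const looseMatch = loose.exec(sample);
console.log(looseMatch.slice(1));
// [ 'Jane.Doe', 'mail.example.', 'com' ]

const twoPartMatch = twoPart.exec(sample);
console.log(twoPartMatch.slice(1));
// [ 'Jane.Doe', 'mail.example.com' ]

// The stricter pattern rejects input that the looser one lets through.
const odd = "jane doe@example.com";
console.log(strict.test(odd)); // false
console.log(loose.test(odd)); // true
```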

Reply to this Comment

Steve,

I am a bit confused about the use of non-capturing groups (?:non-capturing). I see in a bunch of your examples that you use it sometimes but not others.

First off, is that something that I should use if I do not need to reference the group (does it affect performance)?

Secondly, does a non-capturing group refer to back references within the regular expression only, or does it also mean that that group will not be returned as a result of the pattern/matching?

Reply to this Comment

I near-religiously use non-capturing groups whenever I do not need to reference a group's contents. There are only three reasons to use capturing groups:

1. You're using parts of the match to construct a replacement string, or otherwise referencing parts of the match in code outside the regex.
2. You need to reuse parts of the match within the regex itself. E.g., (["'])(?:\\\1|.)*?\1 would match values enclosed in either double or single quotes, while requiring that the same quote type start and end the match, and supporting inner, escaped quotes of the same type as the enclosure.
3. You need to test if an optional group was part of the match so far, as the condition to evaluate within a conditional. E.g., (a)?b(?(1)c|d) only matches the values "bd" and "abc".

There are two primary reasons to use non-capturing groups if a grouping doesn't meet one of the above conditions:

• Yes, capturing groups negatively impact performance, since creating backreferences requires that their contents be stored in memory. The performance hit may be tiny, especially when working with small strings, but it's there.
• When you need to use several groupings in a single regex, only some of which you plan to reference later, it's very convenient to have the backreferences you want to use numbered sequentially. E.g., the logic in my parseUri() UDF (http://badassery.blogspot.com/2007/01/parsing-uris-in-coldfusion.html) could not be nearly as simple if I had not made appropriate use of capturing and non-capturing groups within the same regex.
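
The numbering point can be seen in a small JavaScript sketch. The patterns and the sample address are made up for illustration and are not taken from parseUri(). The only difference between the two is whether the repeated inner group captures.

```javascript
const address = "jane@mail.example.com";

// With a capturing inner group, an extra backreference appears in the
// middle and pushes the final group from index 3 to index 4.
const capturing = /^(\w+)@((\w+\.)+)(\w+)$/.exec(address);
console.log(capturing.slice(1));
// [ 'jane', 'mail.example.', 'example.', 'com' ]

// With a non-capturing inner group, the groups we care about stay
// numbered 1, 2, 3.
const nonCapturing = /^(\w+)@((?:\w+\.)+)(\w+)$/.exec(address);
console.log(nonCapturing.slice(1));
// [ 'jane', 'mail.example.', 'com' ]
```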

On a related note, the values of backreferences created using capturing groups with repetition operators on the end of them may not be obvious until you're familiar with how it works. E.g., if you ran the regex (.)* over the string "test", backreference 1 would be "t", not "test". Also, there would be no 2nd, 3rd, or 4th backreferences created for the strings "e", "s", and "t". However, the entire string would of course be used for backreference 0, which always contains the entire match. If you wanted the entire match of a repeated grouping to be captured into a backreference, you could do, e.g., ((?:.)*)
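
To see which iteration of a repeated group is kept, it helps to use a string whose first and last characters differ. This is a minimal sketch for JavaScript's engine, where a repeated capturing group holds only the last thing it matched. The sample string "abcd" is just an illustration.

```javascript
// A repeated capturing group keeps only its final iteration.
const repeated = /(.)*/.exec("abcd");
console.log(repeated[0]); // 'abcd' (backreference 0, the entire match)
console.log(repeated[1]); // 'd'    (only the last character matched by the group)
console.log(repeated.length); // 2  (no extra backreferences for the other iterations)

// Wrapping the repetition in an outer capturing group captures the whole run.
const wrapped = /((?:.)*)/.exec("abcd");
console.log(wrapped[1]); // 'abcd'
```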

Reply to this Comment

[does a non-capturing group refer to back references within the regular expression only, or does it also mean that that group will not be returned as a result of the pattern/matching.]

They're not captured as backreferences, period. Hence, their values are not referenceable in- or outside of the regex.

Reply to this Comment

Steve,

Couple of questions: (["'])(?:\\\1|.)*?\1

I do not follow the \\\1 in the middle group. You said that that was an escaped closing of the same type (group 1). I do not follow. Does that mean that the middle group can have quotes in it? If that is the case, how does the reluctant search in the middle (*?) know when to stop if it can have quotes inside of it? What am I missing?

In the expression: (a)?b(?(1)c|d)

What does the construct ?(1) mean? Is that testing to see if group one was found? I have never seen this before.

Your URI Parser demonstrates flagrant badassery :) I understand the concept, but reading the regular expressions is difficult. Working my way through it though.

Reply to this Comment

[Couple of questions: (["'])(?:\\\1|.)*?\1
I do not follow the \\\1 in the middle group. You said that that was an escaped closing of the same type (group 1). I do not follow.]

Basically, it will correctly match all of the following strings:

"test", 'test', "te\"st", '\'t\'e\'\'\'s\'t', etc.

It allows any number of escaped quotes of the same type as the enclosure. (Due to the way the regex is written, it doesn't need special handling for inner quotes that are not of the same type as the enclosure.)

As for how the regex works, it is similar in construct to the examples I gave on my blog post about regex recursion without balancing groups (http://badassery.blogspot.com/2006/03/regex-recursion-without-balancing.html).

Basically, the inner, lazily-repeated grouping matches escaped quotes OR any single character, with the escaped quote part before the dot in the attempt sequence. So, as it lazily steps through the match looking for the first closing quote, it jumps right past each instance of the two characters which together make up a closing quote.

Final note... if you wanted to support multi-line quotes in libraries without an option to make dots match newlines, change the dot to [\S\s]
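
Here is a small JavaScript sketch of the quoted-string pattern at work. The sample inputs are invented, and String.raw is used only so that the backslashes in the test string are taken literally. The second pattern applies the [\S\s] substitution to a quote that spans two lines.

```javascript
// The quoted-string pattern from above, with the "g" flag to find every quote.
const quoted = /(["'])(?:\\\1|.)*?\1/g;

// String.raw keeps the backslashes exactly as typed.
const input = String.raw`He said "te\"st" and then 'it\'s'.`;
const matches = input.match(quoted);
console.log(matches);
// Two matches: "te\"st" and 'it\'s', each with its enclosing quotes.

// The dot does not match newlines in JavaScript (without the "s" flag),
// so a multi-line quote needs [\S\s] in place of the dot.
const multiLine = "'line one\nline two'";
console.log(multiLine.match(/(["'])(?:\\\1|.)*?\1/)); // null
const spanning = multiLine.match(/(["'])(?:\\\1|[\S\s])*?\1/);
console.log(spanning[0] === multiLine); // true
```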

Reply to this Comment

As for your question about my example of a regex conditional, well, there are two types of conditionals (testing for the presence of an optional capturing group, and testing lookarounds), but either way the construct is like this:

(?(IfCondition)Then|Else)

So, the example goes, if optional capturing group 1 matched its contents, match literal string "c", else, match literal string "d".

Reply to this Comment

Or, to better clarify, instead of "optional capturing group 1", I should have written, "capturing group 1 (which is optional, so testing for its presence isn't pointless, unlike if we'd tested a non-optional grouping)".
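
JavaScript's regex engine does not support this conditional syntax, so the pattern cannot be run there as written. For this particular small example, a plain alternation gives the same two accepted strings. The sketch below assumes we want to test whole strings, so the emulation is anchored with ^ and $, and the variable names are made up for illustration.

```javascript
// Check whether the engine accepts the conditional syntax at all.
// This is expected to be false in current JavaScript engines.
let conditionalSupported = true;
try {
  new RegExp("(a)?b(?(1)c|d)");
} catch (err) {
  conditionalSupported = false;
}
console.log(conditionalSupported);

// Emulation for this specific case: "abc" when the optional "a" is
// present, "bd" when it is not.
const emulated = /^(?:abc|bd)$/;

console.log(emulated.test("abc")); // true
console.log(emulated.test("bd")); // true
console.log(emulated.test("abd")); // false
console.log(emulated.test("bc")); // false
```

This only works because the condition and both branches are simple literals. A conditional that depends on a group matched far from the branch point does not always unfold into an alternation this neatly.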

Reply to this Comment

Note that with regex engines which support negative lookbehinds (i.e., not those used by ColdFusion, JavaScript, etc.), the following pattern would be equivalent to (["'])(?:\\\1|.)*?\1

(["']).*?(?<!\\)\1

Because I use JS and CF a lot, I automatically default to constructing patterns in ways that don't require lookbehinds.

One thing worth noting is that in neither regex did I try to use anything like [^\1] for matching quoted content. If that worked as you might expect, it would allow us to construct a slightly faster regex which would greedily jump from the start to end of the quote and/or between escaped quotes. We can't greedily repeat an "any character" pattern such as a dot or [\S\s] because then we wouldn't be able to distinguish between multiple discrete quotes within the same test string, and our match would go from the start of the first quote to the end of the last quote. However, we can't use [^\1] either, because you can't use backreferences within character ranges (negated or otherwise), even though in this case the match contained within the backreference is only one character in length. Also note that the patterns [\1] and [^\1] actually do have special meaning, though possibly not what you would expect. They mean: match a single character which is/is not octal index 1 in the character set.

Reply to this Comment

Steve,

Ahhh, I see. I didn't see what you meant when you were talking about nested escaped quotes (ex. \"). I see what you mean. For some reason, "escaped" never rang any bells in my head.

Duuuuude:

(?(IfCondition)Then|Else)

What?!?!? Freakin' awesome. I have never seen anything like this. How do I miss this stuff. I think I just need to print out every page on regular-expressions.info and just sit down and read through it.

You really make all this stuff very clear. Thanks a lot for putting in the time and effort to help me out. I think that would be an awesome idea, what you said on your site, about having a Regular Expressions in-depth segment on a usual basis. Rock on.

Reply to this Comment

Hey Steve,

I just tried to run the abcd example:

(a)?b(?(1)c|d)

... and I get this error when I use the Java regex underlying ColdFusion:

Unknown inline modifier near index 7 (a)?b(?(1)c|d) ^ null

Do you know if this is not supported by Java?

Reply to this Comment

I guess you didn't see the Unicode-based quoted string matching example (bullshit like that is fun for me) which I tacked on the end of my blog post after the fact, where I noted that yeah, unfortunately for both of us, Java doesn't support conditionals (though PCRE, PHP, the .NET framework, RegexBuddy, and possibly others do). IIRF (http://cheeso.members.winisp.net/IIRF.aspx), which we have on my company's servers, uses the PCRE library though, so at least I can use conditionals in my URL rewrites. :)

Reply to this Comment

@Michael,

I am not sure what you mean. You can use the function call results in any way you want. You don't even have to output them.

Reply to this Comment

Excuse the novice. I need to capture the email address, then place the domain name in a database table.

Was trying cfoutput to sql, but got an error. Using <cfoutput>#objParts.user#</cfoutput>

Thxs

Reply to this Comment

@Michael,

The example above is in Javascript. You are trying to interact with ColdFusion / SQL. These happen in two different places. The Javascript takes place on the client (browser). The ColdFusion / SQL takes place on the server. There is no inherent way for Javascript values to be used in ColdFusion from a page flow standpoint.

However, you can pass values from Javascript to ColdFusion via AJAX or form submissions. Most likely, you probably just want a way to do this in ColdFusion with a user defined function.

Reply to this Comment

@Michael,

In ColdFusion, you could probably treat the email as a list:

<cfset name = ListFirst( email, "@" ) />

<cfset domain = ListLast( email, "@" ) />

<cfset ext = ListLast( domain, "." ) />

<cfset subdomain = ListDeleteAt( domain, ListLen( domain, "." ), "." ) />
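
For comparison, the same list-style approach can be sketched in JavaScript with split(). The function name splitEmailAsList is made up for illustration. Note that this is only an approximation of the ColdFusion list functions, which skip empty list elements, whereas split() keeps them.

```javascript
// Hypothetical helper mirroring the list-based ColdFusion code above.
function splitEmailAsList(email) {
  // Treat the address as an "@"-delimited list.
  const atParts = email.split("@");
  const name = atParts[0]; // like ListFirst( email, "@" )
  const domain = atParts[atParts.length - 1]; // like ListLast( email, "@" )

  // Treat the domain as a "."-delimited list.
  const dotParts = domain.split(".");
  const ext = dotParts[dotParts.length - 1]; // like ListLast( domain, "." )
  const subdomain = dotParts.slice(0, -1).join("."); // like ListDeleteAt() on the last item

  return { name, domain, ext, subdomain };
}

console.log(splitEmailAsList("jane.doe@mail.example.com"));
// { name: 'jane.doe', domain: 'mail.example.com', ext: 'com', subdomain: 'mail.example' }
```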

Reply to this Comment
