The 16th Annual Regular Expression Day - June 1st 2023

By Ben Nadel

Published 2023-06-01 in JavaScript / DHTML — Comments (2)

It's that time of year again! The days are getting longer; the weather is getting nicer; the babies are all being born at the zoo; and, people are going bonkers over the undeniable power of Regular Expression pattern matching! Which must mean, it's Regular Expression Day! This is the time of year in which we take a moment to reflect on how much better off we are having patterns in our lives. And in celebration of that, I'm going to learn something new about using Regular Expressions in JavaScript: named capture groups.

Run this demo in my JavaScript Demos project on GitHub.

View this code in my JavaScript Demos project on GitHub.

ASIDE: This is not the first time I've looked at named capture groups. I've lightly explored this topic in Node.js and in ColdFusion. The concept and implementation details are basically the same as in JavaScript and the browser runtime.

Regular Expressions provide us with a way to locate text within an input value. But, unlike the .indexOf() method which operates on a literal match, RegEx-based matching allows us to flexibly locate text based on a pattern definition.

As part of that pattern definition, we can define "capture groups". The capture groups give us a way to reference substrings within the matches that we locate in our input. Traditionally, the capture groups are referenced by the order-number in which they are defined (left-to-right by open parenthesis). Named capture groups, however, allows us to give an explicit name to the group such that it is more intuitive to reference later on.

A capture group in JavaScript uses the following syntax:

(?<name>pattern)

... where the <name> becomes the identifier of the group matched by the pattern.

This doesn't replace the concept of ordered capture groups, it simply augments it. Meaning, the given capture group can also be referenced by the its index-order.

To experiment with named capture groups, I'm going to parse an email address. As I'm sure many web developers are aware, some email providers allow us to use plus-style addressing, which lets developers embed arbitrary data as part of the email address user. The following two email addresses are the "same user", but the second one uses plus-style addressing to include additional text:

ben@bennadel.com
ben+additional-text@bennadel.com

Plus-style addressing is an amazing feature! In fact, it's how I've enabled "Comment by reply email" on this blog. To use this in the context of Regular Expressions, let's parse the email into 3 parts:

User - (?<user>...)
Hash (optional group)- (?<hash>...)
Domain - - (?<domain>...)

... each of which will be captured in a named group:

  
          <!doctype html>
        
          <html lang="en">
        
          <body>
        
          	<h1>
        
          		Using RegEx Named Capture Groups In JavaScript
        
          	</h1>
        
          	<script type="text/javascript">
        
          		var inputs = [
        
          			// Standard email address format.
        
          			"ben@bennadel.com",
        
          			// Email address with "hash" (plus-style addressing) that contains arbitrary
        
          			// information to be included in the email.
        
          			"ben+patterns@bennadel.com"
        
          		];
        
          		// Since JavaScript RegExp patterns do not support the "Verbose" (?x) flag, I'm
        
          		// going to build the RegEx pattern in parts and then join them together as a
        
          		// string. In the following pattern, I'm using the (?<name>) syntax to create
        
          		// names for the captured groups. This will make them easier to reference in the
        
          		// in matching.
        
          		var parts = [
        
          			// Match the start of the input.
        
          			"^",
        
          			// Our first named capturing group is everything before the "@" and "+" signs.
        
          			// This is the "user" associated with the email domain.
        
          			"(?<user>[^+@]+)",
        
          			// The email "hash" (plus-style addressing) is an optional match. But, if it
        
          			// is present, it will start with the "+" and then include everything up to
        
          			// the "@" delimiter.
        
          			"(?:\\+",
        
          				"(?<hash>[^@]+)",
        
          			")?",
        
          			// Literal match for our email delimiter.
        
          			"@",
        
          			// Our last named capturing group is the domain, which includes everything
        
          			// after the "@" literal.
        
          			"(?<domain>.+)",
        
          			// Match the end of the input.
        
          			"$"
        
          		];
        
          		// --------------------------------------------------------------------------- //
        
          		// --------------------------------------------------------------------------- //
        
          		var pattern = new RegExp( parts.join( "" ) );
        
          		for ( var input of inputs ) {
        
          			// Since our pattern matches against the start/end of the input, we only have
        
          			// to call exec() once per input.
        
          			var result = pattern.exec( input );
        
          			console.group( `RegExp Match (${ input })` );
        
          			// The named groups are captured in a "groups" property.
        
          			console.log( ...highlight( "User:" ), result.groups.user );
        
          			console.log( ...highlight( "Hash:" ), result.groups.hash );
        
          			console.log( ...highlight( "Domain:" ), result.groups.domain );
        
          			// The traditional capture group index-based references are still available.
        
          			console.log( "- - -" );
        
          			console.log( "$0:", result[ 0 ] );
        
          			console.log( "$1:", result[ 1 ] );
        
          			console.log( "$2:", result[ 2 ] );
        
          			console.log( "$3:", result[ 3 ] );
        
          			console.groupEnd();
        
          		}
        
          		// --------------------------------------------------------------------------- //
        
          		// --------------------------------------------------------------------------- //
        
          		// Utility method to apply CSS highlighting to the given value.
        
          		function highlight( value ) {
        
          			return([
        
          				( "%c" + value ),
        
          				"background-color: yellow ; font-weight: bold ; padding: 2px 3px 2px 7px ;"
        
          			]);
        
          		}
        
          	</script>
        
          </body>
        
          </html>

view raw index.htm hosted with ❤ by GitHub

NOTE: This is not intended to be an exhaustively correct email parser - this is just a demo.

As you can see, I'm building up the RegExp object using string concatenation in order to add comments - unfortunately, JavaScript doesn't have a "verbose flag" like ColdFusion does. When I then match the given pattern against the given email address input, the exec() result contains a special .groups property which holds are named capture groups:

Console log output of demo showing that each part of the Regular Expression match can be referenced both by the named capture group as well as the index order.

As you can see, the named capture groups can also be referenced by their index-based capture as well. This gives us a lot of flexibility; and, allows us to create code that is more human-friendly.

With that, I wish you all a Happy Regular Expression Day! And, if all this pattern matching has gotten you all hot and bothered, please checkout my Video presentations: Regular Expressions, Extraordinary Power.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/4471

Reader Comments

Gary F Jun 3, 2023 at 12:21 AM

29 Comments

I never knew about named pattern groups. Never too old to learn a new trick. Thank you Ben. :-)

	<!doctype html>
	<html lang="en">
	<body>

	<h1>
	Using RegEx Named Capture Groups In JavaScript
	</h1>

	<script type="text/javascript">

	var inputs = [
	// Standard email address format.
	"ben@bennadel.com",
	// Email address with "hash" (plus-style addressing) that contains arbitrary
	// information to be included in the email.
	"ben+patterns@bennadel.com"
	];

	// Since JavaScript RegExp patterns do not support the "Verbose" (?x) flag, I'm
	// going to build the RegEx pattern in parts and then join them together as a
	// string. In the following pattern, I'm using the (?<name>) syntax to create
	// names for the captured groups. This will make them easier to reference in the
	// in matching.
	var parts = [
	// Match the start of the input.
	"^",
	// Our first named capturing group is everything before the "@" and "+" signs.
	// This is the "user" associated with the email domain.
	"(?<user>[^+@]+)",
	// The email "hash" (plus-style addressing) is an optional match. But, if it
	// is present, it will start with the "+" and then include everything up to
	// the "@" delimiter.
	"(?:\\+",
	"(?<hash>[^@]+)",
	")?",
	// Literal match for our email delimiter.
	"@",
	// Our last named capturing group is the domain, which includes everything
	// after the "@" literal.
	"(?<domain>.+)",
	// Match the end of the input.
	"$"
	];

	// --------------------------------------------------------------------------- //
	// --------------------------------------------------------------------------- //

	var pattern = new RegExp( parts.join( "" ) );

	for ( var input of inputs ) {

	// Since our pattern matches against the start/end of the input, we only have
	// to call exec() once per input.
	var result = pattern.exec( input );

	console.group( `RegExp Match (${ input })` );
	// The named groups are captured in a "groups" property.
	console.log( ...highlight( "User:" ), result.groups.user );
	console.log( ...highlight( "Hash:" ), result.groups.hash );
	console.log( ...highlight( "Domain:" ), result.groups.domain );

	// The traditional capture group index-based references are still available.
	console.log( "- - -" );
	console.log( "$0:", result[ 0 ] );
	console.log( "$1:", result[ 1 ] );
	console.log( "$2:", result[ 2 ] );
	console.log( "$3:", result[ 3 ] );
	console.groupEnd();

	}

	// --------------------------------------------------------------------------- //
	// --------------------------------------------------------------------------- //

	// Utility method to apply CSS highlighting to the given value.
	function highlight( value ) {

	return([
	( "%c" + value ),
	"background-color: yellow ; font-weight: bold ; padding: 2px 3px 2px 7px ;"
	]);

	}

	</script>

	</body>
	</html>

Reader Comments

Post A Comment — ❤️ I'd Love To Hear From You! ❤️

Post A Comment — I'd Love To Hear From You!