Ben Nadel at CFUNITED 2009 (Lansdowne, VA) with: Ray Camden and Todd Sharp

More Slicing And Dicing Strings Using Java Regular Expressions In ColdFusion


After yesterday's post on parsing lists using a regular expression delimiter in ColdFusion, I was all high on the celebration of RegEx Day 2022; and, I wanted to keep the party going! So, I started to think about how else I might want to slice and dice strings using RegEx patterns. In a previous post on parsing inbound emails from Postmark, I created a user defined function (UDF) called jreBefore(). For this post, I wanted to showcase that method, and a few other methods like it, in ColdFusion.

ASIDE: Eventually, I'll add these methods to my JRegEx Project.

First, let's look at the jreBefore() method that I mentioned above as well as a new version called jreBeforeLast(). These two methods extract the leading part of an input String up until the match of a given Regular Expression pattern. The former slices up until the first match of the pattern while the latter slices up until the last match of the pattern.

jreBefore( "abcabc", "b" ) == "a"

jreBeforeLast( "abcabc", "b" ) == "abca"

NOTE: In all of the code samples, I am omitting any validation on the Regular Expression pattern for the sake of brevity. In reality, I would likely validate that the pattern was non-empty as an empty pattern wouldn't make sense for these ColdFusion functions.
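As a rough sketch of what that validation could look like, assuming a hypothetical helper named assertPatternNotEmpty() (this function and its error type are illustrative and not part of the original post):

```cfml
<cfscript>

	/**
	* HYPOTHETICAL helper (not from the original post): I throw an error if the given
	* pattern text is empty.
	*/
	public void function assertPatternNotEmpty( required string patternText ) {

		// An empty pattern matches the empty string at every position in the input,
		// which is rarely what a caller of these functions intends.
		if ( ! len( patternText ) ) {

			throw(
				type = "JRegEx.EmptyPattern",
				message = "The Regular Expression pattern cannot be empty."
			);

		}

	}

	// A non-empty pattern passes through without error.
	assertPatternNotEmpty( "b" );

</cfscript>
```

Each of the functions in this post could call such a helper as its first statement.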

<cfscript>

	/**
	* I return the leading portion of the string up until the first match of the given
	* pattern. If the pattern cannot be matched, the entire string is returned.
	*/
	public string function jreBefore(
		required string input,
		required string patternText
		) {

		// NOTE: Technically, CFML Strings are Java Strings; however, since we're going to
		// dip down into the Java layer methods, it's comforting to explicitly cast the
		// value to the native Java type, if for no other reason than to provide some
		// documentation as to where these methods are coming from.
		var parts = javaCast( "string", input )
			.split( patternText, 2 )
		;

		return( parts[ 1 ] );

	}


	/**
	* I return the leading portion of the string up until the last match of the given pattern.
	* If the pattern cannot be matched, the entire string is returned.
	*/
	public string function jreBeforeLast(
		required string input,
		required string patternText
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( patternText )
			.matcher( input )
		;

		if ( ! matcher.find() ) {

			return( input );

		}

		var previousStart = matcher.start();

		while ( matcher.find() ) {

			previousStart = matcher.start();

		}

		// NOTE: Technically, CFML Strings are Java Strings; however, since we're going to
		// dip down into the Java layer methods, it's comforting to explicitly cast the
		// value to the native Java type, if for no other reason than to provide some
		// documentation as to where these methods are coming from.
		return( javaCast( "string", input ).substring( 0, previousStart ) );

	}

</cfscript>

In both of these methods, if the Regular Expression pattern cannot be matched, the entire input is returned. I modeled this after ColdFusion's listFirst() function: if no delimiter can be found, the entire input is thought to represent the first list item.
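To make the listFirst() comparison concrete, here is a small sketch (the variable names are illustrative) that places the built-in list function next to the underlying String.split() call when the delimiter is missing:

```cfml
<cfscript>

	// With a delimiter present, listFirst() returns the item before it.
	withDelimiter = listFirst( "abc,def", "," ); // "abc"

	// With no delimiter, the whole input counts as the first (and only) item.
	withoutDelimiter = listFirst( "abcdef", "," ); // "abcdef"

	// String.split() mirrors this: when the pattern never matches, the resulting
	// array holds exactly one element, which is the original input.
	parts = javaCast( "string", "abcdef" ).split( ",", 2 );
	firstPart = parts[ 1 ]; // "abcdef"

	writeDump( [ withDelimiter, withoutDelimiter, firstPart ] );

</cfscript>
```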

For the next two methods, let's look at the opposite extraction: getting the trailing portion of a string after a given Regular Expression match. In this case, jreAfter() extracts everything after the first match of the RegEx pattern and jreAfterLast() extracts everything after the last match of the RegEx pattern.

jreAfter( "abcabc", "b" ) == "cabc"

jreAfterLast( "abcabc", "b" ) == "c"

<cfscript>

	/**
	* I return the trailing portion of the string starting after the first match of the
	* given pattern. If the pattern cannot be matched, the empty string is returned.
	*/
	public string function jreAfter(
		required string input,
		required string patternText
		) {

		// NOTE: Technically, CFML Strings are Java Strings; however, since we're going to
		// dip down into the Java layer methods, it's comforting to explicitly cast the
		// value to the native Java type, if for no other reason than to provide some
		// documentation as to where these methods are coming from.
		var parts = javaCast( "string", input )
			.split( patternText, 2 )
		;

		return( parts[ 2 ] ?: "" );

	}


	/**
	* I return the trailing portion of the string starting after the last match of the
	* given pattern. If the pattern cannot be matched, the empty string is returned.
	*/
	public string function jreAfterLast(
		required string input,
		required string patternText
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( patternText )
			.matcher( input )
		;

		if ( ! matcher.find() ) {

			return( "" );

		}

		var previousEnd = matcher.end();

		while ( matcher.find() ) {

			previousEnd = matcher.end();

		}

		// NOTE: Technically, CFML Strings are Java Strings; however, since we're going to
		// dip down into the Java layer methods, it's comforting to explicitly cast the
		// value to the native Java type, if for no other reason than to provide some
		// documentation as to where these methods are coming from.
		return( javaCast( "string", input ).substring( previousEnd ) );

	}

</cfscript>

In both of these methods, if the Regular Expression pattern cannot be matched, the empty string is returned. I modeled this after ColdFusion's listRest() function: if no delimiter can be found, the input is thought to only have a single item and therefore cannot contain a "rest" set of items.
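The listRest() comparison can be sketched the same way. The variable names here are illustrative, and the explicit length check is only one possible alternative to the Elvis operator, not necessarily a better one:

```cfml
<cfscript>

	// With no delimiter, there is nothing after the first item.
	restOfSingle = listRest( "abcdef", "," ); // ""
	restOfMany = listRest( "abc,def,ghi", "," ); // "def,ghi"

	// String.split() with a limit of 2 returns a one-element array when the pattern
	// never matches, so there is no second element to read.
	parts = javaCast( "string", "abcdef" ).split( ",", 2 );
	partCount = arrayLen( parts ); // 1

	// Checking the array length makes the "no match" branch explicit.
	afterPart = ( partCount >= 2 )
		? parts[ 2 ]
		: ""
	;

	writeDump( [ restOfSingle, restOfMany, partCount, afterPart ] );

</cfscript>
```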

The next two functions extract a portion of a string either starting with or ending with a Regular Expression match. jreStartingWith() collects the first match of the given pattern and everything that follows it. jreEndingWith() collects the first match of the given pattern and everything that comes before it.

jreStartingWith( "abcabc", "ca" ) == "cabc"

jreEndingWith( "abcabc", "ca" ) == "abca"

<cfscript>

	/**
	* I return the trailing portion of the string starting with the first match of the
	* given pattern. If the pattern cannot be matched, the empty string is returned.
	*/
	public string function jreStartingWith(
		required string input,
		required string patternText
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( patternText )
			.matcher( input )
		;

		if ( ! matcher.find() ) {

			return( "" );

		}

		// NOTE: Technically, CFML Strings are Java Strings; however, since we're going to
		// dip down into the Java layer methods, it's comforting to explicitly cast the
		// value to the native Java type, if for no other reason than to provide some
		// documentation as to where these methods are coming from.
		return( javaCast( "string", input ).substring( matcher.start() ) );

	}


	/**
	* I return the leading portion of the string ending with the first match of the given
	* pattern. If the pattern cannot be matched, the empty string is returned.
	*/
	public string function jreEndingWith(
		required string input,
		required string patternText
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( patternText )
			.matcher( input )
		;

		if ( ! matcher.find() ) {

			return( "" );

		}

		// NOTE: Technically, CFML Strings are Java Strings; however, since we're going to
		// dip down into the Java layer methods, it's comforting to explicitly cast the
		// value to the native Java type, if for no other reason than to provide some
		// documentation as to where these methods are coming from.
		return( javaCast( "string", input ).substring( 0, matcher.end() ) );

	}

</cfscript>

In both of these functions, since the RegEx pattern is part of the extracted string, if no pattern can be matched the empty string is returned.
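Both functions lean on the zero-based offsets that the Java Matcher reports. As a short sketch of those offsets for the "abcabc" and "ca" example above (the variable names are illustrative):

```cfml
<cfscript>

	input = javaCast( "string", "abcabc" );

	matcher = createObject( "java", "java.util.regex.Pattern" )
		.compile( "ca" )
		.matcher( input )
	;

	// Advance the matcher to the first (and, in this input, only) match of "ca".
	found = matcher.find();

	// Java offsets are zero-based: start() is the index of the first matched
	// character and end() is the index just past the last matched character.
	startIndex = matcher.start(); // 2
	endIndex = matcher.end(); // 4

	startingWith = input.substring( startIndex ); // "cabc"
	endingWith = input.substring( 0, endIndex ); // "abca"

	writeDump( [ startIndex, endIndex, startingWith, endingWith ] );

</cfscript>
```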

Finally, let's look at two similar methods that also deal with starting and ending patterns; but, which lock the match down to the beginning and ending of the input, respectively: jreStartsWith() and jreEndsWith(). These methods return Booleans instead of strings.

jreStartsWith( "abcabc", "[ab]+" ) == true

jreEndsWith( "abcabc", "[bc]+" ) == true

The underpinning logic of these two methods rests on the fact that a Regular Expression pattern is still valid if you double-up on boundary anchors. What you'll see below is that I am adding \A and \z to the user-provided patterns before I compile them in order to anchor them to the start of the input and the end of the input, respectively.
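The doubled-up anchor behavior is easy to check in isolation. This is a small sketch with illustrative variable names and patterns:

```cfml
<cfscript>

	PatternClass = createObject( "java", "java.util.regex.Pattern" );

	// Two start-of-input anchors in a row still compile and still match.
	doubledStart = PatternClass.compile( "\A\Aab" ).matcher( "abcabc" ).find(); // true

	// Mixing anchor styles (\A followed by ^) behaves the same way.
	mixedStart = PatternClass.compile( "\A^ab" ).matcher( "abcabc" ).find(); // true

	// The same holds for end-of-input anchors ($ followed by \z).
	doubledEnd = PatternClass.compile( "bc$\z" ).matcher( "abcabc" ).find(); // true

	// The anchor still does its job: "bc" exists in the input, but not at the start.
	anchoredMiss = PatternClass.compile( "\Abc" ).matcher( "abcabc" ).find(); // false

	writeDump( [ doubledStart, mixedStart, doubledEnd, anchoredMiss ] );

</cfscript>
```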

<cfscript>

	/**
	* I determine if the given pattern can be matched at the start of the input.
	*/
	public boolean function jreStartsWith(
		required string input,
		required string patternText
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
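			// In order to limit the amount of text that we have to search, let's anchor
			// the pattern to the beginning of the input. It's OK if the pattern already
			// has anchor boundaries - they can be doubled-up without consequence. The
			// non-capturing group keeps the anchor applied to the entire pattern, even
			// when the pattern contains a top-level alternation (ex, "a|b").
			.compile( "\A(?:" & patternText & ")" )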
			.matcher( input )
		;

		return( matcher.find() );

	}


	/**
	* I determine if the given pattern can be matched at the end of the input.
	*/
	public boolean function jreEndsWith(
		required string input,
		required string patternText
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
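			// In order to limit the amount of text that we have to search, let's anchor
			// the pattern to the end of the input. It's OK if the pattern already has
			// anchor boundaries - they can be doubled-up without consequence. The
			// non-capturing group keeps the anchor applied to the entire pattern, even
			// when the pattern contains a top-level alternation (ex, "a|b").
			.compile( "(?:" & patternText & ")\z" )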
			.matcher( input )
		;

		return( matcher.find() );

	}

</cfscript>
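One caveat when concatenating an anchor onto an arbitrary pattern: alternation (|) has the lowest precedence in a Regular Expression, so an anchor placed next to a pattern like "b|c" only binds to the nearest alternative. Wrapping the pattern in a non-capturing group keeps the anchor applied to the whole thing. A minimal sketch of the difference (the variable names and patterns are illustrative):

```cfml
<cfscript>

	PatternClass = createObject( "java", "java.util.regex.Pattern" );

	// "\Ab|c" is parsed as "\Ab" OR "c". Only the "b" alternative is anchored, so
	// the bare "c" is free to match anywhere in the input.
	ungrouped = PatternClass.compile( "\Ab|c" ).matcher( "abcabc" ).find(); // true

	// With the non-capturing group, both alternatives must match at the start of
	// the input; and, since the input starts with "a", neither of them can.
	grouped = PatternClass.compile( "\A(?:b|c)" ).matcher( "abcabc" ).find(); // false

	writeDump( [ ungrouped, grouped ] );

</cfscript>
```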

Aren't Regular Expressions just the cat's pajamas! And, since ColdFusion is built on top of Java, we get to leverage some very powerful RegEx features like the String.split() method and the Pattern and Matcher classes. ColdFusion is truly the gift that keeps on giving.

Want to use code from this post? Check out the license.
