JRegEx - A ColdFusion Wrapper Around Java's Regular Expression Patterns

By Ben Nadel

Published 2017-08-26 in ColdFusion

For years, I've been a huge fan of Regular Expressions. And, while I program a lot in ColdFusion, I often dip down into the underlying Java layer in order to leverage the robust feature-set and increased performance of Java's Pattern matching engine. In the past, I've tried to find elegant ways to surface this functionality in ColdFusion. But, it wasn't until ColdFusion introduced Closures that we could finally recreate the elegance of JavaScript's String.prototype.replace() method. Now that ColdFusion has all the right ingredients, I wanted to wrap my common Java Regular Expression use-cases in a single ColdFusion component - JRegEx.cfc.

View the ColdFusion JRegEx project on GitHub.

JRegEx is just a utility library that contains a number of static methods that - for the most part - abstract the complexity of Java's Matcher class. For example, here is the current implementation of the .jreReplace() method, which takes a closure and uses it to manipulate each match in a replacement operation:

/**
* I use Java's Pattern / Matcher libraries to replace matched patterns using the
* given operator function or closure.
*
* @targetText I am the text being scanned.
* @patternText I am the Java Regular Expression pattern used to locate matches.
* @operator I am the Function or Closure used to provide the match replacements.
* @output false
*/
public string function jreReplace(
	required string targetText,
	required string patternText,
	required function operator
	) {

	var matcher = createMatcher( targetText, patternText );
	var buffer = createBuffer();

	// Iterate over each pattern match in the target text.
	while ( matcher.find() ) {

		// When preparing the arguments for the operator, we need to construct an
		// argumentCollection structure in which the argument index is the numeric
		// key of the argument offset. In order to simplify overlaying the pattern
		// group matching over the arguments array, we're simply going to keep an
		// incremented offset every time we add an argument.
		var operatorArguments = {};
		var operatorArgumentOffset = 1; // Will be incremented with each argument.

		var groupCount = matcher.groupCount();

		// NOTE: Calling .group(0) is equivalent to calling .group(), which will
		// return the entire match, not just a capturing group.
		for ( var i = 0 ; i <= groupCount ; i++ ) {

			operatorArguments[ operatorArgumentOffset++ ] = matcher.group( javaCast( "int", i ) );

		}

		// Including the match offset and the original content for parity with the
		// JavaScript String.replace() function on which this algorithm is based.
		// --
		// NOTE: We're adding 1 to the offset since ColdFusion starts offsets at 1
		// whereas Java starts offsets at 0.
		operatorArguments[ operatorArgumentOffset++ ] = ( matcher.start() + 1 );
		operatorArguments[ operatorArgumentOffset++ ] = targetText;

		var replacement = operator( argumentCollection = operatorArguments );

		// In the event the operator doesn't return a value, we'll assume that the
		// intention is to replace the match with nothing.
		if ( isNull( replacement ) ) {

			replacement = "";

		}

		// Since the operator is providing the replacement text based on the
		// individual parts found in the match, we are going to assume that any
		// embedded group reference is coincidental and should be consumed as a
		// string literal.
		matcher.appendReplacement(
			buffer,
			matcher.quoteReplacement( javaCast( "string", replacement ) )
		);

	}

	matcher.appendTail( buffer );

	return( buffer.toString() );

}

Since ColdFusion uses 1-based indices and Java uses 0-based indices, these methods take care of translating to and from the 1-based index system that ColdFusion developers are used to using. And, unlike ColdFusion's native RegEx methods, there are no "NoCase" versions of these methods. If you want to use a case-insensitive pattern, you should just prepend your pattern with the case-insensitivity flag, (?i).

jreEscape( patternText ) :: string

This takes a Java Regular Expression pattern and returns a literal version of it, in which the special characters are treated as plain text. This literal version can then be used in other JRegEx methods. It is similar in spirit to what the "quoteReplacement" argument does for the replacement text in some of the other methods.
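As a rough illustration of what "literal version" means at the Java layer, the sketch below assumes the escaping is done with Java's Pattern.quote(). The post doesn't show the jreEscape() implementation, so treat that as an assumption; the class and method names (EscapeSketch, escape) are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeSketch {

    // Assumption: the literal pattern is produced with Pattern.quote(), which
    // makes every character in the input match itself rather than act as syntax.
    static String escape(String patternText) {
        return Pattern.quote(patternText);
    }

    public static void main(String[] args) {
        String input = "Is 1+1=2?";

        // Unescaped, "+" and "?" act as quantifiers, so nothing in the input
        // matches and the text comes back unchanged.
        System.out.println(input.replaceAll("1+1=2?", "X"));

        // Escaped, the pattern matches the literal characters "1+1=2?".
        System.out.println(input.replaceAll(escape("1+1=2?"), "X"));
    }
}
```

The first call prints the input untouched, while the second prints "Is X".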

jreFind( targetText, patternText [, startingAt = 1 ] ) :: number

This finds the offset of the first Java Regular Expression pattern match in the given target text. Returns zero if no match is found.
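To make the offset translation concrete, here is a minimal Java sketch of a 1-based find, including the (?i) flag for case-insensitive matching. It is not the JRegEx source: the names (FindSketch, find) are hypothetical, and returning zero for an out-of-range starting offset is an assumption made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindSketch {

    // Returns the 1-based offset of the first match at or after the 1-based
    // startingAt position, or 0 when there is no match.
    static int find(String targetText, String patternText, int startingAt) {
        int javaStart = startingAt - 1; // Translate to Java's 0-based offsets.

        // Assumption: an out-of-range start is treated as "not found" rather
        // than letting Matcher.find(int) throw.
        if (javaStart < 0 || javaStart > targetText.length()) {
            return 0;
        }

        Matcher matcher = Pattern.compile(patternText).matcher(targetText);

        // Translate Java's 0-based match offset back to a 1-based offset.
        return matcher.find(javaStart) ? matcher.start() + 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(find("Hello World", "world", 1));     // Case-sensitive.
        System.out.println(find("Hello World", "(?i)world", 1)); // Case-insensitive.
        System.out.println(find("Hello World", "o", 6));         // Skips the first "o".
    }
}
```

The case-sensitive search finds nothing, while prefixing the pattern with (?i) locates "World" at offset 7.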

jreMap( targetText, patternText, operator ) :: array[any]

This iterates over each match of the given Java Regular Expression pattern found within the given target text. Each match and its captured groups are passed to the operator function which can return a mapped value. The mapped values are then aggregated and returned in an array. If the operator returns an undefined value, the match is omitted from the results.
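The mapping behavior can be sketched in plain Java as a find() loop that collects whatever the operator returns and skips null results. This is only an approximation of the idea: the names (MapSketch, map) are hypothetical, and Java's null stands in for ColdFusion's undefined return value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MapSketch {

    // Applies the operator to every match and collects the non-null results.
    static <T> List<T> map(String targetText, String patternText, Function<MatchResult, T> operator) {
        Matcher matcher = Pattern.compile(patternText).matcher(targetText);
        List<T> results = new ArrayList<>();

        while (matcher.find()) {
            // toMatchResult() takes a snapshot of the current match and groups.
            T mapped = operator.apply(matcher.toMatchResult());

            // A null result means "omit this match from the results".
            if (mapped != null) {
                results.add(mapped);
            }
        }

        return results;
    }

    public static void main(String[] args) {
        // Keep the upper-cased key of each pair whose value is a single digit.
        List<String> keys = map(
            "a=1, b=22, c=3",
            "(\\w)=(\\d+)",
            match -> match.group(2).length() > 1 ? null : match.group(1).toUpperCase()
        );
        System.out.println(keys);
    }
}
```

Here the "b=22" match is dropped because the operator returns null for it, leaving the keys A and C.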

jreMatch( targetText, patternText ) :: array[string]

This takes the given Java Regular Expression pattern and collects all matches of it that can be found within the given target text. The matches are returned as an array of strings.

jreMatchGroups( targetText, patternText ) :: array[struct]

This takes the given Java Regular Expression pattern and collects all matches of it that can be found within the given target text. The matches are returned as an array of structs in which each struct holds the captured groups of the match. The struct is keyed based on the index of the captured group, within the pattern, with the "0" key containing the entire match text.
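For a sense of the resulting shape, the following Java sketch gathers each match into a map keyed by group index, with key 0 holding the entire match. The names (MatchGroupsSketch, matchGroups) are hypothetical, and the Java map is only a stand-in for the ColdFusion struct.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchGroupsSketch {

    // Collects every match as a map of group index to captured text.
    static List<Map<Integer, String>> matchGroups(String targetText, String patternText) {
        Matcher matcher = Pattern.compile(patternText).matcher(targetText);
        List<Map<Integer, String>> results = new ArrayList<>();

        while (matcher.find()) {
            Map<Integer, String> groups = new LinkedHashMap<>();

            // Group 0 is the entire match; 1..groupCount() are the captures.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.put(i, matcher.group(i));
            }

            results.add(groups);
        }

        return results;
    }

    public static void main(String[] args) {
        for (Map<Integer, String> groups : matchGroups("a1 b2", "([a-z])(\\d)")) {
            System.out.println(groups);
        }
    }
}
```

With this input there are two matches, "a1" and "b2", each carrying its letter in group 1 and its digit in group 2.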

jreReplace( targetText, patternText, operator ) :: string

This iterates over each match of the given Java Regular Expression pattern found within the given target text and passes the match to the operator function / closure. The operator function / closure can then return a replacement value that will be merged into the resultant text. This is based on JavaScript's String.prototype.replace() method.
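The same loop can be written directly in Java, which makes it easier to see why the replacement is passed through quoteReplacement(). This is a simplified sketch with hypothetical names (ReplaceSketch, replace); it hands the operator a MatchResult instead of the positional arguments the ColdFusion version builds.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceSketch {

    // Replaces each match with whatever the operator returns for it.
    static String replace(String targetText, String patternText, Function<MatchResult, String> operator) {
        Matcher matcher = Pattern.compile(patternText).matcher(targetText);
        StringBuffer buffer = new StringBuffer();

        while (matcher.find()) {
            String replacement = operator.apply(matcher.toMatchResult());

            // A null result is treated as "replace the match with nothing".
            if (replacement == null) {
                replacement = "";
            }

            // Quote the replacement so that "$" and "\" are treated as literal
            // text rather than as group references or escapes.
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
        }

        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Double each number and prefix it with a literal dollar sign.
        String result = replace(
            "cost: 5 and 10",
            "\\d+",
            match -> "$" + (Integer.parseInt(match.group()) * 2)
        );
        System.out.println(result);
    }
}
```

Without the quoteReplacement() call, a replacement such as "$10" would be read as a reference to capture group 1, which this pattern doesn't define, so appendReplacement() would throw instead of inserting the text.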

jreReplaceAll( targetText, patternText, replacementText [, quoteReplacement = false ] ) :: string

I replace all matches of the given Java Regular Expression pattern found within the given target text with the given replacement text. The replacement text can contain "$N" references to captured groups within the pattern.

jreReplaceFirst( targetText, patternText, replacementText [, quoteReplacement = false ] ) :: string

I replace the first match of the given Java Regular Expression pattern found within the given target text with the given replacement text. The replacement text can contain "$N" references to captured groups within the pattern.
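The effect of the quoteReplacement flag is easiest to see side by side. The Java sketch below assumes the flag simply decides whether the replacement text is passed through Matcher.quoteReplacement() first; that is an inference from the method signatures above rather than something the post confirms, and the names (ReplaceAllSketch, replaceAll, replaceFirst) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllSketch {

    // Assumption: when the flag is set, "$" and "\" in the replacement text are
    // escaped so that they are inserted literally.
    static String prepare(String replacementText, boolean quoteReplacement) {
        return quoteReplacement ? Matcher.quoteReplacement(replacementText) : replacementText;
    }

    static String replaceAll(String targetText, String patternText, String replacementText, boolean quoteReplacement) {
        return Pattern.compile(patternText)
            .matcher(targetText)
            .replaceAll(prepare(replacementText, quoteReplacement));
    }

    static String replaceFirst(String targetText, String patternText, String replacementText, boolean quoteReplacement) {
        return Pattern.compile(patternText)
            .matcher(targetText)
            .replaceFirst(prepare(replacementText, quoteReplacement));
    }

    public static void main(String[] args) {
        String datePattern = "(\\d+)-(\\d+)-(\\d+)";

        // "$N" references are expanded: year-month-day becomes month/day/year.
        System.out.println(replaceAll("2017-08-26", datePattern, "$2/$3/$1", false));

        // With quoting, the same replacement text is inserted as-is.
        System.out.println(replaceAll("2017-08-26", datePattern, "$2/$3/$1", true));

        // Only the first digit is replaced.
        System.out.println(replaceFirst("a1b2", "\\d", "#", false));
    }
}
```

The first call yields "08/26/2017", the second yields the literal text "$2/$3/$1", and the third yields "a#b2".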

jreSegment( targetText, patternText ) :: array[struct]

I use the given Java Regular Expression pattern to break the given target text up into segments. Unlike the `jreSplit()` method, this method returns both the pattern matches as well as the text in between the matches. The resultant array contains a heterogeneous mix of match and non-match structs. Each struct contains the following properties:

Match: Boolean
Offset: Numeric
Text: String

Match structs also contain the property:

Groups: Struct[Numeric] :: String

... which contains the captured groups of the match.
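A segmenting pass like this can be approximated in Java by tracking a cursor between matches. The sketch below is a guess at the general technique, not the JRegEx source: the names (SegmentSketch, segments, segment) are hypothetical, and skipping empty in-between text and including the "0" group are both assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentSketch {

    // Builds one segment; offsets are 1-based to mirror the ColdFusion convention.
    static Map<String, Object> segment(boolean match, int offset, String text) {
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("Match", match);
        result.put("Offset", offset);
        result.put("Text", text);
        return result;
    }

    static List<Map<String, Object>> segments(String targetText, String patternText) {
        Matcher matcher = Pattern.compile(patternText).matcher(targetText);
        List<Map<String, Object>> results = new ArrayList<>();
        int cursor = 0; // 0-based end of the previous match.

        while (matcher.find()) {
            // Emit the text between the previous match and this one, if any.
            if (matcher.start() > cursor) {
                results.add(segment(false, cursor + 1, targetText.substring(cursor, matcher.start())));
            }

            // Emit the match itself along with its groups.
            Map<String, Object> matched = segment(true, matcher.start() + 1, matcher.group());
            Map<Integer, String> groups = new LinkedHashMap<>();
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.put(i, matcher.group(i));
            }
            matched.put("Groups", groups);
            results.add(matched);

            cursor = matcher.end();
        }

        // Emit any text left over after the last match.
        if (cursor < targetText.length()) {
            results.add(segment(false, cursor + 1, targetText.substring(cursor)));
        }

        return results;
    }

    public static void main(String[] args) {
        for (Map<String, Object> item : segments("a1b", "(\\d)")) {
            System.out.println(item);
        }
    }
}
```

For the input "a1b" this produces three segments: the non-match "a" at offset 1, the match "1" at offset 2, and the non-match "b" at offset 3.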

jreSplit( targetText, patternText [, limit = 0 ] ) :: array[string]

I use the given Java Regular Expression pattern to split the given target text. The resultant portions of the target text are returned as an array of strings.
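If the limit argument is handed straight to Java's Pattern.split(), which is an assumption here since the post doesn't show the implementation, then it would behave as in the sketch below. The names (SplitSketch, split) are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitSketch {

    // Assumption: the limit is passed through to Java's Pattern.split().
    static String[] split(String targetText, String patternText, int limit) {
        return Pattern.compile(patternText).split(targetText, limit);
    }

    public static void main(String[] args) {
        // A limit of 0 splits as many times as possible and drops trailing
        // empty strings.
        System.out.println(Arrays.toString(split("a,b,,", ",", 0)));

        // A positive limit caps the number of resulting portions.
        System.out.println(Arrays.toString(split("a,b,,", ",", 2)));

        // A negative limit keeps the trailing empty strings.
        System.out.println(Arrays.toString(split("a,b,,", ",", -1)));
    }
}
```

Under that assumption, the default limit of 0 returns just "a" and "b", a limit of 2 returns "a" and "b,,", and a negative limit returns all four portions including the two empty ones.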

jreTest( targetText, patternText ) :: boolean

I test to see if the given Java Regular Expression pattern matches the entire target text. You can think of this as wrapping your pattern with "^" and "$" boundary sequences.
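In Java terms, this is the difference between Matcher.matches(), which must consume the entire input, and Matcher.find(), which looks for a match anywhere. The sketch below assumes the test is built on matches(); the names (TestSketch, test, contains) are hypothetical.

```java
import java.util.regex.Pattern;

class TestSketch {

    // Assumption: an entire-input test is implemented with Matcher.matches().
    static boolean test(String targetText, String patternText) {
        return Pattern.compile(patternText).matcher(targetText).matches();
    }

    // For contrast: find() succeeds when the pattern matches any part of the input.
    static boolean contains(String targetText, String patternText) {
        return Pattern.compile(patternText).matcher(targetText).find();
    }

    public static void main(String[] args) {
        System.out.println(test("123", "\\d+"));        // The entire input is digits.
        System.out.println(test("abc123", "\\d+"));     // The leading letters prevent a full match.
        System.out.println(contains("abc123", "\\d+")); // A partial match is enough for find().
    }
}
```

So a pattern that only matches part of the text fails the entire-input test even though a find-style search would succeed.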


Short link: https://bennadel.com/3322

Reader Comments

Stewart McGuire Sep 1, 2017 at 8:52 AM

Ben, what versions of CF does this work on? 2016 only?

Ben Nadel Sep 1, 2017 at 9:07 AM

@Stewart,

I believe it should be anything >= 10. Whenever closures were introduced - I think maybe that was 10. Though, technically, they don't need to be closures - they could just be function references. In that case, the support would go back even farther, I think.

But, 10 is what I have installed locally -- time to upgrade :D

Ben Nadel May 7, 2019 at 8:31 AM

@All,

Just wanted to cross-post here that I used the JRegEx component to apply some post-processing to markdown content:

www.bennadel.com/blog/3616-considering-ways-to-embed-widgets-in-my-markdown-using-flexmark-0-42-6-and-coldfusion.htm

I'm using it to convert markdown items like,  to an embedded Vimeo iframe. Just love using Regular Expressions!
