Ben Nadel at the New York ColdFusion User Group (Sep. 2009) with: Hansjorg Posch ( @tunesbag )

PatternMatcher.cfc - A ColdFusion Component Wrapper For The Java Regular Expression Engine


ColdFusion is built on top of Java. This architecture gives us access not only to all of the ColdFusion functionality, but also to all of the core Java libraries that lie just below the surface. Among those core libraries are the Pattern and Matcher classes contained within the java.util.regex package. These classes provide very powerful regular expression (RegEx) functionality; however, going from a loosely typed language (ColdFusion) to a strongly typed language (Java) can make communication somewhat arduous. As such, I wanted to wrap the Pattern/Matcher access in a ColdFusion component that would encapsulate all of the necessary data type conversions and extraneous mechanics.

To be honest, what I'm going to show you here is nothing very new. I have, many times before, blogged about ways in which to abstract access to the Java Pattern and Matcher classes. User defined functions (UDFs) like reSplit(), reMatchGroup(), reMatchGroups(), and reMultiMatch() all rely on the Java Matcher class internally. Likewise, ColdFusion custom tags like reLoop.cfm, rereplace.cfm, and re:replace.cfm allow for abstractions to both the access and mutation features of the Matcher class. Up until now, however, nothing that I've done has really been ColdFusion-component-based.

At first, I was going to create both a Pattern.cfc and a Matcher.cfc to correspond to the two underlying Java classes; however, what I realized was that in all of my regular expression usage, the Matcher class was really the object of primary use. The Pattern class was simply a stepping stone used to get to a Matcher instance. As such, I decided to narrow the scope of my wrapper down to a single ColdFusion component: PatternMatcher.cfc.

This ColdFusion component takes care of creating both the necessary Pattern and Matcher class instances, but it exposes only an augmented subset of the underlying Matcher class:

  • init( pattern, input ) :: any
  • find() :: boolean
  • group( [index [, default ]] ) :: string
  • groupCount() :: numeric
  • hasGroup( index ) :: boolean
  • match() :: array
  • matchGroups() :: array
  • replaceAll( replacement [, quoteReplacement ] ) :: string
  • replaceFirst( replacement [, quoteReplacement ] ) :: string
  • replaceWith( replacement [, quoteReplacement ] ) :: any
  • reset( [input] ) :: any
  • result() :: string

The replaceAll() and replaceFirst() methods are single operation methods - that is, they act on the input value and are done. Likewise, match() and matchGroups() are single operation methods - they collect matches from the input value and return the aggregated collection. find(), group(), and replaceWith(), on the other hand, are where things get a bit more interesting. These last three functions allow you to iterate over each pattern match within the given input string, collecting and/or replacing captured values on a point-by-point basis.
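To make the distinction between the two styles concrete, here is a minimal Java sketch of how they look on the underlying Matcher. The class name, method names, and sample input are illustrative only; none of them are part of PatternMatcher.cfc.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneShotVersusIterative {

    // Single operation: every match is replaced in one call.
    static String oneShot(String input) {
        return Pattern.compile("\\d+").matcher(input).replaceAll("#");
    }

    // Iterative: find() advances one match at a time, and group() reads
    // the match that find() just located.
    static String iterative(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        StringBuilder seen = new StringBuilder();
        while (matcher.find()) {
            seen.append("[").append(matcher.group()).append("]");
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(oneShot("a1b22c333"));   // a#b#c#
        System.out.println(iterative("a1b22c333")); // [1][22][333]
    }
}
```

In the iterative form, the caller decides what to do with each match, which is what makes per-match collection and replacement possible.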

Before we look at the PatternMatcher.cfc code, let's take a look at some examples. The first will demonstrate the looping behavior afforded by the find() method:

<!--- Create an input text in which we will find patterns. --->
<cfsavecontent variable="input">

	Sarah 212-555-1234 01/12/1980
	Kim 212-555-9399 07/14/1975
	Jenna 917-555-8712 05/05/1977
	Tricia 646-555-9990 12/20/1974

</cfsavecontent>

<!---
	Now, create a pattern to parse the above input. We will be
	looking for names, phone numbers, and birthdays. I'm going to
	use a VERBOSE regular expression to explain the capture.
--->
<cfsavecontent variable="pattern">(?x)

	## Group 1.
	## Capture the name.
	(\w+)

	\s+

	## Group 2.
	## Capture the phone number.
	(\d+ - \d+ - \d+)

	\s+

	## Group 3.
	## Capture the date of birth (DOB).
	(\d+ / \d+ / \d+)

</cfsavecontent>


<!---
	Now, create a matcher to parse the input string and make
	use of it.
--->
<cfset matcher = createObject( "component", "PatternMatcher" ).init(
	pattern,
	input
	) />

<!--- Create an array to keep track of the records. --->
<cfset records = [] />

<!--- Keep looping over the input to find the matches. --->
<cfloop condition="matcher.find()">

	<!---
		Create a record for this set of matches. When each pattern
		is matched, we are given access to each captured group through
		the group() method.

		NOTE: Group Zero (0) is always the entire pattern match, even
		if there is no capturing group around the entire pattern.
	--->
	<cfset record = {
		name = matcher.group( 1 ),
		phoneNumber = matcher.group( 2 ),
		dateOfBirth = matcher.group( 3 )
		} />

	<!--- Add the record to the record set. --->
	<cfset arrayAppend(
		records,
		record
		) />

</cfloop>

<!--- Output the matches. --->
<cfdump
	var="#records#"
	label="Parsed Record Data"
	/>

Here, we are given a chunk of textual data that contains "person" records. Since each record (line of text) adheres to a pattern, we can use our PatternMatcher.cfc to parse each row into a data structure. Our pattern is composed of three captured groups - name, phone number, and date-of-birth. As we loop over each match, you'll notice that we have access to each of those captured groups through the group() method.

When we run the above code, we get the following CFDump output:

[Image: CFDump output of the records array, labeled "Parsed Record Data".]

As you can see, the match-iteration provided by the PatternMatcher.cfc made our text input easily transformable.
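For comparison, this is roughly what the same parse looks like when written directly against java.util.regex, which is the work the component hides. It is a sketch only: the RecordParser class, the parse() method, and the shortened two-line input are hypothetical, and the pattern is a Java string version of the verbose pattern used above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordParser {

    // Same three groups as the verbose pattern above. Under (?x), whitespace
    // in the pattern is ignored and "#" starts a comment to end of line.
    static final Pattern RECORD = Pattern.compile(
        "(?x)\n"
        + "(\\w+)               # Group 1: name\n"
        + "\\s+\n"
        + "(\\d+ - \\d+ - \\d+)  # Group 2: phone number\n"
        + "\\s+\n"
        + "(\\d+ / \\d+ / \\d+)  # Group 3: date of birth\n"
    );

    // Build one map per match, keyed the same way as the ColdFusion struct.
    static List<Map<String, String>> parse(String input) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(input);
        while (matcher.find()) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("name", matcher.group(1));
            record.put("phoneNumber", matcher.group(2));
            record.put("dateOfBirth", matcher.group(3));
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "Sarah 212-555-1234 01/12/1980\nKim 212-555-9399 07/14/1975\n";
        System.out.println(parse(input));
    }
}
```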

Gathering data is only half of the magic. Replacing matches is the other. This time, we'll use the same pattern and input; but, rather than aggregating the pattern matches, we'll replace them with altered text.

NOTE: This demo was run directly after the previous one; this is why we start off by reset()'ing the matcher and do not need to re-define our pattern or input values.
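The reason for the reset can be seen at the Java layer: once a Matcher has walked to the end of its input, find() has nothing left to return until the matcher is rewound. The following is a small sketch of that behavior; the ResetDemo class and countRemaining() helper are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {

    // Count the matches that remain from the matcher's current position.
    static int countRemaining(Matcher matcher) {
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("\\d+").matcher("10 20 30");
        System.out.println(countRemaining(matcher)); // 3
        System.out.println(countRemaining(matcher)); // 0 (already at the end)
        matcher.reset();
        System.out.println(countRemaining(matcher)); // 3 again
    }
}
```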

<!---
	Now, let's imagine that we want to go through the input and XXX
	out people's dates of birth. First we'll want to reset our
	matcher.
--->
<cfset matcher.reset() />

<!--- Now, let's loop over the matches again. --->
<cfloop condition="matcher.find()">

	<!---
		When replacing, we can use the captured group back references
		for the values that we DO want to keep. Remember, the first
		group was the name, the second the phone number, and the
		third was the date of birth.
	--->
	<cfset matcher.replaceWith(
		"$1 $2 MM/DD/YYYY"
		) />

</cfloop>

<!--- Output the result of the replacement. --->
<cfoutput>

	<pre>
		#matcher.result()#
	</pre>

</cfoutput>

As you can see here, we are using the replaceWith() method to replace the current match with the given value. Part of our value uses back-references ($1, $2), which allow us to use captured groups within the replacement text. As we are performing these replacements, the PatternMatcher.cfc is building up an internal buffer; once we are done replacing values, we can then gain access to that internal buffer by using the result() method.

When we run the above code, we get the following output:

Sarah 212-555-1234 MM/DD/YYYY
Kim 212-555-9399 MM/DD/YYYY
Jenna 917-555-8712 MM/DD/YYYY
Tricia 646-555-9990 MM/DD/YYYY

As you can see, the date-of-birth was successfully "erased."
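The internal buffer mentioned above maps onto two methods of the Java Matcher: appendReplacement(), which is called once per match, and appendTail(), which is called once at the end. Here is a hedged sketch of that mechanism, including the effect of quoting the replacement. The BufferedReplace class and replaceEach() method are hypothetical, and the pattern is a non-verbose equivalent of the one used in the demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BufferedReplace {

    static final Pattern RECORD =
        Pattern.compile("(\\w+)\\s+(\\d+-\\d+-\\d+)\\s+(\\d+/\\d+/\\d+)");

    // Replace each match in turn. When quote is true, "$1" and "$2" are
    // kept as literal text instead of being read as back-references.
    static String replaceEach(String input, String replacement, boolean quote) {
        Matcher matcher = RECORD.matcher(input);
        StringBuffer buffer = new StringBuffer();
        String value = quote ? Matcher.quoteReplacement(replacement) : replacement;
        while (matcher.find()) {
            // Copies the text between the previous match and this one,
            // then the (expanded) replacement, into the buffer.
            matcher.appendReplacement(buffer, value);
        }
        // Copies whatever input is left after the last match.
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        String input = "Sarah 212-555-1234 01/12/1980\nKim 212-555-9399 07/14/1975\n";
        System.out.print(replaceEach(input, "$1 $2 MM/DD/YYYY", false));
        System.out.print(replaceEach(input, "$1 $2 MM/DD/YYYY", true));
    }
}
```

With quoting turned off, each line keeps its name and phone number; with quoting turned on, every record becomes the literal text "$1 $2 MM/DD/YYYY".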

The find(), group(), replaceWith(), replaceFirst(), and replaceAll() methods really comprise the core Pattern/Matcher functionality from the Java layer. As an added bonus, I have also added two utility methods - match() and matchGroups() - which provide a more powerful alternative to ColdFusion's native reMatch() function.

<!--- Output all the matches of the given pattern. --->
<cfdump
	var="#matcher.match()#"
	label="Match()"
	/>

Running the above code, we get the following output:

[Image: CFDump output of the array returned by match(), labeled "Match()".]

This works like the native reMatch() function, except that it gives you access to the Java regular expression library, which is much more robust.

The matchGroups() function works like the match() function; only, it breaks the individual matches down by captured group:

<!---
	Output all the matches of the given pattern, broken down
	by captured group.
--->
<cfdump
	var="#matcher.matchGroups()#"
	label="MatchGroups()"
	/>

Running the above code, we get the following output:

[Image: CFDump output of the array returned by matchGroups(), labeled "MatchGroups()".]
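Since the dumps do not translate to text, here is a rough Java sketch of the shape of data that the two utility methods collect. It is not the component's code: the MatchCollectors class, the e-mail-like pattern, and the sample input are assumptions chosen to keep the output short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCollectors {

    // Like match(): one entry per whole-pattern match.
    static List<String> match(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Like matchGroups(): one map per match, keyed by group index, with
    // index 0 holding the whole match. Groups that did not participate
    // come back as null from group(i) and are left out of the map.
    static List<Map<Integer, String>> matchGroups(Pattern pattern, String input) {
        List<Map<Integer, String>> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            Map<Integer, String> groups = new LinkedHashMap<>();
            for (int i = 0; i <= matcher.groupCount(); i++) {
                String value = matcher.group(i);
                if (value != null) {
                    groups.put(i, value);
                }
            }
            matches.add(groups);
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\w+)@(\\w+)");
        String input = "ann@home bob@work";
        System.out.println(match(pattern, input));
        // [ann@home, bob@work]
        System.out.println(matchGroups(pattern, input));
        // [{0=ann@home, 1=ann, 2=home}, {0=bob@work, 1=bob, 2=work}]
    }
}
```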

Ok, enough exploration. Let's take a look at the PatternMatcher.cfc ColdFusion component:

PatternMatcher.cfc

<cfcomponent
	output="false"
	hint="I provide easier, implicitly type-cast access to the underlying Java Pattern and Matcher functionality.">


	<cffunction
		name="init"
		access="public"
		returntype="any"
		output="false"
		hint="I return an initialized component.">

		<!--- Define arguments. --->
		<cfargument
			name="pattern"
			type="string"
			required="true"
			hint="I am the Java-compatible regular expression to be used to create this pattern matcher."
			/>

		<cfargument
			name="input"
			type="string"
			required="true"
			hint="I am the input text over which we will be matching the above regular expression pattern."
			/>

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!--- Store the original values. --->
		<cfset variables.pattern = arguments.pattern />
		<cfset variables.input = arguments.input />

		<!---
			Compile the regular expression pattern and get the
			matcher for the given input sequence.
		--->
		<cfset variables.matcher =
			createObject( "java", "java.util.regex.Pattern" )
				.compile( javaCast( "string", variables.pattern ) )
				.matcher( javaCast( "string", variables.input ) )
			/>

		<!--- Create a buffer to store the replacement result. --->
		<cfset variables.buffer = createObject( "java", "java.lang.StringBuffer" ).init() />

		<!--- Return this object reference for method chaining. --->
		<cfreturn this />
	</cffunction>


	<cffunction
		name="find_"
		access="public"
		returntype="boolean"
		output="false"
		hint="I attempt to find the next pattern match located within the input string.">

		<!--- Pass this request onto the matcher. --->
		<cfreturn variables.matcher.find() />
	</cffunction>


	<cffunction
		name="group"
		access="public"
		returntype="any"
		output="false"
		hint="I return the value captured by the given group. NOTE: Zero (0) will return the entire pattern match.">

		<!--- Define arguments. --->
		<cfargument
			name="index"
			type="numeric"
			required="false"
			default="0"
			/>

		<cfargument
			name="default"
			type="string"
			required="false"
			hint="I am the optional default to use if the given group (index) was not captured. Non-captured group references will return VOID. A default can be used to return non-void values."
			/>

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!--- Get the given group value. --->
		<cfset local.capturedValue = variables.matcher.group(
			javaCast( "int", arguments.index )
			) />

		<!---
			Check to see if the given group was able to capture a
			value (or if it did not, in which case, it will return
			NULL, destroying the variable).
		--->
		<cfif structKeyExists( local, "capturedValue" )>

			<!--- Return the captured value. --->
			<cfreturn local.capturedValue />

		<cfelseif structKeyExists( arguments, "default" )>

			<!---
				No group was captured, but a default value was
				provided. Return the default value.
			--->
			<cfreturn arguments.default />

		<cfelse>

			<!---
				No value was captured and no default was provided;
				simply return VOID to the calling context.
			--->
			<cfreturn />

		</cfif>
	</cffunction>


	<cffunction
		name="groupCount"
		access="public"
		returntype="numeric"
		output="false"
		hint="I return the number of capturing groups within the regular expression pattern.">

		<!--- Pass this request onto the matcher. --->
		<cfreturn variables.matcher.groupCount() />
	</cffunction>


	<cffunction
		name="hasGroup"
		access="public"
		returntype="boolean"
		output="false"
		hint="I determine whether or not the given group was captured in the previous match.">

		<!--- Define arguments. --->
		<cfargument
			name="index"
			type="numeric"
			required="true"
			/>

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!--- Get the given group value. --->
		<cfset local.capturedValue = variables.matcher.group(
			javaCast( "int", arguments.index )
			) />

		<!---
			Return whether or not the given captured group exists.
			If it was captured, the value will exist; if it was not
			captured, the given group value will be NULL (and hence
			not exist).
		--->
		<cfreturn structKeyExists( local, "capturedValue" ) />
	</cffunction>


	<cffunction
		name="match"
		access="public"
		returntype="array"
		output="false"
		hint="I return the collection of all pattern matches found within the given input. NOTE: This resets the internal matcher.">

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!--- Reset the pattern matcher. --->
		<cfset this.reset() />

		<!---
			Create an array in which to hold the aggregated
			pattern matches.
		--->
		<cfset local.matches = [] />

		<!--- Keep looping, looking for matches. --->
		<cfloop condition="variables.matcher.find()">

			<!--- Gather the current match. --->
			<cfset arrayAppend(
				local.matches,
				variables.matcher.group()
				) />

		</cfloop>

		<!--- Return the collected matches. --->
		<cfreturn local.matches />
	</cffunction>


	<cffunction
		name="matchGroups"
		access="public"
		returntype="array"
		output="false"
		hint="I return the collection of all pattern matches found within the given input, broken down by group. NOTE: This resets the internal matcher.">

		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!--- Reset the pattern matcher. --->
		<cfset this.reset() />

		<!---
			Create an array in which to hold the aggregated
			pattern matches.
		--->
		<cfset local.matches = [] />

		<!--- Keep looping, looking for matches. --->
		<cfloop condition="variables.matcher.find()">

			<!--- Create a match object. --->
			<cfset local.match = {} />

			<!---
				Move all of the captured groups into the match object
				(with zero being the entire match).
			--->
			<cfloop
				index="local.groupIndex"
				from="0"
				to="#variables.matcher.groupCount()#"
				step="1">

				<!--- Get the local value. --->
				<cfset local.groupValue = variables.matcher.group(
					javaCast( "int", local.groupIndex )
					) />

				<!---
					Check to see if the value exists and only set it
					if it does; ColdFusion seems to not like having
					a NULL set into the struct (although it really
					shouldn't have a problem with it).
				--->
				<cfif structKeyExists( local, "groupValue" )>

					<!--- Store the captured value. --->
					<cfset local.match[ local.groupIndex ] = local.groupValue />

				</cfif>

			</cfloop>

			<!--- Add the current match object. --->
			<cfset arrayAppend(
				local.matches,
				local.match
				) />

		</cfloop>

		<!--- Return the collected matches. --->
		<cfreturn local.matches />
	</cffunction>


	<cffunction
		name="replaceAll"
		access="public"
		returntype="string"
		output="false"
		hint="I replace all the pattern matches of the original input with the given value.">

		<!--- Define arguments. --->
		<cfargument
			name="replacement"
			type="string"
			required="true"
			hint="I am the string with which we are replacing the pattern matches."
			/>

		<cfargument
			name="quoteReplacement"
			type="boolean"
			required="false"
			default="false"
			hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)."
			/>

		<!--- Check to see if we are quoting the replacement. --->
		<cfif arguments.quoteReplacement>

			<!--- Quote the replacement string. --->
			<cfreturn variables.matcher.replaceAll(
				variables.matcher.quoteReplacement(
					javaCast( "string", arguments.replacement )
					)
				) />

		<cfelse>

			<!--- Use the replacement text as-is. --->
			<cfreturn variables.matcher.replaceAll(
				javaCast( "string", arguments.replacement )
				) />

		</cfif>
	</cffunction>


	<cffunction
		name="replaceFirst"
		access="public"
		returntype="string"
		output="false"
		hint="I replace the first pattern match of the original input with the given value.">

		<!--- Define arguments. --->
		<cfargument
			name="replacement"
			type="string"
			required="true"
			hint="I am the string with which we are replacing the first pattern match."
			/>

		<cfargument
			name="quoteReplacement"
			type="boolean"
			required="false"
			default="false"
			hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)."
			/>

		<!--- Check to see if we are quoting the replacement. --->
		<cfif arguments.quoteReplacement>

			<!--- Quote the replacement string. --->
			<cfreturn variables.matcher.replaceFirst(
				variables.matcher.quoteReplacement(
					javaCast( "string", arguments.replacement )
					)
				) />

		<cfelse>

			<!--- Use the replacement text as-is. --->
			<cfreturn variables.matcher.replaceFirst(
				javaCast( "string", arguments.replacement )
				) />

		</cfif>
	</cffunction>


	<cffunction
		name="replaceWith"
		access="public"
		returntype="any"
		output="false"
		hint="I replace the current match with the given value. NOTE: Back references within the replacement string will be honored unless the replacement value is quoted (see second argument).">

		<!--- Define arguments. --->
		<cfargument
			name="replacement"
			type="string"
			required="true"
			hint="I am the value with which we are replacing the previous match."
			/>

		<cfargument
			name="quoteReplacement"
			type="boolean"
			required="false"
			default="false"
			hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)."
			/>

		<!--- Check to see if we are quoting the replacement. --->
		<cfif arguments.quoteReplacement>

			<!--- Quote the replacement value before you use it. --->
			<cfset variables.matcher.appendReplacement(
				variables.buffer,
				variables.matcher.quoteReplacement(
					javaCast( "string", arguments.replacement )
					)
				) />

		<cfelse>

			<!--- Use raw replacement value. --->
			<cfset variables.matcher.appendReplacement(
				variables.buffer,
				javaCast( "string", arguments.replacement )
				) />

		</cfif>

		<!--- Return this object reference for method chaining. --->
		<cfreturn this />
	</cffunction>


	<cffunction
		name="reset"
		access="public"
		returntype="any"
		output="false"
		hint="I reset the pattern matcher.">

		<!--- Define arguments. --->
		<cfargument
			name="input"
			type="string"
			required="false"
			hint="I am the optional input with which to reset the pattern matcher."
			/>

		<!--- Check to see if a new input is being used. --->
		<cfif structKeyExists( arguments, "input" )>

			<!--- Use a new input to reset the matcher. --->
			<cfset variables.matcher.reset(
				javaCast( "string", arguments.input )
				) />

			<!--- Store the input property. --->
			<cfset variables.input = arguments.input />

		<cfelse>

			<!--- Reset the internal matcher. --->
			<cfset variables.matcher.reset() />

		</cfif>

		<!--- Reset the internal results buffer. --->
		<cfset variables.buffer = createObject( "java", "java.lang.StringBuffer" ).init() />

		<!--- Return this object reference for method chaining. --->
		<cfreturn this />
	</cffunction>


	<cffunction
		name="result"
		access="public"
		returntype="string"
		output="false"
		hint="I return the result of the replacement up until this point.">

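		<!--- Define the local scope. --->
		<cfset var local = {} />

		<!---
			Append the rest of the unmatched input to a COPY of the
			results buffer. Matcher.appendTail() does not advance the
			append position, so appending to the shared buffer would
			duplicate the tail if result() were called more than once.
		--->
		<cfset local.result = createObject( "java", "java.lang.StringBuffer" ).init(
			javaCast( "string", variables.buffer.toString() )
			) />

		<cfset variables.matcher.appendTail( local.result ) />

		<!--- Return the resultant string. --->
		<cfreturn local.result.toString() />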
	</cffunction>


	<!--- ------------------------------------------------- --->
	<!--- ------------------------------------------------- --->


	<!---
		Swap some of the method names; we couldn't name it "find"
		to begin with otherwise we'd get a ColdFusion error for
		conflicting with a native function name.
	--->
	<cfset this.find = this.find_ />

</cfcomponent>
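The null checks in group() and hasGroup() exist because Java's Matcher.group(int) returns null for a group that did not take part in the match. As a minimal sketch of the same "default" idea in plain Java (the OptionalGroups class, the groupOrDefault() helper, and the phone-extension pattern are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroups {

    // Return the captured group, or fallback when the group did not
    // participate in the match (Matcher.group(int) returns null then).
    static String groupOrDefault(Matcher matcher, int index, String fallback) {
        String value = matcher.group(index);
        return (value != null) ? value : fallback;
    }

    public static void main(String[] args) {
        // Group 2 (the extension) is optional.
        Matcher matcher = Pattern.compile("(\\d{3}-\\d{4})(?:\\s+x(\\d+))?")
            .matcher("555-1234 x42, 555-9999");
        while (matcher.find()) {
            System.out.println(
                matcher.group(1) + " ext: " + groupOrDefault(matcher, 2, "none"));
        }
        // 555-1234 ext: 42
        // 555-9999 ext: none
    }
}
```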

Java's Pattern and Matcher classes are, without a doubt, amazing. They provide for a very efficient, very robust regular expression engine - much more powerful than the one at the native ColdFusion level. I'm always looking for ways to make using these classes easier. Creating a ColdFusion component wrapper might just be the easiest approach yet.


Reader Comments

5 Comments

Without doubt the coolest concept this year so far, and I'd not be surprised to be saying the same come Xmas. Genius! Thanks once again for the inspiration.
