
PatternMatcher.cfc - A ColdFusion Component Wrapper For The Java Regular Expression Engine

By Ben Nadel
Tags: ColdFusion

ColdFusion is built on top of Java. This architecture gives us access not only to all of the ColdFusion functionality, but also to all of the core Java libraries that lie just below the surface. Among those core libraries are the Pattern and Matcher classes contained within the java.util.regex package. These classes provide very powerful regular expression (RegEx) functionality; however, going from a loosely typed language (ColdFusion) to a strongly typed language (Java) can make communication somewhat arduous. As such, I wanted to wrap the Pattern/Matcher access in a ColdFusion component that would encapsulate all of the necessary data type conversions and extraneous mechanics.

To be honest, what I'm going to show you here is nothing very new. I have, many times before, blogged about ways in which to abstract access to the Java Pattern and Matcher classes. User defined functions (UDFs) like reSplit(), reMatchGroup(), reMatchGroups(), and reMultiMatch() all rely on the Java Matcher class internally. Likewise, ColdFusion custom tags like reLoop.cfm, rereplace.cfm, and re:replace.cfm allow for abstractions to both the access and mutation features of the Matcher class. Up until now, however, nothing that I've done has really been ColdFusion-component-based.

At first, I was going to create both a Pattern.cfc and a Matcher.cfc to correspond to the two underlying Java classes; however, what I realized was that in all of my regular expression usage, the Matcher class was really the object of primary use. The Pattern class was simply a stepping stone used to get to a Matcher instance. As such, I decided to narrow the scope of my wrapper down to a single ColdFusion component: PatternMatcher.cfc.
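To make the "stepping stone" relationship concrete, here is a minimal sketch of the same two-step dance in plain Java. The class name PatternSteppingStone and the helper firstMatch() are hypothetical names made up for illustration; only Pattern.compile(), Pattern.matcher(), Matcher.find(), and Matcher.group() come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternSteppingStone {

    // Hypothetical helper: returns the first match of regex within input,
    // or null when the pattern does not match anywhere.
    static String firstMatch(String regex, String input) {
        // The Pattern is compiled only so that we can ask it for a Matcher;
        // all of the interesting work happens on the Matcher.
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d{3}-\\d{3}-\\d{4}", "Sarah 212-555-1234"));
    }
}
```

Notice that the Pattern instance is never stored; this is roughly why a single wrapper around the Matcher felt sufficient.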

This ColdFusion component takes care of creating both the necessary Pattern class and Matcher class instances; but, it really only provides for an augmented subset of the underlying Matcher class.

  • init( pattern, input ) :: any
  • find() :: boolean
  • group( [index [, default ]] ) :: string
  • groupCount() :: numeric
  • hasGroup( index ) :: boolean
  • match() :: array
  • matchGroups() :: array
  • replaceAll( replacement [, quoteReplacement ] ) :: string
  • replaceFirst( replacement [, quoteReplacement ] ) :: string
  • replaceWith( replacement [, quoteReplacement ] ) :: any
  • reset() :: any
  • result() :: string

The replaceAll() and replaceFirst() methods are single operation methods - that is, they act on the input value and are done. Likewise, match() and matchGroups() are single operation methods - they collect matches from the input value and return the aggregated collection. find(), group(), and replaceWith(), on the other hand, are where things get a bit more interesting. These last three functions allow you to iterate over each pattern match within the given input string, collecting and/or replacing captured values on a point-by-point basis.
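The optional quoteReplacement flag on the replace methods maps onto Java's static Matcher.quoteReplacement() method. As a rough sketch of what that flag changes (the class QuoteReplacementSketch and its replaceAll() helper are hypothetical stand-ins, not the component's actual code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementSketch {

    // Hypothetical helper mirroring replaceAll( replacement [, quoteReplacement ] ).
    static String replaceAll(String regex, String input, String replacement, boolean quote) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // When quoting, "$" and "\" lose their special meaning in the
        // replacement text, so back references are inserted literally.
        String safe = quote ? Matcher.quoteReplacement(replacement) : replacement;
        return matcher.replaceAll(safe);
    }

    public static void main(String[] args) {
        // Unquoted: "$1" is a back reference to the first captured group.
        System.out.println(replaceAll("(\\d+)", "cost 5", "[$1]", false)); // cost [5]
        // Quoted: "$1" is treated as plain text.
        System.out.println(replaceAll("(\\d+)", "cost 5", "[$1]", true));  // cost [$1]
    }
}
```

Quoting is mostly useful when the replacement text comes from user input and might contain a stray dollar sign.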

Before we look at the PatternMatcher.cfc code, let's take a look at some examples. The first will demonstrate the looping behavior afforded by the find() method:

  • <!--- Create an input text in which we will find patterns. --->
  • <cfsavecontent variable="input">
  •  
  • Sarah 212-555-1234 01/12/1980
  • Kim 212-555-9399 07/14/1975
  • Jenna 917-555-8712 05/05/1977
  • Tricia 646-555-9990 12/20/1974
  •  
  • </cfsavecontent>
  •  
  • <!---
  • Now, create a pattern to parse the above input. We will be
  • looking for names, phone numbers, and birthdays. I'm going to
  • use a VERBOSE regular expression to explain the capture.
  • --->
  • <cfsavecontent variable="pattern">(?x)
  •  
  • ## Group 1.
  • ## Capture the name.
  • (\w+)
  •  
  • \s+
  •  
  • ## Group 2.
  • ## Capture the phone number.
  • (\d+ - \d+ - \d+)
  •  
  • \s+
  •  
  • ## Group 3.
  • ## Capture the date of birth (DOB).
  • (\d+ / \d+ / \d+)
  •  
  • </cfsavecontent>
  •  
  •  
  • <!---
  • Now, create a matcher to parse the input string and make
  • use of it.
  • --->
  • <cfset matcher = createObject( "component", "PatternMatcher" ).init(
  • pattern,
  • input
  • ) />
  •  
  • <!--- Create an array to keep track of the records. --->
  • <cfset records = [] />
  •  
  • <!--- Keep looping over the input to find the matches. --->
  • <cfloop condition="matcher.find()">
  •  
  • <!---
  • Create a record for this set of matches. When each pattern
  • is matched, we are given access to each captured group through
  • the group() method.
  •  
  • NOTE: Group Zero (0) is always the entire pattern match, even
  • if there is no capturing group around the entire pattern.
  • --->
  • <cfset record = {
  • name = matcher.group( 1 ),
  • phoneNumber = matcher.group( 2 ),
  • dateOfBirth = matcher.group( 3 )
  • } />
  •  
  • <!--- Add the record to the record set. --->
  • <cfset arrayAppend(
  • records,
  • record
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Output the matches. --->
  • <cfdump
  • var="#records#"
  • label="Parsed Record Data"
  • />

Here, we are given a chunk of textual data that contains "person" records. Since each record (line of text) adheres to a pattern, we can use our PatternMatcher.cfc to parse each row into a data structure. Our pattern is comprised of three captured groups - name, phone number, and date-of-birth. As we loop over each match, you'll notice that we have access to each of those captured groups through the group() method.

When we run the above code, we get the following CFDump output:

[Image: CFDump output labeled "Parsed Record Data"]

As you can see, the match-iteration provided by the PatternMatcher.cfc made our text input easily transformable.
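For comparison, the find() / group() loop that the component delegates to can be sketched directly in Java. This is an illustration under my own assumptions, not the component's code: the class FindLoopSketch, the constant RECORD, and the parse() helper are hypothetical names, and the pattern is the same three-group verbose expression used above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopSketch {

    // The (?x) flag turns on verbose mode: whitespace in the pattern is
    // ignored and "#" starts a comment that runs to the end of the line.
    static final Pattern RECORD = Pattern.compile(
        "(?x)\n" +
        "(\\w+)               # Group 1: name\n" +
        "\\s+\n" +
        "(\\d+ - \\d+ - \\d+)  # Group 2: phone number\n" +
        "\\s+\n" +
        "(\\d+ / \\d+ / \\d+)  # Group 3: date of birth\n"
    );

    // Hypothetical helper: turns each pattern match into a small record.
    static List<Map<String, String>> parse(String input) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(input);
        // Each call to find() advances the matcher to the next match.
        while (matcher.find()) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("name", matcher.group(1));
            record.put("phoneNumber", matcher.group(2));
            record.put("dateOfBirth", matcher.group(3));
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "Sarah 212-555-1234 01/12/1980\nKim 212-555-9399 07/14/1975\n";
        System.out.println(parse(input));
    }
}
```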

Gathering data is only half of the magic. Replacing matches is the other. This time, we'll use the same pattern and input; but, rather than aggregating the pattern matches, we'll replace them with altered text.

NOTE: This demo was run directly after the previous one; this is why we start off by reset()'ing the matcher and do not need to re-define our pattern or input values.

  • <!---
  • Now, let's imagine that we want to go through the input and XXX
  • out people's dates of birth. First we'll want to reset our
  • matcher.
  • --->
  • <cfset matcher.reset() />
  •  
  • <!--- Now, let's loop over the matches again. --->
  • <cfloop condition="matcher.find()">
  •  
  • <!---
  • When replacing, we can use the captured group back references
  • for the values that we DO want to keep. Remember, the first
  • group was the name, the second the phone number, and the
  • third was the date of birth.
  • --->
  • <cfset matcher.replaceWith(
  • "$1 $2 MM/DD/YYYY"
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Output the result of the replacement. --->
  • <cfoutput>
  •  
  • <pre>
  • #matcher.result()#
  • </pre>
  •  
  • </cfoutput>

As you can see here, we are using the replaceWith() method to replace the current match with the given value. Part of our value uses back-references ($1, $2), which allow us to use captured groups within the replacement text. As we are performing these replacements, the PatternMatcher.cfc is building up an internal buffer; once we are done replacing values, we can then gain access to that internal buffer by using the result() method.

When we run the above code, we get the following output:

Sarah 212-555-1234 MM/DD/YYYY
Kim 212-555-9399 MM/DD/YYYY
Jenna 917-555-8712 MM/DD/YYYY
Tricia 646-555-9990 MM/DD/YYYY

As you can see, the date-of-birth was successfully "erased."
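The internal buffer mentioned above corresponds to Java's appendReplacement() / appendTail() pairing on the Matcher class. A minimal sketch of that mechanic, assuming a non-verbose version of the same pattern (the class ReplaceWithSketch and the maskBirthdays() helper are hypothetical names used only for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceWithSketch {

    // Hypothetical helper: masks the date of birth on every matched record.
    static String maskBirthdays(String input) {
        Matcher matcher = Pattern
            .compile("(\\w+)\\s+(\\d+-\\d+-\\d+)\\s+(\\d+/\\d+/\\d+)")
            .matcher(input);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            // Copies the text between the previous match and this one into
            // the buffer, then appends the replacement with $1 and $2
            // expanded to the captured name and phone number.
            matcher.appendReplacement(buffer, "$1 $2 MM/DD/YYYY");
        }
        // Copies whatever input remains after the last match.
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(maskBirthdays("Sarah 212-555-1234 01/12/1980\nKim 212-555-9399 07/14/1975\n"));
    }
}
```

Because the replacement happens one match at a time, the loop body is free to decide, per match, what the replacement text should be.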

The find(), group(), replaceWith(), replaceFirst(), and replaceAll() methods really comprise the core Pattern/Matcher functionality from the Java layer. As an added bonus, I have also added two utility methods - match() and matchGroups() - which provide a more powerful alternative to ColdFusion's native reMatch() function.

  • <!--- Output all the matches of the given pattern. --->
  • <cfdump
  • var="#matcher.match()#"
  • label="Match()"
  • />

Running the above code, we get the following output:

[Image: CFDump output labeled "Match()"]

This works like the native reMatch() function; only, it gives you access to the Java regular expression library, which is much more robust.

The matchGroups() function works like the match() function; only, it breaks the individual matches down by captured group:

  • <!---
  • Output all the matches of the given pattern, broken down
  • by captured group.
  • --->
  • <cfdump
  • var="#matcher.matchGroups()#"
  • label="MatchGroups()"
  • />

Running the above code, we get the following output:

[Image: CFDump output labeled "MatchGroups()"]
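One subtlety with both utility methods is that an optional group that did not participate in a match comes back from Java as null. The following sketch shows the aggregation logic in plain Java, with unmatched groups simply left out of the per-match map. The class MatchGroupsSketch and its two helpers are hypothetical names chosen to mirror match() and matchGroups(); they are not the component's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchGroupsSketch {

    // Hypothetical analogue of match(): collects every full pattern match.
    static List<String> match(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Hypothetical analogue of matchGroups(): one map per match, keyed by
    // group index, with index 0 holding the entire match.
    static List<Map<Integer, String>> matchGroups(Pattern pattern, String input) {
        List<Map<Integer, String>> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            Map<Integer, String> groups = new LinkedHashMap<>();
            for (int i = 0; i <= matcher.groupCount(); i++) {
                String value = matcher.group(i);
                // A group that did not participate in the match is null;
                // skip it rather than storing a null value.
                if (value != null) {
                    groups.put(i, value);
                }
            }
            matches.add(groups);
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern range = Pattern.compile("(\\d+)(?:-(\\d+))?");
        System.out.println(match(range, "10-20 30"));
        System.out.println(matchGroups(range, "10-20 30"));
    }
}
```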
 
 
 

Ok, enough exploration. Let's take a look at the PatternMatcher.cfc ColdFusion component:

PatternMatcher.cfc

  • <cfcomponent
  • output="false"
  • hint="I provide easier, implicitly type-cast access to the underlying Java Pattern and Matcher functionality.">
  •  
  •  
  • <cffunction
  • name="init"
  • access="public"
  • returntype="any"
  • output="false"
  • hint="I return an initialized component.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="pattern"
  • type="string"
  • required="true"
  • hint="I am the Java-compatible regular expression to be used to create this pattern matcher."
  • />
  •  
  • <cfargument
  • name="input"
  • type="string"
  • required="true"
  • hint="I am the input text over which we will be matching the above regular expression pattern."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var local = {} />
  •  
  • <!--- Store the original values. --->
  • <cfset variables.pattern = arguments.pattern />
  • <cfset variables.input = arguments.input />
  •  
  • <!---
  • Compile the regular expression pattern and get the
  • matcher for the given input sequence.
  • --->
  • <cfset variables.matcher =
  • createObject( "java", "java.util.regex.Pattern" )
  • .compile( javaCast( "string", variables.pattern ) )
  • .matcher( javaCast( "string", variables.input ) )
  • />
  •  
  • <!--- Create a buffer to store the replacement result. --->
  • <cfset variables.buffer = createObject( "java", "java.lang.StringBuffer" ).init() />
  •  
  • <!--- Return this object reference for method chaining. --->
  • <cfreturn this />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="find_"
  • access="public"
  • returntype="boolean"
  • output="false"
  • hint="I attempt to find the next pattern match located within the input string.">
  •  
  • <!--- Pass this request onto the matcher. --->
  • <cfreturn variables.matcher.find() />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="group"
  • access="public"
  • returntype="any"
  • output="false"
  • hint="I return the value captured by the given group. NOTE: Zero (0) will return the entire pattern match.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="index"
  • type="numeric"
  • required="false"
  • default="0"
  • />
  •  
  • <cfargument
  • name="default"
  • type="string"
  • required="false"
  • hint="I am the optional default to use if the given group (index) was not captured. Non-captured group references will return VOID. A default can be used to return non-void values."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var local = {} />
  •  
  • <!--- Get the given group value. --->
  • <cfset local.capturedValue = variables.matcher.group(
  • javaCast( "int", arguments.index )
  • ) />
  •  
  • <!---
  • Check to see if the given group was able to capture a
  • value (or if it did not, in which case, it will return
  • NULL, destroying the variable).
  • --->
  • <cfif structKeyExists( local, "capturedValue" )>
  •  
  • <!--- Return the captured value. --->
  • <cfreturn local.capturedValue />
  •  
  • <cfelseif structKeyExists( arguments, "default" )>
  •  
  • <!---
  • No group was captured, but a default value was
  • provided. Return the default value.
  • --->
  • <cfreturn arguments.default />
  •  
  • <cfelse>
  •  
  • <!---
  • No value was captured and no default was provided;
  • simply return VOID to the calling context.
  • --->
  • <cfreturn />
  •  
  • </cfif>
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="groupCount"
  • access="public"
  • returntype="numeric"
  • output="false"
  • hint="I return the number of capturing groups within the regular expression pattern.">
  •  
  • <!--- Pass this request onto the matcher. --->
  • <cfreturn variables.matcher.groupCount() />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="hasGroup"
  • access="public"
  • returntype="boolean"
  • output="false"
  • hint="I determine whether or not the given group was captured in the previous match.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="index"
  • type="numeric"
  • required="true"
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var local = {} />
  •  
  • <!--- Get the given group value. --->
  • <cfset local.capturedValue = variables.matcher.group(
  • javaCast( "int", arguments.index )
  • ) />
  •  
  • <!---
  • Return whether or not the given captured group exists.
  • If it was captured, the value will exist; if it was not
  • captured, the given group value will be NULL (and hence
  • not exist).
  • --->
  • <cfreturn structKeyExists( local, "capturedValue" ) />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="match"
  • access="public"
  • returntype="array"
  • output="false"
  • hint="I return the collection of all pattern matches found within the given input. NOTE: This resets the internal matcher.">
  •  
  • <!--- Define the local scope. --->
  • <cfset var local = {} />
  •  
  • <!--- Reset the pattern matcher. --->
  • <cfset this.reset() />
  •  
  • <!---
  • Create an array in which to hold the aggregated
  • pattern matches.
  • --->
  • <cfset local.matches = [] />
  •  
  • <!--- Keep looping, looking for matches. --->
  • <cfloop condition="variables.matcher.find()">
  •  
  • <!--- Gather the current match. --->
  • <cfset arrayAppend(
  • local.matches,
  • variables.matcher.group()
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Return the collected matches. --->
  • <cfreturn local.matches />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="matchGroups"
  • access="public"
  • returntype="array"
  • output="false"
  • hint="I return the collection of all pattern matches found within the given input, broken down by group. NOTE: This resets the internal matcher.">
  •  
  • <!--- Define the local scope. --->
  • <cfset var local = {} />
  •  
  • <!--- Reset the pattern matcher. --->
  • <cfset this.reset() />
  •  
  • <!---
  • Create an array in which to hold the aggregated
  • pattern matches.
  • --->
  • <cfset local.matches = [] />
  •  
  • <!--- Keep looping, looking for matches. --->
  • <cfloop condition="variables.matcher.find()">
  •  
  • <!--- Create a match object. --->
  • <cfset local.match = {} />
  •  
  • <!---
  • Move all of the captured groups into the match object
  • (with zero being the entire match).
  • --->
  • <cfloop
  • index="local.groupIndex"
  • from="0"
  • to="#variables.matcher.groupCount()#"
  • step="1">
  •  
  • <!--- Get the local value. --->
  • <cfset local.groupValue = variables.matcher.group(
  • javaCast( "int", local.groupIndex )
  • ) />
  •  
  • <!---
  • Check to see if the value exists and only set it
  • if it does; ColdFusion seems to not like having
  • a NULL set into the struct (although it really
  • shouldn't have a problem with it).
  • --->
  • <cfif structKeyExists( local, "groupValue" )>
  •  
  • <!--- Store the captured value. --->
  • <cfset local.match[ local.groupIndex ] = local.groupValue />
  •  
  • </cfif>
  •  
  • </cfloop>
  •  
  • <!--- Add the current match object. --->
  • <cfset arrayAppend(
  • local.matches,
  • local.match
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Return the collected matches. --->
  • <cfreturn local.matches />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="replaceAll"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I replace all the pattern matches of the original input with the given value.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="replacement"
  • type="string"
  • required="true"
  • hint="I am the string with which we are replacing the pattern matches."
  • />
  •  
  • <cfargument
  • name="quoteReplacement"
  • type="boolean"
  • required="false"
  • default="false"
  • hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)."
  • />
  •  
  • <!--- Check to see if we are quoting the replacement. --->
  • <cfif arguments.quoteReplacement>
  •  
  • <!--- Quote the replacement string. --->
  • <cfreturn variables.matcher.replaceAll(
  • variables.matcher.quoteReplacement(
  • javaCast( "string", arguments.replacement )
  • )
  • ) />
  •  
  • <cfelse>
  •  
  • <!--- Use the replacement text as-is. --->
  • <cfreturn variables.matcher.replaceAll(
  • javaCast( "string", arguments.replacement )
  • ) />
  •  
  • </cfif>
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="replaceFirst"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I replace the first pattern match of the original input with the given value.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="replacement"
  • type="string"
  • required="true"
  • hint="I am the string with which we are replacing the first pattern match."
  • />
  •  
  • <cfargument
  • name="quoteReplacement"
  • type="boolean"
  • required="false"
  • default="false"
  • hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)."
  • />
  •  
  • <!--- Check to see if we are quoting the replacement. --->
  • <cfif arguments.quoteReplacement>
  •  
  • <!--- Quote the replacement string. --->
  • <cfreturn variables.matcher.replaceFirst(
  • variables.matcher.quoteReplacement(
  • javaCast( "string", arguments.replacement )
  • )
  • ) />
  •  
  • <cfelse>
  •  
  • <!--- Use the replacement text as-is. --->
  • <cfreturn variables.matcher.replaceFirst(
  • javaCast( "string", arguments.replacement )
  • ) />
  •  
  • </cfif>
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="replaceWith"
  • access="public"
  • returntype="any"
  • output="false"
  • hint="I replace the current match with the given value. NOTE: Back references within the replacement string will be honored unless the replacement value is quoted (see second argument).">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="replacement"
  • type="string"
  • required="true"
  • hint="I am the value with which we are replacing the previous match."
  • />
  •  
  • <cfargument
  • name="quoteReplacement"
  • type="boolean"
  • required="false"
  • default="false"
  • hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)."
  • />
  •  
  • <!--- Check to see if we are quoting the replacement. --->
  • <cfif arguments.quoteReplacement>
  •  
  • <!--- Quote the replacement value before you use it. --->
  • <cfset variables.matcher.appendReplacement(
  • variables.buffer,
  • variables.matcher.quoteReplacement(
  • javaCast( "string", arguments.replacement )
  • )
  • ) />
  •  
  • <cfelse>
  •  
  • <!--- Use raw replacement value. --->
  • <cfset variables.matcher.appendReplacement(
  • variables.buffer,
  • javaCast( "string", arguments.replacement )
  • ) />
  •  
  • </cfif>
  •  
  • <!--- Return this object reference for method chaining. --->
  • <cfreturn this />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="reset"
  • access="public"
  • returntype="any"
  • output="false"
  • hint="I reset the pattern matcher.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="input"
  • type="string"
  • required="false"
  • hint="I am the optional input with which to reset the pattern matcher."
  • />
  •  
  • <!--- Check to see if a new input is being used. --->
  • <cfif structKeyExists( arguments, "input" )>
  •  
  • <!--- Use a new input to reset the matcher. --->
  • <cfset variables.matcher.reset(
  • javaCast( "string", arguments.input )
  • ) />
  •  
  • <!--- Store the input property. --->
  • <cfset variables.input = arguments.input />
  •  
  • <cfelse>
  •  
  • <!--- Reset the internal matcher. --->
  • <cfset variables.matcher.reset() />
  •  
  • </cfif>
  •  
  • <!--- Reset the internal results buffer. --->
  • <cfset variables.buffer = createObject( "java", "java.lang.StringBuffer" ).init() />
  •  
  • <!--- Return this object reference for method chaining. --->
  • <cfreturn this />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="result"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I return the result of the replacement up until this point.">
  •  
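  • <!--- Define the local scope. --->
  • <cfset var local = {} />
  •  
  • <!---
  • Append the rest of the unmatched input string to a COPY
  • of the results buffer. appendTail() does not advance the
  • matcher's append position, so writing into the shared
  • buffer would duplicate the tail if result() were called
  • more than once.
  • --->
  • <cfset local.output = createObject( "java", "java.lang.StringBuffer" ).init(
  • javaCast( "string", variables.buffer.toString() )
  • ) />
  •  
  • <cfset variables.matcher.appendTail(
  • local.output
  • ) />
  •  
  • <!--- Return the resultant string. --->
  • <cfreturn local.output.toString() />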
  • </cffunction>
  •  
  •  
  • <!--- ------------------------------------------------- --->
  • <!--- ------------------------------------------------- --->
  •  
  •  
  • <!---
  • Swap some of the method names; we couldn't name it "find"
  • to begin with otherwise we'd get a ColdFusion error for
  • conflicting with a native function name.
  • --->
  • <cfset this.find = this.find_ />
  •  
  • </cfcomponent>

Java's Pattern and Matcher classes are, without a doubt, amazing. They provide for a very efficient, very robust regular expression engine - much more powerful than the one at the native ColdFusion level. I'm always looking for ways to make using these classes easier. Creating a ColdFusion component wrapper might just be the easiest approach yet.



