REMatchGroups() ColdFusion User Defined Function

Posted November 15, 2007 at 11:30 AM

Tags: ColdFusion

I was reading over on CF-Talk and saw that Jon Clausen was trying to use back references in his REReplace() functions in a non-string context:

  • <cfset pageOut = reReplace(
  • pageContent,
  • "<%show:([a-zA-Z0-9_]+)%>",
  • appSettings["\1"],
  • "ALL"
  • )/>

The problem with this is that ColdFusion evaluates all of the arguments before REReplace() executes; at that point, \1 doesn't mean anything. The only reason \1 means anything inside a replacement string is that the regular expression engine interprets that string later, once for each pattern match.
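To make the evaluation order concrete, here is a minimal sketch. The contents of appSettings and pageContent are made up for illustration, and the "\1" key is added on purpose so that the struct lookup succeeds:

```cfml
<!--- Hypothetical settings, for illustration only. --->
<cfset appSettings = StructNew() />
<cfset appSettings[ "title" ] = "My Site" />

<!--- A key that is literally the two characters backslash and 1. --->
<cfset appSettings[ "\1" ] = "LITERAL" />

<cfset pageContent = "Welcome to <%show:title%>" />

<!---
	appSettings[ "\1" ] is resolved once, before REReplace() runs,
	so the third argument is simply the string "LITERAL". The
	regular expression engine never sees a back reference.
--->
<cfset pageOut = REReplace(
	pageContent,
	"<%show:([a-zA-Z0-9_]+)%>",
	appSettings[ "\1" ],
	"ALL"
	) />

<!--- The token is replaced with "LITERAL", not with appSettings.title. --->
<cfoutput>#pageOut#</cfoutput>
```

Without the artificial "\1" key, the struct lookup itself would most likely fail with an undefined-element error before REReplace() is ever called.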

The easiest way to deal with this is to use a function that returns the captured groups of the regular expression pattern so that you can deal with them individually. My RELoop.cfc ColdFusion custom tag can do this, but (and I'm sorry if this is ultra repetitive) I figured I would throw together a function that mimics ColdFusion 8's REMatch() function, with the twist that it returns the captured groups, not just the matched string:


  • <cffunction
  • name="REMatchGroups"
  • access="public"
  • returntype="array"
  • output="false"
  • hint="Returns the captured groups for each pattern match.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="Text"
  • type="string"
  • required="true"
  • hint="The target text in which we are trying to match patterns."
  • />
  •  
  • <cfargument
  • name="Pattern"
  • type="string"
  • required="true"
  • hint="The regular expression pattern that we are matching."
  • />
  •  
  • <cfargument
  • name="Scope"
  • type="string"
  • required="false"
  • default="ALL"
  • hint="The scope of pattern matching (valid is ONE or ALL)."
  • />
  •  
  •  
  • <!--- Define the local scope. --->
  • <cfset var LOCAL = StructNew() />
  •  
  •  
  • <!--- Create an array to hold our matches. --->
  • <cfset LOCAL.Results = ArrayNew( 1 ) />
  •  
  •  
  • <!--- Create the compiled pattern object. --->
  • <cfset LOCAL.Pattern = CreateObject(
  • "java",
  • "java.util.regex.Pattern"
  • ).Compile(
  • JavaCast( "string", ARGUMENTS.Pattern )
  • )
  • />
  •  
  • <!---
  • Create the matcher for our pattern based on
  • the target text.
  • --->
  • <cfset LOCAL.Matcher = LOCAL.Pattern.Matcher(
  • JavaCast( "string", ARGUMENTS.Text )
  • ) />
  •  
  •  
  • <!---
  • Keep looping over the pattern matcher until it can no
  • longer find a match OR the searching scope is satisfied.
  • --->
  • <cfloop condition="LOCAL.Matcher.Find()">
  •  
  • <!--- Create a struct to hold our groups. --->
  • <cfset LOCAL.Groups = StructNew() />
  •  
  •  
  • <!---
  • Loop over the captured groups to store each one
  • of them individually.
  • --->
  • <cfloop
  • index="LOCAL.GroupIndex"
  • from="0"
  • to="#LOCAL.Matcher.GroupCount()#"
  • step="1">
  •  
  • <!---
  • Store the captured group. If this group was not
  • captured, then the key will not be valid in the
  • struct (which is fine).
  • --->
  • <cfset LOCAL.Groups[ LOCAL.GroupIndex ] = LOCAL.Matcher.Group(
  • JavaCast( "int", LOCAL.GroupIndex )
  • ) />
  •  
  • </cfloop>
  •  
  •  
  • <!--- Add this group to our results. --->
  • <cfset ArrayAppend( LOCAL.Results, LOCAL.Groups ) />
  •  
  • <!---
  • Check to see if our search scope has been
  • satisfied by the number of matches found.
  • --->
  • <cfif (ARGUMENTS.Scope EQ "ONE")>
  •  
  • <!--- We found our one match, so break out. --->
  • <cfbreak />
  •  
  • </cfif>
  •  
  • </cfloop>
  •  
  •  
  • <!--- Return the results. --->
  • <cfreturn LOCAL.Results />
  • </cffunction>

This function takes the text you are working with and the regular expression pattern. Unlike ColdFusion 8's REMatch() function, it also gives you the option to specify match scoping: ALL or ONE (defaults to ALL). Let's take a look at an example:


  • <!--- Create the text that we will search. --->
  • <cfsavecontent variable="strText">
  • Jill: 212-555-1234
  • Sarah: 917.538.0001
  • Maria: 212.538.1234 x14
  • Kim: 212.555.5432 x5435
  • </cfsavecontent>
  •  
  •  
  • <!--- Collect the phone numbers. --->
  • <cfset arrMatches = REMatchGroups(
  • strText,
  • "(\d+)[. \-]?(\d+)[. \-]?(\d+)(?: x(\d+))?"
  • ) />
  •  
  •  
  • <!--- Dump out results. --->
  • <cfdump
  • var="#arrMatches#"
  • label="REMatchGroups() Results Array"
  • />

Here, we have a list of phone numbers in various formats. We want to grab all the numeric data, regardless of the delimiters, and return the captured groups. The pattern also includes an optional phone number extension, which may or may not be captured. Running the above code, we get the following CFDump output:

[Image: REMatchGroups() Phone Number Group Output]

Notice that each array item contains a structure of captured groups. The zero group is always the full string match and then each indexed group represents a captured group. Don't get worried that some of the struct keys say "undefined struct element". This is just what happens when you store a Java NULL value (the result of the Matcher's Group() method) into a struct key. Regardless of what the CFDump output looks like, things like StructKeyExists() still work as expected (as you'll see in a second).

Now, let's take that array, returned above, and output the phone numbers:


  • <!--- Loop over the phone numbers. --->
  • <cfloop
  • index="intI"
  • from="1"
  • to="#ArrayLen( arrMatches )#"
  • step="1">
  •  
  • <!--- Get the groups. --->
  • <cfset objGroups = arrMatches[ intI ] />
  •  
  • <!--- Wrap in CFOutput so the ## expressions are evaluated. --->
  • <cfoutput>
  • <p>
  • (#objGroups[ 1 ]#) #objGroups[ 2 ]#-#objGroups[ 3 ]#
  •  
  • <!--- Check to see if a phone ext. was found. --->
  • <cfif StructKeyExists( objGroups, "4" )>
  • x#objGroups[ 4 ]#
  • </cfif>
  • </p>
  • </cfoutput>
  •  
  • </cfloop>

Notice that we are assuming that groups 1, 2, and 3 exist as they are required for the pattern to match. We are then testing the existence of the 4th group to see if an extension was found. Running the above code, we get the following output:

(212) 555-1234

(917) 538-0001

(212) 538-1234 x14

(212) 555-5432 x5435
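The same approach could be applied to the token replacement that started this post. The following is only a sketch: pageContent and appSettings stand in for the variables in Jon's snippet with made-up values, and arrTokens and the StructKeyExists() guard are assumptions added for illustration. It relies on the REMatchGroups() function defined above.

```cfml
<!--- Hypothetical inputs, standing in for Jon's variables. --->
<cfset appSettings = StructNew() />
<cfset appSettings[ "title" ] = "My Site" />
<cfset pageContent = "Welcome to <%show:title%>" />

<!--- Collect every token; group 1 holds the setting name. --->
<cfset arrTokens = REMatchGroups(
	pageContent,
	"<%show:([a-zA-Z0-9_]+)%>"
	) />

<cfset pageOut = pageContent />

<cfloop
	index="intI"
	from="1"
	to="#ArrayLen( arrTokens )#"
	step="1">

	<cfset objGroups = arrTokens[ intI ] />

	<!--- Assumption: tokens with no matching setting are left in place. --->
	<cfif StructKeyExists( appSettings, objGroups[ 1 ] )>

		<!--- Group 0 is the full token text, such as <%show:title%>. --->
		<cfset pageOut = Replace(
			pageOut,
			objGroups[ 0 ],
			appSettings[ objGroups[ 1 ] ],
			"ALL"
			) />

	</cfif>

</cfloop>
```

Because Replace() is a plain string replacement, a setting value that itself contains a token could be picked up by a later iteration; whether that matters depends on the data.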

Ok, so by now, I think I have covered like every angle of finding patterns, returning groups, and acting on them in an iterative manner. Going forward, I will try to just point people to these examples rather than writing more examples.


Reader Comments

Ben,

I posted the following reply on CF-Talk, but I figured I'd post it here as well since CF-Talk seems to be running slow today:

<blockquote>
Ben,

Thanks! As always, you rock! I had kind of figured that the method execution order was the issue, but was hoping to find a workaround to "trick" the execution so that the regex backref was captured before the method was called.

Very nice code and potentially very useful! I'm not entirely sure that, in my situation, it's going to be faster than using a loop with reFind() as I have currently, though I'm going to check it out with some timers later today. I'll let you know the results.

What you've written, though, could be tremendously useful for parsing the contents of a file and turning it into workable data for reuse. In my case, I'm simply replacing the user-provided shorthand "tokens", so there's no need to retain the information after the token has been replaced.

Nice work!

Jon
</blockquote>

:-)

Posted by Jon Clausen on Nov 15, 2007 at 3:03 PM


@Jon,

Always glad to help. If nothing else, it gives me just one more example to which to point people when possible. Plus, I'm always happy to find new ways in which to make regular expressions useful to all programmers.

Posted by Ben Nadel on Nov 15, 2007 at 3:11 PM

