Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

Using ColdFusion Custom Tags To Help Explore Complex Regular Expressions

By Ben Nadel on
Tags: ColdFusion

At the beginning of March, I'm giving a talk on Regular Expressions at Scotch on the Rocks - Europe's premier ColdFusion conference. If you have followed my blog for any time, you know that I think regular expressions are some kind of wonderful; I've even played around with ColdFusion custom-tag-based regular expression functionality. But for the conference, I wanted to come up with something focused less on functionality and more on clarity. So, over the weekend, I created a set of custom tags that would still allow for very powerful regular-expression-replace functionality, but with built-in features that would aid in exploration and explanation.

The following ColdFusion custom tags create a sort-of domain specific language (DSL) for a regular expression replace. By using such a verbose approach, I was hoping that each element could be looked at and understood more clearly. Here is the basic outline of the tags:

  • <re:replace result="result" scope="[all,one]">
  • <re:input trim="[true]">
  •  
  • <!--- Your input text - that we are mutating. --->
  •  
  • </re:input>
  • <re:pattern verbose="[true]">
  •  
  • <!--- Your regular expression (RegEx) pattern. --->
  •  
  • </re:pattern>
  • <re:with>
  •  
  • <re:value trim="[true]">
  • <!--- Your replacement value (for the match). --->
  • </re:value>
  •  
  • </re:with>
  • </re:replace>

By default, this set of regular expression replace tags compiles the pattern in Verbose mode; that is, it compiles it in such a way that both comments and white-space are ignored. You can override this feature, but I felt that a default to verbosity would help lay out the patterns in such a way that they could be picked apart and commented. Regular expressions are a pain-in-the-butt to read; as such, I wanted to leverage white-space as much as possible in order to create readability.
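To make Verbose mode a little more concrete, here is a minimal sketch in plain Java, the regex engine these tags sit on top of. It compiles the same pattern twice, once compact and once spread out with comments, and shows that both behave the same. The class name, the phone-number pattern, and the `swap()` helper are made up for illustration; they are not part of the custom tags.

```java
import java.util.regex.Pattern;

class VerboseModeDemo {

    // Compact form: hard to annotate.
    static final Pattern COMPACT = Pattern.compile("(\\d{3})-(\\d{4})");

    // Verbose form: (?x) tells the engine to ignore white-space and
    // everything from "#" to the end of the line.
    static final Pattern VERBOSE = Pattern.compile(
        "(?x)               # Turn on verbose (comments) mode.\n" +
        "( \\d{3} )         # Group 1: three-digit prefix.\n" +
        "-                  # Literal dash.\n" +
        "( \\d{4} )         # Group 2: four-digit number.\n"
    );

    // Replace every match with its two groups swapped.
    static String swap(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("$2-$1");
    }

    public static void main(String[] args) {
        String input = "Call 555-1234 today.";
        System.out.println(swap(COMPACT, input));
        System.out.println(swap(VERBOSE, input));
    }
}
```

Both calls should print the same line, which is the whole point: the extra white-space and comments change how the pattern reads, not what it matches.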

Ok, now that we've seen the ColdFusion custom tag concept, let's take a look at an example:

  • <!--- Import the RE tag library. --->
  • <cfimport prefix="re" taglib="./" />
  •  
  •  
  • <!---
  • Perform a replace. The Replace action uses four tags - one
  • parent and three children - in an effort to more clearly outline
  • a regular expression replace action for discussion.
  •  
  • Replace = Parent tag [result, scope]
  • Input = The target text we are mutating [trim].
  • Pattern = The regular expression pattern we are using [verbose].
  • With = The replacement text [value].
  • Value = The value we are using to replace the match [trim].
  • --->
  • <re:replace result="result">
  • <re:input>
  •  
  • Hey Ben, this is Jill. I just wanted to say that
  • I enjoyed our dinner the other night. You seem
  • like a nice guy.
  •  
  • </re:input>
  • <re:pattern>
  •  
  • # Check for enjoyed, but only if it is NOT preceded
  • # by a modifier.
  • (
  • # Negative look-behind.
  • (?<!
  • (?:really|greatly) \s
  • )
  • enjoyed
  • )
  •  
  • |
  •  
  • # Check for a less-than-awesome description.
  • (nice \s guy)
  •  
  • |
  •  
  • # End of the entire string.
  • (\Z)
  •  
  • </re:pattern>
  • <re:with>
  •  
  • <!--- Check to see if the first group was found. --->
  • <cfif structKeyExists( variables, "$1" )>
  •  
  • <!--- Add juicy modifier. --->
  • <re:value>really $1</re:value>
  •  
  • </cfif>
  •  
  • <!--- Check to see if the second group was found. --->
  • <cfif structKeyExists( variables, "$2" )>
  •  
  • <!--- Make better. --->
  • <re:value>devastatingly handsome guy</re:value>
  •  
  • </cfif>
  •  
  • <!--- Check to see if the third group was found. --->
  • <cfif structKeyExists( variables, "$3" )>
  •  
  • <!--- End on a high note. --->
  • <re:value trim="false"> I can't wait to be near you again.</re:value>
  •  
  • </cfif>
  •  
  • </re:with>
  • </re:replace>
  •  
  •  
  • <!--- Output the result. --->
  • <cfoutput>
  •  
  • Result: <pre>#result#</pre>
  •  
  • </cfoutput>

As you can see, the regular expression pattern compiles in Verbose mode, which allows me to use a lot of white-space and commenting (and you know how much I love white-space AND commenting!). As the underlying Java Pattern Matcher iterates over the regular expression matches, it creates certain caller-scoped variables:

$0 - The entire content of the current match.

$1, $2, $3 ... $N - The content of the given captured group.

These values can then be leveraged within the ColdFusion logic that is used to determine which Value tag to render. It is the content of the Value tag that then gets replaced into the resultant text value. If a particular group within the pattern is not matched, the corresponding variable does not get created (well, it actually gets created as a NULL value); as such, structKeyExists() is used in this case to determine which part of the pattern has been matched.
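The "unmatched group comes back as NULL" behavior is a property of the Java Matcher itself, so it can be sketched without any ColdFusion at all. The example below reuses the three-way pattern from the demo and records which group was captured for each match. The class and method names are hypothetical, and the input is shortened for the sake of the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPresenceDemo {

    // Three alternatives, each wrapped in its own capturing group.
    static final Pattern PATTERN = Pattern.compile(
        "(?x)\n" +
        "( (?<! (?:really|greatly) \\s ) enjoyed )   # Group 1.\n" +
        "| ( nice \\s guy )                           # Group 2.\n" +
        "| ( \\Z )                                    # Group 3.\n"
    );

    // For every match, record the number of each group that took part.
    // A group that did not take part in the match is reported as null.
    static List<Integer> matchedGroups(String input) {
        List<Integer> found = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                if (matcher.group(i) != null) {
                    found.add(i);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matchedGroups("I enjoyed dinner. You seem like a nice guy."));
        System.out.println(matchedGroups("I really enjoyed dinner."));
    }
}
```

Because only one alternative can win per match, exactly one group is non-null each time. That is what the structKeyExists() checks in the With tag body are relying on.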

When we run the above code, we get the following output:

Result:
Hey Ben, this is Jill. I just wanted to say that
I really enjoyed our dinner the other night. You seem
like a devastatingly handsome guy. I can't wait to be near you again.

As you can see, our changes were successfully merged into the result.

So, what do you think? Would this kind of a demo make regular expressions easier to discuss and potentially understand? My goal here was in no way to be brief - there are much shorter ways to get the same exact functionality. The primary objective here was to create a context in which regular expressions could be broken down into bite-sized pieces.

Anyway, now that we've seen the code in action, let's take a look at the ColdFusion custom tags; I think you'll find that they are relatively simple.

Replace.cfm (Root tag)

  • <!--- Check to see what mode our tag is running in. --->
  • <cfif (thisTag.executionMode eq "start")>
  •  
  •  
  • <!--- Param tag attributes. --->
  •  
  • <!--- The return variable for the result. --->
  • <cfparam
  • name="attributes.result"
  • type="variableName"
  • />
  •  
  • <!---
  • The scope is the scope of the replacements. By default, we
  • will replace all matching instances. This can be overridden
  • with "one."
  • --->
  • <cfparam
  • name="attributes.scope"
  • type="regex"
  • pattern="all|one"
  • default="all"
  • />
  •  
  • <!---
  • This is the input text that we will be mutating with our
  • replace action. This value can (and probably should be)
  • overridden with the Input child tag.
  • --->
  • <cfparam
  • name="attributes.input"
  • type="string"
  • default=""
  • />
  •  
  • <!---
  • This is the pattern we will use to iterate over the given
  • input string. If supplied in the root tag (this tag), no
  • changes will be made. However, if the pattern is supplied
  • via the Pattern tag, an (?x) verbosity attribute will be
  • added implicitly.
  • --->
  • <cfparam
  • name="attributes.pattern"
  • type="string"
  • default=""
  • />
  •  
  •  
  • <!---
  • Before we do anything, our result will be the same as our
  • input. If there are no pattern matches, this will also hold
  • true.
  •  
  • NOTE: This *will* be overridden by the nested WITH tag. This
  • is here mostly to express intent.
  • --->
  • <cfset result = attributes.input />
  •  
  •  
  • <cfelse>
  •  
  •  
  • <!---
  • At this time, our With tag has finished merging all the
  • replacements. Now, let's store the result in the caller
  • scope.
  • --->
  • <cfset caller[ attributes.result ] = result />
  •  
  • <!---
  • Clear the generated output so we don't produce any unwanted
  • output on the page.
  • --->
  • <cfset thisTag.generatedContent = "" />
  •  
  •  
  • </cfif>

As you might have noticed in this ColdFusion custom tag, there are attributes for Input and Pattern. If you wanted to, you could use these attributes instead of the nested child tags; however, with the goal of clarity, I provided these only for theoretical interest.

Input.cfm

  • <!--- Check to see what mode our tag is running in. --->
  • <cfif (thisTag.executionMode eq "start")>
  •  
  •  
  • <!--- Param the tag attributes. --->
  •  
  • <!---
  • This determines whether or not the input should be trimmed.
  • This is the default, but can be overridden with a false.
  • --->
  • <cfparam
  • name="attributes.trim"
  • type="boolean"
  • default="true"
  • />
  •  
  •  
  • <cfelse>
  •  
  •  
  • <!--- Gather the generated content. --->
  • <cfset content = thisTag.generatedContent />
  •  
  • <!--- Check to see if the content is to be trimmed. --->
  • <cfif attributes.trim>
  •  
  • <!---
  • Perform both a general trim and a per-line trim. This
  • will remove the leading and trailing white-space on
  • every line of the input.
  • --->
  • <cfset content = reReplace(
  • trim( content ),
  • "(?m)^\s+|\s+$",
  • "",
  • "all"
  • ) />
  •  
  • </cfif>
  •  
  • <!---
  • Store the gathered content as the input in the parent tag.
  • This will override any value that was defined using the root
  • RE:replace tag.
  • --->
  • <cfset getBaseTagData( "cf_replace" ).attributes.input = content />
  •  
  • <!---
  • Clear the generated content so we don't produce any unwanted
  • output on the page.
  • --->
  • <cfset thisTag.generatedContent = "" />
  •  
  •  
  • </cfif>

By default, the input contained within the Input child node is trimmed on a per-line basis. That is, the leading and trailing white-space on each line is trimmed. This can always be overridden with the trim="false" attribute. Once collected, the input value is stored back in the parent tag's input attribute.
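For anyone curious what the per-line trim does to a block of text, here is a rough Java sketch that applies the same expression, `(?m)^\s+|\s+$`, after a general trim. It assumes Java's regex engine; ColdFusion's reReplace() uses a different engine, so treat the blank-line behavior noted below as a Java observation rather than a statement about the tag. The second method is a hypothetical variant, not something the Input tag does.

```java
class PerLineTrimDemo {

    // The same expression the Input tag hands to reReplace(), run through
    // Java's engine: strip white-space at the start and end of every line.
    static String trimLines(String input) {
        return input.trim().replaceAll("(?m)^\\s+|\\s+$", "");
    }

    // Hypothetical stricter variant: only spaces and tabs are stripped,
    // so line breaks (and therefore blank lines) are left alone.
    static String trimLinesKeepBlank(String input) {
        return input.trim().replaceAll("(?m)^[ \\t]+|[ \\t]+$", "");
    }

    public static void main(String[] args) {
        String input = "\n    Hey Ben, this is Jill.   \n    You seem like a nice guy.\n\n";
        System.out.println(trimLines(input));
    }
}
```

One thing to watch in Java: `\s` also matches line breaks, so a blank line in the middle of the input gets swallowed and the lines around it are joined. If blank lines matter, a space-and-tab character class like the one in `trimLinesKeepBlank()` is the safer choice.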

Pattern.cfm

  • <!--- Check to see what mode our tag is running in. --->
  • <cfif (thisTag.executionMode eq "start")>
  •  
  •  
  • <!--- Param the tag attributes. --->
  •  
  • <!---
  • This determines whether or not the verbosity flag is
  • automatically added to the collected pattern. By default,
  • it is added, but it can be overridden with a false.
  • --->
  • <cfparam
  • name="attributes.verbose"
  • type="boolean"
  • default="true"
  • />
  •  
  •  
  • <cfelse>
  •  
  •  
  • <!---
  • Gather the generated content. This will serve as our regular
  • expression pattern.
  • --->
  • <cfset pattern = thisTag.generatedContent />
  •  
  • <!---
  • Check to see if we should add the verbose flag. This will
  • allow the regular expression pattern to contain comments and
  • whitespace that get ignored.
  • --->
  • <cfif attributes.verbose>
  •  
  • <!--- Prepend the verbose tag. --->
  • <cfset pattern = ("(?x)" & pattern) />
  •  
  • </cfif>
  •  
  • <!---
  • Store the gathered content as the pattern in the parent tag.
  • This will override any value that was defined using the root
  • RE:replace tag.
  • --->
  • <cfset getBaseTagData( "cf_replace" ).attributes.pattern = pattern />
  •  
  • <!---
  • Clear the generated content so we don't produce any unwanted
  • output on the page.
  • --->
  • <cfset thisTag.generatedContent = "" />
  •  
  •  
  • </cfif>

By default, the pattern is compiled with a Verbose flag (?x). This is what allows us to use all of the white-space and commenting. If you didn't want that, for some reason, you could always override it with a verbose="false" attribute. Once collected, the pattern value is stored back into the parent tag's pattern attribute.
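One consequence of prepending (?x) is worth seeing in isolation: a literal space in the pattern stops matching, which is why the example pattern spells its spaces as `\s`. The sketch below mimics the Pattern tag's "prepend if verbose" step in plain Java. The `compile()` and `found()` helpers are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class VerbosePrefixDemo {

    // Mirrors the Pattern tag: optionally prepend "(?x)" to the collected text.
    static Pattern compile(String collected, boolean verbose) {
        return Pattern.compile(verbose ? "(?x)" + collected : collected);
    }

    // True if the (optionally verbose) pattern occurs anywhere in the text.
    static boolean found(String collected, boolean verbose, String text) {
        return compile(collected, verbose).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "You seem like a nice guy.";

        // Verbose on: the space is ignored, so this looks for "niceguy".
        System.out.println(found("nice guy", true, text));

        // Verbose on, space written as \s: matches as intended.
        System.out.println(found("nice \\s guy", true, text));

        // Verbose off: the pattern text is used exactly as written.
        System.out.println(found("nice guy", false, text));
    }
}
```

So if you flip verbose="false" on an existing pattern, remember that all of that decorative white-space suddenly becomes part of the match.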

With.cfm

This is the tag that really does most of the heavy lifting. This is where we create the underlying Java Pattern and Matcher objects that are used to iterate over the input string and perform the regular expression replacement.

  • <!--- Check to see what mode our tag is running in. --->
  • <cfif (thisTag.executionMode eq "start")>
  •  
  •  
  • <!--- Param the tag attributes. --->
  •  
  • <!---
  • This will act as our default replacement text for our pattern
  • match. This can be overridden with a nested Value tag.
  • --->
  • <cfparam
  • name="attributes.value"
  • type="string"
  • default=""
  • />
  •  
  •  
  • <!--- Get a reference to the base tag. --->
  • <cfset replaceTag = getBaseTagData( "cf_replace" ) />
  •  
  • <!--- Compile the regular expression into a Pattern object. --->
  • <cfset pattern = createObject( "java", "java.util.regex.Pattern" )
  • .compile(
  • javaCast( "string", replaceTag.attributes.pattern )
  • )
  • />
  •  
  • <!---
  • Get a matcher for the pattern as it is applied to the
  • input text.
  • --->
  • <cfset matcher = pattern.matcher(
  • javaCast( "string", replaceTag.attributes.input )
  • ) />
  •  
  • <!---
  • Before we start iterating over the matches, we have to create
  • a string buffer in which we will build the results.
  • --->
  • <cfset buffer = createObject( "java", "java.lang.StringBuffer" ).init() />
  •  
  •  
  • <!---
  • Now, we are going to start replacing the pattern matches
  • with our value.
  • --->
  • <cfif !matcher.find()>
  •  
  • <!---
  • There was not even a single match. As such, there's no
  • need to continue processing this tag. Store the unchanged
  • input as the result in the parent tag.
  • --->
  • <cfset replaceTag.result = replaceTag.attributes.input />
  •  
  • <!--- Exit out of this tag. --->
  • <cfexit method="exitTag" />
  •  
  • </cfif>
  •  
  •  
  • <!---
  • If we have made it this far, then there was a first match.
  • At this point, we have to prepare the CALLER scope to contain
  • the group values.
  • --->
  • <cfset caller[ "$0" ] = matcher.group() />
  •  
  • <!--- Loop over the group to store the captured groups. --->
  • <cfloop
  • index="groupIndex"
  • from="1"
  • to="#matcher.groupCount()#"
  • step="1">
  •  
  • <!---
  • Store the group value.
  •  
  • NOTE: If this group was not captured, then this value
  • will end up being NULL which means that the caller-based
  • value will be destroyed.
  • --->
  • <cfset caller[ "$#groupIndex#" ] = matcher.group(
  • javaCast( "int", groupIndex )
  • ) />
  •  
  • </cfloop>
  •  
  •  
  • <cfelse>
  •  
  •  
  • <!---
  • At this point, the With tag has had a chance to execute and
  • to set up a Value for our replacement. This will have been
  • stored in our Value attribute (even if defined in a nested
  • Value tag).
  • --->
  • <cfset matcher.appendReplacement(
  • buffer,
  • javaCast( "string", attributes.value )
  • ) />
  •  
  •  
  • <!---
  • Check to see the scope of our replacement action. If it is
  • ONE, we can just stop now.
  • --->
  • <cfif (replaceTag.attributes.scope eq "one")>
  •  
  • <!--- Append the rest of the content to the buffer. --->
  • <cfset matcher.appendTail( buffer ) />
  •  
  • <!--- Store the result into the parent tag. --->
  • <cfset replaceTag.result = buffer.toString() />
  •  
  • <!--- Exit out of this tag - no more matches to be found. --->
  • <cfexit method="exitTag" />
  •  
  • </cfif>
  •  
  •  
  • <!---
  • If we have made it this far then we are going to keep
  • replacing pattern matches until we have found them all.
  • Check for the next match.
  • --->
  • <cfif !matcher.find()>
  •  
  • <!---
  • No further matches could be found. As such, we are done
  • looking. Append the rest of the input to the buffer.
  • --->
  • <cfset matcher.appendTail( buffer ) />
  •  
  • <!--- Store the result into the parent tag. --->
  • <cfset replaceTag.result = buffer.toString() />
  •  
  • <!--- Exit out of this tag - no more matches to be found. --->
  • <cfexit method="exitTag" />
  •  
  • </cfif>
  •  
  •  
  • <!---
  • If we have made it this far, then we found a new match. At
  • this point, we have to prepare the CALLER scope to contain
  • the group values.
  • --->
  •  
  • <cfset caller[ "$0" ] = matcher.group() />
  •  
  • <!--- Loop over the group to store the captured groups. --->
  • <cfloop
  • index="groupIndex"
  • from="1"
  • to="#matcher.groupCount()#"
  • step="1">
  •  
  • <!---
  • Store the group value.
  •  
  • NOTE: If this group was not captured, then this value
  • will end up being NULL which means that the caller-based
  • value will be destroyed.
  • --->
  • <cfset caller[ "$#groupIndex#" ] = matcher.group(
  • javaCast( "int", groupIndex )
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Loop back to the beginning of the With tag. --->
  • <cfexit method="loop" />
  •  
  •  
  • </cfif>

As you can see, this tag uses the looping potential of ColdFusion custom tags. Since the With tag needs to be executed for all the matches contained within the input string, the End-mode of the tag needs to re-execute the With-tag-body for each match. Custom tags are so wicked awesome!
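Stripped of the custom tag plumbing, the With tag is driving the standard find / appendReplacement / appendTail loop of the Java Matcher. Here is a minimal sketch of that loop in plain Java, with a callback standing in for the tag body that picks the replacement value. The class, the `replace()` signature, and the `chooseValue()` callback are assumptions made for illustration, and the `replaceAll` flag plays the role of scope="all" versus scope="one".

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceLoopDemo {

    static final String REGEX =
        "(?x) ( (?<! (?:really|greatly) \\s ) enjoyed ) | ( nice \\s guy ) | ( \\Z )";

    // Find each match, append the value chosen for it, and stop after the
    // first match when only one replacement is wanted.
    static String replace(String input, String regex, boolean replaceAll,
                          Function<MatchResult, String> chooseValue) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuffer buffer = new StringBuffer();

        while (matcher.find()) {
            matcher.appendReplacement(buffer, chooseValue.apply(matcher));
            if (!replaceAll) {
                break;
            }
        }

        // Copy whatever follows the last replaced match (or the whole
        // input, if nothing matched).
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    // Stand-in for the With tag body: pick a value based on the matched group.
    static String chooseValue(MatchResult match) {
        if (match.group(1) != null) {
            return "really $1";
        }
        if (match.group(2) != null) {
            return "devastatingly handsome guy";
        }
        return " I can't wait to be near you again.";
    }

    public static void main(String[] args) {
        String input = "I enjoyed our dinner. You seem like a nice guy.";
        System.out.println(replace(input, REGEX, true, ReplaceLoopDemo::chooseValue));
        System.out.println(replace(input, REGEX, false, ReplaceLoopDemo::chooseValue));
    }
}
```

The custom tag version cannot use a plain while-loop because the "body" is ColdFusion markup, which is why it leans on cfexit with method="loop" to re-run the tag body for every match.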

Value.cfm

The Value tag does nothing more than figure out which value is going to be used within the regular expression replace performed within the With custom tag.

  • <!--- Check to see what mode our tag is running in. --->
  • <cfif (thisTag.executionMode eq "start")>
  •  
  •  
  • <!--- Param the tag attributes. --->
  •  
  • <!---
  • This determines whether or not the value should be trimmed.
  • This is the default, but can be overridden with a false.
  • --->
  • <cfparam
  • name="attributes.trim"
  • type="boolean"
  • default="true"
  • />
  •  
  •  
  • <cfelse>
  •  
  •  
  • <!--- Gather the generated content as the value. --->
  • <cfset value = thisTag.generatedContent />
  •  
  • <!--- Check to see if the value should be trimmed. --->
  • <cfif attributes.trim>
  •  
  • <cfset value = trim( value ) />
  •  
  • </cfif>
  •  
  • <!---
  • Store the generated content as the value in the parent
  • With tag.
  • --->
  • <cfset getBaseTagData( "cf_with" ).attributes.value = value />
  •  
  • <!--- Clear the generated content. --->
  • <cfset thisTag.generatedContent = "" />
  •  
  •  
  • </cfif>
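One detail about the collected value is easy to miss: it is handed to the Java Matcher's appendReplacement() method, which treats `$1` as a back-reference to group 1. That is why `really $1` in the example comes out as "really enjoyed". The sketch below shows the difference between passing the value as-is and escaping it first with Matcher.quoteReplacement(). The `replaceFirst()` helper and its `literal` flag are hypothetical; the Value tag itself has no such option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementValueDemo {

    // Replace the first match of regex with value. When literal is true,
    // the value is escaped so that "$" and "\" lose their special meaning.
    static String replaceFirst(String input, String regex, String value, boolean literal) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuffer buffer = new StringBuffer();

        if (matcher.find()) {
            String replacement = literal ? Matcher.quoteReplacement(value) : value;
            matcher.appendReplacement(buffer, replacement);
        }

        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        String input = "I enjoyed our dinner.";

        // "$1" is expanded to the text captured by group 1.
        System.out.println(replaceFirst(input, "(enjoyed)", "really $1", false));

        // Escaped: "$1" is inserted as plain text.
        System.out.println(replaceFirst(input, "(enjoyed)", "really $1", true));
    }
}
```

In practice this means a replacement value that contains a dollar sign for other reasons (a price, say) would need escaping before it reaches appendReplacement(); in Java, a reference to a group that does not exist raises an exception rather than being inserted as text.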

That's all there is to it. Other than the With tag, I think you can see that these ColdFusion custom tags do little more than collect their own generated content. But, again, the goal here wasn't to create sleek, efficient custom tags; rather, it was to create a framework in which regular expressions could be easily picked apart and discussed. I'm not sure how, or even if, I will use them within my talk; but, at least it got me thinking about picking regular expressions apart. What do you think?




Reader Comments

Very very nice. Using custom tags to help make your end code cleaner. Sweet. One small suggestion. Many times the regex is complex, but the input string is not. Most of the time it is an existing variable. Modify input.cfm so I do not need to wrap anything. Allow me to do:

<cf_input variable="#s#" />

Right now I'd have to

<cf_input><cfoutput>#s#</cfoutput></cf_input>

Which seems like overkill.

Just my 2 cents. :)

I think it's the meanings of the ?*$^[] etc characters that make them complex and daunting. If your demo code makes the patterns clearer with white space, that's great.

It's probably wrong of me but I find half the appeal of regular expressions is the perlesque density of them.

Yep, I think it would go a ways to understanding regular expressions. The problem is that the expression syntax really puts my mind into a whirling cauldron of letters and symbols all covered in pea soup.

@Inj,

Thanks for the positive feedback.

@Raymond,

Definitely a good point. I could easily add that to the Input tag. Groovy.

@Adam,

Ha ha, I know what you mean. It is awesome how compact, yet powerful they are; but, when teaching people, they need to be unrolled :)

@Lola,

Yeah, exactly - reading a RegEx is highly overrated, especially if you're not even sure what it does. Parsing and organizing mentally is way too hard. Hopefully the comments / white-space will go a long way.