Turning Modes On And Off Within A Regular Expression

Posted January 13, 2011 at 10:50 AM by Ben Nadel

Tags: ColdFusion

Most regular expression engines have at least some support for flags and modes within their patterns. Flags such as (x) Verbose, (i) Case-Insensitive, and (m) Multiline change the mode of the regular expression matching which, of course, changes the type of text that can be matched. I've used these flags plenty of times in the past; however, with my Scotch on the Rocks (SOTR) presentation approaching, I figured it was time to really see just how these flags can be used within a regular expression.

Up until now, I've only ever used these flags at the beginning of a regular expression in order to turn on a given mode for the entire pattern. For example, as you saw in my ColdFusion custom tag blog post yesterday, I was prepending every pattern with the (x) flag:

<cfset pattern = ("(?x)" & pattern) />

... in order to turn on Verbose mode for the entire pattern matching operation.
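To make the effect of the (?x) flag concrete, here is a minimal sketch in Java, assuming the java.util.regex engine. The class name, the matchesVerbose() helper, and the phone-number-style pattern are hypothetical and only serve as illustration. In verbose mode, whitespace and #-comments inside the pattern are ignored; without the flag, they are treated as literal text.

```java
import java.util.regex.Pattern;

public class VerboseFlagDemo {

    // Hypothetical helper: prepend the inline (?x) flag, mirroring the
    // ColdFusion snippet above, then test the whole input against the pattern.
    static boolean matchesVerbose(String pattern, String input) {
        return Pattern.compile("(?x)" + pattern).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Illustrative pattern only: each line carries its own comment.
        String pattern =
            "\\d{3}   # first three digits\n" +
            "-        # separator\n" +
            "\\d{4}   # last four digits";

        // With (?x), the spaces and comments are ignored.
        System.out.println(matchesVerbose(pattern, "555-1234")); // true

        // Without (?x), the spaces and comments are matched literally.
        System.out.println(Pattern.compile(pattern).matcher("555-1234").matches()); // false
    }
}
```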

As it turns out, however, the modes determined by these flags don't just have to be toggled once. These flags can be used throughout the regular expression pattern in order to turn on and off the associated modes at will. So, for example, if you wanted to turn on the case-insensitive mode half-way through the pattern, you would just include the (?i) flag in the middle:

abc(?i)xyz

In this case, the literal "abc" would be matched case-sensitively; however, the latter half of the pattern, "xyz," would be matched without regard to case. Each flag turns on the corresponding mode for the part of the pattern that follows it.
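As a quick illustration, the following sketch runs the pattern above through Java's java.util.regex engine, which honors mid-pattern flags (other engines may not). The class name and the fullMatch() helper are hypothetical.

```java
import java.util.regex.Pattern;

public class MidPatternFlagDemo {

    // Hypothetical helper: true when the entire input matches the pattern.
    static boolean fullMatch(String pattern, String input) {
        return Pattern.matches(pattern, input);
    }

    public static void main(String[] args) {
        String pattern = "abc(?i)xyz";

        // "abc" is case-sensitive; "xyz" is case-insensitive.
        System.out.println(fullMatch(pattern, "abcxyz")); // true
        System.out.println(fullMatch(pattern, "abcXYZ")); // true
        System.out.println(fullMatch(pattern, "ABCxyz")); // false
    }
}
```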

While everything I've looked at so far involved turning on modes, flags can also be used to turn off modes within a regular expression. To turn a mode off, simply precede the flag (or set of flags) with a minus sign:

(?-i)
(?i-m)
(?-ixm)
(?mx-i)

The above flags simply demonstrate a variety of ways in which the minus sign can be integrated within the flag construct. All the flags to the left of the minus sign are used to turn modes on; all flags to the right of the minus sign are used to turn modes off. So, for example, the pattern:

(?i-xm)

... is turning on case-insensitive mode (i), but turning off verbose mode (x) and multi-line mode (m).
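Here is a small sketch of the on/off syntax, again assuming Java's java.util.regex engine. The class name, the fullMatch() helper, and the sample patterns are hypothetical. The second pattern combines both halves of the construct: it turns case-insensitive mode on and verbose mode off in a single flag group.

```java
import java.util.regex.Pattern;

public class FlagOffDemo {

    // Hypothetical helper: true when the entire input matches the pattern.
    static boolean fullMatch(String pattern, String input) {
        return Pattern.matches(pattern, input);
    }

    public static void main(String[] args) {
        // Case-insensitive for "ab", then case-sensitive again for "c".
        System.out.println(fullMatch("(?i)ab(?-i)c", "ABc")); // true
        System.out.println(fullMatch("(?i)ab(?-i)c", "ABC")); // false

        // Verbose mode for " a b " (spaces ignored), then (?i-x) turns
        // case-insensitivity on and verbose mode off, so the space in
        // "c d" becomes a literal space.
        String mixed = "(?x) a b (?i-x)c d";
        System.out.println(fullMatch(mixed, "abC D")); // true
        System.out.println(fullMatch(mixed, "ABc d")); // false
    }
}
```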

With these on/off toggles, we can now apply a given set of modes to a portion of a regular expression pattern. There is, however, an even more terse way to apply a set of modes to a single portion of a regular expression pattern: a non-capturing group.

In the past, I've only ever used a non-capturing group to define a group that doesn't get tracked as a back-reference:

(?:non captured group)

As it turns out, pattern flags can be applied in this context; and, when applied, they are only turned on or off for the duration of the non-capturing group:

(?i:non captured group)

In this example, we are turning on the case-insensitive mode (i), but only for the duration of the non-capturing group.
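The scoping can be sketched in Java as follows, assuming the java.util.regex engine. The class name and both helpers are hypothetical. The sketch also checks that a flagged group, like a plain (?:...) group, is not counted as a capturing group.

```java
import java.util.regex.Pattern;

public class ScopedFlagDemo {

    // Hypothetical helper: true when the entire input matches the pattern.
    static boolean fullMatch(String pattern, String input) {
        return Pattern.matches(pattern, input);
    }

    // Hypothetical helper: number of capturing groups in the pattern.
    static int groupCount(String pattern) {
        return Pattern.compile(pattern).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Case-insensitivity applies to "ab" only; "c" stays case-sensitive.
        System.out.println(fullMatch("(?i:ab)c", "ABc")); // true
        System.out.println(fullMatch("(?i:ab)c", "ABC")); // false

        // Only the plain parentheses around "c" capture.
        System.out.println(groupCount("(?i:ab)(c)")); // 1
    }
}
```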

This is pretty awesome stuff!

Well, sort of. This level of support doesn't actually exist in all regular expression engines. In fact, in my testing, I discovered that this level of support doesn't even exist at the ColdFusion level. Just as with JavaScript, all flags used within a regular expression pattern get applied to the entire pattern, not just to the portion of the pattern that follows them. As such, you can't turn on a mode for only part of a pattern.

And, any attempt to turn off a mode within ColdFusion will throw an error like this:

Sequence (?-...) not recognized null

... and, any attempt to use a non-capturing group to apply a mode to a specific portion of a regular expression pattern will throw an error like this:

Sequence (?:...) not recognized null

The regular expression engine at the ColdFusion level is good for basic stuff; but, unfortunately, it sucks for almost everything else (including performance). Luckily, however, none of the constraints at the ColdFusion level apply to the regular expression engine at the Java level. So, let's dip down and get our hands dirty with the Java Pattern Matcher object.

<cffunction
    name="jreMatch"
    access="public"
    returntype="array"
    output="false"
    hint="I gather all instances of the given pattern within the given string.">

    <!--- Define arguments. --->
    <cfargument
        name="pattern"
        type="string"
        required="true"
        hint="I am the regular expression pattern being matched within the input string."
        />

    <cfargument
        name="input"
        type="string"
        required="true"
        hint="I am the input in which the patterns are being matched."
        />

    <!--- Define the local scope. --->
    <cfset var local = {} />

    <!--- Get the matcher for the given regular expression. --->
    <cfset local.matcher =
        createObject( "java", "java.util.regex.Pattern" )
            .compile( javaCast( "string", arguments.pattern ) )
            .matcher( javaCast( "string", arguments.input ) )
        />

    <!--- Create an array to hold the matches. --->
    <cfset local.matches = [] />

    <!---
        Keep searching the input string while matches of our regular
        expression pattern can be found.
    --->
    <cfloop condition="local.matcher.find()">

        <!--- Add the current match to the collection. --->
        <cfset arrayAppend(
            local.matches,
            local.matcher.group()
            ) />

    </cfloop>

    <!--- Return the aggregated matches. --->
    <cfreturn local.matches />
</cffunction>


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!--- Define our input. --->
<cfset input = "ABCxyz" />

<cfoutput>

    INPUT == #input#<br />
    <br />

    abc:
    #arrayToList( jreMatch( "abc", input ) )#
    <br />

    (?i)abc ==
    #arrayToList( jreMatch( "(?i)abc", input ) )#
    <br />

    (?i)ab(?-i)c ==
    #arrayToList( jreMatch( "(?i)ab(?-i)c", input ) )#
    <br />

    (?i:ab)c ==
    #arrayToList( jreMatch( "(?i:ab)c", input ) )#
    <br />

    (?i:abc) ==
    #arrayToList( jreMatch( "(?i:abc)", input ) )#
    <br />

    (?i)(abc)(XYZ) ==
    #arrayToList( jreMatch( "(?i)(abc)(XYZ)", input ) )#
    <br />

    (?i)(ABC)(?-i)(XYZ) ==
    #arrayToList( jreMatch( "(?i)(ABC)(?-i)(XYZ)", input ) )#
    <br />

    (?i)(abc)(?-i:xyz) ==
    #arrayToList( jreMatch( "(?i)(abc)(?-i:xyz)", input ) )#
    <br />

    abc(?i)XYZ ==
    #arrayToList( jreMatch( "abc(?i)XYZ", input ) )#
    <br />

    ABC(?i)XYZ ==
    #arrayToList( jreMatch( "ABC(?i)XYZ", input ) )#
    <br />

</cfoutput>

Since there is no reMatch()-style method in Java, we start out by creating a ColdFusion UDF that uses the Java Matcher object in order to compile a collection of pattern matches. Then, we go about trying various flags to turn on and off case-insensitive mode within various regular expressions. When we run the above code, we get the following output:

INPUT == ABCxyz

abc:
(?i)abc == ABC
(?i)ab(?-i)c ==
(?i:ab)c ==
(?i:abc) == ABC
(?i)(abc)(XYZ) == ABCxyz
(?i)(ABC)(?-i)(XYZ) ==
(?i)(abc)(?-i:xyz) == ABCxyz
abc(?i)XYZ ==
ABC(?i)XYZ == ABCxyz
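For readers who want to check these results outside of ColdFusion, the same test can be sketched in plain Java. This is an illustrative counterpart, not the code that produced the output above: the class name is hypothetical, and the jreMatch() method simply mirrors the UDF's find-and-collect loop. Each line it prints uses the same "pattern == matches" layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JreMatchDemo {

    // Plain-Java counterpart to the jreMatch() UDF: gather every match
    // of the pattern within the input.
    static List<String> jreMatch(String pattern, String input) {
        Matcher matcher = Pattern.compile(pattern).matcher(input);
        List<String> matches = new ArrayList<>();

        // Keep searching while matches can be found.
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "ABCxyz";
        String[] patterns = {
            "abc",
            "(?i)abc",
            "(?i)ab(?-i)c",
            "(?i:ab)c",
            "(?i:abc)",
            "(?i)(abc)(XYZ)",
            "(?i)(ABC)(?-i)(XYZ)",
            "(?i)(abc)(?-i:xyz)",
            "abc(?i)XYZ",
            "ABC(?i)XYZ"
        };

        System.out.println("INPUT == " + input);
        for (String pattern : patterns) {
            // Join the matches into a comma-delimited list, like arrayToList().
            System.out.println(pattern + " == " + String.join(",", jreMatch(pattern, input)));
        }
    }
}
```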

So there you go; at the Java level, you can use flags to both turn on and off a given mode (or set of modes) for very specific durations of a regular expression pattern. I'll tell you, though, if we ever get the ability to overwrite native methods in ColdFusion, the first thing I'm gonna do is overwrite all the Regular-Expression functions to use the Java pattern matching engine; the more I learn about regular expressions, the more I'm finding the ColdFusion regular expression engine to be quite limited.




Reader Comments

Apr 18, 2011 at 9:35 AM

Its really really awsome stuff. You have exposed some hidden jewels in regular expressions. Thanks a lot.


Apr 18, 2011 at 10:59 AM

@Rakesh,

Glad you liked this - regular expressions are so awesome :)

