Since I am heading off to Scotch On The Rocks tomorrow, I figured this would pretty much be my last post for the next week or so. And, since Regular Expression Day 2010 ends in about a week, I thought I would try something that I have never tried before: loading and running a .NET (dotnet) regular expression inside of ColdFusion. Starting with ColdFusion 8, Adobe added a .NET Integration Service to the ColdFusion installer that allows your ColdFusion code to access local and remote .NET (dotnet) assemblies (DLL files). Somewhat like its Java integration, ColdFusion does this by creating proxies to the actual .NET classes.
NOTE: As this is the first time I have played with .NET integration, please take any technical explanation with a grain of salt as it is likely flawed.
In ColdFusion, we have access to the POSIX regular expression engine and the underlying Java regular expression engine; so, why even bother using the .NET (dotnet) regular expression engine? For the most part, RegEx engines are the same; but, each flavor has its own specialized features. DotNet is no different. Unfortunately, I don't know anything about the .NET regular expression engine since I've never used it before. As such, I am going to flagrantly borrow from the flagrant badassery of RegEx guru, Steven Levithan.
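For comparison, here is a minimal sketch of the two engines that ColdFusion already exposes. The variable names (posixHit, javaPattern, javaHit) are just placeholders for illustration; the Java calls are the standard java.util.regex.Pattern and Matcher methods.

```cfml
<!--- Built-in ColdFusion regular expression function. --->
<cfset posixHit = reFind( "^A+B+$", "AAABBB" ) />

<!--- Underlying Java engine, reached through createObject(). --->
<cfset javaPattern = createObject( "java", "java.util.regex.Pattern" ).compile(
	javaCast( "string", "^A+B+$" )
	) />
<cfset javaHit = javaPattern.matcher( javaCast( "string", "AAABBB" ) ).matches() />

<cfoutput>
	reFind(): #posixHit#<br />
	Java: #javaHit#<br />
</cfoutput>
```

Both of these can confirm "some A's, then some B's", but neither one has a construct for counting the two runs against each other.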
In .NET regular expressions, you can use named collections (what .NET calls "balancing groups") to help keep track of patterns matched within a given string. Using a special notation, you can push elements onto a named collection, pop elements off of a named collection, or check to see if the collection contains any remaining elements. I won't go into too much detail on how this works (as I just tried this myself for the first time); instead, I'll defer to Steve Levithan's blog post on the matter.
That said, let's use this depth-tracking regular expression construct inside ColdFusion using the .NET (dotnet) integration services. In .NET, the regular expression class, System.Text.RegularExpressions.Regex, is compiled within the System.dll assembly, which is, itself, contained within the root of the .NET framework (probably buried somewhere in your Windows folder - I had to search for mine). In the following demo, we're going to match patterns in which a string of "A" characters is followed by an equal number of "B" characters.
<!--- Store the path to the .NET framework. --->
<cfset frameworkDirectory = "C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\" />

<!---
	Create an instance of the .NET RegEx CLASS definition. When
	ColdFusion creates a .NET class, it works as if you are working
	with a Java class - it gives you the static class until you call
	the constructor or call a non-static method (at which point it
	calls the default constructor if possible).
--->
<cfobject
	name="pattern"
	type="dotnet"
	class="System.Text.RegularExpressions.Regex"
	assembly="#frameworkDirectory#System.dll"
	/>

<!---
	Create a .NET regular expression pattern. This uses .NET's
	unique "depth" mechanism to track matching and counter-matching
	patterns.

	NOTE: I don't really know anything about this - I am borrowing
	it from Steve Levithan:
	http://blog.stevenlevithan.com/archives/balancing-groups
--->
<cfsavecontent variable="dotNetPattern">(?x)
	^

	<!---
		For each A that we encounter, push it onto the stack
		with the name "Counter".
	--->
	(?<Counter>A)+

	<!---
		For each B that we encounter, pop one item off of the
		stack with the name "Counter".
	--->
	(?<-Counter>B)+

	<!---
		Once we have finished matching our A/B string, check to
		see if there are any items remaining on the "Counter"
		stack. If there are, apply an empty negative look-ahead
		(that will always fail).
	--->
	(?(Counter)(?!))

	$
</cfsavecontent>

<!---
	Now that we have our .NET regular expression pattern, let's
	instantiate and initialize our pattern object. Remember, the
	init() method is how we call the constructor (same as with the
	Java classes).
--->
<cfset pattern.init( dotNetPattern ) />

<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->

<!---
	Now, let's check to see if various strings match against this
	regular expression pattern.

	NOTE: Only the fourth one *should* match.
--->
<cfoutput>

	#pattern.isMatch( "AAA" )#<br />
	#pattern.isMatch( "AAAB" )#<br />
	#pattern.isMatch( "AAABB" )#<br />

	<!--- Three and three. --->
	#pattern.isMatch( "AAABBB" )#<br />

	#pattern.isMatch( "AAABBBB" )#<br />
	#pattern.isMatch( "AAABBBBB" )#<br />

</cfoutput>
Here, I am using the "?<Counter>", "?<-Counter>", and "?(Counter)" constructs to push, pop, and check the "Counter" stack, respectively. Every time I hit an "A", I push, and every time I hit a "B", I pop. A pop against an empty stack fails, so the pattern can never consume more "B"s than "A"s. Then, at the end of the string, I make sure that the collection is empty (indicating an equal number of push and pop actions).
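To make the push / pop / check bookkeeping a bit more concrete, here is a rough, hand-written CFScript sketch of the same logic. This is only an illustration of the idea, not how the .NET engine is actually implemented. The function name matchesBalancedAB() is hypothetical, the code ignores regex subtleties (such as "$" also matching before a trailing newline), and it assumes ColdFusion 8 or later for the ++ and && operators in CFScript.

```cfml
<cfscript>

	// Hypothetical helper (not part of the demo above): walks the string
	// by hand to mimic what the "Counter" stack does inside the pattern.
	function matchesBalancedAB( input ){
		var depth = 0;
		var popped = 0;
		var index = 1;
		var inputLength = len( input );

		// (?<Counter>A)+ :: push once for every "A" (case-sensitive).
		while (
			( index <= inputLength ) &&
			( compare( mid( input, index, 1 ), "A" ) == 0 )
			){
			depth++;
			index++;
		}

		// The "+" requires at least one "A".
		if (depth == 0){
			return( false );
		}

		// (?<-Counter>B)+ :: pop once for every "B". Popping from an
		// empty stack fails, so we stop as soon as depth reaches zero.
		while (
			( index <= inputLength ) &&
			( depth > 0 ) &&
			( compare( mid( input, index, 1 ), "B" ) == 0 )
			){
			depth--;
			popped++;
			index++;
		}

		// The "+" requires at least one "B".
		if (popped == 0){
			return( false );
		}

		// (?(Counter)(?!)) $ :: the stack must be empty and we must
		// have consumed the entire string.
		return( ( depth == 0 ) && ( index > inputLength ) );
	}

</cfscript>
```

Calling matchesBalancedAB( "AAABBB" ) should come back true, while "AAABB" (one item left on the stack) and "AAABBBB" (one "B" left over after the stack empties) should come back false.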
When I run the above code, I get the following output:
As you can see, only the fourth string, "AAABBB", was a match for our .NET regular expression pattern. This is because it is the only string in which the three "A"s are followed by exactly three "B"s.
In other regular expression engines, you can easily match three "A"s followed by three "B"s by hard-coding the counts. What makes the .NET (dotnet) regular expression engine so exciting is that the equal matching (3 and 3) doesn't have to have a predetermined length! I wonder what other kinds of gems are available in the .NET pattern classes. Special thanks to Steven Levithan for providing the actual RegEx understanding.
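To illustrate the "predetermined length" limitation, here is a small sketch using ColdFusion's built-in reFind() with hard-coded counts. The variable names (fixedPattern, candidate) are placeholders for this example only.

```cfml
<!--- Fixed-count pattern: exactly three "A"s, then exactly three "B"s. --->
<cfset fixedPattern = "^A{3}B{3}$" />

<cfoutput>
	<cfloop index="count" from="1" to="4">

		<!--- Build a balanced string: count "A"s followed by count "B"s. --->
		<cfset candidate = (
			repeatString( "A", count ) &
			repeatString( "B", count )
			) />

		#candidate#: #yesNoFormat( reFind( fixedPattern, candidate ) )#<br />

	</cfloop>
</cfoutput>
```

Every candidate here is balanced, but only the three-and-three string should report a match, because the counts are baked into the pattern. The .NET version with the "Counter" stack accepts any of these lengths with a single pattern.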