Using .NET (dotnet) Regular Expressions In ColdFusion
Since I am heading off to Scotch On The Rocks tomorrow, I figured this will pretty much be my last post for the next week or so. And, since Regular Expression Day 2010 ends in about a week, I thought I would try something that I have never tried before: loading and running a .NET (dotnet) regular expression inside of ColdFusion. Starting with ColdFusion 8, Adobe has added a .NET Integration Service to the ColdFusion installer that allows your ColdFusion code to access local and remote .NET (dotnet) assemblies (DLL files). Somewhat like its Java integration, ColdFusion does this by creating proxies to the actual .NET classes.
NOTE: As this is the first time I have played with .NET integration, please take any technical explanation with a grain of salt as it is likely flawed.
In ColdFusion, we have access to the POSIX regular expression engine and the underlying Java regular expression engine; so, why even bother using the .NET (dotnet) regular expression engine? For the most part, RegEx engines are the same; but, each flavor has its own specialized features. DotNet is no different. Unfortunately, I don't know anything about the .NET regular expression engine since I've never used it before. As such, I am going to flagrantly borrow from the flagrant badassery of RegEx guru, Steven Levithan.
In .NET regular expressions, you can use named collections (the .NET documentation calls the construct "balancing group definitions") to help keep track of patterns matched within a given string. Using a special notation, you can either push elements onto a named collection, pop elements off of a named collection, or check to see if the collection contains any remaining elements. I won't go into too much detail on how this works (as I just tried this myself for the first time); instead, I'll defer to Steve Levithan's blog post on this matter.
That said, let's use this depth-tracking regular expression construct inside ColdFusion using the .NET (dotnet) integration services. In .NET, the regular expression class, System.Text.RegularExpressions.Regex, is compiled within the System.dll assembly, which is, itself, contained within the root of the .NET framework (probably buried somewhere in your Windows folder - I had to search for mine). In the following demo, we're going to match patterns in which a string of "A" characters is followed by an equal number of "B" characters.
<!--- Store the path to the .NET framework. --->
<cfset frameworkDirectory = "C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\" />

<!---
	Create an instance of the .NET RegEx CLASS definition. When
	ColdFusion creates a .NET class, it works as if you are working
	with a Java class - it gives you the static class until you call
	the constructor or call a non-static method (at which point it
	calls the default constructor if possible).
--->
<cfobject
	name="pattern"
	type="dotnet"
	class="System.Text.RegularExpressions.Regex"
	assembly="#frameworkDirectory#System.dll"
	/>

<!---
	Create a .NET regular expression pattern. This uses .NET's
	unique "depth" mechanism to track matching and counter-matching
	patterns.

	NOTE: I don't really know anything about this - I am borrowing
	it from Steve Levithan:
	http://blog.stevenlevithan.com/archives/balancing-groups
--->
<cfsavecontent variable="dotNetPattern">(?x)
	^

	<!---
		For each A that we encounter, push it onto the stack
		with the name "Counter".
	--->
	(?<Counter>A)+

	<!---
		For each B that we encounter, pop one item off of the
		stack with the name "Counter."
	--->
	(?<-Counter>B)+

	<!---
		Once we have finished matching our A/B string, check to
		see if there are any items remaining on the "Counter"
		stack. If there are, apply an empty negative look-ahead
		(that will always fail).
	--->
	(?(Counter)(?!))

	$
</cfsavecontent>

<!---
	Now that we have our .NET regular expression pattern, let's
	instantiate and initialize our pattern object. Remember, the
	init() method is how we call the constructor (same as with
	the Java classes).
--->
<cfset pattern.init( dotNetPattern ) />

<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->

<!---
	Now, let's check to see if various strings match against
	this regular expression pattern.

	NOTE: Only the fourth one *should* match.
--->
<cfoutput>
	#pattern.isMatch( "AAA" )#<br />
	#pattern.isMatch( "AAAB" )#<br />
	#pattern.isMatch( "AAABB" )#<br />

	<!--- Three and three. --->
	#pattern.isMatch( "AAABBB" )#<br />

	#pattern.isMatch( "AAABBBB" )#<br />
	#pattern.isMatch( "AAABBBBB" )#<br />
</cfoutput>
Here, I am using the "?<Counter>", "?<-Counter>", and "?(Counter)" constructs to push, pop, and check the "Counter" stack respectively. Every time I hit an "A", I push, and every time I hit a "B", I pop. Then, at the end of the string, I make sure that the collection is empty (indicating an equal number of push and pop actions).
When I run the above code, I get the following output:
As you can see, only the fourth string, "AAABBB," was a match for our .NET regular expression pattern. This is because the string was composed of three "A"s followed by exactly three "B"s.
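To make the push / pop / check bookkeeping a bit more concrete, here is a rough sketch of the same logic written by hand in plain ColdFusion, with no .NET involved. The function name, isBalancedAB(), and its variables are hypothetical names that I made up for illustration; an integer stands in for the "Counter" stack, and this is only my approximation of what the pattern does, not how the .NET engine works internally.

```cfm
<cffunction name="isBalancedAB" access="public" returntype="boolean" output="false"
	hint="Hypothetical helper: mimics the Counter stack with a plain integer.">

	<cfargument name="input" type="string" required="true" />

	<!--- The depth plays the role of the "Counter" stack size. --->
	<cfset var depth = 0 />
	<cfset var seenB = false />
	<cfset var i = 0 />
	<cfset var char = "" />

	<cfloop index="i" from="1" to="#len( arguments.input )#">

		<cfset char = mid( arguments.input, i, 1 ) />

		<!--- compare() is case-sensitive (the EQ operator is not). --->
		<cfif (compare( char, "A" ) eq 0) and not seenB>

			<!--- Push: like (?<Counter>A). --->
			<cfset depth = depth + 1 />

		<cfelseif (compare( char, "B" ) eq 0) and (depth gt 0)>

			<!--- Pop: like (?<-Counter>B). --->
			<cfset seenB = true />
			<cfset depth = depth - 1 />

		<cfelse>

			<!--- Unexpected character, or a pop on an empty stack. --->
			<cfreturn false />

		</cfif>

	</cfloop>

	<!--- Check: like (?(Counter)(?!)) - nothing may be left over. --->
	<cfreturn (seenB and (depth eq 0)) />

</cffunction>
```

Calling isBalancedAB() with the same six test strings should agree with the .NET pattern: only "AAABBB" comes back as a match. The obvious difference is that the .NET engine expresses all of this inside the pattern itself.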
In other regular expression engines, you can easily match three "A"s followed by three "B"s. What makes the .NET (dotnet) regular expression engine so exciting is that equal matching (3 and 3) doesn't have to have a predetermined length! I wonder what other kinds of gems are available in the .NET pattern classes? Special thanks to Steven Levithan for providing the actual RegEx understanding.
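For contrast, here is a small sketch of the fixed-length version using ColdFusion's native reFind() function. The variable name, fixedPattern, is just something I picked for the demo. The counts are hard-coded into the pattern, which is exactly the limitation that the .NET approach gets around.

```cfm
<!--- Fixed-length version: exactly three A's, then exactly three B's. --->
<cfset fixedPattern = "^A{3}B{3}$" />

<cfoutput>

	<!--- reFind() returns the match position, or 0 if nothing matched. --->
	#yesNoFormat( reFind( fixedPattern, "AAABBB" ) )#<br />

	<!--- Two and two is balanced, but the pattern only knows about three. --->
	#yesNoFormat( reFind( fixedPattern, "AABB" ) )#<br />

</cfoutput>
```

The first call finds a match and the second does not, even though "AABB" is perfectly balanced. To accept it, you would have to write a second pattern (or an alternation) for every length that you care about.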
Thanks Ben for the wonderful tutorial.
But..I'm getting the following error!!
If a dll is specified in the assembly list, DotNetExtension must also be installed.
The .NET Integration stuff is actually an additional service that needs to be installed that runs along side the ColdFusion Application service. This way, you can turn it on and off without stopping ColdFusion.
It's one of the checkboxes in the ColdFusion installer. I never install it so last night when I wanted to play around with it, I actually ended up just uninstalling and re-installing ColdFusion (I couldn't figure out how to run an "update" for the installer without kicking into multi-instance mode).
I'm sure there is a way to just add on the .NET Integration stuff, but that goes beyond my understanding.
No need to reinstall CF to get the .NET integration services. On the CD or in the extras download, there is a CF8-DotNetIntegration.exe / ColdFusion_9_DotNetIntegration_WWE.exe that installs the .NET integration service into an existing installation.
What happens if you got the installer from the website, so it's a single EXE file? Is the DotNet installer just put in the ColdFusion install directories?
If you don't install the service with the ColdFusion installer, you will not have the .NET integration installer on your disk. You will need to take it from the CD or the download. For the CF8 download, it was in a big extras.zip file, but the installer itself is just a single exe file. The CF8 download was only available after logging in to adobe.com with a purchased license. I don't know about CF9.
It's an Install Anywhere installer like for CF itself and looks the same. You can choose some options like path and installation mode:
You can install the .NET integration services on a Windows server where no ColdFusion is installed and connect to it from a Unix CF installation to use the .NET feature.
CF will be configured to use the new .net integration services installation.
Hmm, I'll have to check out the extras.zip file on my home machine (at work, I am running in multi-instance mode, which I am still learning about).
For anyone who is interested.
You can also just download the .NET integration services .exe from the adobe website.
Awesome!! Thanks for pointing this out. I know people have mentioned that it can be installed separately, but I had no idea where from. This is brilliant, thanks.
This is very useful information. Thanks for sharing.
This is very good technically correct and easy to understand information about Using .NET Regular Expressions In ColdFusion.
Hi Ben, thanks for sharing. I tested your script on Windows Server 2008 + CF 8.0.1 and it is not working. I am getting the following error message:
Object Instantiation Exception.
An exception occurred when instantiating a Java object. The class must not be an interface or an abstract class. Error: ''.
I also tested some other .NET DLLs that I wrote purely for testing, and none of them work; they all give the same error message.
Are there any tips or settings that I can look into?
Thanks and hope to hear from you soon.