ColdFusion 10 - reEscape() vs. Java Pattern's Quote() Method
I love Regular Expressions! I think they are the cat's pajamas. I use them all the time. That's why, when I saw that ColdFusion 10 had introduced a new Regular Expression method - reEscape() - I was eager to start using it right away. As you can guess from the name, reEscape() takes a string and escapes any special regular expression control characters found within the string. This creates a safe string literal that can be used inside of a larger regular expression pattern. Typically, when I need to escape a sequence of characters, I use the Java Pattern method, quote(). This only works with the underlying Java Regular Expression engine; but, I thought it would be nice to compare the two approaches anyway.
NOTE: At the time of this writing, ColdFusion 10 was in public beta.
I love ColdFusion's native regular expression methods like reFind() and reMatch(). However, I often find myself dipping down into the underlying Java layer to leverage the faster, more robust Java Regular Expression engine. When dealing with the Java classes, Pattern and Matcher, I can use the method quote() to escape regular expression sequences. As Steven Levithan pointed out years ago, however, these escape sequences only work in Java and cannot be used with the native ColdFusion RE-methods. This makes it particularly exciting that ColdFusion 10 now provides reEscape() to prepare string literals for native RE-methods.
That said, let's look at what the two different methods are doing to string values:
<cfscript>

	// Escape a string with ColdFusion 10's new reEscape() method.
	// This escapes all the special control characters in the string.
	//
	// NOTE: This escape sequence can be used in BOTH a ColdFusion
	// and a Java regular expression pattern.
	writeOutput(
		"reEscape: " &
		reEscape( "[who's] Johnny, she said (and smiled in her special way)." )
	);

	writeOutput( "<br />" );

	// Escape a string using Java Pattern's quote() function. This
	// escapes all the special control characters in the string.
	//
	// NOTE: This escape sequence can ONLY BE USED in a Java regular
	// expression pattern.
	writeOutput(
		"Quote: " &
		createObject( "java", "java.util.regex.Pattern" ).quote(
			"[who's] Johnny, she said (and smiled in her special way)."
		)
	);

</cfscript>
Here, we are escaping a string that contains several special Regular Expression control characters: "[", "]", "(", ")", and ".". When we run the above code, we get the following page output:
reEscape: \[who's\] Johnny, she said \(and smiled in her special way\)\.
Quote: \Q[who's] Johnny, she said (and smiled in her special way).\E
As you can see, the two escape methods provide very different results. Java uses the "\Q..\E" construct to escape the entire sequence. reEscape(), on the other hand, escapes every special character individually. The good news is, since reEscape()'s approach is more general, the result of reEscape() can be used in both the native ColdFusion regular expression methods and the underlying Java regular expression methods. That's a "win" if you ask me.
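To make the difference concrete on the Java side, here is a minimal sketch that puts the two escaping styles next to each other. Pattern.quote() is the real Java method; escapeIndividually() is a hypothetical helper written only for this illustration. It imitates the per-character style of the reEscape() output shown above, but it is not ColdFusion's actual implementation, and its list of metacharacters is an assumption.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // Hypothetical helper (not ColdFusion's implementation): put a backslash
    // in front of each character from an assumed list of regex metacharacters.
    static String escapeIndividually(String input) {
        String metaCharacters = "\\[](){}.*+?^$|";
        StringBuilder escaped = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (metaCharacters.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        String literal = "[who's] Johnny, she said (and smiled in her special way).";

        // Java's built-in approach: wrap the whole literal in \Q...\E.
        String quoted = Pattern.quote(literal);

        // Per-character approach, in the style of the reEscape() output.
        String escaped = escapeIndividually(literal);

        System.out.println("Quote:      " + quoted);
        System.out.println("Individual: " + escaped);

        // Both patterns should match the original text as a plain literal
        // when compiled by the Java regular expression engine.
        System.out.println(Pattern.compile(quoted).matcher(literal).matches());
        System.out.println(Pattern.compile(escaped).matcher(literal).matches());
    }
}
```

Either pattern string works with the Java engine. Only the per-character form avoids the "\Q..\E" construct, which is why it is the one that carries over to the native ColdFusion methods.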
And, in the spirit of Regular Expressions, don't forget that International Regular Expression Day comes up on June 1st! It will be a day of massive string-manipulation celebration.
Good to see that ColdFusion is still giving regular expressions at least a little attention. :)
For the record, here's how you can do this in various other languages:
* C#, VB.NET:
Java, as you mentioned, does it via
Interesting that Java escapes strings by simply wrapping
around them. I didn't know that. Have you tested what happens when
is included in the string that you pass to Java to escape?
method in ECMAScript 6, but at least for now it looks like that won't be happening.
Sorry about the line-breaks :) I never did update my CODE snippet stuff.
Great stuff Ben ... :-)
You 'could' simply use String.format to straighten out the high chars ...