ColdFusion 10 - reEscape() vs. Java Pattern's Quote() Method

Posted March 29, 2012 at 9:31 AM by Ben Nadel

Tags: ColdFusion

I love Regular Expressions! I think they are the cat's pajamas, and I use them all the time. That's why, when I saw that ColdFusion 10 had introduced a new Regular Expression method, reEscape(), I was eager to start using it right away. As you can guess from the name, reEscape() takes a string and escapes any special regular expression control characters found within it. This creates a safe string literal that can be used inside of a larger regular expression pattern. Typically, when I need to escape a sequence of characters, I use the Java Pattern method, quote(). That only works with the underlying Java Regular Expression engine; but, I thought it would be nice to compare the two approaches anyway.

NOTE: At the time of this writing, ColdFusion 10 was in public beta.

I love ColdFusion's native regular expression methods like reFind() and reMatch(). However, I often find myself dipping down into the underlying Java layer to leverage the faster, more robust Java Regular Expression engine. When dealing with the Java classes, Pattern and Matcher, I can use the method quote() to escape regular expression sequences. As Steven Levithan pointed out years ago, however, the escape sequences that quote() produces only work in Java and cannot be used with the native ColdFusion RE-methods. This makes it particularly exciting that ColdFusion 10 now provides reEscape() to prepare string literals for native RE-methods.
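To make the quote() workflow concrete, here is a minimal Java sketch of embedding a quoted literal inside a larger pattern. The class name, the helper literalThenDigits(), and the sample input are all hypothetical and only there for illustration; Pattern.quote() and Pattern.compile() are the standard java.util.regex calls.

```java
import java.util.regex.Pattern;

class QuoteInLargerPattern {

    // Builds a pattern that matches the given literal text followed by one
    // or more digits. The literal goes through Pattern.quote() so that
    // characters such as "(" and "." lose their special meaning.
    static Pattern literalThenDigits(String literal) {
        return Pattern.compile(Pattern.quote(literal) + "\\d+");
    }

    public static void main(String[] args) {
        // Hypothetical input that contains regex control characters.
        String literal = "(v.)";

        Pattern safe = literalThenDigits(literal);
        System.out.println(safe.matcher("(v.)42").matches());
        System.out.println(safe.matcher("vx42").matches());

        // Without quoting, "(v.)" is read as a capturing group holding "v"
        // plus any single character, so the pattern means something else.
        Pattern unsafe = Pattern.compile(literal + "\\d+");
        System.out.println(unsafe.matcher("vx42").matches());
    }
}
```

The quoted version should only accept input that starts with the literal text "(v.)", while the unquoted version accepts "vx42" because the parentheses and the dot are interpreted as pattern syntax.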

That said, let's look at what the two different methods are doing to string values:

    <cfscript>

        // Escape a string with ColdFusion 10's new reEscape() method.
        // This escapes all the special control characters in the string.
        //
        // NOTE: This escape sequence can be used in BOTH a ColdFusion
        // and a Java regular expression pattern.
        writeOutput(
            "reEscape: " &
            reEscape(
                "[who's] Johnny, she said (and smiled in her special way)."
            )
        );

        writeOutput( "<br />" );

        // Escape a string using Java Pattern's quote() function. This
        // escapes all the special control characters in the string.
        //
        // NOTE: This escape sequence can ONLY BE USED in a Java regular
        // expression pattern.
        writeOutput(
            "Quote: " &
            createObject( "java", "java.util.regex.Pattern" ).quote(
                "[who's] Johnny, she said (and smiled in her special way)."
            )
        );

    </cfscript>

Here, we are escaping a string that contains several special Regular Expression control characters: "[", "]", "(", ")", and ".". When we run the above code, we get the following page output:

reEscape: \[who's\] Johnny, she said \(and smiled in her special way\)\.
Quote: \Q[who's] Johnny, she said (and smiled in her special way).\E

As you can see, the two escape methods provide very different results. Java uses the "\Q..\E" construct to escape the entire sequence. reEscape(), on the other hand, escapes every special character individually. The good news is, since reEscape()'s approach is more general, the result of reEscape() can be used in both the native ColdFusion regular expression methods and the underlying Java regular expression methods. That's a "win" if you ask me.
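As a rough check on that claim, the following Java sketch takes the reEscape() output exactly as it was printed on the page and hands it to the Java engine next to the Pattern.quote() version. reEscape() itself is not called here (it only exists in ColdFusion), so the escaped string is hard-coded from the page output; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class EscapeStylesCompared {

    // The original, unescaped string from the demo.
    static final String ORIGINAL =
        "[who's] Johnny, she said (and smiled in her special way).";

    // Hard-coded copy of the reEscape() page output: each control
    // character is escaped individually with a backslash.
    static final String RE_ESCAPE_OUTPUT =
        "\\[who's\\] Johnny, she said \\(and smiled in her special way\\)\\.";

    // Compiles the given pattern with the Java engine and reports whether
    // it matches the entire original string.
    static boolean matchesOriginal(String pattern) {
        return Pattern.compile(pattern).matcher(ORIGINAL).matches();
    }

    public static void main(String[] args) {
        // The \Q..\E form produced by Java.
        System.out.println(Pattern.quote(ORIGINAL));

        // Both escaping styles, run through the Java engine.
        System.out.println(matchesOriginal(RE_ESCAPE_OUTPUT));
        System.out.println(matchesOriginal(Pattern.quote(ORIGINAL)));
    }
}
```

Both checks should report a match, which lines up with the idea that the backslash-per-character form is portable to Java. The reverse direction (using the \Q..\E form in the native ColdFusion methods) is not something this sketch can exercise.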

And, in the spirit of Regular Expressions, don't forget that International Regular Expression Day comes up on June 1st! It will be a day of massive string-manipulation celebration.


Reader Comments

Mar 29, 2012 at 2:38 PM

Good to see that ColdFusion is still giving regular expressions at least a little attention. :)

For the record, here's how you can do this in various other languages:

* Perl: quotemeta(str)
* PHP: preg_quote(str)
* Python: re.escape(str)
* Ruby: Regexp.escape(str)
* C#, VB.NET: Regex.Escape(str)

Java, as you mentioned, does it via Pattern.quote(str).

Interesting that Java escapes strings by simply wrapping \Q..\E around them. I didn't know that. Have you tested what happens when \E is included in the string that you pass to Java to escape?

Also, my XRegExp library ( http://git.io/xregexp ) lets you do this in JavaScript via XRegExp.escape(str). There was some recent discussion on the es-discuss mailing list about adding a native JavaScript RegExp.escape method in ECMAScript 6, but at least for now it looks like that won't be happening.
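On the \E question, a small Java sketch like the one below can be used to try it out. The class name, the roundTrips() helper, and the sample input are hypothetical. As far as I know, Pattern.quote() deals with an embedded \E by closing the quoted section, emitting an escaped backslash followed by a plain E, and then reopening the quote; treat that description as an assumption and run the code to see what your JVM prints.

```java
import java.util.regex.Pattern;

class QuoteWithEmbeddedE {

    // Returns true when the quoted form of the input, used as a complete
    // pattern, matches the input itself from start to end.
    static boolean roundTrips(String literal) {
        return Pattern.matches(Pattern.quote(literal), literal);
    }

    public static void main(String[] args) {
        // Hypothetical input that contains a literal backslash followed
        // by "E", plus some regex control characters after it.
        String tricky = "path\\End(.*)";

        // Print the quoted form so the handling of \E can be inspected.
        System.out.println(Pattern.quote(tricky));

        // The quoted pattern should still match the original text...
        System.out.println(roundTrips(tricky));

        // ...and "(.*)" should still be treated as literal characters.
        System.out.println(Pattern.matches(Pattern.quote(tricky), "path\\Endxyz"));
    }
}
```

If the round trip succeeds and the last check fails to match, the embedded \E did not end the quoting early.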


Mar 30, 2012 at 3:23 PM

@Steve,

Sorry about the line-breaks :) I never did update my CODE snippet stuff.


Oct 25, 2012 at 2:22 AM

Great stuff Ben ... :-)

You 'could' simply use String.format to straighten out the high chars ...

http://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html

