Ben Nadel
ColdFusion 10 - reEscape() vs. Java Pattern's Quote() Method

By Ben Nadel
Tags: ColdFusion

I love Regular Expressions! I think they are the cat's pajamas. I use them all the time. That's why, when I saw that ColdFusion 10 had introduced a new Regular Expression method - reEscape() - I was eager to start using it right away. As you can guess from the name, reEscape() takes a string and escapes any special regular expression control characters found within the string. This creates a safe string literal that can be used inside of a larger regular expression pattern. Typically, when I need to escape a sequence of characters, I use the Java Pattern method, quote(). This only works with the underlying Java Regular Expression engine; but, I thought it would be nice to compare the two approaches anyway.

NOTE: At the time of this writing, ColdFusion 10 was in public beta.

I love ColdFusion's native regular expression methods like reFind() and reMatch(). However, I often find myself dipping down into the underlying Java layer to leverage the faster, more robust Java Regular Expression engine. When dealing with the Java classes, Pattern and Matcher, I can use the method quote() to escape regular expression sequences. As Steven Levithan pointed out years ago, however, these escape sequences only work in Java and cannot be used with the native ColdFusion RE-methods. This makes it particularly exciting that ColdFusion 10 now provides reEscape() to prepare string literals for native RE-methods.

That said, let's look at what the two different methods are doing to string values:

    <cfscript>

        // Escape a string with ColdFusion 10's new reEscape() method.
        // This escapes all the special control characters in the string.
        //
        // NOTE: This escape sequence can be used in BOTH a ColdFusion
        // and a Java regular expression pattern.
        writeOutput(
            "reEscape: " &
            reEscape(
                "[who's] Johnny, she said (and smiled in her special way)."
            )
        );

        writeOutput( "<br />" );

        // Escape a string using Java Pattern's quote() function. This
        // escapes all the special control characters in the string.
        //
        // NOTE: This escape sequence can ONLY BE USED in a Java regular
        // expression pattern.
        writeOutput(
            "Quote: " &
            createObject( "java", "java.util.regex.Pattern" ).quote(
                "[who's] Johnny, she said (and smiled in her special way)."
            )
        );

    </cfscript>

Here, we are escaping a string that contains several special Regular Expression control characters: "[", "]", "(", ")", and ".". When we run the above code, we get the following page output:

reEscape: \[who's\] Johnny, she said \(and smiled in her special way\)\.
Quote: \Q[who's] Johnny, she said (and smiled in her special way).\E

As you can see, the two escape methods provide very different results. Java uses the "\Q..\E" construct to escape the entire sequence. reEscape(), on the other hand, escapes every special character individually. The good news is, since reEscape()'s approach is more general, the result of reEscape() can be used in both the native ColdFusion regular expression methods and the underlying Java regular expression methods. That's a "win" if you ask me.
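To make the difference concrete on the Java side, here is a minimal sketch in plain Java. The escapeEachMeta() helper is hypothetical: it is not ColdFusion's implementation of reEscape(), and its list of metacharacters is an assumption. It only imitates the per-character style of output shown above so that it can be checked against Pattern.quote().

```java
import java.util.regex.Pattern;

class EscapeComparison {

    // Hypothetical helper, NOT ColdFusion's reEscape(): prefixes each regex
    // metacharacter with a backslash, imitating the per-character style of
    // output shown above. The metacharacter list is an assumption.
    static String escapeEachMeta(String input) {
        String metaCharacters = "\\[](){}.*+?^$|";
        StringBuilder escaped = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (metaCharacters.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        String literal = "[who's] Johnny, she said (and smiled in her special way).";

        String perCharacter = escapeEachMeta(literal);
        String quoted = Pattern.quote(literal);

        // Print both escaped forms for comparison.
        System.out.println(perCharacter);
        System.out.println(quoted);

        // Both forms match the original text literally in Java's engine.
        System.out.println(Pattern.matches(perCharacter, literal));
        System.out.println(Pattern.matches(quoted, literal));

        // Either form can be embedded inside a larger pattern.
        System.out.println(Pattern.matches("^.*" + perCharacter + "$", "Ask " + literal));
    }
}
```

The per-character form is plain backslash escaping, which is why it is the more portable of the two; the "\Q..\E" form depends on the engine understanding that construct.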

And, in the spirit of Regular Expressions, don't forget that International Regular Expression Day comes up on June 1st! It will be a day of massive string-manipulation celebration.




Reader Comments

Good to see that ColdFusion is still giving regular expressions at least a little attention. :)

For the record, here's how you can do this in various other languages:

* Perl: quotemeta(str)
* PHP: preg_quote(str)
* Python: re.escape(str)
* Ruby: Regexp.escape(str)
* C#, VB.NET: Regex.Escape(str)

Java, as you mentioned, does it via Pattern.quote(str).

Interesting that Java escapes strings by simply wrapping \Q..\E around them. I didn't know that. Have you tested what happens when \E is included in the string that you pass to Java to escape?

Also, my XRegExp library ( http://git.io/xregexp ) lets you do this in JavaScript via XRegExp.escape(str). There was some recent discussion on the es-discuss mailing list about adding a native JavaScript RegExp.escape method in ECMAScript 6, but at least for now it looks like that won't be happening.
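
On the question of a "\E" inside the input: as far as I know, the Javadoc for Pattern.quote() says that metacharacters and escape sequences in the input are given no special meaning, so an embedded "\E" should not end the quoted section early. A small Java check might look like the sketch below. The test string and the quotedMatchesItself() helper are made up for illustration.

```java
import java.util.regex.Pattern;

class QuoteWithSlashE {

    // Returns true when Pattern.quote() turns the input into a pattern
    // that matches exactly that same input.
    static boolean quotedMatchesItself(String input) {
        return Pattern.matches(Pattern.quote(input), input);
    }

    public static void main(String[] args) {
        // The Java literal "a\\Eb.c" holds the characters: a \ E b . c
        String tricky = "a\\Eb.c";

        // Show what quote() produces for an input that contains \E.
        System.out.println(Pattern.quote(tricky));

        // The quoted pattern should still match the input literally.
        System.out.println(quotedMatchesItself(tricky));

        // The "." stays literal, so another character in its place fails.
        System.out.println(Pattern.matches(Pattern.quote(tricky), "a\\Ebxc"));
    }
}
```

If the quoted pattern matches the original string but not the variant with "x" in place of ".", then the embedded "\E" did not cut the quoting short.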
