Ben Nadel at Node.js Training with Nodejitsu (Apr. 2011) with: Paolo Fragomeni and Charlie Robbins and Marak Squires

ColdFusion 10 - reEscape() vs. Java Pattern's Quote() Method


I love Regular Expressions! I think they are the cat's pajamas. I use them all the time. That's why, when I saw that ColdFusion 10 had introduced a new Regular Expression method - reEscape() - I was eager to start using it right away. As you can guess from the name, reEscape() takes a string and escapes any special regular expression control characters found within the string. This creates a safe string literal that can be used inside of a larger regular expression pattern. Typically, when I need to escape a sequence of characters, I use the Java Pattern method, quote(). The output of quote() only works with the underlying Java Regular Expression engine; but, I thought it would be nice to compare the two approaches anyway.

NOTE: At the time of this writing, ColdFusion 10 was in public beta.

I love ColdFusion's native regular expression methods like reFind() and reMatch(). However, I often find myself dipping down into the underlying Java layer to leverage the faster, more robust Java Regular Expression engine. When dealing with the Java classes, Pattern and Matcher, I can use the method quote() to escape regular expression sequences. As Steven Levithan pointed out years ago, however, these escape sequences only work in Java and cannot be used with the native ColdFusion RE-methods. This makes it particularly exciting that ColdFusion 10 now provides reEscape() to prepare string literals for native RE-methods.
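To make the "string literal inside a larger pattern" idea concrete on the Java side, here is a minimal sketch. It is not part of the ColdFusion demo; the class name QuoteInLargerPattern, the helper countMatches(), and the sample input are all made up for illustration. The literal "v1." is quoted, and only the trailing \d+ is treated as real pattern syntax:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteInLargerPattern {

    // Hypothetical helper: counts how many times a regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "v1.2 v1x2 v1.27";

        // Quoted literal + real pattern syntax: the dot only matches a dot,
        // so "v1x2" is skipped.
        String quoted = Pattern.quote("v1.") + "\\d+";
        System.out.println(countMatches(quoted, input)); // prints 2

        // Unquoted: the dot means "any character", so "v1x2" matches too.
        String unquoted = "v1.\\d+";
        System.out.println(countMatches(unquoted, input)); // prints 3
    }
}
```

The only point of the contrast is that, without escaping, a user-supplied literal can quietly change what the larger pattern matches.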

That said, let's look at what the two different methods are doing to string values:

<cfscript>

	// Escape a string with ColdFusion 10's new reEscape() method.
	// This escapes all the special control characters in the string.
	//
	// NOTE: This escape sequence can be used in BOTH a ColdFusion
	// and a Java regular expression pattern.
	writeOutput(
		"reEscape: " &
		reEscape(
			"[who's] Johnny, she said (and smiled in her special way)."
		)
	);

	writeOutput( "<br />" );

	// Escape a string using Java Pattern's quote() function. This
	// escapes all the special control characters in the string.
	//
	// NOTE: This escape sequence can ONLY BE USED in a Java regular
	// expression pattern.
	writeOutput(
		"Quote: " &
		createObject( "java", "java.util.regex.Pattern" ).quote(
			"[who's] Johnny, she said (and smiled in her special way)."
		)
	);

</cfscript>

Here, we are escaping a string that contains several special Regular Expression control characters: "[", "]", "(", ")", and ".". When we run the above code, we get the following page output:

reEscape: \[who's\] Johnny, she said \(and smiled in her special way\)\.
Quote: \Q[who's] Johnny, she said (and smiled in her special way).\E

As you can see, the two escape methods provide very different results. Java uses the "\Q..\E" construct to escape the entire sequence. reEscape(), on the other hand, escapes every special character individually. The good news is, since reEscape()'s approach is more general, the result of reEscape() can be used in both the native ColdFusion regular expression methods and the underlying Java regular expression methods. That's a "win" if you ask me.
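As a rough check that both escaping styles behave the same inside Java's engine, here is a small Java sketch. The helper escapeEachMetaChar() is a hypothetical stand-in for per-character escaping; its list of metacharacters is my assumption, not a documented description of what reEscape() does internally:

```java
import java.util.regex.Pattern;

final class EscapeStyles {

    // ASSUMPTION: an illustrative set of regex metacharacters. This is not
    // taken from ColdFusion's reEscape() implementation.
    private static final String META_CHARS = "\\.[]{}()*+?^$|";

    // Hypothetical helper: prefixes each metacharacter with a backslash.
    static String escapeEachMetaChar(String value) {
        StringBuilder escaped = new StringBuilder(value.length() * 2);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (META_CHARS.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        String input = "[who's] Johnny, she said (and smiled in her special way).";

        String perChar = escapeEachMetaChar(input);
        String quoted = Pattern.quote(input);

        System.out.println("Per-character: " + perChar);
        System.out.println("Quote: " + quoted);

        // Both patterns should match the original text as a plain literal.
        System.out.println(Pattern.matches(perChar, input)); // prints true
        System.out.println(Pattern.matches(quoted, input));  // prints true
    }
}
```

For this particular input, the per-character output lines up with the reEscape() result shown above, which is why that style of escaping travels well between engines.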

And, in the spirit of Regular Expressions, don't forget that International Regular Expression Day comes up on June 1st! It will be a day of massive string-manipulation celebration.


Reader Comments


Good to see that ColdFusion is still giving regular expressions at least a little attention. :)

For the record, here's how you can do this in various other languages:

* Perl:

quotemeta(str)

* PHP:

preg_quote(str)

* Python:

re.escape(str)

* Ruby:

Regexp.escape(str)

* C#, VB.NET:

Regex.Escape(str)

Java, as you mentioned, does it via Pattern.quote(str)

Interesting that Java escapes strings by simply wrapping \Q..\E around them. I didn't know that. Have you tested what happens when \E is included in the string that you pass to Java to escape?

Also, my XRegExp library ( http://git.io/xregexp ) lets you do this in JavaScript via XRegExp.escape(str)

There was some recent discussion on the es-discuss mailing list about adding a native JavaScript RegExp.escape method in ECMAScript 6, but at least for now it looks like that won't be happening.
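On the \E question raised in the comment above, here is a small Java sketch that can be used to poke at it. The class and helper names are hypothetical. As far as I can tell, Pattern.quote() copes with an embedded \E by closing the quoted section, emitting an escaped \E, and then reopening the quote, whereas wrapping the string in \Q..\E by hand does not survive it:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class QuoteWithSlashE {

    // Uses Pattern.quote() and checks that the result matches the literal.
    static boolean quotedMatchesLiterally(String literal) {
        return Pattern.matches(Pattern.quote(literal), literal);
    }

    // Hypothetical naive approach: wrap the string in \Q..\E by hand.
    // An embedded \E ends the quoted section early, so the rest of the
    // string is read as pattern syntax again.
    static boolean naiveWrapMatchesLiterally(String literal) {
        String naive = "\\Q" + literal + "\\E";
        try {
            return Pattern.matches(naive, literal);
        } catch (PatternSyntaxException e) {
            // The leftover text may not even compile as a pattern.
            return false;
        }
    }

    public static void main(String[] args) {
        // The literal characters are: a \ E b . c
        String literal = "a\\Eb.c";

        System.out.println("Quoted pattern: " + Pattern.quote(literal));
        System.out.println(quotedMatchesLiterally(literal));    // prints true
        System.out.println(naiveWrapMatchesLiterally(literal)); // prints false
    }
}
```

So the built-in method looks safe for arbitrary input, which is a good reason to call it rather than concatenating \Q and \E manually.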
