When To Use \N And $N As Regular Expression Back-References

Posted May 17, 2010 at 8:52 AM by Ben Nadel

Tags: ColdFusion, Javascript / DHTML

The other day, I was talking to Ryan Jeffords about regular expressions (RegEx) and there was some confusion about when to use \N versus when to use $N as a captured-group back-reference. It can only be one or the other, so figuring it out is generally not a big issue. But, this does happen to be one of those things that is a bit different in each technology. As such, I thought I would write up a quick comparison of the regular expression back-references for the three main languages that I use: ColdFusion, Java, and Javascript.

In a regular expression, almost anything wrapped in parentheses is known as a captured group. There are some exceptions to this, and you can use a syntax that performs non-captured grouping; but, for the most part, groups are captured and numbered from left to right by their opening parenthesis. So, for example, in the following regular expression pattern:

(he(ll)o) (world)

... You would get the following captured groups:

  • Group 1: (he(ll)o)
  • Group 2: (ll)
  • Group 3: (world)
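
To see this numbering in action, here is a minimal Java sketch. The class name GroupNumbering and the helper capturedGroups() are made up for this illustration; only the java.util.regex classes are standard. It lists the captured groups for the pattern above and then for a variant that uses the non-capturing (?: ... ) syntax:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original post.
public class GroupNumbering {

    // Returns the text captured by each numbered group (1..N) of the first match.
    static List<String> capturedGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Groups are numbered by the position of their opening parenthesis.
        System.out.println(capturedGroups("(he(ll)o) (world)", "hello world"));
        // prints [hello, ll, world]

        // (?: ... ) groups without capturing, so the remaining groups shift down.
        System.out.println(capturedGroups("(?:he(ll)o) (world)", "hello world"));
        // prints [ll, world]
    }
}
```

Note that the nested group, (ll), gets index 2 because its opening parenthesis comes second, even though it closes before group 1 does.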

These groups can be referenced using back-references - \N and $N - both when matching and replacing a given pattern. In either case, the "N" is the numerical digit, 1-9, that represents the index of the captured group. Which notation you use - \N or $N - depends both on the technology and the execution phase (matching vs. replacing) and is what I will be exploring below.

ColdFusion Regular Expressions

In ColdFusion, we can use the reFind() and reReplace() functions to find and replace regular expressions respectively. In the following script, I am going to use both functions to test the various back-reference approaches in the two phases of pattern execution:

    <!--- Check to see if \N works within pattern. --->
    <cfif reFind( "(ha) \1", "ha ha" )>

        Find using \N

    <cfelse>

        No find using \N

    </cfif>

    <br />
    <br />

    <!--- Check to see if $N works within pattern. --->
    <cfif reFind( "(ha) $1", "ha ha" )>

        Find using $N

    <cfelse>

        No find using $N

    </cfif>

    <br />
    <br />

    <!--- Check to see if \N or $N works in replace. --->
    <cfoutput>

        #reReplace(
            "ha ha",
            "(ha) (ha)",
            "\1-$2"
        )#

    </cfoutput>

As you can see here, we are using the string, "ha ha" in all cases. This is a nice string because it is composed of a repeated pattern, "ha." When we run the above code, we get the following output:

Find using \N
No find using $N
ha-$2

To break down what is happening, here's the type of notation that you can use in the two phases of ColdFusion regular expression pattern execution:

Matching: \N
Replacing: \N

Java Regular Expressions

ColdFusion is built on top of Java, but Java uses a different regular expression engine. Therefore, the pattern rules that apply to reFind() and reReplace() (POSIX) are not necessarily the same as the pattern rules that apply to instances of the Java class, java.util.regex.Pattern. In the following test, I am going to use the "undocumented" fact that ColdFusion strings are really Java strings and therefore provide access to the Java String's regular-expression-based methods:

    <!--- Set string value. --->
    <cfset value = "ha ha" />

    <!--- Check to see if \N works within pattern. --->
    <cfif value.matches( "(ha) \1" )>

        Find using \N

    <cfelse>

        No find using \N

    </cfif>

    <br />
    <br />

    <!--- Check to see if $N works within pattern. --->
    <cfif value.matches( "(ha) $1" )>

        Find using $N

    <cfelse>

        No find using $N

    </cfif>

    <br />
    <br />

    <!--- Check to see if \N or $N works in replace. --->
    <cfoutput>

        #value.replaceFirst(
            "(ha) (ha)",
            "\1-$2"
        )#

    </cfoutput>

Again, we are using the string, "ha ha." But, this time, we are accessing the matches() and replaceFirst() methods directly on the value, "ha ha." When we run the above code, we get the following output:

Find using \N
No find using $N
1-ha

To break down what is happening, here's the type of notation that you can use in the two phases of Java regular expression pattern execution:

Matching: \N
Replacing: $N

NOTE: The reason we get the "1" in the replaced string is that, in a Java replacement string, a back-slash simply escapes the character that follows it; so, \1 is treated as a literal "1" rather than as a back-reference. You'll also note that since we are executing Java through a ColdFusion context, we don't need to escape back-slashes in strings.
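
For comparison, here is a rough sketch of the same three checks written directly in Java rather than through ColdFusion. The class and method names are made up for this illustration. The one difference is that Java source code does require the back-slash to be escaped, so the pattern \1 has to be typed as "\\1":

```java
// Hypothetical demo class; not part of the original post.
public class JavaBackReferences {

    // \N inside the pattern refers back to captured group N.
    static boolean matchesWithBackslash(String input) {
        return input.matches("(ha) \\1");
    }

    // $ inside the pattern is the end-of-input anchor, not a back-reference.
    static boolean matchesWithDollar(String input) {
        return input.matches("(ha) $1");
    }

    // In the replacement string, \1 becomes a literal "1" and $2 is group 2.
    static String replaceMixed(String input) {
        return input.replaceFirst("(ha) (ha)", "\\1-$2");
    }

    public static void main(String[] args) {
        System.out.println(matchesWithBackslash("ha ha")); // prints true
        System.out.println(matchesWithDollar("ha ha"));    // prints false
        System.out.println(replaceMixed("ha ha"));         // prints 1-ha
    }
}
```

These results line up with the output above: \N works when matching, $N works when replacing.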

Javascript Regular Expressions

While the Javascript engine is not as robust as some of the other regular expression engines, it can do some pretty amazing stuff when it comes to string matching and replacing. If you look at my general Javascript regular expression overview, you'll see that Javascript has a number of regex functions that can work in a variety of ways. That said, let's run the same demo as above, this time in a Javascript context:

    <!DOCTYPE HTML>
    <html>
    <head>
        <title>Javascript Regular Expressions</title>
        <script type="text/javascript">

            // Check to see if \N works in pattern. NOTE: search()
            // returns the index of the match (which can be zero), or
            // -1 if there is no match, so we compare against -1.
            if ("ha ha".search( new RegExp( "(ha) \\1", "i" ) ) !== -1){

                document.write( "Find using \\N" );

            } else {

                document.write( "No find using \\N" );

            }

            document.write( "<br><br>" );

            // Check to see if $N works in pattern.
            if ("ha ha".search( new RegExp( "(ha) $1", "i" ) ) !== -1){

                document.write( "Find using $N" );

            } else {

                document.write( "No find using $N" );

            }

            document.write( "<br><br>" );

            // Check to see if \N or $N works in replace.
            document.write(
                "ha ha".replace(
                    new RegExp( "(ha) (ha)", "i" ),
                    "\\1-$2"
                )
            );

        </script>
    </head>
    <body>
        <!-- Intentionally left blank. -->
    </body>
    </html>

Unlike the ColdFusion context, when we are working with strings in Javascript, we do have to escape the back-slash as a special character. Therefore, when we use \N notation in a Javascript string, we have to use, \\N, such that when the string evaluates, our regular expression pattern is left with a proper back-reference, \N. When we run the above code, we get the following output:

Find using \N
No find using $N
\1-ha

To break down what is happening, here's the type of notation that you can use in the two phases of Javascript regular expression pattern execution:

Matching: \N
Replacing: $N

NOTE: Unlike Java, Javascript gives the back-slash no special meaning in a replacement string, which is why the \1 comes through untouched in the output.

So there you have it - three powerful languages, and not all of them agree on which back-reference notation to use. I know these languages are all running on different RegEx engines, but I am a bit curious as to why there is no standard on how back-references work. This seems like the kind of thing that would have been nailed down after Perl (or whoever) set the standard. In any case, I hope this helps. If you are a .NET or Ruby developer, I'd love to hear how they use back-references as well.




Reader Comments

May 18, 2010 at 4:11 PM

I was always annoyed by the difference between how Homesite+ implemented RegEx backreference for find/replace and how CF does it. Why would the (admittedly old) CF IDE use a different backreference than CF itself?


May 18, 2010 at 8:24 PM

@David,

I know exactly what you mean. I happen to love HomeSite. In fact, HomeSite is where I learned RegEx for the first time, using the Find/Replace to clean data exports from clients. HomeSite has allll kinds of differences. It's like a sub-set of the POSIX functionality. Very frustrating when simple things like (\r\n) don't work.


May 19, 2010 at 9:49 AM

Homesite+ was also my introduction to RegEx. Back then, the "extended" find/replace feature made it easy to include line breaks, even in your RegEx--as long as you put them in as literals! That's sort of contrary to RegEx and probably stunted my growth/understanding of RegEx overall.

I'm generally pleased with RegEx support in eclipse/cfEclipse find/replace these days. And have switched entirely over to eclipse for all my CF, HTML, JS development. I've added the non-paid version of Aptana to Eclipse for HTML, CSS, JS but haven't begun to take advantage of Aptana's JS library recognition--haven't figured out how to tell it that I'm using jQuery or even my own libraries with a certain CF page. But the standard JS intellisense, color coding and code formatting alone are enough to abandon Homesite.


May 19, 2010 at 9:55 AM

@David,

The extended find/replace definitely made line breaks easier! In fact, that's part of why I love the big box so much after all these years. Of course, once I started learning more about regular expressions, I wanted to just use \r\n... but no such luck. Still, it's a great feature.

