Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

The Regular Expression Multiline Flag In ColdFusion

By Ben Nadel on
Tags: ColdFusion

I am not sure how this applies to ColdFusion's native regular expression functions, REFind() and REReplace(); I have only tested it with the Java regular expressions that are available through ColdFusion (once you go Java, you can't go back). By default, the (^) symbol matches at the beginning of the target string and the ($) symbol matches at the end of the target string (or just before a final line break). To demonstrate this, let's store some text:

  • <!--- Store some text. --->
  • <cfsavecontent variable="strText">
  • Laughing as he snatched another plate from the stack,
  • Chalked his hands and monstrous back,
  • Said, Boy, stop lying and don't say you've forgotten!
  • Trouble with you is you aint been SQUATTIN!
  • </cfsavecontent>

Now, let's apply a regular expression that matches the entire string, from (^) to ($):

  • <!--- Match the entire string from start to end. --->
  • <cfset strText = strText.ReplaceAll(
  • "(^[\w\W]+$)",
  • "BEGIN::$1::END"
  • ) />

... we get the following output:

BEGIN::
Laughing as he snatched another plate from the stack,
Chalked his hands and monstrous back,
Said, Boy, stop lying and don't say you've forgotten!
Trouble with you is you aint been SQUATTIN!
::END

Notice that the (^) and ($) did indeed match the start and end of the string. Also, notice that we used a simple expression, [\w\W]+, to match every character in the string. Since \w is the word character and \W is the non-word character, the two together are guaranteed to match every single character that we come across, line breaks included (remember your Venn diagrams).
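Since ColdFusion is handing this call to Java's String.replaceAll(), the same behavior can be checked in plain Java. The following is only a minimal sketch: the class name DefaultAnchors, the helper wrap(), and the sample text are made up for illustration and are not part of the original example.

```java
class DefaultAnchors {

    // Wraps whatever the pattern matches in BEGIN:: / ::END markers,
    // using the same pattern and replacement as the ColdFusion call above.
    // Without the (?m) flag, ^ and $ anchor to the whole string.
    static String wrap(String text) {
        return text.replaceAll("(^[\\w\\W]+$)", "BEGIN::$1::END");
    }

    public static void main(String[] args) {
        // Hypothetical sample text with embedded line breaks.
        String text = "line one\nline two\nline three";
        System.out.println(wrap(text));
    }
}
```

Because [\w\W] happily matches the line breaks, the three lines come back as a single match with one BEGIN:: at the front and one ::END at the back.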

Now, there is a regular expression flag, (?m), that tells the regular expression engine to match (^) at the beginning of EVERY line and to match ($) at the end of EVERY line. Let's run the same expression as above, except this time with the multiline flag in front of it:

  • <!--- Match the multiline string from start to end. --->
  • <cfset strText = strText.ReplaceAll(
  • "(?m)(^[\w\W]+$)",
  • "BEGIN::$1::END"
  • ) />

Notice that we start the expression with the (?m) multiline flag. Running this we get the following output:

BEGIN::
Laughing as he snatched another plate from the stack,
Chalked his hands and monstrous back,
Said, Boy, stop lying and don't say you've forgotten!
Trouble with you is you aint been SQUATTIN!
::END

But wait, isn't that exactly the same thing as before? Yes. The problem with our example is that regular expression quantifiers are GREEDY by default. Because [\w\W] also matches line breaks, the (+) grabs as much as it possibly can and runs all the way to the last place where ($) can match, which is the end of the string. To fix this, we must make the quantifier non-greedy (lazy) so that it stops at the first place where ($) can match. To do so, we will add the (?) character after the (+) character:

  • <!---
  • Match the multiline string from start to end. Run
  • this expression as a non-greedy search.
  • --->
  • <cfset strText = strText.ReplaceAll(
  • "(?m)(^[\w\W]+?$)",
  • "BEGIN::$1::END"
  • ) />

This gives us the desired output:

BEGIN::
Laughing as he snatched another plate from the stack,::END
BEGIN::Chalked his hands and monstrous back,::END
BEGIN::Said, Boy, stop lying and don't say you've forgotten!::END
BEGIN::Trouble with you is you aint been SQUATTIN! ::END

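Notice that the BEGIN and END strings are inserted at the beginning and the end of every line respectively. (The first BEGIN sits on a line of its own because CFSaveContent also captures the line break that follows its opening tag, and [\w\W] matches that line break as part of the first line.)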
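To see the greedy and non-greedy versions side by side, here is a small sketch in plain Java. The class GreedyVsLazy, its two helper methods, and the sample text are assumptions made for this illustration; only the two patterns come from the examples above.

```java
class GreedyVsLazy {

    // Greedy: [\w\W]+ also consumes line breaks, so it runs to the last
    // position where $ can match and the whole text becomes one match.
    static String wrapGreedy(String text) {
        return text.replaceAll("(?m)(^[\\w\\W]+$)", "BEGIN::$1::END");
    }

    // Lazy: +? stops at the first position where $ can match, which in
    // multiline mode is the end of the current line.
    static String wrapLazy(String text) {
        return text.replaceAll("(?m)(^[\\w\\W]+?$)", "BEGIN::$1::END");
    }

    public static void main(String[] args) {
        // Hypothetical three-line sample, no leading or trailing line break.
        String text = "a\nb\nc";
        System.out.println(wrapGreedy(text));
        System.out.println("----");
        System.out.println(wrapLazy(text));
    }
}
```

The greedy call wraps all three lines as one unit, while the lazy call wraps each line separately.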

Ok, so that might not look too sexy, but it is very useful. One of the places that I love to use something like this is when I am scripting database calls. Imagine that I had a delimited list of data in a data file like this (we will simulate the file with a CFSaveContent tag):

  • <!--- Simulate a delimited data file via CFSaveContent. --->
  • <cfsavecontent variable="strText">
  • 1,Libby,9.0
  • 2,Anna,7.5
  • 3,Donna,6
  • 4,Sarah,9.5
  • </cfsavecontent>

We can run a multiline regular expression on this that will allow us to create a SQL statement for every line:

  • <!--- Generate SQL scripts. --->
  • <cfset strSQL = strText.ReplaceAll(
  • "(?m)^(\s)*([0-9]+),([^,]+),([0-9.]+)$",
  • "UPDATE girl SET hotness = $4 WHERE id = $2;"
  • ) />

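Here, we are using the (?m) multiline flag so that (^) and ($) anchor to each line of data. We are also grouping the different data fields so that the replacement string can refer to them as $2 (the ID) and $4 (the rating). This will give us the output: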

UPDATE girl SET hotness = 9.0 WHERE id = 1;
UPDATE girl SET hotness = 7.5 WHERE id = 2;
UPDATE girl SET hotness = 6 WHERE id = 3;
UPDATE girl SET hotness = 9.5 WHERE id = 4;

Dump that in SQL Analyser or a CFQuery tag and we are good to go. Pretty cool huh?
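The same per-line rewrite can be sketched in plain Java. Treat this as an illustration under assumptions rather than the code from the post: the class SqlFromCsv, the method toSql(), and the table and column names (contestant, score) are all hypothetical, and the name field is written as [^,\r\n]+ so that it cannot run across a comma or a line break.

```java
class SqlFromCsv {

    // Capture groups: 1 = id, 2 = name (unused), 3 = score.
    // The (?m) flag lets ^ and $ anchor to each line of the data.
    static String toSql(String data) {
        return data.replaceAll(
            "(?m)^([0-9]+),([^,\\r\\n]+),([0-9.]+)$",
            "UPDATE contestant SET score = $3 WHERE id = $1;"
        );
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the delimited data file.
        String data = "1,Libby,9.0\n2,Anna,7.5\n3,Donna,6\n4,Sarah,9.5";
        System.out.println(toSql(data));
    }
}
```

Since the values are pasted straight into the SQL text, a script like this only makes sense for data you control; lines that do not fit the pattern are passed through unchanged.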

If you are not using regular expressions, learn them. They are really awesome and so freakin' powerful. The multiline flag doesn't come up all that often, but it is great anytime you want to treat each line as an individual entity. For more control, you can use this with the Java Pattern / Matcher to handle each line on its own.
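As a rough sketch of that Pattern / Matcher approach, the following plain Java walks over the text one line at a time. The class LineByLine and the method nonEmptyLines() are hypothetical names chosen for the example; Pattern.MULTILINE is the compile-time equivalent of putting (?m) at the front of the expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineByLine {

    // The dot does not match line terminators by default, so ".+" cannot
    // run past the end of a line, and blank lines produce no match.
    private static final Pattern LINE = Pattern.compile("^.+$", Pattern.MULTILINE);

    // Collects every non-empty line so each one can be handled on its own.
    static List<String> nonEmptyLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = LINE.matcher(text);
        while (matcher.find()) {
            lines.add(matcher.group().trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample with a leading, a blank, and a trailing line break.
        for (String line : nonEmptyLines("\nalpha\n\nbeta\n")) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Inside the loop you have the full Matcher available, so each line could just as easily be validated, split into groups, or skipped instead of being collected.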




Reader Comments

Note that anytime, even when using multi-line mode, you should be able to use \A to match the start of the string and \z to match the end. Also note that I haven't tested those operators in ColdFusion, so here's hoping it's not another feature ColdFusion left off the regex support list.

Just so we're on the same page, is the \A and \z stuff equivalent to ^ and $ when multi-line matching is NOT enabled? And, if I understand what you are saying, then even if you are doing multi-line matching, \A and \z will still match the very beginning / end of the ENTIRE string.

If that is the case, awesome. I was wondering what the heck those were for. I should just learn by now that none of this stuff is excess, that if I don't see the difference between two things, I probably am not understanding it.

[If that is the case, awesome.]

That is the case, and it is indeed quite nifty. Another related (but not nearly as useful, IMO) operator is \Z (uppercase z), which matches at the end of the string and never before line breaks, except for the very last line break if the string ends with a line break. (Whoever thought up that operator had too much free time.)
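For anyone who wants to check the anchors discussed in this thread, here is a small sketch in plain Java (not ColdFusion's native regex functions, which remain untested here). The class AbsoluteAnchors and the method matchStarts() are hypothetical; the sketch simply records where each anchor matches when multiline mode is on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AbsoluteAnchors {

    // Returns the start index of every match of the given anchor, compiled
    // with MULTILINE switched on (the same effect as a leading (?m)).
    static List<Integer> matchStarts(String anchor, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(anchor, Pattern.MULTILINE).matcher(text);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Two short lines, with the string ending in a line break.
        String text = "a\nb\n";
        System.out.println("^  matches at " + matchStarts("^", text));
        System.out.println("\\A matches at " + matchStarts("\\A", text));
        System.out.println("\\z matches at " + matchStarts("\\z", text));
        System.out.println("\\Z matches at " + matchStarts("\\Z", text));
    }
}
```

With this input, (^) reports a position at the start of each line, while \A reports only index 0 and \z only the very end of the string; \Z first matches just before the final line break.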
