Matching Multi-Line Regular Expression Patterns In MULTILINE Mode (?m)

Posted November 5, 2009 at 2:04 PM by Ben Nadel

Tags: ColdFusion

This morning, I was (and am still) having a problem getting some MULTILINE regular expression patterns to match properly. As such, I wanted to put a quick blog post together as a sanity check for myself. As I have blogged about before, when a Java regular expression is running in multiline mode (as denoted by the "(?m)" flag), the "^" and "$" expressions match at the start and end of each line (just after and just before a line terminator, respectively) rather than only at the start and end of the source string. This allows us to do line-level pattern matching.
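As a minimal sketch of that difference in plain Java (the class name, helper method, and sample text are made up for illustration), the same pattern can be run with and without the flag:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and sample text are illustrative only.
class MultilineAnchors {

    // Collect every match of the given regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "first\nsecond\nthird";

        // Default mode: ^ and $ anchor to the whole input, and the dot
        // cannot cross a line terminator, so nothing matches.
        System.out.println(findAll("^.*$", text)); // prints []

        // MULTILINE mode: ^ and $ anchor to each individual line.
        System.out.println(findAll("(?m)^.*$", text)); // prints [first, second, third]
    }
}
```

Note that none of the matched values contain the "\n" characters; the anchors only mark positions.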

This is really useful; but, when we are running our regular expressions in multiline mode, we have to be aware that the ^ and $ expressions are zero-width: they match the positions next to the new line and carriage return characters, but they do not consume those characters. As such, if we want to match a pattern across multiple lines in multiline mode, we have to define the new line and carriage return expressions explicitly in our pattern. To see this, take a look at the following demo:

  • <!--- Store target data. --->
  • <cfsavecontent variable="data">
  • AAAAAAAAAABBBBBBBBBB
  • CCCCCCCCCCDDDDDDDDDD
  • EEEEEEEEEEFFFFFFFFFF
  • GGGGGGGGGGHHHHHHHHHH
  • </cfsavecontent>
  •  
  • <!---
  •     Create the Java pattern. Note that we are using the
  •     MULTILINE mode flag; this will allow ^ and $ to match at
  •     the start and end of each line.
  • --->
  • <cfset pattern = createObject( "java", "java.util.regex.Pattern" )
  •     .compile(
  •         javaCast(
  •             "string",
  •             (
  •                 "(?m)D.++$" &
  •                 "(\r\n?|\n)" &
  •                 "^.*+$" &
  •                 "(\r\n?|\n)" &
  •                 "^G"
  •             )
  •         )
  •     )
  •     />
  •  
  • <!--- Get the matcher for our target text. --->
  • <cfset matcher = pattern.matcher(
  •     javaCast( "string", trim( data ) )
  •     ) />
  •  
  • <!--- Move to the first match; group() throws if no match was found. --->
  • <cfif matcher.find()>
  •  
  •     <!--- Output the first match. --->
  •     <cfoutput>[#matcher.group()#]</cfoutput>
  •  
  • </cfif>

Here, we start out matching the letter "D" and then everything until the end of the line (remember that in Java, the "dot" does not match line terminators unless DOTALL mode [aka. single-line mode] is turned on using "(?s)"). We then add the line terminators to the pattern. We then match the entire next line and match its line terminators. Finally, we match the line start followed by "G". When we run this code, we get the following output:

[DDDDDDDDDD
EEEEEEEEEEFFFFFFFFFF
G]
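For anyone who wants to try this outside of ColdFusion, here is a rough plain-Java rendering of the same demo. The class and method names are made up, and the sample data assumes "\n" line terminators. It also runs a second pattern with the terminator groups left out, to show that $ and ^ on their own do not get us across the line break:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; a plain-Java rendering of the ColdFusion demo.
class CrossLineMatch {

    // Same sample data as the demo, with \n terminators assumed.
    static final String DATA =
        "AAAAAAAAAABBBBBBBBBB\n" +
        "CCCCCCCCCCDDDDDDDDDD\n" +
        "EEEEEEEEEEFFFFFFFFFF\n" +
        "GGGGGGGGGGHHHHHHHHHH";

    // Matches \r\n, a lone \r, or a lone \n.
    static final String EOL = "(\\r\\n?|\\n)";

    // Return the first match of the regex in DATA, or null when there is none.
    static String firstMatch(String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(DATA);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Line terminators spelled out between the anchors: this matches.
        System.out.println("[" + firstMatch("(?m)D.++$" + EOL + "^.*+$" + EOL + "^G") + "]");

        // Terminators left out: $ and ^ consume nothing, so no match is found.
        System.out.println(firstMatch("(?m)D.++$^.*+$^G")); // prints null
    }
}
```

The first call prints the same three-line result shown above; the second prints null because nothing in the pattern consumes the line terminator sitting between $ and ^.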

Anyway, nothing revolutionary going on here; like I said above, I mostly put this post together as a sanity check for myself to make sure that I really understood what was going on in Java's multiline mode.




Reader Comments

Nov 5, 2009 at 2:43 PM

Did you look to see if it would return the end of line character if in single line mode? I have an idea how I would go about it and may try it later.


Nov 5, 2009 at 3:34 PM

The way that expression is written, you know you'll have exactly one line between the DDD line and the line that starts with a G. Was that your intention? If you add a line or even a blank line no match is found. If you need more flexibility may I suggest something like:

"(?m)D+$(\r\n?|\n)((^.*$)(\r\n?|\n))+^G"

a match is found even with:

<cfsavecontent variable="data">
AAAAAAAAAABBBBBBBBBB
CCCCCCCCCCDDDDDDDDDD
EEEEEEEEEEFFFFFFFFFF

T

GGGGGGGGGGHHHHHHHHHH
</cfsavecontent>
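A quick way to check the suggestion above in plain Java (a sketch only; the class and method names are hypothetical and "\n" terminators are assumed):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the "one or more lines in between" pattern.
class FlexibleGap {

    // The pattern from the comment above, with backslashes escaped for Java.
    static final String REGEX =
        "(?m)D+$(\\r\\n?|\\n)((^.*$)(\\r\\n?|\\n))+^G";

    // Return the first match in the input, or null when there is none.
    static String firstMatch(String input) {
        Matcher matcher = Pattern.compile(REGEX).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Sample data with a blank line, a "T" line, and another blank line
        // between the D line and the G line.
        String data =
            "AAAAAAAAAABBBBBBBBBB\n" +
            "CCCCCCCCCCDDDDDDDDDD\n" +
            "EEEEEEEEEEFFFFFFFFFF\n" +
            "\n" +
            "T\n" +
            "\n" +
            "GGGGGGGGGGHHHHHHHHHH";

        // The repeated group consumes one full line (plus its terminator)
        // per iteration, including the empty lines.
        System.out.println("[" + firstMatch(data) + "]");
    }
}
```

Because the repeated group uses "+", at least one line still has to sit between the D line and the G line; with the two on adjacent lines this sketch returns null.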


Nov 5, 2009 at 3:48 PM

@Daniel,

Hmm, I think it would. When in single line mode, the dot will match line terminators, which means that .++ should possessively match ALL characters until it hits the end of the string.

@Travis,

Yes, I mean to only match a single line between the D and G (that was more along the lines of the use case I was trying to debug). But yes, the way you have it would be more flexible.
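To make the single-line (DOTALL) point concrete, here is a small plain-Java sketch; the class and method names are invented for illustration and "\n" terminators are assumed. With "(?s)" turned on, the possessive ".++" runs to the end of the input and never gives anything back, so any pattern that needs more text after it cannot match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing the possessive dot under DOTALL.
class DotallPossessive {

    // Same sample data as the post, with \n terminators assumed.
    static final String DATA =
        "AAAAAAAAAABBBBBBBBBB\n" +
        "CCCCCCCCCCDDDDDDDDDD\n" +
        "EEEEEEEEEEFFFFFFFFFF\n" +
        "GGGGGGGGGGHHHHHHHHHH";

    // Return the first match of the regex in DATA, or null when there is none.
    static String firstMatch(String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(DATA);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // DOTALL: the dot crosses line terminators, so .++ consumes
        // everything from the first D to the end of the input.
        System.out.println("[" + firstMatch("(?s)D.++") + "]");

        // MULTILINE plus DOTALL: .++ has already swallowed the rest of the
        // input and will not backtrack, so the trailing parts never match.
        System.out.println(firstMatch("(?ms)D.++$(\\r\\n?|\\n)^G")); // prints null
    }
}
```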


Nov 8, 2009 at 10:06 AM

This is a good web log and i like it.
It will help people to learn ColdFusion.


Aug 12, 2011 at 1:11 PM

Hi:

I'm trying to match both of these: Single-line and Multi-line. Basically everything between the fail( and the );

Can you help me?

fail("MockFloorServiceDelegate timed out.");

fail( "Polygon failed verification: " +
errorCode + " lastVertex: " +
lastVertex.ordinal + ",
currentVertex: " +
currentVertex.ordinal );

Thank you


Mar 11, 2013 at 11:52 AM

I think "(?m)D(?s).*?^G" also works.
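A short plain-Java check of that variant (a sketch with hypothetical class and method names, assuming "\n" terminators). The inline "(?s)" switches DOTALL on for the rest of the pattern, so the lazy ".*?" can cross line breaks until "^G" matches at a line start:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the inline (?s) variant.
class InlineDotall {

    // The pattern from the comment above.
    static final String REGEX = "(?m)D(?s).*?^G";

    // Return the first match in the input, or null when there is none.
    static String firstMatch(String input) {
        Matcher matcher = Pattern.compile(REGEX).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // One line between the D line and the G line, as in the post.
        System.out.println("[" + firstMatch(
            "CCCCCCCCCCDDDDDDDDDD\nEEEEEEEEEEFFFFFFFFFF\nGGGGGGGGGGHHHHHHHHHH") + "]");

        // No line in between: this looser pattern still matches.
        System.out.println("[" + firstMatch("CCDD\nGGHH") + "]");
    }
}
```

On the post's data this returns the same text as the original pattern, but it is less strict: it accepts any number of lines (including zero) between the D line and the G line.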

