Matching Multi-Line Regular Expression Patterns In MULTILINE Mode (?m)

Posted November 5, 2009 at 2:04 PM

Tags: ColdFusion

This morning, I was (and still am) having a problem getting some MULTILINE regular expression patterns to match properly. As such, I wanted to put a quick blog post together as a sanity check for myself. As I have blogged about before, when a Java regular expression is running in multiline mode (as denoted by the "(?m)" flag), the "^" and "$" anchors match at the start and end of each line (respectively) rather than only at the start and end of the source string. This allows us to do line-level pattern matching.

This is really useful; but, when we are running our regular expressions in multiline mode, we have to be aware that the ^ and $ anchors are zero-width: they match the positions next to the new line and carriage return characters, but they do not consume those characters. As such, if we want to match a pattern across multiple lines in multiline mode, we have to define the new line and carriage return expressions explicitly in our pattern. To see this, take a look at the following demo:


<!--- Store target data. --->
<cfsavecontent variable="data">
AAAAAAAAAABBBBBBBBBB
CCCCCCCCCCDDDDDDDDDD
EEEEEEEEEEFFFFFFFFFF
GGGGGGGGGGHHHHHHHHHH
</cfsavecontent>

<!---
    Create the Java pattern. Note that we are using the
    MULTILINE mode flag; this will allow ^ and $ to match at
    the line boundaries.
--->
<cfset pattern = createObject( "java", "java.util.regex.Pattern" )
    .compile(
        javaCast(
            "string",
            (
                "(?m)D.++$" &
                "(\r\n?|\n)" &
                "^.*+$" &
                "(\r\n?|\n)" &
                "^G"
            )
        )
    )
    />

<!--- Get the matcher for our target text. --->
<cfset matcher = pattern.matcher(
    javaCast( "string", trim( data ) )
    ) />

<!--- Move to the first match. --->
<cfset matcher.find() />

<!--- Output the first match (CFOutput is needed to evaluate the ## expression). --->
<cfoutput>[#matcher.group()#]</cfoutput>

Here, we start out matching the letter "D" and then everything until the end of the line (remember that in Java, the "dot" does not match line terminators unless DOTALL mode [aka. single-line mode] is turned on using the "(?s)" flag). We then add the line terminators to the pattern. We then match the entire next line and match its line terminators. We then match the line start followed by "G". When we run this code, we get the following output:

[DDDDDDDDDD
EEEEEEEEEEFFFFFFFFFF
G]
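
For anyone who wants to try this outside of ColdFusion, here is a minimal plain-Java sketch of the same demo. The class name, the firstMatch() helper, and the hard-coded sample string (which assumes "\n" line breaks) are assumptions made for illustration; only the first pattern comes from the code above. The second pattern is the same expression with the two line-terminator groups removed, to show why they have to be spelled out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; DATA mirrors the trimmed cfsavecontent value,
// assuming the lines are separated by "\n".
class MultilineAnchorsDemo {

    static final String DATA =
        "AAAAAAAAAABBBBBBBBBB\n" +
        "CCCCCCCCCCDDDDDDDDDD\n" +
        "EEEEEEEEEEFFFFFFFFFF\n" +
        "GGGGGGGGGGHHHHHHHHHH";

    // The pattern from the post: line terminators are consumed explicitly.
    static final String WITH_TERMINATORS =
        "(?m)D.++$(\\r\\n?|\\n)^.*+$(\\r\\n?|\\n)^G";

    // The same pattern with the two terminator groups removed. Since ^ and $
    // are zero-width, nothing here can step over the line break.
    static final String WITHOUT_TERMINATORS =
        "(?m)D.++$^.*+$^G";

    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + firstMatch(WITH_TERMINATORS, DATA) + "]");
        System.out.println("[" + firstMatch(WITHOUT_TERMINATORS, DATA) + "]");
    }
}
```

Running main() should print the same three-line bracketed match shown above, followed by [null] for the pattern that leaves out the explicit line terminators.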

Anyway, nothing revolutionary going on here; like I said above, I mostly put this post together as a sanity check for myself to make sure that I really understood what was going on in Java's multiline mode.

Reader Comments

Nov 5, 2009 at 2:43 PM

Did you look to see if it would return the end of line character if in single line mode? I have an idea how I would go about it and may try it later.


Nov 5, 2009 at 3:34 PM

The way that expression is written, you know you'll have exactly one line between the DDD line and the line that starts with a G. Was that your intention? If you add a line, or even a blank line, no match is found. If you need more flexibility, may I suggest something like:

"(?m)D+$(\r\n?|\n)((^.*$)(\r\n?|\n))+^G"

a match is found even with:

<cfsavecontent variable="data">
AAAAAAAAAABBBBBBBBBB
CCCCCCCCCCDDDDDDDDDD
EEEEEEEEEEFFFFFFFFFF

T

GGGGGGGGGGHHHHHHHHHH
</cfsavecontent>
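
A quick way to check this suggestion is a small plain-Java sketch like the one below. The class name, the firstMatch() helper, and the "\n" line breaks in the sample string are assumptions for illustration; the FLEXIBLE pattern is the one suggested in this comment and the STRICT pattern is the one from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; DATA mirrors the sample in this comment,
// assuming "\n" line breaks and a trimmed value.
class FlexibleGapDemo {

    static final String DATA =
        "AAAAAAAAAABBBBBBBBBB\n" +
        "CCCCCCCCCCDDDDDDDDDD\n" +
        "EEEEEEEEEEFFFFFFFFFF\n" +
        "\n" +
        "T\n" +
        "\n" +
        "GGGGGGGGGGHHHHHHHHHH";

    // Pattern from the post: exactly one full line between the D's and the G.
    static final String STRICT =
        "(?m)D.++$(\\r\\n?|\\n)^.*+$(\\r\\n?|\\n)^G";

    // Pattern from this comment: one or more full lines (blank lines
    // included) between the D's and the line that starts with G.
    static final String FLEXIBLE =
        "(?m)D+$(\\r\\n?|\\n)((^.*$)(\\r\\n?|\\n))+^G";

    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + firstMatch(STRICT, DATA) + "]");
        System.out.println("[" + firstMatch(FLEXIBLE, DATA) + "]");
    }
}
```

With this data, the strict pattern finds nothing, while the flexible one matches from the run of D's through the leading G, blank lines included. Note that the "+" still requires at least one full line in between.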


Nov 5, 2009 at 3:48 PM

@Daniel,

Hmm, I think it would. When in single line mode, the dot will match line terminators, which means that .++ should possessively match ALL characters until it hits the end of the string.

@Travis,

Yes, I meant to only match a single line between the D and G (that was more along the lines of the use case I was trying to debug). But yes, the way you have it would be more flexible.
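
On the single-line mode question, a minimal plain-Java sketch along these lines can confirm the behavior. The class name, the firstMatch() helper, the "\n" line breaks, and the two flag combinations tried here are assumptions for illustration; they are not code from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; DATA mirrors the post's trimmed sample,
// assuming "\n" line breaks.
class SingleLineModeDemo {

    static final String DATA =
        "AAAAAAAAAABBBBBBBBBB\n" +
        "CCCCCCCCCCDDDDDDDDDD\n" +
        "EEEEEEEEEEFFFFFFFFFF\n" +
        "GGGGGGGGGGHHHHHHHHHH";

    // DOTALL only: the dot matches line terminators, so the possessive .++
    // runs from the first D to the end of the string.
    static final String DOTALL_ONLY = "(?s)D.++$";

    // The post's pattern with DOTALL added to MULTILINE: .++ swallows the
    // rest of the input and cannot give anything back to the later groups.
    static final String POST_PATTERN_DOTALL =
        "(?ms)D.++$(\\r\\n?|\\n)^.*+$(\\r\\n?|\\n)^G";

    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + firstMatch(DOTALL_ONLY, DATA) + "]");
        System.out.println("[" + firstMatch(POST_PATTERN_DOTALL, DATA) + "]");
    }
}
```

The first match includes the line breaks and runs to the end of the data. The second call returns null, because the possessive quantifier never backtracks to let the explicit line-terminator groups match.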


Nov 8, 2009 at 10:06 AM

This is a good web log and I like it.
It will help people to learn ColdFusion.

