Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Replacing Blank Lines Using Multiline Mode RegEx Patterns In POSIX And Java In Lucee CFML 5.3.7.47

By Ben Nadel on
Tags: ColdFusion

While working on my ColdFusion custom tag DSL for HTML emails, I ran into an interesting problem when performing a multiline RegExp replace on my generated email content. This is not the first time that I've tripped over issues with multiline (?m) Regular Expression patterns and line-breaks. Though, in this case, the issue was that my RegEx pattern was failing to match adjacent lines if the pattern ended with a line-break. Or rather, it was failing in POSIX (the default ColdFusion Regular Expression engine); but, it was succeeding in Java in Lucee CFML 5.3.7.47.

At the end of the ColdFusion custom tag DSL rendering, I attempt to strip out unnecessary whitespace, which means removing "blank lines"; that is, lines that contain nothing other than space and tab characters.

My first attempt at this operation used the native reReplace() ColdFusion function, which uses the POSIX RegEx engine under the hood. This seemed to replace the first match of the pattern; but, left all adjacent matches in place. I then tried switching over to the Java Regular Expression engine (using the lower-level String.replaceAll() method) with the same pattern text, and the operation succeeded.

To see this divergence in behavior, let's create some control content with a string of "blank lines" and then try to strip them out using both the POSIX and the Java RegEx engines - note that I'm using Verbose mode (aka Comments mode) so that I can add comments next to the pattern text:

<cfscript>

	// We're building content that has multiple "blank lines" next to each other.
	content = arrayToList(
		[
			"AAAAA",
			"BBBBB",
			"",             // Blank line.
			"  ",           // Blank line.
			" #chr( 9 )# ", // Blank line.
			"",             // Blank line.
			"CCCCC",
			"        ",     // Blank line.
			"",             // Blank line.
			"DDDDD"
		],
		chr( 10 )
	);

	// In order to make the Regular Expression (RegEx) pattern easier to read, I am
	// running it in VERBOSE mode (?x). This ignores incidental whitespace and requires
	// all whitespace characters to be explicitly provided. As such, I am using the
	// following HEX codes:
	// --
	// \x20 => Space
	// \x09 => Tab
	// --
	// This Regular Expression pattern is attempting to match "blank lines" (ie, lines
	// that have nothing but whitespace) so that I can strip those lines out in the
	// replacement operation.
	```
	<cfsavecontent variable="patternText"
		>(?mx)       <!--- Multi-Line + Verbose mode enabled. --->
		^            <!--- Match at START OF LINE. --->
		[\x20\x09]*  <!--- Leading Space or Tab characters. --->
		\n           <!--- Match line-break at end of line. --->
	</cfsavecontent>
	```

	// Note that we are using the SAME PATTERN TEXT to apply the changes using the
	// default ColdFusion Regular Expression engine (POSIX) and the lower-level Java
	// Regular Expression engine.
	cfResult = content.reReplace( patternText, "", "all" );
	javaResult = javaCast( "string", content ).replaceAll( patternText, "" );

	echo( "<h3> POSIX (CFML) Result - reReplace() </h3>" );
	echo( "<pre>#encodeForHtml( cfResult )#</pre>" );
	echo( "<h3> Java Result - .replaceAll() </h3>" );
	echo( "<pre>#encodeForHtml( javaResult )#</pre>" );

</cfscript>

As you can see, I'm using multiline mode to find lines that have nothing but a string of tabs and spaces followed by a newline character. And, when we run this ColdFusion code in Lucee CFML, we get the following output:

Regular Expression replacement output in both POSIX and Java pattern matching shows different results in Lucee CFML.

The POSIX RegEx engine and the Java RegEx engine produce different results here. In the POSIX output, the number of "blank lines" is cut in half; whereas, in the Java output, the "blank lines" are removed entirely.
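
To isolate the Java half of this comparison, here is a minimal standalone sketch in plain Java (outside of Lucee) that runs the same pattern through String.replaceAll(). The class and method names are hypothetical; and, I've dropped Verbose mode and the CFML comments so that the pattern fits in a Java string literal. It says nothing about how the POSIX engine behaves.

```java
// Hypothetical standalone sketch; the class and method names are illustrative only.
class BlankLineStripper {

    // Same pattern as in the CFML demo, minus Verbose mode: in multiline mode,
    // match a line that contains only spaces (\x20) and tabs (\x09), plus the
    // newline that terminates it.
    static final String BLANK_LINE = "(?m)^[\\x20\\x09]*\\n";

    static String stripBlankLines(String content) {
        return content.replaceAll(BLANK_LINE, "");
    }

    public static void main(String[] args) {
        // Same control content as the CFML demo, joined with newlines.
        String content = String.join(
            "\n",
            "AAAAA", "BBBBB", "", "  ", " \t ", "", "CCCCC", "        ", "", "DDDDD"
        );

        // Prints the four non-blank lines with no blank lines between them.
        System.out.println(stripBlankLines(content));
    }
}
```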

We can get the POSIX version (the native reReplace() function) to work by wrapping the pattern text in its own capture group and having it repeat:

<cfsavecontent variable="patternText"
	>(?mx)
	^
	<!---
		By wrapping the "blank line" in a repeating capture group, we use the
		repeating nature of the pattern to replace adjacent lines rather than leaning
		entirely on the "all" behavior of the reReplace() function.
	--->
	(
		[\x20\x09]*
		\n
	)+
</cfsavecontent>

This gets around the issue by leaning on the repeating nature of the RegEx pattern rather than relying on the "all" behavior of the reReplace() function.
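
As a rough illustration of what "leaning on the repeating nature of the pattern" means, here is a sketch in plain Java (not CFML) that counts how many matches the repeating version of the pattern produces. The class and method names are hypothetical. With the repeating group, each run of adjacent blank lines is consumed by a single match, so the replacement never has to find a second match that starts immediately after the first one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; the class and method names are illustrative only.
class RepeatingBlankLineStripper {

    // The "blank line" is wrapped in a repeating group, so one match covers an
    // entire run of adjacent blank lines.
    static final Pattern BLANK_LINES = Pattern.compile("(?m)^([\\x20\\x09]*\\n)+");

    static String stripBlankLines(String content) {
        return BLANK_LINES.matcher(content).replaceAll("");
    }

    // Counts how many separate matches the pattern finds in the content.
    static int countMatches(String content) {
        Matcher matcher = BLANK_LINES.matcher(content);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = String.join(
            "\n",
            "AAAAA", "BBBBB", "", "  ", " \t ", "", "CCCCC", "        ", "", "DDDDD"
        );

        // The content has two runs of blank lines, so this prints 2.
        System.out.println(countMatches(content));
        System.out.println(stripBlankLines(content));
    }
}
```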

I absolutely love Regular Expressions. But, they can be complex; and, tripping over the differences between the POSIX engine and the Java engine is never fun. But, hopefully this will stick to the back of my mind; and, I'll have it on hand as I continue to write sweet, sweet pattern matching Lucee CFML code.

Switching Away From the POSIX Engine

As of Adobe ColdFusion 2018, you can actually configure your ColdFusion application to use Java as the default RegEx engine by enabling the useJavaAsRegexEngine setting. I haven't tested this specifically for this example; but, I assume it means that both outputs would become identical.



Reader Comments

Hi Ben. I always use:

REReplaceNoCase(string,"[\s]+"," ", "ALL");

When I need to strip out new lines, tabs & carriage returns...
Replacing with a single space doesn't tend to cause any harm.

@Charles,

That's not a bad idea. In my particular case, I wanted to keep the line-breaks in place because I was outputting HTML source code - and, I wanted to keep the "View Source" a bit more readable. But, yeah, I like your thinking there.

OK. I see. Yes. My method zaps all line breaks.

Dealing with regex over multiple lines can be a bit buggy, in my experience.

I must say, I never knew about:

(?mx)

I must use this setting sometime and see if I can finally apply regex over more than one line.

This has been a bugbear of mine for many years...

I also tried to run your code in TryCF.com and the Lucee CFML engine started to complain about a missing CFTRY tag?

In the end, I copied your code to cffiddle.org.
Now, cffiddle.org only allows us to choose the ACF CFML engine.

I had to change your code to the following before it worked:

<cfscript>
    
content = arrayToList(
	[
		"AAAAA",
		"BBBBB",
		"", 
		"  ",
		" #chr( 9 )# ",
		"",
		"CCCCC",
		"        ",
		"",
		"DDDDD"
	],
	chr( 10 )
);

patternText = "(?mx)^[\x20\x09]*\n";

result = content.reReplace( patternText, "", "all" );
javaResult = javaCast( "string", content ).replaceAll( patternText, "" );

WriteOutput( "<h3> POSIX (CFML) Result - reReplace() </h3>" );
WriteOutput( "<pre>#encodeForHtml( result )#</pre>" );
WriteOutput( "<h3> Java Result - .replaceAll() </h3>" );
WriteOutput( "<pre>#encodeForHtml( javaResult )#</pre>" );

</cfscript>

What's interesting about all of this is how much ACF has now diverged from Lucee!

Anyway, this has been a very useful exploration, and I am sure, at some point, I will need to strip out blank lines, using regex. So, thanks...

By the way, your results were emulated on cffiddle.org

@Charles,

To be clear, the (?...) construct is used to turn pattern flags on and off. So, in this case, (?mx) is actually turning on two different flags:

  • m - Multiline matching mode.
  • x - Verbose / comment mode.

You can also turn on:

  • i - Case insensitive mode.

Which means that reFindNoCase( "pattern" ) is the same as reFind( "(?i)pattern" ).
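
The same idea can be sketched on the Java side, where an embedded (?i) flag and the Pattern.CASE_INSENSITIVE compile flag should behave the same way. This is plain Java rather than CFML; and, the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical standalone sketch; the class and method names are illustrative only.
class InlineFlagDemo {

    // Case-insensitive mode turned on INSIDE the pattern text.
    static boolean findWithInlineFlag(String input) {
        return Pattern.compile("(?i)pattern").matcher(input).find();
    }

    // Case-insensitive mode turned on OUTSIDE the pattern text.
    static boolean findWithCompileFlag(String input) {
        return Pattern.compile("pattern", Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    // No flags at all, for comparison.
    static boolean findWithoutFlag(String input) {
        return Pattern.compile("pattern").matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "A PATTERN in upper-case";

        System.out.println(findWithInlineFlag(input));  // true
        System.out.println(findWithCompileFlag(input)); // true
        System.out.println(findWithoutFlag(input));     // false
    }
}
```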

This is very cool.

I know how to turn on the regex flags in JavaScript:

const regex = new RegExp('[\\s]+', 'igm');

But I never found out how to do this in ColdFusion? I know, I feel like an idiot;)

So, now, in ColdFusion, all I have to do is:

REReplaceNoCase("string", "(?m)pattern","replacement","ALL" ); // equivalent to -> igm

REReplaceNoCase("string", "(?m)pattern","replacement","ONE" ); // equivalent to -> im

REReplace("string", "(?m)pattern","replacement","ALL" ); // equivalent to -> gm

REReplace("string", "(?m)pattern","replacement","ONE" ); // equivalent to -> m

Outstanding...

To be fair, I am not sure whether Adobe has any docs on regex flags. It would be great if you could publish a full list of ColdFusion regex flags in a reply.

This would be incredibly useful for future reference.

Here are the flags I found on regex101:

global
multi line
insensitive
extended
single line
unicode
Ungreedy
Anchored
jChanged
Dollar end only

@Charles,

When it comes to RegEx flags, just be aware that they aren't universally supported. So, what works in the Java RegEx engine may not work in the POSIX RegEx engine. Always be sure to test!
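
One way to "test" on the Java side is to try compiling a tiny pattern with each embedded flag and see whether the engine rejects it. The following is a minimal sketch in plain Java with hypothetical class and method names. It only probes java.util.regex; it says nothing about the POSIX engine. And, "supported" here only means that the pattern compiles: a letter can be accepted and still mean something different than it does on regex101 (for example, U in Java turns on Unicode character classes, not ungreedy matching).

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical standalone sketch; the class and method names are illustrative only.
class InlineFlagProbe {

    // Returns true if java.util.regex accepts "(?<flag>)" as an embedded flag.
    static boolean isSupported(char flag) {
        try {
            Pattern.compile("(?" + flag + ")a");
            return true;
        } catch (PatternSyntaxException error) {
            // The engine does not recognize this letter as an inline modifier.
            return false;
        }
    }

    public static void main(String[] args) {
        // Single-letter forms of the flags listed on regex101.
        for (char flag : "gmixsuUAJD".toCharArray()) {
            System.out.println(flag + " => " + isSupported(flag));
        }
    }
}
```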
