Ben Nadel at cf.Objective() 2009 (Minneapolis, MN) with: Ryan Vikander and Kashif Khan and John Hiner

Understanding The TrimWhitespace() Function In Lucee CFML


The other day, when I was looking into which whitespace characters are removed by trim(), I came across a Lucee CFML function that I hadn't seen before: trimWhitespace(). The function doesn't have an in-depth description; and looking at the Java code didn't immediately clarify its behavior. As such, I wanted to try it out for myself in order to see if the function might be useful to me in the future.

To start building up a mental model, I created a <cfsavecontent> buffer that combined various whitespace and non-space characters in various orders. However, whenever I went to save the file, SublimeText kept trying to trim some of the spaces (which is what I want it to do in most cases). So, instead of using whitespace directly, I used some placeholder characters:

  • + → Space (Chr 32)
  • ~ → Tab (Chr 9)

Then, I replaced these with the proper whitespace character before calling trimWhitespace():

<cfsavecontent variable="buffer">
++~++
+__+~+__~+~__~~__~++++
+++++
+~
+__++__+~+__~+~__~~__~
~++++
</cfsavecontent>
<cfscript>

	cleaned = buffer
		.replace( "+", chr( 32 ), "all" )
		.replace( "~", chr( 9 ), "all" )
		.trimWhitespace()
		.replace( chr( 9 ), "T", "all" )
		.replace( chr( 10 ), "N", "all" )
		.replace( chr( 32 ), "S", "all" )
	;

	echo( cleaned );

</cfscript>

As you can see, I have all manner of whitespace character combinations. And, when we run this Lucee CFML code, we get the following output:

N__S__T__T__N__S__S__T__T__N

After going back-and-forth between the input and the output, I think I finally understand the rules:

  • Any series of whitespace characters that contains a Newline is collapsed down into a single Newline character.

  • Any series of whitespace characters that does not contain a Newline is collapsed down into the first whitespace character in the series.
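To make those two rules concrete, here is a rough sketch of them as a user-defined function. To be clear, collapseWhitespace() is a hypothetical helper of my own naming, not Lucee's implementation; and it assumes that "whitespace" means whatever the regular expression class \s matches, which may not line up exactly with the set of characters that trimWhitespace() handles:

```cfml
<cfscript>

	// Hypothetical helper (not part of Lucee): approximates the two rules
	// observed above using regular expressions.
	function collapseWhitespace( required string input ) {

		return input
			// Rule 1: a run of whitespace that contains a Newline becomes a
			// single Newline. The Newline is concatenated in with chr( 10 )
			// so that the pattern doesn't depend on an escape sequence.
			.reReplace( "\s*" & chr( 10 ) & "\s*", chr( 10 ), "all" )
			// Rule 2: after rule 1, no Newline has a whitespace neighbor, so
			// any remaining run of two-or-more whitespace characters contains
			// no Newline. Keep only the first character of that run.
			.reReplace( "(\s)\s+", "\1", "all" )
		;

	}

	// Usage: collapse a Tab followed by a Space down to the leading Tab.
	echo( collapseWhitespace( "a" & chr( 9 ) & chr( 32 ) & "b" ) );

</cfscript>
```

The order of the two replacements matters: handling the Newline runs first is what allows the second pattern to stay simple.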

Ironically, the trimWhitespace() function doesn't actually "trim" the string (leaving Newlines on both ends in my example). Really, it's "collapsing" whitespace, not trimming it. That said, I do like the fact that it reduces multiple newlines down into a single newline. I can see that being helpful in various text-processing workflows.
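If you want both behaviors, collapsing and trimming, one option is to chain the built-in trim() function after trimWhitespace(). This is just a sketch of that idea; the input string is made up for the demo:

```cfml
<cfscript>

	// Made-up sample input: a Newline and some Spaces on both ends, with a
	// run of Spaces in the middle.
	input = ( chr( 10 ) & "  hello   world  " & chr( 10 ) );

	// trimWhitespace() collapses each run of whitespace; then, trim()
	// strips whatever whitespace is left on the two ends of the string.
	cleaned = input
		.trimWhitespace()
		.trim()
	;

	// Brackets make the leading and trailing edges visible in the output.
	echo( "[" & cleaned & "]" );

</cfscript>
```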

Want to use code from this post? Check out the license.

Reader Comments

81 Comments

The Java source for Lucee's trimWhitespace() function is available at:
https://github.com/lucee/Lucee/blob/8554dddfffcdc5fdb0c4d9f298c61bc0d6c837d2/core/src/main/java/lucee/runtime/functions/string/TrimWhiteSpace.java#L9

It looks like it filters some common 7-bit ASCII whitespace characters, but not other Unicode whitespace such as the non-breaking space (NBSP; character code 160, which is outside the ASCII range).

I've experienced some abuses where UTF-8 "thin & hair spaces" or "zero-width space/non-joiner/joiner" characters are used. (Comment form spammers attempt to bypass filters by adding some of these non-visible characters in the middle of spammy phrases.)

15,688 Comments

@James,

I love that we can see Lucee's code! Such a benefit of having it open-source. I can't tell you how many times I've wanted to see what Adobe ColdFusion is doing behind the scenes!
