Which Whitespace Characters Does trim() Remove In ColdFusion

Yesterday, an external API call that I was making failed because one of the values that I was posting contained a trailing "Zero width space" character (\u200b). The value in question was being passed through ColdFusion's native trim() function, which was clearly not removing this whitespace character. As such, it occurred to me that I didn't really know which characters are (and are not) handled by the trim() function. And so, I wanted to run a test.

One of the things that I love about Lucee CFML is that all of the source code is posted right there on GitHub. So, if I want to know how something is working under the hood, I can just go look at it. When we look at Lucee's implementation of the trim() function, we can see that it is handing control off to Java's String.trim() method. And, Java's String.trim() removes all leading and trailing characters whose code point falls between \u0000 and \u0020 (the space character), inclusive.
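
To make that rule concrete, here is a minimal sketch of the same logic written in CFML. The trimLikeJava() name is hypothetical, and this is only an approximation of the documented behavior, not the JDK's actual source:

```cfml
<cfscript>

	/**
	* I approximate the documented String.trim() rule: strip leading and trailing
	* characters whose code point is at or below 32 (the space character).
	* Hypothetical helper, for illustration only.
	*/
	public string function trimLikeJava( required string input ) {

		var startIndex = 1;
		var endIndex = len( input );

		// Walk forward past any leading characters at or below U+0020.
		while ( ( startIndex <= endIndex ) && ( asc( mid( input, startIndex, 1 ) ) <= 32 ) ) {

			startIndex++;

		}

		// Walk backward past any trailing characters at or below U+0020.
		while ( ( endIndex >= startIndex ) && ( asc( mid( input, endIndex, 1 ) ) <= 32 ) ) {

			endIndex--;

		}

		// If the indices crossed, the entire input was trimmable.
		if ( startIndex > endIndex ) {

			return( "" );

		}

		return( mid( input, startIndex, ( endIndex - startIndex + 1 ) ) );

	}

	// A leading tab and a trailing space are removed, leaving "abc".
	writeOutput( len( trimLikeJava( chr( 9 ) & "abc" & chr( 32 ) ) ) ); // 3
	writeOutput( " , " );
	// A trailing zero width space (code point 8203) is above 32, so it survives.
	writeOutput( len( trimLikeJava( "abc" & chr( 8203 ) ) ) ); // 4

</cfscript>
```

The only thing being checked at each end of the string is whether the code point is at or below 32, which is why anything above that range, however "space-like" it looks, is left alone.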

Of course, since Adobe ColdFusion's code is closed-source, we can't know what it is doing. We can only test it. And, to do this, I'm collecting all of the "standard" whitespace characters and the non-standard whitespace characters (that I identified in my text-normalization component) and I'm looping over them to see if they survive a call to trim():

<cfscript>

	testCharacters = [
		// Standard "whitespace" characters.
		hexToChar( "0009" ), // Tab.
		hexToChar( "000a" ), // Line Break (decimal 10).
		hexToChar( "000d" ), // Carriage Return (decimal 13).
		hexToChar( "0020" ), // Space.

		// Non-standard "whitespace" characters.
		hexToChar( "00a0" ), // No-Break Space.
		hexToChar( "2000" ), // En Quad (space that is one en wide).
		hexToChar( "2001" ), // Em Quad (space that is one em wide).
		hexToChar( "2002" ), // En Space.
		hexToChar( "2003" ), // Em Space.
		hexToChar( "2004" ), // Three-Per-Em Space (Thick Space).
		hexToChar( "2005" ), // Four-Per-Em Space (Mid Space).
		hexToChar( "2006" ), // Six-Per-Em Space.
		hexToChar( "2007" ), // Figure Space.
		hexToChar( "2008" ), // Punctuation Space.
		hexToChar( "2009" ), // Thin Space.
		hexToChar( "200a" ), // Hair Space.
		hexToChar( "200b" ), // Zero Width Space.
		hexToChar( "2028" ), // Line Separator.
		hexToChar( "2029" ), // Paragraph Separator.
		hexToChar( "202f" ), // Narrow No-Break Space.
		hexToChar( "feff" )  // Zero Width No-Break Space.
	];

	// For each test whitespace character, let's see if it survives a trim() call.
	for ( c in testCharacters ) {

		writeOutput( len( trim( c ) ) );
		writeOutput( ", " );

	}

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I convert the given hex-encoded code point to its corresponding character.
	*/
	public string function hexToChar( required string hexEncoded ) {

		return( chr( inputBaseN( hexEncoded, 16 ) ) );

	}

</cfscript>

As you can see, I start with our four most common control-characters and spaces; and then, I follow with a variety of other uncommon whitespace characters. When we run this code in either Lucee CFML or Adobe ColdFusion, we get the same output:

0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

As you can see, the first 4 test characters (Tab, Line-Break, Carriage Return, Space) were all removed by the trim() function - which matches what Java's String.trim() method is documented to do. And, all of the other uncommon whitespace characters remain. As such, I think it would be fair to assume that Adobe ColdFusion's trim() function is likely also handing control off to Java's String.trim() implementation. Which means that both CFML engines only remove leading and trailing characters from \u0000 up to and including \u0020 in their trim() function implementations.
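
Since trim() leaves those characters in place, one possible workaround is to lean on Java's regular expression engine instead. What follows is a minimal sketch, not a vetted implementation: the trimUnicode() name is hypothetical, and the set of characters it strips (everything trim() handles, the Unicode separator category \p{Z}, the zero width space, and the zero width no-break space) is my own assumption about what counts as "trimmable":

```cfml
<cfscript>

	/**
	* I strip leading and trailing characters from \u0000 through \u0020, plus the
	* Unicode separators (\p{Z}), the zero width space, and the zero width no-break
	* space. Hypothetical helper; the character set is an assumption.
	*/
	public string function trimUnicode( required string input ) {

		// CFML does not treat the backslash as an escape character in string
		// literals, so these sequences reach Java's regex engine untouched.
		var trimmable = "[\x00-\x20\p{Z}\u200B\uFEFF]+";

		// Match a run of trimmable characters at the start OR at the end.
		var pattern = createObject( "java", "java.util.regex.Pattern" )
			.compile( javaCast( "string", "^#trimmable#|#trimmable#$" ) )
		;

		return( pattern.matcher( javaCast( "string", input ) ).replaceAll( javaCast( "string", "" ) ) );

	}

	// The trailing zero width space (8203) and leading no-break space (160) are removed.
	writeOutput( len( trimUnicode( chr( 160 ) & "abc" & chr( 8203 ) ) ) ); // 3

</cfscript>
```

Note that this only touches the two ends of the string; any of these characters sitting in the middle of a value will still be there afterwards.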

Want to use code from this post? Check out the license.

Reader Comments

15,540 Comments

@Chris,

I am lucky in so much as my IDE (SublimeText) has special notation for non-visible characters. I was processing email addresses; and, when one didn't work, I opened the underlying data-file and saw this at the end of one of the lines:

<0x200b>

Then, I looked up which character was 200b and that's how I realized what was going on :) Just lucky!

80 Comments

I ran into issues with NBSPs in Excel files. Clients use Excel to import data into their databases and many records weren't matching because trailing non-breaking spaces were in some of the cells and they weren't able to be trimmed using CFML or SQL functions.

I use VSCode with this extension to highlight NBSPs.
https://marketplace.visualstudio.com/items?itemName=viktorzetterstrom.non-breaking-space-highlighter

This VSCode extension can highlight a pre-configured array of characters.
https://marketplace.visualstudio.com/items?itemName=wengerk.highlight-bad-chars

15,540 Comments

There must be some sort of keyboard command that people are accidentally hitting to enter stuff like this. I know that I accidentally put in all kinds of strange "accents" and "emoji" when I hit Alt/Ctrl when typing. I have to assume that the same kind of shortcuts exist for these strange spaces. Otherwise, I can't understand where they come from.

80 Comments

I've seen characters added when data is copied from one Microsoft program to another. Excel often adds unwanted carriage returns. I've logged some hack attempts that use unsafe spaces in an attempt to bypass keyword blocklists. (ie, it looks the same but won't get blocked.)

Even worse IMHO is Microsoft's autocorrecting & format-as-you-type "smart quotes" features that seem to be re-enabled whenever Microsoft 365 apps are updated. If I didn't explicitly type these characters, I don't want them to ever be outputted.

15,540 Comments

@James,

100% on the smart-quotes. Whenever I get copy from the Marketing team that I have to add to the product, the first thing I have to do is go in and replace all the smart-quotes with regular ASCII characters. :shakes-fist:

6 Comments

Hi Ben

Thanks for sharing, do you have a 'proper' trim function you've built to accommodate these extras that we can use instead of trim()?

TIA

Dave
