Ben Nadel at Scotch On The Rocks (SOTR) 2011 (Edinburgh) with: Tom Chiverton ( @thefalken )

Creating blockTrim(), inlineTrim(), and trailingTrim() Functions In ColdFusion

By Ben Nadel

In terms of removing whitespace from a given string, ColdFusion has several built-in trimming functions: trim(), ltrim(), and rtrim(). On top of that, Lucee CFML has a trimWhitespace() function (which is a little confusing). And, as a code kata, I wanted to think about a few other trimming approaches: blockTrim(), inlineTrim(), and trailingTrim().

I'm borrowing the "block" and "inline" concepts from CSS logical properties. In CSS, "block" (roughly) refers to the top and bottom and "inline" (roughly) refers to the left and right. In this exploration, "block trimming" refers to removing whitespace above and below text; "inline trimming" refers to removing whitespace to the left and right of text; and, as a final function, the "trailing trim" refers to removing whitespace only from the right side of each line of text (as one might do when saving code in a text editor).

All of my trimming functions are powered by Regular Expressions. And each approach makes use of a handful of special characters:

  • \A - Matches the start of the string.

  • \Z - Matches the end of the string.

  • \s - Matches any whitespace character (which includes space, tab, newline, and carriage return).

  • \r - Matches the carriage return (chr(13)).

  • \n - Matches the newline (chr(10)).

When dealing with text that has line-breaks, it's natural to think that the RegEx multi-line mode flag (?m) would be helpful. The problem is, multi-line mode doesn't recognize \r as part of the line-delimiter. As such, my pattern solutions all explicitly define the line-delimiters and end-of-input delimiters.
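
To make the multi-line problem concrete, here's a small Python sketch. It uses Python's `re` module, not ColdFusion's regex engine, so treat it as an illustration of the general behavior rather than a statement about CFML. The helper name `naive_trailing_trim()` is made up for this example. In Python's multi-line mode, `$` matches only just before a `\n` or at the very end of the string. That means the `\r` in a CRLF pair sits between the trailing spaces and the line boundary, and a lone `\r` isn't treated as a boundary at all:

```python
import re

# In Python's multi-line mode, "$" matches just before a "\n" or at the very
# end of the string. A "\r" is never treated as a line boundary on its own.
MULTILINE_TRAILING = re.compile(r"[ \t]+$", re.MULTILINE)


def naive_trailing_trim(text: str) -> str:
    """Hypothetical helper: strip trailing spaces/tabs using (?m) mode only."""
    return MULTILINE_TRAILING.sub("", text)


for sample in ("Hello   \nWorld   \n", "Hello   \r\nWorld   \r\n", "Hello   \rWorld   \r"):
    print(repr(naive_trailing_trim(sample)))
# 'Hello\nWorld\n'
# 'Hello   \r\nWorld   \r\n'
# 'Hello   \rWorld   \r'
```

Only the LF-delimited text gets trimmed. Spelling out the delimiters explicitly, as in `(\r\n?|\n|\Z)`, sidesteps that kind of edge-case.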

blockTrim() Function

My blockTrim() function removes the empty lines above and below text. Consider a scenario in which you're saving content using the CFSaveContent tag. In all likelihood, you have a line-break after the opening <cfsavecontent> tag and another line-break before the closing </cfsavecontent> tag. This is an artifact of code formatting, and it wraps the primary content in superfluous empty lines. Using blockTrim() will remove those empty lines without also removing the leading spaces of the first line and the trailing spaces of the last line, which the trim() function would strip.

In the following ColdFusion code (and in all subsequent demos in this post), I'm defining a block of text that is surrounded on all sides by whitespace. This text is then passed through blockTrim() and output to the page using a <pre> tag.

Note: My showWhitespace() function replaces spaces and tabs with . and adds a + before each line-break. This makes the results easier to see in a browser. The definition of this function is shared at the end of this post.

<cfscript>

	include "./utils.cfm"; // concat(), line(), and showWhitespace().

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	text = concat(
		line( "             " ),
		line( "             " ),
		line( "    Hello    " ),
		line( "    World    " ),
		line( "             " ),
		line( "             " )
	);

	// Performing a "block" trim will remove all of the EMPTY LINES that lead and trail
	// the content in the given text value. But, it will leave "inline" spaces in place.
	trimmedText = blockTrim( text );

	// Substitute WHITE SPACE when rendering for easier understanding.
	echo( "<pre>" & showWhitespace( trimmedText ) & "</pre>" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I trim the empty lines above and below the given block of text.
	*/
	public string function blockTrim( required string input ) {

		var result = input
			// Remove leading empty lines in block. By forcing the pattern to end on a
			// line-delimiter, we make sure not to remove leading spaces from the first
			// non-empty line of text.
			.reReplace( "\A\s*(\r\n?|\n|\Z)", "" )
			// Remove trailing empty lines in block. By forcing the pattern to start on a
			// line-delimiter, we make sure not to remove trailing spaces from the last
			// non-empty line of text. The last line-delimiter is left in place.
			.reReplace( "(\r\n?|\n)\s*\Z", "\1" )
		;

		return result;

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

....Hello....+
....World....+

As you can see, the empty lines above and below the text were removed. But, all of the inline whitespace was left intact.
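
If you want to try the same patterns outside of ColdFusion, here's a rough Python port of blockTrim(). It's a sketch, not a drop-in replacement. Python's `re` stands in for ColdFusion's regex engine, and `count=1` mimics reReplace()'s default "one" scope. `str.strip()` is used as a loose stand-in for trim() to show the difference. One known difference: Python's `\Z` matches only at the very end of the string, while some other engines also let `\Z` match just before a final line-break. For these greedy patterns that shouldn't change the result.

```python
import re

# Leading empty lines: whitespace from the start of the string up to (and
# including) the last line-delimiter before the first non-empty line.
LEADING_EMPTY_LINES = re.compile(r"\A\s*(\r\n?|\n|\Z)")
# Trailing empty lines: from the delimiter that ends the last non-empty line
# through the end of the string. That delimiter is put back via \1.
TRAILING_EMPTY_LINES = re.compile(r"(\r\n?|\n)\s*\Z")


def block_trim(text: str) -> str:
    """Remove empty lines above and below the text; keep inline spaces."""
    # count=1 mirrors reReplace()'s default "one" scope in the CFML version.
    result = LEADING_EMPTY_LINES.sub("", text, count=1)
    result = TRAILING_EMPTY_LINES.sub(r"\1", result, count=1)
    return result


blank = " " * 13 + "\r\n"
text = blank * 2 + "    Hello    \r\n    World    \r\n" + blank * 2
print(repr(block_trim(text)))  # '    Hello    \r\n    World    \r\n'
print(repr(text.strip()))      # 'Hello    \r\n    World'
```

The second line of output shows why a plain trim isn't enough here: it also eats the indentation of "Hello" and the trailing spaces of "World".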

inlineTrim() Function

My inlineTrim() function removes the leading and trailing whitespace from each line, but leaves all line-breaks in place. Again, this might be useful when consuming text content that was defined via the CFSaveContent tag. In such a context, content is indented for the sake of the code formatting, not for the sake of the content.

<cfscript>

	include "./utils.cfm"; // concat(), line(), and showWhitespace().

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	text = concat(
		line( "             " ),
		line( "             " ),
		line( "    Hello    " ),
		line( "    World    " ),
		line( "             " ),
		line( "             " )
	);

	// Performing an "inline" trim will remove all of the SPACES at the start and end of
	// each line in the given text value. But, it will leave all empty lines in place.
	trimmedText = inlineTrim( text );

	// Substitute WHITE SPACE when rendering for easier understanding.
	echo( "<pre>" & showWhitespace( trimmedText ) & "</pre>" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I trim the empty spaces at the start and end of each line of the given text. But
	* leave all line-breaks in place.
	*/
	public string function inlineTrim( required string input ) {

		var TAB = chr( 9 );
		var SPACE = chr( 32 );

		// NOTE: We could have simplified this a bit by using (?m) multi-line mode. But,
		// multi-line mode doesn't recognize the Carriage Return (\r) as a line-delimiter.
		// As such, this can lead to unexpected results in strange edge-cases.
		var result = input
			// Remove leading spaces from start of string.
			.reReplace( "\A[#TAB##SPACE#]+", "" )
			// Remove leading spaces from the start of each subsequent line.
			.reReplace( "(\r\n?|\n)[#TAB##SPACE#]+", "\1", "all" )
			// Remove trailing spaces from the end of each line.
			.reReplace( "[#TAB##SPACE#]+(\r\n?|\n|\Z)", "\1", "all" )
		;

		return result;

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

+
+
Hello+
World+
+
+

As you can see, the leading and trailing whitespace was removed from each line. But, all line-breaks were left intact.
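
Here's the same three-step approach as a Python sketch of inlineTrim(). The `HSPACE` character class is my own shorthand for the TAB and SPACE characters that the CFML version builds with chr(). `re.sub()` replaces every match by default, which lines up with the "all" scope used in the original. As before, this is a port for illustration and assumes Python's `re` semantics:

```python
import re

HSPACE = "[ \t]"  # Hypothetical shorthand for the TAB and SPACE characters.


def inline_trim(text: str) -> str:
    """Trim spaces/tabs at the start and end of every line; keep line-breaks."""
    # Leading spaces on the very first line.
    result = re.sub(rf"\A{HSPACE}+", "", text)
    # Leading spaces after every line-delimiter (CRLF, CR, or LF).
    result = re.sub(rf"(\r\n?|\n){HSPACE}+", r"\1", result)
    # Trailing spaces before every line-delimiter or the end of the string.
    result = re.sub(rf"{HSPACE}+(\r\n?|\n|\Z)", r"\1", result)
    return result


text = "".join(s + "\r\n" for s in ["   ", "    Hello    ", "    World    ", "   "])
print(repr(inline_trim(text)))  # '\r\nHello\r\nWorld\r\n\r\n'
```

Running it against CRLF, CR, and LF inputs is a quick way to confirm that the explicit delimiters handle all three line-ending styles.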

trailingTrim() Function

My trailingTrim() function doesn't really have a great use-case. But, I use functionality like this all the time in my text editor. As such, I threw it in here for funzies. This function acts like inlineTrim(); but, only removes the trailing spaces from each line, leaving the leading spaces intact:

<cfscript>

	include "./utils.cfm"; // concat(), line(), and showWhitespace().

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	text = concat(
		line( "             " ),
		line( "             " ),
		line( "    Hello    " ),
		line( "    World    " ),
		line( "             " ),
		line( "             " )
	);

	// Performing a "trailing" trim will remove all of the SPACES at the end of each line
	// in the given text value. But, it will leave all empty lines in place.
	trimmedText = trailingTrim( text );

	// Substitute WHITE SPACE when rendering for easier understanding.
	echo( "<pre>" & showWhitespace( trimmedText ) & "</pre>" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I trim the empty spaces at the end of each line of the given text. But leave all
	* line-breaks in place.
	*/
	public string function trailingTrim( required string input ) {

		var TAB = chr( 9 );
		var SPACE = chr( 32 );

		// NOTE: We could have simplified this a bit by using (?m) multi-line mode. But,
		// multi-line mode doesn't recognize the Carriage Return (\r) as a line-delimiter.
		// As such, this can lead to unexpected results in strange edge-cases.
		var result = input
			// Remove trailing spaces from the end of each line.
			.reReplace( "[#TAB##SPACE#]+(\r\n?|\n|\Z)", "\1", "all" )
		;

		return result;

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

+
+
....Hello+
....World+
+
+

As you can see, the trailing whitespace was removed from each line. But, all leading whitespace and line-breaks were left intact.
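
For completeness, here's a Python sketch of trailingTrim(). It's just the last pattern from the inlineTrim() port, applied on its own. Again, Python's `re` is an assumption standing in for ColdFusion's engine:

```python
import re


def trailing_trim(text: str) -> str:
    """Trim spaces/tabs at the end of every line; keep leading spaces and breaks."""
    # Spaces/tabs followed by a line-delimiter or the end of the string are
    # dropped; the delimiter itself is restored via \1.
    return re.sub(r"[ \t]+(\r\n?|\n|\Z)", r"\1", text)


text = "".join(s + "\n" for s in ["   ", "    Hello    ", "    World    ", "   "])
print(repr(trailing_trim(text)))  # '\n    Hello\n    World\n\n'
```

Leading spaces survive because a run of spaces at the start of a line is followed by text, not by a delimiter, so the pattern never matches there.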

Utility Functions

In order to keep the code clean and readable, I moved a few of my demo utility functions into another ColdFusion file. While they are not really the point of this exploration, I'm including them here for completeness.

You might notice that my line() function has several commented-out return statements. I was adding/removing various line-endings while developing this code in order to make sure that my regular expression patterns continued to work when different line delimiters were applied.

<cfscript>

	/**
	* I concatenate all of the arguments into a single string.
	*/
	public string function concat() {

		return arrayToList( arguments, "" );

	}

	/**
	* I append a CRLF line-break (chr(13) & chr(10)) to the end of the given string.
	*/
	public string function line( required string input ) {

		return ( input & chr( 13 ) & chr( 10 ) );
		// return ( input & chr( 13 ) );
		// return ( input & chr( 10 ) );

	}

	/**
	* I replace whitespace characters with rendered characters for easier debugging.
	*/
	public string function showWhitespace( required string input ) {

		var result = input
			// Replace spaces with dots.
			.reReplace( "[ \t]", ".", "all" )
			// Replace line breaks with plus.
			.reReplace( "(\r\n?|\n)", "+\1", "all" )
		;

		return result;

	}

</cfscript>
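
Here's a rough Python equivalent of these demo utilities. One change is hypothetical: instead of commenting return statements in and out of line(), the `ending` parameter makes the line-delimiter an argument. With that change, all three line-ending styles can be exercised in a single loop. This sketch is not part of the original utils.cfm:

```python
import re


def concat(*parts: str) -> str:
    """Join all arguments into a single string."""
    return "".join(parts)


def line(text: str, ending: str = "\r\n") -> str:
    """Append a line-delimiter. `ending` is a hypothetical parameter standing
    in for the commented-out return statements in the CFML version."""
    return text + ending


def show_whitespace(text: str) -> str:
    """Render spaces/tabs as '.' and put a '+' before each line-break."""
    result = re.sub(r"[ \t]", ".", text)
    return re.sub(r"(\r\n?|\n)", r"+\1", result)


for ending in ("\r\n", "\r", "\n"):
    sample = concat(line("  Hi  ", ending), line("   ", ending))
    print(repr(show_whitespace(sample)))
# '..Hi..+\r\n...+\r\n'
# '..Hi..+\r...+\r'
# '..Hi..+\n...+\n'
```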

I've said it before and I'll say it again: regular expressions are the bee's knees! They really are powerful. If you'd like to learn more about them, check out my video presentation: Regular Expressions, Extraordinary Power.

Want to use code from this post? Check out the license.

Reader Comments


In the comments in my code, I mention the fact that the regular expression multi-line mode (?m) flag doesn't recognize the carriage return. That's why the pattern matching is a bit more complicated than it otherwise might be.

This morning, I asked ChatGPT if I have to care about the carriage return. Or rather, whether I really have to worry about systems that might use the carriage return as the only separator:

You:
Do any modern operating systems or computer programs use character 13 (carriage return) as their only line separator?

ChatGPT:
Using only the carriage return (character 13) as a line separator is uncommon in modern operating systems and computer programs. The carriage return character typically moves the cursor to the beginning of the current line without advancing to the next line. It's often paired with the line feed character (character 10) in ASCII-based systems to represent the end of a line (CR LF).

However, there might be certain specialized or legacy systems where carriage return is used as the sole line separator. For example, some older systems or text formats might use carriage return exclusively. Additionally, in some environments or configurations, custom line endings could be employed for specific purposes.

But in mainstream modern operating systems and programs, the convention is to use a combination of carriage return and line feed (CR LF) or just the line feed character (LF) as the standard line separator. This convention is followed by operating systems like Windows (which traditionally uses CR LF), Unix-like systems (which typically use LF), and many programming languages and text processing tools.

I know ChatGPT hallucinates; but, I believe this to be accurate. I think I should stop worrying about inputs that only use chr(13) to delimit lines of text. If someone is integrating with a legacy system, I think the burden of normalizing line-endings should be deferred to the calling context, not to the manipulation functions.
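
If the calling context does take on that responsibility, normalizing can be a one-liner. Here's a minimal Python sketch. The function name `normalize_line_endings()` is hypothetical and not part of this post's utilities. It converts CRLF and lone CR breaks to LF. Once that's done, even a plain multi-line `$` pattern behaves as expected in Python's `re`:

```python
import re


def normalize_line_endings(text: str) -> str:
    """Convert CRLF and lone CR line-breaks to LF before any trimming."""
    return re.sub(r"\r\n?", "\n", text)


legacy = "Hello   \rWorld   \r"
normalized = normalize_line_endings(legacy)
print(repr(normalized))  # 'Hello   \nWorld   \n'

# With LF-only input, multi-line mode sees every line boundary.
trimmed = re.sub(r"[ \t]+$", "", normalized, flags=re.MULTILINE)
print(repr(trimmed))  # 'Hello\nWorld\n'
```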

Ben Nadel