Inverting Your Thinking About List Parsing In ColdFusion
One of the many joyful features in ColdFusion is the fact that lists of comma-delimited values receive a first-class treatment in the language. That is, we have many constructs for creating, parsing, and mutating lists despite the fact that they are just string values. For the most part, thinking about lists in terms of their delimiters is straightforward. After all, the vast majority of lists are just comma separated values. But, some lists can be more intricate. And, in those cases, it can help to invert your thinking about lists by parsing out the items rather than the delimiters.
In ColdFusion, strings are inherently multi-line. Which means, a string can include line-breaks without the ceremony of escape sequences. This feature can be used to define lists that are human-friendly. Consider the following ColdFusion code in which we need to parse a list that includes tabs, spaces, and line-breaks in addition to the standard comma delimiter:
<cfscript> | |
fieldList = " | |
id , | |
name , | |
code , | |
createdAt | |
"; | |
// Treat the input as a delimited list with several delimiters. In ColdFusion, the | |
// default behavior is to collapse several sequential delimiters down into a single | |
// delimiter. As such, the comma followed by several white-space characters (newline, | |
// tab, etc) will all become one delimiter (for all intents and purposes). | |
fields = fieldList.listToArray( ",#chr( 9 )##chr( 10 )##chr( 13 )##chr( 32 )#" ); | |
dump( label = "re: List Delimiters", var = fields ); | |
</cfscript> |
We can still parse our fieldList
string variable as a delimited list, no problem; but, we need to include both the comma and all of the whitespace characters when parsing the list. By default, ColdFusion will just collapse a series of delimiters down into a single delimiter so that we don't end up with a bunch of zero-length list-items:

That worked perfectly. But, defining all those delimiters is a bit of a pain. In this case, it would be easier to think about the list in terms of the items rather than the delimiters. Instead of worrying about all the various whitespace characters, we can think of the important parts as a sequence of "word characters". Then, we can use simple regular expression patterns to extract the "items" using reMatch()
:
<cfscript> | |
fieldList = " | |
item_id , | |
item_name , | |
item_code , | |
item_createdAt | |
"; | |
// Instead of thinking about the list in terms of its delimiters, we can invert our | |
// thinking and consider the list in terms of the items. For lists with a variety of | |
// delimiters, this can make the parsing much easier (for humans) to read. In this | |
// case, we'll use regular expression matching to extract all "word characters". | |
fields = fieldList.reMatch( "\w+" ); | |
// We could have also used the following: | |
_fields = fieldList.reMatch( "[^,\s]+" ); // NOT comma or whitespace. | |
_fields = fieldList.reMatchNoCase( "[a-z_]+" ); | |
dump( label = "re: List Items", var = fields ); | |
</cfscript> |
By inverting our thinking, and moving from .listToArray()
to .reMatch()
, we still end up with the same output:

But, we greatly simplified our parsing. Let's look at the two approaches side by side:
<cfscript> | |
// Delimiter-oriented thinking. | |
fieldList.listToArray( ",#chr( 9 )##chr( 10 )##chr( 13 )##chr( 32 )#" ); | |
// Item-oriented thinking. | |
fieldList.reMatch( "\w+" ); | |
</cfscript> |
To be clear, I'm not saying that inverting your consideration of lists is always warranted. In fact, it may not be warranted in most cases (like we said above, the vast majority of lists are just comma-separated items). But, it's helpful to know that you can invert your thinking once your set of delimiters passes a tipping point of complexity.
Want to use code from this post? Check out the license.
Reader Comments
Post A Comment — ❤️ I'd Love To Hear From You! ❤️