Ben Nadel at RIA Unleashed (Nov. 2009) with: Kimberly Morrow

Always Include Charset With fileRead() In ColdFusion

By Ben Nadel
Comments (2)

Lately, I've been having trouble with Russian spam being posted to this blog. You all don't see it because it goes through comment moderation first; but, it really shouldn't even be getting that far. It should be getting blocked by my automatic content evaluation before it ever reaches the moderation step. This morning, I finally started digging through my code to find the logic gap and realized that the culprit was a fileRead() call. Unfortunately, this code pre-dates my understanding of character encoding, and was missing the charset argument.

When you post a comment to this blog, before anything else significant happens, I run the comment through a whole lot of Regular Expression (RegEx) pattern matching. This barrage of patterns has been built up over time in response to the spam that I see posted. I maintain these patterns in a .txt file in which each line represents an individual RegEx pattern source.

For example, a portion of this file looks like this:

viagra|cialis|sildenafil|tadalafil
printer.?repair
laptop.?battery
ugg.?(boot|shoe)
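To make the matching step concrete, here is a minimal sketch in Java. ColdFusion runs on the JVM, and the loader code below hands each line to java.util.regex.Pattern. The class and method names (SpamPatterns, compile, firstMatch) are hypothetical, and the patterns are hard-coded rather than read from the .txt file. The sketch assumes a comment is flagged as soon as any single pattern is found (via Matcher.find()) anywhere in its text.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper mirroring the "one RegEx source per line" approach with the JDK's Pattern class.
class SpamPatterns {

    // Compile each pattern source with the embedded "(?i)" (case-insensitive) flag, as the ColdFusion code does.
    static List<Pattern> compile(List<String> sources) {
        return sources.stream()
            .map(source -> Pattern.compile("(?i)" + source))
            .collect(Collectors.toList());
    }

    // Return the first pattern that occurs anywhere in the comment text, or an empty Optional.
    static Optional<Pattern> firstMatch(List<Pattern> patterns, String comment) {
        return patterns.stream()
            .filter(pattern -> pattern.matcher(comment).find())
            .findFirst();
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compile(List.of(
            "viagra|cialis|sildenafil|tadalafil",
            "printer.?repair",
            "laptop.?battery",
            "ugg.?(boot|shoe)"
        ));
        System.out.println(firstMatch(patterns, "Cheap UGG Boots for sale!").isPresent()); // true
        System.out.println(firstMatch(patterns, "Great post about charsets.").isPresent()); // false
    }
}
```

The sketch uses find() instead of matches() so that a pattern only has to appear somewhere in the comment. It does not have to match the entire comment.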

As I've started to get Russian spam, I've been adding Russian-based patterns to this text file. And, those patterns have been working fine in my local development environment, which is a *nix-based Docker container. But, once I deployed these patterns to production (a Windows-based VPS), they stopped working.

Here's a snippet of my code that is loading the patterns from the .txt file during ColdFusion application initialization:

component {

	// ... truncated code ...

	private array function loadAndCompilePatterns( required string filepath ) {

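		// Read the whole file, split it on CR and/or LF (empty lines are skipped),
		// and compile each remaining line as a case-insensitive ("(?i)") Java Pattern.
		var patterns = fileRead( filepath )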
			.listToArray( chr( 13 ) & chr( 10 ) )
			.map(
				( patternText ) => {

					var pattern = createObject( "java", "java.util.regex.Pattern" )
						.compile( "(?i)#patternText#" )
					;

					return( pattern );

				}
			)
		;

		return( patterns );

	}

}

Notice that I have no charset included with my fileRead() invocation:

fileRead( filepath )

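Without a charset, fileRead() has to fall back on some default encoding, and that default evidently differs between my two environments. In my Docker container, the Russian characters worked fine. But, once this ColdFusion code made its way to the Windows server, it seems that the Russian characters in my pattern file weren't being decoded properly; and, the compiled patterns were no longer catching the Russian spam.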
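The following Java sketch simulates that failure at the JVM level. It treats a UTF-8 encoded pattern as raw file bytes and decodes them twice: once as UTF-8, and once as windows-1252. That legacy Windows code page is only an assumed stand-in for a non-UTF-8 default encoding. The post does not say which encoding the production server actually defaulted to. It then compiles each decoded result with "(?i)" and tests it against a correctly decoded comment. The class name and the sample word are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Illustrative only: decodes the same pattern-file bytes with two different charsets.
class CharsetMismatchDemo {

    // The Russian word for "casino", written with Unicode escapes so this source file stays ASCII-only.
    static final String WORD = "\u043A\u0430\u0437\u0438\u043D\u043E";

    // Decode the pattern file's bytes with the given charset, compile the result with "(?i)"
    // (like the ColdFusion code), and test it against an already-decoded comment.
    static boolean patternMatches(byte[] patternFileBytes, Charset charset, String comment) {
        String patternSource = new String(patternFileBytes, charset);
        return Pattern.compile("(?i)" + patternSource).matcher(comment).find();
    }

    public static void main(String[] args) {
        // Pretend these are the raw bytes of a UTF-8 encoded pattern file.
        byte[] fileBytes = WORD.getBytes(StandardCharsets.UTF_8);
        String comment = "Best " + WORD + " online";

        System.out.println(patternMatches(fileBytes, StandardCharsets.UTF_8, comment)); // true
        System.out.println(patternMatches(fileBytes, Charset.forName("windows-1252"), comment)); // false
    }
}
```

Nothing throws in the mismatched case. Each two-byte Cyrillic letter simply decodes into two unrelated Latin-1-range characters. The resulting pattern compiles without complaint and then never matches anything.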

To fix this, I just included utf-8 in the fileRead() call:

fileRead( filepath, "utf-8" )

With this update, my ColdFusion code - on the Windows server - was able to read in the Russian characters properly, compile the Java Pattern objects, and is now successfully blocking Russian spam before it even gets to the comment moderation step.
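For comparison, here is roughly what the fixed loader could look like in plain Java, assuming the pattern file is saved as UTF-8. Passing StandardCharsets.UTF_8 to Files.readAllLines() plays the same role as the "utf-8" argument to fileRead(). PatternFileLoader is a hypothetical name. The sketch makes one deliberate change: it compiles with "(?iu)" instead of "(?i)". In Java, the CASE_INSENSITIVE flag on its own only folds case for US-ASCII characters. A lower-case Cyrillic pattern would therefore miss upper-case Cyrillic text unless UNICODE_CASE ("(?u)") is also enabled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical Java counterpart of loadAndCompilePatterns(), with the charset made explicit.
class PatternFileLoader {

    static List<Pattern> loadAndCompilePatterns(Path filepath) throws IOException {
        // Decode the file as UTF-8 regardless of the platform's default charset.
        // readAllLines() already splits on LF, CRLF, or CR.
        return Files.readAllLines(filepath, StandardCharsets.UTF_8).stream()
            // Skip empty lines, as listToArray() does by default.
            .filter(line -> !line.isEmpty())
            // (?i) = case-insensitive; (?u) = Unicode-aware case folding (needed for Cyrillic).
            .map(line -> Pattern.compile("(?iu)" + line))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 pattern file to a temporary location, then load it back.
        Path file = Files.createTempFile("patterns", ".txt");
        Files.write(file, List.of("printer.?repair", "\u043A\u0430\u0437\u0438\u043D\u043E"), StandardCharsets.UTF_8);

        List<Pattern> patterns = loadAndCompilePatterns(file);
        System.out.println(patterns.size()); // 2
        // An all-caps Cyrillic comment still matches the lower-case pattern thanks to (?iu).
        System.out.println(patterns.get(1).matcher("\u041A\u0410\u0417\u0418\u041D\u041E").find()); // true

        Files.delete(file);
    }
}
```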

Long story short - always include the charset argument when you are performing a fileRead() operation in ColdFusion. In fact, any time you are reading or writing text data, you should include the charset.
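The same rule applies on the write side. Below is a minimal Java sketch of a round trip, assuming UTF-8 is the encoding you want on disk. It uses Files.writeString() and Files.readString() (Java 11+), which both take an explicit Charset. Unlike new String(bytes, charset), which silently substitutes replacement characters, these methods report an IOException when text cannot be encoded or when bytes are malformed for the charset. ExplicitCharsetRoundTrip and roundTrip() are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes and reads text with the same, explicitly named charset.
class ExplicitCharsetRoundTrip {

    static String roundTrip(Path file, String text, Charset charset) throws IOException {
        Files.writeString(file, text, charset); // encode with the explicit charset
        return Files.readString(file, charset); // decode with the same charset
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("roundtrip", ".txt");
        String text = "\u043A\u0430\u0437\u0438\u043D\u043E"; // the Russian word for "casino"

        System.out.println(roundTrip(file, text, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(Files.size(file)); // 12 (six Cyrillic letters, two UTF-8 bytes each)

        Files.delete(file);
    }
}
```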


Reader Comments

Tyler

Thank you for this! I had some Japanese encoding go weird but everything worked fine locally. I had to apply this to fileWrite() as well. Figured it had something to do with the Windows server.

Ben Nadel

@Tyler,

My pleasure! It's one of those super subtle bugs because it doesn't "break", per se, it just doesn't work 😆


I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel