Converting Between String And Binary Values In ColdFusion
For years, I have been using the toString(), toBase64(), and toBinary() functions to convert values back and forth between String and Binary formats in ColdFusion. According to the documentation, however, these methods are mostly deprecated. A few people have told me to use the charsetEncode() and charsetDecode() functions instead. And, for years, I ignored this advice. But yesterday, at the behest of Grumpy CFer, I finally decided to look into it.
Before we look at charsetEncode() and charsetDecode(), let's quickly take a look at how I currently convert string values to binary, and vice-versa:
<cfscript>
// These two functions represent the "deprecated" approach to
// converting string values back and forth between binary
// (according to the ColdFusion documentation).
function stringToBinary( String stringValue ){
	var base64Value = toBase64( stringValue );
	var binaryValue = toBinary( base64Value );
	return( binaryValue );
}
function binaryToString( Any binaryValue ){
	var stringValue = toString( binaryValue );
	return( stringValue );
}
// ------------------------------------------------------ //
// ------------------------------------------------------ //
// ------------------------------------------------------ //
// Create a message to convert.
message = "Donkeys!";
// Try a string-to-binary-to-string conversion.
messageAsBinary = stringToBinary( message );
messageAsString = binaryToString( messageAsBinary );
writeOutput( "Original: " & message );
writeOutput( "<br />" );
writeOutput( "Binary: " & "[B" & arrayLen( messageAsBinary ) );
writeOutput( "<br />" );
writeOutput( "Converted: " & messageAsString );
</cfscript>
As you can see, I use the toString(), toBase64(), and toBinary() functions. And, when I run the above code, I get the following output:
Original: Donkeys!
Binary: [B8
Converted: Donkeys!
Excellent! And, this has been working for years.
Ok, now let's take a look at the charsetEncode() and charsetDecode() functions. The naming is a little counterintuitive: charsetEncode() converts a Binary value into a String, and charsetDecode() converts a String into a Binary value. Below is the same code as above, only using these newer, recommended functions:
<cfscript>
// These two functions represent the "recommended" way of
// converting string values back and forth from binary (according
// to the ColdFusion documentation since ColdFusion MX 7).
function stringToBinary( String stringValue ){
	var binaryValue = charsetDecode( stringValue, "utf-8" );
	return( binaryValue );
}
function binaryToString( Any binaryValue ){
	var stringValue = charsetEncode( binaryValue, "utf-8" );
	return( stringValue );
}
// ------------------------------------------------------ //
// ------------------------------------------------------ //
// ------------------------------------------------------ //
// Create a message to convert.
message = "Donkeys!";
// Try a string-to-binary-to-string conversion.
messageAsBinary = stringToBinary( message );
messageAsString = binaryToString( messageAsBinary );
writeOutput( "Original: " & message );
writeOutput( "<br />" );
writeOutput( "Binary: " & "[B" & arrayLen( messageAsBinary ) );
writeOutput( "<br />" );
writeOutput( "Converted: " & messageAsString );
</cfscript>
When we run this code, we get the same exact output:
Original: Donkeys!
Binary: [B8
Converted: Donkeys!
If Adobe is recommending these functions, I'll start using them as I assume they are recommended for a performance reason. However, the older functions appear to be more straightforward. Sure, I had to use an intermediary toBase64() call; but, the old methods read very well and remove the noise of having to choose a character encoding (does anyone not use UTF-8?).
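To get a feel for what the encoding argument actually changes, here is a small sketch (my own illustration, not something taken from the documentation). It decodes the same string under a few encodings and compares the byte counts. The string is built with chr( 233 ) so that the encoding of the source file itself does not enter into it, and the list of encoding names is an assumption about what the server accepts.
<cfscript>
// Build a string containing one non-ASCII character (e-acute).
// Using chr() keeps the source file itself pure ASCII.
message = ( "Caf" & chr( 233 ) );
// Encoding names assumed to be supported by the server.
encodings = [ "us-ascii", "iso-8859-1", "utf-8", "utf-16" ];
for ( i = 1 ; i <= arrayLen( encodings ) ; i++ ){
	// String-to-binary, then binary-to-string, with the SAME encoding.
	bytes = charsetDecode( message, encodings[ i ] );
	roundTrip = charsetEncode( bytes, encodings[ i ] );
	// compare() is case-sensitive and returns 0 when the strings match.
	matches = ( compare( roundTrip, message ) == 0 );
	writeOutput( encodings[ i ] & ": " & arrayLen( bytes ) & " bytes, " );
	writeOutput( "round-trip " & ( matches ? "matches" : "differs" ) );
	writeOutput( "<br />" );
}
</cfscript>
On a Java-backed engine, I would expect UTF-8 to need 5 bytes for these 4 characters and ISO-8859-1 to need 4, with US-ASCII unable to round-trip the accented character at all. So, the byte array you get really does depend on the encoding you pick, even if "Donkeys!" happens to look the same in most of them.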
Reader Comments
Adobe might recommend using charsetDecode/charsetEncode especially because of the character encoding. This is only speculation of course.
@Guillaume,
I assume so as well; but, I wonder what the charset was assumed to be in the earlier versions, where it could not be defined?
Not using UTF-8 is very common. Notably, Java uses UTF-16 for its internal string representation. UTF-8 has only become popular (relatively) recently, due to the fact that it's partly backwards compatible with ASCII.
I wouldn't be terribly surprised if the default for toString was ASCII (an easy way to test would be to pass a Unicode symbol through and see if it gets mangled). It might also do some checking before falling back on ASCII: UTF-16 is easy to recognize because it typically starts with a special symbol called the byte-order mark (BOM). UTF-8 isn't as easy, but some byte values are disallowed in it, so you can rule it out some of the time.
If you haven't read it already, I recommend Joel Spolsky's blog post about what every developer should know about Unicode/Character encodings (here http://www.joelonsoftware.com/articles/Unicode.html). It explains enough of the details about it (including why specifying them is necessary) to make sense, without going overboard.
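The quick test suggested above might look something like the following sketch. It pushes a single non-ASCII character through the older toBase64() / toBinary() / toString() path and reports what comes out the other side. The variable names are made up for illustration, and the result will depend on the server's default encoding, so no particular output is assumed here.
<cfscript>
// "naive" with a diaeresis on the i, built with chr() so that the
// source file's own encoding does not affect the test.
original = ( "na" & chr( 239 ) & "ve" );
// Legacy path: no character encoding is specified anywhere.
legacyBinary = toBinary( toBase64( original ) );
legacyString = toString( legacyBinary );
// Explicit path, for comparison.
utf8Binary = charsetDecode( original, "utf-8" );
writeOutput( "Legacy byte count: " & arrayLen( legacyBinary ) );
writeOutput( "<br />" );
writeOutput( "UTF-8 byte count: " & arrayLen( utf8Binary ) );
writeOutput( "<br />" );
// compare() returns 0 when the two strings are identical.
writeOutput( "Legacy round-trip intact: " & ( compare( original, legacyString ) == 0 ) );
</cfscript>
If the legacy byte count matches the UTF-8 byte count, the default is probably UTF-8. A count of 5 (one byte per character) with an intact round-trip would point to a single-byte encoding such as ISO-8859-1, and a count of 5 with a mangled round-trip would point to ASCII.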
This is especially useful knowledge. I'm thinking of the struggles with strings to binary, particularly when dealing with DB2 and EBCDIC.
Hi Ben,
In your ColdFusion apps, when do you find yourself converting from string to binary and vice versa? I have seen it used for representing an XML file as a string and attaching files in an email. Good post (as always), but I feel like I could use more knowledge about potential applications for this in ColdFusion.
@Thom,
In ColdFusion, strings *are* Java strings, and they seem to be convertible based on a UTF-8 encoding assumption. When you refer to Java's internal string representation, are you referring to the strings we use? Or something done behind the scenes?
I'll try to do some more experimentation and I'll read that Spolsky article, thanks!
@Brian,
Awesome - glad this will help!
@Noah,
No problem - I typically use string-to-binary conversion when streaming values back to the client with the CFContent tag:
By using the "variable" attribute, the response closes with the value being streamed. Basically, this resets the output buffer and makes sure that ONLY the given response value (the string being encoded) shows up in the response stream.
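The CFContent pattern described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from any real application: the responseBody value and the text/plain content type are assumptions chosen for the example.
<cfscript>
// Hypothetical response body; any string value would do.
responseBody = "Donkeys!";
// The "variable" attribute of CFContent expects a binary value,
// so convert the string using an explicit character encoding.
responseBinary = charsetDecode( responseBody, "utf-8" );
</cfscript>
<cfcontent
	type="text/plain; charset=utf-8"
	variable="#responseBinary#"
	/>
Declaring the same charset in the content type as the one passed to charsetDecode() keeps the bytes on the wire and the header the client reads in agreement.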
Today I had a case where toString worked and charsetEncode didn't, which I thought I'd share here as this thread popped up when researching :)
Locally I didn't need either of these, but the MySQL version on the live server is a few notches behind my testing server, and for some reason I was getting the error "ByteArray objects cannot be converted to strings."
The value in question was a timestamp formatted via lsDateFormat, but timestamps in general weren't kicking back the error. It was only timestamp values returned by a line like the following in the SQL query that generated the error: "MAX(IF(table.id = 2, tbl.timestamp, NULL)) date2". Obviously this isn't common, but annoying nonetheless :) The issue is either down to the driver or the actual db version, but I'd guess the db version given it works locally.
Anyway, converting date2 with toString like so fixed this: LSdateformat(toString(date2),"yyyy-mm-dd"). Doing a conversion with charsetEncode just kicked back errors on NULL values, meaning another cfif that wasn't necessary using toString.
Long story short, I kind of preferred toString in this case ;)
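For anyone who would rather keep charsetEncode() in a situation like this, one option is a small guard around it. The sketch below is only an illustration: valueToString() is a hypothetical helper name, it assumes the byte arrays coming back from the driver are UTF-8 encoded, and it assumes a NULL arrives as an empty string rather than as a byte array.
<cfscript>
// Hypothetical helper: only call charsetEncode() when the value
// really is binary; fall back to toString() for everything else
// (including the empty strings that stand in for NULL values).
function valueToString( Any value ){
	if ( isBinary( value ) ){
		// ASSUMPTION: the bytes are UTF-8 encoded text.
		return( charsetEncode( value, "utf-8" ) );
	}
	return( toString( value ) );
}
</cfscript>
With that in place, the call would become something like lsDateFormat( valueToString( date2 ), "yyyy-mm-dd" ), without the extra cfif around it.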