Skip to main content
Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Using Base32 Encoding And Decoding In ColdFusion

By Ben Nadel on
Tags: ColdFusion

Earlier this week, I looked at converting a binary value into a "bit stream" that would allow for sub-byte reads. I did this because I was trying to figure out how to run Base32 encoding in ColdFusion, which requires splitting binary data up into 5-bit chunks. This was a surprisingly complicated task, which took me 2 days to figure out. But, I wrapped up what I learned into a ColdFusion component called Base32.cfc.

As I was researching Base32 encoding (since I didn't know much about it at all), I did see that there are Apache Commons libraries that implement Base32 encoding. But, those don't seem to be installed in my local version of Java. So, I set about trying to implement [some of] the Base32 standard on my own, in ColdFusion.

I don't think I've implemented all the constraints; and, I didn't even look at Base32Hex (which uses the same encoding process but with a different encoding alphabet). But, I ran it through a series of simple tests and it seems to work.

During the encoding and decoding process, I started out using a Java ByteArrayOutputStream. However, after reading this article, it was suggested that a Java ByteBuffer would have much better performance when converting the buffer into a binary value (which is what I do at the end of each process). The trade-off is one of complexity in that the ByteBuffer has to be pre-allocated, which means that you have to calculate the length of the output before you start the encoding or decoding process.

Internally, the encoding and decoding process uses byte-arrays; but, in order to make the ColdFusion component easier to use, I made methods for both binary and string inputs:

  • encode( string ) :: base32-encoded string.
  • encodeBinary( binary ) :: base32-encoded binary.
  • decode( base32-string ) :: decoded string.
  • decodeBinary( base32-binary ) :: decoded binary.

And the code for Base32.cfc:

<cfscript>
// NOTE: CFScript used for Gist color coding. Can be removed.
component
	output = false
	hint = "I provide encoding and decoding methods for Base32 values."
	{

	/**
	* I server no purpose since the methods on this component are "static".
	*
	* @output false
	*/
	public any function init() {

		return( this );

	}


	// ---
	// STATIC METHODS.
	// ---


	/**
	* I decode the given Base32-encoded string value.
	*
	* @output false
	* @hint The input string is assumed to be utf-8.
	*/
	public string function decode( required string input ) {

		var binaryOutput = decodeBinary( charsetDecode( input, "utf-8" ) );

		return( charsetEncode( binaryOutput, "utf-8" ) );

	}


	/**
	* I decode the given Base32-encoded binary value.
	*
	* @output false
	*/
	public binary function decodeBinary( required binary input ) {

		// I map the encoded bytes onto the original 5-bit partial input bytes.
		var decodingMap = getDecodingMap();

		// I hold the intermediary, decoded bytes.
		var buffer = getAllocatedDecodingBuffer( input );

		// The input maybe be padded with "=" to make sure that the value is evenly
		// divisible by 8 (to make the length of data more predictable). Once we hit
		// this byte (if it exists), we have consumed all of the encoded data.
		var terminatingByte = asc( "=" );

		// I help zero-out the parts of the byte that were not discarded.
		var rightMostBits = [
			inputBaseN( "1", 2 ),
			inputBaseN( "11", 2 ),
			inputBaseN( "111", 2 ),
			inputBaseN( "1111", 2 )
		];

		// As we loop over the encoded bytes, we may have to build up each decoded byte
		// across multiple input bytes. This will help us keep track of how many more
		// bits we need to complete the pending byte.
		var decodedByte = 0;
		var bitsNeeded = 8;

		// Decode each input byte.
		for ( var byte in input ) {

			// If we hit the EOF byte, there's nothing more to process.
			if ( byte == terminatingByte ) {

				break;

			}

			// Get the original 5-bit input that was encoded.
			var partialByte = decodingMap[ byte ];

			// If we need more than 5 bits, we can consume the given value in it's
			// entirety without actually filling the pending bit.
			if ( bitsNeeded > 5 ) {

				// Push the incoming 5-bits onto the end of the pending byte.
				decodedByte = bitOr( bitSHLN( decodedByte, 5 ), partialByte );

				bitsNeeded -= 5;

			// If we need exactly 5 more bits, we can use the given value to complete
			// the pending bit.
			} else if ( bitsNeeded == 5 ) {

				// Push the incoming 5-bits onto the end of the pending byte.
				decodedByte = bitOr( bitSHLN( decodedByte, 5 ), partialByte );

				// At this point, the pending byte is complete.
				buffer.put( decodedByte );

				decodedByte = 0;
				bitsNeeded = 8;

			// If we need between 1 and 4 bits, we have to consume the given value
			// across, two different pending bytes since it won't fit entirely into the
			// currently-pending byte (the leading bits complete the currently-pending
			// byte, then the trailing bits start the next pending byte).
			} else {

				var discardedCount = ( 5 - bitsNeeded );

				// Push only the leading bits onto the end of the pending byte.
				decodedByte = bitOr(
					bitSHLN( decodedByte, bitsNeeded ),
					bitSHRN( partialByte, discardedCount )
				);

				// At this point, the pending byte is complete.
				buffer.put( decodedByte );

				// Start the next pending byte with the trailing bits that we discarded
				// in the last operation.
				decodedByte = bitAnd( partialByte, rightMostBits[ discardedCount ] );

				bitsNeeded = ( 8 - discardedCount );

			}

			// NOTE: We will never need an ELSE case that requiers zero bits to complete
			// the pending byte. Since each case that can result in a completed byte
			// (need 5 bits (1) or less than 5 bits (2)) already push a byte on to the
			// result, we will never complete a byte without pushing it onto the output.

		}

		// Return the result as a binary value.
		return( buffer.array() );

	}


	/**
	* I encode the given string value using Base32 encoding.
	*
	* @output false
	* @hint The input string is assumed to be utf-8.
	*/
	public string function encode( required string input ) {

		var binaryOutput = encodeBinary( charsetDecode( input, "utf-8" ) );

		return( charsetEncode( binaryOutput, "utf-8" ) );

	}


	/**
	* I encode the given binary value using Base32 encoding.
	*
	* @output false
	*/
	public binary function encodeBinary( required binary input ) {

		// I map the 5-bit input chunks to the base32-encoding bytes.
		var encodingMap = getEncodingMap();

		// Base32-encoded strings must be divisible by 8 (so that the length of the data
		// is more predictable). We'll pad the null characters with "=".
		var paddingByte = asc( "=" );

		// I hold the intermediary, encoded bytes.
		var buffer = getAllocatedEncodingBuffer( input );

		// In order to iterate over the input bits more easily, we'll wrap it in a
		// BigInteger instance - this allows us to check individual bits without having
		// to calculate the offset across multiple bytes.
		var inputWrapper = createObject( "java", "java.math.BigInteger" ).init( input );

		// Since BigInteger will not take leading zeros into account, we have to
		// explicitly calculate the number of input bits based on the number of input
		// bytes.
		var bitCount = ( arrayLen( input ) * 8 );

		// Since each encoded chunk uses 5 bits, which may not divide evenly into a
		// set of 8-bit bytes, we need to normalize the input. Let's sanitize the input
		// wrapper to be evenly divisible by 5 (by pushing zeros onto the end). This
		// way, we never have to worry about reading an incomplete chunk of bits from
		// the underlying data.
		if ( bitCount % 5 ) {

			var missingBitCount = ( 5 - ( bitCount % 5 ) );

			inputWrapper = inputWrapper.shiftLeft( javaCast( "int", missingBitCount ) );

			bitCount += missingBitCount;

		}

		// Now that we have know that our input bit count is evenly divisible by 5,
		// we can loop over the input in increments of 5 to read one decoded partial
		// byte at a time.
		// --
		// NOTE: We are starting a bitCount-1 since the bits are zero-indexed.
		for ( var chunkOffset = ( bitCount - 1 ) ; chunkOffset > 0 ; chunkOffset -= 5 ) {

			var partialByte = 0;

			// Read 5 bits into the partial byte, starting at the chunk offset.
			for ( var i = chunkOffset ; i > ( chunkOffset - 5 ) ; i-- ) {

				// Implicit read of "0" bit into the right-most bit of the partial byte.
				partialByte = bitSHLN( partialByte, 1 );

				// If the underlying input bit is "1", update the partial byte.
				if ( inputWrapper.testBit( javaCast( "int", i ) ) ) {

					partialByte = bitOr( partialByte, 1 );

				}

			}

			// At this point, the partial byte value is a number that refers to the
			// index of the encoded value in the base32 character-set. Push the mapped
			// characterByte onto the buffer.
			buffer.put( encodingMap[ partialByte ] );

		}

		// If the number of chunks isn't divisible by 8, we need to pad the result.
		while ( buffer.remaining() ) {

			buffer.put( paddingByte );

		}

		// Return the result as a binary value.
		return( buffer.array() );

	}


	// ---
	// PRIVATE METHODS.
	// ---


	/**
	* I provide a pre-allocated ByteBuffer that will store the output string value
	* during the decoding process. This takes into account the possible padding of the
	* input and will discard padding characters during buffer allocation.
	*
	* @output false
	*/
	private any function getAllocatedDecodingBuffer( required binary input ) {

		var paddingByte = asc( "=" );

		var inputLength = arrayLen( input );

		// When allocating the output buffer, we don't want to take the padding
		// characters into account. Decrement the input length until we hit our first
		// trailing non-padding character.
		while ( input[ inputLength ] == paddingByte ) {

			inputLength--;

		}

		// Since each encoded byte reprsents only 5-bits of the encoded input, we know
		// that we know that the total number of meaningful input bits is *5. But then,
		// that has to represent full, 8-bit bytes.
		var totalOutputBytes = fix( inputLength * 5 / 8 );

		return(
			createObject( "java", "java.nio.ByteBuffer" ).allocate(
				javaCast( "int", totalOutputBytes )
			)
		);

	}


	/**
	* I provide a pre-allocated ByteBuffer that will store the base32 value during the
	* encoding process. This takes into account the possible need to pad the final output
	* and will be allocated to the exact length needed for encoding.
	*
	* @output false
	*/
	private any function getAllocatedEncodingBuffer( required binary input ) {

		// Each 5-bits of the input bytes will represent a byte in the encoded value. As
		// such, we know that the output will required the total number of bits (n * 8)
		// divided by 5-bits (for each output byte).
		var totalOutputBytes = ceiling( arrayLen( input ) * 8 / 5 );

		// The length of the output has to be evenly divisible by 8; as such, if it is
		// not, we have to account of the trailing padding ("=") characters.
		if ( totalOutputBytes % 8 ) {

			totalOutputBytes += ( 8 - ( totalOutputBytes % 8 ) );

		}

		return(
			createObject( "java", "java.nio.ByteBuffer" ).allocate(
				javaCast( "int", totalOutputBytes )
			)
		);

	}


	/**
	* I return an array of character-bytes used to represent the set of possible Base32
	* encoding values.
	*
	* @output false
	* @hint This returns a Java array, not a ColdFusion array.
	*/
	private array function getBase32Bytes() {

		return( javaCast( "string", "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" ).getBytes() );

	}


	/**
	* I return a struct that maps Base32 input characters to the bytes that are
	* used in the encoding process.
	*
	* Example: map[ encodedByte ] => byte.
	*
	* @output false
	*/
	private struct function getDecodingMap() {

		var byteMap = {};

		var bytes = getBase32Bytes();
		var byteLength = arrayLen( bytes );

		for ( var i = 1 ; i <= byteLength ; i++ ) {

			byteMap[ bytes[ i ] ] = ( i - 1 );

		}

		return( byteMap );
	}


	/**
	* I return a struct that maps input bytes to the characters that are used
	* to encode in Base32.
	*
	* Example: map[ byte ] => encodedByte.
	*
	* @output false
	*/
	private struct function getEncodingMap() {

		var byteMap = {};

		var bytes = getBase32Bytes();
		var byteLength = arrayLen( bytes );

		for ( var i = 1 ; i <= byteLength ; i++ ) {

			byteMap[ i - 1 ] = bytes[ i ];

		}

		return( byteMap );

	}

}
</cfscript>

I used an online Base32 encoding demo to build up a suite of tests. Then, I ran the tests through my Base32.cfc component to test both the encoding and decoding process:

<cfscript>

	// Set up our known Base32-encoded values.
	tests = {
		"C" = "IM======",
		"Co" = "INXQ====",
		"Com" = "INXW2===",
		"Come" = "INXW2ZI=",
		"Come " = "INXW2ZJA",
		"Come w" = "INXW2ZJAO4======",
		"Come wi" = "INXW2ZJAO5UQ====",
		"Come wit" = "INXW2ZJAO5UXI===",
		"Come with" = "INXW2ZJAO5UXI2A=",
		"Come with " = "INXW2ZJAO5UXI2BA",
		"Come with m" = "INXW2ZJAO5UXI2BANU======",
		"Come with me" = "INXW2ZJAO5UXI2BANVSQ====",
		"Come with me " = "INXW2ZJAO5UXI2BANVSSA===",
		"Come with me i" = "INXW2ZJAO5UXI2BANVSSA2I=",
		"Come with me if" = "INXW2ZJAO5UXI2BANVSSA2LG",
		"Come with me if " = "INXW2ZJAO5UXI2BANVSSA2LGEA======",
		"Come with me if y" = "INXW2ZJAO5UXI2BANVSSA2LGEB4Q====",
		"Come with me if yo" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W6===",
		"Come with me if you" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65I=",
		"Come with me if you " = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JA",
		"Come with me if you w" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO4======",
		"Come with me if you wa" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QQ====",
		"Come with me if you wan" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW4===",
		"Come with me if you want" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45A=",
		"Come with me if you want " = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BA",
		"Come with me if you want t" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAOQ======",
		"Come with me if you want to" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXQ====",
		"Come with me if you want to " = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA===",
		"Come with me if you want to l" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3A=",
		"Come with me if you want to li" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3DJ",
		"Come with me if you want to liv" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3DJOY======",
		"Come with me if you want to live" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3DJOZSQ====",
		"Come with me if you want to live." = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3DJOZSS4==="
	};

	// Instantiate our base32 encoder - the encode/decode methods are static.
	base32 = new Base32();

	testInputs = structKeyArray( tests );

	arraySort( testInputs, "text", "asc" );

	for ( input in testInputs ) {

		// Encode the value.
		encodedInput = base32.encode( input );

		// Decode the encoded-value.
		decodedOutput = base32.decode( encodedInput );

		// Check to see if the process worked in both directions.
		encodingPassed = ( encodedInput == tests[ input ] );
		decodingPassed = ( input == decodedOutput );

		// Output test results for this test.
		writeOutput( ( encodingPassed && decodingPassed ) ? "[PASS]" : "[FAIL]" );
		writeOutput( " #input# >> #encodedInput# >> #decodedOutput# <br />" );

	}

</cfscript>

All the tests pass! And, I used a awesome Terminator 2 quote, so that's pretty much a win for everyone. Such a great movie.

If this post did nothing else, it sure did teach me a lot about bit-wise operations. And, just as the "bit stream" post was a stepping-stone to Base32 encoding, so is this post a stepping-stone for something else. More to come on that later.



Reader Comments

@All,

Caution, I don't think this actually works for ASCII characters over 127. I didn't test that case; but, it looks like that has to use multi-byte encoding, which this component does NOT decode properly.

I'm working on a fix, trying to figure out what is going wrong where.