Skip to main content
Ben Nadel at cf.Objective() 2012 (Minneapolis, MN) with: Shawn Slaughter
Ben Nadel at cf.Objective() 2012 (Minneapolis, MN) with: Shawn Slaughter

Converting Numbers To A Character-Set Based Radix Using ColdFusion

By on
Tags:

Earlier this morning on Twitter, someone asked about why "tiny url" sites used both numbers and letters in their short URLs. I responded that using both numbers and letters gives a much greater set of possible values as there are more possibilities for each digit. After this micro conversation, I thought it would be fun to try and create a tiny URL service as I don't really know what the best approach would be.

When I sat down and started to think about it, I realized that it would be hugely beneficial to be able to create a tiny URL based on a numeric index in a database. Furthermore, when it comes to looking up a tiny URL, it would be awesome to be able to convert that tiny URL back into its original numeric value. As I was contemplating this, I realized that what I might be doing is converting a base-10 number to a number of a different base (defined by our alpha/numeric character set).

ColdFusion has some great utility methods for base conversions - FormatBaseN() and InputBaseN(). But, those methods only go up to a 36-radix (26 characters + 10 numbers = 36 characters). The reason for this is that its base conversions are case-insensitive, such that "A" is the same as "a". Unfortunately, as I wanted to use both upper and lower case characters as different characters for a greater variety, I couldn't make use of ColdFusion's built-in conversion methods.

So then, I started to think about other characters that might be used. In the past, I have built tiny URLs that have "-" characters and "_" characters. How to I use that in a base conversion? That's when it dawned on me! The radix that I am using is not defined by a standard set of data; rather, my radix is defined by an arbitrary character set. And so, I set about creating two ColdFusion user defined methods that would convert a value based on a passed-in character array in which the index of each character would represent the decimal value of that particular character.

To align more naturally with ColdFusion's built-in methods, I named them similarly:

  • FormatBaseNData( decimalValue, characterArray )
  • InputBaseNData( formattedValue, characterArray )

Here is the code for these ColdFusion user defined functions:

<cffunction
	name="FormatBaseNData"
	access="public"
	returntype="string"
	output="false"
	hint="I take a demical value and then convert it to the number set defined by the passed in array.">

	<!--- Define arguments. --->
	<cfargument
		name="Value"
		type="numeric"
		required="true"
		hint="I am the numberic value that is being converted to a new radix."
		/>

	<cfargument
		name="CharacterSet"
		type="array"
		required="true"
		hint="I am the character set used in the conversion. Each character represents the number at it's array index."
		/>

	<!--- Define the local scope. --->
	<cfset var LOCAL = {} />

	<!--- Create the base string to be returned. --->
	<cfset LOCAL.EncodedValue = "" />

	<!---
		Get the length of our array. This will be our radix for
		the conversion. NOTE: Because ColdFusion arrays start at
		1, not zero, we will have to some manual offsetting when
		we perform the conversion.
	--->
	<cfset LOCAL.Radix = ArrayLen( ARGUMENTS.CharacterSet ) />

	<!---
		Get a local copy of our value as we will be updating it
		as we divide into it.
	--->
	<cfset LOCAL.Value = ARGUMENTS.Value />

	<!---
		When converting to a new radix, we need to keep dividing
		the value passed in until we hit zero (which will never
		have a remainder). However, because we always want to
		perform at least ONE division, we will break from within
		the loop if it hits zero rather than check for that
		loop conditional.
	--->
	<cfloop condition="true">

		<!--- Get the division result. --->
		<cfset LOCAL.Result = Fix( LOCAL.Value / LOCAL.Radix ) />

		<!--- Get the remainder of radix division. --->
		<cfset LOCAL.Remainder = (LOCAL.Value MOD LOCAL.Radix) />

		<!---
			Take the remainder and prepend the Radix-converted
			string to the encoded value. Remember, since we are
			using arrays that start at 1, we need to add one to
			this value.
		--->
		<cfset LOCAL.EncodedValue = (
			ARGUMENTS.CharacterSet[ LOCAL.Remainder + 1 ] &
			LOCAL.EncodedValue
			) />

		<!---
			Now that we have gotten the current, store the result
			value back into the value.
		--->
		<cfset LOCAL.Value = LOCAL.Result />

		<!---
			Check to see if we have any more value to divide into.
			Once we hit zero, we are out of possible remainders.
		--->
		<cfif NOT LOCAL.Value>
			<cfbreak />
		</cfif>

	</cfloop>

	<!--- Return the encoded value. --->
	<cfreturn LOCAL.EncodedValue />
</cffunction>


<cffunction
	name="InputBaseNData"
	access="public"
	returntype="string"
	output="false"
	hint="I take an encoded value and convert it back to demical based on the passed in character array.">

	<!--- Define arguments. --->
	<cfargument
		name="Value"
		type="string"
		required="true"
		hint="I am the encode value that is being converted back to a base 10 number."
		/>

	<cfargument
		name="CharacterSet"
		type="array"
		required="true"
		hint="I am the character set used in the conversion. Each character represents the number at it's array index."
		/>

	<!--- Define the local scope. --->
	<cfset var LOCAL = {} />

	<!--- Create the base number to be returned. --->
	<cfset LOCAL.DecodedValue = 0 />

	<!---
		Get the length of our array. This will be our radix for
		the conversion. NOTE: Because ColdFusion arrays start at
		1, not zero, we will have to some manual offsetting when
		we perform the conversion.
	--->
	<cfset LOCAL.Radix = ArrayLen( ARGUMENTS.CharacterSet ) />

	<!---
		Convert our character set to a list so that we can easily
		get the numeric value of our encoded digit.
	--->
	<cfset LOCAL.CharacterList = ArrayToList( ARGUMENTS.CharacterSet ) />

	<!---
		Reverse the string that was passed in. We are doing this
		because the right-most value is actually the smallest
		place and it will be easier for us to deal with in reverse.
	--->
	<cfset LOCAL.Value = Reverse( ARGUMENTS.Value ) />

	<!---
		Now, break the value up into an array so that we can more
		easily iterate over it.
	--->
	<cfset LOCAL.ValueArray = ListToArray(
		REReplace(
			LOCAL.Value,
			"(.)",
			"\1,",
			"all"
			)
		) />

	<!---
		Iterate over the array and convert each value to a power
		of our character set defined radix.
	--->
	<cfloop
		index="LOCAL.Index"
		from="1"
		to="#ArrayLen( LOCAL.ValueArray )#"
		step="1">

		<!---
			Convert the current digit and add it to the going sum
			of our conversion.
		--->
		<cfset LOCAL.DecodedValue += (
			(ListFind( LOCAL.CharacterList, LOCAL.ValueArray[ LOCAL.Index ] ) - 1) *
			(LOCAL.Radix ^ (LOCAL.Index - 1))
			) />

	</cfloop>

	<!--- Return the decoded value. --->
	<cfreturn LOCAL.DecodedValue />
</cffunction>

To test this out, I used both a well known character set (Hexadecimal) and an arbitrary character set. For each test, I converted to the new characters-set-based radix and then back to the decimal value.

<!--- Get the HEX character set. --->
<cfset arrCharacters = ListToArray(
	"0 1 2 3 4 5 6 7 8 9 A B C D E F",
	" "
	) />

<!--- Test conversions. --->
255 => #FormatBaseNData( 255, arrCharacters )#<br />
FF => #InputBaseNData( "FF", arrCharacters )#<br />


<br />


<!---
	Get our collection of available characters that can be used
	when creating character-sensitive tiny URLs.
--->
<cfset arrCharacters = ListToArray(
	"a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9",
	" "
	) />

<!--- Test conversions. --->
255 => #FormatBaseNData( 255, arrCharacters )#<br />
eh => #InputBaseNData( "eh", arrCharacters )#<br />

When we run this code, we get the following output:

255 => FF
FF => 255

255 => eh
eh => 255

As you can see, our control (HEX) and our 62 character set custom radix both encoded and decoded the value properly.

Now that I have these ColdFusion UDFs in place, I think building tiny URL functionality should be much easier.

Want to use code from this post? Check out the license.

Reader Comments

68 Comments

I use a similar function to make unique but verifiable form nonces, to be used in protecting against XSS/XSRF attacks:

<cffunction name="makeNonce" returntype="string">
<cfargument name="inputString" type="string" required="true">
<cfset var md5=hash(arguments.inputString & Request.AppKey & len(arguments.inputString), "MD5")>
<cfset var bigInt=createObject("java","java.math.BigInteger").init(md5,javaCast("int",16)).abs()>
<cfset var chars="bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ012345789">
<cfset var charCount=createObject("java","java.math.BigInteger").valueOf(javaCast("long",len(chars)))>
<cfset var remainder="">
<cfset var ret="">
<cfloop condition="bigInt.bitCount() gt 0">
<cfset remainder=bigInt.mod(charCount)>
<cfset bigInt=bigInt.divide(charCount)>
<cfset ret=ret & mid(chars,remainder.intValue()+1,1)>
</cfloop>
<cfreturn ret>
</cffunction>

My character set is a bit reduced, however, as I also use it for nonces for account creation, so I've removed any characters that might be mistaken if they had to be typed out by hand.

15,688 Comments

@Rick,

I am not sure that I follow what you are doing exactly. However, your use of BigInteger has me curious. One thing that I did get nervous about was the idea of the database going too high to be able to actually handle match on the given value. Of course, I think this would require like over a billion records, but considering the type of service, I am not sure that is beyond reason.

When you create a BigInteger, will ColdFusion be able to handle math with it? Example:

Fix( BigInt / 16 )

... or will it convert it to a standard number when returning the Fix() value?

68 Comments

Unfortunately, CF operators and BigInts don't really get along. The automagic typecasting that CF does tends to do really weird things at unexpected times.

But, just to be obstinate, the BigInt class has methods that are reserved CF words, like .and(), which makes CF choke when you try to chain them such as a.div(b).and(c) . Your example would work, as long as you translated it to: b.divide(sixteen).

Two tricks, though: (1) I said "sixteen" instead of "16" because you can't trust the overloading, and the math functions always expect another BigInt, and (2) since it is an Integer class I don't think it has an equivalent for Fix(). I might be wrong, though.

I used BigInts here because the result of an MD5 hash is a 128-bit number. The InputBaseN function doesn't work for anything larger than 16 bits. So unless I want to do all of the hex conversion and bit math by hand, such as you did, the BigInt is really the only way to go. I'm lazy.

As for overflowing your INT columns in your database, that's definitely a problem if you haven't planned ahead and used 64-bit numbers.

15,688 Comments

@Rick,

Thanks for the insight. I am not sure I full understand, but I will try to implement the BigInteger stuff to make sure it doesn't barf on large values.

As for the database, my INT column is size 10, which should allow it to be 80 bits, right? Or am I not understanding how the length in a database column works with INT values?

15,688 Comments

@Rick,

According to the BigInteger, there is a method:

BigInteger.divideAndRemainder() :: result[]

That looks awesome - takes care of both actions at the same time.

17 Comments

I've modified your methods to work with bigInts so I can create shortcodes from really large PKs. Here is the section I changed (with annotated comments where changes were made):

<!---
	Get a local copy of our value as we will be updating it
	as we divide into it.
--->
<!--- DAN: Make this a BigInteger so we can deal with super big numbers --->
<cfset LOCAL.Value = createObject("java","java.math.BigInteger").init(javaCast( "string", ARGUMENTS.Value)) />

<!---
	When converting to a new radix, we need to keep dividing
	the value passed in until we hit zero (which will never
	have a remainder). However, because we always want to
	perform at least ONE division, we will break from within
	the loop if it hits zero rather than check for that
	loop conditional.
--->
<cfloop condition="true">

	<!--- Get the division and remainder result. --->
	<!--- DAN: Use the divideAndRemainder method to do the division, and get the remainder (instead of using MOD) --->
	<cfset LOCAL.Result = LOCAL.Value.divideAndRemainder(createObject("java","java.math.BigInteger").init(javaCast( "string", LOCAL.Radix))) />

	<!--- Get the remainder of radix division. --->
	<!--- DAN: Adjusted to use the second array value returned from divideAndRemainder --->
	<cfset LOCAL.Remainder = LOCAL.Result[2] />

	<!---
		Take the remainder and prepend the Radix-converted
		string to the encoded value. Remember, since we are
		using arrays that start at 1, we need to add one to
		this value.
	--->
	<cfset LOCAL.EncodedValue = (
		ARGUMENTS.CharacterSet[ LOCAL.Remainder + 1 ] &
		LOCAL.EncodedValue
		) />

	<!---
		Now that we have gotten the current, store the result
		value back into the value.
	--->
	<!--- DAN: Adjusted to use the first array value returned from divideAndRemainder --->
	<cfset LOCAL.Value = LOCAL.Result[1] />

	<!---
		Check to see if we have any more value to divide into.
		Once we hit zero, we are out of possible remainders.
	--->
	<cfif NOT LOCAL.Value>
		<cfbreak />
	</cfif>

</cfloop>
I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel