Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
Meanwhile on Twitter
Loading latest tweet...
Ben Nadel at Scotch On The Rocks (SOTR) 2011 (Edinburgh) with: Kev McCabe

Converting Numbers To A Character-Set Based Radix Using ColdFusion

By Ben Nadel on
Tags: ColdFusion

Earlier this morning on Twitter, someone asked about why "tiny url" sites used both numbers and letters in their short URLs. I responded that using both numbers and letters gives a much greater set of possible values as there are more possibilities for each digit. After this micro conversation, I thought it would be fun to try and create a tiny URL service as I don't really know what the best approach would be.

When I sat down and started to think about it, I realized that it would be hugely beneficial to be able to create a tiny URL based on a numeric index in a database. Furthermore, when it comes to looking up a tiny URL, it would be awesome to be able to convert that tiny URL back into its original numeric value. As I was contemplating this, I realized that what I might be doing is converting a base-10 number to a number of a different base (defined by our alpha/numeric character set).

ColdFusion has some great utility methods for base conversions - FormatBaseN() and InputBaseN(). But, those methods only go up to a 36-radix (26 characters + 10 numbers = 36 characters). The reason for this is that its base conversions are case-insensitive, such that "A" is the same as "a". Unfortunately, as I wanted to use both upper and lower case characters as different characters for a greater variety, I couldn't make use of ColdFusion's built-in conversion methods.

So then, I started to think about other characters that might be used. In the past, I have built tiny URLs that have "-" characters and "_" characters. How to I use that in a base conversion? That's when it dawned on me! The radix that I am using is not defined by a standard set of data; rather, my radix is defined by an arbitrary character set. And so, I set about creating two ColdFusion user defined methods that would convert a value based on a passed-in character array in which the index of each character would represent the decimal value of that particular character.

To align more naturally with ColdFusion's built-in methods, I named them similarly:

  • FormatBaseNData( decimalValue, characterArray )
  • InputBaseNData( formattedValue, characterArray )

Here is the code for these ColdFusion user defined functions:

  • <cffunction
  • name="FormatBaseNData"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I take a demical value and then convert it to the number set defined by the passed in array.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="Value"
  • type="numeric"
  • required="true"
  • hint="I am the numberic value that is being converted to a new radix."
  • />
  •  
  • <cfargument
  • name="CharacterSet"
  • type="array"
  • required="true"
  • hint="I am the character set used in the conversion. Each character represents the number at it's array index."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var LOCAL = {} />
  •  
  • <!--- Create the base string to be returned. --->
  • <cfset LOCAL.EncodedValue = "" />
  •  
  • <!---
  • Get the length of our array. This will be our radix for
  • the conversion. NOTE: Because ColdFusion arrays start at
  • 1, not zero, we will have to some manual offsetting when
  • we perform the conversion.
  • --->
  • <cfset LOCAL.Radix = ArrayLen( ARGUMENTS.CharacterSet ) />
  •  
  • <!---
  • Get a local copy of our value as we will be updating it
  • as we divide into it.
  • --->
  • <cfset LOCAL.Value = ARGUMENTS.Value />
  •  
  • <!---
  • When converting to a new radix, we need to keep dividing
  • the value passed in until we hit zero (which will never
  • have a remainder). However, because we always want to
  • perform at least ONE division, we will break from within
  • the loop if it hits zero rather than check for that
  • loop conditional.
  • --->
  • <cfloop condition="true">
  •  
  • <!--- Get the division result. --->
  • <cfset LOCAL.Result = Fix( LOCAL.Value / LOCAL.Radix ) />
  •  
  • <!--- Get the remainder of radix division. --->
  • <cfset LOCAL.Remainder = (LOCAL.Value MOD LOCAL.Radix) />
  •  
  • <!---
  • Take the remainder and prepend the Radix-converted
  • string to the encoded value. Remember, since we are
  • using arrays that start at 1, we need to add one to
  • this value.
  • --->
  • <cfset LOCAL.EncodedValue = (
  • ARGUMENTS.CharacterSet[ LOCAL.Remainder + 1 ] &
  • LOCAL.EncodedValue
  • ) />
  •  
  • <!---
  • Now that we have gotten the current, store the result
  • value back into the value.
  • --->
  • <cfset LOCAL.Value = LOCAL.Result />
  •  
  • <!---
  • Check to see if we have any more value to divide into.
  • Once we hit zero, we are out of possible remainders.
  • --->
  • <cfif NOT LOCAL.Value>
  • <cfbreak />
  • </cfif>
  •  
  • </cfloop>
  •  
  • <!--- Return the encoded value. --->
  • <cfreturn LOCAL.EncodedValue />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="InputBaseNData"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I take an encoded value and convert it back to demical based on the passed in character array.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="Value"
  • type="string"
  • required="true"
  • hint="I am the encode value that is being converted back to a base 10 number."
  • />
  •  
  • <cfargument
  • name="CharacterSet"
  • type="array"
  • required="true"
  • hint="I am the character set used in the conversion. Each character represents the number at it's array index."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var LOCAL = {} />
  •  
  • <!--- Create the base number to be returned. --->
  • <cfset LOCAL.DecodedValue = 0 />
  •  
  • <!---
  • Get the length of our array. This will be our radix for
  • the conversion. NOTE: Because ColdFusion arrays start at
  • 1, not zero, we will have to some manual offsetting when
  • we perform the conversion.
  • --->
  • <cfset LOCAL.Radix = ArrayLen( ARGUMENTS.CharacterSet ) />
  •  
  • <!---
  • Convert our character set to a list so that we can easily
  • get the numeric value of our encoded digit.
  • --->
  • <cfset LOCAL.CharacterList = ArrayToList( ARGUMENTS.CharacterSet ) />
  •  
  • <!---
  • Reverse the string that was passed in. We are doing this
  • because the right-most value is actually the smallest
  • place and it will be easier for us to deal with in reverse.
  • --->
  • <cfset LOCAL.Value = Reverse( ARGUMENTS.Value ) />
  •  
  • <!---
  • Now, break the value up into an array so that we can more
  • easily iterate over it.
  • --->
  • <cfset LOCAL.ValueArray = ListToArray(
  • REReplace(
  • LOCAL.Value,
  • "(.)",
  • "\1,",
  • "all"
  • )
  • ) />
  •  
  • <!---
  • Iterate over the array and convert each value to a power
  • of our character set defined radix.
  • --->
  • <cfloop
  • index="LOCAL.Index"
  • from="1"
  • to="#ArrayLen( LOCAL.ValueArray )#"
  • step="1">
  •  
  • <!---
  • Convert the current digit and add it to the going sum
  • of our conversion.
  • --->
  • <cfset LOCAL.DecodedValue += (
  • (ListFind( LOCAL.CharacterList, LOCAL.ValueArray[ LOCAL.Index ] ) - 1) *
  • (LOCAL.Radix ^ (LOCAL.Index - 1))
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Return the decoded value. --->
  • <cfreturn LOCAL.DecodedValue />
  • </cffunction>

To test this out, I used both a well known character set (Hexadecimal) and an arbitrary character set. For each test, I converted to the new characters-set-based radix and then back to the decimal value.

  • <!--- Get the HEX character set. --->
  • <cfset arrCharacters = ListToArray(
  • "0 1 2 3 4 5 6 7 8 9 A B C D E F",
  • " "
  • ) />
  •  
  • <!--- Test conversions. --->
  • 255 => #FormatBaseNData( 255, arrCharacters )#<br />
  • FF => #InputBaseNData( "FF", arrCharacters )#<br />
  •  
  •  
  • <br />
  •  
  •  
  • <!---
  • Get our collection of available characters that can be used
  • when creating character-sensitive tiny URLs.
  • --->
  • <cfset arrCharacters = ListToArray(
  • "a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9",
  • " "
  • ) />
  •  
  • <!--- Test conversions. --->
  • 255 => #FormatBaseNData( 255, arrCharacters )#<br />
  • eh => #InputBaseNData( "eh", arrCharacters )#<br />

When we run this code, we get the following output:

255 => FF
FF => 255

255 => eh
eh => 255

As you can see, our control (HEX) and our 62 character set custom radix both encoded and decoded the value properly.

Now that I have these ColdFusion UDFs in place, I think building tiny URL functionality should be much easier.




Reader Comments

I use a similar function to make unique but verifiable form nonces, to be used in protecting against XSS/XSRF attacks:

<cffunction name="makeNonce" returntype="string">
<cfargument name="inputString" type="string" required="true">
<cfset var md5=hash(arguments.inputString & Request.AppKey & len(arguments.inputString), "MD5")>
<cfset var bigInt=createObject("java","java.math.BigInteger").init(md5,javaCast("int",16)).abs()>
<cfset var chars="bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ012345789">
<cfset var charCount=createObject("java","java.math.BigInteger").valueOf(javaCast("long",len(chars)))>
<cfset var remainder="">
<cfset var ret="">
<cfloop condition="bigInt.bitCount() gt 0">
<cfset remainder=bigInt.mod(charCount)>
<cfset bigInt=bigInt.divide(charCount)>
<cfset ret=ret & mid(chars,remainder.intValue()+1,1)>
</cfloop>
<cfreturn ret>
</cffunction>

My character set is a bit reduced, however, as I also use it for nonces for account creation, so I've removed any characters that might be mistaken if they had to be typed out by hand.

Reply to this Comment

@Rick,

I am not sure that I follow what you are doing exactly. However, your use of BigInteger has me curious. One thing that I did get nervous about was the idea of the database going too high to be able to actually handle match on the given value. Of course, I think this would require like over a billion records, but considering the type of service, I am not sure that is beyond reason.

When you create a BigInteger, will ColdFusion be able to handle math with it? Example:

Fix( BigInt / 16 )

... or will it convert it to a standard number when returning the Fix() value?

Reply to this Comment

Unfortunately, CF operators and BigInts don't really get along. The automagic typecasting that CF does tends to do really weird things at unexpected times.

But, just to be obstinate, the BigInt class has methods that are reserved CF words, like .and(), which makes CF choke when you try to chain them such as a.div(b).and(c) . Your example would work, as long as you translated it to: b.divide(sixteen).

Two tricks, though: (1) I said "sixteen" instead of "16" because you can't trust the overloading, and the math functions always expect another BigInt, and (2) since it is an Integer class I don't think it has an equivalent for Fix(). I might be wrong, though.

I used BigInts here because the result of an MD5 hash is a 128-bit number. The InputBaseN function doesn't work for anything larger than 16 bits. So unless I want to do all of the hex conversion and bit math by hand, such as you did, the BigInt is really the only way to go. I'm lazy.

As for overflowing your INT columns in your database, that's definitely a problem if you haven't planned ahead and used 64-bit numbers.

Reply to this Comment

@Rick,

Thanks for the insight. I am not sure I full understand, but I will try to implement the BigInteger stuff to make sure it doesn't barf on large values.

As for the database, my INT column is size 10, which should allow it to be 80 bits, right? Or am I not understanding how the length in a database column works with INT values?

Reply to this Comment

@Rick,

According to the BigInteger, there is a method:

BigInteger.divideAndRemainder() :: result[]

That looks awesome - takes care of both actions at the same time.

Reply to this Comment

I've modified your methods to work with bigInts so I can create shortcodes from really large PKs. Here is the section I changed (with annotated comments where changes were made):

  • <!---
  • Get a local copy of our value as we will be updating it
  • as we divide into it.
  • --->
  • <!--- DAN: Make this a BigInteger so we can deal with super big numbers --->
  • <cfset LOCAL.Value = createObject("java","java.math.BigInteger").init(javaCast( "string", ARGUMENTS.Value)) />
  •  
  • <!---
  • When converting to a new radix, we need to keep dividing
  • the value passed in until we hit zero (which will never
  • have a remainder). However, because we always want to
  • perform at least ONE division, we will break from within
  • the loop if it hits zero rather than check for that
  • loop conditional.
  • --->
  • <cfloop condition="true">
  •  
  • <!--- Get the division and remainder result. --->
  • <!--- DAN: Use the divideAndRemainder method to do the division, and get the remainder (instead of using MOD) --->
  • <cfset LOCAL.Result = LOCAL.Value.divideAndRemainder(createObject("java","java.math.BigInteger").init(javaCast( "string", LOCAL.Radix))) />
  •  
  • <!--- Get the remainder of radix division. --->
  • <!--- DAN: Adjusted to use the second array value returned from divideAndRemainder --->
  • <cfset LOCAL.Remainder = LOCAL.Result[2] />
  •  
  • <!---
  • Take the remainder and prepend the Radix-converted
  • string to the encoded value. Remember, since we are
  • using arrays that start at 1, we need to add one to
  • this value.
  • --->
  • <cfset LOCAL.EncodedValue = (
  • ARGUMENTS.CharacterSet[ LOCAL.Remainder + 1 ] &
  • LOCAL.EncodedValue
  • ) />
  •  
  • <!---
  • Now that we have gotten the current, store the result
  • value back into the value.
  • --->
  • <!--- DAN: Adjusted to use the first array value returned from divideAndRemainder --->
  • <cfset LOCAL.Value = LOCAL.Result[1] />
  •  
  • <!---
  • Check to see if we have any more value to divide into.
  • Once we hit zero, we are out of possible remainders.
  • --->
  • <cfif NOT LOCAL.Value>
  • <cfbreak />
  • </cfif>
  •  
  • </cfloop>

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.