Converting Numbers To A Character-Set Based Radix Using ColdFusion

Posted March 24, 2009 at 10:15 AM

Tags: ColdFusion

Earlier this morning on Twitter, someone asked about why "tiny url" sites used both numbers and letters in their short URLs. I responded that using both numbers and letters gives a much greater set of possible values as there are more possibilities for each digit. After this micro conversation, I thought it would be fun to try and create a tiny URL service as I don't really know what the best approach would be.

When I sat down and started to think about it, I realized that it would be hugely beneficial to be able to create a tiny URL based on a numeric index in a database. Furthermore, when it comes to looking up a tiny URL, it would be awesome to be able to convert that tiny URL back into its original numeric value. As I was contemplating this, I realized that what I might be doing is converting a base-10 number to a number of a different base (defined by our alpha/numeric character set).

ColdFusion has some great utility methods for base conversions - FormatBaseN() and InputBaseN(). But, those methods only go up to a 36-radix (26 characters + 10 numbers = 36 characters). The reason for this is that its base conversions are case-insensitive, such that "A" is the same as "a". Unfortunately, as I wanted to use both upper and lower case characters as different characters for a greater variety, I couldn't make use of ColdFusion's built-in conversion methods.

So then, I started to think about other characters that might be used. In the past, I have built tiny URLs that have "-" characters and "_" characters. How to I use that in a base conversion? That's when it dawned on me! The radix that I am using is not defined by a standard set of data; rather, my radix is defined by an arbitrary character set. And so, I set about creating two ColdFusion user defined methods that would convert a value based on a passed-in character array in which the index of each character would represent the decimal value of that particular character.

To align more naturally with ColdFusion's built-in methods, I named them similarly:

  • FormatBaseNData( decimalValue, characterArray )
  • InputBaseNData( formattedValue, characterArray )

Here is the code for these ColdFusion user defined functions:

 Launch code in new window » Download code as text file »

  • <cffunction
  • name="FormatBaseNData"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I take a demical value and then convert it to the number set defined by the passed in array.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="Value"
  • type="numeric"
  • required="true"
  • hint="I am the numberic value that is being converted to a new radix."
  • />
  •  
  • <cfargument
  • name="CharacterSet"
  • type="array"
  • required="true"
  • hint="I am the character set used in the conversion. Each character represents the number at it's array index."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var LOCAL = {} />
  •  
  • <!--- Create the base string to be returned. --->
  • <cfset LOCAL.EncodedValue = "" />
  •  
  • <!---
  • Get the length of our array. This will be our radix for
  • the conversion. NOTE: Because ColdFusion arrays start at
  • 1, not zero, we will have to some manual offsetting when
  • we perform the conversion.
  • --->
  • <cfset LOCAL.Radix = ArrayLen( ARGUMENTS.CharacterSet ) />
  •  
  • <!---
  • Get a local copy of our value as we will be updating it
  • as we divide into it.
  • --->
  • <cfset LOCAL.Value = ARGUMENTS.Value />
  •  
  • <!---
  • When converting to a new radix, we need to keep dividing
  • the value passed in until we hit zero (which will never
  • have a remainder). However, because we always want to
  • perform at least ONE division, we will break from within
  • the loop if it hits zero rather than check for that
  • loop conditional.
  • --->
  • <cfloop condition="true">
  •  
  • <!--- Get the division result. --->
  • <cfset LOCAL.Result = Fix( LOCAL.Value / LOCAL.Radix ) />
  •  
  • <!--- Get the remainder of radix division. --->
  • <cfset LOCAL.Remainder = (LOCAL.Value MOD LOCAL.Radix) />
  •  
  • <!---
  • Take the remainder and prepend the Radix-converted
  • string to the encoded value. Remember, since we are
  • using arrays that start at 1, we need to add one to
  • this value.
  • --->
  • <cfset LOCAL.EncodedValue = (
  • ARGUMENTS.CharacterSet[ LOCAL.Remainder + 1 ] &
  • LOCAL.EncodedValue
  • ) />
  •  
  • <!---
  • Now that we have gotten the current, store the result
  • value back into the value.
  • --->
  • <cfset LOCAL.Value = LOCAL.Result />
  •  
  • <!---
  • Check to see if we have any more value to divide into.
  • Once we hit zero, we are out of possible remainders.
  • --->
  • <cfif NOT LOCAL.Value>
  • <cfbreak />
  • </cfif>
  •  
  • </cfloop>
  •  
  • <!--- Return the encoded value. --->
  • <cfreturn LOCAL.EncodedValue />
  • </cffunction>
  •  
  •  
  • <cffunction
  • name="InputBaseNData"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I take an encoded value and convert it back to demical based on the passed in character array.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="Value"
  • type="string"
  • required="true"
  • hint="I am the encode value that is being converted back to a base 10 number."
  • />
  •  
  • <cfargument
  • name="CharacterSet"
  • type="array"
  • required="true"
  • hint="I am the character set used in the conversion. Each character represents the number at it's array index."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var LOCAL = {} />
  •  
  • <!--- Create the base number to be returned. --->
  • <cfset LOCAL.DecodedValue = 0 />
  •  
  • <!---
  • Get the length of our array. This will be our radix for
  • the conversion. NOTE: Because ColdFusion arrays start at
  • 1, not zero, we will have to some manual offsetting when
  • we perform the conversion.
  • --->
  • <cfset LOCAL.Radix = ArrayLen( ARGUMENTS.CharacterSet ) />
  •  
  • <!---
  • Convert our character set to a list so that we can easily
  • get the numeric value of our encoded digit.
  • --->
  • <cfset LOCAL.CharacterList = ArrayToList( ARGUMENTS.CharacterSet ) />
  •  
  • <!---
  • Reverse the string that was passed in. We are doing this
  • because the right-most value is actually the smallest
  • place and it will be easier for us to deal with in reverse.
  • --->
  • <cfset LOCAL.Value = Reverse( ARGUMENTS.Value ) />
  •  
  • <!---
  • Now, break the value up into an array so that we can more
  • easily iterate over it.
  • --->
  • <cfset LOCAL.ValueArray = ListToArray(
  • REReplace(
  • LOCAL.Value,
  • "(.)",
  • "\1,",
  • "all"
  • )
  • ) />
  •  
  • <!---
  • Iterate over the array and convert each value to a power
  • of our character set defined radix.
  • --->
  • <cfloop
  • index="LOCAL.Index"
  • from="1"
  • to="#ArrayLen( LOCAL.ValueArray )#"
  • step="1">
  •  
  • <!---
  • Convert the current digit and add it to the going sum
  • of our conversion.
  • --->
  • <cfset LOCAL.DecodedValue += (
  • (ListFind( LOCAL.CharacterList, LOCAL.ValueArray[ LOCAL.Index ] ) - 1) *
  • (LOCAL.Radix ^ (LOCAL.Index - 1))
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Return the decoded value. --->
  • <cfreturn LOCAL.DecodedValue />
  • </cffunction>

To test this out, I used both a well known character set (Hexadecimal) and an arbitrary character set. For each test, I converted to the new characters-set-based radix and then back to the decimal value.

 Launch code in new window » Download code as text file »

  • <!--- Get the HEX character set. --->
  • <cfset arrCharacters = ListToArray(
  • "0 1 2 3 4 5 6 7 8 9 A B C D E F",
  • " "
  • ) />
  •  
  • <!--- Test conversions. --->
  • 255 => #FormatBaseNData( 255, arrCharacters )#<br />
  • FF => #InputBaseNData( "FF", arrCharacters )#<br />
  •  
  •  
  • <br />
  •  
  •  
  • <!---
  • Get our collection of available characters that can be used
  • when creating character-sensitive tiny URLs.
  • --->
  • <cfset arrCharacters = ListToArray(
  • "a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9",
  • " "
  • ) />
  •  
  • <!--- Test conversions. --->
  • 255 => #FormatBaseNData( 255, arrCharacters )#<br />
  • eh => #InputBaseNData( "eh", arrCharacters )#<br />

When we run this code, we get the following output:

255 => FF
FF => 255

255 => eh
eh => 255

As you can see, our control (HEX) and our 62 character set custom radix both encoded and decoded the value properly.

Now that I have these ColdFusion UDFs in place, I think building tiny URL functionality should be much easier.

Download Code Snippet ZIP File

Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page




Learning ColdFusion 9 - ColdFusion 9 tutorials, samples, examples, demos

Reader Comments

Mar 24, 2009 at 12:07 PM // reply »
35 Comments

I use a similar function to make unique but verifiable form nonces, to be used in protecting against XSS/XSRF attacks:

<cffunction name="makeNonce" returntype="string">
<cfargument name="inputString" type="string" required="true">
<cfset var md5=hash(arguments.inputString & Request.AppKey & len(arguments.inputString), "MD5")>
<cfset var bigInt=createObject("java","java.math.BigInteger").init(md5,javaCast("int",16)).abs()>
<cfset var chars="bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ012345789">
<cfset var charCount=createObject("java","java.math.BigInteger").valueOf(javaCast("long",len(chars)))>
<cfset var remainder="">
<cfset var ret="">
<cfloop condition="bigInt.bitCount() gt 0">
<cfset remainder=bigInt.mod(charCount)>
<cfset bigInt=bigInt.divide(charCount)>
<cfset ret=ret & mid(chars,remainder.intValue()+1,1)>
</cfloop>
<cfreturn ret>
</cffunction>

My character set is a bit reduced, however, as I also use it for nonces for account creation, so I've removed any characters that might be mistaken if they had to be typed out by hand.


Mar 24, 2009 at 2:44 PM // reply »
6,516 Comments

@Rick,

I am not sure that I follow what you are doing exactly. However, your use of BigInteger has me curious. One thing that I did get nervous about was the idea of the database going too high to be able to actually handle match on the given value. Of course, I think this would require like over a billion records, but considering the type of service, I am not sure that is beyond reason.

When you create a BigInteger, will ColdFusion be able to handle math with it? Example:

Fix( BigInt / 16 )

... or will it convert it to a standard number when returning the Fix() value?


Mar 24, 2009 at 5:28 PM // reply »
35 Comments

Unfortunately, CF operators and BigInts don't really get along. The automagic typecasting that CF does tends to do really weird things at unexpected times.

But, just to be obstinate, the BigInt class has methods that are reserved CF words, like .and(), which makes CF choke when you try to chain them such as a.div(b).and(c) . Your example would work, as long as you translated it to: b.divide(sixteen).

Two tricks, though: (1) I said "sixteen" instead of "16" because you can't trust the overloading, and the math functions always expect another BigInt, and (2) since it is an Integer class I don't think it has an equivalent for Fix(). I might be wrong, though.

I used BigInts here because the result of an MD5 hash is a 128-bit number. The InputBaseN function doesn't work for anything larger than 16 bits. So unless I want to do all of the hex conversion and bit math by hand, such as you did, the BigInt is really the only way to go. I'm lazy.

As for overflowing your INT columns in your database, that's definitely a problem if you haven't planned ahead and used 64-bit numbers.


Mar 24, 2009 at 5:59 PM // reply »
6,516 Comments

@Rick,

Thanks for the insight. I am not sure I full understand, but I will try to implement the BigInteger stuff to make sure it doesn't barf on large values.

As for the database, my INT column is size 10, which should allow it to be 80 bits, right? Or am I not understanding how the length in a database column works with INT values?


Mar 24, 2009 at 6:00 PM // reply »
6,516 Comments

@Rick,

According to the BigInteger, there is a method:

BigInteger.divideAndRemainder() :: result[]

That looks awesome - takes care of both actions at the same time.


Mar 25, 2009 at 9:58 AM // reply »
6,516 Comments

@Rick,

I didn't have time to play around with the BigInteger stuff; but, I was able to put the FormatBaseNData() method into effect for creating tiny URLs:

http://www.bennadel.com/index.cfm?dax=blog:1541.view


Post Comment  |  Ask Ben

Recent Blog Comments
Nov 20, 2009 at 11:32 PM
Five Months Without Hungarian Notation And I'm Loving It
I've used headless camel case for years for not only ColdFusion variables, but also SQL tables and fields... pretty much everything involving code. I also subscribe to the "don't abbreviate and clea ... read »
Nov 20, 2009 at 11:00 PM
Five Months Without Hungarian Notation And I'm Loving It
@Marcel, Yeah, I always err on the side of longer but more readable variable names. As for the camel casing of CF methods and the headless camel casing of custom items, I get around this by always ... read »
Nov 20, 2009 at 10:56 PM
Five Months Without Hungarian Notation And I'm Loving It
I use the following and love it: my.namespace.MyComponents.functionMethodsOrUDF() CONSTANT_VALUES_OR_PROPERTIES One thing I always try is to CamelCaseBuiltInColdFusionFunctions() so others can tell ... read »
Nov 20, 2009 at 5:38 PM
Learning ColdFusion 8: CFImage Part I - Reading And Writing Images
Hi Ben, Great article. I've been looking around to see if ColdFusion image engine can programatically create the following "wrap around" effect: http://www.creativepro.com/article/photoshop-s-she ... read »
Nov 20, 2009 at 5:35 PM
Maintaining ColdFusion Sessions Across SMS Text Message Requests Without Cookies
@Dave: I talked to Gert he suggested: <cfhttp method="get" url="http://{some cf website}" result="stuff" addtoken="yes" /> Note the addition of cfhttp attribute addtoken. That should persist y ... read »
Nov 20, 2009 at 5:23 PM
Maintaining ColdFusion Sessions Across SMS Text Message Requests Without Cookies
@Todd, Ahh, gotcha, yeah that makes sense. ... read »
Nov 20, 2009 at 5:17 PM
Maintaining ColdFusion Sessions Across SMS Text Message Requests Without Cookies
Ben, sorry if I didn't make this clear. You can make it work like that if you want, just put <cfset session.foo = 1> (and <cfset application.foo = 1>) in your OnRequestStart() and it reve ... read »