CUID2 For ColdFusion / CFML

By Ben Nadel

Published 2023-01-14 in ColdFusion — Comments (7)

A couple of years ago, I built a ColdFusion port of the CUID library which we've been using successfully at InVision. The CUID library provides collision-resistant IDs that are optimized for horizontal scaling and performance. Just recently, however, Eric Elliott released Cuid2 - an updated version of the library intended to address some philosophical security issues. I wanted to create a ColdFusion port of his new Cuid2 library.

View this code in my CUID2 For ColdFUsion project on GitHub.

The CUID2 token is guaranteed to start with a letter and be of a consistent, configured length. The values within the token consist of the Base36 character set, which is [a-z0-9]. The ColdFusion port of CUID2 is thread safe and can be cached and reused within your ColdFusion application:

<cfscript>

	// Cachced reference to the CUID instance.
	cuid2 = new lib.Cuid2();

	writeDump({ token: cuid2.createCuid() });
	writeDump({ token: cuid2.createCuid() });
	writeDump({ token: cuid2.createCuid() });
	writeDump({ token: cuid2.createCuid() });

</cfscript>

Running this ColdFusion code gives us the following output:

token: uem955pnse56id49y6bcmjz8
token: ek9lgqi0mfkh9wmxnb6rvzuc
token: lycfyvl0dlspi0us6smqkkr0
token: x0hhypk7l7k4hga8newn4gnw

The Cuid2.cfc ColdFusion component can be instantiated with three optional arguments:

new Cuid2( [ length [, fingerprint [, algorithm ] ] ] )

length - The desired size of the generated tokens. This defaults to 24 but can be anything between 24 and 32 (inclusive).
fingerprint - An additional source of entropy associated with the device. This will default to the name of the JVM process as provided by the ManagementFactory Runtime MX Bean.
algorithm - The hash algorithm to be used when reducing the sources of entropy. It defaults to sha3-256, which is the CUID2 standard; but, can be set to sha-256 on older versions of Java (8) which do not support the SHA3 family of algorithms.

Under the hood, the randomness is provided by ColdFusion's built-in function, randRange(), which is being executed with the sha1prng algorithm for pseudo-random number generation. We can perform a high-level test of the randomness by generating 1,000,000 tokens and then seeing how well these values get distributed into a set of buckets:

<cfscript>

	cfsetting( requestTimeout = 300 );

	length = 24;
	cuid = new lib.Cuid2( length );
	count = 1000000;
	tokens = [:];
	buckets = [:];
	BigIntegerClass = createObject( "java", "java.math.BigInteger" );

	// Create buckets for our distribution counters. As each CUID is generated, it will be
	// sorted into one of these buckets (used to increment the given counter).
	for ( i = 1 ; i <= 20 ; i++ ) {

		buckets[ i ] = 0;

	}

	bucketCount = BigIntegerClass.init( buckets.count() );
	startedAt = getTickCount();

	for ( i = 1 ; i <= count ; i++ ) {

		token = cuid.createCuid();

		// Make sure the CUID is generated at the configured length.
		if ( token.len() != length ) {

			throw(
				type = "Cuid2.Test.InvalidTokenLength",
				message = "Cuid token length did not match configuration.",
				detail = "Length: [#length#], Token: [#token#]"
			);

		}

		tokens[ token ] = i;

		// If the CUIDs are all unique, then each token should represent a unique entry
		// within the struct. But, if there is a collision, then a key will be written
		// twice and the size of the struct will no longer match the current iteration.
		if ( tokens.count() != i ) {

			throw(
				type = "Cuid2.Test.TokenCollision",
				message = "Cuid token collision detected in test.",
				detail = "Iteration: [#numberFormat( i )#]"
			);

		}

		// Each token is in the form of ("letter" + "base36 value"). As such, we can strip
		// off the leading letter and then use the remaining base36 value to generate a
		// BigInteger instance.
		intRepresentation = BigIntegerClass.init( token.right( -1 ), 36 );
		// And, from said BigInteger instance, we can use the modulo operator to sort it
		// into the relevant bucket / counter.
		bucketIndex = ( intRepresentation.remainder( bucketCount ) + 1 );
		buckets[ bucketIndex ]++;

	}

</cfscript>
<cfoutput>

	<p>
		Tokens generated: #numberFormat( count )# (no collisions)<br />
		Time: #numberFormat( getTickCount() - startedAt )#ms
	</p>

	<ol>
		<cfloop item="key" collection="#buckets#">
			<li>
				<div class="bar" style="width: #fix( buckets[ key ] / 100 )#px ;">
					#numberFormat( buckets[ key ] )#
				</div>
			</li>
		</cfloop>
	</ol>

	<style type="text/css">
		.bar {
			background-color: ##ff007f ;
			border-radius: 3px 3px 3px 3px ;
			color: ##ffffff ;
			margin: 2px 0px ;
			padding: 2px 5px ;
		}
	</style>

</cfoutput>

Since the CUID2 token is, essentially, a Base36-encoded value, we can decode that value back into a byte array; and then, use that byte array to initialize a BigInteger class. And, once we have a numeric representation of the CUID2 token, we can use the modulo operator to sort it into a given bucket. Assuming we have 20 buckets, you can think of this operation like:

asBigInt( token ) % 20 => bucket

Now, when we run the above ColdFusion code, we get the following output:

A histogram of CUID2 token distribution, showing relatively even distribution across 20 buckets.

As you can see, the 1,000,000 CUID2 tokens are, roughly, evenly distributed across the 20 buckets. And, in those million keys, no collisions were detected.

I would not necessarily consider my CUID2 port as production ready since no one else has looked at it and I haven't tried running it in production. But, if you are looking for a CUID2 port in ColdFusion, hopefully this will at least point you in the right direction.

Not Sortable / Not Increasing

Part of the security issue that CUID2 is addressing (over CUID) is to remove any sense of sorting / increasing order. When an ID - of any kind - is generated in order, the order "leaks information" about the token. Since CUID2 has no predictable order, it is inherently more secure. Elliott talks about the security considerations more in the CUID2 repository.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/4388

Reader Comments

Ben Nadel Jan 14, 2023 at 8:08 PM

15,841 Comments

Here's an article that Eric Elliott wrote about his new Cuid2 approach (and why his previous approach was insufficient from a security standpoint): Identity Crisis: How Modern Applications Generate Unique Ids. It goes into a lot of depth (though much of what is there is also on the Cuid2 GitHub page as well).

Ben Nadel Jan 16, 2023 at 2:32 PM

15,841 Comments

I've been trying to look at the low-level performance of the Cuid2.cfc - using some reflection, I'm dynamically instrumenting the methods to see how long they execute during the generation of 1M CUID tokens:

www.bennadel.com/blog/4390-dynamically-instrumenting-coldfusion-component-methods-with-gettickcount-to-locate-performance-bottlenecks.htm

Surprisingly, the test is spending 30% of its time just running randRange(). This seems somewhat strange.

Ben Nadel Jan 17, 2023 at 11:53 AM

15,841 Comments

So, my current implementation takes about 60-seconds to generate 1M CUID tokens. Which is not terribly fast. I've been playing around with how to get the performance improved by:

Scoping all variable access (ie, adding the arguments. and variables. prefixes wherever they make sense.
Using BIF (built-in functions) instead of Member Methods (ie, len(value) instead of value.len().
Inlining some hot functions (such as the toBase36() call).

Through these means, I was able to get the cost of 1M CUID tokens down from about 60-seconds to 24-seconds. This is a significant performance improvement; but, at what cost?!

When I do this, the code becomes much less readable. Is it really worth the faster token generation? Is token generation really going to be a limiting factor in terms of performance. I have to remember that this only gets used during write operations; and, that the vast majority of applications are read heavy, not write heavy. Which means I may really be improving the performance of something that is, in and of itself, not a natural bottleneck in the application.

Gary F Jan 20, 2023 at 10:08 PM

29 Comments

Out of curiosity, what's wrong with CreateUUID, apart from it being 11 chars longer? It's also compatible with SQL's uniqueidentifier datatype (if you insert an extra hyphen). I used it in an application that's been running for 17 years 24x7 and it's never clashed. It was interesting to hear how CUID2 works, a good article as always Ben. :-)

Ben Nadel Jan 21, 2023 at 10:44 AM

15,841 Comments

@Gary,

Honestly, I can't really speak to the pros/cons of UUIDs or CUIDs - I just like trying to build it. I still use Auto-incrementing values in my applications. Over in the CUID2 for JavaScript repository Elliott has some comparisons of CUID2 to other UUID algorithms.

Ben Nadel Jan 24, 2023 at 11:42 AM

15,841 Comments

I've updated the post with a Warning at the top to indicate that this is an incomplete version of the implementation. Only algorithms using the SHA3 family of hashing can be considered implementations of CUID2. And, since SHA3 wasn't added until Java 9, implementing this for Java 8 would mean loading a 3rd-party JAR, which just makes this whole thing less developer friendly than I would like.

Ben Nadel Jan 26, 2023 at 11:54 AM

15,841 Comments

Ok, I've removed the warning and updated the Cuid2.cfc to default to sha3-256. This works fine on Java 9+; but, if you are still on Java 8, then it can be explicitly set to sha-256 as a constructor argument. I've updated the blog post to include the optional argument description.

Not Sortable / Not Increasing

Reader Comments

Post A Comment — ❤️ I'd Love To Hear From You! ❤️

Post A Comment — I'd Love To Hear From You!