Looking At The Performance Overhead Of A Read-Only Lock In Lucee CFML 5.3.8.201

Published 2022-06-24 in ColdFusion — Comments (2)

In yesterday's post, I demonstrated that iterating over shared Structs and Arrays is thread-safe in ColdFusion; assuming, of course, that the access is read-only. But, what if I need to occasionally mutate the shared data? In that case, I'd have to acquire an exclusive lock some of the time; which, in turn, means that I'd have to acquire a read-only lock most of the time. This got me thinking about the performance overhead of a read-only lock in Lucee CFML 5.3.8.201.

The performance overhead of an exclusive lock is easier to understand because it essentially single-threads access to a given block of code. So, if nothing else, there's a limit to the throughput on an exclusive lock. But, with a read-only lock, throughput isn't an issue (unless there's a competing exclusive lock) - multiple threads can access the same read-only lock at the same time.
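
To make the two lock types concrete, here is a minimal sketch of how a read-only lock and an exclusive lock pair up around the same shared data. The variable `sharedData`, the lock name `shared-data-lock`, and both function names are hypothetical; they exist only for this illustration and are not part of the tests that follow:

<cfscript>

	// ASSUMPTION: "sharedData" and the lock name "shared-data-lock" are hypothetical
	// names used only for this sketch.
	sharedData = [ 1, 2, 3 ];

	appendValue( 4 );
	echo( "Sum: #sumValues()# <br />" );

	/**
	* I sum the shared values. Many threads can hold the read-only lock at the same
	* time, as long as no thread holds the exclusive lock of the same name.
	*/
	public numeric function sumValues() {

		var total = 0;

		lock
			name = "shared-data-lock"
			type = "readonly"
			timeout = 5
			{

			for ( var item in sharedData ) {

				total += item;

			}

		}

		return( total );

	}

	/**
	* I append a value to the shared array. The exclusive lock single-threads this
	* block and makes any read-only lock of the same name wait until it is released.
	*/
	public void function appendValue( required numeric value ) {

		lock
			name = "shared-data-lock"
			type = "exclusive"
			timeout = 5
			{

			sharedData.append( value );

		}

	}

</cfscript>

The two locks only coordinate with each other because they share a name. The rest of this post looks at the read-only half of that pairing in isolation.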

But, do the mechanics of a read-only lock have overhead in and of themselves? Meaning, when there is no exclusive lock contention, does having a read-only lock in place affect throughput? To test, I'm going to try and iterate over shared data using parallel threads. In the first test - our control - there will be no locking. Then, in the second test, we'll apply a read-only lock.

In the following control test, we're giving ColdFusion a 10-second window in which to run as many iterations as possible. Each iteration will spawn parallel threads that each try to iterate over the same read-only data:

<cfscript>

	// Let's attempt to simulate concurrent request activity all trying to access shared
	// data. Each entry in the simulated request will be executed via parallel iteration.
	// And, each parallel iteration will try to iterate over the given shared data array.
	simulatedRequests = buildArray( 20 );
	sharedData = buildArray( 100 );

	// Let's keep track of how many test iterations we perform in our test window.
	loopCounter = 0;
	valueCounter = 0;

	// Each test window will be 10-seconds long.
	cutoffAt = ( getTickCount() + ( 10 * 1000 ) );

	// Let's see how many contentious read operations we can perform in our test window.
	while ( getTickCount() < cutoffAt ) {

		simulatedRequests.each(
			() => {

				for ( var increment in sharedData ) {

					// CAUTION: The "+=" operator is NOT THREAD SAFE. As such, we cannot
					// trust the following operation inside a parallel iterator. That
					// said, I have it here in order to make sure that the Lucee compiler
					// doesn't try to optimize this inner loop away. I wanted to make sure
					// that we're consuming the iteration value in some way.
					valueCounter += increment;

				}

			},
			// Run the .each() in parallel using Java's thread pool.
			true,
			// Maximum number of parallel threads.
			simulatedRequests.len()
		);

		loopCounter++;

	}

	echo( "Without-lock test <br />" );
	echo( "Loop Counter: #loopCounter# <br />" );
	echo( "Value Counter: #valueCounter.intValue()# <br />" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I construct an array of the given size in which each value is "1".
	*/
	public array function buildArray( required numeric size ) {

		var result = [];

		for ( var i = 1 ; i <= size ; i++ ) {

			result[ i ] = 1; // All values are 1 (for our counter).

		}

		return( result );

	}

</cfscript>

Note that inside each iteration of the shared data array, I'm using the value of the array item (which is always 1) to increment a counter. I'm doing this to make sure that the ColdFusion compiler isn't removing our inner loop using some clever optimization. That said, the += operator is not thread-safe. As such, we don't expect this inner counter to be accurate - it's there just to force the code to compile a certain way.

That said, if I run this control case 10-times in a row and take the 5 highest values, we get the following performance numbers:

Without-lock test
Loop Counter: 5397
Value Counter: 10769762

Without-lock test
Loop Counter: 4964
Value Counter: 9895341

Without-lock test
Loop Counter: 5138
Value Counter: 10242939

Without-lock test
Loop Counter: 5296
Value Counter: 10552607

Without-lock test
Loop Counter: 4885
Value Counter: 9740710

As you can see, the 10-second window for our control test - without locking - resulted in between 4,885 and 5,397 outer iterations.

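ASIDE: You can also see how the += operator is not thread-safe. Each outer iteration runs 20 parallel threads over 100 items, so it should add 2,000 to the "value counter". In the first run above, 5,397 outer iterations should have produced 10,794,000; but, we only got 10,769,762. The difference is the set of increments that were lost to concurrent writes.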
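
For contrast, here is a minimal sketch of a counter that doesn't lose updates under parallel iteration. It assumes that Java's `java.util.concurrent.atomic.AtomicLong` is an acceptable stand-in for the plain numeric counter. The names `safeCounter`, `workers`, and `items` are hypothetical and are not part of the test code in this post:

<cfscript>

	// ASSUMPTION: AtomicLong is swapped in for the plain numeric counter. The names
	// "safeCounter", "workers", and "items" are hypothetical.
	safeCounter = createObject( "java", "java.util.concurrent.atomic.AtomicLong" ).init();

	workers = [ 1, 2, 3, 4 ];
	items = [];

	for ( i = 1 ; i <= 100 ; i++ ) {

		items.append( 1 );

	}

	workers.each(
		() => {

			for ( var item in items ) {

				// Atomic increment: concurrent threads cannot overwrite each other's
				// updates the way they can with "+=".
				safeCounter.incrementAndGet();

			}

		},
		// Run the .each() in parallel.
		true,
		// Maximum number of parallel threads.
		workers.len()
	);

	echo( "Atomic counter: #safeCounter.get()# <br />" );

</cfscript>

With 4 workers and 100 items, the expectation is that this reports 400 on every run. I wouldn't put this in the benchmark itself, though, since an atomic counter adds its own cross-thread contention, which would muddy the very thing being measured.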

Now for our read-only lock test. This is the same exact code; only, inside each parallel thread, we're acquiring a read-only lock:

<cfscript>

	// Let's attempt to simulate concurrent request activity all trying to access shared
	// data. Each entry in the simulated request will be executed via parallel iteration.
	// And, each parallel iteration will try to iterate over the given shared data array.
	simulatedRequests = buildArray( 20 );
	sharedData = buildArray( 100 );

	// Let's keep track of how many test iterations we perform in our test window.
	loopCounter = 0;
	valueCounter = 0;

	// Each test window will be 10-seconds long.
	cutoffAt = ( getTickCount() + ( 10 * 1000 ) );

	// Let's see how many contentious read operations we can perform in our test window.
	while ( getTickCount() < cutoffAt ) {

		simulatedRequests.each(
			() => {

				lock
					name = "read-only-lock-test"
					type = "readonly"
					timeout = 5
					{

					for ( var increment in sharedData ) {

						// CAUTION: The "+=" operator is NOT THREAD SAFE. As such, we
						// cannot trust the following operation inside a parallel
						// iterator. That said, I have it here in order to make sure that
						// the Lucee compiler doesn't try to optimize this inner loop
						// away. I wanted to make sure that we're consuming the iteration
						// value in some way.
						valueCounter += increment;

					}

				}

			},
			// Run the .each() in parallel using Java's thread pool.
			true,
			// Maximum number of parallel threads.
			simulatedRequests.len()
		);

		loopCounter++;

	}

	echo( "With-lock test <br />" );
	echo( "Loop Counter: #loopCounter.intValue()# <br />" );
	echo( "Value Counter: #valueCounter.intValue()# <br />" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I construct an array of the given size in which each value is "1".
	*/
	public array function buildArray( required numeric size ) {

		var result = [];

		for ( var i = 1 ; i <= size ; i++ ) {

			result[ i ] = 1; // All values are 1 (for our counter).

		}

		return( result );

	}

</cfscript>

As you can see, this ColdFusion code is entering a read-only lock before it tries to iterate over the shared data. And, when I run this code 10-times in a row and take the 5 highest values, we get the following output:

With-lock test
Loop Counter: 5263
Value Counter: 10491977

With-lock test
Loop Counter: 4939
Value Counter: 9835909

With-lock test
Loop Counter: 5045
Value Counter: 10060834

With-lock test
Loop Counter: 5358
Value Counter: 10688017

With-lock test
Loop Counter: 5642
Value Counter: 11256738

As you can see, the 10-second window for our test - with read-only locking - resulted in between 4,939 and 5,642 outer iterations. The side-by-side results:

Without Locking: between 4,885 and 5,397 iterations.
With Locking: between 4,939 and 5,642 iterations.

What I'm seeing here is that there is no readily apparent overhead to having a read-only lock in ColdFusion. Both tests showed a decent amount of variation from run to run. But, both tests also ran within roughly the same min/max range.
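
One rough way to put a number on "roughly the same range" is to average the Loop Counter values listed above. This is only a sketch: the arrays are copied by hand from the two result listings, the variable names are hypothetical, and the inputs are the top 5 of 10 runs, so the means skew high for both tests:

<cfscript>

	// Loop Counter values copied from the two result listings in this post.
	withoutLock = [ 5397, 4964, 5138, 5296, 4885 ];
	withLock = [ 5263, 4939, 5045, 5358, 5642 ];

	echo( "Without-lock mean: #arrayAvg( withoutLock )# <br />" );
	echo( "With-lock mean: #arrayAvg( withLock )# <br />" );

</cfscript>

The means work out to 5,136 without the lock and 5,249.4 with it. That gap of about 2% is much smaller than the 10% or more of spread inside either set of runs, so it reads as noise rather than as the lock making anything faster or slower.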

Again, to be clear, if there was a competing exclusive lock, all of the read-only locks would block-and-wait until the lock became available. But, in a situation where there won't be an exclusive lock in the vast majority of requests, having the read-only lock in place doesn't appear to have a discernible performance overhead.

Take This Performance Test For What It's Worth

This performance test is running on my local machine with no external load. I'm trying to simulate some load by running parallel iterations. But, understand that this is not a production setting. That said, my goal here wasn't to be exact in my test, it was only to get a general sense of read-only lock overhead. And, from what I can see, if there is a cost to having a read-only lock in ColdFusion, it's not large enough to merit consideration in my algorithmic choices.

Also, as a fun aside, I wanted to see how many cores Java was using, so I ran this:

<cfscript>

	coreCount = createObject( "java", "java.lang.Runtime" )
		.getRuntime()
		.availableProcessors()
	;

	dump( coreCount );

</cfscript>

... and it says that I'm using 8 cores. Which means that the parallel execution of the .each() iteration should, in theory, actually run in parallel some of the time (and not just as simulated concurrency).

Short link: https://bennadel.com/4290

Reader Comments

Charlie Arehart Jun 24, 2022 at 4:59 PM

Interesting thought experiment, Ben. If you'd been taking bets, I'd have put down good money that there'd be virtually NO impact in this scenario. The reason I say that may add some value to this post, whether one is using Lucee or ACF.

And as the comment I was writing got longer and longer, I realized a post of my own was better. I offer it here to take that aspect of the conversation (and knowledge-sharing on this topic) further, for those who may be interested:

https://www.carehart.org/blog/2022/6/24/understanding_cflock_cost_part_1