Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at the jQuery Conference 2010 (Boston, MA) with: Ralph Holzmann
Ben Nadel at the jQuery Conference 2010 (Boston, MA) with: Ralph Holzmann@ralphholzmann )

Using An AtomicLong With A Modulo Operation May Be Faster Than Using compareAndSet() For Thread-Safe Range-Based Counters

By Ben Nadel on
Tags: ColdFusion

Last week, I looked at using Java's AtomicInteger class in order to loop over a range of numbers in a thread-safe manner without having to add any additional locking to the code. The core of the logic revolved around executing a while(true) loop that repeatedly invoked a .compareAndSet() method until it returned successfully. After I posted that exploration, fellow InVisioneer, Ben Darfler, suggested that I test this approach against the modulus operator. I had originally thought about using the modulus operator; but, opted-out for fear of running out of INT-values. To this, Darfler reminded me that I could use an AtomicLong for an increased number space. So, let's have at it!

Rather than trying to re-use the AtomicRange.cfc component that I made the other week, I wanted to boil this speed test down to just the necessary ingredients. As such, these tests will be entirely self-contained within a single ColdFusion script page. Each test will attempt to generate a large set of numbers over a static range. And, will output the time (in milliseconds) that it took for the test to run. Since the general concept has already been proven, I'm not going to validate the number generation - only the speed with which the numbers get generated.

And, of course, all the usual caveats about speed tests apply - this is being run on my local development environment while I was doing other stuff. So, the goal here is to see general magnitudes - not to calculate exact comparisons.

That said, let's look at the while-loop approach that I used in last week's blog post:

  • <cfscript>
  •  
  • counter = createObject( "java", "java.util.concurrent.atomic.AtomicInteger" ).init();
  •  
  • // Define the parallel testing threads.
  • testThreadCount = 11;
  • testIncrementCount = 21733579;
  • testRange = 33;
  • totalCount = ( testThreadCount * testIncrementCount );
  •  
  • // Start recording the duration.
  • startedAt = getTickCount();
  •  
  • for ( i = 0 ; i < testThreadCount ; i++ ) {
  •  
  • thread
  • name = "parallel-counter-#i#"
  • action = "run"
  • {
  •  
  • for ( var r = 0 ; r < testIncrementCount ; r++ ) {
  •  
  • do {
  •  
  • var currentValue = counter.get();
  • var nextValue = ( currentValue + 1 );
  •  
  • // Ensure that the counter never goes outside the desired range.
  • if ( nextValue >= testRange ) {
  •  
  • nextValue = 0;
  •  
  • }
  •  
  • // Note: This value may not actually be the final value.
  • thread.lastValue = currentValue;
  •  
  • // Since we are explicitly setting the value of the counter, there is
  • // going to be a race-condition. As such, we have to keep looping until
  • // the next value can be set successfully, based on the current value.
  • } while ( ! counter.compareAndSet( javaCast( "int", currentValue ), javaCast( "int", nextValue ) ) );
  •  
  • }
  •  
  • }
  •  
  • }
  •  
  • // Block / wait for the asynchronous threads to return.
  • thread action = "join";
  •  
  • // Calculate the duration of the test (ie, how long it took for all of the numbers
  • // to be generated using the WHILE-True approach).
  • duration = ( getTickCount() - startedAt );
  •  
  • writeOutput( "#numberFormat( totalCount )# values generated in #numberFormat( duration )#ms. <br />" );
  •  
  • // Output the last value generated in each thread (just as a sanity check).
  • for ( threadName in cfthread ) {
  •  
  • writeOutput( "Last value: #cfthread[ threadName ].lastValue# <br />" );
  •  
  • }
  •  
  • </cfscript>

As you can see, the core logic of the test performs a .compareAndSet() operation from within a do-while loop. If I run this three times in a row, we get the following concatenated output:

239,069,369 values generated in 563,014ms.
Last value: 31
Last value: 24
Last value: 26
Last value: 22
Last value: 18
Last value: 2
Last value: 1
Last value: 10
Last value: 3
Last value: 25
Last value: 23

239,069,369 values generated in 573,330ms.
Last value: 19
Last value: 10
Last value: 1
Last value: 28
Last value: 23
Last value: 32
Last value: 18
Last value: 1
Last value: 25
Last value: 21
Last value: 29

239,069,369 values generated in 565,516ms.
Last value: 13
Last value: 14
Last value: 10
Last value: 30
Last value: 29
Last value: 24
Last value: 10
Last value: 0
Last value: 21
Last value: 12
Last value: 22

For 239-million values, over a range of [0,33) (ie, 33 not inclusive), the test took around 567-seconds.

Now, let's look at the test that uses an ever-increasing AtomicLong in conjunction with the modulus operator in order to iterate over the same range:

  • <cfscript>
  •  
  • // Since we're going to be using the modulo operation in this test, I want to use
  • // a LONG rather than an INT in order to make sure that I am unlikely to run out
  • // of numbers in a production scenario.
  • counter = createObject( "java", "java.util.concurrent.atomic.AtomicLong" ).init();
  •  
  • // The module operator can only work on an INT in ColdFusion. As such, we'll need
  • // to use the Long class in order to perform the modulo operation on our counter.
  • LongClass = createObject( "java", "java.lang.Long" );
  •  
  • // Define the parallel testing threads.
  • testThreadCount = 11;
  • testIncrementCount = 21733579;
  • testRange = 33;
  • totalCount = ( testThreadCount * testIncrementCount );
  •  
  • // Start recording the duration.
  • startedAt = getTickCount();
  •  
  • for ( i = 0 ; i < testThreadCount ; i++ ) {
  •  
  • thread
  • name = "parallel-counter-#i#"
  • action = "run"
  • {
  •  
  • for ( var r = 0 ; r < testIncrementCount ; r++ ) {
  •  
  • var index = counter.getAndIncrement();
  •  
  • // Since the counter, in this demo, is always increasing, we need to
  • // calculate the modulus in order to overlay the counter on the range
  • // of acceptable values.
  • var remainder = LongClass.remainderUnsigned(
  • javaCast( "long", index ),
  • javaCast( "long", testRange )
  • );
  •  
  • thread.lastValue = remainder;
  •  
  • }
  •  
  • }
  •  
  • }
  •  
  • // Block / wait for the asynchronous threads to return.
  • thread action = "join";
  •  
  • // Calculate the duration of the test (ie, how long it took for all of the numbers
  • // to be generated using the MODULO approach).
  • duration = ( getTickCount() - startedAt );
  •  
  • writeOutput( "#numberFormat( totalCount )# values generated in #numberFormat( duration )#ms. <br />" );
  •  
  • // Output the last value generated in each thread (just as a sanity check).
  • for ( threadName in cfthread ) {
  •  
  • writeOutput( "Last value: #cfthread[ threadName ].lastValue# <br />" );
  •  
  • }
  •  
  • </cfscript>

As you can see, this approach ditches in internal do-while loop, using only the AtomicLong's .getAndIncrement() method. The result of the ever-increasing AtomicLong counter is then wedged into the same [0,33) range using a modulus operation - the Long class' .remainderUnsigned() method. If I run this three times in a row, we get the following concatenated output:

239,069,369 values generated in 248,951ms.
Last value: 20
Last value: 30
Last value: 31
Last value: 10
Last value: 22
Last value: 17
Last value: 28
Last value: 8
Last value: 6
Last value: 10
Last value: 14

239,069,369 values generated in 217,446ms.
Last value: 9
Last value: 24
Last value: 2
Last value: 16
Last value: 10
Last value: 17
Last value: 27
Last value: 5
Last value: 4
Last value: 32
Last value: 9

239,069,369 values generated in 223,999ms.
Last value: 5
Last value: 2
Last value: 16
Last value: 10
Last value: 22
Last value: 31
Last value: 3
Last value: 23
Last value: 30
Last value: 28
Last value: 26

For 239-million values, over the same range, the test took around 230-seconds. Wow - that's kind of shocking! Even if this is just a fuzzy comparison, the do-while approach took twice as long as the modulo approach. And, I believe the modulo-based code is objectively more simple and easy to understand.

As someone who is basically naive to the workings of computers (despite the fact that I program on them daily), seeing division operate with such speed is very surprising. But, I am sure that those who are familiar with how computers actually work could probably explain that non-decimal math is blazing fast in modern computer architectures because of "science" and "math" reasons. And, to that, I say, awesome!

Normally, I don't care too much about speed comparisons. For the most part, I like to write code that's easy to understand and is "fast enough." However, in this case, I think the modulo code is both easier to understand and, at the same time, appears to be about twice as fast so the equivalent do-while loop. As such, I can't see much of a downside to using the modulo approach to iterate over a range of numbers in a thread-safe manner. And, the AtomicLong should provide a large enough number space to get you to the next deployment (at which time the numbers will all get reset).



Looking For A New Job?

Ooops, there are no jobs. Post one now for only $29 and own this real estate!

100% of job board revenue is donated to Kiva. Loans that change livesFind out more »

Reader Comments

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
NEW: Some basic markdown formatting is now supported: bold, italic, blockquotes, lists, fenced code-blocks. Read more about markdown syntax »
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.