Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Shared-Array Iteration Can Cause Thread Deadlocks In Lucee ColdFusion 5.2

By Ben Nadel on
Tags: ColdFusion

Yesterday, as we migrated some ColdFusion code from the Adobe ColdFusion 10 engine over to the Lucee ColdFusion 5 engine, we saw a lot of JVM threads enter a BLOCKED state once the code was put under load. Upon investigation of the corresponding thread dumps, all of the BLOCKED threads appeared to be in a deadlock while trying to iterate over an empty Array. The root cause of the problem was the difference in the way in which each of the ColdFusion engines handles array-passing: by-value vs. by-reference. What was an isolated array in Adobe ColdFusion suddenly became a shared array in Lucee ColdFusion. And, as it turns out, attempting to iterate over a shared array reference in Lucee can cause deadlocks under load.

One of the big differences between Adobe ColdFusion and Lucee ColdFusion is the way in which Arrays are passed around. In Adobe ColdFusion, arrays are passed-by-value; meaning, a new shallow copy of the array data-structure is created whenever the array is assigned to a new variable or passed out of context. Conversely, in Lucee ColdFusion, arrays are passed-by-reference; meaning, only the pointer to the data-structure is copied whenever the array is assigned to a new variable or passed out of context. This means that what was not a shared memory space in Adobe ColdFusion can become a shared memory space in Lucee ColdFusion.
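As a minimal sketch of that difference, assuming default engine settings, the same snippet can be run on both engines. The `appendItem()` helper and the variable names are hypothetical and exist only for this illustration:

```cfml
<cfscript>

	// Hypothetical helper, used only for this illustration.
	function appendItem( required array items ) {
		arrayAppend( items, "added inside function" );
	}

	original = [ "a", "b" ];

	// Assignment: a shallow copy in Adobe ColdFusion, the same array in Lucee.
	alias = original;
	arrayAppend( alias, "added via alias" );

	// Argument passing follows the same rule on each engine.
	appendItem( original );

	// Adobe ColdFusion: 2 (neither mutation reaches "original").
	// Lucee CFML: 4 (both mutations reach "original").
	writeOutput( arrayLen( original ) );

</cfscript>
```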

Generally, when dealing with shared memory space access and mutation, you need to take care to provide some sort of synchronization (i.e., locking) around the data operations. For me, this lesson always becomes muddy when we start talking about read-only data; meaning, a shared data structure that is read from but never written to. In such cases, I tend to believe that synchronization is unnecessary since no state is being changed. But, what I failed to understand in this particular case was that the very act of reading the data actually changed the internal - albeit encapsulated - state of the data-structure. And, it was this change in shared state that caused the deadlock.

To make this concrete, I had a ColdFusion component that kept an empty Array in the variables scope. This empty array was intended to provide some semantic documentation in subsequent method calls:

    component {

        public any function init() {

            // This empty array is here to provide some semantic documentation to the URL
            // generation calls. The very name of the variable will provide far more meaning
            // than an in-line empty array definition.
            variables.emptyQueryParameters = [];

        }

        // ---
        // PUBLIC METHODS.
        // ---

        public string function getThisUrl() {

            return( generateUrl( "/this", emptyQueryParameters ) );

        }

        public string function getThatUrl() {

            return( generateUrl( "/that", emptyQueryParameters ) );

        }

        // ---
        // PRIVATE METHODS.
        // ---

        private string function generateUrl(
            required string resource,
            required array queryParams
            ) {

            // ...
            for ( var queryParam in queryParams ) {
                // ...
            }
            // ...

        }

    }

As you can see, the shared-value, "emptyQueryParameters", was an empty array that served no real purpose other than to define and fulfill a required argument for subsequent method invocations. In other words, it was there to make it clear to future developers why an empty-array was being passed with various method calls.

In Adobe ColdFusion, this worked perfectly well as the "emptyQueryParameters" variable was passed-by-value. However, in Lucee ColdFusion, this "emptyQueryParameters" variable was passed-by-reference. And, under load, it led to stack-traces that looked like this:

    "http-apr-8500-exec-245"
       java.lang.Thread.State: BLOCKED
        at java.util.Vector.size(Vector.java:318)
        at lucee.runtime.type.util.ListIteratorImpl.hasNext(ListIteratorImpl.java:75)
        at lucee.runtime.type.util.ListIteratorImpl.next(ListIteratorImpl.java:103)
        at services.thinger_cfc$cf.udfCall3(/invision/services/Thinger.cfc:971)
        at services.thinger_cfc$cf.udfCall(/invision/services/Thinger.cfc)
        at lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:107)
        at lucee.runtime.type.UDFImpl._call(UDFImpl.java:357)
        at lucee.runtime.type.UDFImpl.call(UDFImpl.java:226)
        at lucee.runtime.type.scope.UndefinedImpl.call(UndefinedImpl.java:803)
        at lucee.runtime.util.VariableUtilImpl.callFunctionWithoutNamedValues(VariableUtilImpl.java:756)
        at lucee.runtime.PageContextImpl.getFunction(PageContextImpl.java:1718)
        at services.thinger_cfc$cf.udfCall3(/invision/services/Thinger.cfc:1019)
        at services.thinger_cfc$cf.udfCall(/invision/services/Thinger.cfc)

As you can see, the JVM thread is blocked on a call to java.util.Vector.size(). What happened was that the shared array was being consumed in a for-in loop. Under the hood, the ColdFusion for-in loop needs to check the size of the array, which, judging by the stack-trace, is backed by a java.util.Vector. Vector.size() is a synchronized method, so every call has to acquire the Vector's internal lock. And, in doing so, the threads ended up deadlocking on that internal state of the underlying Java data-structure.

Eventually, after 600 seconds, Lucee ColdFusion would interrupt these threads and kill the hanging requests:

request ... has run into a timeout (600 seconds) and has been stopped.
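One way to sidestep the shared reference entirely, sketched here as an assumption and not as the change that was actually deployed, is to keep the descriptive name but hand out a new array on every call. The `noQueryParameters()` helper is hypothetical, and the body of `generateUrl()` is only a stand-in for the elided logic in the component above (it assumes each query parameter is a "key=value" string):

```cfml
component {

	public string function getThisUrl() {

		return( generateUrl( "/this", noQueryParameters() ) );

	}

	public string function getThatUrl() {

		return( generateUrl( "/that", noQueryParameters() ) );

	}

	// Hypothetical helper: the method name carries the same semantic meaning as
	// the old variable, but each caller receives its own, unshared array.
	private array function noQueryParameters() {

		return( [] );

	}

	// Stand-in for the original (elided) URL generation logic.
	private string function generateUrl(
		required string resource,
		required array queryParams
		) {

		var pairs = [];

		for ( var queryParam in queryParams ) {
			arrayAppend( pairs, queryParam );
		}

		return( arrayLen( pairs ) ? ( resource & "?" & arrayToList( pairs, "&" ) ) : resource );

	}

}
```

Since no array instance outlives a single method call, there is nothing for concurrent requests to contend over, regardless of how the engine passes arrays.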

The point of this lesson is a reminder to always lock access to data in a shared memory space even if the data-access appears to be read-only. What is "read-only" on the surface may not be read-only under the hood. And, unless a data-structure is inherently synchronized (i.e., synchronized internally to its implementation), synchronization needs to be explicitly handled by the developer.
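If the array really does need to stay shared, explicit synchronization might look something like the following sketch. It assumes a named, exclusive lock is acceptable for the workload; the lock name and the body of `generateUrl()` are hypothetical, and the query parameters are assumed to be "key=value" strings:

```cfml
component {

	public any function init() {

		variables.emptyQueryParameters = [];

		return( this );

	}

	public string function getThisUrl() {

		return( generateUrl( "/this", variables.emptyQueryParameters ) );

	}

	private string function generateUrl(
		required string resource,
		required array queryParams
		) {

		var pairs = [];

		// Assumption: only one thread at a time may iterate the (possibly shared)
		// array. With default lock settings, an error is thrown if the lock
		// cannot be acquired within 5 seconds.
		lock name="Thinger.generateUrl.queryParams" type="exclusive" timeout="5" {

			for ( var queryParam in queryParams ) {
				arrayAppend( pairs, queryParam );
			}

		}

		return( arrayLen( pairs ) ? ( resource & "?" & arrayToList( pairs, "&" ) ) : resource );

	}

}
```

The trade-off is that every call to `generateUrl()` is serialized through this one lock, even when the caller passes in an array that nobody else can see.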

I'm writing this post in part because I received no search results when I attempted to Google for this type of stack-trace. But, I'm also writing this post in part to drill it into my head that shared-memory access needs to be synchronized. Always. Hopefully, this helps anyone else who runs into the same problem.



Reader Comments

Just a comment, Lucee != ColdFusion as ColdFusion is a brand name for Adobe. So more correctly (IMHO, YMMV) Lucee CFML or just Lucee


@Mark,

Ahhh, gotcha. Sorry, I'm relatively new to the Lucee world. I had thought that "ColdFusion" was being referred to as the language, and "Adobe" and "Lucee" were the platform implementations.

And, for what it's worth, we're already seeing some improved performance patterns in Lucee! Woot woot! :party-parrot:


@Ben,

Nice one! The by ref does free up a lot of memory if you are using lots of arrays and structures (as they aren't being copied every time you pass them around)

I need to check but I thought there was a setting for function arguments like by-value or we could have talked about it at Railo/Lucee but the benefits outweigh the usefulness.

When I first was exposed to this I was confused as I had mis-understood ColdFusion and thought it WAS by reference and surprised I was wrong.


Hey Ben, which exact version of Lucee were you using?

there are a lot of Lucee 5.2 releases and it's always useful to know in which release you found the problem


@Mark,

I totally agree. I think the pass-by-reference makes complete sense for Arrays. It is in alignment with my other language of choice, JavaScript :D

@Zachary,

Good question. When I dump out the server scope, I get, 5.2.9.31. That said, to be clear, I am not calling this a bug in Lucee -- I am calling this a problem with my use of shared-state without proper synchronization. I've historically thought about "Read only" as safe to do without locking; but, this makes it clear that even such a mentality is ill-fated in some circumstances.


I am in the process of migrating from ColdFusion to Lucee. If you could write a blog post on your migration process, that would be helpful.

I am happy with overall performance of the Lucee on our staging environment.


@Vikrant,

Right now, our process is just "doing it slowly" :D We did the migration locally, in the development environments. So, all the engineers were working with Lucee locally and then deploying to Adobe ColdFusion.

Which, of course, caused some problems since some non-compatible issues were only discovered once we went to Staging or Production (depending on how complicated the issue was).

In production, we then have a "Bug Crowd" environment that runs in Lucee where people can get paid to find bugs.

And, in our "normal production" environment, we have the load-balancer set up so that Lucee only receives like 2% of incoming HTTP traffic. This way, we limit the scope of ways in which Lucee will affect users. That's the stage we're at right now - this 2% of traffic phase. And, we're slowly finding and fixing problems as they show up in the error logs or are reported by Support.

To be clear, all the production stuff is over my head - that's all managed by the SRE team who understands how Kubernetes works.

Now, keep in mind, this path is relevant for our type of app and size of our team and infrastructure. This could be way over-kill for smaller apps. That said, I think it was great that our local development was running on Lucee many weeks before we moved any Lucee functionality to production.
