At InVision, we're in the middle of Sync Week, which is our annual all-hands event and Hackathon. And, given the fact that I've been neck-deep in a ColdFusion custom tag DSL for HTML emails, I thought it would be a fun to try applying my DSL to InVision's transactional emails as my hackathon activity. On the first morning, however, I ran into a bit of a snag: once I moved the DSL files from my host computer into my Docker container, a non-trivial performance issue surfaced with
CFImport tag based Custom Tags in Lucee CFML 126.96.36.199.
ASIDE: I started discussing this issue yesterday on the Lucee Dev Forums. What follows here is a recreation of that thread in blog format. I've also filed a performance enhancement ticket in the Lucee Jira product.
Because my ColdFusion custom tag DSL (Domain Specific Language) is meant to be an abstraction over HTML, some of the common custom tags get executed a lot. And I mean a lot, a lot! Some of the common HTML tags might get executed hundreds - or even thousands - of times depending on the complexity of the HTML email.
In a previous post, I already looked at eliminating implicit variable access as a means to dramatically improve the performance of "hot control flows" within the ColdFusion custom tag execution. As such, I have to admit that I was a bit demoralized when I saw a new performance problem show-up after I moved the execution of the ColdFusion custom tags into Docker.
Nothing in the ColdFusion Debugger pointed me to an obvious issue; so, I just started trying random things to see if I could figure anything out. And, one of the random things that I tried was switching one of the files from using the
CFImport tag to the
These two tags represent two different approaches to consuming the same ColdFusion custom tags. I've always assumed that these two approaches were functionally equivalent. However, when I switched that one file, the performance for that file shot-up - significantly!
To see if the ColdFusion custom tag invocation choice made a real difference, I setup a standalone test-case in my Docker container. First, I created a simple tag that has no internal processing overhead:
<cfswitch expression="#thistag.executionMode#"> <cfcase value="start"> [ </cfcase> <cfcase value="end"> ] </cfcase> </cfswitch>
Then, I created a test harness that tries to invoke this ColdFusion custom tag 1,000 times using both
<cfimport prefix="tags" taglib="./tags/" /> <!--- Give the FusionReactor profiler time to kick-in. ---> <cfset sleep( 50 ) /> <cftimer type="outline" label="CFModule"> <cfoutput> <cfloop index="i" from="1" to="1000" step="1"> <cfmodule template="./tags/MyTag.cfm"> #i# </cfmodule> </cfloop> </cfoutput> </cftimer> <cftimer type="outline" label="CFImport"> <cfoutput> <cfloop index="i" from="1" to="1000" step="1"> <tags:MyTag> #i# </tags:MyTag> </cfloop> </cfoutput> </cftimer>
When I ran this ColdFusion page a number of times, the results were dramatically and consistently different. On average, here's the latency that I was seeing:
CFModule- 10 - 15 milliseconds
CFImport- 1,200 - 1,500 milliseconds
As you can see, the
CFImport tag execution is two orders of magnitude slower than the
CFModule tag execution.
At the suggestion of Zac Spitzer, I tried looking at the execution of this request in FusionReactor. And, when I looked at the profiler, I immediately understood why a new performance issue was showing up in my Docker container (click for a larger image):
As you can see, only 12ms of time were spent executing the
CFModule portion of the page. But, a whopping 1.45s of time was spent executing the
CFImport portion of the page. Because, as it turns out, every time you execute a
CFImport-based ColdFusion custom tag, the Lucee runtime is checking to see if the corresponding file exists.
The wonderful thing about Docker is that it allows everyone's development environment to look, feel, and behave the same. The unfortunate thing about Docker - at least with Docker For Mac - is that File I/O performance is notably terrible. As such, Docker becomes a choke-point in this scenario when
CFImport performs a File operation (checking file existence) every single time a ColdFusion custom tags gets invoked.
To mitigate some of these issues, I went into my ColdFusion custom tag DSL and converted internal tag consumption to use the
CFModule approach. However, I don't want to lose the syntactic sugar that
CFImport affords me. I certainly don't want to convert this:
<html:p> Hello </html:p>
... into this:
<cfmodule template="/my-custom-tag-dsl/html/p.cfm"> Hello </cfmodule>
As such, I'm going to stick to low-level improvements and then hope that File IO performance in production is better. This is not necessarily "hope as a plan" - I do have good reason to believe that file IO will actually be better in production:
The production machines are much larger, dedicated machines, whereas my local development Docker container is running as VirtualMachine on an otherwise busy laptop.
The production machines are using Kubernetes (K8), not Docker; and, are not running on Docker For Mac.
As such, if I can maintain "good enough" performance locally, I am quite confident that the performance will be perfectly fine in production. And, of course, I'll be deploying everything behind a LaunchDarkly feature flag so that I can safely enable and / or disable the new HTML email generation as needed.
As I apply my ColdFusion custom tag DSL to a real-world scenario, some really helpful insights are bubbling to the surface! I'm refactoring the DSL as I learn new things; which I'll then backport into my DSL proof-of-concept GitHub repository after the hackathon is over.
Hack the planet!
Because of this File IO overhead inside of Docker and Lucee CFML, I've started to explore a two-pass approach wherein I actually generate CFML code that I then execute on all subsequent passes. This limits the file IO overhead to only the first pass:
Still just exploring ideas; but, this is the best idea I think I've come up with so far (to address the performance issues).
Random thought on this. Have you considered loading your custom tags into a ram disk onApplicationStart or something and then trying to use that as the path for your custom tags?
Then lucee would be checking ram on every request instead of the disk.
Might still be horrible, but thought I'd share.
Ok, found your thread in dev.lucee and I see you've been down the ram path.
Yeah, this has been a real kick-in-the-gut for me. It's super
frustrating - I feel like I poured dozens of hours of personal time
into an approach that uses
<cfimport> only to find
out that, due to a series of oddities, may not be viable. I'm still
noodling on this, trying to see if I can find a solution. But, it has
definitely taken the wind out of my sails.
So, I just ran a small test to see what the pragmatic implications of this File IO overhead are in containers. And, I'm both shocked and thrilled to find out that the File IO in my local Docker For Mac setup is 68-times slower than it is in production:
What this means is that my ColdFusion custom tag DSL is still a
viable solution, even with the "buggy" implementation
<CFImport> tag... and I'm back in the game!