Intermittent Bug In serializeJSON() In Adobe ColdFusion 2025
In one of the recent Adobe ColdFusion 2025 updates (maybe 7, maybe 8), I seem to be hitting a strange intermittent bug in the serializeJson() function. It only happens a handful of times a day; and in my recent debugging efforts, I've found that running a small sleep() and then re-trying the call seems to work. This is why I think it's a bug in ColdFusion itself and not in my code.
At the top of every request to my site, I generate a Content Security Policy (CSP) payload. Part of this payload includes a JSON-stringification call:
<cfscript>
	var reportPayload = serializeJson({
		group: "csp-endpoint",
		max_age: 10886400,
		endpoints: [
			{
				"url": reportToUrl
			}
		]
	});
</cfscript>
There's nothing request-specific in this payload. The reportToUrl is a configuration value that never changes. And my site gets hit thousands of times a day with no problem. Except, on 5-6 of those requests, this serializeJson() call throws this nonsensical error:
Invalid argument value for serializeJSON. The SerializeQuery argument can be a boolean or string type only.
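For context, the message appears to refer to the optional second argument of serializeJson() (documented by Adobe as serializeQueryByColumns), which the call above never passes. Here's a minimal sketch of what passing it explicitly looks like. The reportToUrl value is a placeholder, and this only illustrates the argument; I have no evidence that it avoids the intermittent error.

```cfml
<cfscript>
	// Placeholder value (assumption); the real URL comes from configuration.
	reportToUrl = "https://example.com/csp-report";

	payload = {
		group: "csp-endpoint",
		max_age: 10886400,
		endpoints: [
			{
				"url": reportToUrl
			}
		]
	};

	// The second argument is serializeQueryByColumns. It only changes how query
	// objects are serialized, and this payload doesn't contain any.
	writeOutput( serializeJson( payload, false ) );
</cfscript>
```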
After a bunch of failed debugging steps (all of which assumed it was my fault), I finally tried adding a sleep(100) and a retry. In the following code, notice that the serializeJson() call in each try block is identical:
<cfscript>
	var reportPayload = {
		group: "csp-endpoint",
		max_age: 10886400,
		endpoints: [
			{
				"url": reportToUrl
			}
		]
	};
	try {
		// ..... THIS CALL IS IDENTICAL TO NEXT ONE .....
		var reportValue = serializeJson( reportPayload );
	} catch ( any error ) {
		logger.error( "Couldn't JSON CSP data (A).", { reportPayload } );
		sleep( 100 );
		try {
			// ..... THIS CALL IS IDENTICAL TO PREV ONE .....
			var reportValue = serializeJson( reportPayload );
		} catch ( any error2 ) {
			logger.error( "Couldn't JSON CSP data (B).", { reportPayload } );
			rethrow;
		}
	}
</cfscript>
If the error were in my code, I would expect both the logger.error() calls to show up in Bugsnag. However, when I look at my logging after running this all day, here's what I get:
As you can see, only the (A) version of the logging is recorded. After the sleep(100), the repeated call to serializeJson() works without error; and the (B) version never shows up.
What I assume happened is that there must have been some sort of "security fix" introduced to the serializeJson() function which has inadvertently introduced a transient bug of its own. I will open a ticket and link it in the comments.
UPDATE: 2025-05-30
After publishing this yesterday, I went to create a "hotfix" method for serializeJson(). And, while writing that method, I wanted to see if maybe the sleep(100) wasn't actually necessary. So my hotfix method includes two fallback retries, one with a sleep, one without:
component {
	/**
	* HOTFIX: I provide a version of the serializeJson() method that runs multiple
	* attempts on failure. This is to patch an emergent bug in one of the latest ACF
	* updates that seems to have introduced some timing wonkiness.
	*/
	public string function serializeJsonHotfix( required any input ) {
		// Attempt 1: the normal call.
		try {
			return serializeJson( input );
		} catch ( any error ) {
			logger.info( "Serialize JSON hotfix (A)" );
		}
		// Attempt 2: immediate retry, no delay.
		try {
			return serializeJson( input );
		} catch ( any error2 ) {
			logger.info( "Serialize JSON hotfix (B)" );
		}
		// Attempt 3: retry after a short pause; give up if this one fails, too.
		sleep( 100 );
		try {
			return serializeJson( input );
		} catch ( any error3 ) {
			logger.info( "Serialize JSON hotfix (C)" );
			rethrow;
		}
	}
}
As you can see, (B) is just an immediate retry of the same serializeJson() call - no sleep at all. Then (C) is a second retry with a sleep(100). I deployed this version this morning and so far, all I've seen in the error logs are:
Serialize JSON hotfix (A)
Turns out that the immediate retry is sufficient to work around the bug. At least in my particular case.
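If you want something less hard-coded than three copy-pasted try blocks, the same idea can be sketched as a loop. This is only a sketch of the retry pattern, not what I have running in production: the serializeJsonWithRetry() name and the maxAttempts and pauseInMs arguments are all hypothetical, and I've left the logging out to keep the component self-contained.

```cfml
component {

	/**
	* SKETCH: I retry serializeJson() up to maxAttempts times, pausing between
	* attempts only when pauseInMs is greater than zero. The method name and both
	* optional arguments are hypothetical.
	*/
	public string function serializeJsonWithRetry(
		required any input,
		numeric maxAttempts = 3,
		numeric pauseInMs = 0
		) {

		// Always make at least one attempt, even if a bad value is passed in.
		var attempts = max( 1, maxAttempts );

		for ( var attempt = 1 ; attempt <= attempts ; attempt++ ) {

			try {

				return serializeJson( input );

			} catch ( any error ) {

				// Out of attempts: let the most recent error propagate.
				if ( attempt == attempts ) {

					rethrow;

				}

				if ( pauseInMs > 0 ) {

					sleep( pauseInMs );

				}

			}

		}

	}

}
```

With the defaults, serializeJsonWithRetry( reportPayload ) makes up to three back-to-back attempts with no delay, which matches the behavior that has been sufficient for me so far. Passing a pauseInMs of 100 would bring back the sleep between attempts.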
Reader Comments
I've filed issue CF-4232120 in the Adobe Tracker.
I just posted an update that an immediate retry seems to be sufficient. But, the truth is, my log-line (posted above in the "Update") does its own serializeJson() call internally to prepare the payload for Bugsnag. So it's possible that this internal call either:
Provided enough delay.
Rejiggered the internal Java state enough.
Not sure. Either way, I'm now 1000% convinced this is a true ACF bug.
Interesting, as always, Ben. But as you've redacted what's in the logged output of A (in the catch), we're left to assume you've confirmed the value shown IS OK? If you tried to deserialize it on its own, it would work?
If not, do log B inside the nested try (BEFORE its nested deserializejson), just to ensure the results are identical.
If doing a manual deserialize of log A's result does work, that would indeed be odd, of course. But at that point I'd think you should consider just going ahead and doing the logging BEFORE the first deserialize (inside that first try).
Sure, that will be a lot of log lines, but then when you get the LOG A standing out, you can see how the log in that catch compares to the one that led to failure. They'd seemingly HAVE to be different. Or yes it's a very odd bug. :-)
I realize what I've written above may seem confusing to anyone just glancing at it quickly. If you have any doubts about what I'm proposing, Ben, I'd be happy to clarify. It seems worth doing.
Indeed, we can't try it for you ourselves since your code above (and in the bug report) doesn't show what you're passing in. And there may well be something to that (since again we can't see what it is), but doing this logging should confirm things either way for you.
Hope you'll consider it, even just to rule it out.
Oh, and if you're on CommandBox you could of course easily run different CF updates to confirm when things started failing. I get that it may fail for you only in prod, and you may prefer not to run older updates there. But perhaps some standalone test caller could replicate the issue. (If so, that would certainly help Adobe if it was not just about different inputs to the deserialization).
You may even find there's more to all this than meets the eye: if the logged results DO differ (when working vs when next failing), maybe some other change is causing THAT difference. And testing might even show that difference could have started due to some prior change, but is only exacerbated for you by a more recent one.
Sorry for the wall of text. Just thinking things out with you. As you know, I do this sort of thinking things through with folks daily in my troubleshooting consulting. And as I've offered before, if it would help you I'd happily work with you for free in a shared desktop session, in thanks for all you've done and do for the community.
Otherwise hope the above might help.
@Charlie,
All good thinking. And in fact, my thinking on this has been peeling back like an onion. At first, I thought it was the sleep() that was making the difference. But then I realized that my logger.() call would also be serializing the value that just failed to serialize. And that payload was showing up in Bugsnag.
So the sleep wasn't the key. But then I thought maybe it was just the time-delay caused by the logging itself. So what I actually have running in production right now is more like:
... in this version (live right now), I now have back-to-back calls to serializeJson() with nothing in between - no sleep, no intermediary logging. And so far, the logging in the second catch block hasn't shown up yet. And it's been live for several hours.
I feel like this conclusively leads me to believe that there's literally some random error showing up in the serializeJson() call; and that doing the call again works for unknown reasons.
re: CommandBox - unfortunately, the error only showed up a handful of times a day, so it's hard to test. I have to try something, deploy, and wait. I could try to load-test it locally; but not sure it would surface anything.