The other day, Dustin Chesterman asked me about an XML parsing error he was seeing. He was getting the "Content is not allowed in Prolog" XmlParse() error. I have blogged about this error before - it is an exception that is thrown when you try to parse XML that has data or white space prior to the encoding declaration or root node. This is often caused when an XML feed does not trim its return value. Usually, passing the content through ColdFusion's Trim() method before calling XmlParse() does the trick; however, in Dustin's case, Trim() didn't seem to be helping.
He was working with Authorize.NET's API, which returns XML responses. Let's take a look at the call that was being made. For demonstration purposes, I am just going to call the Authorize.NET API without any data - this will error on their side, but will return a valid XML response:
<!---
	Call Authorize.NET API. This will fail because we are not passing
	any of the required information, but at least it will return an XML
	result (error message) that we can then use.
--->
<cfhttp
	method="get"
	url="https://apitest.authorize.net/xml/v1/request.api"
	result="objGet"
	/>

<!--- Dump out the results. --->
<cfdump var="#objGet#" label="Authorize.NET Result" />
Running this code, we get the following CFDump output:
If you look at the FileContent key above, you will see that an XML document was returned. And, furthermore, from what you can see, it appears that the first piece of data returned is the encoding:
<?xml version="1.0" encoding="utf-8"?>
But, now, let's try to parse this return value:
<!---
	Parse Authorize.NET response into a ColdFusion XML object. Be sure
	to Trim() the content to get rid of any white space.
--->
<cfset xmlResult = XmlParse( Trim( objGet.FileContent ) ) />
Notice that we are running the objGet.FileContent through ColdFusion's Trim() method before parsing it. Usually, this would take care of any prolog data issues; however, running the above code, we get the following error:
An error occured while Parsing an XML document. Content is not allowed in prolog.
Clearly, there is data there that we are not seeing. Let's loop over the first few characters of the response data to see what is going on:
<cfoutput>

	<!--- Loop over first few characters of response. --->
	<cfloop index="intCharIndex" from="1" to="6" step="1">

		<!--- Get the character in question. --->
		<cfset strChar = Mid( Trim( objGet.FileContent ), intCharIndex, 1 ) />

		<!--- Output char and Ascii values. --->
		[#strChar#] - #Asc( strChar )#<br />

	</cfloop>

</cfoutput>
After running the loop, we can see that there is, indeed, a leading character:
[] - 65279
[<] - 60
[?] - 63
[x] - 120
[m] - 109
[l] - 108
There is a mysterious leading character - 65279.
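The same behavior can be reproduced without calling the remote API. The following is a minimal sketch, not part of Dustin's code: it prepends character 65279 to a small, made-up XML string (strTest and strTrimmed are hypothetical variable names) and then checks what Trim() leaves behind:

```cfml
<!---
	Build a test string that starts with character 65279. The XML
	itself is made up for this demo; it is not an Authorize.NET response.
--->
<cfset strTest = ( Chr( 65279 ) & "<?xml version=""1.0"" encoding=""utf-8""?><test />" ) />

<!--- Trim() the string, just like we did with the real response. --->
<cfset strTrimmed = Trim( strTest ) />

<cfoutput>
	<!--- If Trim() had removed the character, this would be 60 ("<"). --->
	First character code after Trim(): #Asc( Left( strTrimmed, 1 ) )#
</cfoutput>
```

If the character code shown is still 65279, then Trim() does not treat this character as white space, which matches what we saw with the real response.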
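It turns out that this character is not just random data; it's something called a Byte-Order-Mark, and in an XML document, it is used to flag the encoding type of the XML. When you convert this character code into hexadecimal, you get "FEFF", which is the Unicode code point of the Byte-Order-Mark. Keep in mind that 65279 is the character that ColdFusion sees after the response has already been decoded; since the document declares encoding="utf-8", the bytes on the wire were most likely the UTF-8 form of the mark (EFBBBF) rather than UTF-16. If you look on www.opentag.com, you will see the byte sequences that signal each encoding: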
- EFBBBF - UTF-8
- FEFF - UTF-16 (big-endian)
- FFFE - UTF-16 (little-endian)
- 0000FEFF - UTF-32 (big-endian)
- FFFE0000 - UTF-32 (little-endian)
- None of the above - UTF-8
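To make the connection between the character code and the list above more concrete, here is a small sketch. It uses FormatBaseN() to convert the decimal character code to base 16, and CharsetDecode() with BinaryEncode() to show the bytes that this one character becomes in two encodings. The strBOM variable name is hypothetical, and I am assuming that the "utf-16be" charset name is supported by the underlying JRE:

```cfml
<!--- The leading character, as ColdFusion sees it in the decoded response. --->
<cfset strBOM = Chr( 65279 ) />

<cfoutput>

	<!--- Decimal character code converted to base 16 (upper-cased for display). --->
	Character code (hex): #UCase( FormatBaseN( Asc( strBOM ), 16 ) )#<br />

	<!--- The bytes produced when that single character is encoded. --->
	As UTF-8 bytes: #BinaryEncode( CharsetDecode( strBOM, "utf-8" ), "hex" )#<br />
	As UTF-16 (big-endian) bytes: #BinaryEncode( CharsetDecode( strBOM, "utf-16be" ), "hex" )#

</cfoutput>
```

The idea is that the same character maps to different rows in the list depending on which encoding was used to write it out as bytes.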
Unfortunately, ColdFusion does not appreciate the use of this Byte-Order-Mark, or BOM. In order to get this kind of XML feed to play nicely with ColdFusion, we have to remove the BOM before we parse the document. Luckily, getting rid of this requires nothing more than a simple regular expression that strips out all characters before the first angle bracket (<):
<!---
	Parse the return value into a ColdFusion XML document. Remove the
	Byte-Order-Mark (BOM) by stripping all pre-"<" characters.
--->
<cfset xmlResult = XmlParse(
	REReplace( objGet.FileContent, "^[^<]*", "", "all" )
	) />

<!--- Dump out XML response. --->
<cfdump var="#xmlResult#" label="Authorize.NET Clean Response" />
Running this, we get the following CFDump output:
As you can see, with the BOM character easily stripped out, we can now parse the XML data without issue. I don't know much about BOM characters or how often they are used. I assume that, since ColdFusion doesn't play nicely with them, they are NOT common practice; but, I can't really say for sure. Clearly they aren't used everywhere, or I would have come across this issue before. As such, I wouldn't go around implementing this code for every XML feed you encounter - only for those that error out because of it.
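One way to apply the fix only where it is needed is to try the normal parse first and fall back to the stripped version if that fails. This is just a rough sketch of that idea; ParseXmlWithBOMFallback() is a hypothetical helper name, not a built-in ColdFusion function, and it reuses the same "strip everything before the first <" regular expression from above:

```cfml
<cffunction
	name="ParseXmlWithBOMFallback"
	access="public"
	returntype="any"
	output="false"
	hint="Hypothetical helper: parses XML, retrying without leading characters only if the first attempt fails.">

	<cfargument name="Content" type="string" required="true" />

	<cftry>

		<!--- Try the usual approach first: Trim() and parse. --->
		<cfreturn XmlParse( Trim( ARGUMENTS.Content ) ) />

		<cfcatch>

			<!---
				The first parse failed. Strip everything before the
				first "<" (including a BOM) and try one more time. If
				this second parse also fails, the error is raised to
				the calling code as usual.
			--->
			<cfreturn XmlParse(
				REReplace( ARGUMENTS.Content, "^[^<]*", "" )
				) />

		</cfcatch>

	</cftry>

</cffunction>

<!--- Usage, assuming objGet is the CFHTTP result from earlier. --->
<cfset xmlResult = ParseXmlWithBOMFallback( objGet.FileContent ) />
```

The trade-off is that feeds without a BOM never pay for the regular expression, while feeds with one pay for a failed first parse.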