Content Is Not Allowed In Prolog - ColdFusion XML And The Byte-Order-Mark (BOM)

Posted April 22, 2008 at 8:35 AM

Tags: ColdFusion

The other day, Dustin Chesterman asked me about an XML parsing error he was seeing. He was getting the "Content is not allowed in Prolog" XmlParse() error. I have blogged about this error before - it is an exception that is thrown when you try to parse XML that has data or white space before the XML declaration or root node. This is often caused when an XML feed does not trim its return value. Usually, passing the content through ColdFusion's Trim() method before calling XmlParse() does the trick; however, in Dustin's case, Trim() didn't seem to be helping.

He was working with Authorize.NET's API, which returns XML responses. Let's take a look at the call that was being made. For demonstration purposes, I am just going to call the Authorize.NET API without any data - this will error on their side, but will return a valid XML response:

    <!---
        Call Authorize.NET API. This will fail because we are not
        passing any of the required information, but at least it will
        return an XML result (error message) that we can then use.
    --->
    <cfhttp
        method="get"
        url="https://apitest.authorize.net/xml/v1/request.api"
        result="objGet"
        />

    <!--- Dump out the results. --->
    <cfdump
        var="#objGet#"
        label="Authorize.NET Result"
        />

Running this code, we get the following CFDump output:

[Image: CFDump of the Authorize.NET CFHttp response]

If you look at the FileContent key above, you will see that an XML document was returned. And, furthermore, from what you can see, it appears that the first piece of data returned is the encoding:

    <?xml version="1.0" encoding="utf-8"?>

But, now, let's try to parse this return value:

    <!---
        Parse Authorize.NET response into a ColdFusion XML object.
        Be sure to Trim() the content to get rid of any white space.
    --->
    <cfset xmlResult = XmlParse(
        Trim( objGet.FileContent )
        ) />

Notice that we are running the objGet.FileContent through ColdFusion's Trim() method before parsing it. Usually, this would take care of any prolog data issues; however, running the above code, we get the following error:

An error occured while Parsing an XML document. Content is not allowed in prolog.

Clearly, there is data there that we are not seeing. Let's loop over the first few characters of the response data to see what is going on:

    <!--- Loop over first few characters of response. --->
    <cfloop
        index="intCharIndex"
        from="1"
        to="6"
        step="1">

        <!--- Get the character in question. --->
        <cfset strChar = Mid(
            Trim( objGet.FileContent ),
            intCharIndex,
            1
            ) />

        <!--- Output char and Ascii values. --->
        <cfoutput>[#strChar#] - #Asc( strChar )#<br /></cfoutput>

    </cfloop>

After running the loop, we can see that there is, indeed, a leading character:

[] - 65279
[<] - 60
[?] - 63
[x] - 120
[m] - 109
[l] - 108

There is a mysterious leading character - 65279.

It turns out that this character is not just random data; it's something called a Byte-Order-Mark, and in an XML document it is used to flag the encoding type of the XML. When you convert this character code into hexadecimal, you get "FEFF". If you look on www.opentag.com, you will see that this value signals a UTF-16 (big-endian) encoding:

  • EFBBBF - UTF-8
  • FEFF - UTF-16 (big-endian)
  • FFFE - UTF-16 (little-endian)
  • 0000FEFF - UTF-32 (big-endian)
  • FFFE0000 - UTF-32 (little-endian)
  • None of the above - UTF-8
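
To double-check the arithmetic, here is a minimal sketch that converts the character code reported by the loop above into hexadecimal using FormatBaseN(). The variable names are my own and are only for illustration:

```cfml
<!--- Character code that Asc() reported for the leading character. --->
<cfset intBomCode = 65279 />

<!--- FormatBaseN() returns lowercase letters, so upper-case the result. --->
<cfset strBomHex = UCase( FormatBaseN( intBomCode, 16 ) ) />

<cfoutput>
    #intBomCode# in hexadecimal: #strBomHex#<br />
</cfoutput>
```

This should output FEFF, which is the value to look up in the list above.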

Unfortunately, ColdFusion does not appreciate the use of this Byte-Order-Mark, or BOM. In order to get this kind of XML feed to play nicely with ColdFusion, we have to remove the BOM before we parse the document. Luckily, getting rid of this requires nothing more than a simple regular expression that strips out all characters before the first bracket:

    <!---
        Parse the return value into a ColdFusion XML
        document. Remove the Byte-Order-Mark (BOM) by
        stripping all pre-"<" characters.
    --->
    <cfset xmlResult = XmlParse(
        REReplace( objGet.FileContent, "^[^<]*", "", "all" )
        ) />

    <!--- Dump out XML response. --->
    <cfdump
        var="#xmlResult#"
        label="Authorize.NET Clean Response"
        />

Running this, we get the following CFDump output:

[Image: CFDump of the Authorize.NET XML response parsed into a ColdFusion XML document]

As you can see, with the BOM character easily stripped out, we can now parse the XML data without issue. I don't know much about BOM characters or how often they are used. I assume that since ColdFusion doesn't play nicely with them that they are NOT common practice; but, I can't really say for sure. Clearly they aren't used everywhere or I would have come across this issue before. As such, I wouldn't go around implementing this code for every XML feed you encounter - only for those that error out because of it.
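
If you only want to strip the prolog for feeds that actually error out, one option is to try the normal parse first and fall back to the stripped version. This is just a sketch of that idea, assuming objGet is the CFHttp result from earlier; it catches any parsing error rather than this specific one, so treat it as a starting point:

```cfml
<cftry>

    <!--- First attempt: parse the trimmed content as-is. --->
    <cfset xmlResult = XmlParse( Trim( objGet.FileContent ) ) />

    <cfcatch type="any">

        <!---
            Fallback: strip everything before the first "<" and
            try again. If this parse also fails, the error is
            allowed to propagate to the caller.
        --->
        <cfset xmlResult = XmlParse(
            REReplace( objGet.FileContent, "^[^<]*", "", "one" )
            ) />

    </cfcatch>

</cftry>
```

Because the pattern is anchored with "^", it can only match once, at the start of the string.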


Reader Comments

Apr 22, 2008 at 9:08 AM // reply »
30 Comments

as i recently told somebody in the forums having the same issue (maybe the same guy?) it appears that a BOM is valid in XML & any parser (including cf's) should be able to handle this. looks like a bug in xmlParse().


Apr 22, 2008 at 9:26 AM // reply »
30 Comments

ah i must be getting old, just noticed that you got the UTF-16 BOM call right (i called it UTF-8 BOM in the forums).

that makes authorize.net a lying so & so, it declared the xml encoding to be UTF-8, yet it supplied a UTF-16 BOM. which might mean xmlParse() is actually bombing because the BOM is lying & just telling us the wrong error???


Apr 22, 2008 at 9:36 AM // reply »
7,572 Comments

@PaulH,

Interesting point. I didn't even notice that when I was checking this stuff out. I don't know how parsing works, but it seems like if the encoding was misleading that could lead to serious errors. But, from the error that ColdFusion is throwing, it looks like it is having trouble just kicking off the parsing. If there is "bad" data in the prolog, I am not sure if it would even get to the tag-based encoding.

I guess this is some sort of bug, if this is following standards.


Apr 22, 2008 at 11:11 AM // reply »
10 Comments

On a side note - given your regular expression you can change your scope attribute on the REReplace call to "one" or leave blank as it defaults to one. The nature of your expression will catch all the characters prior to the opening chevron.

Is it necessary? Yes, and noticeably depending on the size of your document. In putting together a quick example using a moderately sized XML, ColdFusion registered 0ms when using "one" as the scope and 16ms when using the "all" scope.

If I've learned anything in working with regular expressions, it's that you should always be mindful of performance. Once you find a regular expression that works - try to refactor a more efficient one. You can use free tools (donate-ware) like Regex Coach to help build and step through your expressions.
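
A quick way to confirm that the two scopes produce the same result for this pattern is to run both against a small sample. This is only a sketch; the sample string and variable names are made up for illustration:

```cfml
<!--- Hypothetical sample: a BOM character followed by a small XML string. --->
<cfset strXml = Chr( 65279 ) & "<root><item>a</item><item>b</item></root>" />

<!--- Same anchored pattern, once with each scope. --->
<cfset strOne = REReplace( strXml, "^[^<]*", "", "one" ) />
<cfset strAll = REReplace( strXml, "^[^<]*", "", "all" ) />

<cfoutput>
    Same result: #YesNoFormat( Compare( strOne, strAll ) EQ 0 )#<br />
    First character code: #Asc( Left( strOne, 1 ) )#<br />
</cfoutput>
```

Since the pattern is anchored to the start of the string, I would expect both calls to return the same string, beginning with "<" (character code 60). Any timing difference will depend on the size of the document.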


Apr 22, 2008 at 11:12 AM // reply »
10 Comments

#Replace(previousComment, "necessary", "faster")#


Apr 22, 2008 at 11:28 AM // reply »
7,572 Comments

@Shayne,

It's funny that you bring that up cause as I was writing the RegEx, that thought did pop into my mind, but I ignored it. I am just used to writing "all". But you are correct - one should be mindful of their regular expressions and "one" was more my *intent*.

Also, RegEx Coach rocks :) I have it in my quick-launch at all times.


Apr 22, 2008 at 12:13 PM // reply »
7 Comments

@All...
So which is it? A CF bug or not properly formatted XML response? Ben, thanks for this post. You helped me in the past with this but now I have a better understanding as to what is going on.


Apr 22, 2008 at 12:46 PM // reply »
21 Comments

Ben, you are wicked smart. :)

If anyone is nerd enough to seek further reading, I highly recommend Wikipedia's information on this subject. I just looked up byte order marks and endianness (big-endian vs. little-endian) and I learned a ton.


Apr 22, 2008 at 1:01 PM // reply »
7,572 Comments

@David,

Thanks :) To be honest, I don't even know that much about encoding at all. I just use the default encoding (probably not the best practice). A weakness in my brain!


Apr 22, 2008 at 7:27 PM // reply »
30 Comments

@ben, "just use unicode" is all the encoding advice anyone needs.

@dv, both. that xml is lying through its teeth (it was actually utf-8) & i just tested w/real utf-8 & utf-16 xml & both bombed xmlParse() when a BOM was included.


Apr 23, 2008 at 3:32 AM // reply »
2 Comments

Thanks for the post.

In the past (CF7 for sure, probably CF8.0 as well) we had successfully parsed some XML documents starting with a UTF-8 BOM. After upgrading to CF 8.0.1 we also got this error "Content Is Not Allowed In Prolog" when parsing such documents. So it seems like a bug in CF 8.0.1 to me, but I didn't investigate further. Could someone confirm if this was still OK in 8.0 and got broke in 8.0.1?


Apr 23, 2008 at 7:36 AM // reply »
7,572 Comments

@Thilo,

I can confirm that my example (in the post above) was done in ColdFusion 8.0.1 and failed to parse the UTF-16 BOM.


Apr 24, 2008 at 12:08 AM // reply »
92 Comments

Thilo, I can test tomorrow using Ben's example. I'll let you know what I find out.


Apr 24, 2008 at 4:41 PM // reply »
92 Comments

Thilo,

This error also occurs on CF8 version: 8,0,0,176276. I ran Ben's sample code and got the same "Content Is Not Allowed In Prolog" error. I also tried his sample on outputting the first few char codes and I got the same output. Hope this helps.


Apr 24, 2008 at 5:52 PM // reply »
7,572 Comments

@Javier,

Way to help us double-team this problem :)


Apr 24, 2008 at 9:00 PM // reply »
92 Comments

No problem man! You did the hard part though! Working up the effort to write all that code. :) Did a good old copy and paste on our DEV server which runs CF8 (my local runs the latest 8.0.1) so figured I'd help out. Glad to do my part!


Apr 25, 2008 at 4:23 AM // reply »
2 Comments

Thanks Ben & Javier!

Seems I have to look a little further into CF XML parsing to get around this error, which in our case is related to some special characters and does not occur every time. (some XML documents including a BOM got successfully parsed, some not)
I'll post a follow-up when I know more...


May 2, 2008 at 12:08 PM // reply »
39 Comments

This does seem like a bug - but not with xmlParse, rather with cfhttp which is preserving the BOM in the response. When a string parser reads a string under a specific encoding, it is not supposed to store the BOM as a character within that string.

Other string functionality (such as cffile) handle this correctly. For example, try saving the cfhttp.filecontent, then use [cffile action="read" charset="utf-8"] on it, and pass that to xmlParse - you will not have a problem.

So the issue is that however cfhttp is parsing response strings, it's failing to properly handle the BOM, and returning it as if it were part of the string - which it's not.
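
The round trip described above might look something like the following. This is an untested sketch that takes the claim about cffile at face value; the temp file name is hypothetical and objGet is assumed to be the CFHttp result from the post:

```cfml
<!--- Hypothetical scratch file; any writable path would do. --->
<cfset strTempFile = GetTempDirectory() & "authnet-response.xml" />

<!--- Write the raw response out as UTF-8. --->
<cffile
    action="write"
    file="#strTempFile#"
    output="#objGet.FileContent#"
    charset="utf-8"
    addnewline="no"
    />

<!---
    Read it back in. Per the description above, the BOM should
    be discarded once character decoding is complete.
--->
<cffile
    action="read"
    file="#strTempFile#"
    variable="strXml"
    charset="utf-8"
    />

<cfset xmlResult = XmlParse( strXml ) />
```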

This is probably faster than the regular expression:
<!--- Remove BOM from the start of the string, if it exists --->
<cfif Left(xmlText, 1) EQ chr(65279)>
<cfset xmlText = mid(xmlText, 2, len(xmlText))>
</cfif>


May 2, 2008 at 11:07 PM // reply »
30 Comments

1) some of my tests used an xml string w/a BOM directly, no cfhttp was involved.

2) more importantly, as far as i can tell the W3C says xml parsers *have* to understand BOMs. period (see #1).

3) your cffile test doesn't apply. cffile doesn't write a BOM out in the first place.


May 3, 2008 at 9:04 AM // reply »
39 Comments

I can understand where your confusion comes from, byte order markers are not described in incredible detail, because their use is largely becoming out of date.

I started writing a lengthy comment discussing the virtues of preserving vs discarding BOM, what my own research has revealed, etc, but decided this was getting off-topic to this discussion (the topic of this discussion being how to handle the disconnect between cfhttp preserving BOM and xmlParse expecting it to have been discarded).

A specific reply to PaulH:
1) If you author a string from within ColdFusion with a BOM, of course it's going to have the BOM; you've created it outside the character decoding step that BOM is designed for.

2) I'll address this in my blog.

3) The point isn't whether cffile writes a BOM, it's whether it reads a BOM then discards it after character decoding is complete (it does) - behavior which is inconsistent with cfhttp. Since discarding BOM has to be intentional, while preserving BOM could easily be accidental, it's my belief that Adobe intends to discard BOM. As to whether BOM should be discarded - that's discussed in my blog too.

You can read my full response at http://www.bandeblog.com/2008/05/bom-is-it-part-of-data.html


May 3, 2008 at 9:07 AM // reply »
39 Comments

And sorry, I didn't mean to say, "your confusion" as if I necessarily am the authority on everything, that's what happens I guess when I write a long comment here, then snip it to little pieces to try to avoid going totally OT here.


May 3, 2008 at 11:42 AM // reply »
30 Comments

1) if an xml stream has a BOM, xmlParse() or whatever is supposed to be able to handle it (as far as i can tell). doesn't matter where it's created. according to the unicode standard, a BOM is not part of the text.

2) can you cite references for your opinion?

3) oops, you're right, reading too quick, for utf-8 a BOM is entirely optional (it really has no use as far as endianness goes for utf-8) but many s/w use it as a hint that the following content is utf-8 (notepad for instance). in fact now that i reread the section on "Unicode Encoding Schemes", a BOM is always optional (though i swear it was required for utf-16/32 in earlier unicode versions), see: http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#G7404


May 3, 2008 at 2:09 PM // reply »
39 Comments

1) the XML 1.0 standard says that when reading a binary stream, BOM is useful to indicate endianness and should be interpreted and discarded (for example, in no language, under no XML DOM, can you identify from a parsed DOM whether it started with a BOM or not, once parsing of the string is done, this information is discarded). Once the byte stream has been converted to a character array, it no longer serves a purpose (ala java's bytea.toCharArray() ). It's not part of the DOM, only a hint to correctly parse the bytes making up the data.

BOM is only significant in a byte array/stream, not a character array. I think you may be confusing multi-byte string encoding with post mb-string decoded data.

2) as stated in my blog, I tried to find an authoritative source for or against, and in fact there are none that I could find. It seems to be as long as you're maintaining a non-character-decoded byte stream (eg, a byte array), you preserve BOM, but again, once you convert byte stream -> character array, it no longer serves a purpose. When you go to convert Char[] back to Byte[] for writing to a file or sending to someone else, you have to use some kind of encoding (most people use UTF-8 any more), and you may want to write a new BOM if you think there's a chance the consumer of your byte stream might not know your byte order or encoding.

3) I want to clarify a statement you made here, "but many s/w use it as a hint that the following content is utf-8" - actually BOM has nothing really special to do with utf-8 other than that utf-8 has a unique representation of BOM that other character encodings don't. If we look for UTF-8's BOM at the start of a byte stream, and find it, chances are pretty good (but not guaranteed) that it's encoded as UTF-8.

In UTF-16BE (big-endian), BOM (U+FEFF) is encoded as 0xFE 0xFF. In UTF-16LE (little-endian), BOM is encoded as 0xFF 0xFE. UTF-16, as you probably know, uses two bytes for every character. UTF-32 of course uses 4 bytes for every character, so UTF-32BE's BOM is 0x00 0x00 0xFE 0xFF, while UTF-32LE's BOM is 0xFF 0xFE 0x00 0x00

UTF-8, as you probably also know, is a variable-width character encoding; characters under U+0080 are encoded with a single byte, characters from U+0080 and over are encoded with two or more bytes. Specifically how that encoding happens is actually covered in a scheduled blog entry which appeared earlier this morning as a followup to my Unicode post yesterday. U+FEFF is represented in UTF-8 as a three-byte character: 0xEF 0xBB 0xBF. However, once a string is parsed, U+FEFF is not typically represented in memory as 0xEF 0xBB 0xBF. In Java, it's essentially represented as 65279 (a number of type int [32 bits] whose hex representation of course is \x0000FEFF).

This is the difference between a byte array and a character array. A character array is effectively an array of ints (32 bits) (not quite, but close enough for argument's sake), while a byte array is an array of bytes (8 bits). If you read UTF-8 encoded data with a leading BOM into a byte array, the first three elements will be \xEF \xBB \xBF. If you read the same string into a character array (assuming BOM is preserved) the first element would be 65279 (or \xFEFF). If you re-read that same exact byte stream into a character array but decode it as UTF-16BE, the first two elements will be \xEFBB \xBF?? (where ?? is the hex value of the first byte following BOM). Parsed as UTF-32BE, the first element would be \xEFBBBF??.
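
The byte-versus-character distinction can be seen from within ColdFusion itself. Here is a minimal sketch, assuming the CharsetDecode() and BinaryEncode() functions are available in your version; the variable names are only for illustration. It takes the single in-memory character U+FEFF and encodes it into bytes under three different encodings:

```cfml
<!--- One character in memory: the BOM, U+FEFF. --->
<cfset strBom = Chr( 65279 ) />

<!--- Encode that one character into bytes, then show the bytes as hex. --->
<cfset strUtf8Hex = UCase(
    BinaryEncode( CharsetDecode( strBom, "utf-8" ), "hex" )
    ) />
<cfset strUtf16BeHex = UCase(
    BinaryEncode( CharsetDecode( strBom, "utf-16be" ), "hex" )
    ) />
<cfset strUtf16LeHex = UCase(
    BinaryEncode( CharsetDecode( strBom, "utf-16le" ), "hex" )
    ) />

<cfoutput>
    Characters in string: #Len( strBom )#<br />
    UTF-8 bytes: #strUtf8Hex#<br />
    UTF-16BE bytes: #strUtf16BeHex#<br />
    UTF-16LE bytes: #strUtf16LeHex#<br />
</cfoutput>
```

This should report one character, with byte sequences EFBBBF, FEFF, and FFFE respectively, matching the encodings described above.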

It's useful in UTF-8 as a hint that the data may be encoded as UTF-8, because in Unicode, U+EFBB is a reserved character, and should not show up in any normal plain text stream. However although it's convenient, it doesn't guarantee anything in the context of UTF-8, as pointed out by John Boyer here: http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0356.html . Basically when you don't know the encoding of the text, it can help you guess, but it's quite possible for it to be wrong, and so shouldn't be relied on if it can be helped (this is perhaps why it's not mandatory to start every UTF-8 encoded stream with BOM).

This still all boils down to: when converting Byte[] into Char[], BOM may help to correctly decode Byte[], but most software doesn't preserve the BOM since BOM was probably added by the string encoding subsystem and wasn't a part of the original data.

Finally,
If you think about it, it makes sense to silently discard BOM. BOM is only a BOM if it is the first character in a stream, and contributes nothing to the in-memory representation of a decoded string. Its only purpose is to help properly decode the string. Also, if you had two strings which started with BOM, and concatenated them together, you would be introducing a BOM into the middle of the string, where it does not belong (see http://unicode.org/faq/utf_bom.html#38 - "What should I do with U+FEFF in the middle of a file?"). Preserving BOM in the in-memory representation of a decoded string means every string concatenation and every string operation would first need to detect if the leading character is BOM, which would be a huge and needless waste of resources. Much better, since by this point its entire contribution to the string has been fulfilled, to discard it and recreate it later if we need it.

CF's strings are closer to Char[], not Byte[], if you create a string within CF which starts with U+FEFF, you're effectively setting the first array element to Character(\xFEFF), CF won't stop you, nor should it, nor should it discard it once you've created it there for performance reasons. If you convert to Byte[] and back again (which requires you specify some character encoding for both directions), you'll probably discover it disappears.

That said, it would still be nice if xmlParse() silently ignored a leading BOM just for corner cases like this; but I still believe the fault is with cfhttp for not properly decoding the string in the first place.


May 3, 2008 at 8:21 PM // reply »
30 Comments

1) xmlParse() still has to be able to handle BOMs ie. you can pass it a file name, maybe you forgot about that option? i'm still arguing that this is a bug in xmlParse().

3) as far as "clarifying my statement", it doesn't--many s/w still use a BOM as an encoding hint no matter your opinion. as for the rest, please tell me something i don't already know.


May 4, 2008 at 8:49 AM // reply »
39 Comments

I'm not sure there's a need to get hostile, but maybe I'm reading too much into it.

1) I haven't forgotten about xmlParse()'s ability to take a filename as an argument. Indeed, xmlParse(ExpandPath('file_with_bom.xml')) works correctly, meaning xmlParse() is compliant with the XML standard when dealing with byte streams (which is the context in which the XML standard talks about BOM). Further evidence this is a bug with cfhttp.

3) right, they write a bom to help other systems read the text - but the point is the software prepends the actual data with the bom, just like a http response is prepended with the http headers. But you don't get http headers back as part of the cfhttp.filecontent. It's metadata, it's not actual data.

If I choose UTF-8 as the encoding when saving a file in Notepad, it does indeed write a BOM as you suggest. But as I suggest, when I close and re-open that file, the BOM is not preserved. It's added by the character encoding routine, and stripped by the decoding routine. When you do decode(encode(something)) you should get exactly the same value back as you passed into it, which wouldn't be the case if BOM was preserved. BOM isn't part of the data, it's part of the encoding of that data.


May 13, 2008 at 11:42 AM // reply »
1 Comments

Life saver.

Thanks a bunch.

Chris.


Jul 25, 2008 at 12:58 PM // reply »
92 Comments

Believe it or not I finally got this error! I applied the fix provided by one of the comments. I replace the first character or two if the match is met that the first character is chr(65279). Since you had the issue with Authorize.NET I think it's just in general a .NET issue. Here at my job we build RESTful web services so we all build services in a variety of technologies. I was interacting with one built in .NET providing that BOM. Hope this helps others!


Jul 25, 2008 at 1:01 PM // reply »
7,572 Comments

@Javi,

Glad you both got and contributed some value here :) Sweeet.


Sep 4, 2008 at 12:19 AM // reply »
5 Comments

Glad I remembered reading this post a while back, just ran into this issue and your post saved me hours of debugging. For the interest of everyone else, I ran into this issue when reading an XML file saved as utf-8 out of a .zip file using the cfzip tag and passing the XML string into XMLParse. Stripping out the BOM cleared the issue up.

Thanks Ben!


Sep 4, 2008 at 8:27 AM // reply »
7,572 Comments

@Ryan,

Glad you found some value.


Sep 19, 2008 at 4:36 PM // reply »
9 Comments

Ben thanks, and like the other guy said you are wicked smart!

Dave


Nov 6, 2008 at 1:59 PM // reply »
1 Comments

Thanks. This is exactly the solution I was looking for.


Jan 4, 2009 at 9:59 PM // reply »
4 Comments

Thanks for the great post Ben. Your advice helped me out while I was adding additional feeds for a new health section on nobosh.com

Thanks again. I hope we can chat sometime.

Brett
http://nobosh.com


Jan 8, 2009 at 11:55 AM // reply »
2 Comments

I've been banging my head against the wall with this prolog issue the past few days.

When I dump the xml similar to above I get:
[<] - 60
[?] - 63
[x] - 120
[m] - 109
[l] - 108

If I run the reg exp above it changes the error to a footer error. Trim() isn't doing anything. Any other ideas here?


Apr 1, 2009 at 10:25 AM // reply »
1 Comments

Just came across this issue myself, thanks for the blog post :)


Jun 9, 2009 at 12:37 PM // reply »
3 Comments

Thanks Ben, saved me from a headache!


Jun 16, 2009 at 4:34 PM // reply »
2 Comments

Thanks Ben, although your exact example wasn't the issue I was experiencing it helped me think outside the box and solve my issue.


Jul 6, 2009 at 12:32 PM // reply »
5 Comments

Ben, I am running across similar. I think this is isolated to CF7, but not sure.

Anyway, when I do the above fix, the 'An error occured while Parsing an XML document. Content is not allowed in prolog' error goes away, but then I get the Premature end of file.

Any ideas on this would be helpful. Fun stuff. :)


Jul 7, 2009 at 8:08 AM // reply »
7,572 Comments

@Bret,

The premature end of file is usually associated with web services. Are you performing a web service call?


Jul 7, 2009 at 1:05 PM // reply »
5 Comments

@ben. yes, i was. i actually figured out my problem, too. everything mimicked what you had above, but was coming in from google api when trying to return contacts. everything worked on CF8 but not CF7. i eventually figured the problem to be because they use different default charsets, so i had to specify in my cfhttp which to use. once i did that, it worked everywhere (knock on wood).

i was getting end of file, because once i cleaned off the BOM, the content was empty. hence...end of file.

sometimes it's the smallest things that take the longest to figure out. but i got it. and you helped, so thanks for posting this!
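
For anyone hitting the same combination, here is a rough sketch of what specifying the charset and guarding against an empty body might look like. The URL, variable names, and error type are placeholders, not the actual code from the comment above:

```cfml
<!--- Hypothetical feed URL; substitute the service you are calling. --->
<cfset strFeedUrl = "https://example.com/feed.xml" />

<!--- Explicitly state the charset rather than relying on the default. --->
<cfhttp
    method="get"
    url="#strFeedUrl#"
    charset="utf-8"
    result="objGet"
    />

<!--- Strip everything before the first "<" (BOM, white space). --->
<cfset strXml = REReplace( objGet.FileContent, "^[^<]*", "", "one" ) />

<cfif Len( strXml )>

    <cfset xmlResult = XmlParse( strXml ) />

<cfelse>

    <!--- Nothing left to parse; this is the "premature end of file" case. --->
    <cfthrow
        type="EmptyXmlResponse"
        message="The response contained no XML content after cleaning."
        />

</cfif>
```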


Jul 7, 2009 at 1:15 PM // reply »
7,572 Comments

@Bret,

Oh nice! Glad you got it worked out. Character sets are something I would like to have more of a mastery over.


Sep 23, 2009 at 4:00 PM // reply »
1 Comments

Thanks for the tip, Ben. I found your page when I Googled the error message. Your fix worked well. I added your name and a link to this page to the comments in my CF page (internal to NASA) so that future developers may know of your contribution.


Sep 24, 2009 at 9:19 AM // reply »
7,572 Comments

@Ken,

Awesome! Glad to help; hey, do you know Kyle Dodge by any chance? He's a FLEX / CF guy working with you guys (NASA).


Oct 28, 2009 at 1:17 PM // reply »
1 Comments

Just wanted to quickly thank you (again) for posting this. Had the EXACT problem you describe (with Authorize.net BTW) and googled it - found this page and fixed inside 30 seconds...awesome!


Ed
Oct 28, 2009 at 2:03 PM // reply »
1 Comments

Just been doing something similar with CF parsing Unicode data from SQLServer 2005. If you're doing Unicode replacements db-side, watch for this.

There's an issue with SQLServer's REPLACE function and handling of certain high Unicode values.

Example 1: works as expected.
SELECT REPLACE(N'test' + NCHAR(65500), NCHAR(65500), '')

Example 2: no REPLACE() occurs.
SELECT REPLACE(N'test' + NCHAR(65533), NCHAR(65533), '')

Example 3: collate as binary to perform the REPLACE() to work around the issue.
SELECT REPLACE(N'test' + NCHAR(65533) COLLATE Latin1_General_BIN, NCHAR(65533) COLLATE Latin1_General_BIN, '')

Reference:
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=385082

Hope this helps someone.


Oct 31, 2009 at 2:22 PM // reply »
7,572 Comments

@Ed,

Very interesting. Extended characters is just a universe that I don't have a good handle on yet. It seems to very rarely be an issue; but I am sure when it comes up, I will need to be more prepared.


Nov 23, 2009 at 11:05 AM // reply »
4 Comments

Thankyou Ben!! Coldfusion wouldn't be the same without you!

Any thoughts as to why someone would start an XML document with "k", as in "k<roottag>..."


Nov 23, 2009 at 11:14 AM // reply »
7,572 Comments

@Pete,

That's odd. That could be a typo? Or maybe some sort of security / obfuscation technique?


Nov 23, 2009 at 11:24 AM // reply »
4 Comments

I'm assuming it's a typo in the XML but I've considered it might be a security thing, I'll post up anything I find out.


Nov 23, 2009 at 11:37 AM // reply »
7,572 Comments

@Pete,

Ok cool - let us know what you find out.


Nov 26, 2009 at 5:42 AM // reply »
2 Comments

Thanks for this post Ben, just found it really really helpful!


Dec 11, 2009 at 12:25 PM // reply »
1 Comments

@Bret, I am having the same problem as you, except trying to consume data from a FMS Admin API. Could you share what you changed the charset to for your request? I have tried utf-8, which is the default for CF8, but I still have the same problem.


Jan 12, 2010 at 11:07 PM // reply »
1 Comments

YAY YAY YAY!
thanks dude,u r a life saver
cheers from NY


Jan 13, 2010 at 9:52 AM // reply »
7,572 Comments

@Arun,

Glad to help... also from NY (NYC).


Mar 19, 2010 at 12:55 PM // reply »
4 Comments

Thank you! Thank you! Thank you!

One more additional bit to add to this: in addition to the, "Content is not allowed in Prolog," error solved by Ben's REReplace, I was also getting, "An invalid XML character (Unicode: ... ) was found in the element content of the document." My first attempt to fix this was to use http://cflib.org/udf/xmlFormat2 but it seemed pretty slow on large amounts of XML. Then, based on one of the comments above, I added charset="utf-8" to the cfhttp I'm using to fetch the XML and, BOOM, no more invalid XML characters!

Thank you! Thank you! Thank you!

