Last year, in the comments of my blog post on cleaning high ASCII values out of text, Eric Stevens suggested that one might use the XML ENTITY tag in an XML document to define named data variables. I had marked this as something I wanted to look into, but I had somewhat forgotten about it until last night, when Clarke Bishop asked about a similar topic. As such, I decided it was time to look at this XML ENTITY tag and see what it does.
In an XML document, the ampersand (&) is a special character. Just as in an HTML document, the ampersand marks the start of an entity reference, which is a placeholder for some other piece of data. Out of the box, there are five predefined internal XML entities: &lt; (<), &gt; (>), &amp; (&), &apos; ('), and &quot; (").
These should look pretty familiar; I am sure that most of us have used at least one or two of these to define the output in our HTML documents.
If you try to parse an XML document in ColdFusion and the XML data contains entities outside this predefined set, you'll probably get an error that looks like this:
An error occured while Parsing an XML document. The entity "mdash" was referenced, but not declared.
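For reference, here is a minimal sketch of the kind of code that should trigger this error. It assumes nothing more than the built-in xmlParse() function; the XML string is made up for illustration and deliberately has no DOCTYPE, so the parser has no definition for "mdash":

```cfml
<cftry>

	<!---
		Hypothetical XML string with no DOCTYPE. The &mdash;
		reference is not one of the five predefined entities,
		so the parser has nothing to replace it with.
	--->
	<cfset xhtml = xmlParse(
		"<data><p>Vice President &mdash; Sales</p></data>"
		) />

	<cfcatch type="any">

		<!--- Output whatever the XML parser reported. --->
		<cfoutput>
			#htmlEditFormat( cfcatch.message )#
			#htmlEditFormat( cfcatch.detail )#
		</cfoutput>

	</cfcatch>

</cftry>
```

The exact wording of the message and detail may differ between ColdFusion versions.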
In this case, the XML parser came across the m-dash entity (&mdash;) and could not find a definition for it. To fix this, we can add our own ENTITY definition to the XML document. This can be done either in an external DTD file or in an internal DOCTYPE declaration.
To demonstrate this concept, let's look at a single-file DOCTYPE example. In the following code, we are defining a DOCTYPE declaration which, itself, defines our &mdash; entity:
<!--- Define our XML document. --->
<cfxml variable="xhtml"><?xml version="1.0" encoding="UTF-8"?>

	<!---
		The document type definition can go inline or in a
		separate file. It has to start with the root element
		and can define entities that are used within the XML.

		NOTE: We are defining the entity "mdash" which will
		allow us to use the entity &mdash; in our XML document
		without getting any parsing errors.
	--->
	<!DOCTYPE data [
		<!ENTITY mdash "--">
	]>

	<!--- The actual XML Data. --->
	<data>
		<h2 class="name">
			Tricia Smith
		</h2>
		<p class="position">
			Vice President &mdash; Sales &amp; Marketing
		</p>
	</data>

</cfxml>

<!--- Output the name and position of the person defined. --->
<cfoutput>

	<!--- Name. --->
	#xhtml.data.h2.xmlText#

	<!--- Position. --->
	(#xhtml.data.p.xmlText#)

</cfoutput>
As you can see, our DOCTYPE is defining the "mdash" entity as representing the double-dash (--). When the XML document gets parsed, all instances of the "&mdash;" entity will be replaced with our double-dash. And, when we run the above code, we get the following output:
Tricia Smith ( Vice President -- Sales & Marketing )
As you can see, the above text actually has two substitutions. The first is our explicitly-defined "mdash" entity; the second is the internally-defined "amp" entity.
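The replacement text does not have to be a double-dash. As a hedged sketch (I have not tested this beyond the basic idea), the entity could instead point at the numeric character reference for a real em dash (&#8212;), and a second entity could act as the kind of named data variable that Eric suggested. The "company" entity and its value are hypothetical, made up purely for illustration:

```cfml
<cfxml variable="xhtml">
	<!DOCTYPE data [
		<!--- Map "mdash" to the real em dash character (U+2014). --->
		<!ENTITY mdash "&#8212;">
		<!--- Hypothetical named value, invented for this example. --->
		<!ENTITY company "Acme Corp">
	]>
	<data>
		<p class="position">
			Vice President &mdash; &company;
		</p>
	</data>
</cfxml>

<!--- Both entities should be expanded by the time we read the text. --->
<cfoutput>
	#xhtml.data.p.xmlText#
</cfoutput>
```

Note that the pound sign in &#8212; is safe here only because the CFXML body is not wrapped in a CFOutput tag; inside CFOutput, it would need to be escaped as ##.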
At this time, that's about all I know about the XML ENTITY declaration. It seems really cool; and, while I was looking into this, I was definitely reminded that XML is a lot more powerful and more robust than I even realize. I wonder how many cool things I could do if I knew more about how XML worked. Oh well - one baby step at a time.