Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Avoiding Self-Closing IFRAME Tags Using htmlParse() In Lucee CFML 5.3.4.80

By Ben Nadel on
Tags: ColdFusion

Over the past week, I've been working to retrofit Markdown onto all of my old blog content using Lucee CFML. It's been an exciting journey with a lot of trial and error. For example, the other day, I realized the .xmlText property wasn't giving me escaped HTML entities; and, just this morning, I realized that iframe tags with no content were getting re-serialized as self-closing tags. While this is valid for XML - any tag with no children can be self-closing - only certain tags in HTML can be self-closing. And, the iframe is not one of them. As such, I had to re-process all of my posts, ensuring that iframe tags were serialized using both an Open and Close tag in Lucee CFML 5.3.4.80.

To see the issue I was running into, let's look at a stand-alone example. In the following ColdFusion code, we're going to parse an HTML snippet using htmlParse(). And then, simply serialize it back to HTML using toString():

<cfscript>

	```
	<cfsavecontent variable="htmlContent">

		<p>
			Heck, checkout this video:
		</p>

		<p>
			<iframe src="video.mp4"></iframe>
		</p>

	</cfsavecontent>
	```

	// The htmlParse() function parses the HTML into an XML document. The rules for XML
	// documents are different than the rules for HTML documents. This can cause a
	// re-serialization problem for non-self-closing tags with empty-content.
	xmlContent = htmlParse( htmlContent );

	// Because the IFRAME element has no child-nodes, stringification of the XML document
	// will render the IFRAME as a SELF-CLOSING tag. This is valid for XML but is NOT
	// valid for HTML.
	echo( encodeForHtml( toString( xmlContent.html.body ) ) );

</cfscript>

As you can see, the HTML content being parsed contains an iframe tag with no children:

<iframe src="video.mp4"></iframe>

And, when we serialize this using toString(), we get the following markup (I've manually added white-space to make it more readable):

<?xml version="1.0" encoding="UTF-8"?>
<body xmlns="http://www.w3.org/1999/xhtml">
	<p>
		Heck, checkout this video:
	</p>
	<p>
		<iframe frameborder="1" scrolling="auto" src="video.mp4"/>
	</p>
</body>

As you can see, the iframe tag is being serialized as a self-closing tag, in that it now ends with /> rather than with </iframe>. If I were to try to get the browser to render this iframe, the page would break. It wouldn't throw an error; since HTML parsers ignore the trailing slash on non-void elements, the browser would treat <iframe/> as an opening tag with no matching close and simply stop rendering the rest of the page output.

NOTE: Literally, as I am writing this, I just noticed that the htmlParse() method seems to have injected frameborder and scrolling attributes into my iframe tag.

To get around this, I have to force the iframe tag to have at least one child-node. If it has one child node, then the toString() call will correctly render it with the </iframe> closing tag.

The easiest way I can think of to do this is to simply append an empty HTML comment to the iframe content. This shouldn't have any bearing on the visual rendering of the page; but, it will force the iframe tree-fragment to be non-empty. I'm going to do this before I run the HTML content through htmlParse():

<cfscript>

	```
	<cfsavecontent variable="htmlContent">

		<p>
			Heck, checkout this video:
		</p>

		<p>
			<iframe src="video.mp4"></iframe>
		</p>

	</cfsavecontent>
	```

	// In order to get IFRAME tags to re-serialize with the desired, two-tag format, we
	// have to ensure that the IFRAME contains at least one child-node. In this case, we
	// can use the innocuous COMMENT node to force children.
	htmlContent = htmlContent
		.reReplaceNoCase( "></iframe>", "><!-- --></iframe>", "all" )
	;

	// With the inserted COMMENT, our IFRAME element in the resultant XML document will
	// no longer be empty.
	xmlContent = htmlParse( htmlContent );

	// ... which means, when re-serialized, it will render as <iframe>....</iframe>.
	echo( encodeForHtml( toString( xmlContent.html.body ) ) );

</cfscript>

As you can see, before I call htmlParse(), I'm finding any iframe closing tag that butts-up against another tag artifact (angle bracket) and I'm inserting an empty HTML comment. Now, when we re-serialize the content using the toString() function, we get the following markup (again, I've manually added white-space to make it more readable):

<?xml version="1.0" encoding="UTF-8"?>
<body xmlns="http://www.w3.org/1999/xhtml">
	<p>
		Heck, checkout this video:
	</p>
	<p>
		<iframe frameborder="1" scrolling="auto" src="video.mp4"><!-- --></iframe>
	</p>
</body>

As you can see, because we force the iframe tag to have at least one child node, it now gets re-serialized with the </iframe> closing tag.

To be clear, I'm talking about the iframe tag in this case because that's the tag that caused my page-rendering issues. However, this same rule applies to any HTML tag that has no children. Of course, tags like img and meta are allowed to be self-closing and won't be a problem. It just happens that the iframe tag will break the page if it is self-closing.
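If more than one tag needs this treatment, the same comment-insertion trick could be wrapped in a small helper. What follows is only a minimal sketch: padEmptyTags() and its tagNames argument are hypothetical names made up for illustration (not part of Lucee), and the list of tags passed in is an assumption that would need to match whatever shows up in your own content. Like the original replace, it only matches a close tag that sits directly against its open tag:

```cfml
<cfscript>

	// HYPOTHETICAL helper (not a Lucee built-in): inserts an empty comment into any
	// empty <tag></tag> pair whose name appears in the pipe-delimited tagNames list.
	function padEmptyTags(
		required string htmlContent,
		string tagNames = "iframe"
		) {

		// \1 = full open tag, \2 = tag name, \3 = matching close tag.
		var pattern = "(<(#tagNames#)\b[^>]*>)(</\2>)";

		return( reReplaceNoCase( htmlContent, pattern, "\1<!-- -->\3", "all" ) );

	}

	htmlContent = '<p><iframe src="video.mp4"></iframe></p><div></div>';

	// Assumption: iframe and div are the tags we want to protect in this content.
	xmlContent = htmlParse( padEmptyTags( htmlContent, "iframe|div" ) );

	echo( encodeForHtml( toString( xmlContent.html.body ) ) );

</cfscript>
```

Tags whose content is treated as raw text rather than markup (textarea, for example) are probably best left out of the list, since the comment would then show up as literal text.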

Ultimately, there may be other ways to deal with HTML parsing and sanitization; such as by using XSLT and xmlTransform() in ColdFusion. However, htmlParse() feels like a nice combination of ease-of-use and powerful functionality. It just happens that it has caveats that you have to watch out for in Lucee CFML.
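As one such alternative, which is not the approach used in this post and which I have not run against real content, the self-closing tags could be expanded after serialization instead of padded before parsing. This is only a sketch under assumptions: expandSelfClosingTags() is a hypothetical name, and the pattern assumes that no attribute value contains a ">" character:

```cfml
<cfscript>

	// HYPOTHETICAL helper (not a Lucee built-in): rewrites <tag .../> as
	// <tag ...></tag> for the given pipe-delimited tag names in a serialized string.
	function expandSelfClosingTags(
		required string markup,
		string tagNames = "iframe"
		) {

		// \1 = tag name, \2 = attributes (assumes no ">" inside attribute values).
		var pattern = "<(#tagNames#)\b([^>]*)/>";

		return( reReplaceNoCase( markup, pattern, "<\1\2></\1>", "all" ) );

	}

	// Sample input mirroring the self-closing output shown earlier in the post.
	serialized = '<p><iframe frameborder="1" scrolling="auto" src="video.mp4"/></p>';

	echo( encodeForHtml( expandSelfClosingTags( serialized ) ) );

</cfscript>
```

The trade-off is that this operates on the serialized string rather than on the XML document, so it has to run every time the content is stringified.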


