Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Avoiding Self-Closing IFRAME Tags Using htmlParse() In Lucee CFML 5.3.4.80

By Ben Nadel
Tags: ColdFusion

Over the past week, I've been working to retrofit Markdown onto all of my old blog content using Lucee CFML. It's been an exciting journey with a lot of trial and error. For example, the other day, I realized the .xmlText property wasn't giving me escaped HTML entities; and, just this morning, I realized that iframe tags with no content were getting re-serialized as self-closing tags. While this is valid for XML - any tag with no children can be self-closing - only certain tags in HTML can be self-closing. And, the iframe is not one of them. As such, I had to re-process all of my posts, ensuring that iframe tags were serialized using both an Open and Close tag in Lucee CFML 5.3.4.80.

To see the issue I was running into, let's look at a stand-alone example. In the following ColdFusion code, we're going to parse an HTML snippet using htmlParse() and then serialize it back to HTML using toString():

<cfscript>

	```
	<cfsavecontent variable="htmlContent">

		<p>
			Heck, checkout this video:
		</p>

		<p>
			<iframe src="video.mp4"></iframe>
		</p>

	</cfsavecontent>
	```

	// The htmlParse() function parses the HTML into an XML document. The rules for XML
	// documents are different than the rules for HTML documents. This can cause a
	// re-serialization problem for non-self-closing tags with empty-content.
	xmlContent = htmlParse( htmlContent );

	// Because the IFRAME element has no child-nodes, stringification of the XML document
	// will render the IFRAME as a SELF-CLOSING tag. This is valid for XML but is NOT
	// valid for HTML.
	echo( encodeForHtml( toString( xmlContent.html.body ) ) );

</cfscript>

As you can see, the HTML content being parsed contains an iframe tag with no children:

<iframe src="video.mp4"></iframe>

And, when we serialize this using toString(), we get the following markup (I've manually added white-space to make it more readable):

<?xml version="1.0" encoding="UTF-8"?>
<body xmlns="http://www.w3.org/1999/xhtml">
	<p>
		Heck, checkout this video:
	</p>
	<p>
		<iframe frameborder="1" scrolling="auto" src="video.mp4"/>
	</p>
</body>

As you can see, the iframe tag is being serialized as a self-closing tag: it now ends with /> rather than with </iframe>. If I were to try and get the browser to render this iframe, the page would break. It wouldn't throw an error. Instead, since the HTML parser ignores the trailing slash on a non-void element, it would treat <iframe/> as an opening tag, consume everything after it as the iframe's content, and stop rendering the rest of the page output.
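
When re-processing a lot of content, it can help to check whether a serialized document contains this problem at all. The following is a minimal sketch, not a definitive check: the variable names are hypothetical, and the pattern assumes that attribute values never contain a literal > character.

```cfml
<cfscript>

	// Parse and re-serialize a small snippet, the same way the demo above does.
	htmlContent = '<p><iframe src="video.mp4"></iframe></p>';
	serialized = toString( htmlParse( htmlContent ).html.body );

	// Look for an IFRAME tag that ends with "/>". The reFindNoCase() member method
	// returns the position of the first match, or 0 when nothing matches.
	hasSelfClosingIframe = ( serialized.reFindNoCase( "<iframe\b[^>]*/>" ) > 0 );

	echo( hasSelfClosingIframe );

</cfscript>
```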

NOTE: Literally, as I am writing this, I just noticed that the htmlParse() method seems to have injected frameborder and scrolling attributes into my iframe tag.

To get around this, I have to force the iframe tag to have at least one child-node. If it has one child node, then the toString() call will correctly render it with the </iframe> closing tag.

The easiest way I can think of to do this is to simply append an empty HTML comment to the iframe content. This shouldn't have any bearing on the visual rendering of the page; but, it will force the iframe tree-fragment to be non-empty. I'm going to do this before I run the HTML content through htmlParse():

<cfscript>

	```
	<cfsavecontent variable="htmlContent">

		<p>
			Heck, checkout this video:
		</p>

		<p>
			<iframe src="video.mp4"></iframe>
		</p>

	</cfsavecontent>
	```

	// In order to get IFRAME tags to re-serialize with the desired, two-tag format, we
	// have to ensure that the IFRAME contains at least one child-node. In this case, we
	// can use the innocuous COMMENT node to force children.
	htmlContent = htmlContent
		.reReplaceNoCase( "></iframe>", "><!-- --></iframe>", "all" )
	;

	// With the inserted COMMENT, our IFRAME element in the resultant XML document will
	// no longer be empty.
	xmlContent = htmlParse( htmlContent );

	// ... which means, when re-serialized, it will render as <iframe>....</iframe>.
	echo( encodeForHtml( toString( xmlContent.html.body ) ) );

</cfscript>

As you can see, before I call htmlParse(), I'm finding any iframe closing tag that butts-up against another tag artifact (angle bracket) and I'm inserting an empty HTML comment. Now, when we re-serialize the content using the toString() function, we get the following markup (again, I've manually added white-space to make it more readable):

<?xml version="1.0" encoding="UTF-8"?>
<body xmlns="http://www.w3.org/1999/xhtml">
	<p>
		Heck, checkout this video:
	</p>
	<p>
		<iframe frameborder="1" scrolling="auto" src="video.mp4"><!-- --></iframe>
	</p>
</body>

As you can see, because we force the iframe tag to have at least one child node, it now gets re-serialized with the </iframe> closing tag.

To be clear, I'm talking about the iframe tag in this case because that's the tag that caused my page-rendering issues. However, this same rule applies to any HTML tag that has no children. Of course, tags like img and meta are allowed to be self-closing and won't be a problem. It just happens that the iframe tag will break the page if it is self-closing.
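
Since the same serialization rule applies to other empty tags, the comment-injection approach could be extended to a list of tag names. This is a sketch under assumptions: the nonVoidTags list is hypothetical, and it should only include tags in which an HTML comment is harmless (inside a textarea, for example, the comment would show up as visible text). The pattern also assumes that attribute values never contain a literal > character.

```cfml
<cfscript>

	// Sample input with two empty, non-void tags.
	htmlContent = '<div class="spacer"></div><p><iframe src="video.mp4"></iframe></p>';

	// Hypothetical list of tags that must keep a separate closing tag.
	nonVoidTags = [ "iframe", "div", "span" ];

	for ( tagName in nonVoidTags ) {

		// Capture the opening tag (1) and the closing tag (2), allowing only white-space
		// between them, and then place an empty comment between the two.
		htmlContent = htmlContent.reReplaceNoCase(
			"(<#tagName#\b[^>]*>)\s*(</#tagName#>)",
			"\1<!-- -->\2",
			"all"
		);

	}

	echo( encodeForHtml( htmlContent ) );

</cfscript>
```

Unlike the simpler pattern used earlier, this one anchors on the opening tag, so a tag that already has child content is left untouched.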

Ultimately, there may be other ways to deal with HTML parsing and sanitization; such as by using XSLT and xmlTransform() in ColdFusion. However, htmlParse() feels like a nice combination of ease-of-use and powerful functionality. It just happens that it has caveats that you have to watch out for in Lucee CFML.
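
As one such alternative, the self-closing tags could be expanded after serialization instead of padding the input before parsing. This is only a sketch that I'm presenting as an assumption, not as a tested part of the workflow above: the serialized string is a hand-written stand-in for the toString() output, and the pattern assumes that attribute values never contain a literal > character.

```cfml
<cfscript>

	// Stand-in for the serialized XML output shown earlier (hypothetical sample).
	serialized = '<p><iframe frameborder="1" scrolling="auto" src="video.mp4"/></p>';

	// Capture the attributes of any self-closing IFRAME (1) and rewrite the tag with an
	// explicit closing tag.
	repaired = serialized.reReplaceNoCase(
		"<iframe\b([^>]*)/>",
		"<iframe\1></iframe>",
		"all"
	);

	echo( encodeForHtml( repaired ) );

</cfscript>
```

The trade-off is that this works on a string rather than on the XML document, so it has to run on every serialization instead of once on the source content.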


