Using The OWASP Java HTML Sanitizer In Lucee CFML 5.3.7.48 To Sanitize HTML Input And Prevent XSS Attacks

By Ben Nadel
Tags: ColdFusion

Earlier this week, at the Adobe ColdFusion Developer Conference, Charlie Arehart mentioned that the OWASP AntiSamy project was added to Adobe ColdFusion 11. I started using the AntiSamy project back in ColdFusion 10, and hadn't realized that it was now a native part of the ColdFusion runtime. This inspired me to go back and re-read my old post, which reminded me that Matthew Clemente had mentioned yet another relevant OWASP project: the Java HTML Sanitizer. To keep things exciting, I decided to play around a bit with this Java HTML Sanitizer project in Lucee CFML 5.3.7.48.

View this code in my OWASP Java HTML Sanitizer With Lucee CFML 5.3.7.48 project on GitHub.

The OWASP Java HTML Sanitizer project works very much like the OWASP AntiSamy project in that you define a policy that outlines what you want to allow in an untrusted input; and then, you process the input against that policy in order to produce safe, trusted output HTML.

What makes the OWASP Java HTML Sanitizer project nice is that, instead of using an XML file as you do with AntiSamy, your policy is defined in-code using a fluent API. There's nothing wrong with having to use an XML file - it just feels a bit outdated. And, defining your policy in-code means that you get to leverage all of the flexibility that your ColdFusion / Java runtime offers.
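To make the "flexibility" point concrete, here is a minimal sketch (not code from the demo project) of a policy that is assembled conditionally at runtime, which is awkward to express in a static XML file. The function name `buildPolicy()` and the `allowLinks` argument are hypothetical, and the sketch assumes that the sanitizer JAR files are already loadable by `createObject()`:

```cfml
<cfscript>

	/**
	* HYPOTHETICAL helper: I build a sanitization policy, optionally allowing links.
	* ASSUMPTION: the OWASP Java HTML Sanitizer JAR files are already on the class path.
	*/
	public any function buildPolicy( required boolean allowLinks ) {

		// Every policy starts with the same small set of formatting elements.
		var builder = createObject( "java", "org.owasp.html.HtmlPolicyBuilder" )
			.init()
			.allowElements([ "p", "b", "strong", "i", "em" ])
		;

		// Since the policy is just code, we can extend it with normal control flow.
		if ( allowLinks ) {

			builder
				.allowElements([ "a" ])
				.allowUrlProtocols([ "https" ])
				.allowAttributes([ "href" ])
					.onElements([ "a" ])
			;

		}

		return( builder.toFactory() );

	}

	// Two policies generated from the same code path.
	strictPolicy = buildPolicy( false );
	linkPolicy = buildPolicy( true );

</cfscript>
```

The builder is mutated in place by each fluent call, so the conditional block can keep adding to the same `builder` reference before `toFactory()` is called.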

To get this working, I went to the Maven Repository and manually downloaded all of the JAR files necessary for version 20200713.1. I'm sure there's a really easy command-line way to do this; but, I never learned it. Then, once I had all the JAR files stored locally, I used Lucee CFML's ability to create Java classes using JAR paths.
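As an alternative to passing JAR paths on every `createObject()` call, the JAR folder can probably be registered once at the application level. The following is a sketch only: it assumes that `this.javaSettings` behaves in this Lucee version as documented, that all of the downloaded JAR files sit in a single folder next to the `Application.cfc` file, and the application name is hypothetical:

```cfml
// Application.cfc
component {

	// HYPOTHETICAL application name - not from the original demo.
	this.name = "OwaspSanitizerDemo";

	// ASSUMPTION: all of the downloaded JAR files live in this one folder, next to
	// this Application.cfc file.
	this.javaSettings = {
		loadPaths: [
			getDirectoryFromPath( getCurrentTemplatePath() ) & "vendor/owasp-java-html-sanitizer-20200713.1/"
		],
		reloadOnChange: false
	};

}
```

With a setting like this in place, a call such as `createObject( "java", "org.owasp.html.HtmlPolicyBuilder" )` should be able to omit the third (JAR paths) argument.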

Within the OWASP Java HTML Sanitizer, everything is blocked by default. You have to use the policy builder to allow-list specific elements and attributes within your untrusted input. In the following demo, I am allow-listing a few HTML elements and just a handful of attributes. Attributes can either be allow-listed globally; or, locked-down to a specific set of HTML elements.

Here's a simple input / output demo - note that the <a> tag in my first paragraph contains a persisted XSS (Cross-Site Scripting) attack:

<cfscript>

	// This is the untrusted HTML input that we need to sanitize.
	```
	<cfsavecontent variable="htmlInput">

		<p>
			Check out
			<a href="www.bennadel.com" target="_blank" onmousedown="alert( 'XSS!' )">my site</a>.
		</p>

		<marquee loop="-1" width="100%">
			I am very trustable! You can totes trust me!
		</marquee>

		<p>
			<strong>Thanks for stopping by!</strong> <em>You Rock!</em> &amp;
			<blink>Woot!</blink>
		</p>

	</cfsavecontent>
	```

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	Pattern = createObject( "java", "java.util.regex.Pattern" );

	// The Policy Builder has a number of fluent APIs that allow us to incrementally
	// define the sanitization policy. It primarily consists of allow-listing elements
	// and attributes (usually in the context of a given set of elements).
	policyBuilder = javaNew( "org.owasp.html.HtmlPolicyBuilder" )
		.init()
		.allowElements([
			"p", "div",
			"br",
			"a",
			"b", "strong",
			"i", "em",
			"ul", "ol", "li"
		])
		.allowUrlProtocols([ "http", "https" ])
		.requireRelNofollowOnLinks()
		.allowAttributes([ "title" ])
			.globally()
		.allowAttributes([ "href", "target" ])
			.onElements([ "a" ])
		.allowAttributes([ "lang" ])
			.matching( Pattern.compile( "[a-zA-Z]{2,20}" ) )
			.globally()
		.allowAttributes([ "align" ])
			// NOTE: true = ignoreCase.
			.matching( true, [ "center", "left", "right", "justify" ] )
			.onElements([ "p" ])
	;
	policy = policyBuilder.toFactory();

	// Sanitize the HTML input.
	// --
	// NOTE: There's a more complicated invocation of the sanitization that allows you to
	// capture the block-listed elements and attributes that are removed from input. That
	// said, I could NOT FIGURE OUT how to do that - it looks like you might need to
	// write some actual Java code to provide the necessary arguments.
	sanitizedHtmlInput = policy.sanitize( htmlInput );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	```
	<h1>
		OWASP Java Html Sanitizer
	</h1>

	<h2>
		Untrusted Input
	</h2>

	<cfoutput>
		<!--- NOTE: I'm dedenting the indentation incurred by the CFSaveContent tag. --->
		<pre>#encodeForHtml( htmlInput.reReplace( "(?m)^\t\t", "", "all" ).trim() )#</pre>
	</cfoutput>

	<h2>
		Sanitized Input
	</h2>

	<cfoutput>
		<!--- NOTE: I'm dedenting the indentation incurred by the CFSaveContent tag. --->
		<pre>#encodeForHtml( sanitizedHtmlInput.reReplace( "(?m)^\t\t", "", "all" ).trim() )#</pre>
	</cfoutput>
	```

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I load the given Java class using the underlying JAR files.
	*/
	public any function javaNew( required string className ) {

		// I downloaded these from the Maven Repository (manually since I don't actually
		// know how Maven works).
		// --
		// https://mvnrepository.com/artifact/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20200713.1
		var jarFiles = [
			"./vendor/owasp-java-html-sanitizer-20200713.1/animal-sniffer-annotations-1.17.jar",
			"./vendor/owasp-java-html-sanitizer-20200713.1/checker-qual-2.5.2.jar",
			"./vendor/owasp-java-html-sanitizer-20200713.1/error_prone_annotations-2.2.0.jar",
			"./vendor/owasp-java-html-sanitizer-20200713.1/failureaccess-1.0.1.jar",
			"./vendor/owasp-java-html-sanitizer-20200713.1/guava-27.1-jre.jar",
			"./vendor/owasp-java-html-sanitizer-20200713.1/j2objc-annotations-1.1.jar",
			"./vendor/owasp-java-html-sanitizer-20200713.1/jsr305-3.0.2.jar",
			"./vendor/owasp-java-html-sanitizer-20200713.1/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar",
			"./vendor/owasp-java-html-sanitizer-20200713.1/owasp-java-html-sanitizer-20200713.1.jar"
		];

		return( createObject( "java", className, jarFiles ) );

	}

</cfscript>

Once you have your Policy (generated from the Policy Builder), all you have to do is call .sanitize(input) and you get a safe HTML result. In this version of the code, you don't get a report of the elements / attributes that were removed from the input. There's a more complicated version of the sanitization process that uses some sort of an event-emitter (an HtmlChangeListener passed into the .sanitize() call) to track the filtering process. Unfortunately, I couldn't get that to work as it required more Java know-how than I have.
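Since the policy is only needed at the moment of the `.sanitize()` call, one option is to build it once and reuse it across requests. The sketch below is an assumption-laden illustration, not part of the demo: the function name `sanitizeHtml()`, the `application.htmlSanitizerPolicy` key, and the lock name are all hypothetical; it assumes an application scope exists and that the JAR files are already loadable; and it leans on the library's `PolicyFactory` being documented as immutable and thread-safe, which is worth confirming for the version you use:

```cfml
<cfscript>

	/**
	* HYPOTHETICAL helper: I sanitize the given untrusted HTML using a policy that is
	* built once and then cached in the application scope.
	*/
	public string function sanitizeHtml( required string untrustedHtml ) {

		if ( ! structKeyExists( application, "htmlSanitizerPolicy" ) ) {

			// Single-thread the one-time initialization of the cached policy.
			lock name = "htmlSanitizerPolicyInit" type = "exclusive" timeout = 5 {

				if ( ! structKeyExists( application, "htmlSanitizerPolicy" ) ) {

					// ASSUMPTION: the sanitizer JAR files are on the class path.
					application.htmlSanitizerPolicy = createObject( "java", "org.owasp.html.HtmlPolicyBuilder" )
						.init()
						.allowElements([ "p", "b", "strong", "i", "em" ])
						.toFactory()
					;

				}

			}

		}

		return( application.htmlSanitizerPolicy.sanitize( untrustedHtml ) );

	}

	safeHtml = sanitizeHtml( "<p onclick=""alert( 1 )"">Hello <em>world</em></p>" );

</cfscript>
```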

That said, when we run the above ColdFusion code, we get the following output:

As you can see, elements and attributes that were not explicitly allow-listed have been removed. And, some link-spam and opener attack protection was also added.

ASIDE: The noopener noreferrer rel attribute values are meant to protect against an attack pattern known as Reverse Tabnabbing. Isn't protecting a web application fun?!

The OWASP (Open Web Application Security Project) projects are pretty dang amazing! And since they work primarily with Java, it means that pulling them into a ColdFusion or Lucee CFML application is usually a low-effort, high-return endeavor.


