Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Using The OWASP AntiSamy 1.5.7 Project With ColdFusion 10 To Sanitize HTML Input And Help Prevent XSS Attacks

By Ben Nadel
Tags: ColdFusion

For the past few days, I've been working to enable Markdown for my blog comments. Of course, the second I enable Markdown, I allow my readers to submit a wider variety of content. In order to ensure that said content doesn't contain malicious or ill-advised code, I wanted to add a subsequent layer of validation and sanitization. I took a look at OWASP (Open Web Application Security Project) to see what they recommend, which is where I discovered the OWASP AntiSamy Project. AntiSamy allows untrusted HTML to be evaluated and sanitized using a custom security Policy. Unfortunately, loading AntiSamy 1.5.7 (the latest version at the time of this writing) into a ColdFusion application isn't effortless. It requires Mark Mandel's JavaLoader project and a few Class Loading shenanigans.

View this code in my AntiSamy 1.5.7 With ColdFusion 10 project on GitHub.

First off, I want to give a special shout-out to Matthew J. Clemente and his post about AntiSamy 1.5.3. He set me down the right path. I just needed to work out the differences between his use of 1.5.3 and my use of 1.5.7 - which, ironically, uses a non-breaking semver (Semantic Versioning) version number that clearly causes breaking changes of some sort.

That said, the OWASP AntiSamy project uses an XML-based security Policy file to evaluate and sanitize untrusted, user-provided HTML. The XML policy file can be very relaxed; or, it can be very strict. The project contains a few sample XML files that are based on some real-world context. Of course, you can create your own Policy file with whichever rules you feel make sense.

For example, I created one for this demo that is very strict, and strips out all but the most basic text formatting tags:

    <?xml version="1.0" encoding="UTF-8" ?>
    <anti-samy-rules
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="antisamy.xsd">

        <directives>
            <directive name="embedStyleSheets" value="false" />
            <directive name="formatOutput" value="true" />
            <directive name="maxInputSize" value="100000" />
            <directive name="nofollowAnchors" value="true" />
            <directive name="omitDoctypeDeclaration" value="true" />
            <directive name="omitXmlDeclaration" value="true" />
            <directive name="onUnknownTag" value="remove" />
            <directive name="useXHTML" value="true" />
        </directives>

        <tag-rules>
            <tag name="a" action="validate">
                <attribute name="href" onInvalid="filterTag">
                    <regexp-list>
                        <regexp value="https?://[A-Za-z0-9]+[~a-zA-Z0-9-_\.@\#\$%&amp;;:,\?=/\+!\(\)]*" />
                    </regexp-list>
                </attribute>

                <attribute name="rel">
                    <literal-list>
                        <literal value="nofollow" />
                    </literal-list>
                </attribute>
            </tag>
            <tag name="b" action="validate" />
            <tag name="blockquote" action="validate" />
            <tag name="code" action="validate">
                <attribute name="class">
                    <regexp-list>
                        <regexp value="language-[a-zA-Z0-9]+" />
                    </regexp-list>
                </attribute>
            </tag>
            <tag name="em" action="validate" />
            <tag name="i" action="validate" />
            <tag name="li" action="validate" />
            <tag name="ol" action="validate" />
            <tag name="p" action="validate" />
            <tag name="pre" action="validate">
                <attribute name="class">
                    <regexp-list>
                        <regexp value="language-[a-zA-Z0-9]+" />
                    </regexp-list>
                </attribute>
            </tag>
            <tag name="strong" action="validate" />
            <tag name="ul" action="validate" />
        </tag-rules>

    </anti-samy-rules>

By default, this AntiSamy policy will strip out all tags that are not explicitly listed in the tag rules (but will keep their content). This means that only tags like P, Strong, Blockquote, Code, and Pre will be allowed to pass through. But, even the tags that are allowed to pass through are still sanitized based on attribute validation rules. For example, the Code and Pre tags can have the attribute, "class"; but, only if it adheres to the given Regular Expression pattern.
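
To get a feel for how those attribute patterns behave, here is a small Java sketch that runs the two regular expressions from the policy against a few sample values. This is not AntiSamy's own code: the class name PolicyPatternCheck and its method names are made up for the demo, and it assumes that a pattern has to match the entire attribute value (which is what Matcher.matches() does). Treat it as an approximation of the validation, not a description of the library's internals.

```java
import java.util.regex.Pattern;

// Hypothetical helper (NOT part of AntiSamy) for trying out policy patterns.
final class PolicyPatternCheck {

    // The "href" pattern from the policy, with the XML entity &amp; decoded to &.
    static final Pattern HREF = Pattern.compile(
        "https?://[A-Za-z0-9]+[~a-zA-Z0-9-_\\.@\\#\\$%&;:,\\?=/\\+!\\(\\)]*"
    );

    // The "class" pattern that the policy applies to the code and pre tags.
    static final Pattern CODE_CLASS = Pattern.compile( "language-[a-zA-Z0-9]+" );

    // ASSUMPTION: the whole attribute value has to match the pattern.
    static boolean isAllowedHref( String value ) {
        return HREF.matcher( value ).matches();
    }

    // ASSUMPTION: same whole-value matching as above.
    static boolean isAllowedCodeClass( String value ) {
        return CODE_CLASS.matcher( value ).matches();
    }

    public static void main( String[] args ) {
        System.out.println( isAllowedHref( "https://www.bennadel.com" ) );   // true
        System.out.println( isAllowedHref( "javascript:alert(1)" ) );        // false
        System.out.println( isAllowedCodeClass( "language-js" ) );           // true
        System.out.println( isAllowedCodeClass( "language-js danger" ) );    // false
    }

}
```

Running a handful of known-good and known-bad values through a pattern like this is a cheap way to sanity-check a policy rule before wiring it into the XML file.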

NOTE: From what I can see, the only other valid value for "onUnknownTag" is "encode", which will escape the characters of unknown tags, rather than strip them out.

Understanding the Policy XML file schema is complicated. I barely understand it. But, I did find a good description in WaveMaker's Learning Center that breaks it down fairly well.

Now, the primary issue with using AntiSamy 1.5.7 in ColdFusion 10 is that something in the internals of the library (or its dependencies) ends up using the wrong Class Loader and leads to the following error:

java.lang.ClassCastException: org.apache.xerces.jaxp.DocumentBuilderFactoryImpl cannot be cast to javax.xml.parsers.DocumentBuilderFactory

Thankfully, the JavaLoader project has a special method to deal with this very problem: switchThreadContextClassLoader(). This is the same method that I used when loading LaunchDarkly's feature flag library into ColdFusion 10. This method executes an arbitrary function inside a context that prevents the executing code from reaching into the wrong Class Loader. It makes the code a bit harder to read; but, it gets the job done.
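
Conceptually, a helper like this swaps the current thread's context class loader, runs your function, and then puts the original loader back. The Java sketch below shows that general pattern. It is NOT JavaLoader's actual source, and the names ContextClassLoaderSwitch and callWith are hypothetical; take it as an assumption about the shape of the technique rather than a description of the library.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Hypothetical illustration of the "switch, run, restore" pattern.
final class ContextClassLoaderSwitch {

    // Run the task with the given loader as the thread's context class loader.
    static <T> T callWith( ClassLoader loader, Supplier<T> task ) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader( loader );
        try {
            return task.get();
        } finally {
            // Always restore the original loader, even if the task throws.
            thread.setContextClassLoader( previous );
        }
    }

    public static void main( String[] args ) {
        // An empty, parent-less loader stands in for the AntiSamy JavaLoader.
        ClassLoader isolated = new URLClassLoader( new URL[ 0 ], null );
        ClassLoader seen = callWith(
            isolated,
            () -> Thread.currentThread().getContextClassLoader()
        );
        System.out.println( seen == isolated ); // true
    }

}
```

My understanding is that the XML parser factories look up their implementation classes through the thread's context class loader, which would explain why pointing it at the loader that holds the Xerces JAR files makes the casting error go away. I haven't verified that against the AntiSamy source, though.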

To see this in action, let's sanitize some HTML! First, we need to create our JavaLoader for AntiSamy 1.5.7, which I'm caching in the ColdFusion Application scope during application initialization:

    component
        output = false
        hint = "I provide the application settings and event handlers."
        {

        // Define the application.
        this.name = hash( getCurrentTemplatePath() );
        this.applicationTimeout = createTimeSpan( 0, 0, 10, 0 );
        this.sessionManagement = false;

        // Setup the mappings
        this.directory = getDirectoryFromPath( getCurrentTemplatePath() );
        this.mappings[ "/" ] = this.directory;
        this.mappings[ "/antisamy" ] = ( this.directory & "vendor/antisamy-1.5.7/" );
        this.mappings[ "/javaloader" ] = ( this.directory & "vendor/javaloader-1.2/javaloader/" );
        this.mappings[ "/javaloaderfactory" ] = ( this.directory & "vendor/javaloaderfactory/" );

        // ---
        // PUBLIC METHODS.
        // ---

        /**
        * I initialize the application.
        *
        * @output false
        */
        public boolean function onApplicationStart() {

            // In order to prevent memory leaks, we're going to use the JavaLoaderFactory to
            // instantiate our JavaLoader. This will keep the instance cached in the Server
            // scope so that it doesn't have to continually re-create it as we test our
            // application configuration.
            application.javaLoaderFactory = new javaloaderfactory.JavaLoaderFactory();

            // Create a JavaLoader that can access the AntiSamy 1.5.7 JAR files.
            // --
            // CAUTION: The directory has MORE JAR FILES than are actually necessary to run
            // the demo. However, I just downloaded all the non-optional dependencies
            // according to the MAVEN resource pages. I don't actually know enough about Java
            // to know which libraries I can and cannot exclude from the JavaLoader. I have
            // commented-out the ones that were not supplied by the "JAR Download" website.
            application.antisamyJavaLoader = application.javaLoaderFactory.getJavaLoader([
                expandPath( "/antisamy/antisamy-1.5.7.jar" ),
                // expandPath( "/antisamy/avalon-framework-4.1.3.jar" ),
                // expandPath( "/antisamy/avalon-framework-4.1.5.jar" ),
                expandPath( "/antisamy/batik-constants-1.9.1.jar" ),
                expandPath( "/antisamy/batik-css-1.9.1.jar" ),
                expandPath( "/antisamy/batik-i18n-1.9.1.jar" ),
                expandPath( "/antisamy/batik-util-1.9.1.jar" ),
                expandPath( "/antisamy/commons-codec-1.6.jar" ),
                expandPath( "/antisamy/commons-io-1.3.1.jar" ),
                // expandPath( "/antisamy/commons-logging-1.0.4.jar" ),
                expandPath( "/antisamy/commons-logging-1.1.3.jar" ),
                expandPath( "/antisamy/httpclient-4.3.6.jar" ),
                expandPath( "/antisamy/httpcore-4.3.3.jar" ),
                // expandPath( "/antisamy/log4j-1.2.17.jar" ),
                // expandPath( "/antisamy/logkit-1.0.1.jar" ),
                expandPath( "/antisamy/nekohtml-1.9.22.jar" ),
                expandPath( "/antisamy/xercesImpl-2.11.0.jar" ),
                expandPath( "/antisamy/xml-apis-1.4.01.jar" ),
                expandPath( "/antisamy/xml-apis-ext-1.3.04.jar" ),
                // expandPath( "/antisamy/xml-resolver-1.2.jar" ),
                expandPath( "/antisamy/xmlgraphics-commons-2.2.jar" )
            ]);

            // Indicate that the application has been initialized successfully.
            return( true );

        }

    }

To be honest, I don't really know that much about Java. I love using random Java utilities, like the Pattern / Matcher classes; but, my understanding of how Java applications execute is very shallow. For example, I don't understand why not all of the Java JAR files are necessary to run this demo. To get started, I went to the Maven project page for AntiSamy 1.5.7 and just manually downloaded all of the non-optional dependencies. But, if I compare my list of files to the one in the ZIP file provided by the JAR Download site, they are different. As such, I commented-out the ones that were not present in the JAR Download ZIP.

CAUTION: I don't know if JAR Download is a legitimate site. I trust Maven, so I'll happily grab files from their site. However, I am not sure if JAR Download is a trusted resource - use those ZIP files with caution. I only used it for a comparison of what files were made available.

Once we have our AntiSamy JavaLoader instance cached, we can start looking at user-provided HTML. Here's where we have to jump through some obscure Class Loading hoops. In the following code, notice that I have to load my XML Policy file using the switchThreadContextClassLoader() method:

    <!--- Setup our untrusted HTML content. --->
    <cfsavecontent variable="unsafeHtml">

        <p>
            Check out
            <a href="https://www.bennadel.com" onmousedown="alert( 'XSS!' )">my site</a>.
        </p>

        <marquee loop="-1" width="100%">
            I am very trustable! You can totes trust me!
        </marquee>

        <p>
            <strong>Thanks for stopping by!</strong> <em>You Rock!</em>
            <blink>Woot!</blink>
        </p>

    </cfsavecontent>

    <!--- ------------------------------------------------------------------------------ --->
    <!--- ------------------------------------------------------------------------------ --->

    <cfscript>

        // Create our AntiSamy instance.
        // --
        // NOTE: We would probably cache this in the Application scope. Or, more likely,
        // inside a proxy Component that handles all of the intricate interaction details
        // for us so that we don't have to know about all the junk below.
        antisamy = application.antisamyJavaLoader.create( "org.owasp.validator.html.AntiSamy" ).init();

        // Create our security policy from the given XML file. This policy determines what
        // tags and attributes will be allowed in the sanitized HTML; and, how AntiSamy
        // will treat invalid tags that it comes across.
        // --
        // NOTE: We would probably cache this so that we don't have to re-read it every time.
        // --
        // Read More: https://www.wavemaker.com/learn/app-development/app-security/xss-antisamy-policy-configuration/
        policy = application.antisamyJavaLoader.switchThreadContextClassLoader(
            getInstance__inProperContext,
            {
                PolicyClass: application.antisamyJavaLoader.create( "org.owasp.validator.html.Policy" ),
                policyFilePath: expandPath( "./demo-policy.xml" )
            }
        );

        // Scan the untrusted HTML. The results will contain both error messages and the
        // sanitized HTML output.
        result = antisamy.scan( javaCast( "string", unsafeHtml ), policy );

        writeOutput( encodeForHtml( result.getCleanHTML() ));
        writeOutput( "<hr />" );
        writeDump( result.getErrorMessages() );

        // ------------------------------------------------------------------------------- //
        // ------------------------------------------------------------------------------- //

        /**
        * I am intended to be INVOKED BY THE JAVALOADER. I run the getInstance() method in a
        * context that forces the classes to be loaded from the AntiSamy JavaLoader. This
        * gets around issues in which Java classes try to load dependencies from the wrong
        * Class Loader.
        *
        * NOTE: While in this method, you cannot access the core ColdFusion classes. As such,
        * this method should do AS LITTLE AS POSSIBLE such that it can return to the normal
        * execution context as fast as possible.
        *
        * @PolicyClass I am the Policy class provided by AntiSamy.
        * @policyFilePath I am the path to our security policy XML file.
        * @output false
        */
        public any function getInstance__inProperContext(
            required any PolicyClass,
            required string policyFilePath
            ) {

            return( PolicyClass.getInstance( javaCast( "string", policyFilePath ) ) );

        }

    </cfscript>

As you can see, we use the JavaLoader to create an instance of the AntiSamy library. For the demo, I'm just instantiating AntiSamy each time the page is run (to make the development easier). However, in a real-world scenario, I'd probably load and cache this library inside of another ColdFusion component that proxies the AntiSamy API. I'd also read and cache the Policy file so that I don't have to keep performing disk I/O.
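
Here is a rough Java sketch of that caching idea: load each policy once per file path and reuse it afterward. It uses a generic type in place of AntiSamy's Policy class so that it runs on its own. PolicyCache and its loader argument are hypothetical names; in a real application, the loader would be whatever ends up calling Policy.getInstance() inside the proper Class Loader context.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache (NOT part of AntiSamy): one parsed policy per file path.
final class PolicyCache<P> {

    private final ConcurrentHashMap<String, P> policies = new ConcurrentHashMap<>();
    private final Function<String, P> loader;

    // The loader does the expensive work (reading and parsing the XML file).
    PolicyCache( Function<String, P> loader ) {
        this.loader = loader;
    }

    // Return the cached policy, invoking the loader only on the first request.
    P get( String policyFilePath ) {
        return policies.computeIfAbsent( policyFilePath, loader );
    }

    public static void main( String[] args ) {
        AtomicInteger loads = new AtomicInteger();
        // A String stands in for the parsed Policy object in this demo.
        PolicyCache<String> cache = new PolicyCache<String>(
            path -> "policy #" + loads.incrementAndGet() + " from " + path
        );
        cache.get( "./demo-policy.xml" );
        cache.get( "./demo-policy.xml" );
        System.out.println( loads.get() ); // 1
    }

}
```

The trade-off is that a cached policy won't pick up edits to the XML file until the cache is cleared (or the application restarts), which is probably fine in production but can be surprising during development.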

That said, once we have our AntiSamy instance, we can use the .scan() method to evaluate the untrusted HTML. The .scan() method returns a result object that contains both the sanitized / clean HTML as well as a collection of error messages that outline how the untrusted HTML was manipulated. In this case, if we run the page, we get the following page output:


 
 
 

 
[Screenshot: Using AntiSamy 1.5.7 with ColdFusion 10 and JavaLoader to evaluate and sanitize user-provided HTML content.]
 
 
 

As you can see, the unknown tags, Marquee and Blink, were stripped out; but, their content was allowed to remain. That said, these two tags were listed in the errors collection. So, it's up to you as the developer to decide if you want to store the sanitized version. Or, if you want to kick it back to the user with the listed problems.

AntiSamy is a pretty cool project for content sanitization and the prevention of XSS (Cross-Site Scripting) attacks! It's not exactly clear how active the project is. But, considering that it's an OWASP project, I have to imagine that they will evolve it as necessary. That said, it's really just an HTML parser; so, it should naturally be able to adapt to changes in the HTML specification (as long as it continues to consist of Tags and Attributes). And, hopefully this post helps you understand how AntiSamy 1.5.7 might be loaded into a ColdFusion 10 application.




Reader Comments

Nice! That was a real rabbit hole... incredibly frustrating.

I tried a number of other approaches (fat jars, modified pom.xml builds, etc), none of which worked, so I'm glad to see you were able to get the dependencies sorted out and working.

Along with watching the AntiSamy project to see if they sort out their use of Xerces, I'm also going to explore this other OWASP project: https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project

Looks like it's not quite as robust as AntiSamy yet, but is being very actively developed.

Cheers!


@Matthew,

Oh man! How are we supposed to keep up with all this stuff?! I'll take a look as well. Though, what I'd really like is just to create some sort of Abstraction around sanitizing / validating HTML, so I could swap the Java libraries out under the hood.

Thanks again for all your help!


@Mahendra,

Glad you enjoyed it. Using AntiSamy certainly adds a layer of confidence to accepting open-ended data from users.


Hi Ben,

It comes as a surprise that you're still on CF 10. Later versions load JAR files much more easily through "this.javaSettings" in Application.cfc.
I recently cleaned HTML output using jsoup, which does not need to be configured through such a "heavy" XML file.
I believed their release to be much more recent than AntiSamy's, but they obviously released versions in 2017, too, after some years without updates.
AntiSamy returns a list of findings, which is nice.

http://central.maven.org/maven2/org/owasp/antisamy/antisamy/
http://central.maven.org/maven2/org/jsoup/jsoup/

Best,
Bernhard


I studied your blog further and understood that you found the Maven downloads yourself.
When I try a later version of Adobe ColdFusion and use "this.javaSettings", I run into the same error:
Error casting an object of type org.apache.xerces.jaxp.DocumentBuilderFactoryImpl cannot be cast to javax.xml.parsers.DocumentBuilderFactory to an incompatible type. This usually indicates a programming error in Java, although it could also mean you have tried to use a foreign object in a different way than it was designed.
org.apache.xerces.jaxp.DocumentBuilderFactoryImpl cannot be cast to javax.xml.parsers.DocumentBuilderFactory


@Bernhard,

Yeah, I'm working on updating my version of ColdFusion. I may actually move over to Lucee as I'm actually paying $$$ for a hosted version of ColdFusion. We'll see -- it depends on how much I actually want to learn about managing a server (vs. being able to just open a Support ticket).

As far as using the ColdFusion settings for loading Java classes, I think that really only works when there are no conflicts with the dependencies. So, even in a later version of ColdFusion, I'd still use the JavaLoader project to load stuff like this. That way, you get total control over which classes are being used when. In fact, even with the JavaLoader project, you can see that I ran into problems and had to use the heavy-handed, .switchThreadContextClassLoader() method to force the proper context when loading the XML classes (which would, otherwise, throw the casting error you were seeing).


Hi Ben,

Great post!
By any chance, did you try to use the dynamic attributes with 1.5.7? I'm currently trying to implement this and got everything working except for the dynamic attributes; it keeps filtering out all of them, no matter what's in my policy file.

Thanks,
Landon


@Landon,

I have not tried any of the dynamic attributes yet. I assume you're talking about the data-* style attributes? If so, I'll put it on my list of things to try. For now, I'm just locking stuff down really hard 'cause it's all user-provided.

That said, I would like to eventually use Markdown as my post-authoring ingress. At that point, I'll need to have a lot more flexibility.

