Testing wkhtmltopdf 0.12.6 With Docker In Lucee CFML 5.3.4.80

By Ben Nadel

Published 2020-07-13 in ColdFusion

A few months ago, James Moberg listed out a good number of Command-Line utilities that he uses in ColdFusion. Among them is wkhtmltopdf, which is a tool that can convert HTML and CSS to PDFs using the Qt WebKit rendering engine. Since I've been digging into PDF document generation in Lucee CFML, with varying degrees of success, I thought it was time that I try out Moberg's wkhtmltopdf recommendation. As such, this weekend, I sat down and got a proof-of-concept working in Docker and Lucee CFML 5.3.4.80.

Setting Up My Docker Container / Playground

Just as with my GraphicsMagick exploration in Lucee CFML, I figured that the cleanest way to start playing with wkhtmltopdf would be to create a Docker container that isolates this work and allows me to easily spin-up and spin-down my experiments. And, as with my GraphicsMagick approach, my Docker container is based on the Ortus Solutions' CommandBox Docker Image for Lucee CFML 5.

The wkhtmltopdf project provides a number of pre-compiled binaries for different platforms. But, I didn't know which platform the CommandBox image was actually running on - honestly, I know very little about Servers themselves. So, I had to figure out what platform I was running on first.

Based on a StackExchange post, I learned that I could run lsb_release -a to see what distribution I was using. So, my first Dockerfile did nothing but spin-up the CommandBox base image, which I could then "bash" into and run the aforementioned command. This gave me the following terminal output:

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.4 LTS
Release:	18.04
Codename:	bionic

Given this information, I was able to see that I needed the Ubuntu 18.04 (bionic) build for the CommandBox image. Now, I don't really understand the difference between "architectures"; so, I just picked the first one - amd64 - and that turned out to be correct.
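To make the mapping from that terminal output to the downloaded package explicit, here is a small JavaScript sketch (JavaScript being the language the footer script in this post already uses). It pulls the codename out of the lsb_release -a output and assembles the .deb file name and release URL. The function names are hypothetical, and the naming pattern wkhtmltox_<version>.<codename>_<arch>.deb is inferred from the single 0.12.6-1 download URL used in the Dockerfile below, not from any documented contract.

```javascript
// Hypothetical helpers: map `lsb_release -a` output to a wkhtmltopdf .deb name.
// ASSUMPTION: the file-name pattern is inferred from the 0.12.6-1 bionic/amd64
// release asset used in this post; other releases may be named differently.

// Pull the "Codename:" value (e.g. "bionic") out of the lsb_release output.
function parseCodename(lsbOutput) {
  const match = /^Codename:\s*(\S+)\s*$/m.exec(lsbOutput);
  if (!match) {
    throw new Error("No Codename line found in lsb_release output");
  }
  return match[1];
}

// Build the package file name and its GitHub release download URL.
function wkhtmltoxPackage(version, codename, arch) {
  const fileName = `wkhtmltox_${version}.${codename}_${arch}.deb`;
  const url =
    "https://github.com/wkhtmltopdf/packaging/releases/download/" +
    `${version}/${fileName}`;
  return { fileName, url };
}

// The terminal output shown above, as a string.
const sampleOutput = [
  "No LSB modules are available.",
  "Distributor ID:\tUbuntu",
  "Description:\tUbuntu 18.04.4 LTS",
  "Release:\t18.04",
  "Codename:\tbionic",
].join("\n");

const pkg = wkhtmltoxPackage("0.12.6-1", parseCodename(sampleOutput), "amd64");
console.log(pkg.url);
```

For the architecture argument, dpkg --print-architecture on a Debian-based system prints the name in the same form these package names use (for example, amd64).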

Then, I more-or-less copied the wkhtmltopdf Docker instructions from Deyan Ginev, which gave me the following Dockerfile:

# Use the CommandBox base image.
FROM ortussolutions/commandbox:lucee5

# Prevents the keyboard from having to be configured during build.
# --
# Read more: https://github.com/phusion/baseimage-docker/issues/342
ENV DEBIAN_FRONTEND=noninteractive

# CAUTION: The following dependencies list and installation steps have been taken from:
# --
# https://github.com/openlabs/docker-wkhtmltopdf/blob/master/Dockerfile
# --
# ... and modified slightly. I am not sure if the dependencies listed here are actually
# required. Frankly, I don't even know how to determine which dependencies a package like
# wkhtmltopdf requires.
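# Download and install wkhtmltopdf dependencies.
# --
# NOTE: "apt-get update" is chained with "apt-get install" in a single RUN so that
# Docker's build cache can't reuse a stale package index with a changed install list.
RUN apt-get update \
	&& apt-get upgrade -y \
	&& apt-get install -y \
		build-essential \
		xorg \
		libssl-dev \
		libxrender-dev \
		wget \
		gdebi \
	&& apt-get clean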

# Download the wkhtmltopdf package.
RUN wget https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox_0.12.6-1.bionic_amd64.deb

# Install the wkhtmltopdf package.
RUN gdebi --n wkhtmltox_0.12.6-1.bionic_amd64.deb

To be clear, I don't understand the list of apt-get packages; and, I have no idea what gdebi is or what it's actually doing. I only have the vague notion that it's installing the wkhtmltopdf package for this particular platform.

With this Dockerfile in place, I created a simple docker-compose.yml file:

version: "2.4"

services:

  lucee:
    build: "."
    ports:
      # Server administrative URLs:
      # --
      # Server: http://localhost:8080/lucee/admin/server.cfm
      # Web: http://localhost:8080/lucee/admin/web.cfm
      - "8080:8080"
    volumes:
      - ".:/app"
    environment:
      APP_DIR: "/app/wwwroot"
      # We are using the CommandBox image as the base for our container. As such, we can
      # use the CFConfig utility to configure our ColdFusion settings (such as the Admin
      # password).
      cfconfigfile: "/app/.cfconfig.json"
      # Set up the font configuration so that WKHTMLTOPDF knows where to locate our fonts.
      # --
      # Read More: https://blog.rebased.pl/2018/07/12/wkhtmltopdf-considered-harmful.html
      FONTCONFIG_FILE: "/app/wwwroot/fonts/fonts.conf"

... which exposed localhost:8080 as the ingress to my Lucee CFML playground!

NOTE: The FONTCONFIG_FILE ENV variable points to a file that, in turn, points to a directory that contains TrueType Fonts (TTF). This file is what allows wkhtmltopdf to embed fonts within the generated PDF document. The file is rather simple:
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
	<dir>/app/wwwroot/fonts</dir>
</fontconfig>

Generating PDFs With HTML, CSS, And wkhtmltopdf

Once I had my Docker playground up and running, it was time to start experimenting! What I knew I wanted to do was build an HTML document that referenced CSS files and Images on the local file-system. By default, wkhtmltopdf doesn't grant access to the local file-system. To grant that access, you have to use the following command-line option:

--enable-local-file-access

Then, with this flag enabled, I was able to use the path-prefix, file:///, to embed files from the Lucee CFML server (just like we can with the CFML CFDocument tag).
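One detail about building these URLs: in the code below, cwd is an absolute path that already starts with /, so prefixing it with file:/// produces four slashes (file:////app/wwwroot/...). wkhtmltopdf evidently accepted that here, but if you generate file URLs outside of CFML, Node.js's url.pathToFileURL() produces the canonical three-slash form and percent-encodes characters such as spaces. A minimal sketch, assuming POSIX paths like the /app/wwwroot directory from the docker-compose.yml file; the helper name localAssetUrl is hypothetical:

```javascript
// Hypothetical helper: turn an absolute local path into a file:// URL that
// wkhtmltopdf can load once --enable-local-file-access is set.
// ASSUMPTION: POSIX-style paths (as inside the Linux Docker container).
const path = require("path");
const { pathToFileURL } = require("url");

function localAssetUrl(directory, fileName) {
  // path.join() collapses duplicate separators; pathToFileURL() adds the
  // "file://" scheme and percent-encodes characters such as spaces.
  return pathToFileURL(path.join(directory, fileName)).href;
}

console.log(localAssetUrl("/app/wwwroot", "document.css"));
```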

What follows is the result of two days of trial-and-error. There's no point in stepping through how I got here. As such, I'll just share the final product. The following Lucee CFML page first generates a PDF using wkhtmltopdf; then, it renders that PDF document inside an iframe. The PDF includes the primary content as well as a repeated footer that includes the Title and Page Number. In typical Ben-style, I've tried to include a copious amount of comments.

I'm invoking the wkhtmltopdf binary using the CFExecute tag:

<cfscript>

	// Get the "Current Working Directory" - this will be used to generate local
	// file-paths within the CFML files.
	cwd = getDirectoryFromPath( getCurrentTemplatePath() ).left( -1 );

	// We want to be able to use ColdFusion / CFML to define the content of the PDF.
	// However, the wkhtmltopdf binary uses static files. As such, we need to render our
	// ColdFusion / CFML files as static HTML.
	// --
	// CAUTION: My CFML files assume that the "cwd" (Current Working Directory) variable
	// will exist so that it can be used to reference files relative to the local working
	// directory (ie, using the "file:///" path prefix).
	fileWrite( "./document.html", renderCfmlAsHtml( "./document.cfm" ) );
	fileWrite( "./document-footer.html", renderCfmlAsHtml( "./document-footer.cfm" ) );

	wkHtmlToPdf([
		// This allows "file:///" to be used in order to access files directly from
		// within the local server (such as Images, CSS, and JavaScript files).
		"--enable-local-file-access",

		// Only output errors - this allows the CFExecute tag to differentiate debugging
		// output from error output.
		"--log-level error",

		// The default unit for the page dimensions is Millimeter (mm); but, it looks
		// like we can override with inches.
		"--page-width 8in",
		"--page-height 6.5in",

		// Set up the page margins that exist outside the HTML content.
		// --
		// NOTE: We need to have top/bottom margins in order to use Header and Footer
		// items, respectively.
		"--margin-top 0.25in",
		"--margin-right 0.25in",
		"--margin-bottom 0.5in",
		"--margin-left 0.25in",

		// NOTE: There is a "--footer-spacing" (implicit mm unit) that separates the
		// body content from the footer content. However, it does not appear to play well
		// with the "--margin-bottom" value. As such, I'm leaving it as a default 0 and
		// then moving the "--footer-spacing" into the BODY PADDING of the footer CSS.
		// --
		// "--footer-spacing 10",
		"--footer-html #cwd#/document-footer.html",

		// If we weren't using "--footer-html", we could have used the following text to
		// generate dynamic footer content. But, since we are using "--footer-html", we
		// have to use JavaScript within the footer to populate the following tokens.
		// --
		// NOTE: In order for the headers and footers to be visible within the document,
		// a sufficient MARGIN needs to be provided.
		// --
		// "--footer-left '[title]'",
		// "--footer-right 'Page [page] of [topage]'",
		// "--footer-font-name 'Open Sans'",
		// "--footer-font-size 9", // Font-sizes seem to be in Points.

		// No shenanigans!
		// --
		// NOTE: This is only for the main document - JavaScript will still work in the
		// Header and Footer HTML files, which need JavaScript in order to perform
		// runtime substitutions for things like [page].
		"--disable-javascript",

		// Input and output files for the main document.
		"#cwd#/document.html",
		"#cwd#/document.pdf"
	]);

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I evaluate the given ColdFusion CFML template and return the rendered HTML.
	* 
	* @templatePath I am the CFML file being evaluated.
	*/
	public string function renderCfmlAsHtml( required string templatePath )
		localmode = "modern"
		{

		savecontent variable = "local.htmlContent" {

			include template = templatePath;

		}

		return( htmlContent );

	}


	/**
	* I execute the wkhtmltopdf binary with the given options. If the binary outputs an
	* error, the error is rendered and the page execution is halted. Otherwise, returns
	* the successful output of the execution.
	* 
	* @options I am the command-line options.
	* @timeout I am the timeout after which CFExecute will throw an error.
	*/
	public string function wkHtmlToPdf(
		required array options,
		numeric timeout = 10
		) {

		execute
			name = "wkhtmltopdf"
			arguments = options.toList( " " )
			variable = "local.successResult"
			errorVariable = "local.errorResult"
			timeout = timeout
		;


		// If the error variable has been populated, it means the CFExecute tag ran into
		// an error - let's dump-it-out and halt processing.
		if ( len( errorResult ?: "" ) ) {

			dump( errorResult );
			abort;

		}

		return( successResult ?: "" );

	}	

</cfscript>

<cfcontent type="text/html; charset=utf-8" />
<cfoutput>

	<!doctype html>
	<html lang="en">
	<head>
		<meta charset="utf-8" />

		<title>
			Testing wkhtmltopdf 0.12.6 With Docker In Lucee CFML 5.3.4.80
		</title>
	</head>
	<body>

		<h1>
			Testing wkhtmltopdf 0.12.6 With Docker In Lucee CFML 5.3.4.80
		</h1>

		<!--- Let's preview the PDF we just generated. --->
		<iframe
			src="./document.pdf?uncache=#getTickCount()#"
			border="2"
			style="width: 100% ; height: 80vh ;">
		</iframe>

	</body>
	</html>

</cfoutput>

Ultimately, I need to provide the wkhtmltopdf binary with static HTML. However, I want to render that HTML using ColdFusion / CFML templates. To do this, I'm simply including a CFML template inside a CFSaveContent tag, which evaluates the ColdFusion code in the context of the current page and buffers the generated output. I then write this buffered content to a .html file, whose path I pass to wkhtmltopdf. I'm using this technique for both the primary content and the footer content.
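The wkHtmlToPdf() function above joins its options into one space-delimited string for CFExecute. If you ever drive the same binary from Node.js instead, the analogous call is child_process.execFileSync(), which takes the arguments as an array and does not go through a shell, so a value such as a path containing a space stays a single argument. A sketch, assuming each option string is either a bare flag, a "--flag value" pair, or a positional path; toArgv and runWkHtmlToPdf are hypothetical names, and only execFileSync is a real Node.js API:

```javascript
// Hypothetical Node.js counterpart to the CFML wkHtmlToPdf() function.
const { execFileSync } = require("child_process");

// Split "--flag value" strings on the FIRST space only, so a value such as
// a file path keeps any spaces it contains. Bare flags and positional
// arguments (input/output paths) are passed through unchanged.
function toArgv(options) {
  const argv = [];
  for (const option of options) {
    const space = option.indexOf(" ");
    if (option.startsWith("--") && space !== -1) {
      argv.push(option.slice(0, space), option.slice(space + 1));
    } else {
      argv.push(option);
    }
  }
  return argv;
}

// execFileSync() runs the binary without a shell and throws if it exits
// with a non-zero status or exceeds the timeout (milliseconds here,
// whereas the CFExecute timeout is in seconds).
function runWkHtmlToPdf(options, timeoutMs = 10000) {
  return execFileSync("wkhtmltopdf", toArgv(options), {
    encoding: "utf8",
    timeout: timeoutMs,
  });
}
```

For example, runWkHtmlToPdf(["--enable-local-file-access", "--page-width 8in", "/app/wwwroot/document.html", "/app/wwwroot/document.pdf"]) would mirror a subset of the options used above.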

With the main CFML / HTML content, I wanted to make sure to test a few HTML and CSS features:

External CSS files.
External Font files.
External Image files.
Local links (ie, in-document anchor tags).
Basic CSS support.
Absolutely-positioned / overflow CSS support (especially for images).
Background image support.

The Qt WebKit rendering engine that wkhtmltopdf uses under-the-hood is quite old and doesn't support all the new CSS hotness. Specifically, it has very limited support for Flexbox (only through the old, prefixed -webkit-box syntax) and does not support border-radius on images at all (or on containers that clip images).

Here's the CFML / HTML that I used in my exploration - I'm using the HTML5 section tag to drive page-breaks (using special CSS):

<cfoutput>

	<!doctype html>
	<html lang="en">
	<head>
		<meta charset="utf-8" />

		<title>
			Testing wkhtmltopdf 0.12.6 With Docker In Lucee CFML 5.3.4.80
		</title>

		<link rel="stylesheet" type="text/css" href="file:///#cwd#/document.css" />
	</head>
	<body>

		<section>

			<h1 id="section-one">
				Testing <a href="https://wkhtmltopdf.org/">wkhtmltopdf</a>
				With Docker In Lucee CFML 5.3.4.80
			</h1>

			<p>
				Hello world, how goes it?
			</p>

			<ul>
				<!---
					NOTE: These anchors are linking to ID-attributes lower-down within
					this same document. You could have also used a[name] targets; but,
					since I was already including h[1-6] tags, IDs felt like an easy
					choice.
				--->
				<li><a href="##section-two">Section two</a></li>
				<li><a href="##section-three">Section three</a></li>
				<li><a href="##section-four">Section four</a></li>
				<li><a href="##section-five">Section five</a></li>
			</ul>

		</section>

		<section>

			<h2 id="section-two">
				Section Two
			</h2>

			<p class="border-radius">
				Section two is the best.
			</p>

			<p class="border-radius-2">
				And borders work!
			</p>

			<p>
				<a href="##section-one">Back to top</a>
			</p>

		</section>

		<section>

			<h2 id="section-three">
				Section Three
			</h2>

			<p>
				Section three is not so bad, though.
			</p>

			<cfloop index="i" from="1" to="40" step="1">
				<p>
					That said, it has a lot of text....
				</p>
			</cfloop>

			<p>
				<a href="##section-one">Back to top</a>
			</p>

		</section>

		<section>

			<h2 id="section-four">
				Section Four
			</h2>

			<p>
				Trying out some absolutely-positioned images.
			</p>

			<!---
				NOTE: wkhtmltopdf does support SOME FLEXBOX; but, it uses a strange
				notation and doesn't have great support. As such, I'm just falling back
				to using Tables to help center rows of content.
			--->
			<table width="100%" border="0" cellpadding="0" cellspacing="0" class="images">
			<tr>
				<td class="images__viewport">
					<img
						src="#cwd#/lucy.jpg"
						class="images__image"
						style="top: -100px ; left: -100px ;"
					/>
				</td>
				<td width="50%">
					<!-- Space evenly. -->
				</td>
				<td class="images__viewport">
					<img
						src="#cwd#/lucy.jpg"
						class="images__image"
						style="top: -350px ; left: -350px ;"
					/>
				</td>
				<td width="50%">
					<!-- Space evenly. -->
				</td>
				<td class="images__viewport">
					<img
						src="#cwd#/lucy.jpg"
						class="images__image"
						style="top: -600px ; left: -600px ;"
					/>
				</td>
			</tr>
			</table>

			<p>
				<a href="##section-one">Back to top</a>
			</p>

		</section>

		<section>

			<h2 id="section-five">
				Section Five
			</h2>

			<p>
				Trying out some background-images
			</p>

			<div
				class="background"
				style="background: url( '#cwd#/lucy.jpg' ) center right no-repeat ; background-size: cover ;">
				<br />
			</div>

			<p>
				<a href="##section-one">Back to top</a>
			</p>

		</section>

	</body>
	</html>

</cfoutput>

As you can see, nothing too fancy. Here's the accompanying CSS file, which is being linked-to as a local file using the file:/// prefix:

body {
	color: #212121 ;
	/*
		NOTE: I downloaded this font (Caveat) from Google Fonts and then put it in a
		local fonts folder, which is being referenced by the "fonts.conf" file that is
		defined by the ENV variable, "FONTCONFIG_FILE".
	*/
	font-family: caveat, sans-serif ;
	font-size: 30px ;
	line-height: 1.3 ;
	margin: 0px 0px 0px 0px ;
	padding: 0px 0px 0px 0px ;
}

/*
	NOTE: The "page-break-before" and "page-break-after" CSS will help determine where
	the new pages start within the PDF. To help keep page-breaks clear, I'm going to
	force a page-break for each SECTION element (except the first).
*/
section {
	page-break-before: always ;
	padding: 0px 0px 0px 0px ;
}

section:first-child {
	page-break-before: auto ;	
}

section *:first-child {
	margin-top: 0px ;
}

a {
	color: red ;
}

h1 a {
	color: inherit ;
}

.border-radius {
	background-color: gold ;
	border-radius: 20px 20px 20px 20px ;
	padding: 20px 20px 20px 20px ;
}

.border-radius-2 {
	background-color: cyan ;
	border: 5px solid #333333 ;
	border-radius: 20px 20px 20px 20px ;
	padding: 20px 20px 20px 20px ;
}

.images {}

.images__viewport {
	border: 2px solid #131313 ;
	display: inline-block ;
	height: 250px ;
	overflow: hidden ;
	position: relative ;
	width: 250px ;
}

.images__image {
	position: absolute ;
}

.background {
	height: 250px ;
	width: 100% ;
}

Now, if we run our ColdFusion code that evaluates this CFML and passes the result to wkhtmltopdf, we get the following browser and PDF output:

A PDF generated using wkhtmltopdf, Docker, and Lucee CFML.

As you can see, the generated PDF has local-links, embedded CSS, embedded fonts, and embedded images. Pretty cool!

The footer of the PDF is a completely separate CFML / HTML document that gets printed at the bottom of each page. Here's my ColdFusion code for the footer file:

<!---
	In wkhtmltopdf, the Header and Footer portions are self-contained HTML documents
	that are printed on the same pages as the root document. As such, they can have
	their own HTML, CSS, and JavaScript.
--->
<cfoutput>

	<!doctype html>
	<html lang="en">
	<head>
		<link rel="stylesheet" type="text/css" href="file:///#cwd#/document-footer.css" />
		<script type="text/javascript" src="file:///#cwd#/document-footer.js"></script>
	</head>
	<body onload="interpolate()">

		<table width="100%" border="0" cellpadding="0" cellspacing="0">
		<tr>
			<td class="title">
				[title]
			</td>
			<td align="right" class="page">
				Page
			</td>
			<td width="1">
				<div class="page-count">
					[page] of [topage]
				</div>
			</td>
		</tr>
		</table>

	</body>
	</html>

</cfoutput>

As you can see, this ColdFusion CFML code contains special tokens like [title] and [page]. When using --footer-html, wkhtmltopdf doesn't replace these tokens automatically; instead, it passes their values to the footer document in its query string, and we have to perform the substitution with JavaScript. That's why my body tag has an onload event-handler. The interpolate() function is provided by the externally-linked JavaScript file:

// For the Header and Footer HTML documents, all of the dynamic portions are being
// passed to the document as part of the GET request for the rendering of this portion
// of the page (Headers and Footers are entirely separate HTML documents). As such, we
// need to use JavaScript to dynamically replace the [key] placeholders with the search
// string tokens.
// --
// * [page] - Replaced by the number of the page currently being printed
// * [frompage] - Replaced by the number of the first page to be printed
// * [topage] - Replaced by the number of the last page to be printed
// * [webpage] - Replaced by the URL of the page being printed
// * [section] - Replaced by the name of the current section
// * [subsection] - Replaced by the name of the current subsection
// * [date] - Replaced by the current date in system local format
// * [isodate] - Replaced by the current date in ISO 8601 extended format
// * [time] - Replaced by the current time in system local format
// * [title] - Replaced by the title of the current page object
// * [doctitle] - Replaced by the title of the output document
// * [sitepage] - Replaced by the number of the page in the current site being converted
// * [sitepages] - Replaced by the number of pages in the current site being converted
function interpolate() {

	var search = document.location.search.slice( 1 );
	var body = document.body;
	var innerHTML = body.innerHTML;
	var originalHTML = innerHTML;

	search.replace(
		/([^&=]+)=([^&]*)/g,
		function handleSearchParameter( $0, key, value ) {

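			// Split/join swaps EVERY occurrence of the token (a string-based replace()
			// would only swap the first one).
			innerHTML = innerHTML.split( "[" + key + "]" ).join( decodeURIComponent( value ) );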

		}
	);

	// If we actually substituted any of the HTML content, let's update the body.
	if ( innerHTML !== originalHTML ) {

		document.body.innerHTML = innerHTML;

	}

}
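Because interpolate() reads document.location and writes document.body, it can only run inside wkhtmltopdf or a browser. Pulling the same substitution out into a pure function makes it easy to check on its own. This is a minimal sketch; the name interpolateTokens is hypothetical. It uses split/join so that every occurrence of a token is replaced (String.prototype.replace() with a string pattern only replaces the first match), and, like the original, it decodes values with decodeURIComponent().

```javascript
// Hypothetical pure version of the footer's interpolate() logic.
// html:   the markup containing [key] placeholders.
// search: the query string without its leading "?", e.g. "page=2&topage=5".
function interpolateTokens(html, search) {
  let result = html;
  for (const pair of search.split("&")) {
    const separator = pair.indexOf("=");
    // Skip empty segments and segments without a "key=" prefix.
    if (separator <= 0) {
      continue;
    }
    const key = pair.slice(0, separator);
    const value = decodeURIComponent(pair.slice(separator + 1));
    // split/join swaps EVERY occurrence of "[key]", not just the first.
    result = result.split("[" + key + "]").join(value);
  }
  return result;
}
```

Inside the footer page, the equivalent call would be document.body.innerHTML = interpolateTokens(document.body.innerHTML, document.location.search.slice(1)); tokens with no matching query parameter are left untouched.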

And, for completeness, here's the CSS file for the footer - remember, the Header and Footer HTML files are completely separate HTML files. As such, they need their own CSS:

body {
	color: #666666 ;
	font-family: caveat ;
	font-size: 20px ;
	line-height: 1.3 ;
	margin: 0px 0px 0px 0px ;
	padding: 20px 0px 0px 0px ;
}

.title {}

.page {
	padding-right: 7px ;
}

.page-count {
	background-color: #121212 ;
	border-radius: 3px 3px 3px 3px ;
	color: #ffffff ;
	display: inline-block ;
	font-size: 13px ;
	font-weight: bold ;
	padding: 2px 7px 3px 7px ;
	white-space: nowrap ;
}

wkhtmltopdf, HTML Content, And Server Security

On several different pages, the wkhtmltopdf site mentions that security is a major concern: when including user-provided content, we must take steps to sanitize / escape that content, as it could otherwise lead to serious issues:

Do not use wkhtmltopdf with any untrusted HTML - be sure to sanitize any user-supplied HTML/JS, otherwise it can lead to complete takeover of the server it is running on!

Be careful!
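Escaping is the minimum for user-supplied text that ends up in the HTML handed to wkhtmltopdf; in CFML, encodeForHTML() is the usual tool for this. The sketch below shows the same idea in JavaScript with a hypothetical escapeHtml() helper and a hypothetical renderGreeting() template fragment. It only covers plain text: user-supplied HTML would instead need an allowlist-based sanitizer. With --enable-local-file-access turned on, as it is in this post, unescaped markup is especially risky because injected tags can reference file:/// paths on the server.

```javascript
// Hypothetical helper: escape the HTML-significant characters so that
// user-supplied text is rendered as text instead of being parsed as markup.
// "&" must be replaced first so the entities added later are not re-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical template fragment that embeds a user-supplied name.
function renderGreeting(userName) {
  return `<p class="greeting">Prepared for ${escapeHtml(userName)}</p>`;
}
```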

The wkhtmltopdf project seems pretty cool. Generating PDFs feels like one of those problems that should just be solved at this point; but, apparently, PDFs are super complicated; and, many of the solutions out there - like wkhtmltopdf - have some rendering limitations. But, overall, this looks like a decent way to generate PDFs in ColdFusion / Lucee CFML.

Epilogue: Why Not Just Use The `CFDocument` Tag?

Ideally, I would just use the CFDocument tag in ColdFusion as that would reduce complexity and remove external dependencies. However, the CFDocument tag also has some limitations in what kind of HTML / CSS it will render. This is especially true of the Flying Saucer rendering engine that Lucee now includes. While Lucee's implementation of Flying Saucer adds some functionality over the Adobe ColdFusion support, it appears to drop other features (like being able to clip images inside an overflow:hidden container).


Short link: https://bennadel.com/3862

Reader Comments

Art Jul 16, 2020 at 10:16 AM


I have looked at many possible solutions for my project, including wkhtmltopdf, PrinceXML and others, and unfortunately ended up writing my own Java module for it that utilizes headless Chrome. One of the main reasons is that if you need support for the latest CSS ( css-grid, flexbox, etc... ) none of the other solutions came close to the headless Chrome rendering engine.

Ben Nadel Jul 28, 2020 at 12:28 PM


@Art,

I think a lot of people use some sort of headless-Chrome for this type of thing. Just the other day, I was talking to someone who does that via a Lambda function on AWS. If I were to write this in Node.js, that's probably the way that I would go. But, it's fun to see what ColdFusion / Java can do.

Sebastiaan Nov 5, 2020 at 2:39 PM


Have you tried getting Highcharts to render properly in WKHTMLTOPDF?
Usually I can get it to work, but recently I've had an issue where I only get the box for the graph, not the graph inside it...

Ben Nadel Nov 6, 2020 at 4:50 AM


@Sebastiaan,

At this point, I've only done this one exploration. To go further with WKHTMLTOPDF, I'd have to get our Security team involved to see if I can install this on our production servers. It would be cool - just lower on my list of priorities :D
