Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Generate And Incrementally Stream A ZIP Archive File On-The-Fly In Lucee CFML 5.3.7.47

By Ben Nadel on
Tags: ColdFusion

The other day, in the InVision Architecture Office Hours meeting (which is, by far, my favorite meeting of the week), I was talking about how amazing it is that GitHub allows you to download a ZIP archive file of any repository, despite the fact that some repositories are many Gigabytes in size. One engineer (I can't remember who) theorized that GitHub might be generating the ZIP on-the-fly and just streaming the response back to the browser. This concept tickled my curiosity, and I wondered if I could generate and stream a ZIP archive file on-the-fly in Lucee CFML 5.3.7.47.

Typically, when generating a ColdFusion response, we are dealing with Text data. This is often in the form of CFML / HTML for a page response; or, something like JSON (JavaScript Object Notation) for an API response. However, under the hood of every ColdFusion response is a Java Servlet Page Context, which we can access with the function, getPageContext().

Now, I will heavily caveat that I know nothing about Java servlets. But, I have a rough-enough understanding of how Input and Output Streams work. And, by using the Page Context object, we can get access to the underlying Response Output Stream to which ColdFusion writes our String-based CFML content.

With this response output stream, we can start writing any kind of data to the output. Or, we can have another "writer" pipe data into the response on our behalf. And, that's exactly what we're going to do in this experiment: we're going to create an instance of Java's java.util.zip.ZipOutputStream object and have it write directly to the ColdFusion response output stream. Then, as we add Zip entries into the ZipOutputStream, it will handle the compression and write the deflated bytes to the CFML output for us.
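Since this piping is pure java.util.zip behavior, the same wiring can be sketched in plain Java. Here, a ByteArrayOutputStream stands in for the servlet response output stream, and the class and entry names are just illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StreamingZipSketch {
	public static void main( String[] args ) throws Exception {
		// In the CFML below, this would be the servlet response output stream;
		// here, a ByteArrayOutputStream stands in so the sketch is self-contained.
		ByteArrayOutputStream responseStream = new ByteArrayOutputStream();

		try ( ZipOutputStream zip = new ZipOutputStream( responseStream ) ) {
			zip.putNextEntry( new ZipEntry( "streaming-zip/images/example.jpg" ) );
			zip.write( new byte[] { 1, 2, 3 } );
			zip.closeEntry();
			// flush() pushes the bytes produced so far down to the underlying
			// stream -- this is what makes incremental delivery possible.
			zip.flush();
		}

		byte[] bytes = responseStream.toByteArray();
		// Every ZIP file starts with the local-file-header signature "PK".
		System.out.println( bytes[ 0 ] == 'P' && bytes[ 1 ] == 'K' );
	}
}
```

Because ZipOutputStream wraps any OutputStream, swapping the in-memory buffer for the servlet response stream is all the CFML version below really does.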

To experiment with this idea, I grabbed a few of the latest image URLs from my People page. And, I'm going to download and ZIP them together on-the-fly:

ASIDE: As I've discussed before on this blog, image files are already compressed. As such, it doesn't make much sense to store them in a ZIP file using the DEFLATE method, since doing so just wastes CPU cycles. For that reason, I've also looked at using the STORE method with the zip CLI in Lucee CFML to archive images without as much CPU overhead.

<cfscript>

	// To experiment with creating and streaming a ZIP file in Lucee CFML, I'm going to
	// download a number of images from the People section on my website and then add
	// them, in turn, to the ZIP output stream.
	imageUrls = [
		"https://bennadel-cdn.com/images/header/photos/irl_2019_old_school_staff.jpg",
		"https://bennadel-cdn.com/images/header/photos/james_murray_connor_murphy_drew_newberry_alvin_mutisya_nick_miller_jack_neil.jpg",
		"https://bennadel-cdn.com/images/header/photos/juan_agustin_moyano_2.jpg",
		"https://bennadel-cdn.com/images/header/photos/jeremiah_lee_2.jpg",
		"https://bennadel-cdn.com/images/header/photos/wissam_abirached.jpg",
		"https://bennadel-cdn.com/images/header/photos/winnie_tong.jpg",
		"https://bennadel-cdn.com/images/header/photos/sean_roberts.jpg",
		"https://bennadel-cdn.com/images/header/photos/scott_markovits.jpg",
		"https://bennadel-cdn.com/images/header/photos/sara_dunnack_3.jpg",
		"https://bennadel-cdn.com/images/header/photos/salvatore_dagostino.jpg",
		"https://bennadel-cdn.com/images/header/photos/robbie_manalo_jessica_thorp.jpg",
		"https://bennadel-cdn.com/images/header/photos/rich_armstrong.jpg"
	];

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	zipFilename = "people.zip";

	// Reset the output buffer (assuming nothing has been flushed yet) and set up the
	// response headers. We can't tell the response how LONG the content will be; but, we
	// can tell it that it will be a ZIP file and that it should be treated as an
	// attachment (ie, prompt for download).
	header
		name = "content-disposition"
		value = "attachment; filename=""#zipFilename#""; filename*=UTF-8''#urlEncodedFormat( '#zipFilename#' )#"
	;
	content
		type = "application/zip"
		reset = true
	;

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	// Under the surface of each ColdFusion page request is a Java Servlet Response. By
	// dipping down into the Java layer, we can get access to the actual binary output
	// stream to which we can write raw binary content. In this case, we're going to
	// create a Zip Output Stream and have it pipe directly into the ColdFusion response
	// output stream.
	cfmlResponse = getPageContext().getResponse();
	cfmlOutputStream = cfmlResponse.getOutputStream();

	// Notice that we are initializing the Zip output to WRITE TO the CFML response
	// output stream. Now, as the Zip Entries are being calculated, the ZIP will
	// incrementally be flushed to the client (the browser).
	zipOutputStream = createObject( "java", "java.util.zip.ZipOutputStream" )
		.init( cfmlOutputStream )
	;

	// Download and add each image to the Zip in turn.
	for ( imageUrl in imageUrls ) {

		http
			result = "download"
			method = "get"
			url = imageUrl
			getAsBinary = "yes"
		;

		zipEntry = createObject( "java", "java.util.zip.ZipEntry" )
			.init( "streaming-zip/images/" & getFileFromPath( imageUrl ) )
		;
		zipOutputStream.putNextEntry( zipEntry );
		// Write the image binary content directly to the ZIP output stream. This will
		// compress / archive the data on-the-fly and write it to the CFML output stream.
		zipOutputStream.write( download.fileContent );
		zipOutputStream.closeEntry();

		// Flush the Zip output stream and the CFML output stream. This should start
		// sending bytes to the browser incrementally.
		zipOutputStream.flush();
		cfmlOutputStream.flush();

		// Let's sleep in between each image so that we can see how the data is getting
		// incrementally sent to the browser.
		systemOutput( "Sleeping in between ZIP entries.", true );
		sleep( 1000 );

	}

	// Close our output streams and make sure no other data is sent to the browser.
	zipOutputStream.close();
	cfmlOutputStream.close();

</cfscript>

The key part of this ColdFusion file is the line:

<cfscript>

	zipOutputStream = createObject( "java", "java.util.zip.ZipOutputStream" )
		.init( cfmlOutputStream )
	;

</cfscript>

This is where we hand control over the output to the ZipOutputStream. After this, any data that is written to the ZipOutputStream is automatically written to the ColdFusion page response. And, since we flush both the Zip output stream and the CFML output stream after each image is written to the archive, data should start streaming to the browser incrementally.

In fact, I've added a sleep() call after each Zip entry operation so that we can see data slowly streaming to the browser. And, when we run this Lucee CFML code, we get the following output:

A ZIP archive file being generated on-the-fly and streamed to the browser in Lucee CFML.

As you can see, the browser starts to receive data immediately while the Zip archive is being generated in the background. This is because each entry that is added to the Zip archive is, in turn, written to the CFML response output stream.

We could just as easily have created a ZIP archive file on disk, and then used the cfcontent tag with the file/deleteFile attributes or the cfcontent tag with the variable attribute to stream the generated Zip archive file to the client. However, the goal with this approach was to incrementally generate and stream the ZIP file such that we didn't have to keep any large files on disk. This would reduce resource consumption (I think); and, it would obviate the need to clean up the file system when we were done.

I'm not trying to say this approach is better than creating an intermediary file. I'm only saying that this approach might make creating very large archives more feasible. But, again, this was just a fun exploration.



Reader Comments

Hey Ben. This is a great article. At the moment, I am building a project that requires a file upload of 20MB+

I am using the latest version of Lucee on Windows 2012R2 with IIS7

After about 5 minutes, the connection drops. No CF or IIS error.

I have set the:

<cfsetting requestTimeOut="1000" />

My form has the following set:

enctype="multipart/form-data"

I have set the following in my web.config:

<system.web>
	<httpRuntime maxRequestLength="25480" />
</system.web>

I have also made sure I have set the correct folder read/write permissions, for the upload, on my VPS.

I am racking my brains but cannot think of anything obvious that I am missing. I must say, I have never tried to upload files of more than 5MB in the past; but 20MB doesn't seem that excessive.

Any ideas?


@All,

When I was originally putting this post together, I wanted to try and use the STORED method for the images, since images are already compressed. However, I kept running into errors. As such, I wanted to quickly follow up with a post that uses the STORED method:

www.bennadel.com/blog/3966-using-both-stored-and-deflated-compression-methods-with-zipoutputstream-in-lucee-cfml-5-3-7-47.htm

Turns out, I needed to explicitly provide the size and CRC-32 (checksum) for each ZipEntry when using the STORED method instead of the default DEFLATED method.
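For anyone curious, a minimal plain-Java sketch of that requirement looks like the following: before calling putNextEntry(), a STORED entry needs its size and CRC-32 set explicitly, because both are written into the local file header ahead of the data. The entry name and payload here are just illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StoredEntrySketch {
	public static void main( String[] args ) throws Exception {
		byte[] imageBytes = new byte[] { 10, 20, 30, 40 };

		// With STORED (no compression), the size and CRC-32 must be known
		// up front -- ZipOutputStream throws if they are missing.
		CRC32 crc = new CRC32();
		crc.update( imageBytes );

		ZipEntry entry = new ZipEntry( "images/photo.jpg" );
		entry.setMethod( ZipEntry.STORED );
		entry.setSize( imageBytes.length );
		entry.setCrc( crc.getValue() );

		ByteArrayOutputStream out = new ByteArrayOutputStream();

		try ( ZipOutputStream zip = new ZipOutputStream( out ) ) {
			zip.putNextEntry( entry );
			zip.write( imageBytes );
			zip.closeEntry();
		}

		System.out.println( out.size() > 0 );
	}
}
```

With DEFLATED (the default), none of this is necessary, since the sizes and checksum can be written in a trailing data descriptor after the entry's data.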


@Charles,

Hmmm, 5 minutes feels like a really long time to upload a 20MB file. I mean, I guess it depends on your internet connection; but, assuming you're on a decent connection, that feels rather slow. Since you're not getting any errors, I'm wondering if the upload is working, but something is holding the request open?

In the past, I've found that if I have a Content-Length header that is incorrect, the browser will just sit there waiting for content that never arrives, and the request eventually dies.

Have you tried just writing to a Log file or the server's output-stream when receiving / processing requests, just to see if the request is reaching the server? Does the upload work with smaller files?


Ben. Thanks for all your suggestions. I will check out the content-length header.

I have now found that anything below 4MB uploads fine. And, interestingly, the default web.config maxRequestLength is 4MB.

Now, earlier I set maxRequestLength to 102400, thinking that this setting was in KB; but, in fact, I have found out that in IIS6, it is in bytes. So, I am going to bump this up to 102400000 and see what happens!

I thought I was using IIS7, but in fact, it looks like my Windows 2012R2 VPS is running IIS6, which is a little strange.


Sorry. It is IIS8, not IIS6. Just double checked. So, I read a very interesting article about the difference between IIS6 & IIS7, which probably applies to IIS8 as well!

https://weblogs.asp.net/jeffwids/from-iis6-maxrequestlength-to-iis7-maxallowedcontentlengthfile-specifying-maximum-file-upload-size

It seems I need to use maxAllowedContentLength instead of maxRequestLength, but the latter seems to override the former, if both are set in IIS8.

Anyway, enough talk, it is time to find out whether my hypothesis works!

I shall let you know the outcome, in case you ever come across this problem, in future.


Hi Ben. Wow. I have cracked it!

It wasn't quite how I explained it in the previous reply.

In the end, I had to add both maxAllowedContentLength and maxRequestLength, like:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
...
		<security>
			<requestFiltering>
				<requestLimits maxQueryString="32768" maxAllowedContentLength="104857600" />
			</requestFiltering>
		</security>
    </system.webServer>
	<system.web>
       <customErrors mode="Off" />
	   <httpRuntime maxRequestLength="102400000" />
	</system.web>
</configuration>

It seems that both attributes are in bytes, not kilobytes.

That just wasted a whole day, but it is well worth knowing, in case, I need to add capacity for big file uploads, in future.


@Charles,

Oh man, that's some tricky stuff! When it comes to request size, there are so many places where things can "fail", due to all the layers between the user and your ColdFusion application. I'm glad you got it figured out.

I was recently bitten by something related(ish): I upgraded my ColdFusion server, and the datasource connection got reset. So, when I re-created it, I think the max "packet size" for the MySQL driver was something like 64Kb. And, as you may know, I have some long-ass posts from time to time, and the driver was silently truncating them 😱 Thank goodness I had some text/markdown back-ups of my writing.


One last thing, that I overlooked.

In my last reply, I had only tested a 10MB file. When I tested my target file, which is just shy of 20MB, I got a new IIS error, something about an application timeout.

But, with a quick look on StackOverflow, I found an easy solution. We just have to add the following attribute to the web.config httpRuntime property:

executionTimeout="3600"

This gives each page request a 60-minute timeout.

My final web.config without the guff:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
...
        <security>
            <requestFiltering>
                <requestLimits maxQueryString="32768" maxAllowedContentLength="104857600" />
            </requestFiltering>
        </security>
    </system.webServer>
    <system.web>
        <httpRuntime maxRequestLength="102400000" executionTimeout="3600" />
    </system.web>
</configuration>

The executionTimeout attribute uses a value in seconds.

With this set-up, we can now upload files up to about 100MB. Hooray.

My apologies for hijacking your blog with an unrelated topic, but hopefully, this might help someone, in future.


Ben, as regards your database issue: there are so many settings that we are often unaware of; but it is actually quite satisfying getting to the bottom of each problem.

I will remember this because I am certain I will need to submit larger data packets to my MySQL database at some point in the future. 64KB seems pretty small to me, but I am sure there is a good reason for it.

Thanks for the heads up!


@Charles,

So much stuff can break! At work, we have a Request-Timeout on the ColdFusion app, a Request-Timeout in the nginx proxy, and a Request-Timeout in the CloudFlare CDN. And, I'm 99% sure they are all different values. So, when something returns a 504 Gateway Timeout, it's a fun investigation to figure out which level is causing the issue :D

Good sir - no hijacking - I'm always excited to have conversations about programming!


I must say, I am amazed that some of these default values are set so low: only a 4MB upload limit on IIS8! Lucee/Tomcat is 50MB, which seems reasonable. I realise that normally people upload small images etc., but 4MB seems a bit low? And 64KB for a MySQL packet seems kind of small.

I am just amazed that I have never come up against these limits before in my 20-year career as a technical developer.

We live & learn...


@All,

After I posted this, I got to thinking about serving the Zip file up from S3 so that we don't actually have to hold the user-connection open while the Zip file is generated. This, in turn, got me thinking about streaming the file to S3:

www.bennadel.com/blog/3971-generate-and-incrementally-stream-a-zip-archive-to-amazon-s3-using-multipart-uploads-in-lucee-cfml-5-3-7-47.htm

Amazon S3 doesn't really have a "streaming API". But, they do have "multipart uploads", which can kind of work as a streaming API, especially when we run the Zip generation and the S3 upload in parallel.
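In plain Java, that "run in parallel" idea can be sketched with a pipe: one thread writes the ZIP into a PipedOutputStream while another thread reads chunks off the matching PipedInputStream, where each buffered chunk is the kind of thing that would become one multipart-upload part. This is just an illustrative sketch, not the actual S3 code from that post:

```java
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ParallelZipSketch {
	public static void main( String[] args ) throws Exception {
		PipedOutputStream pipeOut = new PipedOutputStream();
		PipedInputStream pipeIn = new PipedInputStream( pipeOut, 64 * 1024 );

		// Producer thread: generates the ZIP into one end of the pipe. Closing
		// the ZipOutputStream closes the pipe, signaling EOF to the consumer.
		Thread producer = new Thread( () -> {
			try ( ZipOutputStream zip = new ZipOutputStream( pipeOut ) ) {
				zip.putNextEntry( new ZipEntry( "images/a.jpg" ) );
				zip.write( new byte[] { 1, 2, 3 } );
				zip.closeEntry();
			} catch ( Exception e ) {
				e.printStackTrace();
			}
		} );
		producer.start();

		// Consumer: reads chunks off the other end of the pipe -- in a real
		// setup, each accumulated chunk would become one S3 multipart part.
		byte[] buffer = new byte[ 8192 ];
		int total = 0;
		int n;

		while ( ( n = pipeIn.read( buffer ) ) != -1 ) {
			total += n;
		}

		producer.join();
		System.out.println( total > 0 );
	}
}
```

The nice property here is the same as with the response stream: the full archive never has to exist in memory or on disk at any one time.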

