Using ColdFusion To Stream Files To The Client Without Loading The Entire File Into Memory

By Ben Nadel on
Tags: ColdFusion

Just a really quick post here. In my previous post on creating semi-secure file downloads, Todd Rafferty brought up the idea that the reason we want to avoid CFContent is because it loads the entire file into memory before it flushes it to the browser. He made the point that the problem is not so much the tying up of threads via CFContent, but rather the fact that so much RAM was tied up in the file load. Therefore, he raised the idea of using a FileInputStream to incrementally load the file into memory and then flush it to the browser. He pointed me to a post on RealityStorm.com, which is what he was basing his code on.

Anyway, I did something like this a long time ago, based on what Christian Cantrell wrote (which is what RealityStorm.com was also referencing); but, I haven't played around with it in a while (and mine didn't use an input stream, but rather a binary variable). And so, I thought I would quickly take the code and port it over to a ColdFusion custom tag - smartcfcontent.cfm. This doesn't support all of the ColdFusion CFContent features, just the File and Type combination. For everything else, you would just want to use the CFContent tag directly.

The use of this ColdFusion tag (in CFModule format) would look like this:

  • <!---
  • Even when using the "smart" buffer, we can still use our
  • standard ColdFusion header values.
  • --->
  • <cfheader
  • name="content-disposition"
  • value="attachment; filename='girls.png'"
  • />
  •  
  • <!---
  • Stream file to browser without having to load the entire
  • file into memory. This uses a 5 KB buffer to shuttle data
  • from a file to the client.
  • --->
  • <cfmodule
  • template="smartcfcontent.cfm"
  • type="image/png"
  • file="#ExpandPath( './girls.png' )#"
  • />

Notice that you can still use the standard ColdFusion CFHeader tag to define your attachment type and suggested file name.

Here is the ColdFusion code behind this custom tag. Nothing revolutionary here; I am really just duplicating what others have done, but in my own style so that Todd Rafferty and I can compare notes:

  • <!--- Param the tag attributes. --->
  •  
  •  
  • <!---
  • This is the mime type of the content that we are
  • streaming to the browser.
  • --->
  • <cfparam
  • name="ATTRIBUTES.Type"
  • type="string"
  • default="application/octet-stream"
  • />
  •  
  • <!---
  • This is the expanded path of the file that will be
  • streamed to the client.
  • --->
  • <cfparam
  • name="ATTRIBUTES.File"
  • type="string"
  • />
  •  
  •  
  • <!---
  • Get a pointer to the response. We will need this to
  • set the header values and finalize the data flush. To get
  • this, we will have to go two levels deep - past the text
  • output stream, to its underlying binary stream.
  • --->
  • <cfset THISTAG.Response = GetPageContext()
  • .GetResponse()
  • .GetResponse()
  • />
  •  
  •  
  • <!---
  • Get a pointer to the underlying binary response stream
  • of the current ColdFusion request.
  • --->
  • <cfset THISTAG.BinaryOutputStream = THISTAG.Response.GetOutputStream() />
  •  
  •  
  • <!---
  • We need to create a byte array that will be used to read
  • in the input stream and then transfer the input stream to
  • the output stream. Since ColdFusion doesn't have true
  • arrays, we need to hack one by grabbing the byte array
  • from a ColdFusion string.
  •  
  • Here, we are using the underlying Java method to grab a
  • byte array that is 5,120 bytes long (5 KB).
  • --->
  • <cfset THISTAG.ByteBuffer = RepeatString( "12345", 1024 )
  • .GetBytes()
  • />
  •  
  •  
  • <!---
  • Now, we need to create a file input stream so that we can
  • read chunks of the file into memory as we stream it.
  • --->
  • <cfset THISTAG.FileInputStream = CreateObject(
  • "java",
  • "java.io.FileInputStream"
  • ).Init(
  • JavaCast( "string", ATTRIBUTES.File )
  • )
  • />
  •  
  •  
  • <!---
  • Before we start putting stuff in the buffer, let's
  • turn off the auto-flushing mechanism so that we have
  • full control.
  • --->
  • <cfset GetPageContext().SetFlushOutput(
  • JavaCast( "boolean", false )
  • ) />
  •  
  •  
  • <!---
  • Reset the buffer to make sure nothing else has built up
  • prior to this tag.
  • --->
  • <cfset THISTAG.Response.ResetBuffer() />
  •  
  •  
  • <!---
  • Set the content type using the mime type that was passed
  • in. This will give the browser information as to how to
  • deal with the streamed content.
  • --->
  • <cfset THISTAG.Response.SetContentType(
  • JavaCast( "string", ATTRIBUTES.Type )
  • ) />
  •  
  •  
  • <!---
  • Now that we have all the elements in place, let's start
  • reading in the file and moving it to the output buffer.
  • We are going to keep doing this until we hit the
  • end of the file.
  • --->
  • <cfloop condition="true">
  •  
  • <!--- Read a chunk of the file into the byte buffer. --->
  • <cfset THISTAG.BytesRead = THISTAG.FileInputStream.Read(
  • THISTAG.ByteBuffer,
  • JavaCast( "int", 0 ),
  • JavaCast( "int", ArrayLen( THISTAG.ByteBuffer ) )
  • ) />
  •  
  •  
  • <!---
  • Check to see if any bytes were read. If not, then we
  • will have a -1 to denote that the end of the file has
  • been reached.
  • --->
  • <cfif (THISTAG.BytesRead NEQ -1)>
  •  
  • <!---
  • Write the buffer to the output stream. We want to be
  • careful only to write as many bytes as were read in.
  • --->
  • <cfset THISTAG.BinaryOutputStream.Write(
  • THISTAG.ByteBuffer,
  • JavaCast( "int", 0 ),
  • JavaCast( "int", THISTAG.BytesRead )
  • ) />
  •  
  • <!--- Flush this new content to the client. --->
  • <cfset THISTAG.BinaryOutputStream.Flush() />
  •  
  • <cfelse>
  •  
  • <!---
  • We hit a (-1). We reached the end of the file. This
  • is not the cleanest solution, but just break out
  • of the loop.
  • --->
  • <cfbreak />
  •  
  • </cfif>
  •  
  • </cfloop>
  •  
  •  
  • <!---
  • ASSERT: At this point, we have fully read in the file,
  • moved it to the binary output stream, and then flushed it
  • to the client. Now, we just have to perform clean up work.
  • --->
  •  
  •  
  • <!---
  • Reset the response. This will clear any remaining information
  • in the buffer as well as any header information.
  • --->
  • <cfset THISTAG.Response.Reset() />
  •  
  • <!---
  • Close the file input stream to make sure we are not locking
  • the file from further use.
  • --->
  • <cfset THISTAG.FileInputStream.Close() />
  •  
  • <!---
  • Close the output stream to make sure no other content is
  • getting flushed to the browser.
  • --->
  • <cfset THISTAG.BinaryOutputStream.Close() />
  •  
  •  
  • <!---
  • Exit out of this tag to make sure it doesn't try to execute
  • for a second time if someone made it self-closing.
  • --->
  • <cfexit method="exittag" />

This works quite nicely.
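
For comparison, here is roughly the same shuttle loop sketched in plain Java, since that is what the tag is calling into anyway. This is only a minimal sketch and not the tag itself: the `ChunkedCopy` class and its `copy` method are names I made up for illustration, and the in-memory streams in `main` stand in for the file and the response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of ColdFusion or the tag above.
class ChunkedCopy {

    // Copies everything from "in" to "out" through a fixed-size buffer, so only
    // bufferSize bytes of the source are held by this loop at any one time.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int bytesRead;
        // read() returns -1 once the end of the stream has been reached.
        while ((bytesRead = in.read(buffer, 0, buffer.length)) != -1) {
            // Only write as many bytes as were actually read.
            out.write(buffer, 0, bytesRead);
            out.flush();
            total += bytesRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[12_345];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources closes the input even if the copy throws.
        try (InputStream in = new ByteArrayInputStream(source)) {
            long copied = copy(in, sink, 5_120);
            System.out.println("Copied " + copied + " bytes");
        }
    }
}
```

The buffer size here matches the 5,120 bytes that `RepeatString( "12345", 1024 ).GetBytes()` produces; whether a larger buffer would be faster is something I have not measured.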




Reader Comments

Good job Ben. I'll have to compare it against what I have at home. I'd love to see some memory stats on this as well and see if it really does what we think it is doing.

P.S.: CFDev Team @ Adobe, you're not off the hook on this - I want CFContent fixed. :P

@Todd: Are you saying that <cfcontent file="/path/to/file"> internally fails to use a file buffer, and instead reads the entire file before transmitting it to the browser (as if you did <cffile action="read"><cfoutput>#filecontent#</cfoutput>)?

If that's the case, this seems like an egregious bug with ColdFusion. Of what real value is <cfcontent file=...> otherwise? Although the docs don't say so explicitly, I always assumed that was its purpose.

@Eric : Yes, to the best of my knowledge, that's what I'm saying. It's been this way for a long time too. Someone from Adobe can come correct me at any time.

Having built 2 iterations of a document library, I have seen some strange things happening with memory when it comes to cfcontent. I have brought this up elsewhere and seen a lot of "ditto" responses that have led me to believe this. Don't get me wrong, CFContent is doing what it is supposed to be doing, streaming a file down to the user. It's just not doing what I'd like it to be doing, buffering. Perhaps there is additional threading involved when it comes to buffering, which is why I'd love to see stats on this.

So, someone can also feel free to step in and tell me why buffering would be bad in this scenario and why reading the whole thing would be more desirable. The only thing I can think of is cfcontent's deleteFile attribute. If you say yes, then it makes sense to read the whole thing into memory and delete the file. However, if I have no need to remove that file, then can't it be a little more memory friendly?

Can anyone confirm that the behavior being presented is still present in CF8? Remember that the file libraries got a complete overhaul.

This might be something to also pass along to the Open BlueDragon committee.

I just did a test in CF 8 using FusionReactor to watch memory while using <cfcontent type="application/x-zip-compressed" file="#ExpandPath('bigfile.zip')#">

Memory was at 141mb when the file started, memory was at 151 when the file transfer completed. The file was 302 meg, and the transfer took about 2 minutes. The memory graph never spiked, and never went over 151 meg, though it did clearly do a garbage collection in the middle of the transfer and drop down to 141 meg before rising gradually to 151 again.

Long story short: it looks like <cfcontent file="..."> uses buffering to transfer the file, it doesn't consume an inappropriate level of resources.

@Eric,

Awesome detective work! This is most excellent to see.

@Todd,

As long as we are in a better place than we were a few hours ago, I have to think this foray has been pretty successful :)

FusionReactor is just downright invaluable. It kicks the butt of CF's built-in server monitoring. It's so great to be able to look at your past requests, see how many queries they executed, see the longest-running queries, see how much memory that request consumed, identify scripts which are consistently slow-running, see the request/response headers after the fact, kill threads which are stuck for whatever reason, set yourself up for alerts, see what your memory/cpu/jdbc utilization is like over time, and if you buy the enterprise version, get a dashboard of the health of all your servers.

One of my favorite features is the ability to add traces to a request. These are bits of extra data which get stored in the request history, so you can click on that request and get bits of debug info (whatever you feel like putting in there). Where I work, we do a lot of interaction with SAP from ColdFusion. I set up for myself the ability to have it trace the inputs and outputs of each SAP call as a flag on our SAP interaction CFC. Then when a business user places an order and claims they're seeing the wrong thing, I can go into the request history, click on the request, and see what I passed to SAP to make sure my inputs are right, and also see what came back from SAP to see if the problem is on that side.

If you're doing personal development or small-scale development, it's probably not worth it, but if you're doing enterprise stuff, this tool is simply critical. It's saved me so much debug time when I can just peek at the request history to narrow down where problems are.

It's also critical for monitoring production instances. We have a big pile of production boxes, some of which have many instances of CF running on them. If we get a complaint about slowness, we can fire up the enterprise dashboard and see at a glance exactly where the problem is - whether it's a specific server, whether it's the database, whether one of the boxes has run out of memory, etc.

Download the trial and look up how to wrap your data sources with the FusionReactor wrapper, as well as how to do FRAPI traces. My guess is you'll be impressed.

@Eric, I don't doubt it. If I were in business for myself, I would be purchasing it. Since I'm only poking around my own server and such, it's a little steep for me (for the moment). I will say that I might suggest a copy of it for work, but I'm not sure they'll go for it (small shop).

@Todd - Yeah, it would be nice if they provided a developer-only version for free (such as if it only worked in Developer-mode CF installs) - they'd probably make a lot of sales that way =)

@Kurt:
I hadn't tested it on 7 when I posted earlier. I'm running a test now. 25 meg into the file, memory is actually down slightly from when I started - a garbage collection clearly ran (started at 111 mb, down to 104 mb now). It looks like CF7 also buffers.

It should be noted, I'm looking at used memory, not allocated memory. The difference is allocated memory is how much RAM the JVM has claimed for itself while used memory is how much of the allocated memory is in use by resident objects (it's always smaller than allocated memory). This is the real measure of actual usage, and has to be reported by the JVM, not by the operating system (FusionReactor shows you both).
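
To make the used-versus-allocated distinction concrete, here is a minimal Java sketch of how those numbers can be read from inside the JVM. It is not how FusionReactor does it (I don't know its internals); the `HeapSnapshot` class is a hypothetical name, and only the `Runtime` calls are standard Java. Note that "used" computed this way also counts garbage that has not been collected yet.

```java
// Hypothetical helper for illustration; only the Runtime calls are standard Java.
class HeapSnapshot {

    final long usedBytes;      // allocated memory currently occupied by objects
    final long allocatedBytes; // memory the JVM has claimed from the OS so far
    final long maxBytes;       // ceiling the JVM will try to stay under (-Xmx)

    private HeapSnapshot(long usedBytes, long allocatedBytes, long maxBytes) {
        this.usedBytes = usedBytes;
        this.allocatedBytes = allocatedBytes;
        this.maxBytes = maxBytes;
    }

    // Reads the three figures from the running JVM.
    static HeapSnapshot take() {
        Runtime rt = Runtime.getRuntime();
        long allocated = rt.totalMemory();
        long free = rt.freeMemory();
        return new HeapSnapshot(allocated - free, allocated, rt.maxMemory());
    }

    public static void main(String[] args) {
        HeapSnapshot s = take();
        long mb = 1024L * 1024L;
        System.out.println("used:      " + (s.usedBytes / mb) + " MB");
        System.out.println("allocated: " + (s.allocatedBytes / mb) + " MB");
        System.out.println("max:       " + (s.maxBytes / mb) + " MB");
    }
}
```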

Hi,

Great to hear some feedback on FusionReactor. Our team is very active in the community and love to hear feedback (negative & positive). I'll certainly pass on your comments.

You can see an online demo of FusionReactor from our website but the best way to see it is download and install it (time limited demo version, don't worry, it just switches off when your time limit expires or, you can buy a license) - I'm sure you'll be impressed and see what a valuable tool it is.

If you're heading to Scotch-on-the-Rocks (UK) or CFUnited (US) then come along and meet some of the team to see FusionReactor and the other tools in our Fusion product suite.

Thanks,
D

Great code! I've modified it to stream large BLOBs from a database and it's working beautifully. Thanks Ben!

Ran into an interesting problem though - it looks like the response reset

  • <cfset THISTAG.Response.Reset() />

is throwing java.lang.IllegalStateException: Response has already been committed.

The exception is not visible to the user but it of course shows up in the exception log.

Has anyone else run into this?

@Mike, that's expected behavior. If you've used <cfcontent> to send content to the browser, that is not buffered for memory reasons (there is no practical limit to how much data you could stream down with this approach). The response being committed means that ColdFusion has already instructed the web server to start sending bytes to the browser. Once they've been sent down the line, you can't recall them.

@Mike,

This has been a while, but I don't remember getting any errors. As @Eric is saying, however, if you've flushed any content to the browser already, then you won't be able to reset. It'd be like trying to call CFContent twice in a row.

Do you have a CFFlush or something before the tag that's "committing" the request?

Hey Ben, not sure if you knew this or not but CF9 has a nasty bug of not being able to serve large files with CFContent (http://cfbugs.adobe.com/cfbugreport/flexbugui/cfbugtracker/main.html#bugId=83425).

I thought I had found a miracle workaround with this posting, but it seems that using smartcfcontent, all files over 127 MB get truncated right at that point. Don't know if this has always happened, or if it's new and related to the CF9 CFContent bug, but that file size seems to be the same as where CFContent can't serve it anymore. Playing with the JVM memory settings doesn't seem to make any difference.

@Eric,

Hmmm, interesting behavior. I haven't served up large files before so have not run into this problem myself. I hope this gets solved in one of the dot-releases.

We never experienced cfcontent issues until we moved to CF9. I get the same errors using the above method as I do with cfcontent "Java heap space null". Some of the files on our file server can be as big as 750mb. Does anyone know if there is a way to make the max heap setting higher than 1024mb? If we make it higher, the CF service won't start back up. Thanks!

@Chad, yes, but only under a 64-bit OS. If you're running 32-bit, your limit is 1024. If you upgrade to 64-bit, you need to do a fresh install of CF - don't just install over top of your existing CF install, or else CF will still be running in 32-bit mode.

Hey!

I'm using your code to read a file that is still being written, so I can begin the stream to the client before needing to wait for the file processing to finish.

Your code as is works great, but if the download speed exceeds the processing speed it will fail. Since I also write a log file when the processing is complete, I added a check for the log file's existence into the loop instead of looking for no new bytes, with a sleep to allow a little more processing to buffer if the log file's not present.
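
Roughly, the loop change looks like the following, sketched here in plain Java rather than CFML. It is only a sketch under a few assumptions: the class name `FollowGrowingFile` and the parameter names are made up for illustration, and it assumes the marker (the log file, in my case) is only created after the last byte of data has been written.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; the names are not from the original tag.
class FollowGrowingFile {

    // Streams "data" to "out" while another process may still be appending to it.
    // Stops only when "doneMarker" exists and no unread bytes remain.
    static long stream(Path data, Path doneMarker, OutputStream out, long pollMillis)
            throws IOException, InterruptedException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream in = Files.newInputStream(data)) {
            while (true) {
                // Check the marker BEFORE reading: if it already existed and the
                // read still finds nothing, the writer cannot add more data.
                boolean writerFinished = Files.exists(doneMarker);
                int bytesRead = in.read(buffer);
                if (bytesRead > 0) {
                    out.write(buffer, 0, bytesRead);
                    out.flush();
                    total += bytesRead;
                } else if (writerFinished) {
                    break;
                } else {
                    // Caught up with the writer; wait for more data to arrive.
                    Thread.sleep(pollMillis);
                }
            }
        }
        return total;
    }
}
```

Checking for the marker before the read, instead of after it, avoids dropping the last few bytes if the writer finishes between the two steps.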

However, when all is said and done, there seems to be a lock on the file once the write is complete. I can still read the file, but can't delete it.

Do you think that since I open the file for reading while its still being written, the program writing it can't close its write handle? I don't really know how the handles are, er, handled as the file is being written by a pile of programs via pipes.. ffmpeg | sox | sox | lame | ffmpeg.

The fact that it works at all is actually outstanding. :D

Jay

@Jay, I don't know of any reason that reading a file will screw up locks on the same file by other processes. But it sounds like you're using a file as an intermediary between stream processing software (eg, ffmpeg) and ColdFusion. I haven't done this directly, but as far as I'm aware those programs can write their raw data to StdOut. If the data is discarded as soon as you stream it to the browser, you should be able to take advantage of that and send the data directly to the browser without it ever sitting on the disk.

You probably have to drop to Java processes, but you can do so entirely from the comfort of a ColdFusion environment. I've done real time command execution back to the browser a few years ago, what I have is probably not too far off from what you need. I talk about it on my long-extinct blog:
http://www.bandeblog.com/2008/03/real-time-command-execution-feedback/

When dealing with binary data, the hairy bits might be surrounding WriteOutput() calls, I'm not sure if you can do that with binary data or not, or how CF will even handle the data at this point. To do it right you need to start using buffered readers and such (the approach there would be a bit slow for really big chunks of data, such as ffmpeg tends to produce).

There's a way to do this, I've just never tried (though you've got me curious).

Interesting, I'll go through that code and see if there are any bits I can use, thank you!! I need to write the output of this process to a file for re-use since its very processor intensive. I haven't had a chance to really work at the problem yet, but I'll post back when I do.

On a side note, after many iterations of executing my processes with java based methods, I fell back to crappy ol' cfexecute to, of all things, a batch file. FFMpeg has an unusual problem in that all console output seems to be written to stdErr, which really sucks. I execute cmd.exe, call a batch file with all the programs I want to run afterwards, like:

route.bat c:\tools\ffmpeg.exe -i blah.flv -f wav - | lame.... etc

The batch file simply has:

echo %* <-- which is awesome
echo %errorlevel%

This allows me to look for the errorlevel in the cfexecute output (always in stdOut). I don't think errorLevel is otherwise available to cfexecute. That %* was new to me today.. it runs everything passed to it, getting over the 9 item variable limit in windoze command line.
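
For anyone who wants to stay on the Java side instead, here is a minimal sketch of getting both things at once: the exit code, and stderr folded into stdout so that output written to stdErr isn't lost. This is not what I ended up running; `RunAndCapture` is a made-up name, and the demo runs `java -version` only because that command writes to stderr and should exist wherever a JVM does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; only the ProcessBuilder calls are standard Java.
class RunAndCapture {

    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    // Runs the command, merging stderr into stdout, and returns output plus exit code.
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // fold stderr into stdout
        Process process = builder.start();

        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                captured.write(buffer, 0, n);
            }
        }
        // The Java-side equivalent of %errorlevel%.
        int exitCode = process.waitFor();
        return new Result(exitCode, new String(captured.toByteArray(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String javaBinary = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        Result result = run(Arrays.asList(javaBinary, "-version"));
        System.out.println("exit code: " + result.exitCode);
        System.out.println(result.output);
    }
}
```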

K, too far off topic, just thought ppl here might be interested. :D

Jay

So, I found the problem with locked files, not being able to delete them after using this code. This code:

<cfset THISTAG.Response.Reset() />

Generates this error in application.log:

"Error","jrpp-2","11/15/10","16:50:42",,"Response has already been committed The specific sequence of files included or processed is: <snip>\stream.cfm, line: 176 "

Thus, the file was simply not being closed. Hope this helps someone out there in internet land.
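
One way to guard against this, sketched in plain Java rather than CFML, is to make sure the close happens no matter what the reset does. This is only an illustration of the idea: `CloseEvenIfResetFails` and `resetResponse` are made-up names, with `resetResponse` standing in for the `THISTAG.Response.Reset()` call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; "resetResponse" stands in for the
// THISTAG.Response.Reset() call and is not a real ColdFusion API.
class CloseEvenIfResetFails {

    // Runs the cleanup step that may throw, but always closes the stream.
    static void cleanUp(InputStream fileStream, Runnable resetResponse) throws IOException {
        try {
            resetResponse.run();
        } catch (IllegalStateException e) {
            // The response was already committed, so there is nothing left to reset.
        } finally {
            // Reached whether or not the reset succeeded, so the file is released.
            fileStream.close();
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[0]);
        cleanUp(in, () -> {
            throw new IllegalStateException("Response has already been committed");
        });
        System.out.println("Stream closed despite the failed reset");
    }
}
```

In the tag itself, the rough equivalent would be closing the file input stream before the reset, or wrapping the reset in a CFTry block.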

I tried the tag on Windows 2008 IIS 7, works GREAT for anything under 1GB download size. For larger files, it stops downloading right at 1GB. Any idea why? I've been pulling my hair out on this for a day now trying all of the usual searches for IIS 7 download limits. The problem happens for me in both Chrome and IE browsers so I'm fairly sure it's not a browser problem. Any ideas anybody?

@Jay, @Ben,

I have encountered the same error: "Response has already been committed". I'm not doing anything else with the file other than downloading to the client so the file being locked isn't a problem for me, but my log file is getting filled with this error message.

I commented out the offending line

<cfset THISTAG.Response.Reset() />

and everything still seems to work properly. Is it OK to remove this line?

FWIW this appears to be fixed in CF901 hotfix 2. I have not verified yet as I don't run the affected server here at work (though I asked for them to install hotfix when they can).

I'm maintaining a site that manages HD video promoting tourism that recently added some big files 1 - 5 GB and I tried this method which works very well for files up to 1GB (the same limitation as using <CFcontent...>)

The download proceeds very well until I see it hang at 1,021 MB (CF9 with hotfix2).

At files that big you'd be better off looking at using XSendFile technology - CF passes a HTTP header to the web server which tells the web server to handle the download. This frees up the CF thread and lets it continue doing what it's good at (generating pages) and let the web server do what its good at (transferring data)

@David,
I've got an IIS server on a Win 2008 box and although I have successfully installed Helicon Ape - I don't understand the next step in the manual installation, like where my site's /Bin folder is, or where and what to put in a web.config file, or where the /Helicon/httpd.conf file is to go...

For that matter I'm uncertain what the differences are between a GAC single site and Snap-in. The documentation assumes I'm far more familiar with the server than I actually am.

I'm feeling my way in the dark & hope for a happy resolution.

XSendFile technology should in no way be necessary. ColdFusion works great in serving up large files less than 1GB in size. There is some configured throttle setting somewhere either in IIS or in CF that is causing it to abort when it gets to 1GB of outbound content transferred. This should NOT be happening. I've pored over the CF configuration in the administrator tool, and IIS configuration as well, for days.

Please....Anybody??

Something might be broken here. I tried to get CF to serve up a large file, and it looks like internally CF is trying to read the entire file into RAM first.

My test file is a single line of code:

  • <cfcontent file="/path/to/large/file">

Over 1GB, this fails. My test file is 1,073,666,048 bytes. In the ColdFusion exception log is the following error:

  • "Error","jrpp-143","06/27/12","10:43:56",,"Java heap space The specific sequence of files included or processed is: /path/to/test/file.cfm'' "
  • java.lang.OutOfMemoryError: Java heap space

I even tried some tomfoolery in Java to write directly to the jsp output context, but I still kept exhausting my heap space, so something inside CF is not letting go of that flushed output. In short, it seems like it might not be possible right now to serve files from ColdFusion that exceed the heap space available to Java.

Increasing the memory you give CF, or writing the file to a temporary location before producing the link: I think these are your only options.

For anyone curious, here's my java tomfoolery:

  •  
  • <cfscript>
  • filename="/path/to/large/file";
  • bufferSize = 4096;
  •  
  • // Java acrobatics to create a byte array (CF can't do this directly, we
  • // need to trick someone else into giving it to us)
  • emptyByteArray =createObject("java", "java.io.ByteArrayOutputStream").init().toByteArray();
  • byteClass = emptyByteArray.getClass().getComponentType();
  • byteArray = createObject("java","java.lang.reflect.Array").newInstance(byteClass, bufferSize);
  •  
  • // Get an input stream for that file
  • javaFile = createobject("java", "java.io.File").init(filename);
  • instream = createobject("java", "java.io.FileInputStream").init(javaFile);
  •  
  • // CF's underlying JSP response objects
  • pageContext = getPageContext();
  • response = pageContext.getResponse();
  • output = response.getOutputStream();
  •  
  •  
  • // Start streaming!!!
  • response.setContentType("application/octet-stream");
  • for (;;) {
  • bytesRead = instream.read(byteArray);
  • if (bytesRead gt -1) {
  • // Directly write the response
  • output.write(byteArray, 0, bytesRead);
  • // Do not let our buffer fill up!
  • pageContext.getOut().flush();
  • } else {
  • break;
  • }
  • }
  • instream.close();
  • abort;
  • </cfscript>

@Eric, @Spencer

Have you thought about slicing the large file up, and serving them sequentially? Seems like it would be a pretty trivial task. Get the overall file size, slice, read each slice and serve. You'd be releasing the file off the heap before reading the next slice.

Just like consuming pie.

Jason

@Eric,

Sorry, to be clear I meant split the file OUTSIDE this environment and read them sequentially.

In *nix you'd use the SPLIT command. It splits a binary file into pieces. Then you'd execute your code as above, looped for each piece. For instance, you'd read 12 100MB files and 1 44MB file to serve a 1244MB file, and you would only be taxing your heap to 1/10 if your heap size is 1GB.
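
To spell out the arithmetic, here is a tiny Java sketch that works out the piece sizes for a given file size. It doesn't split anything on disk; `SplitPlan` and `pieceSizes` are made-up names, and the units are whatever you pass in (MB in my example).

```java
import java.util.Arrays;

// Hypothetical helper for illustration; it only computes sizes, it splits nothing.
class SplitPlan {

    // Returns the size of each piece when totalSize is cut into pieceSize chunks.
    static long[] pieceSizes(long totalSize, long pieceSize) {
        if (totalSize < 0 || pieceSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int fullPieces = (int) (totalSize / pieceSize);
        long remainder = totalSize % pieceSize;
        long[] sizes = new long[fullPieces + (remainder > 0 ? 1 : 0)];
        Arrays.fill(sizes, pieceSize);
        if (remainder > 0) {
            // The last piece holds whatever is left over.
            sizes[sizes.length - 1] = remainder;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 1244 MB in 100 MB pieces.
        System.out.println(Arrays.toString(pieceSizes(1244, 100)));
    }
}
```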

You'd just have to do some GC on javaFile, but I'd wager CF will do it for you when you close the file.

(This is all conjecture, but worth trying IMHO.)

It never gets anywhere near instream.close(); the heap is exhausted inside that for loop.

Heap exhaustion is not related to the size of the file on the disk, it's related to how much data you send downstream.

For example, here I replaced the for loop from my earlier example. Now I only ever read 4kb of data, exactly one time, then I output that data repeatedly:

  • // load just one chunk, and output it repeatedly
  • bytesRead = instream.read(byteArray);
  • for (;;) {
  • output.write(byteArray, 0, bytesRead);
  • output.flush();
  • pageContext.getOut().flush();
  • }

This should loop forever, and the size of your download should be limited only by the request timeout and your connection speed. This should never exhaust the heap - in fact your memory for this request should never exceed something in the neighborhood of 10 to 20 kb.
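
As a baseline for what "should" happen, here is the same experiment sketched in plain Java against a sink that keeps nothing. The names `RepeatedChunkWriter` and `CountingSink` are made up for illustration, and the sink only stands in for a response stream that really does let go of flushed bytes. Against a sink like this, memory stays flat no matter how many bytes go out, so if the heap grows under CF, something downstream of the write is holding on to the data.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration; CountingSink stands in for a response stream
// that discards bytes once they have been flushed.
class RepeatedChunkWriter {

    static final class CountingSink extends OutputStream {
        long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    // Writes the same chunk "times" times; the only buffer in play is "chunk".
    static long writeRepeatedly(byte[] chunk, long times, OutputStream out) throws IOException {
        long total = 0;
        for (long i = 0; i < times; i++) {
            out.write(chunk, 0, chunk.length);
            out.flush();
            total += chunk.length;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        CountingSink sink = new CountingSink();
        // 4 KB written 262,144 times is 1 GB of output from a single 4 KB buffer.
        long sent = writeRepeatedly(new byte[4096], 262_144L, sink);
        System.out.println("bytes sent: " + sent);
    }
}
```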

@Jay,@Eric
I've followed the 2008 single site manual installation and updated the web.config file (2) and copied the httpd.conf from program files/helicon/Ape to my site in a new folder /Helicon (3).

What is missing is step (1) - there doesn't seem to be a Helicon.ape.dll from the Helicon download? So, I'm stuck, because it doesn't appear to work without that file, or some other mistake I've made.

I'd sure like to find an example I might try to assure that it's working whenever I do get it installed.

As for breaking up the files - this is a globally accessed media website for graphics/advertising/marketing people to pick up big pictures (typically 75mb) and HD videos from 100mb to 5GB. There's no way the customer would make users concatenate files. Nor can I expect that it would be easy to implement.

Like Eric, I've tried a lot of things from within ColdFusion 9 HF2 to get this working with no joy. My server is a Dell PER510 dual X5550 with 16GB ram.

I've increased all the memory sizes that I can in the CF administrator with no luck, so I think that this direction to use mod_xsendfile is the right way to go...

<CFHEADER NAME="Content-Disposition" value='attachment; filename="myfile.zip"'>
<CFLOCATION URL="http://www.domain.com/myfile.zip">

After trying all kinds of different methods to stream the file and download the file, the simplest solution is to just CFLOCATION right to the file. I'm able to get this working 100% of the time for downloading large files in all the major browsers.

@Spencer, the problem is that if someone shares that link, then anyone can access the file; you can't put any access controls on a direct download like that.

@john waddell, mod_xsendfile seems like the best option for you, I think CF has a bug for now that prevents you from using CF as the direct delivery mechanism for any file larger than the available heap.

@Eric,

All the access control work was done prior to the CFLOCATION to the file. In our case, it was copied to a temporary location with a randomized path, although we are going to re-implement it with a randomized temporary link structure to allow direct downloading, to avoid having to copy it to a temporary folder first.

@Spencer, It's still temporarily hijackable. How do you clean up the old temporary files? When you're offering someone a 5GB file, how much time do you give them to finish downloading it before you zap that file? Also copying around huge files like that is a lot of disk I/O.

If you're using Linux or Mac OS as your server, you can create a randomly named symlink to the protected file, and delete that symlink after 10 seconds or so. This will save you the overhead of copying the whole file to a temporary location, and because of the nature of the OS, you can delete a file even when it's being accessed by a process; the process will retain a handle to the deleted file and continue being able to serve it up, and that disk space won't actually be re-used by the OS until all handles on the deleted file are closed. If you're on Windows you don't have an equivalent option.
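
A minimal Java sketch of the symlink idea, assuming a Linux or Mac OS server where the web server follows symlinks inside its document root. The `TemporaryLink` class is a made-up name, the timed deletion is left out, and only the `java.nio.file` calls are standard.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper for illustration; scheduling the revoke call
// (for example, 10 seconds later) is left to the caller.
class TemporaryLink {

    // Creates a randomly named symbolic link to "target" inside "publicDir".
    static Path create(Path target, Path publicDir) throws IOException {
        Path link = publicDir.resolve(UUID.randomUUID().toString());
        return Files.createSymbolicLink(link, target.toAbsolutePath());
    }

    // Removes the link only; the protected file itself is untouched.
    static void revoke(Path link) throws IOException {
        Files.deleteIfExists(link);
    }
}
```

Nothing is copied, so creating and revoking the link costs the same for a 5GB file as for a 5KB one.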

@Eric,

Totally understood that it is temporarily hijackable, but the client that bought the file has no incentive to go sharing the download link. The need to download such a huge file does not happen often, at most a few times per day. On Windows..so we'll have to work through some custom solutions. Should be easy though, create a custom randomized URL, start the download, immediately delete the randomized URL from additional new uses, which should not interfere with the download in progress. That's the theory ;)
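
The bookkeeping side of that theory could look something like this, sketched in Java. It is only a sketch under assumptions: `OneTimeLinks`, `issue`, and `redeem` are made-up names, the tokens live in memory only, there is no expiry, and wiring the token into an actual URL is not shown.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; in-memory only, with no expiry.
class OneTimeLinks {

    private final SecureRandom random = new SecureRandom();
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    // Registers a file and returns a random, URL-safe token for it.
    String issue(String filePath) {
        byte[] raw = new byte[24];
        random.nextBytes(raw);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        pending.put(token, filePath);
        return token;
    }

    // Returns the file path the first time a token is used, and empty after that.
    Optional<String> redeem(String token) {
        if (token == null) {
            return Optional.empty();
        }
        // remove() is atomic, so two simultaneous requests cannot both win.
        return Optional.ofNullable(pending.remove(token));
    }
}
```

Because the token is removed as it is redeemed, a second request with the same link gets nothing, while the first download carries on with the path it was handed.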

The #1 design goal of the site was to prevent links from being shared - after all it's government. I crashed the site trying to implement the mod_xsendfile last night, so back to the beginning. I am hoping that the customer goes with an FTP account for each user and we push the file to their account file space. Uses up a lot of disk, but its a win for the users.

The CFLocation is nice & so simple, but the user then has to save the file after rendering it. Some of them really are not that sophisticated.

Couple things..

First, my idea of splitting files wasn't clear enough. Here's what you'd do:

(I'm stubbing this code.. It's Canada Day and I'm not programming lol)
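[code]
<!--- Placeholder path: the directory that will hold the split pieces. --->
<cfset chunkDirectory = "/path/to/chunks" />

<!--- Split the large file. A timeout is needed so that CF waits for the command to finish. --->
<cfexecute name="split" arguments="large_file.mp4 #chunkDirectory#/smaller_files_" timeout="600" />

<!--- Get a listing of the smaller files, in order. --->
<cfdirectory action="list" directory="#chunkDirectory#" filter="smaller_files_*" sort="name asc" name="chunks" />

<cfscript>
// CF's underlying JSP response objects
pageContext = getPageContext();
response = pageContext.getResponse();
output = response.getOutputStream();
response.setContentType("application/octet-stream");

// Reusable 8 KB read buffer (a Java byte array)
byteArray = repeatString(" ", 8192).getBytes();
</cfscript>

<cfloop query="chunks">
<cfscript>
javaFile = createObject("java", "java.io.File").init(chunks.directory & "/" & chunks.name);
instream = createObject("java", "java.io.FileInputStream").init(javaFile);

// Start streaming!!!
bytesRead = instream.read(byteArray);
while (bytesRead gt -1) {
// Directly write the response
output.write(byteArray, javaCast("int", 0), javaCast("int", bytesRead));
// Do not let our buffer fill up!
pageContext.getOut().flush();
bytesRead = instream.read(byteArray);
}
instream.close();
</cfscript>
</cfloop>

<cfabort>

[/code]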

See how I'm splitting, then opening, streaming and closing each smaller file? If the bug lies in the context of the file handling, this will fix it. If it lies in the context of the byteArray / flush, then you're still screwed.
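
For comparison, here is the same open, stream, close sequence as a rough plain-Java sketch, without ColdFusion's response objects in the way. The class name `PartStreamer` and the 8 KB buffer size are assumptions for illustration. It shows only that the copy loop itself needs one fixed-size buffer regardless of how many parts there are; it says nothing about what the layers underneath the output stream do with the bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: streams a list of part files, in order, into one output stream.
final class PartStreamer {

    // Copies every part through a single reusable buffer and returns the total bytes written.
    static long streamParts(List<Path> parts, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            for (Path part : parts) {
                // Each part is opened, streamed, and closed before the next one starts.
                try (InputStream in = Files.newInputStream(part)) {
                    int bytesRead;
                    while ((bytesRead = in.read(buffer)) != -1) {
                        out.write(buffer, 0, bytesRead);
                        out.flush();
                        total += bytesRead;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```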

Here's a way of handling the security problem using Amazon S3.

Copy the large file to a temp location on S3. Once the download has started, replace the file on S3 with a small .jpg that tells the user the file is no longer available.

S3 will continue to serve the large file to the first request but serve the .jpg to all the subsequent ones.

I believe you can do the same thing with soft links in Linux. Apache will continue to serve the large file. So, make a soft link to the large file, serve it, then create a new soft link of the same name to a new small jpg.

On Windows you'll likely get file locking errors. Stupid Windows.

@Jay, your example of smaller files will still trigger the bug. I demonstrated that it's the number of bytes being output that is the problem. Even if you break and disable all possible buffering in CF's JSP context, and even if all you do is output one small piece of data repeatedly and force flushing aggressively, you will still overflow the heap. It's not the size of the input files, it's the size of the response.

Storing the file on S3 and then replacing it with a small .jpg is the same idea as doing a CFLocation and then deleting the file on a Linux or Mac OS server. It suffers from a fundamental problem: you can't know when the download actually begins. Are they using an old version of IE, where the download doesn't actually start until you tell Windows where to store the file? It might take them 2 or 3 minutes to decide where to put that file. It's a lousy solution if the file has vanished before they click "Save," and it's vulnerable if you keep it around for 5 minutes, because that gives them a chance to share the link with an unauthenticated user. You also have to upload the file to S3, and you need to complete that transfer before sharing the S3 link: S3 doesn't allow read access to a file while it's still being written (S3 gives a 404). If you're talking about a 5 GB or 10 GB file, that could be a long while during which, as far as the user can tell, nothing is happening. You're also going to be paying for the transfer to and from S3. Copying the file to a temporary local path is probably a much better idea.

@John Waddell, Helicon Ape is an IIS product, so I assume you're using IIS. In that arena, I think this is very nearly your only option. I recommend contacting Helicon's support for help getting it installed. Ben has covered using mod_xsendfile in the past (and Helicon Ape supposedly has a compatible implementation): http://www.bennadel.com/blog/2170-Streaming-Secure-Files-Efficiently-With-ColdFusion-And-MOD-XSendFile.htm

The only other thing I can think of is that you would need to drop to a different scripting language to securely deliver files larger than CF's available heap. I have used PHP to serve single downloads in the terabyte range, so I know it doesn't have this same weakness. Figuring out how to tell PHP that a particular client is permitted access to a particular download is largely an exercise for you, but I would recommend doing something like using a common file to transfer information back and forth (/crosstalk/#SESSION.CFID#.#SESSION.CFTOKEN#.json, containing {"allowedDownloads":["a","b","c"]}, or something like that).
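
As a rough illustration of the "common file" idea, the writing side might look something like the plain-Java sketch below. The class name `Crosstalk`, the method names, and the validation rule for the session identifiers are all assumptions; only the file naming pattern and the JSON shape come from the comment above. The JSON is built by hand and escapes only quotes and backslashes, so it assumes simple download IDs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical writer for the per-session "crosstalk" file read by the other language.
final class Crosstalk {

    // Builds {"allowedDownloads":["a","b","c"]} for the given IDs (simple strings assumed).
    static String toJson(List<String> allowedDownloads) {
        return allowedDownloads.stream()
            .map(id -> "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(",", "{\"allowedDownloads\":[", "]}"));
    }

    // Writes <cfid>.<cftoken>.json into `dir` and returns the path that was written.
    static Path write(Path dir, String cfid, String cftoken, List<String> allowedDownloads) {
        // Reject anything that could escape the directory (assumed rule, adjust as needed).
        if (!cfid.matches("[A-Za-z0-9_-]+") || !cftoken.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unexpected characters in session identifiers");
        }
        Path file = dir.resolve(cfid + "." + cftoken + ".json");
        try {
            return Files.write(file, toJson(allowedDownloads).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The reading side would look up the file for the session identifiers it was given and refuse any download that is not listed.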

Still, if you can get Helicon Ape installed, that's a much simpler solution once it's working.

@Eric, Thanks, I've contacted Helicon and received this reply:

The bad news is that x-sendfile will not probably work with ColdFusion. The problem is ColdFusion is known to break normal IIS request processing sequence, causing all other modules not to function after ColdFusion module. If it is acceptable for you I suggest you to try free Railo or Open BlueDragon instead of ColdFusion - http://www.helicontech.com/articles/run ... roduction/
http://www.helicontech.com/articles/ins ... on-on-iis/ These are free compatible alternatives to Adobe ColdFusion, which are known to work correctly with IIS using our Zoo integration solution.
Alternatively you may try to use BonCode connector to run ColdFusion with IIS http://boncode.blogspot.com/2012/06/cf- ... -with.html I have not tested this solution but it will not probably have the problems with traditional ColdFusion connectors.

Perhaps there is a fix there, but I'm stuck with CF9 & can't make the case to move to CF10 to fix just this problem. Perhaps I'll consider an FTP option...

We upgraded the server to CF10 and the download problem solved itself. We routinely deliver .FLV, .AVI, and .MP4 files up to 15 GB to our users with no problems. CF9 would not deliver anything over 1 GB.

The content type for MP4 is "video/mp4".
filename is the path to the file.
Then all you need is 2 lines of code...

<cfheader name="content-disposition" value="attachment; filename=""#URL.ID#.#fileextension#""" />
<cfcontent type="#contentType#" file="#filename#">
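
If the content type and header value need to be worked out per file rather than hard-coded, a small lookup along these lines might do. This is a plain-Java sketch under assumptions: the class name `DownloadHeaders` and its methods are hypothetical, and the map covers only the three formats mentioned above. `video/mp4` is the registered type for MP4; `video/x-flv` and `video/x-msvideo` are the commonly used conventions for FLV and AVI.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for choosing the two header values used above.
final class DownloadHeaders {
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("mp4", "video/mp4");
        TYPES.put("flv", "video/x-flv");
        TYPES.put("avi", "video/x-msvideo");
    }

    // Picks a content type from the file extension, falling back to a generic binary type.
    static String contentType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.getOrDefault(ext, "application/octet-stream");
    }

    // Builds the content-disposition value; quotes and backslashes in the name are replaced.
    static String contentDisposition(String fileName) {
        String safe = fileName.replace("\\", "_").replace("\"", "_");
        return "attachment; filename=\"" + safe + "\"";
    }
}
```

The two results would be passed to the `value` attribute of `cfheader` and the `type` attribute of `cfcontent`, respectively.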

@John, thanks for confirming my experience.

I'm struggling, though, to find any resource from Adobe that confirms that cfcontent now streams to the browser directly rather than taking the file into memory before sending it on.

I'd really like to have something to quote to refute past years of functionality!