Ask Ben: Tracking File Downloads In ColdFusion

By Ben Nadel on May 15, 2007

I am working on a project that includes a little function that I am having trouble with. I have been snooping around various CF resources, and haven't seen anything close to what I'm trying to do.
In a nutshell, I have a library application that allows people to download docs once they've logged in. This part is no trouble.

What I'd like to do is to record when someone downloads a doc. My thought was to have the href link to the doc include an onclick event, which calls a function which would do an sql insert, adding the userID, docID and date to the database.

I'm not proficient in cfc's and cfscript, so if you can give me an example I can modify, I'd be grateful.

First, let me tell you that I don't know what the best solution is. The three methods discussed below are variations on things that I have done - each has its time and place. In order to track the downloads for a particular file, I will either route the file download through a proxy that logs the document activity and then forwards the user to the requested file, or I will use a mouse event tracker like you mentioned. For this demo, I am logging the document requests to a text file, "download_log.txt," but this could just as easily be a database or an XML file or whatever kind of data persistence model you like.

Let's start out with a download proxy. Here is the code for the proxy.cfm ColdFusion template. It takes only one URL parameter, File. This is the name of the file that the user wants to download. This demo assumes that all the files are in a known, web-accessible directory. If your files are in different directories, then you will probably have to alter the ColdFusion CFLocation tag:

<!---
	We do not want any output for this page. Therefore, we
	are going to set it so that only content within a set
	of CFOutput tags will be written to the content buffer.
--->
<cfsetting
	enablecfoutputonly="true"
	/>


<!---
	Param the file value in the URL. This will be the name
	of a file that we are going to let the user download. This
	assumes that all of our files are in the same directory
	(otherwise, we would need to know more about the location
	for this specific file).
--->
<cfparam
	name="URL.file"
	type="string"
	default=""
	/>


<!---
	Log this download information. This logging could be to
	a database, but for the purposes of this example, we are
	going to log this download to a text file. When logging,
	we are going to capture the following information as a
	pipe-delimited list:

	SESSION.User.ID -- ID of the logged-in user.
	URL.File -- Name of the file being downloaded.
	Now() -- Date/Time of download.
--->
<cffile
	action="APPEND"
	file="#ExpandPath( './download_log.txt' )#"
	output="#SESSION.User.ID#|#URL.file#|#Now()#"
	addnewline="true"
	/>


<!---
	Now that we have logged information about the file
	download, we can let the user download the file. There
	are two ways to go about this. We can use CFContent to
	stream the file via ColdFusion or we can just use a
	CFLocation to forward the user to the location of the file.
	CFContent would allow us to stream non-web-accessible files
	but puts processing time on the ColdFusion server.
	CFLocation does not really put any work on the CF server,
	but it means the files must be accessible via the web.
--->
<cflocation
	url="./#URL.file#"
	addtoken="false"
	/>

Now that we have our proxy.cfm ColdFusion template in place, we have to go back and alter our links to point to the proxy template rather than directly to the files:

<html>
<head>
	<title>File Download Proxy In ColdFusion</title>
</head>
<body>

	<!---
		When outputting the file links, instead of linking
		directly to the file, route the user through the
		proxy template. This will log the clicks and then
		forward the user to the requested file. The links
		are wrapped in CFOutput so that the UrlEncodedFormat()
		expressions get evaluated.
	--->
	<cfoutput>
	<p>
		<a
			href="./proxy.cfm?file=#UrlEncodedFormat( "picture_1.jpg" )#"
			>Picture 1</a><br />

		<a
			href="./proxy.cfm?file=#UrlEncodedFormat( "picture_2.jpg" )#"
			>Picture 2</a><br />

		<a
			href="./proxy.cfm?file=#UrlEncodedFormat( "picture_3.jpg" )#"
			>Picture 3</a><br />
	</p>
	</cfoutput>

</body>
</html>

Notice that we are URL encoding our file names. This is the safe thing to do since we never know what kind of crazy characters there are in the file names. Also notice that we do not decode the file name on the proxy page. ColdFusion will automatically decode URL values for us (as far as I know).

After clicking around on the links, here is what our download_log.txt file looks like:

4|picture_1.jpg|{ts '2007-05-15 08:49:27'}
4|picture_2.jpg|{ts '2007-05-15 08:49:33'}
4|picture_2.jpg|{ts '2007-05-15 08:49:37'}
4|picture_3.jpg|{ts '2007-05-15 08:49:47'}
4|picture_2.jpg|{ts '2007-05-15 08:49:49'}
4|picture_1.jpg|{ts '2007-05-15 08:49:51'}
4|picture_2.jpg|{ts '2007-05-15 08:50:05'}
4|picture_2.jpg|{ts '2007-05-15 08:50:31'}
4|picture_3.jpg|{ts '2007-05-15 08:55:58'}
4|picture_1.jpg|{ts '2007-05-15 08:55:59'}
4|picture_1.jpg|{ts '2007-05-15 08:56:20'}
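
Since each line is just a pipe-delimited list, tallying the downloads later is mostly a matter of splitting strings. Here is a minimal sketch of that idea. It is written in JavaScript only to match the client-side script used later in this post; the function names parseLogLine() and countDownloads() are made up for this illustration, and it assumes that neither the user ID nor the file name ever contains a pipe character:

```javascript
// Parse one line of download_log.txt into its three fields.
// Assumes the layout shown above: userID|fileName|timestamp.
// (Hypothetical helper; not part of the original demo.)
function parseLogLine( strLine ){
	var arrParts = strLine.split( "|" );

	return {
		userID: arrParts[ 0 ],
		file: arrParts[ 1 ],
		// Strip the {ts '...'} wrapper that appears in the log.
		timestamp: arrParts[ 2 ].replace( /^\{ts '(.*)'\}$/, "$1" )
	};
}

// Count how many times each file appears in the log text.
// (Hypothetical helper; not part of the original demo.)
function countDownloads( strLog ){
	var objCounts = {};

	strLog.split( /\r?\n/ ).forEach( function( strLine ){
		// Skip blank lines (such as the one after a trailing newline).
		if ( ! strLine.length ){
			return;
		}

		var objEntry = parseLogLine( strLine );

		objCounts[ objEntry.file ] = ( ( objCounts[ objEntry.file ] || 0 ) + 1 );
	});

	return objCounts;
}
```

Passing the eleven log lines above to countDownloads() would give 4 downloads for picture_1.jpg, 5 for picture_2.jpg and 2 for picture_3.jpg.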

I am capturing the full file name, but it sounds like you have a good document ID to grab - you would use that instead as IDs are just about always better to use than file names (which might change).

Now, this example uses a CFLocation to forward the user to a given file. This has some down sides. For starters, CFLocation can only point to publicly accessible folders. This means that none of the files being downloaded can be outside of the web root. This removes several security options that we could have implemented via ColdFusion. Additionally, if someone were to right-click on the link and do "Save Target As", the file would show up as a ColdFusion CFM file, NOT as the requested file type (since the browser does not know that it will be forwarded to a different document type). This can be a deal-breaker.

The upside to using CFLocation is that it puts no stress on the ColdFusion server as ColdFusion is not responsible for dealing with the file; once the user is forwarded to the file, streaming the file is the web server's job.

If the security-related down side to the CFLocation method is a deal breaker (which for many, it is), you can always go the ColdFusion CFHeader and CFContent route. Using CFHeader and CFContent, ColdFusion can grab the requested file and stream it to the client with more control and security:

<!---
	We do not want any output for this page. Therefore, we
	are going to set it so that only content within a set
	of CFOutput tags will be written to the content buffer.
--->
<cfsetting
	enablecfoutputonly="true"
	/>


<!---
	Param the file value in the URL. This will be the name
	of a file that we are going to let the user download. This
	assumes that all of our files are in the same directory
	(otherwise, we would need to know more about the location
	for this specific file).
--->
<cfparam
	name="URL.file"
	type="string"
	default=""
	/>


<!---
	Log this download information. This logging could be to
	a database, but for the purposes of this example, we are
	going to log this download to a text file. When logging,
	we are going to capture the following information as a
	pipe-delimited list:

	SESSION.User.ID -- ID of the logged-in user.
	URL.File -- Name of the file being downloaded.
	Now() -- Date/Time of download.
--->
<cffile
	action="APPEND"
	file="#ExpandPath( './download_log.txt' )#"
	output="#SESSION.User.ID#|#URL.file#|#Now()#"
	addnewline="true"
	/>


<!---
	Before we send over the content, we might want to try
	to narrow down the type of content being streamed. You
	can use the file extension to help figure this out.
	While this does NOT affect the content of the file
	itself, it will help the client deal with the file once
	it is downloaded.
--->
<cfswitch expression="#ListLast( URL.file, '.' )#">

	<!--- JPEG images (the registered type is image/jpeg). --->
	<cfcase value="jpg,jpeg,pjpeg">
		<cfset strMime = "image/jpeg" />
	</cfcase>

	<!--- Other image types. --->
	<cfcase value="gif,png,pic,bmp">
		<cfset strMime = "image/#ListLast( URL.file, '.' )#" />
	</cfcase>

	<!--- MS Excel. --->
	<cfcase value="xls">
		<cfset strMime = "application/vnd.ms-excel" />
	</cfcase>

	<!--- MS Word. --->
	<cfcase value="doc,mht,rtf">
		<cfset strMime = "application/msword" />
	</cfcase>

	<!--- Text. --->
	<cfcase value="txt">
		<cfset strMime = "text/plain" />
	</cfcase>

	<!---
		Our default value will just send the default MIME
		type, the octet stream, which is our way of just
		saying we have no idea what the file type is.
	--->
	<cfdefaultcase>
		<cfset strMime = "application/octet-stream" />
	</cfdefaultcase>

</cfswitch>


<!---
	Tell the client to try and open this file inline. This
	is the best option if you expect to be getting lots of
	image requests. We can also tell the client what the
	suggested file name of the asset is.
--->
<cfheader
	name="content-disposition"
	value="inline; filename=#URL.file#"
	/>


<!---
	Stream the file to the client using CFContent. By doing
	this, we can grab files that are outside of the web root.
	This gives us more security access options. Notice that
	we have to use the Full Server Path for this file
	since ColdFusion needs to know exactly where to find it.
	Also, we are passing back the file's mime type which we
	calculated above. This will help the client figure out
	how to best deal with the resultant file.
--->
<cfcontent
	type="#strMime#"
	file="#ExpandPath( './#URL.file#' )#"
	/>

When we use this technique, the right-click "Save Target As" still comes up as a ColdFusion file, but the file that gets downloaded can be anywhere that ColdFusion can reach. It does not have to be web-accessible which means you can implement all the security around file access that your heart desires.

As a somewhat large down side to this method, the file download is much slower since ColdFusion has to actually read in the binary data and stream it to the client. It just doesn't do it that fast, even on small files. This will probably be most noticeable on images where the image can load a bit at a time.

Ok, so how do we deal with the whole right-click issue? If letting your users right-click is a "must have," then using a file download proxy is not the way to go. In that case, I would suggest using the method you mentioned in your question: a Javascript onclick mouse event handler. In this case, we are going to create a download logging page that will track via an onclick event:

<!---
	We do not want any output for this page. Therefore, we
	are going to set it so that only content within a set
	of CFOutput tags will be written to the content buffer.
--->
<cfsetting
	enablecfoutputonly="true"
	/>


<!---
	Param the file value in the URL. This will be the name
	of a file that we are going to let the user download. This
	assumes that all of our files are in the same directory
	(otherwise, we would need to know more about the location
	for this specific file).
--->
<cfparam
	name="URL.file"
	type="string"
	default=""
	/>


<!---
	Log this download information. This logging could be to
	a database, but for the purposes of this example, we are
	going to log this download to a text file. When logging,
	we are going to capture the following information as a
	pipe-delimited list:

	SESSION.User.ID -- ID of the logged-in user.
	URL.File -- Name of the file being downloaded.
	Now() -- Date/Time of download.
--->
<cffile
	action="APPEND"
	file="#ExpandPath( './download_log.txt' )#"
	output="#SESSION.User.ID#|#URL.file#|#Now()#"
	addnewline="true"
	/>


<!---
	Since we have logged the file and the client is not
	expecting any feedback from this page, we are done.
--->

As you can see, this ColdFusion template is only slightly different from the ones above it. The actual logging to the data store is the same across all three pages. The only difference is what the ColdFusion template returns to the client. In this case, we are returning nothing in our content buffer (except for some potential white space).

Now, let's look at the HTML page that will tie into this one:

<html>
<head>
	<title>File Download Logging In ColdFusion</title>

	<script type="text/javascript">

		function LogDownload( strFile ){
			// Create an image object to ping the file logging.
			// While the target CFM file will NEVER return a
			// valid image file, we don't really care... we just
			// want to trigger the CFM page itself.
			var imgPing = new Image();

			imgPing.src = (
				"./log_download.cfm?file=" +
				strFile
				);

			// Return out.
			return;
		}

	</script>
</head>
<body>

	<!---
		When outputting the file links, link directly to
		the file and use the onclick event to ping the
		logging template. The links are wrapped in CFOutput
		so that the UrlEncodedFormat() expressions get
		evaluated.
	--->
	<cfoutput>
	<p>
		<a
			href="./picture_1.jpg"
			onclick="LogDownload( '#UrlEncodedFormat( "picture_1.jpg" )#' );"
			>Picture 1</a><br />

		<a
			href="./picture_2.jpg"
			onclick="LogDownload( '#UrlEncodedFormat( "picture_2.jpg" )#' );"
			>Picture 2</a><br />

		<a
			href="./picture_3.jpg"
			onclick="LogDownload( '#UrlEncodedFormat( "picture_3.jpg" )#' );"
			>Picture 3</a><br />
	</p>
	</cfoutput>

</body>
</html>

As you can see, each link is targeted directly at the file itself. This means that ColdFusion does not have to stream the file. It also means that the right-click "Save Target As" will actually result in the proper file type. When the user clicks on any of the links, it triggers a Javascript function call to the LogDownload() method. This method creates an image object and sets its source equal to the download logger ColdFusion template. Now, the CFM template will NEVER return a valid image binary, but frankly, we don't care. It will be invalid, but since we never do anything with it, this is going to be much easier to code than using any sort of AJAX call.
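
As an optional variation (a sketch under my own assumptions, not something the demo above requires): if the file name is only known on the client, the standard Javascript function encodeURIComponent() can do the URL encoding instead of ColdFusion's UrlEncodedFormat(). Adding a timestamp to the URL may also help keep the browser from answering a repeat ping out of its cache. The names BuildLogUrl(), LogDownloadRaw() and the "t" parameter are hypothetical; log_download.cfm would simply ignore "t":

```javascript
// Build the URL for the logging ping. BuildLogUrl() and the "t"
// parameter are hypothetical names used only for this sketch.
function BuildLogUrl( strFile, intTime ){
	return(
		"./log_download.cfm?file=" +
		encodeURIComponent( strFile ) +
		"&t=" +
		intTime
	);
}

// Same idea as LogDownload(), but this version expects the RAW
// (un-encoded) file name and encodes it on the client.
function LogDownloadRaw( strFile ){
	var imgPing = new Image();

	imgPing.src = BuildLogUrl( strFile, new Date().getTime() );
}
```

A link would then call it with the plain file name, as in onclick="LogDownloadRaw( 'picture_1.jpg' );". Only one side should do the encoding; passing an already-encoded name to this version would encode it twice.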

So, these are three options that can be used to track user downloads. Each has its time and place for use. You have to decide which aspects of each are more important to you.

Reader Comments

Rich Rein May 15, 2007 at 2:01 PM

We have taken kind of a hybrid approach to this problem on our internal website. A dynamic page listing files available for download, grouped by category (I will save the admin side of this functionality from the explanation, but suffice it to say that each file is listed in a db table, along with various attributes about the file and a foreign key reference to the category that it belongs in), is initially presented to the user. Clicking on a specific file takes them to a page (something like filepreview.cfm?fileid=1) where they can see additional information about the file they have selected, such as file size (from cfdirectory), description (from the db table), file name, etc. Here they can choose to actually download the file (download file button) or to go back to the main file listing. Clicking the download file button takes the user to a generic proxy page (something like downloadfile.cfm?fileid=1) which logs information about the download (user, datetime, fileid) into a db table, and then attempts to automatically push the file to the user. If the automatic download stops due to a popup blocker or for various other reasons, the page also loads with a link to manually download the file (but at this point, the attempted download is already logged).

Ben Nadel May 15, 2007 at 2:06 PM

@Rich,

I like the idea of having a preview page before you have to download the file. When you say that you provide a link to manually download the file, what is that link? Does that link directly to the file itself? Or are you still going through a proxy?

Rich Rein May 15, 2007 at 2:10 PM

The final page (once you have confirmed on the preview page that you actually want to download the file) has a javascript that attempts to do a javascript document.url to the actual file, and then the following html of the page is "If the file does not begin to download please click here", with here being a hyperlink to the file itself.

This solution does require us to have the file all be web-accessible, so currently the security of any individual file is only as good as somebody only accessing urls through the ui (which enforces permissions to determine which files you see in your master list).

Ben Nadel May 15, 2007 at 2:13 PM

@Rich,

I think that is good. Personally, I only feel super file security is needed if absolutely necessary. Otherwise, file handling through the web application is just a drain on resources.

I guess a possible middle ground between the two is copying the file from a secure place to a public place and then linking to it... but this makes me nervous. Having to wait for a file to copy makes me nervous.

Rich Rein May 15, 2007 at 2:17 PM

@Ben-

For super-secret, super-large files, we do have additional functionality in place. When an authorized user requests a file designated as super-secret and super-large, we simply display a page letting them know that we will be sending them an email soon when the file is available for them. At that point, it is sent to an ftp site that the specific user has access to and a task is there to make sure that the file only lives on the ftp server for a set period of time (I believe a day currently). An asynchronous process places the file on the user's ftp space and then sends them an email with directions to pick it up. This is a very arduous process, so most of our files go through the process mentioned above.

Scott May 15, 2007 at 2:18 PM

We actually use a document id and an auth session ID to track downloads. Each link is a form submit that writes the tracked info to a SQL table. We track time, date, user ID, and asset ID. We can then tie the asset ID to the asset table, and the user ID to the user table.

Ben Nadel May 15, 2007 at 2:33 PM

@Rich,

I really like the email with the link idea. I have never thought of that. Player.

Rich Rein May 15, 2007 at 2:37 PM

@Ben-

One of the (few) benefits of having 99% of your code go towards a very targeted audience :)

Like I said before, our solution might not work for everyone (or even every case), but for our site (which is accessed by internal employees and employees of our clients) we have control over who accesses the site in general, client machine specifications (they can access the ftp site and the web server), and user information (we can track who did the download, and have their email address to send notifications to). If it helps save somebody else from re-inventing the wheel, so be it.

Ben Nadel May 15, 2007 at 2:51 PM

Thanks for all the feedback and tips.

Simon Free May 15, 2007 at 3:00 PM

When you do an onclick function, would that track it as a hit if i were to right click on the link? If so would that not provide you with an accurate result? Or is onClick only for left hand clicks?

@rich, i have used a similar method to yours when it comes to large files although my client needed to be able to download multiple large files so our system zips them up as it transfers them to the ftp. I find the ftp method rules, especially as its really easy to disable an account after 24 hours.

Ben Nadel May 15, 2007 at 3:05 PM

@Simon,

Not sure if the right click will trigger a mouse event. I want to say no, but it will get trapped by onmousedown... but again, can't say without testing.

Simon Free May 15, 2007 at 3:13 PM

@Ben,

I just tested it and you were right, onClick doesnt fire for right hand click but onMouseUp does fire.

Tom Mollerus May 15, 2007 at 3:14 PM

@Ben-

You've given some very good tips on how to track downloads and have given security some consideration, but I'd like to point out that because you're allowing the filename to be passed in the URL, someone could request any file in the webroot to be downloaded to them. This means that the proposed solution won't handle file security for groups or users, because anyone could conceivably get any file on the server (e.g., href="./proxy.cfm?file=#UrlEncodedFormat( "../../someone_else's_file.doc" )#").

My suggestion to your user would be to 1) depend on storing a list of files and permissions in a db table somewhere, and passing the file's ID in the URL instead of the file's name; and 2) use cfcontent to deliver the file since it can deliver assets from outside of the webroot. This way, the application logic can check the permissions in the database before allowing the download to occur.

Ben Nadel May 15, 2007 at 3:32 PM

@Tom,

You raise an excellent point. In fact, you could have passed "Application.cfc" in the URL and it would have downloaded the App file (assuming CFContent, not CFLocation). I agree; an ID is ALWAYS the way to go. IDs do not give away any information about the file behind the scenes, and they limit the file to one that corresponds to some sort of data cache.

If you don't have an ID, I would recommend Encrypting the file name at the very least.

Thanks for raising some excellent points.

Simon Free May 15, 2007 at 3:35 PM

@Tom,

With the id method would you be passing in the document ID and checking it against a permissions join table? My thought is that if you pass in an id couldn't someone just change the id to get access to a different file?

Tom Mollerus May 15, 2007 at 3:44 PM

@Simon-

You're right that just using an ID by itself wouldn't secure the file, but I was assuming a certain kind of application logic would be used to make sure that only authorized users could get at certain files. For instance, the permissions table could hold a lookup ensuring that certain files were only available to certain groups or certain users. If you modified the URL string in your browser to pass in the ID for a different file, the application wouldn't find you via the database lookup and wouldn't download the file to you. Let me know if that doesn't make sense.

Simon Free May 15, 2007 at 3:54 PM

@Tom,

That makes sense.

@anyone

If a user was to right click and to choose save target as, would they be able to save the file? Or would this try and have them save a .cfm file. I have had instances that when they have chosen to save target it has prompted as the file name "a file name.cfm" even though they could change it to the correct file extension and it would work, that isn't the way i would like it to work. Has anyone experienced that?

Tom Mollerus May 15, 2007 at 4:04 PM

@Simon-

I've experienced the same problem, where the Save File dialog box presents you with a .cfm filename. The inclusion of "filename=..." in the cfheader tag, like Ben does, is supposed to specify the filename but doesn't always work. I find IE to be a particularly problematic browser for this.

I've seen one kludgy workaround for the issue: instead of sending URL parameters via a query string (e.g. "index.cfm?file=6"), send them as slash-separated path info with the desired filename right at the end (e.g. "index.cfm/file/6//filename/Business Plan.doc"). There are several code libraries out there which can help you parse the CGI.PATH_INFO variable. The server gets the variables, and the browser thinks it's downloading a file with the name of whatever comes after the last slash.

Tom Mollerus May 15, 2007 at 4:06 PM

Whoops-- I shouldn't have included the double foreslash in my comment above. The proposed path info URL should be "index.cfm/file/6/filename/Business Plan.doc"

Simon Free May 15, 2007 at 4:14 PM

@Tom, i love the slash idea, that makes sense. I am going to be changing the site over to slash notation for better SEO so that would match the rest of the site as well.

SWEEET! thanks!

Adam Cameron May 15, 2007 at 7:06 PM

I would tread lightly around <cfcontent>. As Ben touches on, one has to wait around whilst CF transfers it (why CF does not pass this off to the file system, I have no idea).

That is SLOW, but it does not impact server stability.

What DOES impact server stability is that whilst the file is transferring, <cfcontent> holds a thread open. This is not a consideration for occasional small file transfers, but it's a significant one in busy environments or when large files are concerned. Unless one is careful, one can very quickly end up with <cfcontent> transfers holding open all available threads, and the CF server basically grinding to a halt.

Not so nice.

--
Adam

Ben Nadel May 16, 2007 at 7:26 AM

@Adam,

Thanks for driving that point home. I don't generally work with high traffic sites, so did not want to speak with authority... but that is exactly what I was thinking.

Julian Halliwell May 18, 2007 at 5:44 AM

Adam is absolutely correct on the impact of using CFCONTENT for file downloads. Avoid it unless you absolutely have to have the security. If you can't avoid it, make sure you're using your own server/instance and that the simultaneous requests setting in the CFAdmin is set high: not 3 times the no. processors as generally advised, but 10 or more. Just yesterday I had a server lockup several times as I watched the thread count rise during a session where I suspect a group of students had been told to go and download some documents.

If you're on a shared server, then use of CFCONTENT for large downloads could be considered anti-social behaviour.

Jason Wohlford Dec 4, 2008 at 4:09 PM

Ben, thanks a bunch for your tutorials. It's a huge help figuring out ColdFusion.

I'm using the second version of the above to change the name of existing files to something a little more web friendly. It works great on small files, but dies on large ones (~40MB). Any guesses?

Ben Nadel Dec 4, 2008 at 4:22 PM

@Jason,

Glad to help in some way. Streaming files can take up some resources, but from what I have seen, ColdFusion seems to handle it well. When you say "dies", how do you know? Does ColdFusion service restart?

Jason Wohlford Dec 4, 2008 at 5:27 PM

@Ben

Doh! Turns out I had some links that were bad. 404s a plenty.

Stay kinky! ;-)

Ben Nadel Dec 4, 2008 at 5:32 PM

@Jason,

No worries man. I'll be keeping it kinky as long as I can :)

Piyush Nov 24, 2009 at 3:18 AM

I'm facing an issue with cflocation: if the file name is the same for 3-4 entries on the same page and the user tries to download, it displays the same file every time.

Piyush Nov 24, 2009 at 3:20 AM

Could you please let me know a solution to this problem?

Ben Nadel Nov 24, 2009 at 7:57 AM

@Piyush,

In cases like that, you have to come up with a way to uniquely identify the file in the URL. Either you need to use some sort of database-generated ID, or perhaps you could use something like the hash() of the file path.

After all, the files *have* to be in different directories, if they have the same name, or they would *be* the same file. As such, you just need a way to integrate that difference into the URL.