Earlier this week, I posted an experiment in which I used an ordinary Form POST to upload files directly to Amazon's Simple Storage Service (S3) without using my server as an upload proxy. Once I had that working, I really wanted to see if I could do the same thing using Plupload. I've talked about Plupload's excellent drag-and-drop file uploads before; but, I was uploading files directly to my server. Now, with Amazon's POST functionality, I want to do the same thing, only using my S3 bucket as my target URL.
Project: View this project on my GitHub.
The Plupload repository has an Amazon S3 demo; but, even with the demo in hand, it took me a few days to get this working. I kept running into small hurdles around data serialization, unique naming, policy generation, redirects, etc. Finally, however, I think I came up with a stable solution that seems to work with everything that I've thrown at it.
NOTE: Using a normal Form POST allows for a "success redirect", which removes this communication burden from the client (to some degree).
In this demo, the "upload report" is being implemented as an IMG src attribute. This image URL both reports the upload to the ColdFusion server and then consumes a pre-signed query string request authentication URL to the private S3 resource.
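That reporting step can be sketched as follows. This is a hedged illustration, not the demo's exact code: `buildReportUrl` is a hypothetical helper name, and the `success.cfm` path and `key` parameter simply mirror the demo's success handler.

```javascript
// Build the report URL for the ColdFusion success handler. The "key"
// is the S3 resource key of the file that was just uploaded.
function buildReportUrl( key ) {
	return ( "./success.cfm?key=" + encodeURIComponent( key ) );
}

// In the browser, prepending an IMG element with this URL both reports
// the upload to the server and, after the redirect, renders the private
// S3 resource via the pre-signed URL:
//
//   var img = document.createElement( "img" );
//   img.src = buildReportUrl( "pluploads/abc123/photo.jpg" );
//   document.body.insertBefore( img, document.body.firstChild );
```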
That said, here is the main page of the demo - the one that generates the Amazon S3 upload Policy and then renders the Plupload widget. Remember, since this user-interface (UI) may be visible for a good amount of time without any client-side page refresh, we have to create a distant expiration date for the Policy. Welcome to the world of rich client-side applications.
index.cfm - Our Main Plupload Demo
Right before the upload occurs, in the Plupload "BeforeUpload" event, I am dynamically adjusting the POST data for the current file. In this case, I am using the Plupload-generated file ID in order to place the upload in a unique "directory" within the S3 bucket. In this way, I can upload files with the same name and avoid collisions. This allows me to handle possible naming collisions in the business logic on my ColdFusion server, and not on the Amazon S3 bucket.
NOTE: Amazon S3 doesn't really have "directories;" merely, resource keys that can mimic a hierarchical directory structure.
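The BeforeUpload adjustment can be sketched like this. The `pluploads/<fileId>/<fileName>` key format here is my reading of the description above, not necessarily the demo's exact key layout:

```javascript
// Build a unique S3 resource key for each upload using the Plupload-
// generated file ID, so same-named files land under different prefixes.
function buildUploadKey( fileId, fileName ) {
	return ( "pluploads/" + fileId + "/" + fileName );
}

// Wiring it into Plupload (browser-side, shown for context):
//
//   uploader.bind( "BeforeUpload", function( up, file ) {
//       up.settings.multipart_params.key = buildUploadKey( file.id, file.name );
//   });
```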
Once Plupload has posted the file to Amazon S3, I then parse the response and report back to the ColdFusion server using a newly prepended IMG element. Here is the ColdFusion page that accepts the report and generates a pre-signed query string request authentication URL (that will eventually render the upload in the IMG element back on the client-side).
success.cfm - Our ColdFusion Success Handler
<cfscript>

	// Include our ColdFusion 9 -> ColdFusion 10 migration script so
	// that I can work on this at home (CF10) and in the office (CF9).
	include "cf10-migration.cfm";

	// Include the Amazon Web Service (AWS) S3 credentials.
	include "aws-credentials.cfm";

	// ------------------------------------------------------ //
	// ------------------------------------------------------ //

	// We are expecting the key of the uploaded resource.
	// --
	// NOTE: This value will NOT start with a leading slash.
	param name="url.key" type="string";

	// Since the key may have characters that required url-encoding,
	// we have to re-encode the key or our signature may not match.
	urlEncodedKey = urlEncodedFormat( url.key );

	// Now that we have the resource, we can construct a full URL
	// and generate a pre-signed, authorized URL.
	resource = ( "/" & aws.bucket & "/" & urlEncodedKey );

	// The expiration is defined as the number of seconds since
	// epoch - as such, we need to figure out what our local timezone
	// epoch is.
	localEpoch = dateConvert( "utc2local", "1970/01/01" );

	// The resource will expire in +1 day.
	expiration = dateDiff( "s", localEpoch, ( now() + 1 ) );

	// Build up the content of the signature (excluding Content-MD5
	// and the mime-type).
	stringToSignParts = [
		"GET",
		"",
		"",
		expiration,
		resource
	];

	stringToSign = arrayToList( stringToSignParts, chr( 10 ) );

	// Generate the signature as a Base64-encoded string.
	// NOTE: Hmac() function was added in ColdFusion 10.
	signature = binaryEncode(
		binaryDecode(
			hmac( stringToSign, aws.secretKey, "HmacSHA1", "utf-8" ),
			"hex"
		),
		"base64"
	);

	// Prepare the signature for use in a URL (to make sure none of
	// the characters get transported improperly).
	urlEncodedSignature = urlEncodedFormat( signature );

	// ------------------------------------------------------ //
	// ------------------------------------------------------ //

	// Redirect to the pre-signed URL.
	location(
		url = "https://s3.amazonaws.com#resource#?AWSAccessKeyId=#aws.accessID#&Expires=#expiration#&Signature=#urlEncodedSignature#",
		addToken = false
	);

</cfscript>
There's a lot of detail in the upload script - more than I can go into in this one blog post. Hopefully the comments in the actual markup will elucidate some of the decisions I made. I'll also try to follow up with some smaller blog posts about some very specific problems that I had when trying to figure this all out. But, in the end, I was super pumped up to finally get Plupload uploading files directly to Amazon S3!
Ben, this is awesome! I'm looking forward to having this implemented in our application. It should make file uploads faster and take some load off the server. Everything feels logical and secure. Great job!
Thanks! It's pretty exciting. Once you upload to Amazon S3, you may still need to pull it back down to the ColdFusion server (such as for image thumbnailing, validation, etc.). BUT, you don't have to subject the ColdFusion server to *slow* client connections. That's huge! Let Amazon deal with keeping a connection open for 300 seconds... once it's on Amazon S3, the ColdFusion server can pull it down (over the massive internet backbone) in a matter of (milli)seconds.
Very nice work, Ben!
@Josh - it might be worth setting up the infrastructure for your ColdFusion server (or whatever server-side infrastructure you use) to talk to S3 and pull files down as needed. S3 is awesome, but pretty limited in terms of file manipulation. If you ever want to dynamically ZIP files, for example, you can't do that on S3. You have to pull them all down to the server, create the ZIP, and then re-upload to S3 (or serve the ZIP from your server). There are JS solutions for cropping and applying filters to images, but you're probably going to get better options and performance on the server -- for now at least!
Yeah, there's definitely certain image assets that can just be copied. Plus, S3 offers Copy/Move commands that can move resources from the "temp upload" keys to the application keys without pulling them down.
Thanks! And yeah, the ZIPing up of images (well, logical collections of a number of things) is on our soon-to-be-done roadmap. I was just thinking about that as well. Even so, however, pulling things from S3 to the data-center is probably going to be quite fast. Even if we had to download 200MB of stuff (which is larger than we would likely need), I don't think that will add significant wait time. At least not in the small amount of testing that I've done.
"BUT, you don't have to subject the ColdFusion server to *slow* client connections. That's huge! Let Amazon deal with keeping a connection open for 300 seconds... "
PLUpload works in chunks, so uploading even the largest file isn't going to suck up a thread for 300 seconds solidly...
And if your server connection to S3 is that slow, you'll have other issues if you ever need to do anything else on AWS.
So I'm not sure of the benefit of doing all the extra client-side heartache vs just uploading it from CF once the final chunk arrives...
I think the issue is not so much that the server connection is slow, it's that the client-server connection is slow. And, I think this is perhaps very specific to file-upload. I know that when I am at the office, uploads are super fast because we have some sort of business-grade internet connection. However, when I am at home using my personal internet connection, uploading the same file can take like 4-5 times as long.
During that extended upload time, I can see (from FusionReactor) that a ColdFusion thread is being used to read in that file. So, if a user's connection causes them to need 20 seconds to upload a file, that's 20 seconds that ColdFusion needs to be watching that POST.
Now, if we can, instead have people upload directly to Amazon S3, then S3 needs to worry about that 20-second upload time. Once the file is uploaded, however, pulling the file down from S3 to our production server (if we need to) will probably take milliseconds (due to the massive data-center connections).
Well, that's why we chunk.
If a 5 meg file takes 20 seconds, it's not going to take much longer in blocks of 128k or whatever, but you don't see a scary spike for average time in FusionReactor :-)
Oh, and of course you've got upload feedback for chunked uploads in PLUpload too, which users find reassuring.
To be honest, I don't really understand what chunking is and what implications it has. If I chunk a file in Plupload, do I have to manually glue it back together on the server? Or is that something that is hidden away from me (i.e. handled by the ColdFusion infrastructure)?
The target page has to handle the temporary storage and reassembly at the end, but there is example ColdFusion code.
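For what it's worth, turning chunking on in Plupload is just a setting; the reassembly lives on the server. A sketch (the handler URL and the 1mb size here are arbitrary, not from the example code mentioned above):

```javascript
// With a chunk_size setting, Plupload slices each file and POSTs the
// pieces separately, sending "chunk" and "chunks" form fields that the
// server uses to store and reassemble the parts.
//
//   var uploader = new plupload.Uploader({
//       browse_button: "select-files",
//       url: "./upload.cfm",   // hypothetical server-side handler
//       chunk_size: "1mb"
//   });
//
// The number of chunks the server should expect for a given file:
function chunkCount( fileSize, chunkSize ) {
	return Math.max( 1, Math.ceil( fileSize / chunkSize ) );
}
```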
Thanks for the link, I'll check it out. I vaguely remember having to join files together when using the XStandard API. So, hopefully this will make sense :)
How can I upload my files from a specific folder to Amazon S3 (with an access key)? Please suggest.
I am not sure I understand your question. When using Plupload, the user has to explicitly select files from their computer. So, you can't really do anything automated (via Plupload).
Regarding chunking, the S3 API supports multi-part uploads. http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingRESTAPImpUpload.html
Multi-part is important for at least two reasons: it allows you to upload large files more quickly via multiple threads, and it also improves resiliency. S3 ingest speed improves with more threads, assuming you have the bandwidth to support it; plus, if you chunk uploads and a piece fails, you can restart just that chunk.
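That restart-just-one-chunk idea boils down to computing stable byte ranges per part, so a failed part can be re-sent without touching the others. A sketch (note that S3's documented minimum part size is 5MB for all parts except the last):

```javascript
// Compute the byte range for each part of a multipart upload. If part
// N fails, only its [start, end) range needs to be re-sent.
function partRanges( fileSize, partSize ) {
	var ranges = [];
	for ( var start = 0 ; start < fileSize ; start += partSize ) {
		ranges.push({
			partNumber: ( ranges.length + 1 ),
			start: start,
			end: Math.min( ( start + partSize ), fileSize )
		});
	}
	return ranges;
}
```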
Does this still work?
I'm having difficulty trying to implement this. I changed this line
- // The upload URL - our Amazon S3 bucket.
- url: "http://#aws.bucket#.s3.amazonaws.com/",
As Amazon seems to have switched the order to
But I'm having difficulty getting any error to show up, and the upload isn't completing.
I've not read anything about Amazon changing their URL formats. The easiest way to see errors might be to capture the network traffic...
I've recently been having issues with the S3 direct-uploads; but, I believe I have narrowed it down to my VPN software as my current machine cannot run my demo... OR, other demos that I've come across (even ones that are hosted on other people's servers).
As an experiment, try disabling the "html5" runtime and just doing the "flash" runtime. This worked for me. This is how I was able to narrow the issue down to html5 and the OPTIONS preflight request. It seems that on my computer all OPTIONS requests get automatically "Aborted".
Every time I try to run an OPTIONS request, I see an error show up in my console.app (Mac) log output. After some Googling, it looks like this error is related to my Cisco AnyConnect VPN software:
Anyway, not to be long-winded, but I believe the demo still works, but may have complications with the VPN if you have one running.
I just uninstalled my Cisco AnyConnect VPN client and my CORS requests started working again! I will now try to re-install it with fewer options.
This is awesome work. Anyone know of a similar approach using PLUpload on the Microsoft stack? We're on MVC4 (for better or worse) and would like to implement the direct stream to S3.