Ben Nadel

Url-Encoding Amazon S3 Resource Keys For Pre-Signed Urls In ColdFusion

By Ben Nadel
Tags: ColdFusion

Yesterday, I looked at which characters were being encoded by ColdFusion's urlEncodedFormat() function. This exploration was prompted by some recent trouble I'd been having with generating pre-signed URLs for Amazon S3 objects. I've been using Amazon S3 for a while now without issue; then, recently, I noticed that some of my pre-signed URLs were failing for certain file names. After some digging, it turned out that urlEncodedFormat() was being a bit too aggressive about which characters it escaped. As such, I had to start unescaping some of the characters returned by the urlEncodedFormat() function.

In my Googling, I came across this Amazon Web Services (AWS) forum thread in which one of the AWS support representatives said:

To avoid any ambiguity, you can percent-encode all RFC 3986 2.2 reserved characters when constructing a request URI for an Amazon S3 key.

When I looked up the RFC 3986 specification (for Uniform Resource Identifiers), section 2.2 listed out the characters that should be escaped. But, more importantly, section 2.3 listed out the characters that shouldn't be escaped. These so-called "Unreserved Characters" should be allowed in the URL:

ALPHA / DIGIT / "-" / "." / "_" / "~"

And, here's where I started looking into what urlEncodedFormat() was actually doing. What I found was that all of those non-alpha-numeric "unreserved characters" were being escaped. So, I went in and unescaped them after running urlEncodedFormat(). At that point, my pre-signed Amazon S3 URLs started working consistently.
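A quick way to check this on a given server is to run each unreserved character through urlEncodedFormat() and compare the result to the input. The following is only a minimal sketch; the variable names are mine, and the result may differ from one CFML engine or version to another:

```cfml
<cfscript>

    // The four non-alpha-numeric "unreserved characters" from RFC 3986, section 2.3.
    unreservedCharacters = [ "-", ".", "_", "~" ];

    for ( character in unreservedCharacters ) {

        encodedCharacter = urlEncodedFormat( character, "utf-8" );

        // If the encoded value differs from the input, the character was escaped.
        wasEscaped = ( compare( character, encodedCharacter ) != 0 );

        writeOutput(
            character & " : " & encodedCharacter &
            " : escaped? " & yesNoFormat( wasEscaped ) & "<br />"
        );

    }

</cfscript>
```

Any character reported as escaped here is one that would need to be unescaped again before the key is used in a pre-signed URL.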

To test this, I programmatically generated a file name that contained most of the non-control ASCII characters in the first 127 decimal values. Then, I uploaded this file to Amazon S3 (using the AWS console) and started generating pre-signed URLs for it:

    <cfscript>

        filename = "file";

        // Build up the file name, one ASCII character at a time.
        for ( i = 32 ; i <= 126 ; i++ ) {

            // The colon (:) is illegal on the Mac in folder and file names.
            if ( i == asc( ":" ) ) {

                continue;

            }

            // The forward-slash messes up the file name on the Mac. This ends up being
            // reported as a ":". It's a bit confusing to me. If you do want to use "/" in
            // an S3 file name, you have to escape it so that S3 doesn't treat it as a
            // pseudo-directory separator.
            if ( i == asc( "/" ) ) {

                continue;

            }

            filename &= chr( i );

        }

        // NOTE: We're writing this to the output (instead of writing a file object),
        // since ColdFusion (Java) was having an issue writing some of the characters in
        // the resultant file path. I'm not sure which ones.
        writeOutput( filename & ".jpg" );

    </cfscript>

The file name doesn't contain the ":", which is illegal on Mac OS X. It also doesn't contain the "/" character, since that seems to be transmitted as a ":", which confuses me a bit. I also had to write this file name to the page output, as opposed to doing a fileWrite() call, since Java seemed to be having an issue with one (or more) of the characters in the file name.

But, once the file was created and uploaded, I used the following script to generate and test pre-signed URLs from the browser (using an IMG src) as well as from the server (using both the CFHttp and the CFImage tag - though, only CFHttp is currently included in the demo).

    <cfscript>

        /**
        * I get the expiration in seconds based on the given expires-at date. This takes
        * care of the UTC conversion and expects to receive a date in local time.
        *
        * @output false
        */
        public numeric function getExpirationInSeconds( required date expiresAt ) {

            var localEpoch = dateConvert( "utc2local", "1970/01/01" );

            return( dateDiff( "s", localEpoch, expiresAt ) );

        }


        /**
        * I get the file name of the test file we [previously] uploaded to Amazon S3 in an
        * attempt to explore the URL encoding needed to reference an object key. This
        * generates a file name with every non-control character in the first 127 ASCII
        * characters (less some characters that are illegal on Mac OS X).
        *
        * @output false
        */
        public string function getTestFileName() {

            var filename = "file";

            // Build up the file name, one ASCII character at a time.
            for ( var i = 32 ; i <= 126 ; i++ ) {

                // The colon (:) is illegal on the Mac in folder and file names.
                if ( i == asc( ":" ) ) {

                    continue;

                }

                // The forward-slash messes up the file name on the Mac. This ends up being
                // reported as a ":". It's a bit confusing to me. If you do want to use "/"
                // in an S3 file name, you probably have to escape it so that S3 doesn't
                // treat it as a pseudo-directory separator -- not 100% sure on that.
                if ( i == asc( "/" ) ) {

                    continue;

                }

                filename &= chr( i );

            }

            return( filename & ".jpg" );

        }


        /**
        * I generate the signature for the given resource which will be available until
        * the given expiration date (in seconds).
        *
        * @output false
        */
        public string function generateSignature(
            required string resource,
            required numeric expirationInSeconds
            ) {

            var stringToSignParts = [
                "GET",
                "",
                "",
                expirationInSeconds,
                resource
            ];

            var stringToSign = arrayToList( stringToSignParts, chr( 10 ) );

            var signature = hmac( stringToSign, aws.secretKey, "HmacSHA1", "utf-8" );

            // By default, ColdFusion returns the Hmac in Hex; we need to convert it to
            // base64 for usage in the pre-signed URL.
            return(
                binaryEncode( binaryDecode( signature, "hex" ), "base64" )
            );

        }


        /**
        * I encode the given S3 object key for use in a url. Amazon S3 keys have some non-
        * standard behavior for encoding - see this Amazon forum thread for more information:
        * https://forums.aws.amazon.com/thread.jspa?threadID=55746
        *
        * @output false
        */
        public string function urlEncodeS3Key( required string key ) {

            key = urlEncodedFormat( key, "utf-8" );

            // At this point, we have a key that has been encoded too aggressively by
            // ColdFusion. Now, we have to go through and un-escape the characters that
            // AWS does not expect to be encoded.

            // The following are "unreserved" characters in the RFC 3986 spec for Uniform
            // Resource Identifiers (URIs) - http://tools.ietf.org/html/rfc3986#section-2.3
            key = replace( key, "%2E", ".", "all" );
            key = replace( key, "%2D", "-", "all" );
            key = replace( key, "%5F", "_", "all" );
            key = replace( key, "%7E", "~", "all" );

            // Technically, the "/" characters can be encoded and will work. However, if the
            // bucket name is included in this key, then it will break (since it will bleed
            // into the S3 domain: "s3.amazonaws.com%2fbucket"). As such, I like to unescape
            // the slashes to make the function more flexible. Plus, I think we can all agree
            // that regular slashes make the URLs look nicer.
            key = replace( key, "%2F", "/", "all" );

            // Converting the encoded space (%20) to "+" isn't necessary; but, I think it
            // makes for a more attractive URL.
            // --
            // NOTE: That said, it looks like Amazon S3 may always interpret a "+" as a
            // space, which may not be the way other servers work. As such, we are leaving
            // the "+" literal as the encoded hex value, %2B.
            key = replace( key, "%20", "+", "all" );

            return( key );

        }


        // ------------------------------------------------------ //
        // ------------------------------------------------------ //


        // Include my AWS credentials (so they are not in the code). Creates the structure:
        // * aws.bucket
        // * aws.accessID
        // * aws.secretKey
        include "./credentials.cfm";

        // Define the key to our character-test file. The filename in this key uses all the
        // ASCII characters between 32 and 126 (less the slash and the colon). It looks like:
        // --
        // file !"#$%&'()*+,-.0123456789;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~.jpg
        // --
        key = urlEncodeS3Key( "testing-characters/" & getTestFileName() );

        // Define the full resource of our key in our bucket.
        resource = ( "/" & aws.bucket & "/" & key );

        expirationInSeconds = getExpirationInSeconds( dateAdd( "n", 3, now() ) );

        signature = generateSignature( resource, expirationInSeconds );

        urlEncodedSignature = urlEncodedFormat( signature );

    </cfscript>

    <cfoutput>

        <!--- Render the pre-signed URL directly in the browser. --->
        <img
            src="https://s3.amazonaws.com#resource#?AWSAccessKeyId=#aws.accessID#&Expires=#expirationInSeconds#&Signature=#urlEncodedSignature#"
            width="325"
            style="display: inline-block ; margin-right: 15px ;"
            />

        <!--- Try to use the pre-signed URL to download the image from Amazon S3. --->
        <cfhttp
            result="get"
            method="get"
            url="https://s3.amazonaws.com#resource#?AWSAccessKeyId=#aws.accessID#&Expires=#expirationInSeconds#&Signature=#urlEncodedSignature#"
            getasbinary="yes"
            file="#expandPath( './from-s3.jpg' )#"
            />

        <img src="./from-s3.jpg" width="325" />

    </cfoutput>

As you can see, after I've run the Amazon S3 key through ColdFusion's urlEncodedFormat() function, I then re-introduce the "unreserved characters" from the RFC 3986 spec: ".-_~". After that, I also re-introduce the "/" and the "+" (for spaces). These two aren't strictly required; but, I think they make for a more attractive URL. Plus, the "/" is needed if someone were to pass in the full resource path for the object (instead of just the key).
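To see the effect on a more everyday file name, here is a compact variant of the same encode-then-restore idea. It is only a sketch: the function name encodeKeyForS3(), the restorations array, and the sample key are hypothetical names of mine, and it assumes a ColdFusion version that supports for-in loops over arrays.

```cfml
<cfscript>

    /**
    * Hypothetical compact variant of the approach above: encode the key with
    * urlEncodedFormat(), then restore the characters that should not be escaped.
    */
    public string function encodeKeyForS3( required string key ) {

        var encodedKey = urlEncodedFormat( key, "utf-8" );

        // Each pair is [ escape sequence, character to put back ].
        var restorations = [
            [ "%2E", "." ],
            [ "%2D", "-" ],
            [ "%5F", "_" ],
            [ "%7E", "~" ],
            [ "%2F", "/" ],
            [ "%20", "+" ]
        ];

        for ( var pair in restorations ) {

            encodedKey = replace( encodedKey, pair[ 1 ], pair[ 2 ], "all" );

        }

        return( encodedKey );

    }

    // Sample key (made up for this example) with a space and a literal "+".
    writeOutput( encodeKeyForS3( "testing-characters/my file_v1.0~draft+final.jpg" ) );

</cfscript>
```

With these replacements, the space should come back as "+", the literal "+" should stay encoded as "%2B", and the "/", "-", "_", ".", and "~" characters should appear unescaped in the output.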

Most of the time, when I upload files to Amazon S3, my object keys are composed of alpha-numeric characters and the file names are usually based on database IDs. As such, I never ran into any encoding problems before. But, when I started playing with Amazon S3 and Plupload for client-side uploading, my so-called "normal file names" started causing problems. Hopefully, now that I understand what needs to be encoded, it should be smooth sailing going forward.




Reader Comments

@James,

I am *so* glad you were also seeing differences cross-browser! I thought I was going insane! In some cases, things worked fine in Firefox, but then broke in Chrome/Safari. At first, I was convinced that I must have been doing something wrong (I mean browsers are always "right", right?). But, then once I started to Google around, I found the stuff on the AWS forum about the RFC spec.

I have to say, Googling for URL-encoding of S3 characters revealed very little. Mostly, just people saying it broke; but, not any real detail about what was supposed to be done. It was a surprisingly hard nut to crack; although, maybe I was just Googling for the wrong thing?

Thanks for the link to your post. I should add, though, that I think I read that even the Java SDK was having issues with "+" in the path part of the URI. I think if your file name has a "+" in it, S3 incorrectly interprets that as a "space". Though, don't quote me on that :D


Thanks mate; much appreciated. Been sacrificing chickens and selling parts of my body to science trying to work out what was wrong. Should have checked bennadel.com first.

