Ben Nadel

Experimenting With The Amazon Simple Storage Service (S3) API Using ColdFusion

By Ben Nadel
Tags: ColdFusion

Before I say anything, I should probably mention that as of ColdFusion 9.0.1, ColdFusion has had native file-support for Amazon S3 using the "s3://" protocol. That said, I wanted to try experimenting with the Amazon S3 REST API using ColdFusion's CFHttp functionality. I know that I'm like 5 years (at least) behind everyone else on this topic; so, this blog post won't add much to the conversation - really, this is just here for my own reference.

Amazon Simple Storage Service (S3) is a hugely scalable data storage system. But, it is not a file system; it is a key-value store. You can have it mimic a file system by using storage keys that look like file paths; many applications, including ColdFusion's native S3 integration, present S3 as a file hierarchy. But at the end of the day, that's just a user-friendly abstraction built on top of the resource key that identifies a stored object.

The "not a file system" nature of Amazon S3 has other implications, as well, such as consistency. In some regions (but not all), S3 provides "eventual consistency." I don't have a full grasp of how "eventual" eventual consistency is; but in the US Standard region, due to the cross-country latency, Amazon does not guarantee read-after-write access.

Right now, I don't know whether this eventual consistency applies to every client of your application or only across clients. Meaning, if I PUT an object into S3, can I (as the PUT executer) read that object from S3 immediately? I'll have to do some more reading on this.

Ok, enough with the background, let's do some experimenting. For this post, all I want to do is try to upload an object to Amazon Simple Storage Service (S3), read it out as a binary, and provide an authenticated, public URL to the object.

Uploading Objects To Amazon Simple Storage Service (S3)

Amazon S3 can store just about anything with only the loosest of size constraints. It simply stores bytes. Those bytes can represent text files; those bytes can also represent images. We're going to try uploading an image of the beautiful and talented Helena Bonham Carter.

All authenticated requests to the S3 REST API must include a signature - a Base64-encoded hash-based message authentication code. As of ColdFusion 10, generating Hmac values is wicked easy and can be done with the native hmac() function; but, since I am on ColdFusion 9, I'll use my Crypto.cfc Hmac component.
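For anyone on ColdFusion 10 or later, the signing step might look something like the sketch below. It assumes that hmac() hands back a hex-encoded string, which then has to be converted to Base64 for S3. The signS3Request() name is hypothetical - it is not part of Crypto.cfc.

```cfml
<cfscript>

	// Hypothetical helper (not part of Crypto.cfc): signs the given
	// string with Hmac-Sha1 and returns the result as Base64.
	function signS3Request(
		required string secretKey,
		required string stringToSign
		) {

		// hmac() (ColdFusion 10+) returns the hash as a hex-encoded string.
		var hexSignature = hmac( stringToSign, secretKey, "HmacSHA1", "utf-8" );

		// S3 expects Base64, so convert hex -> binary -> Base64.
		return( binaryEncode( binaryDecode( hexSignature, "hex" ), "base64" ) );

	}

</cfscript>
```

With that in place, the signature could be generated with signS3Request( aws.secretKey, stringToSign ) instead of the Crypto.cfc call.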

When uploading the file to S3, we'll send its binary value as the body of the PUT request.

  • <!---
  • Creates a structure with the secretKey and accessID so that I
  • don't have to have them in the blog post.
  • --->
  • <cfinclude template="credentials.cfm" />
  •  
  • <!---
  • This is the file we are going to upload. We need to read in the
  • binary file since we aren't posting it like a form field - we're
  • posting it as the BODY of the PUT request.
  • --->
  • <cfset content = fileReadBinary( expandPath( "./helena.jpg" ) ) />
  •  
  • <!---
  • When uploading the file, we are going to save it at the
  • following "Key". NOTE: S3 is NOT A FILE SYSTEM. It's a key/value
  • store. While this resource address looks like a file path, it is
  • a single key.
  • --->
  • <cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • All requests to the S3 API have to be authenticated. Here, we are
  • going to create the "signature" to be used in the Authorization
  • header of the PUT request.
  • --->
  •  
  • <!---
  • A timestamp is required for all authenticated requests (NOTE: This
  • does not apply to query-string-authentication based requests).
  • --->
  • <cfset currentTime = getHttpTimeString( now() ) />
  •  
  • <!---
  • The content type is not required; but it will be stored as meta-
  • data with the object if supplied.
  • --->
  • <cfset contentType = "image/jpeg" />
  •  
  • <!---
  • Set up the part of the string to sign - we are not including any
  • X-AMZ headers in this.
  • --->
  • <cfset stringToSignParts = [
  • "PUT",
  • "",
  • contentType,
  • currentTime,
  • resource
  • ] />
  •  
  • <!--- Collapse the parts into a newline-delimited list. --->
  • <cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />
  •  
  • <!---
  • The target string is then signed using Hmac-Sha1 hashing, and
  • must be encoded as Base64. For this, I am using my Crypto.cfc
  • component.
  •  
  • NOTE: If you have ColdFusion 10, the hmac() function will now
  • do this with a single function call.
  • --->
  • <cfset signature = new Crypto().hmacSha1(
  • aws.secretKey,
  • stringToSign,
  • "base64"
  • ) />
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • Post the actual binary to the S3 bucket at the given resource.
  • NOTE: Since we have not provided any ACL (Access Control List)
  • permissions, the resource will be stored as *private* by default.
  • --->
  • <cfhttp
  • result="put"
  • method="put"
  • url="https://s3.amazonaws.com#resource#">
  •  
  • <cfhttpparam
  • type="header"
  • name="Authorization"
  • value="AWS #aws.accessID#:#signature#"
  • />
  •  
  • <cfhttpparam
  • type="header"
  • name="Content-Length"
  • value="#arrayLen( content )#"
  • />
  •  
  • <cfhttpparam
  • type="header"
  • name="Content-Type"
  • value="#contentType#"
  • />
  •  
  • <cfhttpparam
  • type="header"
  • name="Date"
  • value="#currentTime#"
  • />
  •  
  • <cfhttpparam
  • type="body"
  • value="#content#"
  • />
  •  
  • </cfhttp>
  •  
  •  
  • <!--- Dump out the Amazon S3 response. --->
  • <cfdump
  • var="#put#"
  • label="S3 Response"
  • />

By default, the object is stored with private access settings. This means that only authenticated users can view the object using the resource URL. You can pass a lot of additional settings with the PUT command, including access control permissions; but, for this blog post, I'll keep it as simple as possible.
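As a rough sketch of one such setting, a canned ACL could be requested with the x-amz-acl header. My understanding of the signing rules is that any x-amz-* header has to be added to the string-to-sign (lowercase name, colon, value) on its own line just before the resource. Treat that, and the "public-read" value, as assumptions to check against the S3 docs. The sketch reuses the credentials include, the image, and the Crypto.cfc component from the upload example above.

```cfml
<cfinclude template="credentials.cfm" />

<cfset content = fileReadBinary( expandPath( "./helena.jpg" ) ) />
<cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />
<cfset currentTime = getHttpTimeString( now() ) />
<cfset contentType = "image/jpeg" />

<!--- Assumption: canned ACL that makes the object publicly readable. --->
<cfset acl = "public-read" />

<!---
	The X-AMZ header sits between the date and the resource in the
	string to sign, formatted as "lowercase-name:value".
--->
<cfset stringToSignParts = [
	"PUT",
	"",
	contentType,
	currentTime,
	"x-amz-acl:#acl#",
	resource
] />

<cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />

<cfset signature = new Crypto().hmacSha1(
	aws.secretKey,
	stringToSign,
	"base64"
) />

<cfhttp result="put" method="put" url="https://s3.amazonaws.com#resource#">
	<cfhttpparam type="header" name="Authorization" value="AWS #aws.accessID#:#signature#" />
	<cfhttpparam type="header" name="Content-Length" value="#arrayLen( content )#" />
	<cfhttpparam type="header" name="Content-Type" value="#contentType#" />
	<cfhttpparam type="header" name="Date" value="#currentTime#" />
	<!--- The same header that was signed has to be sent with the request. --->
	<cfhttpparam type="header" name="x-amz-acl" value="#acl#" />
	<cfhttpparam type="body" value="#content#" />
</cfhttp>
```

The only differences from the private upload are the extra line in the string to sign and the matching request header. If the two don't agree, I would expect S3 to reject the request with a signature mismatch.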

Reading Objects From Amazon Simple Storage Service (S3)

Now that we've uploaded our image, let's read it back out. Like the PUT command, the GET command also has to be authenticated with the Hmac signature.

  • <!---
  • Creates a structure with the secretKey and accessID so that I
  • don't have to have them in the blog post.
  • --->
  • <cfinclude template="credentials.cfm" />
  •  
  • <!--- This is the resource that we want to read as a binary. --->
  • <cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • All requests to the S3 API have to be authenticated. Here, we are
  • going to create the "signature" to be used in the Authorization
  • header of the GET request.
  • --->
  •  
  • <!---
  • A timestamp is required for all authenticated requests (NOTE: This
  • does not apply to query-string-authentication based requests).
  • --->
  • <cfset currentTime = getHttpTimeString( now() ) />
  •  
  • <!--- Set up the part of the string to sign. --->
  • <cfset stringToSignParts = [
  • "GET",
  • "",
  • "",
  • currentTime,
  • resource
  • ] />
  •  
  • <!--- Collapse the parts into a newline-delimited list. --->
  • <cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />
  •  
  • <!---
  • The target string is then signed using Hmac-Sha1 hashing, and
  • must be encoded as Base64. For this, I am using my Crypto.cfc
  • component.
  •  
  • NOTE: If you have ColdFusion 10, the hmac() function will now
  • do this with a single function call.
  • --->
  • <cfset signature = new Crypto().hmacSha1(
  • aws.secretKey,
  • stringToSign,
  • "base64"
  • ) />
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!--- Read the S3 resource AS A BINARY object. --->
  • <cfhttp
  • result="get"
  • method="get"
  • url="https://s3.amazonaws.com#resource#"
  • getasbinary="yes">
  •  
  • <cfhttpparam
  • type="header"
  • name="Authorization"
  • value="AWS #aws.accessID#:#signature#"
  • />
  •  
  • <cfhttpparam
  • type="header"
  • name="Date"
  • value="#currentTime#"
  • />
  •  
  • </cfhttp>
  •  
  •  
  • <!---
  • Reset the output buffer and then stream the content to the
  • screen as an image.
  • --->
  • <cfcontent
  • type="image/jpeg"
  • variable="#get.fileContent#"
  • />

Notice that both the PUT and the GET actions required the current date to be set as part of the request headers. This date/time value needs to be within 15 minutes of Amazon S3 system time, or the request will be rejected. In addition to being current, the date/time value also has to be posted in a specific format. Luckily, ColdFusion's native getHttpTimeString() function makes this super easy as well.

Generating Pre-Signed Urls For Amazon Simple Storage Service (S3) Objects

Now that we've seen that we, as authenticated S3 users, can write-to and read-from the REST API, let's look at how to provide public URLs to our uploaded objects. Using "Query String Request Authentication," we can put our authentication signature directly into the request URL, removing the need for our end-users to provide the Authorization request header.

These generated URLs are time-sensitive. That is, we define an expiration date as part of the URL definition. Once the URL has expired, Amazon S3 will start returning "Access Denied" responses. The expiration is defined as the number of seconds since Epoch. In our demo, we'll provide a URL that is valid for only 10 seconds.

  • <!---
  • Creates a structure with the secretKey and accessID so that I
  • don't have to have them in the blog post.
  • --->
  • <cfinclude template="credentials.cfm" />
  •  
  • <!---
  • This is the base resource that we want to provide a URL to. Since
  • the resource was stored with Private permissions, we'll need to
  • create a query-string-authentication URL that will grant people
  • access to the resource (for a limited amount of time).
  • --->
  • <cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • Using Query-String-Authentication is no different than any other
  • authenticated request in that an authentication signature still
  • needs to be provided. In this case, it will be part of the URL
  • that we are generating.
  • --->
  •  
  • <!---
  • The URL will only be valid for a certain amount of time. This
  • time will be determined by the number of SECONDS since Epoch.
  • For this demo, we'll make the URL available for 10 seconds.
  • --->
  • <cfset nowInSeconds = fix( now().getTime() / 1000 ) />
  •  
  • <!--- Add 10 seconds. --->
  • <cfset expirationInSeconds = ( nowInSeconds + 10 ) />
  •  
  • <!---
  • Prepare the parts of the signature - we are going to leave the
  • MD5 hash and the content type blank since the GET request won't
  • send those.
  • --->
  • <cfset stringToSignParts = [
  • "GET",
  • "",
  • "",
  • expirationInSeconds,
  • resource
  • ] />
  •  
  • <!--- Collapse the parts into a newline-delimited list. --->
  • <cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />
  •  
  • <!---
  • The target string is then signed using Hmac-Sha1 hashing, and
  • must be encoded as Base64. For this, I am using my Crypto.cfc
  • component.
  •  
  • NOTE: If you have ColdFusion 10, the hmac() function will now
  • do this with a single function call.
  • --->
  • <cfset signature = new Crypto().hmacSha1(
  • aws.secretKey,
  • stringToSign,
  • "base64"
  • ) />
  •  
  • <!---
  • Make sure the signature is properly encoded for use in the query
  • string of a GET request.
  • --->
  • <cfset urlEncodedSignature = urlEncodedFormat( signature ) />
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <cfoutput>
  •  
  • <img src="https://s3.amazonaws.com#resource#?AWSAccessKeyId=#aws.accessID#&Expires=#expirationInSeconds#&Signature=#urlEncodedSignature#" />
  •  
  • </cfoutput>

NOTE: I am using the undocumented .getTime() method of the Java Date object. You could be a bit more "proper" and use the dateDiff() function.

After we generate this URL, we can then use it to populate an IMG "src" attribute presented to our users. In this way, we can provide "secure" content to our users without making our S3 objects public.
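If you need these URLs in more than one place, the steps above could be wrapped in a small function. This is only a sketch: the getPreSignedUrl() name and its arguments are hypothetical, it relies on the same Crypto.cfc component and .getTime() approach used above, and it does not try to URL-encode the resource key itself.

```cfml
<cfscript>

	// Hypothetical helper: returns a pre-signed GET url for the given
	// resource that expires the given number of seconds from now.
	function getPreSignedUrl(
		required string accessID,
		required string secretKey,
		required string resource,
		required numeric secondsValid
		) {

		// Expiration is expressed as the number of seconds since Epoch.
		var expirationInSeconds = ( fix( now().getTime() / 1000 ) + secondsValid );

		// MD5 hash and content type are left blank for a GET request.
		var stringToSignParts = [ "GET", "", "", expirationInSeconds, resource ];

		var stringToSign = arrayToList( stringToSignParts, chr( 10 ) );

		var signature = new Crypto().hmacSha1( secretKey, stringToSign, "base64" );

		// The signature has to be url-encoded for use in the query string.
		return(
			"https://s3.amazonaws.com#resource#" &
			"?AWSAccessKeyId=#accessID#" &
			"&Expires=#expirationInSeconds#" &
			"&Signature=#urlEncodedFormat( signature )#"
		);

	}

</cfscript>
```

The IMG tag could then be written inside a CFOutput block with src="#getPreSignedUrl( aws.accessID, aws.secretKey, resource, 10 )#", which keeps the signing details out of the view code.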

This is only a taste of what the Amazon Simple Storage Service (S3) can do. There's a ton of stuff left to explore.




Reader Comments

@Josh,

My pleasure. It was fun to learn more about this stuff.

@Chebby,

Will do - we're gonna be moving some stuff over to S3, so I am sure I'll be learning all sorts of interesting things / use cases along the way!


I have coincidentally been beating my head against the S3 API for the last week or so. One big "gotcha" I had to work around was file names and paths containing spaces. Remember to URL Encode your request!

If you don't, the signature will be for the non-encoded value while the browser will auto-URL encode the returned presigned URL. This will result in a signature mismatch error being returned by S3.


@Richard,

Glad you like! Hopefully I'll have some more interesting stuff coming. This morning, I blogged a bit more about generating the pre-signed, query string authenticated URLs; but, then deemed that my exploration probably was not very fruitful (other than an increased understanding of the technology).


@Joe,

Oh, super interesting! I had only thought to url-encode the signature; but I think that's because the S3 docs actually have a special NOTE telling you to do so. It would have never occurred to me that url-encoding would be necessary for the file names when generating the signature. Dang! I don't have any idea how I would have even debugged that.

In the past, I know that debugging Hmac values is a wicked super pain. I remember when I was dealing with the Twilio API (I think), I wasn't converting to Hex properly and the leading "0" would always be stripped off... so it failed like 20% of the time :D Talk about frustrating! Took me like a week going back and forth with their support before I figured out what the problem was.

Thanks for the tip!


" Meaning, if I PUT an object into S3, can I (as the PUT executer) read that object from S3 immediately? "

"It depends" : http://aws.amazon.com/s3/faqs/#What_data_consistency_model_does_Amazon_S3_employ "S3 buckets in the US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney) and South America (Sao Paulo) Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard Region provide eventual consistency."


@Tom,

Right, but I'm still not 100% sure I understand the implications of that. Meaning, let's say that I am on the East coast of the US and I PUT an object into S3. When I think about "eventual consistency," it makes me think that if someone on the West coast of the US then immediately tried to request it, it *may* not be available yet, due to the latency in distribution across data centers. But, does that also mean that if I (on the East coast) make a request to the Object I just uploaded, it will be available immediately?

Or maybe I'm just not understanding how the data is distributed across areas.


@Ben,

read-after-write consistency means immediate visibility of new data to all clients.

Amazon is less than perfectly transparent about some things. I dunno if it's just a docs issue, or that they keep updating the service and not the docs structure or what...


@Ben & Tom

In my cursory testing uploading to US Standard, I've been able to access the files immediately after upload. My uploader performs processing on the file on upload success. So it appears that the uploader can hit the file, but perhaps not clients hitting nodes in other regions.

It remains to be seen if that process survives QA. If we don't get consistency, I may have to roll my processing over to a polling type system that keeps an eye on my temp S3 storage location.

I assume the data is distributed as with any other kind of CDN. It gets placed onto a single node immediately and then propagates through the rest of the network.


@Joe, @Tom,

I was at a conference last week talking to John Mancuso, who is a "Solutions Architect at Amazon Web Services". He had mentioned the eventual consistency to me at the time. BUT, he said that the latency was only on the order of 1 second. So, at the very least, if it's eventually consistent, at least "eventual" is super fast.

In my particular scenario, that could be OK, because we don't really need to read directly after write. What we do need to do is:

* Upload.
* Create a pre-signed URL.
* Send that URL to the browser.
* Have the client use an IMG tag with that URL.

So, hopefully the 1s delay (if it happens at all), will be offset by the workflow and server-client communication and HTML rendering overhead.


@Ed,

Thanks for the links; I poked around in Barney's S3; and I've actually used Joe's in a previous project. But, I'd not really gotten my hands dirty with the nitty-gritty of how everything was put together.


Kudos Ben, I read this at just the right time. I have recently migrated two client CF sites to AWS (one Windows, one Ubuntu) using the CF 10 AMI that came out a few months ago. I need to convert the static assets to S3 next. Thanks for the code and the hmac() function. I'll take a look at your Crypto for my CF9 clients.

You are a great asset to the ColdFusion community. I've been coding out here in Seattle for years mostly under the radar admiring your blog from afar.

BTW I've been happy with CF on AWS so far and the pricing beats traditional hosting.


@Noah,

Excellent timing! And, funny you mention the hmac() stuff. I actually, just yesterday, posted a bit more about generating the signatures and the Content-MD5 hash in both ColdFusion 9 and ColdFusion 10:

http://www.bennadel.com/blog/2499-Generating-The-Content-MD5-Checksum-For-The-Amazon-S3-REST-API-Using-ColdFusion.htm

Small world :)

Thank you for the kind words! I'm really glad my blog has been providing value. Hopefully many more years to come!

