Last week, I was exploring the use of Java's java.net.HttpURLConnection class as a means to stream file data in a form POST without having to buffer the entire file in memory prior to transfer. This worked fine when the entire post consisted of the file data; but, would the same approach work with multi-part form data (i.e. form data that consists of several name-value pairs)? From what I can gather this morning, it cannot. The problem appears to come from the fact that ColdFusion requires a valid "Content-Length" header on incoming multi-part form requests, and that length is not necessarily known when the form data is being chunked.
To see what I'm talking about, let's take a look at some demo code. Like the example last week, we'll be writing to an output stream of an HttpURLConnection instance. However, unlike last week, we'll be sending several delimited content values in the general form of:
Content-Type: multipart/form-data; boundary=FIELD_BOUNDARY

--FIELD_BOUNDARY
Content-Disposition: form-data; name="title"

The Bride
--FIELD_BOUNDARY
Content-Disposition: form-data; name="author"

Julie Garwood
--FIELD_BOUNDARY
Content-Disposition: form-data; name="publisher"

Pocket Star
--FIELD_BOUNDARY
Content-Disposition: form-data; name="text"; filename="the_bride.txt"
Content-Type: "text/plain"

By Julie Garwood
By edict of the king, the mighty Scottish laird Alec Kincaid
must take an English bride. His choice was Jamie, youngest
daughter of Baron Jamison...a feisty, violet-eyed beauty.
Alec ached to touch her, to tame her, to possess her...forever.
But Jamie vowed never to surrender to this highland barbarian.
He was everything her heart warned against, an arrogant scoundrel
whose rough good looks spoke of savage pleasures. And though
Kincaid's scorching kisses fired her blood, she brazenly resisted
him... until one rapturous moment quelled their clash of wills,
and something far more dangerous than desire threatened to conquer
--FIELD_BOUNDARY--
Notice that each part of the form post is delimited by a consistent boundary marker; and, that each name/value token is separated by two line breaks.
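For readers more at home in plain Java than in CFML, the layout described above can be sketched as a small string builder. This is only an illustration of the wire format: the class and method names (MultipartSketch, buildBody) are hypothetical, and the boundary and field values are the ones used in this post.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class MultipartSketch {

    private static final String CRLF = "\r\n";

    // Render simple name/value fields as a multipart/form-data body.
    // (Hypothetical helper, not part of any library.)
    static String buildBody(String boundary, Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Each part starts with "--" plus the boundary on its own line.
            body.append("--").append(boundary).append(CRLF);
            // The part header and the value are separated by two line breaks.
            body.append("Content-Disposition: form-data; name=\"")
                .append(field.getKey()).append("\"")
                .append(CRLF).append(CRLF);
            body.append(field.getValue()).append(CRLF);
        }
        // The closing delimiter carries an extra trailing "--".
        body.append("--").append(boundary).append("--").append(CRLF);
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "The Bride");
        fields.put("author", "Julie Garwood");
        fields.put("publisher", "Pocket Star");

        String body = buildBody("FIELD_BOUNDARY", fields);
        System.out.print(body);
        System.out.println("Bytes: " + body.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Because the whole body is built in memory here, its byte length is known before anything is sent; that is exactly what is lost once the body is streamed in chunks.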
Ok, let's take a look at the code that I thought would get this to work:
<!---
	Calculate the URL to which we are posting. For this demo, it
	will be another page on this ColdFusion server.
--->
<cfset postUrl = (
	"http://" &
	cgi.server_name &
	getDirectoryFromPath( cgi.script_name ) &
	"target.cfm"
	) />

<!---
	Create an instance of our Java URL - This is the object that
	we will use to open the connection to the above location.
--->
<cfset targetUrl = createObject( "java", "java.net.URL" ).init(
	javaCast( "string", postUrl )
	) />

<!---
	Now that we have our URL, let's open a connection to it. This
	will give us access to the input (download) and output (upload)
	streams for the target end point.

	NOTE: This gives us an instance of java.net.URLConnection (or
	one of its sub-classes).
--->
<cfset connection = targetUrl.openConnection() />

<!---
	By default, the connection is only set to gather target content,
	not to POST it. As such, we have to make sure that we turn on
	output (upload) before we access the data streams.
--->
<cfset connection.setDoOutput( javaCast( "boolean", true ) ) />

<!--- Since we are uploading, we have to set the method to POST. --->
<cfset connection.setRequestMethod( javaCast( "string", "POST" ) ) />

<!---
	By default, the connection will locally buffer the data until it
	is ready to be posted in its entirety. We don't want to hold it
	all in memory, however; as such, we need to explicitly turn data
	Chunking on. This will allow the connection to flush data to the
	target url without having to load it all in memory (this is
	perfect for when the size of the data is not known ahead of time).
--->
<cfset connection.setChunkedStreamingMode( javaCast( "int", 50 ) ) />

<!---
	When posting data, the content-type will determine how the
	target server parses the incoming request. If the target server
	is ColdFusion, this is especially critical as it will throw an
	error if it tries to parse this POST as a collection of
	name-value pairs. In this case, we WANT it to see the form as
	multi-part, which will be a collection of name-value pairs.

	In order to delimit the parts of the form post, we need to create
	a boundary identifier. This is how the server will know where one
	value ends and the next one starts. This needs to be a random
	string so as not to show up in the form data itself (as a false
	boundary).
--->
<cfset fieldBoundary = ("POST------------------" & getTickCount()) />

<!---
	Set the content type and include the boundary information so
	the server knows how to parse the data.
--->
<cfset connection.setRequestProperty(
	javaCast( "string", "Content-Type" ),
	javaCast( "string", ("multipart/form-data; boundary=" & fieldBoundary) )
	) />

<!---
	Now that we have prepared the connection to the target URL, let's
	get the output stream - this is the UPLOAD stream to which we can
	write data to be posted to the target server.
--->
<cfset uploadStream = connection.getOutputStream() />

<!---
	Before we send the file data, we'll send some simple name-value
	pairs in plain-text format. In order to make it easier to write
	strings to the upload stream, let's wrap it in a Writer. This
	will allow us to write string data rather than just bytes.
--->
<cfset uploadWriter = createObject( "java", "java.io.OutputStreamWriter" ).init(
	uploadStream
	) />

<!---
	Form data makes heavy use of the Carriage Return and New Line
	characters to delimit values.
--->
<cfset crnl = (chr( 13 ) & chr( 10 )) />

<!--- A double break is also used. --->
<cfset crnl2 = (crnl & crnl) />

<!--- Delimit the field. --->
<cfset uploadWriter.write(
	javaCast( "string", ("--" & fieldBoundary & crnl) )
	) />

<!--- Send the title. --->
<cfset uploadWriter.write(
	javaCast(
		"string",
		(
			"Content-Disposition: form-data; name=""title""" &
			crnl2 &
			"The Bride" &
			crnl
		))
	) />

<!--- Delimit the field. --->
<cfset uploadWriter.write(
	javaCast( "string", ("--" & fieldBoundary & crnl) )
	) />

<!--- Send the author. --->
<cfset uploadWriter.write(
	javaCast(
		"string",
		(
			"Content-Disposition: form-data; name=""author""" &
			crnl2 &
			"Julie Garwood" &
			crnl
		))
	) />

<!--- Delimit the field. --->
<cfset uploadWriter.write(
	javaCast( "string", ("--" & fieldBoundary & crnl) )
	) />

<!--- Send the publisher. --->
<cfset uploadWriter.write(
	javaCast(
		"string",
		(
			"Content-Disposition: form-data; name=""publisher""" &
			crnl2 &
			"Pocket Star" &
			crnl
		))
	) />

<!---
	Now that we've written the simple name/value pairs, let's post
	the actual file data as part of the incoming request. This works
	very much in the same way, although we are going to stream the
	local file into the post data.

	Let's open a connection to a local file that we will stream to
	the output a byte at a time.

	NOTE: There are more efficient, buffered ways to read a file
	into memory; however, this is just trying to keep it simple.
--->
<cfset fileInputStream = createObject( "java", "java.io.FileInputStream" ).init(
	javaCast( "string", expandPath( "./data2.txt" ) )
	) />

<!--- Delimit the field. --->
<cfset uploadWriter.write(
	javaCast( "string", ("--" & fieldBoundary & crnl) )
	) />

<!--- Send the file along. --->
<cfset uploadWriter.write(
	javaCast(
		"string",
		(
			"Content-Disposition: form-data; name=""text""; filename=""the_bride.txt""" &
			crnl &
			"Content-Type: ""text/plain""" &
			crnl2
		))
	) />

<!--- Read the first byte from the file. --->
<cfset nextByte = fileInputStream.read() />

<!---
	Keep reading from the file, one byte at a time, until we hit
	(-1) - the End of File marker for the input stream.
--->
<cfloop condition="(nextByte neq -1)">

	<!--- Write this byte to the output (UPLOAD) stream. --->
	<cfset uploadWriter.write( javaCast( "int", nextByte ) ) />

	<!--- Read the next byte from the file. --->
	<cfset nextByte = fileInputStream.read() />

</cfloop>

<!--- We're done with the local file, so release its handle. --->
<cfset fileInputStream.close() />

<!--- Add the new line to the field value. --->
<cfset uploadWriter.write( javaCast( "string", crnl ) ) />

<!---
	Delimit the end of the post. Notice that the last delimiter has
	a trailing double-dash after it.
--->
<cfset uploadWriter.write(
	javaCast( "string", (crnl & "--" & fieldBoundary & "--" & crnl) )
	) />

<!--- Now that we're done streaming the file, close the stream. --->
<cfset uploadWriter.close() />

<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->

<!---
	At this point, we have completed the UPLOAD portion of the
	request. We could be done; or we could look at the input
	(download) portion of the request in order to view the response
	or the error.
--->
<cfoutput>
	Response:
	#connection.getResponseCode()# -
	#connection.getResponseMessage()#<br />
	<br />
</cfoutput>

<!---
	The input stream is mutually exclusive with the error stream,
	although both can return data. As such, let's try to access the
	input stream... and then use the error stream if there is a
	problem.
--->
<cftry>

	<!--- Try for the input stream. --->
	<cfset downloadStream = connection.getInputStream() />

	<!---
		If the input stream is not available (ie. the server
		returned an error response), then we'll have to use the
		error output as the response stream.
	--->
	<cfcatch>

		<!--- Use the error stream as the download. --->
		<cfset downloadStream = connection.getErrorStream() />

	</cfcatch>

</cftry>

<!---
	At this point, we have either the natural download or the error
	download. In either case, we can start reading the output in the
	same manner.
--->
<cfset responseBuffer = [] />

<!--- Get the first byte. --->
<cfset nextByte = downloadStream.read() />

<!---
	Keep reading from the response stream until we run out of bytes
	(-1). We'll be building up the response buffer a byte at a time
	and then outputting it as a single value.
--->
<cfloop condition="(nextByte neq -1)">

	<!--- Add the byte AS CHAR to the response buffer. --->
	<cfset arrayAppend( responseBuffer, chr( nextByte ) ) />

	<!--- Get the next byte. --->
	<cfset nextByte = downloadStream.read() />

</cfloop>

<!--- Close the response stream. --->
<cfset downloadStream.close() />

<!--- Output the response. --->
<cfoutput>
	Response: #arrayToList( responseBuffer, "" )#
</cfoutput>
As you can see, we are taking the output stream from the URL connection and wrapping it in an instance of java.io.OutputStreamWriter. This simply allows us to write string data in addition to individual bytes; since we're dealing with so many form parts, this will make our lives a lot easier.
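One caveat with this wrapping: Writer.write( int ) writes a character, not a raw byte, so file bytes pushed through the writer get re-encoded. That is harmless for plain ASCII text like the demo file, but it would likely corrupt binary uploads. A minimal Java sketch of one way around it, assuming UTF-8 for the text parts, is to use the writer only for the part headers, flush it, and then copy the file bytes to the underlying stream directly. The class and method names (FilePartSketch, writeFilePart) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class FilePartSketch {

    // Write one file part plus the closing delimiter (hypothetical helper).
    static void writeFilePart(OutputStream out, String boundary, String fieldName,
                              String fileName, String contentType, InputStream fileData) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write("--" + boundary + "\r\n");
            writer.write("Content-Disposition: form-data; name=\"" + fieldName
                    + "\"; filename=\"" + fileName + "\"\r\n");
            writer.write("Content-Type: " + contentType + "\r\n\r\n");
            // Push the buffered text out before writing to the raw stream.
            writer.flush();

            // Copy the file as bytes, bypassing the character encoder.
            byte[] buffer = new byte[8192];
            int read;
            while ((read = fileData.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }

            // Line break after the value, then the closing delimiter.
            writer.write("\r\n--" + boundary + "--\r\n");
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes that are not valid single characters in UTF-8.
        byte[] fileBytes = {(byte) 0x89, 'P', 'N', 'G', (byte) 0xFF};
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        writeFilePart(captured, "FIELD_BOUNDARY", "text", "the_bride.txt",
                "application/octet-stream", new ByteArrayInputStream(fileBytes));
        System.out.println(captured.size() + " bytes written");
    }
}
```

In the real request, the OutputStream would be the one returned by connection.getOutputStream(); the in-memory stream here only keeps the sketch runnable without a server.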
Once we have the string writer, we go about defining each part of the multi-part form data. Unfortunately, when we run this page, we get the following error response from the target ColdFusion server (to which we are POSTing):
411 - Length Required
A request of the requested method POST requires a valid Content-length.
It appears that the ColdFusion server wants to see the "Content-Length" header. However, any attempt on my part to define content-length as a request property failed. It appears as if the content-length header is mutually exclusive with the way chunking is defined by the HttpURLConnection class.
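That mutual exclusivity is visible in the Java API itself. HttpURLConnection offers two streaming modes, setChunkedStreamingMode() and setFixedLengthStreamingMode(), and its documentation states that enabling one after the other has been enabled raises an IllegalStateException. The sketch below, written in plain Java with a hypothetical class name and a placeholder URL, demonstrates this without making a network call. As far as I know, some JDK versions also treat Content-Length as a restricted header and silently drop it when it is set through setRequestProperty(), which would be consistent with the failed attempts described above; that part is an assumption and is not exercised here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class StreamingModeSketch {

    // Returns true if the connection rejects a fixed length once chunking is on.
    static boolean fixedLengthRejectedAfterChunking(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setDoOutput(true);
            connection.setRequestMethod("POST");

            // Same call as in the ColdFusion demo: chunked transfer, 50-byte chunks.
            connection.setChunkedStreamingMode(50);

            try {
                // A fixed length is what would produce a Content-Length header.
                connection.setFixedLengthStreamingMode(1024);
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // openConnection() does not touch the network, so no server is needed.
        System.out.println(fixedLengthRejectedAfterChunking("http://localhost/target.cfm"));
    }
}
```

In chunked mode the request is sent with "Transfer-Encoding: chunked" and no Content-Length, so a server that insists on Content-Length will reject it no matter how the parts are written.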
The problem lies with this line of code:
<cfset connection.setChunkedStreamingMode( javaCast( "int", 50 ) ) />
This is what turns on chunking and defines the size of the data chunks to be flushed to the server as the form data is being posted. If I comment this line out, the content of the form post will be buffered entirely within the local machine before the form data is flushed to the target server. In doing so, the connection will be able to determine the appropriate content-length of the post and everything appears to run successfully.
The response in that case is the HTML content coming back from the "target.cfm" page. It is produced by the following ColdFusion code:
Target.cfm (Our POST Action)
<!--- Spit out a response based on the incoming POST. ---> <div style="width: 500px ;"> <!--- Output the request data. ---> <cfdump var="#getHttpRequestData()#" label="HTTP Request" /> <br /> <!--- Output the form data. ---> <cfdump var="#form#" label="Form Data" /> <br /> <!--- Read in the TMP file that was uploaded and output the content to the screen. ---> <cfdump var="#fileRead( form.text )#" label="File Content" /> </div>
I'm very new to using anything other than CFHTTP to post form data in ColdFusion; as such, it's entirely likely that I'm missing something critical in getting a chunked, multi-part form POST to work. However, from what I can gather in my brief testing, it appears that chunking only works when the POST consists of a single Body value; when the form POST consists of multiple, delimited values, ColdFusion doesn't seem to like streaming requests.
NOTE: This general limitation seems to be echoed in a post by Tom Jordahl when discussing ColdFusion requests in the context of SOAP requests.
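One avenue not tried in this post: since the buffered version succeeds because the connection can compute a Content-Length, the length of a multi-part body can also be computed up front when the file size is known, and then handed to setFixedLengthStreamingMode() so the file is still streamed rather than buffered. The Java sketch below only shows the arithmetic; the class and helper names are hypothetical, it uses an unquoted text/plain media type, and whether ColdFusion's servlet engine accepts the resulting request has not been verified here.

```java
import java.nio.charset.StandardCharsets;

final class MultipartLengthSketch {

    static final String CRLF = "\r\n";

    // A simple name/value part, including its leading delimiter.
    static String fieldPart(String boundary, String name, String value) {
        return "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name + "\"" + CRLF + CRLF
                + value + CRLF;
    }

    // The delimiter and headers that precede the raw file bytes.
    static String fileHeader(String boundary, String name, String fileName) {
        return "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: text/plain" + CRLF + CRLF;
    }

    // The line break after the file bytes plus the closing delimiter.
    static String closing(String boundary) {
        return CRLF + "--" + boundary + "--" + CRLF;
    }

    // Total body size: text measured as encoded bytes, plus the file size.
    static long contentLength(String textBeforeFile, long fileSize, String textAfterFile) {
        return textBeforeFile.getBytes(StandardCharsets.UTF_8).length
                + fileSize
                + textAfterFile.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String boundary = "FIELD_BOUNDARY";
        String before = fieldPart(boundary, "title", "The Bride")
                + fieldPart(boundary, "author", "Julie Garwood")
                + fieldPart(boundary, "publisher", "Pocket Star")
                + fileHeader(boundary, "text", "the_bride.txt");

        // Hypothetical size; in practice this would come from File.length().
        long fileSize = 1_000_000L;
        long total = contentLength(before, fileSize, closing(boundary));

        // The total would be passed to setFixedLengthStreamingMode( total )
        // before the output stream is opened.
        System.out.println("Content-Length: " + total);
    }
}
```

The trade-off is that every byte written must match the declared length exactly; HttpURLConnection raises an error if the body comes up short or runs over.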
I was uploading an Image file with the name and other fields as a multi-part data from Android.
I was using a DataOutputStream with a buffered writer, and I assumed it would transfer the file one buffer at a time. But now I realize that I was only using a buffered reader to read the file, which means I was still making a single request to the server. Oh....
Now I need to find out how to handle this "Chunked" data in Java and other languages.
I also have some more questions, like how "Streaming" works (especially with ColdFusion), what the "Keep-alive" connection property is, etc.
This is probably just JRun only supporting HTTP/1.0 or incorrectly supporting 1.1. Did you try on Tomcat instead? It's unlikely that CF has anything to do with this; the error comes from the servlet engine.
I just tried posting to a Node.js server and the chunked post seems to stream quite well:
It must just be something on the CF stack that can't deal with the stream.
Using Tomcat is a bit outside my understanding at this point. I've only ever dealt with the native JRUN stuff that gets installed with CF.
I had a similar problem when trying to handle SOAP-MTOM / XOP jizz jazz, running headwall into CFHTTP and ColdFusion's limitations altogether.
No matter what direction I tried, that binary chunk was impossible to get correct. For me, I was passing a couple of zip files to and fro a web service, and I finally had to put up with a couple of extra bytes being written at the end, eschewing CFZIP and the underlying Java for zipping in favor of a much more forgiving command-line app run through CFEXECUTE. Streaming going up wasn't as much of a problem as getting meaningful binary back.
I didn't see any improvements with CF9 on this score, and I haven't played enough with Railo or Tomcat to understand if I can expose anything else to move the needle on this one.
For my part, if it comes up moving forward, since most of my CF work inhabits IIS, I'll just build DLLs, more than likely, and ply services with binary that way until CF catches up on this score. It's a pretty big handicap, especially when so many SaaS APIs throw this our way.
I wouldn't even know the first thing about writing Windows DLLs. That's pretty awesome if you can do that. It would be cool if this worked from ColdFusion. I don't have any existing need to make use of it; but, I read that Amazon's S3 could use streaming posts.
Have you tried this on CF10 since it replaced JRUN with Tomcat?
@Brian, I was wondering if you ever found a solution for your SOAP-MTOM multipart problem?
We're struggling with a very similar issue and we've run out of options... Any help/pointers would be greatly appreciated!
I really ultimately only had two options if there were no other choices but to use SOAP-MTOM, and that was to either handle the transactions directly in a CFX, or as I ultimately ended up doing, building a DLL specifically for those transactions. .NET since 3.0 (WSE) has had nice handlers for this in System.ServiceModel. I'm stronger in C# and .NET than I am in ColdFusion, even though my experience with CF is older...I get more work in .NET than ColdFusion (which saddens me on many levels).
If it's just a matter of handling attachments in an incoming SOAP-MTOM piece, you can parse the response and try and write out the appropriate portion of the file to a file, get the content via .toByteArray() and write that content using cffile. I found with zip files I'd get extra bytes somehow, which forced me to hit them often with a CFEXECUTE against a more forgiving command line zip utility. Ultimately the DLL was less headache prone, especially if you are dealing with an API (like eBay's LMS) that is more fluid.
Maybe someone has come up with something since? I haven't looked into this in about two years since the DLL ended up being an adequate solution.