Ben Nadel at cf.Objective() 2013 (Bloomington, MN) with: Steven Neiland

POST Streaming Upload Data From ColdFusion Using Java And Node.js

By Ben Nadel

I can't find the email, but a while back, someone asked me about POSTing very large files with ColdFusion's CFHTTP and CFHTTPParam tags. This individual was running out of memory because ColdFusion apparently needed to load the entire file into local RAM before posting it up to the target server. To get around this issue, I started poking into the Java layer (beneath the ColdFusion surface) and found the java.net.HttpURLConnection class. This Java class allows a URL connection to be held open with HTTP-specific behavior, including chunked data streaming; and, chunked data streaming allows us to post data a byte at a time, without having to know the size of the local file ahead of time.

I don't want to go into too much explanation since I only just discovered this Java class and have only started to play with it. But, from what I can gather, the connection to the target URL has both an output stream and an input stream. The output stream represents the Upload and the input stream represents the Download (ie. the response).

As long as the connection has "chunking" turned on, we can write to the output (upload) stream and have the stream flush data without having to buffer the data entirely within the local memory. In this way, we can read the local file in and write it, a byte at a time, to the output (upload) stream.
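If you are curious what that chunking looks like on the wire, HTTP/1.1 chunked transfer-encoding frames each flushed buffer with a hex byte-count. Here's a small sketch of that framing; `frameChunk` and `frameBody` are made-up names for illustration only, since the Java connection and Node's http module both do this framing for us automatically:

```javascript
// Sketch of HTTP/1.1 chunked transfer-encoding framing: each flushed
// buffer goes out as a hex byte-count, CRLF, the bytes, CRLF; and a
// final zero-length chunk terminates the body. This assumes
// single-byte (ASCII) characters, so string length equals byte length.
function frameChunk( data ) {
	return( data.length.toString( 16 ) + "\r\n" + data + "\r\n" );
}

function frameBody( chunks ) {
	// Frame every chunk, then append the terminating zero-chunk.
	return( chunks.map( frameChunk ).join( "" ) + "0\r\n\r\n" );
}

// frameBody( [ "Hello", " world" ] ) yields:
// "5\r\nHello\r\n6\r\n world\r\n0\r\n\r\n"
```

Because each frame carries its own length, the receiver never needs a content-length header up front; which is exactly why the sender doesn't need to buffer the whole file.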

NOTE: For my demo, I am reading in a byte at a time, which is probably horribly inefficient. In reality, you'd probably want a Buffered input stream; but, to keep it simple, I'm using byte-wise streaming as it allows granular control.

In order to see this chunked upload in progress, I am actually going to be posting to a local Node.js server. This Node.js server will then pipe the incoming POST data out to a different GET response. Furthermore, we'll have the ColdFusion POST pause after every 100 bytes. In this way, we can get a solid visual confirmation that the POST is, in fact, being sent in chunks, without having to be completely buffered in the server's memory.

Before we look at the ColdFusion code, which is extremely verbose, let's look at the Node.js code so we can see how the requests will be handled. In the following server configuration, we need to make a standard GET request to the Node.js server before we make our POST. The Node.js server will hold the GET response open until the POST request is registered. At that point, the data chunks from the POST will be written to the GET response.

server.js (Node.js Server Configuration)

// Include the necessary modules.
var sys = require( "sys" );
var http = require( "http" );

// ---------------------------------------------------------- //
// ---------------------------------------------------------- //

// For this demo, we are going to pipe the form upload POST into the
// response of a browser-based GET request.
// NOTE: You have to make the GET request *before* the POST.
var getResponse = null;

// ---------------------------------------------------------- //
// ---------------------------------------------------------- //

// Create an instance of the HTTP server.
var server = http.createServer(
	function( request, response ){

		// Check to see if the incoming request is a GET. If so,
		// we're going to hold it open and pump the POST data through
		// (once the post is made).
		if (request.method === "GET"){

			// Store the output stream for later.
			getResponse = response;

			// Set the 200-OK header.
			getResponse.writeHead(
				200,
				{ "content-type": "text/plain" }
			);

			// Write some data.
			getResponse.write( "Waiting for POST...\n\n" );

			// Log the hold-open.
			console.log( "Holding GET request open for POST." );

			// NOTE: We are not explicitly ending the response. This
			// will hold it open until it times-out.

		// Check to see if the request is a POST.
		} else if (request.method === "POST"){

			// Make sure that we have a pending GET response.
			if (getResponse === null){

				// Log the issue.
				console.log( "POST being denied." );

				// We have no response to pipe the data to. Return an
				// error response to the post.
				response.writeHead(
					500,
					{ "content-type": "text/plain" }
				);

				// End the response.
				response.end( "No pending GET response!!" );

				return;

			}

			// If we made it this far then we have a GET request we
			// are holding open and can pipe the POST data through
			// without problem. Set the 200-OK header.
			response.writeHead(
				200,
				{ "content-type": "text/plain" }
			);

			// Listen for data chunks to come through on the post.
			// This will be the data that gets periodically flushed
			// during our streaming POST.
			request.on(
				"data",
				function( buffer ){

					// Log the length of the buffer.
					console.log( "Chunk:", buffer.length );

					// Pipe the incoming data chunk into the response
					// of our GET output stream.
					getResponse.write( buffer.toString() );

				}
			);

			// Listen for the completion of the POST. Once the POST
			// is done, we will close both the POST and the GET
			// response streams.
			request.on(
				"end",
				function(){

					// Close the POST stream.
					response.end(
						"\n\nEnded Node.js response. " +
						(new Date()).toString()
					);

					// Close the GET stream.
					getResponse.end(
						"\n\nEnded Node.js response. " +
						(new Date()).toString()
					);

					// Clear the GET response reference.
					getResponse = null;

				}
			);

		}

	}
);

// Point the server to listen to the given port for incoming
// requests.
server.listen( 8080 );

// ---------------------------------------------------------- //
// ---------------------------------------------------------- //

// Write debugging information to the console to indicate that
// the server has been configured and is up and running.
sys.puts( "Server is running on 8080" );

As you can see, the incoming GET response is cached in the getResponse variable. Then, when the POST request comes in, a "data"-event handler writes the buffered chunks to the cached GET response stream. Once the POST request is done, both the POST and GET responses are closed.

As an aside... I'm sorry, but Node.js is pretty badass!

Ok, now that we see the Node.js logic, let's take a look at the ColdFusion code that opens the connection to the target Node.js server and then begins to pipe the local file data, in chunks, to the output stream. I won't cover the code too much since 1) I don't know it in much depth and 2) the code is quite heavily commented.

<!---
	Create an instance of our target URL - the one to which we
	are going to post binary form data. In our case, this will be
	a local NODE.JS server because it will allow us to examine the
	post in the Node.js console as it comes through.
--->
<cfset targetUrl = createObject( "java", "java.net.URL" ).init(
	javaCast( "string", "http://localhost:8080" )
	) />

<!---
	Now that we have our URL, let's open a connection to it. This
	will give us access to the input (download) and output (upload)
	streams for the target end point.

	NOTE: This gives us an instance of java.net.HttpURLConnection
	(or one of its sub-classes).
--->
<cfset connection = targetUrl.openConnection() />

<!---
	By default, the connection is only set to gather target content,
	not to POST it. As such, we have to make sure that we turn on
	output (upload) before we access the data streams.
--->
<cfset connection.setDoOutput( javaCast( "boolean", true ) ) />

<!--- Since we are uploading, we have to set the method to POST. --->
<cfset connection.setRequestMethod( javaCast( "string", "POST" ) ) />

<!---
	By default, the connection will locally buffer the data until it
	is ready to be posted in its entirety. We don't want to hold it
	all in memory, however; as such, we need to explicitly turn data
	chunking on. This will allow the connection to flush data to the
	target URL without having to load it all in memory (this is
	perfect for when the size of the data is not known ahead of time).

	NOTE: In our case, we're gonna set it small so we can see some
	activity over the stream in realtime.
--->
<cfset connection.setChunkedStreamingMode( javaCast( "int", 50 ) ) />

<!---
	When posting data, the content-type will determine how the
	target server parses the incoming request. If the target server
	is ColdFusion, this is especially critical as it will throw an
	error if it tries to parse this POST as a collection of
	name-value pairs.
--->
<cfset connection.setRequestProperty(
	javaCast( "string", "content-type" ),
	javaCast( "string", "text/plain" )
	) />

<!---
	Now that we have prepared the connection to the target URL, let's
	get the output stream - this is the UPLOAD stream to which we can
	write data to be posted to the target server.
--->
<cfset uploadStream = connection.getOutputStream() />

<!---
	Let's open a connection to a local file that we will stream to
	the output a byte at a time.

	NOTE: There are more efficient, buffered ways to read a file
	into memory; however, this is just trying to keep it simple.
--->
<cfset fileInputStream = createObject( "java", "java.io.FileInputStream" ).init(
	javaCast( "string", expandPath( "./data2.txt" ) )
	) />

<!---
	Before we start posting, we want to keep track of the number
	of bytes that get sent; this way, we can pause the stream
	occasionally to give us time to watch the activity in the
	NODE.JS console.
--->
<cfset byteCount = 0 />

<!--- Read the first byte from the file. --->
<cfset nextByte = fileInputStream.read() />

<!---
	Keep reading from the file, one byte at a time, until we hit
	(-1) - the End of File marker for the input stream.
--->
<cfloop condition="(nextByte neq -1)">

	<!--- Increment the byte count. --->
	<cfset byteCount++ />

	<!--- Write this byte to the output (UPLOAD) stream. --->
	<cfset uploadStream.write( javaCast( "int", nextByte ) ) />

	<!---
		Check to see if we are at a multiple of 100 bytes. We want
		to pause the upload every 100 bytes in order to view the
		activity.
	--->
	<cfif !(byteCount % 100)>

		<!--- Flush the upload stream. --->
		<cfset uploadStream.flush() />

		<!--- Pause the upload. --->
		<cfset sleep( 2000 ) />

	</cfif>

	<!--- Read the next byte from the file. --->
	<cfset nextByte = fileInputStream.read() />

</cfloop>

<!--- Now that we're done streaming the file, close the stream. --->
<cfset uploadStream.close() />

<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->

<!---
	At this point, we have completed the UPLOAD portion of the
	request. We could be done; or, we could look at the input
	(download) portion of the request in order to view the response
	or the error.
--->
<cfoutput>
	#connection.getResponseCode()# -
	#connection.getResponseMessage()#<br />
	<br />
</cfoutput>

<!---
	The input stream is mutually exclusive with the error stream,
	although both can return data. As such, let's try to access
	the input stream... and then use the error stream if there is
	a problem.
--->
<cftry>

	<!--- Try for the input stream. --->
	<cfset downloadStream = connection.getInputStream() />

	<!---
		If the input stream is not available (ie. the server returned
		an error response), then we'll have to use the error output
		as the response stream.
	--->
	<cfcatch>

		<!--- Use the error stream as the download. --->
		<cfset downloadStream = connection.getErrorStream() />

	</cfcatch>

</cftry>

<!---
	At this point, we have either the natural download or the error
	download. In either case, we can start reading the output in
	the same manner.
--->
<cfset responseBuffer = [] />

<!--- Get the first byte. --->
<cfset nextByte = downloadStream.read() />

<!---
	Keep reading from the response stream until we run out of bytes
	(-1). We'll be building up the response buffer a byte at a time
	and then outputting it as a single value.
--->
<cfloop condition="(nextByte neq -1)">

	<!--- Add the byte AS CHAR to the response buffer. --->
	<cfset arrayAppend( responseBuffer, chr( nextByte ) ) />

	<!--- Get the next byte. --->
	<cfset nextByte = downloadStream.read() />

</cfloop>

<!--- Close the response stream. --->
<cfset downloadStream.close() />

<!--- Output the response. --->
<cfoutput>
	Response: #arrayToList( responseBuffer, "" )#
</cfoutput>

To see this in action, take a look at the video above. You'll be able to see that the local file (data2.txt) is loaded using an input stream and then posted, in small chunks, to the target server. This allows the file to be posted without it ever being fully loaded into the server's local memory.

NOTE: This approach does not use name-value pairs in its form data. That would make the form content much more complicated (way beyond the scope of this exploration). By using a "text/plain" content-type, I only have to worry about posting a single value.
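For reference, if you did need name-value pairs, a multipart/form-data body wraps each field in boundary delimiters. This sketch shows the extra framing that the text/plain content-type lets us skip; buildMultipartBody and the boundary value are made up for illustration:

```javascript
// Sketch of the extra framing a multipart/form-data POST would need:
// each field is wrapped in a "--boundary" line plus a
// content-disposition header, and the body ends with a closing
// "--boundary--" line. (File parts would add content-type and
// filename attributes on top of this.)
function buildMultipartBody( boundary, fields ) {
	var parts = Object.keys( fields ).map(
		function( name ) {
			return(
				"--" + boundary + "\r\n" +
				"Content-Disposition: form-data; name=\"" + name + "\"\r\n" +
				"\r\n" +
				fields[ name ] + "\r\n"
			);
		}
	);

	return( parts.join( "" ) + "--" + boundary + "--\r\n" );
}
```

The same boundary value would also have to appear in the request's content-type header (`multipart/form-data; boundary=...`), which is part of what makes the multipart case so much fussier than a single plain-text value.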

Most of the time, form-POST size is not an issue; however, if you have to post very large files, you can (apparently) find yourself running out of RAM. In order to handle large posts, it seems that you can dip down into the Java layer to stream data over a URL connection in smaller, bite-sized chunks (no pun intended).

Want to use code from this post? Check out the license.

Reader Comments


Very timely post for me. While I've not run into a memory issue as yet, I was concerned I might on a project I'm working on. Good stuff.



Cool man. Keep us posted with anything you do. I assume this gets much more complicated if you need to post a file as *part* of a form post with other name-value pairs; as, then, you have to build the delimiters and what not.

I'll try to play around with that concept as well. I did that once to play with the File API in HTML5 JavaScript; I think the concept is exactly the same.
