Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

ColdFusion CFFile vs. Java java.io.BufferedOutputStream

By Ben Nadel on
Tags: ColdFusion

Project Skin Spider is powered by a home-grown XML database system. This involves a lot of writing of data to files. And for any of you who have ever used a database, you know that even simple databases can quickly build up a huge repository of data. For this reason, even minor increases in file-writing performance can have an impact on an application that's constantly updating data.

Currently, my DatabaseService.cfc uses the ColdFusion tag, CFFile, to write the query data to an XML file. I did a little exploration to see how CFFile stacks up against straight Java I/O calls. I compared CFFile to a standard FileOutputStream as well as a BufferedOutputStream. For the test, I generated random phrases and wrote 100,000 of them to disk.

As with all testing, let me start out by setting up the testing environment:

```cfml
<!--- Create an array of data to choose from. --->
<cfset arrParts = ArrayNew( 1 ) />

<!--- Add data to the array. --->
<cfset arrParts[ 1 ] = "Feet" />
<cfset arrParts[ 2 ] = "Calves" />
<cfset arrParts[ 3 ] = "Thighs" />
<cfset arrParts[ 4 ] = "Hips" />
<cfset arrParts[ 5 ] = "Bottom" />
<cfset arrParts[ 6 ] = "Boobs" />
<cfset arrParts[ 7 ] = "Eyes" />

<!---
    Set the number of iterations. This is the number of lines
    of text that we will end up writing to file.
--->
<cfset intIterations = 100000 />
```

Now, when I use CFFile for large data that I build incrementally, I use the Java StringBuffer to create the output data and then write the string buffer to disk. For those of you who are not familiar with the StringBuffer, it is a more efficient way to build a large string from smaller ones. Each piece is appended to a single growable character buffer instead of a brand new string being created for every concatenation:

```cfml
<!---
    Test the standard ColdFusion CFFile and Java
    StringBuffer methodology.
--->
<cftimer label="StringBuffer Test" type="outline">

    <!---
        Kill extra output. We want to do this because otherwise,
        we are creating [intIterations] amount of white space
        on the page. No need for that.
    --->
    <cfsilent>

        <!--- Get the file name to write to. --->
        <cfset strFilePath = ExpandPath( "./sb_output.txt" ) />

        <!--- Create a string buffer. --->
        <cfset sbOutput = CreateObject(
            "java",
            "java.lang.StringBuffer"
            ).Init() />

        <!---
            Loop over the iterations to build up the string
            buffer. For each iteration, we are going to
            select a random string and add it to the buffer.
        --->
        <cfloop
            index="intI"
            from="1"
            to="#intIterations#"
            step="1">

            <!--- Add a random string to the string buffer. --->
            <cfset sbOutput.Append(
                "I am crazy about " &
                arrParts[ RandRange( 1, 7 ) ] &
                Chr( 13 ) & Chr( 10 )
                ) />

        </cfloop>

        <!---
            Now that we have built up the string buffer, write
            the data to the selected file name in one shot.
        --->
        <cffile
            action="WRITE"
            file="#strFilePath#"
            output="#sbOutput.ToString()#"
            />

    </cfsilent>

    <!--- Output name of file. --->
    <cfoutput>#strFilePath#</cfoutput>

</cftimer>
```
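For readers more at home in Java, the StringBuffer approach corresponds roughly to the following sketch. It makes several assumptions. It needs Java 7+ for `java.nio.file`. It uses `StringBuilder`, the unsynchronized sibling of `StringBuffer`. It also uses a fixed `Random` seed so that runs are repeatable. The class name `InMemoryWrite` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical Java counterpart of the "StringBuffer Test": build everything in memory, write once.
class InMemoryWrite {
    // Same seven words as the ColdFusion arrParts array.
    static final String[] PARTS = {"Feet", "Calves", "Thighs", "Hips", "Bottom", "Boobs", "Eyes"};

    static void write(Path file, int iterations, long seed) throws IOException {
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iterations; i++) {
            // Append into one growable buffer instead of creating a new String per concatenation.
            sb.append("I am crazy about ")
              .append(PARTS[random.nextInt(PARTS.length)])
              .append("\r\n");
        }
        // A single lump-sum write; the whole file's contents sit in memory until this point.
        Files.write(file, sb.toString().getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("sb_output.txt");
        write(file, 100_000, 42L);
        System.out.println(file.toAbsolutePath() + ": " + Files.size(file) + " bytes");
    }
}
```

The key property is visible in `write()`. Nothing touches the disk until the loop has finished. This is fast, but memory use grows with the size of the output.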

The StringBuffer test created a file that was roughly 2.3 megabytes. Then I did the same thing, but used the Java FileOutputStream to write the data as I created it (as opposed to a lump-sum write at the end):

```cfml
<!---
    Test the Java FileOutputStream methodology of writing data
    to disk as we get it, not just at the end.
--->
<cftimer label="FileOutputStream Test" type="outline">

    <!---
        Kill extra output. We want to do this because otherwise,
        we are creating [intIterations] amount of white space
        on the page. No need for that.
    --->
    <cfsilent>

        <!--- Get the file name to write to. --->
        <cfset strFilePath = ExpandPath( "./io_output.txt" ) />

        <!---
            Create the file output stream. When creating the file
            output stream, we have to initialize it with a Java
            File object (which we initialize with the path to
            the file we want to create).
        --->
        <cfset osOutput = CreateObject(
            "java",
            "java.io.FileOutputStream"
            ).Init(
                CreateObject(
                    "java",
                    "java.io.File"
                    ).Init(
                        strFilePath
                    )
            ) />

        <!---
            Loop over the iterations to build up the data. For
            each iteration, we are going to select a random
            string and write that string directly to the output
            stream, which writes it directly to file.
        --->
        <cfloop
            index="intI"
            from="1"
            to="#intIterations#"
            step="1">

            <!---
                Add a random string to the file output stream.
                In this case, the Write() method of the output
                stream takes a byte array. For that, we can
                call the GetBytes() Java method on the string.
                We use the ToString() method to create a string
                object before getting the byte array.
            --->
            <cfset osOutput.Write(
                ToString(
                    "I am crazy about " &
                    arrParts[ RandRange( 1, 7 ) ] &
                    Chr( 13 ) & Chr( 10 )
                ).GetBytes()
                ) />

        </cfloop>

        <!---
            Close the stream so that the file handle is released;
            otherwise the file can stay open within JRun.
        --->
        <cfset osOutput.Close() />

    </cfsilent>

    <!--- Output name of file. --->
    <cfoutput>#strFilePath#</cfoutput>

</cftimer>
```
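Here is a comparable Java sketch of the unbuffered approach. It assumes Java 7+, uses a hypothetical class name, and uses try-with-resources so the stream is closed even if a write fails. Each call goes straight to `FileOutputStream`, with no buffer in between.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical Java counterpart of the "FileOutputStream Test": write each line as soon as it exists.
class UnbufferedWrite {
    static final String[] PARTS = {"Feet", "Calves", "Thighs", "Hips", "Bottom", "Boobs", "Eyes"};

    static void write(String path, int iterations, long seed) throws IOException {
        Random random = new Random(seed);
        // try-with-resources closes the stream (releasing the file handle) even on failure.
        try (OutputStream out = new FileOutputStream(path)) {
            for (int i = 0; i < iterations; i++) {
                String line = "I am crazy about " + PARTS[random.nextInt(PARTS.length)] + "\r\n";
                // No buffering: every call hands its bytes directly to the underlying file.
                out.write(line.getBytes(StandardCharsets.US_ASCII));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        write("io_output.txt", 100_000, 42L);
    }
}
```

Because `FileOutputStream` does not buffer, 100,000 lines mean 100,000 small writes to the file. That overhead is a plausible reason this approach lagged.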

The FileOutputStream test also created a file of roughly 2.3 megabytes. Now, I don't know that much about Java; this is all experimentation to me. But I see that there is also a BufferedOutputStream. It's got to be there for a reason, and that reason has to be optimization. From the Java API documentation:

By setting up such an output stream, an application can write bytes to the underlying output stream without necessarily causing a call to the underlying system for each byte written.
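To see what that documentation sentence means in practice, the sketch below counts how many write calls actually reach the underlying stream, with and without a `BufferedOutputStream` in between. It assumes Java 7+. `CountingOutputStream` is a hypothetical in-memory stand-in for the file.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class BufferingDemo {
    // Hypothetical stand-in for the file: stores bytes in memory and counts write calls.
    static class CountingOutputStream extends OutputStream {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int calls = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes.write(b, off, len);
        }
    }

    static final byte[] LINE = "I am crazy about Feet\r\n".getBytes(StandardCharsets.US_ASCII);

    // Writes `lines` copies of LINE straight to the sink: one call per line.
    static CountingOutputStream unbuffered(int lines) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        for (int i = 0; i < lines; i++) {
            sink.write(LINE);
        }
        return sink;
    }

    // Writes the same lines through a BufferedOutputStream of the given size.
    static CountingOutputStream buffered(int lines, int bufferSize) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        // close() flushes whatever is left in the buffer; without it the tail would be lost.
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < lines; i++) {
                out.write(LINE);
            }
        }
        return sink;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered calls:     " + unbuffered(1000).calls);
        System.out.println("buffered(2048) calls: " + buffered(1000, 2048).calls);
    }
}
```

Both variants deliver the same bytes. With a 2048-byte buffer, the sink receives about a dozen large writes instead of 1,000 small ones.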

When creating the buffered output stream, you can specify the buffer size. I tried three different buffer sizes (I will only show the code once, since the buffer size is the only variable): the default buffer size, which is 512 bytes, then 2048 bytes, and then 5000 bytes:

```cfml
<!---
    Test the Java BufferedOutputStream. In this case we are
    testing the output stream with a buffer size of 2048 bytes,
    but we can set it to anything we want.
--->
<cftimer label="BufferedOutputStream Test" type="outline">

    <!---
        Kill extra output. We want to do this because otherwise,
        we are creating [intIterations] amount of white space
        on the page. No need for that.
    --->
    <cfsilent>

        <!--- Get the file name to write to. --->
        <cfset strFilePath = ExpandPath( "./bio_output2.txt" ) />

        <!---
            Create the buffered file output stream. We have to
            initialize it with a Java FileOutputStream, which in
            turn needs to be initialized with a Java File object
            (which we initialize with the path to the file we
            want to create).

            Additionally, the second argument of the buffered
            output stream is the size of the buffer. In this
            case we are using 2048 bytes.
        --->
        <cfset bosOutput = CreateObject(
            "java",
            "java.io.BufferedOutputStream"
            ).Init(
                CreateObject(
                    "java",
                    "java.io.FileOutputStream"
                    ).Init(
                        CreateObject(
                            "java",
                            "java.io.File"
                            ).Init(
                                strFilePath
                            )
                    ),
                JavaCast( "int", 2048 )
            ) />

        <!---
            Loop over the iterations to build up the data. For
            each iteration, we are going to select a random
            string and write it to the buffered output stream,
            which passes the data on to the file output stream
            once the buffer is full.
        --->
        <cfloop
            index="intI"
            from="1"
            to="#intIterations#"
            step="1">

            <!---
                Create the random string. In this case we are
                creating the string prior to buffer writing
                because we will need it to get the length of
                the data.
            --->
            <cfset strText = (
                "I am crazy about " &
                arrParts[ RandRange( 1, 7 ) ] &
                Chr( 13 ) & Chr( 10 )
                ) />

            <!---
                Add the random string to the buffered file output
                stream. We want to write the entire byte array
                to the output stream. Note that Length() counts
                characters; it equals the byte count here only
                because the text is plain ASCII.
            --->
            <cfset bosOutput.Write(
                strText.GetBytes(),
                JavaCast( "int", 0 ),
                strText.Length()
                ) />

        </cfloop>

        <!---
            Close the stream. This flushes any bytes still sitting
            in the buffer through to the file and then closes it.
            Without this, the last partially filled buffer never
            reaches the disk.
        --->
        <cfset bosOutput.Close() />

    </cfsilent>

    <!--- Output name of file. --->
    <cfoutput>#strFilePath#</cfoutput>

</cftimer>
```
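The same layering in plain Java might look like the following sketch. It assumes Java 7+ and uses a hypothetical class name. It mirrors the BufferedOutputStream, FileOutputStream, File chain above and takes the buffer size as a parameter, so the 512, 2048, and 5000 byte variants differ only in one argument.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical Java counterpart of the "BufferedOutputStream Test".
class BufferedWrite {
    static final String[] PARTS = {"Feet", "Calves", "Thighs", "Hips", "Bottom", "Boobs", "Eyes"};

    static void write(File file, int iterations, int bufferSize, long seed) throws IOException {
        Random random = new Random(seed);
        // Same layering as the ColdFusion code: BufferedOutputStream -> FileOutputStream -> File.
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file), bufferSize)) {
            for (int i = 0; i < iterations; i++) {
                String text = "I am crazy about " + PARTS[random.nextInt(PARTS.length)] + "\r\n";
                byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
                // Pass the byte count, not text.length(); the two differ for non-ASCII text.
                out.write(bytes, 0, bytes.length);
            }
        } // Closing flushes the final, partially filled buffer before closing the file.
    }

    public static void main(String[] args) throws IOException {
        for (int size : new int[] {512, 2048, 5000}) {
            write(new File("bio_output_" + size + ".txt"), 100_000, size, 42L);
        }
    }
}
```

Taking the length from the byte array avoids relying on the text being ASCII.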

As with many things, a small number of iterations yields absolutely no difference in speed, which is why this test performs 100,000 iterations. But even at that many iterations, the performance is inconsistent. When I run this test at home on my new Dell Inspiron E1505 Core Duo, the 5000-byte BufferedOutputStream outperforms everything by a small margin, maybe 100 ms or so. However, when I run this test here at the office on our powerful server, the CFFile tag outperforms everything else by about 40 ms on repeated tests.

I wonder what that's all about. I can say for sure, though, that the straight-up Java FileOutputStream was by far the slowest performer; the ColdFusion CFFile tag and the buffered output streams of any size were all faster. I guess you need to find a balance between data caching and file I/O.
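Differences of 40 to 100 ms are easily swamped by JIT warm-up and OS file caching. Below is a minimal timing harness, under stated assumptions, that repeats each strategy and reports the median. It assumes Java 8+ (lambdas). All names are hypothetical. The "in-memory" strategy stands in for StringBuffer plus CFFile; it is not the ColdFusion code itself. No results are claimed here, and numbers will vary by machine.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class WriteBenchmark {
    static final String[] PARTS = {"Feet", "Calves", "Thighs", "Hips", "Bottom", "Boobs", "Eyes"};

    // A write strategy: put every line into the given file.
    interface Strategy {
        void write(File file, byte[][] lines) throws IOException;
    }

    // Pre-generate the lines so random-number and string costs are not part of the timing.
    static byte[][] makeLines(int count, long seed) {
        Random random = new Random(seed);
        byte[][] lines = new byte[count][];
        for (int i = 0; i < count; i++) {
            String text = "I am crazy about " + PARTS[random.nextInt(PARTS.length)] + "\r\n";
            lines[i] = text.getBytes(StandardCharsets.US_ASCII);
        }
        return lines;
    }

    // Stand-in for StringBuffer + CFFile: collect everything in memory, then write once.
    static void inMemory(File file, byte[][] lines) throws IOException {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] line : lines) {
            all.write(line, 0, line.length);
        }
        try (OutputStream out = new FileOutputStream(file)) {
            all.writeTo(out);
        }
    }

    // Plain FileOutputStream: one unbuffered write per line.
    static void unbuffered(File file, byte[][] lines) throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            for (byte[] line : lines) {
                out.write(line);
            }
        }
    }

    // BufferedOutputStream with a caller-chosen buffer size.
    static Strategy buffered(int bufferSize) {
        return (file, lines) -> {
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file), bufferSize)) {
                for (byte[] line : lines) {
                    out.write(line);
                }
            }
        };
    }

    // Median wall-clock time in milliseconds, measured after untimed warm-up runs.
    static double medianMillis(Strategy strategy, File file, byte[][] lines, int warmups, int runs)
            throws IOException {
        for (int i = 0; i < warmups; i++) {
            strategy.write(file, lines);
        }
        double[] times = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            strategy.write(file, lines);
            times[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        Arrays.sort(times);
        return times[runs / 2];
    }

    public static void main(String[] args) throws IOException {
        byte[][] lines = makeLines(100_000, 42L);
        File file = File.createTempFile("write-benchmark", ".txt");
        file.deleteOnExit();
        Map<String, Strategy> strategies = new LinkedHashMap<>();
        strategies.put("in-memory", WriteBenchmark::inMemory);
        strategies.put("unbuffered", WriteBenchmark::unbuffered);
        strategies.put("buffered 512", buffered(512));
        strategies.put("buffered 2048", buffered(2048));
        strategies.put("buffered 5000", buffered(5000));
        for (Map.Entry<String, Strategy> entry : strategies.entrySet()) {
            double ms = medianMillis(entry.getValue(), file, lines, 3, 9);
            System.out.printf("%-14s %8.2f ms%n", entry.getKey(), ms);
        }
    }
}
```

Taking the median of several runs after warm-up helps separate consistent differences from noise. It does not remove machine-to-machine variation such as disk and OS caching.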

It seems that I am doing alright with the CFFile methodology. One other thing to note, though, which has more to do with my XML database system: performance is not the only consideration. Using CFFile and a Java StringBuffer, I have to build the entire output in memory before I write it to disk, and that can be hard on the system RAM. With an output stream, I can minimize the amount of data held in RAM at any given time.
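The memory point can be sketched in Java as well. In the sketch below, lines are produced lazily and written as they arrive, so at any moment the program holds roughly one line plus the writer's internal buffer, never the whole file. It assumes Java 8+. It swaps in a `BufferedWriter` from `Files.newBufferedWriter` with an explicit charset, rather than calling `getBytes()` by hand. The class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.Random;
import java.util.stream.Stream;

class StreamingWrite {
    static final String[] PARTS = {"Feet", "Calves", "Thighs", "Hips", "Bottom", "Boobs", "Eyes"};

    // Lazily produces phrases one at a time; nothing is generated until the writer asks for it.
    static Iterator<String> phrases(int count, long seed) {
        Random random = new Random(seed);
        return Stream.generate(() -> "I am crazy about " + PARTS[random.nextInt(PARTS.length)])
                     .limit(count)
                     .iterator();
    }

    // Writes each line as soon as it is produced and returns the number of lines written.
    static long write(Path file, Iterator<String> lines) throws IOException {
        long count = 0;
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            while (lines.hasNext()) {
                out.write(lines.next());
                out.write("\r\n");
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long written = write(Paths.get("stream_output.txt"), phrases(100_000, 42L));
        System.out.println(written + " lines written");
    }
}
```

Memory use stays roughly constant no matter how many lines are written. That is the trade-off against the in-memory approach.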



Reader Comments

One thing to keep in mind when doing ColdFusion vs. Java speed tests is that calling Java from ColdFusion has some unexpected performance costs. ColdFusion will analyze the data types of the Java call's parameters so that it can convert them properly if necessary. In most instances it's not a consideration, but with large strings or other large pieces of data it can be significant. The way to avoid the cost is to explicitly JavaCast the input, like this: javaCast("String", x).

I don't know if the above issue would impact your tests, just thought I would share.


@Arthur,

Yeah, I have to get into using the JavaCast() method all the time. I get very lazy about it when it comes to methods that expect strings.


Hi Ben, there's an issue with these methods: if you don't call osOutput.close(), the files can sometimes remain open within JRun.

Martin

