ColdFusion CFFile vs. Java java.io.BufferedOutputStream

Posted October 2, 2006 at 8:05 AM by Ben Nadel

Tags: ColdFusion

Project Skin Spider is powered by a home-grown XML database system. This involves a lot of writing of data to files. And for any of you who have ever used a database, you know that even simple databases can quickly build up a huge repository of data. For this reason, even minor increases in file-writing performance can have an impact on an application that's constantly updating data.

Currently, my DatabaseService.cfc uses the ColdFusion tag, CFFile, to write the query data to an XML file. I did a little exploration to see how CFFile stacks up against straight Java I/O calls. I compared CFFile to a standard File Output Stream as well as a Buffered Output Stream. For the test, I basically come up with random phrases and write tens of thousands of them to disk.

As with all testing, let me start out by setting up the testing environment:

  • <!--- Create an array of data to choose from. --->
  • <cfset arrParts = ArrayNew( 1 ) />
  •  
  • <!--- Add data to the array. --->
  • <cfset arrParts[ 1 ] = "Feet" />
  • <cfset arrParts[ 2 ] = "Calves" />
  • <cfset arrParts[ 3 ] = "Thighs" />
  • <cfset arrParts[ 4 ] = "Hips" />
  • <cfset arrParts[ 5 ] = "Bottom" />
  • <cfset arrParts[ 6 ] = "Boobs" />
  • <cfset arrParts[ 7 ] = "Eyes" />
  •  
  • <!---
  • Set the number of iterations. This is the number of lines
  • of text that we will end up writing to file.
  • --->
  • <cfset intIterations = 100000 />

Now, the way I currently use CFFile for large data that I build incrementally is to use the Java StringBuffer to create the output data. Then, I write the string buffer to disk. For those of you who are not familiar with the StringBuffer, it is a more efficient way of building large strings from smaller ones: it appends each piece to an internal buffer and puts off creating the final string until it is absolutely necessary:

  • <!---
  • Test the standard ColdFusion CFFile and Java
  • StringBuffer methodology.
  • --->
  • <cftimer label="StringBuffer Test" type="outline">
  •  
  • <!---
  • Kill extra output. We want to do this because otherwise,
  • we are creating [intIterations] amount of white space
  • on the page. No need for that.
  • --->
  • <cfsilent>
  •  
  • <!--- Get the file name to write to. --->
  • <cfset strFilePath = ExpandPath( "./sb_output.txt" ) />
  •  
  • <!--- Create a string buffer. --->
  • <cfset sbOutput = CreateObject(
  • "java",
  • "java.lang.StringBuffer"
  • ).Init() />
  •  
  • <!---
  • Loop over the iterations to build up the string
  • buffer. For each iteration, we are going to
  • select a random string and add it to the buffer.
  • --->
  • <cfloop
  • index="intI"
  • from="1"
  • to="#intIterations#"
  • step="1">
  •  
  • <!--- Add a random string to the string buffer. --->
  • <cfset sbOutput.Append(
  • "I am crazy about " &
  • arrParts[ RandRange( 1, 7 ) ] &
  • Chr( 13) & Chr( 10 )
  • ) />
  •  
  • </cfloop>
  •  
  • <!---
  • Now that we have created the string buffer, write
  • the data to selected file name.
  • --->
  • <cffile
  • action="WRITE"
  • file="#strFilePath#"
  • output="#sbOutput.ToString()#"
  • />
  •  
  • </cfsilent>
  •  
  • <!--- Output name of file. --->
  • #strFilePath#
  •  
  • </cftimer>

This created a file that was roughly 2.3 megabytes. Then, I did the same thing, but used the Java FileOutputStream to write the data to the file as I created it (as opposed to writing it in one lump sum at the end):

  • <!---
  • Test the Java FileOutputStream methodology of writing data
  • to disk as we get it, not just at the end.
  • --->
  • <cftimer label="FileOutputStream Test" type="outline">
  •  
  • <!---
  • Kill extra output. We want to do this because otherwise,
  • we are creating [intIterations] amount of white space
  • on the page. No need for that.
  • --->
  • <cfsilent>
  •  
  • <!--- Get the file name to write to. --->
  • <cfset strFilePath = ExpandPath( "./io_output.txt" ) />
  •  
  • <!---
  • Create the file output stream. When creating the file
  • output stream, we have to initialize it with a Java
  • File object (which we initialize with the path to
  • the file we want to create).
  • --->
  • <cfset osOutput = CreateObject(
  • "java",
  • "java.io.FileOutputStream"
  • ).Init(
  • CreateObject(
  • "java",
  • "java.io.File"
  • ).Init(
  • strFilePath
  • )
  • ) />
  •  
  •  
  • <!---
  • Loop over the iterations to build up the data. For
  • each iteration, we are going to select a random
  • string and write that string directly to the output
  • stream which should write it directly to file.
  • --->
  • <cfloop
  • index="intI"
  • from="1"
  • to="#intIterations#"
  • step="1">
  •  
  • <!---
  • Add a random string to the file output stream.
  • In this case, the Write() method of the output
  • stream takes a Byte Array. For that, we can
  • call the GetBytes() Java method on the string.
  • We use the ToString() method to create a string
  • object before getting the byte array.
  • --->
  • <cfset osOutput.Write(
  • ToString(
  • "I am crazy about " &
  • arrParts[ RandRange( 1, 7 ) ] &
  • Chr( 13) & Chr( 10 )
  • ).GetBytes()
  • ) />
  •  
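  • </cfloop>
  •  
  • <!---
  • Close the output stream so that the file handle is
  • released once all of the data has been written.
  • --->
  • <cfset osOutput.Close() />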
  •  
  • </cfsilent>
  •  
  • <!--- Output name of file. --->
  • #strFilePath#
  •  
  • </cftimer>

This created a file that was also roughly 2.3 megabytes. Now, I don't know that much about Java - this is all experimentation to me. I see that there is a BufferedOutputStream. That's got to be there for a reason and that reason has to be optimization. From the Java documentation:

"By setting up such an output stream, an application can write bytes to the underlying output stream without necessarily causing a call to the underlying system for each byte written."

When creating the buffered output stream, you can set the buffer size. I tried this with three different buffer sizes (but will only show the code once, as the buffer size is the only variable). I tried the default buffer size, which is 512 bytes. Then I tried it with 2048 bytes and 5000 bytes:

  • <!---
  • Test the Java BufferedOutputStream. In this case we are
  • testing the output stream with a buffer size of 2048 bytes,
  • but we can set it to anything we want.
  • --->
  • <cftimer label="BufferedFileOutput Stream Test" type="outline">
  •  
  • <!---
  • Kill extra output. We want to do this because otherwise,
  • we are creating [intIterations] amount of white space
  • on the page. No need for that.
  • --->
  • <cfsilent>
  •  
  • <!--- Get the file name to write to. --->
  • <cfset strFilePath = ExpandPath( "./bio_output2.txt" ) />
  •  
  • <!---
  • Create the buffered file output stream. When
  • creating the buffered file output stream, we have
  • to initialize it with a Java File Output Stream,
  • which itself needs to be initialized with a Java
  • File object (which we initialize with the path to
  • the file we want to create).
  •  
  • Additionally, the second argument of the buffered
  • output stream is the size of the buffer. In this
  • case we are using 2048 bytes.
  • --->
  • <cfset bosOutput = CreateObject(
  • "java",
  • "java.io.BufferedOutputStream"
  • ).Init(
  • CreateObject(
  • "java",
  • "java.io.FileOutputStream"
  • ).Init(
  • CreateObject(
  • "java",
  • "java.io.File"
  • ).Init(
  • strFilePath
  • )
  • ),
  • JavaCast( "int", 2048 )
  • ) />
  •  
  •  
  • <!---
  • Loop over the iterations to build up the data. For
  • each iteration, we are going to select a random
  • string and write that string directly to the
  • buffered output stream which should write it
  • directly to file output stream once the buffer is
  • populated with enough data.
  • --->
  • <cfloop
  • index="intI"
  • from="1"
  • to="#intIterations#"
  • step="1">
  •  
  • <!---
  • Create the random string. In this case we are
  • creating the string prior to buffer writing
  • because we will need it to get the length of
  • the data.
  • --->
  • <cfset strText = (
  • "I am crazy about " &
  • arrParts[ RandRange( 1, 7 ) ] &
  • Chr( 13) & Chr( 10 )
  • ) />
  •  
  •  
  • <!---
  • Add a random string to the buffered file output
  • stream. We want to write the entire byte array
  • to the output stream.
  • --->
  • <cfset bosOutput.Write(
  • strText.GetBytes(),
  • JavaCast( "int", 0 ),
  • strText.Length()
  • ) />
  •  
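  • </cfloop>
  •  
  • <!---
  • Close the buffered output stream. This flushes any
  • data still sitting in the buffer to the file and
  • releases the file handle.
  • --->
  • <cfset bosOutput.Close() />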
  •  
  • </cfsilent>
  •  
  • <!--- Output name of file. --->
  • #strFilePath#
  •  
  • </cftimer>
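One caveat about the Write() call in the test above: it passes the string's character count as the number of bytes to write. That works here because every phrase is plain ASCII, but the two numbers can differ for other text. Here is a minimal Java sketch of the difference; the class and method names are hypothetical, and UTF-8 is an assumption (the test itself uses the platform default encoding):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original test.
final class ByteLengthDemo {

    // Returns the number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "I am crazy about Feet\r\n";
        String accented = "caf\u00e9"; // the last character needs two bytes in UTF-8

        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");
        System.out.println(accented.length() + " chars, " + utf8Length(accented) + " bytes");
    }
}
```

If the data might ever contain non-ASCII characters, it is probably safer to get the byte array first and pass its own length to the stream.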

As with many things, a small number of iterations yields absolutely no difference in speed. We, however, are performing 100,000 iterations. But even with a large number of iterations, the performance is inconsistent. When I run this test at home on my new Dell Inspiron E1505 Core Duo, the 5000 byte BufferedOutputStream outperforms everything else by a small margin, maybe 100 ms or so. However, when I perform this test here at the office on our powerful server, the CFFile tag outperforms everything else by about 40 ms on repeated tests.

I wonder what that's all about?? I can say for sure, though, that the straight-up Java FileOutputStream was by far the slowest performer. The ColdFusion CFFile tag and the buffered output streams of any size performed faster. I guess you need to find a balance between data caching and file I/O.
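One way to see why the unbuffered stream trails the others is to count how many times the underlying stream actually gets written to. The following is a rough Java sketch, not the test from this post: CountingSink and both method names are hypothetical, and the sink simply discards the data instead of touching a file.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original test.
final class WriteCallCounter {

    // Stand-in for a file stream: discards the data and counts the write calls.
    static final class CountingSink extends OutputStream {
        int calls = 0;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    // Writes the chunk `lines` times directly to the sink.
    static int unbufferedCalls(byte[] chunk, int lines) {
        CountingSink sink = new CountingSink();
        for (int i = 0; i < lines; i++) {
            sink.write(chunk, 0, chunk.length);
        }
        return sink.calls;
    }

    // Writes the chunk `lines` times through a BufferedOutputStream.
    static int bufferedCalls(byte[] chunk, int lines, int bufferSize) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < lines; i++) {
                out.write(chunk, 0, chunk.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        byte[] chunk = "I am crazy about Feet\r\n".getBytes();
        System.out.println("Unbuffered calls: " + unbufferedCalls(chunk, 100000));
        System.out.println("Buffered calls (2048): " + bufferedCalls(chunk, 100000, 2048));
    }
}
```

The unbuffered version reaches the sink once per line, while the buffered version only reaches it when the buffer fills up or the stream is closed. With a real file, each of those calls can mean a trip to the operating system, which is likely where the time goes.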

It seems that I am doing alright with the CFFile methodology. One other thing to note, though (and this has more to do with my XML database system): performance is not the only consideration. Using CFFile and the Java StringBuffer, I have to build the entire output in memory before I write it to disk. This can be hard on the system RAM. With an output stream, I can minimize the amount of data that is held in RAM at any given time.
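To make the memory point concrete, here is a minimal Java sketch of the streaming approach. It is an illustration under assumptions, not the code from my DatabaseService.cfc: the class, the method names, the "Row N" data, and the temp file are all hypothetical. Only one row plus the stream's buffer is held in memory at a time, and the try-with-resources block closes the stream so the file handle is not left open.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original test.
final class StreamingWriter {

    // Streams `lines` rows to the target file and returns the number of bytes written.
    static long writeRows(Path target, int lines, int bufferSize) {
        long bytesWritten = 0;
        try (OutputStream out =
                new BufferedOutputStream(Files.newOutputStream(target), bufferSize)) {
            for (int i = 1; i <= lines; i++) {
                // Build one row at a time instead of one giant string.
                byte[] row = ("Row " + i + "\r\n").getBytes(StandardCharsets.UTF_8);
                out.write(row, 0, row.length);
                bytesWritten += row.length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytesWritten;
    }

    // Writes to a temp file, then returns { bytes written, bytes on disk }.
    static long[] demo(int lines, int bufferSize) {
        try {
            Path target = Files.createTempFile("stream_output", ".txt");
            long written = writeRows(target, lines, bufferSize);
            long onDisk = Files.size(target);
            Files.delete(target);
            return new long[] { written, onDisk };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = demo(100000, 2048);
        System.out.println(result[0] + " bytes written, " + result[1] + " bytes on disk");
    }
}
```

The trade-off is that the file is written piece by piece, so a failure halfway through can leave a partial file behind, which the build-it-all-then-write approach avoids.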



Reader Comments

Aug 5, 2007 at 6:26 PM

One thing to keep in mind when doing coldfusion vs java speed tests is some unexpected performance costs when calling java from coldfusion. coldfusion will attempt to analyze the java call's parameters as to data type, so as to convert them properly if necessary. In most instances it's not a consideration, but with large strings or other pieces of data it can be significant. The way to avoid the cost is to specifically 'javaCast' the input like this, javaCast("String", x).

I don't know if the above issue would impact your tests, just thought I would share.


Aug 6, 2007 at 7:20 AM

@Arthur,

Yeah, I have to get into using the JavaCast() method all the time. I get very lazy about it when it comes to methods that expect strings.


Jun 3, 2009 at 6:35 AM

Hi Ben - There's an issue with these methods in that if you don't issue a osOutput.close() then the files can sometimes remain open within JRun.

Martin

