Writing Enormous Files Based On Massive Record Sets In ColdFusion

Posted December 26, 2006 at 3:36 PM by Ben Nadel

Tags: ColdFusion

Someone over on CF-Talk was asking about writing massive data files that are the result of several million records coming out of a database. One of the concerns expressed was that the problem was not only with the writing of the file, it was with the amount of memory that the massive record set would take up. Here is my stab at a solution that uses small CFQuery record sets as well as a buffered output stream that should ease the use of ColdFusion at both ends.

Not sure if this helps any one, it was really only for the dude on CF-Talk who has not done any Java. In this example, I pulling blog entries out of a database and writing them in quasi-XML format to a txt file. This isn't really anything that I do, I just needed something to test on.

Also, while I am not doing it in the example, you can wrap the whole thing in a CFSilent tag to stop output and whitespace from being sent to the browser. The ClearBuffer() line should take care of any white space in the ColdFusion buffer??? Not sure.

  • <!---
  • Get the number of records to read at one time. This will
  • limit the amount of data that ColdFusion has to pull down
  • from the SQL Server and store in it's memory.
  • --->
  • <cfset intReadCount = 50 />
  •  
  • <!--- Get the offset for the read. --->
  • <cfset intOffset = 0 />
  •  
  •  
  • <!---
  • Set up the buffered output stream. The buffered output
  • stream requires a file output stream which requires
  • a file object. By using a buffered output stream we
  • will limit the amount of data that ColdFusion has to store
  • in memory AND we will moderate the number of write commands
  • that need to take place.
  • --->
  • <cfset objOutput = CreateObject(
  • "java",
  • "java.io.BufferedOutputStream" ).Init(
  •  
  • <!--- File Output Stream. --->
  • CreateObject(
  • "java",
  • "java.io.FileOutputStream" ).Init(
  •  
  • <!--- File object. --->
  • CreateObject(
  • "java",
  • "java.io.File" ).Init(
  •  
  • <!--- File Path. --->
  • JavaCast(
  • "string",
  • ExpandPath( "./output.txt" )
  • )
  • )
  • )
  • ) />
  •  
  •  
  •  
  • <!---
  • Keep looping while we have records to read. This loop
  • will only be broken manually once we have no more records
  • getting returned from the database.
  • --->
  • <cfloop condition="true">
  •  
  • <!--- Query for TOP ## records. --->
  • <cfquery name="qBlog" datasource="#REQUEST.DSN.Source#" username="#REQUEST.DSN.Username#" password="#REQUEST.DSN.Password#">
  • SELECT TOP #intReadCount#
  • b.id,
  • b.name,
  • b.description,
  • b.date_posted
  • FROM
  • blog_entry b
  • WHERE
  • b.id > <cfqueryparam
  • value="#intOffset#"
  • cfsqltype="CF_SQL_INTEGER"
  • />
  • ORDER BY
  • b.id ASC
  • </cfquery>
  •  
  •  
  • <!---
  • Check to see if we have a record count. If we do not,
  • then we are out of data. Break out of the loop.
  • --->
  • <cfif qBlog.RecordCount>
  •  
  • <!--- We have returned records from the database. --->
  •  
  • <!--- Loop over the records. --->
  • <cfloop query="qBlog">
  •  
  • <!--- Create a string buffer to hold row output. --->
  • <cfset objRowBuffer = CreateObject(
  • "java",
  • "java.lang.StringBuffer"
  • ).Init()
  • />
  •  
  • <!--- Add ID. --->
  • <cfset objRowBuffer.Append(
  • "<id>" &
  • qBlog.id &
  • "</id>"
  • ) />
  •  
  • <!--- Add name. --->
  • <cfset objRowBuffer.Append(
  • "<name>" &
  • XmlFormat( qBlog.name ) &
  • "</name>"
  • ) />
  •  
  • <!--- Add description. --->
  • <cfset objRowBuffer.Append(
  • "<description>" &
  • XmlFormat( qBlog.description ) &
  • "</description>"
  • ) />
  •  
  • <!--- Add date posted. --->
  • <cfset objRowBuffer.Append(
  • "<posted>" &
  • XmlFormat( qBlog.date_posted ) &
  • "</posted>"
  • ) />
  •  
  • <!--- Add new line (for next record). --->
  • <cfset objRowBuffer.Append(
  • Chr( 13 ) &
  • Chr( 10 )
  • ) />
  •  
  •  
  • <!---
  • Convert the string buffer to a string
  • (concatenate all the fields) and then get the
  • character byte array from the resultant string.
  • --->
  • <cfset arrRowBytes = objRowBuffer.ToString().GetBytes() />
  •  
  • <!--- Write the byte array to the output stream. --->
  • <cfset objOutput.Write(
  • arrRowBytes,
  • JavaCast( "int", 0 ),
  • JavaCast( "int", ArrayLen( arrRowBytes ) )
  • ) />
  •  
  • </cfloop>
  •  
  •  
  • <!--- Increment the offset. --->
  • <cfset intOffset = (intOffset + intReadCount) />
  •  
  •  
  • <!---
  • Reset the buffer. This should kill the white space
  • that is building up in the ColdFusion memory space.
  • --->
  • <cfset GetPageContext().GetOut().ClearBuffer() />
  •  
  • <cfelse>
  •  
  • <!--- No more records. Stop looping. --->
  • <cfbreak />
  •  
  • </cfif>
  •  
  • </cfloop>
  •  
  •  
  • <!---
  • Flush the buffered output stream to make sure there
  • is no straggling buffer data.
  • --->
  • <cfset objOutput.Flush() />


Reader Comments

Dec 26, 2006 at 7:56 PM // reply »
67 Comments

I can't help but think you could have achieved this more easily without horsing around with Java and just using <cffile action="append"> (which will be doing all your stuff "under the hood" anyhow).

The problem from the bloke on CF-TALK is that CF is very ham-fisted when it comes to dealing with large amounts of data (recordsets, file contents, etc). It tries to load it ALL into memory, whether you need it or not (the answer being "not" if you're just spooling the recordset to a file).

My conceptual knowledge of Java is far better than my syntactical one, but I would try to do this by streaming the recordset back from the DB, and streaming THAT to the file. So only the chunk-size you're streaming every takes up memory, not the whole result/file.

Now... you're quite good with your Java... I'm sure you could cook something up that does that?

--
Adam


Dec 26, 2006 at 9:05 PM // reply »
7 Comments

Adam, you could use the CFFile 'append' method, but that eats up even more memory, mostly from a processing perspective. The company I work for ran into this problem also, as we would build large CSV files from data gained from most of the tables our main application. I compared the Java method you talk to here Ben with the ArrayAppend/ArrayToList method I describe at http://www.philduba.com/index.cfm/2006/10/20/Improving-Performance-Concatenating-Large-Strings.
They came out equivalent in time up to 10 million records being written (at least for our data). If someone doesn't have the Java knowledge, the CF solution might be easier to understand.


Dec 27, 2006 at 7:23 AM // reply »
10,640 Comments

@Adam,

I like your idea, but I would have no idea about streaming a record set from the Database. I am not even sure if that is possible as it is not even a ColdFusion recordset until the data is completely returned??? Perhaps if you did the query itself in Java then something like this would be possible, but it is way above my skill set.

@Phil,

Cool idea, especially for people who are unfamiliar with the Java techniques.


Dec 27, 2006 at 8:20 AM // reply »
67 Comments

Hi Ben.
No, I meant foresake CF completely, and do it all in Java. The only time it becomes useful for the data to be in a CF-recordset-format is if you want to use CF query functions on it. If you're just spooling it to a file: no need.

Perhaps the only time I'd involve CF would be if maybe this would be a common enough task to wrap up in a <cf_query2file> tag or something.

--
Adam


Dec 27, 2006 at 9:06 AM // reply »
3 Comments

Adam, I think doing the query and file writing in Java would be a good solution, if the knowledge is there to do that. In what I've had to work on that led me to the ArrayAppend idea, the query is being used for other things in the page, not just CSV or file creation. I looked at that thread today and the guy seems to be CF only.


May 8, 2007 at 10:59 AM // reply »
1 Comments

this might sound like a silly comment but when I try to post the code, I get pound signs that I have to delete before I can try to execute the code. Is that something you could fix or am I making some sort of mistake?


May 8, 2007 at 12:07 PM // reply »
10,640 Comments

If you are copy / pasting the code directly from this page, then you will get # signs because it is actually an unordered list (and those # are the list item bullets).

The better way to do this would be either to click on the "Download" link above the code you want, or the "open in new window" link (which has a copy-n-paste box with proper formatting).


Jul 24, 2007 at 9:29 PM // reply »
1 Comments

I've got a similar problem and need to create a 100+MB text file. Is it possible to stream the query results using java?


Jul 28, 2007 at 6:05 PM // reply »
10,640 Comments

@Garth,

Using some sort of file output stream should be just fine for that.


Mar 24, 2009 at 6:39 AM // reply »
4 Comments

Thanks Ben - had this issue myself this will be useful to have a go.

On a side, we are also experiencing some real slow down when adding large numbers of images to a zip file. Do you think this would work on this too?


Mar 24, 2009 at 2:45 PM // reply »
10,640 Comments

@Richard,

I don't think this particular strategy would work with ZIP files, but I am sure that if you write a zip file to disk and then iteratively add image files to the zip, you'd be in better shape.


Mar 26, 2009 at 7:20 AM // reply »
4 Comments

thanks ben - fyi - implemented the above method and it worked well for me the memory for jrun & sql was very stable. Only thing was it took "a lot" longer to run through. I'm going to test out adding in blocks of rows with cfsavecontent in one go, to see if this works quicker for me.

Thanks again.


Mar 26, 2009 at 12:33 PM // reply »
10,640 Comments

@Richard,

Awesome.


Ray
Oct 1, 2010 at 3:59 PM // reply »
1 Comments

I keep getting java.lang.OutOfMemoryError: GC overhead limit exceeded

for even 1000 records.


Oct 3, 2010 at 9:08 PM // reply »
10,640 Comments

@Ray,

Hmm, that's odd - I would be surprised for 1000 records to be causing memory problems at all, with any approach. Are you returning a query with an enormouns number of columns or something?

Have you tried restarting ColdFusion - I've run into problems where CF just stops working properly and I get memory problems; then, I restart the service and it seems to come back OK.

Also, at the end of each loop, you might try doing this:

<cfthread action="sleep" duration="10" />

I've been told that adding a very short sleep allows ColdFusion enough time to perform the necessary garbage collection.


Jan 20, 2011 at 1:54 PM // reply »
7 Comments

Thank you so much Ben! This really helped me out today. I just have one question. After the entire process is done and I try to move the file and or delete (outside of cf) it says that the file is open by another program. It lloks like java is not letting go of the file. Is there something I am missing?


Feb 9, 2011 at 1:06 PM // reply »
7 Comments

Hello Ben,

Just wanted to share a change I did as I needed to write the file set as UTF-8. Below is a simple example of how to do this.

// file to write
fileToWrite = 'c:\testfile.txt';

// create the file
theFile = createObject("java","java.io.File").init(javaCast("string",fileToWrite));

// create my java file writer
fileWriter = createObject("java","java.io.BufferedWriter").init(createObject("java","java.io.OutputStreamWriter").init(createObject("java","java.io.FileOutputStream").init(theFile),"UTF8"));

// create string object
stringToWrite = createObject("java","java.lang.StringBuffer").Init();
stringToWrite.Append('Hello World');

// write line
fileWriter.write(stringToWrite.toString());

// Flush the buffered output stream to make sure there is no straggling buffer data
fileWriter.flush();


Feb 9, 2011 at 1:08 PM // reply »
7 Comments

Sorry the text did't wrap so here is the cut off piece.

fileWriter = createObject("java","java.io.BufferedWriter").
init(createObject("java","java.io.OutputStreamWriter").
init(createObject("java","java.io.FileOutputStream").
init(theFile),"UTF8"));


DK
Sep 16, 2011 at 3:32 PM // reply »
4 Comments

So, how would this approach need to change if I need to *return* a CSV file to a user? That is, no external file is currently written. It goes like this:

  • application.Util.prepareForCSVDownload("My Report.csv");
  • report = new components.FooDomain.Report(foo, bar);
  • report.export();
  • <cffunction name="prepareForCSVDownload" returntype="void">
  • <cfargument name="filename" required="yes" type="string"/>
  •  
  • <cfsetting enablecfoutputonly="yes" showdebugoutput="no" />
  • <cfcontent type="application/msexcel" />
  • <cfheader name="Content-Disposition" value='filename="#filename#"' /> <!--- #Replace(filename, " ", "\ ", "all")# --->
  • </cffunction>

1. Would you just use Java buffered output to GetPageContext().GetOut()?
2. Probably, you would get rid of GetPageContext().GetOut().ClearBuffer(), right?

I need to find a way of stopping running out of heap for a large CSV report rather urgently. Thanks in advance for any ideas.


Sep 16, 2011 at 3:36 PM // reply »
336 Comments

I like this post. A lot of really good, useful information here. When I saw the title, I couldn't help but click on it, because it sounds exactly like the kind of random problem I would have. Good job at finding a solution.


DK
Sep 16, 2011 at 4:09 PM // reply »
4 Comments

Adding to my previous comment, should I just <cfflush> my CSV output and all will be good, at least on the output side? The DB side is a separate issue.


DK
Sep 16, 2011 at 4:35 PM // reply »
4 Comments

  • <cfflush interval="512" />

at the top of the page solved my problem and provided nice download progress feedback for users.

However it has one nasty side effect. When IE prompts and I select to save .csv, everything works great. When IE prompts and I select to Open with Excel, it is downloaded with the wrong extension: .csv.XLS, and, as a result, Excel does not recognize CSV format.


DK
Sep 16, 2011 at 5:04 PM // reply »
4 Comments

OK, next attempt at a solution is putting <cfflush> after the headers. This time Open file works. However, that change introduced 2 problem with report file name:

1. File name suddenly started to contain underscores rather than spaces.

2. Open file generates the file which is shown as reportAction (which is the .CFM name). Weird.

The code now looks like this:

  • <cfscript>
  • // download spreadsheet application.Util.prepareForCSVDownload("My Report.csv");
  • </cfscript>
  • <cfflush interval="512" /> <!--- Flush output every 512 bytes --->
  • <cfscript>
  • report = new components.FooDomain.Report(foo, bar);
  • report.export();
  • </cfscript>



Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
InVision App - Prototyping Made Beautiful With Prototyping Tools Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
Feb 12, 2012 at 3:37 AM
Learning ColdFusion 8: CFImage Part III - Watermarks And Transparency
Hi Ben, Just to ask currently it is placed bottom right corner, if i need to replace the same rendered image on the bottom left side or in the bottom center, how that can be calculated. bottom ce ... read »
Feb 11, 2012 at 9:29 PM
Use jQuery's SlideDown() With Fixed-Width Elements To Prevent Jumping
I can't say how glad I am that I found your post. Thank you very much. ... read »
Feb 10, 2012 at 7:21 PM
jQuery AJAX Strips Script Tags And Inserts Them After Parent-Most Elements
Update! Instead of $(eval(options.insertAfter)).after(data['insertData']); I now use: var ajaxNode = document.createElement('span'); var parent = $(eval(options.insertAfter))[0].parentNode; ... read »
Feb 10, 2012 at 6:18 PM
jQuery AJAX Strips Script Tags And Inserts Them After Parent-Most Elements
encountered this same, what I consider, jQuery bug last week. I'm building a site in which I load some content via AJAX. This content contains Linkedin share button placeholders which Linkedin API ne ... read »
Feb 10, 2012 at 11:30 AM
Cross-Origin Resource Sharing (CORS) AJAX Requests Between jQuery And Node.js
After you understand the concepts here, this is an awesome cheatsheet for enabling CORS in just about anything http://enable-cors.org/ ... read »
JM
Feb 10, 2012 at 9:10 AM
My Safari Browser SQLite Database Hello World Example
@Amy, Here is a very good tutorial on how to use JOIN: http://www.sqltutorial.org/sqljoin-innerjoin.aspx ... read »
Feb 10, 2012 at 4:42 AM
Building A Twitter-Inspired RESTful API Architecture In ColdFusion
This is great, very useful Ben. I spotted a small typo in the api.cgm listing: <cfthrow type="Unauthroized" /> Cheers Stefan ... read »
Feb 9, 2012 at 10:35 PM
CFDirectory Filtering Uses Pipe Character For Multiple Filters (Thanks Steve Withington)
I was wondering if there would be a filter you could apply so that you got everything but what you included in the filter. As in show me all docs that are not a .pdf. ... read »