Writing Enormous Files Based On Massive Record Sets In ColdFusion

Posted December 26, 2006 at 3:36 PM by Ben Nadel

Tags: ColdFusion

Someone over on CF-Talk was asking about writing massive data files that are the result of several million records coming out of a database. One of the concerns expressed was that the problem was not only with the writing of the file, it was with the amount of memory that the massive record set would take up. Here is my stab at a solution that uses small CFQuery record sets as well as a buffered output stream that should ease the use of ColdFusion at both ends.

Not sure if this helps any one, it was really only for the dude on CF-Talk who has not done any Java. In this example, I pulling blog entries out of a database and writing them in quasi-XML format to a txt file. This isn't really anything that I do, I just needed something to test on.

Also, while I am not doing it in the example, you can wrap the whole thing in a CFSilent tag to stop output and whitespace from being sent to the browser. The ClearBuffer() line should take care of any white space in the ColdFusion buffer??? Not sure.

  • <!---
  • Get the number of records to read at one time. This will
  • limit the amount of data that ColdFusion has to pull down
  • from the SQL Server and store in it's memory.
  • --->
  • <cfset intReadCount = 50 />
  •  
  • <!--- Get the offset for the read. --->
  • <cfset intOffset = 0 />
  •  
  •  
  • <!---
  • Set up the buffered output stream. The buffered output
  • stream requires a file output stream which requires
  • a file object. By using a buffered output stream we
  • will limit the amount of data that ColdFusion has to store
  • in memory AND we will moderate the number of write commands
  • that need to take place.
  • --->
  • <cfset objOutput = CreateObject(
  • "java",
  • "java.io.BufferedOutputStream" ).Init(
  •  
  • <!--- File Output Stream. --->
  • CreateObject(
  • "java",
  • "java.io.FileOutputStream" ).Init(
  •  
  • <!--- File object. --->
  • CreateObject(
  • "java",
  • "java.io.File" ).Init(
  •  
  • <!--- File Path. --->
  • JavaCast(
  • "string",
  • ExpandPath( "./output.txt" )
  • )
  • )
  • )
  • ) />
  •  
  •  
  •  
  • <!---
  • Keep looping while we have records to read. This loop
  • will only be broken manually once we have no more records
  • getting returned from the database.
  • --->
  • <cfloop condition="true">
  •  
  • <!--- Query for TOP ## records. --->
  • <cfquery name="qBlog" datasource="#REQUEST.DSN.Source#" username="#REQUEST.DSN.Username#" password="#REQUEST.DSN.Password#">
  • SELECT TOP #intReadCount#
  • b.id,
  • b.name,
  • b.description,
  • b.date_posted
  • FROM
  • blog_entry b
  • WHERE
  • b.id > <cfqueryparam
  • value="#intOffset#"
  • cfsqltype="CF_SQL_INTEGER"
  • />
  • ORDER BY
  • b.id ASC
  • </cfquery>
  •  
  •  
  • <!---
  • Check to see if we have a record count. If we do not,
  • then we are out of data. Break out of the loop.
  • --->
  • <cfif qBlog.RecordCount>
  •  
  • <!--- We have returned records from the database. --->
  •  
  • <!--- Loop over the records. --->
  • <cfloop query="qBlog">
  •  
  • <!--- Create a string buffer to hold row output. --->
  • <cfset objRowBuffer = CreateObject(
  • "java",
  • "java.lang.StringBuffer"
  • ).Init()
  • />
  •  
  • <!--- Add ID. --->
  • <cfset objRowBuffer.Append(
  • "<id>" &
  • qBlog.id &
  • "</id>"
  • ) />
  •  
  • <!--- Add name. --->
  • <cfset objRowBuffer.Append(
  • "<name>" &
  • XmlFormat( qBlog.name ) &
  • "</name>"
  • ) />
  •  
  • <!--- Add description. --->
  • <cfset objRowBuffer.Append(
  • "<description>" &
  • XmlFormat( qBlog.description ) &
  • "</description>"
  • ) />
  •  
  • <!--- Add date posted. --->
  • <cfset objRowBuffer.Append(
  • "<posted>" &
  • XmlFormat( qBlog.date_posted ) &
  • "</posted>"
  • ) />
  •  
  • <!--- Add new line (for next record). --->
  • <cfset objRowBuffer.Append(
  • Chr( 13 ) &
  • Chr( 10 )
  • ) />
  •  
  •  
  • <!---
  • Convert the string buffer to a string
  • (concatenate all the fields) and then get the
  • character byte array from the resultant string.
  • --->
  • <cfset arrRowBytes = objRowBuffer.ToString().GetBytes() />
  •  
  • <!--- Write the byte array to the output stream. --->
  • <cfset objOutput.Write(
  • arrRowBytes,
  • JavaCast( "int", 0 ),
  • JavaCast( "int", ArrayLen( arrRowBytes ) )
  • ) />
  •  
  • </cfloop>
  •  
  •  
  • <!--- Increment the offset. --->
  • <cfset intOffset = (intOffset + intReadCount) />
  •  
  •  
  • <!---
  • Reset the buffer. This should kill the white space
  • that is building up in the ColdFusion memory space.
  • --->
  • <cfset GetPageContext().GetOut().ClearBuffer() />
  •  
  • <cfelse>
  •  
  • <!--- No more records. Stop looping. --->
  • <cfbreak />
  •  
  • </cfif>
  •  
  • </cfloop>
  •  
  •  
  • <!---
  • Flush the buffered output stream to make sure there
  • is no straggling buffer data.
  • --->
  • <cfset objOutput.Flush() />


Reader Comments

Dec 26, 2006 at 7:56 PM // reply »
67 Comments

I can't help but think you could have achieved this more easily without horsing around with Java and just using <cffile action="append"> (which will be doing all your stuff "under the hood" anyhow).

The problem from the bloke on CF-TALK is that CF is very ham-fisted when it comes to dealing with large amounts of data (recordsets, file contents, etc). It tries to load it ALL into memory, whether you need it or not (the answer being "not" if you're just spooling the recordset to a file).

My conceptual knowledge of Java is far better than my syntactical one, but I would try to do this by streaming the recordset back from the DB, and streaming THAT to the file. So only the chunk-size you're streaming every takes up memory, not the whole result/file.

Now... you're quite good with your Java... I'm sure you could cook something up that does that?

--
Adam


Dec 26, 2006 at 9:05 PM // reply »
7 Comments

Adam, you could use the CFFile 'append' method, but that eats up even more memory, mostly from a processing perspective. The company I work for ran into this problem also, as we would build large CSV files from data gained from most of the tables our main application. I compared the Java method you talk to here Ben with the ArrayAppend/ArrayToList method I describe at http://www.philduba.com/index.cfm/2006/10/20/Improving-Performance-Concatenating-Large-Strings.
They came out equivalent in time up to 10 million records being written (at least for our data). If someone doesn't have the Java knowledge, the CF solution might be easier to understand.


Dec 27, 2006 at 7:23 AM // reply »
11,246 Comments

@Adam,

I like your idea, but I would have no idea about streaming a record set from the Database. I am not even sure if that is possible as it is not even a ColdFusion recordset until the data is completely returned??? Perhaps if you did the query itself in Java then something like this would be possible, but it is way above my skill set.

@Phil,

Cool idea, especially for people who are unfamiliar with the Java techniques.


Dec 27, 2006 at 8:20 AM // reply »
67 Comments

Hi Ben.
No, I meant foresake CF completely, and do it all in Java. The only time it becomes useful for the data to be in a CF-recordset-format is if you want to use CF query functions on it. If you're just spooling it to a file: no need.

Perhaps the only time I'd involve CF would be if maybe this would be a common enough task to wrap up in a <cf_query2file> tag or something.

--
Adam


Dec 27, 2006 at 9:06 AM // reply »
3 Comments

Adam, I think doing the query and file writing in Java would be a good solution, if the knowledge is there to do that. In what I've had to work on that led me to the ArrayAppend idea, the query is being used for other things in the page, not just CSV or file creation. I looked at that thread today and the guy seems to be CF only.


May 8, 2007 at 10:59 AM // reply »
1 Comments

this might sound like a silly comment but when I try to post the code, I get pound signs that I have to delete before I can try to execute the code. Is that something you could fix or am I making some sort of mistake?


May 8, 2007 at 12:07 PM // reply »
11,246 Comments

If you are copy / pasting the code directly from this page, then you will get # signs because it is actually an unordered list (and those # are the list item bullets).

The better way to do this would be either to click on the "Download" link above the code you want, or the "open in new window" link (which has a copy-n-paste box with proper formatting).


Jul 24, 2007 at 9:29 PM // reply »
1 Comments

I've got a similar problem and need to create a 100+MB text file. Is it possible to stream the query results using java?


Jul 28, 2007 at 6:05 PM // reply »
11,246 Comments

@Garth,

Using some sort of file output stream should be just fine for that.


Mar 24, 2009 at 6:39 AM // reply »
4 Comments

Thanks Ben - had this issue myself this will be useful to have a go.

On a side, we are also experiencing some real slow down when adding large numbers of images to a zip file. Do you think this would work on this too?


Mar 24, 2009 at 2:45 PM // reply »
11,246 Comments

@Richard,

I don't think this particular strategy would work with ZIP files, but I am sure that if you write a zip file to disk and then iteratively add image files to the zip, you'd be in better shape.


Mar 26, 2009 at 7:20 AM // reply »
4 Comments

thanks ben - fyi - implemented the above method and it worked well for me the memory for jrun & sql was very stable. Only thing was it took "a lot" longer to run through. I'm going to test out adding in blocks of rows with cfsavecontent in one go, to see if this works quicker for me.

Thanks again.


Mar 26, 2009 at 12:33 PM // reply »
11,246 Comments

@Richard,

Awesome.


Ray
Oct 1, 2010 at 3:59 PM // reply »
1 Comments

I keep getting java.lang.OutOfMemoryError: GC overhead limit exceeded

for even 1000 records.


Oct 3, 2010 at 9:08 PM // reply »
11,246 Comments

@Ray,

Hmm, that's odd - I would be surprised for 1000 records to be causing memory problems at all, with any approach. Are you returning a query with an enormouns number of columns or something?

Have you tried restarting ColdFusion - I've run into problems where CF just stops working properly and I get memory problems; then, I restart the service and it seems to come back OK.

Also, at the end of each loop, you might try doing this:

<cfthread action="sleep" duration="10" />

I've been told that adding a very short sleep allows ColdFusion enough time to perform the necessary garbage collection.


Jan 20, 2011 at 1:54 PM // reply »
7 Comments

Thank you so much Ben! This really helped me out today. I just have one question. After the entire process is done and I try to move the file and or delete (outside of cf) it says that the file is open by another program. It lloks like java is not letting go of the file. Is there something I am missing?


Feb 9, 2011 at 1:06 PM // reply »
7 Comments

Hello Ben,

Just wanted to share a change I did as I needed to write the file set as UTF-8. Below is a simple example of how to do this.

// file to write
fileToWrite = 'c:\testfile.txt';

// create the file
theFile = createObject("java","java.io.File").init(javaCast("string",fileToWrite));

// create my java file writer
fileWriter = createObject("java","java.io.BufferedWriter").init(createObject("java","java.io.OutputStreamWriter").init(createObject("java","java.io.FileOutputStream").init(theFile),"UTF8"));

// create string object
stringToWrite = createObject("java","java.lang.StringBuffer").Init();
stringToWrite.Append('Hello World');

// write line
fileWriter.write(stringToWrite.toString());

// Flush the buffered output stream to make sure there is no straggling buffer data
fileWriter.flush();


Feb 9, 2011 at 1:08 PM // reply »
7 Comments

Sorry the text did't wrap so here is the cut off piece.

fileWriter = createObject("java","java.io.BufferedWriter").
init(createObject("java","java.io.OutputStreamWriter").
init(createObject("java","java.io.FileOutputStream").
init(theFile),"UTF8"));


DK
Sep 16, 2011 at 3:32 PM // reply »
4 Comments

So, how would this approach need to change if I need to *return* a CSV file to a user? That is, no external file is currently written. It goes like this:

  • application.Util.prepareForCSVDownload("My Report.csv");
  • report = new components.FooDomain.Report(foo, bar);
  • report.export();
  • <cffunction name="prepareForCSVDownload" returntype="void">
  • <cfargument name="filename" required="yes" type="string"/>
  •  
  • <cfsetting enablecfoutputonly="yes" showdebugoutput="no" />
  • <cfcontent type="application/msexcel" />
  • <cfheader name="Content-Disposition" value='filename="#filename#"' /> <!--- #Replace(filename, " ", "\ ", "all")# --->
  • </cffunction>

1. Would you just use Java buffered output to GetPageContext().GetOut()?
2. Probably, you would get rid of GetPageContext().GetOut().ClearBuffer(), right?

I need to find a way of stopping running out of heap for a large CSV report rather urgently. Thanks in advance for any ideas.


Sep 16, 2011 at 3:36 PM // reply »
369 Comments

I like this post. A lot of really good, useful information here. When I saw the title, I couldn't help but click on it, because it sounds exactly like the kind of random problem I would have. Good job at finding a solution.


DK
Sep 16, 2011 at 4:09 PM // reply »
4 Comments

Adding to my previous comment, should I just <cfflush> my CSV output and all will be good, at least on the output side? The DB side is a separate issue.


DK
Sep 16, 2011 at 4:35 PM // reply »
4 Comments

  • <cfflush interval="512" />

at the top of the page solved my problem and provided nice download progress feedback for users.

However it has one nasty side effect. When IE prompts and I select to save .csv, everything works great. When IE prompts and I select to Open with Excel, it is downloaded with the wrong extension: .csv.XLS, and, as a result, Excel does not recognize CSV format.


DK
Sep 16, 2011 at 5:04 PM // reply »
4 Comments

OK, next attempt at a solution is putting <cfflush> after the headers. This time Open file works. However, that change introduced 2 problem with report file name:

1. File name suddenly started to contain underscores rather than spaces.

2. Open file generates the file which is shown as reportAction (which is the .CFM name). Weird.

The code now looks like this:

  • <cfscript>
  • // download spreadsheet application.Util.prepareForCSVDownload("My Report.csv");
  • </cfscript>
  • <cfflush interval="512" /> <!--- Flush output every 512 bytes --->
  • <cfscript>
  • report = new components.FooDomain.Report(foo, bar);
  • report.export();
  • </cfscript>


Dec 18, 2012 at 2:16 PM // reply »
1 Comments

Brilliant! Exactly what I needed. Thanks so much.

On our dev team we always joke about how your blog always shows up in the top five results for our Google searches when we're stuck on something. Thanks for all the help you don't know you've already provided!



Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 24, 2013 at 5:39 PM
Ask Ben: Manually Enforcing Basic HTTP Authorization In ColdFusion
@Adam Oops! My mistake! I hadn't gotten that far in my testing - I'm still baby stepping my way through the process. ... read »
May 24, 2013 at 5:13 PM
Ask Ben: Manually Enforcing Basic HTTP Authorization In ColdFusion
Hi Jason, Thanks for checking up on that, but I still stand firm on my position. :) There are actually two listLast()'s in use, and you're right that the one using a space as a delimiter is fine. ... read »
May 24, 2013 at 4:45 PM
Ask Ben: Manually Enforcing Basic HTTP Authorization In ColdFusion
@Ben I have been lurking your site for quite some time, and haven't stepped up to comment until today. Thanks for all the great info - keep it up! @Adam I believe you are mistaken... as the commen ... read »
May 24, 2013 at 11:21 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@WebManWalking, Ha ha, let's us never speak of justifying "##" notation again :P ... read »
May 24, 2013 at 11:18 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Ben, Ah, so it was indeed how I vaguely remembered it to be: A direct assignment value = users.id[ i ] causes value to retain the sticky datatype of the query column. Although unnecessary in ... read »
May 24, 2013 at 9:11 AM
Preventing Links In Standalone iPhone Applications From Opening In Mobile Safari
@Brandon, Hi, No, I haven't been able to do that. I have just kept it as it is. ... read »
May 23, 2013 at 9:52 PM
Preventing Links In Standalone iPhone Applications From Opening In Mobile Safari
@Muhmmadibn Did you figure out a solution to launching PDFs? I am running into the same issues myself. There is no way to close the PDF or go back once you launch it. Thanks in advance! ... read »
May 23, 2013 at 6:06 PM
The Girl Who Broke My Heart, And Made Me A Better Person
Good day,ladies and gentle men, my name is Dr AMADI the great spell caster in Africa, i have help so many people for different kind of problems,who say there is no solution to problems on earth, that ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools