Writing Enormous Files Based On Massive Record Sets In ColdFusion

Posted December 26, 2006 at 3:36 PM

Tags: ColdFusion

Someone over on CF-Talk was asking about writing massive data files that are the result of several million records coming out of a database. One of the concerns expressed was that the problem was not only with the writing of the file, it was with the amount of memory that the massive record set would take up. Here is my stab at a solution that uses small CFQuery record sets as well as a buffered output stream that should ease the use of ColdFusion at both ends.

Not sure if this helps any one, it was really only for the dude on CF-Talk who has not done any Java. In this example, I pulling blog entries out of a database and writing them in quasi-XML format to a txt file. This isn't really anything that I do, I just needed something to test on.

Also, while I am not doing it in the example, you can wrap the whole thing in a CFSilent tag to stop output and whitespace from being sent to the browser. The ClearBuffer() line should take care of any white space in the ColdFusion buffer??? Not sure.

 Launch code in new window » Download code as text file »

  • <!---
  • Get the number of records to read at one time. This will
  • limit the amount of data that ColdFusion has to pull down
  • from the SQL Server and store in it's memory.
  • --->
  • <cfset intReadCount = 50 />
  •  
  • <!--- Get the offset for the read. --->
  • <cfset intOffset = 0 />
  •  
  •  
  • <!---
  • Set up the buffered output stream. The buffered output
  • stream requires a file output stream which requires
  • a file object. By using a buffered output stream we
  • will limit the amount of data that ColdFusion has to store
  • in memory AND we will moderate the number of write commands
  • that need to take place.
  • --->
  • <cfset objOutput = CreateObject(
  • "java",
  • "java.io.BufferedOutputStream" ).Init(
  •  
  • <!--- File Output Stream. --->
  • CreateObject(
  • "java",
  • "java.io.FileOutputStream" ).Init(
  •  
  • <!--- File object. --->
  • CreateObject(
  • "java",
  • "java.io.File" ).Init(
  •  
  • <!--- File Path. --->
  • JavaCast(
  • "string",
  • ExpandPath( "./output.txt" )
  • )
  • )
  • )
  • ) />
  •  
  •  
  •  
  • <!---
  • Keep looping while we have records to read. This loop
  • will only be broken manually once we have no more records
  • getting returned from the database.
  • --->
  • <cfloop condition="true">
  •  
  • <!--- Query for TOP ## records. --->
  • <cfquery name="qBlog" datasource="#REQUEST.DSN.Source#" username="#REQUEST.DSN.Username#" password="#REQUEST.DSN.Password#">
  • SELECT TOP #intReadCount#
  • b.id,
  • b.name,
  • b.description,
  • b.date_posted
  • FROM
  • blog_entry b
  • WHERE
  • b.id > <cfqueryparam
  • value="#intOffset#"
  • cfsqltype="CF_SQL_INTEGER"
  • />
  • ORDER BY
  • b.id ASC
  • </cfquery>
  •  
  •  
  • <!---
  • Check to see if we have a record count. If we do not,
  • then we are out of data. Break out of the loop.
  • --->
  • <cfif qBlog.RecordCount>
  •  
  • <!--- We have returned records from the database. --->
  •  
  • <!--- Loop over the records. --->
  • <cfloop query="qBlog">
  •  
  • <!--- Create a string buffer to hold row output. --->
  • <cfset objRowBuffer = CreateObject(
  • "java",
  • "java.lang.StringBuffer"
  • ).Init()
  • />
  •  
  • <!--- Add ID. --->
  • <cfset objRowBuffer.Append(
  • "<id>" &
  • qBlog.id &
  • "</id>"
  • ) />
  •  
  • <!--- Add name. --->
  • <cfset objRowBuffer.Append(
  • "<name>" &
  • XmlFormat( qBlog.name ) &
  • "</name>"
  • ) />
  •  
  • <!--- Add description. --->
  • <cfset objRowBuffer.Append(
  • "<description>" &
  • XmlFormat( qBlog.description ) &
  • "</description>"
  • ) />
  •  
  • <!--- Add date posted. --->
  • <cfset objRowBuffer.Append(
  • "<posted>" &
  • XmlFormat( qBlog.date_posted ) &
  • "</posted>"
  • ) />
  •  
  • <!--- Add new line (for next record). --->
  • <cfset objRowBuffer.Append(
  • Chr( 13 ) &
  • Chr( 10 )
  • ) />
  •  
  •  
  • <!---
  • Convert the string buffer to a string
  • (concatenate all the fields) and then get the
  • character byte array from the resultant string.
  • --->
  • <cfset arrRowBytes = objRowBuffer.ToString().GetBytes() />
  •  
  • <!--- Write the byte array to the output stream. --->
  • <cfset objOutput.Write(
  • arrRowBytes,
  • JavaCast( "int", 0 ),
  • JavaCast( "int", ArrayLen( arrRowBytes ) )
  • ) />
  •  
  • </cfloop>
  •  
  •  
  • <!--- Increment the offset. --->
  • <cfset intOffset = (intOffset + intReadCount) />
  •  
  •  
  • <!---
  • Reset the buffer. This should kill the white space
  • that is building up in the ColdFusion memory space.
  • --->
  • <cfset GetPageContext().GetOut().ClearBuffer() />
  •  
  • <cfelse>
  •  
  • <!--- No more records. Stop looping. --->
  • <cfbreak />
  •  
  • </cfif>
  •  
  • </cfloop>
  •  
  •  
  • <!---
  • Flush the buffered output stream to make sure there
  • is no straggling buffer data.
  • --->
  • <cfset objOutput.Flush() />

Download Code Snippet ZIP File

Comments (9)  |  Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page



Adobe ColdFusion 8.0.1 Update - Helping Programmers To Be Signifanctly Less Girlie - Download ColdFusion 8 Update 8.0.1 Now.

Reader Comments

I can't help but think you could have achieved this more easily without horsing around with Java and just using <cffile action="append"> (which will be doing all your stuff "under the hood" anyhow).

The problem from the bloke on CF-TALK is that CF is very ham-fisted when it comes to dealing with large amounts of data (recordsets, file contents, etc). It tries to load it ALL into memory, whether you need it or not (the answer being "not" if you're just spooling the recordset to a file).

My conceptual knowledge of Java is far better than my syntactical one, but I would try to do this by streaming the recordset back from the DB, and streaming THAT to the file. So only the chunk-size you're streaming every takes up memory, not the whole result/file.

Now... you're quite good with your Java... I'm sure you could cook something up that does that?

--
Adam

Posted by Adam Cameron on Dec 26, 2006 at 7:56 PM


Adam, you could use the CFFile 'append' method, but that eats up even more memory, mostly from a processing perspective. The company I work for ran into this problem also, as we would build large CSV files from data gained from most of the tables our main application. I compared the Java method you talk to here Ben with the ArrayAppend/ArrayToList method I describe at http://www.philduba.com/index.cfm/2006/10/20/Improving-Performance-Concatenating-Large-Strings.
They came out equivalent in time up to 10 million records being written (at least for our data). If someone doesn't have the Java knowledge, the CF solution might be easier to understand.

Posted by Phil Duba on Dec 26, 2006 at 9:05 PM


@Adam,

I like your idea, but I would have no idea about streaming a record set from the Database. I am not even sure if that is possible as it is not even a ColdFusion recordset until the data is completely returned??? Perhaps if you did the query itself in Java then something like this would be possible, but it is way above my skill set.

@Phil,

Cool idea, especially for people who are unfamiliar with the Java techniques.

Posted by Ben Nadel on Dec 27, 2006 at 7:23 AM


Hi Ben.
No, I meant foresake CF completely, and do it all in Java. The only time it becomes useful for the data to be in a CF-recordset-format is if you want to use CF query functions on it. If you're just spooling it to a file: no need.

Perhaps the only time I'd involve CF would be if maybe this would be a common enough task to wrap up in a <cf_query2file> tag or something.

--
Adam

Posted by Adam Cameron on Dec 27, 2006 at 8:20 AM


Adam, I think doing the query and file writing in Java would be a good solution, if the knowledge is there to do that. In what I've had to work on that led me to the ArrayAppend idea, the query is being used for other things in the page, not just CSV or file creation. I looked at that thread today and the guy seems to be CF only.

Posted by Phil Duba on Dec 27, 2006 at 9:06 AM


this might sound like a silly comment but when I try to post the code, I get pound signs that I have to delete before I can try to execute the code. Is that something you could fix or am I making some sort of mistake?

Posted by Scott Hoff on May 8, 2007 at 10:59 AM


If you are copy / pasting the code directly from this page, then you will get # signs because it is actually an unordered list (and those # are the list item bullets).

The better way to do this would be either to click on the "Download" link above the code you want, or the "open in new window" link (which has a copy-n-paste box with proper formatting).

Posted by Ben Nadel on May 8, 2007 at 12:07 PM


I've got a similar problem and need to create a 100+MB text file. Is it possible to stream the query results using java?

Posted by Garth on Jul 24, 2007 at 9:29 PM


@Garth,

Using some sort of file output stream should be just fine for that.

Posted by Ben Nadel on Jul 28, 2007 at 6:05 PM


Post Comment  |  Ask Ben


Home   |   Web Log   |   ColdFusion   |   Projects   |   Resume   |   Job Form   |   Search   |   Contact
Epicenter Consulting - Custom Software Solutions for Business Evolution HostMySite.com - The Leader In ColdFusion Hosting