Someone over on CF-Talk was asking about writing massive data files that are the result of several million records coming out of a database. One of the concerns expressed was that the problem was not only with the writing of the file, it was with the amount of memory that the massive record set would take up. Here is my stab at a solution that uses small CFQuery record sets as well as a buffered output stream that should ease the use of ColdFusion at both ends.
Not sure if this helps any one, it was really only for the dude on CF-Talk who has not done any Java. In this example, I pulling blog entries out of a database and writing them in quasi-XML format to a txt file. This isn't really anything that I do, I just needed something to test on.
Also, while I am not doing it in the example, you can wrap the whole thing in a CFSilent tag to stop output and whitespace from being sent to the browser. The ClearBuffer() line should take care of any white space in the ColdFusion buffer??? Not sure.
Launch code in new window » Download code as text file »
Download Code Snippet ZIP File
Comments (9) | Post Comment | Ask Ben | Permalink | Other Searches | Print Page
ColdFusion File Explorer - Round One
Team Nylon : Interface Driven Architecture Proof Of Concept (Design File / Code / Demo Application)
I can't help but think you could have achieved this more easily without horsing around with Java and just using <cffile action="append"> (which will be doing all your stuff "under the hood" anyhow).
The problem from the bloke on CF-TALK is that CF is very ham-fisted when it comes to dealing with large amounts of data (recordsets, file contents, etc). It tries to load it ALL into memory, whether you need it or not (the answer being "not" if you're just spooling the recordset to a file).
My conceptual knowledge of Java is far better than my syntactical one, but I would try to do this by streaming the recordset back from the DB, and streaming THAT to the file. So only the chunk-size you're streaming every takes up memory, not the whole result/file.
Now... you're quite good with your Java... I'm sure you could cook something up that does that?
--
Adam
Posted by Adam Cameron on Dec 26, 2006 at 7:56 PM
Adam, you could use the CFFile 'append' method, but that eats up even more memory, mostly from a processing perspective. The company I work for ran into this problem also, as we would build large CSV files from data gained from most of the tables our main application. I compared the Java method you talk to here Ben with the ArrayAppend/ArrayToList method I describe at http://www.philduba.com/index.cfm/2006/10/20/Improving-Performance-Concatenating-Large-Strings.
They came out equivalent in time up to 10 million records being written (at least for our data). If someone doesn't have the Java knowledge, the CF solution might be easier to understand.
Posted by Phil Duba on Dec 26, 2006 at 9:05 PM
@Adam,
I like your idea, but I would have no idea about streaming a record set from the Database. I am not even sure if that is possible as it is not even a ColdFusion recordset until the data is completely returned??? Perhaps if you did the query itself in Java then something like this would be possible, but it is way above my skill set.
@Phil,
Cool idea, especially for people who are unfamiliar with the Java techniques.
Posted by Ben Nadel on Dec 27, 2006 at 7:23 AM
Hi Ben.
No, I meant foresake CF completely, and do it all in Java. The only time it becomes useful for the data to be in a CF-recordset-format is if you want to use CF query functions on it. If you're just spooling it to a file: no need.
Perhaps the only time I'd involve CF would be if maybe this would be a common enough task to wrap up in a <cf_query2file> tag or something.
--
Adam
Posted by Adam Cameron on Dec 27, 2006 at 8:20 AM
Adam, I think doing the query and file writing in Java would be a good solution, if the knowledge is there to do that. In what I've had to work on that led me to the ArrayAppend idea, the query is being used for other things in the page, not just CSV or file creation. I looked at that thread today and the guy seems to be CF only.
Posted by Phil Duba on Dec 27, 2006 at 9:06 AM
this might sound like a silly comment but when I try to post the code, I get pound signs that I have to delete before I can try to execute the code. Is that something you could fix or am I making some sort of mistake?
Posted by Scott Hoff on May 8, 2007 at 10:59 AM
If you are copy / pasting the code directly from this page, then you will get # signs because it is actually an unordered list (and those # are the list item bullets).
The better way to do this would be either to click on the "Download" link above the code you want, or the "open in new window" link (which has a copy-n-paste box with proper formatting).
Posted by Ben Nadel on May 8, 2007 at 12:07 PM
I've got a similar problem and need to create a 100+MB text file. Is it possible to stream the query results using java?
Posted by Garth on Jul 24, 2007 at 9:29 PM
@Garth,
Using some sort of file output stream should be just fine for that.
Posted by Ben Nadel on Jul 28, 2007 at 6:05 PM