Reading In File Data One Line At A Time Using ColdFusion's CFLoop Tag Or Java's LineNumberReader

Posted September 15, 2010 at 11:12 AM by Ben Nadel

Tags: ColdFusion

Last week on Twitter, someone asked about reading in files that were too big to fit in the RAM allocated to the JVM. For this problem, I suggested that the developer try the file line reader functionality built into ColdFusion 8's CFLoop tag. After this discussion ended, someone else asked me to blog about this new CFLoop functionality, as they had never heard of it before. As such, I figured I'd put together this quick ColdFusion demo.

As of ColdFusion 8, there are two new CFLoop attributes related to file parsing:

  • File - The expanded path of the file to read.
  • Characters - The number of characters to read from the file with each iteration.

While the File attribute is required for file reading, the Characters attribute is not. If the Characters attribute is omitted, ColdFusion defaults to reading in the file one line at a time (as defined by standard line delimiters - \r, \n, and \r\n). In this case (characters omitted), the Index variable of the loop will contain the line data, minus the line delimiters. If the Characters attribute is provided, the Index variable of the loop will contain a chunk of up to that many characters (including any line delimiters); the final chunk may be shorter.

To see both of these scenarios in action (by-line and by-characters), let's take a look at the following ColdFusion demo:

  • <!---
  • We are going to be reading in a file, line by line, so first,
  • let's create a file to read. Define the path to the file we
  • are going to populate.
  • --->
  • <cfset filePath = expandPath( "./data.txt" ) />
  •  
  • <!---
  • Delete the file if it exists so that we don't keep populating
  • the same document.
  • --->
  • <cfif fileExists( filePath )>
  •  
  • <cfset fileDelete( filePath ) />
  •  
  • </cfif>
  •  
  • <!--- Write some data to the file. --->
  • <cfloop
  • index="i"
  • from="1"
  • to="10"
  • step="1">
  •  
  • <cffile
  • action="append"
  • file="#filePath#"
  • output="This is line #i# in this file."
  • addnewline="true"
  • />
  •  
  • </cfloop>
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <cfoutput>
  •  
  •  
  • <!---
  • Now, we are going to read the file in line-by-line using
  • ColdFusion 8's new CFLoop behavior. The File attribute
  • tells ColdFusion what file to read in, the Index attribute
  • defines the variable into which ColdFusion will put the
  • parsed text line.
  • --->
  • <cfloop
  • index="line"
  • file="#filePath#">
  •  
  • Line: #line#<br />
  •  
  • </cfloop>
  •  
  •  
  • <br />
  •  
  •  
  • <!---
  • CFLoop also allows for a Characters attribute. If we omit
  • this attribute (as above), ColdFusion reads the file
  • line-by-line. If we use the Characters attribute, however,
  • ColdFusion will read the file a chunk at a time based on the
  • number of characters defined.
  •  
  • Here, we are going to read the file in 50 characters at
  • a time.
  • --->
  • <cfloop
  • index="chunk"
  • file="#filePath#"
  • characters="50">
  •  
  • 50 Char Chunk: #chunk#<br />
  •  
  • </cfloop>
  •  
  •  
  • </cfoutput>

The first part of this demo simply creates and populates the file that we are going to read in. Then, I use two CFLoop tags - one with just the File attribute and one with both the File and Characters attributes. When we run the above code, we get the following page output:

Line: This is line 1 in this file.
Line: This is line 2 in this file.
Line: This is line 3 in this file.
Line: This is line 4 in this file.
Line: This is line 5 in this file.
Line: This is line 6 in this file.
Line: This is line 7 in this file.
Line: This is line 8 in this file.
Line: This is line 9 in this file.
Line: This is line 10 in this file.

50 Char Chunk: This is line 1 in this file. This is line 2 in thi
50 Char Chunk: s file. This is line 3 in this file. This is line
50 Char Chunk: 4 in this file. This is line 5 in this file. This
50 Char Chunk: is line 6 in this file. This is line 7 in this fil
50 Char Chunk: e. This is line 8 in this file. This is line 9 in
50 Char Chunk: this file. This is line 10 in this file.

As you can see, when we provide the File attribute but omit the Characters attribute, ColdFusion will read the file in one line at a time. When we include the Characters attribute, ColdFusion will read the file in one chunk of characters at a time (50 characters per chunk in this demo).

NOTE: While it is not represented in the rendered output, by-line reading does not include line delimiters; by-characters reading, on the other hand, does include line delimiters.
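
Since the rendered output hides this difference, here is a minimal Java sketch that makes it visible. It assumes CFLoop follows the same rules as Java's LineNumberReader (which is only a guess), and the class and method names (`DelimiterDemo`, `byLine`, `firstChunk`) are hypothetical, chosen just for this illustration. An in-memory string stands in for the file.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative, not from the post.
class DelimiterDemo {

    // By-line: readLine() treats \r, \n and \r\n as line terminators and
    // returns each line with the terminator removed.
    public static List<String> byLine(Reader source) {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // By-characters: read(char[], int, int) copies raw characters into the
    // buffer, so any line terminators remain part of the chunk.
    public static String firstChunk(Reader source, int size) {
        char[] buffer = new char[size];
        try (LineNumberReader reader = new LineNumberReader(source)) {
            int count = reader.read(buffer, 0, size);
            return (count == -1) ? "" : new String(buffer, 0, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for a file that mixes all three terminators.
        String data = "a\rb\nc\r\nd";

        System.out.println(byLine(new StringReader(data)));   // prints [a, b, c, d]

        String chunk = firstChunk(new StringReader(data), 4);
        System.out.println(chunk.contains("\n"));              // prints true
    }
}
```

The by-line call hands back four clean strings, while the four-character chunk still carries the \r and \n that sat between the first lines.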

It's awesome how easy ColdFusion makes some of this functionality. And, while I can't be sure, I would guess that ColdFusion is using Java's LineNumberReader under the covers. The LineNumberReader class provides both by-line and by-characters parsing which makes it ideal for this new combination of CFLoop attributes.

If you are not using ColdFusion 8+ yet, you can still get this kind of functionality by dipping down into the Java layer and invoking the LineNumberReader class directly. ColdFusion provides a clean, simple abstraction for this functionality, so you'll see that using the LineNumberReader directly is quite a bit more complicated.

In the following demo, I am going to replicate the previous CFLoop output using the LineNumberReader class:

  • <!---
  • We are going to be reading in a file, line by line, so first,
  • let's create a file to read. Define the path to the file we
  • are going to populate.
  • --->
  • <cfset filePath = expandPath( "./data.txt" ) />
  •  
  • <!---
  • Delete the file if it exists so that we don't keep populating
  • the same document.
  • --->
  • <cfif fileExists( filePath )>
  •  
  • <cfset fileDelete( filePath ) />
  •  
  • </cfif>
  •  
  • <!--- Write some data to the file. --->
  • <cfloop
  • index="i"
  • from="1"
  • to="10"
  • step="1">
  •  
  • <cffile
  • action="append"
  • file="#filePath#"
  • output="This is line #i# in this file."
  • addnewline="true"
  • />
  •  
  • </cfloop>
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • If you are not on ColdFusion 8 yet, you can still read files
  • in a line at a time by dipping down into the Java layer.
  • Behind the scenes, ColdFusion is probably using some sort of
  • buffered file reader, so we can do the same explicitly.
  •  
  • To create the line number reader, we have to pass it a Reader
  • object, which will be a buffered reader for performance reasons.
  • --->
  • <cfset lineReader = createObject( "java", "java.io.LineNumberReader" ).init(
  • createObject( "java", "java.io.BufferedReader" ).init(
  • createObject( "java", "java.io.FileReader" ).init(
  • javaCast( "string", filePath )
  • )
  • )
  • ) />
  •  
  • <!---
  • Mark the beginning of the stream so we can reset the position
  • of the reader if we need to.
  •  
  • NOTE: You typically won't need this - I just need to do this so
  • I can demonstrate two file reads without creating a new line
  • number reader object.
  • --->
  • <cfset lineReader.mark(
  • javaCast( "int", 999999 )
  • ) />
  •  
  •  
  • <cfoutput>
  •  
  •  
  • <!---
  • Now, let's read the file in a line at a time. As we use the
  • readLine(), it will return a NULL when it gets to the end of
  • the file. When that happens, the variable we are using to
  • read the line will be deleted.
  • --->
  • <cfset line = lineReader.readLine() />
  •  
  • <!---
  • Check to make sure we didn't hit the end of the file (which
  • will return NULL, which will delete our variable).
  • --->
  • <cfloop condition="structKeyExists( variables, 'line' )">
  •  
  • Line: #line#<br />
  •  
  • <!--- Read the next line. --->
  • <cfset line = lineReader.readLine() />
  •  
  • </cfloop>
  •  
  •  
  • <br />
  •  
  •  
  • <!---
  • Reset the line number reader to the beginning of input
  • stream for next demo.
  • --->
  • <cfset lineReader.reset() />
  •  
  • <!---
  • We can also use the buffered line reader to read in chunks
  • of the file as we did with the CFLoop tag. This is a bit
  • more complicated as we need to read the character data into
  • a character array.
  • --->
  •  
  • <!---
  • Create a character array of length 50 for our read buffer
  • (we will be reading in a max of 50 characters at any time).
  •  
  • NOTE: It doesn't matter what the initial values are at this
  • point since our line number reader will overwrite the data.
  • --->
  • <cfset buffer = listToArray( repeatString( " ,", 50 ) ) />
  •  
  • <!---
  • Cast the ColdFusion array (collection) to a typed Java array
  • so that we can use it with the line number reader.
  • --->
  • <cfset buffer = javaCast( "char[]", buffer ) />
  •  
  •  
  • <!---
  • Read the file data into the buffer and record the number
  • of characters that were read.
  • --->
  • <cfset charCount = lineReader.read(
  • buffer,
  • javaCast( "int", 0 ),
  • javaCast( "int", arrayLen( buffer ) )
  • ) />
  •  
  • <!---
  • Keep looping while characters were read-in. When the line
  • reader hits the end of the file, it will return -1 for the
  • character count.
  • --->
  • <cfloop condition="(charCount neq -1)">
  •  
  • <!---
  • Output the chunk. When we do this, we want to convert
  • the buffer to a string and then just take out what's
  • needed.
  • --->
  • <cfset chunk = mid(
  • arrayToList( buffer, "" ),
  • 1,
  • charCount
  • ) />
  •  
  • 50 Char Chunk: #chunk#<br />
  •  
  • <!---
  • Read the next chunk of character data from the file
  • into the buffer and record the number of characters
  • that were read.
  • --->
  • <cfset charCount = lineReader.read(
  • buffer,
  • javaCast( "int", 0 ),
  • javaCast( "int", arrayLen( buffer ) )
  • ) />
  •  
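  • </cfloop>
  •  
  • <!--- Close the reader to release the underlying file handle. --->
  • <cfset lineReader.close() />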
  •  
  •  
  • </cfoutput>

Again, the first part of the demo simply creates and populates the file that we are going to be reading. Once that is done, I then use the readLine() method for by-line parsing and the read() method for by-characters parsing. While the readLine() method is fairly straightforward, the read() method requires us to use a strongly-typed character array buffer which, as you can see, greatly increases the complexity of the code.
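
For comparison, here is roughly what the read() loop looks like in plain Java, where the typed character array is native. This is a sketch rather than a drop-in replacement: `ChunkDemo` and `readChunks` are hypothetical names, and a StringReader stands in for the file. The key detail is the same one handled by mid() in the ColdFusion version: the buffer is reused, so only the first `count` characters of each read are valid.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative, not from the post.
class ChunkDemo {

    // Read the source in chunks of at most `size` characters.
    public static List<String> readChunks(Reader source, int size) {
        List<String> chunks = new ArrayList<>();
        char[] buffer = new char[size]; // one buffer, reused for every read

        try (LineNumberReader reader = new LineNumberReader(source)) {
            int count;
            // read() returns the number of characters copied, or -1 at the end.
            while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
                // Only the first `count` characters are fresh; the rest of the
                // buffer may still hold data left over from the previous read.
                chunks.add(new String(buffer, 0, count));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Eight characters read three at a time leaves a short final chunk.
        System.out.println(readChunks(new StringReader("abcdefgh"), 3)); // prints [abc, def, gh]
    }
}
```

Without the count, the last chunk here would come out as "ghf", because the "f" from the previous read is still sitting in the third slot of the buffer. Note also that a read is allowed to return fewer characters than requested before the end of the input, which is one more reason to always use the returned count.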

When we run the above ColdFusion and Java code, we get the following page output:

Line: This is line 1 in this file.
Line: This is line 2 in this file.
Line: This is line 3 in this file.
Line: This is line 4 in this file.
Line: This is line 5 in this file.
Line: This is line 6 in this file.
Line: This is line 7 in this file.
Line: This is line 8 in this file.
Line: This is line 9 in this file.
Line: This is line 10 in this file.

50 Char Chunk: This is line 1 in this file. This is line 2 in thi
50 Char Chunk: s file. This is line 3 in this file. This is line
50 Char Chunk: 4 in this file. This is line 5 in this file. This
50 Char Chunk: is line 6 in this file. This is line 7 in this fil
50 Char Chunk: e. This is line 8 in this file. This is line 9 in
50 Char Chunk: this file. This is line 10 in this file.

As you can see, this output is exactly the same as the output generated by the CFLoop-only demonstration.

In the above code, you'll notice that the LineNumberReader wraps a BufferedReader instance. (LineNumberReader itself extends BufferedReader, so the extra wrapper is not strictly required.) It's the buffering that really makes this approach (and most likely the CFLoop approach) so efficient. I don't want to talk too much about how buffered readers work, as I'm not really a Java developer; but they read character data from disk in larger blocks, which minimizes disk I/O while keeping only a small portion of the file in memory at any one time.
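
To make the buffering point a little more concrete, here is a small Java sketch, with the hypothetical names `StreamingDemo`, `isBuffered`, and `countLines`. It checks that LineNumberReader is a kind of BufferedReader, then walks an input one line at a time without ever holding the whole thing in memory. A StringReader stands in for a large file here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative, not from the post.
class StreamingDemo {

    // True because LineNumberReader is declared as a subclass of BufferedReader.
    public static boolean isBuffered() {
        return BufferedReader.class.isAssignableFrom(LineNumberReader.class);
    }

    // Walk the input one line at a time. Only the current line (plus the
    // reader's internal buffer) is held in memory at any point.
    public static int countLines(Reader source) {
        try (LineNumberReader reader = new LineNumberReader(source)) {
            while (reader.readLine() != null) {
                // A real import would process the current line here.
            }
            // The reader has been counting the lines as they were read.
            return reader.getLineNumber();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isBuffered());                                       // prints true
        System.out.println(countLines(new StringReader("one\ntwo\nthree\n"))); // prints 3
    }
}
```

For a real file, the same method could be handed `new FileReader(path)` directly; since LineNumberReader already buffers, no additional BufferedReader would be needed in between.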

The CFLoop tag is really one of the most amazing tags in ColdFusion. Between for-loops, query-loops, array-loops, list-loops, file-loops, and conditional-loops there's very little that the CFLoop tag can't do. It makes looping so easy, in fact, that you probably never even think about how much work this ColdFusion tag is actually abstracting. It's like they say - Great design should be invisible. Anyway, I hope this helps clarify how this part of the CFLoop tag works.




Reader Comments

Sep 15, 2010 at 12:04 PM // reply »
14 Comments

Hey Ben, I just got a quick thought while reading your post, what about reading the whole file into memory (using cffile action=read) and then we use list or maybe better regular expression to split the content string into chunks that we need. For the first case, they can be split into list items using the newline as delimiter. For the second case, we can use string manipulation to get 50 chars at a time (or regexp somehow, gotta think more)

Do you think that will work? Or would it be a lot slower? Like I said, I just thought of it and haven't tried it yet. Well, I'm lying on my bed with my iPad right now, so I can't test it out :-)


Sep 15, 2010 at 12:25 PM // reply »
49 Comments

The problem with "reading the whole file into memory" is that you're reading the whole file into memory.

Depending on the file you're working with, this might be a difference between occupying 50 bytes of memory or occupying 5,000,000 bytes of memory.


Sep 15, 2010 at 12:26 PM // reply »
49 Comments

(As the opening line of this blog post says: "someone asked about reading in files that were too big to fit in the allocated RAM on the JVM")


Sep 15, 2010 at 1:00 PM // reply »
11,246 Comments

@Vinh,

If you can read the file into memory at one time, it will definitely be faster - the buffering is going to add overhead to the performance. I have definitely seen people deal with CSV files in this way - reading in the file, then converting it to an array using listToArray() in which the line delimiters are used as list-delimiters.

If you cannot read a file into memory, however, the buffered reader approach is going to be slower, but critical for overall performance (ie. not eating up all your RAM and causing overflow problems).

@Peter,

This just made me laugh out loud:

>> "The problem with "reading the whole file into memory" is that you're reading the whole file into memory."

Ha ha :)


Sep 15, 2010 at 1:55 PM // reply »
46 Comments

Ben,

How would you then extract just a portion of the content in a file, and then do a line by line read on it?

Im having to kind of do what Vinh is talking about. Extracting a portion, without reading in the whole file.


Sep 15, 2010 at 2:22 PM // reply »
42 Comments

I thought you'd include a benchmark between cfloop and Java's lineReader. :)


Sep 15, 2010 at 6:16 PM // reply »
14 Comments

@Peter, @Ben hahaha, sorry, stupid me, look like I missed the point of the article :-) I always read the whole file in. I think my applications are just not big enough that I encounter that overflow scenario :-p

Thanks for the article Ben. Next time my app hangs when it reads a file, I know why and will think of you :-p


Sep 15, 2010 at 6:20 PM // reply »
1 Comments

I've been working with this in a Java application this week. For working with large XML files, check out the XML Pull Parser project: http://www.extreme.indiana.edu/xgws/xsoap/xpp/

I haven't worked with this in Coldfusion, but it encapsulates streaming large XML files, provides methods for XPath... the JVM usage went from serious heap overflows to just a blip on the memory.


Sep 15, 2010 at 6:35 PM // reply »
11,246 Comments

@Matthew,

I suppose you could just call readLine() a given number of times until you get to the portion you want. How do you identify the portion you are targeting?

@Henry,

Yeah, good point. My server was hemming and hawing this morning so I didn't have a lot of time to do the most in-depth exploration.

@Vinh,

No problem at all :)

@Robert,

Looks very interesting. I've played around a bit with large XML files and I know there are a number of event-driven parsers; I've not had great success using them inside of ColdFusion. I'll take a look at this one as it seems to be doing something a bit different.


Sep 15, 2010 at 8:56 PM // reply »
46 Comments

@Ben,

currently im reading in the file and just doing an indexOf or reFind to get start and end positions but i then use mid() to get the portion.

I could do the same going through line by line and when it finds x to start getting the data, and when it finds y it stops. ill try it out and see if its any faster.


Sep 15, 2010 at 9:27 PM // reply »
14 Comments

@Matthew,

If I understand your method correctly, you use indexOf() to find the position of the newline character (or whatever char), mark it as the start, then find the next position of the same char, mark it as end and then use mid() to get the portion. Is that correct? If so, why not use list with the newline (or whatever char) as delimiter?


Sep 15, 2010 at 11:01 PM // reply »
11,246 Comments

@Matthew,

It's probably gonna be faster just using the Mid() approach since you're only doing the search once.

@Vinh,

You make a good point. Converting the list to an array using line delimiters and then using the array index is probably gonna be quite fast.


Sep 16, 2010 at 10:58 AM // reply »
110 Comments

I had this problem last week (reading in a large file) and found the cfloop solution to work nicely. I ran into a timeout issue then though with the cfloop. I managed to solve this by adding my own timer to the page that checks how long the import has been running (read in a line, quick time check, read in a line, time check, etc). Once it hit a preset time, it breaks out of the loop, pushes the user back to the "import" page, I run a javascript redirect (thus CF is now taking a break :) ) which pushes the user back to the CF page to continue the import at whatever line the import left off at. I set the CF Admin timeout to 10 seconds, and my time check to 8 seconds and it was still running (correctly) after 15 minutes. I just make sure to show the user a message to let them know what is happening, but it was a quick way to get around the timeout issues.


Sep 16, 2010 at 11:23 AM // reply »
11,246 Comments

@Gareth,

Wow - checking the execution time throughout the page - there's something kind of brilliant about that. I never thought of doing that.


Sep 16, 2010 at 4:43 PM // reply »
110 Comments

@Ben,
Thanks! It came down to necessity at the time. It's crazy what you come up with when it's late, you're running out of time and need to have a solution :) I had tried at first to use the method you described in a blog post a while back about extending the timeout so you can perform different tasks, but depending on the file size, this didn't seem feasible (and I'm sure would've caused fits on the server). The only way I could see getting around the timeout was to leave the CF execution and pass the reins back to the browser, then start the CF process up again. As long as you know the timeout in your administrator and set the timer to less than that value, it should run nicely.


Sep 17, 2010 at 10:54 PM // reply »
11,246 Comments

@Gareth,

Yeah, I've had trouble with my previous suggestion as well. In fact, when I recently changed it, I started getting a *ton* of error emails. At first, I thought maybe I was just getting new errors; what I think was happening though, was that my previous concept was just failing and the emails were never getting sent.

I like your style, good sir.


Sep 18, 2010 at 12:26 PM // reply »
8 Comments

If you are looking for specific line numbers, you could use this CFC I wrote a while back:
http://filebyline.riaforge.org/

Its pretty flexible - some methods: getLine, setLine, insertLine, deleteLine...


Nov 30, 2012 at 8:42 AM // reply »
3 Comments

I have a file of 50 rows in pipe (|) delimited format. From each line I want the data in the 17th pipe-delimited column. How can I change that 17th column value and store it back in the same line of the same file?

can you pls post the sample codes for the above


Dec 3, 2012 at 3:06 AM // reply »
3 Comments

How do I read one line from an input file, do some processing on it, and write the resulting data back to the same line in the same file?
Suppose the line contains delimited data and I want to change the value at the 17th delimiter; how do I write the modified value back to the same position (the 17th delimiter) in the same line of the same file in ColdFusion?


