As always, I am trying to learn more about the Java libraries that live underneath the surface of ColdFusion MX 7. One class caught my eye: LineNumberReader. This is a utility that reads a file one line at a time while keeping track of the line number of the data it has just read. It considers a line to be terminated by \n, \r, or \r followed immediately by \n.
Now, when would you use something like this? Why not just read in an entire file and split it on line breaks? It's all about memory usage. Reading an entire large file into RAM can slow the machine down greatly. You might even run out of memory, and I am sure bad things happen at that point. By using a line reader, you hold only a small piece of the file in memory at a time. It's not going to be as fast as reading in the whole file and breaking it up, but it's going to be a lot nicer on the overall system.
Let's take a look at an example.
<!--- Create the line reader. --->
<cfset objLineReader = CreateObject( "java", "java.io.LineNumberReader" ).Init(
	<!--- Buffered Reader. --->
	CreateObject( "java", "java.io.BufferedReader" ).Init(
		<!--- File reader. --->
		CreateObject( "java", "java.io.FileReader" ).Init(
			<!--- File path. --->
			JavaCast( "string", ExpandPath( "./data.txt" ) )
			)
		)
	) />

<!--- Get first line. --->
<cfset REQUEST.LineData = objLineReader.ReadLine() />

<!--- Loop while we still have line data. --->
<cfloop condition="StructKeyExists( REQUEST, 'LineData' )">

	<!--- Get the line number. --->
	<cfset intLineNumber = objLineReader.GetLineNumber() />

	<!--- Output line data. --->
	#intLineNumber#) #REQUEST.LineData#<br />

	<!--- Read the next line. --->
	<cfset REQUEST.LineData = objLineReader.ReadLine() />

</cfloop>
As you can see, at the center of it all, we are creating a FileReader instance. This is going to get the actual data from the file we specify (in this case, "data.txt"). Then, we wrap the FileReader in a BufferedReader. The buffered reader makes data retrieval from the file much more efficient by bulk loading file data then passing back bits of pre-read data. It only goes back to the file itself when it runs out of loaded data to return. Then, we wrap the BufferedReader in the LineNumberReader.
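For comparison, here is roughly what that same reader chain looks like in plain Java. This is only a sketch: the class name ReaderChainSketch and the helper numberLines() are made up for illustration, and a StringReader stands in for the FileReader so the example runs without a data.txt on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class ReaderChainSketch {

    // Reads every line from the source and prefixes it with its line number.
    static List<String> numberLines(Reader source) {
        List<String> numbered = new ArrayList<>();
        // Same nesting as the ColdFusion code:
        // source reader -> BufferedReader -> LineNumberReader.
        try (LineNumberReader reader =
                 new LineNumberReader(new BufferedReader(source))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // getLineNumber() reports the number of the line just read.
                numbered.add(reader.getLineNumber() + ") " + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return numbered;
    }

    public static void main(String[] args) {
        // The sample text mixes \n, \r\n and \r line terminators on purpose.
        Reader source = new StringReader("alpha\nbeta\r\ngamma\rdelta");
        for (String entry : numberLines(source)) {
            System.out.println(entry);
        }
    }
}
```

To read from a real file, you would pass in a FileReader for the file path instead of the StringReader.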
Looping over the lines in the data can be a bit confusing if you don't understand how ColdFusion handles NULL values passed back from Java. In the example above, you will see that we read the first line into a REQUEST-scoped variable, LineData. We then keep reading lines until the key "LineData" no longer exists in the REQUEST scope. This might seem very odd, but it is how a lot of readers will work in ColdFusion (such as reading in ZIP entries from a ZipInputStream). The LineNumberReader returns NULL from ReadLine() once there are no more lines to read. Since ColdFusion doesn't have a NULL data type, it represents a NULL value by destroying the variable reference itself. So, the loop keeps reading data, then hits a NULL, and as a result, the variable "LineData" is stripped right out of the REQUEST scope.
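On the Java side, that NULL is easy to see directly. The sketch below (the class name NullSentinelSketch and the method readUntilNull() are hypothetical) loops until readLine() returns null. Note that a blank line comes back as an empty string rather than null, so an empty line in the middle of a file should not end the loop early.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class NullSentinelSketch {

    // Collects lines until readLine() signals end of input with null.
    static List<String> readUntilNull(Reader source) {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            // Read the first line, then loop while we still have line data,
            // mirroring the structure of the ColdFusion loop.
            String lineData = reader.readLine();
            while (lineData != null) {
                lines.add(lineData);
                lineData = reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The second line is blank; it is returned as "" and not as null.
        List<String> lines = readUntilNull(new StringReader("first\n\nthird"));
        System.out.println(lines.size() + " lines read");
    }
}
```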
So, nothing special here, just a little example of how something like that will work.
You should wrap this baby up into a simple UDF and submit it to cflib. :)
Which part would be in the UDF? The creating of the line number reader object? What were you envisioning?
So, I am attempting to use this method for importing a file that is about 1MB and is just over 34,000 lines and I run out of memory every time. Any advice? If you'd like to see my code, you can check it out here:
I am not sure that forcing garbage collection actually does anything while a single request is still processing. I don't think it can, because the garbage collector can't be sure that a given value isn't going to be referred to later in the code. I think the request needs to finish executing before GC works as you intend it to (just a theory).
That said, if the file is only 1MB, you really shouldn't be running out of memory! That's really not a large file. What happens if you simply read in the entire file at one time and then break it up by line break:
<cfset arrLines = ListToArray( fileData, "#Chr( 13 )##Chr( 10 )#" )>
That might not run out of memory? Or, are you already having memory issues?
I see this post is almost 3 years old but all the same, I'm wondering why you nested the BufferedReader inside the LineNumberReader; isn't LineNumberReader an extension of BufferedReader? It seems like it's an extra step. Perhaps it's just preference, but wouldn't something like this be just as good, if not a tad more efficient:
a = createObject('java', 'java.io.FileReader').init('file_n_path.name');
b = createObject('java', 'java.io.LineNumberReader').init(a);
I think you might be correct. Nice call!
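For anyone who wants to check this, java.io.LineNumberReader does extend java.io.BufferedReader, so it already buffers its input. Here is a small Java sketch of the shorter form suggested above (the class name DirectWrapSketch and its two helpers are made up for illustration, and a StringReader stands in for the FileReader):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class DirectWrapSketch {

    // True when LineNumberReader is a subclass of BufferedReader.
    static boolean lineNumberReaderIsBuffered() {
        return BufferedReader.class.isAssignableFrom(LineNumberReader.class);
    }

    // Wraps the source directly, with no BufferedReader in between.
    static List<String> readDirect(Reader source) {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(reader.getLineNumber() + ") " + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Buffered: " + lineNumberReaderIsBuffered());
        for (String entry : readDirect(new StringReader("one\ntwo"))) {
            System.out.println(entry);
        }
    }
}
```

The extra BufferedReader in the original post is harmless, it just adds a second layer of buffering that the LineNumberReader does not need.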
I get this error (CF 8).. Any tips?
Object Instantiation Exception.
An exception occurred when instantiating a Java object. The class must not be an interface or an abstract class. Error: ''.
1 : <!--- File path. --->
22 : JavaCast( "string", ExpandPath( "./data.txt" ) )