ColdFusion CFHttp To Query Much Faster Than Java Buffered Reader

Posted August 8, 2006 at 2:53 PM by Ben Nadel

Tags: ColdFusion

In response to some posts on CF-Talk that have discussed parsing large files, I thought I would do some investigation of my own. I have never parsed in a large file, so this was new an exciting. I was going to try two different approaches: Using the CFHttp to read in a file to a query and using the Java Buffered File Reader to read in a line of a file at a time and build a query based on the records. From my other experiences, I had assumed that the Java method was going to clean the floor with the CFHttp tag in terms of speed.

I was absolutely floored by the speed of CFHttp! Not only did it parse data files like no body's business, it did it so much faster than the Java method. When the test cases got too big, the Java method would often timeout.

NOTE: I am not a great Java programmer, so my methods could have been horrible.

I started building up a data file by repeating these data lines:

Cox Christina Blonde 5'5" 135.0 Muscular Yes
Rodruigez Michelle Black 5'3" 125.0 Muscular Dog Yes
Otting Franics Blonde 5'4" 140.0 Muscular Cat Yes
Turgot Ayse Blonde 5'2" 125.0 Curvey Yes
Clarke Molly Burnette 5'4" 122.0 Lean Cat Yes
Skerret Laura Brunette 5'4" 130.0 Lean Rabbit Yes
Parker Sara Blonde 5'2" 105.0 Skinny No
Vivenzio Sarah Brunette 5'3" 130.0 Curvey Cat Yes
Matzukata Yuu Black 5'4" 120.0 Curvey Yes

... where each field was separated by tab characters (not demonstrated in the above text). I then used the CFHttp tag to read in the file:

  • <cfhttp
  • url="http://...../read_large_files/test.txt"
  • method="GET"
  • name="qGirl"
  • columns="last_name,first_name,hair_color,height,weight,body_type,pet,has_nice_smile"
  • delimiter=" "
  • textqualifier=""
  • firstrowasheaders="no"
  • />
  •  
  • <!--- Display the number of records read in. --->
  • Read in #NumberFormat( qGirl.RecordCount )# records

As you can see, the data file was made up of 8 distinct column: last_name, first_name, hair_color, height, weight, body_type, pet, and has_nice_smile. The delimiter was a tab character. If you don't realize that the textqualifier attribute defaults to double-quotes, you are gonna pull your hair out trying to figure out why it can't find all the columns (and throws errors) if you use quotes in your field values!

To test my Java theory, I set up this test that read in a single line at a time:

  • <!--- Create the query. --->
  • <cfset qGirl = QueryNew( "last_name, first_name, hair_color, height, weight, body_type, pet, has_nice_smile" ) />
  •  
  • <!--- Create the file reader. --->
  • <cfset jobjReader =
  • CreateObject( "java", "java.io.BufferedReader" ).Init(
  • CreateObject( "java", "java.io.FileReader" ).Init(
  • CreateObject( "java", "java.io.File" ).Init(
  • ExpandPath( "./test.txt" )
  • )
  • )
  • ) />
  •  
  • <!--- Create a variable for the line. --->
  • <cfset REQUEST.Line = jobjReader.ReadLine() />
  •  
  • <!--- Set up counter (for shorthand) />
  • <cfset intCounter = 1 />
  •  
  • <!--- Loop while we have a line. --->
  • <cfloop condition="StructKeyExists( REQUEST, 'Line' )">
  •  
  • <!--- Add a record to the query. --->
  • <cfset QueryAddRow( qGirl, 100000 ) />
  •  
  • <!--- Split the line up into an array. --->
  • <cfset arrFields = REQUEST.Line.Split( " " ) />
  •  
  • <!--- Set field values. --->
  • <cfset qGirl[ "last_name" ][ intCounter ] = arrFields[ 1 ] />
  • <cfset qGirl[ "first_name" ][ intCounter ] = arrFields[ 2 ] />
  • <cfset qGirl[ "hair_color" ][ intCounter ] = arrFields[ 3 ] />
  • <cfset qGirl[ "height" ][ intCounter ] = arrFields[ 4 ] />
  • <cfset qGirl[ "weight" ][ intCounter ] = arrFields[ 5 ] />
  • <cfset qGirl[ "body_type" ][ intCounter ] = arrFields[ 6 ] />
  • <cfset qGirl[ "pet" ][ intCounter ] = arrFields[ 7 ] />
  • <cfset qGirl[ "has_nice_smile" ][ intCounter ] = arrFields[ 8 ] />
  •  
  • <!--- Read in the next line. --->
  • <cfset REQUEST.Line = jobjReader.ReadLine() />
  •  
  • <!--- Update the counter. --->
  • <cfset intCounter = (intCounter + 1) />
  •  
  • </cfloop>
  •  
  • <!--- Display the number of records read in. --->
  • Read in #NumberFormat( qGirl.RecordCount )# records

In this test, I am creating a buffered file reader that is reading in one line at a time. For each line, it breaks it up into an array and adds those values to the query.

When I ran the test on 490,000 rows of data, CFHttp executed in about 25 seconds! I can't give you Java file buffer speeds as the page kept timing out. I must be doing something horribly wrong in my methodology as Java is what is really going on underneath any way. I chalk that up to my inexperience.

But, here are some stats for CFHttp:

490,000 rows in 25 seconds. 20 megabyte file.
790,000 rows in 44 seconds. 37 megabyte file.
950,000 rows in 97 seconds. 45 megabyte file.

This is crazy! I am floored at how fast that is. So the question is, why are people still having problems reading in files. For starters, this only works with structured query-esque data. You can't read in an XML file object like this. Secondly, it requires a URL to work, so if you need to read a file that is outside the webroot, you are out of luck. That would be the benefit of the Java methodology (though clearly one better than the one I put together), you can read in files from anywhere in the file system.



Reader Comments

Jan 21, 2008 at 2:44 PM // reply »
17 Comments

Do you think it was the reading of the file,
or cfaddquery?

what is the difference between the java buffered file reader and
java.io.FileReader?


Jan 23, 2008 at 10:01 AM // reply »
11,246 Comments

@Steve,

To be honest, I am not sure what is making it faster. I assume that the parsing in the ColdFusion engine is just much faster and more efficient than anything that I was doing.

As far as the different for the file IO and the buffered file reader, the Buffered File read is actually a decorator that wraps around the file IO object and adds additionally functionality, like reading in large chunks at a time rather than one huge read or a ton of smaller reads. It just allows for more efficient file reading when you want to examine parts of the file content at a time.


Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 23, 2013 at 9:52 PM
Preventing Links In Standalone iPhone Applications From Opening In Mobile Safari
@Muhmmadibn Did you figure out a solution to launching PDFs? I am running into the same issues myself. There is no way to close the PDF or go back once you launch it. Thanks in advance! ... read »
May 23, 2013 at 6:06 PM
The Girl Who Broke My Heart, And Made Me A Better Person
Good day,ladies and gentle men, my name is Dr AMADI the great spell caster in Africa, i have help so many people for different kind of problems,who say there is no solution to problems on earth, that ... read »
May 23, 2013 at 4:26 PM
ColdFusion QueryAppend( qOne, qTwo )
@Heather, Glad people are still getting value out of this! ... read »
May 23, 2013 at 3:49 PM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@WebManWalking, I meant the code at the bottom (not the video). I did try to experiment with an intermediary variable, like: value = users.id[ i ]; arrayContains( userIDs, value ); ... but t ... read »
May 23, 2013 at 11:06 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Ben, Are you talking about As Number: YES As String: YES As Java: YES? If so, that's with 3 different ways of referencing the constant 1, not users.id[1]. Query object references(*) are what seem ... read »
May 23, 2013 at 9:55 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Dan, According to the CF Admin, I'm running Java "1.6.0_45". As far as the DB column, in the database it's an INT. I'll see if I can dig into what CF sees it as. @WebManWalking, But h ... read »
May 23, 2013 at 9:49 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Ben, I think the problem is that we're used to loose typing in ColdFusion, like JavaScript. If a value is a number but it's needed in an expression to be a string, noooo problem. I've encountered ... read »
May 23, 2013 at 9:47 AM
ColdFusion QueryAppend( qOne, qTwo )
You rock! Thank you, thank you, thank you!!! ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools