Ask Ben: Reading In A File Using CFFile And CFInclude
I have seen on some message boards people say to use cfinclude to read in files. I use cffile with action="read" to read in files right now. Is there a better way to do it? How can you read in a file using cfinclude?
NOTE: Charlie Arehart has pointed out serious flaws in my logic below, regarding GetPageContext().Include(). Please see my other blog post for more information.
Yes, you can read in files using the ColdFusion CFInclude tag rather than the ColdFusion CFFile tag. However, I would not suggest doing it this way. CFFile was designed to read in files. It's nice and efficient. And, it can read in text files as well as binary files, something that the CFInclude tag cannot do.
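As a minimal sketch of the CFFile approach (the file names data.txt and logo.gif are hypothetical, chosen only for illustration), reading text and binary content might look something like this:

```cfml
<!--- Read a text file into a string variable. --->
<cffile
	action="read"
	file="#ExpandPath( './data.txt' )#"
	variable="strFileData"
	/>

<!--- Read a binary file into a binary (byte array) variable. --->
<cffile
	action="readbinary"
	file="#ExpandPath( './logo.gif' )#"
	variable="binFileData"
	/>
```

The file attribute expects an absolute path, which is why ExpandPath() is used to resolve the relative path here.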
When people read in files using CFInclude, they have to store the included data into a content buffer using the CFSaveContent tag:
<!--- Read file contents into strFileData variable. --->
<cfsavecontent variable="strFileData">

	<!--- Read in the contents of the file. --->
	<cfinclude template="./data.txt" />

</cfsavecontent>

<!--- Trim data. --->
<cfset strFileData = Trim( strFileData ) />
When doing it this way, there are a few issues. For starters, this is not very readable, at least not to me. Second, the CFInclude tag actually parses the included document. Most of the time you don't want to parse the document, as this can have HUGE processing overhead; you just want to read it in. And, finally, we have to trim the content since we are adding white space during the "reading" process. If the original data has leading or trailing white space, this will get lost.
To overcome the last two points above, we can use smarter white space control and the Include() method of the page context. The Include() method includes the file without parsing it:
<!--- Read file contents into strFileData variable. --->
<cfsavecontent variable="strFileData"
	<!--- Read in the contents of the file. --->
	><cfset GetPageContext().Include( "./data.txt" ) /></cfsavecontent>
Notice that we are not leaving any white space between our tags. This, to me, is still hard to read.
But, readability is one thing, let's test speed. I have created a rather large text document with 3.8 MILLION characters. I read this in using the CFFile tag as well as the CFSaveContent tag and outputted the character length:
<!--- Test the CFFile speed. --->
<cftimer label="CFFile" type="outline">

	<!--- Read the file into strXmlData. --->
	<cffile
		action="READ"
		file="#ExpandPath( './cffile_data.txt' )#"
		variable="strXmlData"
		/>

	<!--- Output the data character length. --->
	File Length: <cfset WriteOutput( strXmlData.Length() ) />

</cftimer>

<cftimer label="CFInclude" type="outline">

	<!--- Read the file into strXmlData. --->
	<cfsavecontent variable="strXmlData"
		<!--- Read in the file. --->
		><cfset GetPageContext().Include( "./cffile_data.txt" ) /></cfsavecontent>

	<!--- Output the data character length. --->
	File Length: <cfset WriteOutput( strXmlData.Length() ) />

</cftimer>
The CFFile tag was faster than the CFSaveContent tag method on every test by at least 50 ms. Of course, keep in mind that this is a fairly large file (3.6 megabytes). Not only was CFFile faster, it was much more consistent. It always ran at about 63 ms - 78 ms. The CFSaveContent method, on the other hand, was very sporadic. Sometimes it was as low as 125 ms. Other times, it would spike to as much as 625 ms.
This makes a lot of sense to me. CFFile was built to read file data into a variable. If you go the CFSaveContent route, you are most likely taking more steps and creating additional buffers, which means more processing, which means more room for bottlenecks. You are basically asking ColdFusion to work in a way that I would consider different from what it was designed to do.
Ok, so if it's slower and inconsistent, why would people use CFSaveContent over CFFile? To me, the main reason is purely flexibility. When you use the CFSaveContent tag you can read multiple files into one variable. Think about including sub-XML documents. You could create the parent XML node and then include sub-XML data files, all inside one CFSaveContent tag:
<!--- Read in site map. --->
<cfsavecontent variable="strXmlData">

	<sections>
		<cfinclude template="./section_a.xml" />
		<cfinclude template="./section_b.xml" />
		<cfinclude template="./section_c.xml" />
		<cfinclude template="./section_d.xml" />
	</sections>

</cfsavecontent>
This would incur all of the overhead we talked about before, but it can make combining multi-part documents very easy to do and maintain.
That being said, if you are just reading in a single file to use the data, please please please use the ColdFusion CFFile tag. Don't go getting all complicated for no reason.
This is interesting. I had never thought to run performance tests. I'm not sure that I think the readability is a major issue.
I wonder what the performance difference is for very small files. I would think cfinclude and cfsavecontent would win in that case.
Keep in mind that some hosts have disabled cffile, making the cfinclude approach indispensable.
For small files, there is no difference. I didn't notice any speed difference till I got above a couple HUNDRED THOUSAND lines of text. For small files or even relatively large files, I doubt there is going to be any noticeable difference between the two methodologies.
And, you raise a most excellent point; on a shared server where CFFile has been disabled, this is a most useful alternative. Plus, as I mentioned, it is quite nicely flexible. I use the CFSaveContent methodology to build both my Google Sitemaps and my URL redirection (404 handling).
As for readability, I am mildly retarded and I have trouble reading lots of stuff :)
Just one note on the small files and performance: they both performed at 0 ms. However, the CFSaveContent methodology would be slightly inconsistent and jump from 0 ms to 16 ms occasionally, whereas CFFile was a consistent 0 ms. This is consistent with what I found for large files as well. I think there are more processing steps with CFSaveContent (intermediary buffers and whatnot) and this just leaves some more room for variability.
Hi, Ben. Another couple of reasons to use this are when CFFILE is restricted (as it may be in the CF Admin), or even when the Java approach to reading a file is also restricted (as can be done when CFOBJECT/createobject are restricted).
Indeed, I may hold at least some responsibility for a recent increase in people mentioning this approach, as I offered it in a couple prominent places recently.
First, I offered it (along with the other ways to read a file) as my tips column in the first issue of the FusionAuthority Quarterly Update. Then I also offered this specific CFINCLUDE approach (again, for when the other approaches are restricted) as the answer/question for the CF Weekly Podcast guys to use in their "CFQuiz" segment a few weeks ago.
In all of them, I should add that I certainly wasn't proposing it as preferable over the more traditional approach (and certainly not for a large file). It was more just to say that it's there if one needed it.
Interesting comment about using getpagecontext include, though I should clarify that it does indeed parse the file just like a regular CFINCLUDE. Try it. The difference, though (and another tip), is that it processes the file like a full page request, in that it runs any application.cfm/cfc (unlike when a page is run via CFINCLUDE). Still, it could have some advantage over a plain include when reading a non-CFM file, so it's worth considering.
If it's not obvious, I was writing my note at the same time Steve was writing his, and posted it before reading his which made a similar point. :-)
No worries about showing people this method. Certainly, no one can argue with the benefits of having more tools in the old tool belt right?
I stand corrected about the whole GetPageContext().Include() points. I was under the impression that it did not parse the page. I was also not aware that it invoked the Application.cfm/cfc as well. This is good information to have. Thank you very much. I totally believe you, but I still have to test this as I like to see it in action ;)
Thanks to you and Steve for filling in all the gaps that I did not point out (and for the corrections to my logic).
I have a problem which might be related to this post.
I am currently trying to write a program which reads in HTML files and then processes them. I have a "<cffile action=read ... " which works fine when I run it on my CF 5 server. However, as soon as I push it to my CF 7 server, it breaks with a "500 null" error. The HTML file I am trying to read is pretty huge, like 150 MB, but nevertheless, CF 5 has no problems, so why should CF 7?
CF7 can deal with reading smaller-sized HTML, but gives up for larger ones.
Also, this is not the first time I have seen the "500 null" on this CF 7 server. It occurred when I was trying to generate 5000 PDFs using CFDOCUMENT.
Any help here is appreciated
Do the ColdFusion logs or stack trace offer any advice? Does it say what line it's breaking on? Sometimes, when I get a 500 null, it's because the page is breaking before any content has been flushed, and rather than pushing the error to the client, it goofs up and just displays a 500 null error. Try putting a CFFlush before you do the CFFile read and see if it gives you a better error message.
I would assert that your errors that worked in 5 but fail in 7 are due to a known issue in CF 7 (though not really discussed much until CF 8 was said to have fixed it.) The good news is that there is a fix, and/or also an Admin setting that may be your issue.
First, to clarify, in CF 7 (may have been in CF 6, too), file uploads get loaded not just to disk but into CF's memory, and as such, your 500 may be that you're running out of memory. The problem is that CF wouldn't release the memory after the page finished. Yikes!
So a couple of solutions were introduced. One is a set of CF admin settings (referred to as the throttle settings at the bottom of the settings page) where one can set (and there are defaults for) the maximum size of a file that can be uploaded. You may be hitting that max.
And Ben's right, you ought to be able to see this in the logs. If not the regular CF logs (which you can see in the Admin) then the "runtime" or "out" logs as they're called, found not in the [cf]/logs directory, but in the [cf]/runtime/logs instead (or [jrun]/logs if you're running CF in multiserver mode.)
Beyond that, as far as the problem of large files taking up memory and not releasing it, there was a hotfix introduced for CF 7.02 (http://www.adobe.com/go/kb401239). Even those who have applied all cumulative hotfixes to 7 still need to apply this manually. (I'm not aware of a backport of the fix to CF 6, if anyone would wonder.)
And to be clear, the problem isn't really the CFFILE ACTION="upload", but rather the very act of doing an input type="file" that posts to a CF Page. Some think the CFFILE action="upload" tag is the key here, but really all it does is move the file once uploaded from a temp directory to the DESTINATION you name for it. (More at http://www.carehart.org/blog/client/index.cfm/2006/5/7/cfform_not_doing_upload.)
Hope that helps. Let us know.
Hi Ben and Charlie,
Sadly, I don't have access to the production server logs. So I ran my cfm file on the CF 8 developer edition. CF 8 crashed with the following error on screen. Really sorry for this long error messing up the layout of the blog. I am still trying to get in touch with the people who can provide me the CF 7 server logs, but I have a feeling the root cause there too would be the "out of memory" exception.
What am I trying to do?? :
Using CFFILE to read a large HTM file (160 MB) and store it in a local variable. (the exception occurs here)
Work Arounds I have tried:
1) Buffered Reader and File Reader using <CFobject>. Same Error.
2) Using CFhttp to hit the HTM file store its source in local variable.
It was finally decided to do this file parsing in Perl. But I am determined to find a feasible work-around for this in CF
Thanks for your time and help !
java.lang.OutOfMemoryError: Java heap space
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
@Vivek, since you mention wanting to run this ultimately on CF8, there's a solution there which is not available in CF 7: CFLOOP's new support for looping over a FILE either per line or per character.
Where the CFFILE ACTION=read reads the whole file in, CFLOOP with the FILE attribute lets you point to a file and then use either INDEX="line" to pull it in one line at a time, or use INDEX="chars" (and the corresponding CHARACTERS attribute) to pull in a set number of characters at a time.
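To make that concrete, here is a rough sketch of both forms of the CF8 file loop. The file name large_file.htm and the chunk size of 4096 are assumptions for illustration; the INDEX attribute simply names the variable that receives each line or chunk, so any variable name should work there.

```cfml
<!--- Hypothetical path to the large file. --->
<cfset strFilePath = ExpandPath( "./large_file.htm" ) />

<!--- Read the file one line at a time. --->
<cfset intLineCount = 0 />

<cfloop file="#strFilePath#" index="strLine">

	<!--- Process strLine here; this sketch only counts the lines. --->
	<cfset intLineCount = (intLineCount + 1) />

</cfloop>

<!--- Or, read the file in fixed-size chunks of characters. --->
<cfloop file="#strFilePath#" index="strChunk" characters="4096">

	<!--- Process strChunk here. --->

</cfloop>

<cfoutput>Lines read: #intLineCount#</cfoutput>
```

Because only the current line or chunk is held in the loop variable, this avoids building one giant string, as long as the loop body does not accumulate the data itself.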
Let us know if that solves the problem. (I just noticed this was not in the CF8 reference, so I just added a comment there.)
Because you are getting an out of memory error, there's no way you can read in the entire file at any given time. Like Charlie is saying, you can read it piece-wise, line by line.
What are you ultimately trying to do with the HTML file?
It seems like a pretty straightforward issue but it's got me totally stumped. I guess you guys can help me figure it out:
I have a webform where users upload a .TXT file that gets saved to the server. Once the file is uploaded, I wanna read it using the <cffile action = "read"..> tag and parse out the data.
I am able to parse the data when I provide a static Filename to <cffile file = ".."> attribute. However, since end users are going to be uploading the files, I have no way of knowing the name of the file and hence cannot hard code the file= attribute value.
I have tried using a whole load of combinations - with and without the # sign- but it simply doesn't work. Do any of you guys know what I am missing?
After the file uploads via CFFILE, there should be a CFFILE structure that has the new file name as it exists on the server:
You can use this in your next CFFILE command:
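The original snippets for this reply did not survive, so the following is only a sketch of the idea rather than the exact code that was posted. It assumes a form field named fileUpload and an existing uploads directory, both hypothetical, and uses the ServerDirectory and ServerFile keys that the upload action populates in the CFFILE structure:

```cfml
<!--- Upload the file ("fileUpload" is a hypothetical form field name). --->
<cffile
	action="upload"
	filefield="fileUpload"
	destination="#ExpandPath( './uploads/' )#"
	nameconflict="makeunique"
	/>

<!--- Read it back using the server-side name reported by the upload. --->
<cffile
	action="read"
	file="#CFFILE.ServerDirectory#/#CFFILE.ServerFile#"
	variable="strFileData"
	/>
```

With nameconflict="makeunique", the saved name may differ from the name the user chose, which is another reason to read the name from the CFFILE structure rather than guess it.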
Does that make sense?
Ben to the rescue.....again. Thanks for the quick reply!
I could have sworn I tried that one.... well, maybe I did not try it after all because it works now!
Thanks again, Cheers!
Glad to help out. Glad it worked!
I am reading a .TXT file and parsing out the data that is tab-delimited. I am using the cffile action = "read" tag to do this. Is there a way for me to just loop from the 2nd row of data?
The first row of data is the column headers which of course I am not concerned about because they are always the same in every file.
The <cfloop index="index" list="#var#" delimiters="#chr(10)##chr(13)#"> won't allow me to use the 'startrow' attribute with the other attributes I have.
Am I stuck at something too straight forward?
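Can you just ignore the first line within your loop? Or, you could simply delete the first row from the text with a ListDeleteAt( var, 1, Chr(13) & Chr(10) ) before you execute the CFLoop. Note that the line-break delimiters need to be passed in; without them, ListDeleteAt() treats the text as a comma-delimited list.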
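A minimal sketch of the "ignore the first line within your loop" option might look like the following. The file name data.txt and the variable names are assumptions for illustration:

```cfml
<!--- Read the tab-delimited file (hypothetical file name). --->
<cffile
	action="read"
	file="#ExpandPath( './data.txt' )#"
	variable="strFileData"
	/>

<!--- Treat carriage returns and line feeds as row delimiters. --->
<cfset strNewLine = (Chr( 13 ) & Chr( 10 )) />
<cfset intRow = 0 />

<cfloop index="strRow" list="#strFileData#" delimiters="#strNewLine#">

	<cfset intRow = (intRow + 1) />

	<!--- Skip the header row. --->
	<cfif (intRow GT 1)>

		<!--- Split the row on tabs. --->
		<cfset arrFields = ListToArray( strRow, Chr( 9 ) ) />

	</cfif>

</cfloop>
```

One caveat: ColdFusion list functions skip empty elements by default, so blank lines are ignored and consecutive tabs (empty fields) collapse. If empty fields matter, the row needs a different splitting strategy.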
I did end up writing a loop to ignore the first line. Since I did not want to modify the original data files, I think skipping the first line will work for now.
Ok cool, sounds good.
Can anyone suggest a way to write a file other than the cffile tag? On my server, access to these tags has been restricted for security reasons. Kindly advise me.
Most likely, if they have disabled CFFile, they have also disabled CreateObject(), CFInvoke, and CFObject, which might have helped you read in files.
Let me try moving to another server that supports the tags I need.
I really appreciate the quick response.
Ben, do you know the maximum size of file that CFFILE can read into one variable? I have a big text file which is 350 K with over 250 K records. It seems that ColdFusion refuses to read the file into a variable because of the size. (I am using the variable as a temp file, then parsing it into fields. Not successful.)
I used this way for smaller text files previously and it worked great.
Is the page breaking on the file read? Or the subsequent parsing into fields? It should be able to read the file in, no problem - string data doesn't take up too much RAM. String manipulation, however, depending on how you are doing it, might be the culprit.
Hello, first off, Ben you are my hero.
And now my question, I was trying to read a file, and parse/save data into a db, I ended up uploading it to the server then reading it, to achieve this, but is there a simple way of just reading and saving without the uploading part?
I am sorry, I am not sure what you are asking? Are you trying to read an uploaded file without saving it to the server's file system?
It's a shame that you can't use the "delimiter" attribute when cfloop-ing over a file. We were having timeout issues reading in a very large file, and it doesn't use the new line as the delimiter. I ended up chopping the file into smaller files when I created it, and just read in each small file.
@Robert, while you can't use DELIMITER when CFLoop'ing over a file when using the new CF8 FILE attribute (you must use Index instead), you can in fact do what you want. In fact, it's what people did before CF8 added the FILE and index="line" attributes.
Instead, you want to use CFFILE Action="read" to get the file into a variable, and then use CFLOOP LIST, where you CAN use the DELIMITER. People used to use the CHR function to point to the code for a CRLF. You could use that or whatever you need to specify your desired delimiter. There are examples all over to help, but here's one: http://www.learn-coldfusion-tutorial.com/Files.cfm
Does that address your need?
Thanks for the reply. Yep, I tried cffile read but the file is about 150mb, so I crash the JVM with an out-of-memory exception. I was hoping that with cfloop file I could use that as a file pointer and not have to read the entire file into memory.
Hello Ben, thank you for your reply. Please allow me to clarify.
I have a local file that the user is browsing to and selecting that I wanted to parse and save in DB. I'm uploading the file to the server then reading and parsing it, and was wondering if I could avoid uploading it and just parse/save it directly. I have something like this:
<cffile action="upload" filefield="fileUpload" destination="D:\upload" nameconflict="overwrite">
<cffile action="read" file="#filepath#" variable="myfile">
I think you can get the binary data from the form post... but I don't believe I have ever done that myself. I think you start to add a lot of complexity in parsing the form delimiters if you want to handle this manually. Once you upload the file, it gets saved, regardless, to the temp directory. I believe, if you want, you can get the TMP file path from the form field itself, if you just want to read the file in that way?
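As a rough sketch of that last idea, and assuming the form field is named fileUpload as in the snippet above, the value of the form field on the server side should be the path to the temporary file, which could then be read directly. I have not verified this across ColdFusion versions, so treat it as something to test rather than a guarantee:

```cfml
<!--- "fileUpload" is the name of the type="file" form field. --->
<cfif (StructKeyExists( form, "fileUpload" ) AND Len( form.fileUpload ))>

	<!--- The form field value holds the path to the temporary file. --->
	<cffile
		action="read"
		file="#form.fileUpload#"
		variable="myfile"
		/>

	<cfoutput>Characters read: #Len( myfile )#</cfoutput>

</cfif>
```

The file still gets uploaded and written to the temp directory; this only skips the step of moving it to a permanent destination.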
Is there a way to capture the file name when reading the content of that file using cffile?
If you have the full path to a file, you should be able to strip off the actual file name using getFileFromPath():
getFileFromPath( "/your/full/path/file.txt" )
... returns "file.txt". Is that what you mean?
Thanks for the quick reply.
That was the idea. The getFileFromPath returns a "52648.tmp" file name which is generated by the server, not the actual file name. I'm not sure how to extract the actual file name when the user uploads the file from their local machine.
That's odd. Typically, you only see .tmp file names when you are dealing with CFFile[upload] actions. I am not sure what process you have in place that is creating tmp files. Once it gets saved on the server, however, I don't think there's any way to get the original file name - you might have to augment whatever process saves the file in the first place.
main.com.cfm is a dynamic file, would cffile be able to run the code, or would I need to do it this way?
If "main.com.cfm" contains ColdFusion code that you want to run, you would need to execute it via CFInclude; CFFile[read] will only load the actual text content of the file, not run it.
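To illustrate the difference, here is a small sketch that captures the executed output in one variable and the raw source in another. The variable names are assumptions; the file name comes from the question:

```cfml
<!--- Execute the template and capture whatever it outputs. --->
<cfsavecontent variable="strRendered"><cfinclude template="./main.com.cfm" /></cfsavecontent>

<!--- Load the raw, unexecuted source text of the same template. --->
<cffile
	action="read"
	file="#ExpandPath( './main.com.cfm' )#"
	variable="strSource"
	/>
```

After this runs, strRendered holds the generated output while strSource holds the CFML code itself, tags and all.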
Just to throw some more "don't use the cfinclude method" fire.
With the CF9 admin setting "Cache Template In Request" you may get old results, because as you say, the content of the file is parsed.
<cfloop from="1" to="10" index="i">
	<cffile action="write" file="#Expandpath("./test.txt")#" output="#CreateUUID()#">
	<cfoutput><cfinclude template="test.txt"><br /></cfoutput>
</cfloop>
You will get the same UUID output each time.
The situation gets much worse if you have Trusted Cache enabled in any version of ColdFusion.
@grumpy (Dave?), I'd like to offer a couple of thoughts if it may help readers here.
First, I would just point out that that behavior you're seeing (when "Cache Template In Request" is enabled) is really to be expected. Indeed, it's the very point of the new (in 9) option: to tell CF NOT to check if a referenced file has changed after checking it the first time it's used within a request.
Second, just to be technically accurate, it's not about the file being "parsed", though. It's just about CF looking to see if the update date of the file has changed. With this new setting, it does it only once.
Third, as for the "trusted cache", while you call it "worse", that's just really another dimension to this same issue. With that enabled, CF only checks for a referenced file's update date the first time the file is used (and loaded into the template cache), and it does not check it again (until CF restarts, or that template cache is cleared, or that file alone has been removed from the template cache, whether by aging out or by use of the Admin API cleartrustedcache method and its new feature in CF8 to name a specific file/s.)
So the new feature in 9 is kind of a compromise solution: it allows CF to pick up a change to a file between different requests, but it stops CF checking for the file update WITHIN a single request.
I wouldn't see either of these as bad things, just different, and things that we do want to understand. And it's great when they come up in blogs like this, or in mailing lists, as a chance to bring them to people's attention again, so thanks for bringing them up.
And yes, as you say, the bottom line is that it is indeed one more reason one should not use CFINCLUDE to read in files, when there are many other options, and they aren't susceptible to these issues as you note. :-)
@carehart Nope, not Dave. :-)
Yep, "parsed" was the wrong way to describe that. Thanks for clarifying it for others.
And yeah, "Cache Template In Request" is a major improvement for CF9 and "Trusted Cache" can be very effective as long as, as you say, you understand them and know how they will affect your code.
@GrumpyCFer, yep, though to your last point, about how using "trusted cache" might "affect your code", it really would never affect your code. Maybe you meant that it might "affect how one's code is executed", as it does indeed impact that.
I realize that's a bit pedantic, but we don't want any readers to come away with the wrong impression, especially on a subject like this, about which many are generally unaware in the first place. :-)
PS The reason I wondered if you might be a "Dave" is that, as many know, there is someone who goes by a similar nickname (dfgrumpy), Dave Ferguson, of cfhour.com fame. I didn't know if perhaps he sometimes might use your grumpycfer handle. Too many "grumps in da house". :-)
Yeah, that is a bit pedantic. I can't imagine anyone would presume that means that their code will be re-authored by cfadmin by changing a boolean value. I'm not sure how it would be read otherwise, but hopefully others will find the distinction useful.
@GrumpyCFer, that may surprise you (what you "can't imagine"), but it really would not surprise me at all. The things people think and assume about CF (wrongly, to their peril) stun me.
This comes from sincere experience, not just casual observation. All I do, all day, every day, is help people solve CF problems, whether on lists, forums, blogs like this, etc., or in my independent troubleshooting consulting, both newcomers and those with years of experience.
So, trust me, I've seen a lot that has me more careful. And yes, words matter and clarifications are often required, maybe not for everyone, but almost certainly for someone. Can it be carried too far? Certainly. Have I done so here? Perhaps.
But at what point does one get the benefit of the doubt, where their suggestion is deemed at least reasonable if it may help someone, even if we may not agree it needs to be said? That's a rhetorical question. Don't want to drag it out.
Again, my main point was to agree that this is a subject (template caching) on which most are lost and confused, so I do thank you for adding to the discussion as you did earlier.