Ask Ben: Reading In A File Using CFFile And CFInclude
Posted October 30, 2006 at 6:18 PM
I have seen people on some message boards say to use cfinclude to read in files. I currently use cffile with action="read" to read in files. Is there a better way to do it? How can you read in a file using cfinclude?
NOTE: Charlie Arehart has pointed out serious flaws in my logic below, regarding GetPageContext().Include(). Please see my other blog post for more information.
Yes, you can read in files using the ColdFusion CFInclude tag rather than the ColdFusion CFFile tag. However, I would not suggest doing it this way. CFFile was designed to read in files, and it's nice and efficient. Plus, it can read in binary files as well as text files, something that the CFInclude tag cannot do.
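For comparison, here is a minimal sketch of both CFFile read modes. The file names data.txt and logo.png are hypothetical; point them at your own files.

```cfml
<!--- Text mode: the contents land in a string. data.txt is a hypothetical file. --->
<cffile
	action="read"
	file="#ExpandPath( './data.txt' )#"
	variable="strFileData"
	/>

<!--- Binary mode: the contents land in a binary (byte array) variable. logo.png is hypothetical. --->
<cffile
	action="readbinary"
	file="#ExpandPath( './logo.png' )#"
	variable="binImageData"
	/>

<!--- Len() works on both strings and binary objects. --->
<cfoutput>
	Text: #Len( strFileData )# characters<br />
	Binary: #Len( binImageData )# bytes
</cfoutput>
```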
When people read in files using CFInclude, they have to store the included data into a content buffer using the CFSaveContent tag:
<!--- Read file contents into strFileData variable. --->
<cfsavecontent variable="strFileData">

	<!--- Read in the contents of the file. --->
	<cfinclude template="./data.txt" />

</cfsavecontent>

<!--- Trim data. --->
<cfset strFileData = Trim( strFileData ) />
When doing it this way, there are a few issues. For starters, it is not very readable, at least not to me. Second, the CFInclude tag actually parses the included document. Most of the time you don't want to parse the document, since parsing can add HUGE processing overhead; you just want to read it in. And finally, we have to trim the content, since we are adding white space during the "reading" process. If the original data has leading or trailing white space, that white space will get lost as well.
To overcome the last two points above, we can use smarter white space control and the Include() method of the page context. I was under the impression that the Include() method includes the file without parsing it (see the note at the top of this post):
<!--- Read file contents into strFileData variable. --->
<cfsavecontent variable="strFileData"

	<!--- Read in the contents of the file. --->
	><cfset GetPageContext().Include( "./data.txt" )

/></cfsavecontent>
Notice that we are not leaving any white space between our tags. This, to me, is still hard to read.
But readability is one thing; let's test speed. I created a rather large text document with 3.8 MILLION characters. I read it in using both the CFFile tag and the CFSaveContent tag, and output the character length each time:
<!--- Test the CFFile speed. --->
<cftimer label="CFFile" type="outline">

	<!--- Read the file into strXmlData. --->
	<cffile
		action="READ"
		file="#ExpandPath( './cffile_data.txt' )#"
		variable="strXmlData"
		/>

	<!--- Output the data character length. --->
	File Length: <cfset WriteOutput( Len( strXmlData ) ) />

</cftimer>


<!--- Test the CFSaveContent / Include() speed. --->
<cftimer label="CFInclude" type="outline">

	<!--- Read the file into strXmlData. --->
	<cfsavecontent variable="strXmlData"

		<!--- Read in the file. --->
		><cfset GetPageContext().Include( "./cffile_data.txt" )

	/></cfsavecontent>

	<!--- Output the data character length. --->
	File Length: <cfset WriteOutput( Len( strXmlData ) ) />

</cftimer>
The CFFile tag was faster than the CFSaveContent method on every test by at least 50 ms. Of course, keep in mind that this is a fairly large file (3.6 megabytes). Not only was CFFile faster, it was also much more consistent: it always ran in about 63 - 78 ms. The CFSaveContent method, on the other hand, was erratic. Sometimes it was as low as 125 ms; other times it would spike to as much as 625 ms.
This makes a lot of sense to me. CFFile was built to read file data into a variable. If you go the CFSaveContent route, you are most likely taking more steps and creating additional buffers, which means more processing, which means more room for bottlenecks. You are basically asking ColdFusion to work differently from the way it was designed to.
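CFTimer generally only reports its numbers when debugging output (with timer information) is enabled in the ColdFusion Administrator. If you want to repeat the comparison without that, here is a minimal sketch that times both approaches with GetTickCount() (milliseconds) over several runs, since single runs varied so much. It assumes the same cffile_data.txt test file sits next to the page.

```cfml
<!--- Same large test file as above (assumed to sit next to this page). --->
<cfset strFilePath = ExpandPath( "./cffile_data.txt" ) />

<cfloop index="intRun" from="1" to="5">

	<!--- Time the CFFile read. --->
	<cfset intStart = GetTickCount() />
	<cffile action="read" file="#strFilePath#" variable="strFileData" />
	<cfset intFileMS = (GetTickCount() - intStart) />

	<!--- Time the CFSaveContent + Include() read, using the same white space trick. --->
	<cfset intStart = GetTickCount() />
	<cfsavecontent variable="strIncludeData"
		><cfset GetPageContext().Include( "./cffile_data.txt" )
	/></cfsavecontent>
	<cfset intIncludeMS = (GetTickCount() - intStart) />

	<cfoutput>
		Run #intRun#:
		CFFile #intFileMS# ms (#Len( strFileData )# chars),
		Include #intIncludeMS# ms (#Len( strIncludeData )# chars)<br />
	</cfoutput>

</cfloop>
```

Looking at the spread across runs, rather than a single number, is what exposes the inconsistency described above.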
OK, so if it's slower and less consistent, why would people use CFSaveContent over CFFile? To me, the main reason is flexibility. With the CFSaveContent tag, you can read multiple files into one variable. Think about including sub-XML documents: you could create the parent XML node and then include the sub-XML data files, all inside one CFSaveContent tag:
<!--- Read in site map. --->
<cfsavecontent variable="strXmlData">
	<sections>
		<cfinclude template="./section_a.xml" />
		<cfinclude template="./section_b.xml" />
		<cfinclude template="./section_c.xml" />
		<cfinclude template="./section_d.xml" />
	</sections>
</cfsavecontent>
This would incur all of the overhead we talked about before, but it can make combining multi-part documents very easy to do and maintain.
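If you want the multi-part convenience without having CFInclude parse every fragment, one option is to read each fragment with CFFile and stitch the parts together yourself. This is a minimal sketch, assuming the same four section files as above and assuming they are bare XML fragments with no <?xml ?> declarations (which would break the combined document).

```cfml
<!--- Same hypothetical section files as in the CFSaveContent example. --->
<cfset lstSections = "section_a.xml,section_b.xml,section_c.xml,section_d.xml" />

<!--- Collect the parts in an array and join them once at the end. --->
<cfset arrParts = ArrayNew( 1 ) />
<cfset ArrayAppend( arrParts, "<sections>" ) />

<cfloop index="strSection" list="#lstSections#">
	<!--- CFFile reads the raw text; nothing in the fragment is executed. --->
	<cffile
		action="read"
		file="#ExpandPath( './' & strSection )#"
		variable="strPart"
		/>
	<cfset ArrayAppend( arrParts, strPart ) />
</cfloop>

<cfset ArrayAppend( arrParts, "</sections>" ) />

<!--- Join with line feeds; white space between XML nodes is harmless. --->
<cfset strXmlData = ArrayToList( arrParts, Chr( 10 ) ) />

<!--- Parsing confirms the combined document is well-formed. --->
<cfset xmlSiteMap = XmlParse( strXmlData ) />
```

The trade-off is a few more lines of code in exchange for treating the fragments strictly as data.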
That being said, if you are just reading in a single file to use the data, please please please use the ColdFusion CFFile tag. Don't go getting all complicated for no reason.
Reader Comments
Ben,
This is interesting. I had never thought to run performance tests. I'm not sure that I think the readability is a major issue.
I wonder what the performance difference is for very small files. I would think cfinclude and cfsavecontent would win in that case.
Keep in mind that some hosts have disabled cffile, making the cfinclude approach indispensable.
Steve,
For small files, there is no difference. I didn't notice any speed difference until I got above a couple HUNDRED THOUSAND lines of text. For small or even relatively large files, I doubt there will be any noticeable difference between the two methodologies.
And, you raise a most excellent point; on a shared server where CFFile has been disabled, this is a most useful alternative. Plus, as I mentioned, it is quite nicely flexible. I use the CFSaveContent methodology to build both my Google Sitemaps and my URL redirection (404 handling).
As for readability, I am mildly retarded and I have trouble reading lots of stuff :)
Steve,
Just one note on small files and performance: both methods ran at 0 ms. However, the CFSaveContent methodology was slightly inconsistent and occasionally jumped from 0 ms to 16 ms, whereas CFFile was a consistent 0 ms. This matches what I found for large files as well. I think there are more processing steps with CFSaveContent (intermediary buffers and whatnot), and this just leaves more room for variability.
Hi, Ben. Another couple of reasons to use this are when CFFILE is restricted (as it may be in the CF Admin), or even when the Java approach to reading a file is also restricted (as can be done when CFOBJECT/createobject are restricted).
Indeed, I may hold at least some responsibility for a recent increase in people mentioning this approach, as I offered it in a couple prominent places recently.
First, I offered it (along with the other ways to read a file) in my tips column in the first issue of the FusionAuthority Quarterly Update. Then I also offered this specific CFINCLUDE approach (again, for when the other approaches are restricted) as the answer/question for the CF Weekly Podcast guys to use in their "CFQuiz" segment a few weeks ago.
In all of them, I should add that I certainly wasn't proposing it as preferable over the more traditional approach (and certainly not for a large file). It was more just to say that it's there if one needed it.
Interesting comment about using GetPageContext().Include(), though I should clarify that it does indeed parse the file just like a regular CFINCLUDE. Try it. The difference, though (and another tip), is that it processes the file like a full page request, in that it runs any Application.cfm/cfc (unlike when a page is run via CFINCLUDE). Still, it could have some advantage over a plain include when reading a non-CFM file, so it's worth considering.
If it's not obvious, I was writing my note at the same time Steve was writing his, and posted it before reading his which made a similar point. :-)
Charlie,
No worries about showing people this method. Certainly, no one can argue with the benefits of having more tools in the old tool belt right?
I stand corrected about the whole GetPageContext().Include() point. I was under the impression that it did not parse the page. I was also not aware that it invoked the Application.cfm/cfc as well. This is good information to have. Thank you very much. I totally believe you, but I still have to test this, as I like to see it in action ;)
Thanks to you and Steve for filling in all the gaps that I did not point out (and for the corrections to my logic).
Hi Ben,
I have a problem which might be related to this post.
I am currently trying to write a program which reads in HTML files and then processes them. I have a "<cffile action=read ... " which works fine when I run it on my CF 5 server. However, as soon as I push it to my CF 7 server, it breaks with a "500 null" error. The HTML file I am trying to read is pretty huge (around 150 MB), but CF 5 has no problems with it, so why should CF 7?
CF7 can deal with reading smaller-sized HTML, but gives up for larger ones.
Also, this is not the first time I have seen the "500 null" on this CF 7 server. It occurred when I was trying to generate 5000 PDFs using CFDOCUMENT.
Any help here is appreciated.
@Vivek,
Do the ColdFusion logs or stack trace offer any advice? Do they say what line it's breaking on? Sometimes, when I get a 500 null, it's because the page is breaking before any content has been flushed, and rather than pushing the error to the client, ColdFusion goofs up and just displays a 500 null error. Try putting a CFFlush before you do the CFFile read and see if it gives you a better error message.
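A minimal sketch of that suggestion, assuming a hypothetical big_file.htm next to the page. One caveat: a java.lang.OutOfMemoryError is a Java Error rather than an Exception, so CFCatch may not be able to catch it at all; the flush at least gets some output to the browser first.

```cfml
<!--- Send something to the browser first so a later failure isn't a bare "500 null". --->
Starting file read...
<cfflush />

<cftry>
	<!--- big_file.htm is a hypothetical path; use your own. --->
	<cffile
		action="read"
		file="#ExpandPath( './big_file.htm' )#"
		variable="strHtmlData"
		/>
	<cfoutput>Read #Len( strHtmlData )# characters.</cfoutput>

	<cfcatch type="any">
		<!--- Show whatever ColdFusion reports about the failure. --->
		<cfdump var="#cfcatch#" />
	</cfcatch>
</cftry>
```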
@Vivek,
I would assert that your errors (code that worked in 5 but fails in 7) are due to a known issue in CF 7 (though it wasn't really discussed much until CF 8 was said to have fixed it). The good news is that there is a fix, and there is also an Admin setting that may be your issue.
First, to clarify, in CF 7 (may have been in CF 6, too), file uploads get loaded not just to disk but into CF's memory, and as such, your 500 may be that you're running out of memory. The problem is that CF wouldn't release the memory after the page finished. Yikes!
So a couple of solutions were introduced. One is a set of CF admin settings (referred to as the throttle settings at the bottom of the settings page) where one can set (and there are defaults for) the maximum size of a file that can be uploaded. You may be hitting that max.
And Ben's right, you ought to be able to see this in the logs. If not the regular CF logs (which you can see in the Admin) then the "runtime" or "out" logs as they're called, found not in the [cf]/logs directory, but in the [cf]/runtime/logs instead (or [jrun]/logs if you're running CF in multiserver mode.)
Beyond that, as far as the problem of large files taking up memory and not releasing it, there was a hotfix introduced for CF 7.02 (http://www.adobe.com/go/kb401239). Even those who have applied all cumulative hotfixes to 7 still need to apply this manually. (I'm not aware of a backport of the fix to CF 6, if anyone would wonder.)
And to be clear, the problem isn't really the CFFILE ACTION="upload", but rather the very act of doing an input type="file" that posts to a CF Page. Some think the CFFILE action="upload" tag is the key here, but really all it does is move the file once uploaded from a temp directory to the DESTINATION you name for it. (More at http://www.carehart.org/blog/client/index.cfm/2006/5/7/cfform_not_doing_upload.)
Hope that helps. Let us know.
Hi Ben and Charlie,
Sadly, I don't have access to the production server logs, so I ran my cfm file on the CF 8 developer edition. CF 8 crashed with the following error on screen. Really sorry for this long error messing up the layout of the blog. I am still trying to get in touch with the people who can provide me the CF 7 server logs, but I have a feeling the root cause there would also be the "out of memory" exception.
What am I trying to do?
Using CFFILE to read a large HTM file (160 MB) and store it in a local variable. (The exception occurs here.)
Workarounds I have tried:
1) BufferedReader and FileReader using <CFobject>. Same error.
2) Using CFHTTP to request the HTM file and store its source in a local variable. Same error.
It was finally decided to do this file parsing in Perl. But I am determined to find a feasible workaround for this in CF.
Thanks for your time and help !
-----------------------------------------------------
500
ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at java.lang.StringBuffer.toString(StringBuffer.java:585)
at coldfusion.tagext.io.FileUtils.readFile(FileUtils.java:174)
at coldfusion.tagext.io.FileTag.read(FileTag.java:363)
at coldfusion.tagext.io.FileTag.doStartTag(FileTag.java:264)
at coldfusion.runtime.CfJspPage._emptyTcfTag(CfJspPage.java:2661)
at cfsplitUK2ecfm1567663972.runPage(C:\Inetpub\wwwroot\gyan\splitUK.cfm:22)
at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:196)
at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:370)
at coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65)
at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:279)
at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
at coldfusion.filter.PathFilter.invoke(PathFilter.java:86)
at coldfusion.filter.LicenseFilter.invoke(LicenseFilter.java:27)
at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
at coldfusion.filter.BrowserDebugFilter.invoke(BrowserDebugFilter.java:74)
at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
at coldfusion.CfmServlet.service(CfmServlet.java:175)
at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
[same frames as the ROOT CAUSE trace above, from java.util.Arrays.copyOfRange through jrun.servlet.JRunInvokerChain.invokeNext]
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:70)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
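If the BufferedReader attempt still appended each line to one big string, it would run into the same heap limit as CFFile. For CF 7, where the CF 8 file functions don't exist, here is a minimal sketch that processes each line and then lets it go. It assumes CreateObject is permitted on the server and uses a hypothetical big_file.htm; FileReader uses the platform's default character set. It relies on the fact that assigning a Java null (readLine() at end of file) leaves the ColdFusion variable undefined, and deletes the variable before each read so a stale value can't linger.

```cfml
<!--- Open the file with java.io classes (CreateObject must be allowed). Hypothetical path. --->
<cfset objFileReader = CreateObject( "java", "java.io.FileReader" ).Init(
	JavaCast( "string", ExpandPath( "./big_file.htm" ) )
	) />
<cfset objReader = CreateObject( "java", "java.io.BufferedReader" ).Init( objFileReader ) />

<cfset intLineCount = 0 />

<cftry>
	<cfloop condition="true">
		<!--- Clear the old line, then read; a null at end of file leaves strLine undefined. --->
		<cfset StructDelete( VARIABLES, "strLine" ) />
		<cfset strLine = objReader.ReadLine() />
		<cfif NOT StructKeyExists( VARIABLES, "strLine" )>
			<cfbreak />
		</cfif>

		<!--- Process strLine here; do NOT append it to a growing string. --->
		<cfset intLineCount = (intLineCount + 1) />
	</cfloop>

	<cfcatch type="any">
		<!--- Release the file handle before passing the error along. --->
		<cfset objReader.Close() />
		<cfrethrow />
	</cfcatch>
</cftry>

<cfset objReader.Close() />
<cfoutput>Processed #intLineCount# lines.</cfoutput>
```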
@Vivek, since you mention wanting to run this ultimately on CF8, there's a solution there which is not available in CF 7: CFLOOP's new support for looping over a FILE either per line or per character.
Where CFFILE ACTION="read" reads the whole file in, CFLOOP with the FILE attribute lets you point to a file and then pull it in one line at a time (the INDEX attribute names the variable that receives each line), or add the CHARACTERS attribute to pull in a set number of characters at a time.
Let us know if that solves the problem. (I just noticed this was not in the CF8 reference, so I just added a comment there.)
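Here's a minimal CF 8 sketch of both CFLOOP forms, assuming a hypothetical big_file.htm. The INDEX value is just the name of the variable that receives each line or chunk, and the FILE attribute takes an absolute path, hence ExpandPath().

```cfml
<!--- Hypothetical file; CFLOOP FILE needs an absolute path. --->
<cfset strFilePath = ExpandPath( "./big_file.htm" ) />

<!--- One line per iteration; only the current line is held in memory. --->
<cfset intLineCount = 0 />
<cfloop file="#strFilePath#" index="strLine">
	<!--- Process strLine here. --->
	<cfset intLineCount = (intLineCount + 1) />
</cfloop>

<!--- Alternatively, a fixed number of characters per iteration. --->
<cfset intChunkCount = 0 />
<cfloop file="#strFilePath#" index="strChunk" characters="10000">
	<!--- Process strChunk here. --->
	<cfset intChunkCount = (intChunkCount + 1) />
</cfloop>

<cfoutput>
	Lines: #intLineCount#<br />
	10,000-character chunks: #intChunkCount#
</cfoutput>
```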
@Vivek,
Because you are getting an out of memory error, there's no way you can read in the entire file at once. Like Charlie is saying, you can read it piece-wise, line by line.
What are you ultimately trying to do with the HTML file?
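Another CF 8 way to read piece-wise is the file-handle functions: FileOpen(), FileIsEOF(), FileReadLine(), and FileClose(). A minimal sketch, with a hypothetical path and a placeholder processing step that just counts lines mentioning a table:

```cfml
<!--- Open a read handle on the (hypothetical) file. --->
<cfset objFile = FileOpen( ExpandPath( "./big_file.htm" ), "read" ) />
<cfset intMatches = 0 />

<cftry>
	<cfloop condition="NOT FileIsEOF( objFile )">
		<cfset strLine = FileReadLine( objFile ) />

		<!--- Placeholder processing: count lines that mention a table. --->
		<cfif FindNoCase( "<table", strLine )>
			<cfset intMatches = (intMatches + 1) />
		</cfif>
	</cfloop>

	<cfcatch type="any">
		<!--- Close the handle before passing the error along. --->
		<cfset FileClose( objFile ) />
		<cfrethrow />
	</cfcatch>
</cftry>

<cfset FileClose( objFile ) />
<cfoutput>Lines mentioning a table: #intMatches#</cfoutput>
```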
It seems like a pretty straightforward issue, but it's got me totally stumped. I guess you guys can help me figure it out:
I have a webform where users upload a .TXT file that gets saved to the server. Once the file is uploaded, I want to read it using the <cffile action = "read"..> tag and parse out the data.
I am able to parse the data when I provide a static Filename to <cffile file = ".."> attribute. However, since end users are going to be uploading the files, I have no way of knowing the name of the file and hence cannot hard code the file= attribute value.
I have tried using a whole load of combinations - with and without the # sign- but it simply doesn't work. Do any of you guys know what I am missing?
Thanks!
@Brian,
After the file uploads via CFFILE, there should be a CFFILE structure that has the new file name as it exists on the server:
CFFILE.ServerFile
You can use this in your next CFFILE command:
<cffile
variable="strFileData"
action="read"
file="#yourUploadDirectory##CFFILE.ServerFile#"
/>
Does that make sense?
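If you'd rather capture the upload details in your own variable, CF 7 and later support a RESULT attribute on CFFILE. Here is a minimal sketch of the whole upload-then-read flow, assuming a form field named dataFile and an uploads directory next to the page (both hypothetical). With nameconflict="makeunique", the server may rename the file, which is exactly why you read ServerFile rather than the original client file name.

```cfml
<!--- Hypothetical upload directory. --->
<cfset strUploadDirectory = ExpandPath( "./uploads/" ) />

<!--- Save the uploaded file; details go into the uploadResult struct. --->
<cffile
	action="upload"
	filefield="dataFile"
	destination="#strUploadDirectory#"
	nameconflict="makeunique"
	result="uploadResult"
	/>

<!--- ServerDirectory + ServerFile is where the file actually ended up. --->
<cffile
	action="read"
	file="#uploadResult.ServerDirectory#/#uploadResult.ServerFile#"
	variable="strFileData"
	/>

<cfoutput>Read #Len( strFileData )# characters from #uploadResult.ServerFile#.</cfoutput>
```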
Ben to the rescue.....again. Thanks for the quick reply!
I could have sworn I tried that one.... well, maybe I did not try it after all because it works now!
Thanks again, Cheers!
@Brian,
Glad to help out. Glad it worked!
I am reading a .TXT file and parsing out the data that is tab-delimited. I am using the cffile action = "read" tag to do this. Is there a way for me to start looping from the 2nd row of data?
The first row of data is the column headers which of course I am not concerned about because they are always the same in every file.
The <cfloop index="index" list="#var#" delimiters="#chr(10)##chr(13)#"> won't allow me to use the 'startrow' attribute with the other attributes I have.
Am I stuck on something too straightforward?
@UMTerp,
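Can you just ignore the first line within your loop? Or, you could simply delete the first row from the text with ListDeleteAt( var, 1, Chr( 13 ) & Chr( 10 ) ) before you execute the CFLoop; be sure to pass the line delimiters, since ListDeleteAt() defaults to a comma.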
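Here's a minimal sketch of skipping the header row without modifying the file, using ListRest(), which returns everything after the first list element. The path is hypothetical. Keep in mind that ColdFusion list functions ignore empty elements, so blank lines (and empty tab-delimited fields) are skipped.

```cfml
<!--- Hypothetical path to the tab-delimited file. --->
<cffile action="read" file="#ExpandPath( './data.txt' )#" variable="strFileData" />

<!--- Treat both CR and LF as line delimiters. --->
<cfset strLineDelims = Chr( 13 ) & Chr( 10 ) />

<!--- Drop the first line (the column headers); the file itself is untouched. --->
<cfset strDataRows = ListRest( strFileData, strLineDelims ) />

<cfloop index="strRow" list="#strDataRows#" delimiters="#strLineDelims#">
	<!--- Split the row on tabs; empty fields are skipped by list functions. --->
	<cfset arrFields = ListToArray( strRow, Chr( 9 ) ) />
	<cfoutput>#ArrayLen( arrFields )# fields: #strRow#<br /></cfoutput>
</cfloop>
```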
Thanks Ben!
I did end up writing a loop to ignore the first line. Since I did not want to modify the original data files, I think skipping the first line will work for now.
@UMTerp,
Ok cool, sounds good.
Hi,
Can anyone suggest a way, other than the cffile tag, to write a file? The server administrators have restricted access to these tags for security reasons. Kindly advise me.
Thanks
@Sarak,
Most likely, if they have disabled CFFile, they have also disabled CreateObject(), CFInvoke, and CFObject, which might have helped you read in files.
Oh...
I will try moving to another server that supports the tags I need.
I really appreciate your quick response.
Thanks,
Ben