Parsing CSV Data With ColdFusion's CFHTTP Tag

Posted April 19, 2010 at 9:17 AM by Ben Nadel

Tags: ColdFusion

The other day on Twitter, Simon Hume and I were talking about parsing CSV (comma separated values) data files located on a remote server. I mentioned to him that in addition to being able to pull down remote data, ColdFusion's CFHTTP also has the ability to parse CSV data into a query object on the fly. This seems to be a rarely used feature of the tag, not to mention one that I have never demonstrated before. As such, I figured I would put together a very brief demo.

Before we get into the demo, however, I just want to point out that using CFHTTP to parse data does has a few large limitations to be aware of:

  • It cannot use the "file://" protocol. This means that any CSV data that you are trying to parse must be available via a URL. If you want to parse local data, you still have to route back to your server using HTTP and a web-accessible location (this is what I'm doing in the demo below).
  • It cannot handle embedded carriage returns. You can qualify column values if you wish to embed column delimiters or field qualifiers; but, you cannot embed an actual line break.
  • It cannot handle a variable number of columns. Each row must have the same number of columns, even if those columns are empty.

These limitations are only important if you need to pull down the CSV data and parse it in a single step. If you need to read local data, account for a variable number of columns, or handle embedded line breaks, you can always use something like my regular-expression-powered CSV parser. Even if you need to use remote data, you could still pull down the data with CFHTTP and then parse it afterwards.

That said, let's get on to the CFHTTP CSV example. To demo this, I set up the following CSV data file (data.csv):

name,age,hair
Tricia,27,Brunette
Joanna,25,Brunette
Lisa,33,Blonde

This data.csv file is located on my local server in a web accessible area that can be accessed by CFHTTP. Then, I set up this test page that makes a CFHTTP request for this file:

  • <!--- Build the URL for our CSV data. --->
  • <cfset dataUrl = (
  • "http://" &
  • cgi.server_name &
  • getDirectoryFromPath( cgi.script_name ) &
  • "data.csv"
  • ) />
  •  
  • <!---
  • Grab the CSV data over HTTP and parse it into a ColdFusion
  • query. It is the use of the Name attribute (as opposed to
  • the Result attribute) that tells the CFHTTP tag to parse the
  • CSV data.
  • --->
  • <cfhttp
  • name="girls"
  • method="get"
  • url="#dataUrl#"
  • />
  •  
  • <!--- Output the CSV data query. --->
  • <cfdump
  • var="#girls#"
  • label="CSV Query"
  • />

When you use the "Name" attribute of the CFHTTP tag (as opposed to the Result attribute), it tells ColdFusion that you want to parse the target url into a query object. By default, CFHTTP uses the first row as a header row, naming each query column based on the column value located in the CSV file. When we run the above code, we get the following output:

 
 
 
 
 
 
Parsing CSV Data With ColdFusion's CFHTTP Tag. 
 
 
 

As you can see, when we used the Name attribute, ColdFusion parsed the target HTTP file into a ColdFusion query. You'll also notice that after using the first header row to define the column names, it strips the header row out of the resultant query.

If you want to, you can use the Columns attribute to change the name of the result columns. You can also use the Delimiter and TextQualifier attributes to change the column delimiter and field qualifier. And, if you want to keep the header row (or if there is no header row), you can set the FirstRowAsHeaders attribute to false. I'm not going to demo any of these attributes specifically since they are fairly straightforward.

When using CFHTTP to parse CSV data, there are definitely some big limitations; but if your particular situation does not have any of these limitations, using CFHTTP to parse CSV data is just another way that ColdFusion makes things so incredibly easy.




Reader Comments

Apr 19, 2010 at 9:40 AM // reply »
38 Comments

You posted something about parsing CSV back in 2007 and Steven Klotz recommended using OpenCSV:
http://opencsv.sourceforge.net/

I checked it out and it addresses all 3 limitations that you point out above (local file read, multiple line feeds in a record, variable length record.) Please note that it creates an array instead of a query object.

Here's some sample code that I've been using:
<cfscript>
fileReader = createobject("java","java.io.FileReader");
fileReader.init("c:\thefile.csv");
csvReader = createObject("java","au.com.bytecode.opencsv.CSVReader");
csvReader.init(fileReader, ",");
</cfscript>
<CFSET ArrayData = csvReader.readAll()>


Apr 19, 2010 at 9:09 PM // reply »
10,743 Comments

@James,

I think I remember looking at that. Java solutions are always wicked fast.


Apr 20, 2010 at 2:45 AM // reply »
54 Comments

I just implemented this on a website of ours, using it for reading in countries in the application scope and using that variable (querystruct) when creating a dropdown of countries in the different forms on the website. That way I only have to create the list of countries (with ID, short name and full name) once and I can use it everywhere. Moreover, I can change the CSV file in one spot if needed and it will instantly be available in all places implemented. The latter is handy if one uses this approach in a multi-language setting (website/webstore) and users can limit/extend the number of countries shown.


Apr 20, 2010 at 4:37 PM // reply »
2 Comments

The CFHTTP's CSV limitations are exactly why I've gone with looping over the file instead.

Each row looped gets the ListToArray treatment... making sure that the includeEmptyFields parameter is set to true.

The CSV files I get are always messed up in someway. The file starts off with a description instead of a header, there is a footer with a record count, missing or empty columns.

But certainly an underused feature of CFHTTP. Thanks Ben!


Apr 21, 2010 at 9:22 AM // reply »
10,743 Comments

@Sebastiaan,

That's an interesting approach. Are you saying you have a CSV of countries in one place that you share across applications?

@Michael,

It definitely has issues... but for the sweet spot of CSV files where it is right, it just works quite nicely.


Apr 21, 2010 at 9:40 AM // reply »
5 Comments

@Ben
I didn't know that you could do that so thanks for the post.

@James
OpenCSV looks badass so I might just have to try that out. Thanks a lot for posting the link.


Apr 22, 2010 at 2:08 AM // reply »
54 Comments

@Ben

Well, yes and no, as it's on a shared server, more people could make use of it if they knew it existed ;-) But I put the struct in the application-scope, so it's only available in the website itself. If it was to be available to all apps on the server, I guess I should put it in the server-scope? Especially on our own servers it would be interesting to have this struct of countries available for all apps on that server. Gotta figure out a way to do that ;-)

Any ideas (except for the server-scope)?


Apr 23, 2010 at 11:19 AM // reply »
131 Comments

@Sebastian,

If all apps are sharing data, why not add the country list to a database and give all the apps access to the datasource to that database? Could still load that query to either Server or Application scope, if you wanted.


Apr 23, 2010 at 2:22 PM // reply »
54 Comments

@Jason Fisher,

I needed a way to get one standard list of countries available for many websites on different servers quickly. And I didn't want to fuss about with changing database tables. But your suggestion inspires me to create a country table in our CMS, with the country names available per language. Usually this is Dutch, English, German and French, so that means I would need one large table with all the countrynames in it and at the end of each row a language_id flag. Then I could load the list per page when needed instead of putting it in the application scope. Changing languages in the website/webapp would then lead me to get the correct list of countries with the langiage_id flag.


Apr 23, 2010 at 3:08 PM // reply »
131 Comments

@Sebastiaan,

Ah, nice, yeah. You could even create separate queries for each language and load in the Application scope:

application.countries = {};

application.countries["en"] = myEnglishCounrtyQuery;

application.countries["de"] = myGermanCountryQuery;

application.countries["gr"] = myGreekCountryQuery;

Then when loading for a given user you could pull by the user's selected language:

<cfoutput query="#countries[session.lang]#">

<option value="#shortVal#">#longVal#</option>

</cfoutput>

Other ways to eat this elephant, of course, but that might be pretty slick.


Apr 26, 2010 at 12:09 PM // reply »
2 Comments

When I try to run the code it returns me an error: "Variable GIRLS is undefined."

I'm running CF8 on Windows XP... any ideas?
Thanks


Apr 27, 2010 at 3:22 AM // reply »
2 Comments

When I try to run the code it returns me an error: "Variable GIRLS is undefined."

If I use "result" instead, the <cfdump> doesen't fail... so I am sure that the file exists and it is reachable...

I'm running CF8 on Windows XP... any ideas?
Thanks


Jun 3, 2010 at 4:00 PM // reply »
1 Comments

Thanks Ben I've been trying to figure out a simple way to parse a relatively small CSV all day and this solution did the trick.


Jun 7, 2010 at 9:57 PM // reply »
10,743 Comments

@Sebastiaan, @Jason,

Yeah, there's really two ways to think about this - how can I load the data quickly EACH time... and, DO I need to load the data EACH time. I think you are on to the best approach where the data is centrally located, but is loaded and cached locally in each application. Ultimately, I think this will be the most winning approach.

@Erich,

I am not sure why Result would work when Name wouldn't. Typically, Result is what I use for pure CFHTTP usage; however, I believe the use of Name is what signals ColdFusion to create a query.


Mar 8, 2011 at 2:00 AM // reply »
4 Comments

I don't think it works with https

all I get is a blank page


Mar 17, 2011 at 4:17 PM // reply »
3 Comments

Another limitation with using cfhttp for parsing csv files is unicode files. if you specify charset="UTF-8" or any other utf encoding, you will get "The column name is invalid" errors.
To overcome this you have to specify getasbinary="yes" as an attribute.

But if you specify getasbinary="yes" , the name attribute does not work and so you dont get a query object.
So if you have a unicode file:
<cfhttp url="http://enpersmm/teststuff/doc_unicode.txt" charset="UTF-8" name="getdata" method="GET"></cfhttp>

The above will throw an error. You could use:
<cfhttp url="http://enpersmm/teststuff/doc_unicode.txt" charset="UTF-8" getasbinary="yes" name="getdata" method="GET"></cfhttp>

In this case, you will get an error: getdata does not exist.


Mar 17, 2011 at 4:21 PM // reply »
3 Comments

@Lawrence,

to get cfhttp to work with a https url, you will first have to import the SSL certificate into CF. See: http://kb2.adobe.com/cps/400/kb400977.html


Aug 5, 2011 at 11:04 PM // reply »
1 Comments

I too was experiencing the error "Variable GIRLS is undefined." Turns out Coldfusion is converting the reponse to a binary object based on the information in the response header.

By setting "getasbinary = never" in your http request, the error goes away and it works as described here by Ben.



Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
InVision App - Prototyping Made Beautiful With Prototyping Tools Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 16, 2012 at 8:18 PM
Best Of ColdFusion 10 Contest Entry - HTML Email Utility
Just found this, looks good! I'm trying to run it on local, it's the 64bit version and I'm experiencing horrible lag. On average the generate.cfm processes the content change in 60-90 seconds. I've ... read »
May 16, 2012 at 6:40 PM
Maintaining Sessions Across Multiple ColdFusion CFHttp Requests
I am trying to integrate this CFHTTPsession into an application that will log into zeekrewards.com to post ads and I am not having any luck. The code works perfectly for logging into other websites, ... read »
May 16, 2012 at 2:44 PM
Creating A Sometimes-Fixed-Position Element With jQuery
Thank you, very useful technique! Worked like a charm. ... read »
May 16, 2012 at 1:58 PM
Movies As A Religious Experience
Acting can, in a way, ruin the movie-goer's experience. I used to be able to get so caught up in movies and their plots, and totally engaged. But lately, I haven't been able to as much with a lot o ... read »
May 16, 2012 at 1:52 PM
The Science Of Optimal Post-Exercise Nutrition
children of this age eat very less vegetables so u can opt for salads they will like it also carrot ,cucumber,onion and as far as pulses are concerned u can boil them ,give him along with mashed rice ... read »
May 16, 2012 at 1:34 PM
Strange ColdFusion JRUN Stack Overflow Error
Hey, Recently I updated my jrun4 using the latest updater 7 and now i am having memory issues :(:(:( any help is appreciated ... read »
May 16, 2012 at 9:56 AM
ColdFusion 10 Beta, Apache Tomcat, And Symbolic Links On Mac OSX
Hi, Now that ColdFusion 10 is out I have stumbled over this as well and I cannot figure out the proper solution. We're running virtual hosts via Apache2; the ColdFusion-applications store their fil ... read »
May 15, 2012 at 6:03 PM
Movies As A Religious Experience
@Ben, I don't know whether you'd consider this a religious observation, but it seems to me, in a sense, movies multiply how many lives we get to have. Each movie is like a little extra life we get ... read »