Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
Meanwhile on Twitter
Loading latest tweet...
Ben Nadel at the New York ColdFusion User Group (Apr. 2008) with: Clark Valberg and Peter Bell and Rob Gonda

Parsing CSV Data With ColdFusion's CFHTTP Tag

By Ben Nadel on
Tags: ColdFusion

The other day on Twitter, Simon Hume and I were talking about parsing CSV (comma separated values) data files located on a remote server. I mentioned to him that in addition to being able to pull down remote data, ColdFusion's CFHTTP also has the ability to parse CSV data into a query object on the fly. This seems to be a rarely used feature of the tag, not to mention one that I have never demonstrated before. As such, I figured I would put together a very brief demo.

Before we get into the demo, however, I just want to point out that using CFHTTP to parse data does has a few large limitations to be aware of:

  • It cannot use the "file://" protocol. This means that any CSV data that you are trying to parse must be available via a URL. If you want to parse local data, you still have to route back to your server using HTTP and a web-accessible location (this is what I'm doing in the demo below).
  • It cannot handle embedded carriage returns. You can qualify column values if you wish to embed column delimiters or field qualifiers; but, you cannot embed an actual line break.
  • It cannot handle a variable number of columns. Each row must have the same number of columns, even if those columns are empty.

These limitations are only important if you need to pull down the CSV data and parse it in a single step. If you need to read local data, account for a variable number of columns, or handle embedded line breaks, you can always use something like my regular-expression-powered CSV parser. Even if you need to use remote data, you could still pull down the data with CFHTTP and then parse it afterwards.

That said, let's get on to the CFHTTP CSV example. To demo this, I set up the following CSV data file (data.csv):

name,age,hair
Tricia,27,Brunette
Joanna,25,Brunette
Lisa,33,Blonde

This data.csv file is located on my local server in a web accessible area that can be accessed by CFHTTP. Then, I set up this test page that makes a CFHTTP request for this file:

  • <!--- Build the URL for our CSV data. --->
  • <cfset dataUrl = (
  • "http://" &
  • cgi.server_name &
  • getDirectoryFromPath( cgi.script_name ) &
  • "data.csv"
  • ) />
  •  
  • <!---
  • Grab the CSV data over HTTP and parse it into a ColdFusion
  • query. It is the use of the Name attribute (as opposed to
  • the Result attribute) that tells the CFHTTP tag to parse the
  • CSV data.
  • --->
  • <cfhttp
  • name="girls"
  • method="get"
  • url="#dataUrl#"
  • />
  •  
  • <!--- Output the CSV data query. --->
  • <cfdump
  • var="#girls#"
  • label="CSV Query"
  • />

When you use the "Name" attribute of the CFHTTP tag (as opposed to the Result attribute), it tells ColdFusion that you want to parse the target url into a query object. By default, CFHTTP uses the first row as a header row, naming each query column based on the column value located in the CSV file. When we run the above code, we get the following output:

 
 
 
 
 
 
Parsing CSV Data With ColdFusion's CFHTTP Tag. 
 
 
 

As you can see, when we used the Name attribute, ColdFusion parsed the target HTTP file into a ColdFusion query. You'll also notice that after using the first header row to define the column names, it strips the header row out of the resultant query.

If you want to, you can use the Columns attribute to change the name of the result columns. You can also use the Delimiter and TextQualifier attributes to change the column delimiter and field qualifier. And, if you want to keep the header row (or if there is no header row), you can set the FirstRowAsHeaders attribute to false. I'm not going to demo any of these attributes specifically since they are fairly straightforward.

When using CFHTTP to parse CSV data, there are definitely some big limitations; but if your particular situation does not have any of these limitations, using CFHTTP to parse CSV data is just another way that ColdFusion makes things so incredibly easy.



Looking For A New Job?

100% of job board revenue is donated to Kiva. Loans that change livesFind out more »

Reader Comments

You posted something about parsing CSV back in 2007 and Steven Klotz recommended using OpenCSV:
http://opencsv.sourceforge.net/

I checked it out and it addresses all 3 limitations that you point out above (local file read, multiple line feeds in a record, variable length record.) Please note that it creates an array instead of a query object.

Here's some sample code that I've been using:
<cfscript>
fileReader = createobject("java","java.io.FileReader");
fileReader.init("c:\thefile.csv");
csvReader = createObject("java","au.com.bytecode.opencsv.CSVReader");
csvReader.init(fileReader, ",");
</cfscript>
<CFSET ArrayData = csvReader.readAll()>

Reply to this Comment

I just implemented this on a website of ours, using it for reading in countries in the application scope and using that variable (querystruct) when creating a dropdown of countries in the different forms on the website. That way I only have to create the list of countries (with ID, short name and full name) once and I can use it everywhere. Moreover, I can change the CSV file in one spot if needed and it will instantly be available in all places implemented. The latter is handy if one uses this approach in a multi-language setting (website/webstore) and users can limit/extend the number of countries shown.

Reply to this Comment

The CFHTTP's CSV limitations are exactly why I've gone with looping over the file instead.

Each row looped gets the ListToArray treatment... making sure that the includeEmptyFields parameter is set to true.

The CSV files I get are always messed up in someway. The file starts off with a description instead of a header, there is a footer with a record count, missing or empty columns.

But certainly an underused feature of CFHTTP. Thanks Ben!

Reply to this Comment

@Sebastiaan,

That's an interesting approach. Are you saying you have a CSV of countries in one place that you share across applications?

@Michael,

It definitely has issues... but for the sweet spot of CSV files where it is right, it just works quite nicely.

Reply to this Comment

@Ben
I didn't know that you could do that so thanks for the post.

@James
OpenCSV looks badass so I might just have to try that out. Thanks a lot for posting the link.

Reply to this Comment

@Ben

Well, yes and no, as it's on a shared server, more people could make use of it if they knew it existed ;-) But I put the struct in the application-scope, so it's only available in the website itself. If it was to be available to all apps on the server, I guess I should put it in the server-scope? Especially on our own servers it would be interesting to have this struct of countries available for all apps on that server. Gotta figure out a way to do that ;-)

Any ideas (except for the server-scope)?

Reply to this Comment

@Sebastian,

If all apps are sharing data, why not add the country list to a database and give all the apps access to the datasource to that database? Could still load that query to either Server or Application scope, if you wanted.

Reply to this Comment

@Jason Fisher,

I needed a way to get one standard list of countries available for many websites on different servers quickly. And I didn't want to fuss about with changing database tables. But your suggestion inspires me to create a country table in our CMS, with the country names available per language. Usually this is Dutch, English, German and French, so that means I would need one large table with all the countrynames in it and at the end of each row a language_id flag. Then I could load the list per page when needed instead of putting it in the application scope. Changing languages in the website/webapp would then lead me to get the correct list of countries with the langiage_id flag.

Reply to this Comment

@Sebastiaan,

Ah, nice, yeah. You could even create separate queries for each language and load in the Application scope:

application.countries = {};

application.countries["en"] = myEnglishCounrtyQuery;

application.countries["de"] = myGermanCountryQuery;

application.countries["gr"] = myGreekCountryQuery;

Then when loading for a given user you could pull by the user's selected language:

<cfoutput query="#countries[session.lang]#">

<option value="#shortVal#">#longVal#</option>

</cfoutput>

Other ways to eat this elephant, of course, but that might be pretty slick.

Reply to this Comment

When I try to run the code it returns me an error: "Variable GIRLS is undefined."

I'm running CF8 on Windows XP... any ideas?
Thanks

Reply to this Comment

When I try to run the code it returns me an error: "Variable GIRLS is undefined."

If I use "result" instead, the <cfdump> doesen't fail... so I am sure that the file exists and it is reachable...

I'm running CF8 on Windows XP... any ideas?
Thanks

Reply to this Comment

Thanks Ben I've been trying to figure out a simple way to parse a relatively small CSV all day and this solution did the trick.

Reply to this Comment

@Sebastiaan, @Jason,

Yeah, there's really two ways to think about this - how can I load the data quickly EACH time... and, DO I need to load the data EACH time. I think you are on to the best approach where the data is centrally located, but is loaded and cached locally in each application. Ultimately, I think this will be the most winning approach.

@Erich,

I am not sure why Result would work when Name wouldn't. Typically, Result is what I use for pure CFHTTP usage; however, I believe the use of Name is what signals ColdFusion to create a query.

Reply to this Comment

Another limitation with using cfhttp for parsing csv files is unicode files. if you specify charset="UTF-8" or any other utf encoding, you will get "The column name is invalid" errors.
To overcome this you have to specify getasbinary="yes" as an attribute.

But if you specify getasbinary="yes" , the name attribute does not work and so you dont get a query object.
So if you have a unicode file:
<cfhttp url="http://enpersmm/teststuff/doc_unicode.txt" charset="UTF-8" name="getdata" method="GET"></cfhttp>

The above will throw an error. You could use:
<cfhttp url="http://enpersmm/teststuff/doc_unicode.txt" charset="UTF-8" getasbinary="yes" name="getdata" method="GET"></cfhttp>

In this case, you will get an error: getdata does not exist.

Reply to this Comment

I too was experiencing the error "Variable GIRLS is undefined." Turns out Coldfusion is converting the reponse to a binary object based on the information in the response header.

By setting "getasbinary = never" in your http request, the error goes away and it works as described here by Ben.

Reply to this Comment

For secure/sensitive files - how about setting up a secondary web folder on 127.0.0.1 in IIS so it can only be accessed by the local machine - is there a reason I am not thinking of that this would be insecure? I feel like if they only way to access the web site is through the local ip address it couldn't be accessed remotely... am I thinking about that right?

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.