As part of my exploration of writing, reading, and creating Microsoft Excel documents using ColdFusion, I have come across the need to parse comma-separated-value (CSV) data files. While this seems at first like a relatively simple task, I soon found out that it was ANYTHING but simple. It's one thing to worry about lists (for which ColdFusion is the bomb-diggity), but it's an entirely other thing to worry about lists that have field qualifiers, escaped qualifiers, escaped qualifiers that might be empty fields, and non-qualified field values all rolled into one.
I tried looking it up in Google but could not find any satisfactory algorithms (translates to: code that I could understand). Everything on CSV seems to be in Java and none the stuff on CFLib.org seems to comply with the range of CSV values (especially qualified fields). So, in typical blood-and-guts fashion, I sat down and tried to write my own algorithm. This proved to be easy at first until I found out that my approach was highly flawed. I went through about three different implementations over the weekend of the algorithm before I came up with something that seemed to work satisfactorially.
It has to evaluate each character at a time, which probably won't scale or perform nicely. I would have liked to harness the power of CFHttp to convert CSV files to queries, but I could not get CFHttp to work on the LOCAL file system (ie. a URL that begins with "file:"). If anyone knows of great way to do this, please let me know. I suppose that I could written a temporary file to a public folder and then performed a CFHttp to it, then deleted it, but that just felt a bit "hacky." However, in the end that might just prove to be the way to go.
So anyway, this is what I have come up with. It is a function that takes either a chunk of CSV data or a file path to a CSV data file (text file) and converts it to an array of arrays. It assumes that each record is separated by a return character followed optionally by a new line. Not sure if that is cross system compliant, but heck, this is my first attempt:
Launch code in new window » Download code as text file »
I have chosen to convert the CSV to an array of arrays as I was not sure that you could depend on the constant number of fields per row. Plus, I figure that going from an array to a query (after this step) would be rather easy. Plus, since Excel is not perfectly square cols vs. rows, I figure this was more in-line with where I want to go with it (including it in my ColdFusion POI Utility component).
If I create a variable containing this CSV data:
last name,first name,salary,dream salary,happiness
Jones,Mike,"$35,500.00","$73,000.00"
Hopkins,Paul,"$55,234.00","$250,000.00",3.0
Hawkings,Katie,,,
,
Smith,Betty,"$57,010.00","$60,000.00",10.0
... and pass it into the CSVToArray ColdFusion user defined function:
Launch code in new window » Download code as text file »
I get this output:
| | | | ||
| | ![]() | | ||
| | | |
As you can see, the CSVToArray() ColdFusion function handles mixed length records, empty field values, and qualified fields. It even handles escaped qualifiers (ex. "" becomes ") but this was not demonstrated. While this is not perfect, at least it provides me with a CSV conversion interface that I can use in my POI Utility ColdFusion component. Further down the road, I will be able to swap this out later for a better implementation.
Download Code Snippet ZIP File
Comments (14) | Post Comment | Ask Ben | Permalink | Other Searches | Print Page
Adding Basic CSS Support To My POI Utility ColdFusion Component For Excel Creation
Parsing And Keeping A CSS Model Using ColdFusion
Ben,
I haven't thought through this, so forgive me if it's a stupid question, but
did you consider using regular expressions? If so, what caused you to decide against using them?
Posted by Sammy Larbi on Jan 22, 2007 at 7:45 PM
@Ben,
When dealing with lists, use the GetToken() function. It won't ignore empty list elements. This will significantly speed up your function and replace the loop that you are doing. Also Sammy hit the nails on the head with using RegEx to strip out the text between the qualifiers.
Another trick you can use to speed things up is to use GetToken() to populate the empty the empty cells and then use ListToArray() for the conversion. It's alot quicker then creating a Java Object on each call.
Hopefully this helps you out some.
Posted by Tony Petruzzi on Jan 22, 2007 at 11:26 PM
@Sammy,
I did think of regular expressions, 'cause they are cool, but I wasn't sure how to apply them. Plus I don't think my skills with them would be good enough to handle all the different options that come with CSV formatting. Take for example:
ben,was,here
That is three fields. But this:
"ben,was,here"
is one field. But this:
""ben,was,here""
is three fields; the first starts with a quote literal, and the last field ends with a quote literal. And then this:
""ben,"was,here"""
has two fields.... you get the point? It was just too much for me to wrap my head around. I am sure that regular expressions would rock somehow, I just can't figure it out.
Posted by Ben Nadel on Jan 23, 2007 at 7:20 AM
Tony,
It's funny you mention that because my first attempt actually did use a Tokenizer. In my experience, though, it does skip empty fields:
<cfset Tokenizer = CreateObject(
"java",
"java.util.StringTokenizer"
).Init(
JavaCast( "string", "a,b,,,,c,d,e,f" ),
JavaCast( "string", "," )
) />
<cfloop condition="Tokenizer.HasMoreTokens()">
[#Tokenizer.NextToken()#]<br />
</cfloop>
... outputs:
[a]
[b]
[c]
[d]
[e]
[f]
... it skips right over the empty fields. However, in my current implementation I do add a leading space to all fields which then gets stripped out later.
I did learn some things in iteration three that I didn't know in iteration one, so I could probably go back and apply that to the String Tokenizer. In fact, maybe I will do that.
Posted by Ben Nadel on Jan 23, 2007 at 7:28 AM
Comma seperated is a good idea with cold fusion becoz it is gonna remove some of difficult queries and the irregularities. while is is easy to retrieve the information at the client end.
It is being used in www.compglobe.com where you are entitled to compose your comment and the comment will be transfered to the CSV file at the server level.
www.compglobe.com is also using CSV format to upload the phone no.s if you want to send information to the handset of the recipent to whom you want to delivered the material. www.compglobe.com has various things like message composer and an online radio too.
Posted by Huber on Feb 4, 2007 at 4:09 AM
Doing something similar, i just grabbed http://opencsv.sourceforge.net/ and then did this:
<cfparam name="filename">
<cfscript>
fileReader = createobject("java","java.io.FileReader");
fileReader.init(filename);
csvReader = createObject("java","au.com.bytecode.opencsv.CSVReader");
csvReader.init(fileReader);
</cfscript>
<cfdump var="#csvReader.readAll()#">
Java and ColdFusion play SO nice together *smile*
Posted by Steven Klotz on Apr 12, 2007 at 2:34 PM
Thanks for the code. This was very helpful since I'm just learning CF. I now from other experiences that parsing CSV files can be a real pain to get it to work right.
Posted by J. Prime on May 11, 2007 at 12:28 PM
Thanks for the code and tutorial Ben - I was grappling with exactly the same issue relating to coverting CSV with encapsulating quotes and your post was a lifesaver!!
Posted by Craig McDonald on Jul 12, 2007 at 6:30 PM
Always a pleasure to help out!
Posted by Ben Nadel on Jul 12, 2007 at 6:54 PM
@Stephen:
Have you gotten this work with opencsv's CSVWriter?
Tim
Posted by Tim on Jul 27, 2007 at 6:25 PM
This is similar perhaps to what I need to achieve.(I think)
My client has a list of products. (Product ID, Product Name, description) are the colum headers for the product table.
well, the description field data... is a CSV.
for example
the data in the description field is:
OD(+/-1.2mm), Wall Thickness = 5.0mm (+/- .4mm), Inside Diameter = 65.0mm, Approximate pieces per case = 4, Approximate weight per case = 32.34 lbs
But i need to take the data in that one field, and create more colums to display these attributes rather than this text blob.
Am I on the right track?
Posted by JKS on Jan 22, 2008 at 5:21 PM
@JKS,
You can use CSV parsing to get those values; however, if those are the only values in the field, you can simply treat the data as if it were a comma-delimited list. Then, you can either split the list into an array with ListToArray(), or even use things like ListGetAt() and ListLen() to loop over the elements of the list and examine each individually.
Posted by Ben Nadel on Jan 23, 2008 at 9:41 AM
Ben,
AWESOME JOB!!! I can't believe this was so difficult to find. You definitely saved HOURS of time and helped meet my deadline. This works great. People like you are what make the net an awesome place for research and learning. Thanks!!
Posted by Rob G on Sep 29, 2008 at 7:54 PM
@Rob,
Glad to help. Check out a more updated post on this type of thing:
http://www.bennadel.com/index.cfm?dax=blog:976.view
Posted by Ben Nadel on Sep 29, 2008 at 7:57 PM