Skip to main content
Ben Nadel at cf.Objective() 2014 (Bloomington, MN) with: Andy Matthews
Ben Nadel at cf.Objective() 2014 (Bloomington, MN) with: Andy Matthews ( @commadelimited )

Remediating CSV Injection Attacks In ColdFusion

By on
Tags:

A few days ago, I didn't know what a CSV Injection Attack was. I love generating CSV (Comma Separated Value) files in ColdFusion. And, heretofore, I had always thought of CSV files as containing nothing more than inert text data. On Wednesday, however, David Epler - one of our senior security engineers - got the results of our ongoing Penetration Test (PenTest); and, lo-and-behold, one of the identified vulnerabilities was "CSV Injection". Since this was an unknown attack vector for me, I assume it is also an unknown issue for some of you. As such, I wanted to look at remediating CSV Injection attacks in ColdFusion.

As with all of the OWASP (Open Web Application Security Project) Injection attacks, CSV Injection allows raw, user-provided data to trigger unintended actions in the target system. With CSV Injection, the unintended action is the evaluation of a Formula in the context of spreadsheet program (such as MS Excel, Google Sheets, Numbers) on the victim's computer.

For example, if a CSV data file contains the encoded field:

"=3+2"

... the presence of the equal sign (=) as the first character may cause the given spreadsheet program to evaluate the cell content as a math equation, yielding the sum, 5.

Maths - while challenging for my Caveman brain - isn't in-and-of-itself a threat. But, as George Mauer points out in his article on CSV Injection, some spreadsheet programs have formulas that can execute low-level commands or even make HTTP requests to remote servers. Yikes!

The remediation of CSV Injection attacks recommended by OWASP is to:

  • Quote fields.
  • Escape embedded quotes.
  • Prepend the field value with a single-quote (') if the field starts with one of the malicious characters that can trigger a formula evaluation:
    • Equals (=)
    • Plus (+)
    • Minus (-)
    • At (@)
    • Tab (0x09)
    • Carriage return (0x0D)

Since I love writing ColdFusion components, my remediation for the CSV Injection outlined in the PenTest was to create a component that centralizes the logic for serializing Array data into CSV data. It only has two public methods:

  • serializeArray( rows )
  • serializeArrayAsBinary( rows )

The second method is just a convenience method since I'm almost always taking the generated CSV data and piping it into the variable attribute of the CFContent tag. This component builds on my previous post, celebrating the joys of CSV in ColdFusion:

component
	output = false
	hint = "I provide helper methods for SAFELY serializing Array data as CSV content."
	{

	// These are used internally, but can also be used externally as well in order to make
	// the calling code more obvious (seeing names is easier than seeing ASCII numbers).
	this.chars = {
		COMMA: ",",
		TAB: chr( 9 ),
		NEWLINE: chr( 10 ),
		CARRIAGE_RETURN: chr( 13 ),
		QUOTES: """",
		ESCAPED_QUOTES: """"""
	};

	/**
	* I initialize the CSV serializer with the given defaults.
	*/
	public void function init(
		string fieldDelimiter = this.chars.COMMA,
		string rowDelimiter = this.chars.NEWLINE,
		string encoding = "utf-8"
		) {

		variables.defaultFieldDelimiter = fieldDelimiter;
		variables.defaultRowDelimiter = rowDelimiter;
		variables.defaultEncoding = encoding;

	}

	// ---
	// PUBLIC METHODS.
	// ---

	/**
	* I serialize the given array data as a CSV string payload.
	*/
	public string function serializeArray(
		required array rows,
		string fieldDelimiter = defaultFieldDelimiter,
		string rowDelimiter = defaultRowDelimiter
		) {

		var csvData = rows
			.map(
				( row ) => {

					return( this.serializeArrayRow( row, fieldDelimiter ) );

				}
			)
			.toList( rowDelimiter )
		;

		return( csvData );

	}


	/**
	* I serialize the given array data as a CSV binary payload.
	* 
	* NOTE: This method is provided for convenience - when I generate CSV content, I am
	* ALMOST ALWAYS then streaming the content back to the client using CFContent, which
	* accepts a binary `variable` attribute.
	*/
	public string function serializeArrayAsBinary(
		required array rows,
		string fieldDelimiter = defaultFieldDelimiter,
		string rowDelimiter = defaultRowDelimiter,
		string encoding = defaultEncoding
		) {

		return(
			charsetDecode(
				serializeArray( rows, fieldDelimiter, rowDelimiter ),
				encoding
			)
		);

	}

	// ---
	// PRIVATE METHODS.
	// ---

	/**
	* I return a value in which any KNOWN potential CSV injection vector has been escaped.
	* 
	* CAUTION MODIFIES DATA OUTPUT: This works by checking the first character in the
	* field; and, if it is a potentially dangerous character, it prepends the field value
	* with a single-quote. Some spreadsheet programs will hide this single-quote; others
	* will render it. As such, it may appear to the end user that we've altered their data
	* (which we have). Unfortunately, there's no way around this.
	*/
	private string function escapeCsvInjection( required string field ) {

		switch ( field.left( 1 ) ) {
			// These characters are mentioned on the OWASP website.
			// --
			// https://owasp.org/www-community/attacks/CSV_Injection
			case "=":
			case "+":
			case "-":
			case "@":
			case this.chars.TAB:
			case this.chars.CARRIAGE_RETURN:
			// The pipe character was mentioned in our PenTest results, not on OWASP.
			case "|":

				return( "'" & field );

			break;
			default:

				return( field );

			break;
		}

	}


	/**
	* I return a field value that is quoted and in which embedded special characters have
	* been escaped.
	*/
	private string function escapeField( required string field ) {

		return(
			this.chars.QUOTES &
			field.replace( this.chars.QUOTES, this.chars.ESCAPED_QUOTES, "all" ) &
			this.chars.QUOTES
		);

	}


	/**
	* I serialize the given row, both QUOTING and ESCAPING the content of each field.
	*/
	private string function serializeArrayRow(
		required array row,
		required string fieldDelimiter
		) {

		var csvData = row
			.map(
				( field ) => {

					// NOTE: Calling the toString() method to cast each field to a string
					// in case we are dealing with non-string simple values. This allows
					// us to call member-methods in all of the subsequent invocations.
					var escapedValue = toString( field );
					escapedValue = escapeCsvInjection( escapedValue );
					escapedValue = escapeField( escapedValue );

					return( escapedValue );

				}
			)
			.toList( fieldDelimiter )
		;

		return( csvData );

	}

}

The root of the solution here is that every field value is passed through these three methods:

  • toString()
  • escapeCsvInjection()
  • escapeField()

What I end up with is fields that show up looking like this:

=cmd | Sarah "Stubs" Smith

... and end up getting serialized like this:

"'=cmd | Sarah ""Stubs"" Smith"

Now, let's try to generate a CSV file in ColdFusion that contains malicious content. Notice that I'm using the convenience method, serializeArrayAsBinary(), to pipe the data directly to the CFContent tag. This resets the output buffer and halts all processing on the server:

NOTE: This code is running in Lucee CFML. As such, it does not need the cf prefixes on the CFScript-based tags.

<cfscript>

	serializer = new CsvSerializer();

	rows = [
		[ "ID", "NAME", "EMAIL" ],
		[ 1, "Sarah ""Stubs"" Smith", "sarah.smith@example.com" ],
		[ 2, "John Johnson", "jon.johnson@example.com" ],
		[ 3, "=(3+5)", "#chr( 9 )#dr.evil@example.com" ], // <== CAUTION: Malicious data!
		[ 4, "Jo Jamila", "jo.jamila@example.com" ]
	];

	header
		name = "content-disposition"
		value = getContentDisposition( "user-data.csv" )
	;
	content
		type = "text/csv; charset=utf-8"
		variable = serializer.serializeArrayAsBinary( rows )
	;

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I get the content-disposition header value for the given filename.
	*/
	private string function getContentDisposition(
		required string filename,
		string disposition = "attachment"
		) {

		var encodedFilename = encodeForUrl( filename );

		return( "#disposition#; filename=""#encodedFilename#""; filename*=UTF-8''#encodedFilename#" );

	}

</cfscript>

If we then run this ColdFusion code, generate the CSV file, and import it into Google Sheets, we get the following output:

A malicious CSV file imported into Google Sheets show remediation applied in ColdFusion.

As you can see, the the malicious field values have been neutralized with the single-quote prefix. Furthermore, Google Sheets doesn't render the quote in the sheet output; but, if you look in the Formula input, you can see it.

I am continually impressed with how people find new and interesting ways to misuse and abuse technologies. Thankfully, SQL Injection has long since been all but eradicated from ColdFusion thanks to the CFQueryParam tag. And, many forms of persisted XSS (Cross-Side Scripting) attacks have been neutralized by the various encodeForXYZ() methods. Perhaps we need to introduce an encodeForCsv() method?

Want to use code from this post? Check out the license.

Reader Comments

1 Comments

This could cause issues with systems importing this as when i use SSMS to import it throws an error because of the single quote. I'm sure you'll see this elsewhere also.

The owasp article also adds 'tab' to the csv list, assuming they are covering only csv and not variations of delimited files (tab,pipe,etc) which each has it's unique issues.

international phone numbers often have + at the start... eg : +61380007888 so in that case you'd be converting the value to a string, same with minus, instead of the number being negative. ok not a big deal... however if you have mixed data (local and international then + signifies international and lack of it local number) eg: 1800-000-000 is local, you cant convert this to +611800000000 as in a lot of countries 1800 numbers are in-country only... so you need to leave off the country prefix.

Of course this still begs the question for the minus sign, as the solution is to reformat to surround the number with () brackets (another representation of negative)

So you're stuck with string for phone numbers but no way to tell what is international format? (aka what is 18008884444?? is it +1, +18 or just 1800? depends on country how long a phone number is...)

The bigger question really is what schema are you importing/output using? Should the output be VALIDATED before being stored? And also validated before being exported? (so the name column should it allow a number or forumula/expression?

As someone who imports/exports 20 million line csv files multiple times a day, the prefix is a real issue as most csv readers will not account for it, so you'd have to parse and remove all the single quotes before you import the doc to ensure that you're not going to get erorrs. Once again I think this (like json when you import/export) needs a schema that every row and cell needs to be validated through.

15,688 Comments

@Dawesi,

These are really great points! And, unfortunately, I don't really have a good answer. As you are saying, if the importing system doesn't know about these escapes, then the import is going to lead to some unexpected outcomes (if not even errors).

Since security is often dictated by a set of trade-offs, I suppose the best approach would be to have some sort of a list of fields that skip this step. So, phone-numbers for example, don't get it since they seem to cause such a big issue (given the standard prefixing).

Of course, then you probably have to validate that they are only containing a certain set of characters.

It's definitely not a "one size fits all", which makes it frustrating. 😨 I guess the key is knowing that this can be a problem; and then, dialing it back in places where you can make other assertions without having to do the escaping.

As an aside, I often wrap all my fields in quotes ("), and then escape embedded quotes; but, I know that people complain that this actually has a very real impact on the size of the generated CSV file. It's funny how such a relatively simple text format leaves us with so much to consider!

Post A Comment — I'd Love To Hear From You!

Post a Comment

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel