Ben Nadel at CFUNITED 2008 (Washington, D.C.) with: Sean Corfield ( @seancorfield )

Code Kata: Parsing Strings Like "5mb" Into A Number Of Bytes In Lucee CFML 5.3.7.47


In yesterday's post about streaming an incremental ZIP file up to Amazon S3 in Lucee CFML, I had to wait until "chunks" were over 5mb (5 megabytes) in size before I could upload them. To do this, I manually calculated the number of bytes that equated to 5mb (5 * 1024 * 1024). Afterwards, I thought it would be nice if there were methods for converting between bytes and larger data-units. As a code kata, I wanted to see if I could create just such functions in Lucee CFML 5.3.7.47.

In ColdFusion, there is already a precedent for converting between two units of measurement: inputBaseN() and formatBaseN(). inputBaseN() converts a value in a given base into decimal (base 10); and, formatBaseN() converts a given decimal (base 10) value into another base. As such, when converting between bytes and other units (e.g., megabytes), I wanted to use the same input / format terminology:

  • inputBytesN( quantity, unit ) - converts a quantity of the given unit into bytes.

  • formatBytesN( quantity, unit ) - converts bytes into a given unit.

  • parseBytes( input ) - short-hand function that will parse the quantity and unit out of a string like, "5mb", and pipe them into the inputBytesN() function.
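
For reference, here is a minimal sketch of the built-in pair that this naming borrows from. The sample values are my own; the point is only that one function reads a value "in" to base 10 and the other formats it back "out":

```cfml
<cfscript>

	// inputBaseN() reads a string written in the given base and returns the
	// equivalent base-10 number.
	decimalValue = inputBaseN( "ff", 16 );
	echo( decimalValue & "<br />" );

	// formatBaseN() goes the other way, rendering a base-10 number as a string
	// in the given base. Feeding that string back in returns the original number.
	hexValue = formatBaseN( decimalValue, 16 );
	echo( inputBaseN( hexValue, 16 ) & "<br />" );

</cfscript>
```

The bytes functions follow the same shape: the "input" function normalizes to the base unit (bytes), and the "format" function converts away from it.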

In the end, these functions just wrap a bunch of multiplications and divisions by 1024, which is the number of bytes in a kilobyte (and is the general multiplier needed to move between adjacent units):

<cfscript>

	echo( "<p><strong> Testing parseBytes() </strong></p>" );
	echo( parseBytes( "1.305 kb" ) & "<br />" );
	echo( parseBytes( "2 megabytes" ) & "<br />" );
	echo( parseBytes( "3 gb" ) & "<br />" );
	echo( "<br />" );

	echo( "<p><strong> Testing inputBytesN() </strong></p>" );
	echo( inputBytesN( 1, "bit" ) & "<br />" );
	echo( inputBytesN( 1, "b" ) & "<br />" );
	echo( inputBytesN( 1, "kb" ) & "<br />" );
	echo( inputBytesN( 1, "mb" ) & "<br />" );
	echo( inputBytesN( 1, "gb" ) & "<br />" );
	echo( inputBytesN( 1, "tb" ) & "<br />" );
	echo( "<br />" );

	echo( "<p><strong> Testing formatBytesN() </strong></p>" );
	echo( formatBytesN( 1, "bit" ) & "<br />" );
	echo( formatBytesN( 1, "b" ) & "<br />" );
	echo( formatBytesN( 1024, "kb" ) & "<br />" );
	echo( formatBytesN( 1048576, "mb" ) & "<br />" );
	echo( formatBytesN( 1073741824, "gb" ) & "<br />" );
	echo( formatBytesN( 1099511627776, "tb" ) & "<br />" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I convert the given number of bytes into the given unit. No rounding of decimals is
	* performed. If you want to round the value, you must do it in the calling context.
	* 
	* Example: formatBytesN( 1024, "kb" ) => 1
	* 
	* @quantity I am the number of bytes to convert.
	* @unit I am the unit of measurement into which we are converting.
	*/
	public numeric function formatBytesN(
		required numeric quantity,
		required string unit
		) {

		switch ( unit ) {
			case "bit":
			case "bits":
				return( quantity * 8 );
			break;
			// CAUTION: Lowercase "b" is actually the international standard for BIT.
			// However, since ColdFusion is case-insensitive, I'm going to use any case
			// of "B" to mean Byte.
			case "b":
			case "byte":
			case "bytes":
				return( quantity );
			break;
			case "k":
			case "kb":
			case "kilobyte":
			case "kilobytes":
				return( quantity / 1024 );
			break;
			case "m":
			case "mb":
			case "megabyte":
			case "megabytes":
				return( quantity / 1024 / 1024 );
			break;
			case "g":
			case "gb":
			case "gigabyte":
			case "gigabytes":
				return( quantity / 1024 / 1024 / 1024 );
			break;
			case "t":
			case "tb":
			case "terabyte":
			case "terabytes":
				return( quantity / 1024 / 1024 / 1024 / 1024 );
			break;
			default:
				throw(
					type = "UnsupportedUnit",
					message = "Format unit not recognized",
					extendedInfo = serializeJson( arguments )
				);
			break;
		}

	}


	/**
	* I convert the given quantity into the equivalent number of bytes.
	* 
	* Example: inputBytesN( 1, "kb" ) => 1024
	* 
	* @value I am the value to convert.
	* @unit I am the unit of measurement in which the quantity was defined.
	*/
	public numeric function inputBytesN(
		required numeric value,
		required string unit
		) {

		switch ( unit ) {
			case "bit":
			case "bits":
				return( ceiling( value / 8 ) );
			break;
			// CAUTION: Lowercase "b" is actually the international standard for BIT.
			// However, since ColdFusion is case-insensitive, I'm going to use any case
			// of "B" to mean Byte.
			case "b":
			case "byte":
			case "bytes":
				return( value );
			break;
			case "k":
			case "kb":
			case "kilobyte":
			case "kilobytes":
				return( ceiling( value * 1024 ) );
			break;
			case "m":
			case "mb":
			case "megabyte":
			case "megabytes":
				return( ceiling( value * 1024 * 1024 ) );
			break;
			case "g":
			case "gb":
			case "gigabyte":
			case "gigabytes":
				return( ceiling( value * 1024 * 1024 * 1024 ) );
			break;
			case "t":
			case "tb":
			case "terabyte":
			case "terabytes":
				return( ceiling( value * 1024 * 1024 * 1024 * 1024 ) );
			break;
			default:
				throw(
					type = "UnsupportedUnit",
					message = "Input unit not recognized",
					extendedInfo = serializeJson( arguments )
				);
			break;
		}

	}


	/**
	* I parse the given quantity/unit string into the number of bytes. This is basically
	* a short-hand for the inputBytesN() function.
	* 
	* Example: parseBytes( "1kb" ) => 1024
	* 
	* @input I am the string to parse and convert.
	*/
	public numeric function parseBytes( required string input ) {

		// RegEx pattern matches leading number followed by trailing strings.
		var parts = input
			.lcase()
			.trim()
			.reMatchNoCase( "^[\d.]+|[a-z]+$" )
		;

		if ( parts.len() != 2 ) {

			throw(
				type = "UnexpectedInput",
				message = "Input string must contain a quantity followed by a unit",
				extendedInfo = serializeJson( arguments )
			);

		}

		var quantity = val( parts[ 1 ] );
		var unit = parts[ 2 ];

		return( inputBytesN( quantity, unit ) );

	}

</cfscript>

As you can see, when converting to bytes, we're really just multiplying by some power of 1024; and, when converting from bytes, we're really just dividing by some power of 1024. And, when we run this ColdFusion code, we get the following output:

Testing parseBytes()

1337
2097152
3221225472

Testing inputBytesN()

1
1
1024
1048576
1073741824
1099511627776

Testing formatBytesN()

8
1
1
1
1
1
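
The repeated multiplications in inputBytesN() can also be read as powers of 1024. As a minimal sketch, assuming a hypothetical lookup table and function name (neither is part of the code above), the same conversion could be driven by one exponent per unit:

```cfml
<cfscript>

	// Hypothetical lookup (not from the original code): each unit maps to the
	// power of 1024 that separates it from a single byte.
	unitExponents = {
		"b": 0,
		"kb": 1,
		"mb": 2,
		"gb": 3,
		"tb": 4
	};

	/**
	* I convert the given quantity into bytes using the exponent lookup above.
	*/
	public numeric function inputBytesByExponent(
		required numeric quantity,
		required string unit
		) {

		if ( ! unitExponents.keyExists( unit ) ) {

			throw(
				type = "UnsupportedUnit",
				message = "Input unit not recognized"
			);

		}

		// The ^ operator is exponentiation in CFML.
		return( ceiling( quantity * ( 1024 ^ unitExponents[ unit ] ) ) );

	}

	// 5 * 1024 * 1024 bytes, the S3 chunk threshold mentioned at the top.
	echo( inputBytesByExponent( 5, "mb" ) );

</cfscript>
```

This trades the explicit switch statement for a table, which makes adding a unit a one-line change; the switch version is arguably easier to read at a glance.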

This was a fun little mental exercise in ColdFusion. Though, looking at the parseBytes() function, it's hard to believe there's still no reMatchGroups() function in ColdFusion - extracting parts of a Regular Expression (RegEx) is still oddly challenging.
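
One way to get at captured groups with what is already available is the returnSubExpressions argument of reFind() / reFindNoCase(), which reports the position and length of each group. The following is a minimal sketch, assuming a hypothetical splitQuantityAndUnit() helper that is not part of the code above:

```cfml
<cfscript>

	/**
	* Hypothetical helper (the name is mine): I split a string like "5mb" into its
	* quantity and unit using captured groups.
	*/
	public struct function splitQuantityAndUnit( required string input ) {

		var target = input.trim();

		// Passing true as the fourth argument returns "pos" and "len" arrays. Index 1
		// describes the whole match; indices 2 and 3 describe the captured groups.
		var result = reFindNoCase( "^([\d.]+)\s*([a-z]+)$", target, 1, true );

		// When nothing matches, the first position is reported as 0.
		if ( result.pos[ 1 ] == 0 ) {

			throw(
				type = "UnexpectedInput",
				message = "Input string must contain a quantity followed by a unit"
			);

		}

		return({
			quantity: val( mid( target, result.pos[ 2 ], result.len[ 2 ] ) ),
			unit: mid( target, result.pos[ 3 ], result.len[ 3 ] )
		});

	}

	parts = splitQuantityAndUnit( "1.305 kb" );
	echo( parts.quantity & " / " & parts.unit );

</cfscript>
```

It is still more ceremony than a dedicated groups function would be, since each group has to be sliced out with mid(), but it keeps the quantity and the unit tied to a single pattern.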


Reader Comments

20 Comments

Re:

it's hard to believe there's still no reMatchGroups() function in ColdFusion - extracting parts of a Regular Expression

Very hard to believe, especially as it's been there since CF2016 ;-). Ref: https://helpx.adobe.com/uk/coldfusion/cfml-reference/coldfusion-functions/functions-m-r/refind.html

Example:
https://trycf.com/gist/7867dc3eb485dc9047088cc9168192fd/acf2016?theme=monokai

This was the result of raising an issue with Adobe to get the feature added, and then them doing so. Ref: https://tracker.adobe.com/#/view/CF-3321666

15,674 Comments

@Adam,

Oh snap!!! I totally missed that one. Looks like it may be Adobe ColdFusion only at this point - the Lucee CFML docs have the "scope" option documented; but, when I try to run the code in the inline-editor on the docs, it throws an error that there are too many arguments.

That said, this is awesome! Thanks for pointing this out. Mental model augmented :muscle:

354 Comments

Would be kind of cool if formatBytes didn't require a unit. So if I pass X to it, it recognizes, oh this is greater than 1 meg but less than a gig, so show it as N megs. Oh, this is greater than a gig, so show it as N gigs. Basically, apply the best unit to it.

15,674 Comments

@Raymond,

Yeah, that makes a lot of sense. I wonder if it would make sense to make the unit optional. Then, if it were there, I would use the explicit one; and, if omitted, I could make the "best guess" version.
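
A minimal sketch of what that "best guess" version could look like, assuming a hypothetical guessBestUnit() helper (the name and the struct it returns are illustrative only):

```cfml
<cfscript>

	/**
	* Hypothetical helper: I pick the largest unit in which the given number of bytes
	* is still at least 1, and return the scaled quantity along with that unit.
	*/
	public struct function guessBestUnit( required numeric bytes ) {

		var units = [ "b", "kb", "mb", "gb", "tb" ];
		var quantity = bytes;
		var index = 1;

		// Keep dividing by 1024 until the value drops below 1024 or we run out of
		// larger units to move into.
		while ( ( quantity >= 1024 ) && ( index < units.len() ) ) {

			quantity = ( quantity / 1024 );
			index++;

		}

		return({
			quantity: quantity,
			unit: units[ index ]
		});

	}

	best = guessBestUnit( 5242880 );
	echo( best.quantity & best.unit );

</cfscript>
```

As with formatBytesN(), no rounding is performed here; that would be left to the calling context.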

Ben Nadel