Ben Nadel at NCDevCon 2016 (Raleigh, NC) with: Roger Austin ( @RogerTheGeek )

Splitting An Array Into Groups In ColdFusion 2021


The other day, I was dealing with a long list of IDs that I needed to use in a SQL query. In fact, the list was so long that the generated SQL statement was exceeding the max_allowed_packet setting of the MySQL server. As such, I had to split the list of IDs into groups and perform the SQL query in chunks. I was surprised to see that our Utilities.cfc at work didn't already have a user defined function (UDF) for this. So, I wrote a function that splits an array into groups of a given max-length in ColdFusion.

The logic in this function is fairly simple: I'm using an old-school for-loop to iterate over the given collection in "group size" steps. Then, I'm just slicing-out the desired values. Really, the only complexity here at all is the fact that the slice() function in ColdFusion is NOT SAFE: it throws an error if the requested range extends past the end of the array. As such, I have to take care when dealing with the end-bound of the array.

<cfscript>

	values = [ "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k" ];

	writeDump( splitArrayIntoGroups( values, 5 ) );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I split the given collection into separate arrays, each of which has the given max
	* length.
	*/
	public array function splitArrayIntoGroups(
		required array collection,
		required numeric maxLength
		) {

		var collectionLength = collection.len();
		var groups = [];

		for ( var i = 1 ; i <= collectionLength ; i += maxLength ) {

			// CAUTION: The slice() method is NOT SAFE, and will throw an error if we go
			// out-of-bounds on the given collection. As such, we have to make sure that
			// our group-length argument never goes past the last index.
			var groupLength = min( maxLength, ( collectionLength - i + 1 ) );

			groups.append( collection.slice( i, groupLength ) );

		}

		return( groups );

	}

</cfscript>

Here, we're splitting the collection of 11 letters into groups of 5. And, when we run this ColdFusion code, we get the following output:

An array of 11 elements was split into 2 groups of 5 elements and 1 group containing the remaining 1 element.

As you can see, the collection of 11 letters was split into 3 groups: 2 of max-length 5 and then 1 containing the single remaining element.

I really wish that the arraySlice() method in ColdFusion were safe. Having to worry about going out-of-bounds is quite a pain. A few years ago, I wrote an arraySliceSafe() function, and then demonstrated how many other Array functions become trivial to write on top of it.
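
To give a rough sense of that idea, a safe slice might look something like the following minimal sketch. To be clear, this is an approximation that only handles positive offsets; it is not the exact implementation from that earlier post:

<cfscript>

	/**
	* I am a rough sketch of a "safe" slice that clamps its bounds instead of
	* throwing an error when the requested range runs past the end of the array.
	*/
	public array function arraySliceSafe(
		required array collection,
		required numeric offset,
		required numeric length
		) {

		var collectionLength = collection.len();
		var results = [];

		// Clamp both bounds so that we never read outside of the array.
		var fromIndex = max( 1, offset );
		var toIndex = min( ( offset + length - 1 ), collectionLength );

		for ( var i = fromIndex ; i <= toIndex ; i++ ) {

			results.append( collection[ i ] );

		}

		return( results );

	}

</cfscript>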

Anyway, just a fun little weekend code kata. ColdFusion code is easy like a Sunday morning.


Reader Comments

Chris
205 Comments

We've probably all written utility functions like this. Super helpful. The most challenging part (for me) is naming these functions. Yours is named very well. I think I called mine chunk() LOL 🤣

Ben Nadel
15,674 Comments

@Chris,

You can't go wrong with "chunk" 😆 Honestly, I use "chunk" and "group" interchangeably in my code all the time.

Ben Nadel
15,674 Comments

In this post, I mentioned that I needed this "split array" concept in order to get around some issues that I was running into with a large SQL statement. As a quick follow-up, I wrote about that here:

www.bennadel.com/blog/4238-avoiding-mysql-max-allowed-packet-errors-by-splitting-up-large-aggregation-queries-in-coldfusion.htm

Basically, I was generating a WHERE IN() clause that was too large for the MySQL server. So, I had to take the list of IDs, break it up into chunks, and then divide-and-conquer the queries.
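
In rough terms, the divide-and-conquer looked something like the following sketch. Note that the "user" table and column names here are hypothetical stand-ins, not the actual schema from that post:

// Gather results across all of the chunked queries.
rows = [];

for ( idChunk in splitArrayIntoGroups( userIDs, 1000 ) ) {

	results = queryExecute(
		"
			SELECT
				u.id,
				u.name
			FROM
				user u
			WHERE
				u.id IN ( :ids )
		",
		{
			// The (list = true) setting expands the comma-delimited list
			// into individual bound parameters for the IN() clause.
			ids: {
				value: idChunk.toList(),
				cfsqltype: "cf_sql_integer",
				list: true
			}
		}
	);

	// Merge this chunk's rows into the overall collection.
	for ( row in results ) {

		rows.append( row );

	}

}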

Robert
18 Comments

I too have written a chunk function, and my first implementation used slice. On large arrays (100k+ elements), response time would be measured in seconds, so I had to go back to the drawing board.

Ben Nadel
15,674 Comments

@Robert,

Curious to know what kind of solution you ended-up with? Just a totally different approach altogether, like just grabbing the "first-N" elements instead of splitting the entire array?

Robert
18 Comments

While not very pretty, it is fast.

function chunk(arr, sz) {
    var out = [];
    // Total number of chunks needed to cover the whole array.
    var ceil = ceiling(arr.len() / sz);
    for(var i = 1; i <= ceil; i++) {
        var t = [];
        // Zero-based offset of the current chunk within the source array.
        var offset = (i - 1) * sz;
        if(i == ceil) {
            // The final chunk may contain fewer than (sz) elements.
            var c = arr.len() - offset < sz ? arr.len() - offset : sz;
            for(var x = 1; x <= c; x++) t.append(arr[offset + x]);
        }
        else {
            for(var x = 1; x <= sz; x++) t.append(arr[offset + x]);
        }
        out.append(t);
    }
    return out;
}

Ben Nadel
15,674 Comments

@Robert,

Ah, very cool. I had a previous implementation that took an approach that looked more like yours. Only, instead of calculating offsets internally, it would just build-up another array. It was something like this:

public array function chunk(
	required array collection,
	required numeric size
	) {

	var results = [];
	var chunk = [];

	for ( var item in collection ) {

		chunk.append( item );

		if ( chunk.len() >= size ) {

			results.append( chunk );
			chunk = [];

		}

	}

	if ( chunk.len() ) {

		results.append( chunk );

	}

	return( results );

}

Really, the only difference here is that I'm iterating over the main collection using a for-in loop and you're doing it using an index loop; but, we're both building up intermediary chunks and then appending them to the result.

I love how many ways there are to do stuff like this!

Ben Nadel
15,674 Comments

@All,

A word of caution: it seems that the arraySlice() function in Lucee has some serious performance overhead and does not play nicely with very large arrays. As the size of the array increases, the performance of the arraySlice() method degrades drastically:

www.bennadel.com/blog/4253-arrayslice-has-an-exponential-performance-overhead-in-lucee-cfml-5-3-8-201.htm

By switching from arraySlice() over to an algorithm, like the ones that @Robert and I were discussing, I was able to improve one data-processing algorithm by 500x.
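
If you want to see the behavior for yourself, a quick-and-dirty timing harness, something like this sketch, will show the trend (the array sizes and the 100-element chunk size here are arbitrary):

// Time how long it takes to slice arrays of increasing size into
// 100-element chunks using the native slice() member method.
for ( size in [ 1000, 10000, 100000 ] ) {

	data = [];

	for ( i = 1 ; i <= size ; i++ ) {

		data.append( i );

	}

	startedAt = getTickCount();

	for ( i = 1 ; i <= size ; i += 100 ) {

		slice = data.slice( i, min( 100, ( size - i + 1 ) ) );

	}

	writeOutput( "Size #size#: #( getTickCount() - startedAt )# ms <br />" );

}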

Gary
16 Comments

Thanks, yet again Ben!

Just wondered how you handle your utilities.cfc? I have a sprawling one that has been updated over, well, basically my entire career, and is instantiated as a singleton and passed into pretty much all of my other CFCs... but if anyone has a more elegant way, I'm sure it's you.

As an aside, I've often run into a similar problem with MSSQL (> 2,000 params) and solved it with a dynamically created CTE containing all the IDs I need, sub-selecting that in the WHERE clause.

Something like:

WITH IDs AS (
	SELECT  ID
	FROM    (VALUES	#Local.IDs#) AS IDs(ID)
)

SELECT 	Products.ID
	,	Products.ProductName
FROM 	Products
WHERE 	Products.ID IN (SELECT ID FROM IDs)

You have to manually sanitize your ID list first as you can't use cfqueryparam, but it works well for me.
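
By way of illustration, that sanitizing step might look something like the following sketch. The isValid() filter is just one possible approach, and the sanitizedIDs variable stands in for the Local.IDs value used above:

// Hypothetical, unsanitized input (the injection-style value is only here
// to demonstrate the filtering).
rawIDs = [ 1, 2, 3, "4; DROP TABLE Products", 5 ];

// Keep only true integers, then wrap each ID as a row constructor for the
// VALUES clause. This yields: (1),(2),(3),(5)
sanitizedIDs = rawIDs
	.filter( ( id ) => isValid( "integer", id ) )
	.map( ( id ) => "(#id#)" )
	.toList();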

Ben Nadel
15,674 Comments

@Gary,

Oh chickens, I love the idea of creating a materialized table with a list of IDs. And, more than that, the fact that you used the comma-delimited nature of the ColdFusion List to provide the VALUES collection - well played!

Re: Utilities.cfc, honestly, I used to try to be more clever with it; but, in the end, I've started to remove the complexity and do exactly what you are doing: one gigantor component with a grab-bag full of small functions. Then, I just inject it wherever it is needed. Basically the same as Lodash, but on the server.

The only real exception to that is if I see a number of utility methods that are all tightly related and specialized, I may break those out into their own library. Kind of like when I wanted to do a bunch of things with Java RegEx functionality, and I put that into its own JRE component.

But mostly, I just jam them all into Utilities.cfc and call it a day.

Gary
16 Comments

Just thought I'd come back and update this with some new information...
Having used this approach for many years, I've finally run into an issue whereby SQL Server is unable to process such a complex query. In this instance, we're passing a list of around 60k IDs which, to be fair, is a bit excessive!

SQL Server returns an error I've never seen before:
The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query.

Careful out there, folks.
