Ask Ben: Iterating Over The Characters In A String

Posted October 2, 2006 at 1:57 PM by Ben Nadel

Tags: ColdFusion, Ask Ben

How can I loop over a string and get each character?

What you are trying to do is iterate over the characters in a string value. In ColdFusion, there are several ways to accomplish this. Some stay within ColdFusion's built-in functions while others reach down into the underlying Java. I will quickly discuss three ways here.

First, as always though, let's just build a string to test with:

    <!--- Set up a string to iterate over. --->
    <cfset strText = "Oh man! That Libby is one hot mamma!" />

Now, for our testing, we will iterate over each character and output, inside brackets, both the index of the character and the character itself.

ColdFusion Mid() Function Method

Probably the simplest and most commonly used method is to loop from one to the length of the string and use Mid() to get the character at each index:

    <!--- Loop over the length of the string. --->
    <cfloop
        index="intChar"
        from="1"
        to="#Len( strText )#"
        step="1">

        <!---
            Get the character at the given index. The ColdFusion
            Mid() function takes the string, the starting index
            and the number of characters to return. In this case,
            we only want one character, so the length is one.
        --->
        <cfset strChar = Mid( strText, intChar, 1 ) />

        <!--- Output the character and the index. --->
        [#intChar#:#strChar#]

    </cfloop>

This gives us the following output:

[1:O] [2:h] [3: ] [4:m] [5:a] [6:n] [7:!] [8: ] [9:T] [10:h] [11:a] [12:t] [13: ] [14:L] [15:i] [16:b] [17:b] [18:y] [19: ] [20:i] [21:s] [22: ] [23:o] [24:n] [25:e] [26: ] [27:h] [28:o] [29:t] [30: ] [31:m] [32:a] [33:m] [34:m] [35:a] [36:!]
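For comparison, here is a rough sketch of the same loop in plain Java, the language ColdFusion runs on. The class and method names are made up for illustration. The point it shows is that a one-based Mid( strText, intChar, 1 ) lines up with a zero-based charAt( intChar - 1 ) on the Java side:

```java
// Illustrative sketch only: MidStyleLoop and bracketChars are hypothetical names.
class MidStyleLoop {

    // Builds "[1:O] [2:h] ..." with one-based labels, like the CFML loop above.
    static String bracketChars(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 1; i <= text.length(); i++) {
            // Java strings are indexed from zero, so the character that
            // ColdFusion calls position i lives at charAt(i - 1).
            char ch = text.charAt(i - 1);
            if (i > 1) {
                out.append(' ');
            }
            out.append('[').append(i).append(':').append(ch).append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: [1:O] [2:h] [3: ] [4:m] [5:a] [6:n] [7:!]
        System.out.println(bracketChars("Oh man!"));
    }
}
```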

ColdFusion / Java Byte Array Method

This one takes the complexity up a notch. We are going to convert the string to a byte array. For plain ASCII text like our test string, each index holds the byte value of a single character in the string. Then, instead of looping over the length of the string, we loop over the length of the byte array.

    <!---
        Get the byte array from the text string. Here we are
        directly calling the underlying Java methods of the
        string object.
    --->
    <cfset arrBytes = strText.GetBytes() />

    <!---
        Loop over the byte array as if it were a standard
        ColdFusion array object.
    --->
    <cfloop
        index="intChar"
        from="1"
        to="#ArrayLen( arrBytes )#"
        step="1">

        <!---
            Get the character at the given index. The bytes are
            the numeric ASCII codes of the characters. We have to
            get the Chr() of those values.
        --->
        <cfset strChar = Chr( arrBytes[ intChar ] ) />

        <!--- Output character and the index. --->
        [#intChar#:#strChar#]

    </cfloop>

This gives us the exact same output as the first method. As you can see, even though we are returning a Java byte array for the string (which is traditionally indexed at zero), we can treat it as if it were a standard ColdFusion array and start the index at one. ColdFusion rocks for some nice automatic conversions.
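One caveat with the byte array approach: it only lines up one byte per character for plain ASCII text. The no-argument getBytes() uses the platform's default character set, and in UTF-8 an accented character takes more than one byte. The plain-Java sketch below illustrates the difference. The class and method names are invented for the example, and it names the character sets explicitly rather than relying on the default:

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: ByteArrayCaveat and its methods are hypothetical names.
class ByteArrayCaveat {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8ByteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Rebuilds the text one byte at a time, the way the loop above does
    // with Chr(). This is only safe when the text is pure ASCII.
    static String charsFromAsciiBytes(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            out.append((char) b);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String ascii = "hot mamma!";
        String accented = "caf\u00e9"; // "cafe" with an accented e: 4 characters

        // prints: 10 chars, 10 bytes
        System.out.println(ascii.length() + " chars, " + utf8ByteCount(ascii) + " bytes");
        // prints: 4 chars, 5 bytes
        System.out.println(accented.length() + " chars, " + utf8ByteCount(accented) + " bytes");
    }
}
```

So for ASCII-only strings the byte loop and the character loop agree, but once extended characters show up, the byte count and the character count drift apart.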

Java String Character Iterator Method

This takes it up one more notch. This creates an instance of the Java string character iterator, java.text.StringCharacterIterator, whose sole purpose in life is to do exactly what you are trying to do. We initialize the iterator with the given string, then we keep asking it for the next character until it is done iterating:

    <!---
        Create a string character iterator. Pass it the string as
        a constructor parameter so that it can create its own
        internal representation of the string.
    --->
    <cfset objIterator = CreateObject(
        "java",
        "java.text.StringCharacterIterator"
        ).Init(
            strText
            ) />

    <!---
        We want to keep looping until the current character is
        the DONE character.
    --->
    <cfloop condition="objIterator.Current() NEQ objIterator.DONE">

        <!--- Get the current character. --->
        <cfset strChar = objIterator.Current() />

        <!--- Output the current character. --->
        [#objIterator.GetIndex()#:#strChar#]

        <!--- Get the iterator to move onto the next character. --->
        <cfset objIterator.Next() />

    </cfloop>

This gives us "basically" the same output. Sort of. It gives us the same characters but at different indexes:

[0:O] [1:h] [2: ] [3:m] [4:a] [5:n] [6:!] [7: ] [8:T] [9:h] [10:a] [11:t] [12: ] [13:L] [14:i] [15:b] [16:b] [17:y] [18: ] [19:i] [20:s] [21: ] [22:o] [23:n] [24:e] [25: ] [26:h] [27:o] [28:t] [29: ] [30:m] [31:a] [32:m] [33:m] [34:a] [35:!]

As you can see, this starts at index zero, not one. This is because Java is zero-based and we are using it in a completely Java way. In the second example we created a Java array but used it as if it were a ColdFusion array, hence the automatic index translation. In this case, however, the iteration is done internally by the Java StringCharacterIterator instance, so there is no automatic conversion.
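If you wanted the iterator's output to match the first two methods, you could add one to the index yourself. Here is a small plain-Java sketch of that idea. The class name, method name and the offset parameter are my own inventions for the example; the first(), next(), getIndex() and DONE members are the standard java.text.CharacterIterator ones:

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Illustrative sketch only: IteratorLoop and bracketChars are hypothetical names.
class IteratorLoop {

    // Walks the string with a StringCharacterIterator. An offset of 0 keeps
    // Java's zero-based index; an offset of 1 gives ColdFusion-style labels.
    static String bracketChars(String text, int offset) {
        CharacterIterator it = new StringCharacterIterator(text);
        StringBuilder out = new StringBuilder();
        for (char ch = it.first(); ch != CharacterIterator.DONE; ch = it.next()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append('[').append(it.getIndex() + offset).append(':').append(ch).append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: [0:O] [1:h] [2:!]
        System.out.println(bracketChars("Oh!", 0));
        // prints: [1:O] [2:h] [3:!]
        System.out.println(bracketChars("Oh!", 1));
    }
}
```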

So that's three different ways to iterate over a string. Which one to use? I guess it depends on what you are trying to do. Each of them is going to have pros and cons. I have not done any specific testing, but my guess is that the byte array method (example 2) is the fastest, since iterating over an array is generally fast. I don't know how the character iterator object does things, so it might be faster or slower than the Mid() methodology. But of course, it depends on how simple or complex you want to be. If it's simple, go with the Mid() methodology... it's just easy. But if you have a ton of data to go through, you might consider the slightly more complicated byte array method.

I, personally, have never used the character iterator in practice. I suppose that you would use it if you need to pass it off to another black-boxed algorithm that is expecting something that implements the CharacterIterator interface. But I doubt you are doing that.



Reader Comments

Jan 3, 2008 at 3:37 PM

Would the byte array method work with UTF-8?


Jan 3, 2008 at 4:49 PM

@Kate,

Interesting question! I am not sure off-hand.


Jan 29, 2008 at 7:41 PM

Is it possible to take an XML file, read it and convert it to a single byte array? Also is it possible to determine the total filesize of that array?

Thanks,
Jimmy


Jan 30, 2008 at 8:04 AM

@Jimmy,

In the underlying Java of ColdFusion, you can take a string and convert it to a byte array by simply calling .GetBytes() on it. This returns data of type byte[].

The file size, I assume would just be the length of the array? If each index is one byte, the file size is directly mirrored by the length of the array?? I think.


Jan 22, 2009 at 11:29 PM

If you want to get the length with unicode characters, then you need to use GetBytes("UTF8") instead of GetBytes().

Then you can do either Len(ArrBytes) or ArrayLen(ArrBytes) to get the number of bytes.


Jan 23, 2009 at 9:36 AM

@Matt,

Thanks for the tip. I deal so little with extended characters, I have to admit that unicode is a real weakness in my understanding.


Jan 23, 2009 at 2:09 PM

Not much strength either ;). But recently I found a bug in ColdFusion where ColdFusion forces IIS to close a connection even if you have Keep-Alive enabled. This is due to ColdFusion not passing a "Content-Length" response header as it should.

I have to calculate the document size and then return it in Content-Length, but I realized that I had unicode characters and GetBytes() counted them as 1 byte instead of 3 or 4.

So GetBytes("Utf8") will count your document correctly, whether your document has Unicode characters or not.


May 28, 2009 at 10:19 AM

Many thanks for the information. It was very helpful for me.

Regards.


Nov 29, 2010 at 9:00 PM

Hey Ben,

Just used some of the above code. It's still good after 4+ years. Just wanted to thank you. Amazing how you take the time to provide these awesome posts.


Dec 5, 2010 at 3:09 PM

@Jason,

My pleasure good man :)


Jul 13, 2011 at 10:16 AM

I want to loop over a string to determine if there are ASCII characters ....ex:char(124) %20

am i missing a way to determine this via the iterations noted above?


Sep 8, 2011 at 3:35 PM

Hey Ben, I've read / learned / used a bunch off of your posts and haven't thanked you yet - so first off thanks!

Second thing, I was looking for ways to iterate over a string's characters because I was trying to trim multiple consecutive blank spaces in the middle of a string from a data feed before storing it in my database. These work great - but I was curious if you knew of another (better) method to use for that specific purpose? If not I'll just use the byte array method.

thx again,
Mike

