Ask Ben: Breaking An SMS Text Message Up Into Multiple Parts

Posted December 27, 2007 at 2:34 PM

Tags: Ask Ben, ColdFusion

This isn't strictly an "Ask Ben" question; Michael Appenzellar brought up the idea of breaking an SMS text message into multiple parts of at most 120 characters each. He was having a bit of trouble with it, so I thought I would throw together a quick little demo. To start, let's create the text message that you might want to send to someone around this time of the year:


    <!---
        Store the message that we would like to split up
        into segments of at most 120 characters.
    --->
    <cfsavecontent variable="strMessage">

        Deborah, thank you so much for coming over for Christmas
        celebrations. I had quite a fabulous time. I hope that
        the present I got for you was not offensive; I just fancy
        you rather attractive and I could only imagine that that
        kind of outfit would have looked insanely delicious on you.
        Happy Holidays.

    </cfsavecontent>


    <!---
        Clean the message - trim it and collapse each run of
        white space (line breaks, tabs, carriage returns, spaces)
        into a single space. Note that \s already covers tabs
        and line breaks.
    --->
    <cfset strMessage = REReplace(
        Trim( strMessage ),
        "\s+",
        " ",
        "all"
        ) />

Don't pay too much attention to that REReplace(); it just takes the string stored using ColdFusion's CFSaveContent tag and collapses the extra tabs and line breaks into single spaces. I just like using CFSaveContent for formatting / display reasons.

Ok, now that we have our message, we want to break it up into SMS text messages of at most 120 characters each. Initially, you might just try to use ColdFusion's Mid() function to grab every 120-character substring of the message:


    <!--- Break the message into 120 character strings. --->
    <cfoutput>
        <cfloop
            index="intOffset"
            from="1"
            to="#Len( strMessage )#"
            step="120">

            <!--- Output this segment (at most 120 characters). --->
            <p>
                #Mid( strMessage, intOffset, 120 )#
            </p>

        </cfloop>
    </cfoutput>

On paper, this looks good, but when you run it, you see that it's not quite ideal. We end up splitting the message up into these three segments:

Deborah, thank you so much for coming over for Christmas celebrations. I had quite a fabulous time. I hope that the pres

ent I got for you was not offensive; I just fancy you rather attractive and I could only imagine that that kind of outfi

t would have looked insanely delicious on you. Happy Holidays.

As you can see, the word "present" in the first line and the word "outfit" in the second line are split between two SMS text messages. The problem here is that Mid() has no context; it has no understanding of the problem in which it is being used. As such, it doesn't care about splitting words.

Now, you could take that and start adding a bunch of logic to backtrack characters until you hit a space and then adjust your start offset and so on. That can all get sticky. The easier approach is to leverage the robust rules that can be applied using regular expressions. We can think of our SMS message segments as following a pattern: each captured match must be at most 120 characters and must end at an appropriate boundary (meaning, it cannot end in the middle of a word).
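For comparison, here is a rough sketch of what that hand-rolled backtracking might look like. The function name splitByBacktracking and its maxLength argument are my own inventions for illustration, and it assumes the message has already been cleaned so that every run of white space is a single space:

```cfm
<!---
    Hypothetical helper (not part of the original demo): split a
    message by hand, backing up from the length limit until the
    next character is a space, dash, or colon.
--->
<cffunction name="splitByBacktracking" returntype="array" output="false">
    <cfargument name="message" type="string" required="true" />
    <cfargument name="maxLength" type="numeric" required="false" default="120" />

    <cfset var segments = ArrayNew( 1 ) />
    <cfset var remaining = Trim( arguments.message ) />
    <cfset var cut = 0 />

    <!--- Keep cutting while the text is too long for one segment. --->
    <cfloop condition="Len( remaining ) GT arguments.maxLength">

        <!--- Start at the limit and back up to the nearest break. --->
        <cfset cut = arguments.maxLength />
        <cfloop condition="cut GT 0 AND NOT Find( Mid( remaining, cut + 1, 1 ), ' -:' )">
            <cfset cut = cut - 1 />
        </cfloop>

        <!--- No break found: fall back to a hard split at the limit. --->
        <cfif cut EQ 0>
            <cfset cut = arguments.maxLength />
        </cfif>

        <cfset ArrayAppend( segments, Trim( Left( remaining, cut ) ) ) />
        <cfset remaining = Trim( Mid( remaining, cut + 1, Len( remaining ) - cut ) ) />

    </cfloop>

    <!--- Whatever is left fits in a single segment. --->
    <cfif Len( remaining )>
        <cfset ArrayAppend( segments, remaining ) />
    </cfif>

    <cfreturn segments />
</cffunction>
```

It works, but there are two loops, an offset to keep in sync, and a special case for text with no break at all, which is exactly the kind of bookkeeping I would rather not maintain.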

I am going to arbitrarily say that a word is considered "split in half" if the character immediately following the match is NOT a space, dash, colon, or the end of the string. Anything that does not follow this rule must remain grouped together. To apply this kind of pattern rule, we are going to use a positive lookahead:

.{1,120}(?=([\s\-:]|$))

Now, using that pattern in conjunction with ColdFusion 8's new REMatch() function makes this almost too easy:


    <!---
        Get the message segments using a regular expression. The
        pattern matches up to 120 characters and MUST be followed
        by an acceptable word boundary.
    --->
    <cfset arrSegments = REMatch(
        ".{1,120}(?=([\s\-:]|$))",
        strMessage
        ) />


    <!--- Output the segments returned in the array. --->
    <cfoutput>
        <cfloop
            index="strSegment"
            array="#arrSegments#">

            <!--- Trim off the space that each later match starts with. --->
            <p>
                #Trim( strSegment )#
            </p>

        </cfloop>
    </cfoutput>

Running this code, we get the following, more appropriate output:

Deborah, thank you so much for coming over for Christmas celebrations. I had quite a fabulous time. I hope that the

present I got for you was not offensive; I just fancy you rather attractive and I could only imagine that that kind of

outfit would have looked insanely delicious on you. Happy Holidays.

Notice that this time, both the words "present" and "outfit" remain intact, having moved completely to the next SMS text message. Works like a charm. And, since regular expression pattern matching always picks up where the previous match left off, you never have to worry about word wrapping conflicting with the next segment match.
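If you want to reuse this, the pattern can be wrapped in a small user defined function. What follows is only a sketch: the name splitSmsMessage and the maxLength argument are hypothetical (they are not part of the demo above), and the function assumes the message has already been cleaned of line breaks and tabs:

```cfm
<!---
    Hypothetical wrapper around the REMatch() approach. The
    maximum segment length is an argument (default: 120).
--->
<cffunction name="splitSmsMessage" returntype="array" output="false">
    <cfargument name="message" type="string" required="true" />
    <cfargument name="maxLength" type="numeric" required="false" default="120" />

    <cfset var segments = "" />
    <cfset var i = 0 />

    <!--- Same pattern as before, with the limit swapped in. --->
    <cfset segments = REMatch(
        ".{1,#arguments.maxLength#}(?=([\s\-:]|$))",
        arguments.message
        ) />

    <!--- Trim the leading space left on the later segments. --->
    <cfloop index="i" from="1" to="#ArrayLen( segments )#">
        <cfset segments[ i ] = Trim( segments[ i ] ) />
    </cfloop>

    <cfreturn segments />
</cffunction>

<!--- Usage: a tiny limit makes the word wrapping easy to see. --->
<cfset arrDemo = splitSmsMessage( "the quick brown fox", 10 ) />
<!--- arrDemo should hold two segments: "the quick" and "brown fox". --->
```

One thing to watch: a single run of more than maxLength characters that contains no space, dash, or colon cannot satisfy the lookahead, so the leading part of that run gets skipped rather than split. If your messages might contain very long URLs or similar, you would need to handle that case separately.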

I hope that helps in some way.






Reader Comments

Nice! I'd been looking for something like this a while back. I'll keep this code handy :)

Posted by Gareth on Dec 27, 2007 at 3:04 PM


@Gareth,

No problem.

Posted by Ben Nadel on Dec 27, 2007 at 4:18 PM

