Ask Ben: Breaking An SMS Text Message Up Into Multiple Parts

Posted December 27, 2007 at 2:34 PM by Ben Nadel

Tags: ColdFusion, Ask Ben

This isn't strictly an "Ask Ben" question; Michael Appenzellar brought up the idea of breaking an SMS text message into multiple parts of at most 120 characters each. He was having a bit of trouble breaking it up, so I thought I would throw together a quick little demo. To start with, let's create the text message that you might want to send to someone around this time of year:

    <!---
        Store the message that we would like to split up
        into MAX:120 character segments.
    --->
    <cfsavecontent variable="strMessage">

        Deborah, thank you so much for coming over for Christmas
        celebrations. I had quite a fabulous time. I hope that
        the present I got for you was not offensive; I just fancy
        you rather attractive and I could only imagine that that
        kind of outfit would have looked insanely delicious on you.
        Happy Holidays.

    </cfsavecontent>


    <!---
        Clean the message - trim it and replace out special
        characters (line breaks, tabs, carriage returns) with
        a space.
    --->
    <cfset strMessage = REReplace(
        Trim( strMessage ),
        "[\t\r\n\s]+",
        " ",
        "all"
        ) />

Don't worry too much about that REReplace(); it just takes the string stored with ColdFusion's CFSaveContent tag and collapses the extra tabs and line breaks into single spaces. I just like using CFSaveContent for formatting and display reasons.
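For readers who want to try the idea outside ColdFusion, here is a rough Python equivalent of this clean-up step. It is a sketch, not the post's code, and `clean_message` is a hypothetical helper name. Because `\s` already covers tabs, carriage returns and line feeds, a single `\s+` pattern does the same job as the character class used above.

```python
import re


def clean_message(raw):
    # Trim the ends, then collapse each run of whitespace (spaces, tabs,
    # carriage returns, line feeds) into a single space. This mirrors
    # Trim() followed by REReplace( ..., "[\t\r\n\s]+", " ", "all" ).
    return re.sub(r"\s+", " ", raw.strip())


raw = """
Deborah, thank you so much for coming over for Christmas
celebrations. I had quite a fabulous time.
"""
print(clean_message(raw))
```

Python's `.` does not match a newline by default, so, as with the ColdFusion version, this clean-up has to run before any `.`-based splitting pattern is applied.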

Ok, now that we have our message, we want to break it up into SMS text messages of at most 120 characters each. Initially, you might just try to use ColdFusion's Mid() function to grab each 120-character substring of the message:

    <!--- Break the message into 120 character strings. --->
    <cfloop
        index="intOffset"
        from="1"
        to="#Len( strMessage )#"
        step="120">

        <!---
            Output this max:120 character segment. CFOutput is
            needed so that the #Mid()# expression is evaluated.
        --->
        <cfoutput>
            <p>
                #Mid( strMessage, intOffset, 120 )#
            </p>
        </cfoutput>

    </cfloop>

On paper, this looks good, but when you run it, you see that it's not quite ideal. We end up splitting the message up into these three segments:

Deborah, thank you so much for coming over for Christmas celebrations. I had quite a fabulous time. I hope that the pres

ent I got for you was not offensive; I just fancy you rather attractive and I could only imagine that that kind of outfi

t would have looked insanely delicious on you. Happy Holidays.

As you can see, the word "present" in the first segment and the word "outfit" in the second segment are each split across two SMS text messages. The problem is that Mid() has no context; it knows nothing about the problem it is being used to solve, so it doesn't care about splitting words.
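The same naive approach can be sketched in Python with plain slicing (a translation for illustration only; `split_fixed_width` and `MESSAGE` are hypothetical names). Slicing every 120 characters reproduces the same three broken segments shown above because, like Mid(), a slice knows nothing about words.

```python
MESSAGE = (
    "Deborah, thank you so much for coming over for Christmas "
    "celebrations. I had quite a fabulous time. I hope that "
    "the present I got for you was not offensive; I just fancy "
    "you rather attractive and I could only imagine that that "
    "kind of outfit would have looked insanely delicious on you. "
    "Happy Holidays."
)


def split_fixed_width(message, width=120):
    # Take every width-sized slice, the way Mid( msg, offset, 120 ) does
    # inside a CFLoop with step="120". Word boundaries are ignored.
    return [message[i:i + width] for i in range(0, len(message), width)]


for segment in split_fixed_width(MESSAGE):
    print(segment)
```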

Now, you could take that and start adding a bunch of logic to backtrack through characters until you hit a space, then adjust your start offset, and so on. That can all get sticky. The easier approach is to leverage the rules we can express with regular expressions. We can think of each SMS message segment as matching a pattern: the match must be at most 120 characters long and must end on an appropriate character (meaning it cannot end in the middle of a word).
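For comparison, the backtracking approach can be sketched in Python as follows. This is a hypothetical helper, `split_by_backtracking`, not code from the post, and it assumes the same break characters this post settles on: a space, dash, or colon.

```python
# Characters treated as acceptable places to break a message.
BREAK_CHARS = " -:"


def split_by_backtracking(message, width=120):
    segments = []
    start = 0
    while start < len(message):
        end = min(start + width, len(message))
        if end < len(message) and message[end] not in BREAK_CHARS:
            # The chunk would end mid-word: back up to the last break
            # character inside the window. That break character then
            # becomes the first character of the next chunk.
            cut = max(message.rfind(ch, start, end) for ch in BREAK_CHARS)
            if cut > start:
                end = cut
            # Otherwise the window has no break character at all, so
            # fall back to a hard cut at `width` characters.
        chunk = message[start:end].strip()
        if chunk:
            segments.append(chunk)
        start = end
    return segments


print(split_by_backtracking("the quick brown fox", width=10))
```

It works, but the bookkeeping (the window, the backtracking, the fallback for words longer than the limit) is exactly the kind of stickiness that makes a pattern-based approach attractive.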

I am going to arbitrarily say that a word is "split in half" if the character following the match is NOT a space, dash, colon, or the end of the string. Anything that does not satisfy this rule must remain grouped together. To apply this kind of rule, we are going to use a positive look-ahead:

.{1,120}(?=([\s\-:]|$))
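To see what the positive look-ahead does and does not consume, here is the same pattern run through Python's `re` module, with the limit lowered from 120 to 10 so the result is easy to read (an illustration, not ColdFusion code).

```python
import re

# The post's pattern with the limit lowered from 120 to 10. The look-ahead
# is written without a capturing group because re.findall() returns the
# captured group, not the whole match, when the pattern contains one.
PATTERN = re.compile(r".{1,10}(?=[\s\-:]|$)")

matches = PATTERN.findall("the quick brown fox")
print(matches)  # ['the quick', ' brown fox']
```

The break character is never part of a match. The space after "quick" ends the first match and then becomes the first character of the second one, which is why the ColdFusion loop that outputs the segments passes each one through Trim().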

Now, using that pattern in conjunction with ColdFusion 8's new REMatch() function makes this almost too easy:

    <!---
        Use a regular expression to match segments of up to
        120 characters. Each match MUST be followed by an
        acceptable word boundary.
    --->
    <cfset arrSegments = REMatch(
        ".{1,120}(?=([\s\-:]|$))",
        strMessage
        ) />


    <!--- Output the segments returned in the array. --->
    <cfloop
        index="strSegment"
        array="#arrSegments#">

        <cfoutput>
            <p>
                #Trim( strSegment )#
            </p>
        </cfoutput>

    </cfloop>

Running this code, we get the following, more appropriate output:

Deborah, thank you so much for coming over for Christmas celebrations. I had quite a fabulous time. I hope that the

present I got for you was not offensive; I just fancy you rather attractive and I could only imagine that that kind of

outfit would have looked insanely delicious on you. Happy Holidays.

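Notice that this time, both "present" and "outfit" remain intact; each was simply moved to the next SMS text message. Works like a charm. And, since regular expression pattern matching always picks up where the previous match left off, you never have to worry about one segment overlapping the next. Because the look-ahead checks the break character without consuming it, the space that ends one segment becomes the first character of the next, which is why each segment is passed through Trim() before it is output.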
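Here is a rough Python translation of the REMatch() version, for comparison. It is a sketch: `split_sms` and `MESSAGE` are hypothetical names, and Python's `re.finditer()` stands in for REMatch(). It produces the same three segments shown above.

```python
import re

MESSAGE = (
    "Deborah, thank you so much for coming over for Christmas "
    "celebrations. I had quite a fabulous time. I hope that "
    "the present I got for you was not offensive; I just fancy "
    "you rather attractive and I could only imagine that that "
    "kind of outfit would have looked insanely delicious on you. "
    "Happy Holidays."
)


def split_sms(message, width=120):
    # 1 to `width` characters that must be followed by whitespace, a dash,
    # a colon, or the end of the string. The look-ahead only checks that
    # character; it does not consume it.
    pattern = re.compile(rf".{{1,{width}}}(?=[\s\-:]|$)")
    # finditer() resumes scanning where the previous match ended, and
    # strip() removes the leading space carried over from the previous
    # break, like Trim() in the CFLoop.
    return [m.group(0).strip() for m in pattern.finditer(message)]


for segment in split_sms(MESSAGE):
    print(segment)
```

One caveat follows from how the pattern is written rather than from anything Python-specific: if a run of more than 120 characters contains no space, dash, or colon (a long URL, for example), no match can start at the beginning of that run. The engine then moves forward one character at a time until a match fits, and the skipped characters are silently dropped; `split_sms("a" * 130)` returns only the last 120 characters. If that can happen in your messages, a fallback hard cut for over-long words is worth adding.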

I hope that helps in some way.




Reader Comments

Gareth, Dec 27, 2007 at 3:04 PM

Nice! I'd been looking for something like this a while back. I'll keep this code handy :)


Ben Nadel, Dec 27, 2007 at 4:18 PM

@Gareth,

No problem.


Ameen, Mar 22, 2009 at 5:04 AM

Whenever I see your code, particularly your RegExp, I am amazed and feel like you are writing magic.

Really I don't know how this:

.{1,120}(?=([\s\-:]|$))

will be interpreted to find the "end of string" character within the limit of 120 characters!!
especially this face ... I mean part:

\-:

joking ... :)

But seriously, I think I would never come up with this solution to fix the word-splitting bug. Knowing myself, I would immediately reach for cfloop and a basic cfif to check and do it. Any time I start learning RegExp, I find it difficult and get lazy about continuing.

Anyway, thanks a lot for your very interesting blog. I hope you keep improving, and I hope I can keep up with your posts.
:)


Ben Nadel, Mar 22, 2009 at 12:29 PM

@Ameen,

Thanks man. If you ever need help with any regular expression stuff, just let me know.


Nikola, Aug 21, 2009 at 12:45 PM

This regex IS awesome. Off to http://www.regular-expressions.info/lookaround.html to learn more...

Apparently (Ben, please correct me if I'm wrong) Ben is using a "positive lookahead" to match at least one and at most 120 consecutive characters followed by an "acceptable word boundary". An "acceptable word boundary" is a whitespace character (\s), a dash (-), a colon (:), or the end of the string ($). REMatch then returns an array of all matched occurrences. The - (hyphen) has to be escaped because it is a special character inside a character class. So we have the smiley face \-:.

regular-expressions.info explains this with a very simple example: q(?=u) matches a q that is followed by a u, without making the u part of the match. Therefore, in Ben's regex, the "acceptable word boundary" is not returned, and we get at most 120 characters in each match. The "acceptable word boundary" that delimited the previous match is included in the next one because it falls under the dot (which means "anything") metacharacter.

Another note: the dot will not match a newline, but as Ben has already removed all the newlines, it is a non-issue. However, the example would not work correctly without removing the newlines.

Regular expressions make my brain hurt...


Ben Nadel, Sep 6, 2009 at 1:33 PM

@Nikola,

Correct, I am using a positive look-ahead. They are very powerful. The site you mention, Regular-Expressions.info, is very good. I refer to it all the time.

