Ask Ben: Breaking An SMS Text Message Up Into Multiple Parts

Posted December 27, 2007 at 2:34 PM

Tags: ColdFusion, Ask Ben

This isn't necessarily an "Ask Ben" question; Michael Appenzellar had brought up the concept of breaking up an SMS text message into multiple parts that were, at most, 120 characters each. He was having a bit of trouble breaking it up, so I thought I would throw together a quick little demo. To start with, let's create the text message that you might want to send to someone around this time of the year:


    <!---
        Store the message that we would like to split up
        into MAX:120 character segments.
    --->
    <cfsavecontent variable="strMessage">

        Deborah, thank you so much for coming over for Christmas
        celebrations. I had quite a fabulous time. I hope that
        the present I got for you was not offensive; I just fancy
        you rather attractive and I could only imagine that that
        kind of outfit would have looked insanely delicious on you.
        Happy Holidays.

    </cfsavecontent>


    <!---
        Clean the message - trim it and replace out special
        characters (line breaks, tabs, carriage returns) with
        a space.
    --->
    <cfset strMessage = REReplace(
        Trim( strMessage ),
        "[\t\r\n\s]+",
        " ",
        "all"
        ) />

Don't pay attention to that REReplace() - that just takes the string stored using ColdFusion's CFSaveContent tag and strips out the extra tabbing and line breaks. I just like using CFSaveContent for formatting / display reasons.

Ok, now that we have our message, we want to break it up into 120 max-character SMS text messages. Initially, you might just try to use ColdFusion's Mid() function to grab every 120 character substring of the message:


    <!---
        Break the message into 120 character strings. The
        CFOutput wrapper is needed so that the #...# expression
        inside the loop is evaluated rather than printed as-is.
    --->
    <cfoutput>
    <cfloop
        index="intOffset"
        from="1"
        to="#Len( strMessage )#"
        step="120">

        <!--- Output this max:120 character segment. --->
        <p>
            #Mid( strMessage, intOffset, 120 )#
        </p>

    </cfloop>
    </cfoutput>

On paper, this looks good, but when you run it, you see that it's not quite ideal. We end up splitting the message up into these three segments:

Deborah, thank you so much for coming over for Christmas celebrations. I had quite a fabulous time. I hope that the pres

ent I got for you was not offensive; I just fancy you rather attractive and I could only imagine that that kind of outfi

t would have looked insanely delicious on you. Happy Holidays.

As you can see, the word "present" in the first line and the word "outfit" in the second line are split between two SMS text messages. The problem here is that Mid() has no context; it has no understanding of the problem in which it is being used. As such, it doesn't care about splitting words.

Now, you could take that and start adding a bunch of logic to backtrack character by character until you hit a space, and then adjust your start offset accordingly. That can all get sticky. The easier approach is to leverage the robust rules that can be applied using Regular Expressions. We can think of each SMS message segment as a pattern: the captured match must be at most 120 characters long and must end on an appropriate character (meaning, it cannot end in the middle of a word).
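To make the "sticky" alternative concrete, here is a rough sketch of what manual backtracking might look like. It is only an illustration, not code from the original demo: the variable names (intMax, intStart, intLength, intBack, arrManual) are made up for this example, it treats only a space as a break point, and it assumes strMessage has already been cleaned as shown earlier.

```cfm
<!--- Hypothetical manual approach: take 120 characters, then back up to the last space. --->
<cfset intMax = 120 />
<cfset intStart = 1 />
<cfset arrManual = ArrayNew( 1 ) />

<cfloop condition="intStart LTE Len( strMessage )">

	<!--- Tentatively take up to intMax characters from the current offset. --->
	<cfset intLength = Min( intMax, Len( strMessage ) - intStart + 1 ) />

	<!---
		If more text follows this chunk and the next character is
		not a space, the chunk ends in the middle of a word.
	--->
	<cfif (
		( ( intStart + intLength ) LTE Len( strMessage ) ) AND
		( Mid( strMessage, intStart + intLength, 1 ) NEQ " " )
		)>

		<!--- Walk backwards until the last character of the chunk is a space. --->
		<cfset intBack = intLength />

		<cfloop condition="( intBack GT 1 ) AND ( Mid( strMessage, intStart + intBack - 1, 1 ) NEQ ' ' )">
			<cfset intBack = intBack - 1 />
		</cfloop>

		<!--- Use the shorter length only if a space was found; otherwise hard-split the long word. --->
		<cfif ( intBack GT 1 )>
			<cfset intLength = intBack />
		</cfif>

	</cfif>

	<!--- Store the segment and move the offset past it. --->
	<cfset ArrayAppend( arrManual, Trim( Mid( strMessage, intStart, intLength ) ) ) />
	<cfset intStart = intStart + intLength />

</cfloop>
```

Even in this stripped-down form there are two loops, an offset to keep in sync, and a special case for words with no space in range, which is the bookkeeping the regular expression approach avoids.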

I am going to arbitrarily say that a word is considered "split in half" if the next matched character is NOT a space, dash, colon, or "end of string" character. Anything that does not follow this rule must remain grouped together. To apply this kind of pattern rule, we are going to use a positive look ahead:

.{1,120}(?=([\s\-:]|$))
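Read piece by piece, the pattern has two parts. The sketch below assembles the same pattern string from those parts purely for explanation; the variable names (strBody, strLookAhead, strPattern) are illustrative and not part of the original demo.

```cfm
<!---
	The body: between 1 and 120 of any character. The quantifier
	is greedy, so it grabs as many characters as it can first and
	only gives characters back if the look-ahead below fails.
--->
<cfset strBody = ".{1,120}" />

<!---
	The positive look-ahead: whatever comes right after the body
	must be whitespace (\s), a dash (\-), a colon (:), or the end
	of the string ($). The look-ahead only checks this character;
	it does not include it in the match.
--->
<cfset strLookAhead = "(?=([\s\-:]|$))" />

<!--- Joined together, this is the same pattern string shown above. --->
<cfset strPattern = ( strBody & strLookAhead ) />
```

Because the look-ahead does not consume the boundary character, that character becomes the first character of the following match.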

Now, using that pattern in conjunction with ColdFusion 8's new REMatch() function makes this almost too easy:


    <!---
        Get a 120 limit character pattern using regular
        expression. This is a pattern that can match up to
        120 characters and MUST be followed by an acceptable
        word boundary.
    --->
    <cfset arrSegments = REMatch(
        ".{1,120}(?=([\s\-:]|$))",
        strMessage
        ) />


    <!---
        Output the segments returned in the array. The CFOutput
        wrapper is needed so that #Trim( strSegment )# is evaluated.
    --->
    <cfoutput>
    <cfloop
        index="strSegment"
        array="#arrSegments#">

        <p>
            #Trim( strSegment )#
        </p>

    </cfloop>
    </cfoutput>

Running this code, we get the following, more appropriate output:

Deborah, thank you so much for coming over for Christmas celebrations. I had quite a fabulous time. I hope that the

present I got for you was not offensive; I just fancy you rather attractive and I could only imagine that that kind of

outfit would have looked insanely delicious on you. Happy Holidays.

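Notice that this time, both the words "present" and "outfit" remain intact and are moved completely to the next SMS text message. Works like a charm. And, since regular expression pattern matching always picks up where the previous match left off, you never have to worry about word wrapping conflicting with the next segment match. (Each match after the first begins with the space that ended the previous one, which is why the output code calls Trim() on every segment.)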
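One caveat seems worth guarding against. If the message contains an unbroken run longer than the limit (a very long URL, say), no match can begin at the start of that run, so the regular expression engine moves forward and the leading characters of the run end up in none of the segments. The sketch below wraps the same pattern in a function and compares the total matched length with the message length to catch that case. The function name SplitMessage, its MaxLength argument, and the error type are all hypothetical, and the function assumes the message has already been cleaned of line breaks.

```cfm
<cffunction
	name="SplitMessage"
	access="public"
	returntype="array"
	output="false"
	hint="Hypothetical helper: splits a cleaned message into segments of at most MaxLength characters.">

	<cfargument name="Message" type="string" required="true" />
	<cfargument name="MaxLength" type="numeric" required="false" default="120" />

	<cfset var LOCAL = StructNew() />

	<!--- Same pattern as above, with the limit passed in as an argument. --->
	<cfset LOCAL.Matches = REMatch(
		".{1,#ARGUMENTS.MaxLength#}(?=([\s\-:]|$))",
		ARGUMENTS.Message
		) />

	<!--- Trim each match and keep a running total of matched characters. --->
	<cfset LOCAL.Segments = ArrayNew( 1 ) />
	<cfset LOCAL.MatchedLength = 0 />

	<cfloop index="LOCAL.Match" array="#LOCAL.Matches#">
		<cfset LOCAL.MatchedLength = ( LOCAL.MatchedLength + Len( LOCAL.Match ) ) />
		<cfset ArrayAppend( LOCAL.Segments, Trim( LOCAL.Match ) ) />
	</cfloop>

	<!---
		The matches never overlap, so if their lengths do not add up
		to the message length, some characters were skipped.
	--->
	<cfif ( LOCAL.MatchedLength NEQ Len( ARGUMENTS.Message ) )>
		<cfthrow
			type="SplitMessage.TextDropped"
			message="Part of the message could not be placed in any segment."
			/>
	</cfif>

	<cfreturn LOCAL.Segments />
</cffunction>
```

It could then be called as:

```cfm
<cfset arrSegments = SplitMessage( strMessage, 120 ) />
```

Throwing an error is just one choice; hard-splitting the long word, as the naive Mid() approach does, would be another.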

I hope that helps in some way.


Reader Comments

Dec 27, 2007 at 3:04 PM
102 Comments

Nice! I'd been looking for something like this a while back. I'll keep this code handy :)


Dec 27, 2007 at 4:18 PM
6,516 Comments

@Gareth,

No problem.


Mar 22, 2009 at 5:04 AM
16 Comments

Whenever I see your code, and particularly RegExp, I get amazed, and feel you are writing magic.

Really I don't know how this:

.{1,120}(?=([\s\-:]|$))

will be interpreted to find the "end of string" character within limit of 120 characters !!
especially this face ... I mean part:

\-:

joking ... :)

But seriously I think I will never come with this solution to fix the word splitting bug. As I know my self, immediately I will use cfloop and basic cfif to check and do it. Any time I start learning RegExp I feel it difficult and get lazy to continue.

Any Way thanks a lot for your very interesting blog. I hope you always keep improved, and I wish if I can keep up your posts.
:)


Mar 22, 2009 at 12:29 PM
6,516 Comments

@Ameen,

Thanks man. If you ever need help with any regular expression stuff, just let me know.


Aug 21, 2009 at 12:45 PM
4 Comments

This regex IS awesome. Off to http://www.regular-expressions.info/lookaround.html to learn more...

Apparently -- Ben, please correct me if I'm wrong -- Ben is using a "positive lookahead" to match at least one and at most 120 consecutive characters followed by an "acceptable word boundary". An "acceptable word boundary" is a space (\s) or a dash (-) or a colon (:) or end of string ($). REMatch then returns an array of all matched occurrences. The - (hyphen) has to be escaped because it's a special character. So we have the smiley face \-:.

regular-expressions.info explains this with a very simple example: q(?=u) matches a q that is followed by a u, without making the u part of the match. Therefore, in Ben's regex, the "acceptable word boundary" is not returned, and we get at most 120 characters in each match. The "acceptable word boundary" that delimited the previous match is included in the next because it falls under the dot (which means "anything") metacharacter.

Another note: dot will not match a newline, but as Ben has already removed all the newlines, it is a non issue. However, the example will not work without removing the newlines.

Regular expressions make my brain hurt...


Sep 6, 2009 at 1:33 PM
6,516 Comments

@Nikola,

Correct, I am using a positive look-ahead. They are very powerful. The site you mention, regular-expressions.info, is very, very good. I refer to it all the time.

