Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

Ask Ben: Displaying A Blog Teaser (Showing The First N Words)

By Ben Nadel on

Hi Ben, I am writing from London, United Kingdom. How can I display the first 50 words of a blog post as a teaser using some jQuery code snippet? I gotta follow you on Twitter. Cheers.

I know you asked to see this as a jQuery code snippet - and I will get to that - but first, I want to approach this from a ColdFusion point of view. I think it makes less sense to do this with jQuery because, by the time jQuery can even execute, the full content of the page has already been downloaded; as such, we don't get the smaller page size and faster load time that a teaser might afford us.

I am sure that there are a number of ways to get the first N words in a string, but to me, the most obvious way is to use a regular expression. While it might not be immediately clear how "pattern matching" can help us in a case like this, if you think of the first 50 words as a "word token" matched at most 50 times in succession, then suddenly, getting the first 50 words with a pattern begins to make sense. For ease of use, when I do something like this in an HTML context, I like to strip out all markup tags before I grab the first N words. Leaving the markup in, while possible, adds a level of complexity that is beyond the scope of this blog entry. Once the markup is removed, I grab the first N words using the regular expression concept I just touched upon.
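To make those two steps concrete, here is a minimal JavaScript sketch that works on a plain string: strip the tags, collapse the white space, then match up to N word tokens from the start of the text. The function name `firstWords` and the simplified tag pattern `<\/?\w+[^>]*>` are hypothetical, chosen for illustration rather than taken from the post; the pattern will not cope with HTML comments or a `>` inside an attribute value.

```javascript
// Hypothetical helper: return at most n leading words of an HTML string.
// Assumes a simplified tag pattern (no comments, no ">" inside attributes).
function firstWords(html, n) {
  // Replace each tag with a space so adjacent words do not fuse together.
  const text = html.replace(/<\/?\w+[^>]*>/g, " ");
  // Collapse runs of white space to a single space and trim the ends.
  const clean = text.replace(/\s+/g, " ").trim();
  // A word token is one or more non-space characters followed by an
  // optional space; anchored at the start and matched 1..n times.
  const match = clean.match(new RegExp("^(?:\\S+\\s?){1," + n + "}"));
  // No words at all (e.g. an image-only post) yields an empty string.
  return match ? match[0].trim() : "";
}
```

Because the quantifier is greedy and each `\S+` consumes a whole word, the match stops after exactly n words when that many are available.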

This action is quite small in scope; so, I thought to myself - how can we take this idea and make it even more effective? Well, as long as we're getting the first 50 words, we might as well display to the user how many words there are in the full content so that they may have an idea of what they are in for. And then I thought, if we're gonna get the total number of words in the full content, then let's get really crazy and actually give the user an estimated read time based on an average words-per-minute value.
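The arithmetic behind the estimate is simple enough to sketch on its own. This JavaScript version is an illustration, not the post's code: the name `readingStats` is hypothetical, it counts words by splitting on white space rather than through regex chunks, and it rounds to the nearest minute with a one-minute floor, mirroring the `max( round( wordCount / 200 ), 1 )` expression in the ColdFusion code.

```javascript
// Hypothetical helper: count words and estimate reading time in minutes.
// Assumes a "word" is any run of non-space characters and uses the
// post's 200 words-per-minute figure as the default rate.
function readingStats(text, wordsPerMinute = 200) {
  // Split on runs of white space; filter drops the lone "" produced
  // when the input is empty or all white space.
  const words = text.trim().split(/\s+/).filter(Boolean);
  const wordCount = words.length;
  // Round to the nearest minute, but never report less than one minute.
  const minutes = Math.max(Math.round(wordCount / wordsPerMinute), 1);
  return { wordCount, minutes };
}
```

For the 74-word sample post, 74 / 200 rounds down to 0, so the one-minute floor is what produces the "1 minute(s)" estimate.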

With this new focus in mind, here is the ColdFusion solution that I came up with:

  • <!--- Save some content that might appear in a blog post. --->
  • <cfsavecontent variable="blogContent">
  •  
  • <p>
  • Last night, I had the <em>craziest</em> dream about
  • <strong>Tricia</strong>. It was one of those dreams where
  • you know you're dreaming because it's <em>way more
  • awesome</em> than anything that's happened in real life.
  • </p>
  •  
  • <p>
  • We're standing in the locker room talking and she asks me
  • if I would mind if she changed out of her gym clothes
  • while we're talking... would <em>I mind</em>?!? I say,
  • "Not at all," trying to mask my wild enthusiasm.
  • </p>
  •  
  • </cfsavecontent>
  •  
  •  
  • <!---
  • To display the teaser, the first thing I like to do is
  • strip out all of the formatting tags. Leaving the tags in
  • just opens you up to a world of complexity that you
  • don't want to deal with.
  •  
  • Let's replace the tags with a space so that we don't cause
  • any unintended word concatenation (this will cause some
  • extra spacing, but there is only so much fine-tuning we
  • can do in such a generic process).
  • --->
  • <cfset blogContent = reReplace(
  • blogContent,
  • "</?\w+(\s*[\w:]+\s*=\s*(""[^""]*""|'[^']*'))*\s*/?>",
  • " ",
  • "all"
  • ) />
  •  
  • <!---
  • Now, let's collapse any runs of white space characters.
  • Since HTML doesn't render more than one white space
  • character, this doesn't affect the display; but, it does
  • matter for the word-token pattern below, which only allows
  • a single white space character between words.
  • --->
  • <cfset blogContent = reReplace(
  • trim( blogContent ),
  • "\s+",
  • " ",
  • "all"
  • ) />
  •  
  • <!---
  • Now that we have cleaned our blog content, we need to extract
  • the first 50 words. While this might not seem like a pattern,
  • I think using a regular expression for this is a very nice
  • approach. Think of it this way: the first 50 words is like
  • finding a pattern that is 50 instances long.
  •  
  • For ease-of-use, we are going to define a word TOKEN as one
  • or more non-space characters followed by an optional space.
  •  
  • NOTE: Because regular expressions are, by default, greedy,
  • the notation {1,50} will try to match all 50 word tokens
  • before it tries to match anything less.
  • --->
  • <cfset blogSnippets = reMatch(
  • "([^\s]+\s?){1,50}",
  • blogContent
  • ) />
  •  
  • <!---
  • As a safe-guard, let's double-check to make sure that the
  • reMatch() method returned at least one match. If a blog entry
  • was purely visual (i.e. just an image or a set of images), then
  • no words would be returned. If that is the case, then let's
  • make up a content description.
  • --->
  • <cfif !arrayLen( blogSnippets )>
  •  
  • <!---
  • Fabricate a content snippet. This is an outlier case, but
  • it guarantees that the word-count math below always has
  • a last match to read.
  • --->
  • <cfset blogSnippets = [ "Visual content only" ] />
  •  
  • </cfif>
  •  
  •  
  • <!---
  • Since reMatch() returns an array of matches, we have now
  • broken our entire blog content up into an array in which
  • each index has 50 or fewer words. As such, we can use the
  • first index value as our blog teaser.
  •  
  • But, since we broke up the entire blog post, we can also
  • now estimate the total number of words in the entire blog
  • entry (if we want to output it).
  •  
  • We can even go one step further and give the reader an
  • approximate time investment they would have to make based on
  • an average number of words per minute.
  • --->
  • <cfoutput>
  •  
  • <!---
  • Get the approximate word count. The first N-1 matches
  • will each contain 50 words; then, we can treat the last
  • returned match as a space-delimited list.
  • --->
  • <cfset wordCount = (
  • (50 * (arrayLen( blogSnippets ) - 1)) +
  • listLen( blogSnippets[ arrayLen( blogSnippets ) ], " " )
  • ) />
  •  
  • <!---
  • Get the estimated time requirement for reading this blog
  • entry - this assumes that a user can read about 200 words
  • per minute. This can be much higher, but for our
  • purposes, this is fine.
  •  
  • NOTE: Our max here is to make sure the read time never
  • goes below one minute.
  • --->
  • <cfset readingTime = max( round( wordCount / 200 ), 1 ) />
  •  
  •  
  • <p>
  • <!---
  • The first snippet of 50 words will serve as our
  • blog teaser
  • --->
  • #blogSnippets[ 1 ]#
  •  
  • <span style="white-space: nowrap ;">
  • <!--- Words and time estimates. --->
  • [<em>#wordCount# words /
  • aprx. read time: #readingTime# minute(s)</em>]
  • &hellip; <a href="##">read more</a> &raquo;
  • </span>
  • </p>
  •  
  • </cfoutput>

As you can see in the above code, after we strip out the markup, we use the following regular expression to break up the remaining textual content:

([^\s]+\s?){1,50}

This regular expression is very general and defines a "word token" as any sequence of non-space characters followed by an optional space. This "word token" is then matched at most 50 times in succession. When we run this through ColdFusion's reMatch() function, it returns every successive match in the content, which splits the entire blog content into chunks of up to 50 words (returned as an array). With this array, we can take our first 50 words from the first index, and we can compute the total word count from the number of chunks plus the number of words in the last chunk.

When we run the above code, we get the following output:

Last night, I had the craziest dream about Tricia . It was one of those dreams where you know you're dreaming because it's way more awesome than anything that's happened in real life. We're standing in the locker room talking and she asks me if I would mind if she [74 words / aprx. read time: 1 minute(s)] ... read more »
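For comparison, the chunking step can be sketched in JavaScript, where `String.prototype.match` with the `g` flag plays the role of reMatch(). The helper names `chunkWords` and `countFromChunks` are hypothetical. The sketch assumes white space has already been collapsed to single spaces; otherwise a chunk can end early, and the "every chunk but the last holds N words" arithmetic no longer holds.

```javascript
// Hypothetical helpers mirroring reMatch( "([^\s]+\s?){1,50}", text ).
// Assumes white space has already been collapsed to single spaces.
function chunkWords(text, n) {
  const pattern = new RegExp("([^\\s]+\\s?){1," + n + "}", "g");
  // With the "g" flag, match() returns all matches, or null if none.
  return text.match(pattern) || [];
}

// Every chunk except the last holds exactly n words, so the total is
// n * (chunks - 1) plus the word count of the final chunk.
function countFromChunks(chunks, n) {
  if (chunks.length === 0) return 0;
  const last = chunks[chunks.length - 1].split(" ").filter(Boolean);
  return n * (chunks.length - 1) + last.length;
}
```

The `|| []` guard matters in JavaScript: unlike reMatch(), which returns an empty array, `match()` returns `null` when nothing matches.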

So that's the ColdFusion method. To me, this makes a lot of sense as the full content would be available in the click-through on the "read more" link. But, I know you asked to see this as a jQuery code snippet, and I appreciate you bearing with me. To make this more generic, I figured I would try to wrap this up in a jQuery plugin such that it can be used to create a teaser for any kind of content.


This jQuery code follows the same concept as that defined in the ColdFusion solution - it strips out the markup (using the text() method) and then creates a teaser and a read more link:

  • <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  • <html>
  • <head>
  • <title>Displaying A Content Teaser With jQuery</title>
  • <script type="text/javascript" src="jquery-1.3.2.js"></script>
  • <script type="text/javascript">
  •  
  • // Let's create a jQuery plugin that creates content
  • // teasers with word counts and time estimates.
  • $.fn.teaser = function( teaserOptions ){
  • // Create the internal options.
  • var options = $.extend(
  • {},
  • $.fn.teaser.options,
  • teaserOptions
  • );
  •  
  • // Iterate over each item in this collection such
  • // that each teaser can be applied individually based
  • // on targeted information.
  • this.each(
  • function(){
  • var container = $( this );
  •  
  • // First, we want to wrap the contents of the
  • // container in a new DIV so that we can hide
  • // it in lieu of our teaser.
  • var fullContent = container.children()
  • .wrapAll(
  • "<div class=\"full-content\"></div>"
  • )
  • .parent()
  • ;
  •  
  • // Now, let's get all the text from the
  • // content container so that we can start
  • // constructing our teaser.
  • var rawContent = $.trim( fullContent.text() );
  •  
  • // Remove any extra white space.
  • rawContent = rawContent.replace(
  • new RegExp( "\\s+", "g" ),
  • " "
  • );
  •  
  • // Break the content up into chunks based on
  • // the word count. When there are no words at all,
  • // match() returns null, so fall back to an empty array.
  • var snippets = rawContent.match(
  • new RegExp(
  • "([^\\s]+\\s?){1," + options.wordCount + "}",
  • "g"
  • )
  • ) || [];
  •  
  • // Check to make sure there is at least one
  • // snippet item.
  • if (!snippets.length){
  • snippets = [ "Visual content only" ];
  • }
  •  
  • // Calculate the word count.
  • var wordCount = (
  • (options.wordCount * (snippets.length - 1)) +
  • snippets[ snippets.length - 1 ].split( " " ).length
  • );
  •  
  • // Calculate the reading time (rounded to the nearest
  • // minute, as in the ColdFusion version).
  • var readingTime = Math.max(
  • Math.round( wordCount / options.wordsPerMinute ),
  • 1
  • );
  •  
  • // Now, let's create the teaser. We will simply
  • // append it to the original container.
  • container.append(
  • "<p class=\"teaser-content\">" +
  • snippets[ 0 ] +
  • " [<em>" +
  • (wordCount + " words / aprx. read time: ") +
  • (readingTime + " minute(s)") +
  • "</em>]" +
  • "... <a href=\"#\">read more</a>" +
  • "</p>"
  • );
  •  
  • // Hide the actual content.
  • fullContent.hide();
  • }
  • );
  •  
  •  
  • // Now that we have split up the content and created
  • // the teaser headers, let's hook up the read more
  • // links to display the content.
  • this.find( "p.teaser-content a" )
  • .attr( "href", "javascript:void( 0 )" )
  • .click(
  • function(){
  • var link = $( this );
  •  
  • // Hide the teaser.
  • link.parent().hide();
  •  
  • // Show the content.
  • link.parent().prev().show();
  •  
  • // Cancel default event.
  • return( false );
  • }
  • )
  • ;
  •  
  • // Return this collection for method chaining.
  • return( this );
  • };
  •  
  • // Define defaults for the content teaser plugin.
  • // These can be overridden using the options hash when
  • // calling the teaser method.
  • $.fn.teaser.options = {
  • wordCount: 50,
  • wordsPerMinute: 200
  • };
  •  
  •  
  •  
  • // --------------------------------------------------- //
  • // --------------------------------------------------- //
  •  
  •  
  •  
  • // When the DOM is ready, initialize.
  • $(function(){
  • $( "div.blog-entry" ).teaser();
  • });
  •  
  • </script>
  • </head>
  • <body>
  •  
  • <h1>
  • Displaying A Content Teaser With jQuery
  • </h1>
  •  
  • <div class="blog-entry" style="border-bottom: 1px solid black ;">
  •  
  • <p>
  • Last night, I had the <em>craziest</em> dream about
  • <strong>Tricia</strong>. It was one of those dreams where
  • you know you're dreaming because it's <em>way more
  • awesome</em> than anything that's happened in real life.
  • </p>
  •  
  • <p>
  • We're standing in the locker room talking and she asks me
  • if I would mind if she changed out of her gym clothes
  • while we're talking... would <em>I mind</em>?!? I say,
  • "Not at all," trying to mask my wild enthusiasm.
  • </p>
  •  
  • </div>
  •  
  • <div class="blog-entry" style="border-bottom: 1px solid black ;">
  •  
  • <p>
  • Last night, I had the <em>craziest</em> dream about
  • <strong>Tricia</strong>. It was one of those dreams where
  • you know you're dreaming because it's <em>way more
  • awesome</em> than anything that's happened in real life.
  • </p>
  •  
  • <p>
  • We're standing in the locker room talking and she asks me
  • if I would mind if she changed out of her gym clothes
  • while we're talking... would <em>I mind</em>?!? I say,
  • "Not at all," trying to mask my wild enthusiasm.
  • </p>
  •  
  • </div>
  •  
  • </body>
  • </html>

I hope this helps in some way!



Reader Comments

Just for fun you could do this...

<cfset javaArray = CreateObject( "java","java.util.Arrays" ) />
<cfset wordArray = javaArray.copyOf( blogContent.Split( " " ), 50 ) />
<cfset blogSnippet = ArrayToList( wordArray, " " ) />

<cfoutput><p>#blogSnippet#</p></cfoutput>

@Ben you're like this regular expression master :P

It's funny, though, how people's minds work in different ways; for me, the first thing that came into my head to solve this problem was to simply use ListLen with a delimiter of " ". That would give the word count. Then loop through anything above your 50-word limit and do ListDeleteAt :)

I feel a lot more comfortable using lists/arrays for some reason. But as you said, there are plenty of ways to solve this ... especially if you're a regex guru ;)

Tom

@John,

Nice! The biggest downside of going the array route would be that you have to splice the array. Of course, Java would have something for that. Awesome.

@Tom,

Thanks my man. Regexs are the awesome.

@Ben, I thought it would be fun to try :) Java arrays and CF arrays aren't the same which makes it harder than it should be. You can also use the copyOfRange method to get something like the first 10 and last 10 words of an article like this:

<cfset javaArray = CreateObject( "java","java.util.Arrays" ) />
<cfset articleArray = blogContent.Split( " " ) />
<cfset startWordArray = javaArray.copyOf( articleArray, 10 ) />
<cfset endWordArray = javaArray.copyOfRange( articleArray, ArrayLen( articleArray ) - 10, ArrayLen( articleArray ) ) />
<cfset blogSnippet = ArrayToList( startWordArray, " " ) & " … " & ArrayToList( endWordArray, " " ) />

<cfoutput><p>#blogSnippet#</p></cfoutput>

Obviously, I'm just getting carried away now! :D
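In JavaScript, the same bookend idea needs no Java helpers: `Array.prototype.slice` covers both copyOf and copyOfRange, and a negative start index counts from the end of the array. One difference worth knowing is that `Arrays.copyOf` pads with `null` when the source array is shorter than the requested length, while `slice` simply returns fewer elements. The name `bookends` and the short-text fallback are assumptions made for this sketch.

```javascript
// Hypothetical helper: first k and last k words joined by an ellipsis.
function bookends(text, k) {
  const words = text.trim().split(/\s+/);
  // If the text is short, the two slices would overlap; return it whole.
  if (words.length <= 2 * k) return words.join(" ");
  // slice(0, k) ~ Arrays.copyOf; slice(-k) ~ copyOfRange(len - k, len).
  return words.slice(0, k).join(" ") + " \u2026 " + words.slice(-k).join(" ");
}
```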

@Tom, I'd recommend against calling ListDeleteAt on every element past the 50th, as the article could be 1000 words long. If you want to use lists/arrays, you'd be better off creating a new array/list and appending the first 50 elements to it.

Here's how you do it with a jQuery plugin:

Expander Plugin
http://plugins.learningjquery.com/expander/index.html

The Expander Plugin is a simple little jQuery plugin to hide/collapse a portion of an element's text and add a "read more" link so that the text can be viewed by the user if he or she wishes. By default, the expanded text is followed by a "re-collapse" link. Expanded text can also be re-collapsed at a specified time.

@John,

Cool stuff. I have explored the util.Collections - it looks like some really good stuff is in the util.Arrays as well.

@John,

Nice thinking, I hadn't taken into account the length of text could be huge and that working "backwards" would be quicker. Also changed it to arrays as I find them easier to manipulate. Here's a quick knock up of my solution ... just for fun (and I'm bored at work :P)

<cfset arr=ListToArray(BlogContent," ")>
<cfset wordcount=ArrayLen(arr)>
<cfset finaltxt="">
<cfloop from="1" to="#Min(50, wordcount)#" index="i">
<cfset finaltxt&=arr[i]&" ">
</cfloop>

<cfoutput>#finaltxt#</cfoutput>

:)

I think Tom pretty much nailed how I would implement it (although I might rejoin with a StringBuffer or something instead). For JavaScript, the same solution using string.split(" ") would be just as effective. It's core JS to split a string into an array of parts; no regex required.
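A minimal sketch of that split-based approach in JavaScript, assuming the markup has already been stripped. A bare `split(" ")` produces empty entries wherever there are consecutive spaces, so this version collapses white space first (that one step does use a small regex). The name `teaserBySplit` is hypothetical.

```javascript
// Hypothetical helper for the split()-based teaser.
function teaserBySplit(text, n) {
  // Collapse white space so split(" ") yields no empty entries.
  const clean = text.replace(/\s+/g, " ").trim();
  if (clean === "") return { teaser: "", wordCount: 0 };
  const words = clean.split(" ");
  // Keep the first n words; the full array length is the word count.
  return { teaser: words.slice(0, n).join(" "), wordCount: words.length };
}
```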

@Jon, I'm not sure you'd actually see any real performance gain by using a StringBuffer for 50 words, as you have the overhead of instantiating the Java object first. Definitely worthwhile for longer strings.

@John, True enough... do you know at what threshold the performance gain would start to matter? I've kind of taken it to heart that if you're doing more than adding a couple of strings together, you should go with a StringBuffer... any idea where the actual tipping point might be?

@Shuns,

The only thing you need to be careful of with the word boundary is that it will get you on apostrophes. For example, it will see "it's" as (it\b'\bs). That's why I tend to lean more towards spaces. I'd rather get a false positive than a false negative (if that makes sense).
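A quick JavaScript illustration of the apostrophe problem, running both word definitions over a phrase from the sample post:

```javascript
const sample = "it's way more awesome";
// \b\w+\b: the apostrophe is not a word character, so "it's" splits in two.
const byBoundary = sample.match(/\b\w+\b/g); // ["it", "s", "way", "more", "awesome"]
// \S+: any run of non-space characters keeps the contraction whole.
const bySpace = sample.match(/\S+/g); // ["it's", "way", "more", "awesome"]
console.log(byBoundary.length, bySpace.length); // 5 4
```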

Ben, could this concept also be applied to a text edit box where you wish to limit the user to a certain number of words or characters? Most scripts I've found fail as they include html tags in the character count.

I handle mine a little differently. I prefer setting a maximum number of characters and then finding the complete sentences that fall within that limit. That way my teaser doesn't get cut off mid-sentence.

Don't know how efficient this code is, but this is how I do it:

<cffunction name="GetShortenedString" returntype="string">
<cfargument name="originalString" required="yes" />
<cfargument name="length" type="numeric" default="200" />
<cfset var newString = "" />
<cfset var indexOfFullstop = 0 />

<cfif Len(arguments.originalString) LE arguments.length>
<cfset newString = arguments.originalString />
<cfelse>
<cfset newString = Left(arguments.originalString, arguments.length) />
<!--- Position of the last "." counted from the end of the truncated string. --->
<cfset indexOfFullstop = Find(".", Reverse(newString)) />
<!--- Cut back to that full stop; if there is none, keep the plain cut. --->
<cfif indexOfFullstop>
<cfset newString = Left(newString, Len(newString) - indexOfFullstop + 1) />
</cfif>
</cfif>
<cfreturn newString />
</cffunction>

@Paolo,

If I may, you can actually rock this out with a very simple regular expression. What you can do is find up to "X" characters and then ensure that the string ends with a period:

[\w\W]{1,100}\.

This will find up to 100 characters followed by a ".". If the 101st character is not a period, the regex will start to backtrack, looking for an earlier "." character.

Regular expressions are the awesome.
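The same idea in JavaScript might look like the sketch below. Two details are assumptions rather than part of the comment: the pattern is anchored with `^` so the match always starts at the beginning of the text (unanchored, a text with no period in its first 101 characters would yield a match that begins mid-text), and the function falls back to a plain character cut when no period is found. The name `sentenceTeaser` is hypothetical.

```javascript
// Hypothetical helper: longest prefix of up to maxChars characters that is
// followed by a ".", i.e. [\w\W]{1,maxChars}\. anchored at the start.
function sentenceTeaser(text, maxChars) {
  const pattern = new RegExp("^[\\w\\W]{1," + maxChars + "}\\.");
  const match = text.match(pattern);
  // Assumed fallback: plain character cut when no "." is in range.
  return match ? match[0] : text.slice(0, maxChars);
}
```

For example, with a limit of 10, "One. Two. Three." first grabs ten characters, fails to find a period next, and backtracks to "One. Two.".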

Awesome man!

Been meaning to get myself a copy of Jeffrey Friedl's "Mastering Regular Expressions" but somehow never find the time.

That is much cleaner and I presume less resource intensive!

@Paolo,

Easier, definitely, less resource intensive? Not sure. I want to say Yes, but only because I love regex so much.

I am looking forward to another blog post with Tricia ;)

Back to the topic: is it possible to include a function that shows how many users have read the post?

Lars

I have put this on two different pages and it works perfectly... except on one page, right before each instance of the outputted content there are two characters...

 

And on the other page before each instance of the output there is

Â

Any idea why that might be?

  used to be   , but I removed all whitespace and extra linebreaks from the code and it reduced it to  

And to be clear, the output text is inside a <div>
<div class="regularText">#blogSnippets[1]# ... </div>

But the strange characters are appearing BEFORE the div.

Cheers and thanks for the demo

Just thought I would pass along, I added

<cfprocessingdirective pageencoding="utf-8">

This removed the characters. There is still an extra linebreak where the character was, but that I can deal with.

Cheers.

@SuperAlly,

I wonder why the UTF-8 encoding would be needed after the shortening, but not before. Character encoding still confuses me mostly.

Thanks Ben, I was thinking about doing something like this on my website; I just didn't know how to do it. This is excellent!

@Tom,

I like your suggestion, but how can I get it to loop through my query to output the finaltxt for all records?

LOVE this code, but am not so hot with regular expressions...I want to keep my <br> "line breaks" ...is there a method to do this? I've tried editing the regex but end up just breaking the whole thing.