Ask Ben: Displaying A Blog Teaser (Showing The First N Words)

By Ben Nadel

Published 2009-09-23 in Ask Ben, ColdFusion, JavaScript / DHTML — Comments (41)

Hi Ben, I am writing from London, United Kingdom. How can I display the first 50 words of a blog post as a teaser using some jQuery code snippet? I gotta follow you on twitter. cheers.

I know you asked to see this as a jQuery code snippet - and I will get to that - but first, I want to approach this from a ColdFusion viewpoint. I think it makes less sense to do it with jQuery because by the time we can even execute jQuery, the full contents of the page have already been downloaded; as such, we don't get the benefit of the smaller page size and faster load time that a server-side teaser might afford us.

I am sure that there are a number of ways to get the first N words in a string, but to me, the most obvious way is to use a regular expression. While it might not be immediately clear how "pattern matching" can help us in a case like this, if you think about the first 50 words as a "word item" matched at most 50 times in succession, then suddenly, getting the first 50 words as a pattern begins to make sense. For ease of use, when I do something like this in an HTML context, I like to strip out all markup tags before I grab the first N words. Leaving markup in, while possible, adds a huge level of complexity that is beyond the scope of this blog entry. Once the markup is removed, I then grab the first N words using the regular expression concept I just touched upon.
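To make the idea concrete before getting to the full solutions, here is a minimal JavaScript sketch of "strip the markup, then match up to N word tokens." The function name firstWords() and the simplified tag pattern are assumptions made for illustration; they are not the code used later in this post.

```javascript
// Hypothetical helper (illustration only): returns the first `maxWords`
// words of an HTML fragment as plain text.
function firstWords( html, maxWords ){
	// Replace each tag with a space so adjacent words don't get glued
	// together. NOTE: This tag pattern is deliberately simplistic.
	var text = html.replace( /<\/?\w+[^>]*>/g, " " );

	// Collapse runs of white space and trim both ends.
	text = text.replace( /\s+/g, " " ).replace( /^ | $/g, "" );

	// A "word token" is a run of non-space characters followed by an
	// optional space; greedily match up to maxWords of them.
	var pattern = new RegExp( "^(?:\\S+\\s?){1," + maxWords + "}" );
	var match = text.match( pattern );

	// match() returns null when there are no words at all.
	return( match ? match[ 0 ].replace( / $/, "" ) : "" );
}

console.log( firstWords( "<p>One <em>two</em> three four five</p>", 3 ) );
// Logs: One two three
```

Because the quantifier is greedy, the pattern takes as many tokens as it can (up to the limit) and simply stops early when the content is shorter than that.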

This action is quite small in scope; so, I thought to myself - how can we take this idea and make it even more effective? Well, as long as we're getting the first 50 words, we might as well display to the user how many words there are in the full content so that they may have an idea of what they are in for. And then I thought, if we're gonna get the total number of words in the full content, then let's get really crazy and actually give the user an estimated read time based on an average words-per-minute value.
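The read-time estimate is just arithmetic on the word count. As a rough sketch (the helper name and the 200 words-per-minute default are assumptions for illustration):

```javascript
// Hypothetical helper: estimate the reading time in whole minutes.
// The words-per-minute value is an assumed average, not a measurement.
function estimateReadingTime( wordCount, wordsPerMinute ){
	var minutes = Math.round( wordCount / ( wordsPerMinute || 200 ) );

	// Never report less than one minute.
	return( Math.max( minutes, 1 ) );
}

console.log( estimateReadingTime( 74, 200 ) );   // Logs: 1
console.log( estimateReadingTime( 1000, 200 ) ); // Logs: 5
```

Whether you round, floor, or ceil the minutes is a judgment call; the one-minute minimum just keeps very short posts from reporting "0 minutes."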

With this new focus in mind, here is the ColdFusion solution that I came up with:

<!--- Save some content that might appear in a blog post. --->
<cfsavecontent variable="blogContent">

	<p>
		Last night, I had the <em>craziest</em> dream about
		<strong>Tricia</strong>. It was one of those dreams where
		you know you're dreaming because it's <em>way more
		awesome</em> than anything that's happened in real life.
	</p>

	<p>
		We're standing in the locker room talking and she asks me
		if I would mind if she changed out of her gym clothes
		while we're talking... would <em>I mind</em>?!? I say,
		"Not at all," trying to mask my wild enthusiasm.
	</p>

</cfsavecontent>


<!---
	To display the teaser, the first thing I like to do is
	strip out all of the formatting tags. Leaving the tags in
	just opens you up to a world of complexity that you
	don't want to deal with.

	Let's replace the tags with a space so that we don't cause
	any unintended word concatenation (this will cause some
	extra spacing, but there is only so much fine-tuning we
	can do in such a generic process).
--->
<cfset blogContent = reReplace(
	blogContent,
	"</?\w+(\s*[\w:]+\s*=\s*(""[^""]*""|'[^']*'))*\s*/?>",
	" ",
	"all"
	) />

<!---
	Now, let's minimize any white space characters. Technically,
	since HTML doesn't render more than one white space character,
	this isn't an issue; but, it's something I like to do anyway
	for intent.
--->
<cfset blogContent = reReplace(
	trim( blogContent ),
	"\s+",
	" ",
	"all"
	) />

<!---
	Now that we have cleaned our blog content, we need to extract
	the first 50 words. While this might not seem like a pattern,
	I think using a regular expression for this is a very nice
	approach. Think of it this way: the first 50 words is like
	finding a pattern that is 50 instances long.

	For ease-of-use, we are going to define a word TOKEN as any
	run of non-space characters followed by an optional space.

	NOTE: Because regular expressions are, by default, greedy,
	the notation {1,50} will try to match all 50 word tokens
	before it tries to match anything less.
--->
<cfset blogSnippets = reMatch(
	"([^\s]+\s?){1,50}",
	blogContent
	) />

<!---
	As a safe-guard, let's double-check to make sure that the
	reMatch() method returned at least one match. If a blog entry
	was purely visual (ie. just an image or set of images), then
	no words would be returned. If that is the case, then let's
	make up a content description.
--->
<cfif !arrayLen( blogSnippets )>

	<!---
		Fabricate content snippets. This is an outlier case, but
		we want to do this for mathematical stability.
	--->
	<cfset blogSnippets = [ "Visual content only" ] />

</cfif>


<!---
	Since reMatch() returns an array of matches, we have now
	broken our entire blog content up into an array in which
	each index has 50 or fewer words. As such, we can use the
	first index value as our blog teaser.

	But, since we broke up the entire blog post, we can also
	now estimate the total number of words in the entire blog
	entry (if we want to output it).

	We can even go one step further and give the reader an
	approximate time investment they would have to make based on
	an average number of words per minute.
--->
<cfoutput>

	<!---
		Get the approximate word count. The first N-1 matches
		will be 50; then, we can use the last returned match as
		a space-delimited list.
	--->
	<cfset wordCount = (
		(50 * (arrayLen( blogSnippets ) - 1)) +
		listLen( blogSnippets[ arrayLen( blogSnippets ) ], " " )
		) />

	<!---
		Get the estimated time requirement for reading this blog
		entry - this assumes that a user can read about 200 words
		per minute. This can be much higher, but for our
		purposes, this is fine.

		NOTE: Our max here is to make sure the read time never
		goes below one minute.
	--->
	<cfset readingTime = max( round( wordCount / 200 ), 1 ) />


	<p>
		<!---
			The first snippet of 50 words will serve as our
			blog teaser
		--->
		#blogSnippets[ 1 ]#

		<span style="white-space: nowrap ;">
			<!--- Words and time estimates. --->
			[<em>#wordCount# words /
			aprx. read time: #readingTime# minute(s)</em>]
			&hellip; <a href="##">read more</a> &raquo;
		</span>
	</p>

</cfoutput>

As you can see in the above code, after we strip out the markup, we use the following regular expression to break up the remaining textual content:

([^\s]+\s?){1,50}

This regular expression is very general: it defines a "word token" as any sequence of non-whitespace characters followed by an optional whitespace character. This "word token" is then matched at most 50 times in succession. When we run this through ColdFusion's reMatch() function, it takes the entire blog content and splits it up into chunks of up to 50 words each (returned in an array). With this array, we can now get our first 50 words from the first index, and we can estimate the total word count from the number of indices plus the number of words in the last chunk.
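The chunk-then-count step can be sketched in plain JavaScript as follows. The helper names chunkWords() and countWords() are hypothetical, and the sketch assumes the text has already been stripped of markup and had its white space normalized:

```javascript
// Hypothetical helper: split text into chunks of up to chunkSize words.
function chunkWords( text, chunkSize ){
	var pattern = new RegExp( "(?:\\S+\\s?){1," + chunkSize + "}", "g" );

	// match() returns null (not an empty array) when nothing matches.
	return( text.match( pattern ) || [] );
}

// Hypothetical helper: every chunk except the last is assumed to be
// full; the last chunk is counted as a space-delimited list.
function countWords( chunks, chunkSize ){
	if (!chunks.length){
		return( 0 );
	}

	var lastChunk = chunks[ chunks.length - 1 ].replace( /\s+$/, "" );

	return(
		( chunkSize * ( chunks.length - 1 ) ) +
		lastChunk.split( " " ).length
	);
}

var chunks = chunkWords( "a b c d e f g", 3 );

console.log( chunks.length );            // Logs: 3
console.log( countWords( chunks, 3 ) );  // Logs: 7
```

The first chunk doubles as the teaser, and the count comes almost for free since the content has already been broken up.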

When we run the above code, we get the following output:

Last night, I had the craziest dream about Tricia . It was one of those dreams where you know you're dreaming because it's way more awesome than anything that's happened in real life. We're standing in the locker room talking and she asks me if I would mind if she [74 words / aprx. read time: 1 minute(s)] ... read more »

So that's the ColdFusion method. To me, this makes a lot of sense as the full content would be available in the click-through on the "read more" link. But, I know you asked to see this as a jQuery code snippet, and I appreciate you bearing with me. To make this more generic, I figured I would try to wrap this up in a jQuery plugin such that it can be used to create a teaser for any kind of content.

This jQuery code follows the same concept as that defined in the ColdFusion solution - it strips out the markup (using the text() method) and then creates a teaser and a read more link:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
	<title>Displaying A Content Teaser With jQuery</title>
	<script type="text/javascript" src="jquery-1.3.2.js"></script>
	<script type="text/javascript">

		// Let's create a jQuery plugin that creates content
		// teasers with word counts and time estimates.
		$.fn.teaser = function( teaserOptions ){
			// Create the internal options.
			var options = $.extend(
				{},
				$.fn.teaser.options,
				teaserOptions
				);

			// Iterate over each item in this collection such
			// that each teaser can be applied individually based
			// on targeted information.
			this.each(
				function(){
					var container = $( this );

					// First, we want to wrap the contents of the
					// container in a new DIV so that we can hide
					// it in lieu of our teaser.
					var fullContent = container.children()
						.wrapAll(
							"<div class=\"full-content\"></div>"
							)
						.parent()
					;

					// Now, let's get all the text from the
					// content container so that we can start
					// constructing our teaser.
					var rawContent = $.trim( fullContent.text() );

					// Remove any extra white space.
					rawContent = rawContent.replace(
						new RegExp( "\\s+", "g" ),
						" "
						);

					// Break the content up into chunks based on
					// the words count.
					var snippets = rawContent.match(
						new RegExp(
							"([^\\s]+\\s?){1," + options.wordCount + "}",
							"g"
							)
						);

					// Check to make sure there is at least one
					// snippet item. NOTE: match() returns null
					// (not an empty array) when nothing matches.
					if (!snippets || !snippets.length){
						snippets = [ "Visual content only" ];
					}

					// Calculate the word count.
					var wordCount = (
						(options.wordCount * (snippets.length - 1)) +
						snippets[ snippets.length - 1 ].split( " " ).length
						);

					// Calculate the reading time.
					var readingTime = Math.max(
						Math.floor( wordCount / options.wordsPerMinute ),
						1
						);

					// Now, let's create the teaser. We will simply
					// append it to the original container.
					container.append(
						"<p class=\"teaser-content\">" +
						snippets[ 0 ] +
						" [<em>" +
						(wordCount + " words / aprx. read time: ") +
						(readingTime + " minute(s)") +
						"</em>]" +
						"... <a href=\"#\">read more</a>" +
						"</p>"
						);

					// Hide the actual content.
					fullContent.hide();
				}
			);


			// Now that we have split up the content and created
			// the teaser headers, let's hook up the read more
			// links to display the content.
			this.find( "p.teaser-content a" )
				.attr( "href", "javascript:void( 0 )" )
				.click(
					function(){
						var link = $( this );

						// Hide the teaser.
						link.parent().hide();

						// Show the content.
						link.parent().prev().show();

						// Cancel default event.
						return( false );
					}
				)
			;

			// Return this collection for method chaining.
			return( this );
		};

		// Define defaults for the content teaser plugin.
		// These can be overridden using the options hash when
		// calling the teaser method.
		$.fn.teaser.options = {
			wordCount: 50,
			wordsPerMinute: 200
		};



		// --------------------------------------------------- //
		// --------------------------------------------------- //



		// When the DOM is ready, initialize.
		$(function(){
			$( "div.blog-entry" ).teaser();
		});

	</script>
</head>
<body>

	<h1>
		Displaying A Content Teaser With jQuery
	</h1>

	<div class="blog-entry" style="border-bottom: 1px solid black ;">

		<p>
			Last night, I had the <em>craziest</em> dream about
			<strong>Tricia</strong>. It was one of those dreams where
			you know you're dreaming because it's <em>way more
			awesome</em> than anything that's happened in real life.
		</p>

		<p>
			We're standing in the locker room talking and she asks me
			if I would mind if she changed out of her gym clothes
			while we're talking... would <em>I mind</em>?!? I say,
			"Not at all," trying to mask my wild enthusiasm.
		</p>

	</div>

	<div class="blog-entry" style="border-bottom: 1px solid black ;">

		<p>
			Last night, I had the <em>craziest</em> dream about
			<strong>Tricia</strong>. It was one of those dreams where
			you know you're dreaming because it's <em>way more
			awesome</em> than anything that's happened in real life.
		</p>

		<p>
			We're standing in the locker room talking and she asks me
			if I would mind if she changed out of her gym clothes
			while we're talking... would <em>I mind</em>?!? I say,
			"Not at all," trying to mask my wild enthusiasm.
		</p>

	</div>

</body>
</html>

I hope this helps in some way!

Short link: https://bennadel.com/1718

Reader Comments

Tony Nelson Sep 23, 2009 at 10:42 AM

28 Comments

While regular expressions are pretty handy, aren't they a little overkill when a simple #left(blogContent,50)# would suffice?

Ben Nadel Sep 23, 2009 at 10:45 AM

16,020 Comments

@Tony,

Left() gets the left-most characters; this person was looking for left-most words.

John Whish Sep 23, 2009 at 10:46 AM

38 Comments

@Tony, that would only return the first 50 characters, not words.

Tony Nelson Sep 23, 2009 at 10:47 AM

28 Comments

Duh. How'd I miss that?

Ben Nadel Sep 23, 2009 at 10:48 AM

16,020 Comments

@Tony,

Plus, then you miss out on all the total-words and estimated reading time goodness :)

Alain Van Driessche Sep 23, 2009 at 10:48 AM

3 Comments

That was really interesting and useful Ben. Thanks for sharing your expertise once again :)

John Whish Sep 23, 2009 at 11:08 AM

38 Comments

Just for fun you could do this...

<cfset javaArray = CreateObject( "java","java.util.Arrays" ) />
<cfset wordArray = javaArray.copyOf( blogContent.Split( " " ), 50 ) />
<cfset blogSnippet = ArrayToList( wordArray, " " ) />

<p>#blogSnippet#</p>

Tom Jenkins Sep 23, 2009 at 11:22 AM

14 Comments

@Ben you're like this regular expression master :P

It's funny though how people's minds work in different ways; for me the first thing that came into my head to solve this problem was to simply use ListLen with a delimiter of " ". That would give the word count. Then loop through anything above your 50 word limit and do ListDeleteAt :)

I feel a lot more comfortable using Lists / arrays for some reason. But as you said there are plenty of ways to solve this ... especially if you're a reg ex guru ;)

Tom

Ben Nadel Sep 23, 2009 at 11:29 AM

16,020 Comments

@John,

Nice! The biggest down side of going the array route would be that you have to splice the array. Of course Java would have something for that. Awesome.

@Tom,

Thanks my man. Regexs are the awesome.

John Whish Sep 23, 2009 at 11:45 AM

38 Comments

@Ben, I thought it would be fun to try :) Java arrays and CF arrays aren't the same which makes it harder than it should be. You can also use the copyOfRange method to get something like the first 10 and last 10 words of an article like this:

<cfset javaArray = CreateObject( "java","java.util.Arrays" ) />
<cfset articleArray = blogContent.Split( " " ) />
<cfset startWordArray = javaArray.copyOf( articleArray, 11 ) />
<cfset endWordArray = javaArray.copyOfRange( articleArray, ArrayLen( articleArray ) - 10, ArrayLen( articleArray ) ) />
<cfset blogSnippet = ArrayToList( startWordArray, " " ) & " … " & ArrayToList( endWordArray, " " ) />

<p>#blogSnippet#</p>

Obviously, I'm just getting carried away now! :D

@Tom, I'd recommend against using ListDeleteAt for anything over 50 elements, as the article could be 1000 words long. You'd be better off creating a new array/list and appending the first 50 elements to it if you want to use lists/arrays.

James Moberg Sep 23, 2009 at 11:51 AM

91 Comments

Here's how you do it with a jQuery plugin:

Expander Plugin
http://plugins.learningjquery.com/expander/index.html

The Expander Plugin is a simple little jQuery plugin to hide/collapse a portion of an element's text and add a "read more" link so that the text can be viewed by the user if he or she wishes. By default, the expanded text is followed by a "re-collapse" link. Expanded text can also be re-collapsed at a specified time.

Ben Nadel Sep 23, 2009 at 11:57 AM

16,020 Comments

@John,

Cool stuff. I have explored the util.Collections - it looks like some really good stuff is in the util.Arrays as well.

Tom Jenkins Sep 23, 2009 at 12:24 PM

14 Comments

@John,

Nice thinking, I hadn't taken into account the length of text could be huge and that working "backwards" would be quicker. Also changed it to arrays as I find them easier to manipulate. Here's a quick knock up of my solution ... just for fun (and I'm bored at work :P)

<cfoutput>#finaltxt#</cfoutput>

Jon Hartmann Sep 23, 2009 at 12:38 PM

34 Comments

I think Tom pretty much nailed how I would implement it (although I might rejoin with a StringBuffer or something instead). For Javascript, the same solution but using string.split(" ") would be just as effective. Its core JS to split a string into an array of parts, no REGEX required.
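For reference, the split-based approach described in this comment might look something like the sketch below. The function name teaserBySplit() is made up for illustration, and the code assumes the markup has already been removed and that words are separated by plain spaces:

```javascript
// Hypothetical helper: build a teaser with split() instead of a
// regular expression.
function teaserBySplit( text, maxWords ){
	// Split on spaces and drop the empty strings produced by
	// consecutive, leading, or trailing spaces.
	var words = text.split( " " ).filter(
		function( word ){
			return( word.length > 0 );
		}
	);

	return({
		teaser: words.slice( 0, maxWords ).join( " " ),
		wordCount: words.length
	});
}

var result = teaserBySplit( "one two  three four", 3 );

console.log( result.teaser );     // Logs: one two three
console.log( result.wordCount );  // Logs: 4
```

Taking a slice of the first N words also sidesteps the cost of deleting everything past the limit one item at a time, which John raised above.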

John Whish Sep 23, 2009 at 12:43 PM

38 Comments

@Jon, I'm not sure you'd actually see any real performance gain by using a StringBuffer for 50 words, as you have the overhead of instantiating the Java object first. Definitely worthwhile for longer strings.

Jon Hartmann Sep 23, 2009 at 1:40 PM

34 Comments

@John, True enough... do you know at what threshold the performance gain would start to matter? I've kind of taken it to heart that if you're doing more than adding a couple of strings together you should go with a StringBuffer... any idea on where the actual tipping point might be?

shuns Sep 23, 2009 at 8:55 PM

76 Comments

Instead of using \s for whitespace, another option is to use the word boundary \b

John Whish Sep 24, 2009 at 3:47 AM

38 Comments

@Jon, good question - I don't know! Sorry :)

Ben Nadel Sep 24, 2009 at 9:15 AM

16,020 Comments

@Shuns,

The only thing you need to be careful of with the word boundary is that it will get you on apostrophes. For example, it will see "it's" as (it\b'\bs). That's why I tend to lean more towards spaces. I'd rather get a false positive than a false negative (if that makes sense).
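A quick way to see the apostrophe issue is to tokenize the same phrase both ways. This is only an illustrative sketch; the tokenize() helper and the sample phrase are made up for the demo:

```javascript
// Hypothetical helper: return all matches of a global pattern, or an
// empty array when there are none.
function tokenize( text, pattern ){
	return( text.match( pattern ) || [] );
}

var phrase = "it's a dog's life";

// Word-boundary tokens: the apostrophe splits "it's" and "dog's".
var byBoundary = tokenize( phrase, /\b\w+\b/g );

// Space-delimited tokens: contractions stay intact.
var bySpace = tokenize( phrase, /\S+/g );

console.log( byBoundary.join( "|" ) );  // Logs: it|s|a|dog|s|life
console.log( bySpace.join( "|" ) );     // Logs: it's|a|dog's|life
```

The boundary-based count comes out at six "words" where a reader would see four.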

Lance Sep 24, 2009 at 11:28 AM

12 Comments

I think we need to know the rest of the story of what happened with Tricia ;-)

shuns Sep 24, 2009 at 6:21 PM

76 Comments

Ah ok Ben, I didn't know that one about the apostrophes, I have only used \b once or twice :)

Connie DeCinko Sep 24, 2009 at 6:43 PM

3 Comments

Ben, could this concept also be applied to a text edit box where you wish to limit the user to a certain number of words or characters? Most scripts I've found fail as they include html tags in the character count.

Ben Nadel Sep 24, 2009 at 7:00 PM

16,020 Comments

@Lance,

Another blog post, perhaps :)

@Connie,

Yes, absolutely.

Joshua Gonzalez Sep 25, 2009 at 5:16 PM

3 Comments

Truly amazing stuff! Just what I needed for my development.

Thanks!

Paolo Broccardo Sep 28, 2009 at 5:41 AM

21 Comments

I handle mine a little differently. I prefer setting a limit to a maximum number of characters and then finding the complete sentences that fall within that limit. That way my teaser doesn't get cut off midway during a sentence.

Don't know how efficient this code is, but this is how I do it:

Ben Nadel Sep 29, 2009 at 9:28 AM

16,020 Comments

@Paolo,

If I may, you can actually rock this out with a very simple regular expression. What you can do is find up to "X" characters and then ensure that the string ends with a period:

[\w\W]{1,100}\.

This will find up to 100 characters followed by a ".". If the 101st character is not a period, the regex will start to backtrack, looking for the last "." that still falls within range.

Regular expressions are the awesome.
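As a hedged illustration of that pattern in JavaScript (the helper name sentenceTeaser() is hypothetical, and the leading ^ anchor is an addition that keeps the match pinned to the start of the text):

```javascript
// Hypothetical helper: return the longest run of text that starts at
// the beginning, spans at most maxChars characters, and is followed by
// a period. Returns an empty string if no period is in range.
function sentenceTeaser( text, maxChars ){
	var pattern = new RegExp( "^[\\w\\W]{1," + maxChars + "}\\." );
	var match = text.match( pattern );

	return( match ? match[ 0 ] : "" );
}

console.log(
	sentenceTeaser( "First sentence. Second one is longer. Third.", 20 )
);
// Logs: First sentence.
```

Note that this treats every period as a sentence end, so abbreviations and decimals (such as "3.5") can cut the teaser short.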

Paolo Sep 29, 2009 at 3:17 PM

34 Comments

Awesome man!

Been meaning to get myself a copy of Jeffrey Friedl's "Mastering Regular Expressions" but somehow never find the time.

That is much cleaner and I presume less resource intensive!

Ben Nadel Oct 1, 2009 at 8:23 AM

16,020 Comments

@Paolo,

Easier, definitely. Less resource intensive? Not sure. I want to say Yes, but only because I love regex so much.

Frank Quednau Oct 1, 2009 at 5:18 PM

1 Comments

So far, the first regex to strip tags works pretty well...I am using it in C#, though :)

Thanks very much!

Ben Nadel Oct 1, 2009 at 5:22 PM

16,020 Comments

@Frank,

Awesome. C# is a cool language - I have played around with it a bit.

Lars Nov 17, 2009 at 8:26 AM

1 Comments

I am looking forward to another blog post with Tricia ;)

Back to the topic: is it possible to include a function which shows how many users have read the post?

Lars

Ben Nadel Nov 17, 2009 at 8:56 AM

16,020 Comments

@Lars,

You have to include database functionality to do that, which is beyond the concept of this blog post.

SuperAlly Apr 28, 2010 at 5:57 PM

8 Comments

I have put this on two different pages and it works perfectly... except on one page, right before each instance of the outputted content there are two characters...

¬†

And on the other page before each instance of the output there is

Any idea why that might be?

¬† used to be ¬†¬†, but I removed all whitespace and extra linebreaks from the code and it reduced it to ¬†

And to be clear, the output text is inside a <div>
<div class="regularText">#blogSnippets[1]# ... </div>

But the strange characters are appearing BEFORE the div.

Cheers and thanks for the demo

SuperAlly May 5, 2010 at 3:05 PM

8 Comments

Just thought I would pass along, I added

This removed the characters. There is still an extra linebreak where the character was, but that I can deal with.

Cheers.

Ben Nadel May 7, 2010 at 9:52 PM

16,020 Comments

@SuperAlly,

I wonder why the UTF-8 encoding would be needed after the shortening, but not before. Character encoding still confuses me mostly.

FlashD Aug 26, 2010 at 6:16 AM

11 Comments

@Ben
Thanks mate. This is fantastic!
Simple, easy to use and understand!

Well done! AGAIN!!

Bull Sep 4, 2010 at 11:32 PM

1 Comments

Thanks Ben, I was thinking about doing something like this on my website i just didn't know how to do it. This is excellent!

Ben Nadel Sep 5, 2010 at 12:22 PM

16,020 Comments

@Flashdakota, @Bull,

Glad this was able to help you guys out.

Adam Oct 4, 2010 at 3:01 PM

1 Comments

@Tom,

I like your suggestion, but how can i get it to loop through my query to output the finaltxt for all records?

Jonathan Perret Jan 3, 2011 at 11:25 AM

1 Comments

Thanks Ben. Saved me some time today. I owe you a cold one.

hinsel Nov 8, 2012 at 6:09 PM

1 Comments

LOVE this code, but am not so hot with regular expressions...I want to keep my <br> "line breaks" ...is there a method to do this? I've tried editing the regex but end up just breaking the whole thing.
