Detecting Rendered Line Breaks In A Text Node In JavaScript

By Ben Nadel

Published 2022-08-17 in JavaScript / DHTML — Comments (13)

At work, I've been building a way to generate "placeholder" images using a fragment of the DOM (Document Object Model). And, up until now, I've been using the .measureText() method, available on the Canvas 2D rendering context, to programmatically wrap lines-of-text onto a <canvas> element. But, this approach has proven to be a bit "glitchy" on the edges. As such, I wanted to see if I could find a way to detect the rendered line breaks in a text node of the document, regardless of what the text in the markup looked like. Then, I could more easily render the lines of text to the <canvas> element. It turns out, the Range class in JavaScript (well, in the browser) might be exactly what I need.

Run this demo in my JavaScript Demos project on GitHub.

View this code in my JavaScript Demos project on GitHub.

ASIDE: As a quick note, I'm actually trying to recreate a very tiny fraction of what the html2canvas library by Niklas von Hertzen already does. But, as stated in his own README, the html2canvas library should not be used in a production application. As such, I wanted to try and create something over which I had full control (and responsibility).

A Range object represents some fragment of the page. This can contain a series of nodes; or, part of a text node. What's really cool about the Range object is that it can return a collection of bounding boxes that represent the visual rendering of the items within the range.

I actually looked at the Range object once before when drawing boxes around selected text. I didn't really have a use-case for that exploration at the time; but, performing that experiment 4-years ago allowed me to see a path forward in my current problem.

If I have a text-node in the DOM, and I create a Range for the contents of that text-node, the .getClientRects() method, on the Range, will return the bounding box for each line of text as it is rendered for the user. Now, this doesn't inherently tell me which chunk of text is on which rendered line; but, it gives us a way to do that with a little brute-force magic.

Consider a Range that has a single character in it - the first character in our text-node. This Range will only have a single bounding box. Now, what if we add the second character to that Range and examine the bounding boxes? If there is still a single bounding box, we can deduce that the second character is in the first line of text. But, if we now have two bounding boxes, we can deduce that the second character belongs in the second line of text.

Extending this, if we incrementally expand the contents of a Range, one character at a time, the last added character will always be in the last line of text. And, we can determine the "index" of that last line of text by using the current count of the bounding boxes.

This is definitely brute force and is probably going to be slow on very large chunks of text. But, for a single paragraph on a desktop computer, this brute force approach feels instantaneous.

Let's see this in action. In the following demo, I have a text node with some static text in it. When you click the button, I examine the text node and brute force extract the rendered lines of text and log them to the console. The method of not here is called extractLinesFromTextNode() - this is where we dynamically extend the Range to identify the text wrapping:

<!doctype html>
<html lang="en">
<head>
	<meta charset="utf-8" />
	<title>
		Detecting Rendered Line Breaks In A Text Node In JavaScript
	</title>
	<link rel="stylesheet" type="text/css" href="./main.css" />
</head>
<body>

	<h1>
		Detecting Rendered Line Breaks In A Text Node In JavaScript
	</h1>

	<p class="sample" style="width: 400px ;">
		You fell victim to one of the classic blunders-the most famous of which is,
		"Never get involved in a land war in Asia" - but only slightly less well-known
		is this: "Never go against a Sicilian when death is on the line"! Ha ha ha ha ha
		ha ha! Ha ha ha ha ha ha ha!
	</p>

	<p>
		<button class="button">
			Detect Line Breaks
		</button>
	</p>

	<script type="text/javascript">

		var source = document.querySelector( ".sample" ).firstChild;
		var button = document.querySelector( ".button" );

		// When the user clicks the button, process the text node.
		button.addEventListener(
			"click",
			function handleClick( event ) {

				logLines( extractLinesFromTextNode( source ) );

			}
		);

		// --------------------------------------------------------------------------- //
		// --------------------------------------------------------------------------- //

		/**
		* I extract the visually rendered lines of text from the given textNode as it
		* exists in the document at this very moment. Meaning, it returns the lines of
		* text as seen by the user.
		*/
		function extractLinesFromTextNode( textNode ) {

			if ( textNode.nodeType !== 3 ) {

				throw( new Error( "Lines can only be extracted from text nodes." ) );

			}

			// BECAUSE SAFARI: None of the "modern" browsers seem to care about the actual
			// layout of the underlying markup. However, Safari seems to create range
			// rectangles based on the physical structure of the markup (even when it
			// makes no difference in the rendering of the text). As such, let's rewrite
			// the text content of the node to REMOVE SUPERFLUOS WHITE-SPACE. This will
			// allow Safari's .getClientRects() to work like the other modern browsers.
			textNode.textContent = collapseWhiteSpace( textNode.textContent );

			// A Range represents a fragment of the document which contains nodes and
			// parts of text nodes. One thing that's really cool about a Range is that we
			// can access the bounding boxes that contain the contents of the Range. By
			// incrementally adding characters - from our text node - into the range, and
			// then looking at the Range's client rectangles, we can determine which
			// characters belong in which rendered line.
			var textContent = textNode.textContent;
			var range = document.createRange();
			var lines = [];
			var lineCharacters = [];

			// Iterate over every character in the text node.
			for ( var i = 0 ; i < textContent.length ; i++ ) {

				// Set the range to span from the beginning of the text node up to and
				// including the current character (offset).
				range.setStart( textNode, 0 );
				range.setEnd( textNode, ( i + 1 ) );

				// At this point, the Range's client rectangles will include a rectangle
				// for each visually-rendered line of text. Which means, the last
				// character in our Range (the current character in our for-loop) will be
				// the last character in the last line of text (in our Range). As such, we
				// can use the current rectangle count to determine the line of text.
				var lineIndex = ( range.getClientRects().length - 1 );

				// If this is the first character in this line, create a new buffer for
				// this line.
				if ( ! lines[ lineIndex ] ) {

					lines.push( lineCharacters = [] );

				}

				// Add this character to the currently pending line of text.
				lineCharacters.push( textContent.charAt( i ) );

			}

			// At this point, we have an array (lines) of arrays (characters). Let's
			// collapse the character buffers down into a single text value.
			lines = lines.map(
				function operator( characters ) {

					return( collapseWhiteSpace( characters.join( "" ) ) );

				}
			);

			// DEBUGGING: Draw boxes around our client rectangles.
			drawRectBoxes( range.getClientRects() );

			return( lines );

		}


		/**
		* I normalize the white-space in the given value such that the amount of white-
		* space matches the rendered white-space (browsers collapse strings of white-space
		* down to single space character, visually, and this is just updating the text to
		* match that behavior).
		*/
		function collapseWhiteSpace( value ) {

			return( value.trim().replace( /\s+/g, " " ) );

		}


		/**
		* I draw red boxes on the screen for the given client rects.
		*/
		function drawRectBoxes( clientRects ) {

			arrayFrom( document.querySelectorAll( ".box" ) ).forEach(
				function iterator( node ) {

					node.remove();

				}
			);

			arrayFrom( clientRects ).forEach(
				function iterator( rect ) {

					var box = document.createElement( "div" );
					box.classList.add( "box" )
					box.style.top = ( rect.y + "px" );
					box.style.left = ( rect.x + "px" );
					box.style.width = ( rect.width + "px" );
					box.style.height = ( rect.height + "px" );
					document.body.appendChild( box );

				}
			);

		}


		/**
		* I log the given lines of text using a grouped output.
		*/
		function logLines( lines ) {

			console.group( "Rendered Lines of Text" );

			lines.forEach(
				function iterator( line, i ) {

					console.log( i, line );

				}
			);

			console.groupEnd();

		}


		/**
		* I create a true array from the given array-like data. Array.from() if you are on
		* modern browsers.
		*/
		function arrayFrom( arrayLike ) {

			return( Array.prototype.slice.call( arrayLike ) );

		}

	</script>

</body>
</html>

As you can see, we're looping over the characters in our text-node, adding each one the Range in sequence. Then, after each character has been added, we look at the current number of bounding boxes in order to determine which line of text contains the just-added character:

var lineIndex = ( range.getClientRects().length - 1 );

At the end of the brute-forcing, we have a two-dimensional array of characters in which the first dimension is this lineIndex value. Then, we simply collapse each character buffer (Array) down into a single String and we have our lines of text:

Multiple lines of text being extracted from a single text node in the DOM using JavaScript.

As you can see, we took a text-node from the DOM, which has no inherent line-breaks or text-wrapping, and used the Range object to determine which substrings of that text-node where on which lines (as seen by the user).

This works on my Chrome, Firefox, Edge, and Safari (though, I had to normalize the white-space in the text-content in order for Safari to work consistently with the modern browsers). And, of course, this is for a text node only. Meaning, this approach wasn't designed to work with an Element node that might contain mixed-content (such as formatting elements). But, such a constraint is sufficient for my particular use-case.

Once I have this production, I'd like to follow-up with a more in-depth example of how I'm generating the placeholder images using the <canvas> element. But, I'm hopeful that this approach will make it much easier to render multi-line text to that image.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/4310

Reader Comments

Hassam Ali Aug 17, 2022 at 9:44 PM

21 Comments

This is nice use case of range.

Ben Nadel Aug 17, 2022 at 9:46 PM

16,255 Comments

@Hassam,

Thanks! It would be great if there was something "more native" that would expose this information, though, so it could have better performance. I just assume that calculating the Range values over and over again for each character is relatively slow. But, for my purposes, it seems to be fast enough.

Mario Luevanos Aug 18, 2022 at 9:41 AM

1 Comments

I didn't know about the Range object until now. 🙌

It reminds me of a utility function I created to highlight words on a page where I used document.createTreeWalker to create a TreeWalker object, but I'm thinking the Range object might work too!

Ben Nadel Aug 18, 2022 at 9:46 AM

16,255 Comments

@Mario,

I think I played around with the TreeWalker API a number of years ago because I was trying to access Comment nodes in the DOM and jQuery, at least at the time, didn't make it super easy to get at non-Element nodes.

To be honest, I don't have my head wrapped around the full use-cases for Range objects. I know that the Selection API (what the user has highlighted) uses Ranges inside of it; which is where I think I first came to know of this. But, outside of the Selection, this is the first time that I've used it.

Ben Nadel Aug 18, 2022 at 12:29 PM

16,255 Comments

Here's a fast-follow to demonstrate how I am intending to use this line-extraction technique in order to help write text to the <canvas> object:

www.bennadel.com/blog/4311-rendering-wrapped-text-to-a-canvas-in-javascript.htm

Essentially, with Canvas, there is no "wrapped text" concept. As such, if you want to render wrapped text to the canvas, you have to break the text apart into individual lines; and then, render each line, in turn, at an increasing Y-offset. Hence my need to break text apart at the runtime line-breaks.

Neven Josipovic Nov 7, 2022 at 10:44 AM

1 Comments

💯💯💯💯💯

Ben Nadel Nov 7, 2022 at 10:44 AM

16,255 Comments

@Neven,

Ha ha 💪

Bruce Sep 11, 2023 at 12:10 PM

1 Comments

Clever! 👍

Tran Jan 6, 2024 at 12:06 PM

1 Comments

You saved my life, thanks 😍😍😍

Ben Nadel Jan 7, 2024 at 11:15 AM

16,255 Comments

@Tran, woot woot! Glad to help 🙌

Allan Deutsch Oct 28, 2024 at 10:55 PM

2 Comments

Great demo Ben! This is a pretty clever use of Range. Couple things I've noticed that could be improved:

the way you collapse whitespace would result in incorrect text indentation for pre-formatted content such as a code block. Maybe this is not an issue for you, but I've been having a hard time figuring out a solution here because the textContent property contains indentation that is ignored in markup (ie from indenting nested HTML tags in code).
your lineCharacters variable could be a string instead of an array, and then you wouldn't need to perform a join to convert it to a string. I haven't profiled it, but I suspect this would be faster and likely use less memory as strings are better optimized for holding text than arrays.

You also mentioned that the approach you're using doesn't work for nested elements (ie links). You could probably use createTreeWalker and walk all the child text nodes to address that.

Ben Nadel Oct 30, 2024 at 4:07 PM

16,255 Comments

@Allan,

Thankfully, my particular use-case was very much controlled. I only had a plain-text input (<textarea>) that the user was typing into. Then, I was previewing that rendering in a <div> and subsequently drawing it to a Canvas. So, I ended up not having to deal with any special formatting issues like <pre> tags. Heck, I didn't even support bold or italics or anything. It was definitely a minimum viable product.

I'm curious to hear more about what you're trying to do. Are you screen-shotting a more complex UI or something?

Allan Deutsch Oct 30, 2024 at 6:49 PM

2 Comments

@Ben Nadel,

If you've ever used the inline commenting feature for a document in Microsoft word or a google doc, I'm basically building that UI for any website. So I can capture anything I want when the comment is made, but I need to reliably reproduce the highlight on the same content in future page loads.

Oh my chickens, this post is old!

Hit me up on LinkedIn if you want to discuss it further.