As I was playing around with inserting text at the last known caret location yesterday, I stumbled upon a large gap in my mental model for how HTML works. For years, I've been using HTML entities to generate web-safe HTML markup. However, I only just realized that if you read the textContent of an element that contains HTML entities, you don't get the HTML markup of said element; you get the interpreted text content. What this means, as an example, is that if you render an emoji using hex-encoded HTML entities, reading the textContent of that element gives you the emoji glyph itself, not the entity markup.
To demonstrate, all we're going to do is render a paragraph that is composed entirely of HTML entities. Then, we're going to grab the textContent of that element and echo the value into both an input element and the browser's console.
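A demo along these lines can be sketched as follows. This is a minimal illustration and not the original demo code: the echoTextContent() helper, the sample entity string, and the dynamically created elements are all assumptions made for the example. The DOM portion only runs where a document exists, and it assumes the script executes after the body has been parsed.

```javascript
// Hypothetical helper (not from the original post): reads the textContent
// of a source element, copies it into a target input, and logs it.
function echoTextContent(source, target) {
	const text = source.textContent;

	target.value = text;
	console.log(text);

	return text;
}

// Only touch the DOM when one is available (i.e., in a browser).
if (typeof document !== "undefined") {
	const paragraph = document.createElement("p");
	const input = document.createElement("input");

	// Assigning to innerHTML sends the markup through the HTML parser,
	// which is the step that decodes the entities into characters.
	paragraph.innerHTML = "&lt;3 &amp; &#x1F600; &#x1F44D;&#x1F3FD;";

	// Assumes document.body already exists at this point.
	document.body.append(paragraph, input);

	echoTextContent(paragraph, input);
}
```

In a browser, both the input and the console should end up showing the decoded characters (a less-than sign, an ampersand, and the emoji glyphs) rather than the ampersand-prefixed entity markup.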
Our test paragraph contains some common HTML entities and some encoded emoji codepoint sequences. But, when we grab those values using textContent and echo them to other text-based outputs, we get the following output:
As you can see, the textContent property contains the interpreted text which, in this case, includes actual emoji glyphs, not the hex-encoded HTML entities that we used to define the HTML content.
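One rough way to picture what is happening: each numeric entity names a Unicode code point, and the parser swaps in the character for that code point before textContent ever sees it. The sketch below mimics only that one step in plain JavaScript. The decodeNumericEntities() function is a hypothetical name for illustration; it ignores named entities like &amp;amp; and the many edge cases that a real HTML parser handles, and String.fromCodePoint() will throw a RangeError for values above 0x10FFFF.

```javascript
// Hypothetical illustration: replaces decimal (&#128077;) and hex (&#x1F600;)
// numeric character references with the characters they name. This is a
// simplification of what the browser's HTML parser does, not a substitute.
function decodeNumericEntities(markup) {
	// A single pass handles both forms so that decoded output is never
	// re-scanned and decoded a second time.
	return markup.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, function (match, ref) {
		const isHex = ref[0].toLowerCase() === "x";
		const codePoint = isHex
			? parseInt(ref.slice(1), 16)
			: parseInt(ref, 10);

		return String.fromCodePoint(codePoint);
	});
}

// Both references decode to emoji glyphs; the space between them is kept.
console.log(decodeNumericEntities("&#x1F600; &#128077;"));
```

The point of the sketch is only that the entity and the glyph are two spellings of the same code point: the markup uses one, the DOM's text content holds the other.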
I can't believe I didn't know that the browser DOM (Document Object Model) worked this way. But, learning this is better late than never. I can definitely see this being helpful (unless you are one of those die-hards who believe that "state" should never be stored on the DOM).