Skip to main content
Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Now Using utf8mb4 To Enable Emoji In My Comments

By Ben Nadel on
Tags: SQL

This post is mostly here so that I can test the comment feature in production. I just updated my blog_comment table to use the utf8mb4 character set, which means that it can now support Emoji characters:

ALTER TABLE `blog_comment`
	MODIFY COLUMN `content` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
	MODIFY COLUMN `content_markdown` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
;

From what I understand, the utf8mb4 character set enables support for characters in the Astral Plane, which is where many (but not all) emoji characters live. Specifically, it supports Emjoi that require 4-bytes to encode.

On it's own, utf8mb4 doesn't enable Emoji support in my running MySQL server. I also had to tell the INSERT statement to use the appropriate character set:

SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

INSERT INTO blog_comment 
(
	content,
	content_markdown,

	-- .... truncated for demo ....

I believe that there is a way to configure the MySQL server to use those settings by default; however, that's beyond my experience at this point. So, rather than go down that rabbit-hole this morning, I'm just using SET NAMES, which I know works.

And now, hopefully, I'll be able to add Emoji in the comments below!



Reader Comments

@Jamie,

Woot! It's alive!!

It looks like something in the last Adobe CF Hotfix messed with my key-casing.

Reply to this Comment

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

^ I read this once a year to refresh my memory. If my understanding is correct, unicode is a "character set" and utf8mb4 is an "encoding."

Unicode has "code points" to represent 1,112,064 different characters across all the world's languages (and some random stuff like emojis too). utf8 can store most of the code points of unicode (using up to 3 bytes), and utf84mb can store them all (including emojis, using up to 4 bytes).

This is how you can say hello in Mandarin/UTF8/binary:
$ echo -n "δ½ ε₯½" | xxd -b
11100100 10111101 10100000 11100101 10100101 10111101

ε› η‚Ίζˆ‘ε­ΈδΈ­ζ–‡ε’Œζˆ‘δ½εœ¨ε°η£οΌŒζˆ‘ηŸ₯ι“ι€™ε€‹γ€‚πŸ˜…

Reply to this Comment

@Casey,

Ha, I know that I've read that article a handful of times as well :D Somehow, this stuff never quite plants itself permanently in my brain, and I have to re-learn aspects of this over and over. I think you are right about the "character set" vs. "encoding". ... I think πŸ€·β€β™‚οΈ

Reply to this Comment

Nice πŸ€“!

I posted about this a few weeks ago as well. It gives a bit of an explanation: https://www.petefreitag.com/item/895.cfm

From my experience you don't need to do the SET NAMES thing, you can instead alter the DB schema

ALTER TABLE tableName
DEFAULT CHARACTER SET utf8mb4,
MODIFY columnName varchar(200)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
MODIFY anotherColumn text
CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;

And you also need to make sure your database driver is using utf8, for mysql that might be adding useUnicode=true to the connection string.

Reply to this Comment

@Pete,

I modified the schema - it might be this piece that I am still missing:

And you also need to make sure your database driver is using utf8, for mysql that might be adding useUnicode=true to the connection string.

Though, I feel like we tried that at work, and we still needed to include the SET NAMES stuff. I'll have to play around.

Reply to this Comment

@All,

As a quasi-follow-up to this post, I started to think more deeply about how I might further leverage Emoji in my blog. The native chr() and asc() functions in ColdFusion only work for the Unicode range 0x0000 to 0xffff (or, 0 to 65,535). Once you go above that into the "supplemental" character range, these functions no longer work. As such, I believe you have to dip down into the Java layer:

www.bennadel.com/blog/3804-printing-emoji-characters-from-unicode-codepoints-in-lucee-cfml-5-3-5-92.htm

I know that, these days, you can embed Emoji directly within your CFML code. But, that just feels janky to me (probably cause I'm an old dog learning new tricks). As such, I'm really interesting in being able to render Emoji characters from the Unicode CodePoint.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Blog
Live in the Now
Oops!
NEW: Some basic markdown formatting is now supported: bold, italic, blockquotes, lists, fenced code-blocks. Read more about markdown syntax »
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.