Skip to main content
Ben Nadel at CFUNITED 2008 (Washington, D.C.) with: Matt Woodward
Ben Nadel at CFUNITED 2008 (Washington, D.C.) with: Matt Woodward ( @mpwoodward )

Now Using utf8mb4 To Enable Emoji In My Comments

By on

This post is mostly here so that I can test the comment feature in production. I just updated my blog_comment table to use the utf8mb4 character set, which means that it can now support Emoji characters:

ALTER TABLE `blog_comment`
	MODIFY COLUMN `content` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
	MODIFY COLUMN `content_markdown` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL

From what I understand, the utf8mb4 character set enables support for characters in the Astral Plane, which is where many (but not all) emoji characters live. Specifically, it supports Emjoi that require 4-bytes to encode.

On it's own, utf8mb4 doesn't enable Emoji support in my running MySQL server. I also had to tell the INSERT statement to use the appropriate character set:

SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

INSERT INTO blog_comment 

	-- .... truncated for demo ....

I believe that there is a way to configure the MySQL server to use those settings by default; however, that's beyond my experience at this point. So, rather than go down that rabbit-hole this morning, I'm just using SET NAMES, which I know works.

And now, hopefully, I'll be able to add Emoji in the comments below!

Want to use code from this post? Check out the license.

Reader Comments


Huh, it looks like the key-casing on my returned Struct suddenly changed? That's..... wonky.



Woot! It's alive!!

It looks like something in the last Adobe CF Hotfix messed with my key-casing.


^ I read this once a year to refresh my memory. If my understanding is correct, unicode is a "character set" and utf8mb4 is an "encoding."

Unicode has "code points" to represent 1,112,064 different characters across all the world's languages (and some random stuff like emojis too). utf8 can store most of the code points of unicode (using up to 3 bytes), and utf84mb can store them all (including emojis, using up to 4 bytes).

This is how you can say hello in Mandarin/UTF8/binary:
$ echo -n "δ½ ε₯½" | xxd -b
11100100 10111101 10100000 11100101 10100101 10111101

ε› η‚Ίζˆ‘ε­ΈδΈ­ζ–‡ε’Œζˆ‘δ½εœ¨ε°η£οΌŒζˆ‘ηŸ₯ι“ι€™ε€‹γ€‚πŸ˜…



Ha, I know that I've read that article a handful of times as well :D Somehow, this stuff never quite plants itself permanently in my brain, and I have to re-learn aspects of this over and over. I think you are right about the "character set" vs. "encoding". ... I think πŸ€·β€β™‚οΈ


Nice πŸ€“!

I posted about this a few weeks ago as well. It gives a bit of an explanation:

From my experience you don't need to do the SET NAMES thing, you can instead alter the DB schema

MODIFY columnName varchar(200)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
MODIFY anotherColumn text

And you also need to make sure your database driver is using utf8, for mysql that might be adding useUnicode=true to the connection string.



I modified the schema - it might be this piece that I am still missing:

And you also need to make sure your database driver is using utf8, for mysql that might be adding useUnicode=true to the connection string.

Though, I feel like we tried that at work, and we still needed to include the SET NAMES stuff. I'll have to play around.



As a quasi-follow-up to this post, I started to think more deeply about how I might further leverage Emoji in my blog. The native chr() and asc() functions in ColdFusion only work for the Unicode range 0x0000 to 0xffff (or, 0 to 65,535). Once you go above that into the "supplemental" character range, these functions no longer work. As such, I believe you have to dip down into the Java layer:

I know that, these days, you can embed Emoji directly within your CFML code. But, that just feels janky to me (probably cause I'm an old dog learning new tricks). As such, I'm really interesting in being able to render Emoji characters from the Unicode CodePoint.


I finally got around to upgrading my MySQL database version and my table schemas. Now, everything runs on utf8mb4. And, as @Pete pointed out, I can remove the SET NAMES stuff from my SQL statements as long as I have have characterEncoding=UTF-8 in my connection string. I also took this as an opportunity cleanup that connection string as well (as this tends to rot over time).

Post A Comment — I'd Love To Hear From You!

NEW: Some basic markdown formatting is now supported: bold, italic, blockquotes, lists, fenced code-blocks. Read more about markdown syntax »
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.
I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel