Skip to main content
Ben Nadel at Scotch On The Rock (SOTR) 2010 (Amsterdam) with: Ray Camden and David Huselid and Alison Huselid
Ben Nadel at Scotch On The Rock (SOTR) 2010 (Amsterdam) with: Ray Camden ( @cfjedimaster ) David Huselid Alison Huselid

Now Using utf8mb4 To Enable Emoji In My Comments

By on
Tags:

This post is mostly here so that I can test the comment feature in production. I just updated my blog_comment table to use the utf8mb4 character set, which means that it can now support Emoji characters:

ALTER TABLE `blog_comment`
	MODIFY COLUMN `content` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
	MODIFY COLUMN `content_markdown` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
;

From what I understand, the utf8mb4 character set enables support for characters in the Astral Plane, which is where many (but not all) emoji characters live. Specifically, it supports Emjoi that require 4-bytes to encode.

On it's own, utf8mb4 doesn't enable Emoji support in my running MySQL server. I also had to tell the INSERT statement to use the appropriate character set:

SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

INSERT INTO blog_comment 
(
	content,
	content_markdown,

	-- .... truncated for demo ....

I believe that there is a way to configure the MySQL server to use those settings by default; however, that's beyond my experience at this point. So, rather than go down that rabbit-hole this morning, I'm just using SET NAMES, which I know works.

And now, hopefully, I'll be able to add Emoji in the comments below!

Want to use code from this post? Check out the license.

Reader Comments

1 Comments

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

^ I read this once a year to refresh my memory. If my understanding is correct, unicode is a "character set" and utf8mb4 is an "encoding."

Unicode has "code points" to represent 1,112,064 different characters across all the world's languages (and some random stuff like emojis too). utf8 can store most of the code points of unicode (using up to 3 bytes), and utf84mb can store them all (including emojis, using up to 4 bytes).

This is how you can say hello in Mandarin/UTF8/binary:
$ echo -n "你好" | xxd -b
11100100 10111101 10100000 11100101 10100101 10111101

因為我學中文和我住在台灣,我知道這個。😅

15,640 Comments

@Casey,

Ha, I know that I've read that article a handful of times as well :D Somehow, this stuff never quite plants itself permanently in my brain, and I have to re-learn aspects of this over and over. I think you are right about the "character set" vs. "encoding". ... I think 🤷‍♂️

19 Comments

Nice 🤓!

I posted about this a few weeks ago as well. It gives a bit of an explanation: https://www.petefreitag.com/item/895.cfm

From my experience you don't need to do the SET NAMES thing, you can instead alter the DB schema

ALTER TABLE tableName
DEFAULT CHARACTER SET utf8mb4,
MODIFY columnName varchar(200)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
MODIFY anotherColumn text
CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;

And you also need to make sure your database driver is using utf8, for mysql that might be adding useUnicode=true to the connection string.

15,640 Comments

@Pete,

I modified the schema - it might be this piece that I am still missing:

And you also need to make sure your database driver is using utf8, for mysql that might be adding useUnicode=true to the connection string.

Though, I feel like we tried that at work, and we still needed to include the SET NAMES stuff. I'll have to play around.

15,640 Comments

@All,

As a quasi-follow-up to this post, I started to think more deeply about how I might further leverage Emoji in my blog. The native chr() and asc() functions in ColdFusion only work for the Unicode range 0x0000 to 0xffff (or, 0 to 65,535). Once you go above that into the "supplemental" character range, these functions no longer work. As such, I believe you have to dip down into the Java layer:

www.bennadel.com/blog/3804-printing-emoji-characters-from-unicode-codepoints-in-lucee-cfml-5-3-5-92.htm

I know that, these days, you can embed Emoji directly within your CFML code. But, that just feels janky to me (probably cause I'm an old dog learning new tricks). As such, I'm really interesting in being able to render Emoji characters from the Unicode CodePoint.

15,640 Comments

I finally got around to upgrading my MySQL database version and my table schemas. Now, everything runs on utf8mb4. And, as @Pete pointed out, I can remove the SET NAMES stuff from my SQL statements as long as I have have characterEncoding=UTF-8 in my connection string. I also took this as an opportunity cleanup that connection string as well (as this tends to rot over time).

www.bennadel.com/blog/4218-upgrading-my-coldfusion-blog-from-mysql-5-7-10-to-mysql-8-0-28.htm

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel