2017 twenty-four merry days of Perl


Printing Unicode with Perl - 2017-12-01

One of the great things about the Wise Old Elf's job was that he got to keep up with all the latest fads the kids were up to. He fondly remembered the hula-hoop craze. Micromachines. Pogs. Oh the memories!

Today's kids are all about text messages. But not just any old text in their text messages. No, these kids are all about the Emojis. And honestly, how could anyone at the North Pole not love Emojis too? After all, Santa has his very own emoji: character 127877 in Unicode, FATHER CHRISTMAS.


The Wise Old Elf was working on a Perl script to generate text messages to all the elves telling them what a good job they were doing this year. And, of course, this would include the Father Christmas emoji.

The first thing the Wise Old Elf did was universally enable UTF-8 encoding on STDOUT, STDIN, and STDERR with an open pragma:


use open qw( :encoding(UTF-8) :std );

Now when he printed out his emoji, Perl would output the byte sequence necessary to represent that emoji in UTF-8, and (hopefully) the other end would be able to recognize the code for Santa and Do The Right Thing.
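
To make that byte sequence concrete, here is a minimal sketch that encodes the character by hand with the core Encode module. The helper name utf8_hex is made up for this illustration; with the open pragma in place, Perl does the equivalent encoding for you whenever you print.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the article): return the UTF-8 bytes
# of a string as space-separated hex pairs.
sub utf8_hex {
    my ($string) = @_;
    my $bytes = encode('UTF-8', $string);    # characters -> bytes
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

print utf8_hex("\x{1F385}"), "\n";    # F0 9F 8E 85
```

One character goes in and four bytes come out, which is why both ends of the conversation need to agree on the encoding.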

He then simply printed out the message he wanted, using the hexadecimal code for 127877, 1F385.


print "\x{1F385} thanks you for all your hard work tonight";

No sooner had he typed this than he started to have second thoughts. \x{1F385} isn't exactly the most instantly recognizable code. What if he had accidentally typed \x{1F383}, the Unicode character for the jack-o-lantern? What a Nightmare Before Christmas situation that would be!
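
One way to double-check a code like this is to ask Perl for the character's official name. The sketch below uses charnames::viacode from the core charnames module; the helper name_of is an invention for this example, and it assumes a Perl recent enough to know about these emoji.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper (not from the article): official Unicode name
# for a code point given as a hex string.
sub name_of {
    my ($hex) = @_;
    return charnames::viacode(hex $hex);
}

printf "%X\n", 127877;             # 1F385
print name_of('1F385'), "\n";      # FATHER CHRISTMAS
print name_of('1F383'), "\n";      # JACK-O-LANTERN
```

A two-digit slip really is all that separates Santa from the pumpkin.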

Of course, the Wise Old Elf was using a modern editor which would be happy to edit UTF-8 encoded source code, so he could simply enable UTF-8 source code in Perl too by declaring:


use utf8;

at the top of the source code, and then he'd be able to simply write:


print "🎅 thanks you for all your hard work tonight";
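
What use utf8 actually changes can be seen by comparing the same literal with and without it. This is only a sketch, and it assumes the file itself has been saved as UTF-8; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8.
my $decoded = do { use utf8; "🎅" };    # source decoded: one character
my $raw     = do { no utf8;  "🎅" };    # source not decoded: raw bytes

print length($decoded), "\n";    # 1
print length($raw), "\n";        # 4
printf "%X\n", ord $decoded;     # 1F385
```

Without the pragma, Perl sees four separate bytes rather than one Santa, and those bytes would then be encoded a second time on output.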

Was he done? He wasn't so sure.

Typing this would either involve some crazy keyboard shenanigans, or he'd have to copy and paste from a Unicode palette. It wasn't the best situation, but he could put up with that.

The main problem was that the Wise Old Elf was Old enough (and Wise enough) to know that not all editors support UTF-8 source code, and all it would take is one Elf using an editor that did the wrong thing with encodings to potentially break everything. The last thing the Wise Old Elf needed was to post a message of thanks from a bunch of unrenderable character sequences instead of the man in the red suit.

Luckily, Perl is able to represent all Unicode characters in a simple, readable, ASCII-encodable way. You can use the \N{...} sequence with the official name of the character:


print "\N{FATHER CHRISTMAS} thanks you for all your hard work tonight";
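
As a quick sanity check, the sketch below compares a few ASCII-only spellings of the same character. The variable names are invented for the example; charnames::string_vianame is the runtime lookup from the core charnames module.

```perl
use strict;
use warnings;
use charnames ':full';    # needed before Perl 5.16; harmless afterwards

my $by_name = "\N{FATHER CHRISTMAS}";    # official name
my $by_code = "\N{U+1F385}";             # code point, \N style
my $by_hex  = "\x{1F385}";               # code point, \x style

# Runtime lookup, handy when the name arrives as data.
my $looked_up = charnames::string_vianame('FATHER CHRISTMAS');

print "all four agree\n"
    if $by_name eq $by_code
    && $by_name eq $by_hex
    && $by_name eq $looked_up;
```

From Perl 5.16 onwards, \N{...} with a name loads charnames automatically, so the explicit use line only matters on older Perls.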

This article contributed by: Mark Fowler <mark@twoshortplanks.com>