2017 twenty-four merry days of Perl Feed

It's All There in the Numbers!

Unicode::UCD - 2017-12-13

The number of letters Santa received each year was truly staggering requiring the operations team at the North Pole to invest heavily in automation just to keep up. With state of the art OCR and lexical analysis AI to process the inputs - light-years ahead of what the humans were using - they were able to process almost any letter into an ordered list that their fulfillment team could process:

    1 Barbie
    3 Oranges
    24 Coloring pencils

Which is why Ponsettia Fairytrifle, head of the Letter Processing Team was surprised when Yule Cuddlefluff, head of the fulfillment team burst into her office door with a complaint right when she was in the middle of morning tea break.

"It's no good", he said, "we're just getting garbage out the other end. It'll need to be scrapped. Go back to the old way, with teams of elves reading the letters round the clock"

"Just wait a minute", Ponsettia countered, "before we throw the elfling out with the bathwater, how about you show me what's wrong?"

"What's Wrong! What's Wrong! Just look at this!"

    ১২ Shopkins

"Pure garbage!"

Well, Ponsettia did agree that did look a little unusual, but she thought she'd dig a little before scrapping her entire department. She wondered what those weird characters were, and decided to find out. Perl's charnames::viacode is able to print out the name for any unicode codepoint, and ord can turn any individual character into a codepoint, so...


1: 
2: 
3: 
4: 

 

use utf8;
use charnames qw(:full);
my @chars = split //, "১২";
say charnames::viacode(ord $_) foreach @chars;

 

At which point the script printed out:

    BENGALI DIGIT ONE
    BENGALI DIGIT TWO

And everything became clear! The weird junk was just numbers in another script! After all, the letters did come in from all around the world...

"Don't worry, I can fix this", Ponsettia explained to Yule, "It's just a minor change to the code".

Unicode::UCD is a core module that has been shipping with Perl since the pre-releases of Perl 5.8. One of the things it can do is turn digits of any script into something Perl will be able treat as a number.


1: 
2: 

 

use Unicode::UCD qw(num);
say num("১২"); # prints "12"

 

However, it's clever enough to return undefined if you start mixing your scripts (as that's likely pure nonsense)

use Unicode::UCD qw(num);
say 'junk'
  if !defined num("\N{DEVANAGARI DIGIT THREE}\N{CHAM DIGIT TWO}");

A few minutes after Yule had left Ponsettia had already checked in a change to her codebase to make the lists print out the right thing. Disaster quickly averted she returned to her still warm cup of tea.

Gravatar Image This article contributed by: Mark Fowler <mark@twoshortplanks.com>