Perl Advent Calendar 2008-12-14

L-HO-HO-HO-st in translation

by Yanick Champoux

There's no doubt about it, Santa has the glorious part of the job. Riding through the sky in his pimped-out reindeer sled, distributing gifts, reaping love, getting offered cookies and milk wherever he goes.

But this is only the bright, shiny tip of the North Pole Celebration Manufacturing Machine. There's a full year of labor that hides behind the Big Red One 24 hours of eggnog-scented glory. Thousands of unsung elves toil on hundred of tasks to make it happen.

One of those crucial tasks is the processing of all Christmas letters. After all, a big part of the magic of Christmas is how Santa Claus always bring the perfect gift (although the system can experience glitches from time to time, as the piles of tube socks I've collected through the years can attest to). There can be no cheating done there, no easy way out: each letter has to be parsed, processed, groked and entered in the big gift database.

If having to process a heap of mail the size of mount Everest wasn't enough, one has to factor in that those letters are written in all languages, and more often than not peppered with the wonderfully cute spelling mistakes that is the hallmark of six year olds all over the world.

As more and more scribe elves are cracking under the pressure, Santa decided that it was time to call the IT elves (and Perl) to the rescue. He gave them the task to come with a system that would take the transcript of a child's letter, figure out in which language it's written, run it through a quick spellcheck and then translate it to something all elves can read; which happens to be English, thanks to the Polar population's addiction to Sesame Street.

As it had been reported on the 4th and 13th, the IT elves already succeeded in implementing the first two steps of Santa's masterplan. The only part left to be done was the translation.

The module to do the job, Lingua::Translate, was found pretty quickly. Since it provides an interface to online translation services, it was the perfect way to outsource the always-thorny problem of translation to someone more qualified for the job. But the question remained, which of the many available backends should be used? The distribution itself provides ::Babelfish & ::SysTran, but Lingua::Translate::InterTran and Lingua::Translate::Google were also available. In a thoroughly unscientific effort to determine which was best, they had a sequilingual intern feed each the same sample letter:

Original letter:

  Cher Papa Noel,

  Comme cadeau, je veux un gros camion jaune.

  Bisoux,
  Martin

Babelfish translation:

  Dear Papa Christmas,

  Like gift, I want a large truck yellow.

  Bisoux,
  Martin

SysTran translation:

  
  # No SysTran server available.
  # Next year's wish list?
  #
  # —Glugg

Google translation:

  Dear Papa Noel, 

  As a gift, I want a big yellow truck. 

  Bisoux, 
  Martin

InterTran translation:

  Dear Dad Noel, 

  As freebie, I want to a big lorry yellow. 

  Bison, 
  Martial

Of course, the number of languages each service supports is also important, but unfortunately the implementations of the method intended to report this information available varied in quality. More frustratingly, this method required the creation of a disposable object in order to be checked. Nevertheless, based on this preliminary test run, the elves agreed that the Google backend looked like the most promising candidate, and plugged it in to the prototype script.

Rumors are that, next year, the IT staff is looking at hooking an optical character recognition engine into the script to do away with the re-typing of the letters. The postal elves can barely wait!

mod14.pl

   1 #!/usr/bin/perl 
   2 
   3 use strict;
   4 use warnings;
   5 
   6 use Text::SpellChecker;
   7 use Lingua::Ident;
   8 use Lingua::Translate;
   9 
  10 Lingua::Translate::config( back_end => 'Google' );
  11 
  12 my $letter = join '', <>;
  13 
  14 print "--- original letter:\n\n$letter\n\n";
  15 
  16 # TODO: add all the languages of the world. -- Glugg
  17 my $ident = Lingua::Ident->new( map "data.$_" => qw/ en fr de / );
  18 
  19 my $speller = Text::SpellChecker->new(
  20     text => $letter,
  21     lang => $ident->identity($letter),
  22 );
  23 
  24 while ( my $word = $speller->next_word ) {
  25 
  26     # just take the first suggestion,
  27     # if there is one
  28     my ($corrected) = $speller->suggestions;
  29     $speller->replace_all( new_word => $corrected ) if $corrected;
  30 }
  31 
  32 print "--- corrected letter:\n\n", $speller->text, "\n\n";
  33 
  34 my $xl8r = Lingua::Translate->new( src => $lang, dest => 'en' );
  35 
  36 print "--- translated letter:\n", $xl8r->translate( $speller->text ), "\n\n";
  37 
  38 __END__
View Source (POD)