Perl Advent Calendar 2008-12-19+20

'Tis the season to tie one on

by Jerrad Pierce

Three modules for the price of two, to make up for the lag.

Santa's terrible with names and spelling, why do you think he keeps all those lists? Eventually even this saint grew impatient with his computer's inability to do what he meant, just because he added an extra S or left off a silent H in some kids name, so he tried some of the solutions on CPAN but none met his exacting standards; it's all black or white with this guy! Not afraid to (re)invent the wheel he rolled his own with three other helpful libraries from the respository, and created a simple tied hash that was forgiving of false figuration and fat finger syndrome.1

Tie::OneOff

Tie::OneOff provides several mechanisms for implementing tie-d interfaces without extraneous files or packages. All follow the same premise of the user providing an explicit hash(ref) for use as a variable accesor method dispatch table (lines 7–18), rather than class methods in another package. Something similar could be achieved through the use of AUTOLOAD but if you're going to use syntactic sugar, why not go for as short and sweet as possible?

Text::Metaphone

You may be familiar with Soundex, a nearly century-old method of comparing surnames. A simple and ready standby, Soundex maps a string into a space of 56_862 buckets denoted by a four-character string, resulting in a number of collisions. Metaphone on the other hand, offers an infinite digest space since its hashes are variable width. Both ignore vowels, but besides not truncating results, Metaphone also takes account of digraphs and trigraphs in its mapping. By default, this implementation deviates from the original and maps certain cases of SCH to "sh" or X so that the author's name is correctly encoded. Unfortunately some other cases are missed, so a common misspelling of Schmidt is encoded differently:

  % perl -MText::Metaphone -le 'print Metaphone($_) for qw/Schmidt Shmidt/'
  SKMTT
  XMTT

String::Approx

In order to ensure that homophones such as Schmidt, Schmitt, Shmidt and Shmitt are not unnecessarily perceived as distinct, Santa made use of String::Approx. With it, even though each has a different Metaphonic hash,

SKMTT
SKMT
XMTT
XMT

John Jacob Jingleheimer'll get his Speak & Spell™ no matter how his last name is spelled; as long as we relax the matching criteria from the default 10% difference on Line 14 to catch Shmitt.

mod20.pl

   1 use Tie::OneOff;
   2 use Text::Metaphone;
   3 use String::Approx 'amatch';
   4 
   5 tie my %ahash, 'Tie::OneOff', do{
   6   my %_hash, #Private stash w/o a closure
   7     {
   8      BASE  => \%_hash,
   9      FETCH => sub {
  10        my $key = shift;
  11        return undef unless %_hash;
  12        return $_hash{$key} if exists $_hash{$key};
  13        my %real = map{ Metaphone($_)=>$_ } keys %_hash;
  14        my @like = amatch(Metaphone($key), ['15%'], keys %real);
  15        return @like ? $_hash{ $real{$like[0]} } : undef;
  16      },
  17      STORE => sub { $_hash{$_[0]} = $_[1] }
  18     }
  19   };
  20 
  21 @ahash{'Jerrad Pierce', 'John Jacob Jingleheimer Schmidt'} =
  22       ('Snowpeak titanium sporks', 'Speak & Spell');
  23 
  24 print join("\n", @ahash{'Jarad Pearse', 'Jon Jacob Jinglehiemer Schmitt'}, '');
  25 #Snowpeak titanium sporks
  26 #Speak & Spell

1. Rather than employ the Metaphone hash as the key in its private data store, Santa's code preserves keys and only seeks similar indexes if an exact match is not found. This approach reduces collisions, and is most appropriate if the hash has been populated with an authoritative source.

View Source (POD)