YA Perl Advent Calendar 2005-12-13

Have you ever found yourself writing

     s/(wibble)wobble/$1wubble/g;

and felt it was somehow wrong, and ineffecient? If you dug a little deeper you probably would have learned about positive look-behind in perlre. Today's module, Regexp::Keep, gives us another way to do that.

     s/wibble\Kwobble/wubble/g;

Wahoo, right? Okay, so consider this: while the documentation doesn't exactly speak to the point, the author was actually giving you a means to implement variable width look-behinds. That's right, if you've ever tried using look-behinds you've likely been bitten by the fact that in order to implement them the perl regexp engine limits you to a fixed width. Now, the module isn't exactly speedy when you could be getting away with using a (fixed width) look-behind

Keep:
           Rate   keep behind
keep   171277/s     --   -41%
behind 288725/s    69%     --

Nevertheless, it is faster than save-and-replace-with-self

Star:
            Rate perlOUT  perlIN keepOUT  keepIN
perlOUT 105519/s      --     -4%    -22%    -25%
perlIN  109523/s      4%      --    -19%    -22%
keepOUT 134998/s     28%     23%      --     -4%
keepIN  140203/s     33%     28%      4%      --

Plus:
            Rate perlOUT  perlIN keepOUT  keepIN
perlOUT 110951/s      --     -4%    -17%    -21%
perlIN  116117/s      5%      --    -13%    -17%
keepOUT 132882/s     20%     14%      --     -5%
keepIN  139665/s     26%     20%      5%      --

Nmbr:
            Rate perlOUT  perlIN keepOUT  keepIN
perlOUT 107875/s      --     -4%    -30%    -34%
perlIN  112177/s      4%      --    -27%    -32%
keepOUT 153858/s     43%     37%      --     -6%
keepIN  164029/s     52%     46%      7%      --

This is all achieved with some seriously strange voodoo: overloading the qr operator, the experimental (?{ }) assertion and a little XS to tweak the guts of perl. But it works. By the way, there's obviously more you could learn about about regular expression construction and performance from those benchmarks but I'll leave that as an exercise for the reader. Or maybe you want to consider using a sexeger instead; also from the same author.

Here's the code for the benchmarks, I ran it on a Sun Fire V440 with perl 5.8.4

ADDENDUM: You can in fact have multiple \K. No, \K and /g together don't work, read the POD and then think about it.

mod13.pl

   1 use Benchmark 'cmpthese';
   2 use Regexp::Keep;
   3 
   4 #Pull out the regular expression compilation to test for overloading penalty
   5 my $pstar = qr/(.*)\..*/;
   6 my $kstar = qr/.*\K\..*/;
   7 my $pplus = qr/(.+)\..*/;
   8 my $kplus = qr/.+\K\..*/;
   9 my $pnmbr = qr/(\w{3})\..*/;
  10 my $knmbr = qr/\w{3}\K\..*/;
  11 
  12 print "Keep:\n";
  13 cmpthese(2_000_000,
  14 	 {
  15 	  behind=>sub{
  16 	    # slow and inefficient
  17 	    my $r = "abc.def.ghi.jkl";
  18 	    $r =~ s/(?<=.{3})\..*//;
  19 	  },
  20 	  keep=>sub{
  21 	    # fast and efficient
  22 	    my $r = "abc.def.ghi.jkl";
  23 	    $r =~ s/.{3}\K\..*//;
  24 	  }
  25          }
  26 	);
  27 
  28 print "Star:\n";
  29 cmpthese(2_000_000,
  30 	 {
  31 	  perlIN=>sub{
  32 	    # slow and inefficient
  33 	    my $r = "abc.def.ghi.jkl";
  34 	    $r =~ s/(.*)\..*/$1/;
  35 	  },
  36 	  keepIN=>sub{
  37 	    # fast and efficient
  38 	    my $r = "abc.def.ghi.jkl";
  39 	    $r =~ s/.*\K\..*//;
  40 	  },
  41 	  perlOUT=>sub{
  42 	    # slow and inefficient
  43 	    my $r = "abc.def.ghi.jkl";
  44 	    $r =~ s/$pstar/$1/;
  45 	  },
  46 	  keepOUT=>sub{
  47 	    # fast and efficient
  48 	    my $r = "abc.def.ghi.jkl";
  49 	    $r =~ s/$kstar//;
  50 	  }
  51 	 }
  52 	);
  53 
  54 print "\nPlus:\n";
  55 cmpthese(2_000_000,
  56 	 {
  57 	  perlIN=>sub{
  58 	    # slow and inefficient
  59 	    my $r = "abc.def.ghi.jkl";
  60 	    $r =~ s/(.+)\..*/$1/;
  61 	  },
  62 	  keepIN=>sub{
  63 	    # fast and efficient
  64 	    my $r = "abc.def.ghi.jkl";
  65 	    $r =~ s/.+\K\..*//;
  66 	  },
  67 	  perlOUT=>sub{
  68 	    # slow and inefficient
  69 	    my $r = "abc.def.ghi.jkl";
  70 	    $r =~ s/$pplus/$1/;
  71 	  },
  72 	  keepOUT=>sub{
  73 	    # fast and efficient
  74 	    my $r = "abc.def.ghi.jkl";
  75 	    $r =~ s/$kplus//;
  76 	  }
  77 	 }
  78 	);
  79 
  80 print "\nNmbr:\n";
  81 cmpthese(2_000_000,
  82 	 {
  83 	  perlIN=>sub{
  84 	    # slow and inefficient
  85 	    my $r = "abc.def.ghi.jkl";
  86 	    $r =~ s/(\w{3})\..*/$1/;
  87 	  },
  88 	  keepIN=>sub{
  89 	    # fast and efficient
  90 	    my $r = "abc.def.ghi.jkl";
  91 	    $r =~ s/\w{3}\K\..*//;
  92 	  },
  93 	  perlOUT=>sub{
  94 	    # slow and inefficient
  95 	    my $r = "abc.def.ghi.jkl";
  96 	    $r =~ s/$pnmbr/$1/;
  97 	  },
  98 	  keepOUT=>sub{
  99 	    # fast and efficient
 100 	    my $r = "abc.def.ghi.jkl";
 101 	    $r =~ s/$knmbr//;
 102 	  }
 103 	 }
 104 	);