Have you ever found yourself writing
s/(wibble)wobble/$1wubble/g;
and felt it was somehow wrong, and ineffecient? If you dug a little deeper
you probably would have learned about positive look-behind in perlre.
Today's module,
Regexp::Keep,
gives us another way to do that.
s/wibble\Kwobble/wubble/g;
Wahoo, right? Okay, so consider this: while the documentation doesn't exactly
speak to the point, the author was actually giving you a means to implement
variable width look-behinds. That's right, if you've ever tried using
look-behinds you've likely been bitten by the fact that in order to implement
them the perl regexp engine limits you to a fixed width. Now, the module isn't
exactly speedy when you could be getting away with using a (fixed width)
look-behind
Keep:
Rate keep behind
keep 171277/s -- -41%
behind 288725/s 69% --
Nevertheless, it is faster than save-and-replace-with-self
Star:
Rate perlOUT perlIN keepOUT keepIN
perlOUT 105519/s -- -4% -22% -25%
perlIN 109523/s 4% -- -19% -22%
keepOUT 134998/s 28% 23% -- -4%
keepIN 140203/s 33% 28% 4% --
Plus:
Rate perlOUT perlIN keepOUT keepIN
perlOUT 110951/s -- -4% -17% -21%
perlIN 116117/s 5% -- -13% -17%
keepOUT 132882/s 20% 14% -- -5%
keepIN 139665/s 26% 20% 5% --
Nmbr:
Rate perlOUT perlIN keepOUT keepIN
perlOUT 107875/s -- -4% -30% -34%
perlIN 112177/s 4% -- -27% -32%
keepOUT 153858/s 43% 37% -- -6%
keepIN 164029/s 52% 46% 7% --
This is all achieved with some seriously strange voodoo: overloading the qr
operator, the experimental (?{ }) assertion and a little XS to tweak
the guts of perl. But it works. By the way, there's obviously more you could
learn about about regular expression construction and performance from those
benchmarks but I'll leave that as an exercise for the reader. Or maybe you want
to consider using a
sexeger instead;
also from the same author.
Here's the code for the benchmarks, I ran it on a Sun Fire V440 with perl 5.8.4
ADDENDUM: You can in fact have multiple \K. No, \K and /g together don't work, read the POD and then think about it.
1 use Benchmark 'cmpthese';
2 use Regexp::Keep;
3
4 #Pull out the regular expression compilation to test for overloading penalty
5 my $pstar = qr/(.*)\..*/;
6 my $kstar = qr/.*\K\..*/;
7 my $pplus = qr/(.+)\..*/;
8 my $kplus = qr/.+\K\..*/;
9 my $pnmbr = qr/(\w{3})\..*/;
10 my $knmbr = qr/\w{3}\K\..*/;
11
12 print "Keep:\n";
13 cmpthese(2_000_000,
14 {
15 behind=>sub{
16 # slow and inefficient
17 my $r = "abc.def.ghi.jkl";
18 $r =~ s/(?<=.{3})\..*//;
19 },
20 keep=>sub{
21 # fast and efficient
22 my $r = "abc.def.ghi.jkl";
23 $r =~ s/.{3}\K\..*//;
24 }
25 }
26 );
27
28 print "Star:\n";
29 cmpthese(2_000_000,
30 {
31 perlIN=>sub{
32 # slow and inefficient
33 my $r = "abc.def.ghi.jkl";
34 $r =~ s/(.*)\..*/$1/;
35 },
36 keepIN=>sub{
37 # fast and efficient
38 my $r = "abc.def.ghi.jkl";
39 $r =~ s/.*\K\..*//;
40 },
41 perlOUT=>sub{
42 # slow and inefficient
43 my $r = "abc.def.ghi.jkl";
44 $r =~ s/$pstar/$1/;
45 },
46 keepOUT=>sub{
47 # fast and efficient
48 my $r = "abc.def.ghi.jkl";
49 $r =~ s/$kstar//;
50 }
51 }
52 );
53
54 print "\nPlus:\n";
55 cmpthese(2_000_000,
56 {
57 perlIN=>sub{
58 # slow and inefficient
59 my $r = "abc.def.ghi.jkl";
60 $r =~ s/(.+)\..*/$1/;
61 },
62 keepIN=>sub{
63 # fast and efficient
64 my $r = "abc.def.ghi.jkl";
65 $r =~ s/.+\K\..*//;
66 },
67 perlOUT=>sub{
68 # slow and inefficient
69 my $r = "abc.def.ghi.jkl";
70 $r =~ s/$pplus/$1/;
71 },
72 keepOUT=>sub{
73 # fast and efficient
74 my $r = "abc.def.ghi.jkl";
75 $r =~ s/$kplus//;
76 }
77 }
78 );
79
80 print "\nNmbr:\n";
81 cmpthese(2_000_000,
82 {
83 perlIN=>sub{
84 # slow and inefficient
85 my $r = "abc.def.ghi.jkl";
86 $r =~ s/(\w{3})\..*/$1/;
87 },
88 keepIN=>sub{
89 # fast and efficient
90 my $r = "abc.def.ghi.jkl";
91 $r =~ s/\w{3}\K\..*//;
92 },
93 perlOUT=>sub{
94 # slow and inefficient
95 my $r = "abc.def.ghi.jkl";
96 $r =~ s/$pnmbr/$1/;
97 },
98 keepOUT=>sub{
99 # fast and efficient
100 my $r = "abc.def.ghi.jkl";
101 $r =~ s/$knmbr//;
102 }
103 }
104 );