Perl Advent Calendar 2006-12-01

Good things come in small packages

by Jerrad Pierce

Welcome to another edition of the Perl Advent Calendar. With some help from The Perl Foundation/Perl Mongers crew I've created PerlAdvent.PM here, where the current and future calendars can live; we are also working on providing a home for all of the previous calendars as well. I hope to delegate more of the writing duties to a wider variety of voices from the community with the support of several other mongers. And now, for the first installment. Enjoy.

Once you get around to reading it, you'll notice that today's code sample is extraordinarily short. That's because our featured module isn't about providing you with a toolkit but rather a tool. At one point or another you're going to write a program that's just t o o   d a m n   s l o w. At which point your next step should be to profile your code, and luckily perl has shipped with one named Devel::DProf since 5.6.0. DProf is great, you run it via the debugger, it logs some timing information, and then you run its buddy drofpp to see where your code is lollygagging.

The problem is, DProf may not always works for your circumstances. DProf tracks how much time is spent in each subroutine, which work great for verbose OO code. But what if you have beefy subs which, for one reason or another, you aren't planning on refactoring into lots of itty bitty pieces? And which pieces would make the most sense to break up? Or, what if you have no subroutines like our simple Monte Carlo pi-calculation example below? Well, in the latter case dprofpp won't report anything!

Enter Devel::SmallProf, the linewise profiler. SmallProf lacks a pretty-printing sidekick counterpart to dprofpp, but more than makes up for it in utility because SmallProf can tell you exactly which statements are consuming the most processor time. Of course, this more intensive introspection comes at a price 1. Profiling with SmallProf is frequently an order of magnitude slower than with DProf! If possible then, it is usually beneficial to first profile with DProf to determine the general problem area and then use SmallProf's $DB::profile to toggle profiling for only the areas of interest.

Here is some sample SmallProf "output". Note that the column labels have sorted to the bottom of the listing.

$ perl -d:SmallProf mod1.pl
$ sort -k 2nr,2 smallprof.out #POD provided command
    99999 0.383615 1.879000    15:  printf "PI ~ %1.8f after %0${COUNT}i
    99999 0.303463 1.815000    10:  if( ($x**2 + $y**2) <= 1 ){
    99999 0.200046 1.721000     6:  my $x = rand();
    99999 0.178753 1.713000     7:  my $y = rand();
    78489 0.094622 1.257000    11:    $HIT++;
        1 0.000008 0.000000     5:for(my $i = 1; $i < 10**$COUNT; ++$i){
        1 0.000006 0.000000     2:my $COUNT = 5; #Orders of magnitude
        1 0.000002 0.000000     3:my $HIT = 0;
        1 0.000002 0.000000    17:}
                              Profile of mod1.pl                       Page 1
           ================ SmallProf version 1.15 ================
        0 0.000000 0.000000     1:#!/usr/bin/perl
        0 0.000000 0.000000     4:
        0 0.000000 0.000000     8:
        0 0.000000 0.000000     9:  #The equals condition is neglected in many
        0 0.000000 0.000000    12:  }
        0 0.000000 0.000000    13:
        0 0.000000 0.000000    14:  #Multiply by 4 as we're using a single
        0 0.000000 0.000000    16:    ($i % 1_000) == 0;
       =================================================================
    count wall tm  cpu time line
After a brief rewrite merging four of the five slowest and non-trivial lines (6-11) we get a denser/cleaner version which executes much more quickly. We can also get more DWIMly sorted results using the more cumbersome command here provided by Tom Metro.
$ perl -d:SmallProf mod1b.pl
$ tail +4 smallprof.out | sort -t '\0' -k 1.29r,1.29 -k 1.11nr,1.18
    count wall tm  cpu time line
    99999 0.575161 1.968000     6:  $HIT++ if rand()**2 + rand()**2 <= 1;
    99999 0.462837 1.896000     8:  printf "PI ~ %1.8f after %0${COUNT}i
        1 0.000010 0.000000     5:for(my $i = 1; $i < 10**$COUNT; ++$i){
        1 0.000005 0.000000     2:my $COUNT = 5; #Orders of magnitude
        1 0.000004 0.000000     3:my $HIT = 0;
        1 0.000003 0.000000    10:}
        0 0.000000 0.000000     1:#!/usr/bin/perl
        0 0.000000 0.000000     4:
        0 0.000000 0.000000     7:
        0 0.000000 0.000000     9:    ($i % 1_000) == 0;
countwall timecpu time
before4784891.1605178.385
after2000021.0231033.931
Note that the current maintainer of SmallProf has created an XS version—requiring 5.8.8 or better—dubbed Devel::FastProf. It reportedly only incurs an approximately 5x speed penatly, and provides a result viewer

P.S. Yes, I am aware that the final result doesn't get printed ;-)

mod1.pl


   1 #!/usr/bin/perl
   2 my $COUNT = 5; #Orders of magnitude
   3 my $HIT = 0;
   4 
   5 for(my $i = 1; $i < 10**$COUNT; ++$i){
   6   my $x = rand();
   7   my $y = rand();
   8 
   9   #The equals condition is neglected in many implementations
  10   if( ($x**2 + $y**2) <= 1 ){
  11     $HIT++;
  12   }
  13 
  14   #Multiply by 4 as we're using a single quadrant
  15   printf "PI ~ %1.8f after %0${COUNT}i points\n", 4*$HIT/$i, $i if
  16     ($i % 1_000) == 0;
  17 }

mod1b.pl


   1 #!/usr/bin/perl
   2 my $COUNT = 5; #Orders of magnitude
   3 my $HIT = 0;
   4 
   5 for(my $i = 1; $i < 10**$COUNT; ++$i){
   6   $HIT++ if rand()**2 + rand()**2 <= 1;
   7 
   8   printf "PI ~ %1.8f after %0${COUNT}i points\n", 4*$HIT/$i, $i if
   9     ($i % 1_000) == 0;
  10 }
1. Because of its line orientation, you can get odd results from SmallProf for code that uses AUTOLOAD.