Perl 2002 Advent Calendar: Devel::DProf

On the 25th day of Advent my True Language brought to me..

Devel::DProf

When I write Perl code, I write it to be maintainable and easily understood (or at least I try, those of you that have been following the Calendar may well disagree.) The speed it executes at isn't really my main concern - it's the speed I can write it and make sure it does what it's supposed to that's important.

Of course, there are times where how quickly your code runs does matter. Santa would not be happy if his program took more than one evening to calculate where the next present he should deliver. Of course, you could counter that by saying that he wouldn't be happy if elves hadn't completed the changes to the program by Christmas Eve.

When it comes down to it we can spend forever and a day tweaking code making that little bit faster, but none of us actually have time to do that. What we need to do is tweak the right bit of the code that makes the most significant change to the speed of the program. If we're not careful we can spend weeks tweaking the wrong bit and get only miniscule speed increase.

This is where I find Devel::DProf comes in. It can profile our code and tell us how long we're taking running various sections of code and then armed with this information we can (hopefully) tweak the section of code we spend the most time in.

[Read the documentation for Devel::DProf on search.cpan.org]

So, I'm sitting here on Christmas Eve considering the fact that if I don't write all you good people a nice Advent Calendar for tomorrow then Santa won't bring me any presents.

Hmm, better look into my home directory and dig up some code...ah, the Expert Perl Quiz of the Week Number 4. Does anyone remember this? It was a simple quiz where you had to write a function that given two anagrams would decide how few chunks the first would have to be split up into so that when they would rearranged they would form the anagram.

So, what did I end up writing...something like this:

  #!/usr/bin/perl

  # turn on Perl's safety features
  use strict;
  use warnings;

  use Algorithm::Permute;

  # subroutine to work out the minimum number of chunks one word
  # has to be broken into to be rearranged into another
  sub min_chunks
  {
    # get the word
    my $word = shift;
    my $target = shift;

    # okay, try breaking it into 1 bit, 2 bits, 3 bits, etc..
    foreach my $bits (1..length($word))
    {
      # get the list of possible ways this can be broken up
      # into this many bits
      foreach my $result (divide($word, $bits))
      {
        # see if any permutation of those bits match the target
        my $p = Algorithm::Permute->new($result);
        while (my @res = $p->next)
        {
          # we got a match?  Wonderful!  Return the number of bits
  	  return $bits if (join '', @res) eq $target;
        }
      }
    }

    # no match? return undef;
    return undef;
  }

  # routine that returns all the possible ways a word can
  # be broken up into the passed number of bits.  Returns
  # a list of array refs that contain bits
  sub divide
  {
    my $string    = shift;
    my $divisions = shift;

    # base case, one string, not divided
    return [$string] if $divisions == 1;

    # recursive case...for every possible bit at the start
    my @result;
    for my $place (1..( length($string)-$divisions+1 ))
    {
      # get the front of the string
      my $head = substr($string,0,$place);

      # and combine it with every possible tail
      foreach my $tail ( divide(substr($string,$place),$divisions-1) )
      {
        # store this possible combination and move on
        push @result, [ $head, @{ $tail } ]
      }
    }

    # and return it
    return @result;
  }

  # test data
  print min_chunks("addresser","readdress");
  print min_chunks("foobarbob","fobbarobo");
  print min_chunks("abcdefgh","hgfedcba");
  print min_chunks("aaaaaaab","baaaaaaa");
  print min_chunks("fishfosha","afishfosh");
  print min_chunks("eetr","tree");

Great! But it runs so slowly. It takes a few seconds on my machine and it's not even accessing the disk, database, or even printing anything out. Can we make it go faster? Well, we need to run it with Devel::DProf to find out what it's spending all it's time doing. Devel::DProf interfaces with perl by hooking into the debugger code. In order to profile a code you need to run it with the -d flag to specify Devel::DProf as the debugger.

  [mark@gan] perl -d:DProf perms.pl

This runs the code as normal (but a bit slower) but what it also does is create a tmon.out file in the current directory. You can open this file up with your favourite editor and have a look at the unintelligible shorthand it's stored in here for everything your script did. What you really want to do is convert that file into something more meaningful. The dprofpp program can read in the file and will print out the 15 top called subroutine.

  [mark@gan] dprofpp
  Total Elapsed Time = 2.407621 Seconds
    User+System Time = 2.367621 Seconds
  Exclusive Times
  %Time ExclSec CumulS #Calls sec/call Csec/c  Name
   78.3   1.856  2.924      6   0.3094 0.4873  main::min_chunks
   55.7   1.319  1.038  62769   0.0000 0.0000  Algorithm::Permute::next
   0.84   0.020  0.019    321   0.0001 0.0001  Algorithm::Permute::DESTROY
   0.84   0.020  0.013    581   0.0000 0.0000  main::divide
   0.42   0.010  0.010      1   0.0100 0.0099  DynaLoader::bootstrap
   0.00   0.000  0.010      3   0.0000 0.0033  main::BEGIN
   0.00   0.000 -0.000      2   0.0000      -  strict::import
   0.00   0.000 -0.000      2   0.0000      -  strict::bits
   0.00   0.000 -0.000      1   0.0000      -  warnings::import
   0.00   0.000 -0.000      2   0.0000      -  Algorithm::Permute::BEGIN
   0.00   0.000 -0.000      1   0.0000      -  vars::import
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_load_flags
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_load_file
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_undef_symbols
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_find_symbol

Ah, so we see that over 78% of our time is being spent in the min_chunks routine. We can see that with our test data almost no of the execution time is being spent in the divide routine. Optimising that would just be pointless, even though it's using a technique called 'tail recursion' which is notoriously slow; It would seem it's taking up so little time it's almost not worth bothering altering it.

So, we look at min_chunks and we realise that there's obvious way to make it faster. So, let's have a look at the next thing down on our hitlist. Gadzooks, Algorithm::Permute::next is taking up at least 55% of the execution time just on it's own. We look at the documentation for this module and find that there's a callback interface that's quicker. Let's use that. Now before we change any code we backup our script (so when we make a mistake we can revert.) I can't emphasise this enough (speaking from experience.) So let's change the code from the object interface to the functional interface and see if it goes any faster. Firstly we need to import the function from the Algorithm::Permute module, so we change

   use Algorithm::Permute;

To:

   use Algorithm::Permute qw(permute);

And then we rewrite the major chunk where we use Algorithm::Permute as an object like so: my $p = Algorithm::Permute->new($result); while (my @res = $p->next) { return $bits if (join '', @res) eq $target; }

To use it with the functional interface like so:

      # get the possibilities
      my @data = @{ $result };

      # see if any permutation of those bits match the target
      my $foo;
      permute { $foo = $bits if (join('', @data) eq $target) }
              @data;
      return $foo if $foo;

Note that now we're running more code than we actually need to now - no longer are we short circuiting the permutations as soon as we find a match. However, Perl's odd like that - sometimes what you might expect to take longer sometimes doesn't. To be truthful the only way we're going to see if our code is faster is to profile it again

  [mark@gan] perl -d:DProf perms.pl  
  358222

  [mark@gan] bash-2.05b$ dprofpp 
  Total Elapsed Time = 0.722584 Seconds
    User+System Time = 0.583314 Seconds
  Exclusive Times
  %Time ExclSec CumulS #Calls sec/call Csec/c  Name
   82.2   0.480  0.479    321   0.0015 0.0015  Algorithm::Permute::permute
   6.86   0.040  0.073    581   0.0001 0.0001  main::divide
   6.69   0.039  0.552      6   0.0064 0.0920  main::min_chunks
   0.00   0.000 -0.000      3   0.0000      -  main::BEGIN
   0.00   0.000 -0.000      2   0.0000      -  strict::import
   0.00   0.000 -0.000      2   0.0000      -  strict::bits
   0.00   0.000 -0.000      1   0.0000      -  warnings::import
   0.00   0.000 -0.000      2   0.0000      -  Algorithm::Permute::BEGIN
   0.00   0.000 -0.000      1   0.0000      -  vars::import
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::bootstrap
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_load_flags
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_load_file
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_undef_symbols
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_find_symbol
   0.00   0.000 -0.000      1   0.0000      -  DynaLoader::dl_install_xsub

Not only have we cut out the expensive time for the next routine, but we've also greatly reduced the time the min_chunks had to run - I guess all that loop operation at setting up and destroying objects took its toll. It turns out that the functional interface is sufficiently faster for our test data that although it tries more permutations, in the end overall it's quicker. To be honest I'd have never guessed that. Which just goes to show how useful Devel::DProf is...

When Perl is Not Fast Enough (YAPC::Europe talk by Nicholas Clark)

Benchmark

Perl Quiz Of The Week

Perl Quiz Of The Week 4