The 2003 Perl Advent Calendar
[about] | [archives] | [contact] | [home]

On the 24th day of Advent my True Language brought to me..
Data::Structure::Util

Maintaining data structures in Perl is a reasonably easy thing to do, but you can still be caught out by one of the common gotchas if you're not very careful in your implementation. One such common gotcha is creating a circular data structure - a structure that points to itself - in such a manner that when you discard the structure the memory that that structure took up will not be reclaimed. This can be problematic in big long running programs that, if not employing proper data structure management, can consume more and more memory getting increasingly bloated over their runtime until they need to be shut down in a (hopefully) controlled manner and restarted.

Data::Structure::Util is a bag 'o tricks that can greatly help in developing better behaved modules, not in the least since it can detect (and correct) circular data structures. Analogous to Scalar::Util, List::Util and Hash::Util, it provides numerous functions that are handy when working with data structures.

To understand how perl manages memory, you must first understand how reference counting works. This is a very basic - yet remarkably powerful - scheme where each variable in your program has a little counter associated with it that counts how many references point to it. These references can either be named references (like $foo, $bar, etc) or references as part of a data structure. Simply put, when this counter falls to zero it means that there's no variables pointing at it, and hence no-one could possibly be using the variable, so the variable can be cleaned up and the memory reclaimed.

For example, here the variable $foo has one reference due to the name, $foo, pointing at it from your current scope.

  my $foo = "bar";

Using our old friend Devel::Peek we can see how many things are pointing at the variable at once.

  use Devel::Peek;
  my $foo = "bar";
  print Dump $foo;
  SV = PV(0x815de80) at 0x8170ee4
    REFCNT = 1                          <-- just one
    FLAGS = (PADBUSY,PADMY,POK,pPOK)
    PV = 0x816c668 "bar"\0
    CUR = 3
    LEN = 4

If we take a second reference to that variable then the REFCNT goes up.

  use Devel::Peek;
  my $foo = "bar";
  my $baz = \$foo;
  print Dump $foo;
  SV = PV(0x815de80) at 0x8170ee4
    REFCNT = 2                          <-- now two!
    FLAGS = (PADBUSY,PADMY,POK,pPOK)
    PV = 0x816c668 "bar"\0
    CUR = 3
    LEN = 4

If we jiggle our program so $foo falls out of scope then we see we only have a reference of one again (since only $bar is pointing at it now, the reference from $baz.

  use Devel::Peek;
  my $baz;
  {
    my $foo = "bar";
    $baz = \$foo;
  }
  print Dump ${ $baz };
  SV = PV(0x815de80) at 0x8170ef0
    REFCNT = 1                        <-- back to one
    FLAGS = (PADBUSY,PADMY,POK,pPOK)
    PV = 0x816c668 "bar"\0
    CUR = 3
    LEN = 4

If we now undefined $baz then the reference of $foo will fall to zero. I would show this, but, of course I no longer have any way of getting to $foo, and that is precisely the reason that perl can throw it away.

Perl programmers abuse the reference counting system to great advantage. Consider the constructor.

  sub new
  {
    my $class = shift;
    my $self  = bless {}, $class;
    return $self
  }
  my $hc = HizenburgCompensator->new();

In the above code we create a variable - an object - and stick it in $self. $self is declared lexically which means it's contents should, by all rights, be cleared up at the end of the scope. But they're not? Why? Because we return a copy of the reference as we exit that subroutine (that gets put in $hc, or whatever else calls $new) and hence that data structure still has a reference pointing at it.

Circular References

The big problem comes in when we accidentally create a circular reference. By way of an example, meet the Flintstones (they're the modern stone age family)

  my $fred  = Person->new();
  my $wilma = Person->new();
  $fred->{spouse} = $wilma;
  $wilma->{spouse} = $fred;
  use Data::Dumper;
  print Dumper $fred;

Eek, this prints out

  $VAR1 = bless( {
                   'spouse' => bless( {
                                        'spouse' => $VAR1
                                      }, 'Person' )
                 }, 'Person' );

This means that we can get back to Fred by going $fred->{spouse}{spouse}. It's a circular reference as I can keep going round and round in circles, alternating between Fred and Wilma as I go to the next spouse.

Why are these so bad? Because even if both $fred and $wilma fall out of scope then they'll both still have a reference pointing at them ($wilma will point at $fred and vice versa) and their reference count will never drop to zero, meaning that memory will never be recovered.

Using weaken

Let's simplify our data structure a little:

  my $person =  bless( {
                     'spouse' => bless( {
                                        }, 'Person' )
                   }, 'Person' );
  $person->{spouse}{spouse} = $person;

Let's use Devel::Peek to have a look at that:

  print Dump $person;
  SV = RV(0x813840) at 0x80a270
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY,ROK)
    RV = 0x80a348
    SV = PVHV(0x809f30) at 0x80a348
      REFCNT = 2
      FLAGS = (OBJECT,SHAREKEYS)
      ... 

We can see that the hash that makes up $person has a refcount of two. While this is correct, it means we're never going to recover that data. What we need to do is someone keep one of the references, but have it not counted when we do reference counting. This kind of reference is called a weak reference and normal references can be converted into them with Scalar::Util's weaken function:

  use Scalar::Util qw(weaken);
  weaken($person->{spouse}{spouse});
  
=head2 Doing this the easy way: Using Data::Structure::Util

All this requires an awful lot of thought. I mean, if you accidentally weaken the only reference you are holding to the object then that object's going to be reclaimed before your very eyes. This is not good. What you really want is some way of just dealing with circular references. This is where Data::Structures::Util comes in.

Data::Structure::Util can detect circular data references:

  # create my data structure.
  my $person =  bless( {
                     'spouse' => bless( {
                                        }, 'Person' )
                   }, 'Person' );
  $person->{spouse}{spouse} = $person;
  use Data::Structure::Util qw(has_circular_ref);
  print "Originally circular\n"
    if has_circular_ref($person);
  # weaken a reference, make one non circular.
  use Scalar::Util qw(weaken);
  weaken($person->{spouse}{spouse});
  print "Still circular!\n"
    if has_circular_ref($person);

This prints out

  Originally circular

But not

  Originally circular
  Still circular!

Because Data::Structure::Util is smart enough to realise that one of the links has been weakened. Of course, it's actually smart enough to do that weakening for us.

  # create my data structure.
  my $person =  bless( {
                     'spouse' => bless( {
                                        }, 'Person' )
                   }, 'Person' );
  $person->{spouse}{spouse} = $person;
  use Data::Structure::Util qw(has_circular_ref circular_off);
  print "Originally circular\n"
    if has_circular_ref($person);
  my $offed = circular_off($person);
  print "Weakened '$offed' references\n";
  print "Still circular!\n"
    if has_circular_ref($person);

How simple is that?

Other Handy Utility Functions

There are a couple of other very handy utilities in the package. Top of the list is my personal favourite, unbless. This actually is really useful. The common idiom for unblessing an object is to bless it back into it's native class.

    my $thingy = bless {}, "SpaceAlien";
    bless($thingy, "HASH");

This is a 'class A' hack. For a start, Scalar::Util's blessed method will catch us out, as it's really still blessed - it's just got a stealthy name:

    use Scalar::Util qw(blessed);
    print "Zounders, it's an object!"
      if blessed($thingy);

Better to use Data::Structure::Util's unbless

    use Data::Structure::Util qw(unbless);
    my $thingy = bless {}, "SpaceAlien";
    unbless($thingy);
    use Scalar::Util qw(blessed);
    print "Zounders, it's an object!"
      if blessed($thingy);

This prints nothing, which is exactly what I want.

  • Devel::Peek
  • Scalar::Util