Maintaining data structures in Perl is a reasonably easy thing to do, but you can still be caught out by one of the common gotchas if you're not very careful in your implementation. One such common gotcha is creating a circular data structure - a structure that points to itself - in such a manner that when you discard the structure the memory that that structure took up will not be reclaimed. This can be problematic in big long running programs that, if not employing proper data structure management, can consume more and more memory getting increasingly bloated over their runtime until they need to be shut down in a (hopefully) controlled manner and restarted.
Data::Structure::Util is a bag 'o tricks that can greatly help in developing better behaved modules, not in the least since it can detect (and correct) circular data structures. Analogous to Scalar::Util, List::Util and Hash::Util, it provides numerous functions that are handy when working with data structures.
To understand how perl manages memory, you must first understand how
reference counting works. This is a very basic - yet remarkably
powerful - scheme where each variable in your program has a little
counter associated with it that counts how many references point to
it. These references can either be named references (like $foo
,
$bar
, etc) or references as part of a data structure. Simply put,
when this counter falls to zero it means that there's no variables
pointing at it, and hence no-one could possibly be using the variable,
so the variable can be cleaned up and the memory reclaimed.
For example, here the variable $foo
has one reference due to the
name, $foo
, pointing at it from your current scope.
my $foo = "bar";
Using our old friend Devel::Peek we can see how many things are pointing at the variable at once.
use Devel::Peek; my $foo = "bar"; print Dump $foo;
SV = PV(0x815de80) at 0x8170ee4 REFCNT = 1 <-- just one FLAGS = (PADBUSY,PADMY,POK,pPOK) PV = 0x816c668 "bar"\0 CUR = 3 LEN = 4
If we take a second reference to that variable then the REFCNT
goes
up.
use Devel::Peek; my $foo = "bar"; my $baz = \$foo; print Dump $foo;
SV = PV(0x815de80) at 0x8170ee4 REFCNT = 2 <-- now two! FLAGS = (PADBUSY,PADMY,POK,pPOK) PV = 0x816c668 "bar"\0 CUR = 3 LEN = 4
If we jiggle our program so $foo
falls out of scope then we see we
only have a reference of one again (since only $bar is pointing at it
now, the reference from $baz
.
use Devel::Peek; my $baz; { my $foo = "bar"; $baz = \$foo; } print Dump ${ $baz };
SV = PV(0x815de80) at 0x8170ef0 REFCNT = 1 <-- back to one FLAGS = (PADBUSY,PADMY,POK,pPOK) PV = 0x816c668 "bar"\0 CUR = 3 LEN = 4
If we now undefined $baz
then the reference of $foo
will fall to
zero. I would show this, but, of course I no longer have any way of
getting to $foo
, and that is precisely the reason that perl can
throw it away.
Perl programmers abuse the reference counting system to great advantage. Consider the constructor.
sub new { my $class = shift; my $self = bless {}, $class; return $self }
my $hc = HizenburgCompensator->new();
In the above code we create a variable - an object - and stick it in
$self
. $self
is declared lexically which means it's contents
should, by all rights, be cleared up at the end of the scope. But
they're not? Why? Because we return a copy of the reference as we
exit that subroutine (that gets put in $hc
, or whatever else calls
$new) and hence that data structure still has a reference pointing at
it.
The big problem comes in when we accidentally create a circular reference. By way of an example, meet the Flintstones (they're the modern stone age family)
my $fred = Person->new(); my $wilma = Person->new();
$fred->{spouse} = $wilma; $wilma->{spouse} = $fred;
use Data::Dumper; print Dumper $fred;
Eek, this prints out
$VAR1 = bless( { 'spouse' => bless( { 'spouse' => $VAR1 }, 'Person' ) }, 'Person' );
This means that we can get back to Fred by going $fred->{spouse}{spouse}
. It's a circular reference as I can keep
going round and round in circles, alternating between Fred and Wilma as
I go to the next spouse.
Why are these so bad? Because even if both $fred
and $wilma
fall out of scope then they'll both still have a reference pointing at
them ($wilma
will point at $fred
and vice versa) and their
reference count will never drop to zero, meaning that memory will
never be recovered.
weaken
Let's simplify our data structure a little:
my $person = bless( { 'spouse' => bless( { }, 'Person' ) }, 'Person' ); $person->{spouse}{spouse} = $person;
Let's use Devel::Peek to have a look at that:
print Dump $person; SV = RV(0x813840) at 0x80a270 REFCNT = 1 FLAGS = (PADBUSY,PADMY,ROK) RV = 0x80a348 SV = PVHV(0x809f30) at 0x80a348 REFCNT = 2 FLAGS = (OBJECT,SHAREKEYS) ...
We can see that the hash that makes up $person
has a refcount of
two. While this is correct, it means we're never going to recover
that data. What we need to do is someone keep one of the references,
but have it not counted when we do reference counting. This kind of
reference is called a weak reference and normal references can be
converted into them with Scalar::Util's weaken
function:
use Scalar::Util qw(weaken); weaken($person->{spouse}{spouse}); =head2 Doing this the easy way: Using Data::Structure::Util
All this requires an awful lot of thought. I mean, if you accidentally weaken the only reference you are holding to the object then that object's going to be reclaimed before your very eyes. This is not good. What you really want is some way of just dealing with circular references. This is where Data::Structures::Util comes in.
Data::Structure::Util can detect circular data references:
# create my data structure. my $person = bless( { 'spouse' => bless( { }, 'Person' ) }, 'Person' ); $person->{spouse}{spouse} = $person;
use Data::Structure::Util qw(has_circular_ref); print "Originally circular\n" if has_circular_ref($person);
# weaken a reference, make one non circular. use Scalar::Util qw(weaken); weaken($person->{spouse}{spouse});
print "Still circular!\n" if has_circular_ref($person);
This prints out
Originally circular
But not
Originally circular Still circular!
Because Data::Structure::Util is smart enough to realise that one of the links has been weakened. Of course, it's actually smart enough to do that weakening for us.
# create my data structure. my $person = bless( { 'spouse' => bless( { }, 'Person' ) }, 'Person' ); $person->{spouse}{spouse} = $person;
use Data::Structure::Util qw(has_circular_ref circular_off); print "Originally circular\n" if has_circular_ref($person);
my $offed = circular_off($person); print "Weakened '$offed' references\n";
print "Still circular!\n" if has_circular_ref($person);
How simple is that?
There are a couple of other very handy utilities in the package.
Top of the list is my personal favourite, unbless
. This actually
is really useful. The common idiom for unblessing an object is
to bless it back into it's native class.
my $thingy = bless {}, "SpaceAlien"; bless($thingy, "HASH");
This is a 'class A' hack. For a start, Scalar::Util's blessed
method will catch us out, as it's really still blessed - it's just
got a stealthy name:
use Scalar::Util qw(blessed);
print "Zounders, it's an object!" if blessed($thingy);
Better to use Data::Structure::Util's unbless
use Data::Structure::Util qw(unbless); my $thingy = bless {}, "SpaceAlien"; unbless($thingy);
use Scalar::Util qw(blessed); print "Zounders, it's an object!" if blessed($thingy);
This prints nothing, which is exactly what I want.