The 2004 Perl Advent Calendar
[about] | [archives] | [contact] | [home]

On the 15th day of Advent my True Language brought to me..
Archive::Extract

The Internet is full of stuff. Big old piles of it. Stuff we want to download. Stuff we want to use. Most of this stuff is bundled up in some archive format or another so that it's both compressed (lowering download time) and combined (so all files are collected together in one handy to distribute file.) Before we can use anything from these archives we first have to extract their contents.

Archive::Extract allows us to extract the contents of an archive without worrying about the nitty gritty of what's involved. Although there's plenty of other CPAN modules that can handle this for us - not to mention a plethora of command line tools - Archive::Extract manages to ignore the schmantics involved and just do the job, presenting us with a consistent interface no matter what archive format is used.

Using this module couldn't be simpler:

  # create an archive object
  my $archive = Archive::Extract->new(
    archive => "Test-DatabaseRow-1.04.tar.gz"
  );
  # extract the contents to the current working directory
  $archive->extract()
    or die "Can't extract: " . $archive->error;

And that's all there is to it. This is going to be one of the quickest tutorials of the year - but mainly because this module is so easy to use. The simple interface completely hides the fact that the module behind the scenes is working out what type of archive the filename we passed in is and selecting the right module to do the extraction (or if we don't have the right module installed, using a command line utility to do the work for us.) Like all the best modules, it just does what it does without any fuss, abstracting away all the complexities that are involved behind the scenes.

Temporary Directories

We can specify a directory to Archive::Extract when we call extract telling it where it can extract it's files to:

  $archive->extract(to => "/Users/mark/temp")
    or die "Can't extract: " . $archive->error;

Perl can supply us with a new temporary directory to put the files in if we want:

  # load the temp directory module
  use File::Temp qw(tempdir);
  # extract the files to a tempdir
  $archive->extract(to => tempdir(CLEANUP => 1))
    or die "Can't extract: " . $archive->error;

By specifying <CLEANUP = 1>> we're instructing that the program delete the directory and all it's contents when the it exits. Hopefully we'll have got what we wanted from the files by this point or we'll have copied or moved the files we want to keep somewhere else.

Working Out What We Extracted

If we put all the files in their own temporary directory we can simply find the files using File::Find::Rule:

  use File::Find::Rule;
  my @files = File::Find::Rule->files->in($archive->output_path);

Alternatively we can query the archive to find out what files it contains:

  my $files = $archive->files;

Which produces something like this:

  $files = [
             'Test-DatabaseRow-1.04/',
             'Test-DatabaseRow-1.04/Build.PL',
             'Test-DatabaseRow-1.04/CHANGES',
             'Test-DatabaseRow-1.04/lib/',
             'Test-DatabaseRow-1.04/lib/Test/',
             'Test-DatabaseRow-1.04/lib/Test/DatabaseRow.pm',
             'Test-DatabaseRow-1.04/Makefile.PL',
             'Test-DatabaseRow-1.04/MANIFEST',
             'Test-DatabaseRow-1.04/META.yml',
             'Test-DatabaseRow-1.04/t/',
             'Test-DatabaseRow-1.04/t/01basic.t',
             'Test-DatabaseRow-1.04/t/02throws.t',
             'Test-DatabaseRow-1.04/t/03tests.t',
             'Test-DatabaseRow-1.04/t/04local.t',
             'Test-DatabaseRow-1.04/t/05warn.t',
             'Test-DatabaseRow-1.04/t/06multiple.t',
             'Test-DatabaseRow-1.04/t/07results.t',
             'Test-DatabaseRow-1.04/t/08utf8.t',
             'Test-DatabaseRow-1.04/TODO'
           ];

We can then use File::Spec's rel2abs function to convert these paths into global paths relative to the extraction directory:

  use File::Spec::Functions qw(rel2abs);
  foreach (@{ $files })
   { $_ = rel2abs( $_, $archive->output_path ) }

Giving us something like this:

  $files = [
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/Build.PL',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/CHANGES',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/lib',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/lib/Test',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/lib/Test/DatabaseRow.pm',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/Makefile.PL',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/MANIFEST',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/META.yml',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t/01basic.t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t/02throws.t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t/03tests.t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t/04local.t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t/05warn.t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t/06multiple.t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t/07results.t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/t/08utf8.t',
             '/tmp/Qm7Zxd12Ex/Test-DatabaseRow-1.04/TODO'
           ];

Which is where the files are located on disk.

  • File::Temp