One of the core modules distributed with perl, File::Find, allows you to find files by recursively searching through directory paths on your hard drive. This is a good example of when to use a module: although the task may sound simple to write yourself, it's possible to get into all kinds of trouble in some special cases.
The trouble with File::Find is that it's quite hard to use, and not something beginners can easily get to grips with. It uses a callback interface similar to that I
Enter File::Find::Rule, a module that does nothing but provide a new, simpler interface for File::Find.
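For contrast, here is a minimal sketch of the callback style that File::Find itself uses. The temporary directory and the file names in it are made up purely so the example can run anywhere; only core modules are used.

```perl
use strict;
use warnings;
use File::Find;                                   # core module
use File::Temp qw(tempdir);                       # only used to build a throwaway tree
use File::Spec::Functions qw(catfile catdir);

# Build a tiny, hypothetical directory tree to search.
my $root = tempdir( CLEANUP => 1 );
mkdir catdir( $root, "madness" ) or die "mkdir failed: $!";
for my $path ( catfile( $root, "madness", "It_Must_Be_Love.mp3" ),
               catfile( $root, "notes.txt" ) ) {
    open my $fh, '>', $path or die "Can't write $path: $!";
    close $fh;
}

# File::Find calls this subroutine once for every entry it visits.
# Results have to be collected by hand, here via a closure over @found.
my @found;
find(
    sub {
        # $_ is the entry's name, $File::Find::name is its full path
        push @found, $File::Find::name if -f $_;
    },
    $root
);

print "$_\n" for sort @found;
```

Every extra condition (name, size, age) would have to be written into that subroutine by hand, which is the bookkeeping File::Find::Rule takes off your hands.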
So, I was pondering to myself, if I actually buy myself an mp3 player, then what mp3s have I got on my laptop right now that I could install onto it?
  # lookup all the files below /home/mark/mp3
  @files = File::Find::Rule->file
                           ->in("/home/mark/mp3");
This populates @files with a list of the countless mp3s I carry around with me on my laptops. They're all fully qualified paths like so:
'/home/mark/mp3/madness/It_Must_Be_Love.mp3'
If I'd specified a relative path to File::Find::Rule, I'd have got a relative path back again. For example, this code:
  # change to my home dir
  chdir("/home/mark");

  # find all the files in the 'mp3' dir in there
  @files = File::Find::Rule->file()
                           ->in("mp3");
populates @files with a list of mp3s that look like:
'mp3/madness/It_Must_Be_Love.mp3'
You can easily convert between relative paths and absolute paths whenever you need to by using the File::Spec module.
  # use the functional form of File::Spec where it'll export
  # 'abs2rel' and 'rel2abs' into our namespace.
  use File::Spec::Functions qw(:ALL);

  # convert the absolute path to one relative to "/home/mark"
  print abs2rel("/home/mark/mp3/madness/It_Must_Be_Love.mp3",
                "/home/mark") . "\n";

  # convert the relative path to an absolute, assuming it
  # starts from "/home/mark"
  print rel2abs("mp3/madness/It_Must_Be_Love.mp3",
                "/home/mark") . "\n";
Omitting the second parameter (the "/home/mark") will cause File::Spec to just use the current working directory as its base, which is probably what we wanted anyway.
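As a small sketch of the difference, the following compares conversions with an explicit base against one that falls back to the current working directory. It assumes Unix-style paths like the ones used throughout this article; the variable names are just for illustration.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(abs2rel rel2abs file_name_is_absolute);

# With an explicit base, the result does not depend on where we are.
my $abs = rel2abs( "mp3/madness/It_Must_Be_Love.mp3", "/home/mark" );
my $rel = abs2rel( "/home/mark/mp3/madness/It_Must_Be_Love.mp3", "/home/mark" );

# Without a base, the current working directory is used instead,
# so this result changes depending on where the script is run from.
my $from_cwd = rel2abs("mp3/madness/It_Must_Be_Love.mp3");

print "$abs\n$rel\n$from_cwd\n";
```

Neither function checks that the file exists; on Unix they only manipulate the path strings.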
Back in our situation, I've suddenly realised that when I rip my music from CDs rather than downloading it from the web, I use the ogg format which I store in my mp3 dir as they're pretty much the same thing. However, since the mp3 player I'm looking at doesn't yet support oggs I'm not interested in those (oops, I sense much re-encoding in my future.) How do I just find the mp3 files?
  my @files = File::Find::Rule->file
                              ->name('*.mp3')
                              ->in("/home/mark/mp3");
So you can see we're chaining rules together. First we say that each result must be a file (we could have used the directory method to get a list of directories back). You can also see that the name method takes a standard unix file glob; you can use a standard perl regular expression in its place if you want, by using the qr operator.
  my @files = File::Find::Rule->file
                              ->name( qr{\.mp3$} )
                              ->in("/home/mark/mp3");
I get another thought. What about all the mp3s of sound effects I've downloaded? Better not count any of them, so better disregard all files smaller than two hundred kilobytes.
  my @files = File::Find::Rule->file
                              ->name('*.mp3')
                              ->size(">=200K")
                              ->in("/home/mark/mp3");
And all the music I downloaded in the last week may or may not be any good, so we'd better not count that either.
  my $last_week = time()-(7*24*60*60);
  my @files = File::Find::Rule->file
                              ->name('*.mp3')
                              ->size(">=200K")
                              ->mtime("<$last_week")
                              ->in("/home/mark/mp3");
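To see the chained rules at work without touching a real music collection, here is a sketch that runs the same chain over a throwaway directory. The file names, sizes and the make_file helper are all invented for the example, and File::Find::Rule needs to be installed from CPAN.

```perl
use strict;
use warnings;
use File::Find::Rule;                  # CPAN module, not in core
use File::Temp qw(tempdir);
use File::Spec::Functions qw(catfile);
use File::Basename qw(basename);

my $dir       = tempdir( CLEANUP => 1 );
my $last_week = time() - ( 7 * 24 * 60 * 60 );
my $long_ago  = $last_week - ( 24 * 60 * 60 );   # one day older than the cut-off

# Hypothetical helper: create a file of a given size and, optionally,
# backdate its modification time.
sub make_file {
    my ( $in_dir, $name, $bytes, $mtime ) = @_;
    my $path = catfile( $in_dir, $name );
    open my $fh, '>', $path or die "Can't write $path: $!";
    binmode $fh;
    print {$fh} "x" x $bytes;
    close $fh or die "Can't close $path: $!";
    utime $mtime, $mtime, $path if defined $mtime;
    return $path;
}

make_file( $dir, "old_song.mp3", 300_000, $long_ago );  # should match
make_file( $dir, "effect.mp3",     1_000, $long_ago );  # too small
make_file( $dir, "new_song.mp3", 300_000 );             # modified just now
make_file( $dir, "ripped.ogg",   300_000, $long_ago );  # wrong extension

my @files = File::Find::Rule->file
                            ->name('*.mp3')
                            ->size(">=200K")
                            ->mtime("<$last_week")
                            ->in($dir);

print basename($_), "\n" for @files;
```

Note that mtime is compared against a raw epoch timestamp, so "less than last week's timestamp" means "last modified more than a week ago".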
You can set up negative rules with the not clause. You simply need to create another rule that hasn't been executed with an in clause.
  my $backup = File::Find::Rule->file
                               ->name("*~","*.bak","#*#");

  # find large documents
  my @files = File::Find::Rule->file
                              ->size(">30K")
                              ->not( $backup )
                              ->in("/home/mark/docs");
Rules that haven't been executed with in can be happily combined. For example, finding files that are bigger than they should be:
  my $mp3 = File::Find::Rule->file
                            ->name('*.mp3')
                            ->size(">4M");

  my $jpg = File::Find::Rule->file
                            ->name('*.jpg')
                            ->size(">350K");

  my @files = File::Find::Rule->or($mp3, $jpg)
                              ->in("/home/mark");
As this or is a kind of lazy evaluation, it can be used to stop your code searching in particular directories. By way of example, consider the Subversion version control system, and how it keeps a 'backup' copy of many files in your current directory in a directory inside it called .svn. Say we want to find all of the .pm files in a directory, but don't want to find those pesky backup files:
  # look for '.svn' and fail
  my $svn = File::Find::Rule->directory
                            ->name(".svn")
                            ->prune          # don't go into it
                            ->discard;       # don't report it

  my $pm = File::Find::Rule->file
                           ->name("*.pm");

  my @files = File::Find::Rule->or( $svn, $pm )
                              ->in("/home/mark/svn/advent/code");
As the $svn rule is checked first (it's the first statement in the or), it gets to decide both that the search should not descend into the .svn directories (the prune command) and that the other rule in the or should not even be consulted about whether the file passes (the discard command).
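The effect of the pruning rule is easiest to see side by side with a search that lacks it. This sketch builds a throwaway tree (the lib directory and Advent.pm file names are invented) containing the same .pm file both inside and outside a .svn directory, then runs the search with and without the $svn rule. It assumes File::Find::Rule is installed from CPAN.

```perl
use strict;
use warnings;
use File::Find::Rule;                  # CPAN module, not in core
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec::Functions qw(catfile catdir abs2rel);

# Hypothetical layout: lib/Advent.pm plus a copy in lib/.svn/
my $root = tempdir( CLEANUP => 1 );
make_path( catdir( $root, "lib", ".svn" ) );

for my $parts ( [ "lib", "Advent.pm" ], [ "lib", ".svn", "Advent.pm" ] ) {
    my $path = catfile( $root, @$parts );
    open my $fh, '>', $path or die "Can't write $path: $!";
    close $fh;
}

# look for '.svn' directories, don't go into them, don't report them
my $svn = File::Find::Rule->directory
                          ->name(".svn")
                          ->prune
                          ->discard;

my $pm = File::Find::Rule->file
                         ->name("*.pm");

# with the pruning rule placed first in the 'or'
my @pruned = File::Find::Rule->or( $svn, $pm )->in($root);

# the same search without any pruning, for comparison
my @unpruned = File::Find::Rule->file->name("*.pm")->in($root);

print "pruned:   ", abs2rel( $_, $root ), "\n" for sort @pruned;
print "unpruned: ", abs2rel( $_, $root ), "\n" for sort @unpruned;
```

The order of the arguments to or matters here: the $svn rule has to come before $pm so that it sees each directory first.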
File::Find::Rule has numerous extension modules. One such instance is the File::Find::Rule::MMagic module that provides an interface for checking the mime type of a file. For example, with this I can ignore spurious data (in my case normally oggs) that have been accidentally named with an mp3 extension:
use File::Find::Rule::MMagic;
  my @files = File::Find::Rule->file
                              ->name("*.mp3")
                              ->magic('audio/mpeg')
                              ->in("/home/mark/mp3");
I can use the File::Find::Rule::MP3Info module to look for tracks that are by a particular artist:
  use File::Find::Rule::MMagic;
  use File::Find::Rule::MP3Info;

  my @files = File::Find::Rule->file
                              ->name("*.mp3")
                              ->magic('audio/mpeg')
                              ->mp3info( ARTIST => "Green Day")
                              ->in("/home/mark/mp3");
Note how I can load more than one extension module and they 'stack' - I get the ability to use rules from either module.
All mp3 files mentioned in this tutorial were downloaded legally through licensed agents of the copyright holders. All ogg files mentioned in this tutorial were extracted from my personal CD collection for my own personal use only.