One of the keys to writing good code is using the right tool for the job. Perl's regular expressions are popular precisely because they're a completely separate little language within Perl, designed for one distinct job. In theory, someone could have written a module that let you build up a regular expression matcher method call by method call, but writing in a domain-specific language is much easier.
The closer a language is to the domain of the problem, the easier it is to write programs in that domain. Text::Glob is an example of an even more domain-specific language than regular expressions: not only is it optimised for matching text, but it is highly optimised for specifying patterns that match file names. It lets you use the wildcard syntax you're already familiar with from the shell directly in your Perl programs. Sure, you could do this with a regular expression, but that's much harder to deal with and a lot less clear for the reader, especially once you handle all the edge cases.
Text::Glob can be made to export two functions, glob_to_regex and match_glob. The former returns a compiled Perl regular expression that implements the glob pattern, whereas the latter applies the glob pattern directly to a list and returns the elements that match.
    use Text::Glob qw(glob_to_regex);
    my $regex = glob_to_regex("*.pm");          # a compiled qr// object
    my @perlfiles = grep /$regex/, @files;      # @files: the names to filter
    use Text::Glob qw(match_glob);
    my @perlfiles = match_glob("*.pm", @files); # same result in one call
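To make "returns a compiled Perl regular expression" a little more concrete, here is a simplified sketch of the kind of translation glob_to_regex performs. The function name simple_glob_to_regex is hypothetical and this is not the module's actual source: it only handles *, ?, [...] classes, {a,b} groups and the leading-dot rule described below, and it ignores backslash escapes and the other corner cases the real module deals with.

```perl
use strict;
use warnings;

# Hypothetical, simplified glob-to-regex translator (NOT Text::Glob's code).
# Handles *, ?, [...] character classes, {a,b} alternation and the
# "wildcards don't match a leading dot" rule; no backslash escapes.
sub simple_glob_to_regex {
    my ($glob) = @_;
    my $re       = '';
    my $depth    = 0;    # how many {...} groups we are currently inside
    my $at_start = 1;    # are we at the start of a path component?

    for my $c (split //, $glob) {
        # At the start of a name, forbid a leading dot unless the
        # pattern itself asks for one.
        if ($at_start) {
            $re .= '(?=[^.])' unless $c eq '.';
            $at_start = 0;
        }

        if    ($c eq '*')           { $re .= '[^/]*' }  # any run, but no '/'
        elsif ($c eq '?')           { $re .= '[^/]' }   # one char, but no '/'
        elsif ($c eq '{')           { $re .= '(?:'; $depth++ }
        elsif ($c eq '}' && $depth) { $re .= ')';   $depth-- }
        elsif ($c eq ',' && $depth) { $re .= '|' }
        elsif ($c =~ /[\[\]\-]/)    { $re .= $c }       # keep [a-z] classes as-is
        elsif ($c eq '/')           { $re .= '/'; $at_start = 1 }
        else                        { $re .= quotemeta $c }  # literal character
    }
    return qr/^$re$/;
}

my @files     = ('Glob.pm', 'Makefile.PL', '.hidden.pm', 'lib/Foo.pm');
my $regex     = simple_glob_to_regex('*.pm');
my @perlfiles = grep { $_ =~ $regex } @files;
print "@perlfiles\n";    # Glob.pm
```

Only Glob.pm survives: Makefile.PL has the wrong extension, .hidden.pm is a dot-file, and lib/Foo.pm would need '*' to match a '/'. In real code you'd simply call Text::Glob's own glob_to_regex.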
And that's about it, apart from describing the syntax.
The syntax for wildcards is pretty straightforward compared to normal regular expressions: '*' matches zero or more characters and '?' matches exactly one character. Therefore the pattern b?b* would match like so:
    bob     # matches (the 'o' matches '?', and '*' matches nothing)
    bab     # matches (the 'a' matches '?', and '*' matches nothing)
    bac     # doesn't match (the 'c' isn't a 'b')
    bobby   # matches (the 'o' matches '?', and '*' matches 'by')
There are some additional constraints on these wildcards. Neither wildcard matches the '/' directory separator, so:
    bob/foo # doesn't match, as we've 'changed directories'
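As an illustration of those rules, here is a hand-written regular expression that behaves like b?b*, checked against the same names. It is my own translation for explanatory purposes and not necessarily the exact regex Text::Glob generates.

```perl
use strict;
use warnings;

# Hand-translated equivalent of the glob b?b* (illustration only):
#   ?  -> exactly one character that is not '/'
#   *  -> zero or more characters that are not '/'
my $b_q_b_star = qr{^b[^/]b[^/]*$};

my @names   = qw(bob bab bac bobby bob/foo);
my @matches = grep { $_ =~ $b_q_b_star } @names;
print "@matches\n";    # bob bab bobby
```

The [^/] classes are what stop the wildcards from "changing directories"; a plain '.' or '.*' in their place would let bob/foo through.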
Text::Glob also respects the special nature of dot-files. This means that if we create a pattern like *_history it won't match .bash_history. What it's actually doing is making sure that '?' and '*' won't match a dot if it's the first character of a name; to match dot-files you have to write the leading dot explicitly, as in .*_history.
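One way to express the leading-dot rule in a plain regular expression is a lookahead that refuses a '.' as the first character. The two regexes below are my own hand translations of *_history and .*_history, assumed to be close to, but not identical with, what Text::Glob builds. If I recall the module's documentation correctly, recent versions also let you switch the dot and slash rules off through the package variables $Text::Glob::strict_leading_dot and $Text::Glob::strict_wildcard_slash; check the documentation of your installed version before relying on that.

```perl
use strict;
use warnings;

# *_history  : the lookahead (?=[^.]) stops '*' from matching a leading dot
# .*_history : the dot is written explicitly, so dot-files can match
my $no_dot   = qr{^(?=[^.])[^/]*_history$};
my $with_dot = qr{^\.[^/]*_history$};

my @names    = qw(.bash_history sh_history);
my @plain    = grep { $_ =~ $no_dot   } @names;   # only sh_history
my @dotfiles = grep { $_ =~ $with_dot } @names;   # only .bash_history
print "*_history: @plain\n.*_history: @dotfiles\n";
```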
There are two ways of specifying alternatives. The first is a list of alternatives inside curly braces, separated by commas. For example:
    *.{html,htm}   # match all HTML documents
    *.htm{l,}      # same thing
The other way is to provide character classes in square brackets:
    index_[a-z][a-z].htm{l,}   # match every two-letter language variant
These can, of course, be mixed and nested inside one another:
    index{_[a-z][a-z],}.{html,htm,xml,txt,cgi}
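To show why the glob form is easier to read, here is a hand-translated regular expression for the nested pattern above (my own translation, not output copied from Text::Glob). Each {a,b} group becomes a (?:a|b) group, the [a-z] classes carry over unchanged, and the literal '.' has to be escaped.

```perl
use strict;
use warnings;

# Hand translation of: index{_[a-z][a-z],}.{html,htm,xml,txt,cgi}
# The empty alternative in {_[a-z][a-z],} becomes the empty branch after '|'.
my $index_re = qr{^index(?:_[a-z][a-z]|)\.(?:html|htm|xml|txt|cgi)$};

my @files = qw(index.html index_en.htm index_fr.xml index_EN.html index_en.php);
my @pages = grep { $_ =~ $index_re } @files;
print "@pages\n";    # index.html index_en.htm index_fr.xml
```

index_EN.html is rejected because [a-z] is case-sensitive, and index_en.php because .php isn't one of the listed extensions.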
So there you have it, another simple but immensely useful module.