What we normally mean by the 'type' of a file is actually its MIME type. Every file sent across the web is sent with its own MIME type, and attachments in emails carry MIME type declarations too. For example, a JPEG image is 'image/jpeg' and a web page is 'text/html'.
When we know a file's MIME type, we know what kind of data it is and what we can do with it; for instance, which program to load to view it. At the very least, you can use it to check that the data some user just uploaded to their user page on your webserver really is a valid picture file, and not some other kind of binary data that a broken or malicious client has mislabelled. This is simply following good practice: never trust any data the user sends you without checking it first.
The File::MMagic module can be used to determine the MIME type of a file, and it uses all kinds of cunning to do so. First, it uses a database of "magic" numbers to look at the first few bytes for telltale signs: for example, GIF files start with "GIF" and Flash files start with "FWS". If that fails (HTML files, for example, don't start with anything special), the module can fall back on regular expressions that check both the filename and the contents of the file for giveaway signs.
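To make the magic-number idea concrete, here is a minimal sketch in plain Perl. It is not how File::MMagic is implemented; the signature table is a tiny illustrative subset, and the helper name sniff_type is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical signature table: leading bytes => MIME type.
# A tiny illustrative subset, not File::MMagic's real database.
my @signatures = (
    [ "GIF8",              'image/gif' ],
    [ "\xFF\xD8\xFF",      'image/jpeg' ],
    [ "\x89PNG\r\n\x1a\n", 'image/png' ],
    [ "FWS",               'application/x-shockwave-flash' ],
);

# sniff_type() is a made-up helper: it compares the start of the data
# against each signature and returns the first match, or undef.
sub sniff_type {
    my ($data) = @_;
    foreach my $sig (@signatures) {
        my ($magic, $type) = @$sig;
        return $type if substr($data, 0, length $magic) eq $magic;
    }
    return;    # nothing matched; a real detector would try other tricks
}

foreach my $sample ("GIF89a...", "<html>") {
    my $type = sniff_type($sample) // 'unknown';
    print "$type\n";
}
```

The second sample falls through to 'unknown', which is exactly the case where a real detector has to start looking at the contents and the filename instead.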
The File::MMagic module is pretty easy to use. Essentially it's just a case of creating a parser object and then telling it to look at a file:
use File::MMagic;
my $mm = File::MMagic->new();

print "The MIME type of '$ARGV[0]' is: "
    . $mm->checktype_filename($ARGV[0]) . "\n";
Of course, it can also check an open filehandle:
use File::MMagic;
use IO::File;
my $mm = File::MMagic->new();

# open the file in binary mode
my $filehandle = IO::File->new("image.jpg")
    or die "couldn't open 'image.jpg': $!";
binmode $filehandle;

print "The MIME type of 'image.jpg' is: "
    . $mm->checktype_filehandle($filehandle) . "\n";
Or even a chunk of data already loaded into memory:
use File::MMagic;
use IO::File;
my $mm = File::MMagic->new();

# open a file in binary mode
my $filehandle = IO::File->new("image.jpg")
    or die "couldn't open 'image.jpg': $!";
binmode $filehandle;

# read the entire file into $data
my $data;
{
    local $/;               # set it so <> reads all the file at once
    $data = <$filehandle>;  # read in the file
}

print "The MIME type of 'image.jpg' is: "
    . $mm->checktype_contents($data) . "\n";
So with this newfound knowledge we can construct an example script that looks at all the files in a directory and builds a web page with a graph. First we check each file for its MIME type and size, and store the cumulative size for each type in a hash.
#!/usr/bin/perl

# turn on perl's safety features
use strict;
use warnings;

# load the modules
use File::MMagic;

# new parser
my $mm = File::MMagic->new();

# open the dir
opendir DIR, $ARGV[0]
    or die "Couldn't open the directory '$ARGV[0]': $!";

# work through the files in the dir
my %files;
while (defined(my $file = readdir DIR)) {
    # readdir returns bare names, so build the full path
    my $path = "$ARGV[0]/$file";

    # skip it if it isn't just a normal file
    next unless -f $path;

    # get the MIME type and other info
    my $magic = $mm->checktype_filename($path);

    # delete anything after the MIME type
    $magic =~ s/ ;    # look for the first semicolon
                 .*   # and then anything up until
                 $    # the end of line
               //x;

    # add that file's size to the running total for its type
    $files{ $magic } += -s $path;
}
closedir DIR;
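The two steps inside that loop, trimming the type string and accumulating sizes per type, can be tried on their own without touching the filesystem. This is only a sketch: the sample data is invented, and strip_params is a hypothetical helper name, not part of File::MMagic.

```perl
use strict;
use warnings;

# Made-up sample data: [ type string as a detector might report it, size in bytes ]
my @samples = (
    [ 'text/html; charset=utf-8', 2048 ],
    [ 'image/png',                4096 ],
    [ 'text/html',                1024 ],
);

# strip_params() is a hypothetical helper: drop the first semicolon
# and everything after it, leaving only the bare type/subtype.
sub strip_params {
    my ($type) = @_;
    $type =~ s/\s*;.*//s;
    return $type;
}

# accumulate the total size for each bare MIME type
my %bytes_by_type;
foreach my $sample (@samples) {
    my ($type, $size) = @$sample;
    $bytes_by_type{ strip_params($type) } += $size;
}

foreach my $type (sort keys %bytes_by_type) {
    print "$type: $bytes_by_type{$type} bytes\n";
}
```

Because the parameters are stripped before the string is used as a hash key, both 'text/html' entries end up in the same bucket.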
Now, using that information, we can create a chart with the GD::Graph::hbars module.
use GD::Graph::hbars;
use IO::File;

# create a new horizontal bar chart
my $graph = GD::Graph::hbars->new(400,300);

# plot the data onto it, and get a GD::Image back
my $gd = $graph->plot([[keys %files],[values %files]])
    or die "Can't plot the chart: " . $graph->error;

# open a file to write it to, and save it as a png
my $img_fh = IO::File->new("chart.png",">")
    or die "Can't open 'chart.png': $!";
binmode $img_fh;
print {$img_fh} $gd->png;
close $img_fh;
And finally we print out the HTML. Note that we use the HTML::Entities module to encode the data that we're printing out. This means that any HTML characters like '<' or '>' will be escaped. We are unlikely to have these characters in the directory name, but you never know.
use HTML::Entities;

# open the file
my $html_fh = IO::File->new("chart.html",">")
    or die "Can't open 'chart.html': $!";

# and write out the html
my $dir = encode_entities($ARGV[0]);
print {$html_fh} qq{
<html>
<head><title>Files by mime type for: $dir</title></head>
<body>
<img src="chart.png" width="400" height="300">
<table>};

# print a line for each MIME type
foreach my $key (keys %files) {
    print {$html_fh} "<tr><td>" . encode_entities($key) . "</td>"
        . "<td>" . int($files{ $key }/1024) . "k</td></tr>";
}

print {$html_fh} q{
</table>
</body>
</html>
};
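To see why the encoding step matters, it helps to feed encode_entities a deliberately awkward name. The directory name below is invented for illustration, and the sketch assumes HTML::Entities is installed, as the script above already requires.

```perl
use strict;
use warnings;
use HTML::Entities;    # provides encode_entities()

# Made-up directory name containing characters that are special in HTML.
my $dir = 'reports/<2024> & "drafts"';

# By default encode_entities escapes <, >, &, quotes, and
# control and high-bit characters.
my $safe = encode_entities($dir);

print "<title>Files by MIME type for: $safe</title>\n";
```

Without the encoding, the angle brackets in the name would be read by a browser as the start of a tag rather than as text.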