The 2003 Perl Advent Calendar

On the 1st day of Advent my True Language brought to me..
CGI::Untaint

Whenever you get information passed back from a CGI form, you never really know what you're going to get. No matter how much client-side validation you do, someone can always bypass your carefully crafted page and write their own HTML that submits to the same place.

So your CGI script has to be prepared to deal with whatever it's passed. It has to be untrusting, and check and double-check the data. Writing code to do this is boring, time-consuming, and very tempting to skip. Anything that makes this easier is very welcome, and CGI::Untaint hits the nail on the head. It provides a framework of reusable components for extracting and validating different kinds of data, and does it with the minimum of fuss.

When you use CGI to get parameters, the simplest (but also the sloppiest) way is to do this:

  # create the cgi object
  use CGI;
  my $cgi = CGI->new();
  # extract the data and call error_handling to print out errors
  # if the data can't be extracted (i.e. it was missing)
  my $age = $cgi->param("age");
  error_handling() unless defined($age);

This can put anything in $age at all. You're expecting a whole number of years back, but for all you know some idiot's typed "eight" or "12.5" into the form, and when you get round to inserting $age into your database it'll all fall over. What you need to do is write some code to check it with a regular expression:

  # create the cgi object
  use CGI;
  my $cgi = CGI->new();
  # extract the data and call error_handling to print out errors
  # if the data can't be extracted (i.e. it was missing or malformed)
  my $age = $cgi->param("age");
  error_handling() unless defined($age);
  ($age) = $age =~ m/^(\d+)$/; # extract it if it's all digits
  error_handling() unless defined($age);
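To see exactly what that regular expression lets through, here's a small self-contained sketch that wraps the same check in a helper and feeds it the kind of things a user might type. The untaint_integer name is made up for this illustration; it isn't part of CGI or CGI::Untaint.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any module): return the value if it
# consists entirely of digits, otherwise return undef. It uses the same
# regular expression as the example above.
sub untaint_integer {
  my $raw = shift;
  return undef unless defined $raw;   # a missing parameter
  my ($clean) = $raw =~ m/^(\d+)$/;   # undef if it doesn't match
  return $clean;
}

# try a few things a user might type into the "age" box
for my $input ("42", "eight", "12.5", "") {
  my $age = untaint_integer($input);
  print "'$input' => ", (defined $age ? $age : "rejected"), "\n";
}
```

Only "42" makes it through; "eight", "12.5" and the empty string all come back undefined, which is what the error_handling() check is there to catch.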

And you'll have to do the same for every other parameter too, hoping that the code you just typed hasn't got any subtle bugs in it. CGI::Untaint instead lets you use collections of predefined regular expressions to pull things out of the CGI parameters.

  # create our untainting object
  use CGI;
  use CGI::Untaint;
  my $cgi = CGI->new();
  my $untaint = CGI::Untaint->new($cgi->Vars);
  # extract 'age' from the parameters as an integer.
  my $age = $untaint->extract( -as_integer => "age" );
  error_handling() unless defined($age);

This has several advantages: your code is quicker to develop, as you have to write less of the sticky logic yourself; you're reusing code, so any code you do write you only have to write once; and the code doing the extraction will have been independently tested to check that it works correctly.

The extract method takes two arguments: how the data is to be extracted (in this case as an integer) and the name of the CGI parameter to extract. The first of these actually names the module that tells CGI::Untaint how to do the extraction (so -as_integer means that the CGI::Untaint::integer module should be used, -as_printable means that CGI::Untaint::printable will be used, and so on). These 'handlers' can be thought of as plug-ins for CGI::Untaint, telling it about new ways to extract different types of data. The default handlers that are installed when you install the main CGI-Untaint distribution are:

In addition to this basic selection there's a wide collection of modules on the CPAN that can be downloaded and installed. A few of the notable examples are:

Writing Your Own Extensions

Even though there's a great selection of untaint handlers available from the CPAN, sooner or later you're going to want to check something that there isn't a handler for. For example, you might want to check that a value is one of the options offered in a drop-down list, and that extraction handler will be unique to your own particular application.

Creating your own handler is as simple as writing a quick module that inherits from CGI::Untaint::object and defines a method called _untaint_re that returns a regular expression (a qr// object). This regular expression should place the result of the extraction into $1. It's all very simple as soon as you see an example. Here's a handler that extracts red, green or blue, and fails for anything else passed to it:

  package CGI::Untaint::red_green_blue;
  use base qw(CGI::Untaint::object);
  # turn on perl's safety functions
  use strict;
  use warnings;
  # define the regular expression that will do the test
  sub _untaint_re { qr/^(red|green|blue)$/ }
  # return true to keep perl happy
  1;
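The ^ and $ anchors in _untaint_re matter more than they might look. Without them the pattern will happily find "red" inside a longer string and put just that part into $1. This standalone sketch (it doesn't need CGI::Untaint installed, and the variable names are purely illustrative) compares the handler's anchored pattern with an unanchored copy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# the pattern used by the red_green_blue handler above...
my $anchored   = qr/^(red|green|blue)$/;
# ...and a deliberately unanchored copy, for comparison
my $unanchored = qr/(red|green|blue)/;

for my $input ("red", "reddish", "light blue", "yellow") {
  # in list context a successful match returns the capture ($1);
  # a failed match returns an empty list, leaving the variable undef
  my ($strict) = $input =~ $anchored;
  my ($loose)  = $input =~ $unanchored;
  print "$input: anchored=", (defined $strict ? $strict : "none"),
        ", unanchored=", (defined $loose ? $loose : "none"), "\n";
}
```

With the unanchored version "reddish" comes back as "red" and "light blue" as "blue", which is exactly the kind of silent acceptance untainting is supposed to prevent.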

Your module can add further checks by defining an is_valid method. This method is passed the object itself, on which the value method can be called to get the value that's just been extracted by _untaint_re, and it should return true or false to indicate whether that value is valid or not. Cunningly, value can also be used to assign a new value. For example, here's an expansion of the above handler that doesn't care what case the colour names are in, and always hands them back in lower case:

  package CGI::Untaint::red_green_blue;
  use base qw(CGI::Untaint::object);
  # turn on perl's safety functions
  use strict;
  use warnings;
  # define the regular expression that will do the test
  sub _untaint_re { qr/^(red|green|blue)$/i }
  sub is_valid
  {
    my $self = shift;
    my $value = $self->value;
    # make the value lower case.
    $self->value(lc($value));
    # return true as it's valid
    return 1;
  }
  # return true to keep perl happy
  1;
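To make the division of labour between _untaint_re, is_valid and value a bit more concrete, here's a toy, self-contained imitation of the pattern. It is not CGI::Untaint::object's real code: the Toy:: package names and the untaint method are invented for this sketch. It simply follows the rules described above: the regex captures into $1, is_valid can inspect or rewrite value, and a false return means nothing is extracted. The second handler is a hypothetical take on the drop-down idea from earlier, rejecting anything that isn't one of a fixed set of options.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# --- a toy stand-in for CGI::Untaint::object (NOT the real code) ---
package Toy::Untaint::object;

sub new {
  my ($class, $raw) = @_;
  return bless { raw => $raw }, $class;
}

# combined getter/setter, mirroring the value method described above
sub value {
  my $self = shift;
  $self->{value} = shift if @_;
  return $self->{value};
}

# by default, anything the regex captured counts as valid
sub is_valid { return 1 }

# hypothetical driver: regex first, then is_valid, then the final value
sub untaint {
  my $self = shift;
  my $raw  = $self->{raw};
  return undef unless defined $raw;
  return undef unless $raw =~ $self->_untaint_re;
  $self->value($1);    # the handler's regex must capture into $1
  return $self->is_valid ? $self->value : undef;
}

# --- the case-insensitive colour handler, rewritten for the toy ---
package Toy::Untaint::red_green_blue;
our @ISA = ('Toy::Untaint::object');

sub _untaint_re { qr/^(red|green|blue)$/i }

sub is_valid {
  my $self = shift;
  $self->value(lc $self->value);    # normalise to lower case
  return 1;
}

# --- hypothetical drop-down handler that rejects in is_valid ---
package Toy::Untaint::shirt_size;
our @ISA = ('Toy::Untaint::object');

# the options the (imaginary) form's drop-down list offered
my %allowed = map { $_ => 1 } qw(S M L XL);

sub _untaint_re { qr/^(\w+)$/ }

sub is_valid {
  my $self = shift;
  return exists $allowed{ $self->value };
}

package main;

for my $colour ("Red", "rEd", "yellow") {
  my $got = Toy::Untaint::red_green_blue->new($colour)->untaint;
  print "$colour => ", (defined $got ? $got : "undef"), "\n";
}
for my $size ("M", "XXL") {
  my $got = Toy::Untaint::shirt_size->new($size)->untaint;
  print "$size => ", (defined $got ? $got : "undef"), "\n";
}
```

"Red" and "rEd" both come out as "red", "yellow" fails the regex, and "XXL" gets past the regex but is thrown out by is_valid. That last case is why is_valid exists: some checks are far easier to write in Perl than to cram into a single regular expression.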

Testing Your Own Extensions

When you're writing your own extraction handlers you really should check that they extract what you expect, and can't extract what they shouldn't. The Test::CGI::Untaint module (blatant plug, since I wrote it) can help you here. It defines two tests, is_extractable and unextractable, which check respectively that the handler you name extracts the value you expect, or doesn't extract anything at all.

  #!/usr/bin/perl
  use strict;
  use warnings;
  # start the testing
  use Test::More tests => 7;
  use Test::CGI::Untaint;
  # check if we can extract the basic colours
  is_extractable("red",  "red",  "red_green_blue","try red");
  is_extractable("green","green","red_green_blue","try green");
  is_extractable("blue", "blue", "red_green_blue","try blue");
  # check the case stuff works
  is_extractable("Red","red","red_green_blue","try Red");
  is_extractable("rEd","red","red_green_blue","try rEd");
  is_extractable("reD","red","red_green_blue","try reD");
  # but not yellow
  unextractable("yellow","red_green_blue", "try yellow");

  • Test::CGI::Untaint
  • CGI::Untaint::date
  • CGI::Untaint::upload
  • CGI::Untaint::email
  • CGI::Untaint::creditcard