The 2004 Perl Advent Calendar

On the 16th day of Advent my True Language brought to me...
Cache::Cache

There are times when a traditional database isn't the best place to store some data. The data stored in a relational database tends to be heavily structured, and the overhead of putting data in and pulling it out is high.

Data pulled out of a database may take time to extract, or may change frequently, but to provide a useful web interface we may need to access the same data across several web pages. What we require is some way to cache the data we're currently looking at so we don't have to go back to the database on every request. Likewise, we often need to save data in some intermediate form between requests until we're happy enough with it to store it permanently in the main database.

Cache::Cache is a simple caching system that can be used between requests - or between any other processes - to store arbitrary Perl data simply and in a structured manner.

Rather than being a cache itself, Cache::Cache is actually a framework for creating caches. Various other modules subclass it to do the actual caching, each implementing the same interface as the parent class. This means that you can easily switch between cache types, choosing to cache to the filesystem, to shared memory, or to any of the numerous other strategies.

Picking one of the simpler implementations, Cache::FileCache is a file-based cache that writes whatever you store in it to files on disk. First we need to 'connect' to the cache with a namespace that all scripts wanting to access the same cache use:

  my $cache = Cache::FileCache->new({ 
    namespace => "foo", 
  });

We can then save things with:

  $cache->set($key, $data);

And getting them back (later on, in another process, whatever) is just as easy:

  my $data = $cache->get($key);

as long as you have some way of maintaining the key between requests (for example, it could be the user's username, or a user id stored in a cookie).
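
Because every cache class shares the same interface, the same few lines should work against more than one backend. Here's a minimal sketch of that idea. The round_trip helper, the "advent example" namespace and the "fred" key are all made up for illustration; Cache::MemoryCache is another of the implementations in the Cache::Cache distribution, and it keeps its data only inside the current process.

```perl
use strict;
use warnings;
use Cache::FileCache;
use Cache::MemoryCache;

# round_trip() is a hypothetical helper: it stores a structure under a
# key and reads it straight back through whichever cache class it is given
sub round_trip {
  my ($class) = @_;
  my $cache = $class->new({ namespace => "advent example" });
  $cache->set("fred", { list => ["buy tree", "wrap presents"] });
  return $cache->get("fred");
}

# the calling code is identical for both backends
foreach my $class (qw(Cache::MemoryCache Cache::FileCache)) {
  my $data = round_trip($class);
  print "$class: ", join(", ", @{ $data->{list} }), "\n";
}
```

Only the class name changes between the two runs, which is what makes swapping strategies later on cheap.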

An Example In Use

Let's write a little To-do List CGI script. We start by loading a load of modules:

  #!/usr/bin/perl
  # turn on the safety features
  use strict;
  use warnings;
  # load our modules
  use CGI;
  use Cache::FileCache;
  use Data::UUID;
  use HTML::Entities;

We're going to be using a file-based cache, so next we create the instance of Cache::FileCache:

  # set up the cache
  my $cache = Cache::FileCache->new({
    namespace => "todo list",
  });

Each user needs a unique id so that the various people using the todo list page get a different todo list (I don't want anyone else adding to my list - I've got enough on as it is.) We create a new unique id for the user the first time they visit the page and store it in their cookies. Each time they return to the todo list page we can access this id with the cookie function and use it to extract the saved data from the cache.

  # get the data from the cache if there's a key
  # in the cookies, otherwise create a new unique cookie
  my $data; 
  my $cgi = CGI->new();
  my $id = $cgi->cookie("id");
  if ($id)
    { $data = $cache->get($id) }
  else
    { $id = Data::UUID->new->create_str; }
  
  # if we didn't get any data back, create an empty data
  # structure containing no items
  $data ||= { last_transaction => "", list => [] };
  # print the header with the cookie containing the
  # uuid for this user
  my $cookie = $cgi->cookie(-name    => "id",
                            -value   => $id,
                            -expires => '+1h');
  print $cgi->header(-cookie => $cookie);

We now need to update the list we got back with any data that the user has submitted. If the user submits the same data again by hitting the reload button on their browser, we don't want to add the same data twice. To avoid this, whenever we submit new data we also submit a unique transaction id, and if we see the same transaction again we don't do anything.

  # save the data if the user hasn't just accidentally
  # hit refresh and sent the same data again
  # (scalar assignment: param can return a list in list context)
  my $item      = $cgi->param('val');
  my $submitted = $cgi->param('transaction');
  if ($item && defined $submitted &&
        $submitted ne $data->{last_transaction})
  {
    # remember the new item
    push @{ $data->{list} }, $item;
    # remember we already processed this transaction
    $data->{last_transaction} = $submitted;
    # save the new list to disk
    $cache->set($id, $data);
  }
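
To see the transaction check on its own, away from CGI and the cache, here is a small sketch in plain Perl. The add_item function and the "txn-1" style ids are hypothetical stand-ins for the form values. It also treats any non-empty value as an item, which is slightly more lenient than the truth test in the script.

```perl
use strict;
use warnings;

# add_item() is a hypothetical helper, not part of the script itself.
# It returns 1 if the item was added and 0 if the submission was ignored.
sub add_item {
  my ($data, $val, $transaction) = @_;
  # nothing submitted, or no transaction id to check against
  return 0 unless defined $val && length $val;
  return 0 unless defined $transaction;
  # same transaction as last time: the user hit reload
  return 0 if $transaction eq $data->{last_transaction};
  push @{ $data->{list} }, $val;
  $data->{last_transaction} = $transaction;
  return 1;
}

my $data = { last_transaction => "", list => [] };
add_item($data, "buy tree", "txn-1");        # first submission: stored
add_item($data, "buy tree", "txn-1");        # reload, same id: ignored
add_item($data, "wrap presents", "txn-2");   # new form, new id: stored
print scalar @{ $data->{list} }, " items\n";
```

Note that only the most recent transaction is remembered, so this guards against an immediate reload rather than against replaying an older form.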

We can now (finally) print out the list of todo items:

  # start the html
  print "<html><body><h1>Todo List</h1>";
      
  # print the todo list
  print "<ol>";
  print "<li>".encode_entities($_)."</li>"
    foreach (@{ $data->{list} });
  print "</ol>";

And print out the form that allows the user to submit new items. As well as providing a text entry box, we add a hidden field that contains the unique transaction:

  # print out a new form
  print "<form>";
  my $trans = Data::UUID->new->create_str;
  print qq{<input type="hidden" name="transaction" value="$trans" />};
  print qq{New Item: <input type="text" name="val" />};
  print qq{<input type="submit" name="submit" value="Add"/>};
  print "</form>";
  # end the html 
  print "</body></html>";

Item Expiry

Things that are put in the cache are currently stored in there forever. This is a big problem, as eventually the disk will fill up with more and more data, and we won't be able to distinguish between old useless data and the current data to clean it out. Worse, because our cookie will only be returned up to an hour after the last time the page was accessed, we'll end up with completely useless inaccessible data stored in the cache.

Cache::Cache allows us to set an expiry time for each item as we store it:

    # save the new list to disk, for up to an hour
    $cache->set($id, $data, 3600);

Or we can use a more friendly string form rather than just a number of seconds:

    # save the new list to disk, for up to an hour
    $cache->set($id, $data, "1 hour");
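
If every item should live for the same length of time, it may be easier to set that once when creating the cache rather than on each call. The sketch below assumes the default_expires_in constructor option described in the Cache::Cache documentation, and uses Cache::MemoryCache with a deliberately tiny expiry so the effect shows up within a couple of seconds. The namespace and keys are made up for illustration.

```perl
use strict;
use warnings;
use Cache::MemoryCache;

# every item gets an hour unless set() is told otherwise
my $cache = Cache::MemoryCache->new({
  namespace          => "expiry example",
  default_expires_in => 3600,
});

$cache->set("long",  "kept for the default hour");
$cache->set("short", "gone in a second", 1);   # explicit one second expiry

# wait long enough for the short-lived item to expire
sleep 2;

# get() returns undef for an item that has expired
print defined $cache->get("short") ? "short: still here\n" : "short: expired\n";
print defined $cache->get("long")  ? "long: still here\n"  : "long: expired\n";
```

An expired item is no longer returned by get, but it still takes up space until something purges it.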

The cache will be cleaned when we call purge. The most efficient time to do this is at the end of the request:

    # close STDOUT, meaning that the user will see the page
    # as completely loaded
    close STDOUT;
    # purge the expired items from the cache.
    $cache->purge();

If our code is complex and there's no clear point where this can easily be done, we can set the module to automatically check for expired items every single time get and/or set is called:

    $cache->set_auto_purge_on_set(1);
    $cache->set_auto_purge_on_get(1);

What You Can And Can't Put In a Cache

You can put any combination of hashes, arrays, scalars and references to any of these structures, blessed or not, into a Cache::Cache. You can't put in code references, filehandles, or any object that relies on a C data structure, since Perl can't replicate that data in a separate process.
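
We can get a feel for these limits without a cache at all. The sketch below assumes, as I believe is the case for the file backend, that the data is serialized with Storable on its way to disk, and simply runs two structures through Storable's freeze and thaw directly. The Todo class name is made up for illustration.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# a blessed hash holding nested plain data survives the round trip
my $todo = bless { list => ["buy tree"], done => { 1 => "send cards" } }, "Todo";
my $copy = thaw(freeze($todo));
print ref($copy), " with ", scalar @{ $copy->{list} }, " item\n";

# a code reference does not: by default Storable refuses to serialize it
my $ok = eval { freeze({ callback => sub { 1 } }); 1 };
print $ok ? "code ref stored\n" : "code ref rejected: $@";
```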

  • The Perl Cache homepage
  • Cache::FileCache
  • Cache::SharedMemoryCache
  • Cache::MemoryCache
  • Cache::FastMemoryCache