Building Santa's Naughty and Nice List with Stepford
It's a little-known fact that Santa's elves are the ones responsible for producing his yearly naughty and nice list. But working on the list has been taking up time that they'd rather use for drinking pine juice and playing Dark Souls. They have a crufty Makefile, but it doesn't do a great job of rebuilding things when dependencies change, so they're constantly finding errors in the output and having to delete old files. It also doesn't play all that nicely with the Perl code they wrote to do the real work.
So the elves pooled their money and hired me to automate building the list. Looking at how they'd built the list before, I realized that Stepford was the perfect tool for the job!
What is Stepford?
Stepford is a tool that takes a set of steps (tasks), figures out their dependencies, and then runs them in the right order to get the result that you ask for. The result itself is just another step, which you specify when you call run on a Stepford::Runner object. Steps are Perl classes built using Moose.
Dependencies and Productions
The "big thing" that Stepford does for you is to figure out the dependencies needed to get to the final step. It does this by looking at the dependencies and productions of all your steps and then running those steps in the necessary order.
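The first half of that job, working out which steps the final step actually needs, can be sketched in plain Perl. This is not Stepford's code: the step names and the graph below are hypothetical, and the sketch simply walks backwards from the final step, collecting everything it depends on directly or indirectly, so that unrelated steps are never run.

```perl
use strict;
use warnings;

# Hypothetical step graph: each step lists the steps it depends on.
# "Unrelated" is not needed by the final step and should be skipped.
my %depends_on = (
    Children    => [],
    GeoLite2    => [],
    AssignUUIDs => ['Children'],
    IPScores    => [ 'AssignUUIDs', 'GeoLite2' ],
    Unrelated   => [],
);

# Walk backwards from the final step(s), collecting every step they need
# directly or indirectly. Each step is visited at most once.
sub steps_needed_for {
    my ( $graph, @final ) = @_;
    my %needed;
    my @todo = @final;
    while (@todo) {
        my $step = shift @todo;
        next if $needed{$step};
        die "Unknown step '$step'\n" unless exists $graph->{$step};
        $needed{$step} = 1;
        push @todo, @{ $graph->{$step} };
    }
    return sort keys %needed;
}

print join( ', ', steps_needed_for( \%depends_on, 'IPScores' ) ), "\n";
```

Asking for IPScores pulls in AssignUUIDs, Children, and GeoLite2, but never Unrelated.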
Both dependencies and productions are declared as Moose attributes with a special trait (StepDependency or StepProduction). Here's an example:
You'll see how to actually populate a production attribute later on.
Stepford matches a production to a dependency solely by name. That means each production name must be unique within a given set of steps, and a dependency must use exactly the same attribute name as the production it needs.
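To make the name-matching rule concrete, here is a small plain-Perl sketch, not Stepford's implementation. The step descriptions are hypothetical hashes standing in for Moose attributes, and the production name children_with_uuids_file is invented for illustration. The sketch builds a map from production name to producing step, refuses duplicate production names, and then resolves each dependency name through that map.

```perl
use strict;
use warnings;

# Hypothetical step descriptions: each step lists the attribute names it
# produces and the attribute names it depends on.
my %steps = (
    'NN::Step::Children' => {
        productions  => ['children_file'],
        dependencies => [],
    },
    'NN::Step::AssignUUIDs' => {
        productions  => ['children_with_uuids_file'],    # invented name
        dependencies => ['children_file'],
    },
);

# Map every production name to the one step that produces it. A second
# step producing the same name is an error, because matching is by name only.
sub production_map {
    my ($steps) = @_;
    my %producer;
    for my $step ( sort keys %{$steps} ) {
        for my $name ( @{ $steps->{$step}{productions} } ) {
            die "Production '$name' is produced by both $producer{$name} and $step\n"
                if exists $producer{$name};
            $producer{$name} = $step;
        }
    }
    return \%producer;
}

# For each step, find the steps it depends on by looking up each
# dependency name in the production map.
sub step_dependencies {
    my ($steps) = @_;
    my $producer = production_map($steps);
    my %deps;
    for my $step ( keys %{$steps} ) {
        $deps{$step} = [
            map { $producer->{$_} // die "Nothing produces '$_' (needed by $step)\n" }
                @{ $steps->{$step}{dependencies} }
        ];
    }
    return \%deps;
}

my $deps = step_dependencies( \%steps );
print "$_ needs: @{ $deps->{$_} }\n" for sort keys %{$deps};
```

Note that several steps may share a dependency name (many steps can depend on children_file); only production names have to be unique.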
A "Step class" is any Moose class which consumes the Stepford::Role::Step role (or another role which in turn consumes that role). This role requires that a step class implement two methods, run and last_run_time. You'll see examples of both of these methods as we go further.
What Goes Into the Naughty and Nice List?
The elves gave me a long list of requirements, but honestly it all seemed like too much trouble. And since these elves are not very technically savvy, I'm going to take the easy route instead and just make some stuff up.
Here's what I'm going to do:
Get the names and IP addresses for all the children in the world, or at least a few of them.
Assign each child a UUID so I can track them easily.
Download the free GeoLite2 database from MaxMind.
Use the GeoLite2 database to look at each child's geographical location and use that to give their IP a naughty/nice score. This will be very scientific.
Look at each child's name and use that to give their name a naughty/nice score. Again, this will be very scientific.
Combine the IP and name scores into a single score per child and generate a text file with the naughty/nice list.
Here's a graph of the steps, showing each step's dependencies:
Looking at this graph, you can see a couple of interesting things. First, there are two steps, "Get list of children" and "Download GeoLite2 databases", with no dependencies. Next, there are steps that are dependencies for more than one other step: "Assign UUIDs" and "Get list of children". Finally, the "Combine scores" step has three dependencies but is not a dependency of any other step.
Figuring all this stuff out is what Stepford is for. In fact, it calculates a graph just like this internally.
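Once you have a graph like this, turning it into a run order is a topological sort. The sketch below uses Kahn's algorithm in plain Perl. It is an illustration of the idea, not Stepford's internal code, and the exact edges are one reading of the graph described above (an assumption, since the picture itself may differ).

```perl
use strict;
use warnings;

# One reading of the graph above (an assumption; the exact edges may differ):
# each step maps to the steps it depends on.
my %depends_on = (
    'Get list of children'        => [],
    'Download GeoLite2 databases' => [],
    'Assign UUIDs'                => ['Get list of children'],
    'IP scores'                   => [ 'Assign UUIDs', 'Download GeoLite2 databases' ],
    'Name scores'                 => ['Assign UUIDs'],
    'Combine scores'              => [ 'Get list of children', 'IP scores', 'Name scores' ],
);

# Kahn's algorithm: repeatedly run any step whose dependencies have all run.
sub run_order {
    my ($graph) = @_;
    my %remaining = map { $_ => scalar @{ $graph->{$_} } } keys %{$graph};
    my %dependents;
    for my $step ( keys %{$graph} ) {
        push @{ $dependents{$_} }, $step for @{ $graph->{$step} };
    }

    my @ready = sort { $a cmp $b } grep { $remaining{$_} == 0 } keys %remaining;
    my @order;
    while (@ready) {
        my $step = shift @ready;
        push @order, $step;
        for my $next ( @{ $dependents{$step} || [] } ) {
            push @ready, $next if --$remaining{$next} == 0;
        }
        @ready = sort { $a cmp $b } @ready;    # deterministic output
    }

    # Any step left over is part of a cycle and can never become ready.
    die "Dependency cycle detected\n" if @order != keys %{$graph};
    return @order;
}

print "$_\n" for run_order( \%depends_on );
```

The two steps with no dependencies come out first and "Combine scores" comes out last, matching what you can read off the graph by eye.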
Building our First Step
Let's start by building the step to "Get list of children". All the step classes for a single set of steps should live under the same namespace. I'm going to use NN::Step as our namespace prefix.
Let's look at the interesting bits more closely.
All step classes must consume one of the Step roles provided by Stepford. This particular role tells Stepford that all of this step's outputs are files. This lets Stepford calculate the step's last run time by looking at the files' modification times. For non-file steps, you have to provide a last_run_time method of your own.
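Deriving a last run time from output files is easy to sketch with Perl's built-in stat. The helper below is hypothetical and is not how Stepford's role is written. In particular, taking the oldest modification time when a step writes several files is this sketch's own choice: it makes one stale file mark the whole step as stale. A missing file yields undef, meaning the step has effectively never run.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: a file-producing step's last run time, derived from
# the modification times of its output files. Returns undef if any file is
# missing. Uses the oldest mtime (an assumption for this sketch).
sub last_run_time_from_files {
    my @files = @_;
    my $oldest;
    for my $file (@files) {
        my @stat = stat $file or return undef;
        my $mtime = $stat[9];    # element 9 of stat is the mtime
        $oldest = $mtime if !defined $oldest || $mtime < $oldest;
    }
    return $oldest;
}

# Usage: create a file and ask for its "last run time".
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/children.csv";
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "name,ip\n";
close $fh or die "Cannot close $file: $!";

my $t = last_run_time_from_files($file);
print defined $t ? "Last run at $t\n" : "Never run\n";
```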
This class has two attributes. The root_dir attribute is neither a dependency nor a production; you'll see how to set it later on. The children_file attribute is a production, which some other steps will depend on.
Every step class must provide a run method. This method is expected to do whatever work the step does. In this case, I take the list of children in DATA and turn it into a CSV file.
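The body of a run method like this one is ordinary Perl. Here is a minimal sketch of the "turn a list of children into a CSV file" part using only core Perl. The function name, the name/ip columns, and the sample rows are all hypothetical, and the quoting is a simple RFC 4180-style rule rather than a full CSV library such as Text::CSV.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Quote a CSV field only when needed, doubling embedded quotes (RFC 4180 style).
sub csv_field {
    my ($value) = @_;
    return $value unless $value =~ /[",\r\n]/;
    ( my $escaped = $value ) =~ s/"/""/g;
    return qq{"$escaped"};
}

# A stand-in for a step's run method: write name/IP pairs to a CSV file.
# The function name and input format are hypothetical.
sub write_children_csv {
    my ( $path, @children ) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    for my $row ( [ 'name', 'ip' ], @children ) {
        print {$fh} join( ',', map { csv_field($_) } @{$row} ), "\n";
    }
    close $fh or die "Cannot close $path: $!";
    return;
}

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/children.csv";
write_children_csv(
    $path,
    [ 'Alice',             '192.0.2.1' ],
    [ 'Bob "The Builder"', '198.51.100.7' ],
);
```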
The logger attribute is provided to each step by the Stepford::Runner class. You'll learn more about that class later.
Atomic File Steps
I could have used Stepford::Role::Step::FileGenerator::Atomic instead. If your step writes a file, this role prevents you from leaving behind a half-finished file if the step dies. I didn't use it in my example code just to keep the code simpler, but I highly recommend it for production code.
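The general technique behind "no half-finished files" is to write to a temporary file in the same directory and then rename it over the real path, since a rename within one filesystem replaces the target in a single step on POSIX systems. The sketch below shows that technique with core modules only. It is not the Atomic role's actual code, and write_file_atomically is a hypothetical name.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile tempdir);

# Write $content to $path atomically: write a temp file in the same directory,
# then rename() it over the target. If anything dies midway, the temp file is
# removed and the old file at $path (if any) is left untouched.
sub write_file_atomically {
    my ( $path, $content ) = @_;
    my ( $tmp_fh, $tmp_path ) = tempfile( DIR => dirname($path), UNLINK => 0 );
    my $ok = eval {
        print {$tmp_fh} $content or die "Cannot write $tmp_path: $!";
        close $tmp_fh or die "Cannot close $tmp_path: $!";
        rename $tmp_path, $path or die "Cannot rename $tmp_path to $path: $!";
        1;
    };
    unless ($ok) {
        my $error = $@;
        unlink $tmp_path;
        die $error;
    }
    return;
}

my $dir = tempdir( CLEANUP => 1 );
write_file_atomically( "$dir/list.txt", "Alice: naughty\n" );
write_file_atomically( "$dir/list.txt", "Alice: nice\n" );
```

One side effect worth knowing: File::Temp creates files with restrictive permissions, so the renamed file keeps those unless you chmod it.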
The other steps are pretty similar. They take some data and spit something new out. Let's take a look at some of the code from the step that adds the UUIDs:
This step depends on the children_file created by the Children step. Stepford will figure this out and make sure that the steps are run in the correct order. The AssignUUIDs step in turn has its own production (an attribute with the StepProduction trait), which later steps will depend on.
The remaining steps follow a similar pattern: they take an input file and produce an output file. The last step, WriteList, is a little different, so let's take a look at it:
This is mostly so I can demonstrate how to write a custom last_run_time method.
This step has three dependencies, unlike the previous steps you've seen. Each of these dependencies comes from a separate step. Stepford will figure all that out for us and run those steps before this one.
And here's the last_run_time method:
This is pretty straightforward. If the file exists, I return its last modification time. If not, I return undef, which tells Stepford that this step needs to be run.
Stepford uses the value of each step's last_run_time to determine whether or not a given step needs to be run at all. If a step's data is newer than the data from all of its dependencies, there's no point in regenerating that step's data.
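The decision rule can be written as a small pure function. This is a sketch of the rule as stated in prose, not Stepford's source, and step_needs_run is a hypothetical name. A step runs if it has no last run time, or if any dependency has no last run time (that dependency will be rebuilt first, so its data will end up newer), or if any dependency's data is newer than the step's own.

```perl
use strict;
use warnings;

# Hypothetical decision rule: a step must run if it has never run
# (undef last run time) or if any dependency's data is newer than its own.
# An undef dependency time means that dependency will be rebuilt, so its
# data will be newer too.
sub step_needs_run {
    my ( $own_time, @dependency_times ) = @_;
    return 1 unless defined $own_time;
    for my $dep_time (@dependency_times) {
        return 1 if !defined $dep_time || $dep_time > $own_time;
    }
    return 0;
}

printf "%s\n", step_needs_run( 20, 10, 15 ) ? 'run' : 'skip';
```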
(By the way, the last_run_time method above is essentially the same as the one provided by Stepford::Role::Step::FileGenerator.)
Running Your Steps
Now that I've written my steps, how do I run them? Here's the script I wrote:
The only interesting piece is my use of Stepford::Runner. Its constructor takes several named arguments. The step_namespaces argument tells Stepford which namespace to look for steps under. It will load all the classes that it finds under this namespace.
You can pass multiple namespaces as an array reference. When two steps have a production of the same name, the step whose namespace comes first in the list wins. This is useful for testing, as it lets you mock as many steps as you need to.
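The "first namespace wins" rule is what makes mocking work: put a test namespace in front and its steps shadow the real ones for any production they share. Here is a plain-Perl sketch of that resolution rule. It is not Stepford's code, and both the Test::NN::Step namespace and the children_with_uuids_file production are hypothetical.

```perl
use strict;
use warnings;

# Each entry is [ $namespace, \%productions ], where %productions maps a
# production name to the step class producing it. Entries are listed in
# priority order, so earlier namespaces win.
sub resolve_producers {
    my (@namespaces) = @_;
    my %producer;
    for my $ns (@namespaces) {
        my ( undef, $productions ) = @{$ns};
        for my $production ( keys %{$productions} ) {
            # Only fill in names that an earlier namespace has not claimed.
            $producer{$production} //= $productions->{$production};
        }
    }
    return \%producer;
}

# A hypothetical test namespace that mocks just the Children step.
my $producer = resolve_producers(
    [ 'Test::NN::Step' => { children_file => 'Test::NN::Step::Children' } ],
    [
        'NN::Step' => {
            children_file            => 'NN::Step::Children',
            children_with_uuids_file => 'NN::Step::AssignUUIDs',
        }
    ],
);

print "$_ => $producer->{$_}\n" for sort keys %{$producer};
```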
The logger can be any object that provides a certain set of logging methods.
Finally, if you set jobs to a value greater than one, Stepford will run steps in parallel, running up to $jobs steps at once whenever possible.
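To see why some steps can run side by side, here is a simplified plain-Perl planner that groups steps into rounds. Every step in a round has all of its dependencies finished in earlier rounds, and no round holds more than $jobs steps. This is a deliberate simplification, not Stepford's scheduler: a real runner can start a step as soon as its own dependencies finish rather than waiting for a whole round, and this sketch only plans the rounds without forking anything. The graph is the same assumed reading of the earlier diagram.

```perl
use strict;
use warnings;

# One reading of the article's step graph (an assumption).
my %depends_on = (
    'Get list of children'        => [],
    'Download GeoLite2 databases' => [],
    'Assign UUIDs'                => ['Get list of children'],
    'IP scores'                   => [ 'Assign UUIDs', 'Download GeoLite2 databases' ],
    'Name scores'                 => ['Assign UUIDs'],
    'Combine scores'              => [ 'Get list of children', 'IP scores', 'Name scores' ],
);

# Plan rounds of up to $jobs steps whose dependencies all finished in
# earlier rounds. Steps within one round could run concurrently.
sub schedule_rounds {
    my ( $graph, $jobs ) = @_;
    my %done;
    my @rounds;
    while ( scalar( keys %done ) < scalar( keys %{$graph} ) ) {
        my @ready = sort { $a cmp $b } grep {
            my $step = $_;
            !$done{$step} && !grep { !$done{$_} } @{ $graph->{$step} };
        } keys %{$graph};
        die "Dependency cycle detected\n" unless @ready;
        splice @ready, $jobs if @ready > $jobs;    # respect the jobs limit
        push @rounds, [@ready];
        $done{$_} = 1 for @ready;
    }
    return @rounds;
}

my $round = 0;
for my $steps ( schedule_rounds( \%depends_on, 2 ) ) {
    $round++;
    print "Round $round: ", join( ', ', @{$steps} ), "\n";
}
```

With two jobs, the two independent starting steps share a round, and so do the IP and name scoring steps; with one job, every step gets a round of its own.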
The call to the run method also accepts named arguments. Keys in the config argument that match constructor arguments for a step will be passed to that step class when the step is constructed. Remember way back when I said I'd show you how to set the root_dir attribute of the NN::Step::Children class? This is how you do that.
The final_steps argument can be a single step class name or an array reference of names. This is how you specify the result you're asking Stepford for.
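The config-passing rule boils down to "hand each step only the keys it accepts". A real implementation would presumably discover those keys from each class's Moose attributes. This plain-Perl sketch uses a hard-coded table instead, so the %accepts table and constructor_args_for are hypothetical, not part of Stepford.

```perl
use strict;
use warnings;

# Hypothetical table of the constructor arguments each step class accepts.
my %accepts = (
    'NN::Step::Children' => ['root_dir'],
);

# Pick out only the config keys that a given step class accepts.
sub constructor_args_for {
    my ( $class, $config ) = @_;
    return map { $_ => $config->{$_} }
        grep { exists $config->{$_} } @{ $accepts{$class} || [] };
}

my %config = ( root_dir => '/tmp/nn', unrelated => 42 );
my %args   = constructor_args_for( 'NN::Step::Children', \%config );
print "$_ => $args{$_}\n" for sort keys %args;
```

Keys that no step accepts, like unrelated here, are simply ignored.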
You might wonder why I didn't just use make or rake, which are both great tools. However, what makes them shine is how they integrate with certain environments. The make tool is great if you're working with a lot of existing command line tools like compilers, linkers, etc. And of course rake is great if you're dealing with existing Ruby code.
But our database-building code is written in Perl, so it made sense to write a tool in Perl.
If you're in a similar situation, with a Perl code base that executes a series of steps towards one or more final products, then Stepford might be a good choice for you as well.
It certainly worked well for those elves. Sure, the naughty and nice list they get is complete and utter nonsense, but it's a lot quicker to generate, giving them more time for their pine juice-fueled Dark Souls speedruns.
If you want to see all the step code for this article, check out this article's GitHub repo.