Perl Advent Calendar 2007-12-02

Parallels, parallels, it's Christmas time in the city

by Josh McAdams

"Bah, humbug!" No, that's too strong
'Cause it is my favorite holiday
But all this year's been a busy blur
Don't think I have the energy

To add to my already mad rush
Just 'cause it's 'tis the season.
        ·   ·   ·
—The Waitresses, "Christmas Wrapping"

Like everyone else, that merry ol' soul, Saint Nick, has a lot to do around the holiday season. In fact, he has so much going on that there is no way Christmas could be a success without a little multi-tasking. Lucky for Santa, there's a Perl module that makes running processes in parallel a breeze.

With the help of Parallel::Jobs, Santa is sure to get everything done on time. Take, for instance, the task of getting his shopping lists distributed to vendors across the land. There are a lot of gifts to give, and there is no way Santa can wait for each list to reach the right shop before sending the next one. With Parallel::Jobs, sending all of the lists at once is easy.

mod2a.pl

    use Parallel::Jobs;

    # Each order pairs a local file with its scp destination.
    my @orders = (
        [ 'amazonia.txt'   => 'snick@amazonia.com:orders/' ],
        [ 'sprawlmart.csv' => 'claus@sprawlmart.com:cheap_stuff/' ],
        [ 'priceco.xml'    => 'santa@priceco.com:bulk_buys/' ],
    );

    # Maps the process ID of each running job to the file it is sending.
    my %placed_orders;

    for my $order (@orders) {
        $placed_orders{ Parallel::Jobs::start_job( 'scp', @$order ) } =
          $order->[0];
    }

    # Handle events as they arrive; for EXIT, zero data means success.
    while ( my ( $pid, $event, $data ) = Parallel::Jobs::watch_jobs() ) {
        if ( $event eq 'EXIT' ) {
            if ( !$data ) {
                print "Transferred $placed_orders{$pid}\n";
            }
            else {
                print "Failed to transfer $placed_orders{$pid}\n";
            }
            delete $placed_orders{$pid};
        }
    }

    # Anything still in the hash is a job that was never seen to exit.
    print "Lost track of $_\n" for values %placed_orders;

In the program above, Santa needs to send three files of orders to three different stores, so he queues them all up in an array and then loops through the array, calling scp for each file with the start_job subroutine. This subroutine returns a process ID, which is used as a hash key so that the name of the file being transferred can be looked up later.

Then the script runs a loop using the watch_jobs subroutine as its condition. Each call returns the process ID, an event name, and some data for one event from a process that was started with start_job. Once all of the jobs are done, it returns nothing, so the list assignment is false and the loop terminates.
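For EXIT events the listing only checks whether the data is zero. If that data is the child's raw wait status, the same kind of value Perl leaves in $? (an assumption here, although the listing's !$data test is consistent with it), it can be unpacked a little further. A minimal sketch, where describe_status is a made-up helper and not part of Parallel::Jobs:

```perl
use strict;
use warnings;

# Describe a raw wait status, i.e. the kind of value Perl puts in $?.
# ASSUMPTION: the data of an EXIT event has this layout.
sub describe_status {
    my ($status) = @_;
    return 'ok' if $status == 0;

    # The low seven bits hold the signal that killed the process, if any.
    my $signal = $status & 127;
    return "killed by signal $signal" if $signal;

    # Otherwise the exit code lives in the high byte.
    return 'exited with code ' . ( $status >> 8 );
}

print describe_status(0),        "\n";
print describe_status( 1 << 8 ), "\n";
print describe_status(9),        "\n";
```

Inside the watch_jobs loop, Santa could then print describe_status($data) instead of a bare "Failed to transfer".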

Finally, any file name still left in %placed_orders after the loop belongs to an order that Saint Nick couldn't track with watch_jobs, and those are the names worth printing. This should never happen, but it's better to know on the off chance that it does.

Of course, what you're seeing here is very similar to using wait in the shell. But this only scratches the surface of what Parallel::Jobs can do. It can also capture a job's output and return it as an event and data through watch_jobs, or write that output to a file. And it can feed the contents of a file to any of the executed jobs via their STDIN.
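To see what the module is saving Santa from, here is a rough sketch of the same bookkeeping using only Perl's built-in fork, exec and wait. It is not how Parallel::Jobs is implemented, and the child commands are hypothetical stand-ins that simply exit with a chosen status instead of running scp:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the scp jobs: each file is paired with the
# exit code its child process should return.
my @jobs = (
    [ 'amazonia.txt'   => 0 ],
    [ 'sprawlmart.csv' => 0 ],
    [ 'priceco.xml'    => 3 ],
);

my ( %running, %result );

for my $job (@jobs) {
    my ( $file, $exit_code ) = @$job;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: replace this process with the stand-in command.
        exec( $^X, '-e', "exit $exit_code" ) or die "exec failed: $!";
    }
    $running{$pid} = $file;    # parent remembers which file this PID carries
}

# Reap the children in whatever order they finish.
while ( ( my $pid = wait() ) != -1 ) {
    my $file = delete $running{$pid};
    next unless defined $file;
    $result{$file} = $? >> 8;    # exit code is the high byte of $?
    print $? == 0 ? "Transferred $file\n" : "Failed to transfer $file\n";
}
```

Even this small version has to handle fork failures and decode $? by hand, and it does nothing yet about capturing output.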

We'll wrap up with an example that exercises more of the functionality found in Parallel::Jobs. One feature that we'll use is passing data to a job, and collecting data from it, using standard filehandles and named files. We'll also tell Parallel::Jobs to capture the output of a job for later use, à la qx. In this example, the naughty list contains three items and the nice list contains four items.

mod2b.pl

    use Parallel::Jobs;

    # Count the nice list: read STDIN from a named file and capture the
    # output so that it comes back through watch_jobs.
    my $nice_pid =
      Parallel::Jobs::start_job(
        { stdin_file => 'nice.txt', stdout_capture => 1, stderr_capture => 1 },
        'wc', '-l' );

    print "NICE [$nice_pid]\n";

    sleep(1);

    # Count the naughty list: hand the job filehandles we opened ourselves.
    open( NAUGHTY,     '<', 'naughty.txt' ) or die $!;
    open( NAUGHTY_OUT, '>', 'naughty.out' ) or die $!;
    open( NAUGHTY_ERR, '>', 'naughty.err' ) or die $!;

    my $naughty_pid = Parallel::Jobs::start_job(
        {
            stdin_handle  => *NAUGHTY,
            stdout_handle => *NAUGHTY_OUT,
            stderr_handle => *NAUGHTY_ERR
        },
        'wc', '-l'
    );

    print "NOT NICE [$naughty_pid]\n";

    # Report every event from both jobs until none are left.
    while ( my ( $pid, $event, $data ) = Parallel::Jobs::watch_jobs() ) {
        print "Finished [$pid] [$event] [$data]\n";
    }

    close NAUGHTY;
    close NAUGHTY_OUT;
    close NAUGHTY_ERR;

Which outputs:

santa@northpole:~ $ mod2b.pl
NICE [21214]
NOT NICE [21215]
Finished [21214] [EXIT] [0]
Finished [21214] [STDOUT] [       4 ]
Finished [21214] [STDOUT] []
Finished [21214] [STDERR] []
Finished [21215] [EXIT] [0]

The run also leaves behind two files, naughty.err and naughty.out; the latter contains the number 3.
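Printing every event is fine for a demo, but in practice Santa would probably want the captured output grouped by job. The sketch below folds (pid, event, data) triples into one record per process. collect_events is a hypothetical helper, and the canned events are copied from the run above (the newline after the 4 is assumed) so that the sketch runs without the module:

```perl
use strict;
use warnings;

# Fold a stream of (pid, event, data) triples into per-job records.
# $next_event is any code ref that returns one triple per call and an
# empty list once nothing is left.
sub collect_events {
    my ($next_event) = @_;
    my %job;
    while ( my ( $pid, $event, $data ) = $next_event->() ) {
        if ( $event eq 'EXIT' ) {
            $job{$pid}{status} = $data;
        }
        elsif ( defined $data && length $data ) {
            # A STDOUT or STDERR chunk; empty chunks are skipped.
            $job{$pid}{ lc $event } .= $data;
        }
    }
    return \%job;
}

# Canned events standing in for watch_jobs, taken from the output above.
my @events = (
    [ 21214, 'EXIT',   0 ],
    [ 21214, 'STDOUT', "       4\n" ],
    [ 21214, 'STDOUT', '' ],
    [ 21214, 'STDERR', '' ],
    [ 21215, 'EXIT',   0 ],
);
my $jobs = collect_events( sub { my $e = shift @events; return $e ? @$e : () } );

# Pull the line count out of the nice job's captured output.
my ($nice_count) = $jobs->{21214}{stdout} =~ /(\d+)/;
print "Nice list has $nice_count entries\n";
```

With the real module, the same helper could be fed with collect_events( sub { Parallel::Jobs::watch_jobs() } ).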