Fast CPAN Module Installation
"This build is taking forever", complained Cookie Cutter, the newest elf to join Santa's Continuous Integration team. "We'll be lucky if it's done by next Christmas, let alone this one!"
"Well, we do use a lot of CPAN modules", Snowstorm, head of the team, explained. "They take a while to install on a new machine, but we're certainly not going to re-write all that code ourselves. I just wish it would install quicker."
"Well," Cookie Cutter smiled, "I might have a way..."
Solving Cookie Cutter's Problem
I write Perl every day, with the help of great CPAN modules.
To install modules from the CPAN, I was using cpanm. I love it because it just works. One command not only installs the module, but first installs all the dependencies of that module that aren't already installed, and all the dependencies of those modules, and so on:
$ cpanm Catalyst
If you develop serious Perl software, it often depends on hundreds of CPAN modules. In fact, depending on Catalyst alone means depending on at least 100 CPAN modules.
Because of this, installing a module with cpanm can take quite a long time. This is because cpanm installs dependencies in series, downloading and examining one module at a time.
Like Cookie Cutter, I always hoped I could install CPAN modules faster.
At the Perl QA Hackathon 2015, Tatsuhiko Miyagawa, the author of cpanm, developed Menlo (the code name of cpanm 2.0). He announced in his blog post that Menlo would be maintained and released as a regular Perl module. This allows us to write Perl code that depends on Menlo. This is exciting, isn't it?
Using Menlo, I was finally able to write a CPAN module installer called cpm, which installs modules in parallel. Rather than downloading and examining one module at a time like cpanm does, as soon as cpm has identified multiple dependencies it starts downloading and installing more than one module at once. This parallelism makes cpm faster than any other CPAN module installer.
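To see why working in parallel helps, here is a minimal shell sketch of the difference between the two approaches. It is not how cpm is implemented: fake_install is a hypothetical stand-in that only sleeps, the module names A, B and C are placeholders, and a real installer must also respect the order of dependencies.

```shell
#!/bin/sh
# Hypothetical stand-in for "download and build one module".
# It only sleeps for a second; no real installer is invoked.
fake_install() {
    sleep 1
    echo "installed $1"
}

# Serial, in the style described for cpanm: one module at a time.
install_serial() {
    for module in "$@"; do
        fake_install "$module"
    done
}

# Parallel, in the spirit of cpm: start every job in the background,
# then wait for all of them to finish.
install_parallel() {
    for module in "$@"; do
        fake_install "$module" &
    done
    wait
}

install_serial   A B C   # about 3 seconds in total
install_parallel A B C   # about 1 second in total; output order may vary
```

With three independent jobs the serial version waits for each sleep in turn, while the parallel version overlaps them, which is the effect described above.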
Are you sure cpm is fast?
As cpm is a module just like any other on the CPAN, its installation is straightforward:
$ cpanm App::cpm
Now you have cpm! Let's try installing Plack with both cpanm and cpm, and compare their elapsed times. Because cpm does not run test cases, we need to run cpanm with the --notest option to get a fair comparison:
$ time cpanm -L extlib --notest --quiet Plack
...
real 0m47.705s
Next, cpm:
$ time cpm install Plack
...
real 0m16.629s
Wow, this shows that cpm (about 17 seconds) is nearly 3 times faster than cpanm (about 48 seconds)!
Of course results will change depending on the situation, so why don't you try it yourself?
TODO
At YAPC::Asia 2015, I was able to talk with miyagawa about cpanm. He said the parallel feature might be merged into cpanm itself. I was really happy to hear that.
To merge cpm into cpanm, there are some TODOs or issues that must be resolved:
cpm will need to support platforms that do not have the fork(2) system call. Currently cpm doesn't work on such platforms.
Messy log output (a big issue!). Currently cpm runs several Menlo instances in parallel, and the output of all of them is simply redirected to one file. As a result, the output is interleaved and really messy. I believe logging is important for stable software. Do you accept the messiness, or do you have any ideas to resolve this?
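As one possible idea for the logging issue, each worker could write to its own file, and the files could be joined once all workers have finished. The sketch below only illustrates that idea in shell. It is an assumption, not what cpm does: worker is a hypothetical function that prints a few lines, and the module names A, B and C are placeholders.

```shell
#!/bin/sh
# Hypothetical worker: prints a few log lines for one module.
# No real installer is invoked.
worker() {
    for step in fetch configure install; do
        echo "[$1] $step"
    done
}

logdir=$(mktemp -d)

# Each background worker writes to its own file rather than a shared one,
# so lines from different workers cannot interleave.
for module in A B C; do
    worker "$module" > "$logdir/$module.log" &
done
wait

# Join the per-worker logs so each module's lines stay together.
cat "$logdir/A.log" "$logdir/B.log" "$logdir/C.log" > "$logdir/build.log"
cat "$logdir/build.log"
```

The trade-off is that the combined log is grouped by module rather than ordered by time, so it no longer shows what was running at the same moment.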
Meanwhile, back at the North Pole...
"You see", said Cookie Cutter, "cpm is a CPAN module installer which is faster than other CPAN module installers."
"And we can install it just with 'cpanm App::cpm'?", asked Snowstorm, "That's it?"
"Yep! All we need to do is change one line in our installer script to start using it. And hopefully if we and other people find cpm useful and stable, then it may be merged into cpanm itself!"