Today's column is by William 'N1VUX' Ricker, of Boston.PM, who has requested today's door and prepared this review as a holiday gift for Uri.
So you've got this utility that counts connections by source, maybe it's scanning netstat -a in realtime to colorize a network diagram, or the Apache logs weekly (or hourly). Usually you'd sort by Domain Name or TLD, or by count or by port, but this time the boss wants it sorted by source IP subnet, since that's how the datacenter network is organized. So the desired report will look something like
0.0.0.0 3 reallylong.sillyname.foobar.do 10.32.125.101 62 laxweb1.firewall.foobar.dom 10.32.125.119 98 laxweb2.firewall.foobar.dom 10.37.136.119 45 bosweb1.firewall.foobar.dom 10.37.136.120 73 bosweb2.firewall.foobar.dom 10.21.53.103 1 monitor.foobar.dom 10.21.192.232 2 ops.foobar.dom 10.45.16.89 1 lanserver02.bos.foobar.dom 10.45.183.222 2 desktop9483.bosfoobar.dom 10.46.120.197 3 10.46.120.197 10.47.156.24 1 lanserver031.lax.foobar.dom 10.47.156.26 1 desktop051.lax.foobar.dom 10.47.156.34 2 desktop003.lax.foobar.dom 10.47.156.40 2 desktop031.lax.foobar.dom 127.0.0.1 48 localhost.localdomain 172.24.26.77 1 qa07.foobar.dom 172.24.101.207 1 qa92.foobar.dom 172.24.101.208 6 qa93.foobar.dom 172.24.101.251 1 qa05.foobar.dom
but lots longer, so you need this to run fast.
Sorting with a simple comparison block would require way too much slicing and dicing on each of the O(N*logN) comparisons, so you have to do something efficient -- keep the O(N*logN) steps fast and the slow steps only in O(N). The classical Schwartzian Transform shows the structure of a compound key nicely,
1 my @sorted = 2 map { $_->[0] } 3 sort { 4 $a->[1] <=> $b->[1] 5 || 6 $a->[2] <=> $b->[2] 7 || 8 $a->[3] <=> $b->[3] 9 || 10 $a->[4] <=> $b->[4] 11 } 12 map { [ $_, split /[. ]/ ] } 13 @unsorted;
but it will call back to the coded-in-Perl decision block from the built-in-C Perl runtime for each comparison O(N*logN times), so it still has a speed penalty:
The Guttman-Rosler Transform is a special case of the Schwartzian transform that makes strings, not array-refs; the strings contain null-delimited keys in sequence followed by (typically) the original record # and the contents of the record or index into a hash or array. This allows it to be sorted natively, so the O(N*logN) step is the built-in native C implementation of sort, which acts like sort {$a cmp $b} only much faster.
In the great Internet sorting debate between the Schwartzian Transform and the Guttman-Rosler Transform, Merlyn admitted
«Yes. The ST can sort any multilevel complex sort, and is "programmer efficient". When the GRT can be used, and the programmer is willing to invest time to compute the "transform to and from a single string" functions, the GRT may be faster.»
Uri Guttman will regularly reply that in a multilevel sort, GRT will always be faster (barring really odd cases and data sets so small as to be ignored), since sort @array is faster than sort {anyblock} @array, no matter how simple the block: sort with no written-in-Perl callback is Perl core compiled C speed. It all depends on the balance between programmer time and computer time. (If efficiency was the primary objective, and we wanted to still use Perl, we should consider PDL or something similar.)
But Uri's a fair but balanced guy, so his Sort::Maker module gives you both ST and GRT, and also Joseph Hall's "orcish" maneuver from and good old plain sorting, and builds and compiles the sort idiom for you. Uri's, he gives you all three options, and tools to generate and benchmark them.
Alas, there's a problem with Sort::Maker's POD: it doesn't match the implementation. The POD claims we can use an array ref or a hash for an index's sub arguments, and that we can omit the =>1 for boolean arguments. But it doesn't work. The bold [] on lines 28 and 31 have to be {} and that requires =>1 on booleans too. The following sample is straight from the POD, with the addition of off-screen scaffolding 1..25 and line 35 error output:
26 my $sorter = make_sorter( 27 qw( plain ), 28 number => [ 29 code => '/(\d+)/', 30 'descending', 31 ], 32 ) ; 33 34 35 carp "no sort $@" unless $sorter; 36 37 38 my @sorted = $sorter->( @unsorted ) ;
$ perl uri.pl no sort make_sorter: Unknown attribute '/(\d+)/' at uri.pl line 35 Use of uninitialized value in subroutine entry at uri.pl line 38. Can't use string ("") as a subroutine ref while "strict refs" in use at uri.pl line 38.
Amongst its diverse weapons of inquisition, Sort::Maker can do
Enjoy!
1 #! perl -l -w 2 3 use strict; 4 use warnings; 5 use Carp; 6 use Sort::Maker; 7 8 open my $NETSTAT, "netstat -a|" 9 or die "Netstat fails $@ "; 10 11 my %Connections; 12 13 while (<$NETSTAT>) { 14 my @F=split /\s+/, $_; 15 next unless $F[-1] eq 'ESTABLISHED'; 16 my ($local, $remote)=@F[3..4]; # Slice! 17 18 next if $local =~ /[.:][*]/ or $remote =~ /[*]/; 19 my @L=split /[.:]/, $local; 20 $remote=~ s/[.:][^.:]+$//; 21 22 $Connections{$remote}->{count}++; 23 24 25 26 } 27 28 package utility; 29 { ## Scope for private var 30 use Socket; # for AF_INET, inet_aton 31 my $Zeroes=inet_aton("0.0.0.0"); 32 33 sub key_of { 34 my $addr=shift; 35 return inet_aton("0.0.0.0") unless defined $addr and $addr; 36 my $is_numeric = $addr =~ /^ ([.\d]+) $/; 37 38 if ($is_numeric) { 39 # my $name = (gethostbyaddr($addr,AF_INET))[0]; 40 return inet_ntoa($addr); 41 } 42 else { 43 my $num=(gethostbyname($addr))[4]; 44 return $Zeroes unless $num; 45 return $num; 46 } 47 48 } 49 50 } # end scope block 51 52 package main; 53 use Socket; # for inet_ntoa; 54 55 my $sort_func = 56 make_sorter( 57 plain=>1 , 58 # name=> 'main::sort_func', 59 string => { 60 ascending=>1, 61 unsigned=>1, 62 code => sub { utility::key_of $_ ; 63 } , 64 }, 65 ) 66 67 or croak "no sort: $@"; 68 69 70 my $name; 71 for $name ($sort_func->(keys %Connections)) { 72 my $addr=inet_ntoa(utility::key_of ($name)); 73 print "$Connections{$name}->{count}\t$name\t$addr"; 74 75 }