These days, Glug the elf can mostly fill the Naughty by trolling FaceBook and MySpace. However, some sneaky naughties hide their naughty words elsewhere on-line. To catch naughty hipsters in Northampton swearing and the like on their Twitter accounts, Glug logs into his Twitter identity @AngrySantaElf with his command-line twitter backup-archival tool, teewt.pl. (He thinks spelling Tweet backwards for a backup script is hilarious.)
Glug downloaded this useful little backup script off some website site and hacked it a little to download other peoples tweets instead of one's own precious drivel. It used Net::Twitter::Lite1which provides both OAuth authentication for Twitter and Basic (insecure passwords) for Identi.ca, Laconi.ca, etc. So the first time he runs it as a test, it requires he log into the web Twitter to get the authentication Pin for this app, which it stores in teewt.dat.
$ perl teewt.pl last=angrysantaelf
Authorize this application at: http://twitter.com/oauth/authorize?oauth_token=LKzNrr...
Then, enter the PIN# provided to continue:
123456
$VAR1 = [
{
'source' => 'web',
'retweet_count' => 0,
'created_at' => 'Mon Dec 20 22:56:04 +0000 2010',
'text' => 'Turns out our Slinky Dogs are rabid, so open those Christmas gifts with great care and distance.',
'in_reply_to_user_id' => undef,
'user' => {
'id' => 214624407,
'screen_name' => 'angrysantaelf',
...
After that, Glug grabs the last 500 tweets for each NoHo hipster
$ perl teewt.pl PAGES=5 user=hanneloreEC user=tai_fighter user=martenreed user=marigoldfarmer user=fayewhitaker user=dorabianchi > noho.csv
fetching page 1 ...
... got 100
fetching page 2 ...
... got 100
And then counts the officially sanctioned dirty words2 and average swears per tweet per person.3
$ perl -I. -MRE_BadWords -F, -lane '$c{$F[1]}++; $n{$F[1]}++ while $F[4] =~ m/$BadWords/g;' \
> -e 'END{printf "%d\t%d\t%.3f\t%s\n",$n{$_} , $c{$_}, $n{$_} / $c{$_}, $_
> for sort keys %n }' noho.csv
6 210 0.029 dorabianchi
16 250 0.064 fayewhitaker
1 377 0.003 hanneloreEC
7 118 0.059 marigoldfarmer
10 200 0.050 martenreed
10 136 0.074 tai_fighter
Glug sees four naughty tweeting birds over the 5% badword line, and gives them each a Naughty demerit. He also gives Hannelore a Nice check for being much much cleaner than her bad influence friends. (Dora will get a demerit too, when Glug finds out what she's done but that's not on-line.)
1 #!perl -l 2 # 3 # teewt - backing tweets to CSV or, singly, as Dumper object 4 # 5 # derived from Net::Twitter::Lite - OAuth desktop app example 6 # by William Ricker for Perl Advent Calendar 2010 7 # Copyright 2010 William Ricker 8 # License Same As Perl 9 10 use warnings; 11 use strict; 12 13 use Net::Twitter::Lite; 14 use File::Spec; 15 use Storable; 16 use Data::Dumper; 17 use Text::CSV; 18 use IO::Wrap; # wraphandle STDOUT 19 use feature ":5.10"; 20 21 # Setup 22 23 my $nt = authorize(); 24 25 # Supported Twitter API calls (methods) and columns to export in CSV 26 my %Fetches = ( 27 'sent_direct_messages' => 28 [ qw[ id_str created_at sender_screen_name recipient_screen_name text ]], 29 'direct_messages' => 30 [ qw[ id_str created_at sender_screen_name recipient_screen_name text ]], 31 'user_timeline' => 32 [qw[ created_at user/screen_name user/name user/id_str 33 text geo coordinates place id_str source 34 in_reply_to_status_id_str in_reply_to_screen_name retweet_count ]], 35 'friends_timeline' => 36 [qw[ created_at user/screen_name user/name user/id_str 37 text geo coordinates place id_str 38 in_reply_to_status_id_str in_reply_to_screen_name retweet_count ]], 39 ); 40 41 my %Short = ( 42 sdms => {cmd=>'sent_direct_messages', all=>1}, 43 dms => {cmd=>'direct_messages', all=>1}, 44 mine => {cmd=>'user_timeline',all=>1}, 45 status => {cmd=>'user_timeline',one=>1}, 46 friends => {cmd=>'friends_timeline',all=>1}, 47 'last' => {cmd=>'friends_timeline',one=>1}, 48 sd1 => {cmd=>'sent_direct_messages', one=>1}, 49 dm1 => {cmd=>'direct_messages', one=>1}, 50 ); 51 52 # Work - look for commands on @ARGV 53 54 my $pages = $ENV{PAGES} || 50; 55 56 while ( my $cmd = shift ) { 57 given ($cmd) { 58 59 when (/PAGES=(\d+)/) {$pages = $1;} 60 61 when (%Fetches){ all($cmd)} 62 63 when (%Short){ 64 my ($cmdLong,$one,$all) = @{$Short{$cmd}}{qw[cmd one all]}; 65 # hash slice per http://www.stonehenge.com/merlyn/UnixReview/col68.html 66 once($cmdLong) if $one; 67 all($cmdLong) if $all; 68 } 69 70 # last_$cmd is just one 71 when ( / ^ last_ (\w+) (?(?{$Fetches{$1}})(?-:)|(*FAIL)) $ /xi ) { 72 once($1); 73 } 74 75 # alias - drop a final s and the last_ is optional 76 when ( / ^ (?: last_ )? (\w+) (?(?{$Fetches{$1.q(s)}})(?-:)|(*FAIL)) $ /xi ) { 77 once($1.q(s)); 78 } 79 80 when ( / ^ last = (?: (\d+) | (\w+) ) /xi ) { 81 once('user_timeline', {screen_name => $2} ) if $2; 82 once('user_timeline', {id => $1} ) if $1; 83 84 } 85 86 when ( / ^ user = (?: (\d+) | (\w+) ) /xi ) { 87 all('user_timeline', {screen_name => $2} ) if $2; 88 all('user_timeline', {id => $1} ) if $1; 89 90 } 91 92 default { 93 warn "Unrecognized command $cmd \n supported choices:\n"; 94 warn "last_$_ $_ \n" for sort keys %Fetches ; 95 warn "abbrevs: ".join(q( ),sort keys %Short )."\n"; 96 last; 97 } 98 } # end given cmd 99 } # end cmds 100 101 # ------------------ subs ------------------------- 102 103 sub once { 104 my ($meth, $args) = @_ ; 105 my %args = $args ? %{$args} : () ; #=(screen_name => 'safety' | id => 123456 ); 106 107 #TBD should probably be in eval catch too ? 108 my $status = $nt->$meth( { count => 1, %args } ); 109 print Dumper $status; 110 } 111 112 sub all { 113 my ($meth, $args) = @_ ; 114 # %Fetches is implicit arg 115 my @Fields = @{$Fetches{$meth}}; 116 117 my $csv = Text::CSV->new (); 118 my $io = wraphandle("STDOUT"); 119 120 # setup 121 $csv->print ($io, \@Fields); 122 @Fields = map { m((\w+)/(\w+)) ? [$1, $2] : $_ } @Fields; 123 # expand 'user/screen_name' to qw[user screen_name] 124 # (avoids splitting in inner loop) 125 126 my %args = $args ? %{$args} : () ; #=(screen_name => 'safety' | id => 123456 ); 127 128 for my $i ( 1 .. $pages ) # current 3200 max total fetch, so 50 pages enough 129 { 130 warn "fetching page $i ...\n"; # heartbeat 131 132 my $statuses; 133 eval { 134 $statuses = $nt->$meth( 135 { 136 137 # id => $friendId, # for others, on user_timeline 138 # since_id => $high_water, to filter already seen, on most 139 count => 100, # typically 70..80 returned 140 page => $i, # from 1.. 141 %args, 142 } 143 ); 144 }; # end eval 145 146 if ($@) { 147 my $Err = $@; 148 warn "$Err \n"; 149 given ($Err) { 150 when (/502|503/) { sleep 1; redo }; # the intertubes were clogged 151 default { die "Unexpected error $Err \n" }; 152 } 153 } # end if err 154 my $ni = scalar @$statuses; 155 warn "... got $ni \n"; # heartbeat 156 last unless $ni; 157 for my $status (@$statuses) { 158 $csv->print ($io, 159 [ map {( ref $_ 160 ? $status->{$_->[0]}->{$_->[1]} // "" #/ 161 : $status->{$_} // "" #/ 162 ) } @Fields 163 ] ) ; 164 } #end for each 165 } # end page i 166 } # end sub all 167 168 sub authorize { 169 170 # straight from Net-Twitter-Lite's examples/oauth_desktop.pl 171 # just put into sub 172 173 # You can replace the consumer tokens with your own; 174 # these tokens are for the Net::Twitter example app. 175 my %consumer_tokens = ( 176 consumer_key => 'v8t3JILkStylbgnxGLOQ', 177 consumer_secret => '5r31rSMc0NPtBpHcK8MvnCLg2oAyFLx5eGOMkXM', 178 ); 179 180 # $datafile = oauth_desktop.dat 181 my ( undef, undef, $datafile ) = File::Spec->splitpath($0); 182 $datafile =~ s/\..*/.dat/; 183 184 my $nt = Net::Twitter::Lite->new(%consumer_tokens); 185 my $access_tokens = eval { retrieve($datafile) } || []; 186 187 if (@$access_tokens) { 188 $nt->access_token( $access_tokens->[0] ); 189 $nt->access_token_secret( $access_tokens->[1] ); 190 } 191 else { 192 my $auth_url = $nt->get_authorization_url; 193 print " Authorize this application at: $auth_url\n" 194 . "Then, enter the PIN# provided to continue: "; 195 196 my $pin = <STDIN>; # wait for input 197 chomp $pin; 198 199 # request_access_token stores the tokens in $nt AND returns them 200 my @access_tokens = $nt->request_access_token( verifier => $pin ); 201 202 # save the access tokens 203 store \@access_tokens, $datafile; 204 } 205 206 return $nt; 207 } 208
1. Net::Twitter has more HTML and Error handling bells-and-whistles for those with the Moose in the house, worthwhile for Web-App use, but not needed for command-line/desktop use.
2. George Carlin's Seven Filthy Words aka The Seven Words You Can't Say on the Radio as heard by Supreme Court of the US
3. In the interests of decency, we are ignoring the two worst potty-mouth twitters of the NoHo QC storyline. Glug the Elf since doesn't believe in talking birds so skips Yelling Bird, and the subjects' pintsize robot friend, a known hentai smut linker, is not a natural fictional person so is ineligible. Neither even gets coal, so Glug doesn't search them. Well, maybe he does, but not for $DayJob.