#!/home/src/perl/repoperls/installed-perls/perl/pVNtS9N/perl-5.8.0@32642/bin/perl

# use 5.012; # dor
use strict;
use warnings;

=head1 NAME

cnntp-solver - run cpan-testers-parsereport -solve over a bunch from the cpantesters DB

=head1 SYNOPSIS

  cnntp-solver.pl [OPTIONS]

=head1 OPTIONS

=over 8

=cut
my @opt = <<'=back' =~ /B<--(\S+)>/g;

=item B<--addctgargs=s>

Additional string to inject into the options that we pass to
ctgetreports. Typically used together with --pick when there is
something to grep for in the reports. E.g.

  --pick RDF-NS-20130402 --addctgargs " --q qr:'(Bad blank node identifier.*)'"

=item B<--calcsample=i>

Defaults to an arbitrary 500 on normal runs and to 2500 if pick is
set. 0 is treated as unlimited. If the number of passes and fails is
so low that the expected numbers in the sample would be too low, then
the sample is set higher, no matter what this parameter specifies.

=item B<--cpanstats_distrofacts=s>

For debugging only: the string must consist of the
whitespace-separated arguments $dist and $version (App-MusicExpo
1.002) or the more common $dist-$version (App-MusicExpo-1.002).
Output is a JSON representation of what the function
cpanstats_distrofacts(undef,$dist,$version) returns.

=item B<--cpanstats_distrofacts_zero=s>

Like above but with resetting to zero beforehand.

=item B<--db_fixup_release_date=s>

For bug fixing only: the string must consist of the
whitespace-separated arguments $dist and $version. The function
db_fixup_release_date will be called with these arguments and has the
potential to fix an I<allversions> bug. Output is a bit unstructured.

=item B<--leavewhennojobsleft!>

If this option is true, the program finishes when it (1) is only
waiting for mincalctime and (2) the number of jobs in the jobqueue
drops to zero. It is not used to make the program wait longer than
mincalctime; the purpose is only to leave earlier when no reason to
wait exists anymore. The purpose of mincalctime is to keep the shell
loop around us trivial, and the purpose of leavewhennojobsleft is to
get back to work asap when the jobqueue is empty.

=item B<--maxfromannotate=i>

By default we read all annotations from the file annotate.txt, but the
maximum number we want to read can be set lower here. Useful when
debugging the enqueueing process of a certain package. Put this
package into the first line and run

  .../perl -d bin/cnntp-solver.pl --maxfromannotate=1 --maxfromsolved=0 --maxfromdb=0 --urgent

=item B<--maxfromdb=i>

Defaults to 666. By default we read through the huge SQL query. We
switch gears in the middle from thoroughly reading every row to
randomly skipping huge numbers of records. But the maximum number we
want to read at all can be set lower here.

=item B<--maxfromsolved=i>

By default we read through all previously solved calculations. But the
maximum number we want to read can be set lower here.

=item B<--mincalctime=i>

When nothing is left to do, let the script sleep until this amount of
time has passed. Not interesting anymore; the Redis queue takes over.

=item B<--pick=s>

Like a ctgetreports run for this single item. We fake that it was
found in annotations and skip all other steps. Pick by default implies
sample=2500, but this can be overridden. Pick implies some kind of
force in that we do not calculate whether a recalc is necessary
depending on the age of a previous report. We simply do it,
regardless.

Maybe it's too much for a single parameter and we should separate some
aspects out?

=item B<--retrieve_distrofacts=s>

For debugging only: the string must consist of the
whitespace-separated arguments $dist and $version. Output is a JSON
representation of what the function
retrieve_distrofacts(undef,$dist,$version) returns.

=item B<--onlystartwhennojobs!>

If this option is true, the program does not start as long as there
are jobs in the queue.

=item B<--this_urgent=s>

For debugging only: the string must consist of the
whitespace-separated arguments $dist and $version. Output is a JSON
representation of what the function this_urgent(undef,$dist,$version)
returns.

=item B<--urgent!>

Dangerous, use with care. Sets urgent globally for the whole run.
Useful during debugging to avoid having to play dirty tricks with
randomness.

=back

=head1 DESCRIPTION

Fulfills two roles on analysis: (1) enqueueing to redis is done by
running this script from crontabs with different options; (2)
dequeueing from redis is done by parallelized runs of this script with
the C<--pick> option.
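The two roles above can be sketched as crontab entries. This is only
an illustration: the schedule, the checkout path, and the way a distv
is popped from redis and handed to C<--pick> are assumptions, not
taken from this repository.

```shell
# Hypothetical crontab sketch (all paths/schedules invented):
#
# Role 1 (enqueue): scan annotate.txt and the DB, push work to redis,
# and leave as soon as the queue is drained.
# 15 * * * *  cd $HOME/andk-cpan-tools && bin/cnntp-solver.pl --leavewhennojobsleft
#
# Role 2 (dequeue): parallel workers, each solving a single distv;
# "analysis:jobqueue" and the redis-cli plumbing are hypothetical.
# *  * * * *  cd $HOME/andk-cpan-tools && bin/cnntp-solver.pl --pick "$(redis-cli lpop analysis:jobqueue)"
```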

=head1 FUNCTIONS

=over 8

=cut
#use lib qw( /usr/local/perl-5.10.1-uld/lib/5.10.1/x86_64-linux-ld
#            /usr/local/perl-5.10.1-uld/lib/5.10.1
#            /usr/local/perl-5.10.1-uld/lib/site_perl/5.10.1/x86_64-linux-ld
#            /usr/local/perl-5.10.1-uld/lib/site_perl/5.10.1
#          );
#use lib "/home/src/perl/CPAN-Testers-Common-Utils-0.002-82Cocq/blib/lib";
#use lib "/home/src/perl/CPAN-Testers-Common-DBUtils-0.05-zqYV7K/blib/lib";
#use lib "/home/src/perl/CPAN-Testers-Common-Article-0.42-4OIZRu/blib/lib";
#use lib "/home/src/perl/CPAN-Testers-Data-Generator-0.41-bzdKLH/blib/lib";
#use CPAN::Testers::Data::Generator;
# methods discussed in CPAN::Testers::Common::DBUtils
# schema in CPAN::WWW::Testers::Generator
use Digest::SHA;
use File::Basename qw(dirname);
use File::Path qw(mkpath);
use File::Spec;
use File::Temp;
use Getopt::Long;
use Hash::Util qw(lock_keys);
use Pod::Usage qw(pod2usage);

our %Opt;
lock_keys %Opt, map { /([^=!]+)/ } @opt;
GetOptions(\%Opt,
           @opt,
          ) or pod2usage(1);
unless (defined $Opt{calcsample}) {
    if ($Opt{pick}) {
        $Opt{calcsample} = 2500; # arbitrary
    } else {
        $Opt{calcsample} = 500; # arbitrary
    }
}
$Opt{maxfromdb} = 666 unless defined $Opt{maxfromdb};

sub mytouch ($$);
sub cpanstats_distrofacts;
sub retrieve_distrofacts;
sub this_urgent;
use FindBin;
use lib "$FindBin::Bin/../CPAN-Blame/lib";
use CPAN::Blame::Config::Cnntp;
my($workdir,$cpan_home,$ext_src);
BEGIN {
    $workdir = File::Spec->catdir
        ($CPAN::Blame::Config::Cnntp::Config->{solver_vardir},
         "workdir");
    $cpan_home = $CPAN::Blame::Config::Cnntp::Config->{cpan_home};
    $ext_src = $CPAN::Blame::Config::Cnntp::Config->{ext_src};
}

use lib "$ext_src/cpan-testers-parsereport/lib"
    #, "/home/src/perl/Config-Perl-V-0.10-ymOAl_/blib/lib"
    #, "/home/src/perl/DateTime-Format-DateParse-0.04-eYwnxv/blib/lib"
    ;
use CPAN::Testers::ParseReport 0.002000; # we use it only indirectly but want
                                         # to fail early if it is missing
use lib "$FindBin::Bin/../CPAN-Blame/lib";
use Catalyst::Model;
use CPAN::Blame::Model::Solved;
use lib "$ext_src/rersyncrecent/lib";
use File::Rsync::Mirror::Recent;

use CPAN::DistnameInfo;
use DateTime;
use Time::Moment;
use lib "$ext_src/rersyncrecent/lib";
use LWP::UserAgent;
use List::Util qw(max min minstr reduce sum);
use List::MoreUtils qw(uniq);
use Storable qw(dclone);
use Term::ANSIColor;
use Time::Duration qw(duration);
use Time::HiRes qw(sleep);
use Time::Local qw(timegm);
use URI ();
use YAML::Syck;
use JSON::XS;
use Redis;

use version; our $VERSION = qv('1.1.1');
###### constants (mostly arbitrary) ######

use constant MUST_CONSIDER => 1200;   # 2015-06-03: current pause makes nearly 1500 per day
                                      # 2016-03-28: s/1400/1000/: try out whether it feels faster
use constant MIN_CALCTIME => 20;      # not interesting anymore, just a hot-loop-cooler
use constant MAX_CALCTIME => 60*3600; # nonsense: since redis we are done in no time
###### debugging ######

if ($Opt{cpanstats_distrofacts_zero}) {
    $Opt{cpanstats_distrofacts} = $Opt{cpanstats_distrofacts_zero};
}
for my $func_name (qw(
    cpanstats_distrofacts
    retrieve_distrofacts
    db_fixup_release_date
    this_urgent
)) {
    if (my $opt = $Opt{$func_name}) {
        my($dist,$version) = split " ", $opt;
        unless (defined $version && length $version){
            my $d = CPAN::DistnameInfo->new("FOO/$opt.tgz");
            $dist = $d->dist;
            $version = $d->version;
            unless (defined $version && length $version){
                die "argument '$opt' could not be split";
            }
        }
        no strict 'refs';
        my $func = \&{$func_name};
        my $href = $func->(undef,$dist,$version);
        my $jsonxs = JSON::XS->new->pretty(1)->canonical(1);
        print $jsonxs->encode($href);
        exit;
    }
}
###### subroutines ######

=item ok_value_and_distro

=cut

sub ok_value_and_distro {
    my($article) = @_;
    $article->{subject} =~ /(\S+)\s+(\S+)/;
}

=item mydbi

=cut

sub mydbi ($) {
    my $file = shift;
    require DBI;
    my $dbi = DBI->connect ("dbi:SQLite:dbname=$file");
}

=item mypgdbi

=cut

sub mypgdbi () {
    require DBI;
    my $dbi = DBI->connect ("dbi:Pg:dbname=analysis");
}

=item myredis

=cut

sub myredis () {
    my $redis = Redis->new(reconnect => 120, every => 1000);
}

=item my_get_query

=cut

sub my_get_query {
    my($dbi,$sql,@args) = @_;
    $dbi ||= mypgdbi();
    my $sth = $dbi->prepare($sql);
    my $rv = eval { $sth->execute(@args); };
    unless ($rv) {
        my $err = $sth->errstr;
        warn "Warning: error occurred while executing '$sql': $err";
    }
    my @rows;
    while (my(@trow) = $sth->fetchrow_array()) {
        push @rows, \@trow;
    }
    \@rows;
}

=item my_do_query

=cut

sub my_do_query {
    my($dbi,$sql,@args) = @_;
    local($dbi->{PrintError}) = 0;
    my $success = $dbi->do($sql,undef,@args);
    unless ($success) {
        my $err = $dbi->errstr;
        unless ($err =~ /duplicate key/) {
            require Carp;
            Carp::cluck(sprintf
                        (
                         "Warning: error occurred while executing sql[%s] with args[%s]: %s",
                         $sql,
                         join(":",map { defined $_ ? "'$_'" : "<undef>"} @args),
                         $err,
                        ));
        }
    }
    return $success;
}

=item cpan_lookup_dist

=cut

package CPAN::Shell::devnull;
sub myprint { return; }
sub mywarn { return; }
sub mydie { my($self,$why) = @_; warn "Caught error[$why]; continuing"; return; }
sub mysleep { return; }
package main;

sub cpan_lookup_dist {
    my($mdbi,$dist,$version) = @_;
    return unless defined $dist && defined $version;
    my $distv = "$dist-$version";
    $mdbi ||= mypgdbi();
    my($author,$upload_date,$distroid) = db_memoized($mdbi,$distv);
    return ($author,$upload_date,$distroid)
        if $author && $upload_date && $distroid and $upload_date =~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] /;

    # $DB::single = $dist eq "Net-Twitter" && $version eq "3.07002";
    # $DB::single = $dist eq "App-CLI-Plugin-Log-Message";

    use lib "$ext_src/cpanpm/lib";
    our $CPAN_SETUP_DONE;
    unless ($CPAN_SETUP_DONE++) {
        require CPAN;
        CPAN::HandleConfig->load;
        CPAN::Shell::setup_output();
        CPAN::Index->reload;
    }
    $CPAN::Frontend = "CPAN::Shell"; # alt: devnull
    my @ret = CPAN::Shell->expand("Distribution", "/\\/$distv/");
    my $best = reduce {
        my $la = length(mybasename($a->{ID}));
        my $lb = length(mybasename($b->{ID}));
        $la < $lb ? $a : $b
    } @ret;

    my($upload_date_authoritative) = 0;
    if ($best) {
        # $DB::single=1;
        my $oldyaml = read_ctx($distv);
        if ($oldyaml->{upload_date}
            and $oldyaml->{upload_date} =~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] /
            and $oldyaml->{upload_date} !~ /\?/) {
            $upload_date = $oldyaml->{upload_date};
        } elsif ($best->{UPLOAD_DATE}) {
            # not expected to happen
            $upload_date = $best->{UPLOAD_DATE};
        } elsif (my @stat = stat "$cpan_home/authors/id/$best->{ID}") {
            # $DB::single++;
            my(@mtime) = gmtime $stat[9];
            $mtime[4]++;
            $mtime[5]+=1900;
            $upload_date = sprintf "%04d-%02d-%02dT%02d:%02d:%02dz", @mtime[reverse 0..5];
            $upload_date_authoritative = 1;
        } else {
            # S-L-O-W
            my($id) = $best->{ID};
            $id =~ s|^./../||;
            local($CPAN::Frontend) = "CPAN::Shell::devnull";
            CPAN::Shell->ls($id); # find out the upload date
            $upload_date = $best->upload_date || "???";
        }
        $distroid = $author = substr($best->{ID},5);
        $author =~ s|/.*||;
    } elsif (my $x = find_on_cpan($dist,$version)) {
        my $absdistro = "$cpan_home/authors/$x";
        ($author) = $absdistro =~ m{.././../(.+?)/};
        ($distroid) = $absdistro =~ m{.././../(.+)$};
        if (my(@stat) = stat $absdistro) {
            my(@mtime) = gmtime $stat[9];
            $mtime[4]++;
            $mtime[5]+=1900;
            $upload_date = sprintf "%04d-%02d-%02d", @mtime[5,4,3];
        } else {
            # warn "Alert: Could not stat '$absdistro', will try to continue: $!";
            $upload_date = "....-..-..";
        }
    } else {
        # next ARTICLE;
        $author = "";
        my(@now) = gmtime;
        $now[4]++;
        $now[5]+=1900;
        $upload_date = sprintf "%04d-%02d-%02d(?)", @now[5,4,3];
        # leaving $distroid = undef;
    }
    db_memoize($mdbi,$distv,$author,$upload_date,$distroid,$upload_date_authoritative);
    return ($author,$upload_date,$distroid);
}

=item db_memoized

=cut

sub db_memoized {
    my($dbi,$distv) = @_;
    my $sql = "SELECT author,upload_date,distroid FROM distlookup WHERE distv=?";
    my $rows = my_get_query
        (
         $dbi,
         $sql,
         $distv
        );
    return unless @$rows;
    if (@$rows==1) {
        return @{$rows->[0]};
    } else {
        my $n = @$rows;
        die "Panic: cannot happen: '$n' rows when querying primary key '$distv'";
    }
}

=item db_memoize($dbi,$distv)

=item db_memoize($dbi,$distv,$author,$upload_date,$distroid,$upload_date_authoritative)

The first form reserves a record, the second form fills it in a bit.
Undefined values are ignored (*not* written as NULL).

=cut

sub db_memoize {
    my($dbi,$distv,$author,$upload_date,$distroid,$upload_date_authoritative) = @_;
    my $sql = "INSERT INTO distlookup (distv) VALUES (?)";
    my $success = eval { my_do_query
                             (
                              $dbi,
                              $sql,
                              $distv,
                             ); };
    my(%othersets) =
        (
         author => $author,
         upload_date => $upload_date,
         distroid => $distroid,
         $upload_date_authoritative ? (release_date_metacpan => $upload_date) : (),
        );
    my(@set, @bind);
    while (my($k,$v) = each %othersets) {
        if (defined $v) {
            push @set, "$k=?";
            push @bind, $v;
        }
    }
    return unless @bind; # they had no data
    push @bind, $distv;
    $sql =
        sprintf "UPDATE distlookup SET %s WHERE distv=?",
        join(", ", @set);
    eval { my_do_query
               (
                $dbi,
                $sql,
                @bind,
               ); };
    if ($@){
        warn "Error while processing '$distv' with sql[$sql]: $@";
    }
}

=item mybasename

=cut

sub mybasename {
    my($p) = @_;
    $p =~ s|.*/||;
    return $p;
}

READ_ANNOTATIONS: {
    my %anno_line;
    sub read_annotations {
        use File::chdir;
        local($CWD) = "$FindBin::Bin/..";
        unless ($Opt{pick}) {
            for (0,1) {
                last if 0 == system git => "pull"; # may fail
                sleep 1;
            }
        }
        my $annofile = "$FindBin::Bin/../annotate.txt";
        my $fh;
        unless (open $fh, $annofile) {
            # $DB::single=1;
            die "Could not open '$annofile': $!";
        }
        my $anno = {};
        local $/ = "\n";
        my $i = 0;
      ANNOLINE: while (<$fh>) {
            chomp;
            next ANNOLINE if /^\s*$/;
            $i++;
            if (defined $Opt{maxfromannotate}) {
                last ANNOLINE if $i > $Opt{maxfromannotate};
            }
            my($dist,$splain) = split " ", $_, 2;
            $anno->{$dist} = $splain;
            $anno_line{$dist} = $i;
        }
        return $anno;
    }
}

=item find_on_cpan

=cut

sub find_on_cpan {
    my($dist,$version) = @_;
    return unless defined $dist;
    my $rf = File::Rsync::Mirror::Recent->new(local => "$cpan_home/authors/RECENT.recent");
    my $recent = $rf->news();
    my(@cand) = map
        {
            $recent->[$_]
        } grep
            {
                $recent->[$_]{path} =~ m{/\Q$dist\E-\Q$version\E}
                    && $recent->[$_]{path} !~ m{(meta|readme)$}
                # it doesn't matter if it is deleted, we found it!
                # && $recent->[$_]{type} eq "new"
            } 0..$#$recent;
    my @path = uniq map { $_->{path} } @cand;
    if (@path == 0) {
        return;
    } elsif (@path != 1){
        # e.g.
        # 0 'id/D/DR/DROLSKY/Moose-0.89_02.tar.gz'
        # 1 'id/D/DR/DROLSKY/Moose-0.89_01.tar.gz'
        # 2 'id/D/DR/DROLSKY/Moose-0.89.tar.gz'
        my $best = reduce {
            my $la = length($a);
            my $lb = length($b);
            $la < $lb ? $a : $b
        } @path;
        # $DB::single++;
        return $best;
    } else {
        return $path[0];
    }
}

=item $href = cpanstats_distrofacts($dbi, $dist, $version, $distv, $lfh)

The original design of this function was and still is to query the
C<cpanstats> table for all reports on B<dist> (not B<distv>). It then
constructs in C<allversions> the order in which versions entered the
database, counts pass, fail, and other for this version, and
determines the third fail day.

A sample $href returned (slightly faked):

  {
     "allversions" : [
        {
           "date" : "201302071447",
           "version" : "0.01"
        },
        {
           "date" : "201302111128",
           "version" : "0.02"
        },
        {
           "date" : "201303270850",
           "version" : "0.03"
        },
        {
           "date" : "201401241127",
           "version" : "0.04"
        }
     ],
     "db_fails" : 3,
     "db_others" : 2,
     "db_passes" : 1748,
     "thirdfail" : "2013-12-19 14:54",
     "thirdfailid" : 37664541,
     "thirdpass" : "2011-04-17 14:04",
     "thirdpassid" : 11739866
  }

Since this function can also be called from the option
--cpanstats_distrofacts, the question is how well it works under
concurrency. I *think* the algorithm is OK but I did not try to test
for border cases. To avoid corruption we introduced the $lfh
parameter.

If $lfh is false, we try to get a lockfilehandle and hold it to the
end. If it is a valid lockfilehandle, all the better.

=cut
sub cpanstats_distrofacts {
    my($dbi,$dist,$version,$distv,$lfh) = @_;
    die "ALERT: illegal distname [$dist]" if $dist =~ /'/;
    $distv ||= "$dist-$version";
    unless ($lfh) {
        my $slv_file = slv_file($distv);
        $lfh = lockfilehandle("$slv_file.LCK") or return;
    }
    $dbi ||= db_quickaccess_handle();
    my($oldfacts) = retrieve_distrofacts($dbi,$dist,$version,$distv);
    # CUT: we do not yet use the oldfacts
    my($fails, $passes, $others,
       $thirdfailid, $thirdpassid, $alreadycounted,
       $thirdfail, $thirdpass, $cuti_reset)
        = @{$oldfacts}{qw(fails passes others
                          thirdfailid thirdpassid alreadycounted
                          thirdfail thirdpass cuti_reset
                         )};
    my $allversions = dclone $oldfacts->{allversions};
    my($maxid) = do {
        my $sql0 = "SELECT MAX(id) FROM cpanstats";
        my $rows0 = my_get_query($dbi, $sql0);
        $rows0->[0][0];
    };
    my $maxid_ts = Time::Moment->now_utc->to_string; # 2014-02-09T09:59:11.664143Z
    $alreadycounted ||= 0;
    my $sql = "SELECT id,version,fulldate,state
               FROM cpanstats
               WHERE dist=?
                 AND id > ?
                 AND id <= ?
                 AND type=2 ORDER BY id";
    my $rows = my_get_query($dbi, $sql, $dist, $alreadycounted, $maxid);
    my(%seen,%xseen,%is_authoritative);
    for my $av (@$allversions) {
        $seen{$av->{version}}++;               # prevent duplicates
        $xseen{$av->{version}}{$av->{date}}++; # preserve earliest date from previous runs
        if ($av->{authoritative}) {
            $is_authoritative{$av->{version}} = 1;
        }
    }
    $fails  ||= 0;
    $passes ||= 0;
    $others ||= 0;
    for my $row (@$rows) {
        my($id,$v,$date,$state) = @$row; # id,version,fulldate,state
        if (defined $v){
            unless ($seen{$v}++) {
                push @$allversions, { version => $v };
            }
            if (defined $version and $v eq $version) {
                if ($state eq "fail") {
                    $fails++;
                    if (!$thirdfailid && $fails == 3) {
                        $thirdfail = $date; # format 200909190440
                        $thirdfailid = $id;
                    }
                } elsif ($state eq "pass") {
                    $passes++;
                    if (!$thirdpassid && $passes == 3) {
                        $thirdpass = $date; # format 200909190440
                        $thirdpassid = $id;
                    }
                } else {
                    $others++;
                }
            }
            $date = "" unless defined $date;
            $is_authoritative{$v} || $xseen{$v}{$date}++; # format 200909190440
        }
    }
    if ($thirdfail) {
        $thirdfail =~ s/^(\d\d\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/$1-$2-$3 $4:$5z/;
    }
    if ($thirdpass) {
        $thirdpass =~ s/^(\d\d\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/$1-$2-$3 $4:$5z/;
    }
    for my $v (@$allversions) {
        $v->{date} = minstr grep { length $_ } keys %{$xseen{$v->{version}}};
    }
    # sleep 0.25; # throttle
    my $facts =
        {
         allversions => $allversions,
         thirdfail   => $thirdfail,
         thirdpass   => $thirdpass,
         thirdfailid => $thirdfailid,
         thirdpassid => $thirdpassid,
         db_fails    => $fails,
         db_passes   => $passes,
         db_others   => $others,
         db_maxid    => $maxid,
         db_maxid_ts => $maxid_ts,
         cuti_reset  => $cuti_reset,
        };
    store_distrofacts($dbi,$dist,$distv,$oldfacts,$facts);
    return $facts;
}

=item store_distrofacts

Cache the pass/fail/other counts, thirdfailid and thirdpassid, and the
allversions structure. Remember the highest id up to which we have
counted/observed so far, so that we can resume counting from there
next time.

=cut

sub store_distrofacts {
    my($dbi,$dist,$distv,$oldfacts,$facts) = @_;
    no warnings 'uninitialized';

    # do we need an insert first? See "insert missing" below
    db_memoize($dbi,$distv);

    # then an update
    my $sql = "UPDATE distlookup
                  SET cuti_fail=?, cuti_pass=?, cuti_others=?,
                      thirdfailid=?, thirdpassid=?, counted_up_to_id=?,
                      cuti_ts=?, cuti_reset=?
                WHERE distv=?";
    my(%otherwhere);
    my @bind =
        (
         @{$facts}{qw(db_fails db_passes db_others
                      thirdfailid thirdpassid db_maxid
                      db_maxid_ts cuti_reset)},
         $distv,
        );
    @otherwhere{qw(fails passes others
                   thirdfailid thirdpassid alreadycounted
                   cuti_reset)}
        = qw(cuti_fail cuti_pass cuti_others
             thirdfailid thirdpassid counted_up_to_id
             cuti_reset);
    while (my($k,$v) = each %otherwhere) {
        if (defined $oldfacts->{$k}) {
            $sql .= " AND $v=?";
            push @bind, $oldfacts->{$k};
        } else {
            $sql .= " AND $v IS NULL";
        }
    }
    my $sth = $dbi->prepare($sql);
    my $rv = $sth->execute(@bind);
    unless ($rv > 0) {
        my $bind = join ",", map { defined $_ ? "'$_'" : "<undef>" } @bind;
        warn "WARNING: $rv records updated with sql[$sql]bind[$bind]".
            " => insert missing!";
    }

    my $oldav = { map {("$dist-$_->{version}" => $_->{date})}
                  @{$oldfacts->{allversions}||[]}
                };
    my $av = $facts->{allversions};
  DREF: for my $i (0..$#$av) {
        my $dref = $av->[$i] or next; # "distro-reference" { version=>"0.12", date=>"20070408" }
        my $tdistv = "$dist-$dref->{version}";
        my $tdate = $dref->{date};
        if ($i == $#$av and $tdate =~ /[0-9]{8}/) {
            eval { db_fixup_release_date($dbi,$dist,$dref->{version},$tdistv); };
            if ($@) {
                warn "Warning: fixup complained [$@]";
            }
        }
        next DREF if $tdate && $oldav->{$tdistv} && $tdate eq $oldav->{$tdistv};
        db_quickaccess_insert_or_update($dbi,$tdistv,undef,undef,$dist);
        db_memoize($dbi,$tdistv,undef,$tdate,undef,0);
    }
    my $redis = myredis;
    my($fail,$pass) = @{$facts}{qw(db_fails db_passes)};
    #### hset missing in our ubuntu redis, using zscore as a workaround
    $redis->zadd("analysis:distv:pass",$pass,$distv) if defined $pass;
    $redis->zadd("analysis:distv:fail",$fail,$distv) if defined $fail;

    return;
}

=item retrieve_distrofacts

The distlookup and distcontext tables hold aggregate data that are
generated from cpanstats (plus some). We update them on every run in
store_distrofacts and reuse them the next time we come by, so that we
do not have to read so much from cpanstats.

When a record has a zero for counted_up_to_id and one of the
pass/fail/etc values is set, the whole record is reset. This means a
convenient way to restart counting is to reset counted_up_to_id. For
convenience, this reset can be triggered with the
--cpanstats_distrofacts_zero option.

=cut

sub retrieve_distrofacts {
    my($dbi,$dist,$version,$distv) = @_;
    $dbi ||= db_quickaccess_handle();
    $distv ||= "$dist-$version";
    my $facts = {};
    my $sth = $dbi->prepare("SELECT cuti_fail, cuti_pass, cuti_others,
                                    thirdfailid, thirdpassid, counted_up_to_id,
                                    cuti_reset
                               FROM distlookup WHERE distv=?");
    $sth->execute($distv);
    my(@row) = $sth->fetchrow_array;
    @{$facts}{qw(fails passes others thirdfailid thirdpassid alreadycounted cuti_reset)} = @row;
    my @av;
    my $reset = $Opt{cpanstats_distrofacts_zero} || !$facts->{alreadycounted};
    unless ($reset) {
        my $cuti_reset = $facts->{cuti_reset} || 0;
        if (time - $cuti_reset > 86400*90) {
            if (rand(1) < 0.001) {
                $reset = 1;
            }
        }
    }
    if ($reset) {
        my $nil = {};
        @{$nil}{qw(db_fails db_passes db_others thirdfailid thirdpassid alreadycounted)} = ();
        $nil->{cuti_reset} = time;
        store_distrofacts($dbi,$dist,$distv,$facts,$nil);
        $facts = $nil;
    } else {
        $sth = $dbi->prepare("SELECT fulldate FROM cpanstats WHERE id=?");
        $sth->execute($facts->{thirdfailid});
        (@row) = $sth->fetchrow_array;
        $facts->{thirdfail} = $row[0];
        $sth->execute($facts->{thirdpassid});
        (@row) = $sth->fetchrow_array;
        $facts->{thirdpass} = $row[0];
        $sth = $dbi->prepare("SELECT distv FROM distcontext WHERE dist=?");
        my($sth2) = $dbi->prepare("SELECT upload_date, release_date_metacpan
                                     FROM distlookup
                                    WHERE distv=?");
        $sth->execute($dist);
        while (@row = $sth->fetchrow_array) {
            my($tdistv) = @row;
            my($v) = substr($tdistv,length($dist)+1);
            next unless $v;
            $sth2->execute($tdistv);
            my($tud,$trdmc) = $sth2->fetchrow_array;
            my($tdate) = $trdmc || $tud;
            next unless $tdate;
            $tdate =~ s/[^0-9]//g;
            next unless $tdate;
            substr($tdate,12)="" if length($tdate)>12;
            my $av = { version => $v, date => $tdate };
            $av->{authoritative} = 1 if $trdmc;
            push @av, $av;
        }
        @av = sort { $a->{date} cmp $b->{date}
                         || $a->{version} <=> $b->{version}
                         || $a->{version} cmp $b->{version}
                   } @av;
    }
    $facts->{allversions} = \@av;
    return $facts;
}

=item $href = distroquestion($dbi,$i,$distroquestlabel,$dist,$version,$lfh)

(Called from three different places, and reacting to their context)

If $lfh is false, we try to get a lockfilehandle and hold it to the
end. If it is a valid lockfilehandle, all the better.

=cut

sub distroquestion {
    my($dbi,$i,$distroquestlabel,$dist,$version,$lfh) = @_;
    my $distv = "$dist-$version";
    unless ($lfh) {
        my $slv_file = slv_file($distv);
        $lfh = lockfilehandle("$slv_file.LCK") or return;
    }
    my $greenish = 0;
    my($distro_age);
    my($author,$upload_date,$distroid) = cpan_lookup_dist($dbi,$dist,$version);
    my($tell_upload_date);
    my($csdfacts) = cpanstats_distrofacts($dbi,$dist,$version,$distv,$lfh);
    my($allversions) = $csdfacts->{allversions};
    my($thirdfail) = $csdfacts->{thirdfail};
    my $passfail_overview =
        {
         pass  => $csdfacts->{db_passes},
         fail  => $csdfacts->{db_fails},
         other => $csdfacts->{db_others},
        };
    my $tell_passfail_overview;
    my $ret =
        {
         allversions => $allversions,
         author => $author,
         dist => $dist,
         distroid => $distroid,
         distv => $distv,
         passfail_overview => $passfail_overview,
         thirdfail => $thirdfail,
         upload_date => $upload_date,
         version => $version,
        };
    {
        my $color_on = "";
        my $color_off = "";
        if (0 == $passfail_overview->{pass}) {
            $color_on = Term::ANSIColor::color("red");
            $color_off = Term::ANSIColor::color("reset");
        } elsif ($passfail_overview->{pass} >= 3 && $passfail_overview->{fail}>=3) {
            $color_on = Term::ANSIColor::color("green");
            $color_off = Term::ANSIColor::color("reset");
            $greenish++;
        }
        $tell_passfail_overview = sprintf
            ("%s%3d:%3d:%3d%s",
             $color_on,
             $passfail_overview->{pass},
             $passfail_overview->{fail},
             $passfail_overview->{other},
             $color_off,
            );
    }
    {
        my($y,$m,$d,$hour,$minute,$second) = $upload_date =~ /(\d+)-(\d+)-(\d+)(?:T(\d+):(\d+):(\d+))?/;
        if (not defined $y) {
            # ....-..-..
            printf "Warning: no upload date for %s\n", $ret->{distv};
            return $ret;
        }
        my $dtu = DateTime->new
            (year => $y,
             month => $m,
             day => $d,
             hour => $hour||0,
             minute => $minute||0,
             second => $second||0,
             time_zone => "floating", ### should change to utc, but may soon be unreachable
            );
        my $dtn = DateTime->now;
        my $dur = $dtn->subtract_datetime_absolute($dtu);
        $distro_age = $dur->delta_seconds;
        my $color_on = "";
        my $maxyears = $CPAN::Blame::Config::Cnntp::Config->{maxyears} || 8;
        if ($distro_age >= $maxyears*365.25*86400) {
            $color_on = Term::ANSIColor::color("yellow");
        } elsif ( $upload_date =~ /\?/) {
            $color_on = Term::ANSIColor::color("yellow");
        } elsif ($distro_age > 0) {
            $color_on = Term::ANSIColor::color("green");
            $greenish++;
        } else {
            $color_on = Term::ANSIColor::color("red");
        }
        my $color_off = Term::ANSIColor::color("reset");
        $tell_upload_date = $color_on . substr($upload_date,2,8) . $color_off;
    }
    my $tell_current_version;
    {
        my $current_version = @$allversions ? $allversions->[-1]{version} : 0;
        my $color_on = "";
        if ($current_version eq $version) {
            $tell_current_version = "current";
            $color_on = Term::ANSIColor::color("green");
            $greenish++;
        } elsif ($Opt{pick}) {
            $tell_current_version = "<?>";
            $color_on = Term::ANSIColor::color("green");
        } else {
            $tell_current_version = "<$current_version";
            $color_on = Term::ANSIColor::color("red");
        }
        my $color_off = Term::ANSIColor::color("reset");
        $tell_current_version = sprintf "%s%s%s", $color_on,$tell_current_version,$color_off;
    }
    my $ddist = length($distv)<=52 ? $distv : (substr($distv,0,49)."...");
    unless ($i % 50) {
        print "gmtime now: ".gmtime()."\n";
    }
    printf(
        "%3d %-8s %s %s %s/%s (%s)\n",
        $i, $distroquestlabel,
        $tell_upload_date,
        $tell_passfail_overview,
        $author || "???",
        $ddist,
        $tell_current_version,
    );
    my $slv_content = slv_content($distv);
    my $rtables = CPAN::Blame::Model::Solved->regression_tables($slv_content);
    my $rsq1_regressions = join ", ", map { $_->{name} } grep { $_->{rsq}==1 } @$rtables;
    $ret->{greenish} = $greenish;
    $ret->{distro_age} = $distro_age;
    $ret->{rsq1_regressions} = $rsq1_regressions;
    # sleep 0.5; # throttle
    return $ret;
}

=item slv_file

=cut

sub slv_file {
    my($distv) = @_;
    sprintf "%s/solved/%s.slv", $workdir, $distv;
}

=item slv_content

=cut

sub slv_content {
    my($distv) = @_;
    my $slv_file = slv_file($distv);
    return unless -e $slv_file;
    my $content = do { open my $fh, $slv_file or die "Could not open '$slv_file': $!";
                       local $/; <$fh> };
}

=item lockfilehandle

=cut

sub lockfilehandle {
    my($lockfile) = @_;
    use Fcntl qw( :flock );
    mkpath dirname $lockfile;
    my $lfh;
    unless (open $lfh, "+<", $lockfile) {
        unless ( open $lfh, ">>", $lockfile ) {
            die "ALERT: Could not open >> '$lockfile': $!"; # XXX should not die
        }
        unless ( open $lfh, "+<", $lockfile ) {
            die "ALERT: Could not open +< '$lockfile': $!"; # XXX should not die
        }
    }
    if (flock $lfh, LOCK_EX|LOCK_NB) {
        print "Info[$$]: Got the lock, continuing\n";
    } else {
        print "FATAL[$$]: lockfile '$lockfile' locked by a different process\n";
        return;
    }
    return $lfh;
}
1091 =item call_ctgetreports
1095 =cut
sub call_ctgetreports {
    my($ctx,$ctxi,$ctxtotal) = @_;
    my($distv,$distro_age) = ($ctx->{distv},$ctx->{distro_age});
    my $slv_file = slv_file($distv);
    my @stat = stat ($slv_file . "dv.gz");
    unless (@stat) {
        @stat = stat ($slv_file . "dv");
    }
    my $slv_file_age = @stat ? time-$stat[9] : 0;
    # Tuning this knob is always difficult for me. We are saying here
    # that we compare the age of the distro with the age of the
    # regression calculation we have around. Iow, older distros need
    # to be recalculated less often. If we increase this parameter
    # towards 1, we never calculate again because any result will be
    # younger than the distro. And if we go toward 0 we calculate
    # every time we come around. So if at 0.1 and with a very slow
    # pause2 we feel that 0.1 calculates too often, then we increase
    # to 0.25 to reduce the number of recalcs.
    my $must_run_ctgetreports;
    if ($Opt{pick}) {
        $must_run_ctgetreports = 1;
    } else {
        # tentative name for this variable? it is a relative age, a
        # threshold, and it guards recalculations:
        # RECALC_AGE_THRESHOLD? Sanity next time we come around
        $must_run_ctgetreports = !$slv_file_age || $slv_file_age > $distro_age*0.40; # arbitrary
    }
    # not allowed to go beyond this point without having a lock
    my $lfh = lockfilehandle("$slv_file.LCK") or return;
    if (!$must_run_ctgetreports && -e $slv_file && $slv_file_age > 86400) {
        # we want to overrule the decision for dists that have changed
        # much since the last computation
        my $slv_content = slv_content($distv);
        my($calctimepass,$calctimefail) = (0,0);
        my @lines = split /\n/, $slv_content;
        for (@lines) {
            next unless /^ ([A-Z]+)/;
            if ($1 eq "PASS") {
                $calctimepass++;
            } elsif ($1 eq "FAIL") {
                $calctimefail++;
            }
        }
        if ($ctx->{passfail_overview}) {
            if (   $calctimepass && $ctx->{passfail_overview}{pass}/$calctimepass > 1.5 # arbitrary
                || $calctimefail && $ctx->{passfail_overview}{fail}/$calctimefail > 1.5 # arbitrary
               ) {
                $must_run_ctgetreports=1;
            }
        }
    }
    # It is ugly to log within a subroutine, to log things that
    # belong to the outer scope, and to pass in parameters we only
    # need for logging's sake:(
    printf
        ("DEBUG: at [%s UTC] for distv[%s] slv_file_age[%s] upload_date[%s] distro_age[%s] must_run_ctg[%s] progress[%d/%d]\n",
         scalar gmtime,
         $distv,
         $slv_file_age,
         $ctx->{upload_date},
         $distro_age,
         $must_run_ctgetreports,
         $ctxi,
         $ctxtotal,
        );
    if ($must_run_ctgetreports) {
        mkpath dirname $slv_file;
        my $dumpfile = $slv_file;
        $dumpfile =~ s/\.slv$/.slvdv/;
        if (-e "$dumpfile.gz") {
            0 == system "gunzip -f $dumpfile.gz" or die;
        }
        my $ctg = sprintf "%s/ctgetreports", dirname($^X);
        $ctg = "$ext_src/cpan-testers-parsereport/bin/ctgetreports";
        die "ctgetreports not installed" unless -e $ctg;
        my $ctg_cachedir = $CPAN::Blame::Config::Cnntp::Config->{ctgetreports_dir};
        my $hashed_ctg_parent;
        if (-d $ctg_cachedir) {
            my $letter = substr($distv,0,1);
            $hashed_ctg_parent = "$ctg_cachedir/$letter";
            if (-f "$hashed_ctg_parent/$distv.tar") {
                0 == system qq{tar -C "$hashed_ctg_parent" -xf "$hashed_ctg_parent/$distv.tar"}
                    or die "Problem while untarring $distv.tar";
            }
        } else {
            die "directory '$ctg_cachedir' doesn't exist. Care to create it?";
        }
        my $sample = $Opt{calcsample};
        if ($sample && $ctx->{passfail_overview}{fail} && $ctx->{passfail_overview}{pass}) {
            # correct the sample size if there is a small number of either fails or passes
            my $NF = $ctx->{passfail_overview}{fail};
            my $NP = $ctx->{passfail_overview}{pass};
            my $CRIT = min($NP,$NF);
            my $N = $NF + $NP;
            if ($CRIT * $sample / $N < 3) {
                $sample = 1 + int(3.15 * $N / $CRIT); # 3.15 security margin over 3
            }
            if ($sample > $N){
                # emergency brake 2014-12-24:
                # Class-C3-Componentised-1.001000 takes longer than 48
                # hours, we must avert those; if the result is really
                # interesting, we can expect that we'll get more fails
                # in the following days
                if ($sample > 3200) { # arbitrary: CGI-Struct with 3300 took 53000 seconds
                    warn "Alert: Late refusing of ctgetreports for $distv; sample too big: $sample";
                    return;
                } else {
                    $sample = 0;
                }
            }
        }
        my $sample_argument = $sample ? " --sample=$sample" : "";
        my $addctgargs = $Opt{addctgargs} || "";
        my $system =
            qq{"$^X" -I "$ext_src/cpan-testers-parsereport/lib" "$ctg"}
            .qq{ --pce --cachedir="$hashed_ctg_parent/$distv"}
            .qq{ --q meta:from}
            .qq{ --q qr:'(.*version .+? required--this is only.*)'}
            .qq{ --q qr:'("[A-Za-z0-9_]+" is not exported by the [A-Za-z0-9_:]+ module.*)'}
            .qq{ --q qr:'(.*instance running.*)'}
            # cf Xref: k85.linux.bogus sent:7814
            #.qq{ --q qr:'(.*could.?n.t (?:open|connect).*)'}
            .qq{ --q qr:'(.*(?i:could.?n.t (?:open|connect|find)).*)'}
            .qq{ --q qr:'(.*Base class package.*)'}
            .qq{ --q qr:'(Can.t call method .*)'}
            .qq{ --q qr:'(Can.t use an undefined value.*)'}
            .qq{ --q qr:'(Can.t use string .*)'}
            .qq{ --q qr:'(Can.t modify division)'}
            .qq{ --q qr:'(Can\\047t locate loadable object)'}
            .qq{ --q qr:'(Connection (?:refused|reset by peer))'}
            .qq{ --q qr:'Constants from lexical variables potentially modified elsewhere are deprecated'}
            .qq{ --q qr:'(Can.t use keyword .*? as a label)'}
            .qq{ --q qr:'(Could not open .*)'}
            .qq{ --q qr:'(Address already in use)'}
            .qq{ --q qr:'(.*is deprecated (?:and will be removed)?)'}
            .qq{ --q qr:'(skip(?:ping |ped: | test).*)'}
            .qq{ --q qr:'(Please rerun the make command)'}
            .qq{ --q qr:'(v-string .* non-portable at \\S+)'}
            .qq{ --q qr:'(The module .* isn.t available on CPAN)'}
            .qq{ --q qr:'(Invalid version format)'}
            .qq{ --q qr:'did you mean \\\$\\] \\?'}
            .qq{ --q qr:'Unescaped left brace in regex is deprecated here, passed through'}
            .qq{ --q qr:'Unescaped left brace in regex is deprecated'}
            .qq{ --q qr:'Unescaped left brace in regex is illegal'}
            .qq{ --q qr:'Unknown warnings category'}
            .qq{ --q qr:'Non-ASCII character seen before =encoding'}
            .qq{ --q qr:'((Smartmatch|given|when) is experimental)'}
            .qq{ --q qr:'# (.+ alternative text .+ contains non-escaped .+)'}
            .qq{ --q qr:'=over without closing =back'}
            .qq{ --q qr:'starts or ends with whitespace'}
            .qq{ --q qr:'Acme::Override::INET replaced IO::Socket::INET'}
            .qq{ --q qr:'Invalid byte sequence'}
            .qq{ --q qr:'Insecure dependency in (require|eval)'}
            .qq{ --q qr:'Passing a list of values to enum'}
            .qq{ --q qr:'(undefined symbol: [A-Za-z0-9_]+ )'}
            .qq{ --q qr:'Bailout called\\.'}
            .qq{ --q qr:'Out of memory\\!'}
            .qq{ --q qr:'Prototype mismatch:'}
            .qq{ --q qr:'Connection reset by peer'}
            .qq{ --q qr:'depends on targets in skipped section'}
            .qq{ --q qr:'Network error'}
            .qq{ --q qr:'Insecure .+ while running with'}
            .qq{ --q qr:'must be adjacent in regex'}
            .qq{ --q qr:'locate object method .new. via package .Test2::Context.'}
            .qq{ --q qr:'Use of code point 0x[[:xdigit:]]+ is deprecated; the permissible max is 0x7FFFFFFF'}
            .qq{ --q qr:'# (.+ but done_testing.+ expects .+)'}
            .qq{ --q qr:'Non-zero wait status: ([0-9]+)'}
            .qq{ --q qr:'# OpenSSL version:\\s+\\047(.+?)\\047'}
            .qq{ --q qr:'attributes must come before the signature'}
            .qq{ --q qr:'illegal file descriptor'}
            .qq{ --q qr:'(got handshake key 0x[[:xdigit:]]+, needed 0x[[:xdigit:]]+)'}
            .qq{ --q qr:'(Cannot detect source of \\047.+?\\047!)'}
            .$addctgargs
            .qq{ --solve --solvetop=123$sample_argument}
            .qq{ --dumpfile=$dumpfile}
            .qq{ --minpass=3}
            .qq{ --transport=http_cpantesters_gzip}
            .qq{ --prefer-local-reports}
            .qq{ --vdistro=$distv $ctx->{dist} > $slv_file.new 2>&1}
            .q{ --filtercb='my $r = shift; $r->{"meta:ok"} = "FILTERED" if $r->{"qr:locate object method .new. via package .Test2::Context."}'}
            .q{ --filtercb='my $r = shift; $r->{"meta:ok"} = "FILTERED" if $r->{"qr:Use of code point 0x[[:xdigit:]]+ is deprecated; the permissible max is 0x7FFFFFFF"}'}
            .q{ --filtercb='my $r = shift; for (split " ", $r->{"conf:ccflags"}){$r->{"conf:ccflags~$_"} = 1}'}
            ;
        warn "Info: running system[$system]";
        my $ret = process_ctg($system, $distv);
        # gzip: /home/ftp/cnntp-solver-2009/workdir/solved/Dist-Zilla-Plugin-Twitter-0.015-TRIAL.slvdv: No such file or directory
        my $redis = myredis;
        $redis->zrem("analysis:distv:eta",$distv);
        if ($ret==0) {
            unless (0 == system "gzip -9f $dumpfile") {
                warn "Warning: error while running gzip on '$dumpfile', continuing...";
                return;
            }
            unless (0 == system
                    qq{tar -C "$hashed_ctg_parent" --remo -cf "$hashed_ctg_parent/$distv.tar" "$distv"}) {
                warn "Warning: error while running tar cf on '$distv', continuing...";
                return;
            }
            my @stat = stat "$dumpfile.gz";
            $redis->zadd("analysis:distv:calctimestamp",$stat[9],$distv) if @stat;
            my $lc_ts = Time::Moment->from_epoch($stat[9])->to_string; # lc=last calc
            open my $zfh, "-|", "zcat", "$dumpfile.gz" or die "could not fork: $!";
            my $j = do {local $/; <$zfh>};
            use JSON::XS;
            my $h = JSON::XS->new->decode($j);
            my %passfail = map { $_ => $h->{"meta:ok"}{$_}{$_} } qw(PASS FAIL);
            for my $pf (qw(pass fail)) {
                my $redis_key = "analysis:distv:lc$pf"; # analysis:distv:lcpass
                my $pf_value = $passfail{uc $pf} // 0; # $passfail{PASS}
                $redis->zadd($redis_key, $pf_value, $distv);
            }
            rename "$slv_file.new", $slv_file or die "Could not rename: $!";
            mytouch("slv",{name => $distv});
            return {
                lfh => $lfh,
                lc_ts => $lc_ts,
                lc_pass => $passfail{PASS}//0,
                lc_fail => $passfail{FAIL}//0,
            }; # did it!
        } else {
            warn "Alert: Error running ctgetreports; command was [$system]";
            return;
        }
    } else {
        return;
    }
}
{
    my($old_highscore);
    # have seen complications possibly leading to huge CPU waste
    my %unwanted = map { $_ => 1 } (
        "Data-Sah-Coerce-0.040",
        "libxml-enno-1.02",
    );
    sub enqueue {
        my($distv,$want_ctgetreport,$ctxi) = @_;
        if ($unwanted{$distv}) {
            require Carp;
            Carp::cluck("$distv is generally unwanted, go away.");
            sleep 60;
            return;
        }
        my $redis = myredis;
        unless (defined $old_highscore) {
            my($topdistv,$highscore) = $redis->zrevrange("analysis:jobqueue:q",0,0,"withscores");
            $old_highscore = $highscore;
            $old_highscore //= 0;
        }
        my($thisscore) = $redis->zscore("analysis:jobqueue:q",$distv);
        $thisscore //= 0;
        my $score = $want_ctgetreport - $ctxi; # linearly decreasing
        if ($ctxi < 40 && $old_highscore > $score) {
            # the first 40 deserve a better score than the old ones
            $score = $old_highscore + 40 - $ctxi;
        }
        if ($score<=0) {
            $score = 1;
        }
        if ($thisscore >= $score) {
            # nothing to do
            return;
        }
        my $debug = 1;
        if ($debug) {
            my($zcard) = $redis->zcard("analysis:jobqueue:q");
            warn "DEBUG: enqueueing ctxi[$ctxi] old_highscore[$old_highscore] score[$score] distv[$distv] zcard[$zcard]\n";
        }
        $redis->zadd("analysis:jobqueue:q", $score, $distv);
    }
}
=item process_ctg

Runs a ctgetreports job with all of our redis interaction.

Revisited 2015-04-06: looks wrong to me. But remember, we had redis
1.2 on our ds8143 and that did not even have a hash type. That is,
for example, why we used JSON::XS here where we could have used a
hash instead. But I'm sure that the "pattern" I was reading in the
Redis documentation at that time has changed significantly since
then. Maybe we should rewrite the code according to the current
description.
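The id-allocation handshake in this sub (look the job up by its hash,
incr a counter, setnx the id, and fall back to get when another worker
won the race) can be sketched in Python. This is a rough illustration,
not the script's code: FakeRedis is a hypothetical in-memory stand-in
for the few Redis commands used, and job_id_for only mirrors the
relevant slice of process_ctg.

```python
import hashlib
import json

class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis commands used here."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def incr(self, key):
        # INCR is atomic in real Redis; here we just bump a counter
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]
    def setnx(self, key, value):
        # set only if the key does not exist yet; report whether we won
        if key in self.store:
            return False
        self.store[key] = value
        return True

def job_id_for(redis, job_object):
    # hash the serialized job description, as process_ctg does with Digest::SHA
    payload = json.dumps(job_object, sort_keys=True)
    hexdigest = hashlib.sha1(payload.encode()).hexdigest()
    key = "analysis:jobqueue:jobs:%s:id" % hexdigest
    job_id = redis.get(key)
    if job_id is None:
        nid = redis.incr("analysis:jobqueue:next.job.id")
        if redis.setnx(key, nid):
            job_id = nid
        else:
            # another worker allocated an id between our get and setnx
            job_id = redis.get(key)
    return job_id
```

The same job description always maps to the same id, so a concurrent
duplicate calc reuses the existing record instead of minting a new one.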
=cut
sub process_ctg {
    my($system, $distv) = @_;
    my $start = time;
    my $redis = myredis;
    my $jsonxs = JSON::XS->new->indent(0);
    my $job_object = {distv=>$distv,start=>$start};
    my $json = $jsonxs->encode($job_object);
    my $sha1 = Digest::SHA->new(1);
    $sha1->add($json);
    my $hex = $sha1->hexdigest;
    my $id = $redis->get("analysis:jobqueue:jobs:$hex\:id");
    unless ($id) {
        my($nid) = $redis->incr("analysis:jobqueue:next.job.id");
        if (my($setnx) = $redis->setnx("analysis:jobqueue:jobs:$hex\:id" => $nid)) {
            $id = $nid;
        } else {
            warn "WARNING: Should never happen, it would mean that two jobs calc the same thing at the same time. json[$json]hex[$hex]";
            $id = $redis->get("analysis:jobqueue:jobs:$hex\:id");
        }
    }
    $redis->set("analysis:jobqueue:jobs:$id\:descr" => $json); # will set again afterwards
    $redis->set("analysis:jobqueue:jobs:$id\:hex" => $hex);
    $redis->sadd("analysis:jobqueue:runningjobs" => $id); # must srem
    my $ret = system $system;
    my $finished = time;
    my $took = $finished - $start;
    $job_object->{finished} = $finished;
    $job_object->{took} = $took;
    $job_object->{ret} = $ret;
    $json = $jsonxs->encode($job_object);
    $redis->set("analysis:jobqueue:jobs:$id\:descr" => $json); # setting again
    $redis->srem("analysis:jobqueue:runningjobs" => $id); # as promised
    $redis->rpush("analysis:jobqueue:recentjobs" => $id);
    while ($redis->llen("analysis:jobqueue:recentjobs") > 10000) {
        my($del_id) = $redis->lpop("analysis:jobqueue:recentjobs");
        my($del_hex) = $redis->get("analysis:jobqueue:jobs:$del_id\:hex");
        $redis->del("analysis:jobqueue:jobs:$del_hex\:id");
        $redis->del("analysis:jobqueue:jobs:$del_id\:descr");
        $redis->del("analysis:jobqueue:jobs:$del_id\:hex");
    }
    if ($took > 3600) {
        warn "Warning: slow calc on '$distv'; took[$took]";
    }
    return $ret;
}
=item read_ctx

Reads the context of this distv from the yaml file.

=cut

sub read_ctx {
    my($distv) = @_;
    my $outfile = sprintf "%s/solved/%s.yml", $workdir, $distv;
    my @stat = stat $outfile or return;
    YAML::Syck::LoadFile($outfile);
}
=item store_ctx ($dbi,$ctx)

$dbi may be undef. (For the record, we often have no dbi handle
around, but if we have one, we want to pass it around. This has its
historical reason in sqlite's misbehaving when faced with concurrent
accesses. So we tried to keep the lifespan of dbi handles as short as
possible.)

Checks the hashref $ctx against the yaml file on disk and writes it if
something has changed. Then writes it also to the database. If this is
a first-time encounter and greenish is >= 3, it is also enqueued for a
first calc.
=cut
sub store_ctx {
    my($dbi,$ctx) = @_;
    my $outfile = sprintf "%s/solved/%s.yml", $workdir, $ctx->{distv};
    my @stat = stat $outfile;
    my $want_write = 0;
    my $want_enqueue = 0;

    my $reduced_ctx = dclone $ctx;
    delete $reduced_ctx->{distro_age};
    my $curr_dump = YAML::Syck::Dump($reduced_ctx);
    if (@stat) {
        my $last_dump = do
            { open my $fh, $outfile or die "Could not open '$outfile': $!";
              local $/;
              <$fh>;
            };
        $want_write = $curr_dump ne $last_dump;
        no warnings 'uninitialized';
        if ($want_write and $ctx->{lastcalc} and $ctx->{passfail_overview}) {
            my $old_ctx = YAML::Syck::Load($last_dump);
            if ($ctx->{lastcalc} ne $old_ctx->{lastcalc}) {
                # distlookup: last_calc_ts, lc_pass, lc_fail
                db_store_lastcalc(
                    $dbi,
                    $ctx->{distv},
                    $ctx->{lastcalc},
                    $ctx->{lc_pass},
                    $ctx->{lc_fail},
                );
            }
        }
    } else {
        $want_write = 1;
        if ($ctx->{greenish} && $ctx->{greenish} >= 3) {
            $want_enqueue = 1;
        }
    }
    if ($want_write) {
        mkpath dirname $outfile;
        open my $fh, ">", "$outfile.new" or die "Could not open > '$outfile.new': $!";
        print $fh $curr_dump;
        close $fh or die "Could not close: $!";
        rename "$outfile.new", $outfile or die "Could not rename: $!";
    }
    db_quickaccess_insert_or_update
        (
         undef,
         $ctx->{distv},
         $curr_dump,
         $ctx->{greenish},
         $ctx->{dist},
        );
    if ($want_enqueue) {
        enqueue($ctx->{distv},0,1);
    }
    mytouch("yml",{name => $ctx->{distv}});
}
=item db_quickaccess_handle

=cut

sub db_quickaccess_handle {
    return CPAN::Blame::Config::Cnntp::common_quickaccess_handle();
}
=item db_quickaccess_insert_or_update

Convenience function for getting stuff into the distcontext table. You
cannot set anything to null here, only to some value. Any value given
as undef is skipped in the generated update statement. Against common
best practice, this code first looks whether a record exists and only
does an insert if it does not. We believe this is OK since we never
want to delete records here.

Besides that, this function does nothing when the caller would not
change anything.
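The skip-undef, skip-unchanged update generation can be sketched in
Python; build_update is a hypothetical helper that mirrors the
@set/@bind loop in this sub, not part of the script:

```python
def build_update(table, key_col, key_val, new_values, old_values):
    """Build an UPDATE that only touches columns whose new value is
    defined (not None) and differs from the stored one; returns
    (None, []) when the caller would change nothing."""
    set_parts, bind = [], []
    for col, val in new_values.items():
        if val is not None and val != old_values.get(col):
            set_parts.append("%s=?" % col)
            bind.append(val)
    if not bind:
        return None, []  # nothing to do, as in the Perl sub
    bind.append(key_val)
    sql = "UPDATE %s SET %s WHERE %s=?" % (table, ", ".join(set_parts), key_col)
    return sql, bind
```

Columns passed as None keep their stored value, which is why nothing
can be set back to null through this path.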
=cut
sub db_quickaccess_insert_or_update {
    my($dbi,$distv,$yaml,$greenish,$dist) = @_;
    $dbi ||= db_quickaccess_handle();
    my $sql = "SELECT yaml,greenish,dist FROM distcontext WHERE distv=?";
    my $rows = my_get_query
        (
         $dbi,
         $sql,
         $distv
        );
    if (!@$rows) {
        $sql = "INSERT INTO distcontext (distv) VALUES (?)";
        my_do_query
            (
             $dbi,
             $sql,
             $distv
            );
    }
    no warnings 'uninitialized';
    my(%othersets) = (yaml => $yaml, greenish => $greenish, dist => $dist);
    my(%oldvalues) =
        (
         yaml => $rows->[0][0],
         greenish => $rows->[0][1],
         dist => $rows->[0][2],
        );
    my(@set, @bind);
    while (my($k,$v) = each %othersets) {
        if (defined $v && $v ne $oldvalues{$k}) {
            push @set, "$k=?";
            push @bind, $v;
        }
    }
    return unless @bind;
    push @bind, $distv;
    # $sql = "UPDATE distcontext SET yaml=?, greenish=?, dist=? WHERE distv=?";
    $sql = sprintf "UPDATE distcontext SET %s WHERE distv=?", join(", ", @set);
    warn "DEBUG: sql[$sql] bind[@bind]";
    my_do_query($dbi, $sql, @bind);
}
=item db_quickaccess_delete

Forbidden call, because we believe that we want to keep the
distv=>dist relation here forever and in any case. We use that
relation in the allversions stuff.

=cut

sub db_quickaccess_delete {
    my($distv) = @_;
    require Carp;
    Carp::confess("somebody called db_quickaccess_delete");
    my $dbi = db_quickaccess_handle();
    my $sql = "DELETE FROM distcontext WHERE distv=?";
    my_do_query
        (
         $dbi,
         $sql,
         $distv
        );
}
=item mytouch

=cut

sub mytouch ($$) {
    my($type,$message) = @_;
    my $file = sprintf "%s/solved/lastchange_%s.ts", $workdir, $type;
    open my $fh, ">", $file or die "Could not open >'$file': $!";
    print $fh YAML::Syck::Dump($message);
    close $fh or die "Could not close '$file': $!";
}
=item age_in_secs

=cut

sub age_in_secs {
    my($iso) = @_;
    my @date = unpack "a4 a1 a2 a1 a2", $iso; # 2006-07-04
    $date[2]--; # make month zero-based
    my $epoch = eval { timegm(0,0,0,@date[4,2,0]); };
    if ($@ || !$epoch) {
        require Carp;
        Carp::confess("ALERT: date[@date] iso[$iso]");
    }
    time - $epoch;
}
=item db_quickaccess_select_dist

=cut

sub db_quickaccess_select_dist {
    my($dbi,$distv) = @_;
    my $sql = "SELECT dist FROM distcontext WHERE distv=?";
    my $rows = my_get_query
        (
         $dbi,
         $sql,
         $distv
        );
    return unless @$rows>0;
    return $rows->[0][0];
}
=item db_fixup_release_date

Fetch a release date from metacpan and write it into
release_date_metacpan in distlookup. Dies if the field is not empty.

=cut

sub db_fixup_release_date {
    my($dbi,$dist,$version,$distv) = @_;
    $dbi ||= db_quickaccess_handle();
    $distv ||= "$dist-$version";
    my $sql = "SELECT release_date_metacpan FROM distlookup WHERE distv=?";
    my $rows = my_get_query
        (
         $dbi,
         $sql,
         $distv
        );
    if (defined $rows->[0][0] and length $rows->[0][0]) {
        die "Alert: found record for distv '$distv' where release_date_metacpan is '$rows->[0][0]'";
    }
    my $ua = LWP::UserAgent->new(agent => "analysis.cpantesters.org/$VERSION");
    $ua->default_header("Accept-Encoding", "gzip");
    my $jsonxs = JSON::XS->new->indent(0);
    # my $query = "http://api.metacpan.org/v0/release/_search?q=release.name:$distv&fields=name,date,status";
    my $query = "http://fastapi.metacpan.org/v1/release/_search?q=name:$distv&fields=name,date,status";
    my $resp = $ua->get($query);
    my $date;
    if ($resp->is_success) {
        my $content = $resp->decoded_content;
        my $mcpananswer = $jsonxs->decode($content);
        if (my $h = $mcpananswer->{hits}{hits}) {
            $date = $h->[0]{fields}{date};
        } else {
            warn sprintf "Warning: result from metacpan api has no hits: %s\n", $content;
        }
    } else {
        die sprintf "No success getting from metacpan: query '%s' response '%s'", $query, $resp->as_string;
    }
    warn sprintf "DEBUG: rows[%d] distv[%s] dist[%s] date[%s]\n", scalar @$rows, $distv, $dist, $date;
    $sql = "UPDATE distlookup SET release_date_metacpan=? WHERE distv=?";
    my $success = my_do_query($dbi,$sql,$date,$distv);
    return
        +{
          success => $success,
         };
}
=item (void) db_store_lastcalc ($dbi, $distv, $last_calc_ts, $lc_pass, $lc_fail)

Stores the last calc's coordinates. Short-circuits if they are already there.

=cut

sub db_store_lastcalc {
    my($dbi, $distv, $last_calc_ts, $lc_pass, $lc_fail) = @_;
    $dbi ||= db_quickaccess_handle();
    my $sql = "SELECT last_calc_ts, lc_pass, lc_fail FROM distlookup WHERE distv=?";
    my $rows = my_get_query
        (
         $dbi,
         $sql,
         $distv
        );
    if (    defined $rows->[0][0]
        and length $rows->[0][0]
       ) {
        if (    $last_calc_ts eq $rows->[0][0]
            and $lc_pass == $rows->[0][1]
            and $lc_fail == $rows->[0][2]
           ) {
            return; # short circuit
        } else {
            $sql = "UPDATE distlookup SET last_calc_ts=?, lc_pass=?, lc_fail=? WHERE distv=?";
            my $success = my_do_query($dbi, $sql, $last_calc_ts, $lc_pass, $lc_fail, $distv);
            unless ($success) {
                warn "Warning: something went wrong storing lastcalc stuff";
            }
            return;
        }
    } else {
        warn "Warning: something wrong with this record, rows->[0][0] empty";
        return;
    }
}
=item db_quickaccess_fixup

As of 201401 never reached. Deprecated 201708.

=cut

sub db_quickaccess_fixup {
    my($dbi,$distv,$dist) = @_;
    die "db_quickaccess_fixup is deprecated";
}
=item cleanup_quick

As of 201401 never reached.

=cut

sub cleanup_quick {
    my($touched_distv) = @_;
    my $slv_dir = sprintf("%s/solved", $workdir);
    opendir my $dh, $slv_dir or die "Could not opendir '$slv_dir': $!";
    my(%deleted_distv, %fixed_dist);
    my $i = 0;
    my @readdir = readdir $dh;
    warn sprintf "Info: readdir found %d entries in solv_dir[%s]\n", scalar @readdir, $slv_dir;
    my $skipped_cleanup_because_touched = 0;
  DIRENT: for my $dirent (@readdir) {
        next DIRENT if $dirent =~ /\~$/;
        $i++;
        my($basename,$exte) = $dirent =~ /(.+)\.(yml|slv|slvdv|slvdv\.gz|ts)$/;
        warn sprintf
            ("DEBUG: %s UTC: checking dirent %d: %s; total deleted: %d",
             scalar gmtime,
             $i,
             $dirent,
             scalar keys %deleted_distv,
            ) unless $i % 10000;
        next DIRENT unless defined $exte;
        my $dbi = db_quickaccess_handle();
        my $dist = db_quickaccess_select_dist($dbi,$basename);
        if ($dist and !$fixed_dist{$dist}++) {
            db_quickaccess_fixup($dbi,$basename,$dist);
        }
        next DIRENT if $exte =~ /^(yml|ts)$/;
        if ( $touched_distv->{$basename} ){
            $skipped_cleanup_because_touched++;
            # we may have to get a fresh context so we can delete
            # outdated modules now?
            next DIRENT;
        }
        next DIRENT if $deleted_distv{$basename};
        # boring, boring: since we do not delete old stuff in the
        # directory, this pretends to delete in quickaccess all
        # things already deleted, 15000 distros and growing
        #### print "DEBUG: deleting from quickaccess $basename\n";
        db_quickaccess_delete($basename);
        $deleted_distv{$basename}++;
    }
    warn "DEBUG: skipped cleanup because touched[$skipped_cleanup_because_touched]";
}
=item megaquery

=cut

sub megaquery {
    use IPC::ConcurrencyLimit;
    my($basename) = File::Basename::basename(__FILE__);
    my $limit = IPC::ConcurrencyLimit->new
        (
         max_procs => 1,
         path => "$workdir/IPC-ConcurrencyLimit-$basename-megaquery",
        );
    my $limitid = $limit->get_lock;
    while (not $limitid) {
        warn "Another process appears to be running a megaquery, sleeping";
        sleep 300;
        $limitid = $limit->get_lock;
    }
    my $dbi = mypgdbi();
    my $sqlimit = 30_000_000;
    my $elasticity = 2000;
    if ($Opt{maxfromdb} && $Opt{maxfromdb} < $sqlimit/$elasticity) {
        # the $elasticity here is a pure guess that says if we want
        # 200 hits, we should read 200*$elasticity records; indeed we
        # got 119 records from reading 200000
        $sqlimit = $elasticity*$Opt{maxfromdb};
    }
    # memory bummer:
    my $sql = "SELECT id,state,dist,version,fulldate
               FROM cpanstats
               WHERE type=2 and state='fail'
               ORDER BY id DESC LIMIT $sqlimit";
    print "Info: about to megaquery ".gmtime()." UTC\n";
    my $rows = my_get_query($dbi, $sql);
    print "Info: megaquery done ".gmtime()." UTC\n";
    return $rows;
}
=item main_enqueuer

=cut

sub main_enqueuer {
    my($want_ctgetreport, $touched_distv, $distroquestlabel, $anno, $timetogo,$urgenttimetogo) = @_;
  CTG: for my $ctxi (0..$#$want_ctgetreport) {
        my $ctx = $want_ctgetreport->[$ctxi];
        my($dist,$distv,$version) = @{$ctx}{"dist","distv","version"};
        $touched_distv->{$distv}=1;
        if ($Opt{pick}) {
            my $didit = call_ctgetreports($ctx,1+$ctxi,1+$#$want_ctgetreport);
            if ($didit) {
                my $lfh = $didit->{lfh};
                # update the yml, we might have to parse something in the new reports
                my $dbi = mypgdbi();
                $ctx = distroquestion($dbi,1+$ctxi,$distroquestlabel,$dist,$version,$lfh);
                $ctx->{annotation} = $anno->{$distv};
                $ctx->{lastcalc} = $didit->{lc_ts};
                $ctx->{lc_pass} = $didit->{lc_pass};
                $ctx->{lc_fail} = $didit->{lc_fail};
                store_ctx($dbi,$ctx);
            }
        } else {
            enqueue($distv,$#$want_ctgetreport,$ctxi);
        }
        my $time = time;
        my $mustgo = 0;
        if ($time > $urgenttimetogo) {
            $mustgo = 1;
        } elsif ($ctxi == $#$want_ctgetreport){
            $mustgo = 1;
        } elsif (
                 $time >= $timetogo
                 && $ctxi >= MUST_CONSIDER
                ) {
            $mustgo = 1;
        }
        if ($mustgo) {
            my $utc = gmtime($timetogo);
            warn "DEBUG: proc[$$]ctxi[$ctxi]timetogo[$timetogo]that is[$utc UTC]";
            last CTG;
        }
    }
}
=item this_urgent

Originally the decision whether a new calc was due was taken by a
random function. We looked at the timestamp of the yaml file and of
the slvfile. Then we read the yaml file and there we concluded on the
upload date. Then we decided how often we want to calc a distro
depending on its age. We mixed some randomness in and said this is
urgent or not. Mostly crap.

Mixing in cpanstats_distrofacts and analysis:distv:calctimestamp?
Especially interested in pass/fail on last calc and pass/fail now.

In cpanstats_distrofacts I see only (for Crypt-Password-0.28)

    "db_fails" : 147,
    "db_maxid" : 41454838,
    "db_maxid_ts" : "2014-04-18T09:29:04.016700Z",
    "db_others" : 1,
    "db_passes" : 1923,
    "thirdfail" : "2012-02-19 02:40z",
    "thirdfailid" : "20108048",
    "thirdpass" : "2012-02-18 03:29z",
    "thirdpassid" : "20078591"

analysis:distv:calctimestamp has only 2500 entries at the moment, so
it is not too interesting yet.

No, distlookup is our candidate. This pg table has

    cuti_ts      | timestamp with time zone |
    cuti_pass    | integer                  |
    cuti_fail    | integer                  |
    last_calc_ts | timestamp with time zone |
    lc_pass      | integer                  |
    lc_fail      | integer                  |

where cuti stands for counted_up_to_id, and last_calc_ts is
apparently redundant with analysis:distv:calctimestamp.
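The growth test on the cuti_* counts versus the lc_* counts can be
sketched in Python. This hypothetical growth_urgency compresses the
first branch of this_urgent and keeps its arbitrary constants (1.025,
2400, 250); it is an illustration, not the script's decision function:

```python
def growth_urgency(lc_pass, lc_fail, cnt_pass, cnt_fail):
    """Compare report counts at last calc (lc_*) with counts now
    (cnt_*); more than 2.5% growth on either side is 'activity'."""
    if cnt_pass > 1.025 * lc_pass or cnt_fail > 1.025 * lc_fail:
        lc_total = lc_pass + lc_fail
        if lc_total < 2400:
            return {"this_urgent": 1}  # enough new reports, calc is cheap
        # large dists: flag as expensive instead of recalculating eagerly
        limiting = min(cnt_pass, cnt_fail)
        expensive_type = 9 if lc_total / limiting > 250 else 5
        return {"this_urgent": 0, "expensive_type": expensive_type}
    if cnt_pass == lc_pass and cnt_fail == lc_fail:
        return {"this_urgent": 0, "why_not_urgent": "NO growth"}
    return {"this_urgent": 0, "why_not_urgent": "small growth"}
```

In the real sub an expensive_type does not immediately decide the
outcome; it only pushes $minimumwait up further down.

```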
=cut
sub this_urgent {
    my($dbi,$dist,$version,$distv,$anno) = @_;
    $distv ||= "$dist-$version";
    $anno ||= read_annotations();

    $dbi ||= db_quickaccess_handle();
    my $sql = "SELECT last_calc_ts, lc_pass, lc_fail, cuti_ts, cuti_pass, cuti_fail
               FROM distlookup WHERE distv=?";
    my $rows = my_get_query
        (
         $dbi,
         $sql,
         $distv
        );
    my $expensive_type = 0; # 9 huge, 0 not
    my $lowactivity = "";
    my $maybeexpensive = "";
    if (!$rows->[0][4] || !$rows->[0][5]) {
        return +{ this_urgent => 0 };
    }
    unless (grep { ($rows->[0][$_]||0)==0 } 1,2,4,5) {
        my($lc_ts,$lc_p,$lc_f,$cnt_ts,$cnt_p,$cnt_f) = @{$rows->[0]};
        if ($cnt_p>1.025*$lc_p || $cnt_f>1.025*$lc_f) { # arbitrary
            my $lc_total = $lc_p + $lc_f;
            if ( $lc_total < 2400 ){
                return +{ this_urgent => 1 }; # activity
            } else {
                my($limiting) = min($cnt_p,$cnt_f);
                $maybeexpensive = "over 2400: $lc_p=>$cnt_p, $lc_f=>$cnt_f";
                if ( $lc_total/$limiting > 250 ) { # arbitrary
                    $expensive_type = 9;
                } else {
                    $expensive_type = 5;
                }
            }
        } elsif ($cnt_p < $lc_p || $cnt_f < $lc_f) { # arbitrary
            # we would like to get rid of such brethren; maybe run verify-distlookup-lcpassfail.pl
            warn "Sigh, $distv has cnt_p < lc_p || cnt_f < lc_f ($cnt_p < $lc_p || $cnt_f < $lc_f)";
            return +{ this_urgent => 0 }; # fix up bad numbers
        } elsif ($cnt_p == $lc_p && $cnt_f == $lc_f) {
            $lowactivity = "NO growth: $lc_p=>$cnt_p, $lc_f=>$cnt_f";
        } elsif ($cnt_p < 1.002*$lc_p && $cnt_f < 1.002*$lc_f) { # arbitrary
            $lowactivity = "small growth: $lc_p=>$cnt_p, $lc_f=>$cnt_f";
        }
    }

    my($csdfacts) = cpanstats_distrofacts($dbi,$dist,$version,$distv,undef);
    if (
        $csdfacts
        && ! $expensive_type
        && $csdfacts->{allversions}
        && "ARRAY" eq ref $csdfacts->{allversions}
        && scalar @{$csdfacts->{allversions}}
        && exists $csdfacts->{allversions}[-1]{version}
        && $csdfacts->{allversions}[-1]{version} ne $version
       ) {
        warn "DEBUG: this is urgent: $distv $version ne $csdfacts->{allversions}[-1]{version}";
        return +{ this_urgent => 1 };
    }
    if ($lowactivity) {
        return +{ this_urgent => 0, why_not_urgent => $lowactivity };
    }

    my $averagewait = 86400; # XXX arbitrary: the smaller the more
                             # calc; Actually: we use rand to avoid
                             # spikes; zero waits and a double of this
                             # wait are equally likely

    my $minimumwait = 1.9; # XXX arbitrary: the smaller the more calc;
                           # Actually: for every year of age of the
                           # distro we add this times $averagewait
                           # waiting time. Since no age below 1 is
                           # possible, this is the minimum wait;
                           # setting from 1.7 to 1.8 when the usually
                           # set task was 3500 but we only calculated
                           # 1000; either we increase the mincalc or
                           # reduce the pensum, I would guess;
                           # 20160527: taking back to (1.7, 2500),
                           # disk gets too full too quickly

    if ($maybeexpensive) {
        # XXX arbitrary: if we have indications that the calc takes a
        # longer time, push back
        $minimumwait = $expensive_type >= 6 ? 42 : 15;
    }
    if ($anno->{$distv} && $minimumwait < 5) {
        $minimumwait = 5; # e.g. GFUJI/MouseX-Getopt-0.34.tar.gz
    }

    my $slv_file = slv_file($distv);
    my @slvstat = stat ($slv_file . "dv.gz");
    @slvstat = stat $slv_file unless @slvstat;
    my $age_of_calc = @slvstat ? time-$slvstat[9] : 999999;

    my $yaml = read_ctx($distv);
    if ($anno->{$distv}
        and (!$yaml->{annotation} || $yaml->{annotation} ne $anno->{$distv})) {
        # looks like this annotation has not been stored yet
        return +{ this_urgent => 1 };
    }
    my ($this_urgent, $why_not_urgent) = (0,"");
    my $age_in_secs = eval { age_in_secs($yaml->{upload_date}) }; # distro age in secs
    if ($@ || !$age_in_secs) {
        my $err = $@;
        require Carp;
        Carp::cluck("WARNING: skipping '$yaml->{distv}' error[$@] upload_date[$yaml->{upload_date}] distv[$distv]");
        $this_urgent = 0;
        if ($err) {
            $why_not_urgent = $err;
        } else {
            $why_not_urgent = "no upload date";
        }
        return +{ this_urgent => $this_urgent, why_not_urgent => $why_not_urgent};
    }
    my $seconds_per_year = 86400*365.25;
    my $age_points = $age_in_secs < $seconds_per_year ?
        1 : $age_in_secs/$seconds_per_year; # between 1 and 8
    my $rand = int(rand(2*$averagewait));
    my $rand2 = int($age_points*($rand+$averagewait*$minimumwait));

    $this_urgent = $age_of_calc > $rand2; # XXX arbitrary
    warn sprintf "DEBUGurg: age_of_calc[%s] age_points[%.3f] rand[%s] rand2[%s] urg[%d]\n",
        $age_of_calc, $age_points, $rand, $rand2, $this_urgent;
    if (!$this_urgent) {
        if (!@slvstat) {
            $this_urgent = 1;
            warn "DEBUG urg2: urgent because no slvstat"; # never happens
        } else {
            $why_not_urgent = "age/point calculations";
        }
    }
    return +{ this_urgent => $this_urgent, why_not_urgent => $why_not_urgent};
}
my($timetogo,$urgenttimetogo);

$timetogo = $^T + ($Opt{mincalctime} || MIN_CALCTIME);
$urgenttimetogo = MAX_CALCTIME + $^T;
MAIN: {
    my $mainentrytime = time;
    my $solved = [];
    my $anno = {};
    if ($Opt{onlystartwhennojobs}) {
        my $redis = myredis;
        WAITNOQ: while () {
            my($zcard) = $redis->zcard("analysis:jobqueue:q");
            last WAITNOQ if $zcard == 0;
            sleep 60;
        }
    }
    if ($Opt{pick}) {
        warn "saw --pick=$Opt{pick}, skipping readdir on solved/ directory";
        my $shadow_anno = read_annotations();
        if (exists $shadow_anno->{$Opt{pick}}) {
            $anno->{$Opt{pick}} = $shadow_anno->{$Opt{pick}};
        }
    } else {
        $anno = read_annotations();
        unless (defined $Opt{maxfromsolved} && 0 == $Opt{maxfromsolved}) {
            # looks in workdir/solved/ where we find eg. a2pdf-1.13.{slv,slvdv,yml}
            $solved = CPAN::Blame::Model::Solved->all("readdir");
        }
    }
    my(%seen,@want_ctgetreport);
    # revisit all our friends first
    my $touched_distv = {};
    ANNO_SOLVED: for my $set (
        {name => "anno", data => $anno},     # anno is a hashref
        {name => "solved", data => $solved}, # solved is an arrayref
    ) {
        my @k; # names like "a2pdf-1.13"
        my $urgent = $Opt{urgent} || 0;
        if (0) {
        } elsif ($Opt{pick}) {
            # pick trumps, the others don't play when we are a worker
            @k = $Opt{pick};
        } elsif ($set->{name} eq "anno") {
            @k = keys %{$set->{data}};
            if (defined $Opt{maxfromannotate}) {
                pop @k while @k > $Opt{maxfromannotate};
            }
        } elsif ($set->{name} eq "solved") {
            @k = map { $_->{distv} } @{$set->{data}};
            if (defined $Opt{maxfromsolved}) {
                pop @k while @k > $Opt{maxfromsolved};
            }
        } else {
            die;
        }
        my $distroquestlabel = substr($set->{name},0,4) . "-" . scalar @k; # gives a bit of orientation in the runlog
        my $i = 0;
        DISTV: for my $k (@k) {
            $touched_distv->{$k}=1;
            my $d = CPAN::DistnameInfo->new("FOO/$k.tgz");
            my $dist = $d->dist;
            my $version = $d->version;
            ++$i;
            next DISTV if $seen{$dist,$version}++;
            my($this_urgent, $why_not_urgent);
            my $slv_file = slv_file($k);
            if ($urgent) {
                $this_urgent = 1;
            } elsif (! -e ($slv_file."dv.gz") && ! -e ($slv_file."dv")) {
                $this_urgent = 1;
            } elsif ($Opt{pick}) {
                $this_urgent = 1;
            } else {
                my $href = this_urgent(undef,$dist,$version,$k,$anno);
                ($this_urgent, $why_not_urgent) = @{$href}{qw(this_urgent why_not_urgent)};
            }
            unless ($this_urgent){
                printf "Warning: skipping %s (%s)\n", $k, $why_not_urgent;
                next DISTV;
            }
            my $pgdbi = mypgdbi();
            my $ctx = distroquestion($pgdbi,$i,$distroquestlabel,$dist,$version,undef) or next;
            $ctx->{annotation} = $anno->{$k};
            store_ctx($pgdbi,$ctx);
            push @want_ctgetreport, $ctx;
        }
    } # ANNO_SOLVED
    my $rows = [];
    if ($Opt{pick}) {
    } else {
        unless (defined $Opt{maxfromdb} && 0==$Opt{maxfromdb}) {
            $rows = megaquery();
        }
    }
    my $skip_random = 0;
    my $i = 0;
    my $mindate;
    {
        my @now = gmtime;
        $now[4]++;
        $now[5]+=1900;
        $mindate = sprintf "%04d%02d%02d%02d%02d", $now[5]-$CPAN::Blame::Config::Cnntp::Config->{maxyears}, @now[reverse 1..4];
    }
    warn sprintf
        "DEBUG: before going through megaquery result: array want_ctgetreport[%d] rows from megaquery[%d]\n",
        scalar @want_ctgetreport,
        scalar @$rows,
        ;
    my $distroquestlabel = "B" . scalar @$rows;
    my $why_last_article = "";
    ARTICLE: while (my $row = shift @$rows) {
        sleep 0.3; # NB: core sleep truncates fractional seconds
        my($id,$state,$dist,$version,$date) = @$row;
        unless ($id){
            $why_last_article = "no id in row[@$row]";
            last ARTICLE;
        }
        if ($date lt $mindate){
            $why_last_article = "date[$date] lt mindate[$mindate] i[$i]";
            last ARTICLE;
        }
        next ARTICLE unless defined $dist and defined $version;
        if ($seen{$dist,$version}++){
            $why_last_article = "last action was skipping $dist,$version (reaching last ARTICLE via next)";
            next ARTICLE;
        }
        # note: when we arrive here, we will display the current $i in the leftmost column as in
        # 170 B2801844 13-10-29 147: 21: 0 KSTAR/Dancer-Plugin-DynamicConfig-0.04 (<0.07)
        $i++;
        if ($Opt{maxfromdb} && $i > $Opt{maxfromdb}) {
            last ARTICLE;
        }
        my $pgdbi = mypgdbi();
        my $ctx = distroquestion($pgdbi,$i,$distroquestlabel,$dist,$version,undef) or next;
        $ctx->{annotation} = $anno->{$ctx->{distv}};
        store_ctx($pgdbi,$ctx);
        if ($ctx->{greenish} && $ctx->{greenish} >= 3) {
            push @want_ctgetreport, $ctx;
        }
        if ($i >= 487) { # XXX arbitrary
            if ($skip_random) {
                # note: in 2013 we never reached 800, we reached the 8
                # year old distros before that, yesterday at $i==666,
                # and want_ctgetreport had 3823 entries
                if ($i >= 1200 && @want_ctgetreport >= 200) { # XXX arbitrary
                    $why_last_article = "i[$i] >= 1200 and want_ctgetreport >= 200";
                    last ARTICLE;
                } elsif ($i >= 800 && @want_ctgetreport >= 300) { # XXX arbitrary
                    $why_last_article = "i[$i] >= 800 and want_ctgetreport >= 300";
                    last ARTICLE;
                }
            } else {
                printf "Debug: switching to skip_random mode\n";
                $skip_random = 1;
            }
        }
        if ($skip_random) {
            # compensate for the fact that we do not have and do not
            # even want to provide the computing power to calculate
            # all possible regressions simultaneously; yes, we are a
            # bit sloppy but do not believe anybody will notice; some
            # day we may want to stop this program from going into the
            # oldest data in the database because it will become
            # worthless; then we probably want to make sure all data
            # in a certain range get processed exactly once.
            my $skip = int rand 40000; # XXX arbitrary
            splice @$rows, 0, $skip;
        }
        sleep 0.1; # NB: truncated to 0 by core sleep
        $why_last_article = "simply reached end of loop";
    }
    warn sprintf "DEBUG: Reached last ARTICLE why?[%s] array want_ctgetreport[%d]", $why_last_article, scalar @want_ctgetreport;
    @want_ctgetreport = sort {
        # Wishlist: the first sort criterion should favor those that
        # have no calc yet at all. But then we might schedule them too
        # early because cpantesters are behind log.txt
        ($b->{upload_date}||"") cmp ($a->{upload_date}||"")
            ||
        ($a->{distro_age}||0) <=> ($b->{distro_age}||0)
            ||
        $a <=> $b # crap, just the address because there is nothing better in sight
    } @want_ctgetreport;
    $distroquestlabel = "ctg-" . scalar @want_ctgetreport;
    main_enqueuer(\@want_ctgetreport, $touched_distv, $distroquestlabel, $anno, $timetogo,$urgenttimetogo);
    if ($Opt{pick}) {
        warn "nothing will be deleted, we're on a picking trip";
    } else {
        # XXX temporarily disabling cleanup_quick on 2014-01-01;
        # because we do not understand the exact implications of the
        # queuing system; but this for sure must be re-examined.
        #### cleanup_quick($touched_distv);
        WAITTOGO: while (time < $timetogo) {
            if ($Opt{leavewhennojobsleft}) {
                my $redis = myredis;
                my($zcard) = $redis->zcard("analysis:jobqueue:q");
                last WAITTOGO if $zcard == 0;
            }
            sleep 60;
        }
    }
}
__END__

=back

=head1 BUGS

TRIAL?

Too slow update cycle

Often does not recognize when a newer version is available, maybe only
when the newer version is all-green or so?

Sqlite

=head1 HISTORICAL NOTES
Show the most recent FAILs on cpantesters and compare them to
annotations.txt, which I maintain manually. Roughly the job is
organized as: (1) update the database by syncing with cpantesters; (2)
process (2.1) annotations, (2.2) previously solved regressions, and
(2.3) the database, scheduling some 1000 batchjob candidates, where
(2.3.1) is the megaquery and (2.3.2) steps through its result; (3) run
these batchjobs, many of which are skipped; (4) delete several
thousand outdated files and/or directories. Timings on 2013-10-29/30
and 2013-11-27:
 |         | 20131029 | 20131127 |
 |---------+----------+----------|
 | (1)     | 06:30:09 |          |
 | (2.1)   | 10:54:22 | 10:28:49 |
 | (2.2)   | 15:15:43 | 11:09:37 |
 | (2.3.1) | 20:41:42 | 13:26:49 |
 | (2.3.2) | 21:08:51 | 13:29:02 |
 | (3)     | 22:53:15 | 14:29:28 |
 | (4)     | 16:44:07 |          |
 | (T)     | 16:45:00 |          |
The schema of cpanstats.db once was considered to be

  0 | id       | 5379605                         | 5430514
  1 | state    | pass                            | fail
  2 | postdate | 200909                          |
  3 | tester   | bingos@cpan.org                 |
  4 | dist     | Yahoo-Photos                    | Apache-Admin-Config
  5 | version  | 0.0.2                           | 0.94
  6 | platform | i386-freebsd-thread-multi-64int | i386-freebsd
  7 | perl     | 5.10.0                          |
  8 | osname   | freebsd                         |
  9 | osvers   | 7.2-release                     | 7.2-release
 10 | date     | 200909190440                    | 200909190440
but has changed to

  id INTEGER PRIMARY KEY,   -- 1
  state TEXT,               -- 2
  postdate TEXT,            -- 3
  tester TEXT,              -- 4
  dist TEXT,                -- 5
  version TEXT,             -- 6
  platform TEXT,            -- 7
  perl TEXT,                -- 8
  osname TEXT,              -- 9
  osvers TEXT,              -- 10
  date TEXT,                -- 11
  guid char(36) DEFAULT '', -- 12
  type int(2) default 0)    -- 13

guid and type are new; guid replaces id but is in a different format.
The schema is generated by:
  Revision history for Perl module CPAN::Testers::Data::Generator.

  0.41  18/03/2010
        - fixes to change the 'id' (was NNTP ID) to an auto incremental field.
        - reworked logic to better fit latest changes.
        - added repository to META.yml.
        - documentation updates.

  0.40  02/02/2010
        - fixes to accommodate GUID changes.
        - added support for 'type' field.
From reading the source, type seems to be 0 or 2 for valid records: 1
is for PAUSE announcements and 3 is for disabled reports. IIUC 0 is
for old records and 2 for new ones. That would mean that the number of
records with type==0 stays constant?

A few months ago we counted:

  sqlite> select type,count(*) from cpanstats group by type;
  0|1570
  1|123723
  2|6899753
  3|15131
and today:

  sqlite> select type,count(*) from cpanstats group by type;
  0|1570
  1|130858
  2|7745256
  3|15131

So from now on we can rely on reading only type==2 records.
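The type breakdown above can be sketched with a toy table (invented
rows, not the real cpanstats.db; `sqlite3` here is Python's stdlib
binding, used only for illustration): type==2 rows are the current
test reports, while 0, 1 and 3 are legacy rows, PAUSE announcements
and disabled reports.

```python
import sqlite3

# Toy cpanstats table with one row per type semantics described above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cpanstats (id INTEGER PRIMARY KEY, state TEXT, type INTEGER)")
rows = [
    ("pass", 0),  # legacy record, count frozen since the migration
    ("pass", 1),  # PAUSE announcement, not a test report
    ("pass", 2),  # current report -- the only kind the solver needs
    ("fail", 2),
    ("fail", 3),  # disabled report
]
con.executemany("INSERT INTO cpanstats (state, type) VALUES (?, ?)", rows)

# Equivalent of: select type,count(*) from cpanstats group by type;
for type_, n in con.execute("SELECT type, COUNT(*) FROM cpanstats GROUP BY type"):
    print(type_, n)

# Only type==2 rows need to be read from now on.
current = con.execute("SELECT COUNT(*) FROM cpanstats WHERE type = 2").fetchone()[0]
print("current reports:", current)
```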
But before we can do anything we must
s/CPAN::WWW::Testers::Generator::Database/CPAN::Testers::Data::Generator/g;
or something equivalent.

I see no reason why I should make use of the guid except maybe for
links to cpantesters. Internal stuff should work with id just like it
always did.
Update 2011-10-12 andk: schema change again, new schema is:

  id INTEGER PRIMARY KEY, -- 1
  guid TEXT,              -- 12
  state TEXT,             -- 2
  postdate TEXT,          -- 3
  tester TEXT,            -- 4
  dist TEXT,              -- 5
  version TEXT,           -- 6
  platform TEXT,          -- 7
  perl TEXT,              -- 8
  osname TEXT,            -- 9
  osvers TEXT,            -- 10
  fulldate TEXT,          -- is new, probably was date (11)
  type INTEGER            -- 13
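A minimal sketch (again using Python's stdlib `sqlite3` purely for
illustration) that recreates the 2011 schema as listed above and reads
back the column order with PRAGMA table_info, confirming that guid now
sits right after id and that fulldate replaced date:

```python
import sqlite3

# Recreate the 2011-10-12 cpanstats schema and inspect its column order.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE cpanstats (
    id INTEGER PRIMARY KEY,
    guid TEXT,
    state TEXT,
    postdate TEXT,
    tester TEXT,
    dist TEXT,
    version TEXT,
    platform TEXT,
    perl TEXT,
    osname TEXT,
    osvers TEXT,
    fulldate TEXT,
    type INTEGER
)""")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
cols = [row[1] for row in con.execute("PRAGMA table_info(cpanstats)")]
print(cols)  # guid is column 2; there is no 'date' column any more
```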
=head1 TODO

open tickets?