#!/usr/bin/perl

=pod

Q1: Sorted by epoch? Who has to sort, when, and how often?

A11 (wrong): nobody! The mechanism is "event"-based and the array only
ever undergoes push and shift (and grep). It is a journal (that throws
entries away based on an interval) and is itself NOT rsynced but in
fact (?) rewritten by every slave.

A12: Usually nobody, because the normal flow of things just sorts
everything into the right slots. But in case somebody breaks the rules,
she has to sort accordingly and set a dirtymark flag.

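A minimal sketch of such a journal, for illustration only. It assumes events are hashrefs with hypothetical epoch and path keys, kept oldest first so that new events are pushed and expired ones shifted off, as A11 describes; add_event() and $dirtymark are made-up names, not the File::Rsync::Mirror::Recentfile API. A writer that breaks the order re-sorts and sets the flag, as A12 demands.

```perl
use strict;
use warnings;

# Journal of events, oldest first: new events are pushed onto the end
# and expired ones are shifted off the front.
my @journal;
my $dirtymark = 0;    # hypothetical flag: set when someone broke the order

# add_event() is a hypothetical helper, not part of any real API.
sub add_event {
    my ($epoch, $path, $interval) = @_;
    my $out_of_order = @journal && $epoch <= $journal[-1]{epoch};
    push @journal, { epoch => $epoch, path => $path };
    if ($out_of_order) {
        # Rule broken (older or duplicate timestamp): restore order, flag it.
        @journal = sort { $a->{epoch} <=> $b->{epoch} } @journal;
        $dirtymark = 1;
    }
    # Throw away everything older than $interval seconds before the newest event.
    my $cutoff = $journal[-1]{epoch} - $interval;
    shift @journal while @journal && $journal[0]{epoch} < $cutoff;
    return scalar @journal;
}

add_event(1000, "a.tar.gz", 3600);
add_event(1010, "b.tar.gz", 3600);
add_event(1005, "c.tar.gz", 3600);    # arrives late
print join(",", map { $_->{path} } @journal), " dirtymark=$dirtymark\n";
# prints: a.tar.gz,c.tar.gz,b.tar.gz dirtymark=1
```

In the normal flow the sort never runs, so appending stays cheap; only a writer that breaks the rules pays for re-sorting.
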
Q2: Or do we never promise that the timestamps are sorted or mirror
the sequence of events? Or do we make them floating point numbers so
they become unique and can be treated as hash keys? [This I like!]
Please keep in mind that we must be able to help customers who have
6000000000 files.

A21: Yes, we promise the timestamps are sorted. They are floats, and
keeping them sorted and unique should be doable without overly
complicated algorithms.

Update 2007-10-21 akoenig: no 4711 timestamp ever! They become
4711.xxxx, 4711.xxxy, etc. OR something like a UUID. But if two hosts
differ in the order, comparing them is less simple.

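A sketch of the float-timestamp idea from A21 and the update, assuming a single writer and an arbitrarily chosen step of 0.0001 seconds; unique_epoch() is a hypothetical name. It only guarantees uniqueness and order on one host and does nothing for the harder case of two hosts disagreeing on the order.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: hand out strictly increasing float epochs so that
# no two events share a timestamp (4711, then 4711.0001, 4711.0002, ...).
my $last_epoch = 0;
my $step       = 0.0001;    # assumption: far above double resolution at epoch scale

sub unique_epoch {
    my ($now) = @_;
    $now = Time::HiRes::time() unless defined $now;
    # Clock did not advance past the previous event: step just beyond it.
    $now = $last_epoch + $step if $now <= $last_epoch;
    $last_epoch = $now;
    return $now;
}

printf "%.4f\n", unique_epoch(4711) for 1 .. 3;
# prints 4711.0000, 4711.0001 and 4711.0002 on separate lines
```
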
Interesting/Ugly is the second-generation problem if we let the RECENT
file be written to disk too early. We should not write it as long as it
contains files yet to be mirrored. Easiest solved with File::Temp: make
a random filename for each loop and rename at the end. DONE

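The File::Temp approach can be sketched like this; write_recent_atomically() and the RECENT-XXXXXXXX template are illustrative assumptions, not the routine this script actually uses. The temporary file is created in the target's own directory so that the final rename() stays on one filesystem.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp ();

# Hypothetical helper: write the complete RECENT content to a randomly
# named file in the target directory, then rename() it into place, so
# readers (and second-generation slaves) never see a half-written file.
sub write_recent_atomically {
    my ($target, $content) = @_;
    my $tmp = File::Temp->new(
        TEMPLATE => "RECENT-XXXXXXXX",
        DIR      => dirname($target),
        UNLINK   => 0,    # we rename the file ourselves
    );
    print {$tmp} $content or die "Could not write $tmp: $!";
    close $tmp or die "Could not close $tmp: $!";
    # File::Temp creates files with mode 0600; a published file must be readable.
    chmod 0644, "$tmp" or die "Could not chmod $tmp: $!";
    # On POSIX systems rename() within one filesystem replaces $target in one step.
    rename "$tmp", $target or die "Could not rename $tmp to $target: $!";
    return $target;
}
```
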
Interesting/Advanced is the idea to write a new RECENT file at the end
of the inner loop. When a server falls behind for some reason and then
catches up, its dependents recover much earlier. LATER (needs the
write_recent routines in a module).

Stupid is the idea to rewrite large files as chunks to be concatenated
later.

Stupid is the idea to allow an operation "rename": mirror remote files
to temporary local filenames, then move these to the final name. After
every X megabytes a RECENT file is written, including the currently
running mirrored file with its temporary name. Then the rename op is
written to the RECENT file. Lots of race conditions, certainly solvable
but a bit complicated. (Update 2007-10-21 akoenig: maybe a copy
operation is better.)

Bug/Trap: files listed in RECENT that do not exist at the source. If we
get no RECENT file, we just wait until we get one again. Other files
that we miss? $degraded_mode? Most conservative approach: reset
$max_epoch_ever, do not pass on this RECENT file, and retry the whole
collection. The last bit is drastic, but since rsync handles files that
are already OK quickly, it does not look that catastrophic. DONE.

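The conservative recovery could look roughly like this; process_recent(), the $fetch callback and the event layout are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical state: the newest epoch we have completely mirrored.
my $max_epoch_ever = 0;

# process_recent() is a hypothetical helper. $fetch is a code ref that
# mirrors one path and returns true on success.
sub process_recent {
    my ($events, $fetch) = @_;
    my @missing = grep { !$fetch->($_->{path}) } @$events;
    if (@missing) {
        # Most conservative approach: forget our progress so the next round
        # retries the whole collection, and do not pass this RECENT file on.
        $max_epoch_ever = 0;
        return 0;
    }
    for my $event (@$events) {
        $max_epoch_ever = $event->{epoch} if $event->{epoch} > $max_epoch_ever;
    }
    return 1;    # everything arrived: safe to publish the RECENT file
}
```

Retrying everything is affordable for the reason given above: rsync passes quickly over files that are already up to date.
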
Interesting/Funny is the idea that dependents fetch files from each
other.

If dependents merge RECENT files from their peers, this leads to the
trap that some objects get added to an older second, and the algorithm
would have to invalidate $max_epoch_ever. So if they do not merge
RECENT files it is easy; otherwise it gets ugly.

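To make the trap concrete, a sketch of such a merge, assuming newest-first lists of hashrefs with epoch and path keys; merge_peer_events() and its return values are hypothetical. A peer entry at or below $max_epoch_ever would never be fetched by "only fetch what is newer than my mark" logic, so its arrival forces the mark to be invalidated.

```perl
use strict;
use warnings;

# merge_peer_events() is hypothetical. Returns the merged list (newest
# first) and a flag telling the caller to invalidate $max_epoch_ever.
sub merge_peer_events {
    my ($local, $peer, $max_epoch_ever) = @_;
    my %seen = map { ("$_->{epoch}\0$_->{path}" => 1) } @$local;
    my @new  = grep { !$seen{"$_->{epoch}\0$_->{path}"} } @$peer;
    # Any genuinely new entry in an "older second" breaks our bookkeeping.
    my $must_invalidate = grep { $_->{epoch} <= $max_epoch_ever } @new;
    my @merged = sort { $b->{epoch} <=> $a->{epoch} } @$local, @new;
    return (\@merged, $must_invalidate ? 1 : 0);
}
```
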
Interesting/intimidating is the idea that the statistics done by
Pennink(sp?) might suddenly become less reliable.

Reconsidering the whole: if slaves only write RECENT files reflecting
what they already have, valuable information gets lost. If they write
complete and unaltered RECENT files, they lead the second-generation
slaves down the wrong road. A solution might be that they add to the
RECENT file what they already have: got_at => $epoch? have => 1?
status => "todo"? Think of errors/retry status. Reminds me of
nosuccess_count and nosuccess_time in the PAUSE daemon.

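A sketch of that annotation idea, using the field names floated above (have, got_at, status); annotate_recent() and fetchable_paths() are hypothetical helpers, and the shape of the %$have map is an assumption.

```perl
use strict;
use warnings;

# annotate_recent() is hypothetical: instead of dropping events it has not
# mirrored yet, a slave copies every upstream event and adds its own state.
sub annotate_recent {
    my ($events, $have) = @_;    # $have: path => epoch at which we got it
    my @out;
    for my $event (@$events) {
        my %copy = %$event;
        if (exists $have->{ $copy{path} }) {
            $copy{have}   = 1;
            $copy{got_at} = $have->{ $copy{path} };
        } else {
            $copy{status} = "todo";
        }
        push @out, \%copy;
    }
    return \@out;
}

# A second-generation slave would only fetch what its upstream really has.
sub fetchable_paths {
    my ($annotated) = @_;
    return map { $_->{path} } grep { $_->{have} } @$annotated;
}
```

Nothing is lost this way: the todo entries still announce what is coming, while the have flag keeps downstream slaves from chasing files that are not there yet. Error/retry bookkeeping in the spirit of nosuccess_count would be further fields on the same entries.
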
And we want to deprecate the whole modules/by_* directories.

=cut

use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use Getopt::Long qw(GetOptions);
use List::Util qw(min);
use Time::HiRes qw(sleep);

use lib "/home/k/sources/rersyncrecent/lib/";
require File::Rsync::Mirror::Recentfile;

our %Opt;
GetOptions(\%Opt,
           "loops=i",
           "verbose!",
          ) or die "Usage: $0 [--loops=N] [--[no]verbose]\n";

my $loop = 0;
my %reached;
ITERATION: while () {
  last if $Opt{loops} && $loop++ >= $Opt{loops};
  my $iteration_start = time;

  for my $tuple ([authors => "6h"],[modules => "1h"]) {
    my($rmodule,$interval) = @$tuple;
    my $rf = File::Rsync::Mirror::Recentfile->new
      (
       canonize => "naive_path_normalize",
       filenameroot => "RECENT",
       ignore_link_stat_errors => 1,
       interval => $interval,
       localroot => "/home/ftp/pub/PAUSE/$rmodule",
       remote_dir => "",
       remote_host => "pause.perl.org",
       remote_module => $rmodule,
       rsync_options => {
                         # intentionally not using archive=>1 because it contains "r"
                         compress => 1,
                         'rsync-path' => '/usr/bin/rsync',
                         links => 1,
                         times => 1,
                         'omit-dir-times' => 1,
                         checksum => 1,
                        },
       verbose => $Opt{verbose},
      );
    $rf->mirror(after => $reached{$rmodule}||0, "skip-deletes" => 1);
    my $re = $rf->recent_events;
    # The first event is taken as the newest one; keep the previous mark
    # if there are no events at all.
    $reached{$rmodule} = $re->[0]{epoch} if $re && @$re;
  }
  $reached{now} = time;
  for my $k (keys %reached) {
    next if $k =~ /T/;
    $reached{$k . "T"} = scalar localtime $reached{$k};
  }
  require YAML::Syck; print STDERR "Line " . __LINE__ . ", File: " . __FILE__ . "\n" . YAML::Syck::Dump(\%reached); # XXX

  my $minimum_time_per_loop = 20;
  my $sleep = $iteration_start + $minimum_time_per_loop - time;
  if ($sleep > 0.01) {
    sleep $sleep;
  } else {
    # negative time not invented yet
  }
}

print "\n";

__END__

# Local Variables:
# mode: cperl
# cperl-indent-level: 2
# End: