From 6d63e96f2e22e307050d5a3c4946267030d5959b Mon Sep 17 00:00:00 2001
From: "Andreas J. Koenig"
Date: Mon, 13 Apr 2009 10:26:46 +0200
Subject: [PATCH] additional talk about the protocol and its relation to bittorrent

---
 Changes                         |  2 +-
 Todo                            | 32 ++++++++++++++++++++++++++++++++
 lib/File/Rsync/Mirror/Recent.pm | 37 +++++++++++++++++++++++++------------
 3 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/Changes b/Changes
index 6407a84..7776552 100644
--- a/Changes
+++ b/Changes
@@ -1,4 +1,4 @@
-2009-04-12 Andreas J. Koenig
+2009-04-13 Andreas J. Koenig
 
 	* release 0.0.4
 
diff --git a/Todo b/Todo
index 6c994b8..0c8d2ae 100644
--- a/Todo
+++ b/Todo
@@ -1,12 +1,44 @@
 2009-04-13 Andreas J. Koenig
 
+	* Study http://en.wikipedia.org/wiki/Magnet_URI_scheme
+
+	* is it true (as stated at
+	https://fedorahosted.org/InstantMirror/wiki/ExistingRepositoryReplicationMethods)
+	that rsync can lead to errors when upstream changes before client sync
+	has completed? Can we see the error when we run this on upstream:
+
+% perl -e '
+use Time::HiRes qw(time);
+while (){
+  open my $fh, ">", "changingfile.txt.new" or die;
+  print $fh (time."\n") x 1000000;
+  close $fh;
+  rename "changingfile.txt.new", "changingfile.txt" or die;
+}
+'
+
+	And on the receiving end:
+
+% while true; do
+rsync k75:`pwd`/changingfile.txt .; cat changingfile.txt| uniq|wc -l
+sleep 1;
+done
+
+	I cannot get it to fail. Apparently it is sufficient when upstream
+	always writes atomically (which of course is mandatory)?
+
 	* from ABH:
 
 	> https://fedorahosted.org/InstantMirror/
 	> https://www.redhat.com/mailman/listinfo/instantmirror-list
 	> irc.freenode.net channel #instantmirror
 
+	A cool name for a project. Inspires me to write a few sentences about
+	bittorrent's role in the grand picture.
+	http://spreadsheets.google.com/pub?key=pGlWX10blP4u2kM05SDtiMg is a
+	spreadsheet collected by Atul Aggarwal about bittorrent implementations.
+
 2009-04-12 Andreas J. Koenig
 
 	* Interesting last minute bug during real download testing: the output
diff --git a/lib/File/Rsync/Mirror/Recent.pm b/lib/File/Rsync/Mirror/Recent.pm
index ff237ad..404b9f8 100644
--- a/lib/File/Rsync/Mirror/Recent.pm
+++ b/lib/File/Rsync/Mirror/Recent.pm
@@ -912,22 +912,27 @@ Normally it takes a long time to determine the diff itself before it
 can be transferred. Known solutions at the time of this writing are
 csync2, and rsync 3 batch mode.
 
-For many years the best solution was csync2 which solves the problem
-by maintaining a sqlite database on both ends and talking a highly
-sophisticated protocol to quickly determine which files to send and
-which to delete at any given point in time. Csync2 is often
+For many years the best solution was B<csync2> which solves the
+problem by maintaining a sqlite database on both ends and talking a
+highly sophisticated protocol to quickly determine which files to send
+and which to delete at any given point in time. Csync2 is often
 inconvenient because it is push technology and the act of syncing
 demands quite an intimate relationship between the sender and the
 receiver. This is hard to achieve in an environment of loosely coupled
-sites where the number of sites is large or connections are
-unreliable or network topology is changing.
+sites where the number of sites is large or connections are unreliable
+or network topology is changing.
 
-Rsync 3 batch mode works around these problems by providing rsync-able
-batch files which allow receiving nodes to replay the history of the
-other nodes. This reduces the need to have an incestuous relation but
-it has the disadvantage that these batch files replicate the contents
-of the involved files. This seems inappropriate when the nodes already
-have a means of communicating over rsync.
+B<Rsync 3 batch mode> works around these problems by providing
+rsync-able batch files which allow receiving nodes to replay the
+history of the other nodes. This reduces the need to have an
+incestuous relation but it has the disadvantage that these batch files
+replicate the contents of the involved files. This seems inappropriate
+when the nodes already have a means of communicating over rsync.
+
+B<InstantMirror> at https://fedorahosted.org/InstantMirror/ is an
+ambitious project that tries to combine various technologies to
+overcome the current situation. It's been founded in 2009-03 and at
+the time of this writing it is still a bit early to comment on.
 
 rersyncrecent solves this problem with a couple of (usually 2-10)
 lightweight index files which cover different overlapping time
@@ -942,6 +947,14 @@ and economic it is also a general purpose solution. I'm looking forward
 to see a CPAN backbone that is only a few seconds behind PAUSE. And
 then ... the first FUSE based CPAN filesystem anyone?
 
+=head1 LIMITATIONS
+
+If the tree of the master server is changing faster than the bandwidth
+permits to mirror then additional protocols may need to be deployed.
+Certainly p2p/bittorrent can help in such situations because
+downloading sites help each other and bittorrent chunks large files
+into pieces.
+
 =head1 FUTURE DIRECTIONS
 
 Currently the origin server must keep track of injected and removed
--
2.11.4.GIT