# vim:ts=4 sw=4 et ft=perl:
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if 0; # not running under some shell
############################################################################
mogtool -- Inject/extract data to/from a MogileFS installation

B<WARNING>: this utility is deprecated! See L<MogileFS::Utils>
    $ mogtool [general-opts] <command> [command-opts] <command-args>

    $ mogtool --trackers=127.0.0.1:6001 --domain=foo --class=bar ...
    $ mogtool --conf=foo.conf ...

    $ mogtool inject thefile.tgz thefilekey
    $ mogtool inject --bigfile thebigfile.tgz thefilekey
    $ mogtool inject --bigfile --gzip thebigfile.tar thefilekey
    $ mogtool inject --bigfile --gzip mydirectory thedirkey
    $ mogtool inject --bigfile --gzip /dev/hda4 thedevkey
    $ mogtool inject --nobigfile bigcontiguousfile bigcfilekey

    $ mogtool inject --bigfile --gzip --verify \
        --description="Description" \
        --receipt="foo@bar.com, baz@bar.com" \
        --concurrent=5 --chunksize=32M \
        somehugefile thefilekey

    $ mogtool extract thefilekey thenewfile.tgz
    $ mogtool extract thefilekey -
    $ mogtool extract --bigfile thedirkey .
    $ mogtool extract --bigfile --asfile thedirkey thefile.tgz
    $ mogtool extract --bigfile thedevkey /dev/hda4

    $ mogtool delete thekey

    $ mogtool locate --noverify thekey
    $ mogtool locate --bigfile thekey
=head1 GENERAL OPTIONS

=item --debug

Turn on MogileFS debug output.
=item --trackers=<[preferred_ip/]ip:port>[,<[preferred_ip/]ip:port>]*

Specify one or more trackers for your MogileFS installation. Note that
you can specify a preferred IP to override the default IP for each
tracker. Such an entry looks like B<10.10.0.1/10.0.0.1:8081>.
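To make the preferred-IP syntax concrete, here is a small standalone sketch (not mogtool itself, though it mirrors the pattern match mogtool applies to each tracker entry): the part before the slash is recorded as the preferred IP for the host that follows it.

```perl
use strict;
use warnings;

# Split a --trackers value into a tracker list plus a preferred-IP map.
# An entry "pref/ip:port" means: connect to ip:port, but prefer pref.
sub parse_trackers {
    my ($spec) = @_;
    my (@trackers, %pref_ip);
    foreach my $tracker (split /\s*,\s*/, $spec) {
        if ($tracker =~ m!(.+)/(.+):(\d+)!) {
            $pref_ip{$2} = $1;      # reach $2 via $1 when possible
            push @trackers, "$2:$3";
        } else {
            push @trackers, $tracker;
        }
    }
    return (\@trackers, \%pref_ip);
}

my ($trackers, $pref) = parse_trackers("10.10.0.1/10.0.0.1:8081, 10.0.0.2:7001");
print "@$trackers\n";           # 10.0.0.1:8081 10.0.0.2:7001
print "$pref->{'10.0.0.1'}\n";  # 10.10.0.1
```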
=item --domain=<domain>

Set the MogileFS domain to use.

=item --class=<class>

Set the class within the domain to use. Defaults to _default.

=item --conf=<file>

Specify a configuration file to load from.

=item --lib=<directory>

Specify a directory to use as a library path. Right now, this should
be the directory where you expect to find the MogileFS.pm file, if it's
not actually installed.
=item inject|i

Insert a resource into MogileFS. See L</"INJECT OPTIONS"> and L</"INJECT ARGUMENTS">
for the rest of how to use the inject mode.

=item extract|x

Extract a resource from MogileFS. See L</"EXTRACT OPTIONS"> and L</"EXTRACT ARGUMENTS">
for how to use extract.

=item delete|rm

Delete a resource. See L</"DELETE OPTIONS"> and L</"DELETE ARGUMENTS">.

=item locate|lo

List the paths to the file identified by the given key.

=item list|ls

List all big files contained in MogileFS. No options, no arguments.

=item listkey|lsk key

List all files whose keys match the given prefix. So if you specify "ABC1"
as the key, you'll get all keys that start with the characters "ABC1".
=head1 INJECT OPTIONS

The following options are used to control the behavior of the injector.

=item --bigfile|-b

If specified, use chunking to break the resource into manageable pieces.

=item --chunksize=<size>[B|K|M|G]

When instructed to break files into chunks, the injector will use the specified
chunk size as the maximum chunk size. Defaults to 64M. You can specify the
chunk size manually along with its units; the units default to bytes.
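The suffix handling can be sketched as follows (an illustrative standalone helper, not mogtool's own function, but it mirrors the B/K/M/G mapping described above):

```perl
use strict;
use warnings;

# Convert a chunksize spec such as "64M", "1000K", or "512" to bytes.
# A bare number (or a "B" suffix) is taken as bytes.
sub parse_chunksize {
    my ($spec) = @_;
    return undef unless $spec =~ m!^(\d+)([GMKB])?$!i;
    my ($n, $unit) = ($1, lc($2 || 'b'));
    my %pow = ( b => 0, k => 1, m => 2, g => 3 );
    return $n * (1024 ** $pow{$unit});
}

print parse_chunksize("64M"), "\n";    # 67108864
print parse_chunksize("1000K"), "\n";  # 1024000
```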
=item --gzip|-z

If specified, mogtool will gzip the data as it goes into MogileFS. The resource
will be marked as compressed.

Note that you do not need to specify this if the resource is already gzipped,
but it doesn't hurt. (We automatically detect that and mark it as compressed.)
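The automatic detection relies on the gzip magic bytes at the start of the stream, which can be sketched like this (an illustrative helper, not mogtool's internal code path):

```perl
use strict;
use warnings;

# gzip streams begin with the two magic bytes 0x1f 0x8b; checking the
# first read is enough to decide whether the input is already compressed.
sub looks_gzipped {
    my ($buf) = @_;
    return length($buf) >= 2 && substr($buf, 0, 2) eq "\x1f\x8b";
}

print looks_gzipped("\x1f\x8b\x08rest-of-stream") ? "gzipped\n" : "plain\n";  # gzipped
print looks_gzipped("plain text") ? "gzipped\n" : "plain\n";                  # plain
```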
=item --overwrite

If you were previously injecting a big file in chunks and the process died,
mogtool normally refuses to do it again. Specify this option to force the
overwrite of that file.

B<NOTE:> Other than in the above case (partial failure), mogtool will not prompt
before overwriting an existing file.
=item --verify

If on, we do a full MD5 verification of every chunk after it is replicated. This
can take a while on large files!
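The check itself is simple, and can be sketched as follows (an illustrative helper, not mogtool's code: for each replicated path, the fetched data must match the length and MD5 recorded at injection time):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Return true if a fetched chunk matches its recorded length and MD5.
sub chunk_ok {
    my ($data, $want_md5, $want_len) = @_;
    return 0 unless length($data) == $want_len;
    return 0 unless md5_hex($data) eq $want_md5;
    return 1;
}

my $chunk = "some chunk of data";
my ($md5, $len) = (md5_hex($chunk), length $chunk);
print chunk_ok($chunk,     $md5, $len) ? "ok\n" : "mismatch\n";  # ok
print chunk_ok("corrupt!", $md5, $len) ? "ok\n" : "mismatch\n";  # mismatch
```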
=item --description=<text>

Specifies a description for this file. Optional, but assists in reporting and
listing the large files in MogileFS. (This is also displayed in any receipts
that are created.)
=item --receipt=<email address>[, <email address>]*

If specified, emails a copy of the receipt file to the specified comma-separated
email addresses. Also creates a local filesystem copy of the receipt file.
=item --concurrent=<number>

Specifies the number of concurrent processes to run for MogileFS insertion. If
you notice mogtool spending most of its time waiting for children rather than
actually buffering data, you may wish to raise this number. The default is 1,
but we've found 3 or 4 work well.
=head1 INJECT ARGUMENTS

=item resource

What you actually want to inject. This can be a file, directory, or a raw
partition in the format I</dev/X>.

Please see L</"USAGE EXAMPLES"> for more information on how to inject these
different types of resources and the differences thereof.

=item key

Specifies the key to save this file to. For big files, the internal keys are
actually of the form "_big_N:key" and "key,#", where N is one of a number of
prefixes we use and # is the chunk number.

Generally, you want this to be descriptive so you remember what it is later
and can identify the file just by looking at the key.
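For illustration, here is a sketch of the internal key layout for a chunked file (assuming a 3-chunk injection; the "_big_pre" key is the in-progress marker and "_big_info" holds the metadata, as used later by the injection code):

```perl
use strict;
use warnings;

# Enumerate the MogileFS keys a chunked injection creates for one file:
# an in-progress marker, an info/metadata file, and one key per chunk.
sub big_file_keys {
    my ($key, $chunks) = @_;
    return ("_big_pre:$key", "_big_info:$key",
            map { "$key,$_" } 1 .. $chunks);
}

print join("\n", big_file_keys("backup.2004.12", 3)), "\n";
```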
=head1 EXTRACT OPTIONS

=item --bigfile|-b

If specified, indicates that this resource was chunked on injection and should be
reassembled for extraction.

=item --gzip|-z

Specifies to mogtool that it should ungzip the output if and only if it was
compressed when inserted into the MogileFS system. So, if you're extracting a
file that wasn't gzipped to begin with, this doesn't do anything.

=item --asfile

Useful when extracting something previously inserted as a directory. This option
instructs mogtool to treat the resource as a file and not actually run it
through tar for extraction.
=head1 EXTRACT ARGUMENTS

=item key

Specifies the key to get the file from.

=item destination

What destination means varies depending on what type of resource you're extracting.
However, no matter what, you can specify a single dash (B<->) to mean STDOUT.

Please see the usage examples for more information on how extract works.
=head1 DELETE OPTIONS

=item --bigfile|-b

The resource is a "big file" and all chunks should be deleted.

=head1 DELETE ARGUMENTS

=item key

Specifies the key of the file to delete.
=head1 LOCATE OPTIONS

=item --verify

Verify that the returned paths actually contain the file. The locate
command defaults to verify; you can disable it with --noverify.

=item --bigfile|-b

The resource is a "big file" and the locations of its information key should
be printed.

=head1 LOCATE ARGUMENTS

=item key

Specifies the key of the file to locate.
=head1 RETURN VALUES

=item 0

Success during operation.

=item 1

During the locate, list, or listkey operation, the key was not found.

=item 2

Some fatal error occurred.
=head1 USAGE EXAMPLES

I<Please note that all examples assume you have a default config file that
contains the tracker and domain to use. This saves us from having to clutter up
every command line.>

=head2 Small Files (<64MB)

When it comes to using small files, mogtool is very, very easy.
    $ mogtool inject foo.dbm foo.dbm.2004.12

Injects the file I<foo.dbm> into MogileFS under the key of I<foo.dbm.2004.12>.

    $ mogtool inject --gzip foo.dbm foo.dbm.2004.12

Injects the same file to the same key, but compresses it on the fly for you.

    $ mogtool extract foo.dbm.2004.12 newfoo.dbm

Retrieves the key I<foo.dbm.2004.12> and saves it as I<newfoo.dbm>.

    $ mogtool extract --gzip foo.dbm.2004.12 newfoo.dbm

Gets the file and automatically decompresses it, if and only if it was compressed.
So basically, you can turn on gzip in your config file and mogtool will do the
smart thing each time.

    $ mogtool extract foo.dbm.2004.12 -

Prints the resource to standard output. If you want, you can pipe it somewhere
or redirect it to a file (but why not just specify the filename?).
=head2 Large Files (>64MB)

Given mogtool's ability to break files into chunks and later reassemble them,
inserting large files (even files over the 4GB barrier) is relatively easy.

    $ mogtool inject --bigfile largefile.dat largefile.dat

As expected, inserts the file I<largefile.dat> into the MogileFS system under
the name I<largefile.dat>. Not very creative. Uses the default 64MB chunks.

    $ mogtool inject --bigfile --chunksize=16M largefile.dat largefile.dat

Specifies 16MB chunks instead of the default. Otherwise, the same.

    $ mogtool inject --bigfile --chunksize=1000K --gzip largefile.dat somekey

Does it again, but specifies 1000KB chunks, gzips automatically, and uploads
under a different key, I<somekey>.

    $ mogtool inject --bigfile --concurrent=5 --gzip largefile.dat somekey

Same as above, but uses 5 child processes for uploading chunks to MogileFS.
This can take up to 384MB of memory in this example! (It tends to use about
(concurrency + 1) * chunksize bytes.)

    $ mogtool inject --bigfile --chunksize=32M --concurrent=3 --gzip \
        --receipt="foo@bar.com" --verify --description="A large file" \
        largefile.dat somekey

Breaks this file into 32MB chunks, sets a description, uses 3 children to
upload the chunks, gzips the file on the fly, does a full MD5 verification
of every chunk, then emails a receipt with all of the MogileFS paths to
foo@bar.com.

Lots of flexibility with mogtool.
    $ mogtool extract --bigfile somekey newfile.dat

In its basic form, extracts the previously inserted large file and saves it as
I<newfile.dat>.

    $ mogtool extract --bigfile --gzip somekey newfile.dat

If the file was gzipped on entry, ungzips it and saves the result. If it wasn't
gzipped, then we just save it.
=head2 Directories

Directories are easily injected and extracted with mogtool. To create the data
stream that is inserted into MogileFS, we use tar.

    $ mogtool inject --bigfile mydir mykey

Runs I<mydir> through tar and then saves the result as I<mykey>.

    $ mogtool inject --bigfile --gzip --concurrent=5 mydir mykey

Injects, but also gzips and uses multiple injectors.

I<Note how this is just like injecting a large file. See the injection examples
for large files for more examples.>

    $ mogtool extract --bigfile mykey .

Extracts the previously injected directory I<mykey> to your local directory.

    $ mogtool extract --bigfile --asfile mykey foo.tar

Takes the previously generated tarball and saves it as I<foo.tar>. Simply creates
the file instead of extracting everything inside.
=head2 Partitions/Devices

mogtool has the ability to inject raw partitions into MogileFS and to retrieve
them later and write them back to a partition. They're treated just like
directories for the most part; we just don't pipe things through tar.

    $ mogtool inject --bigfile /dev/hda3 hda3.backup

Saves a raw copy of your partition I</dev/hda3> to the key I<hda3.backup>.

    $ mogtool inject --bigfile --gzip /dev/hda3 hda3.backup

Same, but compresses on the fly during injection.

    $ mogtool extract --bigfile hda3.backup /dev/hda4

Extracts the partition at I<hda3.backup> to the partition I</dev/hda4>. B<WARNING:>
mogtool won't ask for confirmation; make sure you don't mistype partition numbers!
=head2 Deleting a Resource

B<WARNING:> Please make sure you're specifying the right parameter, as delete does
not prompt for confirmation of the request!

    $ mogtool delete thekey

Deletes a normal file.

    $ mogtool delete --bigfile thekey

Deletes a chunked file. This deletes all chunks and the receipt, so the file is gone.

=head2 Listing Big Files

    $ mogtool list

Lists all large files stored in MogileFS. It is not possible to list all normal
files at this time.

=head2 Listing Files Matching a Key

    $ mogtool listkey abc1

Lists all files in MogileFS whose keys start with the characters "abc1".
=head1 CONFIGURATION FILE

Instead of passing a pile of options on the command line every time, mogtool
lets you create a default configuration file from which it will read all of
the options. It searches two locations for a default configuration file:
B<~/.mogtool> and B</etc/mogilefs/mogtool.conf>. (Alternately, you can specify
B<--conf=whatever> as an option on the command line.)

The file can consist of any number of the following items:

    trackers = 10.0.0.3:7001, 10.10.0.5/10.0.0.5:7001
    domain = mogiledomain
    receipt = foo@bar.com, baz@bar.com
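For reference, a fuller sample file might look like this (the values are made-up examples; each key corresponds to the long form of a command-line option described above):

    trackers = 10.0.0.3:7001, 10.10.0.5/10.0.0.5:7001
    domain = mogiledomain
    class = mogileclass
    gzip = 1
    chunksize = 32M
    concurrent = 3
    receipt = foo@bar.com, baz@bar.com

Options given on the command line take precedence over values read from the file.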
=head1 KNOWN BUGS

None? Send me any you find! :)

=head1 PLANNED FEATURES

=item --concurrent for extract

It would be nice to have concurrent extraction going on.

=item recovery from a receipt

If the info file in MogileFS is ever corrupt, it would be useful to be able to
recover a file given just a receipt. It would take the same arguments as the
extract mode, except it would use a receipt file as the data source.

=item partition size verification

We can easily get the partition size when we save one to MogileFS, so we should
use that information to determine during extraction whether a target partition
is going to be big enough.

=item on-the-fly gzip extraction

Right now we can gzip on injection, but we should support doing decompression
on the fly coming out of MogileFS.

=item make list take a prefix

If you can specify a prefix, that makes it easier to find small files that
are stored in MogileFS.

=item more information on list

Have list load up the info file and parse it for information about each of the
big files being stored. Maybe have this as an option (-l). (This means the
reading and parsing of info files should be abstracted into a function.)
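Parsing a "part" line from an info file or receipt can be sketched as a standalone function like this (illustrative only; it uses the same line format the extractor matches, and the example paths are hypothetical):

```perl
use strict;
use warnings;

# Parse one "part N bytes=... md5=... paths: ..." line from an info file
# into a hashref with the chunk number, size, MD5, and path list.
sub parse_part_line {
    my ($line) = @_;
    return unless $line =~
        /^part\s+(\d+)\s+bytes=(\d+)\s+md5=(\S+)\s+paths:\s+(.+)$/;
    return {
        num   => $1,
        bytes => $2,
        md5   => $3,
        paths => [ split /\s*,\s*/, $4 ],
    };
}

my $part = parse_part_line(
    "part 1 bytes=67108864 md5=d41d8cd98f00b204e9800998ecf8427e " .
    "paths: http://10.0.0.3:7500/dev1/0/000/000/0000000123.fid"
);
print "$part->{num} $part->{bytes} @{$part->{paths}}\n";
```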
=head1 AUTHORS

Mark Smith E<lt>junior@danga.comE<gt> - most of the implementation and maintenance.

Brad Fitzpatrick E<lt>brad@danga.comE<gt> - concepts and rough draft.

Robin H. Johnson E<lt>robbat2@orbis-terrarum.netE<gt> - locate function.

Copyright (c) 2002-2004 Danga Interactive. All rights reserved.

=cut

##############################################################################
use Pod::Usage qw{ pod2usage };
use Digest::MD5 qw{ md5_hex };
use Time::HiRes qw{ gettimeofday tv_interval };

use POSIX qw(:sys_wait_h);

use constant ERR_MISSING => 1;
use constant ERR_FATAL   => 2;
abortWithUsage() unless
    GetOptions(
        # general purpose options
        'trackers=s'    => \$opts{trackers},
        'domain=s'      => \$opts{domain},
        'class=s'       => \$opts{class},
        'config=s'      => \$opts{config},
        'help'          => \$opts{help},
        'debug'         => \$MogileFS::DEBUG,
        'lib=s'         => \$opts{lib},

        # extract+inject options
        'gzip|z'        => \$opts{gzip},
        'bigfile|b'     => \$opts{big},
        'nobigfile'     => \$opts{nobig},

        # inject options
        'overwrite'     => \$opts{overwrite},
        'chunksize=s'   => \$opts{chunksize},
        'receipt=s'     => \$opts{receipt},
        'reciept=s'     => \$opts{receipt}, # requested :)
        'verify!'       => \$opts{verify},
        'description=s' => \$opts{des},
        'concurrent=i'  => \$opts{concurrent},
        'noreplwait'    => \$opts{noreplwait},

        # extract options
        'asfile'        => \$opts{asfile},
    );
# now load the config file?
my @confs = ( $opts{config}, "$ENV{HOME}/.mogtool", "/etc/mogilefs/mogtool.conf" );
foreach my $conf (@confs) {
    next unless $conf && -e $conf;
        next unless m!(\w+)\s*=\s*(.+)!;
        $opts{$1} = $2 unless $opts{$1};
# now bring in MogileFS, because hopefully we have a lib by now
eval "use lib '$opts{lib}';";

# no trackers and domain..?
unless ($opts{trackers} && $opts{domain}) {
    abortWithUsage("--trackers and --domain configuration required");
}

eval {
    use MogileFS::Client; 1
} or die "Failed to load MogileFS::Client module: $@\n";
# init connection to mogile
my $mogfs = get_mogfs();

# get our command and pass off to our functions
my $cmd = shift @ARGV;
inject()  if $cmd eq 'i'   || $cmd eq "inject";
extract() if $cmd eq 'x'   || $cmd eq "extract";
list()    if $cmd eq 'ls'  || $cmd eq "list";
listkey() if $cmd eq 'lsk' || $cmd eq "listkey";
mdelete() if $cmd eq 'rm'  || $cmd eq "delete";
locate()  if $cmd eq 'lo'  || $cmd eq "locate";
# fail if we get this far

######################################################################
my @trackerinput = split(/\s*,\s*/, $opts{trackers});
my (@trackers, %pref_ip);
foreach my $tracker (@trackerinput) {
    if ($tracker =~ m!(.+)/(.+):(\d+)!) {
        $pref_ip{$2} = $1;
        push @trackers, "$2:$3";
    } else {
        push @trackers, $tracker;
    }
}

my $mogfs = MogileFS::Client->new(
    domain => $opts{domain},
    hosts  => \@trackers,
)
    or error("Could not initialize MogileFS", ERR_FATAL);
$mogfs->set_pref_ip(\%pref_ip);
my $err = shift() || "ERROR: no error message provided!";

if ($mogerr = $mogfs->errstr) {

$syserr =~ s/[\r\n]+$//;

my $exitcode = shift();

print STDERR "$err\n";
print STDERR "MogileFS backend error message: $mogerr\n" if $mogerr && $exitcode != ERR_MISSING;
print STDERR "System error message: $@\n" if $syserr;

# if a second argument, exit
if (defined ($exitcode)) {
my $src = shift @ARGV;
my $key = shift @ARGV;
abortWithUsage("source and key required to inject") unless $src && $key;

# make sure the source exists and the key is valid
die "Error: source $src doesn't exist.\n"
    unless -e $src;
die "Error: key $key isn't valid; must not contain spaces or commas.\n"
    unless $key =~ /^[^\s\,]+$/;

# before we get too far, find sendmail?
my $sendmail;
if ($opts{receipt}) {
    $sendmail = `which sendmail` || '/usr/sbin/sendmail';
    $sendmail =~ s/[\r\n]+$//;
    unless (-e $sendmail) {
        die "Error: attempted to find sendmail binary in /usr/sbin but couldn't.\n";
    }
}
# open up O as the handle to use for reading data
my $type = 'unknown';
if (-d $src) {
    $type = 'tarball';
    my $taropts = ($opts{gzip} ? 'z' : '') . "cf";
    open (O, '-|', 'tar', $taropts, '-', $src)
        or die "Couldn't open tar for reading: $!\n";
} elsif (-f $src) {
    $type = 'file';
    open (O, '<', $src)
        or die "Couldn't open file for reading: $!\n";
} elsif (-b $src) {
    $type = 'partition';
    open (O, '<', $src)
        or die "Couldn't open block device for reading: $!\n";
} else {
    die "Error: not file, directory, or partition.\n";
}
# now do some pre-file checking...
if ($type ne 'file') {
    die "Error: you specified to store a file of type $type but didn't specify --bigfile. Please see documentation.\n"
        unless $opts{big};
} elsif ($size > 64 * 1024 * 1024) {
    die "Error: the file is more than 64MB and you didn't specify --bigfile. Please see documentation, or use --nobigfile to disable large file chunking and allow large single file uploads.\n"
        unless $opts{big} || $opts{nobig};
}

if ($opts{big} && $opts{nobig}) {
    die "Error: You cannot specify both --bigfile and --nobigfile\n";
}

if ($opts{nobig} && $opts{gzip}) {
    die "Error: --gzip is not compatible with --nobigfile\n";
}
# see if there's already a pre file?
my $data = $mogfs->get_file_data("_big_pre:$key");
if ($data) {
    unless ($opts{overwrite}) {
        error(<<MSG, ERR_FATAL);
ERROR: The pre-insert file for $key exists. This indicates that a previous
attempt to inject a file failed--or is still running elsewhere! Please
verify that a previous injection of this file is finished, or run mogtool
again with the --overwrite inject option.
MSG
    }

    # delete the pre notice since we didn't die (overwrite must be on)
    $mogfs->delete("_big_pre:$key")
        or error("ERROR: Unable to delete _big_pre:$key.", ERR_FATAL);
}

# now create our pre notice
my $prefh = $mogfs->new_file("_big_pre:$key", $opts{class})
    or error("ERROR: Unable to create _big_pre:$key.", ERR_FATAL);
$prefh->print("starttime:" . time());
$prefh->close()
    or error("ERROR: Unable to save to _big_pre:$key.", ERR_FATAL);
# setup config and temporary variables we're going to be using
my $chunk_size = 64 * 1024 * 1024; # 64 MB

if ($opts{chunksize} && ($opts{chunksize} =~ m!^(\d+)(G|M|K|B)?!i)) {
    $chunk_size = $1;
    unless (lc $2 eq 'b') {
        $chunk_size *= (1024 ** ( { g => 3, m => 2, k => 1 }->{lc $2} || 2 ));
    }
    print "NOTE: Using chunksize of $chunk_size bytes.\n";
}

my $read_size = ($chunk_size > 1024*1024 ? 1024*1024 : $chunk_size);

# temporary variables
my %chunkinfo; # { id => [ md5, length ] }
my %chunkbuf;  # { id => data }
my %children;  # { pid => chunknum }
my %chunksout; # { chunknum => pid }
# this function writes out a chunk
my $save_chunk = sub {
    my $cn = shift() + 0;

    # get the length of the chunk we're going to send
    my $bufsize = length $chunkbuf{$cn};
    return unless $bufsize;

    # now spawn off a child to do the real work
    if (my $pid = fork()) {
        print "Spawned child $pid to deal with chunk number $cn.\n";
        $chunksout{$cn} = $pid;
        $children{$pid} = $cn;
        return;
    }

    # drop other memory references we're not using anymore
    foreach my $chunknum (keys %chunkbuf) {
        next if $chunknum == $cn;
        delete $chunkbuf{$chunknum};
    }

    # as a child, get a new mogile connection
    my $mogfs = get_mogfs();
    my $dkey = $opts{big} ? "$key,$cn" : "$key";

    my $start_time = [ gettimeofday() ];

    my $fh = $mogfs->new_file($dkey, $opts{class}, $bufsize);
    unless (defined $fh) {
        die "Unable to create new file";
    }
    $fh->print($chunkbuf{$cn});
    unless ($fh->close) {
        error("WARNING: Unable to save file '$dkey': $err");
        printf "This was try #$try and it's been %.2f seconds since we first tried. Retrying...\n", tv_interval($start_time);
    }

    my $diff = tv_interval($start_time);
    printf "  chunk $cn saved in %.2f seconds.\n", $diff;

    # make sure we never return, always exit
    exit 0;
};
# just used to reap our children in a loop until they're done. also
# handles respawning a child that failed.
my $reap_children = sub {
    # find out if we have any kids dead
    while ((my $pid = waitpid -1, WNOHANG) > 0) {
        my $cnum = delete $children{$pid};
        unless (defined $cnum) {
            print "Error: reaped child $pid, but no idea what they were doing...\n";
            next;
        }

        if (my $status = $?) {
            print "Error: reaped child $pid for chunk $cnum returned non-zero status... Retrying...\n";
        }

        my @paths = grep { defined $_ } $mogfs->get_paths($opts{big} ? "$key,$cnum" : "$key", 1);
        unless (@paths) {
            print "Error: reaped child $pid for chunk $cnum but no paths exist... Retrying...\n";
        }

        delete $chunkbuf{$cnum};
        delete $chunksout{$cnum};
        print "Child $pid successfully finished with chunk $cnum.\n";
    }
};
# this function handles parallel children
$opts{concurrent} ||= 1;
$opts{concurrent} = 1 if $opts{concurrent} < 1;
my $handle_children = sub {
    # here we pause while our children are working
    while ($first || scalar(keys %children) >= $opts{concurrent}) {
        select undef, undef, undef, 0.1;
    }

    # now spawn until we hit the limit
    foreach my $cnum (keys %chunkbuf) {
        next if $chunksout{$cnum};
        last if scalar(keys %children) >= $opts{concurrent};
    }
};
# setup compression stuff
# if they turned gzip on we may or may not need this stream, so make it
$zlib = deflateInit()
    or error("Error: unable to create gzip deflation stream", ERR_FATAL);

$upload_fh = $mogfs->new_file($key, $opts{class}, $size);
unless (defined $upload_fh) {
    die "Unable to create new file";
}
error("ERROR: Unable to open file '$key': $err");
# read one meg chunks while we have data
while (my $rv = read(O, $readbuf, $read_size)) {
    # if this is a file, and this is our first read, see if it's gzipped
    if (!$sum && $rv >= 2) {
        if (substr($readbuf, 0, 2) eq "\x1f\x8b") {
            # this is already gzipped, so just mark it as such and insert it
        }
        # now turn on our gzipping if the user wants the output gzipped
        $dogzip = 1 if $opts{gzip};
    }

    # now run it through the deflation stream before we process it here
    my ($out, $status) = $zlib->deflate($readbuf);
    error("Error: Deflation failure processing stream", ERR_FATAL)
        unless $status == Z_OK;
    $rv = length $readbuf;

    # we don't always get a chunk from deflate

    # Short circuit if we're just plopping up a big file.
    $upload_fh->print($readbuf);
    printf "Upload so far: $sum bytes [%.2f%% complete]\n",
        ($sum / $size * 100);
    # now stick our data into our real buffer
    if ($type ne 'tarball' && $size && $size > $read_size) {
        printf "Buffer so far: $bufsize bytes [%.2f%% complete]\r", ($sum / $size * 100);
    } else {
        print "Buffer so far: $bufsize bytes\r";
    }

    # if we have one chunk, handle it
    if ($opts{big} && $bufsize >= $chunk_size) {
        $chunkbuf{++$chunknum} = substr($buf, 0, $chunk_size);

        # calculate the md5, print out status, and save this chunk
        my $md5 = md5_hex($buf);
        if ($opts{big}) {
            print "chunk $key,$chunknum: $md5, len = $chunk_size\n";
        } else {
            print "file $key: $md5, len = $chunk_size\n";
        }
        $chunkinfo{$chunknum} = [ $md5, $chunk_size ];

        # reset for the next read loop
        $buf = substr($buf, $chunk_size);
        $bufsize = length $buf;

        # now spawn children to save chunks
        $handle_children->();
    }
}
# now we need to flush the gzip engine
if ($dogzip) {
    my ($out, $status) = $zlib->flush;
    error("Error: Deflation failure processing stream", ERR_FATAL)
        unless $status == Z_OK;
    $bufsize += length $out;
}

$chunkbuf{++$chunknum} = $buf;
my $md5 = md5_hex($buf);
if ($opts{big}) {
    print "chunk $key,$chunknum: $md5, len = $bufsize\n";
} else {
    print "file $key: $md5, len = $bufsize\n";
}
$chunkinfo{$chunknum} = [ $md5, $bufsize ];

# now, while we still have chunks to process...
$handle_children->();
# verify replication and chunks
my %paths; # { chunknum => [ path, path, path ... ] }
my %still_need = ( %chunkinfo );
while (%still_need) {
    print "Replicating: " . join(' ', sort { $a <=> $b } keys %still_need) . "\n";
    sleep 1; # give things time to replicate some

    # now iterate over each and get the paths
    foreach my $num (keys %still_need) {
        my $dkey = $opts{big} ? "$key,$num" : $key;
        my @npaths = grep { defined $_ } $mogfs->get_paths($dkey, 1);
        unless (@npaths) {
            error("FAILURE: chunk $num has no paths at all.", ERR_FATAL);
        }

        if (scalar(@npaths) >= 2 || $opts{noreplwait}) {
            # okay, this one's replicated, actually verify the paths
            foreach my $path (@npaths) {
                if ($opts{verify}) {
                    print "  Verifying chunk $num, path $path...";
                    my $data = get($path);
                    my $len = length($data);
                    my $md5 = md5_hex($data);
                    if ($md5 ne $chunkinfo{$num}->[0]) {
                        print "md5 mismatch\n";
                        next;
                    } elsif ($len != $chunkinfo{$num}->[1]) {
                        print "length mismatch ($len, $chunkinfo{$num}->[1])\n";
                        next;
                    }
                    print "ok\n";
                } elsif ($opts{receipt}) {
                    # just do a quick size check
                    print "  Size verifying chunk $num, path $path...";
                    my $clen = (head($path))[1] || 0;
                    unless ($clen == $chunkinfo{$num}->[1]) {
                        print "length mismatch ($clen, $chunkinfo{$num}->[1])\n";
                        next;
                    }
                    print "ok\n";
                }

                push @{$paths{$num} ||= []}, $path;
            }

            # now make sure %paths contains at least 2 verified
            next if scalar(@{$paths{$num} || []}) < 2 && !$opts{noreplwait};
            delete $still_need{$num};
        }
    }
}
# prepare the info file
my $des = $opts{des} || 'no description';
my $compressed = $opts{gzip} ? '1' : '0';
#FIXME: add 'partblocks' to info file

# create the info file
my $info = <<INFO;
des $des
type $type
compressed $compressed
INFO

foreach (sort { $a <=> $b } keys %chunkinfo) {
    $info .= "part $_ bytes=$chunkinfo{$_}->[1] md5=$chunkinfo{$_}->[0] paths: ";
    $info .= join(', ', @{$paths{$_} || []});
    $info .= "\n";
}
# now write out the info file
my $fhinfo = $mogfs->new_file("_big_info:$key", $opts{class})
    or error("ERROR: Unable to create _big_info:$key.", ERR_FATAL);
$fhinfo->print($info);
$fhinfo->close()
    or error("ERROR: Unable to save _big_info:$key.", ERR_FATAL);

# wait for the info file to replicate, then verify its contents
print "Waiting for info file replication...\n" unless $opts{noreplwait};
while (!$opts{noreplwait}) {
    my @paths = $mogfs->get_paths("_big_info:$key", 1);
    unless (scalar(@paths) >= 2) {
        select undef, undef, undef, 0.25;
        next;
    }

    foreach my $path (@paths) {
        my $data = get($path);
        error("  FATAL: content mismatch on $path", ERR_FATAL)
            unless $data eq $info;
    }
    last;
}

# now delete our pre file
print "Deleting pre-insert file...\n";
$mogfs->delete("_big_pre:$key")
    or error("ERROR: Unable to delete _big_pre:$key", ERR_FATAL);
# Wrap up the non big file...
unless ($upload_fh->close) {
    error("ERROR: Unable to close file '$key': $err");
}

# now email and save a receipt
if ($opts{receipt}) {
    open MAIL, "| $sendmail -t"
        or error("ERROR: Unable to open sendmail binary: $sendmail", ERR_FATAL);
    print MAIL <<EOF;
To: $opts{receipt}
From: mogtool\@dev.null
Subject: mogtool.$key.receipt

$info
EOF
    close MAIL;
    print "Receipt emailed.\n";

    # now dump to a file
    open FILE, ">mogtool.$key.receipt"
        or error("ERROR: Unable to create file mogtool.$key.receipt in current directory.", ERR_FATAL);
    print FILE $info;
    close FILE;
    print "Receipt stored in mogtool.$key.receipt.\n";
}
# parse out the header data
$res->{des}        = ($info =~ /^des\s+(.+)$/m)        ? $1 : undef;
$res->{type}       = ($info =~ /^type\s+(.+)$/m)       ? $1 : undef;
$res->{compressed} = ($info =~ /^compressed\s+(.+)$/m) ? $1 : undef;
$res->{filename}   = ($info =~ /^filename\s+(.+)$/m)   ? $1 : undef;
$res->{chunks}     = ($info =~ /^chunks\s+(\d+)$/m)    ? $1 : undef;
$res->{size}       = ($info =~ /^size\s+(\d+)$/m)      ? $1 : undef;

# now get the pieces
$res->{maxnum} = undef;
while ($info =~ /^part\s+(\d+)\s+bytes=(\d+)\s+md5=(.+)\s+paths:\s+(.+)$/mg) {
    $res->{maxnum} = $1 if !defined $res->{maxnum} || $1 > $res->{maxnum};
    $res->{parts}->{$1} = {
        bytes => $2,
        md5   => $3,
        paths => [ split(/\s*,\s*/, $4) ],
    };
}
my $key  = shift @ARGV;
my $dest = shift @ARGV;
abortWithUsage("key and destination required to extract") unless $key && $dest;

error("Error: key $key isn't valid; must not contain spaces or commas.", ERR_FATAL)
    unless $key =~ /^[^\s\,]+$/;
unless ($dest eq '-' || $dest eq '.') {
    error("Error: destination exists: $dest (specify --overwrite if you want to kill it)", ERR_FATAL)
        if -e $dest && !$opts{overwrite} && !-b $dest;
}

# see if this is really a big file
my $file = {};
if ($opts{big}) {
    my $info = $mogfs->get_file_data("_big_info:$key");
    die "$key doesn't seem to be a valid big file.\n"
        unless $info && $$info;

    $file = _parse_info($$info);

    # make sure we have enough info
    error("Error: info file doesn't contain the number of chunks", ERR_FATAL)
        unless $file->{chunks};
    error("Error: info file doesn't contain the total size", ERR_FATAL)
        unless $file->{size};
} else {
    # not a big file, so it has to be of a certain type
    $file->{type} = 'file';
    $file->{maxnum} = 1;
    $file->{parts}->{1} = {
        paths => [ grep { defined $_ } $mogfs->get_paths($key) ],
    };

    # now, if it doesn't exist..
    unless (scalar(@{$file->{parts}->{1}->{paths}})) {
        error("Error: file doesn't exist (or did you forget --bigfile?)", ERR_FATAL);
    }
}
# several cases.. going to STDOUT?

# open up O as the handle to use for writing data
if ($file->{type} eq 'file' || $file->{type} eq 'partition' ||
    ($file->{type} eq 'tarball' && $opts{asfile})) {
    # just write it to the file with this name, but don't overwrite?
    if ($dest eq '.') {
        $dest = $file->{filename};
        $dest =~ s!^(.+)/!!;
    }

    if (-b $dest) {
        # if we're targeting a block device...
        warn "FIXME: add in block checking\n";
        open O, '>', $dest
            or die "Couldn't open $dest: $!\n";
    } elsif (-e $dest) {
        if ($opts{overwrite}) {
            open O, '>', $dest
                or die "Couldn't open $dest: $!\n";
        } else {
            die "File already exists: $dest ... won't overwrite without --overwrite.\n";
        }
    } else {
        open O, '>', $dest
            or die "Couldn't open $dest: $!\n";
    }
} elsif ($file->{type} eq 'tarball') {
    my $taropts = ($file->{compressed} ? 'z' : '') . "xf";
    open O, '|-', 'tar', $taropts, '-'
        or die "Couldn't open tar for writing: $!\n";
} else {
    die "Error: unable to handle type '$file->{type}'\n";
}
    # start fetching pieces
    foreach my $i (1..$file->{maxnum}) {
        print "Fetching piece $i...\n";

        foreach my $path (@{$file->{parts}->{$i}->{paths} || []}) {
            print "  Trying $path...\n";
            my $data = get($path);
            next unless defined $data;

            # now verify MD5, etc
            my $len = length $data;
            my $md5 = md5_hex($data);
            print "    ($len bytes, $md5)\n";
            next unless $len == $file->{parts}->{$i}->{bytes} &&
                        $md5 eq $file->{parts}->{$i}->{md5};

            # this chunk verified, write it out
            print O $data;
            last;
        }
    }

    # at this point the file should be complete!
    # now make sure we have enough data
    #$ mogtool [opts] extract <key> {<file>,<dir>,<device>}
    #=> - (for STDOUT) (if compressed, add "z" flag)
    #=> . (to untar) (if compressed, do nothing???, make .tar.gz file -- unless they use -z again?)
    #=> /dev/sda4 (but check /proc/partitions that it's big enough) (if compress, Compress::Zlib to ungzip)
    #=> foo.jpg (write it to a file)
    my $key = shift(@ARGV);
    abortWithUsage("key required to locate") unless $key;
    $opts{verify} = 1 unless defined $opts{verify};
    $opts{bigfile} = 0 unless $opts{big};

    my $dkey = $key;
    $dkey = "_big_info:$key" if $opts{big};

    # list all paths for the file
    my @paths = grep { defined $_ }
        $mogfs->get_paths($dkey,
                          { verify => $opts{verify}, pathcount => 1024 });
    if (@paths == 0 && $mogfs->errstr =~ /unknown_key/) {
        error("Error: bigfile $key doesn't exist (or did you force --bigfile?)", ERR_MISSING)
            if $opts{big};
        error("Error: file $key doesn't exist (or did you forget --bigfile?)", ERR_MISSING);
    }
    error("Error: Something went wrong", ERR_FATAL) if $mogfs->errstr;

    my $ct = 0;
    foreach my $path (@paths) {
        print "$path\n";
        $ct++;
    }
    print "#$ct paths found\n";
    # list all big files in mogile
    my ($ct, $after, $list);
    $ct = 0;
    while (($after, $list) = $mogfs->list_keys("_big_info:", $after)) {
        last unless $list && @$list;

        # now extract the key and dump it
        foreach my $key (@$list) {
            next unless $key =~ /^_big_info:(.+)$/;
            print "$1\n";
            $ct++;
        }
    }
    print "#$ct files found\n";
    my $key_pattern = shift @ARGV;
    $key_pattern = '' unless defined $key_pattern;

    # list all files matching a key
    my ($ct, $after, $list);
    $ct = 0;
    while (($after, $list) = $mogfs->list_keys($key_pattern, $after)) {
        last unless $list && @$list;

        # now extract the key and dump it
        foreach my $key (@$list) {
            print "$key\n";
            $ct++;
        }
    }
    error("Error: Something went wrong", ERR_FATAL)
        if $mogfs->errstr && $mogfs->errstr !~ /none_match/;
    print "#$ct files found\n";
    my $key = shift(@ARGV);
    abortWithUsage("key required to delete") unless $key;

    # delete simple file
    unless ($opts{big}) {
        my $rv = $mogfs->delete($key);
        error("Failed to delete: $key.", ERR_FATAL)
            unless $rv;
        return;
    }

    # big file: load and parse its info file first
    my $info = $mogfs->get_file_data("_big_info:$key");
    error("$key doesn't seem to be a valid big file.", ERR_FATAL)
        unless $info && $$info;

    my $file = _parse_info($$info);

    # make sure we have enough info to delete
    error("Error: info file doesn't contain required information?", ERR_FATAL)
        unless $file->{chunks} && $file->{maxnum};

    # now delete each chunk, best attempt
    foreach my $i (1..$file->{maxnum}) {
        $mogfs->delete("$key,$i");
    }

    # delete the main pieces
    my $rv = $mogfs->delete("_big_info:$key");
    error("Unable to delete _big_info:$key.", ERR_FATAL)
        unless $rv;
abortWithUsage() if $opts{help};
sub abortWithUsage {
    my $msg = "!!!mogtool is DEPRECATED and will be removed in the future!!!\n";
    $msg .= join '', @_;

    if ($msg) {
        pod2usage( -verbose => 1, -exitval => 1, -message => "\n$msg\n" );
    } else {
        pod2usage( -verbose => 1, -exitval => 1 );
    }
}
Usage: mogtool [opts] <command> [command-opts] [command-args]

 * --trackers=<ip:port>[,<ip:port>]*

 * --conf=<file>      Location of config file listing trackers, default
                      domain, and default class.
                      Default: ~/.mogilefs, /etc/mogilefs/mogilefs.conf

 * --bigfile | -b     Tell mogtool to split file into 64MB chunks and
                      checksum the chunks.

 * --gzip | -z        Use gzip compression/decompression

 inject | i           Inject a file into MogileFS, by key
 extract | x          Extract a file from MogileFS, by key
 list | ls            List large files in MogileFS

$ mogtool [opts] inject [i-opts] <file,dir,device> <key>

    --overwrite       Ignore existing _big_pre: and start anew.
    --chunksize=n     Set the size of individual chunk files.  n is in the
                      format of number[scale], so 10 is 10 megabytes, 10M is
                      also 10 megs; 10G, 10B, 10K...
    --receipt=email   Send a receipt to the specified email address
    --verify          Make sure things replicate and then check the MD5s?
    --des=string      Set the file description

$ mogtool [opts] extract <key> {<file>,<dir>,<device>}

    => -          (for STDOUT) (if compressed, add "z" flag)
    => .          (to untar) (if compressed, do nothing???, make .tar.gz
                   file -- unless they use -z again?)
    => /dev/sda4  (but check /proc/partitions that it's big enough)
                  (if compress, Compress::Zlib to ungzip)
    => foo.jpg    (write it to a file)
# mogtool add --key='roast.sdb1.2004-11-07' -z /dev/sda1
<key> = "cow.2004.11.17"

# this is a temporary file that we delete when we're done recording all chunks
starttime=UNIXTIMESTAMP
# when done, we write the _info file and delete the _pre.

des Cow's ljdb backup as of 2004-11-17
type { partition, file, tarball }
filename ljbinlog.305.gz
partblocks 234324324324
part 1 <bytes> <md5hex>
part 2 <bytes> <md5hex>
part 3 <bytes> <md5hex>
part 4 <bytes> <md5hex>
part 5 <bytes> <md5hex>
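# Put together, a complete _big_info file might look like this (the field
# values below are illustrative, not from a real installation; "compressed",
# "size", and "chunks" follow the fields the code above reads via
# $file->{compressed}, $file->{size}, and $file->{chunks}):
#
#   des Cow's ljdb backup as of 2004-11-17
#   type file
#   compressed 0
#   filename ljbinlog.305.gz
#   size 163840000
#   chunks 5
#   part 1 33554432 <md5hex>
#   part 2 33554432 <md5hex>
#   part 3 33554432 <md5hex>
#   part 4 33554432 <md5hex>
#   part 5 29622272 <md5hex>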
BEGIN MOGTOOL RECEIPT

part 1 bytes=23423432 md5=2349823948239423984 paths: http://dev5/2/23/23/.fid, http://dev6/23/423/4/324.fid
part 2 bytes=23423432 md5=2349823948239423984 paths: http://dev5/2/23/23/.fid, http://dev6/23/423/4/324.fid
part 3 bytes=23423432 md5=2349823948239423984 paths: http://dev5/2/23/23/.fid, http://dev6/23/423/4/324.fid
part 4 bytes=23423432 md5=2349823948239423984 paths: http://dev5/2/23/23/.fid, http://dev6/23/423/4/324.fid
perl -w bin/mogtool --gzip inject --overwrite --chunksize=24M --des="This is a description" --receipt="marksmith@danga.com" ../music/jesse/Unsorted jesse.music.unsorted