#!/usr/bin/perl
# vim:ts=4 sw=4 et ft=perl:
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if 0; # not running under some shell
############################################################################

=head1 NAME

mogtool -- Inject/extract data to/from a MogileFS installation

B<WARNING>: this utility is deprecated! See L<MogileFS::Utils>

=head1 SYNOPSIS

  $ mogtool [general-opts] <command> [command-opts] <command-args>

  $ mogtool --trackers=127.0.0.1:6001 --domain=foo --class=bar ...
  $ mogtool --conf=foo.conf ...

  $ mogtool inject thefile.tgz thefilekey
  $ mogtool inject --bigfile thebigfile.tgz thefilekey
  $ mogtool inject --bigfile --gzip thebigfile.tar thefilekey
  $ mogtool inject --bigfile --gzip mydirectory thedirkey
  $ mogtool inject --bigfile --gzip /dev/hda4 thedevkey
  $ mogtool inject --nobigfile bigcontiguousfile bigcfilekey

  $ mogtool inject --bigfile --gzip --verify \
      --description="Description" \
      --receipt="foo@bar.com, baz@bar.com" \
      --concurrent=5 --chunksize=32M \
      somehugefile thefilekey

  $ mogtool extract thefilekey thenewfile.tgz
  $ mogtool extract thefilekey -
  $ mogtool extract --bigfile thedirkey .
  $ mogtool extract --bigfile --asfile thedirkey thefile.tgz
  $ mogtool extract --bigfile thedevkey /dev/hda4

  $ mogtool delete thekey

  $ mogtool locate --noverify thekey
  $ mogtool locate --bigfile thekey

  $ mogtool list
  $ mogtool listkey key

=head1 GENERAL OPTIONS

=over 4

=item --debug

Turn on MogileFS debug output.

=item --trackers=<[preferred_ip/]ip:port>[,<[preferred_ip/]ip:port>]*

Specify one or more trackers for your MogileFS installation. You can
optionally give a preferred IP to use in place of a tracker's default IP,
e.g. B<10.10.0.1/10.0.0.1:8081>. (A full command line using this form is
shown after this list.)

=item --domain=<domain>

Set the MogileFS domain to use.

=item --class=<class>

Set the class within the domain to use. Defaults to _default.

=item --conf=<file>

Specify a configuration file to load from.

=item --lib=<directory>

Specify a directory to add to the library search path. Right now, this should
be the directory where the MogileFS client modules (MogileFS::Client) can be
found, if they're not actually installed.

=back
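
For example, a one-off invocation against two trackers, preferring an
internal interface for the first (addresses here are illustrative), looks
like:

  $ mogtool --trackers=10.10.0.1/10.0.0.1:6001,10.0.0.2:6001 --domain=foo list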

=head1 COMMANDS

=over 4

=item inject|i

Insert a resource into MogileFS. See L</"INJECT OPTIONS"> and L</"INJECT ARGUMENTS">
for the rest of how to use the inject mode.

=item extract|x

Extract a resource from MogileFS. See L</"EXTRACT OPTIONS"> and L</"EXTRACT ARGUMENTS">
for how to use extract.

=item delete|rm

Delete a resource. See L</"DELETE OPTIONS"> and L</"DELETE ARGUMENTS">.

=item locate|lo key

List the paths to the file identified by the given key.

=item list|ls

List all big files contained in MogileFS. No options, no arguments.

=item listkey|lsk key

List all keys that match the given prefix. For example, if you specify "ABC1",
you'll get every key that starts with the characters "ABC1".

=back

=head1 INJECT OPTIONS

The following options control the behavior of the injector.

=over 4

=item --bigfile|-b

If specified, use chunking to break the resource into manageable pieces.

=item --chunksize=<size>[B|K|M|G]

When instructed to break files into chunks, the injector uses the specified
chunk size as the maximum chunk size. Defaults to 64M. Units are
case-insensitive; a bare number is treated as megabytes, so use the B suffix
if you really mean bytes. (A sketch of the unit handling appears after this
list.)

=item --gzip|-z

If specified, mogtool will gzip the data as it's going into MogileFS. The
resource will be marked as compressed.

Note that you do not need to specify this if the resource is already gzipped,
but it doesn't hurt. (We automatically detect that and mark it as compressed.)

=item --overwrite

If you were previously injecting a big file as chunks and the process died,
mogtool normally refuses to inject it again. Specify this option to force the
overwrite of that file.

B<NOTE:> Other than in the above case (partial failure), mogtool will not
prompt before overwriting an existing file.

=item --verify

If on, we do a full MD5 verification of every chunk after it is replicated.
This can take a while on large files!

=item --description=<text>

Specifies a description for this file. Optional, but assists in reporting and
listing the large files in MogileFS. (This is also displayed in any receipts
that are created.)

=item --receipt=<email address>[, <email address>]*

If specified, emails a copy of the receipt file to the specified
comma-separated email addresses. Also creates a local filesystem copy of the
receipt file.

=item --concurrent=<number>

Specifies the number of concurrent processes to run for MogileFS insertion.
If you notice mogtool spending most of its time waiting for children rather
than buffering data, you may wish to raise this number. The default is 1, but
we've found 3 or 4 work well.

=back
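
To make the unit handling concrete, here is a small standalone sketch that
mirrors how the injector interprets the size string (the helper name is ours,
not part of mogtool):

  # parse a --chunksize value such as "32M", "1000K", or "10" into bytes
  # (a bare number is treated as megabytes, matching the injector)
  sub parse_chunksize {
      my ($spec) = @_;
      return undef unless $spec =~ /^(\d+)([GMKB])?$/i;
      my ($n, $unit) = ($1, lc($2 || 'M'));
      my %power = ( b => 0, k => 1, m => 2, g => 3 );
      return $n * (1024 ** $power{$unit});
  }

  print parse_chunksize("32M");   # prints 33554432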

=head1 INJECT ARGUMENTS

=over 4

=item resource

What you actually want to inject. This can be a file, directory, or a raw
partition in the format I</dev/X>.

Please see L</"USAGE EXAMPLES"> for more information on how to inject these
different types of resources and the differences thereof.

=item key

Specifies the key to save this file to. For big files, the key you supply is
used as a base: bookkeeping data is stored under "_big_pre:<key>" (temporary)
and "_big_info:<key>", and each chunk is stored under "<key>,<n>", where <n>
is the chunk number. (See the example after this list.)

Generally, you want this to be descriptive so you remember what it is later
and can identify the file just by looking at the key.

=back
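
For example, injecting a big file under an illustrative key of
I<backup.2004.12> leaves keys along these lines once the injection completes
(the chunk count depends on the file size and chunk size):

  _big_info:backup.2004.12    metadata/receipt for the whole file
  backup.2004.12,1            first chunk
  backup.2004.12,2            second chunk
  backup.2004.12,3            ...and so on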

=head1 EXTRACT OPTIONS

=over 4

=item --bigfile|-b

If specified, indicates that this resource was chunked on injection and should
be reassembled for extraction.

=item --gzip|-z

Tells mogtool to ungzip the output if and only if it was compressed when
inserted into the MogileFS system. So, if you're extracting a file that wasn't
gzipped to begin with, this doesn't do anything.

=item --asfile

Useful when extracting something previously inserted as a directory--this
option instructs mogtool to treat the resource as a plain file and not run it
through tar for unpacking.

=back

=head1 EXTRACT ARGUMENTS

=over 4

=item key

Specifies the key to get the file from.

=item destination

What destination means varies depending on what type of resource you're
extracting. No matter what, you can specify a single dash (B<->) to mean
STDOUT. (A summary of the other forms appears after this list; see the usage
examples for full command lines.)

=back
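
In practice the destination is interpreted roughly as follows:

  -            write the resource to STDOUT
  .            for a plain file, save it under its original filename in the
               current directory; for a directory (tarball), unpack it here
  /dev/XXX     write the image directly to that block device
  <filename>   write the resource to that file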

=head1 DELETE OPTIONS

=over 4

=item --bigfile|-b

The resource is a "big file" and all chunks should be deleted.

=back

=head1 DELETE ARGUMENTS

=over 4

=item key

Specifies the key of the file to delete.

=back

=head1 LOCATE OPTIONS

=over 4

=item --verify

Verify that the returned paths actually contain the file. The locate command
defaults to verifying; you can disable that with --noverify.

=item --bigfile|-b

The resource is a "big file", and the paths of its information key
(I<_big_info:key>) should be printed.

=back

=head1 LOCATE ARGUMENTS

=over 4

=item key

Specifies the key of the file to locate.

=back

=head1 RETURN VALUES

=over 4

=item 0

Success during operation.

=item 1

During the locate, list, or listkey operation, the key was not found.

=item 2

Some fatal error occurred.

=back
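
These codes make mogtool easy to drive from a wrapper script. A minimal
sketch (the key name is illustrative, and a config file with trackers and
domain is assumed):

  # run a locate and branch on mogtool's documented exit codes
  my $code = system('mogtool', 'locate', 'somekey') >> 8;
  if    ($code == 0) { print "key located\n" }
  elsif ($code == 1) { print "key not found\n" }
  else               { print "fatal mogtool error\n" }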

=head1 USAGE EXAMPLES

I<Please note that all examples assume you have a default config file that
contains the tracker and domain to use. This saves us from having to clutter
up the command line.>

=head2 Small Files (<64MB)

When it comes to small files, mogtool is very, very easy to use.

=head3 Injection

  $ mogtool inject foo.dbm foo.dbm.2004.12

Injects the file I<foo.dbm> into MogileFS under the key of I<foo.dbm.2004.12>.

  $ mogtool inject --gzip foo.dbm foo.dbm.2004.12

Injects the same file to the same key, but compresses it on the fly for you.

=head3 Extraction

  $ mogtool extract foo.dbm.2004.12 newfoo.dbm

Retrieves the key I<foo.dbm.2004.12> and saves it as I<newfoo.dbm>.

  $ mogtool extract --gzip foo.dbm.2004.12 newfoo.dbm

Gets the file and automatically decompresses it, if and only if it was
compressed. So you can turn on gzip in your config file and mogtool will do
the smart thing each time.

  $ mogtool extract foo.dbm.2004.12 -

Prints the resource to standard output. If you want, you can pipe it somewhere
or redirect it to a file (but why not just specify the filename?).

=head2 Large Files (>64MB)

Given mogtool's ability to break files into chunks and later reassemble them,
inserting large files (even files over the 4GB barrier) is relatively easy.

=head3 Injection

  $ mogtool inject --bigfile largefile.dat largefile.dat

As expected, inserts the file I<largefile.dat> into the MogileFS system under
the name I<largefile.dat>. Not very creative. Uses the default 64MB chunks.

  $ mogtool inject --bigfile --chunksize=16M largefile.dat largefile.dat

Use 16MB chunks instead of the default. Otherwise, the same.

  $ mogtool inject --bigfile --chunksize=1000K --gzip largefile.dat somekey

Do it again, but specify 1000KB chunks, gzip automatically, and upload it
under a different key, I<somekey>.

  $ mogtool inject --bigfile --concurrent=5 --gzip largefile.dat somekey

Same as above, but use 5 child processes for uploading chunks to MogileFS.
This can use several hundred megabytes of memory in this example! (It tends
to use about (concurrency + 1) * chunksize bytes, so with the default 64MB
chunks that's (5 + 1) * 64MB, or roughly 384MB.)

  $ mogtool inject --bigfile --chunksize=32M --concurrent=3 --gzip \
      --receipt="foo@bar.com" --verify --description="A large file" \
      largefile.dat somekey

Break this file into 32MB chunks, set a description, use 3 children to upload
them, gzip the file as you go, do a full MD5 verification of every chunk, then
email a receipt with all of the MogileFS paths to foo@bar.com.

Lots of flexibility with mogtool.

=head3 Extraction

  $ mogtool extract --bigfile somekey newfile.dat

In its basic form, extracts the previously inserted large file and saves it as
I<newfile.dat>.

  $ mogtool extract --bigfile --gzip somekey newfile.dat

If the file was gzipped on entry, ungzip it and save the result. If it wasn't
gzipped, then we just save it.

=head2 Directories

Directories are easily injected and extracted with mogtool. To create the
data stream that is inserted into MogileFS, we run the directory through tar.
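
Under the hood the directory case is essentially a pipe from tar, simplified
here from the injector code later in this file (the directory name is
illustrative, and the B<z> flag is added only when --gzip is in effect):

  # stream the directory through tar; mogtool chunks whatever comes out
  my ($dir, $gzip) = ('mydir', 1);
  my $taropts = ($gzip ? 'z' : '') . 'cf';
  open my $tar, '-|', 'tar', $taropts, '-', $dir
      or die "Couldn't open tar for reading: $!\n";
  # ... read from $tar in chunks and write each chunk to MogileFS ...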

=head3 Injection

  $ mogtool inject --bigfile mydir mykey

Runs I<mydir> through tar and saves the result as I<mykey>.

  $ mogtool inject --bigfile --gzip --concurrent=5 mydir mykey

Inject, but also gzip and use multiple injectors.

I<Note how this is just like injecting a large file. See the injection
examples for large files for more examples.>

=head3 Extraction

  $ mogtool extract --bigfile mykey .

Extracts the previously injected directory I<mykey> into your local directory.

  $ mogtool extract --bigfile --asfile mykey foo.tar

Takes the previously generated tarball and saves it as I<foo.tar>. This simply
creates the file instead of extracting everything inside it.

=head2 Partitions/Devices

mogtool can inject raw partitions into MogileFS and later retrieve them and
write them back to a partition. For the most part they're treated just like
directories; we just don't pipe anything through tar.

=head3 Injection

  $ mogtool inject --bigfile /dev/hda3 hda3.backup

Saves a raw copy of the partition I</dev/hda3> under the key I<hda3.backup>.

  $ mogtool inject --bigfile --gzip /dev/hda3 hda3.backup

Same, but compress on the fly during injection.

=head3 Extraction

  $ mogtool extract --bigfile hda3.backup /dev/hda4

Extracts the image stored at I<hda3.backup> to the partition I</dev/hda4>.
B<WARNING:> mogtool won't ask for confirmation, so make sure you don't mistype
partition numbers!
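
mogtool does not yet check that the target partition is big enough (see
L</"PLANNED FEATURES">), so it's worth checking by hand before extracting. A
rough standalone sketch, assuming you fill in the device name and the size
recorded in the receipt/info file:

  # compare the stored image size against the target partition's size
  my $target_dev  = 'hda4';   # partition you intend to overwrite
  my $stored_size = 0;        # the "size" line from the info/receipt file
  my $dev_bytes   = 0;
  open my $parts, '<', '/proc/partitions'
      or die "can't read /proc/partitions: $!";
  while (<$parts>) {
      my (undef, undef, $blocks, $name) = split ' ';
      $dev_bytes = $blocks * 1024 if defined $name && $name eq $target_dev;
  }
  close $parts;
  die "target partition is too small\n" if $dev_bytes < $stored_size;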

=head2 Deleting a Resource

B<WARNING:> Please make sure you're specifying the right parameter, as delete
does not prompt for confirmation of the request!

  $ mogtool delete thekey

Deletes a normal file.

  $ mogtool delete --bigfile thekey

Deletes a chunked file--this removes all chunks and the receipt, so the file
is gone.
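
If you want a confirmation step, wrap the call yourself. A tiny sketch (the
key name is illustrative):

  # ask before issuing an irreversible delete
  my $key = 'thekey';
  print "Really delete '$key' from MogileFS? [y/N] ";
  chomp(my $answer = <STDIN>);
  if (lc($answer) eq 'y') {
      system('mogtool', 'delete', $key) == 0
          or die "mogtool delete failed\n";
  }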

=head2 Listing Big Files

  $ mogtool list

Lists all large files stored in MogileFS. It is not possible to list all
normal files at this time.

=head2 Listing Files Matching a Key

  $ mogtool listkey abc1

Lists all files in MogileFS whose keys start with the characters "abc1".

=head1 CONFIGURATION FILE

Instead of putting a pile of options on the command line every time, mogtool
lets you create a default configuration file that it reads options from. It
searches two locations for a default configuration file: B<~/.mogtool> and
B</etc/mogilefs/mogtool.conf>. (Alternately, you can specify B<--conf=whatever>
as an option on the command line.) Options given on the command line take
precedence over values from the configuration file.

The file can consist of any number of the following items:

  trackers = 10.0.0.3:7001, 10.10.0.5/10.0.0.5:7001
  domain = mogiledomain
  class = fileclass
  lib = /home/foo/lib
  gzip = 1
  big = 1
  overwrite = 1
  chunksize = 32M
  receipt = foo@bar.com, baz@bar.com
  verify = 1
  concurrent = 3
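
The parsing is deliberately simple: B<#> starts a comment, whitespace around
the B<=> is ignored, and anything already set on the command line wins. A
standalone sketch, simplified from the option handling later in this script:

  # read key = value pairs without clobbering command-line options
  my %opts;   # normally populated by Getopt::Long first
  for my $conf ("$ENV{HOME}/.mogtool", "/etc/mogilefs/mogtool.conf") {
      next unless -e $conf;
      open my $fh, '<', $conf or next;
      while (my $line = <$fh>) {
          $line =~ s/#.*//;                        # strip comments
          next unless $line =~ /(\w+)\s*=\s*(.+)/;
          $opts{$1} = $2 unless $opts{$1};         # command line takes precedence
      }
      close $fh;
  }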

=head1 KNOWN BUGS

None? Send me any you find! :)

=head1 PLANNED FEATURES

=over 4

=item --concurrent for extract

It would be nice to have concurrent extraction going on.

=item recover mode

If the receipt file is ever corrupt in MogileFS, it would be useful to be able
to recover a file given just a receipt. It would take the same arguments as
the extract mode, except using a receipt file as the data source.

=item partition size verification

We can easily get the partition size when we save one to MogileFS, so we
should use that information during extraction to determine whether a target
partition is going to be big enough.

=item on the fly gzip extraction

Right now we can gzip on injection, but we should also support decompressing
on the fly on the way out of MogileFS.

=item make list take a prefix

If you can specify a prefix, that makes it easier to find small files stored
in MogileFS.

=item more information on list

Have list load up the info file and parse it for information about each of
the big files being stored. Maybe have this as an option (-l). (This means
the reading and parsing of info files should be abstracted into a function.)

=back

=head1 AUTHOR

Mark Smith E<lt>junior@danga.comE<gt> - most of the implementation and maintenance.

Brad Fitzpatrick E<lt>brad@danga.comE<gt> - concepts and rough draft.

Robin H. Johnson E<lt>robbat2@orbis-terrarum.netE<gt> - locate function.

Copyright (c) 2002-2004 Danga Interactive. All rights reserved.

=cut

539 ##############################################################################
541 use strict;
542 use Getopt::Long;
543 use Pod::Usage qw{ pod2usage };
544 use Digest::MD5 qw{ md5_hex };
545 use Time::HiRes qw{ gettimeofday tv_interval };
546 use LWP::Simple;
547 use POSIX qw(:sys_wait_h);
548 use Compress::Zlib;
550 $| = 1;
552 use constant ERR_MISSING => 1;
553 use constant ERR_FATAL => 2;
555 my %opts;
556 $opts{help} = 0;
558 abortWithUsage() unless
559 GetOptions(
560 # general purpose options
561 'trackers=s' => \$opts{trackers},
562 'domain=s' => \$opts{domain},
563 'class=s' => \$opts{class},
564 'config=s' => \$opts{config},
565 'help' => \$opts{help},
566 'debug' => \$MogileFS::DEBUG,
567 'lib=s' => \$opts{lib},
569 # extract+inject options
570 'gzip|z' => \$opts{gzip},
571 'bigfile|b' => \$opts{big},
572 'nobigfile' => \$opts{nobig},
574 # inject options
575 'overwrite' => \$opts{overwrite},
576 'chunksize=s' => \$opts{chunksize},
577 'receipt=s' => \$opts{receipt},
578 'reciept=s' => \$opts{receipt}, # requested :)
579 'verify!' => \$opts{verify},
580 'description=s' => \$opts{des},
581 'concurrent=i' => \$opts{concurrent},
582 'noreplwait' => \$opts{noreplwait},
584 # extract options
585 'asfile' => \$opts{asfile},
588 # now load the config file?
589 my @confs = ( $opts{config}, "$ENV{HOME}/.mogtool", "/etc/mogilefs/mogtool.conf" );
590 foreach my $conf (@confs) {
591 next unless $conf && -e $conf;
592 open FILE, "<$conf";
593 foreach (<FILE>) {
594 s!#.*!!;
595 next unless m!(\w+)\s*=\s*(.+)!;
596 $opts{$1} = $2 unless $opts{$1};
598 close FILE;
601 # now bring in MogileFS, because hopefully we have a lib by now
602 if ($opts{lib}) {
603 eval "use lib '$opts{lib}';";
606 # no trackers and domain..?
607 unless ($opts{trackers} && $opts{domain}) {
608 abortWithUsage("--trackers and --domain configuration required");
611 eval qq{
612 use MogileFS::Client; 1
613 } or die "Failed to load MogileFS::Client module: $@\n";
615 # init connection to mogile
616 my $mogfs = get_mogfs();
618 # get our command and pass off to our functions
619 my $cmd = shift;
620 inject() if $cmd eq 'i' || $cmd eq "inject";
621 extract() if $cmd eq 'x' || $cmd eq "extract";
622 list() if $cmd eq 'ls' || $cmd eq "list";
623 listkey() if $cmd eq 'lsk' || $cmd eq "listkey";
624 mdelete() if $cmd eq 'rm' || $cmd eq "delete";
625 locate() if $cmd eq 'lo' || $cmd eq "locate";
627 # fail if we get this far
628 abortWithUsage();
630 ######################################################################
632 sub get_mogfs {
633 my @trackerinput = split(/\s*,\s*/, $opts{trackers});
634 my @trackers;
635 my %pref_ip;
636 foreach my $tracker (@trackerinput) {
637 if ($tracker =~ m!(.+)/(.+):(\d+)!) {
638 $pref_ip{$2} = $1;
639 push @trackers, "$2:$3";
640 } else {
641 push @trackers, $tracker;
645 my $mogfs = MogileFS::Client->new(
646 domain => $opts{domain},
647 hosts => \@trackers,
649 or error("Could not initialize MogileFS", ERR_FATAL);
650 $mogfs->set_pref_ip(\%pref_ip);
651 return $mogfs;
654 sub error {
655 my $err = shift() || "ERROR: no error message provided!";
657 my $mogerr = undef;
658 if ($mogerr = $mogfs->errstr) {
659 $mogerr =~ s/^\s+//;
660 $mogerr =~ s/\s+$//;
663 my $syserr = undef;
664 if ($@) {
665 $syserr = $@;
666 $syserr =~ s/[\r\n]+$//;
669 my $exitcode = shift();
671 print STDERR "$err\n";
672 print STDERR "MogileFS backend error message: $mogerr\n" if $mogerr && $exitcode != ERR_MISSING;
673 print STDERR "System error message: $syserr\n" if $syserr;
675 # if a second argument, exit
676 if (defined ($exitcode)) {
677 exit $exitcode+0;
681 sub inject {
682 my $src = shift @ARGV;
683 my $key = shift @ARGV;
684 abortWithUsage("source and key required to inject") unless $src && $key;
686 # make sure the source exists and the key is valid
687 die "Error: source $src doesn't exist.\n"
688 unless -e $src;
689 die "Error: key $key isn't valid; must not contain spaces or commas.\n"
690 unless $key =~ /^[^\s\,]+$/;
692 # before we get too far, find sendmail?
693 my $sendmail;
694 if ($opts{receipt}) {
695 $sendmail = `which sendmail` || '/usr/sbin/sendmail';
696 $sendmail =~ s/[\r\n]+$//;
697 unless (-e $sendmail) {
698 die "Error: attempted to find sendmail binary in /usr/sbin but couldn't.\n";
702 # open up O as the handle to use for reading data
703 my $type = 'unknown';
704 if (-d $src) {
705 my $taropts = ($opts{gzip} ? 'z' : '') . "cf";
706 $type = 'tarball';
707 open (O, '-|', 'tar', $taropts, '-', $src)
708 or die "Couldn't open tar for reading: $!\n";
709 } elsif (-f $src) {
710 $type = 'file';
711 open (O, "<$src")
712 or die "Couldn't open file for reading: $!\n";
713 } elsif (-b $src) {
714 $type = 'partition';
715 open (O, "<$src")
716 or die "Couldn't open block device for reading: $!\n";
717 } else {
718 die "Error: not file, directory, or partition.\n";
721 # now do some pre-file checking...
722 my $size = -s $src;
723 if ($type ne 'file') {
724 die "Error: you specified to store a file of type $type but didn't specify --bigfile. Please see documentation.\n"
725 unless $opts{big};
726 } elsif ($size > 64 * 1024 * 1024) {
727 die "Error: the file is more than 64MB and you didn't specify --bigfile. Please see documentation, or use --nobigfile to disable large file chunking and allow large single file uploads\n"
728 unless $opts{big} || $opts{nobig};
731 if ($opts{big} && $opts{nobig}) {
732 die "Error: You cannot specify both --bigfile and --nobigfile\n";
735 if ($opts{nobig} && $opts{gzip}) {
736 die "Error: --gzip is not compatible with --nobigfile\n";
739 # see if there's already a pre file?
740 if ($opts{big}) {
741 my $data = $mogfs->get_file_data("_big_pre:$key");
742 if (defined $data) {
743 unless ($opts{overwrite}) {
744 error(<<MSG, ERR_FATAL);
745 ERROR: The pre-insert file for $key exists. This indicates that a previous
746 attempt to inject a file failed--or is still running elsewhere! Please
747 verify that a previous injection of this file is finished, or run mogtool
748 again with the --overwrite inject option.
750 $$data
754 # delete the pre notice since we didn't die (overwrite must be on)
755 $mogfs->delete("_big_pre:$key")
756 or error("ERROR: Unable to delete _big_pre:$key.", ERR_FATAL);
759 # now create our pre notice
760 my $prefh = $mogfs->new_file("_big_pre:$key", $opts{class})
761 or error("ERROR: Unable to create _big_pre:$key.", ERR_FATAL);
762 $prefh->print("starttime:" . time());
763 $prefh->close()
764 or error("ERROR: Unable to save to _big_pre:$key.", ERR_FATAL);
767 # setup config and temporary variables we're going to be using
768 my $chunk_size = 64 * 1024 * 1024; # 64 MB
769 if ($opts{big}) {
770 if ($opts{chunksize} && ($opts{chunksize} =~ m!^(\d+)(G|M|K|B)?!i)) {
771 $chunk_size = $1;
772 unless (lc $2 eq 'b') {
773 $chunk_size *= (1024 ** ( { g => 3, m => 2, k => 1 }->{lc $2} || 2 ));
775 print "NOTE: Using chunksize of $chunk_size bytes.\n";
778 my $read_size = ($chunk_size > 1024*1024 ? 1024*1024 : $chunk_size);
780 # temporary variables
781 my $buf;
782 my $bufsize = 0;
783 my $chunknum = 0;
784 my %chunkinfo; # { id => [ md5, length ] }
785 my %chunkbuf; # { id => data }
786 my %children; # { pid => chunknum }
787 my %chunksout; # { chunknum => pid }
789 # this function writes out a chunk
790 my $emit = sub {
791 my $cn = shift() + 0;
792 return unless $cn;
794 # get the length of the chunk we're going to send
795 my $bufsize = length $chunkbuf{$cn};
796 return unless $bufsize;
798 # now spawn off a child to do the real work
799 if (my $pid = fork()) {
800 print "Spawned child $pid to deal with chunk number $cn.\n";
801 $chunksout{$cn} = $pid;
802 $children{$pid} = $cn;
803 return;
806 # drop other memory references we're not using anymore
807 foreach my $chunknum (keys %chunkbuf) {
808 next if $chunknum == $cn;
809 delete $chunkbuf{$chunknum};
812 # as a child, get a new mogile connection
813 my $mogfs = get_mogfs();
814 my $dkey = $opts{big} ? "$key,$chunknum" : "$key";
816 my $start_time = [ gettimeofday() ];
817 my $try = 0;
818 while (1) {
819 $try++;
820 eval {
821 my $fh = $mogfs->new_file($dkey, $opts{class}, $bufsize);
822 unless (defined $fh) {
823 die "Unable to create new file";
825 $fh->print($chunkbuf{$cn});
826 unless ($fh->close) {
827 die "Close failed";
830 if (my $err = $@) {
831 error("WARNING: Unable to save file '$dkey': $err");
832 printf "This was try #$try and it's been %.2f seconds since we first tried. Retrying...\n", tv_interval($start_time);
833 sleep 1;
834 next;
836 last;
838 my $diff = tv_interval($start_time);
839 printf " chunk $cn saved in %.2f seconds.\n", $diff;
841 # make sure we never return, always exit
842 exit 0;
845 # just used to reap our children in a loop until they're done. also
846 # handles respawning a child that failed.
847 my $reap_children = sub {
848 # find out if we have any kids dead
849 while ((my $pid = waitpid -1, WNOHANG) > 0) {
850 my $cnum = delete $children{$pid};
851 unless ($cnum) {
852 print "Error: reaped child $pid, but no idea what they were doing...\n";
853 next;
855 if (my $status = $?) {
856 print "Error: reaped child $pid for chunk $cnum returned non-zero status... Retrying...\n";
857 $emit->($cnum);
858 next;
860 my @paths = grep { defined $_ } $mogfs->get_paths($opts{big} ? "$key,$cnum" : "$key", 1);
861 unless (@paths) {
862 print "Error: reaped child $pid for chunk $cnum but no paths exist... Retrying...\n";
863 $emit->($cnum);
864 next;
866 delete $chunkbuf{$cnum};
867 delete $chunksout{$cnum};
868 print "Child $pid successfully finished with chunk $cnum.\n";
872 # this function handles parallel threads
873 $opts{concurrent} ||= 1;
874 $opts{concurrent} = 1 if $opts{concurrent} < 1;
875 my $handle_children = sub {
876 # here we pause while our children are working
877 my $first = 1;
878 while ($first || scalar(keys %children) >= $opts{concurrent}) {
879 $first = 0;
880 $reap_children->();
881 select undef, undef, undef, 0.1;
884 # now spawn until we hit the limit
885 foreach my $cnum (keys %chunkbuf) {
886 next if $chunksout{$cnum};
887 $emit->($cnum);
888 last if scalar(keys %children) >= $opts{concurrent};
892 # setup compression stuff
893 my $dogzip = 0;
894 my $zlib;
895 if ($opts{gzip}) {
896 # if they turned gzip on we may or may not need this stream, so make it
897 $zlib = deflateInit()
898 or error("Error: unable to create gzip deflation stream", ERR_FATAL);
901 my $upload_fh;
902 if ($opts{nobig}) {
903 eval {
904 $upload_fh = $mogfs->new_file($key, $opts{class}, $size);
905 unless (defined $upload_fh) {
906 die "Unable to create new file";
909 if (my $err = $@) {
910 error("ERROR: Unable to open file '$key': $err");
911 die "Giving up.\n";
915 # read one meg chunks while we have data
916 my $sum = 0;
917 my $readbuf = '';
918 while (my $rv = read(O, $readbuf, $read_size)) {
919 # if this is a file, and this is our first read, see if it's gzipped
920 if (!$sum && $rv >= 2) {
921 if (substr($readbuf, 0, 2) eq "\x1f\x8b") {
922 # this is already gzipped, so just mark it as such and insert it
923 $opts{gzip} = 1;
924 } else {
925 # now turn on our gzipping if the user wants the output gzipped
926 $dogzip = 1 if $opts{gzip};
930 # now run it through the deflation stream before we process it here
931 if ($dogzip) {
932 my ($out, $status) = $zlib->deflate($readbuf);
933 error("Error: Deflation failure processing stream", ERR_FATAL)
934 unless $status == Z_OK;
935 $readbuf = $out;
936 $rv = length $readbuf;
938 # we don't always get a chunk from deflate
939 next unless $rv;
942 $sum += $rv;
943 # Short circuit if we're just plopping up a big file.
944 if ($opts{nobig}) {
945 $upload_fh->print($readbuf);
946 if ($size) {
947 printf "Upload so far: $sum bytes [%.2f%% complete]\n",
948 ($sum / $size * 100);
950 next;
953 # now stick our data into our real buffer
954 $buf .= $readbuf;
955 $bufsize += $rv;
956 $readbuf = '';
958 # generate output
959 if ($type ne 'tarball' && $size && $size > $read_size) {
960 printf "Buffer so far: $bufsize bytes [%.2f%% complete]\r", ($sum / $size * 100);
961 } else {
962 print "Buffer so far: $bufsize bytes\r";
965 # if we have one chunk, handle it
966 if ($opts{big} && $bufsize >= $chunk_size) {
967 $chunkbuf{++$chunknum} = substr($buf, 0, $chunk_size);
969 # calculate the md5, print out status, and save this chunk
970 my $md5 = md5_hex($buf);
971 if ($opts{big}) {
972 print "chunk $key,$chunknum: $md5, len = $chunk_size\n";
973 } else {
974 print "file $key: $md5, len = $chunk_size\n";
976 $chunkinfo{$chunknum} = [ $md5, $chunk_size ];
978 # reset for the next read loop
979 $buf = substr($buf, $chunk_size);
980 $bufsize = length $buf;
982 # now spawn children to save chunks
983 $handle_children->();
986 close O;
988 # now we need to flush the gzip engine
989 if ($dogzip) {
990 my ($out, $status) = $zlib->flush;
991 error("Error: Deflation failure processing stream", ERR_FATAL)
992 unless $status == Z_OK;
993 $buf .= $out;
994 $bufsize += length $out;
995 $sum += length $out;
998 # final piece
999 if ($buf) {
1000 $chunkbuf{++$chunknum} = $buf;
1001 my $md5 = md5_hex($buf);
1002 if ($opts{big}) {
1003 print "chunk $key,$chunknum: $md5, len = $bufsize\n";
1004 } else {
1005 print "file $key: $md5, len = $bufsize\n";
1007 $chunkinfo{$chunknum} = [ $md5, $bufsize ];
1010 # now, while we still have chunks to process...
1011 while (%chunkbuf) {
1012 $handle_children->();
1013 sleep 1;
1016 # verify replication and chunks
1017 my %paths; # { chunknum => [ path, path, path ... ] }
1018 my %still_need = ( %chunkinfo );
1019 while (%still_need) {
1020 print "Replicating: " . join(' ', sort { $a <=> $b } keys %still_need) . "\n";
1021 sleep 1; # give things time to replicate some
1023 # now iterate over each and get the paths
1024 foreach my $num (keys %still_need) {
1025 my $dkey = $opts{big} ? "$key,$num" : $key;
1026 my @npaths = grep { defined $_ } $mogfs->get_paths($dkey, 1);
1028 unless (@npaths) {
1029 error("FAILURE: chunk $num has no paths at all.", ERR_FATAL);
1032 if (scalar(@npaths) >= 2 || $opts{noreplwait}) {
1033 # okay, this one's replicated, actually verify the paths
1034 foreach my $path (@npaths) {
1035 if ($opts{verify}) {
1036 print " Verifying chunk $num, path $path...";
1037 my $data = get($path);
1038 my $len = length($data);
1039 my $md5 = md5_hex($data);
1040 if ($md5 ne $chunkinfo{$num}->[0]) {
1041 print "md5 mismatch\n";
1042 next;
1043 } elsif ($len != $chunkinfo{$num}->[1]) {
1044 print "length mismatch ($len, $chunkinfo{$num}->[1])\n";
1045 next;
1047 print "ok\n";
1048 } elsif ($opts{receipt}) {
1049 # just do a quick size check
1050 print " Size verifying chunk $num, path $path...";
1051 my $clen = (head($path))[1] || 0;
1052 unless ($clen == $chunkinfo{$num}->[1]) {
1053 print "length mismatch ($clen, $chunkinfo{$num}->[1])\n";
1054 next;
1056 print "ok\n";
1058 push @{$paths{$num} ||= []}, $path;
1061 # now make sure %paths contains at least 2 verified
1062 next if scalar(@{$paths{$num} || []}) < 2 && !$opts{noreplwait};
1063 delete $still_need{$num};
1068 # prepare the info file
1069 my $des = $opts{des} || 'no description';
1070 my $compressed = $opts{gzip} ? '1' : '0';
1071 #FIXME: add 'partblocks' to info file
1073 # create the info file
1074 my $info = <<INFO;
1075 des $des
1076 type $type
1077 compressed $compressed
1078 filename $src
1079 chunks $chunknum
1080 size $sum
1082 INFO
1083 foreach (sort { $a <=> $b } keys %chunkinfo) {
1084 $info .= "part $_ bytes=$chunkinfo{$_}->[1] md5=$chunkinfo{$_}->[0] paths: ";
1085 $info .= join(', ', @{$paths{$_} || []});
1086 $info .= "\n";
1089 # now write out the info file
1090 if ($opts{big}) {
1091 my $fhinfo = $mogfs->new_file("_big_info:$key", $opts{class})
1092 or error("ERROR: Unable to create _big_info:$key.", ERR_FATAL);
1093 $fhinfo->print($info);
1094 $fhinfo->close()
1095 or error("ERROR: Unable to save _big_info:$key.", ERR_FATAL);
1097 # verify info file
1098 print "Waiting for info file replication...\n" unless $opts{noreplwait};
1099 while (!$opts{noreplwait}) {
1100 my @paths = $mogfs->get_paths("_big_info:$key", 1);
1101 if (@paths < 2) {
1102 select undef, undef, undef, 0.25;
1103 next;
1105 foreach my $path (@paths) {
1106 my $data = get($path);
1107 error(" FATAL: content mismatch on $path", ERR_FATAL)
1108 unless $data eq $info;
1110 last;
1113 # now delete our pre file
1114 print "Deleting pre-insert file...\n";
1115 $mogfs->delete("_big_pre:$key")
1116 or error("ERROR: Unable to delete _big_pre:$key", ERR_FATAL);
1119 # Wrap up the non big file...
1120 if ($opts{nobig}) {
1121 eval {
1122 unless ($upload_fh->close) {
1123 die "Close failed";
1126 if (my $err = $@) {
1127 error("ERROR: Unable to close file '$key': $err");
1128 die "Giving up.\n";
1132 # now email and save a receipt
1133 if ($opts{receipt}) {
1134 open MAIL, "| $sendmail -t"
1135 or error("ERROR: Unable to open sendmail binary: $sendmail", ERR_FATAL);
1136 print MAIL <<MAIL;
1137 To: $opts{receipt}
1138 From: mogtool\@dev.null
1139 Subject: mogtool.$key.receipt
1141 $info
1143 MAIL
1144 close MAIL;
1145 print "Receipt emailed.\n";
1147 # now dump to a file
1148 open FILE, ">mogtool.$key.receipt"
1149 or error("ERROR: Unable to create file mogtool.$key.receipt in current directory.", ERR_FATAL);
1150 print FILE $info;
1151 close FILE;
1152 print "Receipt stored in mogtool.$key.receipt.\n";
1155 exit 0;
1158 sub _parse_info {
1159 my $info = shift;
1160 my $res = {};
1162 # parse out the header data
1163 $res->{des} = ($info =~ /^des\s+(.+)$/m) ? $1 : undef;
1164 $res->{type} = ($info =~ /^type\s+(.+)$/m) ? $1 : undef;
1165 $res->{compressed} = ($info =~ /^compressed\s+(.+)$/m) ? $1 : undef;
1166 $res->{filename} = ($info =~ /^filename\s+(.+)$/m) ? $1 : undef;
1167 $res->{chunks} = ($info =~ /^chunks\s+(\d+)$/m) ? $1 : undef;
1168 $res->{size} = ($info =~ /^size\s+(\d+)$/m) ? $1 : undef;
1170 # now get the pieces
1171 $res->{maxnum} = undef;
1172 while ($info =~ /^part\s+(\d+)\s+bytes=(\d+)\s+md5=(.+)\s+paths:\s+(.+)$/mg) {
1173 $res->{maxnum} = $1 if !defined $res->{maxnum} || $1 > $res->{maxnum};
1174 $res->{parts}->{$1} = {
1175 bytes => $2,
1176 md5 => $3,
1177 paths => [ split(/\s*,\s*/, $4) ],
1181 return $res;
1184 sub extract {
1185 my $key = shift @ARGV;
1186 my $dest = shift @ARGV;
1187 abortWithUsage("key and destination required to extract") unless $key && $dest;
1189 error("Error: key $key isn't valid; must not contain spaces or commas.", ERR_FATAL)
1190 unless $key =~ /^[^\s\,]+$/;
1191 unless ($dest eq '-' || $dest eq '.') {
1192 error("Error: destination exists: $dest (specify --overwrite if you want to kill it)", ERR_FATAL)
1193 if -e $dest && !$opts{overwrite} && !-b $dest;
1196 # see if this is really a big file
1197 my $file;
1198 if ($opts{big}) {
1199 my $info = $mogfs->get_file_data("_big_info:$key");
1200 die "$key doesn't seem to be a valid big file.\n"
1201 unless $info && $$info;
1203 # verify validity
1204 $file = _parse_info($$info);
1206 # make sure we have enough info
1207 error("Error: info file doesn't contain the number of chunks", ERR_FATAL)
1208 unless $file->{chunks};
1209 error("Error: info file doesn't contain the total size", ERR_FATAL)
1210 unless $file->{size};
1212 } else {
1213 # not a big file, so it has to be of a certain type
1214 $file->{type} = 'file';
1215 $file->{maxnum} = 1;
1216 $file->{parts}->{1} = {
1217 paths => [ grep { defined $_ } $mogfs->get_paths($key) ],
1220 # now, if it doesn't exist..
1221 unless (scalar(@{$file->{parts}->{1}->{paths}})) {
1222 error("Error: file doesn't exist (or did you forget --bigfile?)", ERR_FATAL);
1226 # several cases.. going to STDOUT?
1227 if ($dest eq '-') {
1228 *O = *STDOUT;
1229 } else {
1230 # open up O as the handle to use for reading data
1231 if ($file->{type} eq 'file' || $file->{type} eq 'partition' ||
1232 ($file->{type} eq 'tarball' && $opts{asfile})) {
1233 # just write it to the file with this name, but don't overwrite?
1234 if ($dest eq '.') {
1235 $dest = $file->{filename};
1236 $dest =~ s!^(.+)/!!;
1238 if (-b $dest) {
1239 # if we're targeting a block device...
1240 warn "FIXME: add in block checking\n";
1241 open O, ">$dest"
1242 or die "Couldn't open $dest: $!\n";
1243 } elsif (-e $dest) {
1244 if ($opts{overwrite}) {
1245 open O, ">$dest"
1246 or die "Couldn't open $dest: $!\n";
1247 } else {
1248 die "File already exists: $dest ... won't overwrite without --overwrite.\n";
1250 } else {
1251 open O, ">$dest"
1252 or die "Couldn't open $dest: $!\n";
1255 } elsif ($file->{type} eq 'tarball') {
1256 my $taropts = ($file->{compressed} ? 'z' : '') . "xf";
1257 open O, '|-', 'tar', $taropts, '-'
1258 or die "Couldn't open tar for writing: $!\n";
1260 } else {
1261 die "Error: unable to handle type '$file->{type}'\n";
1265 # start fetching pieces
1266 foreach my $i (1..$file->{maxnum}) {
1267 print "Fetching piece $i...\n";
1269 foreach my $path (@{$file->{parts}->{$i}->{paths} || []}) {
1270 print " Trying $path...\n";
1271 my $data = get($path);
1272 next unless $data;
1274 # now verify MD5, etc
1275 if ($opts{big}) {
1276 my $len = length $data;
1277 my $md5 = md5_hex($data);
1278 print " ($len bytes, $md5)\n";
1279 next unless $len == $file->{parts}->{$i}->{bytes} &&
1280 $md5 eq $file->{parts}->{$i}->{md5};
1283 # this chunk verified, write it out
1284 print O $data;
1285 last;
1289 # at this point the file should be complete!
1290 close O;
1291 print "Done.\n";
1293 # now make sure we have enough data
1294 #$ mogtool [opts] extract <key> {<file>,<dir>,<device>}
1295 #=> - (for STDOUT) (if compressed, add "z" flag)
1296 #=> . (to untar) (if compressed, do nothing???, make .tar.gz file -- unless they use -z again?)
1297 #=> /dev/sda4 (but check /proc/partitions that it's big enough) (if compress, Compress::Zlib to ungzip
1298 # => foo.jpg (write it to a file)
1301 # now check
1302 exit 0;
1305 sub locate {
1306 my $key = shift(@ARGV);
1307 abortWithUsage("key required to locate") unless $key;
1308 $opts{verify} = 1 unless defined $opts{verify};
1309 $opts{bigfile} = 0 unless $opts{big};
1311 my $dkey = $key;
1312 $dkey = "_big_info:$key" if $opts{big};
1314 # list all paths for the file
1315 my $ct = 0;
1317 my @paths = grep { defined $_ }
1318 $mogfs->get_paths($dkey,
1319 {verify => $opts{verify}, pathcount => 1024 },
1321 if(@paths == 0 && $mogfs->errstr =~ /unknown_key/) {
1322 error("Error: bigfile $key doesn't exist (or did you force --bigfile?)", ERR_MISSING) if $opts{big};
1323 error("Error: file $key doesn't exist (or did you forget --bigfile?)", ERR_MISSING);
1325 error("Error: Something went wrong", ERR_FATAL) if($mogfs->errstr);
1326 foreach my $key (@paths) {
1327 $ct++;
1328 print "$key\n";
1330 print "#$ct paths found\n";
1331 exit 0 if($ct > 0);
1332 exit ERR_MISSING;
1335 sub list {
1336 # list all big files in mogile
1337 my ($ct, $after, $list);
1338 $ct = 0;
1339 while (($after, $list) = $mogfs->list_keys("_big_info:", $after)) {
1340 last unless $list && @$list;
1342 # now extract the key and dump it
1343 foreach my $key (@$list) {
1344 next unless $key =~ /^_big_info:(.+)$/;
1346 $key = $1;
1347 $ct++;
1349 print "$key\n";
1352 print "#$ct files found\n";
1353 exit 0 if($ct > 0);
1354 exit ERR_MISSING;
1357 sub listkey {
1358 my $key_pattern = shift @ARGV;
1359 $key_pattern = '' unless defined $key_pattern;
1361 # list all files matching a key
1362 my ($ct, $after, $list);
1363 $ct = 0;
1364 while (($after, $list) = $mogfs->list_keys("$key_pattern", $after)) {
1365 last unless $list && @$list;
1367 # now extract the key and dump it
1368 foreach my $key (@$list) {
1370 $ct++;
1372 print "$key\n";
1375 error("Error: Something went wrong", ERR_FATAL) if ($mogfs->errstr && ! ($mogfs->errstr =~ /none_match/));
1376 print "#$ct files found\n";
1377 exit 0 if($ct > 0);
1378 exit ERR_MISSING;
1381 sub mdelete {
1382 my $key = shift(@ARGV);
1383 abortWithUsage("key required to delete") unless $key;
1385 # delete simple file
1386 unless ($opts{big}) {
1387 my $rv = $mogfs->delete($key);
1388 error("Failed to delete: $key.", ERR_FATAL)
1389 unless $rv;
1390 print "Deleted.\n";
1391 exit 0;
1394 # delete big file
1395 my $info = $mogfs->get_file_data("_big_info:$key");
1396 error("$key doesn't seem to be a valid big file.", ERR_FATAL)
1397 unless $info && $$info;
1399 # verify validity
1400 my $file = _parse_info($$info);
1402 # make sure we have enough info to delete
1403 error("Error: info file doesn't contain required information?", ERR_FATAL)
1404 unless $file->{chunks} && $file->{maxnum};
1406 # now delete each chunk, best attempt
1407 foreach my $i (1..$file->{maxnum}) {
1408 $mogfs->delete("$key,$i");
1411 # delete the main pieces
1412 my $rv = $mogfs->delete("_big_info:$key");
1413 error("Unable to delete _big_info:$key.", ERR_FATAL)
1414 unless $rv;
1415 print "Deleted.\n";
1416 exit 0;
1419 abortWithUsage() if $opts{help};
1421 sub abortWithUsage {
1422 my $msg = "!!!mogtool is DEPRECATED and will be removed in the future!!!\n";
1423 $msg .= join '', @_;
1425 if ( $msg ) {
1426 pod2usage( -verbose => 1, -exitval => 1, -message => "\n$msg\n" );
1427 } else {
1428 pod2usage( -verbose => 1, -exitval => 1 );
1433 __END__
1435 Usage: mogtool [opts] <command> [command-opts] [command-args]
1437 General options:
1438 * --trackers=<ip:port>[,<ip:port>]*
1440 * --domain=<domain>
1442 * --class=<class>
1444 * --conf=<file> Location of config file listing trackers, default
1445 domain, and default class
1447 Default: ~/.mogtool, /etc/mogilefs/mogtool.conf
1449 * --bigfile | -b Tell mogtool to split file into 64MB chunks and
1450 checksum the chunks,
1452 * --gzip | -z Use gzip compression/decompression
1455 Commands:
1457 inject | i Inject a file into MogileFS, by key
1458 extract | x Extract a file from MogileFS, by key
1459 list | ls List large files in MogileFS
1461 'inject' syntax:
1463 $ mogtool [opts] inject [i-opts] <file,dir,device> <key>
1465 Valid i-opts:
1466 --overwrite Ignore existing _big_pre: and start anew.
1467 --chunksize=n Set the size of individual chunk files. n is in the format of
1468 number[scale] so 10 is 10 megabytes, 10M is also 10 megs, 10G, 10B, 10K...
1469 case insensitive
1470 --receipt=email Send a receipt to the specified email address
1471 --verify Make sure things replicate and then check the MD5s?
1472 --des=string Set the file description
1475 $ mogtool [opts] extract <key> {<file>,<dir>,<device>}
1476 => - (for STDOUT) (if compressed, add "z" flag)
1477 => . (to untar) (if compressed, do nothing???, make .tar.gz file -- unless they use -z again?)
1478 => /dev/sda4 (but check /proc/partitions that it's big enough) (if compress, Compress::Zlib to ungzip)
1479 => foo.jpg (write it to a file)
1482 --key
1484 # mogtool add --key='roast.sdb1.2004-11-07' -z /dev/sda1
1488 <key> = "cow.2004.11.17"
1490 # this is a temporary file that we delete when we're doing recording all chunks
1492 _big_pre:<key>
1494 starttime=UNIXTIMESTAMP
1496 # when done, we write the _info file and delete the _pre.
1498 _big_info:<key>
1500 des Cow's ljdb backup as of 2004-11-17
1501 type { partition, file, tarball }
1502 compressed {0, 1}
1503 filename ljbinlog.305.gz
1504 partblocks 234324324324
1507 part 1 <bytes> <md5hex>
1508 part 2 <bytes> <md5hex>
1509 part 3 <bytes> <md5hex>
1510 part 4 <bytes> <md5hex>
1511 part 5 <bytes> <md5hex>
1513 _big:<key>,<n>
1514 _big:<key>,<n>
1515 _big:<key>,<n>
1518 Receipt format:
1520 BEGIN MOGTOOL RECEIPT
1521 type partition
1522 des Foo
1523 compressed foo
1525 part 1 bytes=23423432 md5=2349823948239423984 paths: http://dev5/2/23/23/.fid, http://dev6/23/423/4/324.fid
1526 part 1 bytes=23423432 md5=2349823948239423984 paths: http://dev5/2/23/23/.fid, http://dev6/23/423/4/324.fid
1527 part 1 bytes=23423432 md5=2349823948239423984 paths: http://dev5/2/23/23/.fid, http://dev6/23/423/4/324.fid
1528 part 1 bytes=23423432 md5=2349823948239423984 paths: http://dev5/2/23/23/.fid, http://dev6/23/423/4/324.fid
1531 END RECEIPT
1535 perl -w bin/mogtool --gzip inject --overwrite --chunksize=24M --des="This is a description" --receipt="marksmith@danga.com" ../music/jesse/Unsorted jesse.music.unsorted