Released as 20240522 ('Tbilisi')
1 #!/usr/bin/perl -w
# SPDX-FileCopyrightText: 2021-2024 Ole Tange, http://ole.tange.dk and Free Software Foundation, Inc.
4 # SPDX-License-Identifier: GFDL-1.3-or-later
5 # SPDX-License-Identifier: CC-BY-SA-4.0
7 =head1 GNU Parallel Tutorial
This tutorial shows off much of GNU B<parallel>'s functionality. It is
meant to teach the options and syntax of GNU B<parallel>. It is B<not>
meant to show realistic examples from the real world.
14 =head2 Reader's guide
If you prefer reading a book, buy B<GNU Parallel 2018> at
17 https://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
18 or download it at: https://doi.org/10.5281/zenodo.1146014
20 Otherwise start by watching the intro videos for a quick introduction:
21 https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
23 Then browse through the examples (B<man parallel_examples>). That will give
24 you an idea of what GNU B<parallel> is capable of.
26 If you want to dive even deeper: spend a couple of hours walking
27 through the tutorial (B<man parallel_tutorial>). Your command line
28 will love you for it.
30 Finally you may want to look at the rest of the manual (B<man
31 parallel>) if you have special needs not already covered.
33 If you want to know the design decisions behind GNU B<parallel>, try:
34 B<man parallel_design>. This is also a good intro if you intend to
35 change GNU B<parallel>.
39 =head1 Prerequisites
41 To run this tutorial you must have the following:
43 =over 9
45 =item parallel >= version 20160822
47 Install the newest version using your package manager (recommended for
48 security reasons), the way described in README, or with this command:
50 $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
51 fetch -o - http://pi.dk/3 ) > install.sh
52 $ sha1sum install.sh
53 12345678 51621b7f 1ee103c0 0783aae4 ef9889f8
54 $ md5sum install.sh
55 62eada78 703b5500 241b8e50 baf62758
56 $ sha512sum install.sh
57 160d3159 9480cf5c a101512f 150b7ac0 206a65dc 86f2bb6b bdf1a2bc 96bc6d06
58 7f8237c2 0964b67f bccf8a93 332528fa 11e5ab43 2a6226a6 ceb197ab 7f03c061
59 $ bash install.sh
61 This will also install the newest version of the tutorial which you
62 can see by running this:
64 man parallel_tutorial
66 Most of the tutorial will work on older versions, too.
69 =item abc-file:
71 The file can be generated by this command:
73 parallel -k echo ::: A B C > abc-file
75 =item def-file:
77 The file can be generated by this command:
79 parallel -k echo ::: D E F > def-file
81 =item abc0-file:
83 The file can be generated by this command:
85 perl -e 'printf "A\0B\0C\0"' > abc0-file
87 =item abc_-file:
89 The file can be generated by this command:
91 perl -e 'printf "A_B_C_"' > abc_-file
93 =item tsv-file.tsv
95 The file can be generated by this command:
97 perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv
99 =item num8
101 The file can be generated by this command:
103 perl -e 'for(1..8){print "$_\n"}' > num8
105 =item num128
107 The file can be generated by this command:
109 perl -e 'for(1..128){print "$_\n"}' > num128
111 =item num30000
113 The file can be generated by this command:
115 perl -e 'for(1..30000){print "$_\n"}' > num30000
117 =item num1000000
119 The file can be generated by this command:
121 perl -e 'for(1..1000000){print "$_\n"}' > num1000000
123 =item num_%header
125 The file can be generated by this command:
127 (echo %head1; echo %head2; \
128 perl -e 'for(1..10){print "$_\n"}') > num_%header
130 =item fixedlen
132 The file can be generated by this command:
134 perl -e 'print "HHHHAAABBBCCC"' > fixedlen
136 =item For remote running: ssh login on 2 servers with no password in
137 $SERVER1 and $SERVER2 must work.
139 SERVER1=server.example.com
140 SERVER2=server2.example.net
142 So you must be able to do this without entering a password:
144 ssh $SERVER1 echo works
145 ssh $SERVER2 echo works
It can be set up by running B<ssh-keygen -t ed25519; ssh-copy-id $SERVER1>
and using an empty passphrase, or you can use B<ssh-agent>. (DSA keys
are disabled by default in current OpenSSH versions.)
150 =back
153 =head1 Input sources
155 GNU B<parallel> reads input from input sources. These can be files, the
156 command line, and stdin (standard input or a pipe).
158 =head2 A single input source
160 Input can be read from the command line:
162 parallel echo ::: A B C
Output (the order may be different because the jobs are run in
parallel):

  A
  B
  C
171 The input source can be a file:
173 parallel -a abc-file echo
175 Output: Same as above.
177 STDIN (standard input) can be the input source:
179 cat abc-file | parallel echo
181 Output: Same as above.
184 =head2 Multiple input sources
186 GNU B<parallel> can take multiple input sources given on the command
187 line. GNU B<parallel> then generates all combinations of the input
188 sources:
190 parallel echo ::: A B C ::: D E F
Output (the order may be different):

  A D
  A E
  A F
  B D
  B E
  B F
  C D
  C E
  C F
204 The input sources can be files:
206 parallel -a abc-file -a def-file echo
208 Output: Same as above.
210 STDIN (standard input) can be one of the input sources using B<->:
212 cat abc-file | parallel -a - -a def-file echo
214 Output: Same as above.
216 Instead of B<-a> files can be given after B<::::>:
218 cat abc-file | parallel echo :::: - def-file
220 Output: Same as above.
222 ::: and :::: can be mixed:
224 parallel echo ::: A B C :::: def-file
226 Output: Same as above.
228 =head3 Linking arguments from input sources
230 With B<--link> you can link the input sources and get one argument
231 from each input source:
233 parallel --link echo ::: A B C ::: D E F
Output (the order may be different):

  A D
  B E
  C F
241 If one of the input sources is too short, its values will wrap:
243 parallel --link echo ::: A B C D E ::: F G
Output (the order may be different):

  A F
  B G
  C F
  D G
  E F
253 For more flexible linking you can use B<:::+> and B<::::+>. They work
254 like B<:::> and B<::::> except they link the previous input source to
255 this input source.
257 This will link ABC to GHI:
259 parallel echo :::: abc-file :::+ G H I :::: def-file
261 Output (the order may be different):
263 A G D
264 A G E
265 A G F
266 B H D
267 B H E
268 B H F
269 C I D
270 C I E
271 C I F
273 This will link GHI to DEF:
275 parallel echo :::: abc-file ::: G H I ::::+ def-file
277 Output (the order may be different):
279 A G D
280 A H E
281 A I F
282 B G D
283 B H E
284 B I F
285 C G D
286 C H E
287 C I F
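When the input sources are files of equal length, the pairing done by
B<--link> and B<::::+> can be compared with joining the files line by
line. The sketch below uses B<paste> for that; it is only an analogy
and does not run GNU B<parallel>. It recreates B<abc-file> and
B<def-file> with B<printf> so it is self-contained.

```shell
# Line-by-line pairing of two equally long files, similar in spirit to
#   parallel --link echo :::: abc-file def-file
# but done sequentially with paste (no GNU parallel involved).
printf '%s\n' A B C > abc-file
printf '%s\n' D E F > def-file
paste -d ' ' abc-file def-file
```

The analogy stops at files of different length: B<paste> emits empty
fields for the shorter file, whereas B<--link> wraps its values.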
289 If one of the input sources is too short when using B<:::+> or
290 B<::::+>, the rest will be ignored:
292 parallel echo ::: A B C D E :::+ F G
Output (the order may be different):

  A F
  B G
=head2 Changing the argument separator
302 GNU B<parallel> can use other separators than B<:::> or B<::::>. This is
303 typically useful if B<:::> or B<::::> is used in the command to run:
305 parallel --arg-sep ,, echo ,, A B C :::: def-file
Output (the order may be different):

  A D
  A E
  A F
  B D
  B E
  B F
  C D
  C E
  C F
319 Changing the argument file separator:
321 parallel --arg-file-sep // echo ::: A B C // def-file
323 Output: Same as above.
326 =head2 Changing the argument delimiter
328 GNU B<parallel> will normally treat a full line as a single argument: It
329 uses B<\n> as argument delimiter. This can be changed with B<-d>:
331 parallel -d _ echo :::: abc_-file
Output (the order may be different):

  A
  B
  C
339 NUL can be given as B<\0>:
341 parallel -d '\0' echo :::: abc0-file
343 Output: Same as above.
345 A shorthand for B<-d '\0'> is B<-0> (this will often be used to read files
346 from B<find ... -print0>):
348 parallel -0 echo :::: abc0-file
350 Output: Same as above.
352 =head2 End-of-file value for input source
354 GNU B<parallel> can stop reading when it encounters a certain value:
356 parallel -E stop echo ::: A B stop C D
Output:

  A
  B
363 =head2 Skipping empty lines
365 Using B<--no-run-if-empty> GNU B<parallel> will skip empty lines.
367 (echo 1; echo; echo 2) | parallel --no-run-if-empty echo
Output:

  1
  2
375 =head1 Building the command line
377 =head2 No command means arguments are commands
379 If no command is given after parallel the arguments themselves are
380 treated as commands:
382 parallel ::: ls 'echo foo' pwd
384 Output (the order may be different):
  [list of files in current dir]
  foo
  [/path/to/current/working/dir]
390 The command can be a script, a binary or a Bash function if the function is
391 exported using B<export -f>:
  # Only works in Bash
  my_func() {
    echo in my_func $1
  }
  export -f my_func
  parallel my_func ::: 1 2 3
400 Output (the order may be different):
402 in my_func 1
403 in my_func 2
404 in my_func 3
406 =head2 Replacement strings
408 =head3 The 7 predefined replacement strings
410 GNU B<parallel> has several replacement strings. If no replacement
411 strings are used the default is to append B<{}>:
413 parallel echo ::: A/B.C
415 Output:
417 A/B.C
419 The default replacement string is B<{}>:
421 parallel echo {} ::: A/B.C
423 Output:
425 A/B.C
427 The replacement string B<{.}> removes the extension:
429 parallel echo {.} ::: A/B.C
Output:

  A/B
435 The replacement string B<{/}> removes the path:
437 parallel echo {/} ::: A/B.C
Output:

  B.C
443 The replacement string B<{//}> keeps only the path:
445 parallel echo {//} ::: A/B.C
Output:

  A
451 The replacement string B<{/.}> removes the path and the extension:
453 parallel echo {/.} ::: A/B.C
Output:

  B
459 The replacement string B<{#}> gives the job number:
461 parallel echo {#} ::: A B C
Output (the order may be different):

  1
  2
  3
469 The replacement string B<{%}> gives the job slot number (between 1 and
470 number of jobs to run in parallel):
472 parallel -j 2 echo {%} ::: A B C
Output (the order may be different and 1 and 2 may be swapped):

  1
  2
  1
480 =head3 Changing the replacement strings
482 The replacement string B<{}> can be changed with B<-I>:
484 parallel -I ,, echo ,, ::: A/B.C
486 Output:
488 A/B.C
490 The replacement string B<{.}> can be changed with B<--extensionreplace>:
492 parallel --extensionreplace ,, echo ,, ::: A/B.C
Output:

  A/B

The replacement string B<{/}> can be changed with B<--basenamereplace>:

  parallel --basenamereplace ,, echo ,, ::: A/B.C

Output:

  B.C
506 The replacement string B<{//}> can be changed with B<--dirnamereplace>:
508 parallel --dirnamereplace ,, echo ,, ::: A/B.C
Output:

  A
514 The replacement string B<{/.}> can be changed with B<--basenameextensionreplace>:
516 parallel --basenameextensionreplace ,, echo ,, ::: A/B.C
Output:

  B
522 The replacement string B<{#}> can be changed with B<--seqreplace>:
524 parallel --seqreplace ,, echo ,, ::: A B C
Output (the order may be different):

  1
  2
  3
532 The replacement string B<{%}> can be changed with B<--slotreplace>:
534 parallel -j2 --slotreplace ,, echo ,, ::: A B C
Output (the order may be different and 1 and 2 may be swapped):

  1
  2
  1
542 =head3 Perl expression replacement string
544 When predefined replacement strings are not flexible enough a perl
545 expression can be used instead. One example is to remove two
546 extensions: foo.tar.gz becomes foo
548 parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz
Output:

  foo
554 In B<{= =}> you can access all of GNU B<parallel>'s internal functions
555 and variables. A few are worth mentioning.
557 B<total_jobs()> returns the total number of jobs:
559 parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}
561 Output:
563 Job 1 of 5
564 Job 2 of 5
565 Job 3 of 5
566 Job 4 of 5
567 Job 5 of 5
569 B<Q(...)> shell quotes the string:
571 parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'
573 Output:
575 */!#$ shell quoted is \*/\!\#\$
577 B<skip()> skips the job:
579 parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}
Output:

  1
  2
  4
  5
588 B<@arg> contains the input source variables:
590 parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
591 ::: {1..3} ::: {1..3}
Output:

  1 2
  1 3
  2 1
  2 3
  3 1
  3 2
602 If the strings B<{=> and B<=}> cause problems they can be replaced with B<--parens>:
604 parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
605 ::: foo.tar.gz
Output:

  foo
611 To define a shorthand replacement string use B<--rpl>:
613 parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
614 ::: foo.tar.gz
616 Output: Same as above.
618 If the shorthand starts with B<{> it can be used as a positional
619 replacement string, too:
  parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' \
    ::: foo.tar.gz
624 Output: Same as above.
If the shorthand contains matching parentheses, the replacement string
becomes a dynamic replacement string and the string in the parentheses
can be accessed as $$1. If there are multiple matching parentheses,
the matched strings can be accessed using $$2, $$3 and so on.
631 You can think of this as giving arguments to the replacement
632 string. Here we give the argument B<.tar.gz> to the replacement string
633 B<{%I<string>}> which removes I<string>:
635 parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz
637 Output:
639 foo.zip
641 Here we give the two arguments B<tar.gz> and B<zip> to the replacement
642 string B<{/I<string1>/I<string2>}> which replaces I<string1> with
643 I<string2>:
645 parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
646 ::: foo.tar.gz
648 Output:
650 foo.zip
653 GNU B<parallel>'s 7 replacement strings are implemented as this:
655 --rpl '{} '
656 --rpl '{#} $_=$job->seq()'
657 --rpl '{%} $_=$job->slot()'
658 --rpl '{/} s:.*/::'
659 --rpl '{//} $Global::use{"File::Basename"} ||=
660 eval "use File::Basename; 1;"; $_ = dirname($_);'
661 --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
662 --rpl '{.} s:\.[^/.]+$::'
664 =head3 Positional replacement strings
666 With multiple input sources the argument from the individual input
667 sources can be accessed with S<< B<{>numberB<}> >>:
669 parallel echo {1} and {2} ::: A B ::: C D
671 Output (the order may be different):
673 A and C
674 A and D
675 B and C
676 B and D
678 The positional replacement strings can also be modified using B</>, B<//>, B</.>, and B<.>:
680 parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F
682 Output (the order may be different):
684 /=B.C //=A /.=B .=A/B
685 /=E.F //=D /.=E .=D/E
If a position is negative, it will refer to the input source counted
from the end:
690 parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
691 ::: A B ::: C D ::: E F
693 Output (the order may be different):
695 1=A 2=C 3=E -1=E -2=C -3=A
696 1=A 2=C 3=F -1=F -2=C -3=A
697 1=A 2=D 3=E -1=E -2=D -3=A
698 1=A 2=D 3=F -1=F -2=D -3=A
699 1=B 2=C 3=E -1=E -2=C -3=B
700 1=B 2=C 3=F -1=F -2=C -3=B
701 1=B 2=D 3=E -1=E -2=D -3=B
702 1=B 2=D 3=F -1=F -2=D -3=B
705 =head3 Positional perl expression replacement string
707 To use a perl expression as a positional replacement string simply
708 prepend the perl expression with number and space:
710 parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
711 ::: bar ::: foo.tar.gz
713 Output:
715 foo bar
717 If a shorthand defined using B<--rpl> starts with B<{> it can be used as
718 a positional replacement string, too:
720 parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
721 ::: bar ::: foo.tar.gz
723 Output: Same as above.
726 =head3 Input from columns
728 The columns in a file can be bound to positional replacement strings
729 using B<--colsep>. Here the columns are separated by TAB (\t):
731 parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv
733 Output (the order may be different):
735 1=f1 2=f2
736 1=A 2=B
737 1=C 2=D
739 =head3 Header defined replacement strings
741 With B<--header> GNU B<parallel> will use the first value of the input
742 source as the name of the replacement string. Only the non-modified
743 version B<{}> is supported:
745 parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D
747 Output (the order may be different):
749 f1=A f2=C
750 f1=A f2=D
751 f1=B f2=C
752 f1=B f2=D
754 It is useful with B<--colsep> for processing files with TAB separated values:
756 parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
757 :::: tsv-file.tsv
759 Output (the order may be different):
761 f1=A f2=B
762 f1=C f2=D
764 =head3 More pre-defined replacement strings with --plus
766 B<--plus> adds the replacement strings B<{+/} {+.} {+..} {+...} {..} {...}
767 {/..} {/...} {##}>. The idea being that B<{+foo}> matches the opposite of B<{foo}>
768 and B<{}> = B<{+/}>/B<{/}> = B<{.}>.B<{+.}> = B<{+/}>/B<{/.}>.B<{+.}> = B<{..}>.B<{+..}> =
769 B<{+/}>/B<{/..}>.B<{+..}> = B<{...}>.B<{+...}> = B<{+/}>/B<{/...}>.B<{+...}>.
771 parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
772 parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
773 parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
774 parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
775 parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
776 parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
777 parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
778 parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
780 Output:
782 dir/sub/file.ex1.ex2.ex3
784 B<{##}> is simply the number of jobs:
786 parallel --plus echo Job {#} of {##} ::: {1..5}
788 Output:
790 Job 1 of 5
791 Job 2 of 5
792 Job 3 of 5
793 Job 4 of 5
794 Job 5 of 5
796 =head3 Dynamic replacement strings with --plus
798 B<--plus> also defines these dynamic replacement strings:
800 =over 19
802 =item B<{:-I<string>}>
804 Default value is I<string> if the argument is empty.
806 =item B<{:I<number>}>
808 Substring from I<number> till end of string.
810 =item B<{:I<number1>:I<number2>}>
812 Substring from I<number1> to I<number2>.
814 =item B<{#I<string>}>
816 If the argument starts with I<string>, remove it.
818 =item B<{%I<string>}>
820 If the argument ends with I<string>, remove it.
822 =item B<{/I<string1>/I<string2>}>
824 Replace I<string1> with I<string2>.
826 =item B<{^I<string>}>
828 If the argument starts with I<string>, upper case it. I<string> must
829 be a single letter.
831 =item B<{^^I<string>}>
833 If the argument contains I<string>, upper case it. I<string> must be a
834 single letter.
836 =item B<{,I<string>}>
838 If the argument starts with I<string>, lower case it. I<string> must
839 be a single letter.
841 =item B<{,,I<string>}>
843 If the argument contains I<string>, lower case it. I<string> must be a
844 single letter.
846 =back
They are inspired by B<Bash>:
850 unset myvar
851 echo ${myvar:-myval}
852 parallel --plus echo {:-myval} ::: "$myvar"
854 myvar=abcAaAdef
855 echo ${myvar:2}
856 parallel --plus echo {:2} ::: "$myvar"
858 echo ${myvar:2:3}
859 parallel --plus echo {:2:3} ::: "$myvar"
861 echo ${myvar#bc}
862 parallel --plus echo {#bc} ::: "$myvar"
863 echo ${myvar#abc}
864 parallel --plus echo {#abc} ::: "$myvar"
866 echo ${myvar%de}
867 parallel --plus echo {%de} ::: "$myvar"
868 echo ${myvar%def}
869 parallel --plus echo {%def} ::: "$myvar"
871 echo ${myvar/def/ghi}
872 parallel --plus echo {/def/ghi} ::: "$myvar"
874 echo ${myvar^a}
875 parallel --plus echo {^a} ::: "$myvar"
876 echo ${myvar^^a}
877 parallel --plus echo {^^a} ::: "$myvar"
879 myvar=AbcAaAdef
880 echo ${myvar,A}
881 parallel --plus echo '{,A}' ::: "$myvar"
882 echo ${myvar,,A}
883 parallel --plus echo '{,,A}' ::: "$myvar"
Output:

  myval
  myval
  cAaAdef
  cAaAdef
  cAa
  cAa
  abcAaAdef
  abcAaAdef
  AaAdef
  AaAdef
  abcAaAdef
  abcAaAdef
  abcAaA
  abcAaA
  abcAaAghi
  abcAaAghi
  AbcAaAdef
  AbcAaAdef
  AbcAAAdef
  AbcAAAdef
  abcAaAdef
  abcAaAdef
  abcaaadef
  abcaaadef
913 =head2 More than one argument
915 With B<--xargs> GNU B<parallel> will fit as many arguments as possible on a
916 single line:
918 cat num30000 | parallel --xargs echo | wc -l
Output (if you run this under Bash on GNU/Linux):

  2
924 The 30000 arguments fitted on 2 lines.
926 The maximal length of a single line can be set with B<-s>. With a maximal
927 line length of 10000 chars 17 commands will be run:
929 cat num30000 | parallel --xargs -s 10000 echo | wc -l
Output:

  17
935 For better parallelism GNU B<parallel> can distribute the arguments
936 between all the parallel jobs when end of file is met.
938 Below GNU B<parallel> reads the last argument when generating the second
939 job. When GNU B<parallel> reads the last argument, it spreads all the
940 arguments for the second job over 4 jobs instead, as 4 parallel jobs
941 are requested.
943 The first job will be the same as the B<--xargs> example above, but the
944 second job will be split into 4 evenly sized jobs, resulting in a
945 total of 5 jobs:
947 cat num30000 | parallel --jobs 4 -m echo | wc -l
Output (if you run this under Bash on GNU/Linux):

  5
953 This is even more visible when running 4 jobs with 10 arguments. The
954 10 arguments are being spread over 4 jobs:
956 parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10
Output:

  1 2 3
  4 5 6
  7 8 9
  10
965 A replacement string can be part of a word. B<-m> will not repeat the context:
967 parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G
969 Output (the order may be different):
971 pre-A B-post
972 pre-C D-post
973 pre-E F-post
974 pre-G-post
976 To repeat the context use B<-X> which otherwise works like B<-m>:
978 parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G
980 Output (the order may be different):
982 pre-A-post pre-B-post
983 pre-C-post pre-D-post
984 pre-E-post pre-F-post
985 pre-G-post
987 To limit the number of arguments use B<-N>:
989 parallel -N3 echo ::: A B C D E F G H
Output (the order may be different):

  A B C
  D E F
  G H
997 B<-N> also sets the positional replacement strings:
999 parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H
1001 Output (the order may be different):
1003 1=A 2=B 3=C
1004 1=D 2=E 3=F
1005 1=G 2=H 3=
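Readers who know B<xargs> may find it helpful to compare B<-N> with
B<xargs -n>. The sketch below groups the same eight arguments three at
a time with plain B<xargs>; it is an analogy only, because B<xargs>
runs the groups one after another and has no positional replacement
strings.

```shell
# Group arguments three at a time with xargs, comparable to
#   parallel -N3 echo ::: A B C D E F G H
# except that the groups are run sequentially.
printf '%s\n' A B C D E F G H | xargs -n 3 echo
```

The last group holds the two remaining arguments, just as the last job
does with B<-N3>.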
1007 B<-N0> reads 1 argument but inserts none:
1009 parallel -N0 echo foo ::: 1 2 3
Output:

  foo
  foo
  foo
1017 =head2 Quoting
1019 Command lines that contain special characters may need to be protected from the shell.
1021 The B<perl> program B<print "@ARGV\n"> basically works like B<echo>.
1023 perl -e 'print "@ARGV\n"' A
Output:

  A
1029 To run that in parallel the command needs to be quoted:
1031 parallel perl -e 'print "@ARGV\n"' ::: This wont work
1033 Output:
1035 [Nothing]
1037 To quote the command use B<-q>:
1039 parallel -q perl -e 'print "@ARGV\n"' ::: This works
1041 Output (the order may be different):
1043 This
1044 works
1046 Or you can quote the critical part using B<\'>:
1048 parallel perl -e \''print "@ARGV\n"'\' ::: This works, too
Output (the order may be different):

  This
  works,
  too
1056 GNU B<parallel> can also \-quote full lines. Simply run this:
1058 parallel --shellquote
1059 Warning: Input is read from the terminal. You either know what you
1060 Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
1061 Warning: ::: or :::: or to pipe data into parallel. If so
1062 Warning: consider going through the tutorial: man parallel_tutorial
1063 Warning: Press CTRL-D to exit.
1064 perl -e 'print "@ARGV\n"'
1065 [CTRL-D]
1067 Output:
1069 perl\ -e\ \'print\ \"@ARGV\\n\"\'
1071 This can then be used as the command:
1073 parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works
1075 Output (the order may be different):
1077 This
1078 also
1079 works
1082 =head2 Trimming space
1084 Space can be trimmed on the arguments using B<--trim>:
1086 parallel --trim r echo pre-{}-post ::: ' A '
1088 Output:
1090 pre- A-post
1092 To trim on the left side:
1094 parallel --trim l echo pre-{}-post ::: ' A '
1096 Output:
1098 pre-A -post
To trim on both sides:
1102 parallel --trim lr echo pre-{}-post ::: ' A '
1104 Output:
1106 pre-A-post
1109 =head2 Respecting the shell
1111 This tutorial uses Bash as the shell. GNU B<parallel> respects which
1112 shell you are using, so in B<zsh> you can do:
1114 parallel echo \={} ::: zsh bash ls
1116 Output:
1118 /usr/bin/zsh
1119 /bin/bash
1120 /bin/ls
1122 In B<csh> you can do:
1124 parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *
1126 Output:
1128 [somedir] is a dir
1130 This also becomes useful if you use GNU B<parallel> in a shell script:
1131 GNU B<parallel> will use the same shell as the shell script.
1134 =head1 Controlling the output
The output can be prefixed with the argument:
1138 parallel --tag echo foo-{} ::: A B C
1140 Output (the order may be different):
1142 A foo-A
1143 B foo-B
1144 C foo-C
1146 To prefix it with another string use B<--tagstring>:
1148 parallel --tagstring {}-bar echo foo-{} ::: A B C
1150 Output (the order may be different):
1152 A-bar foo-A
1153 B-bar foo-B
1154 C-bar foo-C
1156 To see what commands will be run without running them use B<--dryrun>:
1158 parallel --dryrun echo {} ::: A B C
1160 Output (the order may be different):
1162 echo A
1163 echo B
1164 echo C
To print the commands before running them use B<--verbose>:

  parallel --verbose echo {} ::: A B C

Output (the order may be different):

  echo A
  echo B
  A
  echo C
  B
  C
1179 GNU B<parallel> will postpone the output until the command completes:
1181 parallel -j2 'printf "%s-start\n%s" {} {};
1182 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1184 Output:
1186 2-start
1187 2-middle
1188 2-end
1189 1-start
1190 1-middle
1191 1-end
1192 4-start
1193 4-middle
1194 4-end
1196 To get the output immediately use B<--ungroup>:
1198 parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
1199 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1201 Output:
1203 4-start
1204 42-start
1205 2-middle
1206 2-end
1207 1-start
1208 1-middle
1209 1-end
1210 -middle
1211 4-end
1213 B<--ungroup> is fast, but can cause half a line from one job to be mixed
1214 with half a line of another job. That has happened in the second line,
1215 where the line '4-middle' is mixed with '2-start'.
1217 To avoid this use B<--linebuffer>:
1219 parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
1220 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1222 Output:
1224 4-start
1225 2-start
1226 2-middle
1227 2-end
1228 1-start
1229 1-middle
1230 1-end
1231 4-middle
1232 4-end
1234 To force the output in the same order as the arguments use B<--keep-order>/B<-k>:
1236 parallel -j2 -k 'printf "%s-start\n%s" {} {};
1237 sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
1239 Output:
1241 4-start
1242 4-middle
1243 4-end
1244 2-start
1245 2-middle
1246 2-end
1247 1-start
1248 1-middle
1249 1-end
1252 =head2 Saving output into files
1254 GNU B<parallel> can save the output of each job into files:
1256 parallel --files echo ::: A B C
1258 Output will be similar to this:
1260 /tmp/pAh6uWuQCg.par
1261 /tmp/opjhZCzAX4.par
1262 /tmp/W0AT_Rph2o.par
1264 By default GNU B<parallel> will cache the output in files in B</tmp>. This
1265 can be changed by setting B<$TMPDIR> or B<--tmpdir>:
1267 parallel --tmpdir /var/tmp --files echo ::: A B C
1269 Output will be similar to this:
1271 /var/tmp/N_vk7phQRc.par
1272 /var/tmp/7zA4Ccf3wZ.par
1273 /var/tmp/LIuKgF_2LP.par
Or:

  TMPDIR=/var/tmp parallel --files echo ::: A B C
1279 Output: Same as above.
1281 The output files can be saved in a structured way using B<--results>:
1283 parallel --results outdir echo ::: A B C
Output:

  A
  B
  C
1291 These files were also generated containing the standard output
1292 (stdout), standard error (stderr), and the sequence number (seq):
1294 outdir/1/A/seq
1295 outdir/1/A/stderr
1296 outdir/1/A/stdout
1297 outdir/1/B/seq
1298 outdir/1/B/stderr
1299 outdir/1/B/stdout
1300 outdir/1/C/seq
1301 outdir/1/C/stderr
1302 outdir/1/C/stdout
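Because the layout is regular, the saved output can be collected with
a shell glob. The following is a minimal sketch for the single input
source layout shown above; the function name B<show_results> is made
up for this illustration and is not part of GNU B<parallel>.

```shell
# Print every saved stdout file, prefixed by its path.
# Assumes the layout <dir>/<input source>/<value>/stdout shown above.
show_results() {
  for f in "$1"/*/*/stdout; do
    [ -e "$f" ] || continue      # skip if the glob matched nothing
    printf '%s: %s\n' "$f" "$(cat "$f")"
  done
}
```

After the B<--results outdir> example it would be called as
B<show_results outdir>.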
1304 B<--header :> will take the first value as name and use that in the
1305 directory structure. This is useful if you are using multiple input
1306 sources:
1308 parallel --header : --results outdir echo ::: f1 A B ::: f2 C D
1310 Generated files:
1312 outdir/f1/A/f2/C/seq
1313 outdir/f1/A/f2/C/stderr
1314 outdir/f1/A/f2/C/stdout
1315 outdir/f1/A/f2/D/seq
1316 outdir/f1/A/f2/D/stderr
1317 outdir/f1/A/f2/D/stdout
1318 outdir/f1/B/f2/C/seq
1319 outdir/f1/B/f2/C/stderr
1320 outdir/f1/B/f2/C/stdout
1321 outdir/f1/B/f2/D/seq
1322 outdir/f1/B/f2/D/stderr
1323 outdir/f1/B/f2/D/stdout
1325 The directories are named after the variables and their values.
1327 =head1 Controlling the execution
1329 =head2 Number of simultaneous jobs
1331 The number of concurrent jobs is given with B<--jobs>/B<-j>:
1333 /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128
1335 With 64 jobs in parallel the 128 B<sleep>s will take 2-8 seconds to run -
1336 depending on how fast your machine is.
1338 By default B<--jobs> is the same as the number of CPU cores. So this:
1340 /usr/bin/time parallel -N0 sleep 1 :::: num128
1342 should take twice the time of running 2 jobs per CPU core:
1344 /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128
1346 B<--jobs 0> will run as many jobs in parallel as possible:
1348 /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128
1350 which should take 1-7 seconds depending on how fast your machine is.
1352 B<--jobs> can read from a file which is re-read when a job finishes:
1354 echo 50% > my_jobs
1355 /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
1356 sleep 1
1357 echo 0 > my_jobs
1358 wait
1360 The first second only 50% of the CPU cores will run a job. Then B<0> is
1361 put into B<my_jobs> and then the rest of the jobs will be started in
1362 parallel.
1364 Instead of basing the percentage on the number of CPU cores
1365 GNU B<parallel> can base it on the number of CPUs:
1367 parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8
1369 =head2 Shuffle job order
1371 If you have many jobs (e.g. by multiple combinations of input
1372 sources), it can be handy to shuffle the jobs, so you get different
1373 values run. Use B<--shuf> for that:
1375 parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C
1377 Output:
1379 All combinations but different order for each run.
1381 =head2 Interactivity
1383 GNU B<parallel> can ask the user if a command should be run using B<--interactive>:
1385 parallel --interactive echo ::: 1 2 3
Output:

  echo 1 ?...y
  echo 2 ?...n
  1
  echo 3 ?...y
  3
1395 GNU B<parallel> can be used to put arguments on the command line for an
1396 interactive command such as B<emacs> to edit one file at a time:
1398 parallel --tty emacs ::: 1 2 3
Or give multiple arguments in one go to open multiple files:
1402 parallel -X --tty vi ::: 1 2 3
1404 =head2 A terminal for every job
1406 Using B<--tmux> GNU B<parallel> can start a terminal for every job run:
1408 seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'
1410 This will tell you to run something similar to:
1412 tmux -S /tmp/tmsrPrO0 attach
1414 Using normal B<tmux> keystrokes (CTRL-b n or CTRL-b p) you can cycle
1415 between windows of the running jobs. When a job is finished it will
1416 pause for 10 seconds before closing the window.
1418 =head2 Timing
1420 Some jobs do heavy I/O when they start. To avoid a thundering herd GNU
1421 B<parallel> can delay starting new jobs. B<--delay> I<X> will make
1422 sure there is at least I<X> seconds between each start:
1424 parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3
1426 Output:
1428 Starting 1
1429 Thu Aug 15 16:24:33 CEST 2013
1430 Starting 2
1431 Thu Aug 15 16:24:35 CEST 2013
1432 Starting 3
1433 Thu Aug 15 16:24:38 CEST 2013
1436 If jobs taking more than a certain amount of time are known to fail,
1437 they can be stopped with B<--timeout>. The accuracy of B<--timeout> is
1438 2 seconds:
1440 parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8
Output:

  2
  4
1447 GNU B<parallel> can compute the median runtime for jobs and kill those
1448 that take more than 200% of the median runtime:
1450 parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3
Output:

  2.1
  2.2
  3
  2.3
1459 =head2 Progress information
1461 Based on the runtime of completed jobs GNU B<parallel> can estimate the
1462 total runtime:
1464 parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
1466 Output:
1468 Computers / CPU cores / Max jobs to run
1469 1:local / 2 / 2
1471 Computer:jobs running/jobs completed/%of started jobs/
1472 Average seconds to complete
1473 ETA: 2s 0left 1.11avg local:0/9/100%/1.1s
1475 GNU B<parallel> can give progress information with B<--progress>:
1477 parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
1479 Output:
1481 Computers / CPU cores / Max jobs to run
1482 1:local / 2 / 2
1484 Computer:jobs running/jobs completed/%of started jobs/
1485 Average seconds to complete
1486 local:0/9/100%/1.1s
1488 A progress bar can be shown with B<--bar>:
1490 parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
1492 And a graphic bar can be shown with B<--bar> and B<zenity>:
1494 seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
1495 2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
1496 zenity --progress --auto-kill --auto-close)
1498 A logfile of the jobs completed so far can be generated with B<--joblog>:
1500 parallel --joblog /tmp/log exit ::: 1 2 3 0
1501 cat /tmp/log
1503 Output:
1505 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1506 1 : 1376577364.974 0.008 0 0 1 0 exit 1
1507 2 : 1376577364.982 0.013 0 0 2 0 exit 2
1508 3 : 1376577364.990 0.013 0 0 3 0 exit 3
1509 4 : 1376577365.003 0.003 0 0 0 0 exit 0
1511 The log contains the job sequence, which host the job was run on, the
1512 start time and run time, how much data was transferred, the exit
1513 value, the signal that killed the job, and finally the command being
1514 run.
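Because the joblog is a tab-separated text file, it can be post-processed with standard tools. The following is a minimal sketch, not part of GNU B<parallel>: it assumes the column order shown above (Exitval in column 7, Signal in column 8, Command in column 9), and the helper name B<failed_jobs> is made up for this illustration. It lists the jobs that did not finish cleanly:

```shell
# failed_jobs LOGFILE: print "Seq<TAB>Command" for every job in a
# --joblog file whose exit value or signal is non-zero.
# (Hypothetical helper; assumes the 9 tab-separated columns shown above.)
failed_jobs() {
  awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $1 "\t" $9 }' "$1"
}

# Build a small sample joblog so the sketch can be tried without GNU parallel.
log=$(mktemp)
{
  printf 'Seq\tHost\tStarttime\tRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
  printf '1\t:\t1376577364.974\t0.008\t0\t0\t1\t0\texit 1\n'
  printf '4\t:\t1376577365.003\t0.003\t0\t0\t0\t0\texit 0\n'
} > "$log"

failed_jobs "$log"
```

With a real log you would call B<failed_jobs /tmp/log> after the run has finished.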
With a joblog GNU B<parallel> can be stopped and later pick up where it
left off. It is important that the input of the completed jobs is
unchanged.
1520 parallel --joblog /tmp/log exit ::: 1 2 3 0
1521 cat /tmp/log
1522 parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
1523 cat /tmp/log
1525 Output:
1527 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1528 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1529 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1530 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1531 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1533 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1534 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1535 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1536 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1537 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1538 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1539 6 : 1376580070.038 0.007 0 0 0 0 exit 0
Note how the start times of the last 2 jobs are clearly from the second run.
1543 With B<--resume-failed> GNU B<parallel> will re-run the jobs that failed:
1545 parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
1546 cat /tmp/log
1548 Output:
1550 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1551 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1552 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1553 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1554 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1555 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1556 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1557 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1558 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1559 3 : 1376580154.466 0.005 0 0 3 0 exit 3
Note how the jobs with Seq 1, 2, and 3 have been re-run because they had
an exit value different from 0.
1564 B<--retry-failed> does almost the same as B<--resume-failed>. Where
1565 B<--resume-failed> reads the commands from the command line (and
1566 ignores the commands in the joblog), B<--retry-failed> ignores the
1567 command line and reruns the commands mentioned in the joblog.
1569 parallel --retry-failed --joblog /tmp/log
1570 cat /tmp/log
1572 Output:
1574 Seq Host Starttime Runtime Send Receive Exitval Signal Command
1575 1 : 1376580069.544 0.008 0 0 1 0 exit 1
1576 2 : 1376580069.552 0.009 0 0 2 0 exit 2
1577 3 : 1376580069.560 0.012 0 0 3 0 exit 3
1578 4 : 1376580069.571 0.005 0 0 0 0 exit 0
1579 5 : 1376580070.028 0.009 0 0 0 0 exit 0
1580 6 : 1376580070.038 0.007 0 0 0 0 exit 0
1581 1 : 1376580154.433 0.010 0 0 1 0 exit 1
1582 2 : 1376580154.444 0.022 0 0 2 0 exit 2
1583 3 : 1376580154.466 0.005 0 0 3 0 exit 3
1584 1 : 1376580164.633 0.010 0 0 1 0 exit 1
1585 2 : 1376580164.644 0.022 0 0 2 0 exit 2
1586 3 : 1376580164.666 0.005 0 0 3 0 exit 3
1589 =head2 Termination
1591 =head3 Unconditional termination
1593 By default GNU B<parallel> will wait for all jobs to finish before exiting.
1595 If you send GNU B<parallel> the B<TERM> signal, GNU B<parallel> will
1596 stop spawning new jobs and wait for the remaining jobs to finish. If
1597 you send GNU B<parallel> the B<TERM> signal again, GNU B<parallel>
1598 will kill all running jobs and exit.
1600 =head3 Termination dependent on job status
1602 For certain jobs there is no need to continue if one of the jobs fails
1603 and has an exit code different from 0. GNU B<parallel> will stop spawning new jobs
1604 with B<--halt soon,fail=1>:
1606 parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
Output:

  0
  0
  1
  parallel: This job failed:
  echo 1; exit 1
  parallel: Starting no more jobs. Waiting for 1 jobs to finish.
  2
1618 With B<--halt now,fail=1> the running jobs will be killed immediately:
1620 parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
Output:

  0
  0
  1
  parallel: This job failed:
  echo 1; exit 1
1630 If B<--halt> is given a percentage this percentage of the jobs must fail
1631 before GNU B<parallel> stops spawning more jobs:
1633 parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
1634 ::: 0 1 2 3 4 5 6 7 8 9
Output:

  0
  1
  parallel: This job failed:
  echo 1; exit 1
  2
  parallel: This job failed:
  echo 2; exit 2
  parallel: Starting no more jobs. Waiting for 1 jobs to finish.
  3
  parallel: This job failed:
  echo 3; exit 3
1650 If you are looking for success instead of failures, you can use
1651 B<success>. This will finish as soon as the first job succeeds:
1653 parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
Output:

  1
  2
  3
  0
  parallel: This job succeeded:
  echo 0; exit 0
1664 GNU B<parallel> can retry the command with B<--retries>. This is useful if a
1665 command fails for unknown reasons now and then.
1667 parallel -k --retries 3 \
1668 'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
1669 cat /tmp/runs
1671 Output:
1673 completed 1
1674 completed 2
1675 completed 0
1677 tried 1
1678 tried 2
1679 tried 1
1680 tried 2
1681 tried 1
1682 tried 2
1683 tried 0
Note how jobs 1 and 2 were tried 3 times, but 0 was not retried because it had exit code 0.
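To check how often each job was attempted, the lines in the runs file can be counted per argument. This is a small sketch under the assumption that the file contains lines of the form C<tried I<argument>>, as written by the example above. The helper name B<count_tries> and the sample file are only for illustration:

```shell
# count_tries FILE: print "<argument> <number of attempts>" per job,
# assuming FILE holds lines of the form "tried <argument>" as written
# by the example above. (Hypothetical helper, not part of GNU parallel.)
count_tries() {
  awk '$1 == "tried" { tries[$2]++ }
       END { for (arg in tries) print arg, tries[arg] }' "$1" | sort -n
}

# Sample data matching the /tmp/runs content shown above.
runs=$(mktemp)
printf 'tried %s\n' 1 2 1 2 1 2 0 > "$runs"
count_tries "$runs"
```

For the sample data this reports one attempt for argument 0 and three attempts each for arguments 1 and 2.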
1687 =head3 Termination signals (advanced)
1689 Using B<--termseq> you can control which signals are sent when killing
1690 children. Normally children will be killed by sending them B<SIGTERM>,
1691 waiting 200 ms, then another B<SIGTERM>, waiting 100 ms, then another
1692 B<SIGTERM>, waiting 50 ms, then a B<SIGKILL>, finally waiting 25 ms
1693 before giving up. It looks like this:
  show_signals() {
    perl -e 'for(keys %SIG) {
        $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
      }
      while(1){sleep 1}'
  }
  export -f show_signals
  echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
    -u --timeout 1 show_signals
1705 Output:
1707 Got TERM
1708 Got TERM
1709 Got TERM
1711 Or just:
1713 echo | parallel -u --timeout 1 show_signals
1715 Output: Same as above.
1717 You can change this to B<SIGINT>, B<SIGTERM>, B<SIGKILL>:
1719 echo | parallel --termseq INT,200,TERM,100,KILL,25 \
1720 -u --timeout 1 show_signals
1722 Output:
1724 Got INT
1725 Got TERM
1727 The B<SIGKILL> does not show because it cannot be caught, and thus the
1728 child dies.
1731 =head2 Limiting the resources
1733 To avoid overloading systems GNU B<parallel> can look at the system load
1734 before starting another job:
1736 parallel --load 100% echo load is less than {} job per cpu ::: 1
1738 Output:
  [when the load is less than the number of cpu cores]
1741 load is less than 1 job per cpu
1743 GNU B<parallel> can also check if the system is swapping.
1745 parallel --noswap echo the system is not swapping ::: now
1747 Output:
  [when the system is not swapping]
1750 the system is not swapping now
1752 Some jobs need a lot of memory, and should only be started when there
1753 is enough memory free. Using B<--memfree> GNU B<parallel> can check if
1754 there is enough memory free. Additionally, GNU B<parallel> will kill
1755 off the youngest job if the memory free falls below 50% of the
size. The killed job will be put back on the queue and retried later.
1758 parallel --memfree 1G echo will run if more than 1 GB is ::: free
1760 GNU B<parallel> can run the jobs with a nice value. This will work both
1761 locally and remotely.
1763 parallel --nice 17 echo this is being run with nice -n ::: 17
1765 Output:
1767 this is being run with nice -n 17
1769 =head1 Remote execution
1771 GNU B<parallel> can run jobs on remote servers. It uses B<ssh> to
1772 communicate with the remote machines.
1774 =head2 Sshlogin
1776 The most basic sshlogin is B<-S> I<host>:
1778 parallel -S $SERVER1 echo running on ::: $SERVER1
1780 Output:
1782 running on [$SERVER1]
1784 To use a different username prepend the server with I<username@>:
1786 parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
1788 Output:
1790 running on [username@$SERVER1]
1792 The special sshlogin B<:> is the local machine:
1794 parallel -S : echo running on ::: the_local_machine
1796 Output:
1798 running on the_local_machine
1800 If B<ssh> is not in $PATH it can be prepended to $SERVER1:
1802 parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
1804 Output:
1806 custom ssh
1808 The B<ssh> command can also be given using B<--ssh>:
1810 parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
1812 or by setting B<$PARALLEL_SSH>:
1814 export PARALLEL_SSH=/usr/bin/ssh
1815 parallel -S $SERVER1 echo custom ::: ssh
1817 Several servers can be given using multiple B<-S>:
1819 parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
1821 Output (the order may be different):
  running
  on
  more
  hosts
1828 Or they can be separated by B<,>:
1830 parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
1832 Output: Same as above.
1834 Or newline:
1836 # This gives a \n between $SERVER1 and $SERVER2
1837 SERVERS="`echo $SERVER1; echo $SERVER2`"
1838 parallel -S "$SERVERS" echo ::: running on more hosts
1840 They can also be read from a file (replace I<user@> with the user on B<$SERVER2>):
1842 echo $SERVER1 > nodefile
1843 # Force 4 cores, special ssh-command, username
1844 echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
1845 parallel --sshloginfile nodefile echo ::: running on more hosts
1847 Output: Same as above.
Every time a job finishes, the B<--sshloginfile> will be re-read, so
1850 it is possible to both add and remove hosts while running.
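Since the file is read again while jobs are running, it may be safer to replace it in one step rather than edit it in place. The following is only a sketch of that idea: the helper name B<write_nodefile> and the host names are placeholders, and the entries use the I<ncores/sshlogin> form from this section.

```shell
# write_nodefile FILE ENTRY...: replace FILE with one sshlogin per line.
# The new content is written to a temporary file first and then renamed,
# so a reader never sees a half-written list.
# (Hypothetical helper; the host names below are placeholders.)
write_nodefile() {
  target=$1
  shift
  tmp=$(mktemp "$target.XXXXXX") || return 1
  printf '%s\n' "$@" > "$tmp" && mv "$tmp" "$target"
}

write_nodefile nodefile server1.example.com '4/user@server2.example.com'
cat nodefile
```

The resulting file can then be given to B<--sshloginfile nodefile> as in the example above.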
1852 The special B<--sshloginfile ..> reads from B<~/.parallel/sshloginfile>.
To force GNU B<parallel> to treat a server as having a given number of CPU
cores, prepend the number of cores followed by B</> to the sshlogin:
1857 parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
1859 Output:
1861 force 4 cpus on server
1863 Servers can be put into groups by prepending I<@groupname> to the
1864 server and the group can then be selected by appending I<@groupname> to
1865 the argument if using B<--hostgroup>:
1867 parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
1868 ::: run_on_grp1@grp1 run_on_grp2@grp2
1870 Output:
1872 run_on_grp1
1873 run_on_grp2
1875 A host can be in multiple groups by separating the groups with B<+>, and
1876 you can force GNU B<parallel> to limit the groups on which the command
1877 can be run with B<-S> I<@groupname>:
  parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
    ::: run_on_grp1 also_grp1
1882 Output:
1884 run_on_grp1
1885 also_grp1
1887 =head2 Transferring files
1889 GNU B<parallel> can transfer the files to be processed to the remote
1890 host. It does that using rsync.
1892 echo This is input_file > input_file
1893 parallel -S $SERVER1 --transferfile {} cat ::: input_file
1895 Output:
1897 This is input_file
1899 If the files are processed into another file, the resulting file can be
1900 transferred back:
1902 echo This is input_file > input_file
1903 parallel -S $SERVER1 --transferfile {} --return {}.out \
1904 cat {} ">"{}.out ::: input_file
1905 cat input_file.out
1907 Output: Same as above.
1909 To remove the input and output file on the remote server use B<--cleanup>:
1911 echo This is input_file > input_file
1912 parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
1913 cat {} ">"{}.out ::: input_file
1914 cat input_file.out
1916 Output: Same as above.
1918 There is a shorthand for B<--transferfile {} --return --cleanup> called B<--trc>:
1920 echo This is input_file > input_file
1921 parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
1922 cat input_file.out
1924 Output: Same as above.
1926 Some jobs need a common database for all jobs. GNU B<parallel> can
1927 transfer that using B<--basefile> which will transfer the file before the
1928 first job:
1930 echo common data > common_file
1931 parallel --basefile common_file -S $SERVER1 \
1932 cat common_file\; echo {} ::: foo
1934 Output:
  common data
  foo
1939 To remove it from the remote host after the last job use B<--cleanup>.
1942 =head2 Working dir
1944 The default working dir on the remote machines is the login dir. This
1945 can be changed with B<--workdir> I<mydir>.
1947 Files transferred using B<--transferfile> and B<--return> will be relative
1948 to I<mydir> on remote computers, and the command will be executed in
1949 the dir I<mydir>.
1951 The special I<mydir> value B<...> will create working dirs under
1952 B<~/.parallel/tmp> on the remote computers. If B<--cleanup> is given
1953 these dirs will be removed.
1955 The special I<mydir> value B<.> uses the current working dir. If the
1956 current working dir is beneath your home dir, the value B<.> is
1957 treated as the relative path to your home dir. This means that if your
1958 home dir is different on remote computers (e.g. if your login is
1959 different) the relative path will still be relative to your home dir.
1961 parallel -S $SERVER1 pwd ::: ""
1962 parallel --workdir . -S $SERVER1 pwd ::: ""
1963 parallel --workdir ... -S $SERVER1 pwd ::: ""
1965 Output:
1967 [the login dir on $SERVER1]
1968 [current dir relative on $SERVER1]
1969 [a dir in ~/.parallel/tmp/...]
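The rule for B<--workdir .> can be illustrated with a few lines of shell. This is only an illustration of the path arithmetic described above, not GNU B<parallel>'s own code; the function name B<remote_workdir> and the example paths are made up:

```shell
# remote_workdir DIR HOME: print the directory that "--workdir ." would
# correspond to, following the rule described above: a DIR beneath HOME
# becomes a path relative to the home dir, anything else is kept as is.
# (Illustration only; this is not GNU parallel's own code.)
remote_workdir() {
  case "$1" in
    "$2"/*) printf '%s\n' "${1#"$2"/}" ;;
    *)      printf '%s\n' "$1" ;;
  esac
}

remote_workdir /home/alice/project/data /home/alice   # project/data
remote_workdir /srv/data /home/alice                  # /srv/data
```

The relative result (B<project/data>) is then resolved against the home dir on the remote computer, whatever that home dir is called there.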
1972 =head2 Avoid overloading sshd
1974 If many jobs are started on the same server, B<sshd> can be
1975 overloaded. GNU B<parallel> can insert a delay between each job run on
1976 the same server:
1978 parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
Output (the order may be different):

  1
  2
  3
1986 B<sshd> will be less overloaded if using B<--controlmaster>, which will
1987 multiplex ssh connections:
1989 parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
1991 Output: Same as above.
1993 =head2 Ignore hosts that are down
1995 In clusters with many hosts a few of them are often down. GNU B<parallel>
1996 can ignore those hosts. In this case the host 173.194.32.46 is down:
1998 parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
Output:

  bar
2004 =head2 Running the same commands on all hosts
2006 GNU B<parallel> can run the same command on all the hosts:
2008 parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
Output (the order may be different):

  foo
  bar
  foo
  bar
Often you will just want to run a single command on all hosts without
arguments. B<--nonall> is a no argument B<--onall>:
2020 parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
2022 Output:
2024 foo bar
2025 foo bar
2027 When B<--tag> is used with B<--nonall> and B<--onall> the B<--tagstring> is the host:
2029 parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
2031 Output (the order may be different):
2033 $SERVER1 foo bar
2034 $SERVER2 foo bar
2036 B<--jobs> sets the number of servers to log in to in parallel.
2038 =head2 Transferring environment variables and functions
2040 B<env_parallel> is a shell function that transfers all aliases,
functions, variables, and arrays. You activate it by running:
2043 source `which env_parallel.bash`
2045 Replace B<bash> with the shell you use.
2047 Now you can use B<env_parallel> instead of B<parallel> and still have
2048 your environment:
2050 alias myecho=echo
2051 myvar="Joe's var is"
2052 env_parallel -S $SERVER1 'myecho $myvar' ::: green
2054 Output:
2056 Joe's var is green
2058 The disadvantage is that if your environment is huge B<env_parallel>
2059 will fail.
2061 When B<env_parallel> fails, you can still use B<--env> to tell GNU
2062 B<parallel> to transfer an environment variable to the remote system.
2064 MYVAR='foo bar'
2065 export MYVAR
2066 parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
2068 Output:
2070 foo bar baz
2072 This works for functions, too, if your shell is Bash:
2074 # This only works in Bash
  my_func() {
    echo in my_func $1
  }
  export -f my_func
2079 parallel --env my_func -S $SERVER1 my_func ::: baz
2081 Output:
2083 in my_func baz
2085 GNU B<parallel> can copy all user defined variables and functions to
2086 the remote system. It just needs to record which ones to ignore in
2087 B<~/.parallel/ignored_vars>. Do that by running this once:
2089 parallel --record-env
2090 cat ~/.parallel/ignored_vars
2092 Output:
2094 [list of variables to ignore - including $PATH and $HOME]
2096 Now all other variables and functions defined will be copied when
2097 using B<--env _>.
2099 # The function is only copied if using Bash
  my_func2() {
    echo in my_func2 $VAR $1
  }
  export -f my_func2
2104 VAR=foo
2105 export VAR
2107 parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
Output:

  foo
  in my_func2 foo bar
2114 If you use B<env_parallel> the variables, functions, and aliases do
2115 not even need to be exported to be copied:
2117 NOT='not exported var'
2118 alias myecho=echo
  not_ex() {
    myecho in not_exported_func $NOT $1
  }
  env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
2124 Output:
2126 not exported var
2127 in not_exported_func not exported var bar
2130 =head2 Showing what is actually run
2132 B<--verbose> will show the command that would be run on the local
2133 machine.
2135 When using B<--cat>, B<--pipepart>, or when a job is run on a remote
2136 machine, the command is wrapped with helper scripts. B<-vv> shows all
2137 of this.
2139 parallel -vv --pipepart --block 1M wc :::: num30000
2141 Output:
2143 <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
2144 $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
2145 ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
2146 0 0 0 168894 | (wc)
2147 30000 30000 168894
2149 When the command gets more complex, the output is so hard to read,
2150 that it is only useful for debugging:
  my_func3() {
    echo in my_func $1 > $1.out
  }
  export -f my_func3
2156 parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
2157 -S $SERVER1 my_func3 {} ::: abc-file
2159 Output will be similar to:
2162 ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
2163 --protocol 30 -rlDzR -essh ./abc-file
2164 server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
2165 \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
2166 eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
2167 c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
2168 TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
2169 BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
2170 iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
2171 IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
2172 0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
2173 ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
2174 TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
2175 YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
2176 Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
2177 RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
2178 MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
2179 KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
2180 fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
2181 W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
2182 JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
2183 mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
2184 dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
2185 _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
2186 ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
2187 server:./abc-file.out ./.;ssh server -- \(rm\ -f\
2188 ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
2189 ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
2190 2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
2191 server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
2192 sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
2193 ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
2194 ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
2195 .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
2197 =head1 Saving output to shell variables (advanced)
2199 GNU B<parset> will set shell variables to the output of GNU
2200 B<parallel>. GNU B<parset> has one important limitation: It cannot be
2201 part of a pipe. In particular this means it cannot read anything from
2202 standard input (stdin) or pipe output to another program.
To use GNU B<parset> prepend the command with destination variables:
2206 parset myvar1,myvar2 echo ::: a b
2207 echo $myvar1
2208 echo $myvar2
Output:

  a
  b
2215 If you only give a single variable, it will be treated as an array:
2217 parset myarray seq {} 5 ::: 1 2 3
2218 echo "${myarray[1]}"
Output:

  2
  3
  4
  5
2227 The commands to run can be an array:
2229 cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
2230 parset data ::: "${cmd[@]}"
2231 echo "${data[0]}"
2232 echo "${data[1]}"
2234 Output:
2236 <<joe "double space" cartoon>>
2237 [current dir]
2240 =head1 Saving to an SQL base (advanced)
2242 GNU B<parallel> can save into an SQL base. Point GNU B<parallel> to a
2243 table and it will put the joblog there together with the variables and
2244 the output each in their own column.
2246 =head2 CSV as SQL base
2248 The simplest is to use a CSV file as the storage table:
2250 parallel --sqlandworker csv:///%2Ftmp/log.csv \
2251 seq ::: 10 ::: 12 13 14
2252 cat /tmp/log.csv
2254 Note how '/' in the path must be written as %2F.
2256 Output will be similar to:
2258 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2259 Command,V1,V2,Stdout,Stderr
  1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
  11
  12
  ",
  2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
  11
  12
  13
  ",
  3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
  11
  12
  13
  14
  ",
2276 A proper CSV reader (like LibreOffice or R's read.csv) will read this
2277 format correctly - even with fields containing newlines as above.
2279 If the output is big you may want to put it into files using B<--results>:
2281 parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \
2282 seq ::: 10 ::: 12 13 14
2283 cat /tmp/log2.csv
2285 Output will be similar to:
2287 Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
2288 Command,V1,V2,Stdout,Stderr
2289 1,:,1458824738.287,0.029,0,9,0,0,
2290 "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
2291 2,:,1458824738.298,0.025,0,12,0,0,
2292 "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
2293 3,:,1458824738.309,0.026,0,15,0,0,
2294 "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
2297 =head2 DBURL as table
2299 The CSV file is an example of a DBURL.
2301 GNU B<parallel> uses a DBURL to address the table. A DBURL has this format:
  vendor://[[user][:password]@][host][:port]/[database[/table]]
2305 Example:
2307 mysql://scott:tiger@my.example.com/mydatabase/mytable
2308 postgresql://scott:tiger@pg.example.com/mydatabase/mytable
2309 sqlite3:///%2Ftmp%2Fmydatabase/mytable
2310 csv:///%2Ftmp/log.csv
2312 To refer to B</tmp/mydatabase> with B<sqlite> or B<csv> you need to
2313 encode the B</> as B<%2F>.
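The encoding can be done with a small helper instead of by hand. This is a minimal sketch: the function name B<dburl> is hypothetical, and it only encodes B</>, which is the one character this section says must be encoded.

```shell
# dburl VENDOR DBPATH TABLE: build a DBURL for a file-based database,
# encoding every "/" in DBPATH as "%2F".
# (Hypothetical helper; only "/" is encoded, other special characters
# are passed through unchanged.)
dburl() {
  encoded=$(printf '%s' "$2" | sed 's|/|%2F|g')
  printf '%s:///%s/%s\n' "$1" "$encoded" "$3"
}

dburl sqlite3 /tmp/mydatabase mytable   # sqlite3:///%2Ftmp%2Fmydatabase/mytable
dburl csv /tmp log.csv                  # csv:///%2Ftmp/log.csv
```

Both results match the sqlite3 and csv examples listed above.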
2315 Run a job using B<sqlite> on B<mytable> in B</tmp/mydatabase>:
2317 DBURL=sqlite3:///%2Ftmp%2Fmydatabase
2318 DBURLTABLE=$DBURL/mytable
2319 parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2321 To see the result:
2323 sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
2325 Output will be similar to:
2327 Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
2328 Command|V1|V2|Stdout|Stderr
  1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
  |
  2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
  |
  3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
  |
  4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
  |
2338 The first columns are well known from B<--joblog>. B<V1> and B<V2> are
2339 data from the input sources. B<Stdout> and B<Stderr> are standard
2340 output and standard error, respectively.
2342 =head2 Using multiple workers
2344 Using an SQL base as storage costs overhead in the order of 1 second
2345 per job.
2347 One of the situations where it makes sense is if you have multiple
2348 workers.
2350 You can then have a single master machine that submits jobs to the SQL
2351 base (but does not do any of the work):
2353 parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
2355 On the worker machines you run exactly the same command except you
2356 replace B<--sqlmaster> with B<--sqlworker>.
2358 parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
2360 To run a master and a worker on the same machine use B<--sqlandworker>
2361 as shown earlier.
2364 =head1 --pipe
2366 The B<--pipe> functionality puts GNU B<parallel> in a different mode:
2367 Instead of treating the data on stdin (standard input) as arguments
2368 for a command to run, the data will be sent to stdin (standard input)
2369 of the command.
2371 The typical situation is:
2373 command_A | command_B | command_C
2375 where command_B is slow, and you want to speed up command_B.
2377 =head2 Chunk size
2379 By default GNU B<parallel> will start an instance of command_B, read a
2380 chunk of 1 MB, and pass that to the instance. Then start another
2381 instance, read another chunk, and pass that to the second instance.
2383 cat num1000000 | parallel --pipe wc
2385 Output (the order may be different):
2387 165668 165668 1048571
2388 149797 149797 1048579
2389 149796 149796 1048572
2390 149797 149797 1048579
2391 149797 149797 1048579
2392 149796 149796 1048572
2393 85349 85349 597444
2395 The size of the chunk is not exactly 1 MB because GNU B<parallel> only
2396 passes full lines - never half a line, thus the blocksize is only
2397 1 MB on average. You can change the block size to 2 MB with B<--block>:
2399 cat num1000000 | parallel --pipe --block 2M wc
2401 Output (the order may be different):
2403 315465 315465 2097150
2404 299593 299593 2097151
2405 299593 299593 2097151
2406 85349 85349 597444
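The number of jobs in the two examples can be estimated from the input size. The sketch below assumes that B<num1000000> contains the numbers 1 to 1000000, one per line, which agrees with the byte counts shown above (they add up to 6888896). The helper name B<expected_chunks> is made up, and the result is only an estimate because blocks end on record boundaries:

```shell
# expected_chunks BYTES BLOCKSIZE: rough number of blocks --pipe will
# produce, i.e. BYTES / BLOCKSIZE rounded up. This is only an estimate:
# blocks end on record boundaries, so the real count can differ.
expected_chunks() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Assumption: num1000000 holds the numbers 1..1000000, one per line.
bytes=$(( $(seq 1000000 | wc -c) ))
echo "$bytes"                                   # 6888896
expected_chunks "$bytes" $((1024 * 1024))       # 7, as in the first example
expected_chunks "$bytes" $((2 * 1024 * 1024))   # 4, as with --block 2M
```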
2408 GNU B<parallel> treats each line as a record. If the order of records
2409 is unimportant (e.g. you need all lines processed, but you do not care
2410 which is processed first), then you can use B<--roundrobin>. Without
2411 B<--roundrobin> GNU B<parallel> will start a command per block; with
2412 B<--roundrobin> only the requested number of jobs will be started
2413 (B<--jobs>). The records will then be distributed between the running
2414 jobs:
2416 cat num1000000 | parallel --pipe -j4 --roundrobin wc
2418 Output will be similar to:
2420 149797 149797 1048579
2421 299593 299593 2097151
2422 315465 315465 2097150
2423 235145 235145 1646016
2425 One of the 4 instances got a single record, 2 instances got 2 full
2426 records each, and one instance got 1 full and 1 partial record.
2428 =head2 Records
2430 GNU B<parallel> sees the input as records. The default record is a single
2431 line.
2433 Using B<-N140000> GNU B<parallel> will read 140000 records at a time:
2435 cat num1000000 | parallel --pipe -N140000 wc
2437 Output (the order may be different):
2439 140000 140000 868895
2440 140000 140000 980000
2441 140000 140000 980000
2442 140000 140000 980000
2443 140000 140000 980000
2444 140000 140000 980000
2445 140000 140000 980000
2446 20000 20000 140001
Note that the last job could not get the full 140000 lines, but
only 20000 lines.
2451 If a record is 75 lines B<-L> can be used:
2453 cat num1000000 | parallel --pipe -L75 wc
2455 Output (the order may be different):
2457 165600 165600 1048095
2458 149850 149850 1048950
2459 149775 149775 1048425
2460 149775 149775 1048425
2461 149850 149850 1048950
2462 149775 149775 1048425
2463 85350 85350 597450
2464 25 25 176
2466 Note how GNU B<parallel> still reads a block of around 1 MB; but
2467 instead of passing full lines to B<wc> it passes full 75 lines at a
2468 time. This of course does not hold for the last job (which in this
2469 case got 25 lines).
2471 =head2 Fixed length records
2473 Fixed length records can be processed by setting B<--recend ''> and
2474 B<--block I<recordsize>>. A header of size I<n> can be processed with
2475 B<--header .{I<n>}>.
2477 Here is how to process a file with a 4-byte header and a 3-byte record
2478 size:
2480 cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
2481 'echo start; cat; echo'
2483 Output:
2485 start
2486 HHHHAAA
2487 start
2488 HHHHCCC
2489 start
2490 HHHHBBB
It may be more efficient to increase B<--block> to a multiple of the
record size.
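To see what each job receives, the split can be emulated with standard tools. This is a sequential sketch, not GNU B<parallel> itself. It assumes that B<fixedlen> holds the 4-byte header followed by three 3-byte records without a trailing newline, which is consistent with the output above. The helper name B<split_fixed> is made up:

```shell
# Assumption: fixedlen holds a 4-byte header followed by 3-byte records
# and no trailing newline.
printf 'HHHHAAABBBCCC' > fixedlen

# split_fixed FILE HEADERSIZE RECORDSIZE: print "start" and then
# header+record for every record, one record at a time.
# (Sequential emulation with standard tools, not GNU parallel itself;
# it only handles data without newlines.)
split_fixed() {
  header=$(head -c "$2" "$1")
  tail -c +"$(( $2 + 1 ))" "$1" | fold -b -w "$3" |
    while IFS= read -r record || [ -n "$record" ]; do
      echo start
      printf '%s%s\n' "$header" "$record"
    done
}

split_fixed fixedlen 4 3
```

Unlike the GNU B<parallel> run above, the emulation always prints the records in input order.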
2495 =head2 Record separators
2497 GNU B<parallel> uses separators to determine where two records split.
2499 B<--recstart> gives the string that starts a record; B<--recend> gives the
2500 string that ends a record. The default is B<--recend '\n'> (newline).
2502 If both B<--recend> and B<--recstart> are given, then the record will only
2503 split if the recend string is immediately followed by the recstart
2504 string.
2506 Here the B<--recend> is set to B<', '>:
2508 echo /foo, bar/, /baz, qux/, | \
2509 parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END
2511 Output:
2513 JOB1
2514 /foo, END
2515 JOB2
2516 bar/, END
2517 JOB3
2518 /baz, END
2519 JOB4
  qux/,
  END
2523 Here the B<--recstart> is set to B</>:
2525 echo /foo, bar/, /baz, qux/, | \
2526 parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END
2528 Output:
2530 JOB1
2531 /foo, barEND
2532 JOB2
2533 /, END
2534 JOB3
2535 /baz, quxEND
  JOB4
  /,
  END
2540 Here both B<--recend> and B<--recstart> are set:
2542 echo /foo, bar/, /baz, qux/, | \
2543 parallel -kN1 --recend ', ' --recstart / --pipe \
2544 echo JOB{#}\;cat\;echo END
2546 Output:
2548 JOB1
2549 /foo, bar/, END
2550 JOB2
  /baz, qux/,
  END
2554 Note the difference between setting one string and setting both strings.
2556 With B<--regexp> the B<--recend> and B<--recstart> will be treated as
2557 a regular expression:
2559 echo foo,bar,_baz,__qux, | \
2560 parallel -kN1 --regexp --recend ,_+ --pipe \
2561 echo JOB{#}\;cat\;echo END
2563 Output:
2565 JOB1
2566 foo,bar,_END
2567 JOB2
2568 baz,__END
2569 JOB3
  qux,
  END
2573 GNU B<parallel> can remove the record separators with
2574 B<--remove-rec-sep>/B<--rrs>:
2576 echo foo,bar,_baz,__qux, | \
2577 parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
2578 echo JOB{#}\;cat\;echo END
2580 Output:
2582 JOB1
2583 foo,barEND
2584 JOB2
2585 bazEND
2586 JOB3
  qux,
  END
=head2 Header

If the input data has a header, the header can be repeated for each
job by matching the header with B<--header>. If headers start with
B<%> you can do this:

  cat num_%header | \
    parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat

Output (the order may be different):

  JOB1
  %head1
  %head2
  1
  2
  3
  JOB2
  %head1
  %head2
  4
  5
  6
  JOB3
  %head1
  %head2
  7
  8
  9
  JOB4
  %head1
  %head2
  10

If the header is 2 lines, B<--header> 2 will work:

  cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat

Output: Same as above.

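The same approach fits CSV-style data with a single header line. This is a minimal sketch, assuming a made-up three-row table and B<--header 1> (one header line, by analogy with B<--header 2> above):

```shell
# Made-up CSV: one header line followed by three data rows.
printf 'id,name\n1,ann\n2,bob\n3,cyd\n' |
  parallel -k --header 1 --pipe -N2 'echo "JOB{#}"; cat'
```

Every job should see the B<id,name> line first, followed by at most two data rows, so a tool that expects column names can be run on each chunk unchanged.
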
=head2 --pipepart

B<--pipe> is not very efficient: it maxes out at around 500
MB/s. B<--pipepart> can easily deliver 5 GB/s, but it has a few
limitations. The input has to be a normal file (not a pipe) given by
B<-a> or B<::::>, and B<-L>/B<-l>/B<-N> do not work. B<--recend> and
B<--recstart>, however, I<do> work, and records can often be split on
those alone.

  parallel --pipepart -a num1000000 --block 3m wc

Output (the order may be different):

  444443  444444 3000002
  428572  428572 3000004
  126985  126984  888890

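Because B<--pipepart> splits the file on record boundaries, no record is lost or duplicated across jobs. The following minimal sketch checks this by adding up the per-chunk line counts; the scratch file, the block size of 100k, and the variable names are assumptions made for illustration:

```shell
# Hypothetical scratch file; --pipepart needs a regular (seekable) file.
input=$(mktemp)
seq 100000 > "$input"

# Each job counts the lines of its own chunk; awk adds the counts up.
total=$(parallel --pipepart -a "$input" --block 100k wc -l |
          awk '{ sum += $1 } END { print sum }')
echo "lines seen by all jobs: $total"

rm -f "$input"
```

The total should equal the number of lines written by B<seq>, regardless of how many chunks the file was cut into.
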
=head1 Shebang

=head2 Input data and parallel command in the same file

GNU B<parallel> is often called like this:

  cat input_file | parallel command

With B<--shebang> the I<input_file> and B<parallel> can be combined into the same script.

UNIX shell scripts start with a shebang line like this:

  #!/bin/bash

GNU B<parallel> can do that, too. With B<--shebang> the arguments can be
listed in the file. The B<parallel> command is the first line of the
script:

  #!/usr/bin/parallel --shebang -r echo

  foo
  bar
  baz

Output (the order may be different):

  foo
  bar
  baz

=head2 Parallelizing existing scripts

GNU B<parallel> is often called like this:

  cat input_file | parallel command
  parallel command ::: foo bar

If B<command> is a script, B<parallel> and the script can be combined
into a single file, so that this will run the script in parallel:

  cat input_file | command
  command foo bar

This B<perl> script B<perl_echo> works like B<echo>:

  #!/usr/bin/perl

  print "@ARGV\n"

It can be called like this:

  parallel perl_echo ::: foo bar

By changing the B<#!>-line it can be run in parallel:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/perl

  print "@ARGV\n"

Thus this will work:

  perl_echo foo bar

Output (the order may be different):

  foo
  bar

This technique can be used for:

=over 9

=item Perl:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/perl

  print "Arguments @ARGV\n";


=item Python:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/python3

  import sys
  print('Arguments', str(sys.argv))


=item Bash/sh/zsh/Korn shell:

  #!/usr/bin/parallel --shebang-wrap /bin/bash

  echo Arguments "$@"


=item csh:

  #!/usr/bin/parallel --shebang-wrap /bin/csh

  echo Arguments "$argv"

=item Tcl:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh

  puts "Arguments $argv"


=item R:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave

  args <- commandArgs(trailingOnly = TRUE)
  print(paste("Arguments ",args))


=item GNUplot:

  #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot

  print "Arguments ", system('echo $ARG')


=item Ruby:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby

  print "Arguments "
  puts ARGV


=item Octave:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/octave

  printf ("Arguments");
  arg_list = argv ();
  for i = 1:nargin
    printf (" %s", arg_list{i});
  endfor
  printf ("\n");

=item Common LISP:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp

  (format t "~&~S~&" 'Arguments)
  (format t "~&~S~&" *args*)

=item PHP:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/php
  <?php
  echo "Arguments";
  foreach(array_slice($argv,1) as $v)
  {
    echo " $v";
  }
  echo "\n";


=item Node.js:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/node

  var myArgs = process.argv.slice(2);
  console.log('Arguments ', myArgs);

=item Lua:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/lua

  io.write "Arguments"
  for a = 1, #arg do
    io.write(" ")
    io.write(arg[a])
  end
  print("")

=item C#:

  #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp

  var argv = Environment.GetEnvironmentVariable("ARGV");
  print("Arguments "+argv);

=back

=head1 Semaphore

GNU B<parallel> can work as a counting semaphore. This is slower and less
efficient than its normal mode.

A counting semaphore is like a row of toilets. People needing a toilet
can use any toilet, but if there are more people than toilets, they
will have to wait for one of the toilets to become available.

An alias for B<parallel --semaphore> is B<sem>.

B<sem> will follow a person to the toilets, wait until a toilet is
available, leave the person in the toilet and exit.

B<sem --fg> will follow a person to the toilets, wait until a toilet is
available, stay with the person in the toilet and exit when the person
exits.

B<sem --wait> will wait for all persons to leave the toilets.

B<sem> does not have a queue discipline, so the next person is chosen
randomly.

B<-j> sets the number of toilets.

=head2 Mutex

The default is to have only one toilet (this is called a mutex). The
program is started in the background and B<sem> exits immediately. Use
B<--wait> to wait for all B<sem>s to finish:

  sem 'sleep 1; echo The first finished' &&
    echo The first is now running in the background &&
    sem 'sleep 1; echo The second finished' &&
    echo The second is now running in the background
  sem --wait

Output:

  The first is now running in the background
  The first finished
  The second is now running in the background
  The second finished

The command can be run in the foreground with B<--fg>, which will only
exit when the command completes:

  sem --fg 'sleep 1; echo The first finished' &&
    echo The first finished running in the foreground &&
    sem --fg 'sleep 1; echo The second finished' &&
    echo The second finished running in the foreground
  sem --wait

The difference between this and just running the command is that a
mutex is set, so if other B<sem>s were running in the background, only one
would run at a time.

To control which semaphore is used, use
B<--semaphorename>/B<--id>. Run this in one terminal:

  sem --id my_id -u 'echo First started; sleep 10; echo First done'

and simultaneously this in another terminal:

  sem --id my_id -u 'echo Second started; sleep 10; echo Second done'

Note how the second will only be started when the first has finished.

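A named mutex can also serialize jobs started from a loop in a single script. This is a minimal sketch, assuming B<sem> is installed alongside GNU B<parallel>; the semaphore name B<demo_lock> and the job bodies are made up for illustration:

```shell
# Start three jobs under one named mutex (the default is a single slot).
for i in 1 2 3; do
  sem --id demo_lock "echo start $i; sleep 1; echo end $i"
done

# Block until every job that used demo_lock has finished.
sem --wait --id demo_lock
```

Because only one job can hold the mutex at a time, each B<start> line should be followed by its matching B<end> line and never by another B<start>.
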
=head2 Counting semaphore

A mutex is like having a single toilet: When it is in use everyone
else will have to wait. A counting semaphore is like having multiple
toilets: Several people can use the toilets, but when they all are in
use, everyone else will have to wait.

B<sem> can emulate a counting semaphore. Use B<--jobs> to set the
number of toilets like this:

  sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
  sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
  sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
  sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
  sem --wait --id my_id

Output:

  Start 1
  Start 2
  Start 3
  1 done
  Start 4
  2 done
  3 done
  4 done

=head2 Timeout

With B<--semaphoretimeout> you can either force the command to run
anyway after a number of seconds (positive number) or give up after
that time (negative number):

  sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
  sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
  sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
  sem --id foo --wait

Output:

  Slow started
  parallel: Warning: Semaphore timed out. Stealing the semaphore.
  Forced running after 1 sec
  parallel: Warning: Semaphore timed out. Exiting.
  Slow ended

Note how the 'Give up' was not run.

=head1 Informational

GNU B<parallel> has some options to give short information about the
configuration.

B<--help> will print a summary of the most important options:

  parallel --help

Output:

  Usage:

  parallel [options] [command [arguments]] < list_of_arguments
  parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
  cat ... | parallel --pipe [options] [command [arguments]]

  -j n            Run n jobs in parallel
  -k              Keep same order
  -X              Multiple arguments with context replace
  --colsep regexp Split input on regexp for positional replacements
  {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
  {3} {3.} {3/} {3/.} {=3 perl code =}    Positional replacement strings
  With --plus:    {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
                  {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}

  -S sshlogin     Example: foo@server.example.com
  --slf ..        Use ~/.parallel/sshloginfile as the list of sshlogins
  --trc {}.bar    Shorthand for --transfer --return {}.bar --cleanup
  --onall         Run the given command with argument on all sshlogins
  --nonall        Run the given command with no arguments on all sshlogins

  --pipe          Split stdin (standard input) to multiple jobs.
  --recend str    Record end separator for --pipe.
  --recstart str  Record start separator for --pipe.

  See 'man parallel' for details

  Academic tradition requires you to cite works you base your article on.
  When using programs that use GNU Parallel to process data for publication
  please cite:

    O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
    ;login: The USENIX Magazine, February 2011:42-47.

  This helps funding further development; AND IT WON'T COST YOU A CENT.
  If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

When asking for help, always report the full output of this:

  parallel --version

Output:

  GNU parallel 20230122
  Copyright (C) 2007-2024 Ole Tange, http://ole.tange.dk and Free Software
  Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  GNU parallel comes with no warranty.

  Web site: https://www.gnu.org/software/parallel

  When using programs that use GNU Parallel to process data for publication
  please cite as described in 'parallel --citation'.

In scripts B<--minversion> can be used to ensure the user has at least
this version:

  parallel --minversion 20130722 && \
    echo Your version is at least 20130722.

Output:

  20160322
  Your version is at least 20130722.

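In a longer script the check is often wrapped in a small guard. The following is a minimal sketch; the function name B<require_parallel> and the error messages are hypothetical, and the required version is the one listed under Prerequisites:

```shell
# Hypothetical helper: succeed only if GNU parallel >= $1 (YYYYMMDD) is installed.
require_parallel() {
  if ! command -v parallel >/dev/null 2>&1; then
    echo "GNU parallel is not installed" >&2
    return 1
  fi
  # --minversion prints the installed version; discard it, keep the exit status.
  if ! parallel --minversion "$1" >/dev/null 2>&1; then
    echo "GNU parallel $1 or later is required" >&2
    return 1
  fi
}

if require_parallel 20160822; then
  echo "GNU parallel is recent enough"
fi
```

Returning a status instead of exiting leaves it to the caller to decide whether a missing or outdated GNU B<parallel> is fatal.
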
If you are using GNU B<parallel> for research the BibTeX citation can be
generated using B<--citation>:

  parallel --citation

Output:

  Academic tradition requires you to cite works you base your article on.
  When using programs that use GNU Parallel to process data for publication
  please cite:

  @article{Tange2011a,
    title = {GNU Parallel - The Command-Line Power Tool},
    author = {O. Tange},
    address = {Frederiksberg, Denmark},
    journal = {;login: The USENIX Magazine},
    month = {Feb},
    number = {1},
    volume = {36},
    url = {https://www.gnu.org/s/parallel},
    year = {2011},
    pages = {42-47},
    doi = {10.5281/zenodo.16303}
  }

  (Feel free to use \nocite{Tange2011a})

  This helps funding further development; AND IT WON'T COST YOU A CENT.
  If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

  If you send a copy of your published article to tange@gnu.org, it will be
  mentioned in the release notes of next version of GNU Parallel.

With B<--max-line-length-allowed> GNU B<parallel> will report the maximal
size of the command line:

  parallel --max-line-length-allowed

Output (may vary on different systems):

  131071

B<--number-of-cpus> and B<--number-of-cores> run system specific code to
determine the number of CPUs and CPU cores on the system. On
unsupported platforms they will return 1:

  parallel --number-of-cpus
  parallel --number-of-cores

Each command prints a single number; the values vary between systems.

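The detected value can be reused elsewhere in a script, for example as an explicit job limit. This is a minimal sketch; the variable name B<cores> and the three made-up arguments are assumptions for illustration:

```shell
# Ask GNU parallel how many CPU cores it detects on this machine.
cores=$(parallel --number-of-cores)
echo "Detected $cores core(s)"

# Use that number as the job limit (-j) for a made-up workload.
parallel -j "$cores" echo processing ::: a b c
```

The three B<processing> lines may appear in any order, since the jobs run in parallel.
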
=head1 Profiles

The defaults for GNU B<parallel> can be changed systemwide by putting the
command line options in B</etc/parallel/config>. They can be changed for
a user by putting them in B<~/.parallel/config>.

Profiles work the same way, but have to be referred to with B<--profile>:

  echo '--nice 17' > ~/.parallel/nicetimeout
  echo '--timeout 300%' >> ~/.parallel/nicetimeout
  parallel --profile nicetimeout echo ::: A B C

Output:

  A
  B
  C

Profiles can be combined:

  echo '-vv --dry-run' > ~/.parallel/dryverbose
  parallel --profile dryverbose --profile nicetimeout echo ::: A B C

Output:

  echo A
  echo B
  echo C

=head1 Spread the word

I hope you have learned something from this tutorial.

If you like GNU B<parallel>:

=over 2

=item *

(Re-)walk through the tutorial if you have not done so in the past year
(https://www.gnu.org/software/parallel/parallel_tutorial.html)

=item *

Give a demo at your local user group/your team/your colleagues

=item *

Post the intro videos and the tutorial on Reddit, Mastodon, Diaspora*,
forums, blogs, Identi.ca, Google+, Twitter, Facebook, LinkedIn, and
mailing lists

=item *

Request or write a review for your favourite blog or magazine
(especially if you do something cool with GNU B<parallel>)

=item *

Invite me for your next conference

=back

If you use GNU B<parallel> for research:

=over 2

=item *

Please cite GNU B<parallel> in your publications (use B<--citation>)

=back

If GNU B<parallel> saves you money:

=over 2

=item *

(Have your company) donate to FSF or become a member
https://my.fsf.org/donate/

=back

(C) 2013-2024 Ole Tange, GFDLv1.3+ (See
LICENSES/GFDL-1.3-or-later.txt)


=cut