Released as 20210622 ('Protasevich')
#!/usr/bin/perl -w

# SPDX-FileCopyrightText: 2021 Ole Tange, http://ole.tange.dk and Free Software Foundation, Inc.
# SPDX-License-Identifier: GFDL-1.3-or-later
# SPDX-License-Identifier: CC-BY-SA-4.0
=encoding utf8

=head1 NAME

parallel_alternatives - Alternatives to GNU B<parallel>


=head1 DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES
There are a lot of programs that share some of the functionality of
GNU B<parallel>. GNU B<parallel> strives to include the best of their
functionality without sacrificing ease of use.

B<parallel> has existed since 2002 and as GNU B<parallel> since
2010. Many of the alternatives have not had the vitality to survive
that long, but have come and gone during that time.

GNU B<parallel> has been actively maintained with a new release every
month since 2010. Most of the alternatives are fleeting interests of
their developers, with irregular releases, and are only maintained for
a few years.
=head2 SUMMARY LEGEND

The following features are in some of the comparable tools:

=head3 Inputs

=over

=item I1. Arguments can be read from stdin

=item I2. Arguments can be read from a file

=item I3. Arguments can be read from multiple files

=item I4. Arguments can be read from the command line

=item I5. Arguments can be read from a table

=item I6. Arguments can be read from the same file using #! (shebang)

=item I7. Line oriented input as default (quoting of special chars not needed)

=back
=head3 Manipulation of input

=over

=item M1. Composed command

=item M2. Multiple arguments can fill up an execution line

=item M3. Arguments can be put anywhere in the execution line

=item M4. Multiple arguments can be put anywhere in the execution line

=item M5. Arguments can be replaced with context

=item M6. Input can be treated as the complete command line

=back
=head3 Outputs

=over

=item O1. Grouping output so output from different jobs do not mix

=item O2. Send stderr (standard error) to stderr (standard error)

=item O3. Send stdout (standard output) to stdout (standard output)

=item O4. Order of output can be same as order of input

=item O5. Stdout only contains stdout (standard output) from the command

=item O6. Stderr only contains stderr (standard error) from the command

=item O7. Buffering on disk

=item O8. Cleanup of temporary files if killed

=item O9. Test if disk runs full during run

=item O10. Output of a line bigger than 4 GB

=back
=head3 Execution

=over

=item E1. Running jobs in parallel

=item E2. List running jobs

=item E3. Finish running jobs, but do not start new jobs

=item E4. Number of running jobs can depend on number of cpus

=item E5. Finish running jobs, but do not start new jobs after first failure

=item E6. Number of running jobs can be adjusted while running

=item E7. Only spawn new jobs if load is less than a limit

=back
=head3 Remote execution

=over

=item R1. Jobs can be run on remote computers

=item R2. Basefiles can be transferred

=item R3. Argument files can be transferred

=item R4. Result files can be transferred

=item R5. Cleanup of transferred files

=item R6. No config files needed

=item R7. Do not run more than SSHD's MaxStartups can handle

=item R8. Configurable SSH command

=item R9. Retry if connection breaks occasionally

=back
=head3 Semaphore

=over

=item S1. Possibility to work as a mutex

=item S2. Possibility to work as a counting semaphore

=back
=head3 Legend

=over

=item - = no

=item x = not applicable

=item ID = yes

=back

As not every new version of the programs is tested, the table may be
outdated. Please file a bug report if you find errors (see REPORTING
BUGS).
parallel:

=over

=item I1 I2 I3 I4 I5 I6 I7

=item M1 M2 M3 M4 M5 M6

=item O1 O2 O3 O4 O5 O6 O7 O8 O9 O10

=item E1 E2 E3 E4 E5 E6 E7

=item R1 R2 R3 R4 R5 R6 R7 R8 R9

=item S1 S2

=back
=head2 DIFFERENCES BETWEEN xargs AND GNU Parallel

Summary (see legend above):

=over

=item I1 I2 - - - - -

=item - M2 M3 - - -

=item - O2 O3 - O5 O6

=item E1 - - - - - -

=item - - - - - x - - -

=item - -

=back
B<xargs> offers some of the same possibilities as GNU B<parallel>.

B<xargs> deals badly with special characters (such as space, \, ' and
"). To see the problem try this:

  touch important_file
  touch 'not important_file'
  ls not* | xargs rm
  mkdir -p "My brother's 12\" records"
  ls | xargs rmdir
  touch 'c:\windows\system32\clfs.sys'
  echo 'c:\windows\system32\clfs.sys' | xargs ls -l
You can specify B<-0>, but many input generators are not optimized for
using B<NUL> as separator but are optimized for B<newline> as
separator. E.g. B<awk>, B<ls>, B<echo>, B<tar -v>, B<head> (requires
using B<-z>), B<tail> (requires using B<-z>), B<sed> (requires using
B<-z>), B<perl> (B<-0> and \0 instead of \n), B<locate> (requires
using B<-0>), B<find> (requires using B<-print0>), B<grep> (requires
using B<-z> or B<-Z>), B<sort> (requires using B<-z>).
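As a small sketch of such a B<NUL>-separated pipeline (assuming GNU
findutils and GNU B<xargs>; the scratch directory and filenames are
made up for the demo):

```shell
# Demo: filenames with spaces and quotes survive a NUL-separated pipeline
dir=$(mktemp -d)                      # scratch directory for the demo
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/with'quote.txt"
# find -print0 emits NUL-terminated names; xargs -0 splits on NUL only,
# so no quoting of the filenames is needed
find "$dir" -name '*.txt' -print0 | xargs -0 -n1 echo got:
```

With newline separation the name "with space.txt" would have been
split into two arguments; with B<NUL> separation all three files come
through as single arguments.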
GNU B<parallel>'s newline separation can be emulated with:

B<cat | xargs -d "\n" -n1 I<command>>

B<xargs> can run a given number of jobs in parallel, but has no
support for running number-of-cpu-cores jobs in parallel.
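A common approximation is to query the core count explicitly
(a sketch, assuming GNU coreutils' B<nproc> is available):

```shell
# xargs has no built-in "one job per CPU core" option;
# ask the system for the core count instead (GNU coreutils nproc)
seq 8 | xargs -P "$(nproc)" -n1 echo processed
```

GNU B<parallel> does this by default, and B<-j 200%> etc. scale the
job count relative to the core count.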
B<xargs> has no support for grouping the output, therefore output may
run together, e.g. the first half of a line is from one process and
the last half of the line is from another process. The example
B<Parallel grep> cannot be done reliably with B<xargs> because of
this. To see this in action try:

  parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \
    '>' {} ::: a b c d e f g h
  # Serial = no mixing = the wanted result
  # 'tr -s a-z' squeezes repeating letters into a single letter
  echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z
  # Compare to 8 jobs in parallel
  parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z
  echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z
  echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \
    tr -s a-z
Or try this:

  slow_seq() {
    echo Count to "$@"
    seq "$@" |
      perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
  }
  export -f slow_seq
  # Serial = no mixing = the wanted result
  seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}'
  # Compare to 8 jobs in parallel
  seq 8 | parallel -P8 slow_seq {}
  seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}'
B<xargs> has no support for keeping the order of the output, therefore
if running jobs in parallel using B<xargs> the output of the second
job cannot be postponed until the first job is done.

B<xargs> has no support for running jobs on remote computers.

B<xargs> has no support for context replace, so you will have to
create the arguments yourself.

If you use a replace string in B<xargs> (B<-I>) you cannot force
B<xargs> to use more than one argument.
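A minimal demonstration of that limitation (GNU B<xargs>; with B<-I>
each command gets exactly one input line, and there is no way to pack
several inputs into one {} the way GNU B<parallel>'s B<-X> does):

```shell
# With -I, xargs runs one command per input line:
# three inputs produce three separate echo invocations
printf '%s\n' a b c | xargs -I {} echo "arg: {}"
```

Compare with B<parallel -X echo arg: {} ::: a b c>, which can put all
three arguments into a single command line.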
Quoting in B<xargs> works like B<-q> in GNU B<parallel>. This means
composed commands and redirection require using B<bash -c>.

  ls | parallel "wc {} >{}.wc"
  ls | parallel "echo {}; ls {}|wc"

becomes (assuming you have 8 cores and that none of the filenames
contain space, " or '):

  ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
  ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"

https://www.gnu.org/software/findutils/
=head2 DIFFERENCES BETWEEN find -exec AND GNU Parallel

Summary (see legend above):

=over

=item - - - x - x -

=item - M2 M3 - - - -

=item - O2 O3 O4 O5 O6

=item - - - - - - -

=item - - - - - - - - -

=item x x

=back

B<find -exec> offers some of the same possibilities as GNU B<parallel>.

B<find -exec> only works on files. Processing other input (such as
hosts or URLs) will require creating these inputs as files. B<find
-exec> has no support for running commands in parallel.

https://www.gnu.org/software/findutils/ (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN make -j AND GNU Parallel

Summary (see legend above):

=over

=item - - - - - - -

=item - - - - - -

=item O1 O2 O3 - x O6

=item E1 - - - E5 -

=item - - - - - - - - -

=item - -

=back

B<make -j> can run jobs in parallel, but requires a crafted Makefile
to do this. That results in extra quoting to get filenames containing
newlines to work correctly.

B<make -j> computes a dependency graph before running jobs. Jobs run
by GNU B<parallel> do not depend on each other.

(Very early versions of GNU B<parallel> were coincidentally implemented
using B<make -j>.)

https://www.gnu.org/software/make/ (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN ppss AND GNU Parallel

Summary (see legend above):

=over

=item I1 I2 - - - - I7

=item M1 - M3 - - M6

=item O1 - - x - -

=item E1 E2 ?E3 E4 - - -

=item R1 R2 R3 R4 - - ?R7 ? ?

=item - -

=back

B<ppss> is also a tool for running jobs in parallel.

The output of B<ppss> is status information and thus not useful for
using as input for another command. The output from the jobs is put
into files.

The argument replace string ($ITEM) cannot be changed. Arguments must
be quoted - thus arguments containing special characters (space '"&!*)
may cause problems. More than one argument is not supported. Filenames
containing newlines are not processed correctly. When reading input
from a file null cannot be used as a terminator. B<ppss> needs to read
the whole input file before starting any jobs.

Output and status information is stored in ppss_dir and thus requires
cleanup when completed. If the dir is not removed before running
B<ppss> again it may cause nothing to happen as B<ppss> thinks the
task is already done. GNU B<parallel> will normally not need cleaning
up if running locally and will only need cleaning up if stopped
abnormally and running remote (B<--cleanup> may not complete if
stopped abnormally). The example B<Parallel grep> would require extra
postprocessing if written using B<ppss>.

For remote systems PPSS requires 3 steps: config, deploy, and
start. GNU B<parallel> only requires one step.
=head3 EXAMPLES FROM ppss MANUAL

Here are the examples from B<ppss>'s manual page with the equivalent
using GNU B<parallel>:

  1$ ./ppss.sh standalone -d /path/to/files -c 'gzip '

  1$ find /path/to/files -type f | parallel gzip

  2$ ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '

  2$ find /path/to/files -type f | parallel cp {} /destination/dir

  3$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '

  3$ parallel -a list-of-urls.txt wget -q

  4$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'

  4$ parallel -a list-of-urls.txt wget -q {}

  5$ ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir \
       -m 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh \
       -n nodes.txt -o /some/output/dir --upload --download;
     ./ppss deploy -C config.cfg
     ./ppss start -C config

  5$ # parallel does not use configs. If you want a different
     # username put it in nodes.txt: user@hostname
     find source/dir -type f |
       parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} \
         -o {.}.mp3 --preset standard --quiet

  6$ ./ppss stop -C config.cfg

  6$ killall -TERM parallel

  7$ ./ppss pause -C config.cfg

  7$ Press: CTRL-Z or killall -SIGTSTP parallel

  8$ ./ppss continue -C config.cfg

  8$ Enter: fg or killall -SIGCONT parallel

  9$ ./ppss.sh status -C config.cfg

  9$ killall -SIGUSR2 parallel

https://github.com/louwrentius/PPSS
=head2 DIFFERENCES BETWEEN pexec AND GNU Parallel

Summary (see legend above):

=over

=item I1 I2 - I4 I5 - -

=item M1 - M3 - - M6

=item O1 O2 O3 - O5 O6

=item E1 - - E4 - E6 -

=item R1 - - - - R6 - - -

=item S1 -

=back

B<pexec> is also a tool for running jobs in parallel.

=head3 EXAMPLES FROM pexec MANUAL

Here are the examples from B<pexec>'s info page with the equivalent
using GNU B<parallel>:
  1$ pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
       'echo "scale=10000;sqrt($NUM)" | bc'

  1$ seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | \
       bc > sqrt-{}.dat'

  2$ pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort

  2$ ls myfiles*.ext | parallel sort {} ">{}.sort"

  3$ pexec -f image.list -n auto -e B -u star.log -c -- \
       'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'

  3$ parallel -a image.list \
       'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log

  4$ pexec -r *.png -e IMG -c -o - -- \
       'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'

  4$ ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'

  5$ pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'

  5$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'

  6$ for p in *.png ; do echo ${p%.png} ; done | \
       pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

  6$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

  7$ LIST=$(for p in *.png ; do echo ${p%.png} ; done)
     pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'

  7$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'

  8$ pexec -n 8 -r *.jpg -y unix -e IMG -c \
       'pexec -j -m blockread -d $IMG | \
        jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
        pexec -j -m blockwrite -s th_$IMG'

  8$ # Combining GNU B<parallel> and GNU B<sem>.
     ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
       'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'

     # If reading and writing is done to the same disk, this may be
     # faster as only one process will be either reading or writing:
     ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
       'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'

https://www.gnu.org/software/pexec/
=head2 DIFFERENCES BETWEEN xjobs AND GNU Parallel

B<xjobs> is also a tool for running jobs in parallel. It only supports
running jobs on your local computer.

B<xjobs> deals badly with special characters just like B<xargs>. See
the section B<DIFFERENCES BETWEEN xargs AND GNU Parallel>.

=head3 EXAMPLES FROM xjobs MANUAL

Here are the examples from B<xjobs>'s man page with the equivalent
using GNU B<parallel>:

  1$ ls -1 *.zip | xjobs unzip

  1$ ls *.zip | parallel unzip

  2$ ls -1 *.zip | xjobs -n unzip

  2$ ls *.zip | parallel unzip >/dev/null

  3$ find . -name '*.bak' | xjobs gzip

  3$ find . -name '*.bak' | parallel gzip

  4$ ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf

  4$ ls *.jar | parallel jar tf {} '>' {}.idx

  5$ xjobs -s script

  5$ cat script | parallel

  6$ mkfifo /var/run/my_named_pipe;
     xjobs -s /var/run/my_named_pipe &
     echo unzip 1.zip >> /var/run/my_named_pipe;
     echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe

  6$ mkfifo /var/run/my_named_pipe;
     cat /var/run/my_named_pipe | parallel &
     echo unzip 1.zip >> /var/run/my_named_pipe;
     echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe

https://www.maier-komor.de/xjobs.html (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN prll AND GNU Parallel

B<prll> is also a tool for running jobs in parallel. It does not
support running jobs on remote computers.

B<prll> encourages using BASH aliases and BASH functions instead of
scripts. GNU B<parallel> supports scripts directly, functions if they
are exported using B<export -f>, and aliases if using B<env_parallel>.
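A small sketch of the B<export -f> route (bash; the function name
B<doubleit> is made up for the demo):

```shell
# Define a bash function and export it so child shells can see it
doubleit() { echo $(( $1 * 2 )); }
export -f doubleit
# GNU parallel starts each job in a child shell, which inherits
# the exported function and can call it by name
parallel doubleit ::: 1 2 3
```

The same exported function also works with B<env_parallel>, which in
addition transfers aliases and non-exported functions.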
B<prll> generates a lot of status information on stderr (standard
error) which makes it harder to use the stderr (standard error) output
of the job directly as input for another program.

=head3 EXAMPLES FROM prll's MANUAL

Here is the example from B<prll>'s man page with the equivalent
using GNU B<parallel>:

  1$ prll -s 'mogrify -flip $1' *.jpg

  1$ parallel mogrify -flip ::: *.jpg

https://github.com/exzombie/prll (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN dxargs AND GNU Parallel

B<dxargs> is also a tool for running jobs in parallel.

B<dxargs> does not deal well with more simultaneous jobs than SSHD's
MaxStartups. B<dxargs> is only built for running jobs remotely, and it
does not support transferring files.

https://web.archive.org/web/20120518070250/http://www.
semicomplete.com/blog/geekery/distributed-xargs.html (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel

middleman (mdm) is also a tool for running jobs in parallel.

=head3 EXAMPLES FROM middleman's WEBSITE

Here are the shellscripts of
https://web.archive.org/web/20110728064735/http://mdm.
berlios.de/usage.html ported to GNU B<parallel>:

  1$ seq 19 | parallel buffon -o - | sort -n > result
     cat files | parallel cmd
     find dir -execdir sem cmd {} \;

https://github.com/cklin/mdm (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN xapply AND GNU Parallel

B<xapply> can run jobs in parallel on the local computer.

=head3 EXAMPLES FROM xapply's MANUAL

Here are the examples from B<xapply>'s man page with the equivalent
using GNU B<parallel>:

  1$ xapply '(cd %1 && make all)' */

  1$ parallel 'cd {} && make all' ::: */

  2$ xapply -f 'diff %1 ../version5/%1' manifest | more

  2$ parallel diff {} ../version5/{} < manifest | more

  3$ xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1

  3$ parallel --link diff {1} {2} :::: manifest1 checklist1

  4$ xapply 'indent' *.c

  4$ parallel indent ::: *.c

  5$ find ~ksb/bin -type f ! -perm -111 -print | \
       xapply -f -v 'chmod a+x' -

  5$ find ~ksb/bin -type f ! -perm -111 -print | \
       parallel -v chmod a+x

  6$ find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -

  6$ sh <(find */ -... | parallel -s 1024 echo vi)

  6$ find */ -... | parallel -s 1024 -Xuj1 vi

  7$ find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -

  7$ sh <(find ... | parallel -n5 echo vi)

  7$ find ... | parallel -n5 -uj1 vi

  8$ xapply -fn "" /etc/passwd

  8$ parallel -k echo < /etc/passwd

  9$ tr ':' '\012' < /etc/passwd | \
       xapply -7 -nf 'chown %1 %6' - - - - - - -

  9$ tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}

  10$ xapply '[ -d %1/RCS ] || echo %1' */

  10$ parallel '[ -d {}/RCS ] || echo {}' ::: */

  11$ xapply -f '[ -f %1 ] && echo %1' List | ...

  11$ parallel '[ -f {} ] && echo {}' < List | ...

https://web.archive.org/web/20160702211113/
http://carrera.databits.net/~ksb/msrc/local/bin/xapply/xapply.html
=head2 DIFFERENCES BETWEEN AIX apply AND GNU Parallel

B<apply> can build command lines based on a template and arguments -
very much like GNU B<parallel>. B<apply> does not run jobs in
parallel. B<apply> does not use an argument separator (like B<:::>);
instead the template must be the first argument.

=head3 EXAMPLES FROM IBM's KNOWLEDGE CENTER

Here are the examples from IBM's Knowledge Center and the
corresponding command using GNU B<parallel>:

=head4 To obtain results similar to those of the B<ls> command, enter:

  1$ apply echo *
  1$ parallel echo ::: *

=head4 To compare the file named a1 to the file named b1, and
the file named a2 to the file named b2, enter:

  2$ apply -2 cmp a1 b1 a2 b2
  2$ parallel -N2 cmp ::: a1 b1 a2 b2

=head4 To run the B<who> command five times, enter:

  3$ apply -0 who 1 2 3 4 5
  3$ parallel -N0 who ::: 1 2 3 4 5

=head4 To link all files in the current directory to the directory
/usr/joe, enter:

  4$ apply 'ln %1 /usr/joe' *
  4$ parallel ln {} /usr/joe ::: *

https://www-01.ibm.com/support/knowledgecenter/
ssw_aix_71/com.ibm.aix.cmds1/apply.htm (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN paexec AND GNU Parallel

B<paexec> can run jobs in parallel on both the local and remote computers.

B<paexec> requires commands to print a blank line as the last
output. This means you will have to write a wrapper for most programs.
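Such a wrapper could look like the following sketch (the name
task_wrapper.sh and the uppercasing task are made up for
illustration; the blank-line convention is the part B<paexec>
depends on):

```shell
# Sketch of a paexec-style wrapper: one task per input line, and each
# task's output is terminated by a blank line so paexec knows the
# task has finished
cat > task_wrapper.sh <<'EOF'
#!/bin/sh
while IFS= read -r task; do
    printf '%s\n' "$task" | tr a-z A-Z   # the real work for one task
    echo                                 # blank line = "task finished"
done
EOF
chmod +x task_wrapper.sh
printf 'hello\n' | ./task_wrapper.sh
```

GNU B<parallel> needs no such convention, because it runs one process
per job and knows a job is done when the process exits.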
B<paexec> has a job dependency facility so a job can depend on another
job being executed successfully. Sort of a poor man's B<make>.

=head3 EXAMPLES FROM paexec's EXAMPLE CATALOG

Here are the examples from B<paexec>'s example catalog with the
equivalent using GNU B<parallel>:
=head4 1_div_X_run

  1$ ../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]

  1$ parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]

=head4 all_substr_run

  2$ ../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]

  2$ parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]

=head4 cc_wrapper_run

  3$ ../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
       -n 'host1 host2' \
       -t '/usr/bin/ssh -x' <<EOF [...]

  3$ parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
       -S host1,host2 <<EOF [...]

     # This is not exactly the same, but avoids the wrapper
     parallel gcc -O2 -c -o {.}.o {} \
       -S host1,host2 <<EOF [...]

=head4 toupper_run

  4$ ../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]

  4$ parallel echo {} '|' ./toupper_cmd <<EOF [...]

     # Without the wrapper:
     parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]

https://github.com/cheusov/paexec
=head2 DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel

Summary (see legend above):

=over

=item I1 - - I4 - - (I7)

=item M1 (M2) M3 (M4) M5 M6

=item - O2 O3 - O5 - - N/A N/A O10

=item E1 - - - - - -

=item - - - - - - - - -

=item - -

=back

(I7): Only under special circumstances. See below.

(M2+M4): Only if there is a single replacement string.

B<map> rejects input with special characters:

  echo "The Cure" > My\ brother\'s\ 12\"\ records

  ls | map 'echo %; wc %'

It works with GNU B<parallel>:

  ls | parallel 'echo {}; wc {}'

Under some circumstances it also works with B<map>:

  ls | map 'echo % works %'

But tiny changes make it reject the input with special characters:

  ls | map 'echo % does not work "%"'

This means that many UTF-8 characters will be rejected. This is by
design. From the web page: "As such, programs that I<quietly handle
them, with no warnings at all,> are doing their users a disservice."
B<map> delays each job by 0.01 s. This can be emulated by using
B<parallel --delay 0.01>.

B<map> prints '+' on stderr when a job starts, and '-' when a job
finishes. This cannot be disabled. B<parallel> has B<--bar> if you
need to see progress.

B<map>'s replacement strings (% %D %B %E) can be simulated in GNU
B<parallel> by putting this in B<~/.parallel/config>:

  --rpl '%'
  --rpl '%D $_=Q(::dirname($_));'
  --rpl '%B s:.*/::;s:\.[^/.]+$::;'
  --rpl '%E s:.*\.::'

B<map> does not have an argument separator on the command line, but
uses the first argument as command. This makes quoting harder, which
in turn may affect readability. Compare:

  map -p 2 'perl -ne '"'"'/^\S+\s+\S+$/ and print $ARGV,"\n"'"'" *

  parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *
B<map> can do multiple arguments with context replace, but not without
context replace:

  parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3

  map "echo 'BEGIN{'%'}END'" 1 2 3

B<map> has no support for grouping. So this gives the wrong results:

  parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
    ::: a b c d e f
  ls -l a b c d e f
  parallel -kP4 -n1 grep 1 ::: a b c d e f > out.par
  map -n1 -p 4 'grep 1' a b c d e f > out.map-unbuf
  map -n1 -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
  map -n1 -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
  ls -l out*
  md5sum out*
=head3 EXAMPLES FROM map's WEBSITE

Here are the examples from B<map>'s web page with the equivalent using
GNU B<parallel>:

  1$ ls *.gif | map convert % %B.png  # default max-args: 1

  1$ ls *.gif | parallel convert {} {.}.png

  2$ map "mkdir %B; tar -C %B -xf %" *.tgz  # default max-args: 1

  2$ parallel 'mkdir {.}; tar -C {.} -xf {}' ::: *.tgz

  3$ ls *.gif | map cp % /tmp  # default max-args: 100

  3$ ls *.gif | parallel -X cp {} /tmp

  4$ ls *.tar | map -n 1 tar -xf %

  4$ ls *.tar | parallel tar -xf

  5$ map "cp % /tmp" *.tgz

  5$ parallel cp {} /tmp ::: *.tgz

  6$ map "du -sm /home/%/mail" alice bob carol

  6$ parallel "du -sm /home/{}/mail" ::: alice bob carol
     # or if you prefer running a single job with multiple args:
  6$ parallel -Xj1 "du -sm /home/{}/mail" ::: alice bob carol

  7$ cat /etc/passwd | map -d: 'echo user %1 has shell %7'

  7$ cat /etc/passwd | parallel --colsep : 'echo user {1} has shell {7}'

  8$ export MAP_MAX_PROCS=$(( `nproc` / 2 ))

  8$ export PARALLEL=-j50%

https://github.com/sitaramc/map (Last checked: 2020-05)
=head2 DIFFERENCES BETWEEN ladon AND GNU Parallel

B<ladon> can run multiple jobs on files in parallel.

B<ladon> only works on files and the only way to specify files is
using a quoted glob string (such as \*.jpg). It is not possible to
list the files manually.

As replacement strings it uses FULLPATH DIRNAME BASENAME EXT RELDIR
RELPATH.

These can be simulated using GNU B<parallel> by putting this in
B<~/.parallel/config>:

  --rpl 'FULLPATH $_=Q($_);chomp($_=qx{readlink -f $_});'
  --rpl 'DIRNAME $_=Q(::dirname($_));chomp($_=qx{readlink -f $_});'
  --rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;'
  --rpl 'EXT s:.*\.::'
  --rpl 'RELDIR $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
         s:\Q$c/\E::;$_=::dirname($_);'
  --rpl 'RELPATH $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
         s:\Q$c/\E::;'

B<ladon> deals badly with filenames containing " and newline, and it
fails for output larger than 200k:

  ladon '*' -- seq 36000 | wc
=head3 EXAMPLES FROM ladon MANUAL

It is assumed that the '--rpl's above are put in B<~/.parallel/config>
and that it is run under a shell that supports '**' globbing (such as B<zsh>):

  1$ ladon "**/*.txt" -- echo RELPATH

  1$ parallel echo RELPATH ::: **/*.txt

  2$ ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt

  2$ parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt

  3$ ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH \
       -thumbnail 100x100^ -gravity center -extent 100x100 \
       thumbs/RELPATH

  3$ parallel mkdir -p thumbs/RELDIR\; convert FULLPATH \
       -thumbnail 100x100^ -gravity center -extent 100x100 \
       thumbs/RELPATH ::: **/*.jpg

  4$ ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3

  4$ parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav

https://github.com/danielgtaylor/ladon (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN jobflow AND GNU Parallel

B<jobflow> can run multiple jobs in parallel.

Just like with B<xargs>, output from B<jobflow> jobs running in
parallel mixes together by default. B<jobflow> can buffer into files
(placed in /run/shm), but these are not cleaned up if B<jobflow> dies
unexpectedly (e.g. by Ctrl-C). If the total output is big (in the
order of RAM+swap) it can cause the system to slow to a crawl and
eventually run out of memory.

B<jobflow> gives no error if the command is unknown, and like B<xargs>
redirection and composed commands require wrapping with B<bash -c>.

Input lines can at most be 4096 bytes. You can at most have 16 {}'s in
the command template. More than that either crashes the program or
simply does not execute the command.

B<jobflow> has no equivalent of B<--pipe> or B<--sshlogin>.

B<jobflow> makes it possible to set resource limits on the running
jobs. This can be emulated by GNU B<parallel> using B<bash>'s B<ulimit>:

  jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob

  parallel 'ulimit -v 102400 -t 3 -f 204800 -n 300 myjob'
=head3 EXAMPLES FROM jobflow README

  1$ cat things.list | jobflow -threads=8 -exec ./mytask {}

  1$ cat things.list | parallel -j8 ./mytask {}

  2$ seq 100 | jobflow -threads=100 -exec echo {}

  2$ seq 100 | parallel -j100 echo {}

  3$ cat urls.txt | jobflow -threads=32 -exec wget {}

  3$ cat urls.txt | parallel -j32 wget {}

  4$ find . -name '*.bmp' | \
       jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg

  4$ find . -name '*.bmp' | \
       parallel -j8 bmp2jpeg {.}.bmp {.}.jpg

https://github.com/rofl0r/jobflow
=head2 DIFFERENCES BETWEEN gargs AND GNU Parallel

B<gargs> can run multiple jobs in parallel.

Older versions cache output in memory. This causes it to be extremely
slow when the output is larger than the physical RAM, and can cause
the system to run out of memory.

See more details on this in B<man parallel_design>.

Newer versions cache output in files, but leave the files in $TMPDIR
if it is killed.

Output to stderr (standard error) is changed if the command fails.

=head3 EXAMPLES FROM gargs WEBSITE

  1$ seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"

  1$ seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"

  2$ cat t.txt | gargs --sep "\s+" \
       -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'"

  2$ cat t.txt | parallel --colsep "\\s+" \
       -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'"

https://github.com/brentp/gargs
=head2 DIFFERENCES BETWEEN orgalorg AND GNU Parallel

B<orgalorg> can run the same job on multiple machines. This is related
to B<--onall> and B<--nonall>.

B<orgalorg> supports entering the SSH password - provided it is the
same for all servers. GNU B<parallel> advocates using B<ssh-agent>
instead, but it is possible to emulate B<orgalorg>'s behavior by
setting SSHPASS and by using B<--ssh "sshpass ssh">.

To make the emulation easier, make a simple alias:

  alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --lb"

If you want to supply a password run:

  SSHPASS=`ssh-askpass`

or set the password directly:

  SSHPASS=P4$$w0rd!

If the above is set up you can then do:

  orgalorg -o frontend1 -o frontend2 -p -C uptime
  par_emul -S frontend1 -S frontend2 uptime

  orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
  par_emul -S frontend1 -S frontend2 top -bid 1

  orgalorg -o frontend1 -o frontend2 -p -er /tmp -n \
    'md5sum /tmp/bigfile' -S bigfile
  par_emul -S frontend1 -S frontend2 --basefile bigfile \
    --workdir /tmp md5sum /tmp/bigfile

B<orgalorg> has a progress indicator for the transferring of a
file. GNU B<parallel> does not.

https://github.com/reconquest/orgalorg
=head2 DIFFERENCES BETWEEN Rust parallel AND GNU Parallel

Rust parallel focuses on speed. It is almost as fast as B<xargs>, but
not as fast as B<parallel-bash>. It implements a few features from GNU
B<parallel>, but lacks many functions. All these fail:

  # Read arguments from file
  parallel -a file echo
  # Changing the delimiter
  parallel -d _ echo ::: a_b_c_

These do something different from GNU B<parallel>:

  # -q to protect quoted $ and space
  parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c
  # Generation of combination of inputs
  parallel echo {1} {2} ::: red green blue ::: S M L XL XXL
  # {= perl expression =} replacement string
  parallel echo '{= s/new/old/ =}' ::: my.new your.new
  # --pipe
  seq 100000 | parallel --pipe wc
  # linked arguments
  parallel echo ::: S M L :::+ sml med lrg ::: R G B :::+ red grn blu
  # Run different shell dialects
  zsh -c 'parallel echo \={} ::: zsh && true'
  csh -c 'parallel echo \$\{\} ::: shell && true'
  bash -c 'parallel echo \$\({}\) ::: pwd && true'
  # Rust parallel does not start before the last argument is read
  (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo'
  tail -f /var/log/syslog | parallel echo

Most of the examples from the book GNU Parallel 2018 do not work, thus
Rust parallel is not close to being a compatible replacement.

Rust parallel has no remote facilities.

It uses /tmp/parallel for tmp files and does not clean up if
terminated abruptly. If another user on the system uses Rust parallel,
then /tmp/parallel will have the wrong permissions and Rust parallel
will fail. A malicious user can set up the right permissions and
symlink the output file to one of the user's files, and the next time
the user runs Rust parallel it will overwrite this file.

  attacker$ mkdir /tmp/parallel
  attacker$ chmod a+rwX /tmp/parallel
  # Symlink to the file the attacker wants to zero out
  attacker$ ln -s ~victim/.important-file /tmp/parallel/stderr_1
  victim$ seq 1000 | parallel echo
  # This file is now overwritten with stderr from 'echo'
  victim$ cat ~victim/.important-file

If /tmp/parallel runs full during the run, Rust parallel does not
report this, but finishes with success - thereby risking data loss.

https://github.com/mmstick/parallel
=head2 DIFFERENCES BETWEEN Rush AND GNU Parallel

B<rush> (https://github.com/shenwei356/rush) is written in Go and
based on B<gargs>.

Just like GNU B<parallel>, B<rush> buffers in temporary files. But
unlike GNU B<parallel>, B<rush> does not clean up if the process
dies abnormally.

B<rush> has some string manipulations that can be emulated by putting
this into ~/.parallel/config (/ is used instead of %, and % is used
instead of ^ as that is closer to bash's ${var%postfix}):

  --rpl '{:} s:(\.[^/]+)*$::'
  --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::'
  --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:'
  --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:'
  --rpl '{@(.*?)} /$$1/ and $_=$1;'
=head3 EXAMPLES FROM rush's WEBSITE

Here are the examples from B<rush>'s website with the equivalent
command in GNU B<parallel>.

B<1. Simple run, quoting is not necessary>

  1$ seq 1 3 | rush echo {}

  1$ seq 1 3 | parallel echo {}

B<2. Read data from file (`-i`)>

  2$ rush echo {} -i data1.txt -i data2.txt

  2$ cat data1.txt data2.txt | parallel echo {}

B<3. Keep output order (`-k`)>

  3$ seq 1 3 | rush 'echo {}' -k

  3$ seq 1 3 | parallel -k echo {}

B<4. Timeout (`-t`)>

  4$ time seq 1 | rush 'sleep 2; echo {}' -t 1

  4$ time seq 1 | parallel --timeout 1 'sleep 2; echo {}'

B<5. Retry (`-r`)>

  5$ seq 1 | rush 'python unexisted_script.py' -r 1

  5$ seq 1 | parallel --retries 2 'python unexisted_script.py'

Use B<-u> to see it is really run twice:

  5$ seq 1 | parallel -u --retries 2 'python unexisted_script.py'

B<6. Dirname (`{/}`) and basename (`{%}`) and remove custom
suffix (`{^suffix}`)>

  6$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'

  6$ echo dir/file_1.txt.gz |
       parallel --plus echo {//} {/} {%_1.txt.gz}

B<7. Get basename, and remove last (`{.}`) or any (`{:}`) extension>

  7$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'

  7$ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'

B<8. Job ID, combine fields index and other replacement strings>

  8$ echo 12 file.txt dir/s_1.fq.gz |
       rush 'echo job {#}: {2} {2.} {3%:^_1}'

  8$ echo 12 file.txt dir/s_1.fq.gz |
       parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}'

B<9. Capture submatch using regular expression (`{@regexp}`)>

  9$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'

  9$ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}'

B<10. Custom field delimiter (`-d`)>

  10$ echo a=b=c | rush 'echo {1} {2} {3}' -d =

  10$ echo a=b=c | parallel -d = echo {1} {2} {3}

B<11. Send multi-lines to every command (`-n`)>

  11$ seq 5 | rush -n 2 -k 'echo "{}"; echo'

  11$ seq 5 |
        parallel -n 2 -k \
          'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo'

  11$ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '

  11$ seq 5 | parallel -n 2 -k 'echo {}; echo'

B<12. Custom record delimiter (`-D`), note that empty records are not used.>

  12$ echo a b c d | rush -D " " -k 'echo {}'

  12$ echo a b c d | parallel -d " " -k 'echo {}'

  12$ echo abcd | rush -D "" -k 'echo {}'

  Cannot be done by GNU Parallel

  12$ cat fasta.fa
      >seq1
      >seq2
      >seq3
      attac

  12$ cat fasta.fa | rush -D ">" \
        'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
      # rush fails to join the multiline sequences

  12$ cat fasta.fa | (read -n1 ignore_first_char;
        parallel -d '>' --colsep '\n' echo FASTA record {#}: \
          name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}')
B<13. Assign value to variable, like `awk -v` (`-v`)>

  13$ seq 1 |
        rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen

  13$ seq 1 |
        parallel -N0 \
          'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'

  13$ for var in a b; do \
  13$   seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
  13$ done

In GNU B<parallel> you would typically do:

  13$ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -

If you I<really> want the var:

  13$ seq 1 3 |
        parallel -k var={1} ';echo var: $var, data: {}' ::: a b :::: -

If you I<really> want the B<for>-loop:

  13$ for var in a b; do
        export var;
        seq 1 3 | parallel -k 'echo var: $var, data: {}';
      done

Contrary to B<rush> this also works if the value is complex like:

  My brother's 12" records

B<14. Preset variable (`-v`), avoid repeatedly writing verbose
replacement strings>

  14$ # naive way
      echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'

  14$ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'

  14$ # macro + removing suffix
      echo read_1.fq.gz |
        rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'

  14$ echo read_1.fq.gz |
        parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'

  14$ # macro + regular expression
      echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'

  14$ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'

Contrary to B<rush> GNU B<parallel> works with complex values:

  14$ echo "My brother's 12\"read_1.fq.gz" |
        parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
B<15. Interrupt jobs by `Ctrl-C`, rush will stop unfinished commands
and exit.>

  15$ seq 1 20 | rush 'sleep 1; echo {}'

  15$ seq 1 20 | parallel 'sleep 1; echo {}'

B<16. Continue/resume jobs (`-c`). When some jobs failed (by
execution failure, timeout, or canceling by user with `Ctrl + C`),
please switch flag `-c/--continue` on and run again, so that `rush`
can save successful commands and ignore them in I<NEXT> run.>

  16$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
      cat successful_cmds.rush
      seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c

  16$ seq 1 3 | parallel --joblog mylog --timeout 2 \
        'sleep {}; echo {}'
      cat mylog
      seq 1 3 | parallel --joblog mylog --retry-failed \
        'sleep {}; echo {}'

Multi-line jobs:

  16$ seq 1 3 | rush 'sleep {}; echo {}; \
        echo finish {}' -t 3 -c -C finished.rush
      cat finished.rush
      seq 1 3 | rush 'sleep {}; echo {}; \
        echo finish {}' -t 3 -c -C finished.rush

  16$ seq 1 3 |
        parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
          echo finish {}'
      cat mylog
      seq 1 3 |
        parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
          echo finish {}'
B<17. A comprehensive example: downloading 1K+ pages given by
three URL list files using `phantomjs save_page.js` (some page
contents are dynamically generated by Javascript, so `wget` does not
work). Here I set max jobs number (`-j`) as `20`, each job has a max
running time (`-t`) of `60` seconds and `3` retry changes
(`-r`). Continue flag `-c` is also switched on, so we can continue
unfinished jobs. Luckily, it's accomplished in one run :)>

  17$ for f in $(seq 2014 2016); do \
        /bin/rm -rf $f; mkdir -p $f; \
        cat $f.html.txt | rush -v d=$f -d = \
          'phantomjs save_page.js "{}" > {d}/{3}.html' \
          -j 20 -t 60 -r 3 -c; \
      done

GNU B<parallel> can append to an existing joblog with '+':

  17$ rm mylog
      for f in $(seq 2014 2016); do
        /bin/rm -rf $f; mkdir -p $f;
        cat $f.html.txt |
          parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
            --colsep = \
            phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
      done
B<18. A bioinformatics example: mapping with `bwa`, and
processing result with `samtools`:>

  18$ ref=ref/xxx.fa
      threads=25
      ls -d raw.cluster.clean.mapping/* \
        | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
          'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz >{p}.sam;\
           samtools view -bS {p}.sam > {p}.bam; \
           samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
           samtools index {p}.sorted.bam; \
           samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
           /bin/rm {p}.bam {p}.sam;' \
          -j 2 --verbose -c -C mapping.rush

GNU B<parallel> would use a function:

  18$ ref=ref/xxx.fa
      export ref
      thr=25
      export thr
      bwa_sam() {
        p="$1"
        bam="$p".bam
        sam="$p".sam
        sortbam="$p".sorted.bam
        bwa mem -t $thr -M -a $ref ${p}_1.fq.gz ${p}_2.fq.gz > "$sam"
        samtools view -bS "$sam" > "$bam"
        samtools sort -T ${p}.tmp -@ $thr "$bam" -o "$sortbam"
        samtools index "$sortbam"
        samtools flagstat "$sortbam" > "$sortbam".flagstat
        /bin/rm "$bam" "$sam"
      }
      export -f bwa_sam
      ls -d raw.cluster.clean.mapping/* |
        parallel -j 2 --verbose --joblog mylog bwa_sam
=head3 Other B<rush> features

B<rush> has:

=over 4

=item * B<awk -v> like custom defined variables (B<-v>)

With GNU B<parallel> you would simply set a shell variable:

  parallel 'v={}; echo "$v"' ::: foo
  echo foo | rush -v v={} 'echo {v}'

Also B<rush> does not like special chars. So these B<do not work>:

  echo does not work | rush -v v=\" 'echo {v}'
  echo "My brother's 12\" records" | rush -v v={} 'echo {v}'

Whereas the corresponding GNU B<parallel> versions work:

  parallel 'v=\"; echo "$v"' ::: works
  parallel 'v={}; echo "$v"' ::: "My brother's 12\" records"

=item * Exit on first error(s) (B<-e>)

This is called B<--halt now,fail=1> (or shorter: B<--halt 2>) when
used with GNU B<parallel>.
=item * Settable number of records sent to every command (B<-n>, default 1)

This is also called B<-n> in GNU B<parallel>.
=item * Practical replacement strings

=over 4

=item {:} remove any extension

With GNU B<parallel> this can be emulated by:

  parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz

=item {^suffix}, remove suffix

With GNU B<parallel> this can be emulated by:

  parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz

=item {@regexp}, capture submatch using regular expression

With GNU B<parallel> this can be emulated by:

  parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
    echo '{@\d_(.*).gz}' ::: 1_foo.gz

=item {%.}, {%:}, basename without extension

With GNU B<parallel> this can be emulated by:

  parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz

And if you need it often, you define a B<--rpl> in
B<$HOME/.parallel/config>:

  --rpl '{%.} s:.*/::;s/\..*//'
  --rpl '{%:} s:.*/::;s/\..*//'

Then you can use them as:

  parallel echo {%.} {%:} ::: dir/foo.bar.gz

=back

=item * Preset variable (macro)

E.g.

  echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'

With GNU B<parallel> this can be emulated by:

  echo foosuffix |
    parallel --plus 'p={%suffix}; echo ${p}_new_suffix'

Unlike B<rush>, GNU B<parallel> works fine if the input contains
double space, ' and ":

  echo "1'6\" foosuffix" |
    parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'
=item * Multi-line commands

While you I<can> use multi-line commands in GNU B<parallel>, to
improve readability GNU B<parallel> discourages the use of multi-line
commands. In most cases they can be written as a function:

  seq 1 3 |
    parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
      echo finish {}'

Could be written as:

  doit() {
    sleep "$1"
    echo "$1"
    echo finish "$1"
  }
  export -f doit
  seq 1 3 | parallel --timeout 2 --joblog my.log doit

The failed commands can be resumed with:

  seq 1 3 |
    parallel --resume-failed --joblog my.log 'sleep {}; echo {};\
      echo finish {}'

=back

https://github.com/shenwei356/rush
=head2 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel

ClusterSSH solves a different problem than GNU B<parallel>.

ClusterSSH opens a terminal window for each computer and using a
master window you can run the same command on all the computers. This
is typically used for administrating several computers that are almost
identical.

GNU B<parallel> runs the same (or different) commands with different
arguments in parallel, possibly using remote computers to help
computing. If more than one computer is listed in B<-S>, GNU
B<parallel> may only use one of these (e.g. if there are 8 jobs to be
run and one computer has 8 cores).

GNU B<parallel> can be used as a poor-man's version of ClusterSSH:

B<parallel --nonall -S server-a,server-b do_stuff foo bar>

https://github.com/duncs/clusterssh
=head2 DIFFERENCES BETWEEN coshell AND GNU Parallel

B<coshell> only accepts full commands on standard input. Any quoting
needs to be done by the user.

Commands are run in B<sh> so any B<bash>/B<tcsh>/B<zsh> specific
syntax will not work.

Output can be buffered by using B<-d>. Output is buffered in memory,
so big output can cause swapping and therefore be terribly slow, or
even make the system run out of memory.

https://github.com/gdm85/coshell (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN spread AND GNU Parallel

B<spread> runs commands on all directories.

It can be emulated with GNU B<parallel> using this Bash function:

  spread() {
    _cmds() {
      perl -e '$"=" && ";print "@ARGV"' "cd {}" "$@"
    }
    parallel $(_cmds "$@")'|| echo exit status $?' ::: */
  }

This works except for the B<--exclude> option.

(Last checked: 2017-11)
=head2 DIFFERENCES BETWEEN pyargs AND GNU Parallel

B<pyargs> deals badly with input containing spaces. It buffers stdout,
but not stderr. It buffers in RAM. {} does not work as a replacement
string. It does not support running functions.

B<pyargs> does not support composed commands if run with B<--lines>,
and fails on B<pyargs traceroute gnu.org fsf.org>.

=head3 Examples

  seq 5 | pyargs -P50 -L seq
  seq 5 | parallel -P50 --lb seq

  seq 5 | pyargs -P50 --mark -L seq
  seq 5 | parallel -P50 --lb \
    --tagstring OUTPUT'[{= $_=$job->replaced()=}]' seq
  # Similar, but not precisely the same
  seq 5 | parallel -P50 --lb --tag seq

  seq 5 | pyargs -P50 --mark command
  # Somewhat longer with GNU Parallel due to the special
  # --mark formatting
  cmd="$(echo "command" | parallel --shellquote)"
  wrap_cmd() {
    echo "MARK $cmd $@================================" >&3
    echo "OUTPUT START[$cmd $@]:"
    eval $cmd "$@"
    echo "OUTPUT END[$cmd $@]"
  }
  (seq 5 | env_parallel -P2 wrap_cmd) 3>&1
  # Similar, but not exactly the same
  seq 5 | parallel -t --tag command

  (echo '1 2 3';echo 4 5 6) | pyargs --stream seq
  (echo '1 2 3';echo 4 5 6) | perl -pe 's/\n/ /' |
    parallel -r -d' ' seq
  # Similar, but not exactly the same
  parallel seq ::: 1 2 3 4 5 6

https://github.com/robertblackwell/pyargs (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN concurrently AND GNU Parallel

B<concurrently> runs jobs in parallel.

The output is prepended with the job number, and may be incomplete:

  $ concurrently 'seq 100000' | (sleep 3;wc -l)
  7165

When pretty printing it caches output in memory. Whether or not the
output is cached, output from different jobs mixes.

There seems to be no way of making a template command and having
B<concurrently> fill that in with different args. The full commands
must be given on the command line.

There is also no way of controlling how many jobs should be run in
parallel at a time - i.e. the "number of jobslots". Instead all jobs
are simply started in parallel.

https://github.com/kimmobrunfeldt/concurrently (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN map(soveran) AND GNU Parallel

B<map> does not run jobs in parallel by default. The README suggests
using:

  ... | map t 'sleep $t && say done &'

But this fails if more jobs are run in parallel than the number of
available processes. Since there is no support for parallelization in
B<map> itself, the output also mixes:

  seq 10 | map i 'echo start-$i && sleep 0.$i && echo end-$i &'

The major difference is that GNU B<parallel> is built for
parallelization and B<map> is not. So GNU B<parallel> has lots of ways
of dealing with the issues that parallelization raises:

=over 4

=item *

Keep the number of processes manageable

=item *

Make sure output does not mix

=item *

Make Ctrl-C kill all running processes

=back
=head3 EXAMPLES FROM map's WEBSITE

Here are the 5 examples converted to GNU B<parallel>:

  1$ ls *.c | map f 'foo $f'
  1$ ls *.c | parallel foo

  2$ ls *.c | map f 'foo $f; bar $f'
  2$ ls *.c | parallel 'foo {}; bar {}'

  3$ cat urls | map u 'curl -O $u'
  3$ cat urls | parallel curl -O

  4$ printf "1\n1\n1\n" | map t 'sleep $t && say done'
  4$ printf "1\n1\n1\n" | parallel 'sleep {} && say done'
  4$ parallel 'sleep {} && say done' ::: 1 1 1

  5$ printf "1\n1\n1\n" | map t 'sleep $t && say done &'
  5$ printf "1\n1\n1\n" | parallel -j0 'sleep {} && say done'
  5$ parallel -j0 'sleep {} && say done' ::: 1 1 1

https://github.com/soveran/map (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN loop AND GNU Parallel

B<loop> mixes stdout and stderr:

  loop 'ls /no-such-file' >/dev/null

B<loop>'s replacement string B<$ITEM> does not quote strings:

  echo 'two  spaces' | loop 'echo $ITEM'

B<loop> cannot run functions:

  myfunc() { echo joe; }
  export -f myfunc
  loop 'myfunc this fails'

=head3 EXAMPLES FROM loop's WEBSITE

Some of the examples from https://github.com/Miserlou/Loop/ can be
emulated with GNU B<parallel>:

  # A couple of functions will make the code easier to read
  $ loopy() {
      yes | parallel -uN0 -j1 "$@"
    }
  $ export -f loopy
  $ time_out() {
      parallel -uN0 -q --timeout "$@" ::: 1
    }
  $ match() {
      perl -0777 -ne 'grep /'"$1"'/,$_ and print or exit 1'
    }
  $ export -f match
  $ loop 'ls' --every 10s
  $ loopy --delay 10s ls

  $ loop 'touch $COUNT.txt' --count-by 5
  $ loopy touch '{= $_=seq()*5 =}'.txt

  $ loop --until-contains 200 -- \
      ./get_response_code.sh --site mysite.biz
  $ loopy --halt now,success=1 \
      './get_response_code.sh --site mysite.biz | match 200'

  $ loop './poke_server' --for-duration 8h
  $ time_out 8h loopy ./poke_server

  $ loop './poke_server' --until-success
  $ loopy --halt now,success=1 ./poke_server

  $ cat files_to_create.txt | loop 'touch $ITEM'
  $ cat files_to_create.txt | parallel touch {}

  $ loop 'ls' --for-duration 10min --summary
  # --joblog is somewhat more verbose than --summary
  $ time_out 10m loopy --joblog my.log ./poke_server; cat my.log

  $ loop 'echo hello'
  $ loopy echo hello

  $ loop 'echo $COUNT'
  # GNU Parallel counts from 1
  $ loopy echo {#}
  # Counting from 0 can be forced
  $ loopy echo '{= $_=seq()-1 =}'

  $ loop 'echo $COUNT' --count-by 2
  $ loopy echo '{= $_=2*(seq()-1) =}'

  $ loop 'echo $COUNT' --count-by 2 --offset 10
  $ loopy echo '{= $_=10+2*(seq()-1) =}'

  $ loop 'echo $COUNT' --count-by 1.1
  # GNU Parallel rounds 3.3000000000000003 to 3.3
  $ loopy echo '{= $_=1.1*(seq()-1) =}'

  $ loop 'echo $COUNT $ACTUALCOUNT' --count-by 2
  $ loopy echo '{= $_=2*(seq()-1) =} {#}'

  $ loop 'echo $COUNT' --num 3 --summary
  # --joblog is somewhat more verbose than --summary
  $ seq 3 | parallel --joblog my.log echo; cat my.log

  $ loop 'ls -foobarbatz' --num 3 --summary
  # --joblog is somewhat more verbose than --summary
  $ seq 3 | parallel --joblog my.log -N0 ls -foobarbatz; cat my.log

  $ loop 'echo $COUNT' --count-by 2 --num 50 --only-last
  # Can be emulated by running 2 jobs
  $ seq 49 | parallel echo '{= $_=2*(seq()-1) =}' >/dev/null
  $ echo 50 | parallel echo '{= $_=2*(seq()-1) =}'

  $ loop 'date' --every 5s
  $ loopy --delay 5s date

  $ loop 'date' --for-duration 8s --every 2s
  $ time_out 8s loopy --delay 2s date

  $ loop 'date -u' --until-time '2018-05-25 20:50:00' --every 5s
  $ seconds=$((`date -d 2019-05-25T20:50:00 +%s` - `date +%s`))s
  $ time_out $seconds loopy --delay 5s date -u

  $ loop 'echo $RANDOM' --until-contains "666"
  $ loopy --halt now,success=1 'echo $RANDOM | match 666'
  $ loop 'if (( RANDOM % 2 )); then
        (echo "TRUE"; true);
      else
        (echo "FALSE"; false);
      fi' --until-success
  $ loopy --halt now,success=1 'if (( $RANDOM % 2 )); then
        (echo "TRUE"; true);
      else
        (echo "FALSE"; false);
      fi'

  $ loop 'if (( RANDOM % 2 )); then
        (echo "TRUE"; true);
      else
        (echo "FALSE"; false);
      fi' --until-error
  $ loopy --halt now,fail=1 'if (( $RANDOM % 2 )); then
        (echo "TRUE"; true);
      else
        (echo "FALSE"; false);
      fi'

  $ loop 'date' --until-match "(\d{4})"
  $ loopy --halt now,success=1 'date | match [0-9][0-9][0-9][0-9]'
  $ loop 'echo $ITEM' --for red,green,blue
  $ parallel echo ::: red green blue

  $ cat /tmp/my-list-of-files-to-create.txt | loop 'touch $ITEM'
  $ cat /tmp/my-list-of-files-to-create.txt | parallel touch

  $ ls | loop 'cp $ITEM $ITEM.bak'; ls
  $ ls | parallel cp {} {}.bak; ls

  $ loop 'echo $ITEM | tr a-z A-Z' -i
  $ parallel 'echo {} | tr a-z A-Z'
  # Or more efficiently:
  $ parallel --pipe tr a-z A-Z

  $ loop 'echo $ITEM' --for "`ls`"
  $ parallel echo {} ::: "`ls`"

  $ ls | loop './my_program $ITEM' --until-success;
  $ ls | parallel --halt now,success=1 ./my_program {}

  $ ls | loop './my_program $ITEM' --until-fail;
  $ ls | parallel --halt now,fail=1 ./my_program {}

  $ ./deploy.sh;
    loop 'curl -sw "%{http_code}" http://coolwebsite.biz' \
      --every 5s --until-contains 200;
    ./announce_to_slack.sh
  $ ./deploy.sh;
    loopy --delay 5s --halt now,success=1 \
      'curl -sw "%{http_code}" http://coolwebsite.biz | match 200';
    ./announce_to_slack.sh

  $ loop "ping -c 1 mysite.com" --until-success; ./do_next_thing
  $ loopy --halt now,success=1 ping -c 1 mysite.com; ./do_next_thing

  $ ./create_big_file -o my_big_file.bin;
    loop 'ls' --until-contains 'my_big_file.bin';
    ./upload_big_file my_big_file.bin
  # inotifywait is a better tool to detect file system changes.
  # It can even make sure the file is complete
  # so you are not uploading an incomplete file
  $ inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f . |
      grep my_big_file.bin

  $ ls | loop 'cp $ITEM $ITEM.bak'
  $ ls | parallel cp {} {}.bak

  $ loop './do_thing.sh' --every 15s --until-success --num 5
  $ parallel --retries 5 --delay 15s ::: ./do_thing.sh

https://github.com/Miserlou/Loop/ (Last checked: 2018-10)
=head2 DIFFERENCES BETWEEN lorikeet AND GNU Parallel

B<lorikeet> can run jobs in parallel. It does this based on a
dependency graph described in a file, so this is similar to B<make>.

https://github.com/cetra3/lorikeet (Last checked: 2018-10)


=head2 DIFFERENCES BETWEEN spp AND GNU Parallel

B<spp> can run jobs in parallel. B<spp> does not use a command
template to generate the jobs, but requires jobs to be in a
file. Output from the jobs mixes.

https://github.com/john01dav/spp (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN paral AND GNU Parallel

B<paral> prints a lot of status information and stores the output from
the commands run into files. This means it cannot be used in the
middle of a pipe like this:

  paral "echo this" "echo does not" "echo work" | wc

Instead it puts the output into files named like
B<out_#_I<command>.out.log>. To get a very similar behaviour with GNU
B<parallel> use B<--results
'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta>

B<paral> only takes arguments on the command line and each argument
should be a full command. Thus it does not use command templates.

This limits how many jobs it can run in total, because they all need
to fit on a single command line.

B<paral> has no support for running jobs remotely.

=head3 EXAMPLES FROM README.markdown

The examples from B<README.markdown> and the corresponding command run
with GNU B<parallel> (B<--results
'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta> is omitted from
the GNU B<parallel> command):

  1$ paral "command 1" "command 2 --flag" "command arg1 arg2"
  1$ parallel ::: "command 1" "command 2 --flag" "command arg1 arg2"

  2$ paral "sleep 1 && echo c1" "sleep 2 && echo c2" \
       "sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
  2$ parallel ::: "sleep 1 && echo c1" "sleep 2 && echo c2" \
       "sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
  # Or shorter:
     parallel "sleep {} && echo c{}" ::: {1..5}

  3$ paral -n=0 "sleep 5 && echo c5" "sleep 4 && echo c4" \
       "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
  3$ parallel ::: "sleep 5 && echo c5" "sleep 4 && echo c4" \
       "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
  # Or shorter:
     parallel -j0 "sleep {} && echo c{}" ::: 5 4 3 2 1

  4$ paral -n=1 "sleep 5 && echo c5" "sleep 4 && echo c4" \
       "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
  4$ parallel -j1 "sleep {} && echo c{}" ::: 5 4 3 2 1

  5$ paral -n=2 "sleep 5 && echo c5" "sleep 4 && echo c4" \
       "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
  5$ parallel -j2 "sleep {} && echo c{}" ::: 5 4 3 2 1

  6$ paral -n=5 "sleep 5 && echo c5" "sleep 4 && echo c4" \
       "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
  6$ parallel -j5 "sleep {} && echo c{}" ::: 5 4 3 2 1

  7$ paral -n=1 "echo a && sleep 0.5 && echo b && sleep 0.5 && \
       echo c && sleep 0.5 && echo d && sleep 0.5 && \
       echo e && sleep 0.5 && echo f && sleep 0.5 && \
       echo g && sleep 0.5 && echo h"
  7$ parallel ::: "echo a && sleep 0.5 && echo b && sleep 0.5 && \
       echo c && sleep 0.5 && echo d && sleep 0.5 && \
       echo e && sleep 0.5 && echo f && sleep 0.5 && \
       echo g && sleep 0.5 && echo h"

https://github.com/amattn/paral (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN concurr AND GNU Parallel

B<concurr> is built to run jobs in parallel using a client/server
model.

=head3 EXAMPLES FROM README.md

The examples from B<README.md>:

  1$ concurr 'echo job {#} on slot {%}: {}' : arg1 arg2 arg3 arg4
  1$ parallel 'echo job {#} on slot {%}: {}' ::: arg1 arg2 arg3 arg4

  2$ concurr 'echo job {#} on slot {%}: {}' :: file1 file2 file3
  2$ parallel 'echo job {#} on slot {%}: {}' :::: file1 file2 file3

  3$ concurr 'echo {}' < input_file
  3$ parallel 'echo {}' < input_file

  4$ cat file | concurr 'echo {}'
  4$ cat file | parallel 'echo {}'

B<concurr> deals badly with empty input files and with output larger
than 64 KB.

https://github.com/mmstick/concurr (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN lesser-parallel AND GNU Parallel

B<lesser-parallel> is the inspiration for B<parallel --embed>. Both
B<lesser-parallel> and B<parallel --embed> define bash functions that
can be included as part of a bash script to run jobs in parallel.

B<lesser-parallel> implements a few of the replacement strings, but
hardly any options, whereas B<parallel --embed> gives you the full
GNU B<parallel> experience.

https://github.com/kou1okada/lesser-parallel (Last checked: 2019-01)


=head2 DIFFERENCES BETWEEN npm-parallel AND GNU Parallel

B<npm-parallel> can run npm tasks in parallel.

There are no examples and very little documentation, so it is hard to
compare to GNU B<parallel>.

https://github.com/spion/npm-parallel (Last checked: 2019-01)
=head2 DIFFERENCES BETWEEN machma AND GNU Parallel

B<machma> runs tasks in parallel. It gives time stamped
output. It buffers in RAM.

=head3 EXAMPLES FROM README.md

The examples from README.md:

  1$ # Put shorthand for timestamp in config for the examples
     echo '--rpl '\
       \''{time} $_=::strftime("%Y-%m-%d %H:%M:%S",localtime())'\' \
       > ~/.parallel/machma
     echo '--line-buffer --tagstring "{#} {time} {}"' \
       >> ~/.parallel/machma

  2$ find . -iname '*.jpg' |
       machma -- mogrify -resize 1200x1200 -filter Lanczos {}
     find . -iname '*.jpg' |
       parallel --bar -Jmachma mogrify -resize 1200x1200 \
         -filter Lanczos {}

  3$ cat /tmp/ips | machma -p 2 -- ping -c 2 -q {}
  3$ cat /tmp/ips | parallel -j2 -Jmachma ping -c 2 -q {}

  4$ cat /tmp/ips |
       machma -- sh -c 'ping -c 2 -q $0 > /dev/null && echo alive' {}
  4$ cat /tmp/ips |
       parallel -Jmachma 'ping -c 2 -q {} > /dev/null && echo alive'

  5$ find . -iname '*.jpg' |
       machma --timeout 5s -- mogrify -resize 1200x1200 \
         -filter Lanczos {}
  5$ find . -iname '*.jpg' |
       parallel --timeout 5s --bar mogrify -resize 1200x1200 \
         -filter Lanczos {}

  6$ find . -iname '*.jpg' -print0 |
       machma --null -- mogrify -resize 1200x1200 -filter Lanczos {}
  6$ find . -iname '*.jpg' -print0 |
       parallel --null --bar mogrify -resize 1200x1200 \
         -filter Lanczos {}

https://github.com/fd0/machma (Last checked: 2019-06)
2115 =head2 DIFFERENCES BETWEEN interlace AND GNU Parallel
2117 Summary (see legend above):
2119 =over
2121 =item - I2 I3 I4 - - -
2123 =item M1 - M3 - - M6
2125 =item - O2 O3 - - - - x x
2127 =item E1 E2 - - - - -
2129 =item - - - - - - - - -
2131 =item - -
2133 =back
2135 B<interlace> is built for network analysis to run network tools in parallel.
2137 B<interface> does not buffer output, so output from different jobs mixes.
2139 The overhead for each target is O(n*n), so with 1000 targets it
2140 becomes very slow with an overhead in the order of 500ms/target.
=head3 EXAMPLES FROM interlace's WEBSITE

Using B<prips> most of the examples from
https://github.com/codingo/Interlace can be run with GNU B<parallel>:

Blocker

  commands.txt:
    mkdir -p _output_/_target_/scans/
    _blocker_
    nmap _target_ -oA _output_/_target_/scans/_target_-nmap
  interlace -tL ./targets.txt -cL commands.txt -o $output

  parallel -a targets.txt \
    mkdir -p $output/{}/scans/\; nmap {} -oA $output/{}/scans/{}-nmap

Blocks

  commands.txt:
    _block:nmap_
    mkdir -p _target_/output/scans/
    nmap _target_ -oN _target_/output/scans/_target_-nmap
    _block:nmap_
    nikto --host _target_
  interlace -tL ./targets.txt -cL commands.txt

  _nmap() {
    mkdir -p $1/output/scans/
    nmap $1 -oN $1/output/scans/$1-nmap
  }
  export -f _nmap
  parallel ::: _nmap "nikto --host" :::: targets.txt

Run Nikto Over Multiple Sites

  interlace -tL ./targets.txt -threads 5 \
    -c "nikto --host _target_ > ./_target_-nikto.txt" -v

  parallel -a targets.txt -P5 nikto --host {} \> ./{}-nikto.txt

Run Nikto Over Multiple Sites and Ports

  interlace -tL ./targets.txt -threads 5 -c \
    "nikto --host _target_:_port_ > ./_target_-_port_-nikto.txt" \
    -p 80,443 -v

  parallel -P5 nikto --host {1}:{2} \> ./{1}-{2}-nikto.txt \
    :::: targets.txt ::: 80 443

Run a List of Commands against Target Hosts

  commands.txt:
    nikto --host _target_:_port_ > _output_/_target_-nikto.txt
    sslscan _target_:_port_ > _output_/_target_-sslscan.txt
    testssl.sh _target_:_port_ > _output_/_target_-testssl.txt
  interlace -t example.com -o ~/Engagements/example/ \
    -cL ./commands.txt -p 80,443

  parallel --results ~/Engagements/example/{2}:{3}{1} {1} {2}:{3} \
    ::: "nikto --host" sslscan testssl.sh ::: example.com ::: 80 443

CIDR notation with an application that doesn't support it

  interlace -t 192.168.12.0/24 -c "vhostscan _target_ \
    -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50

  prips 192.168.12.0/24 |
    parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt

Glob notation with an application that doesn't support it

  interlace -t 192.168.12.* -c "vhostscan _target_ \
    -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50

  # Glob is not supported in prips
  prips 192.168.12.0/24 |
    parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt

Dash (-) notation with an application that doesn't support it

  interlace -t 192.168.12.1-15 -c \
    "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
    -o ~/scans/ -threads 50

  # Dash notation is not supported in prips
  prips 192.168.12.1 192.168.12.15 |
    parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt

Threading Support for an application that doesn't support it

  interlace -tL ./target-list.txt -c \
    "vhostscan -t _target_ -oN _output_/_target_-vhosts.txt" \
    -o ~/scans/ -threads 50

  cat ./target-list.txt |
    parallel -P50 vhostscan -t {} -oN ~/scans/{}-vhosts.txt

alternatively

  ./vhosts-commands.txt:
    vhostscan -t $target -oN _output_/_target_-vhosts.txt
  interlace -cL ./vhosts-commands.txt -tL ./target-list.txt \
    -threads 50 -o ~/scans

  ./vhosts-commands.txt:
    vhostscan -t "$1" -oN "$2"
  parallel -P50 ./vhosts-commands.txt {} ~/scans/{}-vhosts.txt \
    :::: ./target-list.txt

Exclusions

  interlace -t 192.168.12.0/24 -e 192.168.12.0/26 -c \
    "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
    -o ~/scans/ -threads 50

  prips 192.168.12.0/24 | grep -xv -Ff <(prips 192.168.12.0/26) |
    parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt

Run Nikto Using Multiple Proxies

  interlace -tL ./targets.txt -pL ./proxies.txt -threads 5 -c \
    "nikto --host _target_:_port_ -useproxy _proxy_ > \
    ./_target_-_port_-nikto.txt" -p 80,443 -v

  parallel -j5 \
    "nikto --host {1}:{2} -useproxy {3} > ./{1}-{2}-nikto.txt" \
    :::: ./targets.txt ::: 80 443 :::: ./proxies.txt

https://github.com/codingo/Interlace (Last checked: 2019-09)
=head2 DIFFERENCES BETWEEN otonvm Parallel AND GNU Parallel

I have been unable to get the code to run at all. It seems unfinished.

https://github.com/otonvm/Parallel (Last checked: 2019-02)
=head2 DIFFERENCES BETWEEN k-bx par AND GNU Parallel

B<par> requires Haskell to work. This limits the number of platforms
this can work on.

B<par> does line buffering in memory. The memory usage is 3x the
longest line (compared to 1x for B<parallel --lb>). Commands must be
given as arguments. There is no template.

These are the examples from https://github.com/k-bx/par with the
corresponding GNU B<parallel> command.

  par "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
    "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
  parallel --lb ::: "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
    "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"

  par "echo foo; sleep 1; foofoo" \
    "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
  parallel --lb --halt 1 ::: "echo foo; sleep 1; foofoo" \
    "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"

  par "PARPREFIX=[fooechoer] echo foo" "PARPREFIX=[bar] echo bar"
  parallel --lb --colsep , --tagstring {1} {2} \
    ::: "[fooechoer],echo foo" "[bar],echo bar"
  par --succeed "foo" "bar" && echo 'wow'
  parallel ::: "foo" "bar"; true && echo 'wow'
https://github.com/k-bx/par (Last checked: 2019-02)
=head2 DIFFERENCES BETWEEN parallelshell AND GNU Parallel

B<parallelshell> does not allow for composed commands:

  # This does not work
  parallelshell 'echo foo;echo bar' 'echo baz;echo quuz'

Instead you have to wrap that in a shell:

  parallelshell 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'

It buffers output in RAM. All commands must be given on the command
line and all commands are started in parallel at the same time. This
will cause the system to freeze if there are so many jobs that there
is not enough memory to run them all at the same time.
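The missing safeguard is a cap on concurrency. A pure-bash sketch of that cap (a toy illustration, not parallelshell's code; B<wait -n> needs bash 4.3 or later):

```shell
#!/bin/bash
# Toy sketch: start at most 4 jobs at a time instead of starting
# all of them at once - the safeguard parallelshell lacks.
run_capped() {
  local max=4 running=0 i
  for i in $(seq 1 20); do
    ( echo "job $i" ) &                # stand-in for a real job
    running=$((running+1))
    if [ "$running" -ge "$max" ]; then
      wait -n                          # block until any one job exits
      running=$((running-1))
    fi
  done
  wait                                 # drain the remaining jobs
}
run_capped | wc -l                     # all 20 jobs still run
```

With the cap in place, memory use is bounded by 4 simultaneous jobs no matter how many jobs are queued.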
https://github.com/keithamus/parallelshell (Last checked: 2019-02)

https://github.com/darkguy2008/parallelshell (Last checked: 2019-03)
=head2 DIFFERENCES BETWEEN shell-executor AND GNU Parallel

B<shell-executor> does not allow for composed commands:

  # This does not work
  sx 'echo foo;echo bar' 'echo baz;echo quuz'

Instead you have to wrap that in a shell:

  sx 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'

It buffers output in RAM. All commands must be given on the command
line and all commands are started in parallel at the same time. This
will cause the system to freeze if there are so many jobs that there
is not enough memory to run them all at the same time.

https://github.com/royriojas/shell-executor (Last checked: 2019-02)
=head2 DIFFERENCES BETWEEN non-GNU par AND GNU Parallel

B<par> buffers in memory to avoid mixing of jobs. It takes 1 second
per 1 million output lines.

B<par> needs to have all commands before starting the first job. The
jobs are read from stdin (standard input) so any quoting will have to
be done by the user.

Stdout (standard output) is prepended with o:. Stderr (standard error)
is sent to stdout (standard output) and prepended with e:.

For short jobs with little output B<par> is 20% faster than GNU
B<parallel> and 60% slower than B<xargs>.

https://github.com/UnixJunkie/PAR

https://savannah.nongnu.org/projects/par (Last checked: 2019-02)
=head2 DIFFERENCES BETWEEN fd AND GNU Parallel

B<fd> does not support composed commands, so commands must be wrapped
in B<sh -c>.

It buffers output in RAM.

It only takes file names from the filesystem as input (similar to B<find>).

https://github.com/sharkdp/fd (Last checked: 2019-02)
=head2 DIFFERENCES BETWEEN lateral AND GNU Parallel

B<lateral> is very similar to B<sem>: It takes a single command and
runs it in the background. The design means that output from parallel
running jobs may mix. If it dies unexpectedly it leaves a socket in
~/.lateral/socket.PID.

B<lateral> deals badly with too long command lines. This makes the
B<lateral> server crash:

  lateral run echo `seq 100000| head -c 1000k`

Any options will be read by B<lateral> so this does not work
(B<lateral> interprets the B<-l>):

  lateral run ls -l

Composed commands do not work:

  lateral run pwd ';' ls

Functions do not work:

  myfunc() { echo a; }
  export -f myfunc
  lateral run myfunc

Running B<emacs> in the terminal causes the parent shell to die:

  echo '#!/bin/bash' > mycmd
  echo emacs -nw >> mycmd
  chmod +x mycmd
  lateral start
  lateral run ./mycmd

Here are the examples from https://github.com/akramer/lateral with the
corresponding GNU B<sem> and GNU B<parallel> commands:
  1$ lateral start
     for i in $(cat /tmp/names); do
       lateral run -- some_command $i
     done
     lateral wait

  1$ for i in $(cat /tmp/names); do
       sem some_command $i
     done
     sem --wait

  1$ parallel some_command :::: /tmp/names

  2$ lateral start
     for i in $(seq 1 100); do
       lateral run -- my_slow_command < workfile$i > /tmp/logfile$i
     done
     lateral wait

  2$ for i in $(seq 1 100); do
       sem my_slow_command < workfile$i > /tmp/logfile$i
     done
     sem --wait

  2$ parallel 'my_slow_command < workfile{} > /tmp/logfile{}' \
       ::: {1..100}

  3$ lateral start -p 0 # yup, it will just queue tasks
     for i in $(seq 1 100); do
       lateral run -- command_still_outputs_but_wont_spam inputfile$i
     done
     # command output spam can commence
     lateral config -p 10; lateral wait

  3$ for i in $(seq 1 100); do
       echo "command inputfile$i" >> joblist
     done
     parallel -j 10 :::: joblist

  3$ echo 1 > /tmp/njobs
     parallel -j /tmp/njobs command inputfile{} \
       ::: {1..100} &
     echo 10 >/tmp/njobs
     wait

https://github.com/akramer/lateral (Last checked: 2019-03)
=head2 DIFFERENCES BETWEEN with-this AND GNU Parallel

The examples from https://github.com/amritb/with-this.git and the
corresponding GNU B<parallel> command:

  with -v "$(cat myurls.txt)" "curl -L this"
  parallel curl -L :::: myurls.txt

  with -v "$(cat myregions.txt)" \
    "aws --region=this ec2 describe-instance-status"
  parallel aws --region={} ec2 describe-instance-status \
    :::: myregions.txt

  with -v "$(ls)" "kubectl --kubeconfig=this get pods"
  ls | parallel kubectl --kubeconfig={} get pods

  with -v "$(ls | grep config)" "kubectl --kubeconfig=this get pods"
  ls | grep config | parallel kubectl --kubeconfig={} get pods

  with -v "$(echo {1..10})" "echo 123"
  parallel -N0 echo 123 ::: {1..10}

Stderr is merged with stdout. B<with-this> buffers in RAM. It uses 3x
the output size, so you cannot have output larger than 1/3rd the
amount of RAM. The input values cannot contain spaces. Composed
commands do not work.

B<with-this> gives some additional information, so the output has to
be cleaned before piping it to the next command.

https://github.com/amritb/with-this.git (Last checked: 2019-03)
=head2 DIFFERENCES BETWEEN Tollef's parallel (moreutils) AND GNU Parallel

Summary (see legend above):

=over

=item - - - I4 - - I7

=item - - M3 - - M6

=item - O2 O3 - O5 O6 - x x

=item E1 - - - - - E7

=item - x x x x x x x x

=item - -

=back

=head3 EXAMPLES FROM Tollef's parallel MANUAL

B<Tollef> parallel sh -c "echo hi; sleep 2; echo bye" -- 1 2 3

B<GNU> parallel "echo hi; sleep 2; echo bye" ::: 1 2 3

B<Tollef> parallel -j 3 ufraw -o processed -- *.NEF

B<GNU> parallel -j 3 ufraw -o processed ::: *.NEF

B<Tollef> parallel -j 3 -- ls df "echo hi"

B<GNU> parallel -j 3 ::: ls df "echo hi"

(Last checked: 2019-08)
=head2 DIFFERENCES BETWEEN rargs AND GNU Parallel

Summary (see legend above):

=over

=item I1 - - - - - I7

=item - - M3 M4 - -

=item - O2 O3 - O5 O6 - O8 -

=item E1 - - E4 - - -

=item - - - - - - - - -

=item - -

=back

B<rargs> has elegant ways of doing named regexp capture and field ranges.

With GNU B<parallel> you can use B<--rpl> to get a similar
functionality as regexp capture gives, and use B<join> and B<@arg> to
get the field ranges. But the syntax is longer. This:

  --rpl '{r(\d+)\.\.(\d+)} $_=join"$opt::colsep",@arg[$$1..$$2]'

would make it possible to use:

  {1r3..6}

for fields 3..6.
For full support of {n..m:s} including negative numbers use a dynamic
replacement string like this:

  PARALLEL=--rpl\ \''{r((-?\d+)?)\.\.((-?\d+)?)((:([^}]*))?)}
    $a = defined $$2 ? $$2 < 0 ? 1+$#arg+$$2 : $$2 : 1;
    $b = defined $$4 ? $$4 < 0 ? 1+$#arg+$$4 : $$4 : $#arg+1;
    $s = defined $$6 ? $$7 : " ";
    $_ = join $s,@arg[$a..$b]'\'
  export PARALLEL

You can then do:

  head /etc/passwd | parallel --colsep : echo ..={1r..} ..3={1r..3} \
    4..={1r4..} 2..4={1r2..4} 3..3={1r3..3} ..3:-={1r..3:-} \
    ..3:/={1r..3:/} -1={-1} -5={-5} -6={-6} -3..={1r-3..}

=head3 EXAMPLES FROM rargs MANUAL

  ls *.bak | rargs -p '(.*)\.bak' mv {0} {1}
  ls *.bak | parallel mv {} {.}

  cat download-list.csv |
    rargs -p '(?P<url>.*),(?P<filename>.*)' wget {url} -O {filename}
  cat download-list.csv | parallel --csv wget {1} -O {2}
  # or use regexps:
  cat download-list.csv |
    parallel --rpl '{url} s/,.*//' --rpl '{filename} s/.*?,//' \
      wget {url} -O {filename}

  cat /etc/passwd |
    rargs -d: echo -e 'id: "{1}"\t name: "{5}"\t rest: "{6..::}"'
  cat /etc/passwd |
    parallel -q --colsep : echo -e \
      'id: "{1}"\t name: "{5}"\t rest: "{=6 $_=join":",@arg[6..$#arg]=}"'

https://github.com/lotabout/rargs (Last checked: 2020-01)
=head2 DIFFERENCES BETWEEN threader AND GNU Parallel

Summary (see legend above):

=over

=item I1 - - - - - -

=item M1 - M3 - - M6

=item O1 - O3 - O5 - - N/A N/A

=item E1 - - E4 - - -

=item - - - - - - - - -

=item - -

=back

Newline separates arguments, but a newline at the end of the file is
treated as an empty argument. So this runs 2 jobs:

  echo two_jobs | threader -run 'echo "$THREADID"'

B<threader> ignores stderr, so any output to stderr is
lost. B<threader> buffers in RAM, so output bigger than the machine's
virtual memory will cause the machine to crash.

https://github.com/voodooEntity/threader (Last checked: 2020-04)
=head2 DIFFERENCES BETWEEN runp AND GNU Parallel

Summary (see legend above):

=over

=item I1 I2 - - - - -

=item M1 - (M3) - - M6

=item O1 O2 O3 - O5 O6 - N/A N/A -

=item E1 - - - - - -

=item - - - - - - - - -

=item - -

=back

(M3): You can add a prefix and a postfix to the input, so you can only
insert the argument once on the command line.

B<runp> runs 10 jobs in parallel by default. B<runp> blocks if the
output of a command is > 64 KB. Quoting of input is needed. It adds
output to stderr (this can be prevented with B<-q>).
=head3 Examples as GNU Parallel

  base='https://images-api.nasa.gov/search'
  query='jupiter'
  desc='planet'
  type='image'
  url="$base?q=$query&description=$desc&media_type=$type"

  # Download the images in parallel using runp
  curl -s $url | jq -r .collection.items[].href | \
    runp -p 'curl -s' | jq -r .[] | grep large | \
    runp -p 'curl -s -L -O'

  time curl -s $url | jq -r .collection.items[].href | \
    runp -g 1 -q -p 'curl -s' | jq -r .[] | grep large | \
    runp -g 1 -q -p 'curl -s -L -O'

  # Download the images in parallel
  curl -s $url | jq -r .collection.items[].href | \
    parallel curl -s | jq -r .[] | grep large | \
    parallel curl -s -L -O

  time curl -s $url | jq -r .collection.items[].href | \
    parallel -j 1 curl -s | jq -r .[] | grep large | \
    parallel -j 1 curl -s -L -O

=head4 Run some test commands (read from file)

  # Create a file containing commands to run in parallel.
  cat << EOF > /tmp/test-commands.txt
  sleep 5
  sleep 3
  blah # this will fail
  ls $PWD # PWD shell variable is used here
  EOF

  # Run commands from the file.
  runp /tmp/test-commands.txt > /dev/null

  parallel -a /tmp/test-commands.txt > /dev/null
=head4 Ping several hosts and see packet loss (read from stdin)

  # First copy this line and press Enter
  runp -p 'ping -c 5 -W 2' -s '| grep loss'
  localhost
  1.1.1.1
  8.8.8.8
  # Press Enter and Ctrl-D when done entering the hosts

  # First copy this line and press Enter
  parallel ping -c 5 -W 2 {} '| grep loss'
  localhost
  1.1.1.1
  8.8.8.8
  # Press Enter and Ctrl-D when done entering the hosts

=head4 Get directories' sizes (read from stdin)

  echo -e "$HOME\n/etc\n/tmp" | runp -q -p 'sudo du -sh'

  echo -e "$HOME\n/etc\n/tmp" | parallel sudo du -sh
  # or:
  parallel sudo du -sh ::: "$HOME" /etc /tmp

=head4 Compress files

  find . -iname '*.txt' | runp -p 'gzip --best'

  find . -iname '*.txt' | parallel gzip --best

=head4 Measure HTTP request + response time

  export CURL="curl -w 'time_total: %{time_total}\n'"
  CURL="$CURL -o /dev/null -s https://golang.org/"
  perl -wE 'for (1..10) { say $ENV{CURL} }' |
    runp -q # Make 10 requests

  perl -wE 'for (1..10) { say $ENV{CURL} }' | parallel
  # or:
  parallel -N0 "$CURL" ::: {1..10}

=head4 Find open TCP ports

  cat << EOF > /tmp/host-port.txt
  localhost 22
  localhost 80
  localhost 81
  127.0.0.1 443
  127.0.0.1 444
  scanme.nmap.org 22
  scanme.nmap.org 23
  scanme.nmap.org 443
  EOF

  1$ cat /tmp/host-port.txt |
       runp -q -p 'netcat -v -w2 -z' 2>&1 | egrep '(succeeded!|open)$'

  # --colsep is needed to split the line
  1$ cat /tmp/host-port.txt |
       parallel --colsep ' ' netcat -v -w2 -z 2>&1 |
       egrep '(succeeded!|open)$'
  # or use uq for unquoted:
  1$ cat /tmp/host-port.txt |
       parallel netcat -v -w2 -z {=uq=} 2>&1 |
       egrep '(succeeded!|open)$'

https://github.com/jreisinger/runp (Last checked: 2020-04)
=head2 DIFFERENCES BETWEEN papply AND GNU Parallel

Summary (see legend above):

=over

=item - - - I4 - - -

=item M1 - M3 - - M6

=item - - O3 - O5 - - N/A N/A O10

=item E1 - - E4 - - -

=item - - - - - - - - -

=item - -

=back

B<papply> does not print the output if the command fails:

  $ papply 'echo %F; false' foo
  "echo foo; false" did not succeed

B<papply>'s replacement strings (%F %d %f %n %e %z) can be simulated
in GNU B<parallel> by putting this in B<~/.parallel/config>:

  --rpl '%F'
  --rpl '%d $_=Q(::dirname($_));'
  --rpl '%f s:.*/::;'
  --rpl '%n s:.*/::;s:\.[^/.]+$::;'
  --rpl '%e s:.*\.:.:'
  --rpl '%z $_=""'

B<papply> buffers in RAM and uses twice the size of the output, so
output of 5 GB takes 10 GB RAM.

The buffering is very CPU intensive: Buffering a line of 5 GB takes 40
seconds (compared to 10 seconds with GNU B<parallel>).

=head3 Examples as GNU Parallel

  1$ papply gzip *.txt

  1$ parallel gzip ::: *.txt

  2$ papply "convert %F %n.jpg" *.png

  2$ parallel convert {} {.}.jpg ::: *.png

https://pypi.org/project/papply/ (Last checked: 2020-04)
=head2 DIFFERENCES BETWEEN async AND GNU Parallel

Summary (see legend above):

=over

=item - - - I4 - - I7

=item - - - - - M6

=item - O2 O3 - O5 O6 - N/A N/A O10

=item E1 - - E4 - E6 -

=item - - - - - - - - -

=item S1 S2

=back

B<async> is very similar to GNU B<parallel>'s B<--semaphore> mode
(aka B<sem>). B<async> requires the user to start a server process.

The input is quoted like B<-q> so you need B<bash -c "...;..."> to run
composed commands.

=head3 Examples as GNU Parallel

  1$ S="/tmp/example_socket"

  1$ ID=myid

  2$ async -s="$S" server --start

  2$ # GNU Parallel does not need a server to run

  3$ for i in {1..20}; do
       # prints command output to stdout
       async -s="$S" cmd -- bash -c "sleep 1 && echo test $i"
     done

  3$ for i in {1..20}; do
       # prints command output to stdout
       sem --id "$ID" -j100% "sleep 1 && echo test $i"
       # GNU Parallel will only print the job when it is done
       # If you need output from different jobs to mix
       # use -u or --line-buffer
       sem --id "$ID" -j100% --line-buffer "sleep 1 && echo test $i"
     done

  4$ # wait until all commands are finished
     async -s="$S" wait

  4$ sem --id "$ID" --wait

  5$ # configure the server to run four commands in parallel
     async -s="$S" server -j4

  5$ export PARALLEL=-j4

  6$ mkdir "/tmp/ex_dir"
     for i in {21..40}; do
       # redirects command output to /tmp/ex_dir/file*
       async -s="$S" cmd -o "/tmp/ex_dir/file$i" -- \
         bash -c "sleep 1 && echo test $i"
     done

  6$ mkdir "/tmp/ex_dir"
     for i in {21..40}; do
       # redirects command output to /tmp/ex_dir/file*
       sem --id "$ID" --result '/tmp/my-ex/file-{=$_=""=}'"$i" \
         "sleep 1 && echo test $i"
     done

  7$ sem --id "$ID" --wait

  7$ async -s="$S" wait

  8$ # stops server
     async -s="$S" server --stop

  8$ # GNU Parallel does not need to stop a server

https://github.com/ctbur/async/ (Last checked: 2020-11)
=head2 DIFFERENCES BETWEEN pardi AND GNU Parallel

Summary (see legend above):

=over

=item I1 I2 - - - - I7

=item M1 - - - - M6

=item O1 O2 O3 O4 O5 - O7 - - O10

=item E1 - - E4 - - -

=item - - - - - - - - -

=item - -

=back

B<pardi> is very similar to B<parallel --pipe --cat>: It reads blocks
of data and not arguments. So it cannot insert an argument in the
command line. It puts the block into a temporary file, and this file
name (%IN) can be put in the command line. You can only use %IN once.

It can also run full command lines in parallel (like: B<cat file |
parallel>).
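The block-to-temp-file model is simple to sketch in plain shell (a serial toy version with a made-up block size of 2 lines; B<pardi> and B<parallel --pipe --cat> do the same splitting but run the blocks in parallel):

```shell
#!/bin/bash
# Toy sketch: read stdin in 2-line blocks, write each block to a
# temp file, and run a command with the file name substituted
# (wc -l stands in for the real command that would get %IN).
process_blocks() {
  local a b tmp
  while IFS= read -r a; do
    IFS= read -r b
    tmp=$(mktemp)
    printf '%s\n%s\n' "$a" "$b" > "$tmp"
    wc -l < "$tmp"
    rm -f "$tmp"
  done
}
seq 4 | process_blocks        # two blocks of 2 lines each
```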
=head3 EXAMPLES FROM pardi test.sh

  1$ time pardi -v -c 100 -i data/decoys.smi -ie .smi -oe .smi \
       -o data/decoys_std_pardi.smi \
       -w '(standardiser -i %IN -o %OUT 2>&1) > /dev/null'

  1$ cat data/decoys.smi |
       time parallel -N 100 --pipe --cat \
         '(standardiser -i {} -o {#} 2>&1) > /dev/null; cat {#}; rm {#}' \
         > data/decoys_std_pardi.smi

  2$ pardi -n 1 -i data/test_in.types -o data/test_out.types \
       -d 'r:^#atoms:' -w 'cat %IN > %OUT'

  2$ cat data/test_in.types | parallel -n 1 -k --pipe --cat \
       --regexp --recstart '^#atoms' 'cat {}' > data/test_out.types

  3$ pardi -c 6 -i data/test_in.types -o data/test_out.types \
       -d 'r:^#atoms:' -w 'cat %IN > %OUT'

  3$ cat data/test_in.types | parallel -n 6 -k --pipe --cat \
       --regexp --recstart '^#atoms' 'cat {}' > data/test_out.types

  4$ pardi -i data/decoys.mol2 -o data/still_decoys.mol2 \
       -d 's:@<TRIPOS>MOLECULE' -w 'cp %IN %OUT'

  4$ cat data/decoys.mol2 |
       parallel -n 1 --pipe --cat --recstart '@<TRIPOS>MOLECULE' \
         'cp {} {#}; cat {#}; rm {#}' > data/still_decoys.mol2

  5$ pardi -i data/decoys.mol2 -o data/decoys2.mol2 \
       -d b:10000 -w 'cp %IN %OUT' --preserve

  5$ cat data/decoys.mol2 |
       parallel -k --pipe --block 10k --recend '' --cat \
         'cat {} > {#}; cat {#}; rm {#}' > data/decoys2.mol2

https://github.com/UnixJunkie/pardi (Last checked: 2021-01)
=head2 DIFFERENCES BETWEEN bthread AND GNU Parallel

Summary (see legend above):

=over

=item - - - I4 - - -

=item - - - - - M6

=item O1 - O3 - - - O7 O8 - -

=item E1 - - - - - -

=item - - - - - - - - -

=item - -

=back

B<bthread> takes around 1 second per MB of output. The maximal output
line length is 1073741759.

You cannot quote spaces in the command, so you cannot run composed
commands like B<sh -c "echo a; echo b">.

https://gitlab.com/netikras/bthread (Last checked: 2021-01)
=head2 DIFFERENCES BETWEEN simple_gpu_scheduler AND GNU Parallel

Summary (see legend above):

=over

=item I1 - - - - - I7

=item M1 - - - - M6

=item - O2 O3 - - O6 - x x O10

=item E1 - - - - - -

=item - - - - - - - - -

=item - -

=back

=head3 EXAMPLES FROM simple_gpu_scheduler MANUAL

  1$ simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt

  1$ parallel -j3 --shuf \
       CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' \
       < gpu_commands.txt

  2$ simple_hypersearch \
       "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
       -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
       simple_gpu_scheduler --gpus 0,1,2

  2$ parallel --header : --shuf -j3 -v \
       CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' \
       python3 train_dnn.py --lr {lr} --batch_size {bs} \
       ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128

  3$ simple_hypersearch \
       "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
       --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
       simple_gpu_scheduler --gpus 0,1,2

  3$ parallel --header : --shuf \
       CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1; seq() > 5 and skip() =}' \
       python3 train_dnn.py --lr {lr} --batch_size {bs} \
       ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128

  4$ touch gpu.queue
     tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2 &
     echo "my_command_with | and stuff > logfile" >> gpu.queue

  4$ touch gpu.queue
     tail -f -n 0 gpu.queue |
       parallel -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' &
     # Needed to fill job slots once
     seq 3 | parallel echo true >> gpu.queue
     # Add jobs
     echo "my_command_with | and stuff > logfile" >> gpu.queue
     # Needed to flush output from completed jobs
     seq 3 | parallel echo true >> gpu.queue

https://github.com/ExpectationMax/simple_gpu_scheduler (Last checked:
2021-01)
=head2 DIFFERENCES BETWEEN parasweep AND GNU Parallel

B<parasweep> is a Python module for facilitating parallel parameter
sweeps.

A B<parasweep> job will normally take a text file as input. The text
file contains arguments for the job. Some of these arguments will be
fixed and some of them will be changed by B<parasweep>.

It does this by having a template file such as template.txt:

  Xval: {x}
  Yval: {y}
  FixedValue: 9
  # x with 2 decimals
  DecimalX: {x:.2f}
  TenX: ${x*10}
  RandomVal: {r}

and from this template it generates the file to be used by the job by
replacing the replacement strings.

Being a Python module B<parasweep> integrates more tightly with Python
than GNU B<parallel>. You get the parameters directly in a Python data
structure. With GNU B<parallel> you can use the JSON or CSV output
format to get something similar, but you would have to read the
output.

B<parasweep> has a filtering method to ignore parameter combinations
you do not need.

Instead of calling the jobs directly, B<parasweep> can use Python's
Distributed Resource Management Application API to make jobs run with
different cluster software.

GNU B<parallel> B<--tmpl> supports templates with replacement
strings. Such as:

  Xval: {x}
  Yval: {y}
  FixedValue: 9
  # x with 2 decimals
  DecimalX: {=x $_=sprintf("%.2f",$_) =}
  TenX: {=x $_=$_*10 =}
  RandomVal: {=1 $_=rand() =}

that can be used like:

  parallel --header : --tmpl my.tmpl={#}.t myprog {#}.t \
    ::: x 1 2 3 ::: y 1 2 3

Filtering is supported as:

  parallel --filter '{1} > {2}' echo ::: 1 2 3 ::: 1 2 3

https://github.com/eviatarbach/parasweep (Last checked: 2021-01)
=head2 DIFFERENCES BETWEEN parallel-bash AND GNU Parallel

Summary (see legend above):

=over

=item I1 I2 - - - - -

=item - - M3 - - M6

=item - O2 O3 - O5 O6 - O8 x O10

=item E1 - - - - - -

=item - - - - - - - - -

=item - -

=back

B<parallel-bash> is written in pure bash. It is really fast (overhead
of ~0.05 ms/job compared to GNU B<parallel>'s ~3 ms/job). So if your
jobs are extremely short lived, and you can live with the quite
limited command syntax, this may be useful.

It works by making a queue for each process. Then the jobs are
distributed to the queues in a round robin fashion. Finally the queues
are started in parallel. This works fine, if you are lucky, but if
not, all the long jobs may end up in the same queue, so you may see:

  $ printf "%b\n" 1 1 1 4 1 1 1 4 1 1 1 4 |
      time parallel -P4 sleep {}
  (7 seconds)
  $ printf "%b\n" 1 1 1 4 1 1 1 4 1 1 1 4 |
      time ./parallel-bash.bash -p 4 -c sleep {}
  (12 seconds)
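The static round-robin split behind those timings is easy to reproduce (a pure-bash sketch, not parallel-bash's actual code):

```shell
#!/bin/bash
# Sketch of up-front round-robin assignment: job i goes to
# queue i mod 4, no matter how long the job runs.
split_round_robin() {
  local i=0 job q
  local -a queue
  while IFS= read -r job; do
    queue[i % 4]+="$job "
    i=$((i+1))
  done
  for q in 0 1 2 3; do
    echo "queue$q: ${queue[q]}"
  done
}
# Every 'sleep 4' lands in the same queue, which then needs
# 4+4+4 = 12 s while the other queues finish after 3 s.
printf '%s\n' 1 1 1 4 1 1 1 4 1 1 1 4 | split_round_robin
```

Dynamic scheduling (as GNU B<parallel> does) instead hands the next job to the first free slot, which is why it finishes the same list in 7 seconds.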
Ctrl-C does not stop spawning new jobs. Ctrl-Z does not suspend
running jobs.

=head3 EXAMPLES FROM parallel-bash

  1$ some_input | parallel-bash -p 5 -c echo

  1$ some_input | parallel -j 5 echo

  2$ parallel-bash -p 5 -c echo < some_file

  2$ parallel -j 5 echo < some_file

  3$ parallel-bash -p 5 -c echo <<< 'some string'

  3$ parallel -j 5 echo <<< 'some string'

  4$ something | parallel-bash -p 5 -c echo {} {}

  4$ something | parallel -j 5 echo {} {}

https://reposhub.com/python/command-line-tools/Akianonymus-parallel-bash.html
(Last checked: 2021-06)
=head2 DIFFERENCES BETWEEN bash-concurrent AND GNU Parallel

B<bash-concurrent> is more an alternative to B<make> than to GNU
B<parallel>. Its input is very similar to a Makefile, where jobs
depend on other jobs.

It has a nice progress indicator where you can see which jobs
completed successfully, which jobs are currently running, which jobs
failed, and which jobs were skipped because a job they depend on
failed. The indicator does not deal well with resizing the window.

Output is cached in tempfiles on disk, but is only shown if there is
an error, so it is not meant to be part of a UNIX pipeline. If
B<bash-concurrent> crashes these tempfiles are not removed.

It uses an O(n*n) algorithm, so if you have 1000 independent jobs it
takes 22 seconds to start them.

https://github.com/themattrix/bash-concurrent
(Last checked: 2021-02)
=head2 Todo

http://code.google.com/p/spawntool/

http://code.google.com/p/push/

https://github.com/mylanconnolly/parallel

https://github.com/krashanoff/parallel

https://github.com/Nukesor/pueue

https://arxiv.org/pdf/2012.15443.pdf KumQuat

https://arxiv.org/pdf/2007.09436.pdf PaSH: Light-touch Data-Parallel Shell Processing

https://github.com/JeiKeiLim/simple_distribute_job

https://github.com/reggi/pkgrun - not obvious how to use

https://github.com/benoror/better-npm-run - not obvious how to use

https://github.com/bahmutov/with-package

https://github.com/xuchenCN/go-pssh

https://github.com/flesler/parallel

https://github.com/Julian/Verge

https://manpages.ubuntu.com/manpages/xenial/man1/tsp.1.html

https://vicerveza.homeunix.net/~viric/soft/ts/

https://github.com/chapmanjacobd/que
3284 https://github.com/Overv/outrun#outrun
=head1 TESTING OTHER TOOLS

There are certain issues that are very common in parallelizing
tools. Here are a few stress tests. Be warned: If the tool is badly
coded it may overload your machine.

=head2 MIX: Output mixes

Output from 2 jobs should not mix. If the output is not used, this
does not matter; but if the output I<is> used then it is important
that you do not get half a line from one job followed by half a line
from another job.

If the tool does not buffer, output will most likely mix now and then.

This test stresses whether output mixes.

  #!/bin/bash

  paralleltool="parallel -j0"

  cat <<-EOF > mycommand
  #!/bin/bash

  # If a, b, c, d, e, and f mix: Very bad
  perl -e 'print STDOUT "a"x3000_000," "'
  perl -e 'print STDERR "b"x3000_000," "'
  perl -e 'print STDOUT "c"x3000_000," "'
  perl -e 'print STDERR "d"x3000_000," "'
  perl -e 'print STDOUT "e"x3000_000," "'
  perl -e 'print STDERR "f"x3000_000," "'
  echo
  echo >&2
  EOF
  chmod +x mycommand

  # Run 30 jobs in parallel
  seq 30 |
    $paralleltool ./mycommand > >(tr -s abcdef) 2> >(tr -s abcdef >&2)

  # 'a c e' and 'b d f' should always stay together
  # and there should only be a single line per job

=head2 STDERRMERGE: Stderr is merged with stdout

Output from stdout and stderr should not be merged, but kept separate.

This test shows whether stdout is mixed with stderr.

  #!/bin/bash

  paralleltool="parallel -j0"

  cat <<-EOF > mycommand
  #!/bin/bash

  echo stdout
  echo stderr >&2
  echo stdout
  echo stderr >&2
  EOF
  chmod +x mycommand

  # Run one job
  echo |
    $paralleltool ./mycommand > stdout 2> stderr
  cat stdout
  cat stderr

=head2 RAM: Output limited by RAM

Some tools cache output in RAM. This makes them extremely slow if the
output is bigger than physical memory and makes them crash if the
output is bigger than virtual memory.

  #!/bin/bash

  paralleltool="parallel -j0"

  cat <<'EOF' > mycommand
  #!/bin/bash

  # Generate 1 GB output
  yes "`perl -e 'print \"c\"x30_000'`" | head -c 1G
  EOF
  chmod +x mycommand

  # Run 20 jobs in parallel
  # Adjust 20 to be > physical RAM and < free space on /tmp
  seq 20 | time $paralleltool ./mycommand | wc -c

=head2 DISKFULL: Incomplete data if /tmp runs full

If caching is done on disk, the disk can run full during the run. Not
all programs discover this. GNU B<parallel> discovers it if the disk
stays full for at least 2 seconds.

  #!/bin/bash

  paralleltool="parallel -j0"

  # This should be a dir with less than 100 GB free space
  smalldisk=/tmp/shm/parallel

  TMPDIR="$smalldisk"
  export TMPDIR

  max_output() {
    # Force worst case scenario:
    # Make GNU Parallel only check once per second
    sleep 10
    # Generate 100 GB to fill $TMPDIR
    # Adjust if /tmp is bigger than 100 GB
    yes | head -c 100G >$TMPDIR/$$
    # Generate 10 MB output that will not be buffered due to full disk
    perl -e 'print "X"x10_000_000' | head -c 10M
    echo This part is missing from incomplete output
    sleep 2
    rm $TMPDIR/$$
    echo Final output
  }

  export -f max_output
  seq 10 | $paralleltool max_output | tr -s X

=head2 CLEANUP: Leaving tmp files at unexpected death

Some tools do not clean up their tmp files if they are killed. This is
especially a problem for tools that buffer output on disk.

  #!/bin/bash

  paralleltool=parallel

  ls /tmp >/tmp/before
  seq 10 | $paralleltool sleep &
  pid=$!
  # Give the tool time to start up
  sleep 1
  # Kill it without giving it a chance to cleanup
  kill -9 $pid
  # Should be empty: No files should be left behind
  diff <(ls /tmp) /tmp/before

=head2 SPCCHAR: Dealing badly with special file names

It is not uncommon for users to create files like:

  My brother's 12" *** record (costs $$$).jpg

Some tools break on this.

  #!/bin/bash

  paralleltool=parallel

  touch "My brother's 12\" *** record (costs \$\$\$).jpg"
  ls My*jpg | $paralleltool ls -l

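A tool-independent defence against such names is to pass them
NUL-delimited instead of newline-delimited. A minimal sketch, using
B<xargs -0> purely for illustration:

```shell
# A file name containing spaces, quotes, globs and '$':
touch "My brother's 12\" *** record (costs \$\$\$).jpg"

# Whitespace splitting would mangle this name; NUL-delimiting
# survives every character except NUL itself:
printf '%s\0' "My brother's 12\" *** record (costs \$\$\$).jpg" |
    xargs -0 ls -l
```

GNU B<parallel> supports the same convention with B<-0>/B<--null>.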
=head2 COMPOSED: Composed commands do not work

Some tools require you to wrap composed commands into B<bash -c>.

  echo bar | $paralleltool echo foo';' echo {}

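With such a tool the usual workaround is to hand the whole composed
command to B<bash -c> as a single argument, passing the input value
positionally so it needs no shell quoting. A sketch, using B<xargs>
as a stand-in for the tool:

```shell
# The composed command 'echo foo; echo <arg>' runs in one bash;
# the argument arrives as "$1" instead of textual substitution.
echo bar | xargs -I{} bash -c 'echo foo; echo "$1"' bash {}
# prints:
#   foo
#   bar
```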
=head2 ONEREP: Only one replacement string allowed

Some tools can only insert the argument once.

  echo bar | $paralleltool echo {} foo {}

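If a tool substitutes the argument only once, one workaround is to
substitute it into a single positional parameter and let the shell
reuse it as often as needed. A sketch, again with B<xargs> standing in
for the tool:

```shell
# "$1" can be referenced any number of times:
echo bar | xargs -I{} sh -c 'echo "$1" foo "$1"' sh {}
# prints: bar foo bar
```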
=head2 INPUTSIZE: Length of input should not be limited

Some tools artificially limit the length of input lines for no good
reason. GNU B<parallel> does not:

  perl -e 'print "foo."."x"x100_000_000' | parallel echo {.}

GNU B<parallel> limits the command to run to 128 KB due to execve(2):

  perl -e 'print "x"x131_000' | parallel echo {} | wc

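The 128 KB figure is the Linux per-argument cap MAX_ARG_STRLEN (32
pages, i.e. 128 KiB with 4 KiB pages); the total space the kernel
grants for arguments plus environment can be inspected with getconf:

```shell
# Total bytes execve() accepts for argv + environ:
getconf ARG_MAX

# On Linux each single argument is additionally capped at
# MAX_ARG_STRLEN (32 * page size); exceeding either limit makes
# execve() fail with E2BIG.
```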
=head2 NUMWORDS: Speed depends on number of words

Some tools become very slow if output lines have many words.

  #!/bin/bash

  paralleltool=parallel

  cat <<-EOF > mycommand
  #!/bin/bash

  # 10 MB of lines with 1000 words
  yes "`seq 1000`" | head -c 10M
  EOF
  chmod +x mycommand

  # Run 30 jobs in parallel
  seq 30 | time $paralleltool -j0 ./mycommand > /dev/null

=head2 4GB: Output with a line > 4GB should be OK

  #!/bin/bash

  paralleltool="parallel -j0"

  cat <<-EOF > mycommand
  #!/bin/bash

  perl -e '\$a="a"x1000_000; for(1..5000) { print \$a }'
  EOF
  chmod +x mycommand

  # Run 1 job
  seq 1 | $paralleltool ./mycommand | LC_ALL=C wc

=head1 AUTHOR

When using GNU B<parallel> for a publication please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
The USENIX Magazine, February 2011:42-47.

This helps fund further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk

Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk

Copyright (C) 2010-2021 Ole Tange, http://ole.tange.dk and Free
Software Foundation, Inc.

Parts of the manual concerning B<xargs> compatibility are inspired by
the manual of B<xargs> from GNU findutils 4.4.2.

=head1 LICENSE

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

=head2 Documentation license I

Permission is granted to copy, distribute and/or modify this
documentation under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with no Front-Cover Texts, and
with no Back-Cover Texts. A copy of the license is included in the
file LICENSES/GFDL-1.3-or-later.txt.

=head2 Documentation license II

You are free:

=over 9

=item B<to Share>

to copy, distribute and transmit the work

=item B<to Remix>

to adapt the work

=back

Under the following conditions:

=over 9

=item B<Attribution>

You must attribute the work in the manner specified by the author or
licensor (but not in any way that suggests that they endorse you or
your use of the work).

=item B<Share Alike>

If you alter, transform, or build upon this work, you may distribute
the resulting work only under the same, similar or a compatible
license.

=back

With the understanding that:

=over 9

=item B<Waiver>

Any of the above conditions can be waived if you get permission from
the copyright holder.

=item B<Public Domain>

Where the work or any of its elements is in the public domain under
applicable law, that status is in no way affected by the license.

=item B<Other Rights>

In no way are any of the following rights affected by the license:

=over 2

=item *

Your fair dealing or fair use rights, or other applicable
copyright exceptions and limitations;

=item *

The author's moral rights;

=item *

Rights other persons may have either in the work itself or in
how the work is used, such as publicity or privacy rights.

=back

=back

=over 9

=item B<Notice>

For any reuse or distribution, you must make clear to others the
license terms of this work.

=back

A copy of the full license is included in the file
LICENSES/CC-BY-SA-4.0.txt.

=head1 DEPENDENCIES

GNU B<parallel> uses Perl, and the Perl modules Getopt::Long,
IPC::Open3, Symbol, IO::File, POSIX, and File::Temp. For remote usage
it also uses B<rsync> with B<ssh>.

=head1 SEE ALSO

B<find>(1), B<xargs>(1), B<make>(1), B<pexec>(1), B<ppss>(1),
B<xjobs>(1), B<prll>(1), B<dxargs>(1), B<mdm>(1)

=cut