src/parallel_alternatives.pod

   1 #!/usr/bin/perl -w
   2
   3 # SPDX-FileCopyrightText: 2021 Ole Tange, http://ole.tange.dk and Free Software and Foundation, Inc.
   4 # SPDX-License-Identifier: GFDL-1.3-or-later
   5 # SPDX-License-Identifier: CC-BY-SA-4.0
   6
   7 =encoding utf8
   8
   9 =head1 NAME
  10
  11 parallel_alternatives - Alternatives to GNU B<parallel>
  12
  13
  14 =head1 DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES
  15
  16 There are a lot programs with some of the functionality of GNU
  17 B<parallel>. GNU B<parallel> strives to include the best of the
  18 functionality without sacrificing ease of use.
  19
  20 B<parallel> has existed since 2002 and as GNU B<parallel> since
  21 2010. A lot of the alternatives have not had the vitality to survive
  22 that long, but have come and gone during that time.
  23
  24 GNU B<parallel> is actively maintained with a new release every month
  25 since 2010. Most other alternatives are fleeting interests of the
  26 developers with irregular releases and only maintained for a few
  27 years.
  28
  29
  30 =head2 SUMMARY LEGEND
  31
  32 The following features are in some of the comparable tools:
  33
  34 =head3 Inputs
  35
  36 =over
  37
  38 =item I1. Arguments can be read from stdin
  39
  40 =item I2. Arguments can be read from a file
  41
  42 =item I3. Arguments can be read from multiple files
  43
  44 =item I4. Arguments can be read from command line
  45
  46 =item I5. Arguments can be read from a table
  47
  48 =item I6. Arguments can be read from the same file using #! (shebang)
  49
  50 =item I7. Line oriented input as default (Quoting of special chars not needed)
  51
  52 =back
  53
  54
  55 =head3 Manipulation of input
  56
  57 =over
  58
  59 =item M1. Composed command
  60
  61 =item M2. Multiple arguments can fill up an execution line
  62
  63 =item M3. Arguments can be put anywhere in the execution line
  64
  65 =item M4. Multiple arguments can be put anywhere in the execution line
  66
  67 =item M5. Arguments can be replaced with context
  68
  69 =item M6. Input can be treated as the complete command line
  70
  71 =back
  72
  73
  74 =head3 Outputs
  75
  76 =over
  77
  78 =item O1. Grouping output so output from different jobs do not mix
  79
  80 =item O2. Send stderr (standard error) to stderr (standard error)
  81
  82 =item O3. Send stdout (standard output) to stdout (standard output)
  83
  84 =item O4. Order of output can be same as order of input
  85
  86 =item O5. Stdout only contains stdout (standard output) from the command
  87
  88 =item O6. Stderr only contains stderr (standard error) from the command
  89
  90 =item O7. Buffering on disk
  91
  92 =item O8. Cleanup of temporary files if killed
  93
  94 =item O9. Test if disk runs full during run
  95
  96 =item O10. Output of a line bigger than 4 GB
  97
  98 =back
  99
 100
 101 =head3 Execution
 102
 103 =over
 104
 105 =item E1. Running jobs in parallel
 106
 107 =item E2. List running jobs
 108
 109 =item E3. Finish running jobs, but do not start new jobs
 110
 111 =item E4. Number of running jobs can depend on number of cpus
 112
 113 =item E5. Finish running jobs, but do not start new jobs after first failure
 114
 115 =item E6. Number of running jobs can be adjusted while running
 116
 117 =item E7. Only spawn new jobs if load is less than a limit
 118
 119 =back
 120
 121
 122 =head3 Remote execution
 123
 124 =over
 125
 126 =item R1. Jobs can be run on remote computers
 127
 128 =item R2. Basefiles can be transferred
 129
 130 =item R3. Argument files can be transferred
 131
 132 =item R4. Result files can be transferred
 133
 134 =item R5. Cleanup of transferred files
 135
 136 =item R6. No config files needed
 137
 138 =item R7. Do not run more than SSHD's MaxStartups can handle
 139
 140 =item R8. Configurable SSH command
 141
 142 =item R9. Retry if connection breaks occasionally
 143
 144 =back
 145
 146
 147 =head3 Semaphore
 148
 149 =over
 150
 151 =item S1. Possibility to work as a mutex
 152
 153 =item S2. Possibility to work as a counting semaphore
 154
 155 =back
 156
 157
 158 =head3 Legend
 159
 160 =over
 161
 162 =item - = no
 163
 164 =item x = not applicable
 165
 166 =item ID = yes
 167
 168 =back
 169
 170 As every new version of the programs are not tested the table may be
 171 outdated. Please file a bug report if you find errors (See REPORTING
 172 BUGS).
 173
 174 parallel:
 175
 176 =over
 177
 178 =item I1 I2 I3 I4 I5 I6 I7
 179
 180 =item M1 M2 M3 M4 M5 M6
 181
 182 =item O1 O2 O3 O4 O5 O6 O7 O8 O9 O10
 183
 184 =item E1 E2 E3 E4 E5 E6 E7
 185
 186 =item R1 R2 R3 R4 R5 R6 R7 R8 R9
 187
 188 =item S1 S2
 189
 190 =back
 191
 192
 193 =head2 DIFFERENCES BETWEEN xargs AND GNU Parallel
 194
 195 Summary (see legend above):
 196
 197 =over
 198
 199 =item I1 I2 - - - - -
 200
 201 =item - M2 M3 - - -
 202
 203 =item - O2 O3 - O5 O6
 204
 205 =item E1 - - - - - -
 206
 207 =item - - - - - x - - -
 208
 209 =item - -
 210
 211 =back
 212
 213 B<xargs> offers some of the same possibilities as GNU B<parallel>.
 214
 215 B<xargs> deals badly with special characters (such as space, \, ' and
 216 "). To see the problem try this:
 217
 218   touch important_file
 219   touch 'not important_file'
 220   ls not* | xargs rm
 221   mkdir -p "My brother's 12\" records"
 222   ls | xargs rmdir
 223   touch 'c:\windows\system32\clfs.sys'
 224   echo 'c:\windows\system32\clfs.sys' | xargs ls -l
 225
 226 You can specify B<-0>, but many input generators are not optimized for
 227 using B<NUL> as separator but are optimized for B<newline> as
 228 separator. E.g. B<awk>, B<ls>, B<echo>, B<tar -v>, B<head> (requires
 229 using B<-z>), B<tail> (requires using B<-z>), B<sed> (requires using
 230 B<-z>), B<perl> (B<-0> and \0 instead of \n), B<locate> (requires
 231 using B<-0>), B<find> (requires using B<-print0>), B<grep> (requires
 232 using B<-z> or B<-Z>), B<sort> (requires using B<-z>).
 233
 234 GNU B<parallel>'s newline separation can be emulated with:
 235
 236 B<cat | xargs -d "\n" -n1 I<command>>
 237
 238 B<xargs> can run a given number of jobs in parallel, but has no
 239 support for running number-of-cpu-cores jobs in parallel.
 240
 241 B<xargs> has no support for grouping the output, therefore output may
 242 run together, e.g. the first half of a line is from one process and
 243 the last half of the line is from another process. The example
 244 B<Parallel grep> cannot be done reliably with B<xargs> because of
 245 this. To see this in action try:
 246
 247   parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \
 248     '>' {} ::: a b c d e f g h
 249   # Serial = no mixing = the wanted result
 250   # 'tr -s a-z' squeezes repeating letters into a single letter
 251   echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z
 252   # Compare to 8 jobs in parallel
 253   parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z
 254   echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z
 255   echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \
 256     tr -s a-z
 257
 258 Or try this:
 259
 260   slow_seq() {
 261     echo Count to "$@"
 262     seq "$@" |
 263       perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
 264   }
 265   export -f slow_seq
 266   # Serial = no mixing = the wanted result
 267   seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}'
 268   # Compare to 8 jobs in parallel
 269   seq 8 | parallel -P8 slow_seq {}
 270   seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}'
 271
 272 B<xargs> has no support for keeping the order of the output, therefore
 273 if running jobs in parallel using B<xargs> the output of the second
 274 job cannot be postponed till the first job is done.
 275
 276 B<xargs> has no support for running jobs on remote computers.
 277
 278 B<xargs> has no support for context replace, so you will have to create the
 279 arguments.
 280
 281 If you use a replace string in B<xargs> (B<-I>) you can not force
 282 B<xargs> to use more than one argument.
 283
 284 Quoting in B<xargs> works like B<-q> in GNU B<parallel>. This means
 285 composed commands and redirection require using B<bash -c>.
 286
 287   ls | parallel "wc {} >{}.wc"
 288   ls | parallel "echo {}; ls {}|wc"
 289
 290 becomes (assuming you have 8 cores and that none of the filenames
 291 contain space, " or ').
 292
 293   ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
 294   ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"
 295
 296 https://www.gnu.org/software/findutils/
 297
 298
 299 =head2 DIFFERENCES BETWEEN find -exec AND GNU Parallel
 300
 301 Summary (see legend above):
 302
 303 =over
 304
 305 =item -  -  -  x  -  x  -
 306
 307 =item -  M2 M3 -  -  -  -
 308
 309 =item -  O2 O3 O4 O5 O6
 310
 311 =item -  -  -  -  -  -  -
 312
 313 =item -  -  -  -  -  -  -  -  -
 314
 315 =item x  x
 316
 317 =back
 318
 319 B<find -exec> offers some of the same possibilities as GNU B<parallel>.
 320
 321 B<find -exec> only works on files. Processing other input (such as
 322 hosts or URLs) will require creating these inputs as files. B<find
 323 -exec> has no support for running commands in parallel.
 324
 325 https://www.gnu.org/software/findutils/ (Last checked: 2019-01)
 326
 327
 328 =head2 DIFFERENCES BETWEEN make -j AND GNU Parallel
 329
 330 Summary (see legend above):
 331
 332 =over
 333
 334 =item -  -  -  -  -  -  -
 335
 336 =item -  -  -  -  -  -
 337
 338 =item O1 O2 O3 -  x  O6
 339
 340 =item E1 -  -  -  E5 -
 341
 342 =item -  -  -  -  -  -  -  -  -
 343
 344 =item -  -
 345
 346 =back
 347
 348 B<make -j> can run jobs in parallel, but requires a crafted Makefile
 349 to do this. That results in extra quoting to get filenames containing
 350 newlines to work correctly.
 351
 352 B<make -j> computes a dependency graph before running jobs. Jobs run
 353 by GNU B<parallel> does not depend on each other.
 354
 355 (Very early versions of GNU B<parallel> were coincidentally implemented
 356 using B<make -j>).
 357
 358 https://www.gnu.org/software/make/ (Last checked: 2019-01)
 359
 360
 361 =head2 DIFFERENCES BETWEEN ppss AND GNU Parallel
 362
 363 Summary (see legend above):
 364
 365 =over
 366
 367 =item I1 I2 - - - - I7
 368
 369 =item M1 - M3 - - M6
 370
 371 =item O1 - - x - -
 372
 373 =item E1 E2 ?E3 E4 - - -
 374
 375 =item R1 R2 R3 R4 - - ?R7 ? ?
 376
 377 =item - -
 378
 379 =back
 380
 381 B<ppss> is also a tool for running jobs in parallel.
 382
 383 The output of B<ppss> is status information and thus not useful for
 384 using as input for another command. The output from the jobs are put
 385 into files.
 386
 387 The argument replace string ($ITEM) cannot be changed. Arguments must
 388 be quoted - thus arguments containing special characters (space '"&!*)
 389 may cause problems. More than one argument is not supported. Filenames
 390 containing newlines are not processed correctly. When reading input
 391 from a file null cannot be used as a terminator. B<ppss> needs to read
 392 the whole input file before starting any jobs.
 393
 394 Output and status information is stored in ppss_dir and thus requires
 395 cleanup when completed. If the dir is not removed before running
 396 B<ppss> again it may cause nothing to happen as B<ppss> thinks the
 397 task is already done. GNU B<parallel> will normally not need cleaning
 398 up if running locally and will only need cleaning up if stopped
 399 abnormally and running remote (B<--cleanup> may not complete if
 400 stopped abnormally). The example B<Parallel grep> would require extra
 401 postprocessing if written using B<ppss>.
 402
 403 For remote systems PPSS requires 3 steps: config, deploy, and
 404 start. GNU B<parallel> only requires one step.
 405
 406 =head3 EXAMPLES FROM ppss MANUAL
 407
 408 Here are the examples from B<ppss>'s manual page with the equivalent
 409 using GNU B<parallel>:
 410
 411   1$ ./ppss.sh standalone -d /path/to/files -c 'gzip '
 412
 413   1$ find /path/to/files -type f | parallel gzip
 414
 415   2$ ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '
 416
 417   2$ find /path/to/files -type f | parallel cp {} /destination/dir
 418
 419   3$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '
 420
 421   3$ parallel -a list-of-urls.txt wget -q
 422
 423   4$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'
 424
 425   4$ parallel -a list-of-urls.txt wget -q {}
 426
 427   5$ ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir \
 428        -m 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh \
 429        -n nodes.txt -o /some/output/dir --upload --download;
 430      ./ppss deploy -C config.cfg
 431      ./ppss start -C config
 432
 433   5$ # parallel does not use configs. If you want a different username put it in nodes.txt: user@hostname
 434      find source/dir -type f |
 435        parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} -o {.}.mp3 --preset standard --quiet
 436
 437   6$ ./ppss stop -C config.cfg
 438
 439   6$ killall -TERM parallel
 440
 441   7$ ./ppss pause -C config.cfg
 442
 443   7$ Press: CTRL-Z or killall -SIGTSTP parallel
 444
 445   8$ ./ppss continue -C config.cfg
 446
 447   8$ Enter: fg or killall -SIGCONT parallel
 448
 449   9$ ./ppss.sh status -C config.cfg
 450
 451   9$ killall -SIGUSR2 parallel
 452
 453 https://github.com/louwrentius/PPSS
 454
 455
 456 =head2 DIFFERENCES BETWEEN pexec AND GNU Parallel
 457
 458 Summary (see legend above):
 459
 460 =over
 461
 462 =item I1 I2 - I4 I5 - -
 463
 464 =item M1 - M3 - - M6
 465
 466 =item O1 O2 O3 - O5 O6
 467
 468 =item E1 - - E4 - E6 -
 469
 470 =item R1 - - - - R6 - - -
 471
 472 =item S1 -
 473
 474 =back
 475
 476 B<pexec> is also a tool for running jobs in parallel.
 477
 478 =head3 EXAMPLES FROM pexec MANUAL
 479
 480 Here are the examples from B<pexec>'s info page with the equivalent
 481 using GNU B<parallel>:
 482
 483   1$ pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
 484        'echo "scale=10000;sqrt($NUM)" | bc'
 485
 486   1$ seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | \
 487        bc > sqrt-{}.dat'
 488
 489   2$ pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort
 490
 491   2$ ls myfiles*.ext | parallel sort {} ">{}.sort"
 492
 493   3$ pexec -f image.list -n auto -e B -u star.log -c -- \
 494        'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'
 495
 496   3$ parallel -a image.list \
 497        'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log
 498
 499   4$ pexec -r *.png -e IMG -c -o - -- \
 500        'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'
 501
 502   4$ ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'
 503
 504   5$ pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'
 505
 506   5$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'
 507
 508   6$ for p in *.png ; do echo ${p%.png} ; done | \
 509        pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
 510
 511   6$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
 512
 513   7$ LIST=$(for p in *.png ; do echo ${p%.png} ; done)
 514      pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
 515
 516   7$ ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
 517
 518   8$ pexec -n 8 -r *.jpg -y unix -e IMG -c \
 519        'pexec -j -m blockread -d $IMG | \
 520         jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
 521         pexec -j -m blockwrite -s th_$IMG'
 522
 523   8$ # Combining GNU B<parallel> and GNU B<sem>.
 524      ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
 525        'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'
 526
 527      # If reading and writing is done to the same disk, this may be
 528      # faster as only one process will be either reading or writing:
 529      ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
 530        'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'
 531
 532 https://www.gnu.org/software/pexec/
 533
 534
 535 =head2 DIFFERENCES BETWEEN xjobs AND GNU Parallel
 536
 537 B<xjobs> is also a tool for running jobs in parallel. It only supports
 538 running jobs on your local computer.
 539
 540 B<xjobs> deals badly with special characters just like B<xargs>. See
 541 the section B<DIFFERENCES BETWEEN xargs AND GNU Parallel>.
 542
 543 =head3 EXAMPLES FROM xjobs MANUAL
 544
 545 Here are the examples from B<xjobs>'s man page with the equivalent
 546 using GNU B<parallel>:
 547
 548   1$ ls -1 *.zip | xjobs unzip
 549
 550   1$ ls *.zip | parallel unzip
 551
 552   2$ ls -1 *.zip | xjobs -n unzip
 553
 554   2$ ls *.zip | parallel unzip >/dev/null
 555
 556   3$ find . -name '*.bak' | xjobs gzip
 557
 558   3$ find . -name '*.bak' | parallel gzip
 559
 560   4$ ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf
 561
 562   4$ ls *.jar | parallel jar tf {} '>' {}.idx
 563
 564   5$ xjobs -s script
 565
 566   5$ cat script | parallel
 567
 568   6$ mkfifo /var/run/my_named_pipe;
 569      xjobs -s /var/run/my_named_pipe &
 570      echo unzip 1.zip >> /var/run/my_named_pipe;
 571      echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
 572
 573   6$ mkfifo /var/run/my_named_pipe;
 574      cat /var/run/my_named_pipe | parallel &
 575      echo unzip 1.zip >> /var/run/my_named_pipe;
 576      echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
 577
 578 https://www.maier-komor.de/xjobs.html (Last checked: 2019-01)
 579
 580
 581 =head2 DIFFERENCES BETWEEN prll AND GNU Parallel
 582
 583 B<prll> is also a tool for running jobs in parallel. It does not
 584 support running jobs on remote computers.
 585
 586 B<prll> encourages using BASH aliases and BASH functions instead of
 587 scripts. GNU B<parallel> supports scripts directly, functions if they
 588 are exported using B<export -f>, and aliases if using B<env_parallel>.
 589
 590 B<prll> generates a lot of status information on stderr (standard
 591 error) which makes it harder to use the stderr (standard error) output
 592 of the job directly as input for another program.
 593
 594 =head3 EXAMPLES FROM prll's MANUAL
 595
 596 Here is the example from B<prll>'s man page with the equivalent
 597 using GNU B<parallel>:
 598
 599   1$ prll -s 'mogrify -flip $1' *.jpg
 600
 601   1$ parallel mogrify -flip ::: *.jpg
 602
 603 https://github.com/exzombie/prll (Last checked: 2019-01)
 604
 605
 606 =head2 DIFFERENCES BETWEEN dxargs AND GNU Parallel
 607
 608 B<dxargs> is also a tool for running jobs in parallel.
 609
 610 B<dxargs> does not deal well with more simultaneous jobs than SSHD's
 611 MaxStartups. B<dxargs> is only built for remote run jobs, but does not
 612 support transferring of files.
 613
 614 https://web.archive.org/web/20120518070250/http://www.
 615 semicomplete.com/blog/geekery/distributed-xargs.html (Last checked: 2019-01)
 616
 617
 618 =head2 DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
 619
 620 middleman(mdm) is also a tool for running jobs in parallel.
 621
 622 =head3 EXAMPLES FROM middleman's WEBSITE
 623
 624 Here are the shellscripts of
 625 https://web.archive.org/web/20110728064735/http://mdm.
 626 berlios.de/usage.html ported to GNU B<parallel>:
 627
 628   1$ seq 19 | parallel buffon -o - | sort -n > result
 629      cat files | parallel cmd
 630      find dir -execdir sem cmd {} \;
 631
 632 https://github.com/cklin/mdm (Last checked: 2019-01)
 633
 634
 635 =head2 DIFFERENCES BETWEEN xapply AND GNU Parallel
 636
 637 B<xapply> can run jobs in parallel on the local computer.
 638
 639 =head3 EXAMPLES FROM xapply's MANUAL
 640
 641 Here are the examples from B<xapply>'s man page with the equivalent
 642 using GNU B<parallel>:
 643
 644   1$ xapply '(cd %1 && make all)' */
 645
 646   1$ parallel 'cd {} && make all' ::: */
 647
 648   2$ xapply -f 'diff %1 ../version5/%1' manifest | more
 649
 650   2$ parallel diff {} ../version5/{} < manifest | more
 651
 652   3$ xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1
 653
 654   3$ parallel --link diff {1} {2} :::: manifest1 checklist1
 655
 656   4$ xapply 'indent' *.c
 657
 658   4$ parallel indent ::: *.c
 659
 660   5$ find ~ksb/bin -type f ! -perm -111 -print | \
 661        xapply -f -v 'chmod a+x' -
 662
 663   5$ find ~ksb/bin -type f ! -perm -111 -print | \
 664        parallel -v chmod a+x
 665
 666   6$ find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -
 667
 668   6$ sh <(find */ -... | parallel -s 1024 echo vi)
 669
 670   6$ find */ -... | parallel -s 1024 -Xuj1 vi
 671
 672   7$ find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -
 673
 674   7$ sh <(find ... | parallel -n5 echo vi)
 675
 676   7$ find ... | parallel -n5 -uj1 vi
 677
 678   8$ xapply -fn "" /etc/passwd
 679
 680   8$ parallel -k echo < /etc/passwd
 681
 682   9$ tr ':' '\012' < /etc/passwd | \
 683        xapply -7 -nf 'chown %1 %6' - - - - - - -
 684
 685   9$ tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}
 686
 687   10$ xapply '[ -d %1/RCS ] || echo %1' */
 688
 689   10$ parallel '[ -d {}/RCS ] || echo {}' ::: */
 690
 691   11$ xapply -f '[ -f %1 ] && echo %1' List | ...
 692
 693   11$ parallel '[ -f {} ] && echo {}' < List | ...
 694
 695 https://web.archive.org/web/20160702211113/
 696 http://carrera.databits.net/~ksb/msrc/local/bin/xapply/xapply.html
 697
 698
 699 =head2 DIFFERENCES BETWEEN AIX apply AND GNU Parallel
 700
 701 B<apply> can build command lines based on a template and arguments -
 702 very much like GNU B<parallel>. B<apply> does not run jobs in
 703 parallel. B<apply> does not use an argument separator (like B<:::>);
 704 instead the template must be the first argument.
 705
 706 =head3 EXAMPLES FROM IBM's KNOWLEDGE CENTER
 707
 708 Here are the examples from IBM's Knowledge Center and the
 709 corresponding command using GNU B<parallel>:
 710
 711 =head4 To obtain results similar to those of the B<ls> command, enter:
 712
 713   1$ apply echo *
 714   1$ parallel echo ::: *
 715
 716 =head4 To compare the file named a1 to the file named b1, and
 717 the file named a2 to the file named b2, enter:
 718
 719   2$ apply -2 cmp a1 b1 a2 b2
 720   2$ parallel -N2 cmp ::: a1 b1 a2 b2
 721
 722 =head4 To run the B<who> command five times, enter:
 723
 724   3$ apply -0 who 1 2 3 4 5
 725   3$ parallel -N0 who ::: 1 2 3 4 5
 726
 727 =head4 To link all files in the current directory to the directory
 728 /usr/joe, enter:
 729
 730   4$ apply 'ln %1 /usr/joe' *
 731   4$ parallel ln {} /usr/joe ::: *
 732
 733 https://www-01.ibm.com/support/knowledgecenter/
 734 ssw_aix_71/com.ibm.aix.cmds1/apply.htm (Last checked: 2019-01)
 735
 736
 737 =head2 DIFFERENCES BETWEEN paexec AND GNU Parallel
 738
 739 B<paexec> can run jobs in parallel on both the local and remote computers.
 740
 741 B<paexec> requires commands to print a blank line as the last
 742 output. This means you will have to write a wrapper for most programs.
 743
 744 B<paexec> has a job dependency facility so a job can depend on another
 745 job to be executed successfully. Sort of a poor-man's B<make>.
 746
 747 =head3 EXAMPLES FROM paexec's EXAMPLE CATALOG
 748
 749 Here are the examples from B<paexec>'s example catalog with the equivalent
 750 using GNU B<parallel>:
 751
 752 =head4 1_div_X_run
 753
 754   1$ ../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]
 755
 756   1$ parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]
 757
 758 =head4 all_substr_run
 759
 760   2$ ../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]
 761
 762   2$ parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]
 763
 764 =head4 cc_wrapper_run
 765
 766   3$ ../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
 767              -n 'host1 host2' \
 768              -t '/usr/bin/ssh -x' <<EOF [...]
 769
 770   3$ parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
 771              -S host1,host2 <<EOF [...]
 772
 773      # This is not exactly the same, but avoids the wrapper
 774      parallel gcc -O2 -c -o {.}.o {} \
 775              -S host1,host2 <<EOF [...]
 776
 777 =head4 toupper_run
 778
 779   4$ ../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]
 780
 781   4$ parallel echo {} '|' ./toupper_cmd <<EOF [...]
 782
 783      # Without the wrapper:
 784      parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]
 785
 786 https://github.com/cheusov/paexec
 787
 788
 789 =head2 DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel
 790
 791 Summary (see legend above):
 792
 793 =over
 794
 795 =item I1 - - I4 - - (I7)
 796
 797 =item M1 (M2) M3 (M4) M5 M6
 798
 799 =item - O2 O3 - O5 - - N/A N/A O10
 800
 801 =item E1 - - - - - -
 802
 803 =item - - - - - - - - -
 804
 805 =item - -
 806
 807 =back
 808
 809 (I7): Only under special circumstances. See below.
 810
 811 (M2+M4): Only if there is a single replacement string.
 812
 813 B<map> rejects input with special characters:
 814
 815   echo "The Cure" > My\ brother\'s\ 12\"\ records
 816
 817   ls | map 'echo %; wc %'
 818
 819 It works with GNU B<parallel>:
 820
 821   ls | parallel 'echo {}; wc {}'
 822
 823 Under some circumstances it also works with B<map>:
 824
 825   ls | map 'echo % works %'
 826
 827 But tiny changes make it reject the input with special characters:
 828
 829   ls | map 'echo % does not work "%"'
 830
 831 This means that many UTF-8 characters will be rejected. This is by
 832 design. From the web page: "As such, programs that I<quietly handle
 833 them, with no warnings at all,> are doing their users a disservice."
 834
 835 B<map> delays each job by 0.01 s. This can be emulated by using
 836 B<parallel --delay 0.01>.
 837
 838 B<map> prints '+' on stderr when a job starts, and '-' when a job
 839 finishes. This cannot be disabled. B<parallel> has B<--bar> if you
 840 need to see progress.
 841
 842 B<map>'s replacement strings (% %D %B %E) can be simulated in GNU
 843 B<parallel> by putting this in B<~/.parallel/config>:
 844
 845   --rpl '%'
 846   --rpl '%D $_=Q(::dirname($_));'
 847   --rpl '%B s:.*/::;s:\.[^/.]+$::;'
 848   --rpl '%E s:.*\.::'
 849
 850 B<map> does not have an argument separator on the command line, but
 851 uses the first argument as command. This makes quoting harder which again
 852 may affect readability. Compare:
 853
 854   map -p 2 'perl -ne '"'"'/^\S+\s+\S+$/ and print $ARGV,"\n"'"'" *
 855
 856   parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *
 857
 858 B<map> can do multiple arguments with context replace, but not without
 859 context replace:
 860
 861   parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3
 862
 863   map "echo 'BEGIN{'%'}END'" 1 2 3
 864
 865 B<map> has no support for grouping. So this gives the wrong results:
 866
 867   parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
 868     ::: a b c d e f
 869   ls -l a b c d e f
 870   parallel -kP4 -n1 grep 1 ::: a b c d e f > out.par
 871   map -n1 -p 4 'grep 1' a b c d e f > out.map-unbuf
 872   map -n1 -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
 873   map -n1 -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
 874   ls -l out*
 875   md5sum out*
 876
 877 =head3 EXAMPLES FROM map's WEBSITE
 878
 879 Here are the examples from B<map>'s web page with the equivalent using
 880 GNU B<parallel>:
 881
 882   1$ ls *.gif | map convert % %B.png         # default max-args: 1
 883
 884   1$ ls *.gif | parallel convert {} {.}.png
 885
 886   2$ map "mkdir %B; tar -C %B -xf %" *.tgz   # default max-args: 1
 887
 888   2$ parallel 'mkdir {.}; tar -C {.} -xf {}' :::  *.tgz
 889
 890   3$ ls *.gif | map cp % /tmp                # default max-args: 100
 891
 892   3$ ls *.gif | parallel -X cp {} /tmp
 893
 894   4$ ls *.tar | map -n 1 tar -xf %
 895
 896   4$ ls *.tar | parallel tar -xf
 897
 898   5$ map "cp % /tmp" *.tgz
 899
 900   5$ parallel cp {} /tmp ::: *.tgz
 901
 902   6$ map "du -sm /home/%/mail" alice bob carol
 903
 904   6$ parallel "du -sm /home/{}/mail" ::: alice bob carol
 905   or if you prefer running a single job with multiple args:
 906   6$ parallel -Xj1 "du -sm /home/{}/mail" ::: alice bob carol
 907
 908   7$ cat /etc/passwd | map -d: 'echo user %1 has shell %7'
 909
 910   7$ cat /etc/passwd | parallel --colsep : 'echo user {1} has shell {7}'
 911
 912   8$ export MAP_MAX_PROCS=$(( `nproc` / 2 ))
 913
 914   8$ export PARALLEL=-j50%
 915
 916 https://github.com/sitaramc/map (Last checked: 2020-05)
 917
 918
 919 =head2 DIFFERENCES BETWEEN ladon AND GNU Parallel
 920
 921 B<ladon> can run multiple jobs on files in parallel.
 922
 923 B<ladon> only works on files and the only way to specify files is
 924 using a quoted glob string (such as \*.jpg). It is not possible to
 925 list the files manually.
 926
 927 As replacement strings it uses FULLPATH DIRNAME BASENAME EXT RELDIR
 928 RELPATH
 929
 930 These can be simulated using GNU B<parallel> by putting this in
 931 B<~/.parallel/config>:
 932
 933     --rpl 'FULLPATH $_=Q($_);chomp($_=qx{readlink -f $_});'
 934     --rpl 'DIRNAME $_=Q(::dirname($_));chomp($_=qx{readlink -f $_});'
 935     --rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;'
 936     --rpl 'EXT s:.*\.::'
 937     --rpl 'RELDIR $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
 938            s:\Q$c/\E::;$_=::dirname($_);'
 939     --rpl 'RELPATH $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
 940            s:\Q$c/\E::;'
 941
 942 B<ladon> deals badly with filenames containing " and newline, and it
 943 fails for output larger than 200k:
 944
 945     ladon '*' -- seq 36000 | wc
 946
 947 =head3 EXAMPLES FROM ladon MANUAL
 948
 949 It is assumed that the '--rpl's above are put in B<~/.parallel/config>
 950 and that it is run under a shell that supports '**' globbing (such as B<zsh>):
 951
 952   1$ ladon "**/*.txt" -- echo RELPATH
 953
 954   1$ parallel echo RELPATH ::: **/*.txt
 955
 956   2$ ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt
 957
 958   2$ parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt
 959
 960   3$ ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH \
 961        -thumbnail 100x100^ -gravity center -extent 100x100 \
 962        thumbs/RELPATH
 963
 964   3$ parallel mkdir -p thumbs/RELDIR\; convert FULLPATH
 965        -thumbnail 100x100^ -gravity center -extent 100x100 \
 966        thumbs/RELPATH ::: **/*.jpg
 967
 968   4$ ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3
 969
 970   4$ parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav
 971
 972 https://github.com/danielgtaylor/ladon (Last checked: 2019-01)
 973
 974
 975 =head2 DIFFERENCES BETWEEN jobflow AND GNU Parallel
 976
 977 B<jobflow> can run multiple jobs in parallel.
 978
 979 Just like B<xargs> output from B<jobflow> jobs running in parallel mix
 980 together by default. B<jobflow> can buffer into files (placed in
 981 /run/shm), but these are not cleaned up if B<jobflow> dies
 982 unexpectedly (e.g. by Ctrl-C). If the total output is big (in the
 983 order of RAM+swap) it can cause the system to slow to a crawl and
 984 eventually run out of memory.
 985
 986 B<jobflow> gives no error if the command is unknown, and like B<xargs>
 987 redirection and composed commands require wrapping with B<bash -c>.
 988
 989 Input lines can at most be 4096 bytes. You can at most have 16 {}'s in
 990 the command template. More than that either crashes the program or
 991 simple does not execute the command.
 992
 993 B<jobflow> has no equivalent for B<--pipe>, or B<--sshlogin>.
 994
 995 B<jobflow> makes it possible to set resource limits on the running
 996 jobs. This can be emulated by GNU B<parallel> using B<bash>'s B<ulimit>:
 997
 998   jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob
 999
1000   parallel 'ulimit -v 102400 -t 3 -f 204800 -n 300 myjob'
1001
1002
1003 =head3 EXAMPLES FROM jobflow README
1004
1005   1$ cat things.list | jobflow -threads=8 -exec ./mytask {}
1006
1007   1$ cat things.list | parallel -j8 ./mytask {}
1008
1009   2$ seq 100 | jobflow -threads=100 -exec echo {}
1010
1011   2$ seq 100 | parallel -j100 echo {}
1012
1013   3$ cat urls.txt | jobflow -threads=32 -exec wget {}
1014
1015   3$ cat urls.txt | parallel -j32 wget {}
1016
1017   4$ find . -name '*.bmp' | \
1018        jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg
1019
1020   4$ find . -name '*.bmp' | \
1021        parallel -j8 bmp2jpeg {.}.bmp {.}.jpg
1022
1023 https://github.com/rofl0r/jobflow
1024
1025
1026 =head2 DIFFERENCES BETWEEN gargs AND GNU Parallel
1027
1028 B<gargs> can run multiple jobs in parallel.
1029
1030 Older versions cache output in memory. This causes it to be extremely
1031 slow when the output is larger than the physical RAM, and can cause
1032 the system to run out of memory.
1033
1034 See more details on this in B<man parallel_design>.
1035
1036 Newer versions cache output in files, but leave files in $TMPDIR if it
1037 is killed.
1038
1039 Output to stderr (standard error) is changed if the command fails.
1040
1041 =head3 EXAMPLES FROM gargs WEBSITE
1042
1043   1$ seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"
1044
1045   1$ seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"
1046
1047   2$ cat t.txt | gargs --sep "\s+" \
1048        -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'"
1049
1050   2$ cat t.txt | parallel --colsep "\\s+" \
1051        -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'"
1052
1053 https://github.com/brentp/gargs
1054
1055
1056 =head2 DIFFERENCES BETWEEN orgalorg AND GNU Parallel
1057
1058 B<orgalorg> can run the same job on multiple machines. This is related
1059 to B<--onall> and B<--nonall>.
1060
1061 B<orgalorg> supports entering the SSH password - provided it is the
1062 same for all servers. GNU B<parallel> advocates using B<ssh-agent>
1063 instead, but it is possible to emulate B<orgalorg>'s behavior by
1064 setting SSHPASS and by using B<--ssh "sshpass ssh">.
1065
1066 To make the emulation easier, make a simple alias:
1067
1068   alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --lb"
1069
1070 If you want to supply a password run:
1071
1072   SSHPASS=`ssh-askpass`
1073
1074 or set the password directly:
1075
1076   SSHPASS=P4$$w0rd!
1077
1078 If the above is set up you can then do:
1079
1080   orgalorg -o frontend1 -o frontend2 -p -C uptime
1081   par_emul -S frontend1 -S frontend2 uptime
1082
1083   orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
1084   par_emul -S frontend1 -S frontend2 top -bid 1
1085
1086   orgalorg -o frontend1 -o frontend2 -p -er /tmp -n \
1087     'md5sum /tmp/bigfile' -S bigfile
1088   par_emul -S frontend1 -S frontend2 --basefile bigfile \
1089     --workdir /tmp md5sum /tmp/bigfile
1090
1091 B<orgalorg> has a progress indicator for the transferring of a
1092 file. GNU B<parallel> does not.
1093
1094 https://github.com/reconquest/orgalorg
1095
1096
1097 =head2 DIFFERENCES BETWEEN Rust parallel AND GNU Parallel
1098
1099 Rust parallel focuses on speed. It is almost as fast as B<xargs>, but
1100 not as fast as B<parallel-bash>. It implements a few features from GNU
1101 B<parallel>, but lacks many functions. All these fail:
1102
1103   # Read arguments from file
1104   parallel -a file echo
1105   # Changing the delimiter
1106   parallel -d _ echo ::: a_b_c_
1107
1108 These do something different from GNU B<parallel>
1109
1110   # -q to protect quoted $ and space
1111   parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c
1112   # Generation of combination of inputs
1113   parallel echo {1} {2} ::: red green blue ::: S M L XL XXL
1114   # {= perl expression =} replacement string
1115   parallel echo '{= s/new/old/ =}' ::: my.new your.new
1116   # --pipe
1117   seq 100000 | parallel --pipe wc
1118   # linked arguments
1119   parallel echo ::: S M L :::+ sml med lrg ::: R G B :::+ red grn blu
1120   # Run different shell dialects
1121   zsh -c 'parallel echo \={} ::: zsh && true'
1122   csh -c 'parallel echo \$\{\} ::: shell && true'
1123   bash -c 'parallel echo \$\({}\) ::: pwd && true'
1124   # Rust parallel does not start before the last argument is read
1125   (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo'
1126   tail -f /var/log/syslog | parallel echo
1127
1128 Most of the examples from the book GNU Parallel 2018 do not work, thus
1129 Rust parallel is not close to being a compatible replacement.
1130
1131 Rust parallel has no remote facilities.
1132
1133 It uses /tmp/parallel for tmp files and does not clean up if
1134 terminated abruptly. If another user on the system uses Rust parallel,
1135 then /tmp/parallel will have the wrong permissions and Rust parallel
1136 will fail. A malicious user can setup the right permissions and
1137 symlink the output file to one of the user's files and next time the
1138 user uses Rust parallel it will overwrite this file.
1139
1140   attacker$ mkdir /tmp/parallel
1141   attacker$ chmod a+rwX /tmp/parallel
1142   # Symlink to the file the attacker wants to zero out
1143   attacker$ ln -s ~victim/.important-file /tmp/parallel/stderr_1
1144   victim$ seq 1000 | parallel echo
1145   # This file is now overwritten with stderr from 'echo'
1146   victim$ cat ~victim/.important-file
1147
1148 If /tmp/parallel runs full during the run, Rust parallel does not
1149 report this, but finishes with success - thereby risking data loss.
1150
1151 https://github.com/mmstick/parallel
1152
1153
1154 =head2 DIFFERENCES BETWEEN Rush AND GNU Parallel
1155
1156 B<rush> (https://github.com/shenwei356/rush) is written in Go and
1157 based on B<gargs>.
1158
1159 Just like GNU B<parallel> B<rush> buffers in temporary files. But
1160 opposite GNU B<parallel> B<rush> does not clean up, if the process
1161 dies abnormally.
1162
1163 B<rush> has some string manipulations that can be emulated by putting
1164 this into ~/.parallel/config (/ is used instead of %, and % is used
1165 instead of ^ as that is closer to bash's ${var%postfix}):
1166
1167   --rpl '{:} s:(\.[^/]+)*$::'
1168   --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::'
1169   --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:'
1170   --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:'
1171   --rpl '{@(.*?)} /$$1/ and $_=$1;'
1172
1173 =head3 EXAMPLES FROM rush's WEBSITE
1174
1175 Here are the examples from B<rush>'s website with the equivalent
1176 command in GNU B<parallel>.
1177
1178 B<1. Simple run, quoting is not necessary>
1179
1180   1$ seq 1 3 | rush echo {}
1181
1182   1$ seq 1 3 | parallel echo {}
1183
1184 B<2. Read data from file (`-i`)>
1185
1186   2$ rush echo {} -i data1.txt -i data2.txt
1187
1188   2$ cat data1.txt data2.txt | parallel echo {}
1189
1190 B<3. Keep output order (`-k`)>
1191
1192   3$ seq 1 3 | rush 'echo {}' -k
1193
1194   3$ seq 1 3 | parallel -k echo {}
1195
1196
1197 B<4. Timeout (`-t`)>
1198
1199   4$ time seq 1 | rush 'sleep 2; echo {}' -t 1
1200
1201   4$ time seq 1 | parallel --timeout 1 'sleep 2; echo {}'
1202
1203 B<5. Retry (`-r`)>
1204
1205   5$ seq 1 | rush 'python unexisted_script.py' -r 1
1206
1207   5$ seq 1 | parallel --retries 2 'python unexisted_script.py'
1208
1209 Use B<-u> to see it is really run twice:
1210
1211   5$ seq 1 | parallel -u --retries 2 'python unexisted_script.py'
1212
1213 B<6. Dirname (`{/}`) and basename (`{%}`) and remove custom
1214 suffix (`{^suffix}`)>
1215
1216   6$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
1217
1218   6$ echo dir/file_1.txt.gz |
1219        parallel --plus echo {//} {/} {%_1.txt.gz}
1220
1221 B<7. Get basename, and remove last (`{.}`) or any (`{:}`) extension>
1222
1223   7$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
1224
1225   7$ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'
1226
1227 B<8. Job ID, combine fields index and other replacement strings>
1228
1229   8$ echo 12 file.txt dir/s_1.fq.gz |
1230        rush 'echo job {#}: {2} {2.} {3%:^_1}'
1231
1232   8$ echo 12 file.txt dir/s_1.fq.gz |
1233        parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}'
1234
1235 B<9. Capture submatch using regular expression (`{@regexp}`)>
1236
1237   9$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
1238
1239   9$ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}'
1240
1241 B<10. Custom field delimiter (`-d`)>
1242
1243   10$ echo a=b=c | rush 'echo {1} {2} {3}' -d =
1244
1245   10$ echo a=b=c | parallel -d = echo {1} {2} {3}
1246
1247 B<11. Send multi-lines to every command (`-n`)>
1248
1249   11$ seq 5 | rush -n 2 -k 'echo "{}"; echo'
1250
1251   11$ seq 5 |
1252         parallel -n 2 -k \
1253           'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo'
1254
1255   11$ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '
1256
1257   11$ seq 5 | parallel -n 2 -k 'echo {}; echo'
1258
1259
1260 B<12. Custom record delimiter (`-D`), note that empty records are not used.>
1261
1262   12$ echo a b c d | rush -D " " -k 'echo {}'
1263
1264   12$ echo a b c d | parallel -d " " -k 'echo {}'
1265
1266   12$ echo abcd | rush -D "" -k 'echo {}'
1267
1268   Cannot be done by GNU Parallel
1269
1270   12$ cat fasta.fa
1271   >seq1
1272   tag
1273   >seq2
1274   cat
1275   gat
1276   >seq3
1277   attac
1278   a
1279   cat
1280
1281   12$ cat fasta.fa | rush -D ">" \
1282         'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
1283       # rush fails to join the multiline sequences
1284
1285   12$ cat fasta.fa | (read -n1 ignore_first_char;
1286         parallel -d '>' --colsep '\n' echo FASTA record {#}: \
1287           name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}'
1288       )
1289
1290 B<13. Assign value to variable, like `awk -v` (`-v`)>
1291
1292   13$ seq 1 |
1293         rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
1294
1295   13$ seq 1 |
1296         parallel -N0 \
1297           'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'
1298
1299   13$ for var in a b; do \
1300   13$   seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
1301   13$ done
1302
1303 In GNU B<parallel> you would typically do:
1304
1305   13$ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -
1306
1307 If you I<really> want the var:
1308
1309   13$ seq 1 3 |
1310         parallel -k var={1} ';echo var: $var, data: {}' ::: a b :::: -
1311
1312 If you I<really> want the B<for>-loop:
1313
1314   13$ for var in a b; do
1315         export var;
1316         seq 1 3 | parallel -k 'echo var: $var, data: {}';
1317       done
1318
1319 Contrary to B<rush> this also works if the value is complex like:
1320
1321   My brother's 12" records
1322
1323
1324 B<14. B<Preset variable> (`-v`), avoid repeatedly writing verbose replacement strings>
1325
1326   14$ # naive way
1327       echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
1328
1329   14$ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'
1330
1331   14$ # macro + removing suffix
1332       echo read_1.fq.gz |
1333         rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
1334
1335   14$ echo read_1.fq.gz |
1336         parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'
1337
1338   14$ # macro + regular expression
1339       echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'
1340
1341   14$ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1342
1343 Contrary to B<rush> GNU B<parallel> works with complex values:
1344
1345   14$ echo "My brother's 12\"read_1.fq.gz" |
1346         parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1347
1348 B<15. Interrupt jobs by `Ctrl-C`, rush will stop unfinished commands and exit.>
1349
1350   15$ seq 1 20 | rush 'sleep 1; echo {}'
1351       ^C
1352
1353   15$ seq 1 20 | parallel 'sleep 1; echo {}'
1354       ^C
1355
1356 B<16. Continue/resume jobs (`-c`). When some jobs failed (by
1357 execution failure, timeout, or canceling by user with `Ctrl + C`),
1358 please switch flag `-c/--continue` on and run again, so that `rush`
1359 can save successful commands and ignore them in I<NEXT> run.>
1360
1361   16$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1362       cat successful_cmds.rush
1363       seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1364
1365   16$ seq 1 3 | parallel --joblog mylog --timeout 2 \
1366         'sleep {}; echo {}'
1367       cat mylog
1368       seq 1 3 | parallel --joblog mylog --retry-failed \
1369         'sleep {}; echo {}'
1370
1371 Multi-line jobs:
1372
1373   16$ seq 1 3 | rush 'sleep {}; echo {}; \
1374         echo finish {}' -t 3 -c -C finished.rush
1375       cat finished.rush
1376       seq 1 3 | rush 'sleep {}; echo {}; \
1377         echo finish {}' -t 3 -c -C finished.rush
1378
1379   16$ seq 1 3 |
1380         parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
1381           echo finish {}'
1382       cat mylog
1383       seq 1 3 |
1384         parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
1385           echo finish {}'
1386
1387 B<17. A comprehensive example: downloading 1K+ pages given by
1388 three URL list files using `phantomjs save_page.js` (some page
1389 contents are dynamically generated by Javascript, so `wget` does not
1390 work). Here I set max jobs number (`-j`) as `20`, each job has a max
1391 running time (`-t`) of `60` seconds and `3` retry changes
1392 (`-r`). Continue flag `-c` is also switched on, so we can continue
1393 unfinished jobs. Luckily, it's accomplished in one run :)>
1394
1395   17$ for f in $(seq 2014 2016); do \
1396         /bin/rm -rf $f; mkdir -p $f; \
1397         cat $f.html.txt | rush -v d=$f -d = \
1398           'phantomjs save_page.js "{}" > {d}/{3}.html' \
1399           -j 20 -t 60 -r 3 -c; \
1400       done
1401
1402 GNU B<parallel> can append to an existing joblog with '+':
1403
1404   17$ rm mylog
1405       for f in $(seq 2014 2016); do
1406         /bin/rm -rf $f; mkdir -p $f;
1407         cat $f.html.txt |
1408           parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
1409             --colsep = \
1410             phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
1411       done
1412
1413 B<18. A bioinformatics example: mapping with `bwa`, and
1414 processing result with `samtools`:>
1415
1416   18$ ref=ref/xxx.fa
1417       threads=25
1418       ls -d raw.cluster.clean.mapping/* \
1419         | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
1420         'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz >{p}.sam;\
1421         samtools view -bS {p}.sam > {p}.bam; \
1422         samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
1423         samtools index {p}.sorted.bam; \
1424         samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
1425         /bin/rm {p}.bam {p}.sam;' \
1426         -j 2 --verbose -c -C mapping.rush
1427
1428 GNU B<parallel> would use a function:
1429
1430   18$ ref=ref/xxx.fa
1431       export ref
1432       thr=25
1433       export thr
1434       bwa_sam() {
1435         p="$1"
1436         bam="$p".bam
1437         sam="$p".sam
1438         sortbam="$p".sorted.bam
1439         bwa mem -t $thr -M -a $ref ${p}_1.fq.gz ${p}_2.fq.gz > "$sam"
1440         samtools view -bS "$sam" > "$bam"
1441         samtools sort -T ${p}.tmp -@ $thr "$bam" -o "$sortbam"
1442         samtools index "$sortbam"
1443         samtools flagstat "$sortbam" > "$sortbam".flagstat
1444         /bin/rm "$bam" "$sam"
1445       }
1446       export -f bwa_sam
1447       ls -d raw.cluster.clean.mapping/* |
1448         parallel -j 2 --verbose --joblog mylog bwa_sam
1449
1450 =head3 Other B<rush> features
1451
1452 B<rush> has:
1453
1454 =over 4
1455
1456 =item * B<awk -v> like custom defined variables (B<-v>)
1457
1458 With GNU B<parallel> you would simply set a shell variable:
1459
1460    parallel 'v={}; echo "$v"' ::: foo
1461    echo foo | rush -v v={} 'echo {v}'
1462
1463 Also B<rush> does not like special chars. So these B<do not work>:
1464
1465    echo does not work | rush -v v=\" 'echo {v}'
1466    echo "My  brother's  12\"  records" | rush -v v={} 'echo {v}'
1467
1468 Whereas the corresponding GNU B<parallel> version works:
1469
1470    parallel 'v=\"; echo "$v"' ::: works
1471    parallel 'v={}; echo "$v"' ::: "My  brother's  12\"  records"
1472
1473 =item * Exit on first error(s) (-e)
1474
1475 This is called B<--halt now,fail=1> (or shorter: B<--halt 2>) when
1476 used with GNU B<parallel>.
1477
1478 =item * Settable records sending to every command (B<-n>, default 1)
1479
1480 This is also called B<-n> in GNU B<parallel>.
1481
1482 =item * Practical replacement strings
1483
1484 =over 4
1485
1486 =item {:} remove any extension
1487
1488 With GNU B<parallel> this can be emulated by:
1489
1490   parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz
1491
1492 =item {^suffix}, remove suffix
1493
1494 With GNU B<parallel> this can be emulated by:
1495
1496   parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz
1497
1498 =item {@regexp}, capture submatch using regular expression
1499
1500 With GNU B<parallel> this can be emulated by:
1501
1502   parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
1503     echo '{@\d_(.*).gz}' ::: 1_foo.gz
1504
1505 =item {%.}, {%:}, basename without extension
1506
1507 With GNU B<parallel> this can be emulated by:
1508
1509   parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz
1510
1511 And if you need it often, you define a B<--rpl> in
1512 B<$HOME/.parallel/config>:
1513
1514   --rpl '{%.} s:.*/::;s/\..*//'
1515   --rpl '{%:} s:.*/::;s/\..*//'
1516
1517 Then you can use them as:
1518
1519   parallel echo {%.} {%:} ::: dir/foo.bar.gz
1520
1521 =back
1522
1523 =item * Preset variable (macro)
1524
1525 E.g.
1526
1527   echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'
1528
1529 With GNU B<parallel> this can be emulated by:
1530
1531   echo foosuffix |
1532     parallel --plus 'p={%suffix}; echo ${p}_new_suffix'
1533
1534 Opposite B<rush> GNU B<parallel> works fine if the input contains
1535 double space, ' and ":
1536
1537   echo "1'6\"  foosuffix" |
1538     parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'
1539
1540
1541 =item * Commands of multi-lines
1542
1543 While you I<can> use multi-lined commands in GNU B<parallel>, to
1544 improve readability GNU B<parallel> discourages the use of multi-line
1545 commands. In most cases it can be written as a function:
1546
1547   seq 1 3 |
1548     parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
1549       echo finish {}'
1550
1551 Could be written as:
1552
1553   doit() {
1554     sleep "$1"
1555     echo "$1"
1556     echo finish "$1"
1557   }
1558   export -f doit
1559   seq 1 3 | parallel --timeout 2 --joblog my.log doit
1560
1561 The failed commands can be resumed with:
1562
1563   seq 1 3 |
1564     parallel --resume-failed --joblog my.log 'sleep {}; echo {};\
1565       echo finish {}'
1566
1567 =back
1568
1569 https://github.com/shenwei356/rush
1570
1571
1572 =head2 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel
1573
1574 ClusterSSH solves a different problem than GNU B<parallel>.
1575
1576 ClusterSSH opens a terminal window for each computer and using a
1577 master window you can run the same command on all the computers. This
1578 is typically used for administrating several computers that are almost
1579 identical.
1580
1581 GNU B<parallel> runs the same (or different) commands with different
1582 arguments in parallel possibly using remote computers to help
1583 computing. If more than one computer is listed in B<-S> GNU B<parallel> may
1584 only use one of these (e.g. if there are 8 jobs to be run and one
1585 computer has 8 cores).
1586
1587 GNU B<parallel> can be used as a poor-man's version of ClusterSSH:
1588
1589 B<parallel --nonall -S server-a,server-b do_stuff foo bar>
1590
1591 https://github.com/duncs/clusterssh
1592
1593
1594 =head2 DIFFERENCES BETWEEN coshell AND GNU Parallel
1595
1596 B<coshell> only accepts full commands on standard input. Any quoting
1597 needs to be done by the user.
1598
1599 Commands are run in B<sh> so any B<bash>/B<tcsh>/B<zsh> specific
1600 syntax will not work.
1601
1602 Output can be buffered by using B<-d>. Output is buffered in memory,
1603 so big output can cause swapping and therefore be terrible slow or
1604 even cause out of memory.
1605
1606 https://github.com/gdm85/coshell (Last checked: 2019-01)
1607
1608
1609 =head2 DIFFERENCES BETWEEN spread AND GNU Parallel
1610
1611 B<spread> runs commands on all directories.
1612
1613 It can be emulated with GNU B<parallel> using this Bash function:
1614
1615   spread() {
1616     _cmds() {
1617       perl -e '$"=" && ";print "@ARGV"' "cd {}" "$@"
1618     }
1619     parallel $(_cmds "$@")'|| echo exit status $?' ::: */
1620   }
1621
1622 This works except for the B<--exclude> option.
1623
1624 (Last checked: 2017-11)
1625
1626
1627 =head2 DIFFERENCES BETWEEN pyargs AND GNU Parallel
1628
1629 B<pyargs> deals badly with input containing spaces. It buffers stdout,
1630 but not stderr. It buffers in RAM. {} does not work as replacement
1631 string. It does not support running functions.
1632
1633 B<pyargs> does not support composed commands if run with B<--lines>,
1634 and fails on B<pyargs traceroute gnu.org fsf.org>.
1635
1636 =head3 Examples
1637
1638   seq 5 | pyargs -P50 -L seq
1639   seq 5 | parallel -P50 --lb seq
1640
1641   seq 5 | pyargs -P50 --mark -L seq
1642   seq 5 | parallel -P50 --lb \
1643     --tagstring OUTPUT'[{= $_=$job->replaced()=}]' seq
1644   # Similar, but not precisely the same
1645   seq 5 | parallel -P50 --lb --tag seq
1646
1647   seq 5 | pyargs -P50  --mark command
1648   # Somewhat longer with GNU Parallel due to the special
1649   #   --mark formatting
1650   cmd="$(echo "command" | parallel --shellquote)"
1651   wrap_cmd() {
1652      echo "MARK $cmd $@================================" >&3
1653      echo "OUTPUT START[$cmd $@]:"
1654      eval $cmd "$@"
1655      echo "OUTPUT END[$cmd $@]"
1656   }
1657   (seq 5 | env_parallel -P2 wrap_cmd) 3>&1
1658   # Similar, but not exactly the same
1659   seq 5 | parallel -t --tag command
1660
1661   (echo '1  2  3';echo 4 5 6) | pyargs  --stream seq
1662   (echo '1  2  3';echo 4 5 6) | perl -pe 's/\n/ /' |
1663     parallel -r -d' ' seq
1664   # Similar, but not exactly the same
1665   parallel seq ::: 1 2 3 4 5 6
1666
1667 https://github.com/robertblackwell/pyargs (Last checked: 2019-01)
1668
1669
1670 =head2 DIFFERENCES BETWEEN concurrently AND GNU Parallel
1671
1672 B<concurrently> runs jobs in parallel.
1673
1674 The output is prepended with the job number, and may be incomplete:
1675
1676   $ concurrently 'seq 100000' | (sleep 3;wc -l)
1677   7165
1678
1679 When pretty printing it caches output in memory. Output mixes by using
1680 test MIX below whether or not output is cached.
1681
1682 There seems to be no way of making a template command and have
1683 B<concurrently> fill that with different args. The full commands must
1684 be given on the command line.
1685
1686 There is also no way of controlling how many jobs should be run in
1687 parallel at a time - i.e. "number of jobslots". Instead all jobs are
1688 simply started in parallel.
1689
1690 https://github.com/kimmobrunfeldt/concurrently (Last checked: 2019-01)
1691
1692
1693 =head2 DIFFERENCES BETWEEN map(soveran) AND GNU Parallel
1694
1695 B<map> does not run jobs in parallel by default. The README suggests using:
1696
1697   ... | map t 'sleep $t && say done &'
1698
1699 But this fails if more jobs are run in parallel than the number of
1700 available processes. Since there is no support for parallelization in
1701 B<map> itself, the output also mixes:
1702
1703   seq 10 | map i 'echo start-$i && sleep 0.$i && echo end-$i &'
1704
1705 The major difference is that GNU B<parallel> is built for parallelization
1706 and B<map> is not. So GNU B<parallel> has lots of ways of dealing with the
1707 issues that parallelization raises:
1708
1709 =over 4
1710
1711 =item *
1712
1713 Keep the number of processes manageable
1714
1715 =item *
1716
1717 Make sure output does not mix
1718
1719 =item *
1720
1721 Make Ctrl-C kill all running processes
1722
1723 =back
1724
1725 =head3 EXAMPLES FROM maps WEBSITE
1726
1727 Here are the 5 examples converted to GNU Parallel:
1728
1729   1$ ls *.c | map f 'foo $f'
1730   1$ ls *.c | parallel foo
1731
1732   2$ ls *.c | map f 'foo $f; bar $f'
1733   2$ ls *.c | parallel 'foo {}; bar {}'
1734
1735   3$ cat urls | map u 'curl -O $u'
1736   3$ cat urls | parallel curl -O
1737
1738   4$ printf "1\n1\n1\n" | map t 'sleep $t && say done'
1739   4$ printf "1\n1\n1\n" | parallel 'sleep {} && say done'
1740   4$ parallel 'sleep {} && say done' ::: 1 1 1
1741
1742   5$ printf "1\n1\n1\n" | map t 'sleep $t && say done &'
1743   5$ printf "1\n1\n1\n" | parallel -j0 'sleep {} && say done'
1744   5$ parallel -j0 'sleep {} && say done' ::: 1 1 1
1745
1746 https://github.com/soveran/map (Last checked: 2019-01)
1747
1748
1749 =head2 DIFFERENCES BETWEEN loop AND GNU Parallel
1750
1751 B<loop> mixes stdout and stderr:
1752
1753     loop 'ls /no-such-file' >/dev/null
1754
1755 B<loop>'s replacement string B<$ITEM> does not quote strings:
1756
1757     echo 'two  spaces' | loop 'echo $ITEM'
1758
1759 B<loop> cannot run functions:
1760
1761     myfunc() { echo joe; }
1762     export -f myfunc
1763     loop 'myfunc this fails'
1764
1765 =head3 EXAMPLES FROM loop's WEBSITE
1766
1767 Some of the examples from https://github.com/Miserlou/Loop/ can be
1768 emulated with GNU B<parallel>:
1769
1770     # A couple of functions will make the code easier to read
1771     $ loopy() {
1772         yes | parallel -uN0 -j1 "$@"
1773       }
1774     $ export -f loopy
1775     $ time_out() {
1776         parallel -uN0 -q --timeout "$@" ::: 1
1777       }
1778     $ match() {
1779         perl -0777 -ne 'grep /'"$1"'/,$_ and print or exit 1'
1780       }
1781     $ export -f match
1782
1783     $ loop 'ls' --every 10s
1784     $ loopy --delay 10s ls
1785
1786     $ loop 'touch $COUNT.txt' --count-by 5
1787     $ loopy touch '{= $_=seq()*5 =}'.txt
1788
1789     $ loop --until-contains 200 -- \
1790         ./get_response_code.sh --site mysite.biz`
1791     $ loopy --halt now,success=1 \
1792         './get_response_code.sh --site mysite.biz | match 200'
1793
1794     $ loop './poke_server' --for-duration 8h
1795     $ time_out 8h loopy ./poke_server
1796
1797     $ loop './poke_server' --until-success
1798     $ loopy --halt now,success=1 ./poke_server
1799
1800     $ cat files_to_create.txt | loop 'touch $ITEM'
1801     $ cat files_to_create.txt | parallel touch {}
1802
1803     $ loop 'ls' --for-duration 10min --summary
1804     # --joblog is somewhat more verbose than --summary
1805     $ time_out 10m loopy --joblog my.log ./poke_server; cat my.log
1806
1807     $ loop 'echo hello'
1808     $ loopy echo hello
1809
1810     $ loop 'echo $COUNT'
1811     # GNU Parallel counts from 1
1812     $ loopy echo {#}
1813     # Counting from 0 can be forced
1814     $ loopy echo '{= $_=seq()-1 =}'
1815
1816     $ loop 'echo $COUNT' --count-by 2
1817     $ loopy echo '{= $_=2*(seq()-1) =}'
1818
1819     $ loop 'echo $COUNT' --count-by 2 --offset 10
1820     $ loopy echo '{= $_=10+2*(seq()-1) =}'
1821
1822     $ loop 'echo $COUNT' --count-by 1.1
1823     # GNU Parallel rounds 3.3000000000000003 to 3.3
1824     $ loopy echo '{= $_=1.1*(seq()-1) =}'
1825
1826     $ loop 'echo $COUNT $ACTUALCOUNT' --count-by 2
1827     $ loopy echo '{= $_=2*(seq()-1) =} {#}'
1828
1829     $ loop 'echo $COUNT' --num 3 --summary
1830     # --joblog is somewhat more verbose than --summary
1831     $ seq 3 | parallel --joblog my.log echo; cat my.log
1832
1833     $ loop 'ls -foobarbatz' --num 3 --summary
1834     # --joblog is somewhat more verbose than --summary
1835     $ seq 3 | parallel --joblog my.log -N0 ls -foobarbatz; cat my.log
1836
1837     $ loop 'echo $COUNT' --count-by 2 --num 50 --only-last
1838     # Can be emulated by running 2 jobs
1839     $ seq 49 | parallel echo '{= $_=2*(seq()-1) =}' >/dev/null
1840     $ echo 50| parallel echo '{= $_=2*(seq()-1) =}'
1841
1842     $ loop 'date' --every 5s
1843     $ loopy --delay 5s date
1844
1845     $ loop 'date' --for-duration 8s --every 2s
1846     $ time_out 8s loopy --delay 2s date
1847
1848     $ loop 'date -u' --until-time '2018-05-25 20:50:00' --every 5s
1849     $ seconds=$((`date -d 2019-05-25T20:50:00 +%s` - `date  +%s`))s
1850     $ time_out $seconds loopy --delay 5s date -u
1851
1852     $ loop 'echo $RANDOM' --until-contains "666"
1853     $ loopy --halt now,success=1 'echo $RANDOM | match 666'
1854
1855     $ loop 'if (( RANDOM % 2 )); then
1856               (echo "TRUE"; true);
1857             else
1858               (echo "FALSE"; false);
1859             fi' --until-success
1860     $ loopy --halt now,success=1 'if (( $RANDOM % 2 )); then
1861                                     (echo "TRUE"; true);
1862                                   else
1863                                     (echo "FALSE"; false);
1864                                   fi'
1865
1866     $ loop 'if (( RANDOM % 2 )); then
1867         (echo "TRUE"; true);
1868       else
1869         (echo "FALSE"; false);
1870       fi' --until-error
1871     $ loopy --halt now,fail=1 'if (( $RANDOM % 2 )); then
1872                                  (echo "TRUE"; true);
1873                                else
1874                                  (echo "FALSE"; false);
1875                                fi'
1876
1877     $ loop 'date' --until-match "(\d{4})"
1878     $ loopy --halt now,success=1 'date | match [0-9][0-9][0-9][0-9]'
1879
1880     $ loop 'echo $ITEM' --for red,green,blue
1881     $ parallel echo ::: red green blue
1882
1883     $ cat /tmp/my-list-of-files-to-create.txt | loop 'touch $ITEM'
1884     $ cat /tmp/my-list-of-files-to-create.txt | parallel touch
1885
1886     $ ls | loop 'cp $ITEM $ITEM.bak'; ls
1887     $ ls | parallel cp {} {}.bak; ls
1888
1889     $ loop 'echo $ITEM | tr a-z A-Z' -i
1890     $ parallel 'echo {} | tr a-z A-Z'
1891     # Or more efficiently:
1892     $ parallel --pipe tr a-z A-Z
1893
1894     $ loop 'echo $ITEM' --for "`ls`"
1895     $ parallel echo {} ::: "`ls`"
1896
1897     $ ls | loop './my_program $ITEM' --until-success;
1898     $ ls | parallel --halt now,success=1 ./my_program {}
1899
1900     $ ls | loop './my_program $ITEM' --until-fail;
1901     $ ls | parallel --halt now,fail=1 ./my_program {}
1902
1903     $ ./deploy.sh;
1904       loop 'curl -sw "%{http_code}" http://coolwebsite.biz' \
1905         --every 5s --until-contains 200;
1906       ./announce_to_slack.sh
1907     $ ./deploy.sh;
1908       loopy --delay 5s --halt now,success=1 \
1909       'curl -sw "%{http_code}" http://coolwebsite.biz | match 200';
1910       ./announce_to_slack.sh
1911
1912     $ loop "ping -c 1 mysite.com" --until-success; ./do_next_thing
1913     $ loopy --halt now,success=1 ping -c 1 mysite.com; ./do_next_thing
1914
1915     $ ./create_big_file -o my_big_file.bin;
1916       loop 'ls' --until-contains 'my_big_file.bin';
1917       ./upload_big_file my_big_file.bin
1918     # inotifywait is a better tool to detect file system changes.
1919     # It can even make sure the file is complete
1920     # so you are not uploading an incomplete file
1921     $ inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f . |
1922         grep my_big_file.bin
1923
1924     $ ls | loop 'cp $ITEM $ITEM.bak'
1925     $ ls | parallel cp {} {}.bak
1926
1927     $ loop './do_thing.sh' --every 15s --until-success --num 5
1928     $ parallel --retries 5 --delay 15s ::: ./do_thing.sh
1929
1930 https://github.com/Miserlou/Loop/ (Last checked: 2018-10)
1931
1932
1933 =head2 DIFFERENCES BETWEEN lorikeet AND GNU Parallel
1934
1935 B<lorikeet> can run jobs in parallel. It does this based on a
1936 dependency graph described in a file, so this is similar to B<make>.
1937
1938 https://github.com/cetra3/lorikeet (Last checked: 2018-10)
1939
1940
1941 =head2 DIFFERENCES BETWEEN spp AND GNU Parallel
1942
1943 B<spp> can run jobs in parallel. B<spp> does not use a command
1944 template to generate the jobs, but requires jobs to be in a
1945 file. Output from the jobs mix.
1946
1947 https://github.com/john01dav/spp (Last checked: 2019-01)
1948
1949
1950 =head2 DIFFERENCES BETWEEN paral AND GNU Parallel
1951
1952 B<paral> prints a lot of status information and stores the output from
1953 the commands run into files. This means it cannot be used the middle
1954 of a pipe like this
1955
1956   paral "echo this" "echo does not" "echo work" | wc
1957
1958 Instead it puts the output into files named like
1959 B<out_#_I<command>.out.log>. To get a very similar behaviour with GNU
1960 B<parallel> use B<--results
1961 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta>
1962
1963 B<paral> only takes arguments on the command line and each argument
1964 should be a full command. Thus it does not use command templates.
1965
1966 This limits how many jobs it can run in total, because they all need
1967 to fit on a single command line.
1968
1969 B<paral> has no support for running jobs remotely.
1970
1971 =head3 EXAMPLES FROM README.markdown
1972
1973 The examples from B<README.markdown> and the corresponding command run
1974 with GNU B<parallel> (B<--results
1975 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta> is omitted from
1976 the GNU B<parallel> command):
1977
1978   1$ paral "command 1" "command 2 --flag" "command arg1 arg2"
1979   1$ parallel ::: "command 1" "command 2 --flag" "command arg1 arg2"
1980
1981   2$ paral "sleep 1 && echo c1" "sleep 2 && echo c2" \
1982        "sleep 3 && echo c3" "sleep 4 && echo c4"  "sleep 5 && echo c5"
1983   2$ parallel ::: "sleep 1 && echo c1" "sleep 2 && echo c2" \
1984        "sleep 3 && echo c3" "sleep 4 && echo c4"  "sleep 5 && echo c5"
1985      # Or shorter:
1986      parallel "sleep {} && echo c{}" ::: {1..5}
1987
1988   3$ paral -n=0 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1989        "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1990   3$ parallel ::: "sleep 5 && echo c5" "sleep 4 && echo c4" \
1991        "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1992      # Or shorter:
1993      parallel -j0 "sleep {} && echo c{}" ::: 5 4 3 2 1
1994
1995   4$ paral -n=1 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1996        "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1997   4$ parallel -j1 "sleep {} && echo c{}" ::: 5 4 3 2 1
1998
1999   5$ paral -n=2 "sleep 5 && echo c5" "sleep 4 && echo c4" \
2000        "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
2001   5$ parallel -j2 "sleep {} && echo c{}" ::: 5 4 3 2 1
2002
2003   6$ paral -n=5 "sleep 5 && echo c5" "sleep 4 && echo c4" \
2004        "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
2005   6$ parallel -j5 "sleep {} && echo c{}" ::: 5 4 3 2 1
2006
2007   7$ paral -n=1 "echo a && sleep 0.5 && echo b && sleep 0.5 && \
2008        echo c && sleep 0.5 && echo d && sleep 0.5 && \
2009        echo e && sleep 0.5 && echo f && sleep 0.5 && \
2010        echo g && sleep 0.5 && echo h"
2011   7$ parallel ::: "echo a && sleep 0.5 && echo b && sleep 0.5 && \
2012        echo c && sleep 0.5 && echo d && sleep 0.5 && \
2013        echo e && sleep 0.5 && echo f && sleep 0.5 && \
2014        echo g && sleep 0.5 && echo h"
2015
2016 https://github.com/amattn/paral (Last checked: 2019-01)
2017
2018
2019 =head2 DIFFERENCES BETWEEN concurr AND GNU Parallel
2020
2021 B<concurr> is built to run jobs in parallel using a client/server
2022 model.
2023
2024 =head3 EXAMPLES FROM README.md
2025
2026 The examples from B<README.md>:
2027
2028   1$ concurr 'echo job {#} on slot {%}: {}' : arg1 arg2 arg3 arg4
2029   1$ parallel 'echo job {#} on slot {%}: {}' ::: arg1 arg2 arg3 arg4
2030
2031   2$ concurr 'echo job {#} on slot {%}: {}' :: file1 file2 file3
2032   2$ parallel 'echo job {#} on slot {%}: {}' :::: file1 file2 file3
2033
2034   3$ concurr 'echo {}' < input_file
2035   3$ parallel 'echo {}' < input_file
2036
2037   4$ cat file | concurr 'echo {}'
2038   4$ cat file | parallel 'echo {}'
2039
2040 B<concurr> deals badly empty input files and with output larger than
2041 64 KB.
2042
2043 https://github.com/mmstick/concurr (Last checked: 2019-01)
2044
2045
2046 =head2 DIFFERENCES BETWEEN lesser-parallel AND GNU Parallel
2047
2048 B<lesser-parallel> is the inspiration for B<parallel --embed>. Both
2049 B<lesser-parallel> and B<parallel --embed> define bash functions that
2050 can be included as part of a bash script to run jobs in parallel.
2051
2052 B<lesser-parallel> implements a few of the replacement strings, but
2053 hardly any options, whereas B<parallel --embed> gives you the full
2054 GNU B<parallel> experience.
2055
2056 https://github.com/kou1okada/lesser-parallel (Last checked: 2019-01)
2057
2058
2059 =head2 DIFFERENCES BETWEEN npm-parallel AND GNU Parallel
2060
2061 B<npm-parallel> can run npm tasks in parallel.
2062
2063 There are no examples and very little documentation, so it is hard to
2064 compare to GNU B<parallel>.
2065
2066 https://github.com/spion/npm-parallel (Last checked: 2019-01)
2067
2068
2069 =head2 DIFFERENCES BETWEEN machma AND GNU Parallel
2070
2071 B<machma> runs tasks in parallel. It gives time stamped
2072 output. It buffers in RAM.
2073
2074 =head3 EXAMPLES FROM README.md
2075
2076 The examples from README.md:
2077
2078   1$ # Put shorthand for timestamp in config for the examples
2079      echo '--rpl '\
2080        \''{time} $_=::strftime("%Y-%m-%d %H:%M:%S",localtime())'\' \
2081        > ~/.parallel/machma
2082      echo '--line-buffer --tagstring "{#} {time} {}"' \
2083        >> ~/.parallel/machma
2084
2085   2$ find . -iname '*.jpg' |
2086        machma --  mogrify -resize 1200x1200 -filter Lanczos {}
2087      find . -iname '*.jpg' |
2088        parallel --bar -Jmachma mogrify -resize 1200x1200 \
2089          -filter Lanczos {}
2090
2091   3$ cat /tmp/ips | machma -p 2 -- ping -c 2 -q {}
2092   3$ cat /tmp/ips | parallel -j2 -Jmachma ping -c 2 -q {}
2093
2094   4$ cat /tmp/ips |
2095        machma -- sh -c 'ping -c 2 -q $0 > /dev/null && echo alive' {}
2096   4$ cat /tmp/ips |
2097        parallel -Jmachma 'ping -c 2 -q {} > /dev/null && echo alive'
2098
2099   5$ find . -iname '*.jpg' |
2100        machma --timeout 5s -- mogrify -resize 1200x1200 \
2101          -filter Lanczos {}
2102   5$ find . -iname '*.jpg' |
2103        parallel --timeout 5s --bar mogrify -resize 1200x1200 \
2104          -filter Lanczos {}
2105
2106   6$ find . -iname '*.jpg' -print0 |
2107        machma --null --  mogrify -resize 1200x1200 -filter Lanczos {}
2108   6$ find . -iname '*.jpg' -print0 |
2109        parallel --null --bar mogrify -resize 1200x1200 \
2110          -filter Lanczos {}
2111
2112 https://github.com/fd0/machma (Last checked: 2019-06)
2113
2114
2115 =head2 DIFFERENCES BETWEEN interlace AND GNU Parallel
2116
2117 Summary (see legend above):
2118
2119 =over
2120
2121 =item - I2 I3 I4 - - -
2122
2123 =item M1 - M3 - - M6
2124
2125 =item - O2 O3 - - - - x x
2126
2127 =item E1 E2 - - - - -
2128
2129 =item - - - - - - - - -
2130
2131 =item - -
2132
2133 =back
2134
2135 B<interlace> is built for network analysis to run network tools in parallel.
2136
2137 B<interface> does not buffer output, so output from different jobs mixes.
2138
2139 The overhead for each target is O(n*n), so with 1000 targets it
2140 becomes very slow with an overhead in the order of 500ms/target.
2141
2142 =head3 EXAMPLES FROM interlace's WEBSITE
2143
2144 Using B<prips> most of the examples from
2145 https://github.com/codingo/Interlace can be run with GNU B<parallel>:
2146
2147 Blocker
2148
2149   commands.txt:
2150     mkdir -p _output_/_target_/scans/
2151     _blocker_
2152     nmap _target_ -oA _output_/_target_/scans/_target_-nmap
2153   interlace -tL ./targets.txt -cL commands.txt -o $output
2154
2155   parallel -a targets.txt \
2156     mkdir -p $output/{}/scans/\; nmap {} -oA $output/{}/scans/{}-nmap
2157
2158 Blocks
2159
2160   commands.txt:
2161     _block:nmap_
2162     mkdir -p _target_/output/scans/
2163     nmap _target_ -oN _target_/output/scans/_target_-nmap
2164     _block:nmap_
2165     nikto --host _target_
2166   interlace -tL ./targets.txt -cL commands.txt
2167
2168   _nmap() {
2169     mkdir -p $1/output/scans/
2170     nmap $1 -oN $1/output/scans/$1-nmap
2171   }
2172   export -f _nmap
2173   parallel ::: _nmap "nikto --host" :::: targets.txt
2174
2175 Run Nikto Over Multiple Sites
2176
2177   interlace -tL ./targets.txt -threads 5 \
2178     -c "nikto --host _target_ > ./_target_-nikto.txt" -v
2179
2180   parallel -a targets.txt -P5 nikto --host {} \> ./{}_-nikto.txt
2181
2182 Run Nikto Over Multiple Sites and Ports
2183
2184   interlace -tL ./targets.txt -threads 5 -c \
2185     "nikto --host _target_:_port_ > ./_target_-_port_-nikto.txt" \
2186     -p 80,443 -v
2187
2188   parallel -P5 nikto --host {1}:{2} \> ./{1}-{2}-nikto.txt \
2189     :::: targets.txt ::: 80 443
2190
2191 Run a List of Commands against Target Hosts
2192
2193   commands.txt:
2194     nikto --host _target_:_port_ > _output_/_target_-nikto.txt
2195     sslscan _target_:_port_ >  _output_/_target_-sslscan.txt
2196     testssl.sh _target_:_port_ > _output_/_target_-testssl.txt
2197   interlace -t example.com -o ~/Engagements/example/ \
2198     -cL ./commands.txt -p 80,443
2199
2200   parallel --results ~/Engagements/example/{2}:{3}{1} {1} {2}:{3} \
2201     ::: "nikto --host" sslscan testssl.sh ::: example.com ::: 80 443
2202
2203 CIDR notation with an application that doesn't support it
2204
2205   interlace -t 192.168.12.0/24 -c "vhostscan _target_ \
2206     -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
2207
2208   prips 192.168.12.0/24 |
2209     parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
2210
2211 Glob notation with an application that doesn't support it
2212
2213   interlace -t 192.168.12.* -c "vhostscan _target_ \
2214     -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
2215
2216   # Glob is not supported in prips
2217   prips 192.168.12.0/24 |
2218     parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
2219
2220 Dash (-) notation with an application that doesn't support it
2221
2222   interlace -t 192.168.12.1-15 -c \
2223     "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
2224     -o ~/scans/ -threads 50
2225
2226   # Dash notation is not supported in prips
2227   prips 192.168.12.1 192.168.12.15 |
2228     parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
2229
2230 Threading Support for an application that doesn't support it
2231
2232   interlace -tL ./target-list.txt -c \
2233     "vhostscan -t _target_ -oN _output_/_target_-vhosts.txt" \
2234     -o ~/scans/ -threads 50
2235
2236   cat ./target-list.txt |
2237     parallel -P50 vhostscan -t {} -oN ~/scans/{}-vhosts.txt
2238
2239 alternatively
2240
2241   ./vhosts-commands.txt:
2242     vhostscan -t $target -oN _output_/_target_-vhosts.txt
2243   interlace -cL ./vhosts-commands.txt -tL ./target-list.txt \
2244     -threads 50 -o ~/scans
2245
2246   ./vhosts-commands.txt:
2247     vhostscan -t "$1" -oN "$2"
2248   parallel -P50 ./vhosts-commands.txt {} ~/scans/{}-vhosts.txt \
2249     :::: ./target-list.txt
2250
2251 Exclusions
2252
2253   interlace -t 192.168.12.0/24 -e 192.168.12.0/26 -c \
2254     "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
2255     -o ~/scans/ -threads 50
2256
2257   prips 192.168.12.0/24 | grep -xv -Ff <(prips 192.168.12.0/26) |
2258     parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
2259
2260 Run Nikto Using Multiple Proxies
2261
2262    interlace -tL ./targets.txt -pL ./proxies.txt -threads 5 -c \
2263      "nikto --host _target_:_port_ -useproxy _proxy_ > \
2264       ./_target_-_port_-nikto.txt" -p 80,443 -v
2265
2266    parallel -j5 \
2267      "nikto --host {1}:{2} -useproxy {3} > ./{1}-{2}-nikto.txt" \
2268      :::: ./targets.txt ::: 80 443 :::: ./proxies.txt
2269
2270 https://github.com/codingo/Interlace (Last checked: 2019-09)
2271
2272
2273 =head2 DIFFERENCES BETWEEN otonvm Parallel AND GNU Parallel
2274
2275 I have been unable to get the code to run at all. It seems unfinished.
2276
2277 https://github.com/otonvm/Parallel (Last checked: 2019-02)
2278
2279
2280 =head2 DIFFERENCES BETWEEN k-bx par AND GNU Parallel
2281
2282 B<par> requires Haskell to work. This limits the number of platforms
2283 this can work on.
2284
2285 B<par> does line buffering in memory. The memory usage is 3x the
2286 longest line (compared to 1x for B<parallel --lb>). Commands must be
2287 given as arguments. There is no template.
2288
2289 These are the examples from https://github.com/k-bx/par with the
2290 corresponding GNU B<parallel> command.
2291
2292   par "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
2293       "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2294   parallel --lb ::: "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
2295       "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2296
2297   par "echo foo; sleep 1; foofoo" \
2298       "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2299   parallel --lb --halt 1 ::: "echo foo; sleep 1; foofoo" \
2300       "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2301
2302   par "PARPREFIX=[fooechoer] echo foo" "PARPREFIX=[bar] echo bar"
2303   parallel --lb --colsep , --tagstring {1} {2} \
2304     ::: "[fooechoer],echo foo" "[bar],echo bar"
2305
2306   par --succeed "foo" "bar" && echo 'wow'
2307   parallel "foo" "bar"; true && echo 'wow'
2308
2309 https://github.com/k-bx/par (Last checked: 2019-02)
2310
2311 =head2 DIFFERENCES BETWEEN parallelshell AND GNU Parallel
2312
2313 B<parallelshell> does not allow for composed commands:
2314
2315   # This does not work
2316   parallelshell 'echo foo;echo bar' 'echo baz;echo quuz'
2317
2318 Instead you have to wrap that in a shell:
2319
2320   parallelshell 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2321
2322 It buffers output in RAM. All commands must be given on the command
2323 line and all commands are started in parallel at the same time. This
2324 will cause the system to freeze if there are so many jobs that there
2325 is not enough memory to run them all at the same time.
2326
2327 https://github.com/keithamus/parallelshell (Last checked: 2019-02)
2328
2329 https://github.com/darkguy2008/parallelshell (Last checked: 2019-03)
2330
2331
2332 =head2 DIFFERENCES BETWEEN shell-executor AND GNU Parallel
2333
2334 B<shell-executor> does not allow for composed commands:
2335
2336   # This does not work
2337   sx 'echo foo;echo bar' 'echo baz;echo quuz'
2338
2339 Instead you have to wrap that in a shell:
2340
2341   sx 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2342
2343 It buffers output in RAM. All commands must be given on the command
2344 line and all commands are started in parallel at the same time. This
2345 will cause the system to freeze if there are so many jobs that there
2346 is not enough memory to run them all at the same time.
2347
2348 https://github.com/royriojas/shell-executor (Last checked: 2019-02)
2349
2350
2351 =head2 DIFFERENCES BETWEEN non-GNU par AND GNU Parallel
2352
2353 B<par> buffers in memory to avoid mixing of jobs. It takes 1s per 1
2354 million output lines.
2355
2356 B<par> needs to have all commands before starting the first job. The
2357 jobs are read from stdin (standard input) so any quoting will have to
2358 be done by the user.
2359
2360 Stdout (standard output) is prepended with o:. Stderr (standard error)
2361 is sendt to stdout (standard output) and prepended with e:.
2362
2363 For short jobs with little output B<par> is 20% faster than GNU
2364 B<parallel> and 60% slower than B<xargs>.
2365
2366 https://github.com/UnixJunkie/PAR
2367
2368 https://savannah.nongnu.org/projects/par (Last checked: 2019-02)
2369
2370
2371 =head2 DIFFERENCES BETWEEN fd AND GNU Parallel
2372
2373 B<fd> does not support composed commands, so commands must be wrapped
2374 in B<sh -c>.
2375
2376 It buffers output in RAM.
2377
2378 It only takes file names from the filesystem as input (similar to B<find>).
2379
2380 https://github.com/sharkdp/fd (Last checked: 2019-02)
2381
2382
2383 =head2 DIFFERENCES BETWEEN lateral AND GNU Parallel
2384
2385 B<lateral> is very similar to B<sem>: It takes a single command and
2386 runs it in the background. The design means that output from parallel
2387 running jobs may mix. If it dies unexpectly it leaves a socket in
2388 ~/.lateral/socket.PID.
2389
2390 B<lateral> deals badly with too long command lines. This makes the
2391 B<lateral> server crash:
2392
2393   lateral run echo `seq 100000| head -c 1000k`
2394
2395 Any options will be read by B<lateral> so this does not work
2396 (B<lateral> interprets the B<-l>):
2397
2398   lateral run ls -l
2399
2400 Composed commands do not work:
2401
2402   lateral run pwd ';' ls
2403
2404 Functions do not work:
2405
2406   myfunc() { echo a; }
2407   export -f myfunc
2408   lateral run myfunc
2409
2410 Running B<emacs> in the terminal causes the parent shell to die:
2411
2412   echo '#!/bin/bash' > mycmd
2413   echo emacs -nw >> mycmd
2414   chmod +x mycmd
2415   lateral start
2416   lateral run ./mycmd
2417
2418 Here are the examples from https://github.com/akramer/lateral with the
2419 corresponding GNU B<sem> and GNU B<parallel> commands:
2420
2421   1$ lateral start
2422      for i in $(cat /tmp/names); do
2423        lateral run -- some_command $i
2424      done
2425      lateral wait
2426
2427   1$ for i in $(cat /tmp/names); do
2428        sem some_command $i
2429      done
2430      sem --wait
2431
2432   1$ parallel some_command :::: /tmp/names
2433
2434   2$ lateral start
2435      for i in $(seq 1 100); do
2436        lateral run -- my_slow_command < workfile$i > /tmp/logfile$i
2437      done
2438      lateral wait
2439
2440   2$ for i in $(seq 1 100); do
2441        sem my_slow_command < workfile$i > /tmp/logfile$i
2442      done
2443      sem --wait
2444
2445   2$ parallel 'my_slow_command < workfile{} > /tmp/logfile{}' \
2446        ::: {1..100}
2447
2448   3$ lateral start -p 0 # yup, it will just queue tasks
2449      for i in $(seq 1 100); do
2450        lateral run -- command_still_outputs_but_wont_spam inputfile$i
2451      done
2452      # command output spam can commence
2453      lateral config -p 10; lateral wait
2454
2455   3$ for i in $(seq 1 100); do
2456        echo "command inputfile$i" >> joblist
2457      done
2458      parallel -j 10 :::: joblist
2459
2460   3$ echo 1 > /tmp/njobs
2461      parallel -j /tmp/njobs command inputfile{} \
2462        ::: {1..100} &
2463      echo 10 >/tmp/njobs
2464      wait
2465
2466 https://github.com/akramer/lateral (Last checked: 2019-03)
2467
2468
2469 =head2 DIFFERENCES BETWEEN with-this AND GNU Parallel
2470
2471 The examples from https://github.com/amritb/with-this.git and the
2472 corresponding GNU B<parallel> command:
2473
2474   with -v "$(cat myurls.txt)" "curl -L this"
2475   parallel curl -L ::: myurls.txt
2476
2477   with -v "$(cat myregions.txt)" \
2478     "aws --region=this ec2 describe-instance-status"
2479   parallel aws --region={} ec2 describe-instance-status \
2480     :::: myregions.txt
2481
2482   with -v "$(ls)" "kubectl --kubeconfig=this get pods"
2483   ls | parallel kubectl --kubeconfig={} get pods
2484
2485   with -v "$(ls | grep config)" "kubectl --kubeconfig=this get pods"
2486   ls | grep config | parallel kubectl --kubeconfig={} get pods
2487
2488   with -v "$(echo {1..10})" "echo 123"
2489   parallel -N0 echo 123 ::: {1..10}
2490
2491 Stderr is merged with stdout. B<with-this> buffers in RAM. It uses 3x
2492 the output size, so you cannot have output larger than 1/3rd the
2493 amount of RAM. The input values cannot contain spaces. Composed
2494 commands do not work.
2495
2496 B<with-this> gives some additional information, so the output has to
2497 be cleaned before piping it to the next command.
2498
2499 https://github.com/amritb/with-this.git (Last checked: 2019-03)
2500
2501
2502 =head2 DIFFERENCES BETWEEN Tollef's parallel (moreutils) AND GNU Parallel
2503
2504 Summary (see legend above):
2505
2506 =over
2507
2508 =item - - - I4 - - I7
2509
2510 =item - - M3 - - M6
2511
2512 =item - O2 O3 - O5 O6 - x x
2513
2514 =item E1 - - - - - E7
2515
2516 =item - x x x x x x x x
2517
2518 =item - -
2519
2520 =back
2521
2522 =head3 EXAMPLES FROM Tollef's parallel MANUAL
2523
2524 B<Tollef> parallel sh -c "echo hi; sleep 2; echo bye" -- 1 2 3
2525
2526 B<GNU> parallel "echo hi; sleep 2; echo bye" ::: 1 2 3
2527
2528 B<Tollef> parallel -j 3 ufraw -o processed -- *.NEF
2529
2530 B<GNU> parallel -j 3 ufraw -o processed ::: *.NEF
2531
2532 B<Tollef> parallel -j 3 -- ls df "echo hi"
2533
2534 B<GNU> parallel -j 3 ::: ls df "echo hi"
2535
2536 (Last checked: 2019-08)
2537
2538 =head2 DIFFERENCES BETWEEN rargs AND GNU Parallel
2539
2540 Summary (see legend above):
2541
2542 =over
2543
2544 =item I1 - - - - - I7
2545
2546 =item - - M3 M4 - -
2547
2548 =item - O2 O3 - O5 O6 - O8 -
2549
2550 =item E1 - - E4 - - -
2551
2552 =item - - - - - - - - -
2553
2554 =item - -
2555
2556 =back
2557
2558 B<rargs> has elegant ways of doing named regexp capture and field ranges.
2559
2560 With GNU B<parallel> you can use B<--rpl> to get a similar
2561 functionality as regexp capture gives, and use B<join> and B<@arg> to
2562 get the field ranges. But the syntax is longer. This:
2563
2564   --rpl '{r(\d+)\.\.(\d+)} $_=join"$opt::colsep",@arg[$$1..$$2]'
2565
2566 would make it possible to use:
2567
2568   {1r3..6}
2569
2570 for field 3..6.
2571
2572 For full support of {n..m:s} including negative numbers use a dynamic
2573 replacement string like this:
2574
2575
2576   PARALLEL=--rpl\ \''{r((-?\d+)?)\.\.((-?\d+)?)((:([^}]*))?)}
2577           $a = defined $$2 ? $$2 < 0 ? 1+$#arg+$$2 : $$2 : 1;
2578           $b = defined $$4 ? $$4 < 0 ? 1+$#arg+$$4 : $$4 : $#arg+1;
2579           $s = defined $$6 ? $$7 : " ";
2580           $_ = join $s,@arg[$a..$b]'\'
2581   export PARALLEL
2582
2583 You can then do:
2584
2585   head /etc/passwd | parallel --colsep : echo ..={1r..} ..3={1r..3} \
2586     4..={1r4..} 2..4={1r2..4} 3..3={1r3..3} ..3:-={1r..3:-} \
2587     ..3:/={1r..3:/} -1={-1} -5={-5} -6={-6} -3..={1r-3..}
2588
2589 =head3 EXAMPLES FROM rargs MANUAL
2590
2591   ls *.bak | rargs -p '(.*)\.bak' mv {0} {1}
2592   ls *.bak | parallel mv {} {.}
2593
2594   cat download-list.csv | rargs -p '(?P<url>.*),(?P<filename>.*)' wget {url} -O {filename}
2595   cat download-list.csv | parallel --csv wget {1} -O {2}
2596   # or use regexps:
2597   cat download-list.csv |
2598     parallel --rpl '{url} s/,.*//' --rpl '{filename} s/.*?,//' wget {url} -O {filename}
2599
2600   cat /etc/passwd | rargs -d: echo -e 'id: "{1}"\t name: "{5}"\t rest: "{6..::}"'
2601   cat /etc/passwd |
2602     parallel -q --colsep : echo -e 'id: "{1}"\t name: "{5}"\t rest: "{=6 $_=join":",@arg[6..$#arg]=}"'
2603
2604 https://github.com/lotabout/rargs (Last checked: 2020-01)
2605
2606
2607 =head2 DIFFERENCES BETWEEN threader AND GNU Parallel
2608
2609 Summary (see legend above):
2610
2611 =over
2612
2613 =item I1 - - - - - -
2614
2615 =item M1 - M3 - - M6
2616
2617 =item O1 - O3 - O5 - - N/A N/A
2618
2619 =item E1 - - E4 - - -
2620
2621 =item - - - - - - - - -
2622
2623 =item - -
2624
2625 =back
2626
2627 Newline separates arguments, but newline at the end of file is treated
2628 as an empty argument. So this runs 2 jobs:
2629
2630   echo two_jobs | threader -run 'echo "$THREADID"'
2631
2632 B<threader> ignores stderr, so any output to stderr is
2633 lost. B<threader> buffers in RAM, so output bigger than the machine's
2634 virtual memory will cause the machine to crash.
2635
2636 https://github.com/voodooEntity/threader (Last checked: 2020-04)
2637
2638
2639 =head2 DIFFERENCES BETWEEN runp AND GNU Parallel
2640
2641 Summary (see legend above):
2642
2643 =over
2644
2645 =item I1 I2 - - - - -
2646
2647 =item M1 - (M3) - - M6
2648
2649 =item O1 O2 O3 - O5 O6 - N/A N/A -
2650
2651 =item E1 - - - - - -
2652
2653 =item - - - - - - - - -
2654
2655 =item - -
2656
2657 =back
2658
2659 (M3): You can add a prefix and a postfix to the input, so it means you can
2660 only insert the argument on the command line once.
2661
2662 B<runp> runs 10 jobs in parallel by default.  B<runp> blocks if output
2663 of a command is > 64 Kbytes.  Quoting of input is needed.  It adds
2664 output to stderr (this can be prevented with -q)
2665
2666 =head3 Examples as GNU Parallel
2667
2668   base='https://images-api.nasa.gov/search'
2669   query='jupiter'
2670   desc='planet'
2671   type='image'
2672   url="$base?q=$query&description=$desc&media_type=$type"
2673
2674   # Download the images in parallel using runp
2675   curl -s $url | jq -r .collection.items[].href | \
2676     runp -p 'curl -s' | jq -r .[] | grep large | \
2677     runp -p 'curl -s -L -O'
2678
2679   time curl -s $url | jq -r .collection.items[].href | \
2680     runp -g 1 -q -p 'curl -s' | jq -r .[] | grep large | \
2681     runp -g 1 -q -p 'curl -s -L -O'
2682
2683   # Download the images in parallel
2684   curl -s $url | jq -r .collection.items[].href | \
2685     parallel curl -s | jq -r .[] | grep large | \
2686     parallel curl -s -L -O
2687
2688   time curl -s $url | jq -r .collection.items[].href | \
2689     parallel -j 1 curl -s | jq -r .[] | grep large | \
2690     parallel -j 1 curl -s -L -O
2691
2692
2693 =head4 Run some test commands (read from file)
2694
2695   # Create a file containing commands to run in parallel.
2696   cat << EOF > /tmp/test-commands.txt
2697   sleep 5
2698   sleep 3
2699   blah     # this will fail
2700   ls $PWD  # PWD shell variable is used here
2701   EOF
2702
2703   # Run commands from the file.
2704   runp /tmp/test-commands.txt > /dev/null
2705
2706   parallel -a /tmp/test-commands.txt > /dev/null
2707
2708 =head4 Ping several hosts and see packet loss (read from stdin)
2709
2710   # First copy this line and press Enter
2711   runp -p 'ping -c 5 -W 2' -s '| grep loss'
2712   localhost
2713   1.1.1.1
2714   8.8.8.8
2715   # Press Enter and Ctrl-D when done entering the hosts
2716
2717   # First copy this line and press Enter
2718   parallel ping -c 5 -W 2 {} '| grep loss'
2719   localhost
2720   1.1.1.1
2721   8.8.8.8
2722   # Press Enter and Ctrl-D when done entering the hosts
2723
2724 =head4 Get directories' sizes (read from stdin)
2725
2726   echo -e "$HOME\n/etc\n/tmp" | runp -q -p 'sudo du -sh'
2727
2728   echo -e "$HOME\n/etc\n/tmp" | parallel sudo du -sh
2729   # or:
2730   parallel sudo du -sh ::: "$HOME" /etc /tmp
2731
2732 =head4 Compress files
2733
2734   find . -iname '*.txt' | runp -p 'gzip --best'
2735
2736   find . -iname '*.txt' | parallel gzip --best
2737
2738 =head4 Measure HTTP request + response time
2739
2740   export CURL="curl -w 'time_total:  %{time_total}\n'"
2741   CURL="$CURL -o /dev/null -s https://golang.org/"
2742   perl -wE 'for (1..10) { say $ENV{CURL} }' |
2743      runp -q  # Make 10 requests
2744
2745   perl -wE 'for (1..10) { say $ENV{CURL} }' | parallel
2746   # or:
2747   parallel -N0 "$CURL" ::: {1..10}
2748
2749 =head4 Find open TCP ports
2750
2751   cat << EOF > /tmp/host-port.txt
2752   localhost 22
2753   localhost 80
2754   localhost 81
2755   127.0.0.1 443
2756   127.0.0.1 444
2757   scanme.nmap.org 22
2758   scanme.nmap.org 23
2759   scanme.nmap.org 443
2760   EOF
2761
2762   1$ cat /tmp/host-port.txt |
2763        runp -q -p 'netcat -v -w2 -z' 2>&1 | egrep '(succeeded!|open)$'
2764
2765   # --colsep is needed to split the line
2766   1$ cat /tmp/host-port.txt |
2767        parallel --colsep ' ' netcat -v -w2 -z 2>&1 |
2768        egrep '(succeeded!|open)$'
2769   # or use uq for unquoted:
2770   1$ cat /tmp/host-port.txt |
2771        parallel netcat -v -w2 -z {=uq=} 2>&1 |
2772        egrep '(succeeded!|open)$'
2773
2774 https://github.com/jreisinger/runp (Last checked: 2020-04)
2775
2776
2777 =head2 DIFFERENCES BETWEEN papply AND GNU Parallel
2778
2779 Summary (see legend above):
2780
2781 =over
2782
2783 =item - - - I4 - - -
2784
2785 =item M1 - M3 - - M6
2786
2787 =item - - O3 - O5 - - N/A N/A O10
2788
2789 =item E1 - - E4 - - -
2790
2791 =item - - - - - - - - -
2792
2793 =item - -
2794
2795 =back
2796
2797 B<papply> does not print the output if the command fails:
2798
2799   $ papply 'echo %F; false' foo
2800   "echo foo; false" did not succeed
2801
2802 B<papply>'s replacement strings (%F %d %f %n %e %z) can be simulated in GNU
2803 B<parallel> by putting this in B<~/.parallel/config>:
2804
2805   --rpl '%F'
2806   --rpl '%d $_=Q(::dirname($_));'
2807   --rpl '%f s:.*/::;'
2808   --rpl '%n s:.*/::;s:\.[^/.]+$::;'
2809   --rpl '%e s:.*\.:.:'
2810   --rpl '%z $_=""'
2811
2812 B<papply> buffers in RAM, and uses twice the amount of output. So
2813 output of 5 GB takes 10 GB RAM.
2814
2815 The buffering is very CPU intensive: Buffering a line of 5 GB takes 40
2816 seconds (compared to 10 seconds with GNU B<parallel>).
2817
2818
2819 =head3 Examples as GNU Parallel
2820
2821   1$ papply gzip *.txt
2822
2823   1$ parallel gzip ::: *.txt
2824
2825   2$ papply "convert %F %n.jpg" *.png
2826
2827   2$ parallel convert {} {.}.jpg ::: *.png
2828
2829
2830 https://pypi.org/project/papply/ (Last checked: 2020-04)
2831
2832
2833 =head2 DIFFERENCES BETWEEN async AND GNU Parallel
2834
2835 Summary (see legend above):
2836
2837 =over
2838
2839 =item - - - I4 - - I7
2840
2841 =item - - - - - M6
2842
2843 =item - O2 O3 - O5 O6 - N/A N/A O10
2844
2845 =item E1 - - E4 - E6 -
2846
2847 =item - - - - - - - - -
2848
2849 =item S1 S2
2850
2851 =back
2852
2853 B<async> is very similary to GNU B<parallel>'s B<--semaphore> mode
2854 (aka B<sem>). B<async> requires the user to start a server process.
2855
2856 The input is quoted like B<-q> so you need B<bash -c "...;..."> to run
2857 composed commands.
2858
2859 =head3 Examples as GNU Parallel
2860
2861   1$ S="/tmp/example_socket"
2862
2863   1$ ID=myid
2864
2865   2$ async -s="$S" server --start
2866
2867   2$ # GNU Parallel does not need a server to run
2868
2869   3$ for i in {1..20}; do
2870          # prints command output to stdout
2871          async -s="$S" cmd -- bash -c "sleep 1 && echo test $i"
2872      done
2873
2874   3$ for i in {1..20}; do
2875          # prints command output to stdout
2876          sem --id "$ID" -j100% "sleep 1 && echo test $i"
2877          # GNU Parallel will only print job when it is done
2878          # If you need output from different jobs to mix
2879          # use -u or --line-buffer
2880          sem --id "$ID" -j100% --line-buffer "sleep 1 && echo test $i"
2881      done
2882
2883   4$ # wait until all commands are finished
2884      async -s="$S" wait
2885
2886   4$ sem --id "$ID" --wait
2887
2888   5$ # configure the server to run four commands in parallel
2889      async -s="$S" server -j4
2890
2891   5$ export PARALLEL=-j4
2892
2893   6$ mkdir "/tmp/ex_dir"
2894      for i in {21..40}; do
2895        # redirects command output to /tmp/ex_dir/file*
2896        async -s="$S" cmd -o "/tmp/ex_dir/file$i" -- \
2897          bash -c "sleep 1 && echo test $i"
2898      done
2899
2900   6$ mkdir "/tmp/ex_dir"
2901      for i in {21..40}; do
2902        # redirects command output to /tmp/ex_dir/file*
2903        sem --id "$ID" --result '/tmp/my-ex/file-{=$_=""=}'"$i" \
2904          "sleep 1 && echo test $i"
2905      done
2906
2907   7$ sem --id "$ID" --wait
2908
2909   7$ async -s="$S" wait
2910
2911   8$ # stops server
2912      async -s="$S" server --stop
2913
2914   8$ # GNU Parallel does not need to stop a server
2915
2916
2917 https://github.com/ctbur/async/ (Last checked: 2020-11)
2918
2919
2920 =head2 DIFFERENCES BETWEEN pardi AND GNU Parallel
2921
2922 Summary (see legend above):
2923
2924 =over
2925
2926 =item I1 I2 - - - - I7
2927
2928 =item M1 - - - - M6
2929
2930 =item O1 O2 O3 O4 O5 - O7 - - O10
2931
2932 =item E1 - - E4 - - -
2933
2934 =item - - - - - - - - -
2935
2936 =item - -
2937
2938 =back
2939
2940 B<pardi> is very similar to B<parallel --pipe --cat>: It reads blocks
2941 of data and not arguments. So it cannot insert an argument in the
2942 command line. It puts the block into a temporary file, and this file
2943 name (%IN) can be put in the command line. You can only use %IN once.
2944
2945 It can also run full command lines in parallel (like: B<cat file |
2946 parallel>).
2947
2948 =head3 EXAMPLES FROM pardi test.sh
2949
2950   1$ time pardi -v -c 100 -i data/decoys.smi -ie .smi -oe .smi \
2951        -o data/decoys_std_pardi.smi \
2952           -w '(standardiser -i %IN -o %OUT 2>&1) > /dev/null'
2953
2954   1$ cat data/decoys.smi |
2955        time parallel -N 100 --pipe --cat \
2956          '(standardiser -i {} -o {#} 2>&1) > /dev/null; cat {#}; rm {#}' \
2957          > data/decoys_std_pardi.smi
2958
2959   2$ pardi -n 1 -i data/test_in.types -o data/test_out.types \
2960              -d 'r:^#atoms:' -w 'cat %IN > %OUT'
2961
2962   2$ cat data/test_in.types | parallel -n 1 -k --pipe --cat \
2963              --regexp --recstart '^#atoms' 'cat {}' > data/test_out.types
2964
2965   3$ pardi -c 6 -i data/test_in.types -o data/test_out.types \
2966              -d 'r:^#atoms:' -w 'cat %IN > %OUT'
2967
2968   3$ cat data/test_in.types | parallel -n 6 -k --pipe --cat \
2969              --regexp --recstart '^#atoms' 'cat {}' > data/test_out.types
2970
2971   4$ pardi -i data/decoys.mol2 -o data/still_decoys.mol2 \
2972              -d 's:@<TRIPOS>MOLECULE' -w 'cp %IN %OUT'
2973
2974   4$ cat data/decoys.mol2 |
2975        parallel -n 1 --pipe --cat --recstart '@<TRIPOS>MOLECULE' \
2976          'cp {} {#}; cat {#}; rm {#}' > data/still_decoys.mol2
2977
2978   5$ pardi -i data/decoys.mol2 -o data/decoys2.mol2 \
2979              -d b:10000 -w 'cp %IN %OUT' --preserve
2980
2981   5$ cat data/decoys.mol2 |
2982        parallel -k --pipe --block 10k --recend '' --cat \
2983          'cat {} > {#}; cat {#}; rm {#}' > data/decoys2.mol2
2984
2985 https://github.com/UnixJunkie/pardi (Last checked: 2021-01)
2986
2987
2988 =head2 DIFFERENCES BETWEEN bthread AND GNU Parallel
2989
2990 Summary (see legend above):
2991
2992 =over
2993
2994 =item - - - I4 -  - -
2995
2996 =item - - - - - M6
2997
2998 =item O1 - O3 - - - O7 O8 - -
2999
3000 =item E1 - - - - - -
3001
3002 =item - - - - - - - - -
3003
3004 =item - -
3005
3006 =back
3007
3008 B<bthread> takes around 1 sec per MB of output. The maximal output
3009 line length is 1073741759.
3010
3011 You cannot quote space in the command, so you cannot run composed
3012 commands like B<sh -c "echo a; echo b">.
3013
3014 https://gitlab.com/netikras/bthread (Last checked: 2021-01)
3015
3016
3017 =head2 DIFFERENCES BETWEEN simple_gpu_scheduler AND GNU Parallel
3018
3019 Summary (see legend above):
3020
3021 =over
3022
3023 =item I1 - - - - - I7
3024
3025 =item M1 - - - - M6
3026
3027 =item - O2 O3 - - O6 - x x O10
3028
3029 =item E1 - - - - - -
3030
3031 =item - - - - - - - - -
3032
3033 =item - -
3034
3035 =back
3036
3037 =head3 EXAMPLES FROM simple_gpu_scheduler MANUAL
3038
3039   1$ simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
3040
3041   1$ parallel -j3 --shuf \
3042      CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' < gpu_commands.txt
3043
3044   2$ simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
3045        -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
3046        simple_gpu_scheduler --gpus 0,1,2
3047
3048   2$ parallel --header : --shuf -j3 -v \
3049        CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' \
3050        python3 train_dnn.py --lr {lr} --batch_size {bs} \
3051        ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
3052
3053   3$ simple_hypersearch \
3054        "python3 train_dnn.py --lr {lr} --batch_size {bs}" \
3055        --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 |
3056        simple_gpu_scheduler --gpus 0,1,2
3057
3058   3$ parallel --header : --shuf \
3059        CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1; seq() > 5 and skip() =}' \
3060        python3 train_dnn.py --lr {lr} --batch_size {bs} \
3061        ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
3062
3063   4$ touch gpu.queue
3064      tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2 &
3065      echo "my_command_with | and stuff > logfile" >> gpu.queue
3066
3067   4$ touch gpu.queue
3068      tail -f -n 0 gpu.queue |
3069        parallel -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' &
3070      # Needed to fill job slots once
3071      seq 3 | parallel echo true >> gpu.queue
3072      # Add jobs
3073      echo "my_command_with | and stuff > logfile" >> gpu.queue
3074      # Needed to flush output from completed jobs
3075      seq 3 | parallel echo true >> gpu.queue
3076
3077 https://github.com/ExpectationMax/simple_gpu_scheduler (Last checked:
3078 2021-01)
3079
3080
3081 =head2 DIFFERENCES BETWEEN parasweep AND GNU Parallel
3082
3083 B<parasweep> is a Python module for facilitating parallel parameter
3084 sweeps.
3085
3086 A B<parasweep> job will normally take a text file as input. The text
3087 file contains arguments for the job. Some of these arguments will be
3088 fixed and some of them will be changed by B<parasweep>.
3089
3090 It does this by having a template file such as template.txt:
3091
3092   Xval: {x}
3093   Yval: {y}
3094   FixedValue: 9
3095   # x with 2 decimals
3096   DecimalX: {x:.2f}
3097   TenX: ${x*10}
3098   RandomVal: {r}
3099
3100 and from this template it generates the file to be used by the job by
3101 replacing the replacement strings.
3102
3103 Being a Python module B<parasweep> integrates tighter with Python than
3104 GNU B<parallel>. You get the parameters directly in a Python data
3105 structure. With GNU B<parallel> you can use the JSON or CSV output
3106 format to get something similar, but you would have to read the
3107 output.
3108
3109 B<parasweep> has a filtering method to ignore parameter combinations
3110 you do not need.
3111
3112 Instead of calling the jobs directly, B<parasweep> can use Python's
3113 Distributed Resource Management Application API to make jobs run with
3114 different cluster software.
3115
3116
3117 GNU B<parallel> B<--tmpl> supports templates with replacement
3118 strings. Such as:
3119
3120   Xval: {x}
3121   Yval: {y}
3122   FixedValue: 9
3123   # x with 2 decimals
3124   DecimalX: {=x $_=sprintf("%.2f",$_) =}
3125   TenX: {=x $_=$_*10 =}
3126   RandomVal: {=1 $_=rand() =}
3127
3128 that can be used like:
3129
3130   parallel --header : --tmpl my.tmpl={#}.t myprog {#}.t \
3131     ::: x 1 2 3 ::: y 1 2 3
3132
3133 Filtering is supported as:
3134
3135   parallel --filter '{1} > {2}' echo ::: 1 2 3 ::: 1 2 3
3136
3137 https://github.com/eviatarbach/parasweep (Last checked: 2021-01)
3138
3139
3140 =head2 DIFFERENCES BETWEEN parallel-bash AND GNU Parallel
3141
3142 Summary (see legend above):
3143
3144 =over
3145
3146 =item I1 I2 - - - - -
3147
3148 =item - - M3 - - M6
3149
3150 =item - O2 O3 - O5 O6 - O8 x O10
3151
3152 =item E1 - - - - - -
3153
3154 =item - - - - - - - - -
3155
3156 =item - -
3157
3158 =back
3159
3160 B<parallel-bash> is written in pure bash. It is really fast (overhead
3161 of ~0.05 ms/job compared to GNU B<parallel>'s ~3 ms/job). So if your
3162 jobs are extremely short lived, and you can live with the quite
3163 limited command, this may be useful.
3164
3165 It works by making a queue for each process. Then the jobs are
3166 distributed to the queues in a round robin fashion. Finally the queues
3167 are started in parallel. This works fine, if you are lucky, but if
3168 not, all the long jobs may end up in the same queue, so you may see:
3169
3170   $ printf "%b\n" 1 1 1 4 1 1 1 4 1 1 1 4 |
3171       time parallel -P4 sleep {}
3172   (7 seconds)
3173   $ printf "%b\n" 1 1 1 4 1 1 1 4 1 1 1 4 |
3174       time ./parallel-bash.bash -p 4 -c sleep {}
3175   (12 seconds)
3176
3177 Ctrl-C does not stop spawning new jobs. Ctrl-Z does not suspend
3178 running jobs.
3179
3180
3181 =head3 EXAMPLES FROM parallel-bash
3182
3183   1$ some_input | parallel-bash -p 5 -c echo
3184
3185   1$ some_input | parallel -j 5 echo
3186
3187   2$ parallel-bash -p 5 -c echo < some_file
3188
3189   2$ parallel -j 5 echo < some_file
3190
3191   3$ parallel-bash -p 5 -c echo <<< 'some string'
3192
3193   3$ parallel -j 5 -c echo <<< 'some string'
3194
3195   4$ something | parallel-bash -p 5 -c echo {} {}
3196
3197   4$ something | parallel -j 5 echo {} {}
3198
3199 https://reposhub.com/python/command-line-tools/Akianonymus-parallel-bash.html
3200 (Last checked: 2021-06)
3201
3202
3203 =head2 DIFFERENCES BETWEEN bash-concurrent AND GNU Parallel
3204
3205 B<bash-concurrent> is more an alternative to B<make> than to GNU
3206 B<parallel>. Its input is very similar to a Makefile, where jobs
3207 depend on other jobs.
3208
3209 It has a nice progress indicator where you can see which jobs
3210 completed successfully, which jobs are currently running, which jobs
3211 failed, and which jobs were skipped due to a depending job failed.
3212 The indicator does not deal well with resizing the window.
3213
3214 Output is cached in tempfiles on disk, but is only shown if there is
3215 an error, so it is not meant to be part of a UNIX pipeline. If
3216 B<bash-concurrent> crashes these tempfiles are not removed.
3217
3218 It uses an O(n*n) algorithm, so if you have 1000 independent jobs it
3219 takes 22 seconds to start it.
3220
3221 https://github.com/themattrix/bash-concurrent
3222 (Last checked: 2021-02)
3223
3224
3225 =head2 Todo
3226
3227 http://code.google.com/p/spawntool/
3228
3229 http://code.google.com/p/push/
3230
3231 https://github.com/mylanconnolly/parallel
3232
3233 https://github.com/krashanoff/parallel
3234
3235 https://github.com/Nukesor/pueue
3236
3237 https://arxiv.org/pdf/2012.15443.pdf KumQuat
3238
3239 https://arxiv.org/pdf/2007.09436.pdf PaSH: Light-touch Data-Parallel Shell Processing
3240
3241 https://github.com/JeiKeiLim/simple_distribute_job
3242
3243 https://github.com/reggi/pkgrun - not obvious how to use
3244
3245 https://github.com/benoror/better-npm-run - not obvious how to use
3246
3247 https://github.com/bahmutov/with-package
3248
3249 https://github.com/xuchenCN/go-pssh
3250
3251 https://github.com/flesler/parallel
3252
3253 https://github.com/Julian/Verge
3254
3255 https://manpages.ubuntu.com/manpages/xenial/man1/tsp.1.html
3256
3257 https://vicerveza.homeunix.net/~viric/soft/ts/
3258
3259 https://github.com/chapmanjacobd/que
3260
3261 https://github.com/ExpectationMax/simple_gpu_scheduler
3262     simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
3263     parallel -j3 --shuf CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' < gpu_commands.txt
3264
3265     simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
3266     parallel --header : --shuf -j3 -v CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' python3 train_dnn.py --lr {lr} --batch_size {bs} ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
3267
3268     simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
3269     parallel --header : --shuf CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1; seq() > 5 and skip() =}' python3 train_dnn.py --lr {lr} --batch_size {bs} ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
3270
3271     touch gpu.queue
3272     tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2 &
3273     echo "my_command_with | and stuff > logfile" >> gpu.queue
3274
3275     touch gpu.queue
3276     tail -f -n 0 gpu.queue | parallel -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' &
3277     # Needed to fill job slots once
3278     seq 3 | parallel echo true >> gpu.queue
3279     # Add jobs
3280     echo "my_command_with | and stuff > logfile" >> gpu.queue
3281     # Needed to flush output from completed jobs
3282     seq 3 | parallel echo true >> gpu.queue
3283
3284 https://github.com/Overv/outrun#outrun
3285
3286 =head1 TESTING OTHER TOOLS
3287
3288 There are certain issues that are very common on parallelizing
3289 tools. Here are a few stress tests. Be warned: If the tool is badly
3290 coded it may overload your machine.
3291
3292
3293 =head2 MIX: Output mixes
3294
3295 Output from 2 jobs should not mix. If the output is not used, this
3296 does not matter; but if the output I<is> used then it is important
3297 that you do not get half a line from one job followed by half a line
3298 from another job.
3299
3300 If the tool does not buffer, output will most likely mix now and then.
3301
3302 This test stresses whether output mixes.
3303
3304   #!/bin/bash
3305
3306   paralleltool="parallel -j0"
3307
3308   cat <<-EOF > mycommand
3309   #!/bin/bash
3310
3311   # If a, b, c, d, e, and f mix: Very bad
3312   perl -e 'print STDOUT "a"x3000_000," "'
3313   perl -e 'print STDERR "b"x3000_000," "'
3314   perl -e 'print STDOUT "c"x3000_000," "'
3315   perl -e 'print STDERR "d"x3000_000," "'
3316   perl -e 'print STDOUT "e"x3000_000," "'
3317   perl -e 'print STDERR "f"x3000_000," "'
3318   echo
3319   echo >&2
3320   EOF
3321   chmod +x mycommand
3322
3323   # Run 30 jobs in parallel
3324   seq 30 |
3325     $paralleltool ./mycommand > >(tr -s abcdef) 2> >(tr -s abcdef >&2)
3326
3327   # 'a c e' and 'b d f' should always stay together
3328   # and there should only be a single line per job
3329
3330
3331 =head2 STDERRMERGE: Stderr is merged with stdout
3332
3333 Output from stdout and stderr should not be merged, but kept separated.
3334
3335 This test shows whether stdout is mixed with stderr.
3336
3337   #!/bin/bash
3338
3339   paralleltool="parallel -j0"
3340
3341   cat <<-EOF > mycommand
3342   #!/bin/bash
3343
3344   echo stdout
3345   echo stderr >&2
3346   echo stdout
3347   echo stderr >&2
3348   EOF
3349   chmod +x mycommand
3350
3351   # Run one job
3352   echo |
3353     $paralleltool ./mycommand > stdout 2> stderr
3354   cat stdout
3355   cat stderr
3356
3357
3358 =head2 RAM: Output limited by RAM
3359
3360 Some tools cache output in RAM. This makes them extremely slow if the
3361 output is bigger than physical memory and crash if the output is
3362 bigger than the virtual memory.
3363
3364   #!/bin/bash
3365
3366   paralleltool="parallel -j0"
3367
3368   cat <<'EOF' > mycommand
3369   #!/bin/bash
3370
3371   # Generate 1 GB output
3372   yes "`perl -e 'print \"c\"x30_000'`" | head -c 1G
3373   EOF
3374   chmod +x mycommand
3375
3376   # Run 20 jobs in parallel
3377   # Adjust 20 to be > physical RAM and < free space on /tmp
3378   seq 20 | time $paralleltool ./mycommand | wc -c
3379
3380
3381 =head2 DISKFULL: Incomplete data if /tmp runs full
3382
3383 If caching is done on disk, the disk can run full during the run. Not
3384 all programs discover this. GNU Parallel discovers it, if it stays
3385 full for at least 2 seconds.
3386
3387   #!/bin/bash
3388
3389   paralleltool="parallel -j0"
3390
3391   # This should be a dir with less than 100 GB free space
3392   smalldisk=/tmp/shm/parallel
3393
3394   TMPDIR="$smalldisk"
3395   export TMPDIR
3396
3397   max_output() {
3398       # Force worst case scenario:
3399       # Make GNU Parallel only check once per second
3400       sleep 10
3401       # Generate 100 GB to fill $TMPDIR
3402       # Adjust if /tmp is bigger than 100 GB
3403       yes | head -c 100G >$TMPDIR/$$
3404       # Generate 10 MB output that will not be buffered due to full disk
3405       perl -e 'print "X"x10_000_000' | head -c 10M
3406       echo This part is missing from incomplete output
3407       sleep 2
3408       rm $TMPDIR/$$
3409       echo Final output
3410   }
3411
3412   export -f max_output
3413   seq 10 | $paralleltool max_output | tr -s X
3414
3415
3416 =head2 CLEANUP: Leaving tmp files at unexpected death
3417
3418 Some tools do not clean up tmp files if they are killed. If the tool
3419 buffers on disk, they may not clean up, if they are killed.
3420
3421   #!/bin/bash
3422
3423   paralleltool=parallel
3424
3425   ls /tmp >/tmp/before
3426   seq 10 | $paralleltool sleep &
3427   pid=$!
3428   # Give the tool time to start up
3429   sleep 1
3430   # Kill it without giving it a chance to cleanup
3431   kill -9 $!
3432   # Should be empty: No files should be left behind
3433   diff <(ls /tmp) /tmp/before
3434
3435
3436 =head2 SPCCHAR: Dealing badly with special file names.
3437
3438 It is not uncommon for users to create files like:
3439
3440   My brother's 12" *** record  (costs $$$).jpg
3441
3442 Some tools break on this.
3443
3444   #!/bin/bash
3445
3446   paralleltool=parallel
3447
3448   touch "My brother's 12\" *** record  (costs \$\$\$).jpg"
3449   ls My*jpg | $paralleltool ls -l
3450
3451
3452 =head2 COMPOSED: Composed commands do not work
3453
3454 Some tools require you to wrap composed commands into B<bash -c>.
3455
3456   echo bar | $paralleltool echo foo';' echo {}
3457
3458
3459 =head2 ONEREP: Only one replacement string allowed
3460
3461 Some tools can only insert the argument once.
3462
3463   echo bar | $paralleltool echo {} foo {}
3464
3465
3466 =head2 INPUTSIZE: Length of input should not be limited
3467
3468 Some tools limit the length of the input lines artificially with no good
3469 reason. GNU B<parallel> does not:
3470
3471   perl -e 'print "foo."."x"x100_000_000' | parallel echo {.}
3472
3473 GNU B<parallel> limits the command to run to 128 KB due to execve(1):
3474
3475   perl -e 'print "x"x131_000' | parallel echo {} | wc
3476
3477
3478 =head2 NUMWORDS: Speed depends on number of words
3479
3480 Some tools become very slow if output lines have many words.
3481
3482   #!/bin/bash
3483
3484   paralleltool=parallel
3485
3486   cat <<-EOF > mycommand
3487   #!/bin/bash
3488
3489   # 10 MB of lines with 1000 words
3490   yes "`seq 1000`" | head -c 10M
3491   EOF
3492   chmod +x mycommand
3493
3494   # Run 30 jobs in parallel
3495   seq 30 | time $paralleltool -j0 ./mycommand > /dev/null
3496
3497 =head2 4GB: Output with a line > 4GB should be OK
3498
3499   #!/bin/bash
3500
3501   paralleltool="parallel -j0"
3502
3503   cat <<-EOF > mycommand
3504   #!/bin/bash
3505
3506   perl -e '\$a="a"x1000_000; for(1..5000) { print \$a }'
3507   EOF
3508   chmod +x mycommand
3509
3510   # Run 1 job
3511   seq 1 | $paralleltool ./mycommand | LC_ALL=C wc
3512
3513
3514 =head1 AUTHOR
3515
3516 When using GNU B<parallel> for a publication please cite:
3517
3518 O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
3519 The USENIX Magazine, February 2011:42-47.
3520
3521 This helps funding further development; and it won't cost you a cent.
3522 If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
3523
3524 Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
3525
3526 Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk
3527
3528 Copyright (C) 2010-2021 Ole Tange, http://ole.tange.dk and Free
3529 Software Foundation, Inc.
3530
3531 Parts of the manual concerning B<xargs> compatibility is inspired by
3532 the manual of B<xargs> from GNU findutils 4.4.2.
3533
3534
3535 =head1 LICENSE
3536
3537 This program is free software; you can redistribute it and/or modify
3538 it under the terms of the GNU General Public License as published by
3539 the Free Software Foundation; either version 3 of the License, or
3540 at your option any later version.
3541
3542 This program is distributed in the hope that it will be useful,
3543 but WITHOUT ANY WARRANTY; without even the implied warranty of
3544 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
3545 GNU General Public License for more details.
3546
3547 You should have received a copy of the GNU General Public License
3548 along with this program.  If not, see <https://www.gnu.org/licenses/>.
3549
3550 =head2 Documentation license I
3551
3552 Permission is granted to copy, distribute and/or modify this
3553 documentation under the terms of the GNU Free Documentation License,
3554 Version 1.3 or any later version published by the Free Software
3555 Foundation; with no Invariant Sections, with no Front-Cover Texts, and
3556 with no Back-Cover Texts.  A copy of the license is included in the
3557 file LICENSES/GFDL-1.3-or-later.txt.
3558
3559 =head2 Documentation license II
3560
3561 You are free:
3562
3563 =over 9
3564
3565 =item B<to Share>
3566
3567 to copy, distribute and transmit the work
3568
3569 =item B<to Remix>
3570
3571 to adapt the work
3572
3573 =back
3574
3575 Under the following conditions:
3576
3577 =over 9
3578
3579 =item B<Attribution>
3580
3581 You must attribute the work in the manner specified by the author or
3582 licensor (but not in any way that suggests that they endorse you or
3583 your use of the work).
3584
3585 =item B<Share Alike>
3586
3587 If you alter, transform, or build upon this work, you may distribute
3588 the resulting work only under the same, similar or a compatible
3589 license.
3590
3591 =back
3592
3593 With the understanding that:
3594
3595 =over 9
3596
3597 =item B<Waiver>
3598
3599 Any of the above conditions can be waived if you get permission from
3600 the copyright holder.
3601
3602 =item B<Public Domain>
3603
3604 Where the work or any of its elements is in the public domain under
3605 applicable law, that status is in no way affected by the license.
3606
3607 =item B<Other Rights>
3608
3609 In no way are any of the following rights affected by the license:
3610
3611 =over 2
3612
3613 =item *
3614
3615 Your fair dealing or fair use rights, or other applicable
3616 copyright exceptions and limitations;
3617
3618 =item *
3619
3620 The author's moral rights;
3621
3622 =item *
3623
3624 Rights other persons may have either in the work itself or in
3625 how the work is used, such as publicity or privacy rights.
3626
3627 =back
3628
3629 =back
3630
3631 =over 9
3632
3633 =item B<Notice>
3634
3635 For any reuse or distribution, you must make clear to others the
3636 license terms of this work.
3637
3638 =back
3639
3640 A copy of the full license is included in the file as
3641 LICENCES/CC-BY-SA-4.0.txt
3642
3643
3644 =head1 DEPENDENCIES
3645
3646 GNU B<parallel> uses Perl, and the Perl modules Getopt::Long,
3647 IPC::Open3, Symbol, IO::File, POSIX, and File::Temp. For remote usage
3648 it also uses rsync with ssh.
3649
3650
3651 =head1 SEE ALSO
3652
3653 B<find>(1), B<xargs>(1), B<make>(1), B<pexec>(1), B<ppss>(1),
3654 B<xjobs>(1), B<prll>(1), B<dxargs>(1), B<mdm>(1)
3655
3656 =cut