src/parallel_alternatives.pod

   1 #!/usr/bin/perl -w
   2
   3 =encoding utf8
   4
   5 =head1 NAME
   6
   7 parallel_alternatives - Alternatives to GNU B<parallel>
   8
   9
  10 =head1 DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES
  11
  12 There are a lot of programs with some of the functionality of GNU
  13 B<parallel>. GNU B<parallel> strives to include the best of the
  14 functionality without sacrificing ease of use.
  15
  16 B<parallel> has existed since 2002 and as GNU B<parallel> since
  17 2010. A lot of the alternatives have not had the vitality to survive
  18 that long, but have come and gone during that time.
  19
  20 GNU B<parallel> is actively maintained with a new release every month
  21 since 2010. Most other alternatives are fleeting interests of the
  22 developers with irregular releases and only maintained for a few
  23 years.
  24
  25
  26 =head2 SUMMARY TABLE
  27
  28 The following features are in some of the comparable tools:
  29
  30 B<Inputs>
  31  I1. Arguments can be read from stdin
  32  I2. Arguments can be read from a file
  33  I3. Arguments can be read from multiple files
  34  I4. Arguments can be read from command line
  35  I5. Arguments can be read from a table
  36  I6. Arguments can be read from the same file using #! (shebang)
  37  I7. Line oriented input as default (Quoting of special chars not needed)
  38
  39 B<Manipulation of input>
  40  M1. Composed command
  41  M2. Multiple arguments can fill up an execution line
  42  M3. Arguments can be put anywhere in the execution line
  43  M4. Multiple arguments can be put anywhere in the execution line
  44  M5. Arguments can be replaced with context
  45  M6. Input can be treated as the complete command line
  46
  47 B<Outputs>
  48  O1. Grouping output so output from different jobs does not mix
  49  O2. Send stderr (standard error) to stderr (standard error)
  50  O3. Send stdout (standard output) to stdout (standard output)
  51  O4. Order of output can be same as order of input
  52  O5. Stdout only contains stdout (standard output) from the command
  53  O6. Stderr only contains stderr (standard error) from the command
  54  O7. Buffering on disk
  55  O8. Cleanup of file if killed
  56  O9. Test if disk runs full during run
  57
  58 B<Execution>
  59  E1. Running jobs in parallel
  60  E2. List running jobs
  61  E3. Finish running jobs, but do not start new jobs
  62  E4. Number of running jobs can depend on number of cpus
  63  E5. Finish running jobs, but do not start new jobs after first failure
  64  E6. Number of running jobs can be adjusted while running
  65  E7. Only spawn new jobs if load is less than a limit
  66
  67 B<Remote execution>
  68  R1. Jobs can be run on remote computers
  69  R2. Basefiles can be transferred
  70  R3. Argument files can be transferred
  71  R4. Result files can be transferred
  72  R5. Cleanup of transferred files
  73  R6. No config files needed
  74  R7. Do not run more than SSHD's MaxStartups can handle
  75  R8. Configurable SSH command
  76  R9. Retry if connection breaks occasionally
  77
  78 B<Semaphore>
  79  S1. Possibility to work as a mutex
  80  S2. Possibility to work as a counting semaphore
  81
  82 B<Legend>
  83  - = no
  84  x = not applicable
  85  ID = yes
  86
  87 As not every new version of the programs is tested, the table may be
  88 outdated. Please file a bug report if you find errors (see REPORTING
  89 BUGS).
  90
  91 parallel:
  92 I1 I2 I3 I4 I5 I6 I7
  93 M1 M2 M3 M4 M5 M6
  94 O1 O2 O3 O4 O5 O6 O7 O8 O9
  95 E1 E2 E3 E4 E5 E6 E7
  96 R1 R2 R3 R4 R5 R6 R7 R8 R9
  97 S1 S2
  98
  99 find -exec:
 100 -  -  -  x  -  x  -
 101 -  M2 M3 -  -  -  -
 102 -  O2 O3 O4 O5 O6
 103 -  -  -  -  -  -  -
 104 -  -  -  -  -  -  -  -  -
 105 x  x
 106
 107 make -j:
 108 -  -  -  -  -  -  -
 109 -  -  -  -  -  -
 110 O1 O2 O3 -  x  O6
 111 E1 -  -  -  E5 -
 112 -  -  -  -  -  -  -  -  -
 113 -  -
 114
 115
 116 xjobs, prll, dxargs, mdm/middleman, xapply, paexec, ladon, jobflow,
 117 ClusterSSH: TODO - Please file a bug-report if you know what features
 118 they support (See REPORTING BUGS).
 119
 120
 121 =head2 DIFFERENCES BETWEEN xargs AND GNU Parallel
 122
 123 Summary table (see legend above):
 124 I1 I2 - - - - -
 125 - M2 M3 - - -
 126 - O2 O3 - O5 O6
 127 E1 - - - - - -
 128 - - - - - x - - -
 129 - -
 130
 131 B<xargs> offers some of the same possibilities as GNU B<parallel>.
 132
 133 B<xargs> deals badly with special characters (such as space, \, ' and
 134 "). To see the problem try this:
 135
 136   touch important_file
 137   touch 'not important_file'
 138   ls not* | xargs rm
 139   mkdir -p "My brother's 12\" records"
 140   ls | xargs rmdir
 141   touch 'c:\windows\system32\clfs.sys'
 142   echo 'c:\windows\system32\clfs.sys' | xargs ls -l
 143
 144 You can specify B<-0>, but many input generators are not optimized for
 145 using B<NUL> as separator but are optimized for B<newline> as
 146 separator. E.g. B<awk>, B<ls>, B<echo>, B<tar -v>, B<head> (requires
 147 using B<-z>), B<tail> (requires using B<-z>), B<sed> (requires using
 148 B<-z>), B<perl> (B<-0> and \0 instead of \n), B<locate> (requires
 149 using B<-0>), B<find> (requires using B<-print0>), B<grep> (requires
 150 using B<-z> or B<-Z>), B<sort> (requires using B<-z>).
 151
 152 GNU B<parallel>'s newline separation can be emulated with:
 153
 154 B<cat | xargs -d "\n" -n1 I<command>>
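
A minimal sketch of the difference, assuming GNU B<xargs> (B<-d> is a GNU extension) and a scratch directory created with B<mktemp>. The file names are only illustrative:

```shell
# Scratch directory with one file whose name contains a space
dir=$(mktemp -d)
touch "$dir/important_file" "$dir/not important_file"

# Default splitting on whitespace: the single name becomes two arguments
default_count=$(cd "$dir" && ls not* | xargs -n1 echo | wc -l)

# Newline separation: the name stays one argument
newline_count=$(cd "$dir" && ls not* | xargs -d "\n" -n1 echo | wc -l)

echo "default: $default_count argument(s), -d: $newline_count argument(s)"
rm -rf "$dir"
```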
 155
 156 B<xargs> can run a given number of jobs in parallel, but has no
 157 support for running number-of-cpu-cores jobs in parallel.
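
One workaround is to compute the core count outside B<xargs>. The sketch below assumes B<nproc> from GNU coreutils is available (it is not part of B<xargs>) and falls back to a single job otherwise:

```shell
# nproc prints the number of available processing units
cores=$(nproc 2>/dev/null || echo 1)

# Run up to one job per core
seq 8 | xargs -n1 -P "$cores" echo job
```

GNU B<parallel> runs one job per CPU core by default, so no such calculation is needed there.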
 158
 159 B<xargs> has no support for grouping the output, therefore output may
 160 run together, e.g. the first half of a line is from one process and
 161 the last half of the line is from another process. The example
 162 B<Parallel grep> cannot be done reliably with B<xargs> because of
 163 this. To see this in action try:
 164
 165   parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \
 166     '>' {} ::: a b c d e f g h
 167   # Serial = no mixing = the wanted result
 168   # 'tr -s a-z' squeezes repeating letters into a single letter
 169   echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z
 170   # Compare to 8 jobs in parallel
 171   parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z
 172   echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z
 173   echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \
 174     tr -s a-z
 175
 176 Or try this:
 177
 178   slow_seq() {
 179     echo Count to "$@"
 180     seq "$@" |
 181       perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
 182   }
 183   export -f slow_seq
 184   # Serial = no mixing = the wanted result
 185   seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}'
 186   # Compare to 8 jobs in parallel
 187   seq 8 | parallel -P8 slow_seq {}
 188   seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}'
 189
 190 B<xargs> has no support for keeping the order of the output, therefore
 191 if running jobs in parallel using B<xargs> the output of the second
 192 job cannot be postponed till the first job is done.
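
A small sketch of the effect, using only B<xargs> and B<sh>. Each job sleeps for the number of seconds given as its argument and then prints it, so the output follows completion time rather than input order:

```shell
# All three jobs start at once with -P3; the shortest sleep finishes first
out=$(printf '%s\n' 2 0 1 | xargs -n1 -P3 sh -c 'sleep "$1"; echo "$1"' _ | tr '\n' ' ')
echo "xargs -P3: $out"

# GNU parallel with -k prints in input order instead (2 0 1):
#   printf '%s\n' 2 0 1 | parallel -k 'sleep {}; echo {}'
```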
 193
 194 B<xargs> has no support for running jobs on remote computers.
 195
 196 B<xargs> has no support for context replace, so you will have to
 197 construct the arguments yourself.
 198
 199 If you use a replace string in B<xargs> (B<-I>) you cannot force
 200 B<xargs> to use more than one argument.
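
A common workaround, sketched here with B<-n> and a small B<sh -c> wrapper instead of B<-I>, is to let the shell place the arguments as positional parameters. The B<_> only fills B<$0>:

```shell
# Take two arguments per job and put them anywhere in the command
out=$(printf '%s\n' a b c d | xargs -n2 sh -c 'echo "$2-$1"' _)
echo "$out"

# GNU parallel equivalent:
#   parallel -N2 echo {2}-{1} ::: a b c d
```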
 201
 202 Quoting in B<xargs> works like B<-q> in GNU B<parallel>. This means
 203 composed commands and redirection require using B<bash -c>.
 204
 205   ls | parallel "wc {} >{}.wc"
 206   ls | parallel "echo {}; ls {}|wc"
 207
 208 becomes (assuming you have 8 cores and that none of the filenames
 209 contain space, " or '):
 210
 211   ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
 212   ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"
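
If the filenames may contain space, " or ', one way to avoid substituting the name into the command string is to hand it to B<bash -c> as a positional parameter. This is a sketch assuming GNU B<xargs>, and the file names are hypothetical:

```shell
dir=$(mktemp -d)
printf 'hello\n' > "$dir/my file"
printf 'records\n' > "$dir/brother's 12\" file"

# "$1" is expanded by the inner bash, so quotes in the name are harmless
(cd "$dir" && ls | xargs -d "\n" -n1 -P8 bash -c 'wc -c < "$1" > "$1.wc"' _)

wc_my_file=$(cat "$dir/my file.wc")
wc_quoted=$(cat "$dir/brother's 12\" file.wc")
echo "my file: $wc_my_file bytes, quoted name: $wc_quoted bytes"
rm -rf "$dir"
```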
 213
 214 https://www.gnu.org/software/findutils/
 215
 216
 217 =head2 DIFFERENCES BETWEEN find -exec AND GNU Parallel
 218
 219 B<find -exec> offers some of the same possibilities as GNU B<parallel>.
 220
 221 B<find -exec> only works on files. Processing other input (such as
 222 hosts or URLs) will require creating these inputs as files. B<find
 223 -exec> has no support for running commands in parallel.
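
Parallelism therefore has to come from another tool. A minimal sketch, assuming B<find> and B<xargs> with NUL support and a hypothetical scratch directory, pipes NUL-separated names into B<xargs -P>:

```shell
dir=$(mktemp -d)
touch "$dir/a file" "$dir/b" "$dir/c"

# -print0 / -0 keep names with spaces intact; -P4 runs up to 4 jobs at once
count=$(find "$dir" -type f -print0 | xargs -0 -n1 -P4 echo | wc -l)
echo "processed $count files"
rm -rf "$dir"

# GNU parallel equivalent:
#   find "$dir" -type f -print0 | parallel -0 echo
```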
 224
 225 https://www.gnu.org/software/findutils/ (Last checked: 2019-01)
 226
 227
 228 =head2 DIFFERENCES BETWEEN make -j AND GNU Parallel
 229
 230 B<make -j> can run jobs in parallel, but requires a crafted Makefile
 231 to do this. That results in extra quoting to get filenames containing
 232 newlines to work correctly.
 233
 234 B<make -j> computes a dependency graph before running jobs. Jobs run
 235 by GNU B<parallel> do not depend on each other.
 236
 237 (Very early versions of GNU B<parallel> were coincidentally implemented
 238 using B<make -j>).
 239
 240 https://www.gnu.org/software/make/ (Last checked: 2019-01)
 241
 242
 243 =head2 DIFFERENCES BETWEEN ppss AND GNU Parallel
 244
 245 Summary table (see legend above):
 246 I1 I2 - - - - I7
 247 M1 - M3 - - M6
 248 O1 - - x - -
 249 E1 E2 ?E3 E4 - - -
 250 R1 R2 R3 R4 - - ?R7 ? ?
 251 - -
 252
 253 B<ppss> is also a tool for running jobs in parallel.
 254
 255 The output of B<ppss> is status information and thus not useful for
 256 using as input for another command. The output from the jobs is put
 257 into files.
 258
 259 The argument replace string ($ITEM) cannot be changed. Arguments must
 260 be quoted - thus arguments containing special characters (space '"&!*)
 261 may cause problems. More than one argument is not supported. Filenames
 262 containing newlines are not processed correctly. When reading input
 263 from a file null cannot be used as a terminator. B<ppss> needs to read
 264 the whole input file before starting any jobs.
 265
 266 Output and status information is stored in ppss_dir and thus requires
 267 cleanup when completed. If the dir is not removed before running
 268 B<ppss> again it may cause nothing to happen as B<ppss> thinks the
 269 task is already done. GNU B<parallel> will normally not need cleaning
 270 up if running locally and will only need cleaning up if stopped
 271 abnormally and running remote (B<--cleanup> may not complete if
 272 stopped abnormally). The example B<Parallel grep> would require extra
 273 postprocessing if written using B<ppss>.
 274
 275 For remote systems B<ppss> requires 3 steps: config, deploy, and
 276 start. GNU B<parallel> only requires one step.
 277
 278 =head3 EXAMPLES FROM ppss MANUAL
 279
 280 Here are the examples from B<ppss>'s manual page with the equivalent
 281 using GNU B<parallel>:
 282
 283 B<1> ./ppss.sh standalone -d /path/to/files -c 'gzip '
 284
 285 B<1> find /path/to/files -type f | parallel gzip
 286
 287 B<2> ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '
 288
 289 B<2> find /path/to/files -type f | parallel cp {} /destination/dir
 290
 291 B<3> ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '
 292
 293 B<3> parallel -a list-of-urls.txt wget -q
 294
 295 B<4> ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'
 296
 297 B<4> parallel -a list-of-urls.txt wget -q {}
 298
 299 B<5> ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir -m
 300 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh -n nodes.txt -o
 301 /some/output/dir --upload --download ; ./ppss deploy -C config.cfg ;
 302 ./ppss start -C config
 303
 304 B<5> # parallel does not use configs. If you want a different username put it in nodes.txt: user@hostname
 305
 306 B<5> find source/dir -type f | parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} -o {.}.mp3 --preset standard --quiet
 307
 308 B<6> ./ppss stop -C config.cfg
 309
 310 B<6> killall -TERM parallel
 311
 312 B<7> ./ppss pause -C config.cfg
 313
 314 B<7> Press: CTRL-Z or killall -SIGTSTP parallel
 315
 316 B<8> ./ppss continue -C config.cfg
 317
 318 B<8> Enter: fg or killall -SIGCONT parallel
 319
 320 B<9> ./ppss.sh status -C config.cfg
 321
 322 B<9> killall -SIGUSR2 parallel
 323
 324 https://github.com/louwrentius/PPSS
 325
 326
 327 =head2 DIFFERENCES BETWEEN pexec AND GNU Parallel
 328
 329 Summary table (see legend above):
 330 I1 I2 - I4 I5 - -
 331 M1 - M3 - - M6
 332 O1 O2 O3 - O5 O6
 333 E1 - - E4 - E6 -
 334 R1 - - - - R6 - - -
 335 S1 -
 336
 337 B<pexec> is also a tool for running jobs in parallel.
 338
 339 =head3 EXAMPLES FROM pexec MANUAL
 340
 341 Here are the examples from B<pexec>'s info page with the equivalent
 342 using GNU B<parallel>:
 343
 344 B<1> pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
 345   'echo "scale=10000;sqrt($NUM)" | bc'
 346
 347 B<1> seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | bc > sqrt-{}.dat'
 348
 349 B<2> pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort
 350
 351 B<2> ls myfiles*.ext | parallel sort {} ">{}.sort"
 352
 353 B<3> pexec -f image.list -n auto -e B -u star.log -c -- \
 354   'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'
 355
 356 B<3> parallel -a image.list \
 357   'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log
 358
 359 B<4> pexec -r *.png -e IMG -c -o - -- \
 360   'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'
 361
 362 B<4> ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'
 363
 364 B<5> pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'
 365
 366 B<5> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'
 367
 368 B<6> for p in *.png ; do echo ${p%.png} ; done | \
 369   pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
 370
 371 B<6> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
 372
 373 B<7> LIST=$(for p in *.png ; do echo ${p%.png} ; done)
 374   pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
 375
 376 B<7> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
 377
 378 B<8> pexec -n 8 -r *.jpg -y unix -e IMG -c \
 379   'pexec -j -m blockread -d $IMG | \
 380   jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
 381   pexec -j -m blockwrite -s th_$IMG'
 382
 383 B<8> Combining GNU B<parallel> and GNU B<sem>.
 384
 385 B<8> ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
 386   'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'
 387
 388 B<8> If reading and writing are done to the same disk, this may be
 389 faster as only one process will be either reading or writing:
 390
 391 B<8> ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
 392   'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'
 393
 394 https://www.gnu.org/software/pexec/
 395
 396
 397 =head2 DIFFERENCES BETWEEN xjobs AND GNU Parallel
 398
 399 B<xjobs> is also a tool for running jobs in parallel. It only supports
 400 running jobs on your local computer.
 401
 402 B<xjobs> deals badly with special characters just like B<xargs>. See
 403 the section B<DIFFERENCES BETWEEN xargs AND GNU Parallel>.
 404
 405 Here are the examples from B<xjobs>'s man page with the equivalent
 406 using GNU B<parallel>:
 407
 408 B<1> ls -1 *.zip | xjobs unzip
 409
 410 B<1> ls *.zip | parallel unzip
 411
 412 B<2> ls -1 *.zip | xjobs -n unzip
 413
 414 B<2> ls *.zip | parallel unzip >/dev/null
 415
 416 B<3> find . -name '*.bak' | xjobs gzip
 417
 418 B<3> find . -name '*.bak' | parallel gzip
 419
 420 B<4> ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf
 421
 422 B<4> ls *.jar | parallel jar tf {} '>' {}.idx
 423
 424 B<5> xjobs -s script
 425
 426 B<5> cat script | parallel
 427
 428 B<6> mkfifo /var/run/my_named_pipe;
 429 xjobs -s /var/run/my_named_pipe &
 430 echo unzip 1.zip >> /var/run/my_named_pipe;
 431 echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
 432
 433 B<6> mkfifo /var/run/my_named_pipe;
 434 cat /var/run/my_named_pipe | parallel &
 435 echo unzip 1.zip >> /var/run/my_named_pipe;
 436 echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
 437
 438 http://www.maier-komor.de/xjobs.html (Last checked: 2019-01)
 439
 440
 441 =head2 DIFFERENCES BETWEEN prll AND GNU Parallel
 442
 443 B<prll> is also a tool for running jobs in parallel. It does not
 444 support running jobs on remote computers.
 445
 446 B<prll> encourages using BASH aliases and BASH functions instead of
 447 scripts. GNU B<parallel> supports scripts directly, functions if they
 448 are exported using B<export -f>, and aliases if using B<env_parallel>.
 449
 450 B<prll> generates a lot of status information on stderr (standard
 451 error) which makes it harder to use the stderr (standard error) output
 452 of the job directly as input for another program.
 453
 454 Here is the example from B<prll>'s man page with the equivalent
 455 using GNU B<parallel>:
 456
 457   prll -s 'mogrify -flip $1' *.jpg
 458   parallel mogrify -flip ::: *.jpg
 459
 460 https://github.com/exzombie/prll (Last checked: 2019-01)
 461
 462
 463 =head2 DIFFERENCES BETWEEN dxargs AND GNU Parallel
 464
 465 B<dxargs> is also a tool for running jobs in parallel.
 466
 467 B<dxargs> does not deal well with more simultaneous jobs than SSHD's
 468 MaxStartups. B<dxargs> is built only for running jobs remotely, but does not
 469 support transferring of files.
 470
 471 https://web.archive.org/web/20120518070250/http://www.semicomplete.com/blog/geekery/distributed-xargs.html
 472 (Last checked: 2019-01)
 473
 474
 475 =head2 DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
 476
 477 B<middleman> (B<mdm>) is also a tool for running jobs in parallel.
 478
 479 Here are the shell scripts of
 480 https://web.archive.org/web/20110728064735/http://mdm.berlios.de/usage.html
 481 ported to GNU B<parallel>:
 482
 483   seq 19 | parallel buffon -o - | sort -n > result
 484   cat files | parallel cmd
 485   find dir -execdir sem cmd {} \;
 486
 487 https://github.com/cklin/mdm (Last checked: 2019-01)
 488
 489
 490 =head2 DIFFERENCES BETWEEN xapply AND GNU Parallel
 491
 492 B<xapply> can run jobs in parallel on the local computer.
 493
 494 Here are the examples from B<xapply>'s man page with the equivalent
 495 using GNU B<parallel>:
 496
 497 B<1> xapply '(cd %1 && make all)' */
 498
 499 B<1> parallel 'cd {} && make all' ::: */
 500
 501 B<2> xapply -f 'diff %1 ../version5/%1' manifest | more
 502
 503 B<2> parallel diff {} ../version5/{} < manifest | more
 504
 505 B<3> xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1
 506
 507 B<3> parallel --link diff {1} {2} :::: manifest1 checklist1
 508
 509 B<4> xapply 'indent' *.c
 510
 511 B<4> parallel indent ::: *.c
 512
 513 B<5> find ~ksb/bin -type f ! -perm -111 -print | xapply -f -v 'chmod a+x' -
 514
 515 B<5> find ~ksb/bin -type f ! -perm -111 -print | parallel -v chmod a+x
 516
 517 B<6> find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -
 518
 519 B<6> sh <(find */ -... | parallel -s 1024 echo vi)
 520
 521 B<6> find */ -... | parallel -s 1024 -Xuj1 vi
 522
 523 B<7> find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -
 524
 525 B<7> sh <(find ... |parallel -n5 echo vi)
 526
 527 B<7> find ... |parallel -n5 -uj1 vi
 528
 529 B<8> xapply -fn "" /etc/passwd
 530
 531 B<8> parallel -k echo < /etc/passwd
 532
 533 B<9> tr ':' '\012' < /etc/passwd | xapply -7 -nf 'chown %1 %6' - - - - - - -
 534
 535 B<9> tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}
 536
 537 B<10> xapply '[ -d %1/RCS ] || echo %1' */
 538
 539 B<10> parallel '[ -d {}/RCS ] || echo {}' ::: */
 540
 541 B<11> xapply -f '[ -f %1 ] && echo %1' List | ...
 542
 543 B<11> parallel '[ -f {} ] && echo {}' < List | ...
 544
 545 https://web.archive.org/web/20160702211113/http://carrera.databits.net/~ksb/msrc/local/bin/xapply/xapply.html
 547
 548
 549 =head2 DIFFERENCES BETWEEN AIX apply AND GNU Parallel
 550
 551 B<apply> can build command lines based on a template and arguments -
 552 very much like GNU B<parallel>. B<apply> does not run jobs in
 553 parallel. B<apply> does not use an argument separator (like B<:::>);
 554 instead the template must be the first argument.
 555
 556 Here are the examples from IBM's Knowledge Center and the
 557 corresponding command using GNU B<parallel>:
 558
 559 1. To obtain results similar to those of the B<ls> command, enter:
 560
 561   apply echo *
 562   parallel echo ::: *
 563
 564 2. To compare the file named B<a1> to the file named B<b1>, and the
 565 file named B<a2> to the file named B<b2>, enter:
 566
 567   apply -2 cmp a1 b1 a2 b2
 568   parallel -N2 cmp ::: a1 b1 a2 b2
 569
 570 3. To run the B<who> command five times, enter:
 571
 572   apply -0 who 1 2 3 4 5
 573   parallel -N0 who ::: 1 2 3 4 5
 574
 575 4. To link all files in the current directory to the directory
 576 B</usr/joe>, enter:
 577
 578   apply 'ln %1 /usr/joe' *
 579   parallel ln {} /usr/joe ::: *
 580
 581 https://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds1/apply.htm
 582 (Last checked: 2019-01)
 583
 584
 585 =head2 DIFFERENCES BETWEEN paexec AND GNU Parallel
 586
 587 B<paexec> can run jobs in parallel on both the local and remote computers.
 588
 589 B<paexec> requires commands to print a blank line as the last
 590 output. This means you will have to write a wrapper for most programs.
 591
 592 B<paexec> has a job dependency facility so a job can depend on another
 593 job to be executed successfully. Sort of a poor-man's B<make>.
 594
 595 Here are the examples from B<paexec>'s example catalog with the equivalent
 596 using GNU B<parallel>:
 597
 598 =over 1
 599
 600 =item 1_div_X_run:
 601
 602   ../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]
 603   parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]
 604
 605 =item all_substr_run:
 606
 607   ../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]
 608   parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]
 609
 610 =item cc_wrapper_run:
 611
 612   ../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
 613              -n 'host1 host2' \
 614              -t '/usr/bin/ssh -x' <<EOF [...]
 615   parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
 616              -S host1,host2 <<EOF [...]
 617   # This is not exactly the same, but avoids the wrapper
 618   parallel gcc -O2 -c -o {.}.o {} \
 619              -S host1,host2 <<EOF [...]
 620
 621 =item toupper_run:
 622
 623   ../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]
 624   parallel echo {} '|' ./toupper_cmd <<EOF [...]
 625   # Without the wrapper:
 626   parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]
 627
 628 =back
 629
 630 https://github.com/cheusov/paexec
 631
 632
 633 =head2 DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel
 634
 635 B<map> sees it as a feature to have fewer features and in doing so it
 636 also handles corner cases incorrectly. A lot of GNU B<parallel>'s code
 637 is to handle corner cases correctly on every platform, so you will not
 638 get a nasty surprise if a user, for example, saves a file called: I<My
 639 brother's 12" records.txt>
 640
 641 B<map>'s example showing how to deal with special characters fails on
 642 special characters:
 643
 644   echo "The Cure" > My\ brother\'s\ 12\"\ records
 645
 646   ls | \
 647     map 'echo -n `gzip < "%" | wc -c`; echo -n '*100/'; wc -c < "%"' |
 648     bc
 649
 650 It works with GNU B<parallel>:
 651
 652   ls | \
 653     parallel \
 654       'echo -n `gzip < {} | wc -c`; echo -n '*100/'; wc -c < {}' | bc
 655
 656 And you can even get the file name prepended:
 657
 658   ls | \
 659     parallel --tag \
 660       '(echo -n `gzip < {} | wc -c`'*100/'; wc -c < {}) | bc'
 661
 662 B<map> has no support for grouping. So this gives the wrong results
 663 without any warnings:
 664
 665   parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
 666     ::: a b c d e f
 667   ls -l a b c d e f
 668   parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
 669   map -p 4 'grep 1' a b c d e f > out.map-unbuf
 670   map -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
 671   map -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
 672   ls -l out*
 673   md5sum out*
 674
 675 The documentation shows a workaround, but not only does that mix
 676 stdout (standard output) with stderr (standard error) it also fails
 677 completely for certain jobs (and may even be considered less readable):
 678
 679   parallel echo -n {} ::: 1 2 3
 680
 681   map -p 4 'echo -n % 2>&1 | sed -e "s/^/$$:/"' 1 2 3 | \
 682     sort | cut -f2- -d:
 683
 684 B<map>'s replacement strings (% %D %B %E) can be simulated in GNU
 685 B<parallel> by putting this in B<~/.parallel/config>:
 686
 687   --rpl '%'
 688   --rpl '%D $_=Q(::dirname($_));'
 689   --rpl '%B s:.*/::;s:\.[^/.]+$::;'
 690   --rpl '%E s:.*\.::'
 691
 692 B<map> does not have an argument separator on the command line, but
 693 uses the first argument as command. This makes quoting harder which again
 694 may affect readability. Compare:
 695
 696   map -p 2 'perl -ne '"'"'/^\S+\s+\S+$/ and print $ARGV,"\n"'"'" *
 697
 698   parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *
 699
 700 B<map> can do multiple arguments with context replace, but not without
 701 context replace:
 702
 703   parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3
 704
 705   map "echo 'BEGIN{'%'}END'" 1 2 3
 706
 707 B<map> requires Perl v5.10.0 making it harder to use on old systems.
 708
 709 B<map> has no way of using % in the command (GNU B<parallel> has -I to
 710 specify another replacement string than B<{}>).
 711
 712 By design B<map> is option incompatible with B<xargs>. It does not
 713 have remote job execution, a structured way of saving results,
 714 multiple input sources, progress indicator, configurable record
 715 delimiter (only field delimiter), logging of jobs run with possibility
 716 to resume, keeping the output in the same order as input, B<--pipe>
 717 processing, or dynamic timeouts.
 718
 719 https://github.com/sitaramc/map
 720
 721
 722 =head2 DIFFERENCES BETWEEN ladon AND GNU Parallel
 723
 724 B<ladon> can run multiple jobs on files in parallel.
 725
 726 B<ladon> only works on files and the only way to specify files is
 727 using a quoted glob string (such as \*.jpg). It is not possible to
 728 list the files manually.
 729
 730 As replacement strings it uses FULLPATH, DIRNAME, BASENAME, EXT, RELDIR,
 731 and RELPATH.
 732
 733 These can be simulated using GNU B<parallel> by putting this in
 734 B<~/.parallel/config>:
 735
 736     --rpl 'FULLPATH $_=Q($_);chomp($_=qx{readlink -f $_});'
 737     --rpl 'DIRNAME $_=Q(::dirname($_));chomp($_=qx{readlink -f $_});'
 738     --rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;'
 739     --rpl 'EXT s:.*\.::'
 740     --rpl 'RELDIR $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
 741            s:\Q$c/\E::;$_=::dirname($_);'
 742     --rpl 'RELPATH $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
 743            s:\Q$c/\E::;'
 744
 745 B<ladon> deals badly with filenames containing " and newline, and it
 746 fails for output larger than 200k:
 747
 748     ladon '*' -- seq 36000 | wc
 749
 750 =head3 EXAMPLES FROM ladon MANUAL
 751
 752 It is assumed that the '--rpl's above are put in B<~/.parallel/config>
 753 and that it is run under a shell that supports '**' globbing (such as B<zsh>):
 754
 755 B<1> ladon "**/*.txt" -- echo RELPATH
 756
 757 B<1> parallel echo RELPATH ::: **/*.txt
 758
 759 B<2> ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt
 760
 761 B<2> parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt
 762
 763 B<3> ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH -thumbnail 100x100^ -gravity center -extent 100x100 thumbs/RELPATH
 764
 765 B<3> parallel mkdir -p thumbs/RELDIR\; convert FULLPATH -thumbnail 100x100^ -gravity center -extent 100x100 thumbs/RELPATH ::: **/*.jpg
 766
 767 B<4> ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3
 768
 769 B<4> parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav
 770
 771 https://github.com/danielgtaylor/ladon (Last checked: 2019-01)
 772
 773
 774 =head2 DIFFERENCES BETWEEN jobflow AND GNU Parallel
 775
 776 B<jobflow> can run multiple jobs in parallel.
 777
 778 Just like B<xargs>, output from B<jobflow> jobs running in parallel mixes
 779 together by default. B<jobflow> can buffer into files (placed in
 780 /run/shm), but these are not cleaned up if B<jobflow> dies
 781 unexpectedly (e.g. by Ctrl-C). If the total output is big (in the
 782 order of RAM+swap) it can cause the system to slow to a crawl and
 783 eventually run out of memory.
 784
 785 B<jobflow> gives no error if the command is unknown, and like B<xargs>
 786 redirection and composed commands require wrapping with B<bash -c>.
 787
 788 Input lines can at most be 4096 bytes. You can at most have 16 {}'s in
 789 the command template. More than that either crashes the program or
 790 simply does not execute the command.
 791
 792 B<jobflow> has no equivalent for B<--pipe>, or B<--sshlogin>.
 793
 794 B<jobflow> makes it possible to set resource limits on the running
 795 jobs. This can be emulated by GNU B<parallel> using B<bash>'s B<ulimit>:
 796
 797   jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob
 798
 799   parallel 'ulimit -v 102400 -t 3 -f 20480 -n 300; myjob'
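
Because each job is started in its own shell, a limit set there does not leak into the calling shell. A small sketch using only B<bash> (B<-t> is the CPU time limit in seconds; the value 3 is just an example):

```shell
before=$(ulimit -t)
# The limit is set and read inside a child shell, as it would be for a job
inside=$(bash -c 'ulimit -t 3; ulimit -t')
after=$(ulimit -t)
echo "before=$before inside=$inside after=$after"
```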
 800
 801
 802 =head3 EXAMPLES FROM jobflow README
 803
 804 B<1> cat things.list | jobflow -threads=8 -exec ./mytask {}
 805
 806 B<1> cat things.list | parallel -j8 ./mytask {}
 807
 808 B<2> seq 100 | jobflow -threads=100 -exec echo {}
 809
 810 B<2> seq 100 | parallel -j100 echo {}
 811
 812 B<3> cat urls.txt | jobflow -threads=32 -exec wget {}
 813
 814 B<3> cat urls.txt | parallel -j32 wget {}
 815
 816 B<4> find . -name '*.bmp' | jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg
 817
 818 B<4> find . -name '*.bmp' | parallel -j8 bmp2jpeg {.}.bmp {.}.jpg
 819
 820 https://github.com/rofl0r/jobflow
 821
 822
 823 =head2 DIFFERENCES BETWEEN gargs AND GNU Parallel
 824
 825 B<gargs> can run multiple jobs in parallel.
 826
 827 Older versions cache output in memory. This causes it to be extremely
 828 slow when the output is larger than the physical RAM, and can cause
 829 the system to run out of memory.
 830
 831 See more details on this in B<man parallel_design>.
 832
 833 Newer versions cache output in files, but leave files in $TMPDIR if
 834 B<gargs> is killed.
 835
 836 Output to stderr (standard error) is changed if the command fails.
 837
 838 Here are the two examples from B<gargs>'s website:
 839
 840 B<1> seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"
 841
 842 B<1> seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"
 843
 844 B<2> cat t.txt | gargs --sep "\s+" -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'"
 845
 846 B<2> cat t.txt | parallel --colsep "\\s+" -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'"
 847
 848 https://github.com/brentp/gargs
 849
 850
 851 =head2 DIFFERENCES BETWEEN orgalorg AND GNU Parallel
 852
 853 B<orgalorg> can run the same job on multiple machines. This is related
 854 to B<--onall> and B<--nonall>.
 855
 856 B<orgalorg> supports entering the SSH password - provided it is the
 857 same for all servers. GNU B<parallel> advocates using B<ssh-agent>
 858 instead, but it is possible to emulate B<orgalorg>'s behavior by
 859 setting SSHPASS and by using B<--ssh "sshpass ssh">.
 860
 861 To make the emulation easier, make a simple alias:
 862
 863   alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --lb"
 864
 865 If you want to supply a password run:
 866
 867   SSHPASS=`ssh-askpass`
 868
 869 or set the password directly:
 870
 871   SSHPASS='P4$$w0rd!'
 872
 873 If the above is set up you can then do:
 874
 875   orgalorg -o frontend1 -o frontend2 -p -C uptime
 876   par_emul -S frontend1 -S frontend2 uptime
 877
 878   orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
 879   par_emul -S frontend1 -S frontend2 top -bid 1
 880
 881   orgalorg -o frontend1 -o frontend2 -p -er /tmp -n \
 882     'md5sum /tmp/bigfile' -S bigfile
 883   par_emul -S frontend1 -S frontend2 --basefile bigfile \
 884     --workdir /tmp md5sum /tmp/bigfile
 885
 886 B<orgalorg> has a progress indicator for the transferring of a
 887 file. GNU B<parallel> does not.
 888
 889 https://github.com/reconquest/orgalorg
 890
 891
 892 =head2 DIFFERENCES BETWEEN Rust parallel AND GNU Parallel
 893
 894 Rust parallel focuses on speed. It is almost as fast as B<xargs>. It
 895 implements a few features from GNU B<parallel>, but lacks many
 896 functions. All these fail:
 897
 898   # Read arguments from file
 899   parallel -a file echo
 900   # Changing the delimiter
 901   parallel -d _ echo ::: a_b_c_
 902
 903 These do something different from GNU B<parallel>:
 904
 905   # -q to protect quoted $ and space
 906   parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c
 907   # Generation of combination of inputs
 908   parallel echo {1} {2} ::: red green blue ::: S M L XL XXL
 909   # {= perl expression =} replacement string
 910   parallel echo '{= s/new/old/ =}' ::: my.new your.new
 911   # --pipe
 912   seq 100000 | parallel --pipe wc
 913   # linked arguments
 914   parallel echo ::: S M L :::+ sml med lrg ::: R G B :::+ red grn blu
 915   # Run different shell dialects
 916   zsh -c 'parallel echo \={} ::: zsh && true'
 917   csh -c 'parallel echo \$\{\} ::: shell && true'
 918   bash -c 'parallel echo \$\({}\) ::: pwd && true'
 919   # Rust parallel does not start before the last argument is read
 920   (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo'
 921   tail -f /var/log/syslog | parallel echo
 922
 923 Most of the examples from the book GNU Parallel 2018 do not work, thus
 924 Rust parallel is not close to being a compatible replacement.
 925
 926 Rust parallel has no remote facilities.
 927
 928 It uses /tmp/parallel for tmp files and does not clean up if
 929 terminated abruptly. If another user on the system uses Rust parallel,
 930 then /tmp/parallel will have the wrong permissions and Rust parallel
 931 will fail. A malicious user can set up the right permissions and
 932 symlink the output file to one of the victim's files; the next time
 933 the victim uses Rust parallel, it will overwrite this file.
 934
 935   attacker$ mkdir /tmp/parallel
 936   attacker$ chmod a+rwX /tmp/parallel
 937   # Symlink to the file the attacker wants to zero out
 938   attacker$ ln -s ~victim/.important-file /tmp/parallel/stderr_1
 939   victim$ seq 1000 | parallel echo
 940   # This file is now overwritten with stderr from 'echo'
 941   victim$ cat ~victim/.important-file
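
One common way to avoid this class of problem is to create a private,
unpredictably named directory with B<mktemp -d> instead of using a
fixed path. The sketch below is not how Rust parallel or GNU
B<parallel> is implemented; the names B<workdir> and B<par-sketch> are
made up for the illustration:

```shell
#!/usr/bin/env bash
# Create a private temporary directory with an unpredictable name.
# mktemp -d creates it with mode 0700, so other users cannot plant
# symlinks in it beforehand.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/par-sketch.XXXXXX") || exit 1

# Remove the directory when the script exits. The INT and TERM traps
# turn those signals into a normal exit so the EXIT trap runs.
trap 'rm -rf "$workdir"' EXIT
trap 'exit 130' INT
trap 'exit 143' TERM

# Output files now live under a path an attacker cannot guess.
echo "some output" > "$workdir/stdout_1"
cat "$workdir/stdout_1"
```

A process killed with SIGKILL still cannot clean up, so stale
directories remain possible, but they can no longer be prepared by
another user.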
 942
 943 If /tmp/parallel runs full during the run, Rust parallel does not
 944 report this, but finishes with success - thereby risking data loss.
 945
 946 https://github.com/mmstick/parallel
 947
 948
 949 =head2 DIFFERENCES BETWEEN Rush AND GNU Parallel
 950
 951 B<rush> (https://github.com/shenwei356/rush) is written in Go and
 952 based on B<gargs>.
 953
 954 Just like GNU B<parallel>, B<rush> buffers in temporary files. But
 955 unlike GNU B<parallel>, B<rush> does not clean up if the process
 956 dies abnormally.
 957
 958 B<rush> has some string manipulations that can be emulated by putting
 959 this into ~/.parallel/config (/ is used instead of %, and % is used
 960 instead of ^ as that is closer to bash's ${var%postfix}):
 961
 962   --rpl '{:} s:(\.[^/]+)*$::'
 963   --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::'
 964   --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:'
 965   --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:'
 966   --rpl '{@(.*?)} /$$1/ and $_=$1;'
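
To see what some of these replacement strings are meant to compute,
the same results can be sketched with bash parameter expansion and
bash regular-expression matching. This only illustrates the string
operations, not how GNU B<parallel> evaluates B<--rpl>; the function
names are made up:

```shell
#!/usr/bin/env bash
# {:}  remove all extensions from the last path component
strip_all_ext() {
  local path=$1
  local base=${path##*/}         # last path component
  local dir=${path%"$base"}      # everything before it (may be empty)
  printf '%s\n' "$dir${base%%.*}"
}

# {/:}  basename with all extensions removed
basename_no_ext() {
  local base=${1##*/}
  printf '%s\n' "${base%%.*}"
}

# {@regexp}  print the first capture group if the regexp matches,
# otherwise print the input unchanged.
# bash uses POSIX ERE, so \d must be written [0-9].
capture() {
  local re=$2
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    printf '%s\n' "$1"
  fi
}

strip_all_ext dir.d/file.txt.gz       # dir.d/file
basename_no_ext dir.d/file.txt.gz     # file
capture read_1.fq.gz '(.+)_[0-9]'     # read
```

The dot in the directory name B<dir.d> is left alone because only the
last path component is stripped.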
 967
 968 Here are the examples from B<rush>'s website with the equivalent
 969 command in GNU B<parallel>.
 970
 971 =head3 EXAMPLES
 972
 973 B<1. Simple run, quoting is not necessary>
 974
 975   $ seq 1 3 | rush echo {}
 976
 977   $ seq 1 3 | parallel echo {}
 978
 979 B<2. Read data from file (`-i`)>
 980
 981   $ rush echo {} -i data1.txt -i data2.txt
 982
 983   $ cat data1.txt data2.txt | parallel echo {}
 984
 985 B<3. Keep output order (`-k`)>
 986
 987   $ seq 1 3 | rush 'echo {}' -k
 988
 989   $ seq 1 3 | parallel -k echo {}
 990
 991
 992 B<4. Timeout (`-t`)>
 993
 994   $ time seq 1 | rush 'sleep 2; echo {}' -t 1
 995
 996   $ time seq 1 | parallel --timeout 1 'sleep 2; echo {}'
 997
 998 B<5. Retry (`-r`)>
 999
1000   $ seq 1 | rush 'python unexisted_script.py' -r 1
1001
1002   $ seq 1 | parallel --retries 2 'python unexisted_script.py'
1003
1004 Use B<-u> to see it is really run twice:
1005
1006   $ seq 1 | parallel -u --retries 2 'python unexisted_script.py'
1007
1008 B<6. Dirname (`{/}`) and basename (`{%}`) and remove custom
1009 suffix (`{^suffix}`)>
1010
1011   $ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
1012
1013   $ echo dir/file_1.txt.gz |
1014       parallel --plus echo {//} {/} {%_1.txt.gz}
1015
1016 B<7. Get basename, and remove last (`{.}`) or any (`{:}`) extension>
1017
1018   $ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
1019
1020   $ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'
1021
1022 B<8. Job ID, combine fields index and other replacement strings>
1023
1024   $ echo 12 file.txt dir/s_1.fq.gz |
1025       rush 'echo job {#}: {2} {2.} {3%:^_1}'
1026
1027   $ echo 12 file.txt dir/s_1.fq.gz |
1028       parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}'
1029
1030 B<9. Capture submatch using regular expression (`{@regexp}`)>
1031
1032   $ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
1033
1034   $ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}'
1035
1036 B<10. Custom field delimiter (`-d`)>
1037
1038   $ echo a=b=c | rush 'echo {1} {2} {3}' -d =
1039
1040   $ echo a=b=c | parallel -d = echo {1} {2} {3}
1041
1042 B<11. Send multi-lines to every command (`-n`)>
1043
1044   $ seq 5 | rush -n 2 -k 'echo "{}"; echo'
1045
1046   $ seq 5 |
1047       parallel -n 2 -k \
1048         'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo'
1049
1050   $ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '
1051
1052   $ seq 5 | parallel -n 2 -k 'echo {}; echo'
1053
1054
1055 B<12. Custom record delimiter (`-D`), note that empty records are not used.>
1056
1057   $ echo a b c d | rush -D " " -k 'echo {}'
1058
1059   $ echo a b c d | parallel -d " " -k 'echo {}'
1060
1061   $ echo abcd | rush -D "" -k 'echo {}'
1062
1063   Cannot be done by GNU Parallel
1064
1065   $ cat fasta.fa
1066   >seq1
1067   tag
1068   >seq2
1069   cat
1070   gat
1071   >seq3
1072   attac
1073   a
1074   cat
1075
1076   $ cat fasta.fa | rush -D ">" \
1077       'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
1078   # rush fails to join the multiline sequences
1079
1080   $ cat fasta.fa | (read -n1 ignore_first_char;
1081       parallel -d '>' --colsep '\n' echo FASTA record {#}: \
1082         name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}'
1083     )
1084
1085 B<13. Assign value to variable, like `awk -v` (`-v`)>
1086
1087   $ seq 1 |
1088       rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
1089
1090   $ seq 1 |
1091       parallel -N0 \
1092         'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'
1093
1094   $ for var in a b; do \
1095       seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
1096     done
1097
1098 In GNU B<parallel> you would typically do:
1099
1100   $ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -
1101
1102 If you I<really> want the var:
1103
1104   $ seq 1 3 |
1105       parallel -k var={1} ';echo var: $var, data: {}' ::: a b :::: -
1106
1107 If you I<really> want the B<for>-loop:
1108
1109   $ for var in a b; do
1110   >   export var;
1111   >   seq 1 3 | parallel -k 'echo var: $var, data: {}';
1112   > done
1113
1114 Unlike B<rush>, this also works if the value is complex, like:
1115
1116   My brother's 12" records
1117
1118
1119 B<14. B<Preset variable> (`-v`), avoid repeatedly writing verbose replacement strings>
1120
1121   # naive way
1122   $ echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
1123
1124   $ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'
1125
1126   # macro + removing suffix
1127   $ echo read_1.fq.gz |
1128       rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
1129
1130   $ echo read_1.fq.gz |
1131       parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'
1132
1133   # macro + regular expression
1134   $ echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'
1135
1136   $ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1137
1138 Unlike B<rush>, GNU B<parallel> works with complex values:
1139
1140   echo "My brother's 12\"read_1.fq.gz" |
1141     parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'
1142
1143 B<15. Interrupt jobs by `Ctrl-C`, rush will stop unfinished commands and exit.>
1144
1145   $ seq 1 20 | rush 'sleep 1; echo {}'
1146   ^C
1147
1148   $ seq 1 20 | parallel 'sleep 1; echo {}'
1149   ^C
1150
1151 B<16. Continue/resume jobs (`-c`). When some jobs failed (by
1152 execution failure, timeout, or canceling by user with `Ctrl + C`),
1153 please switch flag `-c/--continue` on and run again, so that `rush`
1154 can save successful commands and ignore them in I<NEXT> run.>
1155
1156   $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1157   $ cat successful_cmds.rush
1158   $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1159
1160   $ seq 1 3 | parallel --joblog mylog --timeout 2 \
1161       'sleep {}; echo {}'
1162   $ cat mylog
1163   $ seq 1 3 | parallel --joblog mylog --retry-failed \
1164       'sleep {}; echo {}'
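
B<--retry-failed> works from the joblog, which is a tab-separated file
with the exit value in column 7 and the signal in column 8. A small
sketch of listing the failed commands by hand; the joblog contents
below are made up for illustration (real timestamps, runtimes and the
exact values recorded for a timed-out job will differ), and
B<failed_jobs> is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print the commands of jobs whose Exitval (column 7) or Signal
# (column 8) is non-zero. The first line of a joblog is a header
# and is skipped.
failed_jobs() {
  awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $9 }' "$1"
}

# Made-up joblog with the column layout GNU parallel uses.
joblog=$(mktemp) || exit 1
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  Seq Host Starttime JobRuntime Send Receive Exitval Signal Command \
  1 : 1700000000.000 1.003 0 2 0 0 'sleep 1; echo 1' \
  3 : 1700000000.002 2.004 0 0 -1 15 'sleep 3; echo 3' > "$joblog"

failed_jobs "$joblog"
rm -f "$joblog"
```

On a real run the helper would be called as:

  failed_jobs mylog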
1165
1166 Multi-line jobs:
1167
1168   $ seq 1 3 | rush 'sleep {}; echo {}; \
1169     echo finish {}' -t 3 -c -C finished.rush
1170   $ cat finished.rush
1171   $ seq 1 3 | rush 'sleep {}; echo {}; \
1172     echo finish {}' -t 3 -c -C finished.rush
1173
1174   $ seq 1 3 |
1175       parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
1176     echo finish {}'
1177   $ cat mylog
1178   $ seq 1 3 |
1179       parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
1180         echo finish {}'
1181
1182 B<17. A comprehensive example: downloading 1K+ pages given by
1183 three URL list files using `phantomjs save_page.js` (some page
1184 contents are dynamically generated by Javascript, so `wget` does not
1185 work). Here I set max jobs number (`-j`) as `20`, each job has a max
1186 running time (`-t`) of `60` seconds and `3` retry chances
1187 (`-r`). Continue flag `-c` is also switched on, so we can continue
1188 unfinished jobs. Luckily, it's accomplished in one run :)>
1189
1190   $ for f in $(seq 2014 2016); do \
1191        /bin/rm -rf $f; mkdir -p $f; \
1192        cat $f.html.txt | rush -v d=$f -d = \
1193          'phantomjs save_page.js "{}" > {d}/{3}.html' \
1194          -j 20 -t 60 -r 3 -c; \
1195     done
1196
1197 GNU B<parallel> can append to an existing joblog with '+':
1198
1199   $ rm mylog
1200   $ for f in $(seq 2014 2016); do
1201       /bin/rm -rf $f; mkdir -p $f;
1202       cat $f.html.txt |
1203         parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
1204           --colsep = \
1205           phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
1206     done
1207
1208 B<18. A bioinformatics example: mapping with `bwa`, and
1209 processing result with `samtools`:>
1210
1211   $ ref=ref/xxx.fa
1212   $ threads=25
1213   $ ls -d raw.cluster.clean.mapping/* \
1214     | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
1215         'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz >{p}.sam;\
1216         samtools view -bS {p}.sam > {p}.bam; \
1217         samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
1218         samtools index {p}.sorted.bam; \
1219         samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
1220         /bin/rm {p}.bam {p}.sam;' \
1221         -j 2 --verbose -c -C mapping.rush
1222
1223 GNU B<parallel> would use a function:
1224
1225   $ ref=ref/xxx.fa
1226   $ export ref
1227   $ thr=25
1228   $ export thr
1229   $ bwa_sam() {
1230       p="$1"
1231       bam="$p".bam
1232       sam="$p".sam
1233       sortbam="$p".sorted.bam
1234       bwa mem -t "$thr" -M -a "$ref" "$p"_1.fq.gz "$p"_2.fq.gz > "$sam"
1235       samtools view -bS "$sam" > "$bam"
1236       samtools sort -T "$p".tmp -@ "$thr" "$bam" -o "$sortbam"
1237       samtools index "$sortbam"
1238       samtools flagstat "$sortbam" > "$sortbam".flagstat
1239       /bin/rm "$bam" "$sam"
1240     }
1241   $ export -f bwa_sam
1242   $ ls -d raw.cluster.clean.mapping/* |
1243       parallel -j 2 --verbose --joblog mylog bwa_sam
1244
1245 =head3 Other B<rush> features
1246
1247 B<rush> has:
1248
1249 =over 4
1250
1251 =item * B<awk -v> like custom defined variables (B<-v>)
1252
1253 With GNU B<parallel> you would simply set a shell variable:
1254
1255    parallel 'v={}; echo "$v"' ::: foo
1256    echo foo | rush -v v={} 'echo {v}'
1257
1258 Also B<rush> does not like special chars. So these B<do not work>:
1259
1260    echo does not work | rush -v v=\" 'echo {v}'
1261    echo "My  brother's  12\"  records" | rush -v v={} 'echo {v}'
1262
1263 Whereas the corresponding GNU B<parallel> version works:
1264
1265    parallel 'v=\"; echo "$v"' ::: works
1266    parallel 'v={}; echo "$v"' ::: "My  brother's  12\"  records"
1267
1268 =item * Exit on first error(s) (B<-e>)
1269
1270 This is called B<--halt now,fail=1> (or shorter: B<--halt 2>) when
1271 used with GNU B<parallel>.
1272
1273 =item * Settable number of records sent to every command (B<-n>, default 1)
1274
1275 This is also called B<-n> in GNU B<parallel>.
1276
1277 =item * Practical replacement strings
1278
1279 =over 4
1280
1281 =item {:} remove any extension
1282
1283 With GNU B<parallel> this can be emulated by:
1284
1285   parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz
1286
1287 =item {^suffix}, remove suffix
1288
1289 With GNU B<parallel> this can be emulated by:
1290
1291   parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz
1292
1293 =item {@regexp}, capture submatch using regular expression
1294
1295 With GNU B<parallel> this can be emulated by:
1296
1297   parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
1298     echo '{@\d_(.*).gz}' ::: 1_foo.gz
1299
1300 =item {%.}, {%:}, basename without extension
1301
1302 With GNU B<parallel> this can be emulated by:
1303
1304   parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz
1305
1306 And if you need it often, you define a B<--rpl> in
1307 B<$HOME/.parallel/config>:
1308
1309   --rpl '{%.} s:.*/::;s/\.[^.]*$//'
1310   --rpl '{%:} s:.*/::;s/\..*//'
1311
1312 Then you can use them as:
1313
1314   parallel echo {%.} {%:} ::: dir/foo.bar.gz
1315
1316 =back
1317
1318 =item * Preset variable (macro)
1319
1320 E.g.
1321
1322   echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'
1323
1324 With GNU B<parallel> this can be emulated by:
1325
1326   echo foosuffix |
1327     parallel --plus 'p={%suffix}; echo ${p}_new_suffix'
1328
1329 Unlike B<rush>, GNU B<parallel> works fine if the input contains
1330 double spaces, ' and ":
1331
1332   echo "1'6\"  foosuffix" |
1333     parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'
1334
1335
1336 =item * Multi-line commands
1337
1338 While you I<can> use multi-line commands in GNU B<parallel>, GNU
1339 B<parallel> discourages them for the sake of readability. In most
1340 cases they can be written as a function:
1341
1342   seq 1 3 |
1343     parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
1344       echo finish {}'
1345
1346 Could be written as:
1347
1348   doit() {
1349     sleep "$1"
1350     echo "$1"
1351     echo finish "$1"
1352   }
1353   export -f doit
1354   seq 1 3 | parallel --timeout 2 --joblog my.log doit
1355
1356 The failed commands can be resumed with:
1357
1358   seq 1 3 |
1359     parallel --resume-failed --joblog my.log 'sleep {}; echo {};\
1360       echo finish {}'
1361
1362 =back
1363
1364 https://github.com/shenwei356/rush
1365
1366
1367 =head2 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel
1368
1369 ClusterSSH solves a different problem than GNU B<parallel>.
1370
1371 ClusterSSH opens a terminal window for each computer and using a
1372 master window you can run the same command on all the computers. This
1373 is typically used for administrating several computers that are almost
1374 identical.
1375
1376 GNU B<parallel> runs the same (or different) commands with different
1377 arguments in parallel possibly using remote computers to help
1378 computing. If more than one computer is listed in B<-S> GNU B<parallel> may
1379 only use one of these (e.g. if there are 8 jobs to be run and one
1380 computer has 8 cores).
1381
1382 GNU B<parallel> can be used as a poor-man's version of ClusterSSH:
1383
1384 B<parallel --nonall -S server-a,server-b do_stuff foo bar>
1385
1386 https://github.com/duncs/clusterssh
1387
1388
1389 =head2 DIFFERENCES BETWEEN coshell AND GNU Parallel
1390
1391 B<coshell> only accepts full commands on standard input. Any quoting
1392 needs to be done by the user.
1393
1394 Commands are run in B<sh> so any B<bash>/B<tcsh>/B<zsh> specific
1395 syntax will not work.
1396
1397 Output can be buffered by using B<-d>. Output is buffered in memory,
1398 so big output can cause swapping and therefore be terribly slow or
1399 even cause the system to run out of memory.
1400
1401 https://github.com/gdm85/coshell (Last checked: 2019-01)
1402
1403
1404 =head2 DIFFERENCES BETWEEN spread AND GNU Parallel
1405
1406 B<spread> runs commands on all directories.
1407
1408 It can be emulated with GNU B<parallel> using this Bash function:
1409
1410   spread() {
1411     _cmds() {
1412       perl -e '$"=" && ";print "@ARGV"' "cd {}" "$@"
1413     }
1414     parallel $(_cmds "$@")'|| echo exit status $?' ::: */
1415   }
1416
1417 This works except for the B<--exclude> option.
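
The B<_cmds> helper only joins B<cd {}> and the given commands with
" && ". A pure-bash sketch of the same joining step (B<join_and> is a
made-up name, not part of B<spread> or GNU B<parallel>):

```shell
#!/usr/bin/env bash
# Join all arguments with " && ", like the perl one-liner in _cmds.
join_and() {
  local out=$1
  shift
  local cmd
  for cmd in "$@"; do
    out+=" && $cmd"
  done
  printf '%s\n' "$out"
}

join_and "cd {}" "git status" "make"
# prints: cd {} && git status && make
```

GNU B<parallel> then replaces {} with each directory name, so every
job changes into its directory before running the commands.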
1418
1419 (Last checked: 2017-11)
1420
1421
1422 =head2 DIFFERENCES BETWEEN pyargs AND GNU Parallel
1423
1424 B<pyargs> deals badly with input containing spaces. It buffers stdout,
1425 but not stderr. It buffers in RAM. {} does not work as replacement
1426 string. It does not support running functions.
1427
1428 B<pyargs> does not support composed commands if run with B<--lines>,
1429 and fails on B<pyargs traceroute gnu.org fsf.org>.
1430
1431 =head3 Examples
1432
1433   seq 5 | pyargs -P50 -L seq
1434   seq 5 | parallel -P50 --lb seq
1435
1436   seq 5 | pyargs -P50 --mark -L seq
1437   seq 5 | parallel -P50 --lb \
1438     --tagstring OUTPUT'[{= $_=$job->replaced()=}]' seq
1439   # Similar, but not precisely the same
1440   seq 5 | parallel -P50 --lb --tag seq
1441
1442   seq 5 | pyargs -P50  --mark command
1443   # Somewhat longer with GNU Parallel due to the special
1444   #   --mark formatting
1445   cmd="$(echo "command" | parallel --shellquote)"
1446   wrap_cmd() {
1447      echo "MARK $cmd $@================================" >&3
1448      echo "OUTPUT START[$cmd $@]:"
1449      eval $cmd "$@"
1450      echo "OUTPUT END[$cmd $@]"
1451   }
1452   (seq 5 | env_parallel -P2 wrap_cmd) 3>&1
1453   # Similar, but not exactly the same
1454   seq 5 | parallel -t --tag command
1455
1456   (echo '1  2  3';echo 4 5 6) | pyargs  --stream seq
1457   (echo '1  2  3';echo 4 5 6) | perl -pe 's/\n/ /' |
1458     parallel -r -d' ' seq
1459   # Similar, but not exactly the same
1460   parallel seq ::: 1 2 3 4 5 6
1461
1462 https://github.com/robertblackwell/pyargs (Last checked: 2019-01)
1463
1464
1465 =head2 DIFFERENCES BETWEEN concurrently AND GNU Parallel
1466
1467 B<concurrently> runs jobs in parallel.
1468
1469 The output is prepended with the job number, and may be incomplete:
1470
1471   $ concurrently 'seq 100000' | (sleep 3;wc -l)
1472   7165
1473
1474 When pretty printing it caches output in memory. Test MIX below
1475 shows that output mixes whether or not output is cached.
1476
1477 There seems to be no way of making a template command and have
1478 B<concurrently> fill that with different args. The full commands must
1479 be given on the command line.
1480
1481 There is also no way of controlling how many jobs should be run in
1482 parallel at a time - i.e. "number of jobslots". Instead all jobs are
1483 simply started in parallel.
1484
1485 https://github.com/kimmobrunfeldt/concurrently (Last checked: 2019-01)
1486
1487
1488 =head2 DIFFERENCES BETWEEN map(soveran) AND GNU Parallel
1489
1490 B<map> does not run jobs in parallel by default. The README suggests using:
1491
1492   ... | map t 'sleep $t && say done &'
1493
1494 But this fails if more jobs are run in parallel than the number of
1495 available processes. Since there is no support for parallelization in
1496 B<map> itself, the output also mixes:
1497
1498   seq 10 | map i 'echo start-$i && sleep 0.$i && echo end-$i &'
1499
1500 The major difference is that GNU B<parallel> is built for parallelization
1501 and B<map> is not. So GNU B<parallel> has lots of ways of dealing with the
1502 issues that parallelization raises:
1503
1504 =over 4
1505
1506 =item *
1507
1508 Keep the number of processes manageable
1509
1510 =item *
1511
1512 Make sure output does not mix
1513
1514 =item *
1515
1516 Make Ctrl-C kill all running processes
1517
1518 =back
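
To give an idea of what the first point takes when done by hand, here
is a bash sketch that caps the number of background jobs. It assumes
bash 4.3 or later for B<wait -n>; B<run_limited> and B<job> are
made-up names, and the sketch does nothing about mixed output or
Ctrl-C:

```shell
#!/usr/bin/env bash
# Run "job ARG" for every argument, with at most MAX jobs at once.
run_limited() {
  local max=$1
  shift
  local arg
  for arg in "$@"; do
    # Block until a job slot is free.
    while (( $(jobs -rp | wc -l) >= max )); do
      wait -n
    done
    job "$arg" &
  done
  wait    # wait for the remaining jobs
}

# Example job: sleep a little, then report.
job() {
  sleep "0.$1"
  echo "done $1"
}

run_limited 2 3 1 2 1
```

With GNU B<parallel> the same limit is a single option:

  parallel -j2 'sleep 0.{}; echo done {}' ::: 3 1 2 1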
1519
1520 Here are the 5 examples converted to GNU Parallel:
1521
1522   1$ ls *.c | map f 'foo $f'
1523   1$ ls *.c | parallel foo
1524
1525   2$ ls *.c | map f 'foo $f; bar $f'
1526   2$ ls *.c | parallel 'foo {}; bar {}'
1527
1528   3$ cat urls | map u 'curl -O $u'
1529   3$ cat urls | parallel curl -O
1530
1531   4$ printf "1\n1\n1\n" | map t 'sleep $t && say done'
1532   4$ printf "1\n1\n1\n" | parallel 'sleep {} && say done'
1533   4$ parallel 'sleep {} && say done' ::: 1 1 1
1534
1535   5$ printf "1\n1\n1\n" | map t 'sleep $t && say done &'
1536   5$ printf "1\n1\n1\n" | parallel -j0 'sleep {} && say done'
1537   5$ parallel -j0 'sleep {} && say done' ::: 1 1 1
1538
1539 https://github.com/soveran/map (Last checked: 2019-01)
1540
1541
1542 =head2 DIFFERENCES BETWEEN loop AND GNU Parallel
1543
1544 B<loop> mixes stdout and stderr:
1545
1546     loop 'ls /no-such-file' >/dev/null
1547
1548 B<loop>'s replacement string B<$ITEM> does not quote strings:
1549
1550     echo 'two  spaces' | loop 'echo $ITEM'
1551
1552 B<loop> cannot run functions:
1553
1554     myfunc() { echo joe; }
1555     export -f myfunc
1556     loop 'myfunc this fails'
1557
1558 Some of the examples from https://github.com/Miserlou/Loop/ can be
1559 emulated with GNU B<parallel>:
1560
1561     # A couple of functions will make the code easier to read
1562     $ loopy() {
1563         yes | parallel -uN0 -j1 "$@"
1564       }
1565     $ export -f loopy
1566     $ time_out() {
1567         parallel -uN0 -q --timeout "$@" ::: 1
1568       }
1569     $ match() {
1570         perl -0777 -ne 'grep /'"$1"'/,$_ and print or exit 1'
1571       }
1572     $ export -f match
1573
1574     $ loop 'ls' --every 10s
1575     $ loopy --delay 10s ls
1576
1577     $ loop 'touch $COUNT.txt' --count-by 5
1578     $ loopy touch '{= $_=seq()*5 =}'.txt
1579
1580     $ loop --until-contains 200 -- \
1581         ./get_response_code.sh --site mysite.biz
1582     $ loopy --halt now,success=1 \
1583         './get_response_code.sh --site mysite.biz | match 200'
1584
1585     $ loop './poke_server' --for-duration 8h
1586     $ time_out 8h loopy ./poke_server
1587
1588     $ loop './poke_server' --until-success
1589     $ loopy --halt now,success=1 ./poke_server
1590
1591     $ cat files_to_create.txt | loop 'touch $ITEM'
1592     $ cat files_to_create.txt | parallel touch {}
1593
1594     $ loop 'ls' --for-duration 10min --summary
1595     # --joblog is somewhat more verbose than --summary
1596     $ time_out 10m loopy --joblog my.log ./poke_server; cat my.log
1597
1598     $ loop 'echo hello'
1599     $ loopy echo hello
1600
1601     $ loop 'echo $COUNT'
1602     # GNU Parallel counts from 1
1603     $ loopy echo {#}
1604     # Counting from 0 can be forced
1605     $ loopy echo '{= $_=seq()-1 =}'
1606
1607     $ loop 'echo $COUNT' --count-by 2
1608     $ loopy echo '{= $_=2*(seq()-1) =}'
1609
1610     $ loop 'echo $COUNT' --count-by 2 --offset 10
1611     $ loopy echo '{= $_=10+2*(seq()-1) =}'
1612
1613     $ loop 'echo $COUNT' --count-by 1.1
1614     # GNU Parallel rounds 3.3000000000000003 to 3.3
1615     $ loopy echo '{= $_=1.1*(seq()-1) =}'
1616
1617     $ loop 'echo $COUNT $ACTUALCOUNT' --count-by 2
1618     $ loopy echo '{= $_=2*(seq()-1) =} {#}'
1619
1620     $ loop 'echo $COUNT' --num 3 --summary
1621     # --joblog is somewhat more verbose than --summary
1622     $ seq 3 | parallel --joblog my.log echo; cat my.log
1623
1624     $ loop 'ls -foobarbatz' --num 3 --summary
1625     # --joblog is somewhat more verbose than --summary
1626     $ seq 3 | parallel --joblog my.log -N0 ls -foobarbatz; cat my.log
1627
1628     $ loop 'echo $COUNT' --count-by 2 --num 50 --only-last
1629     # Can be emulated by running 2 jobs
1630     $ seq 49 | parallel echo '{= $_=2*(seq()-1) =}' >/dev/null
1631     $ echo 50| parallel echo '{= $_=2*(seq()-1) =}'
1632
1633     $ loop 'date' --every 5s
1634     $ loopy --delay 5s date
1635
1636     $ loop 'date' --for-duration 8s --every 2s
1637     $ time_out 8s loopy --delay 2s date
1638
1639     $ loop 'date -u' --until-time '2018-05-25 20:50:00' --every 5s
1640     $ seconds=$((`date -d 2019-05-25T20:50:00 +%s` - `date  +%s`))s
1641     $ time_out $seconds loopy --delay 5s date -u
1642
1643     $ loop 'echo $RANDOM' --until-contains "666"
1644     $ loopy --halt now,success=1 'echo $RANDOM | match 666'
1645
1646     $ loop 'if (( RANDOM % 2 )); then
1647               (echo "TRUE"; true);
1648             else
1649               (echo "FALSE"; false);
1650             fi' --until-success
1651     $ loopy --halt now,success=1 'if (( $RANDOM % 2 )); then
1652                                     (echo "TRUE"; true);
1653                                   else
1654                                     (echo "FALSE"; false);
1655                                   fi'
1656
1657     $ loop 'if (( RANDOM % 2 )); then
1658         (echo "TRUE"; true);
1659       else
1660         (echo "FALSE"; false);
1661       fi' --until-error
1662     $ loopy --halt now,fail=1 'if (( $RANDOM % 2 )); then
1663                                  (echo "TRUE"; true);
1664                                else
1665                                  (echo "FALSE"; false);
1666                                fi'
1667
1668     $ loop 'date' --until-match "(\d{4})"
1669     $ loopy --halt now,success=1 'date | match [0-9][0-9][0-9][0-9]'
1670
1671     $ loop 'echo $ITEM' --for red,green,blue
1672     $ parallel echo ::: red green blue
1673
1674     $ cat /tmp/my-list-of-files-to-create.txt | loop 'touch $ITEM'
1675     $ cat /tmp/my-list-of-files-to-create.txt | parallel touch
1676
1677     $ ls | loop 'cp $ITEM $ITEM.bak'; ls
1678     $ ls | parallel cp {} {}.bak; ls
1679
1680     $ loop 'echo $ITEM | tr a-z A-Z' -i
1681     $ parallel 'echo {} | tr a-z A-Z'
1682     # Or more efficiently:
1683     $ parallel --pipe tr a-z A-Z
1684
1685     $ loop 'echo $ITEM' --for "`ls`"
1686     $ parallel echo {} ::: "`ls`"
1687
1688     $ ls | loop './my_program $ITEM' --until-success;
1689     $ ls | parallel --halt now,success=1 ./my_program {}
1690
1691     $ ls | loop './my_program $ITEM' --until-fail;
1692     $ ls | parallel --halt now,fail=1 ./my_program {}
1693
1694     $ ./deploy.sh;
1695       loop 'curl -sw "%{http_code}" http://coolwebsite.biz' \
1696         --every 5s --until-contains 200;
1697       ./announce_to_slack.sh
1698     $ ./deploy.sh;
1699       loopy --delay 5s --halt now,success=1 \
1700       'curl -sw "%{http_code}" http://coolwebsite.biz | match 200';
1701       ./announce_to_slack.sh
1702
1703     $ loop "ping -c 1 mysite.com" --until-success; ./do_next_thing
1704     $ loopy --halt now,success=1 ping -c 1 mysite.com; ./do_next_thing
1705
1706     $ ./create_big_file -o my_big_file.bin;
1707       loop 'ls' --until-contains 'my_big_file.bin';
1708       ./upload_big_file my_big_file.bin
1709     # inotifywait is a better tool to detect file system changes.
1710     # It can even make sure the file is complete
1711     # so you are not uploading an incomplete file
1712     $ inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f . |
1713         grep my_big_file.bin
1714
1715     $ ls | loop 'cp $ITEM $ITEM.bak'
1716     $ ls | parallel cp {} {}.bak
1717
1718     $ loop './do_thing.sh' --every 15s --until-success --num 5
1719     $ parallel --retries 5 --delay 15s ::: ./do_thing.sh
1720
1721 https://github.com/Miserlou/Loop/ (Last checked: 2018-10)
1722
1723
1724 =head2 DIFFERENCES BETWEEN lorikeet AND GNU Parallel
1725
1726 B<lorikeet> can run jobs in parallel. It does this based on a
1727 dependency graph described in a file, so this is similar to B<make>.
1728
1729 https://github.com/cetra3/lorikeet (Last checked: 2018-10)
1730
1731
1732 =head2 DIFFERENCES BETWEEN spp AND GNU Parallel
1733
1734 B<spp> can run jobs in parallel. B<spp> does not use a command
1735 template to generate the jobs, but requires jobs to be in a
1736 file. Output from the jobs mix.
1737
1738 https://github.com/john01dav/spp (Last checked: 2019-01)
1739
1740
1741 =head2 DIFFERENCES BETWEEN paral AND GNU Parallel
1742
1743 B<paral> prints a lot of status information and stores the output from
1744 the commands run into files. This means it cannot be used in the
1745 middle of a pipe like this:
1746
1747   paral "echo this" "echo does not" "echo work" | wc
1748
1749 Instead it puts the output into files named like
1750 B<out_#_I<command>.out.log>. To get a very similar behaviour with GNU
1751 B<parallel> use B<--results
1752 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta>
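
The perl expression in the B<--results> template removes every
character that is not whitespace, a lowercase letter, a digit or an
underscore, and then turns each run of whitespace into a single
underscore. A sketch of the same transformation with B<sed>
(B<log_name> is a made-up helper; the job number is passed in by hand
here):

```shell
#!/usr/bin/env bash
# Build a file name like the --results template above:
#   out_<job number>_<sanitized command>.log
log_name() {
  local seq=$1 cmd=$2 clean
  clean=$(printf '%s' "$cmd" |
    sed -E 's/[^[:space:]a-z_0-9]//g; s/[[:space:]]+/_/g')
  printf 'out_%s_%s.log\n' "$seq" "$clean"
}

log_name 1 'sleep 1 && echo c1'
# prints: out_1_sleep_1_echo_c1.log
```

Uppercase letters are removed too, because the character class only
lists a-z.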
1753
1754 B<paral> only takes arguments on the command line and each argument
1755 should be a full command. Thus it does not use command templates.
1756
1757 This limits how many jobs it can run in total, because they all need
1758 to fit on a single command line.
1759
1760 B<paral> has no support for running jobs remotely.
1761
1762 The examples from B<README.markdown> and the corresponding command run
1763 with GNU B<parallel> (B<--results
1764 'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta> is omitted from
1765 the GNU B<parallel> command):
1766
1767   paral "command 1" "command 2 --flag" "command arg1 arg2"
1768   parallel ::: "command 1" "command 2 --flag" "command arg1 arg2"
1769
1770   paral "sleep 1 && echo c1" "sleep 2 && echo c2" \
1771     "sleep 3 && echo c3" "sleep 4 && echo c4"  "sleep 5 && echo c5"
1772   parallel ::: "sleep 1 && echo c1" "sleep 2 && echo c2" \
1773     "sleep 3 && echo c3" "sleep 4 && echo c4"  "sleep 5 && echo c5"
1774   # Or shorter:
1775   parallel "sleep {} && echo c{}" ::: {1..5}
1776
1777   paral -n=0 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1778     "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1779   parallel ::: "sleep 5 && echo c5" "sleep 4 && echo c4" \
1780     "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1781   # Or shorter:
1782   parallel -j0 "sleep {} && echo c{}" ::: 5 4 3 2 1
1783
1784   paral -n=1 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1785     "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1786   parallel -j1 "sleep {} && echo c{}" ::: 5 4 3 2 1
1787
1788   paral -n=2 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1789     "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1790   parallel -j2 "sleep {} && echo c{}" ::: 5 4 3 2 1
1791
1792   paral -n=5 "sleep 5 && echo c5" "sleep 4 && echo c4" \
1793     "sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
1794   parallel -j5 "sleep {} && echo c{}" ::: 5 4 3 2 1
1795
1796   paral -n=1 "echo a && sleep 0.5 && echo b && sleep 0.5 && \
1797     echo c && sleep 0.5 && echo d && sleep 0.5 && \
1798     echo e && sleep 0.5 && echo f && sleep 0.5 && \
1799     echo g && sleep 0.5 && echo h"
1800   parallel ::: "echo a && sleep 0.5 && echo b && sleep 0.5 && \
1801     echo c && sleep 0.5 && echo d && sleep 0.5 && \
1802     echo e && sleep 0.5 && echo f && sleep 0.5 && \
1803     echo g && sleep 0.5 && echo h"
1804
1805 https://github.com/amattn/paral (Last checked: 2019-01)
1806
1807
1808 =head2 DIFFERENCES BETWEEN concurr AND GNU Parallel
1809
1810 B<concurr> is built to run jobs in parallel using a client/server
1811 model.
1812
1813 The examples from B<README.md>:
1814
1815   concurr 'echo job {#} on slot {%}: {}' : arg1 arg2 arg3 arg4
1816   parallel 'echo job {#} on slot {%}: {}' ::: arg1 arg2 arg3 arg4
1817
1818   concurr 'echo job {#} on slot {%}: {}' :: file1 file2 file3
1819   parallel 'echo job {#} on slot {%}: {}' :::: file1 file2 file3
1820
1821   concurr 'echo {}' < input_file
1822   parallel 'echo {}' < input_file
1823
1824   cat file | concurr 'echo {}'
1825   cat file | parallel 'echo {}'
1826
B<concurr> deals badly with empty input files and with output larger
than 64 KB.
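
Both problems can be probed with a short script. This is a minimal
sketch and not taken from the B<concurr> documentation: the function
names B<empty_input_ok> and B<big_output_ok> are made up for this
example, and B<timeout> from GNU coreutils is assumed to be installed.

```shell

  # Hypothetical helper: succeed if the tool exits cleanly on empty input.
  # The tool must be a program (timeout cannot run shell functions).
  # Usage: empty_input_ok TOOL [ARGS...]
  empty_input_ok() {
    # timeout stops a tool that hangs waiting for input
    timeout 10 "$@" < /dev/null > /dev/null 2>&1
  }

  # Hypothetical helper: succeed if 100000 bytes (more than 64 KB) from
  # a single job arrive intact. The tool must insert the input line
  # where the command says {}.
  # Usage: big_output_ok TOOL [ARGS...] COMMAND
  big_output_ok() {
    local bytes
    bytes=$(echo 100000 | "$@" | wc -c)
    [ "$bytes" -eq 100000 ]
  }

```

With GNU B<parallel> the checks would be run as:

  empty_input_ok parallel echo {}
  big_output_ok parallel 'head -c {} /dev/zero'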
1829
1830 https://github.com/mmstick/concurr (Last checked: 2019-01)
1831
1832
1833 =head2 DIFFERENCES BETWEEN lesser-parallel AND GNU Parallel
1834
1835 B<lesser-parallel> is the inspiration for B<parallel --embed>. Both
1836 B<lesser-parallel> and B<parallel --embed> define bash functions that
1837 can be included as part of a bash script to run jobs in parallel.
1838
1839 B<lesser-parallel> implements a few of the replacement strings, but
1840 hardly any options, whereas B<parallel --embed> gives you the full
1841 GNU B<parallel> experience.
1842
1843 https://github.com/kou1okada/lesser-parallel (Last checked: 2019-01)
1844
1845
1846 =head2 DIFFERENCES BETWEEN npm-parallel AND GNU Parallel
1847
1848 B<npm-parallel> can run npm tasks in parallel.
1849
1850 There are no examples and very little documentation, so it is hard to
1851 compare to GNU B<parallel>.
1852
1853 https://github.com/spion/npm-parallel (Last checked: 2019-01)
1854
1855
1856 =head2 DIFFERENCES BETWEEN machma AND GNU Parallel
1857
B<machma> runs tasks in parallel. It gives timestamped
output. It buffers in RAM. The examples from README.md:
1860
1861   # Put shorthand for timestamp in config for the examples
1862   echo '--rpl '\
1863     \''{time} $_=::strftime("%Y-%m-%d %H:%M:%S",localtime())'\' \
1864     > ~/.parallel/machma
1865   echo '--line-buffer --tagstring "{#} {time} {}"' >> ~/.parallel/machma
1866
1867   find . -iname '*.jpg' |
1868     machma --  mogrify -resize 1200x1200 -filter Lanczos {}
1869   find . -iname '*.jpg' |
1870     parallel --bar -Jmachma mogrify -resize 1200x1200 -filter Lanczos {}
1871
1872   cat /tmp/ips | machma -p 2 -- ping -c 2 -q {}
1873   cat /tmp/ips | parallel -j2 -Jmachma ping -c 2 -q {}
1874
1875   cat /tmp/ips |
1876     machma -- sh -c 'ping -c 2 -q $0 > /dev/null && echo alive' {}
1877   cat /tmp/ips |
1878     parallel -Jmachma 'ping -c 2 -q {} > /dev/null && echo alive'
1879
1880   find . -iname '*.jpg' |
1881     machma --timeout 5s -- mogrify -resize 1200x1200 -filter Lanczos {}
1882   find . -iname '*.jpg' |
1883     parallel --timeout 5s --bar mogrify -resize 1200x1200 \
1884       -filter Lanczos {}
1885
1886   find . -iname '*.jpg' -print0 |
1887     machma --null --  mogrify -resize 1200x1200 -filter Lanczos {}
1888   find . -iname '*.jpg' -print0 |
1889     parallel --null --bar mogrify -resize 1200x1200 -filter Lanczos {}
1890
1891 https://github.com/fd0/machma (Last checked: 2019-06)
1892
1893
1894 =head2 DIFFERENCES BETWEEN interlace AND GNU Parallel
1895
1896 Summary table (see legend above):
1897 - I2 I3 I4 - - -
1898 M1 - M3 - - M6
1899 - O2 O3 - - - - x x
1900 E1 E2 - - - - -
1901 - - - - - - - - -
1902 - -
1903
1904 B<interlace> is built for network analysis to run network tools in parallel.
1905
B<interlace> does not buffer output, so output from different jobs mixes.
1907
1908 The overhead for each target is O(n*n), so with 1000 targets it
1909 becomes very slow with an overhead in the order of 500ms/target.
1910
1911 Using B<prips> most of the examples from
1912 https://github.com/codingo/Interlace can be run with GNU B<parallel>:
1913
1914 Blocker
1915
1916   commands.txt:
1917     mkdir -p _output_/_target_/scans/
1918     _blocker_
1919     nmap _target_ -oA _output_/_target_/scans/_target_-nmap
1920   interlace -tL ./targets.txt -cL commands.txt -o $output
1921
1922   parallel -a targets.txt \
1923     mkdir -p $output/{}/scans/\; nmap {} -oA $output/{}/scans/{}-nmap
1924
1925 Blocks
1926
1927   commands.txt:
1928     _block:nmap_
1929     mkdir -p _target_/output/scans/
1930     nmap _target_ -oN _target_/output/scans/_target_-nmap
1931     _block:nmap_
1932     nikto --host _target_
1933   interlace -tL ./targets.txt -cL commands.txt
1934
1935   _nmap() {
1936     mkdir -p $1/output/scans/
1937     nmap $1 -oN $1/output/scans/$1-nmap
1938   }
1939   export -f _nmap
1940   parallel ::: _nmap "nikto --host" :::: targets.txt
1941
1942 Run Nikto Over Multiple Sites
1943
1944   interlace -tL ./targets.txt -threads 5 \
1945     -c "nikto --host _target_ > ./_target_-nikto.txt" -v
1946
  parallel -a targets.txt -P5 nikto --host {} \> ./{}-nikto.txt
1948
1949 Run Nikto Over Multiple Sites and Ports
1950
1951   interlace -tL ./targets.txt -threads 5 -c \
1952     "nikto --host _target_:_port_ > ./_target_-_port_-nikto.txt" \
1953     -p 80,443 -v
1954
1955   parallel -P5 nikto --host {1}:{2} \> ./{1}-{2}-nikto.txt \
1956     :::: targets.txt ::: 80 443
1957
1958 Run a List of Commands against Target Hosts
1959
1960   commands.txt:
1961     nikto --host _target_:_port_ > _output_/_target_-nikto.txt
1962     sslscan _target_:_port_ >  _output_/_target_-sslscan.txt
1963     testssl.sh _target_:_port_ > _output_/_target_-testssl.txt
1964   interlace -t example.com -o ~/Engagements/example/ \
1965     -cL ./commands.txt -p 80,443
1966
1967   parallel --results ~/Engagements/example/{2}:{3}{1} {1} {2}:{3} \
1968     ::: "nikto --host" sslscan testssl.sh ::: example.com ::: 80 443
1969
1970 CIDR notation with an application that doesn't support it
1971
1972   interlace -t 192.168.12.0/24 -c "vhostscan _target_ \
1973     -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
1974
1975   prips 192.168.12.0/24 |
1976     parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1977
1978 Glob notation with an application that doesn't support it
1979
1980   interlace -t 192.168.12.* -c "vhostscan _target_ \
1981     -oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
1982
1983   # Glob is not supported in prips
1984   prips 192.168.12.0/24 |
1985     parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1986
1987 Dash (-) notation with an application that doesn't support it
1988
1989   interlace -t 192.168.12.1-15 -c \
1990     "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
1991     -o ~/scans/ -threads 50
1992
1993   # Dash notation is not supported in prips
1994   prips 192.168.12.1 192.168.12.15 |
1995     parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
1996
1997 Threading Support for an application that doesn't support it
1998
1999   interlace -tL ./target-list.txt -c \
2000     "vhostscan -t _target_ -oN _output_/_target_-vhosts.txt" \
2001     -o ~/scans/ -threads 50
2002
2003   cat ./target-list.txt |
2004     parallel -P50 vhostscan -t {} -oN ~/scans/{}-vhosts.txt
2005
Alternatively
2007
2008   ./vhosts-commands.txt:
2009     vhostscan -t $target -oN _output_/_target_-vhosts.txt
2010   interlace -cL ./vhosts-commands.txt -tL ./target-list.txt \
2011     -threads 50 -o ~/scans
2012
2013   ./vhosts-commands.txt:
2014     vhostscan -t "$1" -oN "$2"
2015   parallel -P50 ./vhosts-commands.txt {} ~/scans/{}-vhosts.txt \
2016     :::: ./target-list.txt
2017
2018 Exclusions
2019
2020   interlace -t 192.168.12.0/24 -e 192.168.12.0/26 -c \
2021     "vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
2022     -o ~/scans/ -threads 50
2023
2024   prips 192.168.12.0/24 | grep -xv -Ff <(prips 192.168.12.0/26) |
2025     parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
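
The exclusion is done by B<grep>, not by B<prips> or GNU B<parallel>,
so it works on any two lists with one target per line. A minimal
sketch; the function name B<exclude_targets> and the file names are
placeholders chosen for this example:

```shell

  # Print the lines of file $1 that do not occur in file $2.
  # -x: match whole lines, -v: invert, -F: fixed strings, -f: read
  # the patterns from a file
  exclude_targets() {
    grep -x -v -F -f "$2" "$1"
  }

  tmp=$(mktemp -d)
  printf '%s\n' 192.168.12.1 192.168.12.10 192.168.12.100 > "$tmp/all.txt"
  printf '%s\n' 192.168.12.1 > "$tmp/excluded.txt"
  # -x keeps 192.168.12.10 and 192.168.12.100 although both contain
  # the excluded address as a substring
  exclude_targets "$tmp/all.txt" "$tmp/excluded.txt"

```

The output can be piped to GNU B<parallel> in the same way as the
output from B<prips>.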
2026
2027 Run Nikto Using Multiple Proxies
2028
2029    interlace -tL ./targets.txt -pL ./proxies.txt -threads 5 -c \
2030      "nikto --host _target_:_port_ -useproxy _proxy_ > \
2031       ./_target_-_port_-nikto.txt" -p 80,443 -v
2032
2033    parallel -j5 \
2034      "nikto --host {1}:{2} -useproxy {3} > ./{1}-{2}-nikto.txt" \
2035      :::: ./targets.txt ::: 80 443 :::: ./proxies.txt
2036
2037 https://github.com/codingo/Interlace (Last checked: 2019-09)
2038
2039
2040 =head2 DIFFERENCES BETWEEN otonvm Parallel AND GNU Parallel
2041
2042 I have been unable to get the code to run at all. It seems unfinished.
2043
2044 https://github.com/otonvm/Parallel (Last checked: 2019-02)
2045
2046
2047 =head2 DIFFERENCES BETWEEN k-bx par AND GNU Parallel
2048
2049 B<par> requires Haskell to work. This limits the number of platforms
2050 this can work on.
2051
2052 B<par> does line buffering in memory. The memory usage is 3x the
2053 longest line (compared to 1x for B<parallel --lb>). Commands must be
2054 given as arguments. There is no template.
2055
2056 These are the examples from https://github.com/k-bx/par with the
2057 corresponding GNU B<parallel> command.
2058
2059   par "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
2060       "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2061   parallel --lb ::: "echo foo; sleep 1; echo foo; sleep 1; echo foo" \
2062       "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2063
2064   par "echo foo; sleep 1; foofoo" \
2065       "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2066   parallel --lb --halt 1 ::: "echo foo; sleep 1; foofoo" \
2067       "echo bar; sleep 1; echo bar; sleep 1; echo bar" && echo "success"
2068
2069   par "PARPREFIX=[fooechoer] echo foo" "PARPREFIX=[bar] echo bar"
2070   parallel --lb --colsep , --tagstring {1} {2} \
2071     ::: "[fooechoer],echo foo" "[bar],echo bar"
2072
2073   par --succeed "foo" "bar" && echo 'wow'
  parallel ::: "foo" "bar"; true && echo 'wow'
2075
2076 https://github.com/k-bx/par (Last checked: 2019-02)
2077
2078 =head2 DIFFERENCES BETWEEN parallelshell AND GNU Parallel
2079
2080 B<parallelshell> does not allow for composed commands:
2081
2082   # This does not work
2083   parallelshell 'echo foo;echo bar' 'echo baz;echo quuz'
2084
2085 Instead you have to wrap that in a shell:
2086
2087   parallelshell 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2088
2089 It buffers output in RAM. All commands must be given on the command
2090 line and all commands are started in parallel at the same time. This
2091 will cause the system to freeze if there are so many jobs that there
2092 is not enough memory to run them all at the same time.
2093
2094 https://github.com/keithamus/parallelshell (Last checked: 2019-02)
2095
2096 https://github.com/darkguy2008/parallelshell (Last checked: 2019-03)
2097
2098
2099 =head2 DIFFERENCES BETWEEN shell-executor AND GNU Parallel
2100
2101 B<shell-executor> does not allow for composed commands:
2102
2103   # This does not work
2104   sx 'echo foo;echo bar' 'echo baz;echo quuz'
2105
2106 Instead you have to wrap that in a shell:
2107
2108   sx 'sh -c "echo foo;echo bar"' 'sh -c "echo baz;echo quuz"'
2109
2110 It buffers output in RAM. All commands must be given on the command
2111 line and all commands are started in parallel at the same time. This
2112 will cause the system to freeze if there are so many jobs that there
2113 is not enough memory to run them all at the same time.
2114
2115 https://github.com/royriojas/shell-executor (Last checked: 2019-02)
2116
2117
2118 =head2 DIFFERENCES BETWEEN non-GNU par AND GNU Parallel
2119
2120 B<par> buffers in memory to avoid mixing of jobs. It takes 1s per 1
2121 million output lines.
2122
2123 B<par> needs to have all commands before starting the first job. The
2124 jobs are read from stdin (standard input) so any quoting will have to
2125 be done by the user.
2126
Stdout (standard output) is prefixed with o:. Stderr (standard error)
is sent to stdout (standard output) and prefixed with e:.
2129
2130 For short jobs with little output B<par> is 20% faster than GNU
2131 B<parallel> and 60% slower than B<xargs>.
2132
2133 http://savannah.nongnu.org/projects/par (Last checked: 2019-02)
2134
2135
2136 =head2 DIFFERENCES BETWEEN fd AND GNU Parallel
2137
2138 B<fd> does not support composed commands, so commands must be wrapped
2139 in B<sh -c>.
2140
2141 It buffers output in RAM.
2142
2143 It only takes file names from the filesystem as input (similar to B<find>).
2144
2145 https://github.com/sharkdp/fd (Last checked: 2019-02)
2146
2147
2148 =head2 DIFFERENCES BETWEEN lateral AND GNU Parallel
2149
B<lateral> is very similar to B<sem>: It takes a single command and
runs it in the background. The design means that output from jobs
running in parallel may mix. If it dies unexpectedly it leaves a
socket in ~/.lateral/socket.PID.
2154
B<lateral> deals badly with very long command lines. This makes the
B<lateral> server crash:
2157
2158   lateral run echo `seq 100000| head -c 1000k`
2159
2160 Any options will be read by B<lateral> so this does not work
2161 (B<lateral> interprets the B<-l>):
2162
2163   lateral run ls -l
2164
2165 Composed commands do not work:
2166
2167   lateral run pwd ';' ls
2168
2169 Functions do not work:
2170
2171   myfunc() { echo a; }
2172   export -f myfunc
2173   lateral run myfunc
2174
2175 Running B<emacs> in the terminal causes the parent shell to die:
2176
2177   echo '#!/bin/bash' > mycmd
2178   echo emacs -nw >> mycmd
2179   chmod +x mycmd
2180   lateral start
2181   lateral run ./mycmd
2182
2183 Here are the examples from https://github.com/akramer/lateral with the
2184 corresponding GNU B<sem> and GNU B<parallel> commands:
2185
2186   1$ lateral start
2187   1$ for i in $(cat /tmp/names); do
2188   1$   lateral run -- some_command $i
2189   1$ done
2190   1$ lateral wait
2191   1$
2192   1$ for i in $(cat /tmp/names); do
2193   1$   sem some_command $i
2194   1$ done
2195   1$ sem --wait
2196   1$
2197   1$ parallel some_command :::: /tmp/names
2198
2199   2$ lateral start
2200   2$ for i in $(seq 1 100); do
2201   2$   lateral run -- my_slow_command < workfile$i > /tmp/logfile$i
2202   2$ done
2203   2$ lateral wait
2204   2$
2205   2$ for i in $(seq 1 100); do
2206   2$   sem my_slow_command < workfile$i > /tmp/logfile$i
2207   2$ done
2208   2$ sem --wait
2209   2$
2210   2$ parallel 'my_slow_command < workfile{} > /tmp/logfile{}' \
2211        ::: {1..100}
2212
2213   3$ lateral start -p 0 # yup, it will just queue tasks
2214   3$ for i in $(seq 1 100); do
2215   3$   lateral run -- command_still_outputs_but_wont_spam inputfile$i
2216   3$ done
2217   3$ # command output spam can commence
2218   3$ lateral config -p 10; lateral wait
2219   3$
2220   3$ for i in $(seq 1 100); do
2221   3$   echo "command inputfile$i" >> joblist
2222   3$ done
2223   3$ parallel -j 10 :::: joblist
2224   3$
2225   3$ echo 1 > /tmp/njobs
2226   3$ parallel -j /tmp/njobs command inputfile{} \
2227        ::: {1..100} &
2228   3$ echo 10 >/tmp/njobs
2229   3$ wait
2230
2231 https://github.com/akramer/lateral (Last checked: 2019-03)
2232
2233
2234 =head2 DIFFERENCES BETWEEN with-this AND GNU Parallel
2235
2236 The examples from https://github.com/amritb/with-this.git and the
2237 corresponding GNU B<parallel> command:
2238
2239   with -v "$(cat myurls.txt)" "curl -L this"
  parallel curl -L :::: myurls.txt
2241
2242   with -v "$(cat myregions.txt)" \
2243     "aws --region=this ec2 describe-instance-status"
2244   parallel aws --region={} ec2 describe-instance-status \
2245     :::: myregions.txt
2246
2247   with -v "$(ls)" "kubectl --kubeconfig=this get pods"
2248   ls | parallel kubectl --kubeconfig={} get pods
2249
2250   with -v "$(ls | grep config)" "kubectl --kubeconfig=this get pods"
2251   ls | grep config | parallel kubectl --kubeconfig={} get pods
2252
2253   with -v "$(echo {1..10})" "echo 123"
2254   parallel -N0 echo 123 ::: {1..10}
2255
2256 Stderr is merged with stdout. B<with-this> buffers in RAM. It uses 3x
2257 the output size, so you cannot have output larger than 1/3rd the
2258 amount of RAM. The input values cannot contain spaces. Composed
2259 commands do not work.
2260
2261 B<with-this> gives some additional information, so the output has to
2262 be cleaned before piping it to the next command.
2263
2264 https://github.com/amritb/with-this.git (Last checked: 2019-03)
2265
2266
2267 =head2 DIFFERENCES BETWEEN Tollef's parallel (moreutils) AND GNU Parallel
2268
2269 Summary table (see legend above):
2270 - - - I4 - - I7
2271 - - M3 - - M6
2272 - O2 O3 - O5 O6 - x x
2273 E1 - - - - - E7
2274 - x x x x x x x x
2275 - -
2276
2277 =head3 EXAMPLES FROM Tollef's parallel MANUAL
2278
2279 B<Tollef> parallel sh -c "echo hi; sleep 2; echo bye" -- 1 2 3
2280
2281 B<GNU> parallel "echo hi; sleep 2; echo bye" ::: 1 2 3
2282
2283 B<Tollef> parallel -j 3 ufraw -o processed -- *.NEF
2284
2285 B<GNU> parallel -j 3 ufraw -o processed ::: *.NEF
2286
2287 B<Tollef> parallel -j 3 -- ls df "echo hi"
2288
2289 B<GNU> parallel -j 3 ::: ls df "echo hi"
2290
2291 (Last checked: 2019-08)
2292
2293
2294 =head2 Todo
2295
2296 Url for spread
2297
2298 https://github.com/reggi/pkgrun
2299
2300 https://github.com/benoror/better-npm-run - not obvious how to use
2301
2302 https://github.com/bahmutov/with-package
2303
2304 https://github.com/xuchenCN/go-pssh
2305
2306 https://github.com/flesler/parallel
2307
2308 https://github.com/Julian/Verge
2309
2310 https://github.com/ExpectationMax/simple_gpu_scheduler
2311     simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
2312     parallel -j3 --shuf CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' < gpu_commands.txt
2313
2314     simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
2315     parallel --header : --shuf -j3 -v CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' python3 train_dnn.py --lr {lr} --batch_size {bs} ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
2316
2317     simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
2318     parallel --header : --shuf CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1; seq() > 5 and skip() =}' python3 train_dnn.py --lr {lr} --batch_size {bs} ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128
2319
2320     touch gpu.queue
2321     tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2 &
2322     echo "my_command_with | and stuff > logfile" >> gpu.queue
2323
2324     touch gpu.queue
2325     tail -f -n 0 gpu.queue | parallel -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' &
2326     # Needed to fill job slots once
2327     seq 3 | parallel echo true >> gpu.queue
2328     # Add jobs
2329     echo "my_command_with | and stuff > logfile" >> gpu.queue
2330     # Needed to flush output from completed jobs
2331     seq 3 | parallel echo true >> gpu.queue
2332
2333
2334 =head1 TESTING OTHER TOOLS
2335
There are certain issues that are very common in parallelizing
tools. Here are a few stress tests. Be warned: If the tool is badly
coded it may overload your machine.
2339
2340
2341 =head2 MIX: Output mixes
2342
2343 Output from 2 jobs should not mix. If the output is not used, this
2344 does not matter; but if the output I<is> used then it is important
2345 that you do not get half a line from one job followed by half a line
2346 from another job.
2347
2348 If the tool does not buffer, output will most likely mix now and then.
2349
2350 This test stresses whether output mixes.
2351
2352   #!/bin/bash
2353
2354   paralleltool="parallel -j0"
2355
2356   cat <<-EOF > mycommand
2357   #!/bin/bash
2358
2359   # If a, b, c, d, e, and f mix: Very bad
2360   perl -e 'print STDOUT "a"x3000_000," "'
2361   perl -e 'print STDERR "b"x3000_000," "'
2362   perl -e 'print STDOUT "c"x3000_000," "'
2363   perl -e 'print STDERR "d"x3000_000," "'
2364   perl -e 'print STDOUT "e"x3000_000," "'
2365   perl -e 'print STDERR "f"x3000_000," "'
2366   echo
2367   echo >&2
2368   EOF
2369   chmod +x mycommand
2370
2371   # Run 30 jobs in parallel
2372   seq 30 |
2373     $paralleltool ./mycommand > >(tr -s abcdef) 2> >(tr -s abcdef >&2)
2374
2375   # 'a c e' and 'b d f' should always stay together
2376   # and there should only be a single line per job
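
Checking 30 lines by eye is error-prone. The following is a minimal
sketch of an automated check. It assumes the squeezed output was
saved to files (called B<out> and B<err> here) instead of being
printed; B<check_mix> is a name made up for this example.

```shell

  # Succeed if FILE has exactly JOBS lines and every line equals LINE.
  # Usage: check_mix LINE JOBS FILE
  check_mix() {
    local line=$1 jobs=$2 file=$3 total bad
    total=$(wc -l < "$file")
    # Count the lines that differ from the expected line
    bad=$(grep -c -v -x -F -e "$line" "$file")
    [ "$total" -eq "$jobs" ] && [ "$bad" -eq 0 ]
  }

```

After B<tr -s abcdef> an unmixed job gives 'a c e ' on stdout and
'b d f ' on stderr (note the trailing space), so the test can be
checked like this:

  seq 30 | $paralleltool ./mycommand 2> err.raw | tr -s abcdef > out
  tr -s abcdef < err.raw > err
  check_mix 'a c e ' 30 out && check_mix 'b d f ' 30 err && echo OK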
2377
2378
2379 =head2 STDERRMERGE: Stderr is merged with stdout
2380
2381 Output from stdout and stderr should not be merged, but kept separated.
2382
2383 This test shows whether stdout is mixed with stderr.
2384
2385   #!/bin/bash
2386
2387   paralleltool="parallel -j0"
2388
2389   cat <<-EOF > mycommand
2390   #!/bin/bash
2391
2392   echo stdout
2393   echo stderr >&2
2394   echo stdout
2395   echo stderr >&2
2396   EOF
2397   chmod +x mycommand
2398
2399   # Run one job
2400   echo |
2401     $paralleltool ./mycommand > stdout 2> stderr
2402   cat stdout
2403   cat stderr
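
The two B<cat> commands leave the comparison to the reader. As a
sketch, the expected content can be asserted instead. The function
names B<only_lines> and B<check_separate> are made up for this
example; the files are the ones written by the test above.

```shell

  # Succeed if FILE consists of exactly two lines that both equal WORD.
  # Usage: only_lines WORD FILE
  only_lines() {
    [ "$(cat "$2")" = "$1"$'\n'"$1" ]
  }

  # Succeed if the streams were kept apart.
  # Usage: check_separate STDOUTFILE STDERRFILE
  check_separate() {
    only_lines stdout "$1" && only_lines stderr "$2"
  }

```

Run it after the test as:

  check_separate stdout stderr && echo OK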
2404
2405
2406 =head2 RAM: Output limited by RAM
2407
Some tools cache output in RAM. This makes them extremely slow if the
output is bigger than physical memory and makes them crash if the
output is bigger than the virtual memory.
2411
2412   #!/bin/bash
2413
2414   paralleltool="parallel -j0"
2415
2416   cat <<'EOF' > mycommand
2417   #!/bin/bash
2418
2419   # Generate 1 GB output
2420   yes "`perl -e 'print \"c\"x30_000'`" | head -c 1G
2421   EOF
2422   chmod +x mycommand
2423
2424   # Run 20 jobs in parallel
2425   # Adjust 20 to be > physical RAM and < free space on /tmp
2426   seq 20 | time $paralleltool ./mycommand | wc -c
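
If nothing is lost, B<wc -c> reports the full amount. GNU B<head>
reads 1G as 1024*1024*1024 bytes, so the expected number can be
computed as in this small sketch (the function name is made up for
this example):

```shell

  # Print the number of bytes expected from JOBS jobs that each
  # write 1 GB (1024*1024*1024 bytes).
  # Usage: expected_bytes JOBS
  expected_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
  }

  expected_bytes 20

```

A smaller number from B<wc -c> means the tool dropped output.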
2427
2428
2429 =head2 DISKFULL: Incomplete data if /tmp runs full
2430
If caching is done on disk, the disk can run full during the run. Not
all programs discover this. GNU Parallel discovers it if the disk
stays full for at least 2 seconds.
2434
2435   #!/bin/bash
2436
2437   paralleltool="parallel -j0"
2438
2439   # This should be a dir with less than 100 GB free space
2440   smalldisk=/tmp/shm/parallel
2441
2442   TMPDIR="$smalldisk"
2443   export TMPDIR
2444
2445   max_output() {
2446       # Force worst case scenario:
2447       # Make GNU Parallel only check once per second
2448       sleep 10
2449       # Generate 100 GB to fill $TMPDIR
2450       # Adjust if /tmp is bigger than 100 GB
2451       yes | head -c 100G >$TMPDIR/$$
2452       # Generate 10 MB output that will not be buffered due to full disk
2453       perl -e 'print "X"x10_000_000' | head -c 10M
2454       echo This part is missing from incomplete output
2455       sleep 2
2456       rm $TMPDIR/$$
2457       echo Final output
2458   }
2459
2460   export -f max_output
2461   seq 10 | $paralleltool max_output | tr -s X
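
A tool that silently loses data gives fewer lines than expected. A
minimal sketch of a check, assuming the output of the pipeline above
was saved to a file (called B<out> here); B<output_complete> is a
name made up for this example:

```shell

  # Succeed if all JOBS jobs delivered both the squeezed 10 MB part
  # and the final line.
  # Usage: output_complete JOBS FILE
  output_complete() {
    local jobs=$1 file=$2 middle final
    # tr -s X squeezes the 10 MB of X into a single X
    middle=$(grep -c -x -F 'XThis part is missing from incomplete output' "$file")
    final=$(grep -c -x -F 'Final output' "$file")
    [ "$middle" -eq "$jobs" ] && [ "$final" -eq "$jobs" ]
  }

```

Run it as:

  seq 10 | $paralleltool max_output | tr -s X > out
  output_complete 10 out && echo OK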
2462
2463
2464 =head2 CLEANUP: Leaving tmp files at unexpected death
2465
Some tools that buffer on disk do not clean up their tmp files if
they are killed.
2468
2469   #!/bin/bash
2470
2471   paralleltool=parallel
2472
2473   ls /tmp >/tmp/before
2474   seq 10 | $paralleltool sleep &
2475   pid=$!
2476   # Give the tool time to start up
2477   sleep 1
2478   # Kill it without giving it a chance to cleanup
  kill -9 $pid
2480   # Should be empty: No files should be left behind
2481   diff <(ls /tmp) /tmp/before
2482
2483
=head2 SPCCHAR: Dealing badly with special file names
2485
2486 It is not uncommon for users to create files like:
2487
2488   My brother's 12" *** record  (costs $$$).jpg
2489
2490 Some tools break on this.
2491
2492   #!/bin/bash
2493
2494   paralleltool=parallel
2495
2496   touch "My brother's 12\" *** record  (costs \$\$\$).jpg"
2497   ls My*jpg | $paralleltool ls -l
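
The check can also be automated. In this minimal sketch
B<roundtrip_ok> (a name made up for this example) feeds the file name
to the tool and compares what the job received with what was sent. It
assumes the tool appends the input line to the command, as GNU
B<parallel> does.

```shell

  name="My brother's 12\" *** record  (costs \$\$\$).jpg"

  # Succeed if the tool passes the name to echo unchanged.
  # Usage: roundtrip_ok TOOL [ARGS...]
  roundtrip_ok() {
    local got
    got=$(printf '%s\n' "$name" | "$@" echo)
    [ "$got" = "$name" ]
  }

```

Run it as:

  roundtrip_ok $paralleltool && echo OK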
2498
2499
2500 =head2 COMPOSED: Composed commands do not work
2501
2502 Some tools require you to wrap composed commands into B<bash -c>.
2503
2504   echo bar | $paralleltool echo foo';' echo {}
2505
2506
2507 =head2 ONEREP: Only one replacement string allowed
2508
2509 Some tools can only insert the argument once.
2510
2511   echo bar | $paralleltool echo {} foo {}
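
The one-liners in COMPOSED and ONEREP can be turned into pass/fail
checks. A minimal sketch; B<expect_output> is a name made up for this
example:

```shell

  # Succeed if the command reads 'bar' on stdin and prints exactly
  # EXPECTED.
  # Usage: expect_output EXPECTED TOOL [ARGS...]
  expect_output() {
    local expected=$1 got
    shift
    got=$(echo bar | "$@")
    [ "$got" = "$expected" ]
  }

```

With GNU B<parallel> the two checks would be:

  # COMPOSED: both commands must run
  expect_output $'foo\nbar' parallel 'echo foo; echo {}'
  # ONEREP: both {} must be replaced
  expect_output 'bar foo bar' parallel echo {} foo {}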
2512
2513
2514 =head2 INPUTSIZE: Length of input should not be limited
2515
2516 Some tools limit the length of the input lines artificially with no good
2517 reason. GNU B<parallel> does not:
2518
2519   perl -e 'print "foo."."x"x100_000_000' | parallel echo {.}
2520
GNU B<parallel> limits the command to run to 128 KB due to execve(2):
2522
2523   perl -e 'print "x"x131_000' | parallel echo {} | wc
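
The limit can be probed without any parallelizing tool. A minimal
sketch, assuming a Linux system, where a single argument is typically
limited to 128 KB (other systems have other limits); B<arg_fits> is a
name made up for this example:

```shell

  # Succeed if a single argument of LEN bytes can be passed to a
  # new process.
  # Usage: arg_fits LEN
  arg_fits() {
    local arg
    arg=$(head -c "$1" /dev/zero | tr '\0' x)
    # env is an external program, so this goes through execve
    env true "$arg" 2> /dev/null
  }

  arg_fits 1000 && echo "1000 bytes fit"
  arg_fits 200000 || echo "200000 bytes do not fit"

```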
2524
2525
2526 =head2 NUMWORDS: Speed depends on number of words
2527
2528 Some tools become very slow if output lines have many words.
2529
2530   #!/bin/bash
2531
2532   paralleltool=parallel
2533
2534   cat <<-EOF > mycommand
2535   #!/bin/bash
2536
2537   # 10 MB of lines with 1000 words
  yes "`seq -s ' ' 1000`" | head -c 10M
2539   EOF
2540   chmod +x mycommand
2541
2542   # Run 30 jobs in parallel
2543   seq 30 | time $paralleltool -j0 ./mycommand > /dev/null
2544
2545
2546 =head1 AUTHOR
2547
2548 When using GNU B<parallel> for a publication please cite:
2549
2550 O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
2551 The USENIX Magazine, February 2011:42-47.
2552
This helps fund further development, and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
2555
2556 Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
2557
2558 Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk
2559
2560 Copyright (C) 2010-2019 Ole Tange, http://ole.tange.dk and Free
2561 Software Foundation, Inc.
2562
Parts of the manual concerning B<xargs> compatibility are inspired by
the manual of B<xargs> from GNU findutils 4.4.2.
2565
2566
2567 =head1 LICENSE
2568
2569 This program is free software; you can redistribute it and/or modify
2570 it under the terms of the GNU General Public License as published by
2571 the Free Software Foundation; either version 3 of the License, or
2572 at your option any later version.
2573
2574 This program is distributed in the hope that it will be useful,
2575 but WITHOUT ANY WARRANTY; without even the implied warranty of
2576 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
2577 GNU General Public License for more details.
2578
2579 You should have received a copy of the GNU General Public License
2580 along with this program.  If not, see <http://www.gnu.org/licenses/>.
2581
2582 =head2 Documentation license I
2583
2584 Permission is granted to copy, distribute and/or modify this documentation
2585 under the terms of the GNU Free Documentation License, Version 1.3 or
2586 any later version published by the Free Software Foundation; with no
2587 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
2588 Texts.  A copy of the license is included in the file fdl.txt.
2589
2590 =head2 Documentation license II
2591
2592 You are free:
2593
2594 =over 9
2595
2596 =item B<to Share>
2597
2598 to copy, distribute and transmit the work
2599
2600 =item B<to Remix>
2601
2602 to adapt the work
2603
2604 =back
2605
2606 Under the following conditions:
2607
2608 =over 9
2609
2610 =item B<Attribution>
2611
2612 You must attribute the work in the manner specified by the author or
2613 licensor (but not in any way that suggests that they endorse you or
2614 your use of the work).
2615
2616 =item B<Share Alike>
2617
2618 If you alter, transform, or build upon this work, you may distribute
2619 the resulting work only under the same, similar or a compatible
2620 license.
2621
2622 =back
2623
2624 With the understanding that:
2625
2626 =over 9
2627
2628 =item B<Waiver>
2629
2630 Any of the above conditions can be waived if you get permission from
2631 the copyright holder.
2632
2633 =item B<Public Domain>
2634
2635 Where the work or any of its elements is in the public domain under
2636 applicable law, that status is in no way affected by the license.
2637
2638 =item B<Other Rights>
2639
2640 In no way are any of the following rights affected by the license:
2641
2642 =over 2
2643
2644 =item *
2645
2646 Your fair dealing or fair use rights, or other applicable
2647 copyright exceptions and limitations;
2648
2649 =item *
2650
2651 The author's moral rights;
2652
2653 =item *
2654
2655 Rights other persons may have either in the work itself or in
2656 how the work is used, such as publicity or privacy rights.
2657
2658 =back
2659
2660 =back
2661
2662 =over 9
2663
2664 =item B<Notice>
2665
2666 For any reuse or distribution, you must make clear to others the
2667 license terms of this work.
2668
2669 =back
2670
2671 A copy of the full license is included in the file as cc-by-sa.txt.
2672
2673
2674 =head1 DEPENDENCIES
2675
2676 GNU B<parallel> uses Perl, and the Perl modules Getopt::Long,
2677 IPC::Open3, Symbol, IO::File, POSIX, and File::Temp. For remote usage
2678 it also uses rsync with ssh.
2679
2680
2681 =head1 SEE ALSO
2682
2683 B<find>(1), B<xargs>(1), B<make>(1), B<pexec>(1), B<ppss>(1),
2684 B<xjobs>(1), B<prll>(1), B<dxargs>(1), B<mdm>(1)
2685
2686 =cut