#!/usr/bin/perl -w

# SPDX-FileCopyrightText: 2021-2024 Ole Tange, http://ole.tange.dk and Free Software Foundation, Inc.
# SPDX-License-Identifier: GFDL-1.3-or-later
# SPDX-License-Identifier: CC-BY-SA-4.0

=encoding utf8


=head1 Design of GNU Parallel

This document describes design decisions made in the development of
GNU B<parallel> and the reasoning behind them. It will give an
overview of why some of the code looks the way it does, and will help
new maintainers understand the code better.


=head2 One file program

GNU B<parallel> is a Perl script in a single file. It is object
oriented, but contrary to normal Perl scripts each class is not in its
own file. This is due to user experience: The goal is that in a pinch
the user will be able to get GNU B<parallel> working simply by copying
a single file: No need to mess around with environment variables like
PERL5LIB.


=head2 Choice of programming language

GNU B<parallel> is designed to be able to run on old systems. That
means that it cannot depend on a compiler being installed - and
especially not a compiler for a language that is younger than 20 years
old.

The goal is that you can use GNU B<parallel> on any system, even if
you are not allowed to install additional software.

Of all the systems I have experienced, I have yet to see a system that
had GCC installed that did not have Perl. The same goes for Rust, Go,
Haskell, and other younger languages. I have, however, seen systems
with Perl without any of the mentioned compilers.

Most modern systems also have either Python2 or Python3 installed, but
you still cannot be certain which version, and since Python2 cannot
run under Python3, Python is not an option.

Perl has the added benefit that implementing the {= perlexpr =}
replacement string was fairly easy.

The primary drawback is that Perl is slow. So there is an overhead of
3-10 ms/job and 1 ms/MB output (and even more if you use B<--tag>).


=head2 Old Perl style

GNU B<parallel> uses some old, deprecated constructs. This is due to a
goal of being able to run on old installations. Currently the target
is CentOS 3.9 and Perl 5.8.0.


=head2 Scalability up and down

The smallest system GNU B<parallel> is tested on is a 32 MB ASUS
WL500gP. The largest is a 2 TB 128-core machine. It scales up to
around 100 machines - depending on the duration of each job.


=head2 Exponentially back off

GNU B<parallel> busy waits. This is because the reason a job is not
started may be the load average (when using B<--load>), and thus it
does not make sense to just wait for a job to finish. Instead the load
average must be rechecked regularly. Load average is not the only
reason: B<--timeout> has a similar problem.

To not burn up too much CPU GNU B<parallel> sleeps exponentially
longer and longer if nothing happens, maxing out at 1 second.
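
A minimal sketch of the back off loop (not the actual code;
B<something_happened> is a stand-in for the real checks of job states,
load, and timeouts):

  sub something_happened { 0 }       # stand-in for the real checks
  my $sleep = 0.001;
  for (1..10) {
      if(something_happened()) {
          $sleep = 0.001;                          # reset: there is work to do
      } else {
          $sleep = $sleep < 1 ? $sleep * 1.03 : 1; # grow a little, cap at 1 sec
      }
      select(undef, undef, undef, $sleep);         # sub-second sleep
  }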


=head2 Shell compatibility

It is a goal to have GNU B<parallel> work equally well in any
shell. However, in practice GNU B<parallel> is being developed in
B<bash> and thus testing in other shells is limited to reported bugs.

When an incompatibility is found there is often not an easy fix:
Fixing the problem in B<csh> often breaks it in B<bash>. In these
cases the fix is often to use a small Perl script and call that.


=head2 env_parallel

B<env_parallel> is a dummy shell script that will run if
B<env_parallel> is not an alias or a function and tells the user how
to activate the alias/function for the supported shells.

The alias or function will copy the current environment and run the
command with GNU B<parallel> in the copy of the environment.

The problem is that you cannot access all of the current environment
inside Perl. E.g. aliases, functions and unexported shell variables.

The idea is therefore to take the environment and put it in
B<$PARALLEL_ENV> which GNU B<parallel> prepends to every command.

The only way to have access to the environment is directly from the
shell, so the program must be written as a shell script that will be
sourced and has to deal with the dialect of the relevant shell.


=head3 env_parallel.*

These are the files that implement the alias or function
B<env_parallel> for a given shell. It could be argued that these
should be put in some obscure place under /usr/lib, but by putting
them in your path it becomes trivial to find the path to them and
B<source> them:

  source `which env_parallel.foo`

The beauty is that they can be put anywhere in the path without the
user having to know the location. So if the user's path includes
/afs/bin/i386_fc5 or /usr/pkg/parallel/bin or
/usr/local/parallel/20161222/sunos5.6/bin the files can be put in the
dir that makes most sense for the sysadmin.


=head3 env_parallel.bash / env_parallel.sh / env_parallel.ash /
env_parallel.dash / env_parallel.zsh / env_parallel.ksh /
env_parallel.mksh

B<env_parallel.(bash|sh|ash|dash|ksh|mksh|zsh)> defines the function
B<env_parallel>. It uses B<alias> and B<typeset> to dump the
configuration (with a few exceptions) into B<$PARALLEL_ENV> before
running GNU B<parallel>.

After GNU B<parallel> is finished, B<$PARALLEL_ENV> is deleted.


=head3 env_parallel.csh

B<env_parallel.csh> has two purposes: If B<env_parallel> is not an
alias: make it into an alias that sets B<$PARALLEL> with arguments
and calls B<env_parallel.csh>.

If B<env_parallel> is an alias, then B<env_parallel.csh> uses
B<$PARALLEL> as the arguments for GNU B<parallel>.

It exports the environment by writing a variable definition to a file
for each variable. The definitions of aliases are appended to this
file. Finally the file is put into B<$PARALLEL_ENV>.

GNU B<parallel> is then run and B<$PARALLEL_ENV> is deleted.


=head3 env_parallel.fish

First all function definitions are generated using a loop and
B<functions>.

Dumping the scalar variable definitions is harder.

B<fish> can represent non-printable characters in (at least) 2
ways. To avoid problems all scalars are converted to \XX quoting.

Then commands to generate the definitions are made and separated by
NUL.

This is then piped into a Perl script that quotes all values. List
elements will be appended using two spaces.

Finally \n is converted into \1 because B<fish> variables cannot
contain \n. GNU B<parallel> will later convert all \1 from
B<$PARALLEL_ENV> into \n.

This is then all saved in B<$PARALLEL_ENV>.

GNU B<parallel> is called, and B<$PARALLEL_ENV> is deleted.
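
The \n to \1 round trip mentioned above can be sketched like this (a
simplified illustration, not the real quoting code):

  my $value   = "line1\nline2\n";
  (my $stored = $value) =~ s/\n/\001/g;   # \n cannot live in a fish variable
  (my $back   = $stored) =~ s/\001/\n/g;  # GNU parallel restores \n from \1
  print $back eq $value ? "round trip ok\n" : "mismatch\n";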


=head2 parset (supported in sh, ash, dash, bash, zsh, ksh, mksh)

B<parset> is a shell function. This is the reason why B<parset> can
set variables: It runs in the shell which is calling it.

It is also the reason why B<parset> does not work when data is piped
into it: B<... | parset ...> makes B<parset> start in a subshell, and
any changes in environment can therefore not make it back to the
calling shell.


=head2 Job slots

The easiest way to explain what GNU B<parallel> does is to assume that
there are a number of job slots, and when a slot becomes available a
job from the queue will be run in that slot. But originally GNU
B<parallel> did not model job slots in the code. Job slots have been
added to make it possible to use B<{%}> as a replacement string.

While the job sequence number can be computed in advance, the job slot
can only be computed the moment a slot becomes available. So it has
been implemented as a stack with lazy evaluation: Draw one from an
empty stack and the stack is extended by one. When a job is done, push
the available job slot back on the stack.
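
A sketch of such a lazily extended stack (simplified; not the actual
data structure in the code):

  my @slot_pool;       # free slot numbers
  my $max_slot = 0;    # highest slot number handed out so far
  sub take_slot {
      # Drawing from an empty stack extends it by one new slot number
      return @slot_pool ? pop @slot_pool : ++$max_slot;
  }
  sub release_slot {
      # A finished job pushes its slot back for reuse
      push @slot_pool, shift;
  }
  my $first  = take_slot();   # 1
  my $second = take_slot();   # 2
  release_slot($first);
  my $third  = take_slot();   # reuses 1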

This implementation also means that if you re-run the same jobs, you
cannot assume jobs will get the same slots. And if you use remote
executions, you cannot assume that a given job slot will remain on the
same remote server. This goes double since the number of job slots can
be adjusted on the fly (by giving B<--jobs> a file name).


=head2 Rsync protocol version

B<rsync> 3.1.x uses protocol 31 which is unsupported by version
2.5.7. That means that you cannot push a file to a remote system using
B<rsync> protocol 31, if the remote system uses 2.5.7. B<rsync> does
not automatically downgrade to protocol 30.

GNU B<parallel> does not require protocol 31, so if the B<rsync>
version is >= 3.1.0 then B<--protocol 30> is added to force newer
B<rsync>s to talk to version 2.5.7.
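
The idea can be sketched like this (a simplified version test, not the
exact option handling in the code):

  # Force protocol 30 if the local rsync is 3.1.0 or newer
  my $version_output = `rsync --version`;
  my ($rsync_version) = ($version_output =~ /version (\d+\.\d+)/);
  my @rsync_opts = ("-rlDzR", "-e", "ssh");
  if(defined $rsync_version and $rsync_version >= 3.1) {
      # Newer rsync must be told to speak protocol 30 to a 2.5.7 server
      unshift @rsync_opts, "--protocol", "30";
  }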


=head2 Compression

GNU B<parallel> buffers output in temporary files. B<--compress>
compresses the buffered data. This is a bit tricky because there
should be no files to clean up if GNU B<parallel> is killed by a power
outage.

GNU B<parallel> first selects a compression program. If the user has
not selected one, the first of these that is in $PATH is used: B<pzstd
lbzip2 pbzip2 zstd pixz lz4 pigz lzop plzip lzip gzip lrz pxz bzip2
lzma xz clzip>. They are sorted by speed on a 128 core machine.

Schematically the setup is as follows:

  command started by parallel | compress > tmpfile
  cattail tmpfile | uncompress | parallel which reads the output

The setup is duplicated for both standard output (stdout) and standard
error (stderr).

GNU B<parallel> pipes output from the command run into the compression
program which saves to a tmpfile. GNU B<parallel> records the pid of
the compress program. At the same time a small Perl script (called
B<cattail> above) is started: It basically does B<cat> followed by
B<tail -f>, but it also removes the tmpfile as soon as the first byte
is read, and it continuously checks if the pid of the compression
program is dead. If the compress program is dead, B<cattail> reads the
rest of tmpfile and exits.

As most compression programs write out a header when they start, the
tmpfile in practice is removed by B<cattail> after around 40 ms.

In more detail it works like this:

  bash ( command ) |
      sh ( emptywrapper ( bash ( compound compress ) ) >tmpfile )
  cattail ( rm tmpfile; compound decompress ) < tmpfile

This complex setup is to make sure the compress program is only
started if there is input. This means each job will cause 8 processes
to run. If combined with B<--keep-order> these processes will run
until the job has been printed.
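
The essence of B<cattail> can be sketched like this (heavily
simplified; the real script also handles the compound decompress
command and other corner cases):

  # Usage sketch: cattail.pl <pid of compressor> <tmpfile>
  my ($compress_pid, $tmpfile) = @ARGV;
  open(my $fh, "<", $tmpfile) || die "$tmpfile: $!";
  my $removed = 0;
  while(1) {
      while(sysread($fh, my $buf, 131072)) {
          # Remove the tmpfile as soon as the first byte has been read
          unlink $tmpfile if not $removed++;
          syswrite(STDOUT, $buf);
      }
      # kill 0 merely tests whether the compressor is still alive
      last if not kill(0, $compress_pid);
      select(undef, undef, undef, 0.01);   # tail -f: wait for more data
  }
  # The compressor is dead: drain the rest of the tmpfile and exit
  while(sysread($fh, my $buf, 131072)) { syswrite(STDOUT, $buf); }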


=head2 Wrapping

The command given by the user can be wrapped in multiple
templates. Templates can be wrapped in other templates.

=over 15

=item B<$COMMAND>

the command to run.

=item B<$INPUT>

the input to run.

=item B<$SHELL>

the shell that started GNU Parallel.

=item B<$SSHLOGIN>

the sshlogin.

=item B<$WORKDIR>

the working dir.

=item B<$FILE>

the file to read parts from.

=item B<$STARTPOS>

the first byte position to read from B<$FILE>.

=item B<$LENGTH>

the number of bytes to read from B<$FILE>.

=item --shellquote

echo I<Double quoted $INPUT>

=item --nice I<pri>

Remote: See B<The remote system wrapper>.

Local: B<setpriority(0,0,$nice)>

=item --cat

  cat > {}; $COMMAND {};
  perl -e '$bash = shift;
    $csh = shift;
    for(@ARGV) { unlink;rmdir; }
    if($bash =~ s/h//) { exit $bash; }
    exit $csh;' "$?h" "$status" {};

{} is set to B<$PARALLEL_TMP> which is a tmpfile. The Perl script
saves the exit value, unlinks the tmpfile, and returns the exit value
- no matter if the shell is B<bash>/B<ksh>/B<zsh> (using $?) or
B<*csh>/B<fish> (using $status).

=item --fifo

  perl -e '($s,$c,$f) = @ARGV;
    # mkfifo $PARALLEL_TMP
    system "mkfifo", $f;
    # spawn $shell -c $command &
    $pid = fork || exec $s, "-c", $c;
    open($o,">",$f) || die $!;
    # cat > $PARALLEL_TMP
    while(sysread(STDIN,$buf,131072)){
       syswrite $o, $buf;
    }
    close $o;
    # waitpid to get the exit code from $command
    waitpid $pid,0;
    # Cleanup
    unlink $f;
    exit $?/256;' $SHELL -c $COMMAND $PARALLEL_TMP

This is an elaborate way of: mkfifo {}; running B<$COMMAND> in the
background using B<$SHELL>; copying STDIN to {}; waiting for the
background job to complete; removing {} and exiting with the exit code
from B<$COMMAND>.

It is made this way to be compatible with B<*csh>/B<fish>.

=item --pipepart

  < $FILE perl -e 'while(@ARGV) {
      sysseek(STDIN,shift,0) || die;
      $left = shift;
      while($read =
            sysread(STDIN,$buf,
                    ($left > 131072 ? 131072 : $left))){
          $left -= $read;
          syswrite(STDOUT,$buf);
      }
  }' $STARTPOS $LENGTH

This will read B<$LENGTH> bytes from B<$FILE> starting at B<$STARTPOS>
and send them to STDOUT.

=item --sshlogin $SSHLOGIN

  ssh $SSHLOGIN "$COMMAND"

=item --transfer

  ssh $SSHLOGIN mkdir -p ./$WORKDIR;
  rsync --protocol 30 -rlDzR \
    -essh ./{} $SSHLOGIN:./$WORKDIR;
  ssh $SSHLOGIN "$COMMAND"

Read about B<--protocol 30> in the section B<Rsync protocol version>.

=item --transferfile I<file>

<<todo>>

=item --basefile

<<todo>>

=item --return I<file>

  $COMMAND; _EXIT_status=$?; mkdir -p $WORKDIR;
  rsync --protocol 30 \
    --rsync-path=cd\ ./$WORKDIR\;\ rsync \
    -rlDzR -essh $SSHLOGIN:./$FILE ./$WORKDIR;
  exit $_EXIT_status;

The B<--rsync-path=cd ...> is needed because old versions of B<rsync>
do not support B<--no-implied-dirs>.

The B<$_EXIT_status> trick is to postpone the exit value. This makes it
incompatible with B<*csh> and should be fixed in the future. Maybe a
wrapping 'sh -c' is enough?

=item --cleanup

$RETURN is the wrapper from B<--return>

  $COMMAND; _EXIT_status=$?; $RETURN;
  ssh $SSHLOGIN \(rm\ -f\ ./$WORKDIR/{}\;\
    rmdir\ ./$WORKDIR\ \>\&/dev/null\;\);
  exit $_EXIT_status;

B<$_EXIT_status>: see B<--return> above.

=item --pipe

  perl -e 'if(sysread(STDIN, $buf, 1)) {
      open($fh, "|-", "@ARGV") || die;
      syswrite($fh, $buf);
      # Align up to 128k block
      if($read = sysread(STDIN, $buf, 131071)) {
          syswrite($fh, $buf);
      }
      while($read = sysread(STDIN, $buf, 131072)) {
          syswrite($fh, $buf);
      }
      close $fh;
      exit ($?&127 ? 128+($?&127) : 1+$?>>8)
  }' $SHELL -c $COMMAND

This small wrapper makes sure that B<$COMMAND> will never be run if
there is no data.

=item --tmux

  <<TODO Fixup with '-quoting>>
  mkfifo /tmp/tmx3cMEV &&
  sh -c 'tmux -S /tmp/tmsaKpv1 new-session -s p334310 -d "sleep .2" >/dev/null 2>&1';
  tmux -S /tmp/tmsaKpv1 new-window -t p334310 -n wc\ 10 \(wc\ 10\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status\ \>\>\ /tmp/tmx3cMEV\&echo\ wc\\\ 10\;\ echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10;
  exec perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and exit($1);exit$c' /tmp/tmx3cMEV

  mkfifo I<tmpfile.tmx>;
  tmux -S <tmpfile.tms> new-session -s pI<PID> -d 'sleep .2' >&/dev/null;
  tmux -S <tmpfile.tms> new-window -t pI<PID> -n <<shell quoted input>> \(<<shell quoted input>>\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status\ \>\>\ I<tmpfile.tmx>\&echo\ <<shell double quoted input>>\;echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10;
  exec perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and exit($1);exit$c' I<tmpfile.tmx>

First a FIFO is made (.tmx). It is used for communicating exit
value. Next a new tmux session is made. This may fail if there is
already a session, so the output is ignored. If all job slots finish
at the same time, then B<tmux> will close the session. A temporary
socket is made (.tms) to avoid a race condition in B<tmux>. It is
cleaned up when GNU B<parallel> finishes.

The input is used as the name of the windows in B<tmux>. When the job
inside B<tmux> finishes, the exit value is printed to the FIFO (.tmx).
This FIFO is opened by B<perl> outside B<tmux>, and B<perl> then
removes the FIFO. B<Perl> blocks until the first value is read from
the FIFO, and this value is used as exit value.

To make it compatible with B<csh> and B<bash> the exit value is
printed as: $?h/$status and this is parsed by B<perl>.

There is a bug that makes it necessary to print the exit value 3
times.

Another bug in B<tmux> requires the length of the tmux title and
command to stay outside certain ranges. When the length falls inside
these ranges, 75 '\ ' are added to the title to force it outside the
ranges.

You can map the bad limits using:

  perl -e 'sub r { int(rand(shift)).($_[0] && "\t".r(@_)) } print map { r(@ARGV)."\n" } 1..10000' 1600 1500 90 |
    perl -ane '$F[0]+$F[1]+$F[2] < 2037 and print ' |
    parallel --colsep '\t' --tagstring '{1}\t{2}\t{3}' tmux -S /tmp/p{%}-'{=3 $_="O"x$_ =}' \
    new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm -f /tmp/p{%}-O*'

  perl -e 'sub r { int(rand(shift)).($_[0] && "\t".r(@_)) } print map { r(@ARGV)."\n" } 1..10000' 17000 17000 90 |
    parallel --colsep '\t' --tagstring '{1}\t{2}\t{3}' \
    tmux -S /tmp/p{%}-'{=3 $_="O"x$_ =}' new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm /tmp/p{%}-O*'
    > value.csv 2>/dev/null

  R -e 'a<-read.table("value.csv");X11();plot(a[,1],a[,2],col=a[,4]+5,cex=0.1);Sys.sleep(1000)'

For B<tmux 1.8> 17000 can be lowered to 2100.

The interesting areas are title 0..1000 with (title + whole command)
in 996..1127 and 9331..9636.

=back

The ordering of the wrapping is important:

=over 5

=item *

$PARALLEL_ENV which is set in env_parallel.* must be prepended to the
command first, as the command may contain exported variables or
functions.

=item *

B<--nice>/B<--cat>/B<--fifo> should be done on the remote machine

=item *

B<--pipepart>/B<--pipe> should be done on the local machine inside B<--tmux>

=back


=head2 Convenience options --nice --basefile --transfer --return
--cleanup --tmux --group --compress --cat --fifo --workdir --tag
--tagstring

These are all convenience options that make it easier to do a
task. But more importantly: They are tested to work on corner cases,
too. Take B<--nice> as an example:

  nice parallel command ...

will work just fine. But when run remotely, you need to move the nice
command so it is being run on the server:

  parallel -S server nice command ...

And this will again work just fine, as long as you are running a
single command. When you are running a composed command you need nice
to apply to the whole command, and it gets harder still:

  parallel -S server -q nice bash -c 'command1 ...; cmd2 | cmd3'

It is not impossible, but by using B<--nice> GNU B<parallel> will do
the right thing for you. Similarly when transferring files: It starts
to get hard when the file names contain space, :, `, *, or other
special characters.

To run the commands in a B<tmux> session you basically just need to
quote the command. For simple commands that is easy, but when commands
contain special characters, it gets much harder to get right.

B<--compress> not only compresses standard output (stdout) but also
standard error (stderr); and it does so into files that are open but
deleted, so a crash will not leave these files around.

B<--cat> and B<--fifo> are easy to do by hand, until you want to clean
up the tmpfile and keep the exit code of the command.

The real killer comes when you try to combine several of these: Doing
that correctly for all corner cases is next to impossible to do by
hand.


=head2 --shard

The simple way to implement sharding would be to:

=over 5

=item 1

start n jobs,

=item 2

split each line into columns,

=item 3

select the data from the relevant column

=item 4

compute a hash value from the data

=item 5

take the modulo n of the hash value

=item 6

pass the full line to the jobslot that has the computed value

=back

Unfortunately Perl is rather slow at computing the hash value (and
somewhat slow at splitting into columns).

One solution is to use a compiled language for the splitting and
hashing, but that would go against the design criteria of not
depending on a compiler.

Luckily those tasks can be parallelized. So GNU B<parallel> starts n
sharders that do step 2-6, and passes blocks of 100k to each of those
in a round robin manner. To make sure these sharders compute the hash
the same way, $PERL_HASH_SEED is set to the same value for all sharders.
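
Steps 2-6 for a single line can be sketched like this (simplified; the
tab separated input, the column number, and the use of B::hash are
illustrative assumptions, not the exact sharder code):

  use B;                          # B::hash exposes Perl's string hash
  my $n      = 4;                 # number of jobslots
  my $line   = "foo\tbar\tbaz\n";
  my @column = split /\t/, $line;          # step 2: split into columns
  my $data   = $column[1];                 # step 3: the relevant column
  my $hash   = hex B::hash($data);         # step 4: hash the data
  my $slot   = $hash % $n;                 # step 5: modulo n
  # step 6: the full $line is then written to the pipe of jobslot $slot.
  # $PERL_HASH_SEED must be identical in all sharders for this to agree.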

Running n sharders poses a new problem: Instead of having n outputs
(one for each computed value) you now have n outputs for each of the n
values, so in total n*n outputs; and you need to merge these n*n
outputs together into n outputs.

This can be done by simply running 'parallel -j0 --lb cat :::
outputs_for_one_value', but that is rather inefficient, as it spawns a
process for each file. Instead the core code from 'parcat' is run,
which is also a bit faster.

All the sharders and parcats communicate through named pipes that are
unlinked as soon as they are opened.


=head2 Shell shock

The shell shock bug in B<bash> did not affect GNU B<parallel>, but the
solutions did. B<bash> first introduced functions in variables named:
I<BASH_FUNC_myfunc()> and later changed that to
I<BASH_FUNC_myfunc%%>. When transferring functions GNU B<parallel>
reads off the function and changes that into a function definition,
which is copied to the remote system and executed before the actual
command is executed. Therefore GNU B<parallel> needs to know how to
read the function.

From version 20150122 GNU B<parallel> tries both the ()-version and
the %%-version, and the function definition works on both pre- and
post-shell shock versions of B<bash>.
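
Recognizing both naming schemes can be sketched like this (a
simplified illustration, not the actual code):

  # Accept both the pre- and post-shellshock names in %ENV
  my %bash_func;
  for my $name (keys %ENV) {
      if($name =~ /^BASH_FUNC_(\S+?)(\(\)|%%)$/) {
          # Works for BASH_FUNC_myfunc() as well as BASH_FUNC_myfunc%%
          $bash_func{$1} = $ENV{$name};
      }
  }
  # The value starts with "() {", so name.value is a function definition
  my $definitions = join "\n",
      map { $_.$bash_func{$_} } keys %bash_func;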


=head2 The remote system wrapper

The remote system wrapper does some initialization before starting the
command on the remote system.


=head3 Make quoting unnecessary by hex encoding everything

When you run B<ssh server foo> then B<foo> has to be quoted once:

  ssh server "echo foo; echo bar"

If you run B<ssh server1 ssh server2 foo> then B<foo> has to be quoted
twice:

  ssh server1 ssh server2 \'"echo foo; echo bar"\'

GNU B<parallel> avoids this by packing everything into hex values and
running a command that does not need quoting:

  perl -X -e GNU_Parallel_worker,eval+pack+q/H10000000/,join+q//,@ARGV

This command reads hex from the command line and converts that to
bytes that are then eval'ed as a Perl expression.

The string B<GNU_Parallel_worker> is not needed. It is simply there to
let the user know that this process is GNU B<parallel> working.


=head3 Ctrl-C and standard error (stderr)

If the user presses Ctrl-C the user expects jobs to stop. This works
out of the box if the jobs are run locally. Unfortunately it is not so
simple if the jobs are run remotely.

If remote jobs are run in a tty using B<ssh -tt>, then Ctrl-C works,
but all output to standard error (stderr) is sent to standard output
(stdout). This is not what the user expects.

If remote jobs are run without a tty using B<ssh> (without B<-tt>),
then output to standard error (stderr) is kept on stderr, but Ctrl-C
does not kill remote jobs. This is not what the user expects.

So what is needed is a way to have both. It seems the reason why
Ctrl-C does not kill the remote jobs is that the shell does not
propagate the hang-up signal from B<sshd>. But when B<sshd> dies, the
parent of the login shell becomes B<init> (process id 1). So by
exec'ing a Perl wrapper to monitor the parent pid and kill the child
if the parent pid becomes 1, then Ctrl-C works and stderr is kept on
stderr.

Ctrl-C does, however, kill the ssh connection, so any output from
a remote dying process is lost.

To be able to kill all (grand)*children a new process group is
started.


=head3 --nice

B<nice>ing the remote process is done by B<setpriority(0,0,$nice)>. A
few old systems do not implement this and B<--nice> is unsupported on
those.


=head3 Setting $PARALLEL_TMP

B<$PARALLEL_TMP> is used by B<--fifo> and B<--cat> and must point to a
non-existent file in B<$TMPDIR>. This file name is computed on the
remote system.


=head3 The wrapper

The wrapper looks like this:

  $shell = $PARALLEL_SHELL || $SHELL;
  $tmpdir = $TMPDIR || $PARALLEL_REMOTE_TMPDIR;
  $nice = $opt::nice;
  $termseq = $opt::termseq;

  # Check that $tmpdir is writable
  -w $tmpdir ||
      die("$tmpdir is not writable.".
          " Set PARALLEL_REMOTE_TMPDIR");
  # Set $PARALLEL_TMP to a non-existent file name in $TMPDIR
  do {
      $ENV{PARALLEL_TMP} = $tmpdir."/par".
          join"", map { (0..9,"a".."z","A".."Z")[rand(62)] } (1..5);
  } while(-e $ENV{PARALLEL_TMP});
  # Set $script to a non-existent file name in $TMPDIR
  do {
      $script = $tmpdir."/par".
          join"", map { (0..9,"a".."z","A".."Z")[rand(62)] } (1..5);
  } while(-e $script);
  # Create a script from the hex code
  # that removes itself and runs the commands
  open($fh,">",$script) || die;
  # ' needed due to rc-shell
  print($fh("rm \'$script\'\n",$bashfunc.$cmd));
  close $fh;
  my $parent = getppid;
  my $done = 0;
  $SIG{CHLD} = sub { $done = 1; };
  $pid = fork;
  unless($pid) {
      # Make own process group to be able to kill HUP it later
      eval { setpgrp };
      # Set nice value
      eval { setpriority(0,0,$nice) };
      # Run the script
      exec($shell,$script);
      die("exec failed: $!");
  }
  while((not $done) and (getppid == $parent)) {
      # Parent pid is not changed, so sshd is alive
      # Exponential sleep up to 1 sec
      $s = $s < 1 ? 0.001 + $s * 1.03 : $s;
      select(undef, undef, undef, $s);
  }
  if(not $done) {
      # sshd is dead: User pressed Ctrl-C
      # Kill as per --termseq
      my @term_seq = split/,/,$termseq;
      if(not @term_seq) {
          @term_seq = ("TERM",200,"TERM",100,"TERM",50,"KILL",25);
      }
      while(@term_seq && kill(0,-$pid)) {
          kill(shift @term_seq, -$pid);
          select(undef, undef, undef, (shift @term_seq)/1000);
      }
  }
  wait;
  exit ($?&127 ? 128+($?&127) : 1+$?>>8)


=head2 Transferring of variables and functions

Transferring of variables and functions given by B<--env> is done by
running a Perl script remotely that calls the actual command. The Perl
script sets B<$ENV{>I<variable>B<}> to the correct value before
exec'ing a shell that runs the function definition followed by the
actual command.

The function B<env_parallel> copies the full current environment into
the environment variable B<PARALLEL_ENV>. This variable is picked up
by GNU B<parallel> and used to create the Perl script mentioned above.


=head2 Base64 encoded bzip2

B<csh> limits words of commands to 1024 chars. This is often too little
when GNU B<parallel> encodes environment variables and wraps the
command with different templates. All of these are combined and quoted
into one single word, which often is longer than 1024 chars.

When the line to run is > 1000 chars, GNU B<parallel> therefore
encodes the line to run. The encoding B<bzip2>s the line to run,
converts this to base64, splits the base64 into 1000 char blocks (so
B<csh> does not fail), and prepends it with this Perl script that
decodes, decompresses and B<eval>s the line.

  @GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
  eval "@GNU_Parallel";

  $SIG{CHLD}="IGNORE";
  # Search for bzip2. Not found => use default path
  my $zip = (grep { -x $_ } "/usr/local/bin/bzip2")[0] || "bzip2";
  # $in = stdin on $zip, $out = stdout from $zip
  my($in, $out,$eval);
  open3($in,$out,">&STDERR",$zip,"-dc");
  if(my $perlpid = fork) {
      close $in;
      $eval = join "", <$out>;
      close $out;
  } else {
      close $out;
      # Pipe decoded base64 into 'bzip2 -dc'
      print $in (decode_base64(join"",@ARGV));
      close $in;
      exit;
  }
  wait;
  eval $eval;
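
The encoding side can be sketched like this (simplified; it assumes
the line fits comfortably in the pipe buffers, which the real code
does not have to):

  use IPC::Open2;
  use MIME::Base64;
  my $line = "the long command line to run";
  # bzip2 the line
  my $pid = open2(my $out, my $in, "bzip2", "-9");
  print $in $line;
  close $in;
  my $compressed = join "", <$out>;
  close $out;
  waitpid $pid, 0;
  # base64 encode and chop into 1000 char blocks so csh does not choke
  my $b64    = encode_base64($compressed, "");
  my @blocks = unpack "(A1000)*", $b64;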

Perl and B<bzip2> must be installed on the remote system, but a small
test showed that B<bzip2> is installed by default on all platforms
that run GNU B<parallel>, so this is not a big problem.

The added bonus of this is that much bigger environments can now be
transferred as they will be below B<bash>'s limit of 131072 chars.


=head2 Which shell to use

Different shells behave differently. A command that works in B<tcsh>
may not work in B<bash>. It is therefore important that the correct
shell is used when GNU B<parallel> executes commands.

GNU B<parallel> tries hard to use the right shell. If GNU B<parallel>
is called from B<tcsh> it will use B<tcsh>. If it is called from
B<bash> it will use B<bash>. It does this by looking at the
(grand)*parent process: If the (grand)*parent process is a shell, use
this shell; otherwise look at the parent of this (grand)*parent. If
none of the (grand)*parents are shells, then $SHELL is used.
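
The search can be sketched like this (simplified; the real code knows
more shells and is more careful about parsing the B<ps> output):

  my %is_shell = map { $_ => 1 }
      qw(sh ash dash bash zsh ksh mksh csh tcsh fish rc);
  sub parent_shell {
      my $pid = shift;
      while($pid > 1) {
          # ppid and command name of $pid (ps options vary between systems)
          my $ps = `ps -o ppid= -o comm= -p $pid`;
          my ($ppid, $comm) = ($ps =~ /(\d+)\s+(\S+)/);
          last if not defined $ppid;
          (my $base = $comm) =~ s:.*/::;   # /bin/bash -> bash
          $base =~ s/^-//;                 # login shells show up as -bash
          return $base if $is_shell{$base};
          $pid = $ppid;                    # otherwise look at the parent
      }
      return $ENV{SHELL};                  # no shell found: fall back to $SHELL
  }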

This will do the right thing if called from:

=over 2

=item *

an interactive shell

=item *

a shell script

=item *

a Perl script in `` or using B<system> if called as a single string.

=back

While these cover most cases, there are situations where it will fail:

=over 2

=item *

When run using B<exec>.

=item *

When run as the last command using B<-c> from another shell (because
some shells use B<exec>):

  zsh% bash -c "parallel 'echo {} is not run in bash; \
    set | grep BASH_VERSION' ::: This"

You can work around that by appending '&& true':

  zsh% bash -c "parallel 'echo {} is run in bash; \
    set | grep BASH_VERSION' ::: This && true"

=item *

When run in a Perl script using B<system> with parallel as the first
string:

  #!/usr/bin/perl

  system("parallel",'setenv a {}; echo $a',":::",2);

Here it depends on which shell is used to call the Perl script. If the
Perl script is called from B<tcsh> it will work just fine, but if it
is called from B<bash> it will fail, because the command B<setenv> is
not known to B<bash>.

=back

If GNU B<parallel> guesses wrong in these situations, set the shell using
B<$PARALLEL_SHELL>.


=head2 Always running commands in a shell

If the command is a simple command with no redirection and setting of
variables, the command I<could> be run without spawning a
shell. E.g. this simple B<grep> matching either 'ls ' or ' wc E<gt>E<gt> c':

  parallel "grep -E 'ls | wc >> c' {}" ::: foo

could be run as:

  system("grep","-E","ls | wc >> c","foo");

However, as soon as the command is a bit more complex a shell I<must>
be spawned:

  parallel "grep -E 'ls | wc >> c' {} | wc >> c" ::: foo
  parallel "LANG=C grep -E 'ls | wc >> c' {}" ::: foo

It is impossible to tell how B<| wc E<gt>E<gt> c> should be
interpreted without parsing the string (is the B<|> a pipe in shell or
an alternation in a B<grep> regexp? Is B<LANG=C> a command in B<csh>
or setting a variable in B<bash>? Is B<E<gt>E<gt>> redirection or part
of a regexp?).

On top of this, wrapper scripts will often require a shell to be
spawned.

The downside is that you need to quote special shell chars twice:

  parallel echo '*' ::: This will expand the asterisk
  parallel echo "'*'" ::: This will not
  parallel "echo '*'" ::: This will not
  parallel echo '\*' ::: This will not
  parallel echo \''*'\' ::: This will not
  parallel -q echo '*' ::: This will not

B<-q> will quote all special chars, thus redirection will not work:
this prints '* > out.1' and I<does not> save '*' into the file out.1:

  parallel -q echo "*" ">" out.{} ::: 1

GNU B<parallel> tries to live up to the Principle Of Least Astonishment
(POLA), and the requirement of using B<-q> is hard to understand when
you do not see the whole picture.


=head2 Quoting

Quoting depends on the shell. For most shells '-quoting is used for
strings containing special characters.

For B<tcsh>/B<csh> newline is quoted as \ followed by newline. Other
special characters are also \-quoted.

For B<rc> everything is quoted using '.


=head2 --pipepart vs. --pipe

While B<--pipe> and B<--pipepart> look much the same to the user, they are
implemented very differently.

With B<--pipe> GNU B<parallel> reads the blocks from standard input
(stdin), which is then given to the command on standard input (stdin);
so every block is being processed by GNU B<parallel> itself. This is
the reason why B<--pipe> maxes out at around 500 MB/sec.

B<--pipepart>, on the other hand, first identifies at which byte
positions blocks start and how long they are. It does that by seeking
into the file by the size of a block and then reading until it meets
the end of a block. The seeking explains why GNU B<parallel> does not
know the line number and why B<-L/-l> and B<-N> do not work.

With a reasonable block and file size this seeking is more than 1000
times faster than reading the full file. The byte positions are then
given to a small script that reads from position X to Y and sends
output to standard output (stdout). This small script is prepended to
the command and the full command is executed just as if GNU
B<parallel> had been in its normal mode. The script looks like this:

  < file perl -e 'while(@ARGV) {
      sysseek(STDIN,shift,0) || die;
      $left = shift;
      while($read = sysread(STDIN,$buf,
                            ($left > 131072 ? 131072 : $left))){
          $left -= $read; syswrite(STDOUT,$buf);
      }
  }' startbyte length_in_bytes

It delivers 1 GB/s per core.

Instead of the script B<dd> was tried, but many versions of B<dd> do
not support reading from one byte to another and might cause partial
data. See this for a surprising example:

  yes | dd bs=1024k count=10 | wc


=head2 --block-size adjustment

Every time GNU B<parallel> detects a record bigger than
B<--block-size> it increases the block size by 30%. A small
B<--block-size> gives very poor performance; by exponentially
increasing the block size performance will not suffer.

GNU B<parallel> will waste CPU power if B<--block-size> does not
contain a full record, because it tries to find a full record and will
fail to do so. The recommendation is therefore to use a
B<--block-size> > 2 records, so you always get at least one full
record when you read one block.

If you use B<-N> then B<--block-size> should be big enough to contain
N+1 records.
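
The adjustment itself is simple and can be sketched like this (a
simplified illustration, not the actual code):

  # Grow the block size by 30% when a record did not fit in a block
  sub adjust_blocksize {
      my ($blocksize, $record_was_bigger) = @_;
      return $record_was_bigger ? int($blocksize * 1.3) : $blocksize;
  }
  my $blocksize = 1_000_000;
  $blocksize = adjust_blocksize($blocksize, 1);   # now 1_300_000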


=head2 Automatic --block-size computation

With B<--pipepart> GNU B<parallel> can compute the B<--block-size>
automatically. A B<--block-size> of B<-1> will use a block size so
that each jobslot will receive approximately 1 block. B<--block -2>
will pass 2 blocks to each jobslot and B<->I<n> will pass I<n> blocks
to each jobslot.

This can be done because B<--pipepart> reads from files, and we can
compute the total size of the input.


=head2 --jobs and --onall

When running the same commands on many servers what should B<--jobs>
signify? Is it the number of servers to run on in parallel? Is it the
number of jobs run in parallel on each server?

GNU B<parallel> lets B<--jobs> represent the number of servers to run
on in parallel. This is to make it possible to run a sequence of
commands (that cannot be parallelized) on each server, but run the
same sequence on multiple servers.


=head2 --shuf

When using B<--shuf> to shuffle the jobs, all jobs are read, then they
are shuffled, and finally executed. When using SQL this makes the
B<--sqlmaster> be the part that shuffles the jobs. The B<--sqlworker>s
simply execute according to the Seq number.


=head2 --csv

B<--pipepart> is incompatible with B<--csv> because you can have
records like:

  a,b,c
  a,"
  a,b,c
  a,b,c
  a,b,c
  ",c
  a,b,c

Here the second record contains a multi-line field that looks like
records. Since B<--pipepart> does not read the whole file when
searching for record endings, it may start reading in this multi-line
field, which would be wrong.


=head2 Buffering on disk

GNU B<parallel> buffers output, because if output is not buffered you
have to be ridiculously careful on sizes to avoid mixing of outputs
(see excellent example on https://catern.com/posts/pipes.html).

GNU B<parallel> buffers on disk in $TMPDIR using files that are
removed as soon as they are created, but which are kept open. So even
if GNU B<parallel> is killed by a power outage, there will be no files
to clean up afterwards. Another advantage is that the file system is
aware that these files will be lost in case of a crash, so it does
not need to sync them to disk.

It gives the odd situation that a disk can be fully used, but there
are no visible files on it.
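
The trick can be sketched like this (a minimal illustration):

  use File::Temp qw(tempfile);
  # A buffer file that takes up disk space but is invisible in the
  # file system, so a crash leaves nothing to clean up.
  my ($fh, $name) = tempfile(DIR => $ENV{TMPDIR} || "/tmp");
  unlink $name;                  # removed immediately, but $fh stays usable
  print $fh "buffered output\n";
  seek($fh, 0, 0);
  print scalar <$fh>;            # the data can still be read back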


=head3 Partly buffering in memory

When using the output formats SQL and CSV, GNU B<parallel> has to read
the whole output into memory. When run normally it will only read the
output from a single job. But when using B<--linebuffer> every line
printed will also be buffered in memory - for all jobs currently
running.

If memory is tight, then do not use the output format SQL/CSV with
B<--linebuffer>.


=head3 Comparing to buffering in memory

B<gargs> is a parallelizing tool that buffers in memory. It is
therefore a useful way of comparing the advantages and disadvantages
of buffering in memory to buffering on disk.

On a system with 6 GB RAM free and 6 GB free swap these were tested
with different sizes:

  echo /dev/zero | gargs "head -c $size {}" >/dev/null
  echo /dev/zero | parallel "head -c $size {}" >/dev/null

The results are here:

  JobRuntime  Command
       0.344  parallel_test 1M
       0.362  parallel_test 10M
       0.640  parallel_test 100M
       9.818  parallel_test 1000M
      23.888  parallel_test 2000M
      30.217  parallel_test 2500M
      30.963  parallel_test 2750M
      34.648  parallel_test 3000M
      43.302  parallel_test 4000M
      55.167  parallel_test 5000M
      67.493  parallel_test 6000M
     178.654  parallel_test 7000M
     204.138  parallel_test 8000M
     230.052  parallel_test 9000M
     255.639  parallel_test 10000M
     757.981  parallel_test 30000M
       0.537  gargs_test 1M
       0.292  gargs_test 10M
       0.398  gargs_test 100M
       3.456  gargs_test 1000M
       8.577  gargs_test 2000M
      22.705  gargs_test 2500M
     123.076  gargs_test 2750M
      89.866  gargs_test 3000M
     291.798  gargs_test 4000M

GNU B<parallel> is pretty much limited by the speed of the disk: Up to
6 GB data is written to disk but cached, so reading is fast. Above 6
GB data are both written and read from disk. When the 30000MB job is
running, the disk system is slow, but usable: If you are not using the
disk, you almost do not feel it.

B<gargs> has a speed advantage up until 2500M where it hits a
wall. Then the system starts swapping like crazy and is completely
unusable. At 5000M it goes out of memory.

You can make GNU B<parallel> behave similar to B<gargs> if you point
$TMPDIR to a tmpfs-filesystem: It will be faster for small outputs,
but may kill your system for larger outputs and cause you to lose
output.


=head2 Disk full

GNU B<parallel> buffers on disk. If the disk is full, data may be
lost. To check if the disk is full GNU B<parallel> writes an 8193 byte
file every second. If this file is written successfully, it is removed
immediately. If it is not written successfully, the disk is full. The
size 8193 was chosen because 8192 gave the wrong result on some file
systems, whereas 8193 did the correct thing on all tested filesystems.
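
The probe can be sketched like this (a simplified stand-alone version;
the file name is made up for the example):

  sub disk_full {
      my $dir  = shift;
      my $file = "$dir/.disk-full-test-$$";
      # 8193 bytes: 8192 gave the wrong answer on some file systems
      open(my $fh, ">", $file) or return 1;
      my $written = syswrite($fh, "x" x 8193) || 0;
      close $fh;
      unlink $file;
      return $written != 8193;
  }
  print "disk full\n" if disk_full($ENV{TMPDIR} || "/tmp");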


=head2 Memory usage

Normally GNU B<parallel> will use around 17 MB RAM constantly - no
matter how many jobs or how much output there is. There are a few
things that cause the memory usage to rise:

=over 3

=item *

Multiple input sources. GNU B<parallel> reads an input source only
once. This is by design, as an input source can be a stream
(e.g. FIFO, pipe, standard input (stdin)) which cannot be rewound and
read again. When reading a single input source, the memory is freed as
soon as the job is done - thus keeping the memory usage constant.

But when reading multiple input sources GNU B<parallel> keeps the
already read values for generating all combinations with other input
sources.

=item *

Computing the number of jobs. B<--bar>, B<--eta>, and B<--halt xx%>
use B<total_jobs()> to compute the total number of jobs. It does this
by generating the data structures for all jobs. All these job data
structures will be stored in memory and take up around 400 bytes/job.

=item *

Buffering a full line. B<--linebuffer> will read a full line per
running job. A very long output line (say 1 GB without \n) will
increase RAM usage temporarily: From when the beginning of the line is
read till the line is printed.

=item *

Buffering the full output of a single job. This happens when using
B<--results *.csv/*.tsv> or B<--sql*>. Here GNU B<parallel> will read
the whole output of a single job and save it as csv/tsv or SQL.

=back


=head2 Argument separators ::: :::: :::+ ::::+

The argument separator B<:::> was chosen because I have never seen
B<:::> used in any command. The natural choice B<--> would be a bad
idea since it is not unlikely that the template command will contain
B<-->. I have seen B<::> used in programming languages to separate
classes, and I did not want the user to be confused that the separator
had anything to do with classes.

B<:::> also makes a visual separation, which is good if there are
multiple B<:::>.

When B<:::> was chosen, B<::::> came as a fairly natural extension.

Linking input sources meant having to decide for some way to indicate
linking of B<:::> and B<::::>. B<:::+> and B<::::+> were chosen, so
that they were similar to B<:::> and B<::::>.

In 2022 I realized that B<///> would have been an even better choice,
because you cannot have a file named B<///> whereas you I<can> have a
file named B<:::>.


=head2 Perl replacement strings, {= =}, and --rpl

The shorthands for replacement strings make a command look more
cryptic. Different users will need different replacement
strings. Instead of inventing more shorthands you get more
flexible replacement strings if they can be programmed by the user.

The language Perl was chosen because GNU B<parallel> is written in
Perl and it was easy and reasonably fast to run the code given by the
user.

If a user needs the same programmed replacement string again and
again, the user may want to make his own shorthand for it. This is
what B<--rpl> is for. It works so well, that even GNU B<parallel>'s
own shorthands are implemented using B<--rpl>.

In Perl code the bigrams B<{=> and B<=}> rarely exist. They look like a
matching pair and can be entered on all keyboards. This made them good
candidates for enclosing the Perl expression in the replacement
strings. Another candidate ,, and ,, was rejected because they do not
look like a matching pair. B<--parens> was made, so that the users can
still use ,, and ,, if they like: B<--parens ,,,,>

Internally, however, the B<{=> and B<=}> are replaced by \257< and
\257>. This is to make it simpler to make regular expressions. You
only need to look one character ahead, and never have to look behind.
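
A sketch of the idea (simplified; the real parser also handles
B<--parens>, quoting, and nesting):

  # The user visible {= perlexpr =} becomes \257< perlexpr \257>
  my $cmd = 'echo {= s/\.txt$// =}';
  $cmd =~ s/\Q{=\E/\257</g;
  $cmd =~ s/\Q=}\E/\257>/g;
  # A replacement string is now anything between \257< and \257>, and
  # [^\257]* never needs to look more than one character ahead.
  while($cmd =~ /\257<([^\257]*)\257>/g) {
      print "perl expression:$1\n";
  }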


=head2 Test suite

GNU B<parallel> uses its own testing framework. This is mostly due to
historical reasons. It deals reasonably well with tests that are
dependent on how long a given test runs (e.g. more than 10 secs is a
pass, but less is a fail). It parallelizes most tests, but it is easy
to force a test to run as the single test (which may be important for
timing issues). It deals reasonably well with tests that fail
intermittently. It detects which tests failed and pushes these to the
top, so when running the test suite again, the tests that failed most
recently are run first.

If GNU B<parallel> should adopt a real testing framework then those
elements would be important.

Since many tests are dependent on which hardware they are running on,
these tests break when run on different hardware than what the tests
were written for.

For most bugs a test is added when the bug is fixed, so the bug will
not reappear. It is, however, sometimes hard to create the environment
in which the bug shows up - especially if the bug only shows up
sometimes. One of the harder problems was to make a machine start
swapping without forcing it to its knees.


=head2 Median run time

Using a percentage for B<--timeout> causes GNU B<parallel> to compute
the median run time of a job. The median is a better indicator of the
expected run time than the average, because there will often be
outliers taking way longer than the normal run time.

To avoid keeping all run times in memory, an implementation of
remedian was made (Rousseeuw et al.).


=head2 Error messages and warnings

Error messages like: ERROR, Not found, and 42 are not very
helpful. GNU B<parallel> strives to inform the user:

=over 2

=item *

What went wrong?

=item *

Why did it go wrong?

=item *

What can be done about it?

=back

Unfortunately it is not always possible to predict the root cause of
the error.


=head2 Determine number of CPUs

CPUs is an ambiguous term. It can mean the number of sockets filled
(i.e. the number of physical chips). It can mean the number of cores
(i.e. the number of physical compute cores). It can mean the number of
hyperthreaded cores (i.e. the number of virtual cores - with some of
them possibly being hyperthreaded).

On ark.intel.com Intel uses the terms I<cores> and I<threads> for
number of physical cores and the number of hyperthreaded cores
respectively.

GNU B<parallel> uses I<CPUs> as the number of compute units and
the terms I<sockets>, I<cores>, and I<threads> to specify how the
number of compute units is calculated.


=head2 Computation of load

Contrary to the obvious, B<--load> does not use the load average. This
is due to the load average rising too slowly. Instead it uses B<ps> to
list the number of threads in running or blocked state (state D, O or
R). This gives an instant load.

As remote calculation of load can be slow, a process is spawned to run
B<ps> and put the result in a file, which is then used next time.
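
The local computation can be sketched like this (simplified; B<ps>
options and state letters vary between systems):

  # Instant load = number of processes in running or blocked state
  my $instant_load = grep { /^\s*[DOR]/ } `ps -e -o state=`;
  print "instant load: $instant_load\n";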


=head2 Killing jobs

GNU B<parallel> kills jobs. It can be due to B<--memfree>, B<--halt>,
or when GNU B<parallel> meets a condition from which it cannot
recover. Every job is started as its own process group. This way any
(grand)*children will get killed, too. The process group is killed
with the specification mentioned in B<--termseq>.


=head2 SQL interface

GNU B<parallel> uses the DBURL from GNU B<sql> to give database
software, username, password, host, port, database, and table in a
single string.

The DBURL must point to a table name. The table will be dropped and
created. The reason for not reusing an existing table is that the user
may have added more input sources which would require more columns in
the table. By prepending '+' to the DBURL the table will not be
dropped.

The table columns are similar to joblog with the addition of B<V1>
.. B<Vn> which are values from the input sources, and Stdout and
Stderr which are the output from standard output and standard error,
respectively.

The Signal column has been renamed to _Signal due to Signal being a
reserved word in MySQL.


=head2 Logo

The logo is inspired by the Cafe Wall illusion. The font is DejaVu
Sans.


=head2 Citation notice

For details: See
https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt

Funding a free software project is hard. GNU B<parallel> is no
exception. On top of that it seems the less visible a project is, the
harder it is to get funding. And the nature of GNU B<parallel> is that
it will never be seen by "the guy with the checkbook", but only by the
people doing the actual work.

This problem has been covered by others - though no solution has been
found: https://www.slideshare.net/NadiaEghbal/consider-the-maintainer
https://www.numfocus.org/blog/why-is-numpy-only-now-getting-funded/

Before implementing the citation notice it was discussed with the
users:
https://lists.gnu.org/archive/html/parallel/2013-11/msg00006.html

Having to spend 10 seconds on running B<parallel --citation> once is
no doubt not an ideal solution, but no one has so far come up with an
ideal solution - neither for funding GNU B<parallel> nor other free
software.

If you believe you have the perfect solution, you should try it out,
and if it works, you should post it on the email list. Ideas that will
cost work and which have not been tested are, however, unlikely to be
prioritized.

Running B<parallel --citation> one single time takes less than 10
seconds, and will silence the citation notice for future runs. This is
comparable to graphical tools where you have to click a checkbox
saying "Do not show this again". But if that is too much trouble for
you, why not use one of the alternatives instead? See a list in:
B<man parallel_alternatives>.

As the request for citation is not a legal requirement this is
acceptable under GPLv3 and cleared with Richard M. Stallman
himself. Thus it does not fall under this:
https://www.gnu.org/licenses/gpl-faq.en.html#RequireCitation


=head1 Ideas for new design

=head2 Multiple processes working together

Open3 is slow. Printing is slow. It would be good if they did not tie
up resources, but were run in separate threads.


=head2 --rrs on remote using a perl wrapper

  ... | perl -pe '$/=$recend$recstart;BEGIN{ if(substr($_) eq $recstart) substr($_)="" } eof and substr($_) eq $recend) substr($_)=""

It ought to be possible to write a filter that removed rec sep on the
fly instead of inside GNU B<parallel>. This could then use more cpus.

Will that require 2x record size memory?

Will that require 2x block size memory?


=head1 Historical decisions

These decisions were relevant for earlier versions of GNU B<parallel>,
but not the current version. They are kept here as historical record.


=head2 --tollef

You can read about the history of GNU B<parallel> on
https://www.gnu.org/software/parallel/history.html

B<--tollef> was included to make GNU B<parallel> switch compatible
with the parallel from moreutils (which is made by Tollef Fog
Heen). This was done so that users of that parallel easily could port
their use to GNU B<parallel>: Simply set B<PARALLEL="--tollef"> and
that would be it.

But several distributions chose to make B<--tollef> global (by putting
it into /etc/parallel/config) without making the users aware of this,
and that caused much confusion when people tried out the examples from
GNU B<parallel>'s man page and these did not work. The users became
frustrated because the distribution did not make it clear to them that
it had made B<--tollef> global.

So to lessen the frustration and the resulting support, B<--tollef>
was obsoleted 20130222 and removed one year later.


=cut