Released as 20240522 ('Tbilisi')
#!/usr/bin/perl -w

# SPDX-FileCopyrightText: 2021-2024 Ole Tange, http://ole.tange.dk and Free Software Foundation, Inc.
# SPDX-License-Identifier: GFDL-1.3-or-later
# SPDX-License-Identifier: CC-BY-SA-4.0

=encoding utf8

=head1 Why should you read this book?

If you write shell scripts to do the same processing for different
input, then GNU B<parallel> will make your life easier and make your
scripts run faster.

The book is written so you get the juicy parts first: The goal is that
you read just enough to get you going. GNU B<parallel> has an
overwhelming amount of special features to help in different
situations, and to avoid overloading you with information, the most
used features are presented first.

All the examples are tested in Bash. Most will also work in other
shells, but there are a few exceptions, so you are recommended to use
Bash while trying out the examples.

=head1 Learn GNU Parallel in 5 minutes

You just need to run commands in parallel. You do not care about fine
tuning.

To get going please run this to make some example files:

  # If your system does not have 'seq', replace 'seq' with 'jot'
  seq 5 | parallel seq {} '>' example.{}

=head2 Input sources

GNU B<parallel> reads values from input sources. One input source is
the command line. The values are put after B<:::> :

  parallel echo ::: 1 2 3 4 5

This makes it easy to run the same program on some files:

  parallel wc ::: example.*

If you give multiple B<:::>s, GNU B<parallel> will generate all
combinations:

  parallel wc ::: -l -c ::: example.*

GNU B<parallel> can also read the values from stdin (standard input):

  seq 5 | parallel echo

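As a small sketch of how combinations are generated, the following
uses two B<:::> input sources with made-up values (a, b and 1, 2).
The B<sort> is only there to make the listing stable, because jobs
may finish in any order:

```shell
# Two input sources give every combination: 2 x 2 = 4 jobs.
# The values a, b, 1 and 2 are placeholders chosen for illustration.
combos=$(parallel echo ::: a b ::: 1 2 | sort)
printf '%s\n' "$combos"
```

Each job echoes one value from each input source, for example "a 1".
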
=head2 Building the command line

The command line is put before the B<:::>. It can contain a
command and options for the command:

  parallel wc -l ::: example.*

The command can contain multiple programs. Just remember to quote
characters that are interpreted by the shell (such as B<;>):

  parallel echo counting lines';' wc -l ::: example.*

The value will normally be appended to the command, but can be placed
anywhere by using the replacement string B<{}>:

  parallel echo counting {}';' wc -l {} ::: example.*

When using multiple input sources you use the positional replacement
strings B<{1}> and B<{2}>:

  parallel echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

You can check what will be run with B<--dry-run>:

  parallel --dry-run echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

This is a good idea to do for every command until you are comfortable
with GNU B<parallel>.

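A minimal sketch of inspecting the generated commands before running
them. The file name a.txt is hypothetical and does not need to exist,
because B<--dry-run> only prints the command lines:

```shell
# --dry-run prints each command line instead of running it.
# a.txt is a hypothetical file name used only as text.
plan=$(parallel --dry-run wc {1} {2} ::: -l -c ::: a.txt | sort)
printf '%s\n' "$plan"
```

The listing should contain one B<wc> command per combination of
option and file.
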
=head2 Controlling the output

The output will be printed as soon as the command completes. This
means the output may come in a different order than the input:

  parallel sleep {}';' echo {} done ::: 5 4 3 2 1

You can force GNU B<parallel> to print in the order of the values with
B<--keep-order>/B<-k>. This will still run the commands in parallel.
The output of the later jobs will be delayed until the earlier jobs
are printed:

  parallel -k sleep {}';' echo {} done ::: 5 4 3 2 1

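The same idea as a quicker sketch that captures the output in a
variable. It assumes a B<sleep> that accepts fractional seconds (GNU
and BSD B<sleep> do):

```shell
# The jobs sleep 0.3, 0.2 and 0.1 seconds, so the last job finishes first.
# With -k the output still follows the order of the input values.
ordered=$(parallel -k sleep 0.{}';' echo {} done ::: 3 2 1)
printf '%s\n' "$ordered"
```

With B<-k> this prints "3 done", "2 done", "1 done" in that order.
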
=head2 Controlling the execution

If your jobs are compute intensive, you will most likely run one job
for each core in the system. This is the default for GNU B<parallel>.

But sometimes you want more jobs running. You control the number of
job slots with B<-j>. Give B<-j> the number of jobs to run in
parallel:

  parallel -j50 \
    wget https://ftpmirror.gnu.org/parallel/parallel-{1}{2}22.tar.bz2 \
    ::: 2012 2013 2014 2015 2016 \
    ::: 01 02 03 04 05 06 07 08 09 10 11 12

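To see the effect of B<-j> without downloading anything, here is a
small sketch that limits GNU B<parallel> to 2 job slots. It uses the
replacement string B<{%}> (the job slot number, described later in
this book); the values a to d are placeholders:

```shell
# Run at most 2 jobs at a time. {%} expands to the job slot number,
# so every line should report slot 1 or slot 2.
slots=$(parallel -j2 echo {} ran in slot {%} ::: a b c d)
printf '%s\n' "$slots"
```

Which job lands in which slot can vary between runs.
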
=head2 Pipe mode

GNU B<parallel> can also pass blocks of data to commands on stdin
(standard input):

  seq 1000000 | parallel --pipe wc

This can be used to process big text files. By default GNU B<parallel>
splits on \n (newline) and passes a block of around 1 MB to each job.

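Because blocks are split on newlines, no line is cut in half. A
minimal sketch that checks this by letting each job count the lines in
its block and adding up the per-block counts with B<awk>:

```shell
# Each job counts the lines in its own block; awk adds the counts up.
total=$(seq 1000000 | parallel --pipe wc -l |
  awk '{ sum += $1 } END { print sum }')
echo "$total"
```

The sum equals the number of input lines, 1000000.
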
=head2 That's it

You have now learned the basic use of GNU B<parallel>. This will
probably cover most cases of your use of GNU B<parallel>.

The rest of this document will go into more details on each of the
sections and cover special use cases.


=head1 Learn GNU Parallel in an hour

In this part we will dive deeper into what you learned in the first 5 minutes.

To get going please run this to make some example files:

  seq 6 > seq6
  seq 6 -1 1 > seq-6

=head2 Input sources

On top of the command line, input sources can also be stdin (standard
input or '-'), files and fifos and they can be mixed. Files are given
after B<-a> or B<::::>. So these all do the same:

  parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 ::: 6 5 4 3 2 1
  parallel echo Dice1={1} Dice2={2} :::: <(seq 6) :::: <(seq 6 -1 1)
  parallel echo Dice1={1} Dice2={2} :::: seq6 seq-6
  parallel echo Dice1={1} Dice2={2} :::: seq6 :::: seq-6
  parallel -a seq6 -a seq-6 echo Dice1={1} Dice2={2}
  parallel -a seq6 echo Dice1={1} Dice2={2} :::: seq-6
  parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 :::: seq-6
  cat seq-6 | parallel echo Dice1={1} Dice2={2} :::: seq6 -

If stdin (standard input) is the only input source, you do not need the '-':

  cat seq6 | parallel echo Dice1={1}

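A small sketch of mixing the command line with stdin, without needing
the example files. It generates all dice combinations and then uses
B<awk> to count them and to count the rolls that add up to 7:

```shell
# 6 values from the command line combined with 6 values read from stdin ('-').
rolls=$(seq 6 -1 1 | parallel echo {1} {2} ::: 1 2 3 4 5 6 :::: -)
# Count all combinations, and those where the two dice add up to 7.
printf '%s\n' "$rolls" |
  awk '{ n++ } $1 + $2 == 7 { s++ } END { print n, s }'
```

Two input sources of 6 values each give 36 combinations, of which 6
add up to 7.
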
=head3 Linking input sources

You can link multiple input sources with B<:::+> and B<::::+>:

  parallel echo {1}={2} ::: I II III IV V VI :::+ 1 2 3 4 5 6
  parallel echo {1}={2} ::: I II III IV V VI ::::+ seq6

The B<:::+> (and B<::::+>) will link each value to the corresponding
value in the previous input source, so value number 3 from the first
input source will be linked to value number 3 from the second input
source.

You can combine B<:::+> and B<:::>, so you link 2 input sources, but
generate all combinations with other input sources:

  parallel echo Dice1={1}={2} Dice2={3}={4} ::: I II III IV V VI ::::+ seq6 \
    ::: VI V IV III II I ::::+ seq-6

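A shorter sketch of the same mechanism, small enough to check by hand.
The values are placeholders: I and II are linked to 1 and 2, and the
third input source (x, y) is combined with each linked pair:

```shell
# Linked pairs: I=1 and II=2. Each pair is combined with x and y,
# giving 2 x 2 = 4 jobs. -k keeps the output in input order.
pairs=$(parallel -k echo {1}={2} {3} ::: I II :::+ 1 2 ::: x y)
printf '%s\n' "$pairs"
```

Note that "I" is never paired with "2": linking does not generate
combinations between the two linked input sources.
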
=head2 Building the command line

=head3 The command

The command can be a script, a binary or a Bash function if the
function is exported using B<export -f>:

  # Works only in Bash
  my_func() {
    echo in my_func "$1"
  }
  export -f my_func
  parallel my_func ::: 1 2 3

If the command is complex, it often improves readability to make it
into a function.

=head3 The replacement strings

GNU B<parallel> has some replacement strings to make it easier to
refer to the input read from the input sources.

If the input is B<mydir/mysubdir/myfile.myext> then:

  {}   = mydir/mysubdir/myfile.myext
  {.}  = mydir/mysubdir/myfile
  {/}  = myfile.myext
  {//} = mydir/mysubdir
  {/.} = myfile
  {#}  = the sequence number of the job
  {%}  = the job slot number

When a job is started it gets a sequence number that starts at 1 and
increases by 1 for each new job. The job also gets assigned a slot
number. This number is from 1 to the number of jobs running in
parallel. It is unique between the running jobs, but is re-used as
soon as a job finishes.

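The table above can be tried directly. In this sketch the paths are
only text, so the files do not need to exist; the names a/x.txt and
b/y.txt and the B<.bak> suffix are made up for illustration:

```shell
# Build a new name from {.} and print the other parts of the path.
names=$(parallel echo {.}.bak {/} {//} {/.} ::: mydir/mysubdir/myfile.myext)
echo "$names"
# {#} numbers the jobs in input order; -k keeps the output in that order.
numbered=$(parallel -k echo {#}: {/.} ::: a/x.txt b/y.txt)
printf '%s\n' "$numbered"
```

The first command prints "mydir/mysubdir/myfile.bak myfile.myext
mydir/mysubdir myfile". The second prints "1: x" and "2: y".
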
=head4 The positional replacement strings

The replacement strings have corresponding positional replacement
strings. If the value from the 3rd input source is
B<mydir/mysubdir/myfile.myext>:

  {3}   = mydir/mysubdir/myfile.myext
  {3.}  = mydir/mysubdir/myfile
  {3/}  = myfile.myext
  {3//} = mydir/mysubdir
  {3/.} = myfile

So the number of the input source is simply prepended inside the {}'s.

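A minimal sketch that mixes positional replacement strings from two
input sources. The paths src/main.c and out/main.o are hypothetical
and are only used as text:

```shell
# {1/.} = basename of source 1 without extension, {1//} = its directory,
# {2.} = value from source 2 without extension.
line=$(parallel echo {1/.} from {1//} becomes {2.} ::: src/main.c ::: out/main.o)
echo "$line"
```

This prints "main from src becomes out/main".
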
=head1 Replacement strings

--plus replacement strings

change the replacement string (-I --extensionreplace --basenamereplace --dirnamereplace --basenameextensionreplace --seqreplace --slotreplace)

--header with named replacement string

{= =}

Dynamic replacement strings

=head2 Defining replacement strings


=head2 Copying environment

env_parallel

=head2 Controlling the output

=head3 parset

B<parset> is a shell function to get the output from GNU B<parallel>
into shell variables.

B<parset> is fully supported for B<Bash/Zsh/Ksh> and partially supported
for B<ash/dash>. I will assume you run B<Bash>.

To activate B<parset> you have to run:

  . `which env_parallel.bash`

(replace B<bash> with your shell's name).

Then you can run:

  parset a,b,c seq ::: 4 5 6
  echo "$c"

Or, with the variable names separated by spaces:

  parset 'a b c' seq ::: 4 5 6
  echo "$c"

If you give a single variable, this will become an array:

  parset arr seq ::: 4 5 6
  echo "${arr[1]}"

B<parset> has one limitation: If it reads from a pipe, the output will
be lost.

  echo This will not work | parset myarr echo
  echo Nothing: "${myarr[*]}"

Instead you can do this:

  echo This will work > tempfile
  parset myarr echo < tempfile
  echo "${myarr[*]}"

=head2 Controlling the execution

--dryrun -v

=head2 Remote execution

For this section you must have B<ssh> access with no password to 2
servers: B<$server1> and B<$server2>.

  server1=server.example.com
  server2=server2.example.net

So you must be able to do this:

  ssh $server1 echo works
  ssh $server2 echo works

It can be set up by running 'ssh-keygen -t ed25519; ssh-copy-id $server1'
and using an empty passphrase. Or you can use B<ssh-agent>.

=head3 Workers

=head3 --transferfile

B<--transferfile> I<filename> will transfer I<filename> to the
worker. I<filename> can contain a replacement string:

  parallel -S $server1,$server2 --transferfile {} wc ::: example.*
  parallel -S $server1,$server2 --transferfile {2} \
    echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*

A shorthand for B<--transferfile {}> is B<--transfer>.

=head3 --return


=head3 --cleanup

A shorthand for B<--transfer --return {} --cleanup> is B<--trc {}>.

=head2 Pipe mode

--pipepart


=head2 That's it

=head1 Advanced usage

parset fifo, cmd substitution, array elements, array with var names and cmds, env_parset


env_parallel

Interfacing with R.

Interfacing with JSON/jq

  4dl() {
    # Extract the board and thread fields from the thread URL given as $1
    board="$(printf -- '%s' "${1}" | cut -d '/' -f4)"
    thread="$(printf -- '%s' "${1}" | cut -d '/' -f6)"
    # List the image URLs of the thread and download them in parallel
    wget -qO- "https://a.4cdn.org/${board}/thread/${thread}.json" |
      jq -r '
        .posts
        | map(select(.tim != null))
        | map((.tim | tostring) + .ext)
        | map("https://i.4cdn.org/'"${board}"'/"+.)[]
      ' |
      parallel --gnu -j 0 wget -nv
  }

Interfacing with XML/?

Interfacing with HTML/?

=head2 Controlling the execution

--termseq


=head2 Remote execution

  seq 10 | parallel --sshlogin 'ssh -i "key.pem" a@b.com' echo

  seq 10 | PARALLEL_SSH='ssh -i "key.pem"' parallel --sshlogin a@b.com echo

  seq 10 | parallel --ssh 'ssh -i "key.pem"' --sshlogin a@b.com echo

ssh-agent

The sshlogin file format

Check if servers are up


=cut