2008-01-25 Douglas Gregor <doug.gregor@gmail.com>
[official-gcc.git] / libstdc++-v3 / doc / html / ext / parallel_mode.html
blob7ca8dbe937cc75f9f01ee39066bca57661d05c29
1 <?xml version="1.0" encoding="ISO-8859-1"?>
2 <!DOCTYPE html
3 PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
4 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
6 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
7 <head>
8 <meta name="AUTHOR" content="bkoz@gcc.gnu.org (Benjamin Kosnik)" />
9 <meta name="KEYWORDS" content="c++, libstdc++, gdb, g++, debug" />
10 <meta name="DESCRIPTION" content="The libstdc++ parallel mode." />
11 <meta name="GENERATOR" content="emacs and ten fingers" />
12 <title>The libstdc++ parallel mode</title>
13 <link rel="StyleSheet" href="lib3styles.css" type="text/css" />
14 <link rel="Copyright" href="17_intro/license.html" type="text/html" />
15 </head>
16 <body>
18 <h1 class="centered"><a name="top">The libstdc++ parallel mode</a></h1>
20 <p class="fineprint"><em>
21 The latest version of this document is always available at
22 <a href="http://gcc.gnu.org/onlinedocs/libstdc++/parallel_mode.html">
23 http://gcc.gnu.org/onlinedocs/libstdc++/parallel_mode.html</a>.
24 </em></p>
26 <p><em>
27 To the <a href="http://gcc.gnu.org/libstdc++/">libstdc++ homepage</a>.
28 </em></p>
30 <!-- ####################################################### -->
31 <hr />
32 <p> The libstdc++ parallel mode is an experimental parallel
33 implementation of many algorithms the C++ Standard Library.
34 </p>
36 <p>
37 Several of the standard algorithms, for instance
38 <code>std::sort</code>, are made parallel using OpenMP
39 annotations. These parallel mode constructs and can be invoked by
40 explicit source declaration or by compiling existing sources with a
41 specific compiler flag.
42 </p>
44 <h3 class="left"><a name="parallel">The libstdc++ parallel mode</a></h3>
46 <p>The libstdc++ parallel mode performs parallelization of algorithms,
47 function objects, classes, and functions in the C++ Standard.</p>
49 <h4 class="left">Using the libstdc++ parallel mode</h4>
51 <p>To use the libstdc++ parallel mode, compile your application with
52 the compiler flag <code>-D_GLIBCXX_PARALLEL -fopenmp</code>. This
53 will link in <code>libgomp</code>, the GNU OpenMP <a
54 href="http://gcc.gnu.org/onlinedocs/libgomp">implementation</a>,
55 whose presence is mandatory. In addition, hardware capable of atomic
56 operations is mandatory. Actually activating these atomic
57 operations may require explicit compiler flags on some targets
58 (like sparc and x86), such as <code>-march=i686</code>,
59 <code>-march=native</code> or <code>-mcpu=v9</code>.
60 </p>
62 <p>Note that the <code>_GLIBCXX_PARALLEL</code> define may change the
63 sizes and behavior of standard class templates such as
64 <code>std::search</code>, and therefore one can only link code
65 compiled with parallel mode and code compiled without parallel mode
66 if no instantiation of a container is passed between the two
67 translation units. Parallel mode functionality has distinct linkage,
68 and cannot be confused with normal mode symbols.</p>
71 <p>The following library components in the include
72 <code>&lt;numeric&gt;</code> are included in the parallel mode:</p>
73 <ul>
74 <li><code>std::accumulate</code></li>
75 <li><code>std::adjacent_difference</code></li>
76 <li><code>std::inner_product</code></li>
77 <li><code>std::partial_sum</code></li>
78 </ul>
80 <p>The following library components in the include
81 <code>&lt;algorithm&gt;</code> are included in the parallel mode:</p>
82 <ul>
83 <li><code>std::adjacent_find</code></li>
84 <li><code>std::count</code></li>
85 <li><code>std::count_if</code></li>
86 <li><code>std::equal</code></li>
87 <li><code>std::find</code></li>
88 <li><code>std::find_if</code></li>
89 <li><code>std::find_first_of</code></li>
90 <li><code>std::for_each</code></li>
91 <li><code>std::generate</code></li>
92 <li><code>std::generate_n</code></li>
93 <li><code>std::lexicographical_compare</code></li>
94 <li><code>std::mismatch</code></li>
95 <li><code>std::search</code></li>
96 <li><code>std::search_n</code></li>
97 <li><code>std::transform</code></li>
98 <li><code>std::replace</code></li>
99 <li><code>std::replace_if</code></li>
100 <li><code>std::max_element</code></li>
101 <li><code>std::merge</code></li>
102 <li><code>std::min_element</code></li>
103 <li><code>std::nth_element</code></li>
104 <li><code>std::partial_sort</code></li>
105 <li><code>std::partition</code></li>
106 <li><code>std::random_shuffle</code></li>
107 <li><code>std::set_union</code></li>
108 <li><code>std::set_intersection</code></li>
109 <li><code>std::set_symmetric_difference</code></li>
110 <li><code>std::set_difference</code></li>
111 <li><code>std::sort</code></li>
112 <li><code>std::stable_sort</code></li>
113 <li><code>std::unique_copy</code></li>
114 </ul>
116 <p>The following library components in the includes
117 <code>&lt;set&gt;</code> and <code>&lt;map&gt;</code> are included in the parallel mode:</p>
118 <ul>
119 <li><code>std::(multi_)map/set&lt;T&gt;::(multi_)map/set(Iterator begin, Iterator end)</code> (bulk construction)</li>
120 <li><code>std::(multi_)map/set&lt;T&gt;::insert(Iterator begin, Iterator end)</code> (bulk insertion)</li>
121 </ul>
124 <h4 class="left">Using the parallel algorithms without parallel mode</h4>
126 <p>When it is not feasible to recompile your entire application, or
127 only specific algorithms need to be parallel-aware, individual
128 parallel algorithms can be made available explicitly. These
129 parallel algorithms are functionally equivalent to the standard
130 drop-in algorithms used in parallel mode, but they are available in
131 a separate namespace as GNU extensions and may be used in programs
132 compiled with either release mode or with parallel mode. The
133 following table provides the names and headers of the parallel
134 algorithms:
135 </p>
138 <table title="Parallel algorithms" border="1">
139 <tr>
140 <th>Algorithm</th>
141 <th>Header</th>
142 <th>Parallel algorithm</th>
143 <th>Parallel header</th>
144 </tr>
145 <tr>
146 <td>std::accumulate</td>
147 <td>&lt;numeric&gt;</td>
148 <td>__gnu_parallel::accumulate</td>
149 <td>&lt;parallel/numeric&gt;</td>
150 </tr>
151 <tr>
152 <td>std::adjacent_difference</td>
153 <td>&lt;numeric&gt;</td>
154 <td>__gnu_parallel::adjacent_difference</td>
155 <td>&lt;parallel/numeric&gt;</td>
156 </tr>
157 <tr>
158 <td>std::inner_product</td>
159 <td>&lt;numeric&gt;</td>
160 <td>__gnu_parallel::inner_product</td>
161 <td>&lt;parallel/numeric&gt;</td>
162 </tr>
163 <tr>
164 <td>std::partial_sum</td>
165 <td>&lt;numeric&gt;</td>
166 <td>__gnu_parallel::partial_sum</td>
167 <td>&lt;parallel/numeric&gt;</td>
168 </tr>
169 <tr>
170 <td>std::adjacent_find</td>
171 <td>&lt;algorithm&gt;</td>
172 <td>__gnu_parallel::adjacent_find</td>
173 <td>&lt;parallel/algorithm&gt;</td>
174 </tr>
176 <tr>
177 <td>std::count</td>
178 <td>&lt;algorithm&gt;</td>
179 <td>__gnu_parallel::count</td>
180 <td>&lt;parallel/algorithm&gt;</td>
181 </tr>
183 <tr>
184 <td>std::count_if</td>
185 <td>&lt;algorithm&gt;</td>
186 <td>__gnu_parallel::count_if</td>
187 <td>&lt;parallel/algorithm&gt;</td>
188 </tr>
190 <tr>
191 <td>std::equal</td>
192 <td>&lt;algorithm&gt;</td>
193 <td>__gnu_parallel::equal</td>
194 <td>&lt;parallel/algorithm&gt;</td>
195 </tr>
197 <tr>
198 <td>std::find</td>
199 <td>&lt;algorithm&gt;</td>
200 <td>__gnu_parallel::find</td>
201 <td>&lt;parallel/algorithm&gt;</td>
202 </tr>
204 <tr>
205 <td>std::find_if</td>
206 <td>&lt;algorithm&gt;</td>
207 <td>__gnu_parallel::find_if</td>
208 <td>&lt;parallel/algorithm&gt;</td>
209 </tr>
211 <tr>
212 <td>std::find_first_of</td>
213 <td>&lt;algorithm&gt;</td>
214 <td>__gnu_parallel::find_first_of</td>
215 <td>&lt;parallel/algorithm&gt;</td>
216 </tr>
218 <tr>
219 <td>std::for_each</td>
220 <td>&lt;algorithm&gt;</td>
221 <td>__gnu_parallel::for_each</td>
222 <td>&lt;parallel/algorithm&gt;</td>
223 </tr>
225 <tr>
226 <td>std::generate</td>
227 <td>&lt;algorithm&gt;</td>
228 <td>__gnu_parallel::generate</td>
229 <td>&lt;parallel/algorithm&gt;</td>
230 </tr>
232 <tr>
233 <td>std::generate_n</td>
234 <td>&lt;algorithm&gt;</td>
235 <td>__gnu_parallel::generate_n</td>
236 <td>&lt;parallel/algorithm&gt;</td>
237 </tr>
239 <tr>
240 <td>std::lexicographical_compare</td>
241 <td>&lt;algorithm&gt;</td>
242 <td>__gnu_parallel::lexicographical_compare</td>
243 <td>&lt;parallel/algorithm&gt;</td>
244 </tr>
246 <tr>
247 <td>std::mismatch</td>
248 <td>&lt;algorithm&gt;</td>
249 <td>__gnu_parallel::mismatch</td>
250 <td>&lt;parallel/algorithm&gt;</td>
251 </tr>
253 <tr>
254 <td>std::search</td>
255 <td>&lt;algorithm&gt;</td>
256 <td>__gnu_parallel::search</td>
257 <td>&lt;parallel/algorithm&gt;</td>
258 </tr>
260 <tr>
261 <td>std::search_n</td>
262 <td>&lt;algorithm&gt;</td>
263 <td>__gnu_parallel::search_n</td>
264 <td>&lt;parallel/algorithm&gt;</td>
265 </tr>
267 <tr>
268 <td>std::transform</td>
269 <td>&lt;algorithm&gt;</td>
270 <td>__gnu_parallel::transform</td>
271 <td>&lt;parallel/algorithm&gt;</td>
272 </tr>
274 <tr>
275 <td>std::replace</td>
276 <td>&lt;algorithm&gt;</td>
277 <td>__gnu_parallel::replace</td>
278 <td>&lt;parallel/algorithm&gt;</td>
279 </tr>
281 <tr>
282 <td>std::replace_if</td>
283 <td>&lt;algorithm&gt;</td>
284 <td>__gnu_parallel::replace_if</td>
285 <td>&lt;parallel/algorithm&gt;</td>
286 </tr>
288 <tr>
289 <td>std::max_element</td>
290 <td>&lt;algorithm&gt;</td>
291 <td>__gnu_parallel::max_element</td>
292 <td>&lt;parallel/algorithm&gt;</td>
293 </tr>
295 <tr>
296 <td>std::merge</td>
297 <td>&lt;algorithm&gt;</td>
298 <td>__gnu_parallel::merge</td>
299 <td>&lt;parallel/algorithm&gt;</td>
300 </tr>
302 <tr>
303 <td>std::min_element</td>
304 <td>&lt;algorithm&gt;</td>
305 <td>__gnu_parallel::min_element</td>
306 <td>&lt;parallel/algorithm&gt;</td>
307 </tr>
309 <tr>
310 <td>std::nth_element</td>
311 <td>&lt;algorithm&gt;</td>
312 <td>__gnu_parallel::nth_element</td>
313 <td>&lt;parallel/algorithm&gt;</td>
314 </tr>
316 <tr>
317 <td>std::partial_sort</td>
318 <td>&lt;algorithm&gt;</td>
319 <td>__gnu_parallel::partial_sort</td>
320 <td>&lt;parallel/algorithm&gt;</td>
321 </tr>
323 <tr>
324 <td>std::partition</td>
325 <td>&lt;algorithm&gt;</td>
326 <td>__gnu_parallel::partition</td>
327 <td>&lt;parallel/algorithm&gt;</td>
328 </tr>
330 <tr>
331 <td>std::random_shuffle</td>
332 <td>&lt;algorithm&gt;</td>
333 <td>__gnu_parallel::random_shuffle</td>
334 <td>&lt;parallel/algorithm&gt;</td>
335 </tr>
337 <tr>
338 <td>std::set_union</td>
339 <td>&lt;algorithm&gt;</td>
340 <td>__gnu_parallel::set_union</td>
341 <td>&lt;parallel/algorithm&gt;</td>
342 </tr>
344 <tr>
345 <td>std::set_intersection</td>
346 <td>&lt;algorithm&gt;</td>
347 <td>__gnu_parallel::set_intersection</td>
348 <td>&lt;parallel/algorithm&gt;</td>
349 </tr>
351 <tr>
352 <td>std::set_symmetric_difference</td>
353 <td>&lt;algorithm&gt;</td>
354 <td>__gnu_parallel::set_symmetric_difference</td>
355 <td>&lt;parallel/algorithm&gt;</td>
356 </tr>
358 <tr>
359 <td>std::set_difference</td>
360 <td>&lt;algorithm&gt;</td>
361 <td>__gnu_parallel::set_difference</td>
362 <td>&lt;parallel/algorithm&gt;</td>
363 </tr>
365 <tr>
366 <td>std::sort</td>
367 <td>&lt;algorithm&gt;</td>
368 <td>__gnu_parallel::sort</td>
369 <td>&lt;parallel/algorithm&gt;</td>
370 </tr>
372 <tr>
373 <td>std::stable_sort</td>
374 <td>&lt;algorithm&gt;</td>
375 <td>__gnu_parallel::stable_sort</td>
376 <td>&lt;parallel/algorithm&gt;</td>
377 </tr>
379 <tr>
380 <td>std::unique_copy</td>
381 <td>&lt;algorithm&gt;</td>
382 <td>__gnu_parallel::unique_copy</td>
383 <td>&lt;parallel/algorithm&gt;</td>
384 </tr>
386 </table>
389 <h4 class="left">Parallel mode semantics</h4>
391 <p> The parallel mode STL algorithms are currently not exception-safe,
392 i. e. user-defined functors must not throw exceptions.
393 </p>
395 <p> Since the current GCC OpenMP implementation does not support
396 OpenMP parallel regions in concurrent threads,
397 it is not possible to call parallel STL algorithm in
398 concurrent threads, either.
399 It might work with other compilers, though.</p>
402 <h4 class="left">Configuration and Tuning</h4>
404 <p> Some algorithm variants can be enabled/disabled/selected at compile-time.
405 See <a href="latest-doxygen/compiletime__settings_8h.html">
406 <code>&lt;compiletime_settings.h&gt;</code></a> and
407 See <a href="latest-doxygen/compiletime__settings_8h.html">
408 <code>&lt;features.h&gt;</code></a> for details.
409 </p>
412 To specify the number of threads to be used for an algorithm,
413 use <code>omp_set_num_threads</code>.
414 To force a function to execute sequentially,
415 even though parallelism is switched on in general,
416 add <code>__gnu_parallel::sequential_tag()</code>
417 to the end of the argument list.
418 </p>
421 Parallelism always incurs some overhead. Thus, it is not
422 helpful to parallelize operations on very small sets of data.
423 There are measures to avoid parallelizing stuff that is not worth it.
424 For each algorithm, a minimum problem size can be stated,
425 usually using the variable
426 <code>__gnu_parallel::Settings::[algorithm]_minimal_n</code>.
427 Please see <a href="latest-doxygen/settings_8h.html">
428 <code>&lt;settings.h&gt;</code></a> for details.</p>
432 <h4 class="left">Interface basics and general design</h4>
434 <p>All parallel algorithms are intended to have signatures that are
435 equivalent to the ISO C++ algorithms replaced. For instance, the
436 <code>std::adjacent_find</code> function is declared as:
437 </p>
438 <pre>
439 namespace std
441 template&lt;typename _FIter&gt;
442 _FIter
443 adjacent_find(_FIter, _FIter);
445 </pre>
448 Which means that there should be something equivalent for the parallel
449 version. Indeed, this is the case:
450 </p>
452 <pre>
453 namespace std
455 namespace __parallel
457 template&lt;typename _FIter&gt;
458 _FIter
459 adjacent_find(_FIter, _FIter);
464 </pre>
466 <p>But.... why the elipses?
467 </p>
469 <p> The elipses in the example above represent additional overloads
470 required for the parallel version of the function. These additional
471 overloads are used to dispatch calls from the ISO C++ function
472 signature to the appropriate parallel function (or sequential
473 function, if no parallel functions are deemed worthy), based on either
474 compile-time or run-time conditions.
475 </p>
477 <p> Compile-time conditions are referred to as "embarrassingly
478 parallel," and are denoted with the appropriate dispatch object, ie
479 one of <code>__gnu_parallel::sequential_tag</code>,
480 <code>__gnu_parallel::parallel_tag</code>,
481 <code>__gnu_parallel::balanced_tag</code>,
482 <code>__gnu_parallel::unbalanced_tag</code>,
483 <code>__gnu_parallel::omp_loop_tag</code>, or
484 <code>__gnu_parallel::omp_loop_static_tag</code>.
485 </p>
487 <p> Run-time conditions depend on the hardware being used, the number
488 of threads available, etc., and are denoted by the use of the enum
489 <code>__gnu_parallel::parallelism</code>. Values of this enum include
490 <code>__gnu_parallel::sequential</code>,
491 <code>__gnu_parallel::parallel_unbalanced</code>,
492 <code>__gnu_parallel::parallel_balanced</code>,
493 <code>__gnu_parallel::parallel_omp_loop</code>,
494 <code>__gnu_parallel::parallel_omp_loop_static</code>, or
495 <code>__gnu_parallel::parallel_taskqueue</code>.
496 </p>
498 <p> Putting all this together, the general view of overloads for the
499 parallel algorithms look like this:
500 </p>
501 <ul>
502 <li>ISO C++ signature</li>
503 <li>ISO C++ signature + sequential_tag argument</li>
504 <li>ISO C++ signature + parallelism argument</li>
505 </ul>
507 <p> Please note that the implementation may use additional functions
508 (designated with the <code>_switch</code> suffix) to dispatch from the
509 ISO C++ signature to the correct parallel version. Also, some of the
510 algorithms do not have support for run-time conditions, so the last
511 overload is therefore missing.
512 </p>
515 <h4 class="left">Relevant namespaces</h4>
517 <p> One namespace contain versions of code that are explicitly sequential:
518 <code>__gnu_serial</code>.
519 </p>
521 <p> Two namespaces contain the parallel mode:
522 <code>std::__parallel</code> and <code>__gnu_parallel</code>.
523 </p>
525 <p> Parallel implementations of standard components, including
526 template helpers to select parallelism, are defined in <code>namespace
527 std::__parallel</code>. For instance, <code>std::transform</code> from
528 &lt;algorithm&gt; has a parallel counterpart in
529 <code>std::__parallel::transform</code> from
530 &lt;parallel/algorithm&gt;. In addition, these parallel
531 implementations are injected into <code>namespace
532 __gnu_parallel</code> with using declarations.
533 </p>
535 <p> Support and general infrastructure is in <code>namespace
536 __gnu_parallel</code>.
537 </p>
539 <p> More information, and an organized index of types and functions
540 related to the parallel mode on a per-namespace basis, can be found in
541 the generated source documentation.
542 </p>
544 <h4 class="left">Testing</h4>
546 <p> Both the normal conformance and regression tests and the
547 supplemental performance tests work.</p>
549 <p> To run the conformance and regression tests with the parallel mode
550 active,</p>
551 <code>make check-parallel</code>
553 <p>The log and summary files for conformance testing are in the
554 <code>testsuite/parallel</code> directory.</p>
556 <p> To run the performance tests with the parallel mode active, </p>
557 <code>make check-performance-parallel</code>
559 <p>The result file for performance testing are in the
560 <code>testsuite</code> directory, in the file
561 <code>libstdc++_performance.sum</code>. In addition, the policy-based
562 containers have their own visualizations, which have additional
563 software dependencies than the usual bare-boned text file, and can be
564 generated by using the <code>make doc-performance</code> rule in the
565 testsuite's Makefile.</p>
567 <p>Return <a href="#top">to the top of the page</a> or
568 <a href="http://gcc.gnu.org/libstdc++/">to the libstdc++ homepage</a>.
569 </p>
572 <h4 class="left">References / Further Reading</h4>
575 Johannes Singler, Peter Sanders, Felix Putze. The Multi-Core Standard Template Library. Euro-Par 2007: Parallel Processing. (LNCS 4641)
576 </p>
579 Leonor Frias, Johannes Singler: Parallelization of Bulk Operations for STL Dictionaries. Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
580 </p>
582 <!-- ####################################################### -->
584 <hr />
585 <p class="fineprint"><em>
586 See <a href="17_intro/license.html">license.html</a> for copying conditions.
587 Comments and suggestions are welcome, and may be sent to
588 <a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
589 </em></p>
592 </body>
593 </html>