Merge -r 127928:132243 from trunk
[official-gcc.git] / libstdc++-v3 / doc / html / ext / pb_ds / motivation.html
blob420fc6451031ee87af7c324333a74a630336cbc8
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
5 <head>
6 <meta name="generator" content=
7 "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" />
9 <title>Motivation</title>
10 <meta http-equiv="Content-Type" content=
11 "text/html; charset=us-ascii" />
12 </head>
14 <body>
15 <div id="page">
16 <h1>Motivation</h1>
18 <p>Many fine associative-container libraries were already
19 written, most notably, the STL's associative containers. Why
20 then write another library? This section shows some possible
21 advantages of this library, when considering the challenges in
22 <a href="introduction.html">Introduction</a>. Many of these
23 points stem from the fact that the STL introduced
24 associative-containers in a two-step process (first
25 standardizing tree-based containers, only then adding
26 hash-based containers, which are fundamentally different), did
27 not standardize priority queues as containers, and (in our
28 opinion) overloads the iterator concept.</p>
30 <h2><a name="assoc" id="assoc">Associative Containers</a></h2>
32 <h3><a name="assoc_policies" id="assoc_policies">More
33 Configuration Choices</a></h3>
35 <p>Associative containers require a relatively large number of
36 policies to function efficiently in various settings. In some
37 cases this is needed for making their common operations more
38 efficient, and in other cases this allows them to support a
39 larger set of operations</p>
41 <ol>
42 <li>Hash-based containers, for example, support look-up and
43 insertion methods (<i>e.g.</i>, <tt>find</tt> and
44 <tt>insert</tt>). In order to locate elements quickly, they
45 are supplied a hash functor, which instruct how to transform
46 a key object into some size type; <i>e.g.</i>, a hash functor
47 might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A
48 hash table, though, requires transforming each key object
49 into some size-type type in some specific domain;
50 <i>e.g.</i>, a hash table with a 128-long table might
51 transform <tt>"hello"</tt> into position 63. The policy by
52 which the hash value is transformed into a position within
53 the table can dramatically affect performance (see <a href=
54 "hash_based_containers.html#hash_policies">Design::Associative
55 Containers::Hash-Based Containers::Hash Policies</a>).
56 Hash-based containers also do not resize naturally (as
57 opposed to tree-based containers, for example). The
58 appropriate resize policy is unfortunately intertwined with
59 the policy that transforms hash value into a position within
60 the table (see <a href=
61 "hash_based_containers.html#resize_policies">Design::Associative
62 Containers::Hash-Based Containers::Resize Policies</a>).
64 <p><a href=
65 "assoc_performance_tests.html#hash_based">Associative-Container
66 Performance Tests::Hash-Based Containers</a> quantifies
67 some of these points.</p>
68 </li>
70 <li>Tree-based containers, for example, also support look-up
71 and insertion methods, and are primarily useful when
72 maintaining order between elements is important. In some
73 cases, though, one can utilize their balancing algorithms for
74 completely different purposes.
76 <p>Figure <a href="#node_invariants">Metadata for
77 order-statistics and interval intersections</a>-A, for
78 example, shows a tree whose each node contains two entries:
79 a floating-point key, and some size-type <i>metadata</i>
80 (in bold beneath it) that is the number of nodes in the
81 sub-tree. (<i>E.g.</i>, the root has key 0.99, and has 5
82 nodes (including itself) in its sub-tree.) A container based
83 on this data structure can obviously answer efficiently
84 whether 0.3 is in the container object, but it can also
85 answer what is the order of 0.3 among all those in the
86 container object [<a href=
87 "references.html#clrs2001">clrs2001</a>] (see <a href=
88 "assoc_examples.html#tree_like_based">Associative Container
89 Examples::Tree-Like-Based Containers (Trees and
90 Tries)</a>).</p>
92 <p>As another example, Figure <a href=
93 "#node_invariants">Metadata for order-statistics and
94 interval intersections</a>-B shows a tree whose each node
95 contains two entries: a half-open geometric line interval,
96 and a number <i>metadata</i> (in bold beneath it) that is
97 the largest endpoint of all intervals in its sub-tree.
98 (<i>E.g.</i>, the root describes the interval <i>[20,
99 36)</i>, and the largest endpoint in its sub-tree is 99.) A
100 container based on this data structure can obviously answer
101 efficiently whether <i>[3, 41)</i> is in the container
102 object, but it can also answer efficiently whether the
103 container object has intervals that intersect <i>[3,
104 41)</i> (see <a href=
105 "assoc_examples.html#tree_like_based">Associative Container
106 Examples::Tree-Like-Based Containers (Trees and
107 Tries)</a>). These types of queries are very useful in
108 geometric algorithms and lease-management algorithms.</p>
110 <p>It is important to note, however, that as the trees are
111 modified, their internal structure changes. To maintain
112 these invariants, one must supply some policy that is aware
113 of these changes (see <a href=
114 "tree_based_containers.html#invariants">Design::Associative
115 Containers::Tree-Based Containers::Node Invariants</a>);
116 without this, it would be better to use a linked list (in
117 itself very efficient for these purposes).</p>
119 <p><a href=
120 "assoc_performance_tests.html#tree_like_based">Associative-Container
121 Performance Tests::Tree-Like-Based Containers</a>
122 quantifies some of these points.</p>
123 </li>
124 </ol>
126 <h6 class="c1"><a name="node_invariants" id=
127 "node_invariants"><img src="node_invariants.png" alt=
128 "no image" /></a></h6>
130 <h6 class="c1">Metadata for order-statistics and interval
131 intersections.</h6>
133 <h3><a name="assoc_ds_genericity" id="assoc_ds_genericity">More
134 Data Structures and Traits</a></h3>
136 <p>The STL contains associative containers based on red-black
137 trees and collision-chaining hash tables. These are obviously
138 very useful, but they are not ideal for all types of
139 settings.</p>
141 <p>Figure <a href=
142 "#different_underlying_data_structures">Different underlying
143 data structures</a> shows different underlying data structures
144 (the ones currently supported in <tt>pb_ds</tt>). A shows a
145 collision-chaining hash-table, B shows a probing hash-table, C
146 shows a red-black tree, D shows a splay tree, E shows a tree
147 based on an ordered vector(implicit in the order of the
148 elements), F shows a PATRICIA trie, and G shows a list-based
149 container with update policies.</p>
151 <p>Each of these data structures has some performance benefits,
152 in terms of speed, size or both (see <a href=
153 "assoc_performance_tests.html">Associative-Container
154 Performance Tests</a> and <a href=
155 "assoc_performance_tests.html#dss_family_choice">Associative-Container
156 Performance Tests::Observations::Underlying Data-Structure
157 Families</a>). For now, though, note that <i>e.g.</i>,
158 vector-based trees and probing hash tables manipulate memory
159 more efficiently than red-black trees and collision-chaining
160 hash tables, and that list-based associative containers are
161 very useful for constructing "multimaps" (see <a href=
162 "#assoc_mapping_semantics">Alternative to Multiple Equivalent
163 Keys</a>, <a href=
164 "assoc_performance_tests.html#multimaps">Associative Container
165 Performance Tests::Multimaps</a>, and <a href=
166 "assoc_performance_tests.html#msc">Associative Container
167 Performance Tests::Observations::Mapping-Semantics
168 Considerations</a>).</p>
170 <h6 class="c1"><a name="different_underlying_data_structures"
171 id="different_underlying_data_structures"><img src=
172 "different_underlying_dss.png" alt="no image" /></a></h6>
174 <h6 class="c1">Different underlying data structures.</h6>
176 <p>Now consider a function manipulating a generic associative
177 container, <i>e.g.</i>,</p>
178 <pre>
179 <b>template</b>&lt;
180 <b>class</b> Cntnr&gt;
181 <b>int</b>
182 some_op_sequence
183 (Cntnr &amp;r_cnt)
187 </pre>
189 <p>Ideally, the underlying data structure of <tt>Cntnr</tt>
190 would not affect what can be done with <tt>r_cnt</tt>.
191 Unfortunately, this is not the case.</p>
193 <p>For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then
194 the function can use <tt>std::for_each(r_cnt.find(foo),
195 r_cnt.find(bar), foobar)</tt> in order to apply <tt>foobar</tt>
196 to all elements between <tt>foo</tt> and <tt>bar</tt>. If
197 <tt>Cntnr</tt> is a hash-based container, then this call's
198 results are undefined.</p>
200 <p>Also, if <tt>Cntnr</tt> is tree-based, the type and object
201 of the comparison functor can be accessed. If <tt>Cntnr</tt> is
202 hash based, these queries are nonsensical.</p>
204 <p>There are various other differences based on the container's
205 underlying data structure. For one, they can be constructed by,
206 and queried for, different policies. Furthermore:</p>
208 <ol>
209 <li>Containers based on C, D, E and F store elements in a
210 meaningful order; the others store elements in a meaningless
211 (and probably time-varying) order. By implication, only
212 containers based on C, D, E and F can support erase
213 operations taking an iterator and returning an iterator to
214 the following element without performance loss (see <a href=
215 "#assoc_ers_methods">Slightly Different Methods::Methods
216 Related to Erase</a>).</li>
218 <li>Containers based on C, D, E, and F can be split and
219 joined efficiently, while the others cannot. Containers based
220 on C and D, furthermore, can guarantee that this is
221 exception-free; containers based on E cannot guarantee
222 this.</li>
224 <li>Containers based on all but E can guarantee that erasing
225 an element is exception free; containers based on E cannot
226 guarantee this. Containers based on all but B and E can
227 guarantee that modifying an object of their type does not
228 invalidate iterators or references to their elements, while
229 containers based on B and E cannot. Containers based on C, D,
230 and E can furthermore make a stronger guarantee, namely that
231 modifying an object of their type does not affect the order
232 of iterators.</li>
233 </ol>
235 <p>A unified tag and traits system (as used for the STL's
236 iterators, for example) can ease generic manipulation of
237 associative containers based on different underlying
238 data structures (see <a href=
239 "tutorial.html#assoc_ds_gen">Tutorial::Associative
240 Containers::Determining Containers' Attributes</a> and <a href=
241 "ds_gen.html#container_traits">Design::Associative
242 Containers::Data-Structure Genericity::Data-Structure Tags and
243 Traits</a>).</p>
245 <h3><a name="assoc_diff_it" id="assoc_diff_it">Differentiating
246 between Iterator Types</a></h3>
248 <p>Iterators are centric to the STL's design, because of the
249 container/algorithm/iterator decomposition that allows an
250 algorithm to operate on a range through iterators of some
251 sequence (<i>e.g.</i>, one originating from a container).
252 Iterators, then, are useful because they allow going over a
253 <u>sequence</u>. The STL also uses iterators for accessing a
254 <u>specific</u> element - <i>e.g.</i>, when an associative
255 container returns one through <tt>find</tt>. The STL, however,
256 consistently uses the same types of iterators for both
257 purposes: going over a range, and accessing a specific found
258 element. Before the introduction of hash-based containers to
259 the STL, this made sense (with the exception of priority
260 queues, which are discussed in <a href="#pq">Priority
261 Queues</a>).</p>
263 <p>Using the STL's associative containers together with
264 non-order-preserving associative containers (and also because
265 of priority-queues container), there is a possible need for
266 different types of iterators for self-organizing containers -
267 the iterator concept seems overloaded to mean two different
268 things (in some cases). The following subsections explain this;
269 <a href="tutorial.html#assoc_find_range">Tutorial::Associative
270 Containers::Point-Type and Range-Type Methods</a> explains an
271 alternative design which does not complicate the use of
272 order-preserving containers, but is better for unordered
273 containers; <a href=
274 "ds_gen.html#find_range">Design::Associative
275 Containers::Data-Structure Genericity::Point-Type and
276 Range-Type Methods</a> explains the design further.</p>
278 <h4><a name="assoc_find_it_range_it" id=
279 "assoc_find_it_range_it">Using Point-Type Iterators for
280 Range-Type Operations</a></h4>
282 <p>Suppose <tt>cntnr</tt> is some associative container, and
283 say <tt>c</tt> is an object of type <tt>cntnr</tt>. Then what
284 will be the outcome of</p>
285 <pre>
286 std::for_each(c.find(1), c.find(5), foo);
287 </pre>
289 <p>If <tt>cntnr</tt> is a tree-based container object, then an
290 in-order walk will apply <tt>foo</tt> to the relevant elements,
291 <i>e.g.</i>, as in Figure <a href="#range_it_in_hts">Range
292 iteration in different data structures</a> -A. If <tt>c</tt> is
293 a hash-based container, then the order of elements between any
294 two elements is undefined (and probably time-varying); there is
295 no guarantee that the elements traversed will coincide with the
296 <i>logical</i> elements between 1 and 5, <i>e.g.</i>, as in
297 Figure <a href="#range_it_in_hts">Range iteration in different
298 data structures</a>-B.</p>
300 <h6 class="c1"><a name="range_it_in_hts" id=
301 "range_it_in_hts"><img src="point_iterators_range_ops_1.png"
302 alt="no image" /></a></h6>
304 <h6 class="c1">Range iteration in different
305 data structures.</h6>
307 <p>In our opinion, this problem is not caused just because
308 red-black trees are order preserving while collision-chaining
309 hash tables are (generally) not - it is more fundamental. Most
310 of the STL's containers order sequences in a well-defined
311 manner that is determined by their <u>interface</u>: calling
312 <tt>insert</tt> on a tree-based container modifies its sequence
313 in a predictable way, as does calling <tt>push_back</tt> on a
314 list or a vector. Conversely, collision-chaining hash tables,
315 probing hash tables, priority queues, and list-based containers
316 (which are very useful for "multimaps") are self-organizing
317 data structures; the effect of each operation modifies their
318 sequences in a manner that is (practically) determined by their
319 <u>implementation</u>.</p>
321 <p>Consequently, applying an algorithm to a sequence obtained
322 from most containers <u>may or may not</u> make sense, but
323 applying it to a sub-sequence of a self-organizing container
324 <u>does not</u>.</p>
326 <h4><a name="assoc_range_it_for_find_it" id=
327 "assoc_range_it_for_find_it">The Cost of Enabling Range
328 Capabilities to Point-Type Iterators</a></h4>
330 <p>Suppose <tt>c</tt> is some collision-chaining hash-based
331 container object, and one calls <tt>c.find(3)</tt>. Then what
332 composes the returned iterator?</p>
334 <p>Figure <a href="#find_its_in_hash_tables">Point-type
335 iterators in hash tables</a>-A shows the simplest (and most
336 efficient) implementation of a collision-chaining hash table.
337 The little box marked <tt>point_iterator</tt> shows an object
338 that contains a pointer to the element's node. Note that this
339 "iterator" has no way to move to the next element (<i>i.e.</i>,
340 it cannot support <tt><b>operator</b>++</tt>). Conversely, the
341 little box marked <tt>iterator</tt> stores both a pointer to
342 the element, as well as some other information (<i>e.g.</i>,
343 the bucket number of the element). the second iterator, then,
344 is "heavier" than the first one- it requires more time and
345 space. If we were to use a different container to
346 cross-reference into this hash-table using these iterators - it
347 would take much more space. As noted in <a href=
348 "#assoc_find_it_range_it">Using Point-Type Iterators for
349 Range-Type Operations</a>, nothing much can be done by
350 incrementing these iterators, so why is this extra information
351 needed?</p>
353 <p>Alternatively, one might create a collision-chaining
354 hash-table where the lists might be linked, forming a
355 monolithic total-element list, as in Figure <a href=
356 "#find_its_in_hash_tables">Point-type iterators in hash
357 tables</a>-B (this seems similar to the Dinkumware design
358 [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]).
359 Here the iterators are as light as can be, but the hash-table's
360 operations are more complicated.</p>
362 <h6 class="c1"><a name="find_its_in_hash_tables" id=
363 "find_its_in_hash_tables"><img src=
364 "point_iterators_range_ops_2.png" alt="no image" /></a></h6>
366 <h6 class="c1">Point-type iterators in hash tables.</h6>
368 <p>It should be noted that containers based on
369 collision-chaining hash-tables are not the only ones with this
370 type of behavior; many other self-organizing data structures
371 display it as well.</p>
373 <h4><a name="assoc_inv_guar" id="assoc_inv_guar">Invalidation
374 Guarantees</a></h4>
376 <p>Consider the following snippet:</p>
377 <pre>
378 it = c.find(3);
380 c.erase(5);
381 </pre>
383 <p>Following the call to <tt>erase</tt>, what is the validity
384 of <tt>it</tt>: can it be de-referenced? can it be
385 incremented?</p>
387 <p>The answer depends on the underlying data structure of the
388 container. Figure <a href=
389 "#invalidation_guarantee_erase">Effect of erase in different
390 underlying data structures</a> shows three cases: A1 and A2
391 show a red-black tree; B1 and B2 show a probing hash-table; C1
392 and C2 show a collision-chaining hash table.</p>
394 <h6 class="c1"><a name="invalidation_guarantee_erase" id=
395 "invalidation_guarantee_erase"><img src=
396 "invalidation_guarantee_erase.png" alt="no image" /></a></h6>
398 <h6 class="c1">Effect of erase in different underlying
399 data structures.</h6>
401 <ol>
402 <li>Erasing 5 from A1 yields A2. Clearly, an iterator to 3
403 can be de-referenced and incremented. The sequence of
404 iterators changed, but in a way that is well-defined by the
405 <u>interface</u>.</li>
407 <li>Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
408 not valid at all - it cannot be de-referenced or incremented;
409 the order of iterators changed in a way that is (practically)
410 determined by the <u>implementation</u> and not by the
411 <u>interface</u>.</li>
413 <li>Erasing 5 from C1 yields C2. Here the situation is more
414 complicated. On the one hand, there is no problem in
415 de-referencing <tt>it</tt>. On the other hand, the order of
416 iterators changed in a way that is (practically) determined
417 by the <u>implementation</u> and not by the
418 <u>interface</u>.</li>
419 </ol>
421 <p>So in classic STL, it is not always possible to express
422 whether <tt>it</tt> is valid or not. This is true also for
423 <tt>insert</tt>, <i>e.g.</i>. Again, the iterator concept seems
424 overloaded.</p>
426 <h3><a name="assoc_methods" id="assoc_methods">Slightly
427 Different Methods</a></h3>
429 <p>[<a href="references.html#meyers02both">meyers02both</a>]
430 points out that a class's methods should comprise only
431 operations which depend on the class's internal structure;
432 other operations are best designed as external functions.
433 Possibly, therefore, the STL's associative containers lack some
434 useful methods, and provide some other methods which would be
435 better left out (<i>e.g.</i>, [<a href=
436 "references.html#sgi_stl">sgi_stl</a>] ).</p>
438 <h4><a name="assoc_ers_methods" id="assoc_ers_methods">Methods
439 Related to Erase</a></h4>
441 <ol>
442 <li>Order-preserving STL associative containers provide the
443 method
444 <pre>
445 iterator
446 erase
447 (iterator it)
448 </pre>which takes an iterator, erases the corresponding element,
449 and returns an iterator to the following element. Also hash-based
450 STL associative containers provide this method. This <u>seemingly
451 increases</u> genericity between associative containers, since, <i>
452 e.g.</i>, it is possible to use
453 <pre>
454 <b>typename</b> C::iterator it = c.begin();
455 <b>typename</b> C::iterator e_it = c.end();
457 <b>while</b>(it != e_it)
458 it = pred(*it)? c.erase(it) : ++it;
459 </pre>in order to erase from a container object <tt>
460 c</tt> all element which match a predicate <tt>pred</tt>.
461 However, in a different sense this actually
462 <u>decreases</u> genericity: an integral implication of
463 this method is that tree-based associative containers'
464 memory use is linear in the total number of elements they
465 store, while hash-based containers' memory use is unbounded
466 in the total number of elements they store. Assume a
467 hash-based container is allowed to decrease its size when
468 an element is erased. Then the elements might be rehashed,
469 which means that there is no "next" element - it is simply
470 undefined. Consequently, it is possible to infer from the
471 fact that STL hash-based containers provide this method
472 that they cannot downsize when elements are erased
473 (<a href="assoc_performance_tests.html#hash_based">Performance
474 Tests::Hash-Based Container Tests</a> quantifies this.) As
475 a consequence, different code is needed to manipulate
476 different containers, assuming that memory should be
477 conserved. <tt>pb_ds</tt>'s non-order preserving
478 associative containers omit this method.
479 </li>
481 <li>All of <tt>pb_ds</tt>'s associative containers include a
482 conditional-erase method
483 <pre>
484 <b>template</b>&lt;
485 <b>class</b> Pred&gt;
486 size_type
487 erase_if
488 (Pred pred)
489 </pre>which erases all elements matching a predicate. This is
490 probably the only way to ensure linear-time multiple-item erase
491 which can actually downsize a container.
492 </li>
494 <li>STL associative containers provide methods for
495 multiple-item erase of the form
496 <pre>
497 size_type
498 erase
499 (It b,
500 It e)
501 </pre>erasing a range of elements given by a pair of iterators. For
502 tree-based or trie-based containers, this can implemented more
503 efficiently as a (small) sequence of split and join operations. For
504 other, unordered, containers, this method isn't much better than an
505 external loop. Moreover, if <tt>c</tt> is a hash-based container,
506 then, <i>e.g.</i>, <tt>c.erase(c.find(2), c.find(5))</tt> is almost
507 certain to do something different than erasing all elements whose
508 keys are between 2 and 5, and is likely to produce other undefined
509 behavior.
510 </li>
511 </ol>
513 <h4><a name="assoc_split_join_methods" id=
514 "assoc_split_join_methods">Methods Related to Split and
515 Join</a></h4>
517 <p>It is well-known that tree-based and trie-based container
518 objects can be efficiently split or joined [<a href=
519 "references.html#clrs2001">clrs2001</a>]. Externally splitting
520 or joining trees is super-linear, and, furthermore, can throw
521 exceptions. Split and join methods, consequently, seem good
522 choices for tree-based container methods, especially, since as
523 noted just before, they are efficient replacements for erasing
524 sub-sequences. <a href=
525 "assoc_performance_tests.html#tree_like_based">Performance
526 Tests::Tree-Like-Based Container Tests</a> quantifies this.</p>
528 <h4><a name="assoc_insert_methods" id=
529 "assoc_insert_methods">Methods Related to Insert</a></h4>
531 <p>STL associative containers provide methods of the form</p>
532 <pre>
533 <b>template</b>&lt;
534 <b>class</b> It&gt;
535 size_type
536 insert
537 (It b,
538 It e);
539 </pre>for inserting a range of elements given by a pair of
540 iterators. At best, this can be implemented as an external loop,
541 or, even more efficiently, as a join operation (for the case of
542 tree-based or trie-based containers). Moreover, these methods seem
543 similar to constructors taking a range given by a pair of
544 iterators; the constructors, however, are transactional, whereas
545 the insert methods are not; this is possibly confusing.
547 <h4><a name="assoc_equiv_comp_methods" id=
548 "assoc_equiv_comp_methods">Functions Related to
549 Comparison</a></h4>
551 <p>Associative containers are parametrized by policies
552 allowing to test key equivalence; <i>e.g.</i> a hash-based
553 container can do this through its equivalence functor, and a
554 tree-based container can do this through its comparison
555 functor. In addition, some STL associative containers have
556 global function operators, <i>e.g.</i>,
557 <tt><b>operator</b>==</tt> and <tt><b>operator</b>&lt;=</tt>,
558 that allow comparing entire associative containers.</p>
560 <p>In our opinion, these functions are better left out. To
561 begin with, they do not significantly improve over an external
562 loop. More importantly, however, they are possibly misleading -
563 <tt><b>operator</b>==</tt>, for example, usually checks for
564 equivalence, or interchangeability, but the associative
565 container cannot check for values' equivalence, only keys'
566 equivalence; also, are two containers considered equivalent if
567 they store the same values in different order? this is an
568 arbitrary decision.</p>
570 <h3><a name="assoc_mapping_semantics" id=
571 "assoc_mapping_semantics">Alternative to Multiple Equivalent
572 Keys</a></h3>
574 <p>Maps (or sets) allow mapping (or storing) unique-key values.
575 The STL, however, also supplies associative containers which
576 map (or store) multiple values with equivalent keys:
577 <tt>std::multimap</tt>, <tt>std::multiset</tt>,
578 <tt>std::tr1::unordered_multimap</tt>, and
579 <tt>unordered_multiset</tt>. We first discuss how these might
580 be used, then why we think it is best to avoid them.</p>
582 <p>Suppose one builds a simple bank-account application that
583 records for each client (identified by an <tt>std::string</tt>)
584 and account-id (marked by an <tt><b>unsigned long</b></tt>) -
585 the balance in the account (described by a
586 <tt><b>float</b></tt>). Suppose further that ordering this
587 information is not useful, so a hash-based container is
588 preferable to a tree based container. Then one can use</p>
589 <pre>
590 std::tr1::unordered_map&lt;std::pair&lt;std::string, <b>unsigned long</b>&gt;, <b>float</b>, ...&gt;
591 </pre>which <u>hashes every combination of client and
592 account-id</u>. This might work well, except for the fact that it
593 is now impossible to efficiently list all of the accounts of a
594 specific client (this would practically require iterating over all
595 entries). Instead, one can use
596 <pre>
597 std::tr1::unordered_multimap&lt;std::pair&lt;std::string, <tt><b>unsigned long</b></tt>&gt;, <b>float</b>, ...&gt;
598 </pre>which <u>hashes every client</u>, and <u>decides equivalence
599 based on client</u> only. This will ensure that all accounts
600 belonging to a specific user are stored consecutively.
602 <p>Also, suppose one wants an integers' priority queue
603 (<i>i.e.,</i> a container that supports <tt>push</tt>,
604 <tt>pop</tt>, and <tt>top</tt> operations, the last of which
605 returns the largest <tt><b>int</b></tt>) that also supports
606 operations such as <tt>find</tt> and <tt>lower_bound</tt>. A
607 reasonable solution is to build an adapter over
608 <tt>std::set&lt;<b>int</b>&gt;</tt>. In this adapter,
609 <i>e.g.</i>, <tt>push</tt> will just call the tree-based
610 associative container's <tt>insert</tt> method; <tt>pop</tt>
611 will call its <tt>end</tt> method, and use it to return the
612 preceding element (which must be the largest). Then this might
613 work well, except that the container object cannot hold
614 multiple instances of the same integer (<tt>push(4)</tt>,
615 <i>e.g.</i>, will be a no-op if <tt>4</tt> is already in the
616 container object). If multiple keys are necessary, then one
617 might build the adapter over an
618 <tt>std::multiset&lt;<b>int</b>&gt;</tt>.</p>
620 <p class="c1">STL non-unique-mapping containers, then, are
621 useful when (1) a key can be decomposed in to a primary key and
622 a secondary key, (2) a key is needed multiple times, or (3) any
623 combination of (1) and (2).</p>
625 <p>Figure <a href="#embedded_lists_1">Non-unique mapping
626 containers in the STL's design</a> shows how the STL's design
627 works internally; in this figure nodes shaded equally represent
628 equivalent-key values. Equivalent keys are stored consecutively
629 using the properties of the underlying data structure: binary
630 search trees (Figure <a href="#embedded_lists_1">Non-unique
631 mapping containers in the STL's design</a>-A) store
632 equivalent-key values consecutively (in the sense of an
633 in-order walk) naturally; collision-chaining hash tables
634 (Figure <a href="#embedded_lists_1">Non-unique mapping
635 containers in the STL's design</a>-B) store equivalent-key
636 values in the same bucket, the bucket can be arranged so that
637 equivalent-key values are consecutive.</p>
639 <h6 class="c1"><a name="embedded_lists_1" id=
640 "embedded_lists_1"><img src="embedded_lists_1.png" alt=
641 "no image" /></a></h6>
643 <h6 class="c1">Non-unique mapping containers in the STL's
644 design.</h6>
646 <p>Put differently, STL non-unique mapping
647 associative-containers are associative containers that map
648 primary keys to linked lists that are embedded into the
649 container. Figure <a href="#embedded_lists_2">Effect of
650 embedded lists in STL multimaps</a> shows again the two
651 containers from Figure <a href="#embedded_lists_1">Non-unique
652 mapping containers in the STL's design</a>, this time with the
653 embedded linked lists of the grayed nodes marked
654 explicitly.</p>
656 <h6 class="c1"><a name="embedded_lists_2" id=
657 "embedded_lists_2"><img src="embedded_lists_2.png" alt=
658 "no image" /></a></h6>
660 <h6 class="c1">Effect of embedded lists in STL multimaps.</h6>
662 <p>These embedded linked lists have several disadvantages.</p>
664 <ol>
665 <li>The underlying data structure embeds the linked lists
666 according to its own consideration, which means that the
667 search path for a value might include several different
668 equivalent-key values. For example, the search path for the
669 the black node in either of Figures <a href=
670 "#embedded_lists_1">Non-unique mapping containers in the
671 STL's design</a> A or B, includes more than a single gray
672 node.</li>
674 <li>The links of the linked lists are the underlying
675 data structures' nodes, which typically are quite structured.
676 <i>E.g.</i>, in the case of tree-based containers (Figure
677 <a href="#embedded_lists_2">Effect of embedded lists in STL
678 multimaps</a>-B), each "link" is actually a node with three
679 pointers (one to a parent and two to children), and a
680 relatively-complicated iteration algorithm. The linked lists,
681 therefore, can take up quite a lot of memory, and iterating
682 over all values equal to a given key (<i>e.g.</i>, through
683 the return value of the STL's <tt>equal_range</tt>) can be
684 expensive.</li>
686 <li>The primary key is stored multiply; this uses more
687 memory.</li>
689 <li>Finally, the interface of this design excludes several
690 useful underlying data structures. <i>E.g.</i>, of all the
691 unordered self-organizing data structures, practically only
692 collision-chaining hash tables can (efficiently) guarantee
693 that equivalent-key values are stored consecutively.</li>
694 </ol>
696 <p>The above reasons hold even when the ratio of secondary keys
697 to primary keys (or average number of identical keys) is small,
698 but when it is large, there are more severe problems:</p>
700 <ol>
701 <li>The underlying data structures order the links inside
702 each embedded linked-lists according to their internal
703 considerations, which effectively means that each of the
704 links is unordered. Irrespective of the underlying
705 data structure, searching for a specific value can degrade to
706 linear complexity.</li>
708 <li>Similarly to the above point, it is impossible to apply
709 to the secondary keys considerations that apply to primary
710 keys. For example, it is not possible to maintain secondary
711 keys by sorted order.</li>
713 <li>While the interface "understands" that all equivalent-key
714 values constitute a distinct list (<i>e.g.</i>, through
715 <tt>equal_range</tt>), the underlying data structure
716 typically does not. This means, <i>e.g.</i>, that operations
717 such as erasing from a tree-based container all values whose
718 keys are equivalent to a a given key can be super-linear in
719 the size of the tree; this is also true also for several
720 other operations that target a specific list.</li>
721 </ol>
723 <p>In <tt>pb_ds</tt>, therefore, all associative containers map
724 (or store) unique-key values. One can (1) map primary keys to
725 secondary associative-containers (<i>i.e.</i>, containers of
726 secondary keys) or non-associative containers (2) map identical
727 keys to a size-type representing the number of times they
728 occur, or (3) any combination of (1) and (2). Instead of
729 allowing multiple equivalent-key values, <tt>pb_ds</tt>
730 supplies associative containers based on underlying
731 data structures that are suitable as secondary
732 associative-containers (see <a href=
733 "assoc_performance_tests.html#msc">Associative-Container
734 Performance Tests::Observations::Mapping-Semantics
735 Considerations</a>).</p>
737 <p>Figures <a href="#embedded_lists_3">Non-unique mapping
738 containers in <tt>pb_ds</tt></a> A and B show the equivalent
739 structures in <tt>pb_ds</tt>'s design, to those in Figures
740 <a href="#embedded_lists_1">Non-unique mapping containers in
741 the STL's design</a> A and B, respectively. Each shaded box
742 represents some size-type or secondary
743 associative-container.</p>
745 <h6 class="c1"><a name="embedded_lists_3" id=
746 "embedded_lists_3"><img src="embedded_lists_3.png" alt=
747 "no image" /></a></h6>
749 <h6 class="c1">Non-unique mapping containers in the
750 <tt>pb_ds</tt>.</h6>
752 <p>In the first example above, then, one would use an
753 associative container mapping each user to an associative
754 container which maps each application id to a start time (see
755 <a href=
756 "../../../../testsuite/ext/pb_ds/example/basic_multimap.cc"><tt>basic_multimap.cc</tt></a>);
757 in the second example, one would use an associative container
758 mapping each <tt><b>int</b></tt> to some size-type indicating
759 the number of times it logically occurs (see <a href=
760 "../../../../testsuite/ext/pb_ds/example/basic_multiset.cc"><tt>basic_multiset.cc</tt></a>).</p>
762 <p><a href=
763 "assoc_performance_tests.html#multimaps">Associative-Container
764 Performance Tests::Multimaps</a> quantifies some of these
765 points, and <a href=
766 "assoc_performance_tests.html#msc">Associative-Container
767 Performance Tests::Observations::Mapping-Semantics
768 Considerations</a> shows some simple calculations.</p>
770 <p><a href="assoc_examples.html#mmaps">Associative-Container
771 Examples::Multimaps</a> shows some simple examples of using
772 "multimaps".</p>
774 <p><a href="lu_based_containers.html">Design::Associative
775 Containers::List-Based Containers</a> discusses types of
776 containers especially suited as secondary
777 associative-containers.</p>
779 <h2><a name="pq" id="pq">Priority Queues</a></h2>
781 <h3><a name="pq_more_ops" id="pq_more_ops">Slightly Different
782 Methods</a></h3>
784 <p>Priority queues are containers that allow efficiently
785 inserting values and accessing the maximal value (in the sense
786 of the container's comparison functor); <i>i.e.</i>, their
787 interface supports <tt>push</tt> and <tt>pop</tt>. The STL's
788 priority queues indeed support these methods, but they support
789 little else. For algorithmic and software-engineering purposes,
790 other methods are needed:</p>
792 <ol>
793 <li>Many graph algorithms [<a href=
794 "references.html#clrs2001">clrs2001</a>] require increasing a
795 value in a priority queue (again, in the sense of the
796 container's comparison functor), or joining two
797 priority-queue objects.</li>
799 <li>It is sometimes necessary to erase an arbitrary value in
800 a priority queue. For example, consider the <tt>select</tt>
801 function for monitoring file descriptors:
802 <pre>
803 <b>int</b>
804 select
805 (<b>int</b> nfds,
806 fd_set *readfds,
807 fd_set *writefds,
808 fd_set *errorfds,
809 <b>struct</b> timeval *timeout);
810 </pre>then, as the <tt>select</tt> manual page [<a href=
811 "references.html#select_man">select_man</a>] states:
813 <p><q>The nfds argument specifies the range of file
814 descriptors to be tested. The select() function tests file
815 descriptors in the range of 0 to nfds-1.</q></p>
817 <p>It stands to reason, therefore, that we might wish to
818 maintain a minimal value for <tt>nfds</tt>, and priority
819 queues immediately come to mind. Note, though, that when a
820 socket is closed, the minimal file description might
821 change; in the absence of an efficient means to erase an
822 arbitrary value from a priority queue, we might as well
823 avoid its use altogether.</p>
825 <p><a href="pq_examples.html#xref">Priority-Queue
826 Examples::Cross-Referencing</a> shows examples for these
827 types of operations.</p>
828 </li>
830 <li>STL containers typically support iterators. It is
831 somewhat unusual for <tt>std::priority_queue</tt> to omit
832 them (see, <i>e.g.</i>, [<a href=
833 "references.html#meyers01stl">meyers01stl</a>]). One might
834 ask why do priority queues need to support iterators, since
835 they are self-organizing containers with a different purpose
836 than abstracting sequences. There are several reasons:
838 <ol>
839 <li>Iterators (even in self-organizing containers) are
840 useful for many purposes, <i>e.g.</i>, cross-referencing
841 containers, serialization, and debugging code that uses
842 these containers.</li>
844 <li>The STL's hash-based containers support iterators,
845 even though they too are self-organizing containers with
846 a different purpose than abstracting sequences.</li>
848 <li>In STL-like containers, it is natural to specify the
849 interface of operations for modifying a value or erasing
850 a value (discussed previously) in terms of a iterators.
851 This is discussed further in <a href=
852 "pq_design.html#pq_it">Design::Priority
853 Queues::Iterators</a>. It should be noted that the STL's
854 containers also use iterators for accessing and
855 manipulating a specific value. <i>E.g.</i>, in hash-based
856 containers, one checks the existence of a key by
857 comparing the iterator returned by <tt>find</tt> to the
858 iterator returned by <tt>end</tt>, and not by comparing a
859 pointer returned by <tt>find</tt> to <tt>NULL</tt>.</li>
860 </ol>
861 </li>
862 </ol>
864 <p><a href="pq_performance_tests.html">Performance
865 Tests::Priority Queues</a> quantifies some of these points.</p>
867 <h3><a name="pq_ds_genericity" id="pq_ds_genericity">More Data
868 Structures and Traits</a></h3>
870 <p>There are three main implementations of priority queues: the
871 first employs a binary heap, typically one which uses a
872 sequence; the second uses a tree (or forest of trees), which is
873 typically less structured than an associative container's tree;
874 the third simply uses an associative container. These are
875 shown, respectively, in Figures <a href=
876 "#pq_different_underlying_dss">Underlying Priority-Queue
877 Data-Structures</a> A1 and A2, B, and C.</p>
879 <h6 class="c1"><a name="pq_different_underlying_dss" id=
880 "pq_different_underlying_dss"><img src=
881 "pq_different_underlying_dss.png" alt="no image" /></a></h6>
883 <h6 class="c1">Underlying Priority-Queue Data-Structures.</h6>
885 <p>No single implementation can completely replace any of the
886 others. Some have better <tt>push</tt> and <tt>pop</tt>
887 amortized performance, some have better bounded (worst case)
888 response time than others, some optimize a single method at the
889 expense of others, <i>etc.</i>. In general the "best"
890 implementation is dictated by the problem (see <a href=
891 "pq_performance_tests.html#pq_observations">Performance
892 Tests::Priority Queues::Observations</a>).</p>
894 <p>As with associative containers (see <a href=
895 "#assoc_ds_genericity">Associative Containers::Traits for
896 Underlying Data-Structures</a>), the more implementations
897 co-exist, the more necessary a traits mechanism is for handling
898 generic containers safely and efficiently. This is especially
899 important for priority queues, since the invalidation
900 guarantees of one of the most useful data structures - binary
901 heaps - is markedly different than those of most of the
902 others.</p>
904 <p><a href="pq_design.html#pq_traits">Design::Priority
905 Queues::Traits</a> discusses this further.</p>
907 <h3><a name="pq_binary_heap" id="pq_binary_heap">Binary Heap
908 Implementation</a></h3>
910 <p>Binary heaps are one of the most useful underlying
911 data structures for priority queues. They are very efficient in
912 terms of memory (since they don't require per-value structure
913 metadata), and have the best amortized <tt>push</tt> and
914 <tt>pop</tt> performance for primitive types (<i>e.g.</i>,
915 <tt><b>int</b></tt>s).</p>
917 <p>The STL's <tt>priority_queue</tt> implements this data
918 structure as an adapter over a sequence, typically
919 <tt>std::vector</tt> or <tt>std::deque</tt>, which correspond
920 to Figures <a href="#pq_different_underlying_dss">Underlying
921 Priority-Queue Data-Structures</a> A1 and A2, respectively.</p>
923 <p>This is indeed an elegant example of the adapter concept and
924 the algorithm/container/iterator decomposition (see [<a href=
925 "references.html#nelson96stlpq">nelson96stlpql</a>]). There are
926 possibly reasons, however, why a binary-heap priority queue
927 would be better implemented as a container instead of a
928 sequence adapter:</p>
930 <ol>
931 <li><tt>std::priority_queue</tt> cannot erase values from its
932 adapted sequence (irrespective of the sequence type). This
933 means that the memory use of an <tt>std::priority_queue</tt>
934 object is always proportional to the maximal number of values
935 it ever contained, and not to the number of values that it
936 currently contains (see <a href=
937 "priority_queue_text_pop_mem_usage_test.html">Priority Queue
938 Text <tt>pop</tt> Memory Use Test</a>); this implementation
939 of binary heaps acts very differently than other underlying
940 data structures (<i>e.g.</i>, pairing heaps).</li>
942 <li>Some combinations of adapted sequences and value types
943 are very inefficient or just don't make sense. If one uses
944 <tt>std::priority_queue&lt;std::vector&lt;std::string&gt;
945 &gt; &gt;</tt>, for example, then not only will each
946 operation perform a logarithmic number of
947 <tt>std::string</tt> assignments, but, furthermore, any
948 operation (including <tt>pop</tt>) can render the container
949 useless due to exceptions. Conversely, if one uses
950 <tt>std::priority_queue&lt;std::deque&lt;<b>int</b>&gt; &gt;
951 &gt;</tt>, then each operation uses incurs a logarithmic
952 number of indirect accesses (through pointers) unnecessarily.
953 It might be better to let the container make a conservative
954 deduction whether to use the structure in Figures <a href=
955 "#pq_different_underlying_dss">Underlying Priority-Queue
956 Data-Structures</a> A1 or A2.</li>
958 <li>There does not seem to be a systematic way to determine
959 what exactly can be done with the priority queue.
961 <ol>
962 <li>If <tt>p</tt> is a priority queue adapting an
963 <tt>std::vector</tt>, then it is possible to iterate over
964 all values by using <tt>&amp;p.top()</tt> and
965 <tt>&amp;p.top() + p.size()</tt>, but this will not work
966 if <tt>p</tt> is adapting an <tt>std::deque</tt>; in any
967 case, one cannot use <tt>p.begin()</tt> and
968 <tt>p.end()</tt>. If a different sequence is adapted, it
969 is even more difficult to determine what can be
970 done.</li>
972 <li>If <tt>p</tt> is a priority queue adapting an
973 <tt>std::deque</tt>, then the reference return by
974 <tt>p.top()</tt> will remain valid until it is popped,
975 but if <tt>p</tt> adapts an <tt>std::vector</tt>, the
976 next <tt>push</tt> will invalidate it. If a different
977 sequence is adapted, it is even more difficult to
978 determine what can be done.</li>
979 </ol>
980 </li>
982 <li>Sequence-based binary heaps can still implement
983 linear-time <tt>erase</tt> and <tt>modify</tt> operations.
984 This means that if one needs, <i>e.g.</i>, to erase a small
985 (say logarithmic) number of values, then one might still
986 choose this underlying data structure. Using
987 <tt>std::priority_queue</tt>, however, this will generally
988 change the order of growth of the entire sequence of
989 operations.</li>
990 </ol>
991 </div>
992 </body>
993 </html>