libstdc++-v3/doc/html/ext/pb_ds/motivation.html

   1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   2     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   3
   4 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   5 <head>
   6   <meta name="generator" content=
   7   "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" />
   8
   9   <title>Motivation</title>
  10   <meta http-equiv="Content-Type" content=
  11   "text/html; charset=us-ascii" />
  12   </head>
  13
  14 <body>
  15   <div id="page">
  16     <h1>Motivation</h1>
  17
  18     <p>Many fine associative-container libraries were already
  19     written, most notably, the STL's associative containers. Why
  20     then write another library? This section shows some possible
  21     advantages of this library, when considering the challenges in
  22     <a href="introduction.html">Introduction</a>. Many of these
  23     points stem from the fact that the STL introduced
  24     associative-containers in a two-step process (first
  25     standardizing tree-based containers, only then adding
  26     hash-based containers, which are fundamentally different), did
  27     not standardize priority queues as containers, and (in our
  28     opinion) overloads the iterator concept.</p>
  29
  30     <h2><a name="assoc" id="assoc">Associative Containers</a></h2>
  31
  32     <h3><a name="assoc_policies" id="assoc_policies">More
  33     Configuration Choices</a></h3>
  34
  35     <p>Associative containers require a relatively large number of
  36     policies to function efficiently in various settings. In some
  37     cases this is needed for making their common operations more
  38     efficient, and in other cases this allows them to support a
  39     larger set of operations</p>
  40
  41     <ol>
  42       <li>Hash-based containers, for example, support look-up and
  43       insertion methods (<i>e.g.</i>, <tt>find</tt> and
  44       <tt>insert</tt>). In order to locate elements quickly, they
  45       are supplied a hash functor, which instruct how to transform
  46       a key object into some size type; <i>e.g.</i>, a hash functor
  47       might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A
  48       hash table, though, requires transforming each key object
  49       into some size-type type in some specific domain;
  50       <i>e.g.</i>, a hash table with a 128-long table might
  51       transform <tt>"hello"</tt> into position 63. The policy by
  52       which the hash value is transformed into a position within
  53       the table can dramatically affect performance (see <a href=
  54       "hash_based_containers.html#hash_policies">Design::Associative
  55       Containers::Hash-Based Containers::Hash Policies</a>).
  56       Hash-based containers also do not resize naturally (as
  57       opposed to tree-based containers, for example). The
  58       appropriate resize policy is unfortunately intertwined with
  59       the policy that transforms hash value into a position within
  60       the table (see <a href=
  61       "hash_based_containers.html#resize_policies">Design::Associative
  62       Containers::Hash-Based Containers::Resize Policies</a>).
  63
  64         <p><a href=
  65         "assoc_performance_tests.html#hash_based">Associative-Container
  66         Performance Tests::Hash-Based Containers</a> quantifies
  67         some of these points.</p>
  68       </li>
  69
  70       <li>Tree-based containers, for example, also support look-up
  71       and insertion methods, and are primarily useful when
  72       maintaining order between elements is important. In some
  73       cases, though, one can utilize their balancing algorithms for
  74       completely different purposes.
  75
  76         <p>Figure <a href="#node_invariants">Metadata for
  77         order-statistics and interval intersections</a>-A, for
  78         example, shows a tree whose each node contains two entries:
  79         a floating-point key, and some size-type <i>metadata</i>
  80         (in bold beneath it) that is the number of nodes in the
  81         sub-tree. (<i>E.g.</i>, the root has key 0.99, and has 5
  82         nodes (including itself) in its sub-tree.) A container based
  83         on this data structure can obviously answer efficiently
  84         whether 0.3 is in the container object, but it can also
  85         answer what is the order of 0.3 among all those in the
  86         container object [<a href=
  87         "references.html#clrs2001">clrs2001</a>] (see <a href=
  88         "assoc_examples.html#tree_like_based">Associative Container
  89         Examples::Tree-Like-Based Containers (Trees and
  90         Tries)</a>).</p>
  91
  92         <p>As another example, Figure <a href=
  93         "#node_invariants">Metadata for order-statistics and
  94         interval intersections</a>-B shows a tree whose each node
  95         contains two entries: a half-open geometric line interval,
  96         and a number <i>metadata</i> (in bold beneath it) that is
  97         the largest endpoint of all intervals in its sub-tree.
  98         (<i>E.g.</i>, the root describes the interval <i>[20,
  99         36)</i>, and the largest endpoint in its sub-tree is 99.) A
 100         container based on this data structure can obviously answer
 101         efficiently whether <i>[3, 41)</i> is in the container
 102         object, but it can also answer efficiently whether the
 103         container object has intervals that intersect <i>[3,
 104         41)</i> (see <a href=
 105         "assoc_examples.html#tree_like_based">Associative Container
 106         Examples::Tree-Like-Based Containers (Trees and
 107         Tries)</a>). These types of queries are very useful in
 108         geometric algorithms and lease-management algorithms.</p>
 109
 110         <p>It is important to note, however, that as the trees are
 111         modified, their internal structure changes. To maintain
 112         these invariants, one must supply some policy that is aware
 113         of these changes (see <a href=
 114         "tree_based_containers.html#invariants">Design::Associative
 115         Containers::Tree-Based Containers::Node Invariants</a>);
 116         without this, it would be better to use a linked list (in
 117         itself very efficient for these purposes).</p>
 118
 119         <p><a href=
 120         "assoc_performance_tests.html#tree_like_based">Associative-Container
 121         Performance Tests::Tree-Like-Based Containers</a>
 122         quantifies some of these points.</p>
 123       </li>
 124     </ol>
 125
 126     <h6 class="c1"><a name="node_invariants" id=
 127     "node_invariants"><img src="node_invariants.png" alt=
 128     "no image" /></a></h6>
 129
 130     <h6 class="c1">Metadata for order-statistics and interval
 131     intersections.</h6>
 132
 133     <h3><a name="assoc_ds_genericity" id="assoc_ds_genericity">More
 134     Data Structures and Traits</a></h3>
 135
 136     <p>The STL contains associative containers based on red-black
 137     trees and collision-chaining hash tables. These are obviously
 138     very useful, but they are not ideal for all types of
 139     settings.</p>
 140
 141     <p>Figure <a href=
 142     "#different_underlying_data_structures">Different underlying
 143     data structures</a> shows different underlying data structures
 144     (the ones currently supported in <tt>pb_ds</tt>). A shows a
 145     collision-chaining hash-table, B shows a probing hash-table, C
 146     shows a red-black tree, D shows a splay tree, E shows a tree
 147     based on an ordered vector(implicit in the order of the
 148     elements), F shows a PATRICIA trie, and G shows a list-based
 149     container with update policies.</p>
 150
 151     <p>Each of these data structures has some performance benefits,
 152     in terms of speed, size or both (see <a href=
 153     "assoc_performance_tests.html">Associative-Container
 154     Performance Tests</a> and <a href=
 155     "assoc_performance_tests.html#dss_family_choice">Associative-Container
 156     Performance Tests::Observations::Underlying Data-Structure
 157     Families</a>). For now, though, note that <i>e.g.</i>,
 158     vector-based trees and probing hash tables manipulate memory
 159     more efficiently than red-black trees and collision-chaining
 160     hash tables, and that list-based associative containers are
 161     very useful for constructing "multimaps" (see <a href=
 162     "#assoc_mapping_semantics">Alternative to Multiple Equivalent
 163     Keys</a>, <a href=
 164     "assoc_performance_tests.html#multimaps">Associative Container
 165     Performance Tests::Multimaps</a>, and <a href=
 166     "assoc_performance_tests.html#msc">Associative Container
 167     Performance Tests::Observations::Mapping-Semantics
 168     Considerations</a>).</p>
 169
 170     <h6 class="c1"><a name="different_underlying_data_structures"
 171     id="different_underlying_data_structures"><img src=
 172     "different_underlying_dss.png" alt="no image" /></a></h6>
 173
 174     <h6 class="c1">Different underlying data structures.</h6>
 175
 176     <p>Now consider a function manipulating a generic associative
 177     container, <i>e.g.</i>,</p>
 178     <pre>
 179 <b>template</b>&lt;
 180     <b>class</b> Cntnr&gt;
 181 <b>int</b>
 182     some_op_sequence
 183     (Cntnr &amp;r_cnt)
 184 {
 185     ...
 186 }
 187 </pre>
 188
 189     <p>Ideally, the underlying data structure of <tt>Cntnr</tt>
 190     would not affect what can be done with <tt>r_cnt</tt>.
 191     Unfortunately, this is not the case.</p>
 192
 193     <p>For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then
 194     the function can use <tt>std::for_each(r_cnt.find(foo),
 195     r_cnt.find(bar), foobar)</tt> in order to apply <tt>foobar</tt>
 196     to all elements between <tt>foo</tt> and <tt>bar</tt>. If
 197     <tt>Cntnr</tt> is a hash-based container, then this call's
 198     results are undefined.</p>
 199
 200     <p>Also, if <tt>Cntnr</tt> is tree-based, the type and object
 201     of the comparison functor can be accessed. If <tt>Cntnr</tt> is
 202     hash based, these queries are nonsensical.</p>
 203
 204     <p>There are various other differences based on the container's
 205     underlying data structure. For one, they can be constructed by,
 206     and queried for, different policies. Furthermore:</p>
 207
 208     <ol>
 209       <li>Containers based on C, D, E and F store elements in a
 210       meaningful order; the others store elements in a meaningless
 211       (and probably time-varying) order. By implication, only
 212       containers based on C, D, E and F can support erase
 213       operations taking an iterator and returning an iterator to
 214       the following element without performance loss (see <a href=
 215       "#assoc_ers_methods">Slightly Different Methods::Methods
 216       Related to Erase</a>).</li>
 217
 218       <li>Containers based on C, D, E, and F can be split and
 219       joined efficiently, while the others cannot. Containers based
 220       on C and D, furthermore, can guarantee that this is
 221       exception-free; containers based on E cannot guarantee
 222       this.</li>
 223
 224       <li>Containers based on all but E can guarantee that erasing
 225       an element is exception free; containers based on E cannot
 226       guarantee this. Containers based on all but B and E can
 227       guarantee that modifying an object of their type does not
 228       invalidate iterators or references to their elements, while
 229       containers based on B and E cannot. Containers based on C, D,
 230       and E can furthermore make a stronger guarantee, namely that
 231       modifying an object of their type does not affect the order
 232       of iterators.</li>
 233     </ol>
 234
 235     <p>A unified tag and traits system (as used for the STL's
 236     iterators, for example) can ease generic manipulation of
 237     associative containers based on different underlying
 238     data structures (see <a href=
 239     "tutorial.html#assoc_ds_gen">Tutorial::Associative
 240     Containers::Determining Containers' Attributes</a> and <a href=
 241     "ds_gen.html#container_traits">Design::Associative
 242     Containers::Data-Structure Genericity::Data-Structure Tags and
 243     Traits</a>).</p>
 244
 245     <h3><a name="assoc_diff_it" id="assoc_diff_it">Differentiating
 246     between Iterator Types</a></h3>
 247
 248     <p>Iterators are centric to the STL's design, because of the
 249     container/algorithm/iterator decomposition that allows an
 250     algorithm to operate on a range through iterators of some
 251     sequence (<i>e.g.</i>, one originating from a container).
 252     Iterators, then, are useful because they allow going over a
 253     <u>sequence</u>. The STL also uses iterators for accessing a
 254     <u>specific</u> element - <i>e.g.</i>, when an associative
 255     container returns one through <tt>find</tt>. The STL, however,
 256     consistently uses the same types of iterators for both
 257     purposes: going over a range, and accessing a specific found
 258     element. Before the introduction of hash-based containers to
 259     the STL, this made sense (with the exception of priority
 260     queues, which are discussed in <a href="#pq">Priority
 261     Queues</a>).</p>
 262
 263     <p>Using the STL's associative containers together with
 264     non-order-preserving associative containers (and also because
 265     of priority-queues container), there is a possible need for
 266     different types of iterators for self-organizing containers -
 267     the iterator concept seems overloaded to mean two different
 268     things (in some cases). The following subsections explain this;
 269     <a href="tutorial.html#assoc_find_range">Tutorial::Associative
 270     Containers::Point-Type and Range-Type Methods</a> explains an
 271     alternative design which does not complicate the use of
 272     order-preserving containers, but is better for unordered
 273     containers; <a href=
 274     "ds_gen.html#find_range">Design::Associative
 275     Containers::Data-Structure Genericity::Point-Type and
 276     Range-Type Methods</a> explains the design further.</p>
 277
 278     <h4><a name="assoc_find_it_range_it" id=
 279     "assoc_find_it_range_it">Using Point-Type Iterators for
 280     Range-Type Operations</a></h4>
 281
 282     <p>Suppose <tt>cntnr</tt> is some associative container, and
 283     say <tt>c</tt> is an object of type <tt>cntnr</tt>. Then what
 284     will be the outcome of</p>
 285     <pre>
 286 std::for_each(c.find(1), c.find(5), foo);
 287 </pre>
 288
 289     <p>If <tt>cntnr</tt> is a tree-based container object, then an
 290     in-order walk will apply <tt>foo</tt> to the relevant elements,
 291     <i>e.g.</i>, as in Figure <a href="#range_it_in_hts">Range
 292     iteration in different data structures</a> -A. If <tt>c</tt> is
 293     a hash-based container, then the order of elements between any
 294     two elements is undefined (and probably time-varying); there is
 295     no guarantee that the elements traversed will coincide with the
 296     <i>logical</i> elements between 1 and 5, <i>e.g.</i>, as in
 297     Figure <a href="#range_it_in_hts">Range iteration in different
 298     data structures</a>-B.</p>
 299
 300     <h6 class="c1"><a name="range_it_in_hts" id=
 301     "range_it_in_hts"><img src="point_iterators_range_ops_1.png"
 302     alt="no image" /></a></h6>
 303
 304     <h6 class="c1">Range iteration in different
 305     data structures.</h6>
 306
 307     <p>In our opinion, this problem is not caused just because
 308     red-black trees are order preserving while collision-chaining
 309     hash tables are (generally) not - it is more fundamental. Most
 310     of the STL's containers order sequences in a well-defined
 311     manner that is determined by their <u>interface</u>: calling
 312     <tt>insert</tt> on a tree-based container modifies its sequence
 313     in a predictable way, as does calling <tt>push_back</tt> on a
 314     list or a vector. Conversely, collision-chaining hash tables,
 315     probing hash tables, priority queues, and list-based containers
 316     (which are very useful for "multimaps") are self-organizing
 317     data structures; the effect of each operation modifies their
 318     sequences in a manner that is (practically) determined by their
 319     <u>implementation</u>.</p>
 320
 321     <p>Consequently, applying an algorithm to a sequence obtained
 322     from most containers <u>may or may not</u> make sense, but
 323     applying it to a sub-sequence of a self-organizing container
 324     <u>does not</u>.</p>
 325
 326     <h4><a name="assoc_range_it_for_find_it" id=
 327     "assoc_range_it_for_find_it">The Cost of Enabling Range
 328     Capabilities to Point-Type Iterators</a></h4>
 329
 330     <p>Suppose <tt>c</tt> is some collision-chaining hash-based
 331     container object, and one calls <tt>c.find(3)</tt>. Then what
 332     composes the returned iterator?</p>
 333
 334     <p>Figure <a href="#find_its_in_hash_tables">Point-type
 335     iterators in hash tables</a>-A shows the simplest (and most
 336     efficient) implementation of a collision-chaining hash table.
 337     The little box marked <tt>point_iterator</tt> shows an object
 338     that contains a pointer to the element's node. Note that this
 339     "iterator" has no way to move to the next element (<i>i.e.</i>,
 340     it cannot support <tt><b>operator</b>++</tt>). Conversely, the
 341     little box marked <tt>iterator</tt> stores both a pointer to
 342     the element, as well as some other information (<i>e.g.</i>,
 343     the bucket number of the element). the second iterator, then,
 344     is "heavier" than the first one- it requires more time and
 345     space. If we were to use a different container to
 346     cross-reference into this hash-table using these iterators - it
 347     would take much more space. As noted in <a href=
 348     "#assoc_find_it_range_it">Using Point-Type Iterators for
 349     Range-Type Operations</a>, nothing much can be done by
 350     incrementing these iterators, so why is this extra information
 351     needed?</p>
 352
 353     <p>Alternatively, one might create a collision-chaining
 354     hash-table where the lists might be linked, forming a
 355     monolithic total-element list, as in Figure <a href=
 356     "#find_its_in_hash_tables">Point-type iterators in hash
 357     tables</a>-B (this seems similar to the Dinkumware design
 358     [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]).
 359     Here the iterators are as light as can be, but the hash-table's
 360     operations are more complicated.</p>
 361
 362     <h6 class="c1"><a name="find_its_in_hash_tables" id=
 363     "find_its_in_hash_tables"><img src=
 364     "point_iterators_range_ops_2.png" alt="no image" /></a></h6>
 365
 366     <h6 class="c1">Point-type iterators in hash tables.</h6>
 367
 368     <p>It should be noted that containers based on
 369     collision-chaining hash-tables are not the only ones with this
 370     type of behavior; many other self-organizing data structures
 371     display it as well.</p>
 372
 373     <h4><a name="assoc_inv_guar" id="assoc_inv_guar">Invalidation
 374     Guarantees</a></h4>
 375
 376     <p>Consider the following snippet:</p>
 377     <pre>
 378 it = c.find(3);
 379
 380 c.erase(5);
 381 </pre>
 382
 383     <p>Following the call to <tt>erase</tt>, what is the validity
 384     of <tt>it</tt>: can it be de-referenced? can it be
 385     incremented?</p>
 386
 387     <p>The answer depends on the underlying data structure of the
 388     container. Figure <a href=
 389     "#invalidation_guarantee_erase">Effect of erase in different
 390     underlying data structures</a> shows three cases: A1 and A2
 391     show a red-black tree; B1 and B2 show a probing hash-table; C1
 392     and C2 show a collision-chaining hash table.</p>
 393
 394     <h6 class="c1"><a name="invalidation_guarantee_erase" id=
 395     "invalidation_guarantee_erase"><img src=
 396     "invalidation_guarantee_erase.png" alt="no image" /></a></h6>
 397
 398     <h6 class="c1">Effect of erase in different underlying
 399     data structures.</h6>
 400
 401     <ol>
 402       <li>Erasing 5 from A1 yields A2. Clearly, an iterator to 3
 403       can be de-referenced and incremented. The sequence of
 404       iterators changed, but in a way that is well-defined by the
 405       <u>interface</u>.</li>
 406
 407       <li>Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
 408       not valid at all - it cannot be de-referenced or incremented;
 409       the order of iterators changed in a way that is (practically)
 410       determined by the <u>implementation</u> and not by the
 411       <u>interface</u>.</li>
 412
 413       <li>Erasing 5 from C1 yields C2. Here the situation is more
 414       complicated. On the one hand, there is no problem in
 415       de-referencing <tt>it</tt>. On the other hand, the order of
 416       iterators changed in a way that is (practically) determined
 417       by the <u>implementation</u> and not by the
 418       <u>interface</u>.</li>
 419     </ol>
 420
 421     <p>So in classic STL, it is not always possible to express
 422     whether <tt>it</tt> is valid or not. This is true also for
 423     <tt>insert</tt>, <i>e.g.</i>. Again, the iterator concept seems
 424     overloaded.</p>
 425
 426     <h3><a name="assoc_methods" id="assoc_methods">Slightly
 427     Different Methods</a></h3>
 428
 429     <p>[<a href="references.html#meyers02both">meyers02both</a>]
 430     points out that a class's methods should comprise only
 431     operations which depend on the class's internal structure;
 432     other operations are best designed as external functions.
 433     Possibly, therefore, the STL's associative containers lack some
 434     useful methods, and provide some other methods which would be
 435     better left out (<i>e.g.</i>, [<a href=
 436     "references.html#sgi_stl">sgi_stl</a>] ).</p>
 437
 438     <h4><a name="assoc_ers_methods" id="assoc_ers_methods">Methods
 439     Related to Erase</a></h4>
 440
 441     <ol>
 442       <li>Order-preserving STL associative containers provide the
 443       method
 444         <pre>
 445 iterator
 446     erase
 447     (iterator it)
 448 </pre>which takes an iterator, erases the corresponding element,
 449 and returns an iterator to the following element. Also hash-based
 450 STL associative containers provide this method. This <u>seemingly
 451 increases</u> genericity between associative containers, since, <i>
 452         e.g.</i>, it is possible to use
 453         <pre>
 454 <b>typename</b> C::iterator it = c.begin();
 455 <b>typename</b> C::iterator e_it = c.end();
 456
 457 <b>while</b>(it != e_it)
 458     it = pred(*it)? c.erase(it) : ++it;
 459 </pre>in order to erase from a container object <tt>
 460         c</tt> all element which match a predicate <tt>pred</tt>.
 461         However, in a different sense this actually
 462         <u>decreases</u> genericity: an integral implication of
 463         this method is that tree-based associative containers'
 464         memory use is linear in the total number of elements they
 465         store, while hash-based containers' memory use is unbounded
 466         in the total number of elements they store. Assume a
 467         hash-based container is allowed to decrease its size when
 468         an element is erased. Then the elements might be rehashed,
 469         which means that there is no "next" element - it is simply
 470         undefined. Consequently, it is possible to infer from the
 471         fact that STL hash-based containers provide this method
 472         that they cannot downsize when elements are erased
 473         (<a href="assoc_performance_tests.html#hash_based">Performance
 474         Tests::Hash-Based Container Tests</a> quantifies this.) As
 475         a consequence, different code is needed to manipulate
 476         different containers, assuming that memory should be
 477         conserved. <tt>pb_ds</tt>'s non-order preserving
 478         associative containers omit this method.
 479       </li>
 480
 481       <li>All of <tt>pb_ds</tt>'s associative containers include a
 482       conditional-erase method
 483         <pre>
 484 <b>template</b>&lt;
 485     <b>class</b> Pred&gt;
 486 size_type
 487     erase_if
 488     (Pred pred)
 489 </pre>which erases all elements matching a predicate. This is
 490 probably the only way to ensure linear-time multiple-item erase
 491 which can actually downsize a container.
 492       </li>
 493
 494       <li>STL associative containers provide methods for
 495       multiple-item erase of the form
 496         <pre>
 497 size_type
 498     erase
 499     (It b,
 500         It e)
 501 </pre>erasing a range of elements given by a pair of iterators. For
 502 tree-based or trie-based containers, this can implemented more
 503 efficiently as a (small) sequence of split and join operations. For
 504 other, unordered, containers, this method isn't much better than an
 505 external loop. Moreover, if <tt>c</tt> is a hash-based container,
 506 then, <i>e.g.</i>, <tt>c.erase(c.find(2), c.find(5))</tt> is almost
 507 certain to do something different than erasing all elements whose
 508 keys are between 2 and 5, and is likely to produce other undefined
 509 behavior.
 510       </li>
 511     </ol>
 512
 513     <h4><a name="assoc_split_join_methods" id=
 514     "assoc_split_join_methods">Methods Related to Split and
 515     Join</a></h4>
 516
 517     <p>It is well-known that tree-based and trie-based container
 518     objects can be efficiently split or joined [<a href=
 519     "references.html#clrs2001">clrs2001</a>]. Externally splitting
 520     or joining trees is super-linear, and, furthermore, can throw
 521     exceptions. Split and join methods, consequently, seem good
 522     choices for tree-based container methods, especially, since as
 523     noted just before, they are efficient replacements for erasing
 524     sub-sequences. <a href=
 525     "assoc_performance_tests.html#tree_like_based">Performance
 526     Tests::Tree-Like-Based Container Tests</a> quantifies this.</p>
 527
 528     <h4><a name="assoc_insert_methods" id=
 529     "assoc_insert_methods">Methods Related to Insert</a></h4>
 530
 531     <p>STL associative containers provide methods of the form</p>
 532     <pre>
 533 <b>template</b>&lt;
 534     <b>class</b> It&gt;
 535 size_type
 536     insert
 537     (It b,
 538         It e);
 539 </pre>for inserting a range of elements given by a pair of
 540 iterators. At best, this can be implemented as an external loop,
 541 or, even more efficiently, as a join operation (for the case of
 542 tree-based or trie-based containers). Moreover, these methods seem
 543 similar to constructors taking a range given by a pair of
 544 iterators; the constructors, however, are transactional, whereas
 545 the insert methods are not; this is possibly confusing.
 546
 547     <h4><a name="assoc_equiv_comp_methods" id=
 548     "assoc_equiv_comp_methods">Functions Related to
 549     Comparison</a></h4>
 550
 551     <p>Associative containers are parametrized by policies
 552     allowing to test key equivalence; <i>e.g.</i> a hash-based
 553     container can do this through its equivalence functor, and a
 554     tree-based container can do this through its comparison
 555     functor. In addition, some STL associative containers have
 556     global function operators, <i>e.g.</i>,
 557     <tt><b>operator</b>==</tt> and <tt><b>operator</b>&lt;=</tt>,
 558     that allow comparing entire associative containers.</p>
 559
 560     <p>In our opinion, these functions are better left out. To
 561     begin with, they do not significantly improve over an external
 562     loop. More importantly, however, they are possibly misleading -
 563     <tt><b>operator</b>==</tt>, for example, usually checks for
 564     equivalence, or interchangeability, but the associative
 565     container cannot check for values' equivalence, only keys'
 566     equivalence; also, are two containers considered equivalent if
 567     they store the same values in different order? this is an
 568     arbitrary decision.</p>
 569
 570     <h3><a name="assoc_mapping_semantics" id=
 571     "assoc_mapping_semantics">Alternative to Multiple Equivalent
 572     Keys</a></h3>
 573
 574     <p>Maps (or sets) allow mapping (or storing) unique-key values.
 575     The STL, however, also supplies associative containers which
 576     map (or store) multiple values with equivalent keys:
 577     <tt>std::multimap</tt>, <tt>std::multiset</tt>,
 578     <tt>std::tr1::unordered_multimap</tt>, and
 579     <tt>unordered_multiset</tt>. We first discuss how these might
 580     be used, then why we think it is best to avoid them.</p>
 581
 582     <p>Suppose one builds a simple bank-account application that
 583     records for each client (identified by an <tt>std::string</tt>)
 584     and account-id (marked by an <tt><b>unsigned long</b></tt>) -
 585     the balance in the account (described by a
 586     <tt><b>float</b></tt>). Suppose further that ordering this
 587     information is not useful, so a hash-based container is
 588     preferable to a tree based container. Then one can use</p>
 589     <pre>
 590 std::tr1::unordered_map&lt;std::pair&lt;std::string, <b>unsigned long</b>&gt;, <b>float</b>, ...&gt;
 591 </pre>which <u>hashes every combination of client and
 592 account-id</u>. This might work well, except for the fact that it
 593 is now impossible to efficiently list all of the accounts of a
 594 specific client (this would practically require iterating over all
 595 entries). Instead, one can use
 596     <pre>
 597 std::tr1::unordered_multimap&lt;std::pair&lt;std::string, <tt><b>unsigned long</b></tt>&gt;, <b>float</b>, ...&gt;
 598 </pre>which <u>hashes every client</u>, and <u>decides equivalence
 599 based on client</u> only. This will ensure that all accounts
 600 belonging to a specific user are stored consecutively.
 601
 602     <p>Also, suppose one wants an integers' priority queue
 603     (<i>i.e.,</i> a container that supports <tt>push</tt>,
 604     <tt>pop</tt>, and <tt>top</tt> operations, the last of which
 605     returns the largest <tt><b>int</b></tt>) that also supports
 606     operations such as <tt>find</tt> and <tt>lower_bound</tt>. A
 607     reasonable solution is to build an adapter over
 608     <tt>std::set&lt;<b>int</b>&gt;</tt>. In this adapter,
 609     <i>e.g.</i>, <tt>push</tt> will just call the tree-based
 610     associative container's <tt>insert</tt> method; <tt>pop</tt>
 611     will call its <tt>end</tt> method, and use it to return the
 612     preceding element (which must be the largest). Then this might
 613     work well, except that the container object cannot hold
 614     multiple instances of the same integer (<tt>push(4)</tt>,
 615     <i>e.g.</i>, will be a no-op if <tt>4</tt> is already in the
 616     container object). If multiple keys are necessary, then one
 617     might build the adapter over an
 618     <tt>std::multiset&lt;<b>int</b>&gt;</tt>.</p>
 619
 620     <p class="c1">STL non-unique-mapping containers, then, are
 621     useful when (1) a key can be decomposed in to a primary key and
 622     a secondary key, (2) a key is needed multiple times, or (3) any
 623     combination of (1) and (2).</p>
 624
 625     <p>Figure <a href="#embedded_lists_1">Non-unique mapping
 626     containers in the STL's design</a> shows how the STL's design
 627     works internally; in this figure nodes shaded equally represent
 628     equivalent-key values. Equivalent keys are stored consecutively
 629     using the properties of the underlying data structure: binary
 630     search trees (Figure <a href="#embedded_lists_1">Non-unique
 631     mapping containers in the STL's design</a>-A) store
 632     equivalent-key values consecutively (in the sense of an
 633     in-order walk) naturally; collision-chaining hash tables
 634     (Figure <a href="#embedded_lists_1">Non-unique mapping
 635     containers in the STL's design</a>-B) store equivalent-key
 636     values in the same bucket, the bucket can be arranged so that
 637     equivalent-key values are consecutive.</p>
 638
 639     <h6 class="c1"><a name="embedded_lists_1" id=
 640     "embedded_lists_1"><img src="embedded_lists_1.png" alt=
 641     "no image" /></a></h6>
 642
 643     <h6 class="c1">Non-unique mapping containers in the STL's
 644     design.</h6>
 645
 646     <p>Put differently, STL non-unique mapping
 647     associative-containers are associative containers that map
 648     primary keys to linked lists that are embedded into the
 649     container. Figure <a href="#embedded_lists_2">Effect of
 650     embedded lists in STL multimaps</a> shows again the two
 651     containers from Figure <a href="#embedded_lists_1">Non-unique
 652     mapping containers in the STL's design</a>, this time with the
 653     embedded linked lists of the grayed nodes marked
 654     explicitly.</p>
 655
 656     <h6 class="c1"><a name="embedded_lists_2" id=
 657     "embedded_lists_2"><img src="embedded_lists_2.png" alt=
 658     "no image" /></a></h6>
 659
 660     <h6 class="c1">Effect of embedded lists in STL multimaps.</h6>
 661
 662     <p>These embedded linked lists have several disadvantages.</p>
 663
 664     <ol>
 665       <li>The underlying data structure embeds the linked lists
 666       according to its own consideration, which means that the
 667       search path for a value might include several different
 668       equivalent-key values. For example, the search path for the
 669       the black node in either of Figures <a href=
 670       "#embedded_lists_1">Non-unique mapping containers in the
 671       STL's design</a> A or B, includes more than a single gray
 672       node.</li>
 673
 674       <li>The links of the linked lists are the underlying
 675       data structures' nodes, which typically are quite structured.
 676       <i>E.g.</i>, in the case of tree-based containers (Figure
 677       <a href="#embedded_lists_2">Effect of embedded lists in STL
 678       multimaps</a>-B), each "link" is actually a node with three
 679       pointers (one to a parent and two to children), and a
 680       relatively-complicated iteration algorithm. The linked lists,
 681       therefore, can take up quite a lot of memory, and iterating
 682       over all values equal to a given key (<i>e.g.</i>, through
 683       the return value of the STL's <tt>equal_range</tt>) can be
 684       expensive.</li>
 685
 686       <li>The primary key is stored multiply; this uses more
 687       memory.</li>
 688
 689       <li>Finally, the interface of this design excludes several
 690       useful underlying data structures. <i>E.g.</i>, of all the
 691       unordered self-organizing data structures, practically only
 692       collision-chaining hash tables can (efficiently) guarantee
 693       that equivalent-key values are stored consecutively.</li>
 694     </ol>
 695
 696     <p>The above reasons hold even when the ratio of secondary keys
 697     to primary keys (or average number of identical keys) is small,
 698     but when it is large, there are more severe problems:</p>
 699
 700     <ol>
 701       <li>The underlying data structures order the links inside
 702       each embedded linked-lists according to their internal
 703       considerations, which effectively means that each of the
 704       links is unordered. Irrespective of the underlying
 705       data structure, searching for a specific value can degrade to
 706       linear complexity.</li>
 707
 708       <li>Similarly to the above point, it is impossible to apply
 709       to the secondary keys considerations that apply to primary
 710       keys. For example, it is not possible to maintain secondary
 711       keys by sorted order.</li>
 712
 713       <li>While the interface "understands" that all equivalent-key
 714       values constitute a distinct list (<i>e.g.</i>, through
 715       <tt>equal_range</tt>), the underlying data structure
 716       typically does not. This means, <i>e.g.</i>, that operations
 717       such as erasing from a tree-based container all values whose
 718       keys are equivalent to a a given key can be super-linear in
 719       the size of the tree; this is also true also for several
 720       other operations that target a specific list.</li>
 721     </ol>
 722
 723     <p>In <tt>pb_ds</tt>, therefore, all associative containers map
 724     (or store) unique-key values. One can (1) map primary keys to
 725     secondary associative-containers (<i>i.e.</i>, containers of
 726     secondary keys) or non-associative containers (2) map identical
 727     keys to a size-type representing the number of times they
 728     occur, or (3) any combination of (1) and (2). Instead of
 729     allowing multiple equivalent-key values, <tt>pb_ds</tt>
 730     supplies associative containers based on underlying
 731     data structures that are suitable as secondary
 732     associative-containers (see <a href=
 733     "assoc_performance_tests.html#msc">Associative-Container
 734     Performance Tests::Observations::Mapping-Semantics
 735     Considerations</a>).</p>
 736
 737     <p>Figures <a href="#embedded_lists_3">Non-unique mapping
 738     containers in <tt>pb_ds</tt></a> A and B show the equivalent
 739     structures in <tt>pb_ds</tt>'s design, to those in Figures
 740     <a href="#embedded_lists_1">Non-unique mapping containers in
 741     the STL's design</a> A and B, respectively. Each shaded box
 742     represents some size-type or secondary
 743     associative-container.</p>
 744
 745     <h6 class="c1"><a name="embedded_lists_3" id=
 746     "embedded_lists_3"><img src="embedded_lists_3.png" alt=
 747     "no image" /></a></h6>
 748
 749     <h6 class="c1">Non-unique mapping containers in the
 750     <tt>pb_ds</tt>.</h6>
 751
 752     <p>In the first example above, then, one would use an
 753     associative container mapping each user to an associative
 754     container which maps each application id to a start time (see
 755     <a href=
 756     "../../../../testsuite/ext/pb_ds/example/basic_multimap.cc"><tt>basic_multimap.cc</tt></a>);
 757     in the second example, one would use an associative container
 758     mapping each <tt><b>int</b></tt> to some size-type indicating
 759     the number of times it logically occurs (see <a href=
 760     "../../../../testsuite/ext/pb_ds/example/basic_multiset.cc"><tt>basic_multiset.cc</tt></a>).</p>
 761
 762     <p><a href=
 763     "assoc_performance_tests.html#multimaps">Associative-Container
 764     Performance Tests::Multimaps</a> quantifies some of these
 765     points, and <a href=
 766     "assoc_performance_tests.html#msc">Associative-Container
 767     Performance Tests::Observations::Mapping-Semantics
 768     Considerations</a> shows some simple calculations.</p>
 769
 770     <p><a href="assoc_examples.html#mmaps">Associative-Container
 771     Examples::Multimaps</a> shows some simple examples of using
 772     "multimaps".</p>
 773
 774     <p><a href="lu_based_containers.html">Design::Associative
 775     Containers::List-Based Containers</a> discusses types of
 776     containers especially suited as secondary
 777     associative-containers.</p>
 778
 779     <h2><a name="pq" id="pq">Priority Queues</a></h2>
 780
 781     <h3><a name="pq_more_ops" id="pq_more_ops">Slightly Different
 782     Methods</a></h3>
 783
 784     <p>Priority queues are containers that allow efficiently
 785     inserting values and accessing the maximal value (in the sense
 786     of the container's comparison functor); <i>i.e.</i>, their
 787     interface supports <tt>push</tt> and <tt>pop</tt>. The STL's
 788     priority queues indeed support these methods, but they support
 789     little else. For algorithmic and software-engineering purposes,
 790     other methods are needed:</p>
 791
 792     <ol>
 793       <li>Many graph algorithms [<a href=
 794       "references.html#clrs2001">clrs2001</a>] require increasing a
 795       value in a priority queue (again, in the sense of the
 796       container's comparison functor), or joining two
 797       priority-queue objects.</li>
 798
 799       <li>It is sometimes necessary to erase an arbitrary value in
 800       a priority queue. For example, consider the <tt>select</tt>
 801       function for monitoring file descriptors:
 802         <pre>
 803 <b>int</b>
 804     select
 805     (<b>int</b> nfds,
 806         fd_set *readfds,
 807         fd_set *writefds,
 808         fd_set *errorfds,
 809         <b>struct</b> timeval *timeout);
 810 </pre>then, as the <tt>select</tt> manual page [<a href=
 811 "references.html#select_man">select_man</a>] states:
 812
 813         <p><q>The nfds argument specifies the range of file
 814         descriptors to be tested. The select() function tests file
 815         descriptors in the range of 0 to nfds-1.</q></p>
 816
 817         <p>It stands to reason, therefore, that we might wish to
 818         maintain a minimal value for <tt>nfds</tt>, and priority
 819         queues immediately come to mind. Note, though, that when a
 820         socket is closed, the minimal file description might
 821         change; in the absence of an efficient means to erase an
 822         arbitrary value from a priority queue, we might as well
 823         avoid its use altogether.</p>
 824
 825         <p><a href="pq_examples.html#xref">Priority-Queue
 826         Examples::Cross-Referencing</a> shows examples for these
 827         types of operations.</p>
 828       </li>
 829
 830       <li>STL containers typically support iterators. It is
 831       somewhat unusual for <tt>std::priority_queue</tt> to omit
 832       them (see, <i>e.g.</i>, [<a href=
 833       "references.html#meyers01stl">meyers01stl</a>]). One might
 834       ask why do priority queues need to support iterators, since
 835       they are self-organizing containers with a different purpose
 836       than abstracting sequences. There are several reasons:
 837
 838         <ol>
 839           <li>Iterators (even in self-organizing containers) are
 840           useful for many purposes, <i>e.g.</i>, cross-referencing
 841           containers, serialization, and debugging code that uses
 842           these containers.</li>
 843
 844           <li>The STL's hash-based containers support iterators,
 845           even though they too are self-organizing containers with
 846           a different purpose than abstracting sequences.</li>
 847
 848           <li>In STL-like containers, it is natural to specify the
 849           interface of operations for modifying a value or erasing
 850           a value (discussed previously) in terms of a iterators.
 851           This is discussed further in <a href=
 852           "pq_design.html#pq_it">Design::Priority
 853           Queues::Iterators</a>. It should be noted that the STL's
 854           containers also use iterators for accessing and
 855           manipulating a specific value. <i>E.g.</i>, in hash-based
 856           containers, one checks the existence of a key by
 857           comparing the iterator returned by <tt>find</tt> to the
 858           iterator returned by <tt>end</tt>, and not by comparing a
 859           pointer returned by <tt>find</tt> to <tt>NULL</tt>.</li>
 860         </ol>
 861       </li>
 862     </ol>
 863
 864     <p><a href="pq_performance_tests.html">Performance
 865     Tests::Priority Queues</a> quantifies some of these points.</p>
 866
 867     <h3><a name="pq_ds_genericity" id="pq_ds_genericity">More Data
 868     Structures and Traits</a></h3>
 869
 870     <p>There are three main implementations of priority queues: the
 871     first employs a binary heap, typically one which uses a
 872     sequence; the second uses a tree (or forest of trees), which is
 873     typically less structured than an associative container's tree;
 874     the third simply uses an associative container. These are
 875     shown, respectively, in Figures <a href=
 876     "#pq_different_underlying_dss">Underlying Priority-Queue
 877     Data-Structures</a> A1 and A2, B, and C.</p>
 878
 879     <h6 class="c1"><a name="pq_different_underlying_dss" id=
 880     "pq_different_underlying_dss"><img src=
 881     "pq_different_underlying_dss.png" alt="no image" /></a></h6>
 882
 883     <h6 class="c1">Underlying Priority-Queue Data-Structures.</h6>
 884
 885     <p>No single implementation can completely replace any of the
 886     others. Some have better <tt>push</tt> and <tt>pop</tt>
 887     amortized performance, some have better bounded (worst case)
 888     response time than others, some optimize a single method at the
 889     expense of others, <i>etc.</i>. In general the "best"
 890     implementation is dictated by the problem (see <a href=
 891     "pq_performance_tests.html#pq_observations">Performance
 892     Tests::Priority Queues::Observations</a>).</p>
 893
 894     <p>As with associative containers (see <a href=
 895     "#assoc_ds_genericity">Associative Containers::Traits for
 896     Underlying Data-Structures</a>), the more implementations
 897     co-exist, the more necessary a traits mechanism is for handling
 898     generic containers safely and efficiently. This is especially
 899     important for priority queues, since the invalidation
 900     guarantees of one of the most useful data structures - binary
 901     heaps - is markedly different than those of most of the
 902     others.</p>
 903
 904     <p><a href="pq_design.html#pq_traits">Design::Priority
 905     Queues::Traits</a> discusses this further.</p>
 906
 907     <h3><a name="pq_binary_heap" id="pq_binary_heap">Binary Heap
 908     Implementation</a></h3>
 909
 910     <p>Binary heaps are one of the most useful underlying
 911     data structures for priority queues. They are very efficient in
 912     terms of memory (since they don't require per-value structure
 913     metadata), and have the best amortized <tt>push</tt> and
 914     <tt>pop</tt> performance for primitive types (<i>e.g.</i>,
 915     <tt><b>int</b></tt>s).</p>
 916
 917     <p>The STL's <tt>priority_queue</tt> implements this data
 918     structure as an adapter over a sequence, typically
 919     <tt>std::vector</tt> or <tt>std::deque</tt>, which correspond
 920     to Figures <a href="#pq_different_underlying_dss">Underlying
 921     Priority-Queue Data-Structures</a> A1 and A2, respectively.</p>
 922
 923     <p>This is indeed an elegant example of the adapter concept and
 924     the algorithm/container/iterator decomposition (see [<a href=
 925     "references.html#nelson96stlpq">nelson96stlpql</a>]). There are
 926     possibly reasons, however, why a binary-heap priority queue
 927     would be better implemented as a container instead of a
 928     sequence adapter:</p>
 929
 930     <ol>
 931       <li><tt>std::priority_queue</tt> cannot erase values from its
 932       adapted sequence (irrespective of the sequence type). This
 933       means that the memory use of an <tt>std::priority_queue</tt>
 934       object is always proportional to the maximal number of values
 935       it ever contained, and not to the number of values that it
 936       currently contains (see <a href=
 937       "priority_queue_text_pop_mem_usage_test.html">Priority Queue
 938       Text <tt>pop</tt> Memory Use Test</a>); this implementation
 939       of binary heaps acts very differently than other underlying
 940       data structures (<i>e.g.</i>, pairing heaps).</li>
 941
 942       <li>Some combinations of adapted sequences and value types
 943       are very inefficient or just don't make sense. If one uses
 944       <tt>std::priority_queue&lt;std::vector&lt;std::string&gt;
 945       &gt; &gt;</tt>, for example, then not only will each
 946       operation perform a logarithmic number of
 947       <tt>std::string</tt> assignments, but, furthermore, any
 948       operation (including <tt>pop</tt>) can render the container
 949       useless due to exceptions. Conversely, if one uses
 950       <tt>std::priority_queue&lt;std::deque&lt;<b>int</b>&gt; &gt;
 951       &gt;</tt>, then each operation uses incurs a logarithmic
 952       number of indirect accesses (through pointers) unnecessarily.
 953       It might be better to let the container make a conservative
 954       deduction whether to use the structure in Figures <a href=
 955       "#pq_different_underlying_dss">Underlying Priority-Queue
 956       Data-Structures</a> A1 or A2.</li>
 957
 958       <li>There does not seem to be a systematic way to determine
 959       what exactly can be done with the priority queue.
 960
 961         <ol>
 962           <li>If <tt>p</tt> is a priority queue adapting an
 963           <tt>std::vector</tt>, then it is possible to iterate over
 964           all values by using <tt>&amp;p.top()</tt> and
 965           <tt>&amp;p.top() + p.size()</tt>, but this will not work
 966           if <tt>p</tt> is adapting an <tt>std::deque</tt>; in any
 967           case, one cannot use <tt>p.begin()</tt> and
 968           <tt>p.end()</tt>. If a different sequence is adapted, it
 969           is even more difficult to determine what can be
 970           done.</li>
 971
 972           <li>If <tt>p</tt> is a priority queue adapting an
 973           <tt>std::deque</tt>, then the reference return by
 974           <tt>p.top()</tt> will remain valid until it is popped,
 975           but if <tt>p</tt> adapts an <tt>std::vector</tt>, the
 976           next <tt>push</tt> will invalidate it. If a different
 977           sequence is adapted, it is even more difficult to
 978           determine what can be done.</li>
 979         </ol>
 980       </li>
 981
 982       <li>Sequence-based binary heaps can still implement
 983       linear-time <tt>erase</tt> and <tt>modify</tt> operations.
 984       This means that if one needs, <i>e.g.</i>, to erase a small
 985       (say logarithmic) number of values, then one might still
 986       choose this underlying data structure. Using
 987       <tt>std::priority_queue</tt>, however, this will generally
 988       change the order of growth of the entire sequence of
 989       operations.</li>
 990     </ol>
 991   </div>
 992 </body>
 993 </html>