libstdc++-v3/doc/xml/manual/policy_data_structures.xml

   1 <chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
   2          xml:id="manual.ext.containers.pbds" xreflabel="pbds">
   3   <info>
   4     <title>Policy-Based Data Structures</title>
   5     <keywordset>
   6       <keyword>ISO C++</keyword>
   7       <keyword>policy</keyword>
   8       <keyword>container</keyword>
   9       <keyword>data</keyword>
  10       <keyword>structure</keyword>
  11       <keyword>associated</keyword>
  12       <keyword>tree</keyword>
  13       <keyword>trie</keyword>
  14       <keyword>hash</keyword>
  15       <keyword>metaprogramming</keyword>
  16     </keywordset>
  17   </info>
  18   <?dbhtml filename="policy_data_structures.html"?>
  19
  20   <!-- 2006-04-01 Ami Tavory -->
  21   <!-- 2011-05-25 Benjamin Kosnik -->
  22
  23   <!-- S01: intro -->
  24   <section xml:id="pbds.intro">
  25     <info><title>Intro</title></info>
  26
  27     <para>
  28       This is a library of policy-based elementary data structures:
  29       associative containers and priority queues. It is designed for
  30       high performance, flexibility, semantic safety, and conformance to
  31       the corresponding containers in <literal>std</literal> and
  32       <literal>std::tr1</literal> (except for some points where it differs
  33       by design).
  34     </para>
  35     <para>
  36     </para>
  37
  38     <section xml:id="pbds.intro.issues">
  39       <info><title>Performance Issues</title></info>
  40       <para>
  41       </para>
  42
  43       <para>
  44         An attempt is made to categorize the wide variety of possible
  45         container designs in terms of performance-impacting factors. These
  46         performance factors are translated into design policies and
  47         incorporated into container design.
  48       </para>
  49
  50       <para>
  51         There is a tension in unraveling these factors into a coherent set
  52         of policies. Every attempt is made to keep the set of factors
  53         minimal. However, in many cases multiple factors make for long
  54         template names. Every attempt is made to use aliases and typedefs in
  55         the source files, but the generated names for external symbols can
  56         still be long when seen in binary files or debuggers.
  57       </para>
  58
  59       <para>
  60         In many cases, the longer names allow capabilities and behaviours
  61         controlled by macros to also be unambiguously emitted as distinct
  62         generated names.
  63       </para>
  64
  65       <para>
  66         Specific issues found while unraveling performance factors in the
  67         design of associative containers and priority queues follow.
  68       </para>
  69
  70       <section xml:id="pbds.intro.issues.associative">
  71         <info><title>Associative</title></info>
  72
  73         <para>
  74           Associative containers depend on their composite policies to a very
  75           large extent. Implicitly hard-wiring policies can hamper their
  76           performance and limit their functionality. An efficient hash-based
  77           container, for example, requires policies for testing key
  78           equivalence, hashing keys, translating hash values into positions
  79           within the hash table, and determining when and how to resize the
  80           table internally. A tree-based container can efficiently support
  81           order statistics, i.e. the ability to query what is the order of
  82           each key within the sequence of keys in the container, but only if
  83           the container is supplied with a policy to internally update
  84           meta-data. There are many other such examples.
  85         </para>
  86
  87         <para>
  88           Ideally, all associative containers would share the same
  89           interface. Unfortunately, underlying data structures and mapping
  90           semantics differentiate between different containers. For example,
  91           suppose one writes a generic function manipulating an associative
  92           container.
  93         </para>
  94
  95         <programlisting>
  96           template&lt;typename Cntnr&gt;
  97           void
  98           some_op_sequence(Cntnr&amp; r_cnt)
  99           {
 100           ...
 101           }
 102         </programlisting>
 103
 104         <para>
 105           Given this, what can one assume about the instantiating
 106           container? The answer varies according to its underlying data
 107           structure. If the underlying data structure of
 108           <literal>Cntnr</literal> is based on a tree or trie, then the order
 109           of elements is well defined; otherwise, it is not, in general. If
 110           the underlying data structure of <literal>Cntnr</literal> is based
 111           on a collision-chaining hash table, then modifying
 112           <literal>r_cnt</literal> will not invalidate its iterators' order;
 113           if the underlying data structure is a probing hash table, then this
 114           is not the case. If the underlying data structure is based on a tree
 115           or trie, then a reference to the container can efficiently be split;
 116           otherwise, it cannot, in general. If the underlying data structure
 117           is a red-black tree, then splitting a reference to the container is
 118           exception-free; if it is an ordered-vector tree, exceptions can be
 119           thrown.
 120         </para>
 121
 122       </section>
 123
 124       <section xml:id="pbds.intro.issues.priority_queue">
 125         <info><title>Priority Queues</title></info>
 126
 127         <para>
 128           Priority queues are useful when one needs to efficiently access a
 129           minimum (or maximum) value as the set of values changes.
 130         </para>
 131
 132         <para>
 133           Most useful data structures for priority queues have a relatively
 134           simple structure, as they are geared toward relatively simple
 135           requirements. Unfortunately, these structures do not support access
 136           to an arbitrary value, which turns out to be necessary in many
 137           algorithms, for example when decreasing an arbitrary value in a
 138           graph algorithm. Therefore, some extra mechanism is necessary for
 139           accessing arbitrary values. There are at least two
 140           alternatives: embedding an associative container in a priority
 141           queue, or allowing cross-referencing through iterators. The first
 142           solution adds significant overhead; the second solution requires a
 143           precise definition of iterator invalidation, which is the next
 144           point.
 145         </para>
 146
 147         <para>
 148           Priority queues, like hash-based containers, store values in an
 149           order that is meaningless and undefined externally. For example, a
 150           <code>push</code> operation can internally reorganize the
 151           values. Because of this characteristic, describing a priority
 152           queue's iterator is difficult: on one hand, the values to which
 153           iterators point can remain valid, but on the other, the logical
 154           order of iterators can change unpredictably.
 155         </para>
 156
 157         <para>
 158           Roughly speaking, any element that is both inserted into a priority
 159           queue (e.g., through <code>push</code>) and removed
 160           from it (e.g., through <code>pop</code>) incurs a
 161           logarithmic overhead (in the amortized sense). Different underlying
 162           data structures place the actual cost differently: some are
 163           optimized for amortized complexity, whereas others guarantee that
 164           specific operations only have a constant cost. One underlying data
 165           structure might be chosen if modifying a value is frequent
 166           (Dijkstra's shortest-path algorithm), whereas a different one might
 167           be chosen otherwise. Unfortunately, an array-based binary heap, an
 168           underlying data structure that optimizes (in the amortized sense)
 169           <code>push</code> and <code>pop</code> operations, differs from the
 170           others in terms of its invalidation guarantees. Other design
 171           decisions also impact the cost and placement of the overhead, at the
 172           expense of more difference in the kinds of operations that the
 173           underlying data structure can support. These differences pose a
 174           challenge when creating a uniform interface for priority queues.
 175         </para>
 176       </section>
 177     </section>
 178
 179     <section xml:id="pbds.intro.motivation">
 180       <info><title>Goals</title></info>
 181
 182       <para>
 183         Many fine associative-container libraries have already been written,
 184         most notably the C++ standard's associative containers. Why
 185         then write another library? This section shows some possible
 186         advantages of this library, when considering the challenges in
 187         the introduction. Many of these points stem from the fact that
 188         the ISO C++ process introduced associative containers in a
 189         two-step process (first standardizing tree-based containers,
 190         only then adding hash-based containers, which are fundamentally
 191         different), did not standardize priority queues as containers,
 192         and (in our opinion) overloads the iterator concept.
 193       </para>
 194
 195       <section xml:id="pbds.intro.motivation.associative">
 196         <info><title>Associative</title></info>
 197         <para>
 198         </para>
 199
 200         <section xml:id="motivation.associative.policy">
 201           <info><title>Policy Choices</title></info>
 202           <para>
 203             Associative containers require a relatively large number of
 204             policies to function efficiently in various settings. In some
 205             cases this is needed for making their common operations more
 206             efficient, and in other cases this allows them to support a
 207             larger set of operations.
 208           </para>
 209
 210           <orderedlist>
 211             <listitem>
 212               <para>
 213                 Hash-based containers, for example, support look-up and
 214                 insertion methods (<function>find</function> and
 215                 <function>insert</function>). In order to locate elements
 216                 quickly, they are supplied a hash functor, which instructs
 217                 how to transform a key object into some size type; a hash
 218                 functor might transform <constant>"hello"</constant>
 219                 into <constant>1123002298</constant>. A hash table, though,
 220                 requires transforming each key object into a size-type
 221                 value in some specific domain; a hash table with a 128-long
 222                 table might transform <constant>"hello"</constant> into
 223                 position <constant>63</constant>. The policy by which the
 224                 hash value is transformed into a position within the table
 225                 can dramatically affect performance.  Hash-based containers
 226                 also do not resize naturally (as opposed to tree-based
 227                 containers, for example). The appropriate resize policy is
 228                 unfortunately intertwined with the policy that transforms
 229                 hash value into a position within the table.
 230               </para>
 231             </listitem>
 232
 233             <listitem>
 234               <para>
 235                 Tree-based containers, for example, also support look-up and
 236                 insertion methods, and are primarily useful when maintaining
 237                 order between elements is important. In some cases, though,
 238                 one can utilize their balancing algorithms for completely
 239                 different purposes.
 240               </para>
 241
 242               <para>
 243                 Figure A shows a tree in which each node contains two entries:
 244                 a floating-point key, and some size-type
 245                 <emphasis>metadata</emphasis> (in bold beneath it) that is
 246                 the number of nodes in the sub-tree. (The root has key 0.99,
 247                 and has 5 nodes (including itself) in its sub-tree.) A
 248                 container based on this data structure can obviously answer
 249                 efficiently whether 0.3 is in the container object, but it
 250                 can also answer what is the order of 0.3 among all those in
 251                 the container object: see <xref linkend="biblio.clrs2001"/>.
 252
 253               </para>
 254
 255               <para>
 256                 As another example, Figure B shows a tree in which each node
 257                 contains two entries: a half-open geometric line interval,
 258                 and a number <emphasis>metadata</emphasis> (in bold beneath
 259                 it) that is the largest endpoint of all intervals in its
 260                 sub-tree.  (The root describes the interval <constant>[20,
 261                 36)</constant>, and the largest endpoint in its sub-tree is
 262                 99.) A container based on this data structure can obviously
 263                 answer efficiently whether <constant>[3, 41)</constant> is
 264                 in the container object, but it can also answer efficiently
 265                 whether the container object has intervals that intersect
 266                 <constant>[3, 41)</constant>. These types of queries are
 267                 very useful in geometric algorithms and lease-management
 268                 algorithms.
 269               </para>
 270
 271               <para>
 272                 It is important to note, however, that as the trees are
 273                 modified, their internal structure changes. To maintain
 274                 these invariants, one must supply some policy that is aware
 275                 of these changes.  Without this, it would be better to use a
 276                 linked list (in itself very efficient for these purposes).
 277               </para>
 278
 279             </listitem>
 280           </orderedlist>
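          <para>
            The two-step mapping described in the first item (key to hash
            value, then hash value to table position) can be sketched with
            standard-library facilities alone. This is a minimal
            illustration under stated assumptions, not this library's
            implementation: the names <function>hash_key</function>,
            <function>position_by_mod</function>, and
            <function>position_by_mask</function> are hypothetical, and the
            hash value produced for any given key is implementation-defined.
          </para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Step 1: the hash functor maps a key object to some size-type value.
// The concrete value depends on the standard library implementation.
inline std::size_t
hash_key(const std::string& key)
{
  return std::hash<std::string>()(key);
}

// Step 2a: map a hash value to a table position by modulo.
// Works for any non-zero table size.
inline std::size_t
position_by_mod(std::size_t hash, std::size_t table_size)
{
  assert(table_size != 0);
  return hash % table_size;
}

// Step 2b: map a hash value to a table position by masking.
// Only valid when the table size is a power of two.
inline std::size_t
position_by_mask(std::size_t hash, std::size_t table_size)
{
  assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
  return hash & (table_size - 1);
}
```
]]></programlisting>
          <para>
            Under these assumptions,
            <code>position_by_mod(hash_key("hello"), 128)</code> yields some
            position below 128. The mask variant is only correct while the
            table size remains a power of two, which hints at why the
            position policy and the resize policy cannot be chosen
            independently.
          </para>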
 281
 282           <figure>
 283             <title>Node Invariants</title>
 284             <mediaobject>
 285               <imageobject>
 286                 <imagedata align="center" format="PNG" scale="100"
 287                            fileref="../images/pbds_node_invariants.png"/>
 288               </imageobject>
 289               <textobject>
 290                 <phrase>Node Invariants</phrase>
 291               </textobject>
 292             </mediaobject>
 293           </figure>
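          <para>
            The order-statistics query described for Figure A can be tried
            with the tree container shipped in this library's headers. The
            sketch below assumes the class and policy names found in
            <filename>ext/pb_ds/assoc_container.hpp</filename> and
            <filename>ext/pb_ds/tree_policy.hpp</filename>; they are not
            introduced in this section. The keys are illustrative and are
            not those of the figure, and <classname>ordered_set</classname>
            and <function>make_example_set</function> are hypothetical
            names.
          </para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// A set of doubles backed by a red-black tree whose nodes also carry
// sub-tree sizes, maintained by the order-statistics node-update policy.
typedef __gnu_pbds::tree<
  double,
  __gnu_pbds::null_type,
  std::less<double>,
  __gnu_pbds::rb_tree_tag,
  __gnu_pbds::tree_order_statistics_node_update> ordered_set;

// Builds a small example set; the keys are illustrative only.
inline ordered_set
make_example_set()
{
  ordered_set s;
  const double keys[] = { 0.99, 0.3, 1.5, 0.1, 0.7 };
  for (std::size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); ++i)
    s.insert(keys[i]);
  return s;
}
```
]]></programlisting>
          <para>
            With these keys, <code>s.order_of_key(0.3)</code> returns 1,
            because only 0.1 is smaller, and
            <code>*s.find_by_order(0)</code> is 0.1. Both queries rely on
            the node-update policy keeping the sub-tree sizes current as the
            tree is rebalanced.
          </para>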
 294
 295         </section>
 296
 297         <section xml:id="motivation.associative.underlying">
 298           <info><title>Underlying Data Structures</title></info>
 299           <para>
 300             The standard C++ library contains associative containers based on
 301             red-black trees and collision-chaining hash tables. These are
 302             very useful, but they are not ideal for all types of
 303             settings.
 304           </para>
 305
 306           <para>
 307             The figure below shows the different underlying data structures
 308             currently supported in this library.
 309           </para>
 310
 311           <figure>
 312             <title>Underlying Associative Data Structures</title>
 313             <mediaobject>
 314               <imageobject>
 315                 <imagedata align="center" format="PNG" scale="100"
 316                            fileref="../images/pbds_different_underlying_dss_1.png"/>
 317               </imageobject>
 318               <textobject>
 319                 <phrase>Underlying Associative Data Structures</phrase>
 320               </textobject>
 321             </mediaobject>
 322           </figure>
 323
 324           <para>
 325             A shows a collision-chaining hash-table, B shows a probing
 326             hash-table, C shows a red-black tree, D shows a splay tree, E shows
 327             a tree based on an ordered vector (implicit in the order of the
 328             elements), F shows a PATRICIA trie, and G shows a list-based
 329             container with update policies.
 330           </para>
 331
 332           <para>
 333             Each of these data structures has some performance benefits, in
 334             terms of speed, size or both. For now, note that vector-based trees
 335             and probing hash tables manipulate memory more efficiently than
 336             red-black trees and collision-chaining hash tables, and that
 337             list-based associative containers are very useful for constructing
 338             "multimaps".
 339           </para>
 340
 341           <para>
 342             Now consider a function manipulating a generic associative
 343             container,
 344           </para>
 345           <programlisting>
 346             template&lt;class Cntnr&gt;
 347             int
 348             some_op_sequence(Cntnr &amp;r_cnt)
 349             {
 350             ...
 351             }
 352           </programlisting>
 353
 354           <para>
 355             Ideally, the underlying data structure
 356             of <classname>Cntnr</classname> would not affect what can be
 357             done with <varname>r_cnt</varname>.  Unfortunately, this is not
 358             the case.
 359           </para>
 360
 361           <para>
 362             For example, if <classname>Cntnr</classname>
 363             is <classname>std::map</classname>, then the function can
 364             use
 365           </para>
 366           <programlisting>
 367             std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)
 368           </programlisting>
 369           <para>
 370             in order to apply <classname>foobar</classname> to all
 371             elements between <classname>foo</classname> and
 372             <classname>bar</classname>. If
 373             <classname>Cntnr</classname> is a hash-based container,
 374             then this call's results are undefined.
 375           </para>
 376
 377           <para>
 378             Also, if <classname>Cntnr</classname> is tree-based, the type
 379             and object of the comparison functor can be
 380             accessed. If <classname>Cntnr</classname> is hash based, these
 381             queries are nonsensical.
 382           </para>
 383
 384           <para>
 385             There are various other differences based on the container's
 386             underlying data structure. For one, they can be constructed by,
 387             and queried for, different policies. Furthermore:
 388           </para>
 389
 390           <orderedlist>
 391             <listitem>
 392               <para>
 393                 Containers based on C, D, E and F store elements in a
 394                 meaningful order; the others store elements in a meaningless
 395                 (and probably time-varying) order. By implication, only
 396                 containers based on C, D, E and F can
 397                 support <function>erase</function> operations taking an
 398                 iterator and returning an iterator to the following element
 399                 without performance loss.
 400               </para>
 401             </listitem>
 402
 403             <listitem>
 404               <para>
 405                 Containers based on C, D, E, and F can be split and joined
 406                 efficiently, while the others cannot. Containers based on C
 407                 and D, furthermore, can guarantee that this is exception-free;
 408                 containers based on E cannot guarantee this.
 409               </para>
 410             </listitem>
 411
 412             <listitem>
 413               <para>
 414                 Containers based on all but E can guarantee that
 415                 erasing an element is exception-free; containers based on E
 416                 cannot guarantee this. Containers based on all but B and E
 417                 can guarantee that modifying an object of their type does
 418                 not invalidate iterators or references to their elements,
 419                 while containers based on B and E cannot. Containers based
 420                 on C, D, and E can furthermore make a stronger guarantee,
 421                 namely that modifying an object of their type does not
 422                 affect the order of iterators.
 423               </para>
 424             </listitem>
 425           </orderedlist>
 426
 427           <para>
 428             A unified tag and traits system (as used for the C++ standard
 429             library iterators, for example) can ease generic manipulation of
 430             associative containers based on different underlying data
 431             structures.
 432           </para>
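          <para>
            As a rough sketch of how such a traits system can be queried,
            the following assumes the
            <classname>container_traits</classname> class template and the
            tag names declared in
            <filename>ext/pb_ds/tag_and_trait.hpp</filename>. The typedefs
            and the two helper function templates are hypothetical names
            chosen for illustration.
          </para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// Three sets of ints over different underlying data structures.
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                         __gnu_pbds::rb_tree_tag> rb_set;
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                         __gnu_pbds::ov_tree_tag> ov_set;
typedef __gnu_pbds::cc_hash_table<int, __gnu_pbds::null_type> hash_set;

// True if the container stores its keys in a meaningful order.
template<typename Cntnr>
bool
stores_keys_in_order()
{
  return __gnu_pbds::container_traits<Cntnr>::order_preserving;
}

// True if split and join operations on the container may throw.
template<typename Cntnr>
bool
split_join_may_throw()
{
  return __gnu_pbds::container_traits<Cntnr>::split_join_can_throw;
}
```
]]></programlisting>
          <para>
            Generic code can branch on these compile-time properties instead
            of hard-wiring assumptions about a particular container, in the
            same spirit as iterator tags in the standard library.
          </para>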
 433
 434         </section>
 435
 436         <section xml:id="motivation.associative.iterators">
 437           <info><title>Iterators</title></info>
 438           <para>
 439             Iterators are central to the design of the standard library
 440             containers, because of the container/algorithm/iterator
 441             decomposition that allows an algorithm to operate on a range
 442             through iterators of some sequence.  Iterators, then, are useful
 443             because they allow going over a
 444             specific <emphasis>sequence</emphasis>.  The standard library
 445             also uses iterators for accessing a
 446             specific <emphasis>element</emphasis>: when an associative
 447             container returns one through <function>find</function>. The
 448             standard library consistently uses the same types of iterators
 449             for both purposes: going over a range, and accessing a specific
 450             found element. Before the introduction of hash-based containers
 451             to the standard library, this made sense (with the exception of
 452             priority queues, which are discussed later).
 453           </para>
 454
 455           <para>
 456             When the standard associative containers are used together with
 457             non-order-preserving associative containers (and with
 458             priority-queue containers), there is a possible need for
 459             different types of iterators for self-organizing containers:
 460             the iterator concept seems overloaded to mean two different
 461             things (in some cases). <remark>XXX (ds_gen.html#find_range):
 462             Design::Associative
 463             Containers::Data-Structure Genericity::Point-Type and Range-Type
 464             Methods</remark>.
 465           </para>
 466
 467           <section xml:id="associative.iterators.using">
 468             <info>
 469               <title>Using Point Iterators for Range Operations</title>
 470             </info>
 471             <para>
 472               Suppose <classname>cntnr</classname> is some associative
 473               container, and say <varname>c</varname> is an object of
 474               type <classname>cntnr</classname>. Then what will be the outcome
 475               of the following call?
 476             </para>
 477
 478             <programlisting>
 479               std::for_each(c.find(1), c.find(5), foo);
 480             </programlisting>
 481
 482             <para>
 483               If <varname>c</varname> is a tree-based container
 484               object, then an in-order walk will
 485               apply <classname>foo</classname> to the relevant elements,
 486               as in the graphic below, label A. If <varname>c</varname> is
 487               a hash-based container, then the order of elements between any
 488               two elements is undefined (and probably time-varying); there is
 489               no guarantee that the elements traversed will coincide with the
 490               <emphasis>logical</emphasis> elements between 1 and 5, as in
 491               label B.
 492             </para>
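            <para>
              For the tree-based case, the walk can be reproduced with
              <classname>std::map</classname>. The helper
              <function>keys_between</function> is a hypothetical name used
              only for this sketch; it refuses to walk unless both keys are
              present and correctly ordered, since otherwise the range
              would not be valid.
            </para>
            <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Returns the keys visited by std::for_each over the half-open range
// [c.find(first_key), c.find(last_key)) of an ordered container.
inline std::vector<int>
keys_between(const std::map<int, char>& c, int first_key, int last_key)
{
  std::vector<int> visited;
  std::map<int, char>::const_iterator first = c.find(first_key);
  std::map<int, char>::const_iterator last = c.find(last_key);

  // The range is only valid if both keys exist and first precedes last.
  if (first == c.end() || last == c.end() || last_key < first_key)
    return visited;

  std::for_each(first, last,
                [&visited](const std::pair<const int, char>& entry)
                { visited.push_back(entry.first); });
  return visited;
}
```
]]></programlisting>
            <para>
              For a map holding the keys 1, 2, 3, 5 and 8,
              <code>keys_between(m, 1, 5)</code> visits 1, 2 and 3. The same
              call pattern on a hash-based container would compile, but
              nothing guarantees that the second iterator is reachable from
              the first, so the sketch deliberately does not attempt it.
            </para>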
 493
 494             <figure>
 495               <title>Range Iteration in Different Data Structures</title>
 496               <mediaobject>
 497                 <imageobject>
 498                   <imagedata align="center" format="PNG" scale="100"
 499                              fileref="../images/pbds_point_iterators_range_ops_1.png"/>
 500                 </imageobject>
 501                 <textobject>
 502                   <phrase>Range Iteration in Different Data Structures</phrase>
 503                 </textobject>
 504               </mediaobject>
 505             </figure>
 506
 507             <para>
 508               In our opinion, this problem is not caused merely by the fact
 509               that red-black trees are order preserving while
 510               collision-chaining hash tables are (generally) not; it
 511               is more fundamental. Most of the standard's containers
 512               order sequences in a well-defined manner that is
 513               determined by their <emphasis>interface</emphasis>:
 514               calling <function>insert</function> on a tree-based
 515               container modifies its sequence in a predictable way, as
 516               does calling <function>push_back</function> on a list or
 517               a vector. Conversely, collision-chaining hash tables,
 518               probing hash tables, priority queues, and list-based
 519               containers (which are very useful for "multimaps") are
 520               self-organizing data structures; each
 521               operation modifies their sequences in a manner that is
 522               (practically) determined by their
 523               <emphasis>implementation</emphasis>.
 524             </para>
 525
 526             <para>
 527               Consequently, applying an algorithm to a sequence obtained from most
 528               containers may or may not make sense, but applying it to a
 529               sub-sequence of a self-organizing container does not.
 530             </para>
 531           </section>
 532
 533           <section xml:id="associative.iterators.cost">
 534             <info>
 535               <title>Cost to Point Iterators to Enable Range Operations</title>
 536             </info>
 537             <para>
 538               Suppose <varname>c</varname> is some collision-chaining
 539               hash-based container object, and one calls
 540             </para>
 541             <programlisting>c.find(3)</programlisting>
 542             <para>
 543               Then what composes the returned iterator?
 544             </para>
 545
 546             <para>
 547               In the graphic below, label A shows the simplest (and
 548               most efficient) implementation of a collision-chaining
 549               hash table.  The little box marked
 550               <classname>point_iterator</classname> shows an object
 551               that contains a pointer to the element's node. Note that
 552               this "iterator" has no way to move to the next element
 553               (it cannot support
 554               <function>operator++</function>). Conversely, the little
 555               box marked <classname>iterator</classname> stores both a
 556               pointer to the element and some other
 557               information (the bucket number of the element). The
 558               second iterator, then, is "heavier" than the first one:
 559               it requires more time and space. If we were to use a
 560               different container to cross-reference into this
 561               hash-table using these iterators, it would take much
 562               more space. As noted above, nothing much can be done by
 563               incrementing these iterators, so why is this extra
 564               information needed?
 565             </para>
 566
 567             <para>
 568               Alternatively, one might create a collision-chaining hash-table
 569               where the lists might be linked, forming a monolithic total-element
 570               list, as in the graphic below, label B.  Here the iterators are as
 571               light as can be, but the hash-table's operations are more
 572               complicated.
 573             </para>
 574
 575             <figure>
 576               <title>Point Iteration in Hash Data Structures</title>
 577               <mediaobject>
 578                 <imageobject>
 579                   <imagedata align="center" format="PNG" scale="100"
 580                              fileref="../images/pbds_point_iterators_range_ops_2.png"/>
 581                 </imageobject>
 582                 <textobject>
 583                   <phrase>Point Iteration in Hash Data Structures</phrase>
 584                 </textobject>
 585               </mediaobject>
 586             </figure>
 587
 588             <para>
 589               It should be noted that containers based on collision-chaining
 590               hash-tables are not the only ones with this type of behavior;
 591               many other self-organizing data structures display it as well.
 592             </para>
 593           </section>
 594
 595           <section xml:id="associative.iterators.invalidation">
 596             <info><title>Invalidation Guarantees</title></info>
 597             <para>Consider the following snippet:</para>
 598             <programlisting>
 599               it = c.find(3);
 600               c.erase(5);
 601             </programlisting>
 602
 603             <para>
 604               Following the call to <classname>erase</classname>, what is the
 605               validity of <classname>it</classname>? Can it be de-referenced?
 606               Can it be incremented?
 607             </para>
 608
 609             <para>
 610               The answer depends on the underlying data structure of the
 611               container. The graphic below shows three cases: A1 and A2 show
 612               a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
 613               show a collision-chaining hash table.
 614             </para>
 615
 616             <figure>
 617               <title>Effect of erase in different underlying data structures</title>
 618               <mediaobject>
 619                 <imageobject>
 620                   <imagedata align="center" format="PNG" scale="100"
 621                              fileref="../images/pbds_invalidation_guarantee_erase.png"/>
 622                 </imageobject>
 623                 <textobject>
 624                   <phrase>Effect of erase in different underlying data structures</phrase>
 625                 </textobject>
 626               </mediaobject>
 627             </figure>
 628
 629             <orderedlist>
 630               <listitem>
 631                 <para>
 632                   Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
 633                   be de-referenced and incremented. The sequence of iterators
 634                   changed, but in a way that is well-defined by the interface.
 635                 </para>
 636               </listitem>
 637
 638               <listitem>
 639                 <para>
 640                   Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
 641                   not valid at all - it cannot be de-referenced or
 642                   incremented; the order of iterators changed in a way that is
 643                   (practically) determined by the implementation and not by
 644                   the interface.
 645                 </para>
 646               </listitem>
 647
 648               <listitem>
 649                 <para>
 650                   Erasing 5 from C1 yields C2. Here the situation is more
 651                   complicated. On the one hand, there is no problem in
 652                   de-referencing <classname>it</classname>. On the other hand,
 653                   the order of iterators changed in a way that is
 654                   (practically) determined by the implementation and not by
 655                   the interface.
 656                 </para>
 657               </listitem>
 658             </orderedlist>
 659
 660             <para>
 661               So in the standard library containers, it is not always possible
 662               to express whether <varname>it</varname> is valid or not. This
 663               is true also for <function>insert</function>. Again, the
 664               iterator concept seems overloaded.
 665             </para>
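            <para>
              As a minimal sketch of the tree case (A1 and A2), the
              following uses <classname>std::map</classname>, for which
              erasing one element leaves iterators to other elements
              valid. The function name is hypothetical and is used only
              for illustration.
            </para>
            <programlisting><![CDATA[
#include <cassert>
#include <map>

// Hypothetical helper: erases key 5 while holding an iterator to key 3,
// then returns the key that follows 3 (or -1 if 3 is now the last key).
// Assumes that key 3 is present in c.
inline int key_following_three_after_erasing_five(std::map<int, char>& c)
{
  std::map<int, char>::iterator it = c.find(3);
  assert(it != c.end());

  c.erase(5);

  // In a node-based tree the iterator still refers to the same element.
  assert(it->first == 3);

  // Incrementing is well-defined: it moves to the next key in sorted order.
  ++it;
  return it == c.end() ? -1 : it->first;
}
]]></programlisting>
            <para>
              For a map holding the keys 1, 3, 5 and 7, the function
              returns 7. No comparable guarantee can be stated once for
              all hash-based designs, which is the point of cases B and C
              above.
            </para>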
 666           </section>
 667         </section> <!--iterators-->
 668
 669
 670         <section xml:id="motivation.associative.functions">
 671           <info><title>Functional</title></info>
 672           <para>
 673           </para>
 674
 675           <para>
 676             The design of the functional overlay to the underlying data
 677             structures differs slightly from some of the conventions used in
 678             the C++ standard. A strict public interface should comprise
 679             only those operations that depend on the class's internal
 680             structure; other operations are best designed as external
 681             functions (see <xref linkend="biblio.meyers02both"/>). By this
 682             rubric, the standard associative containers lack some useful
 683             methods, and provide other methods which would be better
 684             removed.
 685           </para>
 686
 687           <section xml:id="motivation.associative.functions.erase">
 688             <info><title><function>erase</function></title></info>
 689
 690             <orderedlist>
 691               <listitem>
 692                 <para>
 693                   Order-preserving standard associative containers provide the
 694                   method
 695                 </para>
 696                 <programlisting>
 697                   iterator
 698                   erase(iterator it)
 699                 </programlisting>
 700
 701                 <para>
 702                   which takes an iterator, erases the corresponding
 703                   element, and returns an iterator to the following
 704                   element. Standard hash-based associative
 705                   containers also provide this method. This seemingly
 706                   increases genericity between associative containers,
 707                   since it is possible to use
 708                 </para>
 709                 <programlisting>
 710                   typename C::iterator it = c.begin();
 711                   typename C::iterator e_it = c.end();
 712
 713                   while(it != e_it)
 714                   it = pred(*it)? c.erase(it) : ++it;
 715                 </programlisting>
 716
 717                 <para>
 718                   in order to erase from a container object
 719                   <varname>c</varname> all elements which match a
 720                   predicate <classname>pred</classname>. However, in a
 721                   different sense this actually decreases genericity: an
 722                   integral implication of this method is that tree-based
 723                   associative containers' memory use is linear in the total
 724                   number of elements they store, while hash-based
 725                   containers' memory use is unbounded in the total number of
 726                   elements they store. Assume a hash-based container is
 727                   allowed to decrease its size when an element is
 728                   erased. Then the elements might be rehashed, which means
 729                   that there is no "next" element - it is simply
 730                   undefined. Consequently, it is possible to infer from the
 731                   fact that the standard library's hash-based containers
 732                   provide this method that they cannot downsize when
 733                   elements are erased. As a consequence, different code is
 734                   needed to manipulate different containers, assuming that
 735                   memory should be conserved. Therefore, this library's
 736                   non-order preserving associative containers omit this
 737                   method.
 738                 </para>
 739               </listitem>
 740
 741               <listitem>
 742                 <para>
 743                   All associative containers include a conditional-erase method
 744                 </para>
 745                 <programlisting>
 746                   template&lt;
 747                   class Pred&gt;
 748                   size_type
 749                   erase_if
 750                   (Pred pred)
 751                 </programlisting>
 752                 <para>
 753                   which erases all elements matching a predicate. This is probably the
 754                   only way to ensure linear-time multiple-item erase which can
 755                   actually downsize a container.
 756                 </para>
 757               </listitem>
 758
 759               <listitem>
 760                 <para>
 761                   The standard associative containers provide methods for
 762                   multiple-item erase of the form
 763                 </para>
 764                 <programlisting>
 765                   size_type
 766                   erase(It b, It e)
 767                 </programlisting>
 768                 <para>
 769                   erasing a range of elements given by a pair of
 770                   iterators. For tree-based or trie-based containers, this can
 771                   be implemented more efficiently as a (small) sequence of split
 772                   and join operations. For other, unordered, containers, this
 773                   method isn't much better than an external loop. Moreover,
 774                   if <varname>c</varname> is a hash-based container,
 775                   then
 776                 </para>
 777                 <programlisting>
 778                   c.erase(c.find(2), c.find(5))
 779                 </programlisting>
 780                 <para>
 781                   is almost certain to do something
 782                   different than erasing all elements whose keys are between 2
 783                   and 5, and is likely to produce other undefined behavior.
 784                 </para>
 785               </listitem>
 786             </orderedlist>
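            <para>
              The conditional-erase method can be sketched as follows
              for a collision-chaining hash table. The functor and
              function names are hypothetical; the sketch assumes that
              the predicate is called with a reference to the stored
              key/mapped pair.
            </para>
            <programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::cc_hash_table<int, char> hash_map_type;

// Hypothetical predicate: matches entries whose key is even.
struct key_is_even
{
  bool operator()(const std::pair<const int, char>& entry) const
  { return entry.first % 2 == 0; }
};

// Erases all entries with an even key in a single pass and returns
// the number of entries erased. No iterator to a "next" element is
// needed, so the table is free to reorganize itself.
inline std::size_t erase_even_keys(hash_map_type& c)
{ return c.erase_if(key_is_even()); }
]]></programlisting>
            <para>
              For a table holding the keys 1 to 4, the call erases two
              entries and leaves the keys 1 and 3.
            </para>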
 787           </section> <!-- erase -->
 788
 789           <section xml:id="motivation.associative.functions.split">
 790             <info>
 791               <title>
 792                 <function>split</function> and <function>join</function>
 793               </title>
 794             </info>
 795             <para>
 796               It is well-known that tree-based and trie-based container
 797               objects can be efficiently split or joined (See
 798               <xref linkend="biblio.clrs2001"/>). Externally splitting or
 799               joining trees is super-linear, and, furthermore, can throw
 800               exceptions. Split and join methods, consequently, seem good
 801               choices for tree-based container methods, especially since, as
 802               noted just before, they are efficient replacements for erasing
 803               sub-sequences.
 804             </para>
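            <para>
              A sketch of erasing a sub-sequence with two splits and one
              join is given below. It assumes that
              <function>split</function> moves all elements with keys
              larger than the given key into the other container, and
              that <function>join</function> requires the two key ranges
              not to overlap. The function name is hypothetical.
            </para>
            <programlisting><![CDATA[
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::tree<int, char> tree_map_type;

// Hypothetical helper: erases every entry whose key k satisfies
// lo < k <= hi, using split and join instead of a per-element loop.
inline void erase_key_range(tree_map_type& t, int lo, int hi)
{
  tree_map_type middle;
  tree_map_type upper;

  t.split(lo, middle);      // t: keys <= lo;          middle: keys > lo
  middle.split(hi, upper);  // middle: lo < keys <= hi; upper: keys > hi
  t.join(upper);            // t: keys <= lo together with keys > hi

  // middle goes out of scope here and releases the erased entries.
}
]]></programlisting>
            <para>
              With keys 1 to 6 stored, erasing the range from 2
              (exclusive) to 4 (inclusive) leaves the keys 1, 2, 5
              and 6.
            </para>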
 805
 806           </section> <!-- split -->
 807
 808           <section xml:id="motivation.associative.functions.insert">
 809             <info>
 810               <title>
 811                 <function>insert</function>
 812               </title>
 813             </info>
 814             <para>
 815               The standard associative containers provide methods of the form
 816             </para>
 817             <programlisting>
 818               template&lt;class It&gt;
 819               size_type
 820               insert(It b, It e);
 821             </programlisting>
 822
 823             <para>
 824               for inserting a range of elements given by a pair of
 825               iterators. At best, this can be implemented as an external loop,
 826               or, even more efficiently, as a join operation (for the case of
 827               tree-based or trie-based containers). Moreover, these methods seem
 828               similar to constructors taking a range given by a pair of
 829               iterators; the constructors, however, are transactional, whereas
 830               the insert methods are not; this is possibly confusing.
 831             </para>
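            <para>
              If transactional behavior is wanted from a range insert,
              one possible external approach is to insert into a copy
              and swap on success. This is only a sketch with
              hypothetical names, and it pays for the guarantee with a
              full copy of the container.
            </para>
            <programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: either inserts the whole range [b, e) into c,
// or leaves c unchanged if an exception is thrown along the way.
template<typename Container, typename It>
void insert_range_transactional(Container& c, It b, It e)
{
  Container tmp(c);  // work on a copy
  tmp.insert(b, e);  // may throw; c has not been touched yet
  c.swap(tmp);       // commit
}

// Example use with std::map; returns the resulting number of entries.
inline std::size_t demo_transactional_insert()
{
  std::map<int, char> c;
  c[1] = 'a';

  std::vector<std::pair<int, char> > extra;
  extra.push_back(std::make_pair(2, 'b'));
  extra.push_back(std::make_pair(3, 'c'));

  insert_range_transactional(c, extra.begin(), extra.end());
  return c.size();
}
]]></programlisting>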
 832
 833           </section> <!-- insert -->
 834
 835           <section xml:id="motivation.associative.functions.compare">
 836             <info>
 837               <title>
 838                 <function>operator==</function> and <function>operator&lt;=</function>
 839               </title>
 840             </info>
 841
 842             <para>
 843               Associative containers are parametrized by policies that allow
 844               testing key equivalence: a hash-based container can do this through
 845               its equivalence functor, and a tree-based container can do this
 846               through its comparison functor. In addition, some standard
 847               associative containers have global function operators, like
 848               <function>operator==</function> and <function>operator&lt;=</function>,
 849               that allow comparing entire associative containers.
 850             </para>
 851
 852             <para>
 853               In our opinion, these functions are better left out. To begin
 854               with, they do not significantly improve over an external
 855               loop. More importantly, however, they are possibly misleading -
 856               <function>operator==</function>, for example, usually checks for
 857               equivalence, or interchangeability, but the associative
 858               container cannot check for values' equivalence, only keys'
 859               equivalence; also, are two containers considered equivalent if
 860               they store the same values in different order? This is an
 861               arbitrary decision.
 862             </para>
 863           </section> <!-- compare -->
 864
 865         </section>  <!-- functional -->
 866
 867       </section> <!--associative-->
 868
 869       <section xml:id="pbds.intro.motivation.priority_queue">
 870         <info><title>Priority Queues</title></info>
 871
 872         <section xml:id="motivation.priority_queue.policy">
 873           <info><title>Policy Choices</title></info>
 874
 875           <para>
 876             Priority queues are containers that allow efficiently inserting
 877             values and accessing the maximal value (in the sense of the
 878             container's comparison functor). Their interface
 879             supports <function>push</function>
 880             and <function>pop</function>. The standard
 881             container <classname>std::priority_queue</classname> indeed supports
 882             these methods, but little else. For algorithmic and
 883             software-engineering purposes, other methods are needed:
 884           </para>
 885
 886           <orderedlist>
 887             <listitem>
 888               <para>
 889                 Many graph algorithms (see
 890                 <xref linkend="biblio.clrs2001"/>) require increasing a
 891                 value in a priority queue (again, in the sense of the
 892                 container's comparison functor), or joining two
 893                 priority-queue objects.
 894               </para>
 895             </listitem>
 896
 897             <listitem>
 898               <para>The return type of <classname>priority_queue</classname>'s
 899               <function>push</function> method is a point-type iterator, which can
 900               be used for modifying or erasing arbitrary values. For
 901               example:</para>
 902               <programlisting>
 903                 priority_queue&lt;int&gt; p;
 904                 priority_queue&lt;int&gt;::point_iterator it = p.push(3);
 905                 p.modify(it, 4);
 906               </programlisting>
 907
 908               <para>These types of cross-referencing operations are necessary
 909               for making priority queues useful for different applications,
 910               especially graph applications.</para>
 911
 912             </listitem>
 913             <listitem>
 914               <para>
 915                 It is sometimes necessary to erase an arbitrary value in a
 916                 priority queue. For example, consider
 917                 the <function>select</function> function for monitoring
 918                 file descriptors:
 919               </para>
 920
 921               <programlisting>
 922                 int
 923                 select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
 924                 struct timeval *timeout);
 925               </programlisting>
 926               <para>
 927                 then, as the select documentation states:
 928               </para>
 929               <para>
 930                 <quote>
 931                   The nfds argument specifies the range of file
 932                   descriptors to be tested. The select() function tests file
 933                 descriptors in the range of 0 to nfds-1.</quote>
 934               </para>
 935
 936               <para>
 937                 It stands to reason, therefore, that we might wish to
 938                 maintain a minimal value for <varname>nfds</varname>, and
 939                 priority queues immediately come to mind. Note, though, that
 940                 when a socket is closed, the minimal file descriptor might
 941                 change; in the absence of an efficient means to erase an
 942                 arbitrary value from a priority queue, we might as well
 943                 avoid its use altogether.
 944               </para>
 945
 946               <para>
 947                 The standard containers typically support iterators. It is
 948                 somewhat unusual
 949                 for <classname>std::priority_queue</classname> to omit them
 950                 (See <xref linkend="biblio.meyers01stl"/>). One might
 951                 ask why priority queues need to support iterators, since
 952                 they are self-organizing containers with a different purpose
 953                 than abstracting sequences. There are several reasons:
 954               </para>
 955               <orderedlist>
 956                 <listitem>
 957                   <para>
 958                     Iterators (even in self-organizing containers) are
 959                     useful for many purposes: cross-referencing
 960                     containers, serialization, and debugging code that uses
 961                     these containers.
 962                   </para>
 963                 </listitem>
 964
 965                 <listitem>
 966                   <para>
 967                     The standard library's hash-based containers support
 968                     iterators, even though they too are self-organizing
 969                     containers with a different purpose than abstracting
 970                     sequences.
 971                   </para>
 972                 </listitem>
 973
 974                 <listitem>
 975                   <para>
 976                     In standard-library-like containers, it is natural to specify the
 977                     interface of operations for modifying a value or erasing
 978                     a value (discussed previously) in terms of iterators.
 979                     It should be noted that the standard
 980                     containers also use iterators for accessing and
 981                     manipulating a specific value. In hash-based
 982                     containers, one checks the existence of a key by
 983                     comparing the iterator returned by <function>find</function> to the
 984                     iterator returned by <function>end</function>, and not by comparing a
 985                     pointer returned by <function>find</function> to <type>NULL</type>.
 986                   </para>
 987                 </listitem>
 988               </orderedlist>
 989             </listitem>
 990           </orderedlist>
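          <para>
            The <function>select</function> scenario can be sketched
            with this library's priority queue as follows. The class
            <classname>fd_tracker</classname> and its members are
            hypothetical; the sketch assumes the default (pairing-heap)
            priority queue, whose point-type iterators stay valid while
            other values are pushed or erased.
          </para>
          <programlisting><![CDATA[
#include <cassert>
#include <map>
#include <utility>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper class: tracks open file descriptors and yields
// the nfds argument for select(), i.e. the largest descriptor plus one.
class fd_tracker
{
  typedef __gnu_pbds::priority_queue<int> queue_type;
  typedef std::map<int, queue_type::point_iterator> handle_map;

  queue_type open_fds_;  // top() is the largest descriptor
  handle_map handles_;   // cross-reference: descriptor -> queue entry

public:
  void add(int fd)
  {
    if (handles_.find(fd) == handles_.end())
      handles_.insert(std::make_pair(fd, open_fds_.push(fd)));
  }

  void remove(int fd)
  {
    handle_map::iterator h = handles_.find(fd);
    if (h == handles_.end())
      return;
    open_fds_.erase(h->second);  // erase an arbitrary value
    handles_.erase(h);
  }

  int nfds() const
  { return open_fds_.empty() ? 0 : open_fds_.top() + 1; }
};
]]></programlisting>
          <para>
            After adding descriptors 3, 7 and 5,
            <function>nfds</function> returns 8; once 7 is removed it
            returns 6. With <classname>std::priority_queue</classname>
            the removal step could not be expressed directly.
          </para>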
 991
 992         </section>
 993
 994         <section xml:id="motivation.priority_queue.underlying">
 995           <info><title>Underlying Data Structures</title></info>
 996
 997           <para>
 998             There are three main implementations of priority queues: the
 999             first employs a binary heap, typically one which uses a
1000             sequence; the second uses a tree (or forest of trees), which is
1001             typically less structured than an associative container's tree;
1002             the third simply uses an associative container. These are
1003             shown in the figure below with labels A1 and A2, B, and C.
1004           </para>
1005
1006           <figure>
1007             <title>Underlying Priority Queue Data Structures</title>
1008             <mediaobject>
1009               <imageobject>
1010                 <imagedata align="center" format="PNG" scale="100"
1011                            fileref="../images/pbds_different_underlying_dss_2.png"/>
1012               </imageobject>
1013               <textobject>
1014                 <phrase>Underlying Priority Queue Data Structures</phrase>
1015               </textobject>
1016             </mediaobject>
1017           </figure>
1018
1019           <para>
1020             No single implementation can completely replace any of the
1021             others. Some have better <function>push</function>
1022             and <function>pop</function> amortized performance, some have
1023             better bounded (worst case) response time than others, some
1024             optimize a single method at the expense of others, etc. In
1025             general the "best" implementation is dictated by the specific
1026             problem.
1027           </para>
1028
1029           <para>
1030             As with associative containers, the more implementations
1031             co-exist, the more necessary a traits mechanism is for handling
1032             generic containers safely and efficiently. This is especially
1033             important for priority queues, since the invalidation guarantees
1034             of one of the most useful data structures - binary heaps - are
1035             markedly different from those of most of the others.
1036           </para>
1037
1038         </section>
1039
1040         <section xml:id="motivation.priority_queue.binary_heap">
1041           <info><title>Binary Heaps</title></info>
1042
1043
1044           <para>
1045             Binary heaps are one of the most useful underlying
1046             data structures for priority queues. They are very efficient in
1047             terms of memory (since they don't require per-value structure
1048             metadata), and have the best amortized <function>push</function> and
1049             <function>pop</function> performance for primitive types like
1050             <type>int</type>.
1051           </para>
1052
1053           <para>
1054             The standard library's <classname>priority_queue</classname>
1055             implements this data structure as an adapter over a sequence,
1056             typically
1057             <classname>std::vector</classname>
1058             or <classname>std::deque</classname>, which correspond to labels
1059             A1 and A2 respectively in the graphic above.
1060           </para>
1061
1062           <para>
1063             This is indeed an elegant example of the adapter concept and
1064             the algorithm/container/iterator decomposition. (See <xref linkend="biblio.nelson96stlpq"/>). There are
1065             several reasons why a binary-heap priority queue
1066             may be better implemented as a container instead of a
1067             sequence adapter:
1068           </para>
1069
1070           <orderedlist>
1071             <listitem>
1072               <para>
1073                 <classname>std::priority_queue</classname> cannot erase values
1074                 from its adapted sequence (irrespective of the sequence
1075                 type). This means that the memory use of
1076                 an <classname>std::priority_queue</classname> object is always
1077                 proportional to the maximal number of values it ever contained,
1078                 and not to the number of values that it currently
1079                 contains. (See <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>.)
1080                 This implementation of binary heaps acts very differently than
1081                 other underlying data structures (See also pairing heaps).
1082               </para>
1083             </listitem>
1084
1085             <listitem>
1086               <para>
1087                 Some combinations of adapted sequences and value types
1088                 are very inefficient or just don't make sense. If one uses
1089                 <classname>std::priority_queue&lt;std::string,
1090                 std::vector&lt;std::string&gt; &gt;</classname>, for example, then not only will each
1091                 operation perform a logarithmic number of
1092                 <classname>std::string</classname> assignments, but, furthermore, any
1093                 operation (including <function>pop</function>) can render the container
1094                 useless due to exceptions. Conversely, if one uses
1095                 <classname>std::priority_queue&lt;int, std::deque&lt;int&gt;
1096                 &gt;</classname>, then each operation incurs a logarithmic
1097                 number of indirect accesses (through pointers) unnecessarily.
1098                 It might be better to let the container make a conservative
1099                 deduction whether to use the structure in the graphic above, labels A1 or A2.
1100               </para>
1101             </listitem>
1102
1103             <listitem>
1104               <para>
1105                 There does not seem to be a systematic way to determine
1106                 what exactly can be done with the priority queue.
1107               </para>
1108               <orderedlist>
1109                 <listitem>
1110                   <para>
1111                     If <classname>p</classname> is a priority queue adapting an
1112                     <classname>std::vector</classname>, then it is possible to iterate over
1113                     all values by using <function>&amp;p.top()</function> and
1114                     <function>&amp;p.top() + p.size()</function>, but this will not work
1115                     if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
1116                     case, one cannot use <classname>p.begin()</classname> and
1117                     <classname>p.end()</classname>. If a different sequence is adapted, it
1118                     is even more difficult to determine what can be
1119                     done.
1120                   </para>
1121                 </listitem>
1122
1123                 <listitem>
1124                   <para>
1125                     If <varname>p</varname> is a priority queue adapting an
1126                     <classname>std::deque</classname>, then the reference returned by
1127                   </para>
1128                   <programlisting>
1129                     p.top()
1130                   </programlisting>
1131                   <para>
1132                     will remain valid until it is popped,
1133                     but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
1134                     next <function>push</function> will invalidate it. If a different
1135                     sequence is adapted, it is even more difficult to
1136                     determine what can be done.
1137                   </para>
1138                 </listitem>
1139               </orderedlist>
1140             </listitem>
1141
1142             <listitem>
1143               <para>
1144                 Sequence-based binary heaps can still implement
1145                 linear-time <function>erase</function> and <function>modify</function> operations.
1146                 This means that if one needs to erase a small
1147                 (say logarithmic) number of values, then one might still
1148                 choose this underlying data structure. Using
1149                 <classname>std::priority_queue</classname>, however, this will generally
1150                 change the order of growth of the entire sequence of
1151                 operations.
1152               </para>
1153             </listitem>
1154           </orderedlist>
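          <para>
            The pointer-based iteration mentioned above can be sketched
            as follows. It relies on <function>top</function> returning
            a reference to the first element of the adapted
            <classname>std::vector</classname>, whose storage is
            contiguous; the function name is hypothetical, and the same
            code would not be valid for an adapted
            <classname>std::deque</classname>.
          </para>
          <programlisting><![CDATA[
#include <cassert>
#include <numeric>
#include <queue>
#include <vector>

// Hypothetical helper: sums all values of a vector-adapting priority
// queue by walking the underlying storage in heap order, not sorted order.
inline long
sum_heap_values(const std::priority_queue<int, std::vector<int> >& p)
{
  if (p.empty())
    return 0;

  const int* first = &p.top();  // address of the adapted vector's front
  return std::accumulate(first, first + p.size(), 0L);
}
]]></programlisting>
          <para>
            That such code is needed at all, and that its validity
            depends on the adapted sequence, illustrates why a container
            interface with iterators may be preferable.
          </para>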
1155
1156         </section>
1157       </section>
1158     </section> <!-- goals/motivation -->
1159   </section> <!-- intro -->
1160
1161   <!-- S02: Using -->
1162   <section xml:id="containers.pbds.using">
1163     <info><title>Using</title></info>
1164     <?dbhtml filename="policy_data_structures_using.html"?>
1165
1166     <section xml:id="pbds.using.prereq">
1167       <info><title>Prerequisites</title></info>
1168
1169       <para>The library contains only header files, and does not require any
1170       other libraries except the standard C++ library. All classes are
1171       defined in namespace <code>__gnu_pbds</code>. The library internally
1172       uses macros beginning with <code>PB_DS</code>, but
1173       <code>#undef</code>s anything it <code>#define</code>s (except for
1174       header guards). Compiling the library in an environment where macros
1175       beginning with <code>PB_DS</code> are defined may yield unpredictable
1176       results in compilation, execution, or both.</para>
1177
1178       <para>
1179         Further dependencies are necessary to create the visual output
1180         for the performance tests. To create these graphs, an
1181         additional package is needed: <command>pychart</command>.
1182       </para>
1183     </section>
1184
1185     <section xml:id="pbds.using.organization">
1186       <info><title>Organization</title></info>
1187
1188       <para>
1189         The various data structures are organized as follows.
1190       </para>
1191
1192       <itemizedlist>
1193         <listitem>
1194           <para>
1195             Branch-Based
1196           </para>
1197
1198           <itemizedlist>
1199             <listitem>
1200               <para>
1201                 <classname>basic_branch</classname>
1202                 is an abstract base class for branch-based
1203                 associative-containers
1204               </para>
1205             </listitem>
1206
1207             <listitem>
1208               <para>
1209                 <classname>tree</classname>
1210                 is a concrete base class for tree-based
1211                 associative-containers
1212               </para>
1213             </listitem>
1214
1215             <listitem>
1216               <para>
1217                 <classname>trie</classname>
1218                 is a concrete base class for trie-based
1219                 associative-containers
1220               </para>
1221             </listitem>
1222           </itemizedlist>
1223         </listitem>
1224
1225         <listitem>
1226           <para>
1227             Hash-Based
1228           </para>
1229           <itemizedlist>
1230             <listitem>
1231               <para>
1232                 <classname>basic_hash_table</classname>
1233                 is an abstract base class for hash-based
1234                 associative-containers
1235               </para>
1236             </listitem>
1237
1238             <listitem>
1239               <para>
1240                 <classname>cc_hash_table</classname>
1241                 is a concrete collision-chaining hash-based
1242                 associative container
1243               </para>
1244             </listitem>
1245
1246             <listitem>
1247               <para>
1248                 <classname>gp_hash_table</classname>
1249                 is a concrete (general) probing hash-based
1250                 associative container
1251               </para>
1252             </listitem>
1253           </itemizedlist>
1254         </listitem>
1255
1256         <listitem>
1257           <para>
1258             List-Based
1259           </para>
1260           <itemizedlist>
1261             <listitem>
1262               <para>
1263                 <classname>list_update</classname>
1264                 is a list-based update-policy associative container
1265               </para>
1266             </listitem>
1267           </itemizedlist>
1268         </listitem>
1269         <listitem>
1270           <para>
1271             Heap-Based
1272           </para>
1273           <itemizedlist>
1274             <listitem>
1275               <para>
1276                 <classname>priority_queue</classname>
1277                 is a priority queue
1278               </para>
1279             </listitem>
1280           </itemizedlist>
1281         </listitem>
1282       </itemizedlist>
1283
1284       <para>
1285         The hierarchy is composed naturally so that commonality is
1286         captured by base classes. Thus <function>operator[]</function>
1287         is defined at the base of any hierarchy, since all derived
1288         containers support it. Conversely <function>split</function> is
1289         defined in <classname>basic_branch</classname>, since only
1290         tree-like containers support it.
1291       </para>
1292
1293       <para>
1294         In addition, there are the following diagnostics classes,
1295         used to report errors specific to this library's data
1296         structures.
1297       </para>
1298
1299       <figure>
1300         <title>Exception Hierarchy</title>
1301         <mediaobject>
1302           <imageobject>
1303             <imagedata align="center" format="PDF" scale="75"
1304                        fileref="../images/pbds_exception_hierarchy.pdf"/>
1305           </imageobject>
1306           <imageobject>
1307             <imagedata align="center" format="PNG" scale="100"
1308                        fileref="../images/pbds_exception_hierarchy.png"/>
1309           </imageobject>
1310           <textobject>
1311             <phrase>Exception Hierarchy</phrase>
1312           </textobject>
1313         </mediaobject>
1314       </figure>
1315
1316     </section>
1317
1318     <section xml:id="pbds.using.tutorial">
1319       <info><title>Tutorial</title></info>
1320
1321       <section xml:id="pbds.using.tutorial.basic">
1322         <info><title>Basic Use</title></info>
1323
1324         <para>
          For the most part, the policy-based containers in
1326           namespace <literal>__gnu_pbds</literal> have the same interface as
1327           the equivalent containers in the standard C++ library, except for
1328           the names used for the container classes themselves. For example,
1329           this shows basic operations on a collision-chaining hash-based
1330           container:
1331         </para>
1332         <programlisting>
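          #include &lt;cassert&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          int main()
          {
            // Collision-chaining hash table mapping int keys to char values.
            __gnu_pbds::cc_hash_table&lt;int, char&gt; c;
            c[2] = 'b';
            // Key 1 was never inserted, so find() yields end().
            assert(c.find(1) == c.end());
            return 0;
          }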
1341         </programlisting>
1342
1343         <para>
1344           The container is called
1345           <classname>__gnu_pbds::cc_hash_table</classname> instead of
1346           <classname>std::unordered_map</classname>, since <quote>unordered
1347           map</quote> does not necessarily mean a hash-based map as implied by
1348           the C++ library (C++11 or TR1). For example, list-based associative
1349           containers, which are very useful for the construction of
1350           "multimaps," are also unordered.
1351         </para>
1352
1353         <para>This snippet shows a red-black tree based container:</para>
1354
1355         <programlisting>
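          #include &lt;cassert&gt;
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;

          int main()
          {
            // Tree-based map; the default underlying tree is a red-black tree.
            __gnu_pbds::tree&lt;int, char&gt; c;
            c[2] = 'b';
            // Key 2 was just inserted, so find() locates it.
            assert(c.find(2) != c.end());
            return 0;
          }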
1364         </programlisting>
1365
1366         <para>The container is called <classname>tree</classname> instead of
1367         <classname>map</classname> since the underlying data structures are
1368         being named with specificity.
1369         </para>
1370
1371         <para>
1372           The member function naming convention is to strive to be the same as
1373           the equivalent member functions in other C++ standard library
1374           containers. The familiar methods are unchanged:
1375           <function>begin</function>, <function>end</function>,
1376           <function>size</function>, <function>empty</function>, and
1377           <function>clear</function>.
1378         </para>
1379
1380         <para>
1381           This isn't to say that things are exactly as one would expect, given
          the container requirements and interfaces in the C++ standard.
1383         </para>
1384
1385         <para>
          The names of containers' policies and policy accessors
          differ from the usual ones. For example, if <type>hash_type</type> is
        some type of hash-based container, then</para>
1389
1390         <programlisting>
1391           hash_type::hash_fn
1392         </programlisting>
1393
1394         <para>
1395           gives the type of its hash functor, and if <varname>obj</varname> is
1396           some hash-based container object, then
1397         </para>
1398
1399         <programlisting>
1400           obj.get_hash_fn()
1401         </programlisting>
1402
1403         <para>will return a reference to its hash-functor object.</para>
1404
1405
1406         <para>
1407           Similarly, if <type>tree_type</type> is some type of tree-based
1408           container, then
1409         </para>
1410
1411         <programlisting>
1412           tree_type::cmp_fn
1413         </programlisting>
1414
1415         <para>
1416           gives the type of its comparison functor, and if
1417           <varname>obj</varname> is some tree-based container object,
1418           then
1419         </para>
1420
1421         <programlisting>
1422           obj.get_cmp_fn()
1423         </programlisting>
1424
1425         <para>will return a reference to its comparison-functor object.</para>
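        <para>
          The two accessor pairs can be sketched together as follows.
          This is only an illustration: the typedef names and the
          helper functions <literal>hash_with</literal>,
          <literal>ordered_before</literal>, and
          <literal>demo</literal> are invented for this example, and
          both containers are assumed to use their default policies.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::cc_hash_table<int, char> hash_type;
typedef __gnu_pbds::tree<int, char>          tree_type;

// Hash a key with the functor stored inside a hash-based container.
std::size_t hash_with(const hash_type& obj, int key)
{
  const hash_type::hash_fn& fn = obj.get_hash_fn();
  return fn(key);
}

// Compare two keys with the functor stored inside a tree-based container.
bool ordered_before(const tree_type& obj, int lhs, int rhs)
{
  const tree_type::cmp_fn& cmp = obj.get_cmp_fn();
  return cmp(lhs, rhs);
}

// Exercises the helpers above; call it from main().
void demo()
{
  hash_type h;
  tree_type t;

  // A default-constructed functor of the same type hashes identically.
  assert(hash_with(h, 7) == hash_type::hash_fn()(7));

  // The default comparison functor orders keys with operator<.
  assert(ordered_before(t, 1, 2));
  assert(!ordered_before(t, 2, 1));
}
```
]]></programlisting>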
1426
1427         <para>
1428           It would be nice to give names consistent with those in the existing
1429           C++ standard (inclusive of TR1). Unfortunately, these standard
1430           containers don't consistently name types and methods. For example,
1431           <classname>std::tr1::unordered_map</classname> uses
1432           <type>hasher</type> for the hash functor, but
1433           <classname>std::map</classname> uses <type>key_compare</type> for
          the comparison functor. The accessors differ as well:
          <classname>std::tr1::unordered_map</classname> exposes its hash
          functor through <function>hash_function</function>, but
          <classname>std::map</classname> exposes its comparison functor
          through <function>key_comp</function>.
1438         </para>
1439
1440         <para>
1441           Instead, <literal>__gnu_pbds</literal> attempts to be internally
1442           consistent, and uses standard-derived terminology if possible.
1443         </para>
1444
1445         <para>
1446           Another source of difference is in scope:
1447           <literal>__gnu_pbds</literal> contains more types of associative
1448           containers than the standard C++ library, and more opportunities
1449           to configure these new containers, since different types of
1450           associative containers are useful in different settings.
1451         </para>
1452
1453         <para>
1454           Namespace <literal>__gnu_pbds</literal> contains different classes for
1455           hash-based containers, tree-based containers, trie-based containers,
1456           and list-based containers.
1457         </para>
1458
1459         <para>
1460           Since associative containers share parts of their interface, they
1461           are organized as a class hierarchy.
1462         </para>
1463
1464         <para>Each type or method is defined in the most-common ancestor
1465         in which it makes sense.
1466         </para>
1467
1468         <para>For example, all associative containers support iteration
1469         expressed in the following form:
1470         </para>
1471
1472         <programlisting>
1473           const_iterator
1474           begin() const;
1475
1476           iterator
1477           begin();
1478
1479           const_iterator
1480           end() const;
1481
1482           iterator
1483           end();
1484         </programlisting>
1485
1486         <para>
1487           But not all containers contain or use hash functors. Yet, both
1488           collision-chaining and (general) probing hash-based associative
1489           containers have a hash functor, so
1490           <classname>basic_hash_table</classname> contains the interface:
1491         </para>
1492
1493         <programlisting>
1494           const hash_fn&amp;
1495           get_hash_fn() const;
1496
1497           hash_fn&amp;
1498           get_hash_fn();
1499         </programlisting>
1500
1501         <para>
1502           so all hash-based associative containers inherit the same
1503           hash-functor accessor methods.
1504         </para>
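        <para>
          One way to picture the effect of this layering is with
          function templates that rely only on the shared part of the
          interface. The sketch below is illustrative rather than
          prescriptive; <literal>count_entries</literal>,
          <literal>raw_hash</literal>, <literal>demo</literal>, and the
          typedef names are hypothetical, and the containers are assumed
          to use their default hash functor.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::cc_hash_table<int, char> chained_map;  // collision chaining
typedef __gnu_pbds::gp_hash_table<int, char> probing_map;  // general probing
typedef __gnu_pbds::tree<int, char>          tree_map;

// Range iteration is defined for every associative container.
template<typename Cntnr>
std::size_t count_entries(const Cntnr& c)
{
  std::size_t n = 0;
  for (typename Cntnr::const_iterator it = c.begin(); it != c.end(); ++it)
    ++n;
  return n;
}

// get_hash_fn() exists only in the hash-based branch, so this template
// can be instantiated for chained_map and probing_map but not tree_map.
template<typename Hash_Cntnr>
std::size_t raw_hash(const Hash_Cntnr& c,
                     const typename Hash_Cntnr::key_type& key)
{ return c.get_hash_fn()(key); }

// Exercises the helpers above; call it from main().
void demo()
{
  chained_map a;
  probing_map b;
  tree_map t;
  a[1] = 'x';
  a[2] = 'y';
  b[1] = 'x';
  t[3] = 'z';

  assert(count_entries(a) == 2);
  assert(count_entries(b) == 1);
  assert(count_entries(t) == 1);

  // Both hash tables use the same default hash functor type.
  assert(raw_hash(a, 5) == raw_hash(b, 5));
}
```
]]></programlisting>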
1505
1506       </section> <!--basic use -->
1507
1508       <section xml:id="pbds.using.tutorial.configuring">
1509         <info>
1510           <title>
1511             Configuring via Template Parameters
1512           </title>
1513         </info>
1514
1515         <para>
1516           In general, each of this library's containers is
1517           parametrized by more policies than those of the standard library. For
1518           example, the standard hash-based container is parametrized as
1519           follows:
1520         </para>
1521         <programlisting>
          template&lt;typename Key, typename Mapped, typename Hash,
                   typename Pred, typename Allocator, bool Cache_Hash_Code&gt;
          class unordered_map;
1525         </programlisting>
1526
1527         <para>
1528           and so can be configured by key type, mapped type, a functor
1529           that translates keys to unsigned integral types, an equivalence
1530           predicate, an allocator, and an indicator whether to store hash
          values with each entry. This library's collision-chaining
1532           hash-based container is parametrized as
1533         </para>
1534         <programlisting>
          template&lt;typename Key, typename Mapped, typename Hash_Fn,
                   typename Eq_Fn, typename Comb_Hash_Fn,
                   typename Resize_Policy, bool Store_Hash,
                   typename Allocator&gt;
          class cc_hash_table;
1540         </programlisting>
1541
1542         <para>
1543           and so can be configured by the first four types of
1544           <classname>std::tr1::unordered_map</classname>, then a
1545           policy for translating the key-hash result into a position
1546           within the table, then a policy by which the table resizes,
1547           an indicator whether to store hash values with each entry,
1548           and an allocator (which is typically the last template
1549           parameter in standard containers).
1550         </para>
1551
1552         <para>
1553           Nearly all policy parameters have default values, so this
1554           need not be considered for casual use. It is important to
1555           note, however, that hash-based containers' policies can
1556           dramatically alter their performance in different settings,
1557           and that tree-based containers' policies can make them
1558           useful for other purposes than just look-up.
1559         </para>
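        <para>
          To make the parameter list concrete, here is a hedged sketch
          that overrides the first three policies and leaves the
          remaining ones at their defaults. The functor
          <literal>identity_hash</literal>, the typedef names, and
          <literal>demo</literal> are hypothetical; the choice of
          <classname>direct_mod_range_hashing</classname> is only one
          possible range-hashing policy, picked here for illustration.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Illustrative hash functor: uses the key itself as the hash value.
struct identity_hash
{
  std::size_t operator()(int key) const
  { return static_cast<std::size_t>(key); }
};

// Key, Mapped, Hash_Fn, Eq_Fn and Comb_Hash_Fn are given explicitly;
// Resize_Policy, Store_Hash and Allocator keep their default values.
typedef __gnu_pbds::cc_hash_table<
  int, char,
  identity_hash,
  std::equal_to<int>,
  __gnu_pbds::direct_mod_range_hashing<> > mod_hash_map;

// For casual use the defaults are sufficient.
typedef __gnu_pbds::cc_hash_table<int, char> default_hash_map;

// Exercises both configurations; call it from main().
void demo()
{
  mod_hash_map m;
  m[2] = 'b';
  m[100] = 'c';
  assert(m.size() == 2);
  assert(m.find(100) != m.end());
  assert(m[2] == 'b');

  default_hash_map d;
  d[2] = 'b';
  assert(d.size() == 1);
}
```
]]></programlisting>
        <para>
          Both typedefs expose the same member functions; only the way
          keys are mapped to table positions differs.
        </para>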
1560
1561
1562         <para>As opposed to associative containers, priority queues have
1563         relatively few configuration options. The priority queue is
1564         parametrized as follows:</para>
1565         <programlisting>
          template&lt;typename Value_Type, typename Cmp_Fn, typename Tag,
                   typename Allocator&gt;
          class priority_queue;
1569         </programlisting>
1570
1571         <para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
1572         <classname>Allocator</classname> parameters are the container's value type,
1573         comparison-functor type, and allocator type, respectively;
1574         these are very similar to the standard's priority queue. The
1575         <classname>Tag</classname> parameter is different: there are a number of
1576         pre-defined tag types corresponding to binary heaps, binomial
1577         heaps, etc., and <classname>Tag</classname> should be instantiated
1578         by one of them.</para>
1579
1580         <para>Note that as opposed to the
1581         <classname>std::priority_queue</classname>,
1582         <classname>__gnu_pbds::priority_queue</classname> is not a
1583         sequence-adapter; it is a regular container.</para>
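        <para>
          A short sketch of how the <classname>Tag</classname> parameter
          might be used follows. It instantiates the priority queue with
          two of the predefined tags and changes the priority of one
          element through the iterator returned by
          <function>push</function>. The helper names
          <literal>pop_top</literal>, <literal>reprioritize</literal>,
          and <literal>demo</literal>, as well as the typedef names, are
          illustrative only.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

// Same value and comparison types, different underlying heaps.
typedef __gnu_pbds::priority_queue<
  int, std::less<int>, __gnu_pbds::pairing_heap_tag> pairing_pq;
typedef __gnu_pbds::priority_queue<
  int, std::less<int>, __gnu_pbds::binary_heap_tag>  binary_pq;

// Works for any Tag: remove and return the top element.
template<typename Pq>
int pop_top(Pq& q)
{
  const int top = q.top();
  q.pop();
  return top;
}

// push() returns a point_iterator that identifies the pushed element;
// modify() uses it to change that element's value in place.
void reprioritize(pairing_pq& q, pairing_pq::point_iterator it, int value)
{ q.modify(it, value); }

// Exercises the helpers above; call it from main().
void demo()
{
  pairing_pq p;
  p.push(3);
  pairing_pq::point_iterator low = p.push(1);
  p.push(2);

  reprioritize(p, low, 10);       // 1 becomes 10
  assert(pop_top(p) == 10);       // std::less puts the largest on top
  assert(pop_top(p) == 3);
  assert(p.size() == 1);

  binary_pq b;
  b.push(3);
  b.push(1);
  b.push(2);
  assert(pop_top(b) == 3);
  assert(b.size() == 2);
}
```
]]></programlisting>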
1584
1585       </section>
1586
1587       <section xml:id="pbds.using.tutorial.traits">
1588         <info>
1589           <title>
1590             Querying Container Attributes
1591           </title>
1592         </info>
1594
        <para>A container's underlying data structure
        affects its performance; unfortunately, it can also affect
        its interface. When manipulating associative containers
        generically, it is often useful to be able to statically
        determine what they can support and what they cannot.
        </para>
1600         </para>
1601
1602         <para>Happily, the standard provides a good solution to a similar
1603         problem - that of the different behavior of iterators. If
1604         <classname>It</classname> is an iterator, then
1605         </para>
1606         <programlisting>
1607           typename std::iterator_traits&lt;It&gt;::iterator_category
1608         </programlisting>
1609
1610         <para>is one of a small number of pre-defined tag classes, and
1611         </para>
1612         <programlisting>
1613           typename std::iterator_traits&lt;It&gt;::value_type
1614         </programlisting>
1615
1616         <para>is the value type to which the iterator "points".</para>
1617
1618         <para>
1619           Similarly, in this library, if <type>C</type> is a
1620           container, then <classname>container_traits</classname> is a
1621           trait class that stores information about the kind of
1622           container that is implemented.
1623         </para>
1624         <programlisting>
1625           typename container_traits&lt;C&gt;::container_category
1626         </programlisting>
1627         <para>
1628           is one of a small number of predefined tag structures that
1629           uniquely identifies the type of underlying data structure.
1630         </para>
1631
1632         <para>In most cases, however, the exact underlying data
1633         structure is not really important, but what is important is
1634         one of its other attributes: whether it guarantees storing
1635         elements by key order, for example. For this one can
1636         use</para>
1637         <programlisting>
          container_traits&lt;C&gt;::order_preserving
1639         </programlisting>
1640         <para>
1641           Also,
1642         </para>
1643         <programlisting>
1644           typename container_traits&lt;C&gt;::invalidation_guarantee
1645         </programlisting>
1646
1647         <para>is the container's invalidation guarantee. Invalidation
1648         guarantees are especially important regarding priority queues,
1649         since in this library's design, iterators are practically the
1650         only way to manipulate them.</para>
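        <para>
          The queries described in this section can be sketched as small
          compile-time predicates. This is an illustration under stated
          assumptions, not the library's recommended idiom: the helper
          names and <literal>demo</literal> are hypothetical,
          <classname>std::is_same</classname> and
          <classname>std::is_base_of</classname> require C++11, and the
          sketch relies on the invalidation-guarantee tags forming an
          inheritance chain in which stronger guarantees derive from
          weaker ones.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <type_traits>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

typedef __gnu_pbds::cc_hash_table<int, char> chained_map;
typedef __gnu_pbds::tree<int, char>          tree_map;

// True if C guarantees storing its elements by key order.
template<typename C>
bool keeps_key_order()
{ return __gnu_pbds::container_traits<C>::order_preserving; }

// True if C is implemented as a collision-chaining hash table.
template<typename C>
bool is_collision_chaining()
{
  typedef typename __gnu_pbds::container_traits<C>::container_category tag;
  return std::is_same<tag, __gnu_pbds::cc_hash_tag>::value;
}

// True if C offers at least the point invalidation guarantee.
template<typename C>
bool keeps_point_iterators_valid()
{
  typedef typename __gnu_pbds::container_traits<C>::invalidation_guarantee
    guarantee;
  return std::is_base_of<__gnu_pbds::point_invalidation_guarantee,
                         guarantee>::value;
}

// Exercises the predicates above; call it from main().
void demo()
{
  assert(keeps_key_order<tree_map>());
  assert(!keeps_key_order<chained_map>());

  assert(is_collision_chaining<chained_map>());
  assert(!is_collision_chaining<tree_map>());

  assert(keeps_point_iterators_valid<chained_map>());
  assert(keeps_point_iterators_valid<tree_map>());
}
```
]]></programlisting>
        <para>
          Because every predicate depends only on types, the same
          information can also drive tag dispatching or
          <literal>static_assert</literal> checks in generic code.
        </para>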
1651       </section>
1652
1653       <section xml:id="pbds.using.tutorial.point_range_iteration">
1654         <info>
1655           <title>
1656             Point and Range Iteration
1657           </title>
1658         </info>
1660
1661         <para>This library differentiates between two types of methods
1662         and iterators: point-type, and range-type. For example,
1663         <function>find</function> and <function>insert</function> are point-type methods, since
1664         they each deal with a specific element; their returned
1665         iterators are point-type iterators. <function>begin</function> and
1666         <function>end</function> are range-type methods, since they are not used to
1667         find a specific element, but rather to go over all elements in
1668         a container object; their returned iterators are range-type
1669         iterators.
1670         </para>
1671
1672         <para>Most containers store elements in an order that is
1673         determined by their interface. Correspondingly, it is fine that
1674         their point-type iterators are synonymous with their range-type
1675         iterators. For example, in the following snippet
1676         </para>
1677         <programlisting>
1678           std::for_each(c.find(1), c.find(5), foo);
1679         </programlisting>
1680         <para>
1681           two point-type iterators (returned by <function>find</function>) are used
1682           for a range-type purpose - going over all elements whose key is
1683           between 1 and 5.
1684         </para>
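        <para>
          For an order-preserving container the snippet can be tried
          directly. The sketch below uses a tree-based map and collects
          the visited keys; <literal>key_collector</literal>,
          <literal>keys_between</literal>, and <literal>demo</literal>
          are hypothetical names, and both keys passed to
          <function>find</function> are assumed to be present in the
          container. As with any iterator pair, the range is half-open,
          so the entry for the second key is not visited.
        </para>
        <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <vector>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::tree<int, char> tree_map;

// Illustrative functor that records the key of each visited entry.
struct key_collector
{
  std::vector<int>* keys;

  void operator()(const tree_map::value_type& entry) const
  { keys->push_back(entry.first); }
};

// Visits the entries in [first_key, last_key) using the iterators
// returned by find(). Assumption: both keys exist in c.
std::vector<int> keys_between(tree_map& c, int first_key, int last_key)
{
  std::vector<int> keys;
  key_collector collect = { &keys };
  std::for_each(c.find(first_key), c.find(last_key), collect);
  return keys;
}

// Exercises the helper above; call it from main().
void demo()
{
  tree_map c;
  for (int k = 1; k <= 6; ++k)
    c[k] = 'a';

  const std::vector<int> keys = keys_between(c, 1, 5);
  assert(keys.size() == 4);
  assert(keys.front() == 1);
  assert(keys.back() == 4);
}
```
]]></programlisting>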
1685
1686         <para>
1687           Conversely, the above snippet makes no sense for
1688           self-organizing containers - ones that order (and reorder)
1689           their elements by implementation. It would be nice to have a
1690           uniform iterator system that would allow the above snippet to
1691           compile only if it made sense.
1692         </para>
1693
1694         <para>
1695           This could trivially be done by specializing
1696           <function>std::for_each</function> for the case of iterators returned by
1697           <classname>std::tr1::unordered_map</classname>, but this would only solve the
1698           problem for one algorithm and one container. Fundamentally, the
1699           problem is that one can loop using a self-organizing
1700           container's point-type iterators.
1701         </para>
1702
1703         <para>
1704           This library's containers define two families of
1705           iterators: <type>point_const_iterator</type> and
1706           <type>point_iterator</type> are the iterator types returned by
1707           point-type methods; <type>const_iterator</type> and
1708           <type>iterator</type> are the iterator types returned by range-type
1709           methods.
1710         </para>
1711         <programlisting>
1712           class &lt;- some container -&gt;
1713           {
1714           public:
1715           ...
1716
1717           typedef &lt;- something -&gt; const_iterator;
1718
1719           typedef &lt;- something -&gt; iterator;
1720
1721           typedef &lt;- something -&gt; point_const_iterator;
1722
1723           typedef &lt;- something -&gt; point_iterator;
1724
1725           ...
1726
1727           public:
1728           ...
1729
1730           const_iterator begin () const;
1731
1732           iterator begin();
1733
1734           point_const_iterator find(...) const;
1735
1736           point_iterator find(...);
1737           };
1738         </programlisting>
1739
1740         <para>For
        containers whose interface defines sequence order, it
1742         is very simple: point-type and range-type iterators are exactly
1743         the same, which means that the above snippet will compile if it
1744         is used for an order-preserving associative container.
1745         </para>
1746
1747         <para>
1748           For self-organizing containers, however, (hash-based
1749           containers as a special example), the preceding snippet will
1750           not compile, because their point-type iterators do not support
1751           <function>operator++</function>.
1752         </para>
1753
1754         <para>In any case, both for order-preserving and self-organizing
1755         containers, the following snippet will compile:
1756         </para>
1757         <programlisting>
1758           typename Cntnr::point_iterator it = c.find(2);
1759         </programlisting>
1760
1761         <para>
1762           because a range-type iterator can always be converted to a
1763           point-type iterator.
1764         </para>
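        <para>
          A generic helper written only in terms of
          <type>point_iterator</type> therefore works for both kinds of
          container. The following is a minimal sketch of that idea;
          <literal>assign_if_present</literal>,
          <literal>demo</literal>, and the typedef names are invented
          for the example.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::cc_hash_table<int, char> hash_map;  // self-organizing
typedef __gnu_pbds::tree<int, char>          tree_map;  // order-preserving

// Point-type use: locate one entry and overwrite its mapped value.
// Only dereferencing and comparison with end() are required of the
// iterator, so no operator++ is needed.
template<typename Cntnr>
bool assign_if_present(Cntnr& c,
                       const typename Cntnr::key_type& key,
                       const typename Cntnr::mapped_type& value)
{
  typename Cntnr::point_iterator it = c.find(key);
  if (it == c.end())
    return false;
  it->second = value;
  return true;
}

// Exercises the helper above; call it from main().
void demo()
{
  hash_map h;
  tree_map t;
  h[2] = 'b';
  t[2] = 'b';

  assert(assign_if_present(h, 2, 'z'));
  assert(h[2] == 'z');
  assert(!assign_if_present(h, 9, 'z'));  // key 9 is absent
  assert(h.size() == 1);                  // nothing was inserted

  assert(assign_if_present(t, 2, 'y'));
  assert(t[2] == 'y');
}
```
]]></programlisting>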
1765
        <para>Distinguishing between iterator types also
1767         raises the point that a container's iterators might have
1768         different invalidation rules concerning their de-referencing
1769         abilities and movement abilities. This now corresponds exactly
1770         to the question of whether point-type and range-type iterators
1771         are valid. As explained above, <classname>container_traits</classname> allows
1772         querying a container for its data structure attributes. The
1773         iterator-invalidation guarantees are certainly a property of
1774         the underlying data structure, and so
1775         </para>
1776         <programlisting>
1777           container_traits&lt;C&gt;::invalidation_guarantee
1778         </programlisting>
1779
1780         <para>
1781           gives one of three pre-determined types that answer this
1782           query.
1783         </para>
1784
1785       </section>
1786     </section> <!-- tutorial -->
1787
1788     <section xml:id="pbds.using.examples">
1789       <info><title>Examples</title></info>
1790       <para>
1791         Additional code examples are provided in the source
1792         distribution, as part of the regression and performance
1793         testsuite.
1794       </para>
1795
1796       <section xml:id="pbds.using.examples.basic">
1797         <info><title>Intermediate Use</title></info>
1798
1799         <itemizedlist>
1800           <listitem>
1801             <para>
1802               Basic use of maps:
1803               <filename>basic_map.cc</filename>
1804             </para>
1805           </listitem>
1806
1807           <listitem>
1808             <para>
1809               Basic use of sets:
1810               <filename>basic_set.cc</filename>
1811             </para>
1812           </listitem>
1813
1814           <listitem>
1815             <para>
1816               Conditionally erasing values from an associative container object:
1817               <filename>erase_if.cc</filename>
1818             </para>
1819           </listitem>
1820
1821           <listitem>
1822             <para>
1823               Basic use of multimaps:
1824               <filename>basic_multimap.cc</filename>
1825             </para>
1826           </listitem>
1827
1828           <listitem>
1829             <para>
1830               Basic use of multisets:
1831               <filename>basic_multiset.cc</filename>
1832             </para>
1833           </listitem>
1834
1835           <listitem>
1836             <para>
1837               Basic use of priority queues:
1838               <filename>basic_priority_queue.cc</filename>
1839             </para>
1840           </listitem>
1841
1842           <listitem>
1843             <para>
1844               Splitting and joining priority queues:
1845               <filename>priority_queue_split_join.cc</filename>
1846             </para>
1847           </listitem>
1848
1849           <listitem>
1850             <para>
1851               Conditionally erasing values from a priority queue:
1852               <filename>priority_queue_erase_if.cc</filename>
1853             </para>
1854           </listitem>
1855         </itemizedlist>
1856
1857       </section>
1858
1859       <section xml:id="pbds.using.examples.query">
1860         <info><title>Querying with <classname>container_traits</classname> </title></info>
1861         <itemizedlist>
1862           <listitem>
1863             <para>
1864               Using <classname>container_traits</classname> to query
1865               about underlying data structure behavior:
1866               <filename>assoc_container_traits.cc</filename>
1867             </para>
1868           </listitem>
1869
1870           <listitem>
1871             <para>
1872               A non-compiling example showing wrong use of finding keys in
1873               hash-based containers: <filename>hash_find_neg.cc</filename>
1874             </para>
1875           </listitem>
1876           <listitem>
1877             <para>
1878               Using <classname>container_traits</classname>
1879               to query about underlying data structure behavior:
1880               <filename>priority_queue_container_traits.cc</filename>
1881             </para>
1882           </listitem>
1883
1884         </itemizedlist>
1885
1886       </section>
1887
1888       <section xml:id="pbds.using.examples.container">
1889         <info><title>By Container Method</title></info>
1890         <para></para>
1891
1892         <section xml:id="pbds.using.examples.container.hash">
1893           <info><title>Hash-Based</title></info>
1894
1895           <section xml:id="pbds.using.examples.container.hash.resize">
1896             <info><title>size Related</title></info>
1897
1898             <itemizedlist>
1899               <listitem>
1900                 <para>
1901                   Setting the initial size of a hash-based container
1902                   object:
1903                   <filename>hash_initial_size.cc</filename>
1904                 </para>
1905               </listitem>
1906
1907               <listitem>
1908                 <para>
1909                   A non-compiling example showing how not to resize a
1910                   hash-based container object:
1911                   <filename>hash_resize_neg.cc</filename>
1912                 </para>
1913               </listitem>
1914
1915               <listitem>
1916                 <para>
                  Resizing a hash-based container object:
1918                   <filename>hash_resize.cc</filename>
1919                 </para>
1920               </listitem>
1921
1922               <listitem>
1923                 <para>
1924                   Showing an illegal resize of a hash-based container
1925                   object:
1926                   <filename>hash_illegal_resize.cc</filename>
1927                 </para>
1928               </listitem>
1929
1930               <listitem>
1931                 <para>
1932                   Changing the load factors of a hash-based container
1933                   object: <filename>hash_load_set_change.cc</filename>
1934                 </para>
1935               </listitem>
1936             </itemizedlist>
1937           </section>
1938
1939           <section xml:id="pbds.using.examples.container.hash.hashor">
1940             <info><title>Hashing Function Related</title></info>
1941             <para></para>
1942
1943             <itemizedlist>
1944               <listitem>
1945                 <para>
1946                   Using a modulo range-hashing function for the case of an
1947                   unknown skewed key distribution:
1948                   <filename>hash_mod.cc</filename>
1949                 </para>
1950               </listitem>
1951
1952               <listitem>
1953                 <para>
1954                   Writing a range-hashing functor for the case of a known
1955                   skewed key distribution:
1956                   <filename>shift_mask.cc</filename>
1957                 </para>
1958               </listitem>
1959
1960               <listitem>
1961                 <para>
1962                   Storing the hash value along with each key:
1963                   <filename>store_hash.cc</filename>
1964                 </para>
1965               </listitem>
1966
1967               <listitem>
1968                 <para>
1969                   Writing a ranged-hash functor:
1970                   <filename>ranged_hash.cc</filename>
1971                 </para>
1972               </listitem>
1973             </itemizedlist>
1974
1975           </section>
1976
1977         </section>
1978
1979         <section xml:id="pbds.using.examples.container.branch">
1980           <info><title>Branch-Based</title></info>
1981
1982
1983           <section xml:id="pbds.using.examples.container.branch.split">
1984             <info><title>split or join Related</title></info>
1985
1986             <itemizedlist>
1987               <listitem>
1988                 <para>
1989                   Joining two tree-based container objects:
1990                   <filename>tree_join.cc</filename>
1991                 </para>
1992               </listitem>
1993
1994               <listitem>
1995                 <para>
1996                   Splitting a PATRICIA trie container object:
1997                   <filename>trie_split.cc</filename>
1998                 </para>
1999               </listitem>
2000
2001               <listitem>
2002                 <para>
2003                   Order statistics while joining two tree-based container
2004                   objects:
2005                   <filename>tree_order_statistics_join.cc</filename>
2006                 </para>
2007               </listitem>
2008             </itemizedlist>
2009
2010           </section>
2011
2012           <section xml:id="pbds.using.examples.container.branch.invariants">
2013             <info><title>Node Invariants</title></info>
2014
2015             <itemizedlist>
2016               <listitem>
2017                 <para>
2018                   Using trees for order statistics:
2019                   <filename>tree_order_statistics.cc</filename>
2020                 </para>
2021               </listitem>
2022
2023               <listitem>
2024                 <para>
2025                   Augmenting trees to support operations on line
2026                   intervals:
2027                   <filename>tree_intervals.cc</filename>
2028                 </para>
2029               </listitem>
2030             </itemizedlist>
2031
2032           </section>
2033
2034           <section xml:id="pbds.using.examples.container.branch.trie">
2035             <info><title>trie</title></info>
2036             <itemizedlist>
2037               <listitem>
2038                 <para>
2039                   Using a PATRICIA trie for DNA strings:
2040                   <filename>trie_dna.cc</filename>
2041                 </para>
2042               </listitem>
2043
2044               <listitem>
2045                 <para>
2046                   Using a PATRICIA
2047                   trie for finding all entries whose key matches a given prefix:
2048                   <filename>trie_prefix_search.cc</filename>
2049                 </para>
2050               </listitem>
2051             </itemizedlist>
2052
2053           </section>
2054
2055         </section>
2056
2057         <section xml:id="pbds.using.examples.container.priority_queue">
2058           <info><title>Priority Queues</title></info>
2059           <itemizedlist>
2060             <listitem>
2061               <para>
2062                 Cross referencing an associative container and a priority
2063                 queue: <filename>priority_queue_xref.cc</filename>
2064               </para>
2065             </listitem>
2066
2067             <listitem>
2068               <para>
2069                 Cross referencing a vector and a priority queue using a
2070                 very simple version of Dijkstra's shortest path
2071                 algorithm:
2072                 <filename>priority_queue_dijkstra.cc</filename>
2073               </para>
2074             </listitem>
2075           </itemizedlist>
2076
2077         </section>
2078
2079
2080       </section>
2081
2082     </section>
2083
2084   </section> <!-- using -->
2085
2086   <!-- S03: Design -->
2087
2088
2089 <section xml:id="containers.pbds.design">
2090   <info><title>Design</title></info>
2091   <?dbhtml filename="policy_data_structures_design.html"?>
2092   <para></para>
2093
2094   <section xml:id="pbds.design.concepts">
2095     <info><title>Concepts</title></info>
2096
2097     <section xml:id="pbds.design.concepts.null_type">
2098       <info><title>Null Policy Classes</title></info>
2099
2100       <para>
2101         Associative containers are typically parametrized by various
2102         policies. For example, a hash-based associative container is
        parametrized by a hash-functor, transforming each key into a
        non-negative numerical type. Each such value is then further mapped
2105         into a position within the table. The mapping of a key into a
2106         position within the table is therefore a two-step process.
2107       </para>
2108
2109       <para>
2110         In some cases, instantiations are redundant. For example, when the
2111         keys are integers, it is possible to use a redundant hash policy,
2112         which transforms each key into its value.
2113       </para>
2114
2115       <para>
2116         In some other cases, these policies are irrelevant.  For example, a
2117         hash-based associative container might transform keys into positions
2118         within a table by a different method than the two-step method
2119         described above. In such a case, the hash functor is simply
2120         irrelevant.
2121       </para>
2122
2123       <para>
2124         When a policy is either redundant or irrelevant, it can be replaced
2125         by <classname>null_type</classname>.
2126       </para>
2127
2128       <para>
2129         For example, a <emphasis>set</emphasis> is an associative
2130         container with one of its template parameters (the one for the
2131         mapped type) replaced with <classname>null_type</classname>. Other
2132         places where this technique makes simplifications possible
2133         include node updates in tree and trie data structures, and hash
2134         and probe functions for hash data structures.
2135       </para>
2136     </section>
2137
2138     <section xml:id="pbds.design.concepts.associative_semantics">
2139       <info><title>Map and Set Semantics</title></info>
2140
2141       <section xml:id="concepts.associative_semantics.set_vs_map">
2142         <info>
2143           <title>
2144             Distinguishing Between Maps and Sets
2145           </title>
2146         </info>
2147
2148         <para>
2149           Anyone familiar with the standard knows that there are four kinds
2150           of associative containers: maps, sets, multimaps, and
2151           multisets. The map datatype associates each key to
2152           some data.
2153         </para>
2154
2155         <para>
2156           Sets are associative containers that simply store keys -
2157           they do not map them to anything. In the standard, each map class
2158           has a corresponding set class. E.g.,
2159           <classname>std::map&lt;int, char&gt;</classname> maps each
2160           <classname>int</classname> to a <classname>char</classname>, but
2161           <classname>std::set&lt;int&gt;</classname> simply stores
2162           <classname>int</classname>s. In this library, however, there are no
2163           distinct classes for maps and sets. Instead, an associative
2164           container's <classname>Mapped</classname> template parameter is a policy: if
2165           it is instantiated by <classname>null_type</classname>, then it
2166           is a "set"; otherwise, it is a "map". E.g.,
2167         </para>
2168         <programlisting>
2169           cc_hash_table&lt;int, char&gt;
2170         </programlisting>
2171         <para>
2172           is a "map" mapping each <type>int</type> value to a
2173           <type>char</type>, but
2174         </para>
2175         <programlisting>
2176           cc_hash_table&lt;int, null_type&gt;
2177         </programlisting>
2178         <para>
2179           is a type that uniquely stores <type>int</type> values.
2180         </para>
2181         <para>Once the <classname>Mapped</classname> template parameter is instantiated
2182         by <classname>null_type</classname>, the "set" acts very
2183         similarly to the standard's sets: it does not
2184         map each key to a distinct <classname>null_type</classname> object. Also,
2185         the container's <type>value_type</type> is essentially
2186         its <type>key_type</type>, just as with the standard's
2187         sets.</para>
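             <para>
               The two instantiations above can be sketched as follows. This
               is a minimal illustration, assuming a GCC release whose
               <filename>ext/pb_ds/assoc_container.hpp</filename> provides
               <classname>null_type</classname> in namespace
               <literal>__gnu_pbds</literal>; the typedef and function names
               are hypothetical and not part of the library.
             </para>
             <programlisting>
               #include &lt;cassert&gt;
               #include &lt;cstddef&gt;
               #include &lt;utility&gt;
               #include &lt;ext/pb_ds/assoc_container.hpp&gt;

               // A "map": each int key is associated with a char.
               typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; int_char_map;

               // A "set": the Mapped policy is null_type, so only keys are stored.
               typedef __gnu_pbds::cc_hash_table&lt;int, __gnu_pbds::null_type&gt; int_set;

               // Inserts the same key twice into a "set" and returns the final size.
               inline std::size_t
               insert_twice_into_set(int key)
               {
                 int_set s;
                 const bool first = s.insert(key).second;   // new key: inserted
                 const bool second = s.insert(key).second;  // duplicate: rejected
                 assert(first &amp;&amp; !second);
                 return s.size();
               }

               // Stores a mapped value in a "map" and reads it back.
               inline char
               store_and_fetch(int key, char value)
               {
                 int_char_map m;
                 m.insert(std::make_pair(key, value));
                 return m.find(key)-&gt;second;
               }
             </programlisting>
             <para>
               Note that the "set" is inserted into with a bare key, whereas
               the "map" is inserted into with a key/mapped pair.
             </para>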
2188
2189         <para>
2190           The standard's multimaps and multisets allow, respectively,
2191           non-uniquely mapping keys and non-uniquely storing keys. As
2192           discussed, the
2193           reasons why this might be necessary are (1) that a key might be
2194           decomposed into a primary key and a secondary key, (2) that a
2195           key might appear more than once, or (3) any
2196           combination of (1) and (2). Correspondingly,
2197           one should use (1) "maps" mapping primary keys to secondary
2198           keys, (2) "maps" mapping keys to size types, or (3) any
2199           combination of (1) and (2). Thus, for example, a
2200           <classname>std::multiset&lt;int&gt;</classname> might be used to store
2201           multiple instances of integers, but using this library's
2202           containers, one might use
2203         </para>
2204         <programlisting>
2205           tree&lt;int, size_t&gt;
2206         </programlisting>
2207
2208         <para>
2209           i.e., a <classname>map</classname> of <type>int</type>s to
2210           <type>size_t</type>s.
2211         </para>
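             <para>
               A minimal sketch of this counting idiom follows. The helper
               names (<function>add_occurrence</function>,
               <function>occurrences</function>,
               <function>remove_occurrence</function>) are hypothetical; only
               <classname>tree</classname> and its members come from the
               library, here assumed to live in namespace
               <literal>__gnu_pbds</literal>.
             </para>
             <programlisting>
               #include &lt;cstddef&gt;
               #include &lt;ext/pb_ds/assoc_container.hpp&gt;

               // A logical multiset of ints: each key maps to its occurrence count.
               typedef __gnu_pbds::tree&lt;int, std::size_t&gt; int_counts;

               // Records one more occurrence of key.
               inline void
               add_occurrence(int_counts&amp; c, int key)
               { ++c[key]; }  // a missing key starts from a count of 0

               // Returns how many times key logically occurs.
               inline std::size_t
               occurrences(const int_counts&amp; c, int key)
               {
                 int_counts::point_const_iterator it = c.find(key);
                 return it == c.end() ? 0 : it-&gt;second;
               }

               // Removes one occurrence of key; returns false if key is absent.
               inline bool
               remove_occurrence(int_counts&amp; c, int key)
               {
                 int_counts::point_iterator it = c.find(key);
                 if (it == c.end())
                   return false;
                 if (--it-&gt;second == 0)
                   c.erase(key);  // last occurrence gone: drop the key itself
                 return true;
               }
             </programlisting>
             <para>
               Each key is stored once regardless of how often it occurs, so
               the container's <function>size</function> reports the number
               of distinct keys rather than the number of occurrences.
             </para>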
2212         <para>
2213           These "multimaps" and "multisets" might be confusing to
2214           anyone familiar with the standard's <classname>std::multimap</classname> and
2215           <classname>std::multiset</classname>, because there is no clear
2216           correspondence between the two. For example, in some cases
2217           where one uses <classname>std::multiset</classname> in the standard, one might use
2218           in this library a "multimap" of "multisets" - i.e., a
2219           container that maps primary keys each to an associative
2220           container that maps each secondary key to the number of times
2221           it occurs.
2222         </para>
2223
2224         <para>
2225           When one uses a "multimap," one should choose with care the
2226           type of container used for secondary keys.
2227         </para>
2228       </section> <!-- map vs set -->
2229
2230
2231       <section xml:id="concepts.associative_semantics.multi">
2232         <info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>
2233
2234         <para>
2235           Brace oneself: this library does not contain containers like
2236           <classname>std::multimap</classname> or
2237           <classname>std::multiset</classname>. Instead, these data
2238           structures can be synthesized via manipulation of the
2239           <classname>Mapped</classname> template parameter.
2240         </para>
2241         <para>
2242           One maps the unique part of a key, the primary key, into an
2243           associative-container of the (originally) non-unique parts of
2244           the key, the secondary keys. A primary associative-container
2245           is an associative container of primary keys; a secondary
2246           associative-container is an associative container of
2247           secondary keys.
2248         </para>
2249
2250         <para>
2251           Let us step back a bit and start from the beginning.
2252         </para>
2253
2254
2255         <para>
2256           Maps (or sets) allow mapping (or storing) unique-key values.
2257           The standard library also supplies associative containers which
2258           map (or store) multiple values with equivalent keys:
2259           <classname>std::multimap</classname>, <classname>std::multiset</classname>,
2260           <classname>std::tr1::unordered_multimap</classname>, and
2261           <classname>std::tr1::unordered_multiset</classname>. We first discuss how these might
2262           be used, then why we think it is best to avoid them.
2263         </para>
2264
2265         <para>
2266           Suppose one builds a simple bank-account application that
2267           records, for each client (identified by a <classname>std::string</classname>)
2268           and account-id (identified by an <type>unsigned long</type>),
2269           the balance in the account (described by a
2270           <type>float</type>). Suppose further that ordering this
2271           information is not useful, so a hash-based container is
2272           preferable to a tree-based container. Then one can use
2273         </para>
2274
2275         <programlisting>
2276           std::tr1::unordered_map&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2277         </programlisting>
2278
2279         <para>
2280           which hashes every combination of client and account-id. This
2281           might work well, except for the fact that it is now impossible
2282           to efficiently list all of the accounts of a specific client
2283           (this would practically require iterating over all
2284           entries). Instead, one can use
2285         </para>
2286
2287         <programlisting>
2288           std::tr1::unordered_multimap&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2289         </programlisting>
2290
2291         <para>
2292           which hashes every client, and decides equivalence based on
2293           client only. This will ensure that all accounts belonging to a
2294           specific user are stored consecutively.
2295         </para>
2296
2297         <para>
2298           Also, suppose one wants a priority queue of integers
2299           (a container that supports <function>push</function>,
2300           <function>pop</function>, and <function>top</function> operations, the last of which
2301           returns the largest <type>int</type>) that also supports
2302           operations such as <function>find</function> and <function>lower_bound</function>. A
2303           reasonable solution is to build an adapter over
2304           <classname>std::set&lt;int&gt;</classname>. In this adapter,
2305           <function>push</function> will just call the tree-based
2306           associative container's <function>insert</function> method; <function>pop</function>
2307           will call its <function>end</function> method, and use it to return the
2308           preceding element (which must be the largest). This might
2309           work well, except that the container object cannot hold
2310           multiple instances of the same integer (<function>push(4)</function>
2311           will be a no-op if <constant>4</constant> is already in the
2312           container object). If multiple keys are necessary, then one
2313           might build the adapter over an
2314           <classname>std::multiset&lt;int&gt;</classname>.
2315         </para>
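             <para>
               The adapter described above might look roughly like the
               following sketch, which uses only the standard
               <classname>std::multiset</classname>. The class name
               <classname>searchable_int_pq</classname> and its member
               <function>contains</function> are hypothetical names chosen
               for this illustration.
             </para>
             <programlisting>
               #include &lt;cstddef&gt;
               #include &lt;set&gt;

               // A priority queue of ints that also supports find-like queries,
               // built as an adapter over std::multiset&lt;int&gt;.
               class searchable_int_pq
               {
               public:
                 void push(int v) { s_.insert(v); }

                 // Largest element; precondition: !empty().
                 int top() const { return *s_.rbegin(); }

                 // Removes one instance of the largest element;
                 // precondition: !empty().
                 void pop()
                 {
                   std::multiset&lt;int&gt;::iterator last = s_.end();
                   --last;          // the element preceding end() is the largest
                   s_.erase(last);  // erases only this instance
                 }

                 bool contains(int v) const { return s_.find(v) != s_.end(); }
                 bool empty() const { return s_.empty(); }
                 std::size_t size() const { return s_.size(); }

               private:
                 std::multiset&lt;int&gt; s_;
               };
             </programlisting>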
2316
2317         <para>
2318           The standard library's non-unique-mapping containers are useful
2319           when (1) a key can be decomposed into a primary key and a
2320           secondary key, (2) a key is needed multiple times, or (3) any
2321           combination of (1) and (2).
2322         </para>
2323
2324         <para>
2325           The graphic below shows how the standard library's container
2326           design works internally; in this figure nodes shaded equally
2327           represent equivalent-key values. Equivalent keys are stored
2328           consecutively using the properties of the underlying data
2329           structure: binary search trees (label A) store equivalent-key
2330           values consecutively (in the sense of an in-order walk)
2331           naturally; collision-chaining hash tables (label B) store
2332           equivalent-key values in the same bucket, and the bucket can be
2333           arranged so that equivalent-key values are consecutive.
2334         </para>
2335
2336         <figure>
2337           <title>Non-unique Mapping Standard Containers</title>
2338           <mediaobject>
2339             <imageobject>
2340               <imagedata align="center" format="PNG" scale="100"
2341                          fileref="../images/pbds_embedded_lists_1.png"/>
2342             </imageobject>
2343             <textobject>
2344               <phrase>Non-unique Mapping Standard Containers</phrase>
2345             </textobject>
2346           </mediaobject>
2347         </figure>
2348
2349         <para>
2350           Put differently, the standard's non-unique mapping
2351           associative-containers are associative containers that map
2352           primary keys to linked lists that are embedded into the
2353           container. The graphic below shows again the two
2354           containers from the first graphic above, this time with
2355           the embedded linked lists of the grayed nodes marked
2356           explicitly.
2357         </para>
2358
2359         <figure xml:id="fig.pbds_embedded_lists_2">
2360           <title>
2361             Effect of embedded lists in
2362             <classname>std::multimap</classname>
2363           </title>
2364           <mediaobject>
2365             <imageobject>
2366               <imagedata align="center" format="PNG" scale="100"
2367                          fileref="../images/pbds_embedded_lists_2.png"/>
2368             </imageobject>
2369             <textobject>
2370               <phrase>
2371                 Effect of embedded lists in
2372                 <classname>std::multimap</classname>
2373               </phrase>
2374             </textobject>
2375           </mediaobject>
2376         </figure>
2377
2378         <para>
2379           These embedded linked lists have several disadvantages.
2380         </para>
2381
2382         <orderedlist>
2383           <listitem>
2384             <para>
2385               The underlying data structure embeds the linked lists
2386               according to its own consideration, which means that the
2387               search path for a value might include several different
2388               equivalent-key values. For example, the search path for
2389               the black node in the first graphic, under either label A or B,
2390               includes more than a single gray node.
2391             </para>
2392           </listitem>
2393
2394           <listitem>
2395             <para>
2396               The links of the linked lists are the underlying data
2397               structures' nodes, which typically are quite structured.  In
2398               the case of tree-based containers (the graphic above, label
2399               A), each "link" is actually a node with three pointers (one
2400               to a parent and two to children), and a
2401               relatively-complicated iteration algorithm. The linked
2402               lists, therefore, can take up quite a lot of memory, and
2403               iterating over all values equal to a given key (through the
2404               return value of the standard
2405               library's <function>equal_range</function>) can be
2406               expensive.
2407             </para>
2408           </listitem>
2409
2410           <listitem>
2411             <para>
2412               The primary key is stored multiply; this uses more memory.
2413             </para>
2414           </listitem>
2415
2416           <listitem>
2417             <para>
2418               Finally, the interface of this design excludes several
2419               useful underlying data structures. Of all the unordered
2420               self-organizing data structures, practically only
2421               collision-chaining hash tables can (efficiently) guarantee
2422               that equivalent-key values are stored consecutively.
2423             </para>
2424           </listitem>
2425         </orderedlist>
2426
2427         <para>
2428           The above reasons hold even when the ratio of secondary keys to
2429           primary keys (or average number of identical keys) is small, but
2430           when it is large, there are more severe problems:
2431         </para>
2432
2433         <orderedlist>
2434           <listitem>
2435             <para>
2436               The underlying data structures order the links inside each
2437               embedded linked list according to their internal
2438               considerations, which effectively means that the
2439               links are unordered. Irrespective of the underlying data
2440               structure, searching for a specific value can degrade to
2441               linear complexity.
2442             </para>
2443           </listitem>
2444
2445           <listitem>
2446             <para>
2447               Similarly to the above point, it is impossible to apply
2448               to the secondary keys considerations that apply to primary
2449               keys. For example, it is not possible to maintain secondary
2450               keys by sorted order.
2451             </para>
2452           </listitem>
2453
2454           <listitem>
2455             <para>
2456               While the interface "understands" that all equivalent-key
2457               values constitute a distinct list (through
2458               <function>equal_range</function>), the underlying data
2459               structure typically does not. This means that operations such
2460               as erasing from a tree-based container all values whose keys
2461               are equivalent to a given key can be super-linear in the
2462               size of the tree; this is also true for several other
2463               operations that target a specific list.
2464             </para>
2465           </listitem>
2466
2467         </orderedlist>
2468
2469         <para>
2470           In this library, all associative containers map
2471           (or store) unique-key values. One can (1) map primary keys to
2472           secondary associative-containers (containers of
2473           secondary keys) or non-associative containers, (2) map identical
2474           keys to a size-type representing the number of times they
2475           occur, or (3) any combination of (1) and (2). Instead of
2476           allowing multiple equivalent-key values, this library
2477           supplies associative containers based on underlying
2478           data structures that are suitable as secondary
2479           associative-containers.
2480         </para>
2481
2482         <para>
2483           In the figure below, labels A and B show the equivalent
2484           underlying data structures in this library, corresponding to
2485           labels A and B of the first graphic above, respectively. Each shaded
2486           box represents some size-type or secondary
2487           associative-container.
2488         </para>
2489
2490         <figure>
2491           <title>Non-unique Mapping Containers</title>
2492           <mediaobject>
2493             <imageobject>
2494               <imagedata align="center" format="PNG" scale="100"
2495                          fileref="../images/pbds_embedded_lists_3.png"/>
2496             </imageobject>
2497             <textobject>
2498               <phrase>Non-unique Mapping Containers</phrase>
2499             </textobject>
2500           </mediaobject>
2501         </figure>
2502
2503         <para>
2504           In the first example above, then, one would use an associative
2505           container mapping each client to an associative container which
2506           maps each account-id to a balance (see
2507           <filename>example/basic_multimap.cc</filename>); in the second
2508           example, one would use an associative container mapping
2509           each <classname>int</classname> to some size-type indicating the
2510           number of times it logically occurs
2511           (see <filename>example/basic_multiset.cc</filename>).
2512         </para>
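             <para>
               For the bank-account scenario, a primary container mapping
               each client to a secondary container of accounts could be
               sketched as below. The choice of
               <classname>cc_hash_table</classname> as the primary container
               and <classname>tree</classname> as the secondary one is only
               an assumption made for illustration, as are all typedef and
               function names; the default hash functor is assumed to
               accept <classname>std::string</classname> keys.
             </para>
             <programlisting>
               #include &lt;cstddef&gt;
               #include &lt;string&gt;
               #include &lt;ext/pb_ds/assoc_container.hpp&gt;

               // Secondary container: account-id -&gt; balance, kept in key order.
               typedef __gnu_pbds::tree&lt;unsigned long, float&gt; accounts_t;

               // Primary container: client -&gt; that client's accounts.
               typedef __gnu_pbds::cc_hash_table&lt;std::string, accounts_t&gt; bank_t;

               inline void
               set_balance(bank_t&amp; bank, const std::string&amp; client,
                           unsigned long account_id, float balance)
               { bank[client][account_id] = balance; }

               // Number of accounts held by client, found with a single lookup.
               inline std::size_t
               account_count(const bank_t&amp; bank, const std::string&amp; client)
               {
                 bank_t::point_const_iterator it = bank.find(client);
                 return it == bank.end() ? 0 : it-&gt;second.size();
               }

               // Sum of all balances of client, or 0 if the client is unknown.
               inline float
               total_balance(const bank_t&amp; bank, const std::string&amp; client)
               {
                 bank_t::point_const_iterator it = bank.find(client);
                 if (it == bank.end())
                   return 0.0f;
                 float total = 0.0f;
                 for (accounts_t::const_iterator a = it-&gt;second.begin();
                      a != it-&gt;second.end(); ++a)
                   total += a-&gt;second;
                 return total;
               }
             </programlisting>
             <para>
               Listing the accounts of one client touches only that client's
               secondary container, and each client key is stored once.
             </para>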
2513
2514         <para>
2515           See the discussion in list-based container types for containers
2516           especially suited as secondary associative-containers.
2517         </para>
2518       </section>
2519
2520     </section> <!-- map and set semantics -->
2521
2522     <section xml:id="pbds.design.concepts.iterator_semantics">
2523       <info><title>Iterator Semantics</title></info>
2524
2525       <section xml:id="concepts.iterator_semantics.point_and_range">
2526         <info><title>Point and Range Iterators</title></info>
2527
2528         <para>
2529           Iterator concepts are bifurcated in this design, and are
2530           comprised of point-type and range-type iteration.
2531         </para>
2532
2533         <para>
2534           A point-type iterator is an iterator that refers to a specific
2535           element as returned through an
2536           associative-container's <function>find</function> method.
2537         </para>
2538
2539         <para>
2540           A range-type iterator is an iterator that is used to go over a
2541           sequence of elements, as returned by a container's
2542           <function>begin</function> method.
2543         </para>
2544
2545         <para>
2546           A point-type method is a method that
2547           returns a point-type iterator; a range-type method is a method
2548           that returns a range-type iterator.
2549         </para>
2550
2551         <para>For most containers, these types are synonymous; for
2552         self-organizing containers, such as hash-based containers or
2553         priority queues, these are inherently different (in any
2554         implementation, including that of C++ standard library
2555         components). This design makes the difference explicit: they are
2556         distinct types.
2557         </para>
2558       </section>
2559
2560
2561       <section xml:id="concepts.iterator_semantics.both">
2562         <info><title>Distinguishing Point and Range Iterators</title></info>
2563
2564         <para>When using this library, it is necessary to differentiate
2565         between two types of methods and iterators: point-type methods and
2566         iterators, and range-type methods and iterators. Each associative
2567         container's interface includes the methods:</para>
2568         <programlisting>
2569           point_const_iterator
2570           find(const_key_reference r_key) const;
2571
2572           point_iterator
2573           find(const_key_reference r_key);
2574
2575           std::pair&lt;point_iterator,bool&gt;
2576           insert(const_reference r_val);
2577         </programlisting>
2578
2579         <para>The relationship between these iterator types varies between
2580         container types. The figure below shows the invariants
2581         between point-type and range-type iterators.
2582         <emphasis>A</emphasis> shows the most general invariant: <literal>iterator</literal> can
2583         always be converted to <literal>point_iterator</literal>. <emphasis>B</emphasis>
2584         shows invariants for order-preserving containers: point-type
2585         iterators are synonymous with range-type iterators.
2586         Orthogonally, <emphasis>C</emphasis> shows invariants for "set"
2587         containers: iterators are synonymous with const iterators.</para>
2588
2589         <figure>
2590           <title>Point Iterator Hierarchy</title>
2591           <mediaobject>
2592             <imageobject>
2593               <imagedata align="center" format="PNG" scale="100"
2594                          fileref="../images/pbds_point_iterator_hierarchy.png"/>
2595             </imageobject>
2596             <textobject>
2597               <phrase>Point Iterator Hierarchy</phrase>
2598             </textobject>
2599           </mediaobject>
2600         </figure>
2601
2602
2603         <para>Note that point-type iterators in self-organizing containers
2604         (hash-based associative containers) lack movement
2605         operators, such as <literal>operator++</literal>; in fact, this
2606         is the reason why this library differs from the standard C++ library's
2607         design on this point.</para>
2608
2609         <para>Typically, one can determine an iterator's movement
2610         capabilities using
2611         <literal>std::iterator_traits&lt;It&gt;::iterator_category</literal>,
2612         which is a <literal>struct</literal> indicating the iterator's
2613         movement capabilities. Unfortunately, none of the standard predefined
2614         categories reflect an iterator's <emphasis>not</emphasis> having any
2615         movement capabilities whatsoever. Consequently,
2616         <literal>pb_ds</literal> adds a type
2617         <literal>trivial_iterator_tag</literal> (whose name is taken from
2618         a concept in C++ standardese), which is the category of iterators
2619         with no movement capabilities. All other standard C++ library
2620         tags, such as <literal>forward_iterator_tag</literal>, retain their
2621         common use.</para>
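             <para>
               As a rough sketch of how these categories might be used, the
               following dispatches on an iterator's category and converts a
               range-type iterator of a hash-based container to a point-type
               iterator. The function names are hypothetical, and the
               sketch assumes that the library's iterators expose a nested
               <literal>iterator_category</literal> and that
               <literal>trivial_iterator_tag</literal> lives in namespace
               <literal>__gnu_pbds</literal>.
             </para>
             <programlisting>
               #include &lt;iterator&gt;
               #include &lt;ext/pb_ds/assoc_container.hpp&gt;

               // Overloads selected by iterator category (tag dispatch).
               inline bool
               can_move(__gnu_pbds::trivial_iterator_tag)
               { return false; }  // point-type iterator: no movement operators

               inline bool
               can_move(std::forward_iterator_tag)
               { return true; }   // also chosen for tags derived from it

               template&lt;typename It&gt;
               inline bool
               iterator_can_move()
               { return can_move(typename It::iterator_category()); }

               typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; hash_map_t;

               // A range-type iterator converts to a point-type iterator.
               inline char
               first_mapped_value(hash_map_t&amp; m)  // precondition: !m.empty()
               {
                 hash_map_t::iterator range_it = m.begin();
                 hash_map_t::point_iterator point_it = range_it;
                 return point_it-&gt;second;
               }
             </programlisting>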
2622
2623       </section>
2624
2625       <section xml:id="pbds.design.concepts.invalidation">
2626         <info><title>Invalidation Guarantees</title></info>
2627         <para>
2628           If one manipulates a container object, then iterators previously
2629           obtained from it can be invalidated. In some cases a
2630           previously-obtained iterator cannot be de-referenced; in other cases,
2631           the iterator's next or previous element might have changed
2632           unpredictably. This corresponds exactly to the question whether a
2633           point-type or range-type iterator (see previous concept) is valid or
2634           not. In this design, one can query a container (at compile time) about
2635           its invalidation guarantees.
2636         </para>
2637
2638
2639         <para>
2640           For example, given three different types of associative containers, a modifying
2641           operation (such as <function>erase</function>) can invalidate
2642           iterators in three different ways: the iterator of one container
2643           remains completely valid and can be de-referenced and
2644           incremented; the iterator of a different container cannot even be
2645           de-referenced; the iterator of the third container can be
2646           de-referenced, but its "next" iterator changes unpredictably.
2647         </para>
2648
2649         <para>
2650           Distinguishing between point and range types allows fine-grained
2651           invalidation guarantees, because these questions correspond exactly
2652           to the question of whether point-type iterators and range-type
2653           iterators are valid. The graphic below shows tags corresponding to
2654           different types of invalidation guarantees.
2655         </para>
2656
2657         <figure>
2658           <title>Invalidation Guarantee Tags Hierarchy</title>
2659           <mediaobject>
2660             <imageobject>
2661               <imagedata align="center" format="PDF" scale="75"
2662                          fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
2663             </imageobject>
2664             <imageobject>
2665               <imagedata align="center" format="PNG" scale="100"
2666                          fileref="../images/pbds_invalidation_tag_hierarchy.png"/>
2667             </imageobject>
2668             <textobject>
2669               <phrase>Invalidation Guarantee Tags Hierarchy</phrase>
2670             </textobject>
2671           </mediaobject>
2672         </figure>
2673
2674         <itemizedlist>
2675           <listitem>
2676             <para>
2677               <classname>basic_invalidation_guarantee</classname>
2678               corresponds to a basic guarantee that a point-type iterator,
2679               a found pointer, or a found reference, remains valid as long
2680               as the container object is not modified.
2681             </para>
2682           </listitem>
2683
2684           <listitem>
2685             <para>
2686               <classname>point_invalidation_guarantee</classname>
2687               corresponds to a guarantee that a point-type iterator, a
2688               found pointer, or a found reference, remains valid even if
2689               the container object is modified.
2690             </para>
2691           </listitem>
2692
2693           <listitem>
2694             <para>
2695               <classname>range_invalidation_guarantee</classname>
2696               corresponds to a guarantee that a range-type iterator remains
2697               valid even if the container object is modified.
2698             </para>
2699           </listitem>
2700         </itemizedlist>
2701
2702         <para>To find the invalidation guarantee of a
2703         container, one can use</para>
2704         <programlisting>
2705           typename container_traits&lt;Cntnr&gt;::invalidation_guarantee
2706         </programlisting>
2707
2708         <para>Note that this hierarchy corresponds to the logic it
2709         represents: if a container has range-invalidation guarantees,
2710         then it must also have point-invalidation guarantees;
2711         correspondingly, its invalidation guarantee (in this case
2712         <classname>range_invalidation_guarantee</classname>)
2713         can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
2714         This means that this hierarchy can be used easily with
2715         standard metaprogramming techniques, by specializing on the
2716         type of <literal>invalidation_guarantee</literal>.</para>
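             <para>
               One way to exploit the hierarchy is plain overload
               resolution on the guarantee tag, as in the following sketch.
               The function names are hypothetical; only
               <classname>container_traits</classname> and the three
               guarantee tags are taken from the library, assumed here to be
               in namespace <literal>__gnu_pbds</literal>.
             </para>
             <programlisting>
               #include &lt;ext/pb_ds/assoc_container.hpp&gt;

               // Chosen when only the basic guarantee holds.
               inline bool
               point_iterators_survive(__gnu_pbds::basic_invalidation_guarantee)
               { return false; }

               // Chosen for point_invalidation_guarantee and, through the
               // derived-to-base conversion, for range_invalidation_guarantee.
               inline bool
               point_iterators_survive(__gnu_pbds::point_invalidation_guarantee)
               { return true; }

               // Compile-time query: do point-type iterators of Cntnr remain
               // valid when the container is modified?
               template&lt;typename Cntnr&gt;
               inline bool
               point_iterators_survive_modification()
               {
                 typedef
                   typename __gnu_pbds::container_traits&lt;Cntnr&gt;::invalidation_guarantee
                   guarantee;
                 return point_iterators_survive(guarantee());
               }
             </programlisting>
             <para>
               Because the closest base class wins during overload
               resolution, a container with the range guarantee selects the
               point overload without needing an overload of its own.
             </para>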
2717
2718         <para>
2719           These types of problems were addressed, in a more general
2720           setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
2721           our opinion, an invalidation-guarantee hierarchy would solve
2722           these problems in all container types - not just associative
2723           containers.
2724         </para>
2725
2726       </section>
2727     </section> <!-- iterator semantics -->
2728
2729     <section xml:id="pbds.design.concepts.genericity">
2730       <info><title>Genericity</title></info>
2731
2732       <para>
2733         The design attempts to address the following problem of
2734         data-structure genericity. When writing a function manipulating
2735         a generic container object, what is the behavior of the object?
2736         Suppose one writes
2737       </para>
2738       <programlisting>
2739         template&lt;typename Cntnr&gt;
2740         void
2741         some_op_sequence(Cntnr &amp;r_container)
2742         {
2743         ...
2744         }
2745       </programlisting>
2746
2747       <para>
2748         then one needs to address the following questions in the body
2749         of <function>some_op_sequence</function>:
2750       </para>
2751
2752       <itemizedlist>
2753         <listitem>
2754           <para>
2755             Which types and methods does <literal>Cntnr</literal> support?
2756             Containers based on hash tables can be queried for the
2757             hash-functor type and object; this is meaningless for tree-based
2758             containers. Containers based on trees can be split, joined, or
2759             can erase iterators and return the following iterator; this
2760             cannot be done by hash-based containers.
2761           </para>
2762         </listitem>
2763
2764         <listitem>
2765           <para>
2766             What are the exception and invalidation guarantees
2767             of <literal>Cntnr</literal>? A container based on a probing
2768             hash-table invalidates all iterators when it is modified; this
2769             is not the case for containers based on node-based
2770             trees. Containers based on a node-based tree can be split or
2771             joined without exceptions; this is not the case for containers
2772             based on vector-based trees.
2773           </para>
2774         </listitem>
2775
2776         <listitem>
2777           <para>
2778             How does the container maintain its elements? Tree-based and
2779             trie-based containers store elements by key order; others,
2780             typically, do not. Containers based on splay trees or lists
2781             with update policies "cache" "frequently accessed" elements;
2782             containers based on most other underlying data structures do
2783             not.
2784           </para>
2785         </listitem>
2786         <listitem>
2787           <para>
2788             How does one query a container about characteristics and
2789             capabilities? What is the relationship between two different
2790             data structures, if any?
2791           </para>
2792         </listitem>
2793       </itemizedlist>
2794
2795       <para>The remainder of this section explains these issues in
2796       detail.</para>
2797
2798
2799       <section xml:id="concepts.genericity.tag">
2800         <info><title>Tag</title></info>
2801         <para>
2802           Tags are very useful for manipulating generic types. For example, if
2803           <literal>It</literal> is an iterator class, then <literal>typename
2804           It::iterator_category</literal> or <literal>typename
2805           std::iterator_traits&lt;It&gt;::iterator_category</literal> will
2806           yield its category, and <literal>typename
2807           std::iterator_traits&lt;It&gt;::value_type</literal> will yield its
2808           value type.
2809         </para>
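
        <para>
          As an illustration of why such tags are useful, the following is
          a minimal sketch of tag dispatching on an iterator category,
          using only the standard library. The names
          <literal>steps</literal> and <literal>steps_impl</literal> are
          hypothetical and are not part of this library.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: counts the steps from first to last.
// Overload chosen for random-access iterators: constant time.
template<typename It>
std::ptrdiff_t
steps_impl(It first, It last, std::random_access_iterator_tag)
{ return last - first; }

// Fallback for any other category derived from input_iterator_tag:
// linear time.
template<typename It>
std::ptrdiff_t
steps_impl(It first, It last, std::input_iterator_tag)
{
  std::ptrdiff_t n = 0;
  for (; first != last; ++first)
    ++n;
  return n;
}

// Dispatches on the tag obtained through std::iterator_traits.
template<typename It>
std::ptrdiff_t
steps(It first, It last)
{
  typedef typename std::iterator_traits<It>::iterator_category category;
  return steps_impl(first, last, category());
}
```
]]></programlisting>
        <para>
          Calling <literal>steps</literal> on the iterators of a
          <classname>std::vector</classname> selects the constant-time
          overload, while the iterators of a
          <classname>std::list</classname> select the linear one.
        </para>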
2810
2811         <para>
2812           This library contains a container tag hierarchy corresponding to the
2813           diagram below.
2814         </para>
2815
2816         <figure>
2817           <title>Container Tag Hierarchy</title>
2818           <mediaobject>
2819             <imageobject>
2820               <imagedata align="center" format="PDF" scale="75"
2821                          fileref="../images/pbds_container_tag_hierarchy.pdf"/>
2822             </imageobject>
2823             <imageobject>
2824               <imagedata align="center" format="PNG" scale="100"
2825                          fileref="../images/pbds_container_tag_hierarchy.png"/>
2826             </imageobject>
2827             <textobject>
2828               <phrase>Container Tag Hierarchy</phrase>
2829             </textobject>
2830           </mediaobject>
2831         </figure>
2832
2833         <para>
2834           Given any container <type>Cntnr</type>, the tag of
2835           the underlying data structure can be found via <literal>typename
2836           Cntnr::container_category</literal>.
2837         </para>
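
        <para>
          The following sketch shows one way the container tag might be
          inspected at compile time. It assumes GCC's libstdc++ (for the
          <filename>ext/pb_ds/assoc_container.hpp</filename> header) and
          C++11 (for <classname>std::is_base_of</classname>). The helper
          <literal>is_hash_based</literal> and the typedef names are
          hypothetical.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <type_traits>
#include <ext/pb_ds/assoc_container.hpp>

// Hypothetical helper: true if the tag of Cntnr's underlying data
// structure derives from basic_hash_tag.
template<typename Cntnr>
bool
is_hash_based()
{
  typedef typename Cntnr::container_category category;
  return std::is_base_of<__gnu_pbds::basic_hash_tag, category>::value;
}

// Illustrative container types, all with default policies.
typedef __gnu_pbds::cc_hash_table<int, char> chained_map;
typedef __gnu_pbds::gp_hash_table<int, char> probing_map;
typedef __gnu_pbds::tree<int, char>          tree_map;
```
]]></programlisting>
        <para>
          With these definitions, <literal>is_hash_based</literal> is
          expected to hold for both hash tables and not for the tree.
        </para>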
2838
2839       </section> <!-- tag -->
2840
2841       <section xml:id="concepts.genericity.traits">
2842         <info><title>Traits</title></info>
2843         <para></para>
2844
        <para>Additionally, a traits mechanism can be used to query a
        container type for its attributes. Given any container
        <literal>Cntnr</literal>,
        <literal>container_traits&lt;Cntnr&gt;</literal>
        is a traits class identifying the properties of the
        container.</para>
2850
2851         <para>To find if a container can throw when a key is erased (which
2852         is true for vector-based trees, for example), one can
2853         use
2854         </para>
2855         <programlisting>container_traits&lt;Cntnr&gt;::erase_can_throw</programlisting>
2856
2857         <para>
2858           Some of the definitions in <classname>container_traits</classname>
2859           are dependent on other
2860           definitions. If <classname>container_traits&lt;Cntnr&gt;::order_preserving</classname>
2861           is <constant>true</constant> (which is the case for containers
2862           based on trees and tries), then the container can be split or
2863           joined; in this
2864           case, <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2865           indicates whether splits or joins can throw exceptions (which is
2866           true for vector-based trees);
2867           otherwise <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
          will yield a compilation error. (This is somewhat similar to a
          compile-time version of the COM model.)
2870         </para>
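
        <para>
          A minimal sketch of such queries follows, assuming GCC's
          libstdc++. The typedef names and the helper
          <literal>erase_may_throw</literal> are hypothetical; the
          red-black tree is the default underlying structure of
          <classname>tree</classname>, and
          <classname>ov_tree_tag</classname> selects a vector-based tree.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>

// Illustrative container types.
typedef __gnu_pbds::tree<int, char> rb_map;            // node-based tree
typedef __gnu_pbds::tree<int, char, std::less<int>,
                         __gnu_pbds::ov_tree_tag> ov_map; // vector-based tree
typedef __gnu_pbds::cc_hash_table<int, char> hash_map; // chaining hash table

// Hypothetical helper: reports whether erasing a key can throw.
template<typename Cntnr>
bool
erase_may_throw()
{ return __gnu_pbds::container_traits<Cntnr>::erase_can_throw; }

// Hypothetical helper: reports whether Cntnr keeps its keys ordered.
template<typename Cntnr>
bool
keeps_order()
{ return __gnu_pbds::container_traits<Cntnr>::order_preserving; }
```
]]></programlisting>
        <para>
          Since the traits are compile-time constants, they can also be
          used to select between alternative implementations, in the same
          way as the tags.
        </para>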
2871
2872       </section> <!-- traits -->
2873
2874     </section> <!-- genericity -->
2875   </section> <!-- concepts -->
2876
2877   <section xml:id="pbds.design.container">
2878     <info><title>By Container</title></info>
2879
2880     <!-- hash -->
2881     <section xml:id="pbds.design.container.hash">
2882       <info><title>hash</title></info>
2883
2884       <!--
2885
2886 // hash policies
2887 /// general terms / background
2888 /// range hashing policies
2889 /// ranged-hash policies
2890 /// implementation
2891
2892 // resize policies
2893 /// general
2894 /// size policies
2895 /// trigger policies
2896 /// implementation
2897
2898 // policy interactions
2899 /// probe/size/trigger
2900 /// hash/trigger
2901 /// eq/hash/storing hash values
2902 /// size/load-check trigger
2903       -->
2904       <section xml:id="container.hash.interface">
2905         <info><title>Interface</title></info>
2906
2907
2908
2909         <para>
2910           The collision-chaining hash-based container has the
2911         following declaration.</para>
2912         <programlisting>
          template&lt;
            typename Key,
            typename Mapped,
            typename Hash_Fn = std::hash&lt;Key&gt;,
            typename Eq_Fn = std::equal_to&lt;Key&gt;,
            typename Comb_Hash_Fn = direct_mask_range_hashing&lt;&gt;,
            typename Resize_Policy = /* default explained below */,
            bool Store_Hash = false,
            typename Allocator = std::allocator&lt;char&gt; &gt;
          class cc_hash_table;
2923         </programlisting>
2924
2925         <para>The parameters have the following meaning:</para>
2926
2927         <orderedlist>
2928           <listitem><para><classname>Key</classname> is the key type.</para></listitem>
2929
2930           <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
2931
2932           <listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>
2933
2934           <listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>
2935
          <listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing functor;
          it describes how to translate hash values into positions
          within the table.</para></listitem>
2939
2940           <listitem><para><classname>Resize_Policy</classname> describes how a container object
2941           should change its internal size. </para></listitem>
2942
2943           <listitem><para><classname>Store_Hash</classname> indicates whether the hash value
2944           should be stored with each entry. </para></listitem>
2945
2946           <listitem><para><classname>Allocator</classname> is an allocator
2947           type.</para></listitem>
2948         </orderedlist>
2949
2950         <para>The probing hash-based container has the following
2951         declaration.</para>
2952         <programlisting>
          template&lt;
            typename Key,
            typename Mapped,
            typename Hash_Fn = std::hash&lt;Key&gt;,
            typename Eq_Fn = std::equal_to&lt;Key&gt;,
            typename Comb_Probe_Fn = direct_mask_range_hashing&lt;&gt;,
            typename Probe_Fn = /* default explained below */,
            typename Resize_Policy = /* default explained below */,
            bool Store_Hash = false,
            typename Allocator = std::allocator&lt;char&gt; &gt;
          class gp_hash_table;
2964         </programlisting>
2965
2966         <para>The parameters are identical to those of the
2967         collision-chaining container, except for the following.</para>
2968
2969         <orderedlist>
2970           <listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
2971           sequence into a sequence of positions within the table.</para></listitem>
2972
2973           <listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
2974         </orderedlist>
2975
2976         <para>Some of the default template values depend on the values of
2977         other parameters, and are explained below.</para>
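
        <para>
          As a usage sketch, assuming GCC's libstdc++ and leaving every
          policy at its default, both containers can be used much like a
          standard map. The function <literal>exercise</literal> and the
          typedef names are hypothetical.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>

// Illustrative table types; only Key and Mapped are specified.
typedef __gnu_pbds::cc_hash_table<int, std::string> chained_table;
typedef __gnu_pbds::gp_hash_table<int, std::string> probing_table;

// Hypothetical helper: inserts two entries, erases one, and checks
// that the table then holds exactly the remaining entry.
template<typename Table>
bool
exercise()
{
  Table t;
  t[1] = "one";
  t[2] = "two";
  t.erase(1);
  return t.size() == 1
    && t.find(1) == t.end()
    && t.find(2) != t.end()
    && t.find(2)->second == "two";
}
```
]]></programlisting>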
2978
2979       </section>
2980       <section xml:id="container.hash.details">
2981         <info><title>Details</title></info>
2982
2983         <section xml:id="container.hash.details.hash_policies">
2984           <info><title>Hash Policies</title></info>
2985
2986           <section xml:id="details.hash_policies.general">
2987             <info><title>General</title></info>
2988
2989             <para>Following is an explanation of some functions which hashing
2990             involves. The graphic below illustrates the discussion.</para>
2991
2992             <figure>
2993               <title>Hash functions, ranged-hash functions, and
2994               range-hashing functions</title>
2995               <mediaobject>
2996                 <imageobject>
2997                   <imagedata align="center" format="PNG" scale="100"
2998                              fileref="../images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
2999                 </imageobject>
3000                 <textobject>
3001                   <phrase>Hash functions, ranged-hash functions, and
3002                   range-hashing functions</phrase>
3003                 </textobject>
3004               </mediaobject>
3005             </figure>
3006
            <para>Let U be a domain (e.g., the integers, or the
            strings of 3 characters). A hash-table algorithm needs to map
            elements of U "uniformly" into the range [0,..., m -
            1] (where m is a positive integral value that, in
            general, varies over time). That is, the algorithm needs
            a ranged-hash function</para>
3013
3014             <para>
3015               f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
3016             </para>
3017
3018             <para>such that for any u in U ,</para>
3019
3020             <para>0 ≤ f(u, m) ≤ m - 1</para>
3021
            <para>and which has "good uniformity" properties (see
            <xref linkend="biblio.knuth98sorting"/>).
            One common solution is to use the composition of the hash
            function</para>
3027
3028             <para>h : U → Z<subscript>+</subscript> ,</para>
3029
3030             <para>which maps elements of U into the non-negative
3031             integrals, and</para>
3032
3033             <para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
3034             Z<subscript>+</subscript>,</para>
3035
3036             <para>which maps a non-negative hash value, and a non-negative
3037             range upper-bound into a non-negative integral in the range
3038             between 0 (inclusive) and the range upper bound (exclusive),
3039             i.e., for any r in Z<subscript>+</subscript>,</para>
3040
3041             <para>0 ≤ g(r, m) ≤ m - 1</para>
3042
3043
            <para>The resulting ranged-hash function is</para>
3045
3046             <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3047             <equation>
3048               <title>Ranged Hash Function</title>
3049               <mathphrase>
3050                 f(u , m) = g(h(u), m)
3051               </mathphrase>
3052             </equation>
3053
            <para>From the above, it follows that, given g and
            h, f can always be composed (the converse, however,
            is not true). The standard's hash-based containers allow specifying
            a hash function, and use a hard-wired range-hashing function;
            the ranged-hash function is implicitly composed.</para>
3059
3060             <para>The above describes the case where a key is to be mapped
3061             into a single position within a hash table, e.g.,
3062             in a collision-chaining table. In other cases, a key is to be
3063             mapped into a sequence of positions within a table,
3064             e.g., in a probing table. Similar terms apply in this
3065             case: the table requires a ranged probe function,
            mapping a key into a sequence of positions within the table.
3067             This is typically achieved by composing a hash function
3068             mapping the key into a non-negative integral type, a
3069             probe function transforming the hash value into a
3070             sequence of hash values, and a range-hashing function
3071             transforming the sequence of hash values into a sequence of
3072             positions.</para>
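
            <para>The composition f(u, m) = g(h(u), m) described in this
            section can be sketched as follows. This is an illustration
            only, using the standard library and C++11; the names
            <literal>mod_range_hashing</literal> and
            <literal>composed_ranged_hash</literal> are hypothetical and
            do not denote classes of this library.</para>
            <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical range-hashing functor g(r, m), using the division
// method. Assumes m > 0.
struct mod_range_hashing
{
  std::size_t
  operator()(std::size_t r, std::size_t m) const
  { return r % m; }
};

// Hypothetical ranged-hash functor f(u, m) = g(h(u), m), composed
// from a hash functor h and a range-hashing functor g.
template<typename Key,
         typename Hash_Fn = std::hash<Key>,
         typename Range_Fn = mod_range_hashing>
struct composed_ranged_hash
{
  Hash_Fn h;
  Range_Fn g;

  std::size_t
  operator()(const Key& u, std::size_t m) const
  { return g(h(u), m); }
};
```
]]></programlisting>
            <para>Whatever the hash functor returns, the composed
            function yields a position in the range [0,..., m - 1].</para>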
3073
3074           </section>
3075
3076           <section xml:id="details.hash_policies.range">
3077             <info><title>Range Hashing</title></info>
3078
3079             <para>Some common choices for range-hashing functions are the
3080             division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
3081             as</para>
3082
3083             <equation>
3084               <title>Range-Hashing, Division Method</title>
3085               <mathphrase>
3086                 g(r, m) = r mod m
3087               </mathphrase>
3088             </equation>
3089
3090
3091
3092             <para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</para>
3093
3094             <para>and</para>
3095
3096             <para>g(r, m) = ⌈ u/v ( r<superscript>2</superscript> mod v ) ⌉</para>
3097
            <para>respectively, for some positive integrals u and
            v (typically powers of 2), and some a. Each of
            these range-hashing functions works best in a different
            setting.</para>

            <para>The division method (see above) is a
            very common choice. However, even this single method can be
            implemented in two very different ways: using the low-level
            % (modulo) operation (for any m), or using the low-level
            &amp; (bit-mask) operation (for the case where
            m is a power of 2), i.e.,</para>
3110
3111             <equation>
3112               <title>Division via Prime Modulo</title>
3113               <mathphrase>
3114                 g(r, m) = r % m
3115               </mathphrase>
3116             </equation>
3117
3118             <para>and</para>
3119
3120             <equation>
3121               <title>Division via Bit Mask</title>
3122               <mathphrase>
                g(r, m) = r &amp; (m - 1), (with m =
                2<superscript>k</superscript> for some k)
3125               </mathphrase>
3126             </equation>
3127
3128
3129             <para>respectively.</para>
3130
3131             <para>The % (modulo) implementation has the advantage that for
3132             m a prime far from a power of 2, g(r, m) is
3133             affected by all the bits of r (minimizing the chance of
3134             collision). It has the disadvantage of using the costly modulo
            operation. This method is hard-wired into SGI's
            implementation.</para>

            <para>The &amp; (bit-mask) implementation has the advantage of
            relying on the fast bit-wise and operation. It has the
            disadvantage that g(r, m) is affected only by the
            low-order bits of r. This method is hard-wired into
            Dinkumware's implementation.</para>
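
            <para>The difference between the two implementations can be
            seen in a small sketch. The function names
            <literal>range_by_mod</literal> and
            <literal>range_by_mask</literal> are hypothetical; they only
            restate the two equations above.</para>
            <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// g(r, m) = r % m; valid for any m > 0.
inline std::size_t
range_by_mod(std::size_t r, std::size_t m)
{ return r % m; }

// g(r, m) = r & (m - 1); only meaningful when m is a power of 2.
inline std::size_t
range_by_mask(std::size_t r, std::size_t m)
{ return r & (m - 1); }
```
]]></programlisting>
            <para>For example, the hash values 0x10 and 0x20 differ only
            above the four low-order bits, so with m = 16 the bit-mask
            version maps both to position 0, whereas with the prime
            m = 13 the modulo version maps them to positions 3 and 6.</para>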
3143
3144
3145           </section>
3146
3147           <section xml:id="details.hash_policies.ranged">
3148             <info><title>Ranged Hash</title></info>
3149
            <para>In some cases it is beneficial to allow the
            client to directly specify a ranged-hash function. It is
            true that the writer of the ranged-hash function cannot rely
            on the values of m having specific numerical properties
            suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
            the values of m are determined by a resize policy with
            possibly orthogonal considerations.</para>
3157
            <para>There are two cases where a ranged-hash function can be
            superior. The first is when using perfect hashing; the
            second is when the values of m can be used to estimate
            the "general" number of distinct values required. The second
            case is described in the following.</para>
3163
3164             <para>Let</para>
3165
3166             <para>
3167               s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
3168             </para>
3169
3170             <para>be a string of t characters, each of which is from
3171             domain S. Consider the following ranged-hash
3172             function:</para>
3173             <equation>
3174               <title>
3175                 A Standard String Hash Function
3176               </title>
3177               <mathphrase>
3178                 f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
3179                 0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3180               </mathphrase>
3181             </equation>
3182
3183
3184             <para>where a is some non-negative integral value. This is
3185             the standard string-hashing function used in SGI's
3186             implementation (with a = 5). Its advantage is that
3187             it takes into account all of the characters of the string.</para>
3188
            <para>Now assume that s is the string representation
            of a long DNA sequence (and so S = {'A', 'C', 'G',
3191             'T'}). In this case, scanning the entire string might be
3192             prohibitively expensive. A possible alternative might be to use
3193             only the first k characters of the string, where</para>
3194
3195             <para>|S|<superscript>k</superscript> ≥ m ,</para>
3196
3197             <para>i.e., using the hash function</para>
3198
3199             <equation>
3200               <title>
3201                 Only k String DNA Hash
3202               </title>
3203               <mathphrase>
3204                 f<subscript>2</subscript>(s, m) = ∑ <subscript>i
3205                 = 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3206               </mathphrase>
3207             </equation>
3208
3209             <para>requiring scanning over only</para>
3210
3211             <para>k = log<subscript>4</subscript>( m )</para>
3212
3213             <para>characters.</para>
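
            <para>The two functions f<subscript>1</subscript> and
            f<subscript>2</subscript> can be sketched as follows. This is
            an illustration under stated assumptions, not code from this
            library: all names are hypothetical, a = 5 is taken from the
            text above, k is chosen as the smallest value with
            4<superscript>k</superscript> ≥ m, and m is assumed to be
            at most 2<superscript>32</superscript> so that the 64-bit
            intermediate products cannot overflow.</para>
            <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Smallest k with 4^k >= m, for an alphabet of size |S| = 4.
// Assumes 1 <= m <= 2^32.
inline std::size_t
prefix_length(std::uint64_t m)
{
  std::size_t k = 0;
  for (std::uint64_t reach = 1; reach < m; reach *= 4)
    ++k;
  return k;
}

// Sum of s[i] * a^i over the first n characters, reduced mod m.
// n is clipped to the length of s. Assumes 1 <= m <= 2^32.
inline std::uint64_t
poly_hash(const std::string& s, std::size_t n,
          std::uint64_t a, std::uint64_t m)
{
  if (n > s.size())
    n = s.size();
  std::uint64_t h = 0;
  std::uint64_t power = 1 % m;
  for (std::size_t i = 0; i < n; ++i)
    {
      const std::uint64_t c = static_cast<unsigned char>(s[i]);
      h = (h + (c % m) * power) % m;
      power = (power * (a % m)) % m;
    }
  return h;
}

// f1: scans the whole string.
inline std::uint64_t
f1(const std::string& s, std::uint64_t m)
{ return poly_hash(s, s.size(), 5, m); }

// f2: scans only the first k characters, with k depending on m.
inline std::uint64_t
f2(const std::string& s, std::uint64_t m)
{ return poly_hash(s, prefix_length(m), 5, m); }
```
]]></programlisting>
            <para>Because f<subscript>2</subscript> derives k from m, two
            sequences sharing their first k characters hash to the same
            position regardless of what follows; this is the price paid
            for not scanning the entire string.</para>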
3214
            <para>Other, more elaborate hash functions might scan k
            characters starting at a random position (determined at each
            resize), or scan k random positions (determined at
            each resize), i.e., using</para>

            <para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
            r<subscript>0</subscript></subscript><superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
            a<superscript>i</superscript> mod m ,</para>

            <para>or</para>

            <para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
            1</superscript> s<subscript>r<subscript>i</subscript></subscript> a<superscript>r<subscript>i</subscript></superscript> mod
            m ,</para>

            <para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
            each in the (inclusive) range [0,...,t-1].</para>
3232
            <para>It should be noted that the above functions cannot be
            decomposed into a hash function and a range-hashing function,
            as in the Ranged Hash Function equation above.</para>
3235
3236
3237           </section>
3238
3239           <section xml:id="details.hash_policies.implementation">
3240             <info><title>Implementation</title></info>
3241
3242             <para>This sub-subsection describes the implementation of
3243             the above in this library. It first explains range-hashing
3244             functions in collision-chaining tables, then ranged-hash
3245             functions in collision-chaining tables, then probing-based
3246             tables, and finally lists the relevant classes in this
3247             library.</para>
3248
3249             <section xml:id="hash_policies.implementation.collision-chaining">
3250               <info><title>
3251                 Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3252               </title></info>
3253
3254
3255               <para><classname>cc_hash_table</classname> is
3256               parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
3257               hash functor and a combining hash functor, respectively.</para>
3258
3259               <para>In general, <classname>Comb_Hash_Fn</classname> is considered a
3260               range-hashing functor. <classname>cc_hash_table</classname>
3261               synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
3262               <classname>Comb_Hash_Fn</classname>. The figure below shows an <classname>insert</classname> sequence
3263               diagram for this case. The user inserts an element (point A),
3264               the container transforms the key into a non-negative integral
3265               using the hash functor (points B and C), and transforms the
3266               result into a position using the combining functor (points D
3267               and E).</para>
3268
3269               <figure>
3270                 <title>Insert hash sequence diagram</title>
3271                 <mediaobject>
3272                   <imageobject>
3273                     <imagedata align="center" format="PNG" scale="100"
3274                                fileref="../images/pbds_hash_range_hashing_seq_diagram.png"/>
3275                   </imageobject>
3276                   <textobject>
3277                     <phrase>Insert hash sequence diagram</phrase>
3278                   </textobject>
3279                 </mediaobject>
3280               </figure>
3281
              <para>If <classname>cc_hash_table</classname>'s
              hash-functor, <classname>Hash_Fn</classname>, is instantiated by <classname>null_type</classname>, then <classname>Comb_Hash_Fn</classname> is taken to be
3284               a ranged-hash function. The graphic below shows an <function>insert</function> sequence
3285               diagram. The user inserts an element (point A), the container
3286               transforms the key into a position using the combining functor
3287               (points B and C).</para>
3288
3289               <figure>
3290                 <title>Insert hash sequence diagram with a null policy</title>
3291                 <mediaobject>
3292                   <imageobject>
3293                     <imagedata align="center" format="PNG" scale="100"
3294                                fileref="../images/pbds_hash_range_hashing_seq_diagram2.png"/>
3295                   </imageobject>
3296                   <textobject>
3297                     <phrase>Insert hash sequence diagram with a null policy</phrase>
3298                   </textobject>
3299                 </mediaobject>
3300               </figure>
3301
3302             </section>
3303
3304             <section xml:id="hash_policies.implementation.probe">
3305               <info><title>
3306                 Probing tables
3307               </title></info>
3308               <para><classname>gp_hash_table</classname> is parametrized by
3309               <classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
3310               and <classname>Comb_Probe_Fn</classname>. As before, if
3311               <classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
3312               are both <classname>null_type</classname>, then
3313               <classname>Comb_Probe_Fn</classname> is a ranged-probe
3314               functor. Otherwise, <classname>Hash_Fn</classname> is a hash
3315               functor, <classname>Probe_Fn</classname> is a functor for offsets
3316               from a hash value, and <classname>Comb_Probe_Fn</classname>
3317               transforms a probe sequence into a sequence of positions within
3318               the table.</para>
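
              <para>The composition of a hash value, a probe function, and
              a combining function can be sketched as follows. The sketch
              assumes C++11, a table whose size m is a power of 2, and a
              bit-mask combining step; all names are hypothetical, and the
              offsets (i for linear probing, i<superscript>2</superscript>
              for quadratic probing) are the textbook choices rather than
              a statement about this library's internals.</para>
              <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical probe functors: offset of the i-th probe from the
// hash value.
struct linear_offsets
{
  std::size_t
  operator()(std::size_t i) const
  { return i; }
};

struct quadratic_offsets
{
  std::size_t
  operator()(std::size_t i) const
  { return i * i; }
};

// First n table positions probed for hash value h. The combining
// step is a bit-mask, so m is assumed to be a power of 2.
template<typename Probe_Fn>
std::vector<std::size_t>
probe_positions(std::size_t h, std::size_t m, std::size_t n,
                Probe_Fn probe)
{
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; i < n; ++i)
    positions.push_back((h + probe(i)) & (m - 1));
  return positions;
}
```
]]></programlisting>
              <para>For a hash value of 6 in a table of size 8, the first
              four positions are 6, 7, 0, 1 under linear probing and
              6, 7, 2, 7 under quadratic probing.</para>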
3319
3320             </section>
3321
3322             <section xml:id="hash_policies.implementation.predefined">
3323               <info><title>
3324                 Pre-Defined Policies
3325               </title></info>
3326
3327               <para>This library contains some pre-defined classes
3328               implementing range-hashing and probing functions:</para>
3329
3330               <orderedlist>
3331                 <listitem><para><classname>direct_mask_range_hashing</classname>
3332                 and <classname>direct_mod_range_hashing</classname>
3333                 are range-hashing functions based on a bit-mask and a modulo
3334                 operation, respectively.</para></listitem>
3335
3336                 <listitem><para><classname>linear_probe_fn</classname>, and
3337                 <classname>quadratic_probe_fn</classname> are
3338                 a linear probe and a quadratic probe function,
3339                 respectively.</para></listitem>
3340               </orderedlist>
3341
3342               <para>
3343                 The graphic below shows the relationships.
3344               </para>
3345               <figure>
3346                 <title>Hash policy class diagram</title>
3347                 <mediaobject>
3348                   <imageobject>
3349                     <imagedata align="center" format="PNG" scale="100"
3350                                fileref="../images/pbds_hash_policy_cd.png"/>
3351                   </imageobject>
3352                   <textobject>
3353                     <phrase>Hash policy class diagram</phrase>
3354                   </textobject>
3355                 </mediaobject>
3356               </figure>
3357
3358
3359             </section>
3360
3361           </section> <!-- impl -->
3362
3363         </section>
3364
3365         <section xml:id="container.hash.details.resize_policies">
3366           <info><title>Resize Policies</title></info>
3367
3368           <section xml:id="resize_policies.general">
3369             <info><title>General</title></info>
3370
3371             <para>Hash-tables, as opposed to trees, do not naturally grow or
3372             shrink. It is necessary to specify policies to determine how
3373             and when a hash table should change its size. Usually, resize
3374             policies can be decomposed into orthogonal policies:</para>
3375
3376             <orderedlist>
3377               <listitem><para>A size policy indicating how a hash table
3378               should grow (e.g., it should multiply by powers of
3379               2).</para></listitem>
3380
3381               <listitem><para>A trigger policy indicating when a hash
3382               table should grow (e.g., a load factor is
3383               exceeded).</para></listitem>
3384             </orderedlist>
3385
3386           </section>
3387
3388           <section xml:id="resize_policies.size">
3389             <info><title>Size Policies</title></info>
3390
3391
3392             <para>Size policies determine how a hash table changes size. These
3393             policies are simple, and there are relatively few sensible
3394             options. An exponential-size policy (with the initial size and
3395             growth factors both powers of 2) works well with a mask-based
3396             range-hashing function, and is the
3397             hard-wired policy used by Dinkumware. A
3398             prime-list based policy works well with a modulo-prime range
3399             hashing function and is the hard-wired policy used by SGI's
3400             implementation.</para>
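
            <para>The two kinds of size policy can be sketched as follows.
            The function names and the particular prime list are
            hypothetical and chosen only for illustration (C++11 is
            assumed); each listed prime is roughly twice its
            predecessor.</para>
            <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Exponential policy: starting from a power of 2 and doubling keeps
// every size a power of 2, as a mask-based range-hashing function
// requires.
inline std::size_t
next_size_exponential(std::size_t current)
{ return current * 2; }

// Prime-list policy: the next size is the first listed prime larger
// than the current size. The list is illustrative; if it is
// exhausted, the current size is returned unchanged.
inline std::size_t
next_size_prime(std::size_t current)
{
  static const std::size_t primes[] =
    { 5, 11, 23, 47, 97, 199, 409, 823, 1741 };
  for (std::size_t p : primes)
    if (p > current)
      return p;
  return current;
}
```
]]></programlisting>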
3401
3402           </section>
3403
3404           <section xml:id="resize_policies.trigger">
3405             <info><title>Trigger Policies</title></info>
3406
3407             <para>Trigger policies determine when a hash table changes size.
3408             Following is a description of two policies: load-check
3409             policies, and collision-check policies.</para>
3410
            <para>Load-check policies are straightforward. The user specifies
            two factors, α<subscript>min</subscript> and
            α<subscript>max</subscript>, and the hash table maintains the
            invariant that</para>

            <para>α<subscript>min</subscript> ≤ (number of
            stored elements) / (hash-table size) ≤
            α<subscript>max</subscript><remark>load factor min max</remark></para>
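
            <para>A load-check trigger enforcing this invariant might be
            sketched as follows. The names are hypothetical, C++11 is
            assumed, and the sketch only reports which way the table
            should be resized; it is not this library's
            implementation.</para>
            <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// What the trigger asks the table to do.
enum class resize_action { none, grow, shrink };

// Compares the load factor with the two bounds.
// Assumes table_size > 0 and 0 <= load_min < load_max.
inline resize_action
check_load(std::size_t num_elements, std::size_t table_size,
           double load_min, double load_max)
{
  const double load = static_cast<double>(num_elements)
    / static_cast<double>(table_size);
  if (load > load_max)
    return resize_action::grow;
  if (load < load_min)
    return resize_action::shrink;
  return resize_action::none;
}
```
]]></programlisting>
            <para>With bounds of 0.125 and 0.5 (illustrative values) and a
            table of size 8, holding 5 elements requests growth, holding 2
            requests nothing, and holding none requests shrinking.</para>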
3419
3420             <para>Collision-check policies work in the opposite direction of
3421             load-check policies. They focus on keeping the number of
3422             collisions moderate and hoping that the size of the table will
3423             not grow very large, instead of keeping a moderate load-factor
3424             and hoping that the number of collisions will be small. A
3425             maximal collision-check policy resizes when the longest
3426             probe-sequence grows too large.</para>
3427
            <para>Consider the graphic below. Let the size of the hash table
            be denoted by m, the length of a probe sequence be denoted by k,
            and some load factor be denoted by α. We would like to
            calculate the minimal k such that, if there were α
            m elements in the hash table, a probe sequence of length k would
            be found with probability at most 1/m.</para>
3434
3435             <figure>
3436               <title>Balls and bins</title>
3437               <mediaobject>
3438                 <imageobject>
3439                   <imagedata align="center" format="PNG" scale="100"
3440                              fileref="../images/pbds_balls_and_bins.png"/>
3441                 </imageobject>
3442                 <textobject>
3443                   <phrase>Balls and bins</phrase>
3444                 </textobject>
3445               </mediaobject>
3446             </figure>
3447
3448             <para>Denote the probability that a probe sequence of length
3449             k appears in bin i by p<subscript>i</subscript>, the
3450             length of the probe sequence of bin i by
3451             l<subscript>i</subscript>, and assume uniform distribution. Then</para>
3452
3453
3454
            <equation>
              <title>
                Probability of Probe Sequence of Length k
              </title>
              <mathphrase>
                p<subscript>1</subscript> =
                P(l<subscript>1</subscript> ≥ k) =
                P(l<subscript>1</subscript> ≥ α ( 1 + ( k / α - 1 ) ) ) ≤ (a)
                e ^ ( - α ( k / α - 1 )<superscript>2</superscript> / 2 )
              </mathphrase>
            </equation>
3473
            <para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>). To
            calculate the probability that some bin contains a probe
            sequence of length at least k, we note that the
            l<subscript>i</subscript> are negatively-dependent
            (<xref linkend="biblio.dubhashi98neg"/>). Let
            I(.) denote the indicator function. Then</para>
3481
            <equation>
              <title>
                Probability Probe Sequence in Some Bin
              </title>
              <mathphrase>
                P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
                P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
                I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =
                P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
                I(l<subscript>i</subscript> ≥ k) ≥ m p<subscript>1</subscript> ( 1 + ( 1 / (m
                p<subscript>1</subscript>) - 1 ) ) ) ≤ (a)
                e ^ ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
                - 1 )<superscript>2</superscript> / 2 ) ,
              </mathphrase>
            </equation>
3500
3501             <para>where (a) follows from the fact that the Chernoff bound can
3502             be applied to negatively-dependent variables (<xref
3503             linkend="biblio.dubhashi98neg"/>). Inserting the first probability
3504             equation into the second one, and equating with 1/m, we
3505             obtain</para>
3506
3507
            <para>k ~ √ ( 2 α ln ( 2 m ln(m) ) ) .</para>
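
            <para>As a rough numerical illustration, the following sketch
            evaluates the estimate k ~ √ ( 2 α ln ( 2 m ln(m) ) ) for a
            given load factor and number of bins. The function name
            <function>estimated_max_probe_length</function> is hypothetical
            and is not part of this library.</para>
            <programlisting>
#include &lt;cmath&gt;
#include &lt;cstddef&gt;

// Hypothetical helper: evaluates sqrt(2 * alpha * ln(2 * m * ln(m))),
// where alpha is the load factor and m is the number of bins (m &gt; 1).
inline double
estimated_max_probe_length(double alpha, std::size_t m)
{
  const double md = static_cast&lt;double&gt;(m);
  return std::sqrt(2.0 * alpha * std::log(2.0 * md * std::log(md)));
}
            </programlisting>
            <para>For α = 1 and m = 1024 the estimate lies between 4 and 5,
            and it grows only slowly as m increases.</para>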
3510
3511           </section>
3512
3513           <section xml:id="resize_policies.impl">
3514             <info><title>Implementation</title></info>
3515
            <para>This sub-subsection describes the implementation of the
            above in this library. It first describes resize policies and
            their decomposition into trigger and size policies, then
            describes pre-defined classes, and finally discusses controlled
            access to the policies' internals.</para>
3521
3522             <section xml:id="resize_policies.impl.decomposition">
3523               <info><title>Decomposition</title></info>
3524
3525
              <para>Each hash-based container is parametrized by a
              <classname>Resize_Policy</classname> parameter; the container derives
              <literal>public</literal>ly from <classname>Resize_Policy</classname>. For
              example:</para>
3530               <programlisting>
3531                 cc_hash_table&lt;typename Key,
3532                 typename Mapped,
3533                 ...
3534                 typename Resize_Policy
3535                 ...&gt; : public Resize_Policy
3536               </programlisting>
3537
3538               <para>As a container object is modified, it continuously notifies
3539               its <classname>Resize_Policy</classname> base of internal changes
3540               (e.g., collisions encountered and elements being
3541               inserted). It queries its <classname>Resize_Policy</classname> base whether
3542               it needs to be resized, and if so, to what size.</para>
3543
              <para>The graphic below shows a (possible) sequence diagram
              of an insert operation. The user inserts an element; the hash
              table notifies its resize policy that a search has started
              (point A); in this case, a single collision is encountered, and
              the table notifies its resize policy of this (point B); the
              container finally notifies its resize policy that the search
              has ended (point C); it then queries its resize policy whether
              a resize is needed, and if so, what the new size is (points D
              to G); following the resize, it notifies the policy that a
              resize has completed (point H); finally, the element is
              inserted, and the policy is notified (point I).</para>
3555
3556               <figure>
3557                 <title>Insert resize sequence diagram</title>
3558                 <mediaobject>
3559                   <imageobject>
3560                     <imagedata align="center" format="PNG" scale="100"
3561                                fileref="../images/pbds_insert_resize_sequence_diagram1.png"/>
3562                   </imageobject>
3563                   <textobject>
3564                     <phrase>Insert resize sequence diagram</phrase>
3565                   </textobject>
3566                 </mediaobject>
3567               </figure>
3568
3569
              <para>In practice, a resize policy can usually be decomposed
              orthogonally into a size policy and a trigger policy. Consequently,
              the library contains a single class for instantiating a resize
              policy: <classname>hash_standard_resize_policy</classname>
              is parametrized by <classname>Size_Policy</classname> and
              <classname>Trigger_Policy</classname>, derives <literal>public</literal>ly from
              both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
              to these policies.</para>
3578
3579               <para>The two graphics immediately below show sequence diagrams
3580               illustrating the interaction between the standard resize policy
3581               and its trigger and size policies, respectively.</para>
3582
3583               <figure>
3584                 <title>Standard resize policy trigger sequence
3585                 diagram</title>
3586                 <mediaobject>
3587                   <imageobject>
3588                     <imagedata align="center" format="PNG" scale="100"
3589                                fileref="../images/pbds_insert_resize_sequence_diagram2.png"/>
3590                   </imageobject>
3591                   <textobject>
3592                     <phrase>Standard resize policy trigger sequence
3593                     diagram</phrase>
3594                   </textobject>
3595                 </mediaobject>
3596               </figure>
3597
3598               <figure>
3599                 <title>Standard resize policy size sequence
3600                 diagram</title>
3601                 <mediaobject>
3602                   <imageobject>
3603                     <imagedata align="center" format="PNG" scale="100"
3604                                fileref="../images/pbds_insert_resize_sequence_diagram3.png"/>
3605                   </imageobject>
3606                   <textobject>
3607                     <phrase>Standard resize policy size sequence
3608                     diagram</phrase>
3609                   </textobject>
3610                 </mediaobject>
3611               </figure>
3612
3613
3614             </section>
3615
3616             <section xml:id="resize_policies.impl.predefined">
3617               <info><title>Predefined Policies</title></info>
3618               <para>The library includes the following
3619               instantiations of size and trigger policies:</para>
3620
3621               <orderedlist>
3622                 <listitem><para><classname>hash_load_check_resize_trigger</classname>
3623                 implements a load check trigger policy.</para></listitem>
3624
3625                 <listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
3626                 implements a collision check trigger policy.</para></listitem>
3627
3628                 <listitem><para><classname>hash_exponential_size_policy</classname>
3629                 implements an exponential-size policy (which should be used
                <listitem><para><classname>hash_prime_size_policy</classname>
                implements a size policy based on a sequence of primes
                (which should be used with mod range hashing).</para></listitem>
3636               </orderedlist>
3637
3638               <para>The graphic below gives an overall picture of the resize-related
3639               classes. <classname>basic_hash_table</classname>
3640               is parametrized by <classname>Resize_Policy</classname>, which it subclasses
3641               publicly. This class is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
3642               <classname>hash_standard_resize_policy</classname>
3643               itself is parametrized by <classname>Trigger_Policy</classname> and
3644               <classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
3645               instantiated by <classname>hash_load_check_resize_trigger</classname>,
3646               or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
3647               <classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>,
3648               or <classname>hash_prime_size_policy</classname>.</para>
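
              <para>The pairings recommended above can be sketched as
              follows. This is an illustration only: the functor
              <classname>int_hash</classname> and the typedef names are
              hypothetical, and the template argument order assumes the
              declarations of <classname>cc_hash_table</classname> and
              <classname>hash_standard_resize_policy</classname> shipped
              with recent releases.</para>
              <programlisting>
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/hash_policy.hpp&gt;
#include &lt;cstddef&gt;
#include &lt;functional&gt;

// Hypothetical identity hash functor, used only for illustration.
struct int_hash
{
  std::size_t
  operator()(int i) const
  { return static_cast&lt;std::size_t&gt;(i); }
};

// Exponential sizes, paired with mask range hashing.
typedef __gnu_pbds::hash_standard_resize_policy&lt;
  __gnu_pbds::hash_exponential_size_policy&lt;&gt;,
  __gnu_pbds::hash_load_check_resize_trigger&lt;&gt; &gt; exp_resize_t;

typedef __gnu_pbds::cc_hash_table&lt;
  int, char, int_hash, std::equal_to&lt;int&gt;,
  __gnu_pbds::direct_mask_range_hashing&lt;&gt;,
  exp_resize_t&gt; mask_map_t;

// Prime sizes, paired with mod range hashing.
typedef __gnu_pbds::hash_standard_resize_policy&lt;
  __gnu_pbds::hash_prime_size_policy,
  __gnu_pbds::hash_load_check_resize_trigger&lt;&gt; &gt; prime_resize_t;

typedef __gnu_pbds::cc_hash_table&lt;
  int, char, int_hash, std::equal_to&lt;int&gt;,
  __gnu_pbds::direct_mod_range_hashing&lt;&gt;,
  prime_resize_t&gt; mod_map_t;
              </programlisting>
              <para>Both container types are used in the same way; only the
              sequence of table sizes and the mapping from hash values to
              table positions differ.</para>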
3649
3650             </section>
3651
3652             <section xml:id="resize_policies.impl.internals">
              <info><title>Controlling Access to Internals</title></info>
3654
3655               <para>There are cases where (controlled) access to resize
3656               policies' internals is beneficial. E.g., it is sometimes
3657               useful to query a hash-table for the table's actual size (as
3658               opposed to its <function>size()</function> - the number of values it
3659               currently holds); it is sometimes useful to set a table's
3660               initial size, externally resize it, or change load factors.</para>
3661
3662               <para>Clearly, supporting such methods both decreases the
3663               encapsulation of hash-based containers, and increases the
3664               diversity between different associative-containers' interfaces.
3665               Conversely, omitting such methods can decrease containers'
3666               flexibility.</para>
3667
3668               <para>In order to avoid, to the extent possible, the above
3669               conflict, the hash-based containers themselves do not address
3670               any of these questions; this is deferred to the resize policies,
3671               which are easier to change or replace. Thus, for example,
3672               neither <classname>cc_hash_table</classname> nor
3673               <classname>gp_hash_table</classname>
              contains methods for querying the actual size of the table; this
3675               is deferred to <classname>hash_standard_resize_policy</classname>.</para>
3676
3677               <para>Furthermore, the policies themselves are parametrized by
3678               template arguments that determine the methods they support
3679               (
3680               <xref linkend="biblio.alexandrescu01modern"/>
3681               shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
3682               is parametrized by <classname>External_Size_Access</classname> that
3683               determines whether it supports methods for querying the actual
3684               size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
3685               is parametrized by <classname>External_Load_Access</classname> that
3686               determines whether it supports methods for querying or
3687               modifying the loads. <classname>cc_hash_max_collision_check_resize_trigger</classname>
3688               is parametrized by <classname>External_Load_Access</classname> that
3689               determines whether it supports methods for querying the
3690               load.</para>
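
              <para>A minimal sketch of external load access follows. It
              assumes that instantiating
              <classname>hash_load_check_resize_trigger</classname> with
              <classname>External_Load_Access</classname> set to
              <literal>true</literal> exposes
              <function>get_loads</function> and
              <function>set_loads</function> through the container; the
              functor <classname>int_hash</classname>, the typedef names,
              and the helper <function>lower_min_load</function> are
              hypothetical.</para>
              <programlisting>
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/hash_policy.hpp&gt;
#include &lt;cstddef&gt;
#include &lt;functional&gt;
#include &lt;utility&gt;

// Hypothetical identity hash functor, used only for illustration.
struct int_hash
{
  std::size_t
  operator()(int i) const
  { return static_cast&lt;std::size_t&gt;(i); }
};

// Trigger policy with External_Load_Access = true.
typedef __gnu_pbds::hash_load_check_resize_trigger&lt;true&gt; loads_trigger_t;

typedef __gnu_pbds::hash_standard_resize_policy&lt;
  __gnu_pbds::hash_exponential_size_policy&lt;&gt;,
  loads_trigger_t&gt; loads_resize_t;

typedef __gnu_pbds::cc_hash_table&lt;
  int, char, int_hash, std::equal_to&lt;int&gt;,
  __gnu_pbds::direct_mask_range_hashing&lt;&gt;,
  loads_resize_t&gt; loads_map_t;

// Hypothetical helper: keeps the maximal load and replaces the minimal
// load with new_min. Returns the loads in effect afterwards.
inline std::pair&lt;float, float&gt;
lower_min_load(loads_map_t&amp; m, float new_min)
{
  const std::pair&lt;float, float&gt; old_loads = m.get_loads();
  m.set_loads(std::make_pair(new_min, old_loads.second));
  return m.get_loads();
}
              </programlisting>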
3691
3692               <para>Some operations, for example, resizing a container at
3693               run time, or changing the load factors of a load-check trigger
3694               policy, require the container itself to resize. As mentioned
3695               above, the hash-based containers themselves do not contain
3696               these types of methods, only their resize policies.
3697               Consequently, there must be some mechanism for a resize policy
3698               to manipulate the hash-based container. As the hash-based
3699               container is a subclass of the resize policy, this is done
3700               through virtual methods. Each hash-based container has a
              <literal>private</literal> <literal>virtual</literal> method:</para>
3702               <programlisting>
3703                 virtual void
3704                 do_resize
3705                 (size_type new_size);
3706               </programlisting>
3707
              <para>which resizes the container. Implementations of
              <classname>Resize_Policy</classname> can export public methods for resizing
              the container externally; these methods internally call
              <function>do_resize</function> to resize the table.</para>
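
              <para>The following sketch illustrates external size access.
              It assumes that instantiating
              <classname>hash_standard_resize_policy</classname> with
              <classname>External_Size_Access</classname> set to
              <literal>true</literal> exposes
              <function>get_actual_size</function> and
              <function>resize</function> through the container; the
              functor <classname>int_hash</classname>, the typedef names,
              and the helper <function>reserve_entries</function> are
              hypothetical.</para>
              <programlisting>
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/hash_policy.hpp&gt;
#include &lt;cstddef&gt;
#include &lt;functional&gt;

// Hypothetical identity hash functor, used only for illustration.
struct int_hash
{
  std::size_t
  operator()(int i) const
  { return static_cast&lt;std::size_t&gt;(i); }
};

// Resize policy with External_Size_Access = true.
typedef __gnu_pbds::hash_standard_resize_policy&lt;
  __gnu_pbds::hash_exponential_size_policy&lt;&gt;,
  __gnu_pbds::hash_load_check_resize_trigger&lt;&gt;,
  true&gt; sized_resize_t;

typedef __gnu_pbds::cc_hash_table&lt;
  int, char, int_hash, std::equal_to&lt;int&gt;,
  __gnu_pbds::direct_mask_range_hashing&lt;&gt;,
  sized_resize_t&gt; sized_map_t;

// Hypothetical helper: asks the resize policy to resize the table to
// hold at least n entries, and returns the resulting actual size.
inline std::size_t
reserve_entries(sized_map_t&amp; m, std::size_t n)
{
  m.resize(n);                 // Provided by the resize policy base.
  return m.get_actual_size();  // Table size, not the number of values.
}
              </programlisting>
              <para>Note that <function>size()</function> is unaffected by
              such a call; only the underlying table changes.</para>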
3712
3713
3714             </section>
3715
3716           </section>
3717
3718
3719         </section> <!-- resize policies -->
3720
3721         <section xml:id="container.hash.details.policy_interaction">
3722           <info><title>Policy Interactions</title></info>
          <para>Hash tables are, unfortunately, especially sensitive to the
          choice of policies. One of the more complicated aspects of this
          is that poor combinations of good policies can form a poor
          container. Following are some considerations.</para>
3729
3730           <section xml:id="policy_interaction.probesizetrigger">
3731             <info><title>probe/size/trigger</title></info>
3732
3733             <para>Some combinations do not work well for probing containers.
3734             For example, combining a quadratic probe policy with an
3735             exponential size policy can yield a poor container: when an
3736             element is inserted, a trigger policy might decide that there
3737             is no need to resize, as the table still contains unused
3738             entries; the probe sequence, however, might never reach any of
3739             the unused entries.</para>
3740
            <para>Unfortunately, this library cannot detect such problems at
            compilation (they are halting reducible). It therefore defines
            an exception class, <classname>insert_error</classname>, which is
            thrown in this case.</para>
3745
3746           </section>
3747
3748           <section xml:id="policy_interaction.hashtrigger">
3749             <info><title>hash/trigger</title></info>
3750
3751             <para>Some trigger policies are especially susceptible to poor
3752             hash functions. Suppose, as an extreme case, that the hash
3753             function transforms each key to the same hash value. After some
3754             inserts, a collision detecting policy will always indicate that
3755             the container needs to grow.</para>
3756
3757             <para>The library, therefore, by design, limits each operation to
3758             one resize. For each <classname>insert</classname>, for example, it queries
3759             only once whether a resize is needed.</para>
3760
3761           </section>
3762
3763           <section xml:id="policy_interaction.eqstorehash">
3764             <info><title>equivalence functors/storing hash values/hash</title></info>
3765
            <para><classname>cc_hash_table</classname> and
            <classname>gp_hash_table</classname> are
            parametrized by an equivalence functor and by a
            <classname>Store_Hash</classname> parameter. If the latter parameter is
            <literal>true</literal>, then the container stores with each entry
            a hash value, and uses this value in case of collisions to
            determine whether to apply the equivalence functor. This can
            lower the cost of collisions for some types, but increase the
            cost of collisions for other types.</para>
3775
3776             <para>If a ranged-hash function or ranged probe function is
3777             directly supplied, however, then it makes no sense to store the
3778             hash value with each entry. This library's container will
3779             fail at compilation, by design, if this is attempted.</para>
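
            <para>A sketch of a container that stores hash values follows.
            It assumes that <classname>Store_Hash</classname> is the
            template parameter following the resize policy in the
            declaration of <classname>cc_hash_table</classname>; the
            functor <classname>string_hash</classname> and the typedef name
            are hypothetical. Strings are a type for which comparing a
            stored hash value first can avoid a more expensive key
            comparison.</para>
            <programlisting>
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/hash_policy.hpp&gt;
#include &lt;cstddef&gt;
#include &lt;functional&gt;
#include &lt;string&gt;

// Hypothetical string hash functor, used only for illustration.
struct string_hash
{
  std::size_t
  operator()(const std::string&amp; s) const
  {
    std::size_t h = 0;
    for (std::string::size_type i = 0; i &lt; s.size(); ++i)
      h = h * 31 + static_cast&lt;unsigned char&gt;(s[i]);
    return h;
  }
};

// The final argument (Store_Hash = true) requests that each entry
// stores its hash value alongside the key.
typedef __gnu_pbds::cc_hash_table&lt;
  std::string, int, string_hash, std::equal_to&lt;std::string&gt;,
  __gnu_pbds::direct_mask_range_hashing&lt;&gt;,
  __gnu_pbds::hash_standard_resize_policy&lt;&gt;,
  true&gt; stored_hash_map_t;
            </programlisting>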
3780
3781           </section>
3782
3783           <section xml:id="policy_interaction.sizeloadtrigger">
3784             <info><title>size/load-check trigger</title></info>
3785
            <para>Assume a size policy issues an increasing sequence of sizes
            a, a q, a q<superscript>2</superscript>, a q<superscript>3</superscript>, ... For
            example, an exponential size policy might issue the sequence of
            sizes 8, 16, 32, 64, ...</para>
3790
3791             <para>If a load-check trigger policy is used, with loads
3792             α<subscript>min</subscript> and α<subscript>max</subscript>,
3793             respectively, then it is a good idea to have:</para>
3794
3795             <orderedlist>
3796               <listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>
3797
3798               <listitem><para>α<subscript>min</subscript> &lt; 1 / (2 q)</para></listitem>
3799             </orderedlist>
3800
3801             <para>This will ensure that the amortized hash cost of each
3802             modifying operation is at most approximately 3.</para>
3803
            <para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
            any case, a bad choice, and α<subscript>min</subscript> &gt;
            α<subscript>max</subscript> is horrendous.</para>
3807
3808           </section>
3809
3810         </section>
3811
3812       </section> <!-- details -->
3813
3814     </section> <!-- hash -->
3815
3816     <!-- tree -->
3817     <section xml:id="pbds.design.container.tree">
3818       <info><title>tree</title></info>
3819
3820       <section xml:id="container.tree.interface">
3821         <info><title>Interface</title></info>
3822
3823         <para>The tree-based container has the following declaration:</para>
3824         <programlisting>
3825           template&lt;
3826           typename Key,
3827           typename Mapped,
3828           typename Cmp_Fn = std::less&lt;Key&gt;,
3829           typename Tag = rb_tree_tag,
3830           template&lt;
3831           typename Const_Node_Iterator,
3832           typename Node_Iterator,
3833           typename Cmp_Fn_,
3834           typename Allocator_&gt;
3835           class Node_Update = null_node_update,
3836           typename Allocator = std::allocator&lt;char&gt; &gt;
3837           class tree;
3838         </programlisting>
3839
3840         <para>The parameters have the following meaning:</para>
3841
3842         <orderedlist>
3843           <listitem>
3844           <para><classname>Key</classname> is the key type.</para></listitem>
3845
3846           <listitem>
3847           <para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
3848
3849           <listitem>
          <para><classname>Cmp_Fn</classname> is a key comparison functor.</para></listitem>
3851
3852           <listitem>
3853             <para><classname>Tag</classname> specifies which underlying data structure
3854           to use.</para></listitem>
3855
3856           <listitem>
3857             <para><classname>Node_Update</classname> is a policy for updating node
3858           invariants.</para></listitem>
3859
3860           <listitem>
3861             <para><classname>Allocator</classname> is an allocator
3862           type.</para></listitem>
3863         </orderedlist>
3864
3865         <para>The <classname>Tag</classname> parameter specifies which underlying
3866         data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
3867         <classname>ov_tree_tag</classname>,
3868         specifies an underlying red-black tree, splay tree, or
3869         ordered-vector tree, respectively; any other tag is illegal.
3870         Note that containers based on the former two contain more types
3871         and methods than the latter (e.g.,
3872         <classname>reverse_iterator</classname> and <classname>rbegin</classname>), and different
3873         exception and invalidation guarantees.</para>
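
        <para>As a sketch, the three tags can be used to instantiate maps
        that differ only in the underlying data structure. The typedef
        names and the helper <function>fill_and_get_min</function> are
        hypothetical.</para>
        <programlisting>
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/tree_policy.hpp&gt;
#include &lt;functional&gt;
#include &lt;string&gt;

typedef __gnu_pbds::tree&lt;
  int, std::string, std::less&lt;int&gt;,
  __gnu_pbds::rb_tree_tag&gt; rb_map_t;

typedef __gnu_pbds::tree&lt;
  int, std::string, std::less&lt;int&gt;,
  __gnu_pbds::splay_tree_tag&gt; splay_map_t;

typedef __gnu_pbds::tree&lt;
  int, std::string, std::less&lt;int&gt;,
  __gnu_pbds::ov_tree_tag&gt; ov_map_t;

// Hypothetical helper: fills a map of any of the above types and
// returns its smallest key. Iteration follows the Cmp_Fn order
// regardless of the tag.
template&lt;typename Map&gt;
int
fill_and_get_min()
{
  Map m;
  m[3] = "three";
  m[1] = "one";
  m[2] = "two";
  return m.begin()-&gt;first;
}
        </programlisting>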
3874
3875       </section>
3876
3877       <section xml:id="container.tree.details">
3878         <info><title>Details</title></info>
3879
3880         <section xml:id="container.tree.node">
3881           <info><title>Node Invariants</title></info>
3882
3883
          <para>Consider the two trees in the graphic below, labeled A and B.
          The first is a tree of floats; the second is a tree of pairs, each
          signifying a geometric line interval. Each element in a tree is
          referred to as a node of the tree. Of course, each of
          these trees can support the usual queries: the first can easily
          search for <constant>0.4</constant>; the second can easily search for
          <code>std::make_pair(10, 41)</code>.</para>
3890
          <para>Each of these trees can also efficiently support other queries.
          The first can efficiently determine that the 2nd key in the
          tree is <constant>0.3</constant>; the second can efficiently determine
          whether any of its intervals overlaps
          <code>std::make_pair(29,42)</code> (useful in geometric
          applications or distributed file systems with leases, for
          example).  It should be noted that a <classname>std::set</classname> can
          only solve these types of problems with linear complexity.</para>
3899
3900           <para>In order to do so, each tree stores some metadata in
          each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>). The first stores in
3902           each node the size of the sub-tree rooted at the node; the
3903           second stores at each node the maximal endpoint of the
3904           intervals at the sub-tree rooted at the node.</para>
3905
3906           <figure>
3907             <title>Tree node invariants</title>
3908             <mediaobject>
3909               <imageobject>
3910                 <imagedata align="center" format="PNG" scale="100"
3911                            fileref="../images/pbds_tree_node_invariants.png"/>
3912               </imageobject>
3913               <textobject>
3914                 <phrase>Tree node invariants</phrase>
3915               </textobject>
3916             </mediaobject>
3917           </figure>
3918
3919           <para>Supporting such trees is difficult for a number of
3920           reasons:</para>
3921
3922           <orderedlist>
3923             <listitem><para>There must be a way to specify what a node's metadata
3924             should be (if any).</para></listitem>
3925
3926             <listitem><para>Various operations can invalidate node
3927             invariants.  The graphic below shows how a right rotation,
3928             performed on A, results in B, with nodes x and y having
            corrupted invariants (the grayed nodes in C). It also shows
3930             how an insert, performed on D, results in E, with nodes x and y
3931             having corrupted invariants (the grayed nodes in F). It is not
3932             feasible to know outside the tree the effect of an operation on
3933             the nodes of the tree.</para></listitem>
3934
3935             <listitem><para>The search paths of standard associative containers are
3936             defined by comparisons between keys, and not through
3937             metadata.</para></listitem>
3938
3939             <listitem><para>It is not feasible to know in advance which methods trees
3940             can support. Besides the usual <classname>find</classname> method, the
3941             first tree can support a <classname>find_by_order</classname> method, while
3942             the second can support an <classname>overlaps</classname> method.</para></listitem>
3943           </orderedlist>
3944
3945           <figure>
3946             <title>Tree node invalidation</title>
3947             <mediaobject>
3948               <imageobject>
3949                 <imagedata align="center" format="PNG" scale="100"
3950                            fileref="../images/pbds_tree_node_invalidations.png"/>
3951               </imageobject>
3952               <textobject>
3953                 <phrase>Tree node invalidation</phrase>
3954               </textobject>
3955             </mediaobject>
3956           </figure>
3957
3958           <para>These problems are solved by a combination of two means:
3959           node iterators, and template-template node updater
3960           parameters.</para>
3961
3962           <section xml:id="container.tree.node.iterators">
3963             <info><title>Node Iterators</title></info>
3964
3965
3966             <para>Each tree-based container defines two additional iterator
3967             types, <classname>const_node_iterator</classname>
3968             and <classname>node_iterator</classname>.
3969             These iterators allow descending from a node to one of its
            children. Node iterators allow search paths different from those
3971             determined by the comparison functor. The <classname>tree</classname>
3972             supports the methods:</para>
3973             <programlisting>
3974               const_node_iterator
3975               node_begin() const;
3976
3977               node_iterator
3978               node_begin();
3979
3980               const_node_iterator
3981               node_end() const;
3982
3983               node_iterator
3984               node_end();
3985             </programlisting>
3986
            <para>The first pair returns node iterators corresponding to the
3988             root node of the tree; the latter pair returns node iterators
3989             corresponding to a just-after-leaf node.</para>
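
            <para>As an illustration, the sketch below uses node iterators
            to compute the height of a tree, a traversal that is not
            expressible through the comparison functor alone. It assumes
            that node iterators provide
            <function>get_l_child</function> and
            <function>get_r_child</function> and can be compared for
            equality, and that the release in use provides
            <classname>null_type</classname> as the mapped policy for
            sets; the names <function>subtree_height</function> and
            <function>tree_height</function> are hypothetical.</para>
            <programlisting>
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/tree_policy.hpp&gt;
#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;functional&gt;

typedef __gnu_pbds::tree&lt;
  int, __gnu_pbds::null_type, std::less&lt;int&gt;,
  __gnu_pbds::rb_tree_tag&gt; int_set_t;

// Hypothetical helper: height of the sub-tree rooted at nd_it, where
// end_nd_it denotes the just-after-leaf node.
template&lt;typename Node_It&gt;
std::size_t
subtree_height(Node_It nd_it, Node_It end_nd_it)
{
  if (nd_it == end_nd_it)
    return 0;
  return 1 + std::max(subtree_height(nd_it.get_l_child(), end_nd_it),
                      subtree_height(nd_it.get_r_child(), end_nd_it));
}

// Starts at the root (node_begin) and descends towards the leaves.
inline std::size_t
tree_height(const int_set_t&amp; s)
{ return subtree_height(s.node_begin(), s.node_end()); }
            </programlisting>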
3990           </section>
3991
3992           <section xml:id="container.tree.node.updator">
            <info><title>Node Updater</title></info>
3994
3995             <para>The tree-based containers are parametrized by a
3996             <classname>Node_Update</classname> template-template parameter. A
3997             tree-based container instantiates
3998             <classname>Node_Update</classname> to some
3999             <classname>node_update</classname> class, and publicly subclasses
4000             <classname>node_update</classname>. The graphic below shows this
4001             scheme, as well as some predefined policies (which are explained
4002             below).</para>
4003
4004             <figure>
4005               <title>A tree and its update policy</title>
4006               <mediaobject>
4007                 <imageobject>
4008                   <imagedata align="center" format="PNG" scale="100"
4009                              fileref="../images/pbds_tree_node_updator_policy_cd.png"/>
4010                 </imageobject>
4011                 <textobject>
4012                   <phrase>A tree and its update policy</phrase>
4013                 </textobject>
4014               </mediaobject>
4015             </figure>
4016
4017             <para><classname>node_update</classname> (an instantiation of
4018             <classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
4019             the type of metadata it requires. For order statistics,
4020             e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
4021             The tree defines within each node a <classname>metadata_type</classname>
4022             object.</para>
4023
4024             <para><classname>node_update</classname> must also define the following method
4025             for restoring node invariants:</para>
4026             <programlisting>
4027               void
4028               operator()(node_iterator nd_it, const_node_iterator end_nd_it)
4029             </programlisting>
4030
            <para>In this method, <varname>nd_it</varname> is a
            <classname>node_iterator</classname> corresponding to a node
            such that A) all of its descendants have valid invariants, and
            B) its own invariants might be violated;
            <varname>end_nd_it</varname> is
            a <classname>const_node_iterator</classname> corresponding to a
            just-after-leaf node. This method should correct the node
            invariants of the node pointed to by
            <varname>nd_it</varname>. For example, say node x in the
            graphic below, labeled A, has an invalid invariant, but its children,
            y and z, have valid invariants. After the invocation, all three
            nodes should have valid invariants, as in label B.</para>
4042
4043
4044             <figure>
4045               <title>Restoring node invariants</title>
4046               <mediaobject>
4047                 <imageobject>
4048                   <imagedata align="center" format="PNG" scale="100"
4049                              fileref="../images/pbds_restoring_node_invariants.png"/>
4050                 </imageobject>
4051                 <textobject>
4052                   <phrase>Restoring node invariants</phrase>
4053                 </textobject>
4054               </mediaobject>
4055             </figure>
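
            <para>A minimal sketch of a user-defined node updater follows.
            Each node's metadata is the sum of the keys in the sub-tree
            rooted at that node. The class
            <classname>subtree_sum_node_update</classname>, its
            <function>total</function> method, and the typedef name are
            hypothetical. The sketch assumes that node iterators provide
            <function>get_metadata</function>,
            <function>get_l_child</function>, and
            <function>get_r_child</function>, that dereferencing a node
            iterator yields an iterator to the stored value, and that the
            release in use provides
            <classname>null_type</classname>.</para>
            <programlisting>
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/tree_policy.hpp&gt;
#include &lt;functional&gt;

// Hypothetical node updater: the metadata of a node is the sum of the
// keys stored in the sub-tree rooted at that node.
template&lt;typename Node_CItr, typename Node_Itr,
         typename Cmp_Fn, typename Alloc&gt;
struct subtree_sum_node_update
{
  typedef long metadata_type;

  // Sum of all keys, read from the metadata of the root node.
  long
  total() const
  {
    Node_CItr root = node_begin();
    return (root == node_end()) ? 0 : root.get_metadata();
  }

  // Restores the invariant of the node at nd_it, assuming that the
  // invariants of its children are already valid.
  void
  operator()(Node_Itr nd_it, Node_CItr end_nd_it)
  {
    long sum = **nd_it;
    if (!(nd_it.get_l_child() == end_nd_it))
      sum += nd_it.get_l_child().get_metadata();
    if (!(nd_it.get_r_child() == end_nd_it))
      sum += nd_it.get_r_child().get_metadata();
    const_cast&lt;long&amp;&gt;(nd_it.get_metadata()) = sum;
  }

  // Implemented by the tree, which derives from this class.
  virtual Node_CItr node_begin() const = 0;
  virtual Node_CItr node_end() const = 0;

  virtual ~subtree_sum_node_update() { }
};

typedef __gnu_pbds::tree&lt;
  int, __gnu_pbds::null_type, std::less&lt;int&gt;,
  __gnu_pbds::rb_tree_tag,
  subtree_sum_node_update&gt; sum_set_t;
            </programlisting>
            <para>With this policy, a call such as
            <code>s.total()</code> on a <classname>sum_set_t</classname>
            object reads a single metadata value instead of visiting every
            node.</para>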
4056
4057             <para>When a tree operation might invalidate some node invariant,
4058             it invokes this method in its <classname>node_update</classname> base to
4059             restore the invariant. For example, the graphic below shows
4060             an <function>insert</function> operation (point A); the tree performs some
4061             operations, and calls the update functor three times (points B,
            C, and D). (It is well known that any <function>insert</function>,
            <function>erase</function>, <function>split</function> or <function>join</function> can restore
            all node invariants by a small number of node invariant updates
            (<xref linkend="biblio.clrs2001"/>).)</para>
4066
4067             <figure>
4068               <title>Insert update sequence</title>
4069               <mediaobject>
4070                 <imageobject>
4071                   <imagedata align="center" format="PNG" scale="100"
4072                              fileref="../images/pbds_update_seq_diagram.png"/>
4073                 </imageobject>
4074                 <textobject>
4075                   <phrase>Insert update sequence</phrase>
4076                 </textobject>
4077               </mediaobject>
4078             </figure>
4079
4080             <para>To complete the description of the scheme, three questions
4081             need to be answered:</para>
4082
4083             <orderedlist>
4084               <listitem><para>How can a tree which supports order statistics define a
4085               method such as <classname>find_by_order</classname>?</para></listitem>
4086
4087               <listitem><para>How can the node updater base access methods of the
4088               tree?</para></listitem>
4089
4090               <listitem><para>How can the following cyclic dependency be resolved?
4091               <classname>node_update</classname> is a base class of the tree, yet it
4092               uses node iterators defined in the tree (its child).</para></listitem>
4093             </orderedlist>
4094
4095             <para>The first two questions are answered by the fact that
4096             <classname>node_update</classname> (an instantiation of
4097             <classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
4098             of the tree. Consequently:</para>
4099
4100             <orderedlist>
4101               <listitem><para>Any public methods of
4102               <classname>node_update</classname> are automatically methods of
4103               the tree (<xref linkend="biblio.alexandrescu01modern"/>).
4104               Thus an order-statistics node updater,
4105               <classname>tree_order_statistics_node_update</classname> defines
4106               the <function>find_by_order</function> method; any tree
4107               instantiated by this policy consequently supports this method as
4108               well.</para></listitem>
4109
4110               <listitem><para>In C++, if a base class declares a method as
4111               <literal>virtual</literal>, it is
4112               <literal>virtual</literal> in its subclasses. If
4113               <classname>node_update</classname> needs to access one of the
4114               tree's methods, say the member function
4115               <function>end</function>, it simply declares that method as
4116               pure <literal>virtual</literal>; the tree then supplies the definition.</para></listitem>
4117             </orderedlist>
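
            <para>As an illustration of the first point, the following is a
            minimal sketch of a tree instantiated with
            <classname>tree_order_statistics_node_update</classname>. It assumes
            a GCC release in which the null mapped type is spelled
            <classname>__gnu_pbds::null_type</classname>; the helper names
            <function>kth_smallest</function> and <function>rank_of</function>
            are hypothetical and exist only for this example.</para>

            <programlisting language="cpp">
#include &lt;cstddef&gt;
#include &lt;functional&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/tree_policy.hpp&gt;

// A set of ints whose nodes carry order-statistics metadata.
typedef __gnu_pbds::tree&lt;
  int,
  __gnu_pbds::null_type,   // assumed spelling; older releases differ
  std::less&lt;int&gt;,
  __gnu_pbds::rb_tree_tag,
  __gnu_pbds::tree_order_statistics_node_update&gt; ordered_set;

// Hypothetical helper: returns the k-th smallest key (0-based).
// The caller must ensure that k is smaller than s.size().
int kth_smallest(const ordered_set&amp; s, std::size_t k)
{
  // find_by_order is declared by the node updater; the tree inherits it.
  return *s.find_by_order(k);
}

// Hypothetical helper: returns the number of keys strictly smaller than key.
std::size_t rank_of(const ordered_set&amp; s, int key)
{
  return s.order_of_key(key);
}
            </programlisting>

            <para>Both methods are reached through the tree object even though
            the tree class itself does not declare them.</para>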
4118
4119             <para>The cyclic dependency is solved through template-template
4120             parameters. <classname>Node_Update</classname> is parametrized by
4121             the tree's node iterators, its comparison functor, and its
4122             allocator type. Thus, instantiations of
4123             <classname>Node_Update</classname> have all information
4124             required.</para>
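
            <para>The arrangement can be sketched outside the library with a
            toy container. Everything in the following example
            (<classname>count_less_update</classname>,
            <classname>toy_container</classname>, and the
            <function>policy_begin</function> and
            <function>policy_end</function> hooks) is hypothetical and is not
            part of this library; it only shows how a policy that is
            parametrized by iterator and comparison types can be a public base
            of the container that defines those types.</para>

            <programlisting language="cpp">
#include &lt;cstddef&gt;
#include &lt;functional&gt;
#include &lt;vector&gt;

// Hypothetical policy. It is parametrized by the container's iterator
// and comparison types, so it never has to name the container itself.
template&lt;typename Const_Iterator, typename Cmp_Fn&gt;
class count_less_update
{
public:
  // A public method of the policy becomes a public method of any
  // container that derives from it.
  std::size_t count_less(int key) const
  {
    std::size_t n = 0;
    for (Const_Iterator it = policy_begin(); it != policy_end(); ++it)
      if (Cmp_Fn()(*it, key))
        ++n;
    return n;
  }

protected:
  virtual ~count_less_update() { }

private:
  // Pure virtual hooks through which the policy reaches the container.
  virtual Const_Iterator policy_begin() const = 0;
  virtual Const_Iterator policy_end() const = 0;
};

// Hypothetical container taking the policy as a template-template
// parameter and instantiating it with its own types.
template&lt;template&lt;typename, typename&gt; class Node_Update&gt;
class toy_container
  : public Node_Update&lt;std::vector&lt;int&gt;::const_iterator, std::less&lt;int&gt; &gt;
{
public:
  void insert(int v) { m_values.push_back(v); }

private:
  typedef std::vector&lt;int&gt;::const_iterator const_iterator;

  virtual const_iterator policy_begin() const { return m_values.begin(); }
  virtual const_iterator policy_end() const { return m_values.end(); }

  std::vector&lt;int&gt; m_values;
};
            </programlisting>

            <para>An object of type
            <classname>toy_container&lt;count_less_update&gt;</classname> then
            offers <function>count_less</function> directly, without the policy
            ever referring to the container type.</para>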
4125
4126             <para>This library assumes that constructing a metadata object and
4127             modifying it are exception free. Otherwise, if a metadata-related
4128             operation (e.g., changing the value of a metadata object) threw an
4129             exception in the middle of some method, say <function>insert</function>,
4130             rolling back the method would be unusually complex.</para>
4131
4132             <para>Previously, a distinction was made between redundant
4133             policies and null policies. Node invariants show a
4134             case where null policies are required.</para>
4135
4136             <para>Assume a regular tree is required, one which need not
4137             support order statistics or interval overlap queries.
4138             Seemingly, in this case a redundant policy (a policy which
4139             doesn't affect nodes' contents) would suffice. This would, however,
4140             lead to the following drawbacks:</para>
4141
4142             <orderedlist>
4143               <listitem><para>Each node would carry a useless metadata object, wasting
4144               space.</para></listitem>
4145
4146               <listitem><para>The tree cannot know if its
4147               <classname>Node_Update</classname> policy actually modifies a
4148               node's metadata (in general this is undecidable, as the halting problem reduces to it). In the graphic
4149               below, assume the shaded node is inserted. The tree would have
4150               to traverse the useless path shown to the root, applying
4151               redundant updates all the way.</para></listitem>
4152             </orderedlist>
4153             <figure>
4154               <title>Useless update path</title>
4155               <mediaobject>
4156                 <imageobject>
4157                   <imagedata align="center" format="PNG" scale="100"
4158                              fileref="../images/pbds_rationale_null_node_updator.png"/>
4159                 </imageobject>
4160                 <textobject>
4161                   <phrase>Useless update path</phrase>
4162                 </textobject>
4163               </mediaobject>
4164             </figure>
4165
4166
4167             <para>A null policy class, <classname>null_node_update</classname>,
4168             solves both of these problems. The tree detects that node invariants
4169             are irrelevant, and defines its nodes and methods accordingly.</para>
4170
4171           </section>
4172
4173         </section>
4174
4175         <section xml:id="container.tree.details.split">
4176           <info><title>Split and Join</title></info>
4177
4178           <para>Tree-based containers support split and join methods.
4179           It is possible to split a tree so that it passes
4180           all nodes with keys larger than a given key to a different
4181           tree. These methods have the following advantages over the
4182           alternative of externally inserting to the destination
4183           tree and erasing from the source tree:</para>
4184
4185           <orderedlist>
4186             <listitem><para>These methods are efficient - red-black trees are split
4187             and joined in poly-logarithmic complexity; ordered-vector
4188             trees are split and joined at linear complexity. The
4189             alternatives have super-linear complexity.</para></listitem>
4190
4191             <listitem><para>Aside from orders of growth, these operations perform
4192             few allocations and de-allocations. For red-black trees, allocations are not performed,
4193             and the methods are exception-free. </para></listitem>
4194           </orderedlist>
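
          <para>The following sketch shows one way these methods might be
          used on a red-black tree of integers. It assumes the
          <classname>__gnu_pbds::null_type</classname> spelling of recent GCC
          releases; the helper <function>split_then_join</function> is
          hypothetical.</para>

          <programlisting language="cpp">
#include &lt;cstddef&gt;
#include &lt;functional&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;

typedef __gnu_pbds::tree&lt;int, __gnu_pbds::null_type, std::less&lt;int&gt;,
                         __gnu_pbds::rb_tree_tag&gt; int_set;

// Hypothetical helper: moves every key larger than pivot into a second
// tree, records how many keys moved, then joins the two trees again.
std::size_t split_then_join(int_set&amp; src, int pivot)
{
  int_set upper;
  src.split(pivot, upper);   // src keeps the keys not larger than pivot
  const std::size_t moved = upper.size();
  src.join(upper);           // all keys in upper are larger than those in src
  return moved;
}
          </programlisting>

          <para>Joining is valid here because the key ranges of the two trees
          do not overlap, which is exactly the state a split leaves them
          in.</para>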
4195         </section>
4196
4197       </section> <!-- details -->
4198
4199     </section> <!-- tree -->
4200
4201     <!-- trie -->
4202     <section xml:id="pbds.design.container.trie">
4203       <info><title>Trie</title></info>
4204
4205       <section xml:id="container.trie.interface">
4206         <info><title>Interface</title></info>
4207
4208         <para>The trie-based container has the following declaration:</para>
4209         <programlisting>
4210           template&lt;typename Key,
4211           typename Mapped,
4212           typename E_Access_Traits = typename detail::default_trie_e_access_traits&lt;Key&gt;::type,
4213           typename Tag = pat_trie_tag,
4214           template&lt;typename Const_Node_Iterator,
4215           typename Node_Iterator,
4216           typename E_Access_Traits_,
4217           typename Allocator_&gt;
4218           class Node_Update = null_node_update,
4219           typename Allocator = std::allocator&lt;char&gt; &gt;
4220           class trie;
4221         </programlisting>
4222
4223         <para>The parameters have the following meaning:</para>
4224
4225         <orderedlist>
4226           <listitem><para><classname>Key</classname> is the key type.</para></listitem>
4227
4228           <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
4229
4230           <listitem><para><classname>E_Access_Traits</classname> is described below.</para></listitem>
4231
4232           <listitem><para><classname>Tag</classname> specifies which underlying data structure
4233           to use, and is described shortly.</para></listitem>
4234
4235           <listitem><para><classname>Node_Update</classname> is a policy for updating node
4236           invariants. This is described below.</para></listitem>
4237
4238           <listitem><para><classname>Allocator</classname> is an allocator
4239           type.</para></listitem>
4240         </orderedlist>
4241
4242         <para>The <classname>Tag</classname> parameter specifies which underlying
4243         data structure to use. Instantiating it by <classname>pat_trie_tag</classname> specifies an
4244         underlying PATRICIA trie (explained shortly); any other tag is
4245         currently illegal.</para>
4246
4247         <para>Following is a description of a (PATRICIA) trie
4248         (this implementation follows <xref linkend="biblio.okasaki98mereable"/> and
4249         <xref linkend="biblio.filliatre2000ptset"/>).
4250         </para>
4251
4252         <para>A (PATRICIA) trie is similar to a tree, but with the
4253         following differences:</para>
4254
4255         <orderedlist>
4256           <listitem><para>It explicitly views keys as a sequence of elements.
4257           E.g., a trie can view a string as a sequence of
4258           characters; a trie can view a number as a sequence of
4259           bits.</para></listitem>
4260
4261           <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4262           + 1, where n is the number of distinct
4263           elements.</para></listitem>
4264
4265           <listitem><para>It stores values only at leaf nodes.</para></listitem>
4266
4267           <listitem><para>Internal nodes have the properties that A) each has at
4268           least two children, and B) each shares the same prefix with
4269           all of its descendants.</para></listitem>
4270         </orderedlist>
4271
4272         <para>A (PATRICIA) trie has some useful properties:</para>
4273
4274         <orderedlist>
4275           <listitem><para>It can be configured to use large node fan-out, giving it
4276           very efficient find performance (albeit at the cost of
4277           insertion complexity and size).</para></listitem>
4278
4279           <listitem><para>It works well for common-prefix keys.</para></listitem>
4280
4281           <listitem><para>It can efficiently support queries such as which
4282           keys match a certain prefix. This is sometimes useful in file
4283           systems and routers, and for "type-ahead" aka predictive text matching
4284           on mobile devices.</para></listitem>
4285         </orderedlist>
4286
4287
4288       </section>
4289
4290       <section xml:id="container.trie.details">
4291         <info><title>Details</title></info>
4292
4293         <section xml:id="container.trie.details.etraits">
4294           <info><title>Element Access Traits</title></info>
4295
4296           <para>A trie inherently views its keys as sequences of elements.
4297           For example, a trie can view a string as a sequence of
4298           characters. A trie needs to map each of n elements to a
4299           number in {0, ..., n - 1}. For example, a trie can map a
4300           character <varname>c</varname> to
4301           <programlisting>static_cast&lt;size_t&gt;(c)</programlisting>.</para>
4302
4303           <para>Seemingly, then, a trie can assume that its keys support
4304           (const) iterators, and that the <classname>value_type</classname> of this
4305           iterator can be cast to a <classname>size_t</classname>. There are several
4306           reasons, though, to decouple the mechanism by which the trie
4307           accesses its keys' elements from the trie:</para>
4308
4309           <orderedlist>
4310             <listitem><para>In some cases, the numerical value of an element is
4311             inappropriate. Consider a trie storing DNA strings. It is
4312             logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
4313             'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>
4314
4315             <listitem><para>In some cases the keys' iterators are different than what
4316             is needed. For example, a trie can be used to search for
4317             common suffixes, by using strings'
4318             <classname>reverse_iterator</classname>. As another example, a trie mapping
4319             UNICODE strings would have a huge fan-out if each node
4320             branched on a UNICODE character; instead, one can define an
4321             iterator iterating over groups of 8 (or fewer) bits.</para></listitem>
4322           </orderedlist>
4323
4324           <para>The <classname>trie</classname> class is,
4325           consequently, parametrized by <classname>E_Access_Traits</classname>:
4326           traits which specify how to access a sequence's elements.
4327           <classname>string_trie_e_access_traits</classname>
4328           is a traits class for strings. Each such traits class defines some
4329           types. For example:</para>
4330           <programlisting>
4331             typename E_Access_Traits::const_iterator
4332           </programlisting>
4333
4334           <para>is a const iterator iterating over a key's elements. The
4335           traits class must also define methods for obtaining an iterator
4336           to the first and last element of a key.</para>
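
          <para>For the DNA example above, such a traits class might look
          roughly like the following standalone sketch. The class
          <classname>dna_access_traits</classname> and all of its member names
          (<classname>e_type</classname>, <varname>max_size</varname>,
          <function>e_pos</function>) are assumptions made for illustration;
          the sketch has not been checked against the exact requirements of
          any release of this library.</para>

          <programlisting language="cpp">
#include &lt;cstddef&gt;
#include &lt;string&gt;

// Hypothetical access traits for keys that are DNA strings.
struct dna_access_traits
{
  typedef std::size_t size_type;
  typedef std::string key_type;
  typedef std::string::const_iterator const_iterator;
  typedef char e_type;   // type of a single element of a key

  // Number of distinct elements that a key can contain.
  enum { max_size = 4 };

  // Iterators delimiting the elements of a key.
  static const_iterator begin(const key_type&amp; key) { return key.begin(); }
  static const_iterator end(const key_type&amp; key) { return key.end(); }

  // Maps an element to a number in {0, ..., max_size - 1}.
  // This sketch does not validate its input: any character other
  // than 'A', 'C', or 'G' is treated as 'T'.
  static size_type e_pos(e_type e)
  {
    switch (e)
      {
      case 'A': return 0;
      case 'C': return 1;
      case 'G': return 2;
      default:  return 3;
      }
  }
};
          </programlisting>

          <para>The point of the sketch is that 'T' maps to 3 regardless of
          its numerical character value.</para>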
4337
4338           <para>The graphic below shows a
4339           (PATRICIA) trie resulting from inserting the words: "I wish
4340           that I could ever see a poem lovely as a trie" (which,
4341           unfortunately, does not rhyme).</para>
4342
4343           <para>The leaf nodes contain values; each internal node contains
4344           two <classname>typename E_Access_Traits::const_iterator</classname>
4345           objects, indicating the maximal common prefix of all keys in
4346           the sub-tree. For example, the shaded internal node roots a
4347           sub-tree with leaves "a" and "as". The maximal common prefix is
4348           "a". The internal node contains, consequently, two const
4349           iterators, one pointing to <varname>'a'</varname>, and the other to
4350           <varname>'s'</varname>.</para>
4351
4352           <figure>
4353             <title>A PATRICIA trie</title>
4354             <mediaobject>
4355               <imageobject>
4356                 <imagedata align="center" format="PNG" scale="100"
4357                            fileref="../images/pbds_pat_trie.png"/>
4358               </imageobject>
4359               <textobject>
4360                 <phrase>A PATRICIA trie</phrase>
4361               </textobject>
4362             </mediaobject>
4363           </figure>
4364
4365         </section>
4366
4367         <section xml:id="container.trie.details.node">
4368           <info><title>Node Invariants</title></info>
4369
4370           <para>Trie-based containers support node invariants, as do
4371           tree-based containers. There are two minor
4372           differences, though, which, unfortunately, prevent the two kinds of
4373           container from sharing the same node-updating policies:</para>
4374
4375           <orderedlist>
4376             <listitem>
4377               <para>A trie's <classname>Node_Update</classname> template-template
4378               parameter is parametrized by <classname>E_Access_Traits</classname>, while
4379               a tree's <classname>Node_Update</classname> template-template parameter is
4380             parametrized by <classname>Cmp_Fn</classname>.</para></listitem>
4381
4382             <listitem><para>Tree-based containers store values in all nodes, while
4383             trie-based containers (at least in this implementation) store
4384             values only in leaves.</para></listitem>
4385           </orderedlist>
4386
4387           <para>The graphic below shows the scheme, as well as some predefined
4388           policies (which are explained below).</para>
4389
4390           <figure>
4391             <title>A trie and its update policy</title>
4392             <mediaobject>
4393               <imageobject>
4394                 <imagedata align="center" format="PNG" scale="100"
4395                            fileref="../images/pbds_trie_node_updator_policy_cd.png"/>
4396               </imageobject>
4397               <textobject>
4398                 <phrase>A trie and its update policy</phrase>
4399               </textobject>
4400             </mediaobject>
4401           </figure>
4402
4403
4404           <para>This library offers the following pre-defined trie node
4405           updating policies:</para>
4406
4407           <orderedlist>
4408             <listitem>
4409               <para>
4410                 <classname>trie_order_statistics_node_update</classname>
4411                 supports order statistics.
4412               </para>
4413             </listitem>
4414
4415             <listitem><para><classname>trie_prefix_search_node_update</classname>
4416             supports searching for ranges that match a given prefix.</para></listitem>
4417
4418             <listitem><para><classname>null_node_update</classname>
4419             is the null node updater.</para></listitem>
4420           </orderedlist>
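
          <para>As a sketch of prefix search, the following counts the keys
          that start with a given prefix. It assumes a recent GCC release, in
          which the string access traits are spelled
          <classname>trie_string_access_traits</classname> and the null mapped
          type <classname>null_type</classname>; the helper
          <function>count_with_prefix</function> is hypothetical.</para>

          <programlisting language="cpp">
#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/trie_policy.hpp&gt;

// A set of strings supporting prefix queries.
typedef __gnu_pbds::trie&lt;
  std::string,
  __gnu_pbds::null_type,                     // assumed spelling
  __gnu_pbds::trie_string_access_traits&lt;&gt;,   // assumed spelling
  __gnu_pbds::pat_trie_tag,
  __gnu_pbds::trie_prefix_search_node_update&gt; string_trie;

// Hypothetical helper: counts the keys in t that begin with prefix.
std::size_t count_with_prefix(const string_trie&amp; t, const std::string&amp; prefix)
{
  typedef string_trie::const_iterator const_iterator;

  // prefix_range is provided by the node updater; it returns the
  // range of entries whose keys match the prefix.
  const std::pair&lt;const_iterator, const_iterator&gt; range =
    t.prefix_range(prefix);

  std::size_t n = 0;
  for (const_iterator it = range.first; it != range.second; ++it)
    ++n;
  return n;
}
          </programlisting>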
4421
4422         </section>
4423
4424         <section xml:id="container.trie.details.split">
4425           <info><title>Split and Join</title></info>
4426           <para>Trie-based containers support split and join methods; the
4427           rationale is the same as that for tree-based containers supporting
4428           these methods.</para>
4429         </section>
4430
4431       </section> <!-- details -->
4432
4433     </section> <!-- trie -->
4434
4435     <!-- list_update -->
4436     <section xml:id="pbds.design.container.list">
4437       <info><title>List</title></info>
4438
4439       <section xml:id="container.list.interface">
4440         <info><title>Interface</title></info>
4441
4442         <para>The list-based container has the following declaration:</para>
4443         <programlisting>
4444           template&lt;typename Key,
4445           typename Mapped,
4446           typename Eq_Fn = std::equal_to&lt;Key&gt;,
4447           typename Update_Policy = lu_move_to_front_policy&lt;&gt;,
4448           typename Allocator = std::allocator&lt;char&gt; &gt;
4449           class list_update;
4450         </programlisting>
4451
4452         <para>The parameters have the following meaning:</para>
4453
4454         <orderedlist>
4455           <listitem>
4456             <para>
4457               <classname>Key</classname> is the key type.
4458             </para>
4459           </listitem>
4460
4461           <listitem>
4462             <para>
4463               <classname>Mapped</classname> is the mapped-policy.
4464             </para>
4465           </listitem>
4466
4467           <listitem>
4468             <para>
4469               <classname>Eq_Fn</classname> is a key equivalence functor.
4470             </para>
4471           </listitem>
4472
4473           <listitem>
4474             <para>
4475               <classname>Update_Policy</classname> is a policy updating positions in
4476               the list based on access patterns. It is described in the
4477               following subsection.
4478             </para>
4479           </listitem>
4480
4481           <listitem>
4482             <para>
4483               <classname>Allocator</classname> is an allocator type.
4484             </para>
4485           </listitem>
4486         </orderedlist>
4487
4488         <para>A list-based associative container is a container that
4489         stores elements in a linked list. It does not keep the elements
4490         in any particular order related to the keys.  List-based
4491         containers are primarily useful for creating "multimaps". In fact,
4492         list-based containers are designed in this library expressly for
4493         this purpose.</para>
4494
4495         <para>List-based containers might also be useful for some rare
4496         cases, where a key is encapsulated to the extent that only
4497         key-equivalence can be tested. Hash-based containers need to know
4498         how to transform a key into a size type, and tree-based containers
4499         need to know if some key is larger than another.  List-based
4500         associative containers, conversely, only need to know if two keys
4501         are equivalent.</para>
4502
4503         <para>Since a list-based associative container does not order
4504         elements by keys, is it possible to order the list in some
4505         useful manner? Remarkably, many on-line competitive
4506         algorithms exist for reordering lists to reflect access
4507         prediction. (See <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
4508         </para>
4509
4510       </section>
4511
4512       <section xml:id="container.list.details">
4513         <info><title>Details</title></info>
4514         <para>
4515         </para>
4516         <section xml:id="container.list.details.ds">
4517           <info><title>Underlying Data Structure</title></info>
4518
4519           <para>The graphic below shows a
4520           simple list of integer keys. If we search for the integer 6, we
4521           are paying an overhead: the link with key 6 is only the fifth
4522           link; if it were the first link, it could be accessed
4523           faster.</para>
4524
4525           <figure>
4526             <title>A simple list</title>
4527             <mediaobject>
4528               <imageobject>
4529                 <imagedata align="center" format="PNG" scale="100"
4530                            fileref="../images/pbds_simple_list.png"/>
4531               </imageobject>
4532               <textobject>
4533                 <phrase>A simple list</phrase>
4534               </textobject>
4535             </mediaobject>
4536           </figure>
4537
4538           <para>List-update algorithms reorder lists as elements are
4539           accessed. They try to determine, by the access history, which
4540           keys to move to the front of the list. Some of these algorithms
4541           require adding some metadata alongside each entry.</para>
4542
4543           <para>For example, in the graphic below label A shows the counter
4544           algorithm. Each node contains both a key and a count metadata
4545           (shown in bold). When an element is accessed (e.g. 6) its count is
4546           incremented, as shown in label B. If the count reaches some
4547           predetermined value, say 10, as shown in label C, the count is set
4548           to 0 and the node is moved to the front of the list, as in label
4549           D.
4550           </para>
4551
4552           <figure>
4553             <title>The counter algorithm</title>
4554             <mediaobject>
4555               <imageobject>
4556                 <imagedata align="center" format="PNG" scale="100"
4557                            fileref="../images/pbds_list_update.png"/>
4558               </imageobject>
4559               <textobject>
4560                 <phrase>The counter algorithm</phrase>
4561               </textobject>
4562             </mediaobject>
4563           </figure>
4564
4565
4566         </section>
4567
4568         <section xml:id="container.list.details.policies">
4569           <info><title>Policies</title></info>
4570
4571           <para>This library allows instantiating lists with policies
4572           implementing any algorithm moving nodes to the front of the
4573           list (policies implementing algorithms interchanging nodes are
4574           unsupported).</para>
4575
4576           <para>Associative containers based on lists are parametrized by an
4577           <classname>Update_Policy</classname> parameter. This parameter defines the
4578           type of metadata each node contains, how to create the
4579           metadata, and how to decide, using this metadata, whether to
4580           move a node to the front of the list. A list-based associative
4581           container object derives (publicly) from its update policy.
4582           </para>
4583
4584           <para>An instantiation of <classname>Update_Policy</classname> must define
4585           internally <classname>update_metadata</classname> as the metadata it
4586           requires. Internally, each node of the list contains, besides
4587           the usual key and data, an instance of <classname>typename
4588           Update_Policy::update_metadata</classname>.</para>
4589
4590           <para>An instantiation of <classname>Update_Policy</classname> must define
4591           internally two operators:</para>
4592           <programlisting>
4593             update_metadata
4594             operator()();
4595
4596             bool
4597             operator()(update_metadata &amp;);
4598           </programlisting>
4599
4600           <para>The first is called by the container object, when creating a
4601           new node, to create the node's metadata. The second is called
4602           by the container object, when a node is accessed (i.e.,
4603           when a find operation's key is equivalent to the key of the
4604           node), to determine whether to move the node to the front of
4605           the list.
4606           </para>
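
          <para>A policy following this description, and implementing the
          counter algorithm of the previous subsection, can be sketched as
          below. The class <classname>toy_counter_policy</classname> is
          hypothetical and is not the library's own implementation; it only
          mirrors the two operators described above.</para>

          <programlisting language="cpp">
#include &lt;cstddef&gt;

// Hypothetical update policy implementing the counter algorithm.
template&lt;std::size_t Max_Count = 10&gt;
class toy_counter_policy
{
public:
  // Metadata stored alongside each list node.
  struct update_metadata
  {
    std::size_t count;
  };

  // Called when a node is created: produces its initial metadata.
  update_metadata operator()() const
  {
    update_metadata m = { 0 };
    return m;
  }

  // Called when a node is accessed: returns true if the node should
  // be moved to the front of the list.
  bool operator()(update_metadata&amp; m) const
  {
    if (++m.count &lt; Max_Count)
      return false;
    m.count = 0;   // the count is reset when the node is moved
    return true;
  }
};
          </programlisting>

          <para>With <varname>Max_Count</varname> set to 10, the first nine
          accesses of a node leave it in place and the tenth moves it to the
          front.</para>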
4607
4608           <para>The library contains two predefined implementations of
4609           list-update policies. The first
4610           is <classname>lu_counter_policy</classname>, which implements the
4611           counter algorithm described above. The second is
4612           <classname>lu_move_to_front_policy</classname>,
4613           which unconditionally moves an accessed element to the front of
4614           the list. The latter type is very useful in this library,
4615           since there is no need to associate metadata with each element
4616           (see <xref linkend="biblio.andrew04mtf"/>).
4617           </para>
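
          <para>The two predefined policies might be selected as in the
          following sketch. The typedef names and the helper
          <function>lookup_or</function> are hypothetical; the threshold of 3
          passed to <classname>lu_counter_policy</classname> is an arbitrary
          value chosen for the example.</para>

          <programlisting language="cpp">
#include &lt;functional&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;
#include &lt;ext/pb_ds/list_update_policy.hpp&gt;

// A list-based map that moves every accessed node to the front.
typedef __gnu_pbds::list_update&lt;
  std::string, int, std::equal_to&lt;std::string&gt;,
  __gnu_pbds::lu_move_to_front_policy&lt;&gt; &gt; mtf_map;

// A list-based map that moves a node to the front only after its
// access count reaches 3.
typedef __gnu_pbds::list_update&lt;
  std::string, int, std::equal_to&lt;std::string&gt;,
  __gnu_pbds::lu_counter_policy&lt;3&gt; &gt; counter_map;

// Hypothetical helper: returns the value mapped to key, or fallback
// if the key is absent. The map is taken by non-const reference
// because a successful find may reorder the list.
template&lt;typename Map&gt;
int lookup_or(Map&amp; m, const std::string&amp; key, int fallback)
{
  typename Map::point_iterator it = m.find(key);
  return it == m.end() ? fallback : it-&gt;second;
}
          </programlisting>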
4618
4619         </section>
4620
4621         <section xml:id="container.list.details.mapped">
4622           <info><title>Use in Multimaps</title></info>
4623
4624           <para>In this library, there are no equivalents for the standard's
4625           multimaps and multisets; instead one uses an associative
4626           container mapping primary keys to secondary keys.</para>
4627
4628           <para>List-based containers are especially useful as associative
4629           containers for secondary keys. In fact, they are implemented
4630           here expressly for this purpose.</para>
4631
4632           <para>To begin with, these containers use very little per-entry
4633           structure memory overhead, since they can be implemented as
4634           singly-linked lists. (Arrays use even lower per-entry memory
4635           overhead, but they are less flexible in moving around entries,
4636           and have weaker invalidation guarantees).</para>
4637
4638           <para>More importantly, though, list-based containers use very
4639           little per-container memory overhead. The memory overhead of an
4640           empty list-based container is practically that of a pointer.
4641           This is important for when they are used as secondary
4642           associative-containers in situations where the average ratio of
4643           secondary keys to primary keys is low (or even 1).</para>
4644
4645           <para>In order to reduce the per-container memory overhead as much
4646           as possible, they are implemented as closely as possible to
4647           singly-linked lists.</para>
4648
4649           <orderedlist>
4650             <listitem>
4651               <para>
4652                 List-based containers do not store internally the number
4653                 of values that they hold. This means that their <function>size</function>
4654                 method has linear complexity (as was permitted for <classname>std::list</classname> before C++11).
4655                 Note that finding the number of equivalent-key values in a
4656                 standard multimap also has linear complexity (because it must be
4657                 done via <function>std::distance</function> on the result of the
4658                 multimap's <function>equal_range</function> method), but usually with
4659                 higher constants.
4660               </para>
4661             </listitem>
4662
4663             <listitem>
4664               <para>
4665                 Most associative-container objects each hold a policy
4666                 object (a hash-based container object holds a
4667                 hash functor). List-based containers, conversely, only have
4668                 class-wide policy objects.
4669               </para>
4670             </listitem>
4671           </orderedlist>
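
          <para>A multimap-like structure along these lines might be sketched
          as follows, with a tree mapping each primary key to a list-based set
          of secondary keys. The typedef names and the helpers
          <function>add_secondary</function> and
          <function>count_secondary</function> are hypothetical, and the
          <classname>null_type</classname> spelling is that of recent GCC
          releases.</para>

          <programlisting language="cpp">
#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;ext/pb_ds/assoc_container.hpp&gt;

// Secondary keys are kept in a list-based set.
typedef __gnu_pbds::list_update&lt;int, __gnu_pbds::null_type&gt; secondary_keys;

// Primary keys map to sets of secondary keys.
typedef __gnu_pbds::tree&lt;std::string, secondary_keys&gt; multimap_like;

// Hypothetical helper: associates secondary with primary.
void add_secondary(multimap_like&amp; m, const std::string&amp; primary, int secondary)
{
  m[primary].insert(secondary);
}

// Hypothetical helper: number of secondary keys stored for primary.
// Note that size() on the list-based container is linear.
std::size_t count_secondary(multimap_like&amp; m, const std::string&amp; primary)
{
  multimap_like::point_iterator it = m.find(primary);
  return it == m.end() ? 0 : it-&gt;second.size();
}
          </programlisting>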
4672
4673
4674         </section>
4675
4676       </section> <!-- details -->
4677
4678     </section> <!-- list -->
4679
4680
4681     <!-- priority_queue -->
4682     <section xml:id="pbds.design.container.priority_queue">
4683       <info><title>Priority Queue</title></info>
4684
4685       <section xml:id="container.priority_queue.interface">
4686         <info><title>Interface</title></info>
4687
4688         <para>The priority queue container has the following
4689         declaration:
4690         </para>
4691         <programlisting>
4692           template&lt;typename  Value_Type,
4693           typename  Cmp_Fn = std::less&lt;Value_Type&gt;,
4694           typename  Tag = pairing_heap_tag,
4695           typename  Allocator = std::allocator&lt;char &gt; &gt;
4696           class priority_queue;
4697         </programlisting>
4698
4699         <para>The parameters have the following meaning:</para>
4700
4701         <orderedlist>
4702           <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>
4703
4704           <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor.</para></listitem>
4705
4706           <listitem><para><classname>Tag</classname> specifies which underlying data structure
4707           to use.</para></listitem>
4708
4709           <listitem><para><classname>Allocator</classname> is an allocator
4710           type.</para></listitem>
4711         </orderedlist>
4712
4713         <para>The <classname>Tag</classname> parameter specifies which underlying
4714         data structure to use. Instantiating it by <classname>pairing_heap_tag</classname>, <classname>binary_heap_tag</classname>,
4715         <classname>binomial_heap_tag</classname>,
4716         <classname>rc_binomial_heap_tag</classname>,
4717         or <classname>thin_heap_tag</classname>,
4718         specifies, respectively,
4719         an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
4720         binary heap (<xref linkend="biblio.clrs2001"/>),
4721         binomial heap (<xref linkend="biblio.clrs2001"/>),
4722         a binomial heap with a redundant binary counter (<xref linkend="biblio.maverick_lowerbounds"/>),
4723         or a thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
4724         </para>
4725
4726         <para>
4727           As mentioned in the tutorial,
4728           <classname>__gnu_pbds::priority_queue</classname> shares most of the
4729           same interface with <classname>std::priority_queue</classname>.
4730           E.g. if <varname>q</varname> is a priority queue of type
4731           <classname>Q</classname>, then <function>q.top()</function> will
4732           return the "largest" value in the container (according to
4733           <classname>typename
4734           Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
4735           has a larger (and very slightly different) interface than
4736           <classname>std::priority_queue</classname>, however, since typically
4737           <function>push</function> and <function>pop</function> are deemed
4738         insufficient for manipulating priority queues.</para>
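
        <para>As a brief illustration of the shared part of the interface,
        the following sketch pushes three values into a priority queue with
        a caller-chosen underlying data structure and returns the top. The
        helper <function>largest_of_three</function> is hypothetical.</para>

        <programlisting language="cpp">
#include &lt;functional&gt;
#include &lt;ext/pb_ds/priority_queue.hpp&gt;

// Hypothetical helper: returns the largest of three values using a
// priority queue whose underlying data structure is selected by Tag.
template&lt;typename Tag&gt;
int largest_of_three(int a, int b, int c)
{
  __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;, Tag&gt; q;
  q.push(a);
  q.push(b);
  q.push(c);
  return q.top();   // the "largest" value according to std::less
}
        </programlisting>

        <para>For example,
        <literal>largest_of_three&lt;__gnu_pbds::binary_heap_tag&gt;(3, 9, 4)</literal>
        selects a binary heap; only the tag changes between underlying
        data structures.</para>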
4739
4740         <para>Different settings require different priority-queue
4741         implementations, which are described later; the discussion of
4742         traits covers ways to differentiate between the traits of the
4743         different implementations.</para>
4744
4745
4746       </section>
4747
4748       <section xml:id="container.priority_queue.details">
4749         <info><title>Details</title></info>
4750
4751         <section xml:id="container.priority_queue.details.iterators">
4752           <info><title>Iterators</title></info>
4753
4754           <para>There are many different underlying data structures for
4755           implementing priority queues. Unfortunately, most such
4756           structures are oriented towards making <function>push</function> and
4757           <function>top</function> efficient, and consequently don't allow efficient
4758           access of other elements: for instance, they cannot support an efficient
4759           <function>find</function> method. In the use case where it
4760           is important to both access and "do something with" an
4761           arbitrary value, one would be out of luck. For example, many graph algorithms require
4762           modifying a value (typically increasing it in the sense of the
4763           priority queue's comparison functor).</para>
4764
4765           <para>In order to access and manipulate an arbitrary value in a
4766           priority queue, one needs to reference the internals of the
4767           priority queue from some form of an associative container -
4768           this is unavoidable. Of course, in order to maintain the
4769           encapsulation of the priority queue, this needs to be done in a
4770           way that minimizes exposure to implementation internals.</para>
4771
          <para>In this library the priority queue's <function>push</function>
          method returns an iterator which, while valid, can be used for subsequent <function>modify</function> and
          <function>erase</function> operations. This both preserves the priority
          queue's encapsulation and allows accessing arbitrary values (since the
          iterators returned by <function>push</function> can be
          stored in some form of associative container).</para>
4778
          <para>Priority queues' iterators present a problem regarding their
          invalidation guarantees. One assumes that calling
          <function>operator++</function> on an iterator will associate it
          with the "next" value. Priority queues are
          self-organizing: each operation changes what the "next" value
          means. Consequently, it does not make sense for <function>push</function>
          to return an iterator that can be incremented, since this has
          no possible use. Also, as in the case of hash-based containers,
          it is awkward to define whether a subsequent <function>push</function> operation
          invalidates a previously returned iterator: it invalidates it in the
          sense that its "next" value is not related to what it
          previously considered to be its "next" value. However, it might not
          invalidate it, in the sense that it can still be
          dereferenced and used for <function>modify</function> and <function>erase</function>
          operations.</para>
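
          <para>As a minimal sketch of the second sense, and assuming the
          default pairing-heap implementation (whose point-type iterators are
          not affected by later insertions), an iterator returned by
          <function>push</function> can still be dereferenced and passed to
          <function>erase</function> after further <function>push</function>
          operations. The helper name <function>erase_after_pushes</function>
          is hypothetical and not part of the library.</para>
          <programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/priority_queue.hpp>

// Default Tag: pairing_heap_tag.
typedef __gnu_pbds::priority_queue<int> pq_t;

// Hypothetical helper: pushes 10, then 20 and 30, then erases 10
// through the iterator stored from the first push.
// Returns the number of values left in the queue.
std::size_t erase_after_pushes(pq_t& pq)
{
  pq_t::point_iterator it = pq.push(10);

  // Later pushes change the heap's internal order...
  pq.push(20);
  pq.push(30);

  // ...but the stored point-type iterator still designates 10.
  assert(*it == 10);
  pq.erase(it);

  return pq.size();
}
]]></programlisting>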
4794
          <para>As with the unordered associative
          containers, this library distinguishes between
          point-type and range-type iterators. A priority queue's <classname>iterator</classname> can always be
          converted to a <classname>point_iterator</classname>, and a
          <classname>const_iterator</classname> can always be converted to a
          <classname>point_const_iterator</classname>.</para>
4801
4802           <para>The following snippet demonstrates manipulating an arbitrary
4803           value:</para>
          <programlisting>
            #include &lt;cassert&gt;
            #include &lt;ext/pb_ds/priority_queue.hpp&gt;

            int main()
            {
              // A priority queue of integers.
              __gnu_pbds::priority_queue&lt;int&gt; p;

              // Insert some values, keeping the iterator returned for 0.
              __gnu_pbds::priority_queue&lt;int&gt;::point_iterator it = p.push(0);

              p.push(1);
              p.push(2);

              // Now modify the value 0 through the stored iterator.
              p.modify(it, 3);

              // The modified value is now the largest.
              assert(p.top() == 3);
            }
          </programlisting>
4819
4820
          <para>It should be noted that an alternative design could embed an
          associative container in a priority queue. Could, but most
          probably should not. To begin with, one
          can always encapsulate a priority queue together with an associative
          container mapping values to priority queue iterators, with no
          performance loss. One cannot, however, "un-encapsulate" a priority
          queue that embeds an associative container, which might lead to
          performance loss. Assume that one needs to associate each value
          with some data unrelated to priority queues. Then, using
          this library's design, one could use a single
          associative container mapping each value to a pair consisting of
          this data and a priority queue's iterator. The embedded
          design would require two associative containers: the embedded one
          and another for the unrelated data. Similar
          problems might arise in cases where a value can reside
          simultaneously in many priority queues.</para>
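
          <para>The single-container arrangement described above might be
          sketched as follows. This is an illustration only: the names
          <classname>scheduler</classname>, <classname>task_record</classname>,
          <function>add</function> and <function>reprioritize</function> are
          hypothetical, and a <classname>std::map</classname> is assumed as the
          associative container. Each map entry holds both the unrelated data
          (here an owner name) and the point-type iterator into the priority
          queue.</para>
          <programlisting><![CDATA[
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <ext/pb_ds/priority_queue.hpp>

// (priority, task name); the largest pair is on top.
typedef std::pair<int, std::string> entry_t;
typedef __gnu_pbds::priority_queue<entry_t> pq_t;

// Per-task record: data unrelated to the queue, plus a handle into it.
struct task_record
{
  std::string owner;
  pq_t::point_iterator handle;
};

// Hypothetical wrapper pairing one priority queue with one map.
struct scheduler
{
  pq_t queue;
  std::map<std::string, task_record> tasks;

  // Adds a task; returns false if the name is already present.
  bool add(const std::string& name, const std::string& owner, int priority)
  {
    if (tasks.find(name) != tasks.end())
      return false;
    task_record rec = { owner, queue.push(entry_t(priority, name)) };
    tasks.insert(std::make_pair(name, rec));
    return true;
  }

  // Changes a task's priority; returns false for unknown tasks.
  bool reprioritize(const std::string& name, int priority)
  {
    std::map<std::string, task_record>::iterator pos = tasks.find(name);
    if (pos == tasks.end())
      return false;
    queue.modify(pos->second.handle, entry_t(priority, name));
    return true;
  }
};
]]></programlisting>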
4836
4837         </section>
4838
4839
4840         <section xml:id="container.priority_queue.details.d">
4841           <info><title>Underlying Data Structure</title></info>
4842
          <para>There are three main implementations of priority queues: the
          first employs a binary heap, typically one which uses a
          sequence; the second uses a tree (or forest of trees), which is
          typically less structured than an associative container's tree;
          the third simply uses an associative container. These are
          shown in the graphic below, under labels A1 and A2, label B, and label C, respectively.</para>
4849
4850           <figure>
4851             <title>Underlying Priority-Queue Data-Structures.</title>
4852             <mediaobject>
4853               <imageobject>
4854                 <imagedata align="center" format="PNG" scale="100"
4855                            fileref="../images/pbds_priority_queue_different_underlying_dss.png"/>
4856               </imageobject>
4857               <textobject>
4858                 <phrase>Underlying Priority-Queue Data-Structures.</phrase>
4859               </textobject>
4860             </mediaobject>
4861           </figure>
4862
          <para>Roughly speaking, any value that is both pushed to and popped
          from a priority queue must incur a logarithmic expense (in the
          amortized sense). Any priority queue implementation that
          avoided this would violate known bounds on comparison-based
          sorting (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
          </para>
4869
          <para>Most implementations do
          not differ in the asymptotic amortized complexity of
          <function>push</function> and <function>pop</function> operations, but they differ in
          the constants involved, in the complexity of other operations
          (e.g., <function>modify</function>), and in the worst-case
          complexity of single operations. In general, the more
          "structured" an implementation (i.e., the more internal
          invariants it possesses), the higher the amortized cost
          of its <function>push</function> and <function>pop</function> operations.</para>
4879
          <para>This library implements the different algorithms using a
          single class: <classname>priority_queue</classname>.
          Instantiating the <classname>Tag</classname> template parameter "selects"
          the implementation:</para>
4884
4885           <orderedlist>
            <listitem><para>
              Instantiating <classname>Tag = binary_heap_tag</classname> creates
              a binary heap of the form represented in the graphic by labels A1 or A2. The former is internally
              selected by <classname>priority_queue</classname>
              if <classname>Value_Type</classname> is instantiated by a primitive type
              (e.g., an <type>int</type>); the latter is
              internally selected for all other types (e.g.,
              <classname>std::string</classname>). This implementation is relatively
              unstructured, and so has good <function>push</function> and <function>pop</function>
              performance; it is the "best-in-kind" for primitive
              types, e.g., <type>int</type>s. Conversely, it has
              poor worst-case performance, and can support only linear-time
            <function>modify</function> and <function>erase</function> operations.</para></listitem>
4899
            <listitem><para>Instantiating <classname>Tag =
            pairing_heap_tag</classname> creates a pairing heap of the form
            represented by label B in the graphic above. This
            implementation too is relatively unstructured, and so has good
            <function>push</function> and <function>pop</function>
            performance; it is the "best-in-kind" for non-primitive types,
            e.g., <classname>std::string</classname>s. It also has very good
            worst-case <function>push</function> and
            <function>join</function> performance (O(1)), but has high
            worst-case <function>pop</function>
            complexity.</para></listitem>
4911
            <listitem><para>Instantiating <classname>Tag =
            binomial_heap_tag</classname> creates a binomial heap of the
            form represented by label B in the graphic above. This
            implementation is more structured than a pairing heap, and so
            has worse <function>push</function> and <function>pop</function>
            performance. Conversely, it has sub-linear worst-case bounds
            (for <function>pop</function>, for example), and so it might be preferred in
            cases where responsiveness is important.</para></listitem>
4920
            <listitem><para>Instantiating <classname>Tag =
            rc_binomial_heap_tag</classname> creates a binomial heap of the
            form represented by label B in the graphic above, accompanied by a redundant
            counter which governs the trees. This implementation is
            therefore more structured than a binomial heap, and so has worse
            <function>push</function> and <function>pop</function>
            performance. Conversely, it guarantees O(1)
            <function>push</function> complexity, and so it might be
            preferred in cases where the responsiveness of a binomial heap
            is insufficient.</para></listitem>
4931
            <listitem><para>Instantiating <classname>Tag =
            thin_heap_tag</classname> creates a thin heap of the form
            represented by label B in the graphic above. This
            implementation too is more structured than a pairing heap, and
            so has worse <function>push</function> and
            <function>pop</function> performance. Conversely, it has better
            worst-case complexities than a Fibonacci heap, with identical
            amortized complexities, and so might be more appropriate for some graph
            algorithms.</para></listitem>
4941           </orderedlist>
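
          <para>Since only the <classname>Tag</classname> parameter changes,
          code can be written once and instantiated for each of the five tags
          listed above. The following is a small sketch under that assumption;
          the function name <function>drain</function> is hypothetical. It
          pushes every input value and then pops until the queue is empty, so
          each instantiation yields the same descending sequence and the
          implementations differ only in cost.</para>
          <programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: pushes every input value, then pops until empty.
// With std::less the largest value is on top, so the result is in
// descending order regardless of the Tag chosen.
template<typename Tag>
std::vector<int> drain(const std::vector<int>& input)
{
  __gnu_pbds::priority_queue<int, std::less<int>, Tag> pq;
  for (std::size_t i = 0; i < input.size(); ++i)
    pq.push(input[i]);

  std::vector<int> out;
  while (!pq.empty())
    {
      out.push_back(pq.top());
      pq.pop();
    }
  return out;
}
]]></programlisting>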
4942
          <para>Of course, one can use any order-preserving associative
          container as a priority queue, as shown under label C in the graphic above, possibly by creating an adapter class
          over the associative container (much as
          <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
          This has the advantage that no cross-referencing is necessary
          at all; the priority queue itself is an associative container.
          However, most associative containers are too structured to compete with
          priority queues in terms of <function>push</function> and <function>pop</function>
          performance.</para>
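
          <para>Such an adapter might look like the following sketch, which
          assumes <classname>std::multiset</classname> as the order-preserving
          container. The class name <classname>set_priority_queue</classname>
          and its members are hypothetical and are not part of this library.
          Because the container is itself searchable,
          <function>find</function> locates a value directly, without any
          stored iterators.</para>
          <programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical adapter giving std::multiset a priority-queue interface.
// The largest value (in the sense of Cmp) is treated as the top.
template<typename T, typename Cmp = std::less<T> >
class set_priority_queue
{
  typedef std::multiset<T, Cmp> set_type;

public:
  typedef typename set_type::iterator point_iterator;

  point_iterator push(const T& v) { return m_set.insert(v); }

  // Precondition for top() and pop(): the queue is not empty.
  const T& top() const { return *m_set.rbegin(); }

  void pop()
  {
    point_iterator last = m_set.end();
    --last;
    m_set.erase(last);
  }

  // No cross-referencing needed: the container itself is searchable.
  point_iterator find(const T& v) { return m_set.find(v); }
  point_iterator end() { return m_set.end(); }

  void erase(point_iterator it) { m_set.erase(it); }

  bool empty() const { return m_set.empty(); }
  std::size_t size() const { return m_set.size(); }

private:
  set_type m_set;
};
]]></programlisting>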
4952
4953
4954
4955         </section>
4956
4957         <section xml:id="container.priority_queue.details.traits">
4958           <info><title>Traits</title></info>
4959
          <para>It would be nice if all priority queues could
          share exactly the same behavior regardless of implementation. Sadly, this is not possible. One example is the join operation: joining
          two binary heaps might throw an exception (without corrupting
          either of the heaps on which it operates), but joining two pairing
          heaps is exception free.</para>
4965
4966           <para>Tags and traits are very useful for manipulating generic
4967           types. <classname>__gnu_pbds::priority_queue</classname>
4968           publicly defines <classname>container_category</classname> as one of the tags. Given any
4969           container <classname>Cntnr</classname>, the tag of the underlying
4970           data structure can be found via <classname>typename
4971           Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
4972           </para>
4973
4974           <figure>
4975             <title>Priority-Queue Data-Structure Tags.</title>
4976             <mediaobject>
4977               <imageobject>
4978                 <imagedata align="center" format="PNG" scale="100"
4979                  fileref="../images/pbds_priority_queue_tag_hierarchy.png"/>
4980               </imageobject>
4981               <textobject>
4982                 <phrase>Priority-Queue Data-Structure Tags.</phrase>
4983               </textobject>
4984             </mediaobject>
4985           </figure>
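
          <para>One common use of <classname>container_category</classname>
          is tag dispatching. The sketch below assumes only that the
          priority-queue tags derive from
          <classname>priority_queue_tag</classname>, as the tag hierarchy
          indicates; the functions <function>describe</function> and
          <function>describe_container</function> are hypothetical names
          introduced for illustration.</para>
          <programlisting><![CDATA[
#include <cassert>
#include <functional>
#include <string>
#include <ext/pb_ds/priority_queue.hpp>

// Overloads selected by the type of the tag argument.
inline std::string describe(__gnu_pbds::binary_heap_tag)
{ return "binary heap"; }

inline std::string describe(__gnu_pbds::pairing_heap_tag)
{ return "pairing heap"; }

// Fallback: the remaining priority-queue tags convert to their base tag.
inline std::string describe(__gnu_pbds::priority_queue_tag)
{ return "other priority queue"; }

// Hypothetical helper: dispatches on the container's category tag.
template<typename Cntnr>
std::string describe_container(const Cntnr&)
{
  typedef typename Cntnr::container_category tag;
  return describe(tag());
}
]]></programlisting>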
4986
4987
          <para>Additionally, a traits mechanism can be used to query a
          container type for its attributes. Given any container type
          <classname>Cntnr</classname>,
          <classname>__gnu_pbds::container_traits&lt;Cntnr&gt;</classname>
          is a traits class identifying the properties of the
          container.</para>
4993
          <para>To find whether joining two objects of a container type
          might throw, one can use
4996           <programlisting>
4997             container_traits&lt;Cntnr&gt;::split_join_can_throw
4998           </programlisting>
4999           </para>
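
          <para>As a brief sketch, the trait can be read as a compile-time
          constant and consulted before joining. The typedefs
          <classname>pairing_pq</classname> and
          <classname>binary_pq</classname> and the helpers
          <function>join_can_throw</function> and
          <function>absorb</function> are hypothetical names used only for
          this illustration.</para>
          <programlisting><![CDATA[
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

typedef __gnu_pbds::priority_queue<int, std::less<int>,
                                   __gnu_pbds::pairing_heap_tag> pairing_pq;
typedef __gnu_pbds::priority_queue<int, std::less<int>,
                                   __gnu_pbds::binary_heap_tag> binary_pq;

// Hypothetical helper: true if joining two Cntnr objects might throw.
template<typename Cntnr>
bool join_can_throw()
{
  return static_cast<bool>(
    __gnu_pbds::container_traits<Cntnr>::split_join_can_throw);
}

// Hypothetical helper: moves all values of 'from' into 'into',
// leaving 'from' empty.
template<typename Cntnr>
void absorb(Cntnr& into, Cntnr& from)
{ into.join(from); }
]]></programlisting>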
5000
5001           <para>
            Different priority-queue implementations have different invalidation guarantees. This is
            especially important, since iterators are the only way to access an arbitrary
            value of a priority queue. As with
5005             associative containers, one can use
5006             <programlisting>
5007               container_traits&lt;Cntnr&gt;::invalidation_guarantee
5008             </programlisting>
5009           to get the invalidation guarantee type of a priority queue.</para>
5010
          <para>The graphic of underlying data structures above makes it easy to see what <classname>container_traits&lt;Cntnr&gt;::invalidation_guarantee</classname>
          will be for different implementations. All implementations of
          the type represented by label B have <classname>point_invalidation_guarantee</classname>:
          the container can freely reorganize its nodes internally, so
          range-type iterators are invalidated, but point-type iterators
          remain valid. Implementations of the type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
          the container can freely reallocate its array internally, so both
          point-type and range-type iterators might be invalidated.</para>
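
          <para>Generic code can test this guarantee at compile time. The
          following sketch assumes a C++11 compiler (for
          <classname>std::is_base_of</classname>) and relies on
          <classname>point_invalidation_guarantee</classname> being a base of
          any stronger guarantee; the function name
          <function>point_iterators_stay_valid</function> is hypothetical.</para>
          <programlisting><![CDATA[
#include <cassert>
#include <functional>
#include <type_traits>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: true if Cntnr promises that point-type iterators
// remain valid while the container is modified.
template<typename Cntnr>
bool point_iterators_stay_valid()
{
  typedef typename __gnu_pbds::container_traits<Cntnr>::invalidation_guarantee
    guarantee;
  return std::is_base_of<__gnu_pbds::point_invalidation_guarantee,
                         guarantee>::value;
}
]]></programlisting>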
5019
5020           <para>
            This has major implications, and constitutes a good reason to avoid
            using binary heaps when arbitrary values must be modified or erased. A binary heap can perform <function>modify</function>
            or <function>erase</function> efficiently given a valid point-type
            iterator. However, since previously stored point-type iterators might
            have been invalidated, obtaining a valid one requires iterating (linearly) over all
            values and then supplying the relevant iterator (recall that a
            range-type iterator can always be converted to a point-type
            iterator). This means that if the number of <function>modify</function> or
            <function>erase</function> operations is non-negligible (say
            super-logarithmic in the total sequence of operations), binary
            heaps will perform badly.
5032           </para>
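
          <para>The linear search described above might be sketched as
          follows, assuming a binary heap of integers. The typedef
          <classname>binary_pq</classname> and the function
          <function>replace_value</function> are hypothetical. The range-type
          iterator found by the scan is converted to a point-type iterator and
          passed to <function>modify</function> immediately, before any other
          operation can invalidate it.</para>
          <programlisting><![CDATA[
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

typedef __gnu_pbds::priority_queue<int, std::less<int>,
                                   __gnu_pbds::binary_heap_tag> binary_pq;

// Hypothetical helper: replaces one occurrence of old_value with
// new_value. The scan is linear in the number of stored values.
// Returns false if old_value is not present.
bool replace_value(binary_pq& pq, int old_value, int new_value)
{
  for (binary_pq::iterator it = pq.begin(); it != pq.end(); ++it)
    if (*it == old_value)
      {
        // Range-type iterators convert to point-type iterators.
        binary_pq::point_iterator pt = it;
        pq.modify(pt, new_value);
        // Stop here: 'it' must not be used after the modification.
        return true;
      }
  return false;
}
]]></programlisting>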
5033
5034         </section>
5035
5036       </section> <!-- details -->
5037
5038     </section> <!-- priority_queue -->
5039
5040
5041
5042   </section> <!-- container -->
5043
5044   </section> <!-- design -->
5045
5046
5047
5048   <!-- S04: Test -->
5049   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5050               href="test_policy_data_structures.xml">
5051   </xi:include>
5052
5053   <!-- S05: Reference/Acknowledgments -->
5054   <section xml:id="pbds.ack">
5055     <info><title>Acknowledgments</title></info>
5056     <?dbhtml filename="policy_data_structures_ack.html"?>
5057
5058     <para>
5059       Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5060       Laboratories), and Benjamin Kosnik (Red Hat).
5061     </para>
5062
5063     <para>
5064       This library was partially written at IBM's Haifa Research Labs.
5065       It is based heavily on policy-based design and uses many useful
5066       techniques from Modern C++ Design: Generic Programming and Design
5067       Patterns Applied by Andrei Alexandrescu.
5068     </para>
5069
5070     <para>
5071       Two ideas are borrowed from the SGI-STL implementation:
5072     </para>
5073
5074     <orderedlist>
5075       <listitem>
5076         <para>
5077           The prime-based resize policies use a list of primes taken from
5078           the SGI-STL implementation.
5079         </para>
5080       </listitem>
5081
5082       <listitem>
5083         <para>
5084           The red-black trees contain both a root node and a header node
5085           (containing metadata), connected in a way that forward and
5086           reverse iteration can be performed efficiently.
5087         </para>
5088       </listitem>
5089     </orderedlist>
5090
5091     <para>
5092       Some test utilities borrow ideas from
5093       <link xmlns:xlink="http://www.w3.org/1999/xlink"
5094             xlink:href="http://www.boost.org/libs/timer/">boost::timer</link>.
5095     </para>
5096
5097     <para>
5098       We would like to thank Scott Meyers for useful comments (without
5099       attributing to him any flaws in the design or implementation of the
5100       library).
5101     </para>
5102     <para>We would like to thank Matt Austern for the suggestion to
5103     include tries.</para>
5104   </section>
5105
5106   <!-- S06: Biblio -->
5107 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5108             href="policy_data_structures_biblio.xml">
5109 </xi:include>
5110
5111 </chapter>