Dead
[official-gcc.git] / gomp-20050608-branch / libstdc++-v3 / docs / html / ext / pb_assoc / hash_based_containers.html
blob9fea0e6d16847ea7410d8427c94408a17399f665
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2 <html>
3 <head>
4 <title>Hash-Based Containers</title>
5 <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
6 <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
7 </head>
9 <body bgcolor = "white">
11 <h1>Hash-Based Containers</h1>
13 <p>
14 This section describes hash-based containers. It is organized
15 as follows.
16 </p>
18 <ol>
19 <li>
20 <a href = "#overview">Overview</a> is an overview.
21 </li>
22 <li>
23 <a href = "#hash_policies">Hash Policies</a> discusses
24 hash policies.
25 </li>
26 <li>
27 <a href = "#resize_policies">Resize Policies</a> discusses
28 resize policies.
29 </li>
30 <li>
31 <a href = "#policy_interaction">Policy Interaction</a> discusses
32 interaction between policies.
33 </li>
34 </ol>
38 <h2><a name = "overview">Overview</a></h2>
41 <p>
42 Figure
43 <a href = "#hash_cd">Hash-based containers</a>
44 shows the container-hierarchy; the hash-based containers are circled.
45 <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
46 is a collision-chaining hash-based container;
47 <a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
48 is a (general) probing hash-based container.
49 </p>
51 <h6 align = "center">
52 <a name = "hash_cd">
53 <img src = "hash_cd.jpg" width = "70%" alt = "no image">
54 </h6>
55 <h6 align = "center">
56 </a>
57 Hash-based containers.
58 </h6>
60 <p>
61 The collision-chaining hash-based container has the following declaration.
62 </p>
63 <pre>
64 <b>template</b>&lt;
65 <b>typename</b> Key,
66 <b>typename</b> Data,
67 <b>class</b> Hash_Fn = std::hash&lt;Key&gt;,
68 <b>class</b> Eq_Fn = std::equal_to&lt;Key&gt;,
69 <b>class</b> Comb_Hash_Fn =
70 <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
71 <b>class</b> Resize_Policy = <i>default explained below.</i>
72 <b>bool</b> Store_Hash = <b>false</b>,
73 <b>class</b> Allocator =
74 std::allocator&lt;<b>char</b>&gt; &gt;
75 <b>class</b> <a href = "cc_hash_assoc_cntnr.html">cc_hash_assoc_cntnr</a>;
76 </pre>
78 <p>
79 The parameters have the following meaning:
80 </p>
81 <ol>
82 <li> <tt>Key</tt> is the key type.
83 </li>
84 <li> <tt>Data</tt> is the data-policy, and is explained in
85 <a href = "ms_gen.html#ds_policy">Mapping-Semantics Genericity::Data Types as a Policy</a>.
86 </li>
87 <li> <tt>Hash_Fn</tt> is a key hashing functor.</li>
88 <li> <tt>Eq_Fn</tt> is a key equivalence functor.</li>
89 <li> <tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; it
90 describes how to translate hash values into positions within the table.
91 This is described in
92 <a name = "#hash_policies">Hash Policies</a>.</li>
93 </li>
94 <li> <tt>Resize_Policy</tt> describes how a container object should
95 change its internal size. This is described in
96 <a name = #resize_policies">Resize Policies</a>.</li>
97 <li> <tt>Store_Hash</tt> indicates whether the hash value should
98 be stored with each entry. This is described in
99 <a name = "#policy_interaction">Policy Interaction</a>.</li>
100 <li> <tt>Allocator</tt> is (surprisingly) an allocator type.
101 </li>
102 </ol>
105 The probing hash-based container has the following declaration.
106 </p>
107 <pre>
108 <b>template</b>&lt;
109 <b>typename</b> Key,
110 <b>typename</b> Data,
111 <b>class</b> Hash_Fn =
112 std::hash&lt;
113 Key&gt;,
114 <b>class</b> Eq_Fn =
115 std::equal_to&lt;
116 Key&gt;,
117 <b>class</b> Comb_Probe_Fn =
118 <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
119 <b>class</b> Probe_Fn = <i>default explained below.</i>
120 <b>class</b> Resize_Policy = <i>default explained below.</i>
121 <b>bool</b> Store_Hash = <b>false</b>,
122 <b>class</b> Allocator =
123 std::allocator&lt;<b>char</b>&gt; &gt;
124 <b>class</b> <a href = "gp_hash_assoc_cntnr.html">gp_hash_assoc_cntnr</a>;
125 </pre>
128 The parameters are identical to those of the collision-chaining container, except
129 for the following.
130 </p>
131 <ol>
132 <li> <tt>Comb_Probe_Fn</tt> describes how to transform a probe sequence into
133 a sequence of positions within the table.
134 </li>
135 <li> <tt>Probe_Fn</tt> describes a probe sequence policy.</li>
136 </ol>
140 Some of the default template values depend on the values of other parameters,
141 and are explained in
142 <a name = "#policy_interaction">Policy Interaction</a>.
143 </p>
145 <h2><a name = "hash_policies">Hash Policies</h2></a>
147 This subsection describes hash policies. It is organized as follows:
148 </p>
149 <ol>
150 <li> <a href = "#general_terms">General Terms</a> describes
151 some general terms.
152 </li>
153 <li> <a href = "#range_hashing_fns">Range-Hashing Functions</a>
154 describes range-hasing functions.</li>
155 <li> <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a>
156 describes ranged-hash functions. </li>
157 <li> <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a>
158 describes the implementation of these concepts in <tt>pb_assoc</tt>.
159 </li>
160 </ol>
163 <h3><a name="general_terms">General Terms</a></h3>
166 There
167 are actually three functions involved in transforming a key into a hash-table's
168 position (see Figure
169 <a href = "#hash_ranged_hash_range_hashing_fns">
170 Hash runctions, ranged-hash functions, and range-hashing functions
171 </a>):
172 </p>
173 <ol>
174 <li>
175 A <i>ranged-hash</i> function, which maps keys into an interval of the
176 non-negative integrals. This is the function actually required by the
177 hash-table algorithm.
178 </li>
179 <li>
180 A hash function, which maps keys into non-negative integral types. This is
181 typically specified by the writer of the key class.
182 </li>
183 <li>
184 A <i>range-hashing</i> function, which maps non-negative integral types into an
185 interval of non-negative integral types.
186 </li>
187 </ol>
189 <h6 align = "center">
190 <a name = "hash_ranged_hash_range_hashing_fns">
191 <img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "70%" alt = "no image">
192 </h6>
193 <h6 align = "center">
194 </a>
195 Hash runctions, ranged-hash functions, and range-hashing functions.
196 </h6>
199 Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3
200 characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly"
201 into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral
202 value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i>
203 function
204 </p>
206 <i>f : U &times; Z<sub>+</sub> &rarr; Z<sub>+</sub> </i>,
207 </p>
209 such that for any <i>u</i> in <i>U</i>
211 </p>
213 <i>0 &le; f(u, m) &le; m - 1 </i>,
214 </p>
216 and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>].
217 One common solution is to use the composition of the hash function
218 </p>
220 <i>h : U &rarr; Z<sub>+</sub> </i>,
221 </p>
223 which maps elements of <i>U</i> into the non-negative integrals, and
224 </p>
226 <i>g : Z<sub>+</sub> &times; Z<sub>+</sub> &rarr; Z<sub>+</sub>, </i>
227 </p>
229 which maps a non-negative hash value, and a non-negative range upper-bound into
230 a non-negative integral in the range between 0 (inclusive) and the range upper
231 bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,
232 </p>
234 <i>0 &le; g(r, m) &le; m - 1 </i>.
235 </p>
237 The resulting ranged-hash function, is
238 </p>
240 <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a>
241 </i>(1) .
242 </p>
245 From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can
246 always be composed (however the converse is not true). The STL's hash-based
247 containers allow specifying a hash function, and use a hard-wired range-hashing function; the ranged-hash function is implicitly composed.
248 </p>
252 The above describes the case where a key is to be mapped into a <i>single
253 position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table.
254 In other cases, a key is to be mapped into a <i>sequence of poisitions</i>
255 within a table, <i>e.g.</i>, in a probing table.
256 </p>
258 Similar terms apply in this case: the table requires a <i>ranged probe</i>
259 function, mapping a key into a sequence of positions withing the table. This is
260 typically acheived by composing a <i>hash function</i> mapping the key
261 into a non-negative integral type, a <i>probe</i> function transforming the
262 hash value into a sequence of hash values, and a <i>range-hashing</i> function
263 transforming the sequence of hash values into a sequence of positions.
264 </p>
267 <h3><a name="range_hashing_fns">Range-Hashing Functions</a></h3>
270 Some common choices for range-hashing functions are the division,
271 multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>],
272 defined as
273 </p>
275 <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) ,
276 </p>
278 <i>g(r, m) = &lceil; u/v ( a r mod v ) &rceil; </i>,
279 </p>
282 </p>
284 <i>g(r, m) = &lceil; u/v ( r<sup>2</sup> mod v ) &rceil; </i>,
285 </p>
287 respectively, for some positive integrals <i>u</i> and <i>v</i> (typically
288 powers of 2), and some <i>a</i>. Each of these range-hashing functions works
289 best for some different setting.
290 </p>
292 The division method <a href="#division_method">(2)</a> is a very common
293 choice. However, even this single method can be implemented in two very
294 different ways. It is possible to implement <a href="#division_method">(2)</a>
295 using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low
296 level <i>&amp;</i> (bit-mask) operation (for the case where <i>m</i> is a power of
297 2), <i>i.e.</i>,
298 </p>
300 <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) ,
301 </p>
304 </p>
306 <a name="eqn:division_method_bit_mask">
307 <i>g(r, m) = r &amp; m - 1, ( m = 2<sup>k</sup>
308 </i>
309 for some<i> k) </i></a>(4) ,
310 </p>
312 respectively.
313 </p>
315 The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a>
316 has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i>
317 is affected by all the bits of <i>r</i> (minimizing the chance of collision).
318 It has the disadvantage of using the costly modulo operation. This method is
319 hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>].
320 </p>
323 The <i>&amp;</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a>
324 has the advantage of relying on the fast bitwise and operation. It has the
325 disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>.
326 This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>].
327 </p>
332 <h3><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h3>
335 In some less frequent cases it is beneficial to allow the client to
336 directly specify a ranged-hash hash function. It is true, that the writer of
337 the ranged-hash function cannot rely on the values of <i>m</i> having specific
338 numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]),
339 since the values of <i>m</i> are determined by a resize policy with possibly
340 orthogonal considerations.
341 </p>
344 There are two cases where a ranged-hash function can be superior. The firs is when using perfect hashing
345 [<a href="references.html#knuth98sorting">knuth98sorting</a>]; the second
346 is when the values of <i>m</i> can be used to estimate the
347 "general" number of distinct values required. This is described in the following.
348 </p>
352 </p>
355 <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i>
356 </p>
359 be a string of <i>t</i> characters, each of which is from domain <i>S</i>.
360 Consider the following ranged-hash function:
361 </p>
364 <a name="eqn:total_string_dna_hash">
366 f<sub>1</sub>(s, m) =
367 &sum; <sub>i =
368 0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i>
369 </a> (5) ,
370 </p>
373 where <i>a</i> is some non-negative integral value. This is the standard
374 string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>].
375 Its advantage is that it takes into account all of the characters of the
376 string.
377 </p>
380 Now assume that <i>s</i> is the string representation of a of a long DNA
381 sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the
382 entire string might be prohibitively expensive. A possible alternative might be
383 to use only the first <i>k</i> characters of the string, where
384 </p>
387 k <sup>|S|</sup> &ge; m ,
388 </p>
390 <i>i.e.</i>, using the hash function
391 </p>
393 <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k
394 - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6)
395 </p>
397 requiring scanning over only
398 </p>
400 <i>k = </i>log<i><sub>4</sub>( m ) </i>
401 </p>
403 characters.
404 </p>
406 Other more elaborate hash-functions might scan <i>k</i> characters starting at
407 a random position (determined at each resize), or scanning <i>k</i> random
408 positions (determined at each resize), <i>i.e.</i>, using
409 </p>
411 <i>f<sub>3</sub>(s, m) = &sum; <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup>
412 s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>,
413 </p>
416 </p>
418 <i>f<sub>4</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub>
419 a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>,
420 </p>
423 respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the
424 (inclusive) range <i>[0,...,t-1]</i>.
425 </p>
428 <h3><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h3>
431 <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is
432 parameterized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a hash functor
433 and a combining hash functor, respectively.
434 </p>
437 For any hash functor except <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
438 one of the
439 <a href = "concepts.html#concepts_null_policies">Concepts::Null Policy Classes</a>,
440 then <tt>Comb_Hash_Fn</tt> is considered a range-hashing functor.
441 The container will synthesize a ranged-hash functor from both. For example, Figure
442 <a href = "#hash_range_hashing_seq_diagram">
443 Insert hash sequence diagram
444 </a>
445 shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
446 the container transforms the key into a non-negative integral using the hash
447 functor (points B and C), and transforms the result into a position
448 using the combining functor (points D and E).
449 </p>
451 <h6 align = "center">
452 <a name = "hash_range_hashing_seq_diagram">
453 <img src = "hash_range_hashing_seq_diagram.jpg" width = "70%" alt = "no image">
454 </a>
455 </h6>
456 <h6 align = "center">
457 Insert hash sequence diagram.
458 </h6>
462 <tt>pb_assoc</tt> contains the following range-hashing policies:
463 </p>
465 <ol>
466 <li>
467 <a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
469 <a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
470 are range-hashing functions based on a bit-mask and a modulo operation, respectively.
471 </li>
472 </ol>
476 If <tt>Comb_Hash_Fn</tt> is instantiated by
477 <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
478 and a combining-hash functor, the container treats
479 the combining hash functor as a ranged-hash function. For example, Figure
480 <a href = "#hash_range_hashing_seq_diagram2">
481 Insert hash sequence diagram with a null combination policy
482 </a>
483 shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
484 the container transforms the key into a position
485 using the combining functor (points B and C).
486 </p>
489 <h6 align = "center">
490 <a name = "hash_range_hashing_seq_diagram2">
491 <img src = "hash_range_hashing_seq_diagram2.jpg" width = "70%" alt = "no image">
492 </a>
493 </h6>
494 <h6 align = "center">
495 Insert hash sequence diagram with a null combination policy.
496 </h6>
499 Similarly,
500 <a href = "gp_hash_assoc_cntnr.html"></a><tt>gp_hash_assoc_cntnr</tt>
501 is parameterized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and
502 <tt>Comb_Probe_Fn</tt>. As before, if <tt>Probe_Fn</tt>
503 and <tt>Comb_Probe_Fn</tt> are, respectively,
504 <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and
505 <a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>,
506 then <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, <tt>Hash_Fn</tt>
507 is a hash functor, <tt>Probe_Fn</tt> is a functor for offsets from a hash value,
508 and <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a sequence of positions
509 within the table.
510 </p>
513 <tt>pb_assoc</tt> contains the following probe policies:
514 </p>
516 <ol>
517 <li>
518 <a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> is a linear probe
519 function.
520 </li>
521 <li>
522 <a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> is
523 a quadratic probe function.
524 </li>
525 </ol>
535 <h2><a name = "resize_policies">Resize Policies</a></h2>
538 This subsection describes resize policies. It is organized as follows:
539 </p>
541 <ol>
542 <li> <a href = "#general">General Terms</a> describes general
543 terms.
544 </li>
545 <li> <a href = "#size_policies">Size Policies</a> describes size
546 policies.
547 </li>
548 <li> <a href = "#trigger_policies">Trigger Policies</a> describes trigger
549 policies.
550 </li> <li> <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a>
551 describes the implementation of these concepts in <tt>pb_assoc</tt>.
552 </li>
553 </ol>
556 <h3><a name = "general">General Terms</a></h3>
559 Hash-tables, as opposed to trees, do not naturally grow or shrink. It
560 is necessary to specify policies to determine how and when a hash table should change
561 its size.
562 </p>
565 In general, resize policies can be decomposed into (probably orthogonal)
566 policies:
567 </p>
568 <ol>
569 <li> A <i>size policy</i> indicating <i>how</i> a hash table should
570 grow (<i>e.g.,</i> it should multiply by powers of 2).
571 </li>
572 <li> A <i>trigger policy</i> indicating <i>when</i> a hash table should
573 grow (<i>e.g.,</i> a load factor is exceeded).
574 </li>
575 </ol>
579 <h3><a name = "size_policies">Size Policies</a></h3>
582 Size policies determine how a hash table
583 changes size. These policies are simple, and there are relatively
584 few sensible options. An exponential-size policy (with the initial
585 size and growth factors both powers of 2) works well with a
586 mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
587 and is the
588 hard-wired policy used by Dinkumware
589 [<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A
590 prime-list based policy works well with a modulo-prime range
591 hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
592 and is the
593 hard-wired policy used by SGI's implementation
594 [<a href = "references.html#sgi_stl">sgi_stl</a>].
595 </p>
600 <h3><a name = "trigger_policies">Trigger Policies</a></h3>
603 Trigger policies determine when a hash table changes size.
604 Following is a description of two polcies: <i>load-check</i>
605 policies, and a collision-check policies.
606 </p>
609 Load-check policies are straightforward. The user
610 specifies two factors, <i>&alpha;<sub>min</sub></i> and <i>&alpha;<sub>max</sub></i>, and
611 the hash table maintains the invariant that
612 </p>
615 <a name = "eqn:load_factor_min_max">
616 &alpha;<sub>min</sub>
617 &le;
618 (number of stored elements) / (hash-table size)
619 &le;
620 &alpha;<sub>max</sub>
621 </a>
622 </i>
625 </p>
628 Collision-check policies work in the opposite direction of
629 load-check policies. They focus on keeping the number of
630 collisions moderate and hoping
631 that the size of the table will not grow very large,
632 instead of keeping a moderate load-factor and
633 hoping that the number of collisions will be small.
635 maximal collision-check policy resizes when the shortest
636 probe-sequence grows too large.
637 </p>
641 Consider Figure
642 <a href = "#balls_and_bins">Balls and bins</a>.
643 Let the size of the hash table be denoted by <i>m</i>, the
644 length of a probe sequence be denoted by <i>k</i>, and some load
645 factor be denoted by &alpha;. We would like to calculate the
646 minimal length of <i>k</i>, such that if there were <i>&alpha; m</i> elements
647 in the hash table, a probe sequence of length <i>k</i> would be found
648 with probability at most <i>1/m</i>.
649 </p>
651 <h6 align = "center">
652 <a name = "balls_and_bins">
653 <img src = "balls_and_bins.jpg" width = "70%" alt = "no image">
654 </a>
655 </h6>
656 <h6 align = "center">
657 Balls and bins.
658 </h6>
662 Denote the probability that a probe sequence of length <i>k</i>
663 appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence
664 of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution.
665 Then
666 </p>
668 <a name = "eqn:prob_of_p1">
669 <i>p<sub>1</sub>
670 = </i>(3)
671 </a>
672 </p>
675 <b>P</b>(l<sub>1</sub> &ge; k)
677 </i>
678 </p>
680 <i><b>P</b>(l<sub>1</sub> &ge; &alpha; ( 1 + k / &alpha; - 1 )
681 &le; </i>(a)
682 </p>
690 &alpha; ( k / &alpha; - 1 )<sup>2</sup>
694 </i>
696 </p>
698 where (a) follows from the Chernoff bound
699 [<a href = "references.html#motwani95random">motwani95random</a>].
701 calculate the probability that <i>some</i> bin contains a probe
702 sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are
703 negatively-dependent
704 [<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
705 Let <i><b>I</b>(.)</i>
706 denote the indicator function. Then
708 <a name = "eqn:at_least_k_i_n_some_bin">
709 <i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> &ge; k )
710 = </i>(3)
711 </a>
712 </p>
715 <b>P</b>
717 &sum; <sub>i = 1</sub><sup>m</sup>
718 <b>I</b>(l<sub>i</sub> &ge; k) &ge; 1
721 </i>
722 </p>
725 <b>P</b>
727 &sum; <sub>i = 1</sub><sup>m</sup>
728 <b>I</b>
730 l<sub>i</sub> &ge; k
732 &ge;
733 m p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 )
735 &le; </i>(a)
736 </p>
744 m p<sub>1</sub>
746 1 / (m p<sub>1</sub>) - 1
748 <sup>2</sup>
754 </i>
755 </p>
757 where (a) follows from the fact that the Chernoff bound can be
758 applied to negatively-dependent variables
759 [<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
760 Inserting <a href = "#prob_of_p1">(2)</a> into
761 <a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>,
762 we obtain
763 </p>
768 &radic;
770 2 &alpha; </i>ln<i> 2 m </i>ln<i>(m) )
772 </i>
774 </p>
784 <h3><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h3>
787 The resize policies in the previous subsection are conceptually straightforward. The design
788 of hash-based containers' size-related interface is complicated by some factors.
789 </p>
790 <ol>
791 <li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no
792 distinction between the number of entries the container holds and the number of entries it is using. This,
793 of course, is not the case for hash-based containers. Moreover, even describing the
794 "actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container
795 holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries.
796 </li>
797 <li>
798 The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy
799 maintains an invariant concerning the load factor of a container object. This is sometimes too rigid:
800 <ol>
801 <li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container).
802 </li>
803 <li>
804 In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy.
805 </li>
806 </ol>
807 </li>
808 <li>
809 Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a>
810 discusses the previous concepts.)
811 </li>
812 </ol>
815 Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers.
816 </p>
821 This library attempts to address these types of problems by delegating all size-related functionality to
822 policy classes. Hash-based containers
823 are parameterized by a resize-policy class (among others), and derive publicly from
824 the resize-policy class
825 [<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]
826 <i>E.g.</i>, a collision-chaining
827 hash table is defined as follows:
828 </p>
829 <pre>
830 cc_ht_map&lt;
831 <b>class</b> Key,
832 <b>class</b> Data,
834 <b>class</b> Resize_Policy
835 ...&gt; :
836 <b>public</b> Resize_Policy
837 </pre>
840 The containers themselves lack any functionality or public interface for manipulating sizes. A container
841 object merely forwards events to its resize policy object and queries it for needed actions.
842 </p>
845 Figure
846 <a href = "#insert_resize_sequence_diagram1">
847 Insert resize sequence diagram
848 </a>
849 shows a (possible) sequence diagram of an insert operation.
850 The user inserts an element; the hash table
851 notifies its resize policy that a search has started (point A);
852 in this case, a single collision is encountered - the table
853 notifies its resize policy of this (point B); the container
854 finally notifies its resize policy that the search has ended (point C);
855 it then queries its resize policy whether a resize is needed, and if so,
856 what is the new size (points D to G); following the resize, it notifies
857 the policy that a resize has completed (point H); finally, the element
858 is inserted, and the policy notified (point I).
859 </p>
861 <h6 align = "center">
862 <a name = "insert_resize_sequence_diagram1">
863 <img src = "insert_resize_sequence_diagram1.jpg" width = "70%" alt = "no image">
864 </a>
865 </h6>
866 <h6 align = "center">
867 Insert resize sequence diagram.
868 </h6>
871 This addresses, to some extent, the problems mentioned above:
872 </p>
873 <ol>
874 <li>
875 Different instantiations of range-hashing policies can be met with different instantiations of
876 resize policies.
877 </li>
878 <li>
879 Questions on size-related interface are avoided, since the containers have no size-related methods. Thus
880 a container has no method for querying its actual size. It merely continuously forwards enough information to
881 its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design the appropriate method. Also, a container has no methods for setting its size. It merely queries its
882 resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and
883 supports a <tt><b>protected virtual</b></tt> function for external resize.
884 </li>
885 </ol>
888 The library contains a single class for instantiating a resize policy,
889 <tt>pb_assoc</tt> contains
890 a standard resize policy,
891 <a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a> (the name is explained shortly).
892 In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports
893 queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility).
894 ([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for
895 changing between alternative interfaces at compile time.)
896 </p>
899 As noted before,
900 size and trigger policies are usually orthogonal.
901 <a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a>
902 is parameterized by size and trigger policies. For example,
903 a collision-chaining hash table
904 is typically be defined as follows:
905 </p>
906 <pre>
907 cc_ht_map&lt;
908 key,
909 data,
911 ht_standard_resize_policy&lt;
912 some_trigger_policy,
913 some_size_policy,
914 ...&gt; &gt;
915 </pre>
918 The sole function of
919 <a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a>
920 is to
921 act as a standard delegator
922 [<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these
923 policies.
926 Figures
927 <a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a>
929 <a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a>
930 show sequence diagrams illustrating the interaction between
931 the standard resize policy and its trigger and size policies, respectively.
932 </p>
934 <h6 align = "center">
935 <a name = "insert_resize_sequence_diagram2">
936 <img src = "insert_resize_sequence_diagram2.jpg" width = "70%" alt = "no image">
937 </a>
938 </h6>
939 <h6 align = "center">
940 Standard resize policy trigger sequence diagram.
941 </h6>
943 <h6 align = "center">
944 <a name = "insert_resize_sequence_diagram3">
945 <img src = "insert_resize_sequence_diagram3.jpg" width = "70%" alt = "no image">
946 </a>
947 </h6>
948 <h6 align = "center">
949 Standard resize policy size sequence diagram.
950 </h6>
953 The library (currently) supports the following instantiations of size
954 and trigger policies:
955 </p>
957 <ol>
958 <li>
959 <a href = "ht_load_check_trigger.html"><tt>ht_load_check_trigger</tt></a> implements
960 a load check trigger policy.
961 </li>
962 <li>
963 <a href = "ht_max_collision_check_grow_resize_trigger.html"><tt>ht_max_collision_check_grow_resize_trigger</tt></a>
964 implements a collision check trigger policy.
965 </li>
966 <li>
967 <a href = "ht_exponential_size_policy.html"><tt>ht_exponential_size_policy</tt></a> implemens
968 an exponential-size policy (which should be used with mask range hashing).
969 </li>
970 <li>
971 <a href = "ht_prime_size_policy.html"><tt>ht_prime_size_policy</tt></a> implementing
972 a size policy based on a sequence of primes
973 [<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing
974 </li>
975 </ol>
978 The trigger policies also support interfaces for changing their specifics which depend on compile time constants.
979 </p>
983 Figure
984 <a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture
985 of the resize-related classes.
986 <tt>Container</tt> (which stands for any of the hash-based containers) is parameterized
987 by <tt>Resize_Policy</tt>, from which it subclasses publicly
988 [<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>].
989 This class is currently instantiated only by
990 <a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a>.
991 <a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a> itself
992 is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>.
993 Currently, <tt>Trigger_Policy</tt> is instantiated by
994 <a href = "ht_load_check_trigger.html"><tt>ht_load_check_trigger</tt></a>,
996 <a href = "ht_max_collision_check_grow_resize_trigger.html"><tt>ht_max_collision_check_grow_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by
997 <a href = "ht_exponential_size_policy.html"><tt>ht_exponential_size_policy</tt></a>,
999 <a href = "ht_prime_size_policy.html"><tt>ht_prime_size_policy</tt></a>.
1000 </p>
1003 <h6 align = "center">
1004 <a name = "resize_policy_cd">
1005 <img src = "resize_policy_cd.jpg" width = "70%" alt = "no image">
1006 </a>
1007 </h6>
1008 <h6 align = "center">
1009 Resize policy class diagram.
1010 </h6>
1015 <h2><a name = "#policy_interaction">Policy Interaction</a></h2>
1018 Hash-tables are unfortunately susceptible to choice of policies. One
1019 of the more complicated aspects of this is that poor combinations of good policies
1020 can alter performance drastically. Following are some considerations.
1021 </p>
1027 <h3>Range-Hashing Policies and Resize Policies</h3>
1030 </p>
1033 <h3>Equivalence Functors, Storing Hash Values, and Hash Functions</h3>
1037 <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
1039 <a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
1040 are parameterized by an equivalenc functor and by a <tt>Store_Hash</tt>
1041 parameter. If the latter parameter is <tt><b>true</b></tt>, then
1042 the container stores with each entry a hash value, and uses
1043 this value in case of collisions to determine whether to apply a hash value.
1044 This can lower the cost of collision for some types, but increase the cost of collisions for other types.
1045 </p>
1048 If a ranged-hash function or ranged probe function is directly supplied, however,
1049 then it makes no sense to store the hash value with each entry. <tt>pb_assoc</tt>'s container will fail at compilation, by design, if this is attempted.
1050 </p>
1054 </body>
1056 </html>