old-autovect-branch/libstdc++-v3/docs/html/ext/pb_assoc/hash_policies.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
   2 <html>
   3 <head>
   4     <title>Hash Policies</title>
   5     <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
   6     <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
   7 </head>
   8 <body bgcolor="white">
   9
  10 <h1>Hash Policies</h1>
  11 <p>
  12     This subsection describes hash policies. It is organized as follows:
  13 </p>
  14 <ol>
  15     <li> The <a href = "#general_terms">General Terms</a> Section describes
  16             some general terms.
  17     </li>
  18     <li> The <a href = "#range_hashing_fns">Range-Hashing Functions</a> Section
  19         describes range-hasing functions.</li>
  20     <li> The <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a> Section
  21         describes ranged-hash functions. </li>
  22     <li> The <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a> Section
  23             describes the implementation of these concepts in <tt>pb_assoc</tt>.
  24     </li>
  25 </ol>
  26
  27
  28 <h2><a name="general_terms">General Terms</a></h2>
  29
  30 <p>
  31     There
  32 are actually three functions involved in transforming a key into a hash-table's
  33 position (see Figure
  34 <a href = "#hash_ranged_hash_range_hashing_fns">
  35 Hash runctions, ranged-hash functions, and range-hashing functions
  36 </a>):
  37 </p>
  38 <ol>
  39     <li>
  40         A <i>ranged-hash</i> function, which maps keys into an interval of the
  41         non-negative integrals. This is the function actually required by the
  42         hash-table algorithm.
  43     </li>
  44     <li>
  45         A hash function, which maps keys into non-negative integral types. This is
  46         typically specified by the writer of the key class.
  47     </li>
  48     <li>
  49         A <i>range-hashing</i> function, which maps non-negative integral types into an
  50         interval of non-negative integral types.
  51     </li>
  52 </ol>
  53
  54 <h6 align = "center">
  55 <a name = "hash_ranged_hash_range_hashing_fns">
  56 <img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "40%" alt = "no image">
  57 </a>
  58 Hash runctions, ranged-hash functions, and range-hashing functions.
  59 </h6>
  60
  61 <p>
  62     Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3
  63     characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly"
  64     into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral
  65     value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i>
  66     function
  67 </p>
  68 <p>
  69     <i>f : U &times; Z<sub>+</sub> &rarr; Z<sub>+</sub> </i>,
  70 </p>
  71 <p>
  72     such that for any <i>u</i> in <i>U</i>
  73 ,
  74 </p>
  75 <p>
  76     <i>0 &le; f(u, m) &le; m - 1 </i>,
  77 </p>
  78 <p>
  79     and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>].
  80     One common solution is to use the composition of the hash function
  81 </p>
  82 <p>
  83     <i>h : U &rarr; Z<sub>+</sub> </i>,
  84 </p>
  85 <p>
  86     which maps elements of <i>U</i> into the non-negative integrals, and
  87 </p>
  88 <p>
  89     <i>g : Z<sub>+</sub> &times; Z<sub>+</sub> &rarr; Z<sub>+</sub>, </i>
  90 </p>
  91 <p>
  92     which maps a non-negative hash value, and a non-negative range upper-bound into
  93     a non-negative integral in the range between 0 (inclusive) and the range upper
  94     bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,
  95 </p>
  96 <p>
  97     <i>0 &le; g(r, m) &le; m - 1 </i>.
  98 </p>
  99 <p>
 100     The resulting ranged-hash function, is
 101 </p>
 102 <p>
 103     <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a>
 104     </i>(1) .
 105 </p>
 106
 107 <p>
 108     From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can
 109     always be composed (however the converse is not true).
 110 </p>
 111
 112
 113 <p>
 114     The above describes the case where a key is to be mapped into a <i>single
 115 position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table.
 116 In other cases, a key is to be mapped into a <i>sequence of poisitions</i>
 117 within a table, <i>e.g.</i>, in a probing table.
 118 </p>
 119 <p>
 120     Similar terms apply in this case: the table requires a <i>ranged probe</i>
 121 function, mapping a key into a sequence of positions withing the table. This is
 122 typically acheived by composing a <i>hash function</i> mapping the key
 123 into a non-negative integral type, a <i>probe</i> function transforming the
 124 hash value into a sequence of hash values, and a <i>range-hashing</i> function
 125 transforming the sequence of hash values into a sequence of positions.
 126 </p>
 127
 128
 129 <h2><a name="range_hashing_fns">Range-Hashing Functions</a></h2>
 130
 131 <p>
 132     Some common choices for range-hashing functions are the division,
 133     multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>],
 134     defined as
 135 </p>
 136 <p>
 137     <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) ,
 138 </p>
 139 <p>
 140     <i>g(r, m) = &lceil; u/v ( a r mod v ) &rceil; </i>,
 141 </p>
 142 <p>
 143     and
 144 </p>
 145 <p>
 146     <i>g(r, m) = &lceil; u/v ( r<sup>2</sup> mod v ) &rceil; </i>,
 147 </p>
 148 <p>
 149 respectively, for some positive integrals <i>u</i> and <i>v</i> (typically
 150 powers of 2), and some <i>a</i>. Each of these range-hashing functions works
 151 best for some different setting.
 152 </p>
 153 <p>
 154     The division method <a href="#division_method">(2)</a> is a very common
 155     choice. However, even this single method can be implemented in two very
 156     different ways. It is possible to implement <a href="#division_method">(2)</a>
 157     using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low
 158     level <i>&amp;</i> (bit-mask) operation (for the case where <i>m</i> is a power of
 159     2), <i>i.e.</i>,
 160 </p>
 161 <p>
 162     <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) ,
 163 </p>
 164 <p>
 165     and
 166 </p>
 167 <p>
 168     <a name="eqn:division_method_bit_mask">
 169     <i>g(r, m) = r &amp; m - 1, ( m = 2<sup>k</sup>
 170     </i>
 171         for some<i> k) </i></a>(4) ,
 172 </p>
 173 <p>
 174     respectively.
 175 </p>
 176 <p>
 177     The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a>
 178     has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i>
 179     is affected by all the bits of <i>r</i> (minimizing the chance of collision).
 180     It has the disadvantage of using the costly modulo operation. This method is
 181     hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>].
 182 </p>
 183
 184 <p>
 185     The <i>&amp;</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a>
 186     has the advantage of relying on the fast bitwise and operation. It has the
 187     disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>.
 188     This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>].
 189 </p>
 190
 191
 192
 193
 194 <h2><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h2>
 195
 196 <p>
 197     Although rarer, there are cases where it is beneficial to allow the client to
 198 directly specify a ranged-hash hash function. It is true, that the writer of
 199 the ranged-hash function cannot rely on the values of <i>m</i> having specific
 200 numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]),
 201 since the values of <i>m</i> are determined by a resize policy with possibly
 202 orthogonal considerations [<a href="references.html#austern98segmented">austern98segmented</a>].
 203 The values of <i>m</i> can be used in some cases, though, to estimate the
 204 "general" number of distinct values required.
 205 </p>
 206
 207 <p>
 208     Let
 209 </p>
 210
 211 <p>
 212     <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i>
 213 </p>
 214
 215 <p>
 216     be a string of <i>t</i> characters, each of which is from domain <i>S</i>.
 217 Consider the following ranged-hash function:
 218 </p>
 219
 220 <p>
 221     <a name="eqn:total_string_dna_hash">
 222         <i>
 223             f<sub>1</sub>(s, m) =
 224             &sum; <sub>i =
 225             0</sub><sup>t   - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i>
 226     </a> (5) ,
 227 </p>
 228
 229 <p>
 230     where <i>a</i> is some non-negative integral value. This is the standard
 231 string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>].
 232 Its advantage is that it takes into account all of the characters of the
 233 string.
 234 </p>
 235
 236 <p>
 237     Now assume that <i>s</i> is the string representation of a of a long DNA
 238 sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the
 239 entire string might be prohibitively expensive. A possible alternative might be
 240 to use only the first <i>k</i> characters of the string, where
 241 </p>
 242
 243 <p>
 244     k <sup>|S|</sup> &ge; m ,
 245 </p>
 246 <p>
 247     <i>i.e.</i>, using the hash function
 248 </p>
 249 <p>
 250     <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k
 251                 - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6)
 252 </p>
 253 <p>
 254     requiring scanning over only
 255 </p>
 256 <p>
 257     <i>k = </i>log<i><sub>4</sub>( m ) </i>
 258 </p>
 259 <p>
 260     characters.
 261 </p>
 262 <p>
 263     Other more elaborate hash-functions might scan <i>k</i> characters starting at
 264     a random position (determined at each resize), or scanning <i>k</i> random
 265     positions (determined at each resize), <i>i.e.</i>, using
 266 </p>
 267 <p>
 268     <i>f<sub>3</sub>(s, m) = &sum; <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup>
 269         s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>,
 270 </p>
 271 <p>
 272     or
 273 </p>
 274 <p>
 275     <i>f<sub>4</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub>
 276         a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>,
 277 </p>
 278 <p>
 279 <p>
 280     respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the
 281     (inclusive) range <i>[0,...,t-1]</i>.
 282 </p>
 283
 284
 285 <h2><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h2>
 286
 287 <p>
 288     Containers based on collision-chaining hash tables in <tt>pb_assoc</tt>
 289 are parameterized by the functors <tt>Hash_Fn</tt>, and <tt>Comb_Hash_Fn</tt>.
 290 </p>
 291
 292 <p>
 293     If such a container is instantiated with any hash functor and
 294 range-hashing functor, the container will synthesize a ranged-hash functor
 295 automatically. For example, Figure
 296 <a href = "#hash_range_hashing_seq_diagram">
 297 Insert hash sequence diagram
 298 </a>
 299 shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
 300 the container transforms the key into a non-negative integral using the hash
 301 functor (points B and C), and transforms the result into a position
 302 using the combining functor (points D and E).
 303 </p>
 304
 305 <h6 align = "center">
 306 <a name = "hash_range_hashing_seq_diagram">
 307 <img src = "hash_range_hashing_seq_diagram.jpg" width = "50%" alt = "no image">
 308 </a>
 309 </h6>
 310 <h6 align = "center">
 311 Insert hash sequence diagram.
 312 </h6>
 313
 314
 315 <p>
 316     If such a container is instantiated with the
 317 <a href = "concepts.html#concepts_null_policies">null policy</a>
 318 hash functor,
 319 <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
 320 and a combining-hash functor, the container treats
 321 the combining hash functor as a ranged-hash function. For example, Figure
 322 <a href = "#hash_range_hashing_seq_diagram2">
 323 Insert hash sequence diagram with a null combination policy
 324 </a>
 325 shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
 326 the container transforms the key into a position
 327 using the combining functor (points B and C).
 328 </p>
 329
 330
 331 <h6 align = "center">
 332 <a name = "hash_range_hashing_seq_diagram2">
 333 <img src = "hash_range_hashing_seq_diagram2.jpg" width = "50%" alt = "no image">
 334 </a>
 335 </h6>
 336 <h6 align = "center">
 337 Insert hash sequence diagram with a null combination policy.
 338 </h6>
 339
 340 <p>
 341     <tt>pb_assoc</tt> contains the following hash-related policies:
 342 </p>
 343
 344 <ol>
 345     <li>
 346 <a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
 347 and
 348 <a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
 349 are range-hashing functions based on a bit-mask and a modulo operation, respectively.
 350     </li>
 351     <li>
 352 <a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> and
 353 <a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> are probe
 354 classes based on linear and quadratic increment, respectively.
 355     </li>
 356     <li>
 357 <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>
 358 and
 359 <a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>
 360 are
 361 <a href = "concepts.html#concepts_null_policies">null policy classes</a> for creating
 362 ranged-hash and ranged-probe functions directly (<i>i.e.</i>, not through
 363 composition).
 364     </li>
 365 </ol>
 366
 367 <p>
 368     <tt>pb_assoc</tt> does not provide any hash functions (it relies on those
 369 of the STL).
 370 </p>
 371
 372
 373 </body>
 374
 375 </html>