1 <!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.0 Transitional//EN">
4 <title>Hash Policies
</title>
5 <meta name=
"GENERATOR" content=
"Microsoft Visual Studio .NET 7.1">
6 <meta name=
"vs_targetSchema" content=
"http://schemas.microsoft.com/intellisense/ie5">
10 <h1>Hash Policies
</h1>
12 This subsection describes hash policies. It is organized as follows:
15 <li> The
<a href =
"#general_terms">General Terms
</a> Section describes
18 <li> The
<a href =
"#range_hashing_fns">Range-Hashing Functions
</a> Section
19 describes range-hasing functions.
</li>
20 <li> The
<a href =
"#hash_policies_ranged_hash_policies">Ranged-Hash Functions
</a> Section
21 describes ranged-hash functions.
</li>
22 <li> The
<a href =
"#pb_assoc_imp">Implementation in
<tt>pb_assoc
</tt></a> Section
23 describes the implementation of these concepts in
<tt>pb_assoc
</tt>.
28 <h2><a name=
"general_terms">General Terms
</a></h2>
32 are actually three functions involved in transforming a key into a hash-table's
34 <a href =
"#hash_ranged_hash_range_hashing_fns">
35 Hash runctions, ranged-hash functions, and range-hashing functions
40 A
<i>ranged-hash
</i> function, which maps keys into an interval of the
41 non-negative integrals. This is the function actually required by the
45 A hash function, which maps keys into non-negative integral types. This is
46 typically specified by the writer of the key class.
49 A
<i>range-hashing
</i> function, which maps non-negative integral types into an
50 interval of non-negative integral types.
55 <a name =
"hash_ranged_hash_range_hashing_fns">
56 <img src =
"hash_ranged_hash_range_hashing_fns.jpg" width =
"40%" alt =
"no image">
58 Hash runctions, ranged-hash functions, and range-hashing functions.
62 Let
<i>U
</i> be a domain (
<i>e.g.
</i>, the integers, or the strings of
3
63 characters). A hash-table algorithm needs to map elements of
<i>U
</i> "uniformly"
64 into the range
<i>[
0,..., m -
1]
</i> (where
<i>m
</i> is a non-negative integral
65 value, and is, in general, time varying).
<i>I.e.
</i>, the algorithm needs a
<i>ranged-hash
</i>
69 <i>f : U
× Z
<sub>+
</sub> → Z
<sub>+
</sub> </i>,
72 such that for any
<i>u
</i> in
<i>U
</i>
76 <i>0 ≤ f(u, m)
≤ m -
1 </i>,
79 and which has
"good uniformity" properties [
<a href=
"references.html#knuth98sorting">knuth98sorting
</a>].
80 One common solution is to use the composition of the hash function
83 <i>h : U
→ Z
<sub>+
</sub> </i>,
86 which maps elements of
<i>U
</i> into the non-negative integrals, and
89 <i>g : Z
<sub>+
</sub> × Z
<sub>+
</sub> → Z
<sub>+
</sub>,
</i>
92 which maps a non-negative hash value, and a non-negative range upper-bound into
93 a non-negative integral in the range between
0 (inclusive) and the range upper
94 bound (exclusive),
<i>i.e.
</i>, for any
<i>r
</i> in
<i>Z
<sub>+
</sub></i>,
97 <i>0 ≤ g(r, m)
≤ m -
1 </i>.
100 The resulting ranged-hash function, is
103 <i><a name=
"eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m)
</a>
108 From the above, it is obvious that given
<i>g
</i> and
<i>h
</i>,
<i>f
</i> can
109 always be composed (however the converse is not true).
114 The above describes the case where a key is to be mapped into a
<i>single
115 position
</i> within a hash table,
<i>e.g.
</i>, in a collision-chaining table.
116 In other cases, a key is to be mapped into a
<i>sequence of poisitions
</i>
117 within a table,
<i>e.g.
</i>, in a probing table.
120 Similar terms apply in this case: the table requires a
<i>ranged probe
</i>
121 function, mapping a key into a sequence of positions withing the table. This is
122 typically acheived by composing a
<i>hash function
</i> mapping the key
123 into a non-negative integral type, a
<i>probe
</i> function transforming the
124 hash value into a sequence of hash values, and a
<i>range-hashing
</i> function
125 transforming the sequence of hash values into a sequence of positions.
129 <h2><a name=
"range_hashing_fns">Range-Hashing Functions
</a></h2>
132 Some common choices for range-hashing functions are the division,
133 multiplication, and middle-square methods [
<a href=
"references.html#knuth98sorting">knuth98sorting
</a>],
137 <i><a name=
"eqn:division_method">g(r, m) = r mod m
</a></i>(
2) ,
140 <i>g(r, m) =
⌈ u/v ( a r mod v )
⌉ </i>,
146 <i>g(r, m) =
⌈ u/v ( r
<sup>2</sup> mod v )
⌉ </i>,
149 respectively, for some positive integrals
<i>u
</i> and
<i>v
</i> (typically
150 powers of
2), and some
<i>a
</i>. Each of these range-hashing functions works
151 best for some different setting.
154 The division method
<a href=
"#division_method">(
2)
</a> is a very common
155 choice. However, even this single method can be implemented in two very
156 different ways. It is possible to implement
<a href=
"#division_method">(
2)
</a>
157 using the low level
<i>%
</i> (modulo) operation (for any
<i>m
</i>), or the low
158 level
<i>&</i> (bit-mask) operation (for the case where
<i>m
</i> is a power of
162 <i><a name=
"eqn:division_method_prime_mod">g(r, m) = r % m
</a></i>(
3) ,
168 <a name=
"eqn:division_method_bit_mask">
169 <i>g(r, m) = r
& m -
1, ( m =
2<sup>k
</sup>
171 for some
<i> k)
</i></a>(
4) ,
177 The
<i>%
</i> (modulo) implementation
<a href=
"#division_method_prime_mod">(
3)
</a>
178 has the advantage that for
<i>m
</i> a prime far from a power of
2,
<i>g(r, m)
</i>
179 is affected by all the bits of
<i>r
</i> (minimizing the chance of collision).
180 It has the disadvantage of using the costly modulo operation. This method is
181 hard-wired into SGI's implementation [
<a href=
"references.html#sgi_stl">sgi_stl
</a>].
185 The
<i>&</i> (bit-mask) implementation
<a href=
"#division_method_bit_mask">(
4)
</a>
186 has the advantage of relying on the fast bitwise and operation. It has the
187 disadvantage that for
<i>g(r, m)
</i> is affected only by the low order bits of
<i>r
</i>.
188 This method is hard-wired into Dinkumware's implementation [
<a href=
"references.html#dinkumware_stl">dinkumware_stl
</a>].
194 <h2><a name=
"hash_policies_ranged_hash_policies">Ranged-Hash Functions
</a></h2>
197 Although rarer, there are cases where it is beneficial to allow the client to
198 directly specify a ranged-hash hash function. It is true, that the writer of
199 the ranged-hash function cannot rely on the values of
<i>m
</i> having specific
200 numerical properties suitable for hashing (in the sense used in [
<a href=
"references.html#knuth98sorting">knuth98sorting
</a>]),
201 since the values of
<i>m
</i> are determined by a resize policy with possibly
202 orthogonal considerations [
<a href=
"references.html#austern98segmented">austern98segmented
</a>].
203 The values of
<i>m
</i> can be used in some cases, though, to estimate the
204 "general" number of distinct values required.
212 <i>s = [ s
<sub>0</sub>,..., s
<sub>t -
1</sub>]
</i>
216 be a string of
<i>t
</i> characters, each of which is from domain
<i>S
</i>.
217 Consider the following ranged-hash function:
221 <a name=
"eqn:total_string_dna_hash">
223 f
<sub>1</sub>(s, m) =
225 0</sub><sup>t -
1</sup> s
<sub>i
</sub> a
<sup>i
</sup> </i>mod
<i> m
</i>
230 where
<i>a
</i> is some non-negative integral value. This is the standard
231 string-hashing function used in SGI's implementation (with
<i>a =
5</i>) [
<a href=
"references.html#sgi_stl">sgi_stl
</a>].
232 Its advantage is that it takes into account all of the characters of the
237 Now assume that
<i>s
</i> is the string representation of a of a long DNA
238 sequence (and so
<i>S = {'A', 'C', 'G', 'T'}
</i>). In this case, scanning the
239 entire string might be prohibitively expensive. A possible alternative might be
240 to use only the first
<i>k
</i> characters of the string, where
244 k
<sup>|S|
</sup> ≥ m ,
247 <i>i.e.
</i>, using the hash function
250 <a name=
"eqn:only_k_string_dna_hash"><i>f
<sub>2</sub>(s, m) =
∑ <sub>i =
0</sub><sup>k
251 -
1</sup> s
<sub>i
</sub> a
<sup>i
</sup> </i>mod
<i>m
</i></a>, (
6)
254 requiring scanning over only
257 <i>k =
</i>log
<i><sub>4</sub>( m )
</i>
263 Other more elaborate hash-functions might scan
<i>k
</i> characters starting at
264 a random position (determined at each resize), or scanning
<i>k
</i> random
265 positions (determined at each resize),
<i>i.e.
</i>, using
268 <i>f
<sub>3</sub>(s, m) =
∑ <sub>i = r
<sub>0</sub></sub><sup>r
<sub>0</sub> + k -
1</sup>
269 s
<sub>i
</sub> a
<sup>i
</sup> </i>mod
<i>m
</i>,
275 <i>f
<sub>4</sub>(s, m) =
∑ <sub>i =
0</sub><sup>k -
1</sup> s
<sub>r
<sub>i
</sub></sub>
276 a
<sup>r
<sub>i
</sub></sup> </i>mod
<i>m
</i>,
280 respectively, for
<i>r
<sub>0</sub>,..., r
<sub>k-
1</sub></i> each in the
281 (inclusive) range
<i>[
0,...,t-
1]
</i>.
285 <h2><a name=
"pb_assoc_imp">Implementation in
<tt>pb_assoc
</tt></a></h2>
288 Containers based on collision-chaining hash tables in
<tt>pb_assoc
</tt>
289 are parameterized by the functors
<tt>Hash_Fn
</tt>, and
<tt>Comb_Hash_Fn
</tt>.
293 If such a container is instantiated with any hash functor and
294 range-hashing functor, the container will synthesize a ranged-hash functor
295 automatically. For example, Figure
296 <a href =
"#hash_range_hashing_seq_diagram">
297 Insert hash sequence diagram
299 shows an
<tt>insert
</tt> sequence diagram. The user inserts an element (point A),
300 the container transforms the key into a non-negative integral using the hash
301 functor (points B and C), and transforms the result into a position
302 using the combining functor (points D and E).
305 <h6 align =
"center">
306 <a name =
"hash_range_hashing_seq_diagram">
307 <img src =
"hash_range_hashing_seq_diagram.jpg" width =
"50%" alt =
"no image">
310 <h6 align =
"center">
311 Insert hash sequence diagram.
316 If such a container is instantiated with the
317 <a href =
"concepts.html#concepts_null_policies">null policy
</a>
319 <a href =
"null_hash_fn.html"><tt>null_hash_fn
</tt></a>,
320 and a combining-hash functor, the container treats
321 the combining hash functor as a ranged-hash function. For example, Figure
322 <a href =
"#hash_range_hashing_seq_diagram2">
323 Insert hash sequence diagram with a null combination policy
325 shows an
<tt>insert
</tt> sequence diagram. The user inserts an element (point A),
326 the container transforms the key into a position
327 using the combining functor (points B and C).
331 <h6 align =
"center">
332 <a name =
"hash_range_hashing_seq_diagram2">
333 <img src =
"hash_range_hashing_seq_diagram2.jpg" width =
"50%" alt =
"no image">
336 <h6 align =
"center">
337 Insert hash sequence diagram with a null combination policy.
341 <tt>pb_assoc
</tt> contains the following hash-related policies:
346 <a href =
"direct_mask_range_hashing.html"><tt>direct_mask_range_hashing
</tt></a>
348 <a href =
"direct_mod_range_hashing.html"><tt>direct_mod_range_hashing
</tt></a>
349 are range-hashing functions based on a bit-mask and a modulo operation, respectively.
352 <a href =
"linear_probe_fn.html"><tt>linear_probe_fn
</tt></a> and
353 <a href =
"quadratic_probe_fn.html"><tt>quadratic_probe_fn
</tt></a> are probe
354 classes based on linear and quadratic increment, respectively.
357 <a href =
"null_hash_fn.html"><tt>null_hash_fn
</tt></a>
359 <a href =
"null_probe_fn.html"><tt>null_probe_fn
</tt></a>
361 <a href =
"concepts.html#concepts_null_policies">null policy classes
</a> for creating
362 ranged-hash and ranged-probe functions directly (
<i>i.e.
</i>, not through
368 <tt>pb_assoc
</tt> does not provide any hash functions (it relies on those