diff options
Diffstat (limited to 'libstdc++-v3/docs/html/ext/pb_assoc/hash_based_containers.html')
-rw-r--r-- | libstdc++-v3/docs/html/ext/pb_assoc/hash_based_containers.html | 1056 |
1 files changed, 1056 insertions, 0 deletions
diff --git a/libstdc++-v3/docs/html/ext/pb_assoc/hash_based_containers.html b/libstdc++-v3/docs/html/ext/pb_assoc/hash_based_containers.html new file mode 100644 index 00000000000..9fea0e6d168 --- /dev/null +++ b/libstdc++-v3/docs/html/ext/pb_assoc/hash_based_containers.html @@ -0,0 +1,1056 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> +<html> + <head> + <title>Hash-Based Containers</title> + <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1"> + <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5"> + </head> + +<body bgcolor = "white"> + +<h1>Hash-Based Containers</h1> + +<p> + This section describes hash-based containers. It is organized +as follows. +</p> + +<ol> + <li> + <a href = "#overview">Overview</a> is an overview. + </li> + <li> + <a href = "#hash_policies">Hash Policies</a> discusses + hash policies. + </li> + <li> + <a href = "#resize_policies">Resize Policies</a> discusses + resize policies. + </li> + <li> + <a href = "#policy_interaction">Policy Interaction</a> discusses + interaction between policies. + </li> +</ol> + + + +<h2><a name = "overview">Overview</a></h2> + + +<p> + Figure +<a href = "#hash_cd">Hash-based containers</a> + shows the container-hierarchy; the hash-based containers are circled. +<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> +is a collision-chaining hash-based container; +<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a> +is a (general) probing hash-based container. +</p> + +<h6 align = "center"> +<a name = "hash_cd"> +<img src = "hash_cd.jpg" width = "70%" alt = "no image"> +</h6> +<h6 align = "center"> +</a> +Hash-based containers. +</h6> + +<p> + The collision-chaining hash-based container has the following declaration. +</p> +<pre> +<b>template</b>< + <b>typename</b> Key, + <b>typename</b> Data, + <b>class</b> Hash_Fn = std::hash<Key>, + <b>class</b> Eq_Fn = std::equal_to<Key>, + <b>class</b> Comb_Hash_Fn = + <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a><> + <b>class</b> Resize_Policy = <i>default explained below.</i> + <b>bool</b> Store_Hash = <b>false</b>, + <b>class</b> Allocator = + std::allocator<<b>char</b>> > +<b>class</b> <a href = "cc_hash_assoc_cntnr.html">cc_hash_assoc_cntnr</a>; +</pre> + +<p> + The parameters have the following meaning: +</p> +<ol> + <li> <tt>Key</tt> is the key type. + </li> + <li> <tt>Data</tt> is the data-policy, and is explained in +<a href = "ms_gen.html#ds_policy">Mapping-Semantics Genericity::Data Types as a Policy</a>. + </li> + <li> <tt>Hash_Fn</tt> is a key hashing functor.</li> + <li> <tt>Eq_Fn</tt> is a key equivalence functor.</li> + <li> <tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; it +describes how to translate hash values into positions within the table. +This is described in +<a name = "#hash_policies">Hash Policies</a>.</li> + </li> + <li> <tt>Resize_Policy</tt> describes how a container object should +change its internal size. This is described in +<a name = #resize_policies">Resize Policies</a>.</li> + <li> <tt>Store_Hash</tt> indicates whether the hash value should +be stored with each entry. This is described in +<a name = "#policy_interaction">Policy Interaction</a>.</li> + <li> <tt>Allocator</tt> is (surprisingly) an allocator type. + </li> +</ol> + +<p> + The probing hash-based container has the following declaration. +</p> +<pre> +<b>template</b>< + <b>typename</b> Key, + <b>typename</b> Data, + <b>class</b> Hash_Fn = + std::hash< + Key>, + <b>class</b> Eq_Fn = + std::equal_to< + Key>, + <b>class</b> Comb_Probe_Fn = + <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a><> + <b>class</b> Probe_Fn = <i>default explained below.</i> + <b>class</b> Resize_Policy = <i>default explained below.</i> + <b>bool</b> Store_Hash = <b>false</b>, + <b>class</b> Allocator = + std::allocator<<b>char</b>> > +<b>class</b> <a href = "gp_hash_assoc_cntnr.html">gp_hash_assoc_cntnr</a>; +</pre> + +<p> + The parameters are identical to those of the collision-chaining container, except +for the following. +</p> +<ol> + <li> <tt>Comb_Probe_Fn</tt> describes how to transform a probe sequence into +a sequence of positions within the table. + </li> + <li> <tt>Probe_Fn</tt> describes a probe sequence policy.</li> +</ol> + + +<p> + Some of the default template values depend on the values of other parameters, +and are explained in +<a name = "#policy_interaction">Policy Interaction</a>. +</p> + +<h2><a name = "hash_policies">Hash Policies</h2></a> +<p> + This subsection describes hash policies. It is organized as follows: +</p> +<ol> + <li> <a href = "#general_terms">General Terms</a> describes + some general terms. + </li> + <li> <a href = "#range_hashing_fns">Range-Hashing Functions</a> + describes range-hasing functions.</li> + <li> <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a> + describes ranged-hash functions. </li> + <li> <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a> + describes the implementation of these concepts in <tt>pb_assoc</tt>. + </li> +</ol> + + +<h3><a name="general_terms">General Terms</a></h3> + +<p> + There +are actually three functions involved in transforming a key into a hash-table's +position (see Figure +<a href = "#hash_ranged_hash_range_hashing_fns"> +Hash runctions, ranged-hash functions, and range-hashing functions +</a>): +</p> +<ol> + <li> + A <i>ranged-hash</i> function, which maps keys into an interval of the + non-negative integrals. This is the function actually required by the + hash-table algorithm. + </li> + <li> + A hash function, which maps keys into non-negative integral types. This is + typically specified by the writer of the key class. + </li> + <li> + A <i>range-hashing</i> function, which maps non-negative integral types into an + interval of non-negative integral types. + </li> +</ol> + +<h6 align = "center"> +<a name = "hash_ranged_hash_range_hashing_fns"> +<img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "70%" alt = "no image"> +</h6> +<h6 align = "center"> +</a> +Hash runctions, ranged-hash functions, and range-hashing functions. +</h6> + +<p> + Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3 + characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly" + into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral + value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i> + function +</p> +<p> + <i>f : U × Z<sub>+</sub> → Z<sub>+</sub> </i>, +</p> +<p> + such that for any <i>u</i> in <i>U</i> +, +</p> +<p> + <i>0 ≤ f(u, m) ≤ m - 1 </i>, +</p> +<p> + and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>]. + One common solution is to use the composition of the hash function +</p> +<p> + <i>h : U → Z<sub>+</sub> </i>, +</p> +<p> + which maps elements of <i>U</i> into the non-negative integrals, and +</p> +<p> + <i>g : Z<sub>+</sub> × Z<sub>+</sub> → Z<sub>+</sub>, </i> +</p> +<p> + which maps a non-negative hash value, and a non-negative range upper-bound into + a non-negative integral in the range between 0 (inclusive) and the range upper + bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>, +</p> +<p> + <i>0 ≤ g(r, m) ≤ m - 1 </i>. +</p> +<p> + The resulting ranged-hash function, is +</p> +<p> + <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a> + </i>(1) . +</p> + +<p> + From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can + always be composed (however the converse is not true). The STL's hash-based + containers allow specifying a hash function, and use a hard-wired range-hashing function; the ranged-hash function is implicitly composed. +</p> + + +<p> + The above describes the case where a key is to be mapped into a <i>single +position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table. +In other cases, a key is to be mapped into a <i>sequence of poisitions</i> +within a table, <i>e.g.</i>, in a probing table. +</p> +<p> + Similar terms apply in this case: the table requires a <i>ranged probe</i> +function, mapping a key into a sequence of positions withing the table. This is +typically acheived by composing a <i>hash function</i> mapping the key +into a non-negative integral type, a <i>probe</i> function transforming the +hash value into a sequence of hash values, and a <i>range-hashing</i> function +transforming the sequence of hash values into a sequence of positions. +</p> + + +<h3><a name="range_hashing_fns">Range-Hashing Functions</a></h3> + +<p> + Some common choices for range-hashing functions are the division, + multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>], + defined as +</p> +<p> + <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) , +</p> +<p> + <i>g(r, m) = ⌈ u/v ( a r mod v ) ⌉ </i>, +</p> +<p> + and +</p> +<p> + <i>g(r, m) = ⌈ u/v ( r<sup>2</sup> mod v ) ⌉ </i>, +</p> +<p> +respectively, for some positive integrals <i>u</i> and <i>v</i> (typically +powers of 2), and some <i>a</i>. Each of these range-hashing functions works +best for some different setting. +</p> +<p> + The division method <a href="#division_method">(2)</a> is a very common + choice. However, even this single method can be implemented in two very + different ways. It is possible to implement <a href="#division_method">(2)</a> + using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low + level <i>&</i> (bit-mask) operation (for the case where <i>m</i> is a power of + 2), <i>i.e.</i>, +</p> +<p> + <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) , +</p> +<p> + and +</p> +<p> + <a name="eqn:division_method_bit_mask"> + <i>g(r, m) = r & m - 1, ( m = 2<sup>k</sup> + </i> + for some<i> k) </i></a>(4) , +</p> +<p> + respectively. +</p> +<p> + The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a> + has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i> + is affected by all the bits of <i>r</i> (minimizing the chance of collision). + It has the disadvantage of using the costly modulo operation. This method is + hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>]. +</p> + +<p> + The <i>&</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a> + has the advantage of relying on the fast bitwise and operation. It has the + disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>. + This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]. +</p> + + + + +<h3><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h3> + +<p> + In some less frequent cases it is beneficial to allow the client to +directly specify a ranged-hash hash function. It is true, that the writer of +the ranged-hash function cannot rely on the values of <i>m</i> having specific +numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]), +since the values of <i>m</i> are determined by a resize policy with possibly +orthogonal considerations. +</p> + +<p> + There are two cases where a ranged-hash function can be superior. The firs is when using perfect hashing +[<a href="references.html#knuth98sorting">knuth98sorting</a>]; the second +is when the values of <i>m</i> can be used to estimate the +"general" number of distinct values required. This is described in the following. +</p> + +<p> + Let +</p> + +<p> + <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i> +</p> + +<p> + be a string of <i>t</i> characters, each of which is from domain <i>S</i>. +Consider the following ranged-hash function: +</p> + +<p> + <a name="eqn:total_string_dna_hash"> + <i> + f<sub>1</sub>(s, m) = + ∑ <sub>i = + 0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i> + </a> (5) , +</p> + +<p> + where <i>a</i> is some non-negative integral value. This is the standard +string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>]. +Its advantage is that it takes into account all of the characters of the +string. +</p> + +<p> + Now assume that <i>s</i> is the string representation of a of a long DNA +sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the +entire string might be prohibitively expensive. A possible alternative might be +to use only the first <i>k</i> characters of the string, where +</p> + +<p> + k <sup>|S|</sup> ≥ m , +</p> +<p> + <i>i.e.</i>, using the hash function +</p> +<p> + <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k + - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6) +</p> +<p> + requiring scanning over only +</p> +<p> + <i>k = </i>log<i><sub>4</sub>( m ) </i> +</p> +<p> + characters. +</p> +<p> + Other more elaborate hash-functions might scan <i>k</i> characters starting at + a random position (determined at each resize), or scanning <i>k</i> random + positions (determined at each resize), <i>i.e.</i>, using +</p> +<p> + <i>f<sub>3</sub>(s, m) = ∑ <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup> + s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>, +</p> +<p> + or +</p> +<p> + <i>f<sub>4</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub> + a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>, +</p> +<p> +<p> + respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the + (inclusive) range <i>[0,...,t-1]</i>. +</p> + + +<h3><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h3> + +<p> +<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is +parameterized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a hash functor +and a combining hash functor, respectively. +</p> + +<p> + For any hash functor except <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>, +one of the +<a href = "concepts.html#concepts_null_policies">Concepts::Null Policy Classes</a>, +then <tt>Comb_Hash_Fn</tt> is considered a range-hashing functor. +The container will synthesize a ranged-hash functor from both. For example, Figure +<a href = "#hash_range_hashing_seq_diagram"> +Insert hash sequence diagram +</a> +shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A), +the container transforms the key into a non-negative integral using the hash +functor (points B and C), and transforms the result into a position +using the combining functor (points D and E). +</p> + +<h6 align = "center"> +<a name = "hash_range_hashing_seq_diagram"> +<img src = "hash_range_hashing_seq_diagram.jpg" width = "70%" alt = "no image"> +</a> +</h6> +<h6 align = "center"> +Insert hash sequence diagram. +</h6> + + +<p> + <tt>pb_assoc</tt> contains the following range-hashing policies: +</p> + +<ol> + <li> +<a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a> +and +<a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a> +are range-hashing functions based on a bit-mask and a modulo operation, respectively. + </li> +</ol> + + +<p> + If <tt>Comb_Hash_Fn</tt> is instantiated by +<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>, +and a combining-hash functor, the container treats +the combining hash functor as a ranged-hash function. For example, Figure +<a href = "#hash_range_hashing_seq_diagram2"> +Insert hash sequence diagram with a null combination policy +</a> +shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A), +the container transforms the key into a position +using the combining functor (points B and C). +</p> + + +<h6 align = "center"> +<a name = "hash_range_hashing_seq_diagram2"> +<img src = "hash_range_hashing_seq_diagram2.jpg" width = "70%" alt = "no image"> +</a> +</h6> +<h6 align = "center"> +Insert hash sequence diagram with a null combination policy. +</h6> + +<p> + Similarly, +<a href = "gp_hash_assoc_cntnr.html"></a><tt>gp_hash_assoc_cntnr</tt> +is parameterized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and +<tt>Comb_Probe_Fn</tt>. As before, if <tt>Probe_Fn</tt> +and <tt>Comb_Probe_Fn</tt> are, respectively, +<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and +<a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>, +then <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, <tt>Hash_Fn</tt> +is a hash functor, <tt>Probe_Fn</tt> is a functor for offsets from a hash value, +and <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a sequence of positions +within the table. +</p> + +<p> + <tt>pb_assoc</tt> contains the following probe policies: +</p> + +<ol> + <li> +<a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> is a linear probe +function. + </li> + <li> +<a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> is +a quadratic probe function. + </li> +</ol> + + + + + + + + + +<h2><a name = "resize_policies">Resize Policies</a></h2> + +<p> + This subsection describes resize policies. It is organized as follows: +</p> + +<ol> + <li> <a href = "#general">General Terms</a> describes general + terms. + </li> + <li> <a href = "#size_policies">Size Policies</a> describes size + policies. + </li> + <li> <a href = "#trigger_policies">Trigger Policies</a> describes trigger + policies. + </li> <li> <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a> + describes the implementation of these concepts in <tt>pb_assoc</tt>. + </li> +</ol> + + +<h3><a name = "general">General Terms</a></h3> + +<p> + Hash-tables, as opposed to trees, do not naturally grow or shrink. It +is necessary to specify policies to determine how and when a hash table should change +its size. +</p> + +<p> + In general, resize policies can be decomposed into (probably orthogonal) +policies: +</p> +<ol> + <li> A <i>size policy</i> indicating <i>how</i> a hash table should +grow (<i>e.g.,</i> it should multiply by powers of 2). + </li> + <li> A <i>trigger policy</i> indicating <i>when</i> a hash table should +grow (<i>e.g.,</i> a load factor is exceeded). + </li> +</ol> + + + +<h3><a name = "size_policies">Size Policies</a></h3> + +<p> + Size policies determine how a hash table +changes size. These policies are simple, and there are relatively +few sensible options. An exponential-size policy (with the initial +size and growth factors both powers of 2) works well with a +mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection), +and is the +hard-wired policy used by Dinkumware +[<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A +prime-list based policy works well with a modulo-prime range +hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection), +and is the +hard-wired policy used by SGI's implementation +[<a href = "references.html#sgi_stl">sgi_stl</a>]. +</p> + + + + +<h3><a name = "trigger_policies">Trigger Policies</a></h3> + +<p> + Trigger policies determine when a hash table changes size. +Following is a description of two polcies: <i>load-check</i> +policies, and a collision-check policies. +</p> + +<p> + Load-check policies are straightforward. The user +specifies two factors, <i>α<sub>min</sub></i> and <i>α<sub>max</sub></i>, and +the hash table maintains the invariant that +</p> +<p> + <i> + <a name = "eqn:load_factor_min_max"> + α<sub>min</sub> + ≤ + (number of stored elements) / (hash-table size) + ≤ + α<sub>max</sub> + </a> + </i> + (1) + . +</p> + +<p> + Collision-check policies work in the opposite direction of +load-check policies. They focus on keeping the number of +collisions moderate and hoping +that the size of the table will not grow very large, +instead of keeping a moderate load-factor and +hoping that the number of collisions will be small. +A +maximal collision-check policy resizes when the shortest +probe-sequence grows too large. +</p> + + +<p> + Consider Figure +<a href = "#balls_and_bins">Balls and bins</a>. + Let the size of the hash table be denoted by <i>m</i>, the +length of a probe sequence be denoted by <i>k</i>, and some load +factor be denoted by α. We would like to calculate the +minimal length of <i>k</i>, such that if there were <i>α m</i> elements +in the hash table, a probe sequence of length <i>k</i> would be found +with probability at most <i>1/m</i>. +</p> + +<h6 align = "center"> +<a name = "balls_and_bins"> +<img src = "balls_and_bins.jpg" width = "70%" alt = "no image"> +</a> +</h6> +<h6 align = "center"> +Balls and bins. +</h6> + + +<p> + Denote the probability that a probe sequence of length <i>k</i> +appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence +of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution. +Then +</p> + <p> + <a name = "eqn:prob_of_p1"> + <i>p<sub>1</sub> + = </i>(3) + </a> + </p> + <p> + <i> + <b>P</b>(l<sub>1</sub> ≥ k) + = + </i> + </p> + <p> + <i><b>P</b>(l<sub>1</sub> ≥ α ( 1 + k / α - 1 ) + ≤ </i>(a) + </p> + <p> + <i> + e + ^ + ( + - + ( + α ( k / α - 1 )<sup>2</sup> + ) + /2 + ) + </i> + , +</p> +<p> + where (a) follows from the Chernoff bound +[<a href = "references.html#motwani95random">motwani95random</a>]. +To +calculate the probability that <i>some</i> bin contains a probe +sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are +negatively-dependent +[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>]. +Let <i><b>I</b>(.)</i> +denote the indicator function. Then + <p> + <a name = "eqn:at_least_k_i_n_some_bin"> + <i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> ≥ k ) + = </i>(3) + </a> + </p> + <p> + <i> + <b>P</b> + ( + ∑ <sub>i = 1</sub><sup>m</sup> + <b>I</b>(l<sub>i</sub> ≥ k) ≥ 1 + ) + = + </i> + </p> + <p> + <i> + <b>P</b> + ( + ∑ <sub>i = 1</sub><sup>m</sup> + <b>I</b> + ( + l<sub>i</sub> ≥ k + ) + ≥ + m p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 ) + ) + ≤ </i>(a) + </p> + <p> + <i> + e + ^ + ( + ( + - + m p<sub>1</sub> + ( + 1 / (m p<sub>1</sub>) - 1 + ) + <sup>2</sup> + ) + / + 2 + ) + , + </i> + </p> +<p> +where (a) follows from the fact that the Chernoff bound can be +applied to negatively-dependent variables +[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>]. +Inserting <a href = "#prob_of_p1">(2)</a> into +<a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>, +we obtain +</p> +<p> + <i> + k + ~ + √ + ( + 2 α </i>ln<i> 2 m </i>ln<i>(m) ) + ) + </i> + . +</p> + + + + + + + + + +<h3><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h3> + +<p> + The resize policies in the previous subsection are conceptually straightforward. The design +of hash-based containers' size-related interface is complicated by some factors. +</p> +<ol> + <li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no +distinction between the number of entries the container holds and the number of entries it is using. This, +of course, is not the case for hash-based containers. Moreover, even describing the +"actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container +holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries. + </li> + <li> + The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy +maintains an invariant concerning the load factor of a container object. This is sometimes too rigid: + <ol> + <li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container). + </li> + <li> + In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy. + </li> + </ol> + </li> + <li> + Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a> +discusses the previous concepts.) + </li> +</ol> + +<p> + Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers. +</p> + + + +<p> + This library attempts to address these types of problems by delegating all size-related functionality to +policy classes. Hash-based containers +are parameterized by a resize-policy class (among others), and derive publicly from +the resize-policy class +[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] + <i>E.g.</i>, a collision-chaining +hash table is defined as follows: +</p> +<pre> +cc_ht_map< + <b>class</b> Key, + <b>class</b> Data, + ... + <b>class</b> Resize_Policy + ...> : + <b>public</b> Resize_Policy +</pre> + +<p> + The containers themselves lack any functionality or public interface for manipulating sizes. A container +object merely forwards events to its resize policy object and queries it for needed actions. +</p> + +<p> + Figure +<a href = "#insert_resize_sequence_diagram1"> +Insert resize sequence diagram +</a> +shows a (possible) sequence diagram of an insert operation. +The user inserts an element; the hash table +notifies its resize policy that a search has started (point A); +in this case, a single collision is encountered - the table +notifies its resize policy of this (point B); the container +finally notifies its resize policy that the search has ended (point C); +it then queries its resize policy whether a resize is needed, and if so, +what is the new size (points D to G); following the resize, it notifies +the policy that a resize has completed (point H); finally, the element +is inserted, and the policy notified (point I). +</p> + +<h6 align = "center"> +<a name = "insert_resize_sequence_diagram1"> +<img src = "insert_resize_sequence_diagram1.jpg" width = "70%" alt = "no image"> +</a> +</h6> +<h6 align = "center"> +Insert resize sequence diagram. +</h6> + +<p> + This addresses, to some extent, the problems mentioned above: +</p> +<ol> + <li> + Different instantiations of range-hashing policies can be met with different instantiations of + resize policies. + </li> + <li> + Questions on size-related interface are avoided, since the containers have no size-related methods. Thus + a container has no method for querying its actual size. It merely continuously forwards enough information to + its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design the appropriate method. Also, a container has no methods for setting its size. It merely queries its +resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and +supports a <tt><b>protected virtual</b></tt> function for external resize. + </li> +</ol> + +<p> + The library contains a single class for instantiating a resize policy, +<tt>pb_assoc</tt> contains +a standard resize policy, +<a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a> (the name is explained shortly). +In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports +queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility). +([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for +changing between alternative interfaces at compile time.) +</p> + +<p> +As noted before, + size and trigger policies are usually orthogonal. +<a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a> +is parameterized by size and trigger policies. For example, +a collision-chaining hash table +is typically be defined as follows: +</p> +<pre> +cc_ht_map< + key, + data, + ... + ht_standard_resize_policy< + some_trigger_policy, + some_size_policy, + ...> > +</pre> + +<p> + The sole function of +<a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a> + is to +act as a standard delegator +[<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these +policies. + +<p> + Figures +<a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a> + and +<a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a> + show sequence diagrams illustrating the interaction between + the standard resize policy and its trigger and size policies, respectively. +</p> + +<h6 align = "center"> +<a name = "insert_resize_sequence_diagram2"> +<img src = "insert_resize_sequence_diagram2.jpg" width = "70%" alt = "no image"> +</a> +</h6> +<h6 align = "center"> +Standard resize policy trigger sequence diagram. +</h6> + +<h6 align = "center"> +<a name = "insert_resize_sequence_diagram3"> +<img src = "insert_resize_sequence_diagram3.jpg" width = "70%" alt = "no image"> +</a> +</h6> +<h6 align = "center"> +Standard resize policy size sequence diagram. +</h6> + +<p> + The library (currently) supports the following instantiations of size +and trigger policies: +</p> + +<ol> + <li> + <a href = "ht_load_check_trigger.html"><tt>ht_load_check_trigger</tt></a> implements + a load check trigger policy. + </li> + <li> + <a href = "ht_max_collision_check_grow_resize_trigger.html"><tt>ht_max_collision_check_grow_resize_trigger</tt></a> + implements a collision check trigger policy. + </li> + <li> +<a href = "ht_exponential_size_policy.html"><tt>ht_exponential_size_policy</tt></a> implemens +an exponential-size policy (which should be used with mask range hashing). + </li> + <li> +<a href = "ht_prime_size_policy.html"><tt>ht_prime_size_policy</tt></a> implementing +a size policy based on a sequence of primes +[<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing + </li> +</ol> + +<p> + The trigger policies also support interfaces for changing their specifics which depend on compile time constants. +</p> + + +<p> + Figure +<a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture +of the resize-related classes. +<tt>Container</tt> (which stands for any of the hash-based containers) is parameterized +by <tt>Resize_Policy</tt>, from which it subclasses publicly +[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]. +This class is currently instantiated only by +<a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a>. +<a href = "ht_standard_resize_policy.html"><tt>ht_standard_resize_policy</tt></a> itself +is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>. +Currently, <tt>Trigger_Policy</tt> is instantiated by +<a href = "ht_load_check_trigger.html"><tt>ht_load_check_trigger</tt></a>, +or +<a href = "ht_max_collision_check_grow_resize_trigger.html"><tt>ht_max_collision_check_grow_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by +<a href = "ht_exponential_size_policy.html"><tt>ht_exponential_size_policy</tt></a>, +or +<a href = "ht_prime_size_policy.html"><tt>ht_prime_size_policy</tt></a>. +</p> + + +<h6 align = "center"> +<a name = "resize_policy_cd"> +<img src = "resize_policy_cd.jpg" width = "70%" alt = "no image"> +</a> +</h6> +<h6 align = "center"> +Resize policy class diagram. +</h6> + + + + +<h2><a name = "#policy_interaction">Policy Interaction</a></h2> + +<p> + Hash-tables are unfortunately susceptible to choice of policies. One +of the more complicated aspects of this is that poor combinations of good policies +can alter performance drastically. Following are some considerations. +</p> + + + + + +<h3>Range-Hashing Policies and Resize Policies</h3> + +<p> +</p> + + +<h3>Equivalence Functors, Storing Hash Values, and Hash Functions</h3> + + +<p> +<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> +and +<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a> +are parameterized by an equivalenc functor and by a <tt>Store_Hash</tt> +parameter. If the latter parameter is <tt><b>true</b></tt>, then +the container stores with each entry a hash value, and uses +this value in case of collisions to determine whether to apply a hash value. +This can lower the cost of collision for some types, but increase the cost of collisions for other types. +</p> + +<p> + If a ranged-hash function or ranged probe function is directly supplied, however, +then it makes no sense to store the hash value with each entry. <tt>pb_assoc</tt>'s container will fail at compilation, by design, if this is attempted. +</p> + + + +</body> + +</html> |