Linear hashing: Difference between revisions

Content deleted Content added
Tags: Mobile edit Mobile app edit Android app edit
Algorithm details: Reworded without losing information
Line 40:
===Hash functions===
====Determining the hashing function of a bucket====
To accessdetermine athe 0-based index of the bucket that contains the record with key <math>c</math>, a group of hash functions,function collectively known as a dynamic hash function,<math>h_i</math> is applied on the key <math>c</math>. At any time, at most two hash functions <math>h_i</math> and <math>h_{i+1}</math> are used. The hash function <math>h_i</math> represents the 0-based index of the bucket that should store the record with key <math>c</math>. When a bucket which uses the hash function <math>h_i</math> is split into two new buckets, the hash function <math>h_i</math> is replaced with <math>h_{i+1}</math> for both of those new buckets. WhenThe the total numberfamily of bucketshash is equal to one,functions <math>i=0 h_i(c)</math> is also referred to as the dynamic hash function.
 
====Typical hashing function====
A typical example uses the division modulo x operation in which a record corresponds to a bucket in which all the keys have the same rightmost binary digits. In this exampleTypically, the value of <math>i</math> in <math>h_i</math> corresponds to the number of rightmost binary digits that are used to segregate the buckets. TheThis family ofdynamic hash functionsfunction can be expressed arithmetically isas
 
<math> h_i(c) \mapsto c \pmod{2^i}</math>.<ref name="LMS" />
 
Note that when the total number of buckets is equal to one, <math>i=0</math>.<ref name="LMS" />
 
====Determining the hashing function for a key====
Line 57 ⟶ 59:
Linear hashing algorithms may use only controlled splits or both controlled and uncontrolled splits.
 
'''Controlled splitting''' occurs if a split is performed whenever the load factor, which is monitored by the file, exceeds a predetermined threshold.<ref name="LMS" /> If the hash index uses controlled splitting, the buckets are allowed to overflow by using linked overflow blocks. When the ''load factor'' surpasses a set threshold, the ''split pointer's'' designated bucket is split. Instead of using the load factor, this threshold can also be expressed as an occupancy percentage, in which case, the maximum number of records in the hash index equals (occupancy percentage)*(max records per non-overflowed bucket)*(number of buckets).<ref name=":0">{{Cite book |last=Silberschatz |first=Abraham |title=Database system concepts |last2=Korth |first2=Henry F. |last3=Sudarshan |first3=S. |date=2020 |publisher=McGraw-Hill Education |isbn=978-1-260-08450-4 |edition=Seventh |___location=New York, NY}}</ref>
An '''uncontrolled split''' occurs when a split is performed whenever a bucket overflows, in which case that bucket would be split into two separate buckets.
 
An '''uncontrolled split''' occurs when a split is performed whenever a bucket overflows, in which case that bucket would be split into two separate buckets.
'''Controlled splitting''' occurs if a split is performed whenever the load factor, which is monitored by the file, exceeds a predetermined threshold.<ref name="LMS" /> If the hash index uses controlled splitting, the buckets are allowed to overflow by using linked overflow blocks. When the ''load factor'' surpasses a set threshold, the designated bucket is split. Instead of using the load factor, this threshold can also be expressed as an occupancy percentage, in which case, the maximum number of records in the hash index equals (occupancy percentage)*(max records per non-overflowed bucket)*(number of buckets).<ref name=":0">{{Cite book |last=Silberschatz |first=Abraham |title=Database system concepts |last2=Korth |first2=Henry F. |last3=Sudarshan |first3=S. |date=2020 |publisher=McGraw-Hill Education |isbn=978-1-260-08450-4 |edition=Seventh |___location=New York, NY}}</ref>
 
'''File contraction''' occurs in some LH algorithm implementations if a controlled split causes the load factor to sink below a threshold. In this case, a merge operation would be triggered which would undo the last split, and reset the file state.<ref name="LMS" />