Linear hashing: Difference between revisions

Content deleted Content added
Used "load factor" as terminology instead of redefining it. Renamed heading
Algorithm details: Reorganized information
Line 45:
When a bucket is split into two new buckets, the hash function <math>h_i</math> is replaced with <math>h_{i+1}</math> for both of those buckets.<ref name="LMS" />
 
===Split= controlAddressing ====
Linear hashing algorithms may use only controlled splits or both controlled and uncontrolled splits.
 
An '''uncontrolled split''' occurs when a split is performed whenever a bucket overflows, in which case that bucket would be split into two separate buckets.
 
'''Controlled splitting''' occurs if a split is performed whenever the load factor, which is monitored by the file, exceeds a predetermined threshold.<ref name="LMS" /> If the hash index uses controlled splitting, the buckets are allowed to overflow by using linked overflow blocks. When the load factor surpasses a set threshold, the designated bucket is split. Instead of using the load factor, this threshold can also be expressed as an occupancy percentage, in which case, the maximum number of records in the hash index equals (occupancy percentage)*(max records per non-overflowed bucket)*(number of buckets).<ref name=":0">{{Cite book |last=Silberschatz |first=Abraham |title=Database system concepts |last2=Korth |first2=Henry F. |last3=Sudarshan |first3=S. |date=2020 |publisher=McGraw-Hill Education |isbn=978-1-260-08450-4 |edition=Seventh |___location=New York, NY}}</ref>
 
'''File contraction''' occurs in some LH algorithm implementations if a controlled split causes the load factor to sink below a threshold. In this case, a merge operation would be triggered which would undo the last split, and reset the file state.<ref name="LMS" />
 
===Addressing===
Addressing is based on the file state, consisting of the split pointer <math>s</math>
and the level <math>l</math>. If the level is <math>l</math>, then the hash functions
used are <math>h_l</math> and <math>h_{l+1}</math>.
 
The LH algorithm for hashing key <math>c</math> is<ref name="LMS" />
 
<math>a := h_l(c)</math>
 
if <math> a<s: a := h_{l+1}(c) </math>
 
===Split control===
Linear hashing algorithms may use only controlled splits or both controlled and uncontrolled splits.
 
An '''uncontrolled split''' occurs when a split is performed whenever a bucket overflows, in which case that bucket would be split into two separate buckets.
 
'''Controlled splitting''' occurs if a split is performed whenever the load factor, which is monitored by the file, exceeds a predetermined threshold.<ref name="LMS" /> If the hash index uses controlled splitting, the buckets are allowed to overflow by using linked overflow blocks. When the ''load factor'' surpasses a set threshold, the designated bucket is split. Instead of using the load factor, this threshold can also be expressed as an occupancy percentage, in which case, the maximum number of records in the hash index equals (occupancy percentage)*(max records per non-overflowed bucket)*(number of buckets).<ref name=":0">{{Cite book |last=Silberschatz |first=Abraham |title=Database system concepts |last2=Korth |first2=Henry F. |last3=Sudarshan |first3=S. |date=2020 |publisher=McGraw-Hill Education |isbn=978-1-260-08450-4 |edition=Seventh |___location=New York, NY}}</ref>
 
'''File contraction''' occurs in some LH algorithm implementations if a controlled split causes the load factor to sink below a threshold. In this case, a merge operation would be triggered which would undo the last split, and reset the file state.<ref name="LMS" />
 
===Split pointer===
Line 73:
 
For example, if numerical records are inserted into the hash index according to their farthest right binary digits, the bucket corresponding with the appended bucket will be split. Thus, if we have the buckets labelled as 000, 001, 10, 11, 100, 101, we would split the bucket 10 because we are appending and creating the next sequential bucket 110. This would give us the buckets 000, 001, 010, 11, 100, 101, 110.<ref name=":0" />
 
===File state calculation===
The file state consists of split pointer <math>s</math> and level <math>l</math>. If the
original file started with <math>N=1</math> buckets, then the number of buckets
<math>n</math> and the file state are related via
<ref name="CS">
{{Citation | first1=Juan | last1 = Chabkinian | first2=Thomas | last2=Schwarz
| title=Fast LH* | journal=International Journal of Parallel Programming
| volume=44 | number=4 | date=2016 | pages=709–734
| doi = 10.1007/s10766-015-0371-8 | s2cid = 7448240 }}
</ref>
 
<math>n = 2^l+s </math>
 
===LH*===
Line 95 ⟶ 82:
if there is only one split or merge going on while the request is processed, it can
be shown that there are at most two forwards. After a forward, the final bucket sends an
Image Adjustment Message to the client whose state is now closer to the state of the distributed file.<ref name="LMS" /> While forwards are reasonably rare for active clients,
their number can be even further reduced by additional information exchange between
servers and clients <ref name="CS/">
{{Citation |last1=Chabkinian |first1=Juan |title=Fast LH* |date=2016 |journal=International Journal of Parallel Programming |volume=44 |number=4 |pages=709–734 |doi=10.1007/s10766-015-0371-8 |s2cid=7448240 |last2=Schwarz |first2=Thomas}}
</ref>
 
== Other properties ==
 
===File state calculation===
The file state consists of split pointer <math>s</math> and level <math>l</math>. If the original file started with <math>N=1</math> buckets, then the number of buckets <math>n</math> and the file state are related via
<ref name="CS" />
 
<math>n = 2^l+s </math>.
 
==Adoption in language systems==