Erasure code: Difference between revisions

Content deleted Content added
Tags: Mobile edit Mobile web edit Advanced mobile edit
 
(12 intermediate revisions by 9 users not shown)
Line 2:
 
{{Short description|Code added to allow recovery of lost data}}
In [[coding theory]], an '''erasure code''' is a [[forward error correction]] (FEC) code under the assumption of bit erasures (rather than bit errors), which transforms a message of ''k'' symbols into a longer message (code word) with ''n'' symbols such that the original message can be recovered from a subset of the ''n'' symbols. The fraction ''r'' = ''k''/''n'' is called the [[code rate]]. The fraction ''k’/k'', where ''k’'' denotes the number of symbols required for recovery, is called [[reception efficiency]]. The recovery algorithm expects that it is known which of the ''n'' symbols are lost.
 
== History ==
 
Erasure coding was invented by [[Irving Reed]] and [[Gustave Solomon]] in 1960.<ref name="xinnor">
{{Cite web |date=2023-09-03 |title=RAID vs. Erasure Coding. What's the Difference? {{!}} Blog {{!}} Xinnor |url=https://xinnor.io/blog/a-guide-to-raid-pt-4-raid-vs-erasure-coding-whats-the-difference/ |access-date=2024-09-18 |website=The Fastest and Most Reliable Software RAID {{!}} Xinnor |language=en}}
</ref>
 
There are many different erasure coding schemes. The most popular erasure codes are
[[Reed-Solomon coding]],
[[Low-density parity-check code]] (LDPC codes), and
[[Turbo codes]].<ref name="xinnor" />
 
As of 2023, modern data storage systems can be designed to tolerate the complete failure of a few disks without data loss, using one of 3 approaches:<ref>
{{Cite web |date=2014-04-07 |title=Ceph.io — Erasure Coding in Ceph |url=https://ceph.io/en/news/blog/2014/erasure-coding-in-ceph-2/ |access-date=2024-09-18 |website=ceph.io |language=en}}
</ref><ref name="bdr">
{{Cite web |last=Lee |first=Brandon |date=2023-12-26 |title=RAID vs Erasure Coding vs Replication |url=https://www.bdrsuite.com/blog/raid-vs-erasure-coding-vs-replication/ |access-date=2024-09-18 |website=BDRSuite |language=en-US}}
</ref><ref name="jor">
{{Cite web |last=O'Reilly |first=Jim |title=RAID Vs. Erasure Coding |url=https://www.networkcomputing.com/data-center-networking/raid-vs-erasure-coding |access-date=2024-09-18 |website=www.networkcomputing.com |language=en}}
</ref>
* Replication
* RAID
* Erasure Coding
 
While technically RAID can be seen as a kind of erasure code,<ref>
Dimitri Pertin, Alexandre van Kempen, Benoît Parrein, Nicolas Normand. [https://hal.science/hal-01162047/document "Comparison of RAID-6 Erasure Codes"].
The third Sino-French Workshop on Information and Communication Technologies,
SIFWICT 2015, Jun 2015, Nantes, France. ffhal-01162047f
</ref>
"RAID" is generally applied to an array attached to a single host computer (which is a single point of failure), while "erasure coding" generally implies multiple hosts,<ref name="bdr" />
sometimes called a [[Redundant Array of Inexpensive Servers]] (RAIS).
The erasure code allows operations to continue when any one of those hosts stops.<ref name="jor" /><ref>
{{Cite web |title=Understanding IBM Spectrum Scale Erasure Code Edition fault tolerance |url=https://www.ibm.com/docs/en/storage-scale-ece/5.1.3?topic=issece-understanding-spectrum-scale-erasure-code-edition-fault-tolerance |access-date=2024-09-18 |website=www.ibm.com |language=en-us}}
</ref>
 
Compared to block-level RAID systems, object storage erasure coding has some significant differences that make it more resilient.<ref>
{{Cite web |date=2021-07-27 |title=Object Storage Erasure Coding vs. Block Storage RAID |url=https://blog.min.io/erasure-coding-vs-raid/ |access-date=2024-09-18 |website=MinIO Blog |language=en}}
</ref><ref>
{{Cite web |title=Erasure coding vs Raid as a data protection method {{!}} Computer Weekly |url=https://www.computerweekly.com/feature/Erasure-coding-versus-RAID-as-a-data-protection-method |access-date=2024-09-18 |website=ComputerWeekly.com |language=en}}
</ref><ref>
{{Cite web |last=Kruth |first=Peter |date=2023-10-04 |title=Erasure Code: RAID As It Should Be – Huawei BLOG |url=https://blog.huawei.com/2021/11/30/erasure-code-raid-as-it-should-be/ |access-date=2024-09-18 |archive-url=https://web.archive.org/web/20231004193238/https://blog.huawei.com/2021/11/30/erasure-code-raid-as-it-should-be/ |archive-date=2023-10-04 }}
</ref><ref>
{{Cite web |date=2022-04-25 |title=Erasure Coding 101 |url=https://blog.min.io/erasure-coding/ |access-date=2024-09-18 |website=MinIO Blog |language=en}}
</ref><ref>
{{Cite web |last=Bhaskaran |first=Dinesh Kumar |date=July 6, 2018 |title=Why erasure coding is the future of data resiliency |url=https://www.datacenterdynamics.com/en/opinions/why-erasure-coding-is-the-future-of-data-resiliency/ |url-status=dead |archive-url=https://web.archive.org/web/20200807015110/https://www.datacenterdynamics.com/en/opinions/why-erasure-coding-is-the-future-of-data-resiliency/ |archive-date=August 7, 2020}}
</ref>
 
==Optimal erasure codes==
Line 15 ⟶ 60:
If one of these values, <math>v_e</math>, is erased, it can be easily recovered by summing the remaining variables:
:<math>v_{e}= - \sum_{i=1 ,i \neq e }^{k+1}v_i.</math>
 
[[RAID 5]] is a widely used application of the parity check erasure code.<ref name="xinnor" />
 
===Polynomial oversampling===
Line 22 ⟶ 69:
 
[[Alice and Bob|Alice]] wants to send her telephone number (555629) to [[Alice and Bob|Bob]] using err-mail. Err-mail works just like e-mail, except
#About half of all the mail gets lost.<ref>Some versions of this story refer to the err-mail daemon.</ref>
#Messages longer than 5 characters are illegal.
#It is very expensive (similar to air-mail).
Line 56 ⟶ 103:
 
Most practical erasure codes are [[systematic code]]s -- each one of the original ''k'' symbols can be found copied, unencoded, as one of the ''n'' message symbols.<ref name="vinayak" />
(Erasure codes that support [[secret splittingsharing]] never use a systematic code).
 
 
==Near-optimal erasure codes==
Line 100 ⟶ 146:
 
Initially, erasure codes were used to reduce the cost of storing "cold" (rarely-accessed) data efficiently; but erasure codes can also be used to improve performance serving "hot" (more-frequently-accessed) data.<ref name="vinayak" />
 
[[RAID]] N+M divides data blocks across N+M drives, and can recover all the data when any M drives fail.<ref name="xinnor" />
In particular, RAID 7.3 refers to triple-parity RAID, and can recover all the data when any 3 drives fail.<ref>
Adam Leventhal.
[https://dl.acm.org/doi/pdf/10.1145/1661785.1670144 ""Triple-parity RAID and beyond"].
2009.
</ref>
 
==Examples==
Line 123 ⟶ 176:
*[[Reed–Solomon|Reed–Solomon codes]]
*[http://switzernet.com/people/emin-gabrielyan/051103-erasure-9-5-resilient/ Erasure Resilient Systematic Code], an MDS code outperforming Reed–Solomon in the maximal number of redundant packets,{{According to whom|date=August 2023}} see [http://switzernet.com/people/emin-gabrielyan/051025-erasure-resilient/ RS(4,2) with 2 bits] or [http://switzernet.com/people/emin-gabrielyan/051027-erasure-9-2-resilient/ RS(9,2) with 3 bits]
*[[Regenerating Codes]]<ref name="regcodes">{{cite journal|last1=Dimakis|first1=Alexandros G.|last2=Godfrey|first2=P. Brighten|last3=Wu|first3=Yunnan|last4=Wainwright|first4=Martin J.|last5=Ramchandran|first5=Kannan|title=Network Coding for Distributed Storage Systems|journal=IEEE Transactions on Information Theory|date=September 2010|volume=56|issue=9|pages=4539–4551|doi=10.1109/TIT.2010.2054295|arxiv=cs/0702015|citeseerx=10.1.1.117.6892|s2cid=260559901 }}</ref><ref>{{Cite web |date=2017-07-31 |title=home [Erasure Coding for Distributed Storage Wiki] |url=http://storagewiki.ece.utexas.edu/doku.php |access-date=2023-08-20 |archive-url=https://web.archive.org/web/20170731144507/http://storagewiki.ece.utexas.edu/doku.php |accessarchive-date=20232017-0807-2031 |website=web.archive.org}}</ref>
*any other [[maximum distance separable code]]
 
Line 130 ⟶ 183:
*[[Secret sharing]] (differs in that the original secret is encrypted and obscured until the decode quorum is reached)
*[[Spelling alphabet]]
* [[Binary erasure channel]]
 
==References==