Locally recoverable code: Difference between revisions

Content deleted Content added
Submitting using AfC-submit-wizard
OAbot (talk | contribs)
m Open access bot: arxiv updated in citation with #oabot.
 
(46 intermediate revisions by 16 users not shown)
Line 1:
{{Short description|Coding Theory}}
{{Draft topics|stem}}
{{AfC topic|stem}}
{{AfC submission|||ts=20240924004443|u=Yaroslav-Marta|ns=118}}
{{AfC submission|t||ts=20231130204613|u=Yaroslav-Marta|ns=118|demo=}}<!-- Important, do not remove this line before article has been created. -->
 
'''Locally Recoverablerecoverable Codescodes''' are a family of [[error correction codescode]]s that were introduced first by D. S. Papailiopoulos and A. G. Dimakis<ref>{{Citation
|first1=Dimitris S.|last1=Papailiopoulos |first2=Alexandros G. |last2=Dimakis |titlecontribution="Locally Repairablerepairable Codes"codes |pages=2771-27752771–2775 |___location=Cambridge, MA, USA |publishertitle=2012 IEEE International Symposium on Information Theory Proceedings |date=2012 |doi=10.1109/ISIT.2012.6284027|isbn=978-1-4673-2579-0 |arxiv=1206.3804 |publisher=IEEE}}</ref> and have been widely studied in Information[[information theory]] due to their applications related to Distributivedistributive and Cloud[[cloud Storagestorage]] Systemssystems.<ref>{{Citation
<ref>{{Citation
|first1=A.
|last1=Barg
Line 14 ⟶ 9:
|first3=S.
|last3=Vlăduţ
|titlecontribution="Locally recoverable codes on algebraic curves"
|pages=1252-12561252–1256
|___location=Hong Kong, China
|publishertitle=2015 IEEE International Symposium on Information Theory
|date=2015
|doi=10.1109/ISIT.2015.7282656
|arxiv=1603.08876
}}</ref>
|isbn=978-1-4673-7704-1
<ref>{{Citation
|publisher=IEEE
}}</ref><ref>{{Citation
|first1=V. R.
|last1=Cadambe
|first2=A.
|last2=Mazumdar
|title="Bounds on the Size of Locally Recoverable Codes"
|pages=5787-57945787–5794
|publisherjournal=IEEE Transactions on Information Theory
|date=2015
|volume=61
|issue=11
|doi=10.1109/TIT.2015.2477406
|publisher=IEEE
}}</ref>
}}</ref><ref>{{Citation
|first1=A.
|last1=Dukes
Line 39 ⟶ 38:
|first3=G.
|last3=Micheli
|title="Optimal selection for good polynomials of degree up to five"
|journal=Designs, Codes and Cryptography
|pages=1427-1436
|pages=1427–1436
|date=2022
|volume=90
|issue=6
|doi= 10.1007/s10623-022-01046-y
|publisher=IEEE
}}</ref>
|arxiv=2104.01434
<ref>{{Citation
}}</ref><ref>{{Citation
|first1=K.
|last1=Haymaker
Line 51 ⟶ 54:
|first3=G.
|last3=Matthews
|title="Locally Recoverable Codes with Availability t≥2''t''≥2 from Fiber Products of Curves"
|date=2022
|doi=10.3934/amc.2018020
|doi-access=free
}}</ref>
 
An <math>[n, k, d, r]_{q}</math> LRC is an <math>[n, k, d]_{q}</math> [[linear code]] such that there is a [[Function (mathematics)|function]] <math>f_{i}</math> that takes as input <math>i</math> and a [[Set (mathematics)|set]] of <math>r</math> other [[Coordinate system|coordinates]] of a codeword <math>c = (c_{1}, \ldots, c_{n}) \in C</math> different from <math>c_{i}</math>, and outputs <math>c_{i}</math>.
 
==Overview==
 
[[Error correction code|Erasure-correcting codes]], or simply [[Error correction code|erasure codes]], for distributed and [[cloud storage]] systems, are becoming more and more popular as a result of the present spike in demand for [[cloud computing]] and storage services. This has inspired researchers in the fields of [[Information theory|information]] and [[coding theory]] to investigate new facets of codes that are specifically suited for use with storage systems.
It is well-known that LRC is a [[linear code|code]] that needs only a limited [[Set (mathematics)|set]] of other symbols to be accessed in order to restore every symbol in a codeword. This idea is very important for distributed and [[cloud storage]] systems since the most common error case is when one storage node fails (erasure). The main objective is to recover as much [[data]] as possible from the fewest additional storage nodes in order to restore the node. Hence, Locally Recoverable Codes are crucial for such systems.
 
The following [[definition]] of the LRC follows from the description above: an <math>[n, k, r]</math>-Locally Recoverable Code (LRC) of length <math>n</math> is a [[linear code|code]] that produces an <math>n</math>-symbol codeword from <math>k</math> information symbols, and for any symbol of the codeword, there exist at most <math>r</math> other symbols such that the value of the symbol can be recovered from them. The locality [[parameter]] satisfies <math>1 \leq r \leq k</math> because the entire codeword can be found by accessing <math>k</math> symbols other than the erased symbol. Furthermore, Locally Recoverable Codes, having the minimum [[Hamming distance|distance]] <math>d</math>, can recover <math>d-1</math> erasures.
 
==Definition==
 
Let <math>C</math> be a <math>[n, k, d]_{q}</math> [[linear code]]. For <math>i \in \{1, \ldots, n\}</math>, let us denote by <math>r_{i}</math> the minimum number of other [[Coordinate system|coordinates]] we have to look at to recover an erasure in [[Coordinate system|coordinate]] <math>i</math>. The number <math>r_{i}r_i</math> is said to be the ''locality of the <math>i</math>-th [[Coordinate system|coordinate]]'' of the code. The ''locality'' of the code is defined as <div style="text-align: center;"><math>r = \textrm{max}\{r_{i}|r_i \mid i \in \{1, \ldots, n\}\}.</math></div>
 
An <math>[n, k, d, r]_{q}</math> ''locally recoverable code'' (LRC) is an <math>[n, k, d]_{q}_q</math> [[linear code]] <math>C \in \mathbb F_q^n</math> with locality <math>r</math>.
 
Let <math>C</math> be an <math>[n, k, d]_{q}_q</math>-locally recoverable code. Then an erased component can be recovered linearly,<ref>{{Citation
|first1=Dimitris S.|last1=Papailiopoulos |first2=Alexandros G. |last2=Dimakis |titlecontribution="Locally Repairablerepairable Codes"codes |pages=2771-27752771–2775 |___location=Cambridge, MA, USA |publishertitle=2012 IEEE International Symposium on Information Theory |date=2012 |doi=10.1109/ISIT.2012.6284027|isbn=978-1-4673-2579-0 |arxiv=1206.3804 }}</ref>, i.e. for every <math>i \in \{1, \ldots, n\}</math>, the space of [[linear equationsequation]]s of the code contains [[Element (mathematics)|elements]] of the form <math> x_{i}x_i = f(x_{i_{1}i_1}, \ldots, x_{i_{r}i_r})</math>, where <math>i_{j}i_j \neq i</math>.
 
==Optimal Locallylocally Recoverablerecoverable Codescodes==
 
'''Theorem'''<ref>{{Citation
|first1=V. |last1=Cadambe |first2=A. |last2=Mazumdar |titlecontribution="An upper bound on the size of locally recoverable codes" |pages=1-51–5 |___location=Calgary, AB, Canada |publishertitle=2013 International Symposium on Network Coding |date=2013 |doi=10.1109/NetCod.2013.6570829|arxiv=1308.3200 |isbn=978-1-4799-0823-3 }}</ref> Let <math>n = (r+1)s</math> and let <math>C</math> be an <math>[n, k, d]_{q}</math>-locally recoverable code having <math>s</math> disjoint locality [[Disjoint sets|sets]] of size <math>r+1</math>. Then <div style="text-align: center;"><math>d \leq n - k - \left\lceil\frac{k}{r}\right\rceil + 2.</math></div>
 
An <math>[n, k, d, r]_{q}</math>-LRC <math>C</math> is said to be optimal if the minimum [[Hamming distance|distance]] of <math>C</math> satisfies <div style="text-align: center;"><math>d = n - k - \left\lceil\frac{k}{r}\right\rceil + 2 .</math></div>
 
== Tamo--BargTamo–Barg Codescodes ==
 
Let <math>f</math> \in <math>\mathbb F_{q} [x]</math> be a [[polynomial]] and let <math>\ell</math> be a positive [[integer]]. Then <math>f</math> is said to be (<math>r</math>, <math>\ell</math>)-good if
 
:• <math>f</math> has [[Degree of a polynomial|degree]] <math>r+1</math>,
 
:• there exist distinct [[subset]]s <math>A_{1}</math> , . . .\ldots, <math>A_{\ell}</math> distinct subsets of <math>\mathbb F_{q}</math> such that
 
::– for any <math>i \in \{1, \ldots, \ell\}</math>, <math>f</math> (<math>A_{i}</math> ) = \{<math>t_{i}\}</math>} for some <math>t_{i}</math> \in <math>\mathbb F_{q}</math> , i.e., <math>f</math> is [[Constant (mathematics)|constant]] on <math>A_{i}</math>,
 
::– #<math>\# A_{i}</math> = <math>r + 1</math>,
 
::– <math>A_{i}</math> \cap <math>A_{j} = \varnothing</math> = ∅ for any <math>i</math> \neq <math>j</math>.
 
We say that {<math>A_{1},\ldots,A_{\ell}</math>} is a splitting covering for <math>f</math>.<ref>{{Citation |first1=G.
|last1=Micheli |title="Constructions of Locally Recoverable Codes Which are Optimal" |pages=167-175167–175 |publisherjournal=IEEE Transactions on Information Theory |date=2020 |volume=66 |doi=10.1109/TIT.2019.2939464
|arxiv=1806.11492 }}</ref>.
 
=== Tamo--BargTamo–Barg Constructionconstruction ===
 
The Tamo--BargTamo–Barg construction utilizes good polynomials.<ref>{{Citation
|first1=I.|last1=Tamo |first2=A. |last2=Barg |titlecontribution="A family of optimal locally recoverable code"codes |pages=686-690686–690 |___location=Honolulu, HI, USA |publishertitle=2014 IEEE International Symposium on Information Theory |date=2014 |doi=10.1109/ISIT.2014.6874920|isbn=978-1-4799-5186-4 }}</ref>
:• Suppose that a <math>(r, \ell)</math>-good polynomial <math>f(x)</math> over <math>\mathbb F_{q}</math> is given with splitting covering <math>i \in \{1, \ldots, \ell\}</math>.
:• Let <math>s</math> \leq <math>\ell-1</math> be a positive [[integer]].
:• Consider the following <math>\mathbb F_{q}F_q</math> -[[vector space]] of polynomials[[polynomial]]s <math display="block">V = \left\{\sum_{i=0}^s g_i(x)f(x)^i: \deg(g_i(x)) \leq \deg(f(x))-2 \right\}.</math>
:• Let <math>T</math> = <math display="inline">T = \bigcup_{i=1}^\ell A_i</math>.
<div style="text-align: center;"><math>V = \{\sum_{i=0}^s g_{i}(x)f(x)^i:{\text{deg}}(g_{i}(x)) \leq {\text{deg}}(f(x))-2\}</math></div>
:• The code <math> \{ ev_\operatorname{Tev}_T(g):g \in V \}</math> is an <math>((r+1)\ell,(s+1)r,d,r)</math> -optimal locally coverable code, where <math>ev_\operatorname{Tev}_T</math> denotes evaluation of <math>g</math> at all points in the [[Set (mathematics)|set]] <math>T</math>.
:• Let <math>T</math> = <math display="inline">\bigcup_{i=1}^\ell A_i</math>
:• The code <math> \{ ev_{T}(g):g \in V \}</math> is an <math>((r+1)\ell,(s+1)r,d,r)</math> optimal locally coverable code, where <math>ev_{T}</math> denotes evaluation of g at all points in the set <math>T</math>.
 
=== Parameters of Tamo--BargTamo–Barg Codescodes ===
 
:• '''Length.''' The length is the number of evaluation points. Because the [[Set (mathematics)|sets]] <math>A_i</math> are [[Disjoint sets|disjoint]] for <math>i \in \{1, \ldots, \ell\}</math>, the length of the code is <math>|T| = (r+1)\ell</math>.
 
:• '''Dimension.''' The [[dimension]] of the code is <math>(s+1)r</math>, for <math>s</math> ≤ <math>\ell-1</math>, as each <math>g_i</math> has [[Degree of a polynomial|degree]] at most <math>{\text{deg}}(f(x))-2</math>, covering a [[vector space]] of [[dimension]] <math>{\text{deg}}(f(x))-1=r</math>, and by the construction of <math>V</math>, there are <math>s+1</math> distinct <math>g_i</math>.
 
:• '''Distance.''' The [[Hamming distance|distance]] is given by the fact that <math>V \subseteq \mathbb F_q [x]_{\leq k}</math>, where <math>k = r + 1 - 2 + s(r+1)</math>, and the obtained code is the [[Reed–Solomon error correction|Reed-Solomon code]] of [[Degree of a polynomial|degree]] at most <math>k</math>, so the minimum [[Hamming distance|distance]] equals <math>(r+1)\ell-((r+1)-2+s(r+1))</math>.
 
:• '''Locality.''' After the erasure of the single component, the evaluation at <math>a_i \in A_i</math>, where <math>|A_i|=r+1</math>, is unknown, but the evaluations for all other <math>a \in A_i</math> are known, so at most <math>r</math> evaluations are needed to uniquely determine the erased component, which gives us the locality of <math>r</math>.
 
: To see this, <math>g</math> restricted to <math>A_j</math> can be described by a [[polynomial]] <math>h</math> of [[Degree of a polynomial|degree]] at most <math>{\text{deg}}(f(x))-2 = r + 1 - 2 = r - 1</math> thanks to the form of the [[Element (mathematics)|elements]] in <math>V</math> (i.e., thanks to the fact that <math>f</math> is [[Constant function|constant]] on <math>A_j</math>, and the <math>g_i</math>'s have [[Degree of a polynomial|degree]] at most <math>{\text{deg}}(f(x))-2</math>). On the other hand <math>|A_j \backslash \{a_j\}| = r</math>, and <math>r</math> evaluations uniquely determine a [[polynomial]] of [[Degree of a polynomial|degree]] <math>r-1</math>. Therefore <math>h</math> can be constructed and evaluated at <math>a_j</math> to recover <math>g(a_j)</math>.
 
=== Example of Tamo-BargTamo–Barg Constructionconstruction ===
 
We will use <math>x^5 \in \mathbb F_{41} [x]</math> to construct <math>[15, 8, 6, 4]</math>-LRC. Notice that the [[Degree of a polynomial|degree]] of this [[polynomial]] is 5, and it is [[Constant (mathematics)|constant]] on <math>A_i</math> for <math>i \in \{1, \ldots, 8\}</math>, where <math>A_1 = \{1, 10, 16, 18, 37\}</math>, <math>A_2 = 2A_1</math>, <math>A_3 = 3A_1</math>, <math>A_4 = 4A_1</math>, <math>A_5 = 5A_1</math>, <math>A_6 = 6A_1</math>, <math>A_7 = 11A_1</math>, and <math>A_8 = 15A_1</math>: <math>A_1^5 = \{1\}</math>, <math>A_2^5 = \{32\}</math>, <math>A_3^5 = \{38\}</math>, <math>A_4^5 = \{40\}</math>, <math>A_5^5 = \{9\}</math>, <math>A_6^5 = \{27\}</math>, <math>A_7^5 = \{3\}</math>, <math>A_8^5 = \{14\}</math>. Hence, <math>x^5</math> is a <math>(4,8)</math>-good polynomial over <math>\mathbb F_{41}</math> by the definition. Now, we will use this [[polynomial]] to construct a code of [[dimension]] <math>k = 8</math> and length <math>n = 15</math> over <math>\mathbb F_{41}</math>. The locality of this code is 4, which will allow us to recover a single [[Server (computing)|server]] failure by looking at the infromationinformation contained in at most 4 other [[Server (computing)|servers]].
 
Next, let's us define the encoding [[polynomial]]: <math>f_a(x) = \sum_{i=0}^{r - 1} f_i(x)x^i</math>, where <math>f_i(x) = \sum_{i=0}^{\frac{k}{r} - 1} a_{i,j}g(x)^j</math>. So, <math>f_a(x) =</math> <math>a_{0,0} +</math> <math>a_{0,1}x^5 +</math> <math>a_{1,0}x +</math> <math>a_{1,1}x^6 +</math> <math>a_{2,0}x^2 +</math> <math>a_{2,1}x^7 +</math> <math>a_{3,0}x^3 +</math> <math>a_{3,1}x^8</math>.
 
Thus, we can use the obtained encoding [[polynomial]] if we take our [[data]] to encode as the [[Row and column vectors|row vector]] <math>a = </math> <math>(a_{0,0}, a_{0,1}, a_{1,0}, a_{1,1}, a_{2,0}, a_{2,1}, a_{3,0}, a_{3,1})</math>. Encoding the [[Vector (mathematics and physics)|vector]] <math>m</math> to a length 15 message [[Vector (mathematics and physics)|vector]] <math>c</math> by multiplying <math>m</math> by the [[generator matrix]]
 
: <div style="text-align: center;"><math>
<math>
G = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
Line 134 ⟶ 145:
1 & 16 & 37 & 10 & 18 & 10 & 37 & 1 & 16 & 18 & 1 & 37 & 10 & 18 & 16
\end{pmatrix}
. </math></div>
 
For example, the encoding of information [[Vector (mathematics and physics)|vector]] <math>m = (1, 1, 1, 1, 1, 1, 1, 1)</math> gives the codeword <math>c = mG = (8, 8, 5, 9, 21, 3, 36, 31, 32, 12, 2, 20, 37, 33, 21)</math>.
 
Observe that we constructed an optimal LRC; therefore, using the [[Singleton bound]], we have that the [[Hamming distance|distance]] of this code is <math>d = n - k - \left\lceil\frac{k}{r}\right\rceil + 2 = 15 - 8 - 2 + 2 = 7</math>. Thus, we can recover any 6 erasures from our codeword by looking at no more than 8 other components.
 
==Locally Recoverablerecoverable Codescodes with Availabilityavailability==
 
A code <math>C</math> has all-symbol locality <math>r</math> and availability <math>t</math> if every code symbol can be recovered from <math>t</math> disjoint repair sets of other symbols, each [[Set (mathematics)|set]] of [[Cardinality|size]] at most <math>r</math> symbols. Such codes are called <math>(r,t)_a</math>-LRC.<ref>{{Citation
'''Definition'''<ref>{{Citation
|first1=P. |last1=Huang |first2=E. |last2=Yaakobi |first3=H.|last3=Uchikawa |first4=P.H.|last4=Siegel |titlecontribution="Linear locally repairable codes with availability" |pages=1871-18751871–1875 |___location=Hong Kong, China |publishertitle=2015 IEEE International Symposium on Information Theory |date=2015 |doi=10.1109/ISIT.2015.7282780|isbn=978-1-4673-7704-1 }}</ref> A code <math>C</math> has all-symbol locality <math>r</math> and availability <math>t</math> if every code symbol can be recovered from <math>t</math> disjoint repair sets of other symbols, each set of size at most <math>r</math> symbols. Such codes are called <math>(r,t)_a</math>-LRC.
 
'''Theorem'''<ref>{{Citation |first1=I. |last1=Tamo |first2=A. |last2=Barg |title="Bounds on locally recoverable codes with multiple recovering sets" |pages=691-695 |___location=Honolulu, HI, USA |publisher=2014 IEEE International Symposium on Information Theory |date=2014 |doi=10.1109/ISIT.2014.6874921}}</ref> The minimum [[Hamming distance|distance]] of <math>[n,k,d]_q</math>-LRC having locality <math>r</math> and availability <math>t</math> satisfies the [[Upper and lower bounds|upper bound ]]
<div style="text-align: center;"><math>Vd =\leq n - \{\sum_{i=0}^st g_{i}(x)f(x)^i:{\textleft\lfloor\frac{degk-1}}(g_{r^i}(x)) \leq {right\text{deg}}(f(x))-2\}rfloor.</math></div>
 
If the code is [[Systematic code|systematic]] and locality and availability apply only to its information symbols, then the code has information locality <math>r</math> and availability <math>t</math>, and is called <math>(r,t)_i</math>-LRC.<ref>{{Citation |first1=I. |last1=Tamo |first2=A. |last2=Barg |contribution=Bounds on locally recoverable codes with multiple recovering sets |pages=691–695 |___location=Honolulu, HI, USA |title=2014 IEEE International Symposium on Information Theory |date=2014 |doi=10.1109/ISIT.2014.6874921|arxiv=1402.0916 |isbn=978-1-4799-5186-4 }}</ref>
<div style="text-align: center;"><math>d \leq n - \sum_{i=0}^{t} \left\lfloor\frac{k-1}{r^i}\right\rfloor</math></div>.
 
If the code is systematic and locality and availability apply only to its information symbols, then the code has information locality <math>r</math> and availability <math>t</math>, and is called <math>(r,t)_i</math>-LRC.
 
'''Theorem'''<ref>{{Citation
|first1=A. |last1=Wang |first2=Z. |last2=Zhang |title="Repair locality with multiple erasure tolerance" |pages=6979-69876979–6987 |publisherjournal=IEEE Transactions on Information Theory |date=2014 |volume=60 |issue=11 |doi=10.1109/TIT.2014.2351404|arxiv=1306.4774 }}</ref> The minimum [[Hamming distance|distance]] <math>d</math> of an <math>[n,k,d]_q</math> linear <math>(r,t)_i</math>-LRC satisfies the [[Upper and lower bounds|upper bound]]
<div style="text-align: center;"><math>d \leq n - \sum_{i=0}^{t} k-\left\lfloorlceil\frac{t(k-1)+1}{t(r^i-1)+1}\right\rfloorrceil+2.</math></div>.
 
== References ==
<div style="text-align: center;"><math>d \leq n-k-\left\lceil\frac{t(k-1)+1}{t(r-1)+1}\right\rceil+2</math></div>.
 
== References ==
<!-- Inline citations added to your article will automatically display here. See en.wikipedia.org/wiki/WP:REFB for instructions on how to add citations. -->
{{reflist}}
 
[[Category:Cryptography]]
[[Category:Information theory]]
[[Category:Error detection and correction]]