Locally recoverable code: Difference between revisions

Content deleted Content added
OAbot (talk | contribs)
m Open access bot: arxiv updated in citation with #oabot.
 
(179 intermediate revisions by 19 users not shown)
Line 1:
{{Short description|Coding Theory}}
{{AfC submission|t||ts=20231130204613|u=Yaroslav-Marta|ns=118|demo=}}<!-- Important, do not remove this line before article has been created. -->
 
'''Locally recoverable codes''' are a family of [[error correction code]]s that were introduced first by D. S. Papailiopoulos and A. G. Dimakis<ref>{{Citation
==Introduction==
|first1=Dimitris S.|last1=Papailiopoulos |first2=Alexandros G. |last2=Dimakis |contribution=Locally repairable codes |pages=2771–2775 |___location=Cambridge, MA, USA |title=2012 IEEE International Symposium on Information Theory Proceedings |date=2012 |doi=10.1109/ISIT.2012.6284027|isbn=978-1-4673-2579-0 |arxiv=1206.3804 |publisher=IEEE}}</ref> and have been widely studied in [[information theory]] due to their applications related to distributive and [[cloud storage]] systems.<ref>{{Citation
|first1=A.
|last1=Barg
|first2=I.
|last2=Tamo
|first3=S.
|last3=Vlăduţ
|contribution=Locally recoverable codes on algebraic curves
|pages=1252–1256
|___location=Hong Kong, China
|title=2015 IEEE International Symposium on Information Theory
|date=2015
|doi=10.1109/ISIT.2015.7282656
|arxiv=1603.08876
|isbn=978-1-4673-7704-1
|publisher=IEEE
}}</ref><ref>{{Citation
|first1=V. R.
|last1=Cadambe
|first2=A.
|last2=Mazumdar
|title=Bounds on the Size of Locally Recoverable Codes
|pages=5787–5794
|journal=IEEE Transactions on Information Theory
|date=2015
|volume=61
|issue=11
|doi=10.1109/TIT.2015.2477406
|publisher=IEEE
}}</ref><ref>{{Citation
|first1=A.
|last1=Dukes
|first2=A.
|last2=Ferraguti
|first3=G.
|last3=Micheli
|title=Optimal selection for good polynomials of degree up to five
|journal=Designs, Codes and Cryptography
|pages=1427–1436
|date=2022
|volume=90
|issue=6
|doi= 10.1007/s10623-022-01046-y
|publisher=IEEE
|arxiv=2104.01434
}}</ref><ref>{{Citation
|first1=K.
|last1=Haymaker
|first2=B.
|last2=Malmskog
|first3=G.
|last3=Matthews
|title=Locally Recoverable Codes with Availability ''t''≥2 from Fiber Products of Curves
|date=2022
|doi=10.3934/amc.2018020
|doi-access=free
}}</ref>
 
An <math>[n, k, d, r]_{q}</math> LRC is an <math>[n, k, d]_{q}</math> [[linear code]] such that there is a [[Function (mathematics)|function]] <math>f_{i}</math> that takes as input <math>i</math> and a [[Set (mathematics)|set]] of <math>r</math> other [[Coordinate system|coordinates]] of a codeword <math>c = (c_{1}, \ldots, c_{n}) \in C</math> different from <math>c_{i}</math>, and outputs <math>c_{i}</math>.
Locally Recoverable Codes are a family of error correction codes that were introduced first by D. S. Papailiopoulos and A. G. Dimakis and have been widely studied in Information theory due to their applications related to Distributive and Cloud Storage Systems.
 
==Overview==
A locally recoverable code is a linear code such that there is a function that takes set of coordinates of a codeword and some specific coordinate and outputs an appropriate coordinate.
 
[[Error correction code|Erasure-correcting codes]], or simply [[Error correction code|erasure codes]], for distributed and [[cloud storage]] systems, are becoming more and more popular as a result of the present spike in demand for [[cloud computing]] and storage services. This has inspired researchers in the fields of [[Information theory|information]] and [[coding theory]] to investigate new facets of codes that are specifically suited for use with storage systems.
It is well-known that LRC is a [[linear code|code]] that needs only a limited [[Set (mathematics)|set]] of other symbols to be accessed in order to restore every symbol in a codeword. This idea is very important for distributed and [[cloud storage]] systems since the most common error case is when one storage node fails (erasure). The main objective is to recover as much [[data]] as possible from the fewest additional storage nodes in order to restore the node. Hence, Locally Recoverable Codes are crucial for such systems.
 
The following [[definition]] of the LRC follows from the description above: an <math>[n, k, r]</math>-Locally Recoverable Code (LRC) of length <math>n</math> is a [[linear code|code]] that produces an <math>n</math>-symbol codeword from <math>k</math> information symbols, and for any symbol of the codeword, there exist at most <math>r</math> other symbols such that the value of the symbol can be recovered from them. The locality [[parameter]] satisfies <math>1 \leq r \leq k</math> because the entire codeword can be found by accessing <math>k</math> symbols other than the erased symbol. Furthermore, Locally Recoverable Codes, having the minimum [[Hamming distance|distance]] <math>d</math>, can recover <math>d-1</math> erasures.
 
==Definition==
Let <math>C</math> be a <math>[n, k, d]_{q}</math> linear code. For <math>i \in \{1, \ldots, n\}</math>, let us denote by <math>r_{i}</math> the minimum number of other coordinates we have to look at to recover an erasure in coordinate <math>i</math>. The number <math>r_{i}</math> is said to be the ''locality of the <math>i</math>-th coordinate'' of the code. The ''locality'' of the code is defined as <div style="text-align: center;"><math>r = max\{r_{i}|i \in \{1, \ldots, n\}\}</math></div>
 
Let <math>C</math> be a <math>[n, k, d]_{q}</math> [[linear code]]. For <math>i \in \{1, \ldots, n\}</math>, let us denote by <math>r_{i}</math> the minimum number of other [[Coordinate system|coordinates]] we have to look at to recover an erasure in [[Coordinate system|coordinate]] <math>i</math>. The number <math>r_i</math> is said to be the ''locality of the <math>i</math>-th [[Coordinate system|coordinate]]'' of the code. The ''locality'' of the code is defined as <div style="text-align: center;"><math>r = \max\{r_i \mid i \in \{1, \ldots, n\}\}.</math></div>
 
An <math>[n, k, d, r]_{q}</math> ''locally recoverable code'' (LRC) is an <math>[n, k, d]_{q}_q</math> [[linear code]] <math>C \in \mathbb F_q^n</math> with locality <math>r</math>.
 
Let <math>C</math> be an <math>[n, k, d]_q</math>-locally recoverable code. Then an erased component can be recovered linearly,<ref>{{Citation
|first1=Dimitris S.|last1=Papailiopoulos |first2=Alexandros G. |last2=Dimakis |contribution=Locally repairable codes |pages=2771–2775 |___location=Cambridge, MA, USA |title=2012 IEEE International Symposium on Information Theory |date=2012 |doi=10.1109/ISIT.2012.6284027|isbn=978-1-4673-2579-0 |arxiv=1206.3804 }}</ref> i.e. for every <math>i \in \{1, \ldots, n\}</math>, the space of [[linear equation]]s of the code contains [[Element (mathematics)|elements]] of the form <math> x_i = f(x_{i_1}, \ldots, x_{i_r})</math>, where <math>i_j \neq i</math>.
 
==Optimal locally recoverable codes==
Let <math>C</math> be an <math>[n, k, d]_{q}</math>-locally recoverable code. Then a deleted component can be recovered linearly, i.e. for every <math>i \in \{1, \ldots, n\}</math>, the space of linear equations of the code contains elements of the form <math> x_{i} = f(x_{i}, \ldots, x_{i_{r}})</math>, where <math>i_{j} \neq i</math>.
 
'''Theorem'''<ref>{{Citation
==Optimal Locally Recoverable Codes==
|first1=V. |last1=Cadambe |first2=A. |last2=Mazumdar |contribution=An upper bound on the size of locally recoverable codes |pages=1–5 |___location=Calgary, AB, Canada |title=2013 International Symposium on Network Coding |date=2013 |doi=10.1109/NetCod.2013.6570829|arxiv=1308.3200 |isbn=978-1-4799-0823-3 }}</ref> Let <math>n = (r+1)s</math> and let <math>C</math> be an <math>[n, k, d]_{q}</math>-locally recoverable code having <math>s</math> disjoint locality [[Disjoint sets|sets]] of size <math>r+1</math>. Then <div style="text-align: center;"><math>d \leq n - k - \left\lceil\frac{k}{r}\right\rceil + 2.</math></div>
 
'''Theorem 1.3''' Let <math>n = (r+1)s</math> and let <math>C</math> be anAn <math>[n, k, d, r]_{q}</math>-locally recoverable code havingLRC <math>sC</math> disjointis said to be optimal localityif setsthe ofminimum size[[Hamming distance|distance]] of <math>r+1C</math>. Then,satisfies <div style="text-align: center;"><math>d \leq= n - k - \left\lceil\frac{k}{r}\right\rceil + 2 .</math></div>
 
== Tamo–Barg codes ==
 
AnLet <math>[n,f k,\in d,\mathbb r]_F_{q} [x]</math>-LRC be a [[polynomial]] and let <math>C\ell</math> is sai to be optimala ifpositive the minimum distance[[integer]]. ofThen <math>Cf</math> satisfiesis <divsaid style="text-align:to center;">be (<math>d = n - k - \lceil\frac{k}{r}\rceil + 2</math>, <math>\ell</divmath>)-good if
 
:• <math>f</math> has [[Degree of a polynomial|degree]] <math>r+1</math>,
 
:• there exist distinct [[subset]]s <math>A_{1}, \ldots, A_{\ell}</math> of <math>\mathbb F_{q}</math> such that
 
::– for any <math>i \in \{1, \ldots, \ell\}</math>, <math>f(A_{i}) = \{t_{i}\}</math> for some <math>t_{i} \in \mathbb F_{q}</math> , i.e., <math>f</math> is [[Constant (mathematics)|constant]] on <math>A_{i}</math>,
 
::– <math>\# A_{i} = r + 1</math>,
 
::– <math>A_{i} \cap A_{j} = \varnothing</math> for any <math>i \neq j</math>.
 
We say that {<math>A_{1},\ldots,A_{\ell}</math>} is a splitting covering for <math>f</math>.<ref>{{Citation |first1=G.
|last1=Micheli |title=Constructions of Locally Recoverable Codes Which are Optimal |pages=167–175 |journal=IEEE Transactions on Information Theory |date=2020 |volume=66 |doi=10.1109/TIT.2019.2939464
|arxiv=1806.11492 }}</ref>
 
=== Tamo–Barg construction ===
 
The Tamo–Barg construction utilizes good polynomials.<ref>{{Citation
|first1=I.|last1=Tamo |first2=A. |last2=Barg |contribution=A family of optimal locally recoverable codes |pages=686–690 |___location=Honolulu, HI, USA |title=2014 IEEE International Symposium on Information Theory |date=2014 |doi=10.1109/ISIT.2014.6874920|isbn=978-1-4799-5186-4 }}</ref>
:• Suppose that a <math>(r, \ell)</math>-good polynomial <math>f(x)</math> over <math>\mathbb F_{q}</math> is given with splitting covering <math>i \in \{1, \ldots, \ell\}</math>.
:• Let <math>s \leq \ell-1</math> be a positive [[integer]].
:• Consider the following <math>\mathbb F_q</math>-[[vector space]] of [[polynomial]]s <math display="block">V = \left\{\sum_{i=0}^s g_i(x)f(x)^i: \deg(g_i(x)) \leq \deg(f(x))-2 \right\}.</math>
:• Let <math display="inline">T = \bigcup_{i=1}^\ell A_i</math>.
:• The code <math> \{ \operatorname{ev}_T(g):g \in V \}</math> is an <math>((r+1)\ell,(s+1)r,d,r)</math>-optimal locally coverable code, where <math>\operatorname{ev}_T</math> denotes evaluation of <math>g</math> at all points in the [[Set (mathematics)|set]] <math>T</math>.
 
=== Parameters of Tamo–Barg codes ===
 
:• '''Length.''' The length is the number of evaluation points. Because the [[Set (mathematics)|sets]] <math>A_i</math> are [[Disjoint sets|disjoint]] for <math>i \in \{1, \ldots, \ell\}</math>, the length of the code is <math>|T| = (r+1)\ell</math>.
 
:• '''Dimension.''' The [[dimension]] of the code is <math>(s+1)r</math>, for <math>s</math> ≤ <math>\ell-1</math>, as each <math>g_i</math> has [[Degree of a polynomial|degree]] at most <math>\deg(f(x))-2</math>, covering a [[vector space]] of [[dimension]] <math>\deg(f(x))-1=r</math>, and by the construction of <math>V</math>, there are <math>s+1</math> distinct <math>g_i</math>.
 
:• '''Distance.''' The [[Hamming distance|distance]] is given by the fact that <math>V \subseteq \mathbb F_q [x]_{\leq k}</math>, where <math>k = r + 1 - 2 + s(r+1)</math>, and the obtained code is the [[Reed–Solomon error correction|Reed-Solomon code]] of [[Degree of a polynomial|degree]] at most <math>k</math>, so the minimum [[Hamming distance|distance]] equals <math>(r+1)\ell-((r+1)-2+s(r+1))</math>.
 
:• '''Locality.''' After the erasure of the single component, the evaluation at <math>a_i \in A_i</math>, where <math>|A_i|=r+1</math>, is unknown, but the evaluations for all other <math>a \in A_i</math> are known, so at most <math>r</math> evaluations are needed to uniquely determine the erased component, which gives us the locality of <math>r</math>.
 
: To see this, <math>g</math> restricted to <math>A_j</math> can be described by a [[polynomial]] <math>h</math> of [[Degree of a polynomial|degree]] at most <math>\deg(f(x))-2 = r + 1 - 2 = r - 1</math> thanks to the form of the [[Element (mathematics)|elements]] in <math>V</math> (i.e., thanks to the fact that <math>f</math> is [[Constant function|constant]] on <math>A_j</math>, and the <math>g_i</math>'s have [[Degree of a polynomial|degree]] at most <math>\deg(f(x))-2</math>). On the other hand <math>|A_j \backslash \{a_j\}| = r</math>, and <math>r</math> evaluations uniquely determine a [[polynomial]] of [[Degree of a polynomial|degree]] <math>r-1</math>. Therefore <math>h</math> can be constructed and evaluated at <math>a_j</math> to recover <math>g(a_j)</math>.
 
=== Example of Tamo–Barg construction ===
 
We will use <math>x^5 \in \mathbb F_{41} [x]</math> to construct <math>[15, 8, 6, 4]</math>-LRC. Notice that the [[Degree of a polynomial|degree]] of this [[polynomial]] is 5, and it is [[Constant (mathematics)|constant]] on <math>A_i</math> for <math>i \in \{1, \ldots, 8\}</math>, where <math>A_1 = \{1, 10, 16, 18, 37\}</math>, <math>A_2 = 2A_1</math>, <math>A_3 = 3A_1</math>, <math>A_4 = 4A_1</math>, <math>A_5 = 5A_1</math>, <math>A_6 = 6A_1</math>, <math>A_7 = 11A_1</math>, and <math>A_8 = 15A_1</math>: <math>A_1^5 = \{1\}</math>, <math>A_2^5 = \{32\}</math>, <math>A_3^5 = \{38\}</math>, <math>A_4^5 = \{40\}</math>, <math>A_5^5 = \{9\}</math>, <math>A_6^5 = \{27\}</math>, <math>A_7^5 = \{3\}</math>, <math>A_8^5 = \{14\}</math>. Hence, <math>x^5</math> is a <math>(4,8)</math>-good polynomial over <math>\mathbb F_{41}</math> by the definition. Now, we will use this [[polynomial]] to construct a code of [[dimension]] <math>k = 8</math> and length <math>n = 15</math> over <math>\mathbb F_{41}</math>. The locality of this code is 4, which will allow us to recover a single [[Server (computing)|server]] failure by looking at the information contained in at most 4 other [[Server (computing)|servers]].
 
Next, let us define the encoding [[polynomial]]: <math>f_a(x) = \sum_{i=0}^{r - 1} f_i(x)x^i</math>, where <math>f_i(x) = \sum_{i=0}^{\frac{k}{r} - 1} a_{i,j}g(x)^j</math>. So, <math>f_a(x) =</math> <math>a_{0,0} +</math> <math>a_{0,1}x^5 +</math> <math>a_{1,0}x +</math> <math>a_{1,1}x^6 +</math> <math>a_{2,0}x^2 +</math> <math>a_{2,1}x^7 +</math> <math>a_{3,0}x^3 +</math> <math>a_{3,1}x^8</math>.
 
Thus, we can use the obtained encoding [[polynomial]] if we take our [[data]] to encode as the [[Row and column vectors|row vector]] <math>a = </math> <math>(a_{0,0}, a_{0,1}, a_{1,0}, a_{1,1}, a_{2,0}, a_{2,1}, a_{3,0}, a_{3,1})</math>. Encoding the [[Vector (mathematics and physics)|vector]] <math>m</math> to a length 15 message [[Vector (mathematics and physics)|vector]] <math>c</math> by multiplying <math>m</math> by the [[generator matrix]]
 
: <div style="text-align: center;"><math>
G = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1 & 32 & 32 & 32 & 32 & 32 & 38 & 38 & 38 & 38 & 38\\
1 & 10 & 16 & 18 & 37 & 2 & 20 & 32 & 33 & 36 & 3 & 7 & 13 & 29 & 30\\
1 & 10 & 16 & 18 & 37 & 23 & 25 & 40 & 31 & 4 & 32 & 20 & 2 & 36 & 33\\
1 & 18 & 10 & 37 & 16 & 4 & 31 & 40 & 23 & 25 & 9 & 8 & 5 & 21 & 39\\
1 & 18 & 10 & 37 & 16 & 5 & 8 & 9 & 39 & 21 & 14 & 17 & 26 & 19 & 6\\
1 & 16 & 37 & 10 & 18 & 8 & 5 & 9 & 21 & 39 & 27 & 15 & 24 & 35 & 22\\
1 & 16 & 37 & 10 & 18 & 10 & 37 & 1 & 16 & 18 & 1 & 37 & 10 & 18 & 16
\end{pmatrix}
. </math></div>
 
For example, the encoding of information [[Vector (mathematics and physics)|vector]] <math>m = (1, 1, 1, 1, 1, 1, 1, 1)</math> gives the codeword <math>c = mG = (8, 8, 5, 9, 21, 3, 36, 31, 32, 12, 2, 20, 37, 33, 21)</math>.
 
Observe that we constructed an optimal LRC; therefore, using the [[Singleton bound]], we have that the [[Hamming distance|distance]] of this code is <math>d = n - k - \left\lceil\frac{k}{r}\right\rceil + 2 = 15 - 8 - 2 + 2 = 7</math>. Thus, we can recover any 6 erasures from our codeword by looking at no more than 8 other components.
 
==Locally recoverable codes with availability==
 
A code <math>C</math> has all-symbol locality <math>r</math> and availability <math>t</math> if every code symbol can be recovered from <math>t</math> disjoint repair sets of other symbols, each [[Set (mathematics)|set]] of [[Cardinality|size]] at most <math>r</math> symbols. Such codes are called <math>(r,t)_a</math>-LRC.<ref>{{Citation
|first1=P. |last1=Huang |first2=E. |last2=Yaakobi |first3=H.|last3=Uchikawa |first4=P.H.|last4=Siegel |contribution=Linear locally repairable codes with availability |pages=1871–1875 |___location=Hong Kong, China |title=2015 IEEE International Symposium on Information Theory |date=2015 |doi=10.1109/ISIT.2015.7282780|isbn=978-1-4673-7704-1 }}</ref>
 
'''Theorem''' The minimum [[Hamming distance|distance]] of <math>[n,k,d]_q</math>-LRC having locality <math>r</math> and availability <math>t</math> satisfies the [[Upper and lower bounds|upper bound]]
<div style="text-align: center;"><math>d \leq n - \sum_{i=0}^t \left\lfloor\frac{k-1}{r^i}\right\rfloor.</math></div>
If the code is [[Systematic code|systematic]] and locality and availability apply only to its information symbols, then the code has information locality <math>r</math> and availability <math>t</math>, and is called <math>(r,t)_i</math>-LRC.<ref>{{Citation |first1=I. |last1=Tamo |first2=A. |last2=Barg |contribution=Bounds on locally recoverable codes with multiple recovering sets |pages=691–695 |___location=Honolulu, HI, USA |title=2014 IEEE International Symposium on Information Theory |date=2014 |doi=10.1109/ISIT.2014.6874921|arxiv=1402.0916 |isbn=978-1-4799-5186-4 }}</ref>
 
'''Theorem'''<ref>{{Citation
|first1=A. |last1=Wang |first2=Z. |last2=Zhang |title=Repair locality with multiple erasure tolerance |pages=6979–6987 |journal=IEEE Transactions on Information Theory |date=2014 |volume=60 |issue=11 |doi=10.1109/TIT.2014.2351404|arxiv=1306.4774 }}</ref> The minimum [[Hamming distance|distance]] <math>d</math> of an <math>[n,k,d]_q</math> linear <math>(r,t)_i</math>-LRC satisfies the [[Upper and lower bounds|upper bound]]
<div style="text-align: center;"><math>d \leq n-k-\left\lceil\frac{t(k-1)+1}{t(r-1)+1}\right\rceil+2.</math></div>
 
== References ==
 
<!-- Inline citations added to your article will automatically display here. See en.wikipedia.org/wiki/WP:REFB for instructions on how to add citations. -->
{{reflist}}
 
[[Category:Cryptography]]
[[Category:Information theory]]
[[Category:Error detection and correction]]