Locally recoverable code: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 00:44, 24 September 2024 edit Yaroslav-Marta (talk \| contribs) 152 edits Submitting using AfC-submit-wizard ← Previous edit		Latest revision as of 05:28, 6 January 2025 edit undo OAbot (talk \| contribs) Bots 643,717 edits m Open access bot: arxiv updated in citation with #oabot.
(46 intermediate revisions by 16 users not shown)
Line 1: {{Short description\|Coding Theory}} ~~{{Draft topics\|stem}}~~ ~~{{AfC topic\|stem}}~~ ~~{{AfC submission\|\|\|ts=20240924004443\|u=Yaroslav-Marta\|ns=118}}~~ ~~{{AfC submission\|t\|\|ts=20231130204613\|u=Yaroslav-Marta\|ns=118\|demo=}}<!-- Important, do not remove this line before article has been created. -->~~ '''Locally ~~Recoverable~~recoverable ~~Codes~~codes''' are a family of [[error correction ~~codes~~code]]s that were introduced first by D. S. Papailiopoulos and A. G. Dimakis<ref>{{Citation \|first1=Dimitris S.\|last1=Papailiopoulos \|first2=Alexandros G. \|last2=Dimakis \|~~title~~contribution="Locally ~~Repairable~~repairable ~~Codes"~~codes \|pages=~~2771-2775~~2771–2775 \|___location=Cambridge, MA, USA \|~~publisher~~title=2012 IEEE International Symposium on Information Theory Proceedings \|date=2012 \|doi=10.1109/ISIT.2012.6284027\|isbn=978-1-4673-2579-0 \|arxiv=1206.3804 \|publisher=IEEE}}</ref> and have been widely studied in ~~Information~~[[information theory]] due to their applications related to ~~Distributive~~distributive and ~~Cloud~~[[cloud ~~Storage~~storage]] ~~Systems~~systems.<ref>{{Citation <ref>{{Citation▼ \|first1=A. \|last1=Barg Line 14 ⟶ 9: \|first3=S. \|last3=Vlăduţ \|~~title~~contribution="Locally recoverable codes on algebraic curves" \|pages=~~1252-1256~~1252–1256 \|___location=Hong Kong, China \|~~publisher~~title=2015 IEEE International Symposium on Information Theory \|date=2015 \|doi=10.1109/ISIT.2015.7282656 \|arxiv=1603.08876 ~~}}</ref>~~ \|isbn=978-1-4673-7704-1 <ref>{{Citation▼ \|publisher=IEEE ▲}}</ref><ref>{{Citation \|first1=V. R. \|last1=Cadambe \|first2=A. \|last2=Mazumdar \|title="Bounds on the Size of Locally Recoverable Codes" \|pages=~~5787-5794~~5787–5794 \|~~publisher~~journal=IEEE Transactions on Information Theory \|date=2015 \|volume=61 \|issue=11 \|doi=10.1109/TIT.2015.2477406 \|publisher=IEEE ~~}}</ref>~~ }}</ref><ref>{{Citation \|first1=A. \|last1=Dukes Line 39 ⟶ 38: \|first3=G. \|last3=Micheli \|title="Optimal selection for good polynomials of degree up to five" \|journal=Designs, Codes and Cryptography ~~\|pages=1427-1436~~ \|pages=1427–1436 \|date=2022 \|volume=90 \|issue=6 \|doi= 10.1007/s10623-022-01046-y \|publisher=IEEE ~~}}</ref>~~ \|arxiv=2104.01434 ~~<ref>{{Citation~~ ▲}}</ref><ref>{{Citation \|first1=K. \|last1=Haymaker Line 51 ⟶ 54: \|first3=G. \|last3=Matthews \|title="Locally Recoverable Codes with Availability ~~t≥2~~''t''≥2 from Fiber Products of Curves" \|date=2022 \|doi=10.3934/amc.2018020 \|doi-access=free }}</ref> An <math>[n, k, d, r]_{q}</math> LRC is an <math>[n, k, d]_{q}</math> [[linear code]] such that there is a [[Function (mathematics)\|function]] <math>f_{i}</math> that takes as input <math>i</math> and a [[Set (mathematics)\|set]] of <math>r</math> other [[Coordinate system\|coordinates]] of a codeword <math>c = (c_{1}, \ldots, c_{n}) \in C</math> different from <math>c_{i}</math>, and outputs <math>c_{i}</math>. ==Overview== [[Error correction code\|Erasure-correcting codes]], or simply [[Error correction code\|erasure codes]], for distributed and [[cloud storage]] systems, are becoming more and more popular as a result of the present spike in demand for [[cloud computing]] and storage services. This has inspired researchers in the fields of [[Information theory\|information]] and [[coding theory]] to investigate new facets of codes that are specifically suited for use with storage systems. It is well-known that LRC is a [[linear code\|code]] that needs only a limited [[Set (mathematics)\|set]] of other symbols to be accessed in order to restore every symbol in a codeword. This idea is very important for distributed and [[cloud storage]] systems since the most common error case is when one storage node fails (erasure). The main objective is to recover as much [[data]] as possible from the fewest additional storage nodes in order to restore the node. Hence, Locally Recoverable Codes are crucial for such systems. The following [[definition]] of the LRC follows from the description above: an <math>[n, k, r]</math>-Locally Recoverable Code (LRC) of length <math>n</math> is a [[linear code\|code]] that produces an <math>n</math>-symbol codeword from <math>k</math> information symbols, and for any symbol of the codeword, there exist at most <math>r</math> other symbols such that the value of the symbol can be recovered from them. The locality [[parameter]] satisfies <math>1 \leq r \leq k</math> because the entire codeword can be found by accessing <math>k</math> symbols other than the erased symbol. Furthermore, Locally Recoverable Codes, having the minimum [[Hamming distance\|distance]] <math>d</math>, can recover <math>d-1</math> erasures. ==Definition== Let <math>C</math> be a <math>[n, k, d]_{q}</math> [[linear code]]. For <math>i \in \{1, \ldots, n\}</math>, let us denote by <math>r_{i}</math> the minimum number of other [[Coordinate system\|coordinates]] we have to look at to recover an erasure in [[Coordinate system\|coordinate]] <math>i</math>. The number <math>~~r_{i}~~r_i</math> is said to be the ''locality of the <math>i</math>-th [[Coordinate system\|coordinate]]'' of the code. The ''locality'' of the code is defined as <div style="text-align: center;"><math>r = \~~textrm{~~max}\{~~r_{i}\|~~r_i \mid i \in \{1, \ldots, n\}\}.</math></div> An <math>[n, k, d, r]_{q}</math> ''locally recoverable code'' (LRC) is an <math>[n, k, d]~~_{q}~~_q</math> [[linear code]] <math>C \in \mathbb F_q^n</math> with locality <math>r</math>. Let <math>C</math> be an <math>[n, k, d]~~_{q}~~_q</math>-locally recoverable code. Then an erased component can be recovered linearly,<ref>{{Citation \|first1=Dimitris S.\|last1=Papailiopoulos \|first2=Alexandros G. \|last2=Dimakis \|~~title~~contribution="Locally ~~Repairable~~repairable ~~Codes"~~codes \|pages=~~2771-2775~~2771–2775 \|___location=Cambridge, MA, USA \|~~publisher~~title=2012 IEEE International Symposium on Information Theory \|date=2012 \|doi=10.1109/ISIT.2012.6284027\|isbn=978-1-4673-2579-0 \|arxiv=1206.3804 }}</ref>, i.e. for every <math>i \in \{1, \ldots, n\}</math>, the space of [[linear ~~equations~~equation]]s of the code contains [[Element (mathematics)\|elements]] of the form <math> ~~x_{i}~~x_i = f(x_{~~i_{1}~~i_1}, \ldots, x_{~~i_{r}~~i_r})</math>, where <math>~~i_{j}~~i_j \neq i</math>. ==Optimal ~~Locally~~locally ~~Recoverable~~recoverable ~~Codes~~codes== '''Theorem'''<ref>{{Citation \|first1=V. \|last1=Cadambe \|first2=A. \|last2=Mazumdar \|~~title~~contribution="An upper bound on the size of locally recoverable codes" \|pages=~~1-5~~1–5 \|___location=Calgary, AB, Canada \|~~publisher~~title=2013 International Symposium on Network Coding \|date=2013 \|doi=10.1109/NetCod.2013.6570829\|arxiv=1308.3200 \|isbn=978-1-4799-0823-3 }}</ref> Let <math>n = (r+1)s</math> and let <math>C</math> be an <math>[n, k, d]_{q}</math>-locally recoverable code having <math>s</math> disjoint locality [[Disjoint sets\|sets]] of size <math>r+1</math>. Then <div style="text-align: center;"><math>d \leq n - k - \left\lceil\frac{k}{r}\right\rceil + 2.</math></div> An <math>[n, k, d, r]_{q}</math>-LRC <math>C</math> is said to be optimal if the minimum [[Hamming distance\|distance]] of <math>C</math> satisfies <div style="text-align: center;"><math>d = n - k - \left\lceil\frac{k}{r}\right\rceil + 2 .</math></div> == ~~Tamo--Barg~~Tamo–Barg ~~Codes~~codes == Let <math>f~~</math>~~ ∈\in ~~<math>~~\mathbb F_{q} [x]</math> be a [[polynomial]] and let <math>\ell</math> be a positive [[integer]]. Then <math>f</math> is said to be (<math>r</math>, <math>\ell</math>)-good if :• <math>f</math> has [[Degree of a polynomial\|degree]] <math>r+1</math>, :• there exist distinct [[subset]]s <math>A_{1}~~</math>~~ , ~~. . .~~\ldots, ~~<math>~~A_{\ell}</math> ~~distinct subsets~~ of <math>\mathbb F_{q}</math> such that ::– for any <math>i \in \{1, \ldots, \ell\}</math>, <math>f~~</math>~~ (~~<math>~~A_{i}~~</math>~~ ) = \{~~<math>~~t_{i}\}</math>} for some <math>t_{i}~~</math>~~ ∈\in ~~<math>~~\mathbb F_{q}</math> , i.e., <math>f</math> is [[Constant (mathematics)\|constant]] on <math>A_{i}</math>, ::– #<math>\# A_{i}~~</math>~~ = ~~<math>~~r + 1</math>, ::– <math>A_{i}~~</math>~~ ∩\cap ~~<math>~~A_{j} = \varnothing</math> ~~= ∅~~ for any <math>i~~</math>~~ ≠\neq ~~<math>~~j</math>. We say that {<math>A_{1},\ldots,A_{\ell}</math>} is a splitting covering for <math>f</math>.<ref>{{Citation \|first1=G. \|last1=Micheli \|title="Constructions of Locally Recoverable Codes Which are Optimal" \|pages=~~167-175~~167–175 \|~~publisher~~journal=IEEE Transactions on Information Theory \|date=2020 \|volume=66 \|doi=10.1109/TIT.2019.2939464 \|arxiv=1806.11492 }}</ref>. === ~~Tamo--Barg~~Tamo–Barg ~~Construction~~construction === The ~~Tamo--Barg~~Tamo–Barg construction utilizes good polynomials.<ref>{{Citation \|first1=I.\|last1=Tamo \|first2=A. \|last2=Barg \|~~title~~contribution="A family of optimal locally recoverable ~~code"~~codes \|pages=~~686-690~~686–690 \|___location=Honolulu, HI, USA \|~~publisher~~title=2014 IEEE International Symposium on Information Theory \|date=2014 \|doi=10.1109/ISIT.2014.6874920\|isbn=978-1-4799-5186-4 }}</ref> :• Suppose that a <math>(r, \ell)</math>-good polynomial <math>f(x)</math> over <math>\mathbb F_{q}</math> is given with splitting covering <math>i \in \{1, \ldots, \ell\}</math>. :• Let <math>s~~</math>~~ ≤\leq ~~<math>~~\ell-1</math> be a positive [[integer]]. :• Consider the following <math>\mathbb ~~F_{q}~~F_q</math> -[[vector space]] of ~~polynomials~~[[polynomial]]s <math display="block">V = \left\{\sum_{i=0}^s g_i(x)f(x)^i: \deg(g_i(x)) \leq \deg(f(x))-2 \right\}.</math> :• Let ~~<math>T</math> =~~ <math display="inline">T = \bigcup_{i=1}^\ell A_i</math>.▼ <div style="text-align: center;"><math>V = \{\sum_{i=0}^s g_{i}(x)f(x)^i:{\text{deg}}(g_{i}(x)) \leq {\text{deg}}(f(x))-2\}</math></div>▼ :• The code <math> \{ ~~ev_~~\operatorname{Tev}_T(g):g \in V \}</math> is an <math>((r+1)\ell,(s+1)r,d,r)</math> -optimal locally coverable code, where <math>~~ev_~~\operatorname{Tev}_T</math> denotes evaluation of <math>g</math> at all points in the [[Set (mathematics)\|set]] <math>T</math>.▼ ▲:• Let <math>T</math> = <math display="inline">\bigcup_{i=1}^\ell A_i</math> ▲:• The code <math> \{ ev_{T}(g):g \in V \}</math> is an <math>((r+1)\ell,(s+1)r,d,r)</math> optimal locally coverable code, where <math>ev_{T}</math> denotes evaluation of g at all points in the set <math>T</math>. === Parameters of ~~Tamo--Barg~~Tamo–Barg ~~Codes~~codes === :• '''Length.''' The length is the number of evaluation points. Because the [[Set (mathematics)\|sets]] <math>A_i</math> are [[Disjoint sets\|disjoint]] for <math>i \in \{1, \ldots, \ell\}</math>, the length of the code is <math>\|T\| = (r+1)\ell</math>. :• '''Dimension.''' The [[dimension]] of the code is <math>(s+1)r</math>, for <math>s</math> ≤ <math>\ell-1</math>, as each <math>g_i</math> has [[Degree of a polynomial\|degree]] at most <math>{\~~text{~~deg}}(f(x))-2</math>, covering a [[vector space]] of [[dimension]] <math>{\~~text{~~deg}}(f(x))-1=r</math>, and by the construction of <math>V</math>, there are <math>s+1</math> distinct <math>g_i</math>. :• '''Distance.''' The [[Hamming distance\|distance]] is given by the fact that <math>V \subseteq \mathbb F_q [x]_{\leq k}</math>, where <math>k = r + 1 - 2 + s(r+1)</math>, and the obtained code is the [[Reed–Solomon error correction\|Reed-Solomon code]] of [[Degree of a polynomial\|degree]] at most <math>k</math>, so the minimum [[Hamming distance\|distance]] equals <math>(r+1)\ell-((r+1)-2+s(r+1))</math>. :• '''Locality.''' After the erasure of the single component, the evaluation at <math>a_i \in A_i</math>, where <math>\|A_i\|=r+1</math>, is unknown, but the evaluations for all other <math>a \in A_i</math> are known, so at most <math>r</math> evaluations are needed to uniquely determine the erased component, which gives us the locality of <math>r</math>. : To see this, <math>g</math> restricted to <math>A_j</math> can be described by a [[polynomial]] <math>h</math> of [[Degree of a polynomial\|degree]] at most <math>{\~~text{~~deg}}(f(x))-2 = r + 1 - 2 = r - 1</math> thanks to the form of the [[Element (mathematics)\|elements]] in <math>V</math> (i.e., thanks to the fact that <math>f</math> is [[Constant function\|constant]] on <math>A_j</math>, and the <math>g_i</math>'s have [[Degree of a polynomial\|degree]] at most <math>{\~~text{~~deg}}(f(x))-2</math>). On the other hand <math>\|A_j \backslash \{a_j\}\| = r</math>, and <math>r</math> evaluations uniquely determine a [[polynomial]] of [[Degree of a polynomial\|degree]] <math>r-1</math>. Therefore <math>h</math> can be constructed and evaluated at <math>a_j</math> to recover <math>g(a_j)</math>. === Example of ~~Tamo-Barg~~Tamo–Barg ~~Construction~~construction === We will use <math>x^5 \in \mathbb F_{41} [x]</math> to construct <math>[15, 8, 6, 4]</math>-LRC. Notice that the [[Degree of a polynomial\|degree]] of this [[polynomial]] is 5, and it is [[Constant (mathematics)\|constant]] on <math>A_i</math> for <math>i \in \{1, \ldots, 8\}</math>, where <math>A_1 = \{1, 10, 16, 18, 37\}</math>, <math>A_2 = 2A_1</math>, <math>A_3 = 3A_1</math>, <math>A_4 = 4A_1</math>, <math>A_5 = 5A_1</math>, <math>A_6 = 6A_1</math>, <math>A_7 = 11A_1</math>, and <math>A_8 = 15A_1</math>: <math>A_1^5 = \{1\}</math>, <math>A_2^5 = \{32\}</math>, <math>A_3^5 = \{38\}</math>, <math>A_4^5 = \{40\}</math>, <math>A_5^5 = \{9\}</math>, <math>A_6^5 = \{27\}</math>, <math>A_7^5 = \{3\}</math>, <math>A_8^5 = \{14\}</math>. Hence, <math>x^5</math> is a <math>(4,8)</math>-good polynomial over <math>\mathbb F_{41}</math> by the definition. Now, we will use this [[polynomial]] to construct a code of [[dimension]] <math>k = 8</math> and length <math>n = 15</math> over <math>\mathbb F_{41}</math>. The locality of this code is 4, which will allow us to recover a single [[Server (computing)\|server]] failure by looking at the ~~infromation~~information contained in at most 4 other [[Server (computing)\|servers]]. Next, let's us define the encoding [[polynomial]]: <math>f_a(x) = \sum_{i=0}^{r - 1} f_i(x)x^i</math>, where <math>f_i(x) = \sum_{i=0}^{\frac{k}{r} - 1} a_{i,j}g(x)^j</math>. So, <math>f_a(x) =</math> <math>a_{0,0} +</math> <math>a_{0,1}x^5 +</math> <math>a_{1,0}x +</math> <math>a_{1,1}x^6 +</math> <math>a_{2,0}x^2 +</math> <math>a_{2,1}x^7 +</math> <math>a_{3,0}x^3 +</math> <math>a_{3,1}x^8</math>. Thus, we can use the obtained encoding [[polynomial]] if we take our [[data]] to encode as the [[Row and column vectors\|row vector]] <math>a = </math> <math>(a_{0,0}, a_{0,1}, a_{1,0}, a_{1,1}, a_{2,0}, a_{2,1}, a_{3,0}, a_{3,1})</math>. Encoding the [[Vector (mathematics and physics)\|vector]] <math>m</math> to a length 15 message [[Vector (mathematics and physics)\|vector]] <math>c</math> by multiplying <math>m</math> by the [[generator matrix]] : <div style="text-align: center;"><math> ~~<math>~~ G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ Line 134 ⟶ 145: 1 & 16 & 37 & 10 & 18 & 10 & 37 & 1 & 16 & 18 & 1 & 37 & 10 & 18 & 16 \end{pmatrix} . </math></div> For example, the encoding of information [[Vector (mathematics and physics)\|vector]] <math>m = (1, 1, 1, 1, 1, 1, 1, 1)</math> gives the codeword <math>c = mG = (8, 8, 5, 9, 21, 3, 36, 31, 32, 12, 2, 20, 37, 33, 21)</math>. Observe that we constructed an optimal LRC; therefore, using the [[Singleton bound]], we have that the [[Hamming distance\|distance]] of this code is <math>d = n - k - \left\lceil\frac{k}{r}\right\rceil + 2 = 15 - 8 - 2 + 2 = 7</math>. Thus, we can recover any 6 erasures from our codeword by looking at no more than 8 other components. ==Locally ~~Recoverable~~recoverable ~~Codes~~codes with ~~Availability~~availability== A code <math>C</math> has all-symbol locality <math>r</math> and availability <math>t</math> if every code symbol can be recovered from <math>t</math> disjoint repair sets of other symbols, each [[Set (mathematics)\|set]] of [[Cardinality\|size]] at most <math>r</math> symbols. Such codes are called <math>(r,t)_a</math>-LRC.<ref>{{Citation ~~'''Definition'''<ref>{{Citation~~ \|first1=P. \|last1=Huang \|first2=E. \|last2=Yaakobi \|first3=H.\|last3=Uchikawa \|first4=P.H.\|last4=Siegel \|~~title~~contribution="Linear locally repairable codes with availability" \|pages=~~1871-1875~~1871–1875 \|___location=Hong Kong, China \|~~publisher~~title=2015 IEEE International Symposium on Information Theory \|date=2015 \|doi=10.1109/ISIT.2015.7282780\|isbn=978-1-4673-7704-1 }}</ref> A code <math>C</math> has all-symbol locality <math>r</math> and availability <math>t</math> if every code symbol can be recovered from <math>t</math> disjoint repair sets of other symbols, each set of size at most <math>r</math> symbols. Such codes are called <math>(r,t)_a</math>-LRC. '''Theorem'''<ref>{{Citation \|first1=I. \|last1=Tamo \|first2=A. \|last2=Barg \|title="Bounds on locally recoverable codes with multiple recovering sets" \|pages=691-695 \|___location=Honolulu, HI, USA \|publisher=2014 IEEE International Symposium on Information Theory \|date=2014 \|doi=10.1109/ISIT.2014.6874921}}</ref> The minimum [[Hamming distance\|distance]] of <math>[n,k,d]_q</math>-LRC having locality <math>r</math> and availability <math>t</math> satisfies the [[Upper and lower bounds\|upper bound ]] ▲<div style="text-align: center;"><math>Vd =\leq n - \{\sum_{i=0}^st ~~g_{i}(x)f(x)^i:{~~\~~text~~left\lfloor\frac{~~deg~~k-1}~~}(g_~~{r^i}~~(x))~~ \~~leq {~~right\~~text{deg}}(f(x))-2\}~~rfloor.</math></div> If the code is [[Systematic code\|systematic]] and locality and availability apply only to its information symbols, then the code has information locality <math>r</math> and availability <math>t</math>, and is called <math>(r,t)_i</math>-LRC.<ref>{{Citation \|first1=I. \|last1=Tamo \|first2=A. \|last2=Barg \|contribution=Bounds on locally recoverable codes with multiple recovering sets \|pages=691–695 \|___location=Honolulu, HI, USA \|title=2014 IEEE International Symposium on Information Theory \|date=2014 \|doi=10.1109/ISIT.2014.6874921\|arxiv=1402.0916 \|isbn=978-1-4799-5186-4 }}</ref> <div style="text-align: center;"><math>d \leq n - \sum_{i=0}^{t} \left\lfloor\frac{k-1}{r^i}\right\rfloor</math></div>.▼ If the code is systematic and locality and availability apply only to its information symbols, then the code has information locality <math>r</math> and availability <math>t</math>, and is called <math>(r,t)_i</math>-LRC. '''Theorem'''<ref>{{Citation \|first1=A. \|last1=Wang \|first2=Z. \|last2=Zhang \|title="Repair locality with multiple erasure tolerance" \|pages=~~6979-6987~~6979–6987 \|~~publisher~~journal=IEEE Transactions on Information Theory \|date=2014 \|volume=60 \|issue=11 \|doi=10.1109/TIT.2014.2351404\|arxiv=1306.4774 }}</ref> The minimum [[Hamming distance\|distance]] <math>d</math> of an <math>[n,k,d]_q</math> linear <math>(r,t)_i</math>-LRC satisfies the [[Upper and lower bounds\|upper bound]] ▲<div style="text-align: center;"><math>d \leq n - ~~\sum_{i=0}^{t}~~ k-\left\~~lfloor~~lceil\frac{t(k-1)+1}{t(r^i-1)+1}\right\~~rfloor~~rceil+2.</math></div>. == References ==▼ ~~<div style="text-align: center;"><math>d \leq n-k-\left\lceil\frac{t(k-1)+1}{t(r-1)+1}\right\rceil+2</math></div>.~~ ▲== References == ~~<!-- Inline citations added to your article will automatically display here. See en.wikipedia.org/wiki/WP:REFB for instructions on how to add citations. -->~~ {{reflist}} [[Category:Cryptography]] [[Category:Information theory]] [[Category:Error detection and correction]]