Erasure code

In computer science, an erasure code transforms a message of n blocks into a message with more than n blocks, such that the original message can be recovered from a subset of those blocks. The fraction of the blocks required is called the rate, denoted r. Erasure codes are used in some forms of forward error correction.

Optimal erasure codes

Optimal erasure codes produce n/r blocks where any n blocks is sufficient to recover the original message. Optimal codes are often costly (in terms of memory usage, CPU time or both) when n is large.

Parity Check

Parity check is the special case where r=1. From a set of k values $\{v_{i}\}_{1\leq i\leq k}$ , a check-sum is computed and appended to the k source values:
$v_{k+1}=-\sum _{i=1}^{k}v_{i}$ .
The set of k+1 values $\{v_{i}\}_{1\leq i\leq k+1}$ is now consistent with regard to the check-sum. If one of this values $v_{e}$ is erased, it can be easily recovered by summing the remaining variables:
$v_{e}=-\sum _{i=1,i\neq e}^{k+1}v_{i}$ .

Err-mail (special case where n=2)

Alice wants to send her telephone number (555629) to Bob using err-mail. Err-mail works just like e-mail, except

About half of all the mail gets lost.^[1]
Messages longer than 5 characters are illegal.
It is very expensive (similar to air-mail).

Instead of asking Bob to acknowledge the messages she sends, Alice devises the following scheme.

She breaks her telephone number up into two parts a=555, b=629, and sends 2 messages – "A=555" and "B=629" – to Bob.
She constructs a linear function, $f(n)=a+(b-a)(n-1)$ , in this case $f(n)=555+74(n-1).$

File:Optimal Erasure Codes1.gif

She computes the values f(3), f(4), and f(5), and then transmits three redundant messages: "C=703", "D=777" and "E=851".

Bob knows that the form of f(n) is $f(n)=a+(b-a)(n-1)$ , where a and b are the two parts of the telephone number. Now suppose Bob receives "D=777" and "E=851".

File:Optimal Erasure Codes2.gif

Bob can reconstruct Alice's phone number by computing the values of a and b from the values (f(4) and f(5)) he has received. Bob can perform this procedure using any two err-mails, so the erasure code in this example has a rate of 40%.

Note that Alice cannot encode her telephone number in just one err-mail, because it contains six characters, and the maximum length of one err-mail message is five characters. If she sent her phone number in pieces, asking Bob to acknowledge receipt of each piece, at least four messages would have to be sent anyway (two from Alice, and two acknowledgments from Bob). So the erasure code in this example, which requires five messages, is quite economical.

This example is a little bit contrived. For truly generic erasure codes that work over any data set, we would need something other than the f(n) given.

Err-mail 2 (special case where n=3)

When err-mail 2 arrives (the upgrade), the maximum message length is reduced to 4 characters. Alice will now start by sending "A=55", "B=56" and "C=29". She will then place markers called A through E on the floor in a configuration that is known to Bob. She will then fit a plane such that its height above A is 55, its height above B is 56 and its height above C is 29. The redundant messages are then the height of that plane above D and E.

If Bob receives three messages he can reconstruct the plane and extract Alice's phone number.

Real world implementation

This process is implemented by Reed-Solomon codes with code words constructed over a finite field using an r by r Vandermonde matrix^[2].

Near-optimal erasure codes

Near-optimal erasure codes poduce n/r symbols from n source symbols, and they require (1+ε)n symbols to recover the message. Reducing ε can be done at the cost of CPU time. Near+optimal erasure codes trade correction capabilities for computational complexity, they can be decoded with algorithm having a linear time complexity.

Fountain codes (also known as rateless erasure codes) are an example of near-optimal erasure codes, they transform an n block message into a practically infinite encoded form. Encoded symbols can be generated ad infinitum and some number of them is enough to recover the message.

Examples

Near optimal erasure codes

Near optimal fountain (rateless erasure) codes

Optimal erasure codes

Parity: used in redundant array of independent disks
Optimal erasure codes with arbitrary parameters are surprisingly simple.
Reed-Solomon coding

References

^ Some versions of this story refer to the err-mail daemon.
^ Software FEC in computer communications by Luigi Rizzo describes optimal erasure correction codes

Feclib is a near optimal extension to Luigi Rizzo's work that uses band matrices. Many parameters can be set, like the size of the width of the band and size of the finite field. It also successfully exploits the large register size of modern CPUs. How it compares to the near optimal codes mentioned above is unknown.

[1] Some versions of this story refer to the err-mail daemon.

[2] Software FEC in computer communications by Luigi Rizzo describes optimal erasure correction codes

[1]

[2]

Erasure code

Contents

Optimal erasure codes

Parity Check

Err-mail (special case where n=2)

Err-mail 2 (special case where n=3)

Real world implementation

Near-optimal erasure codes

Examples

Near optimal erasure codes

Near optimal fountain (rateless erasure) codes

Optimal erasure codes

See also

References