Tornado code: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 22:28, 28 May 2008 edit Waterpie (talk \| contribs) 8 edits →The definition of xor was wrong. Fixed it. ← Previous edit		Latest revision as of 10:47, 23 April 2025 edit undo Citation bot (talk \| contribs) Bots 5,871,216 edits Alter: title, template type. Added chapter. Removed parameters. \| Use this bot. Report bugs. \| Suggested by Headbomb \| #UCB_toolbar
(40 intermediate revisions by 32 users not shown)
Line 1: In [[~~computer~~coding ~~science~~theory]], '''~~tornado~~Tornado codes''' are a class of [[erasure ~~codes~~code]]s that support [[error correction]]. ~~and~~ ~~have~~Tornado codes require a constant C more redundant blocks than the more data-efficient [[Reed–Solomon erasure code]]s, but are much faster ~~fast~~to ~~encoding~~generate and ~~decoding~~can ~~algorithms~~fix erasures faster. Software-based implementations of tornado codes are about 100 times faster on small lengths and about 10,000 times faster on larger lengths than ~~software-based~~Reed–Solomon ~~[[Reed-Solomon]]~~erasure codes.{{sfn\|Byers\|Luby\|Mitzenmacher\|Rege\|1998}} Since the introduction of Tornado codes, many other similar erasure codes ~~while~~have ~~having~~emerged, ~~only~~most ~~slightly~~notably ~~worse~~[[Online ~~overhead~~codes]], [[LT codes]] and [[Raptor codes]]. Tornado codes use a layered approach. All layers except the last use an [[Low-density parity-check code\|LDPC]] error correction code, which is fast but has a chance of failure. The final layer uses a Reed–Solomon correction code, which is slower but is optimal in terms of failure recovery. Tornado codes dictates how many levels, how many recovery blocks in each level, and the distribution used to generate blocks for the non-final layers. Tornado codes are fixed rate, near optimal erasure correcting codes that use sparse bipartite graphs to trade encoding and decoding speed for reception overhead. Since the introduction of Tornado codes, many other similar codes have emerged, most notably [[Online codes]], [[LT codes]] and [[Raptor codes]]. == ~~Rough~~ Overview ==▼ The input data is divided into blocks. Blocks are sequences of bits that are all the same size. Recovery data uses the same block size as the input data. The erasure of a block (input or recovery) is detected by some other means. (For example, a block from disk does not pass a CRC check or a network packet with a given sequence number never arrived.) The number of recovery blocks is given by the user. Then the number of levels is determined along with the number of blocks in each level. The number in each level is determined by a factor B which is less than one. If there are N input blocks, the first recovery level has BN blocks, the second has B×B×N, the third has B×B×B×N, and so on. ▲== Rough Overview == ~~Tornado~~All ~~codes~~levels ~~work~~of ~~via~~recovery except the final one use an LDPC, which works by xor (exclusive -or). Xor operates on binary values, 1s and 0s. A xor B is 1 if A and B have different values and 0 if A and B have the same values. If you are given result of (A xor B) and A, you can determine the value for B. (~~Note that~~ A xor B xor A ~~is equal to~~= B.) Similarly, if you are given result of (A xor B) and B, you can determine the value for A. This extends to multiple values, so given result of (A xor B xor C xor D) and any 3 of the values, the missing value can be recovered. So the recovery blocks in level one are just the xor of some set of input blocks. Similarly, the recovery blocks in level two are each the xor of some set of blocks in level one. The blocks used in the xor are chosen randomly, without repetition. However, the ''number'' of blocks xor'ed to make a recovery block is chosen from a very specific distribution for each level. The xor of a number of binary variables is called their "parity" and this is often used in error detection and correction. Tornado codes use it for error correction. They use another checksum (like CRC-32 or MD5) for error detection. Since xor is a fast operation and the recovery blocks are an xor of only a subset of the blocks in the input (or at a lower recovery level), the recovery blocks can be generated quickly. The Tornado code algorithm starts with the sender breaking an input file or message into equal sized blocks of bytes. Let's call these blocks A[1] through A[N]. The sender records the index of each block and computes a checksums for the block and its index. (These will be used to determine if a block has been damaged during transmission and therefore needs to be recovered.) The sender also calculates some parity blocks, B[1] through B[K]. Each of these parity blocks holds the parity for a subset of the input blocks A[1] through A[N]. The sizes of and composition of these subsets is key to the speed and success of this algorithm. For each parity block, the sender records the indices of the input blocks and a checksum for the parity block and its input indices. The final level is a Reed–Solomon code. Reed–Solomon codes are optimal in terms of recovering from failures, but slow to generate and recover. Since each level has fewer blocks than the one before, the Reed–Solomon code has a small number of recovery blocks to generate and to use in recovery. So, even though Reed–Solomon is slow, it only has a small amount of data to handle. ~~The sender now sends the input and parity blocks (with their indices and checksums) to the receiver. During this transmission, some of the blocks may be corrupted.~~ During recovery, the Reed–Solomon code is recovered first. This is guaranteed to work if the number of missing blocks in the next-to-final level is less than the present blocks in the final level. The receiver uses the checksums to identify the bad blocks and discards them. The receiver is now left with a subset of the input blocks and some parity blocks. As long as the receiver has received N + C blocks (where C is some constant), it is highly probable that the receiver can recover the file. Now, each parity block is associated with a subset of input blocks and for most parity blocks there may be multiple input blocks missing from its subset. However, given the sizes of the random subsets, it is highly likely that there exists one parity block that is missing only one of its input blocks. Using the xor operation described above, that missing input block can be recovered. Once it is recovered, a parity block that was previously missing two input blocks may now be missing just one and that one can now be recovered. This process continues - input blocks being recovered and more parity blocks being available to recover missing blocks - until the entire input file or message is recovered. Going lower, the LDPC (xor) recovery level can be used to recover the level beneath it ''with high probability'' if all the recovery blocks are present and the level beneath is missing at most C' fewer blocks than the recovery level. The algorithm for recovery is to find some recovery block that has only one of its generating set missing from the lower level. Then the xor of the recovery block with all of the blocks that are present is equal to the missing block. The beauty of the algorithm comes from determining the sizes and composition of the subsets. On average, the sizes are low - making it very fast to create them and fast to recover the file. However, occasionally they are large - covering most of the input blocks - so that any missing block can be recovered. == ~~Exact~~Patent ~~Algorithm~~issues == Tornado codes were formerly patented inside the United States of America.{{sfn\|Mitzenmacher\|2004}} Patents US6163870 A (filed Nov 6, 1997) and US 6081909 A (filed Nov 6, 1997) describe Tornado codes, and have expired as of November 6, 2017. Patents US6307487 B1 (filed Feb 5, 1999) and US6320520 B1 (filed Sep 17, 1999) also mention Tornado codes, and have expired as of February 5, 2019, and September 17, 2019, respectively. == Citations ==▼ ''Need details on the exact formula for computing the size of subsets, given a value for C and expected loss rate. Also need big O() notation for algorithm and comparison for best of Reed-Solomon implementation. Source code is always great.'' [[Michael Luby]] created the Tornado codes.{{sfn\|Luby\|Mitzenmacher\|Shokrollahi\|Spielman\|1997}}{{sfn\|Luby\|Mitzenmacher\|Shokrollahi\|1998}} == See also == The subsets for each parity block are generated in a semi-random fashion. The algorithm assigned for each input block a number of parity blocks it will be present in. The algorithm also assigns for each parity block the number of input blocks that will be in it. Then randomly parity blocks with remaining slots are assigned input blocks with remaining slots. This is called a random degree-constrained bipartite graph. This method generates parity blocks with random subsets of input blocks with a specific distribution that allows the input file or message to be recovered with very high probability. [[Erasure code]] * [[Raptor code]] ~~== Patent Issues ==~~ ~~Tornado codes and many similar codes (LT codes, Online codes) are patented inside the United States of America and cannot be used with out the patent holder's permission.~~ ▲== Citations == == Notes == ~~Michael Luby created Tornado codes and has good descriptions on his website. [http://www.icsi.berkeley.edu/~luby/]~~ {{reflist}} == References == A readable description from CMU (PostScript): [www.cs.cmu.edu/afs/cs.cmu.edu/project/pscico-guyb/realworld/www/tornado.ps]▼ * {{cite conference \|vauthors=Byers JW, Luby M, Mitzenmacher M, Rege A \|date=October 1998 \|title=A digital fountain approach to reliable distribution of bulk data \|book-title=SIGCOMM '98: Proceedings \|conference=ACM SIGCOMM '98 conference on Applications, technologies, architectures, and protocols for computer communication \|pages=56–67 \|doi=10.1145/285237.285258 \|doi-access=free}} * {{cite book \|vauthors=Luby M, Mitzenmacher M, Shokrollahi A, Spielman D, Stemann V \|chapter=Practical loss-resilient codes \|author-link1=Michael Luby \|author-link2=Michael Mitzenmacher \|author-link3=Amin Shokrollahi \|author-link4=Daniel Spielman \|title=Proceedings of the twenty-ninth annual ACM symposium on Theory of computing - STOC '97 \|pages=150–159 \|year=1997\|doi=10.1145/258533.258573 \|isbn=0-89791-888-6 }} * {{cite journal \|vauthors=Luby M, Mitzenmacher M, Shokrollahi A \|author-link1=Michael Luby \|author-link2=Michael Mitzenmacher \|author-link3=Amin Shokrollahi \|title=Analysis of Random Processes via And-Or Tree Evaluation \|journal=Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms \|pages=364–373 \|year=1998}} * {{cite book \|vauthors=Mitzenmacher M \|chapter=Digital fountains: A survey and look forward \|author-link=Michael Mitzenmacher \|title=Manufacturing Engineer \|year=2004\|pages=271–276 \|doi=10.1109/ITW.2004.1405313 \|isbn=0-7803-8720-1 }} == External links == ~~Probable patent citation from Michael Luby's CV:~~ ▲A readable description from CMU (PostScript): [https://www.cs.cmu.edu/afs/cs.cmu.edu/project/pscico-guyb/realworld/www/tornado.ps] and another from Luby at the [[International Computer Science Institute]] (PostScript) [http://www.icsi.berkeley.edu/~luby/PAPERS/tordig.ps]. "Message Encoding and Transmission System and Method for Multilevel Data Redundancy", Andres Albanese, Michael Luby, Johannes Blomer and Je Edmonds, U.S. Patent No. 5,617,541, Issued April 1, 1997. Serial Number 08/361,802; 12-21-94, Assignment recorded February 27, 1995, Reel 7364, Frames 685-689. [[Category:Coding theory]] [[Category:Capacity-approaching codes]]