Revision as of 04:15, 13 May 2008 edit 209.122.188.237 (talk) Rough description of algorithm with some citations - writen by Michael Nahas, designer of parchive2 file format. ← Previous edit		Revision as of 23:26, 13 May 2008 edit undo Mdnahas (talk \| contribs) Extended confirmed users 591 edits Fixed errors that I introduced yesterday. Added some explanation about the bipartite graph. Next edit →
Line 10: The xor of a number of binary variables is called their "parity" and this is often used in error detection and correction. Tornado codes use it for error correction. They use another checksum (like CRC-32 or MD5) for error detection. The Tornado code algorithm starts with the sender breaking an input file or message into equal sized blocks of bytes. Let's call these blocks A[1] through A[N]. The sender records the index of each block and computes a checksums for the block and its index. (These will be used to determine if a block has been damaged during transmission and therefore needs to be recovered.) The sender also calculates some parity blocks, B[1] through B[K]. Each of these parity blocks holds the parity for a ~~random~~ subset of the input blocks A[1] through A[N]. (The sizes of ~~these~~and ~~random~~composition of these subsets is key to the speed and success of this algorithm.) For each parity block, the sender records the indices of the input blocks and a checksum for the parity block and its input indices. The sender now sends the input and parity blocks (with their indices and checksums) to the receiver. During this transmission, some of the blocks may be corrupted. Line 16: The receiver uses the checksums to identify the bad blocks and discards them. The receiver is now left with a subset of the input blocks and some parity blocks. As long as the receiver has received N + C blocks (where C is some constant), it is highly probable that the receiver can recover the file. Now, each parity block is associated with a subset of input blocks and for most parity blocks there may be multiple input blocks missing from its subset. However, given the sizes of the random subsets, it is highly likely that there exists one parity block that is missing only one of its input blocks. Using the xor operation described above, that missing input block can be recovered. Once it is recovered, a parity block that was previously missing two input blocks may now be missing just one and that one can now be recovered. This process continues - input blocks being recovered and more parity blocks being available to recover missing blocks - until the entire input file or message is recovered. The beauty of the algorithm comes from determining the sizes and composition of the ~~random~~ subsets. On average, the sizes are low - making it very fast to create them and fast to recover the file. However, occasionally they are large - covering most of the input blocks - so that any missing block can be recovered. == Exact Algorithm == ''Need details on the exact formula for computing the size of subsets, given a value for C and expected loss rate. Also need big O() notation for algorithm and comparison for best of Reed-Solomon implementation. Source code is always great.'' The subsets for each parity block are generated in a semi-random fashion. The algorithm assigned for each input block a number of parity blocks it will be present in. The algorithm also assigns for each parity block the number of input blocks that will be in it. Then randomly parity blocks with remaining slots are assigned input blocks with remaining slots. This is called a random degree-constrained bipartite graph. This method generates parity blocks with random subsets of input blocks with a specific distribution that allows the input file or message to be recovered with very high probability. == Patent Issues ==

Tornado code: Difference between revisions