There are many variable-length codes. When compressing data, we wonder: which one is the best code? (Which code compresses the file into the fewest bits?)
If one knows ahead of time all the letters that could possibly be used, and has a good estimate of the [[letter frequencies]], the best possible comma-free code is a [[Huffman coding|Huffman code]]. (Usually the Huffman process generates a variable-length code. But when all the letters have the same frequency, as in previously compressed or encrypted data, and the number of codewords is also a power of the alphabet size, the Huffman process generates a fixed-length code.)
All other codes use ''more'' bits than a Huffman code. (Usually there are several Huffman codes, all of which compress the file into exactly the same number of bits.)
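The Huffman construction above can be sketched in a few lines of Python. This is an illustrative helper (the name `huffman_code` and the dictionary-merging representation are choices made here, not a standard library API): it repeatedly merges the two lowest-frequency subtrees, prepending a bit to each codeword in the merged subtrees.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a Huffman code from a {symbol: frequency} map.

    Returns a {symbol: bitstring} dict. Each heap entry carries a
    tie-breaking counter so the heap never compares the dicts directly.
    """
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Pop the two lowest-frequency subtrees...
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # ...and merge them, prepending '0' to one subtree's codewords
        # and '1' to the other's.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# Example: derive a code from the letter frequencies of a sample string.
text = "this is an example of a huffman tree"
code = huffman_code(Counter(text))
encoded = "".join(code[ch] for ch in text)
```

Because the two least-frequent symbols are merged first, more frequent symbols end up closer to the root and receive shorter codewords; the resulting code is prefix-free, so the encoded bitstream can be decoded unambiguously.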
Some data compression algorithms can compress files even smaller than [[Huffman coding|Huffman compression]]. Generally this is because they don't use a code at all. They may represent "a" by one pattern of bits in one place in the compressed file, then use that same pattern of bits to represent a completely different letter later on, as in [[Adaptive Huffman coding|adaptive Huffman]] compression. They may use a short pattern of bits to represent several letters, as in [[LZW]] compression -- changing any one of those bits may completely change that block of letters. Or they may avoid mapping particular bits to particular letters (the definition of a code) in other creative ways, as in [[range encoding]].
=== existence of prefix codes / Kraft's inequality ===