Unary coding

Unary coding is an entropy encoding that represents a natural number n as n − 1 ones followed by a zero. For example, 5 is represented as 11110. Some representations instead use n − 1 zeros followed by a one; the ones and zeros are interchangeable without loss of generality.

n     Unary code     Alternative
1     1              0
2     01             10
3     001            110
4     0001           1110
5     00001          11110
6     000001         111110
7     0000001        1111110
8     00000001       11111110
9     000000001      111111110
10    0000000001     1111111110
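
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `unary_encode` and `unary_decode` and the `ones_first` flag are assumptions made for this example, and codewords are held as text strings of "0" and "1" rather than packed bits.

```python
def unary_encode(n: int, ones_first: bool = True) -> str:
    """Return the unary codeword of a natural number n >= 1 as a bit string.

    ones_first=True  gives n - 1 ones followed by a zero.
    ones_first=False gives n - 1 zeros followed by a one.
    """
    if n < 1:
        raise ValueError("unary coding as defined here needs n >= 1")
    body, stop = ("1", "0") if ones_first else ("0", "1")
    return body * (n - 1) + stop


def unary_decode(code: str, ones_first: bool = True) -> int:
    """Invert unary_encode for one complete codeword."""
    body, stop = ("1", "0") if ones_first else ("0", "1")
    # A valid codeword is a run of `body` bits closed by a single `stop` bit.
    if not code or code[-1] != stop or set(code[:-1]) - {body}:
        raise ValueError(f"not a valid unary codeword: {code!r}")
    # The codeword for n is exactly n bits long.
    return len(code)
```

Calling `unary_encode(n, ones_first=False)` and `unary_encode(n)` for n from 1 to 10 reproduces the two columns of the table, and the length of each codeword equals the number it encodes.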

Unary coding is an optimally efficient encoding for the following discrete probability distribution

$P(n) = 2^{-n}$

for $n = 1, 2, 3, \ldots$.
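
For the distribution $P(n) = 2^{-n}$, the optimality claim can be checked directly. The codeword for n is n bits long, so the expected code length L and the entropy H turn out to be the same sum. The short derivation below is a worked check using the standard identity $\sum_{n \ge 1} n x^n = x/(1-x)^2$ at $x = 1/2$; the symbols L and H are introduced here for illustration only.

```latex
\begin{align*}
L &= \sum_{n=1}^{\infty} n \, P(n)
   = \sum_{n=1}^{\infty} n \, 2^{-n}
   = \frac{1/2}{(1 - 1/2)^2}
   = 2 \text{ bits}, \\
H &= -\sum_{n=1}^{\infty} P(n) \log_2 P(n)
   = \sum_{n=1}^{\infty} n \, 2^{-n}
   = 2 \text{ bits}.
\end{align*}
```

Since L = H, no uniquely decodable code can do better on average for this distribution.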

In symbol-by-symbol coding, it is optimal for any geometric distribution

$P(n) = (k-1)\,k^{-n}$

for which $k \ge \varphi = 1.61803398879\ldots$, the golden ratio, or, more generally, for any discrete distribution for which

$P(n) \ge P(n+1) + P(n+2)$

for $n = 1, 2, 3, \ldots$. (For the geometric case this condition reduces to $k^2 \ge k + 1$, which gives the golden-ratio bound.) Although it is the optimal symbol-by-symbol code for such probability distributions, its optimality can, like that of Huffman coding, be overstated. Arithmetic coding can compress the last two distributions mentioned above better because it does not code input symbols independently, but rather implicitly groups the inputs.
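
To see how far symbol-by-symbol optimality can sit from the entropy, the following sketch compares the mean unary codeword length with the Shannon entropy for the geometric distribution P(n) = (k − 1) k^(−n). The function names are assumptions made for this example, and the closed forms come from summing the geometric series: the mean length is k / (k − 1), and the entropy follows from −log2 P(n) = n log2 k − log2(k − 1).

```python
import math


def geometric_pmf(k: float, n: int) -> float:
    """P(n) = (k - 1) * k**(-n) for n = 1, 2, 3, ... and k > 1."""
    return (k - 1) * k ** (-n)


def expected_unary_length(k: float) -> float:
    """Mean codeword length sum(n * P(n)) in bits; closed form k / (k - 1)."""
    return k / (k - 1)


def entropy_bits(k: float) -> float:
    """Shannon entropy of the distribution, in bits per symbol.

    Uses -log2 P(n) = n * log2(k) - log2(k - 1), averaged over n.
    """
    return expected_unary_length(k) * math.log2(k) - math.log2(k - 1)


for k in (2.0, 4.0, 10.0):
    gap = expected_unary_length(k) - entropy_bits(k)
    print(f"k={k}: P(1)={geometric_pmf(k, 1):.2f}, "
          f"unary {expected_unary_length(k):.3f} bits, "
          f"entropy {entropy_bits(k):.3f} bits, gap {gap:.3f}")
```

For k = 2 the gap is zero. For k = 4 unary coding spends about 1.33 bits per symbol against an entropy of about 1.08 bits, and that difference is the room a coder that groups symbols, such as an arithmetic coder, can exploit.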

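A modified unary encoding is used in UTF-8, where the number of leading one bits in the first byte of a multi-byte sequence gives the length of the sequence in bytes. Unary codes are also used in split-index schemes such as the Golomb-Rice code. Unary coding is prefix-free, so a concatenation of codewords can be uniquely decoded.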
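
As an illustration of both points, here is a minimal sketch of a Rice code (a Golomb code whose divisor is a power of two) that sends the quotient in unary and the remainder in plain binary, then decodes a concatenated stream without any separators. The function names, the bit-string representation and the choice of "ones then zero" for the unary part are assumptions for this example; real codecs differ in bit polarity and ordering. The quotient q ≥ 0 is written as q ones and a zero, which is the unary codeword of q + 1 in the convention used above.

```python
from typing import List


def rice_encode(x: int, k: int) -> str:
    """Encode a non-negative integer x with Rice parameter k (divisor 2**k)."""
    if x < 0 or k < 0:
        raise ValueError("x and k must be non-negative")
    q, r = x >> k, x & ((1 << k) - 1)
    unary = "1" * q + "0"                       # quotient: q ones, then a zero
    binary = format(r, f"0{k}b") if k else ""   # remainder: exactly k bits
    return unary + binary


def rice_decode_stream(bits: str, k: int) -> List[int]:
    """Decode a concatenation of Rice codewords back into integers."""
    values, i = [], 0
    while i < len(bits):
        end = bits.index("0", i)    # prefix-free: the first zero ends the unary part
        q = end - i
        i = end + 1
        if i + k > len(bits):
            raise ValueError("stream ends inside a codeword")
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values
```

Because no unary codeword is a prefix of another, the decoder finds each codeword boundary by scanning to the first zero, so `rice_decode_stream("".join(rice_encode(x, 2) for x in [9, 0, 3]), 2)` recovers the original list.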
