Unary coding

This is an old revision of this page, as edited by Arkanosis (talk | contribs) at 10:40, 24 November 2009 (Depending on how we interpret ''natural number'', interpretation of unary coding may differ). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Unary coding is an entropy encoding that represents a natural number n with n ones followed by a zero (if natural numbers are taken to be the non-negative integers) or with n − 1 ones followed by a zero (if natural numbers are taken to be the strictly positive integers). For example, 5 is represented as 111110 under the first convention and as 11110 under the second. Some representations instead use n or n − 1 zeros followed by a one. The ones and zeros are interchangeable without loss of generality.

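| n (non-negative) | n (strictly positive) | Unary code | Alternative |
|---|---|---|---|
| 0 | 1 | 0 | 1 |
| 1 | 2 | 10 | 01 |
| 2 | 3 | 110 | 001 |
| 3 | 4 | 1110 | 0001 |
| 4 | 5 | 11110 | 00001 |
| 5 | 6 | 111110 | 000001 |
| 6 | 7 | 1111110 | 0000001 |
| 7 | 8 | 11111110 | 00000001 |
| 8 | 9 | 111111110 | 000000001 |
| 9 | 10 | 1111111110 | 0000000001 |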
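The mapping in the table can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `unary_encode` and `unary_decode` and the `strictly_positive` flag are assumptions made for this example, and bits are held in a text string for readability rather than packed into bytes.

```python
def unary_encode(n, strictly_positive=False):
    """Return the unary codeword for n: a run of '1's closed by a single '0'."""
    # Non-negative convention: n ones. Strictly positive convention: n - 1 ones.
    ones = n - 1 if strictly_positive else n
    if ones < 0:
        raise ValueError("n is out of range for the chosen convention")
    return "1" * ones + "0"


def unary_decode(bits, strictly_positive=False):
    """Decode a concatenation of unary codewords into a list of integers."""
    values = []
    run = 0  # number of '1's seen since the last terminating '0'
    for bit in bits:
        if bit == "1":
            run += 1
        elif bit == "0":
            values.append(run + 1 if strictly_positive else run)
            run = 0
        else:
            raise ValueError("unexpected symbol in bit string")
    if run:
        raise ValueError("input ends in the middle of a codeword")
    return values
```

For instance, `unary_encode(5)` gives `"111110"` and `unary_encode(5, strictly_positive=True)` gives `"11110"`, matching the table. Swapping the two characters yields the alternative representation.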

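Unary coding is an optimally efficient encoding for the following discrete probability distribution

$P(n) = 2^{-n}$

for $n = 1, 2, 3, \ldots$ (using the strictly positive convention, in which the codeword for $n$ is $n$ bits long).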
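As a short worked check, assume the distribution in question is $P(n) = 2^{-n}$ for $n = 1, 2, 3, \ldots$ and that the codeword for $n$ has length $n$ bits (the strictly positive convention). Under those assumptions the expected codeword length $L$ equals the entropy $H$ of the source, which is what "optimally efficient" means here:

```latex
\[
\begin{aligned}
L &= \sum_{n=1}^{\infty} n \, P(n) = \sum_{n=1}^{\infty} n \, 2^{-n} = 2 \text{ bits}, \\
H &= -\sum_{n=1}^{\infty} P(n) \log_2 P(n) = \sum_{n=1}^{\infty} 2^{-n} \cdot n = 2 \text{ bits}.
\end{aligned}
\]
```

Each codeword length $n$ is exactly $-\log_2 P(n)$, so no bits are wasted on rounding.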

In symbol-by-symbol coding, it is optimal for any geometric distribution

$P(n) = (k-1)\,k^{-n}$

for which $k \ge \varphi = 1.61803398879\ldots$, the golden ratio, or, more generally, for any discrete distribution for which

$P(n) \ge P(n+1) + P(n+2)$

for $n = 1, 2, 3, \ldots$. Although it is the optimal symbol-by-symbol coding for such probability distributions, its optimality can, like that of Huffman coding, be overstated. Arithmetic coding compresses the last two distributions mentioned above better, because it does not code each input symbol independently but implicitly groups the inputs.
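The following Python sketch checks these claims numerically. It assumes the geometric distribution has the form P(n) = (k − 1)k^(−n) for n = 1, 2, 3, …, that the general condition is P(n) ≥ P(n+1) + P(n+2), and that the codeword for n is n bits long. All function names are made up for this illustration, and the infinite sums are truncated after 200 terms.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def geometric_pmf(k, n):
    """Assumed form of the geometric distribution: P(n) = (k - 1) * k**(-n), n >= 1."""
    return (k - 1) * k ** (-n)


def dominates_next_two(k, terms=200):
    """Check P(n) >= P(n+1) + P(n+2) for n = 1 .. terms."""
    return all(
        geometric_pmf(k, n) >= geometric_pmf(k, n + 1) + geometric_pmf(k, n + 2)
        for n in range(1, terms + 1)
    )


def expected_unary_length(k, terms=200):
    """Truncated expected codeword length in bits (codeword for n has n bits)."""
    return sum(n * geometric_pmf(k, n) for n in range(1, terms + 1))


def entropy_bits(k, terms=200):
    """Truncated entropy of the distribution in bits."""
    total = 0.0
    for n in range(1, terms + 1):
        p = geometric_pmf(k, n)
        if p > 0:
            total -= p * math.log2(p)
    return total


for k in (1.5, PHI + 0.01, 2.0):
    print(k, dominates_next_two(k), expected_unary_length(k), entropy_bits(k))
```

With k = 2 the expected length and the entropy coincide at 2 bits. For k close to the golden ratio the condition still holds, but the expected unary length exceeds the entropy by roughly a tenth of a bit per symbol. That gap is the room a coder that does not treat symbols independently, such as arithmetic coding, can exploit.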

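A modified unary encoding is used in UTF-8: the lead byte of a multi-byte sequence begins with as many ones as there are bytes in the sequence, followed by a zero. Unary codes are also used in split-index schemes such as the Golomb-Rice code. Unary coding is prefix-free and can therefore be uniquely decoded.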
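To illustrate the split-index idea, here is a minimal sketch of a Rice code (a Golomb code whose divisor is a power of two, 2^k). The quotient is written in unary and the remainder in k plain binary bits. The names `rice_encode` and `rice_decode` and the string-of-bits representation are assumptions chosen for this example, not a reference implementation.

```python
def rice_encode(n, k):
    """Encode non-negative n: unary quotient (n >> k), then k-bit binary remainder."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    quotient = n >> k
    remainder = n & ((1 << k) - 1)
    # The unary part is `quotient` ones closed by a zero.
    unary_part = "1" * quotient + "0"
    binary_part = format(remainder, "0{}b".format(k)) if k else ""
    return unary_part + binary_part


def rice_decode(bits, k):
    """Decode a single Rice codeword found at the start of `bits`."""
    quotient = bits.index("0")  # length of the leading run of ones
    start = quotient + 1
    remainder = int(bits[start:start + k], 2) if k else 0
    return (quotient << k) | remainder
```

With k = 2, `rice_encode(9, 2)` yields `"11001"`: the quotient 2 as `110`, then the remainder 1 as `01`. With k = 0 the code reduces to plain unary coding under the non-negative convention.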
