Unary coding


Unary coding is an entropy encoding that represents a natural number, n, with n − 1 ones followed by a zero. For example, 5 is represented as 11110. Some representations instead use n − 1 zeros followed by a one. The ones and zeros are interchangeable without loss of generality.
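The definition translates directly into code. The following is a minimal Python sketch of the encoder and of a decoder for a concatenated bit string. The function names and the `fill`/`stop` parameters are illustrative assumptions, not part of any standard library; swapping the two symbols gives the alternative convention.

```python
def unary_encode(n: int, fill: str = "1", stop: str = "0") -> str:
    """Return n - 1 copies of `fill` followed by one `stop` symbol (n >= 1)."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    return fill * (n - 1) + stop


def unary_decode(bits: str, fill: str = "1", stop: str = "0") -> list[int]:
    """Split a concatenation of unary codewords back into the numbers it encodes."""
    values, run = [], 0
    for b in bits:
        if b == fill:
            run += 1                # still inside the current codeword
        elif b == stop:
            values.append(run + 1)  # n - 1 fill symbols were seen, so n = run + 1
            run = 0
        else:
            raise ValueError(f"unexpected symbol {b!r}")
    if run:
        raise ValueError("input ends in the middle of a codeword")
    return values


print(unary_encode(5))                      # 11110
print(unary_encode(5, fill="0", stop="1"))  # 00001 (the alternative convention)
print(unary_decode("11110" + "0" + "10"))   # [5, 1, 2]
```

No separators are needed between codewords: the decoder knows a codeword has ended as soon as it sees the `stop` symbol.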

The table below uses the second convention (n − 1 zeros followed by a one):

n    Code
1    1
2    01
3    001
4    0001
5    00001
6    000001
7    0000001
8    00000001
9    000000001
10   0000000001

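Unary coding is an optimally efficient encoding for the following discrete probability distribution

$P(n) = 2^{-n}$

for $n = 1, 2, 3, \ldots$, because the length of each codeword, n bits, equals its self-information $-\log_2 P(n)$.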
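This can be checked numerically. The sketch below truncates the distribution at N = 60 terms (an assumption of the sketch; the omitted tail has total probability 2^−60) and compares the expected unary code length, `expected_length`, with the Shannon entropy, `entropy`. Both come out at 2 bits.

```python
import math

# Truncate the infinite distribution P(n) = 2**-n at N terms; the omitted tail
# has probability 2**-N, which is negligible for N = 60.
N = 60
probs = {n: 2.0 ** -n for n in range(1, N + 1)}

# The unary codeword for n has n bits (n - 1 ones plus the terminating zero).
expected_length = sum(p * n for n, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())

# Each codeword length equals the self-information -log2 P(n).
assert all(math.isclose(n, -math.log2(p)) for n, p in probs.items())

print(f"expected length = {expected_length:.6f} bits")  # expected length = 2.000000 bits
print(f"entropy         = {entropy:.6f} bits")          # entropy         = 2.000000 bits
```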

In symbol-by-symbol coding, it is optimal for any geometric distribution

$P(n) = (k-1)k^{-n}$

for which k ≥ φ = 1.6180339887…, the golden ratio, or, more generally, for any discrete distribution for which

$P(n) \ge P(n+1) + P(n+2)$

for $n = 1, 2, 3, \ldots$. For the geometric distribution this inequality reduces to $k^2 \ge k + 1$, whose positive root is φ, so the two conditions agree. Although unary coding is the optimal symbol-by-symbol code for such distributions, its optimality can, like that of Huffman coding, be overstated. Arithmetic coding achieves better compression for the last two families of distributions mentioned above because it does not code input symbols independently but implicitly groups them.
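The threshold at φ can be checked numerically. The sketch below, whose helper names are hypothetical, builds Huffman code lengths for a geometric distribution truncated to N = 8 symbols and renormalized, then compares them with unary code lengths. Truncation is an assumption of the sketch: with a finite alphabet the last unary codeword needs no terminating bit, so the unary lengths are 1, 2, …, N − 1, N − 1. Because Huffman coding is optimal, matching lengths show that unary coding is optimal for that truncated distribution, and a strictly shorter Huffman code shows that it is not.

```python
import heapq
from itertools import count


def geometric(k: float, N: int) -> list[float]:
    """P(n) = (k - 1) * k**-n for n = 1..N, renormalized to sum to 1."""
    weights = [(k - 1) * k ** -n for n in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def huffman_lengths(probs: list[float]) -> list[int]:
    """Huffman codeword length for each symbol, index-aligned with probs."""
    lengths = [0] * len(probs)
    tie = count()  # tie-breaker so heapq never has to compare the symbol lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # every symbol under the merged node moves one level deeper
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


def truncated_unary_lengths(N: int) -> list[int]:
    """Unary lengths 1..N-1; the last of N symbols needs no terminating bit."""
    return list(range(1, N)) + [N - 1]


def mean_length(probs: list[float], lengths: list[int]) -> float:
    return sum(p * l for p, l in zip(probs, lengths))


for k in (2.0, 1.5):  # one value above phi, one below
    probs = geometric(k, 8)
    h, u = huffman_lengths(probs), truncated_unary_lengths(8)
    print(f"k={k}: Huffman {h} ({mean_length(probs, h):.4f} bits), "
          f"unary {u} ({mean_length(probs, u):.4f} bits)")
```

For k = 2 (above φ) the two sets of lengths coincide; for k = 1.5 (below φ) the Huffman code has a strictly smaller mean length. This checks only finite truncations, not the infinite distributions discussed in the text.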

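A modified unary encoding is used in UTF-8, where the number of leading one bits in the first byte of a multi-byte sequence gives the length of the sequence. Unary codes are also used in split-index schemes such as Golomb–Rice coding. Unary coding is prefix-free and can therefore be uniquely decoded.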
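To illustrate the split-index idea, here is a sketch of Rice coding (Golomb coding with a power-of-two modulus 2^k): the quotient n >> k is written in unary and the remainder in exactly k binary digits. The bit polarity of the unary part (ones terminated by a zero) and the function names `rice_encode` and `rice_decode` are assumptions of this sketch; real formats differ in such details. The decoder works without separators because the unary prefix is self-delimiting.

```python
def rice_encode(n: int, k: int) -> str:
    """Rice code of n >= 0 with parameter k >= 0 (a Golomb code with modulus 2**k)."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q, r = n >> k, n & ((1 << k) - 1)          # split n into quotient and k-bit remainder
    prefix = "1" * q + "0"                     # quotient in unary: q ones, then a zero
    suffix = format(r, f"0{k}b") if k else ""  # remainder in exactly k binary digits
    return prefix + suffix


def rice_decode(bits: str, k: int) -> list[int]:
    """Decode a concatenation of Rice codewords produced with the same k."""
    values, i = [], 0
    while i < len(bits):
        q = 0
        while i < len(bits) and bits[i] == "1":  # count the unary prefix
            q += 1
            i += 1
        if i + 1 + k > len(bits):
            raise ValueError("input ends in the middle of a codeword")
        if bits[i] != "0":
            raise ValueError(f"unexpected symbol {bits[i]!r}")
        i += 1                                    # skip the terminating zero
        r = int(bits[i:i + k], 2) if k else 0     # read the fixed-width remainder
        i += k
        values.append((q << k) | r)
    return values


print(rice_encode(9, 2))                         # 11001
print(rice_decode("11001" + "000" + "011", 2))   # [9, 0, 3]
```

With k = 0 the remainder is empty, and the code for n is exactly the unary code for n + 1 described above.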

References

  • Khalid Sayood, Data Compression, 3rd ed., Morgan Kaufmann.
  • K. R. Rao, EE5359: Principles of Digital Video Coding.