Unary coding

This is an old revision of this page, as edited by 85.65.244.34 (talk) at 11:31, 23 April 2007 (optimally efficient '''only''' for constrained integer-size codes...). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Unary coding is an entropy encoding that represents a natural number, n, with n − 1 ones followed by a zero. For example 5 is represented as 11110. Some representations use n ones followed by a zero. The ones and zeros are interchangeable without loss of generality.

Unary coding is an optimally efficient encoding for the following discrete probability distribution

for .

When comparing only the subset of encoders with an integer number of symbols in the output code, it is optimal for any geometric distribution

for which k ≥ φ = 1.61803398879…, the golden ratio, or, more generally, for any discrete distribution for which

for .

Note that these two distributions can be encoded more efficiently with arithmetic coding (by allocating a non-integer number of symbols per code), so unary encoding is not optimally efficient for these distributions according to Shannon's source coding theorem.

A modified unary encoding is used in UTF-8. Unary codes are also used in split-index schemes like the Golomb Rice code. Unary coding is prefix-free, and can be uniquely decoded.

References

  • Khalid Sayood, Data Compression, 3rd ed, Morgan Kaufmann.
  • Professor K.R Rao, EE5359:Principles of Digital Video Coding.