Golomb coding: Difference between revisions

Content deleted Content added
Construction of codes: use mvar for better typography
Overview: replace the cryptic figure with this one that shows more
Line 10:
 
== Overview ==
[[File:Golomb code example.png|thumb|upright 1.5|Golomb coding example for a source x with geometric distribution, with parameter {{math|''p''(0) {{=}} 0.2}}, using Golomb code with {{math|''M'' {{=}} 3}}. The 2-bit code 00 is used 20% of the time; the 3-bit codes 010, 011, and 100 are used over 38% of the time; 4 bits or more are needed in a minority of cases. For this source, entropy = 3.610 bits; for this code with this source, rate = 3.639 bits; therefore redundancy = 0.030 bits, or efficiency = 0.992 bits per bit.]]
 
===Construction of codes===
 
Golomb coding uses a tunable parameter {{mvar|M}} to divide an input value {{mvar|N}} (aka {{mvar|x}}) into two parts: {{mvar|q}}, the result of a division by {{mvar|M}}, and {{mvar|r}}, the remainder. The quotient is sent in [[unary coding]], followed by the remainder in [[truncated binary encoding]]. When <math>M=1</math>, Golomb coding is equivalent to unary coding.
 
Golomb–Rice codes can be thought of as codes that indicate a number by the position of the ''bin'' ({{mvar|q}}), and the ''offset'' within the bin ({{mvar|r}}). The aboveexample figure shows the position {{mvar|q}} and offset {{mvar|r}} for the encoding of integer {{mvar|Nx}} using Golomb–Rice parameter {{mvarmath|''M'' {{=}} 3}}, with source probabilities following a geometric distribution with {{math|''p''(0) {{=}} 0.2}}.
[[Image:golomb rice code.svg]]
 
Golomb–Rice codes can be thought of as codes that indicate a number by the position of the ''bin'' ({{mvar|q}}), and the ''offset'' within the bin ({{mvar|r}}). The above figure shows the position {{mvar|q}} and offset {{mvar|r}} for the encoding of integer {{mvar|N}} using Golomb–Rice parameter {{mvar|M}}.
 
Formally, the two parts are given by the following expression, where <math>x</math> is the nonnegative integer being encoded:
Line 27 ⟶ 26:
:<math>r = x - qM</math>.
 
[[ImageFile:GolombCodeRedundancy.svg|300pxthumb|upright 1.5|This image shows the redundancy, in bits, of the Golomb code, when {{mvar|M}} is chosen optimally for {{math| (1 − ''p''&nbsp;) &ge;&nbsp;1/2 0.|thumb|right5}}]]
 
Both {{mvar|q}} and {{mvar|r}} will be encoded using variable numbers of bits – {{mvar|q}} by a unary code, and {{mvar|r}} by {{mvar|b}} bits for Rice code, or a choice between {{mvar|b}} and {{mvar|b}}+1 bits for Golomb code (i.e. {{mvar|M}} is not a power of 2). Let <math>b = \lfloor\log_2(M)\rfloor</math>. If <math>r < 2^{b+1} - M</math>, then use {{mvar|b}} bits to encode {{mvar|r}}; otherwise, use {{mvar|b}}+1 bits to encode {{mvar|r}}. Clearly, <math>b=\log_2(M)</math> if {{mvar|M}} is a power of 2 and we can encode all values of {{mvar|r}} with {{mvar|b}} bits.
 
The integer {{mvar|x}} treated by Golomb was the run length of a [[Bernoulli process]], which has a [[geometric distribution]] starting at 0. The best choice of parameter {{mvar|M}} is a function of the corresponding Bernoulli process, which is parameterized by <math>p = P(X=0)</math> the probability of success in a given [[Bernoulli trial]]. {{mvar|M}} is either the median of the distribution or the median +/− 1. It can be determined by these inequalities:
 
: <math>(1-p)^M + (1-p)^{M+1} \leq 1 < (1-p)^{M-1} + (1-p)^M,</math>
 
which are solved by
 
: <math>M = \left\lceil -\frac{\log(2 -p)}{\log(1-p)}\right\rceil</math>.
 
For the example with {{math|''p''(0) {{=}} 0.2}}:
The Golomb code for this distribution is equivalent to the [[Huffman code]] for the same probabilities, if it were possible to compute the Huffman code.
: <math>M = \left\lceil -\frac{\log(1.8)}{\log(0.8)}\right\rceil = \left\lceil 2.634 \right\rceil = 3</math>.
 
The Golomb code for this distribution is equivalent to the [[Huffman code]] for the same probabilities, if it were possible to compute the Huffman code for the infinite set of source values.
 
===Use with signed integers===