Golomb coding: Difference between revisions

Content deleted Content added
m Use for run-length encoding: :<math>\text, &prime; {{val}}
m Overview: fix math italics
Line 18:
Golomb–Rice codes can be thought of as codes that indicate a number by the position of the ''bin'' ({{mvar|q}}), and the ''offset'' within the bin ({{mvar|r}}). The example figure shows the position {{mvar|q}} and offset {{mvar|r}} for the encoding of integer {{mvar|x}} using Golomb–Rice parameter {{math|''M'' {{=}} 3}}, with source probabilities following a geometric distribution with {{math|''p''(0) {{=}} 0.2}}.
 
Formally, the two parts are given by the following expression, where <math>{{mvar|x</math>}} is the nonnegative integer being encoded:
 
:<math>q = \left \lfloor \frac{x}{M} \right \rfloor</math>
Line 30:
Both {{mvar|q}} and {{mvar|r}} will be encoded using variable numbers of bits: {{mvar|q}} by a unary code, and {{mvar|r}} by {{mvar|b}} bits for Rice code, or a choice between {{mvar|b}} and {{mvar|b}}+1 bits for Golomb code (i.e. {{mvar|M}} is not a power of 2), with <math>b = \lfloor\log_2(M)\rfloor</math>. If <math>r < 2^{b+1} - M</math>, then use {{mvar|b}} bits to encode {{mvar|r}}; otherwise, use {{mvar|b}}+1 bits to encode {{mvar|r}}. Clearly, <math>b=\log_2(M)</math> if {{mvar|M}} is a power of 2 and we can encode all values of {{mvar|r}} with {{mvar|b}} bits.
 
The integer {{mvar|x}} treated by Golomb was the run length of a [[Bernoulli process]], which has a [[geometric distribution]] starting at 0. The best choice of parameter {{mvar|M}} is a function of the corresponding Bernoulli process, which is parameterized by <math>p = P(x=0)</math> the probability of success in a given [[Bernoulli trial]]. {{mvar|M}} is either the median of the distribution or the median +/− &pm;1. It can be determined by these inequalities:
: <math>(1-p)^M + (1-p)^{M+1} \leq 1 < (1-p)^{M-1} + (1-p)^M,</math>
which are solved by
Line 42:
===Use with signed integers===
 
Golomb's scheme was designed to encode sequences of non-negative numbers. However it is easily extended to accept sequences containing negative numbers using an ''overlap and interleave'' scheme, in which all values are reassigned to some positive number in a unique and reversible way. The sequence begins: 0, -1, 1, -2, 2, -3, 3, -4, 4 ... The ''n''<sup>th</sup> negative value (i.e., {{tmath|-n}}) is mapped to the ''n''<sup>th</sup> odd number ({{tmath|2n-1}}), and the ''m''<sup>th</sup> positive value is mapped to the ''m''<sup>th</sup> even number ({{tmath|2m}}). This may be expressed mathematically as follows: a positive value <math>{{mvar|x</math>}} is mapped to (<math>x^\prime=2|x|=2x, x\ge0</math>), and a negative value <math>{{mvar|y</math>}} is mapped to (<math>y^\prime=2|y|-1=-2y-1, y<0</math>). Such a code may be used for simplicity, even if suboptimal. Truly optimal codes for two-sided geometric distributions include multiple variants of the Golomb code, depending on the distribution parameters, including this one.<ref>{{Cite journal | last1 = Merhav | first1 = N. | last2 = Seroussi | first2 = G. | last3 = Weinberger | first3 = M. J. | title = Coding of sources with two-sided geometric distributions and unknown parameters | journal = [[IEEE Transactions on Information Theory]]| volume = 46 | issue = 1 | pages = 229–236 | year = 2000 | doi=10.1109/18.817520}}</ref>
 
== Simple algorithm ==