Golomb coding: Difference between revisions

 
For example, with a Rice–Golomb encoding using parameter {{math|''M'' {{=}} 10}}, the decimal number 42 would first be split into {{mvar|q}} = 4 and {{mvar|r}} = 2, and would be encoded as qcode({{mvar|q}}),rcode({{mvar|r}}) = qcode(4),rcode(2) = 11110,010. The separating comma does not need to be written to the output stream: the 0 at the end of the {{mvar|q}} code is enough to mark where {{mvar|q}} ends and {{mvar|r}} begins, because both the qcode and the rcode are self-delimiting.
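The split into {{mvar|q}} and {{mvar|r}} can be illustrated with a minimal Python sketch. It assumes that qcode is unary (ones terminated by a zero) and that rcode is truncated binary; the function names <code>truncated_binary</code>, <code>golomb_encode</code> and <code>golomb_decode</code> are hypothetical and chosen only for this illustration.

```python
def truncated_binary(r: int, m: int) -> str:
    """Truncated binary code for a remainder r in [0, m). Assumed rcode."""
    if m == 1:
        return ""                     # the remainder is always 0, no bits needed
    b = (m - 1).bit_length()          # ceil(log2(m)) for m >= 2
    cutoff = (1 << b) - m             # this many remainders get b-1 bits
    if r < cutoff:
        return format(r, f"0{b - 1}b")
    return format(r + cutoff, f"0{b}b")


def golomb_encode(n: int, m: int) -> str:
    """Encode a non-negative integer n as a bit string with parameter m."""
    q, r = divmod(n, m)
    return "1" * q + "0" + truncated_binary(r, m)   # unary q, then remainder


def golomb_decode(bits: str, m: int) -> tuple[int, int]:
    """Decode one value; return (value, number of bits consumed)."""
    q = bits.index("0")               # the first 0 ends the unary part
    pos = q + 1
    if m == 1:
        return q, pos
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    r = int(bits[pos:pos + b - 1] or "0", 2)
    if r < cutoff:                    # short (b-1 bit) remainder code
        pos += b - 1
    else:                             # long (b bit) remainder code
        r = int(bits[pos:pos + b], 2) - cutoff
        pos += b
    return q * m + r, pos


print(golomb_encode(42, 10))          # 11110010, i.e. 11110 followed by 010
```

The decoder needs no separator: it stops reading the quotient at the first zero, and the remainder length follows from {{mvar|M}} alone.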
 
 
 
== Use for run-length encoding ==
:''Note that {{mvar|p}} and {{math|1 – p}} are reversed in this section compared to the use in earlier sections.''
 
Given an alphabet of two symbols, or a set of two events, ''P'' and ''Q'', with probabilities ''p'' and ({{math|1&nbsp;&minus;&nbsp;''p''}}) respectively, where {{math|''p''&nbsp;&ge;&nbsp;1/2}}, Golomb coding can be used to encode runs of zero or more ''P''&prime;s separated by single ''Q''&prime;s. In this application, the best setting of the parameter ''M'' is the nearest integer to <math>-\frac{1}{\log_{2}p}</math>. When ''p'' = 1/2, ''M'' = 1, and the Golomb code corresponds to unary ({{math|''n''&nbsp;&ge;&nbsp;0}} ''P''&prime;s followed by a ''Q'' is encoded as ''n'' ones followed by a zero). If a simpler code is desired, one can assign Golomb–Rice parameter {{mvar|b}} (i.e., Golomb parameter <math>M=2^b</math>) to the integer nearest to <math>- \log_2(-\log_2 p)</math>; although not always the best parameter, it is usually the best Rice parameter and its compression performance is quite close to the optimal Golomb code. (Rice himself proposed using various codes for the same data to figure out which was best. A later [[Jet Propulsion Laboratory|JPL]] researcher proposed various methods of optimizing or estimating the code parameter.<ref>{{Cite techreport | last1 = Kiely | first1 = A. | title = Selecting the Golomb Parameter in Rice Coding | number = 42-159 | institution = [[Jet Propulsion Laboratory]] | year = 2004}}</ref>)
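The two parameter rules can be evaluated directly. The following Python sketch is illustrative only: the function names are hypothetical, {{mvar|p}} is assumed to lie in the range 1/2 ≤ ''p'' < 1, and Python's built-in <code>round</code> resolves exact ties to the even integer, which is one possible reading of "nearest integer".

```python
import math


def golomb_parameter(p: float) -> int:
    """Nearest integer to -1 / log2(p); assumes 0.5 <= p < 1."""
    return max(1, round(-1.0 / math.log2(p)))


def rice_parameter(p: float) -> int:
    """Nearest integer to -log2(-log2(p)); assumes 0.5 <= p < 1."""
    return max(0, round(-math.log2(-math.log2(p))))


for p in (0.5, 0.9, 0.99):
    b = rice_parameter(p)
    print(f"p={p}: M={golomb_parameter(p)}, b={b}, Rice M=2^b={2 ** b}")
```

For ''p'' = 1/2 both rules give the unary code ({{math|''M'' {{=}} 1}}, {{math|''b'' {{=}} 0}}); for ''p'' = 0.99 the Golomb rule gives {{math|''M'' {{=}} 69}}, while the Rice rule gives {{math|''b'' {{=}} 6}}, that is {{math|''M'' {{=}} 64}}.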
 
Consider using a Rice code with a binary portion having {{mvar|b}} bits to run-length encode sequences where ''P'' has a probability {{mvar|p}}. If <math>\mathbb{P}[\text{bit is part of }k\text{-run}]</math> is the probability that a bit will be part of a {{mvar|k}}-bit run (<math>k-1</math> ''P''s and one ''Q'') and <math>(\text{compression ratio of }k\text{-run})</math> is the compression ratio of that run, then the expected compression ratio is
<!-- below mostly comes from above reference (Kiely), but not exactly, so leave uncited for now -->
:<math>\begin{align}
\mathbb{E}[\text{compression ratio}]
&= \sum_{k=1}^\infty (\text{compression ratio of }k\text{-run}) \cdot \mathbb{P}[\text{bit is part of }k\text{-run}] \\
&= \sum_{k=1}^\infty \frac{b+1+\lfloor 2^{-b}(k-1) \rfloor}{k} \cdot kp^{k-1} (1-p)^2 \\
&= (1-p)^2 \sum_{j=0}^\infty (b+1+j) \cdot \sum_{i=j2^b+1}^{(j+1)2^b} p^{i-1}
\end{align}</math>
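
As a sketch of how the double sum can be carried through (a reconstruction from the expressions above, not quoted from the reference), the inner sum is a finite geometric series and the outer sum then telescopes, which suggests the closed form

:<math>\begin{align}
\sum_{i=j2^b+1}^{(j+1)2^b} p^{i-1} &= \frac{p^{2^b j} - p^{2^b (j+1)}}{1-p} \\
\mathbb{E}[\text{compression ratio}]
&= (1-p) \sum_{j=0}^\infty (b+1+j) \left(p^{2^b j} - p^{2^b (j+1)}\right) \\
&= (1-p) \left(b + \sum_{m=0}^\infty p^{2^b m}\right) \\
&= (1-p) \left(b + \frac{1}{1-p^{2^b}}\right)
\end{align}</math>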
 
Compression is often expressed in terms of <math>1-\mathbb{E}[\text{compression ratio}]</math>, the proportion compressed. For <math>p \approx 1</math>, the run-length coding approach results in compression ratios close to [[Entropy (information theory)|entropy]]. For example, using Rice code <math>b=6</math> for <math>p=0.99</math> yields {{val|91.89|u=%}} compression, while the entropy limit is {{val|91.92|u=%}}.
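
The quoted figures can be checked numerically. The Python sketch below sums a truncated version of the series over run lengths {{mvar|k}} and compares it with the expression <math>(1-p)\left(b + 1/(1-p^{2^b})\right)</math>, which is what that series sums to once the geometric terms are collected. The function names and the truncation at 20,000 terms are assumptions made for this illustration.

```python
import math


def expected_ratio_series(p: float, b: int, terms: int = 20_000) -> float:
    """Truncated sum over run lengths k of (bits per run / k) * P[bit in k-run]."""
    total = 0.0
    for k in range(1, terms + 1):
        bits_out = b + 1 + (k - 1) // 2 ** b      # b + 1 + floor(2^-b (k - 1))
        # (bits_out / k) * k * p^(k-1) * (1-p)^2, with the factor k cancelled
        total += bits_out * p ** (k - 1) * (1 - p) ** 2
    return total


def expected_ratio_closed(p: float, b: int) -> float:
    """Closed form obtained by summing the geometric series."""
    return (1 - p) * (b + 1 / (1 - p ** (2 ** b)))


def binary_entropy(p: float) -> float:
    """Entropy in bits per symbol of a source with probabilities p and 1 - p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


p, b = 0.99, 6
print(f"series:  {100 * (1 - expected_ratio_series(p, b)):.2f}% compressed")
print(f"closed:  {100 * (1 - expected_ratio_closed(p, b)):.2f}% compressed")
print(f"entropy: {100 * (1 - binary_entropy(p)):.2f}% limit")
```

For <math>p=0.99</math> and <math>b=6</math> both computations round to 91.89%, and the entropy limit rounds to 91.92%.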
 
== Adaptive run-length Golomb–Rice encoding ==