Unary coding: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 01:42, 26 January 2024 edit Beland (talk \| contribs) Autopatrolled, Administrators 259,085 edits m MOS:ELLIPSES (via WP:JWB) ← Previous edit		Latest revision as of 18:11, 31 May 2025 edit undo Em3rgent0rdr (talk \| contribs) Extended confirmed users 7,995 edits shorten wording on thermometer code. Tag: Visual edit
(10 intermediate revisions by 3 users not shown)
Line 1: {{Short description\|Entropy encoding}} '''Unary coding''',<ref group="nb" name="NB1"/> or the [[unary numeral system]] ~~and also sometimes called '''thermometer code'''~~, is an [[entropy encoding]] that represents a [[natural number]], ''n'', with ~~a code of length ''n'' + 1 ( or ''n'' ), usually~~ ''n'' ones followed by a zero (if the term ''natural number'' is understood as ''non-negative integer'') or with ''n'' − 1 ones followed by a zero (if the term ''natural number'' is understood as ''strictly positive integer''). A ~~For~~unary ~~example~~number's 5code islength ~~represented~~would thus be ''n'' + 1 with that first definition, or ''n'' with that second definition. Unary code when vertical behaves like mercury in a [[thermometer]] that gets taller or shorter as ~~111110~~''n'' gets bigger or ~~11110~~smaller, and so is sometimes called '''thermometer code'''.<ref>{{Cite web \|title=University of Alberta Dictionary of Cognitive Science: Thermometer Code \|url=http://www.bcp.psych.ualberta.ca/~mike/Pearl_Street/Dictionary/contents/T/thermcode.html \|access-date=2025-05-31 \|website=www.bcp.psych.ualberta.ca}}</ref> An ~~Some~~alternative ~~representations~~representation ~~use~~uses ''n'' or ''n'' − 1 zeros followed by a one., ~~The~~effectively swapping the ones and zeros ~~are interchangeable~~, [[without loss of generality]]. ~~Unary~~For ~~coding~~example, isthe ~~both~~first aten ~~[[prefix-free~~unary ~~code]]~~codes ~~and a [[self-synchronizing code]].~~are: {\| class="wikitable" ~~! n (non-negative) !! n (strictly positive) !~~! Unary code !! Alternative !n (non-negative) !n (strictly positive) \|- \| 0 \|\| 1 \|\| 0 \| \| 1 \|- ~~\| 1 \|\| 2 \|~~\| 10 \|\| 01 \|1 \|2 \|- ~~\| 2 \|\| 3 \|~~\| 110 \|\| 001 \|2 \|3 \|- ~~\| 3 \|\| 4 \|~~\| 1110 \|\| 0001 \|3 \|4 \|- ~~\| 4 \|\| 5 \|~~\| 11110 \|\| 00001 \|4 \|5 \|- ~~\| 5 \|\| 6 \|~~\| 111110 \|\| 000001 \|5 \|6 \|- ~~\| 6 \|\| 7 \|~~\| 1111110 \|\| 0000001 \|6 \|7 \|- ~~\| 7 \|\| 8 \|~~\| 11111110 \|\| 00000001 \|7 \|8 \|- ~~\| 8 \|\| 9 \|~~\| 111111110 \|\| 000000001 \|8 \|9 \|- ~~\| 9 \|\| 10 \|~~\| 1111111110 \|\| 0000000001 \|9 \|10 \|} Unary coding is an ''optimally efficient''{{Clarify\|reason=This term needs some introduction.\|date=May 2025}} encoding for the following discrete [[probability distribution]]{{Citation needed\|date=May 2025\|reason=Citation needed for this strong claim and to explain this term "optimally efficient".}} :<math>\operatorname{P}(n) = 2^{-n}\,</math> Line 39 ⟶ 61: :<math>\operatorname{P}(n) \ge \operatorname{P}(n+1) + \operatorname{P}(n+2)\, </math> for <math>n=1,2,3,...</math>. Although it is the optimal symbol-by-symbol coding for such probability distributions, [[Golomb coding]] achieves better compression capability for the geometric distribution because it does not consider input symbols independently, but rather implicitly groups the inputs. For the same reason, [[arithmetic encoding]] performs better for general probability distributions, as in the last case above. Unary coding is both a [[prefix-free code]] and a [[self-synchronizing code]]. ==Unary code in use today== Line 77 ⟶ 101: \| colspan="3" \| ... \|} These codes are guaranteed to end validly on any length of data ( when reading arbitrary data ) and in the ( separate ) write cycle allow for the use and transmission of an extra bit ( the one used for the first bit ) while maintaining overall and per-integer unary code lengths of exactly N. ==Uniquely decodable non-prefix unary codes== Line 117 ⟶ 141: \| colspan="3" \| ... \|} These codes also ( when writing unsigned integers ) allow for the use and transmission of an extra bit ( the one used for the first bit ). Thus they are able to transmit 'm' integers * N unary bits and 1 additional bit of information within m*N bits of data. ==Symmetric unary codes== ~~Following~~The ~~set~~following ofsymmetric unary codes ~~are symmetric and~~ can be read ~~in any direction. It is also~~and instantaneously ~~decodable~~decoded in either direction.: {\| class="wikitable" style="text-align: center;" ~~! n (strictly positive) !~~! Unary code !Alternative !n (non-negative) !n (strictly positive) \|- ~~\| 1 \|~~\| 1 \|0 \|0 \|1 \|- ~~\| 2 \|~~\| 00 \|11 \|1 \|2 \|- ~~\| 3 \|~~\| 010 \|101 \|2 \|3 \|- ~~\| 4 \|~~\| 0110 \|1001 \|3 \|4 \|- ~~\| 5 \|~~\| 01110 \|10001 \|4 \|5 \|- ~~\| 6 \|~~\| 011110 \|100001 \|5 \|6 \|- ~~\| 7 \|~~\| 0111110 \|1000001 \|6 \|7 \|- ~~\| 8 \|~~\| 01111110 \|10000001 \|7 \|8 \|- ~~\| 9 \|~~\| 011111110 \|100000001 \|8 \|9 \|- ~~\| 10 \|~~\| 0111111110 \|1000000001 \|9 \|10 \|- \| colspan="4" \| ... \|} == Canonical unary codes == {{See also\|Canonical Huffman code}} For unary values where the maximum is known, one can use canonical unary codes that are of a somewhat numerical nature and different from character based codes. It involves starting with numerical '0' or '-1' ( <math>\operatorname2^{n} - 1\,</math>) and the maximum number of digits then for each step reducing the number of digits by one and increasing/decreasing the result by numerical '1'. For unary values where the maximum is known, one can use canonical unary codes that are of a somewhat numerical nature and different from character based codes. The largest ''n'' numerical '0' or '-1' ( <math>\operatorname2^{n} - 1\,</math>) and the maximum number of digits then for each step reducing the number of digits by one and increasing/decreasing the result by numerical '1'.{{Clarify\|date=May 2025\|reason=(1) Poor grammar. (2) Are canonical codes only for the positive natural number convention? (3) Shouldn't this say that we are starting with the largest n for this 0 or 2^(n-1) code? (4) Why use red to write the final code (in this example corresponding to n=10)? Does it belong or not belong to the canonical format? (6) Why is there an extra row after the n=10 row? (Shouldn't we delete that empty row to indicate that there are no more codes after the maximum?) (7) Is the maximum code corresponding to n=10 or n=9, which this table both uses 9 digits to express. (We should delete this n=10 row if is beyond the maximum...the algorithm part about "reducing the number of digits by one" would only make sense if 9 is the maximum.)}} {\| class="wikitable" ! n !! Unary code Line 202 ⟶ 238: \|111111110 \|- style="color: red;" \| 10 \|\| ~~0000000000~~000000000 \|111111111 ~~\|1111111111~~ \|- \| colspan="3" \| \|} Canonical codes can [http://www.cs.ucf.edu/courses/cap5015/Huff.pdf require less processing time to decode]{{Clarification\|reason=Citation deals with [[Canonical Huffman code]]. Is the statement relevant only for when dealing with [[Huffman encoding]], or is this a general statement about Canonical unary code?\|date=May 2025}} when they are processed as numbers not a string. If the number of codes required per symbol length is different to 1, i.e. there are more non-unary codes of some length required, those would be achieved by increasing/decreasing the values numerically without reducing the length in that case. ==Generalized unary coding== Line 267 ⟶ 303: [[Category:Coding theory]] [[Category:Entropy coding]] [[Category:Data compression]]