Unary coding: Difference between revisions

Content deleted Content added
shorten wording on thermometer code.
 
(10 intermediate revisions by 3 users not shown)
Line 1:
{{Short description|Entropy encoding}}
'''Unary coding''',<ref group="nb" name="NB1"/> or the [[unary numeral system]] and also sometimes called '''thermometer code''', is an [[entropy encoding]] that represents a [[natural number]], ''n'', with a code of length ''n''&nbsp;+&nbsp;1 ( or ''n'' ), usually ''n'' ones followed by a zero (if the term ''natural number'' is understood as ''non-negative integer'') or with ''n''&nbsp;&minus;&nbsp;1 ones followed by a zero (if the term ''natural number'' is understood as ''strictly positive integer''). A Forunary examplenumber's 5code islength representedwould thus be ''n''&nbsp;+&nbsp;1 with that first definition, or ''n'' with that second definition. Unary code when vertical behaves like mercury in a [[thermometer]] that gets taller or shorter as 111110''n'' gets bigger or 11110smaller, and so is sometimes called '''thermometer code'''.<ref>{{Cite web |title=University of Alberta Dictionary of Cognitive Science: Thermometer Code |url=http://www.bcp.psych.ualberta.ca/~mike/Pearl_Street/Dictionary/contents/T/thermcode.html |access-date=2025-05-31 |website=www.bcp.psych.ualberta.ca}}</ref> An Somealternative representationsrepresentation useuses ''n'' or ''n''&nbsp;&minus;&nbsp;1 zeros followed by a one., Theeffectively swapping the ones and zeros are interchangeable, [[without loss of generality]]. UnaryFor codingexample, isthe bothfirst aten [[prefix-freeunary code]]codes and a [[self-synchronizing code]].are:
{| class="wikitable"
! n (non-negative) !! n (strictly positive) !! Unary code !! Alternative
!n (non-negative)
!n (strictly positive)
|-
| 0 || 1
|| 0 |
| 1
|-
| 1 || 2 || 10 || 01
|1
|2
|-
| 2 || 3 || 110 || 001
|2
|3
|-
| 3 || 4 || 1110 || 0001
|3
|4
|-
| 4 || 5 || 11110 || 00001
|4
|5
|-
| 5 || 6 || 111110 || 000001
|5
|6
|-
| 6 || 7 || 1111110 || 0000001
|6
|7
|-
| 7 || 8 || 11111110 || 00000001
|7
|8
|-
| 8 || 9 || 111111110 || 000000001
|8
|9
|-
| 9 || 10 || 1111111110 || 0000000001
|9
|10
|}
 
Unary coding is an ''optimally efficient''{{Clarify|reason=This term needs some introduction.|date=May 2025}} encoding for the following discrete [[probability distribution]]{{Citation needed|date=May 2025|reason=Citation needed for this strong claim and to explain this term "optimally efficient".}}
 
:<math>\operatorname{P}(n) = 2^{-n}\,</math>
Line 39 ⟶ 61:
:<math>\operatorname{P}(n) \ge \operatorname{P}(n+1) + \operatorname{P}(n+2)\, </math>
 
for <math>n=1,2,3,...</math>. Although it is the optimal symbol-by-symbol coding for such probability distributions, [[Golomb coding]] achieves better compression capability for the geometric distribution because it does not consider input symbols independently, but rather implicitly groups the inputs. For the same reason, [[arithmetic encoding]] performs better for general probability distributions, as in the last case above.
 
Unary coding is both a [[prefix-free code]] and a [[self-synchronizing code]].
 
==Unary code in use today==
Line 77 ⟶ 101:
| colspan="3" | ...
|}
These codes are guaranteed to end validly on any length of data ( when reading arbitrary data ) and in the ( separate ) write cycle allow for the use and transmission of an extra bit ( the one used for the first bit ) while maintaining overall and per-integer unary code lengths of exactly N.
 
==Uniquely decodable non-prefix unary codes==
Line 117 ⟶ 141:
| colspan="3" | ...
|}
These codes also ( when writing unsigned integers ) allow for the use and transmission of an extra bit ( the one used for the first bit ). Thus they are able to transmit 'm' integers * N unary bits and 1 additional bit of information within m*N bits of data.
 
==Symmetric unary codes==
FollowingThe setfollowing ofsymmetric unary codes are symmetric and can be read in any direction. It is alsoand instantaneously decodabledecoded in either direction.:
{| class="wikitable" style="text-align: center;"
! n (strictly positive) !! Unary code
!Alternative
!n (non-negative)
!n (strictly positive)
|-
| 1 || 1
|0
|0
|1
|-
| 2 || 00
|11
|1
|2
|-
| 3 || 010
|101
|2
|3
|-
| 4 || 0110
|1001
|3
|4
|-
| 5 || 01110
|10001
|4
|5
|-
| 6 || 011110
|100001
|5
|6
|-
| 7 || 0111110
|1000001
|6
|7
|-
| 8 || 01111110
|10000001
|7
|8
|-
| 9 || 011111110
|100000001
|8
|9
|-
| 10 || 0111111110
|1000000001
|9
|10
|-
| colspan="4" | ...
|}
 
== Canonical unary codes ==
{{See also|Canonical Huffman code}}
For unary values where the maximum is known, one can use canonical unary codes that are of a somewhat numerical nature and different from character based codes. It involves starting with numerical '0' or '-1' ( <math>\operatorname2^{n} - 1\,</math>) and the maximum number of digits then for each step reducing the number of digits by one and increasing/decreasing the result by numerical '1'.
For unary values where the maximum is known, one can use canonical unary codes that are of a somewhat numerical nature and different from character based codes. The largest ''n'' numerical '0' or '-1' ( <math>\operatorname2^{n} - 1\,</math>) and the maximum number of digits then for each step reducing the number of digits by one and increasing/decreasing the result by numerical '1'.{{Clarify|date=May 2025|reason=(1) Poor grammar. (2) Are canonical codes only for the positive natural number convention? (3) Shouldn't this say that we are starting with the largest n for this 0 or 2^(n-1) code? (4) Why use red to write the final code (in this example corresponding to n=10)? Does it belong or not belong to the canonical format? (6) Why is there an extra row after the n=10 row? (Shouldn't we delete that empty row to indicate that there are no more codes after the maximum?) (7) Is the maximum code corresponding to n=10 or n=9, which this table both uses 9 digits to express. (We should delete this n=10 row if is beyond the maximum...the algorithm part about "reducing the number of digits by one" would only make sense if 9 is the maximum.)}}
{| class="wikitable"
! n !! Unary code
Line 202 ⟶ 238:
|111111110
|- style="color: red;"
| 10 || 0000000000000000000
|111111111
|1111111111
|-
| colspan="3" |
|}
Canonical codes can [http://www.cs.ucf.edu/courses/cap5015/Huff.pdf require less processing time to decode]{{Clarification|reason=Citation deals with [[Canonical Huffman code]]. Is the statement relevant only for when dealing with [[Huffman encoding]], or is this a general statement about Canonical unary code?|date=May 2025}} when they are processed as numbers not a string. If the number of codes required per symbol length is different to 1, i.e. there are more non-unary codes of some length required, those would be achieved by increasing/decreasing the values numerically without reducing the length in that case.
 
==Generalized unary coding==
Line 267 ⟶ 303:
[[Category:Coding theory]]
[[Category:Entropy coding]]
[[Category:Data compression]]