Unary coding: Difference between revisions

Content deleted Content added
try again to introduce the "Alternative".
shorten wording on thermometer code.
 
(4 intermediate revisions by the same user not shown)
Line 1:
{{Short description|Entropy encoding}}
'''Unary coding''',<ref group="nb" name="NB1"/> or the [[unary numeral system]], is an [[entropy encoding]] that represents a [[natural number]], ''n'', with ''n'' ones followed by a zero (if the term ''natural number'' is understood as ''non-negative integer'') or with ''n''&nbsp;&minus;&nbsp;1 ones followed by a zero (if the term ''natural number'' is understood as ''strictly positive integer''). A unary number's code length would thus be ''n''&nbsp;+&nbsp;1 with that first definition, or ''n'' with that second definition. A unary code is sometimes called a '''thermometer code''', because a unaryUnary code writtenwhen verticallyvertical behaves like mercury in a [[thermometer]] that gets taller or shorter as ''n'' gets bigger or smaller, and so is sometimes called '''thermometer code'''.<ref>{{Cite web |title=University of Alberta Dictionary of Cognitive Science: Thermometer Code |url=http://www.bcp.psych.ualberta.ca/~mike/Pearl_Street/Dictionary/contents/T/thermcode.html |access-date=2025-05-31 |website=www.bcp.psych.ualberta.ca}}</ref> An alternative representation uses ''n'' or ''n''&nbsp;&minus;&nbsp;1 zeros followed by a one, effectively swapping the ones and zeros, [[without loss of generality]]. For example, the first ten unary codes are:
{| class="wikitable"
! Unary code !! Alternative
Line 47:
|}
 
Unary coding is an ''optimally efficient''{{Clarify|reason=This term needs some introduction.|date=May 2025}} encoding for the following discrete [[probability distribution]]{{Citation needed|date=May 2025|reason=Citation needed for this strong claim and to explain this term "optimally efficient".}}
 
:<math>\operatorname{P}(n) = 2^{-n}\,</math>
Line 144:
 
==Symmetric unary codes==
FollowingThe setfollowing ofsymmetric unary codes are symmetric and can be read in any direction. It is alsoand instantaneously decodabledecoded in either direction.:
{| class="wikitable" style="text-align: center;"
! n (strictly positive) !! Unary code
!Alternative
!n (non-negative)
!n (strictly positive)
|-
| 1 || 1
|0
|0
|1
|-
| 2 || 00
|11
|1
|2
|-
| 3 || 010
|101
|2
|3
|-
| 4 || 0110
|1001
|3
|4
|-
| 5 || 01110
|10001
|4
|5
|-
| 6 || 011110
|100001
|5
|6
|-
| 7 || 0111110
|1000001
|6
|7
|-
| 8 || 01111110
|10000001
|7
|8
|-
| 9 || 011111110
|100000001
|8
|9
|-
| 10 || 0111111110
|1000000001
|9
|10
|-
| colspan="4" | ...
|}
 
== Canonical unary codes ==
{{See also|Canonical Huffman code}}
For unary values where the maximum is known, one can use canonical unary codes that are of a somewhat numerical nature and different from character based codes. It involves starting with numerical '0' or '-1' ( <math>\operatorname2^{n} - 1\,</math>) and the maximum number of digits then for each step reducing the number of digits by one and increasing/decreasing the result by numerical '1'.
For unary values where the maximum is known, one can use canonical unary codes that are of a somewhat numerical nature and different from character based codes. The largest ''n'' numerical '0' or '-1' ( <math>\operatorname2^{n} - 1\,</math>) and the maximum number of digits then for each step reducing the number of digits by one and increasing/decreasing the result by numerical '1'.{{Clarify|date=May 2025|reason=(1) Poor grammar. (2) Are canonical codes only for the positive natural number convention? (3) Shouldn't this say that we are starting with the largest n for this 0 or 2^(n-1) code? (4) Why use red to write the final code (in this example corresponding to n=10)? Does it belong or not belong to the canonical format? (6) Why is there an extra row after the n=10 row? (Shouldn't we delete that empty row to indicate that there are no more codes after the maximum?) (7) Is the maximum code corresponding to n=10 or n=9, which this table both uses 9 digits to express. (We should delete this n=10 row if is beyond the maximum...the algorithm part about "reducing the number of digits by one" would only make sense if 9 is the maximum.)}}
{| class="wikitable"
! n !! Unary code
Line 231 ⟶ 243:
| colspan="3" |
|}
Canonical codes can [http://www.cs.ucf.edu/courses/cap5015/Huff.pdf require less processing time to decode]{{Clarification|reason=Citation deals with [[Canonical Huffman code]]. Is the statement relevant only for when dealing with [[Huffman encoding]], or is this a general statement about Canonical unary code?|date=May 2025}} when they are processed as numbers not a string. If the number of codes required per symbol length is different to 1, i.e. there are more non-unary codes of some length required, those would be achieved by increasing/decreasing the values numerically without reducing the length in that case.
 
==Generalized unary coding==