{{Short description|Family of algorithms for sampling from discrete probability distributions}}
[[File:Alias Table.png|alt=A circle on the left has 5 lines to 5 boxes in a column labeled "Acceptance". The first and second box are solid and each have the number 1 in them. The third box is half full and has the number 0.5 in it. The fourth box is solid with a 1 and the fifth box is three quarters full with a 0.75. Each box has an arrow from the filled region to its index, i.e., the first box points to a 1, the second box to a 2, etc. There is a second column of five boxes labeled "Alias", each corresponding to one of the first boxes. Three are empty, but the third has a 2 in it and the fifth has a 1 in it. There is an arrow from the empty part of the third box in the first column to the third box in the second column and similarly for the fifth boxes.|thumb|A diagram of an alias table that represents the probability distribution〈0.25, 0.3, 0.1, 0.2, 0.15〉]]
In [[computing]], the '''alias method''' is a family of efficient [[algorithm]]s for [[
==Operation==
Internally, the algorithm consults two tables, a ''[[probability]] [[Table (information)|table]]'' {{mvar|U<sub>i</sub>}} and an ''alias table'' {{mvar|K<sub>i</sub>}} (for {{math|1 ≤ ''i'' ≤ ''n''}}). To generate a random outcome, a fair [[dice|die]] is rolled to determine an index {{mvar|i}} into the two tables.
More concretely, the algorithm operates as follows:
# Otherwise, return {{mvar|K<sub>i</sub>}}.
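The lookup can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes the common formulation in which a single uniform variate {{mvar|x}} in [0, 1) supplies both the index {{mvar|i}} and the acceptance value {{mvar|y}}, and it uses 0-based indices, so the alias entries from the figure are shifted down by one. The name <code>alias_sample</code> and the <code>rng</code> parameter are illustrative choices.

```python
import random

def alias_sample(U, K, rng=random.random):
    """Draw one outcome (0-based) from probability table U and alias table K."""
    n = len(U)
    scaled = n * rng()            # uniform on [0, n)
    i = min(int(scaled), n - 1)   # table index; guard against rounding up to n
    y = scaled - i                # fractional part, uniform on [0, 1)
    return i if y < U[i] else K[i]

# Tables for the figure's distribution <0.25, 0.3, 0.1, 0.2, 0.15>, 0-based.
# Entries with U[i] == 1 never consult K[i], so K[i] = i is used as filler.
U = [1.0, 1.0, 0.5, 1.0, 0.75]
K = [0, 1, 1, 3, 0]

counts = [0] * len(U)
for _ in range(100_000):
    counts[alias_sample(U, K)] += 1
print([c / 100_000 for c in counts])  # empirical frequencies
```

Passing a different <code>rng</code> callable makes the lookup deterministic, which is convenient for testing.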
An alternative formulation of the probability table, proposed by Marsaglia et al.<ref name=marsaglia>{{Citation |first1=George |last1=Marsaglia |author-link1=George Marsaglia |first2=Wai Wan |last2=Tsang |first3=Jingbo |last3=Wang |title=Fast Generation of Discrete Random Variables |journal=Journal of Statistical Software |date=2004-07-12 |volume=11 |issue=3 |pages=1–11 |doi=10.18637/jss.v011.i03 |doi-access=free |url=https://www.researchgate.net/publication/5142858}}</ref> as the
==Table generation==
The distribution may be padded with additional probabilities {{math|1=''p<sub>i</sub>'' = 0}} to increase {{mvar|n}} to a convenient value, such as a [[power of two]].
To generate the
* The "overfull" group, where {{math|''U<sub>i</sub>'' > 1}},
* The "underfull" group, where {{math|''U<sub>i</sub>'' < 1}} and {{mvar|K<sub>i</sub>}} has not been initialized, and
* The "exactly full" group, where {{math|1=''U<sub>i</sub>'' = 1}} or {{mvar|K<sub>i</sub>}} ''has'' been initialized.
If {{math|1=''U<sub>i</sub>'' = 1}}, the corresponding value {{mvar|K<sub>i</sub>}} will never be consulted and is unimportant, but a value of {{math|1=''K<sub>i</sub>'' = ''i''}} is sensible. This also avoids problems if the probabilities are represented as [[fixed-point number]]s which cannot represent {{math|1=''U<sub>i</sub>'' = 1}} exactly.
As long as not all table entries are exactly full, repeat the following steps:
# Arbitrarily choose an overfull entry {{math|''U<sub>i</sub>'' > 1}} and an underfull entry {{math|''U<sub>j</sub>'' < 1}}. (If one of these exists, the other must, as well.)
# Allocate the unused space in entry {{mvar|j}} to outcome {{mvar|i}}, by setting {{math|1=''K<sub>j</sub>''
# Remove the allocated space from entry {{mvar|i}} by changing {{math|1=''U<sub>i</sub>''
# Entry {{mvar|j}} is now exactly full.
# Assign entry {{mvar|i}} to the appropriate category based on the new value of {{mvar|U<sub>i</sub>}}.
Each iteration moves at least one entry to the "exactly full" category (and the last moves two), so the procedure is guaranteed to terminate after at most {{math|''n'' − 1}} iterations. Each iteration can be done in {{math|''O''(1)}} time, so the table can be set up in {{math|''O''(''n'')}} time.
Vose<ref name=Vose/>{{Rp|974}} points out that floating-point rounding errors may cause the guarantee referred to in step 1 to be violated. If one category empties before the other, the remaining entries may have {{mvar|U<sub>i</sub>}} set to 1 with negligible error. The solution accounting for floating point is sometimes called the '''Walker-Vose method''' or the '''Vose alias method'''.
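The construction, including the rounding safeguard, might be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than Vose's published code: indices are 0-based, the table is initialized with {{math|1=''U<sub>i</sub>'' = ''n'' ''p<sub>i</sub>''}} and {{math|1=''K<sub>i</sub>'' = ''i''}}, and an overfull entry {{mvar|i}} that donates to an underfull entry {{mvar|j}} is updated to {{math|''U<sub>i</sub>'' + ''U<sub>j</sub>'' − 1}}. The function names are hypothetical. The helper <code>implied_distribution</code> recomputes the distribution a table encodes, which gives a simple correctness check.

```python
from fractions import Fraction

def build_alias_table(probs):
    """Return (U, K) for a list of probabilities that sum to 1."""
    n = len(probs)
    U = [n * p for p in probs]   # assumed initialization: U_i = n * p_i
    K = list(range(n))           # K_i = i is a harmless default
    overfull = [i for i in range(n) if U[i] > 1]
    underfull = [j for j in range(n) if U[j] < 1]
    while overfull and underfull:
        i = overfull.pop()       # arbitrary overfull entry
        j = underfull.pop()      # arbitrary underfull entry
        K[j] = i                 # unused space in j goes to outcome i
        U[i] = (U[i] + U[j]) - 1 # remove the donated space from i
        if U[i] > 1:
            overfull.append(i)
        elif U[i] < 1:
            underfull.append(i)
        # entry j is now exactly full and is never revisited
    # Rounding safeguard: if one group empties first, treat leftovers as full.
    for i in overfull + underfull:
        U[i] = 1
    return U, K

def implied_distribution(U, K):
    """Probability of each outcome under tables U and K."""
    n = len(U)
    dist = [0] * n
    for i in range(n):
        dist[i] += U[i] / n
        dist[K[i]] += (1 - U[i]) / n
    return dist

probs = [Fraction(1, 4), Fraction(3, 10), Fraction(1, 10),
         Fraction(1, 5), Fraction(3, 20)]
U, K = build_alias_table(probs)
print(implied_distribution(U, K) == probs)
```

Using <code>Fraction</code> inputs keeps the arithmetic exact, so the check does not depend on rounding; with floating-point inputs the final loop absorbs any small residue.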
As the lookup procedure is slightly faster if {{math|''y'' < ''U<sub>i</sub>''}} (because {{mvar|K<sub>i</sub>}} does not need to be consulted), one goal during table generation is to maximize the sum of the {{mvar|U<sub>i</sub>}}. Doing this optimally turns out to be [[NP-hard]],<ref name=marsaglia/>{{Rp|6}} but a [[greedy algorithm]] comes reasonably close: rob from the richest and give to the poorest. That is, at each step choose the largest {{mvar|U<sub>i</sub>}} and the smallest {{mvar|U<sub>j</sub>}}. Because this requires sorting the {{mvar|U<sub>i</sub>}}, it requires {{math|''O''(''n'' log ''n'')}} time.
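One way to realize the greedy selection is with two priority queues in place of a full sort, which keeps the same {{math|''O''(''n'' log ''n'')}} bound. The Python sketch below is an assumption-laden illustration, not the procedure from the cited paper: it uses 0-based indices, initializes {{math|1=''U<sub>i</sub>'' = ''n'' ''p<sub>i</sub>''}} and {{math|1=''K<sub>i</sub>'' = ''i''}}, and the name <code>build_alias_table_greedy</code> is hypothetical.

```python
import heapq
from fractions import Fraction

def build_alias_table_greedy(probs):
    """Build (U, K), always pairing the largest overfull entry
    with the smallest underfull entry."""
    n = len(probs)
    U = [n * p for p in probs]   # assumed initialization: U_i = n * p_i
    K = list(range(n))           # default alias; harmless if never replaced
    # heapq is a min-heap, so overfull keys are negated to pop the largest.
    over = [(-u, i) for i, u in enumerate(U) if u > 1]
    under = [(u, j) for j, u in enumerate(U) if u < 1]
    heapq.heapify(over)
    heapq.heapify(under)
    while over and under:
        _, i = heapq.heappop(over)    # richest entry
        _, j = heapq.heappop(under)   # poorest entry
        K[j] = i
        U[i] = (U[i] + U[j]) - 1
        if U[i] > 1:
            heapq.heappush(over, (-U[i], i))
        elif U[i] < 1:
            heapq.heappush(under, (U[i], i))
    return U, K

probs = [Fraction(1, 4), Fraction(3, 10), Fraction(1, 10),
         Fraction(1, 5), Fraction(3, 20)]
U, K = build_alias_table_greedy(probs)
print(U, K, sum(U))
```

For the distribution in the figure, this rule gives {{math|1=''U'' = (1, 1, 0.5, 1, 0.75)}}, matching the table in the diagram once indices are shifted to start at 1. With floating-point inputs, any entries left over when one queue empties keep {{math|1=''K<sub>i</sub>'' = ''i''}}, so the lookup still returns {{mvar|i}} for them.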
Although the alias method is very efficient if generating a uniform deviate is itself fast, there are cases where it is far from optimal in terms of random bit usage. This is because it uses a full-precision random variate {{mvar|x}} each time, even when only a few random bits are needed.
One case arises when the probabilities are particularly well balanced, so many {{math|1=''U<sub>i</sub>'' = 1}}.
Another case arises when the probabilities are strongly unbalanced, so many {{math|''U<sub>i</sub>'' ≈ 0}}. For example, if {{math|1=''p''<sub>1</sub> = 0.999}} and {{math|1=''p''<sub>2</sub> = 0.001}}, then the great majority of the time, only a few random bits are required to determine that case 1 applies.
In such cases, the table method described by Marsaglia et al.
==Literature==
* http://www.keithschwarz.com/darts-dice-coins/ Keith Schwarz: Detailed explanation, numerically stable version of Vose's algorithm, and link to Java implementation
* https://jugit.fz-juelich.de/mlz/ransampl Joachim Wuttke: Implementation as a small C library.
* https://gist.github.com/0b5786e9bfc73e75eb8180b5400cd1f8 Liam Huang's Implementation in C++
* https://github.com/joseftw/jos.weightedresult/blob/develop/src/JOS.WeightedResult/AliasMethodVose.cs C# implementation of Vose's algorithm.
* https://github.com/cdanek/KaimiraWeightedList C# implementation of Vose's algorithm without floating point instability.
==References==