Inverse transform sampling: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 19:53, 21 June 2025 edit TheDocumentHacker (talk \| contribs) 128 edits Link suggestions feature: 2 links added. Tags: Visual edit Newcomer task Suggested: add links ← Previous edit		Latest revision as of 15:43, 26 August 2025 edit undo Vonkje (talk \| contribs) Extended confirmed users 841 edits m link to inverse function
(3 intermediate revisions by 3 users not shown)
Line 24: We are randomly choosing a proportion of the area under the curve and returning the number in the ___domain such that exactly this proportion of the area occurs to the left of that number. Intuitively, we are unlikely to choose a number in the far end of tails because there is very little area in them which would require choosing a number very close to zero or one. Computationally, this method involves computing the [[quantile function]] of the distribution — in other words, computing the [[cumulative distribution function]] (CDF) of the distribution (which maps a number in the ___domain to a probability between 0 and 1) and then [[inverse function\|inverting]] that function. This is the source of the term "inverse" or "inversion" in most of the names for this method. Note that for a [[discrete distribution]], computing the CDF is not in general too difficult: we simply add up the individual probabilities for the various points of the distribution. For a [[continuous distribution]], however, we need to integrate the [[probability density function]] (PDF) of the distribution, which is impossible to do analytically for most distributions (including the [[normal distribution]]). As a result, this method may be computationally inefficient for many distributions and other methods are preferred; however, it is a useful method for building more generally applicable samplers such as those based on [[rejection sampling]]. For the [[normal distribution]], the lack of an [[Closed-form expression\|analytical expression]] for the corresponding quantile function means that other methods (e.g. the [[Box–Muller transform]]) may be preferred computationally. It is often the case that, even for simple distributions, the inverse transform sampling method can be improved on:<ref>{{cite book \|author=Luc Devroye \|url=http://www.eirene.de/Devroye.pdf \|title=Non-Uniform Random Variate Generation \|publisher=Springer-Verlag \|place=New York \|year=1986 \|access-date=2012-04-12 \|archive-date=2014-08-18 \|archive-url=https://web.archive.org/web/20140818200854/http://www.eirene.de/Devroye.pdf \|url-status=dead }}</ref> see, for example, the [[ziggurat algorithm]] and [[rejection sampling]]. On the other hand, it is possible to approximate the quantile function of the normal distribution extremely accurately using moderate-degree polynomials, and in fact the method of doing this is fast enough that inversion sampling is now the default method for sampling from a normal distribution in the statistical package [[R (programming language)\|R]].<ref>{{Cite web\|url=https://stat.ethz.ch/R-manual/R-devel/library/base/html/Random.html\|title = R: Random Number Generation}}</ref> ==Formal statement== Line 37: == Intuition == From <math>U \sim \mathrm{Unif}[0,1]</math>, we want to generate <math>X</math> with [[Cumulative distribution function\|CDF]] <math>F_X(x).</math> We assume <math>F_X(x)</math> to be a continuous, strictly [[Monotonic function\|increasing function]], which provides good intuition. We want to see if we can find some strictly monotone transformation <math>T:[0,1]\mapsto \mathbb{R}</math>, such that <math>T(U)\overset{d}{=}X</math>. We will have Line 64: Expressed differently, given a cumulative distribution function <math>F_X</math> and a uniform variable <math>U\in[0,1]</math>, the random variable <math>X = F_X^{-1}(U)</math> has the distribution <math>F_X</math>.<ref name="mcneil2005" /> In the continuous case, a treatment of such inverse functions as objects satisfying differential equations can be given.<ref>{{cite journal \| last1 = Steinbrecher \| first1 = György \| last2 = Shaw \| first2 = William T. \| title = Quantile mechanics \| journal = European Journal of Applied Mathematics \| date = 19 March 2008 \| volume = 19 \| issue = 2 \| pages = 87–112 \| doi = 10.1017/S0956792508007341\| s2cid = 6899308 }}</ref> Some such differential equations admit explicit [[power series]] solutions, despite their non-linearity.<ref>{{Cite journal \|last1=Arridge \|first1=Simon \|last2=Maass \|first2=Peter \|last3=Öktem \|first3=Ozan \|last4=Schönlieb \|first4=Carola-Bibiane \|title=Solving inverse problems using data-driven models \|journal=Acta Numerica \|year=2019 \|language=en \|volume=28 \|pages=1–174 \|doi=10.1017/S0962492919000059 \|s2cid=197480023 \|issn=0962-4929\|doi-access=free }}</ref> == Examples == Line 94: ==Proof of correctness== Let <math>F</math> be a [[cumulative distribution function]], and let <math>F^{-1}</math> be its [[cumulative distribution function#Inverse_distribution_function_(quantile_function)\|generalized inverse function]] (using the [[infimum]] because CDFs are weakly monotonic and [[Càdlàg\|right-continuous]]):<ref>{{cite book \|author=Luc Devroye \|title=Non-Uniform Random Variate Generation \|publisher=Springer-Verlag \|place=New York \|year=1986 \|chapter=Section 2.2. Inversion by numerical solution of ''F''(''X'') = ''U'' \|chapter-url=~~http~~https://luc.devroye.org/chapter_two.pdf}}</ref> :<math>F^{-1}(u) = \inf\;\{x \mid F(x)\geq u\} \qquad (0<u<1).</math>