Normalizing constant

{{Use American English|date = March 2019}}
{{Short description|Constant a such that af(x) is a probability measure}}
{{distinguish|Proportionality factor}}
 
In [[probability theory]], a '''normalizing constant''' or '''normalizing factor''' is used to reduce any probability function to a probability density function with total probability of one.
For example, a Gaussian function can be normalized into a probability density function, which gives the standard normal distribution. In Bayes' theorem, a normalizing constant is used to ensure that the probabilities of all possible hypotheses sum to 1. Other uses of normalizing constants include making the value of a Legendre polynomial at 1 equal to 1, and ensuring the orthogonality of orthonormal functions.

A similar concept has been used in areas other than probability, such as for polynomials.

==Definition==
In [[probability theory]], a '''normalizing constant''' is a constant by which an everywhere non-negative function must be multiplied so the area under its graph is 1, e.g., to make it a [[probability density function]] or a [[probability mass function]].<ref>[http://www.math.uah.edu/stat/dist/Continuous.xhtml ''Continuous Distributions''] at Department of Mathematical Sciences, University of Alabama in Huntsville.</ref><ref>{{harvnb|Feller|1968|p=22}}.</ref>

==Examples==
If we start from the simple [[Gaussian function]]
<math display="block">p(x) = e^{-x^2/2}, \quad x\in(-\infty,\infty), </math>
we have the corresponding [[Gaussian integral]]
<math display="block">\int_{-\infty}^\infty p(x) \, dx = \int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2\pi\,}.</math>
 
Now if we use the latter's [[reciprocal value]] as a normalizing constant for the former, defining a function <math> \varphi(x) </math> as
<math display="block">\varphi(x) = \frac{1}{\sqrt{2\pi\,}} p(x) = \frac{1}{\sqrt{2\pi\,}} e^{-x^2/2}, </math>
so that its [[integral of a Gaussian function|integral]] is unity,
<math display="block">\int_{-\infty}^\infty \varphi(x) \, dx = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi\,}} e^{-x^2/2} \, dx = 1, </math>
then the function <math> \varphi(x) </math> is a probability density function.<ref>{{harvnb|Feller|1968|p=174}}.</ref> This is the density of the standard [[normal distribution]]. (''Standard'', in this case, means the [[expected value]] is 0 and the [[variance]] is 1.)

The constant <math display="inline"> \frac{1}{\sqrt{2\pi}} </math> is the '''normalizing constant''' of the function <math>p(x)</math>.
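
This computation can also be sketched numerically: integrate the unnormalized function and use the reciprocal of the result as the normalizing constant. The following is a minimal illustration (not part of the cited material), assuming SciPy is available:
<syntaxhighlight lang="python">
# Recover the normalizing constant of p(x) = exp(-x^2/2) numerically.
import math
from scipy.integrate import quad

p = lambda x: math.exp(-x**2 / 2)

area, _ = quad(p, -math.inf, math.inf)    # ~ sqrt(2*pi) ~ 2.5066
c = 1.0 / area                            # normalizing constant ~ 0.3989

phi = lambda x: c * p(x)                  # standard normal density
total, _ = quad(phi, -math.inf, math.inf)
print(c, total)                           # total ~ 1.0
</syntaxhighlight>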

Similarly,
<math display="block">\sum_{n=0}^\infty \frac{\lambda^n}{n!} = e^{\lambda} ,</math>
and consequently
<math display="block">f(n) = \frac{\lambda^n e^{-\lambda}}{n!} </math>
is a probability mass function on the set of all nonnegative integers.<ref>{{harvnb|Feller|1968|p=156}}.</ref> This is the probability mass function of the [[Poisson distribution]] with expected value λ.
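
As with the Gaussian example, this can be checked numerically; the sketch below, with an arbitrary choice of λ = 3, confirms that the resulting probabilities sum to 1:
<syntaxhighlight lang="python">
# e^(-lam) is the normalizing constant of lam^n / n! over the nonnegative integers.
import math

lam = 3.0
probs = [lam**n * math.exp(-lam) / math.factorial(n) for n in range(100)]
print(sum(probs))  # ~ 1.0
</syntaxhighlight>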
 
Note that if the probability density function is a function of various parameters, so too will be its normalizing constant. The parametrized normalizing constant for the [[Boltzmann distribution]] plays a central role in [[statistical mechanics]]. In that context, the normalizing constant is called the [[partition function (statistical mechanics)|partition function]].
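
As a minimal illustration of such a parameter-dependent normalizing constant (the energy levels and temperature below are arbitrary choices, not taken from the cited sources), the partition function of a discrete Boltzmann distribution varies with the temperature parameter:
<syntaxhighlight lang="python">
# The partition function Z normalizes the Boltzmann weights exp(-E_i / kT).
import math

def boltzmann_probs(energies, kT):
    weights = [math.exp(-E / kT) for E in energies]  # unnormalized weights
    Z = sum(weights)                                 # partition function
    return [w / Z for w in weights]                  # probabilities sum to 1

print(boltzmann_probs([0.0, 1.0, 2.0], kT=1.0))
</syntaxhighlight>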
==Bayes' theorem==
[[Bayes' theorem]] says that the posterior probability measure is proportional to the product of the prior probability measure and the [[likelihood function]]. ''Proportional to'' implies that one must multiply or divide by a normalizing constant to assign measure 1 to the whole space, i.e., to get a probability measure. In a simple discrete case we have
<math display="block">P(H_0|D) = \frac{P(D|H_0)P(H_0)}{P(D)}</math>
where P(H<sub>0</sub>) is the prior probability that the hypothesis is true; P(D|H<sub>0</sub>) is the [[conditional probability]] of the data given that the hypothesis is true, but given that the data are known it is the [[likelihood function|likelihood]] of the hypothesis (or its parameters) given the data; and P(H<sub>0</sub>|D) is the posterior probability that the hypothesis is true given the data. P(D) should be the probability of producing the data, but on its own it is difficult to calculate, so an alternative way to describe this relationship is as one of proportionality:
<math display="block">P(H_0|D) \propto P(D|H_0)P(H_0).</math>
Since P(H|D) is a probability, the sum over all possible (mutually exclusive) hypotheses should be 1, leading to the conclusion that
<math display="block">P(H_0|D) = \frac{P(D|H_0)P(H_0)}{\displaystyle\sum_i P(D|H_i)P(H_i)} .</math>
In this case, the [[Multiplicative inverse|reciprocal]] of the value
<math display="block">P(D) = \sum_i P(D|H_i)P(H_i) \;</math>
is the ''normalizing constant''.<ref>{{harvnb|Feller|1968|p=124}}.</ref> It can be extended from countably many hypotheses to uncountably many by replacing the sum by an integral.
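
A minimal numerical sketch of this discrete case (the prior and likelihood values below are made up purely for illustration) shows the normalizing constant at work:
<syntaxhighlight lang="python">
# Discrete Bayes' rule: the normalizing constant is the reciprocal of
# P(D) = sum_i P(D|H_i) P(H_i).
prior      = [0.5, 0.3, 0.2]    # P(H_i) for three mutually exclusive hypotheses
likelihood = [0.1, 0.4, 0.7]    # P(D|H_i)

unnormalized = [l * p for l, p in zip(likelihood, prior)]
p_data = sum(unnormalized)                      # P(D) = 0.31
posterior = [u / p_data for u in unnormalized]  # sums to 1

print(p_data, posterior)
</syntaxhighlight>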
 
For concreteness, there are many methods of estimating the normalizing constant for practical purposes. Methods include the bridge sampling technique, the naive Monte Carlo estimator, the generalized harmonic mean estimator, and importance sampling.<ref>{{Cite web |last = Gronau | first = Quentin | date = 2020 | title = bridgesampling: An R Package for Estimating Normalizing Constants | url = https://cran.r-project.org/web/packages/bridgesampling/vignettes/bridgesampling_paper.pdf | access-date = September 11, 2021 | website = The Comprehensive R Archive Network}}</ref>
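
For example, the importance-sampling idea can be sketched as follows; this is a toy illustration rather than the method of the cited reference, with exp(-x^2/2) as the unnormalized target and an arbitrary wide normal distribution as the proposal:
<syntaxhighlight lang="python">
# Importance-sampling estimate of Z = integral of p_tilde(x) dx,
# using samples from a proposal distribution q.
import numpy as np

rng = np.random.default_rng(0)
p_tilde = lambda x: np.exp(-x**2 / 2)             # unnormalized target

mu, sigma = 0.0, 2.0                              # proposal q = Normal(mu, sigma)
x = rng.normal(mu, sigma, size=100_000)
q = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

Z_hat = np.mean(p_tilde(x) / q)                   # estimates Z = sqrt(2*pi)
print(Z_hat)
</syntaxhighlight>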
 
==Non-probabilistic uses==
 
The [[Legendre polynomials]] are characterized by [[orthogonality]] with respect to the uniform measure on the interval [−1, 1] and the fact that they are '''normalized''' so that their value at 1 is 1. The constant by which one multiplies a polynomial so its value at 1 is 1 is a normalizing constant.
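
For instance, the polynomial <math>3x^2 - 1</math> takes the value 2 at <math>x = 1</math>, so multiplying it by the normalizing constant <math display="inline">\tfrac{1}{2}</math> gives the Legendre polynomial
<math display="block">P_2(x) = \tfrac{1}{2}\left(3x^2 - 1\right), \qquad P_2(1) = 1.</math>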
 
[[Orthonormal]] functions are normalized such that <math display="block">\langle f_i , \, f_j \rangle = \, \delta_{i,j}</math> with respect to some inner product {{math|⟨''f'', ''g''⟩}}.
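
For example, on the interval <math>[-\pi, \pi]</math> with the inner product <math display="inline">\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\, g(x) \, dx</math>, one has <math display="inline">\int_{-\pi}^{\pi} \sin^2(nx) \, dx = \pi</math> for every positive integer <math>n</math>, so the normalizing constant <math display="inline">1/\sqrt{\pi}</math> makes the functions <math display="inline">\sin(nx)/\sqrt{\pi}</math> orthonormal.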
 
The constant {{math|1/{{radic|2}}}} is used to establish the [[hyperbolic functions#Comparison with circular functions|hyperbolic functions]] cosh and sinh from the lengths of the adjacent and opposite sides of a [[hyperbolic sector#Hyperbolic triangle|hyperbolic triangle]].
 
==See also==
*[[Normalization (statistics)]]

==References==
{{reflist}}
{{refbegin}}
*{{cite book | last = Feller | first = William | author-link = William Feller | title = An Introduction to Probability Theory and its Applications (volume I) | publisher = John Wiley & Sons | date = 1968 | isbn = 0-471-25708-7}}
{{refend}}

[[Category:Theory of probability distributions]]
[[Category:1 (number)]]