{{Short description|Generalization of the concept from statistical mechanics}}
{{For|the partition function in number theory|Partition function (number theory)}}
The '''partition function''' or '''configuration integral''', as used in [[probability theory]], [[information theory]] and [[dynamical systems]], is a generalization of the definition of a [[partition function in statistical mechanics]]. It is a special case of a [[normalizing constant]] in probability theory, for the [[Boltzmann distribution]]. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated [[probability measure]], the [[Gibbs measure]], has the [[Markov property]]. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the [[Hopfield network]]), and applications such as [[genomics]], [[corpus linguistics]] and [[artificial intelligence]], which employ [[Markov network]]s, and [[Markov logic network]]s. The Gibbs measure is also the unique measure that has the property of maximizing the [[entropy (general concept)|entropy]] for a fixed expectation value of the energy; this underlies the appearance of the partition function in [[maximum entropy method]]s and the algorithms derived therefrom.
==Definition==
Given a set of [[random variable]]s <math>X_i</math> taking on values <math>x_i</math>, and some sort of potential function or Hamiltonian <math>H(x_1,x_2,\dots)</math>, the partition function is defined as
<math display="block">Z(\beta) = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) \right).</math>
The function ''H'' is understood to be a real-valued function on the space of states <math>\{X_1,X_2,\dots\}</math>, while <math>\beta</math> is a real-valued free parameter (conventionally, the [[inverse temperature]]). The sum over the <math>x_i</math> is understood to be a sum over all possible values that each of the random variables <math>X_i</math> may take. The sum is replaced by an integral,
<math display="block">Z(\beta) = \int \exp \left(-\beta H(x_1,x_2,\dots) \right) \, dx_1 \, dx_2 \cdots,</math>
for the case of continuously-varying <math>X_i</math>.
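As an informal illustration, a minimal numerical sketch of the discrete sum above might look as follows (the two-spin system, the choice <math>H(x_1,x_2) = -x_1 x_2</math> and the use of NumPy are hypothetical choices made only for this example):
<syntaxhighlight lang="python">
import numpy as np
from itertools import product

def H(x1, x2):
    # Hypothetical energy function for two coupled +/-1 spins.
    return -x1 * x2

def partition_function(beta):
    # Z(beta) = sum over all configurations of exp(-beta * H)
    return sum(np.exp(-beta * H(x1, x2)) for x1, x2 in product([-1, 1], repeat=2))

print(partition_function(1.0))  # 2*e + 2/e ≈ 6.17
</syntaxhighlight>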
When ''H'' is an [[observable]], such as a finite-dimensional [[matrix (mathematics)|matrix]] or an infinite-dimensional [[Hilbert space]] [[operator (mathematics)|operator]] or element of a [[C-star algebra]], it is common to express the summation as a [[trace (linear algebra)|trace]], so that
<math display="block">Z(\beta) = \operatorname{tr}\left(e^{-\beta H}\right).</math>
When ''H'' is infinite-dimensional, then, for the above notation to be valid, the argument must be [[trace class]], that is, of a form such that the summation exists and is bounded.
The number of variables <math>X_i</math> need not be [[countable]], in which case the sums are to be replaced by [[functional integral]]s. Although there are many notations for functional integrals, a common one would be
<math display="block">Z = \int \mathcal{D} \varphi \, \exp \left(- \beta H[\varphi] \right).</math>
Such is the case for the [[partition function in quantum field theory]].
A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a [[generating function]] for [[correlation function]]s. This is discussed in greater detail below.
==The parameter ''β''==
The role or meaning of the parameter <math>\beta</math> can be understood in a variety of different ways. In classical thermodynamics, it is an [[inverse temperature]]. More generally, one would say that it is the variable that is [[Conjugate variables (thermodynamics)|conjugate]] to some (arbitrary) function <math>H</math> of the random variables <math>X</math>. The word ''conjugate'' here is used in the sense of conjugate [[generalized coordinates]] in [[Lagrangian mechanics]]; thus, properly, <math>\beta</math> is a [[Lagrange multiplier]]. It is sometimes also called the [[generalized force]]. All of these concepts have in common the idea that one value is meant to be kept fixed, as others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the [[expectation value]] of <math>H</math>, even as many different [[probability distribution]]s can give rise to exactly this same (fixed) value.
For the general case, one considers a set of functions <math>\{H_k(x_1,\dots)\}</math>, each depending on the random variables <math>X_i</math>. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of [[Lagrange multipliers]]. In the general case, [[maximum entropy method]]s illustrate the manner in which this is done.
Some specific examples are in order. In basic thermodynamics problems, when using the [[canonical ensemble]], the use of just one parameter <math>\beta</math> reflects the fact that there is only one expectation value that must be held constant: the average [[energy]] (due to [[conservation of energy]]). For chemistry problems involving chemical reactions, the [[grand canonical ensemble]] provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the energy constant, and another, the [[fugacity]], is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms).
For the general case, one has
<math display="block">Z(\beta) = \sum_{x_i} \exp \left(-\sum_k \beta_k H_k(x_1,x_2,\dots) \right)</math>
with <math>\beta = (\beta_1, \beta_2, \dots)</math>.
For a collection of observables <math>H_k</math>, one would write
<math display="block">Z(\beta) = \operatorname{tr}\left[\, \exp \left(-\sum_k \beta_k H_k\right) \right].</math>
As before, it is presumed that the argument of {{math|tr}} is [[trace class]].
The corresponding [[Gibbs measure]] then provides a probability distribution such that the expectation value of each <math>H_k</math> is a fixed value. More precisely, one has
<math display="block">\frac{\partial}{\partial \beta_k} \left(-\log Z\right) = \langle H_k \rangle = \mathrm{E}\left[H_k\right],</math>
with the angle brackets <math>\langle H_k \rangle</math> denoting the expected value of <math>H_k</math>, and <math>\mathrm{E}[\,\cdot\,]</math> being a common alternative notation for it.
Although the value of <math>\beta</math> is commonly taken to be real, it need not be, in general; this is discussed in the section [[#Normalization|Normalization]] below. The values of <math>\beta</math> can be understood to be the coordinates of points in a space; this space is in fact a [[manifold]], as sketched below. The study of these spaces as manifolds constitutes the field of [[information geometry]].
The potential function itself commonly takes the form of a sum:
<math display="block">H(x_1,x_2,\dots) = \sum_s V(s),</math>
where the sum over ''s'' is a sum over some subset of the [[power set]] ''P''(''X'') of the set <math>X = \lbrace x_1,x_2,\dots \rbrace</math>. For example, in [[statistical mechanics]], such as the [[Ising model]], the sum is over pairs of nearest neighbors. In probability theory, such as [[Markov networks]], the sum might be over the [[clique (graph theory)|cliques]] of a graph; so, for the Ising model and other [[lattice model (physics)|lattice models]], the maximal cliques are edges.
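As a sketch of a potential written as a sum over subsets, the nearest-neighbour Hamiltonian of a short one-dimensional Ising-style chain might be coded as follows (the chain, its length and the coupling constant are hypothetical choices made only for illustration):
<syntaxhighlight lang="python">
import numpy as np

def hamiltonian(spins, J=1.0):
    # Sum of pair potentials V(s) over nearest-neighbour edges, the maximal
    # cliques of a one-dimensional lattice graph, as in the Ising model.
    spins = np.asarray(spins)
    return -J * np.sum(spins[:-1] * spins[1:])

print(hamiltonian([1, 1, -1, 1]))  # 1.0
</syntaxhighlight>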
The fact that the potential function can be written as a sum usually reflects the fact that it is invariant under the [[group action|action]] of a [[symmetry group]], such as [[translational invariance]].
This symmetry has a critically important interpretation in probability theory: it implies that the [[Gibbs measure]] has the [[Markov property]]; that is, it is independent of the random variables in a certain way, or, equivalently, the measure is identical on the [[equivalence class]]es of the symmetry. This leads to the widespread appearance of the partition function in problems with the Markov property, such as [[Hopfield network]]s.
==As a measure==
The value of the expression
<math display="block">\exp \left(-\beta H(x_1,x_2,\dots) \right)</math>
can be interpreted as a likelihood that a specific [[configuration space (physics)|configuration]] of values <math>(x_1,x_2,\dots)</math> occurs in the system. Thus, given a specific configuration <math>(x_1,x_2,\dots)</math>,
<math display="block">P(x_1,x_2,\dots) = \frac{1}{Z(\beta)} \exp \left(-\beta H(x_1,x_2,\dots) \right)</math>
is the [[probability density function|probability]] of the configuration <math>(x_1,x_2,\dots)</math> occurring in the system, which is now properly normalized so that <math>0\le P(x_1,x_2,\dots)\le 1</math>, and such that the sum over all configurations totals to one. As such, the partition function can be understood to provide a [[measure (mathematics)|measure]] (a [[probability measure]]) on the [[probability space]]; formally, it is called the [[Gibbs measure]]. It generalizes the narrower concepts of the [[grand canonical ensemble]] and [[canonical ensemble]] in statistical mechanics.
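Continuing the hypothetical two-spin sketch from the definition section, the normalization can be verified numerically: every <math>P(x_1,x_2)</math> lies in <math>[0,1]</math> and the probabilities sum to one.
<syntaxhighlight lang="python">
import numpy as np
from itertools import product

beta = 1.0
configs = list(product([-1, 1], repeat=2))          # all configurations of two spins
weights = np.array([np.exp(-beta * (-x1 * x2)) for x1, x2 in configs])
Z = weights.sum()                                   # partition function
P = weights / Z                                     # Gibbs measure on configurations
print(P, P.sum())                                   # probabilities in [0, 1], summing to 1.0
</syntaxhighlight>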
==Expectation values==
The partition function is commonly used as a [[probability-generating function]] for [[expectation value]]s of various functions of the random variables. So, for example, taking <math>\beta</math> as an adjustable parameter, the derivative of <math>\log(Z(\beta))</math> with respect to <math>\beta</math>,
<math display="block">\mathbf{E}[H] = \langle H \rangle = -\frac{\partial \log(Z(\beta))}{\partial \beta},</math>
gives the average (expectation value) of ''H''. In physics, this would be called the average [[energy]] of the system.
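A short numerical check of this identity, again using the hypothetical two-spin system introduced above, compares the directly computed average of <math>H</math> with a finite-difference estimate of <math>-\partial \log Z / \partial \beta</math>:
<syntaxhighlight lang="python">
import numpy as np
from itertools import product

configs = list(product([-1, 1], repeat=2))

def Z(beta):
    # Same hypothetical two-spin system as above, with H = -x1*x2.
    return sum(np.exp(-beta * (-x1 * x2)) for x1, x2 in configs)

beta, h = 1.0, 1e-6
# Expectation of H as a weighted average over the Gibbs measure ...
avg_H = sum((-x1 * x2) * np.exp(-beta * (-x1 * x2)) for x1, x2 in configs) / Z(beta)
# ... and as the derivative -d(log Z)/d(beta), via a central finite difference.
deriv = -(np.log(Z(beta + h)) - np.log(Z(beta - h))) / (2 * h)
print(avg_H, deriv)  # both ≈ -tanh(1) ≈ -0.7616
</syntaxhighlight>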
Given the definition of the probability measure above, the expectation value of any function ''f'' of the random variables ''X'' may now be written as expected: so, for discrete-valued ''X'', one writes
<math display="block">\begin{align}
\langle f\rangle
& = \sum_{x_i} f(x_1,x_2,\dots) P(x_1,x_2,\dots) \\
& = \frac{1}{Z(\beta)} \sum_{x_i} f(x_1,x_2,\dots) \exp \left(-\beta H(x_1,x_2,\dots) \right)
\end{align}</math>
The above notation is strictly correct for a finite number of discrete random variables; for continuous variables, the sums are to be replaced by integrals, as described above.
Thus, for example, the [[entropy (general concept)|entropy]] is given by
<math display="block">\begin{align}
S
& = -\langle \ln P \rangle \\
& = -\sum_{x_i} P(x_1,x_2,\dots) \ln P(x_1,x_2,\dots) \\
& = \beta \langle H \rangle + \log Z(\beta)
\end{align}</math>
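For the same hypothetical two-spin system, the identity <math>S = \beta\langle H\rangle + \log Z(\beta)</math> can be checked against the direct sum <math>-\sum P \ln P</math>:
<syntaxhighlight lang="python">
import numpy as np
from itertools import product

beta = 1.0
energies = np.array([-x1 * x2 for x1, x2 in product([-1, 1], repeat=2)])
weights = np.exp(-beta * energies)
Z = weights.sum()
P = weights / Z

S_direct = -np.sum(P * np.log(P))                       # S = -<ln P>
S_identity = beta * np.sum(P * energies) + np.log(Z)    # beta*<H> + log Z
print(S_direct, S_identity)                             # both ≈ 1.06; the two agree
</syntaxhighlight>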
Multiple derivatives with regard to the Lagrange multipliers give rise to a positive semi-definite [[covariance matrix]]
<math display="block">g_{ij}(\beta) = \frac{\partial^2 \log Z(\beta)}{\partial \beta_i \, \partial \beta_j} =
\langle \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right)\rangle.</math>
This matrix is positive semi-definite, and may be interpreted as a [[metric tensor]], specifically, a [[Riemannian metric]]. Equipping the space of Lagrange multipliers with a metric in this way turns it into a [[Riemannian manifold]]; the study of such manifolds constitutes the field of [[information geometry]]. The metric above is the [[Fisher information metric]].
That the above defines the Fisher information metric can be readily seen by explicitly substituting for the expectation value:
<math display="block">\begin{align}
g_{ij}(\beta)
& = \left\langle \left(H_i - \left\langle H_i \right\rangle\right) \left( H_j - \left\langle H_j \right\rangle\right) \right\rangle \\
& = \sum_{x} P(x) \left(H_i - \left\langle H_i \right\rangle\right) \left( H_j - \left\langle H_j \right\rangle\right) \\
& = \sum_{x} P(x)
\left(H_i + \frac{\partial\log Z}{\partial \beta_i}\right)
\left(H_j + \frac{\partial\log Z}{\partial \beta_j}\right) \\
& = \sum_{x} P(x)
\frac{\partial \log P(x)}{\partial \beta_i}
\frac{\partial \log P(x)}{\partial \beta_j} \\
& = \left\langle
\frac{\partial \log P(x)}{\partial \beta_i}
\frac{\partial \log P(x)}{\partial \beta_j}
\right\rangle,
\end{align}</math>
which is the standard form of the [[Fisher information metric]] in the coordinates <math>\beta</math>.
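The relation between the covariance matrix and the second derivatives of <math>\log Z</math> can also be checked numerically. The sketch below assumes a hypothetical system of two ±1 spins with two observables <math>H_1 = x_1 x_2</math> and <math>H_2 = x_1 + x_2</math>; the Hessian of <math>\log Z</math> is estimated by finite differences.
<syntaxhighlight lang="python">
import numpy as np
from itertools import product

# Hypothetical system: two +/-1 spins with two observables H_1, H_2.
configs = list(product([-1, 1], repeat=2))
H = np.array([[x1 * x2, x1 + x2] for x1, x2 in configs])  # H[x, k] = H_k(x)

def log_Z(beta):
    return np.log(np.sum(np.exp(-H @ beta)))

beta = np.array([0.5, 0.3])
P = np.exp(-H @ beta - log_Z(beta))      # Gibbs probabilities

# Direct covariance <(H_i - <H_i>)(H_j - <H_j>)> under the Gibbs measure ...
mean = P @ H
cov = (H - mean).T @ (P[:, None] * (H - mean))

# ... equals the Hessian of log Z with respect to beta (central finite differences).
eps = 1e-4
hess = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = eps * np.eye(2)[i], eps * np.eye(2)[j]
        hess[i, j] = (log_Z(beta + ei + ej) - log_Z(beta + ei - ej)
                      - log_Z(beta - ei + ej) + log_Z(beta - ei - ej)) / (4 * eps**2)

print(np.round(cov, 4))
print(np.round(hess, 4))  # the two matrices agree
</syntaxhighlight>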
By introducing artificial auxiliary functions <math>J_k</math> into the partition function, it can then be used to obtain the expectation value of the random variables. Thus, for example, by writing
<math display="block">\begin{align}
Z(\beta,J)
& = Z(\beta,J_1,J_2,\dots) \\
& = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) + \sum_n J_n x_n \right),
\end{align}</math>
one then has
<math display="block">\mathbf{E}[x_k] = \langle x_k \rangle = \left.
\frac{\partial}{\partial J_k}
\log Z(\beta,J)\right|_{J=0}.
</math>
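A numerical sketch of this device, using a hypothetical two-spin system with an external field (so that <math>\langle x_k \rangle</math> is non-zero), compares the direct Gibbs average of <math>x_1</math> with a finite-difference derivative of <math>\log Z(\beta,J)</math> with respect to <math>J_1</math> at <math>J = 0</math>:
<syntaxhighlight lang="python">
import numpy as np
from itertools import product

configs = list(product([-1, 1], repeat=2))
beta = 1.0

def H(x1, x2):
    # Hypothetical energy with an external field so that <x_k> is non-zero.
    return -x1 * x2 - 0.5 * x1

def log_Z(J1, J2):
    # Partition function with auxiliary sources J_k coupled linearly to x_k.
    return np.log(sum(np.exp(-beta * H(x1, x2) + J1 * x1 + J2 * x2)
                      for x1, x2 in configs))

# <x_1> computed directly from the Gibbs measure ...
w = np.array([np.exp(-beta * H(x1, x2)) for x1, x2 in configs])
P = w / w.sum()
direct = sum(p * x1 for p, (x1, x2) in zip(P, configs))

# ... and as the derivative of log Z with respect to J_1, evaluated at J = 0.
h = 1e-6
via_source = (log_Z(h, 0.0) - log_Z(-h, 0.0)) / (2 * h)
print(direct, via_source)  # the two values agree
</syntaxhighlight>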
Multiple differentiations lead to the [[Ursell function|connected correlation function]]s of the random variables. Thus the correlation function <math>C(x_j,x_k)</math> between variables <math>x_j</math> and <math>x_k</math> is given by:
<math display="block">C(x_j,x_k) = \left.
\frac{\partial}{\partial J_j}
\frac{\partial}{\partial J_k}
\log Z(\beta,J)\right|_{J=0}.
</math>
==Gaussian integrals==
For the case where ''H'' can be written as a [[quadratic form]] involving a [[differential operator]], that is, as
<math display="block">H = \frac{1}{2} \sum_n x_n D x_n,</math>
then the partition function can be understood to be a sum or integral over [[Gaussian integral|Gaussians]], and the correlation function <math>C(x_j,x_k)</math> can be understood to be the [[Green's function]] of the differential operator.
When the random variables are anti-commuting [[Grassmann number]]s, then the partition function can be expressed as a determinant of the operator ''D''. This is done by writing it as a [[Berezin integral]] (also called Grassmann integral).
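For ordinary (commuting) real variables and a finite positive-definite matrix standing in for the operator ''D'' (a hypothetical stand-in chosen only for illustration), the Gaussian case can be checked numerically: the integral of <math>\exp(-\tfrac{1}{2}x^{\mathsf T}Dx)</math> over <math>\mathbb{R}^n</math> equals <math>(2\pi)^{n/2}\det(D)^{-1/2}</math>, in contrast to the Grassmann case, which produces the determinant itself.
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical 2x2 symmetric positive-definite matrix standing in for D.
D = np.array([[2.0, 0.5],
              [0.5, 1.0]])
n = D.shape[0]

# Closed form: integral of exp(-x.D.x/2) over R^n is (2*pi)^(n/2) / sqrt(det D).
closed_form = (2 * np.pi) ** (n / 2) / np.sqrt(np.linalg.det(D))

# Brute-force check on a grid (the integrand decays fast, so a finite box suffices).
grid = np.linspace(-8, 8, 801)
dx = grid[1] - grid[0]
X, Y = np.meshgrid(grid, grid)
points = np.stack([X, Y], axis=-1)
quad_form = np.einsum('...i,ij,...j->...', points, D, points)
numeric = np.sum(np.exp(-0.5 * quad_form)) * dx * dx

print(closed_form, numeric)  # both ≈ 4.75; the two values agree
</syntaxhighlight>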
==General properties==
==See also==
* [[Exponential family]]
* [[Partition function (statistical mechanics)]]
* [[Partition problem]]
* [[Markov random field]]
==References==