Partition function (mathematics)

{{Short description|Generalization of the concept from statistical mechanics}}
{{For|the partition function in number theory|Partition function (number theory)}}
The '''partition function''' or '''configuration integral''', as used in [[probability theory]], [[information theory]] and [[dynamical systems]], is a generalization of the definition of a [[partition function in statistical mechanics]]. It is a special case of a [[normalizing constant]] in probability theory, for the [[Boltzmann distribution]]. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated [[probability measure]], the [[Gibbs measure]], has the [[Markov property]]. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the [[Hopfield network]]), and applications such as [[genomics]], [[corpus linguistics]] and [[artificial intelligence]], which employ [[Markov network]]s and [[Markov logic network]]s. The Gibbs measure is also the unique measure that has the property of maximizing the [[entropy (general concept)|entropy]] for a fixed expectation value of the energy; this underlies the appearance of the partition function in [[maximum entropy method]]s and the algorithms derived therefrom.
 
The partition function ties together many different concepts, and thus offers a general framework in which many different kinds of quantities may be calculated. In particular, it shows how to calculate [[expectation value]]s and [[Green's function]]s, forming a bridge to [[Fredholm theory]]. It also provides a natural setting for the [[information geometry]] approach to [[information theory]], where the [[Fisher information metric]] can be understood to be a [[correlation function]] derived from the partition function; it happens to define a [[Riemannian manifold]].
 
When the setting for random variables is on [[complex projective space]] or [[projective Hilbert space]], geometrized with the [[Fubini–Study metric]], the theory of [[quantum mechanics]] and more generally [[quantum field theory]] results. In these theories, the partition function is heavily exploited in the [[path integral formulation]], with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued [[simplex]] of probability theory, an extra factor of ''i'' appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the probabilities sum to one.
 
==Definition==
Given a set of [[random variable]]s <math>X_i</math> taking on values <math>x_i</math>, and some sort of [[Scalar potential|potential function]] or [[Hamiltonian function|Hamiltonian]] <math>H(x_1,x_2,\dots)</math>, the partition function is defined as
 
<math display="block">Z(\beta) = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) \right)</math>
 
The function ''H'' is understood to be a real-valued function on the space of states <math>\{X_1,X_2,\dots\}</math>, while <math>\beta</math> is a real-valued free parameter (conventionally, the [[inverse temperature]]). The sum over the <math>x_i</math> is understood to be a sum over all possible values that each of the random variables <math>X_i</math> may take. Thus, the sum is to be replaced by an [[integral]] when the <math>X_i</math> are continuous, rather than discrete. In that case, one writes
 
<math display="block">Z(\beta) = \int \exp \left(-\beta H(x_1,x_2,\dots) \right) \, dx_1 \, dx_2 \cdots</math>
 
for the case of continuously varying <math>X_i</math>.
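
For a finite discrete system the defining sum can be evaluated by brute force. The following is a minimal sketch, not a general implementation: the Hamiltonian is an assumed toy example (an open Ising chain with nearest-neighbour coupling), and the names <code>hamiltonian</code> and <code>partition_function</code> are illustrative only.

```python
import itertools
import math

def hamiltonian(spins, coupling=1.0):
    """Assumed toy energy: open Ising chain, H = -J * sum_i s_i * s_{i+1}."""
    return -coupling * sum(a * b for a, b in zip(spins, spins[1:]))

def partition_function(beta, n_spins=4):
    """Z(beta): sum of exp(-beta * H) over every configuration of the spins."""
    configs = itertools.product((-1, +1), repeat=n_spins)
    return sum(math.exp(-beta * hamiltonian(s)) for s in configs)

# At beta = 0 every configuration has weight 1, so Z equals the number of states.
print(partition_function(0.0))
print(partition_function(1.0))
```

For this particular chain the sum has the known closed form <math>Z = 2\,(2\cosh\beta)^{n-1}</math>, which gives a convenient check on the enumeration.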
When ''H'' is an [[observable]], such as a finite-dimensional [[matrix (mathematics)|matrix]] or an infinite-dimensional [[Hilbert space]] [[operator (mathematics)|operator]] or element of a [[C-star algebra]], it is common to express the summation as a [[trace (linear algebra)|trace]], so that
 
<math display="block">Z(\beta) = \operatorname{tr}\left(\exp\left(-\beta H\right)\right)</math>
 
When ''H'' is infinite-dimensional, then, for the above notation to be valid, the argument must be [[trace class]], that is, of a form such that the summation exists and is bounded.
The number of variables <math>X_i</math> need not be [[countable]], in which case the sums are to be replaced by [[functional integral]]s. Although there are many notations for functional integrals, a common one would be
 
<math display="block">Z = \int \mathcal{D} \varphi \exp \left(- \beta H[\varphi] \right)</math>
 
Such is the case for the [[partition function in quantum field theory]].
A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a [[generating function]] for [[correlation function]]s. This is discussed in greater detail below.
 
==The parameter ''β''==
 
The role or meaning of the parameter <math>\beta</math> can be understood in a variety of different ways. In classical thermodynamics, it is an [[inverse temperature]]. More generally, one would say that it is the variable that is [[Conjugate variables (thermodynamics)|conjugate]] to some (arbitrary) function <math>H</math> of the random variables <math>X</math>. The word ''conjugate'' here is used in the sense of conjugate [[generalized coordinates]] in [[Lagrangian mechanics]]; thus, properly, <math>\beta</math> is a [[Lagrange multiplier]]. It is not uncommonly called the [[generalized force]]. All of these concepts have in common the idea that one value is meant to be kept fixed while others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the [[expectation value]] of <math>H</math>, even as many different [[probability distribution]]s can give rise to exactly this same (fixed) value.
 
For the general case, one considers a set of functions <math>\{H_k(x_1,\dots)\}</math> that each depend on the random variables <math>X_i</math>. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of [[Lagrange multiplier]]s. In the general case, [[maximum entropy method]]s illustrate the manner in which this is done.
 
Some specific examples are in order. In basic thermodynamics problems, when using the [[canonical ensemble]], the use of just one parameter <math>\beta</math> reflects the fact that there is only one expectation value that must be held constant: the average [[energy]] (due to [[conservation of energy]]). For chemistry problems involving chemical reactions, the [[grand canonical ensemble]] provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the energy constant, and another, the [[fugacity]], is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms).
 
For the general case, one has
 
<math display="block">Z(\beta) = \sum_{x_i} \exp \left(-\sum_k\beta_k H_k(x_i) \right)</math>
 
with <math>\beta = (\beta_1, \beta_2, \dots)</math> a point in a space.
 
For a collection of observables <math>H_k</math>, one would write
 
<math display="block">Z(\beta) = \operatorname{tr} \left[\,\exp \left(-\sum_k\beta_k H_k\right) \right]</math>
 
As before, it is presumed that the argument of {{math|tr}} is [[trace class]].
 
The corresponding [[Gibbs measure]] then provides a probability distribution such that the expectation value of each <math>H_k</math> is a fixed value. More precisely, one has
 
<math display="block">\frac{\partial}{\partial \beta_k} \left(- \log Z \right) = \langle H_k\rangle = \mathrm{E}\left[H_k\right]</math>
 
with the angle brackets <math>\langle H_k \rangle</math> denoting the expected value of <math>H_k</math>, and <math>\operatorname{E}[\,\cdot\,]</math> being a common alternative notation. A precise definition of this expectation value is given below.
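
The derivative identity above can be checked numerically on a small system. The sketch below is illustrative only: the two observables (a bond energy and a total-spin term on a three-spin chain) and all function names are assumptions chosen for the example, and the derivative is approximated by a central finite difference.

```python
import itertools
import math

CONFIGS = list(itertools.product((-1, +1), repeat=3))

def h_terms(s):
    """Two assumed observables: H_1 = bond energy, H_2 = minus the total spin."""
    bond = -sum(a * b for a, b in zip(s, s[1:]))
    field = -sum(s)
    return (bond, field)

def weight(s, betas):
    """Unnormalised Gibbs weight exp(-sum_k beta_k * H_k(s))."""
    return math.exp(-sum(b * h for b, h in zip(betas, h_terms(s))))

def log_z(betas):
    return math.log(sum(weight(s, betas) for s in CONFIGS))

def expectation(k, betas):
    """<H_k> computed directly from the Gibbs measure."""
    z = sum(weight(s, betas) for s in CONFIGS)
    return sum(h_terms(s)[k] * weight(s, betas) for s in CONFIGS) / z

def minus_dlogz(k, betas, eps=1e-6):
    """Central-difference estimate of -d(log Z)/d(beta_k)."""
    up, down = list(betas), list(betas)
    up[k] += eps
    down[k] -= eps
    return -(log_z(up) - log_z(down)) / (2 * eps)

betas = (0.7, 0.3)
for k in range(2):
    print(k, expectation(k, betas), minus_dlogz(k, betas))
```

The two printed columns should agree up to the finite-difference error.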
 
Although the value of <math>\beta</math> is commonly taken to be real, it need not be, in general; this is discussed in the section [[#Normalization|Normalization]] below. The values of <math>\beta</math> can be understood to be the coordinates of points in a space; this space is in fact a [[manifold]], as sketched below. The study of these spaces as manifolds constitutes the field of [[information geometry]].
 
== Symmetry ==
The potential function itself commonly takes the form of a sum:
 
<math display="block">H(x_1,x_2,\dots) = \sum_s V(s)\,</math>
 
where the sum over ''s'' is a sum over some subset of the [[power set]] ''P''(''X'') of the set <math>X = \lbrace x_1,x_2,\dots \rbrace</math>. For example, in [[statistical mechanics]], such as the [[Ising model]], the sum is over pairs of nearest neighbors. In probability theory, such as [[Markov networks]], the sum might be over the [[clique (graph theory)|cliques]] of a graph; so, for the Ising model and other [[lattice model (physics)|lattice models]], the maximal cliques are edges.
 
The fact that the potential function can be written as a sum usually reflects the fact that it is invariant under the [[Group action (mathematics)|action]] of a [[group (mathematics)|group symmetry]], such as [[translational invariance]]. Such symmetries can be discrete or continuous; they materialize in the [[correlation function]]s for the random variables (discussed below). Thus a symmetry in the Hamiltonian becomes a symmetry of the correlation function (and vice versa).
 
This symmetry has a critically important interpretation in probability theory: it implies that the [[Gibbs measure]] has the [[Markov property]]; that is, it is independent of the random variables in a certain way, or, equivalently, the measure is identical on the [[equivalence class]]es of the symmetry. This leads to the widespread appearance of the partition function in problems with the Markov property, such as [[Hopfield network]]s.
 
==As a measure==
The value of the expression
<math display="block">\exp \left(-\beta H(x_1,x_2,\dots) \right)</math>
 
can be interpreted as a likelihood that a specific [[Configuration space (physics)|configuration]] of values <math>(x_1,x_2,\dots)</math> occurs in the system. Thus, given a specific configuration <math>(x_1,x_2,\dots)</math>,
 
<math display="block">P(x_1,x_2,\dots) = \frac{1}{Z(\beta)} \exp \left(-\beta H(x_1,x_2,\dots) \right)</math>
 
is the [[probability density function|probability]] of the configuration <math>(x_1,x_2,\dots)</math> occurring in the system, which is now properly normalized so that <math>0\le P(x_1,x_2,\dots)\le 1</math>, and such that the sum over all configurations totals to one. As such, the partition function can be understood to provide a [[measure (mathematics)|measure]] (a [[probability measure]]) on the [[probability space]]; formally, it is called the [[Gibbs measure]]. It generalizes the narrower concepts of the [[grand canonical ensemble]] and [[canonical ensemble]] in statistical mechanics.
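
As a small illustration of this normalization, the sketch below builds the Gibbs probabilities for a finite list of energies. The energy levels and the function name <code>gibbs_measure</code> are assumptions made for the example.

```python
import math

def gibbs_measure(energies, beta):
    """Return P(x) = exp(-beta * H(x)) / Z for a finite list of energies H(x)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)  # the partition function is the normalizing constant
    return [w / z for w in weights]

energies = [0.0, 1.0, 1.0, 2.5]  # assumed toy energy levels
probs = gibbs_measure(energies, beta=2.0)

print(probs)
print(sum(probs))  # normalized by construction, up to rounding
```

For positive <math>\beta</math> the largest probability belongs to the lowest-energy configuration.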
There exists at least one configuration <math>(x_1,x_2,\dots)</math> for which the probability is maximized; this configuration is conventionally called the [[ground state]]. If the configuration is unique, the ground state is said to be '''non-degenerate''', and the system is said to be [[ergodic]]; otherwise the ground state is '''degenerate'''. The ground state may or may not commute with the generators of the symmetry; if it commutes, it is said to be an [[invariant measure]]. When it does not commute, the symmetry is said to be [[spontaneously broken]].
 
Conditions under which a ground state exists and is unique are given by the [[Karush–Kuhn–Tucker conditions]]; these conditions are commonly used to justify the use of the Gibbs measure in maximum-entropy problems.{{Citation needed|date=June 2013}}
 
==Normalization==
The values taken by <math>\beta</math> depend on the [[mathematical space]] over which the random field varies. Thus, real-valued random fields take values on a [[simplex]]: this is the geometrical way of saying that the sum of probabilities must total to one. For quantum mechanics, the random variables range over [[complex projective space]] (or complex-valued [[projective Hilbert space]]), where the random variables are interpreted as [[probability amplitude]]s. The emphasis here is on the word ''projective'', as the amplitudes are still normalized to one. The normalization for the potential function is the [[Jacobian matrix and determinant|Jacobian]] for the appropriate mathematical space: it is 1 for ordinary probabilities, and ''i'' for Hilbert space; thus, in [[quantum field theory]], one sees <math>it H</math> in the exponential, rather than <math>\beta H</math>. The partition function is very heavily exploited in the [[path integral formulation]] of quantum field theory, to great effect. The theory there is very nearly identical to that presented here, aside from this difference, and the fact that it is usually formulated on four-dimensional space-time, rather than in a general way.
 
==Expectation values==
The partition function is commonly used as a [[probability-generating function]] for [[expectation value]]s of various functions of the random variables. So, for example, taking <math>\beta</math> as an adjustable parameter, the negative of the derivative of <math>\log(Z(\beta))</math> with respect to <math>\beta</math>,
 
<math display="block">\operatorname{E}[H] = \langle H \rangle = -\frac {\partial \log(Z(\beta))} {\partial \beta}</math>
 
gives the average (expectation value) of ''H''. In physics, this would be called the average [[energy]] of the system.
 
Given the definition of the probability measure above, the expectation value of any function ''f'' of the random variables ''X'' may now be written as expected: so, for discrete-valued ''X'', one writes
<math display="block">\begin{align}
\langle f\rangle
& = \sum_{x_i} f(x_1,x_2,\dots) P(x_1,x_2,\dots) \\
& = \frac{1}{Z(\beta)} \sum_{x_i} f(x_1,x_2,\dots) \exp \left(-\beta H(x_1,x_2,\dots) \right)
\end{align}
</math>
 
The above notation makes sense for a finite number of discrete random variables. In more general settings, the summations should be replaced with integrals over a [[probability space]]. That said, the identities continue to hold when properly formulated on a [[measure space]].
 
Thus, for example, the [[entropy (general concept)|entropy]] is given by
 
<math display="block">\begin{align} S
& = -k_\text{B} \langle\ln P\rangle \\[1ex]
& = -k_\text{B} \sum_{x_i} P(x_1, x_2, \dots) \ln P(x_1,x_2,\dots) \\
& = k_\text{B} \left(\beta \langle H\rangle + \log Z(\beta)\right)
\end{align}
</math>
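
The equality between the first and last forms of the entropy can be verified on a finite example. This is a sketch under stated assumptions: units are chosen so that <math>k_\text{B} = 1</math>, and the energy levels are arbitrary toy values.

```python
import math

energies = [0.0, 1.0, 1.0, 2.5]  # assumed toy energy levels
beta = 0.8

weights = [math.exp(-beta * e) for e in energies]
z = sum(weights)
probs = [w / z for w in weights]

# Average energy <H> under the Gibbs measure.
mean_h = sum(p * e for p, e in zip(probs, energies))

# Entropy computed two ways, with k_B set to 1.
entropy_direct = -sum(p * math.log(p) for p in probs)
entropy_from_z = beta * mean_h + math.log(z)

print(entropy_direct, entropy_from_z)
```

The two values agree because <math>\ln P = -\beta H - \log Z</math> for every configuration.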
The Gibbs measure is the unique statistical distribution that maximizes the entropy for a fixed expectation value of the energy; this underlies its use in [[maximum entropy method]]s.
 
== Information geometry ==
The points <math>\beta</math> can be understood to form a space, and specifically, a [[manifold]]. Thus, it is reasonable to ask about the structure of this manifold; this is the task of [[information geometry]].
 
Multiple derivatives with regard to the Lagrange multipliers give rise to a positive semi-definite [[covariance matrix]]
<math display="block">g_{ij}(\beta) = \frac{\partial^2}{\partial \beta^i\partial \beta^j} \log Z(\beta) =
\langle \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right)\rangle</math>
This matrix is positive semi-definite, and may be interpreted as a [[metric tensor]], specifically, a [[Riemannian metric]]. Equipping the space of Lagrange multipliers with a metric in this way turns it into a [[Riemannian manifold]].<ref>{{cite journal |first=Gavin E. |last=Crooks |year=2007 |title=Measuring Thermodynamic Length |journal=[[Physical Review Letters|Phys. Rev. Lett.]] |volume=99 |issue=10 |pages=100602 |doi=10.1103/PhysRevLett.99.100602 |pmid=17930381 |arxiv=0706.0559 |bibcode=2007PhRvL..99j0602C |s2cid=7527491 }}</ref> The study of such manifolds is referred to as [[information geometry]]; the metric above is the [[Fisher information metric]]. Here, <math>\beta</math> serves as a coordinate on the manifold. It is interesting to compare the above definition to the simpler [[Fisher information]], by which it is inspired.
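
A numerical sketch of this metric on a small system follows. With <math>Z = \sum \exp(-\sum_k \beta_k H_k)</math>, the matrix of second derivatives of <math>\log Z</math> reproduces the covariance of the <math>H_k</math>. The observables, the three-spin chain and all function names below are assumptions for illustration, and the second derivatives are finite-difference estimates.

```python
import itertools
import math

CONFIGS = list(itertools.product((-1, +1), repeat=3))

def h_terms(s):
    """Assumed observables: H_1 = bond energy, H_2 = minus the total spin."""
    return (-sum(a * b for a, b in zip(s, s[1:])), -sum(s))

def weight(s, betas):
    return math.exp(-sum(b * h for b, h in zip(betas, h_terms(s))))

def log_z(betas):
    return math.log(sum(weight(s, betas) for s in CONFIGS))

def covariance(betas):
    """g_ij as the covariance of H_i and H_j under the Gibbs measure."""
    z = sum(weight(s, betas) for s in CONFIGS)
    probs = [weight(s, betas) / z for s in CONFIGS]
    hs = [h_terms(s) for s in CONFIGS]
    means = [sum(p * h[k] for p, h in zip(probs, hs)) for k in range(2)]
    return [[sum(p * (h[i] - means[i]) * (h[j] - means[j])
                 for p, h in zip(probs, hs))
             for j in range(2)] for i in range(2)]

def hessian_log_z(betas, eps=1e-4):
    """Central finite-difference estimate of the Hessian of log Z."""
    def f(i, di, j, dj):
        b = list(betas)
        b[i] += di
        b[j] += dj
        return log_z(b)
    return [[(f(i, eps, j, eps) - f(i, eps, j, -eps)
              - f(i, -eps, j, eps) + f(i, -eps, j, -eps)) / (4 * eps * eps)
             for j in range(2)] for i in range(2)]

betas = (0.7, 0.3)
g_cov = covariance(betas)
g_hess = hessian_log_z(betas)
print(g_cov)
print(g_hess)
```

The two matrices should agree to within the finite-difference error, and the covariance form makes the positive semi-definiteness evident.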
 
That the above defines the Fisher information metric can be readily seen by explicitly substituting for the expectation value:
<math display="block">\begin{align} g_{ij}(\beta)
& = \left\langle \left(H_i - \left\langle H_i \right\rangle\right) \left( H_j - \left\langle H_j \right\rangle\right) \right\rangle \\
& = \sum_{x} P(x) \left(H_i - \left\langle H_i \right\rangle\right) \left( H_j - \left\langle H_j \right\rangle\right) \\
& = \sum_{x} P(x)
\left(H_i + \frac{\partial\log Z}{\partial \beta_i}\right)
\left(H_j + \frac{\partial\log Z}{\partial \beta_j}\right) \\
& = \sum_{x} P(x)
\frac{\partial \log P(x)}{\partial \beta_i}
\frac{\partial \log P(x)}{\partial \beta_j}
\end{align}
</math>
where we have written <math>P(x)</math> for <math>P(x_1,x_2,\dots)</math> and the summation is understood to be over all values of all random variables <math>X_k</math>. For continuous-valued random variables, the summations are replaced by integrals.
 
Curiously, the [[Fisher information metric]] can also be understood as the flat-space [[Euclidean metric]], after appropriate change of variables, as described in the main article on it. When the <math>\beta</math> are complex-valued, the resulting metric is the [[Fubini–Study metric]]. When written in terms of [[mixed state (physics)|mixed states]], instead of [[pure state]]s, it is known as the [[Bures metric]].
The [[Bures metric]] is the quantum-mechanical analog of the Fisher information metric.
 
== Correlation functions==
By introducing artificial auxiliary functions <math>J_k</math> into the partition function, it can then be used to obtain the expectation value of the random variables. Thus, for example, by writing
 
<math display="block">\begin{align} Z(\beta,J)
& = Z(\beta,J_1,J_2,\dots) \\
& = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) +
\sum_n J_n x_n \right)
\end{align}
</math>
 
one then has
<math display="block">\operatorname{E}[x_k] = \langle x_k \rangle = \left.
\frac{\partial}{\partial J_k}
\log Z(\beta,J)\right|_{J=0}
</math>
as the expectation value of <math>x_k</math>. In the [[path integral formulation]] of [[quantum field theory]], these auxiliary functions are commonly referred to as [[source field]]s.
 
Multiple differentiations lead to the [[Ursell function|connected correlation function]]s of the random variables. Thus the correlation function <math>C(x_j,x_k)</math> between variables <math>x_j</math> and <math>x_k</math> is given by:
 
<math display="block">C(x_j,x_k) = \left.
\frac{\partial}{\partial J_j}
\frac{\partial}{\partial J_k}
\log Z(\beta,J)\right|_{J=0}
</math>
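
The source-field construction can be tried out on a small spin chain. The sketch below assumes that the sources enter the exponent as <math>+\sum_n J_n x_n</math>, consistent with the derivative formulas above; the open Ising chain and all function names are illustrative choices, and derivatives with respect to <math>J</math> are taken by central finite differences at <math>J = 0</math>.

```python
import itertools
import math

N = 4
CONFIGS = list(itertools.product((-1, +1), repeat=N))

def energy(s):
    """Assumed toy Hamiltonian: open Ising chain with unit coupling."""
    return -sum(a * b for a, b in zip(s, s[1:]))

def log_z(beta, sources):
    """log Z(beta, J) with the source term sum_n J_n * x_n added to the exponent."""
    return math.log(sum(
        math.exp(-beta * energy(s) + sum(j * x for j, x in zip(sources, s)))
        for s in CONFIGS))

def mean(k, beta, eps=1e-5):
    """<x_k> as the first derivative of log Z with respect to J_k at J = 0."""
    up, down = [0.0] * N, [0.0] * N
    up[k], down[k] = eps, -eps
    return (log_z(beta, up) - log_z(beta, down)) / (2 * eps)

def connected_correlation(j, k, beta, eps=1e-4):
    """C(x_j, x_k) as the mixed second derivative of log Z at J = 0."""
    def f(dj, dk):
        src = [0.0] * N
        src[j] += dj
        src[k] += dk
        return log_z(beta, src)
    return (f(eps, eps) - f(eps, -eps) - f(-eps, eps) + f(-eps, -eps)) / (4 * eps * eps)

beta = 0.5
print(mean(0, beta))  # vanishes by spin-flip symmetry
print(connected_correlation(0, 1, beta), math.tanh(beta))
```

For this chain the exact result is <math>C(x_j,x_k) = \tanh(\beta)^{|j-k|}</math>, so the last line prints two nearly equal numbers.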
 
==Gaussian integrals==
For the case where ''H'' can be written as a [[quadratic form]] involving a [[differential operator]], that is, as
 
<math display="block">H = \frac{1}{2} \sum_n x_n D x_n</math>
 
then the partition function can be understood to be a sum or [[Gaussian integral|integral]] over Gaussians. The correlation function <math>C(x_j,x_k)</math> can be understood to be the [[Green's function]] for the differential operator (and generally giving rise to [[Fredholm theory]]). In the quantum field theory setting, such functions are referred to as [[propagator]]s; higher order correlators are called ''n''-point functions; working with them defines the [[effective action]] of a theory.
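
A finite-dimensional stand-in makes the Gaussian case concrete. In the sketch below the differential operator is replaced by an assumed 2×2 symmetric positive-definite matrix <math>D</math>, with <math>H = \tfrac{1}{2}\sum_{jk} x_j D_{jk} x_k</math>; the integral is approximated on a grid and compared with the standard closed forms <math>Z = 2\pi / (\beta\sqrt{\det D})</math> and <math>\langle x_j x_k\rangle = (D^{-1})_{jk}/\beta</math>. The matrix, grid parameters and variable names are illustrative assumptions.

```python
import math

D = [[2.0, -1.0], [-1.0, 2.0]]  # assumed positive-definite stand-in for the operator
beta = 1.0

def energy(x0, x1):
    """Quadratic form H = (1/2) * x^T D x for two variables."""
    return 0.5 * (D[0][0] * x0 * x0 + 2 * D[0][1] * x0 * x1 + D[1][1] * x1 * x1)

# Midpoint-rule grid on [-8, 8]^2; the integrand is negligible outside this box.
n, half_width = 320, 8.0
step = 2 * half_width / n
grid = [-half_width + (i + 0.5) * step for i in range(n)]

z_numeric = 0.0
x0x1_sum = 0.0
for x0 in grid:
    for x1 in grid:
        w = math.exp(-beta * energy(x0, x1)) * step * step
        z_numeric += w
        x0x1_sum += x0 * x1 * w

corr_numeric = x0x1_sum / z_numeric

# Closed forms for a two-dimensional Gaussian integral.
det_d = D[0][0] * D[1][1] - D[0][1] * D[1][0]
z_exact = 2 * math.pi / (beta * math.sqrt(det_d))
green_01 = -D[0][1] / det_d / beta  # (0, 1) entry of the inverse of beta * D

print(z_numeric, z_exact)
print(corr_numeric, green_01)
```

Here the inverse matrix plays the role that the Green's function plays for a genuine differential operator.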
 
When the random variables are anti-commuting [[Grassmann number]]s, then the partition function can be expressed as a determinant of the operator ''D''. This is done by writing it as a [[Berezin integral]] (also called Grassmann integral).
 
==General properties==
 
==See also==
* [[Exponential family]]
* [[Partition function (statistical mechanics)]]
* [[Partition problem]]
* [[Markov random field]]
 
== References==
{{reflist}}