{{Short description|Tool in linear algebra and matrix analysis}}
The '''Schur complement''' is a key tool in the fields of [[linear algebra]], the theory of [[matrix (mathematics)|matrices]], [[numerical analysis]], and [[statistics]].

It is defined for a [[block matrix]]. Suppose ''p'', ''q'' are [[nonnegative integers]] such that ''p + q > 0'', and suppose ''A'', ''B'', ''C'', ''D'' are respectively ''p'' × ''p'', ''p'' × ''q'', ''q'' × ''p'', and ''q'' × ''q'' matrices of complex numbers. Let
<math display="block">M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}</math>
so that ''M'' is a (''p'' + ''q'') × (''p'' + ''q'') matrix.
 
If ''D'' is invertible, then the Schur complement of the block ''D'' of the matrix ''M'' is the ''p'' × ''p'' matrix defined by
<math display="block">M/D := A - BD^{-1}C.</math>
If ''A'' is invertible, the Schur complement of the block ''A'' of the matrix ''M'' is the ''q'' × ''q'' matrix defined by
<math display="block">M/A := D - CA^{-1}B.</math>
In the case that ''A'' or ''D'' is [[singular matrix|singular]], substituting a [[generalized inverse]] for the inverses on ''M/A'' and ''M/D'' yields the '''generalized Schur complement'''.
 
The Schur complement is named after [[Issai Schur]]<ref>{{cite journal
|author=Schur, J.
|title=Über Potenzreihen die im Inneren des Einheitskreises beschränkt sind
|journal=J. reine u. angewandte Mathematik
|volume=147
|year=1917
|pages=205–232
|doi=10.1515/crll.1917.147.205
|url=https://eudml.org/doc/149467
}}
</ref> who used it to prove [[Schur's lemma]], although it had been used previously.<ref name="Zhang 2005">{{cite book |title=The Schur Complement and Its Applications |first=Fuzhen |last=Zhang |editor1-first=Fuzhen |editor1-last=Zhang |series=Numerical Methods and Algorithms |year=2005 |volume=4 |publisher=Springer| isbn=0-387-24271-6 |doi=10.1007/b105056}}</ref> [[Emilie Virginia Haynsworth]] was the first to call it the ''Schur complement''.<ref>Haynsworth, E. V., "On the Schur Complement", ''Basel Mathematical Notes'', #BNB 20, 17 pages, June 1968.</ref> The Schur complement is sometimes referred to as the ''Feshbach map'' after the physicist [[Herman Feshbach]].<ref>{{ cite journal
|author=Feshbach, Herman
|title=Unified theory of nuclear reactions
|journal=Annals of Physics
|volume=5
|issue=4
|year=1958
|pages=357–390
|doi=10.1016/0003-4916(58)90007-1
}}
</ref>
 
==Background==
The Schur complement arises when performing a block [[Gaussian elimination]] on the matrix ''M''. In order to eliminate the elements below the block diagonal, one multiplies the matrix ''M'' by a ''block lower triangular'' matrix on the right as follows:
<math display="block">\begin{align}
&M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \quad \to \quad \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix} = \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix},
\end{align}</math>
where ''I<sub>p</sub>'' denotes a ''p''&times;''p'' [[identity matrix]]. As a result, the Schur complement <math>M/D = A - BD^{-1}C</math> appears in the upper-left ''p''&times;''p'' block.
 
Continuing the elimination process beyond this point (i.e., performing a block [[Gauss–Jordan elimination]]),
<math display="block">\begin{align}
&\begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix} \quad \to \quad \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix}
= \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix},
\end{align}</math>
leads to an [[LDU decomposition]] of ''M'', which reads
<math display="block">\begin{align}
M &= \begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}\begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix}\begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix}.
\end{align}</math>
Thus, the inverse of ''M'' may be expressed in terms of ''D''<sup>&minus;1</sup> and the inverse of the Schur complement ''M''/''D'', assuming it exists, as
<math display="block">\begin{align}
M^{-1} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
={} &\left(\begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}
\begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix}
\begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix}
\right)^{-1} \\
={} &\begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix}
\begin{bmatrix} \left(A - BD^{-1}C\right)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix}
\begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \\[4pt]
={} &\begin{bmatrix}
\left(A - BD^{-1}C\right)^{-1} & -\left(A - BD^{-1}C\right)^{-1} BD^{-1} \\
-D^{-1}C\left(A - BD^{-1}C\right)^{-1} & D^{-1} + D^{-1}C\left(A - BD^{-1}C\right)^{-1}BD^{-1}
\end{bmatrix} \\[4pt]
={} &\begin{bmatrix}
\left(M/D\right)^{-1} & -\left(M/D\right)^{-1} B D^{-1} \\
-D^{-1}C\left(M/D\right)^{-1} & D^{-1} + D^{-1}C\left(M/D\right)^{-1} B D^{-1}
\end{bmatrix}.
\end{align}</math>
The above relationship comes from the elimination operations that involve ''D''<sup>&minus;1</sup> and ''M/D''. An equivalent derivation can be done with the roles of ''A'' and ''D'' interchanged. By equating the expressions for ''M''<sup>&minus;1</sup> obtained in these two different ways, one can establish the [[matrix inversion lemma]], which relates the two Schur complements of ''M'': ''M/D'' and ''M/A'' (see ''"Derivation from LDU decomposition"'' in {{slink|Woodbury_matrix_identity#Alternative_proofs}}).
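
The block expression for ''M''<sup>&minus;1</sup> above can be checked numerically. The following NumPy sketch is an illustrative aid, not drawn from the sources cited in this article; the random blocks and the seed are arbitrary choices.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 5 * np.eye(q)   # shift keeps D comfortably invertible

M = np.block([[A, B], [C, D]])
D_inv = np.linalg.inv(D)
S = A - B @ D_inv @ C                              # Schur complement M/D
S_inv = np.linalg.inv(S)

# Block expression for the inverse of M in terms of D^{-1} and (M/D)^{-1}
M_inv_blocks = np.block([
    [S_inv,               -S_inv @ B @ D_inv],
    [-D_inv @ C @ S_inv,   D_inv + D_inv @ C @ S_inv @ B @ D_inv],
])

assert np.allclose(M_inv_blocks, np.linalg.inv(M))
</syntaxhighlight>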
 
== Properties ==
 
* If ''p'' and ''q'' are both 1 (i.e., ''A'', ''B'', ''C'' and ''D'' are all scalars), we get the familiar formula for the inverse of a 2-by-2 matrix:
: <math> M^{-1} = \frac{1}{AD-BC} \left[ \begin{matrix} D & -B \\ -C & A \end{matrix}\right] </math>
: provided that ''AD''&nbsp;&minus;&nbsp;''BC'' is non-zero.
* In general, if ''A'' is invertible, then
: <math>\begin{align}
M &= \begin{bmatrix} A&B\\C&D \end{bmatrix} =
\begin{bmatrix} I_p & 0 \\ CA^{-1} & I_q \end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix}\begin{bmatrix} I_p & A^{-1}B \\ 0 & I_q \end{bmatrix}, \\[4pt]
M^{-1} &= \begin{bmatrix} A^{-1} + A^{-1} B (M/A)^{-1} C A^{-1} & - A^{-1} B (M/A)^{-1} \\ - (M/A)^{-1} CA^{-1} & (M/A)^{-1} \end{bmatrix}
\end{align}</math>
: whenever this inverse exists.
* (Schur's formula) When ''A'', respectively ''D'', is invertible, the determinant of ''M'' is given by
: <math>\det(M) = \det(A) \det\left(D - CA^{-1} B\right)</math>, respectively
: <math>\det(M) = \det(D) \det\left(A - BD^{-1} C\right)</math>,
: which generalizes the determinant formula for 2 × 2 matrices.
* (Guttman rank additivity formula) If ''D'' is invertible, then the [[Rank (linear algebra)|rank]] of ''M'' is given by
: <math> \operatorname{rank}(M) = \operatorname{rank}(D) + \operatorname{rank}\left(A - BD^{-1} C\right)</math>
* ([[Haynsworth inertia additivity formula]]) If ''A'' is invertible, then the ''inertia'' of the block matrix ''M'' is equal to the inertia of ''A'' plus the inertia of ''M''/''A''.
* (Quotient identity) <math>A/B = ((A/C)/(B/C))</math>.<ref>{{Cite journal |last1=Crabtree |first1=Douglas E. |last2=Haynsworth |first2=Emilie V. |date=1969 |title=An identity for the Schur complement of a matrix |url=https://www.ams.org/proc/1969-022-02/S0002-9939-1969-0255573-1/ |journal=Proceedings of the American Mathematical Society |language=en |volume=22 |issue=2 |pages=364–366 |doi=10.1090/S0002-9939-1969-0255573-1 |s2cid=122868483 |issn=0002-9939|doi-access=free |url-access=subscription }}</ref>
* The Schur complement of a [[Laplacian matrix]] is also a Laplacian matrix.<ref>{{Cite journal |last=Devriendt |first=Karel |date=2022 |title=Effective resistance is more than distance: Laplacians, Simplices and the Schur complement |url=https://linkinghub.elsevier.com/retrieve/pii/S0024379522000039 |journal=Linear Algebra and Its Applications |language=en |volume=639 |pages=24–49 |doi=10.1016/j.laa.2022.01.002|arxiv=2010.04521 |s2cid=222272289 }}</ref>
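
As a numerical illustration of Schur's determinant formula and the Guttman rank additivity formula listed above, the following sketch checks both identities on a generic example; it is illustrative only, and the random blocks are arbitrary.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 2
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 5 * np.eye(q)   # shift keeps D invertible
M = np.block([[A, B], [C, D]])

M_over_D = A - B @ np.linalg.inv(D) @ C           # Schur complement M/D

# det(M) = det(D) det(A - B D^{-1} C)
assert np.isclose(np.linalg.det(M), np.linalg.det(D) * np.linalg.det(M_over_D))

# rank(M) = rank(D) + rank(A - B D^{-1} C)
assert np.linalg.matrix_rank(M) == np.linalg.matrix_rank(D) + np.linalg.matrix_rank(M_over_D)
</syntaxhighlight>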
 
== Application to solving linear equations ==
The Schur complement arises naturally in solving a [[system of linear equations]] such as<ref name="Boyd 2004" />
 
<math display="block">\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix},</math>
where ''x'', ''u'' are ''p''-dimensional [[column vector]]s, ''y'', ''v'' are ''q''-dimensional column vectors, and ''A'', ''B'', ''C'', ''D'' are as above.
 
Assuming that the submatrix <math>A</math> is invertible, we can eliminate <math>x</math> from the equations, as follows.
 
: <math>x = A^{-1} (u - By).</math>
 
Substituting this expression into the second equation yields
: <math>\left(D - CA^{-1}B\right)y = v-CA^{-1}u.</math>
 
We refer to this as the ''reduced equation'' obtained by eliminating <math>x</math> from the original equation. The matrix appearing in the reduced equation is called the Schur complement of the first block <math>A</math> in <math>M</math>:
: <math>S \ \overset{\underset{\mathrm{def}}{}}{=}\ D - CA^{-1}B</math>.
 
Solving the reduced equation, we obtain
: <math>y = S^{-1} \left(v-CA^{-1}u\right).</math>
 
Substituting this into the first equation yields
: <math>x = \left(A^{-1} + A^{-1} B S^{-1} C A^{-1}\right) u - A^{-1} B S^{-1} v. </math>
 
We can express the above two equations as:
: <math>\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix}
A^{-1} + A^{-1} B S^{-1} C A^{-1} & -A^{-1} B S^{-1} \\
-S^{-1} C A^{-1} & S^{-1}
\end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}.
</math>
 
Therefore, a formulation for the inverse of a block matrix is:
: <math>
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} =
\begin{bmatrix}
A^{-1} + A^{-1} B S^{-1} C A^{-1} & - A^{-1} B S^{-1} \\
-S^{-1} C A^{-1} & S^{-1}
\end{bmatrix} = \begin{bmatrix}
I_p & -A^{-1}B\\
& I_q
\end{bmatrix}\begin{bmatrix}
A^{-1} & \\
& S^{-1}
\end{bmatrix}\begin{bmatrix}
I_p & \\
-CA^{-1} & I_q
\end{bmatrix}.
</math>
 
In particular, we see that the Schur complement is the inverse of the <math>2,2</math> block entry of the inverse of <math>M</math>.
 
In practice, one needs <math>A</math> to be [[condition number|well-conditioned]] in order for this algorithm to be numerically accurate.
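
A minimal NumPy sketch of the elimination procedure described above follows; it is illustrative only, with random blocks and right-hand side chosen so that <math>A</math> is well-conditioned. It solves the reduced equation for ''y'' and then back-substitutes for ''x''.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 2
A = rng.standard_normal((p, p)) + 5 * np.eye(p)   # shift keeps A well-conditioned
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q))
u = rng.standard_normal(p)
v = rng.standard_normal(q)

A_inv = np.linalg.inv(A)
S = D - C @ A_inv @ B                              # Schur complement M/A
y = np.linalg.solve(S, v - C @ (A_inv @ u))        # reduced equation S y = v - C A^{-1} u
x = A_inv @ (u - B @ y)                            # back-substitution

M = np.block([[A, B], [C, D]])
assert np.allclose(M @ np.concatenate([x, y]), np.concatenate([u, v]))
</syntaxhighlight>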
 
This method is useful in electrical engineering to reduce the dimension of a network's equations. It is especially useful when element(s) of the output vector are zero. For example, when <math>u</math> or <math>v</math> is zero, we can eliminate the associated rows of the coefficient matrix without any changes to the rest of the output vector. If <math>v</math> is null then the above equation for <math>x</math> reduces to <math>x = \left(A^{-1} + A^{-1} B S^{-1} C A^{-1}\right) u</math>, thus reducing the dimension of the coefficient matrix while leaving <math>u</math> unmodified. This is used to advantage in electrical engineering where it is referred to as node elimination or [[Kron reduction]].
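
As a concrete illustration of node elimination (Kron reduction), the following sketch eliminates the two interior nodes of a path-graph [[Laplacian matrix]]; the graph and its unit weights are chosen only for this example. The Schur complement is again a Laplacian, with effective weight 1/3 between the two remaining end nodes.

<syntaxhighlight lang="python">
import numpy as np

# Laplacian of the path graph 0 - 1 - 2 - 3 with unit edge weights
L = np.array([[ 1.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  1.0]])

keep = [0, 3]                                      # boundary nodes to keep
elim = [1, 2]                                      # interior nodes to eliminate
A = L[np.ix_(keep, keep)]
B = L[np.ix_(keep, elim)]
C = L[np.ix_(elim, keep)]
D = L[np.ix_(elim, elim)]

L_red = A - B @ np.linalg.inv(D) @ C               # Schur complement L/D (Kron reduction)

assert np.allclose(L_red.sum(axis=1), 0)           # rows still sum to zero, as for a Laplacian
assert np.allclose(L_red, np.array([[ 1/3, -1/3],
                                    [-1/3,  1/3]]))
</syntaxhighlight>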
 
==Applications to probability theory and statistics==
 
Suppose the random column vectors ''X'', ''Y'' live in '''R'''<sup>''n''</sup> and '''R'''<sup>''m''</sup> respectively, and the vector (''X'', ''Y'') in '''R'''<sup>''n'' + ''m''</sup> has a [[multivariate normal distribution]] whose covariance is the symmetric positive-definite matrix
 
:<math>\Sigma = \left[\begin{matrix} A & B \\ B^\mathrm{T} & C\end{matrix}\right],</math>
 
where <math display="inline">A \in \mathbb{R}^{n \times n}</math> is the [[covariance matrix]] of ''X'', <math display="inline">C \in \mathbb{R}^{m \times m}</math> is the covariance matrix of ''Y'' and <math display="inline">B \in \mathbb{R}^{n \times m}</math> is the covariance matrix between ''X'' and ''Y''.
 
Then the [[Conditional variance|conditional covariance]] of ''X'' given ''Y'' is the Schur complement of ''C'' in <math display="inline">\Sigma</math>:<ref name="von Mises 1964">{{cite book |title=Mathematical theory of probability and statistics |url=https://archive.org/details/mathematicaltheo0057vonm |url-access=registration |first=Richard |last=von Mises |year=1964|publisher=Academic Press| chapter=Chapter VIII.9.3|isbn=978-1483255385}}</ref>
 
:<math>\begin{align}
\operatorname{Cov}(X \mid Y) &= A - BC^{-1}B^\mathrm{T} \\
\operatorname{E}(X \mid Y) &= \operatorname{E}(X) + BC^{-1}(Y - \operatorname{E}(Y))
\end{align}</math>
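
The conditioning formulas above can be applied directly once the blocks of <math>\Sigma</math> are known. The following sketch is purely illustrative; the covariance blocks, means and observed value are assumed for the example.

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[2.0, 0.3], [0.3, 1.0]])             # Cov(X)
C = np.array([[1.5, 0.2], [0.2, 1.2]])             # Cov(Y)
B = np.array([[0.4, 0.1], [0.0, 0.3]])             # Cov(X, Y)
mu_X = np.array([1.0, -1.0])
mu_Y = np.array([0.5, 2.0])
y = np.array([1.0, 1.5])                            # observed value of Y

Sigma = np.block([[A, B], [B.T, C]])
assert np.all(np.linalg.eigvalsh(Sigma) > 0)        # joint covariance is positive definite

C_inv = np.linalg.inv(C)
cond_cov = A - B @ C_inv @ B.T                      # Cov(X | Y): Schur complement of C in Sigma
cond_mean = mu_X + B @ C_inv @ (y - mu_Y)           # E(X | Y = y)
</syntaxhighlight>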
 
If we take the matrix <math>\Sigma</math> above to be, not a covariance of a random vector, but a ''sample'' covariance, then it may have a [[Wishart distribution]]. In that case, the Schur complement of ''C'' in <math>\Sigma</math> also has a Wishart distribution.{{Citation needed|date=January 2014}}
 
== Conditions for positive definiteness and semi-definiteness ==
 
Let ''X'' be a [[symmetric matrix]] of real numbers given by
<math display="block">X = \left[\begin{matrix} A & B \\ B^\mathrm{T} & C\end{matrix}\right].</math>
Then by the [[Haynsworth inertia additivity formula]], we find
* If ''A'' is invertible, then ''X'' is positive definite if and only if ''A'' and its complement ''X/A'' are both positive definite:<ref name="Zhang 2005"></ref>{{rp|34}}
:<math display="block">X \succ 0 \Leftrightarrow A \succ 0, X/A = C - B^\mathrm{T} A^{-1} B \succ 0.</math>
* If ''C'' is invertible, then ''X'' is positive definite if and only if ''C'' and its complement ''X/C'' are both positive definite:
:<math display="block">X \succ 0 \Leftrightarrow C \succ 0, X/C = A - B C^{-1} B^\mathrm{T} \succ 0.</math>
* If ''A'' is positive definite, then ''X'' is positive semi-definite if and only if the complement ''X/A'' is positive semi-definite:<ref name="Zhang 2005"/>{{rp|34}}
:<math display="block">\text{If } A \succ 0, \text{ then } X \succeq 0 \Leftrightarrow X/A = C - B^\mathrm{T} A^{-1} B \succeq 0.</math>
* If ''C'' is positive definite, then ''X'' is positive semi-definite if and only if the complement ''X/C'' is positive semi-definite:
:<math display="block">\text{If } C \succ 0,\text{ then } X \succeq 0 \Leftrightarrow X/C = A - B C^{-1} B^\mathrm{T} \succeq 0.</math>
 
The first and third statements can also be derived<ref name="Boyd 2004">Boyd, S. and Vandenberghe, L. (2004), "Convex Optimization", Cambridge University Press (Appendix A.5.5)</ref> by considering the minimizer of the quantity
<math display="block">u^\mathrm{T} A u + 2 v^\mathrm{T} B^\mathrm{T} u + v^\mathrm{T} C v, \,</math>
as a function of ''v'' (for fixed ''u'').
 
Furthermore, since
<math display="block">
\left[\begin{matrix} A & B \\ B^\mathrm{T} & C \end{matrix}\right] \succ 0
\Longleftrightarrow \left[\begin{matrix} C & B^\mathrm{T} \\ B & A \end{matrix}\right] \succ 0
</math>
and similarly for positive semi-definite matrices, the second (respectively fourth) statement is immediate from the first (resp. third) statement.
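
The first statement can be checked numerically, as in the following sketch; the block values are assumed for illustration only. ''X'' is positive definite exactly when ''A'' and ''X''/''A'' both are.

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[0.5], [0.2]])
C = np.array([[2.0]])
X = np.block([[A, B], [B.T, C]])

def is_pd(S):
    """True if the symmetric matrix S is positive definite."""
    return bool(np.all(np.linalg.eigvalsh(S) > 0))

X_over_A = C - B.T @ np.linalg.inv(A) @ B           # Schur complement X/A
assert is_pd(X) == (is_pd(A) and is_pd(X_over_A))
</syntaxhighlight>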
 
There is also a necessary and sufficient condition for the positive semi-definiteness of ''X'' in terms of a generalized Schur complement.<ref name="Zhang 2005" /> Precisely,
* <math>X \succeq 0 \Leftrightarrow A \succeq 0, C - B^\mathrm{T} A^g B \succeq 0, \left(I - A A^{g}\right)B = 0 \, </math> and
* <math>X \succeq 0 \Leftrightarrow C \succeq 0, A - B C^g B^\mathrm{T} \succeq 0, \left(I - C C^g\right)B^\mathrm{T} = 0, </math>
 
where <math>A^g</math> denotes a [[generalized inverse]] of <math>A</math>.
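
For instance, the following sketch checks the second of these criteria on a positive semi-definite ''X'' whose block ''C'' is singular; the blocks are hand-picked for illustration, and the generalized inverse is taken to be the Moore–Penrose pseudoinverse.

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 1.0], [0.0, 0.0]])
C = np.array([[1.0, 1.0], [1.0, 1.0]])              # positive semi-definite but singular
X = np.block([[A, B], [B.T, C]])

C_g = np.linalg.pinv(C)                              # Moore-Penrose generalized inverse
gen_schur = A - B @ C_g @ B.T                        # generalized Schur complement

def is_psd(S, tol=1e-10):
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))

assert is_psd(X)
assert is_psd(C) and is_psd(gen_schur)
assert np.allclose((np.eye(2) - C @ C_g) @ B.T, 0)   # range condition (I - C C^g) B^T = 0
</syntaxhighlight>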
 
== See also ==
* [[Woodbury matrix identity]]
* [[Quasi-Newton method]]
* [[Haynsworth inertia additivity formula]]
* [[Gaussian process]]
* [[Total least squares]]
* [[Guyan reduction]] in computational mechanics
 
== References ==
{{reflist}}
 
[[Category:Linear algebra]]
[[Category:Issai Schur]]