In [[linear algebra]] and the theory of matrices, the '''Schur complement''' of a [[block matrix]] is defined as follows. Suppose ''p'', ''q'' are [[nonnegative integers]] such that ''p'' + ''q'' > 0, and suppose ''A'', ''B'', ''C'', ''D'' are respectively ''p'' × ''p'', ''p'' × ''q'', ''q'' × ''p'', and ''q'' × ''q'' matrices of complex numbers. Let
<math display="block">M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}</math>
so that ''M'' is a (''p'' + ''q'') × (''p'' + ''q'') matrix.

If ''D'' is invertible, then the Schur complement of the block ''D'' of the matrix ''M'' is the ''p'' × ''p'' matrix defined by
<math display="block">M/D := A - BD^{-1}C.</math>
If ''A'' is invertible, the Schur complement of the block ''A'' of the matrix ''M'' is the ''q'' × ''q'' matrix defined by
<math display="block">M/A := D - CA^{-1}B.</math>
In the case that ''A'' or ''D'' is [[singular matrix|singular]], substituting a [[generalized inverse]] for the inverses on ''M/A'' and ''M/D'' yields the '''generalized Schur complement'''.

The Schur complement is named after [[Issai Schur]]<ref name="Schur 1917">{{cite journal
|author=Schur, J.
|title=Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind
|journal=J. reine u. angewandte Mathematik
|volume=147
|year=1917
|pages=205–232
|doi=10.1515/crll.1917.147.205
|url=https://eudml.org/doc/149467
}}
</ref> who used it to prove [[Schur's lemma]], although it had been used previously.<ref name="Zhang 2005">{{cite book |title=The Schur Complement and Its Applications |first=Fuzhen |last=Zhang |editor1-first=Fuzhen |editor1-last=Zhang |series=Numerical Methods and Algorithms |year=2005 |volume=4 |publisher=Springer| isbn=0-387-24271-6 |doi=10.1007/b105056}}</ref> [[Emilie Virginia Haynsworth]] was the first to call it the ''Schur complement''.<ref>Haynsworth, E. V., "On the Schur Complement", ''Basel Mathematical Notes'', #BNB 20, 17 pages, June 1968.</ref> The Schur complement is sometimes referred to as the ''Feshbach map'' after the physicist [[Herman Feshbach]].<ref>{{ cite journal
|author=Feshbach, Herman
|title=Unified theory of nuclear reactions
|journal=Annals of Physics
|volume=5
|issue=4
|year=1958
|pages=357–390
|doi=10.1016/0003-4916(58)90007-1
}}
</ref>
==Background==
The Schur complement arises when performing a block [[Gaussian elimination]] on the matrix ''M''. In order to eliminate the elements below the block diagonal, one multiplies the matrix ''M'' by a ''block lower triangular'' matrix on the right as follows:
<math display="block">\begin{align}
&M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \quad \to \quad \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix} = \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix},
\end{align}</math>
where ''I<sub>p</sub>'' denotes a ''p''×''p'' [[identity matrix]]. As a result, the Schur complement <math>M/D = A - BD^{-1}C</math> appears in the upper-left ''p''×''p'' block.
Continuing the elimination process beyond this point (i.e., performing a block [[Gauss–Jordan elimination]]),
<math display="block">\begin{align}
&\begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix} \quad \to \quad \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix}
= \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix},
\end{align}</math>
leads to an [[LDU decomposition]] of ''M'', which reads
<math display="block">\begin{align}
M &= \begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}\begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix}\begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix}.
\end{align}</math>
Thus, the inverse of ''M'' may be expressed in terms of ''D''<sup>−1</sup> and the inverse of the Schur complement, assuming it exists, as
<math display="block">\begin{align}
M^{-1} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
={} &\left(\begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}
\begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix}
\begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix}
\right)^{-1} \\
={} &\begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix}
\begin{bmatrix} \left(A - BD^{-1}C\right)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix}
\begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \\[4pt]
={} &\begin{bmatrix}
\left(A - BD^{-1}C\right)^{-1} & -\left(A - BD^{-1}C\right)^{-1} BD^{-1} \\
-D^{-1}C\left(A - BD^{-1}C\right)^{-1} & D^{-1} + D^{-1}C\left(A - BD^{-1}C\right)^{-1}BD^{-1}
\end{bmatrix} \\[4pt]
={} &\begin{bmatrix}
\left(M/D\right)^{-1} & -\left(M/D\right)^{-1} B D^{-1} \\
-D^{-1}C\left(M/D\right)^{-1} & D^{-1} + D^{-1}C\left(M/D\right)^{-1} B D^{-1}
\end{bmatrix}.
\end{align}</math>
The above relationship comes from the elimination operations that involve ''D''<sup>−1</sup> and ''M/D''. An equivalent derivation can be done with the roles of ''A'' and ''D'' interchanged. By equating the expressions for ''M''<sup>−1</sup> obtained in these two different ways, one can establish the [[matrix inversion lemma]], which relates the two Schur complements of ''M'': ''M/D'' and ''M/A'' (see ''"Derivation from LDU decomposition"'' in {{slink|Woodbury_matrix_identity#Alternative_proofs}}).
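The decomposition and the block-inverse formula above lend themselves to a direct numerical check. The following is a minimal sketch in Python with NumPy; the random blocks and the diagonal shift keeping ''D'' invertible are illustrative assumptions, not part of the theory:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 3

# Random real blocks (the definition also allows complex entries);
# the diagonal shift keeps D safely invertible.
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 5 * np.eye(q)

M = np.block([[A, B], [C, D]])
M_over_D = A - B @ np.linalg.solve(D, C)  # Schur complement M/D

# Factors of the LDU decomposition derived above.
left = np.block([[np.eye(p), B @ np.linalg.inv(D)],
                 [np.zeros((q, p)), np.eye(q)]])
middle = np.block([[M_over_D, np.zeros((p, q))],
                   [np.zeros((q, p)), D]])
right = np.block([[np.eye(p), np.zeros((p, q))],
                  [np.linalg.inv(D) @ C, np.eye(q)]])
assert np.allclose(left @ middle @ right, M)

# The upper-left block of M^{-1} is (M/D)^{-1}, as in the last formula above.
assert np.allclose(np.linalg.inv(M)[:p, :p], np.linalg.inv(M_over_D))
</syntaxhighlight>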
== Properties ==
* If ''p'' and ''q'' are both 1 (i.e., ''A'', ''B'', ''C'' and ''D'' are all scalars), we get the familiar formula for the inverse of a 2 × 2 matrix:
: <math> M^{-1} = \frac{1}{AD - BC} \begin{bmatrix} D & -B \\ -C & A \end{bmatrix} </math>
: provided that ''AD'' − ''BC'' is non-zero.
* In general, if ''A'' is invertible, then
: <math>\begin{align}
M &= \begin{bmatrix} A&B\\C&D \end{bmatrix} =
\begin{bmatrix} I_p & 0 \\ CA^{-1} & I_q \end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix}\begin{bmatrix} I_p & A^{-1}B \\ 0 & I_q \end{bmatrix}, \\[4pt]
M^{-1} &= \begin{bmatrix} A^{-1} + A^{-1} B (M/A)^{-1} C A^{-1} & - A^{-1} B (M/A)^{-1} \\ - (M/A)^{-1} CA^{-1} & (M/A)^{-1} \end{bmatrix}
\end{align}</math>
: whenever this inverse exists.
* (Schur's formula) When ''A'', respectively ''D'', is invertible, the determinant of ''M'' is given by
: <math>\det(M) = \det(A) \det\left(D - CA^{-1} B\right)</math>, respectively
: <math>\det(M) = \det(D) \det\left(A - BD^{-1} C\right)</math>,
: which generalizes the determinant formula for 2 × 2 matrices (see the numerical sketch after this list).
* (Guttman rank additivity formula) If ''D'' is invertible, then the [[Rank (linear algebra)|rank]] of ''M'' is given by
: <math> \operatorname{rank}(M) = \operatorname{rank}(D) + \operatorname{rank}\left(A - BD^{-1} C\right).</math>
* ([[Haynsworth inertia additivity formula]]) If ''A'' is invertible, then the ''inertia'' of the block matrix ''M'' is equal to the inertia of ''A'' plus the inertia of ''M''/''A''.
* (Quotient identity) <math>A/B = ((A/C)/(B/C))</math>.<ref>{{Cite journal |last1=Crabtree |first1=Douglas E. |last2=Haynsworth |first2=Emilie V. |date=1969 |title=An identity for the Schur complement of a matrix |url=https://www.ams.org/proc/1969-022-02/S0002-9939-1969-0255573-1/ |journal=Proceedings of the American Mathematical Society |language=en |volume=22 |issue=2 |pages=364–366 |doi=10.1090/S0002-9939-1969-0255573-1 |s2cid=122868483 |issn=0002-9939|doi-access=free |url-access=subscription }}</ref>
* The Schur complement of a [[Laplacian matrix]] is also a Laplacian matrix.<ref>{{Cite journal |last=Devriendt |first=Karel |date=2022 |title=Effective resistance is more than distance: Laplacians, Simplices and the Schur complement |url=https://linkinghub.elsevier.com/retrieve/pii/S0024379522000039 |journal=Linear Algebra and Its Applications |language=en |volume=639 |pages=24–49 |doi=10.1016/j.laa.2022.01.002|arxiv=2010.04521 |s2cid=222272289 }}</ref>
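Several of the properties above can be verified numerically. The following minimal sketch in Python with NumPy checks Schur's determinant formula and the Guttman rank additivity formula; the blocks and the diagonal shifts keeping ''A'' and ''D'' invertible are arbitrary illustrative choices:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 3
A = rng.standard_normal((p, p)) + 4 * np.eye(p)
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 4 * np.eye(q)
M = np.block([[A, B], [C, D]])

# Schur's formula: det(M) = det(A) * det(D - C A^{-1} B).
M_over_A = D - C @ np.linalg.solve(A, B)
assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(M_over_A))

# Guttman rank additivity: rank(M) = rank(D) + rank(A - B D^{-1} C).
M_over_D = A - B @ np.linalg.solve(D, C)
assert np.linalg.matrix_rank(M) == (np.linalg.matrix_rank(D)
                                    + np.linalg.matrix_rank(M_over_D))
</syntaxhighlight>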
== Application to solving linear equations ==
The Schur complement arises naturally in solving a [[system of linear equations]] such as<ref name="Boyd 2004" />
<math display="block">
\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} u \\ v \end{bmatrix}.
</math>
Assuming that the submatrix <math>A</math> is invertible, we can eliminate <math>x</math> from the equations as follows:
: <math>x = A^{-1} (u - By).</math>
Substituting this expression into the second equation yields
: <math>\left(D - CA^{-1}B\right)y = v-CA^{-1}u.</math>
We refer to this as the ''reduced equation'' obtained by eliminating <math>x</math> from the original equation. The matrix appearing in the reduced equation is called the Schur complement of the first block <math>A</math> in <math>M</math>:
: <math>S \ \overset{\underset{\mathrm{def}}{}}{=}\ D - CA^{-1}B.</math>
Solving the reduced equation, we obtain
: <math>y = S^{-1} \left(v-CA^{-1}u\right).</math>
Substituting this into the first equation yields
: <math>x = \left(A^{-1} + A^{-1} B S^{-1} C A^{-1}\right) u - A^{-1} B S^{-1} v. </math>
We can express the above two equations as:
: <math>\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix}
A^{-1} + A^{-1} B S^{-1} C A^{-1} & -A^{-1} B S^{-1} \\
-S^{-1} C A^{-1} & S^{-1}
\end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}.
</math>
Therefore, a formulation for the inverse of a block matrix is:
: <math>
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} =
\begin{bmatrix}
A^{-1} + A^{-1} B S^{-1} C A^{-1} & - A^{-1} B S^{-1} \\
-S^{-1} C A^{-1} & S^{-1}
\end{bmatrix} = \begin{bmatrix}
I_p & -A^{-1}B\\
& I_q
\end{bmatrix}\begin{bmatrix}
A^{-1} & \\
& S^{-1}
\end{bmatrix}\begin{bmatrix}
I_p & \\
-CA^{-1} & I_q
\end{bmatrix}.
</math>
In particular, we see that the Schur complement <math>S</math> is the inverse of the <math>(2,2)</math> block entry of the inverse of <math>M</math>.
In practice, one needs <math>A</math> to be [[condition number|well-conditioned]] in order for this algorithm to be numerically accurate.
This method is useful in electrical engineering to reduce the dimension of a network's equations. It is especially useful when elements of the output vector are zero. For example, when <math>u</math> or <math>v</math> is zero, we can eliminate the associated rows of the coefficient matrix without any changes to the rest of the output vector. If <math>v</math> is null then the above equation for <math>x</math> reduces to <math>x = \left(A^{-1} + A^{-1} B S^{-1} C A^{-1}\right) u</math>, thus reducing the dimension of the coefficient matrix while leaving <math>u</math> unmodified. In electrical engineering this is referred to as ''node elimination'' or [[Kron reduction]].
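The elimination above translates directly into code. A minimal sketch in Python with NumPy follows; the random data and the diagonal shift keeping <math>A</math> invertible and well-conditioned are illustrative assumptions:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 2
A = rng.standard_normal((p, p)) + 4 * np.eye(p)  # assumed invertible, well-conditioned
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q))
u = rng.standard_normal(p)
v = rng.standard_normal(q)

# Reduced (Schur complement) system: S y = v - C A^{-1} u.
S = D - C @ np.linalg.solve(A, B)
y = np.linalg.solve(S, v - C @ np.linalg.solve(A, u))

# Back-substitution: x = A^{-1} (u - B y).
x = np.linalg.solve(A, u - B @ y)

# Agrees with solving the full (p + q)-dimensional system directly.
xy = np.linalg.solve(np.block([[A, B], [C, D]]), np.concatenate([u, v]))
assert np.allclose(np.concatenate([x, y]), xy)
</syntaxhighlight>

Only systems of size ''p'' and ''q'' are factored here, rather than one of size ''p'' + ''q'', which is the source of the dimension reduction exploited in node elimination.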
==Applications to probability theory and statistics==
Suppose the random column vectors ''X'', ''Y'' live in '''R'''<sup>''n''</sup> and '''R'''<sup>''m''</sup> respectively, and the vector (''X'', ''Y'') in '''R'''<sup>''n'' + ''m''</sup> has a [[multivariate normal distribution]] whose covariance is the symmetric positive-definite matrix
:<math>\Sigma = \left[\begin{matrix} A & B \\ B^\mathrm{T} & C\end{matrix}\right],</math>
where <math display="inline">A \in \mathbb{R}^{n \times n}</math> is the [[covariance matrix]] of ''X'', <math display="inline">C \in \mathbb{R}^{m \times m}</math> is the covariance matrix of ''Y'' and <math display="inline">B \in \mathbb{R}^{n \times m}</math> is the covariance matrix between ''X'' and ''Y''.
Then the [[Conditional variance|conditional covariance]] of ''X'' given ''Y'' is the Schur complement of ''C'' in <math display="inline">\Sigma</math>, and the conditional mean is obtained from the same blocks:<ref name="von Mises 1964">{{cite book |title=Mathematical theory of probability and statistics |url=https://archive.org/details/mathematicaltheo0057vonm |url-access=registration |first=Richard |last=von Mises |year=1964|publisher=Academic Press| chapter=Chapter VIII.9.3|isbn=978-1483255385}}</ref>
:<math>\begin{align}
\operatorname{Cov}(X \mid Y) &= A - BC^{-1}B^\mathrm{T} \\
\operatorname{E}(X \mid Y) &= \operatorname{E}(X) + BC^{-1}(Y - \operatorname{E}(Y))
\end{align}</math>
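As a cross-check, the block-inverse formula from the ''Background'' section implies that this conditional covariance equals the inverse of the ''X''-block of the precision matrix <math>\Sigma^{-1}</math>. A minimal sketch in Python with NumPy, where the random positive-definite covariance is an arbitrary illustrative choice:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n, m = 2, 3

# Build a random symmetric positive-definite covariance with blocks A, B, C.
G = rng.standard_normal((n + m, n + m))
Sigma = G @ G.T + (n + m) * np.eye(n + m)
A, B, C = Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:]

# Conditional covariance of X given Y: the Schur complement of C in Sigma.
cond_cov = A - B @ np.linalg.solve(C, B.T)

# It equals the inverse of the X-block of the precision matrix.
precision = np.linalg.inv(Sigma)
assert np.allclose(cond_cov, np.linalg.inv(precision[:n, :n]))
</syntaxhighlight>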
If we take the matrix <math>\Sigma</math> above to be, not a covariance of a random vector, but a ''sample'' covariance, then it may have a [[Wishart distribution]]. In that case, the Schur complement of ''C'' in <math>\Sigma</math> also has a Wishart distribution.{{Citation needed|date=January 2014}}
== Conditions for positive definiteness and semi-definiteness ==
Let ''X'' be a [[symmetric matrix]] of real numbers given by
<math display="block">X = \left[\begin{matrix} A & B \\ B^\mathrm{T} & C\end{matrix}\right].</math>
Then by the [[Haynsworth inertia additivity formula]], we find
* If ''A'' is invertible, then ''X'' is positive definite if and only if ''A'' and its complement ''X/A'' are both positive definite:<ref name="Zhang 2005"></ref>{{rp|34}}
:<math display="block">X \succ 0 \Leftrightarrow A \succ 0, X/A = C - B^\mathrm{T} A^{-1} B \succ 0.</math>
* If ''C'' is invertible, then ''X'' is positive definite if and only if ''C'' and its complement ''X/C'' are both positive definite:
:<math display="block">X \succ 0 \Leftrightarrow C \succ 0, X/C = A - B C^{-1} B^\mathrm{T} \succ 0.</math>
* If ''A'' is positive definite, then ''X'' is positive semi-definite if and only if the complement ''X/A'' is positive semi-definite:<ref name="Zhang 2005"/>{{rp|34}}
:<math display="block">\text{If } A \succ 0, \text{ then } X \succeq 0 \Leftrightarrow X/A = C - B^\mathrm{T} A^{-1} B \succeq 0.</math>
* If ''C'' is positive definite, then ''X'' is positive semi-definite if and only if the complement ''X/C'' is positive semi-definite:
:<math display="block">\text{If } C \succ 0,\text{ then } X \succeq 0 \Leftrightarrow X/C = A - B C^{-1} B^\mathrm{T} \succeq 0.</math>
The first and third statements can also be derived<ref name="Boyd 2004">Boyd, S. and Vandenberghe, L. (2004), "Convex Optimization", Cambridge University Press (Appendix A.5.5)</ref> by considering the minimizer of the quantity
<math display="block">u^\mathrm{T} A u + 2 v^\mathrm{T} B^\mathrm{T} u + v^\mathrm{T} C v, \,</math>
as a function of ''v'' (for fixed ''u'').
Furthermore, since
<math display="block">
\left[\begin{matrix} A & B \\ B^\mathrm{T} & C \end{matrix}\right] \succ 0
\Longleftrightarrow \left[\begin{matrix} C & B^\mathrm{T} \\ B & A \end{matrix}\right] \succ 0
</math>
and similarly for positive semi-definite matrices, the second (respectively fourth) statement is immediate from the first (resp. third) statement.
There is also a sufficient and necessary condition for the positive semi-definiteness of ''X'' in terms of a generalized Schur complement.<ref name="Zhang 2005" /> Precisely,
* <math>X \succeq 0 \Leftrightarrow A \succeq 0, C - B^\mathrm{T} A^g B \succeq 0, \left(I - A A^{g}\right)B = 0 \, </math> and
* <math>X \succeq 0 \Leftrightarrow C \succeq 0, A - B C^g B^\mathrm{T} \succeq 0, \left(I - C C^g\right)B^\mathrm{T} = 0, </math>
where <math>A^g</math> denotes a [[generalized inverse]] of <math>A</math>.
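The first positive-definiteness criterion above yields a practical test that only works with the two diagonal blocks rather than the full matrix. A minimal sketch in Python with NumPy; the helper name and the test matrix are illustrative choices:

<syntaxhighlight lang="python">
import numpy as np

def is_pd_via_schur(A, B, C, tol=1e-10):
    """Test whether X = [[A, B], [B^T, C]] is positive definite using
    X > 0  <=>  A > 0 and X/A = C - B^T A^{-1} B > 0."""
    if np.min(np.linalg.eigvalsh(A)) <= tol:
        return False
    X_over_A = C - B.T @ np.linalg.solve(A, B)
    return np.min(np.linalg.eigvalsh(X_over_A)) > tol

rng = np.random.default_rng(4)
n, m = 3, 2
G = rng.standard_normal((n + m, n + m))
X = G @ G.T + 1e-3 * np.eye(n + m)  # positive definite by construction
A, B, C = X[:n, :n], X[:n, n:], X[n:, n:]

# Agrees with a direct eigenvalue test on the full matrix.
assert is_pd_via_schur(A, B, C) == bool(np.min(np.linalg.eigvalsh(X)) > 0)
</syntaxhighlight>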
== See also ==
* [[Woodbury matrix identity]]
* [[Quasi-Newton method]]
* [[Haynsworth inertia additivity formula]]
* [[Gaussian process]]
* [[Total least squares]]
* [[Guyan reduction]] in computational mechanics
== References ==
{{reflist}}
[[Category:Linear algebra]]
[[Category:Issai Schur]]