{{Short description|Formula for the pseudoinverse of a partitioned matrix}}
{{refimprove|date=December 2010}}
In [[mathematics]], a '''block matrix pseudoinverse''' is a formula for the [[pseudoinverse]] of a [[partitioned matrix]]. This is useful for decomposing or approximating many algorithms that update parameters in [[signal processing]] and are based on the [[least squares]] method.
 
== Derivation ==
Consider a column-wise partitioned matrix:
:<math>
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}, \quad
\mathbf A \in \reals^{m \times n}, \quad
\mathbf B \in \reals^{m \times p}, \quad
m \geq n + p.
</math>

If the above matrix is full column rank, the [[Moore–Penrose inverse]] matrices of it and its transpose are
:<math>\begin{align}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+ &=
\left(
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\right)^{-1} \begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}, \\
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+ &=
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix} \left(
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\right)^{-1}.
\end{align}</math>

This computation of the pseudoinverse requires (''n''&nbsp;+&nbsp;''p'')-square matrix inversion and does not take advantage of the block form.
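
For illustration, the normal-equations form above can be checked against an SVD-based pseudoinverse. The following is a minimal NumPy sketch (an illustrative check, not from the cited literature); the dimensions and random blocks are arbitrary choices that make <math>\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}</math> full column rank with probability one.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 12, 4, 3                 # m >= n + p
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
AB = np.hstack([A, B])             # column-wise partitioned matrix [A B]

# Pseudoinverse via the normal equations: ([A B]^T [A B])^{-1} [A B]^T
pinv_normal = np.linalg.solve(AB.T @ AB, AB.T)

# Agrees with the SVD-based pseudoinverse
assert np.allclose(pinv_normal, np.linalg.pinv(AB))
</syntaxhighlight>
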
To reduce computational costs to ''n''- and ''p''-square matrix inversions and to introduce parallelism, treating the blocks separately, one derives <ref name=Baksalary>{{cite journal|author=[[Jerzy Baksalary|J.K. Baksalary]] and O.M. Baksalary|title=Particular formulae for the Moore–Penrose inverse of a columnwise partitioned matrix|journal=Linear Algebra Appl.|volume=421|date=2007|pages=16–23|doi=10.1016/j.laa.2006.03.031|doi-access=}}</ref>
:<math>\begin{align}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+ &=
\begin{bmatrix}
\mathbf P_B^\perp \mathbf A\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1} \\
\mathbf P_A^\perp \mathbf B\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}
\end{bmatrix} =
\begin{bmatrix}
\left(\mathbf P_B^\perp\mathbf A\right)^+ \\
\left(\mathbf P_A^\perp\mathbf B\right)^+
\end{bmatrix}, \\
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+ &=
\begin{bmatrix}
\mathbf P_B^\perp \mathbf A\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1},\quad
\mathbf P_A^\perp \mathbf B\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}
\end{bmatrix} =
\begin{bmatrix}
\left(\mathbf A^\textsf{T} \mathbf P_B^\perp\right)^+ &
\left(\mathbf B^\textsf{T} \mathbf P_A^\perp\right)^+
\end{bmatrix},
\end{align}</math>
 
where [[orthogonal projection]] matrices are defined by
::<math>\begin{align}
\mathbf P_A^\perp &= \mathbf I - \mathbf A \left(\mathbf A^\textsf{T} \mathbf A\right)^{-1} \mathbf A^\textsf{T}, \\
\mathbf P_B^\perp &= \mathbf I - \mathbf B \left(\mathbf B^\textsf{T} \mathbf B\right)^{-1} \mathbf B^\textsf{T}.
\end{align}</math>
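
For illustration, the decomposition can be verified numerically. A minimal NumPy sketch (an illustrative check, not from the cited literature), with arbitrary dimensions and random full-column-rank blocks:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
I = np.eye(m)

# Projectors onto the orthogonal complements of range(A) and range(B)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# The block pseudoinverse stacks the pseudoinverses of the projected blocks
block_pinv = np.vstack([np.linalg.pinv(P_B @ A),
                        np.linalg.pinv(P_A @ B)])

assert np.allclose(block_pinv, np.linalg.pinv(np.hstack([A, B])))
</syntaxhighlight>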
 
The above formulas are not necessarily valid if <math>\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}</math> does not have full rank – for example, if <math>\mathbf A \neq 0</math>, then
:<math>
\begin{bmatrix}\mathbf A & \mathbf A\end{bmatrix}^+ =
\frac{1}{2}\begin{bmatrix}
\mathbf A^+ \\
\mathbf A^+
\end{bmatrix} \neq
\begin{bmatrix}
\left(\mathbf P_A^\perp\mathbf A\right)^+ \\
\left(\mathbf P_A^\perp\mathbf A\right)^+
\end{bmatrix} = 0.
</math>
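
A quick numerical demonstration of this failure mode, again as an illustrative NumPy sketch with an arbitrary random <math>\mathbf A</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))
I = np.eye(6)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)

AA = np.hstack([A, A])             # rank deficient: the same columns twice
assert np.allclose(np.linalg.pinv(AA),
                   0.5 * np.vstack([np.linalg.pinv(A)] * 2))

# P_A^perp annihilates A, so (P_A^perp A)^+ is the zero matrix, not A^+/2
assert np.allclose(P_A @ A, 0)
</syntaxhighlight>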
 
=== Column-wise partitioning in over-determined least squares ===
 
Suppose a solution <math>\mathbf x = \begin{bmatrix}
\mathbf x_1 \\
\mathbf x_2
\end{bmatrix}</math> solves an over-determined system:
:<math>
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\begin{bmatrix}
\mathbf x_1 \\
\mathbf x_2
\end{bmatrix} =
\mathbf d, \quad
\mathbf d \in \reals^{m \times 1}.
</math>
 
Using the block matrix pseudoinverse, we have
:<math>\mathbf x =
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+\,\mathbf d =
\begin{bmatrix}
\left(\mathbf P_B^\perp \mathbf A\right)^+ \\
\left(\mathbf P_A^\perp \mathbf B\right)^+
\end{bmatrix}\mathbf d.
</math>
 
Therefore, we have a decomposed solution:
:<math>
\mathbf x_1 = \left(\mathbf P_B^\perp \mathbf A\right)^+\,\mathbf d, \quad
\mathbf x_2 = \left(\mathbf P_A^\perp \mathbf B\right)^+\,\mathbf d.
</math>
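
The decomposed solution can be compared with an ordinary least-squares solver; a minimal NumPy sketch under the same assumptions (arbitrary dimensions, random full-column-rank blocks):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
d = rng.standard_normal(m)
I = np.eye(m)

P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# Each unknown block is recovered from its own projected subproblem
x1 = np.linalg.pinv(P_B @ A) @ d
x2 = np.linalg.pinv(P_A @ B) @ d

x_ref, *_ = np.linalg.lstsq(np.hstack([A, B]), d, rcond=None)
assert np.allclose(np.concatenate([x1, x2]), x_ref)
</syntaxhighlight>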
 
=== Row-wise partitioning in under-determined least squares ===
 
Suppose a solution <math> \mathbf x </math> solves an under-determined system:
:<math>
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}\mathbf x =
\begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix}, \quad
\mathbf e \in \reals^{n \times 1}, \quad
\mathbf f \in \reals^{p \times 1}.
</math>
 
The minimum-norm solution is given by
:<math>\mathbf x =
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+\,
\begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix}.
</math>
 
Using the block matrix pseudoinverse, we have
:<math>
\mathbf x = \begin{bmatrix}
\left(\mathbf A^\textsf{T}\mathbf P_B^\perp\right)^+ &
\left(\mathbf B^\textsf{T}\mathbf P_A^\perp\right)^+
\end{bmatrix} \begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix} =
\left(\mathbf A^\textsf{T}\mathbf P_B^\perp\right)^+\,\mathbf e +
\left(\mathbf B^\textsf{T}\mathbf P_A^\perp\right)^+\,\mathbf f.
</math>
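
Likewise, the block-wise minimum-norm solution can be checked against the pseudoinverse of the stacked matrix; a minimal NumPy sketch with arbitrary dimensions and random blocks:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
e = rng.standard_normal(n)
f = rng.standard_normal(p)
I = np.eye(m)

P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# Minimum-norm solution assembled block by block
x = np.linalg.pinv(A.T @ P_B) @ e + np.linalg.pinv(B.T @ P_A) @ f

x_ref = np.linalg.pinv(np.vstack([A.T, B.T])) @ np.concatenate([e, f])
assert np.allclose(x, x_ref)
</syntaxhighlight>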
 
== Comments on matrix inversion ==
 
Instead of <math>\left(\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T} \begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}\right)^{-1}</math>, we need to calculate directly or indirectly{{citation needed|date=December 2010}}{{original research?|date=December 2010}}

:<math>
\left(\mathbf A^\textsf{T} \mathbf A\right)^{-1}, \quad
\left(\mathbf B^\textsf{T} \mathbf B\right)^{-1}, \quad
\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1}, \quad
\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}.
</math>
 
In a dense and small system, we can use [[singular value decomposition]], [[QR decomposition]], or [[Cholesky decomposition]] to replace the matrix inversions with numerical routines. In a large system, we may employ [[iterative methods]] such as Krylov subspace methods.
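
As a sketch of replacing an explicit inversion with a factorization (an illustrative choice, not a prescription from the cited literature), the normal equations <math>\left(\mathbf A^\textsf{T} \mathbf A\right)\mathbf x = \mathbf A^\textsf{T}\mathbf d</math> can be solved with a Cholesky factorization via SciPy:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(5)
m, n = 12, 4
A = rng.standard_normal((m, n))
d = rng.standard_normal(m)

# A^T A is symmetric positive definite when A has full column rank,
# so we factor it once instead of forming (A^T A)^{-1} explicitly.
c, low = cho_factor(A.T @ A)
x = cho_solve((c, low), A.T @ d)

assert np.allclose(x, np.linalg.lstsq(A, d, rcond=None)[0])
</syntaxhighlight>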
 
Considering [[parallel algorithms]], we can compute <math>\left(\mathbf A^\textsf{T} \mathbf A\right)^{-1}</math> and <math>\left(\mathbf B^\textsf{T} \mathbf B\right)^{-1}</math> in parallel, and then compute <math>\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1}</math> and <math>\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}</math>, also in parallel.
 
== Proof of block matrix inversion ==
Let a block matrix be
:<math>\begin{bmatrix}
A & B \\
C & D
\end{bmatrix}
.</math>
We can get an inverse formula by combining the previous results.<ref>[http://ccrma.stanford.edu/~jos/lattice/Block_matrix_decompositions.html Block matrix decompositions]</ref>
:<math>\begin{bmatrix}
A & B \\
C & D
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(A - BD^{-1}C)^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\
-D^{-1}C(A - BD^{-1}C)^{-1} & (D - CA^{-1}B)^{-1}
\end{bmatrix}
=
\begin{bmatrix}
S^{-1}_D & -A^{-1}BS^{-1}_A \\
-D^{-1}CS^{-1}_D & S^{-1}_A
\end{bmatrix}
,</math>
where <math>S_A</math> and <math>S_D</math> are, respectively, the [[Schur complement]]s of <math>A</math> and <math>D</math>, defined by <math>S_A = D - C A^{-1}B</math> and <math>S_D = A - BD^{-1}C</math>. This relation is derived by using block triangular decomposition. It is called ''simple block matrix inversion''.<ref>[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=30419&arnumber=1399280&count=249&index=181 S. Jo, S. W. Kim and T. J. Park, "Equally constrained affine projection algorithm," ''in Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers,'' vol. 1, pp. 955–959, Nov. 7–10, 2004.]</ref>
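
For illustration, the simple block matrix inversion formula can be verified numerically; in the following NumPy sketch the diagonal shifts are an arbitrary device to keep <math>A</math>, <math>D</math>, and both Schur complements invertible:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
k = 3
A = rng.standard_normal((k, k)) + 3 * np.eye(k)
B = rng.standard_normal((k, k))
C = rng.standard_normal((k, k))
D = rng.standard_normal((k, k)) + 3 * np.eye(k)

Ai, Di = np.linalg.inv(A), np.linalg.inv(D)
S_A = D - C @ Ai @ B               # Schur complement of A
S_D = A - B @ Di @ C               # Schur complement of D
S_Ai, S_Di = np.linalg.inv(S_A), np.linalg.inv(S_D)

M_inv = np.block([[S_Di,           -Ai @ B @ S_Ai],
                  [-Di @ C @ S_Di,  S_Ai         ]])

M = np.block([[A, B], [C, D]])
assert np.allclose(M_inv, np.linalg.inv(M))
</syntaxhighlight>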
 
Now we can obtain the inverse of the symmetric block matrix:
:<math>
\begin{bmatrix}
\mathbf A^T \mathbf A & \mathbf A^T \mathbf B \\
\mathbf B^T \mathbf A & \mathbf B^T \mathbf B
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(\mathbf A^T \mathbf A-\mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A)^{-1}
& -(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B(\mathbf B^T \mathbf B-\mathbf B^T \mathbf A(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B)^{-1}
\\
-(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A(\mathbf A^T \mathbf A-\mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A)^{-1}
& (\mathbf B^T \mathbf B-\mathbf B^T \mathbf A(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B)^{-1}
\end{bmatrix}
</math>
:::<math>
=
\begin{bmatrix}
(\mathbf A^T \mathbf P_B^\perp \mathbf A)^{-1}
& -(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B(\mathbf B^T \mathbf P_A^\perp \mathbf B)^{-1}
\\
-(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A(\mathbf A^T \mathbf P_B^\perp \mathbf A)^{-1}
& (\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}
\end{bmatrix}
</math>
Since the block matrix is symmetric, we also have
:<math>
\begin{bmatrix}
\mathbf A^T \mathbf A & \mathbf A^T \mathbf B \\
\mathbf B^T \mathbf A & \mathbf B^T \mathbf B
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(\mathbf A^T \mathbf P_B^{\perp} \mathbf A)^{-1}
&
-(\mathbf A^T \mathbf P_B^{\perp} \mathbf A)^{-1}
\mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1}
\\
-(\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}
\mathbf B^T \mathbf A (\mathbf A^T \mathbf A)^{-1}
& (\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}
\end{bmatrix}.
</math>
 
Then, we can see how the Schur complements are connected to the projection matrices of the symmetric, partitioned matrix.
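
This connection can also be checked numerically; a minimal NumPy sketch (arbitrary dimensions, random full-column-rank blocks) confirming that the diagonal blocks of the inverted Gram matrix are the inverted projected Gram matrices:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(7)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
I = np.eye(m)

P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

G_inv = np.linalg.inv(np.hstack([A, B]).T @ np.hstack([A, B]))

# Diagonal blocks equal the inverses of the projected Gram matrices
assert np.allclose(G_inv[:n, :n], np.linalg.inv(A.T @ P_B @ A))
assert np.allclose(G_inv[n:, n:], np.linalg.inv(B.T @ P_A @ B))
</syntaxhighlight>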
 
== See also ==
* {{section link|Invertible matrix|Blockwise inversion}}
 
== References ==
{{reflist}}

== External links ==
* [http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html The Matrix Reference Manual] by [http://www.ee.ic.ac.uk/hp/staff/dmb/dmb.html Mike Brookes]
* [https://web.archive.org/web/20060414125709/http://www.csit.fsu.edu/~burkardt/papers/linear_glossary.html Linear Algebra Glossary] by [http://www.csit.fsu.edu/~burkardt/ John Burkardt]
* [http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf The Matrix Cookbook] by [http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274 Kaare Brandt Petersen]
* [http://see.stanford.edu/materials/lsoeldsee263/08-min-norm.pdf Lecture 8: Least-norm solutions of undetermined equations] by [https://stanford.edu/~boyd/ Stephen P. Boyd]
 
{{Numerical linear algebra}}