Block matrix pseudoinverse
{{Short description|Formula for the pseudoinverse of a partitioned matrix}}
{{refimprove|date=December 2010}}
In [[mathematics]], a '''block matrix pseudoinverse''' is a formula for the [[pseudoinverse]] of a [[partitioned matrix]]. It is useful for decomposing or approximating many parameter-update algorithms in [[signal processing]] that are based on the [[least squares]] method.
== Derivation ==
Consider a column-wise partitioned matrix:
:<math>
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix},\quad
\mathbf A \in \reals^{m \times n},\quad
\mathbf B \in \reals^{m \times p},\quad
m \geq n + p.
</math>

If the above matrix is full column rank, the [[Moore–Penrose inverse]] matrices of it and its transpose are
:<math>\begin{align}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+ &=
\left(
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\right)^{-1} \begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}, \\
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+ &=
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix} \left(
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\right)^{-1}.
\end{align}</math>
 
This computation of the pseudoinverse requires (''n''&nbsp;+&nbsp;''p'')-square matrix inversion and does not take advantage of the block form.
 
To reduce computational costs to ''n''- and ''p''-square matrix inversions and to introduce parallelism, treating the blocks separately, one derives <ref name=Baksalary>{{cite journal|author=[[Jerzy Baksalary|J.K. Baksalary]] and O.M. Baksalary|title=Particular formulae for the Moore–Penrose inverse of a columnwise partitioned matrix|journal=Linear Algebra Appl.|volume=421|date=2007|pages=16–23|doi=10.1016/j.laa.2006.03.031|doi-access=}}</ref>
:<math>\begin{align}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+ &=
\begin{bmatrix}
\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1} \mathbf A^\textsf{T} \mathbf P_B^\perp \\
\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1} \mathbf B^\textsf{T} \mathbf P_A^\perp
\end{bmatrix} =
\begin{bmatrix}
\left(\mathbf P_B^\perp \mathbf A\right)^+ \\
\left(\mathbf P_A^\perp \mathbf B\right)^+
\end{bmatrix}, \\
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+ &=
\begin{bmatrix}
\mathbf P_B^\perp \mathbf A\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1} &
\mathbf P_A^\perp \mathbf B\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}
\end{bmatrix} =
\begin{bmatrix}
\left(\mathbf A^\textsf{T} \mathbf P_B^\perp\right)^+ &
\left(\mathbf B^\textsf{T} \mathbf P_A^\perp\right)^+
\end{bmatrix},
\end{align}</math>
where [[orthogonal projection]] matrices are defined by
::<math>\begin{align}
\mathbf P_A^\perp &= \mathbf I - \mathbf A \left(\mathbf A^\textsf{T} \mathbf A\right)^{-1} \mathbf A^\textsf{T}, \\
\mathbf P_B^\perp &= \mathbf I - \mathbf B \left(\mathbf B^\textsf{T} \mathbf B\right)^{-1} \mathbf B^\textsf{T}.
\end{align}</math>
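Using these projectors, the column-wise formula can be checked numerically. The following is a minimal sketch assuming NumPy is available; the dimensions and random test matrices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 7, 2, 3                       # m >= n + p
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))          # [A, B] is full column rank a.s.

I = np.eye(m)
# Projectors onto the orthogonal complements of range(A) and range(B)
P_A_perp = I - A @ np.linalg.inv(A.T @ A) @ A.T
P_B_perp = I - B @ np.linalg.inv(B.T @ B) @ B.T

# Block formula: stack the pseudoinverses of the projected blocks
block_pinv = np.vstack([
    np.linalg.pinv(P_B_perp @ A),        # (P_B^perp A)^+
    np.linalg.pinv(P_A_perp @ B),        # (P_A^perp B)^+
])

# Direct pseudoinverse of the partitioned matrix, for comparison
direct_pinv = np.linalg.pinv(np.hstack([A, B]))
```

The two results agree to numerical precision whenever the partitioned matrix has full column rank.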
 
The above formulas are not necessarily valid if <math>\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}</math> does not have full rank – for example, if <math>\mathbf A \neq 0</math>, then
:<math>
\begin{bmatrix}\mathbf A & \mathbf A\end{bmatrix}^+ =
\frac{1}{2}\begin{bmatrix}
\mathbf A^+ \\
\mathbf A^+
\end{bmatrix} \neq
\begin{bmatrix}
\left(\mathbf P_A^\perp \mathbf A\right)^+ \\
\left(\mathbf P_A^\perp \mathbf A\right)^+
\end{bmatrix} = 0
</math>
 
=== Column-wise partitioning in over-determined least squares ===
 
Suppose a solution <math>\mathbf x = \begin{bmatrix}
\mathbf x_1 \\
\mathbf x_2
\end{bmatrix}</math> solves an over-determined system:
:<math>
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\begin{bmatrix}
\mathbf x_1 \\
\mathbf x_2
\end{bmatrix} =
\mathbf d, \qquad \mathbf d \in \reals^{m \times 1}.
</math>
 
Using the block matrix pseudoinverse, we have
:<math>\mathbf x =
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+\,\mathbf d =
\begin{bmatrix}
\left(\mathbf P_B^\perp \mathbf A\right)^+ \\
\left(\mathbf P_A^\perp \mathbf B\right)^+
\end{bmatrix}\mathbf d.
</math>
 
Therefore, we have a decomposed solution:
:<math>
\mathbf x_1 = \left(\mathbf P_B^\perp \mathbf A\right)^+\,\mathbf d, \qquad
\mathbf x_2 = \left(\mathbf P_A^\perp \mathbf B\right)^+\,\mathbf d.
</math>
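The decomposed solution can be compared against the direct least-squares solution numerically. A minimal sketch, assuming NumPy, with illustrative dimensions and random data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 8, 3, 2                        # over-determined: m >= n + p
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
d = rng.standard_normal(m)

I = np.eye(m)
P_A_perp = I - A @ np.linalg.inv(A.T @ A) @ A.T
P_B_perp = I - B @ np.linalg.inv(B.T @ B) @ B.T

# Each block of the solution is recovered independently
x1 = np.linalg.pinv(P_B_perp @ A) @ d    # x1 = (P_B^perp A)^+ d
x2 = np.linalg.pinv(P_A_perp @ B) @ d    # x2 = (P_A^perp B)^+ d

# Reference: least-squares solution through the full pseudoinverse
x_ref = np.linalg.pinv(np.hstack([A, B])) @ d
```

Because the two blocks are computed from separate pseudoinverses, they could be evaluated on separate processors.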
 
=== Row-wise partitioning in under-determined least squares ===
 
Suppose a solution <math> \mathbf x </math> solves an under-determined system:
:<math>
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}\mathbf x =
\begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix}, \qquad \mathbf e \in \reals^{n \times 1}, \qquad \mathbf f \in \reals^{p \times 1}.
</math>
 
The minimum-norm solution is given by
:<math>\mathbf x =
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+\,
\begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix}.
</math>
 
Using the block matrix pseudoinverse, we have
:<math>
\mathbf x = \begin{bmatrix}
\left(\mathbf A^\textsf{T} \mathbf P_B^\perp\right)^+ &
\left(\mathbf B^\textsf{T} \mathbf P_A^\perp\right)^+
\end{bmatrix} \begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix} =
\left(\mathbf A^\textsf{T} \mathbf P_B^\perp\right)^+\,\mathbf e +
\left(\mathbf B^\textsf{T} \mathbf P_A^\perp\right)^+\,\mathbf f.
</math>
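This minimum-norm formula can also be checked against the pseudoinverse of the stacked matrix. A minimal sketch assuming NumPy, with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 8, 3, 2                        # under-determined: n + p <= m
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
e = rng.standard_normal(n)
f = rng.standard_normal(p)

I = np.eye(m)
P_A_perp = I - A @ np.linalg.inv(A.T @ A) @ A.T
P_B_perp = I - B @ np.linalg.inv(B.T @ B) @ B.T

# Minimum-norm solution assembled from the two blocks
x = np.linalg.pinv(A.T @ P_B_perp) @ e + np.linalg.pinv(B.T @ P_A_perp) @ f

# Reference: pseudoinverse of the stacked matrix [A^T; B^T]
x_ref = np.linalg.pinv(np.vstack([A.T, B.T])) @ np.concatenate([e, f])
```

The two contributions to <code>x</code> depend only on <code>e</code> and <code>f</code> respectively, so they can be formed independently and summed.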
 
== Comments on matrix inversion ==
 
Instead of <math>\left(\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T} \begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}\right)^{-1}</math>, we need to calculate directly or indirectly{{citation needed|date=December 2010}}{{original research?|date=December 2010}}
 
:<math>
\left(\mathbf A^\textsf{T} \mathbf A\right)^{-1},\quad
\left(\mathbf B^\textsf{T} \mathbf B\right)^{-1},\quad
\left(\mathbf A^\textsf{T} \mathbf P_B^{\perp} \mathbf A\right)^{-1},\quad
\left(\mathbf B^\textsf{T} \mathbf P_A^{\perp} \mathbf B\right)^{-1}.
</math>
 
In a dense and small system, we can use [[singular value decomposition]], [[QR decomposition]], or [[Cholesky decomposition]] to replace the matrix inversions with numerical routines. In a large system, we may employ [[iterative methods]] such as Krylov subspace methods.
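For example, an explicit Gram-matrix inverse can be replaced by a [[Cholesky decomposition]] followed by two triangular solves. A minimal sketch assuming NumPy; the dense <code>solve</code> calls stand in for dedicated triangular-solve routines:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 9, 4
A = rng.standard_normal((m, n))          # full column rank a.s.
d = rng.standard_normal(m)

# Normal equations A^T A x = A^T d, solved without forming (A^T A)^{-1}
G = A.T @ A                              # Gram matrix, SPD for full column rank
L = np.linalg.cholesky(G)                # G = L L^T
y = np.linalg.solve(L, A.T @ d)          # forward solve: L y = A^T d
x = np.linalg.solve(L.T, y)              # back solve:    L^T x = y
```

This avoids ever materializing the inverse, which is both cheaper and numerically safer.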
 
Considering [[parallel algorithms]], we can compute <math>\left(\mathbf A^\textsf{T} \mathbf A\right)^{-1}</math> and <math>\left(\mathbf B^\textsf{T} \mathbf B\right)^{-1}</math> in parallel, and then compute <math>\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1}</math> and <math>\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}</math>, also in parallel.
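This two-stage parallel pattern can be sketched with a thread pool; names and sizes below are illustrative, assuming NumPy:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(4)
m, n, p = 8, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

def gram_inverse(M):
    """Inverse of the Gram matrix M^T M (M assumed full column rank)."""
    return np.linalg.inv(M.T @ M)

# Stage 1: (A^T A)^{-1} and (B^T B)^{-1} are independent tasks
with ThreadPoolExecutor(max_workers=2) as pool:
    AtA_inv, BtB_inv = pool.map(gram_inverse, [A, B])

I = np.eye(m)
P_A_perp = I - A @ AtA_inv @ A.T
P_B_perp = I - B @ BtB_inv @ B.T

# Stage 2: the projected Gram inverses are again independent.
# Since P_B^perp is symmetric and idempotent, (P_B^perp A)^T (P_B^perp A)
# equals A^T P_B^perp A, so gram_inverse applies directly.
with ThreadPoolExecutor(max_workers=2) as pool:
    AtPA_inv, BtPB_inv = pool.map(gram_inverse, [P_B_perp @ A, P_A_perp @ B])
```

Each stage inverts only ''n''- and ''p''-square matrices, matching the cost reduction described above.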
 
== See also ==
* {{section link|Invertible matrix|Blockwise inversion}}
 
== References ==
{{Reflist}}

== External links ==
* [https://web.archive.org/web/20060414125709/http://www.csit.fsu.edu/~burkardt/papers/linear_glossary.html Linear Algebra Glossary] by [http://www.csit.fsu.edu/~burkardt/ John Burkardt]
* [http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf The Matrix Cookbook] by [http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274/ Kaare Brandt Petersen]
* [http://see.stanford.edu/materials/lsoeldsee263/08-min-norm.pdf Lecture 8: Least-norm solutions of underdetermined equations] by [https://www.stanford.edu/~boyd/ Stephen P. Boyd]
 
{{Numerical linear algebra}}