{{Short description|Formula for the pseudoinverse of a partitioned matrix}}
{{refimprove|date=December 2010}}
In [[mathematics]], a '''block matrix pseudoinverse''' is a formula for the [[pseudoinverse]] of a [[partitioned matrix]]. This is useful for decomposing or approximating many algorithms that update parameters in [[signal processing]] and are based on the [[least squares]] method.
 
== Derivation ==
Consider a column-wise partitioned matrix:
:<math>
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}, \quad
\mathbf A \in \reals^{m \times n}, \quad
\mathbf B \in \reals^{m \times p}, \quad
m \geq n + p.
</math>

If the above matrix is full column rank, the [[Moore–Penrose inverse]] matrices of it and its transpose are
:<math>\begin{align}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+ &=
\left(
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\right)^{-1} \begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}, \\
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+ &=
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix} \left(
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\right)^{-1}.
\end{align}</math>

This computation of the pseudoinverse requires (''n''&nbsp;+&nbsp;''p'')-square matrix inversion and does not take advantage of the block form.
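
For illustration, the normal-equations form above can be checked against an SVD-based pseudoinverse. The following is a minimal NumPy sketch (an illustrative check, not from the cited literature); the dimensions and random blocks are arbitrary choices that make <math>\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}</math> full column rank with probability one.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 12, 4, 3                 # m >= n + p
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
AB = np.hstack([A, B])             # column-wise partitioned matrix [A B]

# Pseudoinverse via the normal equations: ([A B]^T [A B])^{-1} [A B]^T
pinv_normal = np.linalg.solve(AB.T @ AB, AB.T)

# Agrees with the SVD-based pseudoinverse
assert np.allclose(pinv_normal, np.linalg.pinv(AB))
</syntaxhighlight>
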
To reduce computational costs to ''n''- and ''p''-square matrix inversions and to introduce parallelism, treating the blocks separately, one derives <ref name=Baksalary>{{cite journal|author=[[Jerzy Baksalary|J.K. Baksalary]] and O.M. Baksalary|title=Particular formulae for the Moore–Penrose inverse of a columnwise partitioned matrix|journal=Linear Algebra Appl.|volume=421|date=2007|pages=16–23|doi=10.1016/j.laa.2006.03.031|doi-access=}}</ref>
:<math>\begin{align}
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+ &=
\begin{bmatrix}
\mathbf P_B^\perp \mathbf A\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1} \\
\mathbf P_A^\perp \mathbf B\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}
\end{bmatrix} =
\begin{bmatrix}
\left(\mathbf P_B^\perp\mathbf A\right)^+ \\
\left(\mathbf P_A^\perp\mathbf B\right)^+
\end{bmatrix}, \\
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+ &=
\begin{bmatrix}
\mathbf P_B^\perp \mathbf A\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1},\quad
\mathbf P_A^\perp \mathbf B\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}
\end{bmatrix} =
\begin{bmatrix}
\left(\mathbf A^\textsf{T} \mathbf P_B^\perp\right)^+ &
\left(\mathbf B^\textsf{T} \mathbf P_A^\perp\right)^+
\end{bmatrix},
\end{align}</math>
 
where [[orthogonal projection]] matrices are defined by
::<math>\begin{align}
\mathbf P_A^\perp &= \mathbf I - \mathbf A \left(\mathbf A^\textsf{T} \mathbf A\right)^{-1} \mathbf A^\textsf{T}, \\
\mathbf P_B^\perp &= \mathbf I - \mathbf B \left(\mathbf B^\textsf{T} \mathbf B\right)^{-1} \mathbf B^\textsf{T}.
\end{align}</math>
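
For illustration, the decomposition can be verified numerically. A minimal NumPy sketch (an illustrative check, not from the cited literature), with arbitrary dimensions and random full-column-rank blocks:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
I = np.eye(m)

# Projectors onto the orthogonal complements of range(A) and range(B)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# The block pseudoinverse stacks the pseudoinverses of the projected blocks
block_pinv = np.vstack([np.linalg.pinv(P_B @ A),
                        np.linalg.pinv(P_A @ B)])

assert np.allclose(block_pinv, np.linalg.pinv(np.hstack([A, B])))
</syntaxhighlight>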
 
The above formulas are not necessarily valid if <math>\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}</math> does not have full rank – for example, if <math>\mathbf A \neq 0</math>, then
:<math>
\begin{bmatrix}\mathbf A & \mathbf A\end{bmatrix}^+ =
\frac{1}{2}\begin{bmatrix}
\mathbf A^+ \\
\mathbf A^+
\end{bmatrix} \neq
\begin{bmatrix}
\left(\mathbf P_A^\perp\mathbf A\right)^+ \\
\left(\mathbf P_A^\perp\mathbf A\right)^+
\end{bmatrix} = 0.
</math>
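
A quick numerical demonstration of this failure mode, again as an illustrative NumPy sketch with an arbitrary random <math>\mathbf A</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))
I = np.eye(6)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)

AA = np.hstack([A, A])             # rank deficient: the same columns twice
assert np.allclose(np.linalg.pinv(AA),
                   0.5 * np.vstack([np.linalg.pinv(A)] * 2))

# P_A^perp annihilates A, so (P_A^perp A)^+ is the zero matrix, not A^+/2
assert np.allclose(P_A @ A, 0)
</syntaxhighlight>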
 
=== Column-wise partitioning in over-determined least squares ===
 
Suppose a solution <math>\mathbf x = \begin{bmatrix}
\mathbf x_1 \\
\mathbf x_2
\end{bmatrix}</math> solves an over-determined system:
:<math>
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}
\begin{bmatrix}
\mathbf x_1 \\
\mathbf x_2
\end{bmatrix} =
\mathbf d, \quad
\mathbf d \in \reals^{m \times 1}.
</math>
 
Using the block matrix pseudoinverse, we have
:<math>\mathbf x =
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^+\,\mathbf d =
\begin{bmatrix}
\left(\mathbf P_B^\perp \mathbf A\right)^+ \\
\left(\mathbf P_A^\perp \mathbf B\right)^+
\end{bmatrix}\mathbf d.
</math>
 
Therefore, we have a decomposed solution:
:<math>
\mathbf x_1 = \left(\mathbf P_B^\perp \mathbf A\right)^+\,\mathbf d, \quad
\mathbf x_2 = \left(\mathbf P_A^\perp \mathbf B\right)^+\,\mathbf d.
</math>
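
The decomposed solution can be compared with an ordinary least-squares solver; a minimal NumPy sketch under the same assumptions (arbitrary dimensions, random full-column-rank blocks):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
d = rng.standard_normal(m)
I = np.eye(m)

P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# Each unknown block is recovered from its own projected subproblem
x1 = np.linalg.pinv(P_B @ A) @ d
x2 = np.linalg.pinv(P_A @ B) @ d

x_ref, *_ = np.linalg.lstsq(np.hstack([A, B]), d, rcond=None)
assert np.allclose(np.concatenate([x1, x2]), x_ref)
</syntaxhighlight>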
 
=== Row-wise partitioning in under-determined least squares ===
 
Suppose a solution <math> \mathbf x </math> solves an under-determined system:
:<math>
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}\mathbf x =
\begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix}, \quad
\mathbf e \in \reals^{n \times 1}, \quad
\mathbf f \in \reals^{p \times 1}.
</math>
 
The minimum-norm solution is given by
:<math>\mathbf x =
\begin{bmatrix}
\mathbf A^\textsf{T} \\
\mathbf B^\textsf{T}
\end{bmatrix}^+\,
\begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix}.
</math>
 
Using the block matrix pseudoinverse, we have
:<math>
\mathbf x = \begin{bmatrix}
\left(\mathbf A^\textsf{T}\mathbf P_B^\perp\right)^+ &
\left(\mathbf B^\textsf{T}\mathbf P_A^\perp\right)^+
\end{bmatrix} \begin{bmatrix}
\mathbf e \\
\mathbf f
\end{bmatrix} =
\left(\mathbf A^\textsf{T}\mathbf P_B^\perp\right)^+\,\mathbf e +
\left(\mathbf B^\textsf{T}\mathbf P_A^\perp\right)^+\,\mathbf f.
</math>
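
Likewise, the block-wise minimum-norm solution can be checked against the pseudoinverse of the stacked matrix; a minimal NumPy sketch with arbitrary dimensions and random blocks:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
e = rng.standard_normal(n)
f = rng.standard_normal(p)
I = np.eye(m)

P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# Minimum-norm solution assembled block by block
x = np.linalg.pinv(A.T @ P_B) @ e + np.linalg.pinv(B.T @ P_A) @ f

x_ref = np.linalg.pinv(np.vstack([A.T, B.T])) @ np.concatenate([e, f])
assert np.allclose(x, x_ref)
</syntaxhighlight>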
 
== Comments on matrix inversion ==
 
Instead of <math>\left(\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^\textsf{T} \begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}\right)^{-1}</math>, we need to calculate directly or indirectly{{citation needed|date=December 2010}}{{original research?|date=December 2010}}

:<math>
\left(\mathbf A^\textsf{T} \mathbf A\right)^{-1}, \quad
\left(\mathbf B^\textsf{T} \mathbf B\right)^{-1}, \quad
\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1}, \quad
\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}.
</math>
 
In a dense and small system, we can use [[singular value decomposition]], [[QR decomposition]], or [[Cholesky decomposition]] to replace the matrix inversions with numerical routines. In a large system, we may employ [[iterative methods]] such as Krylov subspace methods.
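
As a sketch of replacing an explicit inversion with a factorization (an illustrative choice, not a prescription from the cited literature), the normal equations <math>\left(\mathbf A^\textsf{T} \mathbf A\right)\mathbf x = \mathbf A^\textsf{T}\mathbf d</math> can be solved with a Cholesky factorization via SciPy:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(5)
m, n = 12, 4
A = rng.standard_normal((m, n))
d = rng.standard_normal(m)

# A^T A is symmetric positive definite when A has full column rank,
# so we factor it once instead of forming (A^T A)^{-1} explicitly.
c, low = cho_factor(A.T @ A)
x = cho_solve((c, low), A.T @ d)

assert np.allclose(x, np.linalg.lstsq(A, d, rcond=None)[0])
</syntaxhighlight>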
 
Considering [[parallel algorithms]], we can compute <math>\left(\mathbf A^\textsf{T} \mathbf A\right)^{-1}</math> and <math>\left(\mathbf B^\textsf{T} \mathbf B\right)^{-1}</math> in parallel, and then compute <math>\left(\mathbf A^\textsf{T} \mathbf P_B^\perp \mathbf A\right)^{-1}</math> and <math>\left(\mathbf B^\textsf{T} \mathbf P_A^\perp \mathbf B\right)^{-1}</math>, also in parallel.
 
== Proof of block matrix inversion ==
Let a block matrix be
:<math>\begin{bmatrix}
A & B \\
C & D
\end{bmatrix}
.</math>
We can get an inverse formula by combining the previous results.<ref>[http://ccrma.stanford.edu/~jos/lattice/Block_matrix_decompositions.html Block matrix decompositions]</ref>
:<math>\begin{bmatrix}
A & B \\
C & D
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(A - BD^{-1}C)^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\
-D^{-1}C(A - BD^{-1}C)^{-1} & (D - CA^{-1}B)^{-1}
\end{bmatrix}
=
\begin{bmatrix}
S^{-1}_D & -A^{-1}BS^{-1}_A \\
-D^{-1}CS^{-1}_D & S^{-1}_A
\end{bmatrix}
,</math>
where <math>S_A</math> and <math>S_D</math> are, respectively, the [[Schur complement]]s of <math>A</math> and <math>D</math>, defined by <math>S_A = D - C A^{-1}B</math> and <math>S_D = A - BD^{-1}C</math>. This relation is derived by using block triangular decomposition. It is called ''simple block matrix inversion''.<ref>[http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=30419&arnumber=1399280&count=249&index=181 S. Jo, S. W. Kim and T. J. Park, "Equally constrained affine projection algorithm," ''in Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers,'' vol. 1, pp. 955–959, Nov. 7–10, 2004.]</ref>
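
For illustration, the simple block matrix inversion formula can be verified numerically; in the following NumPy sketch the diagonal shifts are an arbitrary device to keep <math>A</math>, <math>D</math>, and both Schur complements invertible:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(6)
k = 3
A = rng.standard_normal((k, k)) + 3 * np.eye(k)
B = rng.standard_normal((k, k))
C = rng.standard_normal((k, k))
D = rng.standard_normal((k, k)) + 3 * np.eye(k)

Ai, Di = np.linalg.inv(A), np.linalg.inv(D)
S_A = D - C @ Ai @ B               # Schur complement of A
S_D = A - B @ Di @ C               # Schur complement of D
S_Ai, S_Di = np.linalg.inv(S_A), np.linalg.inv(S_D)

M_inv = np.block([[S_Di,           -Ai @ B @ S_Ai],
                  [-Di @ C @ S_Di,  S_Ai         ]])

M = np.block([[A, B], [C, D]])
assert np.allclose(M_inv, np.linalg.inv(M))
</syntaxhighlight>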
 
Now we can obtain the inverse of the symmetric block matrix:
:<math>
\begin{bmatrix}
\mathbf A^T \mathbf A & \mathbf A^T \mathbf B \\
\mathbf B^T \mathbf A & \mathbf B^T \mathbf B
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(\mathbf A^T \mathbf A-\mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A)^{-1}
& -(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B(\mathbf B^T \mathbf B-\mathbf B^T \mathbf A(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B)^{-1}
\\
-(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A(\mathbf A^T \mathbf A-\mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A)^{-1}
& (\mathbf B^T \mathbf B-\mathbf B^T \mathbf A(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B)^{-1}
\end{bmatrix}
</math>
:::<math>
=
\begin{bmatrix}
(\mathbf A^T \mathbf P_B^\perp \mathbf A)^{-1}
& -(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B(\mathbf B^T \mathbf P_A^\perp \mathbf B)^{-1}
\\
-(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A(\mathbf A^T \mathbf P_B^\perp \mathbf A)^{-1}
& (\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}
\end{bmatrix}
</math>
Since the block matrix is symmetric, we also have
:<math>
\begin{bmatrix}
\mathbf A^T \mathbf A & \mathbf A^T \mathbf B \\
\mathbf B^T \mathbf A & \mathbf B^T \mathbf B
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(\mathbf A^T \mathbf P_B^{\perp} \mathbf A)^{-1}
&
-(\mathbf A^T \mathbf P_B^{\perp} \mathbf A)^{-1}
\mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1}
\\
-(\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}
\mathbf B^T \mathbf A (\mathbf A^T \mathbf A)^{-1}
& (\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}
\end{bmatrix}.
</math>
 
Then, we can see how the Schur complements are connected to the projection matrices of the symmetric, partitioned matrix.
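
This connection can also be checked numerically; a minimal NumPy sketch (arbitrary dimensions, random full-column-rank blocks) confirming that the diagonal blocks of the inverted Gram matrix are the inverted projected Gram matrices:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(7)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
I = np.eye(m)

P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

G_inv = np.linalg.inv(np.hstack([A, B]).T @ np.hstack([A, B]))

# Diagonal blocks equal the inverses of the projected Gram matrices
assert np.allclose(G_inv[:n, :n], np.linalg.inv(A.T @ P_B @ A))
assert np.allclose(G_inv[n:, n:], np.linalg.inv(B.T @ P_A @ B))
</syntaxhighlight>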
 
== See also ==
* {{section link|Invertible matrix|Blockwise inversion}}
 
== References ==
{{reflist}}

== External links ==
* [http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html The Matrix Reference Manual] by [http://www.ee.ic.ac.uk/hp/staff/dmb/dmb.html Mike Brookes]
* [https://web.archive.org/web/20060414125709/http://www.csit.fsu.edu/~burkardt/papers/linear_glossary.html Linear Algebra Glossary] by [http://www.csit.fsu.edu/~burkardt/ John Burkardt]
* [http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf The Matrix Cookbook] by [http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274 Kaare Brandt Petersen]
* [http://see.stanford.edu/materials/lsoeldsee263/08-min-norm.pdf Lecture 8: Least-norm solutions of undetermined equations] by [https://stanford.edu/~boyd/ Stephen P. Boyd]
 
{{Numerical linear algebra}}