Projection matrix
The vector of [[errors and residuals in statistics|residuals]] <math>\mathbf{r}</math> can then be written in terms of <math>\mathbf{P}</math> as
:<math>\mathbf{r} = \mathbf{y} - \mathbf{\hat{y}} = \mathbf{y} - \mathbf{P} \mathbf{y} = \left( \mathbf{I} - \mathbf{P} \right) \mathbf{y},</math>
where <math>\mathbf{I}</math> is the [[identity matrix]]. The matrix <math>\mathbf{M} \equiv \left( \mathbf{I} - \mathbf{P} \right)</math> is sometimes referred to as the '''residual maker matrix'''. Moreover, the element in the ''i''th row and ''j''th column of <math>\mathbf{P}</math> is equal to the [[covariance]] between the ''j''th response value and the ''i''th fitted value, divided by the [[variance]] of the former:
:<math>p_{ij} = \frac{\operatorname{Cov}\left[ \hat{y}_i, y_j \right]}{\operatorname{Var}\left[ y_j \right]}.</math>
Therefore, the [[covariance matrix]] of the residuals <math>\mathbf{r}</math>, by [[error propagation]], equals
:<math>\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right)^\textsf{T} \mathbf{\Sigma} \left( \mathbf{I}-\mathbf{P} \right)</math>,
where <math>\mathbf{\Sigma}</math> is the [[covariance matrix]] of the error vector (and by extension, the response vector as well). For the case of linear models with [[independent and identically distributed]] errors in which <math>\mathbf{\Sigma} = \sigma^{2} \mathbf{I}</math>, this reduces to:<ref name="Hoaglin1977"/>
:<math>\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right) \sigma^{2}</math>.
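
The relations above can be checked numerically. The following is a minimal sketch (using NumPy; the design matrix and error variance are arbitrary choices for illustration) that forms the residual maker matrix and verifies that, for i.i.d. errors, the residual covariance reduces to <math>\left( \mathbf{I} - \mathbf{P} \right) \sigma^{2}</math>:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative design matrix (any matrix with full column rank works here).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection (hat) matrix
M = np.eye(len(X)) - P                 # residual maker matrix, M = I - P

sigma2 = 2.5                           # assumed error variance sigma^2
Sigma = sigma2 * np.eye(len(X))        # i.i.d. errors: Sigma = sigma^2 * I

# Covariance of the residuals: (I - P)^T Sigma (I - P), which reduces to (I - P) sigma^2.
Sigma_r = M.T @ Sigma @ M
assert np.allclose(Sigma_r, M * sigma2)
</syntaxhighlight>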
 
From the figure, it is clear that the closest point from the vector <math>\mathbf{b}</math> onto the column space of <math>\mathbf{A}</math> is <math>\mathbf{Ax}</math>, and is one where we can draw a line orthogonal to the column space of <math>\mathbf{A}</math>. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so
:<math>\mathbf{A}^\textsf{T}(\mathbf{b}-\mathbf{Ax}) = 0</math>
 
Rearranging this equation gives
:<math>\begin{align}
&& \mathbf{A}^\textsf{T}\mathbf{b} - \mathbf{A}^\textsf{T}\mathbf{Ax} &= 0 \\
\Rightarrow && \mathbf{A}^\textsf{T}\mathbf{b} &= \mathbf{A}^\textsf{T}\mathbf{Ax} \\
\Rightarrow && \mathbf{x} &= \left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b}
\end{align}</math>
 
Therefore, since <math>\mathbf{Ax}</math> lies in the column space of <math>\mathbf{A}</math>, the projection of <math>\mathbf{b}</math> onto that column space is <math>\mathbf{Ax} = \mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b}</math>, so the projection matrix is <math>\mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}</math>.
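
A short numerical illustration of this derivation (using NumPy; the matrix <math>\mathbf{A}</math> and vector <math>\mathbf{b}</math> below are arbitrary examples) solves the normal equations and checks that the residual is orthogonal to the column space of <math>\mathbf{A}</math>:

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary example matrix A and vector b, chosen only for illustration.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Solve the normal equations A^T A x = A^T b for the coefficient vector x.
x = np.linalg.solve(A.T @ A, A.T @ b)

# The projection of b onto the column space of A is A x.
b_hat = A @ x

# The residual b - A x is orthogonal to every column of A.
assert np.allclose(A.T @ (b - b_hat), 0.0)
</syntaxhighlight>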
 
== Linear model ==
Suppose that we wish to estimate a linear model using linear least squares. The model can be written as
:<math>\mathbf{y} = \mathbf{X} \boldsymbol \beta + \boldsymbol \varepsilon,</math>
where <math>\mathbf{X}</math> is a matrix of [[explanatory variable]]s (the [[design matrix]]), '''''β''''' is a vector of unknown parameters to be estimated, and '''''ε''''' is the error vector.
 
When the weights for each observation are identical and the [[errors and residuals in statistics|errors]] are uncorrelated, the estimated parameters are
 
:<math>\hat{\boldsymbol \beta} = \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y},</math>
 
so the fitted values are
 
:<math>\hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol \beta} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y}.</math>
 
Therefore, the projection matrix (and hat matrix) is given by
 
:<math>\mathbf{P} \equiv \mathbf{X} \left(\mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}.</math>
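
As a minimal numerical check (using NumPy; the randomly generated design matrix and response below are illustrative), the hat matrix built from this formula reproduces the least-squares fitted values and is symmetric and idempotent:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # illustrative design matrix
y = rng.normal(size=20)        # illustrative response vector

# Hat / projection matrix P = X (X^T X)^{-1} X^T.
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values via the hat matrix agree with ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(P @ y, X @ beta_hat)

# P is symmetric and idempotent.
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
</syntaxhighlight>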
 
=== Weighted and generalized least squares ===
 
: <math>
\hat{\boldsymbol\beta}_\text{GLS} = \left( \mathbf{X}^\textsf{T} \mathbf{\Psi}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Psi}^{-1} \mathbf{y}
</math>.
 
 
: <math>
H = \mathbf{X}\left( \mathbf{X}^\textsf{T} \mathbf{\Psi}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Psi}^{-1}
</math>
 
and again it may be seen that <math>H^2 = H\cdot H = H</math>, though now it is no longer symmetric.
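
A brief numerical sketch (using NumPy; the design matrix and the positive-definite <math>\mathbf{\Psi}</math> below are arbitrary) confirms that this generalized hat matrix is idempotent but, in general, not symmetric:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))            # illustrative design matrix

# Illustrative non-scalar error covariance Psi (symmetric positive definite).
L = rng.normal(size=(10, 10))
Psi = L @ L.T + 10.0 * np.eye(10)
Psi_inv = np.linalg.inv(Psi)

# Generalized hat matrix H = X (X^T Psi^{-1} X)^{-1} X^T Psi^{-1}.
H = X @ np.linalg.inv(X.T @ Psi_inv @ X) @ X.T @ Psi_inv

assert np.allclose(H @ H, H)            # idempotent (an oblique projection)
assert not np.allclose(H, H.T)          # generally no longer symmetric
</syntaxhighlight>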
 
== Properties ==
The projection matrix has a number of useful algebraic properties.<ref>{{cite book |last=Gans |first=P. |year=1992 |title=Data Fitting in the Chemical Sciences |url=https://archive.org/details/datafittinginche0000gans |url-access=registration |publisher=Wiley |isbn=0-471-93412-7 }}</ref><ref>{{cite book |last=Draper |first=N. R. |last2=Smith |first2=H. |year=1998 |title=Applied Regression Analysis |publisher=Wiley |isbn=0-471-17082-8 }}</ref> In the language of [[linear algebra]], the projection matrix is the [[orthogonal projection]] onto the [[column space]] of the design matrix <math>\mathbf{X}</math>.<ref name = "Freedman09" /> (Note that <math>\left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}</math> is the [[Moore–Penrose pseudoinverse#Full rank|pseudoinverse of X]].) Some facts of the projection matrix in this setting are summarized as follows:<ref name = "Freedman09" />
* <math>\mathbf{u} = (\mathbf{I} - \mathbf{P})\mathbf{y},</math> and <math>\mathbf{u} = \mathbf{y} - \mathbf{P} \mathbf{y} \perp \mathbf{X}.</math>
* <math>\mathbf{P}</math> is symmetric, and so is <math>\mathbf{M} \equiv \left( \mathbf{I} - \mathbf{P} \right)</math>.
 
Suppose the design matrix <math>X</math> can be decomposed by columns as <math>X= [A~~~B]</math>.
Define the hat or projection operator as <math>P\{X\} = X \left(X^\textsf{T} X \right)^{-1} X^\textsf{T}</math>. Similarly, define the residual operator as <math>M\{X\} = I - P\{X\}</math>.
Then the projection matrix can be decomposed as follows:<ref>{{cite book|last1=Rao|first1=C. Radhakrishna|last2=Toutenburg|first2=Helge|author3=Shalabh|first4=Christian|last4=Heumann|title=Linear Models and Generalizations|url=https://archive.org/details/linearmodelsgene00raop|url-access=limited|year=2008|publisher=Springer|___location=Berlin|isbn=978-3-540-74226-5|pages=[https://archive.org/details/linearmodelsgene00raop/page/n335 323]|edition=3rd}}</ref>
:<math>
P\{X\} = P\{A\} + P\{M\{A\} B\},
</math>
where, e.g., <math>P\{A\} = A \left(A^\textsf{T} A \right)^{-1} A^\textsf{T}</math> and <math>M\{A\} = I - P\{A\}</math>.
There are a number of applications of such a decomposition. In the classical application <math>A</math> is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the [[fixed effects model]], where <math>A</math> is a large [[sparse matrix]] of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of <math>X </math> without explicitly forming the matrix <math>X</math>, which might be too large to fit into computer memory.
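
A small numerical sketch (using NumPy; here <math>A</math> is a column of ones, the classical intercept case, and <math>B</math> holds randomly generated remaining columns) verifies the decomposition:

<syntaxhighlight lang="python">
import numpy as np

def proj(X):
    """Projection (hat) operator P{X} = X (X^T X)^{-1} X^T."""
    return X @ np.linalg.inv(X.T @ X) @ X.T

rng = np.random.default_rng(2)
n = 12
A = np.ones((n, 1))               # classical case: a column of ones (intercept)
B = rng.normal(size=(n, 2))       # remaining columns, illustrative
X = np.hstack([A, B])

M_A = np.eye(n) - proj(A)         # residual operator M{A} = I - P{A}

# Check the decomposition P{X} = P{A} + P{M{A} B}.
assert np.allclose(proj(X), proj(A) + proj(M_A @ B))
</syntaxhighlight>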