Projection matrix
The vector of [[errors and residuals in statistics|residuals]] <math>\mathbf{r}</math> can then be written in terms of <math>\mathbf{P}</math> as
:<math>\mathbf{r} = \mathbf{y} - \mathbf{\hat{y}} = \mathbf{y} - \mathbf{P} \mathbf{y} = \left( \mathbf{I} - \mathbf{P} \right) \mathbf{y},</math>
where <math>\mathbf{I}</math> is the [[identity matrix]]. The matrix <math>\mathbf{M} \equiv \left( \mathbf{I} - \mathbf{P} \right)</math> is sometimes referred to as the '''residual maker matrix'''. Moreover, the element in the ''i''th row and ''j''th column of <math>\mathbf{P}</math> is equal to the [[covariance]] between the ''j''th response value and the ''i''th fitted value, divided by the [[variance]] of the former:
:<math>p_{ij} = \frac{\operatorname{Cov}\left[ \hat{y}_i, y_j \right]}{\operatorname{Var}\left[ y_j \right]}.</math>
Therefore, the [[covariance matrix]] of the residuals <math>\mathbf{r}</math>, by [[error propagation]], equals
:<math>\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right)^\textsf{T} \mathbf{\Sigma} \left( \mathbf{I}-\mathbf{P} \right)</math>,
where <math>\mathbf{\Sigma}</math> is the [[covariance matrix]] of the error vector (and by extension, the response vector as well). For the case of linear models with [[independent and identically distributed]] errors in which <math>\mathbf{\Sigma} = \sigma^{2} \mathbf{I}</math>, this reduces to:<ref name="Hoaglin1977"/>
:<math>\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right) \sigma^{2}</math>.
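
The relations above can be checked numerically. The following is a minimal sketch (using NumPy; the design matrix and error variance are arbitrary choices for illustration) that forms the residual maker matrix and verifies that, for i.i.d. errors, the residual covariance reduces to <math>\left( \mathbf{I} - \mathbf{P} \right) \sigma^{2}</math>:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative design matrix (any matrix with full column rank works here).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection (hat) matrix
M = np.eye(len(X)) - P                 # residual maker matrix, M = I - P

sigma2 = 2.5                           # assumed error variance sigma^2
Sigma = sigma2 * np.eye(len(X))        # i.i.d. errors: Sigma = sigma^2 * I

# Covariance of the residuals: (I - P)^T Sigma (I - P), which reduces to (I - P) sigma^2.
Sigma_r = M.T @ Sigma @ M
assert np.allclose(Sigma_r, M * sigma2)
</syntaxhighlight>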
 
From the figure, it is clear that the closest point from the vector <math>\mathbf{b}</math> onto the column space of <math>\mathbf{A}</math> is <math>\mathbf{Ax}</math>, and is one where we can draw a line orthogonal to the column space of <math>\mathbf{A}</math>. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so
:<math>\mathbf{A}^\textsf{T}(\mathbf{b}-\mathbf{Ax}) = 0</math>
 
Rearranging this equation gives
:<math>\begin{align}
&& \mathbf{A}^\textsf{T}\mathbf{b} - \mathbf{A}^\textsf{T}\mathbf{Ax} &= 0 \\
\Rightarrow && \mathbf{A}^\textsf{T}\mathbf{b} &= \mathbf{A}^\textsf{T}\mathbf{Ax} \\
\Rightarrow && \mathbf{x} &= \left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b}
\end{align}</math>
 
Therefore, since <math>\mathbf{Ax}</math> lies in the column space of <math>\mathbf{A}</math>, the projection of <math>\mathbf{b}</math> onto that column space is <math>\mathbf{Ax} = \mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b}</math>, so the projection matrix is <math>\mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}</math>.
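
A short numerical illustration of this derivation (using NumPy; the matrix <math>\mathbf{A}</math> and vector <math>\mathbf{b}</math> below are arbitrary examples) solves the normal equations and checks that the residual is orthogonal to the column space of <math>\mathbf{A}</math>:

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary example matrix A and vector b, chosen only for illustration.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Solve the normal equations A^T A x = A^T b for the coefficient vector x.
x = np.linalg.solve(A.T @ A, A.T @ b)

# The projection of b onto the column space of A is A x.
b_hat = A @ x

# The residual b - A x is orthogonal to every column of A.
assert np.allclose(A.T @ (b - b_hat), 0.0)
</syntaxhighlight>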
 
== Linear model ==
Suppose that we wish to estimate a linear model using linear least squares. The model can be written as
:<math>\mathbf{y} = \mathbf{X} \boldsymbol \beta + \boldsymbol \varepsilon,</math>
where <math>\mathbf{X}</math> is a matrix of [[explanatory variable]]s (the [[design matrix]]), '''''β''''' is a vector of unknown parameters to be estimated, and '''''ε''''' is the error vector.
 
When the weights for each observation are identical and the [[errors and residuals in statistics|errors]] are uncorrelated, the estimated parameters are
 
:<math>\hat{\boldsymbol \beta} = \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y},</math>
 
so the fitted values are
 
:<math>\hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol \beta} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y}.</math>
 
Therefore, the projection matrix (and hat matrix) is given by
 
:<math>\mathbf{P} \equiv \mathbf{X} \left(\mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}.</math>
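
As a minimal numerical check (using NumPy; the randomly generated design matrix and response below are illustrative), the hat matrix built from this formula reproduces the least-squares fitted values and is symmetric and idempotent:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # illustrative design matrix
y = rng.normal(size=20)        # illustrative response vector

# Hat / projection matrix P = X (X^T X)^{-1} X^T.
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values via the hat matrix agree with ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(P @ y, X @ beta_hat)

# P is symmetric and idempotent.
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
</syntaxhighlight>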
 
=== Weighted and generalized least squares ===
 
: <math>
\hat{\boldsymbol\beta}_\text{GLS} = \left( \mathbf{X}^\textsf{T} \mathbf{\Psi}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Psi}^{-1} \mathbf{y}
</math>.
 
 
: <math>
H = \mathbf{X}\left( \mathbf{X}^\textsf{T} \mathbf{\Psi}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Psi}^{-1}
</math>
 
and again it may be seen that <math>H^2 = H\cdot H = H</math>, though now it is no longer symmetric.
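
A brief numerical sketch (using NumPy; the design matrix and the positive-definite <math>\mathbf{\Psi}</math> below are arbitrary) confirms that this generalized hat matrix is idempotent but, in general, not symmetric:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))            # illustrative design matrix

# Illustrative non-scalar error covariance Psi (symmetric positive definite).
L = rng.normal(size=(10, 10))
Psi = L @ L.T + 10.0 * np.eye(10)
Psi_inv = np.linalg.inv(Psi)

# Generalized hat matrix H = X (X^T Psi^{-1} X)^{-1} X^T Psi^{-1}.
H = X @ np.linalg.inv(X.T @ Psi_inv @ X) @ X.T @ Psi_inv

assert np.allclose(H @ H, H)            # idempotent (an oblique projection)
assert not np.allclose(H, H.T)          # generally no longer symmetric
</syntaxhighlight>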
 
== Properties ==
The projection matrix has a number of useful algebraic properties.<ref>{{cite book |last=Gans |first=P. |year=1992 |title=Data Fitting in the Chemical Sciences |url=https://archive.org/details/datafittinginche0000gans |url-access=registration |publisher=Wiley |isbn=0-471-93412-7 }}</ref><ref>{{cite book |last=Draper |first=N. R. |last2=Smith |first2=H. |year=1998 |title=Applied Regression Analysis |publisher=Wiley |isbn=0-471-17082-8 }}</ref> In the language of [[linear algebra]], the projection matrix is the [[orthogonal projection]] onto the [[column space]] of the design matrix <math>\mathbf{X}</math>.<ref name = "Freedman09" /> (Note that <math>\left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}</math> is the [[Moore–Penrose pseudoinverse#Full rank|pseudoinverse of X]].) Some facts of the projection matrix in this setting are summarized as follows:<ref name = "Freedman09" />
* <math>\mathbf{u} = (\mathbf{I} - \mathbf{P})\mathbf{y},</math> and <math>\mathbf{u} = \mathbf{y} - \mathbf{P} \mathbf{y} \perp \mathbf{X}.</math>
* <math>\mathbf{P}</math> is symmetric, and so is <math>\mathbf{M} \equiv \left( \mathbf{I} - \mathbf{P} \right)</math>.
 
Suppose the design matrix <math>X</math> can be decomposed by columns as <math>X= [A~~~B]</math>.
Define the hat or projection operator as <math>P\{X\} = X \left(X^\textsf{T} X \right)^{-1} X^\textsf{T}</math>. Similarly, define the residual operator as <math>M\{X\} = I - P\{X\}</math>.
Then the projection matrix can be decomposed as follows:<ref>{{cite book|last1=Rao|first1=C. Radhakrishna|last2=Toutenburg|first2=Helge|author3=Shalabh|first4=Christian|last4=Heumann|title=Linear Models and Generalizations|url=https://archive.org/details/linearmodelsgene00raop|url-access=limited|year=2008|publisher=Springer|___location=Berlin|isbn=978-3-540-74226-5|pages=[https://archive.org/details/linearmodelsgene00raop/page/n335 323]|edition=3rd}}</ref>
:<math>
P\{X\} = P\{A\} + P\{M\{A\} B\},
</math>
where, e.g., <math>P\{A\} = A \left(A^\textsf{T} A \right)^{-1} A^\textsf{T}</math> and <math>M\{A\} = I - P\{A\}</math>.
There are a number of applications of such a decomposition. In the classical application <math>A</math> is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the [[fixed effects model]], where <math>A</math> is a large [[sparse matrix]] of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of <math>X </math> without explicitly forming the matrix <math>X</math>, which might be too large to fit into computer memory.
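
A small numerical sketch (using NumPy; here <math>A</math> is a column of ones, the classical intercept case, and <math>B</math> holds randomly generated remaining columns) verifies the decomposition:

<syntaxhighlight lang="python">
import numpy as np

def proj(X):
    """Projection (hat) operator P{X} = X (X^T X)^{-1} X^T."""
    return X @ np.linalg.inv(X.T @ X) @ X.T

rng = np.random.default_rng(2)
n = 12
A = np.ones((n, 1))               # classical case: a column of ones (intercept)
B = rng.normal(size=(n, 2))       # remaining columns, illustrative
X = np.hstack([A, B])

M_A = np.eye(n) - proj(A)         # residual operator M{A} = I - P{A}

# Check the decomposition P{X} = P{A} + P{M{A} B}.
assert np.allclose(proj(X), proj(A) + proj(M_A @ B))
</syntaxhighlight>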