If the vector of [[Response variable|response values]] is denoted by <math>\mathbf{y}</math> and the vector of fitted values by <math>\mathbf{\hat{y}}</math>,
:<math>\mathbf{\hat{y}} = \mathbf{P} \mathbf{y}.</math>
As <math>\mathbf{\hat{y}}</math> is usually pronounced "y-hat", the projection matrix <math>\mathbf{P}</math> is also named ''hat matrix'' as it "puts a [[circumflex|hat]] on <math>\mathbf{y}</math>".
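The definition can be illustrated numerically. The following sketch (an illustrative example, not taken from the cited sources; the data values are arbitrary) builds <math>\mathbf{P}</math> from a small design matrix <math>\mathbf{X}</math> via the formula <math>\mathbf{X} \left(\mathbf{X}^\textsf{T} \mathbf{X}\right)^{-1} \mathbf{X}^\textsf{T}</math> given later in the article, and checks in [[NumPy]] that <math>\mathbf{P}\mathbf{y}</math> reproduces the ordinary-least-squares fitted values:
<syntaxhighlight lang="python">
import numpy as np

# Illustrative data only: an intercept column plus one regressor.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Hat (projection) matrix: P = X (X^T X)^{-1} X^T
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values y_hat = P y ...
y_hat = P @ y

# ... agree with the fitted values from an ordinary least-squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(y_hat, X @ beta))  # True
</syntaxhighlight>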
==Application for residuals==
The formula for the vector of [[errors and residuals in statistics|residual]]s <math>\mathbf{r}</math> can also be expressed compactly using the projection matrix:
:<math>\mathbf{r} = \mathbf{y} - \mathbf{\hat{y}} = \mathbf{y} - \mathbf{P} \mathbf{y} = \left( \mathbf{I} - \mathbf{P} \right) \mathbf{y},</math>
where <math>\mathbf{I}</math> is the [[identity matrix]]. The matrix <math>\mathbf{M} := \mathbf{I} - \mathbf{P}</math> is sometimes referred to as the '''residual maker matrix''' or '''annihilator matrix'''.
The [[covariance matrix]] of the residuals <math>\mathbf{r}</math>, by [[error propagation]], equals
:<math>\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right)^\textsf{T} \mathbf{\Sigma} \left( \mathbf{I} - \mathbf{P} \right),</math>
where <math>\mathbf{\Sigma}</math> is the covariance matrix of <math>\mathbf{y}</math>.
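As an informal numerical illustration (a sketch with arbitrary simulated data, not drawn from the cited sources), one can form <math>\mathbf{M} = \mathbf{I} - \mathbf{P}</math> in [[NumPy]] and verify that it produces the residuals and that, in the special case <math>\mathbf{\Sigma} = \sigma^2 \mathbf{I}</math>, the residual covariance reduces to <math>\sigma^2 \mathbf{M}</math>:
<syntaxhighlight lang="python">
import numpy as np

# Arbitrary simulated design matrix and response (illustration only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
y = rng.normal(size=6)

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(6) - P                 # residual maker / annihilator matrix

r = M @ y                         # residuals
print(np.allclose(r, y - P @ y))  # True

# Error propagation with cov(y) = sigma^2 I:
# cov(r) = (I - P)^T Sigma (I - P) = sigma^2 M, since M is symmetric and idempotent.
sigma2 = 2.5
Sigma_r = (np.eye(6) - P).T @ (sigma2 * np.eye(6)) @ (np.eye(6) - P)
print(np.allclose(Sigma_r, sigma2 * M))  # True
</syntaxhighlight>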
== Intuition ==
From the figure, it is clear that the point in the column space of <math>\mathbf{A}</math> closest to the vector <math>\mathbf{b}</math> is <math>\mathbf{Ax}</math>, namely the point at which the line from <math>\mathbf{b}</math> meets the column space of <math>\mathbf{A}</math> orthogonally. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so
:<math>\mathbf{A}^\textsf{T}(\mathbf{b}-\mathbf{Ax}) = 0</math>.
From there, one rearranges, so
:<math>\begin{align}
 && \mathbf{A}^\textsf{T}\mathbf{b} - \mathbf{A}^\textsf{T}\mathbf{Ax} &= 0 \\
\Rightarrow && \mathbf{A}^\textsf{T}\mathbf{b} &= \mathbf{A}^\textsf{T}\mathbf{Ax} \\
\Rightarrow && \mathbf{x} &= \left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b}
\end{align}</math>.
Therefore, since <math>\mathbf{Ax}</math> lies in the column space of <math>\mathbf{A}</math>, the projection matrix, which maps <math>\mathbf{b}</math> onto <math>\mathbf{Ax}</math>, is
:<math>\mathbf{P} = \mathbf{A} \left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1} \mathbf{A}^\textsf{T}.</math>
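A short numerical check of this derivation (an illustrative sketch only; the matrix <math>\mathbf{A}</math> and vector <math>\mathbf{b}</math> are arbitrary simulated values) confirms that the solution of the normal equations makes <math>\mathbf{b} - \mathbf{Ax}</math> orthogonal to the columns of <math>\mathbf{A}</math>, and that <math>\mathbf{Pb} = \mathbf{Ax}</math>:
<syntaxhighlight lang="python">
import numpy as np

# Arbitrary full-column-rank matrix and right-hand side (illustration only).
rng = np.random.default_rng(1)
A = rng.normal(size=(7, 3))
b = rng.normal(size=7)

# Solve the normal equations: x = (A^T A)^{-1} A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - Ax is orthogonal to the column space of A.
print(np.allclose(A.T @ (b - A @ x), 0.0))  # True

# The projection of b onto col(A) is exactly Ax.
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, A @ x))            # True
</syntaxhighlight>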
== Linear model ==
The projection matrix has a number of useful algebraic properties.<ref>{{cite book |last=Gans |first=P. |year=1992 |title=Data Fitting in the Chemical Sciences |url=https://archive.org/details/datafittinginche0000gans |url-access=registration |publisher=Wiley |isbn=0-471-93412-7 }}</ref><ref>{{cite book |last=Draper |first=N. R. |last2=Smith |first2=H. |year=1998 |title=Applied Regression Analysis |publisher=Wiley |isbn=0-471-17082-8 }}</ref> In the language of [[linear algebra]], the projection matrix is the [[orthogonal projection]] onto the [[column space]] of the design matrix <math>\mathbf{X}</math>.<ref name = "Freedman09" /> (Note that <math>\left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}</math> is the [[Moore–Penrose pseudoinverse#Full rank|pseudoinverse of X]].) Some facts of the projection matrix in this setting are summarized as follows:<ref name = "Freedman09" />
* <math>\mathbf{r} = (\mathbf{I} - \mathbf{P})\mathbf{y} = \mathbf{y} - \mathbf{P} \mathbf{y}</math>, and the residual vector <math>\mathbf{r}</math> is orthogonal to the column space of <math>\mathbf{X}</math>: <math>\mathbf{r} \perp \mathbf{X}.</math>
* <math>\mathbf{P}</math> is symmetric, and so is <math>\mathbf{M} := \mathbf{I} - \mathbf{P}</math>.
* <math>\mathbf{P}</math> is idempotent: <math>\mathbf{P}^2 = \mathbf{P}</math>, and so is <math>\mathbf{M}</math>.
* If <math>\mathbf{X}</math> is an {{nowrap|''n'' × ''r''}} matrix with <math>\operatorname{rank}(\mathbf{X}) = r</math>, then <math>\operatorname{rank}(\mathbf{P}) = r</math>.
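These properties can be checked numerically; the following sketch (an illustration only, using an arbitrary random design matrix) verifies symmetry, idempotence and the rank identity for both <math>\mathbf{P}</math> and <math>\mathbf{M}</math>:
<syntaxhighlight lang="python">
import numpy as np

# Arbitrary n x r design matrix of full column rank (illustration only).
rng = np.random.default_rng(2)
n, r = 8, 3
X = rng.normal(size=(n, r))

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - P

print(np.allclose(P, P.T), np.allclose(M, M.T))      # symmetric
print(np.allclose(P @ P, P), np.allclose(M @ M, M))  # idempotent
print(np.linalg.matrix_rank(P) == r)                 # rank(P) = rank(X) = r
print(np.allclose(M @ X, 0.0))                       # M annihilates the columns of X
</syntaxhighlight>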
Suppose the design matrix <math>\mathbf{X}</math> can be decomposed by columns as <math>\mathbf{X} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \end{bmatrix}</math>.
Define the hat or projection operator as <math>\mathbf{P}[\mathbf{X}] := \mathbf{X} \left(\mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}</math>. Similarly, define the residual operator as <math>\mathbf{M}[\mathbf{X}] := \mathbf{I} - \mathbf{P}[\mathbf{X}]</math>.
Then the projection matrix can be decomposed as follows:<ref>{{cite book|last1=Rao|first1=C. Radhakrishna|last2=Toutenburg|first2=Helge|author3=Shalabh|first4=Christian|last4=Heumann|title=Linear Models and Generalizations|url=https://archive.org/details/linearmodelsgene00raop|url-access=limited|year=2008|publisher=Springer|___location=Berlin|isbn=978-3-540-74226-5}}</ref>
:<math> \mathbf{P}[\mathbf{X}] = \mathbf{P}[\mathbf{A}] + \mathbf{P}\big[\mathbf{M}[\mathbf{A}] \mathbf{B}\big], </math>
where, e.g., <math>\mathbf{P}[\mathbf{A}] = \mathbf{A} \left(\mathbf{A}^\textsf{T} \mathbf{A} \right)^{-1} \mathbf{A}^\textsf{T}</math> and <math>\mathbf{M}[\mathbf{A}] = \mathbf{I} - \mathbf{P}[\mathbf{A}]</math>.
There are a number of applications of such a decomposition. In the classical application <math>\mathbf{A}</math> is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the [[fixed effects model]], where <math>\mathbf{A}</math> is a large [[sparse matrix]] of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of <math>\mathbf{X}</math> without explicitly forming the matrix <math>\mathbf{X}</math>, which might be too large to fit into computer memory.
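The decomposition can also be verified numerically. In the sketch below (an illustration only), <math>\mathbf{A}</math> is taken to be a column of ones, as in the classical intercept case, and <math>\mathbf{B}</math> consists of arbitrary simulated regressors:
<syntaxhighlight lang="python">
import numpy as np

def proj(Z):
    """Hat/projection operator P[Z] = Z (Z^T Z)^{-1} Z^T."""
    return Z @ np.linalg.inv(Z.T @ Z) @ Z.T

rng = np.random.default_rng(3)
n = 10
A = np.ones((n, 1))              # classical case: a column of ones (intercept)
B = rng.normal(size=(n, 2))      # arbitrary additional regressors
X = np.hstack([A, B])

P_A = proj(A)
MA_B = (np.eye(n) - P_A) @ B     # B with the effect of A partialled out

# P[X] = P[A] + P[ M[A] B ]
print(np.allclose(proj(X), P_A + proj(MA_B)))  # True
</syntaxhighlight>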
==History==
The hat matrix was introduced by John Wilder Tukey in 1972. An article by D. C. Hoaglin and R. E. Welsch (1978) gives the properties of the matrix and many examples of its application.
== See also ==
[[Category:Regression analysis]]
[[Category:Matrices (mathematics)]]