:<math>\mathbf{r} = \mathbf{y} - \mathbf{\hat{y}} = \mathbf{y} - \mathbf{P} \mathbf{y} = \left( \mathbf{I} - \mathbf{P} \right) \mathbf{y},</math>
where <math>\mathbf{I}</math> is the [[identity matrix]]. The matrix <math>\mathbf{M} \equiv \left( \mathbf{I} - \mathbf{P} \right)</math> is sometimes referred to as the '''residual maker matrix'''. Moreover, the element in the ''i''th row and ''j''th column of <math>\mathbf{P}</math> is equal to the [[covariance]] between the ''j''th response value and the ''i''th fitted value, divided by the [[variance]] of the former:
:<math>p_{ij} = \frac{\operatorname{Cov}\left[ \hat{y}_i, y_j \right]}{\operatorname{Var}\left[ y_j \right]}.</math>
Therefore, the [[covariance matrix]] of the residuals <math>\mathbf{r}</math>, by [[error propagation]], equals
:<math>\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right)^\textsf{T} \mathbf{\Sigma} \left( \mathbf{I} - \mathbf{P} \right),</math>
where <math>\mathbf{\Sigma}</math> is the [[covariance matrix]] of the error vector (and by extension, the response vector as well). For the case of linear models with [[independent and identically distributed]] errors in which <math>\mathbf{\Sigma} = \sigma^{2} \mathbf{I}</math>, this reduces to:<ref name="Hoaglin1977"/>
:<math>\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right) \sigma^{2}</math>.
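This reduction can be checked numerically. The following is a minimal sketch in Python/NumPy; the design matrix <code>X</code>, response <code>y</code> and error variance <code>sigma2</code> are arbitrary illustrative choices, not taken from any particular dataset.
<syntaxhighlight lang="python">
# Minimal sketch: the residual maker matrix M = I - P produces the residuals,
# and for i.i.d. errors the residual covariance reduces to (I - P) * sigma^2.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))            # hypothetical design matrix
y = rng.normal(size=6)                 # hypothetical response vector

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection (hat) matrix
M = np.eye(6) - P                      # residual maker matrix

r = M @ y                              # residuals
print(np.allclose(r, y - P @ y))       # True: M y equals y - P y

sigma2 = 1.5                           # assumed i.i.d. error variance
Sigma_r = M @ (sigma2 * np.eye(6)) @ M.T
print(np.allclose(Sigma_r, sigma2 * M))  # True: reduces to (I - P) sigma^2
</syntaxhighlight>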
From the figure, it is clear that the closest point from the vector <math>\mathbf{b}</math> onto the column space of <math>\mathbf{A}</math> is <math>\mathbf{Ax}</math>, the point reached by drawing a line from <math>\mathbf{b}</math> orthogonal to the column space of <math>\mathbf{A}</math>. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so
:<math>\mathbf{A}^\textsf{T}(\mathbf{b}-\mathbf{Ax}) = 0</math>
From there, one rearranges, so
:<math>\begin{align}
&& \mathbf{A}^\textsf{T}\mathbf{b} &= \mathbf{A}^\textsf{T}\mathbf{A}\mathbf{x} \\
\Rightarrow && \mathbf{x} &= \left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b}
\end{align}</math>
Therefore, since <math>\mathbf{Ax}</math> lies in the column space of <math>\mathbf{A}</math>, the projection of <math>\mathbf{b}</math> onto that column space is <math>\mathbf{Ax} = \mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b}</math>, so the projection matrix is <math>\mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}</math>.
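A short numerical sketch of this derivation follows; the matrix <code>A</code> and vector <code>b</code> are arbitrary illustrative choices.
<syntaxhighlight lang="python">
# Sketch: x solves the normal equations, and A x is the orthogonal projection
# of b onto the column space of A.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 2))
b = rng.normal(size=5)

x = np.linalg.solve(A.T @ A, A.T @ b)    # x = (A^T A)^{-1} A^T b
proj = A @ x                             # projection of b onto col(A)

print(np.allclose(A.T @ (b - proj), 0))  # residual is orthogonal to col(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T     # the projection matrix itself
print(np.allclose(P @ b, proj))          # P b equals A x
</syntaxhighlight>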
== Linear model ==
Suppose that we wish to estimate a linear model using linear least squares. The model can be written as
:<math>\mathbf{y} = \mathbf{X} \boldsymbol\beta + \boldsymbol\varepsilon,</math>
where <math>\mathbf{X}</math> is a matrix of [[explanatory variable]]s (the [[design matrix]]), '''''β''''' is a vector of unknown parameters to be estimated, and '''''ε''''' is the error vector.
When the weights for each observation are identical and the [[errors and residuals in statistics|errors]] are uncorrelated, the estimated parameters are
:<math>\hat{\boldsymbol\beta} = \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y},</math>
so the fitted values are
:<math>\hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol \beta} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y}.</math>
Therefore, the projection matrix (and hat matrix) is given by
:<math>\mathbf{P} \equiv \mathbf{X} \left(\mathbf{X}^\textsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\textsf{T}.</math>
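For concreteness, the sketch below (Python/NumPy; <code>X</code> and <code>y</code> are hypothetical data chosen only for illustration) forms this hat matrix and checks that it maps the response to the fitted values given by the least-squares estimate.
<syntaxhighlight lang="python">
# Sketch: the OLS hat matrix P = X (X^T X)^{-1} X^T sends y to the fitted values.
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(8), rng.normal(size=(8, 2))])  # design with intercept
y = rng.normal(size=8)

P = X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimate

print(np.allclose(P @ y, X @ beta_hat))  # P y equals the fitted values
print(np.allclose(P, P.T))               # symmetric
print(np.allclose(P @ P, P))             # idempotent
</syntaxhighlight>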
=== Weighted and generalized least squares ===
: <math>
\hat{\mathbf\beta}_{\text{GLS}} = \left( \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1} \mathbf{y}
</math>.
: <math>
H = \mathbf{X}\left( \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1}
</math>
and again it may be seen that <math>H^2 = H\cdot H = H</math>, though now it is no longer symmetric.
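Both points can be verified numerically; in the sketch below (Python/NumPy) the design matrix <code>X</code> and error covariance <code>Sigma</code> are generic illustrative choices.
<syntaxhighlight lang="python">
# Sketch: the GLS hat matrix H = X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}
# is idempotent but, unlike the OLS hat matrix, generally not symmetric.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(7, 2))
L = rng.normal(size=(7, 7))
Sigma = L @ L.T + 7 * np.eye(7)      # a generic positive-definite covariance

Sigma_inv = np.linalg.inv(Sigma)
H = X @ np.linalg.inv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv

print(np.allclose(H @ H, H))         # True: H^2 = H
print(np.allclose(H, H.T))           # generally False: H is not symmetric
</syntaxhighlight>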
== Properties ==
The projection matrix has a number of useful algebraic properties.<ref>{{cite book |last=Gans |first=P. |year=1992 |title=Data Fitting in the Chemical Sciences |url=https://archive.org/details/datafittinginche0000gans |url-access=registration |publisher=Wiley |isbn=0-471-93412-7 }}</ref><ref>{{cite book |last=Draper |first=N. R. |last2=Smith |first2=H. |year=1998 |title=Applied Regression Analysis |publisher=Wiley |isbn=0-471-17082-8 }}</ref> In the language of [[linear algebra]], the projection matrix is the [[orthogonal projection]] onto the [[column space]] of the design matrix <math>\mathbf{X}</math>.<ref name = "Freedman09" /> (Note that <math>\left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}</math> is the [[Moore–Penrose inverse|pseudoinverse]] of <math>\mathbf{X}</math>.)
* <math>\mathbf{u} = (\mathbf{I} - \mathbf{P})\mathbf{y},</math> and <math>\mathbf{u} = \mathbf{y} - \mathbf{P} \mathbf{y} \perp \mathbf{X}.</math>
* <math>\mathbf{P}</math> is symmetric, and so is <math>\mathbf{M} \equiv \left( \mathbf{I} - \mathbf{P} \right)</math>.
Suppose the design matrix <math>X</math> can be decomposed by columns as <math>X= [A~~~B]</math>.
Define the hat or projection operator as <math>P\{X\} = X \left(X^\textsf{T} X \right)^{-1} X^\textsf{T}</math>. Similarly, define the residual operator as <math>M\{X\} = I - P\{X\}</math>.
Then the projection matrix can be decomposed as follows:<ref>{{cite book|last1=Rao|first1=C. Radhakrishna|last2=Toutenburg|first2=Helge|author3=Shalabh|first4=Christian|last4=Heumann|title=Linear Models and Generalizations|url=https://archive.org/details/linearmodelsgene00raop|url-access=limited|year=2008|publisher=Springer|___location=Berlin|isbn=978-3-540-74226-5|pages=[https://archive.org/details/linearmodelsgene00raop/page/n335 323]|edition=3rd}}</ref>
:<math>
P\{X\} = P\{A\} + P\{M\{A\} B\},
</math>
where, e.g., <math>P\{A\} = A \left(A^\textsf{T} A \right)^{-1} A^\textsf{T}</math> and <math>M\{A\} = I - P\{A\}</math>.
There are a number of applications of such a decomposition. In the classical application <math>A</math> is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the [[fixed effects model]], where <math>A</math> is a large [[sparse matrix]] of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of <math>X </math> without explicitly forming the matrix <math>X</math>, which might be too large to fit into computer memory.
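The decomposition can be checked numerically. The sketch below (Python/NumPy) uses the classical case where <code>A</code> is a column of ones; the block <code>B</code> is an arbitrary illustrative choice.
<syntaxhighlight lang="python">
# Sketch: verify P{X} = P{A} + P{M{A} B} for X = [A  B].
import numpy as np

def proj(Z):
    """Projection operator P{Z} = Z (Z^T Z)^{-1} Z^T."""
    return Z @ np.linalg.inv(Z.T @ Z) @ Z.T

rng = np.random.default_rng(4)
A = np.ones((9, 1))                  # classical case: a column of ones (intercept)
B = rng.normal(size=(9, 2))
X = np.hstack([A, B])

M_A = np.eye(9) - proj(A)            # residual operator for A
lhs = proj(X)
rhs = proj(A) + proj(M_A @ B)

print(np.allclose(lhs, rhs))         # True: the decomposition holds
</syntaxhighlight>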