Projection matrix

In [[statistics]], the '''hat matrix''', '''''H''''', relates the fitted values to the observed values. It describes the influence each observed value has on each fitted value<ref name="Hoaglin1977">
{{Citation| title = The Hat Matrix in Regression and ANOVA
| first1= David C. | last1= Hoaglin |first2= Roy E. | last2=Welsch }}</ref>
 
If the vector of observed values is denoted by '''y''' and the vector of fitted values by <math>\hat{\mathbf{y}}</math>,
:<math>\hat{\mathbf{y}} = \mathbf{H} \mathbf{y}.</math>
As <math>\hat{\mathbf{y}}</math> is usually pronounced "y-hat", the hat matrix is so named as it "puts a hat on '''y'''".
 
Suppose that we wish to solve a [[linear model]] using [[linear least squares]]. The model can be written as
:<math>\mathbf{y} = \mathbf{X} \boldsymbol\beta + \boldsymbol\varepsilon,</math>
where '''X''' is a matrix of explanatory variables (the [[design matrix]]), '''''β''''' is a vector of unknown parameters to be estimated, and '''''ε''''' is the error vector.
The estimated parameters are
:<math>\mathbf{\hat{ \boldsymbol \beta}} = \left(\mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{y},</math>
so the fitted values are
:<math>\hat{\mathbf{y}} = \mathbf{X \hat{\boldsymbol \beta}} = \mathbf{X} \left(\mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{y}.</math>
Therefore the hat matrix is given by
:<math>\mathbf{H} = \mathbf{X} \left(\mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top.</math>
In the language of [[linear algebra]], the hat matrix is the [[orthogonal projection]] onto the [[column space]] of the design matrix '''X'''.
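
The following is a minimal numerical sketch (the small design matrix '''X''' and response '''y''' below are arbitrary illustrative values, not taken from the cited source) showing the construction of '''H''' with NumPy and checking that it reproduces the least-squares fitted values:

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary illustrative design matrix (intercept plus one regressor) and response.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.5, 4.0])

# Hat matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values obtained via the hat matrix agree with an ordinary least-squares fit.
y_hat = H @ y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(y_hat, X @ beta_hat)
</syntaxhighlight>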
 
The hat matrix corresponding to a [[linear model]] is [[symmetric matrix|symmetric]] and [[idempotent]], that is, '''H'''<sup>2</sup> = '''H'''. However, this is not always the case; in [[local regression|locally weighted scatterplot smoothing (LOESS)]], for example, the hat matrix is in general neither symmetric nor idempotent.
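
A quick numerical check of these two properties, reusing the same arbitrary example matrix:

<syntaxhighlight lang="python">
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Symmetry (H^T = H) and idempotence (H H = H), up to floating-point rounding.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
</syntaxhighlight>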
 
The formula for the vector of residuals '''r''' can be expressed compactly using the hat matrix:
:<math>\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{H} \mathbf{y} = (\mathbf{I} - \mathbf{H}) \mathbf{y}.</math>
The [[variance-covariance matrix]] of the residuals is therefore, by [[error propagation]], equal to <math>\mathbf{\left(I-H \right)^\top V\left(I-H \right) }</math>, where '''V''' is the variance-covariance matrix of the errors (and by extension, the observations as well).
For the case of linear models with [[independent and identically distributed]] errors in which '''V''' = ''σ''<sup>2</sup>'''I''', this reduces to ('''I''' - '''H''')''σ''<sup>2</sup>.<ref name="Hoaglin1977"/>
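
As an illustrative check (again with the arbitrary example matrix, and an assumed error variance ''σ''<sup>2</sup> = 2.5), the propagated covariance ('''I''' - '''H''')<sup>T</sup>'''V'''('''I''' - '''H''') collapses to ''σ''<sup>2</sup>('''I''' - '''H''') when '''V''' = ''σ''<sup>2</sup>'''I''':

<syntaxhighlight lang="python">
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(len(X))

sigma2 = 2.5          # assumed error variance, arbitrary
V = sigma2 * I        # i.i.d. errors
resid_cov = (I - H).T @ V @ (I - H)

# Because I - H is symmetric and idempotent, this equals sigma^2 (I - H).
assert np.allclose(resid_cov, sigma2 * (I - H))
</syntaxhighlight>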
 
For [[linear models]], the [[trace (linear algebra)|trace]] of the hat matrix is equal to the [[rank (linear algebra)|rank]] of '''X''', which is the number of independent parameters of the linear model.
For other models such as LOESS that are still linear in the observations '''y''',
the hat matrix can be used to define the [[degrees of freedom (statistics)#Effective degrees of freedom|effective degrees of freedom]] of the model.
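
For the same illustrative design matrix, the trace of '''H''' can be compared numerically with the rank of '''X''':

<syntaxhighlight lang="python">
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
H = X @ np.linalg.inv(X.T @ X) @ X.T

# trace(H) equals rank(X), here 2, the number of independent parameters.
assert np.isclose(np.trace(H), np.linalg.matrix_rank(X))
</syntaxhighlight>
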
==Correlated residuals==
 
The above may be generalized to the case of correlated errors. Suppose that the [[covariance matrix]] of the errors is '''A'''. Then since
 
:<math> \hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\top \mathbf{A}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{A}^{-1} \mathbf{y}, </math>
 
the hat matrix is thus
 
:<math> \mathbf{H} = \mathbf{X} \left(\mathbf{X}^\top \mathbf{A}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{A}^{-1}, </math>
 
and again it may be seen that '''H'''<sup>2</sup> = '''H'''.
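
A short numerical sketch of the correlated case (the AR(1)-style covariance matrix '''A''' below is an arbitrary positive-definite choice made purely for illustration):

<syntaxhighlight lang="python">
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

# Arbitrary positive-definite covariance matrix A for the correlated errors
# (AR(1)-style correlation, chosen purely for illustration).
rho = 0.6
idx = np.arange(len(X))
A = rho ** np.abs(idx[:, None] - idx[None, :])

A_inv = np.linalg.inv(A)
H = X @ np.linalg.inv(X.T @ A_inv @ X) @ X.T @ A_inv

# H is still idempotent (H H = H), though in general no longer symmetric.
assert np.allclose(H @ H, H)
</syntaxhighlight>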
 
== See also ==