In [[statistics]], the '''hat matrix''' '''H''' relates the fitted values to the observed values. It describes the influence each observed value has on each fitted value.<ref name="Hoaglin1977">{{Citation | title = The Hat Matrix in Regression and ANOVA | first1 = David C. | last1 = Hoaglin | first2 = Roy E. | last2 = Welsch | journal = [[The American Statistician]] | volume = 32 | number = 1 | month = February | year = 1978 | pages = 17–22 | url = http://www.jstor.org/stable/2683469}}</ref>
The diagonal elements of the hat matrix are the [[leverage (statistics)|leverage]]s, which describe the influence each observed value has on the fitted value for that same observation.
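For example, in [[simple linear regression]] with an intercept term, the leverage of the ''i''-th observation has the closed form
:<math>h_{ii} = \frac{1}{n} + \frac{\left(x_i - \bar{x}\right)^2}{\sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2},</math>
so observations whose value of the explanatory variable lies far from the mean <math>\bar{x}</math> have greater influence on their own fitted values.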
 
If the vector of observed values is denoted by '''y''' and the vector of fitted values by <math>\hat{\mathbf{y}},</math> then
:<math>\hat{\mathbf{y}} = \mathbf{H y}.</math>
As <math>\hat{\mathbf{y}}</math> is usually pronounced "y-hat", the hat matrix is so named as it "puts a hat on '''y'''".
 
Suppose that we wish to solve a [[linear model]] using [[linear least squares]]. The model can be written as
:<math>\mathbf{y} = \mathbf{X \beta} + \mathbf{\epsilon},</math>
where '''X''' is a matrix of explanatory variables (the [[design matrix]]), '''β''' is a vector of unknown parameters to be estimated, and '''ε''' is the error vector.
The estimated parameters are
:<math>\hat{\beta} = \left(\mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{y},</math>
so the fitted values are
:<math>\hat{\mathbf{y}} = \mathbf{X \hat{\beta}} = \mathbf{X} \left(\mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{y}.</math>
Therefore the hat matrix is given by
:<math>\mathbf{H} = \mathbf{X} \left(\mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top.</math>
In the language of [[linear algebra]], the hat matrix is the [[orthogonal projection]] onto the [[column space]] of the design matrix '''X'''.
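As an illustrative sketch only (the data below are made-up values, and NumPy is just one convenient tool), the hat matrix, the fitted values <math>\hat{\mathbf{y}} = \mathbf{H y},</math> and the leverages on its diagonal can be computed directly from a small design matrix:
<syntaxhighlight lang="python">
import numpy as np

# Made-up example data: five observations, design matrix with an intercept column.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])        # n x p design matrix (p = 2)
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])          # observed values

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T

y_hat = H @ y                                    # "H puts a hat on y"
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None) # same fit via least squares
assert np.allclose(y_hat, X @ beta_hat)

leverages = np.diag(H)                           # influence of each observation
print(leverages)                                 # each h_ii lies in [0, 1]
print(leverages.sum())                           # trace(H) = rank(X) = p = 2
</syntaxhighlight>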
 
The hat matrix corresponding to a [[linear model]] is [[symmetric matrix|symmetric]] and [[idempotent]], that is, '''H'''<sup>2</sup> = '''H'''. However, this is not always the case; in [[local regression|locally weighted scatterplot smoothing]], for example, the hat matrix is in general neither symmetric nor idempotent.
 
The formula for the vector of residuals '''r''' can be expressed compactly using the hat matrix:
:<math>\mathbf{r} = \mathbf{y} - \mathbf{\hat{y}} = \mathbf{y} - \mathbf{H y} = (\mathbf{I} - \mathbf{H}) \mathbf{y}.</math>
The [[variance-covariance matrix]] of the residuals is therefore, by [[error propagation]], equal to <math>\mathbf{\left(I-H \right)^\top V \left(I-H \right)}</math>, where '''V''' is the variance-covariance matrix of the errors (and, by extension, of the observations as well). Thus, the [[residual sum of squares]] is a [[quadratic form (statistics)|quadratic form]] in the observations.
For the case of linear models with [[independent and identically distributed]] errors in which '''V''' = σ<sup>2</sup>'''I''', this reduces to ('''I''' − '''H''')σ<sup>2</sup>.<ref name="Hoaglin1977"/>
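A short numerical check (using the same made-up design matrix as in the sketch above) confirms the reduction; it holds because '''I''' − '''H''' is symmetric and idempotent:
<syntaxhighlight lang="python">
import numpy as np

# Same made-up design matrix as the sketch above.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(len(x))

sigma2 = 0.25                      # assumed error variance, so V = sigma^2 * I
V = sigma2 * I

# Error propagation: cov(r) = (I - H)^T V (I - H)
cov_r = (I - H).T @ V @ (I - H)

# With iid errors this collapses to sigma^2 (I - H),
# since I - H is symmetric and idempotent.
assert np.allclose(cov_r, sigma2 * (I - H))
</syntaxhighlight>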
 
The eigenvalues of an idempotent matrix are equal to 1 or 0.<ref>C. B. Read, ''Encyclopedia of Statistical Sciences'', Idempotent Matrices, Wiley, 2006.</ref> Some other useful properties of the hat matrix are summarized in Gans (1992).<ref>P. Gans, ''Data Fitting in the Chemical Sciences'', Wiley, 1992.</ref>
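Both the idempotence of the hat matrix and its 0–1 eigenvalue pattern are easy to verify numerically, as in this sketch (again with the made-up design matrix used above):
<syntaxhighlight lang="python">
import numpy as np

# Same made-up design matrix as above.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(H, H.T)       # symmetric
assert np.allclose(H @ H, H)     # idempotent: H^2 = H

# Eigenvalues of an idempotent matrix are 0 or 1;
# here exactly rank(X) = 2 of them equal 1.
print(np.round(np.linalg.eigvalsh(H), 10))
</syntaxhighlight>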
 
== See also ==
== References ==
<references />
 
 
[[Category:Statistical terminology]]