Projection matrix

In statistics, the hat matrix, H, relates the fitted values to the observed values. It describes the influence each observed value has on each fitted value^[1]. The diagonal elements of the hat matrix are the leverages, which describe the influence each observed value has on the fitted value for that same observation.

If the vector of observed values is denoted by y and the vector of fitted values by ${\hat {\mathbf {y} }}$ , then

{\hat {\mathbf {y} }}=\mathbf {Hy} .

As ${\hat {\mathbf {y} }}$ is usually pronounced "y-hat", the hat matrix is so named as it "puts a hat on y".

Suppose that we wish to fit a linear model using linear least squares. The model can be written as

\mathbf {y} =\mathbf {X\beta } +\mathbf {\varepsilon } ,

where X is a matrix of explanatory variables (the design matrix), β is a vector of unknown parameters to be estimated, and ε is the error vector. The estimated parameters are

\mathbf {\hat {\beta }} =\left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\mathbf {y} ,

so the fitted values are

{\hat {\mathbf {y} }}=\mathbf {X{\hat {\beta }}} =\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\mathbf {y} .

Therefore the hat matrix is given by

\mathbf {H} =\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }.

In the language of linear algebra, the hat matrix is the orthogonal projection onto the column space of the design matrix X.
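The formula above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small made-up design matrix with full column rank (the data, the sizes, and all variable names are illustrative, not taken from any source). It builds H from the formula, compares the fitted values Hy with those from a standard least-squares solver, and checks that y minus the fitted values is orthogonal to the columns of X, as expected for an orthogonal projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix: an intercept column plus two random predictors.
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

# H = X (X^T X)^{-1} X^T, using a linear solve rather than an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

# Fitted values via the hat matrix, and coefficients via a least-squares solver.
y_hat = H @ y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The difference y - y_hat should be orthogonal to every column of X.
diff = y - y_hat
orthogonality = X.T @ diff

print(np.allclose(y_hat, X @ beta_hat))
print(np.allclose(orthogonality, 0.0, atol=1e-10))
```

Forming H explicitly costs memory proportional to n squared, so for large data sets it is usually preferable to work with a solver or a factorization of X instead.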

The hat matrix corresponding to a linear model is symmetric and idempotent, that is, H² = H. These properties do not hold for every fitting procedure that is linear in y; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.
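Symmetry and idempotence of the least-squares hat matrix are easy to confirm on a small example. The sketch below assumes NumPy and a hypothetical straight-line design (an intercept and one predictor); the helper name `hat_matrix` is introduced here for illustration only.

```python
import numpy as np


def hat_matrix(X):
    """Return X (X^T X)^{-1} X^T for a design matrix X with full column rank."""
    return X @ np.linalg.solve(X.T @ X, X.T)


# Hypothetical straight-line design: intercept column and one predictor.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
H = hat_matrix(X)

# Compare with a tolerance, since H is computed in floating point.
is_symmetric = np.allclose(H, H.T)
is_idempotent = np.allclose(H @ H, H)

print(is_symmetric, is_idempotent)
```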

The formula for the vector of residuals r can be expressed compactly using the hat matrix:

\mathbf {r} =\mathbf {y} -\mathbf {\hat {y}} =\mathbf {y} -\mathbf {Hy} =(\mathbf {I} -\mathbf {H} )\mathbf {y} .

The variance-covariance matrix of the residuals is therefore, by error propagation, equal to $\mathbf {\left(I-H\right)^{\top }V\left(I-H\right)}$ , where V is the variance-covariance matrix of the errors (and by extension, the observations as well). For the case of linear models with independent and identically distributed errors in which V = σ²I, this reduces to (I - H)σ²^[1].
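The reduction in the independent, identically distributed case uses only the symmetry and idempotence of H. Written out as a short derivation, with V = σ²I substituted into the general expression:

\left(\mathbf {I} -\mathbf {H} \right)^{\top }\sigma ^{2}\mathbf {I} \left(\mathbf {I} -\mathbf {H} \right)=\sigma ^{2}\left(\mathbf {I} -\mathbf {H} \right)^{2}=\sigma ^{2}\left(\mathbf {I} -2\mathbf {H} +\mathbf {H} ^{2}\right)=\sigma ^{2}\left(\mathbf {I} -\mathbf {H} \right).

The first equality uses the symmetry of H, and the last uses H² = H.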

For linear models, the trace of the hat matrix is equal to the rank of X, which is the number of independent parameters of the linear model. For other models such as LOESS that are still linear in the observations y, the hat matrix can be used to define the effective degrees of freedom of the model.
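As an illustration of the trace property, the sketch below computes the leverages (the diagonal of H) and compares their sum with the rank of X. It assumes NumPy and a randomly generated full-column-rank design. It also builds H from a thin QR factorization, X = QR, so that H = QQ^T; this is a numerically convenient alternative to the explicit formula, used here as a design choice of this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: 10 observations, intercept plus two random predictors.
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Thin QR factorization: Q has orthonormal columns spanning the column space of X.
Q, _ = np.linalg.qr(X)
H = Q @ Q.T

# Leverages are the diagonal elements of H; their sum is the trace.
leverages = np.diag(H)
trace_H = leverages.sum()
rank_X = np.linalg.matrix_rank(X)

print(trace_H, rank_X)
```

Because the leverages sum to the rank of X, their average is rank(X)/n, which gives a natural scale against which individual leverages can be compared.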

Some other properties of the hat matrix are summarized by Gans^[2].

Correlated residuals

The above may be generalized to the case in which the errors are correlated. Suppose that the covariance matrix of the errors is $\mathbf {A}$ . Then ${\hat {\mathbf {\beta } }}=\left(\mathbf {X} ^{\top }\mathbf {A} ^{-1}\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\mathbf {A} ^{-1}\mathbf {y}$ and the hat matrix is

\mathbf {H} =\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {A} ^{-1}\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\mathbf {A} ^{-1}

and again it may be seen that $\mathbf {H} ^{2}=\mathbf {H}$ , although $\mathbf {H}$ is in general no longer symmetric.
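The generalized hat matrix can be examined in the same way. The sketch below assumes NumPy, a straight-line design, and a hypothetical covariance matrix A with entries ρ^|i−j| (an AR(1)-style structure chosen purely for illustration, with ρ = 0.6). It checks idempotence and also tests for symmetry.

```python
import numpy as np

# Hypothetical straight-line design with six observations.
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])

# Assumed covariance matrix: A[i, j] = rho ** |i - j| (illustrative choice).
rho = 0.6
idx = np.arange(n)
A = rho ** np.abs(idx[:, None] - idx[None, :])

# A^{-1} X via a linear solve instead of forming the inverse of A.
Ainv_X = np.linalg.solve(A, X)

# H = X (X^T A^{-1} X)^{-1} X^T A^{-1}; since A is symmetric,
# X^T A^{-1} equals the transpose of A^{-1} X.
H = X @ np.linalg.solve(X.T @ Ainv_X, Ainv_X.T)

is_idempotent = np.allclose(H @ H, H)
is_symmetric = np.allclose(H, H.T)

print(is_idempotent, is_symmetric)
```

In this example H still reproduces the columns of X (HX = X) and is idempotent, but it is an oblique rather than an orthogonal projection, so the symmetry check is not expected to pass.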

References

[1] Hoaglin, David C.; Welsch, Roy E. (1978), "The Hat Matrix in Regression and ANOVA", The American Statistician, 32 (1): 17–22.
[2] Gans, P. (1992), Data Fitting in the Chemical Sciences, Wiley. ISBN 978-0-471-93412-7.
