Generalized linear model

This is an old revision of this page, as edited by Billjefferys (talk | contribs) at 20:29, 6 May 2006 (External links: Remove broken external link). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics, the generalized linear model (GLM) is a generalization of ordinary least squares regression. A GLM can be written as follows:

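The data $Y$ are predicted by

$$\operatorname{E}(Y) = \mu = g^{-1}(X\beta)$$

where $g$ is called the link function, $X\beta$ is the linear predictor, and the variance of the random component is a function of the mean:

$$\operatorname{Var}(Y) = V(\mu)$$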
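As a minimal sketch of this structure, the mean of one observation can be computed by applying an inverse link to the linear predictor. The function name `predict_mean`, the coefficient values and the three inverse links below are illustrative assumptions, not part of any particular library.

```python
import math

def predict_mean(x, beta, inverse_link):
    """Return E(Y) = g^{-1}(x . beta) for a single observation."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor
    return inverse_link(eta)

def identity(eta):
    return eta

def inv_log(eta):
    return math.exp(eta)

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

x = [1.0, 2.0]        # first entry plays the role of an intercept term
beta = [0.5, -0.25]   # hypothetical coefficients, so eta = 0.5 - 0.5 = 0.0

mean_identity = predict_mean(x, beta, identity)
mean_log = predict_mean(x, beta, inv_log)
mean_logit = predict_mean(x, beta, inv_logit)
```

The same linear predictor gives a different mean under each link, which is the only part of the model that changes between the three calls.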

It is convenient if the distribution of the response belongs to the exponential family, which covers a very large range of distributions, but the random component may also be specified simply by stating that the variance is a function of the predicted value.

The parameters are usually estimated by maximum likelihood or quasi-likelihood.
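One common way to carry out maximum likelihood estimation for a GLM is Newton-Raphson iteration, which for canonical links coincides with iteratively reweighted least squares. The sketch below assumes a binary response, a logit link and a single predictor with an intercept. The function name `fit_logistic_irls` and the toy data are hypothetical, and the code omits the convergence checks a real implementation would need.

```python
import math

def fit_logistic_irls(xs, ys, iterations=25):
    """Fit P(Y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector and the 2x2 Fisher information matrix.
        s0 = s1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = mu * (1.0 - mu)      # binomial variance function as weight
            s0 += y - mu
            s1 += (y - mu) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} * score (explicit 2x2 inverse).
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

# Toy data (made up for illustration); the classes overlap, so the MLE exists.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_irls(xs, ys)
```

At the maximum the score equations hold, so the residuals `y - mu` sum to zero both unweighted and weighted by `x`.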

Examples

The simplest example of a GLM is linear regression. Here the link function is the identity and the response is assumed to be normally distributed with constant variance.
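With the identity link and a normal response of constant variance, maximizing the likelihood is equivalent to minimizing the sum of squared residuals, so the fit reduces to ordinary least squares. A minimal sketch for one predictor, using the closed-form solution (the function name `fit_ols` and the data are illustrative assumptions):

```python
def fit_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # constructed to lie exactly on y = 1 + 2x
intercept, slope = fit_ols(xs, ys)
```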

Binomial data

When the response data $Y$ are binary, the distribution is generally taken to be binomial, and $\mu$ is then interpreted as the probability that $Y$ takes the value one. The variance function is given by:

$$\operatorname{Var}(Y) = \phi\,\mu(1-\mu)$$

where the dispersion parameter $\phi$ is often exactly one. When it is not, the variance is often described as quasibinomial.
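A short sketch of the binomial variance function, dispersion times mu times (1 - mu), may help fix ideas. The function name `binomial_variance` and its `dispersion` argument are hypothetical; a dispersion other than one corresponds to the quasibinomial case.

```python
def binomial_variance(mu, dispersion=1.0):
    """Variance of a binary response with mean mu, scaled by a dispersion."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be a probability in [0, 1]")
    return dispersion * mu * (1.0 - mu)

var_at_half = binomial_variance(0.5)                     # largest variance
var_at_tenth = binomial_variance(0.1)                    # smaller near the edges
var_quasi = binomial_variance(0.5, dispersion=2.0)       # quasibinomial case
```

The variance peaks at a mean of 0.5 and shrinks to zero as the mean approaches zero or one.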

Several link functions are popular for binomial data. The most common is the logit, whose inverse is the logistic function:

$$g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$$

In addition, the inverse of any continuous cumulative distribution function (CDF) can be used as a link. The normal CDF is a popular choice and gives the probit model, with link

$$g(\mu) = \Phi^{-1}(\mu)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.
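The logit and probit links and their inverses can be sketched with the Python standard library alone, using `statistics.NormalDist` for the standard normal CDF and its inverse. The four function names are illustrative choices.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def logit(mu):
    """Logit link: log-odds of a probability strictly between 0 and 1."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit(mu):
    """Probit link: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(mu)

def inv_probit(eta):
    """Standard normal CDF, the inverse of the probit link."""
    return _STD_NORMAL.cdf(eta)

p = 0.3
eta_logit = logit(p)     # negative, since p < 0.5
eta_probit = probit(p)   # also negative, on a different scale
```

Both links map a probability of 0.5 to a linear predictor of zero, and applying each inverse recovers the original probability.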

The identity link is also sometimes used for binomial data (this is equivalent to using the CDF of the uniform distribution instead of that of the normal), but it runs into problems when the predicted probabilities fall below zero or above one. Implementations can work around this, but the coefficients of such a model can be difficult to interpret. Near p=0.5, however, the identity link is not far from the logit or probit.
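Both points can be illustrated numerically. The sketch below compares the logistic curve with a straight line chosen, as an assumption for illustration, to be its tangent at p=0.5 (the logistic function has slope 1/4 there). The names `inv_logit` and `tangent_line` are hypothetical.

```python
import math

def inv_logit(eta):
    """Logistic function: always returns a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))

def tangent_line(eta):
    """Straight line touching the logistic curve at eta = 0, p = 0.5."""
    return 0.5 + eta / 4.0

# Close to p = 0.5 the two agree well; far from it the line leaves [0, 1].
rows = [(eta, inv_logit(eta), tangent_line(eta)) for eta in (-0.4, 0.0, 0.4, 4.0)]
for eta, p_logistic, p_linear in rows:
    print(f"eta={eta:5.1f}  logistic={p_logistic:.4f}  linear={p_linear:.4f}")
```

For small linear predictors the two columns differ only in the third decimal place, while at the largest value shown the straight line returns a "probability" above one.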

Count data

Another example of a generalized linear model is Poisson regression, which models count data. In this case, the variance is proportional to the mean:

$$\operatorname{Var}(Y) = \phi\,\mu$$

where $\phi$, the dispersion parameter, is often equal to one. When it is not, the variance is often described as Poisson with overdispersion, or quasi-Poisson.
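One common way to judge whether the dispersion is near one is the Pearson estimate: the sum of squared residuals, each divided by its fitted mean, over the residual degrees of freedom. The sketch below assumes an intercept-only Poisson model, whose fitted mean is the sample mean. The function name `pearson_dispersion` and the counts are made up for illustration.

```python
def pearson_dispersion(ys, mus, n_params):
    """Pearson estimate of the dispersion for a Poisson-type variance V(mu) = mu."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(ys, mus))
    return chi2 / (len(ys) - n_params)

counts = [0, 2, 4, 10, 4]                    # hypothetical count data
fitted_mean = sum(counts) / len(counts)      # intercept-only fit: mean = 4.0
mus = [fitted_mean] * len(counts)
dispersion = pearson_dispersion(counts, mus, n_params=1)
```

An estimate well above one, as with these counts, suggests overdispersion relative to the Poisson assumption.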

References

  • P. McCullagh and J.A. Nelder. Generalized Linear Models. London: Chapman and Hall, 1989.
  • A.J. Dobson. Introduction to Generalized Linear Models, Second Edition. London: Chapman and Hall/CRC, 2001.