Generalized linear model

This is an old revision of this page, as edited by Avraham (talk | contribs) at 01:09, 17 October 2006 (Category:Actuarial science). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics, the generalized linear model (GLM) is a generalization of ordinary least squares regression. A GLM can be written as follows:

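The data ($Y$) are predicted by

$$E(Y) = \mu = g^{-1}(X\beta)$$

where $X\beta$ is a linear combination of the explanatory variables $X$ with unknown coefficients $\beta$, $g$ is called the link function, and the variance of the random component is a function of the mean:

$$\operatorname{Var}(Y) = V(\mu)$$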

It is convenient if the variance function follows from a distribution in the exponential family, which covers a very large range of distributions, but the model may also simply state that the variance is some function of the predicted value.

The parameters are typically estimated by maximum likelihood or quasi-maximum likelihood.
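As an illustration of maximum likelihood estimation, the following is a minimal sketch of iteratively reweighted least squares (IRLS), a common way to compute the estimates. It assumes a Poisson response with a log link; the function name `fit_poisson_irls` and its parameters are hypothetical, chosen here for illustration, and the sketch omits safeguards (step halving, checks for a singular design matrix) that production code would need.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=25, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])          # start from a constant mean of 1
    for _ in range(n_iter):
        eta = X @ beta                   # linear predictor
        mu = np.exp(eta)                 # inverse of the log link
        # Working response and weights for the log link with Var(Y) = mu.
        z = eta + (y - mu) / mu
        w = mu
        # Weighted least squares step: solve (X' W X) beta = X' W z.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

For an intercept-only model the estimate should equal the log of the sample mean, which gives a quick sanity check on the routine.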

Generalized Linear Model Components

The GLM is made up of three elements.

1. A distribution function in exponential family form.  
2. A linear predictor.  
3. A link function.  

Exponential Family of Distributions

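The exponential family consists of those probability distributions whose density (or mass) function can be written in the form

$$f(y; \theta) = \exp\left[a(y)\,b(\theta) + c(\theta) + d(y)\right]$$

The functions $a$, $b$, $c$, and $d$ are known. If $a(y) = y$, the distribution is said to be in canonical form.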

Using this form, many common distributions can easily be analyzed.
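As a small check of this idea, the sketch below writes the Poisson distribution in the form $\exp[a(y)b(\theta) + c(\theta) + d(y)]$ and compares it with the usual formula. The particular choice $a(y) = y$, $b(\theta) = \ln\theta$, $c(\theta) = -\theta$, $d(y) = -\ln y!$ follows from taking the logarithm of the Poisson mass function; the function names are hypothetical and used only for illustration.

```python
import math

def poisson_pmf(y, theta):
    """Poisson probability mass function written in the usual way."""
    return theta ** y * math.exp(-theta) / math.factorial(y)

def poisson_pmf_exp_family(y, theta):
    """The same pmf written as exp[a(y) b(theta) + c(theta) + d(y)]."""
    a = y                      # a(y) = y, so the form is canonical
    b = math.log(theta)        # b(theta) = ln(theta)
    c = -theta                 # c(theta) = -theta
    d = -math.lgamma(y + 1)    # d(y) = -ln(y!)
    return math.exp(a * b + c + d)
```

Because $a(y) = y$ here, the Poisson distribution is in canonical form, and $b(\theta) = \ln\theta$ is what leads to the log link.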

Linear Predictors

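Linear predictors are functions of the form

$$\eta = X\beta$$

where $X$ holds the explanatory variables and $\beta$ is the vector of unknown coefficients.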

The link function ties the linear predictor to the distribution function by relating the mean of the distribution to the linear predictor. Many link functions can be used, and it is important to match the link function to the domain of the distribution's mean: for example, a mean that must be positive calls for a link whose inverse returns only positive values. The following table lists some common link functions used for various distributions.

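Canonical Link Functions
Distribution            Link Function   Formula
Normal                  Identity        $X\beta = \mu$
Exponential, Gamma      Inverse         $X\beta = \mu^{-1}$
Poisson                 Log             $X\beta = \ln(\mu)$
Binomial, Multinomial   Logit           $X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$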
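The links in the table can be sketched in code as pairs of a link $g$ and its inverse $g^{-1}$, with the mean recovered as $\mu = g^{-1}(X\beta)$. The dictionary `LINKS` and the helper `predict_mean` are hypothetical names introduced here for illustration, not part of any particular library.

```python
import numpy as np

# Each entry maps a link name to (link g, inverse link g^{-1}).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "log": (np.log, np.exp),
    "logit": (lambda mu: np.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + np.exp(-eta))),
}

def predict_mean(X, beta, link="identity"):
    """Return mu = g^{-1}(X beta) for the named link."""
    _, inverse_link = LINKS[link]
    eta = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    return inverse_link(eta)
```

Note how the inverse log link always returns a positive mean and the inverse logit always returns a value strictly between zero and one, which is the sense in which a link is matched to the domain of the distribution.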

Examples

The simplest example of a GLM is linear regression. Here the link function is the identity and the response is assumed to be normally distributed with constant variance.

Binomial data

When the response data ($Y$) are binary, the distribution is generally taken to be binomial, and $\mu$ is then interpreted as the probability of $Y$ taking on the value one. The variance function is given by:

$$\operatorname{Var}(Y) = \phi\,\mu(1-\mu)$$

where the dispersion parameter $\phi$ is often exactly one. When it is not, the variance is often described as quasibinomial.

There are several popular link functions for binomial data. The most common is the logit link, the inverse of the logistic function:

$$g(\mu) = \ln\left(\frac{\mu}{1-\mu}\right)$$

In addition, the inverse of any continuous cumulative distribution function (CDF) can be used as a link. The normal CDF is a popular choice and gives the probit model, whose link is

$$g(\mu) = \Phi^{-1}(\mu)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.

The identity link is also sometimes used for binomial data (this is equivalent to using the CDF of a uniform distribution in place of the normal), but it runs into problems when the predicted probabilities fall below zero or above one. Implementations can work around this, but the coefficients can then be difficult to interpret. Still, the identity link is not far from the logit or probit near p = 0.5, where both are very close to linear.
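The closeness of the logit and probit models to a straight line near p = 0.5 can be checked numerically. The sketch below compares the inverse logit and inverse probit with their first-order expansions around a linear predictor of zero; the slopes 1/4 and $1/\sqrt{2\pi}$ are the derivatives of the logistic and standard normal CDFs at zero. All function names are hypothetical, introduced only for this illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def logit(p):
    """Logit link: log-odds of p."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Probit link: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)

def inv_logit(eta):
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Standard normal CDF, the inverse of the probit link."""
    return _STD_NORMAL.cdf(eta)

def linear_approx_inv_logit(eta):
    """First-order expansion of the logistic curve around eta = 0 (p = 0.5)."""
    return 0.5 + eta / 4.0

def linear_approx_inv_probit(eta):
    """First-order expansion of the normal CDF around eta = 0 (p = 0.5)."""
    return 0.5 + eta / math.sqrt(2.0 * math.pi)
```

Evaluating these at a small value such as `eta = 0.2` shows the curved and linear versions agreeing closely, while at `eta = 4.0` the linear approximation to the logistic curve gives 1.5, which is not a valid probability. This is the out-of-range problem the identity link runs into.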

Count data

Another example of a generalized linear model is Poisson regression, which models count data. In this case, the variance is proportional to the mean:

$$\operatorname{Var}(Y) = \phi\,\mu$$

where $\phi$, the dispersion parameter, is often equal to one. When it is not, the model is often described as Poisson with overdispersion, or quasi-Poisson.
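One common way to judge whether the dispersion parameter is close to one is to estimate it from the Pearson residuals of a fitted model. The sketch below assumes the fitted means are already available and writes the dispersion parameter as phi (a symbol chosen here); the function name `pearson_dispersion` and its arguments are hypothetical.

```python
import numpy as np

def pearson_dispersion(y, mu, n_params):
    """Estimate phi in Var(Y) = phi * mu from Pearson residuals.

    y        : observed counts
    mu       : fitted means from a Poisson-type model (assumed given)
    n_params : number of estimated regression coefficients
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Pearson chi-square: squared residuals scaled by the Poisson variance mu.
    pearson_chi2 = np.sum((y - mu) ** 2 / mu)
    # Divide by the residual degrees of freedom.
    return pearson_chi2 / (y.size - n_params)
```

An estimate well above one suggests overdispersion, in which case a quasi-Poisson treatment may be more appropriate than a plain Poisson model.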

References

  • P. McCullagh and J.A. Nelder. Generalized Linear Models. London: Chapman and Hall, 1989.
  • A.J. Dobson. Introduction to Generalized Linear Models, Second Edition. London: Chapman and Hall/CRC, 2001.