In statistics, the generalized linear model (GLM) generalizes ordinary least squares regression. A GLM can be written as follows:
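The data ($Y$) are predicted by
$$E(Y) = \mu = g^{-1}(X\beta)$$
where $g$ is called the link function, $X\beta$ is a linear combination of the covariates $X$ with unknown coefficients $\beta$, and the random component is a function of the mean:
$$\operatorname{Var}(Y) = V(\mu)$$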
It is convenient if the distribution of the response belongs to the exponential family, which covers a wide range of distributions, but the model may also simply state that the variance is a function of the predicted value.
The parameters are usually estimated by maximum likelihood or quasi-maximum likelihood.
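As an illustration of maximum likelihood estimation, the following is a minimal sketch of iteratively reweighted least squares (IRLS) for one specific case: a Poisson response with a log link and a single covariate. The function name `fit_poisson_irls`, the toy data, and the restriction to an intercept plus one slope are assumptions made for the example, not part of the article.

```python
import math

def fit_poisson_irls(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    # Start from a flat model whose mean matches the sample mean.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        # Accumulate the weighted normal equations (X'WX) b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            w = mu                      # Poisson with log link: weight equals mu
            z = eta + (yi - mu) / mu    # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 system in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical toy data: counts that grow with x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson_irls(x, y)
print(b0, b1)
```

At the maximum likelihood estimate the fitted means reproduce the observed total, so the sum of `exp(b0 + b1 * xi)` should match `sum(y)` up to numerical tolerance.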
Generalized Linear Model Components
The GLM is made up of three elements.
- 1. A distribution function in exponential family form.
- 2. A linear predictor.
- 3. A link function.
Exponential Family of Distributions
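The exponential family consists of those probability distributions whose density or mass function can be written in the form
$$f(y; \theta) = s(y)\, t(\theta)\, e^{a(y)\, b(\theta)}$$
The functions $a$, $b$, $s$, and $t$ are known. If $a(y) = y$, the distribution is said to be in canonical form.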
Using this form, many common distributions can easily be analyzed.
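As a concrete case, the Poisson distribution can be checked numerically against an exponential family factorization. The sketch below assumes the form $f(y; \theta) = s(y)\, t(\theta)\, e^{a(y)\, b(\theta)}$ (notation in the style of Dobson's text, taken here as an assumption) with $a(y) = y$, $b(\theta) = \ln\theta$, $s(y) = 1/y!$ and $t(\theta) = e^{-\theta}$. The function names are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability mass function written the usual way."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

def poisson_expfam(y, lam):
    """The same mass function assembled from the four known functions."""
    a = y                         # a(y) = y, so the form is canonical
    b = math.log(lam)             # b(theta) = ln(theta)
    s = 1.0 / math.factorial(y)   # s(y) = 1 / y!
    t = math.exp(-lam)            # t(theta) = exp(-theta)
    return s * t * math.exp(a * b)

# The two expressions should agree for every count y.
for y in range(6):
    print(y, poisson_pmf(y, 2.5), poisson_expfam(y, 2.5))
```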
Linear Predictors
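Linear predictors are functions of the form
$$\eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$$
or, in matrix notation, $\eta = X\beta$, where the $x_{ij}$ are the covariates and the $\beta_j$ are unknown coefficients.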
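A linear predictor is a weighted sum of the covariates for each observation. The following minimal sketch computes it for a small design matrix whose first column of ones carries the intercept. The name `linear_predictor` and the numbers are made up for illustration.

```python
def linear_predictor(X, beta):
    """Return eta_i = sum_j X[i][j] * beta[j] for every row of X."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]

# Hypothetical design matrix: a column of ones (intercept) plus one covariate.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0]]
beta = [0.2, 1.5]
eta = linear_predictor(X, beta)
print(eta)
```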
Link Functions
The link function connects the linear predictor to the mean of the distribution. Many link functions can be used, and it is important to match the link function to the domain of the distribution's mean. The following table lists some common link functions used for various distributions.
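Distribution | Link Function | Formula (linear predictor $X\beta$, mean $\mu$) |
---|---|---|
Normal | Identity | $X\beta = \mu$ |
Exponential | Inverse | $X\beta = \mu^{-1}$ |
Gamma | Inverse | $X\beta = \mu^{-1}$ |
Poisson | Log | $X\beta = \ln(\mu)$ |
Binomial | Logit | $X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$ |
Multinomial | Logit | $X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$ |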
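The common links and their inverses are short enough to write out directly. The sketch below pairs each link $g$ (mean to linear predictor) with its inverse $g^{-1}$ (linear predictor to mean). The dictionary name `LINKS` and its layout are assumptions for illustration and do not mirror any particular library's API.

```python
import math

# Each entry is (link g, inverse link g^-1).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}

# Applying a link and then its inverse should return the original mean.
mu = 0.3
for name, (g, g_inv) in LINKS.items():
    print(name, g(mu), g_inv(g(mu)))
```

The inverse log link always returns a positive mean and the inverse logit always returns a value strictly between zero and one, which is why these links suit Poisson and binomial responses respectively.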
Examples
The simplest example of a GLM is linear regression. Here the link function is the identity and the response is assumed to be normally distributed with constant variance.
Binomial data
When the response data ($Y$) are binary, the distribution is generally taken to be binomial, and the mean $\mu$ is then interpreted as the probability of $Y$ taking on the value one. The variance function is given by:
$$\operatorname{Var}(Y) = \phi\, \mu (1 - \mu)$$
where the dispersion parameter $\phi$ is often exactly one. When it is not, the model is often described as quasibinomial.
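Several link functions are popular for binomial data. The most common is the logit link, the inverse of the logistic function, which maps a probability $p$ to
$$g(p) = \ln\left(\frac{p}{1-p}\right)$$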
In addition, the inverse of any continuous cumulative distribution function (CDF) can be used as a link. The normal CDF is a popular choice and yields the probit model, whose link is
$$g(p) = \Phi^{-1}(p)$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution.
The identity link is also sometimes used for binomial data (this is equivalent to using the uniform distribution instead of the normal as the CDF), but it runs into problems when the predicted probabilities fall above one or below zero. Implementations can work around this, but the coefficients can be difficult to interpret in this model. Still, the identity link is not far from the logit or probit near p=0.5, since both are very close to linear in a neighborhood of 0.5.
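The near-linearity of the logit and probit links around p=0.5 can be checked numerically. The sketch below compares each link with its tangent line at 0.5. The slopes follow from differentiating the links: 4 for the logit and $\sqrt{2\pi}$ for the probit. The helper names `logit` and `probit` are chosen for this example, and the probit uses `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Inverse of the standard normal CDF."""
    return NormalDist().inv_cdf(p)

for p in (0.40, 0.45, 0.50, 0.55, 0.60):
    lin_logit = 4.0 * (p - 0.5)                         # tangent to logit at 0.5
    lin_probit = math.sqrt(2.0 * math.pi) * (p - 0.5)   # tangent to probit at 0.5
    print(f"p={p:.2f}  logit={logit(p):+.4f} (linear {lin_logit:+.4f})  "
          f"probit={probit(p):+.4f} (linear {lin_probit:+.4f})")
```

Over this range the links and their linear approximations differ by less than 0.01. The gap grows quickly as p approaches zero or one, which is where the identity link breaks down.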
Count data
Another example of a generalized linear model is Poisson regression, which models count data. In this case, the variance is proportional to the mean:
$$\operatorname{Var}(Y) = \phi \mu$$
where $\phi$, the dispersion parameter, is often equal to one. When it is not, the model is often described as Poisson with overdispersion or quasi-Poisson.
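One common way to judge whether the dispersion parameter is close to one is to divide the Pearson chi-square statistic by the residual degrees of freedom. The article does not prescribe an estimator, so this choice is an assumption, as are the function name `pearson_dispersion` and the toy numbers. The fitted means `mu` are taken as given from an already fitted model.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    Assumes a Poisson-type variance function V(mu) = mu.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical observed counts and fitted means from a one-parameter model.
y_obs = [2, 4, 6]
mu_hat = [3.0, 4.0, 5.0]
print(pearson_dispersion(y_obs, mu_hat, n_params=1))
```

Values well above one suggest overdispersion, in which case a quasi-Poisson description may be more appropriate than a plain Poisson model.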
References
- P. McCullagh and J.A. Nelder. Generalized Linear Models. London: Chapman and Hall, 1989.
- A.J. Dobson. Introduction to Generalized Linear Models, Second Edition. London: Chapman and Hall/CRC, 2001.