Generalized linear model

In statistics, the generalized linear model (GLM) generalizes ordinary least squares regression. A GLM can be written as follows:

The expected value of the data ( $\mathbf{Y}$ ) is modeled as

$E(\mathbf{Y})=\boldsymbol{\mu}=g^{-1}(X\beta)$

where $g$ is called the link function. The random component is specified through the variance, which is a function of the mean:

$\operatorname{Var}(\mathbf{Y})=f(\boldsymbol{\mu})$

It is convenient if the distribution of $\mathbf{Y}$ belongs to the exponential family, which covers a wide range of distributions, but the model may also simply state that the variance is a function of the predicted value.

Parameters are typically estimated by maximum likelihood or quasi-maximum likelihood.
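As an illustration of maximum likelihood estimation, the following is a minimal sketch of iteratively reweighted least squares (IRLS), a standard algorithm for fitting GLMs, applied to a Poisson model with a log link and a single predictor. The function name `fit_poisson_irls` and the data are hypothetical, and the two-parameter normal equations are solved in closed form to keep the example free of external libraries.

```python
import math

def fit_poisson_irls(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    # Start from the intercept-only fit (assumes the mean of y is positive).
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # For the log link with Poisson variance, the working weight is mu
        # and the working response is eta + (y - mu) / mu.
        w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted sums needed for the 2x2 weighted normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical count data that grow roughly exponentially in x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 3, 4, 9, 13, 24]
b0, b1 = fit_poisson_irls(x, y)
print(b0, b1)
```

At the maximum likelihood estimate the score equations hold, so the residuals $y_{i}-\mu_{i}$ sum to zero both unweighted and weighted by $x_{i}$; this gives a simple check on the fit.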

Generalized Linear Model Components

The GLM is made up of three elements.

1. A distribution function in exponential family form.

$f_{y}(y;\theta)=\exp(a(y)b(\theta)+c(\theta)+d(y))$

2. A linear predictor.

$\eta=X\beta$

3. A link function.

$E(y)=\mu=g^{-1}(\eta)$
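The three elements can be sketched in code for one concrete case, a Bernoulli response with a logit link. This is only an illustration: the coefficient values, the data, and the helper names `inverse_logit` and `bernoulli_log_density` are assumptions made for the example.

```python
import math

def inverse_logit(eta):
    """Inverse link g^{-1}: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_log_density(y, mu):
    """Log of the Bernoulli mass function with mean mu, for y in {0, 1}."""
    return y * math.log(mu) + (1 - y) * math.log(1.0 - mu)

# Hypothetical design matrix (first column is the intercept) and coefficients.
X = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]
beta = [-1.0, 0.5]
y = [0, 1, 1]

# Linear predictor: eta = X beta.
eta = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
# Link function: E(y) = mu = g^{-1}(eta).
mu = [inverse_logit(e) for e in eta]
# Distribution: the log-likelihood sums the log-density over observations.
log_lik = sum(bernoulli_log_density(yi, m) for yi, m in zip(y, mu))
print(eta, mu, log_lik)
```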

Exponential Family of Distributions

The exponential family of distributions consists of those probability distributions whose density or mass function can be written in the form

$f_{y}(y;\theta)=\exp(a(y)b(\theta)+c(\theta)+d(y))$

The functions $a(y)$ , $b(\theta )$ , $c(\theta )$ , and $d(y)$ are known. If $a(y)=y$ , the distribution is said to be in canonical form.

Many common distributions, including the normal, Poisson, and binomial, can be written in this form and therefore analyzed within a common framework.
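As a worked example, the Poisson mass function $\theta^{y}e^{-\theta}/y!$ equals $\exp(y\ln\theta-\theta-\ln y!)$, which matches the form above with $a(y)=y$, $b(\theta)=\ln\theta$, $c(\theta)=-\theta$, and $d(y)=-\ln y!$, so it is in canonical form. The short sketch below checks this identity numerically; the function names are hypothetical.

```python
import math

def poisson_pmf(y, theta):
    """Poisson mass function written in its usual form."""
    return theta ** y * math.exp(-theta) / math.factorial(y)

def poisson_exp_family(y, theta):
    """The same mass function written as exp(a(y) b(theta) + c(theta) + d(y))."""
    a = y                       # a(y) = y, so the form is canonical
    b = math.log(theta)         # b(theta) = ln(theta)
    c = -theta                  # c(theta) = -theta
    d = -math.lgamma(y + 1)     # d(y) = -ln(y!), using ln(y!) = lgamma(y + 1)
    return math.exp(a * b + c + d)

theta = 3.5
for y in range(5):
    print(y, poisson_pmf(y, theta), poisson_exp_family(y, theta))
```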

Linear Predictors

A linear predictor is a linear combination of the predictors with unknown coefficients. With a single predictor $x_{i}$ it has the form

$\eta_{i}=\beta_{0}+\beta_{1}x_{i}$
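The scalar form and the matrix form $\eta=X\beta$ agree once a column of ones is added to $X$ for the intercept. A small sketch, with hypothetical helper names and made-up values:

```python
def design_matrix(xs):
    """Prepend a column of ones so that beta[0] acts as the intercept."""
    return [[1.0, x] for x in xs]

def linear_predictor(X, beta):
    """Return eta = X beta as a list, one entry per observation."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]

xs = [0.0, 1.5, 3.0]     # hypothetical predictor values
beta = [2.0, -0.5]       # hypothetical (beta_0, beta_1)
eta = linear_predictor(design_matrix(xs), beta)
print(eta)
```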

Link Functions

The link function connects the linear predictor to the mean of the distribution function. Many link functions can be used, and it is important to match the link function to the domain of the distribution; for example, a log link keeps the mean of a Poisson model positive. Following is a table of some common link functions used for various distributions.

Canonical Link Functions
Distribution	Link Function	Mean Function
Normal	Identity	$\mu=X\beta$
Exponential	Inverse	$\mu=\frac{1}{X\beta}$
Gamma	Inverse	$\mu=\frac{1}{X\beta}$
Poisson	Log	$\mu=\exp(X\beta)$
Binomial	Logit	$\mu=\frac{\exp(X\beta)}{1+\exp(X\beta)}$
Multinomial	Generalized logit	$\mu_{j}=\frac{\exp(X\beta_{j})}{1+\sum_{k}\exp(X\beta_{k})}$
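The table gives the mean as a function of the linear predictor, that is, the inverse of each link. The sketch below pairs each single-parameter link with its inverse and checks that they undo one another; the dictionary `LINKS` is an illustrative construction, not part of any library.

```python
import math

# Each entry maps a link name to (link g, inverse link g^{-1}).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "log": (math.log, math.exp),
    # exp(eta) / (1 + exp(eta)) is written as 1 / (1 + exp(-eta)).
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

mu = 0.3  # a mean value that is valid for all four links
for name, (link, inverse_link) in LINKS.items():
    eta = link(mu)
    print(name, eta, inverse_link(eta))
```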

Examples

The simplest example of a GLM is linear regression. Here the link function is the identity and the response is assumed to be normally distributed with constant variance.
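With an identity link and a normally distributed response, maximizing the likelihood is the same as minimizing the sum of squared residuals, so the fit reduces to ordinary least squares. A minimal sketch for one predictor, using a hypothetical function name and data constructed to lie exactly on a line:

```python
def fit_simple_ols(x, y):
    """Closed-form least squares estimates for y_i = b0 + b1 * x_i + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 5.0, 8.0, 11.0]   # constructed so that y = 2 + 3x exactly
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)
```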

Binomial data

When the response data ( $Y$ ) are binary, the distribution is generally taken to be binomial, and $\mu_{i}$ is then interpreted as the probability that $Y_{i}$ takes the value one. The variance function is given by:

$\operatorname{Var}(Y)=\phi\mu(1-\mu)$

where $\phi$ , the dispersion parameter, is often exactly one. When it is not, the variance is often described as quasibinomial.

There are several popular link functions for binomial data. The most common is the logit function, the inverse of the logistic function:

$g(p)=\ln\left(\frac{p}{1-p}\right)$

In addition, the inverse of any cumulative distribution function (CDF) can be used as a link. The normal CDF is a popular choice and yields the probit model, with link

$g(p)=\Phi^{-1}(p)$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.

The identity link is also sometimes used for binomial data (this is equivalent to using the CDF of the uniform distribution instead of the normal), but it encounters problems when the predicted probabilities are greater than one or less than zero. Implementations can work around this, but the coefficients of the resulting model can be difficult to interpret. The identity link is nevertheless not far from the logit or probit near $p=0.5$, where both are very close to linear.
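The near-linearity around $p=0.5$ can be checked numerically. The slope of the logit at $p=0.5$ is $1/(p(1-p))=4$, and the slope of the probit there is $1/\phi(0)=\sqrt{2\pi}$, where $\phi$ here denotes the standard normal density. The sketch below compares each link with its tangent line; it relies on `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later), and the grid of probabilities is arbitrary.

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1.0 - p))

def probit(p):
    # Inverse CDF of the standard normal distribution.
    return NormalDist().inv_cdf(p)

LOGIT_SLOPE = 4.0                          # derivative of the logit at p = 0.5
PROBIT_SLOPE = math.sqrt(2.0 * math.pi)    # derivative of the probit at p = 0.5

rows = []
for p in (0.40, 0.45, 0.50, 0.55, 0.60):
    rows.append((p,
                 logit(p), LOGIT_SLOPE * (p - 0.5),
                 probit(p), PROBIT_SLOPE * (p - 0.5)))

for row in rows:
    # Columns: p, logit, linear approximation, probit, linear approximation.
    print(" ".join(f"{value: .4f}" for value in row))
```

Both links pass through zero at $p=0.5$ and stay close to their tangent lines on this grid, which is why an identity link gives similar fitted probabilities when they are all near one half.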

Count data

Another example of a generalized linear model is Poisson regression, which models count data. In this case, the variance is proportional to the mean:

$\operatorname{Var}(y)=\phi\mu$

where $\phi$ , the dispersion parameter, is often equal to one. When it is not, the variance is often described as Poisson with overdispersion or quasipoisson.
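One common way to judge whether $\phi$ differs from one is the Pearson estimate, which divides the sum of squared Pearson residuals $(y_{i}-\mu_{i})^{2}/\mu_{i}$ by the residual degrees of freedom. The sketch below assumes the fitted means are already available; the function name and the numbers are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson estimate of phi for a Poisson-type variance Var(y) = phi * mu."""
    # Squared residuals scaled by the variance function V(mu) = mu.
    pearson_chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    # Residual degrees of freedom: observations minus fitted coefficients.
    return pearson_chi2 / (len(y) - n_params)

y = [2, 5, 9, 20]               # hypothetical observed counts
mu = [3.0, 5.0, 10.0, 17.0]     # hypothetical fitted means from a 2-parameter model
phi_hat = pearson_dispersion(y, mu, n_params=2)
print(phi_hat)
```

Values well above one suggest overdispersion relative to the Poisson model.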
