In statistics, the variance function is a function that relates the variance of a random quantity to its conditional mean. The variance function is a central ingredient in the generalized linear model framework and also plays a role in non-parametric regression and functional data analysis. It should not be confused with the variance of a function. In parametric modelling, variance functions explicitly describe the relationship between the variance and the conditional mean of a random variable. For many well-known distributions, the variance function gives the complete variance of a random variable under that distribution, but these are in fact just special cases.
The variance function and its applications come up in many areas of statistical analysis. A very important use of this function is in the framework of generalized linear models and non-parametric regression.
Generalized Linear Model
Here we derive the variance function for the exponential family in general, as well as for specific examples. In addition, we describe the applications and use of variance functions in maximum likelihood estimation and quasi-likelihood estimation.
Derivation
The generalized linear model (GLM) is a generalization of ordinary regression analysis that extends to any member of the exponential family. It is particularly useful when the response variable is categorical, binary or subject to a constraint (e.g. only positive responses make sense). The components of a GLM are summarized below, but for more details and information see the article on generalized linear models.
A GLM consists of three main ingredients:
1. Random Component - a distribution of $y$ from the exponential family, with $E[y \mid X] = \mu$,
2. Linear Predictor - the relationship between the covariates and the parameters, $\eta = X\beta$,
3. Link Function - a monotone function $g$ relating the mean to the linear predictor, $g(\mu) = \eta$, equivalently $\mu = g^{-1}(X\beta)$ (a minimal sketch of these three pieces follows below).
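As a concrete illustration of these three ingredients, the following minimal sketch in Python, using statsmodels, simulates a Poisson response with a log link and then fits the corresponding GLM; the coefficient values are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Linear predictor: eta = X @ beta
X = sm.add_constant(rng.normal(size=(200, 2)))   # design matrix with intercept
beta = np.array([0.5, 0.3, -0.2])                # hypothetical true coefficients
eta = X @ beta

# Link function: log(mu) = eta, so mu = exp(eta)
mu = np.exp(eta)

# Random component: y | X ~ Poisson(mu), a member of the exponential family
y = rng.poisson(mu)

# Fit the GLM; the Poisson family uses the log link by default
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)                             # estimates close to beta
```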
First it is important to derive a couple of key properties of the exponential family.
Any random variable in the exponential family has a probability density function of the form,
$$f(y; \theta, \phi) = \exp\left(\frac{y\theta - b(\theta)}{\phi} + c(y, \phi)\right),$$
with log-likelihood,
$$\ell(\theta; y, \phi) = \log f(y; \theta, \phi) = \frac{y\theta - b(\theta)}{\phi} + c(y, \phi).$$
Here, $\theta$ is the canonical parameter and the parameter of interest, and $\phi$ is a nuisance parameter which plays a role in the variance. The function $b(\theta)$ is the cumulant function.
We use Bartlett's identities to derive a general expression for the variance function. The first and second Bartlett results ensure that, under suitable regularity conditions, for a density function $f_\theta(y)$ dependent on $\theta$,
$$E_\theta\left[\frac{\partial}{\partial \theta} \log f_\theta(y)\right] = 0$$
and
$$E_\theta\left[\frac{\partial^2}{\partial \theta^2} \log f_\theta(y)\right] + E_\theta\left[\left(\frac{\partial}{\partial \theta} \log f_\theta(y)\right)^2\right] = 0.$$
These identities lead to simple calculations of the expected value and variance of any random variable in the exponential family.
Expected Value of y
Taking the first derivative with respect to $\theta$ of the log of the density in the exponential family form described above, we have
$$\frac{\partial \ell}{\partial \theta} = \frac{y - b'(\theta)}{\phi}.$$
Then taking the expected value and setting it equal to zero (the first Bartlett identity) leads to
$$E\left[\frac{y - b'(\theta)}{\phi}\right] = 0 \quad \Longrightarrow \quad E[y] = b'(\theta) = \mu.$$
Variance of y
To compute the variance we use the second Bartlett identity,
$$E\left[\frac{\partial^2 \ell}{\partial \theta^2}\right] + E\left[\left(\frac{\partial \ell}{\partial \theta}\right)^2\right] = 0,$$
which gives
$$-\frac{b''(\theta)}{\phi} + E\left[\left(\frac{y - b'(\theta)}{\phi}\right)^2\right] = 0 \quad \Longrightarrow \quad \operatorname{Var}(y) = \phi\, b''(\theta).$$
We now have a relationship between $\mu$ and $\theta$, namely
$$\mu = b'(\theta) \quad \text{and} \quad \theta = b'^{-1}(\mu),$$
which allows for a relationship between $\mu$ and the variance,
$$V(\mu) = b''\!\left(b'^{-1}(\mu)\right),$$
so that $\operatorname{Var}(y) = \phi\, V(\mu)$. Note that because $b''(\theta) = \operatorname{Var}(y)/\phi > 0$, $b'$ is monotonically increasing and hence invertible.
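Because $V(\mu) = b''(b'^{-1}(\mu))$ is a purely mechanical computation once $b(\theta)$ is known, it can be carried out symbolically. A minimal sketch in Python with sympy, here using the Poisson cumulant function $b(\theta) = e^\theta$ derived in the examples below; any other cumulant function can be swapped in.

```python
import sympy as sp

theta, mu = sp.symbols('theta mu', positive=True)

# Cumulant function b(theta) for the Poisson family (see the examples below)
b = sp.exp(theta)

b1 = sp.diff(b, theta)                            # b'(theta): the mean as a function of theta
b2 = sp.diff(b, theta, 2)                         # b''(theta)
theta_of_mu = sp.solve(sp.Eq(b1, mu), theta)[0]   # invert mu = b'(theta)
V = sp.simplify(b2.subs(theta, theta_of_mu))      # V(mu) = b''(b'^{-1}(mu))
print(V)                                          # prints: mu
```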
Examples
Normal
The normal distribution is a special case where the variance function is a constant. Let $y \sim N(\mu, \sigma^2)$, then we put the density function of $y$ in the form of the exponential family described above:
$$f(y) = \exp\left(\frac{y\mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right),$$
where
$$\theta = \mu, \qquad b(\theta) = \frac{\theta^2}{2}, \qquad \phi = \sigma^2.$$
To calculate the variance function $V(\mu)$, we first express $\theta$ as a function of $\mu$. Then we transform $b''(\theta)$ into a function of $\mu$:
$$\mu = b'(\theta) = \theta, \qquad b''(\theta) = 1 \quad \Longrightarrow \quad V(\mu) = 1.$$
Therefore the variance function is constant.
Bernoulli
Let $y \sim \text{Bernoulli}(p)$, then we express the density in exponential family form,
$$f(y) = p^y (1-p)^{1-y} = \exp\left(y \log\frac{p}{1-p} + \log(1-p)\right),$$
which gives us
$$\theta = \operatorname{logit}(p), \qquad b(\theta) = \log(1 + e^\theta), \qquad \phi = 1,$$
and
$$\mu = b'(\theta) = \frac{e^\theta}{1+e^\theta} = p, \qquad b''(\theta) = \frac{e^\theta}{(1+e^\theta)^2} = p(1-p),$$
so that
$$V(\mu) = \mu(1-\mu).$$
Poisson
Let $y \sim \text{Poisson}(\lambda)$, then we express the density in exponential family form,
$$f(y) = \frac{\lambda^y e^{-\lambda}}{y!} = \exp\left(y \log\lambda - \lambda - \log(y!)\right),$$
which gives us
$$\theta = \log\lambda, \qquad b(\theta) = e^\theta, \qquad \phi = 1,$$
and
$$\mu = b'(\theta) = e^\theta = \lambda, \qquad b''(\theta) = e^\theta = \lambda.$$
This gives us
$$V(\mu) = \mu.$$
Here we see the central property of Poisson data: the variance is equal to the mean.
Gamma
The gamma distribution and density function can be expressed under many different parametrizations. We will use the form of the gamma with parameters $\mu$ and $\nu$,
$$f(y) = \frac{1}{\Gamma(\nu)} \left(\frac{\nu}{\mu}\right)^{\nu} y^{\nu - 1} e^{-\frac{\nu y}{\mu}}.$$
Then in exponential family form we have
$$f(y) = \exp\left(\frac{-\frac{y}{\mu} - \log\mu}{\frac{1}{\nu}} + \nu\log\nu - \log\Gamma(\nu) + (\nu - 1)\log y\right),$$
with
$$\theta = -\frac{1}{\mu}, \qquad b(\theta) = -\log(-\theta), \qquad \phi = \frac{1}{\nu}.$$
And we have
$$\mu = b'(\theta) = -\frac{1}{\theta}, \qquad b''(\theta) = \frac{1}{\theta^2} = \mu^2 \quad \Longrightarrow \quad V(\mu) = \mu^2.$$
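The variance functions above are easy to check by simulation. A small sketch with numpy; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Bernoulli: Var(y) = mu(1 - mu), with phi = 1
p = 0.3
y = rng.binomial(1, p, n)
print(y.var(), p * (1 - p))          # ~0.21 for both

# Poisson: Var(y) = mu, with phi = 1
lam = 4.0
y = rng.poisson(lam, n)
print(y.var(), lam)                  # ~4.0 for both

# Gamma with mean mu and shape nu: Var(y) = phi * V(mu) = mu^2 / nu
mu_, nu = 2.0, 5.0
y = rng.gamma(shape=nu, scale=mu_ / nu, size=n)
print(y.var(), mu_**2 / nu)          # ~0.8 for both
```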
Application
Maximum Likelihood Estimation
When the response has the exponential family form above, the variance function enters maximum likelihood estimation through the score equations for $\beta$,
$$\sum_{i=1}^{n} \frac{y_i - \mu_i}{\phi\, V(\mu_i)} \frac{\partial \mu_i}{\partial \beta_j} = 0, \qquad j = 1, \ldots, p,$$
so observations are weighted inversely to $V(\mu_i)$; in practice the MLE is computed by iteratively reweighted least squares with weights $W_i = \left(V(\mu_i)\, g'(\mu_i)^2\right)^{-1}$.
Quasi-Likelihood
Variance functions play a very important role in quasi-likelihood estimation. Quasi-likelihood estimation is useful when overdispersion is present or likely in the data. Overdispersion occurs when there is more variability in the data than would be expected from the assumed distribution. This can happen for many reasons; one common cause is high correlation between data points (grouped data). Because most features of GLMs depend only on the first two moments of the distribution, rather than the entire distribution, a quasi-likelihood can be developed by specifying just a link function and a variance function. That is, we need to specify
$$g(\mu) = \eta = X\beta \quad \text{(link function)}$$
and
$$\operatorname{Var}(y) = \sigma^2 V(\mu) \quad \text{(variance function, with dispersion } \sigma^2\text{)}.$$
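A standard moment-based estimate of the dispersion $\sigma^2$, which requires only the variance function, is the sum of squared Pearson residuals divided by the residual degrees of freedom. A minimal sketch in Python; the data and the intercept-only fit are hypothetical.

```python
import numpy as np

def pearson_dispersion(y, mu_hat, V, p):
    """Moment estimate of the dispersion sigma^2:
    sum of squared Pearson residuals (y - mu)^2 / V(mu),
    divided by the residual degrees of freedom n - p."""
    resid2 = (y - mu_hat) ** 2 / V(mu_hat)
    return resid2.sum() / (len(y) - p)

# Hypothetical example with the Poisson-type variance function V(mu) = mu.
# A value well above 1 signals overdispersion relative to the Poisson model.
y = np.array([0, 5, 1, 9, 2, 7, 0, 11])
mu_hat = np.full_like(y, y.mean(), dtype=float)   # intercept-only fit, p = 1
print(pearson_dispersion(y, mu_hat, V=lambda m: m, p=1))
```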
Though called a quasi-likelihood, this is in fact a quasi-log-likelihood. The QL for one observation is
$$Q_i(\mu_i; y_i) = \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2 V(t)}\, dt,$$
and therefore the QL for all $n$ observations is
$$Q(\mu; y) = \sum_{i=1}^{n} Q_i(\mu_i; y_i) = \sum_{i=1}^{n} \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2 V(t)}\, dt.$$
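The integral definition is straightforward to evaluate numerically. A small sketch with scipy, using the Poisson-type variance function $V(t) = t$, for which the integral has the closed form $y\log\mu - \mu - (y\log y - y)$; the data values are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def quasi_loglik(y, mu, V, sigma2=1.0):
    """Quasi-(log-)likelihood: sum over i of the integral of
    (y_i - t) / (sigma2 * V(t)) dt from y_i to mu_i."""
    total = 0.0
    for yi, mi in zip(y, mu):
        qi, _ = quad(lambda t, yi=yi: (yi - t) / (sigma2 * V(t)), yi, mi)
        total += qi
    return total

y = np.array([2.0, 5.0, 1.0])
mu = np.array([2.5, 4.0, 1.5])
print(quasi_loglik(y, mu, V=lambda t: t))
# Same value via the closed form, shifted so that Q(y; y) = 0:
print(np.sum(y * np.log(mu) - mu) - np.sum(y * np.log(y) - y))
```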
From the QL we have the quasi-score.
Quasi-Score
Recall the score function, $U$, for data with log-likelihood $\ell(\mu \mid y)$ is
$$U = \frac{\partial \ell}{\partial \mu}.$$
We obtain the quasi-score in an identical manner, noting that for one observation the score is
$$U = \frac{\partial Q}{\partial \mu} = \frac{y - \mu}{\sigma^2 V(\mu)}.$$
The first two Bartlett equations are satisfied for the quasi-score, namely
$$E[U] = 0$$
and
$$\operatorname{Var}(U) + E\left[\frac{\partial U}{\partial \mu}\right] = 0.$$
Indeed, $E[U] = E[y - \mu]/(\sigma^2 V(\mu)) = 0$ and $\operatorname{Var}(U) = \operatorname{Var}(y)/(\sigma^2 V(\mu))^2 = 1/(\sigma^2 V(\mu)) = -E[\partial U/\partial \mu]$.
In addition, the quasi-score is linear in $y$.
Ultimately the goal is to find information about the parameters of interest $\beta$. Both the quasi-score and the QL are actually functions of $\beta$. Recall $\mu = g^{-1}(\eta)$ and $\eta = X\beta$, therefore
$$\mu = g^{-1}(X\beta).$$
Quasi-Information
The quasi-information is similar to the Fisher information,
$$i_b = -E\left[\frac{\partial U}{\partial \beta}\right].$$
The QL, QS and QI all provide the building blocks for inference about the parameters of interest, and we use them all as functions of $\beta$:
$$Q(\beta; y) = \sum_{i=1}^{n} \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2 V(t)}\, dt, \qquad U(\beta; y) = \frac{1}{\sigma^2} D^{T} V^{-1} (y - \mu), \qquad i_b = \frac{1}{\sigma^2} D^{T} V^{-1} D,$$
where $D = \partial\mu/\partial\beta$ is the matrix of derivatives of the mean with respect to the parameters and $V = \operatorname{diag}\!\left(V(\mu_1), \ldots, V(\mu_n)\right)$.
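In matrix form these pieces translate directly into a Fisher-scoring iteration $\beta \leftarrow \beta + i_b^{-1} U(\beta)$, in which $\sigma^2$ cancels. A minimal sketch in Python, assuming a log link and the variance function $V(\mu) = \mu$; all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: log link g(mu) = log(mu), variance function V(mu) = mu
X = np.column_stack([np.ones(500), rng.normal(size=500)])
beta_true = np.array([0.2, 0.7])
y = rng.poisson(np.exp(X @ beta_true))

def V(m):                            # variance function
    return m

beta = np.zeros(2)
for _ in range(25):                  # Fisher scoring with the quasi-score
    mu = np.exp(X @ beta)            # inverse link
    D = mu[:, None] * X              # D = d mu / d beta for the log link
    Vinv = 1.0 / V(mu)               # V is diagonal, so invert elementwise
    U = D.T @ (Vinv * (y - mu))      # quasi-score (sigma^2 cancels in the update)
    i_b = D.T @ (Vinv[:, None] * D)  # quasi-information
    beta = beta + np.linalg.solve(i_b, U)
print(beta)                          # close to beta_true
```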