Variance function

This is an old revision of this page, as edited by 169.237.46.6 (talk) at 18:44, 4 March 2014 (Quasi Likelihood).



In statistics, the variance function is a function that relates the variance of a random quantity to its conditional mean. The variance function is a central ingredient of the generalized linear model framework and also plays a role in non-parametric regression and functional data analysis. It should not be confused with the variance of a function. In parametric modelling, variance functions explicitly describe the relationship between the variance and the conditional mean of a random variable. For many well-known distributions the variance function gives the complete variance of a random variable under that distribution, but these are special cases.


Types

The variance function and its applications come up in many areas of statistical analysis. Two particularly important uses of this function are in the frameworks of generalized linear models and non-parametric regression.

Generalized Linear Model

Here we derive the variance function for the exponential family in general, as well as for specific examples. We also describe the applications and use of variance functions in maximum likelihood estimation and quasi-likelihood estimation.

Derivation

The generalized linear model (GLM) is a generalization of ordinary regression analysis that extends to any member of the exponential family. It is particularly useful when the response variable is categorical, binary, or subject to a constraint (e.g. only positive responses make sense). The components of a GLM are summarized below; for more details and information, see the page on generalized linear models.

A GLM consists of three main ingredients:

1. Random component: a distribution of y from the exponential family, with E[y \mid X] = \mu
2. Linear predictor: \eta = X\beta, the relationship between the covariates and the mean
3. Link function: g, such that g(\mu) = \eta

First it is important to derive a couple of key properties of the exponential family.

Any random variable y in the exponential family has a probability density function of the form

f_\theta(y) = \exp\left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right)

with log-likelihood

\ell(\theta; y, \phi) = \log f_\theta(y) = \frac{y\theta - b(\theta)}{\phi} + c(y, \phi)

Here, \theta is the canonical parameter and the parameter of interest, and \phi is a nuisance parameter which plays a role in the variance. We use Bartlett's identities to derive a general expression for the variance function. The first and second Bartlett results ensure that, under suitable regularity conditions, for a density function f_\theta dependent on \theta,

E_\theta\left[ \frac{\partial}{\partial\theta} \log f_\theta(y) \right] = 0

E_\theta\left[ \frac{\partial^2}{\partial\theta^2} \log f_\theta(y) \right] + E_\theta\left[ \left( \frac{\partial}{\partial\theta} \log f_\theta(y) \right)^2 \right] = 0

These identities lead to simple calculations of the expected value and variance of any random variable y in the exponential family.

Expected value of y: Taking the first derivative with respect to \theta of the log of the density in the exponential family form described above, we have

\frac{\partial}{\partial\theta} \log f_\theta(y) = \frac{\partial}{\partial\theta} \left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right) = \frac{y - b'(\theta)}{\phi}

Then taking the expected value and setting it equal to zero leads to

E_\theta\left[ \frac{y - b'(\theta)}{\phi} \right] = 0

E_\theta[y] = b'(\theta)

Variance of y: To compute the variance we use the second Bartlett identity,

E_\theta\left[ \frac{\partial^2}{\partial\theta^2} \left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right) \right] + E_\theta\left[ \left( \frac{\partial}{\partial\theta} \left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right) \right)^2 \right] = 0

-\frac{b''(\theta)}{\phi} + E_\theta\left[ \left( \frac{y - b'(\theta)}{\phi} \right)^2 \right] = 0

\operatorname{Var}(y) = \phi \, b''(\theta)

We now have a relationship between \mu and \theta, namely

\mu = b'(\theta) and \theta = b'^{-1}(\mu), which allows for a relationship between \mu and the variance,

V(\theta) = b''(\theta)

V(\mu) = b''\big( b'^{-1}(\mu) \big).

Note that because \operatorname{Var}(y) = \phi \, b''(\theta) > 0, b' is strictly increasing and therefore invertible.
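The identity V(\mu) = b''(b'^{-1}(\mu)) can be checked numerically. The sketch below is illustrative only (the function names are not from any library); it inverts b' by bisection, which is valid precisely because b' is strictly increasing:

```python
from math import exp

def variance_function(b_prime, b_double_prime, mu, lo=-50.0, hi=50.0):
    """Evaluate V(mu) = b''((b')^{-1}(mu)) by inverting b' with bisection.

    Bisection is justified because b''(theta) = Var(y)/phi > 0,
    so b' is strictly increasing on [lo, hi].
    """
    for _ in range(200):  # 200 halvings shrink the bracket far below float precision
        mid = (lo + hi) / 2.0
        if b_prime(mid) < mu:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2.0
    return b_double_prime(theta)

# Poisson case: b(theta) = exp(theta), so b' = b'' = exp and V(mu) = mu
print(variance_function(exp, exp, 2.0))  # approximately 2.0
```

Any member of the exponential family can be plugged in by supplying its own b' and b''.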

Examples

Normal

The normal distribution is a special case where the variance function is a constant. Let y \sim N(\mu, \sigma^2); then we put the density function of y in the form of the exponential family described above:

f(y) = \exp\left( \frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right)

where

\theta = \mu

b(\theta) = \frac{\theta^2}{2}

\phi = \sigma^2

c(y, \phi) = -\frac{y^2}{2\phi} - \frac{1}{2}\log(2\pi\phi)

To calculate the variance function V(\mu), we first express \theta as a function of \mu. Then we transform V(\theta) into a function of \mu:

\mu = b'(\theta) = \theta

V(\theta) = b''(\theta) = 1

V(\mu) = 1

Therefore the variance function is constant.

Bernoulli

Let y \sim \operatorname{Bernoulli}(p); then we express the density in exponential family form,

f(y) = \exp\left( y \log\frac{p}{1-p} + \log(1-p) \right)

\theta = \operatorname{logit}(p), which gives us p = \operatorname{expit}(\theta) = \frac{e^\theta}{1 + e^\theta}

and

b(\theta) = -\log(1-p) = \log(1 + e^\theta)

\mu = b'(\theta) = \operatorname{expit}(\theta) = p

This gives us

V(\mu) = b''(\theta) = p(1-p) = \mu(1-\mu)
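As a quick numerical sanity check (an illustrative sketch, not part of any library), a finite-difference second derivative of the Bernoulli cumulant b(\theta) = \log(1 + e^\theta) reproduces V(\mu) = \mu(1 - \mu):

```python
from math import exp, log

def b(theta):
    """Cumulant function for the Bernoulli family: b(theta) = log(1 + e^theta)."""
    return log(1.0 + exp(theta))

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

theta = 0.75
p = exp(theta) / (1.0 + exp(theta))  # mu = expit(theta)
# b''(theta) should match V(mu) = mu * (1 - mu) = p * (1 - p)
print(second_derivative(b, theta), p * (1.0 - p))
```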
Poisson

Let y \sim \operatorname{Poisson}(\lambda); then we express the density in exponential family form,

f(y) = \exp\left( y\log\lambda - \lambda - \log(y!) \right)

\theta = \log\lambda, which gives us \lambda = e^\theta

and

b(\theta) = e^\theta

\mu = b'(\theta) = e^\theta = \lambda

This gives us

V(\mu) = b''(\theta) = e^\theta = \lambda = \mu

Here we see the central property of Poisson data, that the variance is equal to the mean.
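The mean–variance equality can be illustrated by simulation. In the sketch below (plain Python; sample_poisson is a hypothetical helper implementing Knuth's classic multiplicative method, adequate for small \lambda), the sample mean and sample variance of Poisson draws agree:

```python
import math
import random

def sample_poisson(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplicative method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

rng = random.Random(0)  # fixed seed so the check is reproducible
lam = 4.0
draws = [sample_poisson(lam, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, var)  # both close to 4, since V(mu) = mu for Poisson data
```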


Gamma

The gamma distribution and density function can be expressed under different parametrizations. We will use the form of the gamma with parameters (\mu, \nu):

f(y) = \frac{1}{\Gamma(\nu)} \left( \frac{\nu}{\mu} \right)^{\nu} y^{\nu - 1} e^{-\nu y / \mu}

Then in exponential family form we have

f(y) = \exp\left( \frac{ -\frac{y}{\mu} - \log\mu }{ 1/\nu } + \nu\log\nu + (\nu - 1)\log y - \log\Gamma(\nu) \right)

\theta = -\frac{1}{\mu}

b(\theta) = -\log(-\theta)

\phi = \frac{1}{\nu}

c(y, \phi) = \frac{1}{\phi}\log\frac{1}{\phi} + \left( \frac{1}{\phi} - 1 \right)\log y - \log\Gamma\!\left( \frac{1}{\phi} \right)

\mu = b'(\theta) = -\frac{1}{\theta}
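This bookkeeping can be checked directly (illustrative sketch; the names are not from any library): evaluating b' and b'' at \theta = -1/\mu recovers the mean and the quadratic variance function:

```python
def b_prime(theta):
    """For the gamma family, b(theta) = -log(-theta), so b'(theta) = -1/theta."""
    return -1.0 / theta

def b_double_prime(theta):
    """Second derivative of the gamma cumulant: b''(theta) = 1/theta^2."""
    return 1.0 / theta ** 2

mu = 3.0
theta = -1.0 / mu             # canonical parameter for mean mu, since b'(theta) = mu
print(b_prime(theta))         # recovers mu = 3.0
print(b_double_prime(theta))  # V(mu) = mu**2 = 9.0
```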

Application

Maximum Likelihood Estimation
Quasi Likelihood

Variance functions play a very important role in quasi-likelihood estimation. Quasi-likelihood estimation is useful when overdispersion is present or likely. Overdispersion occurs when there is more variability in the data than would be expected under the assumed distribution. This can happen for many reasons; one common cause is high correlation between data points (grouped data). Because most features of GLMs depend only on the first two moments of the distribution, rather than the entire distribution, a quasi-likelihood can be developed by specifying just a link function and a variance function. That is, we need to specify

- Link function: g, with g(\mu) = \eta = X\beta
- Variance function: V, with \operatorname{Var}(y) = \sigma^2 V(\mu)

With a specified variance function and link function we can develop, as alternatives to the log-likelihood function, the score function, and the Fisher information, a quasi-likelihood, a quasi-score, and the quasi-information. This allows for full inference of \beta.

Quasi-Likelihood

For an observation y with mean \mu and variance \sigma^2 V(\mu), the quasi-likelihood is

Q(\mu; y) = \int_y^{\mu} \frac{y - t}{\sigma^2 V(t)} \, dt

Though called a quasi-likelihood, this is in fact a quasi-log-likelihood.


Quasi-Score

The quasi-score for an observation y is

U(\mu; y) = \frac{y - \mu}{\sigma^2 V(\mu)}

The first two Bartlett equations are satisfied for the quasi-score, namely

E[U] = 0

and

\operatorname{Var}(U) + E\left[ \frac{\partial U}{\partial \mu} \right] = 0

In addition, the quasi-score is linear in y.
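As a minimal illustration (hypothetical helper names, not from any library): with V(\mu) = \mu and a single common mean, solving the quasi-score equation \sum_i (y_i - \mu)/(\sigma^2 V(\mu)) = 0 by Newton's method recovers the sample mean, since the dispersion \sigma^2 cancels at the root:

```python
def quasi_score(mu, ys, variance=lambda m: m):
    """Quasi-score U(mu) = sum of (y - mu) / V(mu); sigma^2 cancels at the root."""
    return sum((y - mu) / variance(mu) for y in ys)

def solve_mean(ys, mu0=1.0, tol=1e-10):
    """Find the root of the quasi-score by Newton's method with a numerical derivative."""
    mu = mu0
    for _ in range(100):
        h = 1e-6
        u = quasi_score(mu, ys)
        du = (quasi_score(mu + h, ys) - quasi_score(mu - h, ys)) / (2 * h)
        step = u / du
        mu -= step
        if abs(step) < tol:
            break
    return mu

ys = [2, 5, 1, 4, 3, 6, 2, 4]
print(solve_mean(ys))  # equals the sample mean, 3.375
```

With a non-identity link or multiple covariates, the same idea extends to iteratively reweighted least squares on \beta.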

Non-Parametric Regression Analysis

See Also

- Generalized linear models
- Quasi-likelihood
- Non-parametric regression

References