In statistics, the variance function is a function that relates the variance of a random quantity to its conditional mean. The variance function is a central ingredient in the generalized linear model framework and also plays a role in non-parametric regression and functional data analysis. It should not be confused with the variance of a function. In parametric modelling, variance functions explicitly describe the relationship between the variance and the conditional mean of a random variable. For many well-known distributions, the variance function gives the complete variance of a random variable under that distribution, but these are in fact just special cases.
The variance function and its applications come up in many areas of statistical analysis. A very important use of this function is in the framework of generalized linear models and non-parametric regression.
Generalized Linear Model
Here we derive the variance function for the exponential family in general, as well as for specific examples. In addition, we describe the applications and use of variance functions in maximum likelihood estimation and quasi-likelihood estimation.
Derivation
The generalized linear model (GLM) is a generalization of ordinary regression analysis that extends to any member of the exponential family. It is particularly useful when the response variable is categorical, binary or subject to a constraint (e.g. only positive responses make sense). The components of a GLM are summarized below, but for more details and information see the article on generalized linear models.
A GLM consists of three main ingredients:
1. Random Component - a distribution of $y$ from the exponential family, with $E[y \mid X] = \mu$,
2. Linear Predictor - the relationship between the covariates and the parameters, $\eta = X\beta$,
3. Link Function - a monotone function $g$ relating the mean to the linear predictor, $g(\mu) = \eta$, equivalently $\mu = g^{-1}(X\beta)$ (a minimal sketch of these three pieces follows below).
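As a concrete illustration of these three ingredients, the following minimal sketch in Python, using statsmodels, simulates a Poisson response with a log link and then fits the corresponding GLM; the coefficient values are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Linear predictor: eta = X @ beta
X = sm.add_constant(rng.normal(size=(200, 2)))   # design matrix with intercept
beta = np.array([0.5, 0.3, -0.2])                # hypothetical true coefficients
eta = X @ beta

# Link function: log(mu) = eta, so mu = exp(eta)
mu = np.exp(eta)

# Random component: y | X ~ Poisson(mu), a member of the exponential family
y = rng.poisson(mu)

# Fit the GLM; the Poisson family uses the log link by default
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)                             # estimates close to beta
```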
First it is important to derive a couple of key properties of the exponential family.
Any random variable in the exponential family has a probability density function of the form,
$$f(y; \theta, \phi) = \exp\left(\frac{y\theta - b(\theta)}{\phi} + c(y, \phi)\right),$$
with log-likelihood,
$$\ell(\theta; y, \phi) = \log f(y; \theta, \phi) = \frac{y\theta - b(\theta)}{\phi} + c(y, \phi).$$
Here, $\theta$ is the canonical parameter and the parameter of interest, and $\phi$ is a nuisance parameter which plays a role in the variance. The function $b(\theta)$ is the cumulant function.
We use Bartlett's identities to derive a general expression for the variance function. The first and second Bartlett results ensure that, under suitable regularity conditions, for a density function $f_\theta(y)$ dependent on $\theta$,
$$E_\theta\left[\frac{\partial}{\partial \theta} \log f_\theta(y)\right] = 0$$
and
$$E_\theta\left[\frac{\partial^2}{\partial \theta^2} \log f_\theta(y)\right] + E_\theta\left[\left(\frac{\partial}{\partial \theta} \log f_\theta(y)\right)^2\right] = 0.$$
These identities lead to simple calculations of the expected value and variance of any random variable in the exponential family.
Expected Value of y
Taking the first derivative with respect to $\theta$ of the log of the density in the exponential family form described above, we have
$$\frac{\partial \ell}{\partial \theta} = \frac{y - b'(\theta)}{\phi}.$$
Then taking the expected value and setting it equal to zero (the first Bartlett identity) leads to
$$E\left[\frac{y - b'(\theta)}{\phi}\right] = 0 \quad \Longrightarrow \quad E[y] = b'(\theta) = \mu.$$
Variance of y
To compute the variance we use the second Bartlett identity,
$$E\left[\frac{\partial^2 \ell}{\partial \theta^2}\right] + E\left[\left(\frac{\partial \ell}{\partial \theta}\right)^2\right] = 0,$$
which gives
$$-\frac{b''(\theta)}{\phi} + E\left[\left(\frac{y - b'(\theta)}{\phi}\right)^2\right] = 0 \quad \Longrightarrow \quad \operatorname{Var}(y) = \phi\, b''(\theta).$$
We now have a relationship between $\mu$ and $\theta$, namely
$$\mu = b'(\theta) \quad \text{and} \quad \theta = b'^{-1}(\mu),$$
which allows for a relationship between $\mu$ and the variance,
$$V(\mu) = b''\!\left(b'^{-1}(\mu)\right),$$
so that $\operatorname{Var}(y) = \phi\, V(\mu)$. Note that because $b''(\theta) = \operatorname{Var}(y)/\phi > 0$, $b'$ is monotonically increasing and hence invertible.
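Because $V(\mu) = b''(b'^{-1}(\mu))$ is a purely mechanical computation once $b(\theta)$ is known, it can be carried out symbolically. A minimal sketch in Python with sympy, here using the Poisson cumulant function $b(\theta) = e^\theta$ derived in the examples below; any other cumulant function can be swapped in.

```python
import sympy as sp

theta, mu = sp.symbols('theta mu', positive=True)

# Cumulant function b(theta) for the Poisson family (see the examples below)
b = sp.exp(theta)

b1 = sp.diff(b, theta)                            # b'(theta): the mean as a function of theta
b2 = sp.diff(b, theta, 2)                         # b''(theta)
theta_of_mu = sp.solve(sp.Eq(b1, mu), theta)[0]   # invert mu = b'(theta)
V = sp.simplify(b2.subs(theta, theta_of_mu))      # V(mu) = b''(b'^{-1}(mu))
print(V)                                          # prints: mu
```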
Examples
Normal
The normal distribution is a special case where the variance function is a constant. Let $y \sim N(\mu, \sigma^2)$, then we put the density function of $y$ in the form of the exponential family described above:
$$f(y) = \exp\left(\frac{y\mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right),$$
where
$$\theta = \mu, \qquad b(\theta) = \frac{\theta^2}{2}, \qquad \phi = \sigma^2.$$
To calculate the variance function $V(\mu)$, we first express $\theta$ as a function of $\mu$. Then we transform $b''(\theta)$ into a function of $\mu$:
$$\mu = b'(\theta) = \theta, \qquad b''(\theta) = 1 \quad \Longrightarrow \quad V(\mu) = 1.$$
Therefore the variance function is constant.
Bernoulli
Let $y \sim \text{Bernoulli}(p)$, then we express the density in exponential family form,
$$f(y) = p^y (1-p)^{1-y} = \exp\left(y \log\frac{p}{1-p} + \log(1-p)\right),$$
which gives us
$$\theta = \operatorname{logit}(p), \qquad b(\theta) = \log(1 + e^\theta), \qquad \phi = 1,$$
and
$$\mu = b'(\theta) = \frac{e^\theta}{1+e^\theta} = p, \qquad b''(\theta) = \frac{e^\theta}{(1+e^\theta)^2} = p(1-p),$$
so that
$$V(\mu) = \mu(1-\mu).$$
Poisson
Let $y \sim \text{Poisson}(\lambda)$, then we express the density in exponential family form,
$$f(y) = \frac{\lambda^y e^{-\lambda}}{y!} = \exp\left(y \log\lambda - \lambda - \log(y!)\right),$$
which gives us
$$\theta = \log\lambda, \qquad b(\theta) = e^\theta, \qquad \phi = 1,$$
and
$$\mu = b'(\theta) = e^\theta = \lambda, \qquad b''(\theta) = e^\theta = \lambda.$$
This gives us
$$V(\mu) = \mu.$$
Here we see the central property of Poisson data: the variance is equal to the mean.
Gamma
The gamma distribution and density function can be expressed under many different parametrizations. We will use the form of the gamma with parameters $\mu$ and $\nu$,
$$f(y) = \frac{1}{\Gamma(\nu)} \left(\frac{\nu}{\mu}\right)^{\nu} y^{\nu - 1} e^{-\frac{\nu y}{\mu}}.$$
Then in exponential family form we have
$$f(y) = \exp\left(\frac{-\frac{y}{\mu} - \log\mu}{\frac{1}{\nu}} + \nu\log\nu - \log\Gamma(\nu) + (\nu - 1)\log y\right),$$
with
$$\theta = -\frac{1}{\mu}, \qquad b(\theta) = -\log(-\theta), \qquad \phi = \frac{1}{\nu}.$$
And we have
$$\mu = b'(\theta) = -\frac{1}{\theta}, \qquad b''(\theta) = \frac{1}{\theta^2} = \mu^2 \quad \Longrightarrow \quad V(\mu) = \mu^2.$$
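The variance functions above are easy to check by simulation. A small sketch with numpy; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Bernoulli: Var(y) = mu(1 - mu), with phi = 1
p = 0.3
y = rng.binomial(1, p, n)
print(y.var(), p * (1 - p))          # ~0.21 for both

# Poisson: Var(y) = mu, with phi = 1
lam = 4.0
y = rng.poisson(lam, n)
print(y.var(), lam)                  # ~4.0 for both

# Gamma with mean mu and shape nu: Var(y) = phi * V(mu) = mu^2 / nu
mu_, nu = 2.0, 5.0
y = rng.gamma(shape=nu, scale=mu_ / nu, size=n)
print(y.var(), mu_**2 / nu)          # ~0.8 for both
```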
Application
Maximum Likelihood Estimation
When the response has the exponential family form above, the variance function enters maximum likelihood estimation through the score equations for $\beta$,
$$\sum_{i=1}^{n} \frac{y_i - \mu_i}{\phi\, V(\mu_i)} \frac{\partial \mu_i}{\partial \beta_j} = 0, \qquad j = 1, \ldots, p,$$
so observations are weighted inversely to $V(\mu_i)$; in practice the MLE is computed by iteratively reweighted least squares with weights $W_i = \left(V(\mu_i)\, g'(\mu_i)^2\right)^{-1}$.
Quasi-Likelihood
Variance functions play a very important role in quasi-likelihood estimation. Quasi-likelihood estimation is useful when overdispersion is present or likely in the data. Overdispersion occurs when there is more variability in the data than would be expected from the assumed distribution. This can happen for many reasons; one common cause is high correlation between data points (grouped data). Because most features of GLMs depend only on the first two moments of the distribution, rather than the entire distribution, a quasi-likelihood can be developed by specifying just a link function and a variance function. That is, we need to specify
$$g(\mu) = \eta = X\beta \quad \text{(link function)}$$
and
$$\operatorname{Var}(y) = \sigma^2 V(\mu) \quad \text{(variance function, with dispersion } \sigma^2\text{)}.$$
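A standard moment-based estimate of the dispersion $\sigma^2$, which requires only the variance function, is the sum of squared Pearson residuals divided by the residual degrees of freedom. A minimal sketch in Python; the data and the intercept-only fit are hypothetical.

```python
import numpy as np

def pearson_dispersion(y, mu_hat, V, p):
    """Moment estimate of the dispersion sigma^2:
    sum of squared Pearson residuals (y - mu)^2 / V(mu),
    divided by the residual degrees of freedom n - p."""
    resid2 = (y - mu_hat) ** 2 / V(mu_hat)
    return resid2.sum() / (len(y) - p)

# Hypothetical example with the Poisson-type variance function V(mu) = mu.
# A value well above 1 signals overdispersion relative to the Poisson model.
y = np.array([0, 5, 1, 9, 2, 7, 0, 11])
mu_hat = np.full_like(y, y.mean(), dtype=float)   # intercept-only fit, p = 1
print(pearson_dispersion(y, mu_hat, V=lambda m: m, p=1))
```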
Though called a quasi-likelihood, this is in fact a quasi-log-likelihood. The QL for one observation is
$$Q_i(\mu_i; y_i) = \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2 V(t)}\, dt,$$
and therefore the QL for all $n$ observations is
$$Q(\mu; y) = \sum_{i=1}^{n} Q_i(\mu_i; y_i) = \sum_{i=1}^{n} \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2 V(t)}\, dt.$$
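The integral definition is straightforward to evaluate numerically. A small sketch with scipy, using the Poisson-type variance function $V(t) = t$, for which the integral has the closed form $y\log\mu - \mu - (y\log y - y)$; the data values are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def quasi_loglik(y, mu, V, sigma2=1.0):
    """Quasi-(log-)likelihood: sum over i of the integral of
    (y_i - t) / (sigma2 * V(t)) dt from y_i to mu_i."""
    total = 0.0
    for yi, mi in zip(y, mu):
        qi, _ = quad(lambda t, yi=yi: (yi - t) / (sigma2 * V(t)), yi, mi)
        total += qi
    return total

y = np.array([2.0, 5.0, 1.0])
mu = np.array([2.5, 4.0, 1.5])
print(quasi_loglik(y, mu, V=lambda t: t))
# Same value via the closed form, shifted so that Q(y; y) = 0:
print(np.sum(y * np.log(mu) - mu) - np.sum(y * np.log(y) - y))
```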
From the QL we have the quasi-score.
Quasi-Score
Recall the score function, $U$, for data with log-likelihood $\ell(\mu \mid y)$ is
$$U = \frac{\partial \ell}{\partial \mu}.$$
We obtain the quasi-score in an identical manner, noting that for one observation the score is
$$U = \frac{\partial Q}{\partial \mu} = \frac{y - \mu}{\sigma^2 V(\mu)}.$$
The first two Bartlett equations are satisfied for the quasi-score, namely
$$E[U] = 0$$
and
$$\operatorname{Var}(U) + E\left[\frac{\partial U}{\partial \mu}\right] = 0.$$
Indeed, $E[U] = E[y - \mu]/(\sigma^2 V(\mu)) = 0$ and $\operatorname{Var}(U) = \operatorname{Var}(y)/(\sigma^2 V(\mu))^2 = 1/(\sigma^2 V(\mu)) = -E[\partial U/\partial \mu]$.
In addition, the quasi-score is linear in $y$.
Ultimately the goal is to find information about the parameters of interest $\beta$. Both the quasi-score and the QL are actually functions of $\beta$. Recall $\mu = g^{-1}(\eta)$ and $\eta = X\beta$, therefore
$$\mu = g^{-1}(X\beta).$$
Quasi-Information
The quasi-information is similar to the Fisher information,
$$i_b = -E\left[\frac{\partial U}{\partial \beta}\right].$$
The QL, QS and QI all provide the building blocks for inference about the parameters of interest, and we use them all as functions of $\beta$:
$$Q(\beta; y) = \sum_{i=1}^{n} \int_{y_i}^{\mu_i} \frac{y_i - t}{\sigma^2 V(t)}\, dt, \qquad U(\beta; y) = \frac{1}{\sigma^2} D^{T} V^{-1} (y - \mu), \qquad i_b = \frac{1}{\sigma^2} D^{T} V^{-1} D,$$
where $D = \partial\mu/\partial\beta$ is the matrix of derivatives of the mean with respect to the parameters and $V = \operatorname{diag}\!\left(V(\mu_1), \ldots, V(\mu_n)\right)$.
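In matrix form these pieces translate directly into a Fisher-scoring iteration $\beta \leftarrow \beta + i_b^{-1} U(\beta)$, in which $\sigma^2$ cancels. A minimal sketch in Python, assuming a log link and the variance function $V(\mu) = \mu$; all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: log link g(mu) = log(mu), variance function V(mu) = mu
X = np.column_stack([np.ones(500), rng.normal(size=500)])
beta_true = np.array([0.2, 0.7])
y = rng.poisson(np.exp(X @ beta_true))

def V(m):                            # variance function
    return m

beta = np.zeros(2)
for _ in range(25):                  # Fisher scoring with the quasi-score
    mu = np.exp(X @ beta)            # inverse link
    D = mu[:, None] * X              # D = d mu / d beta for the log link
    Vinv = 1.0 / V(mu)               # V is diagonal, so invert elementwise
    U = D.T @ (Vinv * (y - mu))      # quasi-score (sigma^2 cancels in the update)
    i_b = D.T @ (Vinv[:, None] * D)  # quasi-information
    beta = beta + np.linalg.solve(i_b, U)
print(beta)                          # close to beta_true
```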