{{Short description|Smooth function in statistics}}
{{for|variance as a function of space-time separation|Variogram}}
{{Regression bar}}
In [[statistics]], the '''variance function''' is a [[smooth function]] that depicts the [[variance]] of a random quantity as a function of its [[mean]]. It is a measure of [[heteroscedasticity]] and plays a central role in the [[generalized linear model]] framework and in [[non-parametric regression]].
== Intuition ==
In a regression model setting, the goal is to establish whether or not a relationship exists between a response variable and a set of predictor variables, and if one does, to describe that relationship as well as possible. A key assumption of ordinary [[linear regression]] is [[homoscedasticity]], that is, that the errors have the same variance at every level of the predictors; when this assumption fails, the variance must be modelled explicitly.
When the response is likely to follow a distribution that is a member of the exponential family, a [[generalized linear model]] may be more appropriate, and when we do not wish to force a parametric model onto the data, a [[non-parametric regression]] approach can be useful. The importance of being able to model the variance as a function of the mean lies in improved inference in a parametric setting, and in improved estimation of the regression function in general, for any setting.
Variance functions play a very important role in parameter estimation and inference. In general, maximum likelihood estimation requires that a likelihood function be defined. This requirement then implies that one must first specify the distribution of the response variables observed. However, to define a quasi-likelihood, one need only specify a relationship between the mean and the variance of the observations to then be able to use the quasi-likelihood function for estimation.<ref>{{cite journal|first=R.W.M.|last=Wedderburn|title=Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method|journal=Biometrika|year=1974|volume=61|issue=3|pages=439–447}}</ref>
In summary, to ensure efficient inference of the regression parameters and the regression function, the heteroscedasticity must be accounted for. Variance functions quantify the relationship between the variance and the mean of the observed data and hence play a significant role in regression estimation and inference.
== Types of variance functions ==
The variance function and its applications come up in many areas of statistical analysis. A very important use of this function is in the framework of [[generalized linear models]] and [[non-parametric regression]].
=== Generalized linear model ===
When a member of the [[exponential family]] has been specified, the variance function can easily be derived.<ref>{{cite book | last = McCullagh | first = Peter |author2=Nelder, John |authorlink2=John Nelder | title = Generalized Linear Models | publisher = London: Chapman and Hall | year = 1989 | edition=second| isbn = 0-412-31760-5 }}</ref>{{rp|29}} The general form of the variance function is presented under the exponential family context, as well as specific forms for Normal, Bernoulli, Poisson, and Gamma. In addition, we describe the applications and use of variance functions in maximum likelihood estimation and quasi-likelihood estimation.
==== Derivation ====
Any random variable <math>\textit{y}</math> in the exponential family has a probability density function of the form,
:<math>f_\theta(y) = \exp\left(\frac{y\theta - b(\theta)}{\phi} + c(y,\phi)\right)</math>
with log-likelihood,
:<math>\log\left(f_\theta(y)\right) = \frac{y\theta - b(\theta)}{\phi} + c(y,\phi).</math>
Here, <math>\theta</math> is the canonical parameter and the parameter of interest, and <math>\phi</math> is a nuisance parameter which plays a role in the variance.
We use '''Bartlett's identities''' to derive a general expression for the '''variance function'''.
The first and second Bartlett results ensure that, under suitable conditions,
:<math>\operatorname{E}_\theta\left[\frac{\partial}{\partial \theta} \log(f_\theta(y)) \right] = 0
</math>
:<math>\operatorname{Var}_\theta\left[\frac{\partial}{\partial \theta}\log(f_\theta(y))\right]+\operatorname{E}_\theta\left[\frac{\partial^2}{\partial \theta^2} \log(f_\theta(y))\right] = 0
</math>
These identities lead to simple calculations of the expected value <math>\operatorname{E}_\theta[y]</math> and variance <math>\operatorname{Var}_\theta[y]</math> of any random variable <math>y</math> in the exponential family.
'''Expected value of <math>y</math>:'''
Taking the first derivative with respect to <math>\theta</math> of the log of the density in the exponential family form described above, we have
:<math>\frac{\partial}{\partial \theta}\log\left(f_\theta(y)\right) = \frac{y - b'(\theta)}{\phi}.</math>
Then taking the expected value and setting it equal to zero leads to,
:<math>\operatorname{E}_\theta\left[\frac{y-b'(\theta)}{\phi}\right] = \frac{\operatorname{E}_\theta[y]-b'(\theta)}{\phi}=0</math>
Therefore,
:<math>\operatorname{E}_\theta[y]=b'(\theta).</math>
'''Variance of <math>y</math>:'''
Taking the second derivative of the log-likelihood with respect to <math>\theta</math> and applying the second Bartlett identity gives
:<math>\operatorname{Var}_\theta\left[\frac{y - b'(\theta)}{\phi}\right]+\operatorname{E}_\theta\left[\frac{-b''(\theta)}{\phi}\right] = 0</math>
:<math>\frac{\operatorname{Var}_\theta[y]}{\phi^2} = \frac{b''(\theta)}{\phi}</math>
:<math> \operatorname{Var}_\theta\left[y\right]=b''(\theta)\phi</math>
We now have a relationship between <math>\mu</math> and <math>\theta</math>, namely
:<math>\mu = b'(\theta)</math> and <math>\theta = b'^{-1}(\mu)</math>, which allows for a relationship between <math>\mu</math> and the variance,
:<math>V(\theta) = b''(\theta)</math>
:<math>\operatorname{V}(\mu) = b''(b'^{-1}(\mu)). \, </math>
Note that because <math>\operatorname{Var}_\theta\left[y\right]>0</math>, we have <math>b''(\theta)>0</math>, and so <math>b': \theta \rightarrow \mu</math> is invertible.
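As a minimal illustrative sketch, assuming the Python library SymPy and the standard cumulant functions <math>b(\theta)</math> of the distributions treated below, the relationship <math>V(\mu)=b''(b'^{-1}(\mu))</math> can also be computed symbolically:
<syntaxhighlight lang="python">
import sympy as sp

# Symbolic derivation of V(mu) = b''(b'^{-1}(mu)) from a cumulant function b(theta).
theta = sp.symbols('theta', real=True)
mu = sp.symbols('mu', positive=True)

def variance_function(b):
    """Return V(mu) given the cumulant function b(theta) of an exponential family."""
    b1 = sp.diff(b, theta)                           # b'(theta), the mean
    b2 = sp.diff(b, theta, 2)                        # b''(theta)
    theta_of_mu = sp.solve(sp.Eq(mu, b1), theta)[0]  # invert mu = b'(theta)
    return sp.simplify(b2.subs(theta, theta_of_mu))

print(variance_function(theta**2 / 2))                # normal:    1
print(variance_function(sp.log(1 + sp.exp(theta))))   # Bernoulli: mu*(1 - mu), possibly printed as mu - mu**2
print(variance_function(sp.exp(theta)))               # Poisson:   mu
print(variance_function(-sp.log(-theta)))             # gamma:     mu**2
</syntaxhighlight>
Each printed expression should agree, up to algebraic simplification, with the variance functions derived by hand below.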
We derive the variance function for a few common distributions.
==== Example – normal ====
The [[normal distribution]] is a special case where the variance function is constant. Let <math>y \sim N(\mu,\sigma^2)</math>; then the density can be written in the exponential family form described above:
:<math>f(y) = \exp\left(\frac{y\mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln{2\pi\sigma^2}\right),</math>
so that <math>\theta = \mu</math>, <math>\phi = \sigma^2</math> and <math>b(\theta) = \frac{\theta^2}{2}</math>, giving
:<math>V(\theta) = b''(\theta) = 1</math>
Therefore, the variance function is constant.
==== Example – Bernoulli ====
For <math>y \sim \operatorname{Bernoulli}(p)</math>, writing the density in exponential family form gives the canonical parameter <math>\theta = \ln\left(\frac{p}{1-p}\right)</math> (the log-odds), <math>\phi = 1</math> and <math>b(\theta) = \ln(1+e^\theta)</math>, so that
:<math>V(\mu) = b''(b'^{-1}(\mu)) = \mu(1-\mu).</math>
A similar calculation for the [[Poisson distribution]], with <math>\theta = \ln(\mu)</math> and <math>b(\theta) = e^\theta</math>, gives <math>V(\mu) = \mu</math>.

==== Example – Gamma ====
The [[Gamma distribution]] and density function can be expressed under different parametrizations. We will use the form of the gamma with parameters <math>(\mu,\nu)</math>
:<math>f_{\mu,\nu}(y) = \frac{1}{\Gamma(\nu)y}\left(\frac{\nu y}{\mu}\right)^\nu e^{-\frac{\nu y}{\mu}}</math>
Then in exponential family form we have
:<math>f_{\mu,\nu}(y) = \exp\left(\frac{-\frac{1}{\mu}y+\ln(\frac{1}{\mu})}{\frac{1}{\nu}}+ \ln\left(\frac{\nu^\nu y^{\nu-1}}{\Gamma(\nu)}\right)\right)</math>
:<math> \theta = \frac{-1}{\mu} \rightarrow \mu = \frac{-1}{\theta}</math>
:<math> \phi = \frac{1}{\nu}</math>
:<math> b(\theta) = -\ln(-\theta)</math>
:<math> b'(\theta) = \frac{-1}{\theta} = \frac{-1}{\frac{-1}{\mu}} = \mu</math>
:<math> b''(\theta) = \frac{1}{\theta^2} = \mu^2</math>
Hence the variance function for the gamma is
:<math>V(\mu) = \mu^2.</math>

==== Application – weighted least squares ====
A very important application of the variance function is its use in parameter estimation and inference when the response variable follows an exponential family distribution. [[Weighted least squares]] (WLS) is a special case of [[generalized least squares]]; each term of the WLS criterion carries a weight that determines the influence each observation has on the final parameter estimates.
While WLS assumes independence of observations, it does not assume equal variance, and is therefore a solution for parameter estimation in the presence of heteroscedasticity. The [[Gauss–Markov theorem]] and [[Alexander Aitken|Aitken]] demonstrate that the [[best linear unbiased estimator]] (BLUE), the unbiased estimator with minimum variance, weights each observation by the reciprocal of its variance.

In the GLM framework, our goal is to estimate parameters <math>\beta</math>, where <math>Z = g(\operatorname{E}[y\mid X]) = X\beta</math>. Therefore, we would like to minimize <math>(Z-X\beta)^{T}W(Z-X\beta)</math>, and if we define the weight matrix '''W''' as
:<math>\underbrace{W}_{n \times n} = \begin{bmatrix} \frac{1}{\phi V(\mu_1)g'(\mu_1)^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{\phi V(\mu_2)g'(\mu_2)^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\phi V(\mu_n)g'(\mu_n)^2} \end{bmatrix},</math>
then each diagonal entry is the reciprocal of <math>\phi V(\mu_i)g'(\mu_i)^2</math>, the approximate variance of <math>g(y_i)</math> obtained by the delta method, and the minimization takes the form of a weighted least squares problem.
To obtain the score equations for <math>\beta</math>, note that <math>\frac{\partial l}{\partial \mu_i} = \frac{y_i-\mu_i}{\phi V(\mu_i)}</math> and <math>\frac{\partial \eta}{\partial \mu} = g'(\mu)</math>; by the chain rule we have that
:<math>\frac{\partial l}{\partial \beta_r} = \sum_{i=1}^n \frac{(y_i-\mu_i)x_{ir}}{\phi V(\mu_i)\,g'(\mu_i)}.</math>
The Hessian matrix is determined in a similar manner, and its expectation can be shown to be
:<math>\operatorname{E}[H] = -X^{T}WX,</math>
where <math>W</math> is the weight matrix defined above; solving the score equations by [[Scoring algorithm|Fisher scoring]] therefore amounts to iteratively reweighted least squares.
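As a minimal illustrative sketch of this iteration, assuming the Python library NumPy and simulated data (the sample size, coefficients and tolerance below are arbitrary choices for the example), the following fits a Poisson GLM with log link, for which <math>V(\mu)=\mu</math>, <math>g'(\mu)=1/\mu</math> and <math>\phi=1</math>, so the working weights reduce to <math>\mu_i</math>:
<syntaxhighlight lang="python">
import numpy as np

# Iteratively reweighted least squares (Fisher scoring) for a Poisson GLM with log link.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(X.shape[1])
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)                     # inverse link g^{-1}(eta)
    w = mu                               # 1 / (phi * V(mu) * g'(mu)^2) = mu here
    z = eta + (y - mu) / mu              # working response: eta + (y - mu) * g'(mu)
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(beta)  # approaches the maximum likelihood estimate; close to beta_true for large samples
</syntaxhighlight>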
==== Application – quasi-likelihood ====
Because most features of '''GLMs''' only depend on the first two moments of the distribution, rather than the entire distribution, the quasi-likelihood can be developed by specifying only a link function and a variance function.
With a specified variance function and link function we can develop, as alternatives to the log-[[likelihood function]], the [[score (statistics)|score function]], and the [[Fisher information]], a '''[[quasi-likelihood]]''', a '''quasi-score''', and the '''quasi-information'''. This allows for full inference of <math>\beta</math>.
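Several statistical packages implement this idea directly. As a hedged sketch, assuming the Python package statsmodels and simulated overdispersed count data (the variable names and data-generating choices below are illustrative), a quasi-Poisson model, specified only through the mean model and the variance function <math>V(\mu)=\mu</math> with an estimated dispersion, can be fit by estimating the scale from the Pearson chi-squared statistic:
<syntaxhighlight lang="python">
import numpy as np
import statsmodels.api as sm

# Simulated overdispersed count data (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = sm.add_constant(x)
mu = np.exp(0.2 + 0.5 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))   # counts with Var(y) > E[y]

# Quasi-Poisson: Poisson mean model, dispersion estimated via Pearson X^2.
quasi = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
print(quasi.params)   # regression coefficients
print(quasi.scale)    # estimated dispersion parameter
print(quasi.bse)      # standard errors inflated by the estimated dispersion
</syntaxhighlight>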
'''Quasi-likelihood (QL)'''
Though it is called a quasi-likelihood, it is in fact a quasi-log-likelihood. The QL for one observation is
:<math>Q_i(\mu_i,y_i) = \int^{\mu_i}_{y_i} \frac{y_i-t}{\sigma^2 V(t)}\,dt,</math>
and therefore the QL for all <math>n</math> observations is
:<math>Q(\mu,y) = \sum_{i=1}^n Q_i(\mu_i, y_i).</math>
'''Quasi-score (QS)'''
Recall the [[score (statistics)|score function]], '''U''', for data with log-likelihood <math>\operatorname{l}(\mu\mid y)</math> is
:<math>U = \frac{\partial l}{\partial \mu}.</math>
The quasi-score is obtained by replacing the log-likelihood with the quasi-likelihood,
:<math>U = \frac{y-\mu}{\sigma^2 V(\mu)}.</math>

'''Quasi-information (QI)'''

The quasi-information resembles the [[Fisher information]],
:<math>i_b = -\operatorname{E}\left[\frac{\partial U}{\partial \beta}\right]</math>
'''QL, QS, QI as functions of <math>\beta</math>'''
The QL, QS and QI provide the building blocks for inference about the parameters of interest, so it is important to express all three as functions of <math>\beta</math>.
Recalling again that <math>\mu = g^{-1}(X\beta)</math>, we derive the expressions for QL, QS and QI parametrized under <math>\beta</math>.
Quasi-likelihood in <math>\beta</math>,
:<math>Q(\beta,y) = \sum_{i=1}^n \int^{\mu_i(\beta)}_{y_i} \frac{y_i-t}{\sigma^2 V(t)}\,dt</math>
The QS as a function of <math>\beta</math> is therefore
:<math>U_j(\beta_j) = \frac{\partial}{\partial \beta_j} Q(\beta,y) = \sum_{i=1}^n \frac{\partial \mu_i}{\partial\beta_j} \frac{y_i-\mu_i(\beta_j)}{\sigma^2V(\mu_i)}</math>
:<math>U(\beta) = \begin{bmatrix} U_1(\beta)\\ U_2(\beta)\\ \vdots \\ U_p(\beta) \end{bmatrix}</math>
The quasi-likelihood estimate of <math>\beta</math> is obtained by solving the quasi-score equations <math>U(\beta)=0</math>.
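As a small illustrative sketch, assuming NumPy and SciPy and simulated data (all names and values below are assumptions for the example), the quasi-score equations can be solved numerically for the variance function <math>V(\mu)=\mu^2</math> with a log link; the dispersion <math>\sigma^2</math> cancels at the root and so does not affect the estimate of <math>\beta</math>:
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import root

# Quasi-score estimation with V(mu) = mu^2 and a log link (simulated data).
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
beta_true = np.array([1.0, -0.4])
mu_true = np.exp(X @ beta_true)
y = rng.gamma(shape=4.0, scale=mu_true / 4.0)      # Var(y) proportional to mu^2

def quasi_score(beta, V=lambda m: m ** 2):
    """U_j(beta) = sum_i dmu_i/dbeta_j * (y_i - mu_i) / V(mu_i); sigma^2 cancels at the root."""
    mu = np.exp(X @ beta)
    dmu_dbeta = mu[:, None] * X                    # d mu_i / d beta_j for the log link
    return dmu_dbeta.T @ ((y - mu) / V(mu))

beta_hat = root(quasi_score, x0=np.zeros(2)).x
print(beta_hat)                                    # close to beta_true
</syntaxhighlight>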
=== Non-parametric regression analysis ===
[[File:Variance Function.png|thumb|The smoothed conditional variance against the smoothed conditional mean. The quadratic shape is indicative of the Gamma distribution, whose variance function is <math>V(\mu) = \mu^2</math>.]]
Non-parametric estimation of the variance function and its importance have been discussed widely in the literature.<ref>{{cite journal|last1=Müller|first1=H.G.|last2=Stadtmüller|first2=U.|title=Estimation of Heteroscedasticity in Regression Analysis|journal=The Annals of Statistics|year=1987|volume=15|issue=2|pages=610–625|jstor=2241329|doi=10.1214/aos/1176350364|doi-access=free}}</ref>
In [[non-parametric regression]] analysis, the goal is to express the expected value of the response variable ('''y''') as a function of the predictors ('''X'''). That is, we seek to estimate the '''mean''' function, <math>g(x) = \operatorname{E}[y\mid X=x]</math>, without assuming a parametric form. There are many forms of non-parametric [[smoothing]] methods to help estimate the function <math>g(x)</math>. An interesting approach is to also look at a non-parametric '''variance function''', <math>g_v(x) = \operatorname{Var}(Y\mid X=x)</math>. A non-parametric variance function allows one to examine the mean function as it relates to the variance function and to notice patterns in the data.
:<math>g_v(x) = \operatorname{Var}(Y\mid X=x) =\operatorname{E}[y^2\mid X=x] - \left[\operatorname{E}[y\mid X=x]\right]^2 </math>
An example is detailed in the picture on the right.
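A minimal sketch of this idea, assuming simulated data and a simple [[Kernel regression|Nadaraya–Watson]] kernel smoother (both are illustrative choices), estimates the conditional mean and the conditional second moment on a grid and takes their difference, as in the formula above:
<syntaxhighlight lang="python">
import numpy as np

# Simulated data with a Gamma-type mean-variance relationship: Var(y|x) = mean(x)^2 / 4.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, size=800))
mean = 1.0 + 0.5 * x
y = rng.gamma(shape=4.0, scale=mean / 4.0)

def nw_smooth(x0, x, t, bandwidth=0.8):
    """Nadaraya-Watson estimate of E[t | X = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w * t).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(1, 9, 50)
m_hat = nw_smooth(grid, x, y)                    # estimated mean function g(x)
v_hat = nw_smooth(grid, x, y ** 2) - m_hat ** 2  # g_v(x) = E[y^2|x] - E[y|x]^2

# Plotting v_hat against m_hat reveals the mean-variance pattern; here the
# standard deviation grows roughly linearly with the mean, as for a Gamma response.
print(np.polyfit(m_hat, np.sqrt(v_hat), 1))
</syntaxhighlight>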
== Notes ==
<!--- See [[Wikipedia:Footnotes]] on how to create references using<ref></ref> tags which will then appear here automatically -->
{{reflist}}
== References ==
* {{cite book | last = McCullagh | first = Peter | authorlink=Peter McCullagh|author2=Nelder, John |authorlink2=John Nelder | title = Generalized Linear Models | publisher = London: Chapman and Hall | year = 1989 | edition=second| isbn = 0-412-31760-5 }}
* {{cite book | author=Henrik Madsen and Poul Thyregod|title= Introduction to General and Generalized Linear Models | year=2011 | publisher=Chapman & Hall/CRC | isbn=978-1-4200-9155-7}}

==External links==
*{{Commonscatinline}}
{{statistics|correlation}}
[[Category:Actuarial science]]
[[Category:Generalized linear models]]