Variance function

{{Short description|Smooth function in statistics}}
{{more citations needed|date=March 2014}}
{{for|variance as a function of space-time separation|Variogram}}
 
{{Regression bar}}
 
In [[statistics]], the '''variance function''' is a [[smooth function]] that depicts the [[variance]] of a [[random quantity]] as a function of its [[mean]]. The variance function is a measure of [[heteroscedasticity]] and plays a large role in many settings of statistical modelling. It is a main ingredient in the [[generalized linear model]] framework and a tool used in [[non-parametric regression]],<ref name="Muller1">{{cite journal|last=Muller and Zhao|title=On a semi parametric variance function model and a test for heteroscedasticity|journal=The Annals of Statistics|year=1995|volume=23|issue=3|pages=946–967|jstor=2242430|doi=10.1214/aos/1176324630|doi-access=free}}</ref> [[semiparametric regression]]<ref name="Muller1"/> and [[functional data analysis]].<ref>{{cite journal|last=Muller, Stadtmuller and Yao|title=Functional Variance Processes|journal=Journal of the American Statistical Association|year=2006|volume=101|issue=475|pages=1007–1018|jstor=27590778|doi=10.1198/016214506000000186|s2cid=13712496}}</ref> In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a [[smooth function]].
 
== Intuition ==
 
In a regression model setting, the goal is to establish whether or not a relationship exists between a response variable and a set of predictor variables. Further, if a relationship does exist, the goal is then to be able to describe this relationship as well as possible. A main assumption in [[linear regression]] is constant variance, or homoscedasticity, meaning that different response variables have the same variance in their errors at every predictor level. This assumption works well when the response variable and the predictor variable are jointly [[normal distribution|normal]]. As we will see later, the variance function in the normal setting is constant; however, we must find a way to quantify heteroscedasticity (non-constant variance) in the absence of joint normality.
 
When it is likely that the response follows a distribution that is a member of the exponential family, a [[generalized linear model]] may be more appropriate to use, and moreover, when we wish not to force a parametric model onto our data, a [[non-parametric regression]] approach can be useful. The importance of being able to model the variance as a function of the mean lies in improved inference (in a parametric setting), and estimation of the regression function in general, for any setting.
 
Variance functions play a very important role in parameter estimation and inference. In general, maximum likelihood estimation requires that a likelihood function be defined. This requirement then implies that one must first specify the distribution of the response variables observed. However, to define a quasi-likelihood, one need only specify a relationship between the mean and the variance of the observations to then be able to use the quasi-likelihood function for estimation.<ref>{{cite journal|first=R.W.M.|last=Wedderburn|title=Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method|journal=Biometrika|year=1974|volume=61|issue=3|pages=439–447|jstor=2334725|doi=10.1093/biomet/61.3.439}}</ref> [[Quasi-likelihood]] estimation is particularly useful when there is [[overdispersion]]. Overdispersion occurs when there is more variability in the data than would otherwise be expected according to the assumed distribution of the data.
 
In summary, to ensure efficient inference of the regression parameters and the regression function, the heteroscedasticity must be accounted for. Variance functions quantify the relationship between the variance and the mean of the observed data and hence play a significant role in regression estimation and inference.
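As an illustration of the kind of mean–variance relationship a variance function describes, the following sketch (an illustrative example, not drawn from the sources cited in this article) simulates Poisson-distributed counts, for which the variance function is <math>V(\mu) = \mu</math>, and checks that the sample variance tracks the sample mean at each mean level.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# For a Poisson response the variance function is V(mu) = mu,
# so the sample variance should track the sample mean.
for mu in (2.0, 5.0, 10.0, 20.0):
    y = rng.poisson(mu, size=10_000)  # simulated responses at this mean level
    print(f"mean level {mu:5.1f}: "
          f"sample mean {y.mean():6.2f}, sample variance {y.var(ddof=1):6.2f}")
# The printed variances are close to the means, illustrating V(mu) = mu.
</syntaxhighlight>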
 
== Types of variance functions ==
 
The variance function and its applications come up in many areas of statistical analysis. A very important use of this function is in the framework of [[generalized linear models]] and [[non-parametric regression]].
Here, <math>\theta</math> is the canonical parameter and the parameter of interest, and <math>\phi</math> is a nuisance parameter which plays a role in the variance.
We use '''Bartlett's identities''' to derive a general expression for the '''variance function'''.
The first and second Bartlett results ensure that, under suitable conditions (see [[Leibniz integral rule]]), for a density function <math>f_\theta(\cdot)</math> depending on <math>\theta</math>,
 
:<math>\operatorname{E}_\theta\left[\frac{\partial}{\partial \theta} \log(f_\theta(y)) \right] = 0</math>
:<math>\operatorname{E}_\theta\left[\frac{\partial^2}{\partial \theta^2} \log(f_\theta(y)) \right] + \operatorname{E}_\theta\left[\left(\frac{\partial}{\partial \theta} \log(f_\theta(y))\right)^2 \right] = 0</math>
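These identities can be checked symbolically for a concrete density. The following sketch (an illustrative check, not part of the derivation) verifies both identities for a normal density with unit variance and mean <math>\theta</math> using SymPy.

<syntaxhighlight lang="python">
import sympy as sp

y, theta = sp.symbols('y theta', real=True)
# Normal density with mean theta and unit variance, as a concrete f_theta(y).
f = sp.exp(-(y - theta)**2 / 2) / sp.sqrt(2 * sp.pi)

score = sp.simplify(sp.diff(sp.log(f), theta))  # d/dtheta log f_theta(y), equals y - theta
first = sp.integrate(score * f, (y, -sp.oo, sp.oo))  # first identity: expectation of the score
second = sp.integrate((sp.diff(score, theta) + score**2) * f, (y, -sp.oo, sp.oo))  # second identity
print(sp.simplify(first), sp.simplify(second))  # both print 0
</syntaxhighlight>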
==== Example – normal ====
 
The [[normal distribution]] is a special case where the variance function is a constant. Let <math>y \sim N(\mu,\sigma^2)</math>; then we put the density function of '''y''' in the form of the exponential family described above:
 
:<math>f(y) = \exp\left(\frac{y\mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln{2\pi\sigma^2}\right)</math>
==== Example – gamma ====

The [[Gamma distribution]] and its density function can be expressed under different parametrizations. We will use the form of the gamma with parameters <math>(\mu,\nu)</math>:
 
:<math>f_{\mu,\nu}(y) = \frac{1}{\Gamma(\nu)y}\left(\frac{\nu y}{\mu}\right)^\nu e^{-\frac{\nu y}{\mu}}</math>
Then in exponential family form we have
:<math>f_{\mu,\nu}(y) = \exp\left(\frac{-\frac{1}{\mu}y+\ln(\frac{1}{\mu})}{\frac{1}{\nu}}+ \ln\left(\frac{\nu^\nu y^{\nu-1}}{\Gamma(\nu)}\right)\right)</math>
:<math> \theta = \frac{-1}{\mu} \rightarrow \mu = \frac{-1}{\theta}</math>
:<math> \phi = \frac{1}{\nu}</math>
:<math> b(\theta) = -\ln(-\theta)</math>
:<math> b'(\theta) = \frac{-1}{\theta} = \frac{-1}{\frac{-1}{\mu}} = \mu</math>
:<math> b''(\theta) = \frac{1}{\theta^2} = \mu^2</math>
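The derivation above gives a quadratic variance function, <math>V(\mu) = \mu^2</math>, so that <math>\operatorname{Var}(y) = \phi\, b''(\theta) = \mu^2/\nu</math>. The following sketch (an illustrative numerical check with simulated data, not from the cited sources) confirms this relationship by sampling from the gamma distribution in the <math>(\mu,\nu)</math> parametrization.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

# Gamma(shape=nu, scale=mu/nu) has mean mu and variance mu**2/nu,
# matching Var(y) = phi * b''(theta) = (1/nu) * mu**2.
nu = 4.0
for mu in (1.0, 2.0, 5.0):
    y = rng.gamma(shape=nu, scale=mu / nu, size=200_000)
    print(f"mu = {mu}: sample variance {y.var(ddof=1):.3f}, "
          f"predicted mu**2/nu = {mu**2 / nu:.3f}")
</syntaxhighlight>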
:<math>\frac{\partial \eta}{\partial \mu} = g'(\mu)</math> we have that
 
:<math>\frac{\partial l}{\partial \beta_r} =\frac{(y-\mu)}{\phi V(\mu)}W\frac{\partial \eta}{\partial \mu}x_r</math>
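As a concrete numerical sketch (not part of the derivation above, and using hypothetical simulated data), the score can be checked for a Poisson regression with the canonical log link, where <math>\phi = 1</math>, <math>V(\mu) = \mu</math> and <math>\partial\mu/\partial\eta = \mu</math>, so that the score vector reduces to <math>X^\top(y-\mu)</math>. The analytic expression is compared against a finite-difference derivative of the log-likelihood.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)

# Hypothetical design matrix and coefficients for a Poisson GLM with log link.
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

def loglik(beta):
    mu = np.exp(X @ beta)
    return poisson.logpmf(y, mu).sum()

beta = np.zeros(p)
mu = np.exp(X @ beta)
# Score from the formula, with phi = 1, V(mu) = mu and dmu/deta = mu for the log link.
score_formula = X.T @ (y - mu)

# Central finite-difference check of the analytic score.
eps = 1e-6
score_fd = np.array([
    (loglik(beta + eps * np.eye(p)[j]) - loglik(beta - eps * np.eye(p)[j])) / (2 * eps)
    for j in range(p)
])
print(np.allclose(score_formula, score_fd, rtol=1e-4))  # True
</syntaxhighlight>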
 
The Hessian matrix is determined in a similar manner and can be shown to be,
==== Application – quasi-likelihood ====
 
Because most features of '''GLMs''' only depend on the first two moments of the distribution, rather than the entire distribution, the quasi-likelihood can be developed by just specifying a link function and a variance function. That is, we need to specify
:* the link function: <math>\operatorname{E}[y] = \mu = g^{-1}(\eta)</math>
:* the variance function: <math>V(\mu)</math>, where <math>\operatorname{Var}_\theta(y) = \sigma^2 V(\mu)</math>
With a specified variance function and link function we can develop, as alternatives to the log-[[likelihood function]], the [[score (statistics)|score function]], and the [[Fisher information]], a '''[[quasi-likelihood]]''', a '''quasi-score''', and the '''quasi-information'''. This allows for full inference of <math>\beta</math>.
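The following sketch illustrates this idea on a hypothetical simulated data set (it is not the article's example): only a log link and the quadratic variance function <math>V(\mu) = \mu^2</math> are specified, the regression coefficients are obtained by numerically solving the quasi-score equations introduced below, and the dispersion <math>\sigma^2</math> is then estimated from Pearson-type residuals.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(3)

# Hypothetical data with a log link, E[y] = exp(X beta), and Var(y) proportional to mu**2.
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
mu_true = np.exp(X @ beta_true)
y = rng.gamma(shape=4.0, scale=mu_true / 4.0)  # Var(y) = mu**2 / 4

def quasi_score(beta):
    """Quasi-score U(beta) with V(mu) = mu**2; the dispersion sigma**2 cancels
    out of the estimating equations U(beta) = 0."""
    mu = np.exp(X @ beta)
    dmu_dbeta = mu[:, None] * X  # derivative of exp(X beta) with respect to beta
    return dmu_dbeta.T @ ((y - mu) / mu**2)

beta_hat = root(quasi_score, x0=np.zeros(2)).x  # solve U(beta) = 0
mu_hat = np.exp(X @ beta_hat)
sigma2_hat = np.mean((y - mu_hat)**2 / mu_hat**2)  # Pearson-type estimate of the dispersion
print(beta_hat, sigma2_hat)  # approximately [1.0, 0.5] and 0.25
</syntaxhighlight>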
:<math>i_b = -\operatorname{E}\left[\frac{\partial U}{\partial \beta}\right]</math>
 
'''QL, QS, QI as functions of <math>\beta</math>'''
 
The QL, QS and QI all provide the building blocks for inference about the parameters of interest, and therefore it is important to express them as functions of <math>\beta</math>.
 
Recalling again that <math>\mu = g^{-1}(X\beta)</math>, we derive the expressions for QL, QS and QI parametrized under <math>\beta</math>.
 
Quasi-likelihood in <math>\beta</math>,
The QS as a function of <math>\beta</math> is therefore
:<math>U_j(\beta_j) = \frac{\partial}{\partial \beta_j} Q(\beta,y) = \sum_{i=1}^n \frac{\partial \mu_i}{\partial\beta_j} \frac{y_i-\mu_i(\beta_j)}{\sigma^2V(\mu_i)}</math>
 
:<math>U(\beta) = \begin{bmatrix} U_1(\beta)\\
U_2(\beta)\\
[[File:Variance Function.png|thumb|The smoothed conditional variance against the smoothed conditional mean. The quadratic shape is indicative of the gamma distribution. The variance function of a gamma is <math>V(\mu) = \mu^2</math>.]]
 
Non-parametric estimation of the variance function and its importance has been discussed widely in the literature.<ref>{{cite journal|last=Muller and Stadtmuller|title=Estimation of Heteroscedasticity in Regression Analysis|journal=The Annals of Statistics|year=1987|volume=15|issue=2|pages=610–625|jstor=2241329|doi=10.1214/aos/1176350364|doi-access=free}}</ref><ref>{{cite journal|last1=Cai|first1=T.|last2=Wang|first2=Lie|title=Adaptive Variance Function Estimation in Heteroscedastic Nonparametric Regression|journal=The Annals of Statistics|year=2008|volume=36|issue=5|pages=2025–2054|jstor=2546470|bibcode=2008arXiv0810.4780C|arxiv=0810.4780|doi=10.1214/07-AOS509|s2cid=9184727}}</ref><ref>{{cite journal|last=Rice and Silverman|title=Estimating the Mean and Covariance structure nonparametrically when the data are curves|journal=Journal of the Royal Statistical Society|year=1991|volume=53|issue=1|pages=233–243|jstor=2345738}}</ref>
In [[non-parametric regression]] analysis, the goal is to express the expected value of the response variable ('''y''') as a function of the predictors ('''X'''). That is, we are looking to estimate a '''mean''' function, <math>g(x) = \operatorname{E}[y\mid X=x]</math>, without assuming a parametric form. There are many forms of non-parametric [[smoothing]] methods to help estimate the function <math>g(x)</math>. An interesting approach is to also look at a non-parametric '''variance function''', <math>g_v(x) = \operatorname{Var}(Y\mid X=x)</math>. A non-parametric variance function allows one to look at the mean function as it relates to the variance function and to notice patterns in the data.
 
:<math>g_v(x) = \operatorname{Var}(Y\mid X=x) =\operatorname{E}[y^2\mid X=x] - \left[\operatorname{E}[y\mid X=x]\right]^2 </math>
 
An example is detailed in the picture to the right. The goal of the project was to determine (among other things) whether or not the predictor, '''number of years in the major leagues''' (of baseball), had an effect on the response, the '''salary''' a player made. An initial scatter plot of the data indicates that there is heteroscedasticity in the data, as the variance is not constant at each level of the predictor. Because we can visually detect the non-constant variance, it is useful now to plot <math>g_v(x) = \operatorname{Var}(Y\mid X=x) =\operatorname{E}[y^2\mid X=x] - \left[\operatorname{E}[y\mid X=x]\right]^2 </math> and look to see if the shape is indicative of any known distribution. One can estimate <math>\operatorname{E}[y^2\mid X=x]</math> and <math>\left[\operatorname{E}[y\mid X=x]\right]^2 </math> using a general [[smoothing]] method. The plot of the non-parametric smoothed variance function can give the researcher an idea of the relationship between the variance and the mean. The picture to the right indicates a quadratic relationship between the mean and the variance. As we saw above, the gamma variance function is quadratic in the mean.
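The following sketch (using simulated data standing in for the salary data described above, rather than the actual data set) illustrates the approach: the conditional mean <math>\operatorname{E}[y\mid X=x]</math> and conditional second moment <math>\operatorname{E}[y^2\mid X=x]</math> are each estimated with a simple Nadaraya–Watson kernel smoother, and the smoothed variance function is formed from the identity above.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)

# Simulated heteroscedastic data: the conditional variance grows
# quadratically with the conditional mean, as for a gamma response.
n = 2000
x = rng.uniform(0, 10, size=n)
cond_mean = 1.0 + 0.5 * x
y = rng.gamma(shape=4.0, scale=cond_mean / 4.0)  # Var(y | x) = cond_mean**2 / 4

def nw_smooth(x0, x, t, h=0.5):
    """Nadaraya-Watson kernel estimate of E[t | X = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w * t).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(1, 9, 50)
m_hat = nw_smooth(grid, x, y)        # smoothed conditional mean
m2_hat = nw_smooth(grid, x, y ** 2)  # smoothed conditional second moment
var_hat = m2_hat - m_hat ** 2        # estimated variance function g_v(x)

# Plotting var_hat against m_hat displays the roughly quadratic shape discussed
# above; the ratio below is approximately constant (about 1/4).
print(var_hat / m_hat ** 2)
</syntaxhighlight>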
 
== Notes ==
{{reflist}}
 
== References ==
* {{cite book | last = McCullagh | first = Peter | authorlink=Peter McCullagh|author2=Nelder, John |authorlink2=John Nelder | title = Generalized Linear Models | publisher = London: Chapman and Hall | year = 1989 | edition=second| isbn = 0-412-31760-5 }}
* {{cite book | author=Henrik Madsen and Poul Thyregod|title= Introduction to General and Generalized Linear Models | year=2011 | publisher=Chapman & Hall/CRC | isbn=978-1-4200-9155-7| ref=harv}}
 
==External links==
*{{Commonscatinline}}
 
{{statistics|correlation}}