Linear probability model: Difference between revisions

Borisba (talk | contribs)
Latent-variable formulation: : Added another example of error term of the latent variable distribution
Link suggestions feature: 3 links added.
 
(8 intermediate revisions by 6 users not shown)
Line 1:
{{Short description|Statistics model}}
In [[statistics]], a '''linear probability model''' (LPM) is a special case of a [[binary regression]] model. Here the [[dependent and independent variables|dependent variable]] for each observation takes values which are either 0 or 1. The probability of observing a 0 or 1 in any one case is treated as depending on one or more [[dependent and independent variables|explanatory variables]]. For the "linear probability model", this relationship is a particularly simple one, and allows the model to be fitted by [[linear regression]].
 
Line 6 ⟶ 7:
 
For this model,
:<math> E[Y|X] = 0\cdot \Pr(Y=0|X) +1\cdot \Pr(Y=1|X) = \Pr(Y=1|X) =x'\beta,</math>
 
and hence the vector of parameters β can be estimated using [[least squares]]. This method of fitting is inefficient,<ref name=Cox /> because the conditional variances, <math>\operatorname{Var}(Y|X=x) = x'\beta(1-x'\beta)</math>, vary between observations. It can be improved by adopting an iterative scheme based on [[weighted least squares]],<ref name=Cox/> in which the model from the previous iteration is used to supply estimates of these conditional variances. This approach can be related to fitting the model by [[maximum likelihood]].<ref name=Cox/>
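
The two fitting schemes described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the function names, the simulated data, the number of iterations and the clipping constant <code>eps</code> (used to keep the estimated variances positive) are hypothetical choices, not part of the cited method.

```python
import numpy as np

def fit_lpm_ols(X, y):
    """Ordinary least squares fit; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_lpm_wls(X, y, n_iter=10, eps=1e-3):
    """Iterated weighted least squares for the linear probability model.

    Each pass uses the previous fit to estimate Var(Y|X=x) = p(1-p),
    then reweights observations by the inverse of that variance.
    n_iter and eps are illustrative assumptions.
    """
    beta = fit_lpm_ols(X, y)
    for _ in range(n_iter):
        # Fitted probabilities, clipped so that the variance stays positive.
        p = np.clip(X @ beta, eps, 1.0 - eps)
        sw = np.sqrt(1.0 / (p * (1.0 - p)))  # square root of the weights
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Simulated data (assumption): one regressor, true probabilities in [0.2, 0.8].
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.2, 0.6])
y = (rng.uniform(size=n) < X @ true_beta).astype(float)

beta_ols = fit_lpm_ols(X, y)
beta_wls = fit_lpm_wls(X, y)
print("OLS:", beta_ols)
print("WLS:", beta_wls)
```

Both estimates should lie close to <code>true_beta</code> in this simulation; the weighted fit differs from the unweighted one only in how much each observation counts.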
 
Line 12 ⟶ 14:
 
==Latent-variable formulation==
More formally, the LPM can arise from a latent-variable formulation (usually found in the [[econometrics]] literature<ref name=Amemiya>{{cite journal |last=Amemiya |first=Takeshi |year=1981 |title=Qualitative Response Models: A Survey|journal=Journal of Economic Literature |volume =19 |number =4 |pages=1483–1536 }}</ref>), as follows: assume the following regression model with a latent (unobservable) dependent variable:
 
: <math>y^* = b_0+ \mathbf x'\mathbf b + \varepsilon,\;\; \varepsilon\mid \mathbf x\sim U(-a,a).</math>
 
The critical assumption here is that the error term of this regression is a [[Continuous uniform distribution|uniform]] [[random variable]] symmetric around zero, and hence has mean zero. The cumulative distribution function of <math>\varepsilon</math> here is <math>F_{\varepsilon|\mathbf x}(\varepsilon\mid \mathbf x) = \frac {\varepsilon + a}{2a}</math> for <math>-a \le \varepsilon \le a</math>.
 
Define the indicator variable <math> y = 1</math> if <math> y^* >0</math>, and zero otherwise, and consider the conditional probability
Line 37 ⟶ 39:
:<math>\beta_0 = \frac {b_0+a}{2a},\;\; \beta=\frac{\mathbf b}{2a}.</math>
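
The mapping from the latent-variable parameters to the LPM coefficients can be checked with a small simulation. The sketch below is only an illustration: the values of <code>a</code>, <code>b0</code> and <code>b</code>, the single regressor and the sample size are hypothetical, and the regressor is drawn so that <math>-a \le b_0 + bx \le a</math> always holds, which keeps the implied probabilities inside [0, 1].

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent-model parameters; |b0 + b*x| <= a for x in [-1, 1].
a, b0, b = 2.0, 0.5, 1.0
n = 200_000

x = rng.uniform(-1.0, 1.0, n)
eps = rng.uniform(-a, a, n)          # error term distributed U(-a, a)
y_star = b0 + b * x + eps            # latent (unobserved) variable
y = (y_star > 0).astype(float)       # observed binary indicator

# Coefficients implied by the latent-variable formulation.
beta0 = (b0 + a) / (2 * a)
beta1 = b / (2 * a)

# Least-squares fit of the observed indicator on an intercept and x.
X = np.column_stack([np.ones(n), x])
est, *_ = np.linalg.lstsq(X, y, rcond=None)

print("implied  :", beta0, beta1)
print("estimated:", est)
```

With a sample of this size the least-squares estimates should agree with the implied coefficients up to sampling noise.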
 
This method is a general device to obtain a conditional probability model of a binary variable: if we assume that the distribution of the error term is logistic, we obtain the [[logit model]]; if we assume that it is normal, we obtain the [[probit model]]; and if we assume that it is the logarithm of a [[Weibull distribution]], we obtain the [[Generalized linear model|complementary log-log model]].
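
For comparison, the response functions of the models named above can be written as functions of a linear index. The sketch below uses only the Python standard library; the function names and the argument name <code>eta</code> for the index are hypothetical, and the complementary log-log function is given in its usual parameterization, which is an assumption about sign conventions.

```python
import math

def uniform_response(eta, a):
    """LPM response for a U(-a, a) error, clipped to the interval [0, 1]."""
    return min(1.0, max(0.0, (eta + a) / (2.0 * a)))

def logit_response(eta):
    """Logistic error: Pr(y=1) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_response(eta):
    """Standard normal error: Pr(y=1) = Phi(eta), computed via erf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cloglog_response(eta):
    """Complementary log-log: Pr(y=1) = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-1.0, 0.0, 1.0):
    print(eta,
          uniform_response(eta, a=2.0),
          logit_response(eta),
          probit_response(eta),
          cloglog_response(eta))
```

Only the first of these is linear in the index, and only over the range where no clipping occurs; the other three are bounded between 0 and 1 for every value of the index.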
 
== See also ==