In [[statistics]], a '''linear probability model''' (LPM) is a special case of a [[binary regression]] model. Here the [[dependent and independent variables|dependent variable]] for each observation takes values which are either 0 or 1. The probability of observing a 0 or 1 in any one case is treated as depending on one or more [[dependent and independent variables|explanatory variables]]. For the "linear probability model", this relationship is a particularly simple one, and allows the model to be fitted by [[linear regression]].
The model assumes that, for a binary outcome ([[Bernoulli trial]]), <math>Y</math>, and its associated vector of explanatory variables, <math>X</math>,<ref name=Cox>{{cite book |last=Cox |first=D. R. |year=1970 |title=Analysis of Binary Data |___location=London |publisher=Methuen |isbn=0-416-10400-2 |chapter=Simple Regression |pages=33–42 }}</ref>
: <math>\Pr(Y=1 \mid X=x) = x'\beta.</math>
A drawback of this model is that, unless restrictions are placed on <math> \beta </math>, the estimated coefficients can imply probabilities outside the [[unit interval]] <math> [0,1] </math>. For this reason, models such as the [[logit model]] or the [[probit model]] are more commonly used.
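This drawback can be seen numerically. The sketch below (illustrative only; the data-generating values are arbitrary assumptions, not from any cited source) fits an LPM by ordinary least squares and counts fitted values that fall outside the unit interval:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
p = np.clip(0.5 + 0.4 * x, 0, 1)        # true P(Y = 1 | x), truncated to [0, 1]
y = rng.binomial(1, p)                  # binary outcomes

# LPM: regress the 0/1 outcome on a constant and x by least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

print("estimated coefficients:", beta)
print("fitted values outside [0, 1]:", np.sum((fitted < 0) | (fitted > 1)))
```

Because the fitted line is unbounded in <math>x</math>, some fitted "probabilities" are negative or exceed one, which is exactly the problem the logit and probit models avoid.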
==Latent-variable formulation==
More formally, the LPM can arise from a latent-variable formulation, usually found in the econometrics literature,<ref name=Amemiya>{{cite journal |last=Amemiya |first=Takeshi |year=1981 |title=Qualitative Response Models: A Survey |journal=Journal of Economic Literature |volume=19 |number=4 |pages=1483–1536 }}</ref> as follows: assume the following regression model with a latent (unobservable) dependent variable:
: <math>y^* = b_0+ \mathbf x'\mathbf b + \varepsilon,\;\; \varepsilon\mid \mathbf x\sim U(-a,a).</math>
The critical assumption here is that the error term of this regression is a uniform random variable symmetric around zero, and hence of zero mean. The conditional cumulative distribution function of <math>\varepsilon</math> is <math>F_{\varepsilon|\mathbf x}(\varepsilon\mid \mathbf x) = \frac {\varepsilon + a}{2a}</math> for <math>\varepsilon \in [-a,a]</math>.
Define the indicator variable <math> y = 1</math> if <math> y^* >0</math>, and zero otherwise, and consider the conditional probability
:<math>{\rm Pr}(y =1\mid \mathbf x ) = {\rm Pr}(y^* > 0\mid \mathbf x) = {\rm Pr}(b_0+ \mathbf x'\mathbf b + \varepsilon>0\mid \mathbf x) </math>
:<math> = {\rm Pr}(\varepsilon >- b_0- \mathbf x'\mathbf b\mid \mathbf x) = 1- {\rm Pr}(\varepsilon \leq - b_0- \mathbf x'\mathbf b\mid \mathbf x)</math>
:<math>=1- F_{\varepsilon|\mathbf x}(- b_0- \mathbf x'\mathbf b\mid \mathbf x) =1- \frac {- b_0- \mathbf x'\mathbf b + a}{2a} = \frac {b_0+a}{2a}+\frac {\mathbf x'\mathbf b}{2a}.</math>
But this is exactly the linear probability model,
:<math>P(y =1\mid \mathbf x )= \beta_0 + \mathbf x'\beta</math>
with the mapping
:<math>\beta_0 = \frac {b_0+a}{2a},\;\; \beta=\frac{\mathbf b}{2a}.</math>
This construction is a general device for obtaining a conditional probability model of a binary variable: assuming a logistic error distribution yields the [[logit model]], while assuming a normal error distribution yields the [[probit model]].
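The mapping above can be checked numerically. In this sketch (illustrative values of <math>b_0</math>, <math>\mathbf b</math>, and <math>a</math> chosen arbitrarily), the latent regression is simulated with uniform errors and the empirical frequency of <math>y=1</math> is compared with <math>\beta_0 + \mathbf x'\beta</math>:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters of the latent model (scalar regressor).
b0, b, a = 0.1, 0.5, 2.0

# Implied LPM coefficients from the mapping in the text.
beta0, beta = (b0 + a) / (2 * a), b / (2 * a)

x = 0.8                                   # fix one value of the regressor
eps = rng.uniform(-a, a, size=200_000)    # uniform error, symmetric around zero
y = (b0 + b * x + eps > 0).astype(int)    # indicator: y = 1 if y* > 0

empirical = y.mean()                      # simulated P(y = 1 | x)
predicted = beta0 + beta * x              # LPM prediction
print(empirical, predicted)
```

The two numbers agree up to simulation noise, provided <math>|b_0 + \mathbf x'\mathbf b| < a</math> so that the probability lies strictly inside <math>(0,1)</math>.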
== See also ==