Linear probability model

{{Short description|Statistics model}}
In [[statistics]], a '''linear probability model''' (LPM) is a special case of a [[binary regression]] model. Here the [[dependent and independent variables|dependent variable]] for each observation takes values which are either 0 or 1. The probability of observing a 0 or 1 in any one case is treated as depending on one or more [[dependent and independent variables|explanatory variables]]. For the "linear probability model", this relationship is a particularly simple one, and allows the model to be fitted by [[simple linear regression]].
 
==The model==
The model assumes that, for a binary outcome ([[Bernoulli trial]]), <math>Y</math>, and its associated vector of explanatory variables, <math>X</math>,<ref name=Cox>{{cite book |last=Cox |first=D. R. |year=1970 |title=Analysis of Binary Data |___location=London |publisher=Methuen |isbn=0-416-10400-2 |chapter=Simple Regression |pages=33–42 }}</ref>
 
: <math> \Pr(Y=1 | X=x) = x'\beta . </math>
 
For this model,
:<math> E[Y|X] = 0\cdot \Pr(Y=0|X) +1\cdot \Pr(Y=1|X) = \Pr(Y=1|X) =x'\beta,</math>
and hence the vector of parameters <math>\beta</math> can be estimated using [[least squares]]. This method of fitting would be [[Efficiency (statistics)|inefficient]],<ref name=Cox/> and can be improved by adopting an iterative scheme based on [[weighted least squares]],<ref name=Cox/> in which the model from the previous iteration is used to supply estimates of the conditional variances, <math>\operatorname{Var}(Y\mid X=x)</math>, which would vary between observations. This approach can be related to fitting the model by [[maximum likelihood]].<ref name=Cox/>
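
A minimal numerical sketch of this fitting procedure (using simulated data and the [[NumPy]] library; the variable names and parameter values below are purely illustrative) estimates <math>\beta</math> by ordinary least squares and then performs a single weighted-least-squares step with weights taken from the fitted variances <math>\hat p(1-\hat p)</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: an intercept plus one explanatory variable (illustrative values).
n = 500
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
p_true = 0.2 + 0.6 * x                 # true linear probabilities
y = rng.binomial(1, p_true)            # binary (Bernoulli) outcomes

# Ordinary least squares estimate of the coefficient vector.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# One weighted-least-squares step: the weights are the reciprocals of the
# estimated conditional variances Var(Y | X = x) = p(1 - p) from the OLS fit.
# Fitted values are clipped because they may fall outside the unit interval.
p_hat = np.clip(X @ beta_ols, 1e-6, 1 - 1e-6)
w = 1.0 / (p_hat * (1.0 - p_hat))
sqrt_w = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(sqrt_w[:, None] * X, sqrt_w * y, rcond=None)

print("OLS:", beta_ols, "WLS:", beta_wls)
</syntaxhighlight>
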
A drawback of this model for the parameter of the [[Bernoulli distribution]] is that, unless restrictions are placed on <math>\beta</math>, the estimated coefficients can imply probabilities outside the [[unit interval]] <math>[0,1]</math>. For this reason, models such as the [[logit model]] or the [[probit model]] are more commonly used.
 
==Latent-variable formulation==
More formally, the LPM can arise from a latent-variable formulation (usually to be found in the [[econometrics]] literature<ref name=Amemiya>{{cite journal |last=Amemiya |first=Takeshi |year=1981 |title=Qualitative Response Models: A Survey|journal=Journal of Economic Literature |volume =19 |number =4 |pages=1483–1536 }}</ref>), as follows: assume the following regression model with a latent (unobservable) dependent variable:
 
: <math>y^* = b_0+ \mathbf x'\mathbf b + \varepsilon,\;\; \varepsilon\mid \mathbf x\sim U(-a,a).</math>
 
The critical assumption here is that the error term of this regression is a [[Continuous uniform distribution|uniform]] [[random variable]] that is symmetric around zero, and hence has mean zero. The cumulative distribution function of <math>\varepsilon</math> here is <math>F_{\varepsilon|\mathbf x}(\varepsilon\mid \mathbf x) = \frac {\varepsilon + a}{2a}</math> for <math>\varepsilon \in [-a,a]</math>.
 
Define the indicator variable <math> y = 1</math> if <math> y^* >0</math>, and zero otherwise, and consider the conditional probability
 
:<math>\Pr(y =1\mid \mathbf x ) = \Pr(y^* > 0\mid \mathbf x) = \Pr(b_0+ \mathbf x'\mathbf b + \varepsilon>0\mid \mathbf x) </math>
:<math> = \Pr(\varepsilon >- b_0- \mathbf x'\mathbf b\mid \mathbf x) = 1- \Pr(\varepsilon \leq - b_0- \mathbf x'\mathbf b\mid \mathbf x)</math>
:<math>=1- F_{\varepsilon|\mathbf x}(- b_0- \mathbf x'\mathbf b\mid \mathbf x) =1- \frac {- b_0- \mathbf x'\mathbf b + a}{2a} = \frac {b_0+a}{2a}+\frac {\mathbf x'\mathbf b}{2a}.</math>
 
But this is the linear probability model,
:<math>\Pr(y =1\mid \mathbf x )= \beta_0 + \mathbf x'\beta</math>
 
with the mapping
 
:<math>\beta_0 = \frac {b_0+a}{2a},\;\; \beta=\frac{\mathbf b}{2a}.</math>
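
The mapping can be checked with a short simulation, again a sketch using NumPy with illustrative values of <math>b_0</math>, <math>\mathbf b</math> and <math>a</math> chosen so that <math>-b_0-\mathbf x'\mathbf b</math> stays inside the support <math>[-a,a]</math> of the error term:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

# Illustrative latent-variable parameters; with x in [0, 1] the quantity
# -b0 - b*x lies in [-1, 0], inside the error support [-a, a] = [-2, 2].
b0, b, a = 0.0, 1.0, 2.0
n = 200_000

x = rng.uniform(0.0, 1.0, n)
eps = rng.uniform(-a, a, n)        # symmetric uniform error term
y_star = b0 + b * x + eps          # latent variable
y = (y_star > 0).astype(float)     # observed binary indicator

# Implied linear probability model coefficients.
beta0 = (b0 + a) / (2 * a)         # = 0.5
beta = b / (2 * a)                 # = 0.25

# Compare empirical Pr(y = 1 | x near x0) with beta0 + beta * x0.
for x0 in (0.25, 0.5, 0.75):
    in_bin = np.abs(x - x0) < 0.02
    print(x0, round(y[in_bin].mean(), 3), beta0 + beta * x0)
</syntaxhighlight>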
 
This method is a general device to obtain a conditional probability model of a binary variable: if we assume that the distribution of the error term is logistic, we obtain the [[logit model]]; if we assume that it is the normal distribution, we obtain the [[probit model]]; and if we assume that it is the logarithm of a [[Weibull distribution]], the [[Generalized linear model|complementary log-log model]].
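
For instance, because these error distributions are symmetric about zero, a standard [[logistic distribution|logistic]] error term in this construction gives
:<math>\Pr(y=1\mid \mathbf x) = \frac{\exp(b_0+\mathbf x'\mathbf b)}{1+\exp(b_0+\mathbf x'\mathbf b)},</math>
while a standard normal error term gives <math>\Pr(y=1\mid \mathbf x) = \Phi(b_0+\mathbf x'\mathbf b)</math>, where <math>\Phi</math> is the standard normal cumulative distribution function.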
 
== See also ==
* [[Linear approximation]]
 
== References ==
{{reflist}}
 
== Further reading ==
* {{cite book |first=John H. |last=Aldrich |author-link=John Aldrich (political scientist) |first2=Forrest D. |last2=Nelson |title=Linear Probability, Logit, and Probit Models |chapter=The Linear Probability Model |publisher=Sage |year=1984 |pages=9–29 |isbn=0-8039-2133-0 |chapter-url=https://books.google.com/books?id=z0tmctgE1OYC&pg=PA9 }}
* {{cite book |last=Amemiya |first=Takeshi |chapter=Qualitative Response Models |title=Advanced Econometrics |year=1985 |publisher=Basil Blackwell |___location=Oxford |isbn=0-631-13345-3 |pages=267–359 |chapter-url=https://books.google.com/books?id=0bzGQE14CwEC&pg=PA267 }}
* {{cite book |last=Wooldridge |first=Jeffrey M. |year=2013 |title=Introductory Econometrics: A Modern Approach |___location=Mason, OH |publisher=South-Western |edition=5th international |chapter=A Binary Dependent Variable: The Linear Probability Model |pages=238–243 |isbn=978-1-111-53439-4 }}
* {{cite journal |last1=Horrace |first1=William C. |last2=Oaxaca |first2=Ronald L. |year=2006 |title=Results on the Bias and Inconsistency of Ordinary Least Squares for the Linear Probability Model |journal=Economics Letters |volume=90 |pages=321–327 }}
 
{{DEFAULTSORT:Linear Probability Model}}
[[Category:Generalized linear models]]