Binary regression

{{regression bar}}
In [[statistics]], specifically [[regression analysis]], a '''binary regression''' estimates a relationship between one or more [[explanatory variable]]s and a single output [[binary variable]]. Generally, the model estimates the probability of each of the two alternatives rather than simply outputting a single value, as [[linear regression]] does.
 
Binary regression is usually analyzed as a special case of [[binomial regression]] with a single trial ({{tmath|n = 1}}), where one of the two alternatives is considered a "success" and coded as 1: the outcome is the [[Count data|count]] of successes in one trial, either 0 or 1. The most common binary regression models are the [[logit model]] ([[logistic regression]]) and the [[probit model]] ([[probit regression]]).
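
As a sketch of what this special case means in practice (the symbols <math>p(x)</math> and <math>m</math> are introduced only for this illustration): if <math>p(x)</math> is the modeled probability of success, the binomial probability mass function with <math>n = 1</math> reduces to the [[Bernoulli distribution]],

: <math>\Pr(y \mid x) = p(x)^{y}\,\bigl(1 - p(x)\bigr)^{1-y}, \qquad y \in \{0, 1\},</math>

so the log-likelihood of <math>m</math> independent observations <math>(x_i, y_i)</math> is

: <math>\ell = \sum_{i=1}^{m} \Bigl[ y_i \log p(x_i) + (1 - y_i) \log\bigl(1 - p(x_i)\bigr) \Bigr].</math>

The logit and probit models differ only in how <math>p(x)</math> depends on ''x''.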
 
==Applications==
Binary regression models can be interpreted as [[latent variable model]]s, together with a measurement model; or as probabilistic models, directly modeling the probability.
 
=== Latent variable model ===
The latent variable interpretation has traditionally been used in [[bioassay]], yielding the [[probit model]], where a normally distributed latent variable and a cutoff (threshold) are assumed. The latent variable interpretation is also used in [[item response theory]] (IRT).
 
Formally, the latent variable interpretation posits that the outcome ''y'' is related to a vector of explanatory variables ''x'' by
 
: <math>y=\mathbf{1}[y^*>0]</math>

where <math>\mathbf{1}[\cdot]</math> is the [[indicator function]], <math>y^*=x\beta +\varepsilon </math> and <math>\varepsilon \mid x\sim G</math>, {{math|''&beta;''}} is a vector of [[statistical parameter|parameters]] and ''G'' is a [[probability distribution]].
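
A minimal simulation sketch of this latent-variable mechanism follows. It assumes, for illustration only, that ''G'' is the standard logistic distribution (which yields the logit model; a normal ''G'' would give the probit model) and treats <math>x\beta</math> as a single precomputed number. The function names are hypothetical. The sketch checks that the share of simulated outcomes with ''y'' = 1 matches the implied probability <math>1 - G(-x\beta)</math>, which equals <math>G(x\beta)</math> whenever ''G'' is symmetric about zero.

```python
import math
import random


def logistic_cdf(t: float) -> float:
    """CDF of the standard logistic distribution (an assumed choice of G)."""
    return 1.0 / (1.0 + math.exp(-t))


def draw_logistic(rng: random.Random) -> float:
    """Draw an error term from the standard logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # rng.random() can return 0.0; avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def observe_y(x_beta: float, rng: random.Random) -> int:
    """Latent-variable model: y* = x*beta + eps, and y = 1 exactly when y* > 0."""
    y_star = x_beta + draw_logistic(rng)
    return 1 if y_star > 0 else 0


def prob_y1(x_beta: float) -> float:
    """Implied P(y = 1 | x) = P(eps > -x*beta) = 1 - G(-x*beta)."""
    return 1.0 - logistic_cdf(-x_beta)


if __name__ == "__main__":
    rng = random.Random(0)
    x_beta = 0.5
    draws = [observe_y(x_beta, rng) for _ in range(100_000)]
    # The empirical frequency should be close to the model probability.
    print(sum(draws) / len(draws), prob_y1(x_beta))
```

Only the sign of <math>y^*</math> is observed, so the simulation discards the latent value after thresholding, exactly as the measurement model <math>y=\mathbf{1}[y^*>0]</math> does.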
 
The latent variable model can be applied in many economic contexts. For instance, the outcome can be a manager's decision whether to invest in a program, where <math>y^*</math> is the expected net [[discounted cash flow]] and ''x'' is a vector of variables that can affect the program's cash flow. The manager invests only when she expects the net discounted cash flow to be positive, that is, when <math>y^*>0</math>.<ref>For a detailed example, refer to: Tetsuo Yai, Seiji Iwakura, Shigeru Morichi, Multinomial probit with structured covariance for route choice behavior, Transportation Research Part B: Methodological, Volume 31, Issue 3, June 1997, Pages 195–207, ISSN 0191-2615</ref>
 
Often, the [[errors and residuals|error term]] <math>\varepsilon</math> is assumed to follow a [[normal distribution]] conditional on the explanatory variables ''x''. This generates the standard [[probit model]].<ref>Bliss, C. I. (1934). "The Method of Probits". Science 79 (2037): 38–39.</ref>
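
A minimal sketch of the probit probability <math>\Pr(y=1 \mid x) = \Phi(x\beta)</math> and its log-likelihood, assuming the error is ''standard'' normal (the usual normalization, since only <math>\beta</math> relative to the error scale is identified). The function names are hypothetical, and each row of explanatory variables is assumed to include a leading 1 for the intercept.

```python
import math
from typing import Sequence


def normal_cdf(t: float) -> float:
    """Standard normal CDF, Phi(t), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def probit_prob(x: Sequence[float], beta: Sequence[float]) -> float:
    """P(y = 1 | x) = Phi(x . beta) when eps | x ~ N(0, 1)."""
    index = sum(xi * bi for xi, bi in zip(x, beta))
    return normal_cdf(index)


def probit_log_likelihood(
    X: Sequence[Sequence[float]], y: Sequence[int], beta: Sequence[float]
) -> float:
    """Bernoulli log-likelihood of observed 0/1 outcomes under the probit model.

    No guard is included for p rounding to exactly 0 or 1 at extreme indices.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        p = probit_prob(xi, beta)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total
```

For example, with <code>beta = [0.0, 1.0]</code> an observation <code>[1.0, 0.0]</code> has index 0 and therefore probability <math>\Phi(0) = 0.5</math>. Maximizing <code>probit_log_likelihood</code> over <code>beta</code> with any numerical optimizer gives the probit maximum-likelihood estimate.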
 
=== Probabilistic model ===
The simplest direct probabilistic model is the [[logit model]], which models the [[log-odds]] as a linear function of the explanatory variable or variables. The logit model is "simplest" in the sense of [[generalized linear model]]s (GLMs): the log-odds are the natural parameter of the Bernoulli distribution written as an [[exponential family]], so the logit is the canonical link function and is the most convenient to use in computations.
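
A minimal sketch of fitting the logit model by maximum likelihood, assuming a single explanatory variable plus an intercept and data that are not perfectly separated (otherwise the estimate does not exist). It uses Newton-Raphson, which for the logit model coincides with iteratively reweighted least squares because the logit is the canonical link. The names <code>sigmoid</code> and <code>fit_logit</code> are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sigmoid(t: float) -> float:
    """Inverse of the logit link: p = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit(x: Sequence[float], y: Sequence[int], iters: int = 25) -> Tuple[float, float]:
    """Fit log(p / (1 - p)) = b0 + b1 * x by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score (gradient) and negative Hessian of the Bernoulli log-likelihood.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            r = yi - p            # residual on the probability scale
            w = p * (1.0 - p)     # Bernoulli variance, i.e. the IRLS weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system H * step = g, with H the negative Hessian.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1


if __name__ == "__main__":
    # Small illustrative data set (not from any real study).
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    y = [0, 0, 1, 0, 1, 1]
    b0, b1 = fit_logit(x, y)
    print(b0, b1)
```

At the fitted coefficients the score equations <math>\textstyle\sum_i (y_i - p_i) = 0</math> and <math>\textstyle\sum_i (y_i - p_i)\,x_i = 0</math> hold, and the log-odds of each fitted probability equal <code>b0 + b1 * x</code> by construction.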
 
Another direct probabilistic model is the [[linear probability model]], which models the probability itself as a linear function of the explanatory variables. A drawback of the linear probability model is that, for some values of the explanatory variables, the model will predict probabilities less than zero or greater than one.
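
A minimal sketch of the linear probability model fitted by ordinary least squares on one explanatory variable, illustrating the out-of-range drawback. The function name <code>fit_lpm</code> and the data are hypothetical. With these data the fitted line is <math>0.2x</math>, so it predicts a "probability" of -0.2 at ''x'' = -1 and 1.4 at ''x'' = 7.

```python
from typing import Sequence, Tuple


def fit_lpm(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """OLS fit of P(y = 1 | x) = a + b * x (the linear probability model)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    y = [0, 0, 1, 0, 1, 1]
    a, b = fit_lpm(x, y)
    for x_new in (-1.0, 2.5, 7.0):
        # Fitted "probabilities" of about -0.2, 0.5 and 1.4.
        print(x_new, a + b * x_new)
```

Nothing in the linear form keeps <code>a + b * x</code> inside [0, 1], which is why the logit and probit models pass the linear index through a cumulative distribution function instead.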
 
==See also==
*{{sectionlink|Generalized linear model#Binary data}}
*[[Fractional model]]
 
==References==
{{reflist}}
{{refbegin}}
* {{cite book
|title=Regression Models for Categorical Dependent Variables Using Stata, Second Edition
|chapter=4. Models for binary outcomes: 4.1 The statistical model
|year=2006
|isbn=978-1-59718011-5
|ref=harv
}}
 
* {{cite book
|last=Agresti |first=Alan
|chapter=3.2 Generalized Linear Models for Binary Data
|year=2007
|title=Categorical Data Analysis
|url=https://archive.org/details/introductiontoca00agre |url-access=limited |edition=2nd
|pages=[https://archive.org/details/introductiontoca00agre/page/n88 68]–73
|ref=harv
}}
{{refend}}
{{statistics-stub}}
 
[[Category:Generalized linear models]]
[[Category:Regression analysis]]