Content deleted Content added
Init |
m →top: link |
||
(15 intermediate revisions by 4 users not shown) | |||
Line 1:
{{regression bar}}
In [[statistics]], specifically [[regression analysis]], a '''binary regression''' estimates a relationship between one or more [[explanatory variable]]s and a single output [[binary variable]]. Generally the probability of the two alternatives is modeled, instead of simply outputting a single value, as in [[linear regression]].
Binary regression is usually analyzed as a special case of [[binomial regression]], with a single outcome (
==Applications==
Line 9 ⟶ 10:
Binary regression models can be interpreted as [[latent variable model]]s, together with a measurement model; or as probabilistic models, directly modeling the probability.
=== Latent variable model ===
The latent variable interpretation has traditionally been used in [[bioassay]], yielding the [[probit model]], where normal variance and a cutoff are assumed. The latent variable interpretation is also used in [[item response theory]] (IRT).
Formally, the latent variable interpretation posits that the outcome ''y'' is related to a vector of explanatory variables ''x'' by
The simplest direct probabilistic model is the [[logit model]]], which models the [[log-odds]] as a linear function of the explanatory variable or variables. The logit model is "simplest" in the sense of [[generalized linear model]]s (GLIM): the log-odds are the natural parameter for the [[exponential family]] of the Bernoulli distribution, and thus it is the simplest to use for computations.▼
: <math>y=1 [y^*>0]</math>
where <math>y^*=x\beta +\varepsilon </math> and <math>\varepsilon \mid x\sim G</math>, {{math|''β''}} is a vector of [[statistical parameter|parameters]] and ''G'' is a [[probability distribution]].
This model can be applied in many economic contexts. For instance, the outcome can be the decision of a manager whether invest to a program, <math>y^*</math> is the expected net [[discounted cash flow]] and ''x'' is a vector of variables which can affect the cash flow of this program. Then the manager will invest only when she expects the net discounted cash flow to be positive.<ref>For a detailed example, refer to: Tetsuo Yai, Seiji Iwakura, Shigeru Morichi, Multinomial probit with structured covariance for route choice behavior, Transportation Research Part B: Methodological, Volume 31, Issue 3, June 1997, Pages 195–207, ISSN 0191-2615</ref>
Often, the [[errors and residuals|error term]] <math>\varepsilon</math> is assumed to follow a [[normal distribution]] conditional on the explanatory variables ''x''. This generates the standard [[probit model]].<ref>Bliss, C. I. (1934). "The Method of Probits". Science 79 (2037): 38–39.</ref>
=== Probabilistic model ===
▲The simplest direct probabilistic model is the [[logit model
Another direct probabilistic model is the [[linear probability model]], which models the probability itself as a linear function of the explanatory variables. A drawback of the linear probability model is that, for some values of the explanatory variables, the model will predict probabilities less than zero or greater than one.
==See also ==
*{{sectionlink|Generalized linear model#Binary data}}
*[[Fractional model]]
==References==
{{reflist}}
{{refbegin}}
* {{cite
|title=Regression Models for Categorical Dependent Variables Using Stata, Second Edition
|chapter=4. Models for binary outcomes: 4.1 The statistical model
Line 25 ⟶ 45:
|year=2006
|isbn=978-1-59718011-5
}}
* {{cite
|last=Agresti |first=Alan
|chapter=3.2 Generalized Linear Models for Binary Data
|year=2007
|title=Categorical Data Analysis
|url=https://archive.org/details/introductiontoca00agre |url-access=limited |edition=2nd
|pages=[https://archive.org/details/introductiontoca00agre/page/n88 68]–73
}}
{{refend}}
Line 41 ⟶ 59:
{{statistics-stub}}
[[Category:
|