Logistic regression: Difference between revisions

Citation bot (talk | contribs)
Added bibcode. | Use this bot. Report bugs. | Suggested by Dominic3203 | Linked from User:Mathbot/Most_linked_math_articles | #UCB_webform_linked 1740/1913
mNo edit summary
Line 3:
[[File:Exam pass logistic curve.svg|thumb|400px|Example graph of a logistic regression curve fitted to data. The curve shows the estimated probability of passing an exam (binary dependent variable) versus hours studying (scalar independent variable). See {{slink||Example}} for worked details.]]
 
In [[statistics]], a '''logistic model''' (or '''logit model''') is a [[statistical model]] that models the [[logit|log-odds]] of an event as a [[linear function (calculus)|linear combination]] of one or more [[independent variable]]s. In [[regression analysis]], '''logistic regression'''<ref>{{cite journal|last1=Tolles|first1=Juliana|last2=Meurer|first2=William J|date=2016|title=Logistic Regression Relating Patient Characteristics to Outcomes|journal=JAMA |language=en|volume=316|issue=5|pages=533–4|issn=0098-7484|oclc=6823603312|doi=10.1001/jama.2016.7653|pmid=27483067}}</ref> (or '''logit regression''') [[estimation theory|estimates]] the parameters of a logistic model (the coefficients in the linear combination). In binary logistic regression there is a single [[binary variable|binary]] [[dependent variable]], coded by an [[indicator variable]], where the two values are labeled "0" and "1", while the [[independent variable]]s can each be a binary variable (two classes, coded by an indicator variable) or a [[continuous variable]] (any real value). The corresponding probability of the value labeled "1" can vary between 0 (certainly the value "0") and 1 (certainly the value "1"), hence the labeling;<ref name=Hosmer/> the function that converts log-odds to probability is the [[logistic function]], hence the name. The [[unit of measurement]] for the log-odds scale is called a ''[[logit]]'', from '''''log'''istic un'''it''''', hence the alternative names. See {{slink||Background}} and {{slink||Definition}} for formal mathematics, and {{slink||Example}} for a worked example.
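
The relationship between log-odds and probability described above can be illustrated with a minimal Python sketch. The function names (<code>logistic</code>, <code>logit</code>, <code>predict_probability</code>) and the coefficient values in the usage lines are hypothetical and chosen only for illustration; they are not taken from a fitted model.

```python
import math


def logistic(log_odds: float) -> float:
    """Convert log-odds (in logits) to a probability between 0 and 1."""
    # Branch on the sign so that math.exp never receives a large positive
    # argument, which would overflow for extreme log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Convert a probability strictly between 0 and 1 to log-odds."""
    return math.log(p / (1.0 - p))


def predict_probability(intercept: float, coefficients: list[float],
                        x: list[float]) -> float:
    """Probability of the outcome labeled "1" under a logistic model.

    The log-odds are a linear combination of the independent variables.
    """
    log_odds = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return logistic(log_odds)


# Hypothetical model: intercept -4.0, one coefficient 1.5, input x = 2.0.
# The log-odds are -4.0 + 1.5 * 2.0 = -1.0.
p = predict_probability(-4.0, [1.5], [2.0])
```

Log-odds of 0 correspond to a probability of exactly 0.5, and <code>logit</code> is the inverse of <code>logistic</code>, so converting a value to a probability and back recovers the original log-odds up to floating-point error.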
 
Binary variables are widely used in statistics to model the probability of a certain class or event taking place, such as the probability of a team winning or of a patient being healthy (see {{slink||Applications}}), and the logistic model has been the most commonly used model for [[binary regression]] since about 1970.{{sfn|Cramer|2002|pp=10–11}} Binary variables can be generalized to [[categorical variable]]s when there are more than two possible values (e.g. whether an image is of a cat, dog, lion, etc.), and binary logistic regression is correspondingly generalized to [[multinomial logistic regression]]. If the multiple categories are [[Level of measurement#Ordinal scale|ordered]], one can use [[ordinal logistic regression]] (for example the proportional odds ordinal logistic model<ref name=wal67est />). See {{slink||Extensions}} for further extensions. The logistic regression model itself simply models the probability of the output in terms of the input and does not perform [[statistical classification]] (it is not a classifier). It can, however, be used to make a classifier, for instance by choosing a cutoff value and classifying inputs with probability greater than the cutoff as one class and those below the cutoff as the other; this is a common way to make a [[binary classifier]].
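
The cutoff rule mentioned above can be sketched in Python as follows. The function names <code>classify</code> and <code>classify_all</code> and the default cutoff of 0.5 are assumptions made for illustration. The text does not say how a probability exactly equal to the cutoff is handled; this sketch assigns it to class 0.

```python
def classify(probability: float, cutoff: float = 0.5) -> int:
    """Turn a predicted probability into a class label (0 or 1).

    Probabilities greater than the cutoff map to class 1; all others,
    including a probability exactly at the cutoff (an assumption of this
    sketch), map to class 0.
    """
    return 1 if probability > cutoff else 0


def classify_all(probabilities: list[float], cutoff: float = 0.5) -> list[int]:
    """Apply the same cutoff rule to a list of predicted probabilities."""
    return [classify(p, cutoff) for p in probabilities]


# Hypothetical predicted probabilities from a logistic model.
labels = classify_all([0.1, 0.6, 0.9])
stricter = classify_all([0.1, 0.6, 0.9], cutoff=0.8)
```

Raising the cutoff makes the classifier more conservative about assigning class 1, while the underlying model and its predicted probabilities are unchanged.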