Logistic regression: Difference between revisions

Linear regression and logistic regression have many similarities. For example, in simple linear regression, a set of ''K'' data points (''x<sub>k</sub>'', ''y<sub>k</sub>'') is fitted to a proposed model function of the form <math>y=b_0+b_1 x</math>. The fit is obtained by choosing the parameters ''b''<sub>0</sub> and ''b''<sub>1</sub> which minimize the sum of the squares of the residuals (the squared error term) over the data points:
 
:<math>\varepsilon^2=\sum_{k=1}^K (b_0+b_1 x_k-y_k)^2.</math>
 
The minimum value which constitutes the fit will be denoted by <math>\hat{\varepsilon}^2</math>.
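As an illustration of the least-squares fit described above, the following sketch computes the closed-form minimizers of the squared error term and the resulting minimum <math>\hat{\varepsilon}^2</math>. The data values here are invented for the example and do not come from the article.

```python
# Sketch: closed-form least-squares fit of y = b0 + b1*x, minimizing the
# squared error term  epsilon^2 = sum_k (b0 + b1*x_k - y_k)^2.
def fit_simple_linear(xs, ys):
    K = len(xs)
    x_mean = sum(xs) / K
    y_mean = sum(ys) / K
    # Standard closed-form solution for simple linear regression.
    b1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
          / sum((x - x_mean) ** 2 for x in xs))
    b0 = y_mean - b1 * x_mean
    return b0, b1

def squared_error(b0, b1, xs, ys):
    # epsilon^2 for a given choice of parameters.
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points (x_k, y_k):
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]
b0, b1 = fit_simple_linear(xs, ys)
eps_hat_sq = squared_error(b0, b1, xs, ys)  # the minimum value of epsilon^2
```

Perturbing either parameter away from the fitted values can only increase the squared error, which is what "the minimum value which constitutes the fit" refers to.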
 
The idea of a [[null model]] may be introduced, in which it is assumed that the ''x'' variable is of no use in predicting the ''y<sub>k</sub>'' outcomes: The data points are fitted to a null model function of the form ''y''&nbsp;=&nbsp;''b''<sub>0</sub> with a squared error term:
 
:<math>\varepsilon^2=\sum_{k=1}^K (b_0-y_k)^2.</math>
 
The fitting process consists of choosing a value of ''b''<sub>0</sub> which minimizes <math>\varepsilon^2</math> of the fit to the null model, denoted by <math>\varepsilon_\varphi^2</math>, where the <math>\varphi</math> subscript denotes the null model. It is seen that the null model is optimized by <math>b_0=\overline{y}</math>, where <math>\overline{y}</math> is the mean of the ''y<sub>k</sub>'' values, and the optimized <math>\varepsilon_\varphi^2</math> is:
 
:<math>\hat{\varepsilon}_\varphi^2=\sum_{k=1}^K (\overline{y}-y_k)^2</math>
 
which is proportional to the square of the (uncorrected) sample standard deviation of the ''y<sub>k</sub>'' data points.
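The claim that the null model is optimized by the mean can be checked numerically. This is a minimal sketch with made-up data, not part of the article:

```python
# Sketch: the null-model squared error  epsilon_phi^2 = sum_k (b0 - y_k)^2
# is minimized by choosing b0 = mean(y).
ys = [0.0, 1.0, 1.0, 2.0]   # hypothetical y_k values
y_bar = sum(ys) / len(ys)

def null_error(b0, ys):
    return sum((b0 - y) ** 2 for y in ys)

best = null_error(y_bar, ys)
# Any other choice of b0 does no better than the mean:
assert all(null_error(y_bar + d, ys) >= best for d in (-0.5, 0.3, 1.0))
```

Here `best` equals ''K'' times the squared (uncorrected) sample standard deviation of the ''y<sub>k</sub>'', matching the proportionality noted above.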
Line 664:
We can imagine a case where the ''y<sub>k</sub>'' data points are randomly assigned to the various ''x<sub>k</sub>'', and then fitted using the proposed model. Specifically, we can consider the fits of the proposed model to every permutation of the ''y<sub>k</sub>'' outcomes. It can be shown that the optimized error of any of these fits will never be greater than the optimum error of the null model, and that the difference between these minimum errors will follow a [[chi-squared distribution]], with degrees of freedom equal to those of the proposed model minus those of the null model which, in this case, will be <math>2-1=1</math>. Using the [[chi-squared test]], we may then estimate how many of these permuted sets of ''y<sub>k</sub>'' will yield a minimum error less than or equal to the minimum error using the original ''y<sub>k</sub>'', and so we can estimate how significant an improvement is given by the inclusion of the ''x'' variable in the proposed model.
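The permutation argument above can be sketched directly for a small data set: fit the proposed model to every permutation of the ''y<sub>k</sub>'' and count how many permutations fit at least as well as the original assignment. The data and the variable names are illustrative, not from the article.

```python
# Sketch of the permutation comparison: fit y = b0 + b1*x to every
# permutation of the y_k and compare each fit's minimum error with that
# of the original labeling.
import itertools

xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical x_k
ys = [1.2, 1.8, 3.1, 4.3]   # hypothetical y_k

def fit_error(xs, ys):
    # Minimum of sum_k (b0 + b1*x_k - y_k)^2, via the closed-form fit.
    K = len(xs)
    xm, ym = sum(xs) / K, sum(ys) / K
    b1 = (sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
          / sum((x - xm) ** 2 for x in xs))
    b0 = ym - b1 * xm
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys))

orig_err = fit_error(xs, ys)
perm_errs = [fit_error(xs, list(p)) for p in itertools.permutations(ys)]
# Fraction of permutations fitting at least as well as the original labels,
# a permutation-style estimate of the significance of including x:
frac_as_good = sum(e <= orig_err for e in perm_errs) / len(perm_errs)
```

Since the null model is the special case ''b''<sub>1</sub>&nbsp;=&nbsp;0, every permuted fit's error is bounded above by the null-model error, consistent with the nesting argument in the text.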
 
For logistic regression, the measure of goodness-of-fit is the likelihood function ''L'', or its logarithm, the log-likelihood ''ℓ''. The likelihood function ''L'' is analogous to the <math>\varepsilon^2</math> in the linear regression case, except that the likelihood is maximized rather than minimized. Denote the maximized log-likelihood of the proposed model by <math>\hat{\ell}</math>.
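For concreteness, the binary log-likelihood that logistic regression maximizes can be written as a short sketch. The outcomes and fitted probabilities below are invented for illustration:

```python
import math

# Sketch: the log-likelihood for binary outcomes y_k in {0, 1}, where
# p_k is the model's fitted probability that y_k = 1:
#   ell = sum_k [ y_k*log(p_k) + (1 - y_k)*log(1 - p_k) ]
def log_likelihood(ps, ys):
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(ps, ys))

ys = [0, 0, 1, 1]            # hypothetical binary outcomes
ps = [0.2, 0.4, 0.6, 0.9]    # hypothetical fitted probabilities
ell = log_likelihood(ps, ys)
```

Each term is the log of a probability, so ''ℓ'' is never positive; maximizing ''ℓ'' plays the role that minimizing <math>\varepsilon^2</math> plays in linear regression.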
 
In the case of simple binary logistic regression, the set of ''K'' data points are fitted in a probabilistic sense to a function of the form: