A '''simple linear regression''' is a [[linear regression]] in which there is only one [[covariate]] (predictor variable).
 
Simple linear regression is used to evaluate the linear relationship between two variables. One example is the relationship between muscle strength and lean body mass. Equivalently, simple linear regression develops an equation by which the dependent variable can be predicted or estimated from the independent variable.
Given a sample <math> (Y_i, X_i), \, i = 1, \ldots, n </math>, the regression model is given by
 
: <math>Y_i = a + bX_i + \varepsilon_i </math>
 
where <math>Y_i</math> is the dependent variable, <math>a</math> is the ''y''-intercept, <math>b</math> is the gradient or slope of the line, <math>X_i</math> is the independent variable, and <math> \varepsilon_i </math> is a random error term associated with each observation.
The strength of the linear relationship between the two variables (i.e. dependent and independent) can be measured using a correlation coefficient, e.g. the [[Pearson product moment correlation coefficient]].
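As an illustrative sketch (with synthetic data, not taken from the article), the Pearson correlation coefficient can be computed directly from its definition to gauge how linear the relationship between two variables is:

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x, so r should be near 1
print(pearson_r(x, y))
```

A value near +1 or −1 indicates a strong linear relationship; a value near 0 indicates little linear association.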
 
== Estimating the regression line ==
 
The parameters of the linear regression model, <math> Y_i = a + bX_i + \varepsilon_i </math>, can be estimated using the method of [[ordinary least squares]]. This method finds the line that minimizes the sum of the squared errors, <math> \sum_{i = 1}^n \varepsilon_{i}^2 </math>.
 
The minimization problem can be solved using calculus, producing the following formulas for the estimates of the regression parameters:
:<math> \hat{b} = \frac {\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y}) } {\sum_{i=1}^{n} (x_{i} - \bar{x})^2} </math>
 
:<math> \hat{a} = \bar{y} - \hat{b} \bar{x} </math>
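A minimal sketch (helper name `ols_fit` is an assumption, not from the article) implementing these closed-form estimates:

```python
def ols_fit(x, y):
    """Return (a_hat, b_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: the fitted line passes through the point of means.
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]   # exactly y = 1 + 2x
a_hat, b_hat = ols_fit(x, y)
print(a_hat, b_hat)        # recovers a = 1, b = 2
```

Since the example data lie exactly on a line, the estimates recover the true intercept and slope.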
 
The ordinary least squares fit has the following properties:
 
1. The line goes through the point <math> (\bar{x},\bar{y}) </math>. This is easily seen by rearranging the expression <math> \hat{a} = \bar{y} - \hat{b} \bar{x} </math> as <math> \bar{y} = \hat{a} + \hat{b} \bar{x} </math>, which shows that the point <math> (\bar{x},\bar{y}) </math> satisfies the fitted regression equation.
 
2. The sum of the residuals is equal to zero, if the model includes a constant. To see why, minimize <math> \sum_{i = 1}^n \varepsilon_i^2 = \sum_{i = 1}^n (y_i - a - b x_i)^2 </math> with respect to ''a'' by setting the following partial derivative to zero:

:<math> \frac{\partial}{\partial a} \sum_{i = 1}^n \varepsilon_i^2 = -2 \sum_{i = 1}^n (y_i - a - b x_i) = 0, </math>

which implies <math> \sum_{i = 1}^n (y_i - a - b x_i) = 0 </math>, i.e. the residuals sum to zero.
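Both properties can be checked numerically. A short sketch (synthetic data, assumed for illustration) fits the line with the closed-form formulas and verifies that it passes through the point of means and that the residuals sum to zero:

```python
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 11.0]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
# Closed-form OLS estimates for slope and intercept.
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar

residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

# Property 1: the fitted line passes through (x_bar, y_bar).
assert abs((a_hat + b_hat * x_bar) - y_bar) < 1e-9
# Property 2: the residuals sum to zero (up to rounding error).
assert abs(sum(residuals)) < 1e-9
```

Note that property 2 holds only because the model includes the intercept <math>a</math>; a regression through the origin need not have residuals summing to zero.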