{{Statistics-stub}}
[[Category:Regression analysis]]
== Estimating the Regression Line ==
The simple linear regression (SLR) line, <math>Y = a + bX</math>, is normally estimated from a collection of sample data consisting of <math>X</math> values within the scope of the experiment and the corresponding observed <math>Y</math> values. One common way of estimating the line is the method of least squares. The goal of this method is to find the line that minimizes the sum of the squared residual errors, where each residual error is the vertical distance of a sample data point from the fitted line. An example of the graphical representation of residual error is shown below:
[[Image:reserror.jpg]]
Let <math>e_i</math> represent each residual error, <math>y_i</math> each observed value of <math>Y</math>, and <math>\hat{y}_i</math> the value of <math>Y</math> on the estimated line corresponding to each <math>y_i</math>. The method of least squares involves minimizing

:<math>\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 .</math>

This is done using partial derivatives with respect to <math>a</math> (the intercept estimate) and <math>b</math> (the slope estimate).
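Setting the partial derivatives of <math>\sum (y_i - a - b x_i)^2</math> with respect to <math>a</math> and <math>b</math> equal to zero yields the two normal equations:

:<math>\sum y_i = na + b \sum x_i</math>
:<math>\sum x_i y_i = a \sum x_i + b \sum x_i^2</math>

Solving these simultaneously for the two unknowns gives the estimates: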
:<math>b = \frac{n \sum x_i y_i - \left( \sum x_i \right) \left( \sum y_i \right)}{n \sum x_i^2 - \left( \sum x_i \right)^2}</math>
:<math>a = \frac{\sum y_i - b \sum x_i}{n}</math>
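These estimates can be computed directly from the sums of the sample data. Below is a minimal sketch in Python; the function name and the sample data are hypothetical, chosen only for illustration:

<pre>
# Minimal sketch of the least squares formulas above.
# The sample data is hypothetical, for illustration only.

def least_squares_fit(xs, ys):
    """Return (a, b) for the line Y = a + bX fitted by least squares."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope and intercept from the closed-form estimates
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

xs = [1, 2, 3, 4, 5]              # hypothetical X values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical observed Y values
a, b = least_squares_fit(xs, ys)
print(f"estimated line: Y = {a:.2f} + {b:.2f}X")  # Y = 0.05 + 1.99X
</pre>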
The line created using the method of least squares above is characterized by two distinct properties (both follow from the first normal equation, as shown below):
#The line always passes through the point <math>(\bar{x}, \bar{y})</math>, where <math>\bar{x}</math> and <math>\bar{y}</math> are the averages of the sample data values <math>x_i</math> and <math>y_i</math>
#The residual errors sum to zero, so the positive residuals exactly cancel the negative residuals
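To see both properties, rearrange the first normal equation, <math>\sum y_i = na + b \sum x_i</math>: dividing through by <math>n</math> gives <math>\bar{y} = a + b\bar{x}</math>, so the point <math>(\bar{x}, \bar{y})</math> lies on the line, and moving every term to one side gives <math>\sum (y_i - a - b x_i) = \sum e_i = 0</math>, so the positive and negative residuals cancel.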