Simple linear regression

The linear relationship between the two variables (the dependent and the independent variable) can be measured using a correlation coefficient such as the [[Pearson product-moment correlation coefficient]].
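
For concreteness, the coefficient can be computed numerically. A minimal sketch with NumPy (the library choice is an illustration, not part of the article), using the sample points from the numerical example below:

<syntaxhighlight lang="python">
import numpy as np

# Sample points from the numerical example: (1, -1), (2, 4), (6, 3)
x = np.array([1.0, 2.0, 6.0])
y = np.array([-1.0, 4.0, 3.0])

# np.corrcoef returns the 2x2 correlation matrix of x and y;
# the off-diagonal entry is the Pearson coefficient.
r = np.corrcoef(x, y)[0, 1]
print(r)  # 0.5 for these points
</syntaxhighlight>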
 
== Estimating the regression line ==

The parameters of the linear regression line, <math>Y = a + bX</math>, can be estimated using the method of [[ordinary least squares]]. This method finds the line that minimizes the sum of the squares of the regression residuals, <math> \sum_{i=1}^N \hat{\varepsilon}_i^2 </math>. The residual is the difference between the observed value and the predicted value: <math> \hat{\varepsilon}_i = y_i - \hat{y}_i </math>.
 
The minimization problem can be solved using calculus, producing the following formulas for the estimates of the regression parameters:
 
: <math> \hat{b} = \frac {\sum_{i=1}^{N} (x_{i} - \bar{x})(y_{i} - \bar{y}) } {\sum_{i=1}^{N} (x_{i} - \bar{x}) ^2} </math>
 
: <math> \hat{a} = \bar{y} - \hat{b} \bar{x} </math>
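
These formulas map directly to code. A minimal sketch with NumPy (an illustrative implementation, not part of the article):

<syntaxhighlight lang="python">
import numpy as np

def ols_fit(x, y):
    """Estimate a and b in Y = a + bX by ordinary least squares."""
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared deviations of x
    b_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through the point of means
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

x = np.array([1.0, 2.0, 6.0])  # data from the numerical example below
y = np.array([-1.0, 4.0, 3.0])
print(ols_fit(x, y))  # a_hat = 0.5, b_hat = 0.5
</syntaxhighlight>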
 
Ordinary least squares produces a fitted line with the following properties:

* The fitted line passes through the point of means, <math>(\bar{x}, \bar{y})</math>.
* The residuals sum to zero: <math> \sum_{i=1}^{N} \hat{\varepsilon}_i = 0 </math>.
* The residuals are uncorrelated with the independent variable: <math> \sum_{i=1}^{N} x_i \hat{\varepsilon}_i = 0 </math>.

There are alternative (and simpler) formulas for calculating <math> \hat{b} </math>:
 
: <math> \hat{b} = \frac {\sum_{i=1}^{N} {(x_{i}y_{i})} - N \bar{x} \bar{y}} {\sum_{i=1}^{N} (x_{i})^2 - N \bar{x}^2} = r \frac {s_y}{s_x} </math>
 
Here, ''r'' is the correlation coefficient between X and Y, s<sub>x</sub> is the sample standard deviation of X, and s<sub>y</sub> is the sample standard deviation of Y.
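
The equivalence of the two forms can be checked numerically; a brief sketch (ddof=1 requests the sample standard deviation in NumPy):

<syntaxhighlight lang="python">
import numpy as np

x = np.array([1.0, 2.0, 6.0])
y = np.array([-1.0, 4.0, 3.0])
N = len(x)

# Computational form: sums of products corrected by the means
b_hat = (np.sum(x * y) - N * x.mean() * y.mean()) / (np.sum(x ** 2) - N * x.mean() ** 2)

# Correlation form: r * s_y / s_x with sample standard deviations
r = np.corrcoef(x, y)[0, 1]
b_hat_alt = r * np.std(y, ddof=1) / np.std(x, ddof=1)

print(b_hat, b_hat_alt)  # both equal 0.5 for these points
</syntaxhighlight>
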
Under the assumption that the error term is normally distributed, the estimate of the slope coefficient has a normal distribution with mean equal to ''b'' and standard error given by:
 
: <math> s_{\hat{b}} = \sqrt{ \frac {\sum_{i=1}^N \hat{\varepsilon}_i^2 /(N-2)} {\sum_{i=1}^N (x_i - \bar{x})^2} } </math>

A confidence interval for ''b'' can be constructed using a ''t''-distribution with ''N'' &minus; 2 degrees of freedom:
 
: <math> [ \hat{b} - s_{\hat{b}} t_{N-2}^*, \ \hat{b} + s_{\hat{b}} t_{N-2}^* ] </math>
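
Both the standard error and the interval are straightforward to compute; a sketch using SciPy for the ''t'' critical value (an implementation choice made here for illustration):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import t

def slope_confidence_interval(x, y, level=0.95):
    """Standard error of the OLS slope and a t-based confidence interval."""
    N = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    b_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    a_hat = y_bar - b_hat * x_bar
    resid = y - (a_hat + b_hat * x)
    # Standard error of the slope estimate (N - 2 degrees of freedom)
    s_b = np.sqrt((np.sum(resid ** 2) / (N - 2)) / np.sum((x - x_bar) ** 2))
    # Two-sided critical value of the t-distribution
    t_star = t.ppf(1 - (1 - level) / 2, df=N - 2)
    return b_hat, s_b, (b_hat - s_b * t_star, b_hat + s_b * t_star)
</syntaxhighlight>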
 
== Numerical example ==
 
Suppose we have the sample of points {(1,-1),(2,4),(6,3)}. The mean of X is 3 and the mean of Y is 2. The slope coefficient estimate is given by:
 
: <math> \hat{b} = \frac {(1 - 3)((-1) - 2) + (2 - 3)(4 - 2) + (6 - 3)(3 - 2)} {(1 - 3)^2 + (2 - 3)^2 + (6 - 3)^2 } = 7/14 = 0.5 </math>
 
The intercept estimate is <math> \hat{a} = \bar{y} - \hat{b}\bar{x} = 2 - 0.5 \times 3 = 0.5 </math>, so the residuals are &minus;2, 2.5, and &minus;0.5, and the sum of squared residuals is 10.5. The standard error of the slope coefficient is therefore <math> \sqrt{ (10.5/1)/14 } = 0.866 </math>. A 95% confidence interval is given by:
 
: [0.5 &minus; 0.866 &times; 12.7062, 0.5 + 0.866 &times; 12.7062] = [&minus;10.504, 11.504].
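
The arithmetic of the example can be verified with a short plain-Python check (the critical value 12.7062 is the one used in the text):

<syntaxhighlight lang="python">
x = [1, 2, 6]
y = [-1, 4, 3]
x_bar, y_bar = 3, 2                        # means given in the text
b_hat, a_hat = 0.5, 0.5                    # slope and intercept estimates
resid = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
ssr = sum(e * e for e in resid)            # sum of squared residuals = 10.5
sxx = sum((xi - x_bar) ** 2 for xi in x)   # = 14
s_b = (ssr / (3 - 2) / sxx) ** 0.5         # = 0.866...
t_star = 12.7062                           # t critical value, 1 degree of freedom
print(b_hat - s_b * t_star, b_hat + s_b * t_star)  # about -10.504 and 11.504
</syntaxhighlight>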
 
[[Category:Regression analysis]]