A '''simple linear regression''' is a [[linear regression]] in which there is only one [[covariate]] (predictor variable).
 
Simple linear regression is used to evaluate the linear relationship between two variables. One example is the relationship between muscle strength and lean body mass. Equivalently, simple linear regression develops an equation by which the dependent variable can be predicted or estimated from the independent variable.
Given a sample <math> (Y_i, X_i), \, i = 1, \ldots, n </math>, the regression model is given by
 
: <math>Y_i = a + bX_i + \varepsilon_i </math>
 
where <math>Y_i</math> is the dependent variable, <math>a</math> is the ''y''-intercept, <math>b</math> is the gradient or slope of the line, <math>X_i</math> is the independent variable, and <math> \varepsilon_i </math> is a random error term associated with each observation.
The strength of the linear relationship between the two variables (i.e. dependent and independent) can be measured using a correlation coefficient, e.g. the [[Pearson product moment correlation coefficient]].
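As an illustrative sketch (with synthetic data, not taken from the article), the Pearson correlation coefficient can be computed directly from its definition to gauge how linear the relationship between two variables is:

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x, so r should be near 1
print(pearson_r(x, y))
```

A value near +1 or −1 indicates a strong linear relationship; a value near 0 indicates little linear association.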
 
== Estimating the regression line ==
 
The parameters of the linear regression model, <math> Y_i = a + bX_i + \varepsilon_i </math>, can be estimated using the method of [[ordinary least squares]]. This method finds the line that minimizes the sum of the squared errors, <math> \sum_{i = 1}^n \varepsilon_{i}^2 </math>.
 
The minimization problem can be solved using calculus, producing the following formulas for the estimates of the regression parameters:
:<math> \hat{b} = \frac {\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y}) } {\sum_{i=1}^{n} (x_{i} - \bar{x})^2} </math>
 
:<math> \hat{a} = \bar{y} - \hat{b} \bar{x} </math>
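A minimal sketch (helper name `ols_fit` is an assumption, not from the article) implementing these closed-form estimates:

```python
def ols_fit(x, y):
    """Return (a_hat, b_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: the fitted line passes through the point of means.
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]   # exactly y = 1 + 2x
a_hat, b_hat = ols_fit(x, y)
print(a_hat, b_hat)        # recovers a = 1, b = 2
```

Since the example data lie exactly on a line, the estimates recover the true intercept and slope.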
 
The ordinary least squares fit has the following properties:
 
1. The line goes through the point <math> (\bar{x},\bar{y}) </math>. This is easily seen by rearranging the expression <math> \hat{a} = \bar{y} - \hat{b} \bar{x} </math> as <math> \bar{y} = \hat{a} + \hat{b} \bar{x} </math>, which shows that the point <math> (\bar{x},\bar{y}) </math> satisfies the fitted regression equation.
 
2. The sum of the residuals is equal to zero, if the model includes a constant. To see why, minimize <math> \sum_{i = 1}^n \varepsilon_i^2 = \sum_{i = 1}^n (y_i - a - b x_i)^2 </math> with respect to ''a'' by setting the following partial derivative to zero:

:<math> \frac{\partial}{\partial a} \sum_{i = 1}^n \varepsilon_i^2 = -2 \sum_{i = 1}^n (y_i - a - b x_i) = 0, </math>

which implies <math> \sum_{i = 1}^n (y_i - a - b x_i) = 0 </math>, i.e. the residuals sum to zero.
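Both properties can be checked numerically. A short sketch (synthetic data, assumed for illustration) fits the line with the closed-form formulas and verifies that it passes through the point of means and that the residuals sum to zero:

```python
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 11.0]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
# Closed-form OLS estimates for slope and intercept.
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar

residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

# Property 1: the fitted line passes through (x_bar, y_bar).
assert abs((a_hat + b_hat * x_bar) - y_bar) < 1e-9
# Property 2: the residuals sum to zero (up to rounding error).
assert abs(sum(residuals)) < 1e-9
```

Note that property 2 holds only because the model includes the intercept <math>a</math>; a regression through the origin need not have residuals summing to zero.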