A simple linear regression is a linear regression in which there is only one covariate (predictor variable).
Simple linear regression is used to evaluate the linear relationship between two variables; one example is the relationship between muscle strength and lean body mass. Put another way, simple linear regression develops an equation with which we can predict or estimate a dependent variable given an independent variable.
The regression equation is given by
y = a + bx
where y is the dependent variable, a is the y-intercept, b is the gradient (slope) of the line, and x is the independent variable.
The strength of the linear relationship between the two variables (i.e., dependent and independent) can be measured using a correlation coefficient, e.g. the Pearson product-moment correlation coefficient.
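As a concrete illustration, the Pearson product-moment correlation coefficient can be computed directly from its definition. This is a minimal sketch using made-up sample data (the x and y values below are hypothetical, not from the text):

```python
# Sketch: Pearson product-moment correlation for two small samples.
# The data values here are made up for illustration only.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable
y = [2.1, 3.9, 6.2, 8.0, 9.8]   # dependent variable

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# r = sum((xi - x_bar)(yi - y_bar)) / sqrt(sum((xi - x_bar)^2) * sum((yi - y_bar)^2))
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) *
                sum((yi - y_bar) ** 2 for yi in y))
r = num / den
print(round(r, 3))
```

For this nearly linear data, r comes out close to 1, indicating a strong positive linear relationship.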
Estimating the Regression Line
The SLR (simple linear regression) line, ŷ = a + bx, is normally determined as an estimate from a collection of sample data values consisting of x values in the scope of the experiment and the corresponding observed y values. One common way of estimating the line is the method of least squares. The goal of this method is to choose the line that minimizes the sum of the squared residual errors. Each residual error is the vertical distance of a sample data point from the resulting best-fit line.
Let us use ei to represent each residual error, yi to represent each observed value of y, and ŷi to represent the value of y on the estimated line corresponding to xi. The method of least squares involves minimizing Σei² = Σ(yi − ŷi)². Setting the partial derivatives with respect to a and b to zero yields the following formulas for a (the y-intercept estimate) and b (the slope estimate):
b = ( nΣxiyi − (Σxi)(Σyi) ) / ( nΣxi² − (Σxi)² )
a = ( Σyi − bΣxi ) / n
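These two formulas translate directly into code. This is a minimal sketch of the least-squares estimates using made-up sample data (the x and y values are hypothetical):

```python
# Sketch: least-squares slope and intercept from the closed-form formulas.
# The data values here are made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# b = ( n*sum(xi*yi) - sum(xi)*sum(yi) ) / ( n*sum(xi^2) - (sum(xi))^2 )
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# a = ( sum(yi) - b*sum(xi) ) / n
a = (sum_y - b * sum_x) / n
print(a, b)   # for this data, slope b is about 1.97 and intercept a about 0.09
```

The resulting line ŷ = a + bx can then be used to predict y for a new x within the range of the observed data.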
The line created using the Method of Least Squares above is characterized by two distinct features:
- It always passes through the point (x̄, ȳ), where x̄ and ȳ are the means of the sample xi and yi values
- The residuals sum to zero, so the positive residuals exactly cancel the negative residuals
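Both properties can be checked numerically. This sketch fits a line to made-up data (the values are hypothetical) and verifies that the line passes through the sample means and that the residuals cancel:

```python
# Sketch: verifying the two properties of the least-squares line.
# The data values here are made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 2.9, 4.1, 6.0, 7.3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates (same formulas as above).
b = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / \
    (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)
a = (sum(y) - b * sum(x)) / n

# Property 1: the fitted line passes through (x_bar, y_bar).
assert abs((a + b * x_bar) - y_bar) < 1e-9

# Property 2: the residuals sum to (numerically) zero.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
assert abs(sum(residuals)) < 1e-9
print("both properties hold")
```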