A simple linear regression is a linear regression in which there is only one covariate (predictor variable).
Simple linear regression is used to evaluate the linear relationship between two variables. One example could be the relationship between muscle strength and lean body mass. The least squares estimates of the intercept $a$ and the slope $b$ are obtained by minimizing the sum of squared errors $\sum_{i=1}^{N}\varepsilon_i^2 = \sum_{i=1}^{N}(y_i - a - b x_i)^2$. This minimization problem can be solved using calculus, producing the following formulas for the estimates of the regression parameters:
$$\hat{b} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
$$\hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

The least squares residuals satisfy several useful properties. First, the sum of the residuals is equal to zero. To see this, take the partial derivative of the sum of squared errors with respect to $a$:

$$\frac{\partial}{\partial a}\sum_{i=1}^{n}\varepsilon_i^2 = -2\sum_{i=1}^{n}(y_i - a - b x_i)$$
Setting this partial derivative to zero and noting that
$$\hat{\varepsilon}_i = y_i - \hat{a} - \hat{b}x_i$$
yields
$$\sum_{i=1}^{n}\hat{\varepsilon}_i = 0$$
as desired.
In addition, the linear combination of the residuals in which the coefficients are the $x$-values is equal to zero, and the least squares estimates are unbiased.
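As a quick numerical illustration, the following Python sketch (using NumPy and a small made-up data set) fits the least squares line with the formulas above and checks that the residuals sum to zero and that the $x$-weighted sum of the residuals is also zero.

```python
import numpy as np

# Small made-up data set, used only to illustrate the formulas.
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.1, 2.9, 4.2, 5.1, 6.8])

# Least squares estimates from the formulas above.
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()

# Residuals of the fitted line.
resid = y - a_hat - b_hat * x

# Properties of the residuals (both hold up to floating-point rounding).
print(np.isclose(resid.sum(), 0.0))        # sum of residuals is zero
print(np.isclose((x * resid).sum(), 0.0))  # x-weighted sum of residuals is zero
```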
There are alternative (and simpler) formulas for calculating $\hat{b}$:
$$\hat{b} = \frac{\sum_{i=1}^{N}x_i y_i - N\bar{x}\bar{y}}{\sum_{i=1}^{N}x_i^2 - N\bar{x}^2} = r\,\frac{s_y}{s_x} = \frac{\operatorname{Cov}(x,y)}{\operatorname{Var}(x)}$$
Here, $r$ is the correlation coefficient of $X$ and $Y$, $s_x$ is the sample standard deviation of $X$, and $s_y$ is the sample standard deviation of $Y$.
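The equivalence of these three expressions is easy to verify numerically. The sketch below (illustrative data again) computes the slope all three ways with NumPy.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.1, 2.9, 4.2, 5.1, 6.8])
n = len(x)

# Slope via the raw-sums formula.
b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)

# Slope via the correlation coefficient and the sample standard deviations.
r = np.corrcoef(x, y)[0, 1]
b2 = r * y.std(ddof=1) / x.std(ddof=1)

# Slope via the sample covariance over the sample variance of x.
b3 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(np.allclose([b1, b2], b3))  # all three expressions agree
```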
Inference
Under the assumption that the error term is normally distributed, the estimate of the slope coefficient has a normal distribution with mean equal to b and standard error given by:
$$s_{\hat{b}} = \sqrt{\frac{\sum_{i=1}^{N}\hat{\varepsilon}_i^{\,2}/(N-2)}{\sum_{i=1}^{N}(x_i - \bar{x})^2}}.$$
A confidence interval for b can be created using a t-distribution with N-2 degrees of freedom:
$$\left[\hat{b} - s_{\hat{b}}\,t^{*}_{N-2},\;\hat{b} + s_{\hat{b}}\,t^{*}_{N-2}\right]$$
Numerical example
Suppose we have the sample of points {(1,-1),(2,4),(6,3)}. The mean of X is 3 and the mean of Y is 2. The slope coefficient estimate is given by:
$$\hat{b} = \frac{(1-3)((-1)-2) + (2-3)(4-2) + (6-3)(3-2)}{(1-3)^2 + (2-3)^2 + (6-3)^2} = \frac{7}{14} = 0.5.$$
The standard error of the coefficient is 0.866. A 95% confidence interval is given by
[0.5 − 0.866 × 12.7062, 0.5 + 0.866 × 12.7062] = [−10.504, 11.504].
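For completeness, this example can be checked in a few lines of Python; the sketch below (using NumPy and SciPy) reproduces the slope, the standard error, and the confidence interval quoted above, and cross-checks the standard error against scipy.stats.linregress.

```python
import numpy as np
from scipy import stats

# The sample from the example above.
x = np.array([1.0, 2.0, 6.0])
y = np.array([-1.0, 4.0, 3.0])
n = len(x)

# Slope and intercept estimates.
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()
resid = y - a_hat - b_hat * x

# Standard error of the slope and 95% confidence interval (t with N - 2 = 1 df).
s_b = np.sqrt((np.sum(resid ** 2) / (n - 2)) / np.sum((x - x.mean()) ** 2))
t_star = stats.t.ppf(0.975, df=n - 2)

print(b_hat)                                       # 0.5
print(round(s_b, 3))                               # 0.866
print(b_hat - s_b * t_star, b_hat + s_b * t_star)  # approximately -10.504 and 11.504
print(np.isclose(s_b, stats.linregress(x, y).stderr))  # matches SciPy's standard error
```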
Mathematical derivation of the least squares estimates
Assume that
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

is a stochastic simple regression model, and let $(y_i, x_i),\; i = 1, \ldots, n$, be a sample of size $n$. Here the sample is treated as a set of observed, nonrandom values, but the calculations do not change if the sample is instead represented by random variables $(Y_1, X_1), \ldots, (Y_n, X_n)$.
Let Q be the sum of squared errors:
$$Q(\alpha, \beta) := \sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2$$
Then taking partial derivatives with respect to $\alpha$ and $\beta$:
$$\begin{aligned}
\frac{\partial Q}{\partial \alpha}(\alpha, \beta) &= -2\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)\\
\frac{\partial Q}{\partial \beta}(\alpha, \beta) &= 2\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)(-x_i)
\end{aligned}$$
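These derivatives can also be verified symbolically. The sketch below uses SymPy with three symbolic data points (an illustrative choice, not part of the derivation) and confirms that differentiating $Q$ gives exactly the expressions above.

```python
import sympy as sp

alpha, beta = sp.symbols('alpha beta')
xs = sp.symbols('x1:4')  # three symbolic x-values
ys = sp.symbols('y1:4')  # three symbolic y-values

# Q(alpha, beta): the sum of squared errors.
Q = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))

# Partial derivatives computed by SymPy.
dQ_dalpha = sp.diff(Q, alpha)
dQ_dbeta = sp.diff(Q, beta)

# The expressions stated in the text.
expected_alpha = -2 * sum(y - alpha - beta * x for x, y in zip(xs, ys))
expected_beta = sum(2 * (y - alpha - beta * x) * (-x) for x, y in zip(xs, ys))

print(sp.simplify(dQ_dalpha - expected_alpha) == 0)  # True
print(sp.simplify(dQ_dbeta - expected_beta) == 0)    # True
```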
Setting $\frac{\partial Q}{\partial \alpha}(\alpha, \beta)$ and $\frac{\partial Q}{\partial \beta}(\alpha, \beta)$ to zero yields
$$\begin{aligned}
n\hat{\alpha} + \hat{\beta}\sum_{i=1}^{n}x_i &= \sum_{i=1}^{n}y_i\\
\hat{\alpha}\sum_{i=1}^{n}x_i + \hat{\beta}\sum_{i=1}^{n}x_i^2 &= \sum_{i=1}^{n}x_i y_i
\end{aligned}$$
which are known as the normal equations and can be written in matrix notation as
$$\begin{pmatrix} n & \sum_{i=1}^{n}x_i\\ \sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \end{pmatrix}
\begin{pmatrix} \hat{\alpha}\\ \hat{\beta} \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{n}y_i\\ \sum_{i=1}^{n}x_i y_i \end{pmatrix}$$
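Since the normal equations form a 2×2 linear system, they can be solved directly with a linear algebra routine. The sketch below builds the system for a small made-up data set and checks the solution against np.polyfit.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.1, 2.9, 4.2, 5.1, 6.8])
n = len(x)

# Coefficient matrix and right-hand side of the normal equations.
A = np.array([[n, x.sum()],
              [x.sum(), np.sum(x ** 2)]])
rhs = np.array([y.sum(), np.sum(x * y)])

alpha_hat, beta_hat = np.linalg.solve(A, rhs)

# np.polyfit(x, y, 1) returns (slope, intercept) for the same least squares fit.
slope, intercept = np.polyfit(x, y, 1)
print(np.allclose([alpha_hat, beta_hat], [intercept, slope]))  # True
```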
Using Cramer's rule we get
$$\begin{aligned}
\hat{\alpha} &= \frac{\sum_{i=1}^{n}y_i \sum_{i=1}^{n}x_i^2 - \sum_{i=1}^{n}x_i y_i \sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}\\
\hat{\beta} &= \frac{n\sum_{i=1}^{n}x_i y_i - \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}
\end{aligned}$$
Dividing the numerator and the denominator of the last expression by $n$:
$$\hat{\beta} = \frac{\sum_{i=1}^{n}x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}$$
Isolating $\hat{\alpha}$ from the first normal equation yields
$$\begin{aligned}
n\hat{\alpha} &= \sum_{i=1}^{n}y_i - \hat{\beta}\sum_{i=1}^{n}x_i\\
\hat{\alpha} &= \frac{1}{n}\sum_{i=1}^{n}y_i - \hat{\beta}\,\frac{1}{n}\sum_{i=1}^{n}x_i = \bar{y} - \hat{\beta}\bar{x}
\end{aligned}$$
which is a common formula for $\hat{\alpha}$ in terms of $\hat{\beta}$ and the sample means.
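These closed-form expressions are cheap to evaluate; the sketch below computes $\hat{\beta}$ from the simplified formula and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ for a made-up data set, and confirms that together they satisfy both normal equations.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.1, 2.9, 4.2, 5.1, 6.8])
n = len(x)

# Closed-form estimates from the derivation above.
beta_hat = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# Both normal equations are satisfied (up to rounding error).
print(np.isclose(n * alpha_hat + beta_hat * x.sum(), y.sum()))
print(np.isclose(alpha_hat * x.sum() + beta_hat * np.sum(x ** 2), np.sum(x * y)))
```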
$\hat{\beta}$ may also be written as
$$\hat{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
using the following equalities:
$$\begin{aligned}
\sum_{i=1}^{n}(x_i - \bar{x})^2 &= \sum_{i=1}^{n}(x_i^2 - 2x_i\bar{x} + \bar{x}^2)\\
&= \sum_{i=1}^{n}x_i^2 - 2\bar{x}\underbrace{\sum_{i=1}^{n}x_i}_{n\bar{x}} + n\bar{x}^2\\
&= \sum_{i=1}^{n}x_i^2 - n\bar{x}^2\\[1ex]
\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) &= \sum_{i=1}^{n}(x_i y_i - \bar{y}x_i - \bar{x}y_i + \bar{x}\bar{y})\\
&= \sum_{i=1}^{n}x_i y_i - \bar{y}\underbrace{\sum_{i=1}^{n}x_i}_{n\bar{x}} - \bar{x}\underbrace{\sum_{i=1}^{n}y_i}_{n\bar{y}} + n\bar{x}\bar{y}\\
&= \sum_{i=1}^{n}x_i y_i - n\bar{y}\bar{x} - n\bar{x}\bar{y} + n\bar{x}\bar{y}\\
&= \sum_{i=1}^{n}x_i y_i - n\bar{x}\bar{y}
\end{aligned}$$
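Both equalities are easy to confirm numerically, for example with the following sketch (again using illustrative data).

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.1, 2.9, 4.2, 5.1, 6.8])
n = len(x)

# First equality: centered sum of squares equals raw sum of squares minus n*xbar^2.
lhs1 = np.sum((x - x.mean()) ** 2)
rhs1 = np.sum(x ** 2) - n * x.mean() ** 2

# Second equality: centered cross-products equal raw cross-products minus n*xbar*ybar.
lhs2 = np.sum((x - x.mean()) * (y - y.mean()))
rhs2 = np.sum(x * y) - n * x.mean() * y.mean()

print(np.isclose(lhs1, rhs1), np.isclose(lhs2, rhs2))  # True True
```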
The following calculation shows that $(\hat{\alpha}, \hat{\beta})$ is a minimum.
$$\begin{aligned}
\frac{\partial Q}{\partial \alpha}(\alpha, \beta) &= -2\sum_{i=1}^{n}y_i + 2n\alpha + 2\beta\sum_{i=1}^{n}x_i\\
\frac{\partial^2 Q}{\partial \alpha^2}(\alpha, \beta) &= 2n\\
\frac{\partial Q}{\partial \beta}(\alpha, \beta) &= -2\sum_{i=1}^{n}x_i y_i + 2\alpha\sum_{i=1}^{n}x_i + 2\beta\sum_{i=1}^{n}x_i^2\\
\frac{\partial^2 Q}{\partial \beta^2}(\alpha, \beta) &= 2\sum_{i=1}^{n}x_i^2\\
\frac{\partial^2 Q}{\partial \alpha\,\partial \beta}(\alpha, \beta) &= \frac{\partial^2 Q}{\partial \beta\,\partial \alpha}(\alpha, \beta) = 2\sum_{i=1}^{n}x_i
\end{aligned}$$
Hence the Hessian matrix of Q is given by
$$D^2 Q(\alpha, \beta) = \begin{pmatrix} 2n & 2\sum_{i=1}^{n}x_i\\ 2\sum_{i=1}^{n}x_i & 2\sum_{i=1}^{n}x_i^2 \end{pmatrix}$$

$$\begin{aligned}
|D^2 Q(\alpha, \beta)| &= 4n\sum_{i=1}^{n}x_i^2 - 4\left(\sum_{i=1}^{n}x_i\right)^2\\
&= 4n\sum_{i=1}^{n}x_i^2 - 4n^2\bar{x}^2\\
&= 4n\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right)\\
&= 4n\sum_{i=1}^{n}(x_i - \bar{x})^2 > 0
\end{aligned}$$
Since $|D^2 Q(\alpha, \beta)| > 0$ and $2n > 0$, the matrix $D^2 Q(\alpha, \beta)$ is positive definite for all $(\alpha, \beta)$ (provided the $x_i$ are not all equal), and $(\hat{\alpha}, \hat{\beta})$ is therefore a minimum.
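The positive definiteness claim can also be checked numerically: for any data set in which the $x$-values are not all equal, the leading entry $2n$ and the determinant of the Hessian are both positive. A small sketch with illustrative data:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
n = len(x)

# Hessian of Q; note it does not depend on alpha or beta.
H = np.array([[2 * n, 2 * x.sum()],
              [2 * x.sum(), 2 * np.sum(x ** 2)]])

print(np.linalg.det(H) > 0)               # positive determinant
print(np.all(np.linalg.eigvalsh(H) > 0))  # both eigenvalues positive, so H is positive definite
print(np.isclose(np.linalg.det(H), 4 * n * np.sum((x - x.mean()) ** 2)))  # matches the formula above
```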