Simple linear regression
In this case, the slope of the fitted line is equal to the [[Pearson correlation coefficient|correlation]] between {{mvar|y}} and {{mvar|x}} corrected by the ratio of standard deviations of these variables. The intercept of the fitted line is such that the line passes through the center of mass {{math|({{overline|''x''}}, {{overline|''y''}})}} of the data points.
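This relationship can be checked numerically. The following sketch (not part of the article; it assumes NumPy and synthetic data) computes the least-squares slope directly and via the correlation coefficient scaled by the ratio of standard deviations, and confirms the fitted line passes through the point of means:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + 1.0 + rng.normal(size=50)

# Least-squares slope and intercept
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()

# Slope expressed through the correlation, corrected by the ratio of
# standard deviations (same ddof used for both, so the convention cancels)
r = np.corrcoef(x, y)[0, 1]
beta_from_r = r * y.std() / x.std()

assert np.isclose(beta, beta_from_r)
# The fitted line passes through the center of mass of the data
assert np.isclose(alpha + beta * x.mean(), y.mean())
```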
 
==Fitting the regression line==
Consider the [[mathematical model|model]] function
: <math> y = \alpha + \beta x,</math>
The [[coefficient of determination]] ("R squared") is equal to <math>r_{xy}^2</math> when the model is linear with a single independent variable. See [[Correlation#Pearson's product-moment coefficient|sample correlation coefficient]] for additional details.
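The identity between the coefficient of determination and the squared sample correlation can be verified numerically. A minimal sketch (not from the article; NumPy and the simulated data are assumptions) fits a simple linear regression and compares <math>R^2 = 1 - SS_{\rm res}/SS_{\rm tot}</math> with <math>r_{xy}^2</math>:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=40)
y = 3.0 - 0.5 * x + rng.normal(scale=0.8, size=40)

# Ordinary least-squares fit with a single independent variable
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
y_hat = alpha + beta * x

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
r_squared = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r_xy = np.corrcoef(x, y)[0, 1]

assert np.isclose(r_squared, r_xy ** 2)
```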
 
== Interpretation ==
=== Interpretation about the slope ===
Multiplying each term of the summation in the numerator by <math>\frac{(x_i - \bar{x})}{(x_i - \bar{x})} = 1</math> (thereby not changing it):

: <math>\begin{align}
\widehat\beta &= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \\
&= \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \frac{(y_i - \bar{y})}{(x_i - \bar{x})}}{\sum_{i=1}^n (x_i - \bar{x})^2}
\end{align}</math>
We can see that the slope (tangent of angle) of the regression line is the weighted average of <math>\frac{(y_i - \bar{y})}{(x_i - \bar{x})}</math>, the slope (tangent of angle) of the line connecting the ''i''-th point to the average of all points, weighted by <math>(x_i - \bar{x})^2</math>. The further a point lies from <math>\bar{x}</math>, the more "important" it is, since small errors in its position affect the slope of the line connecting it to the center point less.
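This weighted-average view of the slope can be checked directly. The sketch below (illustrative only; NumPy and the simulated data are assumptions) compares the least-squares slope with the weighted average of the point-to-center slopes:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 1.5 * x + rng.normal(size=30)

# Ordinary least-squares slope
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Slope of the line joining each point to the point of averages,
# weighted by the squared horizontal distance from the mean
pointwise = (y - y.mean()) / (x - x.mean())
weights = (x - x.mean()) ** 2
beta_weighted = np.sum(weights * pointwise) / np.sum(weights)

assert np.isclose(beta, beta_weighted)
```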
 
=== Interpretation about the intercept ===
 
: <math>\widehat\alpha = \bar{y} - \widehat\beta\,\bar{x}</math>
Given <math>\widehat\beta = \tan(\theta) = dy/dx</math>, where <math>\theta</math> is the angle the line makes with the positive ''x'' axis, and taking <math>dx = \bar{x}</math>,
we have <math>y_{\rm intersection} = \bar{y} - dx\times\widehat\beta = \bar{y} - dy</math>
 
=== Interpretation about the correlation ===
 
In the above formulation, notice that each <math>x_i</math> is a constant ("known upfront") value, while the <math>y_i</math> are random variables depending on <math>x_i</math> through the linear function and on the random term <math>\varepsilon_i</math>. This assumption is used when deriving the standard error of the slope and showing that it is [[Proofs_involving_ordinary_least_squares|unbiased]].
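Unbiasedness under this fixed-design assumption can be illustrated by simulation. A minimal sketch (not part of the article; the true coefficients, noise level, and use of NumPy are assumptions) holds the <math>x_i</math> fixed across replications, redraws only the <math>\varepsilon_i</math>, and checks that the slope estimates average out to the true slope:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 25)        # fixed design: the x_i are "known upfront"
alpha_true, beta_true = 1.0, 2.0

estimates = []
for _ in range(5000):
    # Only the random error term varies between replications
    y = alpha_true + beta_true * x + rng.normal(scale=0.5, size=x.size)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b)

# The average of the slope estimates is close to the true slope
mean_b = np.mean(estimates)
assert abs(mean_b - beta_true) < 0.05
```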