Deming regression: Difference between revisions

Owen Reich (talk | contribs)
Link suggestions feature: 3 links added.
 
(20 intermediate revisions by 15 users not shown)
Line 1:
{{Short description|Algorithm for the line of best fit for a two-dimensional dataset}}
[[Image:Total least squares.svg|thumb|Deming regression. The red lines show the error in both ''x'' and ''y''. This is different from the traditional least squares method, which measures error parallel to the ''y'' axis. The case shown, with deviations measured perpendicularly, arises when errors in ''x'' and ''y'' have equal variances.]]
 
In [[statistics]], '''Deming regression''', named after [[W. Edwards Deming]], is an [[errors-in-variables model]] that tries to find the [[line of best fit]] for a two-dimensional [[data set]]. It differs from [[simple linear regression]] in that it accounts for [[errors and residuals in statistics|errors]] in observations on both the ''x''- and the ''y''-axis. It is a special case of [[total least squares]], which allows for any number of predictors and a more complicated error structure.
 
Deming regression is equivalent to the [[maximum likelihood]] estimation of an [[errors-in-variables model]] in which the errors for the two variables are assumed to be independent and [[normal distribution|normally distributed]], and the ratio of their variances, denoted ''δ'', is known.{{sfn|Linnet|1993}} In practice, this ratio might be estimated from related data sources; however, the regression procedure takes no account of possible errors in estimating this ratio.
Line 25:
: <math>y^* = \beta_0 + \beta_1 x^*,</math>
such that the weighted sum of squared residuals of the model is minimized:{{sfn|Fuller|1987|loc=Ch. 1.3.3}}
: <math>SSR = \sum_{i=1}^n\bigg(\frac{\varepsilon_i^2}{\sigma_\varepsilon^2} + \frac{\eta_i^2}{\sigma_\eta^2}\bigg) = \frac{1}{\sigma_\varepsilon^2} \sum_{i=1}^n\Big((y_i-\beta_0-\beta_1x^*_i)^2 + \delta(x_i-x^*_i)^2\Big) \ \to\ \min_{\beta_0,\beta_1,x_1^*,\ldots,x_n^*} SSR</math>
 
See {{harvtxt|Jensen|2007}} for a full derivation.
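
For a fixed candidate line, the inner minimization over the latent values can be carried out point by point: setting the derivative of each summand with respect to <math>x^*_i</math> to zero gives <math>x^*_i = \tfrac{\delta x_i + \beta_1 (y_i - \beta_0)}{\delta + \beta_1^2}</math>. The following Python sketch evaluates the bracketed sum of the objective (omitting the constant factor <math>1/\sigma_\varepsilon^2</math>) and applies that projection. The function names <code>weighted_ssr</code> and <code>best_x_star</code> and the sample data are illustrative assumptions, not taken from the cited sources.

```python
def weighted_ssr(x, y, x_star, beta0, beta1, delta):
    """Sum over i of (y_i - beta0 - beta1*x*_i)^2 + delta*(x_i - x*_i)^2.

    This is the objective without the constant factor 1/sigma_epsilon^2.
    """
    return sum(
        (yi - beta0 - beta1 * xs) ** 2 + delta * (xi - xs) ** 2
        for xi, yi, xs in zip(x, y, x_star)
    )


def best_x_star(x, y, beta0, beta1, delta):
    """Latent x* values that minimize the objective for a fixed line."""
    return [
        (delta * xi + beta1 * (yi - beta0)) / (delta + beta1 ** 2)
        for xi, yi in zip(x, y)
    ]


# Illustrative data and an arbitrary candidate line y = x with delta = 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
x_star = best_x_star(x, y, beta0=0.0, beta1=1.0, delta=2.0)
print(weighted_ssr(x, y, x_star, beta0=0.0, beta1=1.0, delta=2.0))
```

Minimizing the resulting value over <math>\beta_0</math> and <math>\beta_1</math> as well would give the Deming fit; the sketch only covers the inner step.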
== Solution ==
The solution can be expressed in terms of the second-degree sample moments. That is, we first calculate the following quantities (all sums go from ''i''&nbsp;=&nbsp;1 to ''n''):
: <math>\begin{align}
& \overline{x} &= \tfrac{1}{n}\sum x_i, \quad& \overline{y} &= \tfrac{1}{n}\sum y_i, \\
& s_{xx} &= \tfrac{1}{n}\sum (x_i-\overline{x})^2 &&= \overline{x^2} - \overline{x}^2, \\
& s_{xy} &= \tfrac{1}{n}\sum (x_i-\overline{x})(y_i-\overline{y}) &&= \overline{x y} - \overline{x} \, \overline{y}, \\
& s_{yy} &= \tfrac{1}{n}\sum (y_i-\overline{y})^2 &&= \overline{y^2} - \overline{y}^2.
\end{align}\,</math>
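
As a minimal sketch, these moments can be computed directly from their definitions. The helper name <code>sample_moments</code> and the data are illustrative assumptions; note the <math>1/n</math> normalization used here rather than <math>1/(n-1)</math>.

```python
def sample_moments(x, y):
    """Return (x_bar, y_bar, s_xx, s_xy, s_yy) using 1/n normalization."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / n
    return x_bar, y_bar, s_xx, s_xy, s_yy


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
x_bar, y_bar, s_xx, s_xy, s_yy = sample_moments(x, y)

# Cross-check s_xx against the "mean of squares minus squared mean" form.
mean_x2 = sum(xi * xi for xi in x) / len(x)
print(s_xx, mean_x2 - x_bar ** 2)
```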
 
Finally, the least-squares estimates of the model's parameters are{{sfn|Glaister|2001}}
Line 45:
 
==Orthogonal regression==
For the case of equal error variances, i.e., when <math>\delta=1</math>, Deming regression becomes '''orthogonal regression''': it minimizes the sum of squared [[distance from a point to a line|perpendicular distances from the data points to the regression line]]. In this case, denote each observation as a point <math>z_j = x_j + i y_j</math> in the [[complex plane]] (i.e., the point <math>(x_j, y_j)</math> is written as <math>z_j</math>, where <math>i</math> is the [[imaginary unit]]). Denote as <math>S=\sum{(z_j - \overline z)^2}</math> the sum of the squared differences of the data points from the [[centroid]] <math>\overline z = \tfrac{1}{n} \sum z_j</math> (also denoted in complex coordinates), which is the point whose horizontal and vertical locations are the averages of those of the data points. Then:{{sfn|Minda|Phelps|2008|loc=Theorem 2.3}}

*If <math>S = 0</math>, then every line through the centroid is a line of best orthogonal fit.
*If <math>S \neq 0</math>, the orthogonal regression line goes through the centroid and is parallel to the vector from the origin to <math>\sqrt{S}</math>.
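
The complex-number characterization lends itself to a short numerical check. The Python sketch below computes the centroid and the principal square root of the sum of squared complex deviations, then compares the implied slope with the commonly cited closed-form Deming slope for <math>\delta = 1</math>. That closed form is not shown in this excerpt and is used here as an assumption; the function names and data are illustrative.

```python
import cmath


def orthogonal_direction(points):
    """Return (centroid, direction) as complex numbers.

    direction is the principal square root of sum((z_j - z_bar)^2); it is 0
    when that sum vanishes, in which case every line through the centroid fits.
    """
    z = [complex(px, py) for px, py in points]
    z_bar = sum(z) / len(z)
    s = sum((zj - z_bar) ** 2 for zj in z)
    return z_bar, cmath.sqrt(s)


def deming_slope(points, delta=1.0):
    """Commonly cited closed-form Deming slope (assumed; requires s_xy != 0)."""
    n = len(points)
    x_bar = sum(p[0] for p in points) / n
    y_bar = sum(p[1] for p in points) / n
    s_xx = sum((p[0] - x_bar) ** 2 for p in points) / n
    s_yy = sum((p[1] - y_bar) ** 2 for p in points) / n
    s_xy = sum((p[0] - x_bar) * (p[1] - y_bar) for p in points) / n
    d = s_yy - delta * s_xx
    return (d + (d * d + 4 * delta * s_xy ** 2) ** 0.5) / (2 * s_xy)


pts = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
centroid, direction = orthogonal_direction(pts)
slope_complex = direction.imag / direction.real  # assumes a non-vertical line
print(slope_complex, deming_slope(pts, delta=1.0))
```

The two printed slopes should agree up to rounding. Returning a direction vector rather than a slope keeps the vertical-line case representable.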
 
A [[trigonometry|trigonometric]] representation of the orthogonal regression line was given by Coolidge in 1913.{{sfn|Coolidge|1913}} The [[Distance_from_a_point_to_a_line#Another_formula|distance]] can also be calculated using the more typical equation of a line, given as <math>y=mx+k</math>.
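
For a line written as <math>y=mx+k</math>, the perpendicular distance from a point <math>(x_0, y_0)</math> is <math>|m x_0 - y_0 + k| / \sqrt{m^2+1}</math>. A minimal Python sketch of this distance, and of the sum of squared distances that orthogonal regression minimizes, follows; the function names are illustrative.

```python
import math


def perpendicular_distance(x0, y0, m, k):
    """Distance from the point (x0, y0) to the line y = m*x + k."""
    return abs(m * x0 - y0 + k) / math.sqrt(m * m + 1.0)


def sum_squared_distances(points, m, k):
    """Quantity that orthogonal regression minimizes over m and k."""
    return sum(perpendicular_distance(px, py, m, k) ** 2 for px, py in points)


# Point (3, 4) against the horizontal line y = 1.
print(perpendicular_distance(3.0, 4.0, m=0.0, k=1.0))  # prints 3.0
```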
 
===Application===
 
In the case of three [[Line (geometry)|non-collinear]] points in the plane, the [[triangle]] with these points as its [[vertex (geometry)|vertices]] has a unique [[Steiner inellipse]] that is tangent to the triangle's sides at their midpoints. The [[Ellipse#Elements of an ellipse|major axis of this ellipse]] falls on the orthogonal regression line for the three vertices.{{sfn|Minda|Phelps|2008|loc=Corollary 2.4}} A biological cell's intrinsic [[cellular noise]] can be quantified by applying Deming regression to the observed behavior of a two-reporter [[synthetic biological circuit]].{{sfn|Quarton|2020}}
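
The Steiner inellipse statement can be checked numerically. The sketch below relies on Marden's theorem, which is not mentioned above and is brought in here as an outside assumption: the foci of the Steiner inellipse are the roots of the derivative of the cubic polynomial whose roots are the three vertices, taken as complex numbers. The triangle and the function names are illustrative.

```python
import cmath


def inellipse_foci(a, b, c):
    """Foci of the Steiner inellipse via Marden's theorem.

    They are the roots of p'(z) = 3z^2 - 2(a+b+c)z + (ab+bc+ca),
    where p(z) = (z-a)(z-b)(z-c).
    """
    e1 = a + b + c
    e2 = a * b + b * c + c * a
    root = cmath.sqrt(e1 * e1 - 3 * e2)
    return (e1 + root) / 3, (e1 - root) / 3


def orthogonal_direction(a, b, c):
    """Principal square root of the sum of squared deviations from the centroid."""
    z_bar = (a + b + c) / 3
    return cmath.sqrt(sum((z - z_bar) ** 2 for z in (a, b, c)))


a, b, c = complex(0, 0), complex(4, 0), complex(1, 3)
f1, f2 = inellipse_foci(a, b, c)
axis = f1 - f2  # direction of the major axis (the foci lie on it)
direction = orthogonal_direction(a, b, c)

# Two complex numbers are parallel when the cross product of their
# (real, imag) vectors vanishes.
cross = axis.real * direction.imag - axis.imag * direction.real
print(abs(cross) < 1e-9)
```

The midpoint of the two foci is the centroid, so the major axis and the orthogonal regression line share both a point and a direction.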
 
When humans are asked to draw a linear regression on a scatterplot by guessing, their answers are closer to orthogonal regression than to [[ordinary least squares]] regression.<ref>{{cite journal |last1=Ciccione |first1=Lorenzo |last2=Dehaene |first2=Stanislas |title=Can humans perform mental regression on a graph? Accuracy and bias in the perception of scatterplots |journal=Cognitive Psychology |date=August 2021 |volume=128 |pages=101406 |doi=10.1016/j.cogpsych.2021.101406|doi-access=free }}</ref>
 
== York regression ==
York regression extends Deming regression by allowing correlated errors in ''x'' and ''y''.<ref>York, D., Evensen, N. M., Martínez, M. L., and Delgado, J. D. B.: Unified equations for the slope, intercept, and standard errors of the best straight line, Am. J. Phys., 72, 367–375, https://doi.org/10.1119/1.1632486, 2004.</ref>
 
==See also==
* [[Line fitting]]
* [[Regression dilution]]
 
==References==
Line 66 ⟶ 72:
* {{cite journal|last=Adcock|first=R. J.|year=1878|title=A problem in least squares|journal=The Analyst|volume=5|issue=2|pages=53–54|doi=10.2307/2635758|doi-access=free|jstor=2635758|jstor-access=free}}
* {{cite journal|author=Coolidge|first=J. L.|year=1913|title=Two geometrical applications of the mathematics of least squares|journal=The American Mathematical Monthly|volume=20|issue= 6|pages=187–190|doi=10.2307/2973072|jstor=2973072}}
* {{cite journal|author=Cornbleet |first=P.J.|last2=Gochman |first2=N.|year=1979|title=Incorrect Least–Squares Regression Coefficients|journal=Clinical Chemistry |volume=25|issue=3|pages=432–438|doi=10.1093/clinchem/25.3.432|pmid=262186|doi-access=free}}
* {{cite book|last=Deming|first=W. E.|author-link=W. Edwards Deming|year=1943|title=Statistical adjustment of data|publisher=Wiley, NY (Dover Publications edition, 1985)|isbn=0-486-64685-8}}
* {{cite book|last=Fuller|first=Wayne A.|year=1987|title=Measurement error models|publisher=John Wiley & Sons, Inc|isbn=0-471-86187-1}}
* {{cite journal |last1 = Glaister | first1 = P. | year = 2001 | title = Least squares revisited | journal = [[The Mathematical Gazette]] | volume = 85 | pages = 104–107 | doi=10.2307/3620485| jstor = 3620485 | s2cid = 125949467 }}
* {{cite web |last=Jensen |first=Anders Christian |year=2007 |title=Deming regression, MethComp package |url=http://staff.pubhealth.ku.dk/~bxc/MethComp/Deming.pdf |publisher=Steno Diabetes Center |location=Gentofte, Denmark}}
* {{cite book|last=Koopmans|first=T. C.|year=1936|title=Linear regression analysis of economic time series|publisher=DeErven F. Bohn, Haarlem, Netherlands}}
* {{cite journal
Line 96 ⟶ 102:
| url = http://www.clinchem.org/cgi/reprint/39/3/424
| pmid = 8448852
| doi-access = free
}}
*{{cite journal