{{WikiProject banner shell|class=Start|
{{WikiProject Mathematics|importance=mid}}
}}
== Online algorithm in testing yields horrible results when mean=~0 ==
Almost all the tests I've seen of these algorithms add some unrealistic constant (e.g. 10^6 or larger) to the dataset to demonstrate that the suggested algorithm on this page is indeed better. I naively used this algorithm in my own work, to horrible effect. My dataset consists of a large number of discrete values, perhaps with the values -1, 0, or 1, and with an average usually between -1 and 1. I wrote the following simple test program to demonstrate the difference in results between what I'll call METHOD1 (using a running sum and sum of squares of the dataset) and METHOD2 (using a running computation of the average and the variance, which the current wiki strongly recommends).
<syntaxhighlight lang="cpp">
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cstdlib>
#include <ctime>   // for time(), used to seed rand()
using namespace std;

int main( )
{
    srand( time( NULL ) );
    const double target = 0.95;
    float sum = 0;
    float average = 0;
    float sumsq = 0;
    float qvalue = 0;
    double sumd = 0;
    double averaged = 0;
    double sumsqd = 0;
    double qvalued = 0;
    int numtrials = 0;
    const int width = 15;
    cout << setw( width ) << left << "numtrials"
         << setw( width ) << "float avg 2"
         << setw( width ) << "float avg 1"
         << setw( width ) << "double avg 2"
         << setw( width ) << "double avg 1"
         << setw( width ) << "float std 2"
         << setw( width ) << "float std 1"
         << setw( width ) << "double std 2"
         << setw( width ) << "double std 1"
         << endl;
    while( true )
    {
        for( int i = 0; i < 1000000; i++ )
        {
            const int sample = ( static_cast< double >( rand( ) ) / RAND_MAX < target ? 1 : 0 );
            numtrials++;
            sum += sample;
            sumd += sample;
            const float delta = sample - average;
            average += delta / numtrials;
            const double deltad = sample - averaged;
            averaged += deltad / numtrials;
            sumsq += sample * sample;
            sumsqd += sample * sample;
            qvalue += delta * ( sample - average );
            qvalued += deltad * ( sample - averaged );
        }
        cout << fixed << setprecision( 6 );
        cout << setw( width ) << left << numtrials
             << setw( width ) << average
             << setw( width ) << sum / numtrials
             << setw( width ) << averaged
             << setw( width ) << sumd / numtrials
             << setw( width ) << sqrt( qvalue / ( numtrials - 1 ) )
             << setw( width ) << sqrt( ( sumsq - ( sum / numtrials ) * sum ) / ( numtrials - 1 ) )
             << setw( width ) << sqrt( qvalued / ( numtrials - 1 ) )
             << setw( width ) << sqrt( ( sumsqd - ( sumd / numtrials ) * sumd ) / ( numtrials - 1 ) )
             << endl;
    }
    return 0;
}
</syntaxhighlight>
And here is sample output:
<syntaxhighlight lang="text">
numtrials float avg 2 float avg 1 double avg 2 double avg 1 float std 2 float std 1 double std 2 double std 1
1000000 0.948275 0.950115 0.950115 0.950115 0.218147 0.217707 0.217707 0.217707
2000000 0.941763 0.949966 0.949966 0.949966 0.217107 0.218015 0.218015 0.218015
3000000 0.922894 0.949982 0.949982 0.949982 0.217433 0.217982 0.217982 0.217982
4000000 0.909789 0.950044 0.950044 0.950044 0.215531 0.217854 0.217854 0.217854
5000000 0.899830 0.950042 0.950042 0.950042 0.219784 0.217859 0.217859 0.217859
6000000 0.890922 0.950006 0.950006 0.950006 0.218891 0.217933 0.217933 0.217933
7000000 0.884997 0.950047 0.950047 0.950047 0.215908 0.217848 0.217848 0.217848
8000000 0.879075 0.950082 0.950082 0.950082 0.213635 0.217776 0.217776 0.217776
9000000 0.873134 0.950091 0.950091 0.950091 0.214217 0.217758 0.217758 0.217758
10000000 0.868035 0.950095 0.950095 0.950095 0.219110 0.217749 0.217749 0.217749
11000000 0.865048 0.950076 0.950076 0.950076 0.220991 0.217788 0.217788 0.217788
12000000 0.862079 0.950086 0.950086 0.950086 0.218815 0.217768 0.217768 0.217768
13000000 0.859129 0.950118 0.950118 0.950118 0.216916 0.217701 0.217701 0.217701
14000000 0.856129 0.950086 0.950086 0.950086 0.215379 0.217768 0.217768 0.217768
15000000 0.853163 0.950096 0.950096 0.950096 0.213971 0.217746 0.217746 0.217746
16000000 0.850167 0.950074 0.950074 0.950074 0.212786 0.217793 0.217793 0.217793
17000000 0.847186 0.950069 0.950069 0.950069 0.211621 0.217803 0.217803 0.217803
18000000 0.844209 0.932068 0.950068 0.950068 0.210246 0.251630 0.217805 0.217805
19000000 0.841231 0.883011 0.950066 0.950066 0.209009 0.321407 0.217808 0.217808
20000000 0.838260 0.838861 0.950071 0.950071 0.207879 0.367659 0.217798 0.217798
21000000 0.835285 0.798915 0.950072 0.950072 0.206857 0.400811 0.217797 0.217797
22000000 0.832305 0.762601 0.950069 0.950069 0.205931 0.425489 0.217803 0.217803
23000000 0.829313 0.729444 0.950057 0.950057 0.205096 0.444247 0.217828 0.217828
24000000 0.826340 0.699051 0.950059 0.950059 0.204305 0.458671 0.217823 0.217823
25000000 0.823366 0.671089 0.950061 0.950061 0.203576 0.469818 0.217819 0.217819
26000000 0.820379 0.645278 0.950055 0.950055 0.203316 0.478429 0.217832 0.217832
27000000 0.817389 0.621378 0.950047 0.950047 0.202405 0.485044 0.217849 0.217849
28000000 0.816234 0.599186 0.950046 0.950046 0.201543 0.490063 0.217849 0.217849
29000000 0.816234 0.578525 0.950056 0.950056 0.200723 0.493795 0.217829 0.217829
30000000 0.816234 0.559241 0.950035 0.950035 0.200000 0.496478 0.217872 0.217872
31000000 0.816234 0.541201 0.950036 0.950036 0.199291 0.498300 0.217871 0.217871
32000000 0.816234 0.524288 0.950039 0.950039 0.198619 0.499410 0.217864 0.217864
33000000 0.816234 0.508400 0.950038 0.950038 0.197993 0.499929 0.217867 0.217867
34000000 0.816234 0.493448 0.950034 0.950034 0.197406 0.499957 0.217876 0.217876
35000000 0.816234 0.479349 0.950037 0.950037 0.196839 0.499573 0.217868 0.217868
36000000 0.816234 0.466034 0.950038 0.950038 0.196306 0.498845 0.217866 0.217866
37000000 0.816234 0.453438 0.950037 0.950037 0.195804 0.497827 0.217868 0.217868
38000000 0.816234 0.441506 0.950036 0.950036 0.195328 0.496567 0.217871 0.217871
39000000 0.816234 0.430185 0.950032 0.950032 0.194878 0.495102 0.217878 0.217878
</syntaxhighlight>
Note how the computed average using floats and method 2 fails to six-digit accuracy before even 1 million trials, while method 1 using floats reproduces the double results all the way out to 17 million trials. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/173.219.85.45|173.219.85.45]] ([[User talk:173.219.85.45|talk]]) 19:41, 19 October 2012 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
:Method 1's sum += sample is exact (with sample values 1 and 0) until it has no effect, so method 1 fails catastrophically after about 18 million. Method 2's average += delta / numtrials is generally imprecise, and you can't expect single precision to be great here for numtrials in 6 or 7 digits, but it's not too far off on average, so even with single floats and these large numbers of samples the estimated variances are not too unreasonable. [[User:Dicklyon|Dicklyon]] ([[User talk:Dicklyon|talk]]) 21:18, 22 May 2018 (UTC)
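The cutoff Dicklyon mentions can be checked directly: IEEE single precision has a 24-bit significand, so once the running sum reaches 2<sup>24</sup> it can no longer absorb increments of 1. A short sketch, using Python's stdlib <code>ctypes</code> to emulate a C <code>float</code> (the helper name is mine):

<syntaxhighlight lang="python">
import ctypes

def f32(x):
    # round a Python double to IEEE single precision, as a C 'float' would store it
    return ctypes.c_float(x).value

# 2**24 + 1 is exact in double precision...
assert f32(2.0**24) + 1 == 2.0**24 + 1
# ...but rounds back down to 2**24 in single precision, so "sum += 1" stalls
assert f32(f32(2.0**24) + 1) == f32(2.0**24)

# With a mean near 0.95, the running sum reaches 2**24 after about
# 2**24 / 0.95 ~ 17.7 million samples -- matching the breakdown of
# "method 1" near 18 million trials in the table above.
print(2**24 / 0.95)
</syntaxhighlight>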
==Untitled==
''Editorial comment: it is actually the first formula that has precision problems when dealing with limited-precision arithmetic. If the difference between measurements and the mean is very small, then the first formula will yield precision problems, as information will be lost in the (x<sub>i</sub> - µ) operation. There is no such loss of significance in the intermediate operations of the second formula. -- ScottMoonen''
:I'm talking rubbish it's fine as x[n]-mean[n] = (n-1)/n*(x[n]-mean[n-1]). Sorry! --[[User:Cfp|cfp]] 22:20, 15 August 2006 (UTC)
--[[User:Snoopy67|Gabriel]] ([[User talk:Snoopy67|talk]]) 21:35, 17 July 2011 (UTC)
I think the advantages of Welford's algorithm should be toned down. It is no more numerically stable than the direct two-pass method.
See http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
and http://www.jstor.org/stable/1267176 .
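A small experiment is consistent with this: Welford's algorithm and the two-pass method behave comparably, and both avoid the cancellation that breaks the naive sum-of-squares formula. A sketch (function names are mine):

<syntaxhighlight lang="python">
import random

def naive_var(xs):
    # textbook sum-of-squares form: cancels catastrophically for large means
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return (s2 - s * s / n) / (n - 1)

def two_pass_var(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def welford_var(xs):
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    return m2 / (n - 1)

random.seed(1)
data = [1e9 + random.random() for _ in range(10000)]  # huge mean, variance near 1/12
# two-pass and Welford both stay near 1/12; the naive formula loses all digits here
print(naive_var(data), two_pass_var(data), welford_var(data))
</syntaxhighlight>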
== Online algorithm ==
:Well, algorithms I and III only scan the list of data once, whereas both versions of algorithm II scan it twice. Hence, since I/O is by several orders of magnitude slower than calculations, that means that, unless the whole data set is read into memory, algorithm II will be twice as slow, and it won't work if the data file is not seekable. --[[User:Army1987|Army1987]] 16:00, 28 October 2007 (UTC)
The Welford algorithm *is* an on-line algorithm, because it needs to see only one data item at a time! It does *not* need to store the complete set of data, i.e., it does not need to look at any data item more than once. <small><span class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Snoopy67|Snoopy67]] ([[User talk:Snoopy67|talk]] • [[Special:Contributions/Snoopy67|contribs]]) 21:30, 17 July 2011 (UTC)</span></small><!-- Template:Unsigned --> <!--Autosigned by SineBot-->
:The Welford algorithm as written calculates the sample variance in the finalize routine, and nowhere in the description does it specifically call that out. <!-- Template:Unsigned --><small class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:192.91.171.34|192.91.171.34]] ([[User talk:192.91.171.34#top|talk]] • [[Special:Contributions/192.91.171.34|contribs]]) </small>
===Parallel algorithm===
Is there an error in <math>M_{2,AB}</math>?
Chan et al. in their paper <ref name=":0">{{Citation
| last1 = Chan | first1 = Tony F. | author1-link = Tony F. Chan
| last2 = Golub | first2 = Gene H. | author2-link = Gene H. Golub
| last3 = LeVeque | first3 = Randall J.
| contribution = Updating Formulae and a Pairwise Algorithm for Computing Sample Variances.
| title = Technical Report STAN-CS-79-773
| publisher = Department of Computer Science, Stanford University
| year = 1979
| contribution-url =http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf }}.</ref> has another relation for combining arbitrary sets <math>A</math> and <math>B</math>:
<math>M_{2,AB} = M_{2,A} + M_{2,B} + \frac{n_A}{n_B n_{AB}} \cdot \left( \frac{n_B}{n_A} \bar x_A - \bar x_B \right)^2 </math>, which yields <math>M_{2,AB} = M_{2,A} + M_{2,B} + \frac{1}{n_{AB}} \cdot \left( \bar x_A - \bar x_B \right)^2 </math> if <math>n_A</math> and <math>n_B</math> are equal.
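For comparison, the combining rule as it usually appears in the article (stated with means and <math>\delta = \bar x_B - \bar x_A</math>) can be checked numerically against a single pass over the concatenated data; a sketch, with function names of my own:

<syntaxhighlight lang="python">
def summarize(xs):
    # (count, mean, M2), where M2 is the sum of squared deviations from the mean
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    return n, mean, m2

def combine(a, b):
    # Chan-style merge of two partial summaries
    na, ma, m2a = a
    nb, mb, m2b = b
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2

xs, ys = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]
whole = summarize(xs + ys)
merged = combine(summarize(xs), summarize(ys))
assert all(abs(u - v) < 1e-12 for u, v in zip(whole, merged))
</syntaxhighlight>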
=== Algorithm IV ===
[[User:Anders gorm|Gorm]] ([[User talk:Anders gorm|talk]]) 12:54, 19 February 2008 (UTC)
:Agreed. I've now added the intermediary variables Q and R to the formula as per West's original paper [[User:The imp|The imp]] ([[User talk:The imp|talk]]) 10:35, 30 September 2009 (UTC)
===Minor spelling and grammar fixes===
: Btw, I don't understand your comment at the beginning since the problematic division is outside the loop. [[User:McKay|McKay]] ([[User talk:McKay|talk]]) 01:02, 10 April 2009 (UTC)
::: It should not say "else variance = 0". That implies the sample variance is 0 when ''n'' = 1. That is incorrect. The sample variance is undefined in that case. [[User:Michael Hardy|Michael Hardy]] ([[User talk:Michael Hardy|talk]]) 01:18, 10 April 2009 (UTC)
{{reflist-talk}}
== Easiest online algorithm yet. ==
then you have an easy way to compute the variance as you go. You also have an easy way to combine the variances of two populations. Just add the sums (and sums of squares) and the counts. No divide-by-zeros. Very few math operations. What more can you ask for? :)
:: Ah, well the connection between the two wasn't spelled out to the satisfaction of my (possibly caffeine-deprived) mind. And, as far as I can tell, the online algorithm does not easily merge statistics from multiple populations, but rather updates the statistic one sample at a time. Correct me if I am wrong, or please add a suitable method to the page. [[Special:Contributions/209.237.23.36|209.237.23.36]] ([[User talk:209.237.23.36|talk]]) 22:56, 13 May 2009 (UTC)
::: The connection to the second equation of the "naive algorithm" follows immediately from the definition of <math>E[X^2]</math> and <math>E[X]</math>. The "parallel algorithm" is an online algorithm that easily merges statistics from multiple populations (as noted in the text), though a better heading might make this clear. The key point is that a parallel algorithm can be implemented as a repeated serial algorithm, and the merging part is identical in both. [[User:Markjamesabraham|Markjamesabraham]] ([[User talk:Markjamesabraham|talk]]) 22:45, 26 May 2010 (UTC)
==Pseudocode==
The numerically stable mean and variance computation works, but attempting something similar for skewness and kurtosis using the incremental moments computation code in the later section seems to fail. Is something missing in it ? Are there any reference implementations ? [[User:Shyamal|Shyamal]] ([[User talk:Shyamal|talk]]) 16:17, 14 July 2009 (UTC)
:I've implemented the Terriberry moment-based method for skewness and kurtosis in the incremental case (i.e. B={x}) and it worked fine. [[User:Markjamesabraham|Markjamesabraham]] ([[User talk:Markjamesabraham|talk]]) 22:36, 26 May 2010 (UTC)
::Some published code is at http://lib.stat.cmu.edu/apstat/52 , which also indicates its original source. [[User:Melcombe|Melcombe]] ([[User talk:Melcombe|talk]]) 12:13, 27 May 2010 (UTC)
:::Actually I subsequently found that Chan's online-merging algorithm does fine for higher-order moments, but calculates the mean very badly. When A and B have large and comparable size, <math>\bar x_X = \bar x_A + \delta\cdot\frac{n_B}{n_X}</math> for <math>\delta\! = \bar x_B - \bar x_A</math> is not as numerically stable as when one of A and B are very much larger than the other. The error contribution from the subtraction that forms <math>\delta</math> is scaled only by about 1/2, whereas when <math>n_A >> n_B = 1</math> the error contribution from the subtraction is divided by <math>n_A + 1</math>. Instead, prefer to use <math>\bar x_X = \frac{n_A \bar x_A + n_B \bar x_B}{n_A + n_B}</math>. Main page updated accordingly [[User:Markjamesabraham|Markjamesabraham]] ([[User talk:Markjamesabraham|talk]]) 14:54, 30 May 2010 (UTC)
== Compensated variant ==
I cannot make sense of the "compensated" algorithm. It does not appear to be using ordinary Kahan summation.
[[User:Bkalafut|Bkalafut]] ([[User talk:Bkalafut|talk]]) 01:19, 9 July 2010 (UTC)
== Hey ==
Actually I just wanted to ask that on the page :Algorithms for calculating variance is the pseudocode that is given for calculating variance correct?
variance = E(X**2) - (E(X))**2
[[Computational formula for the variance]]
Shouldn't it be mean*mean as opposed to Sum * mean?
Thanks...
```` <small><span class="autosigned">—Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Nk28|Nk28]] ([[User talk:Nk28|talk]] • [[Special:Contributions/Nk28|contribs]]) 06:59, 12 October 2010 (UTC)</span></small><!-- Template:Unsigned --> <!--Autosigned by SineBot-->
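Note that <code>Sum * mean</code> and <code>n * mean * mean</code> are the same quantity, since <code>Sum = n * mean</code>; a quick check of the sample-variance form (the function name is mine):

<syntaxhighlight lang="python">
def naive_sample_variance(xs):
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    mean = s / n
    # sum(x^2) - sum(x)*mean  ==  sum(x^2) - n*mean^2  ==  sum((x - mean)^2)
    return (sq - s * mean) / (n - 1)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(xs) / len(xs)
direct = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
assert abs(naive_sample_variance(xs) - direct) < 1e-9
</syntaxhighlight>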
== Pseudocode is valid Python, but gets tripped up by integer division ==
Would anyone object to a note pointing out that the pseudocode can be run verbatim in a Python interpreter, but that in Python < 3.0 you'll need a <syntaxhighlight lang="python">from __future__ import division</syntaxhighlight> statement? (Can you tell I just got caught out by this? :D) [[User:Alexsdutton|Alexsdutton]] ([[User talk:Alexsdutton|talk]]) 15:05, 23 December 2010 (UTC)
::I think it would be inappropriate. The whole idea of pseudocode is that it's not language specific. -- [[User:RoySmith|RoySmith]] [[User Talk:RoySmith|(talk)]] 18:52, 25 December 2010 (UTC)
: I'm torn. Perhaps the text should say it is Python 3.0? I am loathe to add that line to an otherwise beautiful looks-like-pseudocode example, but as you say, the gotcha is easy to fall into... [[User:BenFrantzDale|—Ben FrantzDale]] ([[User talk:BenFrantzDale|talk]])
== Divide by n vs n-1? ==
Can somebody explain why there are two different flavors of the formula for variance, one where you divide by n, the other where you divide by n-1? Obviously, for large values of n, they approach each other, but why the difference at all? -- [[User:RoySmith|RoySmith]] [[User Talk:RoySmith|(talk)]] 18:55, 25 December 2010 (UTC)
: Wikipedia's mathematics reference desk would probably be a better place for this sort of question.
: Notice that
:* The rule that
::: var(''X'' + ''Y'') = var(''X'') + var(''Y'')
:: when ''X'' and ''Y'' are [[independence (probability theory)|independent]] [[random variable]]s applies when you divide by ''n'' and '''not''' when you divide by ''n'' − 1.
:* Dividing by ''n'' − 1 is done when one is trying to [[estimation (statistics)|estimate]] the variance of a population by using a random sample. It should never be done when one has the whole population and the variance can be computed exactly. Its effect is to make the estimate [[bias (statistics)|unbiased]]. That's not really all that good a reason to do it, but that's why it's done. See [[Bessel's correction]] for an account of some of the mathematical theory.
:* This is standard textbook material. Maybe more of it should be included in the article.
: [[User:Michael Hardy|Michael Hardy]] ([[User talk:Michael Hardy|talk]]) 19:28, 25 December 2010 (UTC)
::*Thanks for the pointer to [[Bessel's correction]], that explains this better than I've ever understood it before. And, yes, my intent in asking the question was not so much for my own knowledge (although that's part of it), but as a suggestion that something along these lines be included in this article. -- [[User:RoySmith|RoySmith]] [[User Talk:RoySmith|(talk)]] 03:28, 28 December 2010 (UTC)
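The effect of Bessel's correction is easy to see by simulation: averaged over many small samples from a unit-variance distribution, dividing by ''n'' converges to (''n'' − 1)/''n'' of the true variance, while dividing by ''n'' − 1 converges to the true variance. A sketch:

<syntaxhighlight lang="python">
import random

random.seed(0)
trials, n = 100000, 5
biased = unbiased = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    biased += ss / n          # divide by n
    unbiased += ss / (n - 1)  # divide by n - 1 (Bessel's correction)
# true variance is 1; the biased estimator averages (n-1)/n = 0.8
print(biased / trials, unbiased / trials)
</syntaxhighlight>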
:I am confused by the distinction of sampling from a population of size N vs sampling from a smaller subset n<N. The formulas to compute the variance, and its sample estimate, have exactly the same form (for finite N and n). This probably results from some authors referring to "population variance" when actually meaning "variance of the distribution" from where the independent samples originate (for example see Navidi, William (2006) Statistics for Engineers and Scientists, McGraw-Hill, pg 14.).
: The formulas in the article implicitly assume samples to be i.i.d random variables ([[Independent_and_identically_distributed_random_variables]]). The variance of such population of size K (either n or N) involves dividing by K. An unbiased estimate of this variance involves dividing by K if the mean of the distribution is known, or dividing by K-1 if the mean is also estimated from the data.
:This is consistent with the articles:
::* [[Bessel's correction]]
::* [[Variance#Sample_variance]]
:And the references:
::* Knuth, Donald Erwin, Ronald L. Graham, and Oren Patashnik. "Concrete mathematics." Adison Wesley (1989), p373.
::* Cramér, Harald. "Mathematical Methods of Statistics". Princeton university press (1961), p351.
: I believe the main article should be corrected to reflect the probabilistic nature of the population (i.i.d samples) and drop mention of N vs n.
[[User:Mpbrigham|Mpbrigham]] ([[User talk:Mpbrigham|talk]]) 02:35, 26 January 2017 (UTC)
== Weighted variance pseudocode correctness ==
The weighted variance pseudocode seems correct to me; I just did a back-of-the-envelope check (but feel free to check it more thoroughly!). I've removed the misleading comment in the pseudocode (''#[WARNING] This seems wrong. It should be moved before M2 assignment''). The comment doesn't make much sense; one can easily show that the suggested change yields the wrong result even for the trivial case in which the weights are all equal to one.
[[User:Theaucitron|Theaucitron]] ([[User talk:Theaucitron|talk]]) 11:07, 21 August 2012 (UTC)
== Online algorithm for covariance ==
Is this algorithm right? I'm not an expert, but when I compare it to the online variance computation, it's different, and if I replace y with x, I should get cov(x,x) = var(x), so could anyone have a look please? [[Special:Contributions/131.227.95.222|131.227.95.222]] ([[User talk:131.227.95.222|talk]]) 18:03, 27 February 2013 (UTC)
: Alright, I found a paper talking about this and now I understand that the formula is talking about co-moments and not covariances, I will add the note to the article. [[Special:Contributions/131.227.95.222|131.227.95.222]] ([[User talk:131.227.95.222|talk]]) 10:31, 28 February 2013 (UTC)
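Indeed, with the co-moment update the covariance recurrence reduces exactly to the variance recurrence when <math>y = x</math>; a sketch (the function name is mine):

<syntaxhighlight lang="python">
def online_covariance(pairs):
    # stable co-moment update; mirrors Welford's variance algorithm
    n, mean_x, mean_y, C = 0, 0.0, 0.0, 0.0
    for x, y in pairs:
        n += 1
        dx = x - mean_x           # uses the old mean of x...
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        C += dx * (y - mean_y)    # ...and the new mean of y
    return C / (n - 1)            # sample covariance

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]         # ys = 2 * xs, so cov(x, y) = 2 * var(x)
cov = online_covariance(zip(xs, ys))
var = online_covariance(zip(xs, xs))
assert abs(cov - 2 * var) < 1e-12
</syntaxhighlight>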
== Kurtosis algorithm biased for normal distribution? ==
[[Kurtosis|The wikipedia page for kurtosis]] claims that unbiased estimation of kurtosis is possible for normally distributed inputs. However, when I run the code my results are clearly biased.
<code>
[hs4 ~] 30$ ~/tmp/simple_kurt.py<br />
Average kurtosis: -0.545012387836<br />
[hs4 ~] 30$ ~/tmp/simple_kurt.py<br />
Average kurtosis: -0.536851508033<br />
[hs4 ~] 30$ ~/tmp/simple_kurt.py<br />
Average kurtosis: -0.55578863384<br />
[hs4 ~] 30$ ~/tmp/simple_kurt.py<br />
Average kurtosis: -0.546190322949<br />
</code>
Here's the code I'm running:
<syntaxhighlight lang="python">
#!/usr/bin/python2.7
from __future__ import division
from random import gauss

def kurtosis(data):
    n = 0
    mean = 0
    M2 = 0
    M3 = 0
    M4 = 0
    for x in data:
        n1 = n
        n = n + 1
        delta = x - mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        mean = mean + delta_n
        M4 = M4 + term1 * delta_n2 * (n*n - 3*n + 3) + 6 * delta_n2 * M2 - 4 * delta_n * M3
        M3 = M3 + term1 * delta_n * (n - 2) - 3 * delta_n * M2
        M2 = M2 + term1
    kurtosis = (n*M4) / (M2*M2) - 3
    return kurtosis

max_datasets = 10000
sum_kurtosis = 0
for lv in range(max_datasets):
    data = [gauss(0, 1) for i in range(10)]
    sum_kurtosis += kurtosis(data)
print "Average kurtosis:", sum_kurtosis/max_datasets
</syntaxhighlight>
[[Special:Contributions/208.77.214.129|208.77.214.129]] ([[User talk:208.77.214.129|talk]]) 21:02, 3 June 2013 (UTC)
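The estimator in that code is the moment ratio <math>g_2 = n M_4 / M_2^2 - 3</math>, whose expectation for normal samples is <math>-6/(n+1)</math> (about −0.545 for <math>n = 10</math>, matching the output above), so the bias is in the estimator, not the algorithm. The standard small-sample correction, which is unbiased under normality, can be applied afterwards; a sketch (function names are mine):

<syntaxhighlight lang="python">
import random

def excess_kurtosis(xs):
    # biased moment-ratio estimator g2, equivalent to the code above
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs)
    m4 = sum((x - m) ** 4 for x in xs)
    return n * m4 / (m2 * m2) - 3

def corrected_excess_kurtosis(xs):
    # standard small-sample correction, unbiased for normal data
    n = len(xs)
    g2 = excess_kurtosis(xs)
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

random.seed(4)
trials = 20000
g_sum = G_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(10)]
    g_sum += excess_kurtosis(xs)
    G_sum += corrected_excess_kurtosis(xs)
# g2 averages about -0.545 for n = 10; the corrected estimator averages about 0
print(g_sum / trials, G_sum / trials)
</syntaxhighlight>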
== Weighted Incremental Algorithm Alternative Incorrect ==
<syntaxhighlight lang="python">
def weighted_incremental_variance(dataWeightPairs):
    ...
    M2 = M2 + sumweight * delta * R  # Alternatively, "M2 = M2 + weight * delta * (x−mean)"
    ...
</syntaxhighlight>
I did the math to see if the stated update formula is indeed correct with the alternative mentioned in the comment; I have found that it is not. As such, I propose the alternative comment be removed.
If anyone would like me to show the math of why they are not equivalent, let me know. Basically, the alternative confuses the sumweight of the previous iteration with the sumweight of the current iteration (which is updated after M2 is calculated). <small><span class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Anishpatel0|Anishpatel0]] ([[User talk:Anishpatel0|talk]] • [[Special:Contributions/Anishpatel0|contribs]]) 13:11, 29 April 2014 (UTC)</span></small><!-- Template:Unsigned --> <!--Autosigned by SineBot-->
If anyone else would confirm that I am right, I'll remove the alternative comment in the article. [[User:Anishpatel0|Anishpatel0]] ([[User talk:Anishpatel0|talk]]) 13:14, 29 April 2014 (UTC)
Ok, boss. Here's your math. Really wonder what they teach you folks these days...
What we have:
 temp = weight + sumweight
 delta = x - mean
 R = delta * weight / temp
 mean = mean + R
What we want to show:
 sumweight * delta * R = weight * delta * (x−mean)
 sumweight * delta * R = weight * delta * (x−(x-delta+R))
 sumweight * delta * delta * weight / temp = weight * delta * (x−(x-delta+R))
 sumweight * delta * delta * weight / temp = weight * delta * (x−x+delta-R)
 weight * delta * sumweight * delta / temp = weight * delta * (delta - delta * weight / temp)
 weight * delta * sumweight * delta / temp = weight * delta * delta * (1 - weight / temp)
 weight * delta * delta * sumweight / temp = weight * delta * delta * (temp - weight) / temp
 weight * delta * delta * sumweight / temp = weight * delta * delta * sumweight / temp
--[[Special:Contributions/204.120.71.252|204.120.71.252]] ([[User talk:204.120.71.252|talk]]) 06:05, 16 July 2014 (UTC)
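The identity can also be checked numerically, with the caveat that <code>(x−mean)</code> in the alternative must use the mean ''after'' the update while <code>sumweight</code> is the total ''before'' it; a sketch (names are mine):

<syntaxhighlight lang="python">
import random

def both_m2_forms(pairs):
    # runs the two update forms side by side on the same (x, weight) stream
    sumweight = mean = m2_a = m2_b = 0.0
    for x, weight in pairs:
        temp = weight + sumweight
        delta = x - mean
        R = delta * weight / temp
        m2_a += sumweight * delta * R        # form in the article
        mean += R
        m2_b += weight * delta * (x - mean)  # alternative, with the updated mean
        sumweight = temp
    return m2_a, m2_b

random.seed(3)
pairs = [(random.random(), random.uniform(0.5, 2.0)) for _ in range(1000)]
m2_a, m2_b = both_m2_forms(pairs)
assert abs(m2_a - m2_b) < 1e-9 * max(1.0, abs(m2_a))
</syntaxhighlight>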
==What language are the algorithms in?==
There are algorithms in the article which appear to be in some computer language or logical/mathematical notation. What is the language? Its syntax differs from programming languages I studied back in the day. It is not self-evident and self-explanatory. A link to some article dealing with the conventions and assumptions of the language would be helpful. None of the examples in [[Algorithm]] or in [[Algorithm examples]] are quite like the ones here, whose syntax appears to require they start with "DEF" and end with "RETURN." These algorithms have "do loops" which do not include an index telling it how many times to execute the statement in the loop (such as "for i = 1 to n") or a test at the end of each loop execution to determine if all the data items have been processed. [[User:Edison|Edison]] ([[User talk:Edison|talk]]) 18:01, 26 May 2014 (UTC)
: Looks like [[Python_(programming_language)]] to me -- [[User:RoySmith|RoySmith]] [[User Talk:RoySmith|(talk)]] 00:56, 30 May 2014 (UTC)
:: Python - style [[pseudocode]]. The details you mention are omitted here to highlight the main points, and are omitted in practice in many programming languages. --[[User:Hro%C3%B0ulf|Hroðulf]] (or Hrothulf) ([[User talk:Hro%C3%B0ulf|Talk]]) 13:32, 23 July 2014 (UTC)
==Why "naive?"==
What makes the first algorithm "naive?" I see no explanation as to how it is lacking, or what rules are broken, or what complications are not taken into consideration. [[User:Edison|Edison]] ([[User talk:Edison|talk]]) 18:11, 26 May 2014 (UTC)
:{{re|Edison}} It is naive because it is the obvious application of the formula that defines variance.
:As for things that are lacking, those are discussed further down the article. For example, it is highly vulnerable to <s>underflow</s>, <ins>[[loss of significance]]</ins>, and it could be more efficient.
:Feel free to edit the article to make these things more clear.
:--[[User:Hro%C3%B0ulf|Hroðulf]] (or Hrothulf) ([[User talk:Hro%C3%B0ulf|Talk]]) 13:27, 23 July 2014 (UTC)
Usually the term naive is used when an algorithm applies the defining formula directly and is more laborious than the alternatives.
The formula called naive in the article is neither: it is not the formula that defines variance, and it is not laborious; it actually simplifies
the calculation a lot. The only problem with the formula is a limitation of floating-point arithmetic, and a small change,
computing with shifted data, fixes the problem (see reference 2, T. F. Chan, G. H. Golub and R. J. LeVeque, section 3, "Computations with shifted data").
In that article (ref 2) the formula (called "textbook") is recommended under certain conditions and in certain variations.
So the algorithm is far from "should not be used in practice" as stated in the Wikipedia article.
[[User:Elxala|Elxala]] ([[User talk:Elxala|talk]]) 23:17, 23 August 2014 (UTC)
== Merge discussion ==
There was a merge tag added at [[Standard deviation#Rapid calculation methods]] to this article. I think it is quite reasonable to have that section be a summary of this article. <s>I must admit I was rather surprised I couldn't find a version of the <math>\operatorname{E}\left[(X-\operatorname{E}(X))^2\right] = \operatorname{E}\left[X^2 \right] - (\operatorname{E}[X])^2</math> formula in the standard deviation article, I've put a bit in its talk page about that and that that article is slow referring to variance and never links to it.</s>
For this article, an old speedup calculation method is at [[assumed mean]]. [[User:Dmcq|Dmcq]] ([[User talk:Dmcq|talk]]) 09:19, 17 May 2013 (UTC)
I think what is written here is much more clear than the article that the merge tag refers to. Hence I propose to leave it as is. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/194.105.120.70|194.105.120.70]] ([[User talk:194.105.120.70|talk]]) 14:50, 11 July 2013 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
* '''Oppose merge''' Too big for [[Standard deviation]] article per [[WP:SUMMARYSTYLE]] and qualifies as notable under [[WP:GNG]] per its references. [[User:Brycehughes|Brycehughes]] ([[User talk:Brycehughes|talk]]) 04:41, 9 November 2013 (UTC)
* I think we all agree with {{re|Brycehughes}} that the Algorithms article is too big to merge into the [[Standard deviation]] one. The merge tag suggests a move in the other direction.
* I think the sensible thing is to use Summary Style, as {{re|Dmcq}} suggests. The detailed algorithms article need not change very much, but the section at [[Standard deviation#Rapid calculation methods]] should be rewritten as a summary of the algorithms article.
* --[[User:Hro%C3%B0ulf|Hroðulf]] (or Hrothulf) ([[User talk:Hro%C3%B0ulf|Talk]]) 10:17, 24 July 2014 (UTC)
== Algorithms by (uppercase roman) number mapped to names ==
Revisions of this page up to {{oldid|Algorithms for calculating variance|209652982|3 May 2008}} referred to algorithms by number. The next revision improved the titles like so:
* I. Naïve algorithm
* II. Two-pass algorithm
* III. On-line algorithm
* IV. Weighted incremental algorithm
The revision of {{oldid|Algorithms for calculating variance|270157362|12 February 2009}} further "corrected" the titles by eliminating the roman numerals. Today's revision, {{oldid|Algorithms for calculating variance|707525590}}, replaced the roman-numeral references in the text according to the map above. [[User:Ale2006|ale]] ([[User talk:Ale2006|talk]]) 10:32, 29 February 2016 (UTC)
Issues with the section called "Welford's online algorithm".
1. I don't see the point of these equations:
:<math>s_{n}^{2}=\frac{n-2}{n-1}\,s_{n-1}^{2}+\frac{(x_{n}-\bar{x}_{n-1})^{2}}{n}=s_{n-1}^{2}+\frac{(x_{n}-\bar{x}_{n-1})^{2}}{n}-\frac{s_{n-1}^{2}}{n-1},\quad n>1</math>
:<math>\sigma_{n}^{2}=\frac{(n-1)\,\sigma_{n-1}^{2}+(x_{n}-\bar{x}_{n-1})(x_{n}-\bar{x}_{n})}{n}=\sigma_{n-1}^{2}+\frac{(x_{n}-\bar{x}_{n-1})(x_{n}-\bar{x}_{n})-\sigma_{n-1}^{2}}{n}.</math>
They are just scaled versions of the equation that follows. I recommend removal.
2. The comment "These formulas suffer from numerical instability, as they repeatedly subtract a small number from a big number which scales with n." is confusing. It looks to me like some of the formulas repeatedly add a small number to a large number. In this respect they are no worse than the equation that follows.
3. The advantage of the <math>M_{2,n}</math> recurrence is that it is simpler and more computationally efficient, because there are fewer multiplications and no divisions.
4. The last two equations should be
:<math>s_{n}^{2}=\frac{M_{2,n}}{n-1},\qquad \sigma_{n}^{2}=\frac{M_{2,n}}{n}</math>
[[User:ProfRB|ProfRB]] ([[User talk:ProfRB|talk]]) 18:51, 22 March 2019 (UTC)
:2. I think the comment "These formulas suffer from numerical instability, as they repeatedly subtract a small number from a big number which scales with n" is actually wrong. Maybe there is an '''accuracy''' issue, but I don't see why there should be an instability here. A reference would be most welcome. [[User:Natchouf|Natchouf]] ([[User talk:Natchouf|talk]]) 14:08, 20 March 2023 (UTC)
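For reference alongside the points above, the <math>M_{2,n}</math> recurrence with both finalizations is only a few lines; a sketch (the function name is mine):

<syntaxhighlight lang="python">
def welford(xs):
    # returns (mean, sample variance s^2, population variance sigma^2)
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean           # x_n - mean_{n-1}
        mean += delta / n
        m2 += delta * (x - mean)   # M2_n = M2_{n-1} + (x_n - mean_{n-1})(x_n - mean_n)
    return mean, m2 / (n - 1), m2 / n

mean, s2, sigma2 = welford([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
assert abs(sigma2 - 4.0) < 1e-12
</syntaxhighlight>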
== Typo in "Computing shifted data" section ==
The second shown formula in this section does not compute the population variance <math>\sigma^2</math>, it rather computes the sample variance <math>s^2</math>. This can easily be seen when comparing the (wrong) formula divisor <math>n-1</math> with the respective divisor <math>n</math> as used in the section "Naive algorithm" just above (where capital <math>N</math> is used instead of <math>n</math>).
The comments in the corresponding code snippet below make the situation a bit clearer. Maybe one could write the code explicitly as
 ...
 // for sample variance use
 variance = (Ex2 - Ex**2 / n) / (n - 1)
 // for population variance use
 // variance = (Ex2 - Ex**2 / n) / n
 ...
[[Special:Contributions/141.249.133.134|141.249.133.134]] ([[User talk:141.249.133.134|talk]]) 06:17, 10 April 2024 (UTC)
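The two divisors can be checked against each other in a few lines; a sketch of the shifted-data computation (the helper name and flag are mine):

<syntaxhighlight lang="python">
def shifted_variance(xs, sample=True):
    # shift by the first element K to avoid cancellation when the mean is large
    K = xs[0]
    n = len(xs)
    ex = sum(x - K for x in xs)
    ex2 = sum((x - K) ** 2 for x in xs)
    denom = (n - 1) if sample else n   # sample variance s^2 vs population sigma^2
    return (ex2 - ex * ex / n) / denom

data = [1e9 + x for x in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)]
assert abs(shifted_variance(data, sample=False) - 4.0) < 1e-12        # population
assert abs(shifted_variance(data, sample=True) - 32.0 / 7.0) < 1e-12  # sample
</syntaxhighlight>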