{{WikiProject Statistics}}
{{maths rating | class = start | importance = mid | field = probability and statistics}}
== Online algorithm in testing yields horrible results when mean=~0 ==
Almost all the tests I've seen of these algorithms add some unrealistic constant (e.g. 10^6 or larger) to the dataset to demonstrate that the algorithm suggested on this page is indeed better. I naively used this algorithm in my own work, to horrible effect. My dataset consists of a large number of discrete values, for example -1, 0, or 1, with an average usually between -1 and 1. I wrote the following simple test program to demonstrate the difference in results between what I'll call METHOD1 (using a running sum and sum of squares of the dataset) and METHOD2 (using a running computation of the average and the variance, which the current wiki strongly recommends).
<source lang="cpp">
#include <cmath>
#include <cstdlib>
#include <ctime>     // time( ), used to seed rand( )
#include <iomanip>
#include <iostream>
using namespace std;

int main( )
{
    srand( time( NULL ) );
    const double target = 0.95;  // probability that a sample is 1 rather than 0

    // Single-precision accumulators.
    float sum = 0;       // METHOD1: running sum
    float average = 0;   // METHOD2: running mean
    float sumsq = 0;     // METHOD1: running sum of squares
    float qvalue = 0;    // METHOD2: running sum of squared deviations

    // Double-precision counterparts of the four accumulators above.
    double sumd = 0;
    double averaged = 0;
    double sumsqd = 0;
    double qvalued = 0;

    int numtrials = 0;
    const int width = 15;

    // Column suffix: 1 = METHOD1 (sums), 2 = METHOD2 (running mean and variance).
    cout << setw( width ) << left << "numtrials"
         << setw( width ) << "float avg 2"
         << setw( width ) << "float avg 1"
         << setw( width ) << "double avg 2"
         << setw( width ) << "double avg 1"
         << setw( width ) << "float std 2"
         << setw( width ) << "float std 1"
         << setw( width ) << "double std 2"
         << setw( width ) << "double std 1"
         << endl;

    // Runs until interrupted, printing one row per million samples.
    while( true )
    {
        for( int i = 0; i < 1000000; i++ )
        {
            const int sample = ( static_cast< double >( rand( ) ) / RAND_MAX < target ? 1 : 0 );
            numtrials++;

            // METHOD1: accumulate the sum and the sum of squares.
            sum += sample;
            sumd += sample;
            sumsq += sample * sample;
            sumsqd += sample * sample;

            // METHOD2: update the running mean, then the squared deviations.
            const float delta = sample - average;
            average += delta / numtrials;
            const double deltad = sample - averaged;
            averaged += deltad / numtrials;
            qvalue += delta * ( sample - average );
            qvalued += deltad * ( sample - averaged );
        }

        cout << fixed << setprecision( 6 );
        cout << setw( width ) << left << numtrials
             << setw( width ) << average
             << setw( width ) << sum / numtrials
             << setw( width ) << averaged
             << setw( width ) << sumd / numtrials
             << setw( width ) << sqrt( qvalue / ( numtrials - 1 ) )
             << setw( width ) << sqrt( ( sumsq - ( sum / numtrials ) * sum ) / ( numtrials - 1 ) )
             << setw( width ) << sqrt( qvalued / ( numtrials - 1 ) )
             << setw( width ) << sqrt( ( sumsqd - ( sumd / numtrials ) * sumd ) / ( numtrials - 1 ) )
             << endl;
    }
    return 0;
}
</source>
And here is sample output:
<source lang="text">
numtrials      float avg 2    float avg 1    double avg 2   double avg 1   float std 2    float std 1    double std 2   double std 1
1000000        0.948275       0.950115       0.950115       0.950115       0.218147       0.217707       0.217707       0.217707
2000000        0.941763       0.949966       0.949966       0.949966       0.217107       0.218015       0.218015       0.218015
3000000        0.922894       0.949982       0.949982       0.949982       0.217433       0.217982       0.217982       0.217982
4000000        0.909789       0.950044       0.950044       0.950044       0.215531       0.217854       0.217854       0.217854
5000000        0.899830       0.950042       0.950042       0.950042       0.219784       0.217859       0.217859       0.217859
6000000        0.890922       0.950006       0.950006       0.950006       0.218891       0.217933       0.217933       0.217933
7000000        0.884997       0.950047       0.950047       0.950047       0.215908       0.217848       0.217848       0.217848
8000000        0.879075       0.950082       0.950082       0.950082       0.213635       0.217776       0.217776       0.217776
9000000        0.873134       0.950091       0.950091       0.950091       0.214217       0.217758       0.217758       0.217758
10000000       0.868035       0.950095       0.950095       0.950095       0.219110       0.217749       0.217749       0.217749
11000000       0.865048       0.950076       0.950076       0.950076       0.220991       0.217788       0.217788       0.217788
12000000       0.862079       0.950086       0.950086       0.950086       0.218815       0.217768       0.217768       0.217768
13000000       0.859129       0.950118       0.950118       0.950118       0.216916       0.217701       0.217701       0.217701
14000000       0.856129       0.950086       0.950086       0.950086       0.215379       0.217768       0.217768       0.217768
15000000       0.853163       0.950096       0.950096       0.950096       0.213971       0.217746       0.217746       0.217746
16000000       0.850167       0.950074       0.950074       0.950074       0.212786       0.217793       0.217793       0.217793
17000000       0.847186       0.950069       0.950069       0.950069       0.211621       0.217803       0.217803       0.217803
18000000       0.844209       0.932068       0.950068       0.950068       0.210246       0.251630       0.217805       0.217805
19000000       0.841231       0.883011       0.950066       0.950066       0.209009       0.321407       0.217808       0.217808
20000000       0.838260       0.838861       0.950071       0.950071       0.207879       0.367659       0.217798       0.217798
21000000       0.835285       0.798915       0.950072       0.950072       0.206857       0.400811       0.217797       0.217797
22000000       0.832305       0.762601       0.950069       0.950069       0.205931       0.425489       0.217803       0.217803
23000000       0.829313       0.729444       0.950057       0.950057       0.205096       0.444247       0.217828       0.217828
24000000       0.826340       0.699051       0.950059       0.950059       0.204305       0.458671       0.217823       0.217823
25000000       0.823366       0.671089       0.950061       0.950061       0.203576       0.469818       0.217819       0.217819
26000000       0.820379       0.645278       0.950055       0.950055       0.203316       0.478429       0.217832       0.217832
27000000       0.817389       0.621378       0.950047       0.950047       0.202405       0.485044       0.217849       0.217849
28000000       0.816234       0.599186       0.950046       0.950046       0.201543       0.490063       0.217849       0.217849
29000000       0.816234       0.578525       0.950056       0.950056       0.200723       0.493795       0.217829       0.217829
30000000       0.816234       0.559241       0.950035       0.950035       0.200000       0.496478       0.217872       0.217872
31000000       0.816234       0.541201       0.950036       0.950036       0.199291       0.498300       0.217871       0.217871
32000000       0.816234       0.524288       0.950039       0.950039       0.198619       0.499410       0.217864       0.217864
33000000       0.816234       0.508400       0.950038       0.950038       0.197993       0.499929       0.217867       0.217867
34000000       0.816234       0.493448       0.950034       0.950034       0.197406       0.499957       0.217876       0.217876
35000000       0.816234       0.479349       0.950037       0.950037       0.196839       0.499573       0.217868       0.217868
36000000       0.816234       0.466034       0.950038       0.950038       0.196306       0.498845       0.217866       0.217866
37000000       0.816234       0.453438       0.950037       0.950037       0.195804       0.497827       0.217868       0.217868
38000000       0.816234       0.441506       0.950036       0.950036       0.195328       0.496567       0.217871       0.217871
39000000       0.816234       0.430185       0.950032       0.950032       0.194878       0.495102       0.217878       0.217878
</source>
Note how the average computed using floats and method 2 is already off in the third decimal place by 1 million trials (0.948275 versus 0.950115), while method 1 using floats reproduces the double results all the way out to 17 million trials. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/173.219.85.45|173.219.85.45]] ([[User talk:173.219.85.45|talk]]) 19:41, 19 October 2012 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
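A possible explanation for both float failures, sketched under the assumption that <code>float</code> is IEEE 754 single precision with a 24-bit significand: once a float running sum reaches 2^24 = 16777216, adding 1 no longer changes it, and once <code>delta / numtrials</code> drops below half the gap between adjacent floats near <code>average</code>, the running mean stops moving. This would be consistent with the table above, where the float method 1 average at 32 million trials is 0.524288 = 16777216 / 32000000. The helper names below are illustrative only and are not part of the test program.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true when `value + increment` rounds back to `value`
// in single precision, i.e. the increment is lost entirely.
bool float_add_is_absorbed( float value, float increment )
{
    // volatile forces the sum to be rounded to float before the comparison.
    volatile float result = value + increment;
    return result == value;
}

// Hypothetical helper: gap between `value` and the next larger float.
float float_spacing( float value )
{
    return std::nextafter( value, std::numeric_limits< float >::infinity( ) ) - value;
}
```

For instance, <code>float_add_is_absorbed( 16777216.0f, 1.0f )</code> is true, and so is <code>float_add_is_absorbed( 0.816234f, ( 1.0f - 0.816234f ) / 28000000.0f )</code>, which mirrors the method 2 mean stalling at 0.816234 from 28 million trials onward.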
The weighted variance pseudocode seems correct to me; I just did a back-of-the-envelope check (but feel free to check it more thoroughly!). I've removed the misleading comment in the pseudocode (''#[WARNING] This seems wrong. It should be moved before M2 assignment''). This comment doesn't seem to make much sense; one can easily show that it yields the wrong result for the trivial case in which the weights are all equal to one.
[[User:Theaucitron|Theaucitron]] ([[User talk:Theaucitron|talk]]) 11:07, 21 August 2012 (UTC)
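For reference, a minimal C++ sketch of a weighted incremental update, assuming the pseudocode under discussion follows West's (1979) formulation and that the removed warning concerned the line updating the running weight; the function name is hypothetical. With all weights equal to one, the <code>m2</code> increment becomes (n − 1)·delta²/n, which matches the unweighted online update, whereas updating the running weight before <code>m2</code> would add delta² instead.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: population variance of (value, weight) pairs computed
// in one pass. Assumes at least one pair and strictly positive weights.
double weighted_variance( const std::vector< std::pair< double, double > >& pairs )
{
    double sum_weight = 0.0;  // total weight seen so far
    double mean = 0.0;        // running weighted mean
    double m2 = 0.0;          // running weighted sum of squared deviations

    for( std::size_t i = 0; i < pairs.size( ); i++ )
    {
        const double x = pairs[ i ].first;
        const double weight = pairs[ i ].second;

        const double new_sum_weight = sum_weight + weight;
        const double delta = x - mean;
        const double r = delta * weight / new_sum_weight;

        mean += r;
        m2 += sum_weight * delta * r;  // uses the weight total BEFORE this sample
        sum_weight = new_sum_weight;   // updated only after m2
    }
    return m2 / sum_weight;
}
```

With unit weights on the values 1, 2, 3, 4 this gives 1.25, the ordinary population variance; with values 1 and 4 weighted 2 and 1 it gives 2, the same as the population variance of 1, 1, 4.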