Revision as of 19:42, 19 October 2012 by SineBot (minor edit: Signing comment by 173.219.85.45 - "→Incremental algorithm in testing yields horrible results when mean=~0: new section")
Revision as of 19:44, 19 October 2012 by 173.219.85.45 (moving this to the top so it gets seen)
Line 1:

{{WikiProject Statistics}}
{{maths rating | class = start | importance = mid | field = probability and statistics}}

== Online algorithm in testing yields horrible results when mean=~0 ==

Nearly all the tests I've seen of these algorithms add some unrealistic constant (e.g. 10^6 or larger) to the dataset to demonstrate that the suggested algorithm on this page is indeed better. I naively used this algorithm in my own work, to horrible effect. My dataset consists of a large number of discrete values, perhaps with the values -1, 0, or 1, and with an average usually between -1 and 1. I wrote the following simple test program to demonstrate the difference in results between what I'll call METHOD1 (using a running sum and sum of squares of the dataset) and METHOD2 (using a running computation of the average and the variance, which the current wiki strongly recommends).

<source lang="cpp">
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cstdlib>
#include <ctime>   // time( ), used to seed rand( )

using namespace std;

int main( )
{
    srand( time( NULL ) );

    // Probability that a sample is 1 (otherwise it is 0).
    const double target = 0.95;

    // Single-precision accumulators.
    float sum = 0;       // METHOD1: running sum
    float average = 0;   // METHOD2: running average
    float sumsq = 0;     // METHOD1: running sum of squares
    float qvalue = 0;    // METHOD2: running sum of squared deviations

    // Double-precision counterparts of the four accumulators above.
    double sumd = 0;
    double averaged = 0;
    double sumsqd = 0;
    double qvalued = 0;

    int numtrials = 0;

    const int width = 15;
    cout << setw( width ) << left << "numtrials"
         << setw( width ) << "float avg 2"
         << setw( width ) << "float avg 1"
         << setw( width ) << "double avg 2"
         << setw( width ) << "double avg 1"
         << setw( width ) << "float std 2"
         << setw( width ) << "float std 1"
         << setw( width ) << "double std 2"
         << setw( width ) << "double std 1"
         << endl;

    // Runs until interrupted, printing one row per million samples.
    while( true )
    {
        for( int i = 0; i < 1000000; i++ )
        {
            const int sample = ( static_cast< double >( rand( ) ) / RAND_MAX < target ? 1 : 0 );
            numtrials++;

            // METHOD1: running sums.
            sum += sample;
            sumd += sample;

            // METHOD2: running averages.
            const float delta = sample - average;
            average += delta / numtrials;
            const double deltad = sample - averaged;
            averaged += deltad / numtrials;

            // METHOD1: running sums of squares.
            sumsq += sample * sample;
            sumsqd += sample * sample;

            // METHOD2: running sums of squared deviations.
            qvalue += delta * ( sample - average );
            qvalued += deltad * ( sample - averaged );
        }

        cout << fixed << setprecision( 6 );
        cout << setw( width ) << left << numtrials
             << setw( width ) << average
             << setw( width ) << sum / numtrials
             << setw( width ) << averaged
             << setw( width ) << sumd / numtrials
             << setw( width ) << sqrt( qvalue / ( numtrials - 1 ) )
             << setw( width ) << sqrt( ( sumsq - ( sum / numtrials ) * sum ) / ( numtrials - 1 ) )
             << setw( width ) << sqrt( qvalued / ( numtrials - 1 ) )
             << setw( width ) << sqrt( ( sumsqd - ( sumd / numtrials ) * sumd ) / ( numtrials - 1 ) )
             << endl;
    }

    return 0;
}
</source>

And here is sample output:

<source lang="text">
numtrials      float avg 2    float avg 1    double avg 2   double avg 1   float std 2    float std 1    double std 2   double std 1
1000000        0.948275       0.950115       0.950115       0.950115       0.218147       0.217707       0.217707       0.217707
2000000        0.941763       0.949966       0.949966       0.949966       0.217107       0.218015       0.218015       0.218015
3000000        0.922894       0.949982       0.949982       0.949982       0.217433       0.217982       0.217982       0.217982
4000000        0.909789       0.950044       0.950044       0.950044       0.215531       0.217854       0.217854       0.217854
5000000        0.899830       0.950042       0.950042       0.950042       0.219784       0.217859       0.217859       0.217859
6000000        0.890922       0.950006       0.950006       0.950006       0.218891       0.217933       0.217933       0.217933
7000000        0.884997       0.950047       0.950047       0.950047       0.215908       0.217848       0.217848       0.217848
8000000        0.879075       0.950082       0.950082       0.950082       0.213635       0.217776       0.217776       0.217776
9000000        0.873134       0.950091       0.950091       0.950091       0.214217       0.217758       0.217758       0.217758
10000000       0.868035       0.950095       0.950095       0.950095       0.219110       0.217749       0.217749       0.217749
11000000       0.865048       0.950076       0.950076       0.950076       0.220991       0.217788       0.217788       0.217788
12000000       0.862079       0.950086       0.950086       0.950086       0.218815       0.217768       0.217768       0.217768
13000000       0.859129       0.950118       0.950118       0.950118       0.216916       0.217701       0.217701       0.217701
14000000       0.856129       0.950086       0.950086       0.950086       0.215379       0.217768       0.217768       0.217768
15000000       0.853163       0.950096       0.950096       0.950096       0.213971       0.217746       0.217746       0.217746
16000000       0.850167       0.950074       0.950074       0.950074       0.212786       0.217793       0.217793       0.217793
17000000       0.847186       0.950069       0.950069       0.950069       0.211621       0.217803       0.217803       0.217803
18000000       0.844209       0.932068       0.950068       0.950068       0.210246       0.251630       0.217805       0.217805
19000000       0.841231       0.883011       0.950066       0.950066       0.209009       0.321407       0.217808       0.217808
20000000       0.838260       0.838861       0.950071       0.950071       0.207879       0.367659       0.217798       0.217798
21000000       0.835285       0.798915       0.950072       0.950072       0.206857       0.400811       0.217797       0.217797
22000000       0.832305       0.762601       0.950069       0.950069       0.205931       0.425489       0.217803       0.217803
23000000       0.829313       0.729444       0.950057       0.950057       0.205096       0.444247       0.217828       0.217828
24000000       0.826340       0.699051       0.950059       0.950059       0.204305       0.458671       0.217823       0.217823
25000000       0.823366       0.671089       0.950061       0.950061       0.203576       0.469818       0.217819       0.217819
26000000       0.820379       0.645278       0.950055       0.950055       0.203316       0.478429       0.217832       0.217832
27000000       0.817389       0.621378       0.950047       0.950047       0.202405       0.485044       0.217849       0.217849
28000000       0.816234       0.599186       0.950046       0.950046       0.201543       0.490063       0.217849       0.217849
29000000       0.816234       0.578525       0.950056       0.950056       0.200723       0.493795       0.217829       0.217829
30000000       0.816234       0.559241       0.950035       0.950035       0.200000       0.496478       0.217872       0.217872
31000000       0.816234       0.541201       0.950036       0.950036       0.199291       0.498300       0.217871       0.217871
32000000       0.816234       0.524288       0.950039       0.950039       0.198619       0.499410       0.217864       0.217864
33000000       0.816234       0.508400       0.950038       0.950038       0.197993       0.499929       0.217867       0.217867
34000000       0.816234       0.493448       0.950034       0.950034       0.197406       0.499957       0.217876       0.217876
35000000       0.816234       0.479349       0.950037       0.950037       0.196839       0.499573       0.217868       0.217868
36000000       0.816234       0.466034       0.950038       0.950038       0.196306       0.498845       0.217866       0.217866
37000000       0.816234       0.453438       0.950037       0.950037       0.195804       0.497827       0.217868       0.217868
38000000       0.816234       0.441506       0.950036       0.950036       0.195328       0.496567       0.217871       0.217871
39000000       0.816234       0.430185       0.950032       0.950032       0.194878       0.495102       0.217878       0.217878
</source>

Note how the average computed using floats and method 2 has already lost six-digit accuracy before even 1 million trials, while method 1 using floats reproduces the double results all the way out to 17 million trials. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/173.219.85.45|173.219.85.45]] ([[User talk:173.219.85.45|talk]]) 19:41, 19 October 2012 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->

Line 290 ⟶ 415:

The weighted variance pseudocode seems correct to me; I just did a back-of-the-envelope check (but feel free to check it more thoroughly!). I've removed the misleading comment in the pseudocode (''#[WARNING] This seems wrong. It should be moved before M2 assignment''). This comment doesn't seem to make much sense; one can easily show that it yields the wrong result for the trivial case in which the weights are all equal to one. [[User:Theaucitron|Theaucitron]] ([[User talk:Theaucitron|talk]]) 11:07, 21 August 2012 (UTC)

Talk:Algorithms for calculating variance: Difference between revisions