m Task 70: Update syntaxhighlight tags - remove use of deprecated <source> tags |
m 2 revisions imported: import old edits from "Algorithms for calculating variance/Talk" in the August 2001 database dump |
(6 intermediate revisions by 6 users not shown) | |||
Line 1:
{{WikiProject
{{WikiProject Statistics| importance = mid }}
{{WikiProject Mathematics| importance = mid }}
}}
== Online algorithm in testing yields horrible results when mean=~0 ==
Line 244 ⟶ 245:
:The Welford algorithm as written calculates the sample variance in the finalize routine, and nowhere in the description does it specifically call that out. <!-- Template:Unsigned --><small class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:192.91.171.34|192.91.171.34]] ([[User talk:192.91.171.34#top|talk]] • [[Special:Contributions/192.91.171.34|contribs]]) </small>
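To make the distinction above concrete, here is a minimal sketch of a Welford-style update/finalize pair that reports both variances under explicit names. The function names and the returned dictionary keys are assumptions chosen for illustration, not a copy of the article's code.

```python
import math


def update(state, x):
    """Fold one new value x into the running (count, mean, M2) state."""
    count, mean, m2 = state
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)  # uses the already-updated mean
    return count, mean, m2


def finalize(state):
    """Return the mean plus both variances, each under an explicit name."""
    count, mean, m2 = state
    return {
        "mean": mean if count > 0 else math.nan,
        # population variance divides by n
        "population_variance": m2 / count if count > 0 else math.nan,
        # sample variance divides by n - 1 and needs at least two values
        "sample_variance": m2 / (count - 1) if count > 1 else math.nan,
    }


state = (0, 0.0, 0.0)
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    state = update(state, value)
result = finalize(state)
```

For this data the sum of squared deviations is 32, so the population variance is 32/8 and the sample variance is 32/7; naming the two outputs separately avoids the ambiguity raised above.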
===Parallel algorithm===
Is there an error in <math>M_{2,AB}</math>?
Chan et al., in their paper,<ref name=":0">{{Citation
| last1 = Chan | first1 = Tony F. | author1-link = Tony F. Chan
| last2 = Golub | first2 = Gene H. | author2-link = Gene H. Golub
| last3 = LeVeque | first3 = Randall J.
| contribution = Updating Formulae and a Pairwise Algorithm for Computing Sample Variances
| title = Technical Report STAN-CS-79-773
| publisher = Department of Computer Science, Stanford University
| year = 1979
| contribution-url = http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf }}</ref> give another relation for combining arbitrary sets <math>A</math> and <math>B</math>:
<math>M_{2,AB} = M_{2,A} + M_{2,B} + \frac{n_A}{n_B n_{AB}} \cdot \left( \frac{n_B}{n_A} \bar x_A - \bar x_B \right)^2 </math>, which reduces to <math>M_{2,AB} = M_{2,A} + M_{2,B} + \frac{1}{n_{AB}} \cdot \left( \bar x_A - \bar x_B \right)^2 </math> if <math>n_A</math> and <math>n_B</math> are equal.
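A quick numerical check may help settle the question. The sketch below assumes (as a reading of the report, not something established in this thread) that the quantities inside the parentheses of the Chan et al. relation are the sums <math>T_A</math> and <math>T_B</math> rather than the means. All function names are made up for illustration.

```python
def m2(xs):
    """Sum of squared deviations from the mean, computed directly (two-pass)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def combine_delta(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Combination written with the difference of means."""
    delta = mean_b - mean_a
    return m2_a + m2_b + delta ** 2 * n_a * n_b / (n_a + n_b)


def combine_sums(n_a, t_a, m2_a, n_b, t_b, m2_b):
    """Same shape as the relation quoted above, but fed the sums t_a and t_b
    instead of the means (this substitution is the assumption being tested)."""
    n_ab = n_a + n_b
    return m2_a + m2_b + n_a / (n_b * n_ab) * (n_b / n_a * t_a - t_b) ** 2


a = [1.0, 2.0, 4.0]
b = [3.0, 7.0, 8.0, 10.0]

direct = m2(a + b)
via_delta = combine_delta(len(a), sum(a) / len(a), m2(a),
                          len(b), sum(b) / len(b), m2(b))
via_sums = combine_sums(len(a), sum(a), m2(a), len(b), sum(b), m2(b))
# Feeding the means into the sum-based form, as in the relation quoted above:
via_means = combine_sums(len(a), sum(a) / len(a), m2(a),
                         len(b), sum(b) / len(b), m2(b))
```

With sums in the parentheses, `via_sums` and `via_delta` both match the directly computed value, while `via_means` does not. Under that reading, the case <math>n_A = n_B = n</math> gives a cross term of <math>\tfrac{n}{2} (\bar x_A - \bar x_B)^2</math>, which is what the difference-of-means form gives as well.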
=== Algorithm IV ===
Line 343 ⟶ 357:
: Btw, I don't understand your comment at the beginning since the problematic division is outside the loop. [[User:McKay|McKay]] ([[User talk:McKay|talk]]) 01:02, 10 April 2009 (UTC)
::: It should not say "else variance = 0". That implies the sample variance is 0 when ''n'' = 1. That is incorrect. The sample variance is undefined in that case. [[User:Michael Hardy|Michael Hardy]] ([[User talk:Michael Hardy|talk]]) 01:18, 10 April 2009 (UTC)
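One way to reflect the point above in code is to report "undefined" explicitly rather than 0. The following is only a sketch: the function name is hypothetical, and returning NaN is one possible convention (raising an error for fewer than two values, as Python's `statistics.variance` does, is another).

```python
import math


def sample_variance(data):
    """Two-pass sample variance; NaN signals 'undefined' when n < 2."""
    n = len(data)
    if n < 2:
        # With a single observation the n - 1 divisor is zero,
        # so the sample variance is undefined rather than 0.
        return math.nan
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


single = sample_variance([3.0])            # NaN, not 0
several = sample_variance([1.0, 2.0, 3.0, 4.0])
```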
{{reflist-talk}}
== Easiest online algorithm yet. ==
Line 592 ⟶ 608:
<math>\begin{aligned} s_n^2 &= \frac{M_{2,n}}{n-1} \\[4pt] \sigma_n^2 &= \frac{M_{2,n}}{n} \end{aligned}</math>
[[User:ProfRB|ProfRB]] ([[User talk:ProfRB|talk]]) 18:51, 22 March 2019 (UTC)
:2. I think the comment "These formulas suffer from numerical instability, as they repeatedly subtract a small number from a big number which scales with n" is actually wrong. Maybe there is an '''accuracy''' issue, but I don't see why there should be an instability here. A reference would be most welcome. [[User:Natchouf|Natchouf]] ([[User talk:Natchouf|talk]]) 14:08, 20 March 2023 (UTC)
== Typo in "Computing shifted data" section ==
The second formula shown in this section does not compute the population variance <math>\sigma^2</math>; rather, it computes the sample variance <math>s^2</math>. This is easy to see by comparing the (wrong) divisor <math>n-1</math> in that formula with the divisor <math>n</math> used in the "Naive algorithm" section just above (where capital <math>N</math> is used instead of <math>n</math>).
The comments in the corresponding code snippet below make the situation a bit clearer. Maybe one could write the code explicitly as
...
# for sample variance use
variance = (Ex2 - Ex**2 / n) / (n - 1)
# for population variance use
# variance = (Ex2 - Ex**2 / n) / n
...
[[Special:Contributions/141.249.133.134|141.249.133.134]] ([[User talk:141.249.133.134|talk]]) 06:17, 10 April 2024 (UTC)
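For reference, a self-contained sketch of the shifted-data computation that returns both quantities side by side. The function name and the choice of the first element as the shift <code>K</code> are assumptions for illustration; the divisors follow the snippet proposed above.

```python
def shifted_data_variances(data):
    """Return (sample_variance, population_variance) using shifted data."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values for the sample variance")
    k = data[0]  # shift; any value near the mean would do
    ex = 0.0     # running sum of (x - K)
    ex2 = 0.0    # running sum of (x - K)**2
    for x in data:
        d = x - k
        ex += d
        ex2 += d * d
    sum_sq_dev = ex2 - ex ** 2 / n
    sample = sum_sq_dev / (n - 1)   # divisor n - 1: sample variance s^2
    population = sum_sq_dev / n     # divisor n: population variance sigma^2
    return sample, population


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
sample, population = shifted_data_variances(data)
print(sample, population)  # 30.0 22.5
```

The two results differ only in the final divisor, which is the distinction the formula in the section should make explicit.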