Talk:Independent and identically distributed random variables: Difference between revisions

Cewbot (talk | contribs)
m Maintain {{WPBS}} and vital articles: 1 WikiProject template. Create {{WPBS}}. Keep majority rating "Start" in {{WPBS}}. Remove 1 same rating as {{WPBS}} in {{WPStatistics}}.
 
(10 intermediate revisions by 7 users not shown)
Line 1:
{{WikiProject banner shell|class=Start|
{{WikiProject Statistics|importance=mid}}
 
}}
==Untitled==
The "Generalizations" section does not mention pairwise/k-wise independence (i.e. any pair/k-tuple in the sequence is independent, but larger subsets are not necessarily independent). Pairwise/k-wise independence is used in theoretical CS. --David Pal
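
A minimal sketch of the distinction raised above, assuming the standard textbook construction (two independent fair bits and their XOR), which is not taken from the article itself. It checks with exact fractions that every pair of the three variables is independent while the triple is not; the helper names <code>prob</code> and <code>pairwise_independent</code> are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space: two independent fair bits (x, y); z is their XOR.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
p = Fraction(1, len(outcomes))  # each of the 4 outcomes has probability 1/4


def prob(event):
    """Exact probability of an event given as a predicate on (x, y, z)."""
    return sum(p for o in outcomes if event(o))


def pairwise_independent(i, j):
    """Check P(Vi=a, Vj=b) == P(Vi=a) * P(Vj=b) for all bit values a, b."""
    return all(
        prob(lambda o: o[i] == a and o[j] == b)
        == prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)
        for a, b in product((0, 1), repeat=2)
    )


# Every pair (X, Y), (X, Z), (Y, Z) factorizes.
all_pairs_independent = all(
    pairwise_independent(i, j) for i, j in [(0, 1), (0, 2), (1, 2)]
)

# Mutual independence would need P(X=1, Y=1, Z=1) to equal the product of
# the three marginals, but that outcome never occurs because Z = X xor Y.
joint = prob(lambda o: o == (1, 1, 1))
product_of_marginals = (
    prob(lambda o: o[0] == 1) * prob(lambda o: o[1] == 1) * prob(lambda o: o[2] == 1)
)
mutually_independent = joint == product_of_marginals

print(all_pairs_independent, mutually_independent)
```

The same idea extends to k-wise independence by checking every k-tuple instead of every pair.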
 
==Wiki Education Foundation-supported course assignment==
[[File:Sciences humaines.svg|40px]] This article was the subject of a Wiki Education Foundation-supported course assignment, between <span class="mw-formatted-date" title="2021-08-27">27 August 2021</span> and <span class="mw-formatted-date" title="2021-12-19">19 December 2021</span>. Further details are available [[Wikipedia:Wiki_Ed/Cornell_University/Probability,_Statistics,_and_Data_Analysis_for_the_Physical_Sciences_(Fall_2021)|on the course page]]. Student editor(s): [[User:Hanshenli|Hanshenli]]. Peer reviewers: [[User:Yibeiiiii|Yibeiiiii]], [[User:C.Hua Wang|C.Hua Wang]], [[User:Joannetsai|Joannetsai]].
 
{{small|Above undated message substituted from [[Template:Dashboard.wikiedu.org assignment]] by [[User:PrimeBOT|PrimeBOT]] ([[User talk:PrimeBOT|talk]]) 22:56, 17 January 2022 (UTC)}}
 
== auto correlation ==
 
I think plain correlation can be used to determine whether observations are IID. Autocorrelation is simply the version for a time domain or similar. Correlation is more general, I think. <!-- Template:Unsigned --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Chrisparker126|Chrisparker126]] ([[User talk:Chrisparker126#top|talk]] • [[Special:Contributions/Chrisparker126|contribs]]) 18:40, 7 October 2019 (UTC)</small> <!--Autosigned by SineBot-->
== Link to German Version ==
 
Line 31 ⟶ 38:
In short, an informational relationship between the IUD page and the Uterine Malformation page would be a very helpful one. https://en.wikipedia.org/wiki/Uterine_malformation
 
^ This seems to be on the wrong page. This is IID, not IUD [[Special:Contributions/203.91.225.198|203.91.225.198]] ([[User talk:203.91.225.198|talk]]) 23:23, 25 January 2021 (UTC)
 
Not sure how to add the language link on this page. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/95.208.167.145|95.208.167.145]] ([[User talk:95.208.167.145|talk]]) 17:59, 8 May 2013 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
Line 38 ⟶ 46:
== IID consistency ==
 
After noticing the lead contained a mixture of both, I made a bold edit in favour of IID which I personally find less visually distracting than the dots in i.i.d. when the term is dropped into every second sentence. However, IID is not exactly beautiful, either, and typographically I would advise {{sc2|IID}} (i.e. <code><nowiki>{{sc2|IID}}</nowiki></code>), except that this is apparently discouraged in the MOS. This article might be the one where it makes sense to go against the recommended-style grain, though it's above my pay grade to decide this unilaterally. &mdash; [[user:MaxEnt|MaxEnt]] 00:57, 20 March 2017 (UTC)
 
:Your decision has my support, since dropping the [[full stops]] (or ''periods'') from [[initialism]]s (and other [[abbreviations]]) has been throughout the last century or more, and continues to be, a productive process in English writing, as seen in the following usages, for example:
Line 71 ⟶ 79:
 
White noise implies constant mean and variance and zero autocorrelation. Correlation only measures linear relationships, and hence does not imply independence, nor does it imply an identical probability distribution for every random variable in the sequence, since it only concerns itself with the first two mean-centered moments of the distribution. [[User:IntelligentET|IntelligentET]] ([[User talk:IntelligentET|talk]]) 22:13, 10 November 2018 (UTC)
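
To make the point about correlation concrete, here is a small sketch under an assumed toy example (X uniform on {-1, 0, 1} and Y = X², chosen for illustration and not taken from the discussion). The covariance is exactly zero, yet the joint probability does not factorize, so the variables are uncorrelated but dependent.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y is fully determined by X as Y = X**2.
xs = (-1, 0, 1)
p = Fraction(1, 3)
pairs = [(x, x * x) for x in xs]


def expect(f):
    """Exact expectation of f(X, Y) over the three equally likely outcomes."""
    return sum(p * f(x, y) for x, y in pairs)


# Cov(X, Y) = E[XY] - E[X] E[Y]; here E[XY] = E[X^3] = 0 and E[X] = 0.
covariance = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = sum(p for x, y in pairs if x == 0 and y == 0)
p_x0 = sum(p for x, y in pairs if x == 0)
p_y0 = sum(p for x, y in pairs if y == 0)
independent = p_joint == p_x0 * p_y0

print(covariance, independent)
```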
 
== Machine Learning section ==
 
The "In machine learning" section has a number of serious issues.
First, it is much too detailed and specific for an article on as general a statistical concept as i.i.d. random variables. Even inclusion of a sentence that amounts to something like "In machine learning, each vector of variables in a dataset is often assumed to be an i.i.d. random vector" would be of doubtful value in this article. The assumption of i.i.d. sampling is pervasive across various applications of statistical analysis of data, serving as the simplest assumption about a data-generating process. Mentioning machine learning can mislead a reader by giving the impression the assumption is particular to machine learning (while in fact it's independent(!) of it).
Second, there are claims that are normative (e.g. "currently acquired massive quantities of data to deliver faster, more accurate results") and ill-defined (e.g. "The computer is very efficient to calculate multiple additions, but it is not efficient to calculate the multiplication").
Third, of the two URLs linked as references in the section, one no longer works and the other is not in English, which is not suitable for the English language version of Wikipedia.
Fourth, the section is written as an answer to the question posed at its beginning: "Why assume the data in machine learning are independent and identically distributed?". The gist of the answer provided is that the log-likelihood function is additive, a simplification that makes for a more tractable optimization problem. But this again isn't particular to machine learning, but to maximum likelihood estimation. Moreover, i.i.d. sampling does not mean that the distribution function is known, so this is implicitly being assumed by the section. And then there's the fact that most machine learning methods are quite different from maximum likelihood. Finally, a good answer to this question would tackle the numerous problems with the assumption of independence in many real-world datasets, due to sample selection, autocorrelation, unobserved confounders, etc.
 
[[User:Undsoweiter|Undsoweiter]] ([[User talk:Undsoweiter|talk]]) 09:00, 13 January 2022 (UTC)
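
For reference, the additivity mentioned in the fourth point can be sketched as follows. This assumes a normal model with known parameters and a handful of made-up observations (both are illustrative assumptions, which is exactly the "known distribution" caveat raised above). Under i.i.d. sampling the likelihood is a product of per-observation densities, so its logarithm is a sum.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical observations and parameters, chosen only for illustration.
data = [0.3, -1.2, 0.8, 2.1, -0.5]
mu, sigma = 0.0, 1.0

# Under i.i.d. sampling the joint density factorizes into a product ...
likelihood = math.prod(normal_pdf(x, mu, sigma) for x in data)

# ... so the log-likelihood is the sum of per-observation log densities.
log_likelihood = sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

print(math.isclose(math.log(likelihood), log_likelihood))
```

Nothing here is specific to machine learning; it is a property of maximum likelihood estimation under the i.i.d. assumption.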
 
I fully agree with @Undsoweiter. There is no reason for explicitly mentioning machine learning. Moreover, I also do not understand the first reason at the end. Why is the central limit theorem of any relevance at this point? One is not adding together random variables during likelihood optimization. [[User:Nmdwolf|Nmdwolf]] ([[User talk:Nmdwolf|talk]]) 15:01, 26 May 2022 (UTC)
 
== Degraded quality of article due to edits since November 2021 ==
 
While I understand the value of having students edit Wikipedia articles for a class, numerous issues have been introduced into the article. These include: the use of the pronoun "we"; the inconsistent math fonts for independence of events (which also has other issues); the machine learning section as detailed above in a separate comment; the elimination of important examples illustrating how the i.i.d. sampling can be a flawed assumption; the unnecessary mentioning of "data mining" and "signal processing"; etc. With the semester already finished, it is doubtful the issues will be remedied by members of the class. I think all edits since November 2021 should be undone.
[[User:Undsoweiter|Undsoweiter]] ([[User talk:Undsoweiter|talk]]) 09:12, 13 January 2022 (UTC)