Talk:Independent and identically distributed random variables: Difference between revisions
AthalGolwen (talk | contribs) Update Probability, Statistics, and Data Analysis for the Physical Sciences assignment details
Undsoweiter (talk | contribs) →Machine Learning section: new section
White noise implies constant mean and variance and zero autocorrelation. Correlation only measures linear relationships, so it implies neither independence nor an identical distribution for every random variable in the sequence, since white noise only constrains the first two mean-centered moments of the distribution. [[User:IntelligentET|IntelligentET]] ([[User talk:IntelligentET|talk]]) 22:13, 10 November 2018 (UTC)
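The "uncorrelated but not independent" point above can be checked numerically. This is a minimal sketch (the example itself, X standard normal and Y = X², is a standard textbook choice, not taken from the article): corr(X, Y) is zero because E[X³] = 0, yet Y is a deterministic function of X, so the pair is as dependent as possible.

```python
import random

# Draw an i.i.d. standard normal sample X, and set Y = X**2.
# Y is a deterministic function of X, yet corr(X, Y) = 0.
random.seed(42)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
ys = [x * x for x in xs]

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    ma, mb = mean(a), mean(b)
    cov = mean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    sa = mean([(x - ma) ** 2 for x in a]) ** 0.5
    sb = mean([(y - mb) ** 2 for y in b]) ** 0.5
    return cov / (sa * sb)

# The sample correlation is close to 0 despite perfect dependence.
print(corr(xs, ys))
```

With 100,000 draws the sample correlation sits within a few hundredths of zero, even though knowing X determines Y exactly.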
== Machine Learning section ==
The "In machine learning" section has a number of serious issues.
First, it is much too detailed and specific for an article on as general a statistical concept as i.i.d. random variables. Even inclusion of a sentence that amounts to something like "In machine learning, each vector of variables in a dataset is often assumed to be an i.i.d. random vector" would be of doubtful value in this article. The assumption of i.i.d. sampling is pervasive across various applications of statistical analysis of data, serving as the simplest assumption about a data-generating process. Mentioning machine learning can mislead a reader by giving the impression the assumption is particular to machine learning (while in fact it's independent(!) of it).
Second, there are claims that are normative (e.g. "currently acquired massive quantities of data to deliver faster, more accurate results") and ill-defined (e.g. "The computer is very efficient to calculate multiple additions, but it is not efficient to calculate the multiplication").
Third, of the two URLs linked as references in the section, one no longer works and the other is not in English, which is not suitable for the English language version of Wikipedia.
Fourth, the section is written as an answer to the question posed at its beginning: "Why assume the data in machine learning are independent and identically distributed?". The gist of the answer provided is that the log-likelihood function becomes additive, a simplification that makes for a more tractable optimization problem. But this, again, is not particular to machine learning; it is a property of maximum likelihood estimation. Moreover, i.i.d. sampling does not mean that the distribution function is known, so the section is implicitly assuming this as well. And then there is the fact that most machine learning methods are quite different from maximum likelihood. Finally, a good answer to this question would tackle the numerous problems with the assumption of independence in many real-world datasets, due to sample selection, autocorrelation, unobserved confounders, etc.
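The additivity point can be made concrete with a small sketch (an illustrative normal model, not the article's own code): under i.i.d. sampling the joint log-likelihood is a plain sum of per-observation terms, and for this model the maximizer even has a closed form.

```python
import math
import random

# Hypothetical i.i.d. sample from N(0, 1), used only for illustration.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]

def normal_log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(mu, sigma):
    # Under i.i.d. sampling the joint density factorizes, so the
    # log-likelihood is a sum -- the simplification the section invokes.
    return sum(normal_log_pdf(x, mu, sigma) for x in data)

# For the normal model the MLE is the sample mean and the (biased)
# sample standard deviation, in closed form.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

print(mu_hat, sigma_hat)
```

Note that none of this depends on "machine learning": the additivity comes from the i.i.d. factorization plus the choice of maximum likelihood as the estimation principle, and it presumes the parametric family is known.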
[[User:Undsoweiter|Undsoweiter]] ([[User talk:Undsoweiter|talk]]) 09:00, 13 January 2022 (UTC)