Statistical hypothesis test: Difference between revisions

AnomieBOT (talk | contribs)
Rescuing orphaned refs ("kline" from rev 1299244632)
m Clarified philosophical and historical context of Fisher vs. Neyman–Pearson debate, improved grammar, and added citations to foundational works.
Tags: possible formatting issues Reverted use of deprecated (unreliable) source
Line 29:
Neyman & Pearson considered a different problem to Fisher (which they called "hypothesis testing"). They initially considered two simple hypotheses (both with frequency distributions). They calculated two probabilities and typically selected the hypothesis associated with the higher probability (the hypothesis more likely to have generated the sample). Their method always selected a hypothesis. It also allowed the calculation of both types of error probabilities.
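The selection rule described above can be illustrated with a minimal sketch. It is not taken from the Neyman and Pearson paper: the two normal distributions (means 0 and 1, unit variance) and the function names `log_likelihood`, `select_hypothesis`, and `error_probabilities` are assumptions chosen only for illustration. Picking the hypothesis with the higher likelihood corresponds to a likelihood-ratio threshold of 1; a test at a prescribed significance level α would use a different threshold.

```python
import math
from statistics import NormalDist


def log_likelihood(sample, mu, sigma=1.0):
    """Log-likelihood of an i.i.d. normal sample under mean mu and s.d. sigma."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (x - mu) ** 2 / (2 * sigma ** 2) for x in sample)


def select_hypothesis(sample, mu0=0.0, mu1=1.0, sigma=1.0):
    """Return the simple hypothesis under which the sample is more likely.

    The rule always selects one of the two hypotheses (ties go to H0).
    """
    ll0 = log_likelihood(sample, mu0, sigma)
    ll1 = log_likelihood(sample, mu1, sigma)
    return "H1" if ll1 > ll0 else "H0"


def error_probabilities(n, mu0=0.0, mu1=1.0, sigma=1.0):
    """Type I and type II error probabilities of the rule above for sample size n.

    Assumes equal variances and mu1 > mu0, in which case "higher likelihood
    under H1" is equivalent to "sample mean above the midpoint of mu0 and mu1".
    """
    cutoff = (mu0 + mu1) / 2
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    type_1 = 1 - NormalDist(mu0, se).cdf(cutoff)  # select H1 although H0 is true
    type_2 = NormalDist(mu1, se).cdf(cutoff)      # select H0 although H1 is true
    return type_1, type_2
```

For example, `select_hypothesis([0.9, 1.2, 0.7, 1.1])` selects `"H1"` because the sample mean lies above the midpoint 0.5, and `error_probabilities(4)` gives both error probabilities as roughly 0.159 under these assumed distributions.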
 
The development of statistical hypothesis testing was shaped by a long-standing philosophical dispute between [[Ronald Fisher]] and [[Jerzy Neyman]] with [[Egon Pearson]]. Neyman and Pearson introduced a formal decision-theoretic framework in their 1933 paper, culminating in the [[Neyman–Pearson lemma]], which defines the most powerful test for a given significance level α when comparing two simple hypotheses.<ref name="NeymanPearson1933">{{cite journal|last1=Neyman|first1=Jerzy|last2=Pearson|first2=Egon S.|year=1933|title=On the Problem of the Most Efficient Tests of Statistical Hypotheses|journal=Philosophical Transactions of the Royal Society A|volume=231|pages=289–337}}</ref><ref name="NPlemma">{{cite web|title=Neyman–Pearson lemma|url=https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma|website=Wikipedia}}</ref>
Fisher and Neyman/Pearson clashed bitterly. Neyman/Pearson considered their formulation to be an improved generalization of significance testing (the defining paper<ref name="Neyman 289–337" /> was [[Neyman–Pearson lemma|abstract]]; mathematicians have generalized and refined the theory for decades<ref name="Lehmann93" />). Fisher thought that it was not applicable to scientific research because often, during the course of the experiment, it is discovered that the initial assumptions about the null hypothesis are questionable due to unexpected sources of error. He believed that the use of rigid reject/accept decisions based on models formulated before data is collected was incompatible with this common scenario faced by scientists, and that attempts to apply this method to scientific research would lead to mass confusion.<ref>{{cite journal|last=Fisher|first=R. A.|year=1958|title=The Nature of Probability|url=http://www.york.ac.uk/depts/maths/histstat/fisher272.pdf|journal=Centennial Review|volume=2|pages=261–274|quote=We are quite in danger of sending highly trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be. In this century, of course, they will be working on guided missiles and advising the medical profession on the control of disease, and there is no limit to the extent to which they could impede every sort of national effort.}}
</ref>
 
Fisher, in contrast, advocated for significance testing as a method of inductive inference. He emphasized the use of the [[p-value]] as a continuous measure of evidence against the null hypothesis, without rigid accept/reject rules. In his 1958 essay *The Nature of Probability*, Fisher criticized the Neyman–Pearson approach as overly mechanical and ill-suited for scientific research, where assumptions often evolve during experimentation.<ref name="Fisher1958">{{cite journal|last=Fisher|first=R. A.|year=1958|title=The Nature of Probability|journal=Centennial Review|volume=2|pages=261–274|url=http://www.york.ac.uk/depts/maths/histstat/fisher272.pdf}}</ref>
The dispute between Fisher and Neyman–Pearson was waged on philosophical grounds, characterized by a philosopher as a dispute over the proper role of models in statistical inference.<ref name="Lenhard">{{cite journal|last=Lenhard|first=Johannes|year=2006|title=Models and Statistical Inference: The Controversy between Fisher and Neyman–Pearson|journal=Br. J. Philos. Sci.|volume=57|pages=69–91|doi=10.1093/bjps/axi152|s2cid=14136146}}</ref>
 
Fisher argued that pre-specified models and binary decisions could mislead researchers, stating:
Events intervened: Neyman accepted a position in the [[University of California, Berkeley]] in 1938, breaking his partnership with Pearson and separating the disputants (who had occupied the same building). [[World War II]] provided an intermission in the debate. The dispute between Fisher and Neyman terminated (unresolved after 27 years) with Fisher's death in 1962. Neyman wrote a well-regarded eulogy.<ref>{{cite journal|last1=Neyman|first1=Jerzy|year=1967|title=RA Fisher (1890—1962): An Appreciation.|journal=Science|volume=156|issue=3781|pages=1456–1460|bibcode=1967Sci...156.1456N|doi=10.1126/science.156.3781.1456|pmid=17741062|s2cid=44708120}}</ref> Some of Neyman's later publications reported ''p''-values and significance levels.<ref>{{cite journal|last1=Losavich|first1=J. L.|last2=Neyman|first2=J.|last3=Scott|first3=E. L.|last4=Wells|first4=M. A.|year=1971|title=Hypothetical explanations of the negative apparent effects of cloud seeding in the Whitetop Experiment.|journal=Proceedings of the National Academy of Sciences of the United States of America|volume=68|issue=11|pages=2643–2646|bibcode=1971PNAS...68.2643L|doi=10.1073/pnas.68.11.2643|pmc=389491|pmid=16591951|doi-access=free}}</ref>
> “We are quite in danger of sending highly trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be.”<ref name="Fisher1958" />
 
The dispute was not merely technical but philosophical. Neyman focused on **inductive behavior** (making decisions under uncertainty), while Fisher emphasized **inductive inference**, aiming to draw conclusions from statistical data.<ref name="Lenhard">{{cite journal|last=Lenhard|first=Johannes|year=2006|title=Models and Statistical Inference: The Controversy between Fisher and Neyman–Pearson|journal=British Journal for the Philosophy of Science|volume=57|pages=69–91|doi=10.1093/bjps/axi152|s2cid=14136146}}</ref>
 
In 1938, Neyman moved to the [[University of California, Berkeley]], ending his collaboration with Pearson and geographically distancing himself from Fisher. [[World War II]] interrupted the debate, which remained unresolved until Fisher’s death in 1962. Neyman later published a respectful eulogy in *Science*, acknowledging Fisher’s contributions despite their differences.<ref>{{cite journal|last=Neyman|first=Jerzy|year=1967|title=R. A. Fisher (1890–1962): An Appreciation|journal=Science|volume=156|issue=3781|pages=1456–1460|doi=10.1126/science.156.3781.1456|pmid=17741062}}</ref>
 
Interestingly, Neyman’s later publications included p-values and significance levels, blurring the lines between the two schools of thought.<ref>{{cite journal|last1=Losavich|first1=J. L.|last2=Neyman|first2=J.|last3=Scott|first3=E. L.|last4=Wells|first4=M. A.|year=1971|title=Hypothetical explanations of the negative apparent effects of cloud seeding in the Whitetop Experiment|journal=PNAS|volume=68|issue=11|pages=2643–2646|doi=10.1073/pnas.68.11.2643}}</ref>
 
==={{anchor|NHST}}Null hypothesis significance testing (NHST)===