Statistical hypothesis test

The dispute between Fisher and Neyman–Pearson was waged on philosophical grounds, characterized by a philosopher as a dispute over the proper role of models in statistical inference.<ref name=Lenhard>{{cite journal|last=Lenhard|first=Johannes|title=Models and Statistical Inference: The Controversy between Fisher and Neyman–Pearson|journal=Br. J. Philos. Sci.|volume=57|pages=69–91|year=2006|doi=10.1093/bjps/axi152|s2cid=14136146}}</ref>
 
Events intervened: Neyman accepted a position in the western hemisphere, breaking his partnership with Pearson and separating disputants (who had occupied the same building) by much of the planetary diameter. World War II provided an intermission in the debate. The dispute between Fisher and Neyman terminated (unresolved after 27 years) with Fisher's death in 1962. Neyman wrote a well-regarded eulogy.<ref>{{cite journal|last1=Neyman|first1=Jerzy|title=RA Fisher (1890—1962): An Appreciation.|journal=Science|volume=156|issue=3781|pages=1456–1460|year=1967|doi=10.1126/science.156.3781.1456|pmid=17741062|bibcode=1967Sci...156.1456N|s2cid=44708120}}</ref> Some of Neyman's later publications reported ''p''-values and significance levels.<ref>{{cite journal|last1=Losavich|first1=J. L.|last2=Neyman|first2=J.|last3=Scott|first3=E. L.|last4=Wells|first4=M. A.|title=Hypothetical explanations of the negative apparent effects of cloud seeding in the Whitetop Experiment.|journal=Proceedings of the National Academy of Sciences of the United States of America|year=1971|volume=68|issue=11|pages=2643–2646|doi=10.1073/pnas.68.11.2643|pmid=16591951|pmc=389491|bibcode=1971PNAS...68.2643L|doi-access=free}}</ref>
 
The modern version of hypothesis testing is a hybrid of the two approaches that resulted from confusion by writers of statistical textbooks (as predicted by Fisher) beginning in the 1940s.<ref name="Halpin 625–653">{{cite journal|last1=Halpin|first1=P F|title=Inductive Inference or Inductive Behavior: Fisher and Neyman: Pearson Approaches to Statistical Testing in Psychological Research (1940–1960)|journal=The American Journal of Psychology|date=Winter 2006 |volume=119|issue=4|pages=625–653|jstor=20445367|doi=10.2307/20445367|pmid=17286092|last2=Stam|first2=HJ}}</ref> (But [[Detection theory|signal detection]], for example, still uses the Neyman/Pearson formulation.) Great conceptual differences and many caveats in addition to those mentioned above were ignored. Neyman and Pearson provided the stronger terminology, the more rigorous mathematics and the more consistent philosophy, but the subject taught today in introductory statistics has more similarities with Fisher's method than theirs.<ref name=Gigerenzer>{{cite book|title=The Empire of Chance: How Probability Changed Science and Everyday Life|last=Gigerenzer|first=Gerd|author2=Zeno Swijtink |author3=Theodore Porter |author4=Lorraine Daston |author5=John Beatty |author6=Lorenz Kruger |year=1989|publisher=Cambridge University Press|chapter=Part 3: The Inference Experts|isbn=978-0-521-39838-1|pages=70–122}}</ref>
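The contrast between the two formulations can be sketched in a few lines of Python (illustrative only; the sample data and hypotheses are made up, and a large-sample normal approximation is assumed). Fisher's approach reports the ''p''-value as a graded index of evidence; Neyman–Pearson fixes α before seeing the data and emits a binary decision:

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

# Hypothetical sample; one-sample z-test of H0: mu = 0 (illustrative only).
data = [0.8, 1.2, -0.3, 0.9, 1.5, 0.4, 1.1, 0.7, 1.3, 0.2]
n = len(data)
z = mean(data) / (stdev(data) / sqrt(n))   # test statistic
p = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value

# Fisher: report the p-value as a graded measure of evidence.
print(f"p = {p:.2g}")

# Neyman-Pearson: fix alpha in advance, then emit a binary decision.
alpha = 0.05
print("reject H0" if p < alpha else "fail to reject H0")
```

The hybrid taught in textbooks typically does both: it fixes α = 0.05 yet also reports the exact ''p''-value.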
*"[I]t does not tell us what we want to know".<ref name=cohen94/> Lists of dozens of complaints are available.<ref name=kline/><ref name="nickerson">{{cite journal|author=Nickerson, Raymond S.|title=Null Hypothesis Significance Tests: A Review of an Old and Continuing Controversy|journal=Psychological Methods|volume=5|issue=2|pages=241–301|year=2000|doi=10.1037/1082-989X.5.2.241|pmid=10937333|s2cid=28340967|url= https://semanticscholar.org/paper/8c5e0e6f85b9dc15ecf23d43a49404925c4c41bf}}</ref><ref name="branch">{{cite journal|author=Branch, Mark|title=Malignant side effects of null hypothesis significance testing|journal=Theory & Psychology|volume=24|issue=2|pages=256–277|year=2014|doi=10.1177/0959354314525282|s2cid=40712136|url=https://semanticscholar.org/paper/48f8711f3ca3535192ce695fa987847725374b0e}}</ref>
 
Critics and supporters are largely in factual agreement regarding the characteristics of null hypothesis significance testing (NHST): while it can provide critical information, it is ''inadequate as the sole tool for statistical analysis''. ''Successfully rejecting the null hypothesis may offer no support for the research hypothesis'', although adequate research design can mitigate this problem. The continuing controversy concerns the selection of the best statistical practices for the near-term future given the existing practices. Critics would prefer to ban NHST completely, forcing a complete departure from those practices,<ref>{{cite journal |last1=Hunter |first1=John E. |title=Needed: A Ban on the Significance Test |journal=Psychological Science |date=January 1997 |volume=8 |issue=1 |pages=3–7 |doi=10.1111/j.1467-9280.1997.tb00534.x|s2cid=145422959 }}</ref> while supporters suggest a less absolute change.{{citation needed|date=December 2015}}
 
Controversy over significance testing, and its effects on publication bias in particular, has produced several results. The American Psychological Association has strengthened its statistical reporting requirements after review,<ref name=wilkinson>{{cite journal|author=Wilkinson, Leland|title=Statistical Methods in Psychology Journals: Guidelines and Explanations|journal=American Psychologist|volume=54|issue=8|pages=594–604|year=1999|doi=10.1037/0003-066X.54.8.594}} "Hypothesis tests. It is hard to imagine a situation in which a dichotomous accept-reject decision is better than reporting an actual p value or, better still, a confidence interval." (p 599). The committee used the cautionary term "forbearance" in describing its decision against a ban of hypothesis testing in psychology reporting. (p 603)</ref> medical journal publishers have recognized the obligation to publish some results that are not statistically significant to combat publication bias<ref>{{cite web|url=http://www.icmje.org/publishing_1negative.html|title=ICMJE: Obligation to Publish Negative Studies|access-date=September 3, 2012|quote=Editors should seriously consider for publication any carefully done study of an important question, relevant to their readers, whether the results for the primary or any additional outcome are statistically significant. Failure to submit or publish findings because of lack of statistical significance is an important cause of publication bias.|url-status=dead|archive-url=https://web.archive.org/web/20120716211637/http://www.icmje.org/publishing_1negative.html|archive-date=July 16, 2012|df=mdy-all}}</ref> and a journal (''Journal of Articles in Support of the Null Hypothesis'') has been created to publish such results exclusively.<ref name=JASNH>''Journal of Articles in Support of the Null Hypothesis'' website: [http://www.jasnh.com/ JASNH homepage]. 
Volume 1 number 1 was published in 2002, and all articles are on psychology-related subjects.</ref> Textbooks have added some cautions<ref>{{cite book|title=Statistical Methods for Psychology|last=Howell|first=David|year=2002|publisher=Duxbury|edition=5|isbn=978-0-534-37770-0|page=[https://archive.org/details/statisticalmetho0000howe/page/94 94]|url= https://archive.org/details/statisticalmetho0000howe/page/94}}</ref> and increased coverage of the tools necessary to estimate the size of the sample required to produce significant results. Major organizations have not abandoned use of significance tests although some have discussed doing so.<ref name=wilkinson/>
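The sample-size tools mentioned above usually rest on a power calculation; a minimal sketch (assuming a two-sided, two-sample test with equal group sizes, a standardized effect size, and the normal approximation rather than the exact ''t''-based computation) is:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample test detecting a standardized effect size d."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = nd.inv_cdf(power)            # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A "medium" effect needs far smaller samples than a "small" one.
print(sample_size_per_group(0.5))
print(sample_size_per_group(0.2))
```

Exact power routines (which use the noncentral ''t'' distribution) give slightly larger values.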
A unifying position of critics is that statistics should not lead to an accept-reject conclusion or decision, but to an estimated value with an [[interval estimate]]; this data-analysis philosophy is broadly referred to as [[estimation statistics]]. Estimation statistics can be accomplished with either frequentist [https://www.ncbi.nlm.nih.gov/pubmed/31217592] or Bayesian methods.<ref>{{cite journal|last=Kruschke|first=J K|title=Bayesian Estimation Supersedes the T Test|journal=Journal of Experimental Psychology: General|date=July 9, 2012 |volume=142|issue=2|pages=573–603|doi=10.1037/a0029146|pmid=22774788|url=http://www.indiana.edu/~kruschke/articles/Kruschke2012JEPG.pdf}}</ref>
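As a sketch of the estimation approach (frequentist, large-sample normal approximation; the data are hypothetical), the analyst reports an interval estimate rather than an accept-reject decision:

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

def confidence_interval(data, level=0.95):
    """Large-sample normal-approximation confidence interval for the
    mean: an interval estimate reported instead of a binary decision."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = stdev(data) / sqrt(len(data))
    m = mean(data)
    return m - z * se, m + z * se

# Made-up measurements; the interval conveys both location and precision.
lo, hi = confidence_interval([2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1])
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```
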
 
One strong critic of significance testing suggested a list of reporting alternatives:<ref name=Armstrong1>{{cite journal|author=Armstrong, J. Scott|title=Significance tests harm progress in forecasting|journal=International Journal of Forecasting|volume=23|pages=321–327|year=2007|url=http://repository.upenn.edu/cgi/viewcontent.cgi?article=1104&context=marketing_papers|doi=10.1016/j.ijforecast.2007.03.004|issue=2|citeseerx=10.1.1.343.9516|s2cid=1550979}}</ref> effect sizes for importance, prediction intervals for confidence, replications and extensions for replicability, meta-analyses for generality. None of these suggested alternatives produces a conclusion/decision. Lehmann said that hypothesis testing theory can be presented in terms of conclusions/decisions, probabilities, or confidence intervals. "The distinction between the ... approaches is largely one of reporting and interpretation."<ref name=Lehmann97>{{cite journal|author=E. L. Lehmann|title=Testing Statistical Hypotheses: The Story of a Book|journal=Statistical Science|volume=12|issue=1|pages=48–52|year=1997|doi=10.1214/ss/1029963261|doi-access=free}}</ref>
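Two of the suggested reporting alternatives, effect sizes and meta-analysis, can be sketched in a few lines (a minimal illustration using Cohen's d and inverse-variance fixed-effect pooling; the study numbers are hypothetical):

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized effect size: difference in means over the pooled SD."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                     / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd

def fixed_effect_meta(effects, variances):
    """Inverse-variance weighted pooled effect across several studies."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Three hypothetical studies of the same effect, pooled for generality.
pooled = fixed_effect_meta([0.4, 0.6, 0.5], [0.04, 0.09, 0.05])
print(f"pooled effect: {pooled:.3f}")
```

Neither quantity is a conclusion or decision; both summarize magnitude, which a bare ''p''-value does not.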
 
On one "alternative" there is no disagreement: Fisher himself said,<ref name=fisher /> "In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result." Cohen, an influential critic of significance testing, concurred,<ref name=cohen94>{{cite journal|author=Jacob Cohen|title=The Earth Is Round (p < .05)|journal=American Psychologist|volume=49|issue=12|pages=997–1003|date=December 1994|doi=10.1037/0003-066X.49.12.997|s2cid=380942|url=https://semanticscholar.org/paper/2cc7be3d5161e865807e13de7975c9d77fbd2815}} This paper led to the review of statistical practices by the APA. Cohen was a member of the Task Force that did the review.</ref> "... don't look for a magic alternative to NHST ''[null hypothesis significance testing]'' ... It doesn't exist." "... given the problems of statistical induction, we must finally rely, as have the older sciences, on replication." The "alternative" to significance testing is repeated testing. The easiest way to decrease statistical uncertainty is by obtaining more data, whether by increased sample size or by repeated tests. Nickerson claimed to have never seen the publication of a literally replicated experiment in psychology.<ref name=nickerson /> An indirect approach to replication is [[meta-analysis]].
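The claim that more data reduces statistical uncertainty follows from the 1/√n scaling of the standard error of a mean; for illustration (assuming a known population standard deviation σ = 1), quadrupling the sample halves the uncertainty:

```python
from math import sqrt

sigma = 1.0  # assumed population standard deviation (illustrative)
for n in (25, 100, 400):
    # Standard error of the sample mean shrinks as 1/sqrt(n).
    print(n, sigma / sqrt(n))
```

The same scaling applies whether the extra data come from a larger sample or from pooling repeated experiments.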
}} "...the proper application of statistics to scientific inference is irrevocably committed to extensive consideration of inverse [AKA Bayesian] probabilities..." It was acknowledged, with regret, that a priori probability distributions were available "only as a subjective feel, differing from one person to the next" "in the more immediate future, at least".</ref><ref>{{Cite journal | last = Berger | first = James
| title = The Case for Objective Bayesian Analysis
| journal = Bayesian Analysis | volume = 1 | issue = 3 | pages = 385–402 | year = 2006 | doi=10.1214/06-ba115| doi-access = free }} In listing the competing definitions of "objective" Bayesian analysis, "A major goal of statistics (indeed science) is to find a completely coherent objective Bayesian methodology for learning from data." The author expressed the view that this goal "is not attainable".</ref> Neither [[Ronald Fisher|Fisher]]'s significance testing nor [[Neyman–Pearson lemma|Neyman–Pearson]] hypothesis testing can provide this information, and neither claims to. The probability a hypothesis is true can only be derived from use of [[Bayes' Theorem]], which was unsatisfactory to both the Fisher and Neyman–Pearson camps due to the explicit use of [[subjectivity]] in the form of the [[prior probability]].<ref name="Neyman 289–337"/><ref>{{cite journal|last=Aldrich|first=J|title=R. A. Fisher on Bayes and Bayes' theorem|journal=Bayesian Analysis|year=2008|volume=3|issue=1|pages=161–170|url=http://ba.stat.cmu.edu/journal/2008/vol03/issue01/aldrich.pdf|doi=10.1214/08-BA306|url-status=dead|archive-url=https://web.archive.org/web/20140906190025/http://ba.stat.cmu.edu/journal/2008/vol03/issue01/aldrich.pdf|archive-date=September 6, 2014|df=mdy-all|doi-access=free}}</ref> Fisher's strategy is to sidestep this with the [[p-value|''p''-value]] (an objective ''index'' based on the data alone) followed by ''inductive inference'', while Neyman–Pearson devised their approach of ''inductive behaviour''.
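Bayes' theorem, and the subjective prior it requires, can be sketched directly (the prior and likelihood values below are hypothetical, chosen only to show the mechanics):

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' theorem: probability the hypothesis is true given the data.
    Requires a prior probability, the subjective input that both the
    Fisher and Neyman-Pearson camps declined to use."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / evidence

# Hypothetical numbers: prior belief 0.1, likelihoods 0.8 vs 0.05.
print(posterior(0.1, 0.8, 0.05))
```

The output depends on the prior as much as on the data, which is precisely the feature the frequentist schools objected to.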
 
==Philosophy==