Criticism of statistical hypothesis testing fills volumes.<ref name=morrison>{{cite book|orig-year=1970|year=2006|title=The Significance Test Controversy|editor1=Morrison, Denton |editor2=Henkel, Ramon |publisher=Aldine Transaction |isbn=978-0-202-30879-1}}</ref><ref>{{cite book|last=Oakes|first=Michael|title=Statistical Inference: A Commentary for the Social and Behavioural Sciences|publisher=Wiley|___location=Chichester New York|year=1986|isbn=978-0471104438}}</ref><ref name=chow>{{cite book|first=Siu L.|last=Chow|year=1997|title=Statistical Significance: Rationale, Validity and Utility|isbn=978-0-7619-5205-3}}</ref><ref name=harlow>{{cite book|year=1997|title=What If There Were No Significance Tests?|editor1=Harlow, Lisa Lavoie |editor2=Stanley A. Mulaik |editor3=James H. Steiger |publisher=Lawrence Erlbaum Associates|isbn=978-0-8058-2634-0}}</ref><ref name=kline>{{cite book|last=Kline|first=Rex|title=Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research|publisher=American Psychological Association|___location=Washington, D.C. |year=2004|isbn=9781591471189 }}</ref><ref name=mccloskey>{{cite book|last= McCloskey|first=Deirdre N.|author2=Stephen T. Ziliak |year=2008|title=The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives|publisher=University of Michigan Press|isbn=978-0-472-05007-9}}</ref> Much of the criticism can be summarized by the following issues:
* The interpretation of a ''p''-value depends on the [[stopping rule]] and the definition of multiple comparisons. The former often changes during the course of a study and the latter is unavoidably ambiguous (i.e. "p values depend on both the (data) observed and on the other possible (data) that might have been observed but weren't").<ref>{{cite journal|last=Cornfield|first=Jerome|title=Recent Methodological Contributions to Clinical Trials| journal=American Journal of Epidemiology|volume=104|issue=4|pages=408–421|year=1976|url=http://www.epidemiology.ch/history/PDF%20bg/Cornfield%20J%201976%20recent%20methodological%20contributions.pdf|doi=10.1093/oxfordjournals.aje.a112313|pmid= 788503}}</ref>
* Confusion resulting (in part) from combining the methods of Fisher and Neyman–Pearson, which are conceptually distinct.<ref name="Tukey60">{{cite journal|last=Tukey|first=John W.|title=Conclusions vs decisions|journal= Technometrics|volume=26|issue=4|pages=423–433|year=1960|doi=10.1080/00401706.1960.10489909}} "Until we go through the accounts of testing hypotheses, separating [Neyman–Pearson] decision elements from [Fisher] conclusion elements, the intimate mixture of disparate elements will be a continual source of confusion." ... "There is a place for both "doing one's best" and "saying only what is certain," but it is important to know, in each instance, both which one is being done, and which one ought to be done."</ref>
* Emphasis on statistical significance to the exclusion of estimation and confirmation by repeated experiments.<ref>{{cite journal|last=Yates|first=Frank|title=The Influence of Statistical Methods for Research Workers on the Development of the Science of Statistics|journal=Journal of the American Statistical Association|volume=46|issue=253|pages=19–34|year=1951|doi=10.1080/01621459.1951.10500764}} "The emphasis given to formal tests of significance throughout [R.A. Fisher's] Statistical Methods ... has caused scientific research workers to pay undue attention to the results of the tests of significance they perform on their data, particularly data derived from experiments, and too little to the estimates of the magnitude of the effects they are investigating." ... "The emphasis on tests of significance and the consideration of the results of each experiment in isolation, have had the unfortunate consequence that scientific workers have often regarded the execution of a test of significance on an experiment as the ultimate objective."</ref>
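The dependence of ''p''-values on the stopping rule, noted in the first point above, can be made concrete with a small Monte Carlo sketch (illustrative only, not drawn from the cited sources; it assumes a two-sided ''z''-test with known standard deviation 1, and the batch sizes and number of interim "looks" are arbitrary choices): under a true null hypothesis, analyzing the data after every batch of observations and stopping at the first nominally significant result inflates the false-positive rate well above the nominal 5%.

```python
import random
import math

random.seed(0)

def z_test_rejects(xs, crit=1.96):
    # Two-sided z-test of mean 0 with known sigma = 1;
    # 1.96 is the critical value for alpha = 0.05.
    z = sum(xs) / math.sqrt(len(xs))
    return abs(z) > crit

def optional_stopping_trial(looks=5, batch=20):
    # "Peek" at the accumulating data after each batch of 20
    # observations; declare significance at the first rejection.
    xs = []
    for _ in range(looks):
        xs.extend(random.gauss(0, 1) for _ in range(batch))
        if z_test_rejects(xs):
            return True
    return False

trials = 10_000
# Fixed-n design: one test on all 100 observations.
fixed = sum(z_test_rejects([random.gauss(0, 1) for _ in range(100)])
            for _ in range(trials)) / trials
# Optional-stopping design: same 100 observations, but 5 interim looks.
peeking = sum(optional_stopping_trial() for _ in range(trials)) / trials

print(f"fixed-n false-positive rate:     {fixed:.3f}")   # near the nominal 0.05
print(f"optional-stopping rate (5 looks): {peeking:.3f}")  # noticeably higher
```

The same 100 null observations yield very different error rates depending on how the analyst decided when to stop, which is precisely why a reported ''p''-value cannot be interpreted without knowing the stopping rule.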