Reproducibility: Difference between revisions

{{About|the reproducibility of scientific research results|reproductive capacity of organisms|fertility|and|fecundity|reproducibility in the context of computer software|Reproducible builds}}
 
'''Reproducibility''', closely related to '''replicability''' and '''repeatability''', is a major principle underpinning the [[scientific method]]. For the findings of a study to be reproducible means that results obtained by an [[experiment]] or an [[observational study]] or in a [[statistical analysis]] of a [[data set]] should be achieved again with a high degree of reliability when the study is replicated. There are different kinds of replication<ref>{{Cite journal|last1=Tsang|first1=Eric W. K.|last2=Kwan|first2=Kai-man|date=1999|title=Replication and Theory Development in Organizational Science: A Critical Realist Perspective|url=http://dx.doi.org/10.5465/amr.1999.2553252|journal=Academy of Management Review|volume=24|issue=4|pages=759–780|doi=10.5465/amr.1999.2553252|issn=0363-7425|url-access=subscription}}</ref> but typically replication studies involve different researchers using the same methodology. Only after one or several such successful replications should a result be recognized as scientific knowledge.
 
==History==
[[File:Boyle air pump.jpg|thumb|right|Boyle's air pump was, in terms of the 17th century, a complicated and expensive scientific apparatus, making reproducibility of results difficult.]]
The first to stress the importance of reproducibility in science was the Anglo-Irish chemist [[Robert Boyle]], in [[England]] in the 17th century. Boyle's [[air pump]] was designed to generate and study [[vacuum]], which at the time was a very controversial concept. Indeed, distinguished philosophers such as [[René Descartes]] and [[Thomas Hobbes]] denied that a vacuum could exist at all. [[History of science|Historians of science]] [[Steven Shapin]] and [[Simon Schaffer]], in their 1985 book ''[[Leviathan and the Air-Pump]]'', describe the debate between Boyle and Hobbes, ostensibly over the nature of vacuum, as fundamentally an argument about how useful knowledge should be gained. Boyle, a pioneer of the [[experimental method]], maintained that the foundations of knowledge should be constituted by experimentally produced facts, which can be made believable to a scientific community by their reproducibility. By repeating the same experiment over and over again, Boyle argued, the certainty of fact will emerge.

The air pump, which in the 17th century was a complicated and expensive apparatus to build, also led to one of the first documented disputes over the reproducibility of a particular [[scientific phenomenon]]. In the 1660s, the Dutch scientist [[Christiaan Huygens]] built his own air pump in [[Amsterdam]], the first one outside the direct management of Boyle and his assistant at the time, [[Robert Hooke]]. Huygens reported an effect he termed "anomalous suspension", in which water appeared to levitate in a glass jar inside his air pump (in fact suspended over an air bubble), but Boyle and Hooke could not replicate this phenomenon in their own pumps. As Shapin and Schaffer describe, "it became clear that unless the phenomenon could be produced in England with one of the two pumps available, then no one in England would accept the claims Huygens had made, or his competence in working the pump". Huygens was finally invited to England in 1663, and under his personal guidance Hooke was able to replicate anomalous suspension of water. Following this, Huygens was elected a Foreign Member of the [[Royal Society]]. However, Shapin and Schaffer also note that "the accomplishment of replication was dependent on contingent acts of judgment. One cannot write down a formula saying when replication was or was not achieved".<ref>[[Steven Shapin]] and [[Simon Schaffer]], ''[[Leviathan and the Air-Pump]]'', Princeton University Press, Princeton, New Jersey (1985).</ref>

The [[Philosophy of science|philosopher of science]] [[Karl Popper]] noted briefly in his famous 1934 book ''[[The Logic of Scientific Discovery]]'' that "non-reproducible single occurrences are of no significance to science".<ref>This citation is from the 1959 translation to English, [[Karl Popper]], ''[[The Logic of Scientific Discovery]]'', Routledge, London, 1992, p. 66.</ref> The [[Statistics|statistician]] [[Ronald Fisher]] wrote in his 1935 book ''[[The Design of Experiments]]'', which set the foundations for the modern scientific practice of [[hypothesis testing]] and [[statistical significance]], that "we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us statistically significant results".<ref>[[Ronald Fisher]], ''[[The Design of Experiments]]'', (1971) [1935](9th ed.), Macmillan, p. 14.</ref> Such assertions express a common [[dogma]] in modern science that reproducibility is a necessary condition (although not necessarily [[Necessity and sufficiency|sufficient]]) for establishing a scientific fact, and in practice for establishing scientific authority in any field of knowledge. However, as Shapin and Schaffer note above, this dogma is not well formulated quantitatively, in the way that statistical significance is, for instance, and therefore it is not explicitly established how many times a fact must be replicated to be considered reproducible.

== Types of Reproducibility ==
There are different kinds of replication studies, each serving a unique role in scientific validation:

Direct Replication – The exact experiment or study is repeated under the same conditions to verify the original findings.

Conceptual Replication – A study tests the same hypothesis but uses a different methodology, materials, or population to see if the results hold in different contexts.

Computational Reproducibility – In data science and computational research, reproducibility requires making all datasets, code, and algorithms openly available so others can replicate the analysis and obtain the same results.
 
== Importance of Reproducibility ==
Reproducibility serves several critical purposes in science:
 
Verification of Results – Confirms that findings are not due to random chance or errors.
 
Building Trust in Research – Scientists, policymakers, and the public rely on reproducible studies to make informed decisions.

Advancing Knowledge – Establishes a strong foundation for future research by validating existing theories.
 
Avoiding Bias and Fraud – Helps detect false positives, publication bias, and data manipulation that could mislead the scientific community.
== Challenges in Achieving Reproducibility ==
 
Despite its importance, many studies fail reproducibility tests, leading to what is known as the replication crisis in fields like psychology, medicine, and social sciences. Some key challenges include:
 
Insufficient Data Sharing – Many researchers do not make raw data, code, or methodology openly available, making replication difficult.

Small Sample Sizes – Studies with limited sample sizes may show results that do not generalize to larger populations.

Publication Bias – Journals tend to publish positive findings rather than null or negative results, leading to an incomplete scientific record.

Complex Experimental Conditions – In some cases, small variations in laboratory settings, equipment, or researcher expertise can affect outcomes, making exact replication difficult.
 
== Real-World Applications of Reproducibility ==
Medical Research – Reproducibility ensures that clinical trials and drug effectiveness studies produce reliable results before treatments reach the public.
 
AI and Machine Learning – Scientists emphasize reproducibility in AI by requiring open-source models and datasets to validate algorithm performance.
 
Climate Science – Climate models must be reproducible across different datasets and simulations to ensure accurate predictions of global warming.
 
Pharmaceutical Development – Drug discovery relies on reproducing experiments across multiple labs to ensure safety and efficacy.
 
== Improving Reproducibility in Science ==
To enhance reproducibility, researchers and institutions can adopt several best practices:
 
Open Data and Code – Making datasets and computational methods publicly available ensures that others can verify results.
 
Registered Reports – Some scientific journals now accept studies based on pre-registered research plans, reducing bias.
 
Standardized Methods – Using well-documented, standardized experimental protocols helps ensure consistent results.
 
Independent Replication Studies – Funding agencies and journals should prioritize replication studies to strengthen scientific integrity.
 
With a narrower scope, ''reproducibility'' has been defined in [[computational science]]s as having the following quality: the results should be documented by making all data and code available in such a way that the computations can be executed again with identical results.
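
The computational sense described above can be illustrated with a minimal sketch. It assumes a hypothetical analysis function (`run_analysis`, a seeded bootstrap of the mean) and made-up data; neither comes from any cited source. The idea is only that, when the data, the code, and the random seed are all fixed, a rerun can be checked for identical results by comparing digests of the output.

```python
import hashlib
import json
import random


def run_analysis(data, seed):
    """Hypothetical analysis: bootstrap estimate of the mean with a fixed seed."""
    rng = random.Random(seed)  # local generator, independent of global random state
    means = []
    for _ in range(200):
        # Resample the data with replacement and record the resample mean.
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return {"bootstrap_mean": sum(means) / len(means), "n_resamples": len(means)}


def fingerprint(result):
    """Stable SHA-256 digest of a result dictionary, for comparing two runs."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Illustrative data only; not taken from any study.
data = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5]

first = fingerprint(run_analysis(data, seed=42))
second = fingerprint(run_analysis(data, seed=42))
print("identical results on rerun:", first == second)
```

In practice, identical results also depend on factors this sketch does not control, such as software versions and hardware, which is why sharing the full computational environment is often recommended alongside data and code.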
 
In recent decades, there has been a rising concern that many published scientific results fail the test of reproducibility, evoking a reproducibility or [[replication crisis]].
 
==Terminology==
''Replicability'' and ''repeatability'' are related terms broadly or loosely synonymous with reproducibility (for example, among the general public), but they are often usefully differentiated in more precise senses, as follows.
 
Two major steps are naturally distinguished in connection with reproducibility of experimental or observational studies. When new data are obtained in the attempt to achieve it, the term ''replicability'' is often used, and the new study is a ''replication'' or ''replicate'' of the original one. For obtaining the same results when the data set of the original study is analyzed again with the same procedures, many authors use the term ''reproducibility'' in a narrow, technical sense coming from its use in computational research. ''Repeatability'' is related to the ''repetition'' of the experiment within the same study by the same researchers. Reproducibility in the original, wide sense is only acknowledged if a replication performed by an ''independent researcher team'' is successful.
 
The terms reproducibility and replicability sometimes appear even in the scientific literature with reversed meaning,<ref>{{cite arXiv|title=Terminologies for Reproducible Research|last1=Barba|first1=Lorena A.|year=2018|class=cs.DL |eprint=1802.03311}}</ref><ref>{{cite web|title=Replicability vs. reproducibility — or is it the other way round?|last1=Liberman|first1=Mark|url=https://languagelog.ldc.upenn.edu/nll/?p=21956|access-date=2020-10-15}}</ref> as different research fields settled on their own definitions for the same terms.<ref>{{cite journal|title=Brooke on the Merton Thesis: A Direct Replication of John Hedley Brooke's Chapter on Scientific and Religious Reform.|last1=Van Eyghen|first1=Hans|last2=Van den Brink| first2=Gijsbert |last3=Peels | first3=Rik|year=2024|journal=Zygon |volume=59| issue=2| url=https://www.zygonjournal.org/article/id/11497/#!}}</ref>
 
==Measures of reproducibility and repeatability==
In chemistry, the terms reproducibility and repeatability are used with a specific quantitative meaning.<ref>{{Cite journal |last= |first= |title=IUPAC - reproducibility (R05305) |url=https://goldbook.iupac.org/terms/view/R05305 |access-date=2022-03-04 |website=[[International Union of Pure and Applied Chemistry]]|doi= 10.1351/goldbook.R05305|doi-access=free|url-access=subscription}}</ref> In inter-laboratory experiments, a concentration or other quantity of a chemical substance is measured repeatedly in different laboratories to assess the variability of the measurements. Then, the standard deviation of the difference between two values obtained within the same laboratory is called ''repeatability''. The standard deviation of the difference between two measurements from different laboratories is called ''reproducibility''.<ref name="ASTM E177">{{cite web|url=https://www.astm.org/Standards/E177.htm |title=Standard Practice for Use of the Terms Precision and Bias in ASTM Test Methods |year=2014 |author=Subcommittee E11.20 on Test Method Evaluation and Quality Control |publisher=ASTM International |id=ASTM E177}}{{Subscription required}}</ref>
These measures are related to the more general concept of [[variance component]]s in [[metrology]].
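
The two definitions above can be sketched directly in code. The following is a minimal illustration, not the estimation procedure prescribed by ASTM E177 or any other standard: it takes the definitions literally and computes the spread of pairwise differences within laboratories and between laboratories. The function names and the measurement values are made up for illustration.

```python
import itertools
import math


def rms_pairwise_difference(pairs):
    """Root-mean-square of (a - b) over the given pairs.

    The order within a pair is arbitrary, so the difference is treated as
    having mean zero, and its standard deviation is estimated by the RMS.
    """
    diffs = [a - b for a, b in pairs]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def repeatability_sd(labs):
    """Spread of the difference between two values from the same laboratory."""
    pairs = [
        pair
        for values in labs.values()
        for pair in itertools.combinations(values, 2)
    ]
    return rms_pairwise_difference(pairs)


def reproducibility_sd(labs):
    """Spread of the difference between two values from different laboratories."""
    pairs = [
        (a, b)
        for values_a, values_b in itertools.combinations(labs.values(), 2)
        for a in values_a
        for b in values_b
    ]
    return rms_pairwise_difference(pairs)


# Hypothetical concentrations reported by three laboratories (illustrative only).
measurements = {
    "lab_a": [10.1, 10.3, 10.2],
    "lab_b": [10.6, 10.8, 10.7],
    "lab_c": [9.9, 10.0, 10.1],
}

print("repeatability:", repeatability_sd(measurements))
print("reproducibility:", reproducibility_sd(measurements))
```

With data like these, where each laboratory is internally consistent but the laboratories disagree with one another, the between-laboratory figure comes out larger than the within-laboratory one, which is the typical pattern the two terms are meant to separate.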
 
 
Reproducible research is key to new discoveries in [[pharmacology]]. A Phase I discovery will be followed by Phase II reproductions as a drug develops towards commercial production. In recent decades Phase II success has fallen from 28% to 18%. A 2011 study found that 65% of medical studies were inconsistent when re-tested, and only 6% were completely reproducible.<ref>{{Cite journal|last1=Prinz |first1=F. |last2=Schlange |first2=T. |last3=Asadullah |first3=K. |doi=10.1038/nrd3439-c1 |title=Believe it or not: How much can we rely on published data on potential drug targets? |journal=Nature Reviews Drug Discovery |volume=10 |issue=9 |page=712 |year=2011 |pmid=21892149 |doi-access=free}}</ref>
 
Some efforts have been made to increase replicability beyond the social and biomedical sciences. Studies in the humanities tend to rely more on expertise and hermeneutics, which may make replicability more difficult. Nonetheless, there have been calls for more transparency and documentation in the humanities.<ref>{{Cite journal |last1=Van Eyghen |first1=Hans |last2= Van den Brink |first2=Gijsbert |last3= Peels |first3= Rik |title=Brooke on the Merton Thesis: A Direct Replication of John Hedley Brooke's Chapter on Scientific and Religious Reform |journal=Zygon: Journal of Religion and Science |volume=59 |issue=2 |year=2024|url=https://www.zygonjournal.org/article/id/11497/| doi=10.16995/zygon.11497|doi-access=free }}</ref>
 
==Noteworthy irreproducible results==
In March 1989, [[University of Utah]] chemists Stanley Pons and Martin Fleischmann reported the production of excess heat that could only be explained by a nuclear process ("[[cold fusion]]"). The report was astounding given the simplicity of the equipment: it was essentially an [[electrolysis]] cell containing [[heavy water]] and a [[palladium]] [[cathode]] which rapidly absorbed the [[deuterium]] produced during electrolysis. The news media reported on the experiments widely, and it was a front-page item in many newspapers around the world (see [[science by press conference]]). Over the next several months others tried to replicate the experiment, but were unsuccessful.<ref>{{cite journal|title=Physicists Debunk Claim Of a New Kind of Fusion|newspaper=New York Times|last=Browne|first=Malcolm|url=http://partners.nytimes.com/library/national/science/050399sci-cold-fusion.html|date=3 May 1989|access-date=3 February 2017}}</ref>
 
[[Nikola Tesla]] claimed as early as 1899 to have used a high frequency current to light gas-filled lamps from over {{convert|25|mi|km}} away [[Wireless energy transfer|without using wires]]. In 1904 he built [[Wardenclyffe Tower]] on [[Shoreham, New York|Long Island]] to demonstrate means to send and receive power without connecting wires. The facility was never fully operational and was not completed due to economic problems, so no attempt to reproduce his first result was ever carried out.<ref>[[Margaret Cheney (author)|Cheney, Margaret]] (1999), ''Tesla, Master of Lightning'', New York: Barnes & Noble Books, {{ISBN|0-7607-1005-8}}, pp. 107.; "Unable to overcome his financial burdens, he was forced to close the laboratory in 1905."</ref>
 
Other examples in which contrary evidence has refuted the original claim: