Open Science Infrastructure: Difference between revisions

Line 3:
[[File:Open science pillars.png|thumb|upright=1.35|Open Science infrastructure is one of the four pillars of Open Science in the UNESCO Recommendation on Open Science (2021).]]
 
'''Open Science Infrastructure''' (or ''open scholarly infrastructure'') is an [[information infrastructure]] that supports the open sharing of scientific outputs such as publications, datasets, metadata or code. The November 2021 [[Unesco]] recommendation on Open Science describes it as "shared research infrastructures that are needed to support [[open science]] and serve the needs of different communities".{{sfn|UNESCO|2021}}
 
Open science infrastructures are a form of scientific infrastructure (also called ''[[cyberinfrastructure]]'', ''[[e-Science]]'' or ''e-infrastructure'') that support the production of open knowledge. Beyond the management of common resources, they are frequently structured as community-led initiatives with a set of collective norms and governance regulations, which also makes them a form of [[knowledge commons]]. The definition of open science infrastructures usually excludes privately owned scientific infrastructures run by leading commercial publishers. Conversely, it may include actors that are not always characterized as scientific infrastructures but play a critical role in the ecosystem of open science, such as open access publishing platforms (''open scholarly communication services'').
Line 14:
''Open science infrastructure'' is a form of knowledge infrastructure that makes it possible to create, publish and maintain open scientific outputs such as publications, data or software.
 
A [[Unesco]] recommendation about [[open science]] approved in November 2021 defines open science infrastructures as "shared research infrastructures that are needed to support open science and serve the needs of different communities".{{sfn|UNESCO|2021}} A SPARC{{what|date=April 2025}} report on European open science infrastructure includes the following activities within the range of open science infrastructures: "We define Open Access & Open Science Infrastructure as sets of services, protocols, standards and software contributing to the research lifecycle – from collaboration and experimentation through data collection and storage, data organization, data analysis and computation, authorship, submission, review and annotation, copyediting, publishing, archiving, citation, discovery and more".{{sfn|Ficarra et al.|2020|p=7}}
 
===Infrastructure===
The use of the term "infrastructure" is an explicit reference to physical infrastructures and networks such as power grids, road networks or telecommunications that made it possible to run complex economic and social systems after the industrial revolution: "The term infrastructure has been used since the 1920s to refer collectively to the roads, power grids, telephone systems, bridges, rail lines, and similar public works that are required for an industrial economy to function (...) If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy".{{sfn|Atkins|2003|p=5}} The concept of infrastructure was notably extended in 1996 to forms of computer-mediated knowledge production by [[Susan Leigh Star]] and [[Karen Ruhleder]], through an empirical observation of an early form of open science infrastructure, the Worm Community System.{{sfn|Star|Ruhleder|1996}} This definition remained influential over the next two decades in [[science and technology studies]]{{sfn|Karasti et al. I|2016|p=4}} and has affected the policy debate over the building of scientific infrastructure since the early 2000s.{{sfn|Atkins|2003|p=5}}
 
Open science infrastructures have specific properties that distinguish them from other forms of open science projects or initiatives:
Line 39:
The ''Principles'' attempt to hybridize the framework of infrastructure studies with the analysis of the [[commons]] initiated by [[Elinor Ostrom]]. They develop a series of recommendations in three areas critical to the success of open infrastructures:
* '''Governance''': the governance of the infrastructure should be open and accountable to the scientific communities it aims to serve. Specific measures should ensure that the management of the organization is transparent and diverse.{{sfn|Bilder|Lin|Neylon|2015}}
* '''Sustainability''': the core activities of the organization should be covered by recurring funds. Short-term subventions should be limited to short-term projects. While the organization could charge for services, this should not extend to the data, which should remain "a community property".{{sfn|Bilder|Lin|Neylon|2015}}
* '''Insurance''': the technical infrastructure and the output of the organization are open. This ensures that the infrastructure can be recreated if necessary (in the jargon of open source, it becomes "forkable").{{sfn|Bilder|Lin|Neylon|2015}}
 
Line 55:
Scientific projects have been among the earliest use cases for digital infrastructure. The theorization of scientific knowledge infrastructure even predates the development of computing technologies. The knowledge networks envisioned by [[Paul Otlet]] or [[Vannevar Bush]] already incorporated numerous features of online scientific infrastructures.{{sfn|Borgman|2007|p=40}}
 
After the Second World War, the United States faced a "periodical crisis": existing journals could not keep up with the rapidly increasing scientific output.{{sfn|Wouters|1999|p=61}} The issue became politically relevant after the successful launch of [[Sputnik]]: "The Sputnik crisis turned the librarians’ problem of bibliographic control into a national information crisis."{{sfn|Wouters|1999|p=62}} The emerging computing technologies were immediately considered as a potential solution to make a larger amount of scientific output readable and searchable. Access to foreign language publications was also a key issue that was expected to be solved by [[machine translation]]: in the 1950s, a significant number of scientific publications [[Languages of Science|were not available in English]], especially those coming from the Soviet bloc.
 
Influential members of the [[National Science Foundation]] like [[Joshua Lederberg]] advocated for the creation of a "centralized information system", [[SCITEL]], that would at first coexist with printed journals and gradually replace them altogether on account of its efficiency.{{sfn|Wouters|1999|p=60}} In the plan laid out by Lederberg to Eugene Garfield in November 1961, the deposit would index as many as 1,000,000 scientific articles per year. Beyond full-text searching, the infrastructure would also ensure the indexing of citations and other metadata, as well as the automated translation of foreign language articles.{{sfn|Wouters|1999|p=64}}
Line 88:
Several competing terms appeared to fill this need. In the United States, the term ''cyberinfrastructure'' was used in a scientific context by a US National Science Foundation (NSF) blue-ribbon committee in 2003: "The newer term cyberinfrastructure refers to infrastructure based upon distributed computer, information and communication technology. If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy."{{sfn|Atkins|2003|p=5}} ''E-infrastructure'' or ''e-science'' were used with a similar meaning in the United Kingdom and European countries.
 
Thanks to "sizable investments",{{sfn|Eccles et al.|2009}} major national and international infrastructures, such as the [[Open Science Grid]], [[BioGRID]], the [[Jisc|JISC]], {{ill|DARIAH|WDqid=Q49103279}} and [[Project Bamboo]], were launched between the initial policy discussions of the early 2000s and the economic crisis of 2007–2008.{{sfn|Dacos|2013}}{{sfn|eResearch2020|2010|p={{page needed|date=February 2024}}}} Specialized free software for scientific publishing like [[Open Journal Systems]] became available after 2000. This development entailed a significant expansion of non-commercial open access journals by facilitating the creation and administration of journal websites and the digital conversion of existing journals.{{sfn|Bosman et al.|2021|p=93}} Among the non-commercial journals registered in the Directory of Open Access Journals, the number of journals created annually rose from 100 by the end of the 1990s to 800 around 2010, and has not changed significantly since then.{{sfn|Bosman et al.|2021|p=30}}
 
By 2010, infrastructures were "no longer in infancy" and yet "they are also not yet fully mature".{{sfn|Eccles et al.|2009}} While the development of the web solved a large range of technical issues regarding network management, building scientific infrastructure remained challenging. Governance, communication across all involved stakeholders, and strategic divergences were major factors of success or failure. One of the first major infrastructures for the humanities and the social sciences, [[Project Bamboo]], was ultimately unable to achieve its ambitious aims: "From the early planning workshops to the [[Mellon Foundation]]'s rejection of the project's final proposal attempt, Bamboo was dogged by its reluctance and/or inability to concretely define itself".{{sfn|Dombrowski|2014|p=334}} This lack of clarity was further aggravated by recurring communication missteps between the project initiators and the community it aimed to serve: "The community had spoken and made it clear that continuing to emphasize [[Service-oriented architecture]] would alienate the very members of the community Bamboo was intended to benefit most: the scholars themselves".{{sfn|Dombrowski|2014|p=329}} Budget cuts following the economic crisis of 2007–2008 underlined the fragility of ambitious infrastructure plans relying on significant recurring funds.{{sfn|Dombrowski|2014|p=331}}
 
[[File:Providers of digital tools for the scientific workflow.png|thumb|Leading commercial ecosystems for scientific research]]
Leading commercial publishers were initially outpaced by the unexpected rise of the Web for academic publication: the executive board of [[Elsevier]] "had failed to grasp the significance of electronic publishing altogether, and therefore the deadly danger that it posed—the danger, namely, that scientists would be able to manage without the journal".{{sfn|Andriesse|2008|pp=257–258}} The persistence of high revenues from subscriptions and the consolidation of the sector made it possible to fund the conversion of pre-existing online services to the web as well as the digitization of past collections. By the 2010s, leading publishers had been "moving from a content-provision to a data analytics business"<ref name="andressi_5">{{harvnb|Aspesi et al.|2019|p=5}}</ref> and had developed or acquired new key infrastructures for the management of scientific and pedagogic activities: "Elsevier has acquired and launched products that extend its influence and its ownership of the infrastructure to all stages of the academic knowledge production process".{{sfn|Posada|Chen|2018|p=6}} As it has expanded beyond publishing, the ''vertical integration'' of privately owned infrastructures has become deeply embedded in daily research activities.
 
{{blockquote|The privatised control of scholarly infrastructures is especially noticeable in the context of ‘vertical integration’ that publishers such as Elsevier and SpringerNature are seeking by controlling all aspects of the research life cycle, from submission to publication and beyond. For example, this vertical integration is represented in a number of Elsevier's business acquisitions, such as Mendeley (a reference manager), SSRN (a pre-print repository) and Bepress (a provider of repository and publishing software for universities).{{sfn|Moore|2019|p=156}}}}
 
=== Toward open science infrastructures (2015-…) ===
Line 108:
Since 2015, these principles have become the most influential definition of open science infrastructures. They have been endorsed by leading infrastructures such as Crossref,{{sfn|Bilder|2020}} OpenCitations{{sfn|Di Giambattista|2021}} and Data Dryad,{{sfn|The Dryad Team|2020}} and have become a common basis for the institutional evaluation of existing open infrastructures.{{sfn|Ficarra et al.|2020|p=21}} The main focus of the ''Principles'' is to build "trustworthy institutions" with significant commitments in terms of governance, financial sustainability and technical efficiency, so that they can be durably relied on by scientific communities.{{sfn|Neylon|2017|p=7}}
 
By 2021, public services and infrastructures for research had largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer."{{sfn|Fecher et al.|2021|p=505}} According to the 2021 Roadmap of the {{ill|European Strategy Forum on Research Infrastructures|WDqid=Q2623454}} (ESFRI), major legacy infrastructures in Europe have embraced open science principles: "Most of the Research Infrastructures on the ESFRI Roadmap are at the forefront of Open Science movement and make important contributions to the digital transformation by transforming the whole research process according to the Open Science paradigm."{{sfn|ESFRI Roadmap|2021|p=159}} Examples of extensive data sharing programs include the [[European Social Survey]] (in social science), [[ECRIN ERIC]] (for clinical data) and the [[Cherenkov Telescope Array]] (in astronomy).{{sfn|ESFRI Roadmap|2021|p=159}}
 
In agreement with the original intent of the ''Principles'', open science infrastructures are "seen as an antidote to the increased market concentration observed in the scholarly communication space."{{sfn|Kraker|2021|p=2}} In November 2021, the UNESCO Recommendation on Open Science acknowledged open science infrastructure as one of the four pillars of open science, along with open scientific knowledge, open engagement of societal actors and open dialogue with other knowledge systems, and called for sustained investment and funding: "open science infrastructures are often the result of community-building efforts, which are crucial for their longterm sustainability and therefore should be not-for-profit and guarantee permanent and unrestricted access to all public to the largest extent possible."{{sfn|UNESCO|2021}}
Line 122:
Open Access repositories are the most frequent form of Open Science Infrastructure,<ref>{{harvnb|Operas Landscape Study|2017|p=15}}</ref> with 5,791 repositories in existence in December 2021 according to OpenDOAR.{{sfn|OpenDOAR Statistics}}
 
Yet there is a significant diversification of the roles and activities of open science infrastructures, at least among the largest ones. In the survey of European infrastructures conducted by SPARC Europe, 95% of respondents reported providing services in at least three of six stages of research production (Creation, Evaluation, Publishing, Hosting, Discovering and Archiving).{{sfn|Ficarra et al.|2020|p=13}} Aggregation, hosting and indexing are especially central activities, common to most Open Science Infrastructures regardless of their focus.
 
Specialization does happen at a higher level. A network analysis identifies "two main clusters of activities":
Line 154:
 
=== Economics ===
Many open science infrastructures run "at a relatively low cost", as small infrastructures are an important part of the open science ecosystem.{{sfn|Ficarra et al.|2020|p=35}} In 2020, 21 out of 53 surveyed European infrastructures "report spending less than €50,000".{{sfn|Ficarra et al.|2020|p=35}} Consequently, more than 75% of surveyed European infrastructures are run by small teams of 5 FTEs or fewer.<ref>{{harvnb|Ficarra et al.|2020|p=41}}</ref> The size of an infrastructure and the extent of its funding are not always proportional to how critical its service is: "some of the most heavily used services make ends meet with a tiny core team of two to five people."<ref>{{harvnb|Kraker|2021|p=3}}</ref> Volunteer contributions are significant as well, which is both "a strength and weakness to an OSI's sustainability".{{sfn|Ficarra et al.|2020|p=35}} The landscape of open science infrastructures is therefore rather close to the ideal of a "decentralised network of small projects" envisioned by theorists of the scholarly commons.<ref>{{harvnb|Moore|2019|p=176}}</ref> A very large majority of open science infrastructures are non-commercial,{{sfn|Ficarra et al.|2020|p=48}} and collaborations with or financial support from the private sector remain very limited.{{sfn|Ficarra et al.|2020|p=45}}
 
Overall, European infrastructures were financially sustainable in 2020,<ref>{{harvnb|Ficarra et al.|2020|p=51}}</ref> which contrasts with the situation ten years earlier: in 2010, European infrastructures had much less visibility, usually lacked "a long-term perspective" and struggled "with securing the funding for more than 5 years".{{sfn|eResearch2020|2010|p=103}} In 2020, European infrastructures frequently relied on grants from national funds and from the European Commission.{{sfn|Ficarra et al.|2020|p=45}} Without these grants, most of these actors "could only remain viable for less than a year".{{sfn|Ficarra et al.|2020|p=48}} Yet one quarter of surveyed European infrastructures were not supported by any grants or subventions and relied on either alternative sources of income or voluntary contributions.{{sfn|Ficarra et al.|2020|p=35}} As they can be "difficult to define adequately", open science infrastructures can be overlooked by funding bodies, which "contributes to the challenge of securing funding".<ref>{{harvnb|Neylon|2017|p=1}}</ref>
Line 177:
* {{Cite report |last=Lewis| first=David W.| title=Mapping Scholarly Communication Infrastructure: A Bibliographic Scan of Digital Scholarly Communication Infrastructure| date=May 2020| location=Atlanta, GA| publisher=Educopia Institute| url=https://scholarworks.iupui.edu/server/api/core/bitstreams/cee09afc-db34-42f5-840b-be44338ed691/content| access-date=2021-12-12}}
* {{Cite report| author=((eResearch2020))| publisher = European Commission| title = The role of e-Infrastructures in the creation of global virtual research communities| location = Brussels| date = 2010|url = https://op.europa.eu/en/publication-detail/-/publication/edf0fed4-c01a-454b-8a9e-34f602b00100}}
* {{Cite report |ref={{harvid|Operas Landscape Study|2017}}| publisher = OPERAS| title = Landscape Study on Open Access Publishing| series = Design for Open Access Publications in European Research Areas for Social Sciences and Humanities| date = 2017| doi=10.3030/731031 |url=https://cordis.europa.eu/project/id/731031/results| url-access = subscription}}
* {{Cite report| last1= Chodacki| first1= John| last2= Cruse| first2= Patricia| last3= Lin| first3= Jennifer| last4= Neylon| first4= Cameron| last5= Pattinson| first5= Damian| last6= Strasser| first6= Carly| title = Supporting Research Communications: a guide| access-date = 2021-12-11| date = 2018-04-05| url = https://zenodo.org/record/3524663}}
* {{Cite report |ref={{harvid|Aspesi et al.|2019}}| publisher = LIS Scholarship Archive| last1= Aspesi| first1= Claudio| last2= Allen| first2= Nicole Starr| last3= Crow| first3= Raym| last4= Daugherty| first4= Shawn| last5= Joseph| first5= Heather| last6= McArthur| first6= Joseph| last7= Shockey| first7= Nick| title = SPARC Landscape Analysis: The Changing Academic Publishing Industry – Implications for Academic Institutions| access-date = 2022-01-05| date = 2019-04-03| url = https://osf.io/preprints/lissa/58yhb/}}
Line 249:
* {{Cite web| title = The end of the journal? What has changed, what stayed the same?| last=Neylon| first=Cameron| date=2015-11-29| work=Science in the Open| access-date = 2021-10-31| url = http://cameronneylon.net/blog/the-end-of-the-journal-what-has-changed-what-stayed-the-same/}}
* {{Cite web| last = Guédon| first = Jean-Claude| title = Open Access: Toward the Internet of the Mind| work = BOAI| url=https://www.budapestopenaccessinitiative.org/boai15/open-access-toward-the-internet-of-the-mind/| access-date=2021-12-12}}
* {{cite journal |last=Bilder |first=Geoffrey |date=2020-12-02 |url=https://www.crossref.org/blog/crossrefs-board-votes-to-adopt-the-principles-of-open-scholarly-infrastructure/ |title=Crossref's Board votes to adopt the Principles of Open Scholarly Infrastructure |website=Blog |publisher=Crossref|doi=10.64000/hzemx-j7n79 }}
* {{cite web |author=The Dryad Team |date=2020-12-08 |url=https://blog.datadryad.org/2020/12/08/dryads-commitment-to-the-principles-of-open-scholarly-infrastructure/ |title=Dryad's Commitment to the Principles of Open Scholarly Infrastructure |website=Dryad news}}
* {{cite web |title=Open Science MOOC Response to UNESCO Draft Open Science Recommendations |author=((Open Science MOOC 2020 Steering Committee)) |date=December 30, 2020 |url=https://en.unesco.org/sites/default/files/comments_osr_partner_open_science_mooc_document.pdf}}