More conversion to sfn templates. |
More conversion to sfn templates. Remove a couple of PN that I had added. |
Line 13:
''Open science infrastructure'' is a form of knowledge infrastructure that makes it possible to create, publish and maintain open scientific outputs such as publications, data or software.
The [[UNESCO]] Recommendation on Open Science, approved in November 2021, defines open science infrastructures as "shared research infrastructures that are needed to support open science and serve the needs of different communities".<ref name=UNESCO/> The SPARC report on European Open Science Infrastructure includes the following activities within the range of open science infrastructures: "We define Open Access & Open Science Infrastructure as sets of services, protocols, standards and software contributing to the research lifecycle – from collaboration and experimentation through data collection and storage, data organization, data analysis and computation, authorship, submission, review and annotation, copyediting, publishing, archiving, citation, discovery and more".
===Infrastructure===
Line 19:
Open science infrastructures have specific properties that distinguish them from other forms of open science projects or initiatives:
*Open science infrastructures are not simply technical products but embed a set of tools, institutions and social norms.
*Open science infrastructures are durable and resilient. They are expected to run on a long-term basis so that multiple research programs can rely on them.{{sfn|Atkins|2003|p=5}}
*Open science infrastructures can be shared and used by different actors and communities. They must be sufficiently consistent to remain coordinated and yet have to accommodate a diverse array of local uses: "an infrastructure occurs when the tension between local and global is resolved".{{sfn|Star|Ruhleder|1996}} Prior agreement among all stakeholders on the scope and the governance of the infrastructure is a critical step.{{sfn|Bos et al.|2007|p=667}}
===Openness and the commons===
Open science infrastructures are open, which differentiates them from other scientific and knowledge infrastructures and, more specifically, from subscription-based commercial infrastructures. Openness is both a core value and a guiding principle that affects the aims, the governance and the management of the infrastructure. Open science infrastructures face issues similar to those met by other open institutions, such as [[open data]] repositories or large-scale collaborative projects like Wikipedia: "When we study contemporary knowledge infrastructures we find values of openness often embedded there, but translating the values of openness into the design of infrastructures and the practices of infrastructuring is a complex and contingent process".
The conceptual definition of open science infrastructures has been largely influenced by the analysis of [[Elinor Ostrom]] on the [[commons]] and more specifically on the [[knowledge commons]]. In accordance with Ostrom, [[Cameron Neylon]] underlines that open infrastructures are characterized not only by the management of a pool of common resources but also by the elaboration of common governance and norms.{{sfn|Neylon|2017|p=7}} The economic theory of the commons makes it possible to expand beyond the limited scope of scholarly associations toward large-scale community-led initiatives: "Ostrom's work (…) provides a template (…) to make the transition from a local ''club'' to a community-wide infrastructure."{{sfn|Neylon|2017|pp=7-8}} Open science infrastructures tend to favor a not-for-profit, publicly funded model with strong involvement from scientific communities, which distinguishes them from privately owned closed infrastructures: "open infrastructures are often scholar-led and run by non-profit organisations, making them mission-driven instead of profit-driven."{{sfn|Kraker|2021|p=2}} This status aims to ensure the autonomy of these infrastructures and to prevent their incorporation into commercial infrastructure.
Open science infrastructures are not only a more specific subset of scientific infrastructures and cyberinfrastructures but may also include actors that would not fall into this definition. "Open access publication platforms" such as [[SciELO]], [[OpenEdition.org|OpenEdition]] or the [[Open Library of Humanities]] are considered an integral part of open science infrastructures in the UNESCO definition<ref name=UNESCO/> and in several literature reviews{{sfn|Lewis|2020|p=6}} and policy reports,{{sfn|Ficarra et al.|2020|p=8}} whereas they were usually considered separate entities in the policy debate on cyberinfrastructure and e-infrastructures.{{sfn|Dacos|2013}} In the 2010 report of the European Commission on e-infrastructure, scientific publishing platforms are "not e-Infrastructures but closely related to it".{{sfn|eResearch2020|2010|p=222}}
Line 34:
===Principles for open science infrastructures===
In 2015, the ''Principles for Open Scholarly Infrastructure'' laid out an influential prescriptive definition of open science infrastructures. Subsequent definitions and terminologies of open science infrastructures have been largely elaborated on this basis.
The ''Principles'' attempt to hybridize the framework of infrastructure studies with the analysis of the [[commons]] initiated by [[Elinor Ostrom]]. They develop a series of recommendations in three areas critical to the success of open infrastructures:
Line 52:
[[File:Sputnik asm.jpg|thumbnail|The Sputnik launch triggered one of the first major debates on scientific infrastructure]]
Scientific projects have been among the earliest use cases for digital infrastructure. The theorization of scientific knowledge infrastructure even predates the development of computing technologies. The knowledge networks envisioned by [[Paul Otlet]] and [[Vannevar Bush]] already incorporated numerous features of online scientific infrastructures.
After the Second World War, the United States faced a "periodical crisis": existing journals could not keep up with the rapidly increasing scientific output.
Influential members of the [[National Science Foundation]] such as [[Joshua Lederberg]] advocated for the creation of a "centralized information system", [[SCITEL]], that would at first coexist with printed journals and gradually replace them altogether on account of its efficiency.
Although it anticipated key features of online scientific platforms, the SCITEL plan was technically unrealistic at the time. The first working prototype of an online retrieval system, developed in 1963 by Doug Engelbart and Charles Bourne at the Stanford Research Institute, was heavily constrained by memory issues: no more than 10,000 words of a few documents could be indexed.
[[File:Principle medlars.png|thumb|The indexing process of citations in MEDLARS, an early scientific infrastructure for publications in medicine]]
Instead of a general-purpose publishing platform, the early scientific computing infrastructures focused on specific research areas, such as [[MEDLINE]] for medicine, NASA/RECON for space engineering or OCLC WorldCat for library search: "most of the earliest online retrieval system provided access to a bibliographic database and the rest used a file containing another sort of information—encyclopedia articles, inventory data, or chemical compounds."
{{Quotation|The designers of the first online systems had presumed that searching would be done by end users; that assumption undergirded system design. MEDLINE was intended to be used by medical researchers and clinicians, NASA/RECON was designed for aerospace engineers and scientists. For many reasons, however, most users through the seventies were librarians and trained intermediaries working on behalf of end users. In fact, some professional searchers worried that even allowing eager end users to get at the terminals was a bad idea.}}
The development of digital infrastructure for scientific publication was largely undertaken by private companies. In 1963, Eugene Garfield created the [[Institute for Scientific Information]], which aimed to transform the projects initially envisioned with Lederberg into a profitable business. The [[Science Citation Index]] relied on the computational processing of citation data. It had a massive and lasting influence on the structuring of global scientific publication in the last decades of the 20th century, as its most important metric, the Journal Impact Factor, "ultimately came to provide the metric tool needed to structure a competitive market among journal[s]".
Until the advent of the web, the landscape of scientific infrastructures remained fragmented.
=== The Web Revolution (1990–1995) ===
The [[World Wide Web]] was originally framed as an open scientific infrastructure. The project was inspired by [[ENQUIRE]], an information management program commissioned from [[Tim Berners-Lee]] by [[CERN]] for the specific needs of high energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected "nodes" that "could refer to a person, a software module, etc. and that could be interlined with various relations such as made, include, describes and so forth".
Sharing of data and data documentation was a major focus in the initial communication of the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data".{{sfn|Berners-Lee|1991}}
The web rapidly superseded pre-existing online infrastructures, even when these included more advanced computing features. From 1991 to 1994, users of the [[Worm Community System]], a major biology database on worms, switched to the Web and Gopher. While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the ''Worm Community System'' could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources (…) The World-Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services."
The Web and similar protocols developed at the time had a similar impact on scientific publications. Early forms of open access publishing were not developed by large-scale institutional infrastructures but through small initiatives. Universal access, regardless of the operating system, made it possible to maintain and share community-driven electronic journals years before online commercial scientific publishing became viable:
{{Quotation|In the late ‘80s and early ‘90s, a host of new journal titles launched on listservs and (later) the Web. Journals such as ''Postmodern Cultures'', ''Surfaces'', the ''Bryn Mawr Classical Review'' and the ''Public-Access Computer Systems Review'' were all managed by scholars and library workers rather than publishing professionals.}}
The first [[open-access repository|open-access repositories]] were individual or community initiatives as well. In August 1991, [[Paul Ginsparg]] created the first version of the [[arXiv]] project at the [[Los Alamos National Laboratory]] in response to recurring storage issues with academic mailboxes, caused by the increasing sharing of scientific articles.{{sfn|Feder|2021}}
Line 85:
The development of the World Wide Web rendered numerous pre-existing scientific infrastructures obsolete. It also lifted numerous restrictions and obstacles to online contribution and network management, which made it possible to attempt more ambitious projects. By the end of the 1990s, the creation of public scientific computing infrastructure became a major policy issue.{{sfn|Borgman|2007|p=21}} The first wave of web-based scientific projects in the 1990s and the early 2000s revealed critical issues of sustainability. As funding was allocated for a specific time period, critical databases, online tools or publishing platforms could hardly be maintained;{{sfn|Dacos|2013}} and project managers were faced with a ''valley of death'' "between grant funding and ongoing operational funding".{{sfn|Skinner|2019|p=6}}
Several competing terms appeared to fill this need. In the United States, the term ''cyberinfrastructure'' was used in a scientific context by a US National Science Foundation (NSF) blue-ribbon committee in 2003: "The newer term cyberinfrastructure refers to infrastructure based upon distributed computer, information and communication technology. If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy."
Thanks to "sizable investments",<ref name = "eccles">{{harvnb|Eccles et al.|2009}}</ref> major national and international infrastructures were launched between the initial policy discussions of the early 2000s and the economic crisis of 2007–2008, such as the [[Open Science Grid]], [[BioGRID]], the [[Jisc|JISC]], {{ill|DARIAH|WD=Q49103279}} or the [[Project Bamboo]].{{sfn|Dacos|2013}}{{sfn|eResearch2020|2010|p={{pn|date=February 2024}}}} Specialized free software for scientific publishing such as [[Open Journal Systems]] became available after 2000. This development entailed a significant expansion of non-commercial open access journals by facilitating the creation and administration of journal websites and the digital conversion of existing journals.{{sfn|Bosman et al.|2021|p=93}} Among the non-commercial journals registered in the Directory of Open Access Journals, the number of journals created annually rose from 100 at the end of the 1990s to 800 around 2010, and has not changed significantly since then.{{sfn|Bosman et al.|2021|p=30}}
By 2010, infrastructures were "no longer in infancy" and yet "they are also not yet fully mature".<ref name = "eccles"/> While the development of the web solved a large range of technical issues regarding network management, building scientific infrastructure remained challenging. Governance, communication across all involved stakeholders, and strategic divergences were major factors of success or failure. One of the first major infrastructures for the humanities and the social sciences, the [[Project Bamboo]], was ultimately unable to achieve its ambitious aims: "From the early planning workshops to the [[Mellon Foundation]]’s rejection of the project’s final proposal attempt, Bamboo was dogged by its reluctance and/or inability to concretely define itself".
[[File:Providers of digital tools for the scientific workflow.png|thumb|Leading commercial ecosystems for scientific research]]
Leading commercial publishers were initially caught off guard by the unexpected rise of the Web for academic publication: the executive board of [[Elsevier]] "had failed to grasp the significance of electronic publishing altogether, and therefore the deadly danger that it posed—the danger, namely, that scientists would be able to manage without the journal".
{{quote|The privatised control of scholarly infrastructures is especially noticeable in the context of ‘vertical integration’ that publishers such as Elsevier and SpringerNature are seeking by controlling all aspects of the research life cycle, from submission to publication and beyond. For example, this vertical integration is represented in a number of Elsevier’s business acquisitions, such as Mendeley (a reference manager), SSRN (a pre-print repository) and Bepress (a provider of repository and publishing software for universities).}}
=== Toward open science infrastructures (2015-…) ===
The consolidation and expansion of commercial scientific infrastructure has prompted renewed calls to secure "community-controlled infrastructure".
In contrast with the consolidation of privately owned infrastructure, the open science movement "has tended to overlook the importance of social structures and systemic constraints in the design of new forms of knowledge infrastructures".{{sfn|Okune et al.|2018|p=13}} It remained mostly focused on the content of scientific research, with little integration of technical tools and few large community initiatives. "Common pool of resources is not governed or managed by the current scholarly commons initiative. There is no dedicated hard infrastructure and though there may be a nascent community, there is no formal membership."{{sfn|Bosman et al.|2018|p=19}}
More precise concepts were needed to embed ethical principles of openness, community service and autonomous governance in the building of infrastructure and to ensure the transformation of small localized scholarly networks into large, "community-wide" structures.{{sfn|Neylon|2017|p=7}} In 2013, [[Cameron Neylon]] underlined that the lack of common infrastructure was one of the main weaknesses of the open science ecosystem: "in a world where it can be cheaper to re-do an analysis than to store the data, we need to consider seriously the social, physical, and material infrastructure that might support the sharing of the material outputs of research".
{{Quote|Over the past decade, we have made real progress to further ensure the availability of data that supports research claims. This work is far from complete. We believe that data about the research process itself deserves exactly the same level of respect and care. The scholarly community does not own or control most of this information. For example, we could have built or taken on the infrastructure to collect bibliographic data and citations but that task was left to private enterprise.{{sfn|Neylon et al.|2015}}}}
Line 107:
Since 2015, these principles have become the most influential definition of open science infrastructures. They have been endorsed by leading infrastructures such as Crossref,{{sfn|Bilder|2020}} OpenCitations{{sfn|Di Giambattista|2021}} and Data Dryad,{{sfn|The Dryad Team|2020}} and have become a common basis for the institutional evaluation of existing open infrastructures.<ref name="ficarra_21">{{harvnb|Ficarra et al.|2020|p=21}}</ref> The main focus of the ''Principles'' is to build "trustworthy institutions" with significant commitments in terms of governance, financial sustainability and technical efficiency, so that they can be durably relied on by scientific communities.{{sfn|Neylon|2017|p=7}}
By 2021, public services and infrastructures for research had largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer."
In agreement with the original intent of the ''Principles'', open science infrastructures are "seen as an antidote to the increased market concentration observed in the scholarly communication space."
The development of open scientific infrastructure has become a debated topic regarding the future of online scientific research. In January 2021, a collective of researchers called for a ''Plan I'' or ''Plan Infrastructure'' in reaction to perceived shortcomings of ''Plan S'', the international open science initiative of cOAlition S.
== Organization of open infrastructures ==
Most of the landscape reports on Open Infrastructure have been undertaken in Europe and, to a lesser extent, in Latin America. For Europe, the main sources include the SPARC report from 2020,
These reports underline that important open science infrastructures may be already existing and yet remain invisible to funders and scientific policies: "alternative practices and projects exist inside and outside Europe, but these projects are almost invisible to the eyes of the public authorities".
=== Type and roles ===