Open Archives Initiative Protocol for Metadata Harvesting
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a protocol developed by the Open Archives Initiative. It is used to harvest (or collect) the metadata descriptions of the records in an archive so that services can be built using metadata from many archives. An implementation of OAI-PMH must support metadata represented in Dublin Core, but may also support other representations.
The protocol is often referred to simply as the OAI protocol.
OAI-PMH uses XML over HTTP. The current version is 2.0, updated in 2002.
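Because responses are plain XML delivered over HTTP, a harvester can read them with any standard XML parser. The following is a minimal sketch in Python, not a reference implementation: the sample response is hand-written for illustration, the repository address and record identifier are hypothetical, and the helper name `extract_records` is an assumption of this example. Only the XML namespaces are taken from OAI-PMH 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and by unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written, illustrative response; the repository and record are hypothetical.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2002-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def extract_records(xml_text):
    """Return a list of (identifier, title) pairs found in a response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        # A record may carry several titles; this sketch keeps only the first.
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        results.append((identifier, title))
    return results


if __name__ == "__main__":
    for identifier, title in extract_records(SAMPLE_RESPONSE):
        print(identifier, "|", title)
```

In a real harvester the XML text would come from an HTTP response rather than a string constant; the parsing step stays the same.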
History
In the late 1990s, Herbert Van de Sompel (Ghent University) was working with researchers and librarians at Los Alamos National Laboratory (US) and called a meeting to address the interoperability problems of e-print servers and digital archives. The meeting was held in Santa Fe, New Mexico, in October 1999. A key development from the meeting was the definition of an interface that allowed e-print servers to expose metadata for the papers they held, so that other archives could identify and copy papers of interest, and vice versa. This interface was called the "Santa Fe Convention".
Several workshops were held in 2000 at the ACM Digital Libraries conference and elsewhere to share the ideas from the Santa Fe Convention. It was discovered at the workshops that the problems faced by the e-print community were also shared by libraries, museums, journal publishers, and others who needed to share distributed resources. To address these needs, the Coalition for Networked Information and the Digital Library Federation provided funding to establish an Open Archives Initiative (OAI) secretariat managed by Herbert Van de Sompel and Carl Lagoze. The OAI held a meeting at Cornell University (Ithaca, New York) in September 2000 to improve the interface developed at the Santa Fe Convention. The specifications were refined over e-mail.
OAI-PMH version 1.0 was introduced to the public in January 2001 at a workshop in Washington D.C., and another in February in Berlin, Germany. Subsequent modifications to the XML standard by the W3C required making minor modifications to OAI-PMH resulting in version 1.1. The current version, 2.0, was released in June 2002. It contained several technical changes and enhancements and is not backward compatible.
OAI-PMH
OAI-PMH provides an application-independent interoperability framework based on metadata harvesting. There are two classes of participants in the OAI-PMH framework:
- Data providers administer systems that support OAI-PMH as a means of exposing metadata.
- Service providers use metadata harvested via OAI-PMH as a basis for building value-added services.[1]
OAI registries
The OAI Protocol has become widely adopted by many digital libraries, institutional repositories, and digital archives. Although registration is not mandatory, it is encouraged.
There are several large registries of OAI-compliant repositories:
- The Open Archives list of registered OAI repositories
- The OAI registry at University of Illinois at Urbana-Champaign
- The Celestial OAI registry
- Eprint’s Institutional Archives Registry
- Openarchives.eu The European Guide to OAI-PMH compliant repositories in the world
- ScientificCommons.org A worldwide service and registry
Uses
Commercial search engines have started using OAI-PMH to acquire more resources. Google is using OAI-PMH to harvest information from the National Library of Australia Digital Object Repository. In 2004, Yahoo! acquired content from OAIster (University of Michigan) that was obtained through metadata harvesting with OAI-PMH. Google initially accepted OAI-PMH as part of its Sitemap Protocol, but stopped doing so in 2008.[2]
The mod_oai project uses OAI-PMH to expose content that is accessible from Apache Web servers to web crawlers.
Software
OAI-PMH is based on a client-server architecture, in which "harvesters" request information on updated records from "repositories". Requests for data can be based on a datestamp range, and can be restricted to named sets defined by the provider. Data providers are required to provide XML metadata in Dublin Core format, and may also provide it in other XML formats.
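A selective harvest request of the kind described above can be sketched as a URL builder. This is an illustrative sketch, not part of any particular harvester: the function name `build_list_records_url`, the base URL and the set name `physics` are hypothetical. The query arguments `verb`, `metadataPrefix`, `from`, `until` and `set`, and the `oai_dc` prefix for Dublin Core, are those defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords request URL for a repository's base URL.

    Dates are passed as strings such as "2002-06-01"; optional
    arguments are left out of the query when they are None.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date      # lower bound of the datestamp range
    if until_date is not None:
        params["until"] = until_date    # upper bound of the datestamp range
    if set_spec is not None:
        params["set"] = set_spec        # restrict to a named set
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical repository and set name, used only for illustration.
    print(build_list_records_url(
        "https://repository.example.org/oai",
        from_date="2002-06-01",
        until_date="2002-06-30",
        set_spec="physics",
    ))
```

Omitting the date arguments requests records regardless of datestamp, and omitting the set argument requests records from the whole repository.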
A number of software systems support the OAI-PMH, including Fedora, GNU EPrints from the University of Southampton, Open Journal Systems from the Public Knowledge Project, Desire2Learn, DSpace from MIT, HyperJournal from the University of Pisa, Primo, DigiTool, Rosetta and MetaLib from Ex Libris, DOOR from the eLab in Lugano, Switzerland, and the Java implementation jOAI.
Archives
A number of large archives support the protocol including arXiv and the CERN Document Server.
Workshops
Since 2001 there has been a yearly workshop at CERN in Geneva.
References
- Carl Lagoze and Herbert Van de Sompel, The Open Archives Initiative: Building a Low-Barrier Interoperability Framework, 2001, pp. 54-62.
- Lynch, Clifford A. (2001). "Metadata harvesting and the open archives initiative". ARL Bimonthly Report 217.
- Frank McCown, Xiaoming Liu, Michael L. Nelson and Mohammed Zubair, Search Engine Coverage of the OAI-PMH Corpus, in IEEE Internet Computing, vol. 10, n. 2, March/April 2006, pp. 66–73.
- Herbert Van de Sompel and Carl Lagoze, The Santa Fe Convention of the Open Archives Initiative, in D-Lib Magazine, vol. 6, n. 2, 2000, DOI:10.1045/february2000-vandesompel-oai.
- Herbert Van de Sompel, Jeffrey A. Young, and Thomas B. Hickey, Using the OAI-PMH ... Differently, in D-Lib Magazine, vol. 9, n. 7/8, 2003, DOI:10.1045/july2003-young.
Notes
See also
- Data format management
- Digital curation
- Digital preservation
- File format
- Dublin Core, an ISO metadata standard
- National Digital Information Infrastructure and Preservation Program
- Metadata Encoding and Transmission Standard, maintained by the Library of Congress
- LOCKSS
- Web archiving