Wikipedia:Authority control integration proposal: Difference between revisions
(16 intermediate revisions by 5 users not shown)
Line 1:
{{Superseded|[[Wikipedia:Authority control integration proposal/FAQ|the project FAQ]], as '''the project has now concluded'''}}
{{Notice|If you've '''found an error''' with one of the VIAF codes added, please [[Wikipedia:VIAF/errors|list it here]].}}
==Video summary of the proposal==
A video summary is available on [http://www.youtube.com/watch?v=uwwTNmJUQ8w YouTube].
==Introduction==
This proposed project intends to extend and systematise the use of [[authority control]] identifiers, using the {{tl|Authority control}} template, on English Wikipedia articles. ''Authority control'' is the [[term-of-art]] in librarianship, archival practice and related fields for [[unique identifiers]] used to [[Wikipedia:Disambiguation|disambiguate]] objects (people, places, academic subjects, etc.). These fields conceptualise unique identifiers differently from some other fields, because many of the systems in place are backwards-compatible with pre-computerisation systems. This project aims to connect the English Wikipedia to this [[long tail]] of identifiers.
The current proposal focuses on biographies, although this may be extended in future to cover other topics, and is built around the use of data from [[VIAF]], a composite system bringing together several major authority files. VIAF algorithmically matches and clusters entries from the individual authority files, and uses data scraped from Wikipedia to aid the process; as a result, there have already been a large number of Wikipedia-VIAF matched pairs identified and this provides a very effective springboard to work from.
The proposal was originally written up here, and [[Wikipedia:Village pump (proposals)/Archive 89#section Authority Control Integration|discussed on the Village Pump]].
This plan is being coordinated by [[User:Maximiliankleinoclc|Max Klein]], the Wikipedian in Residence at [[OCLC]], and [[User:Andrew Gray|Andrew Gray]], the Wikipedian in Residence at the British Library. OCLC are the central operating group for VIAF, and have offered to provide technical support for the matching process. If you would like to help work on it, please [[Wikipedia talk:Authority control integration
==Background==
Line 21 ⟶ 27:
*'''Returning metadata to the outside world''' - working backwards from this, once we have embedded identifiers, the curators of this metadata will find it a lot easier to incorporate information from Wikipedia, taking advantage of our fairly fast update cycle for things like death dates.
*'''Identifying alternate names''' - particularly for non-standard transliterations, the alternate headings in authority files give us an extensive and curated collection of variants of names. The linkage will help the creation of redirects.
*'''Content creation support''' - the presence of the identifiers allows future work on tools to,
Currently, around 4,000 articles on the English Wikipedia have some form of embedded authority control identifier, and on Commons, around 45,000 articles contain authority control. On the German Wikipedia, by comparison, [[:de:Wikipedia:Normdaten|around 220,000 articles]] have embedded identifiers.
==The proposal==
Line 37 ⟶ 43:
There are three available sources of data:
#'''Articles already using {{tl|Authority control}}'''.
#'''Interwikied articles with identifiers'''.
#:''Around [[:de:Vorlage:NORMDATENCOUNT|145,000 articles]] on the German Wikipedia currently have VIAF identifiers; the rest use other identifiers, but it may be practical to match them to VIAF.''
#'''VIAF authority file links'''. As part of the matching process, Wikipedia is used as a source of information to help bring VIAF "clusters" together. OCLC have provided an extracted list of over 250,000 English Wikipedia articles with corresponding VIAF numbers, though these may have to be checked to ensure that pages have not been moved since the matching was carried out.
#:''(The matching is done with this [http://dl.dropbox.com/u/10997393/wikipedia2auth3.py Python code] written by OCLC Research Scientists Thom Hickey and Jenny Toves. During the algorithmic creation of the VIAF file, a Wikipedia link is included in an entry if it is matched with ~98% accuracy. There are currently 266,202 links from VIAF to Wikipedia, available [http://dl.dropbox.com/u/10997393/wikilinks.out as a tab-delimited text file].)''
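As a rough illustration of how the tab-delimited link list could be loaded before any edits are made, the sketch below builds a title-to-VIAF mapping and sets aside titles that appear with more than one VIAF number for manual checking. The column order (VIAF number first, article title second), the underscore handling and the function name <code>load_viaf_links</code> are assumptions made for the example; the actual layout of the file is not described here.

```python
def load_viaf_links(lines):
    """Build {article title: VIAF id} from tab-delimited lines.

    Assumed layout (not confirmed by the source): VIAF id, then title.
    Titles seen with two different VIAF ids are returned separately
    so that they can be reviewed by hand instead of being edited.
    """
    links = {}
    conflicts = set()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed rows
        viaf_id = parts[0].strip()
        title = parts[1].strip().replace("_", " ")
        if not viaf_id or not title:
            continue
        if title in links and links[title] != viaf_id:
            conflicts.add(title)
            continue
        links[title] = viaf_id
    # Drop every conflicting title from the usable mapping.
    for title in conflicts:
        links.pop(title, None)
    return links, conflicts


if __name__ == "__main__":
    # Placeholder rows, not real VIAF data.
    sample = [
        "1111\tExample_Person_A\n",
        "2222\tExample Person B\n",
        "3333\tExample_Person_A\n",
    ]
    links, conflicts = load_viaf_links(sample)
    print(links)
    print(conflicts)
```

Checking for moved pages, as mentioned above, would still need a separate lookup against the live wiki; this sketch only covers reading the list.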
Line 52 ⟶ 58:
# This bot will add {{tl|authority control}} along with the VIAF codes from this list, once testing is complete.
# Finally, this bot will run periodic reports in conjunction with the VIAF update schedule, to reflect any reshuffling that occurs in the file.
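The template-adding step could look roughly like the sketch below. It is not the bot's actual code: the function name <code>add_authority_control</code>, the use of a <code>VIAF=</code> parameter, and the choice to append the template at the very end of the page are all assumptions made for illustration. A real bot would also need to place the template relative to categories and other footer templates.

```python
import re

# Matches an existing {{Authority control}} call, whatever its
# capitalisation or parameters.
AC_PATTERN = re.compile(r"\{\{\s*authority[ _]control\b", re.IGNORECASE)


def add_authority_control(wikitext, viaf_id):
    """Return (new_wikitext, changed).

    Pages that already carry the template are left untouched, so
    identifiers added by hand are never overwritten.
    """
    if AC_PATTERN.search(wikitext):
        return wikitext, False
    # Assumed parameter name: VIAF.
    template = "{{Authority control|VIAF=%s}}" % viaf_id
    return wikitext.rstrip("\n") + "\n\n" + template + "\n", True


if __name__ == "__main__":
    # Placeholder page text and identifier.
    page = "'''Example Person''' was a writer.\n"
    new_text, changed = add_authority_control(page, "1111")
    print(changed)
    print(new_text)
```

Skipping pages that already have the template keeps the operation safe to re-run, which matters if the bot is restarted partway through the list.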
Line 72 ⟶ 77:
{{Wikipedia:Authority_control_integration_proposal/FAQ}}
==Progress==
Now that the [[Wikipedia:Authority_control_integration_proposal/RFC|RFC]] has passed, the work of the bot is underway. The code can be viewed on [https://github.com/notconfusing/VIAFbot GitHub].
- [[User:Maximiliankleinoclc|Max Klein]], OCLC Wikipedian in Residence, and [[User:Andrew Gray|Andrew Gray]], British Library Wikipedian in Residence.