Wikipedia:Authority control integration proposal: Difference between revisions

Content deleted Content added
Background: expand benefits
Line 13:
{{main|Wikipedia:Authority control}}
 
[[Authority control]] is a system primarily used in libraries and other metadata services, where a single entity is given a canonical unique identifier. This allows clear disambiguation between different entities with similar names, while also allowing the use of a single identifier for those with multiple variant names. On Wikipedia, this is handled with the {{tl|authority control}} template, which places the identifiers at the end of the article and links out to library catalogues and central authority databases.
 
As well as these reader-visible links, the embedded data helps build infrastructure for future work, such as:
Currently, around 4,000 articles on the English Wikipedia have some form of embedded authority control identifier. Likewise, on Commons, around 45,000 articles contain authority control identifiers. On the German Wikipedia, by comparison, around 220,000 articles have embedded identifiers.
 
*'''Reliable linking from external services''' - we can build lookup services, such as this tool for the German Wikipedia's PND files: http://toolserver.org/~apper/pd/person/pnd-redirect/de/118768581 - which takes you to the article represented by that PND. Such tools allow people to generate links to Wikipedia automatically without guessing at article titles, use the API to pull out leads from articles for reuse on other sites, etc.
The practical uses for these identifiers are varied. Among other things, they can:
*'''Extending the scope for checking metadata''' - we already have methods, such as the [[Wikipedia:Death anomalies project|Death anomalies project]], for comparing the metadata between Wikipedia language editions and spotting inconsistencies. Including identifiers which tie into external services with reliable APIs gives us a lot of additional data for cross-checking.
* help provide access to material written by and about the subject of an article, by linking directly into catalogues;
*'''Returning metadata to the outside world''' - working backwards from this, once we have embedded identifiers, the curators of this metadata will find it a lot easier to incorporate information from Wikipedia, taking advantage of our fairly fast update cycle for things like death dates.
* identify alternate names for which we can create redirects;
*'''Identifying alternate names''' - particularly for non-standard transliterations, the alternate headings in authority files give us an extensive and curated collection of name variants. The linkage will help with the creation of redirects.
* allow direct linking to, and reuse of, Wikipedia articles by external services, without needing to check page titles;
*'''Content creation support''' - the presence of the identifiers allows future work on tools, e.g. scripts to generate authors' bibliographies for articles.
* support the development of better metadata services, by helping information flow from Wikipedia back to the central data stores;
* help tie together articles on specific individuals, supporting the development of [[:meta:Wikidata|Wikidata]].
 
Currently, around 4,000 articles on the English Wikipedia have some form of embedded authority control identifier. Likewise, on Commons, around 45,000 articles contain authority control identifiers. On the German Wikipedia, by comparison, around 220,000 articles have embedded identifiers.
This project is not the first effort to merge authority control with Wikipedia, but rather aims to build on previous projects; its main goal is to prepare infrastructure for use in Wikidata and future interoperability with external authority files, and to support the opportunities for future innovation with linked data.
 
==The proposal==