SoftWare Hash IDentifier: Difference between revisions

AbcSxyZ (talk | contribs)
Correct and reformulate sentence on intrinsic identifier
m Format: fix syntaxhighlight errors
 
(22 intermediate revisions by 4 users not shown)
{{Short description|Software identifier}}
{{Italic title}}
{{Stub|Information science}}
{{Infobox identifier
| name =
| number =
| start_date =
| organisation = [[Software Heritage]]
| digits =
| check_digit =
| example = [https://archive.softwareheritage.org/swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd]
| website = {{official website|name=swhid.org}}
}}
The '''''SoftWare Hash IDentifier''''' ('''SWHID''') is a [[persistent identifier]] used to uniquely identify a particular piece of software [[source code]] and its version. SWHID is a standard similar to the [[Digital Object Identifier|DOI]], but is tailored specifically for software source code,<ref name="ProgrammingHistorianFr_preserving_identifying" /> compatible with versioning software such as [[git]].
An SWHID can be used to point to different components or versions of the source code of a software package.<ref name="ProgrammingHistorianFr_preserving_identifying" /> The SWHID is an intrinsic identifier in the sense that it describes the software based only on the software's intrinsic properties, with no reliance on an external register.<ref>{{Cite web |language=en |title=Intrinsic and Extrinsic identifiers |url=https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/ |website=Software Heritage |access-date=2025-05-24}}</ref>
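
The intrinsic nature of the identifier can be illustrated with a minimal Python sketch. It assumes that a "content" object is hashed the same way git hashes a blob (SHA-1 over a short header followed by the raw bytes), which is consistent with the git compatibility noted above but should be checked against the specification. The function name <code>swhid_for_content</code> is hypothetical and not part of any official tool.

<syntaxhighlight lang="python">
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return a version-1 "cnt" identifier for raw file bytes (illustrative helper)."""
    # Assumption: contents are hashed like git blobs, i.e. SHA-1 over
    # the header b"blob <length>\0" followed by the bytes themselves.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The same bytes always yield the same identifier; no registry is consulted.
    print(swhid_for_content(b"let x = 1\n"))
</syntaxhighlight>

Because the result depends only on the bytes passed in, two parties holding the same file can derive the same identifier independently.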
 
== Format ==
The SWHID specification, maintained by Software Heritage,<ref name="UNESCO_archiving_open">{{cite Q|Q134581397|url-status=live}}</ref> allows identifying different components of software source code. Object types relating to the [[software versioning|software version]] are labelled as "snapshot", "release" or "revision"; a "directory" of files and possibly subdirectories can be identified; and a specific piece of a specific version of source code can be labelled as "content".<ref name="ProgrammingHistorianFr_preserving_identifying">{{cite Q|Q134581061|url-status=live|trans-title=Preserving and identifying research software with Software Heritage}}</ref> These are related to one another in a [[Merkle tree|Merkle]] [[directed acyclic graph]].<ref name="DiCosmo_Gruenpeter_Zacchiroli_2018">{{cite Q|Q105094730|url-status=live}}</ref>
 
The identifier has the following syntax:<ref name="SWHID_tracking_past" />
 
<syntaxhighlight lang="unixconfig">
swh:<scheme_version>:<object_type>:<object_id>[;qualifiers]
</syntaxhighlight>
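
The core part of this syntax can be checked mechanically. The following Python sketch is one possible way to do so, under stated assumptions: the scheme version is <code>1</code>, the object type is one of the tags <code>snp</code>, <code>rel</code>, <code>rev</code>, <code>dir</code> or <code>cnt</code>, and the object identifier is 40 lowercase hexadecimal characters. The names <code>CoreSWHID</code> and <code>parse_core_swhid</code> are hypothetical.

<syntaxhighlight lang="python">
import re
from typing import NamedTuple


class CoreSWHID(NamedTuple):
    scheme_version: int
    object_type: str
    object_id: str


# Assumption: version 1 only, five object-type tags, 40 lowercase hex digits.
_CORE_RE = re.compile(r"swh:(1):(snp|rel|rev|dir|cnt):([0-9a-f]{40})")


def parse_core_swhid(text: str) -> CoreSWHID:
    """Parse the part of an identifier that precedes any ';' qualifiers."""
    core = text.split(";", 1)[0]
    match = _CORE_RE.fullmatch(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    version, object_type, object_id = match.groups()
    return CoreSWHID(int(version), object_type, object_id)


if __name__ == "__main__":
    print(parse_core_swhid("swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd"))
</syntaxhighlight>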
According to the [[French National Centre for Scientific Research]] (CNRS), software source code archived with SWHIDs includes the source codes of [[Apollo 11]] navigation and of the [[NCSA Mosaic]] web browser.<ref name="CNRS_supports" />
 
Version 3.0 of the Linux kernel, released in July 2011, has the following SWHID:<ref>{{Cite web |language=en |title=Release v3.0 of torvalds/linux repository |url=https://archive.softwareheritage.org/browse/release/4204bcde7c0b93c5e127eb868e17b337a513cf34/?origin_url=https://github.com/torvalds/linux&release=v3.0&snapshot=130eecc6bd74794737bb078fe5c3fadd034eddcc |website=Software Heritage |access-date=2025-05-24}}</ref>
 
<syntaxhighlight lang="text">swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd</syntaxhighlight>
 
The following example, drawn from the specification documentation,<ref>{{Cite web |language=en |title=Qualified identifiers |url=https://www.swhid.org/specification/v1.2/6.Qualified_identifiers/ |website=swhid.org |access-date=2025-05-27}}</ref> illustrates the use of multiple qualifiers in an SWHID:
 
<syntaxhighlight lang="text">
swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15
</syntaxhighlight>
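
A qualified identifier such as the one above can be separated into its core part and its qualifiers with a few lines of Python. This is a minimal sketch, not a reference implementation: it assumes that qualifiers are <code>key=value</code> pairs separated by semicolons and that any semicolon inside a value is percent-encoded. The function name <code>split_qualifiers</code> is hypothetical.

<syntaxhighlight lang="python">
from urllib.parse import unquote


def split_qualifiers(swhid: str) -> tuple[str, dict[str, str]]:
    """Split an identifier into (core, {qualifier: value}); illustrative only."""
    core, *parts = swhid.split(";")
    qualifiers = {}
    for part in parts:
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = part.partition("=")
        # Assumption: reserved characters in values are percent-encoded.
        qualifiers[key] = unquote(value)
    return core, qualifiers


if __name__ == "__main__":
    example = (
        "swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b"
        ";origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git"
        ";path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15"
    )
    core, qualifiers = split_qualifiers(example)
    print(core)
    for key, value in qualifiers.items():
        print(f"  {key}: {value}")
</syntaxhighlight>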
 
== Standards ==
SWHID is an open standard licensed under the Community Specification License.<ref>{{Cite web |language=en |title=Copyright Section of SWHID Specification v1.2 |url=https://www.swhid.org/specification/v1.2/ |access-date=2025-05-24}}</ref>
 
SWHID was formalized as the ISO/IEC 18670 standard in April 2025.<ref name="ISO18670">{{Cite web |language=en |title=ISO/IEC 18670:2025 |url=https://www.iso.org/standard/89985.html |website=ISO |access-date=2025-05-24}}</ref>
 
== Creation and history ==
The SoftWare Hash IDentifier was developed by [[Software Heritage]]. Software Heritage's archives, identified by their SWHIDs, were publicly released starting in 2018.<ref name="CNRS_supports">{{cite Q|Q134581205|url-status=live|trans-title=The CNRS supports Software Heritage}}</ref>
 
{{as of|2020}}, SWHIDs were in use for about nine billion versions of pieces of software,<ref name="CNRS_supports" /> termed "artefacts".<ref name="SWHID_tracking_past">{{cite Q|Q134580517|url-status=live}}</ref> SWHIDs are integrated with research repositories including [[HAL (open archive)|HAL]], [[Zenodo]] and the French catalog of Academic Research Free Software.<ref>{{Cite web |language=en |title=About the site |url=https://logiciels.catalogue-esr.fr/readme |website=French Catalog of Academic Research Free Software |access-date=2025-05-24}}</ref> The identifier can be used by [[package manager]]s. [[GNU Guix|Guix]] uses SWHIDs to retrieve source code in a software archive when unavailable at its original URL.<ref>{{Cite web |language=en |title=Identifying software |url=https://guix.gnu.org/fr/blog/2024/identifying-software |website=GNU Guix Blog |access-date=2025-05-27}}</ref>
 
The acronym SWHID originally referred to "Software Heritage Identifiers", used to catalog software artifacts in the early days of the [[Software Heritage]] archive.<ref>{{Cite web |language=en |title=SoftWare Hash IDentifier (SWHID) |url=https://www.softwareheritage.org/software-hash-identifier-swhid/ |website=Software Heritage |access-date=2025-05-24}}</ref> It later evolved into an open standard through a dedicated working group<ref>{{Cite web |language=en |title=SWHID working group |url=https://www.swhid.org/ |access-date=2025-05-24}}</ref> and was standardized as ISO/IEC 18670 in April 2025 under the more general name "Software Hash Identifier".<ref>{{Cite web |language=en |title=ISO/IEC 18670:2025 |url=https://www.iso.org/standard/89985.html |website=ISO |access-date=2025-05-24}}</ref>
 
[[Télécom Paris]] welcomed the ISO standardization, arguing that it is a significant step for global digital infrastructure that provides traceability of software affected by vulnerabilities.<ref name="TelecomParis_significant_advance">{{cite Q|Q134580605|url-status=live|trans-title=A significant advance for global digital infrastructure: the ISO/IEC 18670 standard is now official}}</ref> UNESCO stated that SWHID is useful for the reproducibility and long-term accessibility of software.<ref name="UNESCO_archiving_open" />
 
== References ==
 
== External links ==
* {{Official website|https://www.swhid.org}}
* [https://www.iso.org/obp/ui/en/#iso:std:iso-iec:18670:ed-1:v1:en ISO/IEC 18670:2025 Specification v1.2]
 
{{ISO standards}}
[[Category:Information science]]
[[Category:Unique identifiers]]
 
{{comp-sci-stub}}