SoftWare Hash IDentifier: Difference between revisions

| number =
| start_date =
| organisation = [[Software Heritage]]
| digits =
| check_digit =
}}
 
The '''''SoftWare Hash IDentifier''''' ('''SWHID''') is a [[persistent identifier]] used to uniquely identify a particular piece of software [[source code]] and its version. SWHID is a standard similar to the [[Digital Object Identifier|DOI]], but it is tailored specifically for software source code<ref name="ProgrammingHistorianFr_preserving_identifying" /> and is compatible with version control software such as [[git]].
 
An SWHID can be used to point to different components or versions of the source code of a software package.<ref name="ProgrammingHistorianFr_preserving_identifying" /> The SWHID is an intrinsic identifier in the sense that it describes the software based only on the software's intrinsic properties, with no reliance on an external register.<ref>{{Cite web |language=en |title=Intrinsic and Extrinsic identifiers |url=https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/ |website=Software Heritage |access-date=2025-05-24}}</ref>
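
As an illustration of what "intrinsic" means here, the following Python sketch derives an identifier for a single file from nothing but its bytes. It assumes that content identifiers are computed in the same way as Git blob identifiers (a SHA-1 digest over a short header followed by the file contents); the function name <code>content_swhid</code> is hypothetical and is not part of any official tooling.

<syntaxhighlight lang="python">
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a content identifier computed only from the bytes themselves.

    Assumption: the digest is the Git-style blob hash, i.e. SHA-1 over
    b"blob <length>\\0" followed by the raw content.
    """
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The same bytes always give the same identifier, with no registry lookup.
    print(content_swhid(b""))
</syntaxhighlight>

Because the identifier depends only on the content, two parties can compute it independently and compare the results without consulting an external register.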
 
== Format ==
The SWHID specification, maintained by Software Heritage,<ref name="UNESCO_archiving_open">{{cite Q|Q134581397|url-status=live}}</ref> allows identifying different components of software source code. Object types relating to the [[software versioning|software version]] are labelled as "snapshot", "release" or "revision"; a "directory" of files and possibly subdirectories can be identified; and a specific piece of a specific version of source code can be labelled as "content".<ref name="ProgrammingHistorianFr_preserving_identifying">{{cite Q|Q134581061|url-status=live|trans-title=Preserving and identifying research software with Software Heritage}}</ref> These are related to one another in a [[Merkle tree|Merkle]] [[directed acyclic graph]].<ref name="DiCosmo_Gruenpeter_Zacchiroli_2018">{{cite Q|Q105094730|url-status=live}}</ref>
 
The identifier has the following syntax:<ref name="SWHID_tracking_past" />
 
<syntaxhighlight lang="text">
swh:<scheme_version>:<object_type>:<object_id>[;qualifiers]
</syntaxhighlight>
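
The core part of this syntax can be checked mechanically. The following Python sketch is a minimal illustration, not a reference implementation: the name <code>parse_core_swhid</code> and the <code>CoreSWHID</code> tuple are hypothetical, the three-letter type tags are assumed to be <code>snp</code>, <code>rel</code>, <code>rev</code>, <code>dir</code> and <code>cnt</code>, and the object identifier is assumed to be 40 lowercase hexadecimal digits. Qualifiers are deliberately left out.

<syntaxhighlight lang="python">
import re
from typing import NamedTuple

# Assumed mapping from three-letter tags to the object types named in the text.
OBJECT_TYPES = {
    "snp": "snapshot",
    "rel": "release",
    "rev": "revision",
    "dir": "directory",
    "cnt": "content",
}

# swh:<scheme_version>:<object_type>:<object_id>, without qualifiers.
CORE_RE = re.compile(r"swh:(?P<version>\d+):(?P<tag>[a-z]{3}):(?P<oid>[0-9a-f]{40})")


class CoreSWHID(NamedTuple):
    scheme_version: int
    object_type: str
    object_id: str


def parse_core_swhid(text: str) -> CoreSWHID:
    """Split a core identifier into its three fields or raise ValueError."""
    match = CORE_RE.fullmatch(text)
    if match is None or match["tag"] not in OBJECT_TYPES:
        raise ValueError(f"not a core SWHID: {text!r}")
    return CoreSWHID(int(match["version"]), OBJECT_TYPES[match["tag"]], match["oid"])


if __name__ == "__main__":
    print(parse_core_swhid("swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd"))
</syntaxhighlight>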
Version 3.0 of the Linux kernel, released in July 2011, has the following SWHID:<ref>{{Cite web |language=en |title=Release v3.0 of torvalds/linux repository |url=https://archive.softwareheritage.org/browse/release/4204bcde7c0b93c5e127eb868e17b337a513cf34/?origin_url=https://github.com/torvalds/linux&release=v3.0&snapshot=130eecc6bd74794737bb078fe5c3fadd034eddcc |website=Software Heritage |access-date=2025-05-24}}</ref>
 
<syntaxhighlight lang="text">swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd</syntaxhighlight>
 
The following example, drawn from the specification documentation,<ref>{{Cite web |language=en |title=Qualified identifiers |url=https://www.swhid.org/specification/v1.2/6.Qualified_identifiers/ |website=swhid.org |access-date=2025-05-27}}</ref> illustrates the use of multiple qualifiers in an SWHID:
 
<syntaxhighlight lang="text">
swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15
</syntaxhighlight>
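
A qualified identifier of this kind can be taken apart with ordinary string operations. The Python sketch below is illustrative only: the function name <code>split_qualified_swhid</code> is hypothetical, and it assumes that qualifiers are separated by semicolons, that each has the form <code>key=value</code>, and that no qualifier value contains an unescaped semicolon.

<syntaxhighlight lang="python">
from typing import Dict, Tuple

# The qualified identifier shown above, split over several lines for readability.
EXAMPLE = (
    "swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b"
    ";origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git"
    ";visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9"
    ";anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0"
    ";path=/Examples/SimpleFarm/simplefarm.ml"
    ";lines=9-15"
)


def split_qualified_swhid(text: str) -> Tuple[str, Dict[str, str]]:
    """Return the core identifier and a dict of its qualifiers, in order."""
    core, *parts = text.split(";")
    qualifiers: Dict[str, str] = {}
    for part in parts:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed qualifier: {part!r}")
        qualifiers[key] = value
    return core, qualifiers


if __name__ == "__main__":
    core, qualifiers = split_qualified_swhid(EXAMPLE)
    print(core)
    for key, value in qualifiers.items():
        print(f"  {key} = {value}")
</syntaxhighlight>

Here the core identifier names the file content itself, while the qualifiers record where it was found (<code>origin</code>, <code>visit</code>, <code>anchor</code>, <code>path</code>) and which lines are of interest (<code>lines</code>).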
 
== Standards ==
The SoftWare Hash IDentifier was developed by [[Software Heritage]]. Software Heritage's archives, identified by their SWHIDs, were publicly released starting in 2018.<ref name="CNRS_supports">{{cite Q|Q134581205|url-status=live|trans-title=The CNRS supports Software Heritage}}</ref>
 
{{as of|2020}}, SWHIDs were in use for about nine billion versions of pieces of software,<ref name="CNRS_supports" /> termed "artefacts".<ref name="SWHID_tracking_past">{{cite Q|Q134580517|url-status=live}}</ref> SWHIDs are integrated with research repositories including [[HAL (open archive)|HAL]], [[Zenodo]] and the French catalog of Academic Research Free Software.<ref>{{Cite web |language=en |title=About the site |url=https://logiciels.catalogue-esr.fr/readme |website=French Catalog of Academic Research Free Software |access-date=2025-05-24}}</ref> The identifier can be used by [[package manager]]s. [[GNU Guix|Guix]] uses SWHIDs to retrieve source code from a software archive when it is unavailable at its original URL.<ref>{{Cite web |language=en |title=Identifying software |url=https://guix.gnu.org/fr/blog/2024/identifying-software/ |website=GNU Guix Blog |access-date=2025-05-27}}</ref>
 
The acronym SWHID originally referred to "Software Heritage Identifiers" used to catalog software artifacts in the early days of the [[Software Heritage]] archive.<ref>{{Cite web |language=en |title=SoftWare Hash IDentifier (SWHID) |url=https://www.softwareheritage.org/software-hash-identifier-swhid/ |website=Software Heritage |access-date=2025-05-24}}</ref> It later evolved into an open standard through a dedicated working group<ref>{{Cite web |language=en |title=SWHID working group |url=https://www.swhid.org/ |access-date=2025-05-24}}</ref> and was standardized as ISO/IEC 18670 in April 2025 under the more general name "Software Hash Identifier".<ref>{{Cite web |language=en |title=ISO/IEC 18670:2025 |url=https://www.iso.org/standard/89985.html |website=ISO |access-date=2025-05-24}}</ref>
 
[[Télécom Paris]] welcomed the ISO standardization, arguing that it is a significant step for global digital infrastructure that provides traceability of software affected by vulnerabilities.<ref name="TelecomParis_significant_advance">{{cite Q|Q134580605|url-status=live|trans-title=A significant advance for global digital infrastructure: the ISO/IEC 18670 standard is now official}}</ref> UNESCO stated that SWHID is useful for the reproducibility and long-term accessibility of software.<ref name="UNESCO_archiving_open" />
 
== References ==
* [https://www.iso.org/obp/ui/en/#iso:std:iso-iec:18670:ed-1:v1:en ISO/IEC 18670:2025 Specification v1.2]
 
{{ISO standards}}
[[Category:Information science]]
[[Category:Unique identifiers]]
 
{{comp-sci-stub}}