SoftWare Hash IDentifier: Difference between revisions

top: infobox: +org +clickable URL for the example identifier
m Format: fix syntaxhighlight errors
 
(28 intermediate revisions by 4 users not shown)
Line 1:
{{Short description|Software identifier}}
{{Draft topics|linguistics|software|computing|technology}}
{{AfC topic|stem}}
{{AfC submission|||ts=20250526140318|u=AbcSxyZ|ns=118}}
{{AFC submission|d|reason|Declining because this article is currently only sourced to documentation pages, standards, or organizations are involved in SWHID development. For the notability criterion to be met, multiple reliable independent sources (news articles, books, etc.) discussing the topic need to be cited.|u=AbcSxyZ|ns=118|decliner=Mrfoogles|declinets=20250525181148|ts=20250525163607}} <!-- Do not remove this line! -->
 
{{AFC comment|1=In accordance with Wikipedia's [[Wikipedia:Conflict of interest|Conflict of interest policy]], I disclose that I have a conflict of interest regarding the subject of this article. <!--Comment automatically added by the Article Wizard--> [[User:AbcSxyZ|AbcSxyZ]] ([[User talk:AbcSxyZ|talk]]) 16:33, 25 May 2025 (UTC)}}
 
----
 
 
 
<!-- Important, do not remove anything above this line before article has been created. -->
 
{{Italic title}}
{{Stub|Information science}}
{{Infobox identifier
| name =
Line 28 ⟶ 14:
| number =
| start_date =
| organisation = [[Software Heritage]]
| digits =
| check_digit =
| example = [https://archive.softwareheritage.org/swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd]
| website = {{official website|name=swhid.org/}}
}}
The '''''SoftWare Hash IDentifier''''' ('''SWHID''') is a [[persistent identifier]] used to uniquely identify a particular piece of software [[source code]] and its version. SWHID is a standard similar to the [[Digital Object Identifier|DOI]], but is tailored specifically for software source code,<ref name="ProgrammingHistorianFr_preserving_identifying" /> and is compatible with version control software such as [[Git]].
 
An SWHID can be used to point to different components or versions of the source code of a software package.<ref name="ProgrammingHistorianFr_preserving_identifying" /> The SWHID is an intrinsic identifier in the sense that it describes the software based only on the software's intrinsic properties, with no reliance on an external register.<ref>{{Cite web |language=en |title=Intrinsic and Extrinsic identifiers |url=https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/ |website=Software Heritage |access-date=2025-05-24}}</ref>
 
== Format ==
The SWHID specification allows identifying different components of software source code. Object types relating to the [[software versioning|software version]] are labelled as "snapshot", "release" or "revision"; a "directory" of files and possibly subdirectories can be identified; and a specific piece of a specific version of source code can be labelled as "content".<ref name="ProgrammingHistorianFr_preserving_identifying">{{cite Q|Q134581061|url-status=live|trans-title=Preserving and identifying research software with Software Heritage}}</ref> These are related to one another in a [[Merkle tree|Merkle]] [[directed acyclic graph]].<ref name="DiCosmo_Gruenpeter_Zacchiroli_2018">{{cite Q|Q105094730|url-status=live}}</ref>
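
As an illustration of how an intrinsic identifier can be derived from the object alone, the following Python sketch computes a "content" identifier from raw bytes. It assumes that content identifiers reuse Git's blob hashing (SHA-1 over a <code>blob &lt;size&gt;</code> header, a NUL byte, and the data), which is how the compatibility with Git is usually described; the function name <code>content_swhid</code> is hypothetical and not part of any official tool.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a version-1 content SWHID for raw bytes (illustrative sketch)."""
    # Assumption: the object id is the Git blob hash, i.e. SHA-1 over
    # the header "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # Identifier of an empty file under the assumption above.
    print(content_swhid(b""))
```

Because the identifier depends only on the bytes, two parties hashing the same file obtain the same SWHID without consulting any registry.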
 
The identifier has the following syntax:<ref name="SWHID_tracking_past" />
 
<syntaxhighlight lang="unixconfig">
swh:<scheme_version>:<object_type>:<object_id>[;qualifiers]
</syntaxhighlight>
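
The core part of this syntax can be checked mechanically. The sketch below is a minimal, non-authoritative parser: it assumes the three-letter type tags <code>snp</code>, <code>rel</code>, <code>rev</code>, <code>dir</code> and <code>cnt</code> for the five object types and a 40-digit lowercase hexadecimal object id, as seen in the examples in this article. The names <code>CoreSWHID</code> and <code>parse_core_swhid</code> are hypothetical.

```python
import re
from typing import NamedTuple


class CoreSWHID(NamedTuple):
    scheme_version: int
    object_type: str
    object_id: str


# Assumed grammar: "swh", a numeric scheme version, a three-letter type tag,
# and a 40-digit lowercase hex object id, separated by colons.
_CORE_PATTERN = re.compile(r"swh:(\d+):(snp|rel|rev|dir|cnt):([0-9a-f]{40})")


def parse_core_swhid(text: str) -> CoreSWHID:
    """Split a core SWHID (without qualifiers) into its three fields."""
    match = _CORE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a core SWHID: {text!r}")
    version, object_type, object_id = match.groups()
    return CoreSWHID(int(version), object_type, object_id)


if __name__ == "__main__":
    print(parse_core_swhid("swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd"))
```

Anything that does not match the assumed grammar, such as an uppercase hash or an unknown type tag, is rejected with a <code>ValueError</code>.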
 
=== Examples ===
According to the [[French National Centre for Scientific Research]] (CNRS), software source code archived with SWHIDs includes the source codes of [[Apollo 11]] navigation and of the [[NCSA Mosaic]] web browser.<ref name="CNRS_supports" />
 
Version 3.0 of the Linux kernel, released in July 2011, has the following SWHID:<ref>{{Cite web |language=en |title=Release v3.0 of torvalds/linux repository |url=https://archive.softwareheritage.org/browse/release/4204bcde7c0b93c5e127eb868e17b337a513cf34/?origin_url=https://github.com/torvalds/linux&release=v3.0&snapshot=130eecc6bd74794737bb078fe5c3fadd034eddcc |website=Software Heritage |access-date=2025-05-24}}</ref>

<syntaxhighlight lang="text">swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd</syntaxhighlight>
 
The following example, drawn from the specification documentation,<ref>{{Cite web |language=en |title=Qualified identifiers |url=https://www.swhid.org/specification/v1.2/6.Qualified_identifiers/ |website=swhid.org |access-date=2025-05-27}}</ref> illustrates the use of multiple qualifiers in an SWHID:
<syntaxhighlight lang="text">
swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15
</syntaxhighlight>
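
A qualified identifier such as the one above can be separated into its core identifier and its qualifiers with a few lines of Python. This is a sketch under stated assumptions, not a reference implementation: it assumes that qualifiers are <code>key=value</code> pairs separated by semicolons and that a literal semicolon never appears unescaped inside a value. The function name <code>split_qualifiers</code> is hypothetical.

```python
from typing import Dict, Tuple


def split_qualifiers(swhid: str) -> Tuple[str, Dict[str, str]]:
    """Split a qualified SWHID into (core identifier, qualifier mapping)."""
    core, *parts = swhid.split(";")
    qualifiers: Dict[str, str] = {}
    for part in parts:
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = part.partition("=")
        qualifiers[key] = value
    return core, qualifiers


if __name__ == "__main__":
    example = (
        "swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b"
        ";origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git"
        ";visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9"
        ";anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0"
        ";path=/Examples/SimpleFarm/simplefarm.ml"
        ";lines=9-15"
    )
    core, qualifiers = split_qualifiers(example)
    print(core)
    for key, value in qualifiers.items():
        print(f"{key}: {value}")
```

In this example the core identifier names the file content, while the qualifiers record where it was found (<code>origin</code>, <code>visit</code>, <code>anchor</code>, <code>path</code>) and which lines are of interest (<code>lines</code>).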
 
=== Standards ===
SWHID is an open standard licensed under the Community Specification License.<ref>{{Cite web |language=en |title=Copyright Section of SWHID Specification v1.2 |url=https://www.swhid.org/specification/v1.2/ |access-date=2025-05-24}}</ref>

SWHID was formalized as the ISO/IEC 18670 standard in April 2025.<ref name="ISO18670">{{Cite web |language=en |title=ISO/IEC 18670:2025 |url=https://www.iso.org/standard/89985.html |website=ISO |access-date=2025-05-24}}</ref>
 
== Creation and history ==
The SoftWare Hash IDentifier was developed by [[Software Heritage]]. Software Heritage's archives, identified by their SWHIDs, were publicly released starting in 2018.<ref name="CNRS_supports">{{cite Q|Q134581205|url-status=live|trans-title=The CNRS supports Software Heritage}}</ref>
 
{{as of|2020}}, SWHIDs were in use for about nine billion versions of pieces of software,<ref name="CNRS_supports" /> termed "artefacts".<ref name="SWHID_tracking_past">{{cite Q|Q134580517|url-status=live}}</ref> SWHIDs are integrated with research repositories including [[HAL (open archive)|HAL]], [[Zenodo]] and the French Catalog of Academic Research Free Software.<ref>{{Cite web |language=en |title=About the site |url=https://logiciels.catalogue-esr.fr/readme |website=French Catalog of Academic Research Free Software |access-date=2025-05-24}}</ref> The identifier can also be used by [[package manager]]s: [[GNU Guix|Guix]] uses SWHIDs to retrieve source code from a software archive when it is unavailable at its original URL.<ref>{{Cite web |language=en |title=Identifying software |url=https://guix.gnu.org/fr/blog/2024/identifying-software |website=GNU Guix Blog |access-date=2025-05-27}}</ref>
 
The acronym SWHID originally stood for "Software Heritage Identifiers", which were used to catalog software artifacts in the early days of the [[Software Heritage]] archive.<ref>{{Cite web |language=en |title=SoftWare Hash IDentifier (SWHID) |url=https://www.softwareheritage.org/software-hash-identifier-swhid/ |website=Software Heritage |access-date=2025-05-24}}</ref> It later evolved into an open standard through a dedicated working group<ref>{{Cite web |language=en |title=SWHID working group |url=https://www.swhid.org/ |access-date=2025-05-24}}</ref> and was standardized as ISO/IEC 18670 in April 2025 under the more general name "Software Hash Identifier".<ref>{{Cite web |language=en |title=ISO/IEC 18670:2025 |url=https://www.iso.org/standard/89985.html |website=ISO |access-date=2025-05-24}}</ref>
 
[[Télécom Paris]] welcomed the ISO standardization, arguing that it is a significant step for global digital infrastructure that provides traceability of software affected by vulnerabilities.<ref name="TelecomParis_significant_advance">{{cite Q|Q134580605|url-status=live|trans-title=A significant advance for global digital infrastructure: the ISO/IEC 18670 standard is now official}}</ref> UNESCO stated that SWHID is useful for the reproducibility and long-term accessibility of software.<ref>{{cite Q|Q134581397|url-status=live}}</ref>
 
== References ==
Line 70 ⟶ 65:
 
== External links ==
* {{Official website|https://www.swhid.org}}
* [https://www.iso.org/obp/ui/en/#iso:std:iso-iec:18670:ed-1:v1:en ISO/IEC 18670:2025 Specification v1.2]
 
{{ISO standards}}
[[Category:Information science]]
[[Category:Unique identifiers]]
 
{{comp-sci-stub}}