Cryptographic hash function: Difference between revisions

Content deleted Content added
No edit summary
Tags: Reverted Visual edit Mobile edit Mobile web edit
Reverted 1 edit by 178.73.75.172 (talk): Unexplained removal of references and links
 
(25 intermediate revisions by 22 users not shown)
Line 1:
{{short description|Hash function that is suitable for use in cryptography}}
{{More citations needed|date=May 2016}}
[[Image:Cryptographic Hash Function.svg|thumb|375px|right|A cryptographic hash function (specifically [[SHA-1]]) at work. A small change in the input (in the word "over") drastically changes the output (digest). This is called the [[avalanche effect]].]]
{{SHA-box}}

A '''cryptographic hash function''' ('''CHF''') is a [[hash algorithm]] (a [[map (mathematics)|map]] of an arbitrary binary string to a binary string with a fixed size of <math>n</math> bits) that has special properties desirable for a [[cryptography|cryptographic]] application:{{sfn|Menezes|van Oorschot|Vanstone|2018|p=33}}
* the probability of a particular <math>n</math>-bit output result ([[hash value]]) for a random input string ("message") is <math>2^{-n}</math> (as for any good hash), so the hash value can be used as a representative of the message;
* finding an input string that matches a given hash value (a ''pre-image'') is infeasible, ''assuming all input strings are equally likely.'' The ''resistance'' to such search is quantified as [[security strength]]: a cryptographic hash with <math>n</math> bits of hash value is expected to have a ''preimage resistance'' strength of <math>n</math> bits, unless the space of possible input values is significantly smaller than <math>2^{n}</math> (a practical example can be found in {{section link||Attacks on hashed passwords}});
* a ''second preimage'' resistance strength, with the same expectations, refers to a similar problem of finding a second message that matches the given hash value when one message is already known;
* finding any pair of different messages that yield the same hash value (a ''collision'') is also infeasible: a cryptographic hash is expected to have a ''collision resistance'' strength of <math>n/2</math> bits (lower due to the [[birthday paradox]]).
 
Cryptographic hash functions have many [[information security|information-security]] applications, notably in [[digital signaturessignature]]s, [[message authentication codescode]]s (MACs), and other forms of [[authentication]]. They can also be used as ordinary [[hash functionsfunction]]s, to index data in [[hash tablestable]]s, for [[fingerprint (computing)|fingerprinting]], to detect duplicate data or uniquely identify files, and as checksums[[checksum]]s to detect accidental data corruption. Indeed, in information-security contexts, cryptographic hash values are sometimes called (''digital'') ''fingerprints'', ''checksums'', (''message'') ''digests'',<ref>{{cite web |url=https://csrc.nist.gov/glossary/term/message_digest |title=message digest |publisher=[[NIST]] |website=Computer Security Resource Center - Glossary}}</ref> or just ''hash values'', even though all these terms stand for more general functions with rather different properties and purposes.<ref name="wjryW">{{cite web|last1=Schneier|first1=Bruce|author-link1=Bruce Schneier|title=Cryptanalysis of MD5 and SHA: Time for a New Standard|url=https://www.schneier.com/essays/archives/2004/08/cryptanalysis_of_md5.html|url-status=dead|archive-url=https://web.archive.org/web/20160316114109/https://www.schneier.com/essays/archives/2004/08/cryptanalysis_of_md5.html|archive-date=2016-03-16|access-date=2016-04-20|website=Computerworld|quote=Much more than encryption algorithms, one-way hash functions are the workhorses of modern cryptography.}}</ref>
 
[[Non-cryptographic hash functionsfunction]]s are used in [[hash tablestable]]s and to detect accidental errors; their constructions frequently provide no resistance to a deliberate attack. For example, a [[denial-of-service attack]] on hash tables is possible if the collisions are easy to find, as in the case of linear [[cyclic redundancy check]] (CRC) functions.{{sfn|Aumasson|2017|p=106}}
 
== Properties ==
Most cryptographic hash functions are designed to take a [[string (computer science)|string]] of any length as input and produce a fixed-length hash value.
 
A cryptographic hash function must be able to withstand all known [[Cryptanalysis#Types of cryptanalytic attack|types of cryptanalytic attack]]. In theoretical cryptography, the security level of a cryptographic hash function has been defined using the following properties:
 
; Pre-image resistance : Given a hash value {{math|''h''}}, it should be difficult to find any message {{math|''m''}} such that {{math|1=''h'' = hash(''m'')}}. This concept is related to that of a [[one-way function]]. Functions that lack this property are vulnerable to [[preimage attacksattack]]s.
; Second pre-image resistance : Given an input {{math|''m''{{sub|1}}}}, it should be difficult to find a different input {{math|''m''{{sub|2}}}} such that {{math|1=hash(''m''{{sub|1}}) = hash(''m''{{sub|2}})}}. This property is sometimes referred to as ''weak collision resistance''. Functions that lack this property are vulnerable to [[second-preimage attacksattack]]s.
; [[Collision resistance]] : It should be difficult to find two different messages {{math|''m''{{sub|1}}}} and {{math|''m''{{sub|2}}}} such that {{math|1=hash(''m''{{sub|1}}) = hash(''m''{{sub|2}})}}. Such a pair is called a cryptographic [[hash collision]]. This property is sometimes referred to as ''strong collision resistance''. It requires a hash value at least twice as long as that required for pre-image resistance; otherwise, collisions may be found by a [[birthday attack]].{{sfn|Katz|Lindell|2014|pp=155–157, 190, 232}}
 
Collision resistance implies second pre-image resistance but does not imply pre-image resistance.{{sfn|Rogaway|Shrimpton|2004|loc=in Sec. 5. Implications}} The weaker assumption is always preferred in theoretical cryptography, but in practice, a hash-function that is only second pre-image resistant is considered insecure and is therefore not recommended for real applications.
 
Informally, these properties mean that a [[adversary (cryptography)|malicious adversary]] cannot replace or modify the input data without changing its digest. Thus, if two strings have the same digest, one can be very confident that they are identical. Second pre-image resistance prevents an attacker from crafting a document with the same hash as a document the attacker cannot control. Collision resistance prevents an attacker from creating two distinct documents with the same hash.
 
A function meeting these criteria may still have undesirable properties. Currently, popular cryptographic hash functions are vulnerable to [[Length extension attack|''length-extension'' attacks]]: given {{math|hash(''m'')}} and {{math|len(''m'')}} but not {{math|''m''}}, by choosing a suitable {{math|{{′|''m''}}}} an attacker can calculate {{math|hash(''m'' ∥ {{′|''m''}})}}, where {{math|∥}} denotes [[concatenation]].<ref name="Y0rF6">{{cite web|url=http://vnhacker.blogspot.com/2009/09/flickrs-api-signature-forgery.html|title=Flickr's API Signature Forgery Vulnerability|first1=Thai|last1=Duong|first2=Juliano|last2=Rizzo|access-date=2012-12-07|archive-date=2013-08-15|archive-url=https://web.archive.org/web/20130815164303/http://vnhacker.blogspot.com/2009/09/flickrs-api-signature-forgery.html|url-status=live}}</ref> This property can be used to break naive authentication schemes based on hash functions. The [[HMAC]] construction works around these problems.
 
In practice, collision resistance is insufficient for many practical uses. In addition to collision resistance, it should be impossible for an adversary to find two messages with substantially similar digests; or to infer any useful information about the data, given only its digest. In particular, a hash function should behave as much as possible like a [[random function]] (often called a [[random oracle]] in proofs of security) while still being deterministic and efficiently computable. This rules out functions like the [[SWIFFT]] function, which can be rigorously proven to be collision-resistant assuming that certain problems on ideal lattices are computationally difficult, but, as a linear function, does not satisfy these additional properties.{{sfn|Lyubashevsky|Micciancio|Peikert|Rosen|2008| pp=54–72}}
 
Checksum algorithms, such as CRC32[[CRC-32]] and other [[cyclic redundancy checkscheck]]s, are designed to meet much weaker requirements and are generally unsuitable as cryptographic hash functions. For example, a CRC was used for message integrity in the [[Wired Equivalent Privacy|WEP]] encryption standard, but an attack was readily discovered, which exploited the linearity of the checksum.
 
=== Degree of difficulty ===
In cryptographic practice, "difficult" generally means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important". The meaning of the term is therefore somewhat dependent on the application since the effort that a malicious agent may put into the task is usually proportional to their expected gain. However, since the needed effort usually multiplies with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a dozen bits to the latter.
 
For messages selected from a limited set of messages, for example passwords[[password]]s or other short messages, it can be feasible to invert a hash by trying all possible messages in the set. Because cryptographic hash functions are typically designed to be computed quickly, special [[key derivation functionsfunction]]s that require greater computing resources have been developed that make such [[brute-force attacksattack]]s more difficult.
 
In some [[Computational complexity theory|theoretical analyses]] "difficult" has a specific mathematical meaning, such as "not solvable in [[asymptotic computational complexity|asymptotic]] [[polynomial time]]". Such interpretations of ''difficulty'' are important in the study of [[provably secure cryptographic hash functionsfunction]]s but do not usually have a strong connection to practical security. For example, an [[exponential time|exponential-time]] algorithm can sometimes still be fast enough to make a feasible attack. Conversely, a polynomial-time algorithm (e.g., one that requires {{math|''n''<sup>20</sup>}} steps for {{math|''n''}}-digit keys) may be too slow for any practical use.
 
== Illustration ==
Line 74 ⟶ 77:
One of the main applications of a [[hash function]] is to allow the fast look-up of data in a [[hash table]]. Being hash functions of a particular kind, cryptographic hash functions lend themselves well to this application too.
 
However, compared with standard hash functions, cryptographic hash functions tend to be much more expensive computationally. For this reason, they tend to be used in contexts where it is necessary for users to protect themselves against the possibility of forgery (the creation of data with the same digest as the expected data) by potentially malicious participants, such as open source applications with multiple sources of download, where malicious files could be substituted in with the same appearance to the user, or an authentic file is modified to contain malicious data.<ref>{{citationCite web needed|title=File Hashing |url=https://www.cisa.gov/sites/default/files/FactSheets/NCCIC%20ICS_Factsheet_File_Hashing_S508C.pdf |url-status=live |archive-url=https://web.archive.org/web/20250202100840/https://www.cisa.gov/sites/default/files/FactSheets/NCCIC%20ICS_Factsheet_File_Hashing_S508C.pdf |archive-date=MayFebruary 20232, 2025 |access-date=March 10, 2025 |website=CYBERSECURITY & INFRASTRUCTURE SECURITY AGENCY |format=PDF}}</ref>
 
==== Content-addressable storage ====
Line 182 ⟶ 185:
A successful, practical attack broke MD5 (used within certificates for [[Transport Layer Security]]) in 2008.<ref name="bVltK">{{Cite web |last=Sotirov |first=A |last2=Stevens |first2=M |last3=Appelbaum |first3=J |last4=Lenstra |first4=A |last5=Molnar |first5=D |last6=Osvik |first6=D A |last7=de Weger |first7=B |date=December 30, 2008 |title=MD5 considered harmful today: Creating a rogue CA certificate |url=http://www.win.tue.nl/hashclash/rogue-ca/ |access-date=March 29, 2009 |website=HashClash |publisher=Department of Mathematics and Computer Science of Eindhoven University of Technology |archive-date=March 25, 2017 |archive-url=https://web.archive.org/web/20170325033522/http://www.win.tue.nl/hashclash/rogue-ca/ |url-status=live }}</ref>
 
Many cryptographic hashes are based on the [[Merkle–Damgård construction]]. All cryptographic hashes that directly use the full output of a Merkle–Damgård construction are vulnerable to [[length extension attack]]s. This makes the MD5, SHA-1, RIPEMD-160, Whirlpool, and the SHA-256 / SHA-512 hash algorithms all vulnerable to this specific attack. SHA-3, BLAKE2, BLAKE3, and the truncated SHA-2 variants are not vulnerable to this type of attack.{{citcn|date=April 2020}}
 
== Attacks on hashed passwords ==