Security of cryptographic hash functions: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 15:06, 10 January 2010 edit Timobaas (talk \| contribs) 34 edits m →More practical Provably Secure Hash Functions ← Previous edit		Latest revision as of 18:28, 7 January 2025 edit undo Rofraja (talk \| contribs) Extended confirmed users, Pending changes reviewers 47,452 edits m Replaced 1 bare URLs by {{Cite web}}; Replaced "Archived copy" by actual titles
(60 intermediate revisions by 48 users not shown)
Line 1: {{Short description\|None}} In [[cryptography]], ~~we can divide~~ [[cryptographic hash functions]] can be divided into two main categories. In the first category are those functions, ~~where~~whose ~~their~~designs ~~design is~~are based on a mathematical ~~problem~~problems, and ~~thus~~whose ~~their~~security ~~security~~thus follows from rigorous mathematical ~~proves~~proofs, [[Computational complexity theory\|complexity theory]] and [[Reduction (complexity)\|formal reduction]]. These functions are called ''~~'Provably~~provably ~~Secure~~secure ~~Cryptographic~~cryptographic ~~Hash~~hash ~~Functions'~~functions''~~. However this does not mean that such a function could not be broken~~. To construct ~~them~~these is very difficult, and ~~only a~~ few examples ~~were~~have been introduced. ~~The~~Their practical use is limited. In the second category are functions ~~that~~which are not based on mathematical problems, but on an ad -hoc ~~basis~~constructions, ~~where~~in which the bits of the message are ~~nicely~~ mixed to produce the hash. ~~They~~These are then believed to be hard to break, but no ~~such~~ formal ~~prove~~proof is given. Almost all ~~widely-spread~~ hash functions ~~fall~~in widespread use reside in this category. Some of these functions are already broken, and are no longer in use. ''See'' [[Hash function security summary]]. == Types of ~~Security~~security of ~~Hash Functions~~hash functions== Generally, the ''basic'' security of [[cryptographic hash functions]] can be seen from ~~three~~ different angles: pre-image resistance, second pre-image resistance ~~and~~, collision resistance., and pseudo-randomness. * '''Pre-image resistance''': given a hash {{Mvar\|h}}, it should be ''hard'' to find any message {{Mvar\|m}} such that {{Math\|1=''h'' = hash(''m'')}}. This concept is related to that of ~~one~~the [[one-way function]]. Functions that lack this property are vulnerable to [[pre-image ~~attacks~~attack]]s. * '''Second pre-image resistance''': given an input m1{{Math\|''m''<sub>1</sub>}}, it should be ''hard'' to find another input, m2{{Math\|''m''<sub>2</sub> ~~(not~~≠ ~~equal to m1)~~''m''<sub>1</sub>}} such that {{Math\|1=hash(m1''m''<sub>1</sub>) = hash(m2''m''<sub>2</sub>)}}. This property is sometimes referred to as weak collision resistance. Functions that lack this property are vulnerable to [[second pre-image ~~attacks~~attack]]s. * '''Collision resistance''': it should be ''hard'' to find two different messages m1{{Math\|''m''<sub>1</sub>}} and m2{{Math\|''m''<sub>2</sub>}} such that {{Math\|1=hash(m1''m''<sub>1</sub>) = hash(m2''m''<sub>2</sub>)}}. Such a pair is called a (cryptographic) hash collision~~, and~~. ~~this~~This property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as what is required for pre-image resistance,; otherwise, collisions may be found by a [[birthday attack]]. * '''Pseudo-randomness''': it should be ''hard'' to distinguish a pseudo-random number generator based on the hash function from true random number generator; for example, it passes usual [[randomness tests]]. ==== The ~~Meaning~~meaning of ~~"Hard" =~~''hard''=== The basic question is the meaning of "'''hard'''". There are two approaches to answer this question. First is the intuitive/practical approach: "''hard'' means that it is almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The second approach is theoretical and is based on the [[computational complexity theory]]: if problem ''A'' is hard, then there exists a formal [[Reduction (complexity)\|security reduction]] from a problem which is widely considered unsolvable in [[polynomial time]], such as [[integer factorization]] or the [[discrete logarithm]] problem. However, non-existence of a polynomial time algorithm does not automatically ensure that the system is secure. The difficulty of a problem also depends on its size. For example, [[RSA public key cryptography\|RSA public-key cryptography]] (which relies on the difficulty of [[integer factorization]]) is considered secure only with keys that are at least 2048 bits long, whereas keys for the [[ElGamal encryption\|ElGamal cryptosystem]] (which relies on the difficulty of the [[discrete logarithm]] problem) are commonly in the range of 256–512 bits. The second approach is theoretical and is based on the [[complexity theory]]. If problem A is hard, there exists a formal [[Reduction (complexity)\|security reduction]] from a problem which is widely supposed to be unsolvable in [[polynomial time]], such as [[integer factorization]] problem or [[discrete logarithm]] problem. ===Password case=== ~~== Classical Hash Functions - Practical Approach to Security ==~~ If the set of inputs to the hash is relatively small or is ordered by likelihood in some way, then a brute force search may be practical, regardless of theoretical security. The likelihood of recovering the preimage depends on the input set size and the speed or cost of computing the hash function. A common example is the use of hashes to store [[password]] validation data. Rather than store the plaintext of user passwords, an access control system typically stores a hash of the password. When a person requests access, the password they submit is hashed and compared with the stored value. If the stored validation data is stolen, then the thief will only have the hash values, not the passwords. However, most users choose passwords in predictable ways, and passwords are often short enough so that all possible combinations can be tested if fast hashes are used.<ref>{{cite web Most hash functions are built on an ad hoc basis, where the bits of the message are nicely mixed to produce the hash. Various [[Bitwise operation\|bitwise operations]] (e.g. rotations), [[Modular arithmetic\|modular additions]] and [[One-way compression function\|compression functions]] are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago, one of the most popular hash functions, [[SHA hash functions\|SHA-1]], was shown to be less secure than its length suggested: collisions could be found in only 2^{69} tests, rather than the brute-force number of 2^{80}. The search for a "good" hash function has thus become a hot topic.▼ \| url=https://arstechnica.com/information-technology/2012/12/25-gpu-cluster-cracks-every-standard-windows-password-in-6-hours/ \| title=25-GPU cluster cracks every standard Windows password in <6 hours \| date=2012-12-10 \| first=Dan \| last=Goodin \| publisher=[[Ars Technica]] \| access-date=2020-11-23}}</ref> Special hashes called [[key derivation function]]s have been created to slow searches. ''See'' [[Password cracking]]. ==Cryptographic hash functions== In other words, most of the hash functions in use nowadays, are not provably secure collision-resistant. These hashes are not based on purely mathematical functions. This approach results generally in more effective hashing algorithms, but with the risk that a weakness of such a function will be eventually used to find collisions. The famous case is [[MD5]]. ▼ {{Main article\|Cryptographic hash function}} ▲Most hash functions are built on an ad -hoc basis, where the bits of the message are nicely mixed to produce the hash. Various [[~~Bitwise~~bitwise operation~~\|bitwise operations~~]]s (e.g. rotations), [[Modular arithmetic\|modular additions]], and [[One-way compression function\|compression functions]] are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago{{When\|date=August 2024}}, one of the most popular hash functions, [[~~SHA hash functions\|~~SHA-1]], was shown to be less secure than its length suggested: collisions could be found in only 2^<sup>51</sup><ref>{~~69}~~{Cite ~~tests,~~web\| ~~rather~~title=Classification ~~than~~and ~~the~~Generation ~~brute-force~~of ~~number~~Disturbance ofVectors ~~2^{80}.~~for ~~The~~Collision ~~search~~Attacks ~~for~~against aSHA-1 ~~"good"~~\| ~~hash~~url=http://eprint.iacr.org/2008/469.pdf ~~function~~\| ~~has~~archive-url=https://web.archive.org/web/20090115213127/http://eprint.iacr.org/2008/469.pdf ~~thus~~\| ~~become~~archive-date=2009-01-15}}</ref> atests, ~~hot~~rather ~~topic~~than the brute-force number of 2<sup>80</sup>. Meaning of "hard to find collision" in these cases means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to his expected gain. However, since the needed effort usually grows very quickly with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a few dozen bits to the latter. ▲In other words, most of the hash functions in use nowadays, are not provably ~~secure~~ collision-resistant. These hashes are not based on purely mathematical functions. This approach results generally in more effective hashing ~~algorithms~~functions, but with the risk that a weakness of such a function will be eventually used to find collisions. ~~The~~One famous case is [[MD5]]. ~~== Provably Secure Hash Functions ==~~ In this approach, we base the security of hash function on some hard mathematical problem and we prove that finding collisions of the hash is as hard as breaking the underlying problem. This gives much stronger security than just relying on complex mixing of bits as in the classical approach.▼ ==Provably secure hash functions== A cryptographic hash has '''provable security against collision attacks''' if finding collisions is provably [[Polynomial-time reduction\|polynomial-time reducible]] from problem P which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable. ▲In this approach, ~~we base~~ the security of a hash function is based on some hard mathematical problem, and weit ~~prove~~is proved that finding collisions of the hash function is as hard as breaking the underlying problem. This gives ~~much~~a somewhat stronger notion of security than just relying on complex mixing of bits as in the classical approach. ItA ~~means~~cryptographic ~~that~~hash iffunction ~~finding~~has ~~collisions~~''provable ~~would~~security beagainst ~~feasible~~collision inattacks'' ~~polynomial~~if ~~time~~finding bycollisions ~~algorithm~~is A,provably ~~we could find and use~~[[Polynomial-time reduction\|polynomial -time ~~algorithm~~reducible]] Rfrom ~~(reduction algorithm) that would use algorithm A to solve~~a problem ''P,'' which is ~~widely~~ supposed to be unsolvable in polynomial time. ~~That~~The function is athen ~~contradiction.~~called ~~This~~provably ~~means~~secure, ~~that~~or ~~finding~~just ~~collisions cannot be easier than solving P~~provable. It means that if finding collisions would be feasible in polynomial time by algorithm ''A'', then one could find and use polynomial time algorithm ''R'' (reduction algorithm) that would use algorithm ''A'' to solve problem ''P'', which is widely supposed to be unsolvable in polynomial time. That is a contradiction. This means that finding collisions cannot be easier than solving ''P''. ~~Hash functions with the proof of their security are based on mathematical functions.~~ However, this only indicates that finding collisions is difficult in ''some'' cases, as not all instances of a computationally hard problem are typically hard. Indeed, very large instances of NP-hard problems are routinely solved, while only the hardest are practically impossible to solve. === Hard problems ===▼ Examples of problems, that are assumed to be not solvable in polynomial time▼ * [[Discrete logarithm\|Discrete Logarithm Problem]]▼ * [[Quadratic residue\|Finding Modular Square Roots]]▼ * [[Integer factorization\|Integer Factorization Problem]]▼ * [[Subset sum problem\|Subset Sum Problem]]▼ ▲=== Hard problems === === Downsides of Provable Approach ===▼ ▲Examples of problems, that are assumed to be not solvable in polynomial time include * Current collision-resistant hash algorithms that have provable security [[Reduction (complexity)\|reductions]] are too inefficient to be used in practice. In comparison to classical hash functions, they tend to be relatively slow and do not always meet all of criteria traditionally expected of cryptographic hashes. [[Very smooth hash]] is an example.▼ ▲* [[Discrete logarithm~~\|Discrete Logarithm Problem~~]] * The proof is often a reduction to a problem with asymptotically hard [[Worst-case complexity\|words-case]] or [[Average-case complexity\|average-case]] complexity. Worst-case measures the difficulty of solving pathological cases rather than typical cases of the underlying problem. Even a reduction to a problem with hard average complexity offers only limited security as there still can be an algorithm that easily solves the problem for a subset of the problem space. [[Fast Syndrome Based hash]] is an example.▼ ▲* [[Quadratic residue\|~~Finding~~ Modular ~~Square~~square ~~Roots~~roots]] * Constructing a hash function with provable security is much more difficult than using a classical approach where we just hope that the complex mixing of bits in the hashing algorithm is strong enough to prevent adversary from finding collisions.▼ ▲* [[Integer factorization~~\|Integer Factorization Problem~~]] ▲* [[Subset sum problem\|Subset ~~Sum Problem~~sum]] ▲=== Downsides of ~~Provable~~the ~~Approach~~provable approach=== ~~=== Example of (unpractical) Provably Secure Hash Function ===~~ ▲* Current{{When\|date=August 2024}} collision-resistant hash algorithms that have provable security [[Reduction (complexity)\|reductions]] are too inefficient to be used in practice. In comparison to classical hash functions, they tend to be relatively slow and do not always meet all of criteria traditionally expected of cryptographic hashes. [[Very smooth hash]] is an example. Hash(''m'') = ''x''<sup>''m''</sup> mod ''n'' where ''n'' is hard to factor composite number, and ''x'' is some prespecified base value. A collision ''x''<sup>''m1''</sup> congruent to ''x''<sup>''m2''</sup> reveals a multiple ''m1 - m2'' of the order of ''x''. Such information can be used to factor ''n'' in polynomial time assuming certain properties of ''x''.▼ ▲* Constructing a hash function with provable security is much more difficult than using a classical approach where weone just ~~hope~~hopes that the complex mixing of bits in the hashing algorithm is strong enough to prevent adversary from finding collisions. ▲* The proof is often a reduction to a problem with asymptotically hard [[Worst-case complexity\|~~words~~worst-case]] or [[~~Average~~average-case complexity~~\|average-case~~]] ~~complexity~~. Worst-case complexity measures the difficulty of solving pathological cases rather than typical cases of the underlying problem. Even a reduction to a problem with hard average-case complexity offers only limited security, as there still can be an algorithm that easily solves the problem for a subset of the problem space. For example, early versions of [[Fast Syndrome Based ~~hash~~Hash]] isturned anout ~~example~~to be insecure. This problem was solved in the latest version. [[SWIFFT]] is an example of a hash function that circumvents these security problems. It can be shown that, for any algorithm that can break SWIFFT with probability {{Mvar\|p}} within an estimated time {{Mvar\|t}}, one can find an algorithm that solves the ''worst-case'' scenario of a certain difficult mathematical problem within time {{Math\|''t''′}} depending on {{Mvar\|t}} and {{Mvar\|p}}.{{citation needed\|date=June 2014}} But the algorithm is quite inefficient because it requires on average 1.5 multiplications modulo ''n'' per message-bit.▼ ===Example of (impractical) provably secure hash function=== ~~=== More practical Provably Secure Hash Functions ===~~ ▲~~Hash~~Let {{Math\|1=hash(''m'') = ''x''<sup>''m''</sup> mod ''n''}}, where ''{{Mvar\|n''}} is a hard -to -factor composite number, and ''{{Mvar\|x''}} is some prespecified base value. A collision {{Math\|''x''<sup>''m1m''<sub>1</sub></sup> ~~congruent to~~&equiv; ''x''<sup>''m2m''<sub>2</sub></sup> (mod ''n'')}} reveals a multiple {{Math\|''m1m''<sub>1</sub> -− m2''m''<sub>2</sub>}} of the [[multiplicative order]] of ''{{Mvar\|x''}} modulo {{Mvar\|n}}. ~~Such~~This information can be used to factor ''{{Mvar\|n''}} in polynomial time, assuming certain properties of ''{{Mvar\|x''}}. * [[Very smooth hash\|VSH - Very Smooth Hash function]] - a provably secure collision-resistant hash function assuming the hardness of finding nontrivial modular square roots modulo composite number (this is assumed to be as hard as integer factorization).▼ ▲But the algorithm is quite inefficient because it requires on average 1.5 multiplications modulo ''{{Mvar\|n''}} per message-bit. ===More practical provably secure hash functions=== ▲* [[Very smooth hash\|VSH ~~- Very Smooth Hash function~~]]—Very -Smooth aHash—a provably secure collision-resistant hash function assuming the hardness of finding nontrivial modular square roots modulo composite number {{Mvar\|n}} (this is ~~assumed~~proven to be as hard as ~~integer~~[[Integer factorization\|factoring]] {{Mvar\|n}}). * [[MuHASH]] * [[Elliptic curve only hash\|ECOH]]—[[Elliptic -curve only hash\|Elliptic Curve Only hash function]] ~~- based~~—based on the concept of [[Elliptic curve\|elliptic curves]], ~~Subset~~the ~~Sum~~[[subset ~~Problem~~sum problem]], and summation of polynomials. The security proof of the collision resistance was based on ~~weakend~~weakened ~~assumptions~~assumptionsm, and eventually a second pre-image attack was found. * [[Fast Syndrome Based Hash\|FSB]]—[[Fast -Syndrome Based Hash\|Fast Syndrome-Based hash function]] ~~- it~~—it can be proven that breaking FSB is at least as difficult as solving a''regular ~~certain~~syndrome ~~NP-complete~~decoding'', ~~problem~~which is known asto ~~Regular~~be ~~Syndrome Decoding~~NP-complete. * [[SWIFFT]] ~~- SWIFFT~~—SWIFFT is based on the [[~~Fast~~fast ~~fourier~~Fourier transform]] and is provably collision resistant, under a relatively mild assumption about the worst-case difficulty of finding short vectors in cyclic/~~ideal~~ [[Ideal lattice cryptography\|ideal lattices]]. * [[Chaum, van Heijst, Pfitzmann hash function]] ~~- A~~—a compression function where finding collisions is as hard as ~~finding~~solving athe [[discrete logarithm\|discrete logarithm problem]] in a finite group {{Math\|''F''<~~math~~sub>~~F_{2p~~2''p''+1}</~~math~~sub>}}. * [[Knapsack-based hash functions]] ~~- A~~—a family of hash functions based on the [[~~Knapsack~~knapsack problem]]. * [[The Zémor-Tillich hash function]] ~~- A~~—a family of hash functions that ~~rely~~relies on the arithmetic of the group of matrices [[Special linear group\|~~SL2~~SL<sub>2</sub>]]. Finding collisions is at least as difficult as finding factorization of certain elements in this group. This is supposed to be hard, at least [[PSPACE-complete]].{{Dubious\|PSPACE-complete?!\|date=January 2019}} For this hash, an attack was eventually discovered with a time complexity close to {{Math\|2<sup>''n''/2</sup>}}. This beat by far the birthday bound and ideal pre-image complexities, which are {{Math\|2<sup>3''n''/2</sup>}} and {{Math\|2<sup>3''n''</sup>}} for the Zémor-Tillich hash function. As the attacks include a birthday search in a reduced set of size {{Math\|2''n''}}, they indeed do not destroy the idea of provable security or invalidate the scheme but rather suggest that the initial parameters were too small.<ref>{{Citation \| last1 = Petit \| first1 = C. * [[Hash functions from Sigma Protocols]] - there exists a general way of constructing a provably secure hash, specifically from a any (suitable) [[The Sigma Protocol\|sigma protocol]]. A speed-up version of [[Very smooth hash\|VSH ]] (called VSH) could be obtained in this way.▼ \| last2 = Quisquater \| first2 = J.-J. \| last3 = Tillich \| first3 = J.-P. \| contribution = Hard and easy Components of Collision Search in the Zémor-Tillich hash function:new Attacks and Reduced Variants with Equivalent Security \| contribution-url = http://www-rocq.inria.fr/secret/Jean-Pierre.Tillich/publications/CTRSA.pdf }}</ref> ▲ [[Hash functions from Sigma Protocols]] ~~- there~~—there exists a general way of constructing a provably secure hash, specifically from a any (suitable) [[~~The~~Proof of knowledge#Sigma ~~Protocol~~protocols\|sigma protocol]]. A ~~speed-up~~faster version of [[Very smooth hash\|VSH ]] (called VSH*) could be obtained in this way. ==References== {{Reflist}} {{~~Crypto~~Cryptography navbox \| hash}} [[Category:Cryptographic hash functions]]