Security of cryptographic hash functions: Difference between revisions

Content deleted Content added
m cleanup after move
Line 1:
{{Merge to|Cryptographic hash function|discuss=Talk:ProvablySecurity secureof cryptographic hash functionfunctions#RestructuringMerge proposalProposal|date=January 2014}}
In [[cryptography]], [[cryptographic hash functions]] can be divided into two main categories. In the first category are those functions whose designs are based on a mathematical problem and thus their security follows from rigorous mathematical proofs, [[Computational complexity theory|complexity theory]] and [[Reduction (complexity)|formal reduction]]. These functions are called '''Provably Secure Cryptographic Hash Functions'''. However this does not mean that such a function could not be broken. To construct them is very difficult and only a few examples were introduced. The practical use is limited.
 
In the second category are functions that are not based on mathematical problems but on an ad hoc basis, where the bits of the message are mixed to produce the hash. They are then believed to be hard to break, but no such formal proof is given. Almost all widely-spread hash functions fall in this category. Some of these functions are already broken and are no longer in use.
 
== Types of Securitysecurity of Hash Functionshash functions==
Generally, the ''basic'' security of [[cryptographic hash functions]] can be seen from three different angles: pre-image resistance, second pre-image resistance and collision resistance.
 
Line 11:
* '''Collision resistance''': it should be ''hard'' to find two different messages m1 and m2 such that hash(m1) = hash(m2). Such a pair is called a (cryptographic) hash collision. This property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as what is required for pre-image resistance, otherwise collisions may be found by a [[birthday attack]].
 
=== The Meaningmeaning of "Hardhard" ===
 
The basic question is the meaning of "'''hard'''". There are two approaches to answer this question. First is the intuitive/practical approach: "hard means that it is almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important."
 
Line 19 ⟶ 18:
However, non-existence of a polynomial time algorithm does not automatically ensure that the system is secure. It is equally important to choose the parameters sensibly (e.g. a length of the numbers that the system works with). For instance, factoring 21 is easy although in general [[integer factorization]] is considered a hard problem.
 
== Classical Hashhash Functionsfunctions - Practicalpractical Approachapproach to Security security==
Most hash functions are built on an ad hoc basis, where the bits of the message are nicely mixed to produce the hash. Various [[bitwise operation]]s (e.g. rotations), [[Modular arithmetic|modular additions]] and [[One-way compression function|compression functions]] are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago, one of the most popular hash functions, [[SHA-1]], was shown to be less secure than its length suggested: collisions could be found in only 2<sup>51</sup><ref>http://eprint.iacr.org/2008/469.pdf</ref> tests, rather than the brute-force number of 2<sup>80</sup>. The search for a "good" hash function has thus become a hot topic.
 
Line 26 ⟶ 25:
Meaning of "hard to find collision" in these cases means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to his expected gain. However, since the needed effort usually grows very quickly with the digest length, even a thousandfold advantage in processing power can be neutralized by adding a few dozen bits to the latter.
 
== Provably Secure Hash Functions ==
In this approach, we base the security of hash function on some hard mathematical problem and we prove that finding collisions of the hash functions is as hard as breaking the underlying problem. This gives much stronger security than just relying on complex mixing of bits as in the classical approach.
 
Line 35 ⟶ 34:
Hash functions with the proof of their security are based on mathematical functions.
 
=== Hard problems ===
Examples of problems, that are assumed to be not solvable in polynomial time
* [[Discrete logarithm|Discrete Logarithm Problem]]
Line 42 ⟶ 41:
* [[Subset sum problem|Subset Sum Problem]]
 
=== Downsides of Provable Approachprovable approach===
* Current collision-resistant hash algorithms that have provable security [[Reduction (complexity)|reductions]] are too inefficient to be used in practice. In comparison to classical hash functions, they tend to be relatively slow and do not always meet all of criteria traditionally expected of cryptographic hashes. [[Very smooth hash]] is an example.
* Constructing a hash function with provable security is much more difficult than using a classical approach where we just hope that the complex mixing of bits in the hashing algorithm is strong enough to prevent adversary from finding collisions.
Line 49 ⟶ 48:
[[SWIFFT]] is an example of a hash function that circumvents these security problems. It can be shown that for any algorithm that can break SWIFFT with probability P within an estimated time T, we can find an algorithm that solves the ''worst case'' scenario of a certain difficult mathematical problem within time T' depending on T and P.
 
=== Example of (unpractical) Provably Secure Hash Function ===
Hash(''m'') = ''x''<sup>''m''</sup> mod ''n'' where ''n'' is hard to factor composite number, and ''x'' is some prespecified base value. A collision ''x''<sup>''m1''</sup> congruent to ''x''<sup>''m2''</sup> reveals a multiple ''m1 - m2'' of the order of ''x''. Such information can be used to factor ''n'' in polynomial time assuming certain properties of ''x''.
 
But the algorithm is quite inefficient because it requires on average 1.5 multiplications modulo ''n'' per message-bit.
 
=== More practical provably secure hash functions ===
* [[Very smooth hash|VSH - Very Smooth Hash function]] - a provably secure collision-resistant hash function assuming the hardness of finding nontrivial modular square roots modulo composite number <math>n</math> (this is proven to be as hard as [[Integer factorization|factoring]] <math>n</math>).
* [[MuHASH]]