Security of cryptographic hash functions: Difference between revisions

Content deleted Content added
Various corrections
Line 13:
The basic question is the meaning of "'''hard'''". There are two approaches to answer this question. First is the intuitive/practical approach: "hard means that it is almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important."
 
The second approach is theoretical and is based on the [[computational complexity theory]]. If problem A is hard, there exists a formal [[Reduction (complexity)|security reduction]] from a problem which is widely supposed to be unsolvable in [[polynomial time]], such as [[integer factorization]] problem or [[discrete logarithm]] problem.
 
However, non-existence of a polynomial time algorithm does not automatically ensure that the system is secure. It is equally important to choose the parameters sensibly (e.g. a length of the numbers that the system works with). For instance, factoring 21 is easy although in general [[integer factorization]] is considered a hard problem.
 
== Classical Hash Functions - Practical Approach to Security ==
Most hash functions are built on an ad hoc basis, where the bits of the message are nicely mixed to produce the hash. Various [[bitwise operation]]s (e.g. rotations), [[Modular arithmetic|modular additions]] and [[One-way compression function|compression functions]] are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago, one of the most popular hash functions, [[SHA hash functions|SHA-1]], was shown to be less secure than its length suggested: collisions could be found in only 2^{69} tests, rather than the brute-force number of 2^{80}. The search for a "good" hash function has thus become a hot topic.
 
In other words, most of the hash functions in use nowadays, are not provably secure collision-resistant. These hashes are not based on purely mathematical functions. This approach results generally in more effective hashing algorithmsfunctions, but with the risk that a weakness of such a function will be eventually used to find collisions. The famous case is [[MD5]].
 
Meaning of "hard to find collision" in these cases means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to his expected gain. However, since the needed effort usually grows very quickly with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a few dozen bits to the latter.
 
== Provably Secure Hash Functions ==
In this approach, we base the security of hash function on some hard mathematical problem and we prove that finding collisions of the hash functions is as hard as breaking the underlying problem. This gives much stronger security than just relying on complex mixing of bits as in the classical approach.
 
A cryptographic hash function has '''provable security against collision attacks''' if finding collisions is provably [[Polynomial-time reduction|polynomial-time reducible]] from problem P which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable.
 
It means that if finding collisions would be feasible in polynomial time by algorithm A, we could find and use polynomial time algorithm R (reduction algorithm) that would use algorithm A to solve problem P, which is widely supposed to be unsolvable in polynomial time. That is a contradiction. This means, that finding collisions cannot be easier than solving P.
Line 41 ⟶ 43:
* Current collision-resistant hash algorithms that have provable security [[Reduction (complexity)|reductions]] are too inefficient to be used in practice. In comparison to classical hash functions, they tend to be relatively slow and do not always meet all of criteria traditionally expected of cryptographic hashes. [[Very smooth hash]] is an example.
* Constructing a hash function with provable security is much more difficult than using a classical approach where we just hope that the complex mixing of bits in the hashing algorithm is strong enough to prevent adversary from finding collisions.
* The proof is often a reduction to a problem with asymptotically hard [[Worst-case complexity|wordsworst-case]] or [[average-case complexity]]. Worst-case measures the difficulty of solving pathological cases rather than typical cases of the underlying problem. Even a reduction to a problem with hard average complexity offers only limited security as there still can be an algorithm that easily solves the problem for a subset of the problem space. For example, early versions of [[Fast Syndrome Based Hash]] turned out to be insecure. This problem was solved in the latest version.
 
[[SWIFFT]] is an example of a hash function that circumvents these security problems. It can be shown that for any algorithm that can break SWIFFT with probability P within an estimated time T, we can find an algorithm that solves the ''worst case'' scenario of a certain difficult mathematical problem within time T' depending on T and P.
Line 51 ⟶ 53:
 
=== More practical Provably Secure Hash Functions ===
* [[Very smooth hash|VSH - Very Smooth Hash function]] - a provably secure collision-resistant hash function assuming the hardness of finding nontrivial modular square roots modulo composite number <math>n</math> (this is assumedproven to be as hard as integer[[Integer factorization|factoring]] <math>n</math>).
* [[MuHASH]]
* [[Elliptic curve only hash|ECOH - Elliptic Curve Only hash function]] - based on the concept of Elliptic curves, Subset Sum Problem and summation of polynomials. The security proof of the collision resistance was based on weakend assumptions and eventually a second pre-image attack was found.
Line 59 ⟶ 61:
* [[Knapsack-based hash functions]] - A family of hash functions based on the [[Knapsack problem]].
* [[The Zémor-Tillich hash function]] - A family of hash functions that rely on the arithmetic of the group of matrices [[Special linear group|SL2]]. Finding collisions is at least as difficult as finding factorization of certain elements in this group. This is supposed to be hard, at least [[PSPACE-complete]].
* [[Hash functions from Sigma Protocols]] - there exists a general way of constructing a provably secure hash, specifically from a any (suitable) [[The Sigma Protocol|sigma protocol]]. A speed-upfaster version of [[Very smooth hash|VSH]] (called VSH*) could be obtained in this way.
 
{{Crypto navbox | hash}}