One-way compression function: Difference between revisions

Content deleted Content added
Better introduction
The Merkle-Damgård structure: Extended the explanation, should probably become a separate article later on.
Line 13:
 
== The Merkle-Damgård structure ==
A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equal-sized blocks, and operating on them in sequence using a ''compression function''. The compression function can either be specially designed for hashing or be built from a block cipher.

The last block processed should also be [[Padding (cryptography)|length padded]], this is crucial to the security of this construction. This construction is called the ''[[Ralph Merkle|Merkle]]-[[Ivan Damgård|Damgård]] structure''. Most widely used hash functions, including [[SHA-1]] and [[MD5]], take this form.
 
<div align=center>[[Image:Merkle-Damgard hash.svg|Merkle-Damgård]]</div>
In the diagram, the compression function is denoted by ''f'', and transforms a fixed length input to an output of the same size. The algorithm starts with an initial value, the [[initialization vector]] (IV). The IV is a fixed value (algorithm or implementation specific). For each message block, the compression (or compacting) function ''f'' takes the result so far, combines it with the message block, and produces an intermediate result. Bits representing the length of the entire message are appended to the message and padded suitably as part of the last block. The value after the last block is taken to be the hash value for the entire message.
 
In the diagram, the compression function is denoted by ''f'', and transforms a fixed length input to an output of the same size. The algorithm starts with an initial value, the [[initialization vector]] (IV). The IV is a fixed value (algorithm or implementation specific). For each message block, the compression (or compacting) function ''f'' takes the result so far, combines it with the message block, and produces an intermediate result. BitsThe representinglast theblock lengthis ofpadded thewith entirezeros messageas are appended to the messageneeded and paddedbits suitablyrepresenting asthe partlength of the lastentire block.message Theare valueappended. after the last block(This is takencalled tolength bepadding, thesee hash valuebelow for thedetailed entire messageexample.)
The popularity of this construction is due to the fact, proven by Merkle and Damgård, that if the compression function ''f'' is collision-resistant, then so is the hash function constructed using it. Unfortunately, this construction also has several undesirable properties:
 
To harden the hash further the last result is then often fed through a ''finalisation function''. The ''finalisation function'' can have several purposes such as compressing a bigger internal state (the last result) into a smaller output hash size or to guarantee a better mixing and [[avalanche effect]] on the bits in the hash sum. The ''finalisation function'' is often built by using the ''compression function''.
 
=== Security characteristics ===
The popularity of this construction is due to the fact, proven by Merkle and Damgård, that if the compression function ''f'' is collision-resistant, then so is the hash function constructed using it. Unfortunately, this construction also has several undesirable properties:
 
* Length extension - once an attacker has one collision, he can find more very cheaply.
Line 23 ⟶ 30:
* Multicollisions (many messages with the same hash) can be found with only a little more work than collisions.
* "Herding attacks" (first committing to an output h, then mapping messages with arbitrary starting values to h) are possible for more work than finding a collision, but much less than would be expected to do this for a [[random oracle]].
 
=== Length padding example ===
Let's say the message to be hashed is "Wikipedia" and the block size of the compression function is 8 bytes (64 bits).
 
So, to be able to feed the message to the compression function the last block needs to be zero padded to a full block. So we get two blocks looking like this:
 
Wikipedi a0000000
 
But this is not enough since it would mean that for instance the message "Wikipedia00" would get the same hash sum. Therefore also the length of the message is added in an extra block, so we get three blocks like this:
 
Wikipedi a0000000 00000009
 
Now that is a bit wasteful since it means hashing one extra block. So there is a slight speed optimisation that most hash implementations use. If there is space enough among the zeros padded to the last block the length value can instead be padded there. Like this:
 
Wikipedi a0000009
 
Note that to avoid confusion the implementation must use a fixed bit-size for the length value, say 64-bit. So the length value padded in the end really is "0009" not just "9".
 
== Davies-Meyer ==