One-way compression function

In cryptography, there are several methods for using a block cipher to build a cryptographic hash function. The methods resemble the block cipher modes of operation usually used for encryption.

Some methods for turning any ordinary block cipher into the compression function of a hash function are Davies-Meyer, Miyaguchi-Preneel, Matyas-Meyer-Oseas, MDC-2 and MDC-4. These compression functions are then used inside the Merkle-Damgård structure to build the actual hash function. These methods are described in detail further down. (MDC-2 is also the name of a hash function patented by IBM.)

Using a block cipher as a hash function is usually much slower than using a specially designed hash function. In some cases, however, it can be simpler, because only a block cipher needs to be implemented, and it then serves both as a block cipher and as a hash function. This can also save code space in very small embedded systems, such as smart cards or nodes in cars and other machines.

If a block cipher has a block size of, say, 128 bits, most of the methods create a hash function that produces a 128-bit hash. But there are also methods that produce hashes twice as long as the block size of the underlying block cipher, so a 128-bit block cipher can be turned into a 256-bit hash function.

The resulting hash function can only be secure if the following conditions are met:

The block cipher needs to be secure.
The resulting hash size needs to be big enough. A 64-bit hash is too small, since a birthday attack finds a collision after about 2^32 evaluations; 128 bits might be enough.
The last block needs to be properly length-padded before hashing. (See the Merkle-Damgård structure below.) Length padding is normally implemented and handled internally in dedicated hash functions such as SHA-1.

The Merkle-Damgård structure

A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equal-sized blocks and operating on them in sequence using a compression function. The last block processed should also be length-padded; this is crucial to the security of the construction. This construction is called the Merkle-Damgård structure. Most widely used hash functions, including SHA-1 and MD5, take this form.

In the diagram, the compression function is denoted by f. It transforms a fixed-length input (the result so far together with one message block) into a fixed-length output the size of that result. The algorithm starts with an initial value, the initialization vector (IV). The IV is a fixed value (algorithm or implementation specific). For each message block, the compression (or compacting) function f takes the result so far, combines it with the message block, and produces an intermediate result. Bits representing the length of the entire message are appended to the message, with suitable padding, as part of the last block. The value after the last block is taken to be the hash value for the entire message.
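The iteration just described can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the compression function `f` is passed in as a parameter, and the padding layout (a 0x80 byte, zero bytes, then the message length in bits as a 64-bit big-endian integer) is an assumption borrowed from SHA-1's convention. The names `md_pad` and `md_hash` are hypothetical.

```python
from typing import Callable

# A compression function maps (chaining value, message block) -> new chaining value.
Compress = Callable[[bytes, bytes], bytes]

def md_pad(message: bytes, block_size: int) -> bytes:
    """Length padding (assumed SHA-1-style): append 0x80, then zero bytes, then
    the message length in bits as a 64-bit big-endian integer, so that the total
    length is a multiple of block_size."""
    zeros = (-(len(message) + 1 + 8)) % block_size
    return message + b"\x80" + bytes(zeros) + (8 * len(message)).to_bytes(8, "big")

def md_hash(f: Compress, iv: bytes, message: bytes, block_size: int) -> bytes:
    """Merkle-Damgård iteration: start from the IV and feed each padded block to f."""
    h = iv
    padded = md_pad(message, block_size)
    for i in range(0, len(padded), block_size):
        h = f(h, padded[i:i + block_size])
    return h  # the value after the last block is the hash
```

Because the length always ends up in the final block, two messages of different lengths never share the same last block, which is what the security argument relies on.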

The popularity of this construction is due to the fact, proven by Merkle and Damgård, that if the compression function f is collision-resistant, then so is the hash function constructed using it. Unfortunately, this construction also has several undesirable properties:

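Length extension - knowing only the hash of a message m and the length of m, an attacker can compute the hash of m followed by its padding and any chosen suffix. In particular, once an attacker has one collision between two messages of equal length, appending the same suffix to both yields further collisions very cheaply.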
Second-preimage attacks against long messages are always much more efficient than brute force.
Multicollisions (many messages with the same hash) can be found with only a little more work than collisions.
"Herding attacks" (first committing to an output h, then mapping messages with arbitrary starting values to h) are possible; they require more work than finding a collision, but much less than would be needed to do the same against a random oracle.

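The length-extension property can be demonstrated with a short Python sketch. It assumes SHA-1-style length padding and 16-byte message blocks, and uses a truncated SHA-256 call only as a hypothetical stand-in compression function `f` so that the example runs; the attack never looks inside `f`, so any compression function iterated this way behaves the same. All names here are illustrative.

```python
import hashlib

BLOCK = 16        # message block size in bytes (assumption for this sketch)
IV = bytes(16)    # hypothetical all-zero initialization vector

def f(h: bytes, m: bytes) -> bytes:
    """Stand-in compression function (truncated SHA-256), used only so the code runs."""
    return hashlib.sha256(h + m).digest()[:16]

def padding(msg_len: int) -> bytes:
    """Length padding for a message of msg_len bytes: 0x80, zeros, 64-bit bit length."""
    zeros = (-(msg_len + 9)) % BLOCK
    return b"\x80" + bytes(zeros) + (8 * msg_len).to_bytes(8, "big")

def iterate(h: bytes, data: bytes) -> bytes:
    """Apply f block by block, starting from chaining value h."""
    for i in range(0, len(data), BLOCK):
        h = f(h, data[i:i + BLOCK])
    return h

def md_hash(message: bytes) -> bytes:
    return iterate(IV, message + padding(len(message)))

def length_extend(digest: bytes, secret_len: int, suffix: bytes):
    """Given only H(secret) and len(secret), return (appended, H(secret + appended))."""
    glue = padding(secret_len)                 # padding the original hash already absorbed
    new_len = secret_len + len(glue) + len(suffix)
    forged = iterate(digest, suffix + padding(new_len))  # resume from the old digest
    return glue + suffix, forged

secret = b"unknown to the attacker"
appended, forged = length_extend(md_hash(secret), len(secret), b";admin=true")
assert md_hash(secret + appended) == forged
```

The forged digest is computed without the secret itself: the published hash is simply used as the chaining value for further blocks.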
Davies-Meyer

The Davies-Meyer hash construction

The Davies-Meyer hash compression function feeds each block of the message (m_i) as the key to the block cipher. It feeds the previous hash value (H_i-1) as the cleartext to be encrypted. The output ciphertext is then XORed ( $\oplus$ ) with the previous hash value (H_i-1) to produce the next hash value (H_i). In the first round, when there is no previous hash value, it uses a constant, pre-specified initial value (H₀).

$H_{i}=E_{m_{i}}(H_{i-1})\oplus H_{i-1}$

If, for instance, the block cipher uses 256-bit keys, then each message block (m_i) is a 256-bit chunk of the message. If the same block cipher uses a block size of 128 bits, then the input and output hash values in each round are 128 bits.
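A minimal Python sketch of Davies-Meyer follows. XTEA (64-bit block, 128-bit key) is used as the block cipher only because it is short enough to write out in full, so the size relationship above appears at a smaller scale: each message block m_i is 16 bytes (one XTEA key) and each hash value is 8 bytes (one XTEA block). The choice of XTEA, the big-endian word order, the all-zero H₀ and the SHA-1-style length padding are assumptions of this sketch, and a 64-bit hash is too small for real use.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9

def xtea_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """XTEA encryption of one 8-byte block under a 16-byte key (32 cycles).
    Big-endian word order is an assumption of this sketch."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & MASK
        s = (s + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & MASK
    return struct.pack(">2I", v0, v1)

BLOCK = 8          # cipher block size = hash size, in bytes
KEY = 16           # cipher key size = message block size, in bytes
H0 = bytes(BLOCK)  # hypothetical all-zero initial value

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def davies_meyer(h_prev: bytes, m_block: bytes) -> bytes:
    """H_i = E_{m_i}(H_{i-1}) XOR H_{i-1}: the message block is the key."""
    return xor(xtea_encrypt(m_block, h_prev), h_prev)

def dm_hash(message: bytes) -> bytes:
    # Assumed length padding (0x80, zeros, 64-bit bit length) to whole KEY-sized blocks.
    zeros = (-(len(message) + 9)) % KEY
    padded = message + b"\x80" + bytes(zeros) + (8 * len(message)).to_bytes(8, "big")
    h = H0
    for i in range(0, len(padded), KEY):
        h = davies_meyer(h, padded[i:i + KEY])
    return h

print(dm_hash(b"abc").hex())
```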

Variations of this method replace XOR with any other group operation, such as addition on 32-bit unsigned integers.
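As an illustration of such a variation, the sketch below combines two byte strings by adding them as 32-bit unsigned words modulo 2^32 instead of XORing them; in a Davies-Meyer step the ciphertext and H_i-1 would be combined this way. Similar word-wise addition modulo 2^32 is the feed-forward step of MD5, SHA-1 and SHA-256. The function names and the big-endian word order are assumptions of this sketch; the subtraction shows that the operation is invertible, as a group operation must be.

```python
import struct

MASK32 = 0xFFFFFFFF

def _words(data: bytes) -> tuple:
    # Interpret data as big-endian 32-bit unsigned words (assumed word order).
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    return struct.unpack(f">{len(data) // 4}I", data)

def add_words_mod32(a: bytes, b: bytes) -> bytes:
    """Feed-forward by word-wise addition modulo 2**32 instead of XOR."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    sums = ((x + y) & MASK32 for x, y in zip(_words(a), _words(b)))
    return struct.pack(f">{len(a) // 4}I", *sums)

def sub_words_mod32(a: bytes, b: bytes) -> bytes:
    """Inverse operation: word-wise subtraction modulo 2**32."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    diffs = ((x - y) & MASK32 for x, y in zip(_words(a), _words(b)))
    return struct.pack(f">{len(a) // 4}I", *diffs)
```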


Matyas-Meyer-Oseas

The Matyas-Meyer-Oseas hash construction

The Matyas-Meyer-Oseas hash compression function can be considered the dual (the opposite) of Davies-Meyer.

It feeds each block of the message (m_i) as the cleartext to be encrypted. The output ciphertext is then XORed ( $\oplus$ ) with the same message block (m_i) to produce the next hash value (H_i). The previous hash value (H_i-1) is fed as the key to the block cipher. In the first round, when there is no previous hash value, it uses a constant, pre-specified initial value (H₀).

If the block cipher has different block and key sizes, the hash value (H_i-1) will have the wrong size for use as the key. The cipher might also have other special requirements on the key. In that case the hash value is first passed through a function g( ) that converts or pads it to fit as the key for the cipher.

$H_{i}=E_{g(H_{i-1})}(m_{i})\oplus m_{i}$
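A minimal Python sketch of Matyas-Meyer-Oseas, again using XTEA purely as a compact example cipher: the 8-byte message block m_i is the plaintext, and the chaining value, passed through g( ), is the key. Because XTEA's key (16 bytes) is larger than its block (8 bytes), g( ) is needed here; the zero-padding chosen for g( ), the big-endian word order, the all-zero H₀ and the length padding are hypothetical choices for this sketch.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9

def xtea_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """XTEA encryption of one 8-byte block under a 16-byte key (big-endian words assumed)."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & MASK
        s = (s + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & MASK
    return struct.pack(">2I", v0, v1)

BLOCK = 8          # cipher block size = message block size = hash size, in bytes
H0 = bytes(BLOCK)  # hypothetical all-zero initial value

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def g(h: bytes) -> bytes:
    """Hypothetical key conversion: zero-pad the 8-byte hash value to a 16-byte key."""
    return h + bytes(16 - len(h))

def matyas_meyer_oseas(h_prev: bytes, m_block: bytes) -> bytes:
    """H_i = E_{g(H_{i-1})}(m_i) XOR m_i: the chaining value is the key."""
    return xor(xtea_encrypt(g(h_prev), m_block), m_block)

def mmo_hash(message: bytes) -> bytes:
    # Assumed length padding (0x80, zeros, 64-bit bit length) to whole blocks.
    zeros = (-(len(message) + 9)) % BLOCK
    padded = message + b"\x80" + bytes(zeros) + (8 * len(message)).to_bytes(8, "big")
    h = H0
    for i in range(0, len(padded), BLOCK):
        h = matyas_meyer_oseas(h, padded[i:i + BLOCK])
    return h
```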

Miyaguchi-Preneel

The Miyaguchi-Preneel hash construction

The Miyaguchi-Preneel hash compression function is an extended variant of Matyas-Meyer-Oseas. It was independently proposed by Shoji Miyaguchi and Bart Preneel.

It feeds each block of the message (m_i) as the cleartext to be encrypted. The output ciphertext is then XORed ( $\oplus$ ) with the same message block (m_i) and then also XORed with the previous hash value (H_i-1) to produce the next hash value (H_i). The previous hash value (H_i-1) is fed as the key to the block cipher. In the first round, when there is no previous hash value, it uses a constant, pre-specified initial value (H₀).

If the block cipher has different block and key sizes, the hash value (H_i-1) will have the wrong size for use as the key. The cipher might also have other special requirements on the key. In that case the hash value is first passed through a function g( ) that converts or pads it to fit as the key for the cipher.

$H_{i}=E_{g(H_{i-1})}(m_{i})\oplus H_{i-1}\oplus m_{i}$
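A minimal Python sketch of the Miyaguchi-Preneel compression function, using the same hypothetical setup as a compact example: XTEA as the cipher, a zero-padding g( ) to stretch the 8-byte hash value into a 16-byte key, big-endian word order, an all-zero H₀ and SHA-1-style length padding. None of these choices come from the method itself; they only make the code runnable.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9

def xtea_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """XTEA encryption of one 8-byte block under a 16-byte key (big-endian words assumed)."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & MASK
        s = (s + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & MASK
    return struct.pack(">2I", v0, v1)

BLOCK = 8          # cipher block size = message block size = hash size, in bytes
H0 = bytes(BLOCK)  # hypothetical all-zero initial value

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def g(h: bytes) -> bytes:
    """Hypothetical key conversion: zero-pad the 8-byte hash value to a 16-byte key."""
    return h + bytes(16 - len(h))

def miyaguchi_preneel(h_prev: bytes, m_block: bytes) -> bytes:
    """H_i = E_{g(H_{i-1})}(m_i) XOR H_{i-1} XOR m_i."""
    return xor(xor(xtea_encrypt(g(h_prev), m_block), h_prev), m_block)

def mp_hash(message: bytes) -> bytes:
    # Assumed length padding (0x80, zeros, 64-bit bit length) to whole blocks.
    zeros = (-(len(message) + 9)) % BLOCK
    padded = message + b"\x80" + bytes(zeros) + (8 * len(message)).to_bytes(8, "big")
    h = H0
    for i in range(0, len(padded), BLOCK):
        h = miyaguchi_preneel(h, padded[i:i + BLOCK])
    return h
```

The only difference from Matyas-Meyer-Oseas is the extra XOR with H_i-1 in `miyaguchi_preneel`.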

The roles of m_i and H_i-1 may be switched, so that H_i-1 is encrypted under the key m_i, thus making this method an extension of Davies-Meyer instead.
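Written out with the roles switched, and assuming the key and block sizes are equal so that m_i can be XORed directly into the output, the compression function becomes:

$H_{i}=E_{m_{i}}(H_{i-1})\oplus H_{i-1}\oplus m_{i}$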
