One-way compression function

In cryptography, there are several methods to use a block cipher to build a cryptographic hash function. The methods resembles the block cipher modes of operation usually used for encryption.

Some methods to turn any normal block cipher into the compression function for a hash function are Davies-Meyer, Miyaguchi-Preneel, Matyas-Meyer-Oseas, MDC-2 and MDC-4. They are then used inside the Merkle-Damgård structure to build the actual hash function. These methods are described in detail further down. (MDC-2 is also the name of a hash function patented by IBM.)

Using a block cipher as a hash function usually is much slower then using a specially designed hash function. But in some cases it might be easier since it means just implementing a block cipher and then using it both as a block cipher and a hash function. It can also save code space in very tiny embedded systems like for instance smart cards or nodes in cars or other machines.

If a block cipher has a block size of say 128 bits most of the methods create a hash function that has the block size of 128 bits and produces a hash of 128 bits. But there are also methods to make hashes with double the hash size compared to the block size of the block cipher used. So a 128-bit block cipher can be turned into a 256-bit hash function.

The hash function is secure if the following conditions are met:

The block cipher needs to be secure.
The resulting hash size needs to be big enough. 64-bit is too small, 128-bit might be enough.
The last block needs to be properly length padded prior to the hashing. (See the Merkle-Damgård structure below.) Length padding is normally implemented and handled internally in specialised hash functions like SHA-1 etc.

The Merkle-Damgård structure

A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equal-sized blocks, and operating on them in sequence using a compression function. The compression function can either be specially designed for hashing or be built from a block cipher.

The last block processed should also be length padded, this is crucial to the security of this construction. This construction is called the Merkle-Damgård structure. Most widely used hash functions, including SHA-1 and MD5, take this form.

In the diagram, the compression function is denoted by f, and transforms a fixed length input to an output of the same size. The algorithm starts with an initial value, the initialization vector (IV). The IV is a fixed value (algorithm or implementation specific). For each message block, the compression (or compacting) function f takes the result so far, combines it with the message block, and produces an intermediate result. The last block is padded with zeros as needed and bits representing the length of the entire message are appended. (This is called length padding, see below for detailed example.)

To harden the hash further the last result is then often fed through a finalisation function. The finalisation function can have several purposes such as compressing a bigger internal state (the last result) into a smaller output hash size or to guarantee a better mixing and avalanche effect on the bits in the hash sum. The finalisation function is often built by using the compression function.

Security characteristics

The popularity of this construction is due to the fact, proven by Merkle and Damgård, that if the compression function f is collision-resistant, then so is the hash function constructed using it. Unfortunately, this construction also has several undesirable properties:

Length extension - once an attacker has one collision, he can find more very cheaply.
Second-preimage attacks against long messages are always much more efficient than brute force.
Multicollisions (many messages with the same hash) can be found with only a little more work than collisions.
"Herding attacks" (first committing to an output h, then mapping messages with arbitrary starting values to h) are possible for more work than finding a collision, but much less than would be expected to do this for a random oracle.

Length padding example

Let's say the message to be hashed is "Wikipedia" and the block size of the compression function is 8 bytes (64 bits).

So, to be able to feed the message to the compression function the last block needs to be zero padded to a full block. So we get two blocks looking like this:

Wikipedi  a0000000

But this is not enough since it would mean that for instance the message "Wikipedia00" would get the same hash sum. Therefore also the length of the message is added in an extra block, so we get three blocks like this:

Wikipedi  a0000000  00000009

Now that is a bit wasteful since it means hashing one extra block. So there is a slight speed optimisation that most hash implementations use. If there is space enough among the zeros padded to the last block the length value can instead be padded there. Like this:

Wikipedi  a0000009

Note that to avoid confusion the implementation must use a fixed bit-size for the length value, say 40-bit. So the length value padded in the end really is "00009" not just "9".

Davies-Meyer

The Davies-Meyer hash construction

The Davies-Meyer hash compression function feeds each block of the message (m_i) as the key to the block cipher. It feeds the previous hash value (H_i-1) as the cleartext to be encrypted. The output ciphertext is then also XORed ( $\oplus$ ) with the previous hash value (H_i-1) to produce the next hash value (H_i). In the first round when there is no previous hash value it uses a constant pre-specified initial value (H₀).

$H_{i}=E_{m_{i}}(H_{i-1})\oplus H_{i-1}$

If the block cipher uses for instance 256-bit keys then each message block (m_i) is a 256-bit chunk of the message. If the same block cipher uses a block size of 128 bits then the input and output hash values in each round is 128 bits.

Variations of this method replace XOR with any other group operation, such as addition on 32-bit unsigned integers.

Note: There is some more information in the old Davies-Meyer article that has not been merged to this article yet.

Matyas-Meyer-Oseas

The Matyas-Meyer-Oseas hash construction

The Matyas-Meyer-Oseas hash compression function can be considered the dual (the opposite) of Davies-Meyer.

It feeds each block of the message (m_i) as the cleartext to be encrypted. The output ciphertext is then also XORed ( $\oplus$ ) with the same message block (m_i) to produce the next hash value (H_i). The previous hash value (H_i-1) is fed as the key to the block cipher. In the first round when there is no previous hash value it uses a constant pre-specified initial value (H₀).

If the block cipher have different block and key size the hash value (H_i-1) will have the wrong size for use as the key. The cipher might also have other special requirements on the key. Then the hash value is first fed through the function g( ) to be converted/padded to fit as key for the cipher.

$H_{i}=E_{g(H_{i-1})}(m_{i})\oplus m_{i}$

Miyaguchi-Preneel

The Miyaguchi-Preneel hash construction

The Miyaguchi-Preneel hash compression function is an extended variant of Matyas-Meyer-Oseas. It was independently proposed by Shoji Miyaguchi and Bart Preneel.

It feeds each block of the message (m_i) as the cleartext to be encrypted. The output ciphertext is then XORed ( $\oplus$ ) with the same message block (m_i) and then also XORed with the previous hash value (H_i-1) to produce the next hash value (H_i). The previous hash value (H_i-1) is fed as the key to the block cipher. In the first round when there is no previous hash value it uses a constant pre-specified initial value (H₀).

If the block cipher have different block and key size the hash value (H_i-1) will have the wrong size for use as the key. The cipher might also have other special requirements on the key. Then the hash value is first fed through the function g( ) to be converted/padded to fit as key for the cipher.

$H_{i}=E_{g(H_{i-1})}(m_{i})\oplus H_{i-1}\oplus m_{i}$

The roles of m_i and H_i-1 may be switched, so that H_i-1 is encrypted under the key m_i. Thus making this method an extension of Davies-Meyer instead.

References

Handbook of Applied Cryptography by Menezes, van Oorschot and Vanstone (2001), chapter 9.