MD5 Definition

In Cryptography, MD5 (acronym for Message-Digest Algorithm 5 according to abbreviationfinder) is a widely used 128- Bit Cryptographic Reduction Algorithm. The MD5 code was designed by Ronald Rivest in 1991. During 2004, certain security defects were disclosed, which will mean that in the near future this system will be changed to a more secure one.


MD5 is one of the cryptographic reduction algorithms designed by Professor Ronald Rivest of MIT (Massachusetts Institute of Technology). When analytical analysis indicated that the MD4 algorithm was insecure, it was decided to program MD5 to replace it in 1991. Weaknesses in MD4 were discovered by Hans Dobbertin.

In 1996 Dobbertin announced a hash collision of the MD5 compression function. This was not an attack on the MD5 hashing function, but it did lead cryptographers to start recommending the replacement of MD5 encoding to others such as SHA-1 or RIPEMD-160. In August 2004 Chinese researchers also found hash collisions in MD5. MD5 is currently in widespread use and it is unknown how these issues will affect its use and its future.


The 128-bit MD5 encoding is typically represented as a 32-digit hexadecimal number. The following 28-byte ASCII code will be treated with MD5 and we will see its corresponding output hash:

MD5(“This is an MD5 test”) = e07186fbff6107d0274af02b8b930b65

A simple change in the message gives us a total change in the hash encoding, in this case we change two letters, the “yes” to a “no”.

MD5(“This is not an MD5 test”) = dd21d99a468f3bb52a136ef5beef5034

Another example would be encoding an empty field:

MD5(“”) = d41d8cd98f00b204e9800998ecf8427e


  • Terminologies and notations

In this document “word” is a 32-bit entity and byte is an 8-bit entity. A sequence of bits can be naturally interpreted as a sequence of bytes, where each consecutive group of eight bits is interpreted as a byte with the most significant bit first. Similarly, a sequence of bytes can be interpreted as a sequence of 32 bits (word), where each consecutive group of four bytes is interpreted as a word with the least significant bit at the beginning.

The symbol “+” means sum of words. X <<< s is interpreted by a rotation to the left ‘s’ positions not(x) is understood as the complement of x

  • Description of the MD5 algorithm

We start by assuming that we have a message of ‘b’ input bits, and that we would like to find its digest. Here ‘b’ is an arbitrary non-negative integer value, but it can be zero, it doesn’t have to be a multiple of eight, and it can be very long. Imagine the bits of the message written like this:

m0 m1… m{b-1}

The following five steps are performed to compute the message digest.

Step 1. Adding bits

The message will be extended until its length in bits is consistent with 448, modulo 512. That is, the message will be extended until the smallest multiple of 512 bits is formed. This extension is always done, even if the message length is already consistent with 448, modulo 512.

The extension is done as follows: a single “1” bit is added to the message, and then “0” bits are added until the message’s bit length is consistent with 448, modulo 512. In all messages at least one bit and at most 512.

Step 2. Message length

A 64-bit representation of ‘b’ (the length of the message before adding the bits) is added to the result of the previous step. In the unintended assumption that ‘b’ is greater than 2^64, then only the least significant 64 bits of ‘b’ will be used.

At this point the resulting message (after padding with the bits and with ‘b’) will have a length that is an exact multiple of 512 bits. In turn, the length of the message is a multiple of 16 words (32 bits per word). With M[0… N-1] we will denote the words of the resulting message, where N is a multiple of 16.

Step 3. Initialize the MD buffer

A buffer of four words (A, B, C, D) is used to compute the message digest. Here each of the letters A, B, C, D represents a 32-bit register. These registers are initialized with the following hexadecimal values, least significant bits first:

word A: 01 23 45 67 word B: 89 ab cd ef C word: fe dc ba 98 D-word: 76 54 32 10

Step 4. Message processing in blocks of 16 words

We first define four helper functions that take three 32-bit words as input and output one 32-bit word.

<math>F(X,Y,Z) = (X\wedge{Y}) \vee (\neg{X} \wedge{Z})</math>

<math>G(X,Y,Z) = (X\wedge{Z}) \vee (Y \wedge \neg{Z})</math>

<math>H(X,Y,Z) = X \oplus Y \oplus Z</math>

<math>I(X,Y,Z) = Y \oplus (X \vee \neg{Z})</math>

The operators <math>\oplus, \wedge, \vee, \neg</math> are the XOR, AND, OR and NOT functions respectively.

At each position of each bit F acts as a conditional: if X, then Y else Z. The function F could have been defined using + instead of v since XY and not(x)Z will never have ones (‘1’) at the same bit position. It is interesting to note that if the bits of X, Y, and Z are independent and unbiased, each bit of F(X,Y,Z) will be independent and unbiased.

Functions G, H, and I are similar to function F, in that they act “bitwise in parallel” to produce their outputs from the bits of X, Y, and Z, as long as each corresponding bit of X, Y and Z are independent and unbiased, then each bit of G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased. Note that the function H is the bitwise comparison “xor” or “parity” function of its inputs.

This step uses a 64-element table T[1… 64] built with the sine function. We will denote by T[i] the i-th element of this table, which will be equal to the integer part of the absolute value of the sine of ‘i’ times 4294967296, where ‘i’ is in radians.

MD5 code:

/* Process each block of 16 words. */for i = 0 to N/16-1 do /* Copy the ‘i’ block to X. */ for j = 0 to 15 do make X[j] out of M[i*16+j]. end for /* of loop ‘j’ *//* Save A as AA, B as BB, C as CC, and D as DD. */ AA = A BB = B CC = C DD = D/* Round 1. */ /* [abcd ksi] will denote the operation a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */ /* Do the next 16 operations. */ [ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4] [ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8] [ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12] [ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]/* Round 2. */ /* [abcd ksi] will denote the operation a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */ /* Do the next 16 operations. */ [ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20] [ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24] [ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28] [ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]/* Round 3. */ /* [abcd kst] will denote the operation a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */ /* Do the next 16 operations. */ [ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36] [ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40] [ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44] [ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]/* Round 4. */ /* [abcd kst] will denote the operation a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */ /* Do the next 16 operations. */ [ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52] [ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56] [ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60] [ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]/* Now perform the following sums. (This is the increment of each one of the four registers by the value they had before this block was initialized.) */ A = A + AA B = B + BB C = C + CC D = D + DDend for /* of loop on ‘i’ */

Step 5. Exit

The message digest is the output produced by A, B, C, and D. That is, it starts with the low-order byte of A and ends with the high-order byte of D.


MD5 has been widely used, and was originally thought to be cryptographically secure. However, certain investigations have uncovered vulnerabilities that make future use of MD5 questionable. On August 17, 2004, Xiaoyun Wang, Dengguo Feng, Xuejia Lai and Hongbo Yu announced that they had discovered MD5 hash collisions. His attack only took an hour of computation with an IBM P690 cluster.

Although Wang’s attack was analytical, the size of the hash (128 bits) is small enough to allow for ‘birthday’ type ‘brute force’ attacks. MD5CRK was a distributed project that started in March 2004 with the purpose of proving that MD5 is insecure by finding a collision using a ‘brute force’ attack, though it ended shortly after Wang’s warning.

Due to the discovery of an easy method to generate hash collisions, many researchers recommend other algorithms, such as SHA-1 or RIPEMD-160, to be used instead of MD5.


MD5 digests are widely used in the software world to provide assurance that a file downloaded from the internet has not been tampered with. By comparing a published MD5 checksum to the downloaded file’s checksum, a user can be confident enough that the file is the same as the one published by the developers. This protects the user against ‘Trojan Horses’ or ‘Trojans’ and viruses that some other malicious user might include in the software. Checking a downloaded file against its MD5 checksum not only detects maliciously altered files, it also recognizes a corrupt or incomplete download.

To check the integrity of a file downloaded from the Internet, an MD5 tool can be used to compare the MD5 sum of that file to an MD5SUM file with the MD5 sum of the first file. On UNIX systems, the md5sum command is an example of such a tool. Furthermore, it is also implemented in the PHP scripting language as MD5(“”) among others.

On UNIX and GNU / Linux systems, the MD5 algorithm is used to encrypt user passwords. The result of the MD5 of the key that is entered when registering a user is saved on the disk, and when the user wants to enter the system, the entry is compared with the one stored on the hard disk, if they coincide, it is the same password and the user will be authenticated. Hence the problem of finding and generating hash collisions at will.

MD5 can also be used to check that emails have not been tampered with using public and private keys.