A Review Paper On Cryptographic Hash Function

This document provides an overview of cryptographic hash functions. It discusses how hash functions take an input of any length and produce a fixed-length output. Hash functions are used for applications like message authentication, digital signatures, and password verification. The document also describes standard hash functions like SHA-1 and MD5, and how they are constructed using the Merkle-Damgard framework to iterate a compression function. It provides security properties of cryptographic hash functions like collision resistance.

A Review Paper on Cryptographic Hash Function

Khan Mohammed Hamid
Shah And Anchor Kutchhi Engineering College, Chembur, Mumbai, Maharashtra 400088, INDIA.
+919870908062
[email protected]

Prof. Jyoti Joglekar
Shah And Anchor Kutchhi Engineering College, Chembur, Mumbai, Maharashtra 400088, INDIA.

Abstract:

A cryptographic hash function takes an input of arbitrary length and produces an output of fixed size. Cryptographic hash functions are used to verify data integrity, passwords, and the identity of the sender or source of information. This paper discusses the Secure Hash Algorithm and MD5, their importance, known attacks and recent developments.

Keywords: Cryptography, Hash function, Secure Hash Algorithm, SHA-1, SHA-2, SHA-3, MD-5.

1. Introduction:

Cryptography is the science that aims at designing and developing cryptographic systems, sometimes referred to as cryptosystems. A cryptosystem is a set of methods needed to create a particular encryption and decryption scheme. A typical cryptosystem is made up of three parts: one that generates the encryption/decryption key, one that performs the encryption process, and one that deals with the decryption process.

Encryption is the process by which one changes a message (called plaintext) in order to render it unreadable to all but those possessing the decryption key. The unreadable message is usually referred to as the ciphertext.

Decryption is the inverse process which recovers the plaintext from the ciphertext.[1]

A cryptographic hash function is a function that takes a message of any length as input and transforms it into a fixed-length output called a hash value, a message digest, or a checksum.

Cryptographic hash functions are used to achieve a number of security goals such as Message Authentication and Message Integrity, and are also used to implement Digital Signatures (Non-repudiation), Entity Authentication and Digital Steganography. Considerable research is under way in the field of cryptographic hash functions. Hash functions are being built from existing primitives like block ciphers (e.g. Whirlpool, Skein) as well as being constructed from scratch, like the MDx family and the SHA family of hash functions.[2]

2. Cryptographic Hash Functions:

A hash function H accepts a variable-length block of data M as input and produces a fixed-size hash value h = H(M). A “good”
hash function has the property that the results of applying the function to a large set of inputs will produce outputs that are evenly distributed and apparently random. In general terms, the principal object of a hash function is data integrity. A change to any bit or bits in M results, with high probability, in a change to the hash code.

The kind of hash function needed for security applications is referred to as a cryptographic hash function. A cryptographic hash function is an algorithm for which it is computationally infeasible to find either (a) a data object that maps to a pre-specified hash result (the one-way property) or (b) two data objects that map to the same hash result (the collision-free property). Because of these characteristics, hash functions are often used to determine whether or not data has changed.

Figure 1. Block Diagram of Cryptographic Hash Function; h = H(M)

Figure 1 depicts the general operation of a cryptographic hash function. Typically, the input is padded out to an integer multiple of some fixed length (e.g., 1024 bits), and the padding includes the value of the length of the original message in bits. The length field is a security measure to increase the difficulty for an attacker to produce an alternative message with the same hash value.[3]

2.1. Security Application of Cryptographic Hash Functions

2.1.1. Information Integrity and Message Authentication:

Verifying the integrity and authenticity of information is a prime necessity in computer systems and networks. In particular, two parties communicating over an insecure channel require a method by which information sent by one party can be validated as authentic (or unmodified) by the other.

Message authentication is a mechanism or service used to verify the integrity of a message. Message authentication assures that data received are exactly as sent (i.e., contain no modification, insertion, deletion, or replay). In many cases, there is a requirement that the authentication mechanism assures that the purported identity of the sender is valid. When a hash function is used to provide message authentication, the hash function value is often referred to as a message digest.

2.1.2. Digital certificate and Digital signature:

A digital certificate or public key certificate contains a user’s name along with the user’s public key. In most situations the certificate must be signed by a certificate authority, or CA, which acts as a trusted third party, or
TTP. By signing the certificate, the CA is confirming that the identity stated in the certificate is that of the holder of the corresponding private key. The difference between a digital signature and a digital certificate is that a certificate is an electronic document that uses a digital signature to bind a public key to an individual, a computer or a network device, whereas a digital signature is to ensure that data/information remains secure from the point it was issued.[4]

2.1.3. Password verification:

Storing all user passwords as cleartext can result in a massive security breach if the password file is compromised. One way to reduce this danger is to store only the hash digest of each password. To authenticate a user, the password presented by the user is hashed and compared with the stored hash.

Hash functions as PRNG and OTP:

Because hash functions are one-way functions, they can be used to implement a PRNG (pseudo-random number generator). Cryptographic hash functions like SHA and MD5 generate hash values that cannot be reversed, which makes them suitable for generating an OTP (One Time Password). A small change in the plaintext produces a large change in the output.

2.1.4. Other Applications:

Hash functions can also be used to index data in hash tables, for fingerprinting, to detect duplicate data or uniquely identify files, as checksums to detect accidental data corruption, and for generating random numbers.

Looking at this wide range of applications, it is not correct to say that hash functions belong to one particular cryptographic sub-branch. These cryptographic tools deserve a separate status for themselves. They are used in almost all areas of cryptology where efficient information processing is required.

3. Standard of Cryptographic Hash Functions

Cryptographic hash functions come in different shapes and sizes. There are basically two main categories of hash functions: hash functions that depend on a key for their computation, usually known as Message Authentication Codes or MACs, and hash functions that do not depend on a key for their computation, generally known as un-keyed hash functions or simply hash functions.

All well-known hash functions are either based on a block cipher or on modular arithmetic. But before stepping into their details, we study a well-known method used to build the most popular hash functions, the Merkle-Damgard construction [5].

3.1. The Merkle-Damgard Construction:

Named after its two inventors, the American Ralph C. Merkle and the Danish Ivan Damgard, the Merkle-Damgard structure defines a generic step-by-step procedure for deriving a fixed-length output value from a variable-length input value. The process is depicted in figure 2.
Figure 2 [6]. The Merkle-Damgard hash construction.

The main building blocks of the Merkle-Damgard structure are:

IV: Initialization Vector or Initial Value is a fixed value used as the chaining variable for the very first iteration.

f: the compression function or one-way hash function, which is either specially designed for hashing or based on a block cipher. The compression function generally takes an input of fixed length and produces an output of fixed length.

Finalisation: an output transformation function which usually further reduces the length of the output value of the last iteration.

Hash: the message digest or the hash result.

As we can see from figure 2, the entire message to be hashed is first divided into n blocks of equal length. The actual length of the message blocks depends on the requirements set by the compression function f. The message is then always padded, such that its length is a multiple of some specific number. The padding is done by adding after the last bit of the last message block a single 1-bit followed by the necessary number of 0-bits. The length padding, which consists of appending a k-bit representation of the length in bits of the original message (that is, the message before any padding has been applied), takes place in such a way that the length bits are added as the last bits of the padded message block prior to being processed by the compression function. Every block is processed by the compression function in the same iterative manner.[6]

The compression function always takes two inputs in each step or iteration: a message block and a chaining variable. In the first iteration, the chaining variable is the IV or Initialization Vector. It is given, together with the first block of the message, as input to the compression function. The output of the compression function f in the first iteration is the chaining variable in the second iteration. The output of the compression function f in the i-th iteration is the chaining variable in the (i + 1)-th iteration, and so on until we reach the last iteration.

In the last iteration, the output of the compression function is used as an input to a finalization function which further reduces the length of the final output value from the compression function (however, in some cases the finalization function is not present and the output value of the compression function f in the last iteration is used as the final hash result).

3.2. The MD5 hash Algorithm:

The MD5 (1992) message-digest algorithm was designed as a strengthened extension of the MD4 (1990) message-digest algorithm. MD5 is slightly slower than MD4; this is a classical example where security is favored at the expense of speed. Both algorithms were developed by Ron Rivest, who is the “R” in the RSA [Rivest-Shamir-Adleman] public-key encryption algorithm.
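Before turning to the details of MD5, the Merkle-Damgard iteration of Section 3.1 can be sketched in a few lines. This is only a toy under stated assumptions: the names `toy_compress`, `md_pad` and `md_hash`, the all-zero IV, the 64-byte block size and the 16-byte chaining variable are hypothetical choices made for illustration, and the compression function simply borrows SHA-256 from Python's standard library as a mixing step. There is no finalization function here, which Section 3.1 notes is allowed.

```python
import hashlib
import struct

BLOCK_SIZE = 64          # bytes per message block (assumed, as in MD5 and SHA-1)
STATE_SIZE = 16          # bytes in the chaining variable (assumed for this toy)
IV = bytes(STATE_SIZE)   # hypothetical all-zero initialization vector


def toy_compress(chaining: bytes, block: bytes) -> bytes:
    """Stand-in for f: two fixed-length inputs, one fixed-length output.

    SHA-256 is borrowed only as a convenient mixing function; a real
    design would define its own compression function.
    """
    return hashlib.sha256(chaining + block).digest()[:STATE_SIZE]


def md_pad(message: bytes) -> bytes:
    """Append a single 1-bit, the needed 0-bits, then the 64-bit bit length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                        # 1-bit followed by seven 0-bits
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_SIZE)
    return padded + struct.pack(">Q", bit_length)     # length field goes last


def md_hash(message: bytes) -> bytes:
    """Iterate the compression function over all padded blocks."""
    padded = md_pad(message)
    chaining = IV                                     # chaining variable for iteration 1
    for offset in range(0, len(padded), BLOCK_SIZE):
        block = padded[offset:offset + BLOCK_SIZE]
        chaining = toy_compress(chaining, block)      # output feeds the next iteration
    return chaining                                   # last output is the hash result


if __name__ == "__main__":
    print(md_hash(b"abc").hex())
```

Because the message length is part of the last block, two messages that differ only in trailing zero bytes are padded to different inputs, which is the purpose of the length field described above.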
3.2.1. Description of the MD5 algorithm:

The algorithm accepts an input message of arbitrary length and produces a 128-bit “message digest”, “fingerprint” or “hash result”. Figure 3 depicts the way the input message is turned into a 128-bit message digest.

Figure 3. The MD5 algorithm.

The actual processing of the MD5 algorithm consists of the following 5 steps:

Step 1: Append padding bits:

During this step, the message is extended or padded in such a way that its total length in bits is congruent to 448 modulo 512. This operation is always performed, even if the message’s length in bits is originally congruent to 448 modulo 512. We notice that 448 + 64 = 512, so the message is padded such that its length is now 64 bits less than an integer multiple of 512.

Padding is done by appending to the message a single “1” bit followed by the necessary number of “0” bits so that the length in bits of the padded message becomes congruent to 448 modulo 512. For example, if the message is 447 bits long, it is padded by 1 bit to a length of 448 bits (only the single bit 1 is appended to the end of the message in this particular case). On the other hand, if the message is 448 bits long, it is padded by 512 bits to a length of 960 bits. Thus, at least 1 bit and at most 512 bits are appended during this step.

Step 2: Append length

A 64-bit representation of the length in bits of the original message M (before the padding bits were added) is appended to the result of step 1.

Step 3: Initialize MD buffer

A 128-bit (4 × 32-bit) buffer (A, B, C, D) is used to hold intermediate and final results of the MD5 hash algorithm. These registers are initialized to the following 32-bit values in hexadecimal:

A = 67 45 23 01
B = ef cd ab 89
C = 98 ba dc fe
D = 10 32 54 76

These values are stored in little-endian format, meaning that the low-order bytes of a word are placed in the low-address byte position. The initialization values then appear as follows:

word A = 01 23 45 67
word B = 89 ab cd ef
word C = fe dc ba 98
word D = 76 54 32 10

These four variables (they are indeed variables since they change value) are copied into different variables: A is saved as AA, B as BB, C as CC and D as DD.

Step 4: Define four auxiliary functions and process message

This step consists of sixty-four (64) steps divided into four (4) rounds of processing.
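Steps 1 to 3 above can be checked with a short sketch. It works on whole bytes, so the 447-bit example cannot be reproduced directly, but the 448-bit case can. The function names `md5_pad` and `initial_buffer_bytes` are hypothetical, and the little-endian encoding of the 64-bit length field is an assumption taken from the MD5 specification (RFC 1321), not from the text above.

```python
import struct

# Register values from Step 3, written as 32-bit integers.
MD5_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def md5_pad(message: bytes) -> bytes:
    """Steps 1 and 2: pad to 448 mod 512 bits, then append the 64-bit length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"              # the single 1-bit, then seven 0-bits
    while len(padded) % 64 != 56:           # 56 bytes = 448 bits, 64 bytes = 512 bits
        padded += b"\x00"
    # Assumption (RFC 1321): the length is stored low-order byte first.
    return padded + struct.pack("<Q", bit_length % 2**64)


def initial_buffer_bytes() -> bytes:
    """Step 3: the A, B, C, D registers as they lie in memory (little-endian)."""
    return struct.pack("<4I", *MD5_INIT)


if __name__ == "__main__":
    for size in (0, 55, 56, 64):
        print(size, "bytes ->", len(md5_pad(b"a" * size)), "bytes after padding")
    print(initial_buffer_bytes().hex())
```

A 56-byte (448-bit) message already has the target length modulo 512, yet it still receives a full 512 bits of padding before the length field, so the padded result is two 512-bit blocks.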
The four rounds are almost identical, with the main difference being that each round uses a different primitive logical function, denoted by F, G, H, and I in the specification. Let us first define the four functions. We note that each of them takes three 32-bit words as input and yields one 32-bit word as output.

Table 1: The primitive functions of the MD5 compression algorithm.

Step 5: Output:

The output from the very last round is the 128-bit hash result or message digest we obtain after we have incrementally processed all t 512-bit blocks of the message. The entire process can be summarized as follows:

$CV_0 = IV$
$CV_{k+1} = \mathrm{SUM}_{32}(CV_k, RF_I[M_k, RF_H[M_k, RF_G[M_k, RF_F[M_k, CV_k]]]])$
$\mathrm{MD5SUM} = CV_t$

Where
$IV$ = the initial value of the ABCD buffer, defined by step 3
$M_k$ = the k-th 512-bit block of the message
$CV_k$ = the chaining variable processed with the k-th block of the message
$RF_x$ = the round function using primitive logical function x
$\mathrm{MD5SUM}$ = the final hash result or message digest
$\mathrm{SUM}_{32}$ = addition modulo $2^{32}$

3.2.2. Attacks on MD5:

More generally, we can distinguish three main categories of attacks on cryptographic hash functions: pre-image attacks, second pre-image attacks and collision attacks. In one way or another, all kinds of attacks fall into one of these categories. This is the main reason why one-way hash functions are required to be pre-image resistant and second pre-image resistant, while collision-resistant hash functions must, in addition to these two properties, be collision resistant. We take a look at some commonly known attacks.

3.2.3. Collision attack in MD5 cryptographic hash function:

In August 2004 Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article [7] in which they describe an algorithm that can find two different sequences of 128 bytes with the same MD5 hash. Their research was motivated by the possibility of finding a colliding pair of messages, each consisting of two blocks.

The attack revolves around finding two distinct two-block messages ($M_0$, $N_0$) and ($M_1$, $N_1$), where the first blocks differ only in a predefined constant vector $cv_1$ such that $M_1 = M_0 + cv_1$, the second blocks differ in a predefined constant vector $cv_2$ ($cv_2 = -cv_1 \bmod 2^{32}$) such that $N_1 = N_0 + cv_2$, and MD5($M_0$, $N_0$) = MD5($M_1$, $N_1$). However, a considerable number of conditions have to be met for the attack to be successful. Basically, the attack makes use of differentials, that is, differences in the message
that are spread over a length of two message blocks. The first block’s difference introduces a small difference in the original state or initialization vector, whereas the second block’s difference cancels out the difference introduced by the first block. We also note that finding the first blocks ($M_0$, $M_1$) takes about $2^{39}$ MD5 operations, and finding the second blocks ($N_0$, $N_1$) takes about $2^{32}$ MD5 operations. Applying this attack on an IBM P690 takes about one hour to find $M_0$ and $M_1$, and in the fastest cases only 15 minutes. It then takes only between 15 seconds and 5 minutes to find the second blocks $N_0$ and $N_1$. [8]

3.2.4. Pure Brute-force Attack

The pure brute-force attack is one in which all possible words of a certain length are tried until the correct one is found. This attack is guaranteed to work, which is why one usually chooses the length of the hash result in such a way that the brute-force attack becomes impractical or too slow and thus less attractive.[9]

3.3. The Secure Hash Algorithm - SHA

The Secure Hash Algorithm (SHA) was developed by the National Security Agency (NSA) and published in 1993 by the National Institute of Standards and Technology (NIST) as a U.S. Federal Information Processing Standard (FIPS PUB 180). SHA is based on and shares the same building blocks as the MD4 algorithm. Unlike the MD4 algorithm, the design of SHA introduced, among other things, a new procedure which expands the 16-word message block input to the compression function to an 80-word block. In 1994, NIST announced that a technical flaw in SHA had been found and that this flaw made the algorithm less secure than originally believed. No further details were given to the public, only that a small modification was made to the algorithm, which was now known as SHA-1 and published in FIPS PUB 180-1.

The Secure Hash Standard [10] specifies the secure hash algorithms SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224 and SHA-512/256. All of the algorithms are iterative, one-way hash functions that can process a message to produce a condensed representation called a message digest. These algorithms enable the determination of a message’s integrity: any change to the message will, with a very high probability, result in a different message digest. This property is useful in the generation and verification of digital signatures and message authentication codes, and in the generation of random numbers or bits.

Table 2 [10]. Presents the basic properties of these hash algorithms.

3.3.1. Description of the SHA-1 algorithm

The SHA-1 algorithm accepts as input a message with a maximum length of $2^{64} - 1$ bits and produces a 160-bit message digest as output. The message is processed by the compression function in 512-bit blocks. Each
block is divided further into sixteen 32-bit words denoted by $M_t$ for t = 0, 1, ..., 15. The compression function consists of four rounds; each round is made up of a sequence of twenty steps. A complete pass of the SHA-1 compression function therefore consists of eighty steps, in which a 512-bit block is used together with a 160-bit chaining variable to finally produce a 160-bit hash value. The processing works as described in the following steps:

Step 1: Append padding bits

The original message is padded so that its length is congruent to 448 modulo 512. Again, padding is always added, even if the message already has the desired length. Padding consists of a single 1 bit followed by the necessary number of 0 bits.

Step 2: Append length

A 64-bit block, treated as an unsigned 64-bit integer (most significant byte first) and representing the length of the original message (before padding in step 1), is appended to the message. The entire message’s length is now a multiple of 512.

Step 3: Initialize the buffer

The buffer consists of five (5) registers of 32 bits each, denoted by A, B, C, D, and E. This 160-bit buffer is used to hold temporary and final results of the compression function. These five registers are initialized to the following 32-bit integers (in hexadecimal notation).

A = 67 45 23 01
B = ef cd ab 89
C = 98 ba dc fe
D = 10 32 54 76
E = c3 d2 e1 f0

We can see that the registers A, B, C, and D are exactly the same as the four registers used in the MD5 algorithm. But in SHA-1, these values are stored in big-endian format, which means that the most significant byte of the word is placed in the low-address byte position. Hence the initialization values (in hexadecimal notation) appear as follows:

word A = 67 45 23 01
word B = ef cd ab 89
word C = 98 ba dc fe
word D = 10 32 54 76
word E = c3 d2 e1 f0

Step 4: Process message in 512-bit blocks

The compression function is divided into eighty sequential steps composed of four rounds of processing, where each round is made up of twenty steps. The four rounds are structurally similar to one another, with the only difference that each round uses a different Boolean function, which we refer to as f1, f2, f3, f4, and one of four different additive constants $K_t$ (0 ≤ t ≤ 79) which depends on the step under consideration. The values of the four distinct additive constants are given in Table 3 below.
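As a rough illustration of Step 4, the sketch below selects the Boolean function and additive constant for a given step t and applies a single step to the five registers. The constants and functions are the ones published in FIPS PUB 180-4; the helper names `rotl32`, `sha1_round_params` and `sha1_step` are hypothetical, and the example input word is the first word of the padded message "abc", chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, shift: int) -> int:
    """Circular left shift of a 32-bit word."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def sha1_round_params(t: int):
    """Return (Boolean function, additive constant) for step t, 0 <= t <= 79."""
    if 0 <= t <= 19:      # round 1: bitwise choice between c and d, selected by b
        return (lambda b, c, d: ((b & c) | (~b & d)) & MASK32), 0x5A827999
    if 20 <= t <= 39:     # round 2: parity (XOR of the three words)
        return (lambda b, c, d: b ^ c ^ d), 0x6ED9EBA1
    if 40 <= t <= 59:     # round 3: bitwise majority
        return (lambda b, c, d: (b & c) | (b & d) | (c & d)), 0x8F1BBCDC
    if 60 <= t <= 79:     # round 4: parity again
        return (lambda b, c, d: b ^ c ^ d), 0xCA62C1D6
    raise ValueError("step number must be in 0..79")


def sha1_step(state, t: int, word: int):
    """Apply one SHA-1 step to the registers (A, B, C, D, E)."""
    a, b, c, d, e = state
    f, k = sha1_round_params(t)
    new_a = (e + f(b, c, d) + rotl32(a, 5) + word + k) & MASK32   # addition mod 2^32
    return (new_a, a, rotl32(b, 30), c, d)


if __name__ == "__main__":
    initial = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    after = sha1_step(initial, 0, 0x61626380)   # first word of padded "abc"
    print(" ".join(f"{register:08x}" for register in after))
```

Rounds 2 and 4 share the same function and differ only in the additive constant, which is why the constant has to be selected per step rather than per function.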
Table 3: The four additive constants used in SHA-1 algorithm

The primitive logical functions $f_r$ are defined as shown in Table 4.

Table 4: The four primitive logical functions used in SHA-1 algorithm

Every step updates two of the five registers. The step operation, which updates the value of the E register and rotates the value of the B register by 30 bit positions to the left, is of the following form:

$(A, B, C, D, E) \leftarrow (E + f_r(t, B, C, D) + (A \lll 5) + M_t + K_t,\ A,\ B \lll 30,\ C,\ D)$

Where
A, B, C, D, E = the five registers of the SHA-1 buffer
t = the step number, 0 ≤ t ≤ 79
$f_r$ = the primitive logical function used in step t and round r
$\lll s$ = the circular left shift of the 32-bit word by s bits
$M_t$ = a 32-bit word derived from the current 512-bit input block
$K_t$ = one of four additive constants
+ = addition modulo $2^{32}$

Step 5: Output

After processing the last 512-bit message block t (assuming that the message is divided into t 512-bit blocks), we obtain a 160-bit message digest. The compression function uses a feed-forward operation where the chaining variable $CV_k$ input to the first round is added to the output obtained after execution of step 80 to produce the next chaining variable $CV_{k+1}$.

As compared to MD5, only the function used in the first round has not changed. The remaining three functions, for the last three rounds of SHA-1, are not the same as the ones used in the MD5 compression algorithm. We notice that in the SHA-1 algorithm, the functions f2 and f4 have the same structure and are used in rounds 2 and 4 respectively. These two functions are structurally equivalent to the function H used in round 3 of the MD5 compression function. However, the function f3 of the SHA-1 algorithm is entirely new. We now take a closer look at the way SHA-1 expands the 16 block words to the 80 words used by the compression function.

3.3.2. Deriving 80 32-bit word values from one 512-bit message block

Each 512-bit message block comprises 16 32-bit words (16 × 32 = 512). During the step which processes the message in 512-bit blocks, the first 16 words of every message block are taken and used directly, each exactly as it
appears. The additional 64 words are derived by the following rule:

$M_t = (M_{t-16} \oplus M_{t-14} \oplus M_{t-8} \oplus M_{t-3}) \lll 1$

This means that if words $M_0$ through $M_{15}$ represent the first 16 words (used in the first 16 steps), then for step 17 the word $M_{16}$ is given by:

$M_{16} = (M_0 \oplus M_2 \oplus M_8 \oplus M_{13}) \lll 1$

and so on up to step 80, where $M_{79}$ is given by:

$M_{79} = (M_{63} \oplus M_{65} \oplus M_{71} \oplus M_{76}) \lll 1$

As we can see, the value of every word $M_t$ after the first 16 is obtained by XOR-ing four of the preceding words and applying a circular left shift by one bit to the result. The word expansion introduced by SHA-1 increases the interdependency between every message block and the final message digest. Together with the longer 160-bit message digest, it strengthens the one-wayness, pre-image resistance, second pre-image resistance and collision resistance of SHA-1 viewed as a cryptographic hash function [1].

3.3.3. Attacks on SHA1:

In 2005, cryptanalysts found attacks on SHA-1 suggesting that the algorithm might not be secure enough for ongoing use. NIST required many applications in federal agencies to move to SHA-2 after 2010 because of the weakness. Although no successful attacks have yet been reported on SHA-2, it is algorithmically similar to SHA-1. In 2012, following a long-running competition, NIST selected an additional algorithm, Keccak, for standardization under SHA-3.

In November 2013 Microsoft announced their deprecation policy on SHA-1, according to which Windows will stop accepting SHA-1 certificates in SSL by 2017. In September 2014 Google announced their deprecation policy on SHA-1, according to which Chrome will stop accepting SHA-1 certificates in SSL in a phased way by 2017. Mozilla is also planning to stop accepting SHA-1-based SSL certificates by 2017 [11].
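The word expansion of Section 3.3.2 is short enough to sketch directly. The code below is a minimal illustration, not a full SHA-1 implementation: the names `rotl32`, `expand_block` and `ABC_BLOCK` are hypothetical, and the sample block is the single padded block for the three-byte message "abc", built by hand following Steps 1 and 2 of Section 3.3.1.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(value: int, shift: int) -> int:
    """Circular left shift of a 32-bit word."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def expand_block(block: bytes) -> list:
    """Expand one 512-bit block into the 80 words used by the 80 steps."""
    if len(block) != 64:
        raise ValueError("a SHA-1 block is exactly 64 bytes")
    words = list(struct.unpack(">16I", block))     # first 16 words, big-endian, used as-is
    for t in range(16, 80):
        mixed = words[t - 16] ^ words[t - 14] ^ words[t - 8] ^ words[t - 3]
        words.append(rotl32(mixed, 1))             # XOR of four earlier words, rotated by 1
    return words


# Padded block for "abc": message, 1-bit, 0-bits, then the 64-bit length (24 bits).
ABC_BLOCK = b"abc" + b"\x80" + bytes(52) + struct.pack(">Q", 24)

if __name__ == "__main__":
    schedule = expand_block(ABC_BLOCK)
    for t in (0, 15, 16, 18, 19):
        print(f"M[{t}] = {schedule[t]:08x}")
```

Even for this nearly empty block, the message bytes and the length field spread into later words within a few steps, which illustrates the interdependency described in Section 3.3.2.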

4. Comparison of Cryptographic hash functions:

Table 5: Comparison of MD5 and SHA
5. Conclusion:

This paper presents cryptographic hash functions, their methods and algorithms, their applications, security attacks on these algorithms and, finally, a comparison between MD5 and the Secure Hash Algorithm family.

6. References:

[1] Joseph Sterling Grah, “Hash Functions in Cryptography”, Master of Science thesis, Department of Informatics, The Faculty of Mathematics and Natural Sciences, University of Bergen, June 1, 2008.

[2] Rajeev Sobti and G. Geetha, “Cryptographic Hash Functions: A Review”, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No 2, March 2012.

[3] W. Stallings, “Cryptography and Network Security: Principles and Practice”, 5th ed., Pearson, 2011.

[4] Deven N. Shah, “Information Security, Principles and Practice”, Wiley India, 2011.

[5] Bart Van Rompay, “Analysis and Design of Cryptographic Hash Functions, MAC Algorithms and Block Ciphers”, Ph.D. thesis, Dept. of ELECTRO-ESAT, University of Leuven.

[6] “The Merkle-Damgard Construction” [online] http://en.wikipedia.org/wiki/Merkle-Damgard_construction. [Accessed: 12 Nov 2014].

[7] Xiaoyun Wang and Hongbo Yu, “How to Break MD5 and Other Hash Functions”, EUROCRYPT (Ronald Cramer, ed.), Lecture Notes in Computer Science, vol. 3494, Springer, 2005, pp. 19–35.

[8] Vlastimil Klíma, “Finding MD5 Collisions – a Toy for a Notebook”, Prague, Czech Republic, 5 March 2005.

[9] Marc Martinus Jacobus Stevens, “Attacks on Hash Functions and Applications”, Ph.D. thesis, University of Leiden, 2012.

[10] Secure Hash Standard (SHS), FIPS PUB 180-4, March 2012.

[11] “SHA-1” [online] http://en.wikipedia.org/wiki/SHA1. [Accessed: 12 Nov 2014].
