Hash Functions: From: Chapter 5 (Book 1) by Dr. Shashikala

The document discusses hash functions, which are mathematical functions that map digital data of arbitrary size to compressed values of a fixed size. It describes the requirements of cryptographic hash functions including compression, efficiency, one-wayness, and collision resistance. It also covers specific hash functions like MD5, SHA-1, Tiger, and the use of hashes for digital signatures.

Hash Functions

From : Chapter 5 (Book 1)


By
Dr. Shashikala
Objective
• Introduction
• Cryptographic Hash Functions
• A Birthday Attack
• Non-Cryptographic Hashes
• Tiger Hash
• HMAC
• Uses of Hash Functions
• Miscellaneous
Introduction
• Hash functions are extremely useful and appear
in almost all information security applications.
• A hash function is a mathematical function that
converts a numerical input value into another
compressed numerical value.
• The input to the hash function is of arbitrary
length but output is always of fixed length.
• Values returned by a hash function are
called message digests or simply hash values.
• [Figure: a hash function mapping an arbitrary-length input to a fixed-length hash value]
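A minimal Python sketch of the fixed-length property. SHA-256 from the standard hashlib module is used here only as a stand-in hash; the slides do not single out a particular function at this point, and the helper name digest_hex is illustrative.

```python
import hashlib

def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = digest_hex(b"a")
long_digest = digest_hex(b"a" * 1_000_000)

# Inputs of very different lengths give digests of the same length
# (256 bits = 64 hex characters).
print(len(short_digest), len(long_digest))
```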
Requirements of Hash Functions (1)
• Compression:
For any size of input x, the output length of y = h(x) is small.
That is, the output is of fixed size, regardless of the length
of the input.
• Efficiency:
It must be easy to compute h(x) for any input x. The computational
effort required to compute h(x) will grow with the length of x, but it
cannot grow too fast.
• One-way:
Given any value y, it is computationally infeasible to find a value x such
that h(x) = y. Another way to say this is that there is no feasible way to
invert the hash.
Requirements of Hash Functions (2)
• Weak collision resistance:
Given x and h(x), it is infeasible to find any y,
with y ≠ x, such that h(y) = h(x).
Another way to state this requirement is that it is not feasible to modify a
message without changing its hash value.
• Strong collision resistance:
It is infeasible to find any x and y, with x ≠ y,
such that h(x) = h(y).
That is, we cannot find any two inputs that hash to the
same output.
Cryptographic Hash Function

• Alice signs a message M by using her private key to "encrypt" it; that is, she
computes S = [M]Alice. If Alice sends M and S to Bob, then Bob can verify the
signature by checking that M = {S}Alice. However, if M is large, [M]Alice is costly to
compute.
• Given a cryptographic hash function h, Alice can instead sign M by first hashing M and then signing
the hash; that is, Alice computes S = [h(M)]Alice. Hashes are efficient
(comparable to block cipher algorithms), and only a small number of bits need
to be signed.
• Alice then sends M and S to Bob, as illustrated in the figure. Bob verifies the
signature by hashing M and comparing the result to the value obtained when
Alice's public key is applied to S. That is, Bob verifies that h(M) = {S}Alice.
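The hash-then-sign flow above can be sketched in Python. This is an illustration under loud assumptions: it uses textbook RSA with a toy key built from two small Mersenne primes and no padding, and it reduces the SHA-256 value modulo N so that it fits. None of these choices come from the slides, and none are suitable for real use.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Far too small for real use.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def h(message: bytes) -> int:
    """Hash the message and reduce it so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Alice: S = [h(M)]Alice, i.e. apply the private key to the hash."""
    return pow(h(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Bob: check that h(M) = {S}Alice, i.e. apply the public key to S."""
    return pow(signature, E, N) == h(message)

M = b"Pay Bob 10 dollars"
S = sign(M)
print(verify(M, S))                        # True
print(verify(b"Pay Bob 1000 dollars", S))  # fails unless the reduced hashes collide
```

Only the short hash value goes through the expensive private-key operation, however long M is.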
Birthday Paradox
• How many people must be in a room to
make it 100% certain that at least two
people in the room have the same birthday?
• Answer: 367 (since there are 366 possible
birthdays, including February 29).
• How many people must be in a room to
make the probability 50% that at least two
people in the room have the same birthday?
• Answer: 23
The number is surprisingly low. In fact,
we need only 70 people to make the
probability 99.9%.
• What is the probability that two persons
among n have the same birthday?
Let P(same) be the probability that at least two
people in a room of n have the same birthday.
P(same) can be easily evaluated in terms of
P(different), where P(different) is the
probability that all of them have different
birthdays.
• P(same) = 1 – P(different)
• P(different) can be written as 1 x (364/365) x
(363/365) x (362/365) x … x (1 – (n-1)/365)
• How did we get the above expression?
For all birthdays to be distinct, persons from first
to last can get birthdays in the following order:
The first person can have any birthday among 365.
The second person should have a birthday which is
not the same as the first person's.
The third person should have a birthday which is
not the same as that of the first two persons.
…
The nth person should have a birthday which is not
the same as that of any of the earlier (n-1) persons.
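• Approximation of the above expression
The above expression can be approximated using a Taylor series.
• The expansion $e^x = 1 + x + \frac{x^2}{2!} + \dots$ provides a first-order approximation for $e^x$ for $x \ll 1$:
$e^x \approx 1 + x$
• To apply this approximation to the expression derived for
P(different), set $x = -a/365$. Thus,
$e^{-a/365} \approx 1 - \frac{a}{365}$
• The expression derived for P(different) can be written as
$1 \times \left(1 - \frac{1}{365}\right) \times \left(1 - \frac{2}{365}\right) \times \left(1 - \frac{3}{365}\right) \times \dots \times \left(1 - \frac{n-1}{365}\right)$
• By putting the value of $1 - a/365$ as $e^{-a/365}$, we
get the following:
$P(\text{different}) \approx 1 \times e^{-1/365} \times e^{-2/365} \times \dots \times e^{-(n-1)/365} = e^{-(1 + 2 + \dots + (n-1))/365} = e^{-n(n-1)/(2 \times 365)}$
$P(\text{same}) = 1 - P(\text{different}) \approx 1 - e^{-n(n-1)/(2 \times 365)} \approx 1 - e^{-n^2/(2 \times 365)}$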
Using the approximation P(same) ≈ 1 − e^(−n²/(2 × 365)), we can estimate the number
of people needed for a given probability. Setting P(same) = p and solving for n
gives n ≈ sqrt(2 × 365 × ln(1/(1 − p))); rounding up gives, approximately, the
smallest n for which the probability is greater than the given p.
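The original function is not reproduced in this text. A minimal Python sketch of the same idea follows, assuming the approximation P(same) ≈ 1 − e^(−n²/(2 × 365)) solved for n. The name find() follows the text; find_exact() is an added helper that uses the exact product for P(different) as a cross-check.

```python
import math

DAYS = 365

def find(p: float) -> int:
    """Approximate smallest n with P(same) > p, from 1 - exp(-n^2 / (2*365))."""
    return math.ceil(math.sqrt(2 * DAYS * math.log(1 / (1 - p))))

def find_exact(p: float) -> int:
    """Smallest n with P(same) > p, using the exact product for P(different)."""
    p_different = 1.0
    n = 0
    while 1 - p_different <= p:
        p_different *= (DAYS - n) / DAYS  # person n+1 must avoid n taken days
        n += 1
    return n

print(find(0.5), find_exact(0.5))  # prints "23 23"
```

For probabilities close to 1 the approximation drifts slightly from the exact count, so find_exact() is the safer choice when the precise threshold matters.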
A Birthday Attack
• If M is the message that Alice wants to sign,
then she computes S = [h(M)]Alice and sends S
and M to Bob.
• The attacker selects an "evil" message E that she
wants Alice to sign, but which Alice is
unwilling to sign.
• The attacker also creates an innocent message I
that she is confident Alice is willing to sign.
• The attacker generates 2^(n/2) variants of the innocent message, where n is the
number of bits in the hash output.
• These innocent messages, which we denote Ii, all have the same
meaning as I, but since the messages differ, their hash values differ.
• The attacker likewise creates 2^(n/2) variants of the evil message, denoted
Ei. These all have the same meaning as E, but their hashes differ.
• By the birthday problem, the attacker can expect to find a collision,
h(Ej) = h(Ik).
• Given such a collision, the attacker sends Ik to Alice and asks Alice to
sign it.
• Alice signs it and returns Ik and [h(Ik)]Alice to the attacker.
• Since h(Ej) = h(Ik), it follows that [h(Ej)]Alice = [h(Ik)]Alice. Consequently,
the attacker has obtained Alice's signature on the evil message Ej.
• To prevent this attack, choose a hash function for which n, the size
of the hash output in bits, is so large that the attacker cannot compute
2^(n/2) hashes.
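The attack above can be tried out on a deliberately weak hash. The sketch below assumes a 24-bit hash obtained by truncating SHA-256, and it creates message variants by appending invisible whitespace; the messages and the helper names are made up for illustration.

```python
import hashlib
from itertools import count

HASH_BITS = 24  # deliberately tiny so the attack finishes quickly

def h(message: str) -> int:
    """SHA-256 truncated to HASH_BITS bits (a stand-in for a weak n-bit hash)."""
    digest = hashlib.sha256(message.encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - HASH_BITS)

def variant(message: str, i: int) -> str:
    """Variant i: append invisible whitespace that encodes i in binary."""
    return message + "".join(" " if (i >> k) & 1 else "\t" for k in range(32))

def birthday_attack(innocent: str, evil: str):
    """Return (evil variant, innocent variant) with the same truncated hash."""
    # Hash about 2^(n/2) variants of the innocent message.
    table = {h(variant(innocent, k)): k for k in range(2 ** (HASH_BITS // 2))}
    # Try evil variants until one collides with a stored innocent hash.
    for j in count():
        k = table.get(h(variant(evil, j)))
        if k is not None:
            return variant(evil, j), variant(innocent, k)

evil_j, innocent_k = birthday_attack("I owe the attacker $10",
                                     "I owe the attacker $1,000,000")
print(h(evil_j) == h(innocent_k))  # True
```

With a 24-bit hash, a few thousand hashes on each side are enough. The same search against a 256-bit hash would need on the order of 2^128 hashes, which is why n must be large.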
Non-Cryptographic Hashes
• Consider data X = (X1,X2,X3,…,Xn), each Xi is a byte.
• Defining hash function
h(X) = (X1+X2+X3+…+Xn) mod 256.
• This provides compression, since any size of input is compressed to an
8-bit output.
• This hash would be easy to break, since the birthday problem tells us that
if we hash just 2^4 = 16 randomly selected inputs, we can expect to find a
collision.
• For example:
swapping two bytes will always yield a collision,
such as X = (10101010, 00001111), for which h(X) = 10111001.
If Y = (00001111, 10101010), then h(Y) = h(X).
• h(10101010,00001111) = h(00001111,10101010) = 10111001.
• Consider data X = (X1,X2,…,Xn), where each Xi is a byte.
• Suppose the hash is defined as
h(X) = (nX1 + (n−1)X2 + (n−2)X3 + … + 2⋅Xn-1 + Xn) mod 256.
• It gives different results when the byte order is
swapped.
• For example:
• h(10101010,00001111) ≠ h(00001111,10101010)
• But the birthday problem is still an issue, and it also
happens to be relatively easy to construct collisions.
• For example:
h(00000001,00001111) = h(00000000,00010001) =
00010001.
• Although this is not a secure cryptographic hash, it is useful in
a particular non-cryptographic application known
as rsync.
• A cyclic redundancy check (CRC) is the remainder in a long
division calculation. It is good for detecting burst errors,
and random errors are unlikely to yield a collision.
• CRC has been mistakenly used where a cryptographic
integrity check is required (e.g., WEP).
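The two toy hashes above are small enough to check directly. In this Python sketch the function names sum_hash and weighted_hash are illustrative, and the inputs are the byte pairs used in the examples.

```python
def sum_hash(data: bytes) -> int:
    """h(X) = (X1 + X2 + ... + Xn) mod 256."""
    return sum(data) % 256

def weighted_hash(data: bytes) -> int:
    """h(X) = (n*X1 + (n-1)*X2 + ... + 2*Xn-1 + Xn) mod 256."""
    n = len(data)
    return sum((n - i) * x for i, x in enumerate(data)) % 256

x = bytes([0b10101010, 0b00001111])
y = bytes([0b00001111, 0b10101010])

# Swapping bytes collides under the plain sum...
print(format(sum_hash(x), "08b"), format(sum_hash(y), "08b"))  # 10111001 10111001
# ...but not under the weighted sum.
print(weighted_hash(x) != weighted_hash(y))                    # True
# The weighted sum still has easy collisions.
print(format(weighted_hash(bytes([1, 15])), "08b"),
      format(weighted_hash(bytes([0, 17])), "08b"))            # 00010001 00010001
```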
Tiger Hash
• Although Tiger is not a particularly popular hash, it is a little easier to digest than
some of the big-name hashes.
• The two most popular cryptographic hashes of today are MD5 and SHA-1.
• The most popular hash in the world was undoubtedly MD5.
• The "MD" in MD5 is an abbreviation for message digest.
• Believe it or not, MD5 is the successor to MD4, which itself was the
successor to MD2.
• The earlier MDs are no longer considered secure, because
collisions have been found.
• In fact, MD5 collisions are easy to find: you can generate one in a few
seconds on a PC.
• All of the MDs were invented by crypto guru Ron Rivest.
• MD5 produces a 128-bit output.
• The other contender for title of world's most popular
hash function is SHA-1 which is a U.S. government
standard.
• Being a government standard, SHA is, of course, a clever
3-letter acronym—SHA stands for Secure Hash
Algorithm.
• You might ask, why is it SHA-1 instead of just SHA?
– In fact, there was a SHA (now known as SHA-0), but it
apparently had a minor flaw, as SHA-1 came quickly on the
heels of SHA, with some minor modifications but without
explanation.
Basis for comparison                                          MD5               SHA-1
Stands for                                                    Message Digest    Secure Hash Algorithm
Length of message digest                                      128 bits          160 bits
Operations to discern the original message                    2^128             2^160
Operations to find two messages with the same message digest  2^64              2^80
Security                                                      Poor              Moderate
Speed                                                         Fast              Slow

• A hash function is considered secure provided
no collisions have been found.
• As with block ciphers, efficiency is also a major
concern in the design of hash functions.
• If, for example, it's more costly to compute the
hash of M than to sign M, the hash function is
not very useful, at least for digital signatures.
• A desirable property of any cryptographic hash function is
the so-called avalanche effect.
• The goal is that any small change in the input should
cascade and cause a large change in the output—just like an
avalanche.
• Ideally, any change in the input will result in output values
that are uncorrelated, and an attacker will then be forced to
conduct an exhaustive search for collisions.
• The avalanche effect should occur after a few rounds, yet we
would like the rounds to be as simple and efficient as
possible.
• In a sense, the designers of hash functions face similar
trade-offs as the designers of iterated block ciphers.
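The avalanche effect is easy to observe. The Python sketch below flips a single input bit and counts how many output bits change, using SHA-256 from hashlib as a convenient stand-in because Tiger is not in the standard library; the test message is arbitrary.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

message = bytearray(b"The quick brown fox")
flipped = bytearray(message)
flipped[0] ^= 0x01  # flip a single input bit

d1 = hashlib.sha256(message).digest()
d2 = hashlib.sha256(flipped).digest()
changed_bits = bit_difference(d1, d2)

# For a good hash, roughly half of the 256 output bits should change.
print(changed_bits, "of 256 output bits changed")
```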
• Tiger, which was developed by Ross Anderson and
Eli Biham, seems to have a more structured
design than SHA-1 or MD5.
• In fact, Tiger can be given in a form that looks very
similar to a block cipher.
• Tiger was designed to be "fast and strong" and
hence the name.
• It was also designed for optimal performance on
64-bit processors and it can serve as a
replacement for MD5, SHA-1, or any other hash
with an equal or smaller output.
• The input to Tiger is divided into 512-bit blocks, with
the input padded to a multiple of 512 bits, if necessary.
• Unlike MD5 or SHA-1, the output of Tiger is 192 bits.
The numerology behind the choice of 192 is that Tiger
is designed for 64-bit processors and 192 bits is exactly
three 64-bit words.
• In Tiger, all intermediate steps also consist of 192 bit
values.
• Tiger's block cipher influence can be seen in the fact
that it employs four S-boxes, each of which maps 8 bits
to 64 bits.
• Tiger also employs a "key schedule" algorithm that,
since there is no key, is applied to the input block, as
described below.
X = (X0,X1,…,Xn-1)
where each Xi is 512 bits.
Tiger Hash
• Tiger operates on 512-bit blocks
• Each block is broken into 8, 64-bit words
• Tiger returns 3 (or 2) 64-bit words
Overview Diagram
[Diagram: Message + Pad + Length → 512-bit blocks → 64-bit words → 192-bit hash]
Stages
• Save ABC
• Pass 1, Key Schedule 1
• Pass 2, Key Schedule 2
• Pass 3
• feed-forward ABC
Save ABC
• a, b, c are initialized with special values.
• At the beginning of each successive round, a, b, c
are saved for later use in the feed-forward step.

• 64_bit_word aa = a, bb = b, cc = c;
Pass 1, detail
• 1 pass = 8 rounds, 1 for each 64-bit word
• 64-bit words (keys) referred to as x0 - x7

• Each pass uses a multiplier (5, 7, 9) to
redistribute bits between S-box lookups.
• round(a,b,c,x0, mul);
Round function
• round(a,b,c,x, multiplier):

c = c ^ x
a = a - (s1[c1] ^ s2[c3] ^ s3[c5] ^ s4[c7])
b = b + (s4[c2] ^ s3[c4] ^ s2[c6] ^ s1[c8])
b = b * multiplier

• ^ denotes XOR; c1, …, c8 are the eight bytes of c
S-Boxes
• The S-boxes compose a non-linear function.
• Each maps from 8 bits into 64 bits.
• Available on the author’s site:
http://www.cs.technion.ac.il/~biham/
Key Schedule
• The key schedule redistributes the input bits.
• It introduces further algorithm complexity.
• It rotates words within the block.
• Each block is only looked at 3 times in the
passes; the key schedule further distributes the bits.
Feed-Forward
• Combines the new a, b, c with the values aa, bb, cc saved at the start of the round

• a = a ^ aa ;
• b = b - bb ;
• c = c + cc ;
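The round function and the feed-forward step above can be sketched in Python. Two loud assumptions: the S-boxes here are pseudo-random stand-in tables, not Tiger's real S-boxes, and byte c1 is taken to be the least significant byte of c. The key schedule and the pass structure are omitted, so this does not compute real Tiger hashes.

```python
import random

MASK64 = (1 << 64) - 1

# Stand-in S-boxes: four tables of 256 pseudo-random 64-bit words.
# These are NOT Tiger's real S-boxes; they only make the sketch runnable.
_rng = random.Random(0)
S1, S2, S3, S4 = ([_rng.getrandbits(64) for _ in range(256)] for _ in range(4))

def byte(c: int, i: int) -> int:
    """Byte i (1..8) of the 64-bit word c; byte 1 is assumed least significant."""
    return (c >> (8 * (i - 1))) & 0xFF

def tiger_round(a: int, b: int, c: int, x: int, multiplier: int):
    """One round as on the slide; all arithmetic is modulo 2^64."""
    c ^= x
    a = (a - (S1[byte(c, 1)] ^ S2[byte(c, 3)] ^ S3[byte(c, 5)] ^ S4[byte(c, 7)])) & MASK64
    b = (b + (S4[byte(c, 2)] ^ S3[byte(c, 4)] ^ S2[byte(c, 6)] ^ S1[byte(c, 8)])) & MASK64
    b = (b * multiplier) & MASK64
    return a, b, c

def feed_forward(a: int, b: int, c: int, aa: int, bb: int, cc: int):
    """Combine the new (a, b, c) with the saved (aa, bb, cc)."""
    return a ^ aa, (b - bb) & MASK64, (c + cc) & MASK64

a, b, c = tiger_round(1, 2, 3, 0x0123456789ABCDEF, 5)
print(hex(a), hex(b), hex(c))
```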
Tiger Outer Round:
• The input X is padded to a multiple of 512 bits
and written as
• X = (X0,X1,…,Xn-1)
– Tiger employs one outer round for each Xi.
– The initial (a,b,c) are constants.
– The final (a,b,c) output from one round is the
initial triple for the subsequent round, and the
final (a,b,c) from the final round
is the 192-bit hash value.
• In the outer round, the input to
F5 is (a,b,c).
• Labeling the output of F5 as
(a,b,c), the input to F7 is
(c,a,b); likewise, labeling the output of F7 as
(a,b,c), the input to F9 is
(b,c,a).
• Each function Fm
consists of eight inner
rounds.
Tiger Inner Rounds
• Each Fm consists of precisely 8 inner rounds.
• 512 bit input W to Fm
– W=(w0,w1,…,w7)
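– W is one of the input blocks Xi
– All lines in the diagram are 64 bits
• The input values for fm,i, for i=0,1,2,…,7, are (a,b,c), (b,c,a), (c,a,b), (a,b,c), (b,c,a), (c,a,b), (a,b,c), (b,c,a), respectively, where the output of fm,i-1 is labeled (a,b,c)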
Tiger Hash: One Round
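• Each fm,i is a function of a, b, c, wi and m
– The input values of a, b, c come from the previous round.
– wi is a 64-bit word of the 512-bit block W.
– The subscript m is the multiplier.
– c = (c0,c1,…,c7), where each ci is a single byte.
• Output of fm,i is
c = c ⊕ wi
a = a − (S0[c0] ⊕ S1[c2] ⊕ S2[c4] ⊕ S3[c6])
b = b + (S3[c1] ⊕ S2[c3] ⊕ S1[c5] ⊕ S0[c7])
b = b ⋅ m
• Each Si is an S-box (i.e., a lookup table): 8 bits mapped to 64 bits.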


Tiger Hash Key Schedule
• Input is X
– X=(x0,x1,…,x7)
• A small change in X will produce a large change in the key schedule output.
Summary
• Hash and intermediate values are 192 bits.
• 24 (inner) rounds:
– S-boxes: Claimed that each input bit affects a, b and c after 3 rounds.
– Key schedule: A small change in the message affects many bits of the intermediate
hash values.
– Multiply: Designed to ensure that the input to an S-box in one round is mixed into
many S-boxes in the next.
• S-boxes, key schedule and multiply together designed to ensure
strong avalanche effect.
• Note: A desirable property of any cryptographic hash function is
the so-called avalanche effect. The goal is that any small change in
the input should cascade and cause a large change in the output.
• At a higher level, Tiger employs
– Confusion
– Diffusion
HMAC
• For message integrity we can compute a
message authentication code, or MAC, where
the MAC is computed using a block cipher in
cipher block chaining (CBC) mode.
• The MAC is the final encrypted block, which is
also known as the CBC residue.
• Since a hash function effectively gives us a
fingerprint of a file, we should also be able to
use a hash to verify message integrity.
Motivation
• Suppose Alice protects the integrity of M by simply
computing h(M) and sending both M and h(M) to Bob.
• If M changes, Bob will detect the change, provided that
h(M) has not changed (and vice versa).
• However, if an attacker replaces M with M' and also
replaces h(M) with h(M'), then Bob will have no way to
detect the tampering.
• So using a hash function to provide integrity
protection requires a key, to prevent the attacker from
changing the hash value.
Approach
• Alice could encrypt the hash value with a symmetric cipher,
E(h(M),K), and send this to Bob.
• A slightly different approach is used to compute a
hashed MAC, or HMAC.
• Instead of encrypting the hash, directly mix the key
into M when computing the hash.
• Two approaches are to prepend the key to the
message, or append the key to the message:
– h(K,M)
– h(M,K)
h(K,M)
• If h(K,M) is used to compute an HMAC, consider that cryptographic hashes
process the message in blocks; for MD5, SHA-1, and Tiger, the block size is 512
bits.
• As a result, if M = (B1, B2), where each Bi is 512 bits, then
– h(M) = F(F(A, B1), B2) = F(h(B1), B2) ............ equation (1)
• for some function F, where A is a fixed initial constant.
• For example, in the Tiger hash, the function F consists of the outer round, with each
Bi corresponding to a 512-bit block of input and A corresponding to the
192-bit initial value (a,b,c).
• If an attacker chooses M' so that M' = (M,X), the attacker might be able to use
equation (1) to find h(K,M') from h(K,M) without knowing K since, for K, M,
and X of the appropriate size,
h(K, M') = h(K, M, X) = F(h(K, M), X)
where the function F is known.
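The extension step can be demonstrated on a toy iterated hash. In the Python sketch below, F, A, and the one-block K, M, X are all illustrative assumptions: SHA-256 serves only as a convenient mixing function for F, and there is no padding. Real hashes append length padding, which complicates the attack but, for hashes of this iterated form, does not by itself prevent it.

```python
import hashlib

BLOCK = 64       # block size in bytes (512 bits)
A = bytes(32)    # fixed initial constant (all zeros here, chosen arbitrarily)

def F(state: bytes, block: bytes) -> bytes:
    """Toy compression function; SHA-256 is used only as a convenient mixer."""
    return hashlib.sha256(state + block).digest()

def h(*blocks: bytes) -> bytes:
    """Toy iterated hash: h(B1, B2, ...) = F(...F(F(A, B1), B2)...), no padding."""
    state = A
    for block in blocks:
        assert len(block) == BLOCK
        state = F(state, block)
    return state

K = b"k" * BLOCK   # secret key, unknown to the attacker
M = b"m" * BLOCK   # original message
X = b"x" * BLOCK   # attacker's extension

tag = h(K, M)        # the value the attacker observes
forged = F(tag, X)   # computed without knowing K
print(forged == h(K, M, X))  # True
```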
h(M,K)
• If there is a known
collision for the hash function h, that is, if there
exists some M' with h(M') = h(M), then by
equation (1),
– h(M, K) = F(h(M), K) = F(h(M'), K) = h(M', K)
• provided that M and M' are each a multiple of
the block size.
• Conclusion: If such a collision exists, the hash
function is considered insecure.
HMAC
• Can compute a MAC of the message M with key K
using a “hashed MAC” or HMAC
• HMAC is a keyed hash
– Why would we need a key?
• How to compute HMAC?
• Two obvious choices: h(K,M) and h(M,K)
• Which is better?
HMAC
• Should we compute HMAC as h(K,M) ?
• Hashes computed in blocks
– h(B1,B2) = F(F(A,B1),B2) for some F and constant A
– Then h(B1,B2) = F(h(B1),B2)
• Let M’ = (M,X)
– Then h(K,M’) = F(h(K,M),X)
– Attacker can compute HMAC of M’ without K
• Is h(M,K) better?
– Yes, but… if h(M’) = h(M) then we might have
h(M,K)=F(h(M),K)=F(h(M’),K)=h(M’,K)
The Right Way to HMAC
• Described in RFC 2104
• Let B be the block length of the hash, in bytes
– B = 64 for MD5, SHA-1 and Tiger
• ipad = 0x36 repeated B times
• opad = 0x5C repeated B times
• K+ is K padded with zero bytes so that the result
is B bytes in length (RFC 2104 appends the zeros to the end of K)
• Then
HMAC(M,K) = h(K+ ⊕ opad, h(K+ ⊕ ipad, M))
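A minimal Python sketch of the construction, checked against the standard library's hmac module. The function name hmac_manual, the default choice of SHA-1, and the test key and message are illustrative; the step that hashes keys longer than B bytes comes from RFC 2104 and is not shown on the slide.

```python
import hashlib
import hmac

def hmac_manual(key: bytes, message: bytes, hash_name: str = "sha1") -> bytes:
    """HMAC as in RFC 2104: h(K+ xor opad, h(K+ xor ipad, M))."""
    block_len = hashlib.new(hash_name).block_size   # B = 64 for MD5 and SHA-1
    if len(key) > block_len:                        # RFC 2104: hash over-long keys first
        key = hashlib.new(hash_name, key).digest()
    k_plus = key.ljust(block_len, b"\x00")          # pad with zero bytes to B bytes
    k_ipad = bytes(k ^ 0x36 for k in k_plus)
    k_opad = bytes(k ^ 0x5C for k in k_plus)
    inner = hashlib.new(hash_name, k_ipad + message).digest()
    return hashlib.new(hash_name, k_opad + inner).digest()

key, message = b"secret key", b"attack at dawn"
print(hmac_manual(key, message) == hmac.new(key, message, hashlib.sha1).digest())  # True
```

In practice the library routine is preferable, together with hmac.compare_digest for comparing tags.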
HMAC Block Diagram
Hash Uses

• Authentication (HMAC)
• Message integrity (HMAC)
• Message fingerprint
• Data corruption detection
• Digital signature efficiency
• Anything you can do with symmetric crypto
• Also, many, many clever/surprising uses…
Online Bids
• Suppose Alice, Bob and Charlie are bidders
• Alice plans to bid A, Bob B and Charlie C
• They don’t trust that bids will stay secret
• A possible solution?
– Alice, Bob, Charlie submit hashes h(A), h(B), h(C)
– All hashes received and posted online
– Then bids A, B, and C submitted and revealed
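• Advantages of this scheme
– Hashes don’t reveal bids (one-way property)
– Can’t change a bid after the hash is sent (collision resistance)
• Limitation of this scheme
– It is subject to a forward search attack: if the set of plausible bids is
small, an attacker can hash each candidate and compare with the posted hashes.
– Fortunately, there is an easy fix that will prevent a
forward search, with no cryptographic keys
required.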
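The forward search, and one possible fix, can be sketched in Python. The slides do not spell out the fix; the sketch assumes a common one, which is to hash the bid together with a random value R that is revealed along with the bid. The bid amounts, the encoding, and the function names are illustrative.

```python
import hashlib
import secrets

def commit(bid: int, nonce: bytes = b"") -> str:
    """Hash of a bid, optionally mixed with a random nonce R."""
    return hashlib.sha256(nonce + str(bid).encode()).hexdigest()

def forward_search(commitment: str, max_bid: int):
    """Recover an unrandomized bid by hashing every plausible value."""
    for guess in range(max_bid + 1):
        if commit(guess) == commitment:
            return guess
    return None

# Plain h(A): anyone can recover Alice's bid by trying all plausible values.
print(forward_search(commit(4200), 10_000))   # 4200

# Randomized h(R, A): the same search fails; Alice later reveals both A and R.
nonce = secrets.token_bytes(16)
randomized = commit(4200, nonce)
print(forward_search(randomized, 10_000))     # None, barring a hash collision
print(commit(4200, nonce) == randomized)      # True: the revealed (A, R) checks out
```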
Spam Reduction
• Spam reduction
• Before accepting email, receiver requires proof
that sender spent effort to create email
– Here, effort == CPU cycles
• Goal is to limit the amount of email that can be
sent
– This approach will not eliminate spam
– Instead, make spam more costly to send
Spam Reduction
• Let M = email message
R = value to be determined
T = current time
• Sender must find R so that
h(M,R,T) = (00…0,X), where
N initial bits of hash value are all zero
• Sender then sends (M,R,T)
• Recipient accepts email, provided that…
h(M,R,T) begins with N zeros
Spam Reduction
• Sender: h(M,R,T) begins with N zeros
• Recipient: verify that h(M,R,T) begins with N zeros
• Work for sender: about 2^N hashes
• Work for recipient: always 1 hash
• Sender’s work increases exponentially in N
• Small work for recipient regardless of N
• Choose N so that…
– Work acceptable for normal email users
– Work is too high for spammers
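A minimal Python sketch of the scheme. The byte encoding of (M, R, T), the use of SHA-256, and the small value N = 12 are assumptions chosen so that the example runs quickly.

```python
import hashlib
from itertools import count

def h(message: bytes, r: int, t: int) -> int:
    """h(M, R, T) as a 256-bit integer; the byte encoding is an assumption."""
    data = message + b"|" + str(r).encode() + b"|" + str(t).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def accept(message: bytes, r: int, t: int, n: int) -> bool:
    """Recipient: one hash, then check that the first n bits are zero."""
    return h(message, r, t) >> (256 - n) == 0

def find_r(message: bytes, t: int, n: int) -> int:
    """Sender: try R = 0, 1, 2, ... until the hash starts with n zero bits."""
    for r in count():
        if accept(message, r, t, n):
            return r

M, T, N = b"Hello Bob", 1_700_000_000, 12
R = find_r(M, T, N)          # about 2^N hashes on average
print(R, accept(M, R, T, N))
```

Each extra bit of N doubles the sender's expected work, while the recipient still computes a single hash.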
Secret Sharing
Shamir’s Secret Sharing
• Two points determine a line
• Give (X0,Y0) to Alice
• Give (X1,Y1) to Bob
• Then Alice and Bob must cooperate to find secret S
• Also works in discrete case
• Easy to make an “m out of n” scheme for any m ≤ n, where n is
the number of participants, any m
of which can cooperate to recover
the secret
[Figure: “2 out of 2” scheme. The line through (X0,Y0) and (X1,Y1) crosses the Y axis at (0,S).]
Shamir’s Secret Sharing
• Give (X0,Y0) to Alice
• Give (X1,Y1) to Bob
• Give (X2,Y2) to Charlie
• Then any two can cooperate to find secret S
• But one alone can’t find secret S
• A “2 out of 3” scheme
[Figure: “2 out of 3” scheme. The points (X0,Y0), (X1,Y1) and (X2,Y2) lie on one line, which crosses the Y axis at (0,S).]
Shamir’s Secret Sharing
• Give (X0,Y0) to Alice
• Give (X1,Y1) to Bob
• Give (X2,Y2) to Charlie
• 3 points determine a parabola
• Alice, Bob, and Charlie must cooperate to find S
• A “3 out of 3” scheme
• What about “3 out of 4”?
[Figure: “3 out of 3” scheme. The parabola through (X0,Y0), (X1,Y1) and (X2,Y2) crosses the Y axis at (0,S).]
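The "m out of n" idea in the discrete case can be sketched in Python: the secret is the value at x = 0 of a random polynomial of degree m − 1 over a prime field, and any m points recover it by Lagrange interpolation. The modulus 2^127 − 1, the share x-coordinates 1..n, and the function names are assumptions made for this example.

```python
import secrets

PRIME = 2**127 - 1  # field modulus (a Mersenne prime); the secret must be smaller

def make_shares(secret: int, m: int, n: int):
    """Split secret into n points on a random degree m-1 polynomial with f(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]

    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares) -> int:
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for k, (xk, _) in enumerate(shares):
            if k != j:
                num = num * (-xk) % PRIME
                den = den * (xj - xk) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret

S = 123456789
shares = make_shares(S, m=3, n=4)   # a "3 out of 4" scheme
print(recover(shares[:3]) == S)     # True
print(recover(shares[1:]) == S)     # True
```

With only two of the shares, interpolation yields an unrelated value, so fewer than m participants learn nothing useful about S.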
Secret Sharing Example
• Key escrow: suppose it’s required that your key be
stored somewhere
• Key can be “recovered” with court order
• But you don’t trust FBI to store your keys
• We can use secret sharing
– Say, three different government agencies
– Two must cooperate to recover the key
Secret Sharing Example
• Your symmetric key is K
• Point (X0,Y0) to FBI
• Point (X1,Y1) to DoJ
• Point (X2,Y2) to DoC
• To recover your key K, two of the three agencies must cooperate
• No one agency can get K
[Figure: the three points lie on a line that crosses the Y axis at (0,K).]
Visual Cryptography
• Another form of secret sharing…
• Alice and Bob “share” an image
• Both must cooperate to reveal the image
• Nobody can learn anything about image from Alice’s
share or Bob’s share
– That is, both shares are required
• Is this possible?
Visual Cryptography
• How to share a pixel?
• Suppose the image is black and white
• Then each pixel is either black or white
• We split pixels as shown
[Figure: the patterns a, b, c and d used to split one pixel into Alice's and Bob's shares]
Sharing a B&W Image
• If pixel is white, randomly choose a or b for
Alice’s/Bob’s shares
• If pixel is black, randomly choose c or d
• No information in one “share”
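The figure with patterns a to d is not reproduced here, so the Python sketch below assumes the usual 2-out-of-2 construction: each pixel becomes two subpixels, a white pixel gives Alice and Bob the same random pattern, and a black pixel gives them complementary patterns. The pattern encoding and function names are illustrative.

```python
import secrets

PATTERNS = [(1, 0), (0, 1)]  # 1 = black subpixel, 0 = transparent subpixel

def share_pixel(is_black: bool):
    """Return (Alice's subpixels, Bob's subpixels) for one image pixel."""
    alice = secrets.choice(PATTERNS)
    bob = tuple(1 - s for s in alice) if is_black else alice
    return alice, bob

def overlay(alice, bob):
    """Stack two transparencies: a subpixel is black if black in either share."""
    return tuple(a | b for a, b in zip(alice, bob))

image = [True, False, True, True, False]   # True = black pixel
pairs = [share_pixel(p) for p in image]
stacked = [overlay(a, b) for a, b in pairs]

# Black pixels overlay to (1, 1); white pixels keep one transparent subpixel.
print(stacked)
```

Each share on its own is a uniformly random pattern whatever the pixel colour, which is why a single share carries no information.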
Visual Crypto Example
[Figure: Alice’s share, Bob’s share, and the overlaid shares]
Visual Crypto
• How does visual “crypto” compare to regular
crypto?
• In visual crypto, no key…
– Or, maybe both images are the key?
• With encryption, exhaustive search
– Except for a one-time pad
• Exhaustive search on visual crypto?
– No exhaustive search is possible!
Visual Crypto
• Visual crypto: no exhaustive search…
• How does visual crypto compare to crypto?
– Visual crypto is “information theoretically” secure; this is also true
of other secret sharing schemes
– With regular encryption, the goal is to make cryptanalysis
computationally infeasible
• Visual crypto is an example of secret sharing
– Not really a form of crypto, in the usual sense
Random Numbers in
Cryptography
Random Numbers
• Random numbers used to generate keys
– Symmetric keys
– RSA: Prime numbers
– Diffie Hellman: secret values
• Random numbers used for nonces
– Sometimes a sequence is OK
– But sometimes nonces must be random
• Random numbers also used in simulations, statistics, etc.
– Such numbers need to be “statistically” random
Random Numbers
• Cryptographic random numbers must be
statistically random and unpredictable
• Suppose server generates symmetric keys…
– Alice: KA
– Bob: KB
– Charlie: KC
– Dave: KD
• But, Alice, Bob, and Charlie don’t like Dave
• Alice, Bob, and Charlie working together must not
be able to determine KD
Non-random Random Numbers
• Online version of Texas Hold ‘em Poker
– ASF Software, Inc.
• Random numbers used to shuffle the deck
• Program did not produce a random shuffle
• A serious problem or not?
Card Shuffle
• There are 52! > 2^225 possible shuffles
• The poker program used a “random” 32-bit integer to
determine the shuffle
– So, only 2^32 distinct shuffles could occur
• Code used Pascal pseudo-random number
generator (PRNG): Randomize()
• Seed value for PRNG was a function of the number of
milliseconds since midnight
• Less than 2^27 milliseconds in a day
– So, less than 2^27 possible shuffles
Card Shuffle
• Seed based on milliseconds since midnight
• PRNG re-seeded with each shuffle
• By synchronizing clock with server, number of
shuffles that need to be tested ≈ 2^18
• Could then test all 2^18 in real time
– Test each possible shuffle against “up” cards
• Attacker knows every card after the first of five
rounds of betting!
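The attack can be imitated in Python. This is an illustration, not a reconstruction of the original program: Python's random module stands in for the Pascal PRNG, the up cards are simply taken to be the first five cards of the deck, and the clock window is kept small so that the search is instant.

```python
import random

def shuffle_deck(seed: int) -> list:
    """Shuffle a 52-card deck with a PRNG re-seeded from the clock value."""
    deck = list(range(52))
    random.Random(seed).shuffle(deck)
    return deck

def candidate_decks(up_cards: list, clock_guess: int, window: int) -> list:
    """Try every seed within +/- window ms of the attacker's clock estimate."""
    matches = []
    for seed in range(clock_guess - window, clock_guess + window + 1):
        deck = shuffle_deck(seed)
        if deck[:len(up_cards)] == up_cards:
            matches.append(deck)
    return matches

true_seed = 43_200_123                 # server's milliseconds since midnight
server_deck = shuffle_deck(true_seed)
up_cards = server_deck[:5]             # the cards the attacker can see
matches = candidate_decks(up_cards, clock_guess=43_200_000, window=2_000)
print(server_deck in matches, len(matches))
```

Any deck in matches agrees with every visible card; with a few cards known, the list almost always contains only the server's deck.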
Poker Example
• Poker program is an extreme example
– But common PRNGs are predictable
– Only a question of how many outputs must be observed
before determining the sequence
• Crypto random sequences not predictable
– For example, keystream from RC4 cipher
– But “seed” (or key) selection is still an issue!
• How to generate initial random values?
– Keys (and, in some cases, seed values)
What is Random?
• True “randomness” hard to define
• Entropy is a measure of randomness
• Good sources of “true” randomness
– Radioactive decay: but radioactive computers are
not too popular
– Hardware devices: many good ones on the
market
– Lava lamp: relies on chaotic behavior
Randomness
• Sources of randomness via software
– Software is (hopefully) deterministic
– So must rely on external “random” events
– Mouse movements, keyboard dynamics, network
activity, etc., etc.
• Can get quality random bits by such methods
• But quantity of bits is very limited
• Bottom line: “The use of pseudo-random processes
to generate secret quantities can result in pseudo-
security”
Information Hiding
Information Hiding
• Digital Watermarks
– Example: Add “invisible” identifier to data
– Defense against music or software piracy
• Steganography
– “Secret” communication channel
– Similar to a covert channel (more on this later)
– Example: Hide data in image or music file
Watermark
• Add a “mark” to data
• Visibility of watermarks
– Invisible: Watermark is not obvious
– Visible: Such as TOP SECRET
• Robustness of watermarks
– Robust: Readable even if attacked
– Fragile: Damaged if attacked
Watermark Examples
• Add robust invisible mark to digital music
– If pirated music appears on Internet, can trace it back
to original source of the leak
• Add fragile invisible mark to audio file
– If watermark is unreadable, recipient knows that audio
has been tampered (integrity)
• Combinations of several types are sometimes
used
– E.g., visible plus robust invisible watermarks
Watermark Example (1)

• Non-digital watermark: U.S. currency
• Image embedded in paper on right-hand side
– Hold bill to light to see embedded info
Watermark Example (2)
• Add invisible watermark to photo
• Claimed that 1 square inch contains enough info to
reconstruct the entire photo
• If photo is damaged, watermark can be used
to reconstruct it!
Steganography
• According to Herodotus (Greece 440 BC)
– Shaved slave’s head
– Wrote message on head
– Let hair grow back
– Sent slave to deliver message
– Shaved slave’s head to expose message: a warning of
Persian invasion
• Historically, steganography used more often than
cryptography
Images and Steganography
• Images use 24 bits for color: RGB
– 8 bits for red, 8 for green, 8 for blue
• For example
– 0x7E 0x52 0x90 and 0xFE 0x52 0x90 differ only in the high-order
bit of red, and are visibly different colors
• While
– 0xAB 0x33 0xF0 and 0xAB 0x33 0xF1 differ only in the low-order
bit of blue, and look the same
• Low-order bits don’t matter…
Images and Stego

• Given an uncompressed image file…
– For example, BMP format
• …we can insert information into low-order RGB bits
• Since low-order RGB bits don’t matter, result will
be “invisible” to human eye
– But, computer program can “see” the bits
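A minimal Python sketch of low-order-bit embedding. It works on a raw sequence of RGB bytes rather than a real BMP file, so no image library is needed; the cover data, the message, and the function names are made up for illustration.

```python
def embed(pixels: bytes, secret: bytes) -> bytearray:
    """Write the bits of secret into the low-order bit of successive colour bytes."""
    bits = [(byte >> (7 - i)) & 1 for byte in secret for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return stego

def extract(pixels: bytes, length: int) -> bytes:
    """Read length bytes back out of the low-order bits."""
    out = bytearray()
    for i in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (pixels[8 * i + bit_index] & 1)
        out.append(value)
    return bytes(out)

cover = bytes([0xAB, 0x33, 0xF0] * 32)   # 32 identical RGB pixels as raw bytes
stego = embed(cover, b"hi Bob")
print(extract(stego, 6))                                # b'hi Bob'
print(max(abs(a - b) for a, b in zip(cover, stego)))    # 1
```

No colour byte changes by more than 1, which is why the result is invisible to the eye but trivial for a program to read.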
Stego Example 1

• Left side: plain Alice image


• Right side: Alice with entire Alice in Wonderland
(pdf) “hidden” in the image
Non-Stego Example
• Walrus.html in web browser
• “View source” reveals:

<font color=#000000>"The time has come," the Walrus said,</font><br>
<font color=#000000>"To talk of many things: </font><br>
<font color=#000000>Of shoes and ships and sealing wax </font><br>
<font color=#000000>Of cabbages and kings </font><br>
<font color=#000000>And why the sea is boiling hot </font><br>
<font color=#000000>And whether pigs have wings." </font><br>
Stego Example 2
• stegoWalrus.html in web browser
• “View source” reveals:

<font color=#000101>"The time has come," the Walrus said,</font><br>
<font color=#000100>"To talk of many things: </font><br>
<font color=#010000>Of shoes and ships and sealing wax </font><br>
<font color=#010000>Of cabbages and kings </font><br>
<font color=#000000>And why the sea is boiling hot </font><br>
<font color=#010001>And whether pigs have wings." </font><br>

• “Hidden” message: 011 010 100 100 000 101
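The hidden bits can be read off mechanically: each font colour contributes the low-order bit of its red, green and blue byte. A small Python sketch, using the stegoWalrus.html source above as input (the function name hidden_bits is illustrative):

```python
import re

html = '''<font color=#000101>"The time has come," the Walrus said,</font><br>
<font color=#000100>"To talk of many things: </font><br>
<font color=#010000>Of shoes and ships and sealing wax </font><br>
<font color=#010000>Of cabbages and kings </font><br>
<font color=#000000>And why the sea is boiling hot </font><br>
<font color=#010001>And whether pigs have wings." </font><br>'''

def hidden_bits(source: str) -> str:
    """Collect the low-order bit of the R, G and B byte of every font colour."""
    groups = []
    for colour in re.findall(r"color=#([0-9a-fA-F]{6})", source):
        rgb = bytes.fromhex(colour)
        groups.append("".join(str(channel & 1) for channel in rgb))
    return " ".join(groups)

print(hidden_bits(html))  # 011 010 100 100 000 101
```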


Steganography
• Some formats (e.g., image files) are more difficult
than html for humans to read
– But easy for computer programs to read…
• Easy to hide info in unimportant bits
• Easy to destroy info in unimportant bits
• To be robust, must use important bits
– But stored info must not damage data
– Collusion attacks are another concern
• Robust steganography is tricky!
Information Hiding:
The Bottom Line
• Not-so-easy to hide digital information
– “Obvious” approach is not robust
– Stirmark: tool to make most watermarks in images
unreadable without damaging the image
– Stego/watermarking active research topics
• If information hiding is suspected
– Attacker may be able to make information/watermark
unreadable
– Attacker may be able to read the information, given the
original document (image, audio, etc.)
