Information Theory and Coding
Information Theory is a mathematical treatment of the parameters that affect the processing and transmission of information. In essence, it studies how information is carried over a channel from a source to a destination.
First, the information is generated by the information source and passed to the transmitter. The transmitter processes this information and places it on the channel. The channel acts as the communication medium between the transmitter and the receiver, carrying the information to the receiver. The receiver recovers the information and delivers it to the destination. Some of the information available at the destination is returned to the information source as feedback, which is why this path is called the feedback (or response) path. The block diagram representation of the transmission of information is as shown below.
Information Theory examines the processing, transmission and extraction of information. Here, processing refers to the work done by the transmitter and receiver on the information before it is sent over, and after it is received from, the channel; transmission refers to sending the information from the transmitter over the channel; and extraction refers to the destination recovering the information it receives from the receiver.
Information is organized data that gives some meaningful knowledge to the receiver. Let us understand the concept of information with the example of a newspaper and a reader. A reader reads the newspaper to gather news, and there are two cases. In the first case, if the news read by the reader is already known to them, then that news does not act as information for the reader. In the second case, if the news is not already known to the reader, then it does act as information. This is the basic difference between data and information.
Let X be a discrete memoryless source (DMS) that generates the symbols x1, x2, …, xn. Then the information content of the symbol xi is given as:
I(xi) = −log_b P(xi)  or  I(xi) = log_b (1 / P(xi))
To calculate the information content, the value of the base b is required. There are three common values of b. If b is 2, the unit of information is the bit; if b is 10, the unit is the decit (Hartley); and if b is e, the unit is the nat (natural unit). Generally, the information content is expressed in bits, so log2 is used in place of log_b.
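As a quick illustration (not part of the original numericals), here is a minimal Python sketch that evaluates I(x) for an assumed symbol probability in each of the three bases mentioned above:

```python
import math

def information_content(p, base=2):
    """Information content I(x) = -log_b P(x) of a symbol with probability p."""
    return -math.log(p, base)

# A symbol with probability 1/4 carries 2 bits, ~0.602 decits, or ~1.386 nats.
p = 0.25
print(information_content(p, 2))        # bits
print(information_content(p, 10))       # decits (Hartleys)
print(information_content(p, math.e))   # nats
```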
NUMERICALS ON INFORMATION
ENTROPY
Entropy is defined as a measure of the average information content per source symbol, and is therefore also known as the average information. It is represented by H(X). Since the information content of each symbol is measured in bits, the unit of entropy is bits/symbol. The mathematical expression for the entropy is given as:
H(X) = Σ (i = 1 to n) P(xi) log2 (1 / P(xi))  bits/symbol
Conditions based on Entropy ➔
• H(x) is maximum when the messages emitted by the source are equally likely, or in other words equally probable to occur; for a binary source this maximum is H(x) = 1 bit/symbol.
This condition can also be asked in the exam in the form of a question where we need to prove something. Such a question is shown below.
Since we know that the entropy is maximum, i.e., H(x) equals 1 (assumed), its derivative with respect to the probability P is zero, i.e., dH(x)/dP = 0. So, on substituting dH(x)/dP = 0 in the above expression, we get:
• H(x) = 0 when the probability is equal to 0 or 1, i.e., when the outcome is certain.
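The two conditions above can be checked numerically. The following minimal Python sketch (the probability sets are assumed for illustration) computes H(X) = Σ P(xi) log2(1/P(xi)) and shows that the entropy is maximum for equally likely messages and zero for a certain message:

```python
import math

def entropy(probs, base=2):
    """H(X) = sum_i P(x_i) * log_b(1 / P(x_i)); terms with P = 0 contribute 0."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 -> maximum for a binary source (equally likely)
print(entropy([1.0, 0.0]))    # 0.0 -> a certain outcome carries no information
print(entropy([0.25] * 4))    # 2.0 -> maximum for 4 equally likely symbols (log2 4)
```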
NUMERICALS ON ENTROPY
Now, the average message content i.e., the entropy of the source is
Here “r” represents the rate at which the picture elements are transmitted by the TV. The value of r depends on the number of lines of picture information, the number of picture elements and the rate of transmission.
MARGINAL, JOINT AND CONDITIONAL PROBABILITY
Marginal probability is the probability of occurrence of a single event on its own. It is denoted by P(A), where A is the event and P(A) is the probability that event A occurs.
Joint probability is the probability of occurrence of two events simultaneously. In case of the joint
probability, there are two cases:
• Case of Dependent Events ➔ 𝐏(𝐀, 𝐁) = 𝐏(𝐀 ∩ 𝐁) = 𝐏(𝐀). 𝐏(𝐁|𝐀) = 𝐏(𝐁). 𝐏(𝐀|𝐁)
• Case of Independent Events ➔ 𝐏(𝐀, 𝐁) = 𝐏(𝐀 ∩ 𝐁) = 𝐏(𝐀). 𝐏(𝐁)
Conditional probability is the probability of occurrence of one event given that the other event has already occurred. In case of conditional probability, there are two cases:
• Case of Dependent Events ➔ P(A|B) = P(A ∩ B) / P(B) = P(A, B) / P(B)
  Or P(B|A) = P(B ∩ A) / P(A) = P(B, A) / P(A)
• Case of Independent Events ➔ P(A|B) = P(A) or P(B|A) = P(B)
The three relationships between joint and conditional entropy are as follows:
• 𝐇(𝐗, 𝐘) = 𝐇(𝐗 ∩ 𝐘) = 𝐇(𝐗|𝐘) + 𝐇(𝐘)
• 𝐇(𝐗, 𝐘) = 𝐇(𝐗 ∩ 𝐘) = 𝐇(𝐘|𝑿) + 𝐇(𝐗)
• 𝐇(𝐗, 𝐘) = 𝐇(𝐗 ∩ 𝐘) = 𝐇(𝐗) + 𝐇(𝐘), which holds only when X and Y are independent
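These identities can be verified numerically. The sketch below uses an assumed joint distribution P(X, Y) of two binary variables (the numbers are illustrative only) and checks that H(X, Y) = H(X|Y) + H(Y) = H(Y|X) + H(X), while H(X) + H(Y) matches only when the variables are independent:

```python
import math

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Assumed joint distribution P(X, Y) over two binary variables (rows: x, columns: y).
Pxy = [[0.4, 0.1],
       [0.2, 0.3]]

Px = [sum(row) for row in Pxy]                    # marginal P(X) = [0.5, 0.5]
Py = [sum(col) for col in zip(*Pxy)]              # marginal P(Y) = [0.6, 0.4]

H_XY = H([p for row in Pxy for p in row])         # joint entropy H(X, Y)
H_X, H_Y = H(Px), H(Py)
# Conditional entropies computed directly from the conditional distributions.
H_X_given_Y = sum(Py[j] * H([Pxy[i][j] / Py[j] for i in range(2)]) for j in range(2))
H_Y_given_X = sum(Px[i] * H([Pxy[i][j] / Px[i] for j in range(2)]) for i in range(2))

print(round(H_XY, 4))                 # H(X, Y)
print(round(H_X_given_Y + H_Y, 4))    # equals H(X, Y)
print(round(H_Y_given_X + H_X, 4))    # equals H(X, Y)
print(round(H_X + H_Y, 4))            # larger here, since X and Y are not independent
```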
Let X be a discrete memoryless source (DMS) that generates the symbols x1, x2, …, xm having codeword lengths l1, l2, …, lm with probabilities p1, p2, …, pm. Then the average codeword length is given by:
L = Σ (i = 1 to m) pi li
where m = number of symbols emitted by the DMS, pi = probability of occurrence of the i-th symbol and li = codeword length of the i-th symbol.
Q. A DMS generates the symbols X1 and X2 with probabilities P(X1) = P(X2) = 0.5. Check whether it satisfies the source coding theorem or not.
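A minimal Python check for this question is sketched below, assuming the obvious binary code X1 → 0, X2 → 1 (one bit per symbol); it compares the average codeword length L with the entropy H to test L ≥ H:

```python
import math

# Assumed binary code for the question: X1 -> '0', X2 -> '1' (codeword length 1 each).
probs   = [0.5, 0.5]
lengths = [1, 1]

H = sum(p * math.log2(1 / p) for p in probs)      # source entropy (bits/symbol)
L = sum(p * l for p, l in zip(probs, lengths))    # average codeword length

print(H, L)       # 1.0 1.0
print(L >= H)     # True -> the source coding theorem (L >= H) is satisfied
print(H / L)      # code efficiency = 1.0 (100 %)
```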
INFORMATION RATE
Before defining the information rate, consider a source that generates r messages per second. Each message generated by this source carries some information content, which can be expressed in bits. The information rate is then the information produced per second, R = r · H(X) bits/second.
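A small sketch of this calculation is given below; the value of r and the symbol probabilities are assumed purely for illustration:

```python
import math

# Assumed example: a source emitting r = 2000 messages/second from 4 equally likely symbols.
r = 2000                                     # messages per second
probs = [0.25, 0.25, 0.25, 0.25]

H = sum(p * math.log2(1 / p) for p in probs) # entropy in bits per message
R = r * H                                    # information rate in bits per second
print(H, R)                                  # 2.0 bits/message, 4000 bits/second
```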
MUTUAL INFORMATION
MARKOV SOURCES (MARKOV STATISTICAL MODEL)
Let there be three symbols A, B and C emitted by a source with three states. The transitions are as follows:
• On moving from state 1 to state 3, it emits symbol B with a probability of occurrence of 1/4.
• On moving from state 3 to state 1, it emits symbol A with a probability of occurrence of 1/4.
• On moving from state 3 to state 2, it emits symbol C with a probability of occurrence of 1/4.
• On moving from state 2 to state 3, it emits symbol B with a probability of occurrence of 1/4.
• On moving from state 1 to state 2, it emits symbol B with a probability of occurrence of 1/4.
• On moving from state 2 to state 1, it emits symbol A with a probability of occurrence of 1/4.
• On moving from state 1 to state 2, it emits symbol C with a probability of occurrence of 1/4.
• State 1 loops on itself and emits symbol A with a probability of occurrence of 1/2; state 2 loops on itself and emits symbol B with a probability of occurrence of 1/2; and state 3 loops on itself and emits symbol C with a probability of occurrence of 1/2.
There are 3 tree diagrams to construct, since there are 3 states and 3 symbols, and for the same reason each tree diagram is drawn up to 3 levels. The probability of occurrence of each state is also 1/3.
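The per-symbol entropy of this Markov source can be computed from the transition probabilities. The sketch below assumes the transition structure described above (each state stays put with probability 1/2 and moves to each of the other two states with probability 1/4) and the stated state probabilities of 1/3:

```python
import math

# Transition matrix of the 3-state source described above: each state stays where it
# is with probability 1/2 and moves to either of the other two states with probability 1/4.
P = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]

# By symmetry, the occupancy probability of every state is 1/3, as stated above.
state_prob = [1/3, 1/3, 1/3]
H_state = [sum(p * math.log2(1 / p) for p in row) for row in P]   # entropy of each row
H_source = sum(sp * h for sp, h in zip(state_prob, H_state))

print(H_state)    # [1.5, 1.5, 1.5] bits
print(H_source)   # 1.5 bits/symbol for the Markov source
```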
Let A be the input symbol (transmitter side), which can be either a1 or a2, and let B be the output symbol, which can be b1, b2 or an erasure (error) symbol. The probability of the channel transmitting a symbol correctly is p̄ = 1 − p and the probability of the channel emitting the error symbol is p. P(B|A) represents the channel matrix.
Now, to calculate the channel capacity C = max[H(A) − H(A|B)] · rs (where rs denotes the symbol rate in symbols per second), we first need H(A). Let the probability of input a1 occurring be P(a1) = w and the probability of input a2 occurring be P(a2) = w̄ = 1 − w.
Then H(A) = Σ (i = 1 to r) Pi log2 (1/Pi) = w log2 (1/w) + w̄ log2 (1/w̄). Now, P(A, B) = P(ai, bj) is obtained by multiplying P(a1) with the 1st row of P(B|A) and P(a2) with the 2nd row of P(B|A). After that, P(bj) is calculated by summing up each column of P(A, B). Then, P(A|B) = P(ai, bj) / P(bj) is calculated.
Now, H(A|B) = Σ (i = 1 to r) Σ (j = 1 to s) P(ai, bj) log2 [1/P(ai|bj)] = w p log2 (1/w) + w̄ p log2 (1/w̄) = p [w log2 (1/w) + w̄ log2 (1/w̄)] = p H(A). Therefore, C = max[H(A) − H(A|B)] · rs = max[H(A) − p H(A)] · rs = (1 − p) H(A) · rs = p̄ H(A) · rs.
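The result C = p̄ H(A) rs (per-symbol capacity 1 − p, reached when the inputs are equally likely) can be verified numerically. The sketch below assumes an error-symbol probability p = 0.1 and sweeps the input probability w to find the maximum of H(A) − H(A|B); multiplying the result by the symbol rate rs then gives the capacity in bits per second:

```python
import math

def h(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

p = 0.1                                   # assumed probability of the error (erasure) symbol

def mutual_information(w):
    """I(A;B) = H(A) - H(A|B) for inputs a1, a2 with P(a1) = w, P(a2) = 1 - w."""
    Pa = [w, 1 - w]
    # Channel matrix P(B|A): outputs are b1, b2 and the erasure symbol.
    PBgA = [[1 - p, 0, p],
            [0, 1 - p, p]]
    Pab = [[Pa[i] * PBgA[i][j] for j in range(3)] for i in range(2)]   # joint P(A, B)
    Pb = [sum(Pab[i][j] for i in range(2)) for j in range(3)]          # marginal P(B)
    H_A = h(Pa)
    H_AgB = sum(Pb[j] * h([Pab[i][j] / Pb[j] for i in range(2)])
                for j in range(3) if Pb[j] > 0)
    return H_A - H_AgB

best = max(mutual_information(w / 100) for w in range(1, 100))
print(round(best, 4), round(1 - p, 4))    # both 0.9: capacity per symbol is (1 - p)
```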
BOSE-CHAUDHURI-HOCQUENGHEM (BCH) CODE
REED-SOLOMON CODE
NUMERICALS ON DISCRETE MEMORYLESS CHANNELS (DMC)
APPLICATION OF CONTINUOUS CHANNEL
NUMERICALS ON APPLICATION OF CONTINUOUS CHANNEL
UNIQUELY DETECTABLE CODES (PREFIX CODES)
In the optimum binary tree section, the tree on the LHS shows the weighted binary tree in which, once a symbol is placed on the left-hand side, no further symbol can be placed on the right-hand side, whereas the tree on the RHS shows the optimum binary tree in which, even after placing a symbol on the left-hand side, a symbol can still be placed on the right-hand side if needed.
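A prefix (uniquely decodable) code can also be tested programmatically: no codeword may be a prefix of another, and the codeword lengths must satisfy the Kraft inequality. The codeword sets below are assumed examples for illustration:

```python
def is_prefix_free(codewords):
    """A code is decodable as a prefix code if no codeword is a prefix of another."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

def kraft_sum(codewords):
    """Kraft inequality: sum of 2^(-l_i) must be <= 1 for a binary prefix code to exist."""
    return sum(2 ** -len(c) for c in codewords)

good = ["0", "10", "110", "111"]      # assumed example set of codewords
bad  = ["0", "01", "011", "111"]      # "0" is a prefix of "01", so not a prefix code

print(is_prefix_free(good), kraft_sum(good))   # True 1.0
print(is_prefix_free(bad),  kraft_sum(bad))    # False 1.0
```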
BLOCK CODES
At the channel encoder we perform channel encoding, where redundant bits (block codes) are added so that errors can be detected and corrected. At the source encoder we perform source encoding, where redundancy is removed to improve bandwidth utilization.
In a (4, 3) block code with an even parity check, there are only 8 valid codewords. Since it is an even-parity block code, a codeword with an even number of 1s is valid; otherwise it is an invalid (erroneous) codeword.
Remember that all the additions are modulo-2 addition.
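A small sketch of the (4, 3) even-parity code described above: it lists the 8 valid codewords and checks the parity of an assumed received word:

```python
from itertools import product

# (4, 3) even-parity block code: 3 message bits plus one parity bit chosen so the
# total number of 1s in every valid codeword is even -> 2^3 = 8 valid codewords.
codewords = []
for msg in product([0, 1], repeat=3):
    parity = sum(msg) % 2                 # modulo-2 sum of the message bits
    codewords.append(list(msg) + [parity])

print(codewords)                          # the 8 valid codewords
# A received word is accepted only if its overall parity is even:
received = [1, 0, 1, 1]                   # assumed received word
print("valid" if sum(received) % 2 == 0 else "parity failure -> error detected")
```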
Suppose there is an error in the bit positioned at column 2 and row 4. Then there is a parity failure for the 2nd column as well as the 4th row, and hence a failure in the overall parity as well. Thus, the erroneous bit 0 will be replaced with 1, and simultaneously the parity bits of the 4th row, the 2nd column and the overall parity will be corrected. The product code therefore successfully detects and corrects a single-bit error. But if errors occur in two or more positions, this technique fails, so it cannot reliably detect and correct multiple bit errors. We can conclude that if the constituent codes are (n1, k1) and (n2, k2), then the product code is (n1n2, k1k2).
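The row/column parity mechanism can be sketched as follows; the 3 × 4 data block and the error position are assumed purely for illustration:

```python
# Product (array) code sketch: message bits arranged in a grid, one even-parity bit per
# row and per column. A single bit error fails exactly one row parity and one column
# parity, which pinpoints the erroneous bit so it can be flipped back.
def parity(bits):
    return sum(bits) % 2

# Assumed 3x4 block of message bits.
data = [[1, 0, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0]]
row_par = [parity(row) for row in data]
col_par = [parity(col) for col in zip(*data)]

data[1][2] ^= 1                                   # introduce a single-bit error

bad_row = [i for i, row in enumerate(data) if parity(row) != row_par[i]]
bad_col = [j for j, col in enumerate(zip(*data)) if parity(col) != col_par[j]]
if len(bad_row) == 1 and len(bad_col) == 1:
    data[bad_row[0]][bad_col[0]] ^= 1             # flip the located bit to correct it
print(bad_row, bad_col, data)                     # error found at row 1, column 2
```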
REPETITION CODES
Parity bits are not used for an even or odd parity check here, because the added bits are simply repeated information bits.
The product code can detect and correct a one-bit error, but in the case of errors in 2 or more bits it does not work.
In the matrix operations, remember to use the modulo-2 addition (XOR) operation, not normal addition.
Approach - 1 ➔
Approach – 2 ➔
The error vector only indicates the bit position (counted from the LHS) at which the received codeword is in error.
The matrix given above is a systematic generator matrix, since the identity-matrix part and the parity-matrix part can be clearly distinguished from each other.
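As an illustration of encoding with a systematic generator matrix, the sketch below uses an assumed G = [Ik | P] of a (7, 4) Hamming code (the matrix shown in the notes may differ); all additions are modulo-2:

```python
# Assumed systematic generator matrix G = [I_k | P] of a (7, 4) Hamming code.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def encode(msg, G):
    """Codeword c = m.G with all additions done modulo 2 (XOR), as noted above."""
    n = len(G[0])
    return [sum(msg[i] * G[i][j] for i in range(len(msg))) % 2 for j in range(n)]

c = encode([1, 0, 1, 1], G)
print(c)      # first 4 bits are the message itself, the last 3 are parity-check bits
```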
CYCLIC CODE
CODE ALGEBRA OF CYCLIC CODE
Here, r(x) is calculated using the long-division method. For carrying out long division, the numerator and denominator polynomials should be arranged in decreasing order of degree.
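The remainder r(x) can be obtained with modulo-2 (XOR) long division. The message and generator polynomials in the sketch below are assumed for illustration:

```python
def gf2_div_remainder(dividend, divisor):
    """Long division of binary polynomials (coefficient lists, highest degree first);
    returns the remainder r(x). Subtraction over GF(2) is the same as XOR."""
    rem = list(dividend)
    dlen = len(divisor)
    for i in range(len(rem) - dlen + 1):
        if rem[i] == 1:
            for j in range(dlen):
                rem[i + j] ^= divisor[j]
    return rem[-(dlen - 1):]               # remainder has degree < degree of divisor

# Assumed example: message polynomial x^3 + x^2 shifted by (n - k) = 3 -> x^6 + x^5,
# divided by generator polynomial g(x) = x^3 + x + 1.
dividend = [1, 1, 0, 0, 0, 0, 0]           # x^6 + x^5
g        = [1, 0, 1, 1]                    # x^3 + x + 1
print(gf2_div_remainder(dividend, g))      # [0, 1, 0] -> r(x) = x, appended as parity bits
```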
GALOIS FIELD (GF)
In GF(q), we first check that q is prime and then, following the process shown above, check whether a given element is a primitive root of GF(q) or not.
Here, we always perform modulo-2 addition. For the additive inverse of 0, we look in the addition table and find that 0 + 0 = 0, which satisfies the additive-inverse condition; thus 0 is its own additive inverse, and similarly the additive inverse of 1 is 1. For the multiplicative inverse, there is no element that serves as a multiplicative inverse of 0, but the multiplicative inverse of 1 is 1.
A convolution encoder is shown above, with the input 011 applied to three shift registers denoted m, m1 and m2. Here, m holds the current message bit, while m1 and m2 hold the previous two successive message bits. Based on the values of these three shift registers, the encoded outputs x1 and x2 are evaluated, and from these two encoded outputs we obtain the final output. This is how the encoder is designed.
The shifting takes place at regular intervals of time, which means that if there is a 0 at m, then after one interval that 0 shifts to m1, after the next interval it shifts to m2, and finally it is discarded. First we write x1 and x2, then the shift occurs, the values of the three registers change and we obtain new values for x1 and x2; we write these new values, and the same process continues, generating new values of x1 and x2 that form the output sequence.
In this case, dimension of code is (n, k) = (2, 1).
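A minimal sketch of such a rate-1/2 encoder is given below. The generator connections x1 = m ⊕ m1 ⊕ m2 and x2 = m ⊕ m2 are an assumed common choice, since the exact connections of the figure are not reproduced here:

```python
def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    """Rate-1/2 convolutional encoder with registers (m, m1, m2).
    Generator taps g1, g2 are an assumed common choice; the actual figure may differ."""
    m1 = m2 = 0
    out = []
    for m in bits:
        x1 = (g1[0] * m) ^ (g1[1] * m1) ^ (g1[2] * m2)
        x2 = (g2[0] * m) ^ (g2[1] * m1) ^ (g2[2] * m2)
        out += [x1, x2]
        m2, m1 = m1, m                     # shift: current bit moves into m1, m1 into m2
    return out

print(conv_encode([0, 1, 1]))              # encoded output sequence for the input 011
```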
CODE TREE
For plotting the code tree, the two rules are that if the input value = m = 0, then we will have to go to
upper branch and if the input value = m = 1, then we need to go to lower branch. So, initially we are at
state ‘a’ and the value of m here is 1. Since the value of m is 1, so we have to go to lower branch and
the output here is 11. Then the next state is ‘c’ and the value of m is 1. Since it is 1, so we have to go to
lower branch and the output value is 01. Similarly, the next state is ‘d’ and the value of m here is 0.
Since m = 0, so we have to go to upper branch and the output value is 01. The next state is just ‘b’ and
the plotting of code tree ends here.
For code trellis diagram and state diagram, if we have the input message bit = m = 0 then we will use
solid line and if the value of m = 1, then we will use dashed lines.
To get the value of P1, we check 1 bit starting from the position of P1 itself, then skip 1 bit, check 1 bit, skip 1 bit, and so on, and XOR all the bits we checked. Similarly, for P2 we check 2 bits starting from the position of P2, then skip 2 bits, check 2 bits, skip 2 bits, and so on, and XOR all the bits we checked; and for P4 we check 4 bits starting from the position of P4, then skip 4 bits, and XOR all the bits we checked.
In the case of even parity, a failed (false) parity check is recorded as 1 and a satisfied (true) parity check as 0.
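The check-k/skip-k rule is equivalent to saying that Pk covers every bit position whose binary index contains the weight-k bit. A minimal even-parity Hamming(7, 4) sketch based on this rule (the data bits are assumed for illustration) is shown below:

```python
# Even-parity Hamming(7,4): positions 1..7, with P1, P2, P4 at positions 1, 2 and 4.
# Pk is the XOR of the bits at every position whose binary index has the k-bit set,
# which is exactly the "check k bits, skip k bits" rule described above.
def hamming74(d):                          # d = 4 data bits [d1, d2, d3, d4]
    code = [0, 0, d[0], 0, d[1], d[2], d[3]]        # positions 1..7 (index 0 = position 1)
    for k in (1, 2, 4):
        par = 0
        for pos in range(1, 8):
            if pos & k and pos != k:
                par ^= code[pos - 1]
        code[k - 1] = par                  # set Pk so that its whole group has even parity
    return code

print(hamming74([1, 0, 1, 1]))             # e.g. data 1011 -> full 7-bit codeword
```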
IRREDUCIBLE POLYNOMIAL
An irreducible polynomial is a non-zero polynomial that cannot be factored into the product of two or
more non-constant polynomials over a given field. Formally, a polynomial f(x) over a field F is
irreducible if it is non-constant and if there do not exist polynomials g(x) and h(x) over F, both of lower
degree than f(x), such that:
𝐟(𝐱) = 𝐠(𝐱) ∙ 𝐡(𝐱)
The finite field or Galois field GF(pⁿ) can be constructed using an irreducible polynomial of degree n over the finite field GF(p), where p is a prime number. To construct a finite field GF(pⁿ), we have to do
the following steps:
• Choose a Prime p ➔ Start with a base field GF(p) which contains p elements.
• Select an Irreducible Polynomial ➔ Identify an irreducible polynomial f(x) of degree n over GF(p).
• Field Elements ➔ The elements of GF(pⁿ) are the polynomials of degree less than n with
coefficients in GF(p).
• Arithmetic Operations ➔ Define addition and multiplication of these polynomials modulo the
irreducible polynomial f(x).
For example, consider constructing the field GF(2³); the steps are:
• Base Field is GF(2) with 2 elements {0, 1}.
• We choose the irreducible polynomial f(x) = x³ + x + 1.
• The field elements are {0, 1, x, x+1, x², x² + 1, x² + x, x² + x + 1}.
• Modulo-2 addition is performed by adding corresponding coefficients.
• Multiplication is performed by multiplying the polynomials and then taking the remainder when
divided by f(x).
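A minimal sketch of GF(2³) multiplication, representing each element as a 3-bit number and reducing by the chosen f(x) = x³ + x + 1, is shown below:

```python
# Sketch of GF(2^3) multiplication using the irreducible polynomial f(x) = x^3 + x + 1.
# Field elements are 3-bit integers 0..7 whose bits are the polynomial coefficients.
def gf8_mul(a, b, modulus=0b1011):         # 0b1011 encodes x^3 + x + 1
    result = 0
    while b:
        if b & 1:
            result ^= a                    # addition of polynomials = XOR of coefficients
        b >>= 1
        a <<= 1
        if a & 0b1000:                     # degree reached 3 -> reduce modulo f(x)
            a ^= modulus
    return result

# (x + 1) * (x^2 + 1) = x^3 + x^2 + x + 1 = x^2, since x^3 = x + 1 in this field.
print(bin(gf8_mul(0b011, 0b101)))          # 0b100, i.e. x^2
```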
Irreducible polynomials are integral to the construction of error-correcting codes such as Reed-
Solomon codes and BCH codes.
GOLAY CODE
Golay codes are a class of error-correcting codes that are highly significant in coding theory due to
their remarkable error-correcting capabilities. There are two types of Golay codes:
• Binary Golay Code ➔ The binary Golay code, specifically the (23,12,7) Golay code, is a perfect,
linear, and cyclic error-correcting code. This means it can correct a certain number of errors in a
fixed-length block of bits. The binary Golay code is used in applications requiring high reliability,
such as deep-space communication and digital data transmission. Its error-correcting capability
ensures data integrity even in noisy environments.
➢ (23,12,7) Golay Code - This notation indicates a code with a block length of 23 bits, 12
information bits, and a minimum Hamming distance of 7. The minimum Hamming distance of
7 means the code can correct up to 3 errors (since the error-correcting capability t is given by t = ⌊(d − 1) / 2⌋, where d is the minimum Hamming distance). The properties of this code are:
✓ Block Length (n) = 23 bits.
✓ Number of Information Bits (k) = 12 bits.
✓ Minimum Hamming Distance (d) = 7.
✓ Can correct up to 3 errors in each 23-bit block.
• Ternary Golay Code ➔ The ternary Golay code, specifically the (11,6,5) Golay code, operates over a
ternary field (i.e., the field with three elements). It is also a perfect, linear, and cyclic code. The
ternary Golay code is used in systems where ternary logic is advantageous, such as certain optical
and communication systems. Its strong error-correcting capabilities make it suitable for reliable
data transmission.
➢ (11,6,5) Golay Code - This notation indicates a code with a block length of 11 trits (ternary digits),
6 information trits, and a minimum Hamming distance of 5. The minimum Hamming distance
of 5 means the code can correct up to 2 errors. The properties of this code are:
✓ Block Length (n) = 11 trits.
✓ Number of Information Symbols (k) = 6 trits.
✓ Minimum Hamming Distance (d) = 5.
✓ Can correct up to 2 errors in each 11-trit block.
DUAL CODE
TURBO CODE
Turbo code is a special type of convolutional code which is more secure, has a greater length than a plain convolutional code and therefore has more randomness. It is the parallel concatenation of convolutional codes: in this parallel concatenation, the constituent encoders are connected in parallel. This is done to increase the length of the code, which in turn increases the randomness. Interleaving, i.e., rearrangement of the message bits, is done by the interleaver.
Thus, η = H/Ĥ = 0.976 and redundancy = 1 − η = 1 − 0.976 = 0.024.
SHANNON-HARTLEY THEOREM
The Shannon-Hartley theorem is a fundamental result in information theory that quantifies the
maximum data rate (channel capacity) that can be achieved over a communication channel with a
specified bandwidth in the presence of noise. This theorem provides a theoretical limit on the rate at
which information can be transmitted over a noisy channel without error.
This theorem states that for a continuous-time communication channel with bandwidth B (in Hz) and
signal-to-noise ratio S/N (dimensionless ratio), the channel capacity C (in bits per second) is given by:
C = B log2 (1 + S/N)
where C is the channel capacity in bits per second (bps), B is the bandwidth of the channel in hertz
(Hz), S is the average signal power, N is the average noise power and S/N is the signal-to-noise ratio
(SNR).
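A small sketch applying this formula, with an assumed 3 kHz channel and a 30 dB signal-to-noise ratio (converted from dB to a linear ratio before use), is given below:

```python
import math

def shannon_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley channel capacity C = B * log2(1 + S/N) in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Assumed example: a 3 kHz telephone-style channel with an SNR of 30 dB.
B = 3000
snr_db = 30
snr = 10 ** (snr_db / 10)                  # convert dB to a linear power ratio
print(shannon_capacity(B, snr))            # ~29,900 bits per second
```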