
Some basic concepts of Information Theory and Entropy

• Information theory (IT)
• Entropy
• Mutual Information
• Use in NLP



Entropy

• Related to coding theory: a more efficient code uses fewer bits for the more frequent messages, at the cost of more bits for the less frequent ones



EXAMPLE: You have to send messages about the two
occupants in a house every five minutes

• Equal probability: four equally likely messages, so 2 bits per message
  00 no occupants
  01 first occupant
  10 second occupant
  11 both occupants

• Different probabilities:

  Situation         Probability   Code
  no occupants      .5            0
  first occupant    .125          110
  second occupant   .125          111
  both occupants    .25           10
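
A minimal sketch (not part of the original slides) that computes the expected code length of the variable-length code in the table and compares it with the entropy of the distribution:

```python
import math

# Probabilities and variable-length codes from the occupancy example.
messages = {
    "no occupants":    (0.5,   "0"),
    "first occupant":  (0.125, "110"),
    "second occupant": (0.125, "111"),
    "both occupants":  (0.25,  "10"),
}

# Average number of bits per message with this code.
expected_length = sum(p * len(code) for p, code in messages.values())

# Entropy: the lower bound on the average number of bits per message.
entropy = -sum(p * math.log2(p) for p, _ in messages.values())

print(f"expected code length: {expected_length:.2f} bits")   # 1.75
print(f"entropy (lower bound): {entropy:.2f} bits")          # 1.75
```

With equal probabilities a fixed-length code needs 2 bits per message; the variable-length code above gets this down to 1.75 bits on average.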
• Let X be a random variable taking values x1, x2, ..., xn from a domain according to a probability distribution
• We define the expected value of X, E(X), as the sum of the possible values weighted by their probabilities
• E(X) = p(x1) x1 + p(x2) x2 + ... + p(xn) xn
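
As a quick illustration (the values and probabilities below are made up, not from the slides):

```python
# Hypothetical values and probabilities for a random variable X.
values = [0, 1, 2, 3]
probs  = [0.5, 0.125, 0.125, 0.25]

# E(X) = p(x1) x1 + p(x2) x2 + ... + p(xn) xn
E_X = sum(p * x for p, x in zip(probs, values))
print(E_X)  # 1.125
```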



Entropy

• Let W be a random variable that can take one of several values V(W), with a probability distribution P
• Is there a lower bound on the number of bits needed to encode a message? Yes: the entropy
• It is possible to get close to this minimum (lower bound)
• It is also a measure of our uncertainty about what the message says (many bits - uncertain, few bits - certain)
• Given an event, we want to associate with it an information content (I)
• From Shannon in the 1940s
• Two constraints:
  • Significance: the less probable an event is, the more information it carries
    • P(x1) > P(x2) => I(x2) > I(x1)
  • Additivity: if two events are independent
    • I(x1x2) = I(x1) + I(x2)



• I(m) = 1/p(m) does not satisfy the second requirement (additivity)
• I(x) = - log p(x) satisfies both
• So we define I(x) = - log2 p(x)
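
A quick numerical check (a sketch; the probabilities are arbitrary) of why the logarithm is needed: additivity holds for - log2 p(x) but not for the rejected candidate 1/p(m):

```python
import math

p1, p2 = 0.5, 0.25  # probabilities of two independent events (arbitrary)

log_info = lambda p: -math.log2(p)   # I(x) = -log2 p(x)
inv_info = lambda p: 1 / p           # rejected candidate I(m) = 1/p(m)

# Additive: I(x1 x2) = I(x1) + I(x2) when the events are independent.
print(log_info(p1 * p2), log_info(p1) + log_info(p2))   # 3.0  3.0
# Not additive: 8.0 vs 6.0
print(inv_info(p1 * p2), inv_info(p1) + inv_info(p2))
```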



• Let X be a random variable, described by p(X), with information content I
• Entropy is the expected value of I: H(X) = E(I) = - Σ p(x) log2 p(x), summing over all values x of X

• Entropy measures the information content of a random variable. We can see it as the average length of the message needed to transmit a value of this variable using an optimal coding
• Entropy also measures the degree of disorder (uncertainty) of the random variable



• Uniform distribution of a variable X:
  • Each possible value xi ∈ X, with |X| = M, has the same probability pi = 1/M
  • To encode a value xi in binary we need log2 M bits of information
• Non-uniform distribution (by analogy):
  • Each value xi has a different probability pi
  • Assume the outcomes are independent
  • Taking Mi = 1/pi, we need log2 Mi = log2 (1/pi) = - log2 pi bits of information



Let X = {a, b, c, d} with pa = 1/2, pb = 1/4, pc = 1/8, pd = 1/8

entropy(X) = E(I) =
- 1/2 log2 (1/2) - 1/4 log2 (1/4) - 1/8 log2 (1/8) - 1/8 log2 (1/8) = 7/4 = 1.75 bits

Optimal yes/no question strategy (decision tree):
  X = a?  yes -> a,  no -> X = b?  yes -> b,  no -> X = c?  yes -> c,  no -> d

Average number of questions: 1.75
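
A small simulation sketch (not from the slides): it estimates the average number of yes/no questions the strategy above needs to identify X, which should come out close to the entropy, 1.75:

```python
import random

values = ["a", "b", "c", "d"]
probs  = [0.5, 0.25, 0.125, 0.125]

# Questions asked in order: X = a?, X = b?, X = c?
# One question identifies a, two identify b, three identify c or d.
questions = {"a": 1, "b": 2, "c": 3, "d": 3}

samples = random.choices(values, weights=probs, k=100_000)
avg = sum(questions[x] for x in samples) / len(samples)
print(f"average number of questions: {avg:.3f}")   # close to 1.75
```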


Let X have a Bernoulli distribution (a binomial with a single trial):
X = 0 with probability p
X = 1 with probability (1 - p)

H(X) = - p log2 (p) - (1 - p) log2 (1 - p)

p = 0   => 1 - p = 1    H(X) = 0
p = 1   => 1 - p = 0    H(X) = 0
p = 1/2 => 1 - p = 1/2  H(X) = 1

(Plot of H(X) against p: it rises from 0 at p = 0 to a maximum of 1 bit at p = 1/2 and falls back to 0 at p = 1.)
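
A small sketch of this binary entropy function (the probability values printed are just illustrative):

```python
import math

def binary_entropy(p):
    # H(X) = -p log2 p - (1 - p) log2 (1 - p), with H = 0 at p = 0 or p = 1.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f}  H(X) = {binary_entropy(p):.3f}")
# Maximum uncertainty (1 bit) at p = 0.5, zero at p = 0 and p = 1.
```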



• The joint entropy of two random variables X, Y is the average information content needed to specify both variables:
  H(X,Y) = - Σ p(x,y) log2 p(x,y), summing over all pairs (x,y)



• The conditional entropy of a random variable Y given another random variable X describes how much information is needed, on average, to communicate Y when the reader already knows X:
  H(Y|X) = - Σ p(x,y) log2 p(y|x), summing over all pairs (x,y)
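
A sketch with a small, made-up joint distribution (the events and probabilities are illustrative, not from the slides), computing H(X,Y), H(X) and H(Y|X) using H(Y|X) = H(X,Y) - H(X), the chain rule shown below:

```python
import math

# Made-up joint distribution p(x, y); illustrative values only.
p_xy = {
    ("rain", "umbrella"):    0.3,
    ("rain", "no umbrella"): 0.1,
    ("sun",  "umbrella"):    0.1,
    ("sun",  "no umbrella"): 0.5,
}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_XY = entropy(p_xy)            # joint entropy H(X,Y)

p_x = {}                        # marginal distribution p(x)
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
H_X = entropy(p_x)

H_Y_given_X = H_XY - H_X        # chain rule: H(X,Y) = H(X) + H(Y|X)
print(f"H(X,Y) = {H_XY:.3f}  H(X) = {H_X:.3f}  H(Y|X) = {H_Y_given_X:.3f}")
```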



Chain rule for probabilities

P(A,B) = P(A|B) P(B) = P(B|A) P(A)

P(A,B,C,D,...) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) ...



Chain rule for entropies

H(X,Y) = H(X) + H(Y|X)

H(X1, X2, ..., Xn) = H(X1) + H(X2|X1) + ... + H(Xn|X1, ..., Xn-1)


Mutual Information

I(X,Y) is the mutual information between X and Y:
  I(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
• I(X,Y) measures the reduction in uncertainty about X when Y is known
• It also measures the amount of information that X carries about Y (or Y about X)
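
A sketch computing I(X,Y) for a small, made-up joint distribution (illustrative values only), using the identity I(X,Y) = H(X) + H(Y) - H(X,Y):

```python
import math

# Made-up joint distribution p(x, y); illustrative values only.
p_xy = {
    ("rain", "umbrella"):    0.3,
    ("rain", "no umbrella"): 0.1,
    ("sun",  "umbrella"):    0.1,
    ("sun",  "no umbrella"): 0.5,
}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

p_x, p_y = {}, {}               # marginal distributions p(x) and p(y)
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X,Y) = H(X) + H(Y) - H(X,Y); it is 0 only if X and Y are independent.
I_XY = entropy(p_x) + entropy(p_y) - entropy(p_xy)
print(f"I(X,Y) = {I_XY:.3f} bits")
```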



• I(X,Y) = 0 only when X and Y are independent:
  • H(X|Y) = H(X)
• H(X) = H(X) - H(X|X) = I(X,X)
  • Entropy is the self-information (the mutual information between X and X)



Pointwise Mutual Information

• The PMI of a pair of outcomes x and y belonging to discrete random variables quantifies the discrepancy between the probability of their coincidence under their joint distribution and the probability of their coincidence under their individual distributions, assuming independence:

  pmi(x,y) = log2 ( p(x,y) / (p(x) p(y)) )

• The mutual information of X and Y is the expected value of the specific mutual information (PMI) over all possible outcomes.
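
A sketch of how PMI is typically estimated for a word pair in NLP from corpus counts; the corpus size and counts below are made up:

```python
import math

N = 1_000_000        # total number of tokens in a corpus (assumed)
count_x  = 1200      # occurrences of word x (made up)
count_y  = 800       # occurrences of word y (made up)
count_xy = 90        # co-occurrences of x and y (made up)

p_x, p_y, p_xy = count_x / N, count_y / N, count_xy / N

# pmi(x, y) = log2( p(x, y) / (p(x) * p(y)) )
pmi = math.log2(p_xy / (p_x * p_y))
print(f"PMI = {pmi:.2f}")   # > 0: x and y co-occur more often than chance
```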
• H: the entropy of a language L
• We do not know the true distribution p(X)
• Let q(X) be a language model (LM)
• How good is q(X) as an estimate of p(X)?



Cross Entropy

Measures the "surprise" of a model q when it describes events that actually follow a distribution p:

  H(p,q) = - Σ p(x) log2 q(x)
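
A sketch of how cross entropy is estimated in practice for a language model: average the negative log probability that the model q assigns to a sample assumed to be drawn from the true distribution p. The unigram model and sample below are toy examples:

```python
import math

# Toy sample, assumed drawn from the true (unknown) distribution p.
sample = ["the", "cat", "sat", "on", "the", "mat"]

# Toy unigram language model q(X); probabilities are made up
# (the remaining probability mass sits on other vocabulary words).
q = {"the": 0.3, "cat": 0.1, "sat": 0.1, "on": 0.2, "mat": 0.05}

# H(p, q) estimated as -(1/N) * sum_i log2 q(w_i): the average "surprise".
cross_entropy = -sum(math.log2(q[w]) for w in sample) / len(sample)
perplexity = 2 ** cross_entropy    # the perplexity commonly reported for LMs
print(f"cross entropy = {cross_entropy:.3f} bits/word, perplexity = {perplexity:.2f}")
```

A better model assigns higher probability to the observed sample, which means a lower cross entropy and a lower perplexity.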



Relative Entropy or Kullback-Leibler (KL) divergence

Measures the difference between two probability distributions:

  D(p || q) = Σ p(x) log2 ( p(x) / q(x) )
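
A sketch computing D(p || q) for two small, made-up distributions over the same outcomes:

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}   # "true" distribution (toy values)
q = {"a": 0.4, "b": 0.4,  "c": 0.2}    # model distribution (toy values)

# D(p || q) = sum over x of p(x) * log2( p(x) / q(x) ); 0 only when p == q.
kl = sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)
print(f"D(p || q) = {kl:.3f} bits")
```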

