NLP DL Lecture 1
The term-document matrix (w_ij = weight of term i in document j):

      T1    T2   …   Tt
D1    w11   w21  …   wt1
D2    w12   w22  …   wt2
 :     :     :        :
Dn    w1n   w2n  …   wtn
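Each cell of this matrix holds a term weight. As a rough illustration, a raw count matrix (f_ij) can be built like this; the toy corpus and tokenization are assumptions, not from the lecture:

```python
from collections import Counter

# Hypothetical toy corpus (not from the lecture)
docs = {
    "D1": "neural networks learn word representations",
    "D2": "recurrent neural networks process word sequences",
}

# Raw term-document matrix: counts[term][doc] = f_ij (frequency of term i in document j)
vocab = sorted({tok for text in docs.values() for tok in text.split()})
counts = {t: {d: Counter(text.split())[t] for d, text in docs.items()} for t in vocab}

for t in vocab:
    print(t, [counts[t][d] for d in docs])
```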
Term Weights: Term Frequency
• More frequent terms in a document are more important, i.e. more indicative of its topic.
  f_ij = frequency of term i in document j
• In the worked example below, the raw frequency is normalized by the largest term frequency in the document: tf_ij = f_ij / max_k f_kj.
Term Weights: Inverse Document Frequency
• Terms that appear in many different documents are less indicative of the overall topic.
  df_i  = document frequency of term i
        = number of documents containing term i
  idf_i = inverse document frequency of term i
        = log2(N / df_i)    (N: total number of documents)
• idf is an indication of a term's discrimination power.
• The log is used to dampen the effect relative to tf.
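A minimal sketch of df and idf under these definitions; the small document collection is made up for illustration:

```python
import math

# Hypothetical collection of N documents, each reduced to its set of terms
docs = [
    {"neural", "network", "language"},
    {"neural", "attention"},
    {"language", "model", "attention"},
    {"neural", "language", "model"},
]
N = len(docs)

def idf(term):
    # df_i = number of documents containing term i; idf_i = log2(N / df_i)
    df = sum(1 for d in docs if term in d)
    return math.log2(N / df)

print(idf("neural"))     # in 3 of 4 docs -> log2(4/3) ≈ 0.42, low discrimination power
print(idf("attention"))  # in 2 of 4 docs -> log2(4/2) = 1.0
```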
TF-IDF Weighting
• A typical combined term importance indicator is tf-idf weighting:
  w_ij = tf_ij · idf_i = tf_ij · log2(N / df_i)
• A term occurring frequently in the document but rarely in the rest of
the collection is given high weight.
• Many other ways of determining term weights have been proposed.
• Experimentally, tf-idf has been found to work well.
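A sketch of the combined weighting as a function; tf is normalized by the maximum frequency in the document (an assumption consistent with the example on the next slide), and the input counts are made up for illustration:

```python
import math

def tfidf_weights(counts, N=None):
    """counts: {doc: {term: f_ij}} raw frequencies. Returns {doc: {term: w_ij}}.
    Uses tf_ij = f_ij / max_k f_kj and idf_i = log2(N / df_i)."""
    N = N or len(counts)
    df = {}
    for terms in counts.values():
        for t in terms:
            df[t] = df.get(t, 0) + 1
    weights = {}
    for doc, terms in counts.items():
        max_f = max(terms.values())
        weights[doc] = {t: (f / max_f) * math.log2(N / df[t]) for t, f in terms.items()}
    return weights

# Hypothetical counts, not from the lecture
print(tfidf_weights({"D1": {"cat": 3, "dog": 1}, "D2": {"dog": 2, "fish": 2}}))
```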
Computing TF-IDF -- An Example
Given a document containing terms with given frequencies:
A(3), B(2), C(1)
Assume the collection contains 10,000 documents and the
document frequencies of these terms are:
A(50), B(1300), C(250)
Then:
A: tf = 3/3; idf = log2(10000/50)   = 7.6; tf-idf = 7.6
B: tf = 2/3; idf = log2(10000/1300) = 2.9; tf-idf = 2.0
C: tf = 1/3; idf = log2(10000/250)  = 5.3; tf-idf = 1.8
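The arithmetic above can be reproduced directly; a small sketch, with tf taken as the frequency divided by the maximum frequency in the document:

```python
import math

N = 10_000                                  # total documents in the collection
freqs = {"A": 3, "B": 2, "C": 1}            # term frequencies in the document
doc_freqs = {"A": 50, "B": 1300, "C": 250}  # document frequencies in the collection
max_f = max(freqs.values())

for term, f in freqs.items():
    tf = f / max_f
    idf = math.log2(N / doc_freqs[term])
    print(f"{term}: tf = {tf:.2f}  idf = {idf:.1f}  tf-idf = {tf * idf:.1f}")
# A: tf = 1.00  idf = 7.6  tf-idf = 7.6
# B: tf = 0.67  idf = 2.9  tf-idf = 2.0
# C: tf = 0.33  idf = 5.3  tf-idf = 1.8
```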
Neural-based Approaches
Why Neural?
Neural-based Milestones for NLP
Neural Language Model
Multitask Learning
Word Embedding
Recurrent Neural Networks
RNN Common Architectures
Enhancements to RNNs
Gated Recurrent Unit
Seq2seq Architecture
Seq2seq Limitations
Attention Mechanism
Dynamic Memory Model
Pre-trained Language Model
Pre-trained Neural Language Model
Supervised Learning Problem