NLP_Week_02
Introduction to NLP
Faizad Ullah
Text Normalization
Tokenization
• Before almost any natural language processing of a text, the text has to be normalized, a task called text normalization.
• Linguistic Unit?
• The world has 7097 languages at the time of this writing, according to the online Ethnologue catalog (Simons and Fennig, 2018).
• Code switching: speakers often mix multiple languages within a single utterance.
How many words?
N = number of tokens (instances)
V = vocabulary = set of types
|V| is the size of the vocabulary
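As a minimal sketch (not from the slides), N and |V| for a toy example can be computed with naive whitespace tokenization in Python:

    # Count tokens (N) and types (|V|) with naive whitespace tokenization.
    text = "the cat in the hat"
    tokens = text.split()   # token instances
    types = set(tokens)     # distinct word types
    print(len(tokens))      # N = 5
    print(len(types))       # |V| = 4, since "the" occurs twice

Real tokenizers must also handle punctuation, case, and clitics, which this sketch ignores.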
Issues in Tokenization
Maximum Matching Word Segmentation
• Thecatinthehat → the cat in the hat (a greedy segmenter sketch follows the questions below)
• N = ?
• V = ?
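A sketch of the greedy maximum matching segmenter, assuming a toy dictionary (the dictionary contents are illustrative, not from the slides):

    def max_match(text, dictionary):
        """Greedily take the longest dictionary word at each position."""
        words = []
        i = 0
        while i < len(text):
            for j in range(len(text), i, -1):   # try the longest span first
                if text[i:j] in dictionary:
                    words.append(text[i:j])
                    i = j
                    break
            else:
                words.append(text[i])   # no match: emit a single character
                i += 1
        return words

    print(max_match("thecatinthehat", {"the", "cat", "in", "hat"}))
    # ['the', 'cat', 'in', 'the', 'hat']

Greedy matching works well for languages like Chinese, where words are short, but can fail on English (e.g. "thetabledownthere" → "theta bled own there").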
• he sit <SW> <SW> chair but he like sit <SW> <SW> floor
• N = ?
• V = ?
• Other special tokens include <DATE> (dates) and <UNK> (unknown, out-of-vocabulary words)
Byte-Pair Encoding: A Bottom-up Tokenization Algorithm
Byte-Pair Encoding
• BPE is the subword tokenization algorithm most commonly used by large language models.
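A compact sketch of the BPE token learner, assuming a toy word-frequency corpus (the four words and counts below follow the textbook-style example, but treat them as illustrative):

    from collections import Counter

    def learn_bpe(word_counts, num_merges):
        """Learn BPE merge rules: repeatedly merge the most frequent symbol pair."""
        # start from characters, with an end-of-word marker
        vocab = {tuple(w) + ("</w>",): c for w, c in word_counts.items()}
        merges = []
        for _ in range(num_merges):
            pairs = Counter()
            for symbols, count in vocab.items():
                for pair in zip(symbols, symbols[1:]):
                    pairs[pair] += count
            if not pairs:
                break
            best = max(pairs, key=pairs.get)   # most frequent adjacent pair
            merges.append(best)
            new_vocab = {}
            for symbols, count in vocab.items():   # merge 'best' everywhere
                merged, i = [], 0
                while i < len(symbols):
                    if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                        merged.append(symbols[i] + symbols[i + 1])
                        i += 2
                    else:
                        merged.append(symbols[i])
                        i += 1
                new_vocab[tuple(merged)] = count
            vocab = new_vocab
        return merges

    print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3))
    # [('e', 's'), ('es', 't'), ('est', '</w>')]

The token segmenter then applies these learned merges, in order, to segment new text.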
Sentence Segmentation
• ! and ? are relatively unambiguous
• The period “.” is quite ambiguous; it can mark:
  • a sentence boundary
  • abbreviations like Inc. or Dr.
  • numbers like .02% or 4.3
• Build a binary classifier that:
  • looks at a “.”
  • decides End-of-Sentence / Not-End-of-Sentence
• Classifiers: hand-written rules, regular expressions, or machine learning (see the sketch below)
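A sketch of the hand-written-rules end of that spectrum, with a toy abbreviation list (the list is an illustrative assumption, not from the slides):

    import re

    ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "inc", "etc"}   # toy list

    def is_sentence_boundary(text, i):
        """Classify the '.' at text[i]: End-of-Sentence (True) or not (False)."""
        # not a boundary when followed by a digit, as in 4.3
        if i + 1 < len(text) and text[i + 1].isdigit():
            return False
        # not a boundary after a known abbreviation, as in Dr. or Inc.
        match = re.search(r"(\w+)\.$", text[: i + 1])
        if match and match.group(1).lower() in ABBREVIATIONS:
            return False
        return True

    s = "Dr. Smith paid 4.3 dollars. He left."
    print([i for i, ch in enumerate(s) if ch == "." and is_sentence_boundary(s, i)])
    # [26, 35] -- only the periods after "dollars" and "left"

A machine-learned classifier would replace these rules with features (case of the following word, word length, etc.) feeding, say, a decision tree.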
Determining if a Word is End-of-Sentence
Language Models
• A language model (LM) is a machine learning model that predicts upcoming words.
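For example, a minimal count-based bigram model (a sketch over a made-up toy corpus) predicts the next word from the previous one:

    from collections import Counter

    corpus = "the cat sat on the mat the cat ate".split()   # toy corpus

    bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
    contexts = Counter(corpus[:-1])              # counts of each context word

    def p_next(word, prev):
        """P(word | prev), estimated by relative bigram frequency."""
        return bigrams[(prev, word)] / contexts[prev]

    print(p_next("cat", "the"))   # 2/3: "the" is followed by "cat" 2 of 3 times

Modern LMs replace these counts with neural networks, but the prediction task is the same.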
Probability
• Example: roll two fair dice; what is the probability that they sum to 3?
• (1,2) and (2,1) are the only two out of 36 possibilities that sum to 3, so the probability is 2/36 = 1/18.
• In general, P(A) = the number of elements in A divided by the number of elements in U (the universe).
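A quick sanity check by enumeration (a sketch, assuming two fair six-sided dice):

    outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
    favorable = [o for o in outcomes if sum(o) == 3]
    print(favorable)                        # [(1, 2), (2, 1)]
    print(len(favorable) / len(outcomes))   # 2/36 ≈ 0.056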
Conditional Probability
• Let’s say there is a new screening test that is supposed to measure something.
• That test will be “positive” for some people, and “negative” for others.
• If we take the event B to be “people for whom the test is positive,” what is the probability that the test will be “positive” for a randomly selected person?
The Two Events Jointly
• What happens if we put them together?
• Stated differently: “If we make region B our new universe, what is the probability of A?” We get:
  P(A|B) = P(AB) / P(B)    (Equation 1)
• Symmetrically, making A the new universe:
  P(B|A) = P(AB) / P(A)    (Equation 2)
The Bayes Theorem
• Now we have everything we need to derive Bayes’ theorem. Since Equations 1 and 2 both contain P(AB), putting them together we get:
  P(A|B) = P(B|A) P(A) / P(B)
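As a numeric sketch of Bayes’ theorem applied to the screening-test example (the prevalence, sensitivity, and false-positive rate below are invented for illustration):

    p_a = 0.01              # assumed P(A): prevalence of the condition
    p_b_given_a = 0.90      # assumed P(B|A): test positive given condition
    p_b_given_not_a = 0.05  # assumed P(B|~A): false-positive rate

    # total probability: P(B) = P(B|A) P(A) + P(B|~A) P(~A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

    # Bayes: P(A|B) = P(B|A) P(A) / P(B)
    print(p_b_given_a * p_a / p_b)   # ≈ 0.154

Even with a fairly accurate test, a low prior makes P(A|B) surprisingly small.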