
Module #1

CSE3188:
Natural Language Processing
Contents
• What is NLP?
• History of NLP
• Stages of Processing in NLP

2
What is NLP?
● Branch of Artificial Intelligence

● Combination of linguistics (understanding how languages work) and


computer science (building systems to solve natural language-
related problems).

3
History of NLP

4
The Imitation Game

● Identify the machine / human participant.

5
Georgetown Experiment

● Demonstration of machine translation performed jointly by IBM and
Georgetown University in 1954.
● Involved translation of 60+ sentences from Russian into English.
● Used only 6 grammar rules and 250 lexical items (stems + endings).
● Initially, it attracted substantial government funding for research in MT
and NLP. However, real progress was much slower than expected!

6
Syntactic Structures and Conceptual Dependency Theory

● In 1957, Noam Chomsky published Syntactic Structures, which
revolutionized linguistics and grammar.
○ Chomsky used phrase structure rules to generate new sentences.
○ Gave examples of grammatically correct sentences without any meaning.
Example: “Colourless green ideas sleep furiously”
○ Advocated a separation of syntax from semantics

7
ALPAC Report

● ALPAC (Automatic Language Processing Advisory Committee) was formed in
1964 to evaluate progress in NLP in general and MT in particular.
● It published the ALPAC report in 1966, which was sceptical of the research
done in the previous decade and led to large funding cuts for
computational linguistics.

8
From Rules to Data

● Starting from the 1980s, we have seen a movement from rule-based NLP
systems to statistical systems, driven by the increasing availability of data.
● With data, we can use probability theory to build reasonably robust
systems for language modeling, machine translation, etc.
○ Example: Which one is correct in each pair and why?
■ I saw an elephant. Vs. I saw an equipment.
■ An European war is currently going on. Vs. A European war is currently going on.
■ Tell me something. Vs. Say me something.
● All this is possible because of probability.

9
Example of Machine Translation

● Earlier approach - Rule-based Machine Translation


● Linguists would create rules for the source and target languages.
● People would use dictionaries to map to root words, morphemes, etc.
● Limited in scope. Could not account for many challenges in MT.

10
Example of Machine Translation

● More modern approach - Phrase-based Machine Translation


● Uses a parallel corpus, where sentences in the source language are
mapped to their equivalent in the target language.
● From this, phrases (or n-grams) are mapped from the source to the
target language.
● Example: India’s Prime Minister (EN) <-> Bhaarat ka Pradhaan Mantri (HI)
● Used data to maximize alignment probability, and language modeling to
get correct target language sentences.

11
Stages of NLP

12
Stages of Processing
● Phonetics and phonology
● Morphology
● Lexical Analysis
● Syntactic Analysis
● Semantic Analysis
● Pragmatics and Discourse

13
Challenges Associated with
Phonetics / Speech
● Homophones - Words that sound the same or similar.
○ After Mahatma Gandhi was killed by Godse, India was mourning. However,
that did not stop some kids from playing in a park in the evening. Someone
asked them, “Why are you playing? It is mourning time now.” To which one
of the kids replied, “Sir, it is not morning time; it is evening, and we have
just finished our homework!”
● Word boundary - Where to split the words in speech
○ I got a plate.
○ I got up late.
● Disfluency - ah, um, etc…

14
Morphology
● Word formation from root words and morphemes
○ Eg. singular - plural (teacher + s = teachers), gender (lion + ess = lioness),
tense (listen + ing = listening), etc.
● First step in NLP - extract the morphemes of the given word
● Languages rich in morphology - Dravidian languages (Eg. Kannada,
Tamil, Telugu, etc.)
○ Example: Maadidhanu - Maadu (root verb) + past tense + male singular
● Languages poor in morphology - English
○ Example: Did - Do (root verb) + past tense

15
Lexical Analysis
● Words have different meanings.
● Meanings have different words.
Example:
● Where there’s a will…
● There are many relatives

16
Lexical Disambiguation
● Part of Speech disambiguation.
○ Love (is it a verb (I love to eat sushi) or a noun (God’s love is so wonderful)?)

● Sense disambiguation.
○ Bank (I went to the bank on the river to buy fish. vs. I went to the bank on
the river to withdraw Rs. 1000)

17
Syntactic Analysis
• Consider the sentence “I like mangoes”.
• Its parse tree (using the grammar on the next slide):
  (S (NP (N I)) (VP (V like) (NP (N mangoes))))

18
Syntactic Analysis
● S -> NP VP
● NP -> N
● VP -> V NP
● N -> Noun (mangoes) / Pronoun (I)
● V -> Verb (like)

19
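The toy grammar above can be tried out directly. A minimal sketch, assuming NLTK is installed; the grammar string simply mirrors the rules on this slide.

```python
# Parse "I like mangoes" with the toy grammar from the slide (assumes NLTK).
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N
VP -> V NP
N -> 'I' | 'mangoes'
V -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I like mangoes".split()):
    print(tree)   # (S (NP (N I)) (VP (V like) (NP (N mangoes))))
```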
Ambiguity in Parsing
● Natural Language Ambiguity:
I saw a boy with a telescope.
(Who has the telescope?)
● Design Ambiguity:
I saw a boy with a telescope which I dropped. Vs. I saw a boy with a
telescope which he dropped.
(Will the same parse tree be generated using probability?)

20
Semantic Analysis
● Semantic Analysis involves assigning semantic roles to entities in the
text.
Example: John gave the book to Mary.
Agent: John, Recipient: Mary, Object / Theme: the book, etc.
● Semantic ambiguity:
Example: Visiting people involves a lot of work. (Is it the act of visiting people, or the people who are visiting?)

21
Pragmatics and Discourse
● Study of contexts in which language is used.
○ Example: Coreference Resolution.
● Very hard problem. Requires successful (or satisfactory) solutions of
previous problems.
● Disambiguation clues need not be present within the same
sentence, but can be present anywhere in the text!

22
Contents
• Text Classification

23
What is Text Classification?
• Classifying the text (or a part of it) into distinct classes.
• Which of the following are examples of text classification?
• Machine Translation
• Sentiment Analysis
• Part-of-Speech Tagging
• Named Entity Recognition
• Automatic Essay Grading
• Natural Language Generation
• MCQ Comprehension and Question Answering
• ………
24
Text Classification: definition
• Input:
• a document d
• a fixed set of classes C = {c1, c2,…, cJ}

• Output: a predicted class c ∈ C


Classification Methods: Hand-coded rules
• Rules based on combinations of words or other features
• spam: black-list-address OR (“dollars” AND “you have been selected”)
• Accuracy can be high
• If rules carefully refined by expert
• But building and maintaining these rules is expensive
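As a hedged illustration, the spam rule above can be written directly in code; the black-list address and the helper name is_spam are assumptions for this sketch, not part of any real system.

```python
# Hand-coded rule classifier: a sketch of the spam rule from the slide above.
# BLACKLIST and the keyword patterns are purely illustrative assumptions.
BLACKLIST = {"offers@cheap-deals.example"}

def is_spam(sender: str, text: str) -> bool:
    text = text.lower()
    # spam: black-list-address OR ("dollars" AND "you have been selected")
    return (sender in BLACKLIST) or ("dollars" in text and "you have been selected" in text)

print(is_spam("offers@cheap-deals.example", "Hello"))                        # True
print(is_spam("a@b.example", "You have been selected to win 1000 dollars"))  # True
print(is_spam("a@b.example", "Lecture notes attached"))                      # False
```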
Classification Methods:
Supervised Machine Learning

• Input:
• a document d
• a fixed set of classes C = {c1, c2,…, cJ}
• A training set of m hand-labeled documents (d1,c1),....,
(dm,cm)
• Output:
• a learned classifier γ: d → c

27
Classification Methods:
Supervised Machine Learning

• Any kind of classifier


• Naïve Bayes
• Logistic regression
• Neural networks
• k-Nearest Neighbors
•…
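A minimal sketch of supervised text classification with one of the classifiers listed above (Naive Bayes), assuming scikit-learn is available; the tiny training set is purely illustrative.

```python
# Supervised text classification sketch: bag-of-words features + Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = ["loved the pie, delicious crust", "great apple pie",
              "terrible service, cold food", "the movie was boring"]
train_labels = ["pie", "pie", "not_pie", "not_pie"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_docs, train_labels)           # learn γ: d -> c from labelled data
print(clf.predict(["this pie is delicious"]))   # ['pie']
```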
Document Representations
• Feature-based representation
• Bag of words representation
• Kernel-based representation
• Embedding-based representation

29
Feature-based Representation
• Converts the text into a vector of features.
• Example features:
• Length-based features
• Average word length, average sentence length, no. of syllables per word, etc.
• PoS-based features
• Proportion of nouns, verbs, adverbs, adjectives, punctuations, etc.
• Syntactic-based features
• Average parse-tree depth, No. of SBARs, etc.
• Coherence-based features
• Entity grid features…

30
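A minimal sketch of computing a few of the length-based features listed above, in plain Python; whitespace tokenisation and splitting sentences on “.” are simplifying assumptions.

```python
# Length-based features for a document (simplified tokenisation).
def length_features(text: str) -> dict:
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.replace(".", " ").split()
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "num_words": len(words),
    }

print(length_features("I like mangoes. Mangoes are tasty fruits."))
```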
The Bag of Words Representation

31
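A minimal sketch of the bag-of-words representation, assuming scikit-learn; the two example documents are illustrative.

```python
# Bag of words: each document becomes a vector of word counts.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love this pie", "this pie is too sweet, too sweet for me"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())   # the learned vocabulary
print(X.toarray())                          # one count vector per document
```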
Kernel Representation
• Eg. String Kernel
• A string kernel is a similarity function between two strings.
• Eg. Histogram Intersection String Kernel (HISK), which sums, over all
character n-grams g, the smaller of the two occurrence counts:
  k(s, t) = Σ_g min(count_s(g), count_t(g))
• The kernel is then normalized as follows:
  k'(s, t) = k(s, t) / √( k(s, s) · k(t, t) )

32
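A minimal sketch of the Histogram Intersection String Kernel over character n-grams, with the normalisation shown above; the choice of n = 2 and the example strings are assumptions for illustration.

```python
# Histogram Intersection String Kernel (HISK) over character bigrams.
from collections import Counter
from math import sqrt

def ngram_counts(s: str, n: int = 2) -> Counter:
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def hisk(s: str, t: str, n: int = 2) -> float:
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    return sum(min(cs[g], ct[g]) for g in cs)   # histogram intersection

def hisk_normalized(s: str, t: str, n: int = 2) -> float:
    return hisk(s, t, n) / sqrt(hisk(s, s, n) * hisk(t, t, n))

print(hisk_normalized("mangoes are tasty", "mangoes taste great"))
```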
Embedding Representation
• Generate an embedding-based representation of the document.
• Eg. Generating an embedding-based representation of an essay in order to
grade it.

33
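A minimal sketch of one embedding-based representation: averaging token embeddings. The 3-dimensional lookup table is purely illustrative; a real system would use pretrained word or sentence embeddings.

```python
# Document embedding as the average of its token embeddings (toy vectors).
import numpy as np

toy_embeddings = {
    "i":       np.array([0.1, 0.3, 0.0]),
    "like":    np.array([0.4, 0.1, 0.2]),
    "mangoes": np.array([0.2, 0.5, 0.7]),
}

def document_embedding(text: str) -> np.ndarray:
    vectors = [toy_embeddings[w] for w in text.lower().split() if w in toy_embeddings]
    return np.mean(vectors, axis=0)

print(document_embedding("I like mangoes"))
```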
Evaluation
• Let's consider just binary text classification tasks
• Imagine you're the CEO of Delicious Pie Company
• You want to know what people are saying about your pies
• So you build a "Delicious Pie" tweet detector
• Positive class: tweets about Delicious Pie Co
• Negative class: all other tweets
The 2-by-2 confusion matrix

                    gold positive      gold negative
system positive     true positives     false positives
system negative     false negatives    true negatives
Evaluation: Accuracy
• Why don't we use accuracy as our metric?
• Imagine we saw 1 million tweets
• 100 of them talked about Delicious Pie Co.
• 999,900 talked about something else
• We could build a dumb classifier that just labels every tweet "not about
pie"
• It would get 99.99% accuracy!!! Wow!!!!
• But useless! Doesn't return the comments we are looking for!
• That's why we use precision and recall instead
Evaluation: Precision
• % of items the system detected (i.e., items the system labeled as
positive) that are in fact positive (according to the human gold labels)
• Precision = true positives / (true positives + false positives)
Evaluation: Recall
• % of items actually present in the input that were correctly identified
by the system.
• Recall = true positives / (true positives + false negatives)
Why Precision and recall
• Our dumb pie-classifier
• Just label nothing as "about pie"
Accuracy=99.99%
but
Recall = 0
• (it doesn't get any of the 100 Pie tweets)
Precision and recall, unlike accuracy, emphasize true positives:
• finding the things that we are supposed to be looking for.
A combined measure: F
• F measure: a single number that combines P and R:
  F_β = (β² + 1) · P · R / (β² · P + R)
• We almost always use balanced F1 (i.e., β = 1):
  F1 = 2 · P · R / (P + R)
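A minimal sketch of computing precision, recall and F1 for the pie-detector example; the confusion-matrix counts below are assumed for illustration.

```python
# Precision, recall and F1 from confusion-matrix counts (illustrative values).
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

tp, fp, fn = 70, 10, 30     # e.g. 70 pie tweets found, 10 false alarms, 30 missed
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f1(p, r))       # 0.875 0.7 0.777...
```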


Development Test Sets ("Devsets") and
Cross-validation

Training set | Development (Dev) Set | Test Set

• Train on training set, tune on devset, report on testset


• This avoids overfitting (‘tuning to the test set’)
• More conservative estimate of performance
• But paradox: want as much data as possible for training, and as much for dev; how to
split?
Cross-validation: multiple splits
• Pool results over the splits and compute pooled dev performance
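A minimal sketch of k-fold cross-validation for the Naive Bayes text classifier from earlier, assuming scikit-learn; the eight labelled examples and the 4-way split are illustrative assumptions.

```python
# Cross-validation: pool scores over multiple train/dev splits.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["great pie", "delicious crust", "lovely apple pie", "best pie ever",
        "boring movie", "bad service", "terrible weather", "cold food"]
labels = ["pie", "pie", "pie", "pie", "not_pie", "not_pie", "not_pie", "not_pie"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(clf, docs, labels, cv=4)   # 4 splits
print(scores, scores.mean())                        # per-split and pooled accuracy
```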
Contents
• Introduction to PoS Tagging

43
Part-of-Speech Tagging
• Involves tagging each token with a part-of-speech (Eg. noun).
• Let’s say that we have only 6 tags – noun (NN), verb (VB), adjective (JJ),
adverb (RB), function word (FW) to represent all other words, and
punctuation (.) to represent all punctuation marks.
• Consider the following sentence.
• The quick brown fox jumped over the lazy dog.
• The tagged sentence is
• The_FW quick_JJ brown_JJ fox_NN jumped_VB over_FW the_FW lazy_JJ
dog_NN ._.

44


Calculation of Part-of-Speech Tags
• Find the best tag sequence T*, given the word sequence W:
  T* = argmax_T P(T | W)
• By Bayes’ theorem (dropping the constant P(W)):
  T* = argmax_T P(W | T) · P(T)
• We get (bigram assumption):
  P(T) ≈ P(t1) · Π P(ti | ti-1)   and   P(W | T) ≈ Π P(wi | ti)
• Here, P(t1) is the Initial Probability,
• P(ti | ti-1) is the Transition Probability,
• Similarly, P(wi | ti) is the Lexical Probability.

45


Training a Part-of-Speech Tagger
• 1. Use a part-of-speech tagged corpus (Eg. Brown Corpus)
• 2. For a set of T tags and a vocabulary of size V, learn the following
tables.
• Initial Probability table: |T| × 1
• Transition Probability table: |T| × |T|
• Lexical Probability table: |V| × |T|

46
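A minimal sketch of learning the three tables by relative-frequency counting over a toy tagged corpus; the two tagged sentences are illustrative (a real tagger would train on e.g. the Brown Corpus), and end-of-sentence handling is simplified.

```python
# Estimate initial, transition and lexical probability tables from tagged data.
from collections import Counter

corpus = [
    [("the", "FW"), ("dog", "NN"), ("barked", "VB"), (".", ".")],
    [("the", "FW"), ("lazy", "JJ"), ("dog", "NN"), ("slept", "VB"), (".", ".")],
]

initial, transition, lexical, tag_totals = Counter(), Counter(), Counter(), Counter()
for sentence in corpus:
    initial[sentence[0][1]] += 1
    for i, (word, tag) in enumerate(sentence):
        lexical[(word, tag)] += 1
        tag_totals[tag] += 1
        if i > 0:
            transition[(sentence[i - 1][1], tag)] += 1

# Relative-frequency estimates of entries in the three tables.
print(initial["FW"] / len(corpus))                   # P(t1 = FW)
print(transition[("JJ", "NN")] / tag_totals["JJ"])   # P(NN | JJ)
print(lexical[("dog", "NN")] / tag_totals["NN"])     # P(dog | NN)
```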


Common Tagsets
• Penn Treebank P.O.S. Tags (upenn.edu)
• BNC Tagset
• BNC: The BNC Basic (C5) Tagset (ox.ac.uk)
• BNC: List of Tags in the BNC Enriched Tagset (ox.ac.uk)
• Universal POS tags (universaldependencies.org)

47


Challenges of PoS Tagging
• Unknown Words are words which are not present at the time of
training.
• How to handle them?
• Solution:
• Consider a tag set of size |T|.
• Assign the unknown word the same lexical probability for every tag, i.e.
P(word | tag) = 1 / |T| (a uniform assumption over tags).

48


Evaluation of PoS Tagging
• Evaluation Method: Train – Test set / n-fold cross-validation
• Evaluation Metric: Accuracy / Precision, Recall, F-Score / Kappa
• Accuracy = No. of correctly tagged tokens / Total tokens
• Precision(T) = No. of times T is correctly tagged / No. of times a word
was tagged T
• Recall(T) = No. of times T is correctly tagged / No. of times the tag T is
present in the test set

49


Contents
• Named Entity Recognition

50
Named Entities
• Named entities are anything that can be referred to with a proper
name.
• Multiple class problem
• 3 classes – PER (person), LOC (location), ORG (organization)
• 4 classes – PER (person), LOC (location), ORG (organization), GPE (geo-political
entity)
• More classes – PER (person), LOC (location), ORG (organization), GPE (geo-
political entity) + classes for dates, times, numbers, prices, etc.
• Named entities are often multi-word phrases

51
Examples of Named Entities
Class Examples
Person Sandeep Mathias
Location Bengaluru
Organization Presidency University
Geo-political Entity Prime Minister of India

52
Named Entity Tagging
• The task of Named Entity Recognition (NER):
• Find spans of text that constitute a named entity.
• Tag the entity with the proper NER class.

53
NER Input
• Citing high fuel prices, United Airlines said Friday it has increased
fares by $6 per round trip on flights to some cities also served by
lower-cost carriers.
• American Airlines, a unit of AMR Corp., immediately matched the
move, spokesman Tim Wagner said.
• United, a unit of UAL Corp., said the increase took effect Thursday and
applies to most routes where it competes against discount carriers,
such as Chicago to Dallas and Denver to San Francisco.

54
NER – Finding NER Spans
• Citing high fuel prices, [United Airlines] said [Friday] it has increased
fares by [$6] per round trip on flights to some cities also served by
lower-cost carriers.
• [American Airlines], a unit of [AMR Corp.], immediately matched the
move, spokesman [Tim Wagner] said.
• [United], a unit of [UAL Corp.], said the increase took effect
[Thursday] and applies to most routes where it competes against
discount carriers, such as [Chicago] to [Dallas] and [Denver] to [San
Francisco].

55
NER Output
• Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has
increased fares by [MONEY $6] per round trip on flights to some cities
also served by lower-cost carriers.
• [ORG American Airlines], a unit of [ORG AMR Corp.], immediately
matched the move, spokesman [PER Tim Wagner] said.
• [ORG United], a unit of [ORG UAL Corp.], said the increase took effect
[TIME Thursday] and applies to most routes where it competes against
discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver]
to [LOC San Francisco].

56
Why NER?
• Sentiment analysis: consumer’s sentiment toward a particular
company or person?
• Question Answering: answer questions about an entity?
• Information Extraction: Extracting facts about entities from text

57
Why NER is not so easy
• Segmentation
• In PoS tagging, no segmentation, since each word gets 1 tag.
• In NER, we have to find the span before adding the tags!
• Type Ambiguity
• Multiple types can map to same span.
• [Washington] was born into slavery on the farm of James Burroughs.
• [Washington] went up 2 games to 1 in the four-game series.
• Blair arrived in [Washington] for what may well be his last state visit.
• In June, [Washington] legislators passed a primary seatbelt law.

58
Why NER is not so easy
• Segmentation
• In PoS tagging, no segmentation, since each word gets 1 tag.
• In NER, we have to find the span before adding the tags!
• Type Ambiguity
• Multiple types can map to same span.
• [PER Washington] was born into slavery on the farm of James Burroughs.
• [ORG Washington] went up 2 games to 1 in the four-game series.
• Blair arrived in [LOC Washington] for what may well be his last state visit.
• In June, [GPE Washington] legislators passed a primary seatbelt law.

59
Contents
• Statistical Machine Translation

60
Machine Translation
• Conversion of a text from one language to another using computers.
• Example: Translation from English to French.
• Input language is also known as source language, and output /
translated language is also known as target language.

61
Examples of MT
• Conversion from English to French
• SRC: The three rabbits of Grenoble
• TGT: Les trois lapins de Grenoble
• Conversion from English to Simple English
• SRC: Students should not procrastinate their assignment submissions.
• TGT: Students should not delay submitting their assignments.
• Conversion from Spoken English to English
• SRC: I bought a car for Rupees five lakhs.
• TGT: I bought a car for Rs. 5,00,000.
• Code mixing
62
Machine Translation Paradigms
• Rule-based MT: Using linguistic rules to perform translation
• Example: Plurals end with an “s” in English. Hence, “Hudugaru” (plural of
“Huduga” in Kannada) = “Boys” in English.
• Example-based MT: Translation by analogy
• Statistical-based MT: Using source – target language pairs / parallel
corpus to learn alignments.
• Neural MT: Uses an encoder-decoder architecture to learn
representations of the source and map it to the target language
representations.

63
Challenges of MT
• Ambiguity
• Same word, multiple meanings
• Same meaning, multiple words
• Word Order
• SOV to SVO?
• Morphological Richness
• Challenging for SMT systems!

64
Problems with Rule-based MT
• Requires linguistic knowledge of both languages.
• Maintenance of the system is challenging
• Difficult to handle ambiguity
• Scaling is difficult!

65
Statistical MT
• Model translation using a probabilistic model.
• Measure of confidence in the translations
• Modeling uncertainty in translations
• Using argmax:
• e* = argmax_e P(e | f)
• e* = best translation
• e = target language text
• f = source language text

66
Word Alignment
• Given a parallel corpus, we find word-level alignments
• Example:
• English: Narendra Modi is the Prime Minister of India
• Hindi: Bharat ke Pradhan Mantri, Narendra Modi Hain.
• Alignments:
• Narendra Modi (English) -> Narendra Modi (Hindi)
• Prime Minister of India -> Bharat ke Pradhan Mantri
• Is (English) -> Hain
• Prime Minister (English) -> Pradhan Mantri
• of India (English) -> Bharat ke
• ………
• ………
67
Word Alignment
• There can be multiple possible alignments.
• Example: Prime Minister -> Bhaarat ke (?)
• Another example: Narendra Modi -> Bhaarat ke (?)
• With one sentence pair, we cannot find alignments properly!
• We need a parallel corpus to find alignments using co-occurrence of
words.

68
Example of Word Alignment
• Consider a parallel corpus with 2 sentences:
• S1: “Three rabbits” = “Trois lapins”
• S2: “The rabbits of Grenoble” = “Les lapins de Grenoble”
• What all words can be aligned?
• What about “The rabbits of Bengaluru”?

69
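A minimal sketch of scoring candidate alignments by co-occurrence counts over the toy parallel corpus above; real SMT systems learn alignments with EM (e.g. IBM Model 1), so this is only an approximation.

```python
# Score candidate word alignments by relative co-occurrence in a parallel corpus.
from collections import Counter

parallel = [
    ("three rabbits", "trois lapins"),
    ("the rabbits of grenoble", "les lapins de grenoble"),
]

cooc, src_counts = Counter(), Counter()
for en, fr in parallel:
    for e in en.split():
        src_counts[e] += 1
        for f in fr.split():
            cooc[(e, f)] += 1

# "rabbits" co-occurs with "lapins" in both sentence pairs, so "lapins" gets
# the highest relative co-occurrence score among its candidates.
for f in ["trois", "lapins", "les", "de", "grenoble"]:
    print("rabbits ->", f, cooc[("rabbits", f)] / src_counts["rabbits"])
```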
Phrase Table
• Table of probabilities of phrases
• Phrase table is learnt with word alignments

English                   Hindi                                 Probability
Prime Minister of India   Bhaarat ke Pradhan Mantri             0.75
Prime Minister of India   Bhaarat ke Bhootpurv Pradhan Mantri   0.02
Prime Minister of India   Pradhan Mantri                        0.23

70
Challenges in PBSMT
• Divergent Word Order
• Rich morphology
• Named entities and OOV words
• To be covered in the last module…

71
