Hi, I'm Dan Jurafsky, and Chris Manning
and I are very happy to welcome you to our
course on natural language processing. This is a particularly exciting time to be working on natural language processing. The vast amount of data on the Web and in social media has made it possible to build fantastic new applications. Let's look at one of them: question answering. You may know that IBM's Watson won the Jeopardy! challenge on February 16, 2011, answering questions like "William Wilkinson's book inspired this author's most famous novel." And you may know that the answer is Bram Stoker, who famously wrote Dracula.

Another important task is information extraction. For example, imagine that I have the following email from my colleague Chris about scheduling a meeting. We'd like software to automatically notice that there are dates, like tomorrow; times, like 10 to 11:30; and a room, like Gates 159; extract that information; create a new calendar entry; and then populate a calendar with this kind of structured information, with the event, date, start, and end, for a calendar program. And modern email and calendar programs are capable of doing this from text.

Another application of this kind of information extraction involves sentiment analysis. Imagine that you're interested in cameras and you're reading a lot of reviews of cameras on the web, so here's a bunch of reviews. We'd like to automatically determine, from the reviews, what people care about in cameras: particular attributes. If they're buying a camera, they want to know if it has good zoom, or affordability, or size and weight. So we want to automatically determine those attributes. And then, for any particular attribute, we'd like to automatically determine how the reviewers felt about it. For example, if a reviewer said "nice and compact to carry," that's a positive sentiment, and here's another positive example. But a phrase like "flimsy" is a negative sentiment. So we'd like to automatically detect, for each sentence, what the sentiment is, and then aggregate for each feature, for, say, zoom or affordability. So we might decide that for this camera the reviewers really liked the flash, but they weren't so happy about the ease of use. We might measure the positive and negative sentiment about each attribute and then aggregate those.

Machine translation is another important application, and machine translation can be fully automatic. So, for example, we might have a source sentence in Chinese, and here's Stanford's Phrasal MT system translating that into English. But MT can also be used to help human translators. So here we might have an Arabic text, and the human translator translating it into English might need some help from the MT system, for example a list of possible next words that the MT system can build automatically to help the human translator.

Let's look at the state of the art in language technology. Like every field, NLP is divided up into specialties and sub-specialties. A number of these problems are pretty close to solved. So, for example, spam detection: while it's very hard to completely eliminate spam from our email boxes, we don't have 99 percent spam, and that's because spam detection is a relatively easy classification task. A couple of important component tasks, part-of-speech tagging and named entity tagging, which we'll talk about later in the course, also work at pretty high accuracies. We're going to get 97 percent accuracy in part-of-speech tagging, and we'll see how that's important for parsing.
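To make the information-extraction idea concrete, here is a minimal sketch in Python. The email text, the regular-expression patterns, and the field names are made up for illustration; this is not how any particular mail program actually does it. It just spots a date word, a time range, and a room, and builds a structured calendar entry:

    # Minimal information-extraction sketch: pull a date word, a time range,
    # and a room out of an email and build a structured calendar entry.
    # The email text and the patterns are invented for illustration only.
    import re

    email = "Hi Dan, can we meet tomorrow from 10 to 11:30 in Gates 159?"

    date_match = re.search(r"\b(today|tomorrow|Monday|Tuesday|Wednesday|Thursday|Friday)\b",
                           email, re.IGNORECASE)
    time_match = re.search(r"\b(\d{1,2}(?::\d{2})?)\s*to\s*(\d{1,2}(?::\d{2})?)\b", email)
    room_match = re.search(r"\b(Gates\s*\d+)\b", email)

    event = {
        "event": "meeting",
        "date":  date_match.group(1) if date_match else None,
        "start": time_match.group(1) if time_match else None,
        "end":   time_match.group(2) if time_match else None,
        "room":  room_match.group(1) if room_match else None,
    }
    print(event)
    # {'event': 'meeting', 'date': 'tomorrow', 'start': '10', 'end': '11:30', 'room': 'Gates 159'}

Real systems use far richer patterns and statistical models, but the output is the same kind of structured record: event, date, start, end, room.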
In other tasks, we're making good progress. Not as commercial, not as completely solved, but there are systems out there that are being used. So we talked about sentiment analysis, the task of deciding thumbs up or thumbs down on a sentence or a product. There are component technologies like word sense disambiguation: deciding whether we're talking about a rodent or a computer mouse when people search for "mouse." We'll talk about parsing, which is good enough now to be used in lots of applications, and machine translation, usable on the web.

A number of applications, however, are still quite hard. So, for example, answering hard questions, like how effective this medicine is in treating that disease, by looking at the web or by summarizing information, we know is quite hard. Similarly, while we've made some progress on deciding that the sentence "XYZ company acquired ABC company yesterday" means something similar to "ABC has been taken over by XYZ," the general problem of detecting that two phrases or sentences mean the same thing, the paraphrase task, is still quite hard. Even harder is the task of summarization: reading a number of, let's say, news articles that say that the Dow Jones is up, or the S&P 500 has jumped, and housing prices rose, and aggregating that to give the user information like, in summary, the economy is good. And finally, one of the hardest tasks in natural language processing: carrying on a complete human-machine communication in dialogue. So here's a simple example, asking what movie is playing when and buying movie tickets, and you can get applications that do that today. But the general problem of understanding everything the user might ask for, and returning a sensible response, is quite difficult.

Why is natural language processing so difficult? One cute example is the kind of ambiguity problem called a crash blossom. Ambiguity is any case where a surface form might have multiple interpretations. A crash blossom is the name for a kind of headline that has two meanings, and the ambiguity causes a humorous interpretation. So, reading this first headline, "Violinist Linked to JAL Crash Blossoms," you might think that the main verb is linked, and the violinist is being linked to what? He's being linked to Japan Airlines crash blossoms. Well, what are crash blossoms? This headline gave the phenomenon its name because, in the interpretation the headline writer intended, the main verb was blossoms. Who does the blossoming? The violinist, and the fact about being linked to the JAL crash is a modifier of violinist. There are similar kinds of syntactic ambiguities. So here, "Teacher Strikes Idle Kids": the writer intended the main verb to be idle, the strikes caused the kids to be idle, but of course the humorous interpretation is that the teacher is striking. Strike is the verb, and we have a teacher striking idle kids. Another important kind of ambiguity is word sense ambiguity. So in our third example, "Red Tape Holds Up New Bridges," the writer intended holds up to mean something like delay; call that sense one of holds up. But the amusing interpretation is the second sense of holds up, which we might write down as to support. And now we get the interpretation that literal red tape, as opposed to bureaucratic red tape, is actually supporting a bridge. And we can see lots of other kinds of ambiguities in these actual headlines. Now, it turns out that it's not just amusing headlines that have ambiguity.
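To make the word sense disambiguation idea concrete, here is a minimal dictionary-overlap sketch in the spirit of the Lesk algorithm: pick the sense whose gloss shares the most words with the surrounding context. The two glosses and the example queries are made up for illustration; real systems use much richer features and training data.

    # Minimal Lesk-style word sense disambiguation sketch for "mouse":
    # choose the sense whose (made-up) gloss overlaps most with the context.
    SENSES = {
        "rodent": "small furry animal with a long tail that eats cheese",
        "device": "hand held pointing device used to move a cursor on a computer screen",
    }

    def disambiguate(context, senses=SENSES):
        context_words = set(context.lower().split())
        # Score each sense by how many words its gloss shares with the context.
        overlap = {
            sense: len(context_words & set(gloss.split()))
            for sense, gloss in senses.items()
        }
        return max(overlap, key=overlap.get)

    print(disambiguate("wireless mouse for my computer"))            # -> device
    print(disambiguate("the cat chased a mouse with a long tail"))   # -> rodent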
Ambiguity is pervasive throughout natural language text. Let's look at a sensible, non-ambiguous-looking headline from the New York Times. The headline, shortened here a bit, is "Fed raises interest rates," and that seems unambiguous. We have a verb here, raises. What gets raised? A noun phrase, interest rates. And we have a verb phrase, raises interest rates, and then we have the Fed, which makes a little noun phrase. And then we say: this is a sentence that has a noun phrase, Fed, and a verb phrase, raises, and what gets raised is interest rates. This is called a phrase structure parse; we'll talk about phrase structure later in the course. We could also draw a dependency parse. We say the head verb, raises, has an argument, which is Fed, and has another dependent, which is rates. And rates itself has a dependent, interest. So we can see the main verb is raises.

Well, another interpretation of the very same sentence, one that people don't see but that parsers see right away, is that it's not raises that's the main verb of the sentence, but interest. Somebody interests something, and the something that gets interested is rates. And what is doing the interesting of these rates? Well, it's Fed raises, raises by the Fed. So it's a completely different sentence with a different interpretation, that something is interesting the rates, whatever that could mean, and it seems an unlikely interpretation for people. But of course, for a parser, this is a perfectly reasonable interpretation that we have to learn how to rule out. In fact, the sentence can get even more difficult. The actual headline was somewhat longer: "Fed raises interest rates half a percent." Here we could imagine that rates is the verb, and now what does the rating? Fed raises interest, the interest in Fed raises, is rating half a percent, so we might have a dependency structure like this: again, rates is the verb, the interest is what does the rating, the raises are what the interest is in, and the Fed is a modifier of raises. So, whether with our phrase structure parse or our dependency parse, and even more so as we add more words, we get more and more ambiguities that have to be resolved in order to build a parse for each sentence.

Now, about the format of the course: you're going to have in-video quizzes, and most lectures will include a little quiz. They're there just to check basic understanding; they're simple multiple choice questions, and you can retake them if you get them wrong. Let's see one right now.

A number of other things make natural language understanding difficult. One of them is the non-standard English that we frequently see in text like Twitter feeds, where we have unusual capitalization and unusual spelling of words, and hashtags and user IDs and so on. All of the parsers and part-of-speech taggers that we're going to make use of are often trained on very clean newspaper English, but the actual English in the wild will cause us a lot of problems. We'll also have a lot of segmentation problems. For example, if we see the string y-o-r-k dash n-e-w as part of "New York-New Haven," how do we know the correct segmentation is New York and New Haven, the New York-New Haven railroad, and not something like York-New? York-New is not a word here, the way a hyphenated word like in-law is. We have to solve the segmentation problem correctly. We also have problems with idioms, and with new words that haven't been seen before.
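To see this ambiguity the way a parser does, here is a minimal sketch using NLTK's chart parser with a toy grammar I've made up for illustration; it assumes the nltk package is installed, and it is not the course's grammar or parser. Both the intended reading, with raises as the verb, and the odd reading, with interest as the verb, come out:

    # Toy grammar that licenses both readings of "Fed raises interest rates".
    import nltk

    grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> N | N N
    VP -> V NP
    N  -> 'Fed' | 'raises' | 'interest' | 'rates'
    V  -> 'raises' | 'interest'
    """)

    parser = nltk.ChartParser(grammar)
    tokens = "Fed raises interest rates".split()

    # Prints two trees: one with 'raises' as the verb, one with 'interest'.
    for tree in parser.parse(tokens):
        print(tree)

A statistical parser's job is exactly to learn to prefer the first of these trees over the second.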
And we'll also have problems with entity names, like the movie A Bug's Life, which has English words in it, so it's often difficult to know where the movie name starts and ends. And this comes up very often in biology, where we have genes and proteins named with English words.

The task of natural language understanding is very difficult. What tools do we need? Well, we need knowledge about language, knowledge about the world, and a way to combine these knowledge sources. Generally, the way we do this is to use probabilistic models built from language data. So, for example, if we see the word maison in French, we are very likely to translate it as the word house in English. On the other hand, if we see the phrase avocat général in French, we are very unlikely to translate it as "the general avocado." Training these probabilistic models in general can be very hard, but it turns out that we can do an approximate job with rough text features, and we'll introduce those rough text features as we go on.

So our goal in the class is teaching key theory and methods for statistical natural language processing. We'll talk about the Viterbi algorithm, Naive Bayes, and MaxEnt classifiers. We'll introduce N-gram language modeling and statistical parsing. We'll talk about the inverted index, TF-IDF, and vector models of meaning that are important in information retrieval. And we'll do this for practical, robust, real-world applications: we'll talk about information extraction, about spelling correction, about information retrieval. As for the skills you'll need for the class: you'll need simple linear algebra, so you should know what a vector is and what a matrix is; you should have some basic probability theory; and you need to know how to program in either Java or Python, because there'll be weekly programming assignments, and you have your choice of languages. We're very happy to welcome you to our course on Natural Language Processing, and we look forward to seeing you in the following lectures.
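As a small taste of the kind of probabilistic model the course builds toward, here is a minimal N-gram language modeling sketch: bigram probabilities estimated from counts, with add-one smoothing. The two-sentence corpus and the smoothing choice are illustrative assumptions, not the course's exact formulation.

    # Minimal bigram language model with add-one (Laplace) smoothing.
    # The tiny corpus below is made up purely for illustration.
    from collections import Counter

    corpus = [
        "the Fed raises interest rates".split(),
        "the Fed raises rates again".split(),
    ]

    unigram_counts = Counter(w for sent in corpus for w in sent)
    bigram_counts = Counter()
    for sent in corpus:
        bigram_counts.update(zip(sent, sent[1:]))

    vocab_size = len(unigram_counts)

    def bigram_prob(prev, word):
        # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

    print(bigram_prob("raises", "interest"))  # seen bigram: relatively likely
    print(bigram_prob("raises", "the"))       # unseen bigram: small but non-zero

N-gram models like this, together with classifiers like Naive Bayes and MaxEnt and algorithms like Viterbi, are the building blocks developed over the coming weeks.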