
INTRODUCTION TO NATURAL LANGUAGE PROCESSING (NLP)

Overview and Key Concepts

Mrs. S. Arunakumari, AP/AI&DS
INTRODUCTION
What is NLP?
NLP is a field of AI that focuses on the interaction
between computers and humans through natural
language.

Importance of NLP
• Enhances human-computer interaction.
• Automates and improves information
extraction.
History of NLP
Early Beginnings
• Roots in computational linguistics.

Evolution Over the Decades
• Transition from rule-based to statistical models to deep learning.
Key Concepts in NLP
Tokenization
• Splitting text into words or sentences.
Stop Words
• Common words filtered out in text processing.
Stemming and Lemmatization
• Reducing words to their base or root form.
Part-of-Speech Tagging
• Identifying grammatical categories.
Named Entity Recognition (NER)
• Detecting proper names in text.
Syntactic Parsing
• Analyzing sentence structure.
Sentiment Analysis
• Determining emotional tone.
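The first two preprocessing steps above, tokenization and stop-word removal, can be sketched in plain Python. The regex and the stop-word list below are illustrative assumptions, not taken from any particular toolkit (real libraries such as NLTK and spaCy ship much larger stop-word lists):

```python
import re

# Hypothetical stop-word list for illustration; real toolkits ship larger ones.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "at"}

def tokenize(text):
    """Split text into lowercase word tokens using a simple regex."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Filter out common function words before further processing."""
    return [t for t in tokens if t not in STOP_WORDS]

text = "The girl laughed at the monkey in the boat."
tokens = tokenize(text)
filtered = remove_stop_words(tokens)
print(tokens)
print(filtered)
```

Filtering stop words shrinks the token list to the content-bearing words, which is often the input to downstream tasks like sentiment analysis.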
NLP Techniques
Rule-Based Approaches
• Handcrafted linguistic rules.
Statistical Methods
• Probabilistic models.
Machine Learning
• Supervised and unsupervised learning.
Deep Learning
• Neural networks and embeddings.
NLP in the Real World
• Email platforms, such as Gmail and Outlook, use NLP extensively to provide a range of product features, such as spam classification, priority inbox, calendar event extraction, and auto-complete.
• Voice-based assistants, such as Apple Siri, Google Assistant, Microsoft Cortana, and Amazon Alexa, rely on a range of NLP techniques to interact with the user, understand user commands, and respond accordingly.
• Modern search engines, such as Google and Bing, which are the cornerstone of today's internet, use NLP heavily for various subtasks, such as query understanding, query expansion, question answering, information retrieval, and ranking and grouping of the results, to name a few.
• Machine translation services, such as Google Translate, Bing Microsoft Translator, and Amazon Translate, are increasingly used in today's world to solve a wide range of scenarios and business use cases.
• Organizations across verticals analyze their social media feeds to build a better and deeper understanding of the voice of their customers.
• NLP is widely used to solve diverse sets of use cases on e-commerce platforms like Amazon. These vary from extracting relevant information from product descriptions to understanding user reviews.
• Advances in NLP are being applied to solve use cases in domains such as healthcare, finance, and law.
• Companies such as Arria [1] are working to use NLP techniques to automatically generate reports for various domains, from weather forecasting to financial services.
• NLP forms the backbone of spelling- and grammar-correction tools, such as Grammarly and the spell checkers in Microsoft Word and Google Docs.
• Jeopardy! is a popular TV quiz show in which contestants are presented with clues in the form of answers and must phrase their responses in the form of questions.
• IBM built the Watson AI to compete with the show's top players. Watson won the first-place prize of a million dollars, beating the world champions. Watson was built using NLP techniques and is one of the examples of NLP systems winning a world competition.

• NLP is used in a range of learning and assessment tools and technologies, such as automated scoring in exams like the Graduate Record Examination (GRE), plagiarism detection (e.g., Turnitin), intelligent tutoring systems, and language learning apps like Duolingo.
• NLP is used to build large knowledge bases, such as the Google Knowledge Graph, which are useful in a range of applications like search and question answering.
NLP Tasks
• There is a collection of fundamental tasks that appear frequently across various NLP projects.
• Language modeling: This is the task of predicting what the next word in a sentence will be, based on the history of previous words.
• The goal of this task is to learn the probability of a sequence of words appearing in a given language. Language modeling is useful for building solutions to a wide variety of problems, such as speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction.
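The idea of learning sequence probabilities can be illustrated with a minimal bigram language model. The toy corpus below and the maximum-likelihood estimate are illustrative assumptions; a real model would be trained on vastly more text and would smooth the counts:

```python
from collections import Counter

# Toy corpus for illustration; real models train on millions of sentences.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

# Count unigram and bigram frequencies across the corpus.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(prev, word):
    """P(word | prev), estimated by maximum likelihood from the corpus."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def predict_next(prev):
    """Return the most likely next word after `prev`."""
    candidates = {w: bigram_prob(prev, w) for w in unigrams}
    return max(candidates, key=candidates.get)

print(bigram_prob("the", "cat"))
print(predict_next("sat"))
```

This captures the core of language modeling, predicting the next word from history, using only a one-word history; neural language models extend the same objective to much longer contexts.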
• Language is a structured system of communication that involves complex combinations of its constituent components, such as characters, words, and sentences. Linguistics is the systematic study of language. In order to study NLP, it is important to understand some concepts from linguistics about how language is structured.
• We can think of human language as composed of four major building blocks: phonemes, morphemes and lexemes, syntax, and context. NLP applications need knowledge of these building blocks at different levels, starting from the basic sounds of language (phonemes) up to texts with meaningful expressions.
Phonemes:
• Phonemes are the smallest units of sound in a language. They may not have any meaning by themselves but can induce meaning when uttered in combination with other phonemes. For example, standard English has 44 phonemes, which are represented by either single letters or combinations of letters.
Morphemes and lexemes:
• A morpheme is the smallest unit of language that has a meaning. It is formed by a combination of phonemes. Not all morphemes are words, but all prefixes and suffixes are morphemes. For example, in the word "multimedia," "multi-" is not a word but a prefix that changes the meaning when put together with "media." "Multi-" is a morpheme.
• Lexemes are the structural variations of morphemes related to one another by meaning. For example, "run" and "running" belong to the same lexeme form. Morphological analysis, which analyzes the structure of words by studying their morphemes and lexemes, is a foundational block for many NLP tasks, such as tokenization, stemming, learning word embeddings, and part-of-speech tagging.
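A toy suffix-stripping stemmer gives a feel for how morphological analysis reduces "running" toward its lexeme. The rule list below is a made-up simplification; real systems use the Porter or Snowball algorithms (available, for example, in NLTK), which have many more rules and exception handling:

```python
# Hypothetical, ordered suffix rules; "ning" precedes "ing" so that
# "running" strips to "run" rather than "runn" in this toy scheme.
SUFFIXES = ["ning", "ing", "ed", "es", "s"]

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

for w in ["running", "jumped", "cats", "run"]:
    print(w, "->", stem(w))
```

Note how crude rule ordering stands in for real morphological knowledge; this is why stemming can produce non-words, while lemmatization uses a vocabulary to return proper base forms.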
Syntax:
• Syntax is a set of rules to construct grammatically correct sentences out of words and phrases in a language. Syntactic structure in linguistics is represented in many different ways. A common approach to representing sentences is a parse tree.
• Consider the sentences "The girl laughed at the monkey" and "The boat sailed up the river": both have a similar structure and hence a similar syntactic parse tree. In this representation, N stands for noun, V for verb, and P for preposition.
• A noun phrase is denoted by NP and a verb phrase by VP. The two noun phrases are "The girl" and "The boat," while the two verb phrases are "laughed at the monkey" and "sailed up the river."
• The syntactic structure is guided by a set of grammar rules for the language (e.g., a sentence comprises an NP and a VP), and this in turn guides some of the fundamental tasks of language processing, such as parsing. Parsing is the NLP task of constructing such trees automatically. Entity extraction and relation extraction are some of the NLP tasks that build on this knowledge of parsing.
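A parse tree like the one described can be encoded as nested tuples in Python. The NP and VP labels follow the text above; the auxiliary labels (S, Det, V, P, PP) are assumed conventions for this sketch:

```python
# Parse tree for "The girl laughed at the monkey", as nested (label, children)
# tuples. S = sentence, NP = noun phrase, VP = verb phrase,
# PP = prepositional phrase, Det = determiner, V = verb, P = preposition.
tree = (
    "S",
    ("NP", ("Det", "The"), ("N", "girl")),
    ("VP",
        ("V", "laughed"),
        ("PP",
            ("P", "at"),
            ("NP", ("Det", "the"), ("N", "monkey")))),
)

def leaves(node):
    """Collect the words at the leaves of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the label; the rest are children
        words.extend(leaves(child))
    return words

sentence = " ".join(leaves(tree))
print(sentence)
```

Reading the leaves left to right recovers the original sentence, which is exactly the invariant a parser must respect when it builds such trees automatically.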
Context:
• Context is how various parts of a language come together to convey a particular meaning. Context includes long-term references, world knowledge, and common sense, along with the literal meaning of words and phrases. The meaning of a sentence can change based on the context, as words and phrases can sometimes have multiple meanings.
• Generally, context is composed of semantics and pragmatics. Semantics is the direct meaning of the words and sentences without external context. Pragmatics adds world knowledge and the external context of the conversation to enable us to infer implied meaning. Complex NLP tasks such as sarcasm detection, summarization, and topic modeling are some of the tasks that use context heavily.
• Linguistics is the study of language and hence is a vast area in itself; we have only introduced some basic ideas to illustrate the role of linguistic knowledge in NLP.
• Different tasks in NLP require varying degrees of knowledge about these building blocks of language.
Why Is NLP Challenging?
What makes NLP a challenging problem domain? The ambiguity
and creativity of human language are just two of the
characteristics that make NLP a demanding area to work in.
Ambiguity:
Ambiguity means uncertainty of meaning. Most human
languages are inherently ambiguous. Consider the following
sentence: “I made her duck.” This sentence has multiple
meanings. The first one is: I cooked a duck for her.
• The second meaning is: I made her bend down to avoid an
object. (There are other possible meanings, too; we’ll leave
them for the reader to think of.) Here, the ambiguity comes
from the use of the word “made.” Which of the two meanings
applies depends on the context in which the sentence appears.
• If the sentence appears in a story about a mother and a child, then the first meaning will probably apply. But if the sentence appears in a book about sports, then the second meaning will likely apply.
• When it comes to figurative language—i.e., idioms—the
ambiguity only increases. For example, “He is as good as John
Doe.” Try to answer, “How good is he?” The answer depends
on how good John Doe is.
• Another class of difficult examples comes from the Winograd Schema Challenge [5], named after Professor Terry Winograd of Stanford University. This schema has pairs of sentences that differ by only a few words, but the meaning of the sentences is often flipped because of this minor change.
• These examples are easily disambiguated by a human but are not solvable using most NLP techniques. Consider such a pair of sentences and the questions associated with them; with some thought, it should be apparent how the answer changes based on a single word variation.
• As another experiment, consider taking an off-the-shelf NLP system
like Google Translate and try various examples to see how such
ambiguities affect (or don’t affect) the output of the system.
• Common knowledge A key aspect of any human language is
“common knowledge.” It is the set of all facts that most
humans are aware of. In any conversation, it is assumed that
these facts are known, hence they’re not explicitly mentioned,
but they do have a bearing on the meaning of the sentence. For
example, consider two sentences: “man bit dog” and “dog bit
man.” We all know that the first sentence is unlikely to
happen, while the second one is very possible.
• Why do we say so? Because we all "know" that it is very unlikely that a human will bite a dog. Further, dogs are known to bite humans. This knowledge is required for us to say that the first sentence is unlikely to happen while the second one is possible. Note that this common knowledge was not mentioned in either sentence.
• Humans use common knowledge all the time to understand
and process any language. In the above example, the two
sentences are syntactically very similar, but a computer
would find it very difficult to differentiate between the two,
as it lacks the common knowledge humans have. One of the
key challenges in NLP is how to encode all the things that are
common knowledge to humans in a computational model.
• Creativity: Language is not just rule driven; there is also a creative aspect to it. Various styles, dialects, genres, and variations are used in any language. Poems are a great example of creativity in language. Making machines understand creativity is a hard problem not just in NLP, but in AI in general.
• Diversity across languages: For most languages in the world, there is no direct mapping between the vocabularies of any two languages. This makes porting an NLP solution from one language to another hard. A solution that works for one language might not work at all for another language. This means that one either builds a language-agnostic solution or builds separate solutions for each language. While the first is conceptually very hard, the second is laborious and time intensive. All these issues make NLP a challenging field to work in.
Challenges in NLP
Ambiguity
• Words with multiple meanings.
Context Understanding
• Capturing context in language.
Sarcasm and Irony
• Detecting non-literal meanings.
Multilingual Processing
• Handling multiple languages.
Recent Advances in NLP
Transformer Models (e.g., BERT, GPT)
• Self-attention mechanism.
Pre-trained Language Models
• Transfer learning.
Transfer Learning in NLP
• Fine-tuning pre-trained models.
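The self-attention mechanism at the heart of transformer models can be sketched with NumPy as scaled dot-product attention, softmax(QK^T / sqrt(d)) V. The dimensions and random weights below are arbitrary, chosen only for illustration:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for a single head.

    X: (tokens, dim) input embeddings; Wq/Wk/Wv project them to
    queries, keys, and values.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # pairwise token-to-token scores
    weights = softmax(scores)          # each row is a distribution over tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))            # 4 tokens, 8-dim embeddings (arbitrary)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)
```

Each output row is a weighted mix of all value vectors, which is how every token's representation can attend to every other token in the sequence; transformer models stack many such heads and layers.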
Tools and Libraries
NLTK
• Natural Language Toolkit.
SpaCy
• Industrial-strength NLP.
Hugging Face Transformers
• State-of-the-art models.
OpenNLP
• Apache's machine learning library.
Future of NLP
Trends and Predictions
• Continued integration of NLP in daily life.

Research Directions
• Ethical AI, bias reduction, multilingual
models.
