Week 8-Module 7 NLP
Natural Language Processing
Natural Language Processing (NLP)
NLP has a vast range of applications that are woven into our daily lives:
• Machine Translation: Breaking down language barriers by translating text or speech from one language to another [e.g., Google Translate].
• Smart Assistants: Responding to voice commands and questions in a natural way [e.g., Siri, Alexa, Google Assistant].
• Sentiment Analysis: Extracting opinions and emotions from text data [e.g., social
media monitoring].
• Text Summarization: Condensing large amounts of text into key points.
• Autocorrect and Predictive Text: Suggesting corrections and completions as you
type.
• Spam Filtering: Identifying and blocking unwanted emails.
• Search Engines: Ranking search results based on relevance to your query.
Fundamental NLP Tasks
Here's a glimpse into some fundamental NLP tasks that form the building blocks
for many applications:
• Tokenization: Breaking down text into smaller units like words, punctuation
marks, or phrases.
• Part-of-Speech (POS) tagging: Identifying the grammatical function of each
word in a sentence (e.g., noun, verb, adjective).
• Named Entity Recognition (NER): Recognizing and classifying named
entities in text, such as people, organizations, locations, dates, monetary
values, etc.
1. Tokenization:
Imagine you're dissecting a sentence. Tokenization is the first step, where you
break the sentence down into its individual building blocks. These blocks
can be:
• Words: "The", "quick", "brown", "fox"
• Punctuation marks: ".", ",", "?"
• Sometimes even phrases: "New York City" (depending on the application)
2. POS Tagging:
After you have your tokens, POS tagging assigns a grammatical role
(part-of-speech) to each one. Here's an example:
Sentence: "The quick brown fox jumps over the lazy dog."
3. Named Entity Recognition (NER):
This focuses on identifying and classifying specific entities within the tokens. Imagine circling important names on a page. NER does something similar, recognizing entities like:
• People: "Albert Einstein"
• Organizations: "Google"
• Locations: "Paris"
• Dates: "July 4th, 2024"
• Monetary values: "$100"
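A small illustration of NER using spaCy (assuming spaCy and its small English model en_core_web_sm are installed; the sentence is invented just to show several entity types):

```python
import spacy

# Assumes the small English model has been installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Albert Einstein visited Paris and gave a talk at Google on July 4th, 2024 for $100.")

# Each entity is a text span plus a label such as PERSON, ORG, GPE, DATE, MONEY
for ent in doc.ents:
    print(ent.text, ent.label_)
```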
Practical Examples
1. Search Engines:
Tokenization: When you search for "best restaurants NYC", the search engine
breaks it down into tokens like "best", "restaurants", "NYC".
NER: This helps the search engine understand you're looking for highly-rated
restaurants in New York City and refines the search results accordingly.
2. Social Media Analysis:
Tokenization: Analyzing a tweet like "Feeling POS Tagging: It can identify "Feeling" as a verb, NER: This might not be relevant here, but NER
great after winning the game #GoTeam! "great" as an adjective, "winning" as a verb could be used to identify the team mentioned in the
#Champions". (participle), "game" as a noun, and hashtags as hashtags for further analysis.
proper nouns.
3. Spam Filtering:
Tokenization: Breaking down a spam email with subject line "Free $$$ for you!".
NER: This might not have much role here, but tokenization and POS
tagging help identify the generic and promotional nature of the email,
potentially flagging it as spam.
4. Machine Translation:
• Tokenization: Breaking down a sentence in one language (e.g., Spanish) into individual words.
Text Cleaning and Normalization
• Text data often comes in a raw and messy format. It can contain inconsistencies, irrelevant information, and variations in how words are written.
• Cleaning and normalization are crucial steps in NLP to prepare the text for further processing. Here's a breakdown of some common techniques:
1. Removing Stopwords: Stopwords are very common words that carry little meaning on their own (e.g., "the", "a", "is"). Removing them can improve processing efficiency and focus the analysis on more content-rich words.
2. Removing Special Characters: Punctuation marks, symbols, and emojis can add noise to the data. Depending on the task, you might choose to remove them entirely or convert them to a standard format.
3. Lowercasing/Uppercasing: Text data can be written in different cases (uppercase, lowercase). Converting everything to lowercase or uppercase ensures consistency and simplifies further processing.
4. Normalizing Text:
These techniques aim to reduce words to their base forms. However, they have subtle
differences:
Lemmatization: This process tries to convert a word to its dictionary form (lemma),
considering its grammatical role in the sentence (e.g., "running" becomes "run", "better"
becomes "good"). It requires a morphological analysis of the word.
Stemming: This process chops off suffixes to arrive at a base form (stem) that might not
always be a real word (e.g., "running" becomes "run", "better" becomes "bet"). It's a
simpler and faster approach but can sometimes lead to incorrect base forms.
The choice between lemmatization and stemming depends on your specific application. Lemmatization is generally preferred for tasks where preserving meaning and grammatical accuracy is crucial. Stemming can be faster and sufficient for simpler tasks where the exact meaning of the base form isn't critical.
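A quick comparison of the two using NLTK (assuming the WordNet data is available; the outputs shown as comments are typical but depend on the stemmer and lemmatizer used):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)   # dictionary data used by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = ["running", "studies", "better"]

# Stemming: rule-based suffix stripping; the result may not be a real word
print([stemmer.stem(w) for w in words])                       # e.g. ['run', 'studi', 'better']

# Lemmatization: dictionary lookup that uses the word's part of speech
print([lemmatizer.lemmatize(w, pos="v") for w in words[:2]])  # e.g. ['run', 'study']
print(lemmatizer.lemmatize("better", pos="a"))                # 'good'
```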
Additional Considerations
• Text Normalization Libraries: Libraries like NLTK (Python) and spaCy (Python) offer functionality for many of these text cleaning and normalization tasks.
• Context-Specific Normalization: The specific techniques you apply might vary depending on your NLP task and the nature of your text data.
• Trade-offs: There can be trade-offs between cleaning too aggressively and losing information, and cleaning too lightly and leaving noise in. Finding the right balance depends on your specific needs.
Some Examples
1. Social Media Sentiment Analysis:
Imagine analyzing tweets to understand public sentiment towards a new
product launch. You'd want to clean the text by:
• Removing stopwords: Words like "a", "the", "is" don't contribute much to
sentiment.
• Removing special characters: Emojis, hashtags, and punctuation can be
removed or converted for consistency.
• Lowercasing: Case variations shouldn't affect sentiment analysis.
• Normalizing slang and abbreviations: "OMG" could be converted to "oh
my god" for better understanding.
2. Web Scraping and Text Summarization:
You might scrape news articles to summarize the main points.
Here, cleaning involves:
• Removing HTML tags and code: Irrelevant for textual content.
• Correcting typos and misspellings: Users might make mistakes while typing.
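One common way to strip HTML before summarization is BeautifulSoup; a minimal sketch (assuming the beautifulsoup4 package is installed, with a made-up HTML snippet):

```python
from bs4 import BeautifulSoup   # third-party: beautifulsoup4 (assumed installed)

html = "<article><h1>Breaking News</h1><p>Markets <b>rallied</b> today.</p><script>track()</script></article>"

soup = BeautifulSoup(html, "html.parser")
for tag in soup(["script", "style"]):   # drop embedded code that is irrelevant for the text
    tag.decompose()

text = soup.get_text(separator=" ", strip=True)
print(text)   # "Breaking News Markets rallied today."
```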
3. Word Embeddings (e.g., Word2Vec, GloVe)
Concept: Represent words as numerical vectors that capture semantic relationships, so that words with similar meanings end up close together in the vector space.
Techniques:
Word2Vec: Two popular architectures are Skip-gram and CBOW. They predict surrounding words based on a given
word (Skip-gram) or vice versa (CBOW). Words used for prediction and the target word become closer in the vector
space.
GloVe: Analyzes word co-occurrence statistics from a large corpus to learn word vectors. Words that frequently co-occur
are positioned closer in the vector space.
Benefits:
• Captures semantic relationships between words.
• Enables tasks like word similarity detection and analogy completion.
• Can be used as input features for various NLP models.
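A toy Word2Vec example with Gensim (the three-sentence corpus is far too small for meaningful vectors and is only meant to show the API; sg=1 selects Skip-gram, sg=0 CBOW):

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens (a real corpus would be much larger)
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "pets"],
]

# sg=1 selects the Skip-gram architecture; sg=0 would use CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50, seed=1)

print(model.wv["king"].shape)                 # (50,) — each word is a dense vector
print(model.wv.most_similar("king", topn=2))  # nearest neighbours in the vector space
```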
4. Language Models and Pre-trained
Transformers:
Concept: Language models are statistical methods that predict the next word in a sequence based on the preceding
words. Pre-trained transformers are powerful language models trained on massive amounts of text data.
Techniques:
Traditional Language Models (e.g., n-grams): Predict the next word based on the n preceding words (e.g., bigrams,
trigrams).
Pre-trained Transformers (e.g., BERT, GPT-3): These are complex neural network architectures trained on massive text
corpora. They learn contextual representations of words and can be fine-tuned for various NLP tasks like text
classification, question answering, and summarization.
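A brief sketch of using a pre-trained transformer through the Hugging Face pipeline API (assuming the transformers library is installed; gpt2 and the library's default sentiment model are used purely as illustrations):

```python
from transformers import pipeline   # Hugging Face Transformers (assumed installed)

# Load a small pre-trained language model; "gpt2" is used here only as an example
generator = pipeline("text-generation", model="gpt2")

# The model predicts the next words in the sequence based on the preceding context
print(generator("Natural language processing lets computers", max_new_tokens=20)[0]["generated_text"])

# The same library exposes fine-tuned models for other tasks, e.g. sentiment classification
classifier = pipeline("sentiment-analysis")
print(classifier("I really enjoyed this module!"))
```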
Benefits:
• They learn rich contextual representations of words.
• A single pre-trained model can be fine-tuned for many downstream tasks such as text classification, question answering, and summarization.
Sentiment Analysis
Sentiment analysis, also known as opinion mining, is the process of computationally identifying and classifying the emotional tone behind a
piece of text. It aims to understand whether the sentiment expressed is positive, negative, or neutral.
Applications:
• Social media monitoring: Analyze public opinion towards brands, products, or events.
• Customer reviews: Understand customer satisfaction and identify areas for improvement.
• Market research: Gauge audience sentiment towards specific topics or products.
• Spam filtering: Identify and filter out spam emails with negative or promotional tones.
Techniques:
• Lexicon-based approach: Uses pre-defined dictionaries of words with positive, negative, and neutral sentiment scores. The overall sentiment is calculated based on the sentiment scores of the words in the text.
• Machine learning: Trains models on labeled data (text with known sentiment) to automatically classify new text. Popular algorithms include Naive Bayes, Support Vector Machines (SVM), and Logistic Regression.
• Deep learning: Utilizes neural networks like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks to capture complex relationships between words and improve sentiment classification accuracy.
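As a small example of the lexicon-based approach, NLTK ships the VADER lexicon (assuming the vader_lexicon data is downloaded):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # lexicon of word-level sentiment scores

sia = SentimentIntensityAnalyzer()

# The compound score ranges from -1 (most negative) to +1 (most positive)
print(sia.polarity_scores("I love this product, it works great!"))
print(sia.polarity_scores("Terrible service, I want a refund."))
```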
Building Sentiment Analysis Models
1. Data Preparation:
Preprocess the text by cleaning it (removing noise, punctuation, stop words) and
potentially normalizing it (lowercasing, stemming/lemmatization).
2. Feature Engineering:
For machine learning models, create features that represent the text. This could involve:
• Bag-of-Words (BoW): Represent the text as a vector where each element indicates the frequency of a word in the vocabulary.
• TF-IDF: Assigns weights to words based on their importance within the document and across the corpus.
• Word Embeddings: Represent words as numerical vectors capturing semantic relationships.
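A short scikit-learn sketch of the first two feature types (the three documents are made up; word embeddings would typically come from a separate model such as Word2Vec, shown earlier):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the movie was great", "the movie was terrible", "great acting, great story"]

# Bag-of-Words: each column is a vocabulary word, each value a raw count
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)
print(bow.get_feature_names_out())
print(X_bow.toarray())

# TF-IDF: counts are re-weighted by how informative a word is across the corpus
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)
print(X_tfidf.toarray().round(2))
```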
3. Model Training:
• Choose a suitable machine learning or deep learning algorithm for sentiment classification.
• Train the model on your labeled data.
4. Evaluation:
• Evaluate the model's performance on a separate test dataset.
• Use metrics like accuracy, precision, recall, and F1-score to assess the model's performance.
• Fine-tune the model or explore different algorithms if performance is not satisfactory.
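A compact end-to-end sketch with scikit-learn, combining TF-IDF features, a Logistic Regression classifier, and the usual metrics (the eight labeled examples are invented and far too few for a real model):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Tiny illustrative dataset; a real project would use thousands of labeled examples
texts = ["great product", "awful experience", "loved it", "worst purchase ever",
         "really happy with this", "completely disappointed", "works perfectly", "broke after a day"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

# TF-IDF features + Logistic Regression classifier in one pipeline
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Accuracy, precision, recall, and F1 on the held-out test set
print(classification_report(y_test, model.predict(X_test)))
```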
Interpreting Sentiment Analysis Results
• Sentiment analysis models assign a sentiment score or class (positive, negative, neutral) to a piece of text.
• It's crucial to understand the limitations: models might misclassify sarcasm, irony, or complex emotions.
Theoretical Explanation
• Machine Learning: Algorithms learn patterns from labeled data to classify new text samples.
2. Latent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) is a popular topic modeling algorithm. Here's the basic idea:
• Each document is assumed to be a mixture
of various topics in different proportions.
• Each topic is represented by a probability
distribution over words in the vocabulary.
LDA analyzes the documents in a corpus and tries to discover these underlying topics and their distribution across documents.
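A minimal LDA example with Gensim (the four tokenized documents are invented; a real corpus would be much larger):

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy corpus of tokenized documents (already cleaned and stopword-filtered)
texts = [
    ["bank", "loan", "interest", "credit"],
    ["loan", "credit", "bank", "mortgage"],
    ["match", "goal", "team", "score"],
    ["team", "score", "season", "goal"],
]

dictionary = corpora.Dictionary(texts)                 # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in texts]    # bag-of-words representation

# Fit LDA with 2 topics: each topic is a distribution over words,
# each document a mixture of topics
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=20, random_state=42)

for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)

print(lda.get_document_topics(corpus[0]))   # topic proportions for the first document
```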
3. Evaluating Topic Models and Selecting the Optimal Number of Topics
There's no single "best" number of topics for LDA. Here are some approaches to guide your selection:
• Perplexity: LDA calculates perplexity, a measure of how well the model fits unseen data. Lower perplexity often indicates a better fit. However, it can be sensitive to model parameters.
• Topic Coherence: Evaluate how well the words within a topic are semantically related. Various metrics like the coherence score (CoherenceModel in Gensim) can help assess this.
• Domain Knowledge: Consider your understanding of the domain and the expected number of relevant themes within the documents.
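A sketch of comparing candidate topic counts with perplexity and coherence in Gensim (it reuses the texts, dictionary, and corpus variables from the previous sketch):

```python
from gensim.models import CoherenceModel, LdaModel

# Assumes `texts`, `dictionary`, and `corpus` were built as in the previous sketch
for k in (2, 3, 4):
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k, passes=20, random_state=42)

    # Perplexity on the corpus (gensim reports a log value; lower perplexity suggests a better fit)
    print(k, "log perplexity:", lda.log_perplexity(corpus))

    # Topic coherence (c_v): higher values mean the top words in each topic are more related
    coherence = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
    print(k, "coherence:", coherence.get_coherence())
```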
4. Introduction to Text Generation Techniques
Text generation aims to create coherent and realistic sequences of words, similar to
human-written text. Here are two common approaches:
1. Markov Chains:
A Markov chain is a statistical model that predicts the next word based on the
probability of it appearing after a specific sequence of preceding words (n-grams).
Simple and computationally efficient, but generated text can be repetitive and lack
long-range coherence.
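A tiny bigram Markov chain generator in plain Python (the corpus sentence is made up; real models use much larger corpora and longer n-grams):

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Build the bigram transition table: word -> list of words observed to follow it
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

def generate(start, length=8, seed=0):
    random.seed(seed)
    word, output = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:          # dead end: no observed continuation
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

print(generate("the"))
```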
2. Recurrent Neural Networks (RNNs):
RNNs are a type of neural network architecture specifically designed for sequential data like text. They can learn complex relationships between words across longer sequences, leading to more sophisticated and grammatically correct text generation.
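A minimal Keras sketch of such a model: an Embedding layer feeding an LSTM that predicts the next word id (the vocabulary size, sequence length, and random training data are placeholders, only meant to show the shapes involved):

```python
import numpy as np
from tensorflow.keras import layers, models

vocab_size, seq_len = 1000, 20   # illustrative sizes

# A minimal LSTM language model: given a sequence of word ids,
# predict the probability distribution of the next word
model = models.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=64),
    layers.LSTM(128),
    layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy data just to show the expected shapes: X is (samples, seq_len), y holds next-word ids
X = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, vocab_size, size=(32,))
model.fit(X, y, epochs=1, verbose=0)

print(model.predict(X[:1]).shape)   # (1, vocab_size): probabilities for the next word
```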