UNIT III
3.1 Introduction to NLP: Explain what NLP is, describe different NLP processes, List of tools and
services for NLP, Identify NLP use cases, Syntax, semantics, and morphology, Tokenization,
stemming, and lemmatization.
3.2 Text Representation and Feature Engineering: Bag-of-words model, TF-IDF (Term
Frequency-Inverse Document Frequency), Word embeddings (e.g., Word2Vec, GloVe).
3.3 Language Models: N-gram models, Hidden Markov Models, Introduction to neural language
models. Machine Learning for NLP: Supervised learning for text classification, Named Entity
Recognition (NER), Sentiment analysis.
What is NLP?
Natural Language Processing (NLP) is a field of artificial intelligence (AI) that focuses on enabling
computers to understand, interpret, and generate human language. It combines linguistics
(rules of grammar, syntax, and semantics), machine learning, and deep learning (including
transformer models such as BERT, GPT-4, and T5) to process and analyze text and speech data.
NLP allows machines to interact with humans in a natural way, making it a key component of
chatbots, search engines, voice assistants, machine translation, and more.
Before analyzing text, raw data must be cleaned and prepared. This involves:
Removing special characters and punctuation (in many NLP tasks these do not contribute
to meaning and can introduce noise.)
Lowercasing words (NLP models treat words with different capitalization as different
tokens, e.g. Hello ≠ hello. Converting everything to lowercase ensures consistency.)
Removing stop words (common words like "the," "is," "and," "a" appear frequently but do
not add much meaning. Removing them reduces data size and improves processing
efficiency.)
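The three cleaning steps above can be sketched in a few lines of Python using only the standard library. The stop-word set and the function name `preprocess` are illustrative assumptions; libraries such as NLTK and spaCy ship much longer stop-word lists.

```python
import string

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "is", "and", "a"}

def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    # Delete every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and filter out stop words.
    return [word for word in no_punct.split() if word not in STOP_WORDS]

print(preprocess("Hello, the cat is on the mat!"))
# ['hello', 'cat', 'on', 'mat']
```

Note that "on" survives only because it is not in this small list; whether to remove a given word depends on the task.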
Syntactic analysis (or parsing) examines the grammatical structure of sentences, for example
through dependency parsing, which links each word to the word it depends on.
Example:
"The cat sat on the mat."
Dependency Parsing: "sat" is the main verb, "cat" is the subject, "on the mat" is a
phrase modifying "sat."
Semantic analysis helps machines understand the meaning of words and sentences. It includes:
Word Sense Disambiguation (WSD): Determining which meaning of a word is intended in
context.
o Example: "I went to the bank." (Does it mean a riverbank or a financial bank?)
Named Entity Recognition (NER): Identifying names, locations, organizations, and dates
in text.
Semantic Role Labeling (SRL): Understanding the roles words play in sentences (who did
what, when, and where).
Example of SRL
📌 Sentence:
"John gave Mary a book at the library yesterday."
📌 SRL Output:
Word             Role
John             Agent (Who did the action)
gave             Predicate (Action)
Mary             Recipient (Who received)
a book           Theme (What was given)
at the library   Location (Where)
yesterday        Time (When)
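One way to see why these roles are useful is to store them as a small data structure and query it. The sketch below hard-codes the roles from the example sentence; it is not the output of a real SRL system, and the names `frame`, `QUESTION_TO_ROLE`, and `answer` are hypothetical.

```python
# Hand-written role assignment for:
# "John gave Mary a book at the library yesterday."
frame = {
    "Predicate": "gave",
    "Agent": "John",
    "Recipient": "Mary",
    "Theme": "a book",
    "Location": "at the library",
    "Time": "yesterday",
}

# Illustrative mapping from question words to semantic roles.
QUESTION_TO_ROLE = {
    "who": "Agent",
    "to whom": "Recipient",
    "what": "Theme",
    "where": "Location",
    "when": "Time",
}

def answer(question_word, frame):
    """Look up the phrase that fills the role matching a question word."""
    return frame[QUESTION_TO_ROLE[question_word.lower()]]

print(answer("Who", frame))    # John
print(answer("Where", frame))  # at the library
```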
Machine Translation: Converting text from one language to another using rule-based,
statistical, or neural methods.
Speech-to-Text (STT): Converting spoken words into text. Example: Voice assistants such as
Siri, Alexa, and Google Assistant.
Library/Tool                      Features
NLTK (Natural Language Toolkit)   Classical NLP tasks (tokenization, stemming, etc.)
spaCy                             Fast NLP processing with deep learning integration
Hugging Face Transformers         Pre-trained NLP models (BERT, GPT, T5, etc.)
Stanford NLP                      Academic-grade NLP analysis
Gensim                            Topic modeling and document similarity
AI-powered chatbots and voice assistants such as Siri, Alexa, Google Assistant, and ChatGPT use
NLP to process human language and generate meaningful responses.
Companies use sentiment analysis to analyze customer reviews and social media feedback.
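A very simple form of sentiment analysis counts words from positive and negative word lists. The sketch below is a lexicon-based baseline under that assumption; the two word sets and the function name `sentiment` are made up for illustration, and production systems typically use trained classifiers instead.

```python
import re

# Tiny illustrative sentiment lexicons (assumptions, not a standard resource).
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}

def sentiment(review):
    """Label a review by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Great phone, I love the camera!"))     # positive
print(sentiment("Terrible battery and slow delivery."))  # negative
```

A word-counting approach like this ignores negation: "not good" still counts as positive, which is one reason learned models are preferred in practice.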
Services like Google Translate use NLP to translate text between languages.
Email providers use NLP to filter out spam messages based on keywords and patterns.
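Filtering on keywords and patterns can be sketched as a small scoring rule. The keyword set, the regular expression, and the threshold below are all illustrative assumptions; real spam filters usually learn such signals from labeled mail rather than hard-coding them.

```python
import re

# Illustrative spam keywords (an assumption for this sketch).
SPAM_KEYWORDS = {"free", "winner", "prize", "urgent"}
# Illustrative pattern: three or more "!" in a row, or a dollar amount like $1000.
SPAM_PATTERN = re.compile(r"!{3,}|\$\d+")

def is_spam(message, threshold=2):
    """Flag a message when keyword hits plus a pattern hit reach the threshold."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    score = len(words & SPAM_KEYWORDS)
    if SPAM_PATTERN.search(message):
        score += 1
    return score >= threshold

print(is_spam("URGENT: you are a winner! Claim your free prize"))  # True
print(is_spam("Lunch at noon tomorrow?"))                          # False
```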
NLP can generate short summaries of long articles using extractive or abstractive
summarization.
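Extractive summarization selects existing sentences rather than writing new ones. A minimal sketch, assuming a simple word-frequency score (one of several possible scoring schemes), is shown below; the stop-word list and the function name `summarize` are illustrative.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "a", "is", "was", "and", "of", "to", "from"}

def content_words(text):
    """Lowercased alphabetic words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))
    # A sentence's score is the summed document frequency of its content words.
    ranked = sorted(sentences, key=lambda s: sum(freq[w] for w in content_words(s)),
                    reverse=True)
    chosen = set(ranked[:n_sentences])
    return " ".join(s for s in sentences if s in chosen)

doc = ("NLP helps computers understand language. "
       "NLP models learn language patterns from text. "
       "The weather was nice.")
print(summarize(doc))
# NLP models learn language patterns from text.
```

Summing frequencies favors longer sentences; normalizing by sentence length is a common adjustment. Abstractive summarization, which generates new wording, requires a trained language model and is not covered by this sketch.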
4.1 Syntax
Syntax refers to the structure of sentences and how words are arranged to make grammatical
sense.
4.2 Semantics
Semantics refers to the meaning of words and sentences, including how context resolves
ambiguity.
Example: "I will meet you at the bank." (Does "bank" mean a financial institution or a
riverbank?)
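The "bank" ambiguity can be resolved automatically when the sentence contains context clues. The sketch below is a simplified, Lesk-style overlap method: it picks the sense whose definition shares the most words with the sentence. The two definitions, the stop-word list, and the function names are hand-written assumptions for illustration.

```python
import string

# Illustrative stop words and hand-written sense definitions (assumptions).
STOP_WORDS = {"i", "a", "an", "the", "at", "of", "or", "and", "to", "that", "will", "you"}
SENSES = {
    "financial institution": "an institution that accepts deposits and lends money to customers",
    "riverbank": "the sloping land beside a river or stream of water",
}

def content_words(text):
    """Lowercase, strip punctuation, and drop stop words; return a set of words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return {w for w in cleaned.split() if w not in STOP_WORDS}

def disambiguate(sentence):
    """Pick the sense whose definition overlaps most with the sentence."""
    context = content_words(sentence)
    scores = {sense: len(context & content_words(gloss))
              for sense, gloss in SENSES.items()}
    return max(scores, key=scores.get)

print(disambiguate("I deposited money at the bank."))        # financial institution
print(disambiguate("We fished from the bank of the river."))  # riverbank
```

For "I will meet you at the bank." neither definition overlaps with the sentence, so this sketch simply falls back to the first sense listed. That mirrors the point of the example: without more context the sentence is genuinely ambiguous.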
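4.3 Morphology
Morphology refers to the internal structure of words and how they are built from smaller units
(roots, prefixes, and suffixes).
Example: "unhappiness" = "un-" + "happy" + "-ness"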
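5.1 Tokenization
Splitting text into smaller units (tokens), such as words or sentences.
Example: "The cat sat on the mat." → "The", "cat", "sat", "on", "the", "mat", "."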
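A minimal word-level tokenizer can be written with one regular expression. This is a sketch, not how NLTK or spaCy tokenize; the function name `tokenize` is an assumption.

```python
import re

def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("The cat sat on the mat."))
# ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']
```

This simple pattern splits contractions such as "don't" into three tokens ("don", "'", "t"), which is one reason library tokenizers use more elaborate rules.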
5.2 Stemming
Reducing words to their root by removing suffixes, even if the result isn't a real word.
Example: "playing" → "play", "studies" → "studi" (not a real word)
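The idea of chopping off suffixes can be sketched with a deliberately naive rule list. This is not the Porter algorithm or any library's stemmer; the suffix tuple, the minimum stem length, and the function name `naive_stem` are assumptions made for illustration.

```python
# Suffixes tried in order; the first match is removed (illustrative list).
SUFFIXES = ("ing", "ly", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["playing", "jumped", "cats", "studies", "happily"]:
    print(w, "->", naive_stem(w))
# playing -> play
# jumped -> jump
# cats -> cat
# studies -> studi
# happily -> happi
```

The last two results are not real words, which is expected for stemming. This sketch also turns "running" into "runn"; established stemmers add rules for doubled consonants and similar cases.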
5.3 Lemmatization
Converting words to their base form using linguistic rules instead of just chopping off endings
(like stemming).
Example:
o Input: "better" → Output: "good"
o Input: "running" → Output: "run"
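Unlike stemming, lemmatization needs linguistic knowledge, which can be illustrated with a lookup table keyed by word and part of speech. The table below is a tiny hand-written assumption; real lemmatizers (for example, those in NLTK or spaCy) rely on large dictionaries and morphological rules.

```python
# Tiny hand-written lemma table keyed by (word, part of speech); illustrative only.
LEMMAS = {
    ("better", "adj"): "good",
    ("running", "verb"): "run",
    ("went", "verb"): "go",
    ("mice", "noun"): "mouse",
}

def lemmatize(word, pos):
    """Return the dictionary base form, or the lowercased word if unknown."""
    return LEMMAS.get((word.lower(), pos), word.lower())

print(lemmatize("better", "adj"))    # good
print(lemmatize("running", "verb"))  # run
print(lemmatize("mice", "noun"))     # mouse
```

The part-of-speech argument matters: no suffix rule maps "better" to "good", and the right base form of a word can differ depending on how it is used in the sentence.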
Natural Language Processing is at the core of many AI applications, from chatbots to translation
services. With advancements in deep learning, transformers, and large-scale models, NLP is
becoming more sophisticated, helping machines understand human language better than ever
before.