NLP - Viva - Que & Ans

BASICS

 What is NLP?
NLP (Natural Language Processing) enables computers to understand, interpret, and
generate human language.

 What is Processed Under NLP?


NLP processes text, speech, syntax, semantics, entities, and sentiment.

 Difference Between NLU and NLG?


- NLU : Understanding language (meaning and intent).
- NLG : Generating natural language from data.

 Real-time Examples:
- NLU : Siri understanding a question.
- NLG : A chatbot generating a response.

 Major Problem in Understanding Language?


Ambiguity , where words or phrases have multiple meanings depending on context.

 What is Context?
Context refers to the surrounding text or situation that helps clarify meaning.

 Steps in NLP:
1. Tokenization
2. POS Tagging
3. Lemmatization/Stemming
4. Named Entity Recognition
5. Parsing
6. Sentiment Analysis
7. Machine Translation
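
A minimal sketch of several of these steps in Python using spaCy (an illustrative assumption: the en_core_web_sm model has been downloaded; the sentence is made up):

    # Tokenization, POS tagging, lemmatization, and NER with spaCy.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

    for token in doc:
        # token.text = surface form, token.pos_ = part of speech, token.lemma_ = dictionary form
        print(token.text, token.pos_, token.lemma_)

    for ent in doc.ents:
        # Named entities, e.g. Apple -> ORG, $1 billion -> MONEY
        print(ent.text, ent.label_)
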
 What is Semantics?
Semantics is the study of meaning in language.

 What is Syntax?
Syntax is the structure of sentences and the grammatical rules governing word order.

 What is Discourse?
Discourse is how sentences relate to form meaningful paragraphs or conversations.

 What is Pragmatics?
Pragmatics deals with language use in context, interpreting beyond literal meanings.

 Example of Language Ambiguity:


"The chicken is ready to eat" could mean either the bird is hungry or the food is
prepared.

 What is Grammar?
Grammar is the set of rules that governs sentence structure in a language.

Steps in NLP - Lexical analysis


 Lexical Analysis:
Lexical analysis, also known as scanning or tokenization, is the process of breaking up a
stream of text into individual words, phrases, or tokens. It is the first stage of an NLP
pipeline (and, analogously, of compiler or interpreter design), where the input is analyzed
to identify the basic building blocks of the language.

 Lexeme (Token):
A lexeme, or token, is a basic unit of meaning in a language, such as:
- Keywords (e.g., if, while)
- Identifiers (e.g., variable names)
- Literals (e.g., numbers, strings)
- Operators (e.g., +, -, *)
- Symbols (e.g., parentheses, brackets)

 Goals of Lexical Analysis:


The primary goals of lexical analysis are:
1. Identify valid tokens
2. Ignore irrelevant characters (e.g., whitespace, comments)
3. Detect syntax errors
4. Prepare input for syntax analysis (parsing)

 Levenshtein Distance:
Levenshtein distance, also known as edit distance, measures the minimum number of
operations (insertions, deletions, substitutions) required to transform one string into
another.
Example:
- "kitten" → "sitting" (Levenshtein distance = 3)
- Substitute "k" with "s"
- Substitute "e" with "i"
- Append "g"

 Applications of Levenshtein Distance:


Levenshtein distance has various applications:
1. Spell checking: Suggest corrections for misspelled words.
2. Text similarity measurement: Compare similarity between texts.
3. Data compression: Measure compression efficiency.
4. Plagiarism detection: Identify similarities between documents.
5. Speech recognition: Measure similarity between spoken words.
6. Bioinformatics: Compare DNA or protein sequences.
7. Natural Language Processing (NLP): Measure string similarity between words or sentences.
Other applications include:
- Auto-complete features
- Error detection and correction
- Information retrieval
- Machine learning
The Levenshtein distance algorithm is widely used in many areas where text or
sequence comparison is necessary.

Syntax analysis
 What is checked in syntax analysis?
Syntax analysis checks if the sequence of tokens (words) generated from the lexical
analysis forms a valid structure as per the grammar of the language. It ensures that the
source code follows the language's rules, like matching brackets or correct order of
operators.

 What do we want to ensure by doing syntactic analysis?


- By doing syntactic analysis (parsing), we want to ensure that the program is
syntactically correct, meaning it follows the correct structure, like where keywords,
operators, and variables should appear.

 What is the role of grammar in syntax analysis?


- Grammar defines the rules of how statements and expressions should be structured
in the programming language. Syntax analysis uses this grammar to determine if the
input code is valid.

 What are terminals and non-terminals in a grammar?


- Terminals : These are the actual characters or symbols from the language (e.g.,
keywords, operators).
- Non-terminals : These represent combinations of terminals, used to define the
structure (e.g., expressions, statements).

 What are parse trees, and how many types are there?
- A parse tree is a tree structure that shows how a string (source code) is derived from
a grammar by breaking it down into terminals and non-terminals.
- A parse tree can be built using two main derivation orders: leftmost derivation and
rightmost derivation. The difference is the order in which non-terminals are expanded.

 Which parse tree is good and why?


- The "good" parse tree is usually the one that reflects the most efficient or correct
structure as per the language's semantic rules. For example, in math expressions,
respecting operator precedence is important, so a tree that does this is preferred.

 How do you decide if a language is possible by a given grammar?


- If a grammar can generate all valid strings (statements) of a language, then it defines
that language. By deriving valid statements from the grammar, you can check if a
language is possible.

 What is context-free grammar (CFG)?


- A context-free grammar is a type of grammar where each production rule has a
single non-terminal on the left-hand side. It can generate languages that are more
complex than regular languages (those described by regular expressions).

 What are the rules of Chomsky Normal Form (CNF)?


- In Chomsky Normal Form, every production rule must be of one of these forms:
- A → BC (where A, B, and C are non-terminals)
- A → a (where A is a non-terminal and a is a terminal)
- A → ε (only for the start symbol and only if the language includes the empty string)

 What is the need for the CKY algorithm?


- The CKY (Cocke-Kasami-Younger) algorithm is used to efficiently parse strings that
belong to a context-free grammar, especially when the grammar is in Chomsky Normal
Form. It helps in deciding if a string can be generated by the grammar.
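
A minimal CKY recognizer sketch in Python for a toy grammar in CNF (the grammar and sentence are illustrative assumptions):

    # Toy CNF grammar: each right-hand side (a tuple) maps to the set of left-hand sides.
    grammar = {
        ("NP", "VP"): {"S"},
        ("Det", "N"): {"NP"},
        ("V", "NP"): {"VP"},
        ("the",): {"Det"},
        ("dog",): {"N"}, ("cat",): {"N"},
        ("chased",): {"V"},
    }

    def cky_recognize(words):
        n = len(words)
        # table[i][j] = set of non-terminals that derive words[i:j]
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            table[i][i + 1] |= grammar.get((w,), set())
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for B in table[i][k]:
                        for C in table[k][j]:
                            table[i][j] |= grammar.get((B, C), set())
        return "S" in table[0][n]

    print(cky_recognize("the dog chased the cat".split()))  # True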

 What is PCFG and when is it useful?


- Probabilistic Context-Free Grammar (PCFG) assigns probabilities to each production
rule. It's useful when a language can have multiple valid parse trees, and we want to
choose the most likely one based on real-world data.
 If a language has multiple parse trees, how do you decide which parse tree is good?
- If a language has multiple parse trees, we can use PCFG to assign probabilities and
choose the tree that represents the most likely or appropriate interpretation. Operator
precedence and associativity rules also help in deciding.

 How can you figure out that a language is possible to derive using multiple parse
trees?
- If a grammar allows multiple distinct derivations (and hence multiple parse trees) for
the same string, the grammar is ambiguous. You can check this by generating all possible
parse trees for a string and seeing whether more than one exists.

 Can a language have multiple parse trees?


- Yes, some languages are ambiguous, meaning a single sentence (or string) can be
parsed in more than one way, leading to multiple parse trees. For example,
mathematical expressions without clear operator precedence can be ambiguous.

Language modelling
 What is language modeling?
Language modeling is the task of predicting the next word or sequence of words in a
sentence based on the previous words. It helps in various NLP tasks like speech
recognition, translation, and text generation.

 What do we want to achieve in this task?


In language modeling, the goal is both language understanding (grasping patterns in
the text) and language generation (producing coherent text).

 What is n-gram modeling?


N-gram modeling is a simple language model that predicts the next word based on
the previous n-1 words. For example, a bigram model uses the previous one word,
and a trigram model uses the previous two words.

 What is conditional probability?


Conditional probability is the probability of an event occurring, given that another
event has already occurred. In language modeling, it's the probability of a word given
the previous word(s).
 What is the probability chain rule?
The probability chain rule breaks down the probability of a sequence of words into
the product of conditional probabilities. For example, for words w1, w2, w3:
P(w1, w2, w3) = P(w1) · P(w2 | w1) · P(w3 | w1, w2)

 What is the Markov assumption?


The Markov assumption simplifies the language model by assuming that the
probability of a word depends only on a fixed number of previous words (not all
previous words). In an n-gram model, the nth word depends only on the previous
n-1 words.

 How do you calculate the probability of a word using a bigram model?


In a bigram model, the probability of a word wn given the previous word wn-1 is
calculated as:
P(wn | wn-1) = Count(wn-1, wn) / Count(wn-1)
This is the ratio of the frequency of the word pair to the frequency of the first word.
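
A minimal sketch of this count-based estimate in Python on a toy corpus (the corpus and the </s> end-of-sentence marker are illustrative assumptions):

    from collections import Counter

    corpus = "i like nlp </s> i like parsing </s> i love nlp </s>".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p_bigram(prev, word):
        # MLE estimate: P(word | prev) = Count(prev, word) / Count(prev)
        return bigrams[(prev, word)] / unigrams[prev]

    print(p_bigram("i", "like"))    # 2/3
    print(p_bigram("like", "nlp"))  # 1/2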

 What do you mean by a corpus?


A corpus is a large collection of text used for training language models. It contains
various sentences and is essential for learning patterns in language.

 What is smoothing, and why is it needed?


Smoothing is a technique used to handle unseen word combinations (n-grams) in
the training data by assigning a small probability to these combinations. It helps to
avoid zero probabilities in the model when encountering new word pairs.
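
A minimal add-one (Laplace) smoothing sketch, continuing the toy bigram counts above (one of several possible smoothing methods):

    def p_laplace(prev, word, unigrams, bigrams, vocab_size):
        # Add-one smoothing: every possible bigram is treated as seen at least once,
        # so unseen pairs receive a small non-zero probability.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    # An unseen pair such as ("love", "parsing") now gets a small probability instead of 0.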

 How do you say that your LM is good? Which metric is used to evaluate a language
model?
A good language model predicts text well. Perplexity is a common metric used to
evaluate LMs. Lower perplexity means the model is better at predicting the next
word.
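
As a rough sketch, the perplexity of a test sequence under a bigram model can be computed as follows (p is assumed to be a conditional probability function such as the one in the earlier sketch):

    import math

    def perplexity(words, p):
        # PP = exp( -(1/N) * sum_i log P(w_i | w_{i-1}) ), with N predicted words
        log_prob = sum(math.log(p(prev, w)) for prev, w in zip(words, words[1:]))
        return math.exp(-log_prob / (len(words) - 1))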

 What is Maximum Likelihood Estimation (MLE)?


MLE is a method to estimate the model parameters that maximize the likelihood of
the observed data. In language modeling, it involves choosing probabilities for words
that make the training text most likely.

Parts of Speech Tagging-POS tagging


 What is POS tagging?
POS (Part-of-Speech) tagging is the process of labeling each word in a sentence with
its appropriate part of speech, such as noun, verb, adjective, etc.

 Why do we need POS tagging?


POS tagging helps computers understand the structure of a sentence, enabling them
to process language for tasks like translation, sentiment analysis, and information
retrieval.

 How many tags are in usage in current times for the English language?
The number of POS tags depends on the tagging system used. Common tag sets like
the Penn Treebank use around 36 tags, while more detailed systems may have more.

 What is transition probability?


Transition probability (sometimes loosely called transmission probability) is the
likelihood of one POS tag following another in a sequence. It helps in predicting the
correct tags based on context.

 What is emission probability?


Emission probability is the likelihood of a specific word being associated with a
particular POS tag. It links words to their possible parts of speech.

 What is the purpose of the Viterbi algorithm?


The Viterbi algorithm is used to find the most likely sequence of POS tags for a
sentence based on transition and emission probabilities.

 What is the sole aim of the Viterbi algorithm?


Its sole aim is to identify the most probable sequence of hidden states (POS tags)
that could generate the observed data (the sentence).
 How is POS tagging related to Natural Language Processing (NLP)?
POS tagging is a fundamental task in NLP that helps in understanding the grammatical
structure of sentences, which is critical for various language processing tasks like
machine translation, speech recognition, and information extraction.
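
A minimal POS-tagging sketch using NLTK (assuming the punkt and averaged_perceptron_tagger resources have been downloaded):

    import nltk
    # nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

    tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
    print(nltk.pos_tag(tokens))
    # e.g. [('The', 'DT'), ('quick', 'JJ'), ...]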

Text representations
 Why do we need text representations?
Text representations convert words or documents into numerical formats so that
machine learning models can process and analyze them.

 What is one-hot vectorization and how do we do that?


One-hot vectorization represents each word as a vector of binary values, where only one
element (the word's position) is 1, and all others are 0. It captures whether a word is
present but loses word relationships.

 What is Bag of Words (BoW) and why do we call it that?


BoW is a text representation that counts how many times each word appears in a
document. We call it "Bag of Words" because it treats a document as an unordered
collection of words without considering their order.

 What is count vectorization?


Count vectorization converts text into vectors based on word frequencies, where each
word is assigned a count of how often it appears in a document.
 What exactly does IDF tell about a word?
IDF tells how unique or rare a word is across a collection of documents. Higher IDF
means the word is less common and more significant.
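
A minimal sketch of count and TF-IDF vectorization with scikit-learn (the two example documents are illustrative assumptions):

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ["the movie was great", "the movie was terrible"]

    bow = CountVectorizer()            # Bag of Words / count vectorization
    print(bow.fit_transform(docs).toarray())
    print(bow.get_feature_names_out())

    tfidf = TfidfVectorizer()          # weights counts by IDF, so rarer words score higher
    print(tfidf.fit_transform(docs).toarray())

    # ngram_range=(1, 2) in either vectorizer would add bigram features for some word-order context.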

 What is N-gram based representation?


N-gram representation captures sequences of 'n' consecutive words instead of
treating words independently. For example, bigrams (n=2) capture two-word
combinations, providing some word order context.

 What are the drawbacks of Bag of Words representation?


 Ignores word order and context.
 Doesn't capture semantics (meaning).
 High dimensionality as vocabulary size increases.

 What do you mean by semantics?


Semantics refers to the meaning of words and how they relate to each other in
context.

 What do we want to achieve by representing text/documents in Bag of Words?


We aim to transform text into a numerical format for processing, while preserving
word frequency information, so that it can be used in machine learning models.
 What do you mean by dimensionality reduction?
Dimensionality reduction refers to techniques that reduce the number of features
(dimensions) in data, simplifying it while preserving important information.

 What is Latent Semantic Analysis (LSA) and how do we perform it?


LSA is a technique that reduces dimensionality by identifying relationships between
words and documents based on their co-occurrence patterns. It’s performed using
Singular Value Decomposition (SVD) on a term-document matrix.

 What does SVD stand for?


SVD stands for Singular Value Decomposition, a mathematical method used to
decompose a matrix into three matrices (U, Σ, and V transpose); keeping only the
largest singular values gives a lower-dimensional approximation, which helps in
dimensionality reduction.
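
A minimal LSA sketch with scikit-learn, applying truncated SVD to a TF-IDF term-document matrix (the toy documents and the choice of 2 components are illustrative assumptions):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = ["cats and dogs", "dogs chase cats", "stocks and bonds", "bond markets fell"]
    X = TfidfVectorizer().fit_transform(docs)   # document-term matrix

    lsa = TruncatedSVD(n_components=2)          # keep 2 latent dimensions
    doc_vectors = lsa.fit_transform(X)          # documents in the reduced space
    print(doc_vectors.shape)                    # (4, 2)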

Topic Modeling
 What do we want to achieve via topic modeling?
We aim to discover hidden themes or topics within a large collection of documents,
helping us organize and understand the data better.

 What is the outcome of topic modeling?


The outcome is a set of topics, where each topic is represented by a group of words,
and each document is associated with a mixture of these topics.

 What is LDA (Latent Dirichlet Allocation)?


LDA is a popular topic modeling algorithm that assumes documents are mixtures of
topics, and each topic is a mixture of words. It assigns topics to words in documents
based on word co-occurrences.

 What is a document-topic matrix?


The document-topic matrix shows how much each topic contributes to each
document. Rows represent documents, and columns represent topics.

 What is a word-topic matrix?


The word-topic matrix shows how strongly each word is associated with each topic.
Rows represent words, and columns represent topics.

 How do you decide the number of topics?


The number of topics is typically decided through experimentation or by using
techniques like cross-validation. You might also use domain knowledge to estimate
the ideal number of topics.

 What is the distribution of distributions under Dirichlet Allocation?


Dirichlet distribution is a probability distribution over distributions. In LDA, it helps
in defining a distribution of topics for each document and a distribution of words for
each topic. It controls the sparsity of these distributions.
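
A minimal LDA sketch with scikit-learn (the documents and the number of topics are illustrative assumptions):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["cats dogs pets", "dogs chase cats", "stocks bonds markets", "markets fell today"]
    X = CountVectorizer().fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topic = lda.fit_transform(X)     # document-topic matrix (4 docs x 2 topics)
    word_topic = lda.components_         # topic-word weights (2 topics x vocabulary size)
    print(doc_topic.shape, word_topic.shape)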

Word sense disambiguation

 What do you mean by word sense disambiguation?


Word sense disambiguation (WSD) is the process of determining the correct
meaning (sense) of a word in a given context when the word has multiple meanings.

 What is the relation between a dictionary and sense disambiguation?


A dictionary provides the different meanings (senses) of a word, and word sense
disambiguation helps in selecting the right sense from the dictionary based on the
context in which the word is used.

 What is the Lesk algorithm?


The Lesk algorithm is a method for word sense disambiguation that assigns the
correct meaning to a word by comparing the dictionary definitions of the word's
senses with the context in which the word appears. It chooses the sense with the
most overlapping words between the definition and the surrounding context.
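
A minimal sketch using NLTK's built-in (simplified) Lesk implementation (assuming the wordnet and punkt resources have been downloaded):

    from nltk.tokenize import word_tokenize
    from nltk.wsd import lesk
    # nltk.download("wordnet"); nltk.download("punkt")

    context = word_tokenize("I went to the bank to deposit my money")
    print(lesk(context, "bank"))  # prints the WordNet synset chosen for "bank" in this context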

Lab based viva questions


 What is text preprocessing?
Text preprocessing involves cleaning and transforming raw text into a usable format
for analysis or modeling. It includes steps like tokenization, removing stopwords,
and converting to lowercase.

 What is stemming?
Stemming reduces words to their root form by cutting off prefixes and suffixes, e.g.,
"running" becomes "run."

 What is lemmatization?
Lemmatization reduces words to their base or dictionary form (lemma), considering
the word’s meaning, e.g., "running" becomes "run" but keeps the correct meaning.
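
A minimal sketch contrasting the two with NLTK (assuming the wordnet resource has been downloaded for the lemmatizer):

    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    print(stemmer.stem("studies"))                   # 'studi'  (crude suffix stripping)
    print(lemmatizer.lemmatize("studies", pos="v"))  # 'study'  (dictionary form)
    print(stemmer.stem("running"))                   # 'run'
    print(lemmatizer.lemmatize("running", pos="v"))  # 'run'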

 What does NLTK stand for?


NLTK stands for Natural Language Toolkit, a Python library for working with
human language data.

 What is spaCy?
spaCy is an advanced Python library for Natural Language Processing (NLP), providing
tools for tokenization, part-of-speech tagging, and more.

 What are stopwords?


Stopwords are common words (like "the", "is", "in") that are usually removed from text
during preprocessing because they don't carry much meaning.
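
A minimal stopword-removal sketch with NLTK (assuming the stopwords resource has been downloaded):

    from nltk.corpus import stopwords
    # nltk.download("stopwords")

    stop = set(stopwords.words("english"))
    tokens = ["the", "movie", "is", "surprisingly", "good"]
    print([t for t in tokens if t not in stop])  # ['movie', 'surprisingly', 'good']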

 What are X and Y in machine learning?


 X represents the input data (features).
 Y represents the output or target (labels).

 What does the IMDB review dataset contain?


The IMDB dataset contains movie reviews, labeled as positive or negative sentiments.
We use it to train models for sentiment analysis (positive/negative classification).

 What is data and what is a label?


 Data refers to the input features (e.g., text, images).
 Label refers to the correct output or target associated with the data (e.g.,
sentiment, category).

 What is the relation between data and label?


Data provides the input that the model uses, and the label is the expected outcome that
helps the model learn during training.

 What happens in the training phase?


In the training phase, a model learns patterns from the input data (X) and
corresponding labels (Y) to make predictions on unseen data.

 What do you mean by classification?


Classification is the process of predicting the category or label of new data, based on
patterns learned during training.

 How do you evaluate a model?


Models are evaluated using metrics like accuracy, precision, recall, F1 score, and
confusion matrix to measure how well they predict on test data.

 What is the difference between text representation and model?


 Text representation converts text into a format (e.g., vectors) that models can
understand.
 Model refers to the algorithm that learns patterns from the represented text.

 What is a confusion matrix?


A confusion matrix is a table used to evaluate a model's performance, showing the true
positives, true negatives, false positives, and false negatives.

 What are TP, TN, FP, FN?


 TP : True Positives (correct positive predictions)
 TN : True Negatives (correct negative predictions)
 FP : False Positives (incorrectly predicted positives)
 FN : False Negatives (incorrectly predicted negatives)
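
A minimal end-to-end sketch with scikit-learn tying these ideas together (the tiny labeled dataset is an illustrative assumption, standing in for something like the IMDB reviews):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, confusion_matrix

    X_text = ["great movie", "loved it", "terrible film", "awful acting",
              "really great", "truly awful"]
    y = [1, 1, 0, 0, 1, 0]                     # labels: 1 = positive, 0 = negative

    vec = CountVectorizer()
    X = vec.fit_transform(X_text)              # text representation (Bag of Words)

    model = LogisticRegression().fit(X, y)     # training phase: learn from data (X) and labels (y)

    X_test = vec.transform(["great acting", "awful movie"])
    y_pred = model.predict(X_test)             # classification of unseen data
    y_true = [1, 0]

    print(accuracy_score(y_true, y_pred))
    print(confusion_matrix(y_true, y_pred))    # rows: true class, columns: predicted class
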
 What is the real-time application of text classification?
Text classification is used in spam detection, sentiment analysis, email
categorization, chatbot responses, and customer service automation.
