
NLP Previous Sem

The document outlines the examination structure for the 6th Semester B. Tech CSE Semester End Examination in May 2024, focusing on Natural Language Processing. It includes detailed instructions for answering questions from various units, covering topics such as NLP applications, Python programming for text analysis, regular expressions, and supervised classification. Each unit presents multiple questions, allowing students to demonstrate their understanding of NLP concepts and techniques.

Uploaded by Amann Adil
Copyright © All Rights Reserved

SRN

6th Semester B. Tech CSE Semester End Examination MAY 2024


Course Title: Natural Language Processing
Course Code: B20EFS613 - 22221607
Time: 3 Hours Max. Marks: 100
Note:
1. Answer ONE FULL question from each unit.
2. Verify and ensure that the question paper is completely printed before answering it.
3. Any queries or discrepancies regarding the question paper must be brought to the notice of the invigilator.
4. Students must check the course title and course code before answering the question paper.
UNIT – I Marks
1. a) [7 Marks] Natural language processing is one of the advanced techniques, and it has expanded into many further advancements. Identify and explain the real-world applications of natural language processing.
b) [8 Marks]
i. Let text1 = 'The cat is on the mat'. What is the difference between the following two lines? Which one will give a larger value? Will this be the case for other texts?
>>> sorted(set([w.lower() for w in text1]))
>>> sorted([w.lower() for w in set(text1)])
ii. Build Python code that finds all the words occurring at least four times in the Brown Corpus.
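A minimal sketch for part b). Because text1 is given as a plain string, iterating it would yield characters, so the sketch assumes the intent is a list of word tokens and splits it first. A collections.Counter stands in for NLTK's FreqDist; the Brown helper imports NLTK lazily and assumes nltk.download('brown') has been run. All function names are illustrative.

```python
from collections import Counter

def unique_lowered(tokens):
    # Line 1: lower-case first, then deduplicate, so "The" and "the" collapse
    return sorted(set([w.lower() for w in tokens]))

def lowered_unique(tokens):
    # Line 2: deduplicate first, then lower-case, so "The" and "the" both survive
    return sorted([w.lower() for w in set(tokens)])

def words_at_least(tokens, n=4):
    # Count every token and keep those seen n or more times
    counts = Counter(tokens)
    return sorted(w for w, c in counts.items() if c >= n)

def brown_words_at_least(n=4):
    # Assumes NLTK is installed and the 'brown' corpus has been downloaded
    from nltk.corpus import brown
    return words_at_least(brown.words(), n)

text1 = 'The cat is on the mat'.split()
print(unique_lowered(text1))
print(lowered_unique(text1))
```

Because the second line lower-cases after removing duplicates, its result can never be shorter than the first; the two are equal only when no word appears in more than one capitalization, and that holds for any text.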
c) [10 Marks] Develop regular expressions to find all words in a text file that meet the following conditions. The result should be a list of words: ['word1', 'word2', ...].
i. Ending in ize
ii. Containing the letter z
iii. Containing the sequence of letters pt
iv. All lowercase letters except for an initial capital (i.e., titlecase)
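One way to express the four conditions, sketched under two assumptions: words are runs of ASCII letters, and "containing the letter z" also accepts a capital Z. The file name passed to words_in_file is hypothetical.

```python
import re

# One pattern per condition, applied to each word token
PATTERNS = {
    "ends_in_ize": r"ize$",            # i.   ending in ize
    "contains_z": r"[zZ]",             # ii.  containing the letter z (either case, an assumption)
    "contains_pt": r"pt",              # iii. containing the sequence pt
    "titlecase": r"^[A-Z][a-z]*$",     # iv.  initial capital, rest lowercase
}

def words_matching(text, pattern):
    # Tokenize on runs of letters, then keep the words the pattern matches
    tokens = re.findall(r"[A-Za-z]+", text)
    return [w for w in tokens if re.search(pattern, w)]

def words_in_file(path, pattern):
    # `path` is any text file you want to search, e.g. a hypothetical "story.txt"
    with open(path, encoding="utf-8") as f:
        return words_matching(f.read(), pattern)

sample = "Zebras organize the Prompt empty realize Apt puzzle"
for name, pattern in PATTERNS.items():
    print(name, words_matching(sample, pattern))
```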
OR
2. a) [8 Marks] Develop and interpret suitable Python code that performs the following basic NLP operations on a text file, with an example:
i. concordance()
ii. tokenize()
iii. similar()
iv. common_contexts()
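The sketch below is a from-scratch, standard-library approximation of what NLTK's Text.concordance(), Text.similar() and Text.common_contexts() compute, so the logic is visible. NLTK's real methods are called as nltk.Text(tokens).concordance('word') and print their results instead of returning them; the helper names and the context-overlap scoring here are simplifications, not NLTK's exact algorithm.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Words and single punctuation marks, similar in spirit to nltk.wordpunct_tokenize
    return re.findall(r"\w+|[^\w\s]", text)

def concordance(tokens, word, width=3):
    # Every occurrence of `word` with `width` tokens of context on each side
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

def contexts(tokens):
    # Map each lower-cased word to a Counter of its (left, right) neighbour pairs
    low = [t.lower() for t in tokens]
    index = defaultdict(Counter)
    for i in range(1, len(low) - 1):
        index[low[i]][(low[i - 1], low[i + 1])] += 1
    return index

def common_contexts(tokens, words):
    # Contexts shared by all the given words
    index = contexts(tokens)
    return sorted(set.intersection(*(set(index[w.lower()]) for w in words)))

def similar(tokens, word, num=20):
    # Other words ranked by how many contexts they share with `word`
    index = contexts(tokens)
    target = set(index[word.lower()])
    scores = Counter()
    for other, ctx in index.items():
        if other != word.lower():
            overlap = len(target & set(ctx))
            if overlap:
                scores[other] = overlap
    return [w for w, _ in scores.most_common(num)]

tokens = tokenize("The cat sat on the mat. The dog sat on the rug.")
print(concordance(tokens, "sat", width=1))
print(similar(tokens, "cat"))
print(common_contexts(tokens, ["cat", "dog"]))
```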
b) [8 Marks] Suppose a text file contains more than 5000 words narrating a ‘Sherlock Holmes’ story. Give a definition of lexical diversity, and develop and interpret a Python program that computes the lexical diversity of the text to gauge its complexity, with an example.
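A minimal sketch, assuming lexical diversity is taken as the type-token ratio (distinct words divided by total words), the definition used in the NLTK book. The file name is hypothetical.

```python
import re

def lexical_diversity(text):
    # Type-token ratio: number of distinct word types / number of word tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def file_lexical_diversity(path):
    # e.g. a hypothetical "sherlock.txt" holding the story
    with open(path, encoding="utf-8") as f:
        return lexical_diversity(f.read())

print(lexical_diversity("the cat and the hat"))
```

The ratio falls as a text grows, because common words keep repeating, so a 5000-word story should be compared with other texts on equal-length samples.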
c) [9 Marks] Identify any five common nouns in English literature and examine the holonym-meronym relations for these nouns, outlining the examination procedure step by step.
UNIT – II
3. a) [10 Marks] A text file contains a large collection of words. Apply the following regular expressions in Python code and describe the set of strings matched by each:
i. [a-zA-Z]+
ii. [A-Z][a-z]*
iii. ^m+i+n+e
iv. [^ghi] [mno] [jlk] [def]
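A short sketch that runs each expression over a few sample words and keeps the matched substrings. It assumes the spaces printed in item iv are a typesetting artifact and uses the compact form [^ghi][mno][jlk][def]; if the spaces are intended, they match literal spaces.

```python
import re

PATTERNS = {
    "i": r"[a-zA-Z]+",                # one or more ASCII letters
    "ii": r"[A-Z][a-z]*",             # a capital followed by zero or more lowercase letters
    "iii": r"^m+i+n+e",               # at the start: m's, then i's, then n's, then e (no $, so "mines" matches too)
    "iv": r"[^ghi][mno][jlk][def]",   # any char but g/h/i, then m/n/o, then j/l/k, then d/e/f
}

def matches(words, pattern):
    # First match of `pattern` inside each word, if any
    found = []
    for w in words:
        m = re.search(pattern, w)
        if m:
            found.append(m.group())
    return found

words = "Sherlock mine mmiinne mines sold gold 42".split()
for key, pattern in PATTERNS.items():
    print(key, pattern, matches(words, pattern))
```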

b) [10 Marks] Multilingual text mixes many languages, each written with its own Unicode characters. With a neat Unicode decoding and encoding diagram, illustrate how NLP handles multilingual text.
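The encode/decode cycle can be shown with Python's built-in str and bytes types: inside the program text is a sequence of Unicode code points, and it becomes bytes only when encoded (here as UTF-8) for a file or network, and back again when decoded. The sample string is illustrative.

```python
import unicodedata

def encode_decode(text, encoding="utf-8"):
    data = text.encode(encoding)        # encode: str of code points -> bytes
    return data, data.decode(encoding)  # decode: bytes -> str

def describe(text):
    # Each character's code point and official Unicode name
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

text = "caf\u00e9 \u0928\u092e\u0938\u094d\u0924\u0947"  # "café नमस्ते" (French + Hindi)
data, restored = encode_decode(text)
print(len(text), "code points ->", len(data), "UTF-8 bytes")
print(restored == text)                   # the round trip is lossless
print(data.decode("latin-1") == text)     # decoding with the wrong codec gives mojibake
for ch, cp, name in describe(text[:4]):
    print(cp, name)
```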
c) [5 Marks] Compare stemming and lemmatization operations in NLP.
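A toy illustration of the contrast, not a real stemmer or lemmatizer: stemming chops suffixes by rule and may leave a non-word, while lemmatization looks the word (and its part of speech) up to return a dictionary form. The suffix list and lemma table are hypothetical; in NLTK the real tools are PorterStemmer and WordNetLemmatizer.

```python
# Hypothetical suffix rules, tried in order
SUFFIXES = ("ational", "ies", "ing", "ed", "es", "s")

def crude_stem(word):
    # Rule-based: strip the first matching suffix if at least 3 letters remain
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[:-len(suf)]
    return word

# Hypothetical lemma dictionary keyed by (word, POS); real lemmatizers use a full lexicon
LEMMAS = {("studies", "v"): "study", ("studying", "v"): "study",
          ("better", "a"): "good", ("ran", "v"): "run"}

def lookup_lemma(word, pos):
    return LEMMAS.get((word, pos), word)

for word, pos in [("studies", "v"), ("studying", "v"), ("better", "a"), ("ran", "v")]:
    print(word, crude_stem(word), lookup_lemma(word, pos))
```

The stem "stud" for "studies" is not a word, and the stemmer cannot relate "better" to "good" or "ran" to "run"; lemmatization can, at the cost of needing a lexicon and POS information.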
OR
4. a) [10 Marks] Develop and interpret Python code that scrapes a favorite web page and extracts some text from it. For example, access a weather site and pull out the top temperature for a city from the HTML document.
URL: http://www.accuweather.com/en/us/charlottesville-va/22902/weather-forecast/331243
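A standard-library sketch of the scrape-and-extract idea. The real page's markup is not known here, so the CSS class "temp" is a hypothetical placeholder: inspect the page to find the element that actually holds the temperature. Some sites also render content with JavaScript or block scripted requests, so fetching may not return the expected HTML.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Elements that never get an end tag
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element whose class list contains `css_class`."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # nesting depth inside the current matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data

def extract_temperatures(html, css_class="temp"):
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    temps = []
    for t in parser.texts:
        m = re.search(r"-?\d+", t)      # first signed integer, e.g. "72°F" -> 72
        if m:
            temps.append(int(m.group()))
    return temps

def fetch(url):
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

def top_temperature(url, css_class="temp"):
    temps = extract_temperatures(fetch(url), css_class)
    return max(temps) if temps else None
```

Separating fetching from parsing lets the extraction logic be checked on a saved HTML string before pointing it at the live URL.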
b) [10 Marks] With an illustrative block diagram and the corresponding Python code, explain the NLP pipeline used to build a vocabulary.
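A minimal pipeline sketch: raw text passes through normalization, tokenization and stopword filtering, tokens are counted, and each surviving word receives an integer index. The tiny stopword list, the "<unk>" token and the min_count cut-off are illustrative choices, not a fixed standard.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}  # small illustrative list

def normalize(text):
    return text.lower()

def tokenize(text):
    return re.findall(r"[a-z]+", text)

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def preprocess(doc):
    # Pipeline stages: normalize -> tokenize -> filter
    return remove_stopwords(tokenize(normalize(doc)))

def build_vocab(docs, min_count=1):
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    vocab = {"<unk>": 0}                  # index 0 reserved for unseen words
    for w in sorted(w for w, c in counts.items() if c >= min_count):
        vocab[w] = len(vocab)
    return vocab

def encode(doc, vocab):
    return [vocab.get(t, 0) for t in preprocess(doc)]

docs = ["The cat sat on the mat.", "The dog sat."]
vocab = build_vocab(docs)
print(vocab)
print(encode("A cat sat on a bird", vocab))
```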
c) [5 Marks] Define text normalization. Explain the techniques of text normalization with suitable Python code.
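A sketch of several common normalization steps chained in one function. Which steps to apply, and the "<num>" placeholder, are illustrative choices; stemming and lemmatization are further normalization options not shown here.

```python
import re
import string
import unicodedata

def normalize_text(text):
    text = unicodedata.normalize("NFKC", text)                        # unify equivalent Unicode forms (e.g. ligatures)
    text = text.lower()                                               # case folding
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip ASCII punctuation
    text = re.sub(r"\d+", "<num>", text)                              # replace numbers with a placeholder
    return " ".join(text.split())                                     # collapse runs of whitespace

print(normalize_text("  Hello,   WORLD!! It's 2024. "))
```

Punctuation is stripped before numbers are replaced so that the angle brackets of the placeholder survive.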
UNIT – III
5. a) [10 Marks] Linguists use morphological, syntactic, and semantic clues to determine the category of a word. Explain the following terms with a suitable example.
i. Morphological Clues
ii. Syntactic Clues
iii. Semantic Clues
iv. New Words
b) [5 Marks] Construct two dictionaries, Student (mapping each student's SRN to their marks) and NewEntryStudents, and add some entries to each. Now issue the command Student.update(NewEntryStudents). What did this do? What might it be useful for?
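A quick demonstration; the SRNs and marks are made-up values.

```python
Student = {"R21CS001": 78, "R21CS002": 64}
NewEntryStudents = {"R21CS002": 71, "R21CS003": 88}

result = Student.update(NewEntryStudents)   # merges in place and returns None
# Student now holds all three SRNs; R21CS002's marks were overwritten by the newer value 71
print(Student)
```

update() adds keys that are new and overwrites values for keys already present, while leaving NewEntryStudents unchanged. That makes it handy for merging newly admitted students or revised marks into an existing record in one step.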
c) [10 Marks] Train a bigram tagger with no backoff tagger, and run it on some of the training data. Next, run it on some new data. What happens to the performance of the tagger? Why?
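A from-scratch sketch of a bigram tagger without backoff on toy data, written to expose why performance collapses on new text. In NLTK the equivalent would be nltk.BigramTagger(train_sents) with no backoff argument, trained and tested on Brown sentences; the toy sentences and the "<s>" start marker here are illustrative.

```python
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag for "no previous word", distinct from None

class BigramTagger:
    """Tag each word from (previous tag, current word), with no backoff tagger."""

    def __init__(self, tagged_sents):
        counts = defaultdict(Counter)
        for sent in tagged_sents:
            prev = START
            for word, tag in sent:
                counts[(prev, word)][tag] += 1
                prev = tag
        # Keep the most frequent tag for each context seen in training
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def tag(self, words):
        tagged, prev = [], START
        for word in words:
            tag = self.table.get((prev, word))   # None when the context is unseen
            tagged.append((word, tag))
            prev = tag                           # a None makes the next context unseen too
        return tagged

def accuracy(tagger, gold_sents):
    total = correct = 0
    for sent in gold_sents:
        predicted = tagger.tag([w for w, _ in sent])
        for (_, gold), (_, guess) in zip(sent, predicted):
            total += 1
            correct += gold == guess
    return correct / total if total else 0.0

train = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
         [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]
new = [[("the", "DT"), ("cow", "NN"), ("sleeps", "VBZ")]]
tagger = BigramTagger(train)
print(accuracy(tagger, train), accuracy(tagger, new))
```

On training data every context was seen, so accuracy is perfect. On new data the first unseen word gets None, the following context (None, word) was never seen either, and the failure propagates to the end of the sentence: a sparse-data problem that a backoff tagger is meant to fix.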
OR
6. a) [10 Marks] Words can be grouped into classes, such as nouns, verbs, adjectives, and adverbs. Inspect these classes: what explanation do you have for lexical categories?
b) [10 Marks] Tokenize and tag the following sentence: “They wind back the clock, while we chase after the wind.” Analyze which different pronunciations and parts of speech are involved.
c) [5 Marks] Construct a dictionary e to represent a single lexical entry for a word of your choice. Define keys such as headword, part-of-speech, sense, and example, and assign them suitable values.
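One possible entry, with "bank" chosen arbitrarily as the word and the sense and example written for illustration:

```python
e = {
    "headword": "bank",
    "part-of-speech": "noun",
    "sense": "the land alongside a river or lake",
    "example": "They had a picnic on the bank of the river.",
}

print(e["headword"], "(" + e["part-of-speech"] + "):", e["sense"])
```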
UNIT – IV
7. a) [10 Marks] A decision tree is constructed by partitioning the training samples into successive subsets. Illustrate five different algorithms that have been developed to construct an accurate decision tree efficiently.
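Tree-building algorithms differ mainly in how they score candidate splits: ID3 uses information gain, C4.5 uses the gain ratio, and CART uses Gini impurity. The sketch shows the entropy-based information gain behind ID3, on a made-up tagging example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    # H = -sum(p * log2 p) over the label distribution
    n = len(labels)
    if not n:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # Entropy before the split minus the size-weighted entropy of each branch
    branches = defaultdict(list)
    for row, label in zip(rows, labels):
        branches[row[feature]].append(label)
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

def best_feature(rows, labels):
    # Greedy choice made at every node: the feature with the highest gain
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))

rows = [{"suffix": "ing", "cap": False}, {"suffix": "ing", "cap": True},
        {"suffix": "ed", "cap": False}, {"suffix": "ed", "cap": True}]
labels = ["VBG", "VBG", "VBD", "VBD"]
print(best_feature(rows, labels))
```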
b) [10 Marks] “Generative models are strictly more powerful than conditional models.” Justify the statement with suitable examples.
c) [5 Marks] In maximum entropy models, the term “features” often refers to joint-features. Analyze how joint-features are connected to maximum entropy.
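A joint-feature f_i(x, y) inspects an input x and a candidate label y together. Among all conditional distributions whose expected joint-feature values match those observed in training (the last line, where \tilde{P} is the empirical distribution), the maximum entropy model chooses the one with highest entropy, which takes the log-linear form of the first line with one weight w_i per joint-feature. The "-ing"/VBG feature is a hypothetical example.

```latex
P(y \mid x) = \frac{\exp\big(\sum_i w_i \, f_i(x, y)\big)}{\sum_{y'} \exp\big(\sum_i w_i \, f_i(x, y')\big)}

f_{\text{ing,VBG}}(x, y) =
\begin{cases}
1 & \text{if } x \text{ ends in -ing and } y = \text{VBG} \\
0 & \text{otherwise}
\end{cases}

\sum_{x} \tilde{P}(x) \sum_{y} P(y \mid x)\, f_i(x, y) = \sum_{x, y} \tilde{P}(x, y)\, f_i(x, y)
```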
OR

8. a) [10 Marks] The synonyms strong and powerful pattern differently (try combining them with chip and sales). What features are relevant in this distinction? Build a classifier that predicts when each word should be used.
b) [10 Marks] Suppose you wanted to automatically generate a prose description of a scene, and already had a word to uniquely describe each entity, such as the book, and simply wanted to decide whether to use in or on in relating various items, e.g., the book is in the cupboard versus the book is on the shelf. Analyze this issue by looking at corpus data and writing programs as needed. Consider the following examples:
i. in the car versus on the train
ii. in town versus on campus
iii. in the picture versus on the screen
iv. in Macbeth versus on Letterman

c) [5 Marks] Consider one of the language technologies mentioned in this section, such as word sense disambiguation, semantic role labelling, question answering, machine translation, or named entity detection. Identify what type and quantity of annotated data is required to develop such systems. Why do you think a large amount of data is required?

***

SRN

6th Semester B.Tech CSE Semester End Examination May 2024


Course Title: Natural Language Processing
Course Code: B21ET0601 - 22121602
Time: 3 Hours Max. Marks: 100
Note:
1. Answer ONE FULL question from each unit.
2. Verify and ensure that the question paper is completely printed before answering it.
3. Any queries or discrepancies regarding the question paper must be brought to the notice of the invigilator.
4. Students must check the course title and course code before answering the question paper.
UNIT – I Marks
1. a) [10 Marks] You are working on a project to analyze a large dataset of customer reviews for a range of products, and you are asked to extract meaningful insights about common customer complaints, product features that are praised, and overall sentiment towards different products. Explain the following NLTK functions for searching text, with examples from the Brown Corpus: concordance(), similar(), and common_contexts().
b) [10 Marks] A corpus is a large collection of linguistic data used to perform NLP operations. Explain the corpora listed below, with a Python code snippet for accessing each.
i. Gutenberg Corpus
ii. Web and Chat Text
iii. Brown Corpus
iv. Reuters Corpus
v. Inaugural Address Corpus
c) [5 Marks] Demonstrate a program that prints the 50 most frequent trigrams (3 adjacent words) of a text, omitting trigrams that contain stopwords.
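A standard-library sketch. The short stopword set is an illustrative stand-in; NLTK's stopwords.words('english') (after nltk.download('stopwords')) is the fuller list you would normally pass in.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was", "on", "at"}

def top_trigrams(text, n=50, stopwords=STOPWORDS):
    tokens = re.findall(r"[a-z']+", text.lower())
    trigrams = zip(tokens, tokens[1:], tokens[2:])          # every run of 3 adjacent words
    counts = Counter(t for t in trigrams if not any(w in stopwords for w in t))
    return counts.most_common(n)

text = "big red dog ran big red dog ran the big red dog"
for trigram, count in top_trigrams(text):
    print(" ".join(trigram), count)
```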
OR
2. a) [10 Marks] Explain conditional frequency distributions in detail. Write a program to find the frequency of the word “is” in the genre ‘news’ of the Brown Corpus.
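A conditional frequency distribution is a frequency distribution per condition (here, per genre). The first function builds one from (condition, word) pairs with the standard library; the second shows the NLTK version and assumes NLTK plus the 'brown' data are installed.

```python
from collections import Counter, defaultdict

def conditional_freq(pairs):
    # (condition, word) pairs -> {condition: Counter of lower-cased words}
    cfd = defaultdict(Counter)
    for cond, word in pairs:
        cfd[cond][word.lower()] += 1
    return cfd

def brown_is_in_news():
    import nltk
    from nltk.corpus import brown
    cfd = nltk.ConditionalFreqDist(
        (genre, word.lower())
        for genre in brown.categories()
        for word in brown.words(categories=genre)
    )
    return cfd["news"]["is"]

pairs = [("news", "The"), ("news", "is"), ("news", "is"), ("romance", "is")]
cfd = conditional_freq(pairs)
print(cfd["news"]["is"], cfd["romance"]["is"])
```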

b) [10 Marks] NLP models like ChatGPT3 are very good at natural language understanding, which is why they respond so well to different types of prompts in various languages. Explain the different technologies involved in automatic natural language understanding.
c) [5 Marks] Compare and contrast stemming and lemmatization operations in NLP.
UNIT – II
3. a) [10 Marks] Regular expressions are a powerful tool for pattern matching in NLP. Explain the basic metacharacters used in regular expressions, including wildcards, ranges, and closures, with an example of each, and describe how they can be used in pattern-matching tasks.
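A compact table of the basic metacharacters, each paired with strings it matches, checked with re.search. The examples are illustrative.

```python
import re

EXAMPLES = [
    (r"c.t",     ["cat", "cot", "c9t"], "wildcard: . matches any single character except newline"),
    (r"[a-c]at", ["bat", "cat"],        "range: [a-c] matches one of a, b, c"),
    (r"go*gle",  ["ggle", "google"],    "closure: * repeats the previous item zero or more times"),
    (r"go+gle",  ["gogle", "google"],   "closure: + repeats the previous item one or more times"),
    (r"colou?r", ["color", "colour"],   "? makes the previous item optional"),
    (r"\d{2,4}", ["12", "2024"],        "{m,n} repeats between m and n times"),
    (r"^The",    ["The end"],           "^ anchors the match at the start"),
    (r"end$",    ["The end"],           "$ anchors the match at the end"),
]

def check(pattern, strings):
    return all(re.search(pattern, s) for s in strings)

for pattern, strings, note in EXAMPLES:
    print(f"{pattern:10} {check(pattern, strings)}  {note}")
```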
b) [10 Marks] Build regular expressions to check:
i. whether or not the string str = "Data Science" starts with a given pattern;
ii. whether whitespace has been removed from a string that has whitespace at its beginning and end.
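A sketch of both checks. It names the variable s rather than str, since str shadows Python's built-in type; the prefix "Data" is just an example pattern.

```python
import re

def starts_with(s, prefix):
    # re.escape treats the prefix literally; re.match anchors at the start of s
    return re.match(re.escape(prefix), s) is not None

def strip_ends(s):
    # Remove whitespace at the beginning (^\s+) and at the end (\s+$)
    return re.sub(r"^\s+|\s+$", "", s)

def has_outer_whitespace(s):
    return re.search(r"^\s|\s$", s) is not None

s = "Data Science"
print(starts_with(s, "Data"), starts_with(s, "Science"))
padded = "   Data Science  "
print(repr(strip_ends(padded)), has_outer_whitespace(strip_ends(padded)))
```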
c) [5 Marks] Differentiate between lists, strings, and tuples.
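A short demonstration of the key differences: lists are mutable, strings and tuples are not, and immutability is what lets strings and tuples serve as dictionary keys.

```python
word_list = ["cat", "sat"]
word_str = "cat sat"
word_tuple = ("cat", "sat")

word_list[0] = "dog"               # lists are mutable: items can be replaced in place

str_is_immutable = tuple_is_immutable = list_is_unhashable = False
try:
    word_str[0] = "b"              # strings are immutable sequences of characters
except TypeError:
    str_is_immutable = True
try:
    word_tuple[0] = "dog"          # tuples are immutable sequences of any objects
except TypeError:
    tuple_is_immutable = True

bigram_counts = {word_tuple: 1}    # tuples are hashable, so they can be dict keys
try:
    {word_list: 1}                 # lists are not
except TypeError:
    list_is_unhashable = True

# All three support indexing, slicing, len() and iteration
print(word_list[::-1], word_str[:3], word_tuple[1:])
```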
OR
4. a) [10 Marks] You are developing a text editor that includes a feature to detect and highlight specific grammatical constructs in text. Build regular expressions to match the following classes of strings:
i. Strings containing any one of the determiners a, an, and the.
ii. An arithmetic expression using integers, addition, and multiplication, such as 2*3+8.
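One way to write both expressions. It assumes determiners are matched as whole words in any capitalization, and that arithmetic expressions contain no spaces or parentheses.

```python
import re

# i. a whole-word determiner: a, an or the
DETERMINER = re.compile(r"\b(?:a|an|the)\b", re.IGNORECASE)

# ii. integers joined by + or *, e.g. 2*3+8
ARITHMETIC = re.compile(r"\d+(?:[+*]\d+)*")

def has_determiner(s):
    return DETERMINER.search(s) is not None

def is_arithmetic(s):
    return ARITHMETIC.fullmatch(s) is not None

def highlight(s):
    # Wrap each determiner in brackets, as an editor might highlight it
    return DETERMINER.sub(lambda m: f"[{m.group()}]", s)

print(highlight("a cat and the dog"), is_arithmetic("2*3+8"))
```

The word boundaries \b stop the determiner pattern from firing inside words such as "and" or "theory".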
b) [10 Marks] Explain segmentation with derivative and evaluate function.
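Assuming the question refers to the word-segmentation example in the NLTK book (Chapter 3), a segmentation is represented as a binary string in which a "1" marks a word boundary after that character, and evaluate() scores it as the size of the segmented text plus the size of the derived lexicon; lower is better. A sketch of those two functions:

```python
def segment(text, segs):
    # segs[i] == "1" means a word boundary after text[i]
    words, last = [], 0
    for i, flag in enumerate(segs):
        if flag == "1":
            words.append(text[last:i + 1])
            last = i + 1
    words.append(text[last:])
    return words

def evaluate(text, segs):
    # Objective to minimise: words in the text + lexicon size
    # (each distinct word's length plus 1 for a delimiter)
    words = segment(text, segs)
    text_size = len(words)
    lexicon_size = sum(len(w) + 1 for w in set(words))
    return text_size + lexicon_size

text = "thecatthecat"
print(segment(text, "00100100100"), evaluate(text, "00100100100"))
print(evaluate(text, "0" * 11))   # no boundaries at all scores worse
```

A search procedure (the NLTK book uses simulated annealing) flips boundary bits and keeps segmentations that lower this score.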
c) [5 Marks] Our programs often need to deal with different languages and different character sets. Explain what you understand by Unicode.
UNIT – III
5. a) [10 Marks] The POS of a word is identified using the word itself, its meaning, and the context in which it is used. Explain POS tagging, and illustrate reading a tagged corpus and POS tagging with Python code using the NLTK library (use the Brown Corpus).
b) [10 Marks] A Python dictionary is an efficient way of storing data as key-value pairs. Explain how words are mapped to tags using a dictionary. Explain and develop a default dictionary whose values are lists, with an example program.
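A plain dict maps each word to one tag; a defaultdict(list) lets each word collect every tag it has been seen with, because a missing key starts out as an empty list. The sample pairs use Brown-style tags; the Brown helper assumes NLTK and its 'brown' data are installed.

```python
from collections import defaultdict

# A plain dict maps each word to a single tag
pos = {"colorless": "ADJ", "ideas": "N", "sleep": "V", "furiously": "ADV"}

def word_to_tags(tagged_words):
    index = defaultdict(list)          # missing key -> [] so we can append immediately
    for word, tag in tagged_words:
        tags = index[word.lower()]
        if tag not in tags:
            tags.append(tag)
    return index

def brown_word_tags(category="news"):
    from nltk.corpus import brown
    return word_to_tags(brown.tagged_words(categories=category))

tagged = [("They", "PPSS"), ("wind", "VB"), ("the", "AT"), ("wind", "NN"), ("wind", "VB")]
index = word_to_tags(tagged)
print(pos["ideas"], index["wind"])
```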
c) [5 Marks] Develop a Python program to find the POS of the words in the given sentence.
Sent = "The quick brown fox jumps over the lazy dog"
OR
6. a) [10 Marks] With a neat diagram, explain the general process of N-gram tagging using the NLTK library's built-in taggers. Why should data be split into training and test portions?
b) [10 Marks] With suitable Python code, explain automatic tagging with the evaluate function.
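A from-scratch sketch of the simplest automatic tagger (assign the most frequent training tag to every word) together with a train/test split and an evaluate function that measures accuracy on held-out data. In NLTK the counterparts are nltk.DefaultTagger('NN') and the tagger's evaluate() method (newer releases prefer accuracy()); the toy data here is made up.

```python
from collections import Counter

def split_data(tagged_sents, ratio=0.9):
    # Train on the first 90% of sentences, hold out the rest so the score reflects unseen text
    cut = int(len(tagged_sents) * ratio)
    return tagged_sents[:cut], tagged_sents[cut:]

def train_default_tag(train_sents):
    # The default tagger assigns the single most frequent training tag to every word
    counts = Counter(tag for sent in train_sents for _, tag in sent)
    return counts.most_common(1)[0][0]

def evaluate(default_tag, test_sents):
    # Accuracy: fraction of gold tags the tagger reproduces
    gold = [tag for sent in test_sents for _, tag in sent]
    return sum(tag == default_tag for tag in gold) / len(gold)

train = [[("a", "NN"), ("b", "NN"), ("c", "VB")]]
test = [[("x", "NN"), ("y", "VB")]]
tag = train_default_tag(train)
print(tag, evaluate(tag, test))
```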
c) [5 Marks] Explain the universal part-of-speech tagset.
UNIT – IV
7. a) [10 Marks] Supervised classification classifies inputs based on labeled data. With a neat diagram, explain the working principle of supervised text classification.
b) [10 Marks] Develop a Python NLP program for movie reviews using the NLTK library's Naïve Bayes classifier and the ‘names’ corpus.
c) [5 Marks] Explain the confusion matrix for the bigram tagger.
OR
8. a) [10 Marks] Develop a Python program for POS tagging using the Decision Tree classifier in NLTK. Use the tagged Brown Corpus for training.
b) [10 Marks] In general, one text may convey the same meaning as another text (text2). Briefly describe recognizing textual entailment with an example.
c) [5 Marks] A decision tree is a supervised classifier used to classify inputs. With a suitable diagram, explain the decision tree for classification.

***
