NLP Previous Sem
Page 1 of 3
ii. [A-Z][a-z]*
iii. ^m+i+n+e
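Items ii and iii above can be explored with Python's `re` module. This minimal sketch assumes the task is to describe which strings each pattern matches; the test words are arbitrary choices. `fullmatch` is used for `[A-Z][a-z]*` so that the whole word must fit the pattern. `match` is used for `^m+i+n+e`, which is anchored only at the start.

```python
import re

# [A-Z][a-z]*: one uppercase letter followed by zero or more lowercase letters
capitalized = re.compile(r"[A-Z][a-z]*")
# ^m+i+n+e: from the start, one or more m's, one or more i's, one or more n's, then e
mine = re.compile(r"^m+i+n+e")

for word in ["Data", "D", "data", "DATA"]:
    print(word, capitalized.fullmatch(word) is not None)

for word in ["mine", "mmmiinnne", "mineral", "mne", "amine"]:
    print(word, mine.match(word) is not None)
```

The second pattern has no `$` anchor, so `mineral` also matches (only its prefix `mine` is consumed). `mne` fails because `i+` needs at least one `i`.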
b) Multilingual text mixes many languages, and each of its characters has a unique Unicode code point. With a neat Unicode encoding and decoding diagram, illustrate how NLP handles multilingual text. (10 marks)
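In Python, the encode/decode cycle is carried out by `str.encode` and `bytes.decode`. A `str` holds Unicode code points, and an encoding such as UTF-8 turns them into bytes for storage or transmission. The sample string is an arbitrary choice. The sketch also shows what happens when bytes are decoded with the wrong codec.

```python
# Multilingual sample: French, Hindi and Japanese in one string
text = "café नमस्ते 日本"

# Encoding: code points (str) -> bytes
utf8_bytes = text.encode("utf-8")

# Decoding: bytes -> code points; must use the encoding that produced the bytes
round_trip = utf8_bytes.decode("utf-8")

# Decoding with the wrong codec does not raise here, it silently yields mojibake
garbled = utf8_bytes.decode("latin-1")

print(len(text), "code points ->", len(utf8_bytes), "UTF-8 bytes")
print(round_trip == text)
print(garbled)
print("\u00e9".encode("utf-8"))  # b'\xc3\xa9': one code point, two bytes
```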
c) Compare stemming and lemmatization operations in NLP. (5 marks)
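The contrast can be made concrete without downloading any data. The sketch below uses two deliberately crude, hypothetical stand-ins: a rule-based suffix stripper and a tiny hand-written lemma dictionary. NLTK's real counterparts are `PorterStemmer` and `WordNetLemmatizer`; the latter needs the WordNet data.

```python
# Toy stemmer: strips common suffixes by rule, with no dictionary lookup
SUFFIXES = ("ing", "ies", "es", "ed", "s")

def toy_stem(word):
    for suffix in SUFFIXES:
        # only strip if at least three characters remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Toy lemmatizer: looks words up in a (hypothetical, tiny) lemma dictionary
LEMMAS = {"studies": "study", "studying": "study", "better": "good",
          "ran": "run", "mice": "mouse"}

def toy_lemmatize(word):
    return LEMMAS.get(word, word)

for w in ["studies", "studying", "better", "ran", "mice"]:
    print(f"{w:10} stem={toy_stem(w):10} lemma={toy_lemmatize(w)}")
```

The stemmer is fast and needs no resources, but it produces non-words such as `stud` and cannot handle irregular forms like `ran` or `mice`. The lemmatizer returns dictionary forms, at the cost of needing a lexicon.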
OR
4. a) Develop and interpret Python code that scrapes a favorite web page and extracts some text from it. For example, access a weather site and pull out the top temperature of the city from the HTML document. (10 marks)
url: http://www.accuweather.com/en/us/charlottesville-va/22902/weather-forecast/331243
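This sketch uses only the standard library (`urllib.request` and `html.parser`). AccuWeather's actual markup is not known here and changes often, and the site may reject automated requests. The CSS class `temp` and the sample HTML are therefore hypothetical placeholders: inspect the real page source and adjust `TARGET_CLASS` before use.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TempExtractor(HTMLParser):
    """Collects text inside elements whose class list contains TARGET_CLASS."""

    TARGET_CLASS = "temp"  # hypothetical: find the real class in the page source

    def __init__(self):
        super().__init__()
        self.capturing = False
        self.temperatures = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.TARGET_CLASS in classes:
            self.capturing = True

    def handle_endtag(self, tag):
        # simplification: any closing tag ends the capture
        self.capturing = False

    def handle_data(self, data):
        if self.capturing and data.strip():
            self.temperatures.append(data.strip())


def fetch_html(url):
    # Browser-like User-Agent, since many sites reject Python's default one
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def top_temperature(html):
    parser = TempExtractor()
    parser.feed(html)
    return parser.temperatures[0] if parser.temperatures else None


# Offline demo on a made-up HTML snippet
sample = '<div class="card"><span class="temp">72°</span><span class="temp">58°</span></div>'
print(top_temperature(sample))
```

To try it against the live page, call `top_temperature(fetch_html(url))` with the URL above.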
b) With an illustrative block diagram and the corresponding Python code, demonstrate the NLP pipeline used to build a vocabulary. (10 marks)
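The pipeline stages (raw text, tokenization, normalization, counting, filtering, indexing) can be sketched in a few lines. This is a minimal sketch that assumes a simple regex tokenizer. `nltk.word_tokenize` is a more robust alternative, but it requires the Punkt data. `min_count` is an illustrative parameter for dropping rare words.

```python
import re
from collections import Counter

def tokenize(text):
    # Normalize case, then keep runs of letters, digits and apostrophes as tokens
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(documents, min_count=1):
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    # Filter rare words, then sort so every word gets a stable integer id
    vocab = sorted(w for w, c in counts.items() if c >= min_count)
    return {word: idx for idx, word in enumerate(vocab)}, counts

docs = ["The cat sat on the mat.", "The dog sat on the log!"]
word2id, counts = build_vocab(docs)
print(word2id)
print(counts.most_common(3))
```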
c) Define text normalization. Explain the techniques of text normalization with suitable Python code. (5 marks)
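Here is a minimal normalization sketch using only the standard library. It covers Unicode normalization with accent removal, case folding, punctuation removal, and whitespace cleanup. Stemming and lemmatization are further normalization steps and are not shown here.

```python
import re
import unicodedata

def normalize(text):
    # 1. Unicode normalization: decompose accented characters (NFKD)...
    text = unicodedata.normalize("NFKD", text)
    # ...then drop the combining accent marks, so "café" becomes "cafe"
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # 2. Case folding
    text = text.lower()
    # 3. Replace punctuation (anything not a word character or whitespace) with spaces
    text = re.sub(r"[^\w\s]", " ", text)
    # 4. Collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", text).strip()

print(normalize("The  Café   served CRÈME brûlée!"))
```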
UNIT – III
5. a) Linguists use morphological, syntactic, and semantic clues to determine the category of a word. Explain the following terms with a suitable example. (10 marks)
i. Morphological Clues
ii. Syntactic Clues
iii. Semantic Clues
iv. New Words
b) Construct two dictionaries, Student (consisting of each student's SRN and marks) and NewEntryStudents, and add some entries to each. Now issue the command Student.update(NewEntryStudents). What did this do? What might it be useful for? (5 marks)
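The behaviour of `update` can be seen directly. The SRNs and marks below are made-up values.

```python
# SRN -> marks (made-up entries)
Student = {"SRN001": 78, "SRN002": 85}
NewEntryStudents = {"SRN003": 91, "SRN002": 88}

# Merges in place: new keys are added, existing keys are overwritten; returns None
Student.update(NewEntryStudents)
print(Student)  # {'SRN001': 78, 'SRN002': 88, 'SRN003': 91}
```

`NewEntryStudents` itself is left unchanged. Typical uses include merging a batch of new records into an existing table, or extending a word-to-tag lookup with new entries. In Python 3.9+, `Student | NewEntryStudents` builds a merged copy instead of modifying `Student`.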
c) Train a bigram tagger with no backoff tagger, and run it on some of the training data. Next, run it on some new data. What happens to the performance of the tagger? Why? (10 marks)
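The following plain-Python sketch mimics what `nltk.BigramTagger` does without a backoff. The context is (previous tag, current word), and an unseen context yields `None`. The toy sentences are hand-made. In NLTK itself, the score comes from the tagger's `evaluate()` method (renamed `accuracy()` in recent releases).

```python
from collections import Counter, defaultdict

START = "<S>"  # sentence-start context, kept distinct from None ("no tag assigned")

def train_bigram(tagged_sents):
    # (previous tag, current word) -> counts of tags seen in that context
    table = defaultdict(Counter)
    for sent in tagged_sents:
        prev = START
        for word, t in sent:
            table[(prev, word)][t] += 1
            prev = t
    # keep only the most frequent tag for each context
    return {ctx: tags.most_common(1)[0][0] for ctx, tags in table.items()}

def bigram_tag(model, words):
    tagged, prev = [], START
    for word in words:
        t = model.get((prev, word))  # None for an unseen context: no backoff
        tagged.append((word, t))
        prev = t  # once t is None, every later context is unseen too
    return tagged

def accuracy(model, gold_sents):
    correct = total = 0
    for sent in gold_sents:
        predicted = bigram_tag(model, [w for w, _ in sent])
        for (_, gold), (_, pred) in zip(sent, predicted):
            correct += gold == pred
            total += 1
    return correct / total

# Hand-made toy data
train = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
         [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]
new = [[("the", "DT"), ("cat", "NN"), ("barks", "VBZ")],
       [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")]]

model = train_bigram(train)
print(accuracy(model, train))  # 1.0
print(accuracy(model, new))    # 0.5
```

The unseen word `a` makes the first tag `None`. Since the next context is then `(None, 'dog')`, the rest of the sentence fails as well, even though `dog` and `sleeps` appear in the training data. This is the sparse-data problem that a backoff tagger addresses.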
OR
6. a) Words can be grouped into classes, such as nouns, verbs, adjectives, and adverbs. Inspect these classes. What explanation do you have for lexical categories? (10 marks)
b) Tokenize and tag the following sentence: “They wind back the clock, while we chase after the wind.” Analyze which different pronunciations and parts of speech are involved. (10 marks)
c) Construct a dictionary e to represent a single lexical entry for some word of your choice. Define keys such as headword, part-of-speech, sense, and example, and assign them suitable values. (5 marks)
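One possible entry is shown below. The word and its values are illustrative choices.

```python
# One lexical entry as a dictionary (word and values chosen for illustration)
e = {
    "headword": "bank",
    "part-of-speech": "noun",
    "sense": "the land alongside a river",
    "example": "They had a picnic on the bank of the river.",
}

for key, value in e.items():
    print(f"{key}: {value}")
```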
UNIT – IV
7. a) A decision tree is constructed by partitioning the training samples into successive subsets. Several algorithms have been developed to construct an accurate decision tree efficiently; illustrate five of them. (10 marks)
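Algorithms commonly cited here include ID3, C4.5, CART and CHAID. They differ mainly in how they choose each split. ID3 uses information gain, C4.5 refines it into the gain ratio, CART uses Gini impurity, and CHAID uses chi-squared tests. The sketch computes the entropy-based information gain used by ID3 on a made-up four-example dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # Group labels by the value each row has for `feature`
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Weighted average entropy remaining after the split
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Made-up training samples: word features -> part of speech
rows = [{"suffix": "ing", "capitalized": False}, {"suffix": "ing", "capitalized": True},
        {"suffix": "ion", "capitalized": False}, {"suffix": "ion", "capitalized": True}]
labels = ["VERB", "VERB", "NOUN", "NOUN"]

gains = {f: information_gain(rows, labels, f) for f in ("suffix", "capitalized")}
best = max(gains, key=gains.get)
print(gains, "-> split on", best)
```

`suffix` separates the labels perfectly (gain 1 bit), while `capitalized` tells nothing (gain 0). An ID3-style builder would therefore split on `suffix` first and recurse on each subset.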
b) “Generative models are strictly more powerful than conditional models.” Justify the statement with suitable examples. (10 marks)
c) In maximum entropy models, the term “features” often refers to joint-features. Analyze how joint-features are connected to maximum entropy. (5 marks)
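A maximum entropy classifier uses joint-features f_i(x, label). Each one fires only for a particular property of the input combined with a particular label. The classifier turns the weighted sum of active features into a probability: P(label | x) = exp(Σ_i w_i f_i(x, label)) / Σ_label' exp(Σ_i w_i f_i(x, label')). The sketch uses two hand-written joint-features and hypothetical weights. A trained model would instead learn the weights so that its feature expectations match those observed in training data. Among distributions satisfying those constraints, the one with maximum entropy has exactly this exponential form.

```python
from math import exp

LABELS = ["NOUN", "VERB"]

def joint_features(word, label):
    # Each feature inspects the input AND the candidate label together
    return {
        "suffix=ing&label=VERB": word.endswith("ing") and label == "VERB",
        "suffix=ion&label=NOUN": word.endswith("ion") and label == "NOUN",
    }

# Hypothetical weights (a trained model would learn these)
WEIGHTS = {"suffix=ing&label=VERB": 2.0, "suffix=ion&label=NOUN": 1.5}

def prob(word):
    scores = {
        label: exp(sum(WEIGHTS[f] for f, fires in joint_features(word, label).items() if fires))
        for label in LABELS
    }
    z = sum(scores.values())  # normalizing constant Z(x)
    return {label: s / z for label, s in scores.items()}

print(prob("running"))  # VERB receives most of the probability mass
print(prob("table"))    # no feature fires, so the distribution is uniform
```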
OR
8. a) The synonyms strong and powerful pattern differently (try combining them with chip and sales). What features are relevant in this distinction? Build a classifier that predicts when each word should be used. (10 marks)
b) Suppose you wanted to automatically generate a prose description of a scene, and already had a word to uniquely describe each entity, such as the book, and simply wanted to decide whether to use in or on in relating various items, e.g., the book is in the cupboard versus the book is on the shelf. Analyze this issue by looking at corpus data and writing programs as needed. Consider the following examples: (10 marks)
i. in the car versus on the train
ii. in town versus on campus
iii. in the picture versus on the screen
iv. in Macbeth versus on Letterman
c) Consider one of the language technologies mentioned in this section, such as word sense disambiguation, semantic role labelling, question answering, machine translation, or named entity detection. Identify what type and quantity of annotated data is required for developing such systems. Why do you think a large amount of data is required? (5 marks)
***
b) NLP models like ChatGPT3 are very good at natural language understanding, which is why they respond so well to different types of prompts in various languages. Explain the different technologies involved in automatic natural language understanding. (10 marks)
c) Compare and contrast stemming and lemmatization operations in NLP. (5 marks)
UNIT – II
3. a) Regular expressions are a powerful tool for pattern matching in NLP. Explain the basic meta-characters used in regular expressions, including wildcards, ranges, and closures, with examples of each meta-character, and describe how they can be used in pattern matching tasks. (10 marks)
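Each meta-character can be tried against a small word list with Python's `re` module. The word list is arbitrary. `fullmatch` forces the pattern to cover the whole word, and `re.search` is used for the anchor example.

```python
import re

words = ["cat", "cot", "cut", "coat", "ct", "color", "colour", "3cats"]

def matches(pattern):
    # fullmatch: the pattern must account for the whole word
    return [w for w in words if re.fullmatch(pattern, w)]

print(matches(r"c.t"))         # wildcard: '.' is any one character
print(matches(r"c[ao]t"))      # range/set: exactly one character from [ao]
print(matches(r"co*t"))        # closure '*': zero or more o's
print(matches(r"co+t"))        # closure '+': one or more o's
print(matches(r"colou?r"))     # '?': the u is optional
print(matches(r"[0-9]+cats"))  # range [0-9] combined with closure '+'
print([w for w in words if re.search(r"t$", w)])  # anchor '$': ends with t
```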
b) Build the regular expression to check: (10 marks)
i. whether the string starts with the given pattern or not, for str = "Data Science"
ii. whether whitespace has been removed from a string that had whitespace at its beginning and end.
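Both checks can be sketched with `re`. The variable is named `text` rather than `str` to avoid shadowing Python's built-in type, and the padded sample string is an arbitrary choice.

```python
import re

text = "Data Science"

# i. '^' anchors the pattern to the start of the string
starts_with_data = re.match(r"^Data", text) is not None

# ii. '^\s+' matches leading whitespace and '\s+$' trailing whitespace
OUTER_WS = re.compile(r"^\s+|\s+$")
padded = "   Data Science \t "
stripped = OUTER_WS.sub("", padded)               # remove the outer whitespace
whitespace_removed = OUTER_WS.search(stripped) is None

print(starts_with_data, repr(stripped), whitespace_removed)
```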
c) Differentiate between lists, strings, and tuples. (5 marks)
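The key difference is mutability. The sketch shows what each type allows.

```python
items_list = ["nlp", "ml"]   # list: mutable, ordered, may hold any element types
items_tuple = ("nlp", "ml")  # tuple: immutable, ordered, hashable if its items are
text = "nlp"                 # str: immutable sequence of characters

items_list.append("cv")      # a list can grow in place

try:
    items_tuple[0] = "ai"    # tuples reject item assignment
except TypeError as err:
    print("tuple:", err)

try:
    text[0] = "N"            # strings reject item assignment too
except TypeError as err:
    print("str:", err)

upper = text.upper()               # string methods return a new object
lookup = {items_tuple: "allowed"}  # a tuple can be a dict key; a list cannot
print(items_list, upper, lookup)
```

All three types support indexing, slicing, `len()` and iteration. Only the list can be changed after creation.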
OR
4. a) You are developing a text editor that includes a feature to detect and highlight specific grammatical constructs in text. Build the regular expressions to match the following classes of strings: (10 marks)
i. Strings containing any one of the determiners a, an, or the.
ii. An arithmetic expression using integers, addition, and multiplication, such as 2*3+8.
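One possible pair of patterns is shown below. The sample sentence is made up. The `\b` word boundaries keep the determiner pattern from firing inside words such as `cat` or `then`.

```python
import re

# i. a whole-word determiner, case-insensitive
determiner = re.compile(r"\b(?:a|an|the)\b", re.IGNORECASE)

# ii. an integer, then zero or more (operator, integer) pairs with + or *
arithmetic = re.compile(r"\d+(?:[+*]\d+)*")

print(determiner.findall("The cat ate an apple and a pear, then left."))
print(arithmetic.fullmatch("2*3+8") is not None)
print(arithmetic.fullmatch("2*+3") is not None)
```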
b) Explain segmentation with derivative and evaluate functions. (10 marks)
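This sketch assumes the question refers to the `segment()` and `evaluate()` functions from the segmentation section of the NLTK book. `segs` is a string of 0/1 flags, one per gap between characters, where `1` marks a word boundary. `evaluate` scores a segmentation as the number of tokens plus the size of the lexicon, and a lower score is better. The sample text is made up.

```python
def segment(text, segs):
    # segs[i] == "1" means a word boundary after text[i]
    words, last = [], 0
    for i, flag in enumerate(segs):
        if flag == "1":
            words.append(text[last:i + 1])
            last = i + 1
    words.append(text[last:])
    return words

def evaluate(text, segs):
    # Objective to minimize: number of tokens + size of the lexicon
    words = segment(text, segs)
    text_size = len(words)
    lexicon_size = sum(len(word) + 1 for word in set(words))  # +1 marks each word's end
    return text_size + lexicon_size

text = "thedogthedog"
for segs in ["00000000000", "00100100100", "00000100000"]:
    print(segment(text, segs), evaluate(text, segs))
```

On such a short input, the objective prefers `thedog thedog` (score 9) over `the dog the dog` (score 12). The score rewards a small lexicon and few tokens, not linguistic correctness. A search procedure (simulated annealing in the NLTK book) then looks for the `segs` string with the lowest score.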
c) Our programs often need to deal with different languages and different character sets. Explain what you understand by Unicode. (5 marks)
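Unicode assigns every character a numbered code point and an official name. Encodings such as UTF-8 then decide how that number is stored as bytes. This small sketch inspects a few characters with the standard `unicodedata` module; the characters are arbitrary choices.

```python
import unicodedata

def describe(ch):
    # code point in U+XXXX notation, official Unicode name, UTF-8 bytes
    return f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8")

for ch in ["A", "\u00e9", "\u0928", "\u65e5"]:  # A, é, न, 日
    print(ch, *describe(ch))
```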
UNIT – III
5. a) The POS of a word is identified from the word itself, its meaning, and the context in which it is used. Explain POS tagging and illustrate reading and POS tagging of a tagged corpus with Python code using the NLTK library (use the Brown Corpus). (10 marks)
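In NLTK, the Brown Corpus is read with `nltk.corpus.brown.tagged_words()` after `nltk.download("brown")`. Each token in the corpus files has the form `word/tag`. To stay runnable offline, this sketch parses that format on a made-up sentence. Its `str2tuple` helper mirrors `nltk.tag.str2tuple`, which splits at the last separator and upper-cases the tag.

```python
from collections import Counter

def str2tuple(token, sep="/"):
    # Split at the LAST separator (so "./." works) and upper-case the tag
    word, _, tag = token.rpartition(sep)
    return word, tag.upper()

# Made-up sentence in the word/tag format used by the Brown corpus
raw = "The/at jury/nn said/vbd it/pps did/dod find/vb no/at evidence/nn ./."
tagged = [str2tuple(tok) for tok in raw.split()]

print(tagged[:3])
tag_counts = Counter(tag for _, tag in tagged)
print(tag_counts.most_common(3))
```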
b) A Python dictionary is an efficient way of storing data as key-value pairs. Explain how words are mapped to tags using a dictionary. Explain and develop a default dictionary with list values, with an example program. (10 marks)
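The sketch contrasts a plain dictionary with `collections.defaultdict(list)` for mapping words to tags. The tagged words are a made-up sample.

```python
from collections import defaultdict

tagged_words = [("the", "DET"), ("wind", "NOUN"), ("they", "PRON"),
                ("wind", "VERB"), ("wind", "NOUN")]

# Plain dict: each word keeps only its most recent tag (later pairs overwrite earlier ones)
word_to_tag = dict(tagged_words)

# defaultdict(list): a missing key starts as [], so every tag can simply be appended
word_to_tags = defaultdict(list)
for word, tag in tagged_words:
    word_to_tags[word].append(tag)

print(word_to_tag["wind"])
print(dict(word_to_tags))
```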
c) Develop a Python program to find the POS tags of the given sentence. (5 marks)
Sent = "The quick brown fox jumps over the lazy dog"
OR
6. a) With a neat diagram, explain the general process of N-gram tagging using the NLTK library’s built-in taggers. Why should data be split into training and test portions? (10 marks)
b) With suitable Python code, explain automatic tagging with the evaluate function. (10 marks)
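NLTK provides `nltk.DefaultTagger("NN")` and `nltk.UnigramTagger(train_sents, backoff=...)`, each with an `evaluate(test_sents)` method (renamed `accuracy()` in recent releases). This plain-Python sketch reproduces the idea on hand-made toy sentences. Evaluation means comparing predicted tags with gold tags on held-out data and reporting the fraction that agree.

```python
from collections import Counter, defaultdict

def evaluate(tagger, gold_sents):
    # Accuracy: fraction of tokens whose predicted tag equals the gold tag
    correct = total = 0
    for sent in gold_sents:
        words = [w for w, _ in sent]
        for (_, gold), (_, pred) in zip(sent, tagger(words)):
            correct += gold == pred
            total += 1
    return correct / total

train_sents = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
               [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]
test_sents = [[("the", "DT"), ("fox", "NN"), ("runs", "VBZ")]]

def default_tagger(words):
    # Like a default tagger: the same tag for every token
    return [(w, "NN") for w in words]

# Unigram lookup: most frequent training tag per word, backing off to "NN"
counts = defaultdict(Counter)
for sent in train_sents:
    for w, t in sent:
        counts[w][t] += 1
lookup = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def unigram_tagger(words):
    return [(w, lookup.get(w, "NN")) for w in words]

print(evaluate(default_tagger, test_sents))
print(evaluate(unigram_tagger, test_sents))
```

The default tagger gets only the noun right (1/3). The unigram tagger also gets `the` right (2/3) but still misses the unseen verb `runs`.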
c) Explain the universal part-of-speech tagset. (5 marks)
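The universal tagset has 12 coarse tags. Fine-grained tagsets such as the Penn Treebank's are mapped onto it; NLTK exposes this through the `tagset="universal"` argument of corpus readers. The mapping below is partial, covering only a few common Penn tags. Falling back to `X` for tags missing from it is a simplification of this sketch, since the full mapping covers every Penn tag.

```python
# The 12 tags of the universal part-of-speech tagset
UNIVERSAL = {"ADJ", "ADP", "ADV", "CONJ", "DET", "NOUN",
             "NUM", "PRT", "PRON", "VERB", ".", "X"}

# Partial Penn Treebank -> universal mapping (a few common tags only)
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "DT": "DET", "IN": "ADP",
    "PRP": "PRON", "CC": "CONJ", "CD": "NUM", "RP": "PRT", ".": ".",
}

def to_universal(tagged):
    # Tags missing from this partial table fall back to X (simplification)
    return [(w, PTB_TO_UNIVERSAL.get(t, "X")) for w, t in tagged]

print(to_universal([("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")]))
```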
UNIT – IV
7. a) Supervised classification classifies inputs based on labeled data. With a neat diagram, explain the working principle of supervised classification of text. (10 marks)
b) Develop a Python NLP program for movie reviews using the NLTK library’s Naïve Bayes classifier and the ‘names’ corpus. (10 marks)
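The question pairs a movie-review task with the `names` corpus. The classic NLTK example built on `names` is gender classification from a name's last letter, which uses `nltk.NaiveBayesClassifier.train(train_set)` over `nltk.corpus.names`. This plain-Python sketch implements that Naive Bayes idea with add-one smoothing on a handful of made-up names. The same training and classification code would apply to movie reviews if the feature extractor returned word-presence features instead.

```python
from collections import Counter, defaultdict
from math import log

def gender_features(name):
    return {"last_letter": name[-1].lower()}

def train_nb(labeled):
    label_counts = Counter(label for _, label in labeled)
    value_counts = defaultdict(Counter)  # (label, feature) -> counts of observed values
    values_seen = defaultdict(set)       # feature -> all values seen in training
    for feats, label in labeled:
        for f, v in feats.items():
            value_counts[(label, f)][v] += 1
            values_seen[f].add(v)
    return label_counts, value_counts, values_seen

def classify(model, feats):
    label_counts, value_counts, values_seen = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = log(n / total)  # log prior P(label)
        for f, v in feats.items():
            # add-one smoothing; the extra +1 reserves mass for unseen values
            count = value_counts[(label, f)][v]
            score += log((count + 1) / (n + len(values_seen[f]) + 1))
        if score > best_score:
            best, best_score = label, score
    return best

names = [("Anna", "female"), ("Maria", "female"), ("Julia", "female"), ("Emma", "female"),
         ("John", "male"), ("Mark", "male"), ("Kevin", "male"), ("David", "male")]
model = train_nb([(gender_features(n), g) for n, g in names])
print(classify(model, gender_features("Sofia")))
print(classify(model, gender_features("Brian")))
```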
c) Explain the confusion matrix for the bigram tagger. (5 marks)
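A confusion matrix counts how often each gold tag received each predicted tag; NLTK offers this as `nltk.ConfusionMatrix(reference, test)`. The sketch builds one from scratch. The predicted tags are made-up bigram-tagger output. `None`, which a bigram tagger returns for unseen contexts, is shown as `-`.

```python
from collections import Counter

def confusion_matrix(gold, pred):
    # Rows = gold (reference) tags, columns = predicted tags
    pred = ["-" if p is None else p for p in pred]
    counts = Counter(zip(gold, pred))
    labels = sorted(set(gold) | set(pred))
    return labels, counts

def show(labels, counts):
    print("gold\\pred " + " ".join(f"{lab:>4}" for lab in labels))
    for g in labels:
        print(f"{g:>9} " + " ".join(f"{counts[(g, p)]:>4}" for p in labels))

gold = ["DT", "NN", "VBZ", "DT", "NN", "VBD"]
pred = ["DT", "NN", "NN", "DT", "VBZ", None]  # made-up bigram-tagger output
labels, counts = confusion_matrix(gold, pred)
show(labels, counts)
```

Diagonal cells are correct tags. Off-diagonal cells show which tags get confused, such as VBZ tagged as NN, and the `-` column shows tokens the tagger could not tag at all.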
OR
8. a) Develop a Python program for POS tagging using the Decision Tree classifier in NLTK. Use the tagged Brown corpus for training. (10 marks)
b) In general, one text can convey the meaning of a second text. Briefly describe recognizing textual entailment with an example. (10 marks)
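A simple baseline for recognizing textual entailment compares words. If the hypothesis introduces content words that the text does not contain, entailment is unlikely. This sketch follows the spirit of the word-overlap features discussed in the NLTK book. The stopword list and example sentences are made up. Such a baseline is known to be weak, since it ignores negation and paraphrase.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "to"}  # tiny illustrative list

def content_words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def rte_features(text, hypothesis):
    t, h = content_words(text), content_words(hypothesis)
    return {"word_overlap": len(t & h), "hyp_extra": len(h - t)}

def naive_entails(text, hypothesis):
    # Heuristic: predict entailment when the hypothesis adds no new content words
    return rte_features(text, hypothesis)["hyp_extra"] == 0

T = "The cat sat on the mat in the kitchen."
H1 = "The cat sat on the mat."
H2 = "The dog sat on the mat."
print(naive_entails(T, H1), rte_features(T, H1))
print(naive_entails(T, H2), rte_features(T, H2))
```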
c) A decision tree is a supervised classifier used to classify inputs. With a suitable diagram, explain the decision tree for classification. (5 marks)
***