AI UNIT 6 and UNIT 7 question and answers
B. Subjective Questions:
1. Explain tokenization in NLP.
Ans: The process of converting a text object into smaller individual units, known as tokens, is
referred to as Tokenization. Examples of tokens can be words, numbers, or punctuation
symbols. The tokenized form is used to count the number of words in a text and the
frequency of each word, and it serves as the input for further NLP steps such as parsing.
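As an illustration, here is a minimal word-level tokenization sketch in Python; the sample sentence and the regular expression are our own choices, not from the text:

import re
from collections import Counter

text = "AI is fun! It has 2 units: NLP and Evaluation."

# Words and numbers become tokens; each punctuation mark is its own token.
tokens = re.findall(r"\w+|[^\w\s]", text)
print(tokens)

# The tokenized form makes it easy to count words and their frequencies.
print(Counter(t.lower() for t in tokens if t.isalnum()))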
UNIT 7 SOLUTIONS: Evaluation
1. What can we derive from this matrix? Select the correct option:
Ans: Out of those 100 results, the model predicted “yes” 45 times and “no” 55 times. In
reality, 60 times the event in question occurred, and 40 times it didn’t.
2. Match the terms with correct formulae:
Ans: a. Accuracy - iii. (TP+TN)/Total
b. True Positive Rate - i. TP/actual yes
c. False Positive Rate - v. FP/actual no
d. Specificity - ii. TN/actual no
e. Precision - iv. TP/predicted yes
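As an illustrative sketch, the matched formulas expressed in Python, using made-up confusion-matrix counts (the values below are hypothetical, chosen only so the arithmetic is easy to follow):

tp, tn, fp, fn = 45, 40, 5, 10  # hypothetical counts for illustration

total = tp + tn + fp + fn
accuracy    = (tp + tn) / total   # (TP+TN)/Total
tpr         = tp / (tp + fn)      # TP / actual yes (recall, or sensitivity)
fpr         = fp / (fp + tn)      # FP / actual no
specificity = tn / (tn + fp)      # TN / actual no
precision   = tp / (tp + fp)      # TP / predicted yes
print(accuracy, tpr, fpr, specificity, precision)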
3. Compute the following metrics using the values from the matrix given above, with
the correctly identified formulae:
Ans: i) Accuracy- 85%
ii) True Positive Rate- 88%
iii) Specificity- 83%
4. Observe the predicted and actual cases of the given Confusion Matrix:
Ans: Accuracy- 0.91
Recall- 0.952
Precision- 0.91
F1 Score- 0.93
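For reference, a short sketch that reproduces these values; the matrix counts (TP = 100, TN = 50, FP = 10, FN = 5) are assumed here for illustration, chosen to be consistent with the recall above, and are not taken from the text:

tp, tn, fp, fn = 100, 50, 10, 5  # assumed counts, for illustration only

accuracy  = (tp + tn) / (tp + tn + fp + fn)         # 150/165 ≈ 0.91
recall    = tp / (tp + fn)                          # 100/105 ≈ 0.952
precision = tp / (tp + fp)                          # 100/110 ≈ 0.91
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.93
print(round(accuracy, 2), round(recall, 3), round(precision, 2), round(f1, 2))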
⮞ Hold-Out: In this method, the large dataset is divided into three subsets:
⮞ Training set: It is used to build predictive models.
⮞ Validation set: It assesses the performance of the model built in the training phase. It
provides a test platform for fine-tuning the model's parameters and thus selecting the
model with the best performance.
⮞ Test set: It is a hidden subset of the dataset, used to predict a model's expected future
performance. A model may be overfitting if it fits the training set substantially better
than the test set.
⮞ Cross-Validation: With limited data available, and to achieve an unbiased estimate of
model performance, k-fold cross-validation is used. In k-fold cross-validation, the data
is divided into k subsets of equal size. We build models k times, each time leaving out
one of the subsets from training and using it as the test set. If k equals the sample size,
this is called "leave-one-out". A minimal sketch is shown below.
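For illustration, a minimal k-fold cross-validation sketch using scikit-learn; the iris dataset and logistic regression model are arbitrary choices, not prescribed by the text:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k = 5: the data is split into 5 equal folds; each fold is held out once
# as the test set while the model is trained on the remaining 4 folds.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())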
5. Why is the F1 score a better metric than accuracy?
Ans: The F1 score is often considered a better evaluation metric than accuracy because it takes
into account both precision and recall. Accuracy only counts the number of correct
predictions made by the model, while the F1 score balances the trade-off between
precision and recall. This is particularly important when there is a class imbalance in the
data. For example, suppose the dataset has a majority class and a minority class. In that
case, accuracy can be misleading: a model that labels every sample as the majority class
still scores highly, even though it misclassifies the entire minority class. The F1 score,
on the other hand, provides a more balanced measure of model performance, making it
a better metric for evaluating models in such scenarios.
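To see this concretely, a small sketch with made-up labels: a dataset with 95 "no" cases and 5 "yes" cases, and a model that simply predicts "no" every time:

from sklearn.metrics import accuracy_score, f1_score

# 95 negatives and 5 positives; the model lazily predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))             # 0.95 -- looks impressive
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0  -- exposes the failure

Accuracy rewards the model for the easy majority class, while the F1 score, built from precision and recall on the positive class, shows that the minority class is never detected.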
6. What is a Confusion Matrix? Why is it considered the base for other metrics?
Ans: A Confusion Matrix is a table used to assess the performance of a machine learning model.
It is a matrix of the predicted and actual values of a classification problem. A confusion
matrix is the base for other metrics because it provides a comprehensive overview of a
model's performance: accuracy, precision, recall, specificity, and the F1 score are all
computed directly from its four cells (TP, TN, FP, FN).
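As a sketch (the example labels are invented), the four cells of the matrix can be read off and the headline metrics computed from them:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# scikit-learn orders the cells as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Each metric is simple arithmetic on the four cells.
accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
print(accuracy, precision, recall)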