Assignment-2 3
1. Each assignment must be solved separately on white sheets using a BLUE PEN.
2. On every page of each assignment, write your name and registration number and put your
signature.
3. The last date for submitting the assignment is 07/04/2021, on or before 11:30 P.M., on MS Teams.
Motilal Nehru National Institute of Technology, Allahabad.
Department of Computer Science & Engineering,
Information Technology
B. Tech. (IT) VI Semester
Subject:- Data Mining
Assignment-II
Q1. Classify the following attributes as binary, discrete, or continuous. Also classify them as
qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more
than one interpretation, so briefly indicate your reasoning if you think there may be some
ambiguity. [4]
a) Military Rank
b) Distance of the railway station from your hostel
c) Number of heads in a toss of five coins
A    B    Class Label
T    F    +
T    T    +
T    T    +
T    F    -
T    T    +
F    F    -
F    F    -
F    F    -
T    T    -
T    F    -
(a) Calculate the information gain when splitting on A and B. Which attribute would the
decision tree induction algorithm choose? [2]
(b) Calculate the gain in the Gini index when splitting on A and B. Which attribute would the
decision tree induction algorithm choose? [2]
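The two impurity-based gains asked for in (a) and (b) can be checked numerically. The following is a minimal sketch, assuming the ten rows of the table above are read as (A, B, class label) and that the gain is the parent impurity minus the size-weighted impurity of the children. The names ROWS, class_fractions, entropy, gini and gain are illustrative and do not come from the assignment.

```python
from math import log2

# Rows of the (A, B, class label) table above, copied in order.
ROWS = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]

def class_fractions(labels):
    """Fraction of each class that actually occurs in labels."""
    n = len(labels)
    return [labels.count(c) / n for c in set(labels)]

def entropy(labels):
    """Entropy in bits; only classes that occur contribute, so log2 is safe."""
    return -sum(p * log2(p) for p in class_fractions(labels))

def gini(labels):
    """Gini index: 1 minus the sum of squared class fractions."""
    return 1.0 - sum(p * p for p in class_fractions(labels))

def gain(rows, column, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    labels = [r[-1] for r in rows]
    weighted = 0.0
    for value in set(r[column] for r in rows):
        subset = [r[-1] for r in rows if r[column] == value]
        weighted += len(subset) / len(rows) * impurity(subset)
    return impurity(labels) - weighted

for name, column in (("A", 0), ("B", 1)):
    print(name,
          "information gain:", round(gain(ROWS, column, entropy), 4),
          "Gini gain:", round(gain(ROWS, column, gini), 4))
```

Comparing the two printed columns shows that entropy and the Gini index need not prefer the same attribute, which is worth commenting on in the written answer.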
Q4. The following table summarizes a data set with three attributes A, B , C and two class labels
+, −. Build a two-level decision tree.
A    B    C    Number of Instances
               +     -
T    T    T    5     0
F    T    T    0     20
T    F    T    20    0
F    F    T    0     5
T    T    F    0     0
F    T    F    25    0
T    F    F    0     0
F    F    F    0     25
(a) According to the classification error rate, which attribute would be chosen as the first splitting
attribute? For each attribute, show the contingency table and the gains in classification error
rate. [2]
(b) How many instances are misclassified by the resulting decision tree? [2]
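The contingency tables and error-rate gains for Q4 can be tabulated mechanically. The sketch below assumes each row of the table above maps a combination (A, B, C) to its counts of + and - instances, and that every branch predicts its majority class. The names COUNTS, contingency, misclassified, error_gain and restrict are illustrative helpers, not part of the assignment.

```python
# (A, B, C) -> (number of + instances, number of - instances), from the table above.
COUNTS = {
    ("T", "T", "T"): (5, 0),
    ("F", "T", "T"): (0, 20),
    ("T", "F", "T"): (20, 0),
    ("F", "F", "T"): (0, 5),
    ("T", "T", "F"): (0, 0),
    ("F", "T", "F"): (25, 0),
    ("T", "F", "F"): (0, 0),
    ("F", "F", "F"): (0, 25),
}

def contingency(counts, index):
    """Return {attribute value: [plus, minus]} for the attribute at position index."""
    table = {}
    for key, (plus, minus) in counts.items():
        cell = table.setdefault(key[index], [0, 0])
        cell[0] += plus
        cell[1] += minus
    return table

def misclassified(table):
    """Instances misclassified when each branch predicts its majority class."""
    return sum(min(plus, minus) for plus, minus in table.values())

def error_gain(counts, index):
    """Error rate before the split minus the error rate after it."""
    total = sum(p + m for p, m in counts.values())
    plus = sum(p for p, _ in counts.values())
    parent_error = min(plus, total - plus) / total
    return parent_error - misclassified(contingency(counts, index)) / total

def restrict(counts, index, value):
    """Keep only the rows where the attribute at position index equals value."""
    return {k: v for k, v in counts.items() if k[index] == value}

for name, index in (("A", 0), ("B", 1), ("C", 2)):
    print(name, contingency(COUNTS, index), round(error_gain(COUNTS, index), 2))
```

For the second level in part (b), the same helpers can be applied to one branch at a time, for example contingency(restrict(COUNTS, 0, "F"), 1) for the A = F branch split on B.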
1. For the following vectors, x and y, calculate the indicated similarity or distance measures.
A. x = (1, 1, 1, 1), y = (2, 2, 2, 2): cosine, correlation, Euclidean
B. x = (0, 1, 0, 1), y = (1, 0, 1, 0): cosine, correlation, Euclidean, Jaccard
C. x = (2, −1, 0, 2, 0, −3), y = (−1, 1, −1, 0, 0, −1): cosine, correlation
2. Classify the following attributes as binary, discrete, or continuous. Also classify them as
qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more
than one interpretation, so briefly indicate your reasoning if you think there may be some
ambiguity.
3. The following attributes are measured for members of a herd of Asian elephants:
weight, height, tusk length, trunk length, and ear area. Based on these measurements,
what sort of similarity measure would you use to compare or group these elephants?
Justify your answer and explain any special circumstances.
4. Discuss whether or not each of the following activities is a data mining task:
Dividing the customers of a company according to their gender.
Dividing the customers of a company according to their profitability.
Computing the total sales of a company.
Sorting a student database based on student identification numbers.
Predicting the future stock price of a company using historical records.
5. Derive the mathematical relationship between cosine similarity and Euclidean distance when
each data object has an L2 length of 1.
6. Describe the types of situations that produce sparse or dense data cubes. Illustrate with
examples other than those used in the book.
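The measures named in question 1 can be computed with a few lines of plain Python, which is useful for checking hand calculations. This is a minimal sketch under the usual textbook definitions: Pearson correlation is treated as undefined (None) when either vector has zero variance, and Jaccard is taken as the coefficient for binary vectors that ignores 0-0 matches. All function names are illustrative.

```python
from math import sqrt

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    """Cosine similarity: dot product divided by the product of L2 lengths."""
    return dot(x, y) / (sqrt(dot(x, x)) * sqrt(dot(y, y)))

def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation(x, y):
    """Pearson correlation; returns None when either vector has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(dot(dx, dx)) * sqrt(dot(dy, dy))
    return None if denom == 0 else dot(dx, dy) / denom

def jaccard(x, y):
    """Jaccard coefficient for binary vectors: f11 / (f01 + f10 + f11).

    Assumes at least one position is non-zero in x or y.
    """
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return f11 / (f11 + mismatches)

def normalize(x):
    """Scale x to L2 length 1."""
    length = sqrt(dot(x, x))
    return [a / length for a in x]

# Vectors from parts A, B and C of question 1.
xa, ya = (1, 1, 1, 1), (2, 2, 2, 2)
xb, yb = (0, 1, 0, 1), (1, 0, 1, 0)
xc, yc = (2, -1, 0, 2, 0, -3), (-1, 1, -1, 0, 0, -1)

print("A:", cosine(xa, ya), correlation(xa, ya), euclidean(xa, ya))
print("B:", cosine(xb, yb), correlation(xb, yb), euclidean(xb, yb), jaccard(xb, yb))
print("C:", cosine(xc, yc), correlation(xc, yc))
```

The normalize helper also gives a quick numerical check for question 5: for unit-length vectors u and v, euclidean(u, v) ** 2 should agree with 2 * (1 - cosine(u, v)) up to rounding. This confirms the relationship but does not replace the derivation.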