Assignment-2 3

Students must complete each assignment question on separate white paper sheets using blue pen. They must write their name, registration number, and signature on each page. The final due date to submit the assignment through MS Teams is April 7, 2021 before 11:30 PM.


Instructions for Solving the Assignment

1. Each assignment must be solved separately on white sheets of paper using a BLUE PEN.
2. On every page of each assignment, write your name and registration number and put your
signature.
3. The last date for submitting the assignment on MS Teams is 07/04/2021, on or before 11:30 P.M.
Motilal Nehru National Institute of Technology, Allahabad.
Department of Computer Science & Engineering,
Information Technology
B. Tech. (IT) VI Semester
Subject:- Data Mining
Assignment-II

Q1 Classify the following attributes as binary, discrete, or continuous. Also classify them as
qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more
than one interpretation, so briefly indicate your reasoning if you think there may be some
ambiguity. [4]
a) Military Rank
b) Distance of Railway station from your hostel
c) Number of heads in toss of five coins.

Q2 Consider the following data set for a binary class problem:

A B Class Label
T F +
T T +
T T +
T F -
T T +
F F -
F F -
F F -
T T -
T F -

(a) Calculate the information gain when splitting on A and on B. Which attribute would the
decision tree induction algorithm choose? [2]
(b) Calculate the gain in the Gini index when splitting on A and on B. Which attribute would the
decision tree induction algorithm choose? [2]
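The two impurity measures in Q2 can be checked with a short Python sketch. It assumes the usual textbook definitions (entropy in base 2, Gini index = 1 − Σ p²) and computes the gain of a split as the parent impurity minus the size-weighted impurity of the children; the helper names `entropy`, `gini` and `split_gain` are illustrative, not part of the assignment.

```python
from collections import Counter
from math import log2

# Q2 data set: (A, B, class label), one tuple per record.
DATA = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]

def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(data, attr_index, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    labels = [row[-1] for row in data]
    children = {}
    for row in data:
        children.setdefault(row[attr_index], []).append(row[-1])
    weighted = sum(len(ch) / len(data) * impurity(ch) for ch in children.values())
    return impurity(labels) - weighted

for name, idx in (("A", 0), ("B", 1)):
    print(f"{name}: information gain = {split_gain(DATA, idx, entropy):.4f}, "
          f"Gini gain = {split_gain(DATA, idx, gini):.4f}")
```

A greedy decision tree inducer picks the attribute with the larger gain under whichever measure it uses, so the two parts of Q2 are answered by comparing the two printed columns separately.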

Q3 Proximity is typically defined between a pair of objects.

(a) Give two ways in which you might define the proximity among a group of objects. [2]
(b) Explain why computing the proximity between two attributes is often simpler than computing
the similarity between two objects. [2]
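As an illustration for part (a), the sketch below extends a pairwise Euclidean distance to a whole group in two common ways: the mean of all pairwise distances, and the mean distance of each object to the group centroid. Both choices, and the use of Euclidean distance, are assumptions made for illustration; other group measures (for example the maximum pairwise distance) are equally valid answers.

```python
from itertools import combinations
from math import dist  # Python 3.8+

def avg_pairwise_distance(group):
    """Group proximity as the mean Euclidean distance over all pairs (needs >= 2 points)."""
    pairs = list(combinations(group, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_distance_to_centroid(group):
    """Group proximity as the mean Euclidean distance of each point to the centroid."""
    dims = len(group[0])
    centroid = [sum(p[d] for p in group) / len(group) for d in range(dims)]
    return sum(dist(p, centroid) for p in group) / len(group)

points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
print(avg_pairwise_distance(points), mean_distance_to_centroid(points))
```

The pairwise version costs O(m²) distance computations for m objects, while the centroid version needs only O(m), which is one practical reason to prefer it for large groups.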

Q4. The following table summarizes a data set with three attributes A, B, C and two class labels
+, −. Build a two-level decision tree.

A   B   C   No. of + instances   No. of − instances
T   T   T   5                    0
F   T   T   0                    20
T   F   T   20                   0
F   F   T   0                    5
T   T   F   0                    0
F   T   F   25                   0
T   F   F   0                    0
F   F   F   0                    25

(a) According to the classification error rate, which attribute would be chosen as the first splitting
attribute? For each attribute, show the contingency table and the gain in classification error
rate. [2]
(b) How many instances are misclassified by the resulting decision tree? [2]
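The Q4 procedure can be sketched in Python as follows. It assumes that each leaf predicts its majority class, so a node with p positive and n negative instances misclassifies min(p, n) of them; the gain is the drop in error rate after the split. The helper names and the greedy second-level choice are illustrative assumptions.

```python
# Q4 table: (A, B, C) -> (number of + instances, number of - instances)
TABLE = {
    ("T", "T", "T"): (5, 0),
    ("F", "T", "T"): (0, 20),
    ("T", "F", "T"): (20, 0),
    ("F", "F", "T"): (0, 5),
    ("T", "T", "F"): (0, 0),
    ("F", "T", "F"): (25, 0),
    ("T", "F", "F"): (0, 0),
    ("F", "F", "F"): (0, 25),
}
ATTRS = {"A": 0, "B": 1, "C": 2}

def contingency(rows, attr):
    """Return {attribute value: [+ count, - count]} over the given rows."""
    table = {}
    for key, (pos, neg) in rows.items():
        counts = table.setdefault(key[ATTRS[attr]], [0, 0])
        counts[0] += pos
        counts[1] += neg
    return table

def errors(pos, neg):
    """Instances misclassified when a node predicts its majority class."""
    return min(pos, neg)

def error_gain(rows, attr):
    """Drop in classification error rate when the rows are split on attr."""
    total_pos = sum(p for p, _ in rows.values())
    total_neg = sum(n for _, n in rows.values())
    n = total_pos + total_neg
    if n == 0:
        return 0.0
    after = sum(errors(p, q) for p, q in contingency(rows, attr).values())
    return (errors(total_pos, total_neg) - after) / n

def two_level_errors(rows):
    """Greedy two-level tree: best root by error gain, then best split in each branch."""
    root = max(ATTRS, key=lambda a: error_gain(rows, a))
    total = 0
    for value in {k[ATTRS[root]] for k in rows}:
        branch = {k: v for k, v in rows.items() if k[ATTRS[root]] == value}
        others = [a for a in ATTRS if a != root]
        best = max(others, key=lambda a: error_gain(branch, a))
        total += sum(errors(p, q) for p, q in contingency(branch, best).values())
    return root, total

for attr in ATTRS:
    print(attr, contingency(TABLE, attr), f"gain = {error_gain(TABLE, attr):.2f}")
print(two_level_errors(TABLE))
```

The printed contingency tables correspond to the ones part (a) asks you to show; ties between equally good second-level attributes are broken by dictionary order, which does not change the error count.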

Q5 Define the following briefly: [4]

a. Occam's Razor
b. Issues in data mining
c. Curse of Dimensionality
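One commonly cited symptom of the curse of dimensionality is distance concentration: as the number of dimensions grows, the nearest and farthest points from a query become almost equally far away. The sketch below measures this with the relative contrast (max − min) / min over uniformly random points in the unit hypercube; the uniform data and the function name `distance_contrast` are assumptions chosen only for illustration.

```python
import random
from math import dist  # Python 3.8+

def distance_contrast(dim, n_points=200, seed=0):
    """(max - min) / min distance from a random query to random points in [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    d = [dist(query, p) for p in points]
    return (max(d) - min(d)) / min(d)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

A shrinking contrast means that distance-based notions such as "nearest neighbour" or "density" carry less information in high dimensions.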
Motilal Nehru National Institute of Technology, Allahabad.
Department of Computer Science & Engineering,
Information Technology
B. Tech. (IT) VI Semester
Subject:- Data Mining
Assignment-III

1. For the following vectors, x and y, calculate the indicated similarity or distance measures.
A. x = (1, 1, 1, 1), y = (2, 2, 2, 2): cosine, correlation, Euclidean
B. x = (0, 1, 0, 1), y = (1, 0, 1, 0): cosine, correlation, Euclidean, Jaccard
C. x = (2, −1, 0, 2, 0, −3), y = (−1, 1, −1, 0, 0, −1): cosine, correlation
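The four measures in Q1 can be computed with the standard library alone. This is a minimal sketch assuming the usual definitions: Pearson correlation (undefined when a vector is constant, returned here as None), and the Jaccard coefficient for binary vectors, f11 / (f01 + f10 + f11). Cosine assumes neither vector is all zeros, and Jaccard assumes at least one position is non-zero.

```python
from math import sqrt

def cosine(x, y):
    """Cosine similarity: x . y / (||x|| ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def correlation(x, y):
    """Pearson correlation; None when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sx = sqrt(sum(a * a for a in dx))
    sy = sqrt(sum(b * b for b in dy))
    if sx == 0 or sy == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / (sx * sy)

def euclidean(x, y):
    """Euclidean (L2) distance."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def jaccard(x, y):
    """Jaccard coefficient for binary vectors: f11 / (f01 + f10 + f11)."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return f11 / (f11 + mismatches)

x, y = (0, 1, 0, 1), (1, 0, 1, 0)
print(cosine(x, y), correlation(x, y), euclidean(x, y), jaccard(x, y))
```

Note that correlation is undefined for case A because x has zero variance, which is why the function returns None rather than dividing by zero.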

2. Classify the following attributes as binary, discrete, or continuous. Also classify them as
qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more
than one interpretation, so briefly indicate your reasoning if you think there may be some
ambiguity.

(a) Time in terms of AM or PM.
(b) Brightness as measured by a light meter.
(c) Brightness as measured by people’s judgments.
(d) Angles as measured in degrees between 0 and 360.

3. The following attributes are measured for members of a herd of Asian elephants:
weight, height, tusk length, trunk length, and ear area. Based on these measurements,
what sort of similarity measure would you use to compare or group these elephants?
Justify your answer and explain any special circumstances.
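Because the five attributes are measured in different units and on very different scales (weight versus lengths versus an area), one reasonable option, though not the only defensible answer, is Euclidean distance computed after standardizing each attribute to zero mean and unit variance. The question gives no measurements, so the sketch below only defines the functions; the record layout (one list of five numbers per elephant) is an assumption. If the attributes turn out to be strongly correlated, Mahalanobis distance would be a natural alternative.

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(records):
    """Z-score each attribute (column) so that no single unit or scale dominates."""
    columns = list(zip(*records))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(rec, stats)]
            for rec in records]

def euclidean(x, y):
    """Euclidean distance between two standardized records."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Usage: pass a list of per-elephant records to `standardize`, then compare any two rows of the result with `euclidean`.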

4. Discuss whether or not each of the following activities is a data mining task:
• Dividing the customers of a company according to their gender.
• Dividing the customers of a company according to their profitability.
• Computing the total sales of a company.
• Sorting a student database based on student identification numbers.
• Predicting the future stock price of a company using historical records.

5. Derive the mathematical relationship between cosine similarity and Euclidean distance when
each data object has an L2 length of 1.

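The key step of this derivation is the expansion of the squared Euclidean distance; with unit-length vectors the dot product equals the cosine similarity, which gives the standard identity below.

```latex
\|\mathbf{x}-\mathbf{y}\|_2^2
  = \|\mathbf{x}\|_2^2 + \|\mathbf{y}\|_2^2 - 2\,\mathbf{x}\cdot\mathbf{y}
  = 2 - 2\cos(\mathbf{x},\mathbf{y}),
  \qquad \text{since } \|\mathbf{x}\|_2 = \|\mathbf{y}\|_2 = 1
  \;\Rightarrow\; \cos(\mathbf{x},\mathbf{y}) = \mathbf{x}\cdot\mathbf{y},

d(\mathbf{x},\mathbf{y}) = \sqrt{2\,\bigl(1 - \cos(\mathbf{x},\mathbf{y})\bigr)}.
```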
6. Describe the types of situations that produce sparse or dense data cubes. Illustrate with
examples other than those used in the book.
