
6720 Labs Chapter 2

This document contains review questions for Chapter 2 of a data mining textbook. The questions cover topics like identifying supervised vs unsupervised learning tasks, the roles of validation and test partitions in modeling, overfitting of models to training data, and choosing between models based on their performance on validation vs training data. Additional questions involve using data mining software to pre-process categorical variables by converting them to dummy variables and partitioning a dataset into training and validation samples.

Uploaded by sweetie05
Copyright © All Rights Reserved

Data Mining Review Questions / XLMiner Labs

Chapter 2 Overview of the Data Mining Process

1. Assuming that data mining techniques are to be used in the following cases,
identify whether the task required is supervised or unsupervised learning
(textbook reference 2.1).
a. Deciding whether or not to issue a loan to an applicant based on
demographic and financial data (with reference to a database of similar
data on prior customers).
b. In an online bookstore, making recommendations to customers concerning
additional items to buy based on the buying patterns of prior transactions.
c. Identifying a network data packet as dangerous (e.g., virus, hacker attack)
based on comparison to other packets whose threat status is known.
d. Identifying segments of similar customers.
e. Predicting whether a company will go bankrupt based on comparing its
financial data to those of similar bankrupt and non-bankrupt firms.
f. Estimating the repair time required for an aircraft based on a trouble
ticket.
g. Automated sorting of mail by zip code scanning.
h. Printing of customer discount coupons at the conclusion of a grocery store
checkout based on what you just bought and what others have bought
previously.
2. Describe the difference in roles assumed by the validation partition and the test
partition (textbook reference 2.2).
3. Using the concept of overfitting, explain why, when a model is fit to training
data, zero error on those data is not necessarily good (textbook reference 2.5).
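As a hypothetical illustration (not taken from the textbook), the sketch below compares two models on invented toy data whose true pattern is "y = 1 when x > 5", with two deliberately noisy training labels. A 1-nearest-neighbor model memorizes every training label, noise included, so its training error is zero. A simpler threshold rule makes mistakes on the training data but does better on held-out records. The data values and function names are assumptions made for this example.

```python
from typing import Callable, List, Tuple

Record = Tuple[float, int]  # (predictor x, class label y)

# Hypothetical toy data. The underlying rule is y = 1 when x > 5,
# but the training labels at x=4 and x=8 are noisy (flipped).
TRAIN: List[Record] = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 0), (9, 1)]
VALID: List[Record] = [(3.6, 0), (4.4, 0), (7.6, 1), (8.4, 1), (2, 0), (6.4, 1)]


def one_nearest_neighbor(train: List[Record]) -> Callable[[float], int]:
    """Return a model that copies the label of the closest training record.

    Every training record is its own nearest neighbor, so this model
    reproduces the training labels exactly, including the noisy ones.
    """
    def predict(x: float) -> int:
        _, label = min(train, key=lambda rec: abs(rec[0] - x))
        return label
    return predict


def threshold_rule(x: float) -> int:
    """A simpler model: predict class 1 when x exceeds 5."""
    return 1 if x > 5 else 0


def error_rate(model: Callable[[float], int], data: List[Record]) -> float:
    """Fraction of records whose predicted label differs from the actual one."""
    wrong = sum(1 for x, y in data if model(x) != y)
    return wrong / len(data)


memorizer = one_nearest_neighbor(TRAIN)
for name, model in [("1-NN (memorizer)", memorizer), ("threshold x > 5", threshold_rule)]:
    print(f"{name}: training error {error_rate(model, TRAIN):.2f}, "
          f"validation error {error_rate(model, VALID):.2f}")
```

Run as written, it reports a training error of 0.00 and a validation error of 0.67 for the memorizer, against 0.25 and 0.00 for the threshold rule: the zero training error came from fitting the noise.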
4. Two models are applied to a dataset that has been partitioned. Model A is
considerably more accurate than Model B on the training data but slightly less
accurate than Model B on the validation data. Which model are you more likely
to consider for final deployment? Explain your choice. (textbook reference 2.10)
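The selection rule at issue can be sketched in a few lines. The accuracy figures here are hypothetical numbers chosen only to match the scenario (Model A far better on training data, slightly worse on validation data), and choose_model is an illustrative name, not an XLMiner function.

```python
from typing import Dict, NamedTuple


class Scores(NamedTuple):
    train_accuracy: float
    validation_accuracy: float


def choose_model(scores: Dict[str, Scores]) -> str:
    """Pick the model with the highest validation accuracy.

    Training accuracy is deliberately ignored: it rewards fitting noise in
    the training data and so overstates performance on new records.
    """
    return max(scores, key=lambda name: scores[name].validation_accuracy)


# Hypothetical accuracies matching the scenario in Question 4.
scores = {
    "Model A": Scores(train_accuracy=0.98, validation_accuracy=0.84),
    "Model B": Scores(train_accuracy=0.88, validation_accuracy=0.86),
}
print(choose_model(scores))  # Model B
```

If a test partition had also been created, the selected model would then be scored once on it, because the validation score was used to make the choice and therefore tends to be optimistic.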


5. The next two questions require the use of the XLMiner data mining software and the
UniversalBank.xls dataset . . .
a. Use XLMiner's Convert to Dummies utility to convert the categorical
variable Education to binary dummy variables. After the conversion, how
many resulting columns exist for the Education variable? Why is this
conversion performed?
b. Using the newly created dataset (with binary dummy variables), use
XLMiner's Partitioning function to perform Standard Partitioning (accept
the default percentages for partitioning). How many records were
assigned to the Training Partition? How many records were assigned to
the Validation Partition? Why was a Test Partition not created?
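For readers working outside XLMiner, the dummy conversion in part a can be sketched in plain Python. This is a minimal illustration, not XLMiner's Convert to Dummies utility: the function name to_dummies, the "<column>_<level>" naming, and the toy records are all hypothetical. Category codes such as 1, 2, 3 are labels rather than quantities, so an algorithm that treats them as one numeric column would wrongly assume an order and equal spacing between them; separate 0/1 indicator columns avoid that.

```python
from typing import Dict, List

Row = Dict[str, object]


def to_dummies(rows: List[Row], column: str) -> List[Row]:
    """Replace a categorical column with one 0/1 indicator column per level.

    New columns are named "<column>_<level>" (a naming choice for this
    sketch). Levels are sorted so the column order is deterministic.
    """
    levels = sorted({row[column] for row in rows})
    converted = []
    for row in rows:
        # Copy every field except the original categorical column.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        converted.append(new_row)
    return converted


# Hypothetical toy records; "Education" holds category codes, not amounts.
records = [
    {"ID": 1, "Education": 1},
    {"ID": 2, "Education": 3},
    {"ID": 3, "Education": 2},
    {"ID": 4, "Education": 1},
]
for row in to_dummies(records, "Education"):
    print(row)
```

The sketch creates one indicator per level. When the dummies feed a model with an intercept, such as linear or logistic regression, one of them is usually dropped, because the full set always sums to 1 and is therefore redundant.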

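A standard two-way partition, as in part b, is a random split of the records into a training set and a validation set. The sketch below assumes a 60% / 40% split purely for illustration; it does not claim to reproduce XLMiner's default percentages, which the question asks you to accept and report. The function name standard_partition and the seed value are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def standard_partition(records: Sequence[T], train_frac: float = 0.6,
                       seed: int = 12345) -> Tuple[List[T], List[T]]:
    """Randomly split records into training and validation partitions.

    train_frac = 0.6 is an assumed illustrative default, not XLMiner's setting.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG gives a reproducible split
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical usage: 100 record IDs standing in for the rows of a dataset.
train, valid = standard_partition(range(1, 101))
print(len(train), len(valid))  # 60 40
```

Passing a fixed seed makes the split reproducible, so different models can be compared on identical partitions.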