DW Model Questions

Copyright
© All Rights Reserved

BTECH DATA MINING & WAREHOUSING MODEL QUESTIONS

“If you take the time to find the answers in the time you have, you will pass all assessments and
also be qualified to work in a heavy-workload environment”

All assessments will be drawn from the following questions, and they will be multiple
choice. You will be given the opportunity to choose which questions to answer.

Q.1 Explain different data mining tasks.


Q.2 What is the relation between data warehousing and data mining?
Q.3 Explain the differences between “Explorative Data Mining” and “Predictive Data Mining”
and give one example of each.
Q.4 What are the application areas of data mining?
Q.5 Explain the differences between Knowledge discovery and data mining.
Q.6 How is data warehouse different from a database? How are they similar?
Q.7 What types of benefit might you hope to get from data mining?
Q.8 What are the key issues in data mining?
Q.9 How can data mining help a business analyst?
Q.10 What are the limitations of data mining?
Q.11 Discuss the need of human intervention in data mining process.
Q.12 As a bank manager, how would you decide whether to give a loan to an applicant or not?
Q.13 What steps would you follow to identify fraud for a credit card company?
Q.14 What is data mining?
Q.15 State three different applications for which data mining techniques seem appropriate.
Informally explain each application.
Q.16 Explain briefly the differences between “classification” and “clustering” and give an
informal example of an application that would benefit from each technique.
Q.17 What do you mean by Data Processing?
Q.18 Explain data cleaning.
Q.19 Describe different data cleaning approaches.
Q.20 How can we handle missing values?
Q.21 Explain noisy data.
Q.22 Give a brief description of the following:
(a) Binning
(b) Regression
(c) Clustering
(d) Smoothing
(e) Generalization
(f) Aggregation
Q.24 Can you briefly describe the four stages of knowledge discovery (KDD)? Can you
describe the multi-tiered data warehouse architecture?
Q.25 A data set for analysis includes only one attribute X:
X = {7, 12, 5, 8, 5, 9, 13, 12, 19, 7, 12, 12, 13, 3, 4, 5, 13, 8, 7, 6}
(a) What is the mean of the data set X?
(b) What is the median?
(c) Find the standard deviation for X.
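The three quantities asked for in Q.25 can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the question does not say whether the population or the sample standard deviation is wanted, so both are computed, and the variable names are illustrative.

```python
import statistics

# Attribute X exactly as listed in Q.25.
X = [7, 12, 5, 8, 5, 9, 13, 12, 19, 7, 12, 12, 13, 3, 4, 5, 13, 8, 7, 6]

mean_x = statistics.mean(X)      # (a) arithmetic mean
median_x = statistics.median(X)  # (b) middle value of the sorted data

# (c) The question does not name a convention, so both are shown.
pop_std = statistics.pstdev(X)    # population form: divides by n
sample_std = statistics.stdev(X)  # sample form: divides by n - 1

print("mean:", mean_x)
print("median:", median_x)
print("population std:", round(pop_std, 3))
print("sample std:", round(sample_std, 3))
```

Whichever convention the course uses for part (c), state it explicitly in the answer, since the two values differ for a data set of only 20 points.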
Q.26 Define Frequent sets, confidence, support and association rule.
Q.27 What do you mean by Market Basket analysis and how can it help in a supermarket?
Q.28 Explain whether association rule mining is supervised or unsupervised type of learning.
Q.29 Name some variants of Apriori Algorithm.
Q.30 Discuss the importance of Association Rule Mining.
Q.31 Consider the data set D. Given a minimum support of 2, apply the Apriori algorithm to this
dataset.
Transaction ID   Items
100              A, C, D
200              B, C, E
300              A, B, C, E
400              B, E
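A hand-worked answer to Q.31 can be checked against a small level-wise implementation. The sketch below assumes the minimum support is an absolute count of 2 transactions; the function name `apriori` and its parameters are illustrative and do not come from any particular library.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Scan the data once per level and keep candidates meeting min_count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 2
    while level:
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Data set D from Q.31; each string is read as a set of single-letter items.
D = {100: "ACD", 200: "BCE", 300: "ABCE", 400: "BE"}
frequent = apriori(D.values(), min_count=2)

for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The join step here compares every pair of frequent itemsets, which is simpler than the prefix-based join usually given in textbooks but yields the same candidates after pruning.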

Q.32 Describe an example of a data set for which the Apriori check would actually increase the
cost. By "describe" I mean either show an instance of the data set or describe what it would look like.
Q.33 Same question for MaxMiner: when does MaxMiner perform worse than Apriori? How
does MaxMiner generate the frequency counts for every itemset which meets the support
constraints?
Q.34 With a neat sketch, explain the architecture of a data warehouse.
Q.35 Discuss the typical OLAP operations with an example.
Q.36 (i) Discuss how computations can be performed efficiently on data cubes.
Q.37 (ii) Write short notes on data warehouse meta data.
Q.38 (i) Explain various methods of data cleaning in detail.
(ii) Give an account of the data mining query language.

(iii) How is Attribute-Oriented Induction implemented? Explain in detail.

Q.39 (a) Write and explain the algorithm for mining frequent itemsets without candidate
generation. Give relevant example.

Q.40 Discuss the approaches for mining multilevel association rules from transactional
databases. Give relevant example.

Q.41 (i) Explain the algorithm for constructing a decision tree from training samples.
(ii) Explain Bayes' theorem.

Q.42 Explain the following clustering methods in detail:


(i) BIRCH
(ii) CURE

Q.43 Classification is supervised learning. Justify.


Q.44 Explain different classification Techniques.
Q.45 Entropy is an important concept in information theory. Explain its significance in mining
context.
Q.46 What are overfitted models? Explain their effects on performance.
Q.47 Explain Naive Bayes classification.
Q.48 Describe the essential features of decision trees in the context of classification.
Q.49 What are the advantages and disadvantages of decision trees over other classification
methods?
Q.50 Explain ID3 Algorithm.
Q.51 Explain the methods for computing best split.
Q.52 What is Clustering? What are different types of clustering?
Q.53 Explain different data types used in clustering.
Q.54 Define association rule mining.
Q.55 When can we say that association rules are interesting?
Q.56 Explain association rules in mathematical notation.
Q.57 Define support and confidence in Association rule mining.
Q.58 How are association rules mined from large databases?
Q.59 Describe the different classifications of Association rule mining.
Q.60 What is the purpose of Apriori Algorithm?
Q.61 Define anti-monotone property.
Q.62 How to generate association rules from frequent item sets?
Q.63 Give a few techniques to improve the efficiency of the Apriori algorithm.
Q.64 Mention a few approaches to mining multilevel association rules.
Q.65 What are multidimensional association rules?
Q.66. List out the differences between OLTP and OLAP.
Q.67. Explain mining Multi-dimensional Boolean association rules from transaction
Q.68. Explain constraint-based association mining.
Q.69 Specify the 5 criteria for the evaluation of classification and prediction.
Q.70 State two clustering methods that are used in the grid-based and density-based approaches.
Q.71 Why does every data structure in the data warehouse contain the time element?
Q.72 How does a snowflake schema differ from a star schema? Name two advantages and two
disadvantages of the snowflake schema.
Q.73 What are the essential differences between the MOLAP and ROLAP models? Also list a
few similarities.
Q.74 Why is the entity-relationship modelling technique not suitable for the data warehouse?
Q.75 How is data mining different from OLAP? Explain briefly.
Q.76 a) Define a data warehouse. Discuss its design principles.
b) Write in brief about schemas in multidimensional data model.

Q.77 Discuss the following


a) Star schema
b) Snowflake schema
c) Fact constellation schema

Q.78 a) What are the steps in designing a data warehouse? Explain.


b) Compare OLTP and OLAP.
Q.79 Briefly describe data warehouse implementation.
Q.80 Draw and explain the OLAM architecture.
Q.81 Write in detail about Attribute-Oriented Induction, with an example.
Q.82 Explain the following in OLAP
a) Roll up operation
b) Drill Down operation
c) Slice operation
d) Dice operation
e) Pivot operation
Q.83 Explain about the Apriori algorithm for finding frequent item sets with an example.

Q.84 You are given the transaction data shown in the table below from a fast food
restaurant. There are 9 distinct transactions (order: 1 – order: 9) and each transaction
involves between 2 and 4 meal items. There are a total of 5 meal items that are involved in
the transactions. For simplicity we assign the meal items short names (M1 – M5) rather
than the full descriptive names (e.g., Big Mac).

Meal Item   List of Item IDs     Meal Item   List of Item IDs
Order: 1    M1, M2, M5           Order: 6    M2, M3
Order: 2    M2, M4               Order: 7    M1, M3
Order: 3    M2, M3               Order: 8    M1, M2, M3, M5
Order: 4    M1, M2, M4           Order: 9    M1, M2, M3
Order: 5    M1, M3

For all of the parts below the minimum support is 2/9 (.222) and the minimum confidence
is 7/9 (.777). Note that you only need to achieve this level, not exceed it. Show your work
for full credit (this mainly applies to part a).
a. Apply the Apriori algorithm to the dataset of transactions and identify all frequent
k-itemsets. Show all of your work. You must show candidates but can cross them off to show
the ones that pass the minimum support threshold. This question is a bit longer than the
homework questions due to the number of transactions and items, so proceed carefully
and neatly. Note: if a candidate itemset is pruned because it violates the Apriori property,
you must indicate that it fails for this reason and not just because it does not achieve the
necessary support count (i.e., in these cases there is no need to actually compute the
support count). So, explicitly tag the itemsets that are pruned due to violation of the Apriori
property. This really did not come up on the homework because those problems were quite
short. (If you do not know what the Apriori property is, do not panic. You will ultimately
get the exact same answer but will just lose a few points).

b. Find all strong association rules of the form:


and note their confidence values. Hint: the answer is not 0, so you should have at least one
frequent 3-itemset.
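An answer to Q.84 can be cross-checked by brute force. The sketch below is not Apriori: it counts the support of every possible itemset directly, which is feasible only because there are 5 items, and then lists every rule that meets the minimum confidence. The rule form requested in part b is missing from the text above, so the script enumerates all antecedent/consequent splits of each frequent itemset; restrict the output to the form your paper specifies. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Transactions from the Q.84 table.
orders = {
    1: {"M1", "M2", "M5"}, 2: {"M2", "M4"}, 3: {"M2", "M3"},
    4: {"M1", "M2", "M4"}, 5: {"M1", "M3"}, 6: {"M2", "M3"},
    7: {"M1", "M3"}, 8: {"M1", "M2", "M3", "M5"}, 9: {"M1", "M2", "M3"},
}
MIN_SUP = Fraction(2, 9)   # thresholds kept as exact fractions
MIN_CONF = Fraction(7, 9)

n = len(orders)
items = sorted(set().union(*orders.values()))


def support_count(itemset):
    """Number of orders that contain every item of the itemset."""
    return sum(1 for t in orders.values() if itemset <= t)


# Brute force: test every non-empty itemset, with no candidate pruning.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        c = support_count(itemset)
        if Fraction(c, n) >= MIN_SUP:
            frequent[itemset] = c

# Every split of a frequent itemset into antecedent -> consequent.
strong_rules = []
for itemset, c in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            conf = Fraction(c, frequent[lhs])  # sup(itemset) / sup(antecedent)
            if conf >= MIN_CONF:
                strong_rules.append((lhs, itemset - lhs, conf))

for lhs, rhs, conf in strong_rules:
    print(sorted(lhs), "->", sorted(rhs), "confidence", conf)
```

Because the script does no pruning, it cannot show which candidates fail the Apriori property; that part of the written answer still has to be worked by hand.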
Q.85 a) Discuss the basic concepts of frequent itemset mining.

b) Write the Apriori algorithm.

Q86. a) What are the drawbacks of the Apriori algorithm? Explain.

b) Write the FP Growth Algorithm.

Q.87. a) What are the advantages of the FP-Growth algorithm?

b) Discuss the applications of association analysis.

Q.88 Can we design a method that mines the complete set of frequent item sets without
candidate generation? If yes, explain with an example.

Q.89 What are the drawbacks of the Apriori algorithm? Explain the FP-Growth concept in
detail.

Q.90 Explain the mining of multilevel association rules with an example.

a) Write about the basic concepts in association rule mining.

b) Can we overcome the drawbacks of the Apriori algorithm? Discuss.

Q91. What are the various constraints in constraint-based association rule mining? Explain.

Q92. Describe the data classification process with a neat diagram. How does the Naive
Bayesian classification work? Explain.

Q93. Explain the decision tree induction algorithm for classifying data tuples, with a suitable
example.
Q94. How does the Naïve Bayesian classification work? Explain in detail.

Q95. a) What is a Bayesian belief network? Explain in detail.

b) Write a note on attribute selection measures.

Q96. Explain in detail the attribute selection methods used in classification.

Q97. a) What is Bayes' theorem? Explain.

Q98. b) Discuss Naïve Bayesian classification.

Q99. a) What is a Bayesian belief network? Explain in detail.

b) Write a note on attribute selection measures.

Q100. Describe rule-based classification in detail.

Q101. Write and explain the algorithm for classification by backpropagation.

Q102. a) What is prediction? Explain the linear regression method.

b) Discuss accuracy and error measures.

Q103. Define clustering. Explain the types of data in cluster analysis.

Q104. a) Classify various clustering methods.

b) Describe any one partitioning-based clustering method.

Q105. What is the goal of clustering? How does the partitioning around medoids algorithm achieve
this?

Q106. a) Differentiate between AGNES and DIANA algorithms.

b) How do we assess cluster quality?

Q107. a) What is outlier detection? Explain distance-based outlier detection.

b) Write the partitioning around medoids algorithm.

Q108. a) Write the K-means clustering algorithm.

b) Write about the key issue in hierarchical clustering algorithms.

Q109. Explain the following

a) Density based clustering methods

b) Grid based clustering methods


Q110. What are outliers? Discuss the methods adopted for outlier detection.

Q111. a) Give a brief note on the PAM algorithm.

b) What is the drawback of the K-means algorithm? How can we modify the algorithm to
diminish that problem?

Q112. Discuss data mining applications in detail.
