DW Model Questions
“If you take the time to find the answers while you have it, you will pass all assessments and
also be qualified to work in a heavy-workload environment.”
All assessments will be drawn from the following questions and will be multiple
choice. You will be given the opportunity to choose which questions to answer.
Q.32 Describe an example of a data set for which the Apriori check would actually increase the cost.
By "describe" I mean either show an instance of the data set or describe what it would look like.
Q.33 Same question for MaxMiner: when does MaxMiner perform worse than Apriori? How
does MaxMiner generate the frequency counts for every itemset that meets the support
constraints?
Q.34 With a neat sketch, explain the architecture of a data warehouse.
Q.35 Discuss the typical OLAP operations with an example.
Q.36 (i) Discuss how computations can be performed efficiently on data cubes.
Q.37 (ii) Write short notes on data warehouse metadata.
Q.38 (i) Explain various methods of data cleaning in detail.
(ii) Give an account of the data mining query language.
Q.39 (a) Write and explain the algorithm for mining frequent itemsets without candidate
generation. Give a relevant example.
Q.40 Discuss the approaches for mining multilevel association rules from transactional
databases. Give a relevant example.
Q.41 (i) Explain the algorithm for constructing a decision tree from training
samples. (ii) Explain Bayes' theorem.
Q.84 You are given the transaction data shown in the Table below from a fast food
restaurant. There are 9 distinct transactions (order: 1 – order: 9) and each transaction
involves between 2 and 4 meal items. There are a total of 5 meal items that are involved in
the transactions. For simplicity we assign the meal items short names (M1 – M5) rather
Meal Item List of Item IDs Meal Item List of Item IDs
Order: 5 M1, M3
For all of the parts below the minimum support is 2/9 (.222) and the minimum confidence
is 7/9 (.777). Note that you only need to achieve this level, not exceed it. Show your work
for full credit (this mainly applies to part a).
a. Apply the Apriori algorithm to the dataset of transactions and identify all frequent
k-itemsets. Show all of your work. You must show candidates but can cross them off to show
the ones that pass the minimum support threshold. This question is a bit longer than the
homework questions due to the number of transactions and items, so proceed carefully
and neatly. Note: if a candidate itemset is pruned because it violates the Apriori property,
you must indicate that it fails for this reason and not just because it does not achieve the
necessary support count (i.e., in these cases there is no need to actually compute the
support count). So, explicitly tag the itemsets that are pruned due to violation of the Apriori
property. This really did not come up on the homework because those problems were quite
short. (If you do not know what the Apriori property is, do not panic. You will ultimately
get the exact same answer but will just lose a few points.)
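As a study aid, the distinction part a asks for (candidates pruned by the Apriori property versus candidates that are counted and fall short of minimum support) can be sketched in a few lines of Python. This is only an illustrative sketch, not a model answer: the function name `apriori`, the items `a`, `b`, `c`, and the five transactions are all made up for the example and are not the restaurant data from the question.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return (frequent, pruned): support counts of frequent itemsets, and the
    candidates dropped by the Apriori property without being counted."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    pruned = []
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        survivors = []
        for c in candidates:
            # Prune step (Apriori property): every (k-1)-subset must be frequent.
            if all(frozenset(s) in prev for s in combinations(c, k - 1)):
                survivors.append(c)
            else:
                pruned.append(c)  # rejected without scanning the transactions
        # Count step: scan the transactions only for surviving candidates.
        current = {}
        for c in survivors:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent, pruned

# Hypothetical toy data (NOT the table from the question).
transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "c"}, {"b", "c"}]
frequent, pruned = apriori(transactions, min_count=2)
```

In this toy data {b, c} occurs only once, so it is infrequent at level 2; the level-3 candidate {a, b, c} is then pruned by the Apriori property and its support is never counted, which is the case the question asks you to tag explicitly.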
Q.88 Can we design a method that mines the complete set of frequent itemsets without
candidate generation? If yes, explain with an example.
Q.89 What are the drawbacks of the Apriori algorithm? Explain the FP-Growth concept in
detail.
Q.90 Explain the mining of multilevel association rules with an example.
Q91. What are the various constraints in constraint-based association rule mining? Explain.
Q92. Describe the data classification process with a neat diagram. How does the Naïve
Bayesian classification work? Explain.
Q93. Explain the decision tree induction algorithm for classifying data tuples, with a suitable
example.
Q94. How does the Naïve Bayesian classification work? Explain in detail.
Q105. What is the goal of clustering? How does the Partitioning Around Medoids (PAM) algorithm achieve
this?
b) What is the drawback of the K-means algorithm? How can we modify the algorithm to