Qb Data Mining

The document is a comprehensive question bank covering various topics in data mining across five modules. It includes questions on data mining concepts, techniques, algorithms, and applications, such as classification, clustering, and data preprocessing. Each module focuses on specific areas, providing a structured approach to understanding data mining processes and methodologies.

Uploaded by

nikhitaraj1810
Copyright
© All Rights Reserved

QUESTION BANK

MODULE -1

1. What is data mining? Explain the KDD process in detail with a diagram.
2. List the types of data that can be mined and explain any two.
3. Explain the differences between data warehouses and transactional data.
4. Interpret classification and regression for predictive analysis.
5. With an example, demonstrate Class/Concept Description: Characterization
and Discrimination.
6. Describe how association rules help in mining frequent patterns.
7. Analyze the steps involved in performing cluster analysis and outlier
analysis.
8. Explain information retrieval and its types.
9. Which kinds of applications are targeted? Analyze both applications.
10. Identify and explain two major issues commonly encountered in data mining
processes.
11. What is an attribute? Explain nominal and binary attributes.
12. Explain numeric attributes.
13. Compare mean, median, and mode as measures of central tendency with an
example.
14. Explain the terms: 1. Range 2. Quartiles 3. Interquartile Range
4. Five-Number Summary 5. Boxplots and Outliers.
15. Analyze the roles of variance and standard deviation with an example.
16. Explain histograms, scatter plots, and data correlation.
17. Analyze the major tasks in data preprocessing.
18. Design a process for data cleaning. Explain: 1. Missing Values 2. Noisy Data
3. Data Cleaning as a Process.
19. Analyze the correlation coefficient and the covariance for numeric data
from the given information.
20. What is data reduction? Discuss wavelet transforms.
21. How does principal component analysis (PCA) contribute to data reduction?
22. Analyze heuristic methods of attribute subset selection with an example.
23. Analyze the impact of using sampling techniques versus full datasets in data
analysis, with an example.
24. Explain data cube aggregation.
25. Discuss strategies for data transformation.
26. How would you apply normalization to transform a dataset for clustering?
27. What is binning?
28. Demonstrate four methods for generating concept hierarchies for nominal
data.
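
As a starting point for question 26, the following is a minimal sketch of two common normalization methods, min-max scaling and z-score scaling. The attribute values are hypothetical and are not taken from this question bank; the function names are illustrative only.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: map every value to new_min
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant attribute: no spread to scale by
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical attribute values (assumption, for illustration only).
incomes = [12000, 35000, 54000, 73600, 98000]
income_01 = min_max(incomes)   # values now lie in [0, 1]
income_z = z_score(incomes)    # values now have mean 0 and std. dev. 1
```

Putting every attribute on a comparable scale before clustering keeps an attribute with a large numeric range, such as income, from dominating the distance computation.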

MODULE -2

1. Define Market Basket Analysis and explain its significance.
2. What are association rules, and what do support and confidence represent?
3. Explain the Apriori algorithm for discovering frequent itemsets for mining
Boolean association rules.
4. Evaluate the impact of using different thresholds for support and confidence
in generating association rules from frequent itemsets.
5. Apply the Apriori algorithm to the given table to discover frequent itemsets
for mining Boolean association rules.
6. Analyze various optimization techniques used to improve the efficiency of
the Apriori algorithm.
7. Explain and interpret the three-tier data warehouse architecture.
8. A database has five transactions. Let the minimum support be 3.
1. Find the ordered item sets.
2. Construct the FP-tree.
3. Find the conditional frequent patterns and generate the frequent patterns
using the FP-growth algorithm.
TID Items
T1 {M,O,N,K,E,Y}
T2 {D,O,N,K,E,Y}
T3 {M,A,K,E}
T4 {M,U,C,K,Y}
T5 {C,O,O,K,I,E}

9. What is a data warehouse? Explain its key features.
10. Explain the differences between operational database systems and data
warehouses.
11. Evaluate the key methodologies used in data warehouse development.
12. Compare the star schema, the snowflake schema, and the fact constellation
schema.
13. Compare OLAP and OLTP systems by their features and operations.
14. Explain typical OLAP operations.
15. How do join indexes and bitmap indexes contribute to the efficient
processing of OLAP queries?
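
For part 1 of question 8, a minimal sketch of the first FP-growth pass might look as follows: count the support of each item, keep the items that meet the minimum support of 3, and reorder each transaction by descending support. The duplicate O in T5 is counted once per transaction, and breaking ties alphabetically is an assumption made here; any fixed tie-breaking order works as long as it is used consistently.

```python
from collections import Counter

# Transactions exactly as listed in question 8.
transactions = {
    "T1": ["M", "O", "N", "K", "E", "Y"],
    "T2": ["D", "O", "N", "K", "E", "Y"],
    "T3": ["M", "A", "K", "E"],
    "T4": ["M", "U", "C", "K", "Y"],
    "T5": ["C", "O", "O", "K", "I", "E"],
}
MIN_SUPPORT = 3


def frequent_items(transactions, min_support):
    """Return {item: support count} for items meeting min_support."""
    counts = Counter()
    for items in transactions.values():
        counts.update(set(items))  # count each item once per transaction
    return {item: c for item, c in counts.items() if c >= min_support}


def ordered_item_sets(transactions, min_support):
    """Drop infrequent items and sort the rest by descending support."""
    freq = frequent_items(transactions, min_support)
    return {
        tid: sorted({i for i in items if i in freq},
                    key=lambda i: (-freq[i], i))  # ties broken alphabetically (assumption)
        for tid, items in transactions.items()
    }


support = frequent_items(transactions, MIN_SUPPORT)
ordered = ordered_item_sets(transactions, MIN_SUPPORT)
for tid, items in ordered.items():
    print(tid, items)
```

The ordered item sets are what gets inserted into the FP-tree one transaction at a time, so shared prefixes such as K, E end up on a common path.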

MODULE -3

1. What is classification in data mining?
2. List the key steps involved in the decision tree induction process.
3. What is Bayes' Theorem?
4. Define bagging and boosting.
5. What are ROC curves used for?
6. Explain how Naïve Bayesian classification works.
7. Describe the process of tree pruning and its significance.
8. What is the general approach to rule extraction from a decision tree?
9. How does cross-validation help in evaluating classifier performance?
10. Interpret the significance of ensemble methods for improving classification
accuracy.
11. Apply the IF-THEN rule-based classification method to a small dataset.
12. Use the holdout method to evaluate the performance of a decision tree
classifier on a given dataset.
13. Calculate the performance metrics (accuracy, precision, recall, and F1-score)
for a given confusion matrix.
14. Generate a decision tree for a sample dataset and apply tree pruning to
improve accuracy.
15. Apply the concept of bagging on a dataset using multiple decision trees.
16. Compare the attribute selection measures used in decision tree induction
(e.g., information gain and Gini index).
17. Analyze the differences between bagging and boosting techniques.
18. Analyze how random forests combine multiple decision trees to improve
classification accuracy.
19. Which method (bagging, boosting, or random forests) would you
recommend for class-imbalanced data? Justify your choice.
20. Propose a strategy to handle class-imbalanced data when using ensemble
methods.
21. Create an algorithm that improves rule induction using sequential covering
for a specific dataset.
22. Design a visual mining tool to better interpret decision tree structures.
23. Develop a hybrid approach that integrates ROC curve analysis and
cost-benefit analysis for classifier comparison.
24. Predict the class label of the tuple
X = (age = senior, income = medium, student = yes, credit rating = fair)
using the Naïve Bayesian classification algorithm. Consider the table below Q9.
25. Sketch the pruned tree using decision tree induction on the following
class-labeled training tuples. Compute Gini(income) for the tree.
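
For question 13, the four metrics follow directly from the cells of a binary confusion matrix. The sketch below uses hypothetical counts, since no confusion matrix is given in this question bank, and the function name is illustrative only.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a class-imbalanced problem (assumption):
# 300 actual positives and 9700 actual negatives.
metrics = classification_metrics(tp=90, fn=210, fp=140, tn=9560)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is 0.965 while the recall is only 0.30, which is the kind of gap that questions 19 and 20 on class-imbalanced data ask about.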

MODULE -4

1. What is cluster analysis? List its applications.
2. List and discuss the requirements of cluster analysis.
3. What is the main difference between k-means and k-medoids clustering
methods?
4. Explain the k-means partitioning algorithm.
5. Apply the k-means partitioning algorithm to the following data set:
seven points in 1-D space with the values 1, 2, 3, 8, 9, 10, and 25,
where k = 2.
6. Explain PAM, a k-medoids partitioning algorithm, with an example.
7. Cluster the data set {2, 3, 4, 10, 11, 12, 20, 25, 30} using the k-means
clustering algorithm with k = 2.
8. Explain how the choice of linkage criteria (e.g., single, complete, or
average) affects the dendrogram generated by agglomerative clustering.
9. Explain distance measures in algorithmic methods.
10. Apply the probabilistic hierarchical clustering algorithm to an example.
11. Construct the clustering feature (CF) for the data set (2,5), (3,2), and
(4,3).
12. Discuss the probabilistic hierarchical clustering algorithm.
13. Explain agglomerative versus divisive hierarchical clustering in detail.
14. Explain the DBSCAN algorithm with an example.
15. What are grid-based methods?
16. Explain how STING divides the spatial region into hierarchical grids and
how statistical information is used for clustering.
17. Discuss the significance of grid partitioning and its role in the CLIQUE
clustering process.
18. List the challenges of evaluating clustering results for imbalanced datasets.
Propose a strategy to overcome these challenges. Explain any one.
19. Develop an algorithm that integrates clustering tendency assessment into the
preprocessing phase of clustering. Justify your answer.
20. Discuss the extrinsic methods for evaluating clustering quality.
21. Explain the intrinsic methods for evaluating clustering quality.
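
Questions 5 and 11 can be checked with a short sketch. The k-means part below works on 1-D data and takes the initial centers as an argument; the starting centers used here (1 and 25, then 1 and 2) are assumptions, because the questions do not specify them. The clustering feature is computed as (n, LS, SS) with a per-dimension sum of squares, which is also an assumption about the intended convention; some presentations add the per-dimension values into a single scalar.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain k-means on 1-D data; returns (clusters, final centers)."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center (ties go to the lower index).
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the cluster mean; keep it if the cluster is empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return clusters, centers


def clustering_feature(points):
    """Return (n, LS, SS): count, linear sum, and per-dimension sum of squares."""
    dims = range(len(points[0]))
    linear_sum = tuple(sum(p[d] for p in points) for d in dims)
    square_sum = tuple(sum(p[d] ** 2 for p in points) for d in dims)
    return len(points), linear_sum, square_sum


data = [1, 2, 3, 8, 9, 10, 25]                    # values from question 5
clusters_a, centers_a = kmeans_1d(data, [1, 25])  # assumed initial centers
clusters_b, centers_b = kmeans_1d(data, [1, 2])   # a different assumed start
cf = clustering_feature([(2, 5), (3, 2), (4, 3)])  # points from question 11
print(clusters_a, centers_a)
print(clusters_b, centers_b)
print(cf)
```

The two runs converge to different partitions, one isolating 25 on its own and the other grouping it with 8, 9, and 10. This shows how sensitive k-means is to the initial centers and to an outlying value.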

MODULE -5

1. Mining complex data types.
2. Methodologies of data mining.
3. Data mining applications.
4. Data mining and society.
