DMDW

The document is a semester exam key for the Data Warehousing and Data Mining course at Saveetha College of Liberal Arts and Sciences. It includes various questions on topics such as OLAP, classification vs. clustering, data mining techniques, and algorithms like k-Means and kNN. Additionally, it covers applications of data mining, data visualization tools, and the influence of data mining on social platforms.

SAVEETHA INSTITUTE OF MEDICAL AND TECHNICAL SCIENCES
SAVEETHA COLLEGE OF LIBERAL ARTS AND SCIENCES

SEMESTER EXAM KEY

Sub.Code: CSA16 Sub. Name: Data Warehousing and Data Mining

Branch: BCA Year: 2025 Year: II

1. Define OLAP. (2 Marks)


OLAP (Online Analytical Processing) is a technology that enables multidimensional analysis
of data stored in data warehouses. It allows users to perform complex queries, generate
reports, and analyze data interactively from multiple perspectives (e.g., sales by region, time,
or product).
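As an illustration of multidimensional analysis, the sketch below sums a sales measure along different dimension combinations, which is the idea behind an OLAP roll-up. The fact table, the dimension names, and the roll_up helper are all hypothetical, invented for this example; a real OLAP server would run such queries against a data warehouse.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, product, sales_amount)
SALES = [
    ("North", "Q1", "Laptop", 1200),
    ("North", "Q1", "Phone", 800),
    ("North", "Q2", "Laptop", 1500),
    ("South", "Q1", "Laptop", 700),
    ("South", "Q2", "Phone", 900),
]

# Assumed dimension order, matching the first three tuple fields above
DIMENSIONS = ("region", "quarter", "product")


def roll_up(rows, *dims):
    """Sum the sales measure grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


# The same data viewed from two perspectives
by_region = roll_up(SALES, "region")
by_region_quarter = roll_up(SALES, "region", "quarter")
print(by_region)
print(by_region_quarter)
```

Passing fewer dimensions gives a coarser summary (roll-up); passing more gives a finer one (drill-down).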

2. What is the difference between classification and clustering in data mining? (2 Marks)
Classification: A supervised learning technique that assigns predefined labels/classes to data
points based on training data (e.g., spam vs. not spam).
Clustering: An unsupervised learning technique that groups similar data points into clusters
without predefined labels (e.g., customer segmentation).

3. List down the different types of patterns. (2 Marks)


Frequent patterns (e.g., itemsets, subsequences)
Sequential patterns
Association patterns
Predictive patterns
Clustering patterns

4. Define data primitives. (2 Marks)


Data primitives are the basic attributes or elements of data used in data mining tasks, such as
data type (e.g., numeric, categorical), measurement scale (e.g., nominal, ordinal), and
specific values or ranges.

5. What are the three key components of an Association Rule? (2 Marks)


Antecedent (If): The condition or itemset that triggers the rule.
Consequent (Then): The result or itemset predicted by the rule.
Support/Confidence: Measures like support (frequency) and confidence (strength) that
validate the rule.

6. What is the Apriori Algorithm? (2 Marks)
The Apriori Algorithm is a data mining technique used to identify frequent itemsets in
transactional datasets and generate association rules. It works level by level on the
Apriori property: every non-empty subset of a frequent itemset must also be frequent, so
any candidate itemset that contains an infrequent subset can be pruned without counting it.
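The level-wise search can be sketched as follows. This is a minimal illustration, not an optimized implementation; the transactions and the 0.4 support threshold are assumptions made up for the example.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        # Count support and keep candidates that meet the threshold
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate that has an
        # infrequent subset one item smaller
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


frequent_itemsets = apriori(TRANSACTIONS, min_support=0.4)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

In this toy data no 3-item candidate survives the prune step, because each one contains an infrequent pair such as {bread, milk} or {butter, jam}.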

7. What is clustering in data mining? (2 Marks)


Clustering is the process of grouping similar objects into clusters based on their attributes,
without prior knowledge of labels. It is an unsupervised learning method used to discover
patterns or structures in data.

8. What is the objective of using the k-Means algorithm? (2 Marks)


The objective of the k-Means algorithm is to partition a dataset into *k* clusters, where each
data point belongs to the cluster with the nearest mean (centroid), minimizing the
within-cluster variance.
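A minimal sketch of this objective, using the standard assign-then-update iteration (Lloyd's algorithm). The points and the starting centroids are assumptions chosen for illustration; practical implementations usually pick initial centroids at random or with a seeding heuristic. The sketch needs Python 3.8 or later for math.dist.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps until the centroids stop moving.

    If the iteration cap is reached first, the returned clusters reflect
    the last assignment step.
    """
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid)
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def within_cluster_ss(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, c) ** 2
               for c, cluster in zip(centroids, clusters)
               for p in cluster)


# Hypothetical 2-D data with two visible groups
POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(POINTS, centroids=[(0.0, 0.0), (10.0, 10.0)])
wcss = within_cluster_ss(centroids, clusters)
print(centroids, wcss)
```

The within_cluster_ss value is the quantity k-Means tries to make small; it does not increase from one iteration to the next.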

9. What is the difference between classification and prediction? (2 Marks)


Classification: Assigns discrete labels to data points (e.g., yes/no, spam/not spam).
Prediction: Estimates continuous values or future outcomes (e.g., predicting sales revenue).

10. List down the applications of data mining. (2 Marks)


Market basket analysis
Fraud detection
Customer segmentation
Healthcare diagnostics
Predictive maintenance

11. Define Data mining. Describe the kinds of data used in data mining. (5 Marks)
Definition: Data mining is the process of discovering patterns, trends, and useful
information from large datasets using statistical, machine learning, and database techniques.
Kinds of Data:
1. Structured Data: Relational databases (tables with rows/columns).
2. Unstructured Data: Text, images, videos.
3. Semi-structured Data: XML, JSON.
4. Time-series Data: Stock prices, sensor data.
5. Spatial Data: Maps, geographic information.

12. Compare different data visualization tools and techniques in terms of their effectiveness
for different types of data. (5 Marks)
Bar Charts: Effective for categorical data (e.g., sales by region).
Line Graphs: Best for time-series data (e.g., stock trends).
Scatter Plots: Suitable for numerical data showing relationships (e.g., height vs. weight).
Heatmaps: Useful for large datasets with intensity variations (e.g., website clicks).
Pie Charts: Good for showing proportions (e.g., market share), but less effective for complex
data.

13. Discuss how the k-Nearest Neighbors (kNN) algorithm works in classification. (5 Marks)
kNN is a supervised learning algorithm that classifies a data point based on the majority
class of its *k* nearest neighbors.
Steps:
1. Calculate the distance (e.g., Euclidean) between the new data point and all training data
points.
2. Identify the *k* closest points (neighbors).
3. Assign the class with the most votes among the *k* neighbors.
kNN works well on small, simple datasets but is computationally expensive on large ones,
because every prediction requires computing the distance to all training points.
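The three steps above can be sketched as follows. The training points, labels, and the choice of k = 3 are assumptions made up for the example; the sketch needs Python 3.8 or later for math.dist.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """train is a list of (features, label) pairs; returns the majority label."""
    # Step 1: Euclidean distance from the query to every training point
    distances = [(math.dist(features, query), label) for features, label in train]
    # Step 2: keep the k closest points
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    # Step 3: majority vote among the k neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: two well-separated classes
TRAIN = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]

label_near_a = knn_classify(TRAIN, (2.0, 2.0), k=3)
label_near_b = knn_classify(TRAIN, (6.0, 7.0), k=3)
print(label_near_a, label_near_b)
```

An odd k is a common choice for two-class problems because it avoids tied votes.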

14. Explain the trend in data mining. (5 Marks)


Trends in data mining include:
Big Data Integration: Handling large-scale, unstructured data.
AI and Machine Learning: Using deep learning for complex patterns.
Real-time Mining: Processing streaming data (e.g., IoT).
Privacy-preserving Mining: Techniques like anonymization to protect data.
Cloud-based Mining: Leveraging cloud platforms for scalability.

15. Explain in detail about the functionalities of data mining. (12 Marks)
Definition Recap: Data mining extracts knowledge from data.
Functionalities:
1. Pattern Discovery: Identifies frequent itemsets, sequences, etc.
2. Classification: Assigns labels (e.g., Decision Trees, SVM).
3. Clustering: Groups similar objects (e.g., k-Means).
4. Association Rule Mining: Finds relationships (e.g., market basket analysis).
5. Prediction: Forecasts trends (e.g., regression).
6. Anomaly Detection: Identifies outliers (e.g., fraud detection).
7. Summarization: Provides concise data representations.
8. Visualization: Aids in interpreting results.
Examples: Fraud detection (anomaly), customer segmentation (clustering).
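As one concrete case of the anomaly detection functionality, the sketch below flags values that lie far from the mean in standard-deviation units (a z-score test). This is only one simple technique among many; the transaction amounts and the threshold of 2.0 are assumptions chosen for illustration.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    threshold population standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction amounts with one unusually large payment
AMOUNTS = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 250.0]
outliers = zscore_outliers(AMOUNTS)
print(outliers)
```

Because a single extreme value also inflates the mean and standard deviation, z-score tests can miss outliers in very small samples.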

16. How would you apply different data mining techniques to a given dataset? Provide
examples for each type. (12 Marks)
Dataset Example: Retail sales data.
Techniques:
1. Classification: Use Decision Trees to classify customers as "loyal" or "not loyal" based
on purchase history.
2. Clustering: Apply k-Means to segment customers by buying patterns.
3. Association Rule Mining: Use Apriori to find rules like "If bread, then butter."
4. Prediction: Use regression to predict next month’s sales.
5. Anomaly Detection: Detect unusual transactions (e.g., fraud).
6. Time-series Analysis: Analyze sales trends over time.
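For the prediction technique in this list, a minimal sketch of simple linear regression fitted by ordinary least squares is shown below. The monthly sales figures are invented for illustration, and a straight-line trend is itself an assumption that real sales data may not satisfy.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales totals for months 1 to 5
MONTHS = [1, 2, 3, 4, 5]
SALES = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(MONTHS, SALES)
forecast_month_6 = slope * 6 + intercept  # extrapolate one month ahead
print(slope, intercept, forecast_month_6)
```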

17. Explain the concept of Association Rule Mining. Discuss the different types of
association rules and their significance. (12 Marks)
Concept: Association Rule Mining identifies relationships between items in large datasets
(e.g., "If A, then B").
Measures: Support (frequency), Confidence (strength), Lift (correlation).
Types:
1. Boolean Rules: Binary presence/absence (e.g., "bread → butter").
2. Quantitative Rules: Numeric attributes (e.g., "age > 30 → high income").
3. Multilevel Rules: Hierarchies (e.g., "dairy → milk").
4. Multidimensional Rules: Multiple attributes (e.g., "age > 30 and male → luxury car").
Significance: Improves marketing, inventory management, and decision-making.
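The support, confidence, and lift measures named in this answer can be computed directly from transaction counts. The sketch below does so for the Boolean rule "bread → butter"; the transactions are hypothetical, and the function assumes both the antecedent and the consequent occur at least once.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    count_a = sum(antecedent <= t for t in transactions)
    count_c = sum(consequent <= t for t in transactions)
    count_both = sum((antecedent | consequent) <= t for t in transactions)
    support = count_both / n              # how often A and B occur together
    confidence = count_both / count_a     # how often B occurs when A occurs
    lift = confidence / (count_c / n)     # confidence relative to B's base rate
    return support, confidence, lift


# Hypothetical transactions, invented for illustration
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

support, confidence, lift = rule_measures(TRANSACTIONS, {"bread"}, {"butter"})
print(support, confidence, lift)
```

A lift above 1 suggests the items occur together more often than if they were independent; a lift below 1, as in this toy data, suggests the opposite.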

18. Discuss the various types of classification algorithms with examples. (12 Marks)
Types:
1. Decision Trees: Splits data based on features (e.g., spam email detection).
2. k-Nearest Neighbors (kNN): Classifies based on proximity (e.g., image recognition).
3. Support Vector Machines (SVM): Finds optimal hyperplane (e.g., text classification).
4. Naive Bayes: Probabilistic classifier (e.g., sentiment analysis).
5. Neural Networks: Complex patterns (e.g., handwriting recognition).
Examples: Classifying customers (Decision Trees), disease diagnosis (SVM).

19. How does data mining influence social platforms and social behavior? (12 Marks)
Influence on Platforms:
1. Personalization: Recommends content (e.g., Netflix, YouTube).
2. Ad Targeting: Mines user data for ads (e.g., Facebook).
3. Trend Analysis: Identifies viral topics (e.g., Twitter hashtags).
4. Sentiment Analysis: Gauges public opinion (e.g., election predictions).
Influence on Behavior:
1. Shapes preferences through tailored content.
2. Encourages engagement via gamification (likes, shares).
3. Raises privacy concerns, altering trust in platforms.
Example: Mining X posts to predict user reactions to news.