DMDW
SAVEETHA COLLEGE OF LIBERAL ARTS AND SCIENCES
2. What is the difference between classification and clustering in data mining? (2 Marks)
Classification: A supervised learning technique that assigns predefined labels/classes to data
points based on training data (e.g., spam vs. not spam).
Clustering: An unsupervised learning technique that groups similar data points into clusters
without predefined labels (e.g., customer segmentation).
11. Define Data mining. Describe the kinds of data used in data mining. (5 Marks)
Definition: Data mining is the process of discovering patterns, trends, and useful
information from large datasets using statistical, machine learning, and database techniques.
Kinds of Data:
1. Structured Data: Relational databases (tables with rows/columns).
2. Unstructured Data: Text, images, videos.
3. Semi-structured Data: XML, JSON.
4. Time-series Data: Stock prices, sensor data.
5. Spatial Data: Maps, geographic information.
12. Compare different data visualization tools and techniques in terms of their effectiveness
for different types of data. (5 Marks)
Bar Charts: Effective for categorical data (e.g., sales by region).
Line Graphs: Best for time-series data (e.g., stock trends).
Scatter Plots: Suitable for numerical data showing relationships (e.g., height vs. weight).
Heatmaps: Useful for large datasets with intensity variations (e.g., website clicks).
Pie Charts: Good for showing proportions of a whole (e.g., market share), but hard to read
when there are many categories or slices of similar size.
13. Discuss how the k-Nearest Neighbors (kNN) algorithm works in classification. (5 Marks)
kNN is a supervised learning algorithm that classifies a data point based on the majority
class of its k nearest neighbors.
Steps:
1. Calculate the distance (e.g., Euclidean) between the new data point and all training data
points.
2. Identify the k closest points (neighbors).
3. Assign the class with the most votes among the k neighbors.
kNN works well on small, simple datasets, but prediction is computationally expensive on
large ones because every new point must be compared against all training points.
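The three steps above can be sketched in a few lines of Python. This is a minimal
illustration, not a production implementation: the function name knn_classify and the toy
(height, weight) data are assumptions made up for this example.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Step 1: Euclidean distance from the query to every training point.
    distances = [(math.dist(features, query), label) for features, label in train]
    # Step 2: keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (height_cm, weight_kg) -> size label.
train = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((160, 58), "small"),
    ((180, 85), "large"),
    ((185, 90), "large"),
    ((190, 95), "large"),
]

prediction = knn_classify(train, (178, 82), k=3)
print(prediction)  # large
```

Note that the function scans the whole training set for each query, which is the source of
the cost on large data mentioned above.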
15. Explain in detail about the functionalities of data mining. (12 Marks)
Definition Recap: Data mining extracts knowledge from data.
Functionalities:
1. Pattern Discovery: Identifies frequent itemsets, sequences, etc.
2. Classification: Assigns labels (e.g., Decision Trees, SVM).
3. Clustering: Groups similar objects (e.g., k-Means).
4. Association Rule Mining: Finds relationships (e.g., market basket analysis).
5. Prediction: Forecasts trends (e.g., regression).
6. Anomaly Detection: Identifies outliers (e.g., fraud detection).
7. Summarization: Provides concise data representations.
8. Visualization: Aids in interpreting results.
Examples: Fraud detection (anomaly), customer segmentation (clustering).
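As one illustration of the anomaly detection functionality, the sketch below flags values
that lie far from the mean, measured in standard deviations (a z-score test). The z-score
rule, the function name find_outliers, the threshold of 2.0, and the transaction amounts are
all assumptions chosen for this example. Real fraud detection systems typically use richer
models.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds `threshold` in absolute value."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 22, 500]
print(find_outliers(amounts))  # [500]
```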
16. How would you apply different data mining techniques to a given dataset? Provide
examples for each type. (12 Marks)
Dataset Example: Retail sales data.
Techniques:
1. Classification: Use Decision Trees to classify customers as "loyal" or "not loyal" based
on purchase history.
2. Clustering: Apply k-Means to segment customers by buying patterns.
3. Association Rule Mining: Use Apriori to find rules like "If bread, then butter."
4. Prediction: Use regression to predict next month’s sales.
5. Anomaly Detection: Detect unusual transactions (e.g., fraud).
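To make one of these techniques concrete, here is a minimal sketch of k-Means for customer
segmentation. The customer features (visits per month, average basket value), the
hand-picked starting centroids, and the function name k_means are assumptions for
illustration. Library implementations usually choose starting centroids randomly or with a
smarter seeding scheme.

```python
import math

def k_means(points, centroids, iterations=10):
    """Basic k-means: alternate assignment and centroid-update steps."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket value).
points = [(2, 15), (3, 20), (2, 18), (10, 80), (12, 90), (11, 85)]
initial = [points[0], points[3]]  # hand-picked starting centroids

centroids, clusters = k_means(points, initial)
print(clusters[0])  # occasional, low-spend customers
print(clusters[1])  # frequent, high-spend customers
```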
17. Explain the concept of Association Rule Mining. Discuss the different types of
association rules and their significance. (12 Marks)
Concept: Association Rule Mining identifies relationships between items in large datasets
(e.g., "If A, then B").
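Measures: Support (how often A and B occur together), Confidence (how often B occurs in
transactions that contain A), Lift (confidence divided by the overall frequency of B; a lift
above 1 indicates a positive correlation).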
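The three measures can be computed directly from a list of transactions. The sketch below
uses the standard definitions. The function name rule_metrics and the five toy baskets are
assumptions for illustration, and the code assumes both sides of the rule occur at least
once in the data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)           # baskets with A
    count_c = sum(1 for t in transactions if c <= t)           # baskets with B
    count_both = sum(1 for t in transactions if (a | c) <= t)  # baskets with A and B
    support = count_both / n
    confidence = count_both / count_a      # assumes A occurs at least once
    lift = confidence / (count_c / n)      # assumes B occurs at least once
    return support, confidence, lift

# Hypothetical market baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"butter"},
]

support, confidence, lift = rule_metrics(transactions, {"bread"}, {"butter"})
print(round(support, 2), round(confidence, 2), round(lift, 2))  # 0.4 0.67 1.11
```

Here the lift is slightly above 1, so in this toy data butter is a little more likely in
baskets that contain bread than in baskets overall.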
Types:
1. Boolean Rules: Binary presence/absence (e.g., "bread → butter").
2. Quantitative Rules: Numeric attributes (e.g., "age > 30 → high income").
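3. Multilevel Rules: Items at different levels of a concept hierarchy (e.g., "milk → bread"
at a general level, "skim milk → wheat bread" at a more specific level).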
4. Multidimensional Rules: Multiple attributes (e.g., "age > 30 and male → luxury car").
Significance: Improves marketing, inventory management, and decision-making.
18. Discuss the various types of classification algorithms with examples. (12 Marks)
Types:
1. Decision Trees: Splits data based on features (e.g., spam email detection).
2. k-Nearest Neighbors (kNN): Classifies based on proximity (e.g., image recognition).
3. Support Vector Machines (SVM): Finds optimal hyperplane (e.g., text classification).
4. Naive Bayes: Probabilistic classifier (e.g., sentiment analysis).
5. Neural Networks: Complex patterns (e.g., handwriting recognition).
Examples: Classifying customers (Decision Trees), disease diagnosis (SVM).
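As an illustration of one of these algorithms, the sketch below implements a small Naive
Bayes classifier for sentiment analysis over bags of words, with add-one (Laplace)
smoothing. The function names, the four training sentences, and the labels "pos"/"neg" are
assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Count classes and per-class word frequencies from (words, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, words):
    """Return the label with the highest log-probability for `words`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing so unseen words do not zero out the product.
            likelihood = (word_counts[label][w] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training sentences, already split into words.
docs = [
    (["great", "movie"], "pos"),
    (["loved", "it"], "pos"),
    (["terrible", "movie"], "neg"),
    (["hated", "it"], "neg"),
]

model = train_naive_bayes(docs)
print(predict(model, ["great", "it"]))  # pos
print(predict(model, ["terrible"]))     # neg
```

The "naive" part is the assumption that words are independent given the class, which is
why the per-word likelihoods are simply multiplied (added in log space).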
19. How does data mining influence social platforms and social behavior? (12 Marks)
Influence on Platforms:
1. Personalization: Recommends content (e.g., Netflix, YouTube).
2. Ad Targeting: Mines user data for ads (e.g., Facebook).
3. Trend Analysis: Identifies viral topics (e.g., Twitter hashtags).
4. Sentiment Analysis: Gauges public opinion (e.g., election predictions).
Influence on Behavior:
1. Shapes preferences through tailored content.
2. Encourages engagement via gamification (likes, shares).
3. Raises privacy concerns, altering trust in platforms.
Example: Mining X posts to predict user reactions to news.