
Part A

Module 1 : Data Mining

2020 March
1. What do you mean by a transactional database?
A transactional database records individual transactions, such as purchases or
reservations, typically as a transaction identifier together with the list of items
involved. In data mining, such databases are commonly analyzed to discover patterns and
relationships in large volumes of transaction data.
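As a minimal, hypothetical sketch, such a database can be pictured as transaction IDs
mapped to the items involved in each transaction:

```python
# A tiny, hypothetical transactional database: each record pairs a
# transaction ID with the set of items bought in that transaction.
transactions = {
    "T100": {"bread", "milk"},
    "T200": {"bread", "diapers", "beer"},
    "T300": {"milk", "diapers", "beer", "cola"},
    "T400": {"bread", "milk", "diapers", "beer"},
}

# Count how often an item appears across all transactions.
def item_count(item):
    return sum(item in items for items in transactions.values())

print(item_count("beer"))  # 3
```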

2. What is a concept hierarchy? Give an example.


A concept hierarchy is a hierarchical organization of related concepts or categories,
with each level of the hierarchy representing a different degree of abstraction or
generalization. For example, in a hierarchy of animal species the top level may be
"Animals," the second level "Mammals," the third level "Carnivores," and so on, with
each lower level representing a more specific category.
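The same idea can be sketched in code as a child-to-parent mapping, with a helper that
generalizes a concept up the hierarchy (a hypothetical illustration, not a library API):

```python
# Hypothetical concept hierarchy stored as a child -> parent mapping.
parent = {
    "Lions": "Carnivores",
    "Carnivores": "Mammals",
    "Mammals": "Animals",
}

def generalize(concept):
    """Walk upward from a concept to the most general level."""
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path

print(generalize("Lions"))  # ['Lions', 'Carnivores', 'Mammals', 'Animals']
```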

3. What is background knowledge? Give an example.


Background knowledge refers to information that is known about the data or domain being
analyzed and can be used to inform the mining process or interpret the results. For example,
in analyzing customer purchasing patterns, background knowledge about seasonal trends or
marketing campaigns could be used to help identify relevant patterns in the data.

2021 April
1. What do you mean by data mining?
Data mining is the process of discovering hidden patterns, relationships, and other
useful knowledge in large volumes of data, using statistical and machine learning
techniques.

2. What do you mean by interestingness?


Interestingness refers to the degree to which a discovered pattern or relationship is novel,
valid, useful, and understandable to the domain expert.

3. List two methods for dimensionality reduction.


● Principal Component Analysis (PCA): A statistical method that projects the data
onto a new coordinate system defined by the directions of greatest variance (the
principal components), reducing the number of dimensions by keeping only the
leading components.
● t-SNE (t-Distributed Stochastic Neighbor Embedding): A nonlinear dimensionality
reduction technique that is particularly useful for visualizing high-dimensional
datasets by preserving the local structure of the data while also revealing global
patterns and relationships.
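Both methods are available in scikit-learn. A minimal sketch, assuming scikit-learn and
NumPy are installed, reducing random 10-dimensional data to 2 dimensions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(100, 10)  # 100 samples, 10 features (synthetic data)

# PCA: linear projection onto the two directions of greatest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: nonlinear embedding that preserves local neighborhoods.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)  # (100, 2) (100, 2)
```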

2022 April
1. What is a multimedia database?
A multimedia database is a database that stores multimedia data such as images, audio,
and video, and allows efficient retrieval of this data.

2. Name different methods by which a classification model can be represented.
A classification model can be represented using various methods such as decision trees,
rule-based systems, neural networks, support vector machines (SVM), and k-nearest
neighbor (k-NN) algorithms.
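As one concrete example of these representations, a decision tree fitted with
scikit-learn can be printed as a readable tree of IF-THEN-style splits (a minimal sketch
on the library's built-in iris dataset):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The fitted model is itself the representation: a tree of threshold tests.
print(export_text(tree, feature_names=list(iris.feature_names)))
```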

3. What is numerosity reduction?


Numerosity reduction is the process of reducing the number of data instances or objects
in a dataset, for example by sampling, histograms, or clustering, while preserving the
important characteristics and relationships between the data points. It is often used to
reduce the computational cost of data mining algorithms.
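Random sampling is one simple nonparametric form of numerosity reduction; a minimal
pandas sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical dataset with one million rows.
df = pd.DataFrame({"value": range(1_000_000)})

# Keep a 1% simple random sample as a smaller, representative stand-in.
sample = df.sample(frac=0.01, random_state=0)
print(len(sample))  # 10000
```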
Part B
Module 1 : Data Mining

2020 March
13. Explain data discretization and concept hierarchy
generation.
Data discretization is the process of converting continuous numerical data into categorical
data by partitioning the range of values into intervals, or bins, and assigning each value to
the corresponding interval. This technique is used to simplify data analysis and reduce the
number of variables in a dataset.

Concept hierarchy generation, on the other hand, is the process of organizing categorical
data into a hierarchical structure of concepts or categories based on their relationships, such
as generalization or specialization. This technique is used to create a meaningful and
organized representation of categorical data for analysis and decision-making.
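A minimal pandas sketch of both steps, using hypothetical age data: binning for
discretization, then a hand-built mapping that rolls the bins up to a coarser level of
the hierarchy:

```python
import pandas as pd

ages = pd.Series([3, 17, 25, 34, 49, 61, 78])

# Discretization: partition the continuous range into labeled intervals.
bins = pd.cut(ages, bins=[0, 18, 40, 65, 100],
              labels=["child", "young_adult", "middle_aged", "senior"])

# Concept hierarchy generation: generalize the bins to a higher level.
hierarchy = {"child": "minor", "young_adult": "adult",
             "middle_aged": "adult", "senior": "adult"}
print(bins.map(hierarchy).tolist())
# ['minor', 'minor', 'adult', 'adult', 'adult', 'adult', 'adult']
```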

2021 April
13. Differentiate classification and prediction.
Classification and prediction are two fundamental tasks in data mining that involve building
models to predict the class or value of a target variable based on a set of input variables.
The main difference between classification and prediction is the type of target variable.

In classification, the target variable is a categorical variable, and the goal is to predict the
class or category of the target variable based on the input variables. Examples of
classification include predicting whether a customer will churn or not, or whether a tumor is
malignant or benign.

In prediction, the target variable is a continuous numerical variable, and the goal is to
predict the value of the target variable based on the input variables. Examples of prediction
include predicting the price of a house or the revenue of a business.
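A minimal scikit-learn sketch of the two tasks side by side, on tiny hypothetical
datasets: a decision tree for the categorical churn target, and linear regression for
the numeric price target:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: categorical target (0 = stays, 1 = churns).
X_cls = [[1, 200], [5, 40], [2, 180], [7, 10]]  # e.g. tenure, monthly usage
y_cls = [0, 1, 0, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X_cls, y_cls)
print(clf.predict([[6, 30]]))  # e.g. [1] -> predicted to churn

# Prediction (regression): continuous numeric target (house price).
X_reg = [[50], [80], [120], [200]]            # area in square meters
y_reg = [100_000, 160_000, 240_000, 400_000]  # price
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[100]]))  # [200000.]
```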

2022 April
13. Explain the concept of data integration.
Data integration is the process of combining data from multiple sources into a single,
unified view that can be used for analysis and decision-making. This process involves
identifying and resolving any inconsistencies or conflicts in the data, such as differences in
data formats, units of measurement, or data structures, to ensure that the data is accurate
and complete.

Data integration is a crucial step in data mining, as it enables analysts to work with a larger
and more diverse set of data, and to gain insights that may not be possible with individual
data sources. Some common techniques for data integration include data warehousing,
which involves storing and organizing data from multiple sources in a centralized repository,
and data fusion, which involves combining data from multiple sources to create a more
comprehensive and accurate representation of the underlying phenomenon.
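A minimal pandas sketch of integrating two hypothetical sources, resolving a key-name
mismatch and a unit difference before merging them into one view:

```python
import pandas as pd

# Two hypothetical sources describing the same customers.
crm = pd.DataFrame({"cust_id": [1, 2], "name": ["Ann", "Ben"]})
sales = pd.DataFrame({"customer": [1, 2], "revenue_cents": [15000, 4200]})

# Resolve schema conflicts: align the key name and the unit of measurement.
sales = sales.rename(columns={"customer": "cust_id"})
sales["revenue"] = sales["revenue_cents"] / 100  # cents -> currency units

# Integrate into a single unified view.
unified = crm.merge(sales[["cust_id", "revenue"]], on="cust_id")
print(unified)
```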
Part C
Module 1 : Data Mining

2020 March
22. Explain why the data needs to be preprocessed before
mining.
Data preprocessing is a crucial step in the data mining process. It involves transforming
raw data into a clean and structured format that can be analyzed to extract meaningful
insights. There are several reasons why data preprocessing is necessary before data
mining:

● Data quality improvement: Raw data may contain errors, inconsistencies, missing
values, outliers, and noise that can affect the accuracy of the analysis. Data
preprocessing helps to identify and correct these issues, resulting in improved data
quality.

● Data integration: Data may be stored in different formats, sources, and structures.
Data preprocessing helps to integrate data from different sources into a common
format, making it easier to analyze.

● Data reduction: Raw data may contain a large number of attributes, some of which
may be irrelevant or redundant for analysis. Data preprocessing helps to reduce the
dimensionality of data by selecting relevant attributes, resulting in faster and more
accurate analysis.

● Data normalization: Raw data may be expressed in different units and scales. Data
preprocessing helps to normalize data by scaling it to a common range, making it
easier to compare and analyze.

● Data transformation: Raw data may not be suitable for analysis using certain
algorithms or models. Data preprocessing helps to transform data into a suitable
format for analysis.

Overall, data preprocessing is essential for accurate and efficient data mining. It helps to
improve data quality, reduce noise, integrate data from different sources, reduce
dimensionality, normalize data, and transform data into a suitable format for analysis.
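A minimal pandas sketch touching several of these steps (filling a missing value,
removing an outlier, normalizing to a common scale) on a tiny hypothetical table:

```python
import pandas as pd

# Hypothetical raw data with a missing value and an extreme outlier.
raw = pd.DataFrame({
    "age": [25, None, 47, 33],
    "income_usd": [40_000, 52_000, 1_000_000, 61_000],
})

clean = raw.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())  # fill missing value
clean = clean[clean["income_usd"] < 500_000]               # drop outlier row

# Min-max normalization: rescale each column to the [0, 1] range.
for col in clean.columns:
    lo, hi = clean[col].min(), clean[col].max()
    clean[col] = (clean[col] - lo) / (hi - lo)

print(clean)
```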
2021 April
22. Explain major issues in data mining.
Data mining, despite its immense potential, is a complex process fraught with challenges
and issues. Below are some of the major issues in data mining:

● Data Quality: The quality of the data being analyzed is a critical factor in the success
of any data mining project. The data must be clean, consistent, and accurate to
ensure that the results are meaningful and actionable. However, data from various
sources may contain missing or incorrect values, outliers, or noise, which can impact
the accuracy and validity of the results.

● Scalability: With the explosion of data in recent years, the volume of data to be
processed by data mining algorithms has increased exponentially. This increase in
data volume can be challenging for algorithms that are not designed to handle such
large data sets. Therefore, scalability is a major issue in data mining that needs to be
addressed to ensure that the algorithms are efficient and can handle large data sets.

● Data Privacy and Security: Data mining involves the use of sensitive data, such as
financial records or medical records, which may be subject to privacy laws or
regulations. Therefore, data privacy and security are crucial issues that must be
addressed to ensure that the data is not misused or compromised.

● Interpretability: Another major issue in data mining is the interpretability of the
results. Data mining algorithms often generate complex models that may be difficult
for non-experts to interpret or understand. Therefore, it is important to ensure that
the results of data mining are presented in a way that is understandable and
actionable by decision-makers.

● Algorithmic Bias: Data mining algorithms may be subject to algorithmic bias, which
is the tendency of algorithms to favor certain groups or individuals over others.
Algorithmic bias can result in unfair or discriminatory outcomes, which can have
serious consequences for the individuals or groups affected.

● Ethics: Data mining involves the collection and use of data, which can raise ethical
concerns. For example, the use of data mining for surveillance or profiling may be
considered unethical or illegal in some contexts. Therefore, ethical considerations
must be taken into account when designing and implementing data mining projects.

In conclusion, data mining is a powerful tool for uncovering insights and patterns in large
data sets, but it also poses several challenges and issues. Addressing these challenges is
crucial to ensure that the results of data mining are accurate, reliable, and actionable.
2022 April
22. Explain various data mining task primitives.
Data mining task primitives are the basic building blocks of the data mining process, which
define the type of patterns that can be mined from a dataset. There are several data mining
task primitives that are widely used in the field of data mining. Some of the important task
primitives are:

1. The set of task-relevant data to be mined: This refers to the portion of the
database that the user is interested in. It could include specific attributes, dimensions
of interest in a data warehouse, or any other relevant data that the user wants to
extract insights from.

2. The kind of knowledge to be mined: This refers to the specific function or analysis
that the user wants to perform. For example, the user may want to perform
classification, clustering, or association analysis on the data.

3. The background knowledge to be used in the discovery process: This refers to
any prior knowledge that the user has about the data, which can be used to improve
the accuracy and relevance of the data mining results. For example, the user may
have information about certain relationships or dependencies in the data, which can
be used to guide the mining process.

4. The interestingness measures and thresholds for pattern evaluation: This refers
to the criteria used to determine the usefulness or significance of the patterns
discovered during the data mining process. For example, the user may set a
threshold for the minimum support or confidence of association rules that are
considered interesting (a small worked example appears at the end of this answer).

5. The expected representation for visualizing the discovered patterns: This refers
to the form in which the user wants to visualize the patterns that are discovered. This
could include various forms such as tables, graphs, charts, decision trees, or cubes.
The visualization is an important aspect of the data mining process as it can help the
user to better understand and interpret the results.

By using these data mining task primitives, different types of patterns can be identified in the
data. These patterns can help in making informed decisions and improving the overall
efficiency of the process.
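As a small worked illustration of primitive 4, the support and confidence of a candidate
association rule can be computed and compared against user-chosen thresholds (both the
data and the thresholds here are hypothetical):

```python
# Hypothetical transactions and the candidate rule {bread} -> {milk}.
transactions = [
    {"bread", "milk"}, {"bread", "beer"},
    {"bread", "milk", "beer"}, {"milk", "cola"},
]

n = len(transactions)
both = sum({"bread", "milk"} <= t for t in transactions)  # transactions with both
bread = sum("bread" in t for t in transactions)           # transactions with bread

support = both / n          # 2/4 = 0.50
confidence = both / bread   # 2/3 ~= 0.67

# Primitive 4: user-specified interestingness thresholds.
MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.6
print(support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE)  # True
```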
