DM&DW SEE Module 1

Uploaded by

1NC21IS033 Manoj jayanth N

4. Explain the major issues in data mining

i. Mining Methodology and User Interaction Issues − These cover the following kinds of
issues:
• Mining different kinds of knowledge in databases − Different users may be interested in
different kinds of knowledge, so data mining must cover a broad range of knowledge
discovery tasks.
• Interactive mining of knowledge at multiple levels of abstraction − The data mining process
needs to be interactive so that users can focus the search for patterns, providing and refining
data mining requests based on the returned results.
• Incorporation of background knowledge − Background knowledge can be used to guide the
discovery process and to express the discovered patterns, both in concise terms and at
multiple levels of abstraction.
• Data mining query languages and ad hoc data mining − A data mining query language that lets
the user describe ad hoc mining tasks should be integrated with a data warehouse query
language.
• Presentation and visualization of data mining results − Once patterns are discovered, they
need to be expressed in high-level languages and visual representations that are easy to
understand.
• Handling noisy or incomplete data − Data cleaning methods are required to handle noise and
incomplete objects while mining data regularities. Without them, the accuracy of the
discovered patterns will be poor.
• Pattern evaluation − Not every discovered pattern is interesting: many represent common
knowledge or lack novelty, so the patterns need to be evaluated for interestingness.
ii. Performance Issues −
• Efficiency and scalability of data mining algorithms − To extract information effectively
from the huge amounts of data in databases, data mining algorithms must be efficient
and scalable.
• Parallel, distributed, and incremental mining algorithms − Factors such as the huge size of
databases motivate parallel and distributed mining algorithms. These algorithms divide the
data into partitions, which are processed in parallel; the results from the partitions are then
merged. Incremental algorithms incorporate database updates without mining all the data
again from scratch.
iii. Diverse Data Types Issues
• Handling of relational and complex types of data − A database may contain complex data
objects, multimedia data objects, spatial data, temporal data, etc. It is not realistic for one
system to mine all these kinds of data.
• Mining information from heterogeneous databases − The data is available from different data
sources on a LAN or WAN. These data sources may be structured, semi-structured, or
unstructured.
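
The partition-and-merge idea behind parallel mining can be sketched as follows. This is only an illustration under stated assumptions: the function names (partition, count_items, mine_in_parallel) and the toy transactions are hypothetical, and a thread pool stands in for the separate processors or machines a real system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split the records into n_parts slices of roughly equal size."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_items(transactions):
    """Local mining step: count how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts


def mine_in_parallel(transactions, n_parts=2):
    """Mine each partition independently, then merge the partial results."""
    parts = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_counts = list(pool.map(count_items, parts))
    merged = Counter()
    for counts in partial_counts:
        merged.update(counts)  # Counter.update adds the counts together
    return merged


# Hypothetical toy data
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "bread"],
    ["milk"],
]
item_counts = mine_in_parallel(transactions, n_parts=2)
```

The merged counts match what a single pass over all the data would produce, which is the property that makes the partitioning safe for this kind of task.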
5. Define structured and unstructured data in the context of data mining
i. Structured Data:
• Structured data refers to data that is organized and formatted in a predefined
manner, typically residing in databases or structured files such as spreadsheets.
Characteristics of structured data include:
• Organized into rows and columns, with each column representing a specific
attribute or variable.
• Conforms to a fixed schema, specifying the data types and relationships between
different attributes.
• Examples include relational databases, CSV files, Excel spreadsheets, and
structured XML or JSON documents.
• Structured data is well suited to traditional data mining techniques and relational
database systems.
ii. Unstructured Data:
• Unstructured data refers to data that lacks a predefined structure or organization,
making it more challenging to analyze using traditional methods.
• Lacks a fixed schema; the data is often stored in formats such as text documents,
emails, images, videos, audio recordings, social media posts, and web pages.
• May contain a wide variety of information, including text, multimedia content, and
semi-structured data.
• Often contains valuable insights and hidden patterns, but requires specialized
techniques to extract and analyze them.
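
The practical difference between the two kinds of data can be seen in a small sketch. The CSV content, the column names, and the review text below are made up for illustration; the point is only that structured data can be queried by attribute directly, while unstructured text first needs a step such as tokenization.

```python
import csv
import io
import re
from collections import Counter

# Structured: rows and columns under a fixed schema (hypothetical data)
structured_source = io.StringIO(
    "customer_id,age,city\n"
    "1,34,Bengaluru\n"
    "2,41,Mysuru\n"
)
rows = list(csv.DictReader(structured_source))
ages = [int(row["age"]) for row in rows]
average_age = sum(ages) / len(ages)  # attributes are addressable by name

# Unstructured: free text with no schema; structure must be extracted first
unstructured_text = "Great service, but delivery was late. Late delivery again!"
tokens = re.findall(r"[a-z]+", unstructured_text.lower())
term_counts = Counter(tokens)  # a very simple text-mining step
```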

6. Discuss the importance of data preprocessing in the context of the data mining process.
1.9.1 Data Integration:
• It combines data from multiple sources into a coherent data store, as in data
warehousing.
• These sources may include multiple databases, data cubes, or flat files.
• A data integration system is formally defined as a triple <G, S, M>, where
G: the global schema
S: the heterogeneous set of source schemas
M: the mapping between queries over the source schemas and the global schema
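
A minimal sketch of the <G, S, M> idea is given below. It simplifies M to a plain attribute-renaming table rather than a mapping between queries, and the source names, field names, and records are all hypothetical.

```python
# G: the global schema every integrated record should follow
GLOBAL_SCHEMA = ("customer_id", "name", "city")

# S: heterogeneous sources, each with its own schema (hypothetical data)
SOURCES = {
    "crm": [{"cust_id": 1, "full_name": "Asha", "town": "Mysuru"}],
    "billing": [{"customer_number": 2, "name": "Ravi", "city": "Hubli"}],
}

# M: mapping from each source's attribute names to the global schema
MAPPING = {
    "crm": {"cust_id": "customer_id", "full_name": "name", "town": "city"},
    "billing": {"customer_number": "customer_id", "name": "name", "city": "city"},
}


def integrate(sources, mapping, global_schema):
    """Rewrite every source record into the global schema."""
    integrated = []
    for source_name, records in sources.items():
        field_map = mapping[source_name]
        for record in records:
            renamed = {field_map[k]: v for k, v in record.items() if k in field_map}
            # Attributes missing from a source become None in the global view
            integrated.append({attr: renamed.get(attr) for attr in global_schema})
    return integrated


unified = integrate(SOURCES, MAPPING, GLOBAL_SCHEMA)
```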
1.9.2 Issues in Data Integration:
1. Schema integration and object matching: How can the data analyst or the computer
be sure that customer id in one database and customer number in another refer to the
same attribute?
2. Redundancy: An attribute (such as annual revenue, for instance) may be redundant if it
can be derived from another attribute or set of attributes. Inconsistencies in attribute or
dimension naming can also cause redundancies in the resulting data set.
3. Detection and resolution of data value conflicts: For the same real-world entity,
attribute values from different sources may differ.
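
Two of these issues can be illustrated with a short sketch. Correlation analysis is one common way to spot a redundant numeric attribute, although the notes above do not name a specific method; the 0.95 threshold, the attribute names, and the sample values are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Redundancy: annual revenue is derivable from monthly revenue (hypothetical data)
monthly_revenue = [10.0, 12.0, 9.0, 15.0]
annual_revenue = [12 * m for m in monthly_revenue]
r = pearson(monthly_revenue, annual_revenue)
is_redundant = abs(r) > 0.95  # assumed threshold

# Data value conflicts: the same entity has different values in two sources
source_a = {"hotel_1": 100.0, "hotel_2": 80.0}
source_b = {"hotel_1": 118.0, "hotel_2": 80.0}
conflicts = {
    key: (source_a[key], source_b[key])
    for key in source_a.keys() & source_b.keys()
    if source_a[key] != source_b[key]
}
```

Detecting a conflict is only the first step; resolving it (for example, deciding whether one price includes tax) still needs knowledge about the sources.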
1.9.3 Data Transformation:
Smoothing, which works to remove noise from the data. Such techniques include
binning, regression, and clustering.
Aggregation, where summary or aggregation operations are applied to the data. This step is
typically used in constructing a data cube for analysis of the data at multiple granularities.
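
Smoothing by binning can be sketched as below, using bin means on equal-frequency bins. This is one of several binning variants; the function name and the sample prices are illustrative assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical sample
smoothed_prices = smooth_by_bin_means(prices, bin_size=3)
# Bins: [4, 8, 15], [21, 21, 24], [25, 28, 34] with means 9, 22, 29
```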
1.9.4 Data Reduction:
Data reduction techniques can be applied to obtain a reduced representation of the data
set that is much smaller in volume, yet closely maintains the integrity of the original data.
Data cube aggregation, where aggregation operations are applied to the data in
the construction of a data cube.
Attribute subset selection, where irrelevant, weakly relevant, or redundant attributes or
dimensions may be detected and removed.
Dimensionality reduction, where encoding mechanisms are used to reduce the dataset
size.
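
Two of these reduction techniques can be sketched in a few lines. The sales figures and customer records are made up, and dropping constant attributes is only the simplest possible form of attribute subset selection, chosen here as an assumption for illustration.

```python
from collections import defaultdict

# Data cube aggregation: roll quarterly sales up to annual totals
quarterly_sales = [
    (2022, "Q1", 224), (2022, "Q2", 408), (2022, "Q3", 350), (2022, "Q4", 586),
    (2023, "Q1", 300), (2023, "Q2", 310), (2023, "Q3", 290), (2023, "Q4", 400),
]
annual_sales = defaultdict(int)
for year, _quarter, amount in quarterly_sales:
    annual_sales[year] += amount  # 8 rows reduce to 2


# Attribute subset selection: drop attributes that never vary,
# since they cannot help distinguish one record from another
def drop_constant_attributes(rows):
    keep = [attr for attr in rows[0] if len({row[attr] for row in rows}) > 1]
    return [{attr: row[attr] for attr in keep} for row in rows]


records = [
    {"age": 25, "country": "IN", "income": 30},
    {"age": 40, "country": "IN", "income": 55},
    {"age": 31, "country": "IN", "income": 42},
]
reduced = drop_constant_attributes(records)
```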

7. What are the four main problems of data mining functionality? Explain each one of them
Data mining functionalities specify the kinds of patterns to be discovered in data mining
tasks. In general, data mining tasks can be classified into two types: descriptive and
predictive.

Data characterization − It summarizes the general characteristics of a target class of data.
The data corresponding to the user-specified class is generally collected by a database
query, and the output can be presented in multiple forms.

Data discrimination − It is a comparison of the general characteristics of target class data
objects with the general characteristics of objects from one or a set of contrasting
classes.

Association Analysis − It analyses the set of items that generally occur together in a
transactional dataset.

Prediction − It predicts unavailable data values or upcoming trends. The class or value of an
object can be anticipated from the attribute values of the object and the attribute
values of the classes.

Clustering − It is similar to classification, but the classes are not predefined. The groups
are formed from the data attributes themselves. It is a form of unsupervised learning.

Outlier analysis − Outliers are data elements that cannot be grouped into a given class.
They are data objects whose behaviour deviates from the general behaviour of
other data objects.
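
The association analysis functionality described above can be sketched by counting which items occur together in transactions. Support and confidence are the standard measures for such patterns, although these notes do not name them; the baskets and helper names below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional dataset
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Fraction of transactions that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that
    also contain consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


bread_milk_support = support(("bread", "milk"))
bread_to_milk_confidence = confidence("bread", "milk")
```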

8. Identify common challenges associated with integrating a data mining system with a data
warehouse.
A data mining (DM) system can be integrated with a database (DB) or data warehouse (DW)
system using one of these coupling schemes:
• No Coupling
• Loose Coupling
• Semi-Tight Coupling
• Tight Coupling

a) No Coupling
No coupling means that a DM system will not utilize any function of a DB or DW system. It may
fetch data from a particular source (such as a file system), process data using some data mining
algorithms, and then store the mining results in another file.
b) Loose Coupling
• Loose coupling means that a Data Mining system will use some facilities of a Database or
Data Warehouse system, fetching data from a data repository managed by these systems,
performing data mining, and then storing the mining results either in a file or in a
designated place in the database or data warehouse.
• Loose coupling is better than no coupling because it can fetch any portion of the data stored
in Databases or Data Warehouses by using query processing.
c) Semi-Tight Coupling
• Semi-tight coupling means that, besides linking a Data Mining system to a Database/Data
Warehouse system, efficient implementations of a few essential data mining primitives are
provided in the Database/Data Warehouse system.
• These primitives can include sorting, indexing, aggregation, histogram analysis, multi-way join,
and pre-computation of some essential statistical measures, such as sum, count, max, min, and
standard deviation.
d) Tight Coupling
• Tight coupling means that a Data Mining system is smoothly integrated into the
Database/Data Warehouse system.
• The data mining subsystem is treated as one functional component of the information
system.
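
The difference between loose and semi-tight coupling can be sketched with Python's built-in sqlite3 module standing in for the DB/DW system. The table names, columns, and figures are hypothetical, and the "mining" step is reduced to a simple total per region.

```python
import sqlite3

# An in-memory database plays the role of the DB/DW system
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 200.0), ("south", 40.0)],
)

# Loose coupling: use a query to fetch a portion of the data,
# then do the analysis outside the database
rows = conn.execute(
    "SELECT region, amount FROM sales WHERE amount > 50"
).fetchall()
totals = {}
for region, amount in rows:
    totals[region] = totals.get(region, 0.0) + amount

# The results can be written back to a designated table
conn.execute("CREATE TABLE mining_results (region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO mining_results VALUES (?, ?)", sorted(totals.items())
)

# Semi-tight coupling: push a primitive (aggregation) into the database
db_totals = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM sales WHERE amount > 50 GROUP BY region"
    )
)
stored_rows = conn.execute("SELECT COUNT(*) FROM mining_results").fetchone()[0]
```

Both routes give the same totals here; the semi-tight version simply lets the database engine do the aggregation instead of moving every row into the mining program.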

9. Discuss how data mining is applied in healthcare settings.

• Disease Prediction and Prevention: Analyzing patient data to predict diseases, allowing for
early intervention and prevention strategies.
• Drug Discovery: Analyzing biological data to identify potential drug compounds and accelerate
drug discovery processes.
• Healthcare Fraud Detection: Identifying fraudulent claims and activities in healthcare
insurance and billing.

10. Describe the role of data mining in detecting financial fraud.

• Fraud Detection: Identifying unusual patterns in transactions to detect credit card fraud, identity
theft, etc.
• Credit Scoring: Assessing the creditworthiness of applicants based on historical financial data.
• Algorithmic Trading: Analyzing historical data to develop trading strategies and predict market
trends.
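
Identifying unusual transactions can be sketched as a simple outlier check. The z-score rule, the threshold of 2.0, and the sample amounts are assumptions made for illustration; real fraud detection systems use far richer features and models.

```python
from statistics import mean, stdev


def flag_unusual(amounts, threshold=2.0):
    """Return the amounts that lie more than `threshold` sample standard
    deviations away from the mean."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical card transactions for one customer
amounts = [25.0, 30.0, 27.5, 22.0, 31.0, 28.0, 26.0, 950.0]
suspicious = flag_unusual(amounts)
```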
