Knowledge Discovery in Databases (KDD) Lect 4

Uploaded by

asiimwemartinkab


Knowledge Discovery in Databases (KDD)
• Some people treat data mining as a synonym for
knowledge discovery, while others view data mining
as one essential step in the knowledge discovery
process.
• The steps involved in the knowledge discovery
process are:

• Data Cleaning - In this step, noise and inconsistent data are
removed.
• Data Integration - In this step, multiple data sources are
combined.
• Data Selection - In this step, data relevant to the analysis task
are retrieved from the database.
• Data Transformation - In this step, data are transformed or
consolidated into forms appropriate for mining by performing
summary or aggregation operations.
• Data Mining - In this step, intelligent methods are applied in
order to extract data patterns.
• Pattern Evaluation - In this step, data patterns are evaluated.
• Knowledge Presentation - In this step, the discovered knowledge
is presented to the user.
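
The steps listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, field names, and helper functions (clean, integrate, select, transform, mine, evaluate) are hypothetical, and the "mining" step is a toy frequency count standing in for a real mining method.

```python
from statistics import mean

# Hypothetical raw records from two sources (illustrative data only).
store_a = [
    {"customer": "c1", "item": "tea", "amount": 4.0},
    {"customer": "c2", "item": "tea", "amount": None},   # noisy: missing amount
    {"customer": "c3", "item": "milk", "amount": 2.5},
]
store_b = [
    {"customer": "c4", "item": "tea", "amount": 5.0},
    {"customer": "c5", "item": "milk", "amount": -1.0},  # inconsistent: negative
    {"customer": "c6", "item": "tea", "amount": 3.0},
]

def clean(rows):
    """Data cleaning: drop rows with missing or impossible amounts."""
    return [r for r in rows if r["amount"] is not None and r["amount"] >= 0]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [row for source in sources for row in source]

def select(rows, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    """Data transformation: aggregate amounts per item."""
    grouped = {}
    for r in rows:
        grouped.setdefault(r["item"], []).append(r["amount"])
    return {item: {"count": len(v), "mean": mean(v)} for item, v in grouped.items()}

def mine(summary):
    """Data mining (toy stand-in): find the most frequently bought item."""
    return max(summary, key=lambda item: summary[item]["count"])

def evaluate(summary, pattern, min_count=2):
    """Pattern evaluation: keep the pattern only if it has enough support."""
    return summary[pattern]["count"] >= min_count

rows = select(integrate(clean(store_a), clean(store_b)), ["item", "amount"])
summary = transform(rows)
pattern = mine(summary)
if evaluate(summary, pattern):
    # Knowledge presentation: report the finding in readable form.
    print(f"Most purchased item: {pattern} (mean amount {summary[pattern]['mean']:.2f})")
```

Each function corresponds to one step, so the order of the calls mirrors the order of the list.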
Data Warehouse:
• A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of management's
decision making process.
• Subject-Oriented: A data warehouse can be
used to analyze a particular subject area. For
example, "sales" can be a particular subject.
• Integrated: A data warehouse integrates data
from multiple data sources. For example,
source A and source B may have different
ways of identifying a product, but in a data
warehouse, there will be only a single way of
identifying a product.
• Time-Variant: Historical data is kept in a data warehouse.
For example, one can retrieve data from 3 months, 6
months, 12 months, or even older data from a data
warehouse. This contrasts with a transaction system,
where often only the most recent data is kept. For
example, a transaction system may hold only the most
recent address of a customer, whereas a data warehouse
can hold all addresses associated with a customer.
• Non-volatile: Once data is in the data warehouse, it will
not change. So, historical data in a data warehouse
should never be altered.
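
The customer address example can be sketched as follows to contrast the two behaviours. The class names, method names, and sample addresses are hypothetical; the sketch assumes the warehouse is modelled as an append-only list of dated rows.

```python
from datetime import date

class TransactionSystem:
    """Keeps only the most recent address per customer (overwrites in place)."""

    def __init__(self):
        self.current = {}

    def update_address(self, customer_id, address):
        self.current[customer_id] = address


class WarehouseAddressHistory:
    """Append-only history: rows are added with a date and never altered."""

    def __init__(self):
        self._rows = []  # (customer_id, address, as_of)

    def load(self, customer_id, address, as_of):
        self._rows.append((customer_id, address, as_of))

    def addresses(self, customer_id):
        """All addresses ever recorded for the customer (time-variant)."""
        return [(addr, d) for cid, addr, d in self._rows if cid == customer_id]

    def address_as_of(self, customer_id, when):
        """The address that was valid on a given date, or None."""
        known = [(d, addr) for cid, addr, d in self._rows
                 if cid == customer_id and d <= when]
        return max(known)[1] if known else None


oltp = TransactionSystem()
dw = WarehouseAddressHistory()
for address, as_of in [("12 Hill Street", date(2023, 1, 10)),
                       ("9 Lake Road", date(2024, 3, 2))]:
    oltp.update_address("c1", address)
    dw.load("c1", address, as_of)

print(oltp.current["c1"])                       # only the latest address survives
print(dw.address_as_of("c1", date(2023, 6, 1)))  # the older address is still retrievable
```

The warehouse class deliberately offers no update or delete method, which is one simple way to express the non-volatile property.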
Data Warehouse Design Process:
• A data warehouse can be built using a top-
down approach, a bottom-up approach, or a
combination of both.
• The top-down approach starts with the overall design
and planning. It is useful in cases where the technology
is mature and well known, and where the business
problems that must be solved are clear and well
understood.
• The bottom-up approach starts with experiments and
prototypes. This is useful in the early stage of business
modeling and technology development.
• It allows an organization to move forward at
considerably less expense and to evaluate the benefits
of the technology before making significant
commitments.
• In the combined approach, an organization
can exploit the planned and strategic nature of
the top-down approach while retaining the
rapid implementation and opportunistic
application of the bottom-up approach.
• The warehouse design process consists of the
following steps:
• Choose a business process to model, for example,
orders, invoices, shipments, inventory, account
administration, sales, or the general ledger. If the
business process is organizational and involves
multiple complex object collections, a data
warehouse model should be followed.
• However, if the process is departmental and
focuses on the analysis of one kind of business
process, a data mart model should be chosen.
• Choose the grain of the business process. The grain is
the fundamental, atomic level of data to be
represented in the fact table for this process, for
example, individual transactions, individual daily
snapshots, and so on.
• Choose the dimensions that will apply to each fact
table record. Typical dimensions are time, item,
customer, supplier, warehouse, transaction type, and
status.
• Choose the measures that will populate each fact table
record. Typical measures are numeric additive
quantities like dollars sold and units sold.
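
The four design steps can be illustrated with a tiny star schema. This is a minimal sketch using Python's built-in sqlite3 module; the table names (dim_time, dim_item, fact_sales), the columns, and the sample rows are assumptions made for illustration. The business process is sales, the grain is one row per item per day, the dimensions are time and item, and the measures are dollars sold and units sold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, name TEXT);
-- Fact table: one row per item per day (the chosen grain).
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    item_id INTEGER REFERENCES dim_item(item_id),
    dollars_sold REAL,      -- additive measure
    units_sold INTEGER      -- additive measure
);
""")
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [
    (1, "2024-01-05", "2024-01"),
    (2, "2024-01-20", "2024-01"),
    (3, "2024-02-03", "2024-02"),
])
conn.executemany("INSERT INTO dim_item VALUES (?, ?)", [(1, "tea"), (2, "milk")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 1, 8.0, 2),
    (2, 1, 4.0, 1),
    (2, 2, 5.0, 2),
    (3, 1, 12.0, 3),
])

# Because the measures are additive, they can be summed along any dimension.
monthly = conn.execute("""
    SELECT t.month, i.name, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_item i ON i.item_id = f.item_id
    GROUP BY t.month, i.name
    ORDER BY t.month, i.name
""").fetchall()

for row in monthly:
    print(row)
```

Choosing a finer grain (for example, individual transactions) would add rows to the fact table but leave the dimension tables unchanged.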
Data Warehouse Architecture:
Data Warehouses usually have a three-level
(tier) architecture that includes:
1. Bottom Tier (Data Warehouse Server)
2. Middle Tier (OLAP Server)
3. Top Tier (Front end Tools)
A bottom-tier
• The bottom tier consists of the data warehouse
server, which is almost always an RDBMS. It may
include several specialized data marts and a
metadata repository.
• Data from operational databases and external
sources (such as user profile data provided by
external consultants) are extracted using application
program interfaces known as gateways. A gateway is
provided by the underlying DBMS and allows
client programs to generate SQL code to be
executed at a server.
A middle-tier
• The middle tier consists of an OLAP server for fast
querying of the data warehouse.
• The OLAP server is implemented using either
• (1) A Relational OLAP (ROLAP) model, i.e., an extended
relational DBMS that maps operations on multidimensional
data to standard relational operations, or
• (2) A Multidimensional OLAP (MOLAP) model, i.e., a
special-purpose server that directly implements
multidimensional data and operations.
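
The ROLAP idea of mapping multidimensional operations to relational ones can be sketched with sqlite3. The sales table, its columns, the sample rows, and the function names roll_up and slice_cube are hypothetical; this is an illustration of the mapping, not how any particular OLAP server is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, country TEXT, quarter TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Kampala", "Uganda", "Q1", 10),
    ("Kabale", "Uganda", "Q1", 4),
    ("Kampala", "Uganda", "Q2", 7),
    ("Nairobi", "Kenya", "Q1", 6),
])

ALLOWED_LEVELS = {"city", "country", "quarter"}

def roll_up(conn, level):
    """Roll-up: aggregate units to the given level using GROUP BY."""
    if level not in ALLOWED_LEVELS:  # guard the column name before formatting SQL
        raise ValueError(f"unknown level: {level}")
    sql = f"SELECT {level}, SUM(units) FROM sales GROUP BY {level} ORDER BY {level}"
    return conn.execute(sql).fetchall()

def slice_cube(conn, quarter):
    """Slice: fix one value of the quarter dimension using WHERE."""
    sql = ("SELECT city, SUM(units) FROM sales "
           "WHERE quarter = ? GROUP BY city ORDER BY city")
    return conn.execute(sql, (quarter,)).fetchall()

by_country = roll_up(conn, "country")
q1_by_city = slice_cube(conn, "Q1")
print(by_country)
print(q1_by_city)
```

A MOLAP server would instead hold the data in a multidimensional array structure and answer the same questions without translating them to SQL.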
A top-tier
• The top tier contains front-end tools for displaying
results provided by OLAP, as well as additional tools for data
mining of the OLAP-generated data.
• The metadata repository stores information that
defines DW objects. It includes the following
parameters and information for the middle and the
top-tier applications:
• A description of the DW structure, including the
warehouse schema, dimension, hierarchies, data
mart locations, and contents, etc.
• Operational metadata, which usually describes the
currency of the stored data (i.e., active, archived,
or purged) and warehouse monitoring information
(i.e., usage statistics, error reports, audit trails, etc.).
• System performance data, which includes indices
used to improve data access and retrieval
performance.
• Information about the mapping from operational
databases, which covers the source RDBMSs and
their contents, cleaning and transformation rules,
etc.
• Summarization algorithms, predefined queries, and
reports.
• Business data, which includes business
terms and definitions, ownership information, etc.
What is an Operational Data Store (ODS)?

• An ODS has been described by Inmon and Imhoff
(1996) as a subject-oriented, integrated, volatile,
current-valued data store containing only detailed
corporate data.
• A data warehouse, by contrast, is a reporting
database that contains relatively recent as well as
historical information and may also include
aggregate data.
• The ODS is subject-oriented. It is organized
around the significant information subjects of an
enterprise. In a university, the subjects may be
students, lecturers and courses, while in a
company the subjects might be customers,
salespersons and products.
• The ODS is integrated. That is, it is a collection of
subject-oriented records from a variety of systems
that provides an enterprise-wide view of the
information.
• The ODS is current-valued. That is, an ODS is
up to date and reflects the current status of the
data.
• An ODS does not contain historical
information.
• Since the OLTP system data is changing all the
time, data from the underlying sources refresh the
ODS as regularly and frequently as possible.
• The ODS is volatile. That is, the data in the
ODS changes frequently as new data refreshes
the ODS.
• The ODS is detailed. That is, the ODS is detailed
enough to serve the needs of the operational
management staff in the enterprise. The
granularity of the information in the ODS does
not have to be precisely the same as in the
source OLTP system.
ODS Design and Implementation:
• The extraction of data from source databases needs
to be efficient, and the quality of records needs to
be maintained.
• Since the data is refreshed regularly and frequently,
suitable checks are required to ensure the quality of
data after each refresh.
• An ODS is a read-only database apart from the regular
refreshes from the OLTP systems. Users should
not be allowed to update ODS information.
• Populating an ODS involves an acquisition
phase of extracting, transforming and loading
information from OLTP source systems.
• This procedure is known as ETL.
• Once the database has been populated, checking
for anomalies and testing performance are
essential before an ODS system can go online.
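
The refresh cycle described above (extract, transform, load, then check quality) can be sketched as follows. The record layout, the two quality rules, and the function names are assumptions for illustration; the ODS is modelled as a plain dictionary that is replaced only when a batch passes the checks.

```python
def extract(oltp_rows):
    """Extract: read the current rows from the (hypothetical) OLTP source."""
    return list(oltp_rows)

def transform(rows):
    """Transform: rename keys and normalise text fields."""
    return [
        {
            "student_id": r["id"],
            "name": r["name"].strip().title(),
            "course": r["course"].upper(),
        }
        for r in rows
    ]

def quality_check(rows):
    """Checks run on every refresh: no missing names, no duplicate ids."""
    problems = []
    seen = set()
    for r in rows:
        if not r["name"]:
            problems.append(f"missing name for {r['student_id']}")
        if r["student_id"] in seen:
            problems.append(f"duplicate id {r['student_id']}")
        seen.add(r["student_id"])
    return problems

def refresh_ods(ods, oltp_rows):
    """Load: replace the ODS contents only if the new batch passes the checks."""
    batch = transform(extract(oltp_rows))
    problems = quality_check(batch)
    if problems:
        return problems  # keep the previous ODS state untouched
    ods.clear()
    ods.update({r["student_id"]: r for r in batch})
    return []

ods = {}
first = refresh_ods(ods, [
    {"id": 1, "name": " alice ", "course": "kdd"},
    {"id": 2, "name": "bob", "course": "dbms"},
])
second = refresh_ods(ods, [
    {"id": 1, "name": "alice", "course": "kdd"},
    {"id": 1, "name": "alice", "course": "kdd"},  # bad batch: duplicate id
])
print(first, second, sorted(ods))
```

In this sketch the check runs before the load, so a bad batch never reaches the ODS; checking after the load, as the bullet above describes, would additionally need a way to roll back.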
Difference between Operational Data Stores and Data Warehouse

| Operational Data Store | Data Warehouse |
|---|---|
| An ODS is meant for operational reporting and supports current or near real-time reporting requirements. | A data warehouse is intended for historical and trend analysis, usually reporting on a large volume of data. |
| An ODS consists of only a short window of data. | A data warehouse includes the entire history of data. |
| It typically holds detailed data only. | It contains summarized and detailed data. |
| It is used for detailed decision making and operational reporting. | It is used for long-term decision making and management reporting. |
| It is used at the operational level. | It is used at the managerial level. |
| It serves as a conduit for data between operational and analytics systems. | It serves as a repository for cleansed and consolidated data sets. |
| It is updated often, as the transaction systems generate new data. | It is usually updated in batch processing mode on a set schedule. |
