Unit - 4 Final

A data warehouse is a relational database designed for analysis rather than transactions. It contains historical data from multiple sources integrated into a single view. Data warehouses use a multidimensional model to organize data into facts and dimensions for analysis. Common operations on this model include roll-ups to aggregate data, drill-downs for finer detail, slicing to filter dimensions, dicing to filter multiple dimensions, and pivoting to change the data view. This multidimensional structure allows for flexible analysis of business trends over time.

Uploaded by

SHREYA M PSGRKCW


Data Mining Techniques

AP20C12
UNIT IV:

DATA WAREHOUSING: Introduction: What is a data warehouse?-Definition.

Multidimensional Data model-OLAP Operations-Warehouse Schema- Data
Warehousing Architecture- Warehouse Server-Metadata- OLAP Engine- Data
Warehouse Backend Process. Other Features.
What is a Data Warehouse?
• A Data Warehouse (DW) is a relational database that is designed for query and
analysis rather than transaction processing. It includes historical data derived
from transaction data from single and multiple sources
• A Data Warehouse provides integrated, enterprise-wide, historical data and
focuses on providing support for decision-makers for data modeling and analysis
• A Data Warehouse is a group of data specific to the entire organization, not only
to a particular group of users
• It is not used for daily operations and transaction processing but used for
making decisions
• A Data Warehouse can be viewed as a data system with the following attributes:

• It is a database designed for investigative tasks, using data from various applications

• It supports a relatively small number of clients with relatively long interactions

• It includes current and historical data to provide a historical perspective of information

• Its usage is read-intensive

• It contains a few large tables

• A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile store of information in
support of management's decisions
Characteristics and Definition of Data Warehouse:
Subject-Oriented:
• A data warehouse targets the modeling and analysis of data for decision-makers
• Therefore, data warehouses typically provide a concise and straightforward view around a
particular subject, such as customer, product, or sales, instead of the global organization's
ongoing operations
• This is done by excluding data that are not useful concerning the subject and including all data
needed by the users to understand the subject
Integrated:
• A data warehouse integrates various heterogeneous data sources such as RDBMSs, flat files, and online
transaction records
• Data cleaning and integration are performed during data warehousing to ensure
consistency in naming conventions, attribute types, etc., among the different data sources
Time-Variant
• Historical information is kept in a data warehouse
• For example, one can retrieve data from 3 months, 6 months, or 12 months ago, or even older data,
from a data warehouse
• This contrasts with a transaction system, where often only the most current data is kept
Non-Volatile
• The data warehouse is a physically separate data store, populated with data transformed from the
source operational RDBMS
• Operational updates do not occur in the data warehouse, i.e., individual update, insert, and
delete operations are not performed on it
• It usually requires only two procedures for data access: initial loading of data and access to
data
• Therefore, the DW does not require transaction processing, recovery, and concurrency control
capabilities, which allows for a substantial speedup of data retrieval. Non-volatile means that
once data has entered the warehouse, it should not change
MultiDimensional Data Model
• The multidimensional data model is a method for organizing data in the database so that its
contents are well arranged and assembled
• It lets users ask analytical questions about market or business trends, whereas a relational
database lets users access data in the form of queries
• Because the data can be created and examined comparatively quickly, users receive answers to
their requests rapidly
• OLAP (online analytical processing) and data warehousing use multidimensional databases

• It is used to show multiple dimensions of the data to users

• It represents data in the form of data cubes

• Data cubes allow data to be modeled and viewed from many dimensions and perspectives

• A data cube is defined by dimensions and facts and is represented by a fact table

• Facts are numerical measures; the fact table contains the names of the facts (the measures) and
keys to each of the related dimension tables
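To make the terms concrete, here is a minimal sketch in Python of a fact table and one dimension table. The table contents, the key names, and the `total_by_item_type` helper are hypothetical and chosen only for illustration.

```python
# Dimension table: descriptive attributes keyed by a surrogate key (hypothetical data).
item_dim = {
    1: {"item_name": "Laptop", "type": "computer"},
    2: {"item_name": "Smartphone", "type": "phone"},
}

# Fact table: each row holds dimension keys plus numerical measures.
sales_fact = [
    {"item_key": 1, "year": 1997, "units_sold": 10, "amount": 500.0},
    {"item_key": 2, "year": 1997, "units_sold": 20, "amount": 400.0},
    {"item_key": 1, "year": 1998, "units_sold": 5, "amount": 250.0},
]


def total_by_item_type(facts, items, measure):
    """Sum one measure of the fact table, grouped by the item's type attribute."""
    totals = {}
    for row in facts:
        item_type = items[row["item_key"]]["type"]
        totals[item_type] = totals.get(item_type, 0) + row[measure]
    return totals


units_by_type = total_by_item_type(sales_fact, item_dim, "units_sold")
```

The fact rows carry only keys and numbers; the descriptive text lives in the dimension table and is looked up through the key.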
Working on a Multidimensional Data Model
The multidimensional data model is built by following a fixed sequence of steps
Every project that builds a multidimensional data model should go through the following stages:
Stage 1 : Assembling data from the client : In the first stage, correct data is collected from the
client. Usually, software professionals explain to the client the range of data that can be obtained
with the selected technology and then collect the complete data in detail
Stage 2 : Grouping different segments of the system : In the second stage, all the data is recognized
and classified into the sections it belongs to, which also makes the model easier to apply step by
step
Stage 3 : Noticing the different proportions : The third stage is the basis on which the design
of the system rests. In this stage, the main factors are recognized from the user's point
of view. These factors are also known as "Dimensions"
Stage 4 : Preparing the actual-time factors and their respective qualities : In the fourth stage, the
factors recognized in the previous stage are used to identify their related
qualities. These qualities are also known as "attributes" in the database
Stage 5 : Finding the actuality of factors which are listed previously and their qualities : In the fifth
stage, the facts are separated and differentiated from the factors
collected so far. These facts play a significant role in the arrangement of a multidimensional
data model
Stage 6 : Building the Schema to place the data, with respect to the information collected from the
steps above : In the sixth stage, a schema is built on the basis of the data collected in the
previous stages
Data Cube:
• A grouping of data in a multidimensional matrix is called a data cube
• In data warehousing, we generally deal with multidimensional data models, as the
data is represented by multiple dimensions and multiple attributes
• This multidimensional data is represented in a data cube, as the cube represents a
high-dimensional space
• The data cube pictorially shows how different attributes of data are arranged in the data model.
Below is the diagram of a general data cube

The example above is a 3D cube having attributes like branch (A, B, C, D), item type (home,
entertainment, computer, phone, security), and year (1997, 1998, 1999)
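As a rough sketch, the branch, item type, and year cube described here can be held in Python as a dictionary with one entry per cell. The dimension values come from the example; the stored measure values are placeholders, not real data.

```python
from itertools import product

branches = ["A", "B", "C", "D"]
item_types = ["home", "entertainment", "computer", "phone", "security"]
years = [1997, 1998, 1999]

# Each cell of the cube is addressed by one value from every dimension.
# The measure stored here is a placeholder (0) rather than real sales data.
cube = {cell: 0 for cell in product(branches, item_types, years)}

# Record a hypothetical measure for one cell.
cube[("A", "computer", 1998)] = 120

cell_count = len(cube)  # 4 branches * 5 item types * 3 years
```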
Data cube classification:
• The data cube can be classified into two categories:
Multidimensional data cube: It stores large amounts of data by making use of a
multi-dimensional array
• It increases its efficiency by keeping an index of each dimension
• Thus, a multidimensional data cube is able to retrieve data fast
Relational data cube: It stores large amounts of data by making use of relational
tables
• Each relational table displays the dimensions of the data cube
• It is slower compared to a multidimensional data cube
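The difference between the two storage styles can be sketched as follows. This is only an illustration under simplifying assumptions: the data is made up, and the relational lookup is a plain row scan, whereas a real relational engine would normally use its own indexes.

```python
branches = ["A", "B"]
years = [1997, 1998]

# Multidimensional storage: a nested array plus one index per dimension
# that maps a dimension value to its position in the array.
branch_index = {b: i for i, b in enumerate(branches)}
year_index = {y: i for i, y in enumerate(years)}
array_cube = [
    [10, 20],  # branch A: 1997, 1998
    [30, 40],  # branch B: 1997, 1998
]


def lookup_array(branch, year):
    """Direct positional access through the dimension indexes."""
    return array_cube[branch_index[branch]][year_index[year]]


# Relational storage: the same cells as rows of (branch, year, sales).
relational_cube = [
    ("A", 1997, 10), ("A", 1998, 20),
    ("B", 1997, 30), ("B", 1998, 40),
]


def lookup_relational(branch, year):
    """Scan the rows for a match, as an unindexed table would."""
    for b, y, sales in relational_cube:
        if b == branch and y == year:
            return sales
    return None
```

Both lookups return the same cell; the array version reaches it in a fixed number of steps, while the scan grows with the number of rows.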
Data Cube Operations:

Data cube operations are used to manipulate data to meet the needs of users
These operations help to select particular data for the analysis purpose
Roll-up: this operation aggregates similar data attributes that belong to the same dimension
together

• For example, if the data cube displays the daily income of a customer, we can use a roll-up
operation to find the customer's monthly income
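A minimal sketch of the daily-to-monthly roll-up, assuming the daily income is keyed by ISO date strings (the figures and the `roll_up_to_month` name are hypothetical):

```python
from collections import defaultdict

# Hypothetical daily income facts keyed by ISO date strings.
daily_income = {
    "2023-01-01": 100,
    "2023-01-02": 150,
    "2023-02-01": 200,
}


def roll_up_to_month(daily):
    """Aggregate daily values up the time hierarchy (day -> month)."""
    monthly = defaultdict(int)
    for day, amount in daily.items():
        month = day[:7]  # "YYYY-MM" prefix of the date
        monthly[month] += amount
    return dict(monthly)


monthly_income = roll_up_to_month(daily_income)
```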

Drill-down: this operation is the reverse of the roll-up operation

• It allows us to take particular information and then subdivide it further for finer
granularity analysis

• It zooms into more detail

• For example, if India is a value in the country column and we wish to see villages in
India, then the drill-down operation splits India into states, districts, towns, cities, and villages
and then displays the required information
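Drill-down can be sketched as moving from a coarse total to the finer level stored underneath it. The example below uses only two levels (country and region) and made-up numbers; the function names are hypothetical.

```python
# Hypothetical facts stored at the finest available level: (country, region, value).
facts = [
    ("India", "Kerala", 30),
    ("India", "Punjab", 25),
    ("Jamaica", "Kingston", 5),
]


def total_for_country(rows, country):
    """Coarse view: one aggregated number per country."""
    return sum(value for c, _, value in rows if c == country)


def drill_down_to_regions(rows, country):
    """Finer view: split the country's total into its regions."""
    detail = {}
    for c, region, value in rows:
        if c == country:
            detail[region] = detail.get(region, 0) + value
    return detail


india_total = total_for_country(facts, "India")
india_regions = drill_down_to_regions(facts, "India")
```

The finer view always sums back to the coarse total, which is what makes drill-down the reverse of roll-up.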
Slicing: This operation filters out the unnecessary portions
• Suppose that in a particular dimension the user doesn't need everything for analysis, but rather a
particular attribute value
• For example, country="jamaica" will display data only about Jamaica and will not display the other
countries present on the country list.
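A slice can be sketched as fixing one dimension to a single value. The cube contents and the `slice_cube` helper below are hypothetical.

```python
# Hypothetical cube cells keyed by (country, year) with a sales measure.
cube = {
    ("Jamaica", 2020): 7,
    ("Jamaica", 2021): 9,
    ("India", 2020): 40,
}


def slice_cube(cells, country):
    """Fix the country dimension to one value and drop it from the key."""
    return {year: value for (c, year), value in cells.items() if c == country}


jamaica_slice = slice_cube(cube, "Jamaica")
```

The result has one dimension fewer than the original cube, because the country is now fixed.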
Dicing: This operation performs a multidimensional cut: it not only cuts one dimension but
also goes to another dimension and cuts a certain range of it
• For example, the user wants to see the annual salary of Jharkhand state employees, which
selects on both the state dimension and the time dimension
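A dice selects on two or more dimensions at once and returns a smaller cube. The sketch below assumes salary facts keyed by (state, year, month); all values and the `dice` helper are hypothetical.

```python
# Hypothetical salary facts keyed by (state, year, month).
cube = {
    ("Jharkhand", 2022, 1): 50,
    ("Jharkhand", 2022, 2): 55,
    ("Jharkhand", 2023, 1): 60,
    ("Bihar", 2022, 1): 45,
}


def dice(cells, states, years):
    """Keep only cells whose state and year both fall in the chosen sets."""
    return {
        key: value
        for key, value in cells.items()
        if key[0] in states and key[1] in years
    }


sub_cube = dice(cube, states={"Jharkhand"}, years={2022})
annual_salary = sum(sub_cube.values())
```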
Pivot: This operation matters mainly for how the data is viewed
• It rotates the data cube to present a different view
• It doesn't change the data present in the data cube
• For example, if the user is comparing year versus branch, using the pivot operation, the user
can change the viewpoint and now compare branch versus item type
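In its simplest form, a pivot swaps the axes of a two-dimensional view. The sketch below rotates a hypothetical year-by-branch table into branch-by-year; swapping in a different dimension such as item type would work the same way on a larger cube.

```python
# Hypothetical 2-D view of a cube: rows are years, columns are branches.
view = {
    1997: {"A": 10, "B": 20},
    1998: {"A": 30, "B": 40},
}


def pivot(table):
    """Swap the row and column axes without changing any cell value."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_branch = pivot(view)
```

Pivoting twice returns the original view, which shows that only the presentation changes and not the data.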
Advantages of data cubes:

• Helps in giving a summarized view of data

• Data cubes store large data in a simple way

• Data cube operations provide quick and better analysis

• Improve the performance of data retrieval

For Example :
1. Let us take the example of a firm
• The revenue cost of a firm can be recognized on the basis of different factors such as the
geographical location of the firm's workplace, the products of the firm, the advertisements done, the
time taken for a product to flourish, etc.
2. Let us take the example of the data of a factory which sells products per quarter in Bangalore
• The data is represented in the table given below :

• In the above presentation, the factory's sales for Bangalore are shown along the time dimension,
which is organized into quarters, and the item dimension, which is sorted according to the
kind of item sold
• The facts here are represented in rupees (in thousands)
• Now, if we wish to view the sales data with a third dimension, it is
represented as shown below
• Here the three-dimensional sales data is represented as a series of two-dimensional tables
• Let us consider the data according to item, time and location (like Kolkata, Delhi, Mumbai).
Here is the table :

This data can be represented conceptually in the form of three dimensions, which is shown
in the image below :
Data Warehouse – Schemas :
• A schema is a logical description of a database in which fact and dimension tables are
joined in a logical manner
• A data warehouse is maintained in the form of a Star, Snowflake, or Fact Constellation schema

Star Schema
• A Star schema contains a fact table and multiple dimension tables
• Each dimension is represented by only one dimension table, and the dimension tables are not normalized
• The Dimension table contains a set of attributes
Characteristics

• In a Star schema, there is only one fact table and multiple dimension tables

• In a Star schema, each dimension is represented by one dimension table

• Dimension tables are not normalized in a Star schema

• Each Dimension table is joined to a key in a fact table

The following illustration shows the sales data of a company with respect to the four dimensions,
namely Time, Item, Branch, and Location

• There is a fact table at the center. It contains the keys to each of the four dimensions
• The fact table also contains the measures, namely dollars sold and units sold
• Note − Each dimension has only one dimension table and each table holds a set of attributes.
For example, the location dimension table contains the attribute set {location_key, street, city,
province_or_state, country}. This constraint may cause data redundancy, since the same
province_or_state and country values are repeated for every city in that province.
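A cut-down version of this star schema can be sketched with Python's built-in `sqlite3` module. Only the time and location dimensions are included to keep the example short; the column names follow the attributes named above, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           province_or_state TEXT, country TEXT);
-- The fact table holds one key per dimension plus the measures.
CREATE TABLE sales_fact   (time_key INTEGER REFERENCES time_dim(time_key),
                           location_key INTEGER REFERENCES location_dim(location_key),
                           dollars_sold REAL, units_sold INTEGER);
INSERT INTO time_dim VALUES (1, 1997, 'Q1'), (2, 1997, 'Q2');
INSERT INTO location_dim VALUES (1, '1 Main St', 'Vancouver', 'BC', 'Canada'),
                                (2, '2 Lake Rd', 'Victoria', 'BC', 'Canada');
INSERT INTO sales_fact VALUES (1, 1, 100.0, 4), (2, 1, 150.0, 6), (1, 2, 50.0, 2);
""")

# One join per dimension is enough to reach any descriptive attribute.
star_rows = conn.execute("""
    SELECT l.city, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
```

Note how 'BC' and 'Canada' are stored once per city in `location_dim`; this repetition is the redundancy that an unnormalized dimension table can cause.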
Snowflake Schema
• Some dimension tables in the Snowflake schema are normalized
• The normalization splits up the data into additional tables as shown in the following illustration
• Unlike in the Star schema, the dimension tables in a Snowflake schema are normalized
• For example − The item dimension table of the star schema is normalized and split into two
dimension tables, namely the item and supplier tables
• Now the item dimension table contains the attributes item_key, item_name, type, brand, and
supplier_key
• The supplier key is linked to the supplier dimension table
• The supplier dimension table contains the attributes supplier_key and supplier_type
• Note − Due to the normalization in the Snowflake schema, redundancy is reduced;
therefore, the schema becomes easier to maintain and saves storage space
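The item and supplier split can be sketched with `sqlite3` as below. The table and column names follow the attributes listed above; the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier details are stored once, in their own normalized table.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT,
                           brand TEXT,
                           supplier_key INTEGER REFERENCES supplier_dim(supplier_key));
INSERT INTO supplier_dim VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item_dim VALUES (10, 'Laptop', 'computer', 'BrandX', 1),
                            (11, 'Phone',  'phone',    'BrandY', 1),
                            (12, 'Camera', 'security', 'BrandZ', 2);
""")

# Reaching supplier_type now needs an extra join through supplier_key.
items_with_supplier = conn.execute("""
    SELECT i.item_name, s.supplier_type
    FROM item_dim AS i
    JOIN supplier_dim AS s ON s.supplier_key = i.supplier_key
    ORDER BY i.item_key
""").fetchall()
```

The trade-off is visible here: 'wholesale' is stored only once, but every query that needs it pays for one more join than in the star schema.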
Fact Constellation Schema (Galaxy Schema)
• A fact constellation has multiple fact tables
• It is also known as a Galaxy Schema
• The following illustration shows two fact tables, namely Sales and Shipping
• The sales fact table is the same as that in the Star Schema
• The shipping fact table has five dimensions, namely item_key, time_key, shipper_key,
from_location, to_location
• The shipping fact table also contains two measures, namely dollars cost and units shipped
• It is also possible to share dimension tables between fact tables
• For example − Time, item, and location dimension tables are shared between the sales and
shipping fact tables
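The idea of two fact tables sharing a dimension can be sketched as follows. Only the item dimension is modeled, the rows are made up, and the measure names `dollars_cost` and `units_shipped` on the shipping table are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim      (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact    (item_key INTEGER, dollars_sold REAL, units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER, dollars_cost REAL, units_shipped INTEGER);
INSERT INTO item_dim VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO sales_fact VALUES (1, 900.0, 3), (2, 400.0, 4);
INSERT INTO shipping_fact VALUES (1, 30.0, 3), (2, 20.0, 4);
""")


def totals_per_item(fact_table, measure):
    """Aggregate one measure of a fact table through the shared item dimension."""
    # Table and column names come from the fixed schema above, not from user input.
    query = (
        f"SELECT i.item_name, SUM(f.{measure}) FROM {fact_table} AS f "
        "JOIN item_dim AS i ON i.item_key = f.item_key "
        "GROUP BY i.item_name ORDER BY i.item_name"
    )
    return conn.execute(query).fetchall()


sold = totals_per_item("sales_fact", "units_sold")
shipped = totals_per_item("shipping_fact", "units_shipped")
```

Because both fact tables join to the same `item_dim`, their results line up item by item and can be compared directly.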
Data Warehouse Architecture:
• A data warehouse architecture defines the overall architecture of data
communication, processing, and presentation that exists for end-client computing within the
enterprise
• Each data warehouse is different, but all are characterized by standard vital components
• Production applications such as payroll, accounts payable, product purchasing, and inventory
control are designed for online transaction processing (OLTP)
• Such applications gather detailed data from day-to-day operations
• Data warehouse applications are designed to support the user's ad-hoc data
requirements, an activity dubbed online analytical processing (OLAP)

• These include applications such as forecasting, profiling, summary reporting,
and trend analysis

• A data warehouse is a heterogeneous collection of different data sources
organized under a unified schema

• There are 2 approaches for constructing data-warehouse:

• Top-down approach

• Bottom-up approach
The essential components of the top-down approach are discussed below:
1) External Sources – An external source is a source from which data is collected, irrespective of
the type of data
• Data can be structured, semi-structured, or unstructured
2) Stage Area – Since the data extracted from the external sources does not follow a particular
format, it needs to be validated before it is loaded into the data warehouse
• For this purpose, it is recommended to use an ETL tool
• E (Extract): Data is extracted from the external data source
• T (Transform): Data is transformed into the standard format
• L (Load): Data is loaded into the data warehouse after it has been transformed into the standard format
3) Data Warehouse – After cleansing, the data is stored in the data warehouse as the central
repository
• It stores the metadata together with the detailed, integrated data; the data marts hold
subject-specific subsets of that data
• Note that the data warehouse stores the data in its purest form in this top-down approach
4) Data Marts – A data mart is also a part of the storage component
• It stores the information of a particular function of an organisation, which is handled by a single
authority
• There can be as many data marts in an organisation as there are functions
• We can also say that a data mart contains a subset of the data stored in the data warehouse
5) Data Mining – The practice of analyzing the large volumes of data present in the data warehouse is data mining
• It is used to find the hidden patterns that are present in the database or in the data warehouse with
the help of data mining algorithms
• This approach is defined by Inmon as: the data warehouse is a central repository for the
complete organisation, and data marts are created from it after the complete data warehouse
has been created
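The extract, transform, and load steps of the stage area described above can be sketched in a few lines of Python. The two source layouts, the field names, and the in-memory list standing in for the warehouse are all hypothetical; a real ETL tool would add validation, logging, and error handling.

```python
# Hypothetical raw records from two external sources with inconsistent formats.
source_a = [{"Customer": " alice ", "amount": "10.50"}]
source_b = [{"cust_name": "BOB", "amt": 7}]


def extract():
    """Extract: pull raw records from every source, tagged with their origin."""
    return [("a", r) for r in source_a] + [("b", r) for r in source_b]


def transform(tagged_records):
    """Transform: map each source layout onto one standard format."""
    standard = []
    for origin, rec in tagged_records:
        if origin == "a":
            name, amount = rec["Customer"], rec["amount"]
        else:
            name, amount = rec["cust_name"], rec["amt"]
        standard.append({"customer": name.strip().title(), "amount": float(amount)})
    return standard


def load(records, warehouse):
    """Load: append the standardized records to the warehouse store."""
    warehouse.extend(records)
    return warehouse


warehouse = load(transform(extract()), [])
```

After the run, both records share the same field names, casing, and numeric type, regardless of which source they came from.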
Advantages of Top-Down Approach –
• Since the data marts are created from the data warehouse, it provides a consistent dimensional
view of the data marts
• Also, this model is considered the strongest model for handling business changes
• That is why big organizations prefer to follow this approach
• Creating a data mart from the data warehouse is easy
Disadvantages of Top-Down Approach –
• The cost and time taken in designing and maintaining it are very high

Bottom-Up Approach:
First, the data is extracted from external sources (as in the top-down approach)
Then, the data goes through the staging area (as explained above) and is loaded into data marts instead
of the data warehouse
• The data marts are created first and provide reporting capability. Each data mart addresses a
single business area
These data marts are then integrated into the data warehouse
• This approach is given by Kimball as: data marts are created first and provide a thin view
for analyses, and the data warehouse is created after the complete data marts have been created
Advantages of Bottom-Up Approach:
1. As the data marts are created first, reports are generated quickly
2. More data marts can be accommodated, and in this way the data warehouse can
be extended
3. Also, the cost and time taken in designing this model are comparatively low
Disadvantage of Bottom-Up Approach:
1. This model is not as strong as the top-down approach, because the dimensional view of the data marts is not
as consistent as it is in the top-down approach
Metadata :
• Metadata is simply defined as data about data

• The data that is used to represent other data is known as metadata

• For example, the index of a book serves as metadata for the contents of the book

• In other words, we can say that metadata is the summarized data that leads us to detailed data

In terms of data warehouse, we can define metadata as follows.

• Metadata is the road-map to a data warehouse

• Metadata in a data warehouse defines the warehouse objects and it acts as a directory

• This directory helps the decision support system to locate the contents of a data warehouse
Categories of Metadata
Metadata can be broadly categorized into three categories −
Business Metadata − It has the data ownership information, business definition, and changing
policies
Technical Metadata − It includes database system names, table and column names and sizes, data
types and allowed values. Technical metadata also includes structural information such as
primary and foreign key attributes and indices
Operational Metadata − It includes currency of data and data lineage. Currency of data means
whether the data is active, archived, or purged. Lineage of data means the history of data migrated
and transformation applied on it
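One way to picture the three categories is as fields of a single catalog entry. The sketch below is illustrative only: the `TableMetadata` class, its field names, and the sample values are hypothetical and do not correspond to any particular metadata tool.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    # Business metadata
    owner: str
    business_definition: str
    # Technical metadata
    table_name: str
    columns: dict            # column name -> data type
    primary_key: str
    # Operational metadata
    currency: str = "active"  # "active", "archived", or "purged"
    lineage: list = field(default_factory=list)  # ordered history of transformations


# A hypothetical catalog entry; every value here is illustrative.
sales_meta = TableMetadata(
    owner="Sales department",
    business_definition="One row per completed sale",
    table_name="sales_fact",
    columns={"time_key": "INTEGER", "units_sold": "INTEGER", "dollars_sold": "REAL"},
    primary_key="time_key",
    lineage=["extracted from order system", "currency converted to dollars"],
)


def is_queryable(meta):
    """Only data that is still active should be offered to query tools."""
    return meta.currency == "active"
```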
Role of Metadata
• Metadata has a very important role in a data warehouse
• The role of metadata in a warehouse is different from the warehouse data, yet it plays an
important role
The various roles of metadata are explained below.
• Metadata acts as a directory
• This directory helps the decision support system to locate the contents of the data warehouse
• Metadata helps the decision support system map data when it is transformed from the
operational environment to the data warehouse environment
• Metadata helps in summarization between current detailed data and highly summarized data
• Metadata also helps in summarization between lightly detailed data and highly summarized
data
• Metadata is used for query tools
• Metadata is used in extraction and cleansing tools
• Metadata is used in reporting tools
• Metadata is used in transformation tools
• Metadata plays an important role in loading functions
The following diagram shows the roles of metadata
Metadata Repository
Metadata repository is an integral part of a data warehouse system. It has the following metadata:
Definition of data warehouse − It includes the description of the structure of the data warehouse. The description is
defined by schema, view, hierarchies, derived data definitions, and data mart locations and contents
Business metadata − It contains the data ownership information, business definition, and changing
policies
Operational Metadata − It includes currency of data and data lineage. Currency of data means whether the
data is active, archived, or purged. Lineage of data means the history of data migrated and transformation
applied on it
Data for mapping from operational environment to data warehouse − It includes the source databases and
their contents, data extraction, data partitioning, cleaning, transformation rules, data refresh and purging rules
Algorithms for summarization − It includes dimension algorithms, data on granularity, aggregation,
summarizing, etc
Challenges for Metadata Management
• The importance of metadata cannot be overstated. Metadata helps in driving the accuracy of
reports, validates data transformation, and ensures the accuracy of calculations. Metadata also
enforces the definition of business terms to business end-users. With all these uses of
metadata, it also has its challenges. Some of the challenges are discussed below
• Metadata in a big organization is scattered across the organization. This metadata is spread in
spreadsheets, databases, and applications
• Metadata could be present in text files or multimedia files. To use this data for information
management solutions, it has to be correctly defined
• There are no industry-wide accepted standards. Data management solution vendors have a
narrow focus
• There are no easy and accepted methods of passing metadata
OLAP Hierarchical Structure / Types of OLAP
ROLAP
ROLAP works with data that exist in a relational database
Facts and dimension tables are stored as relational tables
It also allows multidimensional analysis of data and has been described as the fastest-growing type of OLAP
Advantages of ROLAP model:
High data efficiency. It offers high data efficiency because query performance and access language
are optimized particularly for the multidimensional data analysis
Scalability. This type of OLAP system offers scalability for managing large volumes of data, and
even when the data is steadily increasing
Drawbacks of ROLAP model:

• Demand for higher resources: ROLAP needs high utilization of manpower, software,
and hardware resources

• Aggregate data limitations: ROLAP tools use SQL for all calculation of aggregate data, so
computations that SQL handles poorly are difficult to perform

• Slow query performance: Query performance in this model is slow when compared
with MOLAP
Hybrid OLAP

• Hybrid OLAP is a mixture of both ROLAP and MOLAP

• It offers fast computation of MOLAP and higher scalability of ROLAP. HOLAP uses
two databases

• Aggregated or computed data is stored in a multidimensional OLAP cube

• Detailed information is stored in a relational database
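The split between the two stores can be sketched as follows. This is a toy illustration, not how any specific HOLAP product works: a list of rows stands in for the relational database, a dictionary stands in for the multidimensional cube, and the data is made up.

```python
# Detailed rows, standing in for the relational store (hypothetical data).
detail_rows = [
    ("A", 1997, 10),
    ("A", 1997, 15),
    ("B", 1997, 20),
]

# Pre-aggregated totals per (branch, year), standing in for the MOLAP cube.
summary_cube = {}
for branch, year, amount in detail_rows:
    summary_cube[(branch, year)] = summary_cube.get((branch, year), 0) + amount


def query(branch, year, detailed=False):
    """Answer summaries from the cube and detail requests from the rows."""
    if detailed:
        return [amt for b, y, amt in detail_rows if b == branch and y == year]
    return summary_cube.get((branch, year), 0)
```

Summary questions are answered with a single dictionary lookup, while requests for the underlying records fall back to scanning the detailed store.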

ROLAP vs. MOLAP:
The following arguments can be given in favour of MOLAP:

• Relational tables are unnatural for multidimensional data

• Multidimensional arrays provide efficiency in storage and operations

• There is a mismatch between multidimensional operations and SQL

• For ROLAP to achieve efficiency, it has to perform outside current relational
systems, which is the same as what MOLAP does

No ratings yet
Emtl Ii Ii R20
2 pages
Office Correspondence
No ratings yet
Office Correspondence
18 pages
7 TH
No ratings yet
7 TH
7 pages
(Cô Vũ Mai Phương) Bộ câu hỏi TỪ TRÁI NGHĨA hay và đặc sắc (P1) - FULL lời giải chi tiết
No ratings yet
(Cô Vũ Mai Phương) Bộ câu hỏi TỪ TRÁI NGHĨA hay và đặc sắc (P1) - FULL lời giải chi tiết
16 pages
Catechetics Reviewer
No ratings yet
Catechetics Reviewer
14 pages
PDF Ghostwriting Writing Handbooks Andrew Crofts download
100% (5)
PDF Ghostwriting Writing Handbooks Andrew Crofts download
82 pages
Summary of Module 2
No ratings yet
Summary of Module 2
4 pages


Data Mining Techniques

DATA WAREHOUSING: Introduction: What is a data warehouse?-Definition.

• It supports a relatively small number of clients with relatively long interactions

• It includes current and historical data to provide a historical perspective of information

• Its usage is read-intensive

• It contains a few large tables

• Data Warehouse is a subject-oriented, integrated, and time-variant store of information in

• It is used to show multiple dimensions of the data to users

• It represents data in the form of data cubes

• It is defined by dimensions and facts and is represented by a fact table

The example above is a 3D cube having attributes like branch (A, B, C, D), item type (home,

Drill-down: this operation is the reverse of the roll-up operation

• It zooms into more detail

• Helps in giving a summarized view of data

• Data cubes store large data in a simple way

• Data cube operations provide quicker and better analysis

• Improves performance
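
The roll-up and drill-down operations described above can be illustrated with a minimal sketch. The fact rows, the `quarter` dimension, and the `aggregate` helper are assumptions made for this example and do not come from the original notes. Roll-up is modelled here as dropping a dimension and summing the measure; drill-down restores the dimension to show finer detail.

```python
from collections import defaultdict

# Hypothetical fact rows: (branch, item_type, quarter, units_sold).
# The values are illustrative only.
FACTS = [
    ("A", "home", "Q1", 10),
    ("A", "home", "Q2", 15),
    ("A", "phone", "Q1", 7),
    ("B", "home", "Q1", 4),
    ("B", "phone", "Q2", 9),
]

DIMENSIONS = ("branch", "item_type", "quarter")


def aggregate(facts, keep):
    """Sum units_sold over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last column is the measure
    return dict(totals)


# Roll-up: remove the quarter dimension to get a summarized view.
rolled_up = aggregate(FACTS, ("branch", "item_type"))

# Drill-down: bring the quarter dimension back for more detail.
drilled_down = aggregate(FACTS, ("branch", "item_type", "quarter"))
```

Both views describe the same facts at different levels of detail, so their grand totals agree.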

This data can be represented in the form of

• In a Star schema, each dimension is represented by one dimension table

• Dimension tables are not normalized in a Star schema

• Each dimension table is joined to the fact table through a key
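
The Star schema points above can be sketched with Python's built-in `sqlite3` module. The table names (`dim_branch`, `dim_item`, `fact_sales`), the columns, and the sample rows are hypothetical and chosen only for illustration. Note that `dim_item` keeps `item_type` inline rather than in a separate table, which reflects the point that dimension tables are not normalized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension (hypothetical names and columns).
CREATE TABLE dim_branch (branch_id INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE dim_item (
    item_id   INTEGER PRIMARY KEY,
    item_name TEXT,
    item_type TEXT              -- kept inline: the dimension is denormalized
);

-- The fact table holds the measure plus one key per dimension.
CREATE TABLE fact_sales (
    branch_id  INTEGER REFERENCES dim_branch(branch_id),
    item_id    INTEGER REFERENCES dim_item(item_id),
    units_sold INTEGER
);

INSERT INTO dim_branch VALUES (1, 'A'), (2, 'B');
INSERT INTO dim_item VALUES (1, 'sofa', 'home'), (2, 'lamp', 'home'),
                            (3, 'handset', 'phone');
INSERT INTO fact_sales VALUES (1, 1, 3), (1, 2, 5), (2, 3, 2), (2, 1, 4);
""")

# A typical star join: the fact table joined to a dimension through its key.
rows = conn.execute("""
    SELECT i.item_type, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_id = f.item_id
    GROUP BY i.item_type
    ORDER BY i.item_type
""").fetchall()
```

Here `rows` holds the total units sold per item type, computed with a single join per dimension used in the query.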

• These include applications such as forecasting, profiling, summary reporting,

• A data warehouse is a heterogeneous collection of different data sources

• There are two approaches to constructing a data warehouse:

• The data that is used to represent other data is known as metadata

In the context of a data warehouse, metadata can be defined as follows.

• Metadata is the road-map to a data warehouse

• Demand for higher resources: ROLAP needs high utilization of manpower, software,

• However, there are no set limits on handling computations

• Slow query performance. Query performance in this model is slow when compared

• Hybrid OLAP is a mixture of both ROLAP and MOLAP

• Aggregated or computed data is stored in a multidimensional OLAP cube

• Detailed information is stored in a relational database
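
The division of labour in Hybrid OLAP can be sketched as follows. This is only an illustration under stated assumptions: a plain Python dictionary stands in for the multidimensional cube, an in-memory SQLite table stands in for the relational database, and the names `sales`, `cube`, `total_units`, and `detail_rows` are hypothetical.

```python
import sqlite3

# Relational side: detailed rows live in a table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (branch TEXT, quarter TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "Q1", 10), ("A", "Q2", 15), ("B", "Q1", 4)],
)

# Cube side: aggregates are computed once and kept in memory.
cube = dict(conn.execute("SELECT branch, SUM(units) FROM sales GROUP BY branch"))


def total_units(branch):
    """Summary query: answered from the precomputed aggregates."""
    return cube[branch]


def detail_rows(branch):
    """Detail query: answered from the relational table."""
    return conn.execute(
        "SELECT quarter, units FROM sales WHERE branch = ? ORDER BY quarter",
        (branch,),
    ).fetchall()
```

Summary questions are served without touching the detail table, while drill-through questions go to the relational store.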

• Relational tables are unnatural for multidimensional data

• Multidimensional arrays provide efficiency in storage and operations

• There is a mismatch between multidimensional operations and SQL

• For ROLAP to achieve efficiency, it has to perform outside current relational
