DWM Unit 1 (2023)

The document provides an overview of key concepts related to data warehousing and online analytical processing (OLAP). It discusses the components and architecture of a data warehouse, including source systems, data staging, the data warehouse layer containing data marts and a metadata repository, and analytical tools. It also covers OLAP concepts like MOLAP, ROLAP, and HOLAP systems as well as common OLAP operations like roll-up, drill-down, slice, dice, and pivot. Dimensional data models including star, snowflake, and fact constellation schemas are also summarized.

Uploaded by

UT DU
Copyright
© All Rights Reserved

Unit I

1.1 Data Warehousing – Data Warehousing Components,
Building a Data Warehouse, Data Warehouse Architecture,
DBMS Schemas for Decision Support, Data Extraction,
Clean up, and Transformation Tools, Metadata,
Reporting, Query Tools and Applications
1.2 Online Analytical Processing (OLAP) – OLAP
and OLTP, Concept Hierarchies, Characteristics of
OLAP Systems, Typical OLAP Operations,
Multidimensional Data Analysis.
Data Warehouse
• A data warehouse is built by combining data
from multiple diverse sources to support
analytical reporting, structured and
unstructured queries, and decision making for
the organization.
• Data warehousing is the step-by-step process
of constructing and using a data warehouse.
Data warehouse
• A data warehouse is mainly a data
management system that’s designed to enable
and support business intelligence (BI)
activities, particularly analytics.
• Data warehouses are intended to support
querying, cleaning, manipulating, transforming,
and analyzing data, and they also contain
large amounts of historical data.

Need of Data Warehousing

Characteristics of Data warehouse
• Subject Oriented
• Time-Variant
• Non-volatile
• Integrated
Architecture & Components of Data Warehouse
• The architecture of the data warehouse
mainly consists of the proper arrangement of
its elements, to build an efficient data
warehouse with software and hardware
components.
• The elements and components may vary
based on the requirement of organizations.
Architecture
• Source layer
• Data Staging (ETL)
• Data Warehouse layer (Data Marts & Meta
data repository)
• Analysis layer: issues reports, dynamically
analyzes information, and simulates
hypothetical business scenarios.
Source Data Component
• External Data
• Internal Data
• Operational System data
• Flat files
Data Staging
• The data staging area performs three primary
functions: extraction, transformation, and
loading (ETL).
Data Storage in Warehouse
• Metadata repositories store information on
sources, access procedures, data staging, users,
data mart schemas, and so on.
• Metadata helps users understand the content
and find the data.
• Metadata is stored in a separate data store,
known as the informational directory or
metadata repository, which helps to integrate,
maintain, and view the contents of the data
warehouse.
Extraction, Transformation, and Loading
• Data extraction, which typically gathers data
from multiple, heterogeneous, and external
sources.
• Data cleaning, which detects errors in the data
and rectifies them when possible.
• Data transformation, which converts data from
legacy or host format to warehouse format.
• Load, which sorts, summarizes, consolidates,
computes views, checks integrity, and builds
indices and partitions.
• Refresh, which propagates the updates from the
data sources to the warehouse.
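The steps above can be sketched as a tiny pipeline. This is only a minimal illustration in Python, not a real ETL tool: the two sources, the record layout, the field names (`id`, `amount`, `region`), and the function names are all assumptions made for the example. The refresh step is left out here.

```python
# Hypothetical rows from two heterogeneous sources (assumed layout).
crm_rows = [{"id": "1", "amount": "10.50", "region": "north"},
            {"id": "2", "amount": "bad", "region": "south"}]
erp_rows = [{"id": "3", "amount": "7.25", "region": "NORTH"}]


def extract(*sources):
    """Extraction: gather rows from multiple sources into one list."""
    return [row for source in sources for row in source]


def clean(rows):
    """Cleaning: detect rows whose amount is not numeric and drop them."""
    good = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue  # error detected; this sketch cannot rectify it
        good.append(row)
    return good


def transform(rows):
    """Transformation: convert the source format to the warehouse format."""
    return [{"id": int(r["id"]), "amount": float(r["amount"]),
             "region": r["region"].lower()} for r in rows]


def load(rows):
    """Load: sort the rows and build a summary (total amount per region)."""
    detail = sorted(rows, key=lambda r: r["id"])
    summary = {}
    for r in detail:
        summary[r["region"]] = summary.get(r["region"], 0.0) + r["amount"]
    return detail, summary


detail, summary = load(transform(clean(extract(crm_rows, erp_rows))))
print(summary)
```

A production load step would also check integrity and build indices and partitions; those parts depend on the target database and are omitted.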
ETL

Loading
• The Load is the process of writing the data
into the target database
• Loading can be carried out in two ways:
1. Refresh: Data warehouse data is completely
rewritten.
2. Update: Only the changes applied to the source
information are added to the data warehouse.
Update
• An update is typically carried out without
deleting or modifying preexisting data.
• This method is used in combination with
incremental extraction to update data
warehouses regularly.
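The difference between the two loading modes can be sketched in a few lines of Python. The row layout and the function names `refresh` and `update` are assumptions for illustration; here "incremental extraction" is approximated by picking only the source rows whose `id` is not yet in the warehouse.

```python
def refresh(source_rows):
    """Refresh: the warehouse contents are completely rewritten."""
    return list(source_rows)


def update(warehouse_rows, source_rows):
    """Update: append only the source rows not yet in the warehouse.

    Existing warehouse rows are neither deleted nor modified.
    """
    loaded_ids = {row["id"] for row in warehouse_rows}
    new_rows = [row for row in source_rows if row["id"] not in loaded_ids]
    return warehouse_rows + new_rows


source = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
warehouse = refresh(source)          # initial full load

source.append({"id": 3, "qty": 9})   # a change arrives at the source
warehouse = update(warehouse, source)
print(len(warehouse))
```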
Data Warehouse
• Data warehouses are built using dimensional
data models which consist of fact and
dimension tables.
• Fact tables are used for analysis and the
decision-making process, while dimension
tables store descriptive information about
a business and its processes.
Fact Table
• In data warehousing, a Fact Table is one that contains
the measurements, metrics, or facts of a business
operation.
• It is surrounded by Dimension Tables and is found at
the core of a star or snowflake schema.
• A fact table is a table that contains summarized
numerical and historical data (facts) and a multipart
index composed of foreign keys from the primary keys
of related dimension tables.
• A fact table typically has two types of columns: foreign
keys to dimension tables, and measures, which
contain numeric facts.
• Dimensions are categories by which
summarized data can be viewed. E.g. a profit
summary in a fact table can be viewed by a
Time dimension (profit by month, quarter,
year) or by a Region dimension.
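As a minimal sketch of viewing a profit measure by a Time dimension, the following Python keeps a dimension table and a fact table as plain dictionaries. The table names, keys, and profit figures are invented for illustration.

```python
from collections import defaultdict

# Dimension table: primary key -> descriptive attributes (illustrative data).
time_dim = {
    1: {"month": "Jan", "quarter": "Q1", "year": 2023},
    2: {"month": "Feb", "quarter": "Q1", "year": 2023},
    3: {"month": "Apr", "quarter": "Q2", "year": 2023},
}

# Fact table: a foreign key into time_dim plus a numeric measure.
profit_facts = [
    {"time_id": 1, "profit": 100},
    {"time_id": 2, "profit": 150},
    {"time_id": 3, "profit": 80},
]


def profit_by(level):
    """Summarize the profit measure by one attribute of the Time dimension."""
    totals = defaultdict(int)
    for fact in profit_facts:
        totals[time_dim[fact["time_id"]][level]] += fact["profit"]
    return dict(totals)


print(profit_by("month"))
print(profit_by("quarter"))
print(profit_by("year"))
```

The fact rows never store the month or quarter themselves; they only carry the foreign key, and the dimension table supplies the category used for grouping.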
Fact and Dimension

DBMS Schemas for Decision Support
• Schema is a logical description of the entire
database
• A data warehouse uses the following schemas:
• Star schema
• Snowflake schema
• Fact Constellation schema.
Star Schema
• Each dimension in a star schema is represented
with only one dimension table.
• This dimension table contains the set of
attributes.
• There is a fact table at the center. It contains
the keys to each of the four dimensions.
• The fact table also contains the attributes
(measures), namely dollars sold and units sold.
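A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The source only says there are four dimensions and two measures; the dimension names used here (time, item, branch, location), the column names, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim     (item_key     INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key   INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table at the center: one key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO item_dim VALUES (1, 'phone');
INSERT INTO branch_dim VALUES (1, 'B1');
INSERT INTO location_dim VALUES (1, 'Delhi'), (2, 'Mumbai');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 500.0, 5),
                              (1, 1, 1, 2, 300.0, 3),
                              (2, 1, 1, 1, 200.0, 2);
""")

# A typical star join: the fact table joined directly to one dimension table.
rows = conn.execute("""
    SELECT l.city, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
print(rows)
```

Because every dimension has exactly one table, any query needs at most one join per dimension.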
Snowflake Schema
• Some dimension tables in the snowflake schema
are normalized.
• The normalization splits up the data into
additional tables.
• Due to normalization in the snowflake schema,
redundancy is reduced; therefore, the schema
becomes easier to maintain and saves
storage space.
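The normalization step can be sketched as follows. This is an illustrative Python example, assuming a location dimension whose city details repeat on every row; the table names, column names, and data are invented.

```python
# Denormalized (star-style) dimension: city details repeat on every row.
location_dim = [
    {"location_key": 1, "street": "Street A", "city": "Pune", "state": "MH"},
    {"location_key": 2, "street": "Street B", "city": "Pune", "state": "MH"},
    {"location_key": 3, "street": "Street C", "city": "Mumbai", "state": "MH"},
]


def snowflake(rows):
    """Split the city attributes into an additional table referenced by key."""
    city_keys = {}        # (city, state) -> city_key
    city_table = []
    location_table = []
    for row in rows:
        city = (row["city"], row["state"])
        if city not in city_keys:
            city_keys[city] = len(city_keys) + 1
            city_table.append({"city_key": city_keys[city],
                               "city": row["city"], "state": row["state"]})
        location_table.append({"location_key": row["location_key"],
                               "street": row["street"],
                               "city_key": city_keys[city]})
    return location_table, city_table


location_table, city_table = snowflake(location_dim)
print(city_table)
```

Each city is now stored once, which is where the reduced redundancy comes from; the cost is an extra join when a query needs the city name.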
Fact Constellation Schema
• A fact constellation has multiple fact tables.
• It is also known as a galaxy schema.
OLAP
• An Online Analytical Processing (OLAP) server
is based on the multidimensional data model.
• Online Analytical Processing (OLAP) is a
category of software that allows users to
analyze information from multiple database
systems at the same time.
There are three main types of OLAP servers:
1. Relational OLAP (ROLAP)
2. Multidimensional OLAP (MOLAP)
3. Hybrid OLAP (HOLAP)
ROLAP
• ROLAP works with data that exist in a
relational database. Facts and dimension
tables are stored as relational tables.
MOLAP
• MOLAP uses array-based multidimensional
storage engines to display multidimensional
views of data. Basically, they use an OLAP
cube.
HOLAP
• Hybrid OLAP is a mixture of both ROLAP and
MOLAP. It offers the fast computation of MOLAP
and the higher scalability of ROLAP. HOLAP
uses two databases:
• Aggregated or computed data is stored in a
multidimensional OLAP cube.
• Detailed information is stored in a relational
database.

OLAP cube
• The OLAP cube is a data
structure optimized for
very quick data analysis.
• The OLAP cube consists
of numeric facts called
measures, which are
categorized by
dimensions.
• The OLAP cube is also called
the hypercube.
Cube
• A 3-D table can be represented as a 3-D data cube.
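One simple way to picture this is a dictionary keyed by three coordinates. The sketch below is only an illustration in Python; the three dimensions (quarter, item, city) and the unit counts are assumed data.

```python
from itertools import product

# Hypothetical 3-D table: one row per (quarter, item, city) with a measure.
table = [
    ("Q1", "phone", "Delhi", 5),
    ("Q1", "phone", "Mumbai", 3),
    ("Q2", "laptop", "Delhi", 2),
]

quarters = sorted({row[0] for row in table})
items = sorted({row[1] for row in table})
cities = sorted({row[2] for row in table})

# The cube maps every coordinate to a cell; cells with no sales hold 0.
cube = {coord: 0 for coord in product(quarters, items, cities)}
for quarter, item, city, units in table:
    cube[(quarter, item, city)] += units

print(len(cube))
```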
OLAP Operations
The list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up
• Roll-up is also known as
"consolidation" or
"aggregation." The roll-up
operation can be
performed in 2 ways:
1. Reducing dimensions
2. Climbing up the concept
hierarchy
• A concept hierarchy is a
system of grouping things
based on their order or
level.
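Both ways of rolling up can be sketched on a small cube. In this illustrative Python example the cube has two dimensions (quarter, city), and the city-to-country mapping stands in for a concept hierarchy; all names and numbers are assumptions.

```python
from collections import defaultdict

# Cells keyed by (quarter, city); values are units sold (illustrative data).
cube = {("Q1", "Delhi"): 5, ("Q1", "Mumbai"): 3,
        ("Q1", "Toronto"): 4, ("Q2", "Delhi"): 2}

# Assumed concept hierarchy for the location dimension: city -> country.
city_to_country = {"Delhi": "India", "Mumbai": "India", "Toronto": "Canada"}


def roll_up_hierarchy(cells):
    """Climb the concept hierarchy: aggregate cities into countries."""
    out = defaultdict(int)
    for (quarter, city), units in cells.items():
        out[(quarter, city_to_country[city])] += units
    return dict(out)


def roll_up_drop_location(cells):
    """Reduce dimensions: aggregate the location dimension away."""
    out = defaultdict(int)
    for (quarter, _city), units in cells.items():
        out[quarter] += units
    return dict(out)


print(roll_up_hierarchy(cube))
print(roll_up_drop_location(cube))
```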
Drill-down
• In drill-down, data is
fragmented into smaller
parts. It is the opposite
of the roll-up process. It
can be done by:
• Moving down the
concept hierarchy
• Increasing a dimension
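Moving down the concept hierarchy can be sketched as follows. The Python below assumes the detailed month-level facts are still available (drill-down cannot recover detail that was never stored); the month-to-quarter mapping and the figures are illustrative.

```python
from collections import defaultdict

# Detailed facts at the month level (illustrative data).
facts = [("Jan", 4), ("Feb", 6), ("Mar", 2), ("Apr", 7)]

# Assumed concept hierarchy for time: month -> quarter.
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}


def view(level):
    """Aggregate units at the requested level of the time hierarchy."""
    out = defaultdict(int)
    for month, units in facts:
        key = month if level == "month" else month_to_quarter[month]
        out[key] += units
    return dict(out)


def drill_down(quarter):
    """Break one quarter into its months (move down the hierarchy)."""
    return {m: u for m, u in view("month").items()
            if month_to_quarter[m] == quarter}


print(view("quarter"))
print(drill_down("Q1"))
```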
Slice
• A single value is
selected on one dimension, and a new
sub-cube is created.
• For example, the Time dimension is sliced
with Q1 as the filter.
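The Q1 slice can be sketched on a cube stored as a dictionary. This is an illustrative Python example; the dimension order (time, item, city), the helper name `slice_cube`, and the data are assumptions.

```python
# Cells keyed by (time, item, city); values are units sold (illustrative data).
cube = {
    ("Q1", "phone", "Delhi"): 5,
    ("Q1", "laptop", "Mumbai"): 3,
    ("Q2", "phone", "Delhi"): 2,
}


def slice_cube(cells, axis, value):
    """Fix one dimension to a single value and drop it from the keys."""
    return {key[:axis] + key[axis + 1:]: units
            for key, units in cells.items() if key[axis] == value}


# Slice the Time dimension (axis 0) with Q1 as the filter.
q1 = slice_cube(cube, 0, "Q1")
print(q1)
```

The result has one dimension fewer than the original cube, since the sliced dimension is now fixed.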
Dice
• The dice operation
selects on 2 or more
dimensions, which results
in the creation of a
sub-cube.
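A dice can be sketched as a filter on several dimensions at once. In this illustrative Python example the cube layout (time, item, city), the helper name `dice`, and the data are assumptions.

```python
# Cells keyed by (time, item, city); values are units sold (illustrative data).
cube = {
    ("Q1", "phone", "Delhi"): 5,
    ("Q1", "laptop", "Mumbai"): 3,
    ("Q2", "phone", "Delhi"): 2,
    ("Q3", "phone", "Delhi"): 8,
}


def dice(cells, allowed):
    """Keep cells whose coordinate on each listed axis is in the allowed set."""
    return {key: units for key, units in cells.items()
            if all(key[axis] in values for axis, values in allowed.items())}


# Dice on two dimensions: time in {Q1, Q2} and city in {Delhi}.
sub_cube = dice(cube, {0: {"Q1", "Q2"}, 2: {"Delhi"}})
print(sub_cube)
```

Unlike a slice, the dice keeps all dimensions in the result; it only narrows the range of values on the selected ones.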
Pivot
• In pivot, the data axes are rotated to provide an
alternative presentation of the data.
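Rotating the axes of a 2-D view can be sketched as swapping rows and columns. The Python below is illustrative only; the view layout (cities as rows, quarters as columns) and the numbers are assumed.

```python
# 2-D view: rows are cities, columns are quarters (illustrative data).
by_city = {
    "Delhi": {"Q1": 5, "Q2": 2},
    "Mumbai": {"Q1": 3, "Q2": 6},
}


def pivot(table):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_quarter = pivot(by_city)
print(by_quarter)
```

The values are unchanged; only the presentation differs, and pivoting twice returns the original view.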
