0% found this document useful (0 votes)
2 views

Unit-2

Uploaded by

Veer Gohil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Unit-2

Uploaded by

Veer Gohil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 32

UNIT - 2

The Architecture of BI and DW


Outline
• Data Warehouse Architecture
• OLTP v/s OLAP
• Data Warehouse Schema Architecture
• OLAP Operations
• OLAP Servers
Data Warehouse Architecture
Data Warehouse Architecture
• Bottom tier:
• The bottom tier is a warehouse database server that is almost always a relational
database system.
• Back-end tools and utilities are used to feed data into the bottom tier from
operational databases or other external sources.
• These tools and utilities perform data extraction, cleaning, and transformation, as
well as load and refresh functions to update the data warehouse.
• The data are extracted using application program interfaces known as gateways.
• A gateway is supported by the underlying DBMS and allows client programs to
generate SQL code to be executed at a server.
• Examples of gateways include ODBC (Open Database Connection) and OLEDB (Open
Linking and Embedding for Databases) by Microsoft and JDBC (Java Database
Connection).
• This tier also contains a metadata repository, which stores information about the
data warehouse and its contents.
Data Warehouse Architecture
• Middle tier:
• The middle tier is an OLAP (Online Analytical
Processing Server) that is typically implemented using either
• A relational OLAP (ROLAP) model, that is, an extended relational
DBMS that maps operations on multidimensional data to standard
relational operations or,
• A multidimensional OLAP (MOLAP) model, that is, a special-
purpose server that directly implements multidimensional data and
operations.
• Top tier:
• The top tier is a front-end client layer, which contains query
and reporting tools, analysis tools, and/or data mining
tools.
OLAP (On-Line Analytical Processing)
• OLAP is characterized by relatively low volume of
transactions.
• Queries are often very complex and involve
aggregations.
• For OLAP systems a response time is an
effectiveness measure.
• OLAP applications are widely used by Data Mining
techniques.
• In OLAP database there is aggregated, historical
data, stored in multi-dimensional schemas (usually
star schema).
OLTP (On-Line Transaction Processing)
• It is characterized by a large number of short on-
line transactions (INSERT, UPDATE, DELETE).
• The main emphasis for OLTP systems is put on
very fast query processing, maintaining data
integrity in multi-access environments and an
effectiveness measured by number of
transactions per second.
• In OLTP database, there is detailed and current
data, and schema used to store transactional
databases is the entity model (usually 3NF).
OLTP v/s OLAP (Understanding)
OLTP OLAP
Many Short Transactions Long Transactions (Complex Queries)
(Queries + Updates)
Examples Examples
• Update account balance • Report total sales for each department
• Enroll in course in each month
• Add book to shopping cart • Identify top-selling books
• Count classes with fewer than 10
students
Queries touch small amount of data (one Queries touch large amount of data
record or few records)
Updates are frequent Updates are infrequent
OLTP v/s OLAP
Functionality OLTP OLAP
Characteristic Operational processing Transaction Analysis
informational processing
Orientation Transaction Analysis
User Clerk, DBA, database professional Knowledge worker (e.g., manager,
executive, analyst)
Function day-to-day operations long-term informational
requirements, decision support
DB design ER based, application-oriented Star/snowflake, subject-oriented
Data Current; guaranteed up-to-date Historical; accuracy maintained
over time
Summarization Primitive, highly detailed Summarized, consolidated
View Detailed, flat relational Summarized, multidimensional
Unit of work Short, simple transaction Complex query
Access Read/write Mostly read
Data Warehouse Schema Architecture
• Data Warehouse environment usually transforms the
relational data model into some special architectures.
• There are many schema models designed for data
warehousing but the most commonly used are:
• Star Schema
• Snowflake Schema
• Fact constellation(Group of star, Collection of fact tables)
Schema
• The determination of which schema model should be
used for a data warehouse based upon the analysis of
project requirements, accessible tools and project
team preferences.
Star Schema
• The star schema architecture is the simplest data
warehouse schema.
• It is called a star schema because the diagram resembles
a star, with points radiating from a center.
• The center of the star consists of fact table and the
points of the star are the dimension tables.
• Usually the fact tables in a star schema are in third
normal form (3NF) whereas dimensional tables are de-
normalized.
• Despite the fact that the star schema is the simplest
architecture, it is most commonly used nowadays and is
recommended by Oracle.
Star Schema - Example
Snowflake Schema
• The snowflake schema architecture is a more complex
variation of the star schema used in a data warehouse,
because the tables which describe the dimensions are
normalized.
• This table is easy to maintain and saves storage space.
• However, this saving of space is negligible in comparison to
the typical size of the fact table.
• Furthermore, the snowflake structure can reduce the
effectiveness of browsing, since more joins will be needed to
execute a query.
• Hence, although the snowflake schema reduces
redundancy, it is not as popular as the star schema in data
warehouse design.
Snowflake Schema - Example
Snowflake Schema - Example
• DMQL(Data Mining Query Language) code for
Snowflake Schema can be written as follows:
• Define cube sales snowflake [time, item, branch, location]:
• Dollars sold = sum(sales in dollars), units sold = count(*)
• Define dimension time as (time key, day, day of week,
month, quarter, year)
• Define dimension item as (item key, item name, brand,
type, supplier (supplier key, supplier type))
• Define dimension branch as (branch key, branch name,
branch type)
• Define dimension location as (location key, street, city (city
key, city, province or state, country))
Fact Constellation Schema
• Sophisticated applications may require multiple fact tables to
share dimension tables.
• This kind of schema can be viewed as a collection of stars,
and hence is called a galaxy schema or a fact constellation.
• A fact constellation schema allows dimension tables to be
shared between fact tables.
• For example, the dimensions tables for time, item, and
location are shared between both the sales and shipping fact
tables.
• The main shortcoming of the fact constellation schema is a
more complicated design because many variants for
particular kinds of aggregation must be considered and
selected.
Fact Constellation Schema
Fact Constellation Schema
• DMQL code for Fact Constellation schema can be written as follows:
• Define cube sales [time, item, branch, location]:
• Dollars sold = sum(sales in dollars), units sold = count(*)
• Define dimension time as (time key, day, day of week, month, quarter, year)
• Define dimension item as (item key, item name, brand, type, supplier type)
• Define dimension branch as (branch key, branch name, branch type)
• Define dimension location as (location key, street, city, province or state,
country)
• Define cube shipping [time, item, shipper, from location, to location]:
• Dollars cost = sum(cost in dollars), units shipped = count(*)
• Define dimension time as time in cube sales
• Define dimension item as item in cube sales
• Define dimension shipper as (shipper key, shipper name, location as location in
cube sales, shipper type)
• Define dimension from location as location in cube sales
• Define dimension to location as location in cube sales
OLAP Operations
• Roll up
• Drill Down
• Slice
• Dice
• Pivot (Rotate)
Roll up – OLAP Operation
 The roll-up operation (also called drill-up or aggregation
operation) performs aggregation on a data cube by following
ways:
 By climbing up a concept hierarchy for a dimension
• By dimension reduction
• Roll-up is performed by climbing up a concept hierarchy for the
dimension location.
• Initially the concept hierarchy was "street < city < province <
country".
• On rolling up, the data is aggregated by ascending the location
hierarchy from the level of city to the level of country.
• The data is grouped into cities rather than countries.
• When roll-up is performed, one or more dimensions from the data
cube are removed.
Roll up – OLAP Operation
Drill Down – OLAP Operation
• Drill-down is the reverse operation of roll-up. It is performed by
either of the following ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension
• Drill-down is performed by stepping down a concept hierarchy for
the dimension time.
• Initially the concept hierarchy was "day < month < quarter < year."
• On drilling down, the time dimension is descended from the
level of quarter to the level of month.
• When drill-down is performed, one or more dimensions from the
data cube are added.
• It navigates the data from less detailed data to highly detailed
data.
Drill Down – OLAP Operation
Slice – OLAP Operation
• The slice operation selects one particular dimension from a given cube
and provides a new sub cube.
• Here Slice is performed for the dimension "time" using the criterion time
= "Q1“, time = "Q2“, time = "Q3“ etc.
• It will form a new sub-cube by selecting one or more dimensions.
Slice – OLAP Operation
Dice – OLAP Operation
• Dice selects two or more dimensions from a given cube and provides a
new sub cube.
• The dice operation on the cube based on the following selection criteria
involves three dimensions.
• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item =" Mobile" or "Modem")
Dice – OLAP Operation
Pivot – OLAP Operation
• The pivot operation is also known as rotation.
• It rotates the data axes in view in order to provide an alternative
presentation of data.
OLAP Servers
 Relational OLAP (ROLAP)
 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
Relational OLAP (ROLAP)
• Relational On-Line Analytical Processing (ROLAP) work mainly for the
data that resides in a relational database, where the base data and
dimension tables are stored as relational tables.
• ROLAP servers are placed between the relational back-end server and
client front-end tools.
• ROLAP servers use RDBMS to store and manage warehouse data, and
OLAP middleware to support missing pieces.
 Advantages of ROLAP
• ROLAP can handle large amounts of data.
• Can be used with data warehouse and OLTP systems.
 Disadvantages of ROLAP
• Limited by SQL functionalities.
• Hard to maintain aggregate tables.
Multidimensional OLAP (MOLAP)
• Multidimensional On-Line Analytical Processing (MOLAP) support
multidimensional views of data through array-based multidimensional
storage engines.
• With multidimensional data stores, the storage utilization may be low if
the data set is sparse.
 Advantages of MOLAP
• Optimal for slice and dice operations.
• Performs better than ROLAP when data is dense(heavy).
• Can perform complex calculations.
 Disadvantages of MOLAP
• Difficult to change dimension without re-aggregation.
• MOLAP can handle limited amount of data.
Hybrid OLAP (HOLAP)
• Hybrid On-Line Analytical Processing (HOLAP) is a combination of ROLAP
and MOLAP.
• HOLAP provide greater scalability of ROLAP and the faster computation
of MOLAP.
 Advantages of HOLAP
• HOLAP provide advantages of both MOLAP and ROLAP.
• Provide fast access at all levels of aggregation.
 Disadvantages of HOLAP
• HOLAP architecture is very complex because it support
both MOLAP and ROLAP servers.

You might also like