Unit 2 Notes DWM

The document provides an overview of data warehousing concepts, focusing on Data Cubes and OLAP (Online Analytical Processing). It discusses the advantages and disadvantages of OLAP, differentiates between OLAP and OLTP (Online Transaction Processing), and explains various data warehouse schemas including Star, Snowflake, and Fact Constellation schemas. Additionally, it covers OLAP operations such as roll-up, drill-down, slice, dice, and pivot, along with a comparison between OLTP and OLAP systems.

Uploaded by

Gajanan Markad

Unit 2: Data Warehousing Modeling & Online Analytical Processing (OLAP) I

Q.1] Define the data cube used in data warehouse modelling.

Data Cube or OLAP Cube:
When data warehouse data is grouped or combined into multidimensional
matrices, the result is called a data cube. A data cube is a data structure
optimized for very quick data analysis. It is also called an OLAP cube or
hypercube.

Q.2] Define OLAP with examples.

OLAP stands for Online Analytical Processing:
OLAP is a category of software that allows users to analyse information from
multiple database systems at the same time.
Examples of application areas:
1. Finance and accounting
2. Sales and marketing
3. Production
Q.3] Advantages and Disadvantages of OLAP
Advantages of OLAP:
1. Fast query performance.
2. Multidimensional data analysis.
3. Easy data aggregation.
4. Supports complex calculations.
5. Aids in decision-making.
6. User-friendly interfaces.
Disadvantages of OLAP:
1. Complex setup and design.
2. High storage requirements.
3. Maintenance can be time-consuming.
4. Not ideal for real-time data.
5. Scalability challenges with large data.
6. High implementation and maintenance costs.
Q.4] Define OLTP in data warehouse
Online Transaction Processing (OLTP):
OLTP databases are designed to process many small transactions, and they
usually serve as the "single source of truth" for operational data.
Q.5] Define the term data cube in multidimensional data model.
▪ A data cube is a multidimensional data structure model for storing data in
the data warehouse.
▪ A data cube can be 2D, 3D or n-dimensional in structure.
▪ When data is grouped or combined into multidimensional matrices, it is
called a data cube.
▪ A data cube represents data in terms of dimensions and facts.
▪ A dimension in a data cube represents an attribute of the data set.
▪ Each cell of a data cube holds aggregated data.
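The points above can be sketched in a few lines of Python. This is only a minimal illustration, not a real OLAP engine: the fact rows, the dimension order (location, quarter, item) and the helper name `build_cube` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (location, quarter, item, sales). Values are illustrative only.
rows = [
    ("Pune", "Q1", "Mobile", 100),
    ("Pune", "Q1", "Mobile", 50),
    ("Pune", "Q2", "Modem", 80),
    ("Mumbai", "Q1", "Mobile", 120),
]

def build_cube(fact_rows):
    """Aggregate fact rows into cells keyed by (location, quarter, item)."""
    cube = defaultdict(int)
    for location, quarter, item, sales in fact_rows:
        cube[(location, quarter, item)] += sales
    return dict(cube)

cube = build_cube(rows)
print(cube[("Pune", "Q1", "Mobile")])  # 150: two fact rows aggregated into one cell
```

Each key is one combination of dimension values, and each value is the aggregated measure for that cell.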
Q.6] Explain need of OLAP
1. Complex Data Analysis:
OLAP enables multidimensional analysis, offering
deeper insights from large datasets.
2. Faster Decision-Making:
It speeds up decision-making by delivering fast
query responses and timely insights.
3. Data Exploration:
OLAP allows users to easily explore data through features
like drilling down, rolling up, and pivoting.
4. Aggregated Data:
Pre-aggregated data in OLAP systems simplifies analysis
and reduces manual computation.
5. Reporting and Visualization:
OLAP tools support detailed reporting and
visualization, aiding in clearer decision-making.
6. Forecasting and Trend Analysis:
OLAP helps with forecasting and
analyzing trends to predict future outcomes effectively.
Q.7] List & explain schema used in Data warehouse modeling.
Schema in Data warehouse modeling:
1. Star Schema
2. Snowflake Schema
3. Fact Constellation or Galaxy Schema
1] Star Schema:
▪ A star schema is the primary form of a dimensional model, in which data
are organized into facts and dimensions.
▪ A fact is an event that is counted or measured, such as a sale.
▪ A dimension holds descriptive information about the fact, such as date,
item, or customer.
▪ The star schema is the simplest data warehouse schema.
▪ It is known as a star schema because its entity-relationship diagram
resembles a star, with points diverging from a central table.
▪ The centre of the schema consists of a large fact table, and the points of the
star are the dimension tables.
Fact Table: (applicable to all schemas) This table contains foreign keys that
reference the primary keys of multiple dimension tables. It also contains facts
or measures such as quantity sold, amount sold, etc.
Dimension Table: (applicable to all schemas) This table provides descriptive
information for the measures recorded in the fact table, such as product, item,
location, time, etc.
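A minimal sketch of a star schema, using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_item`, `dim_location`), the columns and the sample rows are assumptions chosen for illustration; they do not come from a real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes (hypothetical names and rows).
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
-- Fact table: foreign keys to each dimension plus the measures.
CREATE TABLE fact_sales (
    item_id INTEGER REFERENCES dim_item(item_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    quantity_sold INTEGER,
    amount_sold REAL
);
INSERT INTO dim_item VALUES (1, 'Mobile'), (2, 'Modem');
INSERT INTO dim_location VALUES (1, 'Pune', 'Maharashtra'), (2, 'Mumbai', 'Maharashtra');
INSERT INTO fact_sales VALUES (1, 1, 3, 300.0), (2, 1, 1, 50.0), (1, 2, 2, 200.0);
""")

# One join per dimension is enough, because every dimension touches the fact table directly.
star_rows = conn.execute("""
    SELECT l.city, SUM(f.amount_sold)
    FROM fact_sales f
    JOIN dim_location l ON l.location_id = f.location_id
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
print(star_rows)  # [('Mumbai', 200.0), ('Pune', 350.0)]
```

Note that `state` is stored directly in `dim_location`, so 'Maharashtra' is repeated for every city. This is the redundancy that a denormalized star schema accepts in exchange for fewer joins.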

Advantages of Star Schema:


1. Simple and easy to understand.
2. Improved query performance with fewer joins.
3. Optimized for OLAP systems.
4. Flexible and scalable design.
5. Simplifies ETL processes.
6. User-friendly for business users.
Disadvantages of Star Schema:
1. Can cause data redundancy.
2. Complex queries may slow down performance.
3. Lack of normalization can lead to inconsistencies.
4. Requires frequent updates to dimension tables.
5. Becomes inefficient with many dimensions.
2] Snowflake Schema:
▪ A snowflake schema is a refinement of the star schema.
▪ "A schema is known as a snowflake schema when one or more dimension
tables do not connect directly to the fact table, but must join through other
dimension tables."
▪ The snowflake schema is an expansion of the star schema where each point
(dimension table) of the star explodes into more points (more dimension
tables).
▪ Snowflaking is a method of normalizing the dimension tables in a star
schema.
▪ Snowflaking is used to improve the performance of specific queries.
▪ The snowflake schema consists of one fact table which is linked to many
dimension tables, which can be linked to other dimension tables through a
many-to-one relationship.
▪ Tables in a snowflake schema are generally normalized to the third normal
form.
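A minimal sketch of snowflaking, again with Python's built-in `sqlite3` module. Here a location dimension is normalized into a city table and a state table. All table names, columns and rows are hypothetical and serve only to show the shape of the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location dimension is split: city rows point to state rows (many-to-one).
CREATE TABLE dim_state (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE dim_city (
    city_id INTEGER PRIMARY KEY,
    city_name TEXT,
    state_id INTEGER REFERENCES dim_state(state_id)
);
-- The fact table links only to dim_city; dim_state is reached through dim_city.
CREATE TABLE fact_sales (
    city_id INTEGER REFERENCES dim_city(city_id),
    amount_sold REAL
);
INSERT INTO dim_state VALUES (1, 'Maharashtra'), (2, 'Delhi');
INSERT INTO dim_city VALUES (1, 'Pune', 1), (2, 'Mumbai', 1), (3, 'New Delhi', 2);
INSERT INTO fact_sales VALUES (1, 260.0), (2, 390.0), (3, 100.0);
""")

# Reaching the state name needs two joins: fact -> city -> state.
snowflake_rows = conn.execute("""
    SELECT s.state_name, SUM(f.amount_sold)
    FROM fact_sales f
    JOIN dim_city c ON c.city_id = f.city_id
    JOIN dim_state s ON s.state_id = c.state_id
    GROUP BY s.state_name
    ORDER BY s.state_name
""").fetchall()
print(snowflake_rows)  # [('Delhi', 100.0), ('Maharashtra', 650.0)]
```

Each state name is stored once, which removes redundancy, but the query needs one more join than it would against a single denormalized location table.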
Advantages of Snowflake Schema:
1. Reduces data redundancy by normalizing dimension tables.
2. Saves storage space compared to the star schema.
3. Improves data consistency due to normalization.
4. More efficient for complex queries that involve multiple dimensions.
5. Easier to maintain and update dimension tables.
Disadvantages of Snowflake Schema:
1. More complex design, making it harder to understand and use.
2. ETL processes are more complicated and time-consuming.
3. May lead to slower performance with large datasets.
4. Not as user-friendly for non-technical users.
5. Less Flexible.
3] Fact Constellation Schema:
▪ A fact constellation consists of two or more fact tables sharing one or more
dimension tables.
▪ It can be viewed as a collection of several star schemas and is therefore
also called a Galaxy schema.
▪ It is one of the widely used schemas for data warehouse design.
▪ It is much more complex than the star and snowflake schemas.
▪ Complex systems require fact constellations.
Fig. Fact Constellation Schema
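As a rough sketch, a fact constellation can be built with Python's built-in `sqlite3` module by letting two fact tables share one dimension table. The names `fact_sales`, `fact_shipping` and `dim_item`, and all rows, are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared dimension table (hypothetical).
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables that both reference the shared dimension.
CREATE TABLE fact_sales (item_id INTEGER REFERENCES dim_item(item_id), units_sold INTEGER);
CREATE TABLE fact_shipping (item_id INTEGER REFERENCES dim_item(item_id), units_shipped INTEGER);
INSERT INTO dim_item VALUES (1, 'Mobile'), (2, 'Modem');
INSERT INTO fact_sales VALUES (1, 5), (1, 3), (2, 2);
INSERT INTO fact_shipping VALUES (1, 6), (2, 2);
""")

# Each fact table is aggregated separately, then lined up on the shared dimension.
galaxy_rows = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold) FROM fact_sales s WHERE s.item_id = i.item_id),
           (SELECT SUM(sh.units_shipped) FROM fact_shipping sh WHERE sh.item_id = i.item_id)
    FROM dim_item i
    ORDER BY i.item_name
""").fetchall()
print(galaxy_rows)  # [('Mobile', 8, 6), ('Modem', 2, 2)]
```

Aggregating each fact table in its own subquery avoids double counting, which would occur if both fact tables were joined to the dimension in a single flat join.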
Advantages of Fact Constellation Schema:
1. Supports multiple fact tables for complex analysis.
2. High performance for large datasets.
3. Flexible for various business requirements.
4. Handles complex relationships well.
5. Scalable for evolving business needs.
6. Reduces data redundancy.
Disadvantages of Fact Constellation Schema:
1. Complex design and maintenance.
2. Slower queries due to multiple joins.
3. Difficult for non-technical users.
4. Requires more storage space.
5. Complicated ETL processes.
Q.8] Differentiate between star schema and snowflake schema
| Parameter | Star Schema | Snowflake Schema |
|---|---|---|
| 1. Ease of maintenance | It has redundant data and hence is less easy to maintain | No redundancy and therefore easier to maintain |
| 2. Ease of change | It has redundant data and hence is less easy to change | No redundancy and therefore easier to change |
| 3. Ease of use | Less complex queries and simple to understand | More complex queries and therefore less easy to understand |
| 4. Normalization | It has de-normalized tables | It has normalized tables |
| 5. Joins | Fewer joins | Higher number of joins |
| 6. Dimension table | It contains only a single dimension table for each dimension | It may have more than one dimension table for each dimension |
| 7. Foreign keys used | Fewer | More |

Q.9] Explain Multi-Dimensional Data Model


Multi-Dimensional Data Model:
▪ A multidimensional model views data in the form of a data cube.
▪ A data cube enables data to be modelled and viewed in multiple
dimensions.
▪ A multidimensional data model consists of a fact table and dimension tables.
Fact Table:
▪ This table contains foreign keys that reference the primary keys of multiple
dimension tables.
▪ It contains facts or measures like quantity sold, amount sold, etc.
Dimension Table:
This table provides descriptive information for the measures
recorded in the fact table, such as product, item, location, time, etc.
Example:
Consider the data of a shop for items sold per quarter in the city of Delhi. The
data is shown in the table. In this 2D representation, the sales for Delhi are shown
for the time dimension (organized in quarters) and the item dimension (classified
according to the type of item sold). The fact, or measure, displayed is rupees
sold (in thousands).
The data from the table above can be represented in the form of a 3D
(3-dimensional) data cube, as shown in the figure.

Q.10] Explain the following OLAP operations:

1. Roll-up
2. Drill-down
3. Slice
4. Dice
5. Pivot
1] Roll-up:
Roll-up is also known as "consolidation" or "aggregation." The roll-up operation
can be performed in 2 ways:
a. Reducing dimensions
b. Climbing up a concept hierarchy. A concept hierarchy is a system of grouping
things based on their order or level.
Consider the following diagram:
In this example, the roll-up operation is performed by climbing up (merging)
in the concept hierarchy of the Location dimension (City to State).
▪ The cities Pune and Mumbai are rolled up into the state
Maharashtra.
▪ The sales figures of Pune and Mumbai are 260 and 390 respectively. They
become 650 after roll-up.
▪ In this aggregation process, the data moves up the location hierarchy from
city to state.
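The roll-up in this example can be sketched in Python as follows. The sales figures come from the example above. The dictionary layout and the helper name `roll_up` are assumptions made for illustration.

```python
from collections import defaultdict

# Sales per city, taken from the example above.
city_sales = {"Pune": 260, "Mumbai": 390}
# Concept hierarchy for the Location dimension: city -> state.
city_to_state = {"Pune": "Maharashtra", "Mumbai": "Maharashtra"}

def roll_up(sales_by_city, hierarchy):
    """Aggregate city-level sales to the next level of the hierarchy."""
    totals = defaultdict(int)
    for city, amount in sales_by_city.items():
        totals[hierarchy[city]] += amount
    return dict(totals)

state_sales = roll_up(city_sales, city_to_state)
print(state_sales)  # {'Maharashtra': 650}
```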
2] Drill-down:
In drill-down, data is fragmented (divided) into smaller parts. It is the opposite
of the roll-up process. It can be done by
a. Moving down in the concept hierarchy, or
b. Adding a dimension.
Consider the following diagram:
In this example, the drill-down operation is performed by moving down in the
concept hierarchy of the Time dimension (Quarter to Months).
Quarter Q1 is drilled down to the months January, February, and
March, and the corresponding sales are shown for each month; that is, the
month level is added.
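A small Python sketch of drill-down on the Time dimension. The monthly figures below are invented for illustration, since the notes give no month-level numbers, and the helper name `drill_down` is likewise an assumption.

```python
# Hypothetical month-level sales, keyed by (quarter, month).
monthly_sales = {
    ("Q1", "January"): 200,
    ("Q1", "February"): 150,
    ("Q1", "March"): 300,
    ("Q2", "April"): 100,
}

def drill_down(quarter, detail):
    """Return the month-level breakdown that sits beneath one quarter."""
    return {month: amount for (q, month), amount in detail.items() if q == quarter}

q1_months = drill_down("Q1", monthly_sales)
print(q1_months)  # {'January': 200, 'February': 150, 'March': 300}
```

Summing the monthly values gives back the quarter total, which is why drill-down is the inverse of roll-up.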

3] Slice:
In this operation, a single value is selected for one dimension, and a new
sub-cube is created. In this example, the slice is performed on the dimension
Time with quarter Q1 as the filter, which creates a new sub-cube containing
only the Q1 data.
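A minimal Python sketch of a slice. The cube contents and the helper name `slice_cube` are hypothetical, and the cube is assumed to be a dictionary keyed by (location, quarter, item).

```python
# Hypothetical cube: (location, quarter, item) -> sales.
cube = {
    ("Pune", "Q1", "Mobile"): 100,
    ("Pune", "Q2", "Mobile"): 120,
    ("Mumbai", "Q1", "Modem"): 90,
    ("Delhi", "Q3", "Mobile"): 70,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop that axis from the keys."""
    return {
        key[:axis] + key[axis + 1:]: amount
        for key, amount in cube.items()
        if key[axis] == value
    }

q1_slice = slice_cube(cube, 1, "Q1")  # axis 1 is the Time dimension
print(q1_slice)  # {('Pune', 'Mobile'): 100, ('Mumbai', 'Modem'): 90}
```

The result has one dimension fewer than the original cube, because Time is fixed at Q1.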
4] Dice:
▪ This operation is similar to a slice. The difference is that in dice you
select values on 2 or more dimensions, which results in a sub-cube.
▪ In this example, a sub-cube is selected by choosing Location Pune
or Mumbai and Time Q1 or Q2.
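A minimal Python sketch of the dice described above. The cube contents and the helper name `dice_cube` are assumptions, and the cube is assumed to be keyed by (location, quarter, item).

```python
# Hypothetical cube: (location, quarter, item) -> sales.
cube = {
    ("Pune", "Q1", "Mobile"): 100,
    ("Pune", "Q2", "Mobile"): 120,
    ("Mumbai", "Q1", "Modem"): 90,
    ("Mumbai", "Q3", "Modem"): 60,
    ("Delhi", "Q1", "Mobile"): 70,
}

def dice_cube(cube, allowed):
    """Keep cells whose value on every listed axis is in the allowed set."""
    return {
        key: amount
        for key, amount in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }

# Location in {Pune, Mumbai} AND Time in {Q1, Q2}.
sub_cube = dice_cube(cube, {0: {"Pune", "Mumbai"}, 1: {"Q1", "Q2"}})
print(sorted(sub_cube))
# [('Mumbai', 'Q1', 'Modem'), ('Pune', 'Q1', 'Mobile'), ('Pune', 'Q2', 'Mobile')]
```

Unlike a slice, the sub-cube keeps all of its dimensions; only the range of values on the chosen axes is restricted.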

5] Pivot:
In the pivot operation, you rotate the data axes to provide an alternative
presentation of the data. In this example, performing a pivot on the sub-cube
obtained from the slice operation gives a new view of that slice.
Consider the result (slice) of the slice operation.
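A minimal Python sketch of a pivot on a 2-D slice. The slice contents and the helper name `pivot` are hypothetical. The slice is assumed to be keyed by (location, item).

```python
# Hypothetical Q1 slice: (location, item) -> sales.
slice_q1 = {
    ("Pune", "Mobile"): 100,
    ("Pune", "Modem"): 40,
    ("Mumbai", "Mobile"): 90,
}

def pivot(table):
    """Swap the two axes of a 2-D table keyed by (row, column)."""
    return {(col, row): value for (row, col), value in table.items()}

pivoted = pivot(slice_q1)  # now keyed by (item, location)
print(pivoted[("Mobile", "Pune")])  # 100
```

The values are unchanged; only the orientation of the axes differs, so pivoting twice returns the original table.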
Q.11] Distinguish between OLTP & OLAP.
| OLTP | OLAP |
|---|---|
| 1. OLTP is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). | 1. OLAP is characterized by a relatively low volume of transactions. |
| 2. OLTP queries are simple and easy to understand. | 2. OLAP queries are often very complex and involve aggregations. |
| 3. OLTP is widely used for small transactions. | 3. OLAP applications are widely used in data mining. |
| 4. OLTP is highly normalized. | 4. OLAP is typically de-normalized. |
| 5. OLTP data must be backed up religiously. | 5. OLAP data needs only regular (periodic) backup. |
| 6. OLTP queries are comparatively fast. | 6. OLAP queries are comparatively slow. |
| 7. Write-heavy operations | 7. Read-heavy operations |
| 8. Lower redundancy | 8. Higher redundancy |
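The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is only an illustration run against a single hypothetical `orders` table; in practice OLTP and OLAP usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)")

# OLTP style: many short write transactions, each touching a few rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (city, amount) VALUES (?, ?)", ("Pune", 260.0))
    conn.execute("INSERT INTO orders (city, amount) VALUES (?, ?)", ("Mumbai", 390.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (400.0, 2))

# OLAP style: a read-only query that scans and aggregates many rows.
olap_rows = conn.execute(
    "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()
print(olap_rows)  # [('Mumbai', 400.0), ('Pune', 260.0)]
```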
