Unit 3 Notes DWM
Processing II
Q.1] Explain Data Warehouse Design
▪ A data warehouse is a single data repository in which records from multiple
(heterogeneous) data sources are integrated for online analytical
processing (OLAP).
▪ Integrating so many sources makes data warehouse design a complex, lengthy,
and hence error-prone process.
▪ Furthermore, business analytical functions change over time, which results
in changes in the requirements for the systems.
Q.2] State usage of data warehousing.
Usage of Data warehousing:
1. It possesses consolidated historical data, which helps the organization
to analyze its business.
2. A data warehouse helps executives to organize, understand, and use
their data to take strategic decisions.
3. Data warehouse systems help in the integration of diverse
application systems.
4. A data warehouse system helps in consolidated historical data analysis.
5. It improves query performance.
Q.3] Describe business framework for Data warehouse design.
Business framework for DW design:
The business analyst gets information from the data warehouse to measure
performance and make critical adjustments in order to win over competitors
in the market.
Having a data warehouse offers the following advantages:
1. Since a data warehouse can gather information quickly and efficiently, it
can enhance business productivity.
2. A data warehouse provides a consistent view of customers and items;
hence, it helps manage customer relationships.
3. A data warehouse also helps bring down costs by tracking trends and
patterns over a long period in a consistent and reliable manner.
To design an effective and efficient data warehouse, we need to understand and
analyze the business needs and construct a business analysis framework.
Different stakeholders view the design of a data warehouse differently.
These views are as follows:
1. The top-down view: This view allows the selection of relevant
information needed for a data warehouse.
2. The data source view: This view presents the information being captured,
stored, and managed by the operational system.
3. The data warehouse view: This view includes the fact tables and
dimension tables. It represents the information stored inside the data
warehouse.
4. The business query view: It is the view of the data from the viewpoint of
the end user.
Q.4] Explain top-down and bottom-up design approach of data warehouse. OR
Explain Data warehouse design process.
Three methods:
1. Software Engineering Model
2. Typical Design Process
3. Top-down approach and Bottom-up approach
2] Bottom-Up approach:
In this approach, a data mart is created first for particular business processes (or
subjects).
1. First, the data is extracted from external sources.
2. Then, the data passes through the staging area and is loaded into data marts
instead of a data warehouse.
3. The data marts are created first and provide reporting capability. Each data
mart addresses a single business area.
4. These data marts are then integrated into a data warehouse.
Advantages of Bottom-Up Approach:
1. Faster implementation and quick results.
2. Lower initial investment.
3. Flexible and adaptable.
4. Immediate usability for departments.
▪ The base cuboid contains all three dimensions, city, item, and year. It can
return the total sales for any combination of the three dimensions.
▪ The apex cuboid, or 0-D cuboid, refers to the case where the group-by is
empty. It contains the total sum of all sales.
▪ The base cuboid is the least generalized (most specific) of the cuboids.
▪ The apex cuboid is the most generalized (least specific) of the cuboids, and
is often denoted as all.
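The cuboid lattice described above can be sketched in a few lines of Python. This is only an illustration: the sales rows, the amounts, and the helper names (compute_cuboid, all_cuboids) are assumptions made up for the example, not part of the notes.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "item", "year")

# Hypothetical sales rows: (city, item, year, amount)
SALES = [
    ("Pune", "TV", 2023, 100),
    ("Pune", "TV", 2024, 150),
    ("Pune", "Phone", 2023, 80),
    ("Mumbai", "TV", 2023, 200),
    ("Mumbai", "Phone", 2024, 120),
]


def compute_cuboid(rows, group_by):
    """Sum the amount for each combination of the group-by dimensions."""
    positions = [DIMENSIONS.index(dim) for dim in group_by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[3]
    return dict(totals)


def all_cuboids(rows):
    """Build every cuboid of the lattice, keyed by its group-by tuple."""
    lattice = {}
    for k in range(len(DIMENSIONS) + 1):
        for group_by in combinations(DIMENSIONS, k):
            lattice[group_by] = compute_cuboid(rows, group_by)
    return lattice


lattice = all_cuboids(SALES)
apex = lattice[()]           # 0-D cuboid: empty group-by, one grand total
base = lattice[DIMENSIONS]   # 3-D cuboid: grouped by city, item, and year

print(len(lattice))          # 8 cuboids for 3 dimensions (2 ** 3)
print(apex)                  # {(): 650}
```

The apex cuboid has a single cell (the grand total), while the base cuboid keeps one cell per distinct (city, item, year) combination, which is why it is the least generalized.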
Materialization (Precomputation of Data Cube):
There are three choices for data cube materialization given a base cuboid:
1. No materialization
2. Full materialization
3. Partial materialization
1] No materialization:
In this technique, no cuboids are precomputed. Every multidimensional
aggregate must be computed on the fly at query time, which is expensive and
can make queries very slow.
2] Full materialization:
▪ This technique precomputes all the cuboids of the multidimensional data
cube. The resulting lattice of computed cuboids is referred to as the
full cube.
▪ This choice typically requires a huge amount of storage space to hold all
of the precomputed cuboids: a cube with n dimensions (and no concept
hierarchies) has 2^n cuboids.
3] Partial materialization:
▪ Selectively compute a proper subset of the whole set of possible cuboids.
▪ Alternatively, compute a subset of the cube that contains only those cells
that satisfy some user-specified criterion (for example, a minimum total).
▪ The term subcube refers to this latter case, where only some of the cells
are precomputed for various cuboids.
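The three materialization choices can be compared with a small Python sketch. The data, the function names, and the threshold of 150 are all hypothetical; real OLAP servers choose which cuboids to precompute using workload and storage estimates that are not modeled here.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "item", "year")

# Hypothetical sales rows: (city, item, year, amount)
SALES = [
    ("Pune", "TV", 2023, 100),
    ("Pune", "TV", 2024, 150),
    ("Pune", "Phone", 2023, 80),
    ("Mumbai", "TV", 2023, 200),
    ("Mumbai", "Phone", 2024, 120),
]


def aggregate(rows, group_by):
    """Compute one cuboid: total amount per combination of group_by values."""
    positions = [DIMENSIONS.index(dim) for dim in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[p] for p in positions)] += row[3]
    return dict(totals)


def full_materialization(rows):
    """Precompute every cuboid (2 ** n of them for n dimensions)."""
    return {
        group_by: aggregate(rows, group_by)
        for k in range(len(DIMENSIONS) + 1)
        for group_by in combinations(DIMENSIONS, k)
    }


def partial_materialization(rows, chosen_group_bys):
    """Precompute only a chosen proper subset of the cuboids."""
    return {group_by: aggregate(rows, group_by) for group_by in chosen_group_bys}


def subcube_cells(cuboid, min_total):
    """Keep only the cells that satisfy a user-specified criterion."""
    return {key: total for key, total in cuboid.items() if total >= min_total}


# No materialization: nothing is stored, each query aggregates from scratch.
on_demand = aggregate(SALES, ("city",))

full = full_materialization(SALES)
partial = partial_materialization(SALES, [("city",), ("city", "year")])
subcube = subcube_cells(full[("city", "item")], min_total=150)

print(len(full), len(partial))   # 8 2
print(subcube)                   # {('Pune', 'TV'): 250, ('Mumbai', 'TV'): 200}
```

The trade-off is visible in the sizes: full materialization stores all 8 cuboids, partial materialization stores only the 2 that were chosen, and the subcube keeps just the cells that pass the threshold.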
Q.11] Explain OLAP data indexing with its type. OR Explain Bitmap and Join
Index for OLAP.
Types of OLAP data indexing:
1. Bitmap Index
2. Join Index
1] Bitmap index in OLAP:
▪ The bitmap index is an alternative representation of the record ID (RID)
list.
▪ Each distinct value of the indexed attribute has its own bit vector. If the
attribute’s domain consists of n values, then n bits are needed for each
entry in the bitmap index.
▪ If a row holds a given attribute value, the bit for that value is 1 in the
corresponding row of the bitmap index, and the bits for all other values
are 0 (zero).
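A minimal Python sketch of the idea, assuming a tiny base table with Region and Type columns. The row values and the function names are invented for illustration, and plain lists stand in for the compressed bit vectors a real system would use.

```python
# Hypothetical base table; the row ID (RID) is the position in the list.
ROWS = [
    {"region": "Asia", "type": "Retail"},
    {"region": "Europe", "type": "Dealer"},
    {"region": "Asia", "type": "Dealer"},
    {"region": "America", "type": "Retail"},
]


def build_bitmap_index(rows, attribute):
    """Return one bit vector per distinct value, with one bit per row."""
    index = {}
    for rid, row in enumerate(rows):
        value = row[attribute]
        if value not in index:
            index[value] = [0] * len(rows)
        index[value][rid] = 1
    return index


def bitmap_and(left, right):
    """Bitwise AND of two bit vectors of equal length."""
    return [a & b for a, b in zip(left, right)]


region_index = build_bitmap_index(ROWS, "region")
type_index = build_bitmap_index(ROWS, "type")

# Rows where region = 'Asia' AND type = 'Dealer', found without scanning ROWS.
matches = bitmap_and(region_index["Asia"], type_index["Dealer"])
matching_rids = [rid for rid, bit in enumerate(matches) if bit]

print(region_index["Asia"])   # [1, 0, 1, 0]
print(matching_rids)          # [2]
```

Region has three distinct values here, so each row needs three bits in the Region index, and exactly one of them is 1. Selections that combine attributes reduce to cheap bitwise operations on the vectors.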
Example:
[Figure: base table and its bitmap index tables for the dimensions Region and Type]
2] Join index in OLAP:
The following query illustrates the join result that is used to create the bitmaps
that are stored in the bitmap join index:
SELECT sales.time_id, customers.cust_gender, sales.amount_sold
FROM sales, customers
WHERE sales.cust_id = customers.cust_id;
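The effect of the query above can be sketched in Python. The table and column names follow the query, but the rows are hypothetical, and as a simplification the index stores lists of sales row IDs per cust_gender value rather than the bitmaps a real bitmap join index would hold.

```python
from collections import defaultdict

# Hypothetical dimension table: cust_id -> cust_gender
CUSTOMERS = {1: "F", 2: "M", 3: "F"}

# Hypothetical fact table; the row ID (RID) is the position in the list.
SALES = [
    {"time_id": "2024-01-05", "cust_id": 1, "amount_sold": 120.0},
    {"time_id": "2024-01-06", "cust_id": 2, "amount_sold": 75.0},
    {"time_id": "2024-01-06", "cust_id": 3, "amount_sold": 60.0},
    {"time_id": "2024-01-07", "cust_id": 1, "amount_sold": 40.0},
]


def build_join_index(sales, customers):
    """Perform the join once and record, per cust_gender, the matching sales RIDs."""
    index = defaultdict(list)
    for rid, sale in enumerate(sales):
        gender = customers[sale["cust_id"]]
        index[gender].append(rid)
    return dict(index)


join_index = build_join_index(SALES, CUSTOMERS)

# Query time: total sales to female customers, with no join against CUSTOMERS.
female_total = sum(SALES[rid]["amount_sold"] for rid in join_index["F"])

print(join_index)     # {'F': [0, 2, 3], 'M': [1]}
print(female_total)   # 220.0
```

The join cost is paid once when the index is built. The flip side is that the index has to be updated whenever rows in sales or customers change.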
Advantages of Join Index:
1. Speeds up complex join queries.
2. Reduces the need for expensive joins at query time.
3. Simplifies queries by storing pre-joined data.
4. Improves query response time for common joins.
Disadvantages of Join Index:
1. Increases storage overhead.
2. Requires maintenance when data changes.
3. Can become outdated with frequent data changes.
4. Adds complexity in managing indexes.
Advantages of ROLAP:
1. Handles large data volumes using relational databases.
2. Real-time data access without pre-aggregation.
3. Flexible with any relational database.
4. Scalable for complex queries over large datasets.
Disadvantages of ROLAP:
1. Slower query performance due to on-the-fly aggregation.
2. Depends on relational database performance.
3. Requires more processing power for complex queries.
4. Not ideal for highly aggregated data.
Advantages of MOLAP:
1. Fast query performance with pre-aggregated data.
2. Efficient for complex calculations and aggregations.
3. Supports interactive analysis.
4. Handles advanced analytical functions well.
Disadvantages of MOLAP:
1. Limited scalability with large datasets.
2. High storage requirements for data cubes.
3. Less flexible for changing data structures.
4. Struggles with real-time data updates.
3. Hybrid OLAP (HOLAP):
Hybrid OLAP is a combination of ROLAP and MOLAP. It offers the higher
scalability of ROLAP and the faster computation of MOLAP. HOLAP stores
aggregations in MOLAP for fast query performance, and detailed data in ROLAP
to reduce cube processing time.
▪ HOLAP tools can utilize both pre-calculated cubes and relational data
sources.
▪ HOLAP servers are capable of storing large amounts of detailed data, so
HOLAP benefits from ROLAP’s greater scalability.
▪ At the same time, HOLAP makes use of cube technology for faster
performance on summary-type information.
HOLAP includes the following components:
1. Database Server
2. Multidimensional database (MDDB)
3. HOLAP server
4. Front-end tool.
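How a HOLAP server splits the work can be illustrated with a toy Python class. Everything here is an assumption made for the example: the class name HolapServer, its methods, and the data. A plain list stands in for the relational database server and a dictionary stands in for the MDDB.

```python
from collections import defaultdict

# Hypothetical detail rows, as they would sit in the relational database.
DETAIL_ROWS = [
    {"city": "Pune", "item": "TV", "year": 2023, "amount": 100.0},
    {"city": "Pune", "item": "Phone", "year": 2023, "amount": 80.0},
    {"city": "Mumbai", "item": "TV", "year": 2023, "amount": 200.0},
    {"city": "Pune", "item": "TV", "year": 2024, "amount": 150.0},
]


class HolapServer:
    """Toy HOLAP server: aggregates live in a cube, details stay relational."""

    def __init__(self, detail_rows):
        self.detail_rows = detail_rows   # stands in for the relational store
        self.cube = defaultdict(float)   # stands in for the MDDB
        for row in detail_rows:
            # Pre-aggregate total sales per (city, year) at load time.
            self.cube[(row["city"], row["year"])] += row["amount"]

    def total_sales(self, city, year):
        """Summary query: answered from the pre-aggregated cube (MOLAP side)."""
        return self.cube.get((city, year), 0.0)

    def sales_detail(self, city, year):
        """Drill-through query: answered from the detail rows (ROLAP side)."""
        return [
            row for row in self.detail_rows
            if row["city"] == city and row["year"] == year
        ]


server = HolapServer(DETAIL_ROWS)
print(server.total_sales("Pune", 2023))         # 180.0
print(len(server.sales_detail("Pune", 2023)))   # 2
```

A front-end tool would call the server without knowing which store answers: summary requests hit the cube, and drill-through requests fall back to the relational rows.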