Unit - 4 Final
AP20C12
UNIT IV:
• A data warehouse is a database designed for analytical (investigative) tasks, using data from
various applications
• Data cubes allow data to be modeled and viewed from many dimensions and perspectives
• Facts are numerical measures; a fact table contains the names of the facts (measures) and keys
to each of the related dimension tables
Working on a Multidimensional Data Model
A Multidimensional Data Model is built by following a set of pre-decided steps
Every project that builds a Multidimensional Data Model should go through the following
stages :
Stage 1 : Assembling data from the client : In the first stage, correct data is collected from the
client. Typically, software professionals explain to the client the range of data that can be
obtained with the selected technology, and then collect the complete data in detail
Stage 2 : Grouping different segments of the system : In the second stage, all the data is
recognized and classified into the respective sections it belongs to, which makes the model
easier to apply step by step
Stage 3 : Identifying the different dimensions : The third stage forms the basis of the system's
design. In this stage, the main factors are identified from the user's point of view. These factors
are known as "Dimensions"
Stage 4 : Preparing the actual-time factors and their respective qualities : In the fourth stage, the
factors identified in the previous stage are used to identify their related qualities. These
qualities are known as "attributes" in the database
Stage 5 : Finding the facts for the factors listed previously and their qualities : In the fifth
stage, the facts are separated and differentiated from the factors collected so far. These facts
play a significant role in the arrangement of a Multidimensional Data Model
Stage 6 : Building the Schema to place the data, with respect to the information collected in the
steps above : In the sixth stage, a Schema is built on the basis of the data collected previously
Data Cube:
• A grouping of data in a multidimensional matrix is called a data cube
• In data warehousing, we generally deal with multidimensional data models, as the
data is represented by multiple dimensions and multiple attributes
• This multidimensional data is represented in the data cube, as the cube represents a
high-dimensional space
• The data cube pictorially shows how different attributes of data are arranged in the data model.
Below is the diagram of a general data cube
Data cube operations are used to manipulate data to meet the needs of users
These operations help to select particular data for analysis
Roll-up: This operation aggregates similar data attributes having the same dimension
together
• For example, if the data cube displays the daily income of a customer, we can use a roll-up
operation to find the customer's monthly income
Drill-down: This operation is the reverse of roll-up
• It allows us to take particular information and then subdivide it further for finer
granularity analysis
• For example, if India is an attribute of a country column and we wish to see villages in
India, then the drill-down operation splits India into states, districts, towns, cities, villages
and then displays the required information
Slicing: This operation filters out the unnecessary portions
• Suppose that in a particular dimension the user doesn't need everything for analysis, but only a
particular attribute
• For example, country="jamaica" will display only Jamaica and will not display the other
countries present on the country list.
Dicing: This operation does a multidimensional cut: it not only cuts one dimension but
can also go to another dimension and cut a certain range of it
• For example, the user wants to see the annual salary of Jharkhand state employees
Pivot: This operation is important for how the data is viewed
• It rotates the data cube to present a different view of it
• It doesn't change the data present in the data cube
• For example, if the user is comparing year versus branch, using the pivot operation, the user
can change the viewpoint and now compare branch versus item type
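The five operations above can be sketched in plain Python over a tiny in-memory cube. This is only a minimal illustration, not how an OLAP engine is implemented: the sample records, the dimension names (year, quarter, branch, item_type) and all helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact records: each row is one cell of a small sales cube.
CUBE = [
    {"year": 2023, "quarter": "Q1", "branch": "North", "item_type": "phone", "sales": 10},
    {"year": 2023, "quarter": "Q2", "branch": "North", "item_type": "phone", "sales": 20},
    {"year": 2023, "quarter": "Q1", "branch": "South", "item_type": "laptop", "sales": 30},
    {"year": 2024, "quarter": "Q1", "branch": "North", "item_type": "laptop", "sales": 40},
    {"year": 2024, "quarter": "Q2", "branch": "South", "item_type": "phone", "sales": 50},
]


def aggregate(rows, dims, measure="sales"):
    """Sum `measure` over `rows`, grouped by the dimensions listed in `dims`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def roll_up(rows):
    """Roll-up: drop the quarter level and aggregate to (year, branch)."""
    return aggregate(rows, ["year", "branch"])


def drill_down(rows):
    """Drill-down: go back to the finer (year, quarter, branch) level."""
    return aggregate(rows, ["year", "quarter", "branch"])


def slice_cube(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, conditions):
    """Dice: restrict several dimensions, each to a set of allowed values."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in conditions.items())]


def pivot(totals):
    """Pivot: swap the two axes of a 2-D aggregate; the values are unchanged."""
    return {(b, a): value for (a, b), value in totals.items()}
```

For instance, `roll_up(CUBE)` sums the two 2023 North quarters into one cell, `slice_cube(CUBE, "branch", "South")` keeps only the South rows, and `pivot(roll_up(CUBE))` re-keys the same totals by (branch, year) instead of (year, branch).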
Advantages of data cubes:
• In the presentation above, the factory's sales for Bangalore are shown along the time dimension,
which is organized into quarters, and the item dimension, which is sorted according to the
kind of item sold
• The facts here are represented in rupees (in thousands)
• Now, if we desire to view the sales data in three dimensions, it is
represented as in the diagram given below
• Here the three-dimensional sales data is represented as a series of two-dimensional tables
• Let us consider the data according to item, time and location (like Kolkata, Delhi, Mumbai).
Here is the table :
Star Schema
• A Star schema contains a fact table and multiple dimension tables
• Each dimension is represented by only one dimension table, and these tables are not normalized
• The Dimension table contains a set of attributes
Characteristics
• In a Star schema, there is only one fact table and multiple dimension tables
• There is a fact table at the center. It contains the keys to each of the four dimensions
• The fact table also contains the measures, namely dollars sold and units sold
• Note − Each dimension has only one dimension table and each table holds a set of attributes.
For example, the location dimension table contains the attribute set {location_key, street, city,
province_or_state, country}. This constraint may cause data redundancy.
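As a rough illustration of the star layout described above, the following sketch builds one fact table and four dimension tables in an in-memory SQLite database. Only the location attributes and the two measures come from the text; the other three dimensions (time_dim, item_dim, branch_dim), their columns, and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one denormalized table per dimension.
CREATE TABLE time_dim   (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim   (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street TEXT, city TEXT, province_or_state TEXT, country TEXT
);
-- Fact table at the center: one key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
""")

# Hypothetical sample rows. The province and country repeat in every
# location row, which is the redundancy the note above refers to.
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "1 Main St", "Vancouver", "British Columbia", "Canada"),
    (2, "2 Oak Ave", "Victoria", "British Columbia", "Canada"),
])
conn.execute("INSERT INTO time_dim VALUES (1, 'Q1', 2024)")
conn.execute("INSERT INTO item_dim VALUES (1, 'Laptop', 'computer')")
conn.execute("INSERT INTO branch_dim VALUES (1, 'B1')")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 1500.0, 3),
    (1, 1, 1, 2, 500.0, 1),
])

# A typical star join: aggregate the measures by a dimension attribute.
sales_by_country = conn.execute("""
    SELECT l.country, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.country
""").fetchall()
```

Every query needs at most one join per dimension, which is the main attraction of the star layout; the price is the repeated province and country values in location_dim.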
Snowflakes Schema
• Some dimension tables in the Snowflake schema are normalized
• The normalization splits up the data into additional tables as shown in the following illustration
• Unlike in the Star schema, the dimension tables in a Snowflake schema are normalized
• For example − The item dimension table of the star schema is normalized and split into two
dimension tables, namely the item and supplier tables
• Now the item dimension table contains the attributes item_key, item_name, type, brand, and
supplier_key
• The supplier key is linked to the supplier dimension table
• The supplier dimension table contains the attributes supplier_key and supplier_type
• Note − Due to the normalization in the Snowflake schema, the redundancy is reduced;
therefore, it becomes easier to maintain and saves storage space
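The item and supplier split described above can be sketched with SQLite as follows. The column names follow the text; the sample rows and the variable names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE supplier (
    supplier_key  INTEGER PRIMARY KEY,
    supplier_type TEXT
);
CREATE TABLE item (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key)
);
""")

# Hypothetical rows: two items share one supplier.
db.execute("INSERT INTO supplier VALUES (1, 'wholesale')")
db.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", [
    (1, "Laptop", "computer", "BrandA", 1),
    (2, "Mouse", "accessory", "BrandA", 1),
])

# supplier_type is stored once; items reach it through one extra join.
item_suppliers = db.execute("""
    SELECT i.item_name, s.supplier_type
    FROM item AS i
    JOIN supplier AS s ON s.supplier_key = i.supplier_key
    ORDER BY i.item_key
""").fetchall()
```

The supplier type is stored in a single row no matter how many items reference it, which is where the reduced redundancy comes from; the cost is the additional join at query time.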
Fact Constellation Schema (Galaxy Schema)
• A fact constellation has multiple fact tables
• It is also known as a Galaxy Schema
• The following illustration shows two fact tables, namely Sales and Shipping
• The sales fact table is the same as that in the Star Schema
• The shipping fact table has five dimensions, namely item_key, time_key, shipper_key,
from_location, to_location
• The shipping fact table also contains two measures, namely dollars sold and units sold
• It is also possible to share dimension tables between fact tables
• For example − Time, item, and location dimension tables are shared between the sales and
shipping fact tables
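A minimal sketch of two fact tables sharing one dimension table is given below, again using SQLite. To keep it short, only a shared time dimension and a single measure per fact table are modeled; the reduced column sets and the sample rows are assumptions, not the full schema from the illustration.

```python
import sqlite3

galaxy = sqlite3.connect(":memory:")
galaxy.executescript("""
-- One dimension table shared by both fact tables.
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales_fact (
    time_key   INTEGER REFERENCES time_dim(time_key),
    units_sold INTEGER
);
CREATE TABLE shipping_fact (
    time_key    INTEGER REFERENCES time_dim(time_key),
    shipper_key INTEGER,
    units_sold  INTEGER
);
""")

# Hypothetical rows, one time_key per year.
galaxy.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, 2023), (2, 2024)])
galaxy.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 5), (1, 7), (2, 4)])
galaxy.executemany("INSERT INTO shipping_fact VALUES (?, ?, ?)", [(1, 10, 6), (2, 10, 4)])

# Each fact table is aggregated on its own and lined up through the shared
# dimension, so rows of one fact table never multiply rows of the other.
per_year = galaxy.execute("""
    SELECT t.year,
           (SELECT SUM(units_sold) FROM sales_fact    WHERE time_key = t.time_key),
           (SELECT SUM(units_sold) FROM shipping_fact WHERE time_key = t.time_key)
    FROM time_dim AS t
    ORDER BY t.year
""").fetchall()
```

Because both fact tables reference the same time_dim rows, the sales and shipping figures can be compared period by period without any separate mapping between the two.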
Data Warehouse Architecture:
• A data warehouse architecture is a method of defining the overall architecture of data
communication, processing and presentation that exists for end-client computing within the
enterprise
• Each data warehouse is different, but all are characterized by standard vital components
• Production applications such as payroll, accounts payable, product purchasing and inventory
control are designed for online transaction processing (OLTP)
• Such applications gather detailed data from day-to-day operations
• Data warehouse applications are designed to support the user's ad-hoc data
requirements, an activity dubbed online analytical processing (OLAP)
There are two approaches to constructing a data warehouse:
• Top-down approach
• Bottom-up approach
Top-Down Approach:
The essential components are discussed below:
1) External Sources – An external source is a source from which data is collected, irrespective of
the type of data
• Data can be structured, semi-structured or unstructured
2) Stage Area – Since the data extracted from the external sources does not follow a particular
format, it needs to be validated before it is loaded into the data warehouse
• For this purpose, it is recommended to use an ETL tool
• E (Extract): Data is extracted from the external data source
• T (Transform): Data is transformed into the standard format
• L (Load): Data is loaded into the data warehouse after transforming it into the standard format
3) Data Warehouse – After cleansing, the data is stored in the data warehouse as the central
repository
• It actually stores the metadata, and the actual data gets stored in the data marts
• Note that the data warehouse stores the data in its purest form in this top-down approach
4) Data Marts – A data mart is also a part of the storage component
• It stores the information of a particular function of an organisation, which is handled by a single
authority
• There can be any number of data marts in an organisation, depending on its functions
• We can also say that a data mart contains a subset of the data stored in the data warehouse
5) Data Mining – The practice of analyzing the big data present in the data warehouse is data mining
• It is used to find the hidden patterns that are present in the database or in the data warehouse with
the help of data mining algorithms
• This approach is defined by Inmon as follows: the data warehouse is a central repository for the
complete organisation, and data marts are created from it after the complete data warehouse
has been created
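The top-down flow described above (external sources, staging with ETL, central warehouse, then data marts) can be sketched in a few lines of Python. This is an illustrative toy, not an ETL tool: the two source strings, the field names (id, dept, amount) and every function name are hypothetical.

```python
import csv
import io
import json

# Hypothetical external sources: the same kind of data arriving in two formats.
CSV_SOURCE = "id,dept,amount\n1,sales,100.50\n2,sales,abc\n"
JSON_SOURCE = '[{"id": 3, "dept": "hr", "amount": "75"}]'


def extract():
    """E: pull raw records from every source without interpreting them."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows


def transform(raw_rows):
    """T: validate each record and convert it to one standard format."""
    clean, rejected = [], []
    for row in raw_rows:
        try:
            clean.append({
                "id": int(row["id"]),
                "dept": str(row["dept"]),
                "amount": float(row["amount"]),
            })
        except (KeyError, ValueError):
            rejected.append(row)  # kept aside in the staging area for inspection
    return clean, rejected


def load(clean_rows, warehouse):
    """L: append the standardized records to the central warehouse."""
    warehouse.extend(clean_rows)
    return len(clean_rows)


def build_data_mart(warehouse, dept):
    """Top-down: a data mart is a subset of the warehouse for one function."""
    return [row for row in warehouse if row["dept"] == dept]


warehouse = []
clean_rows, rejected_rows = transform(extract())
loaded_count = load(clean_rows, warehouse)
sales_mart = build_data_mart(warehouse, "sales")
```

The record with the non-numeric amount fails validation in the transform step and never reaches the warehouse, and the sales data mart is derived only after the warehouse has been loaded.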
Advantages of Top-Down Approach –
• Since the data marts are created from the data warehouse, this approach provides a consistent
dimensional view of the data marts
• Also, this model is considered the strongest model for business changes
• That is why big organizations prefer to follow this approach
• Creating a data mart from the data warehouse is easy
Disadvantages of Top-Down Approach –
• The cost and time taken in designing and maintaining it are very high
Bottom-Up Approach:
First, the data is extracted from external sources (the same as in the top-down approach)
Then, the data goes through the staging area (as explained above) and is loaded into data marts
instead of the data warehouse
• The data marts are created first and provide reporting capability. Each data mart addresses a
single business area
These data marts are then integrated into the data warehouse
• This approach is given by Kimball as follows: data marts are created first and provide a thin view
for analysis, and the data warehouse is created after the complete data marts have been created
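The order of steps in the bottom-up approach can be sketched as follows. The staged records, the dept field used to identify a business area, and the function names are all hypothetical.

```python
# Hypothetical, already-staged records; `dept` marks the business area.
STAGED = [
    {"id": 1, "dept": "sales", "amount": 100.5},
    {"id": 2, "dept": "hr", "amount": 75.0},
    {"id": 3, "dept": "sales", "amount": 20.0},
]


def load_into_marts(staged_rows):
    """Step 1: load staged data straight into one data mart per business area."""
    marts = {}
    for row in staged_rows:
        marts.setdefault(row["dept"], []).append(row)
    return marts


def integrate(marts):
    """Step 2: integrate the finished data marts into a single warehouse."""
    warehouse = []
    for dept in sorted(marts):
        warehouse.extend(marts[dept])
    return warehouse


marts = load_into_marts(STAGED)
# Reporting works on a single mart before any warehouse exists.
sales_report = sum(row["amount"] for row in marts["sales"])
warehouse = integrate(marts)
```

The sales report is available as soon as the sales mart is loaded, which reflects the quick reporting this approach is known for; the warehouse only appears in the final integration step.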
Advantages of Bottom-Up Approach:
1. As the data marts are created first, reports are generated quickly
2. More data marts can be accommodated here, and in this way the data warehouse can
be extended
3. Also, the cost and time taken in designing this model are comparatively low
Disadvantage of Bottom-Up Approach:
1. This model is not as strong as the top-down approach, as the dimensional view of the data marts
is not as consistent as it is in the top-down approach
Metadata :
• Metadata is simply defined as data about data
• For example, the index of a book serves as metadata for the contents of the book
• In other words, we can say that metadata is the summarized data that leads us to detailed data
• Metadata in a data warehouse defines the warehouse objects and it acts as a directory
• This directory helps the decision support system to locate the contents of a data warehouse
Categories of Metadata
Metadata can be broadly categorized into three categories −
Business Metadata − It has the data ownership information, business definition, and changing
policies
Technical Metadata − It includes database system names, table and column names and sizes, data
types and allowed values. Technical metadata also includes structural information such as
primary and foreign key attributes and indices
Operational Metadata − It includes currency of data and data lineage. Currency of data means
whether the data is active, archived, or purged. Lineage of data means the history of the data's
migration and the transformations applied to it
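One way to picture the three categories, and the directory role of metadata, is a small Python sketch like the one below. The class names, field names and the sample entry are hypothetical and are not taken from any particular metadata tool.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    # Business metadata
    owner: str
    business_definition: str
    # Technical metadata
    columns: dict       # column name -> data type
    primary_key: str
    # Operational metadata
    currency: str = "active"                      # active, archived, or purged
    lineage: list = field(default_factory=list)   # migrations and transformations


class MetadataRepository:
    """Acts as a directory: maps a warehouse object name to its metadata."""

    def __init__(self):
        self._entries = {}

    def register(self, name, meta):
        self._entries[name] = meta

    def locate(self, name):
        return self._entries[name]

    def tables_with_column(self, column):
        return sorted(n for n, m in self._entries.items() if column in m.columns)


repo = MetadataRepository()
repo.register("location_dim", TableMetadata(
    owner="Sales department",
    business_definition="One row per store location",
    columns={"location_key": "INTEGER", "street": "TEXT", "city": "TEXT"},
    primary_key="location_key",
    lineage=["extracted from the operational store system", "addresses standardized"],
))
```

A decision support tool could then call `repo.tables_with_column("city")` to find where city data lives, or read `repo.locate("location_dim").lineage` to see how the data reached the warehouse.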
Role of Metadata
• Metadata has a very important role in a data warehouse
• The role of metadata in a warehouse is different from that of the warehouse data, yet it is
just as important
The various roles of metadata are explained below.
• Metadata acts as a directory
• This directory helps the decision support system to locate the contents of the data warehouse
• Metadata helps the decision support system map data when it is transformed from the
operational environment to the data warehouse environment
• Metadata helps in summarization between current detailed data and highly summarized data
• Metadata also helps in summarization between lightly summarized data and highly summarized
data
• Metadata is used for query tools
• Metadata is used in extraction and cleansing tools
• Metadata is used in reporting tools
• Metadata is used in transformation tools
• Metadata plays an important role in loading functions
The following diagram shows the roles of metadata
Metadata Repository
A metadata repository is an integral part of a data warehouse system. It holds the following metadata:
• Definition of data warehouse − It includes the description of the structure of the data warehouse. The
description is defined by schema, view, hierarchies, derived data definitions, and data mart locations
and contents
• Business metadata − It contains the data ownership information, business definition, and changing
policies
• Operational metadata − It includes currency of data and data lineage. Currency of data means whether
the data is active, archived, or purged. Lineage of data means the history of the data's migration and
the transformations applied to it
• Data for mapping from operational environment to data warehouse − It includes the source databases and
their contents, data extraction, data partition, cleaning, transformation rules, data refresh and
purging rules
• Algorithms for summarization − It includes dimension algorithms, data on granularity, aggregation,
summarizing, etc.
Challenges for Metadata Management
• The importance of metadata cannot be overstated. Metadata helps in driving the accuracy of
reports, validates data transformation, and ensures the accuracy of calculations. Metadata also
enforces the definition of business terms for business end-users. Alongside all these uses,
metadata also has its challenges. Some of the challenges are discussed below
• Metadata in a big organization is scattered across the organization. It is spread across
spreadsheets, databases, and applications
• Metadata could be present in text files or multimedia files. To use this data for information
management solutions, it has to be correctly defined
• There are no industry-wide accepted standards. Data management solution vendors have a
narrow focus
• There are no easy and accepted methods of passing metadata
OLAP Hierarchical Structure / Types of OLAP
ROLAP
ROLAP works with data that exists in a relational database
Facts and dimension tables are stored as relational tables
It also allows multidimensional analysis of data and is the fastest-growing type of OLAP
Advantages of ROLAP model:
• High data efficiency. It offers high data efficiency because query performance and access language
are optimized particularly for multidimensional data analysis
• Scalability. This type of OLAP system offers scalability for managing large volumes of data,
even when the data is steadily increasing
Drawbacks of ROLAP model:
• Aggregate data limitations. ROLAP tools use SQL for all calculations on aggregate data
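To make the reliance on SQL concrete, the sketch below computes aggregates at query time with GROUP BY over relational fact and dimension tables in SQLite. The table names, columns, sample rows and the helper function are hypothetical; the column names passed to the helper are trusted constants here, not user input.

```python
import sqlite3

rolap = sqlite3.connect(":memory:")
rolap.executescript("""
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE sales_fact (branch_key INTEGER, year INTEGER, amount INTEGER);
""")
rolap.executemany("INSERT INTO branch_dim VALUES (?, ?)", [(1, "North"), (2, "South")])
rolap.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
    (1, 2023, 10), (1, 2023, 20), (2, 2023, 30), (1, 2024, 40),
])


def totals_by(*columns):
    """Compute an aggregate at query time with a plain SQL GROUP BY."""
    cols = ", ".join(columns)
    sql = f"""
        SELECT {cols}, SUM(f.amount)
        FROM sales_fact AS f
        JOIN branch_dim AS b ON b.branch_key = f.branch_key
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return rolap.execute(sql).fetchall()


by_branch_year = totals_by("b.branch_name", "f.year")  # detailed level
by_branch = totals_by("b.branch_name")                 # rolled up over year
```

Nothing is precomputed in this sketch: every level of aggregation is a new SQL query against the relational tables, so whatever SQL cannot express conveniently becomes a limitation of the tool.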
HOLAP
• HOLAP (Hybrid OLAP) offers the fast computation of MOLAP and the higher scalability of ROLAP.
HOLAP uses two databases