Data Warehouse Concepts
Components
What is a Data Warehouse?
A data warehouse is an information system that contains historical and cumulative
data from single or multiple sources. It simplifies the reporting and analysis process of
the organization.
It also serves as a single version of the truth that a company can use for decision
making and forecasting.
A data warehouse has four defining characteristics:
• Subject-Oriented
• Integrated
• Time-variant
• Non-volatile
Subject-Oriented
A data warehouse is subject-oriented because it offers information about a theme
rather than about a company's ongoing operations. These subjects can be sales,
marketing, distribution, and so on.
Integrated
In a data warehouse, integration means establishing a common unit of
measure for all similar data coming from dissimilar databases. The data also needs to be
stored in the data warehouse in a common and universally acceptable manner.
Consider, for example, three different applications labeled A, B, and C.
The information stored in these applications is Gender, Date, and Balance. However,
each application stores its data in a different way.
After the transformation and cleaning process, all of this data is stored in a
common format in the data warehouse.
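The transformation step described above can be sketched in Python. This is only a minimal illustration: the source encodings for gender, the date formats, and the function names are assumptions made for the example, since the text does not specify how applications A, B, and C actually store their data.

```python
from datetime import datetime

# Hypothetical gender encodings used by the three source applications.
GENDER_MAP = {
    "m": "M", "f": "F",          # application A (assumed)
    "1": "M", "0": "F",          # application B (assumed)
    "male": "M", "female": "F",  # application C (assumed)
}

# Hypothetical date formats used by the source applications.
DATE_FORMATS = ("%d/%m/%Y", "%Y%m%d")


def normalize_gender(value):
    """Map any known source encoding to the warehouse codes 'M' or 'F'."""
    return GENDER_MAP[str(value).strip().lower()]


def normalize_date(value):
    """Try each known source format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def normalize_record(record):
    """Convert one source record to the common warehouse format."""
    return {
        "gender": normalize_gender(record["gender"]),
        "date": normalize_date(record["date"]),
        "balance": round(float(record["balance"]), 2),
    }
```

Whatever the source looked like, every record that passes through `normalize_record` ends up with the same gender codes, the same date format, and a numeric balance.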
Time-Variant
The time horizon of a data warehouse is quite extensive compared with that of
operational systems. The data collected in a data warehouse is identified with a
particular period and offers information from a historical point of view. It contains an
element of time, explicitly or implicitly.
One place where data warehouse data displays time variance is in the
structure of the record key. Every primary key contained in the data warehouse should
have, either implicitly or explicitly, an element of time, such as the day, week, or month.
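A key with an explicit time element might look like the following Python sketch. The names `SalesKey`, `product_id`, and `snapshot_date`, and the sample figures, are hypothetical and chosen only to illustrate the idea.

```python
from datetime import date
from typing import NamedTuple


class SalesKey(NamedTuple):
    product_id: int      # hypothetical business identifier
    snapshot_date: date  # explicit time element of the key


# The same product appears once per period, so history is preserved.
facts = {
    SalesKey(101, date(2024, 1, 31)): {"units_sold": 40},
    SalesKey(101, date(2024, 2, 29)): {"units_sold": 55},
}


def history(facts, product_id):
    """Return (date, measures) pairs for one product, oldest first."""
    rows = [(k.snapshot_date, v) for k, v in facts.items()
            if k.product_id == product_id]
    return sorted(rows, key=lambda row: row[0])
```

Because the date is part of the key, loading a new period adds a row instead of replacing the previous one.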
Non-volatile
A data warehouse is also non-volatile, which means that previous data is not erased
when new data is entered.
Data is read-only and periodically refreshed. This helps in analyzing historical
data and understanding what happened and when. It does not require transaction
processing, recovery, or concurrency control mechanisms.
Activities such as delete, update, and insert, which are performed in an operational
application environment, are omitted in the data warehouse environment. Only two
types of data operations are performed in data warehousing:
1. Data loading
2. Data access
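The two operations above can be sketched as an append-only table in Python. The class and method names are assumptions made for illustration; a real warehouse would enforce this behavior at the database level.

```python
class WarehouseTable:
    """Append-only table: rows can be loaded and read, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        """Data loading: append a batch of rows; existing rows are kept."""
        self._rows.extend(dict(row) for row in batch)

    def query(self, predicate=lambda row: True):
        """Data access: return copies of the matching rows (read-only)."""
        return [dict(row) for row in self._rows if predicate(row)]
```

The class deliberately offers no update or delete method, and `query` returns copies, so callers cannot modify stored rows by accident.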
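Here are some major differences between an operational application and a data warehouse:

Operational Application                       | Data Warehouse
----------------------------------------------|------------------------------------------
Complex programs must be coded to make sure   | This kind of issue does not arise because
that data update processes maintain high      | data updates are not performed.
integrity of the final product.               |
----------------------------------------------|------------------------------------------
Data is placed in a normalized form to        | Data is not stored in normalized form.
ensure minimal redundancy.                    |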
Single-tier architecture
The objective of a single layer is to minimize the amount of data stored. The goal is
to remove data redundancy. This architecture is not frequently used in practice.
Two-tier architecture
Three-tier architecture
1. Bottom Tier: The database of the data warehouse serves as the bottom tier.
It is usually a relational database system. Data is cleansed, transformed, and
loaded into this layer using back-end tools.
2. Middle Tier: The middle tier in a data warehouse is an OLAP server, which is
implemented using either the ROLAP or the MOLAP model. For a user, this
application tier presents an abstracted view of the database. This layer also
acts as a mediator between the end user and the database.
3. Top Tier: The top tier is a front-end client layer. It consists of the tools and APIs
that you use to connect to the data warehouse and get data out of it. These can be
query tools, reporting tools, managed query tools, analysis tools, and data
mining tools.
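The three tiers can be sketched in a few lines of Python using the standard `sqlite3` module. This is a toy illustration under stated assumptions: the `sales` table, its columns, and the sample rows are hypothetical, and a simple aggregating function stands in for the OLAP server.

```python
import sqlite3


# Bottom tier: a relational database holding cleansed, loaded data.
def build_bottom_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("East", 2023, 100.0), ("East", 2024, 150.0), ("West", 2024, 80.0)],
    )
    return conn


# Middle tier: presents an abstracted, aggregated view of the database.
def total_by_region(conn, year):
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales "
        "WHERE year = ? GROUP BY region ORDER BY region",
        (year,),
    )
    return dict(rows.fetchall())


# Top tier: a front-end client that formats results for the user.
def print_report(conn, year):
    for region, total in total_by_region(conn, year).items():
        print(f"{region}: {total:.2f}")
```

The front end never issues SQL itself; it only calls the middle-tier function, which mirrors the mediator role described above.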
Datawarehouse Components
These ETL tools have to deal with the challenges of database and data heterogeneity.
Metadata
The name metadata suggests some high-level technological concept. However, it
is quite simple. Metadata is data about data, and it defines the data warehouse. It is
used for building, maintaining, and managing the data warehouse.
This is a meaningless data until we consult the Meta that tell us it was
Therefore, metadata is an essential ingredient in the transformation of data into
knowledge.
• What tables, attributes, and keys does the Data Warehouse contain?
• Where did the data come from?
• How many times does the data get reloaded?
• What transformations were applied during cleansing?
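The questions above suggest what a metadata record might hold. The following Python sketch is one possible shape for such a record; the class name, field names, and method are assumptions made for illustration, not a standard metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    table_name: str
    attributes: list            # which attributes the table contains
    keys: list                  # which keys the table contains
    source_system: str          # where the data came from
    reload_count: int = 0       # how many times the data was reloaded
    transformations: list = field(default_factory=list)  # applied during cleansing

    def record_reload(self, transformation_notes):
        """Register one reload and the transformations applied in it."""
        self.reload_count += 1
        self.transformations.extend(transformation_notes)
```

A hypothetical usage would be to create one `TableMetadata` per warehouse table and call `record_reload` at the end of each load job.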
Query Tools
One of the primary objectives of data warehousing is to provide information that
businesses can use to make strategic decisions. Query tools allow users to interact
with the data warehouse system.
• Reporting tools
• Managed query tools
Reporting tools: Reporting tools can be further divided into production reporting
tools and desktop report writers.
1. Report writers: These reporting tools are designed for end users to carry out
their own analysis.
2. Production reporting: These tools allow organizations to generate
regular operational reports. They also support high-volume batch jobs such as
printing and calculating. Some popular reporting tools are Brio, Business
Objects, Oracle, PowerSoft, and SAS Institute.
Sometimes built-in graphical and analytical tools do not satisfy the analytical
needs of an organization. In such cases, custom reports are developed using
Application development tools.
4. OLAP tools:
While designing a data bus, one needs to consider the dimensions and facts shared
across data marts.
Data Marts
A data mart is an access layer that is used to get data out to the users. It is
presented as an alternative to a large data warehouse because it takes less time and
money to build. However, there is no standard definition of a data mart; it differs
from person to person.
Put simply, a data mart is a subsidiary of a data warehouse. It holds a partition
of the data that is created for a specific group of users.
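The idea of partitioning warehouse data for one group of users can be sketched as follows. The `subject` field and the function name are hypothetical; in practice a data mart is usually a separate schema or database, not an in-memory list.

```python
def build_data_mart(warehouse_rows, subject):
    """Return copies of the warehouse rows that belong to one subject area."""
    return [dict(row) for row in warehouse_rows if row["subject"] == subject]


# Hypothetical warehouse content spanning two subject areas.
warehouse_rows = [
    {"subject": "sales", "region": "East", "amount": 150.0},
    {"subject": "marketing", "campaign": "spring", "cost": 30.0},
    {"subject": "sales", "region": "West", "amount": 80.0},
]

sales_mart = build_data_mart(warehouse_rows, "sales")
```

Here `sales_mart` holds only the two sales rows, which is the slice a sales team would query.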
• Use a data model that is optimized for information retrieval. This can be
a dimensional model, a denormalized model, or a hybrid approach.
• Make sure that data is processed quickly and accurately. At the same
time, take an approach that consolidates data into a single
version of the truth.
• Carefully design the data acquisition and cleansing process for the data
warehouse.
• Design a metadata architecture that allows metadata to be shared between
components of the data warehouse.
• Consider implementing an operational data store (ODS) model when the information
retrieval need is near the bottom of the data abstraction pyramid or when multiple
operational sources need to be accessed.
• Make sure that the data model is integrated and not just
consolidated. In that case, consider a 3NF data model. It is also
well suited to acquiring ETL and data cleansing tools.
Summary: