
WA Data Warehouse

A data warehouse is a digital storage system that connects data from multiple sources to provide a single source of truth for an organization. It stores current and historical data to power business intelligence, reporting, analytics, and decision-making. Modern data warehouses can handle both structured and unstructured data from various sources like IoT devices and social media to provide real-time access and insights. A well-designed data warehouse is foundational for any successful analytics or BI program by consolidating data to power reports, dashboards, and tools that support data-driven decisions.

Uploaded by

Mohammed Kemal

A data warehouse (DW) is a digital storage system that connects and harmonizes large
amounts of data from many different sources. Its purpose is to feed business intelligence
(BI), reporting, and analytics, and support regulatory requirements – so companies can turn
their data into insight and make smart, data-driven decisions. Data warehouses store
current and historical data in one place and act as the single source of truth for an
organization.
 
Data flows into a data warehouse from operational systems (like ERP and CRM),
databases, and external sources such as partner systems, Internet of Things (IoT)
devices, weather apps, and social media – usually on a regular cadence. The emergence
of cloud computing has shifted this landscape. In recent years, data storage has moved
from traditional on-premise infrastructure alone to multiple locations, including
on-premise, private cloud, and public cloud.
 
Modern data warehouses are designed to handle both structured and unstructured data,
like videos, image files, and sensor data. Some leverage integrated analytics and in-
memory database technology (which holds the data set in computer memory rather than in
disk storage) to provide real-time access to trusted data and drive confident decision-
making. Without data warehousing, it’s very difficult to combine data from heterogeneous
sources, ensure it’s in the right format for analytics, and get both a current and long-range
view of data over time.

Benefits of data warehousing


A well-designed data warehouse is the foundation for any successful BI or analytics
program. Its main job is to power the reports, dashboards, and analytical tools that have
become indispensable to businesses today. A data warehouse provides the information for
your data-driven decisions – and helps you make the right call on everything from new
product development to inventory levels. There are many benefits of a data warehouse.
Here are just a few:

• Better business analytics: With data warehousing, decision-makers have access
to data from multiple sources and no longer have to make decisions based on
incomplete information.
• Faster queries: Data warehouses are built specifically for fast data retrieval and
analysis. With a DW, you can very rapidly query large amounts of consolidated data
with little to no support from IT.
• Improved data quality: Before data is loaded into the DW, the system creates data
cleansing cases and enters them in a worklist for further processing. This ensures data
is transformed into a consistent format, so analytics and decisions are based
on high-quality, accurate data.
• Historical insight: By storing rich historical data, a data warehouse lets decision-
makers learn from past trends and challenges, make predictions, and drive
continuous business improvement.

Data warehouse screenshot showing data lineage.

What can a data warehouse store?
When data warehouses first became popular in the late 1980s, they were designed to store
information about people, products, and transactions. This data – called structured data –
was neatly organized and formatted for easy access. However, businesses soon wanted to
store, retrieve, and analyze unstructured data – such as documents, images, videos,
emails, social media posts, and raw data from machine sensors.
 
A modern data warehouse can accommodate both structured and unstructured data. By
merging these data types and breaking down silos between the two, businesses can get a
complete, comprehensive picture for the most valuable insights.

Some key terms


There are lots of terms to make sense of in the world of DW. Here are some of the most
important.
 

Data warehouse vs. database


 
Databases and data warehouses are both data storage systems; however, they serve
different purposes. A database usually stores data for a particular business area. A data
warehouse stores current and historical data for the entire business and feeds BI and
analytics. Data warehouses use a database server to pull in data from an organization’s
databases and add functionality for data modeling, data lifecycle management,
data source integration, and more.
 

Data warehouse vs. data lake


 
Both data warehouses and data lakes are used for storing Big Data, but they are very
different storage systems. A data warehouse stores data that has been formatted for a
specific purpose, whereas a data lake stores data in its raw, unprocessed state – the
purpose of which has not yet been defined. Data warehouses and lakes often complement
each other. For example, when raw data stored in a lake is needed to answer a business
question, it can be extracted, cleaned, transformed, and used in a data warehouse for
analysis. The volume of data, database performance, and storage pricing play an important
role in helping you choose the right storage solution.
Diagram of a data warehouse compared with a data lake.
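
As a rough illustration of the lake-to-warehouse step described above, the sketch below treats a list of raw JSON strings as a stand-in for a data lake and cleans them into fixed-schema rows ready for analysis. The field names (device, temp_c, ts) and the function to_warehouse_rows are hypothetical, chosen only for this example; a real pipeline would normally use dedicated integration tooling.

```python
import json

# Hypothetical raw records as they might sit in a data lake: unprocessed,
# inconsistent, and with no agreed schema yet.
RAW_LAKE_RECORDS = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00"}',
    '{"device": "SENSOR-2", "temp_c": 19, "ts": "2024-01-01T00:05:00"}',
    '{"device": "sensor-3", "ts": "2024-01-01T00:10:00"}',  # missing reading
    'not valid json',
]


def to_warehouse_rows(raw_records):
    """Extract, clean, and transform raw records into (device, temp_c, ts) rows."""
    rows = []
    for raw in raw_records:
        try:
            record = json.loads(raw)          # extract
        except json.JSONDecodeError:
            continue                          # clean: drop unparseable input
        if not isinstance(record, dict) or "temp_c" not in record:
            continue                          # clean: drop incomplete records
        rows.append((
            record["device"].lower(),         # transform: consistent naming
            float(record["temp_c"]),          # transform: consistent type
            record["ts"],
        ))
    return rows


if __name__ == "__main__":
    for row in to_warehouse_rows(RAW_LAKE_RECORDS):
        print(row)
```

The raw list is left untouched, which mirrors the idea that the lake keeps data in its original state while the warehouse holds the formatted copy.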

Data warehouse vs. data mart 


 
A data mart is a subsection of a data warehouse, partitioned specifically for a department or
line of business – like sales, marketing, or finance. Some data marts are created for
standalone operational purposes as well. While a data warehouse serves as the central
data store for an entire company, a data mart serves relevant data to a select group of
users. This simplifies data access, speeds up analysis, and gives them control over their
own data. Multiple data marts are often deployed within a data warehouse.

Diagram of a data mart and how it works.
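
One simple way to picture a data mart living inside a warehouse is a database view that exposes only one department's slice of a shared table. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse; the table, view, and column names (expenses, sales_mart, and so on) are assumptions made for illustration, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the central warehouse
conn.execute(
    "CREATE TABLE expenses (department TEXT, vendor TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?, ?)",
    [
        ("sales", "Acme", "EMEA", 1200.0),
        ("sales", "Globex", "APAC", 800.0),
        ("marketing", "Acme", "EMEA", 500.0),
        ("finance", "Initech", "AMER", 300.0),
    ],
)

# The "data mart": a view exposing only the sales department's slice.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT vendor, region, amount FROM expenses WHERE department = 'sales'"
)


def sales_spend_by_region(connection):
    """Query the mart only; other departments' rows are never visible here."""
    cursor = connection.execute(
        "SELECT region, SUM(amount) FROM sales_mart GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


print(sales_spend_by_region(conn))
```

Because the view is defined on top of the warehouse table, the sales team sees a smaller, simpler data set while the company keeps a single central store.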

What are the key components of a data warehouse?
A typical data warehouse has four main components: a central database, ETL (extract,
transform, load) tools, metadata, and access tools. All of these components are
engineered for speed so that you can get results quickly and analyze data on the fly.

Diagram showing the components of a data warehouse.

1. Central database: A database serves as the foundation of your data warehouse.
Traditionally, these have been standard relational databases running on premise or
in the cloud. But because of Big Data, the need for true, real-time performance, and
a drastic reduction in the cost of RAM, in-memory databases are rapidly gaining in
popularity.
2. Data integration: Data is pulled from source systems and modified to align the
information for rapid analytical consumption using a variety of data integration
approaches such as ETL (extract, transform, load) and ELT as well as real-time data
replication, bulk-load processing, data transformation, and data quality and
enrichment services.
3. Metadata: Metadata is data about your data. It specifies the source, usage, values,
and other features of the data sets in your data warehouse. There is business
metadata, which adds context to your data, and technical metadata, which describes
how to access data – including where it resides and how it is structured.
4. Data warehouse access tools: Access tools allow users to interact with the data in
your data warehouse. Examples of access tools include: query and reporting tools,
application development tools, data mining tools, and OLAP tools.
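
The four components above can be sketched in a few lines, assuming an in-memory SQLite database as the central database, two hypothetical source extracts (CRM_ROWS and ERP_ROWS) with different naming conventions, a plain dictionary as technical metadata, and an SQL query standing in for an access tool. None of these names come from a particular product; they only show how the pieces relate.

```python
import sqlite3

# Hypothetical source extracts with different naming conventions.
CRM_ROWS = [{"CustomerName": " Ada Lovelace ", "Country": "uk"}]
ERP_ROWS = [{"cust_nm": "Grace Hopper", "ctry_cd": "US"}]


def transform(row, name_key, country_key, source):
    """Map a source row onto the warehouse's single, consistent format."""
    return (row[name_key].strip(), row[country_key].upper(), source)


def run_etl(conn):
    """Extract from both sources, transform, and load into the central database."""
    conn.execute("CREATE TABLE customer (name TEXT, country TEXT, source TEXT)")
    staged = [transform(r, "CustomerName", "Country", "crm") for r in CRM_ROWS]
    staged += [transform(r, "cust_nm", "ctry_cd", "erp") for r in ERP_ROWS]
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", staged)
    # Technical metadata: where the data resides and how it is structured.
    return {
        "table": "customer",
        "columns": ["name", "country", "source"],
        "sources": ["crm", "erp"],
        "row_count": len(staged),
    }


conn = sqlite3.connect(":memory:")
metadata = run_etl(conn)
# Access-tool stand-in: a plain SQL query over the loaded data.
loaded = conn.execute(
    "SELECT name, country, source FROM customer ORDER BY name"
).fetchall()
print(metadata["row_count"], loaded)
```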
Data warehouse architecture
In the past, data warehouses operated in layers that matched the flow of the business data.

Diagram of data warehouse architecture. A typical data warehouse includes the three
separate layers described below. Today, modern data warehouses combine OLTP and OLAP in a
single system.

• Data layer: Data is extracted from your sources and then transformed and loaded
into the bottom tier using ETL tools. The bottom tier consists of your database
server, data marts, and data lakes. Metadata is created in this tier – and data
integration tools, like data virtualization, are used to seamlessly combine and
aggregate data.
• Semantics layer: In the middle tier, online analytical processing (OLAP) and online
transactional processing (OLTP) servers restructure the data for fast, complex
queries and analytics.
• Analytics layer: The top tier is the front-end client layer. It holds the data warehouse
access tools that let users interact with data, create dashboards and reports, monitor
KPIs, mine and analyze data, build apps, and more. This tier often includes a
workbench or sandbox area for data exploration and new data model development.
 
Data warehouses have been designed to support decision making and have been primarily
built and maintained by IT teams, but over the past few years they have evolved to
empower business users – reducing their reliance on IT to get access to the data and derive
actionable insights. A few key data warehousing capabilities that have empowered business
users are:

1. The semantic or business layer provides natural language phrases and allows
everyone to instantly understand data, define relationships between elements in the
data model, and enrich data fields with new business information.
2. Virtual workspaces allow teams to bring data models and connections into one
secured and governed place, supporting better collaboration with colleagues through
one common space and one common data set.
3. The cloud has further improved decision making by giving employees everywhere
a rich set of tools and features to easily perform data analysis tasks. They can
connect new apps and data sources without much IT support.

Top seven benefits of a cloud data warehouse
Cloud-based data warehouses are rising in popularity – for good reason. These modern
warehouses offer several advantages over traditional, on-premise versions. Here are the
top seven benefits of a cloud data warehouse:  

1. Quick to deploy: With cloud data warehousing, you can purchase nearly unlimited
computing power and data storage in just a few clicks – and you can build your own
data warehouse, data marts, and sandboxes from anywhere, in minutes.
2. Low total cost of ownership (TCO): Data warehouse-as-a-service (DWaaS) pricing
models are set up so you only pay for the resources you need, when you need them.
You don’t have to forecast your long-term needs or pay for more compute throughout
the year than necessary. You can also avoid upfront costs like expensive hardware,
server rooms, and maintenance staff. Separating the storage pricing from the
computing pricing also gives you a way to drive down the costs.
3. Elasticity: With a cloud data warehouse, you can dynamically scale up or down as
needed. The cloud provides a virtualized, highly distributed environment that can
manage huge volumes of data and grow or shrink with demand.
4. Security and disaster recovery: In many cases, cloud data warehouses actually
provide stronger data security and encryption than on-premise DWs. Data is also
automatically duplicated and backed up, so you can minimize the risk of lost data.
5. Real-time technologies: Cloud data warehouses built on in-memory database
technology can provide extremely fast data processing speeds to deliver real-time
data for instantaneous situational awareness.
6. New technologies: Cloud data warehouses allow you to easily integrate
new technologies such as machine learning, which can provide a guided
experience for business users and decision support, for example in the
form of recommended questions to ask.
7. Empower business users: Cloud data warehouses empower employees equally
and globally with a single view of data from numerous sources and a rich set of tools
and features to easily perform data analysis tasks. They can connect new apps and
data sources without IT.
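
To make the pay-per-use point in benefit 2 concrete, here is a small worked calculation, assuming storage and compute are billed separately. The rates and usage figures are invented for illustration and do not reflect any vendor's actual pricing.

```python
def monthly_cost(storage_tb, compute_hours, storage_rate_per_tb, compute_rate_per_hour):
    """Pay-per-use bill when storage and compute are priced separately."""
    return storage_tb * storage_rate_per_tb + compute_hours * compute_rate_per_hour


# Hypothetical rates, for illustration only.
STORAGE_RATE = 20.0   # currency units per TB per month
COMPUTE_RATE = 2.0    # currency units per compute hour

# Same stored volume in both months; only compute usage differs.
quiet_month = monthly_cost(10, 50, STORAGE_RATE, COMPUTE_RATE)
busy_month = monthly_cost(10, 400, STORAGE_RATE, COMPUTE_RATE)
print(quiet_month, busy_month)
```

Under these assumed numbers the storage charge stays constant while the compute charge follows actual usage, which is why a quiet month costs less than a busy one without any change to the data kept.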

Data warehousing supports comprehensive analytics of company expenses by department,
vendors, region, and status, to name a few.

Data warehousing best practices


When you build a new data warehouse or add new applications to an existing warehouse,
there are proven steps for achieving your goals while saving time and money. Some are
focused on your business use, and other practices are part of your overall IT program. The
following list is a good starting point, and you will pick up additional best practices as you
work with your technology and services partners. 
Business Best Practices

• Define the information you require. Once you have a good understanding of your
initial needs, you can find the data sources to support them. Often, trade groups,
customers, and suppliers will have data recommendations for you.
• Document the location, structure, and quality of your current data. Then, you can
identify data gaps and business rules for transforming the data to meet your
warehouse requirements.
• Build a team. This includes executive sponsors, managers, and staff who will
be using and providing the information. For example, identify the standard reporting
and KPIs they need to do their jobs.
• Prioritize your data warehouse applications. Pick one or two pilot projects that have
reasonable requirements and good business value.
• Pick a strong data warehouse technology partner. They must have
the implementation services and experience needed for your projects. Make sure
that they support your deployment needs, including both cloud services and
on-premise options.
• Develop a good project plan. Work with your team on a realistic blueprint and
schedule that supports communications and status reporting.

IT Best Practices

• Monitor performance and security. The information in your data warehouse is
valuable, though it must be readily accessible to provide value to the organization.
Monitor system usage carefully to ensure that performance levels are high.
• Maintain data quality standards, metadata, structure, and governance. New sources
of valuable data are becoming available routinely, but they require consistent
management as part of a data warehouse. Follow procedures for data cleaning,
defining metadata, and meeting governance standards.
• Provide an agile architecture. As your corporate and business unit usage increases,
you will discover a wide range of data mart and warehouse needs. A flexible platform
will support them far better than a limited, restrictive product.
• Automate processes such as maintenance. In addition to adding value to business
intelligence, machine learning can automate data warehouse technical management
functions to maintain speed and reduce operating costs.
• Use the cloud strategically. Business units and departments have different
deployment needs. Use on-premise systems when required, and capitalize
on cloud data warehouses for scalability, reduced cost, and phone and tablet
access.

In summary
Modern data warehouses, and increasingly cloud data warehouses, will be a key part of any
digital transformation initiative for parent companies and their business units. They
capitalize on current business systems, particularly when you combine data from multiple
internal systems with new, important information from outside organizations. 
 
Dashboards, KPIs, alerts, and reporting support executive, management, and staff
requirements, as well as important customer and supplier needs. Data warehouses also
provide fast, complex data mining and analytics, and they don’t disrupt the performance of
other business systems. 
 
Given the flexibility to start small and expand as needed, both corporate offices and
business units can improve decision-making and bottom-line performance with modern data
warehouse technology.

The effective, efficient, and economic management of data is essential for an
organization’s success. Data supports people’s expert opinions and feeds the
decisions made with emerging technologies such as machine learning, business
intelligence, and artificial intelligence solutions.
The practice of managing historical and cumulative data can be difficult. Data
collected from numerous sources, in various formats, and following different naming
conventions presents a challenging situation for the organization. This makes it
difficult to give data a consistent meaning and to provide accessibility to people and
applications that use the data to make decisions. A data warehouse architecture can
help with these complexities.

What is Data Warehouse Architecture?
Data warehouse architecture consists of planning, designing, constructing, and
managing daily operational processes for how data is used for organizational
intelligence and decision support. A data warehouse architecture helps create a single
source of truth for large volumes of data derived from various and different data
sources. Data is then transformed into information and information is transformed into
knowledge for analytics within the data warehouse architecture.

The data lifecycle includes data collection from identified sources, data integrity
management and reconciliation, data storage, data transfer, and continuous
improvement of data relative to organizational maturity, analytics, and decision needs.
The data warehouse architecture must support these activities and other aspects of
data lifecycle management.

Data warehouse architectures are usually designed to be stakeholder-oriented, such as
for sales, marketing, and others. Although using common data, each stakeholder has
different modeling and data analysis needs for their decisions. This includes people
using various tools as well as how technologies or applications consume data for
translating the data to information and decisions.

Types of Data Warehouse Architectures
It is not a good practice to support analytical processing with a transactional database
because of performance challenges. Transactional databases are optimized for
processing huge volumes of transactions in real-time while analytical databases are
optimized for long-running, resource-intensive queries. For this reason, transactional
data should be an input to the data warehouse database rather than supporting both
transactional and analytic needs.
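
The separation described above can be sketched with two in-memory SQLite databases: one standing in for the transactional system, which only holds the current state, and one standing in for the warehouse, which receives dated snapshots as input. The table names, the load_snapshot helper, and the dates are hypothetical and serve only to illustrate the data flow.

```python
import sqlite3

oltp = sqlite3.connect(":memory:")       # transactional database (current state)
warehouse = sqlite3.connect(":memory:")  # analytical database (history)

oltp.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER)")
oltp.execute("INSERT INTO inventory VALUES ('widget', 100)")
warehouse.execute(
    "CREATE TABLE inventory_history (snapshot_date TEXT, sku TEXT, quantity INTEGER)"
)


def load_snapshot(snapshot_date):
    """Copy the current transactional state into the warehouse as a dated snapshot."""
    rows = oltp.execute("SELECT sku, quantity FROM inventory").fetchall()
    warehouse.executemany(
        "INSERT INTO inventory_history VALUES (?, ?, ?)",
        [(snapshot_date, sku, qty) for sku, qty in rows],
    )


load_snapshot("2024-01-01")
# A transaction overwrites the current value in place...
oltp.execute("UPDATE inventory SET quantity = quantity - 30 WHERE sku = 'widget'")
load_snapshot("2024-01-02")

# ...while the analytical query runs against the warehouse, not the OLTP system.
trend = warehouse.execute(
    "SELECT snapshot_date, quantity FROM inventory_history ORDER BY snapshot_date"
).fetchall()
print(trend)
```

The transactional table never grows beyond one row per item, while the warehouse accumulates history, so long-running trend queries do not compete with day-to-day transactions.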

There are different data warehouse models such as:

Basic data warehouse architecture – Single Tier
This architecture minimizes the amount of data stored and data redundancies. It is not
commonly used but may meet the needs of some small organizations that do not
require enterprise access to data. Performance issues often occur when analytical and
transaction processing are not separate.
Data warehouse architecture with a centralized repository – Two Tier
This architecture uses staging to extract specific data, transform data for usage, and
load it into a data warehouse. This process is called Extraction, Transformation, and
Loading (ETL). One of the extraction sources can be from a transactional database.
Information is saved to one logically centralized individual repository, a data
warehouse, that is paired with analytical tools. Data marts may be included in a two-
tier data warehouse architecture to deliver focused business user applications.

Data warehouse architecture with a centralized repository, and an OLAP Server – Three Tier
This architecture adds an On-Line Analytical Processing (OLAP) Server to the two-
tier design. This middle tier provides an abstracted view of the database for the end-
user and helps with system scalability and performance.

In each data warehouse architecture listed, there is always room for additional
optimization, such as using clusters to decentralize how data is managed and
processed. This could be useful for challenges relative to data governance, locally or
internationally. Data warehouse architectures could include bus, hub-and-spoke, and
federated models to solve specific needs.

The following diagram shows a three-tier data warehouse architecture. The data
warehouse structure can be modified at each level to fit more like components, such
as an increase in the number of data marts to support additional functional units in the
organization.
Data Warehouse Infrastructure Diagram
The main components of a data warehouse architecture are:

• Data Sources – databases and other files, including a transactional database
• The Data Warehouse itself
• Data Marts – for specific stakeholder analytical capabilities
• OLAP Server – enables fast, flexible multidimensional data analysis
• Tools that stakeholders use to access analytics (applications)

One of the values of architecture in data warehousing is simplicity. An organization
can start with a basic structure using few components and add more later into various
parts of the architecture as the data strategy evolves. Basically, keep the design
structure and expand the specific elements such as data sources to add depth and
breadth to the solution.

Properties of Data Warehouse Architectures

Data warehouse architectures should focus on analytical processing. Transactional
processing should be done separately using a different database. A transactional
processing database should be a data source for the more extensive data warehouse.

Other properties of the data warehouse should include:

• The ability to scale the use of data for analytics quickly. This matters most when
derived analytics must incorporate the most recent data for the specific decisions
that need to be made.
• The architecture should easily support additional data without redesigning the
entire system.
• The data must be adequately secured. The data warehouse contains data about the
entire organization. A compromise here is risky and could be very costly.
• Extraction, Transformation, and Loading tools should support different data
sources.
• Managing the architecture should not be overly complicated; it should be kept
simple for ease of use and better analytical outcomes.
• In data mining applications, the data warehouse architecture should use trusted
data that has been adequately extracted, transformed, and loaded into the data
warehouse. Data mining tools that do not have good data will only return
inaccurate results.
• As the organization matures and understands how to use data, the data warehouse
solution should be able to change quickly to accommodate new needs.

Data warehouse architectures should also deliver a level of warranty relative to
availability, security, capacity, and continuity of usage. These elements of service
warranty for the data warehouse should also include usability and performance.

The data warehouse should easily support tools and applications such as reporting,
data mining, and application development tools.

Traditional Data Warehouse vs Cloud Data Warehouses
As mentioned, a data warehouse is a collection of data from various sources,
reconciled to form a more extensive data warehouse for primary analytical processing
to support decisions for multiple stakeholders within the organization. The difference
between a traditional data warehouse and a cloud data warehouse comes down to the
general power of cloud-based computing.

Cloud data warehouses allow the organization to:

• Take advantage of nearly unlimited storage, rapid elasticity, and scalability
• Improve flexibility for supporting different architectures
• Improve mobility and access to data
• Support Big Data analytics better than typical on-premises solutions
• Deploy more quickly than on-premises solutions
• Gain more foolproof disaster recovery
• Pool IT resources more efficiently

Organizations can also be creative and use a hybrid solution leveraging the best of on-
premises and cloud architectures to support their data warehouse outcomes for various
stakeholders.

OLAP solutions can be leveraged for either architectural solution. OLAP allows
multidimensional analysis of data warehouse data, information, and knowledge to
support complex modeling and trend analysis of the data warehouse solution.
Business Intelligence (BI) and decision making across all functional areas in the
organization that utilize data warehouses can leverage OLAP for fast, effective,
and responsive analytics.
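
The multidimensional analysis that OLAP offers can be pictured as summing a measure over whichever dimensions the analyst is not currently interested in. The sketch below is a plain-Python illustration of that roll-up idea, not how an OLAP server is actually implemented; the fact rows, the dimension names, and the roll_up function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, revenue).
FACTS = [
    (2023, "EMEA", "laptop", 100),
    (2023, "APAC", "laptop", 80),
    (2024, "EMEA", "laptop", 120),
    (2024, "EMEA", "phone", 60),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(facts, group_by):
    """Sum revenue over every dimension not named in group_by."""
    indexes = [DIMENSIONS.index(name) for name in group_by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[-1]
    return dict(totals)


print(roll_up(FACTS, ["year"]))            # trend over time
print(roll_up(FACTS, ["year", "region"]))  # drill down by region
```

Calling roll_up with fewer dimensions gives a coarser summary, and adding dimensions drills down into the same facts, which is the kind of flexible slicing that supports trend analysis.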

Success with data warehouse solutions relies on understanding organizational decision
needs. Each stakeholder should be treated differently since how and when they make
decisions varies. When possible, enable end-user self-service to make configuration
changes in what and how data is accessed with their applications. The stakeholders
will have to give feedback for ETL processing to make sure the data is understandable
and meets their needs. Stakeholders and data warehouse support must work together
in a coordinated way to manage, evolve, and transform the data
and the data warehouse into an effective, efficient, and economical solution for the
organization.
