Data Warehouse Design
Builders erect houses from blueprints, with architecture constrained by physical limitations.
When someone moves in, they decide how the inside of their home should look, and their
preferences determine interior decor and design. Data warehouses are similar: their architecture
is standardized, but their design is adapted to business and user needs.
Data warehouse architecture is inherent to the main hosting platform or service selected by an
organization. It’s a built-in, static infrastructure, and hosts all the specific tools and processes
that eventually make up the warehouse.
Data warehouses contain historical, current, and critical enterprise data. They form the storage
and processing platform underlying reporting, dashboards, business intelligence, and analytics.
In the middle tier, an online analytical processing (OLAP) server powers reporting and
analytic logic. At this tier data architects may further transform the data, aggregate it, or enrich it
before running business intelligence processes.
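As a rough illustration of the middle-tier work described above, the sketch below enriches a few rows and then aggregates them. The sample rows, the `product_names` lookup, and the helper names `enrich` and `total_by` are all hypothetical, and a real OLAP server would do this at far larger scale.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from the storage tier.
sales = [
    {"region": "EU", "product_id": 1, "amount": 120.0},
    {"region": "EU", "product_id": 2, "amount": 80.0},
    {"region": "US", "product_id": 1, "amount": 200.0},
]

# Hypothetical reference data used to enrich each row.
product_names = {1: "widget", 2: "gadget"}


def enrich(rows, names):
    """Return copies of the rows with a readable product name attached."""
    return [{**row, "product": names.get(row["product_id"], "unknown")} for row in rows]


def total_by(rows, key):
    """Sum the 'amount' measure for each distinct value of the given key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


enriched = enrich(sales, product_names)
revenue_by_region = total_by(enriched, "region")
revenue_by_product = total_by(enriched, "product")
```

Here `revenue_by_region` and `revenue_by_product` stand in for the aggregated results that reporting tools in the top tier would read.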
The top tier is the front end, the user-facing layer. It contains web interfaces for
stakeholders to access and query reports or analytical results, as well as visualization
and business intelligence tools for end users running ad-hoc analysis.
Drawbacks of traditional architecture
A major issue associated with on-premises data warehouses is the cost of deployment and
management. Businesses must purchase server hardware, set aside space to house it, and devote
IT staff to set it up and administer it.
More importantly, on-premises hardware is harder to scale than cloud-based storage and
computing. Managers and decision-makers must get approval for new hardware, assign budgets,
and wait for shipment; engineers and IT specialists must then install and set up both
hardware and software. After installation, operational systems may not be used to capacity,
which means money wasted on resources that aren’t contributing value.
This type of mission-critical infrastructure requires a high level of expenditure, attention, and
employee specialization when deployed on-premises. Modern cloud services largely avoid these
problems, because both resources and pricing scale with use.
Cloud data warehouses feature column-oriented databases, where the unit of storage is a single
attribute, with values from all records. Columnar storage does not change how customers
organize or represent data, but allows for faster access and processing.
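The difference between row and column layouts can be sketched in a few lines. This is only a toy model with hypothetical sample records and a hypothetical `to_columns` helper; real columnar engines add compression, encoding, and on-disk formats that are not shown here.

```python
# Hypothetical records in a row-oriented layout: one dict per record.
rows = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 80.0},
    {"order_id": 3, "customer": "acme", "amount": 200.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per attribute."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited even though only one field is needed.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the 'amount' column is read.
total_from_columns = sum(columns["amount"])
```

Both totals are the same, which reflects the point above: the layout changes how data is accessed, not what it represents.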
Cloud data warehouses also offer automatic, near-real-time scalability, greater system
reliability and uptime than on-premises hardware, and transparent billing, which allows
enterprises to pay only for what they use.
Because cloud data warehouses don’t rely on the rigid structures and data modeling concepts
inherent in traditional systems, they have diverse architectures.
Microsoft Azure SQL Data Warehouse is an elastic, large-scale data warehouse PaaS that
leverages the broad ecosystem of SQL Server. Like other cloud storage and computing
platforms, it uses a distributed, MPP architecture and columnar data store. It gathers data from
databases and SaaS platforms into one powerful, fully managed, centralized repository.
Figure: Snowflake schema
The simpler star schema is a special case of the snowflake schema. Only one level of
dimension tables is connected to the central fact table, resulting in ERDs with star shapes. These
dimension tables are denormalized, containing all attributes and information associated with the
particular type of record they hold.
Figure: Star schema
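A star schema like the one described above can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and sample values are assumptions made for illustration, and SQLite stands in for a real warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimension: every product attribute lives in one table.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT,
    brand      TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    day     TEXT,
    month   TEXT,
    year    INTEGER
);
-- Central fact table: measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    amount     REAL
);
INSERT INTO dim_product VALUES
    (1, 'widget', 'hardware', 'acme'),
    (2, 'gadget', 'hardware', 'globex');
INSERT INTO dim_date VALUES
    (10, '2024-01-15', '2024-01', 2024),
    (11, '2024-02-01', '2024-02', 2024);
INSERT INTO fact_sales VALUES
    (1, 10, 2, 40.0),
    (2, 10, 1, 25.0),
    (1, 11, 3, 60.0);
""")

# Each dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT p.brand, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date    AS d ON d.date_id    = f.date_id
    GROUP BY p.brand, d.month
    ORDER BY p.brand, d.month
""").fetchall()
```

In a snowflake variant, attributes such as `brand` or `category` would move into their own tables linked from `dim_product`, adding a second level of joins.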
Data warehouse design is a collaborative process that should include all key stakeholders.
Leaving out end users during planning means less engagement. The same applies when leaving
design entirely up to the IT department. High-level managers and decision-makers should
provide the overall business strategy.
Data quality should be a priority. Strong data governance practices ensure clean data and
encourage adherence to rules and regulations.
Subject matter experts should lead the data modeling process. This guidance ensures that
the data pipeline will be robust, consistently organized, and documented.
Businesses should design for optimized query performance, pulling only relevant data,
using efficient data structures, and tuning systems often. OLAP cube design in particular is
critical: it allows fast, intuitive analysis of data along the multiple dimensions of
a business problem.
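To make the idea of analysis along multiple dimensions concrete, here is a minimal sketch of a cube that precomputes a total for every combination of dimensions. The dimension names, sample facts, and the `build_cube` helper are hypothetical; production OLAP engines use far more efficient storage and partial precomputation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with three dimensions and one measure.
facts = [
    {"region": "EU", "product": "widget", "quarter": "Q1", "amount": 100.0},
    {"region": "EU", "product": "gadget", "quarter": "Q1", "amount": 50.0},
    {"region": "US", "product": "widget", "quarter": "Q1", "amount": 70.0},
    {"region": "US", "product": "widget", "quarter": "Q2", "amount": 30.0},
]
DIMENSIONS = ("region", "product", "quarter")


def build_cube(rows, dimensions, measure):
    """Precompute the summed measure for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                # The key is the tuple of this row's values for the chosen dimensions.
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = build_cube(facts, DIMENSIONS, "amount")

grand_total = cube[()][()]                               # no dimensions: overall sum
by_region = cube[("region",)]                            # roll-up to one dimension
us_q2 = cube[("region", "quarter")][("US", "Q2")]        # slice on two dimensions
```

Because every grouping is computed ahead of time, answering a question becomes a dictionary lookup rather than a scan, which is the trade-off a cube makes: more storage and build time in exchange for faster queries.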
1. User needs: A good data warehouse design should be based on business and user needs.
Therefore, the first step in the design procedure is to gather requirements, to ensure that the data
warehouse will be integrated with existing business processes and be compatible with long-term
strategy. Enterprises must determine the purpose of their data warehouse, any technical
requirements, which stakeholders will benefit from the system, and which questions will be
answered with improved reporting, business intelligence (BI), and analytics.
2. Physical environment: Enterprises that opt for on-premises architecture must set up the
physical environment, including all the servers necessary to power ETL processes, storage, and
analytic operations. Enterprises can skip this step if they choose a cloud data warehouse.
3. Data modeling: Next comes data modeling, which is perhaps the most important planning
step. The data modeling process should result in detailed, reusable documentation of a data
warehouse’s implementation. Modelers assess the structure of data in sources, decide how to
represent these sources in the data warehouse, and specify OLAP requirements, including level
of processing granularity, important aggregations and measures, and high-level dimensions or
context.
7. Test and tune: All that remains is to test and tune the completed data warehouse and data
pipeline. Businesses should assess data ingestion and ETL/ELT systems, tweak query engine
configurations for performance, and validate final reports. This is a continuous process requiring
dedicated testing environments and ongoing engagement.
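One simple check that fits the test-and-tune step is reconciling what was loaded into the warehouse against the source. The sketch below is only one possible approach; the `reconcile` function, its parameters, and the sample rows are hypothetical.

```python
import math


def reconcile(source_rows, warehouse_rows, measure, tolerance=1e-9):
    """Return a list of discrepancies; an empty list means the two sides agree."""
    problems = []
    if len(source_rows) != len(warehouse_rows):
        problems.append(
            f"row count mismatch: {len(source_rows)} source "
            f"vs {len(warehouse_rows)} warehouse"
        )
    source_total = sum(row[measure] for row in source_rows)
    warehouse_total = sum(row[measure] for row in warehouse_rows)
    if not math.isclose(source_total, warehouse_total, abs_tol=tolerance):
        problems.append(
            f"{measure} total mismatch: {source_total} source "
            f"vs {warehouse_total} warehouse"
        )
    return problems


# Hypothetical data: the second load silently dropped one row.
source = [{"amount": 10.0}, {"amount": 20.0}, {"amount": 30.0}]
good_load = [{"amount": 10.0}, {"amount": 20.0}, {"amount": 30.0}]
bad_load = [{"amount": 10.0}, {"amount": 20.0}]

clean_result = reconcile(source, good_load, "amount")
broken_result = reconcile(source, bad_load, "amount")
```

Run after each load in a testing environment, a check like this flags missing rows or drifting totals before they reach final reports.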