Unit 1 Front Room Architecture
The front room is the public face of the warehouse. It's what the business users see and work with day-to-
day.
BI Application Types:
The role of the data warehouse is to be the platform for business intelligence. The most important BI
application types include the following:
Direct access queries: The classic ad hoc requests initiated by business users from desktop query tool
applications.
Standard reports: Regularly scheduled reports typically delivered via the BI portal or as spreadsheets or
PDFs to an online library.
Analytic applications: Applications containing powerful analysis algorithms in addition to normal
database queries.
Dashboards and scorecards: Multi-subject user interfaces showing key performance indicators (KPIs) textually and graphically.
These BI applications are delivered to the business users through a variety of application interface
options, including:
BI portals and custom front ends: Special-purpose user interfaces that provide easy access to web-based BI applications or support specific complex queries and result displays.
Handheld device interfaces: Special versions of BI applications engineered for handheld screens and input devices.
Instantaneous BI (EII): An extreme form of real-time data warehouse architecture with a direct connection from the source transaction system to the user's screen.
BI Management Services:
BI management services range from shared services, which typically reside between the presentation server and the user, to desktop services, which typically run at the user level and mostly pertain to report definition and results display.
Shared Services
Shared services include metadata services, security services, usage monitoring, query management,
enterprise reporting, and web and portal services.
Metadata Services
Metadata services provide a metadata model that describes the structure of the data warehouse for the tool's benefit and that simplifies and enhances it for the user's understanding.
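As an illustration only (a minimal sketch with hypothetical names; no particular tool's metadata format is implied here), a tiny semantic-layer model might map business-friendly labels to physical presentation server objects like this:

```python
# Minimal sketch of a BI semantic-layer metadata model (hypothetical names).
# Business-friendly labels are mapped to physical presentation server objects.
semantic_layer = {
    "Sales": {                                # business subject area
        "physical_table": "fact_sales",       # presentation server object
        "attributes": {
            "Order Date": "date_key",
            "Customer Name": "customer_name",
        },
        "facts": {
            "Revenue": "SUM(sales_amount)",   # computed/aggregated column
            "Units Sold": "SUM(quantity)",
        },
    }
}

def describe(subject):
    """Return the user-facing view of one subject area."""
    meta = semantic_layer[subject]
    return sorted(list(meta["attributes"]) + list(meta["facts"]))

print(describe("Sales"))  # ['Customer Name', 'Order Date', 'Revenue', 'Units Sold']
```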
Security Services
Security services facilitate a user's connection to the database. Security services include authorization and
authentication services through which the user is identified and access rights are determined or access is
refused.
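The text does not prescribe a mechanism, but a minimal sketch of the authorization half might look like the following, with hypothetical users, groups, and subject-area grants:

```python
# Hypothetical security-service sketch: users belong to groups,
# and groups are granted access to subject areas.
user_groups = {"jsmith": {"finance_analysts"}, "apatel": {"sales_analysts"}}
group_grants = {"finance_analysts": {"GL", "Budget"}, "sales_analysts": {"Sales"}}

def is_authorized(user, subject_area):
    """Return True if any of the user's groups is granted the subject area."""
    groups = user_groups.get(user, set())
    return any(subject_area in group_grants.get(g, set()) for g in groups)

print(is_authorized("jsmith", "Budget"))  # True: access granted
print(is_authorized("apatel", "Budget"))  # False: access refused
```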
Usage Monitoring
Usage monitoring involves capturing information about the use of the data warehouse.
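For example (a minimal sketch, assuming a simple relational log table rather than any particular monitoring product), usage monitoring can be as basic as recording who ran what, when, and how much came back:

```python
import sqlite3, time

# Minimal usage-monitoring sketch: record each query event in a log table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE query_log (
    user_name TEXT, query_text TEXT, rows_returned INTEGER, run_at REAL)""")

def log_query(user, sql, rows):
    conn.execute("INSERT INTO query_log VALUES (?, ?, ?, ?)",
                 (user, sql, rows, time.time()))

log_query("jsmith", "SELECT * FROM fact_sales WHERE ...", 842)
print(conn.execute("SELECT user_name, rows_returned FROM query_log").fetchall())
```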
Query Management
Query management services manage the translation of the user's on-screen query specification into the query syntax submitted to the server, the execution of the query against the database, and the return of the result set to the desktop.
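As a rough illustration (a sketch only, with hypothetical table and column names; real BI tools generate far more sophisticated SQL), the translation step might take an on-screen specification of columns, filters, and groupings and turn it into query syntax:

```python
import sqlite3

# Hypothetical on-screen query specification captured by the BI tool.
spec = {"table": "fact_sales", "columns": ["region", "SUM(amount) AS revenue"],
        "filter": "year = 2024", "group_by": ["region"]}

def build_sql(spec):
    """Translate the user's specification into SQL submitted to the server."""
    sql = f"SELECT {', '.join(spec['columns'])} FROM {spec['table']}"
    if spec.get("filter"):
        sql += f" WHERE {spec['filter']}"
    if spec.get("group_by"):
        sql += f" GROUP BY {', '.join(spec['group_by'])}"
    return sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, amount REAL, year INTEGER)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [("East", 100.0, 2024), ("West", 250.0, 2024), ("East", 75.0, 2023)])

print(build_sql(spec))                            # the generated query syntax
print(conn.execute(build_sql(spec)).fetchall())   # result set returned to the desktop
```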
Web Access
Your front room architecture needs to provide users with web browser-based data access services.
Portal Services
Portal tools usually leverage the web server to provide a more general user interface for accessing
organizational, communications, and presentation services.
BI Data Stores
Stored Reports
As data moves into the front room and closer to the user, it becomes more diffuse. Users can generate
hundreds of ad hoc queries and reports in a day. These are typically centered on specific questions,
investigations of anomalies, or tracking the impact of a program or event. Most individual queries yield
result sets with fewer than 10,000 rows — a large percentage have fewer than 1,000 rows. These result
sets are stored in the BI tool, at least temporarily. Much of the time, the results are actually transferred
into a spreadsheet and analyzed further.
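For instance (a minimal sketch; in practice this transfer is usually a built-in export feature of the BI tool), moving a small result set into a spreadsheet-friendly file can be as simple as:

```python
import csv

# Hypothetical result set returned by an ad hoc query (well under 10,000 rows).
result_set = [("East", 2024, 1175.0), ("West", 2024, 2250.0)]

# Write it to a CSV file that opens directly in a spreadsheet for further analysis.
with open("region_revenue.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["region", "year", "revenue"])   # column headers
    writer.writerows(result_set)
```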
BI Metadata
Front room BI metadata includes the elements detailed in the following sections.
PROCESS METADATA
TECHNICAL METADATA
- Standard query and report definitions
- BI semantic layer definition, including business names for all tables and columns mapped to appropriate presentation server objects, join paths, computed columns, and business groupings. May also include aggregate navigation and drill-across functionality.
- Application logic
- Security groups and user assignments
BUSINESS METADATA
- Conformed attribute and fact definitions and business rules
- User documentation and training materials
Source Systems
It is a rare data warehouse, especially at the enterprise level, that does not pull data from multiple sources.
Extract
Most often, the challenge in the extract process is determining what data to extract and what kinds of
filters to apply. We all have stories about fields with multiple uses, values that can't possibly exist, payments made from accounts that were never created, and other data horrors. From an
architecture point of view, you need to understand the requirements of the extract process so you can
determine what kinds of services will be needed. The extract-related ETL functions include:
- Data profiling
- Change data capture
- Extract system
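To make change data capture concrete, here is a minimal sketch. It assumes the source table carries a reliable last_modified timestamp, which is itself an assumption many real sources violate; the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical source table with a last_modified column used for change data capture.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, last_modified TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10.0, "2024-06-01"), (2, 20.0, "2024-06-03"), (3, 30.0, "2024-06-04")])

def extract_changes(conn, since):
    """Pull only the rows changed since the previous extract run."""
    return conn.execute(
        "SELECT order_id, amount, last_modified FROM orders WHERE last_modified > ?",
        (since,)).fetchall()

last_extract_time = "2024-06-02"   # stored by the ETL system after each run
print(extract_changes(src, last_extract_time))  # rows 2 and 3 only
```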
Deliver
Once the data is properly cleaned and aligned, the next step in the ETL process involves preparing the data for user consumption and delivering it to the presentation servers. The ETL functions that support delivery and ongoing management include:
- Job scheduler
- Backup system
- Recovery and restart
- Version control
- Version migration
- Workflow monitor
- Sorting
- Lineage and dependency
- Problem escalation
- Paralleling and pipelining
- Compliance manager
- Security
- Metadata repository
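As a small illustration of the job scheduler's core problem (a sketch only, with hypothetical job names; production schedulers add retries, alerts, and restart logic), ETL jobs must run in dependency order:

```python
# Requires Python 3.9+ for graphlib.
from graphlib import TopologicalSorter

# Hypothetical ETL jobs and their prerequisites (job: set of jobs it depends on).
dependencies = {
    "load_dim_customer": {"extract_crm"},
    "load_fact_sales": {"extract_orders", "load_dim_customer"},
    "build_aggregates": {"load_fact_sales"},
}

# A job scheduler must run each job only after everything it depends on has finished.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
# e.g. ['extract_crm', 'extract_orders', 'load_dim_customer', 'load_fact_sales', 'build_aggregates']
```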
ETL Metadata
PROCESS METADATA
TECHNICAL METADATA
- System inventory
- Source descriptions of all data sources, including record layouts, column definitions, and business rules.
- Source access methods
- ETL data store specifications and DDL scripts
- ETL data store policies and procedures
- ETL job logic, extracts, and transforms
BUSINESS METADATA
Aggregates
Unfortunately, most organizations have fairly large datasets, at least large enough that users would have to wait a relatively long time for any summary-level query to return. In order to improve performance at
summary levels, we add the second element of the presentation server layer: aggregates. Pre-aggregating
data during the load process is one of the primary tools available to improve performance for analytic
queries. These aggregates occupy a separate logical layer, but they could be implemented in the
relational database, in an OLAP server, or on a separate application server.
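For example (a minimal sketch with hypothetical table names; in practice this is handled by the ETL system, materialized views, or an OLAP engine), pre-aggregating a daily fact table to a monthly summary during the load might look like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical atomic fact table at daily grain.
conn.execute("CREATE TABLE fact_sales_daily (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales_daily VALUES (?, ?, ?)",
                 [("2024-06-01", "A", 10.0), ("2024-06-15", "A", 20.0), ("2024-06-02", "B", 5.0)])

# Pre-aggregate to monthly grain as part of the load process.
conn.execute("""
    CREATE TABLE agg_sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS sale_month, product, SUM(amount) AS amount
    FROM fact_sales_daily
    GROUP BY sale_month, product
""")
print(conn.execute("SELECT * FROM agg_sales_monthly ORDER BY product").fetchall())
# [('2024-06', 'A', 30.0), ('2024-06', 'B', 5.0)]
```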
Aggregates are like indexes. They will be built and rebuilt on a daily basis; the choice of aggregates will
change over time based on analysis of actual query usage. Your architecture will need to include
functionality to track aggregate usage to support this. Ideally, the aggregate navigation system will do this
for you and automatically adjust the aggregates it creates. We call this usage-based optimization. Because the aggregate portfolio keeps changing, it's also a good idea to have your atomic data stored in a solid, reliable, flexible relational database from which the aggregates can be rebuilt.
Although we refer to this layer as aggregates, the actual data structures may also include detail level data
for performance purposes. Some OLAP engines, for example, perform much faster when retrieving data
from the OLAP database rather than drilling through to the relational engine. In this case, if the OLAP
engine can hold the detail, it makes sense to put it in the OLAP database along with the aggregates. We
encourage you to think of the aggregate layer as essentially a fat index.
Aggregate Navigation
Having aggregates and atomic data increases the complexity of the data environment. Therefore, you
must provide a way to insulate the users from this complexity. As we said earlier, aggregates are like
indexes; they are a tool to improve performance, and they should be transparent to user queries and BI
application developers. This leads us to the third essential component of the presentation server: the
aggregate navigator. Presuming you create aggregates for performance, your architecture must include
aggregate navigation functionality. The aggregate navigator receives a user query based on the atomic
level dimensional model. It examines the query to see if it can be answered using a smaller, aggregate
table. If so, the query is rewritten to work against the aggregate table and submitted to the database
engine. The results are returned to the user, who is happy with such fast performance and unaware of the
magic it took to deliver it. At the implementation level, there are a range of technologies to provide
aggregate navigation functionality, including:
- OLAP engines
- Materialized views in the relational database with optimizer-based navigation
- Relational OLAP (ROLAP) services
- BI application servers or query tools
Many of these technologies include functionality to build and host the aggregates. In the case of an OLAP
engine, these aggregates are typically kept in a separate server, often running on a separate machine.
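To make the rewrite step concrete, here is a deliberately simplified sketch (hypothetical table names; real aggregate navigators work from metadata about grain and dimensionality, not a hard-coded lookup): if the incoming query only needs columns the aggregate can supply, it is redirected to the smaller table.

```python
# Simplified aggregate-navigator sketch: route a query to an aggregate table
# when the aggregate contains every column the query needs.
aggregates = {
    # aggregate table -> (base table it summarizes, columns it can answer)
    "agg_sales_monthly": ("fact_sales_daily", {"sale_month", "product", "amount"}),
}

def navigate(base_table, needed_columns):
    """Return the smallest table that can answer the query."""
    for agg_table, (base, available) in aggregates.items():
        if base == base_table and needed_columns <= available:
            return agg_table          # rewrite the query against the aggregate
    return base_table                 # fall back to the atomic fact table

print(navigate("fact_sales_daily", {"sale_month", "amount"}))   # agg_sales_monthly
print(navigate("fact_sales_daily", {"sale_date", "amount"}))    # fact_sales_daily
```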
Impact Analysis
First, an integrated repository could help you identify the impact of making a change to the DW/BI system. A change to the source system data model would impact the ETL process and may cause a change
to the target data model, which would then impact any database definitions based on that element, like
indexes, partitions, and aggregates. It would also impact any reports that include that element. If all the
metadata is in one place, or at least connected by common keys and IDs, then it would be fairly easy to
understand the impact of this change.
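A minimal sketch of that idea (hypothetical object names; a real metadata repository would hold these links in tables connected by common keys and IDs): record which objects are built from which, then walk the graph to find everything downstream of a changed element.

```python
# Hypothetical dependency links from an integrated metadata repository:
# object -> objects that are built from it (downstream dependents).
depends_on_me = {
    "src.orders.amount": ["etl.load_fact_sales"],
    "etl.load_fact_sales": ["dw.fact_sales.amount"],
    "dw.fact_sales.amount": ["agg_sales_monthly", "report.monthly_revenue"],
}

def impact_of(changed_object):
    """Walk the dependency graph to find everything affected by a change."""
    affected, to_visit = set(), [changed_object]
    while to_visit:
        current = to_visit.pop()
        for dependent in depends_on_me.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                to_visit.append(dependent)
    return sorted(affected)

print(impact_of("src.orders.amount"))
# ['agg_sales_monthly', 'dw.fact_sales.amount', 'etl.load_fact_sales', 'report.monthly_revenue']
```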