
Unit 1

** Front Room Architecture:

The front room is the public face of the warehouse. It's what the business users see and work with day-to-
day.
BI Application Types:
The role of the data warehouse is to be the platform for business intelligence. The most important BI
application types include the following:

Direct access queries: The classic ad hoc requests initiated by business users from desktop query tool
applications.
Standard reports: Regularly scheduled reports typically delivered via the BI portal or as spreadsheets or
PDFs to an online library.
Analytic applications: Applications containing powerful analysis algorithms in addition to normal
database queries.
Dashboards and scorecards: multi-subject user interfaces showing key performance indicators (KPIs)
textually and graphically.

These BI applications are delivered to the business users through a variety of application interface
options, including:

BI portals and custom front ends: special purpose user interfaces to provide easy access to web based BI
applications or for specific complex queries and screen results.
Handheld device interfaces: special versions of BI applications engineered for handheld screens and input
devices.
Instantaneous BI (EII): an extreme form of real time data warehouse architecture with a direct connection
from the source transaction system to the user's screen.
BI Management Services:
BI management services run the gamut from shared services that typically reside between the presentation
server and the user to desktop services that are typically presented at the user level and mostly pertain to
report definition and results display.

Shared Services
Shared services include metadata services, security services, usage monitoring, query management, enterprise reporting, and web and portal services.

Metadata Services
Metadata services provide a metadata model that describes the structure of the data warehouse for the tool's benefit, and simplifies and enhances it for the user's understanding.
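
To make this concrete, here is a minimal Python sketch of what such a semantic metadata model might look like. The class, table, and column names are hypothetical examples, not any real tool's API:

# A minimal sketch of a BI semantic layer: business-friendly names mapped to
# physical warehouse objects. All names here are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class ColumnMapping:
    business_name: str      # what the user sees, e.g. "Customer Name"
    physical_column: str    # what the tool queries, e.g. "dim_customer.customer_name"

@dataclass
class BusinessGroup:
    name: str                                  # e.g. "Customer"
    columns: list[ColumnMapping] = field(default_factory=list)

# The tool navigates this model; the user only ever sees the business names.
customer_group = BusinessGroup(
    name="Customer",
    columns=[
        ColumnMapping("Customer Name", "dim_customer.customer_name"),
        ColumnMapping("Customer City", "dim_customer.city"),
    ],
)

for col in customer_group.columns:
    print(f"{col.business_name} -> {col.physical_column}")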

Security Services
Security services facilitate a user's connection to the database. They include authentication and authorization services through which the user is identified and access rights are determined or access is refused.
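
A minimal sketch of this authentication-then-authorization flow, using hypothetical in-memory stand-ins for whatever directory service and grant tables a real deployment would use:

# A minimal sketch of the security services flow described above.
USERS = {"analyst": "secret"}                      # authentication store (hypothetical)
ACCESS_RIGHTS = {"analyst": {"sales_mart"}}        # authorization store (hypothetical)

def connect(user: str, password: str, database: str) -> str:
    # Authentication: identify the user.
    if USERS.get(user) != password:
        return "access refused: unknown user or bad password"
    # Authorization: determine access rights for the requested database.
    if database not in ACCESS_RIGHTS.get(user, set()):
        return f"access refused: no rights on {database}"
    return f"connected to {database}"

print(connect("analyst", "secret", "sales_mart"))  # connected
print(connect("analyst", "secret", "hr_mart"))     # refused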

Usage Monitoring
Usage monitoring involves capturing information about the use of the data warehouse.

Query Management
Query management services are the capabilities that manage the translation of the user's specification of
the query on the screen into the query syntax submitted to the server, the execution of the query on the
database, and the return of the result set to the desktop.
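
As a rough illustration of the translation step, the sketch below assembles SQL text from a hypothetical on-screen specification. A real tool would build parameterized queries, draw names from the semantic layer, and also manage execution and result return:

# A minimal sketch of query translation: the user's on-screen choices (columns
# to show, measures to total, filters) become SQL text submitted to the server.
# Table and column names are hypothetical; real tools parameterize values
# rather than interpolating them into the SQL string.
def build_query(dimensions: list[str], measures: list[str],
                table: str, filters: dict[str, str]) -> str:
    select_list = dimensions + [f"SUM({m}) AS {m}" for m in measures]
    where = " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
    sql = f"SELECT {', '.join(select_list)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql

# The user picked product and region, the sales measure, and a year filter:
print(build_query(["product", "region"], ["sales_amount"],
                  "sales_fact", {"order_year": "2024"}))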

Enterprise Reporting Services
Enterprise reporting provides the ability to create and deliver production style predefined reports that
have some level of user interaction, a broad audience, and regular execution schedules.

Web Access
Your front room architecture needs to provide users with web browser-based data access services.

Portal Services
Portal tools usually leverage the web server to provide a more general user interface for accessing
organizational, communications, and presentation services.

BI Data Stores

Stored Reports
As data moves into the front room and closer to the user, it becomes more diffused. Users can generate
hundreds of ad hoc queries and reports in a day. These are typically centered on specific questions,
investigations of anomalies, or tracking the impact of a program or event. Most individual queries yield
result sets with fewer than 10,000 rows — a large percentage have fewer than 1,000 rows. These result
sets are stored in the BI tool, at least temporarily. Much of the time, the results are actually transferred
into a spreadsheet and analyzed further.

Application Server Caches
There are several data-oriented services in the front room, usually in the form of a local cache that supports application logic or provides lightning-fast response times.

Local User Databases
Many users go beyond spreadsheets and accumulate query results in their own local databases, where the data is combined with other sources and analyzed further over time.
Disposable Analytic Data Stores
The disposable data store is a set of data created to support a specific short-lived business situation. It is
similar to the local data store, but it is intended to have a limited life span. For example, a company may
be launching a significant promotion or new product and want to set up a special launch control room.

BI Metadata
Front room BI metadata includes the elements detailed in the following sections.
PROCESS METADATA
- Report and query execution statistics
- Network security usage statistics

TECHNICAL METADATA
- Standard query and report definitions
- BI semantic layer definition including business names for all tables and columns mapped to appropriate
presentation server objects, join paths, computed columns, and business groupings. May also include
aggregate navigation and drill across functionality.
- Application Logic
- Security groups and user assignments

BUSINESS METADATA
- Conformed attribute and fact definitions and business rules
- User documentation and training materials

** Back Room Architecture

The ETL process flow involves four major operations: extracting the data from the sources, running it
through a set of cleansing and conforming transformation processes, delivering it to the presentation
server, and managing the ETL process and back room environment.

Source Systems
It is a rare data warehouse, especially at the enterprise level, that does not pull data from multiple sources.

Extract
Most often, the challenge in the extract process is determining what data to extract and what kinds of
filters to apply. We all have stories about fields that have multiple uses, values that can't possibly exist,
payments being made from accounts that haven't been created, and other horror stories. From an
architecture point of view, you need to understand the requirements of the extract process so you can
determine what kinds of services will be needed. The extract-related ETL functions include the following (a change data capture sketch appears after the list):
- Data profiling
- Change data capture
- Extract system
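
The following is a minimal sketch of one change data capture approach, comparing row checksums against the prior extract. Real systems more often rely on source timestamps, triggers, or database log readers; all names here are hypothetical:

# A minimal sketch of change data capture by snapshot comparison: rows whose
# checksum differs from the prior extract are flagged as new or changed.
import hashlib

def row_checksum(row: dict) -> str:
    return hashlib.md5("|".join(str(v) for v in row.values()).encode()).hexdigest()

def changed_rows(previous: dict, current_rows: list[dict], key: str) -> list[dict]:
    changed = []
    for row in current_rows:
        digest = row_checksum(row)
        if previous.get(row[key]) != digest:   # new or modified since last run
            changed.append(row)
    return changed

previous_snapshot = {1: row_checksum({"id": 1, "status": "open"})}
today = [{"id": 1, "status": "closed"}, {"id": 2, "status": "open"}]
print(changed_rows(previous_snapshot, today, key="id"))  # both rows qualify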

Clean and Conform
Cleaning and conforming services are the core of the data quality work that takes place in the ETL
process. In this step, a range of transformations are performed to convert the data into something valuable
and presentable to the business. In one example, we had to run customer service data through more than
20 transformation steps to get it into a usable state. This involved steps like remapping the old activity
codes into the new codes, cleaning up the freeform entry fields, and populating a dummy customer ID for
pre-sales inquiries. There are five major services in the cleaning and conforming step (a cleansing sketch follows the list):
- Data cleansing system
- Error event tracking
- Audit dimension creation
- Deduplicating
- Conforming
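
A minimal sketch of the cleansing steps from the customer service example above; the code values and field names are hypothetical:

# A minimal sketch of three of the transformation steps mentioned in the text:
# remapping old activity codes, tidying a freeform field, and assigning a
# dummy customer ID to pre-sales inquiries.
CODE_MAP = {"CALL": "PHONE", "WLK": "WALK_IN"}     # old code -> new code (hypothetical)
DUMMY_CUSTOMER_ID = -1                             # stands in for "no customer yet"

def cleanse(record: dict) -> dict:
    cleaned = dict(record)
    # Remap the old activity codes into the new coding scheme.
    cleaned["activity_code"] = CODE_MAP.get(record["activity_code"],
                                            record["activity_code"])
    # Clean up the freeform entry field: trim whitespace, normalize case.
    cleaned["notes"] = " ".join(record["notes"].split()).capitalize()
    # Populate a dummy customer ID for pre-sales inquiries.
    if cleaned.get("customer_id") is None:
        cleaned["customer_id"] = DUMMY_CUSTOMER_ID
    return cleaned

print(cleanse({"activity_code": "CALL",
               "notes": "  ASKED   ABOUT PRICING ",
               "customer_id": None}))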

Deliver
Once the data is properly cleaned and aligned, the next step in the ETL process involves preparing the
data for user consumption and delivering it to the presentation servers.

The delivery subsystems in the ETL back room consist of the following (a surrogate key pipeline sketch follows the list):
- Slowly changing dimension (SCD) manager
- Surrogate key generator
- Hierarchy manager
- Special dimensions manager
- Fact table builders
- Surrogate key pipeline
- Multi-valued bridge table builder
- Late arriving data handler
- Dimension manager system
- Fact table provider system
- Aggregate builder
- OLAP cube builder
- Data propagation manager
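
A minimal sketch of the surrogate key pipeline named in the list: incoming fact rows carry natural keys from the source system, which are swapped for the warehouse's surrogate keys via a lookup against the current dimension rows. Key values and names are hypothetical:

# A minimal sketch of a surrogate key pipeline.
customer_lookup = {"CUST-001": 17, "CUST-002": 18}   # natural key -> surrogate key

def assign_surrogate_keys(fact_rows: list[dict]) -> list[dict]:
    loaded = []
    for row in fact_rows:
        surrogate = customer_lookup.get(row["customer_natural_key"])
        if surrogate is None:
            # Late arriving data: a real pipeline would create a placeholder
            # dimension row here rather than dropping the fact.
            raise KeyError(f"no dimension row for {row['customer_natural_key']}")
        loaded.append({"customer_key": surrogate, "amount": row["amount"]})
    return loaded

print(assign_surrogate_keys([{"customer_natural_key": "CUST-001", "amount": 42.50}]))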

ETL Management Services
Management services include some that are actively involved in the ETL flow, like the job scheduler, and some that are part of the general development environment, like security. The major services include:

- Job scheduler
- Backup system
- Recovery and restart
- Version control
- Version migration
- Workflow monitor
- Sorting
- Lineage and dependency
- Problem escalation
- Paralleling and pipelining
- Compliance manager
- Security
- Metadata repository

ETL Data Stores
Data stores are the temporary or permanent landing places for data across the DW/BI system. The actual
data stores you need depend on your business requirements, the stability of your source systems, and the
complexity of your extract and transformation processes.

ETL Metadata

PROCESS METADATA
- ETL operations statistics
- Audit results, including checksums and other measures of quality and completeness
- Quality screen results

TECHNICAL METADATA

- System inventory
- Source descriptions of all data sources, including record layouts, column definitions, and business rules.
- Source access methods
- ETL data store specifications and DDL scripts
- ETL data store policies and procedures
- ETL job logic, extract and transforms

BUSINESS METADATA
- Data quality screen specifications
- Data dictionary
- Logical data map showing the overall data flow from source tables and fields through the ETL system to
target tables and columns.
- Business rule logic describing all business rules that are either explicitly checked or implemented in the
data warehouse, including slowly changing dimension policies and null handling.

** Explain aggregates and aggregate navigation.

Aggregates
Unfortunately, most organizations have fairly large datasets; at least large enough that users would have to wait a relatively long time for any summary-level query to return. In order to improve performance at
summary levels, we add the second element of the presentation server layer: aggregates. Pre-aggregating
data during the load process is one of the primary tools available to improve performance for analytic
queries. These aggregates occupy a separate logical layer, but they could be implemented in the
relational database, in an OLAP server, or on a separate application server.
Aggregates are like indexes. They will be built and rebuilt on a daily basis; the choice of aggregates will
change over time based on analysis of actual query usage. Your architecture will need to include
functionality to track aggregate usage to support this. Ideally, the aggregate navigation system will do this
for you, and automatically adjust the aggregates it creates. We call this usage based optimization. This is
also why it's a good idea to have your atomic data stored in a solid, reliable, flexible relational
database.

Although we refer to this layer as aggregates, the actual data structures may also include detail level data
for performance purposes. Some OLAP engines, for example, perform much faster when retrieving data
from the OLAP database rather than drilling through to the relational engine. In this case, if the OLAP
engine can hold the detail, it makes sense to put it in the OLAP database along with the aggregates. We
encourage you to think of the aggregate layer as essentially a fat index.
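
A minimal sketch of pre-aggregating during the load, using SQLite as a stand-in for the warehouse database; table and column names are hypothetical. The aggregate is rebuilt from the atomic fact table with a GROUP BY, exactly like refreshing a fat index:

# A minimal sketch of building an aggregate table during the load process.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (product TEXT, order_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [("widget", "2024-01-05", 10.0),
                  ("widget", "2024-01-09", 15.0),
                  ("gadget", "2024-01-05", 7.5)])

# Rebuild the monthly aggregate from scratch as part of the daily load.
conn.execute("DROP TABLE IF EXISTS sales_agg_month")
conn.execute("""
    CREATE TABLE sales_agg_month AS
    SELECT product, substr(order_date, 1, 7) AS order_month, SUM(amount) AS amount
    FROM sales_fact
    GROUP BY product, order_month
""")
print(conn.execute("SELECT * FROM sales_agg_month").fetchall())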

Aggregate Navigation
Having aggregates and atomic data increases the complexity of the data environment. Therefore, you
must provide a way to insulate the users from this complexity. As we said earlier, aggregates are like
indexes; they are a tool to improve performance, and they should be transparent to user queries and BI
application developers. This leads us to the third essential component of the presentation server: the
aggregate navigator. Presuming you create aggregates for performance, your architecture must include
aggregate navigation functionality. The aggregate navigator receives a user query based on the atomic
level dimensional model. It examines the query to see if it can be answered using a smaller, aggregate
table. If so, the query is rewritten to work against the aggregate table and submitted to the database
engine. The results are returned to the user, who is happy with such fast performance and unaware of the
magic it took to deliver it. At the implementation level, there are a range of technologies to provide
aggregate navigation functionality, including:
- OLAP engines
- Materialized views in the relational database with optimizer-based navigation
- Relational OLAP (ROLAP) services
- BI application servers or query tools
Many of these technologies include functionality to build and host the aggregates. In the case of an OLAP
engine, these aggregates are typically kept in a separate server, often running on a separate machine.
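
A minimal sketch of the aggregate navigator's core decision: route the query to the smallest table that can answer it. A full navigator would also rewrite the query text; the table registry here is hypothetical:

# A minimal sketch of aggregate navigation: if every column the query needs is
# available in an aggregate table, the query is answered from that (smaller)
# table; otherwise it falls through to the atomic fact table.
AGGREGATES = [
    # (table name, columns it can answer, relative size)
    ("sales_agg_month", {"product", "order_month", "amount"}, 1_000),
    ("sales_fact", {"product", "order_date", "order_month", "amount"}, 1_000_000),
]

def choose_table(columns_needed: set[str]) -> str:
    # Pick the smallest table that can answer the query.
    candidates = [(size, name) for name, cols, size in AGGREGATES
                  if columns_needed <= cols]
    return min(candidates)[1]

# A monthly summary query is silently routed to the aggregate...
print(choose_table({"product", "order_month", "amount"}))  # sales_agg_month
# ...while a daily-grain query still hits the atomic fact table.
print(choose_table({"product", "order_date", "amount"}))   # sales_fact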

** Explain the need for Metadata Integration.

A single integrated repository for DW/BI system metadata would be valuable in
several ways, if it were possible to build. Chief among these are impact analysis, audit
and documentation, and metadata quality management.

Impact Analysis
First, an integrated repository could help you identify the impact of making a change to the DW/BI system. A change to the source system data model would impact the ETL process and might cause a change to the target data model, which would then impact any database definitions based on that element, like indexes, partitions, and aggregates. It would also impact any reports that include that element. If all the metadata is in one place, or at least connected by common keys and IDs, then it would be fairly easy to understand the impact of this change.

Audit and Documentation
Lineage analysis picks an element and determines where it came from and what went into its creation. This is particularly important for understanding the contents and source of a given column, table, or other object in the DW/BI system; it is essentially system-generated documentation. In its most rigorous form, lineage analysis can use the audit metadata to determine the origin of any given fact or dimension row. In some compliance scenarios, this is required information.
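
Impact analysis and lineage analysis can both be viewed as traversals of the same dependency graph over an integrated metadata repository, in opposite directions. A minimal sketch, with hypothetical element names:

# A minimal sketch of impact and lineage analysis: impact analysis walks
# downstream from a changed element, lineage analysis walks upstream from a
# target element. The graph maps each element to what it depends on.
DEPENDS_ON = {
    "etl_load_customers": ["src.customers"],
    "dw.dim_customer": ["etl_load_customers"],
    "report_top_customers": ["dw.dim_customer"],
}

def upstream(element: str) -> set[str]:
    """Lineage: everything that went into creating this element."""
    sources = set()
    for dep in DEPENDS_ON.get(element, []):
        sources |= {dep} | upstream(dep)
    return sources

def downstream(element: str) -> set[str]:
    """Impact: everything that would be affected by changing this element."""
    affected = set()
    for item, deps in DEPENDS_ON.items():
        if element in deps:
            affected |= {item} | downstream(item)
    return affected

print(downstream("src.customers"))        # impact of a source model change
print(upstream("report_top_customers"))   # lineage of a report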

Metadata Quality and Management
Multiple copies of metadata elements kept in different systems will invariably get out of sync. For the
DW/BI system structures and processes, this kind of error is self-identifying because the next time a
process runs or the structure is referenced, the action will fail. The DW/BI developer could have spotted
this ahead of time with an impact analysis report if one were available. Errors in descriptive or business
metadata synchronization are not so obvious. For example, if the data steward updates the description of a
customer dimension attribute in the data model, it may not be updated in any of the other half dozen or so
repositories around the DW/BI system that hold copies of this description. A user will still see the old
description in the BI tool metadata. The query will still work, but the user's understanding of the result
may be incorrect. A single repository would have one entry for the description of this field, which is then
referred to wherever it is needed. A change in the single entry automatically means everyone will see the
new value.
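
A minimal sketch of the single-entry idea: tools hold a key into the shared repository rather than their own copy of the description, so an update by the data steward is visible everywhere at once. The element names are hypothetical:

# A minimal sketch of a shared metadata repository accessed by reference.
REPOSITORY = {"dim_customer.segment": "Customer's original market segment"}

def describe(element_key: str) -> str:
    # Every tool resolves descriptions through the shared repository.
    return REPOSITORY[element_key]

# The BI tool and the data model both refer to the same entry...
print(describe("dim_customer.segment"))

# ...so when the data steward updates it, everyone sees the new value.
REPOSITORY["dim_customer.segment"] = "Customer's market segment as of last purchase"
print(describe("dim_customer.segment"))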

** Explain business, process, and technical metadata in the front room of a BI system

(Refer to the Front Room answer above, under BI Metadata.)
