DWM Unit 1

The document discusses the need for data warehousing to provide strategic information to executives and managers for decision making. Operational systems store transactional data but cannot provide the aggregated, historical views needed for strategic analysis. Past decision support systems through ad-hoc reports, specialized programs, and early business intelligence tools were unable to meet escalating demands for flexible access to comprehensive enterprise-wide data. A data warehouse is defined as a non-volatile collection of integrated data from across an organization designed to support analysis and decision making through summarized and historical views without impacting transactional systems.

Uploaded by

Edu Free
Copyright
© © All Rights Reserved

The compelling (forceful) need for data warehousing

1.1 Introduction
• Operational applications include order processing, general ledger, inventory, human resources, payroll,
in-patient billing, checking accounts, insurance claims, and so on.
• These applications are the important systems that run the business.
• They gather, store, and process all the data needed to successfully perform the daily routine
operations.
• They provide online information and produce a variety of reports to monitor and run the business.
• The operational computer systems did provide information to run the day-to-day operations but
what the executives needed were different kinds of information that could be used readily
to make strategic decisions.
• The decision makers wanted to know which geographic regions to focus on, which product
lines to expand, and which markets to strengthen.
• They needed the type of information with proper content and format that could help them
make such strategic decisions. We may call this type of information strategic information as
different from operational information. The operational systems, important as they were, could
not provide strategic information.
Businesses, therefore, were compelled to turn to new ways of getting strategic information.
Data warehousing is a new paradigm specifically intended to provide vital strategic information.
Figure 1-1 shows a sample of strategic areas where data warehousing had already produced results in
different industries.

1.2 Escalating (Increasing) demand for strategic information


Strategic information is needed by top-level management to formulate business strategies, establish
goals, set objectives, and monitor results.

Here are some examples of business objectives:
• Retain the present customer base
• Increase the customer base by 15% over the next 5 years
• Improve product quality levels in the top five product groups
For making decisions about these objectives, executives and managers need information for the
following purposes:
• to get in-depth knowledge of their company’s operations,
• to review and monitor key performance indicators and note how these affect one another,
• to keep track of how business factors change over time,
• to compare their company’s performance relative to the competition and to industry
benchmarks.
Executives and managers also need to focus their attention on:
• customers’ needs and preferences,
• emerging technologies,
• sales and marketing results,
• quality levels of products and services.
Strategic information is far more important for the continued health and survival of the
corporation.
Figure 1-2 lists the desired characteristics of strategic information.

2. Inability of past decision-support systems


5. Operational vs. Decision support system
[1,3,4 not in syllabus]

2. Failures (Inability) of Past Decision Support Systems

History of Decision Support Systems
Suppose the marketing department of a company is concerned about the performance of a particular region because the
sales numbers in that month's report are drastically low. The marketing manager wants a report from the IT
department to analyze the performance over the past two years, product by product and compared to monthly targets,
so that he can take quick strategic decisions to rectify the situation. There may be no regular report that gives the
marketing department what it wants, so the IT department has to gather the data from multiple applications and
build the results from scratch.
Depending on the size and nature of the business, most companies have gone through the following
stages of attempts to provide strategic information for decision making.
Ad hoc Reports: This was the earliest stage. Users, especially from marketing and finance, would send
requests to IT for special reports. IT would write special programs, typically one for each request, and
produce the ad hoc reports.

Special Extract Programs: IT would write a suite of programs and run the programs periodically to
extract data from the various applications.

Small Applications: IT would create simple applications based on the extracted files. The users could
stipulate the parameters for each special report. The report printing programs would print the
information based on user-specific parameters.

Information Centers: The information center typically was a place where users could go to request ad
hoc reports or view special information on screens. These were predetermined reports or screens.

Decision-Support Systems: The systems were menu-driven and provided online information and also
the ability to print special reports.

Executive Information Systems: This was an attempt to bring strategic information to the executive
desktop. The main criteria were simplicity and ease of use. The system would display key information
every day and provide the ability to request simple, straightforward reports. However, only
preprogrammed screens and reports were available.
Inability to Provide Information
Figure 1-4 depicts the inadequate attempts by IT to provide strategic information.

Here are some of the factors relating to the inability to provide strategic information:
• IT receives too many ad hoc requests, resulting in a large overload. With limited resources, IT
is unable to respond to the numerous requests in a timely fashion.
• Requests are not only numerous; they also keep changing all the time. The users need more reports to
expand on and understand the earlier reports.
• The users find that they get into the spiral of asking for more and more supplementary reports,
so they sometimes adapt by asking for every possible combination, which only increases the IT
load even further.
• The users have to depend on IT to provide the information. They are not able to access the
information themselves interactively.
• The information environment ideally suited for strategic decision-making has to be very flexible
and conducive for analysis. IT has been unable to provide such an environment.

5. Operational vs. Decision support system
The following table summarizes the differences between the traditional operational systems and the
newer (needed) decision support systems, or informational systems, that need to be built.
No. | Attribute | Operational Systems | Decision Support Systems
1 | Data content | Current values | Archived, summarized, derived
2 | Data structure | Optimized for transactions | Optimized for complex queries
3 | Access frequency | High | Medium to low
4 | Access type | Read, update, delete | Read
5 | Usage | Predictable, repetitive | Ad hoc, random
6 | Response time | Sub-seconds | Several seconds to minutes
7 | Number of users | Large number | Relatively small number
8 | Characteristic | Operational processing | Informational processing
9 | Orientation | Transaction | Analysis
10 | Users | Clerk, DBA, database professional | Executives, managers, business executives
11 | Function | Day-to-day operations | Long-term informational requirements
12 | Database design | ER based, application oriented | Star/snowflake, subject oriented
13 | Summarization | Primitive, highly detailed | Summarized, consolidated
14 | View | Detailed, flat relational | Summarized, multidimensional
15 | Unit of work | Short, simple transaction | Complex query
16 | Records accessed | Tens | Millions
17 | Database size | 100 MB to GB | 100 GB to TB
18 | Priority | High performance, high availability | High flexibility, end-user autonomy
19 | Indexes | Few | Many
20 | Joins | Many | Some
21 | Duplicated data | Normalized DBMS | Denormalized DBMS
22 | Derived data and aggregates | Rare | Common
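
To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical and not part of the course material. The first statement is a typical operational unit of work (a short update that touches one record); the second is a typical decision-support query (a read-only aggregate over many records).

```python
import sqlite3

# Hypothetical order data; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
    "order_year INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "North", 2022, 100.0, "open"),
        (2, "North", 2023, 150.0, "open"),
        (3, "South", 2022, 200.0, "open"),
        (4, "South", 2023, 50.0, "open"),
    ],
)

# Operational style: a short, simple transaction that updates a single record.
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (2,))
conn.commit()

# Decision-support style: a read-only query that scans and summarizes many records.
sales_by_region_year = conn.execute(
    "SELECT region, order_year, SUM(amount) FROM orders "
    "GROUP BY region, order_year ORDER BY region, order_year"
).fetchall()

for region, year, total in sales_by_region_year:
    print(region, year, total)
```

In a real installation the two workloads would run against separate databases, which is the point of the comparison above; a single in-memory database is used here only to keep the sketch short.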

1.3 Data Warehouse Defined
Definition of Data Warehouse
DW is a subject oriented, integrated, time varying, non-volatile collection of data that is used
primarily in organizational decision-making.
Reason for developing data warehouse
• The database designs of operational systems were not optimized for information analysis and
strategic decision making.
• The processing load of reporting affected the response time of operational systems.
• Large organizations generally had a number of operational systems, so enterprise-wide reporting
could not be supported from a single system.
As a result, separate databases were built that were specifically designed to support management
information and analysis purposes.
The data warehouse is an informational environment that:
• Provides an integrated and total view of the enterprise.
• Makes the enterprise’s current and historical information easily available for strategic
decision making.
• Makes decision-support transactions possible without hindering (delaying) operational systems.
• Renders the organization’s information consistent.
• Presents a flexible and interactive source of strategic information.

1. What can a Data Warehouse Do? [Not in syllabus but read once]
2. What Can a Data Warehouse Not Do?
3. Data Warehouse: An Environment, Not a Product
4. A Blend of Many Technologies
1. What can a Data Warehouse Do?
1. Immediate information delivery:
Data warehouses reduce the time that elapses between the request for information and the
actual delivery of information to the users. For example, a sales report that used to be produced once
a month, usually in the first week of the month, can be produced daily with a data warehouse,
enabling business analysts to exploit opportunities that would otherwise have been missed.
2. Integration of data from within and outside the organization:
Data warehouses combine data from multiple sources. The data is collected from different
departments like sales, marketing, finance, and accounting. Besides this, data is also taken from
external sources like business magazines, news reports, surveys, etc.

3. Provides an insight into the future:
Data warehouses store large amounts of historical information that enables the decision makers
to analyze the prevailing trends in the market and produce goods according to the customers'
demands.
4. Enables users to look at the same data in different ways:
A data warehouse provides its users with tools for analyzing and manipulating data in many
different ways. It lets users drill down into detailed data with the click of a mouse,
something that could otherwise have taken a few days with the traditional approach.
5. Provides freedom from the dependency on IT:
With data warehouses, the users no longer have to depend on the availability of IT professionals
to answer their queries. If a manager needs an ad hoc report, he can produce it himself
without the assistance of any computer expert.

2. What Can a Data Warehouse Not Do?


• It acts as an information repository that collects and reports data that already exists. It cannot
create additional data on its own.
• If an organization has dirty data in the source systems, the data warehouse will not be able to
produce correct results until the data is first cleaned.
• In this context, a data warehouse will only be able to identify where the problem exists, but
corrections will have to be made in the source systems that capture that data.

3. Data Warehouse: An Environment, Not a Product


A data warehouse is not a single software or hardware product you purchase to provide strategic
information. Rather, it is a computing environment where users can find strategic information. It is a
user-centric environment.
• An ideal environment for data analysis and decision support
• Fluid, flexible, and interactive
• 100% user-driven
• Very responsive and conducive to the ask–answer–ask again pattern
• Provides the ability to discover answers to complex, unpredictable questions
4. A Blend of Many Technologies
The basic concept of data warehousing is:
• Take all the data from the operational systems.
• Where necessary, include relevant data from outside, such as industry benchmark indicators.
• Integrate all the data from the various sources.

• Remove inconsistencies and transform the data.
• Store the data in formats suitable for easy access for decision making.
Different technologies are, therefore, needed to support these functions. Figure 1-9 shows how a data
warehouse is a blend of the many technologies needed for the various functions.

1.4 Benefits of Data Warehousing


Tangible Benefits:
1. Cost of product introduction comes down with targeted marketing campaigns.
2. Better decisions in terms of cost and quality are taken because query processing is separated from
the operational systems.
3. Data warehouses have led to enhanced asset and liability management, since they provide a clear
picture of enterprise-wide purchasing and inventory patterns, thereby indicating otherwise unseen
credit exposure and opportunities for cost savings.
Intangible Benefits:
1. Improved productivity that is achieved by keeping all the data in a single location.
2. Enhanced customer relations through improved knowledge of individual customer's
requirements and trends in the market.
3. The information extracted from the data warehouse enables better customer relationship
management by tailored product offerings and improved customization.
4. Data warehouses enable reengineering of business processes by providing useful insights into
the work processes.

1.5 Features of DW
Subject-oriented Data
• Organized around major subjects, such as customer, product, sales

• Data warehouses are designed to help you analyze data; they do not focus on daily operations or
transaction processing
• For example, to learn more about your company's sales data, you can build a warehouse that
concentrates on sales.
• Using this warehouse, you can answer questions like "Who was our best customer for this item
last year?" This ability to define a data warehouse by subject matter, sales in this case, makes
the data warehouse subject oriented.
• E.g. claims data are organized around the subject of claims and not by individual applications
of Auto Insurance and Workers’ Comp
• Provide a simple and concise view around particular subject issues by excluding data that
are not useful in the decision support process

Integrated Data
• Data warehouses must put data from disparate sources into a consistent format.
o relational databases, flat files, on-line transaction records
• Data cleaning and data transformation techniques are applied.
o Ensure consistency in naming conventions, encoding structures, attribute measures, etc.
among different data sources
• When they achieve this, they are said to be integrated.

Some of the items that would need to be standardized and made consistent:
• Naming conventions
• Codes
• Data attributes
• Measurements
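
As a rough sketch of what standardizing codes and measurements can look like, the Python snippet below brings records from two hypothetical source systems into one agreed format. The system names, field names, code set, and sample values are all assumptions made for illustration; only the idea (one naming convention, one code set, one unit of measure) comes from the list above.

```python
# Hypothetical records from two source systems that encode the same facts differently.
claims_system = [{"cust": "A17", "gender": "M", "height_in": 70.0}]
billing_system = [{"customer_id": "B42", "sex": "female", "height_cm": 165.0}]

GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}  # one agreed code set
CM_PER_INCH = 2.54

def from_claims(rec):
    """Map a claims record to the standard names, codes, and units."""
    return {
        "customer_id": rec["cust"],
        "gender": GENDER_CODES[rec["gender"].lower()],
        "height_cm": round(rec["height_in"] * CM_PER_INCH, 1),
    }

def from_billing(rec):
    """Map a billing record to the standard names, codes, and units."""
    return {
        "customer_id": rec["customer_id"],
        "gender": GENDER_CODES[rec["sex"].lower()],
        "height_cm": round(rec["height_cm"], 1),
    }

# After mapping, every record has the same field names, codes, and measurement unit.
integrated = [from_claims(r) for r in claims_system] + [from_billing(r) for r in billing_system]
for row in integrated:
    print(row)
```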

Time-Variant Data
• In order to discover trends in business, analysts need large amounts of data.
• The data are kept for many years so they can be used for trends, forecasting, and comparisons
over time.
• A data warehouse's focus on change over time is what is meant by the term time variant.
• The time horizon for the data warehouse is significantly longer than that of operational systems
o Operational database: current value data
o Data warehouse data: provide information from a historical perspective (e.g., past 5-10
years)
• Every key structure in the data warehouse
o Contains an element of time, explicitly or implicitly
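
The difference between a current-value store and a time-variant store can be sketched in a few lines of Python. The product code, months, and prices below are made-up illustration data; the point is only that the warehouse key contains an element of time, so earlier values are kept rather than overwritten.

```python
# Operational style: one current value per product; an update overwrites the old value.
current_price = {"P100": 9.99}
current_price["P100"] = 10.49  # the earlier price is no longer available

# Warehouse style: the key includes a time element (here, year-month), so history is kept.
price_history = {}
price_history[("P100", "2024-01")] = 9.99
price_history[("P100", "2024-02")] = 10.49

def price_trend(history, product):
    """Return (period, price) pairs for one product, oldest first."""
    return sorted(
        (period, price) for (prod, period), price in history.items() if prod == product
    )

trend = price_trend(price_history, "P100")
print(trend)
```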

Non-volatile Data
• A physically separate store of data transformed from the operational environment
• Operational update of data does not occur in the data warehouse environment
o Does not require transaction processing, recovery, and concurrency control mechanisms
o Requires only two operations in data accessing:
▪ initial loading of data and access of data

Data Granularity
• Data granularity refers to the level of detail.
• Depending on the requirements, multiple levels of detail may be present. Many data
warehouses have at least dual levels of granularity.
• Depending on the query, we can then go to the particular level of detail and satisfy the query
• The more granularity levels are kept, the greater the storage requirement
• Decide on the granularity levels based on the data types and the expected system performance
for queries.
Following examples of data granularity in a typical data warehouse.
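
One way to picture dual levels of granularity is a detailed store of individual transactions next to a lightly summarized store of monthly totals. The sketch below assumes made-up account numbers, dates, and amounts; a query can be satisfied from whichever level holds enough detail.

```python
from collections import defaultdict

# Detailed level (hypothetical rows): one record per transaction.
daily_detail = [
    ("ACC-1", "2024-01-05", 120.0),
    ("ACC-1", "2024-01-20", 80.0),
    ("ACC-1", "2024-02-03", 50.0),
    ("ACC-2", "2024-01-11", 200.0),
]

# Summary level: one record per account per month, derived from the detail.
totals = defaultdict(float)
for account, day, amount in daily_detail:
    totals[(account, day[:7])] += amount  # day[:7] is the "YYYY-MM" part
monthly_summary = dict(totals)

def monthly_total(account, month):
    """Answer a summary-level question without touching the detail rows."""
    return monthly_summary.get((account, month), 0.0)

def transactions(account, month):
    """Drop to the detailed level when individual transactions are needed."""
    return [row for row in daily_detail if row[0] == account and row[1].startswith(month)]

print(monthly_total("ACC-1", "2024-01"))
print(transactions("ACC-1", "2024-01"))
```

Keeping both levels costs extra storage, which is the trade-off noted in the list above.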

1.6 Information Flow Mechanism

Main tasks are


1. Select the Source Data
2. Data Staging Area
3. Data Storage Component
4. Information Delivery Component
5. Metadata component
6. Management & control component
1) Select the Source Data:
Features of Production data:
• Main source of data warehouse
• Collected from various operational systems of organization
• Data may reside on different platforms, different databases, and different operating systems, so
disparity is the main issue.
For example, an employee table may be stored in the accounts department, the HR department, and the payroll
department. This is production data. Disparity could arise because in one table the employee name
may be defined as ENAME and be 20 bytes long, in another it may be defined as EMP NAME
and be 16 bytes long, and in another as NAME.
• So before the production data is stored in the DW, all the related data must be collected,
standardized, transformed into standard formats, and integrated into useful data.
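
The ENAME / EMP NAME / NAME example can be sketched as a simple field mapping. Which department uses which field name, the warehouse column name employee_name, and the sample values are assumptions made for illustration only.

```python
# Source field that holds the employee name in each department's table.
# The assignment of field names to departments is an assumption for this sketch.
NAME_FIELD = {"accounts": "ENAME", "hr": "EMP NAME", "payroll": "NAME"}
STANDARD_FIELD = "employee_name"  # hypothetical warehouse column name

def standardize(source, record):
    """Read the name from the source-specific field and store it in one standard form."""
    raw = record[NAME_FIELD[source]]
    cleaned = " ".join(raw.split()).title()  # drop padding, collapse spaces, unify case
    return {STANDARD_FIELD: cleaned, "source": source}

rows = [
    standardize("accounts", {"ENAME": "RAVI KUMAR          "}),  # fixed-width, padded
    standardize("hr", {"EMP NAME": "ravi  kumar"}),
    standardize("payroll", {"NAME": "Ravi Kumar"}),
]
for row in rows:
    print(row)
```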

Features of Internal Data:


• Consists of various personal spreadsheets, documents, customer profiles, etc. that the users often
keep with themselves, especially when they deal with customers on a one-to-one basis.
• Very useful if the contribution of each customer is significant
• Adds complexity to standardization and transformation (due to disparity, as the data may reside
in Word, Excel, Access, etc.)
Features of Archived Data:

• Generally, operational systems store only current data.
• Depending on its age, archived data may be kept on disk storage (older data) or on tape cartridges (oldest data).
• But the warehouse stores current as well as historical data, so we need access to the older and oldest data.
Features of External Data:
• Mainly collected from business magazines, industry newsletters, technology reports, competitive
analysis reports, sales and marketing analysis reports.
• Executives want this data for checking their company's performance
• The first issue is frequency of availability.
• The second issue is that external data does not follow each organization's standards, so the data must be converted
into each company's internal formats.
• The third issue is granularity, because external data is not available at a fine level of granularity.
Extract Data from Source Systems
Nowadays many tools are available in the market to extract data from source systems, e.g.
Actaworks, Data Integrator, and Octopus. Tools can also be built in-house.
The data extraction process performs the following functions:
• Identify sources of data
• Extract required data from each operational system
• Merge data

• Remove inconsistency in common data. E.g. in one table the status is married and in another table it is
single. Remove this inconsistency by finding the true value from the records.
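
The notes do not say how the true value is found. One common rule, assumed here purely for illustration, is to trust the most recently updated record. The employee id, source names, and dates below are made up.

```python
from datetime import date

# Two hypothetical source records that disagree about the same employee.
records = [
    {"emp_id": 7, "source": "hr", "marital_status": "single", "updated": date(2019, 3, 1)},
    {"emp_id": 7, "source": "payroll", "marital_status": "married", "updated": date(2022, 6, 15)},
]

def resolve(records, field):
    """Assumed rule: the value from the most recently updated record wins."""
    latest = max(records, key=lambda r: r["updated"])
    return latest[field]

resolved_status = resolve(records, "marital_status")
print(resolved_status)
```

Other rules are equally possible, such as always trusting one designated system of record; the choice depends on the organization.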

2) Data Staging Area


Provides a place and an area with a set of functions to clean, change, combine, convert, deduplicate,
and prepare source data for storage and use in the data warehouse.
Three major functions need to be performed for getting the data ready
– extract the data (Extraction)
– transform the data (Transformation)
– and then load the data into the data warehouse storage (Loading)

Data Extraction
• Deal with numerous data sources
• Tools for data extraction
– Purchasing outside tools (e.g. Abinitio, Actawork, RT, Mapforce etc)
– Developing in-house programs
• Extract the source data into
– a group of flat files,
– or a data-staging relational database,
– or a combination of both

Data Transformation
• Perform a number of individual tasks
– Clean
– Standardization
– Combine
– Purging and separating out
– Sorting and merging
– Assignment of surrogate keys
• Results: a collection of integrated data that is cleaned, standardized, and summarized
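
Of the tasks listed above, the assignment of surrogate keys is the easiest to show in code. In this minimal sketch the warehouse generates its own integer key for each distinct (source system, source key) pair, so the warehouse does not depend on the key formats of the source systems. The source names, key formats, and customer names are hypothetical.

```python
from itertools import count

def assign_surrogate_keys(rows):
    """Give each distinct (source system, source key) pair one warehouse-generated key."""
    next_key = count(start=1)
    key_map = {}
    keyed_rows = []
    for row in rows:
        natural_key = (row["source"], row["source_key"])
        if natural_key not in key_map:
            key_map[natural_key] = next(next_key)
        keyed_rows.append({"customer_key": key_map[natural_key], **row})
    return keyed_rows, key_map

rows = [
    {"source": "billing", "source_key": "C-001", "name": "Asha"},
    {"source": "claims", "source_key": "001", "name": "Ravi"},
    {"source": "billing", "source_key": "C-001", "name": "Asha"},  # repeat of the first row
]
keyed_rows, key_map = assign_surrogate_keys(rows)
for row in keyed_rows:
    print(row["customer_key"], row["source"], row["source_key"])
```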

Data Loading
• Two distinct groups of tasks
– The initial loading of the data into the data warehouse
– Refresh cycles
• Extract the changes to the source data
• Transform the data revisions
• Feed the incremental data revisions on an ongoing basis
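
The two groups of loading tasks can be sketched as follows. This is a simplified illustration, not a real loader: the row layout, the modified_day field used to detect changes, and the city values are assumptions, and the transformation step is skipped to keep the example short.

```python
def initial_load(source_rows):
    """Build the warehouse store from a full copy of the source rows."""
    return {row["id"]: dict(row) for row in source_rows}

def extract_changes(source_rows, last_load_day):
    """Pick only the rows modified after the previous load."""
    return [row for row in source_rows if row["modified_day"] > last_load_day]

def refresh(warehouse, changed_rows):
    """Feed the incremental revisions into the warehouse store."""
    for row in changed_rows:
        warehouse[row["id"]] = dict(row)
    return warehouse

source = [
    {"id": 1, "city": "Pune", "modified_day": 1},
    {"id": 2, "city": "Mumbai", "modified_day": 1},
]
warehouse = initial_load(source)  # day 1: initial load

source[1] = {"id": 2, "city": "Nagpur", "modified_day": 3}    # a source row changes
source.append({"id": 3, "city": "Delhi", "modified_day": 4})  # a new source row arrives

changes = extract_changes(source, last_load_day=1)  # refresh cycle: only the revisions
refresh(warehouse, changes)
print(sorted(warehouse))
```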

3) Data Storage Component


• A separate repository
– To keep large volume of historical data for analysis
– To keep the data in structures suitable for analysis
• The data warehouses are “read-only” data repositories
– The data is stable and it represents snapshots at specified periods
• The database in a data warehouse must be open
– Must be open to different tools
– RDBMSs or MDDBs

4) Information Delivery Component


• Who are the users?
– The novices, the casual users, the business analysts, and the power users
• Different methods of information delivery
– Ad hoc reports, complex queries, multidimensional analysis, statistical analysis, EIS
feed, data-mining applications
• Information delivery mechanism
– Online, internet, intranet, e-mail

5) Metadata Component
Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database
management system. In the data dictionary, we keep the information about the logical data structures,
the information about the files and addresses, the information about the indexes, and so on. Similarly,
the metadata component in the warehouse stores all the information about the contents of the warehouse.
So metadata is the source of information for the management module.

6) Management & Control Component


This component controls the data transformation and the data transfer into the data warehouse storage.
It moderates the information delivery to the users. It works with the database management systems and
enables data to be properly stored in the repositories. It monitors the movement of data into the staging
area and from there into the data warehouse storage itself. The management and control component
interacts with the metadata component to perform the management and control functions.

1.7 Role of Metadata


After deployment, a user working with the warehouse for the first time has a number of questions, such as:
• Are there any predefined queries?
• What are the various elements of data in the warehouse?
• Does it contain the data that I need?
• How can I browse the data to find out what data is available?
• From which source systems has the data in the data warehouse been collected?
• How old is the data in the warehouse?
• Is there any summary data?
• To what level has the summarization/aggregation of data been done?
• Is there any pre-calculated data already available in the data warehouse?
• When was the data last refreshed?
It is the metadata in the data warehouse that answers all of these questions.

Example of Metadata for Client

Typical metadata contains information about the following:


• Structure of data from the programmer's perspective
• Structure of data from the end-user's perspective
• Source systems that feed the data warehouse
• Transformation process that was applied before the data from the source systems could pass into
the data warehouse
• Data model
• History of data extraction process
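
A metadata entry covering several of these items might look like the sketch below. The class, its field names, and the two catalog entries are hypothetical; real warehouse products define their own metadata structures.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TableMetadata:
    table_name: str           # structure as the programmer sees it
    business_name: str        # name as the end user sees it
    source_systems: list      # source systems that feed this table
    transformations: list     # steps applied before the data was loaded
    summarization_level: str  # e.g. "detail" or "monthly summary"
    last_refreshed: date      # part of the extraction history

catalog = [
    TableMetadata("f_sales_m", "Monthly sales by region", ["order processing"],
                  ["standardize region codes", "sum by month"],
                  "monthly summary", date(2024, 3, 1)),
    TableMetadata("d_customer", "Customer master", ["billing", "claims"],
                  ["deduplicate customers"], "detail", date(2024, 3, 1)),
]

def search(catalog, business_term):
    """Let an end user look up tables by a business term instead of a technical name."""
    term = business_term.lower()
    return [m for m in catalog if term in m.business_name.lower()]

hits = search(catalog, "sales")
for m in hits:
    print(m.table_name, m.source_systems, m.summarization_level, m.last_refreshed)
```

A lookup like this is how metadata can answer questions such as "From which source systems has the data been collected?" and "When was the data last refreshed?".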

Role of Metadata
Metadata in the data warehouse is similar to the data dictionary. The metadata component stores data
about the data. The metadata is often used for building, maintaining, and using the data warehouse. It is
the key to providing users and developers with a road map to the information in the warehouse.
The three main functions that metadata performs in a data warehouse:
• Connects the different parts of the data warehouse, thereby acting as the glue that connects all the parts.
• Provides information about the contents of the data and its underlying structure to the data
warehouse administrator and other users.
• Enables the end-users to search for the desired data in their own business terms.

Classification of Metadata
Operational Metadata
• Contains the following information about the operational data sources
– Data for the data warehouse comes from several operational systems
– The data elements have various field lengths and data types
– We split records, combine parts of records from different source files, and deal with
multiple coding schemes and field lengths

Extraction and Transformation Metadata


• Contain data about the extraction of data from the source systems
– the extraction frequencies
– extraction methods,
– and business rules for the data extractions

End-User Metadata
• The navigational map
– Enables the end-users to find information
– Allows the end-users to use their own business terminology

Why is metadata especially important in a data warehouse?


• It acts as the glue that connects all parts of the data warehouse.
• It provides information about the contents and structures to the developers.
• It opens the door to the end-users and makes the contents recognizable in their own terms

1.8 Data Warehouses and Data Marts
In 1998, Bill Inmon stated, “The single most important issue facing the IT manager this year is
whether to build the data warehouse first or the data mart first.”
Before deciding to build a data warehouse, we need to ask:
• Top-down or bottom-up approach?
• Enterprise-wide or department?
• Which first: data warehouse or data mart?
• Build pilot or go with a full-fledged implementation?
• Dependent or independent data marts?

How Are They Different?


Two different basic approaches
1. Overall data warehouse feeding dependent data marts
2. Several departmental or local data marts combining into a data warehouse
Data Warehouse | Data Marts
Corporate/enterprise-wide | Departmental
Union of all data marts | A single business process
Data received from staging area | Star-join (facts & dimensions)
Structure for corporate view of data | Structure to suit departmental view of data
Combination of more than one business process | Single business process
Lightly indexed | Highly indexed
Takes months to years to implement | Takes months to implement
Flexible query and analysis | Restrictive query and analysis
Size varies from 100 GB to a few TB | Less than 100 GB
Low level of granularity | High level of granularity

Top-Down Approach
The advantages of this approach are:
• A truly corporate effort, an enterprise view of data
• Inherently architected—not a union of disparate data marts
• Single, central storage of data about the content
• Centralized rules and control
• May see quick results if implemented with iterations

The disadvantages are:
• Takes longer to build even with an iterative method
• High exposure/risk to failure
• Needs high level of cross-functional skills
• High outlay without proof of concept

Bottom-Up Approach
The advantages of this approach are:
• Faster and easier implementation of manageable pieces
• Favourable return on investment and proof of concept
• Less risk of failure
• Inherently incremental; can schedule important data marts first
• Allows project team to learn and grow

The disadvantages are:


• Each data mart has its own narrow view of data
• Permeates redundant data in every data mart
• Perpetuates inconsistent and irreconcilable data
• Proliferates unmanageable interfaces

A Practical Approach by Ralph Kimball


1. Plan and define requirements at the overall corporate level
2. Create a surrounding architecture for a complete warehouse
3. Conform and standardize the data content
4. Implement the data warehouse as a series of supermarts, one at a time

An Enterprise Data Warehouse


• A data mart is a logical subset of the complete data warehouse
• A data warehouse is a conformed union of all data marts
• Individual data marts are targeted to particular business groups
• The collection of all the data marts form an integrated whole, called the enterprise data
warehouse
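
The idea of a conformed union can be illustrated with two small data marts that share one product dimension. Because both marts use the same product keys and names, their facts can be combined into one enterprise view. The marts, keys, and figures below are hypothetical.

```python
# Conformed dimension: both marts use the same product keys and product names.
product_dim = {1: "Laptop", 2: "Phone"}

# Two hypothetical departmental marts, each holding its own fact rows.
sales_mart = [(1, 500.0), (2, 300.0), (1, 250.0)]  # (product_key, revenue)
inventory_mart = [(1, 40), (2, 15)]                # (product_key, units_on_hand)

def enterprise_view(product_dim, sales_mart, inventory_mart):
    """Combine facts from both marts through the shared (conformed) product dimension."""
    view = {name: {"revenue": 0.0, "units_on_hand": 0} for name in product_dim.values()}
    for product_key, revenue in sales_mart:
        view[product_dim[product_key]]["revenue"] += revenue
    for product_key, units in inventory_mart:
        view[product_dim[product_key]]["units_on_hand"] += units
    return view

view = enterprise_view(product_dim, sales_mart, inventory_mart)
for product, facts in view.items():
    print(product, facts)
```

If each mart had defined products in its own way, this combination would need a reconciliation step first, which is the drawback of unplanned data marts noted under the bottom-up approach.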

1.9 Data Warehouse Architecture

Various approaches by different Authors
1. Approach by Han and Kamber: Three-Tier Data Warehouse Architecture
Data warehouses adopt a three-tier architecture.
The three tiers are the bottom tier, the middle tier, and the top tier.

Data Sources:
All the data related to a business organization is stored in operational databases, external files, and flat
files. These sources are application oriented; for example, they hold the complete data of the organization, such as
training details, customer details, sales, departments, transactions, and employee details. The data here is present in
different formats or in the host format, and it is not well documented.

Bottom Tier: Data warehouse Server


It is the relational database system. We use the back end tools and utilities to feed data into bottom tier.
These back end tools and utilities perform the Extract, Clean, Load, and refresh functions.
Bottom tier contains
➢ Data warehouse
➢ Metadata Repository
➢ Data Marts
➢ Monitoring and Administration

Data Warehouse:
It is an optimized form of the operational database that contains only relevant information and provides fast
access to data. It is subject oriented, integrated, time variant, and non-volatile.
Metadata repository:

It describes what is available in the data warehouse.
It contains:
▪ Structure of data warehouse
▪ Data names and definitions
▪ Source of extracted data
▪ Algorithm used for data cleaning purpose
▪ Sequence of transformations applied on data
▪ Data related to system performance

Data Marts
A subset of the data warehouse that contains only a small slice of its data
E.g. data pertaining to a single department
Two types of data marts: Dependent & Independent

Monitoring and Administration


▪ Data Refreshment
▪ Data source synchronization
▪ Disaster recovery
▪ Managing access control and security
▪ Manage data growth, database performance
▪ Controlling the number & range of queries
▪ Limiting the size of data warehouse

Middle Tier: OLAP Server


▪ It presents users with a multidimensional view of data from the data warehouse or data marts.
▪ Typically implemented using two models:
▪ ROLAP Model: Presents data in relational tables
▪ MOLAP Model: Presents data in array-based structures, which map directly to the data
cube array structure.
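
The difference between the two models can be sketched with plain Python structures. The regions, quarters, and sales figures are made up, and real OLAP servers are far more elaborate; the sketch only shows the same data held as relational rows and as an array whose positions correspond to dimension members.

```python
# ROLAP style: facts are rows in a relational table (region, quarter, sales).
rows = [
    ("North", "Q1", 10),
    ("North", "Q2", 20),
    ("South", "Q1", 30),
    ("South", "Q2", 40),
]

# MOLAP style: a dense array; the position along each axis identifies a dimension member.
regions = ["North", "South"]
quarters = ["Q1", "Q2"]
cube = [[0] * len(quarters) for _ in regions]
for region, quarter, sales in rows:
    cube[regions.index(region)][quarters.index(quarter)] = sales

# The same question, "total sales for South", answered against each structure.
rolap_answer = sum(sales for region, _, sales in rows if region == "South")  # filter rows
molap_answer = sum(cube[regions.index("South")])                            # slice the array

print(rolap_answer, molap_answer)
```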

Top Tier: Front end tools


▪ It is front end client layer.
▪ Query and reporting tools
▪ Reporting Tools:
o Production reporting tools
o Report writers
▪ Managed query tools: point-and-click creation of SQL, used for example to build a customer mailing list.
▪ Analysis tools: prepare charts based on analysis.
▪ Data mining tools: mine knowledge and discover hidden information, new correlations, and useful patterns.

2. Approach by Paulraj
a) Centralized corporate Data Warehouse
b) Independent Data Marts
c) Federated
d) Hub & Spoke
e) Data Mart Bus

a) Centralized corporate Data Warehouse


A centralized enterprise data warehouse is present. There are no data marts, whether dependent or independent, so all information is delivered from the centralized data warehouse.

b) Independent Data Marts


The data warehouse is a collection of unconnected, disparate data marts, each serving a specific
department or purpose. These data marts in such organizations usually evolve over time without any
overall planning. Each data mart delivers information to its own group of users.

c) Federated
Common data elements in the various data marts, and even data warehouses, that compose the federation are integrated physically or logically. The result behaves like one integrated data warehouse, although the underlying data marts and warehouses remain separate.

d) Hub-and-Spoke
A centralized data warehouse is present, along with data marts that depend on the enterprise data warehouse for their data feed. Information can therefore be delivered both from the centralized data warehouse and from the dependent data marts.
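A hub-and-spoke arrangement can be sketched as follows. The rows, department names, and helper functions are hypothetical; the point is only that each dependent mart is derived from the central warehouse and that queries can be served from either place.

```python
# Central (hub) warehouse rows; the content is invented for illustration.
WAREHOUSE = [
    {"dept": "sales", "metric": "revenue", "value": 500},
    {"dept": "hr", "metric": "headcount", "value": 42},
    {"dept": "sales", "metric": "orders", "value": 31},
]


def build_dependent_mart(warehouse, dept):
    """A dependent mart (spoke) is fed only from the central warehouse."""
    return [row for row in warehouse if row["dept"] == dept]


marts = {dept: build_dependent_mart(WAREHOUSE, dept) for dept in ("sales", "hr")}


def query(dept=None):
    """Deliver from a department's mart if one is named, else from the hub."""
    return marts[dept] if dept is not None else WAREHOUSE


print(len(query("sales")), len(query("hr")), len(query()))  # 2 1 3
```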

e) Data Mart Bus


No distinct, single data warehouse exists. The collection of all data marts forms the data warehouse. The data marts are conformed "super marts": the business dimensions and measured facts are conformed and linked among them. The data mart bus serves the entire enterprise, not just single departments.
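The role of conformed dimensions can be shown with a small sketch. The two marts, the product dimension, and all figures are made up for illustration. Because both marts use the same product keys, their facts can be combined in one report.

```python
# Conformed dimension: one shared definition of "product" for every mart.
PRODUCT_DIM = {1: "pen", 2: "book"}

# Two separate marts (hypothetical facts), both keyed by the shared product key.
sales_mart = [(1, 100.0), (2, 250.0), (1, 50.0)]  # (product_key, revenue)
inventory_mart = [(1, 40), (2, 15)]               # (product_key, units on hand)


def drill_across():
    """Combine facts from both marts through the conformed dimension."""
    report = {name: {"revenue": 0.0, "on_hand": 0} for name in PRODUCT_DIM.values()}
    for key, revenue in sales_mart:
        report[PRODUCT_DIM[key]]["revenue"] += revenue
    for key, units in inventory_mart:
        report[PRODUCT_DIM[key]]["on_hand"] += units
    return report


print(drill_across())
# {'pen': {'revenue': 150.0, 'on_hand': 40}, 'book': {'revenue': 250.0, 'on_hand': 15}}
```

If each mart kept its own product codes, this join would not be possible without a mapping step, which is the situation with independent data marts.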

3. Approach by Reema Thareja


Single-Tier Architecture

The DSS (decision support system) engine, DSS client, and RDBMS server all run on one single machine.
This architecture is not scalable: the entire load falls on a single machine, so the performance of other applications on it also degrades.

Two-Tier Architecture

Client                       DW Server
* GUI/Presentation logic     * Data logic
* Query specification        * Data services
* Data analysis              * Meta data
* Report formatting          * File services
* Summarizing
* Data access

The data warehouse resides on a dedicated RDBMS server, and both the DSS engine and the DSS client reside on client hardware. It uses existing legacy systems as database servers and requires minimal investment in additional hardware and software.
Drawbacks:
• Limited scalability.
• Cannot support a large number of online end users without additional modifications.
• Congestion problems may occur.

Three-Tier Architecture

Client                       Application/Data Mart Server    DW Server
* GUI/Presentation logic     * Summarizing                   * Data logic
* Query specification        * Filtering                     * Data services
* Data analysis              * Meta data                     * Meta data
* Report formatting          * Multidimensional view         * File services
* Data access                * Data access

Advantages:
• Scalable, but scaling comes at a cost.
• Because data is placed close to the users, response time is fast.
• The system is transparent: users need not know where the data is stored or how complex the storage is.
• Network traffic is lower.
Disadvantages:
• The DSS engine is complex because, for every query, it has to find the location of the data on the servers.
• Additional costs are imposed because the warehouse must be maintained and data must be replicated to the local servers.
• Local data administration is needed because the data design has to be controlled and optimized for different queries.

Four-Tier Architecture:

Data is stored in a data warehouse that includes both a relational database and a cache of multidimensional data (the analysis server). After the response data is converted into web page format on the Internet server, it is returned to the client.

1.10 Data Warehousing Design strategy


To build an effective data warehouse, it is important for us to understand data warehouse design
principles. If our data warehouse is not built correctly, we can run into a number of different problems.

From business point of view


Understanding importance of warehouse: We and our organization must understand the importance
of having a data warehouse.
Data integrity: Avoid designing a data warehouse that will load data that is not consistent.
Goal of organization: The goal of our organization should be to integrate data and create standards that
will be used and followed.
Simplicity and user friendly: We want to design a system that is simple to use.
Operational efficiency: Once the data warehouse has been created, it should be able to carry out
operations quickly. It should not have errors or other technical problems. When errors or technical
problems do occur, they should be simple to fix.
Cost: We want to keep costs as low as possible.

IT design principles
Scalable: Design it in a way which will allow it to support expansions or upgrades. You should be able
to adapt it to a number of different business situations.
Design: The design should follow the guidelines of information technology standards.
