1) - Briefly Describe the Architecture of Data Warehousing
Single-Tier Architecture
Single-tier architecture is rarely used in practice. Its purpose is to minimize the
amount of data stored; to reach this goal, it removes data redundancies.
As the figure shows, the only layer physically available is the source layer. In this approach,
data warehouses are virtual: the data warehouse is implemented as a multidimensional view of
operational data created by specific middleware, or an intermediate processing layer.
The weakness of this architecture lies in its failure to meet the requirement for separation
between analytical and transactional processing. Analysis queries are submitted to operational
data after the middleware interprets them. As a result, analytical queries share resources with,
and therefore affect, transactional workloads.
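To make the single-tier idea concrete, here is a minimal sketch, assuming Python's built-in `sqlite3` module and an in-memory database. The `orders` table, the `sales_by_region` view, and the helper functions are hypothetical; a plain SQL view stands in for the middleware, which in a real system would translate multidimensional queries rather than simply expand a view. It illustrates both points above: only the operational table is physically stored, and every analytical query runs against live transactional data.

```python
import sqlite3


def build_operational_db() -> sqlite3.Connection:
    """Create an in-memory operational (OLTP) database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, product, amount) VALUES (?, ?, ?)",
        [("North", "A", 100.0), ("North", "B", 50.0), ("South", "A", 70.0)],
    )
    # The "virtual warehouse": a view stores no data of its own, so nothing is duplicated.
    # (Assumption: a plain SQL view stands in for the middleware layer.)
    conn.execute(
        """CREATE VIEW sales_by_region AS
           SELECT region, SUM(amount) AS total, COUNT(*) AS n_orders
           FROM orders GROUP BY region"""
    )
    return conn


def analytical_query(conn: sqlite3.Connection):
    # Expanded by SQLite into a query on the live operational table.
    return conn.execute(
        "SELECT region, total, n_orders FROM sales_by_region ORDER BY region"
    ).fetchall()


def stored_tables(conn: sqlite3.Connection):
    # Lists physically stored tables; views are not included.
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_operational_db()
    print(stored_tables(conn))
    print(analytical_query(conn))
    # A new transaction is visible to analysis at once: there is no separation.
    conn.execute("INSERT INTO orders (region, product, amount) VALUES ('South', 'B', 30.0)")
    print(analytical_query(conn))
```

Because the view stores nothing, the second result reflects the new order immediately; the flip side is that each analytical query scans the same `orders` table that transactions are writing to.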
Two-Tier Architecture
The requirement for separation plays an essential role in defining the two-tier architecture
for a data warehouse system, as shown in the figure.
Although it is typically called a two-layer architecture to highlight the separation between
physically available sources and data warehouses, it in fact consists of four successive data
flow stages:
1. Source layer: A data warehouse system uses heterogeneous sources of data. That
data is initially stored in corporate relational databases or legacy databases, or it may
come from information systems outside the corporate walls.
2. Data Staging: The data stored in the sources should be extracted, cleansed to
remove inconsistencies and fill gaps, and integrated to merge heterogeneous
sources into one standard schema. The so-called Extraction, Transformation,
and Loading (ETL) tools can combine heterogeneous schemata, extract, transform,
cleanse, validate, filter, and load source data into a data warehouse.
3. Data Warehouse layer: Information is stored in one logically centralized single
repository: a data warehouse. The data warehouse can be accessed directly, but it
can also be used as a source for creating data marts, which partially replicate data
warehouse contents and are designed for specific enterprise departments. Metadata
repositories store information on sources, access procedures, data staging, users,
data mart schemata, and so on.
4. Analysis: In this layer, integrated data is accessed efficiently and flexibly to issue
reports, dynamically analyze information, and simulate hypothetical business
scenarios. It should feature aggregate information navigators, complex query
optimizers, and user-friendly GUIs.
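The four stages can be traced end to end in a small, purely illustrative Python sketch. The two sources (`CRM_ROWS`, `LEGACY_LINES`), the `Sale` schema, the country mapping, and the cleansing rule (a missing amount becomes `0.0`) are all assumptions made for the example, not features of any particular ETL tool; real ETL tools run these steps at much larger scale and with far richer rules.

```python
from collections import defaultdict
from dataclasses import dataclass

# --- 1. Source layer: two hypothetical, heterogeneous sources ---
CRM_ROWS = [  # records from a corporate relational database
    {"cust_name": "Rossi", "country": "IT", "amount_eur": 120.0},
    {"cust_name": "Smith", "country": "UK", "amount_eur": None},  # gap to fill
]
LEGACY_LINES = [  # flat records from a legacy system
    "ROSSI;ITALY;80.50",
    "MULLER;GERMANY;200",
    "BADROW",  # malformed record
]
# Maps both sources' country representations onto one standard code.
COUNTRY_CODES = {"IT": "IT", "UK": "UK", "ITALY": "IT", "GERMANY": "DE"}


@dataclass(frozen=True)
class Sale:
    """The single standard schema that all sources are integrated into."""
    customer: str
    country: str
    amount: float


# --- 2. Data staging (ETL): extract, cleanse, validate, transform ---
def stage_crm(rows):
    for r in rows:
        # Cleanse: fill a missing amount with 0.0 (a deliberately simplistic rule).
        amount = r["amount_eur"] if r["amount_eur"] is not None else 0.0
        yield Sale(r["cust_name"].title(), COUNTRY_CODES[r["country"]], float(amount))


def stage_legacy(lines):
    for line in lines:
        parts = line.split(";")
        if len(parts) != 3:  # validate and filter out malformed records
            continue
        name, country, amount = parts
        yield Sale(name.title(), COUNTRY_CODES[country], float(amount))


# --- 3. Data warehouse layer: load into one logically centralized store ---
def load_warehouse():
    warehouse = list(stage_crm(CRM_ROWS)) + list(stage_legacy(LEGACY_LINES))
    # A toy metadata record describing what was loaded and from where.
    metadata = {"sources": ["crm", "legacy"], "rows_loaded": len(warehouse)}
    return warehouse, metadata


def build_data_mart(warehouse, countries):
    """A data mart partially replicates warehouse contents for one department."""
    return [s for s in warehouse if s.country in countries]


# --- 4. Analysis layer: aggregate the integrated data for reporting ---
def total_by(rows, key):
    totals = defaultdict(float)
    for s in rows:
        totals[getattr(s, key)] += s.amount
    return dict(totals)


if __name__ == "__main__":
    warehouse, metadata = load_warehouse()
    italy_mart = build_data_mart(warehouse, {"IT"})
    print(metadata)
    print(total_by(warehouse, "country"))
    print(total_by(italy_mart, "customer"))
```

The two `Rossi` records arrive in different formats but end up in the same standard schema, which is what lets the analysis layer aggregate them together; the data mart is simply a filtered copy of warehouse rows.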
Three-Tier Architecture
The three-tier architecture consists of the source layer (containing multiple source systems),
the reconciled layer, and the data warehouse layer (containing both data warehouses and
data marts). The reconciled layer sits between the source data and the data warehouse.
The main advantage of the reconciled layer is that it creates a standard reference data
model for the whole enterprise. At the same time, it separates the problems of source data
extraction and integration from those of data warehouse population. In some cases,
the reconciled layer is also used directly to better accomplish some operational tasks, such
as producing daily reports that cannot be satisfactorily prepared using the corporate
applications, or periodically generating data flows to feed external processes so that they
benefit from cleaning and integration.
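A minimal sketch of the reconciled layer, again assuming Python's `sqlite3` module and an in-memory database. The source formats, table names (`reconciled_sales`, `dw_daily_sales`), and helper functions are hypothetical. Source-specific conversions live only in `populate_reconciled`, the warehouse is loaded only from the reconciled table, and a daily report is served straight from the reconciled layer.

```python
import sqlite3
from datetime import datetime

# Hypothetical source systems with different date and region conventions.
SOURCE_A = [(1, "2024-05-01", "north", 100.0), (2, "2024-05-02", "south", 40.0)]
SOURCE_B = [{"id": 7, "day": "01/05/2024", "area": "NORTH", "value": "50"}]


def populate_reconciled(conn):
    """Reconciled layer: one standard reference model, integrated and cleaned once."""
    conn.execute(
        "CREATE TABLE reconciled_sales "
        "(order_id TEXT PRIMARY KEY, sale_date TEXT, region TEXT, amount REAL)"
    )
    rows = [(f"A-{i}", d, r.lower(), float(a)) for i, d, r, a in SOURCE_A]
    rows += [
        (
            f"B-{s['id']}",
            datetime.strptime(s["day"], "%d/%m/%Y").date().isoformat(),  # unify dates
            s["area"].lower(),  # unify region codes
            float(s["value"]),  # unify numeric type
        )
        for s in SOURCE_B
    ]
    conn.executemany("INSERT INTO reconciled_sales VALUES (?, ?, ?, ?)", rows)


def populate_warehouse(conn):
    """Warehouse layer: a physical copy loaded only from the reconciled layer."""
    conn.execute(
        """CREATE TABLE dw_daily_sales AS
           SELECT sale_date, region, SUM(amount) AS total
           FROM reconciled_sales GROUP BY sale_date, region"""
    )


def daily_report(conn, day):
    """An operational task served directly by the reconciled layer."""
    return conn.execute(
        "SELECT order_id, region, amount FROM reconciled_sales "
        "WHERE sale_date = ? ORDER BY order_id",
        (day,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    populate_reconciled(conn)
    populate_warehouse(conn)
    print(conn.execute("SELECT * FROM dw_daily_sales ORDER BY sale_date, region").fetchall())
    print(daily_report(conn, "2024-05-01"))
```

If a source changes its date or region format, only `populate_reconciled` needs to change; `populate_warehouse` and `daily_report` keep working against the same reference model. Unlike a single-tier view, `dw_daily_sales` is a physical copy, so analytical queries no longer touch the sources.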
Statistics is a field of mathematics that pertains to data analysis. Statistical methods and
equations can be applied to a data set in order to analyze and interpret results, explain variations
in the data, or predict future data. A few examples of statistical information we can calculate are: