Report on Principles of Fragmentation in Computer Science
Manish
2060 IT 6th
DWD PEC-IT602B-N
[Figure: data warehouse architecture diagram. Boxes shown:
Core Data Warehouse (Integrated, Subject-Independent Data);
Data Staging Area (Cleansing, Transformation, Temporary Storage);
Operational Data Sources (Databases, Files, External Data).]
missing data
Standardizing data formats (e.g.,
dimensions
Applying business rules and data
validation checks
3. Temporary Storage: The staging
area acts as a temporary storage area
for the extracted and transformed data
before loading it into the core data
warehouse. This allows for better
management of data flows and enables
parallel processing of different data
streams.
4. Metadata Management: Metadata,
which is data about data, is captured
and managed in the staging area. This
includes information about data
sources, transformations, data quality
rules, and other metadata that
supports data lineage and auditing.
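The staging-area duties listed above (handling missing data,
standardizing formats, applying validation rules, and recording
metadata) can be sketched in a few lines of Python. This is only an
illustrative sketch, not a prescribed implementation: the function
name stage_records, the field names region, sale_date and amount, the
DD/MM/YYYY input format, and the non-negative-amount rule are all
assumptions made for the example.

```python
from datetime import datetime, timezone


def stage_records(raw_rows, source_name):
    """Cleanse and standardize extracted rows.

    Returns (staged_rows, rejected_rows, metadata). All field names and
    rules here are hypothetical, chosen only for illustration.
    """
    staged, rejected = [], []
    for row in raw_rows:
        # Handle missing data: substitute a default for an absent region.
        region = (row.get("region") or "UNKNOWN").strip().upper()
        # Standardize formats: DD/MM/YYYY text becomes an ISO 8601 date,
        # and the amount becomes a float.
        try:
            sale_date = datetime.strptime(row["sale_date"], "%d/%m/%Y").date().isoformat()
            amount = float(row["amount"])
        except (KeyError, ValueError, TypeError):
            rejected.append(row)
            continue
        # Business rule (assumed): amounts must be non-negative.
        if amount < 0:
            rejected.append(row)
            continue
        staged.append({"region": region, "sale_date": sale_date, "amount": amount})

    # Metadata captured alongside the data to support lineage and auditing.
    metadata = {
        "source": source_name,
        "staged_at": datetime.now(timezone.utc).isoformat(),
        "rows_in": len(raw_rows),
        "rows_staged": len(staged),
        "rows_rejected": len(rejected),
        "rules": ["default missing region", "ISO 8601 dates", "amount >= 0"],
    }
    return staged, rejected, metadata
```

Rejected rows are returned rather than dropped so that they can be
inspected or reprocessed before the load into the core warehouse.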
Middle Tier - Core Data Warehouse:
1. Data Integration: The core data
warehouse integrates data from
multiple sources, resolving any data
inconsistencies, redundancies, or
conflicts. This process ensures a
consistent and unified view of data
across the organization.
2. Subject-Independent Data
Structure: The core data warehouse
typically employs a subject-independent
data model to store atomic-level data,
usually in a normalized schema, though
some designs denormalize. Because the
data is not shaped for any one subject
area, this structure allows for maximum
flexibility in data analysis and reporting.
3. History and Audit Tracking: The core
data warehouse maintains a historical
record of data changes over time,
enabling time-based analysis and data
auditing capabilities.
4. Data Partitioning and Indexing: To
optimize query performance, the core
data warehouse employs partitioning
and indexing strategies based on
common access patterns and workload
characteristics.
5. Backup and Recovery: Robust
backup and recovery mechanisms are
implemented to ensure data integrity
and business continuity in case of
system failures or disasters.
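History tracking, as described in point 3 above, is often done by
keeping every version of a record together with the period in which
it was valid. The Python sketch below shows one common way to do this
(validity intervals, in the style of a Type 2 slowly changing
dimension); the source does not say which technique is intended, and
the names apply_change, as_of, valid_from and valid_to are
assumptions made for the example.

```python
from datetime import date


def apply_change(history, key, attributes, effective):
    """Record a new version of `key`, closing the currently open version.

    `history` is a list of version dicts; an open version has valid_to=None.
    """
    current = next(
        (r for r in history if r["key"] == key and r["valid_to"] is None), None
    )
    if current is not None:
        if current["attributes"] == attributes:
            return history  # nothing changed, so no new version is needed
        current["valid_to"] = effective  # close the old version
    history.append(
        {"key": key, "attributes": dict(attributes),
         "valid_from": effective, "valid_to": None}
    )
    return history


def as_of(history, key, when):
    """Return the attributes of `key` as they were on date `when`, or None."""
    for r in history:
        if (r["key"] == key and r["valid_from"] <= when
                and (r["valid_to"] is None or when < r["valid_to"])):
            return r["attributes"]
    return None
```

Because old versions are closed rather than overwritten, a query such
as as_of(history, "C1", date(2023, 12, 31)) can reconstruct the state
of a record at any past date, which is what time-based analysis and
auditing rely on.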
Top Tier - Data Marts:
1. Subject-Oriented Data Structure:
Data marts are organized around
specific subjects or business areas,
such as sales, finance, or marketing.
The data structure is optimized for the
specific analytical requirements of each
subject area, often using a dimensional
or denormalized schema.
2. Data Summarization and
Aggregation: Data marts typically
contain summarized and aggregated
data derived from the core data
warehouse, tailored for specific
reporting and analysis needs. This
helps improve query performance for
common analytical workloads.
3. Query Optimization: Data marts are
designed and optimized for specific
query patterns and workloads,
employing techniques such as
materialized views, indexing, and
caching to enhance query
performance.
4. User Access Controls: Data marts
often have more granular user access
controls and security measures in place
to ensure data privacy and compliance
with relevant regulations and policies.
5. Departmental or Functional Focus:
Data marts are typically focused on
serving the analytical needs of specific
departments or functional areas within
an organization, such as marketing,
finance, or sales.
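The summarization described in point 2 above can be illustrated with
a small Python sketch that derives a sales data mart from atomic
warehouse rows. It is a minimal example under stated assumptions: the
function name build_sales_mart, the field names region, sale_date and
amount, and the choice of region and month as the grouping level are
hypothetical.

```python
from collections import defaultdict


def build_sales_mart(atomic_rows):
    """Aggregate atomic sales rows into one summary row per (region, month).

    Each input row is assumed to carry an ISO date string in `sale_date`.
    """
    totals = defaultdict(lambda: {"total_amount": 0.0, "order_count": 0})
    for row in atomic_rows:
        month = row["sale_date"][:7]  # "YYYY-MM" prefix of the ISO date
        cell = totals[(row["region"], month)]
        cell["total_amount"] += row["amount"]
        cell["order_count"] += 1
    # Emit rows in a stable order, sorted by region and then month.
    return [
        {"region": region, "month": month, **cell}
        for (region, month), cell in sorted(totals.items())
    ]
```

A report that needs monthly totals per region then reads a handful of
pre-aggregated rows instead of scanning every atomic transaction,
which is the performance benefit the text refers to.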
The three-tier architecture is a logical
separation of concerns; in practice, the
physical
implementation may vary based on
organizational needs, data volumes, and
performance requirements. Some
organizations may combine the staging
area and core data warehouse into a
single physical layer, while others may
have multiple core data warehouses or
data marts for different business units or
use cases.
The three-tier data warehouse
architecture promotes data quality,
scalability, and performance optimization
while providing a structured approach to
managing and analyzing large volumes of
data from diverse sources.
Q3. What are the steps in designing a
data warehouse?
Ans.) Designing a data warehouse
involves several key steps to ensure that
it meets the organization's analytical and
reporting requirements. The typical
steps are:
1. Define Business Requirements and
Goals: Understand the organization's
business objectives, key performance
indicators (KPIs), and the types of
analyses and reports required. This step
helps determine the scope and
requirements of the data warehouse.
2. Identify and Analyse Data Sources:
Identify the various operational data
sources (e.g., transactional systems,
databases, flat files) that will feed data
into the data warehouse. Analyse the
data structures, data quality, and
consistency of these sources.