Presented By: Nirmalya Fadikar B.E. Information Technology

The document discusses data warehousing, which involves storing historical data from multiple sources in a centralized repository to facilitate reporting and analysis. A data warehouse differs from a decision support database in that it contains larger volumes of more diverse data sourced from various systems. The data is organized and aggregated to support management decision making. Key steps in a data warehousing project include identifying user needs, designing the warehouse, extracting and transforming data from source systems, and loading and maintaining the warehouse over time.

Uploaded by

Nirmalya Fadikar
Copyright
© Attribution Non-Commercial (BY-NC)
Presented By

Nirmalya Fadikar
B.E. Information Technology

A data warehouse is a repository of an organization's electronically stored data. Data
warehouses are designed to facilitate reporting and analysis.

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection
of data organized in support of management decision making.”

Subject-oriented. A data warehouse is structured in terms of the major subject
areas of the organisation such as, in the case of a university, students, lecturers
and modules, rather than in terms of application areas such as enrolment,
payroll and timetabling.
 
Integrated. A data warehouse provides a data repository which integrates
data from different systems, where the data is frequently held in different formats.
The objective is to provide a unified view of data for users.
 
 
Time-variant. A data warehouse explicitly associates time with data. Data in
a warehouse is only valid for some point or period in time.
 
 
Non-volatile. The data in a data warehouse is not updated in real time.
Instead, it is refreshed from data in operational systems on a regular basis.
A consequence of this is that managing data integrity under concurrent updates, a
central concern in operational systems, is not a critical issue for data warehouses.
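
The time-variant and non-volatile properties can be illustrated with a minimal Python sketch. It assumes a hypothetical `SnapshotStore` that keeps university enrolment rows; the class names, fields and sample data are invented for illustration and are not part of the original slides. Each periodic refresh appends a dated batch and never changes earlier rows.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EnrolmentSnapshot:
    """One warehouse row: a fact plus the date on which it was valid."""
    student_id: int
    module: str
    valid_on: date  # time-variant: every row carries its own time


class SnapshotStore:
    """Hypothetical append-only store: refreshes add rows, nothing is updated."""

    def __init__(self):
        self._rows = []

    def refresh(self, operational_rows, load_date):
        # Non-volatile: a refresh appends a new dated batch and leaves
        # earlier batches untouched.
        for student_id, module in operational_rows:
            self._rows.append(EnrolmentSnapshot(student_id, module, load_date))

    def as_of(self, when):
        # Return the most recent batch loaded on or before `when`.
        dates = [r.valid_on for r in self._rows if r.valid_on <= when]
        if not dates:
            return []
        latest = max(dates)
        return [r for r in self._rows if r.valid_on == latest]


store = SnapshotStore()
store.refresh([(1, "Databases"), (2, "Networks")], date(2024, 1, 31))
store.refresh([(1, "Databases"), (2, "Statistics")], date(2024, 2, 29))

january = store.as_of(date(2024, 2, 1))
february = store.as_of(date(2024, 3, 1))
```

Because the January batch is still present after the February refresh, a query can ask what the data looked like at either point in time.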

 
A data warehouse differs from a conventional decision-support
database in a number of ways:

• Volume of data. A data warehouse is likely to hold far more data than a
decision-support database. Volumes of 400 gigabytes or more are commonplace.

• Diverse data sources. The data stored in a warehouse is likely to have
been extracted from a diverse range of application systems, only some of
which may be database systems. These systems are described as data
sources.

• Dimensional access. A warehouse is designed to support a number of distinct
ways (dimensions) in which users may wish to retrieve data. This is
sometimes referred to as the need to facilitate ad-hoc queries.
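
Dimensional access can be sketched in a few lines of Python. The sales rows, the dimension names (`region`, `product`, `year`) and the `roll_up` helper below are assumptions made for illustration only. The point is that the same facts can be totalled along whichever dimensions a user happens to ask for.

```python
from collections import defaultdict

# Hypothetical sales facts: each row has three dimensions and one measure.
SALES = [
    {"region": "North", "product": "Tea", "year": 2023, "amount": 120.0},
    {"region": "North", "product": "Coffee", "year": 2023, "amount": 80.0},
    {"region": "South", "product": "Tea", "year": 2023, "amount": 50.0},
    {"region": "South", "product": "Tea", "year": 2024, "amount": 70.0},
]


def roll_up(rows, dimensions, measure="amount"):
    """Total `measure` for each combination of the requested dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Two ad-hoc queries over the same facts, each along different dimensions.
by_region = roll_up(SALES, ["region"])
by_product_year = roll_up(SALES, ["product", "year"])
```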

Data warehousing projects are large-scale development projects.
The key steps involved in a data warehousing project are outlined below
(Inmon, 2000):
• Users specify information needs.
• Analysts and users create a logical and physical design.
• Sources of data are identified in operational systems, external sources, etc.
• Source data is extracted, scrubbed and transformed.
• Data is transferred and loaded into the warehouse periodically.
• Users are given access to the warehouse data.
• The warehouse is maintained in terms of changing requirements.

Some of the major components of a data warehouse are:


• Operational data. Data for the warehouse may be sourced in a number of
ways, e.g. from mainframe-based hierarchical or network databases, from
relational databases and from data in proprietary file systems.
• Extraction, transformation and loading functions. These ETL operations or
functions are concerned with extracting data from source systems,
transforming it into a suitable form and loading the transformed data
into the data warehouse.
• Warehouse management. A series of functions must be provided to
manage the warehouse: consistency analysis, indexing, denormalisation,
aggregation, backup and archiving.
• Query management. The warehouse must perform a series of operations
concerned with the management of queries for use by a variety of actors:
reporting and query tools, OLAP tools or tools for data mining.
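
The extraction, transformation and loading functions listed above can be sketched as three small Python functions. This is a minimal illustration under stated assumptions: the two sources, their field names and their date formats are hypothetical, and the "warehouse" is just an in-memory list rather than a real database.

```python
import csv
import io
from datetime import datetime

# Two hypothetical sources holding the same kind of data in different formats.
CSV_SOURCE = "id,name,joined\n1, alice ,2024-01-15\n2,BOB,2024-02-01\n"
LEGACY_SOURCE = [{"ID": "3", "NAME": "Carol", "JOINED": "03/02/2024"}]  # DD/MM/YYYY


def extract():
    """Read raw records from each source without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return csv_rows, list(LEGACY_SOURCE)


def to_iso(text, fmt):
    """Parse a date string in the given format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


def transform(csv_rows, legacy_rows):
    """Map both sources onto one schema: int id, title-case name, ISO date."""
    unified = []
    for row in csv_rows:
        unified.append({
            "id": int(row["id"]),
            "name": row["name"].strip().title(),
            "joined": to_iso(row["joined"], "%Y-%m-%d"),
        })
    for row in legacy_rows:
        unified.append({
            "id": int(row["ID"]),
            "name": row["NAME"].strip().title(),
            "joined": to_iso(row["JOINED"], "%d/%m/%Y"),
        })
    return unified


def load(warehouse, rows):
    """Append transformed rows to the warehouse table (here, a plain list)."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
loaded = load(warehouse, transform(*extract()))
```

Keeping the three stages as separate functions mirrors the way the slide describes them, and makes it easy to add another source by extending `extract` and `transform` only.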

Three specialist forms of schema are relevant to data warehousing applications:
star, snowflake and starflake schemas.
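
As an illustration of the first of these, a star schema places one central fact table at the middle of a set of dimension tables that it references. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names, column names and rows are hypothetical and chosen only to show the shape. A snowflake schema would normalise the dimension tables further, for example by moving `region` into its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per product and per store.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region TEXT);

    -- Fact table: one row per sale, pointing at the dimensions.
    -- The REFERENCES clauses document the design; SQLite only enforces
    -- them when PRAGMA foreign_keys is switched on.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        sale_date  TEXT,
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'Tea'), (2, 'Coffee');
    INSERT INTO dim_store   VALUES (10, 'North'), (20, 'South');
    INSERT INTO fact_sales VALUES
        (1, 10, '2024-01-05', 30.0),
        (2, 10, '2024-01-06', 45.0),
        (1, 20, '2024-01-07', 25.0);
""")

# A typical star-schema query: join the fact table to its dimensions
# and aggregate the measure by dimension attributes.
rows = conn.execute("""
    SELECT s.region, p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY s.region, p.name
    ORDER BY s.region, p.name
""").fetchall()
```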
FORMS OF DATA IN A DATA WAREHOUSE
 We may distinguish a number of different forms of data in a data warehouse:
• Detailed data. This comprises the detailed production data. Usually, detailed data
is not stored on-line but is aggregated on a periodic basis. Sales data from a
supermarket application is an example of detailed data.

• Summarised data. Data in the warehouse is normally summarised or aggregated
to speed up the performance of queries. The data may be lightly summarised or
highly summarised. Summarised data needs to be updated periodically when the
detailed data is refreshed. As an example, sales data may be summarised in
terms of particular geographical areas, time-periods and/or product lines.
• Meta-data. Data about data is needed to enable the extraction, transformation
and loading processes by mapping data sources to the warehouse schema. Meta-data
is also used to automate the production of summary data and to facilitate
query management.

• Archive data. Data needs to be periodically archived from the warehouse to
prevent the database growing too large for its platform. Normally, this is done
on the basis of some retention period established for data.

• Backup data. Just as in conventional databases, detailed, summary and meta-data
need to be backed up regularly in order to recover from failure.
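
Two of the forms of data above, summarised data and archive data, can be sketched in Python as follows. The sample rows, the monthly grouping and the 180-day retention period are assumptions chosen for illustration; a real warehouse would set these according to its own requirements.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical detailed sales rows.
DETAILED = [
    {"day": date(2024, 1, 5), "area": "North", "amount": 30.0},
    {"day": date(2024, 1, 20), "area": "North", "amount": 20.0},
    {"day": date(2024, 2, 3), "area": "South", "amount": 40.0},
    {"day": date(2023, 6, 1), "area": "South", "amount": 15.0},
]


def summarise(rows):
    """Aggregate detailed rows into monthly totals per geographical area."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["area"], row["day"].year, row["day"].month)] += row["amount"]
    return dict(totals)


def split_for_archive(rows, today, retention_days):
    """Separate rows still within the retention period from those to archive."""
    cutoff = today - timedelta(days=retention_days)
    keep = [r for r in rows if r["day"] >= cutoff]
    archive = [r for r in rows if r["day"] < cutoff]
    return keep, archive


summary = summarise(DETAILED)
keep, archive = split_for_archive(DETAILED, today=date(2024, 3, 1), retention_days=180)
```

The summary would need to be recomputed whenever the detailed rows are refreshed, which is the periodic update the slide refers to.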
 
 
Some of the benefits that a data warehouse provides are as follows:
• A data warehouse provides a common data model for all data of interest regardless of the data's
source. This makes it easier to report and analyze information than it would be if multiple data
models were used to retrieve information such as sales invoices, order receipts, general ledger
charges, etc.

• Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This
greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users so that, even if
the source system data is purged over time, the information in the warehouse can be stored safely
for extended periods of time.

• Because they are separate from operational systems, data warehouses provide retrieval of data
without slowing down operational systems.
• Data warehouses can work in conjunction with and, hence, enhance the value of operational
business applications, notably customer relationship management (CRM) systems.

• Data warehouses facilitate decision support system applications such as trend reports (e.g., the
items with the most sales in a particular area within the last two years), exception reports, and
reports that show actual performance versus goals.
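
The trend report mentioned in the last point (the items with the most sales in a particular area within a given period) might look like the following Python sketch. The sales tuples, the date window and the `top_items` helper are hypothetical and serve only to show the kind of query a warehouse is meant to answer.

```python
from collections import Counter
from datetime import date

# Hypothetical sales facts: (item, area, sale date, units sold).
SALES = [
    ("Tea", "North", date(2023, 5, 1), 10),
    ("Coffee", "North", date(2024, 1, 10), 25),
    ("Tea", "North", date(2024, 2, 2), 20),
    ("Tea", "South", date(2024, 2, 3), 99),
    ("Coffee", "North", date(2020, 1, 1), 500),
]


def top_items(sales, area, start, end, n=3):
    """Rank items by units sold in one area between start and end inclusive."""
    units = Counter()
    for item, sale_area, day, qty in sales:
        if sale_area == area and start <= day <= end:
            units[item] += qty
    return units.most_common(n)


# Roughly the last two years of sales in the North area.
report = top_items(SALES, "North", date(2022, 3, 1), date(2024, 3, 1))
```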
Sample Applications

Some of the applications data warehousing can be used for are:

• Credit card churn analysis.

• Insurance fraud analysis.

• Call record analysis.

• Logistics management.
