Group Work

The document is a group assignment from the University of Lay Adventist of Kigali focusing on Data Warehousing and Data Mining. It covers various aspects of data warehousing including definitions, types, uses, advantages, and the differences between OLTP and warehousing. Additionally, it discusses the roadmap for data warehousing, information analysis, database design, and common mistakes in the planning and development process.

Uploaded by

Phial
Copyright
© All Rights Reserved

UNIVERSITY OF LAY ADVENTIST OF KIGALI (UNILAK)

FACULTY OF COMPUTING AND INFORMATION SCIENCE

DEPARTMENT: IT-NETWORKING

RWAMAGANA CAMPUS

COURSE: DATA WAREHOUSE AND DATA MINING

GROUPWORK ASSIGNMENT N02

REG NUMBERS:

❖ 24495/2024: Jean d’Amour UWAMBAYE IKIREZI

❖ 25189/2024: Christine UWASHEMA

❖ 20636/2022: Walter HABIYAMBERE

❖ 19793/2022: Simon MAHORO

❖ 18558/2021: Sam D Jackson

❖ 18018/2021: Jean Sauveur Muhurwa


Topic: Comprehensive Summary of Unit V: Data Warehouse

Submitted to Lecturer: Dr. K.N. Jonathan, PhD


Date: 13/06/2024

TABLE OF CONTENTS

● Introduction to Data Warehousing
1. Definition
2. Types
3. Operational Data Store
4. Enterprise Data Warehouse
5. Data Marts
6. Uses
7. Advantages
● OLTP vs. Warehousing
1. Roadmap to Data Warehousing
2. Data Extraction and Load
3. Metadata
4. Storage
● Information Analysis & Delivery
1. Query Optimization
2. Managing the Data Warehouse
1. Data Management
2. Process Monitoring
● OLAP and OLTP
1. Metadata and Data Access
● Planning and Development Process
1. Planning
2. Development Process
3. Common Mistakes
● References

Introduction to Data Warehousing

Definition:

● A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports management’s decision-making process.
● Data warehousing involves extracting value from informational assets by keeping them in special storage systems called data warehouses.

Types:

1. Operational Data Store: Mirrors operational data, e.g., items in stock.
2. Enterprise Data Warehouse: Supports historical analysis and complex pattern analysis.
3. Data Marts: Smaller, specialized data warehouses for specific departments or functions.

Uses:

● Presentation of standard reports and graphs.
● Dimensional analysis and data mining.

Advantages:

● Reduces the cost of information access.
● Improves customer responsiveness.
● Identifies hidden business opportunities.
● Supports strategic decision-making.

OLTP vs. Warehousing:

In each pair, the first item describes OLTP and the second describes the data warehouse.

● Organized by transaction vs. organized by subject.
● More users vs. fewer users.
● Accesses a few records vs. scans entire tables.
● Smaller databases vs. larger databases.
● Normalized data structure vs. unnormalized data structure.
● Continuous updates vs. periodic updates.

Roadmap to Data Warehousing

Data Extraction and Load:

● Source Identification: Tables, files, documents, commercial databases, emails, Internet.
● Data Cleaning: Using tools like Apertus to address issues like inconsistent naming and units.
● Data Transformation: Converting codes, aggregating, and calculating derived values using tools like SAS.
● Data Reengineering: Enhancing data quality and structure.
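
The cleaning and transformation steps above can be illustrated with a minimal Python sketch. The record layout, product-name aliases, unit table, and status codes are all hypothetical, chosen only to show inconsistent names and units being fixed, codes being converted, a derived value being calculated, and rows being aggregated. It does not represent how Apertus or SAS work internally.

```python
from collections import defaultdict

# Hypothetical raw rows from two source systems (names, units, and codes differ).
RAW_ROWS = [
    {"product": "Coca Cola", "qty": 2.0, "unit": "kg", "status": "S", "price": 3.0},
    {"product": "coca-cola", "qty": 500.0, "unit": "g", "status": "1", "price": 3.0},
    {"product": "Fanta", "qty": 1.0, "unit": "kg", "status": "R", "price": 2.5},
]

# Assumed lookup tables; a real project would load these from metadata.
NAME_ALIASES = {"coca cola": "Coca-Cola", "coca-cola": "Coca-Cola", "fanta": "Fanta"}
TO_KG = {"kg": 1.0, "g": 0.001}
STATUS_CODES = {"S": "sold", "1": "sold", "R": "returned"}


def clean(row):
    """Data cleaning: one canonical product name and one unit (kg)."""
    return {
        "product": NAME_ALIASES[row["product"].strip().lower()],
        "qty_kg": row["qty"] * TO_KG[row["unit"]],
        "status": row["status"],
        "price_per_kg": row["price"],
    }


def transform(row):
    """Data transformation: convert codes and calculate a derived value."""
    out = dict(row)
    out["status"] = STATUS_CODES[row["status"]]
    out["revenue"] = row["qty_kg"] * row["price_per_kg"]
    return out


def aggregate(rows):
    """Aggregation: total revenue of sold items per product."""
    totals = defaultdict(float)
    for row in rows:
        if row["status"] == "sold":
            totals[row["product"]] += row["revenue"]
    return dict(totals)


def run_pipeline(raw_rows):
    """Clean, transform, then aggregate the raw rows."""
    return aggregate(transform(clean(r)) for r in raw_rows)


if __name__ == "__main__":
    print(run_pipeline(RAW_ROWS))
```

Keeping each step as its own function mirrors the list above: cleaning rules can change without touching the transformation or aggregation logic.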

Metadata:

● Administrative Metadata: Describes source database contents, required transformations, and history of migrated data.
● End User Metadata: Defines warehouse data, descriptions, consolidation hierarchy.

Storage:

● Relational databases (RDBMS) and Multidimensional Databases (MDD).
● Measurements quantify business processes; dimensions describe those measurements.

Information Analysis & Delivery

Query Optimization:

● Query Optimizers: Improve retrieval speed using tools like bitmap indices.
● Ad hoc Queries: Simple queries and analysis functions.
● Managed Queries: Business layer between end users and database.
● Multidimensional Analysis: OLAP supports complex analysis of dimensional data.
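
As a rough illustration of why bitmap indices speed up retrieval, the sketch below builds one bitmap per distinct column value, using a Python integer as the bit vector, and answers a two-condition filter with a single bitwise AND. The rows and column names are hypothetical, and production optimizers use compressed, on-disk bitmaps rather than this in-memory form.

```python
from collections import defaultdict

# Hypothetical fact rows; the column names and values are illustrative only.
ROWS = [
    {"region": "East", "product": "Soap"},
    {"region": "West", "product": "Soap"},
    {"region": "East", "product": "Milk"},
    {"region": "East", "product": "Soap"},
]


def build_bitmap_index(rows, column):
    """Map each distinct value to an integer whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[column]] |= 1 << position
    return dict(index)


def matching_rows(bitmap):
    """Decode a bitmap back into the list of row positions it marks."""
    positions = []
    position = 0
    while bitmap:
        if bitmap & 1:
            positions.append(position)
        bitmap >>= 1
        position += 1
    return positions


region_index = build_bitmap_index(ROWS, "region")
product_index = build_bitmap_index(ROWS, "product")

# A filter such as region = 'East' AND product = 'Soap' becomes one bitwise AND.
east_soap = region_index["East"] & product_index["Soap"]

if __name__ == "__main__":
    print(matching_rows(east_soap))
```

Bitmap indices of this kind suit columns with few distinct values, which is typical of warehouse dimensions such as region or product category.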

Managing the Data Warehouse

Data Management:

● Size and Storage: Addressing storage needs and security.
● Backups and Tracking: Ensuring data integrity and timely updates.

Process Monitoring:

● Monitoring changes in source data, ensuring data quality and accuracy.

Tools:

● Data Extraction: SAS.
● Data Cleaning: Apertus, Trillium.
● Data Storage: Oracle, Sybase.
● Optimizers: Advanced Parallel Optimizer, Bitmap Indices.

Database Design

Key Considerations:

● Simplicity: Ensuring data cleanliness and fast query processing.
● Fast Loading: Efficient data input processes.

Star Schema:

● A central fact table containing numerical values (sales, orders, budget, shipment).
● Dimension tables containing descriptive character data (period, market, product).
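
A star schema of this shape can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below are assumptions made for illustration: one fact table holds the numeric measures, and the period, market, and product dimension tables hold the descriptive text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive character data.
CREATE TABLE period  (period_id  INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE market  (market_id  INTEGER PRIMARY KEY, region  TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name    TEXT);

-- Central fact table: numeric measures plus one key per dimension.
CREATE TABLE sales_fact (
    period_id  INTEGER REFERENCES period(period_id),
    market_id  INTEGER REFERENCES market(market_id),
    product_id INTEGER REFERENCES product(product_id),
    sales      REAL,
    orders     INTEGER
);

INSERT INTO period  VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO market  VALUES (1, 'East'), (2, 'West');
INSERT INTO product VALUES (1, 'Soap'), (2, 'Milk');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 100.0, 4),
    (1, 2, 1,  50.0, 2),
    (2, 1, 2,  75.0, 3),
    (2, 1, 1,  25.0, 1);
""")


def sales_by_region_and_quarter(connection):
    """Join the fact table to two dimensions and total the sales measure."""
    query = """
        SELECT m.region, p.quarter, SUM(f.sales)
        FROM sales_fact AS f
        JOIN market AS m ON m.market_id = f.market_id
        JOIN period AS p ON p.period_id = f.period_id
        GROUP BY m.region, p.quarter
        ORDER BY m.region, p.quarter
    """
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    for row in sales_by_region_and_quarter(conn):
        print(row)
```

Every analytical query follows the same pattern: filter and group on dimension columns, then aggregate measures from the fact table.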

Variations:

● Outboard Tables
● Fact Table Families
● Multistar Fact Tables

OLAP and OLTP

OLAP Features:

● Reporting: Slice reports, pivot reports, alert-reporting, time-based, and exception reporting.
● Wide OLAP: Generating and storing synthesized information, modeling capabilities, forecasting, trend analysis, optimization, statistical analysis.
● Relational OLAP: Powerful SQL-generator, optimized SQL for databases, rapid changes in dimensions.
● MDD OLAP: Row-level calculations, financial functions, currency conversions, interest calculations.
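
To make the reporting features concrete, here is a small Python sketch of slice, pivot, and exception reports over a hypothetical three-dimensional sales cube. The cube contents, the dimension order, and the threshold are assumptions for illustration and do not reflect any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical cells of a small sales cube: (period, market, product) -> sales.
CUBE = {
    ("Q1", "East", "Soap"): 100,
    ("Q1", "West", "Soap"): 50,
    ("Q2", "East", "Milk"): 75,
    ("Q2", "East", "Soap"): 25,
}
DIMENSIONS = ("period", "market", "product")


def slice_cube(cube, dimension, value):
    """Slice report: keep only the cells where one dimension has a fixed value."""
    axis = DIMENSIONS.index(dimension)
    return {key: sales for key, sales in cube.items() if key[axis] == value}


def pivot(cube, row_dim, col_dim):
    """Pivot report: total sales laid out as rows by columns for two dimensions."""
    r, c = DIMENSIONS.index(row_dim), DIMENSIONS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for key, sales in cube.items():
        table[key[r]][key[c]] += sales
    return {row: dict(cols) for row, cols in table.items()}


def exceptions(cube, threshold):
    """Exception report: cells whose sales fall below a threshold."""
    return {key: sales for key, sales in cube.items() if sales < threshold}


if __name__ == "__main__":
    print(slice_cube(CUBE, "market", "East"))
    print(pivot(CUBE, "market", "period"))
    print(exceptions(CUBE, 50))
```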

OLTP Features:

● Detailed Transactional Data: OLTP systems store detailed records of individual transactions, such as sales, payments, and customer interactions, which provide granular data for mining.
● Real-Time Data Processing: OLTP systems process transactions in real time, ensuring that the data is current and up to date, which is essential for real-time data mining applications.
● High Volume of Transactions: They handle a high volume of transactions, generating a large amount of data that can be analyzed to identify trends, patterns, and anomalies.
● Data Consistency and Integrity: OLTP systems ensure data consistency and integrity through ACID (Atomicity, Consistency, Isolation, Durability) properties, which is crucial for accurate data mining results.
● Concurrent Access and Processing: They support concurrent access by multiple users, allowing simultaneous data collection and processing, which can be leveraged for distributed data mining tasks.
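
The atomicity and consistency parts of ACID can be demonstrated with a short sqlite3 sketch. The account table, the names, and the non-negative balance rule are hypothetical. The point is only that a transfer that would break the rule is rolled back as a whole, so no partial update survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(connection, sender, receiver, amount):
    """Move money in one transaction; return True on commit, False on rollback."""
    try:
        with connection:  # commits on success, rolls back if an exception is raised
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    """Return the current balances as a name -> balance dictionary."""
    return dict(connection.execute("SELECT name, balance FROM account"))


if __name__ == "__main__":
    print(transfer(conn, "bob", "alice", 50), balances(conn))
    print(transfer(conn, "alice", "bob", 30), balances(conn))
```

In the failed transfer the credit to the receiver runs first and is then undone together with the rejected debit, which is the behavior atomicity requires.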

Metadata and Data Access

Metadata Uses:

● Mapping source data to warehouse tables.
● Generating data extraction, transformation, and loading procedures.
● Helping users discover and query data in the warehouse.

Data Descriptions:

● Describing data elements, relationships, and field assignments.

Extract Jobs:

● Wholesale Replace
● Wholesale Append
● Update Replace
● Update Append
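
The source lists these four extract job types without defining them. The sketch below shows one plausible reading, stated here as an assumption: "wholesale" means the extract is a full snapshot of the source, "update" means it holds only changed records, "replace" overwrites warehouse rows, and "append" adds rows while keeping history. The function names, the "id" key, and the sample rows are hypothetical.

```python
def wholesale_replace(warehouse, snapshot):
    """Discard the current warehouse rows and load the full source snapshot."""
    return list(snapshot)


def wholesale_append(warehouse, snapshot):
    """Keep existing rows and add the full snapshot after them (history grows)."""
    return list(warehouse) + list(snapshot)


def update_replace(warehouse, changes, key="id"):
    """Overwrite only the rows whose key appears in the changed records."""
    merged = {row[key]: row for row in warehouse}
    merged.update({row[key]: row for row in changes})
    return list(merged.values())


def update_append(warehouse, changes):
    """Keep existing rows and add only the changed records as new versions."""
    return list(warehouse) + list(changes)


# Hypothetical data: the stock level of item 1 dropped from 10 to 8 in the source.
WAREHOUSE = [{"id": 1, "stock": 10}, {"id": 2, "stock": 5}]
SNAPSHOT = [{"id": 1, "stock": 8}, {"id": 2, "stock": 5}]
CHANGES = [{"id": 1, "stock": 8}]

if __name__ == "__main__":
    print(wholesale_replace(WAREHOUSE, SNAPSHOT))
    print(wholesale_append(WAREHOUSE, SNAPSHOT))
    print(update_replace(WAREHOUSE, CHANGES))
    print(update_append(WAREHOUSE, CHANGES))
```

Under this reading, the replace variants suit current-state tables such as an operational data store, while the append variants preserve the history that time-variant analysis needs.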

Planning and Development Process

Planning:

● Interviews, data quality assessment, data access, timeliness and history, data sources,
architecture decisions.

Development Process:

● Project initiation, developing the enterprise information architecture, designing the data warehouse database, transforming data, managing metadata, developing the user interface, and managing production.

Common Mistakes

● Starting with the wrong sponsorship chain.
● Setting unrealistic expectations.
● Confusing data warehouse design with transactional database design.
● Over-reliance on performance and capacity promises.
● Assuming no further issues once the data warehouse is operational.

References

1. Unit V: Data Warehouse (lecture)
2. Google Search
3. Data Mining
