0% found this document useful (0 votes)
24 views10 pages

BI module 3

The document outlines the data warehousing process, including key components such as data sources, extraction, transformation, and loading (ETL), as well as the architecture of data warehouses. It describes the characteristics of data warehouses, including being subject-oriented, integrated, time-variant, and non-volatile, and discusses different data representation schemas like star and snowflake. Additionally, it covers the importance of business reporting and various types of reports used for performance measurement and decision-making.

Uploaded by

layappa2004
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views10 pages

BI module 3

The document outlines the data warehousing process, including key components such as data sources, extraction, transformation, and loading (ETL), as well as the architecture of data warehouses. It describes the characteristics of data warehouses, including being subject-oriented, integrated, time-variant, and non-volatile, and discusses different data representation schemas like star and snowflake. Additionally, it covers the importance of business reporting and various types of reports used for performance measurement and decision-making.

Uploaded by

layappa2004
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

3rd module

4.3 DATA WAREHOUDING PROCESS


Data Warehousing Process: Key Components

• Data sources. Data are sourced from multiple independent operational “legacy” systems
and possibly from external data providers (such as the U.S. Census). Data may also come
from an OLTP or enterprise resource planning (ERP) system. Web data in the form of Web
logs may also feed to a data warehouse.
• Data extraction and transformation. Data are extracted and properly trans formed using
custom-written or commercial software called ETL.
• Data loading. Data are loaded into a staging area, where they are transformed and
cleansed. The data are then ready to load into the data warehouse and/or DMs.

Comprehensive Database (EDW - Enterprise Data Warehouse)

1. A large central database that stores all the important business data.
2. It collects data from many sources and organizes it into detailed and summarized information.
3. This helps businesses make better decisions using all available data.

Metadata (Data About Data)

1. Metadata is extra information about the data that helps IT teams and users understand and organize it.
2. It includes rules and descriptions for how data is structured.
3. Metadata makes it easier to find and use data, especially with web-based tools.

Middleware Tools (Connecting Users to Data)

1. Middleware tools help users access the data warehouse.


2. Advanced users (like data analysts) can write SQL queries to retrieve data.
3. Regular business users can use easy-to-use applications like:

1. Reporting tools (for creating reports)


2. Data mining tools (for finding patterns in data)
3. OLAP tools (for analyzing data in different ways)
4. Data visualization tools (for displaying data in charts and graphs)
These components work together to make sure that data is collected, organized, and easily accessible for
better decision-making.

4.3 data warehouse architecture

Three-Tier Architecture

How it works:

o The first layer (tier) contains operational systems (the original data and software for collecting it).
o The second layer is the data warehouse, where data is stored and organized.
o The third layer is the application server, which includes decision support systems (DSS), business intelligence (BI),
and business analytics (BA) tools.

Advantages:

o Better organization: Each layer has a separate role, making the system more efficient.
o Faster analysis: Data is processed into a special multidimensional database for quick analysis.
o More flexibility: Allows creating smaller databases (Data Marts) for specific needs.

2. Two-Tier Architectur

How it works:

o The DSS engine (used for analysis) runs on the same hardware as the data warehouse.
o There is no separate application server.
Advantages:

o Cheaper than the three-tier model because it uses fewer resources.

Disadvantage

o Slower performance for large data warehouses that require heavy data processing.

3. Web-Based Data Warehousing

4.

How it works:

o Uses the Internet to access and analyze data from a data warehouse.
o It has three layers:

 Client side: The user accesses the system via a web browser.
 Web server: Manages communication between the user and the data warehouse
 Application server: Processes data and delivers insiGHT

10 factor's that effects the data warehouse architecture


Several issues must be considered when considered which architecture to use

Model q1 3rd mod 5b

ETL PROCESS

ETL (Extract, Transform, Load) in Simple Words

ETL stands for Extract, Transform, and Load, and it is a key process in managing data for a data warehouse.
This process helps collect data from different sources, clean it, and store it properly for easy access and analysis.

1. Extraction (E)

 Extracting (or pulling) data from different sources such as databases, spreadsheets, CRM systems, or even web logs.
 Example: Collecting sales data from different branches of a company.

2. Transformation (T)

 Cleaning, converting, and organizing the extracted data so that it is consistent and useful.
 This includes fixing errors, removing duplicates, and applying business rules.
 Example: Changing all date formats to a standard format like YYYY-MM-DD.

3. Load (L)

 Storing the transformed data into a data warehouse for analysis and reporting.
 Example: After cleaning, the sales data is stored in a database so that managers can analyze sales trends.
Why is ETL Important?

 It integrates data from multiple sources.


 Ensures data quality by removing errors.
 Makes data ready for reporting and analysis.
 Saves time by automating data processing.

ETL Tools and Technologies

Instead of writing custom programs, companies use ETL tools to automate the process. Some popular ETL
tools are:

 Informatica
 Microsoft SSIS (SQL Server Integration Services)
 IBM DataStage
 Oracle Data Integrator
 Talend

Challenges in ETL

 Costly: ETL tools can be expensive.


 Complexity: Requires skilled IT professionals.
 Time-Consuming: ETL takes up to 70% of the time in a data warehouse project.

MODEL Q2 MOD 3 5A

CHARECTERSTICS OF DATA WARE HOUSE

Key Characteristics of a Data Warehouse (In Simple Words)

A data warehouse is a special type of database designed to help businesses analyze data and make better
decisions. Here are its main characteristics:
1. Subject-Orientedtransactions

 Data is grouped by topics like sales, customers, or products, making it easier to analyze specific areas of the business.
 Unlike regular databases that focus on (e.g., recording a sale), a data warehouse helps understand trends and patterns in
the data.

2. Integrated

 Data from different sources (e.g., spreadsheets, databases, and external files) is combined into a single, consistent format.
 It resolves issues like naming conflicts (e.g., "customer ID" vs. "client ID") and measurement differences (e.g., "kilograms"
vs. "pounds").

3. Time-Variant (Time-Series Data)

 A data warehouse stores historical data to track changes over time.


 It helps businesses detect trends, predict future outcomes, and compare data from different time periods (e.g., monthly
sales trends).

4. Non-Volatile (Data Does Not Change)

 Once data is added, it cannot be modified or deleted.


 If new information is available, it is added as a new record, rather than updating old data.
 This helps maintain data integrity and allows users to track changes over time.

Additional Features of a Data Warehouse

✅ Web-Based: Designed to work efficiently with web applications.


✅ Relational/Multidimensional: Uses a relational database (tables with rows and columns) or a
multidimensional database (optimized for fast analysis).
✅ Client/Server Architecture: Supports access from multiple devices (clients) connected to a central server.
✅ Real-Time Data: Some modern warehouses provide instant updates instead of waiting for scheduled
updates.
✅ Includes Metadata: Stores "data about data", explaining how the data is organized and how to use it
effectively.

These characteristics make data warehouses powerful tools for businesses that need accurate, fast, and
organized data analysis.
Differnrce beteween inmon and kimball dataware house process
Representation of data in a data warehouse

Easy Explanation of Data Warehouse Structure


A data warehouse stores large amounts of data for analysis. It is designed to handle complex queries quickly
and efficiently using a method called dimensional modeling. This helps organize data in a way that makes it
easy to analyze.

Two common ways to organize data in a warehouse are:

Star Schema ⭐

1. The simplest and most common way to structure a data warehouse.


2. It has a fact table in the center, which stores key business data like sales, profits, or costs.
3. Surrounding the fact table are dimension tables, which contain details about the data, such as product names,
customer details, or locations.
4. Helps in quickly retrieving and analyzing data.

Snowflake Schema

1. Similar to the star schema but more detailed.


2. The dimension tables are broken down into smaller tables to remove duplicate data (this is called normalization).
3. Helps save storage space but can be more complex to manage.

Both schemas help businesses analyze data and make decisions efficiently.

Business reporting

Easy Explanation of Business Report

Metric Management Reports


1. Businesses use specific numbers (metrics) to measure their performance.
2. For customers, these are called service-level agreements (SLAs), which define the quality of service they should
get.
3. For internal teams, they use key performance indicators (KPIs) to track progress.
4. These reports help companies stay on track with their goals and can be part of improvement methods like Six
Sigma or Total Quality Management (TQM)

Dashboard Reports

1. A dashboard displays different performance indicators on a single page, just like a car dashboard shows speed,
fuel, and other details.
2. Companies use dashboards to quickly see how their business is doing.
3. Dashboards often use color-coded indicators ( red = bad, orange = needs attention, green = good) to highlight
important areas.

Balanced Scorecard Reports

1. This method, created by Kaplan and Norton, gives a complete view of a company’s success.
2. Instead of only looking at financial data, it also considers:

1. Customer satisfaction
2. Business processes (how efficiently things are done)
3. Learning & growth (employee development and innovation)

These reports help businesses track performance and make better decisions. ✅

You might also like