BI module 3
• Data sources. Data are sourced from multiple independent operational “legacy” systems
and possibly from external data providers (such as the U.S. Census). Data may also come
from an OLTP or enterprise resource planning (ERP) system. Web data in the form of Web
logs may also feed to a data warehouse.
• Data extraction and transformation. Data are extracted and properly transformed using
custom-written or commercial software called ETL.
• Data loading. Data are loaded into a staging area, where they are transformed and
cleansed. The data are then ready to load into the data warehouse and/or DMs.
1. A large central database that stores all the important business data.
2. It collects data from many sources and organizes it into detailed and summarized information.
3. This helps businesses make better decisions using all available data.
1. Metadata is extra information about the data that helps IT teams and users understand and organize it.
2. It includes rules and descriptions for how data is structured.
3. Metadata makes it easier to find and use data, especially with web-based tools.
1. Three-Tier Architecture
How it works:
o The first layer (tier) contains operational systems (the original data and software for collecting it).
o The second layer is the data warehouse, where data is stored and organized.
o The third layer is the application server, which includes decision support systems (DSS), business intelligence (BI),
and business analytics (BA) tools.
Advantages:
o Better organization: Each layer has a separate role, making the system more efficient.
o Faster analysis: Data is processed into a special multidimensional database for quick analysis.
o More flexibility: Allows creating smaller databases (Data Marts) for specific needs.
2. Two-Tier Architecture
How it works:
o The DSS engine (used for analysis) runs on the same hardware as the data warehouse.
o There is no separate application server.
Advantages:
Disadvantage:
o Slower performance for large data warehouses that require heavy data processing.
4. Web-Based Architecture
How it works:
o Uses the Internet to access and analyze data from a data warehouse.
o It has three layers:
Client side: The user accesses the system via a web browser.
Web server: Manages communication between the user and the data warehouse.
Application server: Processes data and delivers insights.
Several issues must be considered when deciding which architecture to use.
ETL PROCESS
ETL stands for Extract, Transform, and Load, and it is a key process in managing data for a data warehouse.
This process helps collect data from different sources, clean it, and store it properly for easy access and analysis.
1. Extraction (E)
Extracting (or pulling) data from different sources such as databases, spreadsheets, CRM systems, or even web logs.
Example: Collecting sales data from different branches of a company.
2. Transformation (T)
Cleaning, converting, and organizing the extracted data so that it is consistent and useful.
This includes fixing errors, removing duplicates, and applying business rules.
Example: Changing all date formats to a standard format like YYYY-MM-DD.
3. Load (L)
Storing the transformed data into a data warehouse for analysis and reporting.
Example: After cleaning, the sales data is stored in a database so that managers can analyze sales trends.
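The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the branch records, the list of date formats, and the `sales` table layout are all assumptions made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw sales records from two branches (illustrative data only).
BRANCH_A = [
    {"order_id": 1, "date": "03/15/2024", "amount": "120.50"},
    {"order_id": 2, "date": "03/16/2024", "amount": "80.00"},
]
BRANCH_B = [
    {"order_id": 2, "date": "2024-03-16", "amount": "80.00"},  # duplicate of order 2
    {"order_id": 3, "date": "17.03.2024", "amount": "45.25"},
]

# Date formats the sources are assumed to use.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def extract(*sources):
    """Extract: pull the rows from every source into one list."""
    return [row for source in sources for row in source]


def to_iso_date(text):
    """Convert a date string in any known format to YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def transform(rows):
    """Transform: standardize dates, convert amounts, drop duplicate orders."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        cleaned.append(
            (row["order_id"], to_iso_date(row["date"]), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: store the cleaned rows in a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(BRANCH_A, BRANCH_B)), conn)
    return conn
```

Calling `run_etl()` returns a connection whose `sales` table holds one row per order with dates in YYYY-MM-DD form, which a manager's report could then query for sales trends.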
Why is ETL Important?
Instead of writing custom programs, companies use ETL tools to automate the process. Some popular ETL
tools are:
Informatica
Microsoft SSIS (SQL Server Integration Services)
IBM DataStage
Oracle Data Integrator
Talend
Challenges in ETL
MODEL Q2 MOD 3 5A
A data warehouse is a special type of database designed to help businesses analyze data and make better
decisions. Here are its main characteristics:
1. Subject-Oriented
Data is grouped by topics like sales, customers, or products, making it easier to analyze specific areas of the business.
Unlike regular databases that focus on transactions (e.g., recording a sale), a data warehouse helps understand trends and patterns in
the data.
2. Integrated
Data from different sources (e.g., spreadsheets, databases, and external files) is combined into a single, consistent format.
It resolves issues like naming conflicts (e.g., "customer ID" vs. "client ID") and measurement differences (e.g., "kilograms"
vs. "pounds").
These characteristics make data warehouses powerful tools for businesses that need accurate, fast, and
organized data analysis.
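As a small illustration of the "Integrated" characteristic, the sketch below renames source-specific fields to one standard name and converts pounds to kilograms. The field names (`weight_lb`, `weight_kg`, `customer_id`) and the alias table are hypothetical; a real warehouse would take such rules from its metadata.

```python
# Hypothetical mapping from source-specific field names to one standard name.
COLUMN_ALIASES = {
    "customer ID": "customer_id",
    "client ID": "customer_id",
    "weight_lb": "weight_kg",
}

LB_TO_KG = 0.45359237  # one international pound expressed in kilograms


def integrate(record):
    """Return a copy of the record with standard names and weights in kilograms."""
    unified = {}
    for name, value in record.items():
        standard = COLUMN_ALIASES.get(name, name)  # unknown names pass through
        if name == "weight_lb":
            value = round(value * LB_TO_KG, 3)  # resolve the measurement difference
        unified[standard] = value
    return unified
```

For example, `integrate({"client ID": 7, "weight_lb": 10})` and `integrate({"customer ID": 7, "weight_kg": 4.536})` both produce a record keyed by `customer_id` and `weight_kg`, so data from the two sources can be analyzed together.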
Difference between the Inmon and Kimball data warehouse processes
Representation of data in a data warehouse
Star Schema ⭐
Snowflake Schema
Both schemas help businesses analyze data and make decisions efficiently.
Business reporting
Dashboard Reports
1. A dashboard displays different performance indicators on a single page, just like a car dashboard shows speed,
fuel, and other details.
2. Companies use dashboards to quickly see how their business is doing.
3. Dashboards often use color-coded indicators (red = bad, orange = needs attention, green = good) to highlight
important areas.
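The color coding can be expressed as a simple rule. The sketch below assumes that higher values are better and that a KPI within 90% of its target "needs attention"; both the function name and the `warn_ratio` threshold are assumptions for illustration, since real dashboards set thresholds per indicator.

```python
def status_color(actual, target, warn_ratio=0.9):
    """Map a KPI value to a traffic-light color relative to its target.

    Assumes higher is better; warn_ratio is a hypothetical threshold.
    """
    if actual >= target:
        return "green"   # target met: good
    if actual >= warn_ratio * target:
        return "orange"  # close to target: needs attention
    return "red"         # far below target: bad
```

With a target of 100, a value of 105 maps to green, 95 to orange, and 50 to red.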
1. The balanced scorecard method, created by Kaplan and Norton, gives a complete view of a company’s success.
2. Instead of only looking at financial data, it also considers:
1. Customer satisfaction
2. Business processes (how efficiently things are done)
3. Learning & growth (employee development and innovation)
These reports help businesses track performance and make better decisions. ✅