DWBI Notes
What is a data warehouse?
-> A data warehouse is specifically designed for reporting, data analysis, and business intelligence
(BI) purposes. Data warehouses are used to collect, transform, integrate, and store
data from different operational systems, making it accessible for analysis and
decision-making.
ETL (Extract, Transform, Load): ETL processes are used to extract data from
source systems, transform it into a consistent format, and load it into the data
warehouse. ETL tools and workflows are essential for data integration and
transformation.
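The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `source_rows` list stands in for an operational system, an in-memory SQLite database stands in for the warehouse, and the names `extract`, `transform`, `load`, and the `sales` table are hypothetical, not taken from any ETL tool.

```python
import sqlite3

# Hypothetical rows from an operational system; the formats are inconsistent on purpose.
source_rows = [
    {"id": "1", "amount": " 10.50", "country": "in"},
    {"id": "2", "amount": "7", "country": "IN "},
    {"id": "3", "amount": "3.25", "country": "us"},
]

def extract(rows):
    """Extract: read the raw records from the source system."""
    return list(rows)

def transform(rows):
    """Transform: convert every record into one consistent format."""
    return [
        (int(r["id"]), float(r["amount"].strip()), r["country"].strip().upper())
        for r in rows
    ]

def load(conn, rows):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract(source_rows)))

# After loading, the data can be analyzed with a simple aggregate query.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('IN', 17.5), ('US', 3.25)]
```

Keeping the three steps as separate functions means a real source or target can replace one step without touching the others.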
Data Modeling: Data modeling involves designing the structure of the data
warehouse, including the definition of tables, relationships, and attributes. Common
data modeling techniques include star schemas and snowflake schemas.
Business Intelligence (BI): BI refers to the tools, technologies, and processes used
to turn raw data into meaningful insights and reports. BI tools allow users to create
dashboards, generate ad-hoc reports, and perform data analysis.
A fact table holds the data to be analyzed, and a dimension table stores data about
the ways in which the data in the fact table can be analyzed.
1. Fact Table:
Definition: A fact table is a central table in a data warehouse that contains
quantitative data, typically numerical values, and is used to represent business facts
or events. Fact tables store the metrics or measures that analysts want to analyze
and report on.
Examples: In a retail DWBI system, a fact table might store sales
revenue, quantity sold, and discounts for each transaction. In a web analytics system,
it could store data on page views, clicks, and user interactions.
2. Dimension Table:
Definition: A dimension table is a table in a data warehouse that contains
descriptive or textual information about the business entities or attributes related to
the facts stored in the fact table. Dimension tables provide context and allow
analysts to filter, group, or slice data.
Examples: In a retail DWBI system,
dimension tables might include a time dimension with attributes like date, month,
and year, a product dimension with attributes like product name and category, and a
customer dimension with attributes like customer name and address.
Unit 1
Q. What is a data warehouse?
A data warehouse is specifically designed for reporting, data analysis,
and business intelligence (BI) purposes. It is used to
collect, transform, and store data from different operational systems,
making it accessible for analysis and decision-making.
It is a subject-oriented, integrated, time-variant, non-volatile
collection of data that supports the management decision-making
process. A data warehouse is organized around the major subjects of the
enterprise, such as customers, products, and sales, rather than around
major application areas such as product sales and stock control.
Benefits
1. Potentially high returns on investment
2. Competitive advantage
3. Increased productivity of corporate decision makers
Problems with data warehouses
1. Hidden problems with source systems
2. Required data is not captured
3. Increased end-user demands
4. High demand for resources
5. High maintenance
6. Long-duration projects
Unit 2
Q. What is data modeling, and what are its types?
Data modeling is a technique used in data warehousing and
business intelligence to organize and structure data for efficient
querying and reporting. It is designed to optimize data retrieval and
analysis for decision support and reporting purposes. Data
modeling typically involves two types of tables: 1. fact tables and
2. dimension tables.
Fact table:- A fact table contains numerical data (measures) that
represent facts such as sales, revenue, profit, loss, quantity sold,
etc. It also includes foreign keys that link to the dimension tables,
relating each measure to its descriptive attributes.
Dimension table:- This table contains descriptive attributes
or hierarchies, which can hold several levels of attributes.
Advantages and disadvantages.
Types of dimensional model
1. Star schema
In a star schema, a central fact table is connected to dimension
tables through foreign keys, forming a star-like structure. Star
schemas are simple, intuitive, and easy to understand and query.
They are suitable for scenarios where there are one or more fact
tables sharing common dimensions.
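A star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`, and so on) and the sample rows are hypothetical, chosen only to mirror the retail example in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per product / date.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT,
    year      INTEGER
);
-- Central fact table: measures plus one foreign key per dimension.
-- (SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.)
CREATE TABLE fact_sales (
    product_key   INTEGER REFERENCES dim_product(product_key),
    date_key      INTEGER REFERENCES dim_date(date_key),
    quantity_sold INTEGER,
    revenue       REAL
);
INSERT INTO dim_product VALUES
    (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Mug', 'Kitchen');
INSERT INTO dim_date VALUES
    (20240101, '2024-01-01', 'January', 2024),
    (20240201, '2024-02-01', 'February', 2024);
INSERT INTO fact_sales VALUES
    (1, 20240101, 10, 50.0), (2, 20240101, 2, 30.0),
    (3, 20240201, 1, 12.0), (1, 20240201, 4, 20.0);
""")

# One join per dimension is enough to group the measures by an attribute.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)  # [('Kitchen', 12.0), ('Stationery', 100.0)]
```

Every dimension sits one join away from the fact table, which is why star-schema queries stay short.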
2. Snowflake schema
A snowflake schema is an extension of the star schema, where
dimension tables are further normalized into sub-dimensions,
resulting in a more complex, tree-like structure.
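As a rough illustration of snowflaking, the sketch below moves the product category out of the product dimension into its own sub-dimension table. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension: the category is stored once, not repeated on every product row.
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
-- The product dimension now points to the sub-dimension by key.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue     REAL
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Kitchen');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Notebook', 1), (3, 'Mug', 2);
INSERT INTO fact_sales VALUES (1, 50.0), (2, 30.0), (3, 12.0);
""")

# Reaching the category now takes two joins: fact -> product -> category.
snowflake_revenue = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_revenue)  # [('Kitchen', 12.0), ('Stationery', 80.0)]
```

The extra join is the cost of normalization: less repeated data in the dimension, but a longer path from the fact table to the attribute.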
4. Security
1. Decide which data is published and to whom. 2. Seek advice from
the security manager.
5. Data integration
It makes all systems work together seamlessly. Data
integration should take place among the transaction systems. In the
DW, data integration means conforming dimensions and conforming facts.
6. Data latency
How quickly source data must be delivered to business users affects
the ETL architecture. Better algorithms, parallelization, and
better hardware can speed up batch processing. If the latency
requirement is urgent enough, the ETL system must switch from batch
processing to stream processing.
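The difference between a parallelized batch step and a streaming step can be sketched as follows. The `clean` function and the sample records are hypothetical. A thread pool is used only to keep the example short, and whether it actually speeds up a real workload depends on the kind of work being done.

```python
from concurrent.futures import ThreadPoolExecutor

def clean(record):
    """Hypothetical per-record transformation."""
    return record.strip().upper()

def run_batch(records, workers=4):
    """Batch: wait for the full set, then process it with parallel workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input.
        return list(pool.map(clean, records))

def run_stream(source):
    """Stream: hand over each record as soon as it arrives."""
    for record in source:
        yield clean(record)

records = [" a ", "b", " c"]
batch_result = run_batch(records)                # available only after all records are done
stream_result = list(run_stream(iter(records)))  # each item is available immediately
print(batch_result, stream_result)  # ['A', 'B', 'C'] ['A', 'B', 'C']
```

Both paths produce the same output. What changes is when each record becomes available to the business user.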
7. Archive
Stage the data after each major activity in the ETL process and
archive all staged data on a permanent storage device. Each
staged/archived data set has its own metadata.
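A minimal sketch of archiving a staged data set together with its metadata is shown below. The function name `archive_stage`, the file layout, and the metadata fields are assumptions made for illustration, and a temporary directory stands in for the permanent storage device.

```python
import csv
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def archive_stage(rows, step_name, archive_dir):
    """Write staged rows to a CSV file plus a JSON metadata file beside it."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    data_path = archive_dir / f"{step_name}.csv"
    with data_path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)

    # Metadata records which ETL step produced the file, its size, and when.
    metadata = {
        "step": step_name,
        "row_count": len(rows),
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path = archive_dir / f"{step_name}.meta.json"
    meta_path.write_text(json.dumps(metadata, indent=2))
    return data_path, meta_path

archive_root = tempfile.mkdtemp()  # stand-in for a permanent storage location
data_path, meta_path = archive_stage(
    [(1, "Pen"), (2, "Mug")], "after_transform", archive_root
)
saved = json.loads(meta_path.read_text())
print(saved["step"], saved["row_count"])  # after_transform 2
```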
8. User delivery interface
Deliver data in a format that is easy to understand, well
structured, and easy to retrieve.
9. Available skills
ETL design should be based on the resources and skills
available. Check the technical issues and license costs.
2. Scoreboard
It is used to manage performance across the organization.