Dimensional Modelling
By
Bob Timlin
Outline
• What is a Multidimensional Database?
• What is a Data Warehouse?
• Review of ER Diagrams
• Problems with ER for OLAP Purposes
Outline (continued)
• What is Dimensional Modeling?
• Star Schemas (Facts and Dimensions)
• Star Schema vs. ER Diagram
• SQL Comparison
Outline (continued)
• Strengths of Dimensional Modeling
• Myths of Dimensional Modeling
• Designing the Data Warehouse
• Keys
• References
What is an MDDB?
A multidimensional database (MDDB) is a specialized data store
that holds summarized data for fast and easy access. Users
can quickly view large amounts of data as a value at
any cross-section of business dimensions. A business
dimension can be any logical view of the data: time,
geography, or product, for example. Once an MDDB is
created, it can be copied or transported to any
platform. In addition, regardless of where the MDDB
resides, it is accessible to requesting applications on
any supported platform anywhere on the network,
including the Web.
MDDB (continued)
An MDDB can be implemented either on a proprietary
MDDB product or as a dimensional model on an
RDBMS. The latter is more common. For our
purposes we will use Oracle 8i, a relational
database. Proprietary MDDB products include Oracle
Express, Arbor Essbase, and the OLAP component of
Microsoft SQL Server.
What is a data warehouse?
Data warehouses began in the 1970s out of the need of many
companies to combine the data of their various operational systems
into a useful and consistent form for analysis.
Data warehouses are used to provide data to Decision Support
Systems (DSS). Many data warehouses also work with OLAP
(Online Analytical Processing) servers and clients.
Data warehouses are updated only in batch, not by transactions.
They are optimized for SQL SELECT statements. This optimization
includes de-normalization.
DW (continued)
Inmon's Four Characteristics of a Data Warehouse:
1. Subject-oriented: DWs answer a question; they don't just
store data.
2. Integrated: DWs provide a unified view of the company's
data.
3. Nonvolatile: DWs are read-only for analytical purposes, so
de-normalization is OK.
4. Time-variant: DW data is time sensitive. Analyze the past to
predict the future.
Review of ER Modeling
Entity-relationship (ER) modeling is a logical design technique
that seeks to eliminate data redundancy and maintain the
integrity of the database. It does this by highly normalizing
the data. The more you normalize, the more entities and
relationships you wind up with.
This is necessary in an online transaction processing
(OLTP) system because inserts, deletes, and updates
against de-normalized data require additional transactions
to keep all the redundant data in sync. That is both highly
inefficient and prone to errors.
The ER model is the best model for OLTP.
The Problem with ER Diagrams
ER diagrams are a spider web of all entities and their
relationships to other entities throughout the database
schema. Relationships unrelated to the question at hand
clutter the view of what you really want to get at.
ER diagrams are too complex for most end users to
understand, and because of all the joins required to get any
meaningful data for analysis, queries against them are highly
inefficient. ER models are therefore not useful for data
warehouses, which need intuitive, high-performance retrieval
of data.
What is Dimensional Modeling?
Dimensional modeling is a logical design technique often used
for data warehouses. It seeks to present the data in a
standard framework that is intuitive and allows for
high-performance access.
Of the approaches reviewed here, dimensional modeling gives
the best results for both ease of use and query performance.
It uses the relational model with a few restrictions:
Every dimensional model is composed of one table with a
multi-part key, called the fact table, and a set of smaller
tables called dimension tables. Each dimension table has a
single-part primary key that corresponds exactly to one of the
components of the multi-part key in the fact table. This
creates a structure that looks a lot like a star, hence the
term "Star Schema".
Interestingly, early database implementations from the late
1960s looked a lot like star schemas, so the approach may
predate ER diagrams.
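The structure described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables sales_fact, product_dim, store_dim, and time_dim and all of their columns are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical retail star schema; every table and column name is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, calendar_date TEXT);

-- The fact table's multi-part key has one component per dimension table.
CREATE TABLE sales_fact (
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key    INTEGER NOT NULL REFERENCES store_dim(store_key),
    time_key     INTEGER NOT NULL REFERENCES time_dim(time_key),
    quantity     INTEGER,
    sales_amount REAL,
    PRIMARY KEY (product_key, store_key, time_key)
);
""")

# PRAGMA table_info reports, in its sixth field, each column's position in the
# primary key (0 means the column is not part of the key).
fact_key_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(sales_fact)") if row[5] > 0
]

# PRAGMA foreign_key_list names, in its third field, the table each key points at.
referenced_dims = sorted(
    row[2] for row in conn.execute("PRAGMA foreign_key_list(sales_fact)")
)
```

Here fact_key_columns lists the three components of the fact table's key, and referenced_dims shows that each component points at exactly one dimension table, which is what gives the schema its star shape.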
What is a Fact Table?
A fact table has a composite primary key made up of two or
more foreign keys, one per dimension, and it usually also
contains numeric data. Because its key always combines at
least two foreign keys, a fact table always expresses a
many-to-many (M-M) relationship.
What is a Dimension?
Dimension tables, on the other hand, have a single primary key
and only textual data, or non-text data that is used for textual
purposes. This data is used for descriptive purposes only.
Each dimension is considered an equal entry point into the fact
table. The textual attributes that describe things are organized
within the dimensions. For example, in a retail database we
would have product, store, customer, promotion, and time
dimensions.
Whether or not to combine related dimensions into one
dimension is usually up to intuition. Remember, however,
that the guiding principles of dimensional modeling are
1. intuitive design and 2. performance.
Dimensions (continued)
Because dimensions are the entry point into the facts that the
user is looking for, they should be very descriptive and
intuitive to the user. Dimension attributes should be:
• Verbose (full words)
• Descriptive
• Complete (no missing values)
• Quality assured (no misspellings, impossible values, obsolete or
orphaned values, or cosmetically different versions of the same
attribute)
• Indexed (perhaps B-Tree or bitmap)
• Documented in metadata that explains the origin and interpretation
of each attribute.
SQL Comparison
Dimensional Model:
-- One join: the fact table to the part dimension.
SELECT pd.description, SUM(quoted_price), SUM(quantity), SUM(unit_price),
SUM(total_comm)
FROM order_fact f JOIN part_dimension pd ON f.part_nr = pd.part_nr
GROUP BY pd.description;
ER-Model:
-- Four joins across five tables. ORDER is a reserved word, so the table name is quoted.
SELECT p.description, SUM(quoted_price), SUM(quantity), SUM(unit_price),
SUM(total_comm)
FROM "order" o JOIN order_detail od ON o.order_nr = od.order_nr
JOIN part p ON p.part_nr = od.part_nr
JOIN customer c ON o.customer_nr = c.customer_nr
JOIN slsrep s ON s.slsrep_nr = c.slsrep_nr
GROUP BY p.description;
Notice that the dimensional model joins only two tables, while the ER model joins all
five tables in the ER diagram. This is very typical of highly normalized ER models.
Imagine a typical normalized database with hundreds of tables.
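To make the comparison concrete, here is a small runnable sketch using Python's sqlite3 module. The table names follow the slide, but the column lists, the sample rows, and the alias f are assumptions made only for this illustration. Only two of the slide's measures are summed to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (ER) side; the column lists are assumed for this sketch.
CREATE TABLE slsrep       (slsrep_nr INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer     (customer_nr INTEGER PRIMARY KEY, slsrep_nr INTEGER);
CREATE TABLE part         (part_nr INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE "order"      (order_nr INTEGER PRIMARY KEY, customer_nr INTEGER);
CREATE TABLE order_detail (order_nr INTEGER, part_nr INTEGER,
                           quantity INTEGER, quoted_price REAL);

INSERT INTO slsrep VALUES (1, 'Ann');
INSERT INTO customer VALUES (10, 1), (11, 1);
INSERT INTO part VALUES (100, 'Widget'), (101, 'Gadget');
INSERT INTO "order" VALUES (1000, 10), (1001, 11);
INSERT INTO order_detail VALUES (1000, 100, 2, 5.0), (1000, 101, 1, 20.0),
                                (1001, 100, 3, 4.5);

-- Dimensional side: one fact table plus a part dimension, loaded from the ER side.
CREATE TABLE part_dimension (part_nr INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE order_fact (part_nr INTEGER, customer_nr INTEGER,
                         quantity INTEGER, quoted_price REAL);
INSERT INTO part_dimension SELECT part_nr, description FROM part;
INSERT INTO order_fact
    SELECT od.part_nr, o.customer_nr, od.quantity, od.quoted_price
    FROM "order" o JOIN order_detail od ON o.order_nr = od.order_nr;
""")

# Dimensional model: a single join.
star_rows = conn.execute("""
    SELECT pd.description, SUM(f.quantity), SUM(f.quoted_price)
    FROM order_fact f JOIN part_dimension pd ON f.part_nr = pd.part_nr
    GROUP BY pd.description ORDER BY pd.description
""").fetchall()

# ER model: the same question needs four joins across five tables.
er_rows = conn.execute("""
    SELECT p.description, SUM(od.quantity), SUM(od.quoted_price)
    FROM "order" o JOIN order_detail od ON o.order_nr = od.order_nr
    JOIN part p ON p.part_nr = od.part_nr
    JOIN customer c ON o.customer_nr = c.customer_nr
    JOIN slsrep s ON s.slsrep_nr = c.slsrep_nr
    GROUP BY p.description ORDER BY p.description
""").fetchall()
```

Both queries return the same totals per part description; the difference is only in how many tables have to be joined to get there.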
Rules about Facts and Dimensions:
The basic tenet of dimensional modeling: "If you want to be
able to slice your data along a particular attribute, you simply
need to make the attribute appear in a dimension table."
Facts and their corresponding dimensions must be of the
same granularity. If the fact table holds numeric data for
days, then the dimensions must have attributes that describe
data at the daily level.
An attribute can live in one and only one dimension, whereas
a fact can be repeated in multiple fact tables.
If a dimension appears to have more than one location, it is
probably playing multiple roles and needs a slightly different
textual description for each role.
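One common way to handle a dimension that plays multiple roles is to expose it through one view per role, each with its own descriptive column names. The sketch below uses Python's sqlite3 module; the date dimension, the two roles (order date and ship date), and every name in it are assumptions chosen for illustration, not something taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical date dimension (hypothetical).
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month_name TEXT);
INSERT INTO date_dim VALUES (1, '2000-01-31', 'January'), (2, '2000-02-02', 'February');

-- Two roles, each a view with its own column names.
CREATE VIEW order_date_dim AS
    SELECT date_key AS order_date_key, month_name AS order_month FROM date_dim;
CREATE VIEW ship_date_dim AS
    SELECT date_key AS ship_date_key, month_name AS ship_month FROM date_dim;

-- The fact table references the same dimension twice, once per role.
CREATE TABLE shipment_fact (order_date_key INTEGER, ship_date_key INTEGER, quantity INTEGER);
INSERT INTO shipment_fact VALUES (1, 1, 4), (1, 2, 6);
""")

rows = conn.execute("""
    SELECT od.order_month, sd.ship_month, SUM(f.quantity)
    FROM shipment_fact f
    JOIN order_date_dim od ON f.order_date_key = od.order_date_key
    JOIN ship_date_dim  sd ON f.ship_date_key  = sd.ship_date_key
    GROUP BY od.order_month, sd.ship_month
    ORDER BY sd.ship_month
""").fetchall()
```

Because each role has distinct column names, a report can show the order month and the ship month side by side without ambiguity.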
Rules (continued)
There is not necessarily a one-to-one relation between
source data and dimensional data. In fact, one source
will usually feed multiple dimensions, or multiple sources
will feed one dimension.
Every fact should have a default aggregation, even if that
aggregation is "No Aggregation".
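A default aggregation can be recorded as simple metadata that a query builder consults. The following is a minimal sketch, assuming a hypothetical DEFAULT_AGGREGATION mapping and a hypothetical sales_fact table; None stands in for "No Aggregation".

```python
import sqlite3

# Hypothetical metadata: each fact column names its default aggregation.
# None stands for "No Aggregation" (a unit price, for example, is not additive).
DEFAULT_AGGREGATION = {"quantity": "SUM", "sales_amount": "SUM", "unit_price": None}

def aggregate_select_list(facts):
    """Build a SELECT list that applies each fact's default aggregation."""
    parts = []
    for fact in facts:
        aggregation = DEFAULT_AGGREGATION[fact]
        if aggregation is not None:  # skip facts marked "No Aggregation"
            parts.append(f"{aggregation}({fact}) AS {fact}")
    return ", ".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (quantity INTEGER, unit_price REAL, sales_amount REAL);
INSERT INTO sales_fact VALUES (2, 5.0, 10.0), (3, 4.0, 12.0);
""")

select_list = aggregate_select_list(["quantity", "unit_price", "sales_amount"])
totals = conn.execute(f"SELECT {select_list} FROM sales_fact").fetchone()
```

The fact names come from a fixed dictionary rather than user input, which is why they can be formatted into the SQL text directly in this sketch.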
ER to Dimensional Models
1. Separate the ER diagram into the discrete business
processes that its entities represent.
2. Create fact tables by selecting the M-M relationships that
contain numeric and additive non-key facts. Fact tables may
be at a detail level or at an aggregated level, depending on
business needs.
3. Create dimensions by de-normalizing all the remaining
tables into flat tables with atomic keys that connect directly
to the fact tables.
Kimball: 146/147
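Step 3 above can be sketched as follows with Python's sqlite3 module. The normalized source tables (department, category, part) and their contents are hypothetical; the point is only to show a chain of lookup tables being flattened into one wide dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source tables (hypothetical).
CREATE TABLE department (dept_nr INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE category   (cat_nr INTEGER PRIMARY KEY, cat_name TEXT, dept_nr INTEGER);
CREATE TABLE part       (part_nr INTEGER PRIMARY KEY, description TEXT, cat_nr INTEGER);

INSERT INTO department VALUES (1, 'Hardware');
INSERT INTO category   VALUES (10, 'Fasteners', 1), (11, 'Tools', 1);
INSERT INTO part       VALUES (100, 'Bolt', 10), (101, 'Hammer', 11);

-- De-normalize the chain part -> category -> department into one flat
-- dimension table keyed by the atomic part_nr.
CREATE TABLE part_dimension AS
    SELECT p.part_nr, p.description, c.cat_name, d.dept_name
    FROM part p
    JOIN category   c ON p.cat_nr  = c.cat_nr
    JOIN department d ON c.dept_nr = d.dept_nr;
""")

dimension_rows = conn.execute(
    "SELECT part_nr, description, cat_name, dept_name FROM part_dimension ORDER BY part_nr"
).fetchall()
```

The department name is now repeated on every part row. That redundancy is the price of letting a query reach any descriptive attribute with a single join from the fact table.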
Strengths of Dimensional Modeling
The Dimensional model is:
1. Predictable. Query tools can make strong assumptions about it.
2. Dynamic.
3. Extends Gracefully by adding rows or columns.
4. Standardized approach to modeling business events.
5. Growing number of software applications to support it.
Kimball: 147 to 149
Myths about Dimensional Modeling
1. Dimensional models are non-dynamic: Only when you
pre-aggregate. Kept in its detail form, a dimensional model is
just as dynamic as ER.
2. Dimensional models are too complex: Just the opposite.
3. Snowflaking is an alternative to dimensional modeling:
Snowflaking is an extension of the star schema. It adds
sub-dimensions to dimensions and therefore looks like a
snowflake. It decreases the simplicity of the star schema and
should be avoided.
Kimball: 150/151
Designing the Data warehouse
There are two approaches to building the data warehouse. The
first is the top-down approach. In this approach an entire
organization-wide data warehouse is built, and then smaller
data marts use it as a source.
The second approach, which is much more feasible, is the
bottom-up approach. In this approach individual data marts are
built using conformed dimensions and a standardized
architecture across the enterprise.
Design Success factors
1. Create a surrounding architecture that defines the scope and
implementation of the complete data warehouse
2. Oversee the construction of each piece of the complete data
warehouse.
Kimball, in chapter five, refers to a design called the data
warehouse bus architecture.
Kimball: 155
Drilling
There are two types of drilling:
1. Drill down: simply means "give me more detail", that is, a
finer level of granularity. For example, show sales figures for
each county instead of for each state.
2. Drill up: simply means "give me less detail", that is, a
coarser level of granularity. For example, show sales figures
for each state instead of for each county.
Most reporting/OLAP tools these days have this capability.
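In SQL terms, drilling amounts to adding or removing columns in the GROUP BY clause. The sketch below uses Python's sqlite3 module with a hypothetical store_dim and sales_fact; the helper sales_by and the sample figures are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim  (store_key INTEGER PRIMARY KEY, state TEXT, county TEXT);
CREATE TABLE sales_fact (store_key INTEGER, sales_amount REAL);
INSERT INTO store_dim VALUES (1, 'Ohio', 'Franklin'), (2, 'Ohio', 'Summit'),
                             (3, 'Texas', 'Travis');
INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 25.0), (3, 10.0);
""")

ALLOWED_LEVELS = ("state", "county")  # only these dimension attributes may be grouped on

def sales_by(*levels):
    """Total sales grouped by the given dimension attributes."""
    if not levels or any(level not in ALLOWED_LEVELS for level in levels):
        raise ValueError(f"levels must be chosen from {ALLOWED_LEVELS}")
    columns = ", ".join(f"s.{level}" for level in levels)
    sql = (
        f"SELECT {columns}, SUM(f.sales_amount) "
        "FROM sales_fact f JOIN store_dim s ON f.store_key = s.store_key "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return conn.execute(sql).fetchall()

by_state = sales_by("state")             # drilled up: one row per state
by_county = sales_by("state", "county")  # drilled down: one row per county
```

Drilling down from by_state to by_county adds the county attribute to the grouping; drilling up removes it again. The fact table itself does not change.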
Special Types of Dimensions
1. Time dimension: should be nation-neutral (Kimball: 176).
2. Person dimension: very atomic, for example separate
fields for all parts of the name and address (Kimball: 178).
3. Small static (slowly changing) dimensions.
4. Small dynamic (rapidly changing) dimensions.
5. Large static (slowly changing) dimensions.
6. Large dynamic (rapidly changing) dimensions.
7. Degenerate dimensions: dimensions without attributes.
8. Miscellaneous dimensions: miscellaneous data that
doesn't fit anywhere else, but that you want to keep.
Keys
It is best to use only artificial (surrogate) keys
assigned by the data warehouse; don't use the
original production keys. Also avoid smart keys.
Smart keys are keys whose values also encode
attributes.
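The idea of warehouse-assigned keys can be sketched as follows, using Python's sqlite3 module. The product_dim table, the helper surrogate_key_for, and the sample production codes are hypothetical; the production key is kept only as an ordinary attribute so that loads can look up the surrogate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product_dim (
    product_key    INTEGER PRIMARY KEY,  -- artificial key assigned by the warehouse
    source_part_nr TEXT UNIQUE,          -- production (smart) key, kept only as an attribute
    description    TEXT
)
""")

def surrogate_key_for(source_part_nr, description):
    """Return the warehouse key for a production key, creating a row if needed."""
    row = conn.execute(
        "SELECT product_key FROM product_dim WHERE source_part_nr = ?",
        (source_part_nr,),
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO product_dim (source_part_nr, description) VALUES (?, ?)",
        (source_part_nr, description),
    )
    return cursor.lastrowid

widget_key = surrogate_key_for("US-WID-0042", "Widget")
gadget_key = surrogate_key_for("US-GAD-0007", "Gadget")
widget_key_again = surrogate_key_for("US-WID-0042", "Widget")
```

Fact rows would then carry product_key rather than the production code, so a change to the production numbering scheme does not ripple into the fact table.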
Designing the Fact Table
Kimball defines a four-step process:
1. Choose the data mart.
2. Choose the fact table grain: it should be as granular as possible.
3. Choose the dimensions: usually determined by the fact table grain.
4. Choose the facts of interest to you.
Kimball: 194
A data mart is essentially a coordinated set of fact tables, all with
similar structures (Kimball: 200).
Granularity
Detail granularity has several advantages over aggregate granularity:
1. More dynamic
2. Required for data mining
3. Allows for behavior analysis (Kimball: 207/208)
Aggregates offer increased performance when details are not needed.
The best of both worlds can be achieved using something called a snapshot.
In Oracle this is achieved using a materialized view.
Transactions and snapshots are the yin and yang of data warehousing.
Kimball: 211
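The relationship between detail facts and an aggregate snapshot can be sketched as below. SQLite has no materialized views, so a plain summary table stands in for one here; the sales_fact and monthly_sales_snapshot tables and their rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail-grain fact table: one row per sale (hypothetical data).
CREATE TABLE sales_fact (sale_date TEXT, product_key INTEGER, quantity INTEGER);
INSERT INTO sales_fact VALUES
    ('2000-01-03', 1, 2), ('2000-01-17', 1, 3),
    ('2000-02-01', 1, 4), ('2000-01-09', 2, 1);

-- Aggregate snapshot: one row per month and product, computed once from the
-- detail rows. In Oracle a materialized view would play this role.
CREATE TABLE monthly_sales_snapshot AS
    SELECT substr(sale_date, 1, 7) AS sale_month, product_key,
           SUM(quantity) AS quantity
    FROM sales_fact
    GROUP BY substr(sale_date, 1, 7), product_key;
""")

# Reading the snapshot touches fewer rows than re-aggregating the detail.
snapshot_rows = conn.execute(
    "SELECT sale_month, product_key, quantity FROM monthly_sales_snapshot "
    "ORDER BY sale_month, product_key"
).fetchall()

# The same answer computed from the detail rows, for comparison.
from_detail = conn.execute(
    "SELECT substr(sale_date, 1, 7), product_key, SUM(quantity) FROM sales_fact "
    "GROUP BY substr(sale_date, 1, 7), product_key ORDER BY 1, 2"
).fetchall()
```

The snapshot answers monthly questions quickly, while the detail table remains available for drilling down or for analyses the snapshot was not designed for.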
REFERENCES:
The Data Warehouse Lifecycle Toolkit
Authors: Ralph Kimball, Laura Reeves, Margy Ross, and
Warren Thornthwaite
Publisher: Wiley.
ISBN: 0-471-25547-5
Pay particular attention to chapters 5, 6, and 7.
Thank You
Ad

More Related Content

Similar to mdmodel multidimensional (MD) modeling approach to represent more complex data relationships and provide richer analytical capabilities (20)

3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
BsMath3rdsem
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Data warehouse logical design
Data warehouse logical designData warehouse logical design
Data warehouse logical design
Er. Nawaraj Bhandari
 
Dbms schemas for decision support
Dbms schemas for decision supportDbms schemas for decision support
Dbms schemas for decision support
rameswara reddy venkat
 
Dimensional modelling-mod-3
Dimensional modelling-mod-3Dimensional modelling-mod-3
Dimensional modelling-mod-3
Malik Alig
 
Introduction to Dimesional Modelling
Introduction to Dimesional ModellingIntroduction to Dimesional Modelling
Introduction to Dimesional Modelling
Ashish Chandwani
 
Module 1.2: Data Warehousing Fundamentals.pptx
Module 1.2:  Data Warehousing Fundamentals.pptxModule 1.2:  Data Warehousing Fundamentals.pptx
Module 1.2: Data Warehousing Fundamentals.pptx
NiramayKolalle
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
Prithwis Mukerjee
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecycle
bartlowe
 
When & Why\'s of Denormalization
When & Why\'s of DenormalizationWhen & Why\'s of Denormalization
When & Why\'s of Denormalization
Aliya Saldanha
 
Exploring Data Modeling Techniques in Modern Data Warehouses
Exploring Data Modeling Techniques in Modern Data WarehousesExploring Data Modeling Techniques in Modern Data Warehouses
Exploring Data Modeling Techniques in Modern Data Warehouses
priyanka rajput
 
Database aggregation using metadata
Database aggregation using metadataDatabase aggregation using metadata
Database aggregation using metadata
Dr Sandeep Kumar Poonia
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAP
Dhiren Gala
 
Export Data Model | SQL Database Modeler
Export Data Model | SQL Database ModelerExport Data Model | SQL Database Modeler
Export Data Model | SQL Database Modeler
SQL DBM
 
Intro to Data warehousing lecture 08
Intro to Data warehousing   lecture 08Intro to Data warehousing   lecture 08
Intro to Data warehousing lecture 08
AnwarrChaudary
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecture
hasanshan
 
MULTIMEDIA MODELING
MULTIMEDIA MODELINGMULTIMEDIA MODELING
MULTIMEDIA MODELING
Jasbeer Chauhan
 
PowerBI Training
PowerBI Training PowerBI Training
PowerBI Training
Knowledge And Skill Forum
 
knowledgeforumpowerbitrainingnew-230816140827-5eb14be7.pdf
knowledgeforumpowerbitrainingnew-230816140827-5eb14be7.pdfknowledgeforumpowerbitrainingnew-230816140827-5eb14be7.pdf
knowledgeforumpowerbitrainingnew-230816140827-5eb14be7.pdf
Rame28
 
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdfMicrosoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
SkillCertProExams
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
BsMath3rdsem
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Dimensional modelling-mod-3
Dimensional modelling-mod-3Dimensional modelling-mod-3
Dimensional modelling-mod-3
Malik Alig
 
Introduction to Dimesional Modelling
Introduction to Dimesional ModellingIntroduction to Dimesional Modelling
Introduction to Dimesional Modelling
Ashish Chandwani
 
Module 1.2: Data Warehousing Fundamentals.pptx
Module 1.2:  Data Warehousing Fundamentals.pptxModule 1.2:  Data Warehousing Fundamentals.pptx
Module 1.2: Data Warehousing Fundamentals.pptx
NiramayKolalle
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
Prithwis Mukerjee
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecycle
bartlowe
 
When & Why\'s of Denormalization
When & Why\'s of DenormalizationWhen & Why\'s of Denormalization
When & Why\'s of Denormalization
Aliya Saldanha
 
Exploring Data Modeling Techniques in Modern Data Warehouses
Exploring Data Modeling Techniques in Modern Data WarehousesExploring Data Modeling Techniques in Modern Data Warehouses
Exploring Data Modeling Techniques in Modern Data Warehouses
priyanka rajput
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAP
Dhiren Gala
 
Export Data Model | SQL Database Modeler
Export Data Model | SQL Database ModelerExport Data Model | SQL Database Modeler
Export Data Model | SQL Database Modeler
SQL DBM
 
Intro to Data warehousing lecture 08
Intro to Data warehousing   lecture 08Intro to Data warehousing   lecture 08
Intro to Data warehousing lecture 08
AnwarrChaudary
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecture
hasanshan
 
knowledgeforumpowerbitrainingnew-230816140827-5eb14be7.pdf
knowledgeforumpowerbitrainingnew-230816140827-5eb14be7.pdfknowledgeforumpowerbitrainingnew-230816140827-5eb14be7.pdf
knowledgeforumpowerbitrainingnew-230816140827-5eb14be7.pdf
Rame28
 
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdfMicrosoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
Microsoft Fabric Analytics Engineer (DP-600) Exam Dumps 2024.pdf
SkillCertProExams
 

Recently uploaded (20)

Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Ad

mdmodel multidimensional (MD) modeling approach to represent more complex data relationships and provide richer analytical capabilities

  • 2. Outline • What is a Multi-dimensional Database • What is a data-warehouse • Review ER-Diagrams • Problems with ER for OLAP Purposes
  • 3. Outline • What is Dimensional Modeling. • Star Schemas (Facts and Dimensions) • Star Schema vs. ER Diagram • SQL Comparison
  • 4. Outline (continued) • Strengths of Dimensional Modeling • Myths of Dimensional Modeling. • Designing the Data warehouse • Keys • References
  • 5. What is a MDDB? An MDDB is a specialized data storage facility that stores summarized data for fast and easy access. Users can quickly view large amounts of data as a value at any cross-section of business dimensions. A business dimension can be any logical vision of the data -- time, geography, or product, for example. Once an MDDB is created, it can be copied or transported to any platform. In addition, regardless of where the MDDB resides, it is accessible to requesting applications on any supported platform anywhere on the network, including the Web.
  • 6. MDDB (continued) MDDB can be implemented either on a proprietary MDDB product or as a dimensional model on a RDBMS. The later is the more common. For our purposes we will use Oracle 8i, a Relational Database. Proprietary MDDB database include Oracle’s Express, Arbor Essbase, Microsoft’s SQL Server OLAP component, etc.
  • 7. What is a data warehouse? Data warehouses began in the 70’s out of the need of many companies to combine the data of it’s various operational systems into a useful and consistent form for analysis. Data-warehouses are used to provide data to Decision Support Systems (DSS). Many data-warehouses also work with OLAP (Online Analytical Processing) servers and clients. Data warehouses are updated only in batch not by transactions. They are optimized for SQL Selects. This optimization includes de-normalization.
  • 8. DW (continued) Inmon’s Four Characteristics of a Data Warehouse : 1. Subject-Oriented: DW’s answer a question, they don’t just store data. 2. Integrated: DW’s provide a unified view of the companies data. 3. Nonvolatile: DW’s are read-only for analytical purposes, de- normalization is ok. 4. Time: DW-Data is time sensitive. Analyze the past to predict the future.
  • 10. Review of ER Modeling Entity-relationship modeling is a logical design technique that seeks to eliminate data redundancy and maintain the integrity of the database. They do this by highly normalizing the data. The more you normalize the more entities and relationships you wind up with. This is necessary in an online transaction processing (OLTP) system because insert, deletes, and updates against de-normalized data requires additional transactions to keep all the redundant data in sync. This is both highly inefficient and prone to errors. The ER Model is the best model for OLTP.
  • 11. The Problem with ER Diagrams ER Diagrams are a spider web of all entities and their relationship to other entities throughout the database schema. Un-related relationships clutter the view of what you really want to get at. ER Diagrams are too complex for most end users to understand and because of all the joins required to get any meaningful data for analysis they are highly inefficient. Not useful for data-warehouses which need intuitive high performance retrieval of data.
  • 13. What is Dimensional Modeling. Dimensional modeling is the name of a logical design technique often used for data-warehouses. Dimensional modeling is a logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high-performance access. Dimensional modeling provides the best results for both ease of use and high performance.
  • 14. It uses the relational model with a few restrictions: Every dimension is composed of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables. Each dimension has a single-part primary key that corresponds exactly to one of the components of the multi-part key in the fact table. This creates a structure that looks a lot like a star, hence the term “Star Schema” Interestingly early, late 60’s, implementations of relational databases looked a lot like Star Schema’s. They pre- dated ER Diagrams.
  • 16. What is a Fact Table? A fact table is composed of two or more primary keys and usually also contains numeric data. Because it always contains at least two primary keys it is always a M-M relationship.
  • 17. What is a Dimension? Dimension tables on the other hand have a primary key and only textual data or non-text data that is used for textual purposes. This data is used for descriptive purposes only. Each dimension is considered an equal entry point into the fact table. The textual attributes that describe things are organized within the dimensions. For example in a retail database we would have product, store, customer, promotion, and time dimensions. Whether or not to combine related dimensions into one dimensions is usually up to intuition. Remember however that guiding principal of dimensional modeling is 1. Intuitive Design, and 2. Performance.
  • 18. Dimensions (continued) Because dimensions are the entry point into the facts that the user is looking for, they should be very descriptive and intuitive to the user. Here are some rules: • Verbose (full words) • Descriptive • Complete (no missing values) • Quality assured (no misspellings, impossible values, obsolete or orphaned values, or cosmetically different versions of the same attribute) • Indexed (perhaps B-Tree or bitmap) • Documented in metadata that explains the origin and interpretation of each attribute.
  • 19. SQL Comparison Dimensional Model: SELECT description, SUM(quoted_price), SUM(quantity), SUM(unit_price), SUM(total_comm) FROM order_fact f JOIN part_dimension pd ON f.part_nr = pd.part_nr GROUP BY description; ER-Model: SELECT description, SUM(quoted_price), SUM(quantity), SUM(unit_price), SUM(total_comm) FROM orders o JOIN order_detail od ON o.order_nr = od.order_nr JOIN part p ON p.part_nr = od.part_nr JOIN customer c ON o.customer_nr = c.customer_nr JOIN slsrep s ON s.slsrep_nr = c.slsrep_nr GROUP BY description; Notice that the dimensional model joins only two tables, while the ER model joins all five tables in the ER diagram. This is very typical of highly normalized ER models. Imagine a typical normalized database with hundreds of tables.
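The comparison above can be tried end to end with a small sketch using Python's built-in `sqlite3` module. Everything specific here is an assumption made for illustration: the sample rows are invented, the placement of `quantity` and `quoted_price` on `order_detail` is a guess, only two of the summed columns are kept, and the normalized order table is named `orders` because `order` is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (ER) side. Column placement is assumed for illustration.
CREATE TABLE slsrep (slsrep_nr INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (
    customer_nr INTEGER PRIMARY KEY,
    name        TEXT,
    slsrep_nr   INTEGER REFERENCES slsrep (slsrep_nr)
);
CREATE TABLE part (part_nr INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE orders (
    order_nr    INTEGER PRIMARY KEY,
    customer_nr INTEGER REFERENCES customer (customer_nr)
);
CREATE TABLE order_detail (
    order_nr     INTEGER REFERENCES orders (order_nr),
    part_nr      INTEGER REFERENCES part (part_nr),
    quantity     INTEGER,
    quoted_price REAL,
    PRIMARY KEY (order_nr, part_nr)
);

-- Invented sample rows.
INSERT INTO slsrep VALUES (1, 'Rep A');
INSERT INTO customer VALUES (10, 'Customer X', 1), (11, 'Customer Y', 1);
INSERT INTO part VALUES (100, 'Widget'), (101, 'Gadget');
INSERT INTO orders VALUES (1000, 10), (1001, 11);
INSERT INTO order_detail VALUES
    (1000, 100, 2, 20.0), (1000, 101, 1, 15.0), (1001, 100, 3, 30.0);

-- Dimensional side, loaded by de-normalizing the tables above.
CREATE TABLE part_dimension AS SELECT part_nr, description FROM part;
CREATE TABLE order_fact AS
    SELECT od.order_nr, od.part_nr, o.customer_nr, c.slsrep_nr,
           od.quantity, od.quoted_price
    FROM order_detail od
    JOIN orders o   ON o.order_nr = od.order_nr
    JOIN customer c ON c.customer_nr = o.customer_nr;
""")

# Two tables, one join.
dimensional_sql = """
    SELECT pd.description, SUM(f.quantity), SUM(f.quoted_price)
    FROM order_fact f
    JOIN part_dimension pd ON f.part_nr = pd.part_nr
    GROUP BY pd.description
    ORDER BY pd.description
"""

# Five tables, four joins, following the ER query in the slide.
er_sql = """
    SELECT p.description, SUM(od.quantity), SUM(od.quoted_price)
    FROM orders o
    JOIN order_detail od ON o.order_nr = od.order_nr
    JOIN part p          ON p.part_nr = od.part_nr
    JOIN customer c      ON o.customer_nr = c.customer_nr
    JOIN slsrep s        ON s.slsrep_nr = c.slsrep_nr
    GROUP BY p.description
    ORDER BY p.description
"""

dimensional_rows = conn.execute(dimensional_sql).fetchall()
er_rows = conn.execute(er_sql).fetchall()
print(dimensional_rows)
print(er_rows)
```

Both queries return the same totals per part description; the difference is only in how many joins the reader and the database have to work through.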
  • 20. Rules about Facts and Dimensions: The basic tenet of dimensional modeling: “If you want to be able to slice your data along a particular attribute, you simply need to make the attribute appear in a dimension table.” Facts and their corresponding dimensions must be of the same granularity: if the fact table holds numeric data for days, then the dimensions must have attributes that describe daily data. An attribute can live in one and only one dimension, whereas a fact can be repeated in multiple fact tables. If a dimension appears to have more than one location, it is probably playing multiple roles and needs a slightly different textual description.
  • 21. Rules (continued) There is not necessarily a one-to-one relationship between source data and dimensional data; in fact, usually one source will feed multiple dimensions, or multiple sources will feed one dimension. Every fact should have a default aggregation, even if that aggregation is No Aggregation.
  • 22. ER to Dimensional Models 1. Separate each entity into the business process that it represents. 2. Create fact tables by selecting M-M relationships that contain numeric and additive non-key facts. Fact tables may be at a detail level or at an aggregated level, depending on business needs. 3. Create dimensions by de-normalizing all the remaining tables into flat tables with atomic keys that connect directly to the fact tables. Kimball: 146/147
  • 23. Strengths of Dimensional Modeling The Dimensional model is: 1. Predictable. Query tools can make strong assumptions about it. 2. Dynamic. 3. Extends Gracefully by adding rows or columns. 4. Standardized approach to modeling business events. 5. Growing number of software applications to support it. Kimball: 147 to 149
  • 24. Myths about Dimensional Modeling 1. Dimensional models are non-dynamic: Only when you pre-aggregate. Kept in its detail form, a dimensional model is just as dynamic as ER. 2. Dimensional models are too complex: Just the opposite. 3. Snowflaking is an alternative to dimensional modeling: Snowflaking is an extension to the star schema. It adds sub-dimensions to dimensions and therefore looks like a snowflake. It decreases the “simplicity” of the star schema and should be avoided. Kimball: 150/151
  • 25. Designing the Data warehouse There are two approaches to building the data-warehouse. The first is the top-down approach. In this approach an entire organization-wide data-warehouse is built, and then smaller data-marts use it as a source. The second approach, which is much more feasible, is the bottom-up approach. In this approach individual data-marts are built using conformed dimensions and a standardized architecture across the enterprise.
  • 26. Design Success factors 1. Create a surrounding architecture that defines the scope and implementation of the complete data warehouse. 2. Oversee the construction of each piece of the complete data warehouse. In chapter five, Kimball refers to a design called the data-warehouse bus architecture. Kimball: 155
  • 27. Drilling There are two types of drilling: 1. Drill down, which simply means give me more detail, or a finer level of granularity. For example, show sales figures for each county instead of for each state. 2. Drill up, which simply means give me less detail, or a coarser level of granularity. For example, show sales figures for each state instead of for each county. Most reporting/OLAP tools these days have this capability.
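In SQL terms, drilling down or up amounts to adding or removing a dimension attribute in the GROUP BY clause. The sketch below shows this with Python's built-in `sqlite3` module; the `geography_dimension` and `sales_fact` tables, the helper `sales_by`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geography_dimension (
    geography_key INTEGER PRIMARY KEY,
    state  TEXT NOT NULL,
    county TEXT NOT NULL
);
CREATE TABLE sales_fact (
    geography_key INTEGER NOT NULL
        REFERENCES geography_dimension (geography_key),
    sales_amount  INTEGER NOT NULL
);
-- Invented sample rows.
INSERT INTO geography_dimension VALUES
    (1, 'CA', 'Alameda'), (2, 'CA', 'Marin'), (3, 'NY', 'Kings');
INSERT INTO sales_fact VALUES (1, 60), (1, 40), (2, 50), (3, 70);
""")


def sales_by(*levels):
    """Sum sales grouped by the given geography attributes.

    The attribute names are interpolated into the SQL text, so this helper
    must only be called with trusted, hard-coded column names.
    """
    columns = ", ".join(levels)
    sql = f"""
        SELECT {columns}, SUM(f.sales_amount)
        FROM sales_fact f
        JOIN geography_dimension g ON f.geography_key = g.geography_key
        GROUP BY {columns}
        ORDER BY {columns}
    """
    return conn.execute(sql).fetchall()


by_state = sales_by("state")             # drilled up: one row per state
by_county = sales_by("state", "county")  # drilled down: one row per county
print(by_state)
print(by_county)
```

The fact table is untouched in both cases; only the attributes taken from the dimension change, which is why reporting tools can offer drilling generically.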
  • 28. Special Types of Dimensions 1. Time dimension: Should be nation neutral (Kimball: 176). 2. Person dimension: Very atomic, for example separate fields for all parts of name and address (Kimball: 178). 3. Small Static (slowly changing) Dimensions. 4. Small Dynamic (rapidly changing) Dimensions. 5. Large Static (slowly changing) Dimensions. 6. Large Dynamic (rapidly changing) Dimensions. 7. Degenerate Dimensions: Dimensions without attributes. 8. Miscellaneous Dimensions: Miscellaneous data that doesn’t fit anywhere else, but that you want to keep.
  • 29. Keys It is best to use only artificial keys assigned by the data-warehouse; don’t use the original production keys. Also avoid smart keys. Smart keys are keys that usually also serve as attributes.
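One way to follow this advice is to let the warehouse hand out a meaningless integer key and keep the production key only as an ordinary attribute. The sketch below uses Python's built-in `sqlite3` module; the `part_dimension` layout, the helper `part_key_for`, and the sample production codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE part_dimension (
        part_key        INTEGER PRIMARY KEY AUTOINCREMENT, -- artificial key
        production_code TEXT NOT NULL UNIQUE,  -- source key, kept as an attribute
        description     TEXT NOT NULL
    )
""")


def part_key_for(production_code, description):
    """Return the warehouse key for a production code, assigning one if new."""
    row = conn.execute(
        "SELECT part_key FROM part_dimension WHERE production_code = ?",
        (production_code,),
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO part_dimension (production_code, description) VALUES (?, ?)",
        (production_code, description),
    )
    return cursor.lastrowid


first = part_key_for("WID-RED-07", "Red widget")
second = part_key_for("GAD-BLU-02", "Blue gadget")
again = part_key_for("WID-RED-07", "Red widget")
print(first, second, again)
```

Fact rows would then carry `part_key` rather than `production_code`, so a renumbering in the production system changes one attribute in the dimension instead of every fact row.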
  • 30. Designing the Fact Table Kimball defines a four-step process. 1. Choose the data mart. 2. Choose the fact table grain: Should be as granular as possible. 3. Choose the dimensions: Usually determined by the fact table. 4. Choose the facts of interest to you. Kimball: 194
  • 31. A data-mart is essentially a coordinated set of fact tables, all with similar structures. Kimball: 200
  • 32. Granularity Detail granularity has several advantages over aggregate granularity: 1. More dynamic. 2. Required for data-mining. 3. Allows for behavior analysis (Kimball: 207/208). Aggregates offer increased performance when details are not needed. The best of both worlds can be achieved using something called a snapshot. In Oracle this is achieved using a Materialized View. Transactions and snapshots are the yin and yang of data-warehousing. Kimball: 211
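The trade-off between detail and aggregate granularity can be sketched with Python's built-in `sqlite3` module. SQLite has no materialized views, so an ordinary table built with CREATE TABLE ... AS SELECT stands in for one here; the table names, columns, and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail grain: one row per sale date and product.
CREATE TABLE daily_sales_fact (
    sale_date    TEXT NOT NULL,     -- ISO date, e.g. '2000-01-15'
    product_key  INTEGER NOT NULL,
    sales_amount INTEGER NOT NULL
);
-- Invented sample rows.
INSERT INTO daily_sales_fact VALUES
    ('2000-01-15', 1, 10), ('2000-01-20', 1, 5),
    ('2000-02-01', 1, 7),  ('2000-02-03', 2, 4);

-- Aggregate grain: precomputed monthly totals (stand-in for a snapshot).
CREATE TABLE monthly_sales_aggregate AS
    SELECT strftime('%Y-%m', sale_date) AS sale_month,
           product_key,
           SUM(sales_amount) AS sales_amount
    FROM daily_sales_fact
    GROUP BY strftime('%Y-%m', sale_date), product_key;
""")

# Monthly questions can be answered from the small aggregate table...
from_aggregate = conn.execute("""
    SELECT sale_month, product_key, sales_amount
    FROM monthly_sales_aggregate
    ORDER BY sale_month, product_key
""").fetchall()

# ...or recomputed from the detail rows, with the same result.
from_detail = conn.execute("""
    SELECT strftime('%Y-%m', sale_date), product_key, SUM(sales_amount)
    FROM daily_sales_fact
    GROUP BY strftime('%Y-%m', sale_date), product_key
    ORDER BY 1, 2
""").fetchall()

# Only the detail table can still answer day-level questions.
busiest_day = conn.execute("""
    SELECT sale_date FROM daily_sales_fact
    ORDER BY sales_amount DESC LIMIT 1
""").fetchone()[0]

print(from_aggregate)
print(busiest_day)
```

Unlike an Oracle materialized view, this stand-in table is not refreshed automatically; in this sketch it would have to be rebuilt after each batch load.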
  • 33. REFERENCES: Ralph Kimball, Laura Reeves, Margy Ross, and Warren Thornthwaite. The Data Warehouse Lifecycle Toolkit. Wiley. ISBN 0-471-25547-5. Pay particular attention to chapters 5, 6, and 7.