Data Modeling on Azure for Analytics
Ike Ellis, MVP
General Manager – Data & AI Practice
Solliance
@ike_ellis
www.ikeellis.com
youtube.com/IkeEllisOnTheMic
• MVP since 2011
• Author of Developing Azure Solutions and the Power BI MVP Book
• Speaker at PASS Summit, SQLBits, DevIntersections, TechEd, Craft, Microsoft Data & AI Conference
• Founder of the San Diego Software Architecture Group
• Founder of the San Diego Technology Immersion Group
• Leads a team of Data Engineers, Data Architects, Data Scientists, and Data Creatives
• Data Platform Components
• Azure Data Platform
• Orchestration and data processing
• Data Virtualization and data in files
• Organize data in RAW
• Organize data in PREPARED
• Organize data in SERVING
• Star Schemas
• Dimensions
• Fact Tables
• Slowly Changing Dimensions
• Aggregating data
• Securing data
Data Platform Components
Data sources
• Relational data source
• Web URI data source
• Source files (JSON, CSV, etc.)
• Streaming data
Data processing
• Data pipeline processes: many processes used to clean, organize, and prepare data. They are often written in several different tools and languages as time goes on, and can be streaming or batch.
• Orchestration: a layer that controls the order and workflow of the data pipeline processes.
• Data virtualization: a layer that connects data in all of the storage locations to create a single interface for interacting with data.
Data storage
• File-based storage: slow, cheap; usually Parquet files
• Streaming: very fast storage and processing
• Data marts: medium-fast, relational storage
Aggregation layer
• Takes some amount of work, but aggregations are very fast and cached. Can be relational.
Data users
• Reporting tools
• Visualization tools
• Business tool, usually MS Excel
• Machine learning and other advanced analytics
Azure Data Platform
Data sources
• Relational data source
• Web URI data source
• Source files (JSON, CSV, etc.)
• Streaming data
Data storage
• File-based storage: ADLS Gen2, Azure Blob Storage
• Streaming: Event Hubs, Event Grid, Service Bus
• Data marts: Azure SQL Database, Azure Synapse Dedicated SQL Pools
Data processing
• Data pipeline processes: Azure Databricks, Azure Synapse, Azure Functions, Azure Data Factory, Azure Stream Analytics
• Orchestration: Azure Data Factory, Azure Synapse
• Data virtualization: SQL Server PolyBase, Azure Synapse (or Databricks) Spark virtual tables
Aggregation layer
• Azure Analysis Services, Power BI Data Model, manual aggregation tables
Data users
• Reporting tools: SSRS, Power BI
• Visualization tools: Power BI
• Business tool: MS Excel
• Machine learning: Azure ML Studio
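Reference architecture (diagram components)
• Sources: source systems, direct download
• Storage: Azure Storage, Data Lake Store / Azure Blob Storage
• ETL and orchestration: Data Factory (mapping data flows, pipelines, SSIS packages, triggered and scheduled pipelines)
• Processing: Azure Databricks; ETL logic and calculations
• Serving: SQL DB / SQL Server, SQL DW, Azure Analysis Services
• Consumers: web applications, dashboards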
The whole idea of an analytical system is that data duplication will speed up
aggregations and reporting. Files allow for cheap duplication, which allows us
to duplicate more data more frequently.
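-- Relational table stored in the database
CREATE TABLE CUSTOMERS
(CustID int NOT NULL,
CompanyName varchar(100) NOT NULL) -- length is illustrative; varchar with no length is varchar(1)

-- ORDERS.parquet stays in the data lake and is exposed as an external table
CREATE EXTERNAL TABLE ORDERS (...) -- column list depends on the file
WITH (LOCATION = 'ORDERS.parquet', ...) -- DATA_SOURCE and FILE_FORMAT are also required

-- One query joins the relational table and the file-backed table
SELECT *
FROM Customers c
JOIN Orders o
ON c.CustID = o.CustID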
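The join above reads a relational table and a file through one query. As a rough, stdlib-only sketch of the idea (not how PolyBase or Synapse implement it), the small table can be indexed in memory while the file is streamed past it as a hash join. A CSV string stands in for ORDERS.parquet, since reading Parquet needs a third-party library such as pyarrow; all names and values here are hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the relational CUSTOMERS table.
customers = [
    {"CustID": 1, "CompanyName": "Contoso"},
    {"CustID": 2, "CompanyName": "Fabrikam"},
]

# Hypothetical stand-in for the ORDERS file in the data lake.
ORDERS_FILE = io.StringIO(
    "OrderID,CustID,Amount\n"
    "100,1,25.00\n"
    "101,2,40.00\n"
    "102,1,15.50\n"
)

def join_customers_orders(customers, orders_file):
    """Inner hash join: index the table by key, stream the file rows past it."""
    by_id = {c["CustID"]: c for c in customers}
    for row in csv.DictReader(orders_file):
        cust = by_id.get(int(row["CustID"]))
        if cust is not None:  # drop orders with no matching customer
            yield {**cust, **row}

for r in join_customers_orders(customers, ORDERS_FILE):
    print(r["CompanyName"], r["OrderID"], r["Amount"])
```

The file is never copied into the database; each row is read, matched, and emitted, which is the property that makes querying files in place cheap.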
What you do with them:
• Remove bad rows
• Change column data types
• Pivot
• Unpivot
• Combine columns
• Remove columns
• Split columns
• Change the format
• Replace values
• Merge/join data files and tables
• Append data files and tables
• Fill with a literal
• Perform mathematical calculations
• Change the location of data
How you do it:
• Python
• C#
• Azure Functions
• Azure Databricks
• Azure Synapse Data Flows
• Power BI Data Flows
• Stored Procedures
• Pandas
• Azure Stream Analytics
• Azure Kubernetes Service
• Azure VMs
• And much, much more
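A few of the transformations listed above (remove bad rows, change column data types, split columns, replace values, remove columns) can be sketched in plain Python. This is a minimal illustration assuming hypothetical column names and a hypothetical replacement map; the same steps would normally be written in Pandas, Spark, or a data flow.

```python
# Hypothetical raw rows as they might arrive from a source extract.
raw_rows = [
    {"id": "1", "full_name": "Ada Lovelace", "state": "calif.", "amount": "10.50"},
    {"id": "",  "full_name": "Missing Id",   "state": "CA",     "amount": "3.00"},
    {"id": "2", "full_name": "Alan Turing",  "state": "CA",     "amount": "n/a"},
    {"id": "3", "full_name": "Grace Hopper", "state": "NY",     "amount": "7"},
]

STATE_FIXES = {"calif.": "CA"}  # hypothetical value-replacement map

def to_float(text):
    """Return the text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

def transform(rows):
    out = []
    for row in rows:
        amount = to_float(row["amount"])            # change column data type
        if not row["id"] or amount is None:         # remove bad rows
            continue
        first, _, last = row["full_name"].partition(" ")  # split columns
        out.append({                                # full_name is removed
            "id": int(row["id"]),
            "first_name": first,
            "last_name": last,
            "state": STATE_FIXES.get(row["state"], row["state"]),  # replace values
            "amount": amount,
        })
    return out

clean = transform(raw_rows)
```

Rows with a missing key or a non-numeric amount are dropped rather than repaired; whether to drop, quarantine, or fill such rows is a per-pipeline decision.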
Accounting database and CRM database
→ Copy → RAW (folder on ADLS, .parquet files)
→ Clean data, but don't change shape → Enriched (folder on ADLS, .parquet files)
→ Create star schema → Data marts
→ Create aggregations → Analysis Services cubes
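The "create aggregations" step can also be done as a manual aggregation table: a smaller, pre-summed table built from line-item fact rows. The sketch below assumes hypothetical fact rows and a date key stored as a yyyymmdd integer; it rolls revenue and quantity up to month by product.

```python
from collections import defaultdict

# Hypothetical line-item fact rows: (date_key, product_key, quantity, revenue)
fact_orders = [
    (20240105, 1, 2, 20.0),
    (20240117, 1, 1, 10.0),
    (20240203, 2, 5, 75.0),
    (20240211, 1, 3, 30.0),
]

def aggregate_by_month_product(rows):
    """Pre-compute a month x product table from line-item rows."""
    totals = defaultdict(lambda: [0, 0.0])
    for date_key, product_key, qty, revenue in rows:
        month_key = date_key // 100  # assumes yyyymmdd keys: 20240105 -> 202401
        bucket = totals[(month_key, product_key)]
        bucket[0] += qty
        bucket[1] += revenue
    return {key: tuple(val) for key, val in sorted(totals.items())}

agg = aggregate_by_month_product(fact_orders)
```

Reports that only need monthly totals read three rows instead of four here; on real data the reduction is what makes the aggregation layer fast.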
Files (Parquet, ADLS, Azure Blob Storage)
• Very cheap
• Not very fast
• Great for long-term storage and archives
• Great for the staging/raw layer
• Great for the enriched layer
• Great for duplicating data
• Great for machine learning and other analytics
• Can use SQL to query it
• Great for JSON, CSVs, TSVs, and any other files
Relational (Azure SQL Database, Azure Database, Azure Synapse Dedicated SQL Pools)
• Great for the serving layer
• Great for interactivity
• Great for using SQL
• Somewhat expensive
• Bad for long-term storage (more than 5 years)
• Medium-term storage (1 to 5 years)
• Forces the format to be primarily tabular (rows and columns)
• Generally bad for JSON data
Cache (Azure Cache for Redis, Azure Cosmos DB)
• Great for repeated, short-term storage
• Very expensive
• Great for geo-replication
• Great for data that changes quickly
• Great for JSON data
• Can use a SQL variant to query it (not full-featured)
Stream (Azure Event Hubs, Azure Event Grid, Azure Service Bus, Azure Stream Analytics)
• Great for seven days of data
• Very expensive
• Great for alerting
ERP data source → Copy table → RAWERPCustomers.parquet
-- External table over the RAW Parquet copy of SalesLT.SalesOrderDetail.
-- The external data source and file format must already exist
-- (CREATE EXTERNAL DATA SOURCE / CREATE EXTERNAL FILE FORMAT).
CREATE EXTERNAL TABLE [dbo].[tempSalesOrderDetail]
(
    [SalesOrderID] [int] NULL,
    [SalesOrderDetailID] [int] NULL,
    [OrderQty] [int] NULL,
    [ProductID] [int] NULL,
    [UnitPrice] [numeric](19,4) NULL,
    [UnitPriceDiscount] [numeric](19,4) NULL,
    [LineTotal] [numeric](38,6) NULL,
    [rowguid] [varchar](8000) NULL,
    [ModifiedDate] [datetime2](7) NULL
)
WITH (DATA_SOURCE = [ikedatalakefs_ikedatalake_dfs_core_windows_net]
    , LOCATION = N'raw/SalesLTSalesOrderDetail.parquet'
    , FILE_FORMAT = [SynapseParquetFormat]
    , REJECT_TYPE = VALUE, REJECT_VALUE = 0 ) -- reject no rows: any bad row fails the query
GO
RAWERPCustomers.parquet → Transform → EnrichedCustomer.parquet
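The RAW-to-Enriched step cleans values but keeps the same shape, as the pipeline slide says ("clean data, but don't change shape"). A minimal sketch, assuming hypothetical customer columns and assuming duplicates are resolved by keeping the first copy seen:

```python
# Hypothetical rows read from RAWERPCustomers.parquet.
RAW_CUSTOMERS = [
    {"CustomerID": "7", "Name": "  contoso ltd ", "City": "San Diego"},
    {"CustomerID": "7", "Name": "Contoso Ltd",    "City": "San Diego"},
    {"CustomerID": "8", "Name": "Fabrikam",       "City": ""},
]

def enrich(rows):
    """Clean values but keep exactly the same columns as RAW."""
    seen = set()
    out = []
    for row in rows:
        key = row["CustomerID"].strip()
        if key in seen:  # keep the first copy of each customer
            continue
        seen.add(key)
        out.append({
            "CustomerID": key,
            "Name": " ".join(row["Name"].split()).title(),  # trim and normalize case
            "City": row["City"].strip() or None,            # empty string -> null
        })
    return out

enriched = enrich(RAW_CUSTOMERS)
```

Because the columns are unchanged, anything that queried the RAW file can query the enriched file the same way; reshaping into dimensions happens later.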
Star schema: FactOrders surrounded by its dimension tables
• DimSalesPerson: SalesPersonKey, SalesPersonName, StoreName, StoreCity, StoreRegion
• DimProduct: ProductKey, ProductName, ProductLine, SupplierName
• DimCustomer: CustomerKey, CustomerName, City, Region
• DimDate: DateKey, Year, Quarter, Month, Day
• DimShippingAgent: ShippingAgentKey, ShippingAgentName
• FactOrders: CustomerKey, SalesPersonKey, ProductKey, ShippingAgentKey, TimeKey, OrderNo, LineItemNo, Quantity, Revenue, Cost, Profit
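To see how the star schema above is queried, here is a trimmed version in SQLite via Python's built-in `sqlite3` module. It is a sketch only: DimSalesPerson and DimShippingAgent are omitted for brevity, the sample rows are hypothetical, and FactOrders.TimeKey is assumed to hold DimDate.DateKey values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT, Region TEXT);
CREATE TABLE DimProduct  (ProductKey INTEGER PRIMARY KEY, ProductName TEXT, ProductLine TEXT, SupplierName TEXT);
CREATE TABLE DimDate     (DateKey INTEGER PRIMARY KEY, Year INTEGER, Quarter INTEGER, Month INTEGER, Day INTEGER);
CREATE TABLE FactOrders (
    CustomerKey INTEGER REFERENCES DimCustomer(CustomerKey),
    ProductKey  INTEGER REFERENCES DimProduct(ProductKey),
    TimeKey     INTEGER REFERENCES DimDate(DateKey),
    OrderNo     INTEGER,
    LineItemNo  INTEGER,
    Quantity    INTEGER,
    Revenue     REAL,
    Cost        REAL,
    Profit      REAL
);
INSERT INTO DimCustomer VALUES (1, 'Contoso', 'San Diego', 'West'), (2, 'Fabrikam', 'Boston', 'East');
INSERT INTO DimProduct  VALUES (10, 'Bike', 'Bikes', 'Supplier A'), (20, 'Helmet', 'Accessories', 'Supplier B');
INSERT INTO DimDate     VALUES (20240105, 2024, 1, 1, 5), (20240410, 2024, 2, 4, 10);
INSERT INTO FactOrders VALUES
    (1, 10, 20240105, 1, 1, 1, 500.0, 300.0, 200.0),
    (1, 20, 20240105, 1, 2, 2,  60.0,  30.0,  30.0),
    (2, 10, 20240410, 2, 1, 1, 500.0, 300.0, 200.0);
""")

# Star join: group by dimension attributes, sum the fact measures.
rows = conn.execute("""
    SELECT c.Region, d.Quarter, SUM(f.Revenue) AS Revenue
    FROM FactOrders f
    JOIN DimCustomer c ON f.CustomerKey = c.CustomerKey
    JOIN DimDate d     ON f.TimeKey = d.DateKey
    GROUP BY c.Region, d.Quarter
    ORDER BY c.Region, d.Quarter
""").fetchall()
```

Every query follows the same pattern: one join per dimension from the fact table's keys, filters and groupings on dimension attributes, and aggregates on fact measures.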
EnrichedCustomer.parquet → Data pipelines
DimSalesPerson
• SalesPersonKey: surrogate key
• EmployeeNo: business key
• SalesPersonName
• StoreName, StoreCity, StoreRegion: denormalized (no separate store table)
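The surrogate key, business key, and denormalized store columns above can be sketched as follows. This assumes hypothetical source shapes (an employee list and a store lookup keyed by a hypothetical `StoreID`) and keeps the key map in memory; a real pipeline would persist it so keys stay stable between loads.

```python
class SurrogateKeyMap:
    """Hands out integer surrogate keys for business keys such as EmployeeNo."""

    def __init__(self):
        self._keys = {}
        self._next = 1

    def key_for(self, business_key):
        if business_key not in self._keys:
            self._keys[business_key] = self._next
            self._next += 1
        return self._keys[business_key]

def build_dim_sales_person(employees, stores, key_map):
    """Copy store attributes onto each sales person row (no separate store table)."""
    dim = []
    for emp in employees:
        store = stores[emp["StoreID"]]  # hypothetical lookup column
        dim.append({
            "SalesPersonKey": key_map.key_for(emp["EmployeeNo"]),  # surrogate key
            "EmployeeNo": emp["EmployeeNo"],                       # business key
            "SalesPersonName": emp["Name"],
            "StoreName": store["Name"],
            "StoreCity": store["City"],
            "StoreRegion": store["Region"],
        })
    return dim

stores = {1: {"Name": "Downtown", "City": "San Diego", "Region": "West"}}
employees = [
    {"EmployeeNo": "E-100", "Name": "Ana", "StoreID": 1},
    {"EmployeeNo": "E-200", "Name": "Ben", "StoreID": 1},
]
keys = SurrogateKeyMap()
dim = build_dim_sales_person(employees, stores, keys)
```

Keeping the business key next to the surrogate key lets later loads look up the existing surrogate for an employee instead of creating a new one.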
FactOrders (grain = order line item)
• Dimension keys: CustomerKey, SalesPersonKey, ProductKey, Timekey
• Degenerate dimensions: OrderNo, LineItemNo, PaymentMethod
• Additive measures: Quantity, Revenue, Cost, Profit
• Nonadditive measure: Margin
FactAccountTransaction
• Dimension keys: CustomerKey, BranchKey, AccountTypeKey
• Degenerate dimension: AccountNo
• Additive measure: CreditDebitAmount
• Semi-additive measure: AccountBalance
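The three kinds of measures above aggregate differently. A minimal sketch with hypothetical rows: additive measures can be summed along every dimension; a nonadditive ratio such as Margin must be recomputed from additive totals; a semi-additive balance can be summed across accounts for one date, but across time you take the latest value instead of the sum. Margin is assumed here to be Profit / Revenue.

```python
# Hypothetical FactOrders line items: (OrderNo, LineItemNo, Revenue, Cost)
order_lines = [(1, 1, 100.0, 60.0), (1, 2, 50.0, 20.0), (2, 1, 200.0, 150.0)]

# Additive: Revenue sums correctly across any grouping.
total_revenue = sum(rev for _, _, rev, _ in order_lines)

# Nonadditive: summing per-line margins is wrong; derive from additive totals.
total_cost = sum(cost for _, _, _, cost in order_lines)
total_margin = (total_revenue - total_cost) / total_revenue

# Hypothetical FactAccountTransaction rows: (AccountNo, DateKey, AccountBalance)
balances = [
    ("A", 20240101, 100.0), ("A", 20240102, 120.0),
    ("B", 20240101, 50.0),  ("B", 20240102, 40.0),
]

def balance_on(rows, date_key):
    """Semi-additive: summing across accounts is fine for a single date."""
    return sum(bal for _, d, bal in rows if d == date_key)

def latest_balance_per_account(rows):
    """Across time, keep each account's last balance rather than summing."""
    latest = {}
    for acct, _d, bal in sorted(rows, key=lambda r: r[1]):
        latest[acct] = bal
    return latest
```

Summing the balances over both days would report 310.0, which is not a balance anyone ever held; that is why semi-additive measures need special handling in the aggregation layer.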
IoT stream of each ping (source). It can hold only 3 days of data, since there are hundreds of trucks.
→ Copy → RAW files (sensor detail)
→ Transfer and quick aggregation → Enriched file (Truck ID, Date & Hour, Miles Per Hour)
→ Transfer and star schema → Data mart table (TruckID, Date, HoursPerDay)
Data virtualization: three tables at different granularities. The star schema sits in the most expensive, fastest storage. The others are file-based and cheaper; they are kept for reference or to redo an aggregation.
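The truck example moves data through three granularities: raw pings, an hourly enriched file, and a daily data-mart table. A sketch of the two aggregation steps, assuming hypothetical ping rows and assuming that HoursPerDay counts the hours in which a truck's average speed was above zero (the slide does not define it):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw pings: (TruckID, ISO timestamp, MilesPerHour)
pings = [
    ("T1", "2024-05-01T08:05", 50.0),
    ("T1", "2024-05-01T08:35", 60.0),
    ("T1", "2024-05-01T09:10", 0.0),
    ("T1", "2024-05-01T10:20", 40.0),
    ("T2", "2024-05-01T08:00", 30.0),
]

def to_hourly(rows):
    """Enriched grain: one row per truck, date, and hour with average speed."""
    sums = defaultdict(lambda: [0.0, 0])
    for truck, ts, mph in rows:
        t = datetime.fromisoformat(ts)
        bucket = sums[(truck, t.date(), t.hour)]
        bucket[0] += mph
        bucket[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}

def to_daily(hourly):
    """Data-mart grain: hours per day in which the truck was moving."""
    hours = defaultdict(int)
    for (truck, day, _hour), avg_mph in hourly.items():
        if avg_mph > 0:  # assumed definition of a driving hour
            hours[(truck, day)] += 1
    return dict(hours)

hourly = to_hourly(pings)
daily = to_daily(hourly)
```

Because the hourly file is kept in cheap storage, the daily table can be rebuilt from it if the definition of HoursPerDay changes, without needing the raw pings that the stream has already discarded.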