Real-Time DW with SQL Server
• https://www.linkedin.com/in/markmurphynyc
• http://www.infinityanalytics.com/
5/30/2015
[Diagram: AdvWorks 2014, Supplier Shipping Schedules (CSV/XML), and SQL 2008 Inventory -> Nightly ETL (SSIS/Stored Procs, Merge Changes) -> AdventureWorks DW 2014]
[Diagram: AdvWorks 2014 and Supplier Shipping Schedules (CSV/XML) -> Constant ETL (Stored Procs/CDC)]
Why?
• Zero data latency
• Top customers *today*
• RT Analytics
• Predictive analytics
• Recommender systems
• RT promotions
4 Architectural Components
1. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
2. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
3. XXXXXXXXXXXXXXXXXXXXXX (CDC) XXXXXXXXXXXXXXXXXXXXXXX
4. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
End Goal
• Dimensional Model
[Diagram: AdvWorks 2014, Supplier Shipping Schedules (CSV/XML), and SQL 2008 Inventory -> ETL (Stored Procs, Merge Changes) -> AdventureWorks DW 2014]
Caveats to RTDW
• If you don’t need it – don’t do it
• Higher cost in ETL development and testing
• More moving parts – more to go wrong.
Let’s Go!
[Diagram: ODS databases ADV_WORKS_ODS (from AdvWorks 2014), SHIP_SCHED_ODS, INVENTORY_ODS]
• Mirror the source systems exactly (except possibly for indexes)
4 Architectural Components
1. Operational Data Store (ODS) databases: create one per source
database or subject area. PUSH data into the ODSs as often
as possible.
2. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
3. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
4. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Now What?
[Diagram: ODS layer. Oracle CRM -> CRM_ODS; SQL 2008 Inventory -> INVENTORY_ODS; AdvWorks 2014 -> ADV_WORKS_ODS; SHIP_SCHED_ODS; ODS layer -> ? -> AdventureWorks DW 2014]
Source to Target
rtdemo_src.dimGeography
rtdemo_src.factInternetSales
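A source view shapes ODS rows so that they look like the target table. The following is a minimal sketch, assuming ADV_WORKS_ODS mirrors the AdventureWorks Person tables; the column list is an assumption and is not taken from the slides.

```sql
-- Hypothetical source view: shapes ODS rows like the target dimension.
-- Assumes ADV_WORKS_ODS mirrors the AdventureWorks Person.* tables and
-- that the rtdemo_src schema already exists in the current database.
CREATE VIEW rtdemo_src.dimGeography
AS
SELECT DISTINCT
    a.City,
    a.PostalCode,
    sp.StateProvinceCode,
    sp.Name AS StateProvinceName,
    cr.CountryRegionCode,
    cr.Name AS CountryRegionName
FROM ADV_WORKS_ODS.Person.Address AS a
JOIN ADV_WORKS_ODS.Person.StateProvince AS sp
    ON sp.StateProvinceID = a.StateProvinceID
JOIN ADV_WORKS_ODS.Person.CountryRegion AS cr
    ON cr.CountryRegionCode = sp.CountryRegionCode;
```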
Re-Init Procedures
MERGE INTO <DESTINATION> AS TGT
USING <SOURCE VIEW> AS SRC
    ON SRC.<BusinessKey> = TGT.<BusinessKey>
Re-Inits
• Are needed when:
• System is initialized
• Source system changes, need to reprocess
• System troubleshooting (failsafe)
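The MERGE template can be filled in as a complete re-init procedure. This is only a sketch: the procedure name, the rtdemo_src.dimCustomer view, and the columns CustomerBK, CustomerName, and City are all hypothetical.

```sql
-- Hypothetical re-init proc: reloads one dimension from its source view.
CREATE PROCEDURE dbo.reinit_dimCustomer
AS
BEGIN
    SET NOCOUNT ON;

    MERGE INTO dbo.dimCustomer AS TGT
    USING rtdemo_src.dimCustomer AS SRC
        ON SRC.CustomerBK = TGT.CustomerBK
    -- Update only rows that changed (assumes the compared columns are NOT NULL).
    WHEN MATCHED AND (TGT.CustomerName <> SRC.CustomerName
                      OR TGT.City <> SRC.City) THEN
        UPDATE SET CustomerName = SRC.CustomerName,
                   City         = SRC.City
    -- Insert business keys the DW has not seen yet.
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerBK, CustomerName, City)
        VALUES (SRC.CustomerBK, SRC.CustomerName, SRC.City);
END;
```

Because MERGE is idempotent against the business key, the same procedure can serve for first-time initialization and for later reprocessing.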
Problem
• 1:01 run dimGeography reinit
• 1:02 run dimCustomer reinit
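• 1:05 run factInternetSales reinit
• Each re-init reads the ODS at a different moment, so the dims and facts may not be consistent with each other.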
Database Snapshots
• Database snapshots are created almost instantly, as a shadow copy.
• They do not store data at initial creation. Instead, they store the “before” image of each page as changes are made.
• You can query either the snapshot or the original.
[Diagram: ADV_WORKS_ODS -> ADV_WORKS_ODS_SNAP]
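Creating a snapshot takes a single statement. In this sketch the logical file name ADV_WORKS_ODS_Data and the file path are assumptions; NAME must match the source database's actual logical data file name.

```sql
-- Drop the previous snapshot, if any, then take a fresh one.
IF DB_ID(N'ADV_WORKS_ODS_SNAP') IS NOT NULL
    DROP DATABASE ADV_WORKS_ODS_SNAP;

CREATE DATABASE ADV_WORKS_ODS_SNAP
ON (
    NAME = ADV_WORKS_ODS_Data,                        -- logical data file of the source (assumed)
    FILENAME = N'D:\Snapshots\ADV_WORKS_ODS_SNAP.ss'  -- sparse file location (assumed)
)
AS SNAPSHOT OF ADV_WORKS_ODS;

-- The snapshot is queried like any other database.
SELECT COUNT(*) AS order_count
FROM ADV_WORKS_ODS_SNAP.Sales.SalesOrderHeader;
```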
Re-inits in Practice
• So source views and re-inits should point to the ODS snapshots, which give every re-init the same point-in-time view of the data.
[Diagram: ODS layer -> ODS snapshots layer -> ETL (re-init SPs) -> AdventureWorks DW 2014. Oracle CRM -> CRM_ODS -> CRM_ODS_SNAP; AdvWorks 2014 -> ADV_WORKS_ODS -> ADV_WORKS_ODS_SNAP; SHIP_SCHED_ODS -> SHIP_SCHED_ODS_SNAP]
LSNs
• Log sequence numbers (LSNs) are a binary way of representing the exact transaction order of the database.
• Example: 0x0000002D000000480001
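Because LSNs are plain binary(10) values, they can be stored in a table and compared directly. A small sketch follows; the second LSN value is made up for illustration, and the last query assumes it runs in a database where CDC is enabled.

```sql
-- LSNs are binary(10) values, so ordinary comparison operators give log order.
DECLARE @a binary(10) = 0x0000002D000000480001;
DECLARE @b binary(10) = 0x0000002D000000490001;  -- illustrative value

SELECT CASE WHEN @a < @b THEN 'a is earlier'
            ELSE 'b is earlier or equal' END AS log_order;

-- Built-in helpers: the highest LSN CDC has recorded so far,
-- and the next LSN after a given one.
SELECT sys.fn_cdc_get_max_lsn()     AS max_lsn,
       sys.fn_cdc_increment_lsn(@a) AS next_lsn_after_a;
```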
4 Architectural Components
1. Operational Data Store (ODS) databases: create one per source database or
subject area. PUSH data into the ODSs as often as possible.
4. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Incremental Algorithm
Look up the last LSN from the high-water mark (HWM) table (old)
Get the current maximum LSN (new)
Begin Transaction
Process all dimensions incrementally (old, new)
Process all facts incrementally (old, new)
Update the HWM to new
Commit Transaction
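The steps above can be sketched as a single T-SQL driver. The table dbo.etl_hwm and the dbo.incr_* procedures are hypothetical names, and for brevity the sketch assumes everything lives in the database where the LSNs are read.

```sql
-- Hypothetical HWM table: one row per source, holding the last processed LSN.
-- CREATE TABLE dbo.etl_hwm (source_name sysname PRIMARY KEY, last_lsn binary(10) NOT NULL);

DECLARE @old_lsn binary(10), @new_lsn binary(10), @from_lsn binary(10);

SELECT @old_lsn = last_lsn
FROM dbo.etl_hwm
WHERE source_name = N'ADV_WORKS_ODS';

-- Reads the current database's change-capture metadata.
SET @new_lsn = sys.fn_cdc_get_max_lsn();

IF @new_lsn > @old_lsn   -- nothing to do when no new changes exist
BEGIN
    -- Change ranges are inclusive, so start just after the old HWM.
    SET @from_lsn = sys.fn_cdc_increment_lsn(@old_lsn);

    BEGIN TRY
        BEGIN TRANSACTION;

        EXEC dbo.incr_dimGeography      @from_lsn, @new_lsn;
        EXEC dbo.incr_dimCustomer       @from_lsn, @new_lsn;
        EXEC dbo.incr_factInternetSales @from_lsn, @new_lsn;

        UPDATE dbo.etl_hwm
        SET last_lsn = @new_lsn
        WHERE source_name = N'ADV_WORKS_ODS';

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Data and HWM roll back together, so the next run retries the same range.
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;
    END CATCH;
END;
```

Updating the HWM inside the same transaction as the loads is what keeps a failed run from skipping or double-applying a range.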
CDC
• Use Change Data Capture (CDC) to pull all changes to a table for a
given LSN range.
CDC Primer
• Source table: Sales.SalesOrderHeader
• Change table: cdc.Sales_SalesOrderHeader_CT
• ...and functions:
• cdc.fn_cdc_get_net_changes_Sales_SalesOrderHeader(@from_lsn, @to_lsn, @row_filter_option)
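A minimal sketch of enabling CDC on the table and reading its net changes, assuming the ODS database is named ADV_WORKS_ODS and SQL Server Agent is running (the capture job depends on it). The selected columns are standard AdventureWorks columns; the choice of columns is illustrative.

```sql
USE ADV_WORKS_ODS;
GO
-- Enable CDC for the database, then for the table.
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'Sales',
    @source_name          = N'SalesOrderHeader',
    @role_name            = NULL,   -- no gating role
    @supports_net_changes = 1;      -- requires a primary key or unique index
GO
-- Read the net effect of all changes captured so far.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'Sales_SalesOrderHeader');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT __$operation,   -- 1 = delete, 2 = insert, 4 = update
       SalesOrderID, OrderDate, CustomerID, TotalDue
FROM cdc.fn_cdc_get_net_changes_Sales_SalesOrderHeader(@from_lsn, @to_lsn, N'all');
```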
Incremental Procs
• For each source table, read from the CDC functions to see what’s changed
in the requested LSN range.
• Join the tables together, mimicking the structure of the source views.
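An incremental procedure along these lines might look as follows. This is a sketch only: the procedure name, the DW columns (CustomerKey, CustomerBK, SalesAmount), and the single-table join are assumptions made for illustration.

```sql
-- Hypothetical incremental proc for one fact table.
CREATE PROCEDURE dbo.incr_factInternetSales
    @from_lsn binary(10),
    @to_lsn   binary(10)
AS
BEGIN
    SET NOCOUNT ON;

    -- Net changes for the LSN range: 1 = delete, 2 = insert, 4 = update.
    SELECT __$operation AS op, SalesOrderID, OrderDate, CustomerID, TotalDue
    INTO #soh
    FROM ADV_WORKS_ODS.cdc.fn_cdc_get_net_changes_Sales_SalesOrderHeader(
             @from_lsn, @to_lsn, N'all');

    -- Apply inserts and updates, joined the same way the source view would be.
    MERGE INTO dbo.factInternetSales AS TGT
    USING (
        SELECT s.SalesOrderID, s.OrderDate, d.CustomerKey, s.TotalDue
        FROM #soh AS s
        JOIN dbo.dimCustomer AS d ON d.CustomerBK = s.CustomerID
        WHERE s.op IN (2, 4)
    ) AS SRC
        ON SRC.SalesOrderID = TGT.SalesOrderID
    WHEN MATCHED THEN
        UPDATE SET OrderDate   = SRC.OrderDate,
                   CustomerKey = SRC.CustomerKey,
                   SalesAmount = SRC.TotalDue
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (SalesOrderID, OrderDate, CustomerKey, SalesAmount)
        VALUES (SRC.SalesOrderID, SRC.OrderDate, SRC.CustomerKey, SRC.TotalDue);

    -- Apply deletes.
    DELETE f
    FROM dbo.factInternetSales AS f
    JOIN #soh AS s ON s.SalesOrderID = f.SalesOrderID
    WHERE s.op = 1;
END;
```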
4 Architectural Components
1. Operational Data Store (ODS) databases: create one per source database or subject
area. PUSH data into the ODSs as often as possible.
2. Re-init Processes: build a re-init stored proc for each dim and fact, sourced from
ODS snapshots. PULL from the source views into the DW dims/facts. Store the
snapshot LSNs as the starting point.
Agent Job
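The slide gives only the title, so the following is a guess at the shape of such a job: a SQL Server Agent job that runs the incremental load every minute. The job name, the database name AdventureWorksDW2014, and the procedure dbo.run_incremental_etl are hypothetical.

```sql
USE msdb;
GO
EXEC dbo.sp_add_job
    @job_name = N'RTDW incremental ETL';

EXEC dbo.sp_add_jobstep
    @job_name      = N'RTDW incremental ETL',
    @step_name     = N'Run incremental load',
    @subsystem     = N'TSQL',
    @database_name = N'AdventureWorksDW2014',           -- assumed DW database name
    @command       = N'EXEC dbo.run_incremental_etl;';  -- hypothetical driver proc

EXEC dbo.sp_add_jobschedule
    @job_name             = N'RTDW incremental ETL',
    @name                 = N'Every minute',
    @freq_type            = 4,   -- daily
    @freq_interval        = 1,   -- every day
    @freq_subday_type     = 4,   -- unit: minutes
    @freq_subday_interval = 1;   -- every 1 minute

-- Target the local server so the job actually runs.
EXEC dbo.sp_add_jobserver
    @job_name = N'RTDW incremental ETL';
```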
Real-Time Aggregates
Indexed Views
• Will degrade performance of INSERTs/UPDATEs to the fact table, so
make sure they’re worthwhile to add.
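As an illustration, an indexed view that keeps a per-customer sales total current as fact rows arrive could be sketched like this. The view name and the columns CustomerKey and SalesAmount are assumptions.

```sql
-- Indexed views need SCHEMABINDING, two-part table names, and COUNT_BIG(*)
-- when they aggregate with GROUP BY.
CREATE VIEW dbo.vSalesByCustomer
WITH SCHEMABINDING
AS
SELECT CustomerKey,
       SUM(ISNULL(SalesAmount, 0)) AS TotalSales,  -- SUM must not see a nullable expression
       COUNT_BIG(*)                AS RowCnt
FROM dbo.factInternetSales
GROUP BY CustomerKey;
GO
-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByCustomer
    ON dbo.vSalesByCustomer (CustomerKey);
GO
-- NOEXPAND reads the materialized rows directly.
SELECT TOP (10) CustomerKey, TotalSales
FROM dbo.vSalesByCustomer WITH (NOEXPAND)
ORDER BY TotalSales DESC;
```

Every insert or update on the fact table now also maintains this index, which is the write cost the slide warns about.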
OLAP
• SSAS cubes can often be updated frequently as well.
• MOLAP
Monitoring/Alerting
• All ETL operations should be logged
and timed.
• Logger should commit even if overall
transaction is rolled back.
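The slides do not say how the logger survives a rollback. One common approach, sketched here, is to buffer log rows in a table variable, which is not affected by ROLLBACK, and write them out afterwards. The table dbo.etl_log is hypothetical.

```sql
-- Assumes a hypothetical log table: dbo.etl_log (logged_at, step, message).
DECLARE @log TABLE (
    logged_at datetime2(3)   NOT NULL DEFAULT SYSDATETIME(),
    step      nvarchar(128)  NOT NULL,
    message   nvarchar(4000) NULL
);

BEGIN TRY
    BEGIN TRANSACTION;
    INSERT INTO @log (step, message) VALUES (N'incremental', N'started');
    -- ... run the ETL steps here ...
    INSERT INTO @log (step, message) VALUES (N'incremental', N'finished');
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    -- Table variable rows written before the rollback are still present.
    INSERT INTO @log (step, message) VALUES (N'incremental', ERROR_MESSAGE());
END CATCH;

-- Runs outside the ETL transaction, so the log rows persist either way.
INSERT INTO dbo.etl_log (logged_at, step, message)
SELECT logged_at, step, message
FROM @log;
```

The logged_at values also give per-step timings when start and finish rows are compared.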
Statistics
• Statistics won’t ever be up to date for the latest data. In SQL 2008/2012, a query
against the fact table gets a cardinality estimate of 1 if its date range is past
the HWM.
• Trace flags 2389, 2390, and 4139 (in SQL 2012) deal with this “Ascending
Key” problem.
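For reference, trace flags can be turned on globally with DBCC TRACEON. Treat this as a sketch: whether each flag is available and useful depends on the SQL Server version and patch level, so test on a non-production system first.

```sql
-- -1 applies the flags globally; they do not survive a restart
-- unless also added as startup parameters.
DBCC TRACEON (2389, 2390, 4139, -1);

-- List the trace flags that are currently enabled globally.
DBCC TRACESTATUS (-1);
```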
Caching
• Turn off caching on the reporting server to always have live data.
Performance Considerations
• Need to tune RT ETL so that it doesn’t
have any inefficiencies. Measure in
milliseconds, not seconds.
• Use sp_WhoIsActive to see what’s running
Process
• For a new RTDW, build in parallel to an existing DW, so you can reconcile the two.
[Diagram: AW_ODS, DEV, DW Prod (Legacy), DW Prod (new RT)]
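Reconciling the two warehouses can start with a simple set comparison. In this sketch the database names LegacyDW and RealTimeDW and the compared columns are hypothetical.

```sql
-- Rows in the legacy DW that are missing or different in the real-time DW.
SELECT SalesOrderID, CustomerKey, SalesAmount
FROM LegacyDW.dbo.factInternetSales
EXCEPT
SELECT SalesOrderID, CustomerKey, SalesAmount
FROM RealTimeDW.dbo.factInternetSales;

-- And the reverse direction.
SELECT SalesOrderID, CustomerKey, SalesAmount
FROM RealTimeDW.dbo.factInternetSales
EXCEPT
SELECT SalesOrderID, CustomerKey, SalesAmount
FROM LegacyDW.dbo.factInternetSales;
```

Rows loaded by the real-time side since the last nightly run will show up in the second result, so compare as of a common cutoff.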
4 Architectural Components
1. Operational Data Store (ODS) databases: create one per source database or subject area.
PUSH data into the ODSs as often as possible.
2. Re-init Processes: build a re-init stored proc for each dim and fact, sourced from ODS
snapshots. PULL from the source views into the DW dims/facts. Store the snapshot LSNs as
the starting point.
3. Incremental Processes: build an incremental stored proc for each dim and fact. Use CDC
functions to populate temp tables that mimic source views. PULL data incrementally on
demand. Transactionally store new HWM.
4. Test early & test often. Make sure RT data is flowing into DEV
and QA. Tune ETL, statistics, aggregates and user queries
against a live system with RCSI enabled.
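Read Committed Snapshot Isolation (RCSI) is a database-level setting. A minimal sketch, assuming the DW database is named AdventureWorksDW2014:

```sql
-- Readers see the last committed version of a row instead of blocking on writers.
-- ROLLBACK IMMEDIATE disconnects open transactions so the change can take effect.
ALTER DATABASE AdventureWorksDW2014
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

-- Confirm the setting.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'AdventureWorksDW2014';
```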
More Information
• Code/slides at: http://www.infinityanalytics.com/
• https://www.linkedin.com/in/markmurphynyc