Why Databricks - Ali Ghodsi, DAIS
Combines customer data and streaming events to power 100+ ML models that score 100M transactions per day
182 million wireless subscribers protected from fraud in-store, online, and in support centers
How are these companies doing it?
Tech leaders are to the right of the Data Maturity Curve
From hindsight to foresight
[Figure: Data Maturity Curve. Horizontal axis: Data + AI Maturity. Vertical axis: Competitive Advantage.]
Stages, in order of increasing maturity: Clean Data, Reports, Ad Hoc Queries, Data Exploration, Predictive Modeling, Prescriptive Analytics, Automated Decision Making
Questions answered along the curve: What happened? What will happen? How should we respond? Automatically make the best decision
But… most companies still struggle to find success at scale
Two incompatible architectures get in the way
[Figure: Data Maturity Curve, shown again, from "What happened?" through Predictive Modeling, Prescriptive Analytics, and Automated Decision Making along the Data + AI Maturity axis.]
Highly reliable and efficient
All of the data and very adaptable
Disjointed and duplicative data silos
Open, reliable data storage to efficiently handle all data types
Data Lake: Structured tables and unstructured files
This is the lakehouse paradigm
All ML, SQL, BI, and Streaming use cases
Data Science & ML, Data Streaming, Business Intelligence, SQL Analytics
Technologies: Databricks SQL, Databricks Workflows, Databricks Machine Learning, Structured Streaming

One security and governance approach for all data assets on all clouds
Governance and Security: Files, Blobs, and Table ACLs
Technologies: Unity Catalog (fine-grained governance for data and AI)

Open, reliable data storage to efficiently handle all data types
Data Lake: Structured tables and unstructured files
Technologies: Delta Lake (data reliability and performance)
Databricks Lakehouse Platform
Data Warehousing, Data Engineering, Data Streaming, Data Science and ML
Unity Catalog: Fine-grained governance for data and AI
Delta Lake: Data reliability and performance
Simple: Unify your data warehousing and AI use cases on a single platform
Multicloud: One consistent data platform across clouds
Why do people want this lakehouse?
Data Pipelines
Unity Catalog
Delta Lake
Data Ingestion
Cloud Data Lake
Supporting enterprises in every industry
Healthcare & Life Sciences, Retail & CPG, Media & Entertainment, Financial Services, Public Sector, Manufacturing & Logistics, Energy & Utilities, Digital Native
The rest of the industry has taken notice
How do these lakehouse offerings stack up?
Lakehouse Benchmark
TPC-DS 3TB: Total run costs for external Parquet tables
[Figure: bar chart of total run costs. Bar values: $270.00, $243.19, $74.77, $63.95, $8.00.]
[Figure: stacked bar chart of costs broken down into Load, Auto-clustering, and Query segments. Values shown: $386, $52, $258, $35, $224, $150, $21, $14, $76, $110, $40, $73.]
destination