
Why Databricks - Ali - Ghodsi DAIS

The document discusses Databricks CEO Ali Ghodsi's keynote presentation, which focused on the problems solved by a data lakehouse architecture. It notes that traditional data warehouse and data lake architectures are incompatible: the data warehouse is optimized for business intelligence but not for data science use cases, while the data lake contains all data but lacks governance. The lakehouse paradigm combines the best of both by providing a single platform for all data types that enables SQL, machine learning, streaming, and BI workloads through an open, reliable data storage layer.


Why Databricks?

How can companies use data to gain a competitive edge and unlock the power of data science, machine learning, and artificial intelligence?

Ali Ghodsi, CEO and co-founder of Databricks, outlined answers to key questions during his keynote presentation at the 2022 Data + AI Summit in San Francisco in July 2022.

This ‘super-cut’ 16-minute version of his talk focuses on:

• The problems a data lakehouse solves
• Why organizations choose a data lakehouse over other architecture options
• How the Databricks Lakehouse platform performs compared to other data structures
• Examples of customers who have benefited from implementing a lakehouse

Lakehouse Platform
• Data Warehousing, Data Engineering, Data Streaming, Data Science and ML
• Unity Catalog: fine-grained governance for data and AI
• Delta Lake: data reliability and performance
• Cloud Data Lake: all structured and unstructured data

Join Ali Ghodsi, CEO and co-founder of Databricks, as he answers these questions in this presentation made to open the 2022 Data + AI Summit.

Data, analytics, and AI are driving innovative disruption
• Combines customer data and streaming events to power 100+ ML models that score 100M transactions per day
• 182 million wireless subscribers protected from fraud in-store, online, and in support centers

How are these companies doing it?

Tech leaders are to the right of the Data Maturity Curve
From hindsight to foresight
[Figure: Data Maturity Curve. x-axis: Data + AI Maturity; y-axis: Competitive Advantage. Stages, in increasing maturity: Clean Data, Reports, Ad Hoc Queries, Data Exploration, Predictive Modeling, Prescriptive Analytics, Automated Decision Making. Questions along the curve: What happened? What will happen? How should we respond? Automatically make the best decision.]

But… most companies still struggle to find success at scale

Two incompatible architectures get in the way
[Figure: the Data Maturity Curve (Competitive Advantage vs. Data + AI Maturity, from "What happened?" to "What will happen?"), annotated with two architectures: Data Warehouse for BI and Data Lake for AI.]

Data Warehouse: highly reliable and efficient
• Use cases: Business Intelligence, SQL Analytics
• Governance and security: Table ACLs
• Storage: structured tables

Data Lake: all of the data and very adaptable
• Use cases: Data Science & ML, Data Streaming
• Governance and security: Files and Blobs
• Storage: vast amounts of raw data (logs, texts, audio, video, images)

Link between the two: copy subsets of data

Resulting problems:
• Incomplete support for use cases
• Incompatible security and governance models
• Disjointed and duplicative data silos

There is no need to have two disparate platforms
• Use cases (Business Intelligence, SQL Analytics, Data Science & ML, Data Streaming): incomplete support
• Governance and security (Table ACLs vs. Files and Blobs): incompatible security and governance models
• Storage: open, reliable data storage to efficiently handle all data types (Data Lake: structured tables and unstructured files)

This is the lakehouse paradigm
• All ML, SQL, BI, and Streaming use cases (Data Science & ML, Data Streaming, Business Intelligence, SQL Analytics). Technologies: Databricks SQL, Databricks Workflows, Databricks Machine Learning, Structured Streaming
• One security and governance approach for all data assets on all clouds (Governance and Security: Files, Blobs, and Table ACLs). Technology: Unity Catalog, fine-grained governance for data and AI
• Open, reliable data storage to efficiently handle all data types (Data Lake: structured tables and unstructured files). Technology: Delta Lake, data reliability and performance

Databricks Lakehouse Platform
• Simple: unify your data warehousing and AI use cases on a single platform
• Multicloud: one consistent data platform across clouds
• Open: built on open source and open standards

Why do people want this lakehouse?

Databricks thrives within your modern data stack
[Figure: the Lakehouse Platform (Data Warehousing, Data Engineering, Data Streaming, Data Science and ML, on Unity Catalog, Delta Lake, and a Cloud Data Lake) surrounded by the categories BI and Dashboards, Machine Learning, Data Science, Data Governance, Data Pipelines, and Data Ingestion.]

Supporting enterprises in every industry
• Healthcare & Life Sciences
• Retail & CPG
• Media & Entertainment
• Financial Services
• Public Sector
• Manufacturing & Logistics
• Energy & Utilities
• Digital Native

The rest of the industry has taken notice

How do these lakehouse offerings stack up?

Lakehouse Benchmark
TPC-DS 3TB: Total run costs for external Parquet tables

[Bar chart: total run cost per system. Systems shown: Databricks SQL, CDW 1, CDW 2, CDW 3, CDW 4. Cost labels shown: $270.00, $243.19, $74.77, $63.95, $8.00.]


Data Warehouse Price/Performance
TPC-DS 10TB: Load and run

System              Load    Auto-clustering    Query    Total
Databricks          $40     $21                $14      $76
CDW 4 Enterprise    $110    $224               $52      $386
CDW 4 Standard      $73     $150               $35      $258
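
The per-phase figures in the TPC-DS 10TB chart can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the figures pair with the three systems as load, auto-clustering, and query costs; that pairing is inferred from the chart labels and is not stated in the slide text. The names `COSTS`, `total_cost`, and `cost_ratio` are illustrative only.

```python
# Per-phase costs in USD for the TPC-DS 10TB load-and-run chart.
# Assumption: the pairing of figures to systems and phases is inferred
# from the chart labels; the slide text does not state it explicitly.
COSTS = {
    "Databricks": {"load": 40, "auto_clustering": 21, "query": 14},
    "CDW 4 Enterprise": {"load": 110, "auto_clustering": 224, "query": 52},
    "CDW 4 Standard": {"load": 73, "auto_clustering": 150, "query": 35},
}


def total_cost(system: str) -> int:
    """Sum the per-phase costs for one system."""
    return sum(COSTS[system].values())


def cost_ratio(system: str, baseline: str = "Databricks") -> float:
    """Return how many times more expensive a system is than the baseline."""
    return total_cost(system) / total_cost(baseline)


if __name__ == "__main__":
    for name in COSTS:
        print(f"{name}: ${total_cost(name)} total, "
              f"{cost_ratio(name):.1f}x the Databricks total")
```

Under this pairing, the component sums reproduce the $386 and $258 totals printed on the chart. The Databricks components sum to $75 against a printed $76, which is consistent with each phase having been rounded before display.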


Beyond Benchmarks
Performance and efficiency deliver business impact

Global Media Company
Unified subscriber data and streaming data to efficiently build personalization machine learning models
• $30M reduction in compute costs
• $39M+ from accelerated revenue and better retention

Fortune 50 Retail
Combined traditional supply chain data and streaming data from IoT sensors to accurately forecast fresh food demand
• 10x faster time to insight
• $100M saved annually through reduced food waste

Third customer (unnamed)
Adoption of lakehouse has democratized data across the enterprise and lowered operational costs
• 60% lower analytics infrastructure costs
• 30% reduction in delivery times

Our mission is to democratize data and AI

Our destination is the lakehouse
