Databricks 101
Data & AI Technologies are in Silos
Great for Data, but not for AI
Great for AI, but not for Data
Apache Spark: The First Unified Analytics Engine
Uniquely combines Data & AI technologies
Runtime
Delta
Spark Core Engine
Big Data Processing: ETL + SQL + Streaming
Machine Learning: MLlib + SparkR
Enterprises face challenges beyond Apache Spark
Disconnect between Engineers and Scientists
Data & AI People are in Silos
Data Engineers x Data Scientists
DATABRICKS COLLABORATIVE WORKSPACE
Notebooks, Jobs, Models, APIs, Dashboards
DATABRICKS RUNTIME
for Big Data and for Machine Learning
Batch & Streaming
Data Lakes & Data Warehouses
What is Databricks?
What is Databricks used for?
■ Organizations use Databricks to process, store, clean, share, analyze, model, and monetize their datasets with solutions from BI to machine learning.
■ The Databricks workspace provides a unified interface and tools for most data tasks, including:
• Scheduling and managing data processing workflows
• Working in SQL
• Generating dashboards and visualizations
• Data ingestion
• Managing security, governance, and HA/DR
• Data discovery, annotation, and exploration
• Compute management
• Machine learning (ML) modeling and tracking
• ML model serving
• Source control with Git
What are common use cases for Databricks?
■ Use cases on Databricks are as varied as the data processed on the platform and the many personas of employees that work with data.
– ETL and data engineering
– Machine learning, AI, and data science
– Data warehousing, analytics, and BI
– Data governance and secure data sharing
– DevOps, CI/CD, and task orchestration
– Real-time and streaming analytics
DATABRICKS ARCHITECTURE
DATABRICKS ARCHITECTURE
■ Databricks applies the Data Lakehouse concept in a unified cloud-based platform.
■ Databricks is positioned above the existing data lake and can be connected with cloud-based storage platforms like Google Cloud Storage and AWS S3.
■ Layers of Databricks Architecture
• Delta Lake: a storage layer that helps data lakes be more reliable.
• Delta Engine: a query engine optimized for efficiently processing data stored in Delta Lake.
• Databricks also has other built-in tools that support Data Science, BI Reporting, and MLOps.
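To make the storage-layer bullet more concrete, here is a minimal Python sketch of one way a storage layer can add reliability on top of plain files: data files become visible to readers only after an entry is added to an ordered commit log. Delta Lake keeps a transaction log for its tables, but everything below (ToyTable, the _log directory, the JSON layout, demo) is a hypothetical illustration under that assumption, not Delta Lake's implementation or API.

```python
import json
import os
import tempfile


class ToyTable:
    """Data files plus an ordered commit log (hypothetical; not the Delta Lake API)."""

    def __init__(self, root):
        self.data_dir = os.path.join(root, "data")
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.data_dir, exist_ok=True)
        os.makedirs(self.log_dir, exist_ok=True)

    def write_data_file(self, name, rows):
        # Step 1: write the data file. Readers cannot see it until it is committed.
        with open(os.path.join(self.data_dir, name), "w") as f:
            json.dump(rows, f)
        return name

    def commit(self, file_names):
        # Step 2: publish the files by creating the next numbered log entry.
        # Mode "x" fails if the entry already exists, so two writers cannot
        # both claim the same version number.
        version = len(os.listdir(self.log_dir))
        entry = os.path.join(self.log_dir, f"{version:08d}.json")
        with open(entry, "x") as f:
            json.dump({"add": file_names}, f)
        return version

    def read(self):
        # Readers replay the log in order and load only committed files.
        rows = []
        for entry in sorted(os.listdir(self.log_dir)):
            with open(os.path.join(self.log_dir, entry)) as f:
                committed = json.load(f)["add"]
            for name in committed:
                with open(os.path.join(self.data_dir, name)) as data:
                    rows.extend(json.load(data))
        return rows


def demo():
    with tempfile.TemporaryDirectory() as root:
        table = ToyTable(root)
        table.commit([table.write_data_file("part-0.json", [{"id": 1}, {"id": 2}])])
        # Simulate a job that crashes after writing data but before committing.
        table.write_data_file("part-1.json", [{"id": 3}])
        return table.read()
```

Calling demo() returns only the two committed rows. The half-finished write to part-1.json sits in the data directory but is never read, which is the kind of failure a bare data lake of loose files does not guard against.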
DATABRICKS LAKEHOUSE PLATFORM
Traditional Solution
Databricks End-to-End Solution
Hands-On Agenda for Today