Databricks 101

Uploaded by

Vijay K Liko
DATABRICKS 101

Goal: Make Big Data Simple


AI has huge promise
Huge disruptive innovations are affecting most enterprises on the planet

Hardest Part of AI isn’t AI, it’s Data

“Hidden Technical Debt in Machine Learning Systems,” Google NIPS 2015

Data & AI Technologies are in Silos

Great for Data, but not AI
Great for AI, but not for data
Apache Spark: The First Unified Analytics Engine
Uniquely combines Data & AI technologies

Runtime
Delta
Spark Core Engine
Big Data Processing: ETL + SQL + Streaming
Machine Learning: MLlib + SparkR
Enterprises face challenges beyond Apache Spark

Disconnect between engineers and scientists

Complex data pipelines and infrastructure

Unified Analytics Engine

Data & AI People are in Silos

DATA ENGINEERS x DATA SCIENTISTS

DATABRICKS COLLABORATIVE WORKSPACE
Notebooks
Jobs
Models
APIs
Dashboards

DATA ENGINEERS DATA SCIENTISTS

DATABRICKS RUNTIME
for Big Data
for Machine Learning
Batch & Streaming
Data Lakes & Data Warehouses

DATABRICKS CLOUD SERVICE

What is Databricks?

Databricks is a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale.

The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf.

A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure

Designed in collaboration with the founders of Apache

What is Databricks used for?

■ Organizations use Databricks to process, store, clean, share, analyze, model, and monetize their datasets with solutions from BI to machine learning.
■ The Databricks workspace provides a unified interface and tools for most data tasks, including:
• Scheduling and managing data processing workflows
• Working in SQL
• Generating dashboards and visualizations
• Data ingestion
• Managing security, governance, and HA/DR
• Data discovery, annotation, and exploration
• Compute management
• Machine learning (ML) modeling and tracking
• ML model serving
• Source control with Git

What are common use cases for Databricks?

■ Use cases on Databricks are as varied as the data processed on the platform and the many personas of employees that work with data as a core part of their job.
– Build an enterprise data lakehouse
– ETL and data engineering
– Machine learning, AI, and data science
– Data warehousing, analytics, and BI
– Data governance and secure data sharing
– DevOps, CI/CD, and task orchestration
– Real-time and streaming analytics

DATABRICKS ARCHITECTURE

DATABRICKS ARCHITECTURE

■ Databricks is the application of the Data Lakehouse concept in a unified cloud-based platform.
■ Databricks is positioned above the existing data lake and can be connected with cloud-based storage platforms like Google Cloud Storage and AWS S3.
■ Layers of Databricks Architecture
• Delta Lake: Delta Lake is a storage layer that helps data lakes be more reliable.
• Delta Engine: The Delta Engine is a query engine that is optimized for efficiently processing data stored in the Delta Lake.
• The platform also has other built-in tools that support Data Science, BI Reporting, and MLOps.
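
One way a storage layer can make a data lake more reliable is by keeping an ordered commit log next to the data files, so readers only see data that a commit has recorded. The sketch below is a toy illustration of that idea in plain Python. It is not the Delta Lake API, and the ToyTable class, the _log directory, and the file naming are all hypothetical names chosen for the example.

```python
import json
import os
import tempfile


class ToyTable:
    """Toy table: JSON data files plus an ordered commit log (hypothetical, not Delta Lake)."""

    def __init__(self, root):
        self.root = root
        self.log_dir = os.path.join(root, "_log")  # hypothetical log location
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Committed versions are the numeric names of the log entries.
        return sorted(
            int(name.split(".")[0])
            for name in os.listdir(self.log_dir)
            if name.endswith(".json")
        )

    def commit(self, rows):
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        data_name = f"part-{version:05d}.json"
        # Step 1: write the data file. Readers cannot see it yet.
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # Step 2: record the file in the log. Mode "x" raises FileExistsError
        # if another writer has already created this version's entry.
        with open(os.path.join(self.log_dir, f"{version:05d}.json"), "x") as f:
            json.dump({"add": data_name}, f)
        return version

    def read(self, as_of=None):
        # Replay the log in order; files not named in the log are ignored.
        rows = []
        for version in self._versions():
            if as_of is not None and version > as_of:
                break
            with open(os.path.join(self.log_dir, f"{version:05d}.json")) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows


if __name__ == "__main__":
    table = ToyTable(tempfile.mkdtemp())
    table.commit([{"id": 1}])
    table.commit([{"id": 2}])
    print(table.read())         # all committed rows
    print(table.read(as_of=0))  # rows as of the first commit
```

Because readers replay the log rather than list the directory, a data file left behind by a failed write is never returned, and reading as of an earlier version falls out of the same mechanism.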

DATABRICKS LAKEHOUSE PLATFORM

Traditional Solution

Databricks End-to-End Solution

Hands-On Agenda For Today
