Data Analysis With Databricks Version 2
Data Analysis with Databricks SQL
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Meet your classmates
• Where is everyone joining us from today (city, country)?
Follow Along Demo: A Simple (but Quick) Query, Visualization, and Dashboard 15 min
DAWD 01-5
DAWD 02-1
[Diagram labels: Compute resources, Compute resources]

Data lineage (Public Preview)
End-to-end table & column lineage
Data Lineage
Mapping the flow of data in the lakehouse
• Auto-capture runtime data lineage across all languages
• Leverage the common permission model from Unity Catalog
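End-to-end column lineage amounts to following "derived-from" edges transitively. The following is a toy model only, assuming lineage has already been captured as a mapping from each downstream column to the columns it was computed from. The `LINEAGE` dictionary, the table and column names, and the `upstream` helper are hypothetical; this is not the Unity Catalog lineage API.

```python
from collections import deque

# Hypothetical captured lineage: downstream column -> columns it was computed from.
LINEAGE = {
    "gold.revenue_by_region.revenue": ["silver.orders.amount"],
    "silver.orders.amount": ["bronze.raw_orders.amount_str"],
    "gold.revenue_by_region.region": ["silver.orders.region"],
    "silver.orders.region": ["bronze.raw_orders.region"],
}


def upstream(column: str, edges: dict[str, list[str]] = LINEAGE) -> set[str]:
    """Return every column that `column` depends on, directly or indirectly."""
    seen: set[str] = set()
    queue = deque([column])
    # Breadth-first walk over the "derived-from" edges.
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

For example, `upstream("gold.revenue_by_region.revenue")` traces a gold-layer column back through silver to its bronze source.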
Schema
• Also known as a database
• Second level of organization in the catalog.schema.table namespace
• Users can see all schemas where USAGE is granted on both the schema and the catalog
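A minimal sketch of the visibility rule above, assuming grants are modeled as simple (principal, privilege, securable) records. `Grant` and `visible_schemas` are hypothetical names; real Databricks permission checks also account for groups, ownership, and other privileges, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str
    privilege: str
    securable: str  # a catalog name ("main") or a schema name ("main.sales")


def visible_schemas(user: str, schemas: list[str], grants: list[Grant]) -> list[str]:
    """Return the schemas the user can see: USAGE on both the catalog and the schema."""
    held = {(g.privilege, g.securable) for g in grants if g.principal == user}
    visible = []
    for full_name in schemas:
        catalog = full_name.split(".", 1)[0]
        if ("USAGE", catalog) in held and ("USAGE", full_name) in held:
            visible.append(full_name)
    return visible
```

With USAGE on catalog `main` and schema `main.sales`, plus USAGE on schema `dev.sandbox` but not on catalog `dev`, only `main.sales` is visible: `main.hr` lacks a schema grant and `dev.sandbox` lacks a catalog grant.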
5. Describe last-mile ETL workflows fully within the gold layer for specific use cases.
6. Identify the gold layer as the most common layer for data analysts using Databricks SQL.
DAWD 01-6
[Diagram labels: Infrastructure; All the data; Data lake; Data warehouse]
Problems with Managing Infrastructure
[Diagram labels: Users; Admins; Clusters; Cost; Finance ("Need to reduce costs")]
Databricks SQL Serverless Compute
• Optimized capacity
• Idle clusters removed 10 minutes after the last query (configurable)
• Robust security foundation: data isolation and encryption
[Diagram labels: Databricks Serverless compute (VPC/VNET); Customers; Customer Account; Customer Storage]
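The idle-removal rule for serverless compute can be expressed as a simple timeout check. This is a sketch assuming each warehouse only tracks the time of its last query; `should_stop` and `idle_warehouses` are hypothetical helpers, and stopping exactly at the 10-minute mark (rather than just after it) is an assumption.

```python
from datetime import datetime, timedelta

# Default idle timeout stated on the slide; the real setting is configurable.
DEFAULT_AUTO_STOP_MINUTES = 10


def should_stop(last_query_at: datetime, now: datetime,
                auto_stop_minutes: int = DEFAULT_AUTO_STOP_MINUTES) -> bool:
    """Return True once the warehouse has been idle for the configured timeout."""
    return now - last_query_at >= timedelta(minutes=auto_stop_minutes)


def idle_warehouses(last_query_times: dict[str, datetime], now: datetime,
                    auto_stop_minutes: int = DEFAULT_AUTO_STOP_MINUTES) -> list[str]:
    """List the (hypothetical) warehouse names whose idle time has hit the timeout."""
    return sorted(name for name, last in last_query_times.items()
                  if should_stop(last, now, auto_stop_minutes))
```

Raising `auto_stop_minutes` trades a higher idle cost for fewer cold starts; lowering it does the opposite.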
Warehouse Configuration
[Figure panels: AWS, Azure]
DAWD 03-1
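[Diagram: query flow with Table ACLs and the Hive metastore]
1. A user submits a SQL query to a cluster or SQL warehouse:
   SELECT * FROM Sales2020;
2. The cluster or SQL warehouse checks grants against the table ACLs.
3. It looks up the table location in the Hive metastore.
4. The metastore returns the path to the table: s3://sales/sales2020
5. The data is read using an instance profile / service principal / service account.
6. The cluster filters out unauthorized data.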
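The six steps above can be modeled in a few lines. This is a sketch under stated assumptions: `METASTORE`, `TABLE_ACLS`, and `select_all` are hypothetical in-memory stand-ins, and file reading and row-level authorization are passed in as functions. It mirrors what the diagram labels show: the cluster reads the path with its own storage credential, so the cluster itself must drop data the user is not allowed to see.

```python
from typing import Callable

# Hypothetical stand-ins for the Hive metastore and the table ACLs.
METASTORE = {"sales2020": "s3://sales/sales2020"}
TABLE_ACLS = {("analyst", "SELECT", "sales2020")}


def select_all(user: str, table: str,
               read_files: Callable[[str], list[dict]],
               row_allowed: Callable[[str, dict], bool]) -> list[dict]:
    """Model of SELECT * FROM <table> on a cluster that enforces table ACLs."""
    name = table.lower()  # Hive table names are case-insensitive
    # Step 2: check grants against the table ACLs.
    if (user, "SELECT", name) not in TABLE_ACLS:
        raise PermissionError(f"{user} has no SELECT grant on {name}")
    # Steps 3-4: look up the table location; the metastore returns its path.
    path = METASTORE[name]
    # Step 5: read the files with the cluster's own storage credential
    # (instance profile / service principal / service account), stubbed here.
    rows = read_files(path)
    # Step 6: the cluster itself filters out data the user may not see.
    return [row for row in rows if row_allowed(user, row)]
```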
Unity Catalog (cross-workspace)
[Diagram labels: Cluster or SQL Warehouse; User Identity Passthrough; Managed Data Source; Defined Credentials; External Tables; Other Existing Data]
Databricks Unity Catalog
[Diagram labels: Users; Unity Catalog; Audit Log; Data (files on S3/ADLS/GCS)]
• Catalog objects: table1, table2, view1, view2, view3, models, SQL databases, credentials, external tables
• Policy labels: pii, iot_key
• Data files: /dataset/pages/part-001, /dataset/pages/part-002, /dataset/users/uk/part-001, /dataset/users/uk/part-002, /dataset/users/us/part-001