Azure Databricks is a collaborative analytics platform powered by Apache Spark, designed for big data processing and integrated with Azure services. It offers features such as secure collaboration, fine-grained access control, and a unified workspace for data engineers, scientists, and analysts. The platform enhances productivity with one-click setup, serverless infrastructure, and optimized performance for large-scale data processing.


Azure Databricks

An Introduction

Bryan Cafferky
Technical Solutions Professional
BIG DATA & ADVANCED ANALYTICS AT A GLANCE

[Architecture diagram] Pipeline stages: Ingest → Store → Prep & Train → Model & Serve → Intelligence

 Ingest: business apps, custom apps, sensors and devices; Event Hub, IoT Hub, Kafka on HDInsight; Data Factory (data movement, pipelines & orchestration)
 Store: Blob Storage, Data Lake Store
 Prep & Train: Databricks, HDInsight, Data Lake Analytics, Machine Learning (ML Workbench and Services)
 Model & Serve: Cosmos DB, SQL Database, SQL Data Warehouse, Analysis Services
 Intelligence: predictive apps, operational reports, analytical dashboards; collaboration portal
Azure Databricks
Powered by Apache Spark
APACHE SPARK
A unified, open-source, parallel data processing framework for Big Data analytics

Spark unifies:
 Batch processing
 Interactive SQL
 Real-time processing
 Machine learning
 Deep learning
 Graph processing

[Stack diagram] Libraries on top of the Spark Core Engine:
• Spark SQL (interactive queries)
• Spark MLlib (machine learning)
• Spark Structured Streaming (stream processing)
• Spark GraphX (graph computation)
Cluster managers: Standalone Scheduler, YARN, Mesos
DATABRICKS – COMPANY OVERVIEW

 Founded in late 2013
 By the creators of Apache Spark, the original team from UC Berkeley AMPLab
 Largest code contributor to Apache Spark
 Level 2/3 support partnership with
• Hortonworks
• MapR
• DataStax
 Provides certifications such as Databricks Certified Application, Databricks Certified Distribution, and Databricks Certified Developer
 Main product: the Unified Analytics Platform
 In Oct 2017, introduced Databricks Delta (currently in private preview)
AZURE DATABRICKS

 Azure Databricks is a first-party service on Azure.
• Unlike on other clouds, it is not an Azure Marketplace or a 3rd-party hosted service.
 Azure Databricks is integrated seamlessly with Azure services:
• Azure Portal: the service can be launched directly from the Azure Portal
• Azure Storage services: directly access data in Azure Blob Storage and Azure Data Lake Store
• Azure Active Directory: for user authentication, eliminating the need to maintain two separate sets of users in Databricks and Azure
• Azure SQL DW and Azure Cosmos DB: enable you to combine structured and unstructured data for analytics
• Apache Kafka for HDInsight: enables you to use Kafka as a streaming data source or sink
• Azure billing: you get a single bill from Azure
• Power BI: for rich data visualization
 Eliminates the need to create a separate account with Databricks.
AZURE DATABRICKS

[Platform diagram] Azure Databricks:
 Collaborative Workspace: data engineers, data scientists, and business analysts share machine learning models
 Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notifications & logs
 Optimized Databricks Runtime Engine: Databricks I/O, Apache Spark, Serverless
Inputs and outputs: IoT / streaming data, cloud storage, Hadoop storage; BI tools, data warehouses, data exports, REST APIs
Enhance productivity · Build on secure & trusted cloud · Scale without limits
GENERAL SPARK CLUSTER ARCHITECTURE

 The Driver runs the user's 'main' function and executes the various parallel operations on the worker nodes.
 The results of the operations are collected by the driver.
 The worker nodes read and write data from/to data sources, including HDFS.
 Worker nodes also cache transformed data in memory as RDDs (Resilient Distributed Datasets).
 Worker nodes and the driver node execute as VMs in public clouds (AWS, Google, and Azure).

[Diagram] Driver Program (SparkContext) → Cluster Manager → Worker Nodes (cache, tasks) → Data Sources (HDFS, SQL, NoSQL, …)


SECURE COLLABORATION
Azure Databricks enables secure collaboration between colleagues

• With Azure Databricks, colleagues can securely share key artifacts such as clusters, notebooks, jobs, and workspaces.
• Secure collaboration is enabled through a combination of:
 Fine-grained permissions: define who can do what on which artifacts (access control)
 AAD-based authentication: ensures that users are actually who they claim to be
AZURE DATABRICKS INTEGRATION WITH AAD
Azure Databricks is integrated with AAD, so Azure Databricks users are just regular AAD users

 There is no need to define users, and their access control, separately in Databricks.
 AAD users can be used directly in Azure Databricks for all user-based access control (clusters, jobs, notebooks, etc.).
 Databricks has delegated user authentication to AAD, enabling single sign-on (SSO) and unified authentication.
 Notebooks, and their outputs, are stored in the Databricks account. However, AAD-based access control ensures that only authorized users can access them.
DATABRICKS ACCESS CONTROL
Access control can be defined at the user level via the Admin Console.
Access control can be defined for Workspaces, Clusters, Jobs, and REST APIs:

 Workspace Access Control: defines who can view, edit, and run notebooks in their workspace
 Cluster Access Control: allows users to control who can attach to, restart, and manage (resize/delete) clusters; allows admins to specify which users have permission to create clusters
 Jobs Access Control: allows owners of a job to control who can view job results or manage runs of a job (run now/cancel)
 REST API Tokens: allow users to use personal access tokens instead of passwords to access the Databricks REST API
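As a sketch of the token-based access mentioned above, a personal access token is sent as a Bearer credential on each REST call. The workspace URL and token below are placeholders, not real values:

```python
def databricks_auth_headers(token: str) -> dict:
    """Build the headers for a Databricks REST API call using a personal access token."""
    return {"Authorization": f"Bearer {token}"}

# Hypothetical workspace URL and token, for illustration only.
workspace_url = "https://adb-example.azuredatabricks.net"
headers = databricks_auth_headers("dapi-EXAMPLE-TOKEN")

# A real call would then be made with an HTTP client, e.g. with requests:
# requests.get(f"{workspace_url}/api/2.0/clusters/list", headers=headers)
print(headers["Authorization"])  # Bearer dapi-EXAMPLE-TOKEN
```

Tokens can be scoped and revoked independently of the user's password, which is why they are preferred for API access.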
AZURE DATABRICKS CORE ARTIFACTS

[Diagram] Azure Databricks artifacts: Clusters, Libraries, Workspaces, Jobs, Notebooks
Why Spark?

• Open-source data processing engine built around speed, ease of use, and sophisticated analytics

• In-memory engine that is up to 100 times faster than Hadoop

• Largest open-source data project, with 1000+ contributors

• Highly extensible, with support for Scala, Java, and Python alongside Spark SQL, GraphX, Streaming, and the Machine Learning Library (MLlib)
What is Azure Databricks?
A fast, easy, and collaborative Apache® Spark™-based analytics platform optimized for Azure

Best of Databricks + Best of Microsoft:

Designed in collaboration with the founders of Apache Spark

One-click setup; streamlined workflows

Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts

Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)

Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)


Differentiated experience on Azure

ENHANCE PRODUCTIVITY
Get started quickly by launching your new Spark environment with one click.
Share your insights in powerful ways through rich integration with Power BI.
Improve collaboration amongst your analytics team through a unified workspace.
Innovate faster with native integration with the rest of the Azure platform.

BUILD ON THE MOST COMPLIANT CLOUD
Simplify security and identity control with built-in integration with Active Directory.
Regulate access with fine-grained user permissions to Azure Databricks' notebooks, clusters, jobs, and data.
Build with confidence on the trusted cloud backed by unmatched support, compliance, and SLAs.

SCALE WITHOUT LIMITS
Operate at massive scale without limits globally.
Accelerate data processing with the fastest Spark engine.
Collaborative Workspace

GET STARTED IN SECONDS
Single click to launch your new Spark environment.

INTERACTIVE EXPLORATION
Explore data using interactive notebooks with support for multiple programming languages including R, Python, Scala, and SQL.

COLLABORATION
Work on the same notebook in real-time while tracking changes with detailed revision history, GitHub, or Bitbucket.

VISUALIZATIONS
Visualize insights through a wide assortment of point-and-click visualizations, or use powerful scriptable options like matplotlib, ggplot, and D3.

DASHBOARDS
Rich integration with Power BI to discover and share your insights in powerful new ways.
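As a small illustration of the scriptable option mentioned above, a matplotlib chart like those rendered inline in a notebook takes only a few lines. This is a generic sketch, not Databricks-specific, and the sample data is invented:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; a notebook would render inline instead
import matplotlib.pyplot as plt

# Invented sample data for illustration.
days = [1, 2, 3, 4, 5]
events = [120, 180, 150, 210, 260]

fig, ax = plt.subplots()
ax.plot(days, events, marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("Events processed")
ax.set_title("Sample pipeline throughput")
fig.savefig("throughput.png")
```

In a Databricks notebook the same figure object is displayed directly in the cell output, alongside the built-in point-and-click chart types.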
Deploy Production Jobs & Workflows

JOBS SCHEDULER
Execute jobs for production pipelines on a specific schedule.

NOTEBOOK WORKFLOWS
Create multi-stage pipelines with the control structures of the source programming language.

RUN NOTEBOOKS AS JOBS
Turn notebooks or JARs into resilient Spark jobs with a click or an API call.

NOTIFICATIONS AND LOGS
Set up alerts and quickly access audit logs for easy monitoring and troubleshooting.

INTEGRATE NATIVELY WITH AZURE SERVICES
Deep integration with Azure SQL Data Warehouse, Cosmos DB, Azure Data Lake Store, Azure Blob Storage, and Azure Event Hub.
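As a sketch of the "API call" path above, triggering a job run through the Databricks Jobs REST API is just a small JSON payload POSTed to the workspace. The job ID and parameter names here are hypothetical:

```python
import json

# Hypothetical job ID and notebook parameters, for illustration only.
run_request = {
    "job_id": 42,
    "notebook_params": {"input_path": "/mnt/raw/events", "mode": "full"},
}

# A client would POST this body to <workspace-url>/api/2.0/jobs/run-now
# with a personal-access-token Authorization header.
payload = json.dumps(run_request)
print(payload)
```

The same payload shape works from any scheduler or CI system that can issue an authenticated HTTP request, which is what makes notebooks usable as production jobs.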
Optimized Databricks Runtime Engine

OPTIMIZED I/O PERFORMANCE
The Databricks I/O module (DBIO) takes processing speeds to the next level, significantly improving the performance of Spark in the cloud.

FULLY-MANAGED PLATFORM ON AZURE
Reap the benefits of a fully managed service and remove the complexity of big data and machine learning.

SERVERLESS INFRASTRUCTURE
Databricks' serverless and highly elastic cloud service is designed to remove operational complexity while ensuring reliability and cost efficiency at scale.

OPERATE AT MASSIVE SCALE
Operate at massive scale, without limits, globally.
Advanced Analytics on Big Data

[Pipeline diagram] Ingest → Store → Prep & Train → Model & Serve → Intelligence
 Ingest: logs, files and media (unstructured) and business/custom apps (structured), moved by Data Factory
 Store: Azure Storage
 Prep & Train: Azure Databricks (Spark MLlib, SparkR, sparklyr)
 Model & Serve: Azure Cosmos DB; Azure SQL Data Warehouse (via PolyBase)
 Intelligence: web & mobile apps, analytical dashboards

Spark Context Demo
Pricing

Launch Stage             Start Date             DBU Pricing   VM Pricing
Gated public preview     11/15                  50%           100%
Ungated public preview   TBD (~January 2018)    50%           100%
GA                       TBD (~March 2018)      100%          100%
Roadmap

Timeline: Private Preview (Oct 2017) → Gated Public Preview (Nov 2017, Connect()) → Ungated Public Preview (Q1 2018) → General Availability (H1 2018) → GA+/GA++ (2018-19)

Private Preview (Oct 2017):
• End-to-end experience: create first-party Databricks workspace and cluster from the Azure portal
• Enterprise-grade security with AAD authentication & single sign-on
• Integration with external stores like Blob Store, ADLS, SQL DW, Cosmos DB, HDI Kafka
• DBFS mount points
• Power BI integration via JDBC/ODBC endpoint
• REST APIs
• 3 regions: West US, East US 2, West Europe

Add-ons to Private Preview (Gated Public Preview):
• Billing
• Event Hub integration
• Clusters: tagging, disk storage, SSH access
• 8 regions: West US, East US, Central US, West Europe, North Europe, West US 2

Add-ons to public preview:
• .NET integration
• GPU clusters
• Deep integration with SQL DW
• Easy data import UI for Blob Store & Azure Data Lake
• Audit logs, Spark logs to storage, log history, log encryption at rest
• ISO 27001; stretch goal: SOC2 & HIPAA
• Reserved Instances & commit pricing
• Deep integration with Power BI
• Jobs: email alerts
• Free community edition

Add-ons to general availability:
• OMS for service monitoring
• Azure ML integration
• ADF integration (TBD)
• All Azure regions
• PCI & other certifications

Provided by Microsoft and Databricks under NDA
How to get started

 Sign up for preview at http://databricks.azurewebsites.net
 Engage Microsoft experts for a workshop to help identify high-impact scenarios
 Learn more about Azure Databricks: www.azure.com/databricks
Appendix

Help All Along the Way
 Quick Start
 Documentation

Azure Databricks – workspace home page
Azure Databricks – service home page
Azure Databricks – creating a workspace
Azure Databricks – workspace deployment

Important Technical Details
CLUSTERS

 Azure Databricks clusters are the set of Azure Linux VMs that host the Spark worker and driver nodes.
 Your Spark application code (i.e., jobs) runs on the provisioned clusters.
 Azure Databricks clusters are launched in your subscription, but are managed through the Azure Databricks portal.
 Azure Databricks provides a comprehensive set of graphical wizards to manage the complete lifecycle of clusters, from creation to termination.
CLUSTER CREATION

 You can create two types of clusters: Standard and Serverless Pool (see next slide).
 While creating a cluster you can specify:
• Number of nodes
• Autoscaling and Auto Termination policy
• Spark configuration details
• The Azure VM instance types for the driver and worker nodes

Graphical wizard in the Azure Databricks portal to create a Standard Cluster


CLUSTERS: AUTO SCALING AND AUTO TERMINATION
Simplifies cluster management and reduces costs by eliminating waste

 When creating Azure Databricks clusters you can choose Autoscaling and Auto Termination options.

 Autoscaling: just specify the min and max number of nodes; Azure Databricks automatically scales up or down based on load.

 Auto Termination: after the specified minutes of inactivity, the cluster is automatically terminated.

Benefits:
 You do not have to guess, or determine by trial and error, the correct number of nodes for the cluster.
 As the workload changes, you do not have to manually tweak the number of nodes.
 You do not have to worry about wasting resources when the cluster is idle; you only pay for resources when they are actually being used.
 You do not have to wait and watch for jobs to complete just so you can shut down the clusters.
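The options above map onto a cluster specification like the following sketch of a cluster-creation request body. The node type and Spark version strings are illustrative placeholders, not recommendations:

```python
import json

# Illustrative cluster spec; field values are placeholders.
cluster_spec = {
    "cluster_name": "autoscaling-demo",
    "spark_version": "latest",          # placeholder runtime version string
    "node_type_id": "Standard_DS3_v2",  # example Azure VM instance type
    "autoscale": {"min_workers": 2, "max_workers": 8},  # scale within this range based on load
    "autotermination_minutes": 30,      # terminate after 30 idle minutes
}
print(json.dumps(cluster_spec, indent=2))
```

Setting `autoscale` instead of a fixed worker count, plus `autotermination_minutes`, is what delivers the pay-only-for-what-you-use benefits listed above.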
SERVERLESS POOL (BETA)
A self-managed pool of cloud resources, auto-configured for interactive Spark workloads

 You specify only the minimum and maximum number of nodes in the cluster; Azure Databricks provisions and adjusts the compute and local storage based on your usage.
 Limitation: currently works only for SQL and Python.

Benefits of Serverless Pool:

Auto-configuration
 Databricks chooses the best configuration for Spark to get the best performance.
 Users don't need to worry about providing the Databricks runtime version or any Spark configuration.
 Databricks also chooses the best cluster parameters to save cost on infrastructure.

Elasticity
 Automatically scales the compute and local storage, independently, based on usage.
 Offers maximum resource utilization and minimum query latencies.

Fine-grained sharing
• Preemption: Databricks proactively preempts Spark tasks from over-committed users to ensure all users get their fair share of cluster time and their jobs complete in a timely manner, even when contending with dozens of other users. Uses the "Task Preemption for High Concurrency" feature of Spark in Databricks.
• Fault isolation: Databricks sandboxes the environments belonging to different notebooks from one another.
CLUSTER ACCESS CONTROL
• There are two configurable types of permissions for Cluster Access Control:
• Individual Cluster Permissions: controls a user's ability to attach notebooks to a cluster, as well as to restart/resize/terminate/start clusters.
• Cluster Creation Permissions: controls a user's ability to create clusters.

• Individual permissions can be configured on the Clusters page by clicking Permissions under the 'More Actions' icon of an existing cluster.
• There are 4 individual cluster permission levels: No Permissions, Can Attach To, Can Restart, and Can Manage. Privileges are shown below.

Abilities                        No Permissions   Can Attach To   Can Restart   Can Manage
Attach notebooks to cluster                       x               x             x
View Spark UI                                     x               x             x
View cluster metrics (Ganglia)                    x               x             x
Terminate cluster                                                 x             x
Start cluster                                                     x             x
Restart cluster                                                   x             x
Resize cluster                                                                  x
Modify permissions                                                              x