DevOps for Databricks
Anna-Maria Wykes
Data Engineering Consultant
Agenda
§ What is DevOps
§ CI/CD (Continuous Integration/Continuous Deployment)
§ IAC (Infrastructure as Code)
§ Build Agents
§ Databricks REST API
§ Real World Example
§ Other Tooling Examples
What is DevOps?
BI Developer: “I want to get my dashboard published on the website”
Data Scientist: “I want to productionize my models and have them automatically update”
Software Engineer: “I want to update the website with the latest dashboard”
Data Engineer: “I want to push the latest ETL pipelines to production”
DevOps
DevOps
DevOps Pipelines
Development → Test → Production
DevOps Tools
Continuous Integration/Continuous Deployment (CI/CD)
Infrastructure as Code (IAC)
ARM (Azure Resource Manager) Templates
Azure Bicep
Continuous Integration & Continuous Deployment (CI/CD)
CI/CD
• Continuous Improvements
• Feature releases
• Fast bug fixes
• Ability to quickly rollback
• Testing
• Unit/Integration/End to End testing
• Linting (check code is formatted correctly)
Infrastructure as Code (IAC)
IAC: The Blueprint of your Solution
Build Agents
What is a Build Agent?
• It is the compute under your DevOps Pipelines
• “Out of the box” available Agents in Azure DevOps
• Custom VM Agent
• Custom Docker Container Agent
Pipeline (yml) triggered → Agent found/located → Pipeline executed on Agent
Why a Custom Build Agent?
• You can decide specifically what you want your code to run on (which Linux/Windows version or Docker image)
• Make sure all the tools you need are installed on your Agent
• Keep state
• Run within a VNet (Virtual Network)
Databricks REST API
Why the Databricks REST API?
• Can use your existing knowledge of REST
• Can incorporate into our language of choice (Python)
• Cross Platform
https://docs.databricks.com/dev-tools/api/latest/index.html
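As a rough illustration of the "existing knowledge of REST" point, the sketch below builds an authenticated request using only the Python standard library. It assumes a personal access token sent as a Bearer header and the `/api/2.0/` path prefix; the function names `build_request` and `send`, and the host and token values, are placeholders for illustration, not part of the talk.

```python
import json
import urllib.request


def build_request(host, token, endpoint, payload=None):
    """Build (but do not send) an authenticated Databricks REST API request.

    A request without a payload is treated as a GET, one with a payload as a POST.
    """
    url = f"{host.rstrip('/')}/api/2.0/{endpoint.lstrip('/')}"
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    method = "GET" if payload is None else "POST"
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def send(request):
    """Send the request and decode the JSON response (empty body becomes {})."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
        return json.loads(body) if body else {}


# Placeholder host and token; nothing is sent until send() is called.
example = build_request("https://example.azuredatabricks.net", "my-token", "clusters/list")
```

Because the helper is plain HTTP plus JSON, the same call shape works from a laptop, a build agent, or any other platform with Python installed.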
Real World Example
What are we going to do?
• Use Python Scripts and the Databricks REST API to:
• Create a Databricks Cluster
• Check Cluster Status
• Upload Notebooks to Databricks Workspace
• Run some tests against our Python code
• Build and upload a Python Wheel to Databricks
• Install/uninstall/update Python Wheel in Databricks
• Use Azure DevOps to run our scripts
• YML Pipelines
• Custom DevOps Agent
Create a Databricks Cluster
Live demo using VSCode and Databricks in Azure
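The demo code itself is not part of these slides. A minimal sketch of what a cluster-creation script might look like is shown here, assuming the `clusters/create` endpoint of the REST API 2.0. The runtime version, node type, and cluster name are placeholder values, and `cluster_spec` and `create_cluster` are hypothetical helper names.

```python
import json
import urllib.request


def cluster_spec(name, spark_version, node_type_id, num_workers=1, autotermination_minutes=30):
    """Return the JSON body for POST /api/2.0/clusters/create."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "autotermination_minutes": autotermination_minutes,
    }


def create_cluster(host, token, spec):
    """Create the cluster and return the cluster_id from the response."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["cluster_id"]


# Placeholder values: pick a runtime and node type that exist in your workspace.
spec = cluster_spec("ci-cluster", "9.1.x-scala2.12", "Standard_DS3_v2")
```

Keeping the spec in a separate function means it can be reviewed and versioned like any other code, and reused across environments with different names or sizes.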
Check Cluster Status
Live demo using VSCode and Databricks in Azure
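A pipeline usually has to wait for the cluster before it can continue. The sketch below polls the `clusters/get` endpoint until the cluster reports `RUNNING`. The timeout and polling interval are arbitrary assumptions, and `fetch_state` and `wait_until_running` are hypothetical names rather than the code shown in the demo.

```python
import json
import time
import urllib.parse
import urllib.request

# States from which the cluster will not reach RUNNING without intervention.
FAILURE_STATES = {"TERMINATING", "TERMINATED", "ERROR", "UNKNOWN"}


def fetch_state(host, token, cluster_id):
    """Return the cluster state string from GET /api/2.0/clusters/get."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["state"]


def wait_until_running(get_state, timeout_s=600, poll_s=15, sleep=time.sleep):
    """Call get_state() repeatedly until it returns RUNNING.

    Raises RuntimeError on a failure state and TimeoutError once timeout_s
    seconds of polling have elapsed.
    """
    waited = 0
    while True:
        state = get_state()
        if state == "RUNNING":
            return state
        if state in FAILURE_STATES:
            raise RuntimeError(f"cluster entered state {state}")
        if waited >= timeout_s:
            raise TimeoutError(f"cluster still {state} after {waited}s")
        sleep(poll_s)
        waited += poll_s
```

In a script this could be called as `wait_until_running(lambda: fetch_state(host, token, cluster_id))`. Passing the state lookup in as a function keeps the polling logic testable without a live workspace.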
Upload Notebooks to Databricks Workspace
Live demo using VSCode and Databricks in Azure
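One way to upload a notebook is the `workspace/import` endpoint, which expects the source as a base64 string. The following sketch only builds the request body. The workspace folder and file name are made-up examples, and `notebook_import_payload` is a hypothetical helper.

```python
import base64
import posixpath

# Maps a local file extension to the language value the import endpoint expects.
LANGUAGE_BY_EXTENSION = {".py": "PYTHON", ".sql": "SQL", ".scala": "SCALA", ".r": "R"}


def notebook_import_payload(local_name, source, workspace_dir, overwrite=True):
    """Return the JSON body for POST /api/2.0/workspace/import.

    The notebook is stored under workspace_dir using the file name without
    its extension; an unsupported extension raises KeyError.
    """
    stem, extension = posixpath.splitext(posixpath.basename(local_name))
    return {
        "path": posixpath.join(workspace_dir, stem),
        "format": "SOURCE",
        "language": LANGUAGE_BY_EXTENSION[extension.lower()],
        "content": base64.b64encode(source.encode("utf-8")).decode("ascii"),
        "overwrite": overwrite,
    }


# Example folder and file name are placeholders.
payload = notebook_import_payload("etl.py", "print('hello')\n", "/Shared/demo")
```

The resulting dictionary would be sent as the JSON body of a POST request with the same Bearer token header as any other REST call.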
Run some tests against our Python code
Live demo using VSCode and Databricks in Azure
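The code under test in the demo is not included in the slides, so the example below uses an invented helper, `clean_column_names`, purely to show the shape of a unit test that a pipeline step could run. It assumes a test runner such as pytest that discovers functions whose names start with `test_`.

```python
import re


def clean_column_names(names):
    """Normalise column names: trim, lower-case, and join words with underscores."""
    cleaned = []
    for name in names:
        lowered = name.strip().lower()
        # Collapse every run of non-alphanumeric characters into one underscore.
        cleaned.append(re.sub(r"[^0-9a-z]+", "_", lowered).strip("_"))
    return cleaned


def test_spaces_and_case_are_normalised():
    assert clean_column_names([" Order ID "]) == ["order_id"]


def test_punctuation_becomes_underscore():
    assert clean_column_names(["Unit-Price", "net.total"]) == ["unit_price", "net_total"]


def test_empty_input_gives_empty_output():
    assert clean_column_names([]) == []
```

Keeping logic like this in plain Python functions, rather than only inside notebooks, is what makes it possible to test on a build agent without a running cluster.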
Build and upload a Python Wheel to Databricks
Live demo using VSCode and Databricks in Azure
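A sketch of the two halves of this step follows. It assumes the wheel is built with the PyPA `build` package (installed separately with `pip install build`) and uploaded through the `dbfs/put` endpoint. The DBFS path is a placeholder, and both function names are hypothetical.

```python
import base64
import sys


def build_wheel_command(project_dir, out_dir="dist"):
    """Return the command that builds a wheel for project_dir into out_dir.

    Run it with subprocess.run(command, check=True) so that a failed build
    fails the pipeline step.
    """
    return [sys.executable, "-m", "build", "--wheel", "--outdir", out_dir, project_dir]


def dbfs_put_payload(wheel_bytes, dbfs_path, overwrite=True):
    """Return the JSON body for POST /api/2.0/dbfs/put.

    The file is sent inline as base64. The inline form is size-limited
    (about 1 MB), so larger wheels need the streaming create/add-block/close
    endpoints instead.
    """
    return {
        "path": dbfs_path,
        "contents": base64.b64encode(wheel_bytes).decode("ascii"),
        "overwrite": overwrite,
    }


# Placeholder bytes and path; a real script would read the file from dist/.
command = build_wheel_command(".")
payload = dbfs_put_payload(b"fake wheel bytes", "/FileStore/wheels/demo-0.1.0-py3-none-any.whl")
```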
Install/uninstall/update Python Wheel in Databricks
Live demo using VSCode and Databricks in Azure
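The Libraries API exposes `libraries/install` and `libraries/uninstall`, and an uninstall only takes effect once the cluster restarts. The sketch below shows one plausible ordering for an "update", expressed as a list of endpoint and payload pairs. The ordering, the paths, and the helper names are assumptions, not the exact script from the demo.

```python
def library_payload(cluster_id, dbfs_wheel_path):
    """Return the JSON body shared by libraries/install and libraries/uninstall."""
    return {"cluster_id": cluster_id, "libraries": [{"whl": dbfs_wheel_path}]}


def update_wheel_steps(cluster_id, old_wheel, new_wheel):
    """Return ordered (endpoint, payload) pairs that swap old_wheel for new_wheel.

    Each pair is meant to be sent as a POST to /api/2.0/<endpoint>.
    """
    return [
        ("libraries/uninstall", library_payload(cluster_id, old_wheel)),
        ("libraries/install", library_payload(cluster_id, new_wheel)),
        # The restart removes the old wheel and loads the new one.
        ("clusters/restart", {"cluster_id": cluster_id}),
    ]


# Placeholder cluster id and wheel paths.
steps = update_wheel_steps(
    "0123-456789-abcde",
    "dbfs:/FileStore/wheels/demo-0.1.0-py3-none-any.whl",
    "dbfs:/FileStore/wheels/demo-0.2.0-py3-none-any.whl",
)
```

Returning the steps as data keeps the sequence easy to log and inspect before anything is sent to the workspace.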
Introduction to Azure DevOps
• How to Create a Pipeline
• How to Run a Pipeline
• How to use a Custom Agent
How to Create and Run a DevOps Pipeline
Live demo using Azure DevOps
Adding our Python Scripts to a Pipeline
Live demo using Azure DevOps and Databricks in Azure
Using a Custom DevOps Agent
Live demo using Azure DevOps and Databricks in Azure
Examples of other DevOps IAC tools
Azure ARM templates
Azure Bicep
Other IAC Tools
What is Terraform?
Terraform is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage
hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files.
Write → Plan → Apply
Terraform for Databricks
https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs
What is Pulumi?
Cloud Engineering for Everyone
Build, deploy, and manage modern cloud applications and infrastructure using familiar languages, tools, and engineering practices.
https://github.com/pulumi/pulumi-azure
https://www.pulumi.com/docs/reference/pkg/azure/databricks/
Pulumi azure.databricks Module
Based on the azurerm Terraform Provider.
Creating a Workspace Resource using Pulumi
What is Bicep?
• Project Bicep – Next Generation ARM Templates
• ARM Templates can get complex
• Bicep is a cleaner, more readable language that gets compiled into ARM templates for deployment (a language around ARM)
Write → Apply
Write and Compile Bicep Language → ARM Templates → Azure Resource Manager → Deployed Solution
Summary
• DevOps is for Everyone
• CI/CD keeps your code in check and gets the latest features/changes into production as soon as possible
• IAC is the blueprint of your solution
• Lots of tooling options
• The Databricks REST API can be used in conjunction with Python and Azure DevOps to create effective, fault-tolerant pipelines
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
