MongoDB and Azure Databricks
Apache Spark
  • Core APIs: RDDs, DataFrames, Datasets
  • Spark SQL
  • GraphX / GraphFrames (graph)
  • Structured Streaming
  • MLlib (machine learning)
Source: Spark: The Definitive Guide
Managed Apache Spark platform optimized for Azure
Microsoft Azure
[Diagram: Azure Databricks]
  • Optimized Databricks Runtime Engine: Apache Spark, Databricks I/O, Serverless
  • Collaborative Workspace: data engineer, data scientist, business analyst
  • Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notification & logs
  • Data sources: cloud storage, data warehouses, Hadoop storage, IoT / streaming data
  • Outputs: REST APIs, machine learning models, BI tools, data exports, data warehouses
  • Benefits: enhance productivity, build on secure & trusted cloud, scale without limits
DBFS
Storage blob
CLI
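The DBFS slide can be made concrete with a small path helper. This is a minimal sketch, assuming the common convention that Spark APIs and the Databricks CLI address DBFS as `dbfs:/...` while code running on a cluster node can reach the same files through a local mount under `/dbfs/...`. The function names and the example path are hypothetical.

```python
# Sketch: convert between the two common ways of addressing a DBFS file.
# Assumption: "dbfs:/" is the URI form and "/dbfs/" is the local mount
# visible on cluster nodes. The helper names are illustrative only.

DBFS_SCHEME = "dbfs:/"
FUSE_MOUNT = "/dbfs/"


def to_fuse_path(dbfs_uri: str) -> str:
    """Turn 'dbfs:/mnt/x' into the local-mount form '/dbfs/mnt/x'."""
    if not dbfs_uri.startswith(DBFS_SCHEME):
        raise ValueError(f"not a DBFS URI: {dbfs_uri!r}")
    # Drop the scheme and any extra leading slashes (e.g. 'dbfs:///mnt').
    relative = dbfs_uri[len(DBFS_SCHEME):].lstrip("/")
    return FUSE_MOUNT + relative


def to_dbfs_uri(fuse_path: str) -> str:
    """Turn '/dbfs/mnt/x' back into the URI form 'dbfs:/mnt/x'."""
    if not fuse_path.startswith(FUSE_MOUNT):
        raise ValueError(f"not under the DBFS mount: {fuse_path!r}")
    return DBFS_SCHEME + fuse_path[len(FUSE_MOUNT):]


if __name__ == "__main__":
    # Hypothetical file that lives in mounted blob storage.
    print(to_fuse_path("dbfs:/mnt/raw/events.json"))
    print(to_dbfs_uri("/dbfs/mnt/raw/events.json"))
```

The URI form is what you would typically pass to Spark readers or to the CLI, while the mount form suits libraries that only understand local file paths.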
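[Diagram: Spark cluster connected to a MongoDB replica set]
  • Spark: Master, Executor0 … Executor7, each executor running tasks
  • Spark Connector (SparkConn) instances link the executors to MongoDB
  • MongoDB replica set: Primary, Secondary, Secondary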
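The diagram shows Spark executors talking to a replica set with one primary and two secondaries. A minimal sketch of how the connection settings for such a setup might be assembled is given here. It assumes the pre-10.x MongoDB Spark Connector, which reads `spark.mongodb.input.uri` and `spark.mongodb.output.uri` (connector 10.x and later use different keys). The host names, database, collection, and replica set name are hypothetical, and the helper functions are illustrative rather than part of any library.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(hosts, database, collection, user=None, password=None,
                    replica_set=None, read_preference="secondaryPreferred"):
    """Assemble a mongodb:// connection string of the form
    mongodb://[user:pass@]host1,host2/database.collection?options
    """
    credentials = ""
    if user is not None:
        # Credentials must be percent-encoded inside a connection string.
        credentials = quote_plus(user)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"

    options = {}
    if replica_set:
        options["replicaSet"] = replica_set
    if read_preference:
        # Reading from secondaries keeps analytical load off the primary.
        options["readPreference"] = read_preference
    query = "?" + urlencode(options) if options else ""

    return f"mongodb://{credentials}{','.join(hosts)}/{database}.{collection}{query}"


def spark_mongo_conf(uri):
    """Spark configuration entries for the pre-10.x connector (assumption)."""
    return {
        "spark.mongodb.input.uri": uri,
        "spark.mongodb.output.uri": uri,
    }


if __name__ == "__main__":
    uri = build_mongo_uri(
        ["mongo0.example.net:27017", "mongo1.example.net:27017"],
        database="shop",
        collection="orders",
        replica_set="rs0",
    )
    for key, value in spark_mongo_conf(uri).items():
        print(key, "=", value)
```

On a Databricks cluster these key/value pairs would typically go into the cluster's Spark configuration, so every notebook attached to that cluster picks them up.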
Official Apache Spark website
Azure Databricks Documentation
MongoDB Connector for Apache Spark
Editor's Notes

  • #9: Objective: Show the heterogeneous set of tools in the big data world. Slice of the big data ecosystem. For
  • #10: Talking points: Unified. Computing engine. Not a storage solution (interfaces with existing storage). Libraries (MLlib, GraphX, Spark SQL, Structured Streaming, open source packages).
  • #12: Developers can also choose to cache data. This helps jobs that reuse a particular Dataset over and over.
  • #14: Fun fact: employees of Databricks have written over 75% of the code in Apache Spark. Why it's important: scalable distributed computing environment, pay-as-you-go (PAYG). https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks
  • #16: Databricks terminology

    Workspaces: Workspaces allow you to organize all the work that you are doing on Databricks. Like a folder structure on your computer, a workspace lets you save notebooks and libraries and share them with other users. Workspaces are not connected to data and should not be used to store data. They are simply a place to store the notebooks and libraries that you use to operate on and manipulate your data.

    Notebooks: Notebooks are a set of any number of cells that allow you to execute commands. Cells hold code in any of the following languages: Scala, Python, R, SQL, or Markdown. Notebooks have a default language, but each cell can override it with another language. This is done by including %[language name] at the top of the cell, for instance %python. We'll see this feature shortly. Notebooks need to be connected to a cluster in order to execute commands; however, they are not permanently tied to a cluster. This allows notebooks to be shared via the web or downloaded onto your local machine.

    Dashboards: Dashboards can be created from notebooks as a way of displaying the output of cells without the code that generates them. Notebooks can also be scheduled as jobs in one click, either to run a data pipeline, update a machine learning model, or update a dashboard.

    Libraries: Libraries are packages or modules that provide additional functionality that you need to solve your business problems. These may be custom-written Scala or Java JARs, Python eggs, or custom-written packages. You can write and upload these manually, or you can install them directly via package management utilities like PyPI or Maven.

    Tables: Tables are structured data that you and your team will use for analysis. Tables can exist in several places: they can be stored in cloud storage, they can be stored on the cluster that you are currently using, or they can be cached in memory. For more about tables, see the documentation.

    Clusters: Clusters are groups of computers that you treat as a single computer. In Databricks, this means that you can effectively treat 20 computers as you might treat one computer. Clusters allow you to execute code from notebooks or libraries on a set of data. That data may be raw data located in cloud storage or structured data that you uploaded as a table to the cluster you are working on. Note that clusters have access controls that determine who has access to each cluster.

    Jobs: Jobs are the tool by which you can schedule execution to occur either on an existing cluster or on a cluster of its own. Jobs can run notebooks as well as JARs or Python scripts. They can be created either manually or via the REST API.

    Apps: Apps are third-party integrations with the Databricks platform. These include applications like Tableau.
  • #17: If Spark is a computing engine, where does Databricks store the data?
  • #18: Objective: Show how easy it is to get started. Steps: create a Databricks workspace; create a Spark cluster; create a notebook; import a notebook from https://databricks.com/resources/type/example-notebooks (https://cdn2.hubspot.net/hubfs/438089/notebooks/Quick_Start/Quick_Start_Using_Python.html)
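The notes above mention that Jobs can be created via the REST API. As a rough illustration, the sketch below builds the JSON body for such a request without sending it. It assumes the Databricks Jobs API 2.1 `jobs/create` endpoint and its `tasks`, `notebook_task`, `existing_cluster_id`, and `schedule` fields. The job name, notebook path, and cluster ID are hypothetical placeholders, and `build_job_payload` is an illustrative helper, not part of any SDK.

```python
import json


def build_job_payload(name, notebook_path, cluster_id, cron=None, timezone="UTC"):
    """Build a request body for creating a single-notebook job.

    Assumption: field names follow the Jobs API 2.1 'jobs/create' schema.
    """
    payload = {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                # Run an existing workspace notebook ...
                "notebook_task": {"notebook_path": notebook_path},
                # ... on a cluster that already exists.
                "existing_cluster_id": cluster_id,
            }
        ],
    }
    if cron is not None:
        # Quartz cron syntax: seconds minutes hours day-of-month month day-of-week.
        payload["schedule"] = {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
        }
    return payload


if __name__ == "__main__":
    body = build_job_payload(
        name="nightly-mongo-export",                    # hypothetical job name
        notebook_path="/Users/someone@example.com/export",  # hypothetical path
        cluster_id="0000-000000-example",               # hypothetical cluster ID
        cron="0 0 2 * * ?",                             # every day at 02:00
    )
    print(json.dumps(body, indent=2))
```

The resulting body would be sent as an authenticated POST to the workspace's `/api/2.1/jobs/create` endpoint. Keeping payload construction separate from the HTTP call makes it easy to review or version the job definition before anything runs.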