Azure Databricks
Chitra Singh
Lack of etiquette and manners is a huge turn off.
KnolX Etiquettes
• Punctuality
Join the session 5 minutes prior to the session start time. We start on
time and conclude on time!
• Feedback
Make sure to submit constructive feedback for all sessions, as it is very
helpful for the presenter.
• Silent Mode
Keep your mobile devices in silent mode, and feel free to step out of the
session if you need to attend an urgent call.
• Avoid Disturbance
Avoid unwanted chit-chat during the session.
1. What is Azure Databricks?
2. Why do we need Azure Databricks?
3. How does Azure Databricks work?
4. Various Databricks utilities.
5. Integrating Azure Databricks with Azure Blob Storage.
Azure Databricks (For Data Analytics).pptx
What is Databricks
Databricks was founded by the original creators of Apache Spark. It was developed as a web-based
platform for working with Apache Spark, and it provides automated cluster management and
IPython-style notebooks.
Azure Databricks is a data and AI cloud service jointly developed by Microsoft and
Databricks for data analytics, data science, data engineering, and machine learning.
Architecturally, Azure Databricks is a cloud service that lets you set up and use a cluster of
Azure instances with Apache Spark installed in a master-worker arrangement (similar
to a local Hadoop/Spark cluster).
[Diagram labels: Azure Cluster with Spark; Remote Access]
Databricks Notebooks
• Multi-Language
• Collaborative
• Ideal for Exploration
• Reproducible
• Get to Production Faster
• Enterprise Ready
• Adaptable
Because Azure Databricks is a cloud-based service, it has several advantages over traditional
Spark clusters. The benefits of using Azure Databricks include:
Optimized Spark Engine: Data processing with auto-scaling and Spark optimized for up to 50x
performance gains.
MLflow: Track and share experiments, reproduce runs, and manage models collaboratively from a
central repository.
Machine Learning: Pre-configured environments with frameworks such as PyTorch, TensorFlow, and
scikit-learn installed.
Choice of Language: Use your preferred language, including Python, Scala, R,
Spark SQL, and .NET, whether you use serverless or provisioned compute
resources.
Collaborative Notebooks: Quickly access and explore data, share new
insights, and build models collaboratively with the languages and tools of your
choice.
Delta Lake: Bring data reliability and scalability to your existing data lake with
an open-source transactional storage layer designed for the full data lifecycle.
Integration with Azure Services: Complete your end-to-end analytics and
machine learning solution with deep integration with Azure services such as
Azure Data Factory, Azure Data Lake Storage, Azure Machine Learning, and Power BI.
Interactive Workspace: Easy, seamless coordination among data analysts,
data scientists, data engineers, and business analysts to ensure smooth
collaboration.
Enterprise-Grade Security: The native security provided by Microsoft Azure
ensures protection of data within storage services and private workspaces.
Production Ready: Easily run, implement, and monitor your data-oriented jobs
and job-related statistics.
How Does Azure Databricks Work
Microsoft Azure provides a simple, easy-to-use interface for setting up Databricks.
Databricks Utilities
Databricks Utilities (dbutils) help us perform a variety of powerful tasks, including working
efficiently with object storage, chaining notebooks together, and working with secrets.
In Azure Databricks notebooks, dbutils provides utilities for interacting with the Databricks
environment, such as file system operations (dbutils.fs), notebook workflows (dbutils.notebook),
secrets (dbutils.secrets), and input widgets (dbutils.widgets).
dbutils is available in notebooks for the following languages:
• Python
• Scala
• R
Note: dbutils is not supported outside notebooks.
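As a minimal sketch of how these utilities are typically called from a Python notebook, the helper below lists the CSV files in a directory. It takes the notebook's `dbutils` handle as an argument, which is an assumption made here so the function can also be exercised with a stand-in object outside Databricks. The function name `list_csv_files` and the example path are hypothetical; `dbutils.fs.ls` is the Databricks file system utility, and each entry it returns exposes `path`, `name`, and `size`.

```python
def list_csv_files(dbutils, directory):
    """Return the paths of CSV files directly under `directory`.

    `dbutils` is the handle that Databricks injects into every notebook;
    it is passed in explicitly here (an assumption made for testability).
    """
    entries = dbutils.fs.ls(directory)  # list files and folders at the path
    return [entry.path for entry in entries if entry.name.lower().endswith(".csv")]


# In a Databricks notebook (hypothetical path):
# csv_paths = list_csv_files(dbutils, "dbfs:/tmp/example/")
```

Passing `dbutils` in, rather than relying on the notebook global, keeps the helper easy to try out with a fake object before running it on a cluster.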
Integrating Azure Databricks with Azure Blob Storage
Azure Databricks integrates with various Azure services:
• Azure Storage: Data storage and retrieval.
• Azure SQL Data Warehouse (now part of Azure Synapse Analytics): Data warehousing and analytics.
• Azure Cosmos DB: NoSQL database for scalable applications.
• Azure Data Lake Storage: Scalable data lake storage.
• Azure Active Directory (now Microsoft Entra ID): Identity and access management.
Microsoft Azure provides a multitude of services. It is often beneficial to combine multiple
services to address your use case.
[Diagram labels: User; Coding Notebooks; Azure Databricks; Azure Cluster with Spark]
Hands-on: Integrating Azure Databricks with Azure Blob Storage
Step 1: Set up Azure Databricks
• Log in to the Azure portal (https://portal.azure.com).
• Search for "Databricks" in the search bar.
• Create a new Azure Databricks workspace by providing necessary details like subscription,
resource group, workspace name, and pricing tier.
• Once the workspace is provisioned, navigate to it from the Azure portal.
Step 2: Create a Cluster
• Inside the Azure Databricks workspace, go to the Clusters tab.
• Click on "Create Cluster" and configure the cluster settings such as cluster mode, instance type,
and number of workers.
• Click "Create Cluster" to provision the cluster.
Step 3: Create a Notebook
• Go to the Notebooks tab in the workspace.
• Click on "Create" and choose the language you want to use (Python, Scala, SQL, or R).
• Name your notebook and click "Create."
Step 4: Connect to Azure Blob Storage
In your notebook, use the following code to configure Azure Blob Storage credentials:
# Define storage account credentials (placeholders; replace with your own values)
storage_account_name = "your_storage_account_name"
storage_account_access_key = "your_storage_account_access_key"
# Configure Spark to access Azure Blob Storage using the account key
spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net",
    storage_account_access_key,
)
Replace "your_storage_account_name" and "your_storage_account_access_key" with
your actual storage account name and access key. In shared notebooks, prefer reading the key
from a Databricks secret scope rather than pasting it in plain text.
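Pasting an access key into a notebook is convenient for a demo, but a common alternative is to read it from a Databricks secret scope. The sketch below assumes that a secret scope and key already exist; the names `configure_blob_access`, `blob_account_conf_key`, `demo-scope`, and `storage-key` are hypothetical. `spark` and `dbutils` are the objects Databricks provides in a notebook, and they are passed in explicitly here.

```python
def blob_account_conf_key(storage_account_name):
    """Build the Spark configuration key that holds a Blob Storage account key."""
    return f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net"


def configure_blob_access(spark, dbutils, storage_account_name, scope, key_name):
    """Read the access key from a secret scope and register it with Spark."""
    # dbutils.secrets.get returns the secret value stored under scope/key
    access_key = dbutils.secrets.get(scope=scope, key=key_name)
    spark.conf.set(blob_account_conf_key(storage_account_name), access_key)


# In a Databricks notebook (hypothetical scope and key names):
# configure_blob_access(spark, dbutils, "your_storage_account_name",
#                       scope="demo-scope", key_name="storage-key")
```

With this approach the key never appears in the notebook source, so sharing or exporting the notebook does not leak it.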
Step 5: Access Data in Azure Blob Storage
Once connected, you can access data stored in Azure Blob Storage using Spark APIs. For
example:
# Load data from Azure Blob Storage (uses storage_account_name from Step 4)
df = spark.read.csv(
    f"wasbs://container@{storage_account_name}.blob.core.windows.net/path/to/file.csv"
)
# Display the data
display(df)
Replace "container" and "path/to/file.csv" with your container name and file path.
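The `wasbs://` path follows the pattern `wasbs://<container>@<account>.blob.core.windows.net/<path>`. A small helper, sketched here with the hypothetical names `wasbs_url` and `read_csv_from_blob`, keeps that pattern in one place. The `header` and `inferSchema` settings are standard Spark CSV reader options; that the file has a header row is an assumption.

```python
def wasbs_url(container, storage_account_name, blob_path):
    """Build a wasbs:// URL for a blob in the given container."""
    return (
        f"wasbs://{container}@{storage_account_name}"
        f".blob.core.windows.net/{blob_path.lstrip('/')}"
    )


def read_csv_from_blob(spark, container, storage_account_name, blob_path):
    """Read a CSV blob into a DataFrame, assuming the first row is a header."""
    url = wasbs_url(container, storage_account_name, blob_path)
    return (
        spark.read
        .option("header", "true")       # treat the first row as column names
        .option("inferSchema", "true")  # let Spark guess column types
        .csv(url)
    )


# In a Databricks notebook:
# df = read_csv_from_blob(spark, "container", storage_account_name, "path/to/file.csv")
```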
Step 6: Perform Data Operations
• You can now perform various data operations on the data loaded from Azure Blob Storage using
Spark DataFrame APIs.
• Analyze, transform, visualize, or model the data as needed within your notebook.
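As one example of such an operation, the sketch below counts the most frequent values in a column using the DataFrame API. The function name `top_values` and the column name in the usage comment are hypothetical; `groupBy`, `count`, `orderBy`, and `limit` are standard PySpark DataFrame methods.

```python
def top_values(df, column, n=10):
    """Return the `n` most frequent values of `column` with their counts."""
    return (
        df.groupBy(column)                     # one group per distinct value
          .count()                             # adds a "count" column
          .orderBy("count", ascending=False)   # most frequent first
          .limit(n)
    )


# In the notebook, with `df` loaded in Step 5 (hypothetical column name):
# display(top_values(df, "country", n=5))
```

Because Spark evaluates lazily, nothing is computed until an action such as `display` is called on the result.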
Step 7: Cleanup (Optional)
• Once you're done with your analysis, you can terminate the cluster to avoid incurring
unnecessary costs.
• Go to the Clusters tab, select your cluster, and click "Terminate."
That's it! You've successfully integrated Azure Databricks with Azure Blob Storage and performed
data operations within a notebook.
Conclusion
In this session we covered:
• What Azure Databricks is
• The features of Azure Databricks
• How to integrate Azure Databricks with Azure Blob Storage
As a next step, we can explore further and leverage Azure Databricks and Azure Blob Storage
for data analytics needs.