DP 203T00A ENU PowerPoint - 01


Get started with data engineering on Azure

© Copyright Microsoft Corporation. All rights reserved.


Learning Objectives
After completing this module, you will be able to:

1 Introduction to data engineering on Azure

2 Introduction to Azure Data Lake Storage Gen2

3 Introduction to Azure Synapse Analytics



Introduction to data engineering on Azure



What is data engineering?
Data engineers work with multiple types of data to perform a variety of data operations using a range of tools and scripting languages.

Types of data
• Structured
• Semi-structured
• Unstructured

Data operations
• Integration
• Transformation
• Consolidation

Languages
• SQL: SELECT…
• Python: df=spark.read(…)
• Others: R, Java, .NET, Scala
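The three types of data listed above can be illustrated with a small sketch. It uses only the Python standard library, and all sample values (orders, names, the note text) are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like a relational table).
structured_source = "order_id,customer,amount\n1,Ana,19.99\n2,Ben,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured_source)))

# Semi-structured: self-describing fields that can vary between records (JSON).
semi_structured_source = (
    '[{"id": 1, "tags": ["new"]}, {"id": 2, "email": "ben@example.com"}]'
)
documents = json.loads(semi_structured_source)

# Unstructured: no predefined schema; any structure must be inferred by the consumer.
unstructured_source = "Customer called to say the parcel arrived late."
word_count = len(unstructured_source.split())

# A simple transformation over the structured rows: total the amounts.
total_amount = round(sum(float(row["amount"]) for row in rows), 2)

# The field names differ per JSON record, which is what makes it semi-structured.
field_sets = [sorted(doc.keys()) for doc in documents]

print(total_amount, field_sets, word_count)
```

The structured rows can be aggregated directly because every row has an amount column. The JSON records have to be inspected one by one, and the free text carries no schema at all.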



Important data engineering concepts

Operational and analytical data
• Operational: Transactional data used by applications
• Analytical: Optimized for analysis and reporting

Streaming data
• Perpetual, real-time data feeds

Data pipeline
• Orchestrated activities to transfer and transform data
• Used to implement extract, transform, and load (ETL) or extract, load, and transform (ELT) operations

Data Lake
• Analytical data stored in files
• Distributed storage for massive scalability

Data Warehouse
• Analytical data stored in a relational database
• Typically modeled as a star schema to optimize summary analysis

Apache Spark
• Open-source engine for distributed data processing
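As a rough illustration of the ETL and star schema concepts above, the following sketch loads a few operational records into one dimension table and one fact table, then runs a summary query. It uses an in-memory SQLite database as a stand-in for a data warehouse; the table names, column names, and sample orders are all hypothetical.

```python
import sqlite3

# Extract: hypothetical operational records, as an application might store them.
operational_orders = [
    {"order_id": 1, "product": "Bike", "category": "Cycling", "quantity": 1, "unit_price": 500.0},
    {"order_id": 2, "product": "Helmet", "category": "Cycling", "quantity": 2, "unit_price": 40.0},
    {"order_id": 3, "product": "Tent", "category": "Camping", "quantity": 1, "unit_price": 120.0},
]


def run_etl(orders):
    """Load orders into a tiny star schema: one dimension table, one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name TEXT UNIQUE,
            category TEXT
        );
        CREATE TABLE fact_sales (
            order_id INTEGER PRIMARY KEY,
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            revenue REAL
        );
    """)
    for order in orders:
        # Transform: derive revenue and look up (or create) the dimension row.
        revenue = order["quantity"] * order["unit_price"]
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (name, category) VALUES (?, ?)",
            (order["product"], order["category"]),
        )
        product_key = conn.execute(
            "SELECT product_key FROM dim_product WHERE name = ?",
            (order["product"],),
        ).fetchone()[0]
        # Load: write the fact row that references the dimension.
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (order["order_id"], product_key, order["quantity"], revenue),
        )
    conn.commit()
    return conn


def revenue_by_category(conn):
    """Summary analysis: aggregate the fact table by a dimension attribute."""
    return conn.execute("""
        SELECT d.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_key = f.product_key
        GROUP BY d.category
        ORDER BY d.category
    """).fetchall()


warehouse = run_etl(operational_orders)
summary = revenue_by_category(warehouse)
print(summary)
```

In an ELT variant, the raw orders would be loaded first and the revenue calculation would run inside the analytical store afterwards.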



Data engineering in Azure

[Diagram] Stages: Operational data, Data ingestion/ETL, Analytical data storage and processing, Data modeling and visualization

Services shown:
• Azure Data Factory
• Azure Stream Analytics
• Azure Synapse Analytics
• Azure Data Lake Storage Gen2
• Azure Databricks
• Microsoft Power BI



Knowledge check
1 Data in a relational database table is…
☐ Structured
☐ Semi-structured
☐ Unstructured

2 In a data lake, data is stored in…
☐ Relational tables
☐ Files
☐ A single JSON document

3 Which of the following Azure services provides capabilities for running data pipelines AND managing analytical data in a data lake or relational data warehouse?
☐ Azure Stream Analytics
☐ Azure Synapse Analytics
☐ Azure Databricks



Introduction to Azure Data Lake Storage Gen2



Understand Azure Data Lake Storage Gen2
Distributed cloud storage for data lakes
• HDFS compatibility: common file system for Hadoop, Spark, and others
• Flexible security through folder and file level permissions
• Built on Azure Storage:
  – High performance and scalability
  – Data redundancy through built-in replication
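HDFS-compatible tools such as Spark typically address files in Azure Data Lake Storage Gen2 through an `abfss://` URI. The helper below is a minimal sketch of how such a URI is put together; the function name and the account, container, and path values are hypothetical.

```python
def build_abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ABFS (secure) URI of the form
    abfss://<container>@<account>.dfs.core.windows.net/<path>.
    """
    # Strip a leading slash so the path is not doubled after the host name.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical storage account "mydatalake" with a container named "files".
orders_uri = build_abfss_uri("mydatalake", "files", "/sales/2024/orders.csv")
print(orders_uri)
```

The container plays the role of the file system, and the path after the host name identifies a directory or file inside it.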
Azure Data Lake Storage Gen2 vs Azure Blob Storage
Enable the hierarchical namespace on an Azure Storage account to use Azure Data Lake Storage Gen2.

Azure Blob Storage
Azure Storage Account
  Blob Container
    blob1
    folder1/blob2
Blobs can be organized in virtual directories, but each path is considered a single blob in a flat namespace. Folder level operations are not supported.

Azure Data Lake Storage Gen2
Azure Storage Account
  Blob Container (Hierarchical Namespace)
    Directory
      File1
      File2
File system includes directories and files, and is compatible with large scale data analytics systems like Hadoop, Databricks, and Azure Synapse Analytics.
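The practical difference between the two namespaces can be sketched with a toy model. This is not the Azure SDK: plain Python dictionaries stand in for the two kinds of store, and the function names are hypothetical. In the flat model a folder is only a name prefix, so renaming it has to touch every blob under that prefix. In the hierarchical model the directory is a real entry and can be renamed in one step.

```python
def rename_folder_flat(blobs: dict, old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: a 'folder' is only a name prefix, so each blob is rewritten."""
    touched = 0
    # Collect matching keys first so the dict is not modified while iterating.
    for key in [k for k in blobs if k.startswith(old_prefix + "/")]:
        blobs[new_prefix + key[len(old_prefix):]] = blobs.pop(key)
        touched += 1
    return touched


def rename_folder_hierarchical(root: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: the directory is a real entry, so one entry changes."""
    root[new_name] = root.pop(old_name)
    return 1


# Flat store: keys are full blob paths.
flat_store = {"blob1": b"a", "folder1/blob2": b"b", "folder1/blob3": b"c"}
# Hierarchical store: directories are nested entries that contain files.
tree_store = {"File0": b"a", "Directory": {"File1": b"b", "File2": b"c"}}

flat_operations = rename_folder_flat(flat_store, "folder1", "archive")
tree_operations = rename_folder_hierarchical(tree_store, "Directory", "Archive")
print(flat_operations, tree_operations)
```

The flat rename cost grows with the number of blobs under the prefix, while the hierarchical rename stays a single operation regardless of how many files the directory holds.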
Knowledge check

1 Azure Data Lake Storage Gen2 stores data in…
☐ A document database hosted in Azure Cosmos DB
☐ An HDFS-compatible file system hosted in Azure Storage
☐ A relational data warehouse hosted in Azure Synapse Analytics

2 What option must you enable to use Azure Data Lake Storage Gen2?
☐ Global replication
☐ Data encryption
☐ Hierarchical namespace



Introduction to Azure Synapse Analytics



What is Azure Synapse Analytics?

Cloud platform for data analytics
• Large-scale data warehousing
• Advanced analytics
• Data exploration and discovery
• Real-time analytics
• Data integration
• Integrated analytics



Work with files in a data lake
• Connect to data lake storage using linked services
• Every Azure Synapse Analytics workspace has a default data lake



Ingest and transform data with pipelines
• Native pipeline functionality built on Azure Data Factory
• Orchestrate activities to ingest, transform, and load data
• Integrate with other data services
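The idea of orchestrating activities can be sketched in plain Python. This is not the Azure Synapse or Azure Data Factory API; it is a minimal model in which each hypothetical activity declares which activities must finish first, and a topological sort from the standard library decides the run order.

```python
from graphlib import TopologicalSorter


def ingest(state):
    # Pretend to copy raw values from a source system (sample data is made up).
    state["raw"] = ["10", "20", "oops", "30"]


def transform(state):
    # Keep only values that parse as whole numbers.
    state["clean"] = [int(value) for value in state["raw"] if value.isdigit()]


def load(state):
    # Pretend to write the result to an analytical store.
    state["loaded_total"] = sum(state["clean"])


# Each activity maps to the set of activities that must finish before it starts.
dependencies = {"ingest": set(), "transform": {"ingest"}, "load": {"transform"}}
activities = {"ingest": ingest, "transform": transform, "load": load}


def run_pipeline(dependencies, activities):
    """Run every activity once, in an order that respects the dependencies."""
    state, run_order = {}, []
    for name in TopologicalSorter(dependencies).static_order():
        activities[name](state)
        run_order.append(name)
    return state, run_order


final_state, run_order = run_pipeline(dependencies, activities)
print(run_order, final_state["loaded_total"])
```

A real pipeline service adds scheduling, retries, and monitoring on top of this ordering, but the core contract is the same: an activity runs only after the activities it depends on have completed.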



Query and manipulate data with SQL

SQL Server-based pools for scalable relational data processing:
• Built-in serverless SQL pool for data exploration and analysis of files in the data lake
• Custom dedicated SQL pools to host large-scale relational data warehouses
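A serverless SQL pool can query files in the data lake in place with the T-SQL OPENROWSET function. The sketch below only builds such a query as a string and does not connect to any workspace. The helper name, the storage account "mydatalake", the container "files", and the folder path are hypothetical.

```python
def serverless_exploration_query(file_url: str, top: int = 100) -> str:
    """Return T-SQL that reads Parquet files in place with OPENROWSET."""
    # Reject quotes so the URL cannot break out of the SQL string literal.
    if "'" in file_url:
        raise ValueError("file_url must not contain single quotes")
    return (
        f"SELECT TOP {int(top)} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{file_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS result_rows;"
    )


# Hypothetical location of Parquet files in the workspace's data lake.
query = serverless_exploration_query(
    "https://mydatalake.dfs.core.windows.net/files/sales/*.parquet", top=10
)
print(query)
```

The generated statement would be run as a SQL script against the built-in serverless pool. A dedicated SQL pool, by contrast, stores the data in relational tables that are loaded ahead of time.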



Process and analyze data with Apache Spark
Open-source Spark technology
• Highly scalable, distributed processing
• Common libraries and multiple programming languages

Integrated notebook experience
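The distributed processing model can be sketched without Spark itself. The example below is not the Spark API: it is a standard-library illustration in which the data is split into partitions, each partition is processed independently (threads stand in for cluster nodes), and the partial results are combined. All function names and sample lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, partition_count):
    """Deal the records out round-robin into partition_count lists."""
    return [records[i::partition_count] for i in range(partition_count)]


def count_words(partition):
    """Work done independently on one partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, partition_count=3):
    """Process the partitions in parallel, then merge the partial counts."""
    partitions = split_into_partitions(lines, partition_count)
    with ThreadPoolExecutor(max_workers=partition_count) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = ["spark processes data", "data lakes store data", "spark scales out"]
word_totals = distributed_word_count(lines)
print(word_totals.most_common(2))
```

Because each partition is processed without reference to the others, adding workers lets the same code handle more data, which is the property that makes engines like Spark scale out.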



Exercise: Explore Azure Synapse Analytics
Use the hosted lab environment provided, or view the lab instructions at the link below:
https://aka.ms/mslearn-explore-synapse



Knowledge check
1 Which feature of Azure Synapse Analytics enables you to transfer data from one store to another and apply transformations to the data at scheduled intervals?
☐ Serverless SQL pool
☐ Apache Spark pool
☐ Pipelines

2 You want to create a data warehouse in Azure Synapse Analytics in which the data is stored and queried in a relational data store. What kind of pool should you create?
☐ Serverless SQL pool
☐ Dedicated SQL pool
☐ Apache Spark pool

3 A data analyst wants to analyze data by using Python code combined with text descriptions of the insights gained from the analysis. What should they use to perform the analysis?
☐ A notebook connected to an Apache Spark pool
☐ A SQL script connected to a serverless SQL pool
☐ A KQL script connected to a Data Explorer pool



Further reading

Get started with data engineering on Azure
https://aka.ms/mslearn-data-engineer
