
Assignment (Azure Data Factory)

1) Provision one new Azure Data Factory under your resource group. Capture screenshots of all
the steps performed to complete this task.
Region: East US
Version: v2
ANS:
2) Provide access for the newly created ADF to Azure Key Vault. Add the ADF managed identity
as a Key Vault Administrator.
ANS:
3) Provide access for the newly created ADF to Azure Data Lake Storage Gen2. Add the ADF
managed identity as a Storage Blob Data Contributor.
ANS:

4) Create one new Linked Service for the Azure Key Vault created as part of the previous assignment.
ANS:
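For reference, a minimal Key Vault linked service definition of this kind is sketched below; the linked service name (ls_keyvault) and vault URL are placeholders for your own values. ADF reaches the vault with its managed identity, which is why the access granted in task #2 is a prerequisite.

{
    "name": "ls_keyvault",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "https://<your-keyvault-name>.vault.azure.net/"
        }
    }
}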
5) Create a Linked Service for the Azure SQL database that you created as part of previous
assignments.

Note: Use SQL authentication; the password must be retrieved from a Key Vault secret.
ANS:
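A sketch of the linked service JSON for this setup is given below, assuming the Key Vault linked service from task #4 is named ls_keyvault and the SQL password sits in a secret called sql-admin-password; the server, database, and user names are placeholders. The connection string itself carries no password; the password element pulls it from the Key Vault secret at run time.

{
    "name": "ls_azuresql",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<sql-user>;",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "ls_keyvault",
                    "type": "LinkedServiceReference"
                },
                "secretName": "sql-admin-password"
            }
        }
    }
}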
6) Create a Linked Service for the Azure Data Lake Storage Gen2 account (using account key
authentication) that you created as part of previous assignments.

ANS:
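A minimal sketch, with placeholder names for the linked service, storage account, and account key:

{
    "name": "ls_adls_accountkey",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account>.dfs.core.windows.net",
            "accountKey": {
                "type": "SecureString",
                "value": "<storage-account-key>"
            }
        }
    }
}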
7) Create a Linked Service for the Azure Data Lake Storage Gen2 account (using service principal
authentication) that you created as part of previous assignments.

Note: The service principal key must be retrieved from a Key Vault secret.

ANS:
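A sketch assuming the service principal's client secret is stored in a Key Vault secret named adls-spn-secret and the Key Vault linked service is ls_keyvault; the application (client) ID and tenant ID are placeholders.

{
    "name": "ls_adls_spn",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account>.dfs.core.windows.net",
            "servicePrincipalId": "<application-client-id>",
            "servicePrincipalKey": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "ls_keyvault",
                    "type": "LinkedServiceReference"
                },
                "secretName": "adls-spn-secret"
            },
            "tenant": "<tenant-id>"
        }
    }
}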
8) Create one new Dataset for a table that you have created in your Azure SQL DB. The schema
and table name values should be passed as parameters.
ANS:
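A sketch of a parameterized Azure SQL table dataset; the dataset and linked service names are placeholders, and the schema and table values are bound to dataset parameters via expressions.

{
    "name": "ds_azuresql_table",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ls_azuresql",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "SchemaName": { "type": "string" },
            "TableName": { "type": "string" }
        },
        "type": "AzureSqlTable",
        "typeProperties": {
            "schema": {
                "value": "@dataset().SchemaName",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().TableName",
                "type": "Expression"
            }
        }
    }
}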
9) Create one dataset for the ADLS Gen2 storage account with the DelimitedText file format.
ANS:

10) Edit the ADLS Gen2 (DelimitedText) dataset to make “File System”, “Directory” and “File
Name” dynamic. Values for all of these should be passed as parameters.
ANS:
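A sketch covering both #9 and #10: a DelimitedText dataset on the ADLS Gen2 linked service whose file system, directory, and file name are all bound to dataset parameters (all names here are placeholders).

{
    "name": "ds_adls_csv",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ls_adls_accountkey",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FileSystem": { "type": "string" },
            "Directory": { "type": "string" },
            "FileName": { "type": "string" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": { "value": "@dataset().FileSystem", "type": "Expression" },
                "folderPath": { "value": "@dataset().Directory", "type": "Expression" },
                "fileName": { "value": "@dataset().FileName", "type": "Expression" }
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}

The Parquet dataset in #11/#12 is parameterized the same way, with "type": "Parquet" and without the delimiter settings.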

11) Create one dataset for the ADLS Gen2 storage account with the Parquet file format.
ANS:
12) Edit the ADLS Gen2 (Parquet) dataset to make “File System”, “Directory” and “File Name”
dynamic. Values for all of these should be passed as parameters.
ANS:

13) Create a new ADF pipeline to copy data as per the details below:
a. Source -> Employee table in Azure SQL
b. Sink -> a csv file in the ADLS Gen2 bronze container.

ANS:
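A sketch of the pipeline JSON, assuming the parameterized datasets from #8 and #10 (ds_azuresql_table, ds_adls_csv) and placeholder folder and file names:

{
    "name": "pl_copy_employee_to_bronze",
    "properties": {
        "activities": [
            {
                "name": "CopyEmployeeToCsv",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "ds_azuresql_table",
                        "type": "DatasetReference",
                        "parameters": { "SchemaName": "dbo", "TableName": "Employee" }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "ds_adls_csv",
                        "type": "DatasetReference",
                        "parameters": {
                            "FileSystem": "bronze",
                            "Directory": "employee",
                            "FileName": "employee.csv"
                        }
                    }
                ],
                "typeProperties": {
                    "source": { "type": "AzureSqlSource" },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": { "type": "AzureBlobFSWriteSettings" },
                        "formatSettings": { "type": "DelimitedTextWriteSettings", "fileExtension": ".csv" }
                    }
                }
            }
        ]
    }
}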
14) Run the above pipeline in debug mode and make sure that it executes successfully.
Capture a screenshot of the successful pipeline run.

ANS:

15) Validate the data in the ADLS Gen2 bronze container using Azure Storage Explorer. Download
the csv file and verify that the data in this file is as expected. Capture a screenshot of the csv
file loaded in ADLS Gen2.
ANS:
16) Create a new ADF pipeline to copy multiple tables from Azure SQL to ADLS Gen2.
The list of tables to be copied should be passed as an input parameter.
Source -> tables in Azure SQL
Sink -> csv files in the ADLS Gen2 bronze container. The folder and file name should be the same
as the SQL table name.
ANS:
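A sketch of the approach: the table list is an array pipeline parameter, a ForEach iterates over it, and the inner Copy activity passes @item() into the parameterized datasets so that the folder and file name match the table name. Dataset and parameter names are the placeholders used in the earlier sketches.

{
    "name": "pl_copy_table_list",
    "properties": {
        "parameters": {
            "TableList": {
                "type": "array",
                "defaultValue": [ "Employee", "Department" ]
            }
        },
        "activities": [
            {
                "name": "ForEachTable",
                "type": "ForEach",
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.TableList",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "CopyOneTable",
                            "type": "Copy",
                            "inputs": [
                                {
                                    "referenceName": "ds_azuresql_table",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "SchemaName": "dbo",
                                        "TableName": "@item()"
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "ds_adls_csv",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "FileSystem": "bronze",
                                        "Directory": "@item()",
                                        "FileName": "@concat(item(), '.csv')"
                                    }
                                }
                            ],
                            "typeProperties": {
                                "source": { "type": "AzureSqlSource" },
                                "sink": { "type": "DelimitedTextSink" }
                            }
                        }
                    ]
                }
            }
        ]
    }
}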
17) Run the above pipeline in debug mode and make sure that it executes successfully.
Capture a screenshot of the successful pipeline run.
ANS:

18) Validate the data in the ADLS Gen2 bronze container using Azure Storage Explorer. Download
the csv files and verify that the data in all of them is as expected. Capture a screenshot of the
csv files loaded in ADLS Gen2.
ANS:
Dynamic Pipeline to Read Source and Sink Details from a Config File:

19) Create one new container named config in ADLS Gen2.


ANS:

20) Create one config (JSON) file (sample given below) containing all the source table and sink
details.

[
  {
    "TableName": "x",
    "ADLSFileSystem": "a",
    "ADLSDirectory": "b",
    "ADLSFileName": "c"
  }
]

21) Upload the above config file to the config container.


ANS:

22) Create one new dataset for ADLS Gen2 with the JSON file format, pointing to your config file.
ANS:

23) Create one pipeline to read the config file and copy all the tables from SQL to ADLS Gen2 as
per the details mentioned in the config file. (Hint: Use a Lookup activity followed by a ForEach.)
ANS:
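A sketch of the pattern, assuming the JSON dataset from #22 is named ds_config_json and the earlier parameterized datasets are reused: the Lookup reads the whole config array (firstRowOnly set to false) and the ForEach copies one table per entry.

{
    "name": "pl_copy_from_config",
    "properties": {
        "activities": [
            {
                "name": "LookupConfig",
                "type": "Lookup",
                "typeProperties": {
                    "source": { "type": "JsonSource" },
                    "dataset": {
                        "referenceName": "ds_config_json",
                        "type": "DatasetReference"
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "ForEachConfigEntry",
                "type": "ForEach",
                "dependsOn": [
                    { "activity": "LookupConfig", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('LookupConfig').output.value",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "CopyTableFromConfig",
                            "type": "Copy",
                            "inputs": [
                                {
                                    "referenceName": "ds_azuresql_table",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "SchemaName": "dbo",
                                        "TableName": "@item().TableName"
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "ds_adls_csv",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "FileSystem": "@item().ADLSFileSystem",
                                        "Directory": "@item().ADLSDirectory",
                                        "FileName": "@item().ADLSFileName"
                                    }
                                }
                            ],
                            "typeProperties": {
                                "source": { "type": "AzureSqlSource" },
                                "sink": { "type": "DelimitedTextSink" }
                            }
                        }
                    ]
                }
            }
        ]
    }
}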
24) Create a pipeline to get the last modified date for all files in the ADLS Gen2 bronze container.
ANS:
25) Create a pipeline to copy data from the ADLS Gen2 bronze container to the silver container
(only if a file's last modified date is today).
ANS:

26) Create a pipeline to add the list of all file names from an ADLS Gen2 folder into an array-type
variable. (Hint: Use the Append Variable activity along with Get Metadata.)
ANS:
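A sketch of the approach, assuming a hypothetical folder-level dataset ds_adls_bronze_folder that points at the bronze folder: Get Metadata returns childItems, a ForEach walks that list, and Append Variable adds each file name to an array variable.

{
    "name": "pl_list_file_names",
    "properties": {
        "variables": {
            "FileNames": { "type": "Array" }
        },
        "activities": [
            {
                "name": "GetFolderMetadata",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {
                        "referenceName": "ds_adls_bronze_folder",
                        "type": "DatasetReference"
                    },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "ForEachChildItem",
                "type": "ForEach",
                "dependsOn": [
                    { "activity": "GetFolderMetadata", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('GetFolderMetadata').output.childItems",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "AppendFileName",
                            "type": "AppendVariable",
                            "typeProperties": {
                                "variableName": "FileNames",
                                "value": "@item().name"
                            }
                        }
                    ]
                }
            }
        ]
    }
}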
27) Create one pipeline with a Wait activity. The wait time of this activity should be passed using a
pipeline parameter.
ANS:
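A sketch with a hypothetical pipeline name and parameter name; the Wait activity's waitTimeInSeconds is bound to the pipeline parameter through an expression.

{
    "name": "pl_wait_child",
    "properties": {
        "parameters": {
            "WaitSeconds": {
                "type": "int",
                "defaultValue": 30
            }
        },
        "activities": [
            {
                "name": "WaitForConfiguredTime",
                "type": "Wait",
                "typeProperties": {
                    "waitTimeInSeconds": {
                        "value": "@pipeline().parameters.WaitSeconds",
                        "type": "Expression"
                    }
                }
            }
        ]
    }
}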

28) Create one pipeline to call the pipeline created as part of the above task. The wait time
parameter for the child pipeline should be passed from the parent pipeline.
ANS:
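A sketch of the parent pipeline using an Execute Pipeline activity; it assumes the child pipeline from #27 is named pl_wait_child and forwards the wait time as a parameter.

{
    "name": "pl_wait_parent",
    "properties": {
        "activities": [
            {
                "name": "CallWaitChild",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "pl_wait_child",
                        "type": "PipelineReference"
                    },
                    "waitOnCompletion": true,
                    "parameters": {
                        "WaitSeconds": 60
                    }
                }
            }
        ]
    }
}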
29) Create a data flow to join the employee and department tables from the bronze container and
load the result into a csv file in the gold container in ADLS Gen2.
ANS:
30) Create a data flow to aggregate the salary of all employees by department and load the
aggregated data into a csv file in the gold container in ADLS Gen2.
ANS:

31) Create a data flow to split the employee data by department and load the split datasets into
the gold container in ADLS Gen2.
ANS:
32) Create a data flow to select only the name and salary columns from the employee data and
multiply the salary by 2. Load the transformed data into a csv file in the gold container in ADLS
Gen2.
ANS:
33) Create a new pipeline to call one of the data flows created as part of the above tasks. Run this
pipeline and make sure that it executes successfully.
ANS:
34) Schedule the pipeline created as part of step #13 to run every day at 1 AM IST.

ANS:
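A sketch of the schedule trigger JSON, assuming the pipeline from #13 is named pl_copy_employee_to_bronze and using an arbitrary start date:

{
    "name": "tr_daily_1am_ist",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T01:00:00",
                "timeZone": "India Standard Time",
                "schedule": {
                    "hours": [ 1 ],
                    "minutes": [ 0 ]
                }
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "pl_copy_employee_to_bronze",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}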
35) Schedule the pipeline created as part of step #13 to run every week on Mon, Wed and Fri at
9 AM IST.

ANS:
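The weekly variant differs only in the recurrence: the frequency becomes Week and the schedule lists the weekdays. A sketch with the same placeholder pipeline name:

{
    "name": "tr_weekly_mwf_9am_ist",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2024-01-01T09:00:00",
                "timeZone": "India Standard Time",
                "schedule": {
                    "weekDays": [ "Monday", "Wednesday", "Friday" ],
                    "hours": [ 9 ],
                    "minutes": [ 0 ]
                }
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "pl_copy_employee_to_bronze",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}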

36) Create one storage event-based trigger for the pipeline created as part of step #25.
ANS:
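A sketch of a storage event trigger, assuming the pipeline from #25 is named pl_copy_bronze_to_silver and using placeholder subscription, resource group, and storage account names; the trigger fires when a blob is created under the bronze container.

{
    "name": "tr_bronze_blob_created",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
            "events": [ "Microsoft.Storage.BlobCreated" ],
            "blobPathBeginsWith": "/bronze/blobs/",
            "ignoreEmptyBlobs": true
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "pl_copy_bronze_to_silver",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}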

37) Validate/test the storage event trigger by uploading a blob file into the ADLS Gen2 container.
Make sure that the trigger executes successfully.
ANS:
38) Which integration runtime should be used when reading data from an on-premises system?
ANS:
The Self-hosted Integration Runtime should be used when reading data from an on-premises system.

When reading data from an on-premises system in Azure Data Factory (ADF) or Synapse Analytics,
you should use the Self-hosted Integration Runtime.

The Self-hosted Integration Runtime (IR) acts as a bridge to connect on-premises data sources with
cloud-based Azure services. It can securely access data in your on-premises network and transfer it
to the cloud environment.

Steps to set up Self-hosted Integration Runtime:


1. Download and Install: Download the Self-hosted IR from the Azure portal and install it on a
machine in your on-premises environment.

2. Register the IR: Register the Self-hosted IR in the Azure portal.


3. Configure Access: Configure necessary network settings and credentials to allow the IR to access
the on-premises data sources.
4. Use in Pipelines: Use the Self-hosted IR in your Data Factory or Synapse pipelines to read and
write data from/to on-premises sources.

Benefits:
Security: Data transfer is secure and remains within your network boundaries until it reaches the
cloud.
Performance: Efficient data transfer with minimal latency.
Flexibility: Supports a wide range of on-premises data sources and custom connectors.

This setup is essential for scenarios where direct connectivity to on-premises systems is required for
data movement, ETL processes, and data integration tasks.

39) Create two Global parameters and use these parameters in one of your pipelines.
ANS:
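As an illustration, assume two global parameters named EnvName and BronzeContainer are defined under Manage > Global parameters. Inside a pipeline they are read with the @pipeline().globalParameters.<name> expression, for example in a Set Variable activity:

{
    "name": "SetBronzePath",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "BronzePath",
        "value": {
            "value": "@concat(pipeline().globalParameters.EnvName, '/', pipeline().globalParameters.BronzeContainer)",
            "type": "Expression"
        }
    }
}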
40) List the differences between global parameters and pipeline parameters.
ANS:

In Azure Data Factory (ADF), both global parameters and pipeline parameters are used to
store and manage values, but they serve different purposes and have different scopes:

Global Parameters:

- Defined at the factory level (globally)


- Shared across all pipelines in the factory
- Used to store constants and environment-specific values (secrets are better kept in Azure Key Vault)
- Can be used in any pipeline expression as @pipeline().globalParameters.<name>
- Limited to 100 global parameters per factory

Pipeline Parameters:

- Defined at the pipeline level


- Specific to each pipeline
- Used to pass input values to a pipeline during execution
- Can be used in pipeline definitions, activities, and data flows
- Not encrypted by default (but can be stored securely using Azure Key Vault)

Key differences:

- Scope: Global parameters are shared across all pipelines, while pipeline parameters are
specific to each pipeline.
- Purpose: Global parameters store factory-wide constants, while pipeline parameters pass
run-specific input values to a pipeline.
- Definition: Global parameters are set once at the factory level (typically per environment),
while pipeline parameters are supplied for each pipeline run.

41) List the differences between a pipeline parameter and a variable.


ANS:
Pipeline Parameters:

1. Defined at pipeline level


2. Passed as input when triggering the pipeline
3. Can be used in pipeline definitions, activities, and data flows
4. Not reusable across pipelines
5. Not stored in the pipeline, only passed as input
6. Can be used to pass external values, like file names or dates
7. Limited to 50 parameters per pipeline

Variables:

1. Defined inside a pipeline or data flow


2. Store values for reuse within the pipeline or data flow
3. Can be used in expressions, data flows, and activities
4. Reusable across activities and data flows within the pipeline
5. Stored in the pipeline or data flow definition
6. Can be used to store intermediate results or calculated values
7. No limit on the number of variables

Key differences:

- Purpose: Parameters pass external input, while variables store reusable values.
- Scope: Parameters are pipeline-specific, while variables are defined within a pipeline or
data flow.
- Reusability: Parameters are not reusable, while variables can be reused within the pipeline
or data flow.

When to use each:

- Use pipeline parameters for external input values that vary per pipeline execution.
- Use variables for intermediate results, calculated values, or reusable values within the
pipeline or data flow.

42) Capture all the steps/screenshots on how to monitor pipeline executions.
ANS:
Column name: Description
Pipeline Name: Name of the pipeline
Run Start: Start date and time for the pipeline run (MM/DD/YYYY, HH:MM:SS AM/PM)
Run End: End date and time for the pipeline run (MM/DD/YYYY, HH:MM:SS AM/PM)
Duration: Run duration (HH:MM:SS)
Triggered By: The name of the trigger that started the pipeline
Status: Failed, Succeeded, In Progress, Canceled, or Queued
Annotations: Filterable tags associated with a pipeline
Parameters: Parameters for the pipeline run (name/value pairs)
Error: If the pipeline failed, the run error
Run: Original, Rerun, or Rerun (Latest)
Run ID: ID of the pipeline run
Manually select the Refresh button to refresh the list of pipeline and activity runs. Autorefresh is
currently not supported.

To view the results of a debug run, select the Debug tab.

To get a detailed view of the individual activity runs of a specific pipeline run, click on the pipeline
name.
The list view shows activity runs that correspond to each pipeline run. Hover over the specific
activity run to get run-specific information such as the JSON input, JSON output, and detailed
activity-specific monitoring experiences.

If an activity failed, you can see the detailed error message by clicking on the icon in the error
column.
To rerun a pipeline that has previously run from the start, hover over the specific pipeline run and
select Rerun. If you select multiple pipeline runs, you can use the Rerun button to rerun them all.

You can see the resources consumed by a pipeline run by clicking the consumption icon next to the
run.
Alerts

43) Create one alert to send an email on successful pipeline execution.


ANS:
44) Create one alert to send an email notification on pipeline failures.
ANS:
45) Delete the Azure Data Factory resource once you are done with all questions of this
assignment.
ANS:
Tumbling Window Reference:

https://learn.microsoft.com/en-us/azure/data-factory/tumbling-window-trigger-dependency
