
Assignment (Azure Data Factory)

1) Provision one new Azure Data Factory under your resource group. Capture screenshots of all
the steps performed to complete this task.
Region: East US
Version: v2
ANS:
2) Provide access for the newly created ADF to Azure Key Vault. Add the ADF managed identity
as a Key Vault Administrator.
ANS:
3) Provide access for the newly created ADF to Azure Data Lake Storage Gen2. Add the ADF
managed identity as a Storage Blob Data Contributor.
ANS:

4) Create one new Linked Service for the Azure Key Vault created as part of the previous assignment.
ANS:
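For reference, a minimal Key Vault linked service definition of this kind is sketched below; the linked service name (ls_keyvault) and vault URL are placeholders for your own values. ADF reaches the vault with its managed identity, which is why the access granted in task #2 is a prerequisite.

{
    "name": "ls_keyvault",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "https://<your-keyvault-name>.vault.azure.net/"
        }
    }
}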
5) Create a Linked Service for the Azure SQL database that you created as part of previous
assignments.

Note: Use SQL authentication; the password must be retrieved from a Key Vault secret.
ANS:
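A sketch of the linked service JSON for this setup is given below, assuming the Key Vault linked service from task #4 is named ls_keyvault and the SQL password sits in a secret called sql-admin-password; the server, database, and user names are placeholders. The connection string itself carries no password; the password element pulls it from the Key Vault secret at run time.

{
    "name": "ls_azuresql",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<sql-user>;",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "ls_keyvault",
                    "type": "LinkedServiceReference"
                },
                "secretName": "sql-admin-password"
            }
        }
    }
}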
6) Create a Linked Service for the Azure Data Lake Storage Gen2 account (using account key
authentication) that you created as part of previous assignments.

ANS:
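A minimal sketch, with placeholder names for the linked service, storage account, and account key:

{
    "name": "ls_adls_accountkey",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account>.dfs.core.windows.net",
            "accountKey": {
                "type": "SecureString",
                "value": "<storage-account-key>"
            }
        }
    }
}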
7) Create a Linked Service for the Azure Data Lake Storage Gen2 account (using service principal
authentication) that you created as part of previous assignments.

Note: The service principal key must be retrieved from a Key Vault secret.

ANS:
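A sketch assuming the service principal's client secret is stored in a Key Vault secret named adls-spn-secret and the Key Vault linked service is ls_keyvault; the application (client) ID and tenant ID are placeholders.

{
    "name": "ls_adls_spn",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account>.dfs.core.windows.net",
            "servicePrincipalId": "<application-client-id>",
            "servicePrincipalKey": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "ls_keyvault",
                    "type": "LinkedServiceReference"
                },
                "secretName": "adls-spn-secret"
            },
            "tenant": "<tenant-id>"
        }
    }
}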
8) Create one new Dataset for a table that you have created in your Azure SQL DB. The schema
and table name values should be passed as parameters.
ANS:
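A sketch of a parameterized Azure SQL table dataset; the dataset and linked service names are placeholders, and the schema and table values are bound to dataset parameters via expressions.

{
    "name": "ds_azuresql_table",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ls_azuresql",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "SchemaName": { "type": "string" },
            "TableName": { "type": "string" }
        },
        "type": "AzureSqlTable",
        "typeProperties": {
            "schema": {
                "value": "@dataset().SchemaName",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().TableName",
                "type": "Expression"
            }
        }
    }
}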
9) Create one dataset for the ADLS Gen2 storage account with the DelimitedText file format.
ANS:

10) Edit the ADLS Gen2 (DelimitedText) dataset to make “File System”, “Directory” and “File
Name” dynamic. Values for all of these should be passed as parameters.
ANS:
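A sketch covering both #9 and #10: a DelimitedText dataset on the ADLS Gen2 linked service whose file system, directory, and file name are all bound to dataset parameters (all names here are placeholders).

{
    "name": "ds_adls_csv",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ls_adls_accountkey",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FileSystem": { "type": "string" },
            "Directory": { "type": "string" },
            "FileName": { "type": "string" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": { "value": "@dataset().FileSystem", "type": "Expression" },
                "folderPath": { "value": "@dataset().Directory", "type": "Expression" },
                "fileName": { "value": "@dataset().FileName", "type": "Expression" }
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}

The Parquet dataset in #11/#12 is parameterized the same way, with "type": "Parquet" and without the delimiter settings.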

11) Create one dataset for the ADLS Gen2 storage account with the Parquet file format.
ANS:
12) Edit the ADLS Gen2 (Parquet) dataset to make “File System”, “Directory” and “File Name”
dynamic. Values for all of these should be passed as parameters.
ANS:

13) Create a new ADF pipeline to copy data as per the details below:
a. Source -> Employee table in Azure SQL
b. Sink -> a csv file in the ADLS Gen2 bronze container.

ANS:
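A sketch of the pipeline JSON, assuming the parameterized datasets from #8 and #10 (ds_azuresql_table, ds_adls_csv) and placeholder folder and file names:

{
    "name": "pl_copy_employee_to_bronze",
    "properties": {
        "activities": [
            {
                "name": "CopyEmployeeToCsv",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "ds_azuresql_table",
                        "type": "DatasetReference",
                        "parameters": { "SchemaName": "dbo", "TableName": "Employee" }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "ds_adls_csv",
                        "type": "DatasetReference",
                        "parameters": {
                            "FileSystem": "bronze",
                            "Directory": "employee",
                            "FileName": "employee.csv"
                        }
                    }
                ],
                "typeProperties": {
                    "source": { "type": "AzureSqlSource" },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": { "type": "AzureBlobFSWriteSettings" },
                        "formatSettings": { "type": "DelimitedTextWriteSettings", "fileExtension": ".csv" }
                    }
                }
            }
        ]
    }
}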
14) Run the above pipeline in debug mode and make sure that it executes successfully.
Capture a screenshot of the successful pipeline run.

ANS:

15) Validate the data in the ADLS Gen2 bronze container using Azure Storage Explorer. Download
the csv file and verify that the data in this file is as expected. Capture a screenshot of the csv
file loaded in ADLS Gen2.
ANS:
16) Create a new ADF pipeline to copy multiple tables from Azure SQL to ADLS Gen2.
The list of tables to be copied should be passed as an input parameter.
Source -> tables in Azure SQL
Sink -> csv files in the ADLS Gen2 bronze container. The folder and file name should be the same
as the SQL table name.
ANS:
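A sketch of the approach: the table list is an array pipeline parameter, a ForEach iterates over it, and the inner Copy activity passes @item() into the parameterized datasets so that the folder and file name match the table name. Dataset and parameter names are the placeholders used in the earlier sketches.

{
    "name": "pl_copy_table_list",
    "properties": {
        "parameters": {
            "TableList": {
                "type": "array",
                "defaultValue": [ "Employee", "Department" ]
            }
        },
        "activities": [
            {
                "name": "ForEachTable",
                "type": "ForEach",
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.TableList",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "CopyOneTable",
                            "type": "Copy",
                            "inputs": [
                                {
                                    "referenceName": "ds_azuresql_table",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "SchemaName": "dbo",
                                        "TableName": "@item()"
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "ds_adls_csv",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "FileSystem": "bronze",
                                        "Directory": "@item()",
                                        "FileName": "@concat(item(), '.csv')"
                                    }
                                }
                            ],
                            "typeProperties": {
                                "source": { "type": "AzureSqlSource" },
                                "sink": { "type": "DelimitedTextSink" }
                            }
                        }
                    ]
                }
            }
        ]
    }
}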
17) Run the above pipeline in debug mode and make sure that it executes successfully.
Capture a screenshot of the successful pipeline run.
ANS:

18) Validate the data in the ADLS Gen2 bronze container using Azure Storage Explorer. Download
the csv files and verify that the data in all of them is as expected. Capture a screenshot of the
csv files loaded in ADLS Gen2.
ANS:
Dynamic Pipeline to Read Source and Sink Details from a Config File:

19) Create one new container named config in ADLS Gen2.


ANS:

20) Create one config (JSON) file (sample given below) containing all the source table and sink
details.

[
  {
    "TableName": "x",
    "ADLSFileSystem": "a",
    "ADLSDirectory": "b",
    "ADLSFileName": "c"
  }
]

21) Upload the above config file to the config container.


ANS:

22) Create one new dataset for ADLS Gen2 with the JSON file format, pointing to your config file.
ANS:

23) Create one pipeline to read the config file and copy all the tables from SQL to ADLS Gen2 as
per the details mentioned in the config file. (Hint: Use a Lookup activity followed by a ForEach.)
ANS:
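A sketch of the pattern, assuming the JSON dataset from #22 is named ds_config_json and the earlier parameterized datasets are reused: the Lookup reads the whole config array (firstRowOnly set to false) and the ForEach copies one table per entry.

{
    "name": "pl_copy_from_config",
    "properties": {
        "activities": [
            {
                "name": "LookupConfig",
                "type": "Lookup",
                "typeProperties": {
                    "source": { "type": "JsonSource" },
                    "dataset": {
                        "referenceName": "ds_config_json",
                        "type": "DatasetReference"
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "ForEachConfigEntry",
                "type": "ForEach",
                "dependsOn": [
                    { "activity": "LookupConfig", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('LookupConfig').output.value",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "CopyTableFromConfig",
                            "type": "Copy",
                            "inputs": [
                                {
                                    "referenceName": "ds_azuresql_table",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "SchemaName": "dbo",
                                        "TableName": "@item().TableName"
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "ds_adls_csv",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "FileSystem": "@item().ADLSFileSystem",
                                        "Directory": "@item().ADLSDirectory",
                                        "FileName": "@item().ADLSFileName"
                                    }
                                }
                            ],
                            "typeProperties": {
                                "source": { "type": "AzureSqlSource" },
                                "sink": { "type": "DelimitedTextSink" }
                            }
                        }
                    ]
                }
            }
        ]
    }
}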
24) Create a pipeline to get the last modified date for all files in the ADLS Gen2 bronze container.
ANS:
25) Create a pipeline to copy data from the ADLS Gen2 bronze container to the silver container
(only if a file's last modified date is today).
ANS:

26) Create a pipeline to add the list of all file names from an ADLS Gen2 folder into an array-type
variable. (Hint: Use the Append Variable activity along with Get Metadata.)
ANS:
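A sketch of the approach, assuming a hypothetical folder-level dataset ds_adls_bronze_folder that points at the bronze folder: Get Metadata returns childItems, a ForEach walks that list, and Append Variable adds each file name to an array variable.

{
    "name": "pl_list_file_names",
    "properties": {
        "variables": {
            "FileNames": { "type": "Array" }
        },
        "activities": [
            {
                "name": "GetFolderMetadata",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {
                        "referenceName": "ds_adls_bronze_folder",
                        "type": "DatasetReference"
                    },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "ForEachChildItem",
                "type": "ForEach",
                "dependsOn": [
                    { "activity": "GetFolderMetadata", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('GetFolderMetadata').output.childItems",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "AppendFileName",
                            "type": "AppendVariable",
                            "typeProperties": {
                                "variableName": "FileNames",
                                "value": "@item().name"
                            }
                        }
                    ]
                }
            }
        ]
    }
}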
27) Create one pipeline with a Wait activity. The wait time of this activity should be passed using a
pipeline parameter.
ANS:
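A sketch with a hypothetical pipeline name and parameter name; the Wait activity's waitTimeInSeconds is bound to the pipeline parameter through an expression.

{
    "name": "pl_wait_child",
    "properties": {
        "parameters": {
            "WaitSeconds": {
                "type": "int",
                "defaultValue": 30
            }
        },
        "activities": [
            {
                "name": "WaitForConfiguredTime",
                "type": "Wait",
                "typeProperties": {
                    "waitTimeInSeconds": {
                        "value": "@pipeline().parameters.WaitSeconds",
                        "type": "Expression"
                    }
                }
            }
        ]
    }
}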

28) Create one pipeline to call the pipeline created as part of the above task. The wait time
parameter for the child pipeline should be passed from the parent pipeline.
ANS:
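A sketch of the parent pipeline using an Execute Pipeline activity; it assumes the child pipeline from #27 is named pl_wait_child and forwards the wait time as a parameter.

{
    "name": "pl_wait_parent",
    "properties": {
        "activities": [
            {
                "name": "CallWaitChild",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "pl_wait_child",
                        "type": "PipelineReference"
                    },
                    "waitOnCompletion": true,
                    "parameters": {
                        "WaitSeconds": 60
                    }
                }
            }
        ]
    }
}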
29) Create a data flow to join the employee and department tables from the bronze container and
load the result into a csv file in the gold container in ADLS Gen2.
ANS:
30) Create a data flow to aggregate the salary of all employees by department and load the
aggregated data into a csv file in the gold container in ADLS Gen2.
ANS:

31) Create a data flow to split the employee data by department and load the split datasets into
the gold container in ADLS Gen2.
ANS:
32) Create a data flow to select only the name and salary columns from the employee data and
multiply the salary by 2. Load the transformed data into a csv file in the gold container in ADLS
Gen2.
ANS:
33) Create a new pipeline to call one of the data flows created as part of the above tasks. Run this
pipeline and make sure that it executes successfully.
ANS:
34) Schedule the pipeline created as part of step #13 to run every day at 1 AM IST.

ANS:
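A sketch of the schedule trigger JSON, assuming the pipeline from #13 is named pl_copy_employee_to_bronze and using an arbitrary start date:

{
    "name": "tr_daily_1am_ist",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T01:00:00",
                "timeZone": "India Standard Time",
                "schedule": {
                    "hours": [ 1 ],
                    "minutes": [ 0 ]
                }
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "pl_copy_employee_to_bronze",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}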
35) Schedule the pipeline created as part of step #13 to run every week on Mon, Wed and Fri at
9 AM IST.

ANS:
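The weekly variant differs only in the recurrence: the frequency becomes Week and the schedule lists the weekdays. A sketch with the same placeholder pipeline name:

{
    "name": "tr_weekly_mwf_9am_ist",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2024-01-01T09:00:00",
                "timeZone": "India Standard Time",
                "schedule": {
                    "weekDays": [ "Monday", "Wednesday", "Friday" ],
                    "hours": [ 9 ],
                    "minutes": [ 0 ]
                }
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "pl_copy_employee_to_bronze",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}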

36) Create one storage event-based trigger for the pipeline created as part of step #25.
ANS:
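A sketch of a storage event trigger, assuming the pipeline from #25 is named pl_copy_bronze_to_silver and using placeholder subscription, resource group, and storage account names; the trigger fires when a blob is created under the bronze container.

{
    "name": "tr_bronze_blob_created",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
            "events": [ "Microsoft.Storage.BlobCreated" ],
            "blobPathBeginsWith": "/bronze/blobs/",
            "ignoreEmptyBlobs": true
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "pl_copy_bronze_to_silver",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}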

37) Validate/test the storage event trigger by uploading a blob file into the ADLS Gen2 container.
Make sure that the trigger executes successfully.
ANS:
38) Which integration runtime should be used when reading data from an on-premises system?
ANS:
The Self-hosted Integration Runtime should be used when reading data from an on-premises system.

When reading data from an on-premises system in Azure Data Factory (ADF) or Synapse Analytics,
you should use the Self-hosted Integration Runtime.

The Self-hosted Integration Runtime (IR) acts as a bridge to connect on-premises data sources with
cloud-based Azure services. It can securely access data in your on-premises network and transfer it
to the cloud environment.

Steps to set up Self-hosted Integration Runtime:


1. Download and Install: Download the Self-hosted IR from the Azure portal and install it on a
machine in your on-premises environment.

2. Register the IR: Register the Self-hosted IR in the Azure portal.


3. Configure Access: Configure necessary network settings and credentials to allow the IR to access
the on-premises data sources.
4. Use in Pipelines: Use the Self-hosted IR in your Data Factory or Synapse pipelines to read and
write data from/to on-premises sources.

Benefits:
Security: Data transfer is secure and remains within your network boundaries until it reaches the
cloud.
Performance: Efficient data transfer with minimal latency.
Flexibility: Supports a wide range of on-premises data sources and custom connectors.

This setup is essential for scenarios where direct connectivity to on-premises systems is required for
data movement, ETL processes, and data integration tasks.

39) Create two Global parameters and use these parameters in one of your pipelines.
ANS:
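As an illustration, assume two global parameters named EnvName and BronzeContainer are defined under Manage > Global parameters. Inside a pipeline they are read with the @pipeline().globalParameters.<name> expression, for example in a Set Variable activity:

{
    "name": "SetBronzePath",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "BronzePath",
        "value": {
            "value": "@concat(pipeline().globalParameters.EnvName, '/', pipeline().globalParameters.BronzeContainer)",
            "type": "Expression"
        }
    }
}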
40) List the differences between global parameters and pipeline parameters.
ANS:

In Azure Data Factory (ADF), both global parameters and pipeline parameters are used to
store and manage values, but they serve different purposes and have different scopes:

Global Parameters:

- Defined at the factory level (globally)


- Shared across all pipelines in the factory
- Used to store constants and environment-specific values (secrets are better kept in Azure Key Vault)
- Can be used in any pipeline expression as @pipeline().globalParameters.<name>
- Limited to 100 global parameters per factory

Pipeline Parameters:

- Defined at the pipeline level


- Specific to each pipeline
- Used to pass input values to a pipeline during execution
- Can be used in pipeline definitions, activities, and data flows
- Not encrypted by default (but can be stored securely using Azure Key Vault)

Key differences:

- Scope: Global parameters are shared across all pipelines, while pipeline parameters are
specific to each pipeline.
- Purpose: Global parameters store factory-wide constants, while pipeline parameters pass
run-specific input values to a pipeline.
- Definition: Global parameters are set once at the factory level (typically per environment),
while pipeline parameters are supplied for each pipeline run.

41) List the differences between a pipeline parameter and a variable.


ANS:
Pipeline Parameters:

1. Defined at pipeline level


2. Passed as input when triggering the pipeline
3. Can be used in pipeline definitions, activities, and data flows
4. Not reusable across pipelines
5. Not stored in the pipeline, only passed as input
6. Can be used to pass external values, like file names or dates
7. Limited to 50 parameters per pipeline

Variables:

1. Defined inside a pipeline or data flow


2. Store values for reuse within the pipeline or data flow
3. Can be used in expressions, data flows, and activities
4. Reusable across activities and data flows within the pipeline
5. Stored in the pipeline or data flow definition
6. Can be used to store intermediate results or calculated values
7. No limit on the number of variables

Key differences:

- Purpose: Parameters pass external input, while variables store reusable values.
- Scope: Parameters are pipeline-specific, while variables are defined within a pipeline or
data flow.
- Reusability: Parameters are not reusable, while variables can be reused within the pipeline
or data flow.

When to use each:

- Use pipeline parameters for external input values that vary per pipeline execution.
- Use variables for intermediate results, calculated values, or reusable values within the
pipeline or data flow.

42) Capture all the steps/screenshots on how to monitor pipeline executions.
ANS:
Column name: Description
Pipeline Name: Name of the pipeline
Run Start: Start date and time for the pipeline run (MM/DD/YYYY, HH:MM:SS AM/PM)
Run End: End date and time for the pipeline run (MM/DD/YYYY, HH:MM:SS AM/PM)
Duration: Run duration (HH:MM:SS)
Triggered By: The name of the trigger that started the pipeline
Status: Failed, Succeeded, In Progress, Canceled, or Queued
Annotations: Filterable tags associated with a pipeline
Parameters: Parameters for the pipeline run (name/value pairs)
Error: If the pipeline failed, the run error
Run: Original, Rerun, or Rerun (Latest)
Run ID: ID of the pipeline run
Manually select the Refresh button to refresh the list of pipeline and activity runs. Autorefresh is
currently not supported.

To view the results of a debug run, select the Debug tab.

To get a detailed view of the individual activity runs of a specific pipeline run, click on the pipeline
name.
The list view shows activity runs that correspond to each pipeline run. Hover over the specific
activity run to get run-specific information such as the JSON input, JSON output, and detailed
activity-specific monitoring experiences.

If an activity failed, you can see the detailed error message by clicking on the icon in the error
column.
To rerun a pipeline that has previously run from the start, hover over the specific pipeline run and
select Rerun. If you select multiple pipeline runs, you can use the Rerun button to rerun them all.

You can see the resources consumed by a pipeline run by clicking the consumption icon next to the
run.
Alerts

43) Create one alert to send an email on successful pipeline execution.


ANS:
44) Create one alert to send an email notification on pipeline failures.
ANS:
45) Delete the Azure Data Factory resource once you are done with all questions of this
assignment.
ANS:
Tumbling Window Reference:

https://learn.microsoft.com/en-us/azure/data-factory/tumbling-window-trigger-dependency
