Azure_Practice assignment
1) Provision one new Azure Data Factory under your resource group. Capture screenshots of all
the steps performed to do this task.
Region: East US
Version: v2
ANS:
2) Provide access for the newly created ADF on Azure Key Vault. Add the ADF managed identity as a
Key Vault Administrator.
ANS:
3) Provide access for the newly created ADF on Azure Data Lake Storage Gen2. Add the ADF
managed identity as a Storage Blob Data Contributor.
ANS:
4) Create one new Linked Service for the Azure Key Vault created as part of the previous assignment.
ANS:
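A minimal sketch of the Key Vault linked service JSON. The name LS_AzureKeyVault and the vault URL are placeholders, not values from the assignment:
{
  "name": "LS_AzureKeyVault",
  "properties": {
    "type": "AzureKeyVault",
    "typeProperties": {
      "baseUrl": "https://<your-keyvault-name>.vault.azure.net/"
    }
  }
}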
5) Create a Linked Service for the Azure SQL database that you have created as part of previous
assignments.
Note: Use SQL Authentication; the password must come from a Key Vault secret.
ANS:
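A sketch of the linked service JSON, assuming the Key Vault linked service above is named LS_AzureKeyVault and the SQL password is stored in a secret called sql-admin-password (both hypothetical); server, database and user are placeholders:
{
  "name": "LS_AzureSqlDb",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<sql-user>;",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "LS_AzureKeyVault", "type": "LinkedServiceReference" },
        "secretName": "sql-admin-password"
      }
    }
  }
}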
6) Create a Linked Service (using Account Key Authentication) for the Azure Data Lake Gen2 account
that you have created as part of previous assignments.
ANS:
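A sketch of the account-key linked service, assuming the storage account key has been stored as a Key Vault secret named adls-account-key (hypothetical); the key could also be entered directly as a SecureString:
{
  "name": "LS_ADLS_AccountKey",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://<storage-account>.dfs.core.windows.net",
      "accountKey": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "LS_AzureKeyVault", "type": "LinkedServiceReference" },
        "secretName": "adls-account-key"
      }
    }
  }
}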
7) Create a Linked Service (using Service Principal Authentication) for the Azure Data Lake Gen2 account
that you have created as part of previous assignments.
ANS:
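A sketch of the service-principal variant, assuming an app registration whose client secret is kept in a Key Vault secret named adls-spn-secret (hypothetical); the IDs are placeholders:
{
  "name": "LS_ADLS_ServicePrincipal",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://<storage-account>.dfs.core.windows.net",
      "servicePrincipalId": "<application-client-id>",
      "servicePrincipalKey": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "LS_AzureKeyVault", "type": "LinkedServiceReference" },
        "secretName": "adls-spn-secret"
      },
      "tenant": "<tenant-id>"
    }
  }
}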
8) Create one new Dataset for a table that you have created in your Azure SQL DB. The schema and
table name should be passed as parameters.
ANS:
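A sketch of a parameterized Azure SQL dataset, assuming the linked service name LS_AzureSqlDb used above (hypothetical). The schema and table are bound to dataset parameters via expressions:
{
  "name": "DS_AzureSql_Generic",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": { "referenceName": "LS_AzureSqlDb", "type": "LinkedServiceReference" },
    "parameters": {
      "SchemaName": { "type": "string" },
      "TableName": { "type": "string" }
    },
    "typeProperties": {
      "schema": { "value": "@dataset().SchemaName", "type": "Expression" },
      "table": { "value": "@dataset().TableName", "type": "Expression" }
    }
  }
}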
9) Create one dataset for the ADLS Gen2 storage account with the DelimitedText file format.
ANS:
10) Edit the ADLS Gen2 (DelimitedText) dataset to make “File System”, “Directory” and “File
Name” dynamic. Values for all of these should be passed as parameters. (A combined sketch for tasks 9 and 10 follows.)
ANS:
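A sketch covering tasks 9 and 10 together: a DelimitedText dataset whose file system, directory and file name are all dataset parameters. The names DS_ADLS_DelimitedText and LS_ADLS_AccountKey are assumptions from earlier sketches:
{
  "name": "DS_ADLS_DelimitedText",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": { "referenceName": "LS_ADLS_AccountKey", "type": "LinkedServiceReference" },
    "parameters": {
      "FileSystem": { "type": "string" },
      "Directory": { "type": "string" },
      "FileName": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": { "value": "@dataset().FileSystem", "type": "Expression" },
        "folderPath": { "value": "@dataset().Directory", "type": "Expression" },
        "fileName": { "value": "@dataset().FileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}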
11) Create one dataset for the ADLS Gen2 storage account with the Parquet file format.
ANS:
12) Edit the ADLS Gen2 (Parquet) dataset to make “File System”, “Directory” and “File Name”
dynamic. Values for all of these should be passed as parameters. (A combined sketch for tasks 11 and 12 follows.)
ANS:
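The Parquet counterpart (tasks 11 and 12) is essentially the same sketch with a different type; names are again assumptions:
{
  "name": "DS_ADLS_Parquet",
  "properties": {
    "type": "Parquet",
    "linkedServiceName": { "referenceName": "LS_ADLS_AccountKey", "type": "LinkedServiceReference" },
    "parameters": {
      "FileSystem": { "type": "string" },
      "Directory": { "type": "string" },
      "FileName": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": { "value": "@dataset().FileSystem", "type": "Expression" },
        "folderPath": { "value": "@dataset().Directory", "type": "Expression" },
        "fileName": { "value": "@dataset().FileName", "type": "Expression" }
      }
    }
  }
}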
13) Create a new ADF pipeline to copy data as per the details below:
a. Source -> Employee table in Azure SQL
b. Sink -> a CSV file in the ADLS Gen2 bronze container.
ANS:
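A minimal sketch of such a pipeline, reusing the hypothetical datasets defined above; the ADF UI normally also adds store/format settings under the copy source and sink:
{
  "name": "PL_Copy_Employee_To_Bronze",
  "properties": {
    "activities": [
      {
        "name": "CopyEmployeeToCsv",
        "type": "Copy",
        "typeProperties": {
          "source": { "type": "AzureSqlSource" },
          "sink": {
            "type": "DelimitedTextSink",
            "storeSettings": { "type": "AzureBlobFSWriteSettings" },
            "formatSettings": { "type": "DelimitedTextWriteSettings", "fileExtension": ".csv" }
          }
        },
        "inputs": [
          {
            "referenceName": "DS_AzureSql_Generic",
            "type": "DatasetReference",
            "parameters": { "SchemaName": "dbo", "TableName": "Employee" }
          }
        ],
        "outputs": [
          {
            "referenceName": "DS_ADLS_DelimitedText",
            "type": "DatasetReference",
            "parameters": { "FileSystem": "bronze", "Directory": "Employee", "FileName": "Employee.csv" }
          }
        ]
      }
    ]
  }
}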
14) Run the above pipeline in debug mode and make sure that it executes successfully.
Capture a screenshot of the successfully executed pipeline.
ANS:
15) Validate the data in the ADLS Gen2 bronze container using Azure Storage Explorer. Download the
CSV file and validate that the data in this file is as expected. Capture a screenshot of the CSV file
loaded in ADLS Gen2.
ANS:
16) Create a new ADF pipeline to copy multiple tables from Azure SQL to ADLS Gen2.
The list of tables to be copied should be passed as an input parameter.
Source -> Tables in Azure SQL
Sink -> CSV files in the ADLS Gen2 bronze container. The folder and file name should be the same as the
SQL table name.
ANS:
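A sketch of the parameter-driven multi-table copy, assuming the hypothetical datasets above. The ForEach iterates over the table-name array and passes each name to the dataset parameters:
{
  "name": "PL_Copy_Multiple_Tables",
  "properties": {
    "parameters": {
      "TableList": { "type": "array", "defaultValue": [ "Employee", "Department" ] }
    },
    "activities": [
      {
        "name": "ForEachTable",
        "type": "ForEach",
        "typeProperties": {
          "items": { "value": "@pipeline().parameters.TableList", "type": "Expression" },
          "activities": [
            {
              "name": "CopyOneTable",
              "type": "Copy",
              "typeProperties": {
                "source": { "type": "AzureSqlSource" },
                "sink": { "type": "DelimitedTextSink" }
              },
              "inputs": [
                {
                  "referenceName": "DS_AzureSql_Generic",
                  "type": "DatasetReference",
                  "parameters": { "SchemaName": "dbo", "TableName": "@item()" }
                }
              ],
              "outputs": [
                {
                  "referenceName": "DS_ADLS_DelimitedText",
                  "type": "DatasetReference",
                  "parameters": {
                    "FileSystem": "bronze",
                    "Directory": "@item()",
                    "FileName": "@concat(item(), '.csv')"
                  }
                }
              ]
            }
          ]
        }
      }
    ]
  }
}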
17) Run the above pipeline in debug mode and make sure that it executes successfully.
Capture a screenshot of the successfully executed pipeline.
ANS:
18) Validate the data in the ADLS Gen2 bronze container using Azure Storage Explorer. Download the
CSV files and validate that the data in all files is as expected. Capture a screenshot of the CSV files
loaded in ADLS Gen2.
ANS:
Dynamic Pipeline to read Source and Sink details from a Config file:
20) Create one Config (JSON) file (sample given below) to hold all the source tables and sink
details.
[
  {
    "TableName": "x",
    "ADLSFileSystem": "a",
    "ADLSDirectory": "b",
    "ADLSFileName": "c"
  }
]
22) Create one new dataset for ADLS Gen2 with JSON file format pointing to your config file.
ANS:
23) Create one pipeline to read the config file and copy all the tables from SQL to ADLS Gen2 as
per the details mentioned in the config file. (Hint: use a Lookup activity followed by a ForEach)
ANS:
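A sketch of the config-driven pipeline, assuming a JSON dataset named DS_ADLS_ConfigJson (task 22) that points at the config file, plus the hypothetical datasets from earlier. The Lookup returns the config array and the ForEach copies one table per entry:
{
  "name": "PL_Config_Driven_Copy",
  "properties": {
    "activities": [
      {
        "name": "LookupConfig",
        "type": "Lookup",
        "typeProperties": {
          "source": { "type": "JsonSource", "storeSettings": { "type": "AzureBlobFSReadSettings" } },
          "dataset": { "referenceName": "DS_ADLS_ConfigJson", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachConfigEntry",
        "type": "ForEach",
        "dependsOn": [ { "activity": "LookupConfig", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('LookupConfig').output.value", "type": "Expression" },
          "activities": [
            {
              "name": "CopyTablePerConfig",
              "type": "Copy",
              "typeProperties": {
                "source": { "type": "AzureSqlSource" },
                "sink": { "type": "DelimitedTextSink" }
              },
              "inputs": [
                {
                  "referenceName": "DS_AzureSql_Generic",
                  "type": "DatasetReference",
                  "parameters": { "SchemaName": "dbo", "TableName": "@item().TableName" }
                }
              ],
              "outputs": [
                {
                  "referenceName": "DS_ADLS_DelimitedText",
                  "type": "DatasetReference",
                  "parameters": {
                    "FileSystem": "@item().ADLSFileSystem",
                    "Directory": "@item().ADLSDirectory",
                    "FileName": "@item().ADLSFileName"
                  }
                }
              ]
            }
          ]
        }
      }
    ]
  }
}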
24) Create a pipeline to get the lastModified date for all files in the ADLS Gen2 bronze container.
ANS:
25) Create a pipeline to copy data from the ADLS Gen2 bronze container to the silver container (only if
the lastModified date of a file is today's date).
ANS:
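A simplified sketch of the pattern behind tasks 24 and 25, reduced to a single file for brevity (extending it to every file would wrap the same logic in a Get Metadata childItems plus ForEach). Dataset names and the Employee folder are assumptions from earlier sketches:
{
  "name": "PL_Copy_Bronze_To_Silver_IfToday",
  "properties": {
    "parameters": { "FileName": { "type": "string" } },
    "activities": [
      {
        "name": "GetLastModified",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": {
            "referenceName": "DS_ADLS_DelimitedText",
            "type": "DatasetReference",
            "parameters": { "FileSystem": "bronze", "Directory": "Employee", "FileName": "@pipeline().parameters.FileName" }
          },
          "fieldList": [ "lastModified" ]
        }
      },
      {
        "name": "IfModifiedToday",
        "type": "IfCondition",
        "dependsOn": [ { "activity": "GetLastModified", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "expression": {
            "value": "@equals(formatDateTime(activity('GetLastModified').output.lastModified,'yyyy-MM-dd'), formatDateTime(utcnow(),'yyyy-MM-dd'))",
            "type": "Expression"
          },
          "ifTrueActivities": [
            {
              "name": "CopyBronzeToSilver",
              "type": "Copy",
              "typeProperties": {
                "source": { "type": "DelimitedTextSource" },
                "sink": { "type": "DelimitedTextSink" }
              },
              "inputs": [
                {
                  "referenceName": "DS_ADLS_DelimitedText",
                  "type": "DatasetReference",
                  "parameters": { "FileSystem": "bronze", "Directory": "Employee", "FileName": "@pipeline().parameters.FileName" }
                }
              ],
              "outputs": [
                {
                  "referenceName": "DS_ADLS_DelimitedText",
                  "type": "DatasetReference",
                  "parameters": { "FileSystem": "silver", "Directory": "Employee", "FileName": "@pipeline().parameters.FileName" }
                }
              ]
            }
          ]
        }
      }
    ]
  }
}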
26) Create a pipeline to add the list of all file names from an ADLS Gen2 folder into an array-type
variable. (Hint: use the Append Variable activity along with Get Metadata)
ANS:
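A sketch of this pattern, again with hypothetical names. It assumes that leaving the FileName parameter empty makes the dataset resolve to the folder; in practice a separate folder-level dataset may be cleaner:
{
  "name": "PL_Collect_FileNames",
  "properties": {
    "variables": { "FileNames": { "type": "Array" } },
    "activities": [
      {
        "name": "GetChildItems",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": {
            "referenceName": "DS_ADLS_DelimitedText",
            "type": "DatasetReference",
            "parameters": { "FileSystem": "bronze", "Directory": "Employee", "FileName": "" }
          },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "ForEachFile",
        "type": "ForEach",
        "dependsOn": [ { "activity": "GetChildItems", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('GetChildItems').output.childItems", "type": "Expression" },
          "activities": [
            {
              "name": "AppendFileName",
              "type": "AppendVariable",
              "typeProperties": { "variableName": "FileNames", "value": "@item().name" }
            }
          ]
        }
      }
    ]
  }
}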
27) Create one pipeline with a Wait activity. The wait time of this activity should be passed as a
pipeline parameter.
ANS:
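A minimal sketch, with hypothetical names PL_Wait_Child and WaitSeconds:
{
  "name": "PL_Wait_Child",
  "properties": {
    "parameters": { "WaitSeconds": { "type": "int", "defaultValue": 30 } },
    "activities": [
      {
        "name": "WaitForConfiguredTime",
        "type": "Wait",
        "typeProperties": {
          "waitTimeInSeconds": { "value": "@pipeline().parameters.WaitSeconds", "type": "Expression" }
        }
      }
    ]
  }
}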
28) Create one pipeline to call the pipeline created as part of the above task. The wait time parameter
for the child pipeline should be passed from the parent pipeline.
ANS:
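A sketch of the parent pipeline, assuming the child pipeline above is named PL_Wait_Child:
{
  "name": "PL_Wait_Parent",
  "properties": {
    "parameters": { "ParentWaitSeconds": { "type": "int", "defaultValue": 60 } },
    "activities": [
      {
        "name": "CallChildPipeline",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": { "referenceName": "PL_Wait_Child", "type": "PipelineReference" },
          "waitOnCompletion": true,
          "parameters": {
            "WaitSeconds": "@pipeline().parameters.ParentWaitSeconds"
          }
        }
      }
    ]
  }
}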
29) Create a data flow to join the employee and department tables from the bronze container and
load the result into a CSV file in the gold container in ADLS Gen2.
ANS:
30) Create a data flow to aggregate the salary of all employees by department and load the
aggregated data into a CSV file in the gold container in ADLS Gen2.
ANS:
31) Create a data flow to split the employee data by department and load the split datasets into the
gold container in ADLS Gen2.
ANS:
32) Create a data flow to select only the name and salary columns from the employee data and
multiply salary by 2. Load the transformed data into a CSV file in the gold container in ADLS
Gen2.
ANS:
33) Create a new pipeline to call one of the data flows created as part of the above tasks. Run this
pipeline and make sure that it executes successfully.
ANS:
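A sketch of a pipeline that calls a data flow via the Execute Data Flow activity; the data flow name DF_Join_Employee_Department is hypothetical:
{
  "name": "PL_Run_Join_DataFlow",
  "properties": {
    "activities": [
      {
        "name": "RunJoinDataFlow",
        "type": "ExecuteDataFlow",
        "typeProperties": {
          "dataFlow": { "referenceName": "DF_Join_Employee_Department", "type": "DataFlowReference" },
          "compute": { "coreCount": 8, "computeType": "General" }
        }
      }
    ]
  }
}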
34) Schedule pipeline created as part of step #13 to run every day @ 1AM IST.
ANS:
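A sketch of the schedule trigger, assuming the pipeline from task 13 is named PL_Copy_Employee_To_Bronze; the start date is a placeholder:
{
  "name": "TR_Daily_1AM_IST",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T01:00:00",
        "timeZone": "India Standard Time",
        "schedule": { "hours": [ 1 ], "minutes": [ 0 ] }
      }
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "PL_Copy_Employee_To_Bronze", "type": "PipelineReference" } }
    ]
  }
}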
35) Schedule pipeline created as part of step #13 to run every week on Mon, Wed and Fri @
9AM IST.
ANS:
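The weekly variant differs only in the recurrence; a sketch with the same assumed pipeline name:
{
  "name": "TR_Weekly_MWF_9AM_IST",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Week",
        "interval": 1,
        "startTime": "2024-01-01T09:00:00",
        "timeZone": "India Standard Time",
        "schedule": {
          "weekDays": [ "Monday", "Wednesday", "Friday" ],
          "hours": [ 9 ],
          "minutes": [ 0 ]
        }
      }
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "PL_Copy_Employee_To_Bronze", "type": "PipelineReference" } }
    ]
  }
}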
36) Create one Storage event-based trigger for pipeline created as part of step #25
ANS:
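A sketch of a blob-created storage event trigger scoped to the bronze container; the subscription, resource group, storage account and pipeline names are placeholders:
{
  "name": "TR_Bronze_BlobCreated",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
      "events": [ "Microsoft.Storage.BlobCreated" ],
      "blobPathBeginsWith": "/bronze/blobs/",
      "ignoreEmptyBlobs": true
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "PL_Copy_Bronze_To_Silver_IfToday", "type": "PipelineReference" } }
    ]
  }
}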
37) Validate/Test the storage event trigger by putting a blob file in the ADLS Gen2 container.
Make sure that the trigger executes successfully.
ANS:
38) Which Integration Runtime should be used while reading data from an on-premises system?
ANS:
When reading data from an on-premises system in Azure Data Factory (ADF) or Synapse Analytics,
you should use the Self-hosted Integration Runtime.
The Self-hosted Integration Runtime (IR) acts as a bridge between on-premises data sources and
cloud-based Azure services. It can securely access data in your on-premises network and transfer it
to the cloud environment.
Benefits:
- Security: Data transfer is secure and remains within your network boundaries until it reaches the cloud.
- Performance: Efficient data transfer with minimal latency.
- Flexibility: Supports a wide range of on-premises data sources and custom connectors.
This setup is essential for scenarios where direct connectivity to on-premises systems is required for
data movement, ETL processes, and data integration tasks.
39) Create two Global parameters and use these parameters in one of your pipelines.
ANS:
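Global parameters are created under Manage > Global parameters and referenced with the @pipeline().globalParameters expression. A sketch of one activity using two hypothetical global parameters, EnvironmentName and RetentionDays, to set a string variable:
{
  "name": "SetEnvironmentInfo",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "EnvironmentInfo",
    "value": {
      "value": "@concat(pipeline().globalParameters.EnvironmentName, ' keeps data for ', string(pipeline().globalParameters.RetentionDays), ' days')",
      "type": "Expression"
    }
  }
}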
40) List the differences between Global Parameters and Pipeline Parameters.
ANS:
In Azure Data Factory (ADF), both global parameters and pipeline parameters are used to
store and manage values, but they serve different purposes and have different scopes:
Global Parameters: constants defined at the data factory level (Manage > Global parameters) that
can be referenced from any pipeline in the factory.
Pipeline Parameters: input values defined on an individual pipeline and supplied at run time (by a
trigger, a parent pipeline, or a manual/debug run).
Key differences:
- Scope: Global parameters are shared across all pipelines, while pipeline parameters are
specific to each pipeline.
- Purpose: Global parameters store factory-wide constants, while pipeline parameters pass
input values to a single pipeline run.
Variables:
Variables hold values inside a pipeline and can be changed during execution using the Set Variable
and Append Variable activities.
Key differences (parameters vs. variables):
- Purpose: Parameters pass external input, while variables store intermediate or reusable values.
- Scope: Parameter values are fixed for the duration of a run, while variables can be set and
updated within the pipeline.
- Usage: Use pipeline parameters for external input values that vary per pipeline execution, and
use variables for intermediate results or calculated values within the pipeline.
42) Capture all the steps/screenshots on how to monitor pipeline executions.
ANS:
The pipeline runs list in the Monitor tab shows the following columns:
Pipeline Name: Name of the pipeline
Run Start: Start date and time for the pipeline run (MM/DD/YYYY, HH:MM:SS AM/PM)
Run End: End date and time for the pipeline run (MM/DD/YYYY, HH:MM:SS AM/PM)
Duration: Run duration (HH:MM:SS)
Triggered By: The name of the trigger that started the pipeline
Status: Failed, Succeeded, In Progress, Canceled, or Queued
Annotations: Filterable tags associated with a pipeline
Parameters: Parameters for the pipeline run (name/value pairs)
Error: If the pipeline failed, the run error
Run: Original, Rerun, or Rerun (Latest)
Run ID: ID of the pipeline run
Manually select the Refresh button to refresh the list of pipeline and activity runs; auto-refresh is
currently not supported.
To get a detailed view of the individual activity runs of a specific pipeline run, click on the pipeline
name.
The list view shows activity runs that correspond to each pipeline run. Hover over the specific activity
run to get run-specific information such as the JSON input, JSON output, and detailed activity-
specific monitoring experiences.
If an activity failed, you can see the detailed error message by clicking on the icon in the error
column.
To rerun a pipeline that has previously run, hover over the specific pipeline run and
select Rerun. If you select multiple pipeline runs, you can use the Rerun button to rerun them all.
You can see the resources consumed by a pipeline run by clicking the consumption icon next to the
run.
Alerts
You can also create alert rules from the Monitor page (Alerts & metrics) to get notified when pipeline runs fail or succeed.
https://learn.microsoft.com/en-us/azure/data-factory/tumbling-window-trigger-dependency