Azure Data Factory (ADF) Interview Questions
Basic Azure Data Factory Interview Questions for Freshers
1. What is Azure Data Factory?
Using Azure Data Factory, you can create and schedule data-driven
workflows (called pipelines) that ingest data from disparate data
stores. It can process and transform the data using compute services
such as Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and
Azure Machine Learning.
For example, consider a SQL Server database: you need a connection
string with which you can connect to the external data store, and you
need to specify the source and destination of your data.
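As a rough illustration, a connection string for an Azure SQL Database
source might look like the following (the server, database, and user
names are hypothetical):

# A typical Azure SQL Database connection string (hypothetical values).
sql_conn_str = (
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=CarsDb;"
    "User ID=etl_user;Password=<your-password>;"
    "Encrypt=True;Connection Timeout=30;"
)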
Common data integration patterns include:
1. Broadcast
2. Bi-directional sync
3. Correlation
4. Aggregation
16. What is the difference between Azure Data Lake and Azure
Data Warehouse?
The data warehouse is a traditional way of storing data that is still
widely used: it holds processed data that conforms to a predefined
schema. The data lake, which stores raw data in its native format, is
complementary to a data warehouse; if you want to move data from a data
lake into the data warehouse, you have to follow specific schema rules.
18. What is the difference between Azure Data Lake store and
Blob storage?
Key concepts:
Azure Data Lake Storage Gen1: an account contains folders, which in
turn contain data stored as files.
Azure Blob Storage: a storage account has containers, which in turn
hold data in the form of blobs.
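The addressing schemes reflect this difference; for example
(hypothetical account names):

# ADLS Gen1 addresses folders and files; Blob Storage addresses containers and blobs.
adls_gen1_path = "adl://mydatalake.azuredatalakestore.net/raw/cars/cars.csv"
blob_path = "https://mystorageacct.blob.core.windows.net/raw/cars/cars.csv"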
19. What are the steps for creating an ETL process in Azure Data
Factory?
Suppose we want to extract some data from an Azure SQL Server database;
if something has to be processed, it is processed in transit and then
stored in Data Lake Storage.
Steps for creating the ETL process (a rough code sketch follows the list):
1. Create a linked service for the source data store, which is the SQL
Server database.
2. Assume that we have a cars dataset.
3. Create a linked service for the destination data store, which is Azure
Data Lake Storage (ADLS).
4. Create a dataset for saving the data.
5. Create the pipeline and add a copy activity.
6. Schedule the pipeline by adding a trigger.
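A minimal sketch of these steps using the Azure Python SDK
(azure-mgmt-datafactory); the subscription, resource group, factory,
account, and credential values are hypothetical placeholders, and exact
model names can vary between SDK versions:

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureSqlDatabaseLinkedService, AzureBlobFSLinkedService,
    DatasetResource, AzureSqlTableDataset, ParquetDataset, AzureBlobFSLocation,
    LinkedServiceReference, DatasetReference, PipelineResource, CopyActivity,
    AzureSqlSource, ParquetSink,
)

rg, df = "my-rg", "my-data-factory"  # hypothetical resource group / factory
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# 1. Linked service for the source (Azure SQL database).
sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string="Server=tcp:myserver.database.windows.net,1433;"
                      "Database=CarsDb;User ID=etl_user;Password=<password>;"))
client.linked_services.create_or_update(rg, df, "SqlSourceLS", sql_ls)

# 3. Linked service for the destination (ADLS Gen2).
adls_ls = LinkedServiceResource(properties=AzureBlobFSLinkedService(
    url="https://mydatalake.dfs.core.windows.net", account_key="<account-key>"))
client.linked_services.create_or_update(rg, df, "AdlsSinkLS", adls_ls)

# 2./4. Datasets: the source cars table and the destination Parquet files.
src_ds = DatasetResource(properties=AzureSqlTableDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="SqlSourceLS"),
    table_name="dbo.Cars"))
client.datasets.create_or_update(rg, df, "CarsTable", src_ds)

sink_ds = DatasetResource(properties=ParquetDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AdlsSinkLS"),
    location=AzureBlobFSLocation(file_system="raw", folder_path="cars")))
client.datasets.create_or_update(rg, df, "CarsParquet", sink_ds)

# 5. Pipeline with a copy activity from SQL to ADLS.
copy = CopyActivity(
    name="CopyCarsToLake",
    inputs=[DatasetReference(type="DatasetReference", reference_name="CarsTable")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="CarsParquet")],
    source=AzureSqlSource(), sink=ParquetSink())
client.pipelines.create_or_update(rg, df, "CarsEtlPipeline",
                                  PipelineResource(activities=[copy]))
# 6. Scheduling via a trigger is sketched under the triggers question below.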
Each activity within the pipeline can consume the parameter value
that’s passed to the pipeline and run with the @parameter construct.
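For instance, continuing with the client, rg, and df from the sketch
above (the pipeline and parameter names are made up for illustration),
a parameter can be declared on the pipeline, consumed in an activity
expression, and supplied at run time:

from azure.mgmt.datafactory.models import (
    PipelineResource, ParameterSpecification, CopyActivity,
    AzureSqlSource, ParquetSink, DatasetReference,
)

# Declare a 'tableName' parameter and reference it via @pipeline().parameters.
copy = CopyActivity(
    name="CopyOneTable",
    inputs=[DatasetReference(type="DatasetReference", reference_name="CarsTable")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="CarsParquet")],
    source=AzureSqlSource(
        sql_reader_query="SELECT * FROM @{pipeline().parameters.tableName}"),
    sink=ParquetSink())
pipeline = PipelineResource(
    parameters={"tableName": ParameterSpecification(type="String")},
    activities=[copy])
client.pipelines.create_or_update(rg, df, "ParamPipeline", pipeline)

# Pass the parameter value when starting a run.
client.pipelines.create_run(rg, df, "ParamPipeline",
                            parameters={"tableName": "dbo.Cars"})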
25. List any 5 types of data sources that Azure Data Factory
supports.
Azure Data Factory supports more than 90 connectors; five common data
sources are:
Azure Blob Storage
Azure SQL Database
Azure Data Lake Storage
Azure Cosmos DB
Amazon S3
26. How can one set up data sources and destinations in Azure
Data Factory?
To connect to a data source or destination, one needs to set up a
linked service. A linked service is a configuration containing the
connection information required to connect to a data source or
destination. The following steps show how to set up a linked service:
1. In Azure Data Factory Studio, open the Manage hub, select Linked
services, and click New.
2. Choose the connector that matches your data store (for example,
Azure SQL Database or Azure Blob Storage).
3. Enter the connection information, such as the account name and
authentication details.
4. Select Test connection to verify the details, then click Create.
Common error cases:
1. UserErrorOdbcInvalidQueryString
Cause: The user submits a wrong or invalid query for fetching the
data or schemas.
2. FailedToResolveParametersInExploratoryController
Cause: This error arises because a linked service that references
another linked service with parameters is not supported for test
connections or data preview.
32. What are triggers in ADF, and how can they be used to
automate pipeline executions? What is their significance in
pipeline development?
In ADF, triggers are components that enable the automated execution
of pipeline activities based on predefined conditions or schedules.
Triggers play a crucial role in orchestrating data workflows and in
automating data integration and transformation tasks within ADF.
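As a sketch with a recent azure-mgmt-datafactory (reusing client, rg,
and df from the ETL example above; the trigger name and schedule are
hypothetical), a schedule trigger that runs the pipeline daily might
be created like this:

from datetime import datetime, timezone
from azure.mgmt.datafactory.models import (
    TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, PipelineReference,
)

# Run CarsEtlPipeline once a day, starting now.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day", interval=1,
    start_time=datetime.now(timezone.utc), time_zone="UTC")
trigger = ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(
            type="PipelineReference", reference_name="CarsEtlPipeline"))])
client.triggers.create_or_update(rg, df, "DailyTrigger",
                                 TriggerResource(properties=trigger))
client.triggers.begin_start(rg, df, "DailyTrigger").result()  # activate it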
ADF supports the following file formats:
CSV
Excel
Binary
Avro
JSON
ORC
XML
Parquet
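For example (a sketch reusing client, rg, and df from above; the
dataset and path names are hypothetical), a delimited-text (CSV)
dataset can be defined as follows:

from azure.mgmt.datafactory.models import (
    DatasetResource, DelimitedTextDataset, AzureBlobFSLocation,
    LinkedServiceReference,
)

# CSV dataset on the ADLS linked service, with a header row.
csv_ds = DatasetResource(properties=DelimitedTextDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AdlsSinkLS"),
    location=AzureBlobFSLocation(file_system="raw", folder_path="cars",
                                 file_name="cars.csv"),
    column_delimiter=",", first_row_as_header=True))
client.datasets.create_or_update(rg, df, "CarsCsv", csv_ds)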
On the contrary, in the wrangling data flow activity, data is prepared
without writing a program: the user gets the data manipulation
capabilities of Power Query M, executed on Spark, since the activity
is compatible with Power Query Online.
CIDR functions: parseCidr, cidrSubnet, cidrHost
Array functions
Comparison functions: coalesce, equals, less, lessOrEquals, greater,
greaterOrEquals
Resource functions
Deployment value functions
Subscription scope functions
Logical functions
42. What are the three most important tasks that you can
complete with ADF?
The three most important tasks that you can complete with ADF are
moving data, transforming data, and exercising control over the workflow.
It can return up to 5,000 rows at once; if the result set contains
more than 5,000 rows, the first 5,000 values are returned.
The supported output size for the Lookup activity is 4 MB; the
activity fails if the size exceeds this limit.
The longest duration before timeout for the Lookup activity is 24 hours.
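A rough sketch of a Lookup activity with the same SDK (reusing client,
rg, and df; the dataset and query are hypothetical), whose output other
activities could consume via @activity('LookupCars').output:

from azure.mgmt.datafactory.models import (
    LookupActivity, AzureSqlSource, DatasetReference, PipelineResource,
)

lookup = LookupActivity(
    name="LookupCars",
    dataset=DatasetReference(type="DatasetReference", reference_name="CarsTable"),
    source=AzureSqlSource(sql_reader_query="SELECT TOP 10 * FROM dbo.Cars"),
    first_row_only=False)  # return up to 5,000 rows instead of just the first
client.pipelines.create_or_update(rg, df, "LookupPipeline",
                                  PipelineResource(activities=[lookup]))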
https://www.linkedin.com/in/madhumitha-podishetty-842561119/