Azure Dataflow

The document describes creating several pipelines using Azure Data Factory. The first pipeline copies customer data from a CSV file to an SQL database where the customer ID is even. The second pipeline joins customer and address data and saves it as a JSON file. The third pipeline reads customer data from SQL and address data from CSV, joins them, filters on customer ID, sorts the results, and saves them as a parquet file. The fourth pipeline calculates the highest product price for each category from a product CSV, excluding blue products, and saves the results as a CSV file.


1. Create a pipeline to copy the customer data from a CSV file to SQL where the customer id is an even number.

Click on dataflow and select source1 as the Customer file located in Azure Data Lake file storage.

Customer File location:

Use Filter to get even number customers:
Sink into sqldataDB with new table as customerClone.
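ADF mapping data flows execute on Spark, so the source -> filter -> sink logic above can be sketched in PySpark as a reference. This is only an illustrative equivalent, not what ADF generates; the file path, column name, and JDBC connection details are assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("CopyEvenCustomers").getOrCreate()

    # Source: Customer file from Azure Data Lake (path is a placeholder)
    customers = (spark.read.option("header", True)
                 .csv("abfss://<container>@<account>.dfs.core.windows.net/Customer.csv"))

    # Filter: keep only customers with an even customer id
    even_customers = customers.filter(col("CustomerID").cast("int") % 2 == 0)

    # Sink: write the rows into sqldataDB as a new table customerClone (JDBC details are placeholders)
    (even_customers.write.format("jdbc")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=sqldataDB")
        .option("dbtable", "dbo.customerClone")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("overwrite")
        .save())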

Create a pipeline and add dataflow to execute:

Publish and debug


Verifying table created in SQLDB:

2. Create a pipeline to join the two files (Customer,
Customer Address) based on customer id and save
the result as a JSON file.

Import Customer data from datalake:

Select Dataset Pointing to customer.txt file:

Import CustomerAddress data from datalake:

Join both tables based on customer ID
Select dataset - JSON to save:
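For reference, a minimal PySpark sketch of the same join-and-save-as-JSON flow (ADF data flows run on Spark). The paths and the join key column name are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("JoinCustomerAddress").getOrCreate()

    # Sources: Customer and CustomerAddress files from the data lake (paths are placeholders)
    customers = (spark.read.option("header", True)
                 .csv("abfss://<container>@<account>.dfs.core.windows.net/Customer.txt"))
    addresses = (spark.read.option("header", True)
                 .csv("abfss://<container>@<account>.dfs.core.windows.net/CustomerAddress.csv"))

    # Join the two datasets on CustomerID, as in the data flow join transformation
    joined = customers.join(addresses, on="CustomerID", how="inner")

    # Sink: save the joined result as JSON
    joined.write.mode("overwrite").json("abfss://<container>@<account>.dfs.core.windows.net/CustomerWithAddress")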

Create a pipeline to execute the dataflow:

!!!! Successfully created JSON file!!!


3. Create a pipeline to read the Customer table data from SQL and CustomerAddress data from CSV, join both of them, and then save the result where customer id > 1000 & customer id < 2000 in ascending order as a Parquet file.

Select CustomerData from SQLDB:
Select CustomerAddress data from dataLake:

Join both tables based on customerID


Filter customer id > 1000 && customer id < 2000


Sort CustomerID in ascending order:
Sink as Parquet File:
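The whole flow of question 3 (SQL source, data lake source, join, range filter, sort, Parquet sink) can be sketched in PySpark as follows. This is an illustrative equivalent only; the table name, paths, and connection details are placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("CustomerAddressParquet").getOrCreate()

    # Source 1: Customer table from the SQL database (JDBC details are placeholders)
    customers = (spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=sqldataDB")
        .option("dbtable", "dbo.Customer")
        .option("user", "<user>")
        .option("password", "<password>")
        .load())

    # Source 2: CustomerAddress file from the data lake, with the key cast for the join
    addresses = (spark.read.option("header", True)
        .csv("abfss://<container>@<account>.dfs.core.windows.net/CustomerAddress.csv")
        .withColumn("CustomerID", col("CustomerID").cast("int")))

    # Join on CustomerID, keep ids between 1000 and 2000, and sort ascending
    result = (customers.join(addresses, on="CustomerID", how="inner")
        .filter((col("CustomerID") > 1000) & (col("CustomerID") < 2000))
        .orderBy(col("CustomerID").asc()))

    # Sink: Parquet output in the data lake
    result.write.mode("overwrite").parquet("abfss://<container>@<account>.dfs.core.windows.net/CustomerParquet")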
Output Flow:

!!!! Successfully created Parquet file !!!!

4. Create a pipeline to read the Product CSV file, and calculate the highest listPrice of any product under each product category. Ensure that the product isn't blue in color and save the result as a CSV file inside the ProductResult folder.

Select Product table from datalake



Color should not be Blue



Maximum of listPrice per product category


SELECT aggregate to get MAX(listPrice)

Sink in productResult folder in datalake
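A PySpark sketch of question 4's flow (read Product, drop blue products, take the highest listPrice per category, write CSV). Column names such as Color, ProductCategoryID, and ListPrice are assumptions about the Product file, and the paths are placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, max as max_

    spark = SparkSession.builder.appName("MaxListPricePerCategory").getOrCreate()

    # Source: Product file from the data lake (path is a placeholder)
    products = (spark.read.option("header", True)
                .csv("abfss://<container>@<account>.dfs.core.windows.net/Product.csv"))

    # Filter: product should not be blue in color
    not_blue = products.filter(col("Color") != "Blue")

    # Aggregate: highest listPrice for each product category
    max_price = (not_blue.groupBy("ProductCategoryID")
                 .agg(max_(col("ListPrice").cast("double")).alias("MaxListPrice")))

    # Sink: CSV output inside the ProductResult folder (written as multiple part files by default)
    (max_price.write.mode("overwrite")
        .option("header", True)
        .csv("abfss://<container>@<account>.dfs.core.windows.net/ProductResult"))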


Add dataflow to pipeline in order to execute:


Files created in partitions


5. Create a pipeline to read the Product CSV file, and calculate the highest listPrice of any product under each product category. Ensure that the product isn't blue in color and save the result as a SINGLE CSV file inside the ProductSingleResult folder.

A: It is the same as the above question, but instead of saving in multiple partitions, save the result as a single partition (see the sketch below).
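In the data flow this is done through the sink's file-output settings; at the Spark level the idea is simply to force one partition before writing. Continuing from the max_price DataFrame in the previous sketch (the folder path is a placeholder):

    # Coalesce to a single partition so only one CSV part file lands in ProductSingleResult
    (max_price.coalesce(1)
        .write.mode("overwrite")
        .option("header", True)
        .csv("abfss://<container>@<account>.dfs.core.windows.net/ProductSingleResult"))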


For more updates, follow me on LinkedIn: LinkedIn.com/in/amulya1003


