# Load Data With Azure Data Factory
[AZURE.SELECTOR]
- Data Factory
- PolyBase
- BCP
This tutorial shows you how to create a pipeline in Azure Data Factory to move data from Azure Storage Blob to SQL Data Warehouse. You will register the data stores as linked services, define datasets for the source and destination data, and then create a pipeline to move data from Storage Blobs to SQL Data Warehouse.
[AZURE.VIDEO loading-azure-sql-data-warehouse-with-azure-data-factory]
## Before you begin

To run through this tutorial, you need:

- **Azure Storage Blob**: This tutorial uses Azure Storage Blob as the data source for the Azure Data Factory pipeline, so you need one available to store the sample data. If you don't have one already, learn how to Create a storage account.
- **SQL Data Warehouse**: This tutorial moves the data from Azure Storage Blob to SQL Data Warehouse, so you need a data warehouse online that is loaded with the AdventureWorksDW sample data. If you do not already have a data warehouse, learn how to provision one. If you have a data warehouse but didn't provision it with the sample data, you can load it manually.
- **Azure Data Factory**: Azure Data Factory completes the actual load, so you need one that you can use to build the data movement pipeline. If you don't have one already, learn how to create one in Step 1 of Get started with Azure Data Factory (Data Factory Editor).
- **AzCopy**: You need AzCopy to copy the sample data from your local client to your Azure Storage Blob. For install instructions, see the AzCopy documentation.
## Set up the sample data

1. Download the sample data. This data adds another three years of sales data to your AdventureWorksDW sample data.

2. Use this AzCopy command to copy the three years of data to your Azure Storage Blob:

    ```
    AzCopy /Source:<Sample Data Location> /Dest:https://<storage account>.blob.core.windows.net/<container name> /DestKey:<storage key> /Pattern:FactInternetSales.csv
    ```
## Register the linked services

To get started, open the Azure portal and select your data factory from the left-hand menu.

1. First, begin the registration process by clicking the 'Linked Services' section of your data factory and then clicking 'New data store'. Choose a name to register your Azure Storage under, select Azure Storage as the type, and then enter your Account Name and Account Key. A sample definition is shown below.
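    If you prefer to register the storage account from the 'Author and Deploy' editor instead, the linked service definition looks roughly like the following sketch (the name is a placeholder; substitute your own account name and key):

    ```json
    {
        "name": "<Storage Linked Service Name>",
        "properties": {
            "type": "AzureStorage",
            "typeProperties": {
                "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account name>;AccountKey=<account key>"
            }
        }
    }
    ```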
2. To register SQL Data Warehouse, navigate to the 'Author and Deploy' section, select 'New Data Store', and then 'Azure SQL Data Warehouse'. Copy and paste in this template, and then fill in your specific information:
    ```json
    {
        "name": "<Linked Service Name>",
        "properties": {
            "description": "",
            "type": "AzureSqlDW",
            "typeProperties": {
                "connectionString": "Data Source=tcp:<server name>.database.windows.net,1433;Initial Catalog=<database name>;Integrated Security=False;User ID=<user>@<server name>;Password=<password>;Connect Timeout=30;Encrypt=True"
            }
        }
    }
    ```
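    When you have filled in the template, click the editor's Deploy command to create the linked service in your data factory.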
## Create the datasets

1. Start this process by navigating to the 'Author and Deploy' section of your data factory.

2. Click 'New dataset' and then 'Azure Blob storage' to link your storage to your data factory. You can use the following script to define your data in Azure Blob storage:
    ```json
    {
        "name": "<Dataset Name>",
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": "<linked storage name>",
            "typeProperties": {
                "folderPath": "<container name>",
                "fileName": "FactInternetSales.csv",
                "format": {
                    "type": "TextFormat",
                    "columnDelimiter": ",",
                    "rowDelimiter": "\n"
                }
            },
            "external": true,
            "availability": {
                "frequency": "Hour",
                "interval": 1
            },
            "policy": {
                "externalData": {
                    "retryInterval": "00:01:00",
                    "retryTimeout": "00:10:00",
                    "maximumRetry": 3
                }
            }
        }
    }
    ```
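    The dataset above relies on the columns in the CSV arriving in the same order as the target table expects. If you want to declare the schema explicitly, a dataset can also carry an optional `structure` array that lists column names and types. A minimal illustrative fragment (these three columns are only a small subset of the FactInternetSales schema):

    ```json
    "structure": [
        { "name": "ProductKey", "type": "Int32" },
        { "name": "OrderDateKey", "type": "Int32" },
        { "name": "SalesOrderNumber", "type": "String" }
    ]
    ```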
3. Now we will also define our dataset for SQL Data Warehouse. We start in the same way, by clicking 'New dataset' and then 'Azure SQL Data Warehouse':
    ```json
    {
        "name": "DWDataset",
        "properties": {
            "type": "AzureSqlDWTable",
            "linkedServiceName": "AzureSqlDWLinkedService",
            "typeProperties": {
                "tableName": "FactInternetSales"
            },
            "availability": {
                "frequency": "Hour",
                "interval": 1
            }
        }
    }
    ```
## Create the pipeline

In the 'Author and Deploy' section, now click 'More Commands' and then 'New Pipeline'. After you create the pipeline, you can use the following code to transfer the data to your data warehouse:
```json
{
    "name": "<Pipeline Name>",
    "properties": {
        "description": "<Description>",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "skipHeaderLineCount": 1
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:10"
                    }
                },
                "inputs": [
                    {
                        "name": "<Storage Dataset>"
                    }
                ],
                "outputs": [
                    {
                        "name": "<Data Warehouse Dataset>"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "Sample Copy",
                "description": "Copy Activity"
            }
        ],
        "start": "<Date YYYY-MM-DD>",
        "end": "<Date YYYY-MM-DD>",
        "isPaused": false
    }
}
```
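Once deployed, the pipeline produces one copy run per hourly slice between the start and end dates you supplied, and you can watch slice status from the data factory's Diagram view in the portal. To confirm the load, check that the row count of FactInternetSales in your data warehouse has grown by the number of rows in the sample file.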
## Next steps

To learn more about Azure Data Factory, start with these topics. Although they discuss Azure SQL Database or HDInsight, the information also applies to Azure SQL Data Warehouse.

- Tutorial: Get started with Azure Data Factory. This is the core tutorial for processing data with Azure Data Factory. In this tutorial you will build your first pipeline that uses HDInsight to transform and analyze web logs on a monthly basis. Note that there is no copy activity in this tutorial.
- Tutorial: Copy data from Azure Storage Blob to Azure SQL Database. In this tutorial, you create a pipeline in Azure Data Factory to copy data from Azure Storage Blob to Azure SQL Database.
- Real-world scenario tutorial. This is an in-depth tutorial for using Azure Data Factory.