The copyright of this book belongs to Packt Publishing
Azure Data Factory Cookbook
Second Edition
Build ETL, Hybrid ETL, and ELT pipelines using ADF, Synapse Analytics,
Fabric and Databricks
Dmitry Foshin
Tonya Chernyshova
Dmitry Anoshin
Xenia Ireton
BIRMINGHAM—MUMBAI
Azure Data Factory Cookbook
Second Edition
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, without the prior written permission of the publisher, except in the case of brief
quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express or
implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any
damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products
mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee
the accuracy of this information.
ISBN 978-1-80324-659-8
www.packt.com
Contributors
I would like to express my heartfelt gratitude to my wife, Mariia, my parents, my brother, Ilya, and all my
family and friends who supported me and provided encouragement throughout the journey of producing this
book. Your unwavering support has been invaluable, and I am deeply grateful for your presence in my life.
Tonya Chernyshova is an experienced Data Engineer with over 10 years in the field, including
time at Amazon. Specializing in Data Modeling, Automation, Cloud Computing (AWS and
Azure), and Data Visualization, she has a strong track record of delivering scalable, maintainable
data products. Her expertise drives data-driven insights and business growth, showcasing her
proficiency in leveraging cloud technologies to enhance data capabilities.
Dmitry Anoshin is a data engineering leader with 15 years of experience working in business
intelligence, data warehousing, data integration, big data, cloud, and machine learning space
across North America and Europe.
He leads data engineering initiatives on a petabyte-scale data platform that was built using
cloud and big data technologies to support machine learning experiments, data science models,
business intelligence reporting, and data exchange with internal and external partners. He is also
responsible for handling privacy compliance and security-critical datasets.
Besides that work, he teaches a cloud computing course at the University of Victoria, Canada,
wherein he mentors high-school students in the CS faculty, and he also teaches people how to
land data jobs at Surfalytics.com. In addition, he is an author of analytics books and a speaker at
data-related conferences and user groups.
I want to thank my beautiful wife, Lana, and my kids, Vasily, Anna, and Michael, who give me the energy to
work, grow, and contribute to the data industry.
Xenia Ireton is a Senior Software Engineer at Microsoft. She has extensive knowledge in building
distributed services, data pipelines and data warehouses.
About the reviewers
Deepak Goyal is a certified Azure Cloud Solution Architect with over fifteen years
of expertise in designing, developing, and managing enterprise cloud solutions. He is also a Big
Data Certified professional and a passionate Cloud advocate.
Saikat Dutta is an Azure Data Engineer with over 13 years of experience. He has worked
extensively with Microsoft Data products, from SQL Server 2000 to ADF, Synapse Pipelines, and
MS Fabric. His career is shaped by various employers. The highlights of his career have been
adaptability and a commitment to staying at the forefront of technology.
This is his first book review, wherein he has tried to provide practical insights into Microsoft Data
products and tried to help the book become more than a cookbook. He has also contributed to a
popular Data Newsletter and blog to share knowledge in the tech community.
Excited about the book’s impact, I look forward to continuing my journey in the evolving field
of Data Engineering.
I express gratitude to my family for their unwavering support during the review process. Balancing work and
family, especially with a younger kid, wouldn’t have been possible without their cooperation.
Join our community on Discord
Join our community’s Discord space for discussions with the authors and other readers:
https://discord.gg/U229qmBmT3
Table of Contents
Preface
    There’s more...
    See also
Creating a data factory using PowerShell
    Getting ready
    How to do it…
    How it works...
    There’s more...
    See also
Using templates to create ADF pipelines
    Getting ready
    How to do it...
    How it works...
    See also
Creating an Azure Data Factory using Azure Bicep
    Getting ready
    How to do it...
    How it works...
    There’s more...
    See also
    How it works…
    There’s more…
Using the ForEach and Filter activities
    Getting ready
    How to do it…
    How it works…
Chaining and branching activities within a pipeline
    Getting ready
    How to do it…
    There’s more…
Using the Lookup, Web, and Execute Pipeline activities
    Getting ready
    How to do it…
    How it works…
    There’s more…
    See also
Creating event-based pipeline triggers
    Getting ready
    How to do it…
    How it works…
    There’s more…
    See also
    How to do it…
    How it works…
    There’s more…
Loading data to Azure Synapse Analytics using Azure Data Studio
    Getting ready
    How to do it…
    How it works…
    There’s more…
Loading data to Azure Synapse Analytics using bulk load
    Getting ready
    How to do it…
    How it works…
Pausing/resuming an Azure Synapse SQL pool from Azure Data Factory
    Getting ready
    How to do it…
    How it works…
    There’s more…
Working with Azure Purview using Azure Synapse
    Getting ready
    How to do it…
    How it works…
    There’s more...
Copying data in Azure Synapse Integrate
    Getting ready
    How to do it…
    How it works…
Using a Synapse serverless SQL pool
    Getting ready
    How to do it…
    How it works…
    There’s more…
Processing data from Azure Data Lake with HDInsight and Hive
    Getting ready
    How to do it…
    How it works…
Building data models in Delta Lake and data pipeline jobs with Databricks
    Getting ready
    How to do it…
    How it works…
    There is more…
Ingesting data into Delta Lake using Mapping Data Flows
    Getting ready
    How to do it…
    How it works…
    There is more…
External integrations with other compute engines (Snowflake)
    Getting ready
    How to do it…
    How it works…
    There is more…
Index
Preface
Azure Data Factory (ADF) is a modern data integration tool available on Microsoft Azure. This
Azure Data Factory Cookbook, Second Edition helps you get up and running by showing you how to
create and execute your first job in ADF. You’ll learn how to branch and chain activities, create
custom activities, and schedule pipelines. This book will help you discover the benefits of cloud data
warehousing, Azure Synapse Analytics, Azure Data Lake Storage Gen2, and Databricks, which
are frequently used for Big Data Analytics. Through practical recipes, you’ll learn how to actively
engage with analytical tools from Azure Data Services and leverage your on-premises infrastructure
with cloud-native tools to get relevant business insights.
As you advance, you’ll be able to integrate the most commonly used Azure services into ADF and
understand how Azure services can be useful in designing ETL pipelines. The book will take you
through the common errors that you may encounter while working with ADF and guide you in
using the Azure portal to monitor pipelines. You’ll also understand error messages and resolve
problems in connectors and data flows with the debugging capabilities of ADF.
The book also covers Microsoft Fabric, the most recent technology discussed here. You’ll
explore how it enhances capabilities for data integration and orchestration.
By the end of this book, you’ll be able to use ADF as the main ETL and orchestration tool for your
data warehouse and data platform projects.
Chapter 2, Orchestration and Control Flow, will introduce you to the building blocks of data
processing in ADF. The chapter contains hands-on exercises that show you how to set up linked
services and datasets for your data sources, use various types of activities, design data-processing
workflows, and create triggers for data transfers.
Chapter 3, Setting Up Synapse Analytics, covers key features and benefits of cloud data warehousing
and Azure Synapse Analytics. You will learn how to connect and configure Azure Synapse Analytics,
load data, build transformation processes, and operate data flows.
Chapter 4, Working with Data Lake and Spark Pools, will cover the main features of the Azure Data
Lake Storage Gen2. It is a multimodal cloud storage solution that is frequently used for big data
analytics. We will load and manage the datasets that we will use for analytics in the next chapter.
Chapter 5, Working with Big Data and Databricks, will actively engage with analytical tools from
Azure’s data services. You will learn how to build data models in Delta Lake using Azure Databricks
and mapping data flows. This chapter will also show you how to set up HDInsight clusters and
how to work with delta tables.
Chapter 6, Data Migration – Azure Data Factory and Other Cloud Services, will walk through several
illustrative examples of migrating data from Amazon Web Services and Google Cloud providers.
In addition, you will learn how to use ADF’s custom activities to work with providers who are
not supported by Microsoft’s built-in connectors.
Chapter 7, Extending Azure Data Factory with Logic Apps and Azure Functions, will show you how to
harness the power of serverless execution by integrating some of the most commonly used Azure
services: Azure Logic Apps and Azure Functions. These recipes will help you understand how
Azure services can be useful in designing Extract, Transform, Load (ETL) pipelines.
Chapter 8, Microsoft Fabric and Power BI, Azure ML, and Cognitive Services, will teach you how to
build an ADF pipeline that operates on a pre-built Azure ML model. You will also create and run
an ADF pipeline that leverages Azure AI for text data analysis. In the last three recipes, you’ll
familiarize yourself with the primary components of Microsoft Fabric Data Factory.
Chapter 9, Managing Deployment Processes with Azure DevOps, will delve into setting up CI and
CD for data analytics solutions in ADF using Azure DevOps. Throughout the process, we will
also demonstrate how to use Visual Studio Code to facilitate the deployment of changes to ADF.
Chapter 10, Monitoring and Troubleshooting Data Pipelines, will introduce tools to help you manage
and monitor your ADF pipelines. You will learn where and how to find more information about
what went wrong when a pipeline failed, how to debug a failed run, how to set up alerts that
notify you when there is a problem, and how to identify problems with your integration runtimes.
Chapter 11, Working with Azure Data Explorer, will help you to set up a data ingestion pipeline from
ADF to Azure Data Explorer: it includes a step-by-step guide to ingesting JSON data from Azure
Storage and will teach you how to transform data in Azure Data Explorer with ADF activities.
Chapter 12, The Best Practices of Working with ADF, will guide you through essential considerations,
strategies, and practical recipes that will elevate your ADF projects to new heights of efficiency,
security, and scalability.
If you are using the digital version of this book, we advise you to type the code yourself or access
the code via the GitHub repository (link available in the next section). Doing so will help you
avoid any potential errors related to the copying and pasting of code.
We also have other code bundles from our rich catalog of books and videos available at
https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file
extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: “Mount the
downloaded WebStorm-10*.dmg disk image file as another disk in your system.”
When we wish to draw your attention to a particular part of a code block, the relevant lines or
items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Bold: Indicates a new term, an important word, or words that you see on the screen. For instance,
words in menus or dialog boxes also appear in the text like this. For example: “Select System info
from the Administration panel.”
Get in touch
Feedback from our readers is always welcome.
General feedback: Email [email protected], and mention the book’s title in the subject of
your message. If you have questions about any aspect of this book, please email us at
questions@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do
happen. If you have found a mistake in this book, we would be grateful if you would report it
to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on
the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would
be grateful if you would provide us with the location address or website name. Please contact us
at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you
are interested in either writing or contributing to a book, please visit
http://authors.packtpub.com.
Your review is important to us and the tech community and will help us make sure we’re
delivering excellent quality content.
Download a free PDF copy of this book
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical
books directly into your application.
The perks don’t stop there. You can get exclusive access to discounts, newsletters, and great free
content in your inbox daily.
https://packt.link/free-ebook/9781803246598
Microsoft Azure offers a cloud analytics stack that helps us to build modern analytics solutions,
extract data from on-premises and cloud sources, and use that data for decision-making, finding
patterns, and deploying machine learning applications.
In this chapter, we will meet Azure data platform services and the main cloud data integration
service – Azure Data Factory (ADF). We will log in to Azure and navigate to the Data Factory
service in order to create the first data pipeline and run the copy activity. Then, we will do the
same exercise but will use different methods of data factory management and control by using
Python, PowerShell, and the Copy Data tool.
If you don’t have an Azure account, we will cover how you can get a free Azure account.
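To make the idea of a pipeline with a copy activity more concrete before touching the portal, here is a minimal sketch in Python that only builds the JSON definition of such a pipeline. It does not call Azure. The pipeline, activity, and dataset names are hypothetical, and the BlobSource and BlobSink types are assumed here for a blob-to-blob copy; your own source and sink types will depend on the data stores you use.

```python
import json


def build_copy_pipeline(name, source_dataset, sink_dataset):
    """Return an ADF-style pipeline definition containing one Copy activity."""
    copy_activity = {
        "name": "CopyFromSourceToSink",  # hypothetical activity name
        "type": "Copy",
        # Datasets are referenced by name; they must already exist in the factory.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Assumed blob-to-blob copy; other stores use other source/sink types.
            "source": {"type": "BlobSource"},
            "sink": {"type": "BlobSink"},
        },
    }
    return {"name": name, "properties": {"activities": [copy_activity]}}


# Hypothetical names, for illustration only.
pipeline = build_copy_pipeline("FirstPipeline", "InputBlobDataset", "OutputBlobDataset")
print(json.dumps(pipeline, indent=2))
```

A definition of this shape is what the authoring tools ultimately produce, so reading it can help when you later compare pipelines created through the portal, Python, or PowerShell.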
“And when ’e downs ’is ’ead and ’umps ’is back, ye cawn’t
remain, y’ know!”
—Hoyle.
JEFF pulled the paper over and began to scribble madly; pausing
from time to time to glance around for inspiration: at the Judge, at
Mac, at the papers, the books, the typewriter. “I’ll slip in a note for the
kids,” said Jeff. His lips moved, his eyes kindled in his eager
absorption; his face took on a softer and tenderer look. The Judge,
watching him, beamed with almost paternal indulgence.
On the whole Jeff wrote with amazing swiftness for a man who
professed to be unaccustomed to lying. For this communication,
apparently so spontaneous, dashed off by a man hardly yet clear of
the shadow of death, was learned by rote, no syllable
unpremeditated, the very blots of it designed.
This is what he handed the Judge at last:
San Miguel, Chihuahua, March 24.
My dear Wife:
Since I last wrote you I have been on a long trip into the
Yaqui country as guide, interpreter and friend to a timid
tenderfoot—and all-round sharp from the Smithsonian. His
main lay is Cliff-Dweller-ology, but he does other stunts—
rocks and bugs and Indian languages, and early Spanish
relics.
I get big pay. I enclose you $100——
“A hundred dollars! Why, this is blackmail!” remonstrated the Judge,
grinning nevertheless.
“But,” said Jeff, “I’ve got to send it. She knows I wouldn’t stay away
except for good big pay, and she knows I’ll send the big pay to her. I
didn’t think you were a piker. Why, I had thirty dollars in my pocket.
You won’t be out but seventy. And if you don’t send it she’ll know the
letter is a fake. Besides, she needs the money.”
“I surrender! I’ll send it,” said the Judge, and resumed his reading:
——and will send you more when I get back from next trip.
Going way down in the Sierra Madre this time. Don’t know
when we will hit civilization again, so you needn’t write till
you hear from me.
The Cliff-Dweller-ologist had the El Paso papers sent on
here to him and I am reading them all through while he
writes letters and reports and things. I am reading some of
his books, too.
Mary, I always hated it because I didn’t have a better
education. I used to wonder if you wasn’t sometimes
ashamed of me when we was first married. But I’ve
learned a heap from you and I’ve picked up considerable,
reading, these last few years—and I begin to see that
there are compensations in all things. I see a good deal in
things I read now that I would have missed if I’d just
skimmed over the surface when I was younger. For
instance, I’ve just made the acquaintance of Julius Cæsar
—introduced by my chief.
Say, that’s a great book! And I just know I’m getting more
out of it than if I’d been familiar with it ever since I was a
boy, with stone-bruises on my hoofs. I’ve read it over two
or three times now, and find things every time that I didn’t
quite get before.
It ought to be called Yond Caius Cassius, though.
Shakspere makes Julius out to be a superstitious old
wretch. But Julius had some pretty good hunches at that.
Of course Mark Antony’s wonderful speech at the funeral
was fine business. Gee! how he skinned the “Honorable
men!” Some of the things he said after that will stand
reading, too.
But Yond Cassius, he was the man for my money. He was
a regular go-getter. If Brutus had only hearkened to
Cassius once in a while they’d have made a different play
of it. I didn’t like Brutus near so well. He was a four-flusher.
Said he wouldn’t kill himself and sure enough he did. He
was set up and heady and touchy. I shouldn’t wonder if he
was better than Cassius, just morally. I guess maybe that’s
why Cassius knuckled down to him and humored him so.
But intellectually, and as a man of action, he wasn’t ace-
high to Cassius.
Still there’s no denying that Brutus had a fine line of talk.
There was his farewell to Cassius—you remember that—
and his parting with his other friends.
I’ve been reading Carlyle’s “French Revolution” too. It’s a
little too deep for me, so I take it in small doses. It looks to
me like a great writer could take a page of it and build a
book on it.
Well, that’s all I know. Oh, yes! I tried to learn typewriting
when I was in El Paso—I musn’t forget that. I made up a
sentence with all the letters in it—he kept vexing me by
frantic journeys hidden with quiet zeal—I got so I could
rattle that off pretty well, but when I tried new stuff I got
balled up.
Will write you when I can. George will know what to do
with the work. Have the boys help him.
Your loving husband,
Jeff.
Dear Kids:
I wish you could see some of the places I saw in the
mountains. We took the train to Casas Grandes and went
with a pack outfit to Durasno and Tarachi, just over the line
into Sonora. That’s one fine country. Had a good time
going and coming, but when we got there and my chief
was snooping around in those musty old underground
cave houses I was bored a-plenty. One day I remember I
lay in camp with nothing to do and read every line of an
old El Paso paper, ads and all.
Leo, you’re getting to be a big boy now. I want you to get
into something better than punching cows. When you get
time you ought to go down to your Uncle Sim’s and make
a start on learning to use a typewriter. I’ve been trying it
myself, but it’s hard for an old dog to learn new tricks.
You and Wesley must both help your mother, and help
George. Do what George tells you—he knows more about
things than you do. Be good kids. I’ll be home just as soon
as I can.
Dad.
“There,” said Jeff, “if there’s anything you want to blue-pencil I’ll write
it over. Anything you want to say suits me so long as it goes.”
“Why, this seems all right,” said the Judge, after reading it. “I have an
envelope in my billbook. Address it, but don’t seal it. You might
attempt to put in some inclosure by sleight-of-hand. If you try any
such trick I shall consider myself absolved from any promise. If you
don’t, I’ll mail it. I always prefer not to lie when I have nothing to gain
by lying. Bless my soul, how you have blotted it!”
“Yes. I’m getting nervous,” said Jeff.
The envelope bore the address:
MRS. JEFF BRANSFORD,
Rainbow South,
Escondido, N. M.