
Microsoft Official Course

DP-900T00
Microsoft Azure Data Fundamentals
Disclaimer

 
Information in this document, including URL and other Internet Web site references, is subject to change
without notice. Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with
any real company, organization, product, domain name, e-mail address, logo, person, place or event is
intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the
user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in 
or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
 
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
 
The names of manufacturers, products, or URLs are provided for informational purposes only and
Microsoft makes no representations or warranties, either expressed, implied, or statutory, regarding
these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a
manufacturer or product does not imply endorsement by Microsoft of the manufacturer or product. Links
may be provided to third party sites. Such sites are not under the control of Microsoft and Microsoft is
not responsible for the contents of any linked site or any link contained in a linked site, or any changes or
updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission
received from any linked site. Microsoft is providing these links to you only as a convenience, and the
inclusion of any link does not imply endorsement by Microsoft of the site or the products contained
therein.
 
© 2019 Microsoft Corporation. All rights reserved.
 
Microsoft and the trademarks listed at http://www.microsoft.com/trademarks1 are trademarks of the
Microsoft group of companies. All other trademarks are property of their respective owners.
 
 

1 http://www.microsoft.com/trademarks

MICROSOFT LICENSE TERMS


MICROSOFT INSTRUCTOR-LED COURSEWARE
These license terms are an agreement between Microsoft Corporation (or based on where you live, one
of its affiliates) and you. Please read them. They apply to your use of the content accompanying this
agreement which includes the media on which you received it, if any. These license terms also apply to
Trainer Content and any updates and supplements for the Licensed Content unless other terms accompa-
ny those items. If so, those terms apply.
BY ACCESSING, DOWNLOADING OR USING THE LICENSED CONTENT, YOU ACCEPT THESE TERMS.
IF YOU DO NOT ACCEPT THEM, DO NOT ACCESS, DOWNLOAD OR USE THE LICENSED CONTENT.
If you comply with these license terms, you have the rights below for each license you acquire.
1. DEFINITIONS.
1. “Authorized Learning Center” means a Microsoft Imagine Academy (MSIA) Program Member,
Microsoft Learning Competency Member, or such other entity as Microsoft may designate from
time to time.
2. “Authorized Training Session” means the instructor-led training class using Microsoft Instruc-
tor-Led Courseware conducted by a Trainer at or through an Authorized Learning Center.
3. “Classroom Device” means one (1) dedicated, secure computer that an Authorized Learning Center
owns or controls that is located at an Authorized Learning Center’s training facilities that meets or
exceeds the hardware level specified for the particular Microsoft Instructor-Led Courseware.
4. “End User” means an individual who is (i) duly enrolled in and attending an Authorized Training
Session or Private Training Session, (ii) an employee of an MPN Member (defined below), or (iii) a
Microsoft full-time employee, a Microsoft Imagine Academy (MSIA) Program Member, or a
Microsoft Learn for Educators – Validated Educator.
5. “Licensed Content” means the content accompanying this agreement which may include the
Microsoft Instructor-Led Courseware or Trainer Content.
6. “Microsoft Certified Trainer” or “MCT” means an individual who is (i) engaged to teach a training
session to End Users on behalf of an Authorized Learning Center or MPN Member, and (ii) current-
ly certified as a Microsoft Certified Trainer under the Microsoft Certification Program.
7. “Microsoft Instructor-Led Courseware” means the Microsoft-branded instructor-led training course
that educates IT professionals, developers, students at an academic institution, and other learners
on Microsoft technologies. A Microsoft Instructor-Led Courseware title may be branded as MOC,
Microsoft Dynamics, or Microsoft Business Group courseware.
8. “Microsoft Imagine Academy (MSIA) Program Member” means an active member of the Microsoft
Imagine Academy Program.
9. “Microsoft Learn for Educators – Validated Educator” means an educator who has been validated
through the Microsoft Learn for Educators program as an active educator at a college, university,
community college, polytechnic or K-12 institution.
10. “Microsoft Learning Competency Member” means an active member of the Microsoft Partner
Network program in good standing that currently holds the Learning Competency status.
11. “MOC” means the “Official Microsoft Learning Product” instructor-led courseware known as
Microsoft Official Course that educates IT professionals, developers, students at an academic
institution, and other learners on Microsoft technologies.
12. “MPN Member” means an active Microsoft Partner Network program member in good standing.

13. “Personal Device” means one (1) personal computer, device, workstation or other digital electronic
device that you personally own or control that meets or exceeds the hardware level specified for
the particular Microsoft Instructor-Led Courseware.
14. “Private Training Session” means the instructor-led training classes provided by MPN Members for
corporate customers to teach a predefined learning objective using Microsoft Instructor-Led
Courseware. These classes are not advertised or promoted to the general public and class attend-
ance is restricted to individuals employed by or contracted by the corporate customer.
15. “Trainer” means (i) an academically accredited educator engaged by a Microsoft Imagine Academy
Program Member to teach an Authorized Training Session, (ii) an academically accredited educator
validated as a Microsoft Learn for Educators – Validated Educator, and/or (iii) a MCT.
16. “Trainer Content” means the trainer version of the Microsoft Instructor-Led Courseware and
additional supplemental content designated solely for Trainers’ use to teach a training session
using the Microsoft Instructor-Led Courseware. Trainer Content may include Microsoft PowerPoint
presentations, trainer preparation guide, train the trainer materials, Microsoft One Note packs,
classroom setup guide and Pre-release course feedback form. To clarify, Trainer Content does not
include any software, virtual hard disks or virtual machines.
2. USE RIGHTS. The Licensed Content is licensed, not sold. The Licensed Content is licensed on a one
copy per user basis, such that you must acquire a license for each individual that accesses or uses the
Licensed Content.
●● 2.1 Below are five separate sets of use rights. Only one set of rights applies to you.
1. If you are a Microsoft Imagine Academy (MSIA) Program Member:
1. Each license acquired on behalf of yourself may only be used to review one (1) copy of the
Microsoft Instructor-Led Courseware in the form provided to you. If the Microsoft Instruc-
tor-Led Courseware is in digital format, you may install one (1) copy on up to three (3)
Personal Devices. You may not install the Microsoft Instructor-Led Courseware on a device
you do not own or control.
2. For each license you acquire on behalf of an End User or Trainer, you may either:

1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User who is enrolled in the Authorized Training Session, and only immediately
prior to the commencement of the Authorized Training Session that is the subject matter
of the Microsoft Instructor-Led Courseware being provided, or
2. provide one (1) End User with the unique redemption code and instructions on how they
can access one (1) digital version of the Microsoft Instructor-Led Courseware, or
3. provide one (1) Trainer with the unique redemption code and instructions on how they
can access one (1) Trainer Content.
3. For each license you acquire, you must comply with the following:

1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure each End User attending an Authorized Training Session has their own
valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of the
Authorized Training Session,
3. you will ensure that each End User provided with the hard-copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agree-
ment in a manner that is enforceable under local law prior to their accessing the Micro-
soft Instructor-Led Courseware,
4. you will ensure that each Trainer teaching an Authorized Training Session has their own
valid licensed copy of the Trainer Content that is the subject of the Authorized Training
Session,
5. you will only use qualified Trainers who have in-depth knowledge of and experience with
the Microsoft technology that is the subject of the Microsoft Instructor-Led Courseware
being taught for all your Authorized Training Sessions,
6. you will only deliver a maximum of 15 hours of training per week for each Authorized
Training Session that uses a MOC title, and
7. you acknowledge that Trainers that are not MCTs will not have access to all of the trainer
resources for the Microsoft Instructor-Led Courseware.
2. If you are a Microsoft Learning Competency Member:
1. Each license acquired may only be used to review one (1) copy of the Microsoft Instruc-
tor-Led Courseware in the form provided to you. If the Microsoft Instructor-Led Course-
ware is in digital format, you may install one (1) copy on up to three (3) Personal Devices.
You may not install the Microsoft Instructor-Led Courseware on a device you do not own or
control.
2. For each license you acquire on behalf of an End User or MCT, you may either:
1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User attending the Authorized Training Session and only immediately prior to
the commencement of the Authorized Training Session that is the subject matter of the
Microsoft Instructor-Led Courseware provided, or
2. provide one (1) End User attending the Authorized Training Session with the unique
redemption code and instructions on how they can access one (1) digital version of the
Microsoft Instructor-Led Courseware, or
3. you will provide one (1) MCT with the unique redemption code and instructions on how
they can access one (1) Trainer Content.
3. For each license you acquire, you must comply with the following:
1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure that each End User attending an Authorized Training Session has their
own valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of
the Authorized Training Session,
3. you will ensure that each End User provided with a hard-copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agree-
ment in a manner that is enforceable under local law prior to their accessing the Micro-
soft Instructor-Led Courseware,
4. you will ensure that each MCT teaching an Authorized Training Session has their own
valid licensed copy of the Trainer Content that is the subject of the Authorized Training
Session,
5. you will only use qualified MCTs who also hold the applicable Microsoft Certification
credential that is the subject of the MOC title being taught for all your Authorized
Training Sessions using MOC,
6. you will only provide access to the Microsoft Instructor-Led Courseware to End Users,
and
7. you will only provide access to the Trainer Content to MCTs.
3. If you are a MPN Member:
1. Each license acquired on behalf of yourself may only be used to review one (1) copy of the
Microsoft Instructor-Led Courseware in the form provided to you. If the Microsoft Instruc-
tor-Led Courseware is in digital format, you may install one (1) copy on up to three (3)
Personal Devices. You may not install the Microsoft Instructor-Led Courseware on a device
you do not own or control.
2. For each license you acquire on behalf of an End User or Trainer, you may either:

1. distribute one (1) hard copy version of the Microsoft Instructor-Led Courseware to one
(1) End User attending the Private Training Session, and only immediately prior to the
commencement of the Private Training Session that is the subject matter of the Micro-
soft Instructor-Led Courseware being provided, or
2. provide one (1) End User who is attending the Private Training Session with the unique
redemption code and instructions on how they can access one (1) digital version of the
Microsoft Instructor-Led Courseware, or
3. you will provide one (1) Trainer who is teaching the Private Training Session with the
unique redemption code and instructions on how they can access one (1) Trainer
Content.
3. For each license you acquire, you must comply with the following:

1. you will only provide access to the Licensed Content to those individuals who have
acquired a valid license to the Licensed Content,
2. you will ensure that each End User attending a Private Training Session has their own
valid licensed copy of the Microsoft Instructor-Led Courseware that is the subject of the
Private Training Session,
3. you will ensure that each End User provided with a hard copy version of the Microsoft
Instructor-Led Courseware will be presented with a copy of this agreement and each End
User will agree that their use of the Microsoft Instructor-Led Courseware will be subject
to the terms in this agreement prior to providing them with the Microsoft Instructor-Led
Courseware. Each individual will be required to denote their acceptance of this agree-
ment in a manner that is enforceable under local law prior to their accessing the Micro-
soft Instructor-Led Courseware,
4. you will ensure that each Trainer teaching a Private Training Session has their own valid
licensed copy of the Trainer Content that is the subject of the Private Training Session,
5. you will only use qualified Trainers who hold the applicable Microsoft Certification
credential that is the subject of the Microsoft Instructor-Led Courseware being taught
for all your Private Training Sessions,
6. you will only use qualified MCTs who hold the applicable Microsoft Certification creden-
tial that is the subject of the MOC title being taught for all your Private Training Sessions
using MOC,
7. you will only provide access to the Microsoft Instructor-Led Courseware to End Users,
and
8. you will only provide access to the Trainer Content to Trainers.
4. If you are an End User:
For each license you acquire, you may use the Microsoft Instructor-Led Courseware solely for
your personal training use. If the Microsoft Instructor-Led Courseware is in digital format, you
may access the Microsoft Instructor-Led Courseware online using the unique redemption code
provided to you by the training provider and install and use one (1) copy of the Microsoft
Instructor-Led Courseware on up to three (3) Personal Devices. You may also print one (1) copy
of the Microsoft Instructor-Led Courseware. You may not install the Microsoft Instructor-Led
Courseware on a device you do not own or control.
5. If you are a Trainer.
1. For each license you acquire, you may install and use one (1) copy of the Trainer Content in
the form provided to you on one (1) Personal Device solely to prepare and deliver an
Authorized Training Session or Private Training Session, and install one (1) additional copy
on another Personal Device as a backup copy, which may be used only to reinstall the
Trainer Content. You may not install or use a copy of the Trainer Content on a device you do
not own or control. You may also print one (1) copy of the Trainer Content solely to prepare
for and deliver an Authorized Training Session or Private Training Session.
2. If you are an MCT, you may customize the written portions of the Trainer Content that are
logically associated with instruction of a training session in accordance with the most recent
version of the MCT agreement.
3. If you elect to exercise the foregoing rights, you agree to comply with the following: (i)
customizations may only be used for teaching Authorized Training Sessions and Private
Training Sessions, and (ii) all customizations will comply with this agreement. For clarity, any
use of “customize” refers only to changing the order of slides and content, and/or not using
all the slides or content, it does not mean changing or modifying any slide or content.
●● 2.2 Separation of Components. The Licensed Content is licensed as a single unit and you
may not separate its components and install them on different devices.
●● 2.3 Redistribution of Licensed Content. Except as expressly provided in the use rights
above, you may not distribute any Licensed Content or any portion thereof (including any permit-
ted modifications) to any third parties without the express written permission of Microsoft.
●● 2.4 Third Party Notices. The Licensed Content may include third party code that Micro-
soft, not the third party, licenses to you under this agreement. Notices, if any, for the third party
code are included for your information only.
●● 2.5 Additional Terms. Some Licensed Content may contain components with additional
terms, conditions, and licenses regarding its use. Any non-conflicting terms in those conditions
and licenses also apply to your use of that respective component and supplement the terms
described in this agreement.

3. LICENSED CONTENT BASED ON PRE-RELEASE TECHNOLOGY. If the Licensed Content’s subject
matter is based on a pre-release version of Microsoft technology (“Pre-release”), then in addition to
the other provisions in this agreement, these terms also apply:
1. Pre-Release Licensed Content. This Licensed Content subject matter is on the Pre-release
version of the Microsoft technology. The technology may not work the way a final version of the
technology will and we may change the technology for the final version. We also may not release a
final version. Licensed Content based on the final version of the technology may not contain the
same information as the Licensed Content based on the Pre-release version. Microsoft is under no
obligation to provide you with any further content, including any Licensed Content based on the
final version of the technology.
2. Feedback. If you agree to give feedback about the Licensed Content to Microsoft, either directly
or through its third party designee, you give to Microsoft without charge, the right to use, share
and commercialize your feedback in any way and for any purpose. You also give to third parties,
without charge, any patent rights needed for their products, technologies and services to use or
interface with any specific parts of a Microsoft technology, Microsoft product, or service that
includes the feedback. You will not give feedback that is subject to a license that requires Micro-
soft to license its technology, technologies, or products to third parties because we include your
feedback in them. These rights survive this agreement.
3. Pre-release Term. If you are a Microsoft Imagine Academy Program Member, Microsoft Learn-
ing Competency Member, MPN Member, Microsoft Learn for Educators – Validated Educator, or
Trainer, you will cease using all copies of the Licensed Content on the Pre-release technology upon
(i) the date which Microsoft informs you is the end date for using the Licensed Content on the
Pre-release technology, or (ii) sixty (60) days after the commercial release of the technology that is
the subject of the Licensed Content, whichever is earliest (“Pre-release term”). Upon expiration or
termination of the Pre-release term, you will irretrievably delete and destroy all copies of the
Licensed Content in your possession or under your control.
4. SCOPE OF LICENSE. The Licensed Content is licensed, not sold. This agreement only gives you some
rights to use the Licensed Content. Microsoft reserves all other rights. Unless applicable law gives you
more rights despite this limitation, you may use the Licensed Content only as expressly permitted in
this agreement. In doing so, you must comply with any technical limitations in the Licensed Content
that only allow you to use it in certain ways. Except as expressly permitted in this agreement, you
may not:
●● access or allow any individual to access the Licensed Content if they have not acquired a valid
license for the Licensed Content,
●● alter, remove or obscure any copyright or other protective notices (including watermarks), brand-
ing or identifications contained in the Licensed Content,
●● modify or create a derivative work of any Licensed Content,
●● publicly display, or make the Licensed Content available for others to access or use,
●● copy, print, install, sell, publish, transmit, lend, adapt, reuse, link to or post, make available or
distribute the Licensed Content to any third party,
●● work around any technical limitations in the Licensed Content, or
●● reverse engineer, decompile, remove or otherwise thwart any protections or disassemble the
Licensed Content except and only to the extent that applicable law expressly permits, despite this
limitation.
5. RESERVATION OF RIGHTS AND OWNERSHIP. Microsoft reserves all rights not expressly granted to
you in this agreement. The Licensed Content is protected by copyright and other intellectual property
laws and treaties. Microsoft or its suppliers own the title, copyright, and other intellectual property
rights in the Licensed Content.
6. EXPORT RESTRICTIONS. The Licensed Content is subject to United States export laws and regula-
tions. You must comply with all domestic and international export laws and regulations that apply to
the Licensed Content. These laws include restrictions on destinations, end users and end use. For
additional information, see www.microsoft.com/exporting.
7. SUPPORT SERVICES. Because the Licensed Content is provided “as is”, we are not obligated to
provide support services for it.
8. TERMINATION. Without prejudice to any other rights, Microsoft may terminate this agreement if you
fail to comply with the terms and conditions of this agreement. Upon termination of this agreement
for any reason, you will immediately stop all use of and delete and destroy all copies of the Licensed
Content in your possession or under your control.
9. LINKS TO THIRD PARTY SITES. You may link to third party sites through the use of the Licensed
Content. The third party sites are not under the control of Microsoft, and Microsoft is not responsible
for the contents of any third party sites, any links contained in third party sites, or any changes or
updates to third party sites. Microsoft is not responsible for webcasting or any other form of trans-
mission received from any third party sites. Microsoft is providing these links to third party sites to
you only as a convenience, and the inclusion of any link does not imply an endorsement by Microsoft
of the third party site.
10. ENTIRE AGREEMENT. This agreement, and any additional terms for the Trainer Content, updates and
supplements are the entire agreement for the Licensed Content, updates and supplements.
11. APPLICABLE LAW.
1. United States. If you acquired the Licensed Content in the United States, Washington state law
governs the interpretation of this agreement and applies to claims for breach of it, regardless of
conflict of laws principles. The laws of the state where you live govern all other claims, including
claims under state consumer protection laws, unfair competition laws, and in tort.
2. Outside the United States. If you acquired the Licensed Content in any other country, the laws of
that country apply.
12. LEGAL EFFECT. This agreement describes certain legal rights. You may have other rights under the
laws of your country. You may also have rights with respect to the party from whom you acquired the
Licensed Content. This agreement does not change your rights under the laws of your country if the
laws of your country do not permit it to do so.
13. DISCLAIMER OF WARRANTY. THE LICENSED CONTENT IS LICENSED "AS-IS" AND "AS AVAILABLE."
YOU BEAR THE RISK OF USING IT. MICROSOFT AND ITS RESPECTIVE AFFILIATES GIVE NO
EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS. YOU MAY HAVE ADDITIONAL CONSUMER
RIGHTS UNDER YOUR LOCAL LAWS WHICH THIS AGREEMENT CANNOT CHANGE. TO THE EXTENT
PERMITTED UNDER YOUR LOCAL LAWS, MICROSOFT AND ITS RESPECTIVE AFFILIATES EXCLUDE
ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NON-INFRINGEMENT.
14. LIMITATION ON AND EXCLUSION OF REMEDIES AND DAMAGES. YOU CAN RECOVER FROM
MICROSOFT, ITS RESPECTIVE AFFILIATES AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP TO
US$5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL, LOST
PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES.

This limitation applies to


●● anything related to the Licensed Content, services, content (including code) on third party Internet
sites or third-party programs; and
●● claims for breach of contract, breach of warranty, guarantee or condition, strict liability, negligence,
or other tort to the extent permitted by applicable law.
It also applies even if Microsoft knew or should have known about the possibility of the damages. The
above limitation or exclusion may not apply to you because your country may not allow the exclusion
or limitation of incidental, consequential, or other damages.
Please note: As this Licensed Content is distributed in Quebec, Canada, some of the clauses in this
agreement are provided below in French.
Remarque : Ce le contenu sous licence étant distribué au Québec, Canada, certaines des clauses
dans ce contrat sont fournies ci-dessous en français.
EXONÉRATION DE GARANTIE. Le contenu sous licence visé par une licence est offert « tel quel ». Toute
utilisation de ce contenu sous licence est à votre seule risque et péril. Microsoft n’accorde aucune autre
garantie expresse. Vous pouvez bénéficier de droits additionnels en vertu du droit local sur la protection
dues consommateurs, que ce contrat ne peut modifier. La ou elles sont permises par le droit locale, les
garanties implicites de qualité marchande, d’adéquation à un usage particulier et d’absence de contre-
façon sont exclues.
LIMITATION DES DOMMAGES-INTÉRÊTS ET EXCLUSION DE RESPONSABILITÉ POUR LES DOMMAG-
ES. Vous pouvez obtenir de Microsoft et de ses fournisseurs une indemnisation en cas de dommages
directs uniquement à hauteur de 5,00 $ US. Vous ne pouvez prétendre à aucune indemnisation pour les
autres dommages, y compris les dommages spéciaux, indirects ou accessoires et pertes de bénéfices.
Cette limitation concerne:
●● tout ce qui est relié au le contenu sous licence, aux services ou au contenu (y compris le code)
figurant sur des sites Internet tiers ou dans des programmes tiers; et.
●● les réclamations au titre de violation de contrat ou de garantie, ou au titre de responsabilité stricte, de
négligence ou d’une autre faute dans la limite autorisée par la loi en vigueur.
Elle s’applique également, même si Microsoft connaissait ou devrait connaître l’éventualité d’un tel
dommage. Si votre pays n’autorise pas l’exclusion ou la limitation de responsabilité pour les dommages
indirects, accessoires ou de quelque nature que ce soit, il se peut que la limitation ou l’exclusion ci-dessus
ne s’appliquera pas à votre égard.
EFFET JURIDIQUE. Le présent contrat décrit certains droits juridiques. Vous pourriez avoir d’autres droits
prévus par les lois de votre pays. Le présent contrat ne modifie pas les droits que vous confèrent les lois
de votre pays si celles-ci ne le permettent pas.
Revised April 2019
Contents

■■ Module 0 Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1
Welcome to the course  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1
■■ Module 1 Explore core data concepts  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  3
Core data concepts  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  3
Data roles and services  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  14
■■ Module 2 Explore fundamentals of relational data in Azure  . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  25
Explore relational data offerings in Azure  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  25
Explore Azure services for relational data  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  36
■■ Module 3 Explore fundamentals of non-relational data in Azure  . . . . . . . . . . . . . . . . . . . . . . . . .  49
Fundamentals of Azure Storage  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  49
Fundamentals of Azure Cosmos DB  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  55
■■ Module 4 Explore fundamentals of data analytics  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  63
Modern data warehousing  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  63
Streaming and real-time analytics  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  70
Data visualization  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  80
Further learning  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  88
Module 0 Introduction

Welcome to the course


About this Course
Welcome to this course on Microsoft Azure Data Fundamentals!
This course is designed for anyone who wants to learn the fundamentals of database concepts in a cloud
environment, get basic skilling in cloud data services, and build their foundational knowledge of cloud
data services within Microsoft Azure. The course provides a practical, hands-on approach in which you
will get a chance to see data in action and try Azure data services for yourself.
The materials in this workbook are designed to be used alongside online modules in Microsoft Learn1.
Throughout the course, you'll find references to specific Learn modules that you should use to gain
hands-on experience.

Learning objectives
After completing this course, you will be able to:
●● Describe core data concepts in Azure.
●● Explain concepts of relational data in Azure.
●● Explain concepts of non-relational data in Azure.
●● Identify components of a modern data warehouse in Azure.

Course Agenda
This course includes the following modules:

Module 1: Explore fundamentals of data


In this module, you will explore core data concepts and data roles and services.

1 https://docs.microsoft.com/learn

Module 2: Explore fundamentals of relational data in Azure


In this module, you will explore relational data concepts, and Azure services for relational data.

Module 3: Explore fundamentals of non-relational data in Azure

In this module, you will explore Azure Storage for non-relational data, and the fundamentals of Azure
Cosmos DB.

Module 4: Explore fundamentals of data analytics


In this module, you will explore the fundamentals of modern data warehousing, streaming and
real-time analytics, and data visualization.

Lab environment
Labs in this course are based on exercises in Microsoft Learn2.
You will be provided with an Azure subscription for use in this class. Your instructor will provide details.

2 https://docs.microsoft.com/learn
Module 1 Explore core data concepts

Core data concepts


Introduction
Over the last few decades, the amount of data that systems, applications, and devices have generated has
increased significantly. Data is everywhere. Data is available in different structures and formats.
Understanding data and exploring it reveals interesting facts, and helps you gain meaningful insights.
In this lesson, you'll learn about how you can organize and process data. You'll learn about relational and
non-relational databases, and how data is handled through transactional processing, and through batch
and streaming data processing.
Imagine you're a data analyst for a large consumer organization. The organization wants to understand
customer buying patterns from supermarkets. The organization has a number of datasets from different
sources, such as till information (point of sale), weather data, and holiday data. The organization would
like to use Azure technologies to understand and analyze these datasets.

Learning objectives
In this lesson you will:
●● Identify how data is defined and stored
●● Identify characteristics of relational and non-relational data
●● Describe and differentiate data workloads
●● Describe and differentiate batch and streaming data

What is data?
Data is a collection of facts such as numbers, descriptions, and observations used to record information.
Data structures in which this data is organized often represent entities that are important to an
organization (such as customers, products, sales orders, and so on). Each entity typically has one or
more attributes, or characteristics (for example, a customer might have a name, an address, a phone
number, and so on).

You can classify data as structured, semi-structured, or unstructured.

Structured data
Structured data is data that adheres to a fixed schema, so all of the data has the same fields or properties.
Most commonly, the schema for structured data entities is tabular - in other words, the data is
represented in one or more tables that consist of rows to represent each instance of a data entity, and columns to
represent attributes of the entity. For example, the following image shows tabular data representations
for Customer and Product entities.
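For instance, a Customer entity might be represented as a table like the following (the column names here are illustrative assumptions, not taken from the original figure):
CustomerID  FirstName  LastName
1           Joe        Jones
2           Samir      Nadoy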

Structured data is often stored in a database in which multiple tables can reference one another by using
key values in a relational model, which we'll explore in more depth later.

Semi-structured data
Semi-structured data is information that has some structure, but which allows for some variation between
entity instances. For example, while most customers may have an email address, some might have
multiple email addresses, and some might have none at all.
One common format for semi-structured data is JavaScript Object Notation (JSON). The example below
shows a pair of JSON documents that represent customer information. Each customer document includes
address and contact information, but the specific fields vary between customers.
// Customer 1
{
  "firstName": "Joe",
  "lastName": "Jones",
  "address":
  {
    "streetAddress": "1 Main St.",
    "city": "New York",
    "state": "NY",
    "postalCode": "10099"
  },
  "contact":
  [
    {
      "type": "home",
      "number": "555 123-1234"
    },
    {
      "type": "email",
      "address": "[email protected]"
    }
  ]
}

// Customer 2
{
  "firstName": "Samir",
  "lastName": "Nadoy",
  "address":
  {
    "streetAddress": "123 Elm Pl.",
    "unit": "500",
    "city": "Seattle",
    "state": "WA",
    "postalCode": "98999"
  },
  "contact":
  [
    {
      "type": "email",
      "address": "[email protected]"
    }
  ]
}

NOTE:
JSON is just one of many ways in which semi-structured data can be represented. The point here is not to
provide a detailed examination of JSON syntax, but rather to illustrate the flexible nature of
semi-structured data representations.

Unstructured data
Not all data is structured or even semi-structured. For example, documents, images, audio and video
data, and binary files might not have a specific structure. This kind of data is referred to as unstructured
data.

Data stores
Organizations typically store data in structured, semi-structured, or unstructured format to record details
of entities (for example, customers and products), specific events (such as sales transactions), or other
information in documents, images, and other formats. The stored data can then be retrieved for analysis
and reporting later.
There are two broad categories of data store in common use:
●● File stores
●● Databases
We'll explore both of these types of data store in subsequent topics.

How is data stored?


The ability to store data in files is a core element of any computing system. Files can be stored in local file
systems on the hard disk of your personal computer, and on removable media such as USB drives; but in
most organizations, important data files are stored centrally in some kind of shared file storage system.
Increasingly, that central storage location is hosted in the cloud, enabling cost-effective, secure, and
reliable storage for large volumes of data.
The specific file format used to store data depends on a number of factors, including:
●● The type of data being stored (structured, semi-structured, or unstructured).
●● The applications and services that will need to read, write, and process the data.
●● The need for the data files to be readable by humans, or optimized for efficient storage and
processing.
Some common file formats are discussed below.

Delimited text files


Data is often stored in plain text format with specific field delimiters and row terminators. The most
common format for delimited data is comma-separated values (CSV) in which fields are separated by
commas, and rows are terminated by a carriage return / new line. Optionally, the first line may include the
field names. Other common formats include tab-separated values (TSV) and space-delimited (in which
tabs or spaces are used to separate fields), and fixed-width data in which each field is allocated a fixed
number of characters. Delimited text is a good choice for structured data that needs to be accessed by a
wide range of applications and services in a human-readable format.
The following example shows customer data in comma-delimited format:
FirstName,LastName,Email
Joe,Jones,[email protected]
Samir,Nadoy,[email protected]

JavaScript Object Notation (JSON)


JSON is a ubiquitous format in which a hierarchical document schema is used to define data entities
(objects) that have multiple attributes. Each attribute might be an object (or a collection of objects);
making JSON a flexible format that's good for both structured and semi-structured data.
The following example shows a JSON document containing a collection of customers. Each customer has
three attributes (firstName, lastName, and contact), and the contact attribute contains a collection of
objects that represent one or more contact methods (email or phone). Note that objects are enclosed in
braces ({..}) and collections are enclosed in square brackets ([..]). Attributes are represented by
name:value pairs and separated by commas (,).
{
  "customers":
  [
    {
      "firstName": "Joe",
      "lastName": "Jones",
      "contact":
      [
        {
          "type": "home",
          "number": "555 123-1234"
        },
        {
          "type": "email",
          "address": "[email protected]"
        }
      ]
    },
    {
      "firstName": "Samir",
      "lastName": "Nadoy",
      "contact":
      [
        {
          "type": "email",
          "address": "[email protected]"
        }
      ]
    }
  ]
}

Extensible Markup Language (XML)


XML is a human-readable data format that was popular in the 1990s and 2000s. It's largely been
superseded by the less verbose JSON format, but there are still some systems that use XML to represent data.
XML uses tags enclosed in angle-brackets (<../>) to define elements and attributes, as shown in this
example:
<Customers>
  <Customer name="Joe" lastName="Jones">
    <ContactDetails>
      <Contact type="home" number="555 123-1234"/>
      <Contact type="email" address="[email protected]"/>
    </ContactDetails>
  </Customer>
  <Customer name="Samir" lastName="Nadoy">
    <ContactDetails>
      <Contact type="email" address="[email protected]"/>
    </ContactDetails>
  </Customer>
</Customers>

Binary Large Object (BLOB)


Ultimately, all files are stored as binary data (1's and 0's), but in the human-readable formats discussed
above, the bytes of binary data are mapped to printable characters (typically through a character
encoding scheme such as ASCII or Unicode). Some file formats, however, particularly for unstructured data,
store the data as raw binary that must be interpreted by applications and rendered. Common types of
data stored as binary include images, video, audio, and application-specific documents.
When working with data like this, data professionals often refer to the data files as BLOBs (Binary Large
Objects).

Optimized file formats


While human-readable formats for structured and semi-structured data can be useful, they're typically
not optimized for storage space or processing. Over time, some specialized file formats that enable
compression, indexing, and efficient storage and processing have been developed.
Some common optimized file formats you might see include Avro, ORC, and Parquet:
●● Avro is a row-based format. It was created by Apache. Each record contains a header that describes
the structure of the data in the record. This header is stored as JSON. The data is stored as binary
information. An application uses the information in the header to parse the binary data and extract
the fields it contains. Avro is a good format for compressing data and minimizing storage and network
bandwidth requirements.
●● ORC (Optimized Row Columnar format) organizes data into columns rather than rows. It was
developed by Hortonworks for optimizing read and write operations in Apache Hive (Hive is a data
warehouse system that supports fast data summarization and querying over large datasets). An ORC
file contains stripes of data. Each stripe holds the data for a column or set of columns. A stripe
contains an index into the rows in the stripe, the data for each row, and a footer that holds statistical
information (count, sum, max, min, and so on) for each column.
●● Parquet is another columnar data format. It was created by Cloudera and Twitter. A Parquet file
contains row groups. Data for each column is stored together in the same row group. Each row group
contains one or more chunks of data. A Parquet file includes metadata that describes the set of rows
found in each chunk. An application can use this metadata to quickly locate the correct chunk for a
given set of rows, and retrieve the data in the specified columns for these rows. Parquet specializes in
storing and processing nested data types efficiently. It supports very efficient compression and
encoding schemes.
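Analytical query engines can read these optimized formats directly. As a minimal sketch, the following query shows how an Azure Synapse Analytics serverless SQL pool might read Parquet files from a data lake (the storage account URL and folder path are hypothetical):
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.blob.core.windows.net/data/sales/*.parquet',  -- hypothetical data lake location
    FORMAT = 'PARQUET'
) AS rows;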
Databases

A database is used to define a central system in which data can be stored and queried. In a simplistic
sense, the file system on which files are stored is a kind of database; but when we use the term in a
professional data context, we usually mean a dedicated system for managing data records rather than
files.

Relational databases
Relational databases are commonly used to store and query structured data. The data is stored in tables
that represent entities, such as customers, products, or sales orders. Each instance of an entity is assigned
a primary key that uniquely identifies it; and these keys are used to reference the entity instance in other
tables. For example, a customer's primary key can be referenced in a sales order record to indicate which
customer placed the order. This use of keys to reference data entities enables a relational database to be
normalized; which in part means the elimination of duplicate data values so that, for example, the details
of an individual customer are stored only once; not for each sales order the customer places. The tables
are managed and queried using Structured Query Language (SQL), which is based on an ANSII standard,
so it's similar across multiple database systems.

Non-relational databases
Non-relational databases are data management systems that don’t apply a relational schema to the data.
Non-relational databases are often referred to as NoSQL databases, even though some support a variant
of the SQL language.
There are four common types of non-relational database in use:
●● Key-value databases in which each record consists of a unique key and an associated value, which
can be in any format.

●● Document databases, which are a specific form of key-value database in which the value is a JSON
document (which the system is optimized to parse and query)

●● Column family databases, which store tabular data comprising rows and columns, but you can divide
the columns into groups known as column-families. Each column family holds a set of columns that
are logically related together.

●● Graph databases, which store entities as nodes with links to define relationships between them.

Transactional data workloads


A transactional data processing system is what most people consider the primary function of business
computing. A transactional system records transactions that encapsulate specific events that the
organization wants to track. A transaction could be financial, such as the movement of money between
accounts in a banking system, or it might be part of a retail system, tracking payments for goods and
services from customers. Think of a transaction as a small, discrete unit of work.
Transactional systems are often high-volume, sometimes handling many millions of transactions in a
single day. The data being processed has to be accessible very quickly. The work performed by
transactional systems is often referred to as Online Transactional Processing (OLTP).

OLTP solutions rely on a database system in which data storage is optimized for both read and write
operations in order to support transactional workloads in which data records are created, retrieved,
updated, and deleted (often referred to as CRUD operations). These operations are applied
transactionally, in a way that ensures the integrity of the data stored in the database. To accomplish this, OLTP systems
enforce transactions that support so-called ACID semantics:
●● Atomicity – each transaction is treated as a single unit, which succeeds completely or fails completely.
For example, a transaction that involved debiting funds from one account and crediting the same
amount to another account must complete both actions. If either action can't be completed, then the
other action must fail.
●● Consistency – transactions can only take the data in the database from one valid state to another. To
continue the debit and credit example above, the completed state of the transaction must reflect the
transfer of funds from one account to the other.
●● Isolation – concurrent transactions cannot interfere with one another, and must result in a consistent
database state. For example, while the transaction to transfer funds from one account to another is
in-process, another transaction that checks the balance of these accounts must return consistent
results - the balance-checking transaction can't retrieve a value for one account that reflects the
balance before the transfer, and a value for the other account that reflects the balance after the
transfer.
●● Durability – when a transaction has been committed, it will remain committed. After the account
transfer transaction has completed, the revised account balances are persisted so that even if the
database system were to be switched off, the committed transaction would be reflected when it is
switched on again.
OLTP systems are typically used to support live applications that process business data - often referred to
as line of business (LOB) applications.
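For example, the funds transfer described above is typically wrapped in a single transaction so that both updates succeed or fail together. The following T-SQL is an illustrative sketch (the Account table and column names are assumptions):
BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE Account SET Balance = Balance - 100 WHERE AccountID = 1;  -- debit one account
    UPDATE Account SET Balance = Balance + 100 WHERE AccountID = 2;  -- credit the other account
    COMMIT TRANSACTION;   -- both changes are persisted together (durability)
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION; -- if either update fails, neither change is applied (atomicity)
END CATCH;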

Analytical data workloads


Analytical data processing typically uses read-only (or read-mostly) systems that store vast volumes of
historical data or business metrics.
Analytics can be based on a snapshot of the data at a given point in time, or a series of snapshots.

The specific details for an analytical processing system can vary between solutions, but a common
architecture for enterprise-scale analytics looks like this:

1. Data files may be stored in a central data lake for analysis.


2. An extract, transform, and load (ETL) process copies data from files and OLTP databases into a data
warehouse that is optimized for read activity. Commonly, a data warehouse schema is based on fact
tables that contain numeric values you want to analyze (for example, sales amounts), with related
dimension tables that represent the entities by which you want to measure them (for example,
customer or product).
3. Data in the data warehouse may be aggregated and loaded into an online analytical processing
(OLAP) model, or cube. Aggregated numeric values (measures) from fact tables are calculated for
intersections of dimensions from dimension tables. For example, sales revenue might be totaled by
date, customer, and product.
4. The data in the data lake, data warehouse, and analytical model can be queried to produce reports,
visualizations, and dashboards.
Data lakes are common in modern data analytical processing scenarios, where a large volume of file-
based data must be collected and analyzed.
Data warehouses are an established way to store data in a relational schema that is optimized for read
operations – primarily queries to support reporting and data visualization. The data warehouse schema
may require some denormalization of data in an OLTP data source (introducing some duplication to make
queries perform faster).
An OLAP model is an aggregated type of data storage that is optimized for analytical workloads. Data
aggregations are across dimensions at different levels, enabling you to drill up/down to view
aggregations at multiple hierarchical levels; for example, to find total sales by region, by city, or for an individual
address. Because OLAP data is pre-aggregated, queries to return the summaries it contains can be run
quickly.
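For example, a report of total sales revenue by product could be produced by joining a fact table to a dimension table and aggregating the measure. The following query is a minimal sketch using assumed table and column names:
SELECT d.ProductName,
       SUM(f.SalesAmount) AS TotalSales   -- aggregate the measure from the fact table
FROM FactSales AS f
JOIN DimProduct AS d ON f.ProductKey = d.ProductKey   -- join to the dimension table
GROUP BY d.ProductName
ORDER BY TotalSales DESC;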
Different types of user might perform data analytical work at different stages of the overall architecture.
For example:
●● Data scientists might work directly with data files in a data lake to explore and model data.
●● Data Analysts might query tables directly in the data warehouse to produce complex reports and
visualizations.
●● Business users might consume pre-aggregated data in an analytical model in the form of reports or
dashboards.

Knowledge check
Question 1
How is data in a relational table organized?
☐ Rows and Columns
☐ Header and Footer
☐ Pages and Paragraphs

Question 2
Which of the following is an example of unstructured data?
☐ An Employee table with columns Employee ID, Employee Name, and Employee Designation
☐ Audio and Video files
☐ A table within SQL Server database

Question 3
What is a data warehouse?
☐ A non-relational database optimized for read and write operations
☐ A relational database optimized for read operations
☐ A storage location for unstructured data files

Summary
Data is at the core of most software applications and solutions. It can be represented in many formats,
stored in files and databases, and used to record transactions or to support analysis and reporting.
In this lesson you've learned how to:
●● Identify common data formats
●● Describe options for storing data in files
●● Describe options for storing data in databases
●● Describe characteristics of transactional data processing solutions
●● Describe characteristics of analytical data processing solutions

Next steps
Now that you've learned about some core data concepts, consider learning more about data-related
workloads on Microsoft Azure by pursuing a Microsoft certification in Azure Data Fundamentals.

Data roles and services


Introduction
Over the last decade, the amount of data that systems and devices generate has increased significantly.
Because of this increase, new technologies, roles, and approaches to working with data are affecting data
professionals. Data professionals typically fulfill different roles when managing, using, and controlling
data. In this module, you'll learn about the various roles that organizations often apply to data
professionals, the tasks and responsibilities associated with these roles, and the Microsoft Azure services used
to perform them.

Learning objectives
In this module you will learn how to:
●● Identify common data professional roles
●● Identify common cloud services used by data professionals

Data professional roles


There's a wide variety of roles involved in managing, controlling, and using data. Some roles are
business-oriented, some involve more engineering, some focus on research, and some are hybrid roles that
combine different aspects of data management. Your organization may define roles differently, or give
them different names, but the roles described in this unit encapsulate the most common division of tasks
and responsibilities.
The three key job roles that deal with data in most organizations are:
●● Database administrators manage databases, assigning permissions to users, storing backup copies
of data, and restoring data in the event of a failure.
●● Data engineers manage infrastructure and processes for data integration across the organization,
applying data cleaning routines, identifying data governance rules, and implementing pipelines to
transfer and transform data between systems.
●● Data analysts explore and analyze data to create visualizations and charts that enable organizations
to make informed decisions.
NOTE:
The job roles define differentiated tasks and responsibilities. In some organizations, the same person
might perform multiple roles; so in their role as database administrator they might provision a
transactional database, and then in their role as a data engineer they might create a pipeline to transfer data
from the database to a data warehouse for analysis.

Database Administrator

A database administrator is responsible for the design, implementation, maintenance, and operational
aspects of on-premises and cloud-based database systems. They're responsible for the overall availability
and consistent performance and optimizations of databases. They work with stakeholders to implement
policies, tools, and processes for backup and recovery plans to recover following a natural disaster or
human-made error.

The database administrator is also responsible for managing the security of the data in the database,
granting privileges over the data, granting or denying access to users as appropriate.

Data Engineer

A data engineer collaborates with stakeholders to design and implement data-related workloads, includ-
ing data ingestion pipelines, cleansing and transformation activities, and data stores for analytical
workloads. They use a wide range of data platform technologies, including relational and non-relational
databases, file stores, and data streams.

They're also responsible for ensuring that the privacy of data is maintained within the cloud and spanning
from on-premises to the cloud data stores. They own the management and monitoring of data pipelines
to ensure that data loads perform as expected.

Data Analyst

A data analyst enables businesses to maximize the value of their data assets. They're responsible for
exploring data to identify trends and relationships, designing and building analytical models, and ena-
bling advanced analytics capabilities through reports and visualizations.

A data analyst processes raw data into meaningful insights based on identified business requirements.
NOTE:
The roles described here represent the key data-related roles found in most medium to large organiza-
tions. There are additional data-related roles not mentioned here, such as data scientist and data archi-
tect; and there are other technical professionals that work with data, including application developers and
software engineers.

Microsoft cloud services for data


Microsoft Azure is a cloud platform that powers the applications and IT infrastructure for some of the
world's largest organizations. It includes many services to support cloud solutions, including transactional
and analytical data workloads.
Some of the most commonly used cloud services for data are described below.

NOTE:
This topic covers only some of the most commonly used data services for modern transactional and
analytical solutions. Additional services are also available.

Azure SQL

Azure SQL is the collective name for a family of relational database solutions based on the Microsoft SQL
Server database engine. Specific Azure SQL services include:
●● Azure SQL Database – a fully managed platform-as-a-service (PaaS) database hosted in Azure
●● Azure SQL Managed Instance – a hosted instance of SQL Server with automated maintenance, which
allows more flexible configuration than Azure SQL DB but with more administrative responsibility for
the owner.
●● Azure SQL VM – a virtual machine with an installation of SQL Server, allowing maximum configurabil-
ity with full management responsibility.
Database administrators typically provision and manage Azure SQL database systems to support line of
business (LOB) applications that need to store transactional data.
Data engineers may use Azure SQL database systems as sources for data pipelines that perform extract,
transform, and load (ETL) operations to ingest the transactional data into an analytical system.
Data analysts may query Azure SQL databases directly to create reports, though in large organizations
the data is generally combined with data from other sources in an analytical data store to support
enterprise analytics.

Azure Database for open-source relational databases

Azure includes managed services for popular open-source relational database systems, including:
●● Azure Database for MySQL - a simple-to-use open-source database management system that is
commonly used in Linux, Apache, MySQL, and PHP (LAMP) stack apps.
●● Azure Database for MariaDB - a newer database management system, created by the original
developers of MySQL. The database engine has since been rewritten and optimized to improve
performance. MariaDB offers compatibility with Oracle Database (another popular commercial
database management system).
●● Azure Database for PostgreSQL - a hybrid relational-object database. You can store data in relation-
al tables, but a PostgreSQL database also enables you to store custom data types, with their own
non-relational properties.
As with Azure SQL database systems, open-source relational databases are managed by database
administrators to support transactional applications, and provide a data source for data engineers
building pipelines for analytical solutions and data analysts creating reports.

Azure Cosmos DB

Azure Cosmos DB is a global-scale non-relational (NoSQL) database system that supports multiple
application programming interfaces (APIs), enabling you to store and manage data as JSON documents,
key-value pairs, column-families, and graphs.
In some organizations, Cosmos DB instances may be provisioned and managed by a database adminis-
trator; though often software developers manage NoSQL data storage as part of the overall application
architecture. Data engineers often need to integrate Cosmos DB data sources into enterprise analytical
solutions that support modeling and reporting by data analysts.

Azure Storage

Azure Storage is a core Azure service that enables you to store data in:
●● Blob containers - scalable, cost-effective storage for binary files.
●● File shares - network file shares such as you typically find in corporate networks.
●● Tables - key-value storage for applications that need to read and write data values quickly.
Data engineers use Azure Storage to host data lakes - blob storage with a hierarchical namespace that
enables files to be organized in folders in a distributed file system.

Azure Data Factory

Azure Data Factory is an Azure service that enables you to define and schedule data pipelines to transfer
and transform data. You can integrate your pipelines with other Azure services, enabling you to ingest
data from cloud data stores, process the data using cloud-based compute, and persist the results in
another data store.
Azure Data Factory is used by data engineers to build extract, transform, and load (ETL) solutions that
populate analytical data stores with data from transactional systems across the organization.

Azure Synapse Analytics

Azure Synapse Analytics is a comprehensive, unified data analytics solution that provides a single service
interface for multiple analytical capabilities, including:
●● Pipelines - based on the same technology as Azure Data Factory.
●● SQL - a highly scalable SQL database engine, optimized for data warehouse workloads.
●● Apache Spark - an open-source distributed data processing system that supports multiple program-
ming languages and APIs, including Java, Scala, Python, and SQL.
●● Azure Synapse Data Explorer - a high-performance data analytics solution that is optimized for
real-time querying of log and telemetry data using Kusto Query Language (KQL).
Data engineers can use Azure Synapse Analytics to create a unified data analytics solution that combines
data ingestion pipelines, data warehouse storage, and data lake storage through a single service.
Data analysts can use SQL and Spark pools through interactive notebooks to explore and analyze data,
and take advantage of integration with services such as Azure Machine Learning and Microsoft Power BI
to create data models and extract insights from the data.

Azure Databricks

Azure Databricks is an Azure-integrated version of the popular Databricks platform, which combines the
Apache Spark data processing platform with SQL database semantics and an integrated management
interface to enable large-scale data analytics.
Data engineers can use existing Databricks and Spark skills to create analytical data stores in Azure
Databricks.
Data Analysts can use the native notebook support in Azure Databricks to query and visualize data in an
easy to use web-based interface.

Azure HDInsight

Azure HDInsight is an Azure service that provides Azure-hosted clusters for popular Apache open-source
big data processing technologies, including:
●● Apache Spark - a distributed data processing system that supports multiple programming languages
and APIs, including Java, Scala, Python, and SQL.
●● Apache Hadoop - a distributed system that uses MapReduce jobs to process large volumes of data
efficiently across multiple cluster nodes. MapReduce jobs can be written in Java or abstracted by
interfaces such as Apache Hive - a SQL-based API that runs on Hadoop.
●● Apache HBase - an open-source system for large-scale NoSQL data storage and querying.
●● Apache Kafka - a message broker for data stream processing.
●● Apache Storm - an open-source system for real-time data processing through a topology of spouts
and bolts.
Data engineers can use Azure HDInsight to support big data analytics workloads that depend on multiple
open-source technologies.

Azure Stream Analytics

Azure Stream Analytics is a real-time stream processing engine that captures a stream of data from an
input, applies a query to extract and manipulate data from the input stream, and writes the results to an
output for analysis or further processing.
Data engineers can incorporate Azure Stream Analytics into data analytics architectures that capture
streaming data for ingestion into an analytical data store or for real-time visualization.
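To illustrate the idea, here's a minimal sketch of a Stream Analytics query written in its SQL-like query language. The input and output aliases ([iot-input] and [powerbi-output]) and the column names (DeviceId, Temperature, EventEnqueuedUtcTime) are hypothetical placeholders; an actual job would use the aliases defined for its own inputs and outputs:
-- Average temperature per device over each 60-second tumbling window
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature,
    System.Timestamp() AS WindowEnd
INTO
    [powerbi-output]
FROM
    [iot-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY
    DeviceId,
    TumblingWindow(second, 60)

Each window produces one aggregated row per device, which is written to the configured output for storage or visualization.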

Azure Data Explorer

Azure Data Explorer is a standalone service that offers the same high-performance querying of log and
telemetry data as the Azure Synapse Data Explorer runtime in Azure Synapse Analytics.
Data analysts can use Azure Data Explorer to query and analyze data that includes a timestamp attribute,
such as is typically found in log files and Internet-of-things (IoT) telemetry data.

Microsoft Purview

Microsoft Purview provides a solution for enterprise-wide data governance and discoverability. You can
use Microsoft Purview to create a map of your data and track data lineage across multiple data sources
and systems, enabling you to find trustworthy data for analysis and reporting.
Data engineers can use Microsoft Purview to enforce data governance across the enterprise and ensure
the integrity of data used to support analytical workloads.

Microsoft Power BI

Microsoft Power BI is a platform for analytical data modeling and reporting that data analysts can use to
create and share interactive data visualizations. Power BI reports can be created by using the Power BI
Desktop application, and then published and delivered through web-based reports and apps in the Power
BI service, as well as in the Power BI mobile app.

Knowledge check
Question 1
Which one of the following tasks is a role of a database administrator?
†† Backing up and restoring databases
†† Creating dashboards and reports
†† Identifying data quality issues

Question 2
Which role is most likely to use Azure Data Factory to define a data pipeline for an ETL process?
†† Database Administrator
†† Data Engineer
†† Data Analyst

Question 3
Which single service would you use to implement data pipelines, SQL analytics, and Spark analytics?
†† Azure SQL Database
†† Microsoft Power BI
†† Azure Synapse Analytics

Summary
Managing and working with data is a specialist skill that requires knowledge of multiple technologies.
Most organizations define job roles for the various tasks responsible for managing data.
In this lesson you've learned how to:
●● Identify common data professional roles
●● Identify common cloud services used by data professionals

Next steps
Now that you’ve learned about professional data roles and the services they use, consider learning more
about data-related workloads on Microsoft Azure by pursuing a Microsoft certification in Azure Data
Fundamentals1.

1 https://docs.microsoft.com/learn/certifications/azure-data-fundamentals/

Answers
Question 1
How is data in a relational table organized?
■■ Rows and Columns
†† Header and Footer
†† Pages and Paragraphs
Explanation
Structured data is typically tabular data that is represented by rows and columns in a database table.
Question 2
Which of the following is an example of unstructured data?
†† An Employee table with columns Employee ID, Employee Name, and Employee Designation
■■ Audio and Video files
†† A table within a SQL Server database
Explanation
Audio and video files are unstructured data.
Question 3
What is a data warehouse?
†† A non-relational database optimized for read and write operations
■■ A relational database optimized for read operations
†† A storage location for unstructured data files
Explanation
A data warehouse is a relational database in which the schema is optimized for queries that read data.
Question 1
Which one of the following tasks is a role of a database administrator?
■■ Backing up and restoring databases
†† Creating dashboards and reports
†† Identifying data quality issues
Explanation
Database administrators back up databases and restore them when data is lost or corrupted.
Question 2
Which role is most likely to use Azure Data Factory to define a data pipeline for an ETL process?
†† Database Administrator
■■ Data Engineer
†† Data Analyst
Explanation
Data engineers create data pipelines.

Question 3
Which single service would you use to implement data pipelines, SQL analytics, and Spark analytics?
†† Azure SQL Database
†† Microsoft Power BI
■■ Azure Synapse Analytics
Explanation
Azure Synapse Analytics includes native support for data pipelines, SQL, and Spark.
Module 2 Explore fundamentals of relational
data in Azure

Explore relational data offerings in Azure


Introduction
In the early years of computing systems, every application stored data in its own unique structure. When
developers wanted to build applications to use that data, they had to know a lot about the particular data
structure to find the data they needed. These data structures were inefficient, hard to maintain, and hard
to optimize for good application performance. The relational database model was designed to solve the
problem of multiple arbitrary data structures. The relational model provides a standard way of represent-
ing and querying data that can be used by any application. One of the key advantages of the relational
database model is its use of tables, which are an intuitive, efficient, and flexible way to store and access
structured information.
The simple yet powerful relational model is used by organizations of all types and sizes for a broad
variety of information management needs. Relational databases are used to track inventories, process
ecommerce transactions, manage huge amounts of mission-critical customer information, and much
more. A relational database is useful for storing any information containing related data elements that
must be organized in a rules-based, consistent structure.
In this module, you'll learn about the key characteristics of relational databases, and explore relational
data structures.

Learning objectives
In this module you will learn how to:
●● Identify characteristics of relational data
●● Define normalization
●● Identify types of SQL statement
●● Identify common relational database objects

Relational tables
In a relational database, you model collections of entities from the real world as tables. An entity can be
anything for which you want to record information; typically important objects and events. For example,
in a retail system example, you might create tables for customers, products, orders, and line items within
an order. A table contains rows, and each row represents a single instance of an entity. In the retail
scenario, each row in the customer table contains the data for a single customer, each row in the product
table defines a single product, each row in the order table represents an order made by a customer, and
each row in the line item table represents a product that was included in an order.

Relational tables are a format for structured data, and each row in a table has the same columns; though
in some cases, not all columns need to have a value – for example, a customer table might include a
MiddleName column, which can be empty (or NULL) for rows that represent customers with no middle
name or whose middle name is unknown.
Each column stores data of a specific datatype. For example, an Email column in a Customer table would
likely be defined to store character-based (text) data (which might be fixed or variable in length), a Price
column in a Product table might be defined to store decimal numeric data, while a Quantity column in
an Order table might be constrained to integer numeric values; and an OrderDate column in the same
Order table would be defined to store date/time values. The available datatypes that you can use when
defining a table depend on the database system you are using; though there are standard datatypes
defined by the American National Standards Institute (ANSI) that are supported by most database
systems.

Normalization
Normalization is a term used by database professionals for a schema design process that minimizes data
duplication and enforces data integrity.

While there are many complex rules that define the process of refactoring data into various levels (or
forms) of normalization, a simple definition for practical purposes is:
1. Separate each entity into its own table.
2. Separate each discrete attribute into its own column.
3. Uniquely identify each entity instance (row) using a primary key.
4. Use foreign key columns to link related entities.
To understand the core principles of normalization, suppose the following table represents a spreadsheet
that a company uses to track its sales.

Notice that the customer and product details are duplicated for each individual item sold; and that the
customer name and postal address, and the product name and price are combined in the same spread-
sheet cells.
Now let's look at how normalization changes the way the data is stored.

Each entity that is represented in the data (customer, product, sales order, and line item) is stored in its
own table, and each discrete attribute of those entities is in its own column.
Recording each instance of an entity as a row in an entity-specific table removes duplication of data. For
example, to change a customer's address, you need only modify the value in a single row.
The decomposition of attributes into individual columns ensures that each value is constrained to an
appropriate data type - for example, product prices must be decimal values, while line item quantities
must be integer numbers. Additionally, the creation of individual columns provides a useful level of
granularity in the data for querying - for example, you can easily filter customers to those who live in a
specific city.
Instances of each entity are uniquely identified by an ID or other key value, known as a primary key; and
when one entity references another (for example, an order has an associated customer), the primary key
of the related entity is stored as a foreign key. You can look up the address of the customer (which is
stored only once) for each record in the Order table by referencing the corresponding record in the
Customer table. Typically, a relational database management system (RDBMS) can enforce referential
integrity to ensure that a value entered into a foreign key field has an existing corresponding primary key
in the related table – for example, preventing orders for non-existent customers.
In some cases, a key (primary or foreign) can be defined as a composite key based on a unique combina-
tion of multiple columns. For example, the LineItem table in the example above uses a unique combina-
tion of OrderNo and ItemNo to identify a line item from an individual order.
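To make this structure concrete, the following sketch shows how the normalized entities described above might be declared in SQL. The column names and data types are illustrative assumptions rather than a definitive design, and the Order table is bracketed because ORDER is a reserved word in Transact-SQL:
CREATE TABLE Customer
(
    CustomerID INT PRIMARY KEY,             -- primary key uniquely identifies each customer
    CustomerName VARCHAR(50) NOT NULL,
    PostalAddress VARCHAR(100) NOT NULL
);

CREATE TABLE [Order]
(
    OrderNo INT PRIMARY KEY,
    OrderDate DATE NOT NULL,
    CustomerID INT NOT NULL
        REFERENCES Customer(CustomerID)     -- foreign key enforces referential integrity
);

CREATE TABLE LineItem
(
    OrderNo INT NOT NULL
        REFERENCES [Order](OrderNo),        -- foreign key to the related order
    ItemNo INT NOT NULL,
    Quantity INT NOT NULL,
    PRIMARY KEY (OrderNo, ItemNo)           -- composite primary key
);

With this structure in place, the database management system would reject an attempt to insert an order row for a customer that doesn't exist.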

Structured Query Language (SQL)


SQL stands for Structured Query Language, and is used to communicate with a relational database. It's the
standard language for relational database management systems. SQL statements are used to perform
tasks such as update data in a database, or retrieve data from a database. Some common relational
database management systems that use SQL include Microsoft SQL Server, MySQL, PostgreSQL, MariaDB,
and Oracle.
NOTE:
SQL was originally standardized by the American National Standards Institute (ANSI) in 1986, and by the
International Organization for Standardization (ISO) in 1987. Since then, the standard has been extended
several times as relational database vendors have added new features to their systems. Additionally, most
database vendors include their own proprietary extensions that are not part of the standard, which has
resulted in a variety of dialects of SQL.
You can use SQL statements such as SELECT, INSERT, UPDATE, DELETE, CREATE, and DROP to accom-
plish almost everything that you need to do with a database. Although these SQL statements are part of
the SQL standard, many database management systems also have their own additional proprietary
extensions to handle the specifics of that database management system. These extensions provide
functionality not covered by the SQL standard, and include areas such as security management and
programmability. For example, Microsoft SQL Server, and Azure database services that are based on the
SQL Server database engine, use Transact-SQL. This implementation includes proprietary extensions for
writing stored procedures and triggers (application code that can be stored in the database), and manag-
ing user accounts. PostgreSQL and MySQL also have their own versions of these features.
Some popular dialects of SQL include:
●● Transact-SQL (T-SQL). This version of SQL is used by Microsoft SQL Server and Azure SQL services.
●● pgSQL. This is the dialect, with its own extensions, implemented in PostgreSQL.
●● PL/SQL. This is the dialect used by Oracle. PL/SQL stands for Procedural Language/SQL.
Users who plan to work specifically with a single database system should learn the intricacies of their
preferred SQL dialect and platform.
NOTE:
The SQL code examples in this module are based on the Transact-SQL dialect, unless otherwise indicated.
The syntax for other dialects is generally similar, but may vary in some details.

SQL statement types


SQL statements are grouped into three main logical categories:
●● Data Definition Language (DDL)
●● Data Control Language (DCL)
●● Data Manipulation Language (DML)

DDL statements
You use DDL statements to create, modify, and remove tables and other objects in a database (tables,
stored procedures, views, and so on).
The most common DDL statements are:

Statement Description
CREATE Create a new object in the database, such as a table or a view.
ALTER Modify the structure of an object. For instance, altering a table to add a new column.
DROP Remove an object from the database.
RENAME Rename an existing object.
WARNING:
The DROP statement is very powerful. When you drop a table, all the rows in that table are lost. Unless
you have a backup, you won't be able to retrieve this data.
The following example creates a new database table. The items between the parentheses specify the
details of each column, including the name, the data type, whether the column must always contain a
value (NOT NULL), and whether the data in the column is used to uniquely identify a row (PRIMARY KEY).
Each table should have a primary key, although SQL doesn't enforce this rule.
NOTE:
Columns marked as NOT NULL are referred to as mandatory columns. If you omit the NOT NULL clause,
you can create rows that don't contain a value in the column. An empty column in a row is said to have a
NULL value.
CREATE TABLE Product
(
ID INT PRIMARY KEY,
Name VARCHAR(20) NOT NULL,
Price DECIMAL NULL
);

The datatypes available for columns in a table will vary between database management systems. Howev-
er, most database management systems support numeric types such as INT (an integer, or whole num-
ber), DECIMAL (a decimal number), and string types such as VARCHAR (VARCHAR stands for variable
length character data). For more information, see the documentation for your selected database manage-
ment system.
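The table of DDL statements above also lists ALTER and DROP. As a minimal sketch, assuming the Product table created in the previous example, these statements could be used as follows. (In Transact-SQL, renaming objects is typically done with the sp_rename system stored procedure rather than a RENAME statement.)
-- Add a new column to the existing Product table
ALTER TABLE Product
ADD Description VARCHAR(100) NULL;

-- Remove the Product table and all of its rows from the database
DROP TABLE Product;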

DCL statements
Database administrators generally use DCL statements to manage access to objects in a database by
granting, denying, or revoking permissions to specific users or groups.
The three main DCL statements are:

Statement Description
GRANT Grant permission to perform specific actions
DENY Deny permission to perform specific actions
REVOKE Remove a previously granted permission
For example, the following GRANT statement permits a user named user1 to read, insert, and modify
data in the Product table.
GRANT SELECT, INSERT, UPDATE
ON Product
TO user1;
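As a sketch using the same hypothetical user1 and the Product table, the DENY and REVOKE statements follow a similar pattern:
-- Explicitly prevent user1 from deleting rows in the Product table
DENY DELETE
ON Product
TO user1;

-- Remove the INSERT permission that was previously granted to user1
REVOKE INSERT
ON Product
FROM user1;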

DML statements
You use DML statements to manipulate the rows in tables. These statements enable you to retrieve
(query) data, insert new rows, or modify existing rows. You can also delete rows if you don't need them
anymore.
The four main DML statements are:

Statement Description
SELECT Read rows from a table
INSERT Insert new rows into a table
UPDATE Modify data in existing rows
DELETE Delete existing rows
The basic form of an INSERT statement will insert one row at a time. By default, the SELECT, UPDATE,
and DELETE statements are applied to every row in a table. You usually apply a WHERE clause with these
statements to specify criteria; only rows that match these criteria will be selected, updated, or deleted.
WARNING:
SQL doesn't provide “are you sure?” prompts, so be careful when using DELETE or UPDATE without a
WHERE clause because you can lose or modify a lot of data.
The following code is an example of a SQL statement that selects all columns (indicated by *) from the
Customer table where the City column value is “Seattle”:
SELECT *
FROM Customer
WHERE City = 'Seattle';

To retrieve only a specific subset of columns from the table, you list them in the SELECT clause, like this:
SELECT FirstName, LastName, Address, City
FROM Customer
WHERE City = 'Seattle';

If a query returns many rows, they don't necessarily appear in any specific sequence. If you want to sort
the data, you can add an ORDER BY clause. The data will be sorted by the specified column:
SELECT FirstName, LastName, Address, City
FROM Customer
WHERE City = 'Seattle'
ORDER BY LastName;

You can also run SELECT statements that retrieve data from multiple tables using a JOIN clause. Joins
indicate how the rows in one table are connected with rows in the other to determine what data to
return. A typical join condition matches a foreign key from one table and its associated primary key in the
other table.
The following query shows an example that joins Customer and Order tables. The query makes use of
table aliases to abbreviate the table names when specifying which columns to retrieve in the SELECT
clause and which columns to match in the JOIN clause.

SELECT o.OrderNo, o.OrderDate, c.Address, c.City
FROM Order AS o
JOIN Customer AS c
ON o.Customer = c.ID;

The next example shows how to modify an existing row using SQL. It changes the value of the Address
column in the Customer table for rows that have the value 1 in the ID column. All other rows are left
unchanged:
UPDATE Customer
SET Address = '123 High St.'
WHERE ID = 1;

WARNING:
If you omit the WHERE clause, an UPDATE statement will modify every row in the table.
Use the DELETE statement to remove rows. You specify the table to delete from, and a WHERE clause
that identifies the rows to be deleted:
DELETE FROM Product
WHERE ID = 162;

WARNING:
If you omit the WHERE clause, a DELETE statement will remove every row from the table.
The INSERT statement takes a slightly different form. You specify a table and columns in an INTO clause,
and a list of values to be stored in these columns. Standard SQL only supports inserting one row at a
time, as shown in the following example. Some dialects allow you to specify multiple VALUES clauses to
add several rows at a time:
INSERT INTO Product(ID, Name, Price)
VALUES (99, 'Drill', 4.99);

NOTE:
This topic describes some basic SQL statements and syntax in order to help you understand how SQL is
used to work with objects in a database. If you want to learn more about querying data with SQL, review
the Get Started Querying with Transact-SQL1 learning path on Microsoft Learn.

Other common database objects


In addition to tables, a relational database can contain other structures that help to optimize data
organization, encapsulate programmatic actions, and improve the speed of access. In this unit, you'll
learn about three of these structures in more detail: views, stored procedures, and indexes.

What is a view?
A view is a virtual table based on the results of a SELECT query. You can think of a view as a window on
specified rows in one or more underlying tables. For example, you could create a view on the Order and
Customer tables that retrieves order and customer data to provide a single object that makes it easy to
determine delivery addresses for orders:

1 https://docs.microsoft.com/learn/paths/get-started-querying-with-transact-sql?azure-portal=true

CREATE VIEW Deliveries
AS
SELECT o.OrderNo, o.OrderDate,
    c.FirstName, c.LastName, c.Address, c.City
FROM Order AS o JOIN Customer AS c
ON o.CustomerID = c.ID;

You can query the view and filter the data in much the same way as a table. The following query finds
details of orders for customers who live in Seattle:
SELECT OrderNo, OrderDate, LastName, Address
FROM Deliveries
WHERE City = 'Seattle';

What is a stored procedure?


A stored procedure defines SQL statements that can be run on command. Stored procedures are used to
encapsulate programmatic logic in a database for actions that applications need to perform when
working with data.
You can define a stored procedure with parameters to create a flexible solution for common actions that
might need to be applied to data based on a specific key or criteria. For example, the following stored
procedure could be defined to change the name of a product based on the specified product ID.
CREATE PROCEDURE RenameProduct
@ProductID INT,
@NewName VARCHAR(20)
AS
UPDATE Product
SET Name = @NewName
WHERE ID = @ProductID;

When a product must be renamed, you can execute the stored procedure, passing the ID of the product
and the new name to be assigned:
EXEC RenameProduct 201, 'Spanner';

What is an index?
An index helps you search for data in a table. Think of an index over a table like an index at the back of a
book. A book index contains a sorted set of references, with the pages on which each reference occurs.
When you want to find a reference to an item in the book, you look it up through the index. You can use
the page numbers in the index to go directly to the correct pages in the book. Without an index, you
might have to read through the entire book to find the references you're looking for.
When you create an index in a database, you specify a column from the table, and the index contains a
copy of this data in a sorted order, with pointers to the corresponding rows in the table. When the user
runs a query that specifies this column in the WHERE clause, the database management system can use
this index to fetch the data more quickly than if it had to scan through the entire table row by row.

For example, you could use the following code to create an index on the Name column of the Product
table:
CREATE INDEX idx_ProductName
ON Product(Name);

The index creates a tree-based structure that the database system's query optimizer can use to quickly
find rows in the Product table based on a specified Name.

For a table containing few rows, using the index is probably not any more efficient than simply reading
the entire table and finding the rows requested by the query (in which case the query optimizer will
ignore the index). However, when a table has many rows, indexes can dramatically improve the perfor-
mance of queries.
You can create many indexes on a table. So, if you also wanted to find products based on price, creating
another index on the Price column in the Product table might be useful. However, indexes aren't free. An
index consumes storage space, and each time you insert, update, or delete data in a table, the indexes for
that table must be maintained. This additional work can slow down insert, update, and delete operations.
You must strike a balance between having indexes that speed up your queries versus the cost of perform-
ing other operations.

Knowledge check
Question 1
Which one of the following statements is a characteristic of a relational database?
†† All columns in a table must be of the same data type
†† A row in a table represents a single instance of an entity
†† Rows in the same table can contain different columns

Question 2
Which SQL statement is used to query tables and return data?
†† QUERY
†† READ
†† SELECT

Question 3
What is an index?
†† A structure that enables queries to locate rows in a table quickly
†† A virtual table based on the results of a query
†† A pre-defined SQL statement that modifies data

Summary
Relational databases are a common way for transactional applications to store and manage data. They
consist of a schema of tables, which are linked through common key values. You use SQL to query and
manipulate the data in the tables, and can enrich the database by creating objects like views, stored
procedures, and indexes.
In this lesson you learned how to:
●● Identify characteristics of relational data
●● Define normalization
●● Identify types of SQL statement
●● Identify common relational database objects

Next steps
Now that you've learned about relational databases, consider learning more about data-related work-
loads on Microsoft Azure by pursuing a Microsoft certification in Azure Data Fundamentals2.

2 https://docs.microsoft.com/learn/certifications/azure-data-fundamentals/

Explore Azure services for relational data


Introduction
Azure supports multiple database services, enabling you to run popular relational database management
systems, such as SQL Server, PostgreSQL, and MySQL, in the cloud.
Most Azure database services are fully managed, freeing up valuable time you’d otherwise spend manag-
ing your database. Enterprise-grade performance with built-in high availability means you can scale
quickly and reach global distribution without worrying about costly downtime. Developers can take
advantage of industry-leading innovations such as built-in security with automatic monitoring and threat
detection, and automatic tuning for improved performance. On top of all of these features, you have
guaranteed availability.
In this module, you'll explore the options available for relational database services in Azure.

Learning objectives
In this module, you'll learn how to:
●● Identify options for Azure SQL services
●● Identify options for open-source databases in Azure
●● Provision a database service on Azure

Azure SQL
Azure SQL is a collective term for a family of Microsoft SQL Server based database services in Azure.
Specific Azure SQL services include:
●● SQL Server on Azure Virtual Machines (VMs) - A virtual machine running in Azure with an installa-
tion of SQL Server. The use of a VM makes this option an infrastructure-as-a-service (IaaS) solution
that virtualizes hardware infrastructure for compute, storage, and networking in Azure; making it a
great option for “lift and shift” migration of existing on-premises SQL Server installations to the cloud.
●● Azure SQL Managed Instance - A platform-as-a-service (PaaS) option that provides near-100%
compatibility with on-premises SQL Server instances while abstracting the underlying hardware and
operating system. The service includes automated software update management, backups, and other
maintenance tasks, reducing the administrative burden of supporting a database server instance.
●● Azure SQL Database - A fully managed, highly scalable PaaS database service that is designed for the
cloud. This service includes the core database-level capabilities of on-premises SQL Server, and is a
good option when you need to create a new application in the cloud.
●● Azure SQL Edge - A SQL engine that is optimized for Internet-of-things (IoT) scenarios that need to
work with streaming time-series data.
NOTE:
Azure SQL Edge is included in this list for completeness. We'll focus on the other options for more
general relational database scenarios in this module.

Compare Azure SQL services


Type of cloud service
●● SQL Server on Azure VMs: IaaS
●● Azure SQL Managed Instance: PaaS
●● Azure SQL Database: PaaS

SQL Server compatibility
●● SQL Server on Azure VMs: Fully compatible with on-premises physical and virtualized installations. Applications and databases can easily be “lift and shift” migrated without change.
●● Azure SQL Managed Instance: Near-100% compatibility with SQL Server. Most on-premises databases can be migrated with minimal code changes by using the Azure Database Migration service (https://docs.microsoft.com/azure/dms?azure-portal=true).
●● Azure SQL Database: Supports most core database-level capabilities of SQL Server. Some features depended on by an on-premises application may not be available.

Architecture
●● SQL Server on Azure VMs: SQL Server instances are installed in a virtual machine. Each instance can support multiple databases.
●● Azure SQL Managed Instance: Each managed instance can support multiple databases. Additionally, instance pools can be used to share resources efficiently across smaller instances.
●● Azure SQL Database: You can provision a single database in a dedicated, managed (logical) server; or you can use an elastic pool to share resources across multiple databases and take advantage of on-demand scalability.

Availability
●● SQL Server on Azure VMs: 99.99%
●● Azure SQL Managed Instance: 99.99%
●● Azure SQL Database: 99.995%

Management
●● SQL Server on Azure VMs: You must manage all aspects of the server, including operating system and SQL Server updates, configuration, backups, and other maintenance tasks.
●● Azure SQL Managed Instance: Fully automated updates, backups, and recovery.
●● Azure SQL Database: Fully automated updates, backups, and recovery.

Use cases
●● SQL Server on Azure VMs: Use this option when you need to migrate or extend an on-premises SQL Server solution and retain full control over all aspects of server and database configuration.
●● Azure SQL Managed Instance: Use this option for most cloud migration scenarios, particularly when you need minimal changes to existing applications.
●● Azure SQL Database: Use this option for new cloud solutions, or to migrate applications that have minimal instance-level dependencies.

SQL Server on Azure Virtual Machines


SQL Server on Virtual Machines enables you to use full versions of SQL Server in the Cloud without
having to manage any on-premises hardware. This is an example of the IaaS approach.
SQL Server running on an Azure virtual machine effectively replicates the database running on real
on-premises hardware. Migrating from the system running on-premises to an Azure virtual machine is no
different than moving the databases from one on-premises server to another.
This approach is suitable for migrations and applications requiring access to operating system features
that might be unsupported at the PaaS level. SQL virtual machines are lift-and-shift ready for existing
applications that require fast migration to the cloud with minimal changes. You can also use SQL Server
on Azure VMs to extend existing on-premises applications to the cloud in hybrid deployments.
NOTE:
A hybrid deployment is a system where part of the operation runs on-premises, and part in the cloud.
Your database might be part of a larger system that runs on-premises, although the database elements
might be hosted in the cloud.
You can use SQL Server in a virtual machine to develop and test traditional SQL Server applications. With
a virtual machine, you have the full administrative rights over the DBMS and operating system. It's a
perfect choice when an organization already has IT resources available to maintain the virtual machines.
These capabilities enable you to:
●● Create rapid development and test scenarios when you don't want to buy on-premises non-produc-
tion SQL Server hardware.
●● Become lift-and-shift ready for existing applications that require fast migration to the cloud with
minimal changes or no changes.
●● Scale up the platform on which SQL Server is running, by allocating more memory, CPU power, and
disk space to the virtual machine. You can quickly resize an Azure virtual machine without the require-
ment that you reinstall the software that is running on it.

Business benefits
Running SQL Server on virtual machines allows you to meet unique and diverse business needs through a
combination of on-premises and cloud-hosted deployments, while using the same set of server products,
development tools, and expertise across these environments.
It's not always easy for businesses to switch their DBMS to a fully managed service. There may be specific
requirements that must be satisfied in order to migrate to a managed service that requires making
changes to the database and the applications that use it. For this reason, using virtual machines can offer
a solution, but using them doesn't eliminate the need to administer your DBMS as carefully as you would
on-premises.

Azure SQL Managed Instance


Azure SQL Managed Instance effectively runs a fully controllable instance of SQL Server in the cloud. You
can install multiple databases on the same instance. You have complete control over this instance, much
as you would for an on-premises server. SQL Managed Instance automates backups, software patching,
database monitoring, and other general tasks, but you have full control over security and resource
allocation for your databases. You can find detailed information at What is Azure SQL Managed In-
stance?3.
Managed instances depend on other Azure services such as Azure Storage for backups, Azure Event Hubs
for telemetry, Azure Active Directory for authentication, Azure Key Vault for Transparent Data Encryption
(TDE) and a couple of Azure platform services that provide security and supportability features. The
managed instances make connections to these services.
All communications are encrypted and signed using certificates. To check the trustworthiness of commu-
nicating parties, managed instances constantly verify these certificates through certificate revocation lists.
If the certificates are revoked, the managed instance closes the connections to protect the data.

Use cases
Consider Azure SQL Managed Instance if you want to lift-and-shift an on-premises SQL Server instance
and all its databases to the cloud, without incurring the management overhead of running SQL Server on
a virtual machine.
Azure SQL Managed Instance provides features not available in Azure SQL Database (discussed below). If
your system uses features such as linked servers, Service Broker (a message processing system that can
be used to distribute work across servers), or Database Mail (which enables your database to send email
messages to users), then you should use managed instance. To check compatibility with an existing
on-premises system, you can install Data Migration Assistant (DMA)4. This tool analyzes your databases
on SQL Server and reports any issues that could block migration to a managed instance.

Business benefits
Azure SQL Managed Instance enables a system administrator to spend less time on administrative tasks
because the service either performs them for you or greatly simplifies those tasks. Automated tasks
include operating system and database management system software installation and patching, dynamic
instance resizing and configuration, backups, database replication (including system databases), high
availability configuration, and configuration of health and performance monitoring data streams.

3 https://docs.microsoft.com/azure/sql-database/sql-database-managed-instance
4 https://www.microsoft.com/download/details.aspx?id=53595

Azure SQL Managed Instance has near 100% compatibility with SQL Server Enterprise Edition, running
on-premises.
Azure SQL Managed Instance supports SQL Server Database engine logins and logins integrated with
Azure Active Directory (AD). SQL Server Database engine logins include a username and a password. You
must enter your credentials each time you connect to the server. Azure AD logins use the credentials
associated with your current computer sign-in, and you don't need to provide them each time you
connect to the server.

Azure SQL Database


Azure SQL Database is a PaaS offering from Microsoft. You create a managed database server in the
cloud, and then deploy your databases on this server.
NOTE:
A SQL Database server is a logical construct that acts as a central administrative point for multiple single
or pooled databases, logins, firewall rules, auditing rules, threat detection policies, and failover groups.
Azure SQL Database is available as a Single Database or an Elastic Pool.

Single Database
This option enables you to quickly set up and run a single SQL Server database. You create and run a
database server in the cloud, and you access your database through this server. Microsoft manages the
server, so all you have to do is configure the database, create your tables, and populate them with your
data. You can scale the database if you need more storage space, memory, or processing power. By
default, resources are pre-allocated, and you're charged per hour for the resources you've requested. You
can also specify a serverless configuration. In this configuration, Microsoft creates its own server, which
might be shared by databases belonging to other Azure subscribers. Microsoft ensures the privacy of
your database. Your database automatically scales and resources are allocated or deallocated as required.

Elastic Pool
This option is similar to Single Database, except that by default multiple databases can share the same
resources, such as memory, data storage space, and processing power through multiple-tenancy. The
resources are referred to as a pool. You create the pool, and only your databases can use the pool. This
model is useful if you have databases with resource requirements that vary over time, and can help you
to reduce costs. For example, your payroll database might require plenty of CPU power at the end of
each month as you handle payroll processing, but at other times the database might become much less
active. You might have another database that is used for running reports. This database might become
active for several days in the middle of the month as management reports are generated, but with a
lighter load at other times. Elastic Pool enables you to use the resources available in the pool, and then
release the resources once processing has completed.

Use cases
Azure SQL Database gives you the best option for low cost with minimal administration. It isn't fully
compatible with on-premises SQL Server installations. It's often used in new cloud projects where the
application design can accommodate any required changes to your applications.
NOTE:
You can use the Data Migration Assistant to detect compatibility issues with your databases that can
impact database functionality in Azure SQL Database. For more information, see Overview of Data
Migration Assistant5.
Azure SQL Database is often used for:
●● Modern cloud applications that need to use the latest stable SQL Server features.
●● Applications that require high availability.
●● Systems with a variable load that need the database server to scale up and down quickly.

Business benefits
Azure SQL Database automatically updates and patches the SQL Server software to ensure that you're
always running the latest and most secure version of the service.
The scalability features of Azure SQL Database ensure that you can increase the resources available to
store and process data without having to perform a costly manual upgrade.
The service provides high availability guarantees, to ensure that your databases are available at least
99.99% of the time. Azure SQL Database supports point-in-time restore, enabling you to recover a
database to the state it was in at any point in the past. Databases can be replicated to different regions to
provide more resiliency and disaster recovery.
Advanced threat protection provides advanced security capabilities, such as vulnerability assessments, to
help detect and remediate potential security problems with your databases. Threat protection also
detects anomalous activities that indicate unusual and potentially harmful attempts to access or exploit
your database. It continuously monitors your database for suspicious activities, and provides immediate
security alerts on potential vulnerabilities, SQL injection attacks, and anomalous database access patterns.
Threat detection alerts provide details of the suspicious activity, and recommend action on how to
investigate and mitigate the threat.
Auditing tracks database events and writes them to an audit log in your Azure storage account. Auditing
can help you maintain regulatory compliance, understand database activity, and gain insight into discrep-
ancies and anomalies that might indicate business concerns or suspected security violations.
SQL Database helps secure your data by providing encryption that protects data that is stored in the
database (at rest) and while it is being transferred across the network (in motion).

Azure databases for open-source


In addition to Azure SQL services, Azure data services are available for other popular relational database
systems, including MySQL, MariaDB, and PostgreSQL. The primary reason for these services is to enable
organizations that use them in on-premises apps to move to Azure quickly, without making significant
changes to their applications.

What are MySQL, MariaDB, and PostgreSQL?


MySQL, MariaDB, and PostgreSQL are relational database management systems that are tailored for
different specializations.
MySQL started life as a simple-to-use open-source database management system. It's the leading open
source relational database for Linux, Apache, MySQL, and PHP (LAMP) stack apps. It's available in several
editions: Community, Standard, and Enterprise. The Community edition is available free-of-charge, and
has historically been popular as a database management system for web applications, running under

5 https://docs.microsoft.com/sql/dma/dma-overview

Linux. Versions are also available for Windows. Standard edition offers higher performance, and uses a
different technology for storing data. Enterprise edition provides a comprehensive set of tools and
features, including enhanced security, availability, and scalability. The Standard and Enterprise editions
are the versions most frequently used by commercial organizations, although these versions of the
software aren't free.
MariaDB is a newer database management system, created by the original developers of MySQL. The
database engine has since been rewritten and optimized to improve performance. MariaDB offers
compatibility with Oracle Database (another popular commercial database management system). One
notable feature of MariaDB is its built-in support for temporal data. A table can hold several versions of
data, enabling an application to query the data as it appeared at some point in the past.
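For example, here's a minimal sketch of MariaDB's system-versioning syntax for temporal data; the table and column names are illustrative assumptions:
-- Create a system-versioned (temporal) table; MariaDB keeps every version of each row
CREATE TABLE Product
(
    ID INT PRIMARY KEY,
    Name VARCHAR(20) NOT NULL,
    Price DECIMAL(10,2) NOT NULL
) WITH SYSTEM VERSIONING;

-- Query the data as it appeared at a specific point in the past
SELECT * FROM Product
FOR SYSTEM_TIME AS OF TIMESTAMP '2022-01-01 00:00:00';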
PostgreSQL is a hybrid relational-object database. You can store data in relational tables, but a Post-
greSQL database also enables you to store custom data types, with their own non-relational properties.
The database management system is extensible; you can add code modules to the database, which can
be run by queries. Another key feature is the ability to store and manipulate geometric data, such as lines,
circles, and polygons.
PostgreSQL has its own query language called pgsql. This language is a variant of the standard relational
query language, SQL, with features that enable you to write stored procedures that run inside the
database.
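As an illustration of these capabilities, the following sketch uses PostgreSQL's built-in point geometric type and a user-defined composite type; all object names are assumptions made for the example:
-- A custom composite type with its own fields
CREATE TYPE address AS (street VARCHAR(50), city VARCHAR(30));

CREATE TABLE Store
(
    ID INT PRIMARY KEY,
    Name VARCHAR(30) NOT NULL,
    Location point,            -- built-in geometric type: an (x, y) coordinate
    PostalAddress address      -- column based on the custom composite type
);

INSERT INTO Store (ID, Name, Location, PostalAddress)
VALUES (1, 'Downtown', point(47.6, -122.3), ROW('123 High St.', 'Seattle'));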

Azure Database for MySQL

Azure Database for MySQL is a PaaS implementation of MySQL in the Azure cloud, based on the MySQL
Community Edition.
The Azure Database for MySQL service includes high availability at no additional cost, and scalability as
required. You only pay for what you use. Automatic backups are provided, with point-in-time restore.
The server provides connection security to enforce firewall rules and, optionally, require SSL connections.
Many server parameters enable you to configure server settings such as lock modes, maximum number
of connections, and timeouts.
Azure Database for MySQL provides a global database system that scales up to large databases without
the need to manage hardware, network components, virtual servers, software patches, and other underly-
ing components.
Certain operations aren't available with Azure Database for MySQL. These functions are primarily con-
cerned with security and administration. Azure manages these aspects of the database server itself.

Benefits of Azure Database for MySQL


You get the following features with Azure Database for MySQL:
●● High availability features built-in.
●● Predictable performance.
●● Easy scaling that responds quickly to demand.
●● Secure data, both at rest and in motion.
●● Automatic backups and point-in-time restore for the last 35 days.
●● Enterprise-level security and compliance with legislation.


The system uses pay-as-you-go pricing so you only pay for what you use.
Azure Database for MySQL servers provides monitoring functionality to add alerts, and to view metrics
and logs.

Azure Database for MariaDB

Azure Database for MariaDB is an implementation of the MariaDB database management system adapt-
ed to run in Azure. It's based on the MariaDB Community Edition.
The database is fully managed and controlled by Azure. Once you've provisioned the service and trans-
ferred your data, the system requires almost no additional administration.

Benefits of Azure Database for MariaDB


Azure Database for MariaDB delivers:
●● Built-in high availability with no additional cost.
●● Predictable performance, using inclusive pay-as-you-go pricing.
●● Scaling as needed within seconds.
●● Secured protection of sensitive data at rest and in motion.
●● Automatic backups and point-in-time-restore for up to 35 days.
●● Enterprise-grade security and compliance.

Azure Database for PostgreSQL

If you prefer PostgreSQL, you can choose Azure Database for PostgreSQL to run a PaaS implementation
of PostgreSQL in the Azure Cloud. This service provides the same availability, performance, scaling,
security, and administrative benefits as the MySQL service.
Some features of on-premises PostgreSQL databases aren't available in Azure Database for PostgreSQL.
These features are mostly concerned with the extensions that users can add to a database to perform
specialized tasks, such as writing stored procedures in various programming languages (other than pgsql,
which is available), and interacting directly with the operating system. A core set of the most frequently
used extensions is supported, and the list of available extensions is under continuous review.
Azure Database for PostgreSQL has three deployment options: Single Server, Flexible Server, and Hyper-
scale.

Azure Database for PostgreSQL Single server


The single-server deployment option for PostgreSQL provides similar benefits as Azure Database for
MySQL. You choose from three pricing tiers: Basic, General Purpose, and Memory Optimized. Each tier
supports different numbers of CPUs, memory, and storage sizes—you select one based on the load you
expect to support.

Azure Database for PostgreSQL Flexible Server


The flexible-server deployment option for PostgreSQL is a fully managed database service. It provides
more control and server configuration customizations, and has better cost optimization controls.

Azure Database for PostgreSQL Hyperscale (Citus)


Hyperscale (Citus) is a deployment option that scales queries across multiple server nodes to support
large database loads. Your database is split across nodes. Data is split into chunks based on the value of a
partition key or sharding key. Consider using this deployment option for the largest PostgreSQL database
deployments in the Azure Cloud.
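As a hedged illustration, Hyperscale (Citus) distributes a table across worker nodes using the Citus create_distributed_table function; the table and column names here are hypothetical:
CREATE TABLE events
(
    device_id INT,
    event_time TIMESTAMP,
    payload JSONB
);

-- Shard the table across worker nodes, using device_id as the distribution (sharding) column
SELECT create_distributed_table('events', 'device_id');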

Benefits of Azure Database for PostgreSQL


Azure Database for PostgreSQL is a highly available service. It contains built-in failure detection and
failover mechanisms.
Users of PostgreSQL will be familiar with the pgAdmin tool, which you can use to manage and monitor a
PostgreSQL database. You can continue to use this tool to connect to Azure Database for PostgreSQL.
However, some server-focused functionality, such as performing server backup and restore, aren't
available because the server is managed and maintained by Microsoft.
Azure Database for PostgreSQL records information about the queries run against databases on the
server, and saves them in a database named azure_sys. You query the query_store.qs_view view to see this
information, and use it to monitor the queries that users are running. This information can prove invalua-
ble if you need to fine-tune the queries performed by your applications.
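For example, after connecting to the azure_sys database on your server, you could run a query similar to
the following sketch to review the most recently captured query statistics (the exact columns returned by
the view may vary):
SELECT *
FROM query_store.qs_view
LIMIT 10;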

Lab: Provision Azure relational database services


In this exercise you’ll provision an Azure SQL Database resource in your Azure subscription, and then use
SQL to query the tables in a relational database.
1. Start the virtual machine for this lab, or go to the exercise page at https://aka.ms/dp900-sql-lab.
2. Follow the instructions to complete the exercise on Microsoft Learn, using the Azure subscription
provided for this lab.

Knowledge check
Question 1
Which deployment option offers the best compatibility when migrating an existing SQL Server on-premises
solution?
†† Azure SQL Database (single database)
†† Azure SQL Database (elastic pool)
†† Azure SQL Managed Instance

Question 2
Which of the following statements is true about Azure SQL Database?
†† Most database maintenance tasks are automated
†† You must purchase a SQL Server license
†† It can only support one database

Question 3
Which database service is the simplest option for migrating a LAMP application to Azure?
†† Azure SQL Managed Instance
†† Azure Database for MySQL
†† Azure Database for PostgreSQL

Summary
Azure supports a range of database services that you can use to support new cloud applications or
migrate existing applications to the cloud.
In this lesson, you learned how to:
●● Identify options for Azure SQL services
●● Identify options for open-source databases in Azure
●● Provision a database service on Azure

Next steps
Now that you've learned about Azure relational database services, consider learning more about data-re-
lated workloads on Azure by pursuing a Microsoft certification in Azure Data Fundamentals6.

6 https://docs.microsoft.com/learn/certifications/azure-data-fundamentals/

Answers
Question 1
Which one of the following statements is a characteristic of a relational database?
†† All columns in a table must be of the same data type
■■ A row in a table represents a single instance of an entity
†† Rows in the same table can contain different columns
Explanation
Each row in a table represents an instance of an entity.
Question 2
Which SQL statement is used to query tables and return data?
†† QUERY
†† READ
■■ SELECT
Explanation
Use the SELECT statement to query one or more tables and return data.
Question 3
What is an index?
■■ A structure that enables queries to locate rows in a table quickly
†† A virtual table based on the results of a query
†† A pre-defined SQL statement that modifies data
Explanation
Indexes improve query performance by locating rows with indexed column values.
Question 1
Which deployment option offers the best compatibility when migrating an existing SQL Server on-prem-
ises solution?
†† Azure SQL Database (single database)
†† Azure SQL Database (elastic pool)
■■ Azure SQL Managed Instance
Explanation
Azure SQL Managed Instance offers near 100% compatibility with SQL Server.
Question 2
Which of the following statements is true about Azure SQL Database?
■■ Most database maintenance tasks are automated
†† You must purchase a SQL Server license
†† It can only support one database
Explanation
Azure SQL Database automates most maintenance tasks.

Question 3
Which database service is the simplest option for migrating a LAMP application to Azure?
†† Azure SQL Managed Instance
■■ Azure Database for MySQL
†† Azure Database for PostgreSQL
Explanation
LAMP stands for Linux, Apache, MySQL, and PHP.
Module 3 Explore fundamentals of non-rela-
tional data in Azure

Fundamentals of Azure Storage


Introduction
Most software applications need to store data. Often this takes the form of a relational database, in which
the data is organized in related tables and managed by using Structured Query Language (SQL). Howev-
er, many applications don't need the rigid structure of a relational database and rely on non-relational
(often referred to as NoSQL) storage.
Azure Storage is one of the core services in Microsoft Azure, and offers a range of options for storing
data in the cloud. In this module, you'll explore the fundamental capabilities of Azure storage and learn
how it's used to support applications that require non-relational data stores.

Learning objectives
In this module, you'll learn how to:
●● Describe features and capabilities of Azure blob storage
●● Describe features and capabilities of Azure Data Lake Gen2
●● Describe features and capabilities of Azure file storage
●● Describe features and capabilities of Azure table storage
●● Provision and use an Azure Storage account

Azure Blob Storage


Azure Blob Storage is a service that enables you to store massive amounts of unstructured data as binary
large objects, or blobs, in the cloud. Blobs are an efficient way to store data files in a format that is
optimized for cloud-based storage, and applications can read and write them by using the Azure blob
storage API.

In an Azure storage account, you store blobs in containers. A container provides a convenient way of
grouping related blobs together. You control who can read and write blobs inside a container at the
container level.
Within a container, you can organize blobs in a hierarchy of virtual folders, similar to files in a file system
on disk. However, by default, these folders are simply a way of using a “/” character in a blob name to
organize the blobs into namespaces. The folders are purely virtual, and you can't perform folder-level
operations to control access or perform bulk operations.
Azure Blob Storage supports three different types of blob:
●● Block blobs. A block blob is handled as a set of blocks. Each block can vary in size, up to 100 MB. A
block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. The block is the
smallest amount of data that can be read or written as an individual unit. Block blobs are best used to
store discrete, large, binary objects that change infrequently.
●● Page blobs. A page blob is organized as a collection of fixed size 512-byte pages. A page blob is
optimized to support random read and write operations; you can fetch and store data for a single
page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtu-
al disk storage for virtual machines.
●● Append blobs. An append blob is a block blob optimized to support append operations. You can
only add blocks to the end of an append blob; updating or deleting existing blocks isn't supported.
Each block can vary in size, up to 4 MB. The maximum size of an append blob is just over 195 GB.
Blob storage provides three access tiers, which help to balance access latency and storage cost:
●● The Hot tier is the default. You use this tier for blobs that are accessed frequently. The blob data is
stored on high-performance media.
●● The Cool tier has lower performance and incurs reduced storage charges compared to the Hot tier.
Use the Cool tier for data that is accessed infrequently. It's common for newly created blobs to be
accessed frequently initially, but less so as time passes. In these situations, you can create the blob in
the Hot tier, but migrate it to the Cool tier later. You can migrate a blob from the Cool tier back to the
Hot tier.
●● The Archive tier provides the lowest storage cost, but with increased latency. The Archive tier is
intended for historical data that mustn't be lost, but is required only rarely. Blobs in the Archive tier
are effectively stored in an offline state. Typical reading latency for the Hot and Cool tiers is a few
milliseconds, but for the Archive tier, it can take hours for the data to become available. To retrieve a
blob from the Archive tier, you must change the access tier to Hot or Cool. The blob will then be
rehydrated. You can read the blob only when the rehydration process is complete.
You can create lifecycle management policies for blobs in a storage account. A lifecycle management
policy can automatically move a blob from Hot to Cool, and then to the Archive tier, as it ages and is
used less frequently (policy is based on the number of days since modification). A lifecycle management
policy can also arrange to delete outdated blobs.

Azure Data Lake Store Gen 2


Azure Data Lake Store (Gen1) is a separate service for hierarchical data storage for analytical data lakes,
often used by so-called big data analytical solutions that work with structured, semi-structured, and
unstructured data stored in files. Azure Data Lake Storage Gen2 is a newer version of this service that is
integrated into Azure Storage; enabling you to take advantage of the scalability of blob storage and the
cost-control of storage tiers, combined with the hierarchical file system capabilities and compatibility with
major analytics systems of Azure Data Lake Store.

Systems like Hadoop in Azure HDInsight, Azure Databricks, and Azure Synapse Analytics can mount a
distributed file system hosted in Azure Data Lake Store Gen2 and use it to process huge volumes of data.
To create an Azure Data Lake Store Gen2 file system, you must enable the Hierarchical Namespace
option of an Azure Storage account. You can do this when initially creating the storage account, or you
can upgrade an existing Azure Storage account to support Data Lake Gen2. Be aware however that
upgrading is a one-way process – after upgrading a storage account to support a hierarchical namespace
for blob storage, you can’t revert it to a flat namespace.

Azure Files
Many on-premises systems comprising a network of in-house computers make use of file shares. A file
share enables you to store a file on one computer, and grant access to that file to users and applications
running on other computers. This strategy can work well for computers in the same local area network,
but doesn't scale well as the number of users increases, or if users are located at different sites.
Azure Files is essentially a way to create cloud-based network shares, such as you typically find in
on-premises organizations to make documents and other files available to multiple users. By hosting file
shares in Azure, organizations can eliminate hardware costs and maintenance overhead, and benefit from
high availability and scalable cloud storage for files.

You create Azure File storage in a storage account. Azure Files enables you to share up to 100 TB of data
in a single storage account. This data can be distributed across any number of file shares in the account.
The maximum size of a single file is 1 TB, but you can set quotas to limit the size of each share below this
figure. Currently, Azure File Storage supports up to 2000 concurrent connections per shared file.
After you've created a storage account, you can upload files to Azure File Storage using the Azure portal,
or tools such as the AzCopy utility. You can also use the Azure File Sync service to synchronize locally
cached copies of shared files with the data in Azure File Storage.
Azure File Storage offers two performance tiers. The Standard tier uses hard disk-based hardware in a
datacenter, and the Premium tier uses solid-state disks. The Premium tier offers greater throughput, but
is charged at a higher rate.
Azure Files supports two common network file sharing protocols:
●● Server Message Block (SMB) file sharing is commonly used across multiple operating systems (Win-
dows, Linux, macOS).
●● Network File System (NFS) shares are used by some Linux and macOS versions. To create an NFS share,
you must use a premium tier storage account and create and configure a virtual network through
which access to the share can be controlled.

Azure Table Storage


Azure Table Storage is a NoSQL storage solution that makes use of tables containing key/value data
items. Each item is represented by a row that contains columns for the data fields that need to be stored.

However, don't be misled into thinking that an Azure Table Storage table is like a table in a relational
database. An Azure Table enables you to store semi-structured data. All rows in a table must have a
unique key (composed of a partition key and a row key), and when you modify data in a table, a times-
tamp column records the date and time the modification was made; but other than that, the columns in
each row can vary. Azure Table Storage tables have no concept of foreign keys, relationships, stored
procedures, views, or other objects you might find in a relational database. Data in Azure Table storage is
usually denormalized, with each row holding the entire data for a logical entity. For example, a table
holding customer information might store the first name, last name, one or more telephone numbers,
and one or more addresses for each customer. The number of fields in each row can be different, de-
pending on the number of telephone numbers and addresses for each customer, and the details record-
ed for each address. In a relational database, this information would be split across multiple rows in
several tables.
To help ensure fast access, Azure Table Storage splits a table into partitions. Partitioning is a mechanism
for grouping related rows, based on a common property or partition key. Rows that share the same
partition key will be stored together. Partitioning not only helps to organize data, it can also improve
scalability and performance in the following ways:
●● Partitions are independent from each other, and can grow or shrink as rows are added to, or removed
from, a partition. A table can contain any number of partitions.
●● When you search for data, you can include the partition key in the search criteria. This helps to narrow
down the volume of data to be examined, and improves performance by reducing the amount of I/O
(input and output operations, or reads and writes) needed to locate the data.
The key in an Azure Table Storage table comprises two elements; the partition key that identifies the
partition containing the row, and a row key that is unique to each row in the same partition. Items in the
same partition are stored in row key order. If an application adds a new row to a table, Azure ensures that
the row is placed in the correct position in the table. This scheme enables an application to quickly
perform point queries that identify a single row, and range queries that fetch a contiguous block of rows
in a partition.
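For example, a point query against a hypothetical Customers table in a storage account named
mystorageaccount specifies both keys in the request URI, allowing the service to locate the row directly:
https://mystorageaccount.table.core.windows.net/Customers(PartitionKey='1',RowKey='124')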

Lab: Explore Azure Storage


In this exercise you’ll provision and use Azure Storage.
1. Start the virtual machine for this lab, or go to the exercise page at https://aka.ms/dp900-storage-lab.
2. Follow the instructions to complete the exercise on Microsoft Learn, using the Azure subscription
provided for this lab.

Knowledge check
Question 1
What are the elements of an Azure Table storage key?
†† Table name and column name
†† Partition key and row key
†† Row number

Question 2
What should you do to an existing Azure Storage account in order to support a data lake for Azure
Synapse Analytics?
†† Add an Azure Files share
†† Create Azure Storage tables for the data you want to analyze
†† Upgrade the account to enable hierarchical namespace and create a blob container

Question 3
Why might you use Azure File storage?
†† To share files that are stored on-premises with users located at other sites.
†† To enable users at different sites to share files.
†† To store large binary data files containing images or other unstructured data.

Summary
Azure Storage is a key service in Microsoft Azure, and enables a wide range of data storage scenarios and
solutions.
In this lesson, you learned how to:
●● Describe features and capabilities of Azure blob storage
●● Describe features and capabilities of Azure Data Lake Gen2
●● Describe features and capabilities of Azure file storage
●● Describe features and capabilities of Azure table storage
●● Provision and use an Azure Storage account

Next steps
Now that you've learned about Azure Storage for non-relational data storage, consider learning more
about data-related workloads on Azure by pursuing a Microsoft certification in Azure Data Fundamen-
tals1.

1 https://docs.microsoft.com/learn/certifications/azure-data-fundamentals/

Fundamentals of Azure Cosmos DB


Introduction
Relational databases store data in relational tables, but sometimes the structure imposed by this model
can be too rigid, and often leads to poor performance unless you spend time implementing detailed
tuning. Other models, collectively known as NoSQL databases, exist. These models store data in other
structures, such as documents, graphs, key-value stores, and column family stores.
Azure Cosmos DB is a highly scalable cloud database service for NoSQL data.

Learning objectives
In this module, you'll learn how to:
●● Describe key features and capabilities of Azure Cosmos DB
●● Identify the APIs supported in Azure Cosmos DB
●● Provision and use an Azure Cosmos DB instance

What is Azure Cosmos DB?

Azure Cosmos DB supports multiple application programming interfaces (APIs) that enable developers to
use the programming semantics of many common kinds of data store to work with data in a Cosmos DB
database. The internal data structure is abstracted, enabling developers to use Cosmos DB to store and
query data using APIs with which they're already familiar.
NOTE:
An API is an Application Programming Interface. Database management systems (and other software
frameworks) provide a set of APIs that developers can use to write programs that need to access data.
The APIs vary for different database management systems.

Cosmos DB uses indexes and partitioning to provide fast read and write performance and can scale to
massive volumes of data. You can enable multi-region writes, adding the Azure regions of your choice to
your Cosmos DB account so that globally distributed users can each work with data in their local replica.

When to use Cosmos DB


Cosmos DB is a highly scalable database management system. Cosmos DB automatically allocates space
in a container for your partitions, and each partition can grow up to 10 GB in size. Indexes are created
and maintained automatically. There's virtually no administrative overhead.
Cosmos DB is a foundational service in Azure. Cosmos DB has been used by many of Microsoft's products
for mission critical applications at global scale, including Skype, Xbox, Microsoft 365, Azure, and many
others. Cosmos DB is highly suitable for the following scenarios:
●● IoT and telematics. These systems typically ingest large amounts of data in frequent bursts of activity.
Cosmos DB can accept and store this information quickly. The data can then be used by analytics
services, such as Azure Machine Learning, Azure HDInsight, and Power BI. Additionally, you can
process the data in real-time using Azure Functions that are triggered as data arrives in the database.
●● Retail and marketing. Microsoft uses Cosmos DB for its own e-commerce platforms that run as part of
Windows Store and Xbox Live. It's also used in the retail industry for storing catalog data and for
event sourcing in order processing pipelines.
●● Gaming. The database tier is a crucial component of gaming applications. Modern games perform
graphical processing on mobile/console clients, but rely on the cloud to deliver customized and
personalized content like in-game stats, social media integration, and high-score leaderboards. Games
often require single-millisecond latencies for reads and writes to provide an engaging in-game
experience. A game database needs to be fast and be able to handle massive spikes in request rates
during new game launches and feature updates.
●● Web and mobile applications. Azure Cosmos DB is commonly used within web and mobile applica-
tions, and is well suited for modeling social interactions, integrating with third-party services, and for
building rich personalized experiences. The Cosmos DB SDKs can be used to build rich iOS and
Android applications using the popular Xamarin framework.
For additional information about uses for Cosmos DB, read Common Azure Cosmos DB use cases2.

Azure Cosmos DB APIs


Azure Cosmos DB supports multiple APIs, enabling developers to easily migrate data from commonly
used NoSQL stores and apply their existing programming skills. When you provision a new Cosmos DB
instance, you select the API that you want to use. The choice of API depends on many factors including,
the type of data to be stored, the need to support existing applications, and the API skills of the develop-
ers who will work with the data store.

Core (SQL) API


The native API in Cosmos DB manages data in JSON document format, and despite being a NoSQL data
storage solution, uses SQL syntax to work with the data.
A SQL query for a Cosmos DB database containing customer data might look similar to this:
SELECT *
FROM customers c
WHERE c.id = "joe@litware.com"

2 https://docs.microsoft.com/azure/cosmos-db/use-cases

The result of this query consists of one or more JSON documents, as shown here:
{
"id": "joe@litware.com",
"name": "Joe Jones",
"address": {
"street": "1 Main St.",
"city": "Seattle"
}
}

MongoDB API
MongoDB is a popular open source database in which data is stored in Binary JSON (BSON) format. The
Azure Cosmos DB MongoDB API enables developers to use MongoDB client libraries and code to work
with data in Azure Cosmos DB.
MongoDB Query Language (MQL) uses a compact, object-oriented syntax in which developers use
objects to call methods. For example, the following query uses the find method to query the products
collection in the db object:
db.products.find({id: 123})

The results of this query consist of JSON documents, similar to this:


{
"id": 123,
"name": "Hammer",
"price": 2.99
}

Table API
The Table API is used to work with data in key-value tables, similar to Azure Table Storage. The Azure
Cosmos DB Table API offers greater scalability and performance than Azure Table Storage.
For example, you might define a table named Customers like this:

PartitionKey   RowKey   Name          Email
1              123      Joe Jones     joe@litware.com
1              124      Samir Nadoy   samir@northwind.com
You can then use the Cosmos DB Table API through one of the language-specific SDKs to make calls to
your service endpoint to retrieve data from the table. For example, the following request returns the row
containing the record for Samir Nadoy in the table above:
https://endpoint/Customers(PartitionKey='1',RowKey='124')

Cassandra API
The Cassandra API is compatible with Apache Cassandra, which is a popular open source database that
uses a column-family storage structure. Column families are tables, similar to those in a relational
database, with the exception that it's not mandatory for every row to have the same columns.
For example, you might create an Employees table like this:

ID   Name        Manager
1    Sue Smith
2    Ben Chan    Sue Smith
Cassandra supports a syntax based on SQL, so a client application could retrieve the record for Ben Chan
like this:
SELECT * FROM Employees WHERE ID = 2

Gremlin API
The Gremlin API is used with data in a graph structure, in which entities are defined as vertices that form
nodes in a connected graph. Nodes are connected by edges that represent relationships, like this:

The example in the image shows two kinds of vertex (employee and department) and edges that connect
them (employee "Ben" reports to employee "Sue", and both employees work in the "Hardware" depart-
ment).

Gremlin syntax includes functions to operate on vertices and edges, enabling you to insert, update,
delete, and query data in the graph. For example, you could use the following code to add a new
employee named Alice that reports to the employee with ID 1 (Sue):
g.addV('employee').property('id', '3').property('firstName', 'Alice')
g.V('3').addE('reports to').to(g.V('1'))

The following query returns all of the employee vertices, in order of ID.
g.V().hasLabel('employee').order().by('id')

Lab: Explore Azure Cosmos DB


In this exercise you’ll provision and use Azure Cosmos DB.
1. Start the virtual machine for this lab, or go to the exercise page at https://aka.ms/dp900-cosmos-lab.
2. Follow the instructions to complete the exercise on Microsoft Learn, using the Azure subscription
provided for this lab.

Knowledge check
Question 1
Which API should you use to store and query JSON documents in Azure Cosmos DB?
†† Core (SQL) API
†† Cassandra API
†† Table API

Question 2
Which Azure Cosmos DB API should you use to work with data in which entities and their relationships to
one another are represented in a graph using vertices and edges?
†† MongoDB API
†† Core (SQL) API
†† Gremlin API

Question 3
How can you enable globally distributed users to work with their own local replica of a Cosmos DB data-
base?
†† Create an Azure Cosmos DB account in each region where you have users.
†† Use the Table API to copy data to Azure Table Storage in each region where you have users.
†† Enable multi-region writes and add the regions where you have users.

Summary
Azure Cosmos DB provides a global-scale database solution for non-relational data.
In this lesson, you learned how to:
●● Describe key features and capabilities of Azure Cosmos DB
●● Identify the APIs supported in Azure Cosmos DB
●● Provision and use an Azure Cosmos DB instance

Next steps
Now that you've learned about Azure Cosmos DB for non-relational data storage, consider learning more
about data-related workloads on Azure by pursuing a Microsoft certification in Azure Data Fundamen-
tals3.

3 https://docs.microsoft.com/learn/certifications/azure-data-fundamentals/

Answers
Question 1
What are the elements of an Azure Table storage key?
†† Table name and column name
■■ Partition key and row key
†† Row number
Explanation
The partition key identifies the partition in which a row is located, and the rows in each partition are stored
in row key order.
Question 2
What should you do to an existing Azure Storage account in order to support a data lake for Azure
Synapse Analytics?
†† Add an Azure Files share
†† Create Azure Storage tables for the data you want to analyze
■■ Upgrade the account to enable hierarchical namespace and create a blob container
Explanation
Enabling a hierarchical namespace adds support for Azure Data Lake Storage Gen 2, which can be used by
Synapse Analytics.
Question 3
Why might you use Azure File storage?
†† To share files that are stored on-premises with users located at other sites.
■■ To enable users at different sites to share files.
†† To store large binary data files containing images or other unstructured data.
Explanation
You can create a file share in Azure File storage, upload files to this file share, and grant access to the file
share to remote users.
Question 1
Which API should you use to store and query JSON documents in Azure Cosmos DB?
■■ Core (SQL) API
†† Cassandra API
†† Table API
Explanation
The core (SQL) API is designed to store and query JSON documents.

Question 2
Which Azure Cosmos DB API should you use to work with data in which entities and their relationships to
one another are represented in a graph using vertices and edges?
†† MongoDB API
†† Core (SQL) API
■■ Gremlin API
Explanation
The Gremlin API is used to manage a network of nodes (vertices) and the relationships between them
(edges).
Question 3
How can you enable globally distributed users to work with their own local replica of a Cosmos DB
database?
†† Create an Azure Cosmos DB account in each region where you have users.
†† Use the Table API to copy data to Azure Table Storage in each region where you have users.
■■ Enable multi-region writes and add the regions where you have users.
Explanation
You can enable multi-region writes in the regions where you want users to work with the data.
Module 4 Explore fundamentals of data ana-
lytics

Modern data warehousing


Introduction
Modern data warehousing is a generic term that describes the infrastructure and processes used to
support large-scale data analytics. Modern data warehousing solutions combine conventional data
warehousing used to support business intelligence (BI), which typically involves copying data from
transactional data stores into a relational database with a schema that's optimized for querying and
building multidimensional models; with techniques used for so-called “big data” analytics, in which large
volumes of data in multiple formats are batch loaded or captured in real-time streams, and stored in a
data lake from which distributed processing engines like Apache Spark are used to process the data at
scale.

Learning objectives
In this module, you will learn how to:
●● Identify common elements of a modern data warehousing solution
●● Describe key features for data ingestion pipelines
●● Identify common types of analytical data store and related Azure services
●● Provision Azure Synapse Analytics and use it to ingest, process, and query data

What is modern data warehousing?


Modern data warehousing architecture can vary, as can the specific technologies used to implement it;
but in general, the following elements are included:

1. Data ingestion and processing – data from one or more transactional data stores, files, real-time
streams, or other sources is loaded into a data lake or a relational data warehouse. The load operation
usually involves an extract, transform, and load (ETL) or extract, load, and transform (ELT) process in
which the data is cleaned, filtered, and restructured for analysis. In ETL processes, the data is transformed
before being loaded into an analytical store, while in an ELT process the data is copied to the store and
then transformed (a simple ELT-style transformation is sketched after this list). Either way, the resulting
data structure is optimized for analytical queries.
The data processing is often performed by distributed systems that can process high volumes of data
in parallel using multi-node clusters. Data ingestion includes both batch processing of static data and
real-time processing of streaming data.
2. Analytical data store – data stores for large scale analytics include relational data warehouses,
file-system based data lakes, and hybrid architectures that combine features of data warehouses and
data lakes (sometimes called data lakehouses or lake databases). We'll discuss these in more depth
later.
3. Analytical data model – while data analysts and data scientists can work with the data directly in the
analytical data store, it’s common to create one or more data models that pre-aggregate the data to
make it easier to produce reports, dashboards, and interactive visualizations. Often these data models
are described as cubes, in which numeric data values are aggregated across one or more dimensions
(for example, to determine total sales by product and region). The model encapsulates the relation-
ships between data values and dimensional entities to support “drill-up/drill-down” analysis.
4. Data visualization – data analysts consume data from analytical models, and directly from analytical
stores to create reports, dashboards, and other visualizations. Additionally, users in an organization
who may not be technology professionals might perform self-service data analysis and reporting. The
visualizations from the data show trends, comparisons, and key performance indicators (KPIs) for a
business or other organization, and can take the form of printed reports, graphs and charts in docu-
ments or PowerPoint presentations, web-based dashboards, and interactive environments in which
users can explore data visually.
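As an example of the ELT pattern described in step 1, the following SQL is a minimal sketch in which raw
sales data that has already been copied into a staging table is cleaned and restructured into an analytical
table; the staging.sales_raw and analytics.sales tables and their columns are hypothetical:
-- Transform data that was previously loaded into the analytical store
INSERT INTO analytics.sales (sale_date, product_id, store_id, amount)
SELECT CAST(sale_date AS date),
       product_id,
       store_id,
       CAST(amount AS decimal(10,2))
FROM staging.sales_raw
WHERE amount IS NOT NULL;  -- filter out incomplete records as part of the transformation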

Data ingestion and processing pipelines


Now that you understand a little about the architecture of a modern data warehousing solution, and
some of the distributed processing technologies that can be used to handle large volumes of data, it's
time to explore how data is ingested into an analytical data store from one or more sources.

On Azure, large-scale data ingestion is best implemented by creating pipelines that orchestrate ETL
processes. You can create and run pipelines using Azure Data Factory1, or you can use the same pipeline
engine in Azure Synapse Analytics2 if you want to manage all of the components of your data ware-
housing solution in a unified workspace.
In either case, pipelines consist of one or more activities that operate on data. An input dataset provides
the source data, and activities can be defined as a data flow that incrementally manipulates the data until
an output dataset is produced. Pipelines use linked services to load and process data – enabling you to
use the right technology for each step of the workflow. For example, you might use an Azure Blob Store
linked service to ingest the input dataset, and then use services such as Azure SQL Database to run a
stored procedure that looks up related data values, before running a data processing task on Azure
Databricks or Azure HDInsight, or apply custom logic using an Azure Function. Finally, you can save the
output dataset in a linked service such as Azure Synapse Analytics. Pipelines can also include some
built-in activities, which don’t require a linked service.

Analytical data stores


There are two common types of analytical data store.

Data warehouses

1 https://azure.microsoft.com/services/data-factory?azure-portal=true
2 https://azure.microsoft.com/services/synapse-analytics?azure-portal=true

A data warehouse is a relational database in which the data is stored in a schema that is optimized for
data analytics rather than transactional workloads. Commonly, the data from a transactional store is
denormalized into a schema in which numeric values are stored in central fact tables, which are related to
one or more dimension tables that represent entities by which the data can be aggregated. For example,
a fact table might contain sales order data, which can be aggregated by customer, product, store, and
time dimensions (enabling you, for example, to easily find monthly total sales revenue by product for
each store). This kind of fact and dimension table schema is called a star schema; though it's often
extended into a snowflake schema by adding additional tables related to the dimension tables to repre-
sent dimensional hierarchies (for example, product might be related to product categories). A data
warehouse is a great choice when you have transactional data that can be organized into a structured
schema of tables, and you want to use SQL to query them.
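For example, a query against a hypothetical star schema (a FactSales fact table joined to DimDate and
DimProduct dimension tables) might aggregate monthly sales revenue by product like this:
SELECT d.CalendarMonth, p.ProductName, SUM(f.SalesAmount) AS TotalRevenue
FROM FactSales AS f
JOIN DimDate AS d ON f.DateKey = d.DateKey
JOIN DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarMonth, p.ProductName
ORDER BY d.CalendarMonth, p.ProductName;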

Data lakes

A data lake is a file store, usually on a distributed file system for high performance data access. Technolo-
gies like Spark or Hadoop are often used to process queries on the stored files and return data for
reporting and analytics. These systems often apply a schema-on-read approach to define tabular schemas
on semi-structured data files at the point where the data is read for analysis, without applying constraints
when it's stored. Data lakes are great for supporting a mix of structured, semi-structured, and even
unstructured data that you want to analyze without the need for schema enforcement when the data is
written to the store.

Hybrid approaches
You can use a hybrid approach that combines features of data lakes and data warehouses in a lake
database or data lakehouse. The raw data is stored as files in a data lake, and a relational storage layer
abstracts the underlying files and exposes them as tables, which can be queried using SQL. SQL pools in
Azure Synapse Analytics include PolyBase, which enables you to define external tables based on files in a
data lake (and other sources) and query them using SQL. Synapse Analytics also supports a Lake Database
approach in which you can use database templates to define the relational schema of your data ware-
house, while storing the underlying data in data lake storage – separating the storage and compute for
your data warehousing solution. Data lakehouses are a relatively new approach in Spark-based systems,
and are enabled through technologies like Delta Lake; which adds relational storage capabilities to Spark,
so you can define tables that enforce schemas and transactional consistency, support batch-loaded and
streaming data sources, and provide a SQL API for querying.
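As an illustration, a serverless SQL pool in Azure Synapse Analytics can use the OPENROWSET function to
query files in a data lake directly; in this sketch the storage URL and folder path are placeholders:
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/files/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;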

Choose an analytical data store service


Azure services for analytical stores
On Azure, there are three main services that you can use to implement a large-scale analytical store:

Azure Synapse Analytics3 is a unified, end-to-end solution for large scale data analytics. It brings
together multiple technologies and capabilities, enabling you to combine the data integrity and reliability
of a scalable, high-performance SQL Server based relational data warehouse with the flexibility of a data
lake and open-source Apache Spark. It also includes native support for log and telemetry analytics with
Azure Synapse Data Explorer pools, as well as built in data pipelines for data ingestion and transforma-
tion. All Azure Synapse Analytics services can be managed through a single, interactive user interface
called Azure Synapse Studio, which includes the ability to create interactive notebooks in which Spark
code and markdown content can be combined. Synapse Analytics is a great choice when you want to
create a single, unified analytics solution on Azure.

Azure Databricks4 is an Azure implementation of the popular Databricks platform. Databricks is a
comprehensive data analytics solution built on Apache Spark, and offers native SQL capabilities as well as
workload-optimized Spark clusters for data analytics and data science. Databricks provides an interactive
user interface through which the system can be managed and data can be explored in interactive note-
books. Due to its common use on multiple cloud platforms, you might want to consider using Azure
Databricks as your analytical store if you want to use existing expertise with the platform or if you need to
operate in a multi-cloud environment or support a cloud-portable solution.

Azure HDInsight5 is an Azure service that supports multiple open-source data analytics cluster types.
Although not as user-friendly as Azure Synapse Analytics and Azure Databricks, it can be a suitable
option if your analytics solution relies on multiple open-source frameworks or if you need to migrate an
existing on-premises Hadoop-based solution to the cloud.
NOTE:
Each of these services can be thought of as an analytical data store, in the sense that they provide a
schema and interface through which the data can be queried. In many cases however, the data is actually
stored in a data lake and the service is used to process the data and run queries. Some solutions might
even combine the use of these services. An extract, load, and transform (ELT) ingestion process might
copy data into the data lake, and then use one of these services to transform the data, and another to
query it. For example, a pipeline might use a MapReduce job running in HDInsight or a notebook running
in Azure Databricks to process a large volume of data in the data lake, and then load it into tables in a
SQL pool in Azure Synapse Analytics.

3 https://azure.microsoft.com/services/synapse-analytics?azure-portal=true
4 https://azure.microsoft.com/services/databricks?azure-portal=true
5 https://azure.microsoft.com/services/hdinsight?azure-portal=true

Lab: Explore Azure Synapse Analytics


In this lab, you will provision an Azure Synapse Analytics workspace, and use it to ingest and process
data.
1. Start the virtual machine for this lab, or go to the exercise page at https://aka.ms/dp900-synapse-lab.
2. Follow the instructions to complete the exercise on Microsoft Learn, using the Azure subscription
provided for this lab.

Knowledge check
Question 1
Which Azure services can you use to create a pipeline for data ingestion and processing?
†† Azure SQL Database and Azure Cosmos DB
†† Azure Synapse Analytics and Azure Data Factory
†† Azure HDInsight and Azure Databricks

Question 2
What must you define to implement a pipeline that reads data from Azure Blob Storage?
†† A linked service for your Azure Blob Storage account
†† A dedicated SQL pool in your Azure Synapse Analytics workspace
†† An Azure HDInsight cluster in your subscription

Question 3
Which open-source distributed processing engine does Azure Synapse Analytics include?
†† Apache Hadoop
†† Apache Spark
†† Apache Storm

Summary
Modern data warehousing is a complex workload that can involve many different technologies. This
module has provided a high-level overview of the key features of a modern data warehousing solution,
and explored some of the services in Azure that you can use to implement one.
In this module, you learned how to:
●● Identify common elements of a modern data warehousing solution
●● Describe key features for data ingestion pipelines
●● Identify common types of analytical data store and related Azure services
●● Provision Azure Synapse Analytics and use it to ingest, process, and query data

Next steps
Now that you've learned about modern data warehousing, consider learning more about data-related
workloads on Azure by pursuing a Microsoft certification in Azure Data Fundamentals6.

6 https://docs.microsoft.com/learn/certifications/azure-data-fundamentals/

Streaming and real-time analytics


Introduction
Increased use of technology by individuals, companies, and other organizations, together with the
proliferation of smart devices and Internet access has led to a massive growth in the volume of data that
can be generated, captured, and analyzed. Much of this data can be processed in real-time (or at least,
near real-time) as a perpetual stream of data, enabling the creation of systems that reveal instant insights
and trends, or take immediate responsive action to events as they occur.

Learning objectives
In this module, you'll learn about the basics of stream processing and real-time analytics, and the services
in Microsoft Azure that you can use to implement real-time data processing solutions. Specifically, you'll
learn how to:
●● Compare batch and stream processing
●● Describe common elements of streaming data solutions
●● Describe features and capabilities of Azure Stream Analytics
●● Describe features and capabilities of Spark Structured Streaming on Azure
●● Describe features and capabilities of Azure Synapse Data Explorer
NOTE:
This module is designed to present a conceptual overview of real-time processing and describe Azure
services that can be used to build real-time analytics solutions. It is not intended to teach implementation
details for creating a stream processing solution.

Batch vs stream processing


Data processing is simply the conversion of raw data to meaningful information through a process. There
are two general ways to process data:
●● Batch processing, in which multiple data records are collected and stored before being processed
together in a single operation.
●● Stream processing, in which a source of data is constantly monitored and processed in real time as
new data events occur.

Understand batch processing


In batch processing, newly arriving data elements are collected and stored, and the whole group is
processed together as a batch. Exactly when each group is processed can be determined in a number of
ways. For example, you can process data based on a scheduled time interval (for example, every hour), or
it could be triggered when a certain amount of data has arrived, or as the result of some other event.
For example, suppose you want to analyze road traffic by counting the number of cars on a stretch of
road. A batch processing approach to this would require that you collect the cars in a parking lot, and
then count them in a single operation while they're at rest.

If the road is busy, with a large number of cars driving along at frequent intervals, this approach may be
impractical; and note that you don't get any results until you have parked a batch of cars and counted
them.
A real world example of batch processing is the way that credit card companies handle billing. The
customer doesn't receive a bill for each separate credit card purchase but one monthly bill for all of that
month's purchases.
Advantages of batch processing include:
●● Large volumes of data can be processed at a convenient time.
●● It can be scheduled to run at a time when computers or systems might otherwise be idle, such as
overnight, or during off-peak hours.
Disadvantages of batch processing include:
●● The time delay between ingesting the data and getting the results.
●● All of a batch job's input data must be ready before a batch can be processed. This means data must
be carefully checked. Problems with data, errors, and program crashes that occur during batch jobs
bring the whole process to a halt. The input data must be carefully checked before the job can be run
again. Even minor data errors can prevent a batch job from running.

Understand stream processing


In stream processing, each new piece of data is processed when it arrives. Unlike batch processing,
there's no waiting until the next batch processing interval - data is processed as individual units in
real-time rather than being processed a batch at a time. Stream data processing is beneficial in scenarios
where new, dynamic data is generated on a continual basis.
For example, a better approach to our hypothetical car counting problem might be to apply a streaming
approach, by counting the cars in real-time as they pass:

In this approach, you don't need to wait until all of the cars have parked to start processing them, and
you can aggregate the data over time intervals; for example, by counting the number of cars that pass
each minute.
Real world examples of streaming data include:
●● A financial institution tracks changes in the stock market in real time, computes value-at-risk, and
automatically rebalances portfolios based on stock price movements.
●● An online gaming company collects real-time data about player-game interactions, and feeds the
data into its gaming platform. It then analyzes the data in real time, offers incentives and dynamic
experiences to engage its players.
●● A real-estate website that tracks a subset of data from mobile devices, and makes real-time recom-
mendations of properties to visit based on the user's geo-location.
Stream processing is ideal for time-critical operations that require an instant real-time response. For
example, a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors
to allow residents to escape immediately in the event of a fire.

Understand differences between batch and streaming data


Apart from the way in which batch processing and streaming processing handle data, there are other
differences:
●● Data scope: Batch processing can process all the data in the dataset. Stream processing typically only
has access to the most recent data received, or within a rolling time window (the last 30 seconds, for
example).
●● Data size: Batch processing is suitable for handling large datasets efficiently. Stream processing is
intended for individual records or micro batches consisting of few records.

●● Performance: Latency is the time taken for the data to be received and processed. The latency for
batch processing is typically a few hours. Stream processing typically occurs immediately, with latency
in the order of seconds or milliseconds.
●● Analysis: You typically use batch processing to perform complex analytics. Stream processing is used
for simple response functions, aggregates, or calculations such as rolling averages.

Combine batch and stream processing


Many large-scale analytics solutions include a mix of batch and stream processing, enabling both histori-
cal and real-time data analysis. It's common for stream processing solutions to capture real-time data,
process it by filtering or aggregating it, and present it through real-time dashboards and visualizations
(for example, showing the running total of cars that have passed along a road within the current hour),
while also persisting the processed results in a data store for historical analysis alongside batch processed
data (for example, to enable analysis of traffic volumes over the past year).
Even when real-time analysis or visualization of data is not required, streaming technologies are often
used to capture real-time data and store it in a data store for subsequent batch processing (this is the
equivalent of redirecting all of the cars that travel along a road into a parking lot before counting them).
The following diagram shows some ways in which batch and stream processing can be combined in a
large-scale data analytics architecture.

1. Data events from a streaming data source are captured in real-time.
2. Data from other sources is ingested into a data store (often a data lake) for batch processing.
3. If real-time analytics is not required, the captured streaming data is written to the data store for
subsequent batch processing.
4. When real-time analytics is required, a stream processing technology is used to prepare the streaming
data for real-time analysis or visualization; often by filtering or aggregating the data over temporal
windows.
5. The non-streaming data is periodically batch processed to prepare it for analysis, and the results are
persisted in an analytical data store (often referred to as a data warehouse) for historical analysis.
6. The results of stream processing may also be persisted in the analytical data store to support histori-
cal analysis.
7. Analytical and visualization tools are used to present and explore the real-time and historical data.
NOTE:
Commonly used solution architectures for combined batch and stream data processing include lambda
and delta architectures. Details of these architectures are beyond the scope of this course, but they
incorporate technologies for both large-scale batch data processing and real-time stream processing to
create an end-to-end analytical solution.
There are many technologies that you can use to implement a stream processing solution, but while
specific implementation details may vary, there are common elements to most streaming architectures.

A general architecture for stream processing


At its simplest, a high-level architecture for stream processing looks like this:

1. An event generates some data. This might be a signal being emitted by a sensor, a social media
message being posted, a log file entry being written, or any other occurrence that results in some
digital data.
2. The generated data is captured in a streaming source for processing. In simple cases, the source may
be a folder in a cloud data store or a table in a database. In more robust streaming solutions, the
source may be a “queue” that encapsulates logic to ensure that event data is processed in order and
that each event is processed only once.
3. The event data is processed, often by a perpetual query that operates on the event data to select data
for specific types of events, project data values, or aggregate data values over temporal (time-based)
periods (or windows) - for example, by counting the number of sensor emissions per minute.
4. The results of the stream processing operation are written to an output (or sink), which may be a file, a
database table, a real-time visual dashboard, or another queue for further processing by a subsequent
downstream query.

Real-time analytics in Azure


Microsoft Azure supports multiple technologies that you can use to implement real-time analytics of
streaming data, including:
●● Azure Stream Analytics: A platform-as-a-service (PaaS) solution that you can use to define streaming
jobs that ingest data from a streaming source, apply a perpetual query, and write the results to an
output.
●● Spark Structured Streaming: An open-source library that enables you to develop complex streaming
solutions on Apache Spark based services, including Azure Synapse Analytics, Azure Databricks,
and Azure HDInsight.
●● Azure Data Explorer: A high-performance database and analytics service that is optimized for
ingesting and querying batch or streaming data with a time-series element, and which can be used as
a standalone Azure service or as an Azure Synapse Data Explorer runtime in an Azure Synapse
Analytics workspace.

Sources for stream processing


The following services are commonly used to ingest data for stream processing on Azure:
●● Azure Event Hubs: A data ingestion service that you can use to manage queues of event data,
ensuring that each event is processed in order, exactly once.
●● Azure IoT Hub: A data ingestion service that is similar to Azure Event Hubs, but which is optimized
for managing event data from Internet-of-things (IoT) devices.
●● Azure Data Lake Store Gen 2: A highly scalable storage service that is often used in batch processing
scenarios, but which can also be used as a source of streaming data.
●● Apache Kafka: An open-source data ingestion solution that is commonly used together with Apache
Spark. You can use Azure HDInsight to create a Kafka cluster.

Sinks for stream processing


The output from stream processing is often sent to the following services:
●● Azure Event Hubs: Used to queue the processed data for further downstream processing.
●● Azure Data Lake Store Gen 2 or Azure blob storage: Used to persist the processed results as a file.
●● Azure SQL Database or Azure Synapse Analytics, or Azure Databricks: Used to persist the pro-
cessed results in a database table for querying and analysis.
●● Microsoft Power BI: Used to generate real time data visualizations in reports and dashboards.

Real-time data processing with Azure Stream Analytics


Azure Stream Analytics is a service for complex event processing and analysis of streaming data. Stream
Analytics is used to:
●● Ingest data from an input, such as an Azure event hub, Azure IoT Hub, or Azure Storage blob contain-
er.
●● Process the data by using a query to select, project, and aggregate data values.
●● Write the results to an output, such as Azure Data Lake Gen 2, Azure SQL Database, Azure Synapse
Analytics, Azure Functions, Azure event hub, Microsoft Power BI, or others.

Once started, a Stream Analytics query will run perpetually, processing new data as it arrives in the input
and storing results in the output.
Azure Stream Analytics is a great technology choice when you need to continually capture data from a
streaming source, filter or aggregate it, and send the results to a data store or downstream process for
analysis and reporting.

Azure Stream Analytics jobs and clusters


The easiest way to use Azure Stream Analytics is to create a Stream Analytics job in an Azure subscription,
configure its input(s) and output(s), and define the query that the job will use to process the data. The
query is expressed using structured query language (SQL) syntax, and can incorporate static reference
data from multiple data sources to supply lookup values that can be combined with the streaming data
ingested from an input.
If your stream processing requirements are complex or resource-intensive, you can create a Stream Analytics
cluster, which uses the same underlying processing engine as a Stream Analytics job, but in a dedicated
tenant (so your processing is not affected by other customers) and with configurable scalability that
enables you to define the right balance of throughput and cost for your specific scenario.
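For example, a simple Stream Analytics query that counts sensor readings per minute might look like the
following sketch; the input and output aliases, and the SensorId and EventTime fields, are hypothetical
names defined in the job configuration:
SELECT
    SensorId,
    COUNT(*) AS ReadingsPerMinute
INTO
    [powerbi-output]
FROM
    [eventhub-input] TIMESTAMP BY EventTime
GROUP BY
    SensorId, TumblingWindow(minute, 1)
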
NOTE:
To learn more about the capabilities of Azure Stream Analytics, see the Azure Stream Analytics docu-
mentation7.

Real-time log and telemetry analysis with Azure Data Explorer


Azure Data Explorer is a standalone Azure service for efficiently analyzing data. You can use the service as
the output for analyzing large volumes of diverse data from data sources such as websites, applications,
IoT devices, and more. For example, by outputting Azure Stream Analytics logs to Azure Data Explorer,
you can complement Stream Analytics low latency alerts handling with Data Explorer's deep investigation
capabilities. The service is also encapsulated as a runtime in Azure Synapse Analytics, where it is referred
to as Azure Synapse Data Explorer; enabling you to build and manage analytical solutions that combine
SQL, Spark, and Data Explorer analytics in a single workspace.

7 https://docs.microsoft.com/azure/stream-analytics/

Data is ingested into Data Explorer through one or more connectors or by writing a minimal amount of
code. This enables you to quickly ingest data from a wide variety of data sources, including both static
and streaming sources. Data Explorer supports batching and streaming in near real time to optimize data
ingestion. The ingested data is stored in tables in a Data Explorer database, where automatic indexing
enables high-performance queries.
Azure Data Explorer is a great choice of technology when you need to:
●● Capture and analyze real-time or batch data that includes a time-series element, such as log telemetry
or values emitted by Internet-of-things (IoT) devices.
●● Explore, filter, and aggregate data quickly by using the intuitive and powerful Kusto Query Language
(KQL).
Azure Synapse Data Explorer is an especially good choice when you need to perform these tasks in a
centralized environment used for other kinds of analytics, such as SQL and Spark based queries.
TIP:
To learn more about Azure Data Explorer and its uses, see the Introduction to Azure Data Explorer8
module, which is not part of the official materials for this course but supports further learning beyond
the scope of the Data Fundamentals certification.

Kusto Query Language (KQL)


To query Data Explorer tables, you can use Kusto Query Language (KQL), a language that is specifically
optimized for fast read performance – particularly with telemetry data that includes a timestamp
attribute.

8 https://docs.microsoft.com/learn/modules/intro-to-azure-data-explorer/intro-to-azure-data-explorer/?azure-portal=true

The most basic KQL query consists simply of a table name, in which case the query returns all of the data
in the table. For example, the following query would return the contents of the LogEvents table:
LogEvents

You can add clauses to a Kusto query to filter, sort, aggregate, and return (project) specific columns. Each
clause is prefixed by a | character. For example, the following query returns the StartTime, EventType,
and Message columns from the LogEvents table for errors that were recorded after December 31st 2021.
LogEvents
| where StartTime > datetime(2021-12-31)
| where EventType == 'Error'
| project StartTime, EventType, Message

Kusto Query Language is a versatile and intuitive language that enables data analysts to quickly gain
insights from data captured and stored in a Data Explorer database.
TIP:
To learn more about Kusto Query Language, see the Write your first query with Kusto Query Language9
module, which is not part of the official materials for this course but supports further learning beyond
the scope of the Data Fundamentals certification.

Lab: Analyze streaming data


In this lab, you will use Azure Stream Analytics to process a real-time data stream.
1. Start the virtual machine for this lab, or go to the exercise page at https://aka.ms/dp900-stream-lab.
2. Follow the instructions to complete the exercise on Microsoft Learn, using the Azure subscription
provided for this lab and a cloud shell in the Azure portal.

Knowledge check
Question 1
Which definition of stream processing is correct?
†† Data is processed continually as new data records arrive.
†† Data is collected in a temporary store, and all records are processed together as a batch.
†† Data is incomplete and cannot be analyzed.

Question 2
Which service would you use to continually capture data from an IoT Hub, aggregate it over temporal
periods, and store results in Azure SQL Database?
†† Azure Cosmos DB
†† Azure Stream Analytics
†† Azure Storage

9 https://docs.microsoft.com/en-us/learn/modules/write-first-query-kusto-query-language/?azure-portal=true

Question 3
Which language would you use to query real-time log data in Azure Synapse Data Explorer?
†† SQL
†† Python
†† KQL

Summary
Real-time processing is a common element of enterprise data analytics solutions. Microsoft Azure offers a
variety of services that you can use to implement stream processing and real-time analysis.
In this module, you learned how to:
●● Compare batch and stream processing
●● Describe common elements of streaming data solutions
●● Describe features and capabilities of Azure Stream Analytics
●● Describe features and capabilities of Spark Structured Streaming on Azure
●● Describe features and capabilities of Azure Synapse Data Explorer

Next steps
Now that you've learned about stream processing and real-time analytics, consider learning more about
data-related workloads on Azure by pursuing a Microsoft certification in Azure Data Fundamentals10.

10 https://docs.microsoft.com/learn/certifications/azure-data-fundamentals/

Data visualization
Introduction
Data modeling and visualization is at the heart of business intelligence (BI) workloads that are supported
by modern data analytics solutions. Essentially, data visualization powers reporting and decision making
that helps organizations succeed.
In this module, you'll learn about fundamental principles of analytical data modeling and data
visualization, using Microsoft Power BI as a platform to explore these principles in action.

Learning objectives
After completing this module, you'll be able to:
●● Describe a high-level process for creating reporting solutions with Microsoft Power BI
●● Describe core principles of analytical data modeling
●● Identify common types of data visualization and their uses
●● Create an interactive report with Power BI Desktop

Introduction to data visualization with Power BI


There are many data visualization tools that data analysts can use to explore data and summarize insights
visually, including chart support in productivity tools like Microsoft Excel and built-in data visualization
widgets in notebooks used to explore data in services such as Azure Synapse Analytics and Azure
Databricks. However, for enterprise-scale business analytics, an integrated solution that can support complex
data modeling, interactive reporting, and secure sharing is often required.

Microsoft Power BI
Microsoft Power BI is a suite of tools and services that data analysts can use to build interactive data
visualizations for business users to consume.

A typical workflow for creating a data visualization solution starts with Power BI Desktop, a Microsoft
Windows application in which you can import data from a wide range of data sources, combine and
organize the data from these sources in an analytics data model, and create reports that contain
interactive visualizations of the data.
After you've created data models and reports, you can publish them to the Power BI service, a cloud
service where business users can view and interact with reports. You can also do some
basic data modeling and report editing directly in the service using a web browser, but the functionality
for this is limited compared to the Power BI Desktop tool. You can use the service to schedule refreshes of
the data sources on which your reports are based, and to share reports with other users. You can also
define dashboards and apps that combine related reports in a single, easy-to-consume location.
Users can consume reports, dashboards, and apps in the Power BI service through a web browser, or on
mobile devices by using the Power BI phone app.

Analytical data modeling


Analytical models enable you to structure data to support analysis. Models are based on related tables of
data and define the numeric values that you want to analyze or report (known as measures) and the
entities by which you want to aggregate them (known as dimensions). For example, a model might
include a table containing numeric measures for sales (such as revenue or quantity) and dimensions for
products, customers, and time. This would enable you to aggregate sales measures across one or more
dimensions (for example, to identify total revenue by customer, or total items sold by product per
month). Conceptually, the model forms a multidimensional structure, which is commonly referred to as a
cube, in which any point where the dimensions intersect represents an aggregated measure for those
dimensions.

NOTE:
Although we commonly refer to an analytical model as a cube, there can be more (or fewer) than three
dimensions – it’s just not easy for us to visualize more than three!

Tables and schema


Dimension tables represent the entities by which you want to aggregate numeric measures – for example
product or customer. Each entity is represented by a row with a unique key value. The remaining columns
represent attributes of an entity – for example, products have names and categories, and customers have
addresses and cities. It’s common in most analytical models to include a Time dimension so that you can
aggregate numeric measures associated with events over time.
The numeric measures that will be aggregated by the various dimensions in the model are stored in Fact
tables. Each row in a fact table represents a recorded event that has numeric measures associated with it.
For example, the Sales table in the schema below represents sales transactions for individual items, and
includes numeric values for quantity sold and revenue.

This type of schema, where a fact table is related to one or more dimension tables, is referred to as a star
schema (imagine there are five dimensions related to a single fact table – the schema would form a
five-pointed star!). You can also define a more complex schema in which dimension tables are related to
additional tables containing more details (for example, you could represent attributes of product
categories in a separate Category table that is related to the Product table) – in which case the design is
referred to as a snowflake schema. The schema of fact and dimension tables is used to create an
analytical model, in which measure aggregations across all dimensions are pre-calculated, making
performance of analysis and reporting activities much faster than calculating the aggregations each time.
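As a rough sketch of how such a star schema might be defined in a relational source, the following SQL creates simplified dimension and fact tables. The table and column names (DimProduct, DimCustomer, DimDate, and FactSales) are illustrative assumptions for this example only and are not part of the course lab environment.

-- Dimension tables: one row per entity, each with a unique key and descriptive attributes
CREATE TABLE DimProduct (
    ProductKey    INT PRIMARY KEY,
    ProductName   VARCHAR(100),
    Category      VARCHAR(50)
);

CREATE TABLE DimCustomer (
    CustomerKey   INT PRIMARY KEY,
    CustomerName  VARCHAR(100),
    City          VARCHAR(50)
);

CREATE TABLE DimDate (
    DateKey       INT PRIMARY KEY,   -- for example, 20220131
    CalendarYear  INT,
    CalendarMonth INT,
    CalendarDay   INT
);

-- Fact table: one row per sales event, holding keys to the dimensions and the numeric measures
CREATE TABLE FactSales (
    ProductKey    INT REFERENCES DimProduct (ProductKey),
    CustomerKey   INT REFERENCES DimCustomer (CustomerKey),
    DateKey       INT REFERENCES DimDate (DateKey),
    Quantity      INT,
    Revenue       DECIMAL(10, 2)
);

Each foreign key in the fact table points to a row in one of the dimension tables, which is what gives the schema its star shape.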

Attribute hierarchies
One final thing worth considering about analytical models is the creation of attribute hierarchies that
enable you to quickly drill up or drill down to find aggregated values at different levels in a hierarchical
dimension. For example, consider the attributes in the dimension tables we’ve discussed so far. In the
Product table, you can form a hierarchy in which each category might include multiple named products.
Similarly, in the Customer table, a hierarchy could be formed to represent multiple named customers in
each city. Finally, in the Time table, you can form a hierarchy of year, month, and day. The model can be
built with pre-aggregated values for each level of a hierarchy, enabling you to quickly change the scope
of your analysis – for example, by viewing total sales by year, and then drilling down to see a more
detailed breakdown of total sales by month.
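To see why a hierarchy is useful, consider the two queries below, which reuse the hypothetical FactSales and DimDate tables from the earlier sketch. Both aggregate the same Revenue measure, first at the year level and then one level down at the month level; an analytical model pre-calculates aggregations like these so that drilling down does not require re-scanning the detail rows.

-- Total revenue by year (top level of the time hierarchy)
SELECT d.CalendarYear, SUM(f.Revenue) AS TotalRevenue
FROM FactSales AS f
JOIN DimDate AS d ON f.DateKey = d.DateKey
GROUP BY d.CalendarYear;

-- Drill down one level: total revenue by year and month
SELECT d.CalendarYear, d.CalendarMonth, SUM(f.Revenue) AS TotalRevenue
FROM FactSales AS f
JOIN DimDate AS d ON f.DateKey = d.DateKey
GROUP BY d.CalendarYear, d.CalendarMonth;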

Analytical modeling in Microsoft Power BI


You can use Power BI to define an analytical model from tables of data, which can be imported from one
or more data sources. You can then use the data modeling interface on the Model tab of Power BI
Desktop to define your analytical model by creating relationships between fact and dimension tables,
defining hierarchies, setting data types and display formats for fields in the tables, and managing other
properties of your data that help define a rich model for analysis.

Common data visualizations in reports


After you've created a model, you can use it to generate data visualizations that can be included in a
report.
There are many kinds of data visualization, some commonly used and some more specialized. Power BI
includes an extensive set of built-in visualizations, which can be extended with custom and third-party
visualizations. The rest of this unit discusses some common data visualizations, but it is by no means a
complete list.

Tables and text

Tables and text are often the simplest way to communicate data. Tables are useful when numerous
related values must be displayed, and individual text values in cards can be a useful way to show
important figures or metrics.

Bar and column charts

Bar and column charts are a good way to visually compare numeric values for discrete categories.

Line charts

Line charts can also be used to compare categorized values and are useful when you need to examine
trends, often over time.

Pie charts

Pie charts are often used in business reports to visually compare categorized values as proportions of a
total.

Scatter plots

Scatter plots are useful when you want to compare two numeric measures and identify a relationship or
correlation between them.

Maps

Maps are a great way to visually compare values for different geographic areas or locations.

Interactive reports in Power BI

In Power BI, the visual elements for related data in a report are automatically linked to one another and
provide interactivity. For example, selecting an individual category in one visualization will automatically
filter and highlight that category in other related visualizations in the report. In the image above, the city
Seattle has been selected in the Sales by City and Category column chart, and the other visualizations
are filtered to reflect values for Seattle only.

Lab: Visualize data with Power BI


In this lab, you will use Power BI Desktop to create a data model and a report.
1. Start the virtual machine for this lab, or go to the exercise page at https://aka.ms/dp900-pbi-lab.
2. Follow the instructions to complete the exercise on Microsoft Learn, using the Azure subscription
provided for this lab.

Knowledge check
Question 1
Which tool should you use to import data from multiple data sources and create a report?
†† Power BI Desktop
†† Power BI Phone App
†† Azure Data Factory

Question 2
What should you define in your data model to enable drill-up/down analysis?
†† A measure
†† A hierarchy
†† A relationship

Question 3
Which kind of visualization should you use to analyze pass rates for multiple exams over time?
†† A pie chart
†† A scatter plot
†† A line chart

Summary
Data modeling and visualization enables organizations to extract insights from data.
In this module, you learned how to:
●● Describe a high-level process for creating reporting solutions with Microsoft Power BI
●● Describe core principles of analytical data modeling
●● Identify common types of data visualization and their uses
●● Create an interactive report with Power BI Desktop

Next steps
Now that you've learned about data modeling and visualization, consider learning more about data-relat-
ed workloads on Azure by pursuing a Microsoft certification in Azure Data Fundamentals11.

11 https://docs.microsoft.com/learn/certifications/azure-data-fundamentals/

Further learning
Further learning
To review what you've learned and do additional labs, review the Microsoft Learn modules for this course:
●● Explore core data concepts12
●● Explore relational data in Azure13
●● Explore non-relational data in Azure14
●● Explore data analytics in Azure15

12 https://docs.microsoft.com/learn/paths/azure-data-fundamentals-explore-core-data-concepts/
13 https://docs.microsoft.com/learn/paths/azure-data-fundamentals-explore-relational-data/
14 https://docs.microsoft.com/learn/paths/azure-data-fundamentals-explore-non-relational-data/
15 https://docs.microsoft.com/learn/paths/azure-data-fundamentals-explore-data-warehouse-analytics/

Answers
Question 1
Which Azure services can you use to create a pipeline for data ingestion and processing?
†† Azure SQL Database and Azure Cosmos DB
■■ Azure Synapse Analytics and Azure Data Factory
†† Azure HDInsight and Azure Databricks
Explanation
Both Azure Synapse Analytics and Azure Data Factory include the capability to create pipelines; Azure HDInsight and Azure Databricks do not.
Question 2
What must you define to implement a pipeline that reads data from Azure Blob Storage?
■■ A linked service for your Azure Blob Storage account
†† A dedicated SQL pool in your Azure Synapse Analytics workspace
†† An Azure HDInsight cluster in your subscription
Explanation
You need to create linked services for external services you want to use in the pipeline.
Question 3
Which open-source distributed processing engine does Azure Synapse Analytics include?
†† Apache Hadoop
■■ Apache Spark
†† Apache Storm
Explanation
Azure Synapse Analytics includes an Apache Spark runtime.
Question 1
Which definition of stream processing is correct?
■■ Data is processed continually as new data records arrive.
†† Data is collected in a temporary store, and all records are processed together as a batch.
†† Data is incomplete and cannot be analyzed.
Explanation
Stream processing is used to continually process new data as it arrives.

Question 2
Which service would you use to continually capture data from an IoT Hub, aggregate it over temporal
periods, and store results in Azure SQL Database?
†† Azure Cosmos DB
■■ Azure Stream Analytics
†† Azure Storage
Explanation
Azure Stream Analytics can be used to query a stream of data from Azure IoT Hub and store the results in
Azure SQL Database.
Question 3
Which language would you use to query real-time log data in Azure Synapse Data Explorer?
†† SQL
†† Python
■■ KQL
Explanation
Kusto Query Language is an intuitive but powerful language for querying Data Explorer tables.
Question 1
Which tool should you use to import data from multiple data sources and create a report?
■■ Power BI Desktop
†† Power BI Phone App
†† Azure Data Factory
Explanation
Use Power BI Desktop to create reports from a wide range of data sources.
Question 2
What should you define in your data model to enable drill-up/down analysis?
†† A measure
■■ A hierarchy
†† A relationship
Explanation
A hierarchy defines multiple levels of attributes.
Question 3
Which kind of visualization should you use to analyze pass rates for multiple exams over time?
†† A pie chart
†† A scatter plot
■■ A line chart
Explanation
A line chart is ideal for visualizing values over time.
