SQLServerGeeks Magazine May 2021

https://sqlservergeeks.com/resources/magazine/SSG_Magazine_May_2021.pdf


Our Social Channels: Website | Magazine | LinkedIn | Telegram | YouTube | Twitter | Facebook

It's time to lift the curtains. We are thrilled to present to you the first edition of The SQLServerGeeks Magazine.

We always wanted to expand our offerings to the SQL community, evolve, and become better at what we do, and with the magazine we have upgraded ourselves. The magazine has long been a dream, which has come to reality today.

In all honesty, the journey to this point was riddled with road-bumps and potholes of all sorts. As we started conceptualizing the first edition of the magazine, we were hit hard by the second wave of the pandemic, affecting the families of our team members and unsettling all of us. Things seemed grey and bleak. Nonetheless, we kept our heads high and spirits higher as we gave our undivided best to bring you this edition, undoubtedly the first of many, as a reminder that perseverance does pay off.

A massive shoutout to all authors – Warner Chaves, Leonard Lobel, Edward Pollack, Parikshit Savjani, Tracy Boggiano, Tomaž Kaštrun, Steve Jones, Anna Hoffman, and Anupama Natarajan – who agreed to contribute on such short notice; hats off to them. We are truly humbled to see that the #SQLFamily has got our back, and with absolute certainty we can say that we have got theirs!

As much as we are passionate about all things SQL, it was our unanimous decision to add something more, something Beyond SQL, comprising the little things in life we have come to cherish in these recent times of uncertainty. If you are already a Geek or aspiring to be one, we bring you both the code and the colors from the world of SQL and the amazing people that make it possible.

We here at SQLServerGeeks believe everything can be fine-tuned and optimized. It is no different with our magazine. Make sure to give us your feedback so we can continue to provide quality content, well-curated to your interests. Write to us at [email protected]

Just as a magician never reveals all his tricks at once, we too have a few cards up our sleeves. So make sure to stay tuned for a spectacle in the days to come. We are just getting started. Help us spread the word: ask your friends and colleagues to subscribe to the magazine.

From all of us at SQLServerGeeks, we wish you a pleasant read. Happy Learning.

Yours Sincerely,
The SQLServerGeeks Team

Got from a friend? Subscribe now to get your copy.

COPYRIGHT STATEMENT
Copyright 2021. SQLServerGeeks.com. c/o eDominer Systems Pvt. Ltd. All rights reserved. No part of this magazine may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage and retrieval system, without permission in writing from the publisher. Articles contained in the magazine are copyright of the respective authors. All product names, logos, trademarks, and brands are the property of their respective owners. For any clarification, write to [email protected]

CORPORATE ADDRESS
Bangalore Office: 686, 6 A Cross, 3rd Block Koramangala, Bangalore – 560034
Kolkata Office 1: eDominer Systems Pvt. Ltd., The Chambers, Office Unit 206 (Second Floor), 1865 Rajdanga Main Road (Kasba), Kolkata 700107
Kolkata Office 2: 304, PS Continental, 83/2/1, Topsia Road (South), Kolkata 700046
TABLE OF CONTENTS

04 – Bullet-Proofing Your RTO & RPO
10 – In Memory of Gareth Swanepoel
12 – Women in Technology
16 – In Memory of Ahmad Osama
18 – Always Encrypted with Secure Enclaves in SQL Server 2019 and Azure SQL Database
24 – SQL Nuggets by Microsoft
26 – Azure Database for MySQL Flexible Server – A Fully Managed Service Running Community Version of MySQL
33 – DPS 2021 Announcement
35 – Mental Health and Wellness in IT: Let's Stop the Stigma
39 – Learning Opportunities
40 – SQL Server OPENJSON for Selecting and Comparing Values
43 – #SQLFamily Beyond SQL
44 – A Few Tricks for Testing Security
49 – Inside Look at Data Exposed
51 – Free Content Every Week
52 – Good Database Architecture is the Best Optimization
Bullet-proofing Your
RTO and RPO
Warner Chaves | @warchav

If you are a DBA then you are surely familiar with the terms Recovery Time Objective (RTO) and Recovery Point Objective (RPO). These terms have been used for decades to define service level agreements for IT systems all around the world.

Relational database systems such as SQL Server provide mechanisms to configure, manipulate and
control the behaviour of your database system so that you can meet your RTO and RPO requirements.

This is not a new topic, so you must be thinking to yourself that you already know everything about it and I'm beating a dead horse. However, I regularly run into incidents where everyone thought the RTO and RPO would be easily met and they ended up being missed, sometimes by a very large margin.

To this effect, I decided to compile some of the cases and scenarios to watch out for when working with SQL Server and Azure SQL Db, to make sure that when you commit to those RTO and RPO requirements, you will be able to meet them, guaranteed!

RTO
Let’s start with RTO. This one is all about how fast you can get back in business and that is why it’s
defined as a time objective. For production systems this could be hours down to seconds and for non-
production systems it could be hours, sometimes even days. To meet the RTO you will implement not
only a backup strategy but also a High Availability (HA) and Disaster Recovery (DR) strategy if
necessary.

Restore Testing
If you depend on your backups completely for your RTO, then at the very least you need to have a
restore testing strategy. There are many ways to go around this:

1. Restore job that refreshes a development copy.


2. Restore job into some other idle infrastructure like a DR server.
3. An automatic process that boots up or creates a VM, runs the restore test then powers
down.

Ideally the restore should also include a consistency check to cover all the bases and make sure the
data is 100% recoverable.
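As a minimal sketch of such a restore test (all database names and file paths below are placeholders, not from the article), the job restores the latest full backup to a scratch copy and then runs a consistency check:

-- Restore the most recent full backup to a throwaway copy (paths are placeholders)
RESTORE DATABASE SalesDB_RestoreTest
    FROM DISK = N'\\backupshare\SalesDB_FULL.bak'
    WITH MOVE N'SalesDB'     TO N'D:\RestoreTest\SalesDB.mdf',
         MOVE N'SalesDB_log' TO N'L:\RestoreTest\SalesDB_log.ldf',
         REPLACE, STATS = 10;

-- Verify the restored copy is 100% recoverable
DBCC CHECKDB (SalesDB_RestoreTest) WITH NO_INFOMSGS, ALL_ERRORMSGS;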



An important recommendation I often make is to test different restore file sequences to see which one leads to the fastest recovery. For example, Full + Diff + Logs versus Full + Logs.

Even though fewer files to restore usually means less time, that is not necessarily so, and different file types with the same number of files to apply will also give you different times. Restore time depends on the sizes of the backups and the number of operations done during the timespan covered by those files, so there is always some level of uncertainty. For this reason, I still recommend you test the different restore file sequences on a regular schedule to figure out which one will be the fastest.
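For illustration only, here is a hedged sketch of the two sequences being compared (file names are placeholders and the "..." paths are not real); elapsed times can be compared from job history or the messages each statement prints:

-- Sequence A: Full + Diff + Logs
RESTORE DATABASE SalesDB FROM DISK = N'...\SalesDB_FULL.bak'   WITH NORECOVERY, REPLACE;
RESTORE DATABASE SalesDB FROM DISK = N'...\SalesDB_DIFF.bak'   WITH NORECOVERY;
RESTORE LOG      SalesDB FROM DISK = N'...\SalesDB_LOG_01.trn' WITH NORECOVERY;
RESTORE LOG      SalesDB FROM DISK = N'...\SalesDB_LOG_02.trn' WITH RECOVERY;

-- Sequence B: Full + Logs only (every log backup since the full must be applied, in order)
RESTORE DATABASE SalesDB FROM DISK = N'...\SalesDB_FULL.bak'   WITH NORECOVERY, REPLACE;
RESTORE LOG      SalesDB FROM DISK = N'...\SalesDB_LOG_01.trn' WITH NORECOVERY;
-- ...apply the remaining log backups in order...
RESTORE LOG      SalesDB FROM DISK = N'...\SalesDB_LOG_12.trn' WITH RECOVERY;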

Now you might be thinking, I have an HA configuration and I don’t depend on my backups for my RTO,
I have a cluster and multiple nodes, etc. In this case you still need to be mindful of your backup-based
RTO because what happens if someone either maliciously or accidentally deletes data or drops a
table? Your Availability Group will immediately replicate this destructive change to the rest of the
cluster! And back to the backups you will have to go.

Crash Recovery Testing


The other common event that will test your RTO is when something unexpected happens to your SQL
Server instance or Availability Group. Then SQL Server runs crash recovery and you might be waiting
for a long time to have your database 100% available because recovery is taking longer than expected.
This can happen when:

1. The SQL Server instance is restarted.


2. A SQL Server Failover Cluster Instance fails over.
3. An Availability Group fails over.
4. A long transaction is cancelled.

I have seen this happen countless times. For example, a DBA will have a two-node Availability Group set to synchronous replication and assume that any failover will be near instantaneous, giving an extremely low RTO. Unfortunately, this is not the case at all, as the synchronous process only guarantees remote log hardening, not actually replaying the log. If there are a lot of operations to be replayed, you will still be waiting.



In SQL Server 2019 and Azure SQL Db, Microsoft introduced Accelerated Database Recovery. This new feature changes the recovery process of SQL Server databases to dramatically shorten the time until the database is 100% available. However, even though it is a dramatic improvement over previous versions, you should still test and monitor your failovers to make sure you meet your RTO, even with Accelerated Database Recovery enabled.
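On SQL Server 2019, Accelerated Database Recovery is off by default and is enabled per database (it is already on in Azure SQL Database); a minimal sketch, with a placeholder database name:

-- Enable Accelerated Database Recovery (SQL Server 2019 and later)
ALTER DATABASE SalesDB SET ACCELERATED_DATABASE_RECOVERY = ON;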

Azure SQL Db RTO


Azure SQL Db offers both backup recovery as well as HA/DR options that can assist in meeting RTO. For backup options you can restore from local backups as well as geo-restore from backups in a paired Azure region. Local restores are a size-of-data operation, so the time to restore will depend on your database size. If you are using the newer Hyperscale model then it is not a size-of-data operation but takes a few minutes regardless of size. In either case, I still suggest doing tests regularly to get a good idea of timings.

If you are using the auto-failover group capability then you need to be aware that Microsoft will not automatically fail over your databases at the first sign of an issue; it can take up to 1 hour of continuous outage before the failover is triggered. If this is not acceptable for your requirements then you need to roll your own monitoring and do your own manual failover if needed.

One final gotcha is that Microsoft offers a 30-second database failover when invoked by the user, but this is only backed by a formal SLA when running the Business Critical tier of Azure SQL Db.

The following table summarizes these options:

Recovery method                              RTO
Geo-restore from geo-replicated backups      12 h
Auto-failover groups                         1 h
Manual database failover                     30 s

RPO
Recovery Point Objective is the amount of data, in terms of time, that your SLA allows to be lost in the event of an outage. Unlike RTO, the RPO is usually measured in smaller units and could be hours down to seconds. Most DBAs will implement their RPO with a combination of log backups as well as HA/DR technologies that continuously replicate database operations.

The biggest gotcha in understanding RPO is that it is not only about being able to recover to the last X
minutes or seconds but also about recovering to a specific point in time in the backup history that is
considered “active” by the business. Going back to the example of someone maliciously or accidentally
doing data changes, your last log backups happening at a high frequency will not protect you if the
change happened a few days ago and you already deleted those log backups to save on backup
storage.

Monitor Backup Time


Controlling the backup RPO is also not about simply setting the log backup schedule. It is also about
making sure that the log backups are always able to meet their frequency. For this I recommend
setting up monitoring jobs that track and baseline how long the backup jobs are taking, as well as alert
when a backup job runs longer than expected.



For example, with a log backup schedule of every 15 minutes, if the 2:30 run is still going at 3:00, the 3pm backup does not happen and will not get triggered again until 3:30. RPO would not be met if the server had an irrecoverable problem in the middle of this backup run.
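A minimal monitoring sketch along these lines (the database name and the one-day window are assumptions) reads backup history from msdb and shows log backup durations plus the gap since the previous log backup:

-- Log backup durations and gaps for one database over the last day (msdb history)
SELECT  bs.database_name,
        bs.backup_start_date,
        DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) AS duration_sec,
        DATEDIFF(MINUTE,
                 LAG(bs.backup_finish_date) OVER (ORDER BY bs.backup_start_date),
                 bs.backup_start_date)                                 AS gap_since_prev_min
FROM    msdb.dbo.backupset AS bs
WHERE   bs.database_name = N'SalesDB'
        AND bs.type = 'L'                                              -- log backups only
        AND bs.backup_start_date > DATEADD(DAY, -1, GETDATE())
ORDER BY bs.backup_start_date;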

Availability Groups
For some requirements, the RPO of log backups is not sufficient and a technology like Availability
Groups is implemented to provide an even smaller RPO. The usual configuration is to have log backups
running as well as one or more synchronous replicas of the primary database.

In the configuration above, the system is taking log backups every 15 minutes and keeping one sync
copy with the primary. The biggest danger to your RPO here is that any type of issue stops the
synchronous replication and leaves you open for data loss. This can happen for multiple reasons but
the most common one is that temporary network issues disconnect the nodes and then the
application continues to perform changes on the primary while the backlog of changes to apply to the
secondary keeps growing. As an added issue, if the log starts to grow due to records accumulating,
your log backups will start taking longer and possibly slide over your RPO time window.

A common misconception is that in a synchronous two node cluster, if the secondary replica is
disconnected, the primary SQL Server will simply stop accepting changes (sacrificing availability for
RPO) but that is not the case. I recommend having robust Availability Group monitoring in place either
via 3rd party tools or home-grown scripts to detect these issues.
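As a starting point for a home-grown script, a hedged sketch like the following surfaces the synchronization state and queue sizes per replica; thresholds and alerting are left to the reader:

-- AG replica health: anything not SYNCHRONIZED or a growing send queue threatens RPO
SELECT  ag.name                       AS ag_name,
        ar.replica_server_name,
        drs.database_id,
        drs.synchronization_state_desc,
        drs.synchronization_health_desc,
        drs.log_send_queue_size,      -- KB not yet sent to the secondary
        drs.redo_queue_size           -- KB received but not yet redone
FROM    sys.dm_hadr_database_replica_states AS drs
JOIN    sys.availability_replicas           AS ar ON ar.replica_id = drs.replica_id
JOIN    sys.availability_groups              AS ag ON ag.group_id   = drs.group_id
ORDER BY ag.name, ar.replica_server_name;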

Azure SQL Db RPO


In terms of RPO, Azure SQL Db takes log backups every 5 to 10 minutes, depending on the compute size and the amount of database activity. Based on the geo-restore RPO we can see that backup files must be shipped between paired regions at least once every hour. And similar to the RTO, the RPO SLA only applies to the Business Critical tier. In order to maintain this SLA, Microsoft imposes throughput limits on the log to guarantee that the replica never falls too far behind.



Here is the summary for Azure SQL Db RPO:

Recovery method                              RPO
Geo-restore from geo-replicated backups      1 h
Auto-failover groups                         5 s
Manual database failover                     5 s

Conclusion
When you are in charge of data you must take that responsibility with the level of seriousness it truly requires. Guaranteeing specific levels of RTO and RPO is part of this responsibility, and you need to do everything in your power to make sure you can meet them. The cases and scenarios shared here can help you cover some gaps in your general strategy so that you are never caught off guard. And remember, even if you are on the managed Azure SQL Db cloud offering, you should still verify your configuration and test your procedures to make sure your RPO and RTO are being met. Thanks for reading!

Questions? Comments? Talk to the author today. Warner Chaves on Twitter.

About Warner Chaves


Warner is a SQL Server MCM, Microsoft Data Platform MVP and Principal
Consultant at Pythian.

LEARN MORE

Non-Tech World of Warner Chaves

Warner is originally from Costa Rica, currently living in Canada with his wife, daughter and a small
bulldog called Gizmo.

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]



{Place your company ad here and reach out to our readers}
{Talk to us today. Drop an email at [email protected]}

Got from a friend? Subscribe now to get your copy


Gareth Swanepoel was a loving Husband,
Father, Son, Brother, Uncle, and a fierce
friend. His warm smile and hearty laugh were
ever present. Gareth was a gentle soul and a
gentleman. Born in Johannesburg South
Africa in 1972, Gareth came to the United
States in 2001.

The job that he came to America for was


shuttered after the tragic events of 9/11.
Gareth worked hard labor jobs, until he
found a job working as a database
administrator. That role would eventually
lead Gareth to become a published author, a
renowned speaker at international
conventions, and a Program Manager for the
Microsoft Azure Synapse Analytics Product
Group.

Visit SQL Memorial

Gareth was an amazing cook. He was a


connoisseur of barbequed meats and
South African delicacies. His brisket and
biltong were well known, and prepared
and generously given to friends. Gareth
was an authority on wine, bourbon, and
whiskey and loved to share a glass with
his neighbors and friends.

Gareth was a musician who played bass in


church and lover of music who collected
many vinyl albums. He loved to put them
on to entertain, or to educate family and
friends who had never heard the music.



Gareth was also a history buff and Civil War re-enactor for the Straw Hats. Gareth encouraged his friends and participated in the Ron Jon & Space Coast triathlons. An avid sports lover, Gareth was passionate about Springboks rugby, cricket, Formula One racing, Boston Red Sox baseball, and Georgia Bulldogs football.

Gareth was a devout Christian with a strong lifelong relationship with Jesus Christ.
He is survived by his mother Joyce; wife Kellie; sisters Kirsty and Tessa; brother
Bernard; his son Chris; daughter Lilly and his step children Joshua, James, and Kristen.

Thoughts and Memories from Friends

Andy Warren: Farewell, Gareth Swanepoel

Brent Ozar: We Lost Gareth Swanepoel

Steve Jones: I have many memories of Gareth at events all over the US. It seems that I would often be in a convention center, and I'd hear his voice or see him walking up. He was always a sight to behold, wearing something that stood out amongst the crowd. Always a smile, always joyful in a way that was infectious. I have a few pictures in my own post: RIP Gareth Swanepoel and GoFundMe.

TJay Belt: As posted on Twitter. Hey friend. #sqlfamily. Family. You were one of a kind. A smile always at the ready. A kind word on your tongue. A presence in any space one wanted to spend time with. You will be missed. And leave a hole in our collective heart.

Kenneth Fisher: "They need help, we must help." #forGarethSwan

Contribute and Support Gareth's Family



Women in
Technology
Anupama Natarajan | @shantha05

#2021 is still a challenging year for the whole world, like #2020, because of the COVID-19 pandemic. We celebrated International Women's Day on March 8th with the theme #ChooseToChallenge. As women we can all choose to challenge and call out gender bias and inequality, and celebrate our achievements.

In this edition we are going to discuss the different roles that are available for women to take up in their careers, and various resources that will help them prepare for those roles.

The technical sector is growing each day, and with lots of organisations embracing Digital Transformation, more and more roles and opportunities are opening up for women. So what key tech roles are available today, and what are some useful resources to learn and prepare yourself for these roles?

Programmers/Developers – Women can take up this role by starting to learn at least one programming language. This can be C#, Python or Java. There are plenty of resources available to get started.

• C# - Learn C# | Free tutorials, courses, videos, and more | .NET (microsoft.com)
• Python - Take your first steps with Python - Learn | Microsoft Docs
• Java - Java on Azure - Learn | Microsoft Docs
• No Code Development - Create a canvas app in Power Apps - Learn | Microsoft Docs; Create a model-driven application in Power Apps - Learn | Microsoft Docs

Data – Data is becoming the new fuel powering Artificial Intelligence and Digital Transformation for most organisations. There are many roles in the Data space for women, such as Data Analyst, Database Administrator, Data Engineer and Data Scientist. Most of these roles involve understanding data and how it is stored, processed and analysed, and making predictions from it.



• Data Engineer - Azure for the Data Engineer learning path - Learn | Microsoft Docs
• Data Analyst - Create and use analytics reports with Power BI - Learn | Microsoft Docs
• Database Administrator - Azure SQL fundamentals - Learn | Microsoft Docs
• Data Scientist - Perform data science with Azure Databricks - Learn | Microsoft Docs; Build AI solutions with Azure Machine Learning - Learn | Microsoft Docs

Cybersecurity – This is currently a growing area across all the organisations.

Security Engineer

• SC-300 part 1: Implement an identity management solution - Learn | Microsoft Docs
• SC-300 part 2: Implement an Authentication and Access Management solution - Learn | Microsoft Docs
• SC-300 part 3: Implement Access Management for Apps - Learn | Microsoft Docs
• SC-300 part 4: Plan and implement an identity governance strategy - Learn | Microsoft Docs

Security Operations Analyst

• SC-200 part 1: Mitigate threats using Microsoft Defender for Endpoint - Learn | Microsoft Docs
• SC-200 part 2: Mitigate threats using Microsoft 365 Defender - Learn | Microsoft Docs
• SC-200 part 3: Mitigate threats using Azure Defender - Learn | Microsoft Docs
• SC-200 part 4: Create queries for Azure Sentinel using Kusto Query Language (KQL) - Learn |
Microsoft Docs
• SC-200 part 5: Configure your Azure Sentinel environment - Learn | Microsoft Docs
• SC-200 part 6: Connect logs to Azure Sentinel - Learn | Microsoft Docs
• SC-200 part 7: Create detections and perform investigations using Azure Sentinel - Learn |
Microsoft Docs
• SC-200 part 8: Perform threat hunting in Azure Sentinel - Learn | Microsoft Docs

We can all choose to challenge and show that women can take up any technical role and excel in those roles. All we need to do is motivate, inspire and encourage every woman to achieve great heights in technical roles. Let's break the barriers and succeed in the tech sector as "Wonder Woman".

Image Credits
www.internationalwomensday.com
www.imdb.com



Questions? Comments? Talk to the author today. Anupama Natarajan on Twitter.

About Anupama Natarajan


Anu is a Data, Analytics and AI Consultant with 20+ years of experience with
design and development of Data Warehouse, Business Intelligence, AI enabled
applications and SaaS integrated solutions.

LEARN MORE

Non-Tech World of Anupama Natarajan

Anu loves baking muffins, cupcakes and cakes in her free time. It's her most recent passion, now that she finally has enough time to make full use of her oven.

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]




Career Highlights
MCP
MVP
Chief Technical Editor of SQLServerGeeks
Vice President – SQLServerGeeks
Core Team Member, Asia's First SQL Conference – SQLServerGeeks Annual Summit 2015

Books Authored
Professional Azure SQL Database Administration
Professional SQL Server High Availability and Disaster Recovery

Blog Series
Accidental DBA
DataPlatformLabs

Learn More

Nothing in the world can prepare us to bear the loss of someone who has influenced our lives greatly. The notion that "time heals the pain" is a myth. We just learn to get familiar with the idea of their absence. The void that losing someone creates in your life remains, while the best we can do is find a healthy way to cope with it and channel our grief.

It is with a heavy heart that we at SQLServerGeeks & DataPlatformGeeks mourn the loss of Mr. Ahmad Osama, as our deepest condolences go out to his family & friends, praying they find the strength and courage to make it through these trying times. It is our due responsibility, towards a friend, comrade & fellow Geek, to take this moment to honor his journey by remembering the things that made him the person we have come to know, adore and admire.

Let's take a stroll down memory lane…

He joined SQLServerGeeks back in 2012 as Chief Technical Editor, having already acquired the stature of a Microsoft Certified Professional, right when we were climbing our first few rungs of the global ladder. Since then, he made innumerable contributions, making him a part of our Core Team as the Vice President of SQLServerGeeks –


and the Geeks family at large. After becoming a Microsoft "Most Valuable Professional", he played a prominent role in organizing Asia's first SQL Conference – SQLServerGeeks Annual Summit 2015. As a true data enthusiast, he did his part to give back to the community by sharing his expertise with countless IT professionals, providing quality training and guidance to all those in need.
He was an eloquent speaker, who actively participated in multiple tech events and webinars, where he was responsible for "fine-tuning" minds for optimized performance. And he did exactly that!
As an avid blogger, he took to the
keyboard as the tool of choice to
share valuable insights with the
community. Having published
well over 100 blogs in
SQLServerGeeks alone, numerous
more in DataPlatformLabs, he has
undoubtedly earned the ‘Geek’
badge for himself.
Having said that, he was known to
spend a fair share of time on his
Xbox as a means of letting off
steam and rejuvenating himself
before taking another spin at the
real world.
Besides having a voice, a keyboard and a joystick in his arsenal, he took it a step further and encapsulated his 12+ years of experience with databases into two books that he authored, which will undoubtedly keep his legacy alive, benefiting countless people in the years to come. And just like that, he found a way to contribute – even after he is long gone. Need we say more?
With that, we would like to pay our sincere respects and a humble homage, to an incredible individual
& an amazing human being, who we were blessed enough to have, as a part of our lives.
Thank You for everything.
On Behalf of All of Us – You Will Be Missed.

Contribute and Support Ahmad's Family



Always Encrypted with
Secure Enclaves in SQL
Server 2019 and Azure
SQL Database
Leonard Lobel | @lennilobel

Always Encrypted is the latest of several encryption features available in SQL Server and Azure SQL Database. We've had column-level encryption since SQL Server 2005, which uses either certificates or symmetric keys to keep encrypted data hidden from view. SQL Server 2008 (Enterprise Edition) added Transparent Data Encryption (TDE) to encrypt the entire database–again, using a special database encryption key–so that without that key, the entire database (and its backups) remains encrypted and completely inaccessible.

Although these features serve us well, they do suffer from two significant drawbacks.

First, the very certificates and keys used for encryption are themselves stored in the database (or
database server), which means that the database engine is always capable of decrypting the data.
While this may be acceptable with an on-premise data center that you manage yourself, it’s a major
problem if you want to move your data to the cloud. Because by giving up physical ownership of the
database, you're also handing over the encryption keys and certificates to your cloud provider
(Microsoft, in the case of Azure), empowering them to access your private data.

Another concern is the fact that these older features only encrypt data “at rest” (on disk), relying on
other protocols (for example, SSL and TLS) to encrypt data “in flight” (across the network).

Enter Always Encrypted


Always Encrypted was introduced in SQL Server 2016 to address these very concerns. With this
feature, data is encrypted not just at rest, but also in flight. Furthermore, the cryptography keys
themselves–which are essential for both encrypting and decrypting–are not stored in the database.
Those keys stay with you, the client.

Thus, Always Encrypted effectively separates the clients who own the data from the cloud providers
who host it. Because the data is always encrypted, SQL Server (and the cloud hosting provider) cannot
decrypt it. Data can only be served up in its encrypted state, and so, data is inherently encrypted in
flight as well. Only when it arrives at the client can it be decrypted, on the client, by the client, who
possesses the necessary keys. Likewise, when inserting or updating new data, that data gets encrypted
immediately on the client, before it ever leaves the client, and remains encrypted in-flight all the way
to the database server, where SQL Server can only store it in that encrypted state–it cannot decrypt
it. This is classic GIGO (garbage in, garbage out), as far as SQL Server is concerned.

The initial version (“V1”) of Always Encrypted in SQL Server 2016 was an important first step toward
confidential computing. But in SQL Server 2019 (and now, Azure SQL Database), the feature has been
greatly enhanced to work with secure enclaves, and this enables rich query processing over encrypted
data beyond what was possible with V1. Since secure enclaves builds on the initial Always Encrypted
implementation, you need to start by understanding how Always Encrypted worked prior to SQL
Server 2019.

Always Encrypted V1 (SQL Server 2016)


Always Encrypted protects sensitive data on the server using cryptography keys available only to the
client. Specifically, these include Column Encryption Keys and Column Master Keys.

Column Encryption Key (CEK)


For each column in each table that you want to encrypt, you create one or more Column Encryption Keys (CEKs). These are keys that encrypt your data using AES-256 encryption (the AEAD_AES_256_CBC_HMAC_SHA_256 algorithm). You can create one CEK for each column you want to encrypt, or you can use the same CEK to encrypt multiple columns in multiple tables; it's not necessarily one-to-one.

With a CEK in hand, data can be encrypted and decrypted in column(s) protected by that CEK. Thus,
CEKs must be carefully guarded. They can’t be stored in the database, since encryption and decryption
occurs exclusively on the client side.

Column Master Key (CMK)


Actual CEKs can’t be stored in the database, but encrypted CEKs can. And that’s where the Column
Master Key (CMK) comes in. Every CEK is encrypted by a CMK. So, think of the CMK as a “key encrypting
key” that, itself, encrypts the CEKs (a “data encrypting key”) so that they are safe to store in the
database. The CMK is only ever available on the client side; for example, in the client machine’s
Windows certificate store, or in the cloud using Azure Key Vault (AKV), accessible using client
credentials. Then, in addition to the encrypted CEKs, the client-side path to the CMK (not the CMK
itself) is stored in the database.

As a result, the database has all the information that the client needs to perform encryption and
decryption but is itself powerless to perform these operations on its own. And that’s because the CEK
is needed for cryptography operations; but the database only has a CEK that has been encrypted by
the CMK (and not the CEK itself). Furthermore, it has only a client-side path to the CMK (and not the
CMK itself). Thus, Always Encrypted can be viewed as a hybrid feature that is based on client-side
encryption/decryption and driven by server-side metadata.
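That server-side metadata is itself plain T-SQL. As a hedged sketch (the key names, the certificate thumbprint and the ENCRYPTED_VALUE blob below are placeholders, not real values):

-- Points to a certificate in the client machine's certificate store (path is a placeholder)
CREATE COLUMN MASTER KEY CMK1
WITH (
    KEY_STORE_PROVIDER_NAME = N'MSSQL_CERTIFICATE_STORE',
    KEY_PATH = N'CurrentUser/My/0123456789ABCDEF0123456789ABCDEF01234567'
);

-- Stores only the CEK as encrypted by CMK1; the value shown is a truncated placeholder
CREATE COLUMN ENCRYPTION KEY CEK1
WITH VALUES (
    COLUMN_MASTER_KEY = CMK1,
    ALGORITHM = 'RSA_OAEP',
    ENCRYPTED_VALUE = 0x016E000001630075   -- placeholder blob
);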

Encrypting a Table
SQL Server Management Studio (SSMS) provides tooling to generate CMKs and CEKs for Always
Encrypted. It also has a wizard that will migrate an existing (non-encrypted) table to a table with one
or more encrypted columns.

For each column, you choose a CEK and an encryption type of deterministic or randomized. You must
choose deterministic if you want to be able to query (equality only) or join on the column. This works
because the same ciphertext (encrypted data) is always generated from the same clear text.
Otherwise, you should choose randomized because it’s much more secure than deterministic. For
example, deterministically encrypting a Boolean column yields only two distinct ciphertext values,
making it easy for a hacker to distinguish true and false values. With random encryption, the same
Boolean column will appear to have many different values, but cannot be queried against.

Always Encrypted with Secure Enclaves in Back to TOC | Page 19


SQL Server 2019 & Azure SQL Database (Page 2 of 6)
The wizard then creates a new table with matching schema, and with CEK and encryption type
designations assigned accordingly to each column you selected. Then it transfers the rows into the
new table, encrypting along the way. But remember, it’s the client (SSMS in this case) that’s
performing the encryption–not SQL Server, which has no access to the CEK. All the data gets round-
tripped through the wizard, which encrypts the selected column(s) using the CEK(s) and encryption
type(s) that you selected. Finally, the wizard drops the old table, renames the new table, and the
migration is complete.
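The end result of such a migration is a table whose column definitions carry the encryption metadata. A minimal sketch of what that looks like, with hypothetical table and key names (string columns under Always Encrypted require a BIN2 collation):

CREATE TABLE dbo.Customer
(
    CustomerId INT IDENTITY PRIMARY KEY,
    Name NVARCHAR(60) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = CEK1,
                        ENCRYPTION_TYPE = RANDOMIZED,
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'),
    SSN CHAR(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = CEK1,
                        ENCRYPTION_TYPE = DETERMINISTIC,  -- allows equality lookups and joins
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256')
);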

Querying an Encrypted Table

On the client side, cryptography operations are transparently performed by the client driver; this is currently supported for ADO.NET, ODBC, JDBC, and PHP. Note that the ADO.NET driver supports Always Encrypted across all .NET flavors (.NET Framework, .NET Core, and .NET Standard).

Here's the workflow:


1) The client includes “Column Encryption Setting=Enabled” in the connection string
2) The server sends the encrypted CEK (yellow key with red lock) and the CMK path (red key in
dotted border) back to the client.
3) The client retrieves the CMK (red key) from the path (local computer certificate store or Azure
Key Vault) and uses it to decrypt the CEK.
4) The client encrypts the SSN with the CEK (yellow key), so that it can be queried by the server.
This implies that deterministic encryption is being used on the SSN column, making it possible
to query on it (equality only).
5) The client issues a modified version of the query with ciphertext for the SSN. This is in-flight
encryption, so anyone hacking the wire cannot see the SSN in clear text.
6) The Name column returned to the client is also encrypted, but it could (should) be using
randomized encryption if there is never a need to query on it. Again, anyone hacking the wire
in this direction cannot see the Name in clear text.
7) The client receives the encrypted name column, and decrypts it using the CEK.

With this basic implementation, data is encrypted not just at rest and in-flight, but “in-use” as well.
That is, limited operations over encrypted data can be performed by SQL Server without requiring
decryption. Notably, only equality comparison is allowed, supporting point lookups (like the SSN
example), as well as JOIN, GROUP BY, and DISTINCT, but not much else. Furthermore, these operations
only work with deterministically encrypted columns, which is less secure than randomly encrypted
columns.
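For example, with the hypothetical table sketched earlier, an SSMS query window connected with "Column Encryption Setting=Enabled" and with Parameterization for Always Encrypted turned on can run a point lookup against the deterministically encrypted SSN column; the driver encrypts the parameter before it leaves the client (the value below is a sample):

DECLARE @SSN CHAR(11) = '795-73-9838';   -- encrypted client-side by the driver

SELECT CustomerId, Name, SSN
FROM   dbo.Customer
WHERE  SSN = @SSN;                       -- equality only; requires deterministic encryption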

Always Encrypted with Secure Enclaves (SQL Server 2019 and Azure SQL
Database)

In SQL Server 2019 (and, most recently, Azure SQL Database), Always Encrypted has been greatly
enhanced to leverage secure enclaves. With secure enclaves, server-side processing of encrypted data
is fully supported, including not just equality comparisons (previously possible only with deterministic
encryption), but range queries, pattern matching (LIKE), and sorting–all over randomly encrypted data.

Furthermore, using secure enclaves, cryptography operations can be performed in place on the server. Encrypting existing data therefore does not require round-tripping the network (like the SSMS wizard with V1), which scales poorly for large amounts of data.

All this may seem like an impossible feat, given that SQL Server still has no access to the keys needed
for encryption and decryption. Yet SQL Server 2019 and Azure SQL Database make this possible by
leveraging secure enclaves in conjunction with the base Always Encrypted functionality introduced in
SQL Server 2016. So data remains protected on the one hand, while at the same time, the ability to
perform rich server-side computations over that data is preserved.

What is a Secure Enclave?


To understand how this magic works, you need to understand what an enclave is. Simply put, an
enclave is a special region of the normal memory allocated to a process. This region of memory is
isolated and protected not only from its containing process, but everything else on the entire machine.
No other processes (not even the almighty kernel itself) can access this region of memory. The enclave
is essentially a black box that cannot be accessed even by highly privileged administrators.

Like any memory, an enclave can contain both code and data. However, code must be signed in a
special way in order to be able to run in an enclave, and then that becomes the only code running on
the machine that can access data contained inside the same enclave.

Several technologies are available today to provide the secure isolation of an enclave. This includes
hardware-based solutions such as Intel Software Guard Extensions (SGX), which is used by Azure SQL
Database. Secure enclave isolation can also be powered by leveraging the machine’s hypervisor, such
as virtualization-based security (VBS) in Windows Server 2019 and Windows 10 v1809, which is used
by SQL Server 2019.

An attacker attempting to access an enclave can easily open a debugger, connect to the process that contains the enclave, and find the enclave memory. But the memory contents will not be visible to them. For example, attempting to view the contents of a VBS enclave reveals nothing but question marks.

Enclaves will never be exposed in a full memory


dump and they are completely impervious to
memory scanning attacks. This makes them an
extremely attractive technology that can serve
as a trusted execution environment for
processing sensitive data.

Leveraging Secure Enclaves
For Always Encrypted, the goal with secure enclaves remains the same: protect sensitive data from highly privileged but unauthorized users (like DBAs and machine admins). By using secure enclaves, this level of protection can now be maintained without compromising SQL Server's ability to perform rich queries, and encryption can be performed in place on the server.

When the database engine starts, it loads an enclave. This means that SQL Server is now a hosting
process that contains an enclave, but SQL Server itself does not run in the enclave, nor can it access
the enclave’s content. Rather, the enclave acts as an extension of the client-side trust boundary on
the server machine; a trusted representative of the client within the SQL Server environment. Think
of it as a foreign embassy. The embassy is physically located inside a foreign country. Yet within the
perimeter of the embassy, only the laws of its native country apply, while the laws of the hosting
foreign country do not. At the same time, it’s just a footstep to enter or exit the embassy, compared
with the thousands of miles to travel back and forth between the countries.

The way to think of this in terms of the Always Encrypted philosophy is: cryptography operations are still performed exclusively by the client, but not necessarily on the client machine. Meaning, the enclave on the server machine is, in essence, the client. Critically, this means that the client and server can communicate without round-tripping the network, because client code is running inside the enclave as an extension of the client machine.

Enclave Attestation
But how does the client machine know that the enclave on the server machine can be trusted? How
does it know that there isn’t malicious code running inside the enclave? This supreme level of trust is
achieved by both the client and server machines negotiating through a third machine, called the
attestation server.

As the name implies, the sole purpose of this server is to attest to the authenticity of the enclave. That
is, it certifies to the client that the enclave on the server is running code that can be trusted. Only then
does the client authorize the use of the enclave.

Once attestation succeeds, the client driver establishes a secure tunnel connection to client code
running inside the enclave on the server machine. The client machine and the client code inside the
enclave on the server both exchange a shared secret over this secure tunnel. This secret is then used
to encrypt a CEK on the client machine and send it to the enclave on the server machine. Inside the
enclave–and only inside the enclave–the shared secret is used to decrypt the CEK.

Enabling Rich Query


Now the CEK is available inside the enclave, but still completely unavailable to SQL Server running
inside the process that’s hosting the enclave. At this point, SQL Server can perform rich queries over
encrypted data; for example, with support for pattern matching (LIKE), range comparisons, and
sorting. And that’s because the client running in the enclave is close at hand, and can be utilized for
cryptography operations all on one machine, with no network activity.

When we ask SQL Server to execute a query that includes rich computations, it’s still powerless to
process those portions of the query that operate over encrypted columns. So instead, SQL Server
delegates these portions of the query over to the enclave, along with the encrypted data that needs
to be examined for that particular operation (for example, a range comparison). The query engine
injects the encrypted data into the enclave (which is effectively the same as passing it to the client but
without a network call) and asks it to perform the operation.

Inside the enclave, the CEK obtained from the client machine via the secure tunnel is used to decrypt
the value and perform the operation. The result is then returned to the query engine, which then
continues to process the rest of the query normally. Any additional query references to encrypted
columns are similarly resolved by the enclave in-line with query execution. In this manner, encrypted
data is decrypted on the fly and processed by the client running in the enclave, as needed, by the
query engine on the server.
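To make this concrete, here is a hedged sketch of the kind of rich query that becomes possible, reusing the hypothetical table and assuming its CEK has been set up as enclave-enabled; the parameters are again encrypted by the client driver:

-- Pattern matching and sorting over a RANDOMIZED-encrypted column,
-- delegated by the query engine to the secure enclave at execution time
DECLARE @NamePattern NVARCHAR(60) = N'Lob%';

SELECT CustomerId, Name, SSN
FROM   dbo.Customer
WHERE  Name LIKE @NamePattern
ORDER BY Name;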

Conclusion
This article gave you an overview of Always Encrypted with secure enclaves in SQL Server 2019. And
now that this exciting feature is finally available in Azure SQL Database as well, you can leverage the
technology for greater security in both your on-premise and cloud databases.

Questions? Comments? Talk to the author today. Leonard Lobel on Twitter.

About Leonard Lobel


Leonard Lobel is the chief technology officer (CTO) and co-founder of Sleek
Technologies, Inc., a New York-based development shop with an early adopter
philosophy toward new technologies.

LEARN MORE

Non-Tech World of Leonard Lobel

During his free time, Lenni loves playing the piano and traveling.

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]

SQL Nuggets
by Microsoft

Early technical preview of JDBC Driver 9.3.1 for


SQL Server released

Inference of ML Models in SQL Server via


External Languages

Microsoft.Data.SqlClient 3.0 Preview 2

SQL Server 2019 on Ubuntu 20.04, python2 dependency


removed for SQL Server 2019 across distributions.

Cumulative Update #10 for SQL Server 2019


RTM

Inference of ML Models in SQL Server via


External Languages

Early technical preview of JDBC Driver 9.3.0 for


SQL Server released

How to get the biggest bang for your buck with


SQL Server on Azure VMs
{Place your company ad here and reach out to our readers}
{Talk to us today. Drop an email at [email protected]}

Got from a friend? Subscribe now to get your copy.



Azure Database for MySQL Flexible Server – a fully managed
service running the community version of MySQL
Parikshit Savjani | @talktosavjani

The year 2021 started as an eventful year and continues to present challenging times for people, businesses and economies around the world. As our CEO Satya Nadella puts it – "We have seen two years of digital transformation in two months". The Azure Database for MySQL service is at the heart of this transformation, empowering online education, video streaming services, digital payment solutions, e-commerce platforms, gaming services, and government and healthcare websites to support unprecedented growth, save cost and enable our customers to scale. It is immensely satisfying to see the Azure Database for MySQL service enabling our customers to meet the growing demands for their services during these critical times. The Azure Database for MySQL service, running the community version of MySQL, is powering mission-critical applications and services like healthcare services for Denmark citizens, a digital payment application for Hong Kong citizens, music and video streaming platforms for Indian, Korean and Japanese citizens, online news websites, and mobile gaming services including our very own Minecraft Realms.

What applications run on MySQL?


MySQL is one of the popular choices of database engine for designing internet-scale consumer applications. Internet-scale consumer applications are highly transactional online applications with short, chatty transactions against a relatively small database size. These applications are typically developed in a Java or PHP framework and migrated to run on Azure VMs, Azure App Services, or containerized to run on Azure Kubernetes Service (AKS). The database is typically required to scale to a high volume of incoming transactions. The majority of our customers leverage the ProxySQL load-balancing proxy and read replicas to scale out and meet the workload demands of their business. MySQL 5.7 and 8.0 continue to be the popular choices for our customers, enabling them to meet their performance and scale goals.

What is Flexible Server in Azure Database for MySQL?


With over two years since the general availability of Azure Database for MySQL, we've listened and learned a lot from those of you who use our MySQL managed database service on Azure. As a developer, you appreciate the ease of provisioning, built-in high availability, and manageability of a fully managed service. But for some of you, moving to a managed service can be seen as a loss of database-level control and flexibility when it comes to configuring your MySQL servers—which has prevented you from taking advantage of the benefits of a managed service. Hopefully, not anymore.

Now in preview: Introducing Azure Database for MySQL - Flexible Server

We designed the new Flexible server deployment option for MySQL with these goals in mind:

• Simplify developer experiences – Make it easier for you to quickly onboard, connect, and get
started.
• Maximize Database Controls – Provide maximum control on your server configurations to
provide experiences at par with running your own MySQL deployments.
• More Cost Optimization Controls – Provide more options for you to optimize and save costs.
• Enable Zone Resilient & Aware Applications – Allow you to build highly available, zone
resilient and performant applications, with your MySQL database co-located in the same zone,
so you can tolerate zone level failures.

Let us now dive into what you can expect from the new Flexible server deployment option on Azure
Database for MySQL—as well as a bit about what your experience will be like.

Create a Flexible server with single Azure CLI command


As a developer, you are probably familiar with Azure CLI commands in Azure Cloud Shell. Now, you can create a new Flexible Server deployment for MySQL using a single Azure CLI command, as shown below.

Requires Azure CLI > 2.12.0 or Azure Cloud Shell

az mysql flexible-server create -l location
As of today, the Flexible Server offering for Azure Database for MySQL is live in 14 Azure regions. You can check our documentation for the most up-to-date information.

Use familiar tools to connect to your server & it just works!


With the Flexible Server deployment option for MySQL, you can use familiar tools like MySQL Workbench and standard drivers to connect, and it just works!

If you would like a guided quick start, I recommend you start here. Here is the detailed list of commands you can expect.
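For instance, a plain mysql command-line client connection looks like any other MySQL connection; the server name, admin user and database below are placeholders, and TLS is enforced by default:

# Connect with the standard mysql client; TLS is required unless require_secure_transport is OFF
mysql -h mydemoserver.mysql.database.azure.com -u myadmin -p --ssl-mode=REQUIRED mydatabase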

More Server Parameter Control with Flexible Server


With Flexible Server, we have exposed 30% more parameters compared to Single server which you
can now modify and customize based on the needs and dependencies of your application.
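As a hedged example of what that looks like from the CLI (the resource group, server name, parameter and value are assumptions, not from the article):

# View and change a server parameter on a Flexible Server
az mysql flexible-server parameter show --resource-group myresourcegroup --server-name mydemoserver --name max_connections
az mysql flexible-server parameter set --resource-group myresourcegroup --server-name mydemoserver --name max_connections --value 500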

Network Isolation Control
With Flexible Server on Azure Database for MySQL, you can choose to run your server either in public access mode or secure it in private access mode.

With private access, you can deploy your Flexible Server into your Azure virtual network. Azure virtual networks provide private and secure network communication. Resources in a virtual network can communicate through private IP addresses only. A Flexible Server in private access mode has no public endpoint and cannot be reached from outside the virtual network. In addition, you can create a Flexible Server in a virtual network using the single command shown below. The subnet should not have any other resource deployed in it, and this subnet will be delegated to Microsoft.DBforMySQL/flexibleServers, if not already delegated. See Networking concepts for more details.

az mysql flexible-server create --subnet


/subscriptions/{SubID}/resourceGroups/{ResourceGroup}/providers/Microsoft.Network/virtualNetw
orks/{VNetName}/subnets/{SubnetName}

By default, SSL is enabled with TLS 1.2 encryption enforced, but it can be disabled by setting require_secure_transport to OFF from the portal.

Control your Planned Maintenance schedule


The service performs automated patching of the underlying hardware, OS, and database engine. The patching includes security and software updates. For the MySQL engine, minor version upgrades are also included as part of the planned maintenance release. When managing and running a mission-critical business application, it is critical for you to be able to control the maintenance schedule, as it directly impacts the availability of the database server and application for your business. You may also want to test the impact of the patch on your application behavior and performance. This is where you may want to apply the patch on pre-production and test environments first, as soon as the service releases it, and plan to roll it out in production on a later schedule. With the new Flexible Server option for Azure Database for MySQL, you can now schedule your maintenance at a time which works best for you. From the Maintenance blade in the Azure portal, you can specify the day of the week and a 1-hour time window that works best for you to perform server patching, which may involve restarts. For more details, refer to Scheduled Maintenance concepts.

Scale out your workload with up to 10 read replicas


MySQL is one of the popular database engines for running internet-scale web and mobile applications.
Many of our customers use it for their online education services, video streaming services, digital
payment solutions, e-commerce platforms, gaming services, news portals, government, and
healthcare websites. These services are required to serve and scale as the traffic on the web or mobile
application increases.

On the application side, the application is typically developed in Java or PHP and migrated to run on Azure Virtual Machine Scale Sets, Azure App Services, or containerized to run on Azure Kubernetes Service (AKS). With Virtual Machine Scale Sets, App Service or AKS as the underlying infrastructure, application scaling is simplified by instantaneously provisioning new VMs and replicating the stateless components of the application to cater to the requests, but often the database ends up being a bottleneck as the centralized stateful component.

The read replica feature allows you to replicate data from an Azure Database for MySQL flexible server
to a read-only server. You can replicate from the source server to up to 10 replicas. Replicas are
updated asynchronously using the MySQL engine's native binary log (binlog) file position-based
replication technology. You can use a load-balancer proxy solution like ProxySQL to seamlessly scale out your application workload to read replicas without any application refactoring cost. See Read Replica concepts to learn more.
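As a hedged sketch, creating a read replica from the CLI looks roughly like this (the resource group and server names are placeholders):

# Create a read replica of an existing flexible server (up to 10 replicas per source)
az mysql flexible-server replica create --replica-name mydemoserver-replica1 --resource-group myresourcegroup --source-server mydemoserver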

Start with burstable SKUs starting at $13 per month


This has been one of the long-standing asks from many of you looking to use a MySQL server for personal projects or development purposes. With Flexible Server on Azure Database for MySQL, you can now start with a burstable SKU if your workload doesn't need 100% of the CPU all the time. Burstable SKUs are generally preferred for dev/test scenarios. The lowest available burstable compute tier, B1S, starts at $13 per month. See Compute and Storage sizes in the documentation for more details.

Stop your server when not in use to save cost!


This is again one of the most highly requested asks from many of you who are looking to save compute cost by simply stopping the server when it is not in use. See Server concepts for more details.
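A minimal sketch of that from the CLI (names are placeholders):

# Stop the server to pause compute billing, and start it again when needed
az mysql flexible-server stop --resource-group myresourcegroup --name mydemoserver
az mysql flexible-server start --resource-group myresourcegroup --name mydemoserver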

Build Zone resilient applications with Flexible Server
With Azure Kubernetes Service (AKS) or Virtual Machine Scale Sets, you can build and deploy zone-resilient applications that can tolerate zonal failures. With Flexible Server on Azure Database for MySQL, you can now enable zone redundancy for your MySQL database server as well. When you enable zone-redundant high availability for your MySQL server with Flexible Server, the service provisions a hot standby server in the secondary availability zone with synchronous replication of data. In case of zonal failures, the MySQL database server will automatically fail over, bringing the standby server in the secondary availability zone online to ensure your applications and database are highly available and fault tolerant to availability-zone-level failures. See high availability concepts for more details.

Here is the latest update for the MySQL Flexible Server release – MySQL 8.0.21, zone placement, and IOPS scaling now available in Flexible Server! (Microsoft Tech Community)

Getting Started
You can quickly get started by creating your first server using the quickstarts in our documentation on
docs.microsoft.com:

• Create an Azure Database for MySQL Flexible server using Azure portal
• Create an Azure Database for MySQL Flexible server using Azure CLI
• Create an Azure Database for MySQL Flexible server using ARM template

To learn more, you can read our Flexible server documentation for MySQL.

For any questions or suggestions you might have about working with Azure Database for MySQL, you can send an email to the Ask Azure DB for MySQL team. To provide feedback or request new features, we would appreciate it if you could make an entry via UserVoice, which can help us to prioritize.

Flexible Server is available in preview on Azure Database for MySQL, with no SLAs, and hence is not meant for production deployments yet. The Single Server deployment option continues to be our enterprise-ready platform, supporting mission-critical applications and services, as I shared in my last service update.

To help you compare Single server and Flexible server for Azure Database for MySQL so you can figure
out which deployment option is right for you, we’ve created a handy feature comparison matrix for
you in our documentation.

Questions? Comments? Talk to the author today. Parikshit on Twitter.

About Parikshit Savjani


Parikshit Savjani leads the Program Management for Azure Database for
MySQL and MariaDB services.

LEARN MORE

Non-Tech World of Parikshit

In my free time, I enjoy spending time with my 6-year-old daughter. It is a delight to watch her grow
and see how quickly she learns and adapts.

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]


Attention all Data Enthusiasts! My name is DaDa. I represent Data Platform Virtual Summit this year. And the big news is here. DPS 2021 (#DPS2021) has been announced.

Well, not just the DPS 2021 announcement, the bigger news is that DPS 2021 (the Summit) is now free if you book any one Training Class. Read on to know more.

I will take you on an exhilarating journey into the world of Data, Analytics and AI. Be a part of the action this September, to learn & grow your technical skills with some deep technical content from the world's best Data Professionals.

Data Platform Virtual Summit will run from Sep 13 to 18. Pre-Cons on Sep 8 & Sep 9.
Post-Cons on Sep 20 & Sep 21.

Visit DPS 2021 Today | CFS is Open



Building on the success from last year, DPS 2021 will be virtual and will run for 54 hours – covering the
entire globe. Brilliant minds, spanning different continents, will defy geographical boundaries and
come together to pull off this spectacle.

A 100% technical learning event with 150+ Breakout Sessions, 20+ Training Classes, 100+ of the World's Best
Educators & 54 hours of conference sessions makes DPS 2021 one of the largest online learning
events on Microsoft Azure Data, Analytics & Artificial Intelligence.

Virtual World of DPS
Last year, Data Platform Summit transitioned into a virtual event. We brought you 30+ Training Classes, 200+ Breakout Sessions, 170+ of the World's Best Educators, 48 hours of Pre-Cons, 48 hours of Post-Cons & 72 hours of non-stop conference sessions – DPS 2020 was the largest online learning event on Microsoft Azure Data, Analytics & Artificial Intelligence.
Breakout Session Room
For your comfort, we covered all time zones, running continuously. The event came to your
country, your city, your home! And now, being virtual, DPS has bigger participation from
Microsoft Redmond-based Product Teams & worldwide MVPs.

Now comes the exciting part! Our virtual conferencing platform. You will be delighted to experience the incredible interactivity of the platform – truly immersive!

Round Tables

Data Gurukul Exhibitor Hall


Mental Health and Wellness in IT: Let's Stop the Stigma
Tracy Boggiano | @TracyBoggiano

Quick introduction to me: I am not a mental health professional, I'm just a SQL Server DBA who
has experienced work-related issues that have led to me having mental health issues, and
personal issues that have led to me having mental health issues, both of which have affected my
ability to work.

One thing I would like for everyone to remember is "Mental health isn't just mental illness – it's part
of being human." – Anonymous. You see, mental health is just as important as your physical health. The
stigma around mental health is staggering. If you had diabetes, you would seek treatment and take
your insulin. Yet most people with mental health issues do not seek help and refuse to take medicine
because they see it as a weakness. Seeking help is not a weakness; it is actually the opposite, it's one of
the strongest things you can do. Instead, people self-medicate with food, drugs, alcohol, etc., including
myself at one point in my life. It's hard to accept a mental health diagnosis with the stigma that exists,
but 15 years ago I was diagnosed with bipolar II, complex PTSD, and generalized anxiety. I found these
hard to accept and did not want to stick to a medication regimen or even go see the doctor; luckily I
stuck with it and I'm more stable now than I was then. I personally think everyone could use a therapist to
help with the bumps in life, and there is no shame in that or in seeing a psychiatrist.

In the United States, one out of four [1] people seek mental health help in a given year – that is 25% [2]. A survey of IT people reveals that number is 42% [3] in the US, and 48% in the United Kingdom. More stats from the OSMI survey reveal that IT professionals do not feel comfortable talking about issues with their managers or colleagues. Mental health should be just as important a part of conversations as your physical health; they are intertwined, and it's so important to your well-being and how you live, work, and create.

For IT professionals there are four things that can cause an individual to develop anxiety or
depression around their job: burnout, stress, harassment, and bullying. Burnout comes from working
on the same thing all the time and working extra-long hours; I have been known to do this and take on
extra projects outside of work and just crumble with overwhelm when looking at my calendar. I would
personally take the quiz at http://burnoutindex.org and gauge how burnt out you might be. You might
be and not be aware of it, like I was when I first took the quiz.

We experience stress from being on call and having to keep systems up 24x7 and not being able to
make mistakes without dire consequences in production. We also stress ourselves out by staying
connected to our cellphones at all times, even when we are not on call. Then comes harassment of
any type, not just sexual harassment. I've been sexually harassed at work but also picked on for the
clothes I wear. I have seen a male manager at my workplace be constantly picked on for being short;
he was 5 foot 2 inches and most of the other managers were 6 feet tall. Picking on colleagues at work
is not acceptable. I have had a coworker get away with hanging a Playboy calendar in his cubicle with
just a sticky note over the sensitive areas. All these things can cause people to become anxious about
going to work or even depressed about it.

Now that we have talked about all the bad stuff, what can we do about it? One: talk more openly in the
workplace about your struggles with mental health. Let us break the stigma. First, seek any medical
help you may need to treat your mental health while you get the situation under control. See a
psychiatrist, or your primary care doctor, or talk to a therapist. Stop any self-medicating you are doing
and let the professionals help. Remember, on this matter it may take several tries at different
medications to get one that works for you, so don't give up on the doctors. Talk to your boss about
anything that is over-stressing you about your job and see what can be changed; if nothing can, it is
probably time to find a new job. Put down your cell phones and work computers when it is outside
your office hours. Let the on-call person deal with it while you relax and work on your hobbies or other
projects. If the stress is coming from harassment or bullying, you may need to go to your human
resources department to file a formal complaint.

Other things you can do for yourself are to make sure you are eating well, exercising, and sleeping well.
These are fundamental to physical and mental health. Develop some hobbies away from the
computer (right here is me, the pot calling the kettle black, so feel free to follow up with me to make
sure I am taking my own advice in a couple of months). Right now, with the news and negativity, get
away from social media, or if you like Twitter, use muted words. Chrissy Lemaire has a great list on
GitHub that can get you started and save your sanity.

Also, especially with this being a pandemic, we need to look out for our friends and colleagues. The
#SQLFamily is mighty and caring. If you have not heard from someone in a while or noticed they have
disappeared, reach out to check on them. Several people check on me and honestly it helps
tremendously, and I appreciate every one of those people that help me maintain my mental health.
Make sure to listen to what the person has to say without judging them; remember we are all getting
used to talking about mental health and reducing the stigma.

Seek help from your employer if needed. I know in America there is the ability to take family medical
leave if you need it. Use your vacation days as mental health days and take a day here and there for
just you. Also, advertise to your co-workers that that is what you are doing so they might start feeling
comfortable doing the same thing. If you are a manager, it would be helpful if you would do this to start
the trend among your employees. Don't be afraid to use your time off just to do nothing but let your
mind rest and get mentally healthy.

Finally, I will sum up with more of my story. In 2018, I switched jobs, and then half the company was laid
off, which caused me two stressors. Then I got in a wreck at SQL Saturday LA in my rental car. Then I
switched therapists. Then I helped a friend through a crisis. I was travelling to two SQL Saturdays a month.
Then I started a different job because the first one scared me after they laid off half the people. Do
you see the stress adding up here? Meanwhile I was doing nothing to take care of myself besides
taking my meds and trying to contact my doctor when I went into full-blown mania from the bipolar
II. Because of this I landed in a psychiatric hospital and it took me a year to fully recover back to normal.
Don't be me: seek help earlier, don't keep adding to your stress, talk to someone before it gets out of
control. But do help me STOP THE STIGMA! Image Credits: Unsplash

Questions? Comments? Talk to the author today. Tracy Boggiano on Twitter.

About Tracy Boggiano

Tracy is a Database Superhero and Microsoft Data Platform MVP. She has spent
over 20 years in IT and has used SQL Server since 1999.

LEARN MORE

Non-Tech World of Tracy Boggiano


During her free time, Tracy can be found making a difference somewhere.

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]


www.SQLMaestros.com @SQLMaestros
Learning Opportunities

SQL Day
10-12 May 2021
SQLDay is the largest conference focused on Microsoft Data Platform – databases, Big Data, Business Intelligence and advanced data analysis.

Data Weekender
15 May 2021
A Virtual Popup Microsoft Data Conference.

Data Saturday - Data Toboggan - Cool Runnings
12 June 2021
Data Saturdays is a place for the data community to run small regional events with little outlay.

GroupBy
25-26 May 2021
GroupBy is free data platform training by the community, for the community.

Data Ceili
28 May 2021
Data Céilí is Ireland's newest data platform event.

DataMinutes
11 June 2021
DataMinutes is the fastest event in the Microsoft Data Platform space yet!

Data Platform Virtual Summit
13-18 Sep 2021
Accelerating Data Driven Success.

Video Channels
SQLServerGeeks
PASS



Want to list your event here? Just tag us in your tweets @SQLServerGeeks
SQL Server OPENJSON
for selecting and
comparing values
Tomaž Kaštrun | @tomaz_tsql

OPENJSON was introduced in SQL Server 2016 and is a table-valued function that parses JSON
formatted text and returns objects and properties in the form of key:value pairs. These pairs can
be presented as rows and columns, or as a rowset view over a JSON file.

This ability to extract the objects and parameters (or keys and values) in a rowset view opens up a lot
of potentially useful T-SQL techniques that go beyond reading JSON files or JSON formats.

Daily wrangling and engineering of data will challenge you with a variety of tasks that usually end up too
complex for later maintenance or might pose a performance issue. The OPENJSON table-valued function
has been overlooked many times (as have CROSS APPLY, STRING_ESCAPE, STRING_AGG,
STRING_SPLIT, TRY_CONVERT, CUME_DIST, LAG, LEAD, FIRST_VALUE), not because people have not
heard about it, but – from what I have seen – because it is immediately associated with the JSON format
and people simply ignore it.

Here are two examples that have proven really helpful over the past years and have helped me and other
data analysts, developers, and scientists many times. The first example is about selecting values and the
second one about comparing values. Both cases can be used in different scenarios and different
industries, but their simplicity can be really helpful. Both demos use the Master database for simplicity
and brevity, but I would propose using your own database.

Selecting Values
Many times, you want to have a set of values (as a string with a separator) introduced into a query. You
can either hard-code the values (which I would not recommend), iterate through the list, use the
FOR XML PATH clause, create a temporary object, and many other solutions. Since OPENJSON is a
table-valued function, you can simply use it with a JOIN to pass the parameters:
USE [Master];

DECLARE @TableID VARCHAR(100) = '[20,21,22,23,24]';

SELECT *
FROM sys.objects AS o
JOIN sys.schemas AS s
ON s.schema_id = o.schema_id
JOIN OPENJSON(@TableID) AS d ON o.Object_ID = d.value
-- returns same result set if used explicit SELECT value FROM statement
-- INNER JOIN (SELECT value FROM OPENJSON(@TableID)) as d ON o.Object_ID = d.value

This is a simple but effective way to get a list of values, e.g. invoiceID, customerID, locationID, or codeID,
into your query. OPENJSON in this case produces a rowset of identifiers, and the query filters the data
based on those values.
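
The same pattern works for string values as well; here is a minimal sketch (the object names in the JSON array are just examples from the Master database) that uses the optional WITH clause to type the extracted values explicitly:
USE [Master];

DECLARE @ObjectNames NVARCHAR(MAX) = N'["spt_monitor","spt_fallback_db"]';

-- the WITH clause types each scalar array element; the path '$' refers to the element itself
SELECT o.name, o.object_id, o.type_desc
FROM sys.objects AS o
JOIN OPENJSON(@ObjectNames) WITH (name NVARCHAR(128) '$') AS d
    ON o.name = d.name;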

For further reading, see the Microsoft Docs resources at this link.

Comparing Values
OPENJSON can also be used to compare the set of values, given the same key. Imagine a JSON file
with the following keys and values:
[{
  "ColA": 10,
  "ColD": "2021/04/28",
  "Name": "Table1"
}, {
  "ColA": 20,
  "ColD": "2021/04/28",
  "Name": "Table2"
}, {
  "ColA": 30,
  "ColD": "2021/04/29",
  "Name": "Table3"
}]

This data would be represented in a rowset view as:

ColA    ColD            Name
10      "2021/04/28"    Table1
20      "2021/04/28"    Table2
30      "2021/04/29"    Table3

And your task is to find all the differences between Table1 and Table3 on any given attribute.

OPENJSON gives you the capability to pivot the data (or values) over the same key and either show all
the data or only show where there are differences or matches.
USE [Master];

SELECT
master_db.[key]
,master_db.[value] AS master_values
,model_db.[value] AS model_values
,msdb_db.[value] AS msdb_values

FROM OPENJSON ((SELECT * FROM sys.databases WHERE database_id = 1 FOR JSON AUTO,
WITHOUT_ARRAY_WRAPPER)) AS master_db
INNER JOIN OPENJSON((SELECT * FROM sys.databases WHERE database_id = 3 FOR JSON AUTO,
WITHOUT_ARRAY_WRAPPER)) AS model_db
ON master_db.[key] = model_db.[key]
INNER JOIN OPENJSON((SELECT * FROM sys.databases WHERE database_id = 4 FOR JSON AUTO,
WITHOUT_ARRAY_WRAPPER)) AS msdb_db
ON master_db.[key] = msdb_db.[key]

In this case I am taking three different databases, joining all the column names (keys) and pivoting
the data. You would not get the same result by running this query:
SELECT * FROM sys.databases
WHERE
database_id IN (1,3,4)

Since the OPENJSON function pivots the keys and values, it is much easier to filter out the rows (which
are columns in the SELECT * FROM sys.databases statement) by applying a WHERE clause or filtering out
values in the ON clause.
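
As a minimal sketch of that idea, appending a filter to the previous query returns only the attributes where the master and model databases actually differ:
USE [Master];

SELECT
     master_db.[key]
    ,master_db.[value] AS master_values
    ,model_db.[value] AS model_values
FROM OPENJSON ((SELECT * FROM sys.databases WHERE database_id = 1 FOR JSON AUTO,
WITHOUT_ARRAY_WRAPPER)) AS master_db
INNER JOIN OPENJSON((SELECT * FROM sys.databases WHERE database_id = 3 FOR JSON AUTO,
WITHOUT_ARRAY_WRAPPER)) AS model_db
ON master_db.[key] = model_db.[key]
-- keep only the settings whose values do not match between the two databases
WHERE ISNULL(master_db.[value], 'NULL') <> ISNULL(model_db.[value], 'NULL');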

I first saw this concept from my friend and fellow MVP, Miloš Radivojević, who introduced it in
the book "SQL Server 2016 Developer's Guide", and since then I have been using it in many
reports and queries, from sales data (comparing complaints or searching out differences among
customers) to parameter sweeping for machine learning models. A truly simple, yet powerful
approach.

Questions? Comments? Talk to the author today. Tomaž Kaštrun on Twitter.

About Tomaž Kaštrun

With more than 15 years of experience in the field of databases, data warehouses and development,
Tomaž focuses on T-SQL programming and query optimization.

LEARN MORE

Non-Tech World of Tomaž Kaštrun


Tomaž is an avid coffee drinker, enjoys a good book and loves riding single-speed fixed-gear bikes.

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]

#SQLFamily
Beyond SQL

Geeky Setup of the Month


Neil Hambly shows off his tech setup

Wonder of the Month


Brent Ozar walking into the volcano

Family Photo of the Month


Reza & Leila with Khersi & Lucy

Want to be featured here? Just tag @SQLServerGeeks in your tweet.



A Few tricks for
testing security
Steve Jones | @way0utwest

One of the things that I have often seen developers ignore in building applications is security.
Too often they either do not understand security or do not take the time to test how different logins and
users might interact with their code, and they often assign too many permissions. In fact, one of
the main problems in SQL Server for decades has been developers requiring dbo, or worse, sa
permissions for their code.

This is not because the application needs those permissions, but because the developers didn't bother
to create a better security structure.

In this article, I will look at a couple of tricks that can help you easily incorporate a security model as
you are writing your application, one that allows you to test in the same way a user will.

Preparation is Key
When a developer is working on an application, often they connect to SQL Server with their own
credentials. These are often sa or dbo, which gives the developer a skewed view of the security model.
SQL Server tries to be secure, so new accounts don’t have rights to anything by default.

A good technique when you start an application is to create a couple of users and roles to help you
easily test your code. These give you different views into how your application is actually working for
non-privileged users. Here is the type of script that I keep around for beginning work on any
application:
CREATE LOGIN Joe_Admin WITH PASSWORD = 'Dem012#4'
CREATE LOGIN Joe_User WITH PASSWORD = 'Dem012#4'
GO
USE MyNewApp
GO
CREATE USER Joe_Admin FOR LOGIN Joe_Admin
CREATE USER Joe_User FOR LOGIN Joe_User
GO
CREATE ROLE AppAdmin
CREATE ROLE AppUser
GO
ALTER ROLE AppAdmin ADD MEMBER Joe_Admin
ALTER ROLE AppUser ADD MEMBER Joe_User



Note that even if this is a legacy application, I’ll add these logins and users to help me test how things
work.

Now, as I build objects, I will assign permissions on them to the roles. For example, if I add a table and a
stored procedure, I'll use a script like this:
CREATE TABLE OrderHeader
( OrderHeaderID INT
, OrderDate DATE
, Complete BIT);
GO

CREATE OR ALTER PROCEDURE GetOrderHeader @OrderHeaderID INT
AS
BEGIN
IF @orderheaderID IS NULL
SELECT OrderHeaderID, OrderDate, Complete FROM orderheader;
ELSE
SELECT
OrderHeaderID
, OrderDate
, Complete
FROM orderheader
WHERE OrderHeaderID = @OrderHeaderID;
END;
GO
GRANT EXECUTE ON dbo.GetOrderHeader TO AppUser
GO

Once this is complete, I can begin to test my security by simulating one of these users. I can do this in one of two ways. First, I can open a new query window and log in as a user. For quick testing, I'll log in as Joe_Admin in SSMS.

In my query window, I can see which user I have logged in with in the status bar.

When I run this script, I get an error. It's a permission error, because I didn't assign rights to this object to either the user or role.



I can open a second window for Joe_User and then check their permissions, which we can see below are correct.

The other option I have is to use the EXECUTE AS command to change my user context. If I am developing code in a window, I can run an EXECUTE AS [login | user] with the name of the login or user, and then run my code. This changes my security context, so I can test code. You can see this working below:

With this code, I don't need to change query windows. I do, however, need to change back to my developer context with REVERT at the end. This allows me to use my developer credentials to change code, but switch to other users.

Since I can get confused, I do often want to add a couple of lines to my code. First, I'll use multiple EXECUTE AS statements to quickly switch users with the same code. You can see below I have one commented out and one executing. I also want to be sure I know which user was running code, so I will add a USER_NAME() SELECT to allow me to look at results and see which user is running code.
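
A minimal sketch of that pattern, reusing the users and procedure created earlier, might look like this:
-- switch to a low-privileged context (swap the comment to test the admin user instead)
EXECUTE AS USER = 'Joe_User';
-- EXECUTE AS USER = 'Joe_Admin';

SELECT USER_NAME() AS executing_user;   -- confirm whose context the code runs under

EXEC dbo.GetOrderHeader @OrderHeaderID = NULL;

REVERT;   -- switch back to the developer's own credentials
GO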

With either of these techniques, I can keep testing the code I am writing under different contexts to
be sure that when the application is deployed, it does not require any special privileges.

Existing Security Context


For many existing applications, I find that developers or administrators have often granted rights to
individual users, especially when a shared login is used for something like a web application. While
this works, it is cumbersome and difficult to maintain over time.

In these situations, I would start to use roles as a developer, in addition to granting individual rights.
This helps me quickly add a new user to test something, but also starts to show other developers or
administrators how cumbersome individual rights are. This is the first step to refactoring to a better
security model. This is true whether you use SQL, Windows, or AAD authentication.



If I have an existing user from production, I might ensure they exist on this system, with a standard
password for the login. I can script out all the permissions and ensure the user on my development
instance works the same as in production, without needing production credentials or a simplified security setup.
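
A minimal sketch of that approach (WebAppLogin is a hypothetical production login name) reuses the standard password and roles from earlier:
-- recreate a production login on the development instance with a known password
IF NOT EXISTS (SELECT 1 FROM sys.server_principals WHERE name = 'WebAppLogin')
    CREATE LOGIN WebAppLogin WITH PASSWORD = 'Dem012#4';
GO
USE MyNewApp
GO
IF NOT EXISTS (SELECT 1 FROM sys.database_principals WHERE name = 'WebAppLogin')
    CREATE USER WebAppLogin FOR LOGIN WebAppLogin;
GO
-- map the user into the same role the application uses so permissions mirror production
ALTER ROLE AppUser ADD MEMBER WebAppLogin;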

Summary
Having a known set of logins and users makes development work easier. I prefer to use standard
names on my development databases, with standard roles that simulate the way that different users
will connect in production. This ensures I am testing security under contexts other than my own.

I have shown two ways to do this: with separate query windows for each login, and by changing
context. Of these, I find that separate windows, often on separate monitors, allow me to keep
developing in one window without confusion. I can easily copy and paste code from my developer
connection to a normal user connection for easy testing. The context switch sometimes causes me
issues, especially when I create errors and the REVERT doesn't execute.

It is important to test your application, both the features and the security. Many of the software bugs and
problems experienced by users come from inadequate testing. With many systems under constant
probing and attack, good development habits can help ensure we do not accidentally release code
with vulnerabilities that could cause data breaches. This also helps ensure we deploy code that our
users can execute without simple security errors.

Questions? Comments? Talk to the author today. Steve Jones on Twitter.

About Steve Jones


Steve Jones has been working with databases and computers for over two
decades. In 2001, Steve founded SQLServerCentral with two partners and has
been publishing technical articles and facilitating discussions among SQL Server
professionals ever since.

LEARN MORE

Non-Tech World of Steve Jones


During his free time, Steve enjoys yoga, snowboarding, and coaching youth volleyball in Colorado.

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]



{Place your company ad here and reach out to our readers}
{Talk to us today. Drop an email at [email protected]}

Got from a friend? Subscribe now to get your copy.


Inside look at
data exposed
Anna Hoffman | @AnalyticAnna

L et’s be honest: staying up to date on Microsoft technologies is not trivial. Overall, this is a good
thing. Our team is constantly listening to your feedback, suggestions, and issues. Before you
know it (literally), we’re implementing those changes and adding capabilities to our suite of
products. I am often asked a question like, “What’s the roadmap?” Usually, I can comment on themes
and things we’ve announced, but I can’t say much more than that. However, I generally share things
that have come out in the past month, either in public preview or general availability, and a good
portion of whoever is listening learns something new. So, while overall we’re innovating quickly and
that is helping customers, we’re not making it easy for you to learn what’s new. And, with the year
we’ve had, it’s hard to have those hallway conversations with your colleague that heard about that
new thing that you’ve been wondering about for a while.

That’s where Data Exposed comes in. Data Exposed is a series that we brought back to life in 2019. I
stumbled into Channel 9 Studios at Microsoft Headquarters in Redmond, WA to chat with the
producer. I asked her if we could start a show focused on Azure Data. To my surprise (since I had no
funding or video experience), she said yes! We started recording short episodes in the ‘self-hosted’
studio. This is a small room in the back of the studio where speakers can record themselves without a
producer. It’s a cool setup. Anyways, we started by recording episodes with team members that were
willing to make the trek to Building 25 to sit in a small room and record.

Fast-forward to 2020. Everything went virtual and we were out of episodes. Channel 9 Studios started
recording episodes on Skype. Since I am not based in Redmond myself, this opened the door for me to
come back and host the show, equipped with an awesome co-producer, Marisa Brasile (she really is
the engine that keeps this train moving!). Not only did virtual recording open the doors for me, but it
opened the doors for more than 50 Program Managers, Engineers, and Microsoft MVPs from around
the globe (and counting). We could now record with anyone, anywhere!

With the lack of in-person conferences for our team members to make their announcements and
reach customers, Data Exposed became a key part in getting awareness out about new (and existing)
capabilities in Azure SQL and SQL Server. We started to expand and occasionally host episodes with
teams like Azure Data Factory, Azure Synapse, Azure Data Explorer, and more. In 2020, we released
two episodes every week (Tuesday and Thursday) and demand grew significantly.



By the end of 2020, we were ready to take on our next challenge: live shows. In January 2021, we
started Data Exposed Live, which airs every Wednesday at 9AM PT (4PM UTC). Marisa and I planned
and brainstormed how we would land this, and we decided on a schedule roughly as follows: for each
month, the first week is a news update, the second and third weeks are deep dives, and in the fourth week
we introduce a new series, Something Old, Something New, with Buck Woody. The news update
episode has become an increasingly important episode, in my opinion, so that our community can stay
up to date on what has happened in the last month and what's upcoming. We bring in members of
different teams to talk about and demo what's new, and then we wrap it up nicely in a blog that can
always be found at aka.ms/NewsUpdate.

Today, we release short episodes every Thursday, we stream every Wednesday, and we release
episodes with MVPs on the last Tuesday of every month. Data Exposed has been a lot of work but also
a lot of fun, and we hope it provides value to you and your organizations. Thank you for your support,
and if you ever have feedback, please let us know. To connect with us, you can follow our team on
Twitter @AzureSQL, and you can subscribe to our YouTube Channel at https://aka.ms/azuresqlyt.

Questions? Comments? Talk to the author today. Anna Hoffman on Twitter.

About Anna Hoffman


Anna Hoffman is a Data & Applied Scientist on the Azure Data team at
Microsoft.

LEARN MORE

Non-Tech World of Anna


During her free time, Anna loves running, hiking, and trying new foods.

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]



Free Content
every week

We are progressively releasing DPS 2020 content for the community. You can have free access and watch the sessions on-demand.

Learn More

If you missed attending SQLBits or just want to catch up, then we have an enhanced experience we are running weekly. Every Thursday we will be replaying sessions from SQLBits 2020 using our spatial chat platform. The speaker will be available for live Q&A, plus there is a great opportunity to network with other data professionals.

Learn More

Redgate is working to make PASS's content available through their websites, but they are not there yet. In the meantime, you can view PASS video content on PASS TV & Redgate University.

Learn More



Good Database
Architecture is the
Best Optimization
Edward Pollack | @EdwardPollack

Data performance challenges are a universal part of application development. No technology,
platform, or development methodology can allow us to escape completely from them. As data
gets larger, the ways to inadvertently abuse it become more numerous.

Solving these problems is often seen as a reactive task. An application performs slowly, someone
complains, and a developer or administrator needs to research, find the source of the latency, and fix
it (somehow). Often, though, the solution is one that could have been implemented proactively as
part of the original release of the offending code.

These alternatives can be seen as choices we make in application development every day:
1. Do it right the first time.
2. Do it quickly the first time.

This challenge can be sarcastically addressed like this:

While silly, these do represent real organizational challenges, decisions, and decision-making
processes that are not silly.
I can think of a seemingly endless list of mistakes made over the years that were the direct result of
speed over precision. While not all errors can be avoided in life, there is value in preventing as many
as is reasonably possible up-front. This also has the bonus of improving our sleep schedules at those
times when bad things happen. Therefore, striking a comfortable balance between design and
architecture and technical debt is a valuable skill in software development.

Here are some examples and how they impacted real projects, software, and people. The names and
details are different, but the mistakes illustrated have been made many times by many people.

Data Retention, Who Needs It?
Creating a new table is a common task. What is not common enough, though, are the questions we
should ask ourselves when creating a new data structure.
Imagine a log table that will accept application log data on a regular basis. The table is created as an
afterthought with no additional considerations as to how it will be used in the future:

CREATE TABLE dbo.application_log
(   log_id INT NOT NULL IDENTITY(1,1),
    log_time DATETIME2(3) NOT NULL,
    log_title VARCHAR(100) NOT NULL,
    log_message VARCHAR(MAX) NOT NULL,
    is_error BIT NOT NULL,
    error_detail VARCHAR(MAX) NULL);

The application begins to run for the first time and everything is great! Six months later, though,
developers complain that application logging is slow. In addition, the application database has been
growing unusually large, consuming an unexpected amount of storage and backup resources.

What was forgotten? Retention! When creating new data, determine a retention policy for it and
ensure that computing resources can handle the associated data growth over time. A retention period
for data could be a week, a month, a year, or forever, depending on how quickly it grows and what it is
used for. In the log example above, developers likely would have assigned a retention period of 1 week
(or maybe a month) to the data and cleaned up any older data during a low-volume time.

OK, problem solved! A retention process is created that cleans up data older than a week each
evening. The cleanup process takes an exceptionally long time to run, though. So long, that it is
stopped and investigated. What else was forgotten? Indexes! The table above has no clustered or
non-clustered indexes. With each cleanup of old data that occurred, the table had to be scanned. In
addition to being slow, that scan will block other processes that try to log to the table. The following
adds a clustered primary key and a supporting non-clustered index on log_time:

CREATE TABLE dbo.application_log
(   log_id INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_application_log PRIMARY KEY CLUSTERED,
    log_time DATETIME2(3) NOT NULL,
    log_title VARCHAR(100) NOT NULL,
    log_message VARCHAR(MAX) NOT NULL,
    is_error BIT NOT NULL,
    error_detail VARCHAR(MAX) NULL);

CREATE NONCLUSTERED INDEX IX_application_log_log_time ON dbo.application_log (log_time);

If this table is highly transactional, even during less busy times, then the deletions made as part of
retention could be batched. This reduces each transaction size and reduces contention with other
transactional processes running at the same time.
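A minimal sketch of such a batched cleanup, assuming a one-week retention policy, might look like this:

DECLARE @cutoff DATETIME2(3) = DATEADD(DAY, -7, SYSDATETIME());
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- each small batch is a short transaction, which limits blocking and log growth
    DELETE TOP (5000) FROM dbo.application_log
    WHERE log_time < @cutoff;

    SET @rows = @@ROWCOUNT;
END;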
The Data Type Follies
Data structures are easy to create and hard to change. Once applications, reports, APIs, and users are
relying on a specific database schema, changing it becomes challenging. The more time that passes,
the more work is needed. Choosing the best data types on day one can save more work later and as
a bonus, help prevent bad data.

Consider the following table:
CREATE TABLE dbo.sales_transaction
(   transaction_id INT NOT NULL
        CONSTRAINT PK_sales_transaction PRIMARY KEY CLUSTERED,
    product_id INT NOT NULL,
    salesperson_id INT NOT NULL,
    transaction_amount DECIMAL(18,4) NOT NULL,
    transaction_time VARCHAR(25) NOT NULL,
    shipping_date VARCHAR(25) NULL);

ALTER TABLE dbo.sales_transaction ADD CONSTRAINT FK_sales_transaction_product
    FOREIGN KEY (product_id) REFERENCES dbo.product (product_id);
ALTER TABLE dbo.sales_transaction ADD CONSTRAINT FK_sales_transaction_person
    FOREIGN KEY (salesperson_id) REFERENCES dbo.person (person_id);

After our previous lesson, I made sure to include a clustered primary key, some foreign keys, and
created an archival process for any transactions over 2 years old. Things are going great until a
developer reports errors in production. I take a closer look and discover the following row in the table:

The transaction time is on September 31st?! That is not a real date, even during the longest and busiest
of months! Storing the date as a string seemed reasonable – and made saving the data from the
application quite easy! The right choice, though, was a data type that represented a date & time.
Then, when September 31st was entered, it would throw an error, rather than create bad data.
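A quick sketch of that behavior shows the difference: with a date/time type, the impossible date is rejected outright instead of being stored.

DECLARE @ok DATETIME = '2021-09-30';   -- succeeds
DECLARE @bad DATETIME = '2021-09-31';  -- raises a conversion error, so the bad value never reaches the table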
A few days of work later, a change is deployed and the table now contains a DATETIME:

CREATE TABLE dbo.sales_transaction
(   transaction_id INT NOT NULL
        CONSTRAINT PK_sales_transaction PRIMARY KEY CLUSTERED,
    product_id INT NOT NULL,
    salesperson_id INT NOT NULL,
    transaction_amount DECIMAL(18,4) NOT NULL,
    transaction_time DATETIME NOT NULL,
    shipping_date DATETIME NULL);

For good measure, I also change the shipping_date column to a DATETIME so similar problems cannot
happen there. As a bonus, performance on the table improved: DATE and DATETIME values from the
application were no longer being implicitly converted for comparison against VARCHAR values in the
table, allowing a non-clustered index on transaction_time to yield index seeks instead of index scans.
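As a rough illustration (the non-clustered index on transaction_time is assumed, as described above), a typical range query can now use an index seek:

-- with transaction_time stored as DATETIME, this range predicate is sargable
SELECT transaction_id, transaction_amount
FROM dbo.sales_transaction
WHERE transaction_time >= '2021-04-01'
  AND transaction_time < '2021-05-01';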
A month later, though, another error crops up related to the shipping date. Some investigation reveals
data that looks like this:

That is not right! The shipping date is a DATE. There should not be a time component to
it…but…because the data type allowed it, some code somewhere inserted it. While the fix itself was
easy – truncate the TIME portion and alter the column to be a DATE – it took some development time
and a deployment, which meant a late night working when I would have preferred to be doing anything
else. The new version of the table looks like this:
CREATE TABLE dbo.sales_transaction
(   transaction_id INT NOT NULL
        CONSTRAINT PK_sales_transaction PRIMARY KEY CLUSTERED,
    product_id INT NOT NULL,
    salesperson_id INT NOT NULL,
    transaction_amount DECIMAL(18,4) NOT NULL,
    transaction_time DATETIME NOT NULL,
    shipping_date DATE NULL);

This time, I get almost a year of peace and quiet on this table, until one day there is an application
outage when all sales stop saving their data to the database. Checking the error logs and testing
reveals the following message:

It turns out that this table had an exceptionally high volume of transactions and after a year hit its
2,147,483,647th sales transaction. When transaction_id #2,147,483,648 was inserted, the above error
was the result. No one had told me that this table would see billions of transactions! Maybe I should
have asked?
The problem was worked around by setting the application to use negative numbers as transaction
IDs. This bought some time, but another long night lay ahead of me where I had to create a new table
with a BIGINT transaction_id column, backfill it with the existing data, and then swap the tables so
that the new table became the active table, complete with historical data.
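A rough sketch of that swap (object names are illustrative, and any foreign keys and indexes would also need to be recreated) looks something like this:

CREATE TABLE dbo.sales_transaction_new
(   transaction_id BIGINT NOT NULL
        CONSTRAINT PK_sales_transaction_new PRIMARY KEY CLUSTERED,
    product_id INT NOT NULL,
    salesperson_id INT NOT NULL,
    transaction_amount DECIMAL(18,4) NOT NULL,
    transaction_time DATETIME NOT NULL,
    shipping_date DATE NULL);

-- backfill the historical rows, then swap names so the BIGINT table becomes the active one
INSERT INTO dbo.sales_transaction_new
    (transaction_id, product_id, salesperson_id, transaction_amount, transaction_time, shipping_date)
SELECT transaction_id, product_id, salesperson_id, transaction_amount, transaction_time, shipping_date
FROM dbo.sales_transaction;

EXEC sp_rename 'dbo.sales_transaction', 'sales_transaction_old';
EXEC sp_rename 'dbo.sales_transaction_new', 'sales_transaction';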
The lesson of this escapade is that choosing data types is a significant decision. They reflect data size,
content, and longevity. Knowing up-front how much data will be created, how it will be used, and
what it represents allows for smart data types to be chosen immediately. This helps prevent bad data
and avoids painful emergencies that my future self would prefer to avoid.
To NULL or Not to NULL, that is the Question
This story starts with a simple table:
CREATE TABLE dbo.person
(   person_id INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_person PRIMARY KEY CLUSTERED,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    email_address VARCHAR(100) NOT NULL,
    date_of_birth DATE NOT NULL);

After its release, a question is received regarding what to do if a person does not provide a date of
birth. Following some discussion, it is decided that an unknown date of birth can be represented by
NULL. The change is made:

CREATE TABLE dbo.person
(   person_id INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_person PRIMARY KEY CLUSTERED,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    email_address VARCHAR(100) NOT NULL,
    date_of_birth DATE NULL);

The change goes smoothly, but the next day a ticket is received that indicates searches based on date
of birth no longer work. Apparently the application has no graceful way to deal with NULL. They ask
instead for the column to be made NOT NULL and to include a dummy value for unknown values. Take
3:
CREATE TABLE dbo.person
(   person_id INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_person PRIMARY KEY CLUSTERED,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    email_address VARCHAR(100) NOT NULL,
    date_of_birth DATE NOT NULL
        CONSTRAINT DF_person_dob DEFAULT ('1/1/1900'));

Life goes on until one day an annoyed analyst asks you why there are so many people in the system
that are exactly 121 years old. Some head-scratching and review of timing reveals that the default
date of birth was polluting any date calculations that happened to use date of birth. In addition to
absurd ages, the system was also sending out birthday offers to all people with the dummy date of
birth, wishing them a happy birthday on January 1st.
This sequence of events illustrates how a simple problem can result in real-world awkwardness. The
easiest solution is to make the date of birth a required field at all levels of the application. This ensures
that dummy data or NULL is not needed. Alternatively, if date of birth is truly optional, then a handful
of legit solutions exist, including:
1. Make date_of_birth NULL and ensure that this is documented and handled effectively.
2. Normalize date_of_birth into a new optional table (see the sketch below). This adds complexity and is
not my preferred solution, but it is a way to normalize, avoid NULL, and ensure data quality.
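A minimal sketch of that second option (the child table name is hypothetical) stores a date of birth only for the people who actually provide one:

CREATE TABLE dbo.person_date_of_birth
(   person_id INT NOT NULL
        CONSTRAINT PK_person_date_of_birth PRIMARY KEY CLUSTERED
        CONSTRAINT FK_person_date_of_birth_person
            FOREIGN KEY REFERENCES dbo.person (person_id),
    date_of_birth DATE NOT NULL);
-- people without a known date of birth simply have no row here, so no NULL or dummy value is needed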
As always, before making decisions based on data, filter accordingly. If a data element is optional,
then decisions made with that column need to properly omit or take into account scenarios where
data is not provided.
Where Does This Lead Us Next?
The moral of this short series of stories, code, and vague attempts at humor was to remind us that
database architecture and performance optimization go hand-in-hand. Both address the same
challenges in different ways and at different times within a software design life cycle.
Asking (and answering) good questions up-front can allow for better data architecture and remove
the need for dramatic bug-fixes and changes later on. While not all problems can be proactively
solved, a keen eye for detail can prevent many future problems ranging from inconveniences all the
way to full-scale disasters.

As this list of design questions grows, so does our experience with seeking the answers to them and
turning that information into well-architected database objects and code. We all have crazy stories
of how bad data choices led to messy clean-up operations and those tales we share over a drink may
very well be the motivation and foundation for future good database architecture decisions.

Questions? Comments? Talk to the author today. Edward Pollack on Twitter.

About Edward Pollack


Ed Pollack has 20+ years of experience in database and systems administration,
which has developed his passion for performance optimization, database
design, and security.

LEARN MORE

Non-Tech World of Edward Pollack


In his free time, Ed enjoys travel, baking, spicy foods and sharing it with his family!

Want to write for the magazine? Comments? Feedback? Reach out to us at


[email protected]

Training Classes are 8 hours, focused,
deep-dive, demo-based virtual classroom
training. Each class will run for eight
hours in total, four hours each day, for
two consecutive days. Each Training
Class is designed to offer intermediate &
advanced-level training on a specific
topic/subject. These classes offer more
knowledge, skills, and expertise beyond
the summit content.

Learn More
