SQLServerGeeks Magazine May 2021
Website | Magazine LinkedIn | Telegram | Twitter | Facebook

It’s time to lift the curtains. We are thrilled to present to you the first edition of The SQLServerGeeks Magazine.

In all honesty, the journey to this point was riddled with road-bumps and potholes of all sorts. As we started conceptualizing the first edition of the magazine, we were hit quite hard by the second wave of the pandemic, affecting the families of our team members and unsettling all of us. Things seemed grey and bleak. Nonetheless, we kept our heads high and spirits higher as we gave it our undivided best to bring you this edition, undoubtedly the first of many, as a reminder of the fact that perseverance does pay off.

A massive shoutout to all authors – Warner Chaves, Leonard Lobel, Edward Pollack, Parikshit Savjani, Tracy Boggiano, Tomaž Kaštrun, Steve Jones, Anna Hoffman, and Anupama Natarajan – who agreed to contribute on such short notice; hats off to them. We are truly humbled to see that the #SQLFamily has got our back, and with absolute certainty, we can say that we have got theirs!

As much as we are passionate about all things SQL, it was our unanimous decision to add something more, something Beyond SQL, comprising the little things in life we have come to cherish in these recent times of uncertainty. If you are already a Geek or aspiring to be one, we bring you both the code and the colors from the world of SQL and the amazing people that make it possible.

We here at SQLServerGeeks believe everything can be fine-tuned and optimized. It is no different with our magazine. Make sure to give us your feedback so we can continue to provide quality content, well-curated to your interests. Write to us at [email protected].

Just as a magician never reveals all his tricks at once, we too have a few cards up our sleeves. So, make sure to stay tuned for a spectacle in the days to come. We are just getting started. Make sure you help us spread the word. Ask your friends and colleagues to subscribe to the magazine.

From all of us at SQLServerGeeks, we wish you a pleasant read. Happy Learning.

Yours Sincerely,
SQLServerGeeks Team

Got it from a friend? Subscribe now to get your copy.

COPYRIGHT STATEMENT
Copyright 2021. SQLServerGeeks.com. c/o eDominer Systems Pvt. Ltd. All rights reserved. No part of this magazine may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage and retrieval system, without permission in writing from the publisher. Articles contained in the magazine are copyright of the respective authors. All product names, logos, trademarks, and brands are the property of their respective owners. For any clarification, write to [email protected].

CORPORATE ADDRESS

Bangalore Office:
686, 6 A Cross, 3rd Block Koramangala, Bangalore – 560034

Kolkata Offices:
Office 1: eDominer Systems Pvt. Ltd., The Chambers, Office Unit 206 (Second Floor), 1865 Rajdanga Main Road (Kasba), Kolkata 700107
Office 2: 304, PS Continental, 83/2/1, Topsia Road (South), Kolkata 700046
TABLE OF CONTENTS
• A Few Tricks for Testing Security
• Inside Look at Data Exposed
• Free Content Every Week
• Good Database Architecture is the Best Optimization
Bullet-proofing Your
RTO and RPO
Warner Chaves | @warchav
If you are a DBA then you are surely familiar with the terms Recovery Time Objective (RTO) and
Recovery Point Objective (RPO). These terms have been used for decades to define service level
agreements for IT systems all around the world.
Relational database systems such as SQL Server provide mechanisms to configure, manipulate and
control the behaviour of your database system so that you can meet your RTO and RPO requirements.
This is not a new topic, so you must be thinking to yourself that you already know everything about
it and that I am beating a dead horse. However, I regularly run into incidents where everyone thought
the RTO and RPO would easily be met, and they ended up being missed, sometimes by a very large margin.
To that effect, I decided to compile some of the cases and scenarios to watch out for when working with
SQL Server and Azure SQL DB, to make sure that when you commit to those RTO and RPO requirements,
you will be able to meet them, guaranteed!
RTO
Let’s start with RTO. This one is all about how fast you can get back in business and that is why it’s
defined as a time objective. For production systems this could be hours down to seconds and for non-
production systems it could be hours, sometimes even days. To meet the RTO you will implement not
only a backup strategy but also a High Availability (HA) and Disaster Recovery (DR) strategy if
necessary.
Restore Testing
If you depend on your backups completely for your RTO, then at the very least you need to have a
restore testing strategy. There are many ways to go about this.
Ideally the restore should also include a consistency check to cover all the bases and make sure the
data is 100% recoverable.
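As a rough illustration, a scheduled restore test plus consistency check can be as simple as the following. The database, backup, and file names here are placeholders, and the logical file names are assumptions you would replace with your own:

RESTORE DATABASE SalesDB_RestoreTest
FROM DISK = N'X:\Backups\SalesDB_FULL.bak'
WITH MOVE N'SalesDB' TO N'X:\RestoreTest\SalesDB.mdf',        -- logical file names are assumptions
     MOVE N'SalesDB_log' TO N'X:\RestoreTest\SalesDB_log.ldf',
     REPLACE, STATS = 10;

-- Consistency check on the restored copy to confirm the data is actually recoverable
DBCC CHECKDB (SalesDB_RestoreTest) WITH NO_INFOMSGS, ALL_ERRORMSGS;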
Even though fewer files to restore usually means less time, that is not necessarily so, and different
file types with the same number of files to apply will also give you different times. Restore time
depends on the sizes of the backups and the number of operations performed during the timespan
covered by those files, so there is always some level of uncertainty. For this reason, I still recommend
that you test the different restore file sequences to figure out which one will be the fastest, and that
you do it on a regular schedule.
Now you might be thinking, I have an HA configuration and I don’t depend on my backups for my RTO,
I have a cluster and multiple nodes, etc. In this case you still need to be mindful of your backup-based
RTO because what happens if someone either maliciously or accidentally deletes data or drops a
table? Your Availability Group will immediately replicate this destructive change to the rest of the
cluster! And back to the backups you will have to go.
I have seen this happen countless times. For example, a DBA will have a two-node Availability Group
and set them to synchronous replication and so they think that any failover will be near instantaneous
and have extremely low RTO. Unfortunately, this is not the case at all, as the synchronous process only
guarantees remote log hardening, not actually replaying the log. If there are a lot of operations to be
replayed, you will still be waiting.
If you are using the auto-failover group capability, then you need to be aware that Microsoft will not
automatically fail over your databases at the first sign of an issue; it can take up to 1 hour of
continuous outage before the failover is triggered. If this is not acceptable for your requirements, then
you need to roll your own monitoring and do your own manual failover if needed.
One final gotcha is that Microsoft offers a 30-second database failover when invoked by the user, but
this is not backed by a formal SLA except when running the Business Critical tier of Azure SQL DB.
RPO
Recovery Point Objective is the amount of data in terms of time that your SLA allows to be lost in the
event of an outage. Unlike RTO, the RPO is usually measured in smaller units, which could be hours down to
seconds. Most DBAs will implement their RPO with a combination of log backups as well as HA/DR
technologies that continuously replicate database operations.
The biggest gotcha in understanding RPO is that it is not only about being able to recover to the last X
minutes or seconds but also about recovering to a specific point in time in the backup history that is
considered “active” by the business. Going back to the example of someone maliciously or accidentally
doing data changes, your last log backups happening at a high frequency will not protect you if the
change happened a few days ago and you already deleted those log backups to save on backup
storage.
Availability Groups
For some requirements, the RPO of log backups is not sufficient and a technology like Availability
Groups is implemented to provide an even smaller RPO. The usual configuration is to have log backups
running as well as one or more synchronous replicas of the primary database.
In such a configuration, the system takes log backups every 15 minutes and keeps one synchronous
copy of the primary. The biggest danger to your RPO here is that any type of issue stops the
synchronous replication and leaves you open for data loss. This can happen for multiple reasons but
the most common one is that temporary network issues disconnect the nodes and then the
application continues to perform changes on the primary while the backlog of changes to apply to the
secondary keeps growing. As an added issue, if the log starts to grow due to records accumulating,
your log backups will start taking longer and possibly slide over your RPO time window.
A common misconception is that in a synchronous two node cluster, if the secondary replica is
disconnected, the primary SQL Server will simply stop accepting changes (sacrificing availability for
RPO) but that is not the case. I recommend having robust Availability Group monitoring in place either
via 3rd party tools or home-grown scripts to detect these issues.
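As a starting point for a home-grown check, a query along these lines (a minimal sketch over the standard Availability Group DMVs) surfaces databases that have fallen out of sync and how much log is queued up:

SELECT ar.replica_server_name,
       DB_NAME(drs.database_id)          AS database_name,
       drs.synchronization_state_desc,   -- SYNCHRONIZED vs SYNCHRONIZING / NOT SYNCHRONIZING
       drs.synchronization_health_desc,
       drs.log_send_queue_size,          -- KB of log not yet sent to the secondary
       drs.redo_queue_size               -- KB of log not yet redone on the secondary
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id;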
Conclusion
When you are in charge of data you must take that responsibility with the level of seriousness that it
truly requires. Guaranteeing specific levels of RTO and RPO are part of this responsibility and you need
to do everything in your power to make sure you can meet them. These cases and scenarios shared
here can help you cover some gaps in your general strategy so that you are never caught off-guard.
And remember, even if you are in the managed Azure SQL Db cloud offering, you should still verify
your configuration and test your procedures to make sure your RPO and RTOs are being met. Thanks
for reading!
LEARN MORE
Warner is originally from Costa Rica, currently living in Canada with his wife, daughter and a small
bulldog called Gizmo.
Gareth was a devout Christian with a strong lifelong relationship with Jesus Christ.
He is survived by his mother Joyce; wife Kellie; sisters Kirsty and Tessa; brother
Bernard; his son Chris; daughter Lilly and his step children Joshua, James, and Kristen.
#2021 is still a challenging year for the whole world, like #2020, because of the COVID-19 pandemic.
We celebrated International Women’s Day on March 8th with the theme #ChooseToChallenge. As women,
we can all choose to challenge and call out gender bias and inequality and celebrate our achievements.
Data – Data is becoming the new fuel powering Artificial Intelligence and Digital Transformation for
most organisations. There are more roles in the Data space for women, like Data Analysts, Database
Administrators, Data Engineers and Data Scientists. Most of these roles involve understanding the
data and how it is stored, processed and analysed, and making predictions using it.
Security Engineer
• SC-200 part 1: Mitigate threats using Microsoft Defender for Endpoint - Learn | Microsoft Docs
• SC-200 part 2: Mitigate threats using Microsoft 365 Defender - Learn | Microsoft Docs
• SC-200 part 3: Mitigate threats using Azure Defender - Learn | Microsoft Docs
• SC-200 part 4: Create queries for Azure Sentinel using Kusto Query Language (KQL) - Learn | Microsoft Docs
• SC-200 part 5: Configure your Azure Sentinel environment - Learn | Microsoft Docs
• SC-200 part 6: Connect logs to Azure Sentinel - Learn | Microsoft Docs
• SC-200 part 7: Create detections and perform investigations using Azure Sentinel - Learn | Microsoft Docs
• SC-200 part 8: Perform threat hunting in Azure Sentinel - Learn | Microsoft Docs
Image Credits
www.internationalwomensday.com
www.imdb.com
LEARN MORE
Anu loves baking muffins, cupcakes and cakes in her free time. Baking is her most recent passion, now
that she finally has enough time to make full use of her oven.
Nothing in the world can prepare us to bear the loss of someone who has influenced our lives greatly.
The notion that ‘time heals the pain’ is a myth. We just learn to get familiar with the idea of their
absence. The void that losing someone creates in your life remains, while the best we can do is find a
healthy way to cope with it and channel our grief.

It is with a heavy heart that we, at SQLServerGeeks & DataPlatformGeeks, mourn the loss of Mr. Ahmad
Osama, as our deepest condolences go out to his family & friends, praying they find the strength and
courage to make it through these trying times. It is our due responsibility, towards a friend, comrade &
a fellow Geek, to take this moment to honor his journey, by remembering the things that made him the
person we have come to know, adore and admire.

Let’s take a stroll down memory lane…

He joined SQLServerGeeks back in 2012 as a Chief Technical Editor while having already acquired the
stature of a Microsoft Certified Professional, right when we were climbing our first few rungs of the
global ladder. Since then, he has made innumerable contributions, making him a part of our Core Team,
as the Vice President of SQLServerGeeks –

Career Highlights
• MCP
• MVP
• Chief Technical Editor of SQLServerGeeks
• Vice President – SQLServerGeeks
• Core Team Member, Asia's First SQL Conference – SQLServerGeeks Annual Summit 2015

Books Authored
• Professional Azure SQL Database Administration
• Professional SQL Server High Availability and Disaster Recovery

Blog Series
• Accidental DBA
• DataPlatFormLabs
Always Encrypted is the latest of several encryption features available in SQL Server and Azure
SQL Database. We’ve had column-level encryption since SQL Server 2005, which uses either
certificates or symmetric keys to keep encrypted data hidden from view. SQL Server 2008
(Enterprise Edition) added Transparent Data Encryption (TDE) to encrypt the entire database–again,
using a special database encryption key–so that without that key, the entire database (and its backups)
remains encrypted and completely inaccessible.
Although these features serve us well, they do suffer from two significant drawbacks.
First, the very certificates and keys used for encryption are themselves stored in the database (or
database server), which means that the database engine is always capable of decrypting the data.
While this may be acceptable with an on-premises data center that you manage yourself, it’s a major
problem if you want to move your data to the cloud, because by giving up physical ownership of the
database, you're also handing over the encryption keys and certificates to your cloud provider
(Microsoft, in the case of Azure), empowering them to access your private data.
Another concern is the fact that these older features only encrypt data “at rest” (on disk), relying on
other protocols (for example, SSL and TLS) to encrypt data “in flight” (across the network).
Thus, Always Encrypted effectively separates the clients who own the data from the cloud providers
who host it. Because the data is always encrypted, SQL Server (and the cloud hosting provider) cannot
decrypt it. Data can only be served up in its encrypted state, and so, data is inherently encrypted in
flight as well. Only when it arrives at the client can it be decrypted, on the client, by the client, who
possesses the necessary keys. Likewise, when inserting or updating new data, that data gets encrypted
immediately on the client, before it ever leaves the client, and remains encrypted in-flight all the way
to the database server, where SQL Server can only store it in that encrypted state–it cannot decrypt
it. This is classic GIGO (garbage in, garbage out), as far as SQL Server is concerned.
With a column encryption key (CEK) in hand, data can be encrypted and decrypted in the column(s)
protected by that CEK. Thus, CEKs must be carefully guarded. They can’t be stored in the database,
since encryption and decryption occur exclusively on the client side.
As a result, the database has all the information that the client needs to perform encryption and
decryption but is itself powerless to perform these operations on its own. That’s because the CEK
is needed for cryptography operations, but the database only has a copy of the CEK that has been
encrypted by the column master key (CMK), not the CEK itself. Furthermore, it has only a client-side
path to the CMK, not the CMK itself. Thus, Always Encrypted can be viewed as a hybrid feature that is
based on client-side encryption/decryption and driven by server-side metadata.
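That server-side metadata is visible through ordinary catalog views. As a minimal sketch, the following query shows that the database holds only the client-side key path for each CMK and an encrypted copy of each CEK, never the keys themselves:

SELECT cmk.name AS column_master_key,
       cmk.key_store_provider_name,        -- e.g. certificate store or Azure Key Vault
       cmk.key_path,                       -- client-side path to the CMK; the CMK itself is not here
       cek.name AS column_encryption_key,
       cekv.encryption_algorithm_name,
       cekv.encrypted_value                -- the CEK, stored only as encrypted by the CMK
FROM sys.column_encryption_keys AS cek
JOIN sys.column_encryption_key_values AS cekv
    ON cekv.column_encryption_key_id = cek.column_encryption_key_id
JOIN sys.column_master_keys AS cmk
    ON cmk.column_master_key_id = cekv.column_master_key_id;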
Encrypting a Table
SQL Server Management Studio (SSMS) provides tooling to generate CMKs and CEKs for Always
Encrypted. It also has a wizard that will migrate an existing (non-encrypted) table to a table with one
or more encrypted columns.
For each column, you choose a CEK and an encryption type of deterministic or randomized. You must
choose deterministic if you want to be able to query (equality only) or join on the column. This works
because the same ciphertext (encrypted data) is always generated from the same clear text.
Otherwise, you should choose randomized because it’s much more secure than deterministic. For
example, deterministically encrypting a Boolean column yields only two distinct ciphertext values,
making it easy for a hacker to distinguish true and false values. With random encryption, the same
Boolean column will appear to have many different values, but cannot be queried against.
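For reference, the table DDL that the wizard generates looks roughly like the following. This is a sketch only: it assumes a column encryption key named MyCEK already exists, and it uses an SSN column (deterministic, so it supports point lookups) and a salary column (randomized) as stand-ins for the article's examples:

CREATE TABLE dbo.Employee
(
    EmployeeId INT IDENTITY(1,1) CONSTRAINT PK_Employee PRIMARY KEY,
    SSN CHAR(11) COLLATE Latin1_General_BIN2      -- deterministic columns require a BIN2 collation
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = MyCEK,
                        ENCRYPTION_TYPE = DETERMINISTIC,
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256') NOT NULL,
    Salary DECIMAL(12, 2)
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = MyCEK,
                        ENCRYPTION_TYPE = RANDOMIZED,
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256') NOT NULL
);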
With this basic implementation, data is encrypted not just at rest and in-flight, but “in-use” as well.
That is, limited operations over encrypted data can be performed by SQL Server without requiring
decryption. Notably, only equality comparison is allowed, supporting point lookups (like the SSN
example), as well as JOIN, GROUP BY, and DISTINCT, but not much else. Furthermore, these operations
only work with deterministically encrypted columns, which are less secure than randomly encrypted
columns.
In SQL Server 2019 (and, most recently, Azure SQL Database), Always Encrypted has been greatly
enhanced to leverage secure enclaves. With secure enclaves, server-side processing of encrypted data
is fully supported, including not just equality comparisons (previously possible only with deterministic
encryption), but range queries, pattern matching (LIKE), and sorting–all over randomly encrypted data.
Furthermore, using secure enclaves, cryptography operations can be performed in place on the server.
Encrypting existing data therefore does not require round-tripping the network (like the SSMS wizard
with V1 does), which scales poorly for large amounts of data.
All this may seem like an impossible feat, given that SQL Server still has no access to the keys needed
for encryption and decryption. Yet SQL Server 2019 and Azure SQL Database make this possible by
leveraging secure enclaves in conjunction with the base Always Encrypted functionality introduced in
SQL Server 2016. So data remains protected on the one hand, while at the same time, the ability to
perform rich server-side computations over that data is preserved.
Like any memory, an enclave can contain both code and data. However, code must be signed in a
special way in order to be able to run in an enclave, and then that becomes the only code running on
the machine that can access data contained inside the same enclave.
Several technologies are available today to provide the secure isolation of an enclave. This includes
hardware-based solutions such as Intel Software Guard Extensions (SGX), which is used by Azure SQL
Database. Secure enclave isolation can also be powered by leveraging the machine’s hypervisor, such
as virtualization-based security (VBS) in Windows Server 2019 and Windows 10 v1809, which is used
by SQL Server 2019.
An attacker attempting to access an enclave can easily open a debugger, connect to the process that
contains the enclave, and find the enclave
memory. But the memory contents will not be
visible to them. For example, attempting to view
the contents of a VBS enclave reveals nothing
but question marks.
When the database engine starts, it loads an enclave. This means that SQL Server is now a hosting
process that contains an enclave, but SQL Server itself does not run in the enclave, nor can it access
the enclave’s content. Rather, the enclave acts as an extension of the client-side trust boundary on
the server machine; a trusted representative of the client within the SQL Server environment. Think
of it as a foreign embassy. The embassy is physically located inside a foreign country. Yet within the
perimeter of the embassy, only the laws of its native country apply, while the laws of the hosting
foreign country do not. At the same time, it’s just a footstep to enter or exit the embassy, compared
with the thousands of miles to travel back and forth between the countries.
The way to think of this, in terms of the Always Encrypted philosophy, is that cryptography operations
are still performed exclusively by the client, but not necessarily on the client machine. Meaning, the
enclave on the server machine in essence is the client. Critically, this means that the client and server
can communicate without round-tripping the network, because client code is running inside the
enclave as an extension of the client machine.
Enclave Attestation
But how does the client machine know that the enclave on the server machine can be trusted? How
does it know that there isn’t malicious code running inside the enclave? This supreme level of trust is
achieved by both the client and server machines negotiating through a third machine, called the
attestation server.
As the name implies, the sole purpose of this server is to attest to the authenticity of the enclave. That
is, it certifies to the client that the enclave on the server is running code that can be trusted. Only then
does the client authorize the use of the enclave.
Once attestation succeeds, the client driver establishes a secure tunnel connection to client code
running inside the enclave on the server machine. The client machine and the client code inside the
enclave on the server both exchange a shared secret over this secure tunnel. This secret is then used
to encrypt a CEK on the client machine and send it to the enclave on the server machine. Inside the
enclave–and only inside the enclave–the shared secret is used to decrypt the CEK.
When we ask SQL Server to execute a query that includes rich computations, it’s still powerless to
process those portions of the query that operate over encrypted columns. So instead, SQL Server
delegates these portions of the query over to the enclave, along with the encrypted data that needs
to be examined for that particular operation (for example, a range comparison). The query engine
injects the encrypted data into the enclave (which is effectively the same as passing it to the client but
without a network call) and asks it to perform the operation.
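To make that concrete, here is a sketch of the kind of query that becomes possible. It assumes the dbo.Employee table sketched earlier, with Salary randomly encrypted using an enclave-enabled CEK, and a client connection configured for Always Encrypted with attestation (in SSMS, Parameterization for Always Encrypted must be enabled so the parameter is encrypted client-side):

DECLARE @MinSalary DECIMAL(12, 2) = 50000;   -- encrypted by the client driver before it reaches SQL Server

SELECT EmployeeId
FROM dbo.Employee
WHERE Salary > @MinSalary;                   -- range comparison evaluated inside the enclave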
Conclusion
This article gave you an overview of Always Encrypted with secure enclaves in SQL Server 2019. And
now that this exciting feature is finally available in Azure SQL Database as well, you can leverage the
technology for greater security in both your on-premises and cloud databases.
LEARN MORE
During his free time, Lenni loves playing the piano and traveling.
Azure Database for MySQL Flexible Server – a fully managed service running the community version of MySQL

The year 2021 has started as an eventful year and continues to be a challenging time for people,
businesses and economies around the world. As our CEO Satya Nadella puts it – “We have seen
two years of digital transformation in two months”. The Azure Database for MySQL service is at
the heart of this transformation, empowering online education, video streaming services, digital
payment solutions, e-commerce platforms, gaming services, and government and healthcare websites to
support unprecedented growth, save cost and enable our customers to scale. It is immensely satisfying
to see the Azure Database for MySQL service enabling our customers to meet the growing demands for
their services during these critical times. The Azure Database for MySQL service, with the community
version of MySQL, is powering mission-critical applications and services like healthcare services for
Danish citizens, digital payment applications for Hong Kong citizens, music and video streaming
platforms for Indian, Korean and Japanese audiences, online news websites, and mobile gaming services,
including our very own Minecraft Realms.
You may have wanted more control and flexibility when it comes to configuring your MySQL servers,
which has prevented you from taking advantage of the benefits of a managed service, but hopefully,
not anymore.
We designed the new Flexible server deployment option for MySQL with these goals in mind:
• Simplify developer experiences – Make it easier for you to quickly onboard, connect, and get
started.
• Maximize Database Controls – Provide maximum control on your server configurations to
provide experiences at par with running your own MySQL deployments.
• More Cost Optimization Controls – Provide more options for you to optimize and save costs.
• Enable Zone Resilient & Aware Applications – Allow you to build highly available, zone
resilient and performant applications, with your MySQL database co-located in the same zone,
so you can tolerate zone level failures.
Let us now dive into what you can expect from the new Flexible server deployment option on Azure
Database for MySQL—as well as a bit about what your experience will be like.
As of today, the Flexible Server offering for Azure Database for MySQL is live in 14 Azure regions. You
can check our documentation for the most up-to-date information. If you would like a guided quick
start, I recommend you start here. The documentation also includes a detailed list of the commands you
can expect.
Network Isolation Control
With Flexible Server on Azure Database for MySQL, you can choose to run your server in public access
mode or secure it in private access mode.
With private access, you can deploy your flexible server into your Azure virtual network. Azure virtual
networks provide private and secure network communication. Resources in a virtual network can
communicate through private IP addresses only. A flexible server in private access mode has no public
endpoints and cannot be reached from outside the virtual network. In addition, you can create a
flexible server in a virtual network using a single Azure CLI command. The subnet should not have
any other resource deployed in it, and this subnet will be delegated
to Microsoft.DBforMySQL/flexibleServers, if not already delegated. See Networking concepts for
more details.
By default, SSL is enabled with TLS 1.2 encryption enforced, but it can be disabled by setting the
require_secure_transport server parameter to OFF from the portal.
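If you want to confirm the transport security from the client side, a quick check from any MySQL client session works; this is a small sketch and the exact cipher value will vary:

-- An empty Ssl_cipher value means the session is not encrypted
SHOW SESSION STATUS LIKE 'Ssl_cipher';

-- ON means the server rejects unencrypted connections
SHOW VARIABLES LIKE 'require_secure_transport';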
You may want to apply a patch on pre-production and test environments first, as soon as the service
releases it, and plan to roll it out to production on a later schedule. With the new Flexible Server
option for Azure Database for MySQL, you can now schedule your maintenance at a time which works best
for you. From the Maintenance blade in the Azure portal, you can specify the day of the week and a
one-hour time window which works best for you to perform server patching, which may involve restarts.
For more details, refer to the Scheduled Maintenance concepts.
On the application side, applications are typically developed in Java or PHP and migrated to run
on Azure virtual machine scale sets or Azure App Service, or are containerized to run on Azure
Kubernetes Service (AKS). With virtual machine scale sets, App Service or AKS as the underlying
infrastructure, application scaling is simplified by instantaneously provisioning new VMs and
replicating the stateless components of applications to cater to the requests, but often the database
ends up being a bottleneck as the centralized stateful component.
The read replica feature allows you to replicate data from an Azure Database for MySQL flexible server
to a read-only server. You can replicate from the source server to up to 10 replicas. Replicas are
updated asynchronously using the MySQL engine's native binary log (binlog) file position-based
replication technology. You can use a load balancer proxy solution like ProxySQL to seamlessly scale-
out your application workload to read replicas without any application refactoring cost.
See Read Replica concepts to learn more.
Build Zone resilient applications with Flexible Server
With Azure Kubernetes Service (AKS) or virtual machine scale sets, you can build and deploy zone
resilient applications that can tolerate zonal failures. With Flexible Server on Azure Database for
MySQL, you can now enable zone redundancy for your MySQL database server as well.
When you enable zone redundant high availability for your MySQL server with Flexible Server, the
service provisions a hot standby server in the secondary availability zone with synchronous replication
of data. In case of zonal failures, the MySQL database server will automatically fail over, bringing the
standby server in the secondary availability zone online, to ensure your applications and database are
highly available and fault tolerant to availability zone level failures. See high availability concepts
for more details.
Here is the latest update for MySQL Flexible Server: MySQL 8.0.21, Zone placement, and IOPs scaling are
now available in Flexible Server! – Microsoft Tech …
Getting Started
You can quickly get started by creating your first server using the quickstarts in our documentation on
docs.microsoft.com:
• Create an Azure Database for MySQL Flexible server using Azure portal
• Create an Azure Database for MySQL Flexible server using Azure CLI
• Create an Azure Database for MySQL Flexible server using ARM template
To learn more, you can read our Flexible server documentation for MySQL.
For any questions or suggestions you might have about working with Azure Database for MySQL, you
can send an email to the Ask Azure DB for MySQL team. To provide feedback or request new features, we
would appreciate it if you could make an entry via UserVoice, which helps us prioritize.
Flexible Server is available in preview on Azure Database for MySQL, with no SLAs, and hence is not
meant for production deployments yet. The Single Server deployment option continues to be our
enterprise-ready platform, supporting mission-critical applications and services, as I shared in my
last service update.
To help you compare Single server and Flexible server for Azure Database for MySQL so you can figure
out which deployment option is right for you, we’ve created a handy feature comparison matrix for
you in our documentation.
LEARN MORE
In my free time, I enjoy spending time with my 6-year-old daughter. It is a delight to watch her grow
and see how quickly she learns and adapts.
Data Platform Virtual Summit will run from Sep 13 to 18. Pre-Cons on Sep 8 & Sep 9.
Post-Cons on Sep 20 & Sep 21.
A 100% technical learning event with 150+ Breakout Sessions, 20+ Training Classes, 100+ of the World’s
Best Educators & 54 hours of conference sessions makes DPS 2021 one of the largest online learning events.
Virtual World of DPS
Last year, Data Platform Summit transitioned into a virtual event. We brought you 30+ Training Classes,
200+ Breakout Sessions, 170+ of the World’s Best Educators, 48 hours of Pre-Cons, 48 hours of Post-Cons
& 72 hours of non-stop conference sessions – DPS 2020 was the largest online learning event on
Microsoft Azure Data, Analytics & Artificial Intelligence.
Breakout Session Room
For your comfort, we cover all time zones, running continuously. The event comes to your country, your
city, your home! And now, being virtual, DPS has bigger participation from Microsoft Redmond-based
Product Teams & worldwide MVPs.
Round Tables
Quick introduction to me: I am not a mental health professional, I’m just a SQL Server DBA who
has experienced work-related issues that have led to mental health issues, and personal issues that
have led to mental health issues, both of which have affected my ability to work.
One thing I would like everyone to remember is “Mental health isn’t just mental illness – it’s part
of being human.” – Anonymous. You see, mental health is just as important as your physical health. The
stigma around mental health is staggering. If you had diabetes, you would seek treatment and take
your insulin. Yet most people with mental health issues do not seek help and refuse to take medicine
because they see it as a weakness. Seeking help is not a weakness; it is actually the opposite, one of
the strongest things you can do. Instead, people self-medicate with food, drugs, alcohol, etc.,
including myself at one point in my life. It’s hard to accept a mental health diagnosis with the stigma
that exists, but 15 years ago I was diagnosed with bipolar II, complex PTSD, and generalized anxiety. I
found these hard to accept and did not want to stick to a medication regimen or even go see the doctor;
luckily I stuck with it, and I’m more stable now than I was then. I personally think everyone could use
a therapist to help with the bumps in life, and there is no shame in that or in seeing a psychiatrist.
For IT professionals there are four things that can cause an individual to develop anxiety or
depression around their job: burnout, stress, harassment, and bullying. Burnout comes from working
on the same thing all the time and working extra-long hours; I have been known to do this and take on
extra projects outside of work and just crumble, overwhelmed, when looking at my calendar. I would
personally take the quiz at http://burnoutindex.org and gauge how burnt out you might be.
Now that we have talked about all the bad stuff, what can we do about it? One: talk more openly in the
workplace about your struggles with mental health. Let us break the stigma. First, seek any medical
help you may need to treat your mental health while you get the situation under control. See a
psychiatrist or your primary care doctor, or talk to a therapist. Stop any self-medicating you are doing
and let the professionals help. Remember, it may take several tries at different medications to find
one that works for you, so don’t give up on the doctors. Talk to your boss about anything that is
over-stressing you about your job and see what can be changed; if nothing can be, it is probably time
to find a new job. Put down your cell phone and work computer when it is outside your office hours.
Let the on-call person deal with it while you relax and work on your hobbies or other projects.
If the stress is coming from harassment or bullying, you may need to go to your human resources
department to file a formal complaint.
Other things you can do for yourself are to make sure you are eating well, exercising, and sleeping well.
These are fundamental to physical and mental health. Develop some hobbies away from the computer
(right here is the pot calling the kettle black, so feel free to follow up with me in a couple of months
to make sure I am taking my own advice). Right now, with the news and negativity, get away from
social media, or if you like Twitter, use muted words. Chrissy LeMaire has a great list on GitHub that
can get you started and save your sanity.
Finally, I will sum up with more of my story. In 2018, I switched jobs, and half the company was laid
off, which gave me two stressors. Then I got in a wreck at SQL Saturday LA in my rental car. Then I
switched therapists. Then I helped a friend through a crisis. I was travelling to two SQL Saturdays a
month. Then I started a different job because the first one scared me after they laid off half the
people. Do you see the stress adding up here? Meanwhile, I was doing nothing to take care of myself
besides taking my meds, and I was trying to contact my doctor when I went into full-blown mania from
the bipolar II. Because of this I landed in a psychiatric hospital, and it took me a year to fully
recover back to normal. Don’t be me: seek help earlier, don’t keep adding to your stress, and talk to
someone before it gets out of control. But do help me STOP THE STIGMA!
Image Credits: Unsplash
Tracy is a Database Superhero and Microsoft Data Platform MVP. She has spent
over 20 years in IT and has used SQL Server since 1999.
LEARN MORE
www.SQLMaestros.com @SQLMaestros
Learning Opportunities
SQL Day
10-12 May 2021
SQLDay is the largest conference focused
on Microsoft Data Platform – databases,
Big Data, Business Intelligence and
advanced data analysis.
Data Weekender
15 May 2021
A Virtual Popup Microsoft Data Conference.

Data Saturday - Data Toboggan - Cool Runnings
12 June 2021
Data Saturdays is a place for the data community to run small regional events with little outlay.

GroupBy
25-26 May 2021
GroupBy is free data platform training by the community, for the community.

Data Platform Virtual Summit
13-18 Sep 2021
Accelerating Data Driven Success.

Data Ceili
28 May 2021
Data Céilí is Ireland's newest data platform event.

DataMinutes
11 June 2021
DataMinutes is the fastest event in the Microsoft Data Platform space yet!

Video Channels
SQLServerGeeks
PASS
OPENJSON was introduced in SQL Server 2016 and is a table-valued function that parses JSON
formatted text and returns objects and properties in the form of key:value pairs. These pairs can
be presented as rows and columns, or as a rowset view over a JSON document.
This ability to extract the objects and parameters (or keys and values) in a rowset view opens up a lot
of potentially useful T-SQL techniques that go beyond reading JSON files or JSON formats.
Daily data wrangling and engineering will challenge you with a variety of tasks that usually end up too
complex for later maintenance or that might pose a performance issue. The OPENJSON table-valued
function has often been overlooked (as have CROSS APPLY, STRING_ESCAPE, STRING_AGG,
STRING_SPLIT, TRY_CONVERT, CUME_DIST, LAG, LEAD and FIRST_VALUE), not because people have not
heard about it, but – from what I have seen – because it is immediately associated with the JSON format
and people simply ignore it.
Here are two examples that have proven really helpful over the past years and have helped me and other
data analysts, developers and scientists many times. The first example is about selecting values and the
second one about comparing values. Both cases can be used in different scenarios and different
industries, but their simplicity can be really helpful. Both demos use the master database for
simplicity and brevity, but I would propose using your own database.
Selecting Values
Many times, you want to have a set of values (as a string with a separator) introduced into a query. You
can either hard-code the values (which I would not recommend), iterate through the list, use the
FOR XML PATH clause, create a temporary object, or use many other solutions. Since OPENJSON is a
table-valued function, you can simply use it in a JOIN to pass the parameters:
USE [master];

-- @TableID holds a JSON array of object IDs; the values below are placeholders,
-- so substitute object IDs that exist in your own database.
DECLARE @TableID NVARCHAR(MAX) = N'[245575913, 277576027, 309576141]';

SELECT *
FROM sys.objects AS o
JOIN sys.schemas AS s
    ON s.schema_id = o.schema_id
JOIN OPENJSON(@TableID) AS d
    ON o.object_id = d.value;
-- returns the same result set if an explicit SELECT value FROM subquery is used:
-- INNER JOIN (SELECT value FROM OPENJSON(@TableID)) AS d ON o.object_id = d.value
Comparing Values
OPENJSON can also be used to compare the set of values, given the same key. Imagine a JSON file
with the following keys and values:
[{
"ColA": 10,
"ColD": "2021/04/28",
"Name": "Table1"
}, {
"ColA": 20,
"ColD": "2021/04/28",
"Name": "Table2"
}, {
"ColA": 30,
"ColD": "2021/04/29",
"Name": "Table3"
}]
And your case is to find all the differences between Table1 and Table3 on any given attribute.
OPENJSON gives you the capability to pivot the data (or values) over the same key and either show all
the data or show only where there are differences or matches.
USE [Master];
SELECT
master_db.[key]
,master_db.[value] AS master_values
,model_db.[value] AS model_values
,msdb_db.[value] AS msdb_values
FROM OPENJSON ((SELECT * FROM sys.databases WHERE database_id = 1 FOR JSON AUTO,
WITHOUT_ARRAY_WRAPPER)) AS master_db
INNER JOIN OPENJSON((SELECT * FROM sys.databases WHERE database_id = 3 FOR JSON AUTO,
WITHOUT_ARRAY_WRAPPER)) AS model_db
ON master_db.[key] = model_db.[key]
INNER JOIN OPENJSON((SELECT * FROM sys.databases WHERE database_id = 4 FOR JSON AUTO,
WITHOUT_ARRAY_WRAPPER)) AS msdb_db
ON master_db.[key] = msdb_db.[key]
Since the OPENJSON function pivots the keys and values, it is much easier to filter out rows (which
are columns in the SELECT * FROM sys.databases statement) by applying a WHERE clause, or to filter
out values in the ON clause.
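The same pattern also works directly against the sample JSON document shown earlier. Here is a minimal sketch that pivots Table1 and Table3 on their keys and keeps only the attributes that differ:

DECLARE @json NVARCHAR(MAX) = N'[
 {"ColA": 10, "ColD": "2021/04/28", "Name": "Table1"},
 {"ColA": 20, "ColD": "2021/04/28", "Name": "Table2"},
 {"ColA": 30, "ColD": "2021/04/29", "Name": "Table3"}]';

SELECT t1.[key],
       t1.[value] AS Table1_value,
       t3.[value] AS Table3_value
FROM OPENJSON(JSON_QUERY(@json, '$[0]')) AS t1        -- first object in the array (Table1)
INNER JOIN OPENJSON(JSON_QUERY(@json, '$[2]')) AS t3  -- third object in the array (Table3)
    ON t1.[key] = t3.[key]
WHERE t1.[value] <> t3.[value];                       -- keep only the differing attributes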
I first saw this concept from my friend and fellow MVP, Miloš Radivojević, who introduced it in the
book “SQL Server 2016 Developer’s Guide”, and since that time I have been using it in many reports and
queries, from sales data (comparing complaints or searching out differences among customers) to
parameter sweeping for machine learning models. A truly simple, yet powerful approach.
LEARN MORE
One of the things that I have often seen developers ignore in building applications is security.
Too often they either do not understand, or do not take the time to test, how different logins and
users might interact with their code, and they often assign too many permissions. In fact, one of
the main problems in SQL Server for decades has been developers requiring dbo, or worse, sa
permissions for their code.
This is not because the application needs those permissions, but because the developers didn’t bother
to create a better security structure.
In this article, I will look at a couple of tricks that can help you ensure that you easily incorporate a
security model as you are writing your application that allows you to test in the same way a user will.
Preparation is Key
When a developer is working on an application, often they connect to SQL Server with their own
credentials. These are often sa or dbo, which gives the developer a skewed view of the security model.
SQL Server tries to be secure, so new accounts don’t have rights to anything by default.
A good technique when you start an application is to create a couple of users and roles to help you
easily test your code. These give you different views into how your application is actually working for
non-privileged users. Here is the type of script that I keep around for beginning work on any
application:
CREATE LOGIN Joe_Admin WITH PASSWORD = 'Dem012#4'
CREATE LOGIN Joe_User WITH PASSWORD = 'Dem012#4'
GO
USE MyNewApp
GO
CREATE USER Joe_Admin FOR LOGIN Joe_Admin
CREATE USER Joe_User FOR LOGIN Joe_User
GO
CREATE ROLE AppAdmin
CREATE ROLE AppUser
GO
ALTER ROLE AppAdmin ADD MEMBER Joe_Admin
ALTER ROLE AppUser ADD MEMBER Joe_User
Now, as I build objects, I will assign them permissions for the roles. For example, if I add a table and
stored procedure, I’ll use a script like this:
CREATE TABLE OrderHeader
( OrderHeaderID INT
, OrderDate DATE
, Complete BIT);
GO
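The stored procedure and the role grants that go with that table are not shown above, so here is a sketch of how the rest of such a script might look; the procedure name and the specific permissions are illustrative:

CREATE PROCEDURE dbo.GetOpenOrders
AS
    SELECT OrderHeaderID, OrderDate
    FROM dbo.OrderHeader
    WHERE Complete = 0;
GO
-- Permissions go to the roles, never to the individual users
GRANT EXECUTE ON dbo.GetOpenOrders TO AppUser;
GRANT EXECUTE ON dbo.GetOpenOrders TO AppAdmin;
GRANT SELECT, INSERT, UPDATE ON dbo.OrderHeader TO AppAdmin;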
With either of these techniques, I can keep testing the code I am writing under different contexts to
be sure that when the application is deployed, it does not require any special privileges.
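One of those techniques is switching context with impersonation in the same query window; here is a small sketch using the Joe_User account created earlier:

EXECUTE AS USER = 'Joe_User';
SELECT USER_NAME() AS current_user_context;   -- confirm we are now running as Joe_User
SELECT OrderHeaderID, OrderDate
FROM dbo.OrderHeader
WHERE Complete = 0;                           -- fails unless the AppUser role has been granted access
REVERT;                                       -- switch back to the original context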
In these situations, I would start to use roles as a developer, in addition to granting individual rights.
This helps me quickly add a new user to test something, but also starts to show other developers or
administrators how cumbersome individual rights are. This is the first step to refactoring to a better
security model. This is true whether you use SQL, Windows, or AAD authentication.
Summary
Having a known set of logins and users makes development work easier. I prefer to use standard
names on my development databases, with standard roles that simulate the way that different users
will connect in production. This ensures I am testing security under contexts other than my own.
I have shown two ways to do this: separate query windows for each login, and changing context. Of
these, I find that separate windows, often on separate monitors, allow me to keep developing in one
window without confusion. I can easily copy and paste code from my developer connection to a normal
user connection for easy testing. The context switch sometimes causes me issues, especially when I
create errors and the REVERT doesn’t execute.
It is important to test your application, both the features and the security. Many of the software bugs
and problems experienced by users come from inadequate testing. With many systems under constant
probing and attack, good development habits can help ensure we do not accidentally release code
with vulnerabilities that could cause data breaches. This also helps ensure we deploy code that our
users can execute without simple security errors.
LEARN MORE
Let’s be honest: staying up to date on Microsoft technologies is not trivial. Overall, this is a good
thing. Our team is constantly listening to your feedback, suggestions, and issues. Before you
know it (literally), we’re implementing those changes and adding capabilities to our suite of
products. I am often asked a question like, “What’s the roadmap?” Usually, I can comment on themes
and things we’ve announced, but I can’t say much more than that. However, I generally share things
that have come out in the past month, either in public preview or general availability, and a good
portion of whoever is listening learns something new. So, while overall we’re innovating quickly and
that is helping customers, we’re not making it easy for you to learn what’s new. And, with the year
we’ve had, it’s hard to have those hallway conversations with your colleague that heard about that
new thing that you’ve been wondering about for a while.
That’s where Data Exposed comes in. Data Exposed is a series that we brought back to life in 2019. I
stumbled into Channel 9 Studios at Microsoft Headquarters in Redmond, WA to chat with the
producer. I asked her if we could start a show focused on Azure Data. To my surprise (since I had no
funding or video experience), she said yes! We started recording short episodes in the ‘self-hosted’
studio. This is a small room in the back of the studio where speakers can record themselves without a
producer. It’s a cool setup. Anyways, we started by recording episodes with team members that were
willing to make the trek to Building 25 to sit in a small room and record.
Fast-forward to 2020. Everything went virtual and we were out of episodes. Channel 9 Studios started
recording episodes on Skype. Since I am not based in Redmond myself, this opened the door for me to
come back and host the show, equipped with an awesome co-producer, Marisa Brasile (she really is
the engine that keeps this train moving!). Not only did virtual recording open the doors for me, but it
opened the doors for more than 50 Program Managers, Engineers, and Microsoft MVPs from around
the globe (and counting). We could now record with anyone, anywhere!
With the lack of in-person conferences for our team members to make their announcements and
reach customers, Data Exposed became a key part in getting awareness out about new (and existing)
capabilities in Azure SQL and SQL Server. We started to expand and occasionally host episodes with
teams like Azure Data Factory, Azure Synapse, Azure Data Explorer, and more. In 2020, we released
two episodes every week (Tuesday and Thursday) and demand grew significantly.
Today, we release short episodes every Thursday, we stream every Wednesday, and we release
episodes with MVPs on the last Tuesday of every month. Data Exposed has been a lot of work but also
a lot of fun, and we hope it provides value to you and your organizations. Thank you for your support,
and if you ever have feedback, please let us know. To connect with us, you can follow our team on
Twitter @AzureSQL, and you can subscribe to our YouTube Channel at https://aka.ms/azuresqlyt.
LEARN MORE
Good Database Architecture is the Best Optimization

Solving performance problems is often seen as a reactive task. An application performs slowly, someone
complains, and a developer or administrator needs to research, find the source of the latency, and fix
it (somehow). Often, though, the solution is one that could have been implemented proactively as
part of the original release of the offending code.
These alternatives can be seen as choices we make in application development every day:
1. Do it right the first time.
2. Do it quickly the first time.
While silly, these do represent real organizational challenges, decisions, and decision-making
processes that are not silly.
I can think of a seemingly endless list of mistakes made over the years that were the direct result of
speed over precision. While not all errors can be avoided in life, there is value in preventing as many
as is reasonably possible up-front. This also has the bonus of improving our sleep schedules at those
times when bad things happen. Therefore, striking a comfortable balance between design and
architecture and technical debt is a valuable skill in software development.
Here are some examples and how they impacted real projects, software, and people. The names and
details are different, but the mistakes illustrated have been made many times by many people.
Data Retention, Who Needs It?
Creating a new table is a common task. What is not common enough, though, are the questions we
should ask ourselves when creating a new data structure.
Imagine a log table that will accept application log data on a regular basis. The table is created as an
afterthought with no additional considerations as to how it will be used in the future:
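-- Hypothetical example: names and types here are assumptions, but the shape matches the
-- story: a heap with loosely typed columns, no key, no indexes, and no retention plan.
CREATE TABLE dbo.application_log
(   log_time DATETIME,
    log_level VARCHAR(20),
    log_detail VARCHAR(MAX));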
The application begins to run for the first time and everything is great! Six months later, though,
developers complain that application logging is slow. In addition, the application database has been
growing unusually large, consuming an unexpected amount of storage and backup resources.
What was forgotten? Retention! When creating new data, determine a retention policy for it and
ensure that computing resources can handle the associated data growth over time.
A retention period for data could be a week, a month, a year, or forever, depending on how quickly it
grows and what it is used for. In the log example above, developers likely would have assigned a
retention period of 1 week (or maybe a month) to the data and cleaned up any older data during a
low-volume time.
OK, problem solved! A retention process is created that cleans up data older than a week each
evening. The cleanup process takes an exceptionally long time to process, though. So long, that it is
stopped and investigated. What else was forgotten? Indexes! The table above has no clustered or
non-clustered indexes. With each cleanup of old data that occurred, the table had to be scanned. In
addition to being slow, that scan will block other processes that try to log to the table. The following
adds a clustered primary key and a supporting non-clustered index on log_time:
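-- Continuing with the hypothetical dbo.application_log table from the sketch above:
ALTER TABLE dbo.application_log
    ADD log_id INT IDENTITY(1,1) NOT NULL;

ALTER TABLE dbo.application_log
    ADD CONSTRAINT PK_application_log PRIMARY KEY CLUSTERED (log_id);

CREATE NONCLUSTERED INDEX IX_application_log_log_time
    ON dbo.application_log (log_time);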
If this table is highly transactional, even during less busy times, then the deletions made as part of
retention could be batched. This reduces each transaction size and reduces contention with other
transactional processes running at the same time.
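A minimal sketch of such a batched cleanup, again using the hypothetical log table and a one-week retention window:

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000)                -- small batches keep each transaction short
    FROM dbo.application_log
    WHERE log_time < DATEADD(DAY, -7, GETUTCDATE());

    SET @rows = @@ROWCOUNT;
END;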
The Data Type Follies
Data structures are easy to create and hard to change. Once applications, reports, APIs, and users are
relying on a specific database schema, changing it becomes challenging. The more time that passes,
the more work is needed. Choosing the best data types on day one can save more work later and as
a bonus, help prevent bad data.
Consider the following table:
CREATE TABLE dbo.sales_transaction
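-- The remaining columns are a sketch with assumed names and types, matching the
-- story's key detail: the dates were stored as strings.
( transaction_id INT NOT NULL CONSTRAINT PK_sales_transaction PRIMARY KEY CLUSTERED,
  customer_id INT NOT NULL,               -- foreign keys omitted from this sketch
  transaction_time VARCHAR(30) NOT NULL,  -- a date kept as a string: trouble ahead
  shipping_date VARCHAR(30) NOT NULL,     -- same mistake here
  transaction_amount DECIMAL(12, 2) NOT NULL);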
After our previous lesson, I made sure to include a clustered primary key, some foreign keys, and
created an archival process for any transactions over 2 years old. Things are going great until a
developer reports errors in production. I take a closer look and discover the following row in the table:
The transaction time is on September 31st?! That is not a real date, even during the longest and busiest
of months! Storing the date as a string seemed reasonable – and made saving the data from the
application quite easy! The right choice, though, was a data type that represented a date & time.
Then, when September 31st was entered, it would throw an error, rather than create bad data.
A few days of work later, a change is deployed and the table now contains a DATETIME.
For good measure, I also change the shipping_date column to a DATETIME so similar problems cannot
happen there. As a bonus, performance improved on the table: DATE and DATETIME values from the
application were no longer being implicitly converted for comparison against VARCHAR values in the
table, allowing a non-clustered index on transaction_time to yield index seeks instead of index scans.
A month later, though, another error crops up related to the shipping date. Some investigation reveals
data that looks like this:
That is not right! A shipping date is a date. There should not be a time component to
it…but…because the data type allowed it, some code somewhere inserted one. While the fix itself was
easy – truncate the TIME portion and alter the column to be a DATE – it took some development time
and a deployment, which meant a late night working that I would have preferred spending on anything
else. The new version of the table looks like this:
CREATE TABLE dbo.sales_transaction
( transaction_id INT NOT NULL CONSTRAINT PK_sales_transaction PRIMARY KEY CLUSTERED,
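  -- Remaining columns sketched with assumed names; the key changes are the date/time types.
  customer_id INT NOT NULL,
  transaction_time DATETIME NOT NULL,     -- a real date/time type now
  shipping_date DATE NOT NULL,            -- DATE only, so no stray time component
  transaction_amount DECIMAL(12, 2) NOT NULL);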
This time, I get almost a year of peace and quiet on this table, until one day there is an application
outage when all sales stop saving their data to the database. Checking the error logs and testing
reveals the following message:
It turns out that this table had an exceptionally high volume of transactions and after a year hit its
2,147,483,647th sales transaction. When transaction_id #2,147,483,648 was inserted, the above error
was the result. No one had told me that this table would see billions of transactions! Maybe I should
have asked?
The problem was worked around by setting the application to use negative numbers as transaction
IDs. This bought some time, but another long night lay ahead of me where I had to create a new table
with a BIGINT transaction_id column, backfill it with the existing data, and then swap the tables so
that that became the active table, complete with historical data.
The lesson of this escapade is that choosing data types is a significant decision. They reflect data size,
content, and longevity. Knowing up-front how much data will be created, how it will be used, and
what it represents allows for smart data types to be chosen immediately. This helps prevent bad data
and avoids painful emergencies that my future self would prefer to avoid.
To NULL or Not to NULL, that is the Question
This story starts with a simple table:
CREATE TABLE dbo.person
(   person_id INT NOT NULL IDENTITY(1,1) CONSTRAINT PK_person PRIMARY KEY CLUSTERED,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    email_address VARCHAR(100) NOT NULL,
    date_of_birth DATE NOT NULL);
After its release, a question is received regarding what to do if a person does not provide a date of
birth. Following some discussion, it is decided that an unknown date of birth can be represented by
NULL. The change is made:
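-- Assuming the dbo.person table above; a minimal version of the change:
ALTER TABLE dbo.person
    ALTER COLUMN date_of_birth DATE NULL;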
Life goes on until one day an annoyed analyst asks you why there are so many people in the system
who are exactly 121 years old. Some head-scratching and a review of timing reveals that a dummy
default date of birth was polluting any date calculations that happened to use date of birth. In
addition to absurd ages, the system was also sending out birthday offers to all people with the dummy
date of birth, wishing them a happy birthday on January 1st.
This sequence of events illustrates how a simple problem can result in real-world awkwardness. The
easiest solution is to make the date of birth a required field at all levels of the application. This ensures
that dummy data or NULL is not needed. Alternatively, if date of birth is truly optional, then a handful
of legit solutions exist, including:
1. Make date_of_birth NULL and ensure that this is documented and handled effectively.
2. Normalize date_of_birth into a new optional table. This adds complexity and is not my
preferred solution, but is a way to normalize, avoid NULL, and ensure data quality.
As always, before making decisions based on data, filter accordingly. If a data element is optional,
then decisions made with that column need to properly omit or take into account scenarios where
data is not provided.
Where Does This Lead Us Next?
The moral of this short series of stories, code, and vague attempts at humor was to remind us that
database architecture and performance optimization go hand-in-hand. Both address the same
challenges in different ways and at different times within a software design life cycle.
Asking (and answering) good questions up-front can allow for better data architecture and remove
the need for dramatic bug-fixes and changes later on. While not all problems can be proactively
solved, a keen eye for detail can prevent many future problems ranging from inconveniences all the
way to full-scale disasters.
As this list of design questions grows, so does our experience with seeking the answers to them and
turning that information into well-architected database objects and code. We all have crazy stories
of how bad data choices led to messy clean-up operations and those tales we share over a drink may
very well be the motivation and foundation for future good database architecture decisions.
LEARN MORE
Training Classes are 8-hour, focused, deep-dive, demo-based virtual classroom trainings. Each class
will run for eight hours in total, four hours each day, for two consecutive days. Each Training Class
is designed to offer intermediate & advanced-level training on a specific topic/subject. These classes
offer more knowledge, skills, and expertise beyond the summit content.