The Definitive Guide to KQL
Using Kusto Query Language for Operations, Defending, and Threat
Hunting
Mark Morowczynski
Rod Trent
Matthew Zorich
Microsoft Press
The Definitive Guide to KQL: Using Kusto Query Language for operations,
defending, and threat hunting
ISBN-13: 978-0-13-829338-3
ISBN-10
Library of Congress Control Number:
Trademarks
Every effort has been made to make this book as complete and as accurate
as possible, but no warranty or fitness is implied. The information provided
is on an "as is" basis. The author, the publisher, and Microsoft Corporation
shall have neither liability nor responsibility to any person or entity with
respect to any loss or damages arising from the information contained in
this book or from the use of the programs accompanying it.
Special Sales
For information about buying this title in bulk quantities, or for special sales
opportunities (which may include electronic versions; custom cover
designs; and content particular to your business, training goals, marketing
focus, or branding interests), please contact our corporate sales department
at [email protected] or (800) 382-3419.
For government sales inquiries, please contact
[email protected].
Editor-in-Chief
Brett Bartow
Executive Editor
Loretta Yates
Associate Editor
Shourav Bose
Development Editor
Rick Kughen
Managing Editor
Sandra Schroeder
Project Editor
Tracey Croom
Copy Editor
Rick Kughen
Indexer
Proofreader
Technical Editor
Corissa Koopmans
Editorial Assistant
Cindy Teeters
Interior Designer
Cover Designer
Compositor
Graphics
Foreword
This book is based on the experience and expertise of Mark, Matt, and Rod,
Microsoft employees and KQL experts. They have authored this book to
help individuals master KQL and to help organizations use the technology
to improve their operational and security posture with data. Readers will
also benefit from the additional queries and content contributed by different
product managers, service engineers, and cloud solution architects who use
KQL daily.
Ann Johnson
Security, Microsoft
Dedication
For my friends and family, who I don’t get to see nearly enough,
particularly all my aunts, uncles, and cousins. And to all the defenders out
there keeping the world safe. Thank you.
–Mark
–Rod
For my family, Megan, Lachlan, and Matilda, for all your patience, love,
and support while I was writing this book.
–Matt
For my newborn son and my mother, who allowed me the time to work
with these great authors and proof queries while my son slept.
–Corissa
Acknowledgments
We would like to express our sincere gratitude to all the people who have
supported us while writing this book. Without their help and
encouragement, this book would not have been possible. This also includes
the folks at Pearson/Microsoft Press: Loretta Yates and Shourav Bose for
believing there was an audience for this and keeping us on track, and Rick
Kughen, who turned our drafts into a book you are reading!
The reach of KQL in the Microsoft ecosystem is broader and more complex
than any three people could possibly hope to cover. We would like to thank
our colleagues, who have shared their expertise and insights on various
operations and cybersecurity topics with KQL. There were so many great
suggestions we couldn’t even fit them all in the chapters, but they all made
it to the GitHub repository. We’d like to graciously acknowledge the help
and assistance from the people at Microsoft:
We’d like to thank Ann Johnson for writing the foreword and her tireless
leadership at Microsoft and in the information security industry. Security is
truly a “team sport,” and we are grateful to have you on our team.
Special thanks to Corissa Koopmans, our technical reviewer, who has been
with us from the very start, going above and beyond multiple times
throughout this book by challenging us, offering suggestions, and being
willing to run through more queries than you can even imagine. We cannot
thank you enough for your time, effort, and support throughout this entire
process. Any mistakes in the book are solely because of the authors.
We want to thank you, the reader, for your interest and KQL curiosity. Our
goal with this book was twofold. We want you to improve your
environment's security posture and operations and add KQL to your
professional skill set. When you finish this book, you’ll find that you are
actually just beginning! We also hope it will inspire you to explore further,
discovering new ways to continue improving the profession. We welcome
your feedback and comments, and if you write a great query, tell us. We
look forward to hearing from you!
Mark Morowczynski
Seattle, Washington
Rod Trent
Middletown, Ohio
Matthew Zorich
About the authors
Mark Morowczynski
Rod Trent
Rod Trent is a senior program manager at Microsoft, focused on
cybersecurity and AI. He has spoken at many conferences over the past 30-
some years and has written several books, including Must Learn KQL:
Essential Learning for the Cloud-focused Data Scientist, and thousands of
articles. He is a husband, dad, and first-time grandfather. In his spare time
(if such a thing does truly exist), you can regularly find him simultaneously
watching Six Million Dollar Man episodes and writing KQL queries. Rod
can be found on LinkedIn and X (formerly Twitter) at @rodtrent.
Matthew Zorich
Matthew Zorich was born and raised in Australia and works for the
Microsoft GHOST team, which provides threat-hunting oversight to many
areas of Microsoft. Before that, he worked for the Microsoft Detection and
Response Team (DART) and dealt with some of the most complex and
largest-scale cybersecurity compromises on the planet. Before joining
Microsoft as a full-time employee, he was a Microsoft MVP, ran a blog
focused on Microsoft Sentinel, and contributed hundreds of open-source
KQL queries to the community. He is a die-hard sports fan, especially the
NBA and cricket.
Introduction
"Attacks always get better; they never get worse" (Schneier, 2011, para. 4).
Digital transformation has hit every business in the world, large and small. If
you were born before the year 2000, look at how you book travel, order
food, and find tickets for an event today, and you will realize the methods
and technologies you use are far better than they once were. They are much
more digitized and often provided by very different vendors. The cloud has
brought this disruption to the market of ideas and innovation at a global
scale, and it has driven cloud adoption across all industries and organizations
at an unprecedented pace. Adopting the cloud is no longer a luxury or a
thought experiment. It is imperative to remain competitive and relevant as a
business, and it has fundamentally shifted the way businesses operate.
But how do organizations speed up this detection process with all this data?
The answer is moving from raw data ingestion in a traditional Security
Information and Event Management (SIEM) tool to a more automated approach
focused on actionable insights, using Security Orchestration, Automation, and
Response (SOAR) technologies and integrated toolsets. Figure 1 depicts
modern security operations capabilities.
Figure 1 Turning raw data into insights and action of a modern SOC
SOAR has a few benefits for analysts and threat hunters. First, manual work
is reduced. Instead of spending time moving between different tools and
consoles and connecting data points in different query languages, analysts
spend their time on more meaningful work: fighting the adversary. Second,
because automation happens at machine speed rather than human speed,
response times improve dramatically. Finally, analysts and hunters can keep
pace with the growing scale of the environment and with attacks that are
increasing in both scope and complexity.
This leads us to why you've picked up this book. The language you will use
to unlock these actionable insights and detect the most advanced attacks as
part of SOAR is the Kusto Query Language, better known as KQL, which is
at the heart of the Microsoft cloud for parsing data from various datasets.
You will be able to quickly search through millions of records across
multiple products to determine the scope of an attack and detect some of the
most advanced techniques. More importantly, you will be able to take action
and remediate these attacks natively in tools like Microsoft Sentinel and
Microsoft Defender.
The KQL language must become second nature for information security
professionals, just as PowerShell or Python is today. Microsoft’s latest
threat actor detections, published in blog posts and playbooks, and
community-shared detections all include KQL queries. These need to be run,
modified, and adapted for your environment to keep driving down your MTTR
(mean time to repair) in an ever-growing environment. Every second counts.
Note
We tried to make this book as accessible as possible for a broad range of
people with varying KQL expertise, from those who are using the skills taught
here for the first time to those who have been using KQL for many years. If
you are new to KQL, start with Chapter 1 and work your way forward. If
you are a seasoned KQL expert, quickly skim the first two chapters before
diving into the more advanced topics.
• Text that you type (apart from code blocks) appears in bold.
• A plus sign (+) between two key names means that you must press those
keys at the same time. For example, “Press Alt+Tab” means that you hold
down the Alt key while you press the Tab key.
GitHub Repo
The book's GitHub repository includes all the KQL queries used throughout
this book for easy copying and pasting as well as any of the sample datasets
used in the chapters: github.com/KQLMSPress/.
The download content will also be available on the book's product page at
https://MicrosoftPressStore.com/???
We’ve made every effort to ensure the accuracy of this book and its
companion content. You can access updates to this book—in the form of a
list of submitted errata and their related corrections—at:
MicrosoftPressStore.com/XXXX/errata
If you discover an error that is not already listed, please submit it to us at
the same page.
Please note that product support for Microsoft software and hardware is not
offered through the previous addresses. For help with Microsoft software or
hardware, go to support.microsoft.com.
Stay in touch
Chapter 1. Introduction and Fundamentals
• Set up the KQL environment and understand the KQL language syntax
• Do you need to hunt for a threat actor or malicious activity and determine
whether you were targeted?
You’ll use KQL to answer these questions and much, much more.
• App services
• Azure Arc
• Azure Stack
• Desktop virtualization
• Firewalls
• Key Vaults
• Storage accounts
Like many programming books, this book is partly conceptual and partly
typing class. You should type the example code with this book open on your
desk or on a second screen. Nothing quite replaces typing the commands
yourself and seeing the same output shown in one of the figures in this book.
Seeing completely different output than you expected is even better because
it spurs you to retrace your steps and see where you accidentally stepped off
the path. You’ll learn more about whatever you are doing that way.
We also have another goal beyond teaching you the KQL language. We
want to make this book as practical as possible, almost like a cookbook
filled with excellent little recipes to use in your environment. Nearly every
KQL query in this book is something that you should be running in your
environment.
• First, when learning a new skill, one of the best ways to learn it is through
repetition. Having KQL queries run in your environment will immerse you
in the language.
• Second, reading this book can help you solve problems and gain insights
into your environment. We have worked with a wide range of customers
throughout our careers and have noticed many trends that apply to nearly all
customers regardless of the size or industry. Instead of requiring you to try
and remember some concept in this book many months later and write a
query to solve the problem, we will cut to the chase and provide that query
and many more in a GitHub repository that can be found at
http://aka.ms/KQLMSPress/GitHub.
• Finally, sample data is useful, but it’s not interesting. So what if some
server whose name means nothing to you is running out of resources? When it
is your production server, however, you will care a lot about it running out
of resources.
This is your environment, and running these queries will help you gain
insights, ask more questions, and continue improving based on the results.
Stay curious and keep making those data-driven decisions. If you don’t
have a production environment to run these queries in, don’t worry; the
sample data will suit you just fine! Just remember to come back to these
queries when you do have an environment of your own.
Tip
The information security space has grown from being a side job for a
network or system administrator. Security became a part of their job
because something they managed was attacked. Today, security is a multi-
billion-dollar industry, and universities have degree programs that focus
specifically on information security. We suspect many of you picked up this
book for much of the security aspects.
This book has operations in the title, and that is not an accident.
The authors strongly believe good operational practices are good security
practices. We’ll start with a more obvious example: The operations team
applies their monthly patch schedule to the resources they manage. Failing
to do this consistently leaves an organization vulnerable to whatever flaws
those patches fixed. Do you have consistent patch coverage for your IaaS
virtual machines? Another example is having the operational rigor and
discipline to urgently apply a critical patch outside the normal process
because of an active attack. Again, good operational practices improve the
security of your organization.
Or let’s say you have an active attacker in your environment. Though they
have not succeeded in getting the secret yet, their failed attempts give your
security operations team a chance to investigate before the attacker is
successful. Attackers often leave traces of their intentions during failed
attacks, and those traces can help the security team stop them before they
succeed. Good operational practice would be to fix that misconfigured
application and make sure it’s pointing at the correct Key Vault, keeping
your logs as clean and accurate as possible and allowing future mistakes to
stick out.
If you are on the information security side of the house, make sure you
share these KQL queries with your operations team and partner up with
them. If you are more focused on operations, talk to your information
security counterparts about ensuring these fundamentals are covered.
Remember, good operations are good security.
Note
The only thing you’ll need to get started here is a browser and either a
Microsoft Entra ID account or a Microsoft account (MSA). In your browser
of choice, enter aka.ms/lademo, complete the sign-in with either account,
and you will land in the Log Analytics workspace, as shown in Figure 1-1.
FIGURE 1-1 Default view of the Log Analytics workspace
That’s it! That is all you need to do to get started with KQL. This is a Log
Analytics workspace. From here, we will run all our KQL commands
leveraging sample data.
Diagnostic Settings
If you would like to execute these KQL queries against your workload data,
you must leverage Azure Monitor to send the logs to a Log Analytics
workspace. Azure Monitor is a comprehensive monitoring solution for
collecting, storing, analyzing, visualizing, and responding to monitoring
data from your cloud and on-premises environments. A Log Analytics
workspace is used to ingest data from various sources and store it in tables.
Log Analytics is also the underlying workspace for Microsoft Sentinel and
Microsoft Defender for Cloud.
The architecture and design of using Azure Monitor and setting up Log
Analytics workspaces to support numerous Azure services is well beyond the
scope of this book. However, the key thing to know is that you configure
diagnostic settings to get data from one of these services into a Log
Analytics workspace. Typically, each service will have a diagnostic setting where you
can selectively pick log categories, metrics, and where they should be
stored. For example, we will cover the log sources available as part of
Microsoft Entra ID, as shown in Figure 1-2.
FIGURE 1-2 Microsoft Entra ID log sources and destinations
Microsoft Entra ID supports the following log categories:
• AuditLogs These Entra ID audit logs contain changes to the object state
in the directory. Examples of this would be a new license applied to a user
object, registering for Self Service Password Reset, or an updated attribute
on the user object. This category also includes changes to applications and
groups.
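To give a flavor of this category, here is a minimal sketch (the projected columns are just a convenient subset, and your audit log contents will differ) that lists the last day of directory changes from the AuditLogs table:
AuditLogs
| where TimeGenerated > ago(1d)
| project TimeGenerated, OperationName, Result, InitiatedBy, TargetResources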
As you can see, a tremendous amount of data is available to us, and this is
just one service! It’s important to consider what logs are necessary and
useful for the business.
Note
We primarily focus on Log Analytics in this book, but your data can also be
sent to additional destinations, as shown in Figure 1-2.
• Stream to an Event Hub This allows you to send your log data to a
security information and event management (SIEM) tool of your choice.
Many SIEM providers, such as Splunk, ArcSight, IBM QRadar, and Sumo
Logic, have built-in plugins to easily ingest data from an Event Hub.
• Send to partner solution This allows you to send your log data to an
independent software vendor (ISV) integrated with Azure Native ISV
Services. This is a growing list, but at the time of this writing, it includes
services such as Elastic, Datadog, Logz.io, Apache Kafka on Confluent
Cloud, and Cloud NGFW by Palo Alto Networks.
Kusto.Explorer
The authors of this book use the browser 99 percent of the time when writing
their KQL queries and working with customers.
However, we understand that might not work for everyone, and some folks
just want to use a desktop application for this. If that is you or someone on
your team, Kusto.Explorer is the application to use. Just as you can in a
browser, you can query your data, search your data across tables, visualize
your data in various graphs, and share your queries and results.
Kusto.Explorer is a Windows-only desktop application and can be
downloaded from https://aka.ms/ke. It has a similar user interface to many
office applications (see Figure 1-3).
FIGURE 1-3 Kusto Explorer application
Another client-side tool for KQL is Azure Data Studio. At the time of this
writing, Azure Data Studio supports Windows, macOS, the Red Hat, SUSE,
Ubuntu, and Debian Linux distributions, and Windows Subsystem for Linux.
This tool focuses more on connectivity to databases such as
Azure SQL, SQL Server, MySQL, PostgreSQL, and CosmosDB. However,
it does have a KQL extension.
Similar to those who will only use desktop applications, some people prefer
to do as much as possible through the command line. For that, you can use the
Azure CLI and the az monitor log-analytics query command. The same KQL
queries we will create in the browser can be passed directly to the Azure
Monitor Log Analytics commands in the Azure CLI:
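As a rough sketch (the workspace GUID is a placeholder you would replace with your own, and the exact parameters may vary by CLI version), a call might look like this:
az monitor log-analytics query -w "00000000-0000-0000-0000-000000000000" --analytics-query "SigninLogs | where TimeGenerated > ago(1h) | count" -t PT1H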
Note
We are finally ready to dive into the world of data with KQL. This section
will give you the basics you’ll need to master more difficult and complex
queries in later chapters.
What is KQL?
KQL stands for Kusto Query Language. It excels at exploring your data,
discovering patterns, and identifying anomalies and outliers. The data is
stored in tables, and within those tables are columns where the data actually
resides; this is very similar to SQL. KQL was also designed and developed to
take advantage of cloud computing through clustering and scaled-out compute,
so you can process enormous amounts of data very quickly. Queries are
read-only, and the language is case-sensitive: table names, column names,
operators, and functions must all match case. You can turn this data into
actionable insights by filtering, analyzing, and preparing it.
Tip
Now that we have our data, what do we want to do with it? Often, we are
trying to summarize the content of the query. There are many ways we can
do that, which we’ll cover throughout this book. Next, how do we want to
order the results, or do we want to order it all? Largest to smallest? Least to
most? Finally, we decide what data we want displayed in the results. The
flow will look like this:
TableName
| filtering data
| aggregating data
| ordering data
| modify column output
Let’s try running a query to see the results. Enter the following in your Log
Analytics query window. The output should be similar to Figure 1-4 but
will be slightly different based on whether you are using your own data or
the sample data. These slight differences will apply for most of the queries
in this book.
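A minimal query along these lines, filtering the SigninLogs table to the last hour of sign-ins where conditional access was not applied and counting them, looks like this (a sketch; your table and time range may vary):
SigninLogs
| where TimeGenerated > ago(1h)
| where ConditionalAccessStatus == "notApplied"
| count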
There you have it! You just wrote your first KQL query, and we’ve learned
something very valuable about the environment: In the last hour, we’ve had
20 interactive sign-ins where an Entra ID conditional access policy was not
applied! This is something we probably want to investigate.
Tip
As you write and run more queries, this pattern of filtering, aggregating,
ordering, and finally outputting data will become more and more natural.
Tip
Holding the Shift key and pressing Enter will also run the
KQL query. This is the same as clicking the Run button.
How do we know what’s available for querying in these tables? What data
is stored in the SignInLogs table? One of the most useful functions you
can run when interacting with a new data source is getschema . This will
produce a list of all the columns in the table and their data types. Give it a
try by running the following command; the output should match Figure 1-5.
SigninLogs
| getschema
FIGURE 1-5 The getschema output of the SigninLogs table
As you can see in the lower-right part of Figure 1-5, there are 76 total
columns in the SigninLogs table. We can see the column names, too,
which is how we knew to ask for ConditionalAccessStatus and
what data types are stored in them. This is very important because this will
allow us to not only filter the data but also understand how we’ll be able to
interact with it in the future.
There are 10 data types in KQL that you should be aware of. If you are
familiar with other programming languages, these will be similar to what
you already know:
• First is the basic category you will use repeatedly in KQL: string ,
bool , int , long , and real .
• Dynamic is a special data type that can take any value from the other
data types, as well as arrays and a {name = value} property bag that looks
like JSON. We’ll cover the dynamic type in more detail in Chapter 3,
“Unlocking Insights with Advanced KQL.” Dynamic objects will need to be
parsed and often cast into the correct data type.
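As a quick illustration (a throwaway sketch using the print operator rather than any real table), you can see how KQL assigns these types by piping a single row of literals into getschema:
print stringValue = "text", intValue = int(42), realValue = 1.5, whenValue = now(), intervalValue = 14d
| getschema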
There are also three kinds of statements you will make in KQL.
• Let statement These are used to set a variable name equal to an expression
or to create views. They are mostly used to break complex expressions into
multiple parts, each represented by a variable, or to set constants outside
the query body to aid readability (see the short sketch after this list).
• Set statement These are used to set query options that control how the
query is processed and how its results are returned. These are more often
used in Azure Data Explorer; we won’t cover this kind of statement in much
detail.
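Here is the short sketch mentioned above: a hedged example of a let statement that puts the lookback window in a named constant (the 7-day value is arbitrary) and reuses it in the query that follows.
let lookback = 7d;
SigninLogs
| where TimeGenerated > ago(lookback)
| where ConditionalAccessStatus == "notApplied"
| count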
Now that we’ve covered some of the fundamentals, such as how the data is
stored, what tools we can use to access it, and what the different data types
are, we can turn to what the vast majority of this book focuses on at its
core: two fundamental things, searching through the data and filtering it.
We know what you might be thinking: that cannot possibly be true. All of this
just to find and filter data? It absolutely is. This skill will be your KQL
superpower. When you are done, you will be able to write queries that return
the specific, actionable insights you and your business are looking for, not
thousands upon thousands of results. And the better you can filter, the
faster your results will be returned.
Search Operator
Now that you have some sort of data, you’d like to learn how it’s used in
your environment. How do you get started? We really need to answer two
questions:
• Does the data we are looking for even exist?
• If so, which table is it stored in?
To answer those questions, we’ll use the search operator to search for
the specific string text pattern—which is case-insensitive by default—
across multiple tables and columns. Set the time range to Last 7 Days and
run the following query. You should see an output similar to Figure 1-6.
search "deviceinfo"
FIGURE 1-6 Device info found in tables and columns
The data we are looking for does exist! The result in this example returned
42 rows. But we need to determine which table has the data we are looking
for. We’ll then add to our query to tell us which tables have the data we are
looking for. Add the following line to your query; the results should appear
like Figure 1-7:
search "deviceinfo"
|distinct $table
Note
At this point, it might be easy to think you can use KQL just like any other
search engine. You’ll pass whatever it is you are looking for to this search
operator and see the results. This will not work as expected. Let’s try
another example. Run the following query; the results should appear similar
to Figure 1-8.
search "browser"
FIGURE 1-8 Browser not found
There are two things you should notice. First, it didn’t work; the data
exists but was so large that it exceeded the limit for what could be returned.
Second, it was very inefficient, taking approximately 20 seconds to complete.
However, we can target the specific table we want to search.
Let’s try this again, but we’ll focus on the SigninLogs table this time.
The output should be similar to Figure 1-9.
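A sketch of the table-scoped search (assuming the same "browser" term and the SigninLogs table) looks like this:
search in (SigninLogs) "browser"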
More than 15,000 results were returned from the SigninLogs table in 6
seconds! We’ll need to do much more filtering to get to the results we care
about. The search operator also supports the use of the * wildcard.
A few examples
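For instance, the following are hedged illustrations of wildcard searches (the terms themselves are made up for demonstration and may return nothing in your data):
search "device*"      // terms that start with device
search "*info"        // terms that end with info
search "*viceinf*"    // terms that contain viceinf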
We’ve found the data we are looking for exists in a table. If you are just
getting started with KQL or this is a table you are not familiar with, it can
feel a bit daunting to work with it. How do we know what else is in the
table? What columns are normally populated? You might also be writing a
complex query and need a way to check a sample of the results before pulling
everything. The take operator and the limit operator will return the
number of rows you specify without guaranteeing which records are returned.
The take and limit operators are functionally the same; there is no
difference, and you can use whichever one you prefer. We will spend much of
our time in the SigninLogs table, so let’s get familiar with the data we
previously saw when running the getschema command. Run the following
query; the results should be similar to Figure 1-10:
SigninLogs
| take 5
FIGURE 1-10 The take command with one record expanded
Feel free to expand the row and look through the different data we have
available to us. There are two things to note with the take and limit
commands:
• They do not guarantee consistency in the results. You can rerun the same
query and likely see different results returned, even if the data set didn’t
change.
These commands are an excellent way to quickly return and browse data
interactively to confirm that we found the data we were looking for. We can
then move on to the next step: filtering with the where operator.
Where Operator
Now, we are ready to move on to how we will filter down the vast sets of
data we’ve found to pull out the things we care about. To do that, we will
use the where operator. This operator will be in nearly all of your queries.
The where operator allows you to filter a table to the subset of rows that
satisfy your expression. You’ll sometimes see this referred to as a predicate
in the documentation.
You will compare three main types of data: strings, numeric/dates, and what
we’ll call “is empty.” In this section, we will focus on the string
comparison. The next section is focused on time operators, so we’ll cover
those comparisons there. We’ll then cover “is empty” in the “Dealing with
Nulls” section later in this chapter. Let’s look at a very simple example in
which we want to look for all the users who have failed to validate their
credentials in the SigninLogs table. Enter the following query; the
results should be similar to Figure 1-11:
SigninLogs
| where ResultType == 50126
FIGURE 1-11 Results that match code 50126
The first operator we will use is the equals operator, which performs a
comparison and is designated by == . This is similar to many other
programming languages you might be familiar with. Another thing to notice is we
returned a week’s worth of sign-in errors—77 results in this sample—in
under two seconds! You can see how long that data query took in the lower-
left corner.
Note
You can find all the various error codes for Microsoft
Entra ID at
https://aka.ms/KQLMSPress/EntraIDErrorCodes.
What if we wanted to see all the sign-in events that were not failures? We
can achieve that by changing the query to look for everything that was not
equal to this result code. Update your query to the following; your results
should be similar to what is displayed in Figure 1-12:
SigninLogs
| where ResultType != 50126
FIGURE 1-12 Results that do not match the code 50126
The query results aren’t as useful without additional filtering being applied,
but the concept is extremely useful. This allows us to exclude that value
from the column and return the rest.
Tip
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType !in ("0", "50125", "50140")
SigninLogs
| where ConditionalAccessStatus == "Success"
FIGURE 1-13 No results returned for ConditionalAccessStatus ==
“Success”
No results were returned. When you run a query and see no results returned,
you need to ask yourself if that was expected or if your query has a mistake.
In this case, is this tenant not using any conditional access policies, or did
we make a mistake in our evaluation? The answer is we made a mistake,
and it’s an easy one to make. Remember, KQL is case-sensitive. In our
query above, "Success" is not the same as "success." We can tell KQL to make
the comparison case-insensitive. Update your command as shown below; the
results should be similar to Figure 1-14.
SigninLogs
| where ConditionalAccessStatus =~ "Success"
FIGURE 1-14 Sign-ins where conditional access was applied
That looks a lot better. To make sure we are following Zero Trust
principles, such as verifying explicitly, we should make sure a conditional
access policy is in scope for every request.
Note
We can also update the query to show all the conditional access statuses that
were not successful. Update your query with the code below and run it.
Your result should be similar to Figure 1-15:
SigninLogs
| where ConditionalAccessStatus !~ "Success"
FIGURE 1-15 Sign-ins where conditional access was not applied
We can now quickly see where conditional access policies are not
successfully applied in our environment. With 11,425 sign-ins returned, we
have a lot of room for improvement. More investigation is needed to
understand why, but we are getting closer to insights and driving
improvements in our environment.
Note
What do you do if you don’t know the exact string you are trying to find?
There are several operators we can use depending on our scenario. We’ll
start with one you will probably use the most: has .
Using the has operator is much more performant than using something
like the contains operator, as you’ll see shortly, because KQL
automatically indexes whole terms. Run the following query; your output
should be similar to Figure 1-16.
SigninLogs
| where UserAgent has "Edge"
SigninLogs
| where UserAgent !has "Edge"
FIGURE 1-17 Sign-ins where the Microsoft Edge browser was not used in
the UserAgent value
This would show all the sign-ins where a browser other than “Edge” was
used. Those with a keen eye will have noticed two things:
• Second, in the previous section, we also said that you should use a case-
sensitive search for best performance in your queries. How do we do that
with this operator?
SigninLogs
| where UserAgent has_cs "Edge"
FIGURE 1-18 A case-sensitive search for Edge sign-ins
The same records were returned from our previous query and will be
slightly faster, especially if we have a lot of data in our sign-in logs. Our
previous example of looking for the opposite will also work with the
!has_cs operator. Run the following query. Once again, you should see
similar results as you did previously, but they should return quicker. Your
output should be similar to Figure 1-19.
SigninLogs
| where UserAgent !has_cs "Edge"
FIGURE 1-19 Case-sensitive search for sign-ins that did not use Edge
Tip
DeviceRegistryEvents
| where RegistryKey has @"Exclusions\Paths" and ActionType in ("RegistryValueDeleted","RegistryKeyDeleted","RegistryValueSet")
and Timestamp > ago(24h)
| sort by Timestamp
DeviceRegistryEvents
| where RegistryKey has @"Exclusions\Processes" and ActionType in ("RegistryValueDeleted","RegistryKeyDeleted","RegistryValueSet")
and Timestamp > ago(24h)
| sort by Timestamp
What if we have a string we want to search for that’s not a full string term,
is part of a substring, or is less than three characters like ID , which would
not have a term index created for it? This is a great place to use the
contains operator, which is also case-insensitive by default. It will scan
the columns, looking for that substring to match. Run the following query;
the results should be similar to Figure 1-20:
SigninLogs
| where UserAgent contains "HroM"
FIGURE 1-20 Sign-ins where the browser UserAgent value matched the
HroM substring
If you expand any of the results, you will see “Chrome” listed and that we
matched substrings that were not case-sensitive.
Tip
The following query helps monitor for all secret
operations over the last 24 hours. Frequent secret
operations could indicate an adversary trying to steal
sensitive information. These should be investigated and
understood. –Laura Hutchcroft, senior service engineer
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where ResourceProvider == "MICROSOFT.KEYVAULT"
| where Category == "AuditEvent"
| where OperationName contains "secret"
Much like the has operator, we can do a case-sensitive string match with
contains_cs as well as the opposite search for case-insensitive
!contains and case-sensitive !contains_cs .
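For example, a case-sensitive substring match (a sketch reusing the UserAgent column from the earlier examples) would look like this:
SigninLogs
| where UserAgent contains_cs "Chrome"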
Note
SigninLogs
| where AppDisplayName has_any ("Azure Portal", "Graph Explorer")
FIGURE 1-21 Sign-ins where the application is either “Azure Portal” or
“Graph Explorer”
If you expand any result, you’ll see sign-ins where the application was
Azure Portal or Graph Explorer because both matched. The has_any
operator looks for each set of terms together. In this example, has_any
looks for Azure and Portal together, or Graph and Explorer
together. If either pair of terms is found, it will return that row.
Another way to search for strings is the in operator. By default, the in
operator is case-sensitive and returns full-string matches. You can also
search for non-matches using the !in operator (case-sensitive). If you
need to search for the exact string match but do not want it to be case-
sensitive, you can use the in~ operator. Finally, if you want to do a case-
insensitive search for non-matching strings, you can use the !in~
operator.
This operator will only return full strings that match. Notice we say strings
here, not terms. This is also important and will make more sense shortly.
For now, run the following query; the results should be similar to Figure 1-
22:
SigninLogs
| where AppDisplayName in ("Azure Portal", "Graph Explorer")
FIGURE 1-22 Sign-ins where the application string is either “Azure
Portal” or “Graph Explorer”
SigninLogs
| where AppDisplayName in ("Azure", "Graph Explorer")
FIGURE 1-23 Sign-ins where the application is Graph Explorer
If you expand any of the results, you will find this query only returned the
Graph Explorer application because that string was a complete string
match; in this example, 25 records were found. Because no application is
named exactly "Azure", that value matched nothing. Let’s try this again, but
this time, change the query to use has_any . The results should be similar
to Figure 1-24:
SigninLogs
| where AppDisplayName has_any ("Azure", "Graph Explorer")
FIGURE 1-24 Sign-ins where the application is Graph Explorer or
anything containing the word Azure
Which is the correct one to use? The answer depends on the results you
want. You just need to be aware of this behavior when using the in
operator.
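If you want the exact-match behavior of in without the case sensitivity, a query along these lines (a sketch reusing the same application names in lowercase) will still match:
SigninLogs
| where AppDisplayName in~ ("azure portal", "graph explorer")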
Another common scenario you might need to look for is if a string starts
with or ends with a specific set of characters. A good example here would
be looking for a specific domain in a UPN or a set of characters in a
username to indicate the account type, such as an administrator or service
account. We will use two primary operators: startswith/endswith
and hasprefix/hassuffix . The main difference here is that
startswith/endswith will look at the beginning or end of a string,
and hasprefix/hassuffix will look at the beginning or end of a
term.
SigninLogs
| where AppDisplayName startswith ("Gra")
FIGURE 1-25 Sign-ins where the application string starts with Gra
If you expand any of the results, this should return the Graph Explorer
application. The startswith operator would not be able to match the
Exp string because it is not the start of the string. Try running the
following query to confirm the behavior; the results should match Figure 1-
26.
SigninLogs
| where AppDisplayName startswith ("Exp")
FIGURE 1-26 No results were found for sign-ins where the application
string starts with Exp
SigninLogs
| where AppDisplayName hasprefix ("Exp")
FIGURE 1-27 Sign-ins where a term in the application name starts with Exp
The behavior of endswith and hassuffix is the same. Use these if
you are trying to find a specific value at the end of a string or term.
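For instance, to find accounts in a particular domain (a sketch; contoso.com is a placeholder domain, and UserPrincipalName is the column holding the UPN), you could use endswith:
SigninLogs
| where UserPrincipalName endswith "@contoso.com"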
Important
In all our queries so far, we’ve been focusing on filtering some specific data
that meets our criteria as an input. Then, we return all the data that meets
that query, including many columns of that table. As we’ve seen for the
Microsoft Entra ID sign-in logs, this includes many columns—some we
probably don’t care to see each time, such as the TenantID .
In this section, we focus on filtering the output. Usually, we are looking for
a few specific things and want to see only those columns. Perhaps a column’s
name isn’t clear in the report we are trying to make for our organization,
and we want to rename it. We can accomplish both of those things with the
project operator.
SigninLogs
| where ConditionalAccessStatus == "success"
| project AppDisplayName, Location, UserAgent
FIGURE 1-28 Only the columns we projected show up in the Results panel
The only columns shown in the output are the ones we specified with the
project operator. By default, the name of the column being “projected”
or displayed will be the same name shown in the table. For example, when
we project the AppDisplayName column, the column name is
AppDisplayName . Typically, the column order will also be shown in
the same order found in the table. We can modify both of those using the
project-rename and project-reorder operators. First, let’s
rename the AppDisplayName column to Application . To do so,
you specify the new name of the column followed by the existing column
name, NewName = ExistingName . Run the following query; your
output should be similar to Figure 1-29:
SigninLogs
| where ConditionalAccessStatus == "success"
| project AppDisplayName, Location, UserAgent
| project-rename Application = AppDisplayName
FIGURE 1-29 The AppDisplayName column renamed as Application
The data remained the same, but we changed the column name to
something more suitable for this environment. We can also force the
specific order of the columns using the project-reorder operator. To
do so, we specify the output order we want the columns to be. Run the
following query; your output should be similar to Figure 1-30.
SigninLogs
| where ConditionalAccessStatus == "success"
| project AppDisplayName, Location, UserAgent
| project-rename Application = AppDisplayName
| project-reorder Location, Application, UserAgent
FIGURE 1-30 Column order updated
The order of the columns changed based on what we specified. You can also
combine these operators to filter the output based on your needs and rename
the AppDisplayName column. We’ll come back to this query later; for now,
we need to highlight the project-away operator.
Let’s rerun the initial query to see the default output. By default, the
columns will be TimeGenerated , ResourceId ,
OperationName , OperationVersion , Category , and
ResultType . It should look similar to Figure 1-31, though you might
have to resize your window to make sure you see those columns.
SigninLogs
| where ConditionalAccessStatus == "success"
FIGURE 1-31 Default columns for the sign-in log table
Now, let’s remove the second through fifth columns using project-
away . Run the following query; your results should look similar to Figure
1-32:
SigninLogs
| where ConditionalAccessStatus == "success"
| project-away ResourceId, OperationName, OperationVersion, Category
FIGURE 1-32 Removing selected columns
Those columns have now been removed from the output; all other columns
remain as before. Using these various project operators should give you the
flexibility you need to filter the output to only what you are looking for.
We’ve been focusing on formatting the output of what’s in the table. What
if we needed to do some sort of calculation or insert some additional data
into the output? For that, you will need the extend operator. This
operator’s input can be additional columns, built-in functions, or even a
string. Run the following query; the results should look similar to Figure
1-33.
SigninLogs
| where ConditionalAccessStatus == "success"
| project TimeGenerated, CreatedDateTime
| extend Duration = TimeGenerated - CreatedDateTime
FIGURE 1-33 Duration column created based on other column data
We created a new column called Duration , which is the calculated
result of two other columns, TimeGenerated and
CreatedDateTime . This calculation itself is not very interesting, but
this ability is extremely valuable and will be used repeatedly.
SigninLogs
| where ConditionalAccessStatus == "success"
| project ResultDescription
| extend strlen(ResultDescription)
FIGURE 1-34 Outputting the results of a function call to a new column
This query itself isn’t very useful, but the fact that we can call a KQL
function to perform some action on a column and output the results is another
useful tool we’ll use again. Also, the new column name will be automatically
generated for you if you don’t specify one. Our final example using
extend will be to add our own data to the query. Run the following
query; your output should be similar to Figure 1-35.
SigninLogs
| where ConditionalAccessStatus == "success"
| extend Calculation = 1906 * 1917 * 2005
| extend MadeUpdata = "Winners!"
FIGURE 1-35 Outputting columns of your own data
Again, this isn’t the most useful query, but it demonstrates two things.
First, we can add our own data to the output and, as shown above, create a
calculated column based on it. Second, we can also include numerical
operators in our queries, which will be covered in the next section.
Tip
AzureActivity
| where CategoryValue == "Administrative"
| where OperationNameValue == "MICROSOFT.STORAGE/STORAGEACCOUNTS/LISTKEYS/ACTION"
| where ResourceGroup contains "CLOUD-SHELL-STORAGE"
| extend storageaccname = tostring(parse_json(Properties).resource)
| project OperationNameValue, Caller, CallerIpAddress, storageaccname
Data Manipulation
So far, we’ve found and filtered the data we are looking for in the table and
filtered the data we want to see in the output. Then, we did some very basic
creating of new columns based on existing data and even showed how to
add our own data. In this section, we’ll take this even further by
manipulating different aspects of the data.
First, we’ll start with manipulating our filtered output with sorting. Then,
we’ll look at manipulating string values by splitting and trimming them.
Finally, we’ll touch on the extremely powerful parse function that will
save you a lot of time and headaches when trying to identify well-known
data patterns.
Let’s return to one of the earlier queries in this chapter, where we looked for
conditional access to Microsoft Entra ID. However, instead of looking at
where it was applied successfully, let’s look at sign-ins where it wasn’t
applied. Run the following query, and the output should look similar to
Figure 1-36:
SigninLogs
| where ConditionalAccessStatus == "notApplied"
| project AppDisplayName, Location, UserAgent
FIGURE 1-36 Conditional access has not been applied to the filtered
columns.
This is useful, but we could do a few things to make this even more useful.
Notice how we have multiple locations, but they don’t seem to be in any
order. This is where the sort/order operator can be leveraged. The
actual operator is called sort . However, there’s also the order alias, so
you can use either one. In your query, specify which column you want to
sort by. You can sort in either ascending or descending order; the default
is descending unless you specify otherwise. Run the following
query; your output should resemble Figure 1-37:
SigninLogs
| where ConditionalAccessStatus == "notApplied"
| project AppDisplayName, Location, UserAgent
| sort by AppDisplayName
FIGURE 1-37 Sorting in descending order by the ApplicationDisplayName
column
If you add desc after AppDisplayName , you will get the same results
because the default sort order is descending. Another useful feature is the
ability to sort by additional columns, each of which can be either ascending
or descending, regardless of how the other columns are sorted. Let’s try it!
Run the following query; your output should be similar to Figure 1-38:
SigninLogs
| where ConditionalAccessStatus == "notApplied"
| project AppDisplayName, Location, UserAgent
| sort by AppDisplayName desc, Location asc
FIGURE 1-38 Sorting two columns in different orders
The AppDisplayName column is sorted in descending order; the
Location column is sorted in ascending order. We could have left off
the desc because it is the default, but including it clarifies what is
happening. The first application, contosodemos-azgovviz , is at the end
of the list because we are sorting in descending order. Its countries are
then sorted in ascending order, which in this case is Germany (DE) and then
India (IN). We then move to the next app, console-m365d, and do it all over
again, sorting its locations in ascending order: the United Kingdom (GB) and
then the United States (US). This is why the United Kingdom (GB) appears
after India (IN) in the Location column.
Tip
AuditLogs
| where LoggedByService == "Access Reviews"
| where OperationName == "Create request"
| order by TimeGenerated asc
AuditLogs
| where LoggedByService == "Access Reviews"
| where OperationName == "Request approved" or Op
| order by TimeGenerated asc
AuditLogs
| where LoggedByService == "Access Reviews"
| where OperationName == "Request expired"
| order by TimeGenerated asc
This type of sorting works fine if the column has data in it. However, you’ll
often have a column with an empty or null value, which might make your
sort output not appear the way you had in mind. If you sort in ascending
order, the nulls will show up first; if you sort in descending order,
the nulls will show up last.
You can deal with this in two ways, and the difference is based on the data
type. String values cannot be null, so for strings, we will use either
isempty or isnotempty ; for other data types, we will use isnull
or isnotnull . Run the following query; your output should be similar to
the output in Figure 1-39.
SigninLogs
| where ConditionalAccessStatus == "notApplied" and isnotempty(UserAgent)
| project AppDisplayName, Location, UserAgent
| sort by UserAgent
FIGURE 1-39 Removing empty values from the UserAgent column
If you scroll to the very bottom, you can also confirm there are no blank
UserAgent values to be found. Be mindful when doing this, however.
Just because a value is blank doesn’t mean it’s useless; the absence of
data often indicates a problem, and the companion isempty function will
find those blank values. The point is, don’t automatically filter out empty
values unless that’s really what you want to do.
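Conversely, if you want to surface only the rows where a value is blank (a sketch; the projected columns are just examples), isempty does the job:
SigninLogs
| where isempty(UserAgent)
| project TimeGenerated, UserPrincipalName, AppDisplayName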
Top Operator
SigninLogs
| where ConditionalAccessStatus == "notApplied"
| top 100 by TimeGenerated desc
| project TimeGenerated, AppDisplayName, Location
| sort by Location
FIGURE 1-40 The most recent 100 results
Notice in the lower-right corner of Figure 1-40 that only 100 results were
returned, and they are the most recent ones, sorted by TimeGenerated in
descending order. We can use top on any column and return any number of
sorted records we want to see. The nulls first and nulls last
sorting parameters apply here as well.
SigninLogs
| where ConditionalAccessStatus == "notApplied"
| project ResourceId
| extend ResourceIDSubString = split(ResourceId, "/")
FIGURE 1-41 Splitting based on a specific character
Our original string is on the left, and our new split string is on the right.
If we want to access a specific substring value, we reference the array
element by its index. In this string, if we wanted to pull out the
TenantID , we would access the third element in the array. Arrays start
counting at 0, and because the string begins with the character we are
splitting on (the forward slash, / ), the element at index 0 is blank since
there is nothing before it. Run the following query; the results should be
similar to Figure 1-42:
SigninLogs
| where ConditionalAccessStatus == "notApplied"
| project ResourceId
| extend ResourceIDSubString = split(ResourceId, "/")[2]
FIGURE 1-42 Displaying a specific substring value in the array
This specific example isn’t that useful because we can get the Tenant ID from
another column, but the ability to split strings on a common delimiter can be
very useful. If our needs call for slightly more complex splitting and
filtering of our string, we can use the trim , trim_start , and
trim_end functions. These work in a similar manner to split: you pass
the regular expression you want to match, followed by the source string you
are matching against. The result is the trimmed string with the matching part
removed. Trim looks at both leading and trailing matches,
trim_start looks only at leading matches, and trim_end looks only
at trailing matches.
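As a hedged illustration (a throwaway print statement rather than real log data), trimming leading and trailing dashes from a string looks like this:
print trimmed = trim(@"-+", "--user01--")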
Note
KQL supports the RE2 syntax library. For specifics, see the
documentation found at
https://aka.ms/KQLMSPress/RE2Syntax.
Parse Functions
You might have some very gnarly strings you want to break apart at this
point. The UserAgent field is a good example; the strings look like
these examples:
• Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.35
Could you write your own regular expression to pull that apart? I’m sure
you can. Should you? No! KQL has several functions that understand how
to parse through common data formats. We’ll talk more about these in
Chapter 3, “Operational Excellence with KQL,” but just know there are pre-
built functions that can parse the following formats:
Tip
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(30d)
| extend operatingSystem = parse_json(DeviceDetail).operatingSystem
| where operatingSystem == "MacOs" or operatingSystem == "MacOS"
| where UserAgent startswith "Mac%20SSO%20Extension"
| summarize count() by UserPrincipalName, tostring(operatingSystem)
| sort by count_
Numerical Operators
• Equals The == operator is used to compare number pairs. (We used this
with strings earlier.)
• Not equals The != operator is used to compare number pairs. (We used
this with strings earlier.)
• Not equals to one of the elements The !in operator is used to compare
if a number is not equal to one of the values. (We used this earlier with
strings.)
int , long , and real are numerical types we can use with these
operators. As discussed earlier, it’s important to understand the data type
because it can impact the results. If one of the numbers is a real type, the
resulting type will be real . If both numbers are int type, the resulting
type will also be int .
This might not be what you want when it comes to a division operation. For
example, an integer of 1 divided by an integer of 2 will result in a value of
0, not the .5 you probably expect. You may have to convert one of the values
to a different data type using tolong , toint , or toreal . For
this integer division, you can cast one operand as a real number using
real(1)/2 to get the result you are expecting: .5 .
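A quick way to see this behavior for yourself (a standalone print sketch, no table required) is:
print intDivision = 1 / 2, realDivision = real(1) / 2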
Time Operators
Earlier in this chapter when we discussed the different data types, we listed
out two as datetime and timespan . These are some of the most
important data types in KQL—for a few reasons.
• First, KQL is highly optimized for time filters. You will see that most real-
world KQL queries include time-based filters like ago or now .
• Second, these data types are a great way to greatly reduce the data set we
are looking for before doing initial filtering. If we are only looking for
something a user did in the last 14 days, why are we returning more data?
As discussed earlier in this chapter, you should always filter by time first
and then apply the rest of the where filters.
• The datetime data type represents an instant in time. The value is
always in UTC. The time ranges can be from 00:00:00 (midnight), January
1, 0001 AD through 11:59:59 PM, December 31, 9999 AD. The values
themselves are measured in 100 nanosecond units called ticks. Similar to
UNIX epoch time, the number of total ticks will be from January 1, 0001
AD 00:00:00 start time.
• Most of the time, you will not need to get to this granular level, and there
will be a few operators you use repeatedly to narrow down your scope of
data before applying additional filters. Another thing that sometimes
confuses newcomers is looking at time in the UTC format. This greatly
simplifies things when you have a team outside a specific time zone and is
the standard for logging. However, there is a way to convert it to your local
time.
• The timespan data type represents a time interval. This can be days, hours,
minutes, seconds, and even down to the nanosecond tick. We’ll see how
these time-based data types can be used in our KQL queries.
The two most common time operators you will use are ago and
between .
• The ago operator starts at the current point in time of UTC and then
looks backward by the amount of the interval you set.
• The between operator lets you filter the data values matching that
inclusive range.
The between operator can also be used for a numeric value like int ,
long , or real . But in this case, we will use it for time data types,
which can either be a specific date using the datetime data type or a
timespan range like minutes, hours, or days. Finally, the now operator
returns the current UTC time and is very useful for determining how much time
has passed between an event and when you run the current KQL query.
Let’s start with the ago operator. Run the following query; you’ll notice
the output will look similar to Figure 1-43:
SigninLogs
| where TimeGenerated > ago(7d)
FIGURE 1-43 Displaying results that took place from seven days ago until
now
This should look a lot like the other queries we’ve run so far in this chapter.
There are two important things, however:
• First, next to the Run button, the Time Range value has changed from our
normal preset dates of Last 4 Hours, 1 Day, and 7 Days to Set In Query.
• Second, we are not limited to just days or hours. The range can be set to
days, hours, minutes, seconds, a tenth of a second, a millisecond, a
microsecond, and even down to a tick!
Let’s try this again but with a different time interval. Run the following
query; you should see results similar to Figure 1-44:
SigninLogs
| where TimeGenerated > ago(15m)
FIGURE 1-44 Displaying results that took place from 15 minutes ago until
now
The only data returned should be events that occurred in the last 15
minutes. Your dates and times will differ from those shown here because you
are running the command at a different time than when this was written. To
specify the other time values, use
• d for days
• h for hours
• m for minutes
• s for seconds
• ms for milliseconds,
The ago operator works well when the starting point is right now and you want to look backward from there. What if you want to look at a specific range of dates? For that, you will need the between operator, which takes two boundary values and includes everything between them.
The between operator uses the datetime data type. This can be
passed in the ISO 8601, RFC 822, or RFC 850 formats. ISO 8601 is
strongly recommended because it’s also the default in KQL.
Run the following query using today’s date, subtracting 5 from the first
input and 3 from the second. Your query should be similar to the one below,
and your data should be aligned to your dates, similar to Figure 1-45:
SigninLogs
| where TimeGenerated between (datetime(2023-09-23) .. datetime(2023-09-25))
FIGURE 1-45 Displaying results that took place between two days
In this example, we just did the specific days, but you can get down to
milliseconds if you want. This method works well for a specific date range.
How would you handle a rolling date range? Let’s say you want to look at
data that occurred every day between 14 and 7 days ago. You could achieve
this in two different ways. The first way would be to update the date with
the new date every time. This is not ideal. The better way would be to
combine the between and ago operators. Run the following query; you
should get a similar output to Figure 1-46:
SigninLogs
| where TimeGenerated between (ago(14d) .. ago(7d))
FIGURE 1-46 Displaying results that took place between 14 and 7 days
from now
All of the time values for days, hours, minutes, and seconds can be used
here, as can the ago function.
The now function is similar to ago . It simply returns the current UTC time at which the query was run. If a query statement contains multiple now functions, they all return the same value for that single query statement, even though they are evaluated at slightly different moments while the query completes.
We can do a simple example of using the now function and determine how
much time has passed since the log was created and when the current query
was run. Run the following query; your results should be similar to Figure
1-47:
SigninLogs
| where TimeGenerated between (ago(14d) .. ago(7d))
| extend HowLongAgo = ( now() - TimeGenerated)
FIGURE 1-47 Determining how long ago a record was created from when
the query is being run
This is just a simple example, but you can see how we can use the now
function to determine how long ago something occurred in a new column,
HowLongAgo .
Much like strings, a time will come (see what we did there) when we need to present the date and time in a format different from the default, which is UTC in the ISO 8601 format. There are several reasons we might need to do this:
• Perhaps we are presenting to our leadership team about an event, and it’s
easier for them to understand the timeline when an event occurs in the local
time zone.
• We might be ingesting data from a different log source, and it’s coming in
with a different time format than the rest of our logs in the Log Analytics
workspace.
If you are new to looking at logs in UTC, formatting dates and times like this might seem odd at first. Trust us; this standard format is extremely beneficial when correlating across multiple systems and devices in multiple time zones. The core contributors of this book are in four different time zones, and just scheduling meetings is a headache; that’s before you even start looking at enormous amounts of data across thousands and thousands of systems. However, we understand it can be helpful to see UTC values converted to the local time of the person consuming the data. We’ll use the datetime_utc_to_local function to convert a UTC value to whatever time zone you want. Run the following query; your output should be similar
to Figure 1-48:
SigninLogs
| extend LocalTimeInTokyo = datetime_utc_to_local(TimeGenerated, 'Asia/Tokyo')
FIGURE 1-48 Displaying the UTC in the Tokyo time zone
There are more than 590 time zone options you can choose from. If you
happen to be in Tokyo reading this, you can change the time to US/Pacific
to see a large time difference.
Note
To see the list of all 590 supported time zones, please see the documentation at https://aka.ms/KQLMSPress/Timezones.
Note
To see the time formatting options, see the documentation found at https://aka.ms/KQLMSPress/Format_Timespan/.
Our final time-based example will be with the intervals themselves. You
might find the need to split a datetime instance into its various
components: year, month, day, and so on. You could probably write
something to try and parse this, but the datetime_part function will
do all this for you. You specify the date part you want to extract, followed
by the datetime instance.
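The original query is not reproduced here, but based on the figure that follows, it likely resembled this minimal sketch, which extracts the week-of-year component from the current time:
print WeekOfYear = datetime_part("week_of_year", now())
Other supported parts include year, quarter, month, day, hour, minute, and second.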
FIGURE 1-49 Displaying the week number of the year at the time the query was run
We will cover the print operator in the “Miscellaneous Fundamentals” section later in this chapter, but it outputs a single row based on an expression. If we want to find out which day of the week a date falls on, which week of the year it belongs to, or which month it is in, we have three functions we can call: dayofweek , week_of_year , and monthofyear .
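A minimal sketch of those three functions, evaluated against the current time:
print DayOfWeek = dayofweek(now()),
      WeekOfYear = week_of_year(now()),
      MonthOfYear = monthofyear(now())
Note that dayofweek returns a timespan measured in days since the preceding Sunday, while the other two return integers.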
As we near the end of this fundamentals chapter, we will take a quick break from KQL operators and functions and discuss some of the main points of the Log Analytics user interface. There are a few things to note:
• First, you can accomplish nearly everything with KQL commands. If you’ve been following along in this chapter, you haven’t once needed to do anything in the user interface. If you skipped ahead to this section, we recommend returning to the beginning of the chapter to learn the basics.
• Second, the user interface can still save you time. Let’s say you forgot to sort; instead of rerunning the query, you can just sort the existing results. We’ll cover some of those basics here.
• Third, cloud services are updated multiple times a day. By the time you read this book, the user interface might look different. The concepts should remain the same, but the interface might change.
To follow along, run the following query:
SigninLogs
| where ConditionalAccessStatus == "notApplied"
Now, scroll over a bit in the results until you see AppDisplayName . If
you click the three dots in the column heading, a new menu will open. You
will be able to search and filter those applications right there in the user
interface (see Figure 1-50).
FIGURE 1-50 Filtering the results of the AppDisplayName column
We can also sort directly by clicking on the column headers themselves. A
little arrow will appear to indicate whether you are sorting by ascending or
descending, as seen in Figure 1-51.
You can also change the order of the columns or remove them. To do so,
click on the right side of the workspace (where vertical text reading
“Columns” is shown). A new window will slide out, and you can deselect or
rearrange columns by moving the mouse over the four dots and dragging
that column up and down. In Figure 1-52, we moved the
AppDisplayName column to appear first and deselected
ResourceID , removing it from the display.
FIGURE 1-52 Removing columns and changing their display order
We can also group by specific columns. To do this, in the new menu that
appeared, click the area to the right of the workspace (where vertical text
reading “Columns” is shown), and click the six-dots icon next to the
column name. The arrow pointer will change to a hand. Then, drag the
column you want to group by to the Row Groups area directly below the
list. In our example below, we will group by AppDisplayName . Your
results should look similar to Figure 1-53.
FIGURE 1-53 Grouping by AppDisplayName
To remove this grouping, click the X next to AppDisplayName in the
right-side menu under Row Groups.
Next to the Run button, you can select the time range. There are some prebuilt intervals, though you can also select a custom time interval. A Help menu also appears as a book icon with a drop-down arrow (as shown in Figure 1-54).
FIGURE 1-54 Additional KQL help
These are some useful references for digging into the language-specific
things you will encounter. The Log Analytics editor also has IntelliSense, which you have probably noticed already. As you are typing, it will
show you different options. In Figure 1-55, IntelliSense helps you pick the
correct table if you can’t remember whether it’s named Signinlogs or
SigninLogs .
There is much more to the user interface, and we’ll cover other aspects as
needed. However, it’s important to remember that most of the things you
can use the UI for can be done in the query itself!
Miscellaneous Fundamentals
A few other operators and functions didn’t fit nicely into the previous
examples. You might use these in some of your future queries.
The first of these is the print operator. Printing output is normally one of the first things you learn in a language, but as you’ve seen throughout this chapter, we haven’t needed it because our queries already return results. The print operator outputs a single row based on an expression. We can perform numerical operations and output the results, and we can
print out strings. We can also use this with functions. Now, let’s talk about
concatenating strings, which can be done in KQL with the strcat
function; between 1 and 64 arguments can be concatenated. Let’s combine
print and strcat to write “Hello, World!” in KQL, passing the three arguments listed here; a sketch of the query follows the list. The output will match Figure 1-56:
• “Hello”
• A space indicated by “ “
• “World!”
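Assuming those three arguments, the query is likely just the following one-liner:
print strcat("Hello", " ", "World!")
The strcat function joins the arguments in order, so the single output row contains the string Hello World!.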
Finally, let’s revisit time filtering. Four functions can be helpful when trying to break time down into common calendar boundaries: startofday , startofweek , startofmonth , and startofyear . Each one returns the beginning of the day, week, month, or year that contains the datetime you pass in. This is useful when aggregating data based on a specific period, such as the number of failed sign-ins per day or failed sign-in trends.
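A minimal sketch of all four functions, evaluated against the current time:
print StartOfToday = startofday(now()),
      StartOfThisWeek = startofweek(now()),
      StartOfThisMonth = startofmonth(now()),
      StartOfThisYear = startofyear(now())
You could then, for example, filter a table with where TimeGenerated >= startofday(now()) to look only at events from today.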
Summary
This chapter introduced us to the world of KQL and some of the most
fundamental skills you will use repeatedly throughout this book and in the
real world. We are just scratching the surface of the power of KQL. With a
solid understanding, you can easily build up to much more advanced
capabilities.
We covered generating data, some ways you can interact with the data, and
the various data types you can leverage in your KQL queries. We also
covered the typical structure of a KQL query and many ways to find the
data you seek by filtering through these large datasets, including time-based
filtering. We covered how to look for specific time ranges and speed up
your queries. We also covered different ways to display the data we found,
including adding new columns and removing existing ones.
This all leads to the next chapter, “Data Aggregation.” Now that we have
found the correct data, we need to take the next step—getting more insights.
We’ll use various data aggregation techniques and display those in various
charts to make those insights really stand out.
OceanofPDF.com
Chapter 2. Data Aggregation
• Group your data by common time delimitations such as week, day, or hour
Reporting on that number to your leadership team isn’t good enough. They
need to know which office was most affected because the New York office
has much of the finance team, and quarterly earnings will be posted in 10
days. The Chicago office is the home to the main research and development
team. The Paris office is closing a strategic deal with a partner. Knowing
which users at these locations are possibly compromised is critical because
some parts of the business could suffer more impact if those compromises
are not remediated quickly. With 12,139 users affected, that’s far too many
to sort into regions manually.
Your leadership team needs to give a status update to the company’s senior
leadership team. You have a few choices. First, you can just scroll down the
list, trying to get a rough estimate based on what users you recognize. That
is no way to make a critical and strategic decision. You can try exporting
this data to another tool like Excel, where you can do additional
deduplication filtering, but some data types don’t export cleanly, so many of
your tools won't work. So, to fully use the data export, more work must be
done on those 7,013 records.
Or you can use another strength of KQL, data aggregation. In this chapter,
we will show you how to answer these questions quickly and include much
more information, such as the first and last time this was witnessed. You
will turn your dataset into insights and actions. You can also convert them
into one of the things managers love most: pretty charts. Many of the
functions discussed in this chapter will be used as building blocks to answer
questions like those in our scenario and many more!
Obfuscating Results
Before we jump into a whole chapter full of queries, you should know there are ways to enable auditing of your queries. We can skip the whole “with great power comes great responsibility” admin talk here. The important thing is knowing your query might show up in the audit logs. If a query contains secrets or personally identifiable information, you can mark those string literals as obfuscated, like this:
SigninLogs
| where TimeGenerated > ago (30d)
| where ResultType == 0
| where UserDisplayName has h'mark.morowczynski'
Again, this will not work in our Log Analytics Demo environment, and
none of the queries that we’ll cover in this chapter have secret info or PII,
but if you are slightly modifying these and running them in your production
environment, add that h or H beforehand, so the strings would be
obfuscated in the audit logs.
Some common scenarios you will run into repeatedly are narrowing results down to the distinct elements returned and counting those elements. Often, you’ll want to combine those two things! We can do all that and much more.
Distinct
We’ll start with the distinct operator, which will return the results
based on the distinct combination of columns you provide. We’ll start by
trying to answer a simple question: How many different user agents are
being used in the environment? If we run our query as we did in Figure 2-1,
we’ll see we have many different records; see Figure 2-2.
SigninLogs
| where TimeGenerated > ago (14d)
| project UserAgent
FIGURE 2-2 User agents that have been used in the last 14 days
As you can see, in the last 14 days, we had 24,696 sign-ins, and the list of
the different user agents available seems pretty varied. The first two results
are the same; if we look near the bottom, the third and fifth results are the
same. But to answer our question, we need to remove the duplicates and
only return unique values. Let’s try our query again, but instead of using
project , let’s use the distinct operator in its place. The results
should look something similar to Figure 2-3.
SigninLogs
| where TimeGenerated > ago (14d)
| distinct UserAgent
FIGURE 2-3 Distinct user agents that have been used in the last 14 days
Our dataset was further reduced to 154 unique UserAgent strings in this
environment. We need to work on some of our device management and
patching to reduce this number further and ensure that our environment is
uniform. A few other things now easily stick out. First, the last row shows a
user using Firefox on Ubuntu. Do our security policies and Microsoft Entra
ID conditional access policies apply to the Linux platform? If not, we
probably need to turn this insight into action and update our policies. Also,
third from the bottom is the axios/0.21.4 user agent. This looks very
different from our other user agents. Is this expected in this environment?
It’s hard to say; this is a demo environment, so probably.
Looking through these types of results in your own data can lead to many
interesting discoveries. Besides finding gaps in their Microsoft Entra ID
conditional access policies, we’ve had customers find pockets of computers
that were never upgraded to the latest operating system, running unpatched
and unsupported in production. We can do a few other things to make
important findings stand out a bit more, which we’ll get to shortly.
The distinct operator isn’t limited to one column. You can add
multiple columns in your query and get the distinct values of that
combination. Let’s expand on the previous scenario, where we looked for
the unique number of user agents being used and now extend it to which
user agents are accessing which applications. We can easily update our
query to include applications. Run the following query and add the sorting
direction for clarity. Your query should look similar to Figure 2-4:
SigninLogs
| where TimeGenerated > ago (14d)
| distinct AppDisplayName, UserAgent
| sort by AppDisplayName asc
FIGURE 2-4 Distinct applications and the user agents accessing them
We can now see each unique combination of user agent and the application it was accessing. About halfway down the screen, we see five different UserAgent strings used against the AXA Google Cloud Instance application. This is easy enough for us to see, and we can spot that one of those browsers is much older than the others: Chrome 113. But
what if we also need to determine the count across all the applications and
user agents/browsers?
Summarize By Count
We’ll start the first query with summarize , similar to what we did in the previous chapter, by selecting a random sample value, in this case an arbitrary row from the table, and passing it to an aggregate function. To do this, we will use the take_any() aggregate function. (Note that the older any() function has been deprecated.) Run the following query; your output should be similar to Figure 2-5:
SigninLogs
| where TimeGenerated > ago (14d)
| project TimeGenerated, UserAgent, AppDisplayName
| summarize take_any(*)
Tip
Because we have a good handle on the UserAgent value, let’s try and
answer a question: Which UserAgent string values do we have in this
environment, and how often do they show up? To do that, run the following
query; your output should look similar to Figure 2-6.
SigninLogs
| where TimeGenerated > ago (14d)
| summarize count() by UserAgent
FIGURE 2-6 UserAgents by how many times they were found
Again, a few things should stick out. First, we didn’t provide a column
name for the count() aggregation, so it’s just named count_ . We can
set that display value, which we will do in the next query. Second, we have
a wide range of values for count . A good operational practice is to look
at the longer tail of these results by looking at user agents that have only a
handful of results, which might identify clients that need to be updated or
an attacker that has misspelled a user agent name when trying to blend in
with the normal traffic. Run the following query; the output will be similar
to Figure 2-7.
SigninLogs
| where TimeGenerated > ago (14d)
| summarize UserAgentCount = count() by UserAgent
| sort by UserAgentCount asc
FIGURE 2-7 UserAgents by how many times they were found, sorted from
least to most
Many user agents have only been seen once in the last 14 days. But
python-requests/2.28.1 sticks out; we should investigate it. We
can add additional columns to the count() by . This will allow us to
determine which user agent accessed each application. Run the following
query; your output will be similar to Figure 2-8.
SigninLogs
| where TimeGenerated > ago (14d)
| summarize UserAgentCount = count() by UserAgent, AppDisplayName
| sort by UserAgent desc
FIGURE 2-8 UserAgents sorted Z to A with the apps they accessed
The python-requests/2.28.1 request accessed the Microsoft
Azure CLI application once. But even more interesting, we see other user
agents named python-requests in this environment. Look to see
what information you uncover in your environment.
Tip
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(3d)
| where AppId =='e9134e10-fea8-4167-a8d0-94c0e715
| summarize RequestCount=count() by Location, IPAddress
We can also look at this query from the application perspective if we want
to know which application has been accessed the most by which user agent.
To determine this, we’ll simply flip our count() by . Instead of
counting by user agent, we’ll count by application and show which user
agent is accessing that application the most. Run the following query; your
output should be similar to Figure 2-9.
SigninLogs
| where TimeGenerated > ago (14d)
| summarize AppDisplayNameCount = count() by AppDisplayName, UserAgent
| sort by AppDisplayNameCount desc
FIGURE 2-9 Most-accessed application by user agent
In this demo environment, the Azure Portal application with an Edge
browser version 121.0.0.0 was used 2,653 times. At the start of this section,
we focused on getting the distinct set of results returned, but we had to
count manually. Then, we used a count() of the results returned, but
these are not distinct. Let’s combine both of these with the aggregate
function dcount() , which allows us to get the estimated distinct count
by passing the column for which we want to get a distinct count and which
additional columns we want to aggregate/group the data by. Let’s take our
current example. What user agent is accessing the most unique
applications? Run the following query; your output should be similar to
Figure 2-10.
SigninLogs
| where TimeGenerated > ago (14d)
| summarize AppDisplayNameCount = dcount(AppDisplayName) by UserAgent
| sort by AppDisplayNameCount desc
FIGURE 2-10 Distinct applications and how many times a user agent has
accessed them
This is extremely useful information: we can see which user agents in the environment are accessing the largest number of unique applications. Sorting the opposite way is also interesting because it shows which user agents are accessing only a small number of apps. Those might be good candidates to be updated and brought in line with the standard browser versions for the environment.
Tip
AuditLogs
| where LoggedByService == "Access Reviews"
| summarize OperationCount = count() by OperationName
| order by OperationCount desc
AuditLogs
| where LoggedByService == "Lifecycle Workflows"
| summarize OperationCount = count() by OperationName
| order by OperationCount desc
AuditLogs
| where LoggedByService == "PIM"
| summarize OperationCount = count() by OperationName
| order by OperationCount desc
We can also flip this. What if we want to see how many unique user agents
access each application? We can see this number pretty quickly by getting
the dcount() for the UserAgent column and grouping by
application. Run the following query; your results should be similar to
Figure 2-11:
SigninLogs
| where TimeGenerated > ago (14d)
| summarize UserAgentCount = dcount(UserAgent) by
| sort by UserAgentCount desc
FIGURE 2-11 Counting the distinct user agents and which applications
they accessed
This is even more interesting; 100 different user agents access the Azure
Portal! Thankfully, this is a test environment, but this tells a compelling
story. Many customers will have their own line-of-business (LOB)
applications in Microsoft Entra ID. Running a similar query and seeing
many user agents will show the possible browsers that would need to be
tested to ensure compatibility. That’s great data to show the leadership team why standardizing on specific browser versions is warranted.
Note
What if we want both the total number of sign-ins per application and the number of those sign-ins that came from a particular location? The countif() aggregate function counts only the rows where an expression evaluates to true , so we can calculate both in a single summarize . Run the following query; your output should be similar to Figure 2-12:
SigninLogs
| where TimeGenerated > ago(14d)
| summarize TotalCount = count(), USLogins=countif(Location == "US") by AppDisplayName
| sort by USLogins desc
FIGURE 2-12 Total logins per application and total US logins
This view is much easier to read than two separate queries. Those with a
sharp eye will also notice that we combined two summarize aggregate
functions. Like how we combined multiple data-filtering methods in
Chapter 1, we can do some powerful things by combining those functions.
We highlight a few of those throughout this chapter.
Tip
IntuneAuditLogs
| where TimeGenerated > ago(7d)
| where ResultType == "Success"
| where OperationName has_any ("Create", "Delete")
| summarize Operations=count() by OperationName, Identity
| sort by Operations, Identity
IntuneOperationalLogs
| where OperationName == "Enrollment"
| extend PropertiesJson = todynamic(Properties)
| extend OS = tostring(PropertiesJson["Os"])
| extend EnrollmentTimeUTC = todatetime(PropertiesJson["EnrollmentTimeUTC"])
| extend EnrollmentType = tostring(PropertiesJson["EnrollmentType"])
| project OS, Date = format_datetime(EnrollmentTimeUTC, "yyyy-MM-dd"), Result
| summarize
    iOS_Successful_Enrollments = countif(Result == "Success" and OS == "iOS"),
    iOS_Failed_Enrollments = countif(Result == "Fail" and OS == "iOS"),
    Android_Successful_Enrollments = countif(Result == "Success" and OS == "Android"),
    Android_Failed_Enrollments = countif(Result == "Fail" and OS == "Android"),
    Windows_Successful_Enrollments = countif(Result == "Success" and OS == "Windows"),
    Windows_Failed_Enrollments = countif(Result == "Fail" and OS == "Windows")
by Date
Going a step further, how many unique user agents are using that
application in that US region? Again, we could run separate queries like
before, but combining them is much more useful, so we will use the
dcountif() to only count the distinct rows that evaluate to true
based on the expression. Run the following query; the output should be
similar to Figure 2-13:
SigninLogs
| where TimeGenerated > ago(14d)
| summarize TotalCount = count(), USUserAgent=dcountif(UserAgent, Location == "US") by AppDisplayName
| where USUserAgent > 0
| sort by USUserAgent desc
FIGURE 2-13 Total logins per application and by US access
The dcountif function evaluates the column you want to have the
distinct count of when the expression is evaluated to true . In this
example, we are looking for the unique number of user agents when the
location is US . Next, we grouped them by application display name
( AppDisplayName ).
A common scenario that will come up more often than you think is
determining the first or last occurrence of something. You can use the
min() or max() functions to find the minimum or maximum value of
what is passed to it, such as finding the first time someone signed in to an
application. Run the following query; your output should be similar to
Figure 2-14:
SigninLogs
| where TimeGenerated > ago (14d)
| summarize TotalCount = count(), FirstEvent=min(TimeGenerated) by AppDisplayName
| sort by FirstEvent asc
FIGURE 2-14 The first sign-in event in the application and the total sign-
ins for that app
We can now quickly determine the first time a sign-in event was generated
for that application and sort our results based on the earliest time. We can
also do the opposite and determine the last time a sign-in event occurred for
an application. To do that, we’ll use the max function. Update the query to
match the one listed here; the output should be similar to Figure 2-15.
SigninLogs
| where TimeGenerated > ago (14d)
| summarize TotalCount = count(), LastEvent=max(TimeGenerated) by AppDisplayName
| sort by LastEvent desc
FIGURE 2-15 The last sign-in event for each application and the total sign-ins for the app
The output is similar to our last result but now shows the last sign-in event
for that application. As mentioned earlier, we can combine multiple
summarize functions to refine our results further. We can get a side-by-
side timeline view of the first and last events with just the min and max
functions. Run the following query; your results should be similar to the
output in Figure 2-16:
SigninLogs
| where TimeGenerated > ago (14d)
| summarize TotalCount = count(), FirstEvent = min(TimeGenerated), LastEvent = max(TimeGenerated) by AppDisplayName
| project AppDisplayName, TotalCount, FirstEvent, LastEvent
| sort by FirstEvent asc, LastEvent desc
FIGURE 2-16 The first and last sign-in event for each application and the
total sign-ins for each application
Here, we are combining a few things that we’ve used so far in this book:
1. First, we use our new min and max aggregate functions to easily pull
out the first and the last time a sign-in event occurred.
2. Next, we re-order the column’s output to put the functions’ results side by
side to make it easier to see the difference.
3. Finally, we sort both columns, starting with the first event and then the
last.
As we move into more advanced queries, you will see this similar pattern of
combining multiple functions and filters, continuing to refine the query, and
then formatting the output. You could easily add a filter for a specific user
account to see this same information but for that user account.
The min and max functions return the value of a single column, but what if you also want the values of additional columns from the row where that minimum or maximum occurs? You would use the arg_min() and arg_max() aggregate functions. You provide the column for which you want to find the minimum or maximum value, followed by the other columns whose values you’d also like returned; enter an asterisk ( * ) to return all columns. Run the following query to find the minimum
values of TimeGenerated ; your output will be similar to Figure 2-17:
SigninLogs
| where TimeGenerated > ago (14d)
| summarize FirstEvent = arg_min(TimeGenerated, ConditionalAccessStatus, ClientAppUsed, AuthenticationRequirement) by AppDisplayName
| sort by FirstEvent asc
FIGURE 2-17 The minimum value of TimeGenerated by application with
the additional columns specified
Here, we are looking for the minimum value of TimeGenerated —the
first result showing an application sign-in event. Then, we also included
additional columns we want to see the values of when TimeGenerated
is at its minimum value, such as conditional access status, the client
application used to access the application, and finally, whether it was a
single-factor or multifactor request. We can run a similar query using arg_max and return all columns using a * . Run the following query;
your output will be similar to Figure 2-18:
SigninLogs
| where TimeGenerated > ago (14d)
| summarize LastEvent = arg_max(TimeGenerated, *)
| sort by LastEvent desc
FIGURE 2-18 Maximum value
This is similar to the minimum-value results, except we start with the most
recent event and return all the columns in the table. The scrollbar at the
bottom of Figure 2-18 shows that we have many more output columns to
see all the values for each application’s most recent event.
The final set of statistical functions we’ll look at in this section are
average and summation . Just as you learned in school, these
functions will find the avg() , otherwise known as the arithmetic mean,
and sum() , which will find the sum of values in a column. Let’s run the
following query to understand how these work; your output should be
similar to Figure 2-19:
SigninLogs
| where TimeGenerated > ago (14d)
| summarize AvgCreatedTime = avg(CreatedDateTime) by AppDisplayName
FIGURE 2-19 The average time when a sign-in event occurred for each
application
Here, we can see the average time an event was created per application. We
can also expand this with the avgif() function. Like our previous
aggregate functions that use an if function, we can evaluate an
expression; if its results are true , that expression is used for the
calculation. For this, let’s determine the average creation date if the user
signed in from the US. Run the following query; your results should be
similar to Figure 2-20:
SigninLogs
| where TimeGenerated > ago (14d)
| summarize AvgCreatedTime = avgif(CreatedDateTime, Location == "US") by AppDisplayName
FIGURE 2-20 Average time when a US sign-in occurred for each
application
Similar to our previous results, we are now calculating the average creation time only for sign-ins that came from the US. Good examples of when to use an average include calculating the processor utilization or memory consumption of IaaS virtual machines, or aggregating readings from Internet of Things (IoT) devices that report the temperature and humidity of their locations.
Tip
Perf
| where TimeGenerated > ago(1h)
| where (ObjectName == "Processor" and CounterName == "% Processor Time") or
    (ObjectName == "Memory" and CounterName == "Available MBytes")
| summarize avg(CounterValue) by Computer, CounterName
The next aggregate functions we will look at are sum() and sumif() .
For these, you simply provide the column you want to summarize. The data
type value in the column needs to be numeric, such as a decimal, double,
long, or integer. For more information on data types, see Chapter 1, “Data
Types and Statements.” Our sample sign-in logs don’t have any good
columns to sum, so we are using a different table,
AppPerformanceCounters , for this query because it has more data
with values that can be totaled. Run the following query; the results should
be similar to Figure 2-21:
AppPerformanceCounters
| where TimeGenerated > ago(14d)
| summarize sum(Value) by AppRoleName, Name
FIGURE 2-21 The sum of the application performance counters
Going through these performance counters for an application is a bit outside
of the scope of this book, but the aggregate functions used so far can be
applied to this table and columns. Understanding how much time an
application has been executing or how much memory it has consumed
might highlight places for optimization to drive some of the consumption
costs down.
We can see that the Fabrikam-App handles 7,835 requests per second, more
than ch1-usagegenfuncy37ha6, which performs 5,507 requests per second.
We could have made this easier to read by only displaying that column. See
“Visualizing Data” later in this chapter to see how to graph this data.
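The sumif() function wasn’t shown above, so here is a minimal sketch; the Fabrikam-App role name is taken from the discussion and may differ in your environment:
AppPerformanceCounters
| where TimeGenerated > ago(14d)
| summarize FabrikamTotal = sumif(Value, AppRoleName == "Fabrikam-App") by Name
| sort by FabrikamTotal desc
This totals each performance counter only for rows where the predicate is true, which is handy when you want one application’s totals without filtering the rest of the result set away.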
As we continue to analyze more of our data, we’ll often need ways to group
this data out by different segments to answer questions. What day of the
week was the most active? Which month of the year was the least active?
We will use a common technique called binning to accomplish this and
more. We’ll also frequently need to quickly convert the data into something
a little easier to understand. Showing the percentage and the 25th or 95th
percentile distribution for the data will help you tell a story with the data.
The bin() function rounds values down to a multiple of a given bin size, which is how we group records into buckets. It takes two parameters:
• The first is the value you want to round down. This can be the int , long , real , datetime , or timespan types. (You’ll end up using the time-based types most often.)
• The second parameter is the bin size by which the values will be divided. This can be the int , long , real , or timespan types.
Run the following query; your output should be similar to Figure 2-22:
SigninLogs
| where TimeGenerated > ago(14d)
| where ResultType == 0
| summarize SuccessfulSignIn=count() by bin(TimeGenerated, 1d)
| sort by TimeGenerated asc
FIGURE 2-22 Daily Successful sign-in count
We are first filtering for successful sign-ins. In the previous examples, we counted them across the entire 14 days, but now you can see that some days are busier than others. For most organizations, this is expected because people are off on the weekend. The ability to bin by date is extremely useful, and we’ll use this functionality multiple times throughout this book. We can apply the same idea to the performance counter data from earlier. Run the following query; your output should be similar to Figure 2-23:
AppPerformanceCounters
| where TimeGenerated > ago(14d)
| where Name == "Requests/Sec" and AppRoleName == "Fabrikam-App"
| summarize sum(Value) by AppRoleName, Name, bin(TimeGenerated, 1d)
| project TimeGenerated, AppRoleName, Name, sum_Value
| sort by TimeGenerated asc
FIGURE 2-23 Total requests per second, per day
We made a few small modifications to the original query. First, we only
filtered for the application and performance counter we were interested in.
Our summarize function is the same as before, except we added a 1-day
bin interval. We then cleaned up the output and sorted by date. If you
wished any of the previous queries had been broken down by different
intervals, feel free to alter them using the bin function!
Tip
This query looks at network flows per hour for the last 24
hours. Look for patterns and suspicious or long-running
network flows. See https://aka.ms/KQLMSPress/NetFlows
for set-up requirements. –Laura Hutchcroft, senior service
engineer
AzureNetworkAnalytics_CL
| where TimeGenerated > ago(24h)
| summarize sum(InboundFlows_d), sum(OutboundFlows_d) by bin(TimeGenerated, 1h)
Percentage
Another common task is turning raw counts into percentages, such as what portion of each application’s sign-ins used multifactor authentication versus a single factor. We can do this by combining count() , countif() , and a little math in an extend . Run the following query; your output should be similar to Figure 2-24:
SigninLogs
| where TimeGenerated > ago (14d)
| where ResultType == 0
| project TimeGenerated, AppDisplayName, UserPrincipalName, AuthenticationRequirement, Location
| summarize TotalCount=count(), MultiFactor=countif(AuthenticationRequirement == "multiFactorAuthentication"),
SingleFactor=countif(AuthenticationRequirement == "singleFactorAuthentication") by AppDisplayName
| extend ['MFA Percentage']=(todouble(MultiFactor) * 100 / todouble(TotalCount))
| extend ['SFA Percentage']=(todouble(SingleFactor) * 100 / todouble(TotalCount))
FIGURE 2-24 Percentage of MFA and single-factor sign-ins
Let’s break down this query. The beginning is the normal stuff, where we filter by time and successful sign-ins. Then, we pull the columns we want to work with and summarize the total count of all sign-ins, plus separate totals depending on whether the sign-in was single-factor or multifactor. The two extend lines then turn those totals into percentages. The results include far more decimal places than we need, so let’s clean them up with the round() function. Run the following query; your output should be similar to Figure 2-25:
SigninLogs
| where TimeGenerated > ago (14d)
| where ResultType == 0
| project TimeGenerated, AppDisplayName, UserPrincipalName, AuthenticationRequirement, Location
| summarize TotalCount=count(), MultiFactor=countif(AuthenticationRequirement == "multiFactorAuthentication"),
SingleFactor=countif(AuthenticationRequirement == "singleFactorAuthentication") by AppDisplayName
| extend ['MFA Percentage']=round((todouble(MultiFactor) * 100 / todouble(TotalCount)), 2)
| extend ['SFA Percentage']=round((todouble(SingleFactor) * 100 / todouble(TotalCount)), 2)
FIGURE 2-25 The rounded percentage of multifactor sign-ins and single-
factor sign-ins
As you can see, you can control how many digits you round to. Calculating percentages this way is a tactic you will use repeatedly.
Percentiles
What if you want to find the value in a column that a given percentage of the data falls at or below? For that, we’ll need the percentile() or percentiles() functions. percentile() takes two parameters: the column you want to use for the calculation and the percentile you want to calculate for that sample set. percentiles() works similarly, except you can specify multiple comma-separated percentiles. Let’s go back to the AppPerformanceCounters table and run the following query; your results should be similar to Figure 2-26:
AppPerformanceCounters
| where TimeGenerated > ago(14d)
| where Name == "Available Bytes"
| summarize percentile(Value, 50) by AppRoleName, Name
FIGURE 2-26 The 50th percentile value for Available Bytes per
application
Here, for each application, we can see the Available Bytes value that is greater than or equal to 50 percent of the recorded values. We can get the values for multiple percentiles using percentiles() . Update your command to the following;
your output will be similar to Figure 2-27:
AppPerformanceCounters
| where TimeGenerated > ago(14d)
| where Name == "Available Bytes"
| summarize percentiles(Value, 25, 50, 75) by AppRoleName, Name
FIGURE 2-27 The 25th, 50th, and 75th percentile values for available
bytes per application
These values fall at the 25th, 50th, and 75th percentiles. This type of query is very useful when you are trying to size and allocate resources, such as choosing a virtual machine size or an Azure App Service plan for capacity planning, or when examining usage spikes. You can also leverage it when looking for anomalies or outliers in your datasets. For example, a simple test application that authenticates 100 times a day isn’t that concerning on its own. However, if you looked at the percentiles of sign-ins per application and found that the test application sat at the 95th percentile, that would probably be a big cause for concern. A simple test application should not be one of the most signed-in applications in the environment. Either something is misconfigured, or it’s being used in a way outside its normal scope. Percentiles can help highlight those types of behaviors.
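A minimal sketch of that idea, which computes the 95th percentile of per-application sign-in counts so you can compare any single application against it:
SigninLogs
| where TimeGenerated > ago(14d)
| summarize SignInCount = count() by AppDisplayName
| summarize P95SignInCount = percentile(SignInCount, 95)
Any application whose own count from the first summarize is at or above this threshold is in the busiest 5 percent of applications and may deserve a closer look.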
We’ve been returning lots of interesting data so far in our KQL journey.
What if we needed to temporarily store it to do some additional processing?
For example, let’s say when we returned all the UserAgent strings, we
wanted to check it against a known set of known malicious user agents.
Another scenario would be a compromised user account and we want to be
able to quickly determine all the unique applications they have accessed
from the time of known compromise until we regained control of the
account.
To be able to temporarily store some of these results or even create our own
dataset, we’ll use a common programming concept called a dynamic array.
We’ll cover more details of leveraging arrays in Chapter 3, “Advanced
KQL Operators,” and Chapter 5, “Security and Threat Hunting,” but we’ll
use two very common functions—lists and sets—to get you started.
Lists
A list is pretty simple. You’ll add items to the list either manually or as part
of a summarize query. Let’s first create our own list manually. Again,
we’ll cover this more in Chapter 5, “Security and Threat Hunting.” Here,
we’re just looking at a simple example to get you started. Run the following
query; your output will be similar to Figure 2-28:
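The book’s full list of winners isn’t reproduced in this text, but the query was likely along these lines; this sketch uses a small illustrative subset of years built with the datatable operator and fed into make_list() :
datatable(Year:int, Winner:string) [
    2000, "New York Yankees",
    2001, "Arizona Diamondbacks",
    2006, "St. Louis Cardinals",
    2009, "New York Yankees",
    2011, "St. Louis Cardinals",
    2015, "Kansas City Royals"
]
| summarize WorldSeriesWinners = make_list(Winner)
Notice that repeated values stay in the list, which is what the following discussion calls out.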
Here, we can see the values—World Series winners from 2000 to 2015—
inputted into this list. The New York Yankees and St. Louis Cardinals
appear twice in the output. The list will store whatever is inputted, including
multiple values of the same thing. But you can now manipulate this data as
we’ve done throughout this chapter. Let’s group these winners by even and
odd years. Update your query; the output should be similar to Figure 2-29.
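A sketch of that grouping, again using the illustrative subset from the previous example:
datatable(Year:int, Winner:string) [
    2000, "New York Yankees",
    2001, "Arizona Diamondbacks",
    2006, "St. Louis Cardinals",
    2009, "New York Yankees",
    2011, "St. Louis Cardinals",
    2015, "Kansas City Royals"
]
| summarize Winners = make_list(Winner) by YearType = iff(Year % 2 == 0, "Even year", "Odd year")
The by clause can take any expression, so here each row is bucketed by whether its year is even or odd before the lists are built. The same pattern applies to real log data; the next query builds a list of risk event types per application using make_list_if() .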
SigninLogs
| where TimeGenerated > ago (14d)
| summarize RiskLevels = make_list_if(RiskEventTypes_v2, isnotempty(RiskEventTypes_v2)) by AppDisplayName
Sets
A set works like a list, except each distinct value is stored only once. We build one with the make_set() or make_set_if() aggregate functions. Run the following query; your output should be similar to Figure 2-32:
SigninLogs
| where TimeGenerated > ago (14d)
| summarize RiskLevels = make_set_if(RiskEventTypes_v2, isnotempty(RiskEventTypes_v2)) by AppDisplayName
FIGURE 2-32 Distinct risk event sign-ins per application
If you compare this to the previous list, you will see that each RiskEventTypes_v2 value is only stored once. A single RiskEventTypes_v2 entry can contain multiple event types, so it might look like some of these events are repeating. They are not. If you look at the Microsoft Office 365 Portal risk levels, three distinct results exist between the brackets [] :
• One risk event type shows when a sign-in has been flagged for unlikely travel.
• A second risk event type shows when a sign-in has been flagged for unfamiliar features.
• A third set of event types shows that a sign-in has been flagged for both unfamiliar features and unlikely travel.
Visualizing Data
If you’ve been following along in this book, you might have noticed that every time we succeed with a query, the output is just a list of data. We’ve taken a huge amount of data and cut it down to something specific we want, and the result was more data. Now what?
There is an entire area of study on the best ways to visualize data for cognitive reception, which is well beyond the scope of this book. There is an art and a science to it. The easy way to remember it is the old joke that the mucky-muck managers love charts and graphs because they have pictures and colors. There is some truth to that joke, though. You will often need to demonstrate a problem, insight, or finding, and a visualization can be the clearest way to do it. A picture is worth a thousand words, or so they say.
KQL has a built-in operator to help you visualize your data, called render . It needs to be the last operator in the query, and it does not modify any data. You should use the where operator, summarize operator, or even the top operator to limit the amount of data displayed. By default, you’ve actually been outputting your data in the table render format this whole time!
Because you want to go beyond just a table, you will specify what type of
visualization you’d like to use, and we’ll cover some of the very common
ones here. The render operator looks at the data as three kinds of
columns: the x-axis column (horizontal), the y-axis column (vertical), and
the series. For example, if we were looking at sales data, it might be
visualized by having the months along the x-axis, the total along the y-axis,
and the series, showing how much of each product was sold each month. A
best practice is to sort the data to define the order along the x-axis. Another
thing to be aware of is that different KQL tools support different
visualizations. We will demonstrate the ones that work in our Log Analytics
demo environment and discuss the ones that do not. You are free to try them
yourself in your own environment.
Pie Chart
The first visualization we will look at is the pie chart. This takes two
columns in the query result. The first column is used as the color axis. This
can be text, a datetime , or a numeric data type. The other column will
determine the size of each slice of the pie and contain the numeric data
type. Run the following query; your output should be similar to Figure 2-
33.
SigninLogs
| where TimeGenerated > ago(14d)
| where ResultType == "0"
| summarize Appcount = count() by AppDisplayName
| render piechart
FIGURE 2-33 A pie chart showing the number of successful sign-ins by
application over the last 14 days
Pie charts are best used when presenting a composition of categories and
how much their proportion is of the total. You can clearly see that more than
50 percent of the sign-ins in this demo tenant are for the Azure Portal and
the Microsoft 365 Security and Compliance Center. You might have added
37.03 percent and 20.73 percent to come to that same realization if you
were calculating percentages for your dataset, but if you just counted the
total number of sign-ins, this might not be as obvious.
Tip
SigninLogs
| where TimeGenerated > ago(1d)
| project ConditionalAccessStatus, AppDisplayName
| where ConditionalAccessStatus has "notapplied"
| summarize count() by AppDisplayName
| render piechart
SigninLogs
| where TimeGenerated > ago(1d)
| project ConditionalAccessStatus, AppDisplayName
| where ConditionalAccessStatus has "failure"
| summarize count() by AppDisplayName
| render piechart
Bar Chart
The next chart we will look at is the bar chart. This takes two columns as
well. The first column will be used as the y-axis. This can contain text,
datetime , or numeric data. The other column will be the x-axis and can contain numeric data types, displayed as horizontal bars. Run the following
query; the output should be similar to Figure 2-34:
SigninLogs
| where TimeGenerated > ago(14d)
| where ResultType == "0"
| summarize Appcount = count() by AppDisplayName
| limit 10
| render barchart
FIGURE 2-34 Top 10 applications and the number of successful sign-ins
Bar charts are best used for comparing numerical or discrete variables
where the line length represents its value. We limited our results to just the
top 10 applications. However, once again, we can quickly see which
applications have the most usage relative to the others and note that the top
5 applications make up most of the usage. That might be a good area to
focus on to ensure proper security controls are being met.
Column Chart
The next type of chart we will look at is the column chart. This will also
take two columns. The first column is the x-axis and can contain text,
datetime , or numeric data types. The other column will be on the y-axis, containing numeric data types displayed as vertical bars. This
chart type can also be stacked or unstacked. By default, it is stacked, and
we’ll start with that. Run the following query; your output should be similar
to Figure 2-35:
SigninLogs
| where TimeGenerated > ago(14d)
| where ResultType == "0"
| summarize Signcount = count() by AppDisplayName, bin(TimeGenerated, 1d)
| render columnchart
FIGURE 2-35 A stacked column chart showing daily application sign-ins
These column charts are best used for comparing specific subcategory
items. This may be a bit tough to see in this black-and-white book (run
these queries yourself to really see), but each application has a different
color and is stacked on top of the others for each date. The size of each segment represents that application’s share of the day’s sign-in count. If you mouse over an area, it will show you the exact number. At a glance, you can see which application had
the most sign-ins of that day relative to the other applications. You can also
unstack this chart. Run the following command to see that version; your
output will be similar to Figure 2-36:
SigninLogs
| where TimeGenerated > ago(14d)
| where ResultType == "0"
| summarize Signcount = count() by AppDisplayName, bin(TimeGenerated, 1d)
| render columnchart with (kind=unstacked)
FIGURE 2-36 An unstacked column chart showing daily application sign-
ins
We used the same data as in the previous example, but each application has
its own column for that day. You can compare these application sign-ins to
each other daily.
Tip
CDBPartitionKeyRUConsumption
| where TimeGenerated >= now(-1d)
//specify collection and database
//| where DatabaseName == "DBNAME" and CollectionName == "COLLECTIONNAME"
// filter by operation type
//| where operationType_s == 'Create'
| summarize sum(todouble(RequestCharge)) by toint(PartitionKeyRangeId)
| render columnchart
Time Chart
Another type of chart you will use frequently is the time chart. This is a
type of line graph. The first column of the query will be on the x-axis and
should be a type of datetime . You will most likely want to use the
bin with this for the time-period intervals you are interested in. The other
column will be numeric and on the y-axis. Run the following query; your
output should be similar to Figure 2-37:
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| summarize Signcount = count() by bin(TimeGenerated, 1d)
| render timechart
FIGURE 2-37 A time chart depicting the number of successful sign-ins per
day
Using a time chart is an excellent way to represent data over time. It’s clear that we see fewer sign-ins on the weekends, which is to be expected, and that our busiest day is Wednesday. Looking at datasets over time gives you the insight to see whether normal patterns shift over weeks or even months.
Tip
AuditLogs
| where Category == "GroupManagement"
| where TargetResources == "REPLACE" // group id
| where ActivityDisplayName in ("Add member to group", "Remove member from group") or ActivityDisplayName == "Update group"
| summarize count() by TimeGenerated
| render timechart
Area Chart
Another type of time series chart is the area chart. The first column should
be numeric and used as the x-axis. The second and additional numeric
columns are the y-axis. These represent volume or contributions. These can
be both stacked or unstacked. Run the following query; you should see a
similar output as Figure 2-38:
SigninLogs
| where TimeGenerated > ago(14d)
| where ResultType == "0"
| summarize Signcount = count() by AppDisplayName, bin(TimeGenerated, 1d)
| render areachart
FIGURE 2-38 A stacked area chart showing successful daily sign-ins per
application
Area charts are best used to show how the volume contributed by different datasets changes over time.
Because this is a stacked area chart, the y-values are baselined to the
previous value below it. This allows us to see the total and the breakdown
of the individual items in the group. For example, if we look at the top two
peaks on November 8, they are Azure Advanced Threat Protection (458
sign-ins) and Azure Portal (total sign-ins). You’ll notice that they align on
the 3,200 and 3,700 y-axis ranges. The total sign-ins for that day is 3,700.
Each app value point in the chart is added to the previous values in that date
because they are baselined to the value below it (the previous line). You can
also do an unstacked area chart by running the following query; your results
should be similar to Figure 2-39:
SigninLogs
| where TimeGenerated > ago(14d)
| where ResultType == "0"
| summarize Signcount = count() by AppDisplayName, bin(TimeGenerated, 1d)
| render areachart with (kind=unstacked)
FIGURE 2-39 An unstacked area chart of daily successful sign-ins per
application
Area charts that are unstacked are baselined to zero. The y-axis only goes
up to 1,100; the highest volume occurred on November 8, with Azure
Portal’s 967 sign-ins.
Note
Line Chart
This is the most basic type of chart. The first column will be the x-axis and
should be numeric. The other column will be the y-axis and should also be
numeric. This is useful to track changes over periods of time. Line charts
are preferable for small changes over time because they are easier to see
than a bar chart. This will not display in the Log Analytics demo
environment.
Scatter Chart
Another type of chart you might use is the scatter chart. The first column is
the x-axis and should be a numeric value. Other columns are numeric and
are on the y-axis. These charts are good to show relationships between
variables. You can plot this data on a map if you also have the longitude,
latitude, or data that has the GeoJSON column. To view this type of map
scatter chart requires either Kusto Desktop Explorer or Azure Data Explorer
and will not display in the Log Analytics demo environment, though a
regular scatter chart would.
Additional Charts
We have included the rest of these charts, even though none of them can be
displayed in the Log Analytics demo and are used infrequently, if at all, by
the authors.
• Ladder chart The last two columns are the x-axis, which must be date
and time, and the additional columns are a composite on the y-axis. It sort
of looks like a bar chart, but the values are not touching the y-axis because
they are mapped to date and time. It's kind of like a Gantt chart.
• Card This visualization only displays one element. If there are multiple
rows or columns, the first result record is shown.
• Pivot chart This visualization allows you to interact with the data,
columns, rows, and various chart types.
• Time pivot These are similar to a pivot chart, but the interactive
navigation takes place on the events on the time axis.
The render operator also accepts optional properties that control how a chart is drawn; some of the most useful are listed here, with a short example after the list.
• kind This value depends on the chart, but you will usually use stacked , unstacked , or stacked100 . Stacked100 is useful when plotting the percentage contribution instead of the absolute value.
• legend This value toggles the legend between visible and hidden.
• xaxis This scales the x-axis and can be set to linear or log.
• yaxis This scales the y-axis and can be set to linear or log.
• xcolumn This determines which column in the result is used for the x-axis.
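A minimal sketch that combines a few of these properties; the title text is just an illustration:
SigninLogs
| where TimeGenerated > ago(14d)
| where ResultType == "0"
| summarize Signcount = count() by AppDisplayName, bin(TimeGenerated, 1d)
| render columnchart with (kind=stacked100, ytitle="Share of sign-ins", legend=visible)
Using stacked100 turns each day’s column into percentage contributions, which makes it easy to compare an application’s share of sign-ins across days regardless of total volume.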
Make Series
We need to cover one more charting topic, which combines time series data with the optional values from the previous section: the make-series operator. This operator allows you to create a series of specified aggregate values along a specified axis. Why would you want to do this instead of just using one of the built-in charts? A key reason is that many charts get smoothed out when you use the bin function for a specific period and there is no record for a given time bin. For example, run the following query; your output should be similar to Figure 2-40:
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where AppDisplayName == "Azure Purview"
| summarize Count=count() by bin(TimeGenerated, 4h)
| render timechart
FIGURE 2-40 A time chart showing daily Azure Purview sign-ins
This looks similar to the graphs we did previously, though it might give the
viewer a false sense of the data, specifically from October 28 to November
4. If you quickly glance at this, you might think there were four sign-ins for
the Azure Purview app each day. Let’s update our query using make-
series and set the default for when no data is present to be 0 . Your
output should be similar to Figure 2-41:
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where AppDisplayName == "Azure Purview"
| make-series Count=count() default=0 on TimeGenerated step 4h
| render timechart
FIGURE 2-41 A time chart using make-series with the default value set to
0
This is the same dataset from the previous example, but that time chart
looks much different! It’s more “accurate” than the previous example
because we now represent the lack of data as 0 .
The make-series operator has additional parameters that can also be configured, so let’s break down what was in our first query and add to it. The first things the make-series operator takes are a column name for the results and the aggregate function we want to run. This is no different from what we’ve been doing throughout this chapter. The make-series operator supports the following aggregate functions, most of which we’ve covered in this chapter:
• avg()
• avgif()
• count()
• countif()
• dcount()
• dcountif()
• max()
• maxif()
• min()
• minif()
• percentile()
• take_any()
• stdev()
• sum()
• sumif()
• variance()
The next parameter is default= , which is the value you want to use
when no value is found. Figure 2-41 shows why we’d want to use the value
of 0 if there was no sign-in event for that day. Next is the AxisColumn ,
which is what the data series we are visualizing will be ordered by.
Typically, this will be either a timespan or datetime . In this
example, we used a 30-day timespan . If you are using a datetime
value, you would also include from start date to end date .
For example, if you were using the date range in the sample, you would use
from datetime(2023-10-20T00:00:00Z) to
datetime(2023-11-19T00:00:00Z) . If you don’t specify a start
and stop, the first bin with data in it is used as the starting date. The final
parameter is step . This is the difference or bin size between the time
intervals. In our case, we used four hours.
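A minimal sketch of the same query with an explicit start and stop, using the date range mentioned above and a one-day step for illustration:
SigninLogs
| where ResultType == 0
| where AppDisplayName == "Azure Purview"
| make-series Count=count() default=0 on TimeGenerated from datetime(2023-10-20T00:00:00Z) to datetime(2023-11-19T00:00:00Z) step 1d
| render timechart
Because the start and stop are explicit, the series always covers the full window, with zeros filled in for any empty bins.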
We can also do some data analysis on the dataset. Digging into the various
data analysis functions available is beyond the scope of this book since we
are focusing on operations, defense, and threat hunting, but mapping a trend
line is something you will do often. We’ll combine this query with some of
the other make-series optional arguments to label our graph. Run the
following query; your output should be similar to Figure 2-42.
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType != "53003"
| make-series Count=count() default=0 on TimeGenerated step 1d
| extend (RSquare, SplitIdx, Variance, RVariance, TrendLine) = series_fit_2lines(Count)
| project TimeGenerated, Count, TrendLine
| render timechart with (xtitle="Day", ytitle="Failed Sign-Ins")
FIGURE 2-42 A time chart with a trend line
The series_fit_2lines() function fits the series with two line segments and returns several values:
• RSquare This is the standard measure of fit quality, a number between 0 and 1; the closer to 1, the better the fit.
• SplitIdx This is the index of the point where the series is split into the two fitted segments.
• Variance This is the variance of the input data.
• RVariance This is the residual variance between the input data values and the approximated ones.
• TrendLine This is the fitted series used to graph the trend line.
Tip
MicrosoftGraphActivityLogs
| where TimeGenerated between (ago(3d) .. ago(1h))
| summarize EventCount = count() by bin(TimeGenerated, 1h)
| render timechart
with (
title="Recent traffic patterns",
xtitle="Time",
ytitle="Requests",
legend=hidden
)
Now that we’ve learned the various aggregate functions available to us, you
will notice many other powerful functions that leverage these aggregate
functions. For example, the top-nested operator performs hierarchical
aggregation.
SigninLogs
| where TimeGenerated > ago(14d)
| top-nested 3 of AppDisplayName by count(), top-nested 3 of UserAgent by count()
| project AppDisplayName, UserAgent
FIGURE 2-43 The top three applications and top three UserAgents
Because we are asking for the top three applications by usage count, the
applications are the first level of the hierarchy, followed by the user agents.
This query asks, “What are the top three user agents by count for the top
three applications?” Also, you can add additional levels to the top-
nested hierarchy using the following aggregate functions:
• avg()
• count()
• dcount()
• max()
• min()
• percentile()
• percentilew()
• sum()
Summary
In this chapter, we added another fundamental skill to our KQL skill set—
the ability to aggregate data. Data filtering (discussed in Chapter 1) and
aggregating data will be the two most common things you do in your KQL
queries going forward. Even some of the most advanced queries in this
book will be built on top of these two concepts.
One of the most common things you’ll do is break your dataset up using
bins , typically by dates, to look at the data aggregated per week, day, or
hour. This helps you visualize this data in the various charts and use the
right chart to tell the data’s story. We also covered how to make your own
dataset series and the implications that can have on those charts.
Chapter 3. Unlocking Insights with Advanced
KQL Operators
• Use the subtle arts of query tuning and optimization to handle vast
datasets efficiently
In this chapter, we will delve into the world of Advanced KQL operators.
These operators enable intricate manipulations and data analysis that are
simply impossible with the basic operators alone. From pattern recognition
to statistical evaluations, Advanced KQL operators facilitate a higher level
of data understanding.
By the end of this chapter, the mysteries of Advanced KQL operators will
be unraveled, providing you with a robust set of tools to approach data in
ways you might never have thought possible.
KQL variables are used to store and reference values within a query. They
act as placeholders that can be assigned different values, such as constants
or calculated results, and then used throughout the query. This allows for
better organization, readability, and reusability of code.
Variables in KQL queries offer several advantages. They allow for the
creation of reusable code snippets, promote better code organization,
enhance query readability, and facilitate easier maintenance and debugging.
Additionally, variables can be used to parameterize queries, making them
more flexible and adaptable to different scenarios.
The syntax for creating constants in KQL is important because it allows you
to store values that remain constant throughout the query execution. This
can be useful for filtering data based on specific criteria, making it easier to
modify the query behavior by changing the constant value in a single place.
For example, you can use a constant variable to store the name of a region
and then use that variable in the where clause to filter the data based on the
region.
To create a constant variable in KQL, you use the let statement followed
by the variable name, an equal sign, and the value you want to assign.
Let's say we want to filter our data based on a specific region. We can use a
constant variable to store the region name and easily change it whenever
needed. Here's an example:
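The following is a minimal sketch of this idea; the SalesData table and its Region column are hypothetical placeholders:
let regionName = "West Europe";
SalesData
| where Region == regionName
Changing the value assigned to regionName is now the only edit needed to point the query at a different region.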
The KQL let operator is also important for calculating values. The let
operator improves query readability and reduces the risk of errors by
allowing you to refer to the calculated value without repeating the
calculation logic.
In addition to constants, you can use the let statement to create variables
holding calculated values. These values are derived from expressions or
functions and can be used in various parts of your query.
Let's say we want to calculate the time difference in seconds between two
timestamps. We can use a calculated value to store this result and reuse it
throughout the query:
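A minimal sketch of such a calculated value (the timestamps are arbitrary placeholders):
let startTime = datetime(2023-10-01T00:00:00Z);
let endTime = datetime(2023-10-01T01:30:00Z);
let diffInSeconds = (endTime - startTime) / 1s;
print diffInSeconds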
The let statement also allows you to define reusable functions. Functions
let you encapsulate complex logic once and reuse it across queries, which
improves readability and reduces the risk of errors.
For example, you can use let to create a function to format names. Let's
say we frequently need to concatenate first and last name values in our
data. We can create a function to simplify this task:
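A minimal sketch of such a function (the sample names are placeholders):
let formatName = (firstName: string, lastName: string) {
    strcat(firstName, " ", lastName)
};
print FullName = formatName("Ada", "Lovelace")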
• Code reusability Functions allow you to define complex logic once and
reuse it across multiple queries, improving code efficiency and reducing
redundancy.
TIP
KQL allows you to use multiple variables within a query. Defining and
referencing several variables with the let operator makes your queries more
dynamic, flexible, and easier to modify.
Let's say we want to filter our data based on multiple criteria, such as
country and city. We can use multiple variables to store these values and
easily modify them as needed:
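A minimal sketch, assuming a hypothetical Customers table with Country and City columns:
let targetCountry = "Australia";
let targetCity = "Sydney";
Customers
| where Country == targetCountry and City == targetCity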
Note
KQL lets you specify default values for function parameters. When a
parameter is not explicitly provided in the call, the default value is used
instead, which is handy when certain arguments are optional.
Let's say we have a function that calculates the time difference in days
between two timestamps. We can specify a default value for one of the
timestamps to handle cases where it is not provided:
let timeDiffInDays = (startDate: datetime, endDat
toscalar(endDate - startDate) / 1d
};
MyTable
| extend ElapsedDays = timeDiffInDays(StartTime,
Note
In addition to values and functions, the let statement can also be used to
create views—virtual tables based on the result set of a query—providing a
convenient way to organize and reuse data.
Let's say we frequently need to work with data from a specific region. We
can create a view that filters the data based on the region and reuse it in our
queries:
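A minimal sketch of such a view over the SigninLogs table; the region value used in the Location filter is an arbitrary example:
let RegionSignIns = view () {
    SigninLogs
    | where Location == "US"
};
RegionSignIns
| summarize count() by AppDisplayName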
Note
Let's say we have a complex query that involves computing a total count
and then using it multiple times. We can use the materialize()
function to cache and reuse the subquery results efficiently:
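A minimal sketch of this pattern, assuming we want each application's share of total sign-ins; the cached summary is evaluated once and reused twice:
let cachedCounts = materialize(
    SigninLogs
    | where TimeGenerated > ago(1d)
    | summarize Count = count() by AppDisplayName
);
let total = toscalar(cachedCounts | summarize sum(Count));
cachedCounts
| extend PctOfTotal = round(100.0 * Count / total, 2)
| sort by Count desc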
Note
The let statement allows you to define variables that can be used
throughout the query. These variables can be used to store values,
expressions, or even entire subqueries, which can then be referenced in
other parts of the query. This ability can be particularly useful when
working with complex queries because it allows you to break down the
query into smaller, more manageable parts, improving readability and
maintainability. Following are some best practices:
• Naming conventions for variables When naming variables in KQL, the
best practice is to use descriptive and meaningful names that convey their
purpose or value, promoting code readability and understanding.
The union operator can be a useful tool when you want to combine data
from different sources or tables or when you want to perform a union on
data that meets different criteria.
union allows you to combine data from multiple tables into a single
result set. Unlike the join operator, which combines columns into a
single row, the union operator simply appends rows from one table to
another. This is particularly useful when you want to merge datasets that
have similar structures but different records.
The syntax of the union operator is straightforward. It consists of the
keyword union followed by the table references you want to combine.
Here is the basic syntax:
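In its simplest form, with Table1 and Table2 as placeholders:
Table1
| union Table2
// or, listing the tables directly:
union Table1, Table2, Table3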
You can also specify additional parameters to modify the behavior of the
union operator. These parameters include kind , withsource , and
isfuzzy :
• kind This parameter determines how the columns are combined in the
result set. The inner option retains only the common columns to all
input tables, while the outer option includes all columns from any input
table. The default is outer .
The union operator in KQL allows you to combine the results of multiple
queries into a single result set, providing a powerful tool for uniting data
from different sources or tables. Here is an introduction to the basic usage
of the union operator in KQL.
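For example, assuming Sales_2022 and Sales_2023 tables with the same schema:
Sales_2022
| union Sales_2023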
This query will append the rows from Sales_2023 to the rows from
Sales_2022 and return the combined result set.
Sometimes, the tables you want to union may have columns with
different names. The union operator handles this situation by matching
columns by name; columns that exist in only one input table are still
included in the result:
Table1
| project Name, Age
Table2
| project FullName, YearsOld
Table1 | union Table2
In this example, Table1 has Name and Age columns, while Table2
has FullName and YearsOld columns. When we union these
tables, the columns will be combined as follows:
As you can see, the columns with different names are still included in the
result set; rows that come from a table without a given column are simply
left empty (null) for it.
The union operator allows you to filter and sort the unioned data. You
can use the where clause to filter the rows based on specific conditions
and the order by clause to sort the rows based on one or more columns:
Table1
| where Category == "Electronics"
Table2
| where Category == "Clothing"
You can also use let statements with the union operator to create
named variables for the tables you want to union . This can make your
query more readable and easier to maintain:
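A minimal sketch, reusing the Table1 and Table2 placeholders from above:
let electronics = Table1 | where Category == "Electronics";
let clothing = Table2 | where Category == "Clothing";
union electronics, clothing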
Tip
SigninLogs
| where TimeGenerated > ago(7d)
| project Id,ConditionalAccessPolicies, Status,Us
| where ConditionalAccessPolicies != "[]"
| mv-expand todynamic(ConditionalAccessPolicies)
| union (
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(7d)
| project Id,ConditionalAccessPolicies, Statu
| where ConditionalAccessPolicies != "[]"
| mv-expand todynamic(ConditionalAccessPolici
)
| where ConditionalAccessPolicies.enforcedSession
ConditionalAccessPolicies.enforcedSessionControls
| where ConditionalAccessPolicies.result !="repor
!="notApplied"
| extend SessionNotSatisfyResult = ConditionalAcc
| extend Result = case (SessionNotSatisfyResult c
'SignInTokenProtection' , 'Block','Allow')
| extend CADisplayName = ConditionalAccessPolicie
| extend CAId = ConditionalAccessPolicies.id
| summarize by Id,tostring(CAId),tostring(CADispl
| summarize Requests = count(), Block = countif(R
dcount(UserPrincipalName), BlockedUsers = dcounti
tostring(CADisplayName),tostring(CAId)
| extend PctAllowed = round(100.0 * Allow/(Allow+
| sort by Requests desc
• Reduce the number of columns in the result set by using the project
operator to select only the necessary columns.
• Use filters (the where clause) to limit the number of rows processed by
the union operator.
• Ensure that the unioned columns have compatible data types to avoid
potential errors.
There are always questions about where you want to use the union
operator versus where you want to use the join operator. Let's dig into
some of the differences to provide clarity.
While the union and join operators are both used to combine data
from multiple tables, they have some key differences:
• The union operator combines rows from different tables into a single
result set, while the join operator combines columns from different
tables into a single row.
• The union operator does not require a common column between the
tables, while the join operator relies on a common column for matching
records.
• The union operator appends rows from one table to another, while the
join operator combines rows based on matching values in the common
column(s).
Choosing between the union and join operators depends on the nature
of your data and the desired outcome of your query. The union operator
is the appropriate choice to combine rows from different tables or datasets.
On the other hand, to combine columns from different tables based on a
common column, the join operator is the way to go.
When working with joins in KQL, it's important to follow best practices to
optimize performance and ensure efficient query execution. Here are some
tips to keep in mind:
• Choose the appropriate join type Select the join type that best
suits your specific use case and requirements. Consider factors such as the
desired output, data volume, and performance implications.
• Use appropriate filters Apply filters to limit the data before performing
the join operation. This can significantly reduce the amount of data
processed and improve query performance.
• Consider table sizes Consider the sizes of the tables involved in the join
operation. If one table is significantly smaller than the other, use the smaller
table as the left (first) table of the join to optimize performance.
• Review and optimize query execution plans Monitor and analyze the
query execution plans to identify potential performance bottlenecks.
Consider using hints or other optimization techniques to improve query
performance.
TIP
let dc = IntuneDevices
| extend entra_DeviceID = tostring(ReferenceId);
let entraIDsignin = SigninLogs
| extend entra_DeviceID = tostring(DeviceDetail.d
entraIDsignin
| join kind=inner dc on entra_DeviceID
| extend authenticationMethod_ = tostring(parse_j
| extend succeeded_ = tostring(parse_json(Authent
| extend IntuneDeviceID = DeviceId
| extend trustType_ = tostring(DeviceDetail.trust
| where trustType_ == 'Azure AD joined'
| where ManagedBy == 'Intune'
| where Resource == "Microsoft.aadiam" and AppDis
| where succeeded_ == 'true'
| where authenticationMethod_== "Password" and su
| where AuthenticationRequirement == 'singleFacto
| where PrimaryUser != UserId
| summarize logins=count() by UserPrincipalName,
| render columnchart
Joining Data
Before diving into the different flavors of KQL joins, let's start with the
basics. The join operator in KQL allows you to merge rows from two or
more tables based on matching values in specified columns. join allows
you to combine data from different sources and create new relationships
between data points.
You need two tables with at least one column containing matching values to
perform a join . The join operator then matches the rows in these
tables based on the specified conditions and creates a new table with the
merged results.
It's important to note that the join operation in KQL is similar to the
join operation in SQL. However, KQL provides the following join types,
offering more flexibility and control over the merging process:
• Innerunique join
• Inner join
• Leftouter join
• Rightouter join
• Fullouter join
• Leftsemi join
• Rightsemi join
• Leftanti join
• Rightanti join
Innerunique
With an inner join , only the rows with matching values in the
specified columns are included in the resulting table. Let's continue with
our previous example, where we joined the Fruit and Preparation
tables, but this time, we will join them based on the number column:
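A minimal sketch of this join using inline datatables; the fruit and preparation values are invented for illustration:
let Fruit = datatable(number: int, fruit: string)
[
    1, "Apple",
    2, "Banana",
    3, "Pear"
];
let Preparation = datatable(number: int, preparation: string)
[
    1, "Slices",
    2, "Juice",
    4, "Jam"
];
Fruit
| join kind=inner (Preparation) on number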
The leftouter join allows you to include all rows from the left table
and only the matching rows from the right table. Even if no matching
values exist in the specified columns, the rows from the left table will still
be included in the resulting table.
Rightouter
Fullouter
The leftsemi join allows you to include all rows from the left table
with matching values in the specified columns while excluding the non-
matching rows from both tables. This means only the rows from the left
table with matches in the right table will be included in the resulting table.
To simplify the join operation with the leftsemi join , let's consider a
scenario where we have two tables: Fruit and Preparation . The
Fruit table contains information about different fruits, including their
names and corresponding numbers. The Preparation table contains
information about various preparations for these fruits. We want to join
these tables based on the common number column:
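Reusing the Fruit and Preparation datatables sketched earlier for the inner join, the leftsemi flavor might look like this:
Fruit
| join kind=leftsemi (Preparation) on number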
The resulting table will include the number and fruit columns, with
only the rows from the Fruit table with matching numbers in the
Preparation table. A leftsemi join allows us to simplify the
join operation and focus on the relevant rows in the left table.
Rightsemi
To find matches with the rightsemi join , let's continue with our
previous example of joining the Fruit and Preparation tables. We
will join the same tables based on the number column:
Leftanti
The leftanti join allows you to exclude the rows from the left table
with matching values in the specified columns while including all the rows
from both tables. This means that only the rows from the left table that do
not have matches in the right table will be included in the resulting table.
The resulting table will include the number and fruit columns, with
only the rows from the Fruit table that do not have matching numbers in
the Preparation table. A leftanti join allows us to exclude the
matching rows from the left table and focus on the non-matching rows.
Rightanti
To filter matches with the rightanti join , let's continue with our
previous example of joining the Fruit and Preparation tables. We
will join the same tables based on the number column, excluding the
matching rows from the right table:
TIP
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(3d)
| where SignInActivityId == 'tPcQvrtP4kirTjs98vmi
| join kind=leftouter (union SigninLogs, AADNonIn
AADManagedIdentitySignInLogs
| where TimeGenerated > ago(4d)
| summarize arg_max(TimeGenerated, *) by Uniq
)
on $left.SignInActivityId == $right.UniqueTok
| limit 100
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == "vm.azm.ms"
| where Namespace == "Processor"
| where Name == "UtilizationPercentage"
| summarize avg(Val) by bin(TimeGenerated, 5m), C
| join kind=leftouter (ComputerGroup) on Computer
| where isnotempty(Computer1)
| sort by avg_Val desc nulls first
This query retrieves the average processor utilization for each computer and
joins the results with a list of specified computers. The output provides
valuable insights into the applications and servers experiencing high
processor utilization.
In a more advanced use case, you may want to query logs for selected
applications or servers that have different processor utilization thresholds.
This requires dynamically updating the thresholds without modifying the
KQL query itself. To achieve this, you can leverage the externaldata
operator with an external data file, such as a comma-separated value (CSV)
file, to store and retrieve threshold values. The following query
demonstrates the enhanced use case:
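A minimal sketch of this approach, assuming a CSV of per-computer CPU thresholds hosted in a storage account; the URL and the Computer/CpuThreshold column names are placeholders:
let Thresholds = externaldata (Computer: string, CpuThreshold: real)
[
    @"https://<storageaccount>.blob.core.windows.net/<container>/thresholds.csv"
]
with (format = "csv", ignoreFirstRecord = true);
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == "vm.azm.ms" and Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize AvgCpu = avg(Val) by bin(TimeGenerated, 5m), Computer
| join kind=inner (Thresholds) on Computer
| where AvgCpu > CpuThreshold
Updating the CSV file changes the thresholds without touching the query.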
• Ensure that the external storage artifact is accessible, and the connection
string is accurate.
• Validate and sanitize the data retrieved from external sources to avoid
security risks and maintain data integrity.
Before we dive into the details of querying IP ranges using KQL, it's
essential to understand IP-prefix notation. IP-prefix notation, also known as
CIDR (Classless Inter-Domain Routing) notation, concisely represents an IP
address and its associated network mask. It consists of the base IP address
followed by a slash ( / ) and the prefix length.
The IPv4 prefix length ranges from 0 to 32, while the IPv6 prefix length
ranges from 0 to 128. The prefix length denotes the number of leading 1
bits in the netmask, determining the range of IP addresses belonging to the
network.
ipv4_is_in_range(Ipv4Address, Ipv4Range)
datatable(ip_address:string, ip_range:string)
[
'192.168.1.1', '192.168.1.1', // Equal
'192.168.1.1', '192.168.1.255/24', // 24 bit
]
| extend result = ipv4_is_in_range(ip_address, ip_range)
ipv4_is_match() Function
• ip1 and ip2 are strings representing the IPv4 addresses to compare.
datatable(ip1_string:string, ip2_string:string)
[
'192.168.1.0', '192.168.1.0', // Equal
'192.168.1.1/24', '192.168.1.255', // 24 bit
'192.168.1.1', '192.168.1.255/24', // 24 bit
'192.168.1.1/30', '192.168.1.255/24', // 24 bit
]
| extend result = ipv4_is_match(ip1_string, ip2_string)
ipv6_compare() Function
• 1 if it is greater
• -1 if it is less
• ip1 and ip2 are strings representing the IPv6 or IPv4 addresses to
compare.
datatable(ip1_string:string, ip2_string:string, prefix:long)
[
'192.168.1.1', '192.168.1.0', 31, // 31 bit
'192.168.1.1/24', '192.168.1.255', 31, // 24 bit
'192.168.1.1', '192.168.1.255', 24, // 24 bit
]
| extend result = ipv6_compare(ip1_string, ip2_string, prefix)
In the above example, we compared IPv6 and IPv4 addresses using the
ipv6_compare() function. The result table shows the comparison
results based on the specified IP-prefix notation and prefix length.
ipv6_is_match() Function
• ip1 and ip2 are strings representing the IPv6 or IPv4 addresses to
compare.
datatable(ip1_string:string, ip2_string:string)
[
// IPv4 are compared as IPv6 addresses
'192.168.1.1', '192.168.1.1', // Equal
'192.168.1.1/24', '192.168.1.255', // 24 bit
'192.168.1.1', '192.168.1.255/24', // 24 bit
'192.168.1.1/30', '192.168.1.255/24', // 24 bit
// IPv6 cases
'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446
'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c
'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446
'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c
// Mixed case of IPv4 and IPv6
'192.168.1.1', '::ffff:c0a8:0101', // Equal
'192.168.1.1/24', '::ffff:c0a8:01ff', // 24 bi
'::ffff:c0a8:0101', '192.168.1.255/24', // 24 bi
'::192.168.1.1/30', '192.168.1.255/24', // 24 bi
]
| extend result = ipv6_is_match(ip1_string, ip2_string)
The above example compared IPv6 and IPv4 addresses using the
ipv6_is_match() function. The result table showcased the
comparison results based on the specified IP-prefix notation and prefix
length.
TIP
These ranges are reserved and should not be used on the public internet.
Any IP address falling within these ranges is considered a private network
address.
ipv4_is_private(ip)
The ip parameter represents the IPv4 address that you want to check for
private network membership. The function returns a boolean value:
• false if it doesn't
ipv4_is_private('192.168.1.1/24') == true
ipv4_is_private('10.1.2.3/24') == true
ipv4_is_private('202.1.2.3') == false
ipv4_is_private("127.0.0.1") == false
datatable(ip_string:string) [
'10.1.2.3',
'192.168.1.1/24',
'127.0.0.1',
]
| extend result = ipv4_is_private(ip_string)
The above query created a datatable with three IP addresses. We then
extended the datatable with a new column called result , which used the
ipv4_is_private() function to check the private network
membership of each IP address. The output indicates whether each IP
address belongs to a private network.
IP-Prefix Notation
In IPv4, the prefix length ranges from 0 to 32, while in IPv6, it ranges from
0 to 128. The larger the prefix length, the smaller the range of IP addresses
that belong to the network. For example, an IPv4 prefix length of 32
represents a single IP address.
Note
geo_info_from_ip_address(IpAddress)
Note
print ip_location=geo_info_from_ip_address('20.53
mv-expand Operator
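A minimal sketch that produces a result like the one shown below, expanding a single dynamic column:
datatable (a: int, b: dynamic)
[
    1, dynamic([10, 20]),
    2, dynamic(["a", "b"])
]
| mv-expand b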
a b
1 10
1 20
2 a
2 b
In this example, the b column is expanded, creating new rows for each
element in the dynamic array.
SINGLE COLUMN—BAG EXPANSION
a b
1 {"prop1": "a1"}
1 {"prop2": "b1"}
2 {"prop1": "a2"}
2 {"prop2": "b2"}
Let's expand a bag into key-value pairs using the mv-expand operator
and extend to create new columns:
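A minimal sketch that produces the result shown below, using the bagexpansion=array option of mv-expand:
datatable (a: int, b: dynamic)
[
    1, dynamic({"prop1": "a1"}),
    1, dynamic({"prop2": "b1"}),
    2, dynamic({"prop1": "a2"}),
    2, dynamic({"prop2": "b2"})
]
| mv-expand bagexpansion=array b
| extend key = tostring(b[0]), val = tostring(b[1])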
a b key val
1 ["prop1","a1"] prop1 a1
1 ["prop2","b1"] prop2 b1
2 ["prop1","a2"] prop1 a2
2 ["prop2","b2"] prop2 b2
In this example, the bag is expanded into key-value pairs, allowing uniform
access to the properties.
ZIPPED TWO COLUMNS
a b c
1 {"prop1":"a"} 5
1 {"prop1":"a"} 4
1 {"prop1":"a"} 3
1 {"prop2":"b"} 5
1 {"prop2":"b"} 4
1 {"prop2":"b"} 3
We can expand one after the other to get a Cartesian product of expanding
two columns.
datatable (a: int, b: dynamic, c: dynamic)
[
1, dynamic({"prop1": "a", "prop2": "b"}), dyn
]
| mv-expand b
| mv-expand c
a b c
1 {"prop1": "a"} 5
1 {"prop1": "a"} 6
1 {"prop2": "b"} 5
1 {"prop2": "b"} 6
CONVERT OUTPUT
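A minimal sketch of this technique, with invented sample arrays:
datatable (a: int, b: dynamic, c: dynamic)
[
    1, dynamic([1, 2]), dynamic([3, 4])
]
| mv-expand b to typeof(int), c to typeof(int)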
In this example, the b and c columns are expanded and explicitly cast to
the int data type using the to typeof() clause.
parse Operator
The parse operator is another powerful tool that allows you to extract
specific parts of a string based on a defined pattern. Unlike regular
expressions, which can be complex and challenging to work with, the
parse operator provides a simpler and more intuitive approach to string
extraction. It is particularly useful when dealing with well-formatted strings
that have recurring text patterns.
The parse operator takes the name of the column to parse , followed
by the with keyword and the pattern to match in the string. You can also
provide a default result to handle cases where the pattern doesn't match.
In the following sections, we’ll explore some examples to see how the
parse operator can be used effectively.
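Consider this sketch; the sample log lines and the EventText column name are invented for illustration:
datatable (EventText: string)
[
    "RequestWorker completed call, GET /api/orders/12",
    "RequestWorker completed call, GET /api/status"
]
| parse EventText with * "GET " Data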
In this example, the parse operator extracts the data following the GET
text and places it in a new column called Data .
Message ID
Executed 'Function2' (Failed, Id=123, Duration=50
Executed 'Function2' (Failed, Id=456, Duration=75
On the other hand, the parse operator is handy when you have well-
formatted strings with recurring patterns and need to extract specific parts.
It simplifies the extraction process and avoids the complexity of regular
expressions.
Note
TIP
base64_decode_tostring() Function
base64_decode_tostring(base64_string)
It’s important to note that when decoding a base64 string, there might be
cases where the resulting UTF-8 encoding is invalid. In such cases, the
base64_decode_tostring() function returns null . Let’s
consider an example where we try to decode a base64 string generated from
invalid UTF-8 encoding:
On the other hand, if you need to encode a string into base64 format, you
can use the base64_encode_tostring() function. This function
takes a UTF-8 string as input and returns the base64-encoded representation
of the string.
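A quick round trip illustrates both functions:
print encoded = base64_encode_tostring("Hello, KQL!")
| extend decoded = base64_decode_tostring(encoded)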
• Data type considerations Ensure that the data type of the base64-
encoded string field is consistent throughout your dataset. Using consistent
data types can facilitate efficient decoding and processing.
In this section, let's focus on JSON data, a common KQL data type. Once
you've ingested JSON data into Azure Data Explorer or another data
platform, you can use Kusto Query Language (KQL) to query, manipulate,
and analyze it.
SensorData
| extend Name = extract_json("$.name", Data)
| extend Index = extract_json("$.index", Data)
In this example, we extract the name and index properties from the
SensorData table's Data column using the extract_json
function. This enables you to work with specific JSON properties in your
queries.
When querying JSON data, you can apply filters to narrow down your
results. For example, if you want to retrieve data where the
temperature property is above a certain threshold, you can use the
where operator:
SensorData
| where Temperature > 25
This query filters the SensorData table to only include records where
the Temperature property is greater than 25 . By applying filters, you
can focus on the specific data that meets your criteria.
Early Filtering
SensorData
| where EventID == 8002
| where EventData !has "%SYSTEM32"
| extend Details = parse_xml(EventData)
| extend FilePath = tostring(Details.UserData.Rul
| extend FileHash = tostring(Details.UserData.Rul
| where FileHash != "" and FilePath !startswith "
| summarize count() by FileHash, FilePath
In this query, the where conditions are applied before parsing XML,
filtering out irrelevant records early in the process.
Full parsing of complex JSON objects can consume significant CPU and
memory resources. When you only need a few parameters from the JSON
data, it's more efficient to parse them as strings using the parse operator
or other text-parsing techniques. This approach can significantly boost
performance, especially when dealing with large JSON datasets.
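For example, instead of fully parsing the payload, you can pull a single property out of the text; a sketch using extract() as one such text-parsing technique, assuming the SensorData table's Data column holds JSON text:
SensorData
// extract just the "name" property as a string instead of parsing the full payload
| extend Name = extract(@'"name"\s*:\s*"([^"]*)"', 1, Data)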
Advanced JSON Processing Techniques
When working with JSON arrays, you can use functions like mv-expand
to expand array elements into separate records. This allows you to perform
operations on individual array elements:
SensorData
| mv-expand Data
| extend Name = extract_json("$.name", Data)
In this query, the mv-expand function expands the JSON array elements
in the Data column, enabling you to extract specific properties from each
array element.
TIP
KQL supports querying and manipulating nested JSON objects. You can
access nested properties using dot notation or JSONPath-like expressions:
SensorData
| extend NestedProperty = Data.NestedObject.Neste
KQL allows you to join JSON data from multiple tables using the join
operator, meaning you can combine data from different JSON sources based
on common properties:
Table1
| join kind=inner (Table2) on $left.CommonPropert
By joining JSON data, you can perform more complex analysis and derive
insights from multiple data sources.
Time-Series Analysis
In the era of cloud services and IoT devices, businesses generate massive
amounts of telemetry data that holds valuable insights that can be leveraged
to monitor service health, track physical production processes, and identify
usage trends. However, analyzing this data can be challenging without the
right tools and techniques. This is where time-series analysis comes into
play. By utilizing the power of the Kusto Query Language (KQL),
businesses can unlock the full potential of their time-series data.
In this section, we will delve into the world of time-series analysis using
KQL. We will explore the process of creating and analyzing time-series and
highlight the key functions and operators that KQL offers for time-series
manipulation. By the end, you will have a solid understanding of how to
harness the power of KQL to gain valuable insights from your time-series
data.
The first step in time-series analysis is to transform your raw telemetry data
into a structured format suitable for analysis. KQL provides the make-
series operator to simplify this process. It allows you to create a set of
time series by partitioning the data based on specific dimensions and
aggregating the values within each partition.
Let's consider an example where we have a demo_make_series1 table
containing web service traffic records. To create a time series representing
the traffic count partitioned by the operating system (OS), we can use the
following KQL query:
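A sketch along these lines, assuming the demo_make_series1 table exposes TimeStamp and OsVer columns:
demo_make_series1
| make-series TrafficCount = count() default = 0 on TimeStamp step 1h by OsVer
| render timechart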
TIP
SigninLogs
| where ResultType == 0 or ResultType == 50074
// Filter out the AADC Sync Account
| where SignInIdentifier !startswith "Sync_"
// Filter out Sign-in Events from ADFS Connect He
| where SourceSystem == "Azure AD"
| extend AuthenticationDetails = todynamic(Authen
| mv-expand AuthenticationDetails
| extend authenticationMethod_ = tostring(parse_j
// Filter out sign-in events without relevant Aut
| where authenticationMethod_ != "Previously sati
| make-series SignIns = count() default = 0 on Ti
Once you have created the time series, KQL provides a range of functions
to process and analyze them. These functions enable you to identify
patterns, detect anomalies, and perform regression analysis on your time-
series data.
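For instance, series_decompose_anomalies() can flag unusual points in a series created with make-series; a sketch building on the traffic series above, with the same table and column assumptions:
demo_make_series1
| make-series TrafficCount = count() default = 0 on TimeStamp step 1h by OsVer
| extend (AnomalyFlags, AnomalyScore, Baseline) = series_decompose_anomalies(TrafficCount, 1.5)
| render anomalychart with (anomalycolumns = AnomalyFlags)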
TIP
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(3d)
| where ServicePrincipalId == '9d6399dd-e9f6-4271
| extend path = replace_string(replace_string(rep
@'(\/)+','//'),'v1.0/',''),'beta/','')
| extend UriSegments = extract_all(@'\/([A-z2]+
| extend OperationResource = strcat_array(UriSegm
RequestMethod, OperationResource
Note
Note
By leveraging the RE2 syntax and Microsoft's RE2 library, you can harness
the full power of regular expressions in KQL, enabling you to easily
perform complex searches and data extractions.
let Regex=@"(?i)attrib.*\+h\\";
let TestString="attribute +h\\";
print(iif(TestString matches regex Regex, true, false))
In this example, the Regex variable holds the regex pattern, and the
TestString variable contains the string you want to test against the
pattern. The print statement checks if the TestString matches the
regex pattern and returns either true or false .
By testing your regex patterns in KQL, you can ensure their accuracy
and reliability before utilizing them in your production queries.
Enhancing Detection Rules and Migrating from Other SIEM Tools
bin() Function
Data analysts often need to aggregate data and calculate summary statistics
to gain meaningful insights. One powerful tool for data aggregation in the
Kusto Query Language (KQL) is the bin() function. We will explore the
various applications of the bin() function and learn how to leverage its
capabilities to analyze and summarize data effectively.
bin(value, roundTo)
Here, value represents the data point that needs to be rounded down, and
roundTo indicates the bin size that divides the value. The bin()
function returns the nearest multiple of the roundTo value. Null values,
null bin size, or a negative bin size will result in null .
Numeric bins
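A minimal sketch with invented revenue values:
datatable(Revenue: long)
[
    1250, 3480, 2100, 890, 4620, 1730
]
| summarize TotalRevenue = sum(Revenue) by RevenueBin = bin(Revenue, 1000)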
The bin() function divides the revenue values into 1000 -dollar bin s.
The summarize operator calculates the total revenue for each bin . The
output of this query will provide insights into revenue distribution across
different bin s, helping us identify trends and patterns in the data.
Timespan bins
datatable(CallDuration: timespan)
[
time(0h, 2m, 30s),
time(0h, 7m, 45s),
time(0h, 4m, 20s),
time(0h, 10m, 15s),
time(0h, 1m, 30s)
]
| summarize Count = count() by bin(CallDuration, 5m)
In this query, the bin() function divides the call durations into 5-minute
bin s. The summarize operator calculates the count of calls for each
bin . This analysis can help us identify the distribution of call durations
and uncover any patterns or anomalies in the data.
Datetime bins
datatable(OrderTime: datetime)
[
datetime(2023-01-01 10:00:00),
datetime(2023-01-01 14:30:00),
datetime(2023-01-02 11:45:00),
datetime(2023-01-02 13:15:00),
datetime(2023-01-03 09:20:00)
]
| summarize Count = count() by bin(OrderTime, 1d)
In this query, the bin() function divides the order timestamps into daily
bin s. The summarize operator calculates the count of orders for each
bin . This analysis provides insights into the daily order volume, helping
us understand customer behavior and plan inventory accordingly.
Sometimes, there may be missing data points for certain bin s in a table.
To ensure a complete representation of all bin s, we can pad the table with
null values for the missing bin s. Let's consider an example where we
have a dataset of website visits, and we want to analyze the number of visits
for each day of the week:
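A sketch of the pattern described next, assuming a hypothetical WebsiteVisits table with a TimeGenerated column:
let start = startofday(ago(6d));
let stop = startofday(now());
let DailyVisits = WebsiteVisits
    | where TimeGenerated between (start .. stop)
    | summarize Visits = count() by Day = bin(TimeGenerated, 1d);
range Day from start to stop step 1d
| join kind=leftouter (DailyVisits) on Day
| project Day, Visits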
In this query, we first used the summarize operator with the bin()
function to calculate the total visits for each day. Then, we used the
range operator to generate a table with all the dates in the desired range.
Finally, we joined the generated table with the summarized data using a
leftouter join to include null values for the missing days,
ensuring all days of the week are represented in the output, even if there
were no visits on certain days.
Functions are reusable subqueries or query parts that can be defined as part
of the query itself or stored as part of the database metadata. Functions are
invoked through a name, provided with input arguments, and produce a
single value based on their body. They can be categorized into two types:
built-in functions and user-defined functions.
• The function name must follow the same identifier naming rules as
other Kusto entities.
• The name of the function should be unique within its scope of definition.
To invoke a function that doesn't require any arguments, simply call the
function's name followed by parentheses:
let helloWorld = () {
"Hello, World!"
};
print helloWorld()
User-defined functions in Kusto can have default values for their scalar
input arguments. Default values are specified after the argument type and
are used when the argument is not provided during the function invocation:
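The greetUser function the next paragraph refers to would look something like this sketch:
let greetUser = (name: string = "Guest") {
    strcat("Hello, ", name, "!")
};
print DefaultGreeting = greetUser(), NamedGreeting = greetUser("Mark")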
In this case, the greetUser function has a default value for the name
argument, which is "Guest" . If the function is invoked without
providing a value for name , it will use the default value and return the
"Hello, Guest!" string.
Materialize Function
In the world of data analysis and query optimization, finding efficient ways
to speed up queries and improve performance is crucial. One powerful tool
that can help achieve this is the KQL materialize function.
materialize(expression)
NOTE
Summary
Chapter 4. Operational Excellence with KQL
• Learn to proactively detect and mitigate security threats, secure the cloud
infrastructure, and enhance incident response capabilities
• Identify the steps for creating and executing advanced hunting queries
with KQL
• Leverage sample queries and online resources to learn from experts and
improve skills
Originally designed for Azure Data Explorer (ADX), KQL has gained
popularity in various IT operations scenarios, including advanced hunting
and cloud security. With its intuitive syntax and extensive set of operators
and statements, KQL enables IT professionals to extract valuable insights
and perform advanced analytics on their data.
AzureDiagnostics
| where Resource =~ "varund-qpi-demo"
| where ResourceProvider =="MICROSOFT.DBFORPOSTGR
| where Category == "PostgreSQLLogs"
| where TimeGenerated >= ago(2d)
| where Message contains "disconnection: session
| extend pgmessage = tostring(split(Message, "dis
| extend myuser = tostring(split(tostring(split(p
| extend hours = todecimal(substring(pgmessage, 0
| extend minutes = todecimal(substring(pgmessage,
| extend seconds = todecimal(substring(pgmessage,
| extend milliseconds = todecimal(substring(pgmes
| extend connection_life_seconds = hours*60*60+mi
| where myuser != 'azuresu'
| extend connection_type = case(connection_life_s
connection_life_seconds between (60 .. 1200) , st
strcat("Long Live Connections"), "")
| summarize max(connection_life_seconds) by TimeG
| render timechart
To get started with KQL, you need to familiarize yourself with the syntax,
operators, and statements. Microsoft provides comprehensive
documentation and resources to help you learn KQL. Additionally, you can
leverage online communities and forums to seek guidance and connect with
experts in the field. The next sections will delve deeper into the practical
applications of KQL in advanced hunting and cloud security.
• project and top allow you to select and limit the columns and rows
in the query results
The following query combines process and network event data to identify
PowerShell execution events involving suspicious downloads or
commands:
union DeviceNetworkEvents
| where Timestamp > ago(1d)
| summarize count() by RemoteIP
| where count_ > 1000
| project RemoteIP, count_
KQL can also be used for infrastructure and application scanning in the
cloud. IT professionals can identify potential security misconfigurations and
vulnerabilities by querying infrastructure-as-code templates and
configuration files. KQL enables them to search for specific patterns or
keywords that indicate insecure configurations or weak access controls.
Additionally, KQL can scan application logs and performance metrics,
helping organizations proactively detect and respond to security incidents in
real-time.
TIP
First query:
AzureMetrics
| where ResourceProvider == "MICROSOFT
| where TimeGenerated >=ago(60min)
| where MetricName in ('deadlock')
| parse _ResourceId with * "/microsoft
| summarize Deadlock_max_60Mins = max(M
Second query:
AzureMetrics
| where ResourceProvider == "MICROSOFT
| where TimeGenerated >= ago(60min)
| where MetricName in ('cpu_percent')
| parse _ResourceId with * "/microsoft
| summarize CPU_Maximum_last15mins = ma
CPU_Average_last15mins = avg(Average) b
TIP
Once an ADX cluster is set up, professionals can ingest data from various
sources, such as Azure Blob Storage or log files, using KQL. Ingestion
allows them to load data into tables within the cluster, making it available
for querying and analysis. By writing KQL queries, professionals can
explore the ingested data, filter and aggregate it based on specific criteria,
and derive meaningful insights. KQL's versatility and flexibility enable
professionals to manipulate and transform data, facilitating in-depth
exploration and analysis.
• Use efficient filters Apply filters early in the query to reduce the amount
of data processed.
• Limit result sets Use the top operator to limit the number of rows
returned in the query results.
• Understand data types Familiarize yourself with the different data types
supported by KQL, such as datetime , string , bool , int , and
long .
• Handle datetime data Use datetime functions to manipulate and
compare timestamps effectively.
• Validate input data Ensure that input data matches the expected data type
to avoid errors and inconsistencies in query results.
• Generate alerts and reports Configure alerts and generate reports based
on KQL query results to facilitate incident response and compliance
reporting.
TIP
First query:
let today = SigninLogs
| where TimeGenerated > ago(1h) // Quer
| project TimeGenerated, UserPrincipalN
"failure")
// Optionally filter by a specific appl
//| where AppDisplayName == **APP NAME*
| summarize success = countif(status ==
1h) // hourly failure rate
| project TimeGenerated, failureRate =
| sort by TimeGenerated desc
| serialize rowNumber = row_number();
let yesterday = SigninLogs
| where TimeGenerated between((ago(1h)
the same time yesterday
| project TimeGenerated, UserPrincipalN
"failure")
// Optionally filter by a specific appl
//| where AppDisplayName == **APP NAME*
| summarize success = countif(status ==
1h) // hourly failure rate at same time
| project TimeGenerated, failureRateYes
| sort by TimeGenerated desc
| serialize rowNumber = row_number();
today
| join (yesterday) on rowNumber // join
| project TimeGenerated, failureRate, f
// Set threshold to be the percent diff
yesterday
// Day variable is the number of days s
because large variability in traffic is
| extend day = dayofweek(now())
| where day != time(6.00:00:00) // excl
| where day != time(0.00:00:00) // excl
| where day != time(1.00:00:00) // excl
| where abs(failureRate - failureRateYe
Second query:
To further advance your skills in using KQL for IT operations, consider the
following next steps:
• Social media Follow industry experts and thought leaders on social media
platforms to stay updated on the latest trends and developments in KQL and
IT operations.
• Sources The types of metric and log data to be collected and sent to the
specified destinations
• Destinations The target location for storing the collected data, such as a
Log Analytics workspace or Azure Storage account
Note
By and large, enabling Diagnostic Settings for each Azure service remains
mostly the same. There are some variances between services, but those
variances are generally concerned with where the option resides in the
lefthand menu for that specific service or if a post-creation configuration
needs to be made.
To create a diagnostic setting for an Azure service, you can follow these
steps:
1. In the Azure portal, navigate to the service for which you want to create a
diagnostic setting.
2. In the service's left-hand menu, select Diagnostic Settings (usually under
Monitoring).
3. Click the +Add Diagnostic Setting button at the top of the page.
4. Give your diagnostic setting a name and choose the destination where
you want to send the logs and metrics. You can choose to send them to a
Log Analytics workspace, an Event Hub, or a storage account.
5. Select the logs and metrics you want to send to the destination by
checking the boxes next to their names.
TIP
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(3d)
| where ResponseStatusCode == 429
| extend path = replace_string(replace_string(rep
@'(\/)+','//'),'v1.0/',''),'beta/','')
| extend UriSegments = extract_all(@'\/([A-z2]+
| extend OperationResource = strcat_array(UriSegm
OperationResource, RequestMethod
| sort by RateLimitedCount desc
| limit 100
Using KQL for Microsoft Intune for Diagnostics and
Compliance
3. Click the Create button to start the Log Analytics workspace creation
process.
5. Enter a name for the Log Analytics workspace, ensuring it meets the
naming requirements.
6. Select the Azure Region where the workspace data will be stored.
7. Finally, provide any necessary tags for better resource management and
click Review + Create to validate the configurations.
8. Once validated, click Create to initiate the creation of the Log Analytics
workspace.
3. Click the Add Diagnostic Setting link to add the Export option to the Log
Analytics workspace.
5. Select the desired Log Categories from the available options, including
AuditLogs , OperationalLogs, DeviceComplianceOrg, and Devices .
6. Choose the appropriate destination for the exported data, such as the Log
Analytics workspace.
7. Select the Azure Subscription and Log Analytics workspace from the
respective dropdown lists.
Intune Audit Logs provide valuable insights into policy and settings
changes within the Microsoft Intune environment. By utilizing KQL
queries, administrators can dive deep into these logs and better understand
who made what changes and when. The IntuneAuditLogs table is
particularly useful for investigating policy changes and user activities.
IntuneAuditLogs
| project-rename User=Identity, Change=OperationName
| project TimeGenerated, Change, User
| summarize count() by User
| render columnchart
The following query helps identify changes to policy settings by parsing the
Properties column and extracting the relevant information. The
ModifiedProperties column provides details about the changes
made:
IntuneAuditLogs
| where TimeGenerated >= ago(30d)
| where OperationName !contains "Assignment"
| parse Properties with * ',"TargetDisplayNames"
| parse Properties with * '"TargetDisplayNames":[
'],'*
| project TimeGenerated, Identity, Object, Operat
TIP
The following query focuses on a specific policy, DJ-1, and captures group
assignment changes related to that policy. By parsing the
GroupAssignmentChanges column, the query extracts the change
type (add or remove) and provides a timeline of the changes.
IntuneAuditLogs
| where OperationName contains "Assignment"
| parse Properties with * '"TargetDisplayNames":[
| where IntuneProperty == "DJ-1"
| parse GroupAssignmentChanges with * 'New":"' Ch
| project TimeGenerated, Identity, Policy=IntuneP
Incident Management and Automation
IntuneDeviceComplianceOrg
| where isnotempty(DeviceHealthThreatLevel)
| where ComplianceState != "Compliant"
| project TimeGenerated, ComplianceState, DeviceN
| summarize arg_max(TimeGenerated, *) by DeviceId
Kusto Query Language (KQL) is a powerful tool for advanced hunting, and
it offers different modes to cater to users with varying levels of expertise.
Following are three modes that users can choose from to get started with
KQL:
1. Guided Mode: Query Builder If you are new to KQL or prefer a more
guided approach, you can start with the query builder in Guided Mode. The
query builder allows you to craft meaningful hunting queries without
having in-depth knowledge of KQL or the data schema. It provides a user-
friendly interface where you can select filters, operators, and values to build
your query step by step.
2. Advanced Mode: KQL Queries For those familiar with KQL and
seeking more control over their queries, the Advanced Mode allows you to
write custom KQL queries from scratch. In this mode, you have complete
freedom to leverage the full power of KQL, using its rich set of operators,
functions, and statements to construct complex queries that meet your
specific hunting requirements. The Advanced Hunting window is shown in
Figure 4-1.
To help you get started with advanced hunting, Microsoft provides a range
of sample queries in Query Builder that you can use as a starting point.
These sample queries cover various threat scenarios and can be customized
to suit your specific needs. By examining and editing these sample queries,
you can learn the syntax and structure of KQL and gain a deeper
understanding of how to construct effective hunting queries. Figure 4-3
shows how to locate the included sample queries in the Advanced Hunting
tool.
FIGURE 4-3 Advanced Hunting makes it easy to locate the stack of sample
queries available to use and modify
• Use time-based filters Leverage the time range filters in your queries to
analyze data within specific timeframes. This allows you to identify recent
threats or patterns of suspicious activity.
Note
Azure Monitor workbooks are invaluable for visualizing and analyzing data
within the Azure portal. With the ability to combine multiple data sources
and leverage the power of KQL (Kusto Query Language), Workbooks
provide a flexible canvas for creating rich reports and interactive
experiences.
Azure Monitor workbooks are a powerful Azure portal tool that allows
users to create rich visual reports and interactive experiences by combining
multiple data sources. With workbooks, users can leverage KQL queries to
extract insights and visualize data from various Azure services.
• Multiple data sources Workbooks can query data from various Azure
sources, including logs, metrics, Azure Resource Graph, Azure Resource
Manager, Azure Data Explorer, JSON, and custom endpoints.
The Workbooks Gallery is where you can find all the saved workbooks and
templates for your workspace. It is organized into different tabs, such as
All, Workbooks, Public Templates, and My Templates, to help you easily
sort and manage your workbooks.
To create a new Azure workbook, click the New button in the Workbooks
Gallery, as shown in Figure 4-4. You can start with an empty template or
choose from existing templates to customize based on your needs.
FIGURE 4-4 Creating a new workbook in Azure Monitor.
Azure Monitor workbooks can query data from a wide range of Azure data
sources. These sources include logs from Azure Monitor Logs (Application
Insights resources and Log Analytics workspaces), resource-centric data
such as activity logs, metrics from Azure Monitor, Azure Resource Graph,
Azure Resource Manager, Azure Data Explorer, JSON data, custom
endpoints, and more. Each data source provides unique information and
insights that can be visualized and analyzed within Azure workbooks.
Logs are a fundamental data source in Azure Monitor workbooks, and KQL
(Kusto Query Language) is the query language used to extract insights from
log data. KQL allows users to perform powerful queries on log data, filter
and aggregate data, and create visualizations based on the results. Users can
write KQL queries to query logs from Azure Monitor Logs, Application
Insights, and Log Analytics workspaces, allowing for deep analysis and
troubleshooting of applications and infrastructure.
Azure Data Explorer, also known as Kusto, is a fast and highly scalable data
exploration service that allows users to query and analyze large volumes of
data. Azure workbooks now support querying data from Azure Data
Explorer clusters using the powerful Kusto query language. Users can
specify the cluster name and region and then write KQL queries to retrieve
the desired data. The results can be visualized within Azure workbooks to
gain insights into the data stored in Azure Data Explorer.
Azure workbooks can query data from any external source using the JSON
provider and custom endpoints. The JSON provider allows users to create
query results from static JSON content, which can be useful for creating
dropdown parameters of static values. Custom endpoints enable users to
bring data from external sources into their workbooks, allowing for a more
comprehensive analysis of data that lives outside Azure. Users can specify
the necessary parameters, such as the HTTP method, URL, headers, URL
parameters, and body, to retrieve data from custom endpoints.
Charts and graphs are powerful visualizations that enable users to represent
data in a graphical format, making it easier to understand patterns and
trends. Azure workbooks provide a wide range of chart types, including line
charts, bar charts, pie charts, area charts, scatter plots, and more. Users can
customize the appearance of charts by selecting different colors, legends,
axes, and other settings to create visually appealing visualizations.
Grids and tables are commonly used to present data in a tabular format.
Azure workbooks allow users to create interactive grids and tables that can
be sorted, searched, and filtered.
Users can customize the appearance of grids by styling columns as
heatmaps or bars, adjusting column widths, and applying custom formatting
to cell values. Grids and tables provide a structured and organized way to
present data and enable users to explore and analyze data systematically.
In addition to text, charts, and grids, Azure workbooks offer other unique
visualizations, such as tiles, trees, and maps. Tiles provide a compact way
to display key metrics and summaries. Trees help visualize hierarchical data
structures, such as resource groups and their associated resources. Maps
allow users to plot data on a geographical map, providing insights based on
location. These visualizations add depth and variety to the reports, allowing
users to present data in different formats depending on the nature of the
information.
Following are some advanced techniques for working with KQL workbook
queries:
• Write efficient KQL queries Efficiency is crucial when working with large
datasets and complex queries. When optimizing query performance, it is
important to consider factors such as query structure, filtering, aggregation,
and data sampling. By writing efficient KQL queries, users can reduce query
execution time and improve the overall performance of their workbooks.
• Carefully query logs Careful consideration is essential when querying
logs to ensure optimal performance and usability. Some best practices for
querying logs include using the smallest possible time ranges, protecting
against missing columns and tables, and utilizing fuzzy union and
parameter queries. These practices help streamline queries, protect against
errors, and improve the overall user experience when working with log
data.
Heatmaps and spark bars are powerful visualization techniques that can be
applied to grid visualizations in Azure workbooks. Heatmaps use color
gradients to represent data values, allowing users to quickly identify
patterns and anomalies. Spark bars provide a compact way to display trends
and variations in data. By leveraging heatmaps and spark bars, users can
enhance the visual impact of their grid visualizations and convey insights
more effectively.
Following are some tips and tricks for creating effective workbooks:
Note
By combining the capabilities of Azure Data Explorer and Power BI, you
can harness the full potential of your data analysis. Whether you're
analyzing telemetry, monitoring system logs, or tracking user behavior, the
integration between ADX and Power BI provides a seamless workflow to
transform raw data into actionable insights.
Before you can start building Power BI reports with Azure Data Explorer
data, you must establish a connection between the two. Power BI offers
different connectivity modes—Import and DirectQuery—depending on
your specific requirements.
• Import mode In Import mode, the data from Azure Data Explorer is
copied to Power BI, allowing for faster performance and offline access.
Import mode is suitable for small datasets or scenarios where near real-time
data is not required.
2. Create a query and select it. For example, you can query the
StormEvents table in the Samples database.
5. Select Transform Data to open the Power Query Editor, as shown in
Figure 4-7.
FIGURE 4-7 Selecting the Transform Data option
6. Paste the Azure Data Explorer web UI query into the Navigator pane.
8. Choose Close & Apply to apply the changes and load the data into Power BI.
Figure 4-8 shows the query imported into Power BI.
FIGURE 4-8 Imported query
5. Fill out the required information, such as the Cluster URL, database, and
table name.
6. Optionally, you can select advanced options for your queries and enable
additional capabilities.
8. Select the desired table or tables on the Navigator screen and click Load
Data.
9. Optionally, you can shape your data using the Power Query Editor.
10. Choose Close & Apply to apply the changes and load the data into Power BI.
When running a query in Azure Monitor Logs, it's helpful to analyze the
query performance indicators provided in the Query Details pane:
• Total CPU Represents the overall compute used to process the query
across all compute nodes.
• Data Used For Processed Query Indicates the amount of data accessed
to process the query.
• Age Of Processed Data Indicates the gap between the current time and
the oldest data accessed for the query.
For example, consider a query that retrieves security events and parses
XML data:
SecurityEvent
| extend Details = parse_xml(EventData)
| extend FilePath = tostring(Details.UserData.Rul
| extend FileHash = tostring(Details.UserData.Rul
| where FileHash != "" and FilePath !startswith "
| summarize count() by FileHash, FilePath
Query 1:
Syslog
| extend Msg = strcat("Syslog: ",SyslogMessage)
| where Msg has "Error"
| count
Query 2:
Syslog
| where SyslogMessage has "Error"
| count
Both queries produce the same result, but the second one is more efficient
because it filters directly on the SyslogMessage column instead of an
evaluated column.
When using the join and summarize commands, it's important to consider
the cardinality of the columns used as dimensions. Higher cardinality can
lead to increased CPU utilization and slower query performance.
Optimizing the usage of these commands by selecting appropriate
aggregation functions and dimensions can improve the overall efficiency of
your queries.
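As a hedged illustration using the Syslog table referenced earlier in this section, summarizing by a low-cardinality column is generally much cheaper than summarizing by a near-unique one:
Syslog
| summarize count() by Computer // low cardinality: relatively few distinct computers to group by
Syslog
| summarize count() by SyslogMessage // high cardinality: nearly every message is unique, creating many groups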
In addition to Power BI, Azure Data Studio offers another option for
analyzing log analytics data—using notebooks. Notebooks provide a
flexible data exploration and analysis environment, allowing you to connect
to Log Analytics and create notebooks to analyze logs.
3. Use Kusto queries to retrieve and analyze log data within the notebook.
NOTE
To export Kusto queries to M and use the Web Connector, follow these
steps:
2. From the Power Query UI, select the Export option and download the
query as a file.
4. Copy the content of the downloaded file into the blank query in Power
BI.
5. Customize the query as needed and load the data into Power BI.
The Web Connector only supports Import mode, meaning the data needs to
be refreshed to stay updated. While this solution has some limitations, it
provides a viable option for accessing Log Analytics data in Power BI.
Despite its complexity, the Kusto/Data Explorer Connector offers one of the
best options to connect Azure Data Explorer to Power BI. This connector
allows for direct query access and provides the flexibility to load tables and
further transform the data within Power BI.
To use the Kusto/Data Explorer Connector in Power BI, follow these steps:
Log Analytics and Azure Data Explorer share a similar structure, making it
possible to connect to Log Analytics as if it were an ADX cluster. You need
to understand the URL format to connect to Log Analytics from an ADX
client tool.
Note
= Kusto.Contents("https://ade.loganalytics.io/sub
866c027fe3e0/resourcegroups/marketing/providers/m
"AppServiceHTTPLogs", [MaxRows=null, MaxSize=null
You can connect to your Log Analytics workspace and load the desired data
into Power BI by adjusting the connection details in the
Kusto.Contents function.
Discovering the Schema
If you need to discover the available tables in your Log Analytics database,
you can use the same expression but replace the table name with Null. This
will retrieve a list of available tables in the database. Note that you cannot
proceed with navigation; you must choose a specific table and modify the
code accordingly.
Remember to consider the size of your dataset, the need for real-time data,
and the performance requirements of your Power BI reports when choosing
the appropriate connectivity mode. Optimize your log queries by applying
early filtering, avoiding evaluated where clauses, and using effective
aggregation commands and dimensions. Additionally, explore the
functionalities of Azure Data Studio, such as notebooks, and the capabilities
of the Web Connector to enhance your data analysis workflow further.
There are several reasons why you might want to use transformations in
Azure Monitor:
• Remove sensitive data You might have a data source that sends
information you don't want to store for privacy or compliance reasons. With
transformations, you can filter out entire rows or particular columns that
contain sensitive information. Additionally, you can obfuscate sensitive
information by replacing certain values with common characters.
Transformations also allow sending sensitive records to an alternate table
with different role-based access control configurations.
• Reduce data costs Ingesting data into a Log Analytics workspace incurs
costs, so it's important to filter out any data that is not required to reduce
costs. Transformations can help you achieve this by removing entire rows
that are not needed or by filtering out redundant or minimally valuable
columns. You can also parse important data from a column and send certain
rows to basic logs tables for lower ingestion costs.
Transformations are defined in a data collection rule (DCR) and use the
Kusto Query Language (KQL) to manipulate the incoming data. Each entry
in the data is processed individually according to the specified KQL
statement. The transformation must understand the format of the incoming
data and create output in the structure expected by the destination.
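For instance, a transformation is simply a KQL statement that runs against a virtual table named source . A minimal sketch, assuming illustrative column names, that keeps only error records and drops the columns it doesn't need might look like this:
source
| where SyslogMessage contains "error" // keep only the rows you actually need
| project TimeGenerated, Computer, SyslogMessage, SeverityLevel // drop everything else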
For example, if you are collecting data from a virtual machine using Azure
Monitor Agent, you can specify the data to collect from the client operating
system in the DCR. You can also include a transformation that filters or
adds calculated columns to the data after it is sent to the data ingestion
pipeline.
Another example is data sent from a custom application using the Logs
Ingestion API. In this case, the application sends the data to a data
collection endpoint and specifies a DCR in the REST API call. The DCR
includes the transformation and the destination workspace and table.
2. Specify the data sources from which the data will be collected.
Note
It's important to note that currently, the tables in the DCR must be in the
same Log Analytics workspace. If you need to send data to multiple
workspaces from a single data source, you must use multiple DCRs and
configure your application accordingly.
When using Azure Monitor Agent, you can create a data collection rule
(DCR) to specify the data to collect from the client operating system. The
DCR can include a transformation that will be applied to the collected data
after it is sent to the data ingestion pipeline.
1. Define the data collection rule (DCR) that specifies the data sources and
destinations.
2. Write the KQL statements for the transformation to filter or modify the
data.
3. Associate the DCR with the data source, such as a virtual machine.
For example, you can create a DCR that collects data from a virtual
machine and applies a transformation to filter out records with specific
criteria. This allows you to collect only the relevant data and reduce
unnecessary storage costs.
If you are using the Logs Ingestion API to send data to Azure Monitor, you
can create a DCR that includes the transformation logic and specifies the
destination workspace and table.
Follow these steps to create a transformation for the Logs Ingestion API:
1. Define the data collection rule (DCR) that specifies the data sources and
destinations.
2. Write the KQL statements for the transformation to filter or modify the
data.
3. Include the DCR in the REST API call when sending data to the data
collection endpoint.
If you have a custom application or data source, you can still leverage
transformations in Azure Monitor by using the Logs Ingestion API. You can
send the data to a data collection endpoint and specify a DCR that includes
the transformation logic, destination workspace, and table.
1. Define the data collection rule (DCR) that specifies the data sources and
destinations.
2. Write the KQL statements for the transformation to filter or modify the
data.
3. Send the data to the data collection endpoint and include the DCR in the
REST API call.
Using the Logs Ingestion API, you can apply transformations to data from
any custom source, allowing you to preprocess and shape the data
according to your specific requirements.
Filtered data in excess of 50 percent of the ingested volume incurs a data
processing charge. For example, if you ingest 20GB of data and the
transformation filters out 12GB, you will be charged for the 2GB of filtered
data above the 50 percent (10GB) allowance.
It's worth noting that if Microsoft Sentinel is enabled for the Log Analytics
workspace, there is no filtering ingestion charge, regardless of how much
data the transformation filters.
Note
Single Destination
The sample template below shows a DCR for Azure Monitor Agent that
sends data to the Syslog table. The transformation filters the data for
records with error in the message.
{
"$schema": "https://schema.management.azure.com
"contentVersion": "1.0.0.0",
"resources": [
{
"type": "Microsoft.Insights/dataCollectionR
"name": "singleDestinationDCR",
"apiVersion": "2021-09-01-preview",
"location": "eastus",
"properties": {
"dataSources": {
"syslog": [
{
"name": "sysLogsDataSource",
"streams": ["Microsoft-Syslog"],
"facilityNames": ["auth", "authpriv
"logLevels": ["Debug", "Critical",
}
]
},
"destinations": {
"logAnalytics": [
{
"workspaceResourceId": "/subscripti
group/providers/Microsoft.OperationalInsights/wor
"name": "centralWorkspace"
}
]
},
"dataFlows": [
{
"streams": ["Microsoft-Syslog"],
"transformKql": "source | where messa
"destinations": ["centralWorkspace"]
}
]
}
}
]
}
The sample template below shows a DCR for data from the Logs Ingestion
API that sends data to both the Syslog and SecurityEvent tables.
It includes separate data flows with transformations for each table.
{
"$schema": "https://schema.management.azure.com
"contentVersion": "1.0.0.0",
"resources": [
{
"type": "Microsoft.Insights/dataCollectionR
"name": "multiDestinationDCR",
"location": "eastus",
"apiVersion": "2021-09-01-preview",
"properties": {
"dataCollectionEndpointId": "/subscriptio
group/providers//Microsoft.Insights/dataCollectio
"streamDeclarations": {
"Custom-MyTableRawData": {
"columns": [
{
"name": "Time",
"type": "datetime"
},
{
"name": "Computer",
"type": "string"
},
{
"name": "AdditionalContext",
"type": "string"
}
]
}
},
"destinations": {
"logAnalytics": [
{
"workspaceResourceId": "/subscripti
group/providers/Microsoft.OperationalInsights/wor
"name": "clv2ws1"
}
]
},
"dataFlows": [
{
"streams": ["Custom-MyTableRawData"],
"destinations": ["clv2ws1"],
"transformKql": "source | project Tim
"outputStream": "Microsoft-Syslog"
},
{
"streams": ["Custom-MyTableRawData"],
"destinations": ["clv2ws1"],
"transformKql": "source | where (Addi
Subject = AdditionalContext",
"outputStream": "Microsoft-SecurityEv
}
]
}
}
]
}
The sample template below shows a DCR for data from the Logs Ingestion
API that sends data to both the Syslog table and a custom table with a
different data format.
{
"$schema": "https://schema.management.azure.com
"contentVersion": "1.0.0.0",
"resources": [
{
"type": "Microsoft.Insights/dataCollectionR
"name": "multiDestinationDCR",
"location": "eastus",
"apiVersion": "2021-09-01-preview",
"properties": {
"dataCollectionEndpointId": "/subscriptio
group/providers//Microsoft.Insights/dataCollectio
"streamDeclarations": {
"Custom-MyTableRawData": {
"columns": [
{
"name": "Time",
"type": "datetime"
},
{
"name": "Computer",
"type": "string"
},
{
"name": "AdditionalContext",
"type": "string"
}
]
}
},
"destinations": {
"logAnalytics": [
{
"workspaceResourceId": "/subscripti
group/providers/Microsoft.OperationalInsights/wor
"name": "clv2ws1"
}
]
},
"dataFlows": [
{
"streams": ["Custom-MyTableRawData"],
"destinations": ["clv2ws1"],
"transformKql": "source | project Tim
"outputStream": "Microsoft-Syslog"
},
{
"streams": ["Custom-MyTableRawData"],
"destinations": ["clv2ws1"],
"transformKql": "source | extend json
AdditionalContext = jsonContext, ExtendedColumn=t
"outputStream": "Custom-MyTable_CL"
}
]
}
}
]
}
Now that you have a comprehensive understanding of data collection
transformations in Azure Monitor, it's time to put this knowledge into
practice. Start by creating a data collection rule and associating it with a
data source in Azure Monitor. Experiment with different transformation
logic to filter, modify, and enrich your data. With the power of
transformations, you can unlock deeper insights and make more informed
decisions based on your data.
KQL is a powerful query language that allows users to explore and
analyze large volumes of diverse data in real time. To make the most of
KQL's capabilities, it's essential to optimize query performance. By
following best practices, you can ensure faster query execution, improved
resource utilization, and enhanced overall efficiency. This section explores a
range of best practices for optimizing query performance.
One of the most crucial factors in query performance is the amount of data
being processed. By reducing the data volume, you can significantly
improve query execution time. There are several mechanisms you can use
to achieve this:
• Use the where operator Utilize the where operator to apply filters
and reduce the amount of data being processed. You can limit the data
retrieved from the source by specifying specific conditions.
• Optimize string operators Use the has operator instead of contains when
searching for full tokens. The has operator is more efficient as it doesn't
search for substrings.
• Filter on specific columns When searching for text, specify the specific
column to search in instead of using the * operator, which performs a full-
text search across all columns.
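A hedged sketch that combines these practices against the Syslog table used earlier in this chapter (the "prod" computer-name filter is illustrative):
Syslog
| where TimeGenerated > ago(1h) // filter early on the smallest useful time range
| where SyslogMessage has "Error" // has uses the term index; cheaper than contains
| where Computer has "prod" // search a specific column rather than every column
| summarize count() by Computer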
When you need to extract fields from dynamic objects across millions of
rows, optimizing the extraction process is essential. Here are some best
practices to follow:
• Reduce the data fed into conversion Analyze your query and identify
opportunities to reduce the amount of data processed before applying
conversions. Applying filters and aggregations early in the query can
minimize the data volume and improve performance.
• Avoid unnecessary conversions Only convert the necessary data. If a
conversion can be avoided without affecting the desired results, skip it to
reduce processing overhead.
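A hedged sketch of early filtering before an expensive conversion, reusing the SecurityEvent example from earlier in this section:
SecurityEvent
| where TimeGenerated > ago(1d) // shrink the time window first
| where isnotempty(EventData) // skip rows with nothing to convert
| extend Details = parse_xml(EventData) // the conversion now runs on far fewer rows
| summarize count() by Computer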
When working with new queries, it's important to follow best practices to
avoid potential performance issues. Here are some recommendations:
• Limit the result set Use the limit keyword with a small number or the
count function at the end of your query. Running unbound queries over
unknown datasets can result in large result sets, leading to slow response
times and increased resource consumption.
• Optimize join operations For join operations, select the table with
fewer rows as the first one in the query. When filtering by a single column,
use the in operator instead of a leftsemi join ( join
kind=leftsemi ) for better performance.
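A hedged sketch of the single-column case (the IP addresses are placeholders):
SigninLogs
| where IPAddress in ("50.50.50.50", "90.90.90.90") // single-column filtering with in instead of a join
| limit 100 // keep exploratory result sets bounded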
String operators are crucial in query performance when working with text
data. Here are some best practices to optimize the use of string operators:
• Choose the right operator Select the appropriate string operator based on
your specific use case. Operators like has , has_cs , has_any , and
hassuffix are more efficient when searching for indexed terms of four
or more characters.
• Resource utilization The CPU and memory usage during query execution
• Use query logs and diagnostics Enable query logs and diagnostics to
capture detailed information about query execution. Analyzing these logs
can provide insights into query performance and help troubleshoot issues.
Summary
OceanofPDF.com
Chapter 5. KQL for Cybersecurity—Defending
and Threat Hunting
While this section of the book will have plenty of example queries we hope
you can utilize in your environment, the most important thing we want you
to take away is why KQL is used in cybersecurity. We want to give you the
knowledge about where you can use KQL in your day-to-day life, where it
can save you time, where it can detect threats, and ultimately, where it can
help you improve your cybersecurity resiliency.
We hope you don’t just copy and paste every query into your Microsoft
Sentinel environment or Microsoft 365 Advanced Hunting and generate
alerts for them, dust your hands off, and label yourself “secure.” If some of
the queries fill gaps in your alerting and are relevant to you, then that’s
fantastic; we would love for you to use them without question. Of course,
some queries, use cases, or detections are universally interesting and
important, such as adding users as Domain Administrators by threat actors.
Every Active Directory environment in the world should have that detection
running.
If you have made it this far in the book, then you realize that KQL wasn’t
designed from the ground up as a tool to be used within the security realm.
It started its life as a monitoring tool to understand application performance
within Azure Application Insights. There are similarities between the
capabilities of hunting and application performance monitoring tools, such
as quickly querying a massive amount of time-sequential data and detecting
patterns, anomalies, and outliers.
We think saying that KQL is really good at querying log data is a massive
understatement and sells the platform incredibly short. As people who write
hundreds and thousands of queries each month, the thing that stands out to
us while using KQL with cybersecurity is the flexibility it provides. These
statements aren’t designed as a marketing pitch. If you bought and read this
book, chances are you have been exposed to KQL already and are looking
to bolster your knowledge. As you read this book, we hope you regularly
stop and think, “Wait. I can do that with KQL?” Better still, we hope you
can apply that knowledge to your environment to make a genuine impact.
That isn’t the reality, though—far from it. The reality is that even within a
single vendor, there is very little standardization—let alone across vendors.
Fields might change between software versions; data structures might
change with added or removed functionality. Things that used to be logged
might no longer be logged, or new events might appear.
KQL provides many great functions and operators to make sense of the
mess; the best thing is that many operators are simple to use. For example,
if you have exported JSON logs or a CSV from a firewall appliance, we can
ingest that data without cleaning it up in Excel first. We can manipulate a
straight text block to take what we need from it. Using functions in KQL
such as parse , split , trim , or even regex capability, we can
quickly go from a mess of logs to a masterpiece of a query. Don’t stress too
much about the terms. We will go through all of them and more in this
chapter.
Easy Pivoting Between Datasets
We can preemptively hear the screams of worry about joining data and
pivoting across datasets. If you are relatively new to KQL, you might be
daunted by joining datasets, but you don’t need to be! We will explain when
to join and, perhaps more importantly, when not to join.
Many of these systems are very high volume, generating a huge amount of
logging data. As security analysts, we are likely only interested in bits and
pieces of that data—the parts reflecting malicious activity. Being able to
quickly filter out the noise of non-malicious day-to-day logs is key. KQL
has many operators and functions to help you find what you want in all that
data. In fact, you are likely to get confused initially as to the difference
between all those options. Fear not, however, as we will explain all of them
and when to use which.
To further help us make sense of this enormity of data, KQL has broad
capabilities in data aggregation and summation. That gives us, as analysts, a
summarized view of our data. Data aggregation is an extremely important
part of threat detection or digital forensics. As defenders and investigators,
we want to reduce the times we have to “doom scroll,” as it’s affectionately
known. Returning thousands and thousands of rows of data and then
scrolling through it line by line, looking for outliers and malicious activity,
is arduous. Hunting this way is time-consuming and tedious. We will be so
glassy-eyed from staring at the screen while we scroll that we’ll likely miss
malicious activity anyway.
Instead, we can aggregate data and look at its patterns or anomalies. For
instance, say we know that an IP address is suspicious and that an adversary
has compromised several of our users and continues to sign in as them from
this IP address. If we hunt in the sign-in log data for the IP address, we will
get all the results, which is useful, but it will also likely be thousands of
records. We have different types of sign-ins, both interactive (a user typing
their credentials in) and non-interactive (an application signing in on behalf
of a user), and we might have MFA logs. What might make more sense is to
summarize that data in a way that is more readable and easier to analyze.
Maybe we are looking for the total count of sign-ins, how many distinct
applications were accessed, and the first and last time seen for each
compromised user.
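A hedged sketch of that kind of summary over Microsoft Entra ID sign-in logs (the IP address is a placeholder):
SigninLogs
| where IPAddress == "50.50.50.50"
| summarize SignIns=count(), DistinctApps=dcount(AppDisplayName), FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated) by UserPrincipalName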
KQL can easily calculate the time between two events. For instance, maybe
you want to determine the time between a user receiving a phishing email
and a risky sign-in event. If those happen within two hours, it would be of
note; if they are two months apart, they are likely unrelated.
When an incident occurs, you might seek forensic data from systems that
are not otherwise logged into a central SIEM. That data might be provided
to you in any number of ways. Manual exports of firewall data or appliance
audit logs might be relevant to an investigation and might be sent to you in
different formats. CSV, text files, XML, and JSON are all very common to
see in digital forensics. Being able to query this data can be crucial to the
investigative process.
• Visualizations can also help drive impact and change with security policy. Using
our data, we can easily visualize metrics such as MFA coverage, how many
risky sign-in events we see, what percentage of phishing we receive, and
countless more. We jokingly refer to these kinds of visuals and dashboards
as “Please, can I have money in the cybersecurity budget?” diagrams. While
it is a tongue-in-cheek way of looking at it, there is an element of truth to it,
too. A picture tells a thousand words. If we can create impactful visuals to
highlight gaps or shortcomings in policy, then hopefully, that can help drive
improvement through additional resourcing, the purchasing of tools, or
whatever it takes.
Note
KQL provides many great functions that allow us to search widely for
indicators of interest without requiring our queries to be perfectly written or
fully optimized. We can cast a wide net using functions such as
contains and search . Sure, these queries will take a little longer to
run than an ultra-specific query, but perhaps we don’t know what we are
looking for early on in an investigation. That is where KQL can really
shine. If you have an IP address or a file hash, you can just search
everywhere initially. As the story comes together more, you can drill down
on your queries and paint the picture of what happened.
One of the key takeaways we want to leave you with is that your queries
don’t need to be perfectly optimized. You will learn how to make queries
perform better in time, but don’t feel the need to try to do it perfectly
upfront. If you are looking for an IP address and have no idea where it
could be located, then use those contains and search type operators.
That is what they are there for.
Versatility
• Creating insights into whether security policy meets security reality. Are
security policies being enforced with technical controls?
Cybersecurity-Focused Operators
In this section, we will take a deep dive into using KQL in cybersecurity,
with real-world examples and scenarios to build our knowledge on
leveraging KQL in many ways. As authors, we wanted this whole book to
mimic real-life scenarios. Rather than just overwhelming you with a list of
queries, we wanted to frame the context of all those queries with how you
might use them in your work. You might start with an alert, work backward
to understand initial access, and then work forward to understand the scope
and scale of the compromise.
To aid your learning, we have built out six varied security scenarios. We
will describe each scenario, walk through helpful queries, and explain how
we would tackle them. These scenarios start off at the beginner level and
move to be more advanced. The idea is to build on the learning from the
previous one. Even in the most advanced scenarios, we will try to keep the KQL
as simple as possible to achieve the desired outcome. This section of the
book isn’t designed to showcase the most advanced KQL on the planet;
instead, we aim to teach you how valuable a tool KQL can be in your day-
to-day work.
Before we dig into the scenarios, we think it’s important to take a step back
and revisit some of what you learned in earlier chapters. However, this time,
we reintroduce that learning through the lens of a cybersecurity analyst.
In earlier chapters, we went through many operators and functions and gave
examples of how to use them. Having so many options, each with its own
quirks, limitations, and recommendations, can be overwhelming. Don’t feel
you must master everything to achieve your desired outcome. There are
many operators we have never used because they are skewed toward data
analytics instead of security, or because we prefer to write our queries in a
particular way, using our preferred operators and functions. Think of KQL as your Swiss
army knife for investigation and hunting. You have many tools at your
disposal, but you don’t need to use them all. Some of the best queries out
there are straightforward.
Like with many query and scripting languages, once you master the basics,
you will start writing queries in a very personal style. Just like with a tool
like PowerShell, you might like how certain PowerShell functions work, or
you like structuring your scripts in a way that is easy to read. That way,
when you dust them off after a few months of sitting idle, you can quickly
understand your previous work. KQL is very much in the same vein. You
will, in time, find a way to write queries that you enjoy, which might be
very different from the next person. There is no right or wrong with how
you style your queries. There are certainly ways to make your queries more
efficient, which we will cover. There is also an overlap in functionality with
certain KQL operators, where you can use multiple operators to achieve the
same outcome. What you use is ultimately your preference.
With security data, though, there are operators and functions of KQL that
you will likely be consistently drawn to. That is because they are
functionally a strong match for security data. As security practitioners, we
have specific requirements when we are hunting through data. Security
people are
Searching Operators
Imagine this all sits in a single column in your data. Maybe this is some
audit reporting on a firewall appliance you manage. When Kusto sees this
data, it indexes it according to the rules we just mentioned. It will break it
up into separate terms, based on alphanumeric sequences that are four or
more characters long. So, once it does that, what terms are we left with? See
Figure 5-3.
FIGURE 5-3 A firewall log file with each term highlighted in red
• 96b30c1e
• d3fb
• 409d
• a9c0
• f20565e0e595
• EventType
• CorrelationId
This is especially important for security data; we deal with many events and
logs with data types that fit this formatting. Another example is a directory
path (see Figure 5-4).
FIGURE 5-4 A file directory path with each Kusto term highlighted in red
• Matt
• AppData
• Roaming
• test
Think about all the kinds of data you might have in your environment and
how Kusto would index it. Quotes and commas separate JSON fields, and
syslog data might be separated by commas or equal signs.
Note
has
has is one of the most powerful search operators at our disposal, which
allows us to search for all those terms that have been indexed throughout
the data. Because this data has been indexed, it is much faster for us to
retrieve results. The indexing also can help compensate for our lack of
knowledge of the structure of our logs. As mentioned previously, no one
will be an expert on exactly the format of all logs you have access to, so we
can leverage the capability of KQL to do the heavy lifting. Once you
understand the indexing component, the operator's title gives away its
functionality. The query has to have one of the terms.
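The book's two sample records aren't reproduced here, so the following is a hedged stand-in with the same shape: two comma-delimited network events, one for Eric Lang with destination port 4444 and one for Cassie Hicks running wincmd.exe over port 3389, both launched from a temp directory. The field layout and values are illustrative.
// hedged stand-in for the book's sample network creation events
let NetworkEvents = datatable (Data: string)[
@"SourceIP=10.10.10.10, SourcePort=8080, DestinationIP=50.50.50.50, DestinationPort=4444, ProcessPath=C:\Users\Eric\AppData\Local\temp\update.exe, User=Eric Lang",
@"SourceIP=10.10.10.20, SourcePort=50123, DestinationIP=60.60.60.60, DestinationPort=3389, ProcessPath=C:\Users\Cassie\AppData\Local\temp\wincmd.exe, User=Cassie Hicks"
];
NetworkEvents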
Once you run it, you should see two log entries that look like the ones
shown in Figure 5-5. Depending on which platform you use, it might
visually look a little different, but the functionality will be the same. These
entries mimic network creation events, where we see source and destination
IP addresses and ports, the initiating process command, and the username.
By the time you have run the code, Kusto has already completed its
indexing magic over it, allowing you to search using the has
functionality.
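Against that stand-in data, a has search for port 3389 might look like this:
NetworkEvents
| where Data has "3389"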
Using the indexing rules we learned earlier, we know that 3389 has been
indexed as a separate term, so when we query for it, only the event
containing 3389 will be returned, as shown in Figure 5-6.
FIGURE 5-6 Network event showing only results with 3389 present
Once the indexing was complete, the second record didn’t contain the 3389
term, so the query didn’t find it. We could do that if we were interested in
network connections originating from files in a temp directory.
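For example, with the stand-in data:
NetworkEvents
| where Data has "temp"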
Both records are returned this time because they both have temp as an
indexed term listed in the record, as shown in Figure 5-7.
has_any
has_any is the next logical progression of has and lets you search for
multiple terms that have been indexed. The query will return where there is
a match on any one of them, regardless of how many you specify. To
simplify things, we will continue using the same example data.
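A sketch with the stand-in data, searching for either port 4444 or wincmd:
NetworkEvents
| where Data has_any ("4444", "wincmd")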
It will return both records again because the first record has 4444 as the
destination port, while the second record lists wincmd.exe as the
process path. We have a singular hit for each of our records, so our
has_any statement has been satisfied, and both return.
This time, to reinforce our learning, perhaps we are interested in any source
traffic coming from port 8080 and anything Eric Lang has been up to. We
can alter the query to include these new parameters:
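With the stand-in data, that might be written as:
NetworkEvents
| where Data has_any ("8080", "Eric")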
has_all
This is the sibling of has_any . When using the has_all function, our
data must have all the terms listed in the query to be returned. Matching
only some terms is not good enough. Let’s rerun our first query from the
has_any section using has_all :
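A sketch of that rerun against the stand-in data:
NetworkEvents
| where Data has_all ("4444", "wincmd")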
When you run this query, no results will be returned because neither record
contains both 4444 and wincmd . 4444 is listed in the first record,
while wincmd is listed in the second.
However, let’s say you read another threat intelligence report that says there
is a new malware variant called wincmd.exe , but it is only suspicious
when it connects on port 3389 because it has remote access capability. This
malware shouldn’t be confused with the genuine and non-malicious
wincmd.exe file that doesn’t connect on port 3389. Let’s adjust the
query to include both those indicators:
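Against the stand-in data:
NetworkEvents
| where Data has_all ("wincmd", "3389")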
The results will show our second log record because it contains both
wincmd.exe and 3389 .
has , has_any , and has_all are valuable because they are much
faster than non-indexed searches and strongly align with security data types.
When you see JSON, XML, and items separated by non-alphanumeric
characters and other similar data, Kusto has indexed it, making it ready to
search.
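The negated form, !has , excludes rows containing a term. For example, with the stand-in data:
NetworkEvents
| where Data !has "3389"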
You will be returned the first record, which doesn’t contain 3389 . Simple.
Now, if you try to type !has_any into KQL, you will notice it isn’t an
operator we can use because it doesn’t exist. We can still write our queries
to achieve the same outcome; we just need to use a slightly different
notation, using not() . For instance, if we wanted to exclude any 3389
traffic or anything related to Cassie Hicks , we would write the query
like this:
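A sketch against the stand-in data:
NetworkEvents
| where not(Data has_any ("3389", "Cassie"))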
The only result we see is the one belonging to Eric Lang because it
satisfied the query’s terms—does not contain Cassie and wincmd . You
should master the use of has and all its variants because it is a cheat code
to find things quickly!
You might wonder how we got this far without mentioning contains ,
the ultimate searching tool! contains is straightforward and ultra-
powerful. contains simply searches across all your data for whatever
substring you define. We can use the same sample data to show how
contains works:
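With the stand-in data:
NetworkEvents
| where Data contains "Des"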
In this query, we are just looking for the three letters Des . Maybe we were
interested in DestinationPort but didn’t know what the field was
called. When we run this query, both logs are returned because they contain
the word DestinationPort and a secondary hit on
DestinationIP , too. We can exclude results using !contains :
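With the stand-in data:
NetworkEvents
| where Data !contains "Des"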
This time, we got no result when we asked KQL to find logs that don’t
contain Des . Because both records contain Des , both are excluded from
the results.
You might wonder if there’s any reason to use anything besides contains
because you can search for any substring. That’s a fair question, and there
are several good reasons you shouldn’t:
• If you are dealing with a large amount of data, a broad search can take a
long time to run and potentially even time out.
• It might even return data you weren’t expecting because the string you
searched for is in several other places. We saw that with the query above
when we wanted to find DestinationPort; the contains search was very
broad, so we also found DestinationIP.
• As you add more terms, your queries become more complex. Let’s say
you wanted to use contains to find anything containing port 4444 or
wincmd.exe and anything related to Cassie :
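A sketch of how that query starts to sprawl, using the stand-in data:
NetworkEvents
| where (Data contains "4444" or Data contains "wincmd") and Data contains "Cassie"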
That said, we certainly aren’t telling you not to use contains because it
is inefficient (or for any other reason). The operator exists for a reason; we
just want you to understand its limitations and pitfalls.
== and !=
in and !in
!in — the opposite of in — returns results for anything not listed in the
same field, so if we wanted to exclude 50.50.50.50 and
90.90.90.90 , we would run the same query but use !in instead of
in :
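A hedged sketch, with an illustrative table where the IP address sits in its own column:
let Events = datatable (IPAddress: string, User: string)[
"50.50.50.50", "Cassie Hicks",
"90.90.90.90", "Eric Lang",
"20.20.20.20", "Avery Smith"
];
Events
| where IPAddress !in ("50.50.50.50", "90.90.90.90")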
If you are wondering about the difference between the in and has
operators mentioned earlier, that is another great question. in requires a
complete match of the string and doesn’t do the same indexing we saw with
has . That is, you cannot use in to get matches on the indexed terms of a
larger string, only a full match. So, if you were searching for
50.50.50.50 within a larger string, say
IPAddress=50.50.50.50 , then in would not return a match.
Because of this, using in is often a functionally strong match for
indicators such as IP addresses or usernames, where those indicators are in
their own specific field in your data. If you have an IP address but it is
nested within a larger JSON structure, then you should use has .
Time Operators
This next set of operators is focused on queries involving timestamps.
Security, perhaps more than most professions, can be obsessed with time:
These questions are crucial to defenders and investigators. KQL has a vast
array of operators to help us, and these are a few of the most important.
ago()
ago() is an operator that lets you subtract time from the current UTC
time and retrieve a selection of time-based data. So, what does that mean in
practice? It is simple. Let’s look at a few examples using Microsoft Entra
ID logs in Microsoft Sentinel, though you can use any log you wish.
SigninLogs
| where TimeGenerated > ago(10d)
This query retrieves the last ten days of logs from when you run the query.
Remember, an ago() query is always relative to the time you run it. Your
query doesn’t need to specify only days:
SigninLogs
| where TimeGenerated > ago(4h)
SigninLogs
| where TimeGenerated > ago(25m)
If you want to grab the absolute latest logs, you could even choose
seconds:
SigninLogs
| where TimeGenerated > ago(45s)
ago() queries are great for getting the latest available data because the
time is always relative to when you execute the query. You can choose any
unit you like, and even part units; if you wanted to look at three and a half
days of logs, you could use 3.5d . You can even include multiple options;
if you wanted to find the results from between 14 and 7 days ago, you could
do that:
SigninLogs
| where TimeGenerated > ago(14d) and TimeGenerated < ago(7d)
between
We could use those dates as an anchor point for all our queries. For
instance, if we wanted to see all Microsoft Entra ID (previously Azure
Active Directory) sign-in data for Eric, we could use this query:
SigninLogs
| where TimeGenerated between (datetime(07-01-202
| where UserPrincipalName == "[email protected]
Cybersecurity professionals call the first sign of malicious activity the left
goalpost. The last sign of malicious activity is called the right goalpost. By
using the goalposts in the query, we can focus on only the time window of
interest as if we are kicking a goal. Doing so gives our queries higher
fidelity and makes them more efficient. In this particular query, we set a specific time
for the left and right goalposts using the datetime() function. The
datetime() function uses specific times, as opposed to a relative
time, like we previously saw with ago() . If you rerun this example query
today and again a week later, the results will be the same because we have
exact dates and times.
abs()
abs() lets us calculate the absolute value of the input. So, what does that
mean in terms of time? Using abs() , we can find the time between two
events and have KQL calculate it. Let’s use our datatable operator
again to make another fake set of data to test with:
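A hedged stand-in for that data (the event names and times are illustrative):
let Log = datatable (User: string, Event1Time: datetime, Event2Time: datetime)[
"Eric Lang", datetime(2023-07-18 07:23:24), datetime(2023-07-18 09:14:03)
];
Log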
This should return a single log entry for Eric Lang with two timestamps (as
shown in Figure 5-9).
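A sketch of the abs() calculation over that stand-in record (run together with the let statement above):
Log
| extend TimeDelta = abs(Event1Time - Event2Time)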
Our new column is called TimeDelta , and we see the exact difference
between the two events. This might be interesting from a security point of
view for many reasons. As mentioned in the introduction to this chapter, the
sequence of, or time gap, between events can be forensically interesting to
us. If a user clicks a phishing link (which might be Event1), and then we see
a risky sign-in later ( which might be Event2 ), the gap between those
will be useful. The closer they are, the more likely they are to be related.
datetime_diff()
For all these examples, we will leverage the datatable() operator and
have it create 10 example sign-in logs. Using these, we will see how you
can summarize the data.
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
count()
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| count
As you would expect, 10 results are returned. You can also count by
something—and count how many of something there are in your results.
Perhaps you are interested in how many events there are for each user; in
that case, you could count() by User . To do that, we need to use our
first summarize operator.
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize count() by User
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize count() by User, IPAddress
FIGURE 5-15 Result showing the count of each user and IP address
combination
This time, we see the count of each user and IP address combination. From
a security point of view, these kinds of counts can be interesting. Maybe
you are tracking download events because you are worried about data
exfiltration. In that case, a high count might be something you investigate
further. If you are looking at sign-in data similar to our examples, low IP
address counts might be interesting to you.
dcount()
You will get a result of 3 . Much like count() , you can also do a
dcount() by something else. If you want to know the distinct count
of users from each IP address, the syntax remains the same as count() :
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize dcount(User) by IPAddress
The results show how many distinct users are seen from each IP address.
Again, this might be forensically interesting to you.
make_list()
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize make_list(Application)
A list of 10 applications taken from the 10 sign-in logs is shown. Each sign-
in event includes an application, and we have made a list of those
applications. Just like our count() and dcount() operators, we
can also make lists by another field. For instance, if we want a list of
applications for each user, we can do that:
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize make_list(Application) by User
FIGURE 5-18 Result showing the list of applications for each user
Hopefully, the make_list() operator's value is starting to become
apparent. For example, if we find a compromised user, making a list of the
IP addresses they used might surface a malicious IP. Based on the IP address (or
other indicators), we can then summarize the data to see the impact quickly.
make_set()
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize make_set(Application)
Even though we have 10 log events, the set of applications shows just 4
items because they are distinct (seen when expanding the array as in Figure
5-19).
Of course, we can also make a set by another field. When we run this
query by each user, we get a unique list of applications per user.
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize make_set(Application) by User
FIGURE 5-20 Result showing the set of applications for each user
Whether you want to create a set or list depends on the data and the hunting
you are doing; there is no right or wrong answer. One isn’t better than the
other; it just comes down to what you are trying to find.
max()
max() is a valuable operator that is simple but powerful. It lets us see the
latest (or maximum) value of a field; used with a timestamp, that is the most
recent time something occurred. We need to specify the field we want the
maximum of to make it work. For instance,
in ten logs from the previous example, it’s easy to determine when the most
recent one was generated:
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize max(Timestamp)
The timestamp of the latest log file is returned. Like the other
summarize operators, we can find the latest time by another field. So,
we would use a similar syntax to find the most recent log for each user:
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize max(Timestamp) by User
This time, the most recent timestamp for each user in the data is shown (see
Figure 5-22).
FIGURE 5-22 Result showing the most recent timestamp of each user
It’s always valuable to know the most recent time something occurred. If
you have a query that detects malicious activity, you can use max() to see
the impacted users or devices and quickly see when that activity occurred.
arg_max()
| summarize arg_max(Timestamp, *)
If you wanted to just return a few fields, you could specify them, separated
by commas. Using the test data, we enter an asterisk to return the whole log record:
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize arg_max(Timestamp, *)
Figure 5-23 shows the full log record with the latest event.
FIGURE 5-23 Result showing the latest log file in our data
By now, we are starting to sound like a broken record, but like all our
previous summarize operators, we can summarize by something else.
We can use arg_max() to return the latest record for each user.
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize arg_max(Timestamp, *) by User
This time, three records are returned—the latest record for each user.
FIGURE 5-24 Result showing the latest log record for each user
min() and arg_min() are exactly what you would expect: the
opposite of max() and arg_max() . Instead of finding the latest event,
they find the earliest event. The first returns just the time of the earliest
event, the latter the entire earliest record. The syntax and capabilities are the
same. Depending on your investigation, whether you are interested in the
earliest or most recent event might change, or you might find value in
finding both.
Hopefully, you are starting to see the value of data aggregation. Even with
just 10 records, you can see how it helps detect data patterns and trends.
Data aggregation becomes more powerful when you combine many
operators to provide an overview of your data.
Let’s look at some examples using the same data we’ve used throughout
this chapter. For example, say you wanted each user's total count of events
and the distinct count of applications. You don’t need to write separate
queries because you can chain these operators together:
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize TotalCount=count(), DistinctApps=dcount(Application) by User
The key piece of syntax here is summarize TotalCount=count(),
DistinctApps=dcount(Application) by User
Renaming the output is useful as the queries get a little more involved. So,
in this case, we are renaming the count of all our events to TotalCount ,
and the distinct count of applications to DistinctApps . When our
queries run, the output is much easier to read. We separate our two
aggregation operators— count() and dcount() —by a comma,
which tells KQL we want to run both. Finally, we still use the by syntax
and tell KQL we want these two aggregation operators to run against each
user. See Figure 5-25 for the output.
FIGURE 5-25 Results showing the total sign-in count and distinct
applications for each user
So, now we get the total count of events and distinct applications for each
user. You can expand on this even more, by including min() , max()
and even lists and sets:
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Application: string,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", "Of
"2023-07-20 14:54:44.343", "50.20.500.20", "S
"2023-06-13 09:53:12.123", "20.70.20.20", "Of
"2023-07-22 08:23:53.111", "20.20.20.20", "My
"2023-07-18 17:19:41.234","20.20.20.20","Offi
"2023-06-13 13:23:33.761","20.20.500.20","MyP
"2023-06-18 02:32:50.331","20.20.20.20","Team
"2023-07-11 14:44:10.122","20.20.20.20","Offi
"2023-07-16 10:11:22.255","25.20.25.20","Team
"2023-07-04 00:25:29.499","20.20.20.20","Offi
];
Log
| summarize FirstEvent=min(Timestamp), LastEvent=max(Timestamp), ListOfApps=make_set(Application) by User
We end up with this great summary of what our users have been up to. The
potential for this kind of aggregation is hopefully obvious, and learning to
use these operators will be valuable to you day to day.
Other summarize operators allow you to count or list items only when
a condition is true. For example, countif() and dcountif() allow
us to count an item only if, say, the IPAddress == 20.20.20.20 . Similarly, we
can use make_set_if() to build a set only from items matching the same
condition. You can explore those as well; they are just additional functionality,
and the structure remains the same.
project
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Location: dynamic ,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", dyn
"2023-07-20 14:54:44.343", "50.20.500.20", dy
"2023-06-13 09:53:12.123", "20.70.20.20", dyn
];
Log
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Location: dynamic ,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", dyn
"2023-07-20 14:54:44.343", "50.20.500.20", dy
"2023-06-13 09:53:12.123", "20.70.20.20", dyn
];
Log
| project Timestamp, IPAddress, User
In the results shown in Figure 5-28, we now only see those three columns.
FIGURE 5-28 Results showing only the Timestamp, IPAddress, and User
project has other functionality you might not know about: renaming
columns when you query them. For instance, let’s say you want to
standardize by changing the names of Timestamp , IPAddress , and
User to TimeGenerated , IPAddr , and UserPrincipalName ,
respectively. You can do that by using the equals ( = ) symbol in your
project statement:
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Location: dynamic ,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", dyn
"2023-07-20 14:54:44.343", "50.20.500.20", dy
"2023-06-13 09:53:12.123", "20.70.20.20", dyn
];
Log
| project TimeGenerated=Timestamp, IPAddr=IPAddress, UserPrincipalName=User
The same data is returned, but the columns have been renamed to our
preferred standard, as shown in Figure 5-29.
FIGURE 5-29 Results with fields renamed
Finally, you can also create new columns based on existing data or a string
you enter. In this example, we’ll create a DaysFromToday column,
which calculates how many days have passed between the
TimeGenerated time and when you run the query. We’ll use the
datetime_diff operator:
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Location: dynamic ,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", dyn
"2023-07-20 14:54:44.343", "50.20.500.20", dy
"2023-06-13 09:53:12.123", "20.70.20.20", dyn
];
Log
| project TimeGenerated=Timestamp, IPAddr=IPAddress, UserPrincipalName=User,
DaysFromToday=datetime_diff("day", now(), Timestamp)
Your DaysFromToday values will differ because your date and time will
differ from ours.
extend
extend allows us to create new columns based on criteria that are added
to the results. In fact, we used extend earlier in this chapter with the
abs() and datetime_diff() examples. Those columns didn’t exist
in the original data, but we created them using those operators when we
calculated the time between two events.
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Location: dynamic ,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", dyn
"2023-07-20 14:54:44.343", "50.20.500.20", dy
"2023-06-13 09:53:12.123", "20.70.20.20", dyn
];
Log
You might have noticed that the Location column is a JSON object, as
shown in Figure 5-32.
FIGURE 5-32 JSON data representing the Location field
The City and Country fields are part of that nested JSON object. We can use extend to create new columns for those two fields.
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Location: dynamic ,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", dyn
"2023-07-20 14:54:44.343", "50.20.500.20", dy
"2023-06-13 09:53:12.123", "20.70.20.20", dyn
];
Log
| extend City = tostring(Location.City)
| extend Country = tostring(Location.Country)
Using extend in this way is very common with security data, where the
information we are interested in is often buried deep in dynamic JSON or
XML objects. The surrounding data is often not notable to us, so we want to
focus on the parts we care about.
parse()
parse() is, without question, one of the most valuable operators you
will use, especially in security data. It allows us to evaluate a large string
and parse it to various columns based on the patterns or rules we decide. So,
what exactly does that mean? As always, the best way to learn is to see it in
action.
We will use the datatable operator to generate some fake data to keep
things consistent.
In this case, it is just one long string of test firewall data, as shown in Figure
5-34.
The data is shown in one long string. We could query this data and do some
hunting on the 50.50.50.50 IP, but it is currently a bit of a mess. To clean it
up, we can use parse and create a new column for the
DestinationIP :
We can use parse to create a new column from the data between two
matches. In this case, we are generating a new column,
DestinationIP , with everything between DstIP= and the comma.
The asterisk on either side tells KQL to ignore everything before and after those matches. If you change the comma to an equal sign, you will see the change in the output.
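The queries themselves are not reproduced here. As a minimal sketch, assume the fake data sits in a table called Log with the raw string in a column called Data, and that the string contains a marker such as DstIP= (the table, column, and sample values below are stand-ins for the book's fake firewall data):
let Log = datatable (Data:string)
[
    "Jul 18th 2023 07:23:24,SrcIP=50.50.50.50,DstIP=20.20.20.20,Port=443"
];
Log
| parse Data with * @"DstIP=" DestinationIP @"," *
Swapping the trailing @"," for @"=" is the equal-sign variant described above; DestinationIP would then capture everything up to the next equal sign instead of stopping at the comma.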
Using parse can sometimes require some trial and error to get exactly
right, but it is extremely powerful. You can parse multiple new columns
in a single line, which really shows the value of the operator. In this log, we
can parse the four pieces of data we want: Timestamp , SourceIP ,
DestinationIP , and Port :
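That full parse isn't reproduced here either. Continuing with the same hypothetical stand-in string (the field markers SrcIP=, DstIP=, and Port= are assumptions for illustration), pulling out all four columns in a single line might look like this:
let Log = datatable (Data:string)
[
    "Jul 18th 2023 07:23:24,SrcIP=50.50.50.50,DstIP=20.20.20.20,Port=443"
];
Log
| parse Data with Timestamp @",SrcIP=" SourceIP @",DstIP=" DestinationIP @",Port=" Port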
All the desired columns have been parsed, ready to query as normal, as
shown in Figure 5-37.
FIGURE 5-37 Fully parsed log data
The key to using parse is to just have a look at the structure of the data
and understand where you want to make the cuts. If you make errors in your
logic, and there are no matches, you simply will not receive any results.
split()
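The query that the next paragraph refers to isn't shown here. A minimal sketch, assuming the raw firewall string is in a column called Data (the inline datatable and the new column name SplitData are stand-ins for illustration), would be:
let Log = datatable (Data:string)
[
    "Jul 18th 2023 07:23:24,SrcIP=50.50.50.50,DstIP=20.20.20.20,Port=443"
];
Log
| extend SplitData=split(Data, ",")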
Notice that we are using extend in this syntax, so the output will be an
entirely new column. First, we need to tell KQL which column to split; in
this case, Data . Then, we need to define our delimiter character; in this
case, a comma. Our new columns are put into an array based on the comma
(see Figure 5-38).
FIGURE 5-38 Result when split with a comma
trim
trim() is another operator that lets us clean our data up a little, allowing
us to remove all leading and trailing matches from a string. If your data
comes through with some kind of extra characters, then you can remove
them.
If you run this query, you will see a string with a username surrounded by
equal signs, as shown in Figure 5-40.
Let’s say an appliance sends data in with extra characters; we can use
trim() to remove them:
let Log=datatable (data: string)["==cassie.hicks@
Log
| extend TrimmedData=trim("==",data)
If you wanted to only trim extra characters from the start or end of your
string, you can also use trim_start() or trim_end() ,
respectively.
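For example, if only the leading or trailing padding needs to go, a quick sketch on an illustrative string (the sample value is a stand-in, not the book's exact data) is:
let Log=datatable (data: string)["[email protected]=="];
Log
| extend TrimmedStart=trim_start("==", data)
| extend TrimmedEnd=trim_end("==", data)
TrimmedStart keeps the trailing equal signs and TrimmedEnd keeps the leading ones, which makes the difference between the two functions easy to see side by side.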
let
Say you are hunting through your various datasets, and you come across
two IP addresses of interest. You can cast that as a variable using let and
then refer to it in the query:
let IPs=dynamic(["20.20.20.20","20.70.20.20"]);
let Log=datatable (
Timestamp: datetime,
IPAddress: string,
Location: dynamic ,
User: string
)[
"2023-07-18 07:23:24.299", "20.20.20.20", dyn
"2023-07-20 14:54:44.343", "50.20.500.20", dy
"2023-06-13 09:53:12.123", "20.70.20.20", dyn
];
Log
| where IPAddress in (IPs)
At the very top of the query, we have cast the IPs variable using let :
let IPs=dynamic(["20.20.20.20","20.70.20.20"]);
Then, in the query itself, instead of having to type out the IP addresses again, we can refer to the IPs variable:
This tactic can be powerful when hunting the same IP addresses or other
indicators across multiple tables or data sources because you can simply
keep referring to the variable.
However, let becomes really powerful when you have the results of one
query be a variable for another. Let’s say you write a great detection query
looking for device malware. The query returns a list of devices with
malware, so you want to use that list of devices in your next query. This is
just a hypothetical example, and you won’t find any results in your actual
data:
DeviceFileEvents
| where ActionType == "FileCreated"
| where FileName == "malware.exe"
That query finds all the file-creation events for malware.exe . If you wanted to pivot from that list of devices to find
the DeviceLogonEvents for those same devices, you could do that in
a single query:
let devices=
DeviceFileEvents
| where ActionType == "FileCreated"
| where FileName == "malware.exe"
| distinct DeviceName;
DeviceLogonEvents
| where DeviceName in (devices)
This time, we cast our first query as a variable using let and named it
devices . We also added | distinct DeviceName at the end
because the only data we want from that query is the list of devices that
have malware.exe . In the second half of the query, we then looked at
DeviceLogonEvents , and within that second query, we returned only
the devices that matched in our first query by using | where
DeviceName in (devices) .
Rather than writing your first query, getting the results, making a list of all
the devices, typing that list into an array, and including them in your second
query, you can just let KQL do it for you. In our opinion, this is one of the
best ways to hunt because it lets you reuse the same list of indicators,
devices, or any other data as you hunt the threat.
externaldata
If you run the following code, the list of vulnerabilities published on the
CISA website is returned:
let CISAVulns=
externaldata(cveID: string, vendorProject: string
shortDescription: string, requiredAction: string,
[
'https://www.cisa.gov/sites/default/files/csv/kno
];
CISAVulns
You will see that the query contains the data schema and location. The
output looks like any other data, as shown in Figure 5-43.
You can query this data in the same way you would any other data. For
example, if you want to see only Adobe vulnerabilities, enter this query:
let CISAVulns=
externaldata(cveID: string, vendorProject: string
shortDescription: string, requiredAction: string,
[
'https://www.cisa.gov/sites/default/files/csv/kno
];
CISAVulns
| where vendorProject == "Adobe"
• Hunting for a published list of remote access tools that are seen in your
environment
• You must also update the query if the source file moves location or its schema changes.
When a user is compromised, there are two real facets to the response:
• First, we want to return control of the account to the actual user. That will
involve resetting the user’s password, revoking session tokens, possibly
requiring the user to re-register for MFA, and anything else that is part of
your response playbook.
When dealing with user compromise, there are possibly even multiple ways
that you were alerted to the malicious activity. Perhaps the user called you
or your help desk and mentioned weird behavior on their account, or maybe
you noticed the user signing in from a known malicious IP address. If you
use identity tools such as Microsoft Entra Identity Protection, perhaps the
user was flagged for a high level of risk. Maybe they fell victim to a
phishing attack. Maybe they clicked something they know they shouldn’t
have clicked and don’t want to admit it out of fear of repercussion. Maybe
they were compromised on a personal device or mobile phone over which
the security team lacks visibility. In some cases, we just don’t know how
they were compromised, but based on the data we do have, we might be
able to infer what happened. Or it might forever be a mystery.
Note
• AuthN With authentication (AuthN), the user proves who they are to the
identity provider. Generally, that involves entering a username and
password.
When a user signs into Microsoft Entra ID, a code (sometimes known as an
error code or result type) is returned, which tells us exactly what happened
during this particular sign-in. When a user signs in successfully and passes
both authentication and authorization, that code is a 0 . If an attacker, not
the rightful account owner, completed this action, the account is clearly
compromised. When a user fails authentication, such as with an incorrect
password, we might receive codes such as 50126 (username or password
incorrect). In this case, the user is not compromised. That leaves us with a
whole lot of codes that sit in the middle rectangle of our figure—where
authentication is successful and authorization fails. In our scenario noted above, Anna Lidman mentioned she is getting MFA prompts she didn't initiate. If we map that to Figure 5-45, the code for requiring MFA is 50074 . If Anna denied the MFA challenge (which we hope she did because she didn't initiate it), we would get a 500121 error code.
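Before filtering on a specific code, it can help to see which result codes actually show up for the user. A hedged sketch (the UPN is assumed from the scenario; ResultType and ResultDescription are standard fields in the Microsoft Sentinel SigninLogs schema) is:
SigninLogs
| where TimeGenerated > ago (30d)
| where UserPrincipalName == "[email protected]"
| summarize Count=count() by ResultType, ResultDescription
| sort by Count desc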
We know that when a user is prompted for MFA, they generate the 50074 result code, so let's start there:
SigninLogs
| where TimeGenerated > ago (30d)
| where UserPrincipalName == "anna.lidman@tailspi
| where ResultType == 50074
| project TimeGenerated, UserPrincipalName, AppDi
Note
• First, we are less likely to miss the first malicious sign-in if we go back in
time quite a while.
We can see a lot of events here for the user Anna Lidman; a sample of those
events is shown in Figure 5-46. Try this with your own account, and you
will see plenty of activity.
SigninLogs
| where TimeGenerated > ago (30d)
| where UserPrincipalName == "anna.lidman@tailspi
| where ResultType == 50074
| where RiskLevelDuringSignIn != "none"
| project TimeGenerated, UserPrincipalName, AppDi
RiskLevelDuringSignIn
Risk details can be kept in a few locations, but we tend to use the
RiskLevelDuringSignIn field, which tracks risk for each sign-in
event (see Figure 5-47). We can simply exclude all events with no risk
(where RiskLevelDuringSignIn is not equal to none ) to focus on
those that do contain some kind of risk.
FIGURE 5-47 Detailed view of a single sign-in
Tip
Warning
We can confirm this with the user, but even when a user says they aren't in a location, they might be using VPN software that makes it appear they are in that location.
Now that we have a few indicators we can pivot on, let's see whether these indicators are commonly associated with this user. We can use the IP address, the location, and the application that was signed in to ( OfficeHome ) to determine whether this activity is unusual for this user:
SigninLogs
| where TimeGenerated > ago (30d)
| where UserPrincipalName == "anna.lidman@tailspi
| where AppDisplayName == "OfficeHome" or IPAddre
| project TimeGenerated, UserPrincipalName, AppDi
RiskLevelDuringSignIn
We are no longer looking for just the 50074 result type or risky sign-ins; we are looking more widely. Using the or logic, we can
look for any sign-ins for Anna that are to the OfficeHome application
from the 70.70.70.70 IP address or Nigeria. Even though this sign-in
seems suspicious, this kind of query gives us some good context as to what
is normal for this user.
SigninLogs
| where TimeGenerated > ago (30d)
| where UserPrincipalName == "anna.lidman@tailspi
| where ResultType == 0
| where IPAddress == "70.70.70.70"
| project TimeGenerated, UserPrincipalName, AppDi
RiskLevelDuringSignIn
| summarize TotalCount=count(), FirstEvent=min(Ti
AppsAccessed=make_set(AppDisplayName)
SigninLogs
| where TimeGenerated > ago (30d)
| where ResultType == 0
| where IPAddress == "70.70.70.70"
| project TimeGenerated, UserPrincipalName, AppDi
RiskLevelDuringSignIn
| summarize TotalCount=count(), FirstEvent=min(Ti
AppsAccessed=make_set(AppDisplayName) by UserPrin
Figure 5-50 shows that another victim account, Eric Lang, has also signed in from this IP address. Phishing attacks are commonly sent to many users in the hope of compromising many accounts, so if this is a phishing attack, this finding isn't surprising.
Incident responders and blue teams will understand that when a user is
compromised, the adversary will usually seek to maintain persistence on
that account in various ways. Attackers usually know they will be caught at
some point but want to have a way to regain control of that account.
Persistence is often maintained by registering an MFA method on the user's account; this allows an adversary to access MFA-protected resources and use self-service password reset to retake control of that account.
The Microsoft Entra ID Audit Log is usually the first place we look for
persistence because that is where changes to user accounts, such as MFA
registration, are logged. Earlier in this chapter, we discussed the has
operator and why it is so useful; the Microsoft Entra ID audit log is a
perfect example of its power. This audit log tracks hundreds of events,
including MFA registration, group changes, and conditional access policy
updates. The details tracked for each type of event are unique. If a user registers a mobile phone number as an MFA method, that number is tracked. When we add users to a group, the group names are tracked, but there are no phone numbers in a group change event. Remembering every detail about every event is impossible, but we don't have to, because KQL will do the hard work for us:
AuditLogs
| where InitiatedBy has_any ("anna.lidman@tailspi
("[email protected]","eric.lang@tailsp
| project TimeGenerated, OperationName, Result, I
In the audit log, the details of the operations we are after are primarily kept
in two JSON fields: InitiatedBy and TargetResources . The
InitiatedBy field tracks the identity that initiated the activity; the
TargetResources field tracks the identity that was targeted by the
action and any other operation-specific data. For example, if an admin
added a user to a group, the admin would be identified in the
InitiatedBy field. The user and group name would be identified in
theTargetResources field. Sometimes, the InitiatedBy and
TargetResources fields can be the same identity. For example, if you
sign in to the Microsoft portal and register a new MFA method on your own
account, you will be both the initiator and target of that action. The query
uses the has_any operator to account for these combinations by looking
at the InitiatedBy and TargetResources fields (see Figure 5-
51).
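If you want the initiator and target flattened into their own columns, a hedged sketch using the usual dynamic field paths in the Microsoft Entra ID audit log schema (verify the paths against your own data, because operations initiated by applications rather than users use a different path) is:
AuditLogs
| where TimeGenerated > ago (30d)
| extend Initiator = tostring(InitiatedBy.user.userPrincipalName)
| extend Target = tostring(TargetResources[0].userPrincipalName)
| project TimeGenerated, OperationName, Result, Initiator, Target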
We can see a phone number registered if we dig into the Update user
events occurring after the User registered security info
events and browse to the TargetResources field (see Figure 5-52).
If you have a keen eye, you might see something curious: The same phone
number was attached to both accounts. Reusing the same phone number
across multiple accounts is a common adversary tactic. However, this can
happen for legitimate reasons too, such as the same number being assigned
to a regular and an admin account. While this behavior is unusual for
standard users, it is suspicious.
Adding a device is another common threat actor tactic, giving the attacker a
foothold in the tenant. Depending on your conditional access policies, it
might also grant access to additional business applications.
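To hunt for that kind of device registration activity in the same audit log, a sketch along these lines can work; treat the OperationName values as assumptions to verify against the operations present in your own tenant:
AuditLogs
| where TimeGenerated > ago (30d)
| where OperationName in ("Add device", "Add registered owner to device")
| where TargetResources has_any ("anna.lidman", "eric.lang")
| project TimeGenerated, OperationName, Result, InitiatedBy, TargetResources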
Your investigations might uncover other activities that have occurred. For
example, perhaps the threat actor completed a self-service password reset
on behalf of the user. There is a chance an admin was socially engineered to
reset this account's credentials. The key is diving into the different events
and understanding whether they are suspicious, which sometimes requires
talking directly to the users involved.
Finally, we want to understand which data and systems were accessed. This
is where it is important to know which logs you can access and where
systems log their information. Some customers have multiple SIEMs or
might use an SIEM and a product like Microsoft 365 Defender Advanced
hunting.
Microsoft Defender for Cloud Apps is the best place to determine the impact across Microsoft 365 services, such as SharePoint, Exchange Online, and Teams. This product aggregates events across the Microsoft 365 stack
and writes an event each time something occurs. For instance, if you
download a file from SharePoint or access an email, those events are
logged. Audit events are also triggered on user changes to the platform, so if
you create a new mailbox rule, it will be audited.
If you have fully connected your applications, events like MFA registration
will also appear in Microsoft Defender for Cloud Apps. In the examples in
this section, our investigation has uncovered some indicators we can use to
pivot effectively. We can put them all together to uncover suspicious
activity:
CloudAppEvents
| where TimeGenerated > ago (30d)
| where RawEventData has_any("anna.lidman@tailspi
ANG80Y") and RawEventData has "70.70.70.70"
This query looks for all our indicators: the two accounts in the example, the suspicious MFA phone number, and the unknown device. It also adds a filter for our malicious IP address. So, to get a result and help filter out legitimate activity, an event must contain one of our accounts, the phone number, or the device, as well as the IP address. In Figure 5-55, two previously unseen events have popped up.
Combined with everything else we have found, these findings paint the
picture of the attack and begin speaking to our threat actor’s motivation.
This attack looks related to business email compromise, where our
adversary might be trying to manipulate invoices or emails as a means of
financial gain.
In real-world cases, when this happens, we often ask for manual exports of
available logs. Maybe the MyPayroll application has a custom audit log that
we can interrogate, or perhaps it writes data to the Windows Event Log, and
we can extract that data. In other cases, the logs simply don’t exist; the
application might not retain log data, or the retention period is very short. In
any investigation, it’s unlikely that you will have complete visibility, but it
is important to pull on as many threads as possible. Even if your data is not
readily available in an SIEM, you can still query it—via KQL or even by
manually interrogating it.
KQL is the tool that can help you find what you are after, but it’s important
to bring investigative skills and curiosity to the table. Hunting user
compromise is a constant cycle, as Figure 5-58 shows.
FIGURE 5-58 Cycle of user compromise hunting
Phishing Attacks
Phishing is still one of the most common ways threat actors compromise
credentials. As an incident responder or defender, it’s important to
understand how to quickly determine the scope and impact of a phishing
attack.
In this example, one of our staff members, Eric Lang, mentioned that he noticed strange behavior with his email: several expected invoice- and payment-instruction-related emails were missing. He asked whether it was related to an email he received about his account expiring. After questioning him, you learn that the email stated his account would expire in 90 days unless he reconfirmed his username and password. After obtaining a copy of this email from Eric, you can see clearly that it is a phishing email (see Figure 5-59).
FIGURE 5-59 Sample of the phishing email
We find it useful to visualize the path you want your investigation to take
and what data you want to investigate. It is important, however, not to be
bound by any preconceived assumptions you have about what might have
happened. Instead, just follow the trail and let your forensic data and
hunting results tell the story. Figure 5-60 shows the necessary mindset for
hunting phishing attacks.
From the suspect phishing email shown in Figure 5-59, we know the
following initial list of indicators, which becomes our investigation's
starting point.
• Recipient [email protected]
• CloudAppEvents Event data for Microsoft 365 that can track items such as the download of data or the creation of mailbox rules.
Phishing is always interesting because the data you are interested in spans
many sources. For instance, you might use a third-party email gateway or
MFA service. If you have that data available to query with KQL in
Microsoft Sentinel or Azure Data Explorer, it will be relevant to you, too.
Depending on where your investigation takes you, you might also be
interested in device endpoint logging data.
When investigating phishing campaigns, you can begin your investigation in any number of places. You could do a deep dive on one user to understand everything that happened to them, or you could first work out the scale of the incident. Our preference is to understand the scale of the compromise
before pivoting to other data sources (see Figure 5-61). Doing so helps us
hunt more efficiently. If we can identify more victim accounts or indicators
of compromise, such as IP addresses, then our later queries will utilize those
indicators. Also, starting with the full scale of the compromise lets you
begin remediation quickly, even before a full investigation is complete for
specific users. For instance, if you know 100 users all fell victim to a
phishing campaign, you will want to regain control of those accounts as
soon as possible.
FIGURE 5-61 Priorities when hunting phishing campaigns
Because the Subject is usually unique, you can start by searching for it:
EmailEvents
| where TimeGenerated > ago(30d)
| where Subject =~ "Your Account Will Expire in 9
| project TimeGenerated, EmailDirection, Recipien
DeliveryLocation
In this query, the Subject is the first indicator, so we’ll use that to track
any emails received in the last 30 days matching that Subject . The
EmailEvents table contains a lot of information, but we want to focus
on some valuable fields. Using the project operator returns only a
handful of fields of interest (see Figure 5-62).
Five users received the phishing email (see Figure 5-63). We also have
additional indicators to use as the basis of our investigation. We need to be
careful not to focus so narrowly on a small piece of the puzzle that we miss
things. We have the sender’s email address (SenderFromAddress) and the
domain information (SenderFromDomain) used in the phishing attack.
Thinking like an attacker, we know they might send multiple different emails to try to compromise additional users. This is especially true when the whole domain is malicious and has no legitimate email associated with it.
EmailEvents
| where TimeGenerated > ago(30d)
| where Subject =~ "Your Account Will Expire in 9
or SenderFromAddress =~ "[email protected]
or SenderFromDomain =~ "tailspinIT.com"
| project TimeGenerated, EmailDirection, Recipien
DeliveryAction, DeliveryLocation
By using or logic in the query, we looked for emails with the same subject, the same sender, or the same sender domain. We found three additional emails from a different sender address but from the same malicious domain. In this case, however, those three email messages were blocked (which we can see in the DeliveryAction field in Figure 5-63), and the users never received them; perhaps they triggered a phishing detection. This example is only a small phishing attack, so we can easily see
which users received a phishing email. A real-life situation might involve
10,000 users, so we'd want to narrow our focus to those who received the
email. In this example, we can see that the delivery outcome is tracked in
the DeliveryAction field.
Tip
EmailEvents
| where Timestamp > ago(20d)
| extend AuthenticationDetails = todynamic(Authen
| project SenderFromAddress,
SenderDisplayName,
RecipientEmailAddress,
AuthDetailsSPF=parse_json(Authentication
AuthDetailsDKIM=parse_json(Authenticatio
AuthDetailsDMARC=parse_json(Authenticati
AuthDetailsCompAuth=parse_json(Authentic
| summarize by SenderFromAddress, SenderDisplayNa
tostring(AuthDetailsDKIM), tostring(AuthDetailsDM
No one can remember all the potential values or fields in your data. Even if
you could, those fields and values will change as the functionality of the
products changes or you onboard more data sources. Instead of memorizing
all that information, we can use KQL to return the needed information. Like
many things in KQL, there are multiple ways to do this. Earlier in this book,
you were introduced to the distinct and count() operators. The
following query uses the distinct operator to find the possible results
for DeliveryAction :
EmailEvents
| where TimeGenerated > ago(30d)
| distinct DeliveryAction
In Figure 5-64, we see three outcomes:
• Junked The email was delivered but to the Junk Email folder.
EmailEvents
| where TimeGenerated > ago(30d)
| summarize count() by DeliveryAction
We get the same values, but this time, they are populated with the count for
each one, as shown in Figure 5-65.
Now that we know the potential values for our DeliveryAction field,
we can filter the query on the delivered emails:
EmailEvents
| where TimeGenerated > ago(30d)
| where Subject =~ "Your Account Will Expire in 9
or SenderFromAddress =~ "[email protected]
or SenderFromDomain =~ "tailspinIT.com"
| project TimeGenerated, EmailDirection, Recipien
DeliveryAction, DeliveryLocation
| where DeliveryAction == "Delivered"
So, where do we go from here with our targeted users? The reality is that (hopefully) only a small percentage of people who receive a phishing email will click the malicious link, and (again, hopefully) an even smaller percentage will enter their credentials into the site and be compromised.
First, we need to track down the malicious URL in the email. Of course,
you can examine a copy of the email to get this information, though a better
way is to leverage the EmailUrlInfo data. Again, this is all about
hunting scale. What if the threat actor sent 10,000 emails with a single
subject, but there were 500 unique URLs within those 10,000 emails? You
can’t examine each email one by one to retrieve the URL information, so
let’s leverage our data. What exactly does that data look like? Use the
take or sample operators to check it out:
EmailUrlInfo
| take 10
In earlier chapters, we covered joins; joining data is what stumps people the
most. Let’s keep it simple for now, though. In Figure 5-69, two datasets are
shown:
EmailEvents
| where TimeGenerated > ago(30d)
| where Subject =~ "Your Account Will Expire in 9
or SenderFromAddress =~ "[email protected]
or SenderFromDomain =~ "tailspinIT.com"
| project TimeGenerated, EmailDirection, Recipien
DeliveryAction, DeliveryLocation, NetworkMessageI
| where DeliveryAction == "Delivered"
| join kind=inner(EmailUrlInfo) on NetworkMessage
The results now combine the two datasets: the delivery information from
EmailEvents and the URL information from EmailUrlInfo . It’s as
easy as that!
Tip
When you join your EmailEvents and
EmailUrlInfo , you will end up with more results in
the combined data compared to how many emails were
received. Why is that? An email might contain multiple
URLs; you will get a record in EmailUrlInfo for
each. An email with a URL in the subject, one in the
email body, and one in the email signature will have a
total of three EmailUrlInfo events for a single email.
So, now we know five users received this email, which contained a single
phishing URL shown in Figure 5-70.
Let’s see if we can determine who clicked the link in the email. If you use
Safe Links (part of Microsoft Defender for Office 365) or similar third-party tooling,
you can use this data source to detect whether a user clicked a particular
link. These products work by rewriting the actual URL contained in the
email to a cloud security service. Therefore, https://example.com becomes
https://cloudsecurityservice.com/example.com. This is a simplified
example, but these services can block malicious URLs and track which
users clicked them. Figure 5-71 shows how URL protection works.
Note
If you want to look at this data, just use take to look at a sample:
UrlClickEvents
| take 10
Figure 5-72 shows an example of a click event. The click was allowed in this case because the URL was not deemed malicious or otherwise blocked. We can see who clicked the URL, the IP address the click came from, and the
workload (such as Email or Teams). If you configure Safe Links to warn users instead of outright blocking malicious sites, users are given the option to proceed anyway, which is tracked in the IsClickedThrough field. If this is set to true , the user was warned and decided to click through to the link anyway. This data is useful forensically, but it's also handy when a user tells you they didn't click the link: you can take a screenshot of the ClickAllowed event and casually leave it on their desk.
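To pull just those warned-but-clicked-through events, a hedged sketch (column names follow the UrlClickEvents schema; swap TimeGenerated for Timestamp if you are querying in advanced hunting rather than Microsoft Sentinel) is:
UrlClickEvents
| where TimeGenerated > ago(30d)
| where IsClickedThrough == true
| project TimeGenerated, AccountUpn, Url, IPAddress, ActionType, Workload
With the click data in hand, we can join it back to the email and URL information from the earlier queries: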
EmailEvents
| where TimeGenerated > ago(30d)
| where Subject =~ "Your Account Will Expire in 9
or SenderFromAddress =~ "[email protected]
or SenderFromDomain =~ "tailspinIT.com"
| project TimeGenerated, EmailDirection, Recipien
DeliveryAction, DeliveryLocation, NetworkMessageI
| where DeliveryAction == "Delivered"
| join kind=inner(EmailUrlInfo) on NetworkMessage
| join kind=inner(UrlClickEvents) on Url, Network
For the second join, we want to make sure we get a match on both the
NetworkMessageId and Url fields, so we just join on both. All
the data is combined, as shown in Figure 5-74.
FIGURE 5-74 Results with combined data from all three tables
We will leverage the let operator to make pivoting even easier. As mentioned earlier, we can use let to cast a variable. In this case, we will cast our whole first query as a variable to reuse in a different data source:
let users=
EmailEvents
| where TimeGenerated > ago(30d)
| where Subject =~ "Your Account Will Expire in 9
or SenderFromAddress =~ "[email protected]
or SenderFromDomain =~ "tailspinIT.com"
| project TimeGenerated, EmailDirection, Recipien
DeliveryAction, DeliveryLocation, NetworkMessageI
| where DeliveryAction == "Delivered"
| join kind=inner(EmailUrlInfo) on NetworkMessage
| join kind=inner(UrlClickEvents) on Url, Network
| distinct RecipientEmailAddress;
SigninLogs
| where UserPrincipalName in~ (users)
| where RiskLevelDuringSignIn in ("high","medium"
| project TimeGenerated, UserPrincipalName, AppDi
RiskLevelDuringSignIn
We cast this phishing query as a variable called users . At the end of that
query, we used the distinct operator to retrieve only the impacted
users. We then pivoted across to the Microsoft Entra ID sign-in logs to look
for high- and medium-risk sign-ins from those same users. Instead of having
to list the users in this query, we just call them from the users variable.
Figure 5-75 shows a sample of the total sign-in activity, but we received
two very suspicious-looking sign-ins from two IP addresses we haven’t
seen so far: 40.40.40.40 and 30.30.30.30 . The sign-ins are for
the OfficeHome application, which relates to office.com. While not
malicious, office.com is often used by phishing campaigns as a final redirect
or by threat actors testing credentials.
FIGURE 5-75 Results from Microsoft Entra ID sign-in events based on
risk
Once you have these IP addresses, you can complete a secondary pivot
using a second let statement. Maybe some additional users were
compromised differently:
let users=
EmailEvents
| where TimeGenerated > ago(30d)
| where Subject =~ "Your Account Will Expire in 9
or SenderFromAddress =~ "[email protected]
or SenderFromDomain =~ "tailspinIT.com"
| project TimeGenerated, EmailDirection, Recipien
DeliveryAction, DeliveryLocation, NetworkMessageI
| where DeliveryAction == "Delivered"
| join kind=inner(EmailUrlInfo) on NetworkMessage
| join kind=inner(UrlClickEvents) on Url, Network
| distinct RecipientEmailAddress;
let ips=
SigninLogs
| where UserPrincipalName in~ (users)
| where RiskLevelDuringSignIn in ("high","medium"
| project TimeGenerated, UserPrincipalName, AppDi
RiskLevelDuringSignIn
| distinct IPAddress;
SigninLogs
| where UserPrincipalName !in~ (users) and IPAddr
| project TimeGenerated, UserPrincipalName, AppDi
RiskLevelDuringSignIn
This time, we used let to cast two different variables, users from our
first query, and then ips with our second query. Then, we ran a final
query, utilizing both those variables. The query looks for users who aren’t
in the users variable (because we already know about them) but are
from the known-bad IP addresses in the second variable, ips . Pivoting
with let statements like this in Kusto is efficient. For instance, if you
found another suspicious subject that you wanted to query on, you could
add it to the initial phishing query; subsequent queries would then include
that new subject and automatically capture new victim accounts. In this
example, thankfully, we found no additional victim accounts.
Now that we understand the scope of the compromise with our three user
accounts, we need to understand the impact on our systems and user
accounts. You can start by summarizing your sign-in data to see exactly
what was accessed from these IP addresses by the threat actor:
SigninLogs
| where IPAddress in ("40.40.40.40","30.30.30.30"
| summarize TotalCount=count(), ApplicationsAcces
For the data that is in Defender for Cloud Apps, we can use our existing
indicators of compromise (IoCs) to pivot into that dataset:
CloudAppEvents
| where RawEventData has_any ("30.30.30.30","40.4
("[email protected]","tanja.plate@tailsp
FIGURE 5-77 A new rule event created by one user and multiple file
download events by another user
Figure 5-78 shows a closer look at the New-InboxRule event; an
adversary created a rule to move emails containing the word “invoice” to
the Conversation History folder, which explains why Eric is missing those
emails.
In Figure 5-79, we can see that the FileDownloaded events show that the
adversary was very interested in files relating to invoices, payment
instructions, and VPN access.
FIGURE 5-79 FileDownloaded events for Anna Lidman
Phishing attacks are one of the most common attack vectors seen in real-
world compromises. Being able to quickly understand the scale of a
phishing attack is crucial. We need to rapidly understand how many users
were targeted by the attack and how many users received the email. If
possible, we need to know who clicked the link. From that list of users, we
can try to determine which users were actually compromised by the attack.
From there, we revisit our user-compromise playbook to understand what
the adversary did when controlling those users. Did they manipulate
mailbox rules, register MFA, or access business systems? While getting
answers to these questions, we might uncover further indicators of
compromise or additional victims, replay our hunting methodology across
those new indicators, and pivot quickly.
As you analyze your logs, the inconsistencies in the data formats can
become a real bottleneck to completing your analysis. You might find one
log source that refers to a source IP address as SrcIP , while another uses
a completely different language. Some of your appliances might even send
logs in non-UTC time! We can hear the collective scream of analysts as
they remember an investigation where not all devices were configured to
send data in UTC time.
This section of the book aims to show you how to use KQL to manipulate
very inconsistent data into a consistent format. To help understand how we
do that, we have built some fake firewall data that you can use. If you have
been struggling with your own logs, you can apply the examples in this
chapter to your data. You can access this data by using the
externaldata operator. To keep things simple and to make sure the
queries run quickly, we have provided 12 firewall records:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
There are three distinct log variations. The first starts with a srcdev
field; the second starts with device ; and the final starts with a
timestamp. While these logs aren’t taken from actual devices, we have seen
logs that look very similar to these and some that are even harder to
analyze.
The issue with these logs is that creating accurate queries becomes very difficult. Of course, we can search all the logs for a particular IP
address, which will return the results. However, we won’t know whether the
IP is a source IP or a destination IP address without looking at each log
separately. You might also notice that not all these logs are in UTC; one references +5 GMT and another tz=-4, both indicators of time zone configuration. Understanding the event order when the timestamps aren't uniform requires some mental gymnastics. Take a step back, though, and breathe. KQL has a heap of tools in its bag to sort these logs out. You will be finding the bad guys in no time.
There are many ways to attack data issues like this, and there is no right or
wrong way. KQL has several tools to manipulate data, covered earlier in
this chapter and in other parts of the book. We can parse() , split() ,
and trim() things to clean up the data. Given that we have logs from
three unique devices here, we think it’s best to address each device
individually.
Let’s start with our first device; we can see it is flagged in our data as
srcdev=10.10.10.10 , so let’s filter on that:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
Tip
split()
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| extend Logs=split(data,",")
Here, we create a new array called Logs , with each element of the array produced by splitting the data on a comma. In Figure 5-82, each field also contains an equal sign ( = ), which separates the field title, such as srcdev , from the actual value of that field, 10.10.10.10 .
FIGURE 5-82 Output after splitting the data based on a comma
Because the new field, Logs , is an array, the data is also positional, with the source device in position [0] . We can then do a second split and extend a new column to get the desired value:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| extend Logs=split(data,",")
| extend SourceDevice=split(Logs[0],"=")[1]
Now, the source device is a separate field! Next, we will extend all the
other fields in the same way:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| extend Logs=split(data,",")
| extend SourceDevice=split(Logs[0],"=")[1]
| extend Date=split(Logs[1],"=")[1]
| extend Time=split(Logs[2],"=")[1]
| extend Action=split(Logs[3],"=")[1]
| extend SourceIP=split(Logs[4],"=")[1]
| extend DestinationIP=split(Logs[5],"=")[1]
| extend SourcePort=split(Logs[6],"=")[1]
| extend DestinationPort=split(Logs[7],"=")[1]
| extend Protocol=split(Logs[8],"=")[1]
| extend BytesIn=split(Logs[9],"=")[1]
| extend BytesOut=split(Logs[10],"=")[1]
| project-away data, Logs
We just use the extend operator for each of the fields. The position in the
Logs array changes, but other than that, it’s just rinse and repeat. At the
end of the query, we also use project-away to remove the data and
Logs fields. project-away is the opposite of project ; it tells Kusto to remove the listed fields. We have extracted what we need
from the raw data, so we no longer need to see the raw data in our results.
Now your results appear as shown in Figure 5-85.
FIGURE 5-85 Logs split out fully using a combination of splitting on a
comma and an equal sign
All the fields have been separated, and the data is much easier to analyze.
parse
parse lets us match expressions in the data to extend them to new fields.
So, let’s go back to the unstructured data once more to understand how
exactly it works:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
The raw data for the 10.10.10.10 device is shown in Figure 5-86.
FIGURE 5-86 Example firewall data from device 10.10.10.10
Now, let’s see how parse works:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @","
This new line tells Kusto to take the data between srcdev= and the comma ( , ) and create a new field called SourceDevice from it; you will see this in your output and in Figure 5-87.
You can continue to chain this logic together to extend additional columns.
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @",da
FIGURE 5-88 SourceDevice and Date created as new fields with parse
Using parse in this way requires you to look at your data, identify data
between these two expressions, and build the query to reflect that data. We
can parse the whole string of data out and make the exact same fields we
did using split :
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @",da
@",dstip=" DestinationIP @",srcprt=" SourcePort @
@",bytesout=" BytesOut
| project-away data
Use the same syntax to extract all the fields. At the end of the query, use project-away to remove the raw data field. The output is identical to where we used the split operator, as
shown in Figure 5-89.
Timestamps
There are several ways to get from A to B when manipulating data like this,
and you might have your own ideas. We would start by splitting the data
into three elements: day , month , and year . In this case, we can
split the data on the whitespace between those elements:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @",da
@",dstip=" DestinationIP @",srcprt=" SourcePort @
@",bytesout=" BytesOut
| project-away data
| extend Month=tostring(split(Date," ")[0]), Day=
Figure 5-91 shows the three elements after being split.
Removing Ordinals
Next, we need to drop the ordinals from the Day field. We think the best way to do this is to extract just the digits via regex :
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @",da
@",dstip=" DestinationIP @",srcprt=" SourcePort @
@",bytesout=" BytesOut
| project-away data
| extend Month=tostring(split(Date," ")[0]), Day=
| extend Day=extract(@'^(\d+)',1,Day)
For those unfamiliar with how regex works, it is covered in more detail
in Chapter 6, "Contributing." In this example, regex looks at the Day field and extracts only the numbers, dropping the ordinals and future-proofing us against dates like 1st or 22nd getting into our data.
Now, let’s fix the time, which is a little easier. First, we want to remove the
time zone information from the end of the timestamp because the standard
doesn’t allow it. Again, we split off the data we want:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @",da
@",dstip=" DestinationIP @",srcprt=" SourcePort @
@",bytesout=" BytesOut
| project-away data
| extend Month=tostring(split(Date," ")[0]), Day=
| extend Day=extract(@'^(\d+)',1,Day)
| extend Time=tostring(split(Time,"(")[0])
Figure 5-93 shows the time with the time zone removed.
Now, we need to change the decimal points to colons. We can do that via
the replace_string function, which just replaces string matches with
something else:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @",da
@",dstip=" DestinationIP @",srcprt=" SourcePort @
@",bytesout=" BytesOut
| project-away data
| extend Month=tostring(split(Date," ")[0]), Day=
| extend Day=extract(@'^(\d+)',1,Day)
| extend Time=tostring(split(Time,"(")[0])
| extend Time=replace_string(Time,".",":")
replace_string found all the full stops in the Time field and
changed them to colons.
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @",da
@",dstip=" DestinationIP @",srcprt=" SourcePort @
@",bytesout=" BytesOut
| project-away data
| extend Month=tostring(split(Date," ")[0]), Day=
| extend Day=extract(@'^(\d+)',1,Day)
| extend Time=tostring(split(Time,"(")[0])
| extend Time=replace_string(Time,".",":")
| extend Timestamp=strcat(Day," ",Month," ",Year,
| extend Timestamp=todatetime(Timestamp)
Changing to UTC
The original logs were in +5 GMT, so we need to subtract 5 hours to set the
time to true UTC. Now that we have cast the Timestamp field as the
datetime type, we can simply remove five hours. At the same time, we
will use project-away to remove the working elements, Date ,
Time , Day , Month , and Year . Finally, we will reorder the data to
put the Timestamp first:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "srcdev=10.10.10.10"
| parse data with * @"srcdev=" SourceDevice @",da
@",dstip=" DestinationIP @",srcprt=" SourcePort @
@",bytesout=" BytesOut
| project-away data
| extend Month=tostring(split(Date," ")[0]), Day=
| extend Day=extract(@'^(\d+)',1,Day)
| extend Time=tostring(split(Time,"(")[0])
| extend Time=replace_string(Time,".",":")
| extend Timestamp=strcat(Day," ",Month," ",Year,
| extend Timestamp=todatetime(Timestamp)
| extend Timestamp=Timestamp-5h
| project-away Date, Time, Day, Month, Year
| project-reorder Timestamp, SourceDevice, Action
BytesOut
Figure 5-97 shows the fully parsed data with correct timestamps. Our logs
are so beautiful you could hang them in the Louvre.
FIGURE 5-97 Fully parsed firewall data
If you are new to KQL, you might think cleaning up the data like this is a
lot of work. What’s the point? Cleaning up the data is crucial to analysis
because we can now target specific times and dates, ports, and IP addresses,
and the data is returned in a way that is easy to digest.
Now, we can move on to the next source device, where the logs also differ.
For these, we can look for the 10.10.10.30 IP address:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "device:10.10.10.30"
In Figure 5-98, we see four new records belonging to this device. If we look a bit closer, we can see that the logs look similar, but they are different enough that we will need to parse them again. For example, there is a new field called policy that doesn't exist in the other logs.
FIGURE 5-98 Firewall data from device 10.10.10.30
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "device:10.10.10.30"
| parse data with @"device:" SourceDevice @",time
SourceIP @":" SourcePort @"/" Protocol @",dst=" D
BytesOut "bytes,tz=" Timezone
| project-away data
When using parse in this fashion, there will be some trial and error as
you get the expressions in the right place, but after a few attempts you should
get it right. The data shown in Figure 5-99 looks pretty good nearly straight
away! We just have a couple of things to address.
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "device:10.10.10.30"
| parse data with @"device:" SourceDevice @",time
SourceIP @":" SourcePort @"/" Protocol @",dst=" D
BytesOut "bytes,tz=" Timezone
| project-away data
| extend DestinationPort=split(DestinationPort,"/
| extend Timestamp=todatetime(Timestamp)
| extend Timestamp=Timestamp+4h
| project-away Timezone
| project-reorder Timestamp, SourceDevice, Action
BytesOut, Policy
Figure 5-100 shows the fully parsed firewall data from device 10.10.10.30.
Now, we have two devices that are aligned. One to go! Let’s check out what
those logs look like:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "SourceDeviceIP:10.10.10.20"
As you can see in Figure 5-101, these logs look quite different from the previous two logs we've looked at:
• The SourceDeviceIP is at the end of the results.
• There is no Protocol .
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "SourceDeviceIP:10.10.10.20"
| parse data with Timestamp @": connection " Acti
DestinationPort @" from " SourceIP @":" SourcePor
As you can see in Figure 5-102, some of the data looks good, but there is an issue. The final record, a firewall deny event, hasn't been parsed properly because these logs are more descriptive. Instead of being logged as a deny, it is shown as a connection rejected event, so our parse logic fails.
To counter this problem, we will need to create two separate parsers: one for connection accepted events and one for connection rejected events. We will start with connection accepted events. This query finds only the connection accepted events from device 10.10.10.20:
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "SourceDeviceIP:10.10.10.20" and
| parse data with Timestamp @": connection " Acti
DestinationPort @" from " SourceIP @":" SourcePor
| project-away data
Next, we will cast the timestamp to the datetime type so that KQL
treats it as a proper timestamp. Finally, we will also reorder the data as
Timestamp , SourceDevice , Action , SourceIP ,
SourcePort , DestinationIP , DestinationPort ,
Protocol , BytesIn , BytesOut , and Policy :
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
| where data has "SourceDeviceIP:10.10.10.20" and
| parse data with Timestamp @": connection " Acti
DestinationPort @" from " SourceIP @":" SourcePor
| project-away data
| extend Timestamp=todatetime(Timestamp)
| project-reorder Timestamp, SourceDevice, Action
We are close to finishing the firewall parser. We have three devices and four queries, but how do we put them all together? We will simply union our queries into one magical parser. To make things easier, let's cast some variables. First, we will cast the externaldata call as Logs and then cast one variable for each of the four queries we've created, called one , two , three , and four . Then, we just need to union them:
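The combined parser isn't reproduced in full here. Its shape, sketched with a tiny inline stand-in for the externaldata feed and with each per-device parser abbreviated to a single field, looks like the following; in practice you would keep the real externaldata call and paste in the full parse lines shown earlier:
// Structural sketch only: substitute the real externaldata call for the stand-in
// datatable, and the full per-device parse lines for the abbreviated ones below.
let Logs = datatable (data:string)
[
    "srcdev=10.10.10.10,date=Jul 1st 2023,...",
    "device:10.10.10.30,timestamp=2023-07-01 10:00:00,...",
    "2023-07-01 10:00:00: connection accepted ... SourceDeviceIP:10.10.10.20",
    "2023-07-01 11:00:00: connection rejected ... SourceDeviceIP:10.10.10.20"
];
let one = Logs
    | where data has "srcdev=10.10.10.10"
    | parse data with * @"srcdev=" SourceDevice @"," *;
let two = Logs
    | where data has "device:10.10.10.30"
    | parse data with @"device:" SourceDevice @"," *;
let three = Logs
    | where data has "SourceDeviceIP:10.10.10.20" and data has "accepted"
    | parse data with Timestamp @": connection accepted" *;
let four = Logs
    | where data has "SourceDeviceIP:10.10.10.20" and data has "rejected"
    | parse data with Timestamp @": connection rejected" *;
union one, two, three, four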
Figure 5-105 shows the logs lined up, with the same field names and
aligned timestamps.
FIGURE 5-105 Parsed firewall data with all devices combined
Some fields are missing from some records, but that is to be expected because not every device logs every event and field. The Action field also has a mix of different values: accepted , accept , and allow for successful attempts, and deny , denied , and drop for failed events.
We wouldn't want to search for accept and find events from only one firewall. You can standardize those values in several ways, but we prefer to use a case statement, which is added to the end of the query:
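The case() line isn't reproduced on its own in the text; a self-contained sketch of the same idea, using a tiny stand-in table so it can be run in isolation (in the chapter's query the extend is simply appended after the union), is:
let Sample = datatable (Action:string)
[
    "accepted", "accept", "allow", "deny", "denied", "drop", "reset"
];
Sample
| extend Action=case(
    Action in ("allow", "accept", "accepted"), "allow",
    Action in ("deny", "denied", "drop"), "drop",
    "other")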
This little bit of code is just a simple case() statement. If you have a
background in SQL, this might look familiar. We are simply saying that if
the Action is allow , accept , or accepted , then change it to
allow , and if it is deny , denied , or drop , then change it to
drop . If it doesn’t match any of those, then set it to other .
Now that the Action field is consistent, if we search for allow events,
we will get results from all three firewalls, even though they don’t use that
language natively. See Figure 5-106.
We have little doubt that many of you have also been the lucky person who
must respond to audits. You know the ones where you receive a laundry list
of questions from an internal or external auditor, and you must go hunting
around your environment for all the answers. Historically, auditors asked
about things like password policies and the status of backup infrastructure.
As cyberattacks have become more commonplace, these audits now often
include things like MFA coverage or vulnerability management to quantify
your environment’s risk. You also might need to provide evidence of
compliance before insurance companies issue new cybersecurity insurance
policies or renew existing ones.
In this section, we will learn how to answer the kinds of questions you might receive in an audit, and hopefully satisfy some of your own curiosity along the way. While the questions we pose in this section might not match the ones you have seen, the hope is that you will take away knowledge and skills you can apply in your own environment.
For this topic, we will use a variety of data sources. Of course, you might
have additional sources beyond those shown here, but what you learn here
will hopefully apply to those other sources. From our experience, we have
drawn useful information from the following sources:
You can gain valuable insights from myriad other data sources, too. Logs from a third-party MFA provider or a federated identity provider will be useful if you have them, and firewall logs and other endpoint agents might be something you lean on in your environment.
Multifactor Authentication
Let’s start with MFA, a topic that is likely to come up on audits and in
relation to cybersecurity. If you use Microsoft Entra ID, sign-in data can be sent to Microsoft Sentinel or viewed in Microsoft 365 Defender Advanced Hunting.
Hunting. The examples in this section will use the Microsoft Sentinel
schema for consistency. However, this book’s GitHub repository at
https://github.com/KQLMSPress/definitive-guide-kql has you covered for
Microsoft 365 Advanced Hunting queries. If you use another identity
provider, such as Okta, and send your logs to Microsoft Sentinel as your
SIEM, then it is highly likely that similar information will be in that
telemetry, too.
or
SigninLogs
| take 1
Figure 5-107 shows just a sample of the schema, but we can see things like
TimeGenerated , the duration of the sign-in, and the ResultType
(also known as the ErrorCode ):
FIGURE 5-107 Example of the Microsoft Entra ID sign-in logs schema
• The application that was signed into—in this case, Azure Virtual Desktop
Client.
While all this information is great, let's condense the query down to the fields we might use to answer questions about MFA:
SigninLogs
| project TimeGenerated, AppDisplayName, UserPrin
We use project to retrieve only the time, username, some information
about the result, location, and AuthenticationRequirement field.
If you run that in your tenant, you should see results similar to Figure 5-
109.
FIGURE 5-109 Example sign-in events from Microsoft Entra ID, including
MFA details
In the Tailspin Toys tenant, we can see some sign-ins to the Azure Portal
and the Azure Virtual Desktop Client. Some are from Australia; some are
from the US. We also see a ResultType of 0 .
Tip
The power of KQL really becomes apparent when using the data
summation operators to slice MFA statistics in any way we want. Let’s start
simply by just counting by the AuthenticationRequirement field:
SigninLogs
| where TimeGenerated > ago (180d)
| where ResultType == 0
| project TimeGenerated, AppDisplayName, UserPrin
Location
| summarize Count=count() by AuthenticationRequir
This example shows the last 180 days using the ago() operator.
Depending on how much data you retain, you might have more or less data
available to you. As mentioned, we also look for successful sign-ins and
keep the same fields using project as before. Then, we just
summarize the count of the AuthenticationRequirement field.
See Figure 5-112.
Note
The results in your tenant will look different, but in the Tailspin Toys
tenant, we can see 1,234 single-factor authentications and 1,755
authentications requiring MFA. Not bad! At least there is more MFA than
single-factor authentication. That said, these raw numbers are probably not
what an auditor is after because the numbers don’t provide a lot of context.
We can do better! We can use multiple summation operators to add
more context:
SigninLogs
| where TimeGenerated > ago (180d)
| where ResultType == 0
| project TimeGenerated, AppDisplayName, UserPrincipalName, ResultType, ResultDescription, Location, AuthenticationRequirement
| summarize TotalCount=count(), MultiFactor=countif(AuthenticationRequirement == "multiFactorAuthentication"), SingleFactor=countif(AuthenticationRequirement == "singleFactorAuthentication")
• The first is the total of all successful sign-ins in the specified period, known
as TotalCount .
• The second, MultiFactor , counts only the sign-ins where the
AuthenticationRequirement was multifactor authentication.
• The third, SingleFactor , counts the sign-ins that were single-factor only.
SigninLogs
| where TimeGenerated > ago (180d)
| where ResultType == 0
| project TimeGenerated, AppDisplayName, UserPrincipalName, ResultType, ResultDescription, Location, AuthenticationRequirement
| summarize TotalCount=count(), MultiFactor=countif(AuthenticationRequirement == "multiFactorAuthentication"), SingleFactor=countif(AuthenticationRequirement == "singleFactorAuthentication")
| extend ['MFA Percentage']=(todouble(MultiFactor) * 100 / todouble(TotalCount))
| extend ['SFA Percentage']=(todouble(SingleFactor) * 100 / todouble(TotalCount))
| project-reorder TotalCount, MultiFactor, ['MFA Percentage'], SingleFactor, ['SFA Percentage']
If you are curious about the todouble() function, it simply tells KQL
to convert the number to a real floating-point number. (This might be
another trip down math memory lane for you.) We also added a
project-reorder operator in there to make sure our data is ordered
so it's easy to read. project-reorder simply rearranges the order of
the fields in the results for us. See Figure 5-114.
In Figure 5-115, those multiple decimal points have been rounded to just
two, making the numbers much easier to read.
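The rounding itself is a one-line change per column. A sketch of the adjusted extend lines, assuming the same column names as before, looks like this:
| extend ['MFA Percentage']=round((todouble(MultiFactor) * 100 / todouble(TotalCount)), 2)
| extend ['SFA Percentage']=round((todouble(SingleFactor) * 100 / todouble(TotalCount)), 2)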
Now, we can go back to the auditor and say MFA coverage is 58.72 percent.
Pretty good!
One of the best features of data aggregation is that we can count and
calculate things by something else. So, let’s expand the query. Chances
are, your company’s conditional access policies enforce MFA on some
applications and single-factor on others. You can simply add by
AppDisplayName to the previous query to display the coverage per
application. Here’s the revised version:
SigninLogs
| where TimeGenerated > ago (180d)
| where ResultType == 0
| project TimeGenerated, AppDisplayName, UserPrincipalName, ResultType, ResultDescription, Location, AuthenticationRequirement
| summarize TotalCount=count(), MultiFactor=countif(AuthenticationRequirement == "multiFactorAuthentication"), SingleFactor=countif(AuthenticationRequirement == "singleFactorAuthentication") by AppDisplayName
| extend ['MFA Percentage']=round((todouble(MultiFactor) * 100 / todouble(TotalCount)), 2)
| extend ['SFA Percentage']=round((todouble(SingleFactor) * 100 / todouble(TotalCount)), 2)
| project-reorder AppDisplayName, TotalCount, MultiFactor, ['MFA Percentage'], SingleFactor, ['SFA Percentage']
This query will calculate the percentages for every unique application, as
shown in Figure 5-116.
FIGURE 5-116 MFA statistics per application
The breakdown of MFA per application is quite low for the Azure Virtual
Desktop Client and much higher for the Azure Portal. Why is this valuable?
Maybe your auditor or manager wants to know the MFA percentage for
some specific applications. Perhaps MFA coverage for VPN or virtual
desktop-type applications is especially important. Perhaps access to
management applications like the Azure Portal is crucial. You could add
that additional logic by adding | where AppDisplayName has
“Azure” . Only applications containing the name Azure would be
returned.
SigninLogs
| where TimeGenerated > ago (180d)
| where ResultType == 0
| project TimeGenerated, AppDisplayName, UserPrincipalName, ResultType, ResultDescription, Location, AuthenticationRequirement
| summarize TotalCount=count(), MultiFactor=countif(AuthenticationRequirement == "multiFactorAuthentication"), SingleFactor=countif(AuthenticationRequirement == "singleFactorAuthentication") by UserPrincipalName
| extend ['MFA Percentage']=round((todouble(MultiFactor) * 100 / todouble(TotalCount)), 2)
| extend ['SFA Percentage']=round((todouble(SingleFactor) * 100 / todouble(TotalCount)), 2)
| project-reorder UserPrincipalName, TotalCount, MultiFactor, ['MFA Percentage'], SingleFactor, ['SFA Percentage']
Figure 5-117 shows a breakdown of each user.
Maybe your auditor wants to know the MFA percentage for your admin
accounts. If your admin accounts all follow a naming standard beginning
with adm , you could add | where UserPrincipalName
startswith “adm” to return the MFA percentage for each admin
account.
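As a sketch, assuming your admin accounts really do follow that adm naming convention and the standard AuthenticationRequirement values, the filter slots in alongside the other where statements:
SigninLogs
| where TimeGenerated > ago (180d)
| where ResultType == 0
| where UserPrincipalName startswith "adm"
| summarize TotalCount=count(), MultiFactor=countif(AuthenticationRequirement == "multiFactorAuthentication"), SingleFactor=countif(AuthenticationRequirement == "singleFactorAuthentication") by UserPrincipalName
| extend ['MFA Percentage']=round((todouble(MultiFactor) * 100 / todouble(TotalCount)), 2)
| project-reorder UserPrincipalName, TotalCount, MultiFactor, ['MFA Percentage']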
Tip
AADNonInteractiveUserSignInLogs
| extend DeviceRaw=parse_json(DeviceDet
| extend DeviceOS=DeviceRaw.operatingSy
DeviceId=DeviceRaw.deviceId,DeviceBrows
| where AuthenticationProcessingDetails
| extend JsonAuthProcDetails = parse_js
| mv-apply JsonAuthProcDetails on (
where JsonAuthProcDetails.key startswit
| project HasLegacyTls=JsonAuthProcDeta
| summarize Total=count(),LegacyTLS=cou
AppDisplayName, AppId, tostring(DeviceO
User Accounts
Lifecycle questions are also common. For instance, you might need to
provide the current number of users and when they last signed in. We can
leverage the same Microsoft Entra ID sign-in data to answer those
questions. Let’s say someone wants a count of how many unique users have
signed into your tenant in the last month. That is easy:
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| distinct UserPrincipalName
| count
Much like the MFA queries, we choose the time period first; this time, we
choose 30 days and use ago() . Then, to keep things standard, we look for
successful logins and find the distinct usernames so each person is
accounted for only once. Finally, we count the number of distinct
usernames.
Guest Accounts
We can run our same query from the previous section but just add the guest
user filter:
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where UserType == "Guest"
| distinct UserPrincipalName
| count
We can use more advanced data aggregation to look at all our applications
and how many guests and regular users are accessing each. We use similar
countif() logic that we used for our MFA queries. This time, though,
we are using dcountif() . See Figure 5-118.
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize Members=dcountif(UserPrincipalName, UserType == "Member"), Guests=dcountif(UserPrincipalName, UserType == "Guest") by AppDisplayName
SigninLogs
| where TimeGenerated > ago(180d)
| where ResultType == 0
| where UserPrincipalName startswith "adm"
| summarize LastSignIn=max(TimeGenerated) by UserPrincipalName
We use max() for this query, which returns the most recent time
something happened. In this case, we want the most recent time a
successful sign-in from a username starting with adm occurred. We
summarize that by UserPrincipalName , so we return the result
for each admin account, as shown in Figure 5-119.
SigninLogs
| where TimeGenerated > ago(180d)
| where ResultType == 0
| summarize LastSignIn=max(TimeGenerated) by UserPrincipalName
| extend DaysSinceLastLogon=datetime_diff('day', now(), LastSignIn)
FIGURE 5-120 Last sign-in statistics, including days since the last logon
If you wanted to view this information by application, you can use the same
query. If users no longer access certain applications, that might be a sign
that those applications can be decommissioned.
SigninLogs
| where TimeGenerated > ago(180d)
| where ResultType == 0
| summarize LastSignIn=max(TimeGenerated) by AppDisplayName
| extend DaysSinceLastLogon=datetime_diff('day', now(), LastSignIn)
Endpoint Devices
Endpoint devices and servers are the other areas of your environment you
will likely get questions about. These questions might be related to
supportability, such as whether you still have any Windows 7 or Windows
Server 2012 devices. It can also be a configuration management question,
such as the number of devices we have or how many licenses we require. If
you use Microsoft Defender for Endpoint (MDE), you get access to a lot of
telemetry from those devices. With those, you can build great queries to
understand security posture. There are also a lot of prebuilt dashboards in
the Microsoft 365 Defender Portal itself, but you might want to do some
custom reporting. If you send logs from another endpoint detection and
response (EDR) product to Microsoft Sentinel, chances are those logs
contain similar information.
For MDE, those logs are stored in a collection of tables that all begin with
Device*.
It’s easy to find out what operating system versions you have. Some general
device information is written regularly to the DeviceInfo table in
MDE. The following query goes back 30 days to ensure we capture every
device that has been online during that period. We used arg_max() to
grab the latest record for each device so we don’t count the same device
multiple times. Then, we just simply count() the number of each
OSPlatform .
DeviceInfo
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, *) by DeviceId
| summarize Count=count() by OSPlatform
This kind of query might be a good example of one that would have more
impact if we created a visual for it. We have already done the hard work by
summarizing the data. By adding a couple of lines, we can create a bar
chart:
DeviceInfo
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, *) by DeviceId
| summarize Count=count() by OSPlatform
| sort by Count
| render barchart
We added a line— sort by Count —to sort the data, ordering the
results from highest to lowest. Viewing the information this way makes the
visual easier to consume. Finally, we told Kusto to render a bar chart. See
Figure 5-123.
Maybe your manager wants to know the health of the sensor for all devices
onboarded to MDE; the same DeviceInfo table can tell you that, too:
DeviceInfo
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, *) by DeviceId
| where OnboardingStatus == "Onboarded"
| summarize Count=count() by SensorHealthState
You might be asked about the number of devices publicly accessible on the
Internet, all of which have an increased attack-surface risk. These devices
generally have additional controls protecting them, such as a firewall
appliance or additional monitoring and a more rigid patching schedule.
MDE has a few separate tables that can track this kind of data:
DeviceInfo , DeviceNetworkEvents , and
DeviceLogonEvents .
DeviceNetworkEvents
| where TimeGenerated > ago (30d)
| where ActionType == "InboundConnectionAccepted"
| distinct DeviceName
DeviceNetworkEvents
| where TimeGenerated > ago (30d)
| where ActionType == "InboundConnectionAccepted"
| where LocalPort in (22,80,443,3389)
| project TimeGenerated, DeviceName, RemoteIP, Re
FIGURE 5-126 Device network events showing connections from the Internet
We include those ports with the in operator and see two public RDP
events.
DeviceLogonEvents
| where TimeGenerated > ago (30d)
| where ActionType == "LogonSuccess" and RemoteIP
| project TimeGenerated, DeviceName, AccountName,
The query is very similar to the prior one, though this time, we are looking
for successful sign-ins instead of accepted inbound connections. Successful
events from adm-eric.lang are shown in Figure 5-127. The
DeviceLogonEvents table has a valuable field called LogonType ,
which distinguishes between types of logons, such as RemoteInteractive
(RDP), Interactive (actual hands-on-keyboard logons), and network logons.
FIGURE 5-127 Device logon events from public IP addresses
With these queries, you can tell your manager or auditor how many devices
have a public IP. Further, you could show which ones accepted a public
connection and which had a successful login from the Internet.
You might also have to report how many users are local administrators on
their devices. This question is often brought up in audits because users who
are local administrators are at a higher risk because they can unintentionally
install malware, which can often be packaged into pirated or cracked
software, or delivered to users via phishing. When logging in to a device,
MDE tracks whether the person connecting is a local administrator in a
field called IsLocalAdmin . You can use that field to help retrieve
statistics of local administrator access across your devices:
DeviceLogonEvents
| where TimeGenerated > ago (30d)
| where ActionType == "LogonSuccess"
| where LogonType == "Interactive"
| where InitiatingProcessCommandLine == "lsass.exe"
| where AdditionalFields.IsLocalLogon == true
| where IsLocalAdmin == true
| project TimeGenerated, DeviceName, AccountName,
This query looks for only interactive logons and looks for logons that use
lsass.exe as the process. This helps us to exclude native Windows services
that also log in locally. Finally, we filter where IsLocalAdmin ==
true .
Figure 5-128 shows a list of logon events where the user has local admin
privileges.
If you have a lot of devices and users, this list will be massive because
every logon event is tracked. This is probably not what your manager is
after. Instead, we can summarize our local administrator exposure for each
device:
DeviceLogonEvents
| where TimeGenerated > ago (30d)
| where ActionType == "LogonSuccess"
| where LogonType == "Interactive"
| where InitiatingProcessCommandLine == "lsass.exe"
| where AdditionalFields.IsLocalLogon == true
| where IsLocalAdmin == true
| project TimeGenerated, DeviceName, AccountName,
| summarize CountofAdmins=dcount(AccountName), ListOfAdmins=make_set(AccountName) by DeviceName
DeviceLogonEvents
| where TimeGenerated > ago (30d)
| where ActionType == "LogonSuccess"
| where LogonType == "Interactive"
| where InitiatingProcessCommandLine == "lsass.exe"
| where AdditionalFields.IsLocalLogon == true
| where IsLocalAdmin == true
| project TimeGenerated, DeviceName, AccountName,
| summarize CountofDevices=dcount(DeviceName), ListOfDevices=make_set(DeviceName) by AccountName
Figure 5-130 shows a list of users who have logged on to a company device
with administrator privileges and the number of unique devices they have
logged on to. If a user who is a local administrator on hundreds of devices
is compromised, that account is of extremely high value to an adversary.
FIGURE 5-130 Summarized login data for each user with administrative
privileges
You can simply query which devices have a particular piece of software,
such as OneDrive, as shown in Figure 5-131.
DeviceTvmSoftwareInventory
| where SoftwareName has "onedrive"
This query shows vendor and version information and some high-level
details about the device, such as the operating system platform. This can be
useful if you see a report about a vulnerable piece of software or are
auditing the number of copies of a particular piece of software you have.
DeviceTvmSoftwareVulnerabilities
| where CveId == @"CVE-2022-38013"
This data table is valuable when summarizing your data. For instance, we
could find the devices with the most high-rated CVEs (Common
Vulnerabilities and Exposures). Each CVE is assigned a severity score out
of 10 (its CVSS score), with 10 being the most severe:
DeviceTvmSoftwareVulnerabilities
| where VulnerabilitySeverityLevel == "High"
| summarize CountOfHighVulns=dcount(CveId) by DeviceName
let devices=
DeviceNetworkEvents
| where ActionType == "InboundConnectionAccepted"
| distinct DeviceName;
DeviceTvmSoftwareVulnerabilities
| where DeviceName in (devices)
| where VulnerabilitySeverityLevel == "High"
| summarize CountOfHighVulns=dcount(CveId) by DeviceName
In this query, we cast a variable called devices using let . The first
part of the query finds any devices with an inbound connection from a
public IP. It stores that for us as a variable. The second part of the query
looks for only those devices in our stored variable, by using | where
DeviceName in (devices) .
Figure 5-134 shows that one web server with 152 high vulnerabilities was
returned! We should go patch that one for sure.
You might be asking why we use let instead of join . When we join
tables in KQL, we want data from both queries. In this case, we don’t really
need anything other than the DeviceName from our first query. We just want
to know if a device had an inbound public connection, and that’s it. We
don’t need the time, port, or remote IP. It is easier and more efficient to just
cast it as a variable and reuse it than join the tables.
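For comparison, a join-based version of the same idea is sketched below. It returns the same devices, but because nothing beyond DeviceName is needed from the second query, casting the variable with let remains the simpler pattern:
DeviceTvmSoftwareVulnerabilities
| where VulnerabilitySeverityLevel == "High"
| join kind=inner (
    DeviceNetworkEvents
    | where ActionType == "InboundConnectionAccepted"
    | distinct DeviceName
) on DeviceName
| summarize CountOfHighVulns=dcount(CveId) by DeviceName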
Tip
In a Microsoft Entra ID tenant compromise where you
have lost positive control—or even if you have
accidentally locked yourself out of your tenant—
Microsoft can reinstate access for you. You will need to
log a support case for the Azure Protection Team to help.
Of course, there are some requirements to reinstate
access, such as proving you are the true owner of the
tenant. Once satisfied, the team will add you back as a
Global Administrator. However, that team is not there to
complete your investigation or evict the threat actor from
your tenant. They can simply restore your access.
For this example, we will primarily use the schema used in Microsoft
Sentinel, focusing on the Microsoft Entra ID audit logs. Not to sound like a
broken record, but the book's GitHub repository, found at
https://github.com/KQLMSPress/definitive-guide-kql, has you covered for
Microsoft 365 Advanced Hunting, too. Some other data sources that might
be interesting for these kinds of investigations include Microsoft Entra ID
sign-in logs and Microsoft Defender for Cloud Apps. If you have fully
connected Microsoft 365 to MDCA, many of the same audit events are
available there. If you haven’t used the Microsoft Entra ID audit and sign-in
logs before, they cover two unique things:
• The sign-in logs are the actual authentication events—Where and when
a user signed into the tenant.
• The audit logs cover changes to the tenant itself—Things like a user
registering an MFA method, users being created, or tenant-level settings
being altered.
In all scenarios, we hope to teach you a hunting mindset and the KQL skills
to empower it so you can apply it to your organization. Just remember that
your data might look entirely different, and these data sources and schemas
will inevitably change in time.
Initially, we will look at the audit events. If you are using Microsoft
Sentinel, they are written to a table named AuditLogs . The queries in
this section are based on a hypothetical scenario, though you can change the
usernames and other indicators to real data in your tenant to test the queries
and investigation method. As always, the best way to see this data is to use
either getschema or take . You can take a quick look at the schema
with the following query:
AuditLogs
| getschema
If you look at a single record, you can get a feel for the schema of this data.
AuditLogs
| take 1
Again, for the sake of readability, we’re just looking at a few of the fields.
Having spent a lot of time with these logs, we can tell you that two fields
that you will spend a lot of time analyzing are InitiatedBy and
TargetResources :
Often, we use the following query first to remove the additional fields and
allow us to focus on what we care about:
AuditLogs
| project TimeGenerated, OperationName, Result, R
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has "adm-andrew.harris@tailsp
or TargetResources has "adm-andrew.harris@tailspi
In this query, we are looking at the previous five days using ago() for
any events where [email protected] was
the initiator and is in the InitiatedBy field or was the target and
is in the TargetResources field.
If you run this query on your own account, you will see a lot of logs. Data
summation can help you make more sense of it. You can get a quick
snapshot of all the events by using various summarize operators
discussed in “Data Summation Operators,” earlier in this chapter. So, let’s
look at what the adm-andrew.harris account has been doing:
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has "adm-andrew.harris@tailsp
or TargetResources has "adm-andrew.harris@tailspi
| summarize TotalCount=count(), FirstEvent=min(Ti
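A typical shape for that kind of snapshot, grouping by the operation name and capturing the first and last time each one occurred, might look like the following (the exact summarized columns are an assumption):
AuditLogs
| where TimeGenerated > ago (5d)
| where InitiatedBy has "[email protected]"
or TargetResources has "[email protected]"
| summarize TotalCount=count(), FirstEvent=min(TimeGenerated), LastEvent=max(TimeGenerated) by OperationName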
Let’s dive deep into the Remove member from role event to see
what it uncovers. To do that, run this query:
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has "adm-andrew.harris@tailsp
or TargetResources has "adm-andrew.harris@tailspi
| where OperationName == "Remove member from role
If you dive down into the TargetResources , you can even see the role
that was impacted—the Global Administrator, just as we thought (see
Figure 5-141).
If we wanted to make the query neater, we could use the extend operator
to extend those nested fields into their own columns. For example, we
might want the actor, the actor’s IP address, the target, and the role name to
be individual fields:
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has "adm-andrew.harris@tailsp
or TargetResources has "adm-andrew.harris@tailspi
| where OperationName == "Remove member from role
| extend Target = tostring(parse_json(TargetResou
| extend ActorIPAddress = tostring(parse_json(tos
| extend Actor = tostring(parse_json(tostring(par
| extend RoleName =
tostring(parse_json(tostring(parse_json(tostring(
| project TimeGenerated, OperationName, Result, T
If you look at the second-to-last line, you can see the extend operator is
quite complex. This will happen with fields that are nested several layers
down. Now that the columns have been extended, we get a much cleaner
result. We can easily see that the target was removed as a Global
Administrator by the [email protected]
account. As shown in Figure 5-142, we can also see an additional indicator,
the IP address 50.50.50.50 .
FIGURE 5-142 Parsed query for role removal event
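If the chained parse_json() and tostring() calls feel unwieldy, a more minimal sketch of the same idea is shown below; it relies on InitiatedBy and TargetResources being dynamic columns in Microsoft Sentinel, and the field paths are assumptions based on the common shape of these events:
AuditLogs
| where TimeGenerated > ago (5d)
| where OperationName == "Remove member from role"
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
| extend ActorIPAddress = tostring(InitiatedBy.user.ipAddress)
| extend Target = tostring(TargetResources[0].userPrincipalName)
| project TimeGenerated, OperationName, Result, Actor, ActorIPAddress, Target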
Tip
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has "svc-integration@tailspin
or TargetResources has svc-integration@tailspinto
In Figure 5-143, we can see this account is new to the tenant because the
first event we see an Add user event. We also see an Add member
to role event, which seems to be the opposite of the Remove
member from role event we saw earlier.
FIGURE 5-143 Audit events for svc-integration
If we drill down on these two events, we can see that this account was
created by [email protected] . See
Figure 5-144.
Run the following query to extend the columns to make this information
easier to digest:
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where OperationName == "Add user" and Initiated
| extend ActorIPAddress = tostring(parse_json(tos
| extend Actor = tostring(parse_json(tostring(Ini
| extend Target = tostring(TargetResources[0].use
| project TimeGenerated, OperationName, Actor, Ac
Tip
AuditLogs
| where OperationName == "Enable account"
| extend userPrincipalName_ = tostring(parse_json
| extend ipAddress_ = tostring(parse_json(tostrin
| extend TargetUserEnabled = tostring(TargetResou
| project TimeGenerated, OperationName, UserThatE
UserUpdated=TargetUserEnabled
This can be handy to place at the top of your queries to track where you are
at. This shouldn’t be your official timeline or documentation of events, but
it can be valuable to easily refer to.
If we look at Figure 5-143 once again, another interesting event for adm-
[email protected] is Reset password
(by admin) , indicating that a different administrator actually reset the
password on [email protected] .
Additionally, we see Admin registered security info , which
means another admin registered an MFA method on this account. If we look
at those events more closely, more of the story starts to become apparent.
Using the in operator, we can query both at the same time:
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where OperationName in ("Reset password (by admin)", "Admin registered security info")
| where TargetResources has "[email protected]"
The actions shown in Figures 5-148 and 5-149 were initiated by adm-
[email protected] . Jon Orton is another admin
within the organization, and although this account is not a full Global
Administrator, they do hold the Privileged Authentication Administrator
role. With that, they can change the passwords of other privileged users,
including Global Administrators, and update MFA details for them.
FIGURE 5-148 Password reset event
Tip
CloudAppEvents
| where Timestamp >= datetime("Insert date")
| where ActionType == "Update user." and RawEvent
| extend target = RawEventData.ObjectId
| mvexpand ModifiedProperties = parse_json(RawEve
| where ModifiedProperties matches regex @"\+\d{1
| mvexpand ModifiedProperties = parse_json(Modifi
| where ModifiedProperties contains "NewValue" an
| extend PhoneNumber = extract(@"\+\d{1,3}\s*\d{9
| project Timestamp, target, PhoneNumber
After looking through the data for the adm-
[email protected] password and MFA update events,
the IP address is not the same 50.50.50.50 malicious IP we have
already been tracking. In fact, the IP address used for these events doesn’t
stand out as malicious; it is regularly associated with Jon Orton’s account.
Jon remembers getting a Teams message from Pia Westermann from Pia's
regular, non-privileged account. In the message, Pia said she forgot the
credentials for her admin account because she doesn’t use it often and has a
new mobile phone number, too. Jon reset Pia’s credentials for her admin
account and updated the MFA number, which accounts for the actions seen
in the audit log. Jon believed he was doing the right thing by helping out a
colleague. However, after talking to Pia, she said she never sent those
messages despite them appearing in the Teams chat log. Also, she said the
+1 4845551234 phone number is unknown to her. It appears that Jon
was socially engineered by someone who had compromised Pia’s regular
account.
This is a good reminder that, sometimes, the data can’t tell the entire story;
it certainly told us these actions were taken on Pia’s account, but it didn’t
have the full context. Deconflicting events like this with users is an
important part of any investigation.
That phone number also becomes another indicator. Adversaries will often
reuse the same phone numbers for MFA. We can query events where an
admin has registered that same phone number against other accounts.
AuditLogs
| where TimeGenerated > ago(5d)
| where OperationName == "Admin registered securi
| where TargetResources has "+1 4845551234"
| extend Target = tostring(TargetResources[0].use
| extend Actor = tostring(parse_json(tostring(Ini
| project TimeGenerated, OperationName, Result, A
When investigating an incident like this, some things are easily identified as
malicious. For instance, in this scenario, we know the svc-
integration , svc-useronboarding , breakglass04 , and
helpdesk01 accounts were created by our adversary. Given a threat
actor created them, we know that everything from them is malicious. We
also know that the 50.50.50.50 IP address is bad. We can use those
two pieces of information to craft a query to understand what those
accounts have been doing. We search for anything initiated by either those
accounts or that IP address.
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has_any ("svc-integration@tai
"[email protected]"
"[email protected]",
"[email protected]")
or InitiatedBy has "50.50.50.50"
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has_any ("adm-andrew.harris@t
"[email protected]"
"[email protected]")
and InitiatedBy has "50.50.50.50"
For events created by legitimate users that we are unsure whether they are
malicious, we can just deconflict the actions with the user. By deconfliction,
we mean literally asking them if they performed a certain action. In terms
of unwinding the damage caused by the threat actor, if the user is unsure
whether they performed the action themselves (especially if a long time has
passed) and we don’t have another indicator to confirm, we should err on
the side of caution and revert the changes.
In this instance, let’s look at four threat actor–created accounts to see what
they have been doing. We can reuse the summation operators to help
make sense of their actions:
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has_any ("svc-integration@tai
"[email protected]"
"[email protected]",
"[email protected]")
| summarize TotalCount=count(), FirstEvent=min(Ti
All the actions shown in Figure 5-153 came from our threat actor–created
accounts, so they all need to be reviewed. We already know about the MFA
changes and the role removals and additions. The Add named location
and Update conditional access policy events are important to
investigate next.
Named Locations in Microsoft Entra ID allow us to define groups of IP
ranges or countries that we can use in conditional access policies— to block
certain ranges or have different security requirements on particular IP
addresses.
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has_any ("svc-integration@tai
"[email protected]"
"[email protected]",
"[email protected]")
| where OperationName == "Add named location"
Drilling down on this event, we can see in Figure 5-154, the threat actor
created a new Named Location named Corporate IPs .
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has_any ("svc-integration@tai
"[email protected]"
"[email protected]",
"[email protected]")
| where OperationName == "Update conditional acce
In Figure 5-155, we can see that a policy named Require MFA for
Office 365 was updated, and an exclusion was put in place for the new
named location—the location identifier (the GUID starting with
03ee556 ) is a match. Assuming the policy name is accurate, it would
bypass MFA for the threat actor on the 50.50.50.50 IP.
AuditLogs
| where TimeGenerated > ago (5d)
| project TimeGenerated, OperationName, Result, R
| where InitiatedBy has_any ("svc-integration@tai
"[email protected]"
"[email protected]",
"[email protected]")
| where OperationName has "Update application – C
If we locate this app, we can see the secret is still there, and the key identifier
matches between the logs and the secret itself. See Figure 5-158.
In cases like these, when a threat actor has manipulated settings or policy or
created accounts, we have a couple of options—we can revert the changes
or delete the resources and start over. If an adversary has taken control of
items like privileged user accounts, we believe the best practice is to delete
the existing accounts and create new ones. This will help restore trust in
your tenant and ensure you have positive control at the tenant level.
We can also refer to the Microsoft Entra ID sign-in logs to see if we can
uncover any additional user accounts or indicators. For instance, we can
detect sign-ins from the threat actor–created accounts or the known-malicious IP.
SigninLogs
| project TimeGenerated, UserPrincipalName, Resul
RiskLevelDuringSignIn, RiskEventTypes
| where UserPrincipalName in~ ("svc-integration@t
"[email protected]"
"[email protected]",
"[email protected]") or I
• We might find other IP addresses for those four user accounts we had not
previously seen.
• We might also see other things that we can use to pivot on, such as user
agents, or locations.
• There is a chance that the initial sign-in was flagged with some kind of
risk that might help our investigation.
Around the time of the first successful sign-in, do we see lots of failed sign-
ins with incorrect usernames and passwords? If so, that might indicate some
kind of password spray. Alternatively, do we see a successful sign-in with
no surrounding anomalous activity? In that case, maybe that points to a
successful phishing attack, where the adversary already knew the credentials.
Proving phishing or similar attacks is sometimes difficult. We might not get
the telemetry to know for certain. For instance, maybe a user was sent a
phishing email to a personal computer, where they accidentally installed
credential-stealing malware, which also stole corporate credentials. Maybe
the user was a smishing (a phishing attack sent via SMS to mobile phones)
victim and received a phishing message via SMS to a personal phone.
Often, users might not want to admit they clicked suspicious links out of
embarrassment. Initial access is not always provable; instead, it’s often
inferred.
These sign-in logs can also help us understand gaps in our security controls.
If the sign-ins to the Azure Portal were flagged with risk, can we enable
risk-based conditional access for our admins?
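As a quick way to size that gap, a sketch along the following lines (reusing the adm naming convention from earlier) pulls the risky sign-ins to the Azure Portal for admin accounts:
SigninLogs
| where TimeGenerated > ago (30d)
| where AppDisplayName == "Azure Portal"
| where UserPrincipalName startswith "adm"
| where RiskLevelDuringSignIn in ("medium", "high")
| project TimeGenerated, UserPrincipalName, IPAddress, ResultType, RiskLevelDuringSignIn, RiskEventTypes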
• Did they attempt to access credentials, and how was that achieved?
These reports might also include details of whether the adversary utilized
any living-off-the-land binaries (affectionately known as LOLBins). Instead
of deploying malware to an environment, an adversary might “live off the
land” by using legitimate applications or functionality built into the
operating system for malicious purposes. For example, system
administrators can use PowerShell for day-to-day legitimate work, whereas
a threat actor might use PowerShell to exfiltrate data or deploy ransomware
to devices. Detecting malicious use of these legitimate binaries can be
difficult and requires both query-writing skill and business context. For
instance, a PowerShell script running regularly in your environment might
look suspicious when, in fact, that is just how a particular application works.
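To illustrate what behavior-based hunting for LOLBin abuse can look like, here is a minimal sketch that flags PowerShell launched with commonly abused switches; the keyword list is purely illustrative and would need tuning with your own business context:
DeviceProcessEvents
| where TimeGenerated > ago (7d)
| where FileName in~ ("powershell.exe", "pwsh.exe")
| where ProcessCommandLine has_any ("-EncodedCommand", "-enc", "DownloadString", "Invoke-Expression")
| project TimeGenerated, DeviceName, AccountName, ProcessCommandLine, InitiatingProcessFileName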
• The threat actor then downloaded the instructions for the VPN and VPN
client and successfully connected to the corporate VPN from a threat actor–
controlled workstation named DESKTOP-80YMGP . Reconnaissance of
Active Directory—including querying the Domain Admins membership—
was detected from Eric’s account.
• A Group Policy Object was created to disable antivirus across the entire
server fleet and deploy reprise99 ransomware to all servers.
Even though this is a simplified timeline, a lot is going on here! Let’s take
these step by step, and write some queries to detect the indicators and,
where possible, the behavior. We will primarily use data from Microsoft
Defender for Endpoint (MDE), but we will also mention other locations
where you might find forensic information about attacks like this. Before
digging into the queries, let's have a bit of a closer look at exactly what data
MDE surfaces for us to analyze. Whether you use Microsoft Sentinel or
Microsoft Defender 365 Advanced Hunting, MDE sends data to a collection
of tables starting with Device . The tables and the data they contain are as
follows:
Now that we have an outline of the data, let's dig into the first event: Eric
attempted to download legitimate IT asset inventory software but was
redirected by malicious advertising in his search results to kql.io, a site
designed to masquerade as the legitimate site and trick users into
downloading malware, named discoverypackage.exe , with SHA1
hash DA39A3EE5E6B4B0D3255BFEF95601890AFD80709 .
Note
In this book’s introductory chapters, we discussed some
simple ways to search for specific strings in our data. For
instance, we could easily use our search operator to find
any of these indicators.
The following query will find hits on those indicators, which is obviously a
great starting point. Searching this way is valuable for finding the indicators
and the data source they were found in. If you want to retrieve only the
table names where those indicators are found, you can add a distinct
operator to the end:
search("kql.io")
search("discoverypackage.exe")
search("DA39A3EE5E6B4B0D3255BFEF95601890AFD80709"
The following query will only return the table names in which the
discoverypackage.exe filename was found. From that, we can
narrow down our future searches.
search("discoverypackage.exe")
| distinct $table
What if you want to know which tables contain any kind of remote
URL, not just a specific one like kql.io ? We can click the table names
in the Microsoft Sentinel UI to find the existing fields for each table. We
can see an example from the DeviceNetworkEvents table in Figure
5-160.
FIGURE 5-160 Schema reference table
Instead of clicking through the UI, we can use Kusto to find out where
remote URLs can be found. There are several ways of doing this, but our
preferred way is to use union to hunt across all the tables:
1. The first line uses union to combine these tables. As part of the union ,
we use withsource to tell Kusto to create a new field from the
TableName . That way, when we look at the data, we know which source
table it came from.
2. Next, we will union all the Device tables together. Many people don’t
know that union supports wildcards, so if you want to combine all the
Device tables, there is no need to list them individually. Using Device*
captures them all.
3. Next, we want to look for any record where the RemoteUrl is not
empty, as shown in the sketch after this list.
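Putting those three steps together, a minimal sketch looks like this; the final distinct simply reduces the output to the table names themselves:
union withsource=TableName Device*
| where isnotempty(RemoteUrl)
| distinct TableName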
Note
DeviceFileEvents
| where FileName =~ "discoverypackage.exe" or SHA
Figure 5-162 shows the file creation event for
discoverypackage.exe .
This query will find any network connections initiated by certutil.exe that
are connecting to a public IP address.
DeviceNetworkEvents
| where InitiatingProcessCommandLine has "certutil"
| where RemoteIPType == "Public"
| project TimeGenerated, DeviceName, InitiatingPr
RemoteIPType, RemoteIP, RemoteUrl, RemotePort
Figure 5-164 shows evidence that aligns with the investigation’s timeline;
certutil.exe was used to download reprise99.exe from
70.70.70.70 . The reprise99.exe file was renamed to
winword.exe as it was downloaded. Our detection is based on finding
certutil in the InitiatingProcessCommandLine field and
public in the RemoteIPType field. While we could detect purely on
70.70.70.70 , we know IP indicators like this change frequently, and
behavior-based logic for this detection is better.
Next, the adversary accessed Eric’s Microsoft 365 account. Eric’s corporate
credentials were stolen by the malware he saved in his browser. The
adversary then logged in to Eric’s Microsoft 365 account and searched for
terms such as VPN, work from home, and virtual desktop. The threat
actor then downloaded the instructions for the VPN and the VPN client.
This is a good example of why we need to hunt wide on our indicators. We
have these IP IOCs from this malware, but we might see them elsewhere.
Maybe our adversary connected to Microsoft 365 from the same IP address.
That is why it’s key to use search or similar operators early in an
investigation.
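For instance, reusing the search pattern from earlier, a quick sweep for the download IP across every table might look like this:
search("70.70.70.70")
| distinct $table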
Also, we can see the download of the VPN client and documentation.
Obviously, just downloading the VPN documentation is not inherently
malicious—the users do that every day. The threat actor’s download of the
VPN client is a new behavior, so how do we turn that into an actionable
detection? We can’t alert every time someone downloads the client.
However, we can leverage risk signals:
SigninLogs
| where RiskLevelDuringSignIn in ("high","medium"
| project RiskySigninTime=TimeGenerated, SigninIP
| join kind=inner(
CloudAppEvents
| where ActionType == "FileDownloaded"
| extend UserPrincipalName = tostring(RawEventDat
| extend FileName = tostring(RawEventData.SourceF
| where FileName has_any ("VPN","WFH","Work from
desktop","anyconnect","globalprotect")
| project DownloadTime=TimeGenerated, FileName, U
) on UserPrincipalName
| extend ['Hours Between Events']=datetime_diff("
| where ['Hours Between Events'] <= 6
We join the data from our Entra ID logs to the Microsoft Defender for
Cloud App events and look for file downloads relating to VPNs or
passwords; we look for keywords such as VPN, WFH, VDI, and virtual
desktop. If you use a particular VPN product, you can include it or
anything specific to your business here. This query then looks for events
within six hours of each other. We know adversaries will move quickly, and
we also want to limit false positives. If a user has a risky sign-in and
downloads VPN documentation four weeks later, these things might not be
related. However, the risk could be higher if they occur within an hour of
each other. During the next step, the threat actor successfully pivots to on-
premises, connecting to the corporate VPN from a threat actor–controlled
workstation named DESKTOP-80YMGP .
This next event draws an interesting line in the sand in our investigation.
Until now, we have been looking at data sources that we can access: MDE
logs from our devices, download events from SharePoint, Microsoft Entra
ID sign-in data, and so on. However, in this event, the threat actor
connected to the VPN from their own device named DESKTOP-80YMGP .
This is common in real-life engagements; if possible, an adversary will
connect their own device (usually a virtual machine) and complete activities
from there. The threat actor can simply disable antivirus on their virtual
machine, and we won’t have any telemetry available. Threat actor is
unlikely to enroll their device into MDE for us, so we can directly see what
they’ve been up to, though that would be kind of them.
When we don’t have any forensic information from the device itself, we
need to look for evidence of the activities or devices and identities they
interacted with—the ones for which we do have visibility. For instance, you
might wonder how we even know the name of the adversary’s device. That
data can be exposed in many locations; for instance, when that device
attempts to sign in to a device in our environment, the workstation name
can be exposed in MDE or Windows Security Event logs. Your VPN
appliance might track that information, too, so it might be there.
Uncovering the device name and IP address of a threat actor’s device is
extremely valuable to your investigation.
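One quick way to find out where that workstation name surfaces is to lean on search again:
search("DESKTOP-80YMGP")
| distinct $table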
Figure 5-165 shows that the workstation name is found only in the
DeviceLogonEvents .
We know this computer is malicious, so you can search for any evidence of
it in that table:
DeviceLogonEvents
| where * contains "DESKTOP-80YMGP"
Figure 5-166 shows a sample of the returned events in which the threat
actor’s workstation name surfaced in the RemoteDeviceName field.
Interestingly, we see another jump host that was targeted from this device
and multiple IP addresses, which makes sense if the adversary is accessing
a VPN; if they disconnect and reconnect, chances are, they would obtain a
new IP.
FIGURE 5-166 Logon events for device DESKTOP-80YMGP
Coming up with a detection for activity like this is hard because it depends
on how your users are allowed to work. If you must use a corporate-
managed device, detecting on devices is easier. At Tailspin Toys, all
sanctioned device names start with TST- , so you could alert when a logon
occurs from a device with a name that doesn’t use that scheme:
DeviceLogonEvents
| where not (RemoteDeviceName has "TST-")
let existingusers=
DeviceLogonEvents
| where TimeGenerated > ago(30d) and TimeGenerate
| where ActionType== "LogonSuccess" and LogonType
| summarize CountOfDistinctDevices=dcount(DeviceN
| where CountOfDistinctDevices > 50
| distinct AccountName;
DeviceLogonEvents
| where TimeGenerated > ago(6h)
| where ActionType== "LogonSuccess" and LogonType
| summarize CountOfDistinctDevices=dcount(DeviceN
| where CountOfDistinctDevices > 50
| where AccountName !in (existingusers)
In this query, we’re doing a little time analysis. Let’s break the query down
to understand it better:
• To baseline the environment, we look for events between 30 days and 6
hours ago. Within that time period, we will summarize all successful
network logon events.
• Then, we want to filter out the noise, so we say, “Only show me accounts
that log on to more than 50 devices in a single-hour period.” There are
likely service accounts or applications that interact with Active Directory
heavily, so we are trying to exclude those from the query.
• Next, we look at the last six hours for the same behavior and exclude any
accounts we have already found in the baseline query.
In Figure 5-167, we see that Eric’s account had a large spike in logon
activity in one 1-hour time block, where his account logged on to more than
4,000 devices.
FIGURE 5-167 Summarized logon activity for Eric Lang
Now that the threat actor has accessed one of our devices interactively, we
will start seeing events in the DeviceLogonEvents table, which tracks
all logon events to onboarded devices. Again, it isn’t practical to alert on
every logon to a jump host. By their very nature, they accept many logons.
We do know that IT admins are creatures of habit, though. Maybe Eric
Lang’s compromised account never really uses this jump host. Again, we
can hunt specifically for Eric Lang logging in to this server:
DeviceLogonEvents
| where TimeGenerated > ago(6h)
| where DeviceName =~ "ADMSERVER01.tailspintoys.c
| where ActionType== "LogonSuccess" and LogonType
| where AccountName =~ "eric.lang"
As part of the investigation, we would examine exactly what Eric's
account accessed to understand the scope of the compromise. But we also
want to turn this into a valuable behavioral detection to catch other
accounts that might be compromised. Let’s write a query to detect when a
user logs onto this jump host for the first time in 30 days:
let existingusers=
DeviceLogonEvents
| where TimeGenerated > ago(30d) and TimeGenerate
| where DeviceName =~ "ADMSERVER01.tailspintoys.c
| where ActionType== "LogonSuccess" and LogonType
| distinct AccountName;
DeviceLogonEvents
| where TimeGenerated > ago(6h)
| where DeviceName =~ "ADMSERVER01.tailspintoys.c
| where ActionType== "LogonSuccess" and LogonType
| where AccountName !in (existingusers)
This query looks at the time between the prior 30 days and 6 hours ago. We
do that with | where TimeGenerated > ago(30d) and
TimeGenerated < ago(6h) .
If you don’t have a data source like MDE available to you, you can achieve
the same thing if you have native Windows Security Event logs. The logic
is the same, but the data structure is different. In Windows event logs, you
would be looking for EventId 4624 , which is a successful log-on, and
Logon Type 10 , which is RDP:
let existingusers=
SecurityEvent
| where TimeGenerated > ago(30d) and TimeGenerate
| where Computer == "ADMSERVER01.tailspintoys.com
| where EventID == 4624
| where LogonType == 10
| distinct TargetAccount;
SecurityEvent
| where TimeGenerated > ago(6h)
| where Computer == "ADMSERVER01.tailspintoys.com
| where EventID == 4624
| where LogonType == 10
| where TargetAccount !in (existingusers)
Of course, if you had other jump hosts that were named similarly and
wanted to expand your detection, instead of alerting on one specific device,
you could include a catch-all based on the name. Perhaps you would use |
where DeviceName startswith “adm” if all your jump hosts
followed a similar naming standard. This RDP activity is followed by
Credential Access events on that same jump host.
Using forensic tooling, we observed that the antivirus product was disabled
on ADMSERVER01 . Additionally, the threat actor dumped LSASS (Local
Security Authority Subsystem Service) to disk using Task Manager.
Adversaries target this process to extract the credentials from the device.
Antivirus logs are a valuable forensic and detection tool for events where
antivirus or other security tools have been disabled or altered to allow
malware. If you use MDE, you can track tampering attempts:
DeviceEvents
| where ActionType == "TamperingAttempt"
Event
| where EventLog == "Microsoft-Windows-Windows De
| where EventID in ("5001","5007","5013","1116")
Regarding the creation of LSASS dumps, any EDR product should detect
this, but for the sake of the exercise, you can use file-creation events for
.dmp files. We exclude crash dumps generated by the WerFault process
from our detection logic:
DeviceFileEvents
| where InitiatingProcessFileName != "WerFault.exe"
| where FileName endswith ".dmp"
In this hypothetical scenario, the threat actor potentially copied that LSASS
dump to their threat actor–owned machine and used a tool like Mimikatz
from there.
Finally, we see that the threat actor accessed a domain controller and
deployed ransomware.
We can capture any successful RDP event to any Domain controller that
doesn’t come from a device starting with PAW- , which is how we name
our privileged workstations.
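A sketch of that detection, assuming the MDE DeviceLogonEvents data we have been using and a hypothetical DC- naming standard for domain controllers, might look like this:
DeviceLogonEvents
| where TimeGenerated > ago (1d)
| where DeviceName startswith "DC-"
| where ActionType == "LogonSuccess" and LogonType == "RemoteInteractive"
| where not(RemoteDeviceName startswith "PAW-")
| project TimeGenerated, DeviceName, AccountName, RemoteDeviceName, RemoteIP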
Instead, we can try to alert on spikes in activity. This query will find
when any of these settings are changed on 10 or more devices in a 15-
minute period:
Event
| where EventLog == "Microsoft-Windows-Windows De
| where EventID in ("5001","5007","5013","1116")
| summarize CountofDistinctDevices=dcount(Compute
15m)
| where CountofDistinctDevices > 10
You might think that if you are alerting at this point in the kill-chain, you’re
already having a bad day—and you are right. That is the nature of
ransomware. By the time an adversary has compromised a Domain
Administrator–level credential, you are trying to limit the damage.
With cybersecurity, people often say there are two types of controls—
preventative or detective. We can either prevent something or detect it. The
more preventative controls you can put in place, the better. If we can simply
stop something happening with our security tools, or how we manage our
identities, then perfect. For the remaining events, we need to attempt to
detect them in a timely manner. Reading reports like this, or threat
intelligence reports, can provide information about what is currently
happening in real-life engagements. From those, you can take away the
indicators of compromise and see if you have any relevant hits for them in
your environment.
It’s just as important to learn what tactics are currently being employed by
threat actors. If we can’t prevent the actions, can we detect them quickly?
Allowing personal devices to connect to your VPN is a perfect example.
Can we prevent that behavior with policy, whether conditional access or on
a firewall appliance? Remember: The further along in that kill-chain you
go, the harder it gets to prevent something destructive in your environment.
Summary
By now, you should be able to understand the benefits of using KQL day-
to-day in your cybersecurity role, find events in your data using a variety of
different searching operators, and manipulate and parse data so that it is
consistently formatted and easy to analyze. Also, you learned how to apply
the concepts in this chapter to real-world investigations, such as phishing
attacks or tenant compromise, and combine many data sources to craft
single queries.
OceanofPDF.com
Chapter 6. Advanced KQL Cybersecurity Use
Cases and Operators
In this final chapter, we will expand on all the KQL from our security
scenarios and move into some more advanced operators and use cases.
Even though the queries and operators you see in this section are more
advanced, we hope you have learned enough to follow along. Importantly,
while these aren’t full scenarios like we previously worked through, all the
queries and examples you see are still based on real-world use cases. This
chapter isn’t a definitive list of every function and operator in KQL.
Instead, we covered just the ones that skew toward security data analysis. If
you look through the official documentation, you will see many other
functions there. Some skew toward data analytics or geospatial analysis, so
we don’t use them from a cybersecurity perspective.
To use this chapter, you will need some data! To make things easy, we will
use a few data sources, including sample data hosted in this book's GitHub
repository (retrieved with the externaldata operator) and, if you have
one, your own Microsoft Sentinel workspace.
For those who have spent any time in KQL, you will realize that, at some
point, the bit of data you need is buried deep within a much larger JSON
object. This can be for several reasons; perhaps the log you are looking at
covers multiple events. For example, the Update user event in
Microsoft Entra ID is written to the audit log and covers many different
operations. You will
see an Update user event written when a user changes their name.
Perhaps they were recently married. You will also see an Update user
event if a user adds a new MFA method. Some of the data in these two
operations will remain constant, but there will also be unique data for each.
When a user changes their name, we will have the old and new names, and
the user is probably the one who triggered the update. For an MFA registration
event, we may have the phone number associated with the newly registered method.
The specific data for each Update user event is held in a JSON object
called TargetResources . The structure of TargetResources
changes significantly depending on exactly what triggered the Update
user event.
Like many things in Kusto, there are multiple ways to manipulate these
kinds of datasets to get what you are after. We will deep dive into two
operators in particular, mv-expand and mv-apply. To help illustrate
the differences between them and how you can use them day to day, we
have created some test Microsoft Entra ID sign-in data for use. It is hosted
in a GitHub gist in the repository for the book, so we can use our
externaldata operator to retrieve it. Alternatively, if you have your own
Microsoft Sentinel workspace, you can use that instead.
SigninLogs
| project TimeGenerated, UserPrincipalName, Resul
ConditionalAccessPolicies
If you run this query, you will see 500 sample events similar to Figure 6-1;
if you are using your own data, you might have more or fewer than that.
For the sake of readability, only a few are shown in Figure 6-2. If you look
at one in particular, you will see even more nested JSON.
After filtering on sign-ins that failed conditional access, we can see a few in
the list. Figure 6-3 shows the effect on the array order.
FIGURE 6-3 An example of a Conditional access failure event
You may even notice that we have more nested JSON in each policy record.
Figure 6-4 shows "Mfa" in the enforcedGrantControls field. If
this policy required MFA from your users and a compliant device, both
would appear in that array.
Now that we understand the dilemma, let’s see how operators can help and
which to use.
mv-expand
If you have your own data, you can simply use take 1 to return a single
record. If you are using the test data, you can achieve the same:
externaldata
(TimeGenerated:datetime,UserPrincipalName:string,
nalAccessStatus:string,ConditionalAccessPolicies
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
='csv',ignorefirstrecord=true)
| take 1
externaldata
(TimeGenerated:datetime,UserPrincipalName:string,
nalAccessStatus:string,ConditionalAccessPolicies
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
='csv',ignorefirstrecord=true)
| take 1
| mv-expand ConditionalAccessPolicies
FIGURE 6-6 Conditional access policies expanded via the use of mv-
expand
externaldata
(TimeGenerated:datetime,UserPrincipalName:string,
nalAccessStatus:string,ConditionalAccessPolicies
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
='csv',ignorefirstrecord=true)
| extend CAResult = tostring(ConditionalAccessPol
| where CAResult == "failure"
In the first policy line, we extended a column for the result field; you can
see that the position is set to [0] . Then, we looked for instances when that
field equals “failure” . You should see five records, as shown in Figure
6-7.
externaldata
(TimeGenerated:datetime,UserPrincipalName:string,
nalAccessStatus:string,ConditionalAccessPolicies
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
='csv',ignorefirstrecord=true)
| mv-expand ConditionalAccessPolicies
| extend CAResult = tostring(ConditionalAccessPol
| where CAResult == "failure"
Tip
SigninLogs
| where TimeGenerated > ago(90d)
| mv-expand ParsedFields = parse_json(Authenticat
| extend AuthMethod = ParsedFields.authentication
| extend ParsedFields2 = parse_json(DeviceDetail)
| extend DeviceID = tostring(ParsedFields2.device
| extend ParsedFields3 = parse_json(Status)
| extend SigninStatus = tostring(ParsedFields3.er
| where AuthMethod != "Previously satisfied"
| where isnotempty(DeviceID)
| where SigninStatus == 0
| summarize dcount(DeviceID) by UserDisplayName
| order by dcount_DeviceID desc
Tip
mv-apply
• mv-expand expands the data for us, and then we can query on it.
• mv-apply lets us apply a filter or manipulate that data as we expand
it.
The syntax for mv-apply can be a little confusing, but we can use the
mv-expand example to understand how to use mv-apply . Again, we
only want to retrieve failure results:
externaldata
(TimeGenerated:datetime,UserPrincipalName:string,
nalAccessStatus:string,ConditionalAccessPolicies
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
='csv',ignorefirstrecord=true)
| mv-apply ConditionalAccessPolicies on (
where ConditionalAccessPolicies.result == "failur
)
You can see the syntax is a little different. We first choose the field we want
to mv-apply on—in this case, ConditionalAccessPolicies .
Then, we need to define a subquery: mv-apply
ConditionalAccessPolicies on . The on refers to the criteria
we will expand our data with. In this case, when the result field at
ConditionalAccessPolicies.result is equal to failure .
You will see 11 results again, the same as in the mv-expand example. It
is just two different ways to get the same result. Let’s look at another
example to reinforce that learning.
In this example, we want to look for mailbox rule creations, particularly
where the mailbox rule is only made up of non-alphanumeric characters,
which can indicate a compromise. This data can be found in the Defender
for Cloud Apps dataset:
CloudAppEvents
| where Application == "Microsoft Exchange Online
| where ActionType == "New-InboxRule"
| mv-apply Objects=todynamic(ActivityObjects) on
(
where Objects.Name == "Name"
| extend RuleName= Objects.Value
)
| where isnotempty(RuleName)
| where RuleName matches regex @"^[^a-zA-Z0-9]*$"
| extend AccountUpn=tostring(RawEventData.UserId)
| extend SessionId=tostring(RawEventData.SessionI
| project TimeGenerated, Application, ActionType,
If you are wondering when you should use mv-expand versus mv-apply, the
answer, as always, depends on what you are trying to achieve. If you
are trying to understand conditional access policy stats, then mv-expand
might be better because your stats will be accurate and include all events. If
you are writing a detection rule, then mv-apply may be better because it
is more specific and targeted. You may even use a combination of both if
you have additional nested JSON inside your original JSON object.
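Before we move on, if you want to try mv-apply against live data rather than the sample file, a minimal sketch against the standard SigninLogs table (again assuming its usual ConditionalAccessPolicies field) might be:
SigninLogs
| mv-apply ConditionalAccessPolicies on (
    where ConditionalAccessPolicies.result == "failure"
)
| project TimeGenerated, UserPrincipalName, ConditionalAccessPolicies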
Joins
We hear your collective screams. Perhaps you thought we were all done
with joins after having discussed them in-depth in Chapter 3, “Unlocking
Insights with Advanced KQL Operators,” but in this chapter, we want to
show some specific cybersecurity use cases for them.
Before diving in, let's take a step back and understand what we are trying to
achieve when we join data. At its simplest, a join is used when we need to
combine data from two (or more) tables containing data we are interested in.
Joins allow us to combine that data into a single set of results. For example,
we might be investigating a user compromise and need to join Microsoft
Entra ID sign-in and audit Log data. The sign-in data contains information
about authentication events, and the audit data contains information about
changes to Microsoft Entra ID, such as MFA changes or password reset
events. Items such as IP addresses, usernames, user agents, and others may
be available in both data sources. Joins let us unite two sets of data to
understand patterns between them.
You might also want to join email URL info and firewall data. Let’s say
you’re tracking a phishing campaign that uses a suspicious URL, such as
totallynotphishing.com. Your email URL info data can tell you which users
received an email containing that URL, and your firewall data can help you
determine which users or endpoints are connected to it. By combining the
data, you can work out which users were impacted, potentially uncover
other indicators of compromise (IOCs), and even gain insights such as the
time between the email arrival and when users or devices accessed the
domain.
Think of it like this: You have different datasets tied together by something
like a filename, IP, hash, or the like, and you need information from both
datasets. join is your new best friend. You will get different results
depending on the kind of join you use. Perhaps you want the complete
joined data, or perhaps you just want certain parts of it. We will cover both
possibilities in this section.
Before we dig in, let’s look at the main types of joins shown in Figure 6-8.
FIGURE 6-8 Visualization showing the various types of joins available in
KQL
When joining data, you will often hear the concept of left and right .
Left is the first query in the join, and right is the second. Imagine
you were looking at a list of indicators of compromise like a set of IP
addresses. Those IP addresses include other information, such as the IP's
country of origin. And let’s say you want to join that data to your firewall
data to understand any correlations between them. The indicators would be
the left table, and the firewall data would be the right table. If you
reversed your query and looked at the firewall data first and then matched it
to the indicators, your firewall data would be the left table, and the
indicators would be the right table.
The shaded portions of Figure 6-8 show what data is returned once the
tables have been joined. Using the same hypothetical with indicators and
firewall data, if you completed a fullouter join, you would get the
complete data from your indicators and firewall data, including any
matches. If you did a rightanti join, you would only get the data from
your firewall data.
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true)
In Figure 6-9, you can see that the query returned a list of indicators,
including the IP address and port information, details about the location,
confidence level, and a first-seen timestamp. If you have a threat
intelligence feed, you will get access to similar telemetry, though it would
likely be a bit more detailed.
FIGURE 6-9 IP address indicators of compromise
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true)
Inner Join
We’ll start with an inner join because it’s the one you will use most
often. First though, a word about the join syntax: To make things easy, we
will cast our two datasets as variables using the let operator:
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
Now, we can call our data using the indicators and fwlogs
variables. Using an inner join as an example, let’s look at the syntax for
joining data (see Figure 6-11).
The fourth line is where your knowledge of your data is key. You need to
understand the link between your datasets. Is it an IP address like in this
example? Is it a username? A file hash?
In this example, we are saying, “Take the Indicator field from the
indicators data in the left query, and join it to the
DestinationIP in the fwlogs data in the right." If any fields
exist in both data sources, you can join directly on them. For example, if
you had a field called IPAddress in both the indicators and
fwlogs data, you could simply type on IPAddress, instead of on
$left.Indicator==$right.DestinationIP . However, it’s
important to apply critical thinking skills if the data exists in both datasets
because it might not make sense to join on that particular field. Kusto just
tells you that the data exists in both, not that it makes sense to join them.
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=inner(
fwlogs
)
on $left.Indicator==$right.DestinationIP
The join syntax is the same. We are telling Kusto to look up all the
indicators and look up all the firewall logs. Next, we want to see when the
Indicator field matches the DestinationIP field. We will see any
time the firewall has logged traffic to a DestinationIP on which there
is threat intelligence. The results are shown in Figure 6-12.
This match returns seven results. You can just eyeball the data to see the
logic; in each result, the Indicator field matches the
DestinationIP field. Importantly, we don’t see any results where a
match doesn’t exist—they are omitted from the results. Figure 6-13 shows
an inner join, where the Indicators table is represented by the left
circle, the fwlogs table is represented by the right circle, and the shaded
area represents the data matches. The returned results only show data
matches.
inner joins are probably the most common kind of joins you will
do in cybersecurity. Give it a shot with some of your data. Think about
where certain data exists in your organization. Perhaps you have usernames
(maybe sign-in data). Where else could those usernames exist? Maybe
they’re found in Microsoft Defender for Cloud Apps data (say, for file
downloads) or Office Activity data (for email events). Once you’ve
identified the sources, join them to see what the combined data looks
like.
Remember, you can still write your queries and filter your results before the
join . We don’t need to join everything and hope for the best. You can
filter before joining. In fact, you should! The less data you join , the
quicker your query will run. For example, let’s say your SOC is so
overworked that you only really care about high-confidence threat
intelligence. The sample indicators data contains a Confidence
field. Next, if your firewall denies a connection, you don’t want to waste
time on it. The firewall did its job, and you simply don’t have the cycles to
investigate every failed connection. So, in the fwlogs data, you see an
Action field that shows whether the firewall allowed or denied a
connection.
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| where Confidence == "high"
| join kind=inner(
fwlogs
| where Action == "allow"
)
on $left.Indicator==$right.DestinationIP
In the first query (the left ), we added a filter for high-confidence threat
intelligence. In the second query (the right ), we asked to see only
returns where the firewall allowed the connection.
Figure 6-14 shows the updated results matching the query logic. For all of
them, the Confidence level is high , and the Action is allow .
Then, the join (or match) between the two datasets is where the
Indicator is the same as the DestinationIP . Three results are
shown. See how easy it is to join data?
Innerunique Join
Even with these filters, we only retrieve matches between the two datasets.
A very closely related join known as an innerunique might work
better. An innerunique join is very similar to an inner join, with
one important distinction: Kusto deduplicates the data in the first (or
left ) table (see Figure 6-15). So, if you happen to have 50 instances of
the same data in the first table, it will only be matched once to the right
table. This is important because innerunique is also the default join
type in Kusto, so if you don't specify kind=inner , Kusto will default to
innerunique .
FIGURE 6-15 An innerunique join
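Because relying on a default can surprise you, it is often worth being explicit. Reusing the indicators and fwlogs variables from the earlier let statements, a minimal sketch of an explicit innerunique join looks like this:
indicators
| join kind=innerunique(
fwlogs
)
on $left.Indicator==$right.DestinationIP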
Fullouter Join
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=fullouter(
fwlogs
)
on $left.Indicator==$right.DestinationIP
Tip
Leftouter Join
Next is the leftouter join . Once again, we will keep everything the
same and just change the join kind :
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=leftouter(
fwlogs
)
on $left.Indicator==$right.DestinationIP
Figure 6-18 shows the results in which there is a match between the two
tables, plus everything from the left table ( indicators ) that doesn't
have a match; in other words, the matched data and everything else from the
left table. This particular
join is interesting in the context of the sample data because it provides some
scale between the number of indicators versus the number of matches. For
example, the leftouter join might help us cull 1,000 indicators to just 7
matches.
Leftanti Join
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=leftanti(
fwlogs
)
on $left.Indicator==$right.DestinationIP
With leftanti join, matches are not returned; we see only the indicators
without matches. Instead of matching indicators, imagine the first query
(the left ) is a list of all your users, and the second query (the right )
is a list of users who fell victim to a phishing campaign. This kind of join
could retrieve all the users you don’t need to worry about investigating,
helping you speed up your response to those impacted.
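As a sketch of that idea, assuming two hypothetical tables named AllUsers and PhishedUsers that both carry a UserPrincipalName column, the query might look like this:
// AllUsers and PhishedUsers are hypothetical table names used for illustration
AllUsers
| join kind=leftanti(
PhishedUsers
)
on UserPrincipalName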
Leftsemi Join
This time, we are matching the data between the two datasets and retrieving
only data from the indicators (the left table) that have matches to the
firewall logs (the right table). Perhaps we are only interested in the
indicators that do have matches and don't want to retrieve the firewall
data associated with them at the moment. A leftsemi join would find
only the indicators:
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=leftsemi(
fwlogs
)
on $left.Indicator==$right.DestinationIP
Rightouter Join
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=rightouter(
fwlogs
)
on $left.Indicator==$right.DestinationIP
In Figure 6-24, the query returns the matches between the two tables, plus
the firewall events from the right-side data that have no match in the left.
A rightouter join might be useful so you can see the hits on your
indicators and the firewall events that have no hits on your indicators.
Again, it is about giving your data context.
Rightanti Join
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=rightanti(
fwlogs
)
on $left.Indicator==$right.DestinationIP
Rightsemi Join
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=rightsemi(
fwlogs
)
on $left.Indicator==$right.DestinationIP
let indicators=
externaldata (Indicator:string,Location:string,Fi
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
let fwlogs=
externaldata (Timestamp:datetime,SourceIP:string,
h@'https://raw.githubusercontent.com/KQLMSPress/d
kql/main/Chapter%206%3A%20Advanced%20KQL%20for%20
(ignorefirstrecord=true);
indicators
| join kind=inner(
fwlogs
)
on $left.Indicator==$right.DestinationIP, $left.T
Depending on the data you are hunting through, it might make sense to join
on multiple items, perhaps a username and an IP address or a file hash and
an IP address. There are no hits in this sample data, but there may be in
your environment. Nothing changes when you join on multiple items; the
join kinds are the same, and there’s no additional query syntax to add. All
the cool stuff you have learned in this chapter still applies. The only thing
that changes is that you are matching on both items.
Tip
DeviceFileEvents
| where InitiatingProcessFileName has_any ("any.e
ago(24h)
| summarize by strangefiles=InitiatingProcessSHA1
| join kind=inner(DeviceFileCertificateInfo
| where isnotempty(IsTrusted)) on $left.strangefi
| summarize by strangefiles, Signer, Issuer, IsSi
You can join more than two tables. In Chapter 5, we used a
phishing campaign as an example where we joined email delivery data with
email URLs, and then we joined that joined data with URL click events.
This data is not available in the Log Analytics demo environment, but if
you have your own tenant, you should get results:
EmailEvents
| join kind=inner(EmailUrlInfo) on NetworkMessage
| join kind=inner(UrlClickEvents) on Url, Network
In this example, the EmailEvents data is the left table, and the
EmailUrlInfo data is the right table. Kusto completes an inner
join , creating a new table that combines the EmailEvents and
EmailUrlInfo data. Next, the combined EmailEvents and
EmailUrlInfo table becomes the left , and the
UrlClickEvents table becomes the right . You could complete an
inner join first, followed by a fullouter join or whatever
combination you wanted.
After covering joins and destroying your sense of purpose on this planet, we
will now cover an operator that may help you avoid using joins when they
are unnecessary. Why did we run through joins before giving you a way to
avoid them? Because if we had to learn how to join data, then you do, too. We
want to pay that pain forward. Seriously, the real reason is that there are
many ways to find the desired data, and sometimes, joins are the only way
to get the job done. Understanding how joins work is the only way to know
which situations call for them and which don’t. That said, you can
sometimes use pivoting let statements instead.
At its simplest, let just allows us to declare a variable and assign it a
name for reuse later. PowerShell and SQL have similar functionality. Say
we want to investigate the 50.50.50.50 IP address. This is a placeholder IP
address; you should add a legitimate address in your environment. We could
cast that as a variable and then call that variable:
let ip="50.50.50.50";
SigninLogs
| where IPAddress == ip
You can even declare multiple variables in the same way; you aren’t limited
to just one. This time, maybe you also want to include a username to go
along with the IP address.
let ip="50.50.50.50";
let user="[email protected]";
SigninLogs
| where IPAddress == ip and UserPrincipalName ==
This is useful, but more than anything, it’s a time saver. Also, it might make
your query easier to read. The power of let comes from its ability to save
the results of a query as a variable for re-use:
let riskyusers=
SigninLogs
| where RiskLevelDuringSignIn == "high"
| distinct UserPrincipalName;
In this example, we find sign-in events with high risk associated with them,
and then we get a distinct listing of user principal names
( UserPrincipalName ) associated with the risky sign-ins. If you run
this query, you will get an error because we have declared a variable but
haven’t re-used it yet. We declare that as a variable, riskyusers . We
can then use that list of users for additional queries. Risk data isn’t
contained in the Log Analytics demo environment, but if you have your
own tenant, you should have some risky users:
let riskyusers=
SigninLogs
| where RiskLevelDuringSignIn == "high"
| distinct UserPrincipalName;
AuditLogs
| where TimeGenerated > ago (1d)
| where OperationName in ("User registered securi
info")
| where Result == "success"
| extend UserPrincipalName = tostring(TargetResou
| where UserPrincipalName in (riskyusers)
After getting the list of risky users, we can look through the Microsoft Entra
ID audit logs for any MFA changes from those users. We simply ask Kusto
to look up the list of risky users found in the first query.
You can even chain these together, which we often do in our day-to-day
jobs. We build a query out piece by piece and add indicators dynamically on
the fly:
let riskyusers=
SigninLogs
| where RiskLevelDuringSignIn == "high"
| distinct UserPrincipalName;
let riskyips=
SigninLogs
| where RiskLevelDuringSignIn == "high"
| distinct IPAddress;
let mfausers=
AuditLogs
| where TimeGenerated > ago (1d)
| where OperationName in ("User registered securi
info")
| where Result == "success"
| extend UserPrincipalName = tostring(TargetResou
| where UserPrincipalName in (riskyusers)
| distinct UserPrincipalName;
CloudAppEvents
| where ActionType == "FileDownloaded"
| extend IPAddress = tostring(RawEventData.Client
| extend FileName = tostring(RawEventData.ObjectI
| extend UserPrinciplName = tostring(RawEventData
| where UserPrinciplName in (mfausers) or IPAddre
In this query, we get risky users and IP addresses from the sign-in data,
casting both as variables. Then, we again find MFA changes for any of
those users, though we cast the results of that query as its own variable,
mfausers .
let corpips=
SigninLogs
| where NetworkLocationDetails has "Corporate IPs
| distinct IPAddress;
let riskyusers=
SigninLogs
| where RiskLevelDuringSignIn == "high"
| distinct UserPrincipalName;
let riskyips=
SigninLogs
| where RiskLevelDuringSignIn == "high"
| distinct IPAddress;
let mfausers=
AuditLogs
| where TimeGenerated > ago (1d)
| where OperationName in ("User registered securi
info")
| where Result == "success"
| extend UserPrincipalName = tostring(TargetResou
| where UserPrincipalName in (riskyusers)
| distinct UserPrincipalName;
CloudAppEvents
| where ActionType == "FileDownloaded"
| extend IPAddress = tostring(RawEventData.Client
| extend FileName = tostring(RawEventData.ObjectI
| extend UserPrinciplName = tostring(RawEventData
| where UserPrinciplName in (mfausers) or IPAddre
| where IPAddress !in (corpips)
We have added an additional variable to our query. By looking for any sign-
in from a location called Corporate IPs , we have built a list of our
corporate network IP addresses. We then cast that as another variable called
corpips and exclude it from the final query. Now that we have all these
variables ready to go, if we want to change one of the queries, that change
will flow down to the rest. For instance, maybe you want to include
medium- and high-risk events. You simply change that part of your query,
and everything else continues to work. Pivoting using let statements
like this will save you a huge amount of time once you get used to it.
Now, how does this relate to joins, you may be wondering? Think about the
differences between using let and joins . With a join , we are
combining data from different tables to form one set of results. With let ,
we keep the tables separate and simply use the results of one query to filter
the next, which is often all we need.
iff()
If you are familiar with SQL queries, iff() might look familiar. Iff() is a
really useful operator for adding context to your queries. When we did data
summarization queries in Chapter 5, we used operators such as countif()
and dcountif() , which performed a count or distinct count only when
the statement was true . iff() works similarly, except it extends a
column based on whether an expression evaluates to true or false . For example, the
DeviceInfo table includes information about the operating system
platform (Windows or Linux) and additional information about the
operating system distribution (Windows 11 or a Windows Server version).
This information is not in the Log Analytics demo environment but should
be if you have your own tenant to use:
DeviceInfo
| project DeviceName, OSPlatform, OSDistribution
DeviceInfo
| project DeviceName, OSPlatform, OSDistribution
| extend isWindows = iff(OSPlatform contains "Win
| extend isServer = iff(OSDistribution contains "
We add two new columns, called isWindows and isServer , and use
iff() to add our logic. If the OSPlatform contains "Windows", our new
isWindows field will be true; if not, it will be false. Similarly, if the
OSDistribution field contains "Server", the isServer field will be true. Our
results now look like Figure 6-31.
FIGURE 6-31 Logs showing device information with the new fields
You could then query on those fields like normal. For instance, where
isServer == true would return all server devices.
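To put it all together, here is a minimal sketch of the full query, assuming boolean outputs and adding a filter on the new isServer field:
DeviceInfo
| project DeviceName, OSPlatform, OSDistribution
| extend isWindows = iff(OSPlatform contains "Windows", true, false)
| extend isServer = iff(OSDistribution contains "Server", true, false)
| where isServer == true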
case()
You're likely familiar with guest accounts in Microsoft Entra ID, but
when looking at sign-in logs for guest accounts, there are two distinct types
of events: inbound guests and outbound guests.
In the Microsoft Entra ID sign-in logs, there are three fields we can use to
distinguish between these activities: AADTenantId , HomeTenantId ,
and ResourceTenantId . From experience, we can tell you that when
the AADTenantId is different from the HomeTenantId and the
HomeTenantId is different from the ResourceTenantId , then the
connection is an inbound guest connection. When the AADTenantId is
the same as the HomeTenantId and the ResourceTenantId is
different from the AADTenantId , the connection is an outbound guest.
Confusing right? Instead of trying to remember that every time you are
investigating, we can use case() to build a little parser to do the hard
work for us:
SigninLogs
| where TimeGenerated > ago (1d)
| where UserType == "Guest"
| project TimeGenerated, UserPrincipalName, AppDi
AADTenantId
| extend ['Guest Type']=case(AADTenantId != HomeT
AADTenantId == HomeT
"unknown")
In this parser, we just apply the above guest tenant logic that we described.
We will extend a new column, Guest Type . When
AADTenantId != HomeTenantId and HomeTenantId !=
ResourceTenantId , then the column will be Inbound Guest .
When AADTenantId == HomeTenantId and
ResourceTenantId != AADTenantId , it will be Outbound
Guest . A comma separates each case. You can then query on that field, so
when querying for Guest Type == “Inbound Guest”, only
Inbound Guests will be in the results.
You might notice “unknown” at the end of the previous query. When
using the case() statement, we need to add what we would like Kusto to
display if none of the statements are a match. It is just saying, “I have
evaluated all the case statements in your query, but none of them match, so
I will say 'unknown' to let you know that." You can use any text you want
there, such as “unknown guest type” or “no match”. Use whatever makes
the most sense to you.
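A minimal, complete sketch of that guest-type parser, assuming the standard SigninLogs columns named above, might look like the following:
SigninLogs
| where TimeGenerated > ago (1d)
| where UserType == "Guest"
| extend ['Guest Type']=case(
    AADTenantId != HomeTenantId and HomeTenantId != ResourceTenantId, "Inbound Guest",
    AADTenantId == HomeTenantId and ResourceTenantId != AADTenantId, "Outbound Guest",
    "unknown")
| project TimeGenerated, UserPrincipalName, AppDisplayName, ['Guest Type']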
Another good example is Active Directory security logs for group additions
and removals. In Active Directory, an EventId will be triggered when a user
is added to a global group and another will be triggered when a user is
added to a domain local group. The same is true when a user is removed
from those groups. You might simply want to see all “add” or “remove”
events, regardless of the group type:
SecurityEvent
| project TimeGenerated, EventID, AccountType
| where AccountType == "User"
| where EventID in (4728, 4729, 4732, 4733, 4
| extend Action = case(EventID in ("4728", "4
EventID in ("4729", "4757", "4733"), strc
, "unknown")
coalesce()
datatable (action:string,username:string,userdisp
"create virtual machine","eric.lang@tailspintoys
"delete virtual network","randy.byrne@tailspintoy
"create storage account","","Tim Kim",
"delete storage account","dennis.bye@tailspintoys
"create virtual firewall","","Dennis Bye"
]
Figure 6-32 shows some fake cloud events; some are missing the
username , and the userdisplayname is missing in one. This is
quite common if you use services across different clouds. We can use
coalesce to create a new column based on whichever one is available to
us.
FIGURE 6-32 Cloud logs showing missing data for the username or
userdisplayname
datatable (action:string,username:string,userdisp
"create virtual machine","eric.lang@tailspintoys
"delete virtual network","randy.byrne@tailspintoy
"create storage account","","Tim Kim",
"delete storage account","dennis.bye@tailspintoys
"create virtual firewall","","Dennis Bye"
]
| extend Actor=coalesce(username, userdisplayname
The new field is made up of the first non-null (or non-empty) field
coalesce() finds; see Figure 6-33. You can coalesce up to 64 fields in this way.
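If you want a fully self-contained version to experiment with, a minimal sketch of the same idea looks like this (the sample rows are invented for illustration):
// Invented sample rows; some are missing the username, one is missing the display name
datatable (action:string, username:string, userdisplayname:string) [
    "create storage account", "", "Tim Kim",
    "delete storage account", "[email protected]", "",
    "create virtual firewall", "", "Dennis Bye"
]
| extend Actor=coalesce(username, userdisplayname)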
parse-where
datatable (data:string) [
"srcipaddr=10.10.10.10,dstipaddr=50.50.50.50,srcp
"srcipaddr=10.10.10.10,dstipaddr=50.50.50.50,srcp
"sourceip=10.10.10.10,destinationip=50.50.50.50,s
You can see a few log entries here; if we write a very quick parser for the first
one, it looks like this:
datatable (data:string) [
"srcipaddr=10.10.10.10,dstipaddr=50.50.50.50,srcp
"srcipaddr=10.10.10.10,dstipaddr=50.50.50.50,srcp
"sourceip=10.10.10.10,destinationip=50.50.50.50,s
| parse data with * @"srcipaddr=" SourceIP @",dst
In Figure 6-35, two results are nicely parsed, and one is all blank. For that
last record, we can see the data structure is different, so the parsing logic we
just created doesn’t apply.
If we change the query to use parse-where , you will see the difference:
datatable (data:string) [
"srcipaddr=10.10.10.10,dstipaddr=50.50.50.50,srcp
"srcipaddr=10.10.10.10,dstipaddr=50.50.50.50,srcp
"sourceip=10.10.10.10,destinationip=50.50.50.50,s
| parse-where data with * @"srcipaddr=" SourceIP
DestinationPort @",protocol=" Protocol
In Figure 6-36, only two results are shown because the last one doesn’t
match the parse-where logic, so it is dropped from the results.
Think of it this way: Say you have a table in Microsoft Sentinel to which you
send the logs from your firewalls. That table contains logs from Cisco
devices, Palo Alto devices, and other vendors. You write a parser to clean up
the logs from the Cisco devices, which have their own unique log format. Running
a regular parse will return the cleaned-up data from the Cisco devices and
the unparsed results from all your other devices. If you run parse-
where , it will only return the data from the Cisco devices because there is
no match on the other data.
Caution
One thing to be careful about when using parse-
where is that you need to be sure you aren’t excluding
results you were expecting to see through your parsing
logic. Double-checking with a regular parse is valuable
to ensure accuracy.
parse_json()
We have spent a lot of time talking about JSON in this book—for good
reason: Security and operational data are often JSON. Occasionally, you
will be exposed to some data that looks like JSON, but for whatever reason,
you aren’t able to query it like you would normally. Maybe there is an
element you want to extend out to its own field, but you can’t quite get
it to work. It may be the case that the data type is actually a string ,
despite the data being JSON. In this case, we can use parse_json() to
tell KQL that the data is, in fact, JSON, and then the regular operators will
work on it as usual.
To help you understand exactly how it works, we have created some test
JSON that you can use:
datatable(Username:string,ErrorCode:string,Locati
"[email protected]","50126",'{"City":"Lo
"[email protected]","0",'{"City":"Syd
"[email protected]","50053",'{"City"
]
In Figure 6-37, you can see three events. The Username and
ErrorCode fields are straightforward. However, if you look at the
LocationData closely, it looks like JSON, but it’s actually been
ingested as a string, so we can’t automatically extend our columns out.
For instance, if we wanted City as a new field, we couldn’t do that by
extending LocationData.City .
The data in Figure 6-37 looks like valid JSON, though, so we can tell Kusto
to parse the JSON, allowing us to manipulate it properly:
datatable(Username:string,ErrorCode:string,Locati
"[email protected]","50126",'{"City":"Lo
"[email protected]","0",'{"City":"Syd
"[email protected]","50053",'{"City"
]
| extend LocationData=parse_json(LocationData)
| extend City=LocationData.City
After telling Kusto to parse it, we can do things like extend the City
element out as usual. See Figure 6-38.
FIGURE 6-38 Sign-in data with the location details as a JSON object
Tip
AzureActivity
| where CategoryValue == "Administrative"
| where OperationNameValue contains "MICROSOFT.ST
| where ResourceGroup !contains "CLOUD-SHELL-STOR
| where ActivityStatusValue == "Success"
| extend storageaccname = tostring(parse_json(Pro
| project Caller, OperationNameValue, CallerIpAdd
parse_xml()
parse_user_agent()
Tip
User-Agents change often when browsers are patched
and updated. You can also change your User-Agent to
any string you want at any time in your browser's
Advanced Settings. So, any hunting based on User-
Agents should be framed with those two points in mind.
SigninLogs
| where TimeGenerated > ago (30d)
| take 100
| distinct UserAgent
Figure 6-39 shows the query results, where you see several User-
Agents .
FIGURE 6-39 Example user agents
SigninLogs
| where TimeGenerated > ago (30d)
| take 100
| distinct UserAgent
| extend UserAgentDetails=parse_user_agent(UserAg
• The first is the field where our User-Agent is located; in this case,
UserAgent .
• Then, we need to tell Kusto what to search for within that User-
Agent , whether it is a browser, device, or operating system. In this case,
we are looking for browser details.
We can see the result once we run the query, as shown in Figure 6-40.
The details about the browser have been put into an array—Browser,
MajorVersion, MinorVersion, and Patch. This information can be useful for
tracking down users accessing your applications via legacy or outdated
browsers and possibly even browsers with known vulnerabilities.
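A minimal sketch of pulling those values out into their own columns, assuming the "browser" lookup and the result shape described above, might be:
SigninLogs
| where TimeGenerated > ago (30d)
| take 100
| distinct UserAgent
| extend UserAgentDetails=parse_user_agent(UserAgent, "browser")
| extend Browser=tostring(UserAgentDetails.Browser.Family)
| extend MajorVersion=tostring(UserAgentDetails.Browser.MajorVersion)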
parse_url()
Often, URLs are forensically interesting, meaning they can point out what
domains our users accessed or that URLs were found in emails. With
phishing still so commonplace, we defenders often try to understand
whether a domain or URL is malicious. parse_url() lets us easily
break down a full URL into various components that we can then query on.
For instance, if you look at the DeviceEvents table, we can see an
event named BrowserLaunchedToOpenUrl , which is exactly what it
sounds like, an event when a URL is opened in a web browser. The
DeviceEvents is not available in the Log Analytics demo environment,
but should be in your tenant:
DeviceEvents
| where ActionType == "BrowserLaunchedToOpenUrl"
| distinct RemoteUrl
Figure 6-41 shows example URL logs for the TailspinToys SharePoint site.
DeviceEvents
| distinct RemoteUrl
| extend UrlDetails=parse_url(RemoteUrl)
The array includes things like the Host, Path (the URL’s path), and any
additional parameters. It can even include things like a Username and
Password, if those things are sent as part of the URL, which presents a
really great detection capability.
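For example, a minimal sketch that breaks each URL into its host and path, and also surfaces any credentials embedded in it, could look like this:
DeviceEvents
| where ActionType == "BrowserLaunchedToOpenUrl"
| distinct RemoteUrl
| extend UrlDetails=parse_url(RemoteUrl)
| extend Host=tostring(UrlDetails.Host)
| extend Path=tostring(UrlDetails.Path)
| extend UrlUsername=tostring(UrlDetails.Username)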
regex
matches regex lets you hunt for regex patterns within a field to see
whether its contents align with the pattern. For instance, let's look at the
following test data:
datatable (data:string) [
"ipaddress=50.50.50.50-url=tailspintoys.com",
"ipaddress=50.50.50.50-username=eric@tailspintoys
"ipaddress=50.50.50.50-userid=39372",
"ipaddress=unknown-userid=39281"
]
datatable (data:string) [
"ipaddress=50.50.50.50-url=tailspintoys.com",
"ipaddress=50.50.50.50-username=eric@tailspintoys
"ipaddress=50.50.50.50-userid=39372",
"ipaddress=unknown-userid=39281"
]
| where data matches regex @"((?:[0-9]{1,3}\.){3}
You can also use the not functionality for regex queries to find where
you don’t have a match. If we add that logic to our last query, we see the
other two results, where there are no domain names:
datatable (data:string) [
"ipaddress=50.50.50.50-url=tailspintoys.com",
"ipaddress=50.50.50.50-username=eric@tailspintoys
"ipaddress=50.50.50.50-userid=39372",
"ipaddress=unknown-userid=39281"
]
| where not (data matches regex @"([a-z0-9|-]+\.)
Figure 6-46 shows the query results where there isn’t a match on the
domain regex pattern.
FIGURE 6-46 Results where there is not a match on a domain
Note
extract
extract lets you look through a large string of text for matches to your
regex pattern and extract them to a new column:
datatable (data:string) [
"ipaddress=50.50.50.50-url=tailspintoys.com,ipadd
url=aka.ms/kustofree"
]
| extend IPAddress=extract(@"((?:[0-9]{1,3}\.){3}
Figure 6-48 shows a new field called IPAddress with the first match of
the regex pattern.
FIGURE 6-48 The first match of an IP address extracted to a new field
extract_all
datatable (data:string) [
"ipaddress=50.50.50.50-url=tailspintoys.com,ipadd
url=aka.ms/kustofree"
]
| extend IPAddress=extract_all(@"((?:[0-9]{1,3}\
Figure 6-49 shows that all our IP addresses are extracted into a new array.
FIGURE 6-49 All IP addresses extracted to a new array
Tip
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(3d)
| extend path = replace_string(replace_string(rep
@'(\/)+','//'),'v1.0/',''),'beta/','')
| extend UriSegments = extract_all(@'\/([A-z2]+|\
| extend OperationResource = strcat_array(UriSegm
| where OperationResource == 'oauth2permissiongra
| summarize RequestCount=count() by AppId
Advanced time
If you work for a large multinational organization, you probably deal with
many time zones. While all logs should be ingested in UTC, you can
manipulate your queries to return multiple time zones, which might help
with your analysis. As mentioned in the introductory sections, the
datetime type in KQL is separate from a regular string, so you can alter
and query on it in different ways. For instance, say you have three main
offices: one is on UTC, one is +9 GMT (Japan), and one is -5 GMT (EST).
You can present all three time zones in your query to help your analysis:
SigninLogs
| project TimeGenerated, UserPrincipalName, AppDi
SigninLogs
| project TimeGenerated, UserPrincipalName, AppDi
| extend EST=TimeGenerated-5h
| extend JST=TimeGenerated+9h
Now, when we see our results, we see all three timestamps, which might
give you additional context. For example, is an activity from a Japanese
user happening in regular Japanese working hours or outside of them?
If you know the friendly name of time zones, you can use the
datetime_utc_to_local operator as an alternative. For instance, if
you wanted Sydney time as well as UTC, you could get that easily:
SigninLogs
| extend SydneyTime=datetime_utc_to_local(TimeGen
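A minimal sketch of that, assuming the standard IANA time zone name for Sydney, might be:
SigninLogs
| extend SydneyTime=datetime_utc_to_local(TimeGenerated, "Australia/Sydney")
| project TimeGenerated, SydneyTime, UserPrincipalName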
Kusto also has inbuilt functionality to determine the start of days, weeks, or
months. The ago() function lets us investigate a historical block of time
from when we ran our query, so if we use ago(1d) , we find any results
from the last day. This simply returns the last 24 hours of data. So, if you
run that query at 9.15 AM, it will go back to 9.15 AM the previous day; if
you run it again at 11.25 AM, it will go back to 11.25 AM the previous day.
If you are interested in only events that happen today or this week, then you
can use startofday() , startofweek() , startofmonth() ,
and startofyear() .
When you run the following query, it will go back to midnight on the day
you ran it, regardless of the time you ran it. If you run it at 9.15 AM, it will
go back to midnight; if you run it at 11.25 AM, it will also go back to
midnight. The startofday() , startofweek() ,
startofmonth() , and startofyear() operators all work the
same way. There is no right or wrong with using these options; choosing the
correct one comes down to being aware of the data that will be returned. If
you want to get even more specific, you can also use similar operators:
dayofweek() or timeofday() .
SigninLogs
| where TimeGenerated > startofday(now())
Events can be more or less interesting depending on the hours of the day or
the days of the week that they occur on. For example, your IT admins
should complete most of their work during business hours. Seeing them
perform an activity outside normal work hours or on weekends might
increase the event risk. Using KQL, we can filter events based on your
standard working hours. In this example, let's assume
your regular hours are Monday to Friday, 6 AM to 6 PM. Some more
sophisticated threat actors will launch their attacks outside standard
business hours, where alerts may go undetected. To mimic that kind of
attack, we can search for any high-risk sign-in activities to the Azure Portal
that fall outside those times:
• We cast two variables for Saturday and Sunday to make our query
a little easier to read.
• We set them as dayofweek ,with the first day of the week being day 0
(counting starts from 0, as always) for Sunday and day 6 for Saturday.
• Our privileged accounts start with the adm prefix, so we include that as a
filter ( where UserPrincipalName startswith “adm” ), as sketched in the query below.
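A minimal sketch of that out-of-hours hunt, assuming the standard SigninLogs schema and the Monday-to-Friday, 6 AM to 6 PM window described above (hourofday() is one reasonable way to express the hour logic), might look like this:
let Sunday = 0d;   // dayofweek() counts days from Sunday = 0
let Saturday = 6d; // ...through Saturday = 6
SigninLogs
| where TimeGenerated > ago (7d)
| where RiskLevelDuringSignIn == "high"
| where AppDisplayName == "Azure Portal"
| where UserPrincipalName startswith "adm"
| where dayofweek(TimeGenerated) == Saturday
    or dayofweek(TimeGenerated) == Sunday
    or hourofday(TimeGenerated) < 6
    or hourofday(TimeGenerated) >= 18
| project TimeGenerated, UserPrincipalName, AppDisplayName, IPAddress, RiskLevelDuringSignIn
A related hunt, shown next, joins risky sign-ins to MFA registration events from the same user and looks at how closely together they occur: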
SigninLogs
| where TimeGenerated > ago (7d)
| where RiskLevelDuringSignIn in ("medium", "high
| where ResultType in ("50079","50072")
| project RiskTime=TimeGenerated, UserPrincipalNa
| join kind=inner(
AuditLogs
| where TimeGenerated > ago (7d)
| where OperationName == "User registered sec
| where Result == "success"
| extend UserPrincipalName = tostring(TargetR
)
on UserPrincipalName
| project-rename MFATime=TimeGenerated, MFAResult
| where (MFATime - RiskTime) between (0min .. 30m
| extend TimeDelta=MFATime-RiskTime
| project RiskTime, MFATime, TimeDelta, UserPrinc
• First, we searched for medium- and high-risk events, where the user
requires MFA registration (denoted by codes 50079 and 50072 ).
• We joined those events to MFA registration events from the same user.
• Finally, we calculated the time difference between the two and returned
events occurring within 30 minutes of each other.
Let’s start with some Microsoft Defender for Endpoint (MDE) network
events. If you use the sample operator, you can return a collection of events:
DeviceNetworkEvents
| where TimeGenerated > ago(1d)
| sample 100
| sort by TimeGenerated asc
Figure 6-52 shows the results. Yours will look different, but you will get a
random assortment of logs.
DeviceNetworkEvents
| where TimeGenerated > ago(1d)
| sample 100
| sort by TimeGenerated asc
| extend TimeDiffInMinutes=datetime_diff('minute
The key to this query is our last line, where we extend a new column
called TimeDiffInMinutes using datetime_diff . We then say
we want this to calculate the minutes (other alternatives could be seconds or
hours) between events and use the TimeGenerated field as the time
field.
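One common way to express that calculation is to compare each row to the one before it with prev() after the sort; a minimal sketch of that approach (the use of prev() is our assumption) is:
DeviceNetworkEvents
| where TimeGenerated > ago(1d)
| sample 100
| sort by TimeGenerated asc
| extend TimeDiffInMinutes=datetime_diff('minute', TimeGenerated, prev(TimeGenerated))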
DeviceNetworkEvents
| where TimeGenerated > ago(1d)
| sample 100
| sort by TimeGenerated asc
| extend TimeDiffInMinutes=datetime_diff('minute
FIGURE 6-54 Results showing the time between events but reversed
Time-series analysis
Kusto has a range of inbuilt functionality that can do the heavy lifting
regarding time-based analysis. Using this analysis, you can simply visualize
events, or you can do additional analysis on top of your visualization,
including things such as trends. Learning how to quickly visualize a dataset
can be valuable for a couple of reasons.
• It can help you quickly see anomalies in your data, such as when an event
began appearing suddenly (or when it stopped appearing).
• Large data spikes might be interesting, too. You can then dig into those
particular periods.
If you want to create simple timechart visualizations, you can use several
operators. You can use the Log Analytics Demo environment at
aka.ms/LADemo for all these examples. Let’s create a timechart showing
total sign-ins using the example sign-in data. Your visualization will look
slightly different because the data will be different at the time you run it,
but you will get the idea:
SigninLogs
| where TimeGenerated > ago (30d)
| summarize Count=count() by bin(TimeGenerated, 1
| render timechart
FIGURE 6-55 Timechart showing sign-in events over the last 30 days
You can see all the sign-ins to the demo environment, which follows a
predictable pattern of having fewer sign-ins on the weekend—which is to
be expected. You can take your visualizations one step further, such as by
breaking up single-factor and multifactor authentications:
SigninLogs
| where TimeGenerated > ago (30d)
| summarize Count=count() by bin(TimeGenerated, 1
| render timechart
In Figure 6-56, we have broken up our data into the same 1-day blocks of
time and separated out the AuthenticationRequirement field. We
definitely need more MFA in this environment!
FIGURE 6-56 Timechart showing single-factor versus multifactor
authentications
SigninLogs
| where TimeGenerated > ago (30d)
| make-series Count=count() default=0 on TimeGen
| render timechart
With make-series , we also define what value to use when no hits are
in that particular time bucket; in this case, we used 0 . Why is this
important? It speaks to the different visual outputs you can get from
summarize versus make-series , depending on your data.
For summarize:
LAQueryLogs
| where TimeGenerated > ago (30d)
| summarize Count=count() by bin(TimeGenerated, 1
| render timechart
For make-series:
LAQueryLogs
| where TimeGenerated > ago (30d)
| make-series Count=count() default=0 on TimeGene
| render timechart
When you first look at the time charts in Figures 6-58 and 6-59, you might
think they are the same. But are they? In the summarize version shown
in Figure 6-58, the line never hits zero; the visualization is essentially
“smoothed” between the days that have activity. Now look at Figure 6-59,
covering the span from October 27 to October 30. With make-series ,
we said if there is no activity on a particular day, then default to 0. So, we
can see on October 28, it goes down to 0 .
Now that we have done some simple visualizations, let’s have a look at
some operators to add time-series analysis on top of them.
series_stats
SigninLogs
| where TimeGenerated > ago (30d)
| make-series Count=count() default=0 on TimeGene
| project series_stats(Count)
You may want to add a trend line over your visualization to understand
if you are trending upward or downward in terms of the total count. This
might be useful for things like security alerts and phishing emails or to
understand the overall trend over a longer period.
Figure 6-61 shows the total sign-ins in a timechart with a trendline showing
the overall data trend. You can see that the total sign-ins are trending down
slightly over the previous month. The natural extension of a trend line is to
use Kusto to forecast based on historical data.
SigninLogs
| make-series Count=count() on TimeGenerated from
| extend forecast = series_decompose_forecast(Cou
| render timechart
In Figure 6-62, you can see the actual data (shown as “count” in the
legend), which drops off around November 22 (when this visual was
created), and the forecast for the additional 14 days.
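A minimal sketch of that forecast, extending the series 14 days past now() and letting series_decompose_forecast() fill in the future points, might be:
SigninLogs
| make-series Count=count() default=0 on TimeGenerated from ago(30d) to now()+14d step 1d
| extend forecast = series_decompose_forecast(Count, 14)
| render timechart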
let timeframe=1h;
let sensitivity=2;
let threshold=400;
SigninLogs
| where TimeGenerated between (startofday(ago(21d
| where ResultType == 0
| make-series SigninCount=count() on TimeGenerate
Location
| extend outliers=series_decompose_anomalies(Sign
| mv-expand TimeGenerated, SigninCount, outliers
| where outliers == 1 and SigninCount > threshold
• This query looks at the last full 21 days (using the startofday()
operator) and then looks for successful sign-ins only (result type is 0 ).
• We also put in some logic to show only events where the sign-in count is
over the threshold of 400 . This is mainly to remove low-count anomalies.
For example, while going from a single sign-in to two sign-ins doubles the
count, this isn't what interests us.
We find it useful to cast things like the timeframe, sensitivity, and threshold
as variables because you might want to adjust your query depending on
your environment or data. By default, the sensitivity field is 1.5, which you
can make more or less sensitive, depending on what you are trying to find.
Seeing the outliers in a list can be valuable, but a visual will probably be
more valuable. To do that, we will use the let operator and cast the first
query as a variable called outliercountries :
let timeframe=1h;
let sensitivity=2;
let threshold=400;
let outliercountries=
SigninLogs
| where TimeGenerated between (startofday(ago(21d
| where ResultType == 0
| make-series SigninCount=count() on TimeGenerate
Location
| extend outliers=series_decompose_anomalies(Sign
| mv-expand TimeGenerated, SigninCount, outliers
| where outliers == 1 and SigninCount > threshold
| distinct Location;
SigninLogs
| where TimeGenerated between (startofday(ago(21d
| where ResultType == 0
| where Location in (outliercountries)
| make-series SigninCount=count() on TimeGenerate
Location
| render timechart
We tell Kusto to return only the anomalous activity locations and render a
chart to visualize the anomalies. In Figure 6-63, you can see the big spike of
activity that flagged the detection logic. You will likely get different results
when you run this query in the demo environment or on your own. You
must play with the various options to see what you can detect.
Tip
Once you have written one of these queries, it is just a matter of deciding
what you would like to hunt on and manipulating those variables. Do you
want to look at data over a larger time frame? Then you can up it to 6 hours
or a day. Do you want a higher sensitivity in your query? Then you can up
that variable, too.
You might want to look for anomalies in countless types of data and logs.
Again, this is where your environmental expertise comes into play. Maybe
you want to look for anomalous downloads from SharePoint, MFA
registration events, or emails being sent to quarantine. The same logic
applies for them all: Arrange your data into a time series and then apply
your criteria, such as a timeframe overlaid with sensitivity. Optionally, you
can create a chart so it's easier to understand.
Geolocation
Geolocation activities are often interesting in cybersecurity. If your core
business is in the United Kingdom, you might be interested in activity from
outside the UK, especially from countries and locations considered to be high
risk.
Some log data you query will already have geolocation data attached to it,
whether that is the city or country related to the event. It might even have
the raw latitude and longitude available. Using that data, we can query
specific data, such as looking for only logs from New York. Also, you can
plot that data onto a map.
Tip
Note
In the example sign-in logs, you will see the latitude and longitude held in a
field called LocationDetails . See Figure 6-64.
SigninLogs
| sample 100
FIGURE 6-64 Location details held in the sign-in data
We can extend them to new columns, count the sign-ins, and render
them onto a world map, as shown in Figure 6-65.
SigninLogs
| extend Lat=toreal(['LocationDetails']['geoCoord
| extend Long=toreal(['LocationDetails']['geoCoor
| summarize Count=count() by Long, Lat
| render scatterchart with (kind=map)
FIGURE 6-65 World map visualization with sign-in data highlighted
If the data you are analyzing doesn’t have built-in geolocation information,
KQL can retrieve it for you using the
geo_info_from_ip_address() operator. The operator’s name tells
you that it takes an IP address as input and retrieves the associated
geolocation information for you. For example, we can look in
DeviceLogonEvents for RDP logon events that occurred from a
public IP address:
DeviceLogonEvents
| where ActionType == "LogonSuccess"
| where LogonType == "RemoteInteractive"
| where RemoteIPType == "Public"
| project TimeGenerated, DeviceName, AccountName,
DeviceLogonEvents
| where ActionType == "LogonSuccess"
| where LogonType == "RemoteInteractive"
| where RemoteIPType == "Public"
| extend GeoInfo=geo_info_from_ip_address(RemoteI
| project TimeGenerated, DeviceName, AccountName,
In Figure 6-67, the geolocation details have been extended to a new field.
DeviceLogonEvents
| where ActionType == "LogonSuccess"
| where LogonType == "RemoteInteractive"
| where RemoteIPType == "Public"
| extend GeoInfo=geo_info_from_ip_address(RemoteI
| extend City = tostring(GeoInfo.city)
| extend Country = tostring(GeoInfo.country)
| extend Latitude = tostring(GeoInfo.latitude)
| extend Longitude = tostring(GeoInfo.longitude)
| extend State = tostring(GeoInfo.state)
| where City == "New York"
| project TimeGenerated, DeviceName, AccountName,
We can also render these events on the same kind of world map as shown
previously in Figure 6-65. The new world map is shown in Figure 6-68.
DeviceLogonEvents
| where ActionType == "LogonSuccess"
| where LogonType == "RemoteInteractive"
| where RemoteIPType == "Public"
| project TimeGenerated, DeviceName, AccountName,
| extend GeoInfo=geo_info_from_ip_address(RemoteI
| project TimeGenerated, DeviceName, AccountName,
| extend Lat=toreal(['GeoInfo']['latitude'])
| extend Long=toreal(['GeoInfo']['longitude'])
| summarize Count=count() by Long, Lat
| render scatterchart with (kind=map)
Have you ever written a query to look for suspicious activity from IP
addresses and kept getting private IP ranges in your results?
Some datasets, such as network events taken from Microsoft Defender for
Endpoint, have a field called RemoteIPType , which signifies whether
the IP address is public or private. Many data sources, however, don't have
such a field. If you are a seasoned networking pro, you might be able to
look at an IP address and instantly know whether it is private. We
sometimes remember, but we often don’t—especially when a public IP
address closely resembles one of the private ranges. Fear not; we have an
inbuilt operator to save you a headache—and it is simple to use.
FirewallLogs
| where ipv4_is_private(IPAddress)
If you want to see the opposite of this, only finding results with public IP
addresses, we just use the not operator to return public IP addresses:
FirewallLogs
| where not (ipv4_is_private(IPAddress))
This is also a good use case for the iff() operator, which can extend a
new field for us:
FirewallLogs
| extend PrivateIP=iff(ipv4_is_private(IPAddress)
We use iff() to do the hard work for us and create a new column called
PrivateIP ; the results will be either true or false , depending on
the lookup!
datatable (SourceIPAddress:string,DestinationIPAd
"192.168.1.5","50.50.50.50","443",
"192.168.1.13","60.60.60.60","80",
"192.168.5.65","50.50.50.50","22",
"192.168.2.67","70.70.70.70","443",
]
| extend isVPN = ipv4_is_in_range(SourceIPAddress
Figure 6-70 shows a new column indicating if our SourceIPAddress
is in the 192.168.1.0/26 VPN range. We can then focus our energy
on the VPN traffic.
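A self-contained sketch of that check, with the third column name (DestinationPort) assumed for illustration, might be:
datatable (SourceIPAddress:string, DestinationIPAddress:string, DestinationPort:string) [
    "192.168.1.5", "50.50.50.50", "443",
    "192.168.5.65", "50.50.50.50", "22"
]
| extend isVPN = ipv4_is_in_range(SourceIPAddress, "192.168.1.0/26")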
If you have multiple VPN ranges, you can expand on that same logic with
ipv4_is_in_any_range() . The idea is exactly the same, but it will
look up multiple ranges for you, as shown below; the results are shown in
Figure 6-71.
datatable (SourceIPAddress:string,DestinationIPAd
"192.168.1.5","50.50.50.50","443",
"192.168.1.13","60.60.60.60","80",
"192.168.5.65","50.50.50.50","22",
"192.168.2.67","70.70.70.70","443",
]
| extend isVPN = ipv4_is_in_any_range(SourceIPAdd
FIGURE 6-71 Multiple VPN ranges
base64_decode_tostring()
If you are dealing with data that is base64-encoded, Kusto can natively
decode it for you inline, so you don’t need to use a third-party app or
website to do it for you. This can be particularly useful for things like
PowerShell commands, which are often encoded. The syntax is simple. Let's generate some
base64-encoded strings, as shown below; the output is shown in Figure 6-
72:
datatable (ProcessName:string,ProcessParams:strin
"PowerShell.exe","VGhlIERlZmluaXRpdmUgR3VpZGUgdG8
"PowerShell.exe","SHVtYW4ga25vd2xlZGdlIGJlbG9uZ3M
"PowerShell.exe","aHR0cHM6Ly90d2l0dGVyLmNvbS9yZXB
]
FIGURE 6-72 Encoded process details
We just extend a new column and have Kusto decode it for us; the
decoded strings are shown in Figure 6-73:
datatable (ProcessName:string,ProcessParams:strin
"PowerShell.exe","VGhlIERlZmluaXRpdmUgR3VpZGUgdG8
"PowerShell.exe","SHVtYW4ga25vd2xlZGdlIGJlbG9uZ3M
"PowerShell.exe","aHR0cHM6Ly90d2l0dGVyLmNvbS9yZXB
]
| extend Decoded=base64_decode_tostring(ProcessPa
toscalar()
This one is for the maths nerds out there; it lets us calculate a value
and have it saved as a scalar constant. Put simply, it lets us calculate
something and then save that result for reuse. This is useful for calculations
on things such as standard deviation. To make these queries easy to read, you
can combine it with the let statement. This example uses toscalar() to help
calculate the standard deviation of blocked email. The Log Analytics demo
environment does not contain email data, but your own tenant should.
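Because the original query isn't reproduced here, the following is only a minimal sketch of the pattern being described. It assumes the EmailEvents table from the Microsoft 365 Defender connector and its DeliveryAction field; substitute whichever table and fields hold your own email data. Rather than returning a single standard-deviation figure, this version plots each day's count alongside its deviation from the saved average; swapping the final lines for a summarize with stdev(DailyBlocked) would return the single figure instead.
let AverageBlockedEmail = toscalar(
    EmailEvents
    | where TimeGenerated > ago(30d)
    | where DeliveryAction == "Blocked"
    | summarize DailyBlocked = count() by bin(TimeGenerated, 1d)
    | summarize avg(DailyBlocked));
EmailEvents
| where TimeGenerated > ago(30d)
| where DeliveryAction == "Blocked"
| summarize DailyBlocked = count() by bin(TimeGenerated, 1d)
// Reuse the saved scalar to compare each day against the monthly average
| extend AverageBlocked = AverageBlockedEmail
| extend DeviationFromAverage = DailyBlocked - AverageBlocked
| render timechart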
The first part of the query simply calculates the average number of emails
blocked per day in the tenant and then saves the output as a scalar constant
via the AverageBlockedEmail variable. The second query then uses that value
to calculate the standard deviation of blocked email over the last month.
Finally, we render a chart so it is easy to digest, as shown in Figure 6-74.
evaluate pivot()
If you want to create a small pivot table in Kusto, you can do that natively
using the evaluate operator. For instance, you could find out how many
times your staff had accessed a specific application containing the word
“Azure.” See Figure 6-75. In the Log Analytics demo environment, the
UserPrincipalName is hidden, to preserve privacy, but it will be available in
your own tenant.
SigninLogs
| where TimeGenerated > ago (30d)
| where ResultType == "0"
| where AppDisplayName has "Azure"
| evaluate pivot(AppDisplayName, count(), UserPrincipalName)
You can see you get a handy pivot table with each staff member’s
username, the application name, and the count. Kusto isn’t a replacement
for Microsoft Excel by any stretch, but it can be useful to quickly create a
pivot table.
Functions
KQL allows you to save prewritten KQL as a function . When you run that
function , the saved KQL runs. While that might sound an awful lot
like saving a hunting query, functions go far beyond just saving hunting
queries for reuse. Importantly, they can also be used nearly everywhere
that uses KQL: Microsoft 365 Advanced Hunting, Microsoft Sentinel, and
Azure Data Explorer all support the use of functions.
Let’s say you often find yourself looking at the same data, such as
Microsoft Entra ID sign-in logs, but you generally filter down to maybe
eight fields. You could turn that query into a function to reuse, saving you
lots of time tidying up your query every time you want to run it. So, let’s
select eight fields:
SigninLogs
| project TimeGenerated, UserPrincipalName, AppDisplayName, ResultType, IPAddress, UserAgent, LocationDetails, RiskLevelDuringSignIn
This query returns the time, username, accessed application, the sign-in
result, IP address, user agent, location details, and any associated risk.
Now that we have the query, we can save it as a function. Choose Save |
Save As Function, as shown in Figure 6-76. You won't be able to save
functions in the Log Analytics demo environment because it is read-only,
but you can in your own environment.
After clicking Save, we just need to give it a minute or so until the newly
saved function is available to us. Once it is saved, the Kusto IntelliSense
will show the new function when you type its name, as shown in Figure 6-
78.
FIGURE 6-78 Intellisense detecting the saved function
We can simply run the AADLogs function with no other inputs; it will run
the KQL that we saved to the function. See Figure 6-79.
Kusto is also smart enough to allow you to write queries as normal after
“running” the function. For instance, you can find successful and high-risk
sign-ins while using a saved function. You just run a function like a query,
and Kusto returns the result.
AADLogs
| where ResultType == "0" and RiskLevelDuringSignIn == "high"
Understanding how functions work is the key to getting the most from
them. They are essentially pre-running a query for you. If you try to query
on something that isn’t included in the function, you won’t be able to. For
instance, if you wanted to know whether these sign-ins used single-factor or
multifactor authentication, you can't, because that information is not part
of the function. Try it yourself:
AADLogs
| where AuthenticationRequirement == "Multifactor"
You will get an error similar to the one shown in Figure 6-80 because the
function was not configured to return the
AuthenticationRequirement field, so we cannot query on it.
FIGURE 6-80 A Log Analytics error message
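If you do need a field like this, the fix is simply to add it to the underlying query and save the function again under the same name. Here is a minimal sketch, assuming the same eight fields from the earlier example plus the extra column:
SigninLogs
| project TimeGenerated, UserPrincipalName, AppDisplayName, ResultType, IPAddress, UserAgent, LocationDetails, RiskLevelDuringSignIn, AuthenticationRequirement
Once the updated function is saved, the AuthenticationRequirement filter above works as expected.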
Note
One of the most valuable uses of functions in KQL is to help you clean up
and parse data. As mentioned previously, security data can sometimes be a
mix of formats, often very inconsistent and hard to read without some kind
of parsing. For example, consider firewall data. Maybe all the information
you want is held within a single text field. The IP addresses and port data
are all in there in a single string. You can parse that string in various ways,
as we have explained, by using operators like split(), trim(), or
parse(). Once you have done that tedious work, the best way to
preserve it is to save it as a function. Functions are valuable
because they are also made available to other workspace users.
externaldata (data:string)[h@'https://raw.githubu
kql/main/Chapter%205%3A%20KQL%20for%20Cyber%20Sec
record=false)
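The full parsing query isn't reproduced here, but a minimal sketch of the idea looks like the following. The raw string format and the resulting field names (SourceIP, DestinationIP, Protocol, and so on) are illustrative assumptions, not the actual schema of the file behind the link above.
datatable (RawData:string)[
"src=192.168.1.5 dst=50.50.50.50 spt=51044 dpt=443 proto=TCP action=allow",
"src=192.168.1.13 dst=60.60.60.60 spt=50322 dpt=80 proto=TCP action=deny"
]
// parse splits the single raw string into named columns we can query on
| parse RawData with "src=" SourceIP " dst=" DestinationIP " spt=" SourcePort " dpt=" DestinationPort " proto=" Protocol " action=" Action
| project-away RawData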
Now, to get the maximum benefit from all our hard work, let’s save our
work as a function called FirewallLogs . Once again, select Save >
Save As, as shown in Figure 6-83.
FIGURE 6-83 Saving the function as “FirewallLogs”
Then, you can simply use FirewallLogs when running queries against
your beautifully parsed data instead of pasting the full parsing KQL into
your query window each time:
FirewallLogs
| where SourceIP == "50.50.50.50" and Protocol == "TCP"
id: f4bcd8b6-5a67-4131-a5c2-de1af4f177b6
name: Security Event log cleared
description: |
  'Checks for event id 1102 which indicates the security event log was cleared.
  It uses Event Source Name "Microsoft-Windows-Eventlog" to avoid generating false positives from other sources, like AD FS
  servers for instance.'
severity: Medium
requiredDataConnectors:
  - connectorId: SecurityEvents
    dataTypes:
      - SecurityEvent
queryFrequency: 1d
queryPeriod: 1d
triggerOperator: gt
triggerThreshold: 0
tactics:
  - DefenseEvasion
relevantTechniques:
  - T1107
query: |
  SecurityEvent
  | where EventID == 1102 and EventSourceName == "Microsoft-Windows-Eventlog"
  | summarize StartTimeUtc = min(TimeGenerated), EndTimeUtc = max(TimeGenerated), EventCount = count() by Computer, Account, EventID, Activity
  | extend timestamp = StartTimeUtc, AccountCustomEntity = Account, HostCustomEntity = Computer
entityMappings:
  - entityType: Account
    fieldMappings:
      - identifier: FullName
        columnName: AccountCustomEntity
  - entityType: Host
    fieldMappings:
      - identifier: FullName
        columnName: HostCustomEntity
version: 1.0.0
You can see the core KQL under the query section of our YAML, and
surrounding that is additional information like a description and MITRE
mapping.
Once you are good to go, you can just submit a pull request. Your pull
request will go through a review; some of that review is automated (such as
checking your YAML formatting), and then some of the maintainers of the
repository will review it manually. They may have some questions or
suggestions for you, so you can edit it and re-submit.
You also don’t need to stop at queries. If you have created a fantastic
workbook that visualizes all kinds of data, you can also make it open-
source. Remember that when you open-source the workbook template, it
won't bring any of your personal data with it, so don't stress! If a user
imports the workbook you created, it will run on their own data and display
results from their tenant, not yours! The same wiki has contribution
guidance for all types of resources.
On top of the official repository and everything you get out of the box, an
amazing group of community members is submitting queries,
blog posts, and other content to help you on your journey. These are some
of the more popular resources as of this writing, but we will keep the
official book GitHub up to date as these will change over time.
• https://github.com/microsoft/AzureMonitorCommunity/tree/master/Azure%20Services—The official repository managed by Microsoft for Azure Monitor, covering queries for operational excellence in Microsoft Azure.
• https://github.com/reprise99/Sentinel-Queries—The repository of
Matthew Zorich, one of the authors of this book.
Community Repos
The following list shows some of the more popular KQL repos available on
GitHub. This list is not exhaustive and not designed to offend anyone who
was left off it. Additional resources are available on the book GitHub.
• https://github.com/alexverboon/Hunting-Queries-Detection-
Rules/tree/main—The repository of Alex Verboon, Microsoft MVP.
• https://github.com/cyb3rmik3/KQL-threat-hunting-queries—The
repository of Michalis Michalos, a security and KQL enthusiast.
• https://github.com/LearningKijo/KQL/tree/main/KQL-XDR-Hunting—
The repository of Kijo Girardi, a Microsoft employee and all-around
awesome person.
Other Resources
Below are some other more general KQL resources that aren’t strictly query
collections or code.
• https://rodtrent.substack.com/p/must-learn-kql-part-1-tools-and-resources
—MustLearnKQL was author Rod Trent's first KQL book, an open-sourced
learning series designed to introduce KQL to people. MustLearnKQL
walked so this book could run.
Summary
In this chapter, you learned about advanced KQL operators that will help
you when threat hunting. We took deep dives into some of the most useful
operators, providing examples you can put to work in your own
environment. We also learned how you can contribute to the KQL
community.