OAuth2 and OpenID Connect:
The Professional Guide - Beta
by Vittorio Bertocci
curated by Andrea Chiarelli
Introduction
This book will help you make sense of OAuth2, OpenID Connect, and the many moving parts that come with them. You will discover how authentication and authorization requirements have changed in recent years, and how today's standard protocols evolved from, and augmented, their ancestors to meet those challenges.
You will learn both the whys and the hows of OAuth2 and OpenID Connect. You will learn what parts of the protocols are appropriate for each of the classic scenarios and app types (sign-in for traditional web apps, single-page apps, calling APIs from desktop, mobile, and web apps, and so on). We will examine every exchange and parameter in detail - putting everything in context and always striving to see the reasons behind every implementation choice.
After reading this book, you will have a clear understanding of the classic problems in authentication
and delegated authorization, the modern tools that open protocols offer to solve those problems,
and a working knowledge of OAuth2 and OpenID Connect. All that will allow you to make informed
design decisions - and even to know your way through troubleshooting and network traces.
Chapter 1 - Introduction to Digital Identity
In this chapter, you will grasp some of the essentials of identity, both in terms of concepts and of the jargon that we like to use in this context. And you'll get a good feeling for the problems - the classic dragons that we want to slay in the identity space.
Without further ado, what is the deal with identity? Why is everyone always saying, "Oh, this is complicated"? Just look at the following picture. It seems trivially simple: there are just two bodies here; in your basic physics course, this would be one of the easy problems.
Figure 1.1
I have a resource of some kind, and I have a user — an entity of some kind that wants to access
that resource in some capacity. It's just two things doing one action. Why is this so complicated?
When something goes wrong in this scenario, it goes catastrophically wrong. And so, like every
mission-critical scenario, of course, it deserves our respect and our attention, and our preparation.
There is a lot of energy that goes into preventing this catastrophic scenario from coming true.
But in this specific domain of development, the thing that makes things complex is the Cartesian product of all the factors that come into play to determine what you have to do to build a viable solution:
Resource types: just think of all the types of resources you can have. Just a few years ago, if you walked into a bank, they'd have a host, they'd have some central database, and that's it. Today, conversely, pretty much everything is accessible programmatically. So you have the API economy, you have serverless - all those buzzwords actually point to different ways of exposing resources - and, of course, websites, apps, and all the things that you use in your daily life. Whenever you interact with a computer system, there is a kind of resource that you have to connect to and, from the point of view of a developer, implementing access to each kind of resource is a different job.
Development stacks: there are minor differences between development stacks that translate
into big differences in the code that you have to write for implementing access to a resource
and the way in which you interact with it. This is one level of complexity.
Identity sources: the other level of complexity is the sheer magnitude of the sources of identity.
Think of all the ways in which your own identity gets expressed online. You can be a member
of a social network, an employee of one company, a citizen of a country. And all of those
identities somewhat get expressed in a database somewhere, and that somewhere determines how you connect to it.
You connect to Facebook in a certain way. You connect to Active Directory in a different way.
You get recognized when you're paying your taxes to your country in yet another way. So,
again, we encounter another factor of complexity: if you want to extract identity from these
repositories, you have to find a way of doing it according to each repository’s requirements
and characteristics.
Client types: finally, there are many more complexity factors, but I just want to mention one more: the incredible richness with which we can consume information today. Think of all the possible clients that you can use, from your mobile phone and its applications to websites to your watch. You can literally use anything you want to access the data. And again, this compounds, in terms of complexity, with the kinds of resources that you want to access and the places from where you are extracting information. So, this picture might look simple, but the reality behind it is anything but.
Now, what can Auth0 do for you to make this a bit more manageable? We offer many different
things but, in particular, the most salient component of our offering is our service. It is a service
that you can use for outsourcing most of the authentication functions that you need to have in
your solutions - so that you don't have to be exposed to that complexity. In particular, we offer:
ways of abstracting away the details of how you connect to multiple sources of identities.
Every identity provider will have a different style of doing identity transactions, and we take care of those differences for you.
a way of dealing with the user-management lifecycle. We have user representations and facilities for managing them.
a very large number of SDKs and samples, which help you to cross the last mile so that when
you're using a particular development stack, you can actually use components to connect
to Auth0 in a way that is aligned with the idiom that you're using in that context.
deep customization capabilities: there is no other service at this point that offers the same freedom you have with Auth0 to customize your experience.
Now, when you need to connect your application to Auth0, you need to do something to tell
us, "Auth0, please do authentication". And that something in Auth0 is implemented using open
standards.
Open standards are agreements - wide-consensus agreements - crafted by people from across the industry. We came up with open standards when we came to the realization that everyone - users, customers, and vendors - would have been better off if we enshrined in common standards the common messages, the common protocols, some of the transactions that we knew needed to occur when doing authentication and similar tasks. What happened back then is that we went to semi-expensive hotels
around the world, met with our peers across the industry, and argued about how applications should
present themselves when offering services in the context of an identity transaction. We discussed
similar considerations for identity providers. What kind of messages should be exchanged? We
literally argued message details down to the semicolon. That's how fun standards authoring is,
but it's all worth it: now that we have open standards and all vendors implemented the open
standards, you, as the customer, can choose which vendor you want to use without worries about
being locked into a particular technology or vendor. Above all, you can plan to introduce different vendors over time.
Of course, this is mostly theory: a bit like those simplified school problems that disregard friction or the moon's gravity influencing the tides. In reality, there are always little details that you need to iron out. But largely, if you have worked in our industry for the last couple of decades, you know that
we are so much better off now that we have those open standards we can rely on.
In identity management, you're going to come in contact with many protocols, many of them probably only in passing.
The ones that are a daily occurrence nowadays are:
OAuth2, which is the basis of OpenID Connect; it is a delegation protocol designed to help users grant applications access to resources on their behalf.
JSON Web Token, or JWT, which is a standard token format. Most of the tokens you'll be working with in modern identity protocols are JWTs.
SAML, which is a somewhat legacy (but still very much alive) protocol that is used for doing
single sign-on across domains for browsers. SAML also defines a standard token format,
which has been very popular in the past and is still very much in use today.
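To make the JWT format a bit more tangible, here is a minimal sketch - using nothing but the Python standard library - of how the three dot-separated segments of a JWT are put together and pulled apart. The token and its claims are invented for illustration:

```python
import base64
import json

def b64url(data: bytes) -> str:
    # JWTs use base64url encoding with the trailing padding stripped.
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

# Build a toy, *unsigned* token just to show the header.payload.signature shape.
header = {"alg": "RS256", "typ": "JWT"}
payload = {"iss": "https://issuer.example.com", "sub": "user123", "name": "Jane"}
token = ".".join([
    b64url(json.dumps(header).encode()),
    b64url(json.dumps(payload).encode()),
    "fake-signature",  # a real token carries the issuer's signature here
])

h, p, _ = token.split(".")
print(json.loads(b64url_decode(h)))  # the algorithm and token type
print(json.loads(b64url_decode(p)))  # the claims; never trust them before verifying the signature
```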
Let’s spend the next few minutes going through a time-lapse-accelerated-whirlwind tour of how
authentication technologies evolved. My hope is that by going back to basics and revisiting this
somewhat simplified timeline, I'll have the opportunity to show you why things are the way they are
today. In doing so, I’ll also have the opportunity to introduce the right terms at the right time. By
being exposed to new terminology at the correct time, that is to say, when a given term first arose,
you will understand what the corresponding concepts mean in the most general terms. Contrast that with the narrower interpretations of a term's meaning you'd end up with if you were exposed to it only in the context of solving a specific problem. You might end up thinking that the problem you are solving at the moment is the only thing the concept is good for, missing the big picture and potentially stumbling into all sorts of future misunderstandings. We won't let that happen!
Let's go back to the absolute basics and think about the scenario that I described earlier in Figure
1.1 - the scenario in which I have one resource of some kind, let's say, a web application and a user,
and we want to connect the two. Now, what is identity in this context?
We won't get bogged down with philosophy and similar. Identity here can be defined in a very
operational, very precise fashion. We call digital identity the set of attributes that define a particular
user in the context of a function which is delivered by a particular application. What does it mean?
That means that if I am a bookseller, the relevant information I need about a user is largely their
credit card number, their shipping address, and the last ten books that the user bought. That's their
digital identity in that context. If I am the tax department, then the digital identity of a user is again,
a physical address, an identifier (here in the USA is the Social Security number), and any other
information which is relevant to the motion of extracting money from the citizen. If I am a service
that does DNA sequencing, the identity of my user is the username that they use for signing in,
their email address for notifications, and potentially their entire genome.
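To make the definition concrete, here is a quick sketch of how the same physical person might exist as three unrelated digital identities; every name and value here is invented:

```python
# The same physical person, as three completely different digital identities.
bookstore_identity = {
    "user_id": "bk-48151",
    "credit_card_ref": "tok_visa_4242",   # reference to a stored card
    "shipping_address": "123 Main St, Seattle, WA",
    "recent_purchases": ["Book A", "Book B", "Book C"],
}
tax_department_identity = {
    "legal_name": "Jane Q. Public",
    "ssn": "***-**-6789",
    "residential_address": "123 Main St, Seattle, WA",
}
dna_service_identity = {
    "username": "jqp",
    "notification_email": "jane@example.com",
    "genome_ref": "store://sequences/jqp.vcf",  # potentially the entire genome
}
```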
You can see how for all the various functionalities that we want to achieve, we actually have a
completely or nearly completely different set of identities. These might correspond to the same
physical person or not. It doesn't matter. From the point of view of designing our systems, that's
what the digital identity is. So, you could say that the digital identity of this user is this set of
attributes we can place in the application’s store. Now the problem of identity becomes: when
do I bring those particular attributes into context? The oldest trick in the world is to have the resource and the user agree on something, such as a shared secret of some sort. So, when the user comes back to the site and presents that secret - demonstrates knowledge of that secret - the website will say: okay, I know who you are, you're the same user I saw yesterday. Here is your set of attributes, welcome back. I authenticated the user. In summary, that means grabbing a set of credentials, sending them over, and comparing them with what was saved previously.
This scenario is summarized in the following picture:
Figure 1.2
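To give a feel for what the website in Figure 1.2 actually does with that shared secret, here is a rough sketch of a salted password check using only the Python standard library; real systems add many refinements, but the shape is this:

```python
import hashlib
import hmac
import os

def derive(password: str, salt: bytes) -> bytes:
    # A standard key-derivation function; production systems tune the iteration count.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# At sign-up, the site stores the salt and the derived hash - never the password itself.
salt = os.urandom(16)
stored_hash = derive("correct horse battery staple", salt)

def authenticate(presented_password: str) -> bool:
    # When the user returns, re-derive and compare in constant time.
    return hmac.compare_digest(stored_hash, derive(presented_password, salt))

print(authenticate("correct horse battery staple"))  # True: load the user's attributes
print(authenticate("wrong guess"))                   # False
```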
Now, you hear a lot of bad things about username and password... and they are all true. That's
unfortunate, but it's true. However, it is an extraordinarily simple schema, and as such, it is very,
very, very resilient. Even if we have more advanced technologies, which do more or less the
same job, passwords are still very popular. I predict that this year, like every year, someone will
say that this is the year in which passwords will die. But I think that passwords will still be around
for some time. My favorite metaphor for this is what happens in the natural world. Humans are
allegedly the pinnacle of evolution. However, there are still plenty of jellyfish in the sea. They are
so simple, and sure, we are more advanced, but I am ready to bet that there are more individual
jellyfish than there are humans. The fact that their body plan is simple doesn't mean that it is
not successful. You'll see, as we go through this history, that passwords are somewhat the building blocks that more advanced protocols layer on top of. Again, I'm not discounting the efforts to eliminate passwords and replace them with something better; I'm just trying to set expectations that passwords will be with us for a while yet.
... to Directories
Let's make things a bit more interesting. Imagine the scenario in which we have one user and
one application. Now, extend this scenario to the situation in which this user is an employee of
some company. There is a collection of applications being used by this particular user in the
context of the company’s business. Most applications are all part of what the user does in the
context of his or her employment. Imagine that one application is for expense notes, the other is
for accounting, the other is for warehouse management. Anything you can think of. A few years
ago, what happened was that we had a bunch of apps on a computer. Then, we had someone
showing up with a coaxial cable, installing token ring networks, and placing all these computers
in the network. But that alone didn't make the environment, and in particular the applications,
automatically network-ready. What happened is that you'd have exactly this situation: a user accessing different independent apps which knew nothing about each other, and which replicated all the functionality that could have been easily centralized.
In particular, every user had different usernames and passwords - or I should say different
usernames, because, of course, people reuse their passwords. Every time users went to a new
app, they had to enter their credentials. And whenever a user had to leave the company, willingly or not, the administrator had to go on a pilgrimage across all these various apps, chase the user's entries in there, and deprovision them by hand - which of course is a tedious and error-prone flow. It's difficult. You often hear horror stories of disgruntled employees using procurement systems for buying large amounts of items just to get back at their former bosses - and being able to do so only because no one remembered to deprovision their accounts.
What happened is that the industry responded by introducing a new entity, which we call the directory. The directory is still extremely popular. It is a software component, a service, whose role is illustrated in the following figure.
Figure 1.3
Basically, the directory centralized credentials and attributes and made it redundant for applications
to implement their own identity management logic. At this point, users would simply sign in with
their own central directory, and from that moment onward, they'd have Single Sign-On access
to all the other applications. The application developers didn't actually have to code anything
for identity to achieve that result. In fact, now that the network infrastructure itself provided the identity information, administrators could take advantage of this centralized place to deal with the user lifecycle. It can be said that the introduction of the directory is what truly created an ecosystem of tooling that helps people to run operations, identities, and similar. So,
a fantastic improvement - which was predicated on the perimeter. In order for all this to work as
intended, you had to have all the actors within that perimeter. The perimeter was often the office building itself, with users actually walking into the building, sitting in front of a particular physical device, and having direct "line of sight" with this cathedral in the center of the enterprise: the directory.
Cross-Domain SSO
Of course, we know from current business practices that this approach doesn't scale. It works well when you are within one company, but there are so many business processes that require crossing company boundaries. Think of a classic supplier or reseller relationship. Any of those relationships requires spanning multiple organizations. And so what happens is that when you have a user in one organization that needs to access a resource in a different organization, you have a problem: this user is a complete stranger to the directory on the resource's side.
The first way in which the industry tried to give a solution to this problem was to introduce what
we call shadow accounts, which means provisioning the user to the resource side directory.
This is completely unsustainable, as it presents the same problems that we mentioned earlier at
a different scale when every application handled identity explicitly. Let's say that we have a user
whose lifecycle is managed in one place, their own home directory, but that has been provisioned
an entry in the resource side directory as well. When the user is deprovisioned from their home
directory, then there might be a trail of user accounts provisioned in other directories (such as
the resource side directory in our scenario) that are still around and that need to be manually
deprovisioned. That's, of course, a big problem because the deprovisioning isn't likely to happen in a timely fashion and, like any change in general, is harder to reflect in distributed systems that are not centrally managed. Plus, imagine the complexity of having this company, which may be a reseller for many
other companies, but needs to duplicate somewhat the work that its customer companies are
already doing in their own directories for managing their own users. It's just not sustainable.
So, what happened was that, just like it's classic in computer science, we solved this problem by
adding a level of abstraction. We took the capabilities that we have seen for the local directory
case, and we just abstracted it away. We provided the same transactions, but we described
them in a way that is not dependent on network infrastructure. For example, Active Directory, and directories in general, rely on an authentication protocol called Kerberos, which is very much integrated with the network layer and hence has specific network hardware requirements. Whereas, of
course, in this case of scenarios spanning multiple companies, we have to cross the chasm of the
public Internet and cannot afford to impose any requirements as requests will traverse unknown
network hardware.
What happened is that the big guys of that time, Sun, IBM and similar, sat at one table and came
up with this protocol called SAML, which stands for Security Assertion Markup Language. In a
nutshell, the protocol described a transaction in which a user can sign in in one place and then
show proof of that sign-in in another place and gain access. Here's how it works. We need something that fronts my actual resource: some software which is capable of talking that protocol, which in this particular case is going to be what we call a middleware - a component that stands between your application and the caller, intercepting traffic and executing logic before the requests reach the actual application. Similar protocol capabilities would be exposed on the
identity provider side. In the topology shown in figure 1.3, we have the machine already fulfilling
the local directory duties (what we call the domain controller in the directory jargon), and we just
teach that machine to speak a different language, SAML, which can be considered somewhat of
a trading language that we can use for communication outside the company’s perimeter.
In order to close this transaction, what happens is that we need to introduce another concept:
trust. Think of the scenario we were describing earlier, the one within one single directory: in it,
every application and every user implicitly believes and trusts the domain controller. The network
software in itself, whenever you need to authenticate, will send you back to the domain controller
and the domain controller will do its authentication. It is just implicit, it's as natural as the air that
you're breathing because there is only one place that can perform authentication duties in the
entire network.
Figure 1.4
The application within the Company 2 perimeter can be accessed by any of its business partners:
there is now a choice about where we want to get users' identities from; there is no longer an obvious default source of users. We say that a resource trusts an identity provider, or an authority, when that resource is willing to believe what the authority says about its users. If the authority says, "this user is one of my users and successfully authenticated five minutes ago", then the resource will take that statement at face value.
When you set up your middleware in front of your application, you typically configure it with the coordinates of the identity providers that you trust. How does that come into play when you actually make a transaction? Let's see how this works in an actual flow by describing the following diagram in detail.
Figure 1.5
In the first leg of the diagram, the user points the browser to the application and attempts to GET a page (1). The middleware in front of the application intercepts the request, sees that the user is not authenticated, and turns the request into an authentication request to the identity provider. In concrete terms, the middleware will craft some kind of message, probably a URL with specific query string parameters, and will redirect the browser to one particular endpoint associated with the identity provider.
In this particular scenario, the target endpoint belongs to a local identity provider. You can see that
the call to the IdP authentication endpoint is occurring within the boundaries of the enterprise.
That means that that call will be authenticated using Kerberos, like any other call on the local
network. You can already see this layering of protocols, one on top of the other. Thanks to the
use of Kerberos and the fact that the user is already authenticated with the local directory, the
user will not have to enter any credentials during this call.
Next, the identity provider establishes that the user is already correctly authenticated, and
establishes that the resource is one of the resources that have been recorded and approved.
Because of those positive checks, the IdP issues to the user what we call a security token (4).
A security token is an artifact, a bunch of bits, which is used to carry a tangible proof that the
user successfully authenticated. Security tokens are digitally signed. What does that mean? A digital signature is something that protects bits from tampering. Let's say that someone modifies any of those bits in transit: when the intended recipient tries to check the signature, it will find that the signature does not compute. The recipient will know for sure that those bits have been tampered with.
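Here is a deliberately simplified sketch of that tamper-evidence idea. For brevity it uses a symmetric HMAC, where signer and verifier share one key; real security tokens use public-key signatures (such as RS256), so that only the issuer can sign while anyone can verify:

```python
import hashlib
import hmac

issuer_key = b"secret-known-only-to-the-issuer"  # invented for illustration

def sign(message: bytes) -> bytes:
    return hmac.new(issuer_key, message, hashlib.sha256).digest()

token_body = b'{"sub":"user123","authenticated":"5 minutes ago"}'
signature = sign(token_body)

# The recipient recomputes the signature and compares.
print(hmac.compare_digest(signature, sign(token_body)))  # True: untouched
tampered = token_body.replace(b"user123", b"admin")
print(hmac.compare_digest(signature, sign(tampered)))    # False: the bits changed in transit
```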
This property is useful for two reasons. One reason is that, given that we use public-key cryptography, we expect that the private key that was used to perform the signature is only accessible to the intended origin of this token. No one else in the universe can produce that signature but that particular party. Remember what we just said about trust: that property can be used as proof that a token is coming from a specific entity and, in particular, whether it is a trusted one.
The second reason is that given that the token content cannot be modified in transit without
breaking the signature, I can use tokens as a mechanism to provide the digital identity of a user
on the fly. Instead of having to negotiate in advance the acquisition of the attributes that define
the user (the user identity, according to our definition), as an application, I can just receive those
attributes just in time, together with the token. This might be the first and the last time that this particular user accesses this application but, thanks to the trust between the application and the token's issuer, those attributes can be relied upon all the same.
The attributes that travel inside tokens are called claims. A claim is simply an attribute packaged
in a context that allows the recipient to decide whether to believe that the user does indeed
possess that attribute. Think about what happens when boarding a plane. If I present my passport
to the gate agents, they will be able to compare my name (as asserted by the passport) with the
name printed on my boarding pass and decide to let me go through. The gate agents will reach
that conclusion because they trust the government, the entity that issued my passport. If I'd pull out a Post-it with my name jotted down in my scrawny chicken-scratch handwriting and present it to the gate agents in lieu of the passport, I'm probably not going to board the plane - in fact, I'm likely going to be in trouble. The medium truly is the message in this case. The token really does carry this potential for deciding whether or not you trust that particular information.
Once the identity provider issues a SAML token, it typically returns it to the browser inside an
HTML form, together with some JavaScript that triggers as soon as the page is loaded - POSTing
the token to the application, where it will be intercepted by the middleware (5).
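That auto-posting page might look roughly like the following sketch (expressed here as a Python string for consistency with the other examples; the application URL and the token value are invented, while SAMLResponse is the parameter name the SAML POST binding actually uses):

```python
# A rough sketch of the auto-posting page an IdP returns to the browser.
auto_post_page = """
<html>
  <body onload="document.forms[0].submit()">
    <form method="post" action="https://app.example.com/saml/consume">
      <input type="hidden" name="SAMLResponse" value="...base64-encoded signed token..." />
    </form>
  </body>
</html>
"""
```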
The middleware looks at the token, establishes whether it's coming from a trusted source,
establishes whether the signature hasn't been broken, etc. etc. and if it's happy with all that, it
emits what we call a session cookie (6). The session cookie represents the fact that successful
authentication occurred. By setting a cookie to represent the session, the application will be
spared from having to do the token dance again for every subsequent request. The session cookie is simply used to enable the application to consider the user authenticated every time a new request comes in.
This is how SAML solved the particular problem of cross-domain single sign-on. We’ll see that
this pattern of exchanging a token for a cookie will also occur with OpenID Connect.
All this happened in the business world, but the consumer world also didn't stay still from the identity perspective. One thing that happened was that, as we got more and more of our lives online, we found ourselves more and more often with the need to let one service access resources that we keep at another service.
Let me give you a very concrete example. I guess that many of you have LinkedIn, and many of you also have Gmail. Imagine the following scenario. Say that a user is currently already signed in to LinkedIn, in whatever way they want. The mechanics of how they got signed in to LinkedIn are not the point in this scenario. Say that LinkedIn wants to suggest that you invite all of your Gmail contacts to join your network.
We are using LinkedIn and Gmail only because they are familiar names with familiar use cases, but we are in no way implying that they are really implemented in this way, nor that they ever engaged in the practices described here.
Now, how did LinkedIn use to do this? I'm using LinkedIn as an example here, but this was basically the behavior of any similar service you can think of before the rise of delegated authorization. Let's take a look at this flow by following the steps in the following figure.
Figure 1.6
LinkedIn would actually ask you for your Gmail username and password, which are normally stored and validated by Gmail (1). You would provide LinkedIn with your Gmail credentials (2), and then LinkedIn would use them to actually access the Gmail APIs used by the Gmail app itself for programmatic access to its own service (3). This would achieve what LinkedIn wants, which is to call the APIs in Gmail for listing your contacts (4) and sending emails on your behalf.
What is the problem with this scenario? Many problems, but two, in particular, are impossible to
ignore.
The first problem is that granting access to your credentials on any entity that is not the custodian
of those credentials is always a bad idea. That is mostly because those different entities will not
have as much skin in the game as the entity that is actually the original place for those credentials.
If LinkedIn does not apply due diligence and save those credentials in an insecure place... sure,
they'd get bad PR, but it will not be the catastrophe that it would be for Gmail, for which the user
access is now impacted. For example, Gmail users will need to change passwords, creating a
situation where they are highly likely to defect or at least to experience lower satisfaction with
the service.
Here’s the second bad thing. Although the intent that LinkedIn had with this transaction was good
(it is mutually beneficial both for me as a user and for LinkedIn as a service for me to expand my
network), the way in which they have implemented the function gives them way too much power.
LinkedIn can actually use this username and password to do whatever they want with my Gmail.
They can read my emails, they can delete emails selectively, they can send other emails, they can
do everything they want beyond the scenario originally intended - and that's clearly not good.
In response to the challenges outlined at the end of the preceding section, the industry came up
with a way of working around the problem of giving too much power to applications.
OAuth2 was designed precisely to implement the delegated access scenario described earlier,
but without the bad properties we identified as part of the brute force approach. The defining
feature of the OAuth2 approach lies in the introduction of a new entity, the authorization server,
which explicitly handles operations related to delegated authorization. I won't go too much into
the details right now, because I'm going to bore you to death about it later on in this book.
Suffice to say here that the authorization server has two endpoints:
The authorization endpoint, designed to deal with the interaction with the end-user.
It's designed to allow the user to express whether they want a certain service to access
their resources in a certain fashion. The authorization endpoint handles the interactive portion of the transaction.
1 The first incarnation of OAuth was OAuth1, a protocol that resolved the delegated access scenario but had several limitations and complications. The industry quickly came up with an evolution, named OAuth2, which solved those problems and completely supplanted OAuth1 for all intents and purposes. For that reason, in this text we only discuss OAuth2.
The token endpoint, which is designed to deal with software-to-software communication and takes care of actually executing on the intent that the user expressed in terms of permissions.
Please note: in the following discussion, we are assuming that the user is already signed in to LinkedIn even before the described scenario plays out. We don't care how the sign-in occurred in this context; we just assume it did. OAuth2, as you will hear over and over again, is not a sign-in protocol.
Let's say that, as part of his or her LinkedIn session, the user gets to a point at which LinkedIn wants to gain access to the Gmail API on his or her behalf, as described in the last section for the analogous scenario.
In the OAuth2 approach, that means that LinkedIn will cause the user to go to Gmail and grant permission to LinkedIn to see their contacts and send mail on their behalf. Let's follow this new flow step by step in the following figure.
Figure 1.7
LinkedIn follows the OAuth2 specification to craft an authorization request and redirect the user's browser to Gmail's authorization server - in particular, to the authorization endpoint (1).
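In concrete terms, crafting that authorization request mostly means building a URL. Here's a minimal sketch; the endpoint address, client id, redirect URI, and scope names are all invented for our model scenario, while the parameter names are the standard OAuth2 ones:

```python
import secrets
from urllib.parse import urlencode

params = {
    "response_type": "code",             # ask for an authorization code
    "client_id": "linkedin-client-id",   # assigned to LinkedIn at registration (invented)
    "redirect_uri": "https://www.linkedin.example/oauth/callback",
    "scope": "contacts.read mail.send",  # the permissions being requested (invented names)
    "state": secrets.token_urlsafe(16),  # anti-forgery value, echoed back with the code
}
authorize_url = "https://auth.gmail.example/authorize?" + urlencode(params)
# LinkedIn now redirects the user's browser to authorize_url.
```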
The authorization endpoint is used by Gmail to prompt the user (2) for credentials if they are not currently authenticated with the Gmail web application. This is all within the natural order of things. In fact, it's Gmail asking a Gmail user for Gmail credentials. So, no foul play here; everything is fine. As soon as the user is authenticated, the Gmail authorization server will prompt the end user, saying something along the lines of, "Hey, I have this known client, LinkedIn, that needs to access my own APIs using your privileges. In particular, they want to see your contacts, and they want to send emails on your behalf. Are you okay with it?"
Once the user says okay, presumably, the authorization server emits an authorization code (3).
An authorization code is just an opaque string that constitutes a reminder for the authorization
server of the fact that the user did grant consent for those permissions for that particular client.
The authorization code is returned to LinkedIn via the browser (4). From now on, the rest of the transaction will occur on the back channel, without the browser's involvement.
Please note: before any of the described transactions could occur, LinkedIn had to go to the authorization server and register itself as a known client. As part of the client registration operation, LinkedIn received an identifier (called client id) and, most importantly, a client secret. The client id and client secret will be used for proving LinkedIn's identity as an application in requests sent to Gmail's authorization server, in particular to its token endpoint. The remainder of the diagram assumes that this registration has already taken place.
Now that it has obtained an authorization code, LinkedIn will reach out to the token endpoint of the authorization server (5) and will present its own credentials (client id and client secret) along with the authorization code, substantially saying, "Hey, this user consented to this, and I'm LinkedIn - please issue me a token."
As an outcome of this, the authorization server will emit a new kind of token, which we call an access token (6). The access token is an artifact that is used to grant LinkedIn the ability to access the Gmail APIs (7) on the user's behalf, only within the scope of the permissions that the user consented to.
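Sketched in code, steps 5 through 7 might look like the following; the URLs and credential values are invented, while the field names in the token request are the standard OAuth2 ones. This is the back channel, so the client secret never transits through the browser:

```python
import requests

# Step 5: redeem the authorization code at the token endpoint, authenticating as the client.
token_response = requests.post(
    "https://auth.gmail.example/token",
    data={
        "grant_type": "authorization_code",
        "code": "the-code-returned-in-step-4",
        "redirect_uri": "https://www.linkedin.example/oauth/callback",
        "client_id": "linkedin-client-id",
        "client_secret": "linkedin-client-secret",
    },
)
access_token = token_response.json()["access_token"]  # step 6

# Step 7: call the API, presenting the access token as a bearer credential.
contacts = requests.get(
    "https://api.gmail.example/contacts",
    headers={"Authorization": f"Bearer {access_token}"},
)
```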
This solves the excessive permissions problem described in The Password Sharing Anti-Pattern
section. In fact, as long as LinkedIn accesses the Gmail APIs only attempting operations the user
consented to, the requests to the API will succeed. As soon as LinkedIn tries to do something
different from the consented operations, like, for example, deleting emails, the endpoint will deny
LinkedIn access, because the access token accompanying the API call is scoped down to the
permissions the user consented to (in our example, read contacts and send emails). Scope is
the keyword that we use here to represent the permissions a client requested on behalf of the
user. This mechanism effectively solved the problem of excessive permissions, providing a way to constrain what a client can do to exactly what the user consented to.
What we described so far is the canonical OAuth2 use case, the one for which the protocol was originally designed. In practice, however, OAuth2 is used all over the place, and it incurs all sorts of abuses - that is, it gets used in ways OAuth2 wasn't designed for. Be on the lookout for those problematic scenarios: every time you hear that some solution uses OAuth2, please think of the canonical use case as described here first. OAuth2 supports many other scenarios, and in this book, we will discuss most of them. However, the core intent is as expressed in the use case we described in this section. Thinking about whether a solution is using OAuth2 in line with the intent expressed here, or diverges from it significantly, is a useful mental tool to verify whether you are dealing with a canonical scenario or if you need to brace for non-standard approaches.
Let me give you a demonstration of one particularly common type of OAuth2 abuse. As OAuth2 and delegated authorization scenarios started gaining traction, many application developers decided that they wanted to do more than just call APIs. They wanted to achieve in the consumer space what we had achieved with SAML: they wanted to allow users to sign in to their apps reusing accounts living in a completely different system. Instantiating this new requirement in the scenario we've been discussing, LinkedIn might like users with a Gmail account to be able to use it to sign in to LinkedIn directly, without the need to create a LinkedIn account. In other words, LinkedIn would just want users to be able to sign up for LinkedIn reusing their Gmail accounts.
This is a sound proposition because, in many cases, people typically aren't crazy about creating new accounts, new passwords, and similar. So, making it possible to reuse accounts is not a bad idea in itself.
However, OAuth2 was not designed to implement sign-in operations. Most providers only exposed OAuth2 as a way of supporting delegated authorization for their API and did not expose any proper sign-in mechanism, as it wasn't the scenario they were after. That didn't deter application developers, who simply piggybacked on OAuth2 flows to achieve some kind of poor man's sign-in. Imagine the delegated authorization scenario described for the canonical OAuth2 flow, and imagine it taking place with the user not having previously signed in to LinkedIn. The following figure illustrates the idea.
Figure 1.8
LinkedIn can perform the dance to gain access to Gmail APIs without having any authenticated user signed in yet (1). As soon as LinkedIn successfully accesses Gmail APIs (2), it might reason, "Okay, this proves that the person interacting with my app has a legitimate account in Gmail", so LinkedIn might be satisfied by that and consider this user authenticated - which in practice could be implemented by creating and saving a session cookie (3), as we did during sign-in in the SAML scenario.
This would be a good time to remind you that we are using LinkedIn and Gmail only because they are familiar names with familiar use cases; we are in no way implying that they are really implemented in this way.
This pattern for implementing sign-in is still a common practice today. A lot of people do this.
It's usually not a good idea, mainly because access tokens are opaque to the clients requesting
them, which makes many important details impossible to verify. For example, the fact that an
access token can be used for successfully calling an API doesn't really say anything about
whether that access token was issued for your client or for some other application. Someone
could have legitimately obtained that access token via another application (in our scenario
not as LinkedIn, but as some other app) and then somehow managed to inject the token in
the request. If LinkedIn just uses that token for calling the API and reasons, "Okay, as long as I can use this token to call the API without getting an error, I'll consider the current user authenticated", it is drawing a conclusion that the token alone cannot support.
Another consequence of the fact that access tokens are opaque to clients is that an attacker
could get a token from a user and somehow inject it in the sign-up operation for a completely
different user. Once again, LinkedIn wouldn't know better because unless the API being called
returns information that can be used to identify the calling user, the sheer fact that the API
call succeeds will not provide any information the client can use to determine that an identity
swap occurred.
The attacks that I'm describing are called confused deputy attacks, and they are a classic hazard of delegation scenarios.
Even more aggravating: with this approach, there is no way to standardize an OAuth2-based sign-in flow. In our model scenario, the last mile is a successful call to Gmail APIs. If I wanted to apply the same pattern with Facebook, the last mile would be a successful call to the Facebook Graph API, which is dramatically different from the Gmail API. That makes it impossible to enshrine this pattern in a single SDK that can be used to implement sign-in with every provider.
This is where the main players in the industry once again came together and decided to introduce a new specification, called OpenID Connect, which formalizes how to layer sign-in on top of OAuth2. I'll go into painstakingly fine detail about that effort in the rest of the book but, in a nutshell, the central point of the approach is the introduction of a new artifact, which we call the ID token. The ID token can be issued by an authorization server via all the flows OAuth2 defines. OpenID Connect describes how applications can, instead of asking for an access token (or alongside access token requests), ask for an ID token. The following figure shows the new flow.
Figure 1.9
An ID token is a token meant to be consumed by the client itself, as opposed to being used by the client for accessing a resource. The defining characteristic of the ID token is that it has a fixed format that clients can parse and validate. The use of a known format, and the fact that the token is issued for the client itself, means that when a client requests and obtains an ID token, the client can inspect and validate it - just like web apps secured via SAML inspected and validated SAML tokens. It also means the ability to extract identity information from it - once again, just like we learned is common practice with SAML. Those properties are what makes it possible to achieve proper sign-in using OAuth2. The novelties introduced by OpenID Connect didn't stop there: the new specification introduced new ways of requesting tokens, including one in which the ID token can be delivered to the client directly via the front channel, between the browser and the application. That makes it possible to implement sign-in very easily, just like we learned in the SAML case, without having to use secrets and a back-channel integration.
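To give a feel for what "parse and validate" means in practice, here is a sketch of the checks a client performs on the claims of an ID token once its signature has been verified. The claim names (iss, sub, aud, exp, nonce) come from the OpenID Connect specification; the values are invented:

```python
import time

# Claims decoded from a signature-verified ID token - all values invented.
claims = {
    "iss": "https://auth.gmail.example",  # who issued the token
    "sub": "user123",                     # a stable identifier for the user
    "aud": "linkedin-client-id",          # which client the token was issued *for*
    "exp": int(time.time()) + 3600,       # when the token expires
    "nonce": "n-0S6_WzA2Mj",              # ties the token to the request we made
}

def validate(claims, expected_issuer, my_client_id, expected_nonce):
    assert claims["iss"] == expected_issuer, "token from an untrusted issuer"
    assert claims["aud"] == my_client_id, "token issued for a different client"
    assert claims["exp"] > time.time(), "token expired"
    assert claims["nonce"] == expected_nonce, "possible replay or token injection"
    return claims["sub"]  # now it is safe to treat this as the signed-in user

user = validate(claims, "https://auth.gmail.example",
                "linkedin-client-id", "n-0S6_WzA2Mj")
```

Note how the aud check alone rules out the confused deputy problem we just discussed: a token issued for a different client simply fails validation.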
What we have seen in this chapter can be thought of as a rough timeline of the sequence of events that culminated in the creation of OpenID Connect. In the next chapters, we will expand on the high-level flows described here, going deep into the details of the protocols.
Auth0: an Intermediary Keeping Complexity at Bay
What's the role of Auth0 in all this? You can think of Auth0 as an intermediary that has all the capabilities, in terms of protocols, to talk to pretty much any application supporting one of the standards we discussed, such as OAuth2, OpenID Connect, SAML, or WS-Federation.
Figure 1.10
You can simply integrate your application with Auth0 which, in a nutshell, is a super authorization
server, using any of the standard protocol flows we described in this chapter. From that moment
on, Auth0 can take over the authentication function: when it’s time to authenticate, your app
can redirect users to Auth0 and, in turn, Auth0 will talk to the different identity providers you
want to integrate with, in each case using whatever protocol each identity provider requires. If
the identity providers of choice are using one of the open protocols I mentioned, the integration
Auth0 needs to perform is very easy. But if they are using any proprietary approach, for the
32
application developer, it doesn't matter. Once the app redirects to Auth0, Auth0 takes care of
the integration details. For you, it's just a matter of flipping a switch saying, “I want to talk with
this particularidentity provider” - the result, mediated by Auth0, will always come in the format
determined by the open protocol you chose to use for integrating with Auth0. In concrete, that's
what we meant earlier when we stated that Auth0 abstracts away the problem from you.
In addition, Auth0 offers a way of managing the lifecycle of a user. Auth0 maintains its own user
store; it integrates with external user stores and exposes various operations you can perform
for managing users. For example, you can have multiple accounts sourced from multiple identity
providers that accrue to the same account in Auth0 and your app. You can normalize the set of claims that you receive from different identity providers, so that your application doesn't have to deal with each provider's idiosyncrasies.
We also provide ways of injecting your own code at authentication time, so that if you want to
execute custom logic, for example, subscription, or billing, or any functionality which just makes
sense in your scenario to occur at the same time of authentication, you can easily achieve that.
You have full control over the experience your users will go through, as Auth0 allows you to
customize every aspect of the authentication UX. Auth0 makes it very easy for you to use this set of features, mostly by providing you with a dashboard that has a very simple point-and-click interface. You can also use Auth0's management APIs to achieve programmatic access to the same capabilities.
That's it for Identity 101. It was a pretty quick whirlwind tour of the last 15 to 20 years of evolution
in the world of digital identity. In the next chapters, we'll spend a bit more time sweating the details.
Chapter 2 - OAuth2 and OpenID Connect
Let's dig a bit deeper, and specifically turn our attention to OAuth and OpenID Connect (OIDC)
as protocols.
Have you ever read any of the specifications of those protocols? I am an old hand at this: I
was working in this space when there was still CORBA, WS-Trust, and various other old man's
protocols. In the past, identity protocols tended to be extraordinarily complicated: they were XML-based and exhibited high-assurance features that made them hard to understand and implement. For example, the cryptography they used supported what was called message-based
security - granting the ability to achieve secure communications even on plain HTTP. It was an
interesting property, but it came at the cost of really intricate message formatting rules that made
implementation costs prohibitive for everyone but the biggest industry players.
Now, the new crop of protocols, OAuth, OpenID Connect, and similar, are based on simple HTTP
and JSON - a reasonably simple format - and they heavily rely on the fact that everything occurs
on secure channels. This simple assumption enormously simplifies things: together with other
simplifications and cuts, this makes the new protocols more approachable and at least readable.
However, we are not exactly talking about Harry Potter. Ploughing through eighty-six pages of
intensely technical language, such as the ones constituting the OpenID Connect Core specification,
is a pretty big endeavor, even for committed professionals. If you work in the identity space,
you'll find yourself referring to the specifications in detail, over and over again, with a lawyer-like
focus, on each and every single word - those documents are dense with meaning. You can also
see that the specifications have a pretty high cyclomatic complexity. That's to say: there are
multiple links that provide context and, usually, there is not a lot of redundancy. If there is a link
pointing to another specification defining a concept used in the current document, you've got to
follow the link and actually learn about that concept before you can make any further progress.
There's really a very large number of such specifications, even if you limit the scope to just one or two hops from the core OpenID Connect and OAuth2 specs. All the specifications that you see in the constellation of OAuth, OpenID, JWT, JWS, and similar are the core, describing the most fundamental aspects that come into play when handling the main scenarios those specifications are meant to address. There is an entire ring of best practices and new capabilities not shown in the figure. The complete picture is, in fact, much larger.
Figure 2.1
The main reason I am showing you this is to dispel the notion, which a lot of people really like to believe, that adding identity capabilities to an application is just a matter of reading the spec: if you want to do modern identity, just read the OAuth2 and OpenID Connect specifications, and you'll be fine. Of course, the reality is quite different. If that were true, there would not be a lot of work left for identity professionals.
In fact, reading all these things is our job, as identity professionals - as the ones who build identity
services, SDKs, quick starts, samples, and guides that developers can use for getting their job
done without necessarily having to be bogged down in the fine-grained details of the underlying
protocols. That said, given that the book you are reading is meant to be read by aspiring identity
professionals, the fine-grained details of the protocol are among the things we want to learn
about - and what you'll find in abundance in the rest of the text.
However, I dislike the classic academic approach so common in other learning material about
identity. There you just get the lecture and a laundry list of the concepts listed in these various specifications - college style - and are expected to figure out on your own how they apply to your
scenarios. The messages, artifacts, and practices defined in those specifications are all there
for specific reasons. Typically, it is for addressing use cases and scenarios. It's just that their
language is such that it's not presented, usually, in a scenario-based approach, as it would not
be economical in a specification to do so. That's a great approach for formal descriptions and for keeping ambiguity to a minimum, but not great for actually understanding how to apply things in practice.
I'm going to turn things around, and actually, apart from giving you some basic definitions, I want
to operate at the scenario level. I want you to understand why things are the way they are and
how they are applied in particular solutions rather than just ask you to study for a test. In the
process, we will eventually end up covering all the main actors and all the main elements in the
specifications. Simply, we will not be following the traditional order in which those artifacts are
listed in the specs themselves. We'll just follow the order dictated by the jobs to be done that
we want to tackle.
OAuth2 Roles
Let's start with the few definitions that I mentioned we need before starting our scenario-based
journey through the specifications. OAuth2 and OpenID Connect define a number of primitives
that are required for describing what's going on during identity transactions.
In particular, OAuth2 introduces several canonical roles that different actors can play in the context
of an identity transaction. As OpenID Connect is built on OAuth2, it inherits those roles as well.
The first one is the resource owner. The resource owner is, quite simply, the user. Think of
the LinkedIn and Gmail scenario in the preceding chapter: the resource LinkedIn wants to
access is the user's Gmail inbox; hence the user in the scenario is the resource owner.
Then we have the resource server, which is the guardian of the resource - the gatekeeper that you need to clear in order to obtain access. It typically is an API. In our model scenario, the resource server is whatever protects the APIs that LinkedIn calls for enumerating contacts and sending emails.
Then, there is the client, probably the entity that is most salient for developers. The client,
from the OAuth2 perspective, is the application that needs to obtain access to the resource.
For OAuth2, which is a delegated authorization protocol and a resource access
protocol, every application is modeled as a client. However, we'll see that when we
start layering things on top of OAuth2, and for example, we'll use OpenID Connect for
signing in, very often what, according to the spec jargon, is called the client will, in
fact, be the resource that we want to access. In that sentence, I use “resource” not
in the OAuth sense, but in the general English-language sense of the word. You can
see how naming “client” the resource you want to gate access to might be confusing!
Now that you have seen in Chapter 1 how OpenID Connect was built on top of OAuth2 scenarios, you know why: in OpenID Connect, signing in entails requesting a token that is meant to be consumed by the requestor itself, rather than used for accessing an external resource. Your application is both the client (because it requests the ID token) and the resource itself (because it consumes it instead of using it for calling an API), but the term we end up using for describing the app in protocol terms is just client. That can be confusing for the non-initiated, but that's the way it is. I will often highlight this ambiguity as we encounter it.
Finally, there is the authorization server which, as we encountered in Chapter 1, Introduction to Digital Identity, is the collection of endpoints used for driving the delegated authentication and authorization transactions.
The authorization server exposes the authorization endpoint, which is the place where users
go to for anything entailing interactivity. Practically speaking, the authorization endpoint serves back web pages. It's not always literally the case, as we'll see in the chapter about SPAs, but that's the general idea.
The authorization server also features a token endpoint, that is, the endpoint that apps typically talk to in programmatic fashion, performing the operations that actually retrieve tokens.
Authorization and token endpoints are defined in OAuth2 Core. OpenID Connect augments those with the discovery endpoint. This is a standard endpoint that advertises, in a machine-consumable format, the capabilities of the authorization server. For example, it will list information like the addresses of the two endpoints that I just described. Another essential piece of information the discovery endpoint provides is the key that OIDC clients should use for validating tokens issued by that authorization server.
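The discovery document lives at a well-known path under the issuer's address. A quick sketch (the issuer URL is invented; the path and the field names are standard OpenID Connect Discovery metadata):

```python
import requests

issuer = "https://your-tenant.example-idp.com"  # invented issuer address
config = requests.get(issuer + "/.well-known/openid-configuration").json()

print(config["authorization_endpoint"])  # where users go for the interactive steps
print(config["token_endpoint"])          # where apps programmatically retrieve tokens
print(config["jwks_uri"])                # where to fetch the keys for validating tokens
```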
The most complicated things in the context of OAuth2 and OpenID Connect are usually what we call the grants. In a nutshell, grants are just the sets of steps a client follows for obtaining some kind of credential from the authorization server, for the purpose of accessing a resource. As simple as that. OAuth2 defines a large number of grants because each of them makes the best of the ability of a different client type to connect to the authorization server in its own way, according to its peculiar security guarantees. Grants also serve the purpose of addressing different scenarios, such as scenarios where access is performed on behalf of a user vs. via the application's own identity.
I won't go into the details of the various grants here because we are going to look at pretty much all of them inside out throughout this book. Suffice to say at this point that there is a core set of
grants originally defined by OAuth2: Authorization Code, Implicit, Resource Owner Password Credentials, Client Credentials, and Refresh Token. OpenID Connect introduces a new one, the Hybrid grant, which combines two particular OAuth2 grants into one single flow.
In addition to the grants defined by the core OAuth2 and OpenID Connect specifications, the OAuth2 working group at IETF and the OpenID Foundation continuously produce independent extensions, devised to address scenarios that weren't originally contemplated by the core specs, or that were deemed too specific for inclusion. The ability to add new specifications to extend and specialize the core spec is a powerful mechanism, which helps the community receive timely support for new scenarios as they emerge.
The book will examine every essential grant in detail, with particular emphasis on the scenarios for which a specific grant is most appropriate, the reasons behind the main features characterizing every grant, and the most important factors that need to be taken into account when using it.
Chapter 3 - Web Sign-In
Starting with this chapter, we are going to dive deeper into concrete scenarios. Let's begin with the sign-in flow for traditional web applications.
Confidential Clients
Before I actually get into the mechanics of it, I have to make a couple of high-level introductions of artifacts and terminology that we use in the context of OAuth2 and OpenID Connect. In particular, we need to talk about confidential clients.
A confidential client in OAuth2 is a client that has the ability to prove its own programmatic application identity. It's any application to which the authorization server can assign a credential of some type - making it possible for the app to prove to the authorization server its identity as a registered client.
This typically happens with any app that is a singleton. Think of a website that is running on a
certain set of machines. Even if executing on a cluster, it's one logical entity running there. When
I provision my client by registering it at the authorization server, I have a clear identity for it. I have URLs that determine where this client lives, and I have a flow for getting whatever secret it needs into its hands.
Allegedly, if the application is running on a server, the server administrator is the only person that can access that secret. Contrast all of this with applications that, for example, run on your device: those apps are anything but singletons. Every phone will have a different instance of Slack, for
example. When you download the application from the application store, there is no easy way
for you to get a unique key that would represent that particular instance of a client.
You certainly cannot embed such a key in the code, because it would be decompiled in a second - and you'd be in trouble. Also, the device is always available in the pockets of the people using it.
It is outside of your control, so there is no way for you to protect the key for an extended period
of time. A motivated hacker has infinite time to dig into the device, as opposed to a server, which first needs to be breached before it can reveal its secrets.
In summary, confidential clients are clients to which it's appropriate to assign a secret. The classic example is the website running on a server, as discussed. But you can also think of an IoT scenario, in which you want to identify the device itself rather than a user, or of unattended processes.
For example, consider a continuous integration system, such as Jenkins, that compiles your product overnight, runs tests, and performs similar long-running tasks. It's likely that you'll want that
daemon to run with its own identity, as opposed to the identity of a user. In fact, if you use the
identity of a user, and then the user leaves the company, it may happen that everything grinds
to a halt, and no one knows why. This happens because very often people forget that a particular
user identity was used for running these scripts. So, assigning its own identity to the daemon is
a better option.
One subtlety here is that even if an application is a confidential client, not every grant that the application performs will require the use of a client credential. It is a capability that the application
has, but it doesn't have to exercise it every time. There will be, in fact, scenarios, like the one
that we are about to explore, in which there is no need to use keys. Typically, the key is used for
proving your identity as a client when you're asking for a token for accessing a different resource.
Instead, we'll see that in the case of Web Sign-In, you are the resource.
The grant that we're going to use here is the implicit grant with form_post. It is kind of a mouthful,
but, unfortunately, that's the way the protocol defines it. This is something that wasn't possible
before OpenID Connect. It is the easiest way to achieve Web Sign-In using OpenID Connect and
it is really similar to SAML. In fact, it basically follows the same steps that I've described when I
demonstrated the first SAML flow in the first chapter, Introduction to Digital Identity.
This grant constitutes the basis of something that only OpenID Connect can do, that is, combining signing in to a website with granting that website delegated permission to access an API.
What we are going to do now is study half of that transaction. We'll only look at the sign-in part. When we talk about APIs, we'll look at the other half. Those two halves can be combined so that the experience for the user is truly streamlined. Also, in terms of design, combining sign-in and API invocation capabilities makes it possible for an application to play multiple roles. This is a really powerful property, as we'll see later on.
Given that we're using the front channel, we don't need to use the application credentials. We'll see that there are security implications here and there, but, as just said, it is just like SAML.
Setting this thing up from a developer perspective is a thing of beauty. You just install your
middleware in front of your application. Then, you use your configuration to point it to the
discovery endpoint, as we mentioned in Chapter 2, OpenID Connect and OAuth, and just specify
the identifier that you were assigned as a client when you registered your application. In the authorization server, you need to specify the address where you want to get tokens back to - the application's redirect URI.
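To make this concrete, here is a minimal sketch of that setup for a Node.js web app, using Auth0's express-openid-connect middleware. The URLs and identifiers are placeholders; consult the library's documentation for the authoritative option names.

import express from "express";
import { auth } from "express-openid-connect";

const app = express();

app.use(
  auth({
    // The middleware derives the discovery endpoint from this base URL.
    issuerBaseURL: "https://YOUR_TENANT.auth0.com",
    baseURL: "http://localhost:3000",
    // The identifier assigned to the app at registration time.
    clientID: "YOUR_CLIENT_ID",
    // Used to encrypt the local session cookie; never sent to the authorization server.
    secret: "a-long-randomly-generated-string",
    // Require sign-in on every route.
    authRequired: true,
  })
);

app.get("/", (req, res) => {
  // The middleware populates req.oidc with the validated user identity.
  res.send(`Hello, ${req.oidc.user?.name}`);
});

app.listen(3000);

With a setup along these lines, the middleware takes care of emitting the authorization request, validating the returned ID token, and establishing the session - all the steps we are about to dissect by hand.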
A detailed walkthrough
Let's see in detail how the implicit grant with form_post works. Take a look at the scenario shown
by Figure 3.1:
You might notice that, in this authorization server, I'm showing only the authorization endpoint and the discovery endpoint. I don't show the token endpoint because, in this particular flow, we don't use it.
The idea is that, as soon as this web application comes alive, the middleware will reach out
to the discovery endpoint and will learn everything it needs about the authorization server. In
particular, it will get the address of the authorization endpoint and the key to be used for checking
signatures. We'll show how all those steps occur in detail later on (see the Metadata and Discovery section).
Let's see how the access plays out by describing each numbered step.
Figure 3.1
1. Route Request
The user requests a route of the web application that we want to protect with sign-in.
2. Authorization Request Redirect
The middleware intercepts this call and, in response, emits an authorization request for the authorization server. The HTTP response has an HTTP 302 status code, i.e., it's a redirect, and carries a number of parameters meant to communicate to the authorization server all the details of what we're asking for.
Figure 3.2
It’s really important to understand the anatomy of this message since all the other messages that
we'll see will be a derivative of this. Here, we're going to touch on all the most relevant parameters.
Authorization endpoint. The first element is the authorization endpoint. That's the address
where we expect the authorization endpoint functionality to be for the authorization server.
Client ID. This client_id parameter is the identifier of your application at the authorization
server. The authorization server has a bundle of configuration settings associated with your
app, and it will bring those up in focus when it receives this particular client ID.
Response type. The response_type parameter indicates the artifact that I want. In this
particular case, I want to sign in, so I need an ID token. Consequently, the value of the
response_type parameter will be id_token. There is a large variety of artifacts that I can ask
for. I can also ask for combinations of artifacts: we'll see those combinations in detail.
Response mode. Response mode is the way in which I want these artifacts to be returned to
me. I have all the choices that HTTP affords me. I can get things in the query string, but this
is usually a bad idea because artifacts end up in the browser history. I can get the artifacts
in a fragment, which is still part of the URL but not transmitted to a server. I can get them as
a form post (form_post), which is what we are using here. In this case, we just want to make
sure that we post the token to our client. This way, we don't place stuff in the query string,
which, as mentioned, is generally a bad practice, from the security perspective. The use of
a POST also allows us to have large tokens. In fact, if you placed stuff anywhere but in the body of the message, you would quickly run into URL length limits.
Redirect URI. The redirect_uri parameter has a very important role. It represents the address
in my application, where I expect tokens and artifacts to be returned to. I need to specify
this because the tokens that we use in this context are what we call bearer tokens.
Bearer tokens are tokens that can be used just by owning them. In other words, I can use it
directly, without needing to do anything else, like other types of tokens might require. For
example, other types of tokens may require me also to know a key and use it at the same
time. But bearer tokens don't. You will hear much more about bearer tokens in the section
about token validation (see Principles of Token Validation). So, it is imperative that tokens end up only with their legitimate recipient.
Also, it is very important that I specify the exact address I want the response to be sent back
to. If I don’t and, for example, instead of doing a strict match with the address they provide,
I allow callers to attach further parameters, I put communication security at risk. What might
happen - and it did happen in the past - is that there might be flaws in the development stack
I'm using that will cause my request to be redirected elsewhere. That would mean shipping
to malicious actors my bearer tokens, and that’s all they’d need to impersonate me. OAuth2
and OpenID Connect are strict about this: the redirect URI that you specify in the request must match exactly the one registered at the authorization server.
Scope. The scope parameter represents the reason for which I'm asking for the artifacts. In
the example above, I specified openid, profile, and email, which are scopes that cause the
authorization server to issue an ID token with a particular layout. It's somewhat redundant
with the earlier response type, but I'm also asking to enrich this ID token with profile and email information.
In short, with the scope, I am specifying the reason for which I want the artifacts I am
requesting. We will see that, when we get to APIs, we'll be asking for particular delegated permissions via this same parameter.
Nonce. The nonce parameter is mostly a trick for preventing token injection. At request time,
I generate a unique identifier, and I save it somewhere (like in a cookie). This identifier is
sent to the authorization server, and eventually, the ID token that I receive back will have a
claim containing the same identifier. At that point, I'll be able to compare that claim with the
identifier that I saved, and I'll be confident that the token I received is the one I requested. If
I receive a token that has a different (or no) identifier, I have to conclude that the response is not the one I requested, and I must reject it.
It is worth mentioning that I specified form_post as the value for response_mode because the default response mode of id_token would be different (it would have been fragment); hence, I had to override it explicitly. The following table shows the default response mode for each response type defined by OAuth2 and OpenID Connect. If I omit response_mode in the request, the authorization server will respond using the default mode for the requested response type.

Response type - Default response mode
code - query
token - fragment
id_token - fragment
combinations (e.g., code id_token, code token, id_token token) - fragment
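Pulling all of these parameters together, a middleware might assemble the request along the lines of the following sketch. The endpoint, client ID, and redirect URI are placeholders.

import { randomBytes } from "crypto";

// Generate the nonce and persist it (e.g., in a cookie) so that it can later be
// compared with the nonce claim inside the ID token we receive back.
const nonce = randomBytes(16).toString("hex");

const authorizeUrl = new URL("https://YOUR_TENANT.auth0.com/authorize");
authorizeUrl.search = new URLSearchParams({
  client_id: "YOUR_CLIENT_ID",                     // assigned at registration
  response_type: "id_token",                       // we want an ID token for sign-in
  response_mode: "form_post",                      // override the default (fragment)
  redirect_uri: "http://localhost:3000/callback", // must match the registered value exactly
  scope: "openid profile email",                   // request an enriched ID token
  nonce,                                           // token injection prevention
}).toString();

// The middleware replies to the original route request with a 302 pointing here:
// res.redirect(authorizeUrl.toString());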
3. Authorization Request
The next step for the browser is to honor the 302 redirection and actually perform a GET
hitting the authorization endpoint with all the parameters I just described.
From now on, the authorization server does whatever it deems necessary to authenticate
a user and to prompt for consent. How this occurs isn't specified by OAuth2 or OpenID
Connect. The mechanics of user authentication, credentials gathering, and the like are a
completely private matter of the authorization server, as long as the eventual response
that comes back is in the format dictated by the standard. You can have multi-factor
authentication, multiple pages, one single page. It doesn't matter, as long as you come back with a response in the format the standard mandates.
4. Authorization Response
Once everything works out, you get an HTTP response with a 200 status code. This means
that you have successfully authenticated with the authorization server. The authorization
server will set a cookie that represents your session with it. So, if later on you need to hit
the authorization endpoint again, you will not have to enter credentials to sign in explicitly.
You might have to give more consent, for example, but you shouldn't have to re-enter
credentials.
The other important part to note here is the ID token, which is what we requested. It is
being returned as a parameter in the form post that we are getting. You can see in the
body of HTML being returned, that the JavaScript onload event is wired up to submit a
form automatically.
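For reference, the body of that response is conceptually similar to the following fragment, sketched here as a template string. The exact markup varies by authorization server, and the token value is truncated.

// A self-submitting form carrying the ID token back to the client's redirect URI.
const formPostBody = `
<html>
  <body onload="document.forms[0].submit()">
    <form method="post" action="http://localhost:3000/callback">
      <input type="hidden" name="id_token" value="eyJhbGciOi..." />
    </form>
  </body>
</html>`;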
5. Form Post to the Web App
The browser loads the page, runs the onload script, and automatically submits the form to our application. This means that the requested ID token is finally sent to my web application.
This closely mirrors what we saw in the SAML sign-on scenario in the first chapter, Introduction to Digital Identity. The application receives the
ID token and decides whether it likes it or not according to all the various trust rules, and
what it has learned from the discovery endpoint. If it likes it, the app will emit an HTTP
302 response with its own cookie. Thanks to that cookie, representing an authenticated
session with my app, I will not need to get the ID token again as long as the cookie is valid.
Together with the cookie creation, the app emits an HTTP 302 response, which redirects the browser to the originally requested protected route, but this time we present a session cookie with it. If you compare the original request in 1 with this redirect, you will discover that it is exactly the same. From now on, every subsequent request toward the application will carry the session cookie, with no further token exchanges required for as long as the cookie remains valid.
Anatomy of an ID Token
As we said earlier, the ID token is an artifact proving that a successful authentication occurred.
We have two ways of requesting it: using a response_type parameter with the id_token value, or including the openid value in the scope parameter.
The reason for which we have two mechanisms is that the authors of the specifications wanted you to be able to use OpenID Connect even if your SDK was only based on OAuth2. In fact, at the time OAuth2 was defined, there was no ID token in the response type enumeration. Since the scope is a completely generic parameter, the ability to use one particular scope value that would cause
the authorization server to return an ID token was a great way of being backward-compatible.
Today, it's a great way of getting confused, but now that you know, you no longer run this risk.
OpenID Connect defines a fixed format for the ID token: the JSON Web Token (JWT) format. The specification actually defines not just the format, but also the list of claims that must be present in an ID token. In addition, it even tells you in normative terms what you need to do in order to validate some of those claims. As we said, if I include a profile or email value in the scopes of my request, the resulting ID token will be enriched with the corresponding user claims.
Just to get a feeling of it, here you can see what you would normally see on the wire:
Figure 3.3
That's what a JWT normally looks like, with its Base64-encoded components. If you go to jwt.io, which is a very handy utility offered by Auth0, you can actually paste the bits of your ID token and see it automatically decoded. The following picture shows an example of such a decoding:
Figure 3.4
You can see on the right side that we have a header that describes the shape of this specific JWT. In particular, by examining the header content, we find that this token is in JWT format, what algorithm has been used for signing it, and a reference to the key required to validate the signature - which, in this case, corresponds to the key that we downloaded from the discovery endpoint.
If you look at the payload, you'll find that it contains the actual information we were expecting to receive:
The issuer (iss), which is a string representing the source of the token - that is, the entity behind the authorization server. Like the key, this value is also found via the discovery endpoint.
The audience (aud), which represents the particular application for which the token has been issued. It is very important to check this claim. As an app receives this token, the middleware used for validating it will compare what was configured to be the app identifier (in the case of sign-in and ID tokens, that will correspond to the client ID of the app) with the audience claim. If there is a mismatch, that means that someone is trying to use a token stolen from somewhere else, and the token must be rejected.
The issued-at (iat) and expiration (exp) are coordinates that are used for evaluating whether
this token is still within its validity window or if, being expired, it can no longer be accepted.
We'll see during the API discussion that access tokens and ID tokens typically have a limited
validity time.
All the other claims are pretty much identity information about the user, which are present in
the ID token only because I asked for profile and email in the scope parameter.
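If you want to peek inside a token programmatically rather than via jwt.io, decoding takes only a few lines. Note that this sketch only decodes - it deliberately ignores the signature, so nothing here should be trusted before validation.

// Decode (NOT validate) a JWT: split the three dot-separated segments and
// base64url-decode the first two. The signature segment is ignored here.
function decodeJwt(token: string): { header: unknown; payload: unknown } {
  const [header, payload] = token
    .split(".")
    .slice(0, 2)
    .map((part) => JSON.parse(Buffer.from(part, "base64url").toString("utf8")));
  return { header, payload };
}

const { header, payload } = decodeJwt("eyJhbGciOi..."); // paste an ID token here
console.log(header);  // e.g. { alg: "RS256", typ: "JWT", kid: "..." }
console.log(payload); // iss, aud, iat, exp, plus the requested identity claims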
We've been talking about validating tokens quite a lot, relying on the intuition that it entails
validating signatures and performing metadata discovery. Let's explore the matter in more detail,
and have a more organic discussion about what it means to validate tokens.
We have seen the function that tokens perform in a couple of scenarios. We have seen signing
in with SAML. We have seen access tokens for calling APIs, and in particular, right now, we have
seen how to use an ID token for signing in. All those scenarios entail an entity, the resource, receiving a token and making a decision about whether it entitles the caller to perform whatever operation the caller is attempting. How does the resource make that decision?
Subject Confirmation
The subject confirmation is a concept we inherit from SAML. In particular, the subject confirmation method determines the way in which a resource decides whether a token has been used correctly or not.
Bearer is the simplest. It is similar to finding 20 dollars on the floor. You pick up the money, go
wherever you want to use this money, use it, and you're going to get the good or service you
are paying for. No further questions will be asked because all it takes for using 20 dollars is to
own those 20 dollars and for them to change hands. That's the substance of the bearer subject
confirmation method. If you have the bits of a token in your possession, you are entitled to use
the token.
Proof of possession is something more advanced. In proof of possession, you have a token that
contains a key of some kind in some encrypted section. This encryption is specifically done for
the intended recipient of the token. The idea is that when a client obtains such a token, they
also receive a separate session key, the same key embedded in the encrypted section of the
token. When the client sends a message to the intended recipient, it attaches the token as in the
bearer case, but it also uses this session key to do something - like signing part of the message.
When the resource receives the token and the message, they will validate the token in the usual
way as we described for bearer. That done, they will extract the session key from the portion that
was encrypted for them. They'll use the session key to validate the signature in the message. If
the validation works, the recipient will know for certain that the caller is the original requestor
that obtained the token in the first place. Otherwise, the caller would not have been able to use the session key to produce a valid signature.
This mechanism is more secure than bearer: an attacker intercepting the message would be able to replay the token, but, without knowledge of the session key, they would not be able to use it successfully.
Today, substantially nobody is using proof of possession in OAuth2 or OpenID Connect. But proof of possession is now coming back: there is a specification, still in draft, which shows how to use the mechanism I just described in OAuth2 and OpenID Connect, but it is not mainstream at all. So, to all intents and purposes, you can think of bearer tokens as being the law of the land. There is another concept - the sender constraint - but I'll talk more about it when we deal with native clients.
In OAuth2, access tokens have no format. The standard doesn't specify any format, mostly because, originally, it was designed for a scenario where the authorization server and the resource server are co-located. Think, for example, of the scenario we described in the first chapter, where Gmail is the resource server with its own APIs, and it's also the authorization server.
In that particular scenario, those two entities can share memory. They can have, for example,
a shared database. So, when a client asks for an access token, this access token can be just an opaque string that happens to be the primary key in a specific table where the authorization server stored the details of the grant.
When the client makes a call to the resource server presenting this token, the resource server
grabs the token and just uses it for finding in the database the correct row and in it the consented
permissions. The resource server uses that information for making an authorization decision.
This scenario is compliant with the spirit of the spec - and also the letter of the spec - and it remains a common arrangement today.
However, in the case of OpenID Connect, we did define a format for the ID token. We expected
the receiver actually to look inside a token and perform validation steps. This happens typically
when the resource server and the authorization server are not co-located, hence cannot use
shared memory to communicate. In those cases, you typically (but not always) rely on an agreed-
upon format.
Also, in the SAML case, we defined a format, a set of instructions on how to encode a token.
In the case of format-driven validation checks, there are certain constraints which apply pretty much universally.
Signature for integrity. Your token is signed, and we have seen the reasons for which we want to sign a token: being sure of the token origin and preventing tampering in transit. The token must provide some indication about the key and the algorithm used, in order for its recipient to verify the signature.
Infrastructural claims. Token formats will typically include infrastructural claims, meant to
provide information that the token recipient must validate to determine whether the incoming
token should be accepted. One notable example of those claim types is the issuer, which is to
say the identifier of the entity that issued (and signed) the token, and that should correspond to
one of the issuers trusted by the intended recipient. Another common infrastructural claim, the
audience, says for whom a token is meant. You need the audience claim to have a way of validating that the token is actually intended for a specific recipient. You also need expiration time claims: tokens typically have restricted validity, so that there is the opportunity to revoke them.
Those are all claims that you would expect tokens to have, and that the middleware is typically responsible for validating.
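Putting the signature check and the infrastructural claims together, token validation might look like the following sketch, here using the open source jose package (any mature JWT library exposes equivalent options; the issuer, audience, and JWKS URL are placeholders).

import { createRemoteJWKSet, jwtVerify } from "jose";

// The JWKS address is advertised by the discovery document (jwks_uri).
const jwks = createRemoteJWKSet(
  new URL("https://YOUR_TENANT.auth0.com/.well-known/jwks.json")
);

async function validateIdToken(idToken: string) {
  // jwtVerify checks the signature against the published keys and enforces
  // the infrastructural claims: issuer, audience, and the validity window.
  const { payload } = await jwtVerify(idToken, jwks, {
    issuer: "https://YOUR_TENANT.auth0.com/",
    audience: "YOUR_CLIENT_ID",
  });
  return payload; // safe to consume only after verification succeeds
}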
There is a different way of validating tokens, which goes under the name of introspection. With
this approach, the resource receiving a token considers it opaque. It may happen because it
doesn’t have the capability to validate the token. It should be rare in the JWT case because
checking a JWT is pretty trivial, and it can be done in any dev stack. However, imagine that for
some reason, you cannot assume that incoming tokens are in a format that you know how to
validate.
You can take the incoming token and send it to the introspection endpoint, which is an additional
endpoint that can be exposed by authorization servers. Given that you connect to the introspection
endpoint using HTTPS, you can actually validate the identity of the server itself. You can be
confident that you are sending the token where it's meant to go, as opposed to a malicious site.
The authorization server examines the token, determines whether that token is valid or not, and, if it is valid, sends down the same channel the content of the token itself (e.g., its claims).
In a nutshell, the resource server sends tokens back to the authorization server saying, "Please tell me whether this is valid or not." The authorization server can render a decision and send it back to the caller, along with the content of the token, so that the resource can peek inside.
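On the wire, the exchange defined by the introspection specification (RFC 7662) is a simple form POST. A sketch, with placeholder endpoint and credentials:

// Ask the authorization server whether a token is valid (RFC 7662).
async function introspect(token: string): Promise<{ active: boolean }> {
  const response = await fetch("https://authorization-server.example/introspect", {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      // The resource server authenticates itself to the introspection endpoint.
      Authorization:
        "Basic " + Buffer.from("RESOURCE_ID:RESOURCE_SECRET").toString("base64"),
    },
    body: new URLSearchParams({ token }),
  });
  // The response carries an `active` flag and, for valid tokens, the token's
  // content (sub, scope, exp, and so on).
  return response.json();
}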
Personally, I'm not crazy about introspection, mostly because it's brittle. You need to have the
authorization server up and available, and if your application is very chatty, you might get throttled,
for example. Also, with this approach, you need to wait until you have one extra network round trip
before you can actually make an access control decision about the resource that you're calling.
You might run out of outgoing HTTP connections, which typically live in a pool. It's a lot of work.
Sometimes there are no alternatives. But in general, for Auth0, given that we always use JWTs and public-key cryptography, it's normally just better if you validate your own tokens at your API.
The way in which token validation middleware discovers the values expected in valid tokens is through the discovery endpoint. The middleware simply hits the URL /.well-known/openid-configuration, which is defined by OpenID Connect, and retrieves validation information according to the specification.
The document published at this URL typically contains direct information that we need to have,
like the issuer value, the addresses of our authorization endpoint, and similar. It also connects
to a different file that contains the actual keys, which could be literally the bits of X.509 public
key certificates.
Let's take a look at how middleware extracts validation information from the discovery endpoint (Figure 3.5).
Figure 3.5
1. Request Configuration
At load time, or even the first time that you receive a message, the middleware reaches out to the discovery endpoint of the authorization server.
2. Receive Configuration
The response is a JSON document describing the authorization server and the values the middleware should expect in valid tokens. For example, just to highlight some of these values, you have the address of the authorization endpoint, the issuer value that we are supposed to validate against, a list of claims which are supported, and the address at which the keys are published.
3. Request Keys
The next step would be to actually make a GET request to the address at which the keys
are published.
4. Receive Keys
The result of that request will be another file containing a collection of keys with their
respective supported algorithm (alg), their identifier (kid), and the bits of the public key.
The middleware programmatically downloads all of that stuff and keeps it ready.
Those keys will occasionally roll, because it's good practice to change them. Your
middleware will simply have to reach out and re-download these keys when it happens.
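Stripped of caching and error handling, the two round trips above boil down to something like this sketch (the issuer URL is a placeholder):

// Fetch the discovery document, then the key set it points to.
async function fetchValidationInfo(issuer: string) {
  const config = await (
    await fetch(`${issuer}/.well-known/openid-configuration`)
  ).json();
  // config carries issuer, authorization_endpoint, jwks_uri, and more.

  const keySet = await (await fetch(config.jwks_uri)).json();
  // keySet.keys is an array of JWKs, each with alg, kid, and the public key bits.

  return { config, keySet };
}

// Usage: const { config, keySet } = await fetchValidationInfo("https://YOUR_TENANT.auth0.com");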
Chapter 4 - Calling an API from a Web App
In this chapter, we move our attention to calling APIs. This is the quintessential scenario addressed by OAuth 2.0: delegated access to APIs is the main reason OAuth came to be in the first place. Most of the discussion will focus on the canonical grant OAuth 2.0 offers to address the delegated API access scenario, the Authorization Code grant. We'll also take a look at other grants, such as the Hybrid flow and the Client Credentials grant, which can be used to call APIs in slightly different scenarios.
At a high level, the way we typically invoke an API from a web application is roughly the same
way we’d call an API from any client flavor. Details will differ, as we will see throughout the book.
Depending on the client’s flavor, we'll use different grants, with different properties. In particular,
in this chapter, we want to focus on the scenarios in which a web application calls an API from
its server-side code. To that purpose, we use the OAuth 2.0 authorization code grant. The
authorization code grant, code grant from now on for brevity, empowers one web application to
access an API on behalf of a user and within the boundaries of what the user granted consent
for. This is the grant we encountered when introducing OAuth 2.0 in Chapter 1.
In the section Layering Sign In on Top of OAuth2: OpenID Connect of Chapter 1, we've seen that some people tried to stretch this grant to achieve sign-in, as opposed to invoking an API. In the same section, we have seen how, if you just use this grant to obtain and use access tokens for signing in, things don't work out that well. We have seen how OpenID Connect is layered on top of this grant to achieve sign-in the right way, and we'll have more considerations about it in this chapter. At this point, I just want to stress that what we are looking at in this chapter is delegated API invocation, not sign-in.
Another important concept to grok upfront is that the code grant will only empower an application to do, at most, what the user can do - and the application will usually end up having fewer access rights. Users cannot use the code grant for granting an application access to resources the users themselves don't own or have rights for. When thinking about OAuth 2.0, and the code grant in particular, it's easy for people to get confused. They observe that APIs grant access to a call depending on the presence of scopes in the token. That lends credence to the idea that the scopes themselves are what grants the client the privileges to access the resource. Actually, the scopes select which of the privileges the user already holds the client is allowed to exercise.
I just want to stress that the authorization code grant is a delegated flow. It allows
clients to do things on the user’s behalf, which means that the user’s capabilities are a
hard limit for what an application can do on the user’s behalf. In other words, a client
obtaining a token via code grant cannot do more than the user can do. If you need a
client to do more than the user can do, which is a common scenario, then you need to
switch to a different flow in which permissions are granted directly to the application
that needs it, with no user involvement. Clear as mud? Don't worry. We'll revisit those concepts again shortly.
In the last chapter, we explored how to perform web sign in through the front channel, which
afforded us the luxury to implement the full scenario without any secrets. As you witnessed in the
detailed descriptions of flows and network traces, no secret came into play. In the authorization
code grant, however, the use of an application credential such as a client secret is inevitable.
Whenever the web app redeems an authorization code, it needs to authenticate as a client to the
authorization server. The way in which we will approach the delegated API invocation scenario
will vary depending on whether one needs to access the APIs only while a user is present and currently signed in to the application, or whether one needs to acquire permanent access to the
APIs and perform calls to these APIs even when no user is present.
My favorite example is an application that can publish tweets at an arbitrary time. Personally, I
don't like to wake up early in the morning: I really hate it. Nonetheless, it turns out that tweets
get the best exposure when they come out pretty early. The fact that I'm based on the West Coast makes things even worse: if I have to publish tweets manually at a time that could be considered morning across the entire North America, I'd have to wake up real early. Luckily, there
are applications I can use for tweeting on my behalf at whatever time I schedule beforehand.
Those applications are a typical example of a client that needs to have an access token always
available for calling the Twitter API on my behalf, regardless of whether I am currently signed in with an active session or blissfully asleep. This is one of the classic scenarios, offline access,
demonstrating the need and intended usage of a very important artifact - the refresh token.
Without further ado, let's dive into the details of the authorization code grant with the help of Figure 4.1.
Figure 4.1
The diagram depicts the usual actors we encountered in Chapter 2:
The authorization server, on top. Note that this time both the authorization and the token endpoints are present in the picture, as both will come into play.
The API the web app needs to call as part of our scenario.
Just like we did during the first explanation of the OAuth2 flow in Chapter 1, section Delegated
Authorization: OAuth2, here we assume that the user is already signed in with the web application.
We don't know how that sign-in operation occurred, and we don't care in this context - the API
invocation operation can be performed independently of the sign-in (although we will later see,
in the section on hybrid flow, that there are potential synergies there). Let’s examine the message
sequence in detail.
1. Route Request
The user hits a route of the web application that, in our sample scenario, allows the user to view their appointments. Serving that route requires the web app to call an API on behalf of the user; hence, accessing that route causes the web app to generate a delegated authorization request.
Note: if you compare this with the equivalent step in the flow described in Chapter 3, section The Implicit Grant with form_post, for the sign-in operation you will notice that the web app does not have a middleware in front to intercept the route request. In this case, the route isn't the asset we want to protect; requesting that route just happens to be the thing
that triggers the need to acquire a token to call an API. The logic necessary to generate the associated delegated authorization request is, in fact, inside the app codebase itself.
2. Authorization Request
The reaction from the application to the request is somewhat familiar: a 302 HTTP status
code response with a message for the authorization server. You can see a number of differ-
ences with the equivalent step 2 in section The Implicit Grant with form_post of Chapter 3.
First, we are setting a cookie to track the nonce value (see Chapter 3, section Authorization Request Redirect, for more details), as, besides the access token needed for accessing the API, we'll also be asking for an ID token. The ID token is useful in this flow for learning a bit more about the transaction, given that the access token itself is opaque to the client.
Ignoring the audience parameter for a second, the next entry is the client_id - representing, as usual, the identifier of our application at the authorization server.
The response_type for this particular grant is code. We want to obtain a code from the
authorization endpoint, which the web app will later exchange, via the token endpoint, for an
access token.
We don't need to specify the response mode because we are okay with the default response mode, which in the case of the code response type is query - meaning that we expect the code to come back as a parameter in the query string of the redirect.
Next, we find the scope parameter. This message includes all the same scope values
encountered earlier, openid, profile, and email, indicating that we require an ID token
alongside the code. This time, however, we aren’t requesting an ID token for sign-in
purposes: we just want to have some information about who is the resource owner grant-
ing permission in this transaction. Without an ID token, that is to say, something the client
itself can consume, we would have no way to know. We'd just blindly get an access token
and use it, with no indication about the identity of the user who obtained it.
The scopes collection includes a scope value we didn't encounter yet, read:appointment. That scope value represents a permission exposed by the API we want to invoke: in other words, one of the things that can be done when using that particular API, and that can be delegated to a client. By including that scope value in the request, the client is saying to the authorization server: "This web application wants to exercise the read:appointment privilege on behalf of the user". That's something that the authorization server needs to know. It will determine important details in the way the request is handled, such as the content of the consent prompt presented to the user and the scopes included in the resulting access token.
The next parameter represents the redirect URI, which you are already familiar with.
The last parameter in the captured message is the nonce, the token injection prevention measure we described in Chapter 3.
Now that we covered every message parameter in detail, let's revisit the audience parameter. When requesting an access token for an API protected by Auth0, a client is required to specify one extra parameter, called audience, indicating the identity of the resource server the client intends to invoke.
The core OAuth2 specification does not contain any parameter performing this function,
mostly because there is an underlying assumption (though not a requirement) that resource
server and authorization server are co-located. This assumption makes it unnecessary
to identify which resource server the request refers to. For a concrete example of this
scenario, consider how Facebook uses OAuth2 for gating access to its Graph API. The
Facebook authorization server can only issue access tokens for the Facebook Graph API;
there is no other resource server in the picture. The only latitude left to clients is to specify
different scopes for that one resource server, the Facebook Graph. Different scopes will
express different permissions and operations I intend to exercise, but they will all refer to
the same resource server, which doesn’t need to be explicitly named in the authorization
request. Similar considerations hold for Google, Dropbox, and other popular services:
whenever clients get tokens from those services, they are always calling the provider’s own
APIs, whose identity is self-evident from the context without requiring an identifier in the request. When the solution includes a 3rd party authorization server, like in the case
of an Auth0 customer leveraging the Auth0 authorization server to secure its own custom
API, the topology makes it possible for the same authorization server to be used to gate
access for a multitude of resources, which can all live in different places. In that scenario,
the client does need the ability to specify which resource it intends to request access to.
There are multiple ways a message could be constructed to include explicit references
to a particular resource server. For example, an API might embed a resource server
identifier in the individual scope strings themselves. However, the approach has issues: scope strings could get really long and hard to read. Also, including multiple scopes referring to different resources in the same request might generate ambiguity about which resource the issued token should be scoped to.
Given those complications, Auth0 and other identity vendors decided to introduce a dedicated parameter for identifying resources. Azure AD, for example, has a resource parameter playing a comparable role.
Since those individual vendor decisions have been made, the IETF OAuth2 working group officially recognized the usefulness of such primitives and issued a new specification, OAuth2 Resource Indicators. This specification extends OAuth2 with a resource parameter, which is, to all intents and purposes, equivalent to Auth0's audience. We plan to start supporting the standard parameter as well over time.
3. Authorization Request
As in the sign-in flow, the browser honors the 302 redirect and performs a GET against the authorization endpoint, carrying all the parameters just described.
4. Authorization Response
Upon receiving the authorization request, the authorization server takes care of the interactive portion of the flow. The authorization endpoint decides what's necessary for authenticating the user and goes through it. Then it presents them with a consent prompt saying: "Hey, client X wants to read appointments on your behalf." The moment the user grants consent, the authorization endpoint returns its response with the requested authorization code in the query string, in accordance with the response_type we asked for. Also, the response includes the usual set-cookie command with which the authorization server records in the browser that an authentication session has been established.
5. Providing the Authorization Code to the Web App
At this point, the browser simply executes the redirect that will dispatch the authorization code to the web application. From this moment on, the web application will continue the transaction through the back channel, without involving the browser.
6. Redeeming the Authorization Code
The web application gathers its client credentials and the authorization code, and sends them in a message to the token endpoint. The message to the token endpoint is in the form of an HTTP POST request where the app presents its client_id and client_secret, the authorization code received from the front channel, and a new parameter, the grant_type.
Figure 4.2
Every time an application talks to the token endpoint, it has to specify the desired grant
type letting the authorization server know how to interpret the request. In this particular
case, the desired flow is the authorization_code grant. That tells the authorization server
to search for an authorization code in the message, and to consider the client ID and secret
in the context of this specific grant. If, for example, the request had specified client_credentials as the grant type, a flow we'll discuss later on, then the authorization server would have ignored the authorization code, would have looked only at the client ID and client secret, and would have considered only the identity of the client application itself, rather than the consent options of the resource owner implied by the authorization code. In other words, the grant_type parameter is used to disambiguate the flow the client intends to execute.
The request also includes the audience for the reasons stated earlier. In this particular
case, audience is redundant. The authorization code has been granted in the context of
that audience, and the authorization server knows it - hence there’s no need to provide
it again in this request. However, some extra clarity can be beneficial: for example, this
helps to interpret what this request is for while examining a network trace, without the
need to correlate it with the earlier messages that led to this point.
Finally, the message contains a redirect_uri parameter. In this phase, the authorization server doesn't really have any opportunity of performing redirects, given that the client is talking to the authorization server via a direct channel. Rather, the redirect_uri is used here as an extra security measure: the authorization server will verify that the redirect_uri presented here is identical to the one provided during the authorization code request leg of the flow, preventing an attacker from performing URI replacement (see https://tools.ietf.org/html/rfc6749#section-10.6).
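In code, the back-channel leg boils down to a single form POST, along the lines of this sketch (endpoint, credentials, and audience are placeholders):

// Redeem the authorization code at the token endpoint (back channel).
async function redeemCode(code: string) {
  const response = await fetch("https://YOUR_TENANT.auth0.com/oauth/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "authorization_code",    // tells the server how to read this request
      client_id: "YOUR_CLIENT_ID",
      client_secret: "YOUR_CLIENT_SECRET", // confidential client credential
      code,                                // received via the front channel
      redirect_uri: "http://localhost:3000/callback", // must match the earlier request
      audience: "https://api.example.com", // the resource server we want to call
    }),
  });
  return response.json(); // { access_token, id_token, token_type, expires_in, ... }
}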
7. Receiving the Access Token in the Token Endpoint Response
Assuming that the request is accepted by the authorization server and processed without issues, the grant concludes with a response message carrying the artifact originally requested, the access token, accompanied by:
An ID token, in response to the presence of openid in the list of requested scope values.
The token type, which is always Bearer for the time being - as discussed in the token validation
section
The expires_in parameter, expressing the time through which the access token should be considered valid. Although at times the access token itself might contain that information, and happen to be in a format that can be inspected, access tokens should always be treated as opaque by clients; expires_in offers a sanctioned way for the client to be able to use that information (for example, for deciding for how long an access token can be cached before a new one should be requested).
Important: access tokens should always be assumed and treated as opaque by client
applications because their content and format are a private matter between the authorization
server and the resource server. The terms of the agreement between the authorization
server and the resource server can change at any time: if the client app contains code
that relies on the ability to parse the access token content, even minor changes will break it.
Imagine a case in which access tokens, initially sent in the clear, start being encrypted in a way that only the intended resource recipient can decrypt. Any client will lose access to the token's content. Client code relying on the ability to access the token content will irremediably break. In summary, avoid logic in client applications that inspects the content of access tokens. Note: examining the content of a token in a network trace is perfectly fine for troubleshooting purposes, as that information will be consumed via debugging tools by a human rather than by production code.
8. Calling the API
With a valid access token finally in hand, all the client needs to do is to include the access token bits in a classic REST call. In this particular example, the call is a GET, but any REST invocation style is possible. The key feature in that message is the Authorization HTTP header, exhibiting the Bearer authentication scheme and carrying the access token bits.
The OAuth2 Bearer Token Usage specification, the document describing how to use bearer
tokens obtained through OAuth2 for accessing resources, says that it's possible to place
the token elsewhere in the outgoing request, for example in the body of a call - or even
a request link, as a query parameter. Encountering clients that send tokens in the body is
very rare. The use of the query string for sending access tokens is actively discouraged,
as it has important security downsides. Consider the case in which your client is running
in a browser: whenever a token is included in the query string, its bits will end up in the
browser history. Any attack that can dump the browser history will also expose the token.
Moreover, if the API call is immediately followed by a redirect, the query string will be available to the redirect destination host in the Referer header: once again, that will expose the token.
For those and other reasons, it is reasonable to expect that the near totality of the API calls
encountered in the wild that rely on OAuth2 will use the Authorization HTTP header.
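In code, that is as simple as attaching the header to the outgoing request (the URL is a placeholder):

// Call the API, presenting the access token via the Bearer scheme.
async function getAppointments(accessToken: string) {
  const response = await fetch("https://api.example.com/appointments", {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  return response.json();
}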
According to the OAuth2 Security Best Current Practice (BCP), every client using the authorization code flow should leverage Proof Key for Code Exchange (RFC 7636), an extension to the authorization code grant meant to protect authorization codes from being stolen in transit.
PKCE was originally devised for public clients, where it performs essential security functions that
we’ll describe in detail in the next chapter. Its use for confidential clients is not as critical, as there
are other measures already in place (state, nonce checks) mitigating other aspects coming into
play in associated attacks. This is why we have chosen to keep this section light and to defer introducing PKCE to the next chapter: by then, you will be more familiar with the original grants, and it will be easier to add PKCE as an incremental step. However, we wanted to make a note of the
BCP guidance already here, so that if you read about it elsewhere you’ll know what it is all about.
OAuth 2.0 offers a delegated authorization framework. Unfortunately, developers often disregard
the “delegated” part, and attempt using OAuth primitives and flows to solve pure authorization
scenarios that the protocol hasn’t been explicitly designed to address. The outcome is solutions
that might appear to work in toy scenarios but fall short as soon as the approach is applied in realistic conditions.
For that reason, it is a worthwhile investment to spend a few paragraphs discussing essential
concepts and terminology in authorization, spelling out explicitly their relationship with OAuth -
and in particular, what is part of OAuth and what is instead a property of the underlying resources
we are exposing.
Permissions
Imagine that you want to expose programmatic access to an existing resource. Depending on
the nature of the resource, there will be varying sets of operations that can be performed on it,
or with it. In the context of a document editing system, users will be able to see, read, comment
on, or modify documents. An API fronting a printer might expose the ability to print in black and
white or in color. Any kind of resource will have a set of permissions that make sense for that
particular resource, and that can be allowed or denied for a particular caller. A permission is just that: a statement describing the type of things that can be done with a resource - document.read, and the like.
Permissions describe intrinsic properties of resources, which exist regardless of how those
resources are exposed. OAuth2 solutions might surface them if they happen to be useful in the
context of a delegated authorization solution involving those resources. Still, in the general case,
permissions exist in their own right and will be used outside of OAuth as well.
Privileges
A privilege is an assigned permission: it declares that a certain principal (say, John) can perform
a certain operation on a given resource (say, calling the printer API to print in full color).
As it was the case for permissions, the concept of privilege exists independently of OAuth 2
(or any other higher-level protocol, for that matter). For example, the framework necessary to
describe privileges needs primitives for principals (users and apps to whom permissions might be assigned), alongside the resources and permissions themselves.
The existence of permissions and privileges applied to a set of resources will influence the behavior of OAuth 2 solutions based on those resources, but how that happens is not described by the OAuth specifications themselves.
Scopes
Finally, we get to talk about an OAuth primitive. In the case in which a resource needs to be exposed in the context of a delegated authorization solution, the scope is the primitive that enables a client application to request to exercise the privilege of a user for a particular permission on a given resource. The mechanism that the client uses for expressing this to the authorization server is the scope parameter, which typically carries the identifiers of the permissions being requested. When used with this semantic - that is, lists of permissions for a given resource - scopes are used to define the subset of user privileges that a client application wants to exercise on behalf of the user. Note that scopes can be used for other purposes: we have seen examples of that in the case of openid (requesting the presence of an extra artifact, in that case an ID token).
Effective Permissions
Consider a classic delegated authorization flow, in which a client asks the authorization server for access to a resource. In particular, the client specifies what permissions will be required for
the operations it intends to perform on the resource. Upon receiving the request and authenticating
the user, the authorization server will typically prompt the user to grant the app delegated access
to the corresponding permissions. The user granting consent through that prompt is effectively
saying "Yes, I'm okay with this particular client exercising on my behalf the privileges being
requested".
Say, for example, that the client implements an email solution, and the permission it requests is mail.read. The scope requested is mail.read, and the access token being returned will include (by virtue of the user's consent) that scope.
Once the client obtains the access token, it will use it to make a call to the API, requesting to read
a list of email messages. The middleware protecting the API, upon receiving and validating the
access token, will verify that the scope it carries includes mail.read, the permission required by
the API to perform the read operation requested, and allow the request to move along.
But the authorization checks aren’t over yet! Imagine that the client requests the list of emails
from the inbox of a user who’s different from the user who granted consent and obtained the
access token. Should the API allow the request to succeed? Of course not! Scopes do not create
privileges where there are none. Scopes can grant to a client a subset of the privileges a resource
owner has on a resource but can never add privileges the resource owner didn’t have to begin
with. The effective permissions are the intersection of the privileges a resource owner has
and the scopes that have been granted to the client. The effective permissions represent what
a client can actually do, and that can be a subset of what’s declared in the scopes. You always
need to check at runtime whether the scopes represent something that the resource owner
can actually do for the resource being accessed. Also, note that there is no guarantee that the
privileges the resource owner had at the moment of granting consent will be preserved forever.
Hence, even if your authorization server conflates scopes and privileges (for example, by only
allowing a user to consent if he or she possesses the corresponding privileges), nothing prevents
some of those privileges from being revoked at a later time. This makes it necessary for the API
to check rather than just relying on the scopes in the incoming access token.
This is one subtle point that is often misunderstood in the context of OAuth.
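To make the two layers of checks concrete, here is a sketch of an API route guard. Everything here is hypothetical: req.auth is assumed to be populated by the token validation middleware, and getMailbox is an invented data access helper.

import express from "express";

const app = express();

// Hypothetical data access helper, stubbed for illustration.
async function getMailbox(id: string): Promise<{ ownerId: string; messages: string[] }> {
  return { ownerId: "auth0|123", messages: [] };
}

app.get("/messages/:mailboxId", async (req, res) => {
  // Claims from the validated access token; assumed to be attached to the
  // request by the token validation middleware (hypothetical shape).
  const { scope, sub } = (req as any).auth as { scope: string; sub: string };

  // Check 1: did the user delegate the mail.read permission to this client?
  if (!scope.split(" ").includes("mail.read")) {
    return res.status(403).send("missing required scope");
  }

  // Check 2: does the resource owner actually hold privileges on this mailbox?
  // Scopes cannot create privileges the user never had.
  const mailbox = await getMailbox(req.params.mailboxId);
  if (mailbox.ownerId !== sub) {
    return res.status(403).send("caller is not the owner of this mailbox");
  }

  res.json(mailbox.messages);
});

The effective permissions are whatever survives both checks - the intersection discussed above.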
Note that OAuth can also be used for application to application flows, in which no user is involved.
The client obtains an access token for a resource from the authorization server only through
its own client credentials, as opposed to requesting access on behalf of a resource owner. You
could say that in those scenarios, the client application itself is the resource owner: there is no
delegation, hence there’s no need for scopes to limit the privileges involved. We will study the
corresponding OAuth 2 grant, the client credentials grant, in a later section in this chapter. In this
case, it's not completely clear how permissions are expressed, as the core OAuth 2 specifications
don’t provide any mechanism to express assigned privileges (though there is a new specification,
the JWT Profile for OAuth2 Access Tokens, that does introduce some guidance about that).
Regardless of the implementation details of how those privileges are expressed, this is a case
in which privileges are actually carried in the token. There might be other cases where the
authorization server includes user privileges, roles, group memberships, and other authorization
information in the access token. Those cases are all valid and represent real, important scenarios.
However, they aren't described by the specifications we are studying in this book, so we will not examine them further.
Finally, consider that although scopes often map to permissions, that is not always the case.
Remember the openid scope? Its presence in a request just causes an ID token to be included in
the response from the authorization server. Or think about the profile scope, which, when added
in a request, causes the ID token to include claims that wouldn't be present otherwise. In neither case does the scope map to a resource permission. Scopes do correspond to permissions in many common cases, which might erroneously create the belief that scopes and permissions are the same concept; but, in fact, it's important to remember that they aren't.
The Refresh Token Grant
Let's now go back to grants. I mentioned this in passing earlier on: tokens typically have an
expiration time. They have an expiration time because a token is caching a number of facts and
user attributes, and those facts might change after the token has been issued.
Also, the ability of a client to obtain a token at a given time doesn’t guarantee that the same
client will be able to reobtain the same token in the future. For example, the resource owner
might visit the authorization server and revoke consent for that client to obtain tokens with the
scopes previously granted. This makes the content of any previously issued tokens obsolete, as
they no longer reflect the current situation. The idea is that by endowing tokens with a short
duration, we ensure that the client cannot really use them (and hence, the information they
cache) for too long. Upon token expiration, clients will be forced to call back home and repeat
a request to obtain a new token. This new request creates the opportunity for the authorization
server to issue a new token containing up to date information or refusing to issue a new token
if conditions changed (e.g., the user account has been deleted from the system). The shorter
the token validity interval, the more up to date the issued information will be. Solutions typically
seek compromises that balance that with performance and traffic considerations.
Of course, this brings another challenge, which is: although we do want up to date information, we
don't want to give users a bad experience to achieve that. The user should be blissfully unaware
of all the low-level mechanisms unfolding behind the scenes to achieve those updates. We need
to empower clients to renew tokens in a way that does not impact the user experience. The way
in which OAuth solved this is by introducing a new artifact, which we call the refresh token, and a grant to go with it.
The first step to work with refresh tokens is to request one. The OAuth 2 core specification
doesn’t define a mechanism to request refresh tokens, leaving the decision to issue one to
individual authorization servers. However, OpenID Connect does define a mechanism to request
refresh tokens, and the result is that a large number of OAuth 2 authorization servers adopt that
mechanism as their main (or even only) way of requesting refresh tokens.
Let's revisit the authorization code grant examined in an earlier section and add a few small changes, summarized in Figure 4.3.
Figure 4.3
The original message in step 3 carried the list of scope values the client required to request an
ID token with rich attributes content (openid, profile, email) and the access level required for the
operations the client intends to perform (read:appointment). The message in step 3 in Figure
4.3 contains an extra scope value, offline_access. This is a scope value defined in the OpenID
Connect core specification: its presence in a request asks an authorization server to include a
refresh token in its token endpoint response, alongside all the other artifacts (in this case, an ID
token and an access token). In particular, the validity of that refresh token will extend beyond
the duration of the authentication session within which it has been issued. Don’t worry if that’s
not very clear for now. We’ll expand on what that means later in this section.
If you observe step 7 in the diagram, you’ll see that as expected, the authorization server returns
a refresh token along with the usual access token and the ID token.
Now the client has a refresh token in its possession. Let's take a look at how the client uses
it, and in particular how the refresh token makes it possible to get new access tokens without
prompting the user again. The entire flow occurs on the server side, as it entails the client (in this
case, a web app whose code runs on the server) connecting directly to the token endpoint of the
authorization server. The browser, used to send the request and drive the interactive portions
of the transaction, is now entirely out of the picture. Follow the numbered steps in Figure 4.4.
Figure 4.4
1. Token Request
The first leg of the grant takes the form of a typical token endpoint request, analogous to the one used for redeeming an authorization code. The message includes:
The client_secret. This is a confidential client; hence, requests to the token endpoint require the client to authenticate.
The new refresh_token parameter, carrying the refresh token bits received earlier.
The grant_type. As mentioned earlier, every request to the token endpoint must specify the grant the client intends to use. In this case, the parameter value is refresh_token.
The redirect_uri parameter, included for the same security reasons specified in the code redemption leg.
2. Token Response
The response carries a new access token, a new ID token (given that the original request included openid), and the list of scopes that were granted when the refresh token was obtained to begin with, in this case, during the authorization code grant.
The reason the authorization server returns the list of granted scopes is that the client
might not really know what this particular refresh token was originally granted with, or if
the conditions at the authorization server changed since its original issuance. Furthermore,
the client can request a certain list of scopes, but the authorization server can always
decide to return a subset of those scopes. In that case, if the authorization server wouldn't
return the list of scopes that have been granted in the context of this particular refresh
token redemption, the client would have no way of knowing. Even if it remembered the scopes it originally requested, there would be no guarantee that such a list would be accurate. Remember that the client is bound to consider the access token as opaque; hence, it cannot simply look into the access token to find out.
In this particular case, the authorization server does not return a new refresh token
alongside the access and ID tokens. The client is expected to hold on to the refresh token
bits it received on the first flow and keep using it until expiration.
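The redemption request itself mirrors the code exchange seen earlier; a sketch, with placeholder values:

// Exchange a refresh token for fresh tokens at the token endpoint.
async function refreshTokens(refreshToken: string) {
  const response = await fetch("https://YOUR_TENANT.auth0.com/oauth/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "refresh_token",         // selects the refresh token grant
      client_id: "YOUR_CLIENT_ID",
      client_secret: "YOUR_CLIENT_SECRET", // confidential clients must authenticate
      refresh_token: refreshToken,
    }),
  });
  const result = await response.json();
  // If the server rotates refresh tokens, result.refresh_token replaces the old
  // one and must be stored; otherwise, keep using the original until it expires.
  return result;
}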
There are various scenarios in which the authorization server does include a new refresh token at every refresh token grant. The most notable case is in the context of a security measure known as refresh token rotation.
Token rotation guarantees that, whenever you use a refresh token, the bits of that particular
refresh token will no longer work for any future redemption attempts. Every use of a refresh
token will cause the authorization server to invalidate it and issue a new one, returned
alongside the refreshed access token. Clients need to be ready to discard old refresh
Any attempt to use an old refresh token will cause the authorization server to conclude that
the request originator stole it. That might trigger protective measures, such as invalidating
all the other tokens that have been created in the same authenticated session, in case
the leak indicates a compromised application. Note that this measure might be overkill for
confidential clients, where use by legitimate clients is enforced by requiring applications
to present their client_secret when redeeming refresh tokens. However, it is extremely useful
for public clients, where apps can redeem refresh tokens without the need to exhibit any
app credentials. More details about this will be discussed in the next chapter on native
clients.
Once the client has obtained a fresh access token, the usual considerations about calling
API according to the OAuth2 Bearer Token Usage specification apply.
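Before leaving rotation behind, note that supporting it on the client side is mostly a storage concern: whenever the token endpoint response contains a new refresh token, the stored one must be replaced right away. Here’s a minimal sketch, assuming a hypothetical TokenStore abstraction; the response shape follows the token endpoint responses we have examined.

interface TokenResponse {
  access_token: string;
  expires_in: number;
  refresh_token?: string; // present when the authorization server rotates tokens
}

interface TokenStore {
  saveRefreshToken(userId: string, token: string): Promise<void>;
}

async function handleTokenResponse(store: TokenStore, userId: string, res: TokenResponse) {
  // Under rotation, the old refresh token is already invalid: persist the new
  // one immediately, or the next redemption attempt will fail and may trigger
  // the protective measures described above.
  if (res.refresh_token) {
    await store.saveRefreshToken(userId, res.refresh_token);
  }
}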
As anticipated earlier, the offline_access scope value signals to the authorization server
that the resulting refresh token’s lifetime will be decoupled from the lifetime of the
authenticated user session within which the grant was performed. In other words, whether
a user is signed in or not signed in with an application via the front channel doesn’t really matter
with respect to whether the same application is able to redeem a refresh token. Also, the fact
that the app can still use a valid refresh token doesn’t say anything about whether there’s an
active sign-in session for the user that helped obtain that refresh token in the first place. The two
things are completely separated. The scenario that offline_access is meant to support is the one
that I described at the beginning of the chapter, where a user wants to schedule a tweet to be
published at a future time, regardless of whether the user will be signed in at that time or otherwise.
In more general terms, it addresses the case in which an application might need to obtain
a valid access token to invoke an API even if no user is present to tend to interaction requests.
One common mistake developers make is to interpret the ability of an application backend to
redeem a refresh token as proof that the user still has a session. Per the above explanation, this
is a dangerous mistake that can lead to resurrecting sessions already expired or terminated via
explicit sign-out.
When developing applications that need to invoke APIs even without an active user session,
the app clearly needs to persist refresh tokens so that they are available independently of the
presence of an interactive session. Even for cases in which API calls are scoped to the interactive
session lifetime, tokens need to be saved somewhere other than in memory if you want to spare
users from having to go through token acquisition flows in case the webserver memory recycles.
Of course, persisting refresh tokens (and tokens in general) requires caution. It’s important to
make sure that tokens are stored per user, to prevent the possibility of a user ending up accessing
and using the refresh tokens associated with another user. That's just the same basic hygiene
required to enforce session separation, but when it comes to tokens, the need to follow best
practices is all the more critical given the high impact of identity mix-up and the complications
that derive from persisting user data beyond the interactive session lifetime.
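As an illustration of the per-user separation just described, here is a deliberately simplified sketch. The in-memory Map stands in for whatever encrypted, server-side store a real deployment would use; the point is that lookups are always keyed by the identifier of the signed-in user.

const refreshTokensByUser = new Map<string, string>();

function saveRefreshTokenFor(userId: string, refreshToken: string): void {
  refreshTokensByUser.set(userId, refreshToken);
}

function loadRefreshTokenFor(userId: string): string | undefined {
  // Never look up by anything other than the id of the user owning the session,
  // so one user's refresh token can never surface in another user's requests.
  return refreshTokensByUser.get(userId);
}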
To close the topic of refresh tokens for this chapter, here’s a last recommendation. Even if you
know the expiration time associated with a refresh token, you should still not rely on that in your
code. There are many reasons for which a refresh token might stop working, regardless of its
projected expiration. For example, a user could revoke consent, immediately invalidating refresh
tokens issued on the basis of previous consent. Another example: a resource server might change
policy and establish that, from that moment on, it will only accept access tokens obtained via
multi-factor authentication. This renders any refresh token obtained with a single-factor session
unable to obtain viable access tokens and forces the client to acquire a new refresh token via
multi-factor authentication. Again, all this may happen regardless of the declared expiration of
the original refresh token. For all those reasons, it is prudent to develop client code assuming
that a refresh token might stop working at any time, and to embed appropriate error management
logic.
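The sketch below illustrates that defensive posture. It treats every redemption failure (revoked consent, new multi-factor policy, tripped rotation, and so on) the same way: the stored grant is considered gone, and the app signals that a new interactive flow is required. The redeem callback stands in for a token endpoint call like the one sketched earlier.

class InteractiveLoginRequired extends Error {}

async function getAccessToken(
  storedRefreshToken: string | undefined,
  redeem: (rt: string) => Promise<{ access_token: string }>,
): Promise<string> {
  if (storedRefreshToken) {
    try {
      return (await redeem(storedRefreshToken)).access_token;
    } catch {
      // Whatever the reason, the refresh token can no longer be used:
      // fall through to the interactive path.
    }
  }
  throw new InteractiveLoginRequired("send the user through an authorization flow again");
}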
You now had the opportunity to see both access tokens and ID tokens in action. Just as important,
you learned about the reasons for which both artifacts were introduced by OAuth 2 and
OpenID Connect in the first place. It is worth stepping back for a moment and summarizing the
differences between the two token types, as confusion about when to use what is one of the
most common sources of errors in this space.
Access Token Recap
Access tokens are meant to be used for invoking APIs on behalf of a resource owner, bestowing
the client application with delegated authorization. As far as the protocol is concerned, they
have no mandated format.
Earlier on, we discussed the implications of the common topology where authorization server
and resource server are co-located, making it possible for them to access shared memory and
validate incoming tokens without agreeing on any particular format.
Conversely, consider an authorization server separated from the resource servers, as is the case
with identity as a service offerings like Auth0, where the same authorization server is shared by
multiple resource servers owned by different companies. This is a scenario that can really benefit
from agreeing on a format and using it for validating incoming tokens, even if the protocol doesn’t
offer anything out of the box. The use of JWT as a format for access tokens is so common that
there’s a standardization effort currently ongoing to define an interoperable profile for it - which
I happen to be currently driving. You can find the latest draft (at the time of writing) here.
At the cost of being pedantic, it should be stressed that, as a client app developer, you should
never write code that inspects the access token content. The fact that in some cases you might
know that a specific token format is being used doesn’t change this, as the reasons for which it’s
not a good idea are more about the contracts between client, resource, and authorization server. In
fact, it will often be happenstance that you have a chance to look inside an access token, and the
situation might change at any time. The format used in an access token is a matter agreed upon by
the resource server and the authorization server, and the details can change at any time at their
discretion, without informing the client. Any code predicated on assumptions about the access
token content will break as soon as those assumptions no longer hold, on occasion without
any possible remediation. Think of information being removed, or the content being encrypted so that no
entity but the access token’s intended recipient can inspect it. Although during troubleshooting
it is legitimate for a developer to read whatever information is available, including the content of
captured tokens, developing code that does so routinely will very often result in downtimes and
hard-to-diagnose breakage.
ID Token Recap
ID tokens are designed to support sign-in operations and, optionally, make authentication
information available to clients. They don’t contain any delegated authorization information
(though nothing prevents implementers from extending the default claims set described in the
specifications with their own custom values). ID tokens come into play during user sign-in, and
clients can use them to learn about what happened during the authentication flow. Whereas
clients should really not inspect access tokens, as discussed in detail just a few paragraphs
earlier, clients must look inside ID tokens - that’s part of the validation step described in the
web sign-in walkthrough earlier in the book.
One of the most common points of confusion about ID tokens is whether they can be used for
calling APIs. The short answer is that they shouldn’t. Let’s invest a few moments to understand
why people attempt that, and why it’s generally not a good idea.
ID tokens are designed to support sign-in operations. The client app is simultaneously the
requestor and the recipient of the ID token: once the token has been received by the client, it
has reached its intended destination and isn’t meant to travel any farther. All the client needs to
do with it is to validate it and extract user attributes, when present. Both are operations that can
be done locally, thanks to the fact that ID tokens have a fixed format, and the OpenID Connect
specification details how to perform validation. The ultimate proof that the ID token shouldn’t
leave the client app lies in the aud claim, formalizing that the client app is the intended recipient
by carrying its client_id value. We have discussed all this in Chapter 3, Anatomy of an ID Token.
Nonetheless, there are real-world situations in which client apps do use ID tokens for invoking
API. Often, that is due to designers not fully understanding the underlying protocols, and in
particular, the role of the audience claim. For them, a JWT is a JWT, and an ID token is often easier to
obtain, as it doesn’t require registering APIs, defining scopes, and adapting validation techniques
to each specific authorization server’s requirements. For example, some authorization servers will not use JWT as the
format for access tokens and will require supporting introspection calls. Some others might not
be designed to protect 3rd party API at all, hence not offering API registration and access token
issuance and validation features, but still issue ID tokens for sign-in purposes.
Nonetheless, in the general case, using ID tokens for invoking API has issues. The main problem
goes to the heart of why we have audiences in the first place. An API receiving an ID token can
only verify that the token was issued for that particular client: there’s nothing in the token saying
that it was issued with the intent to call this particular API. Besides the practical issue of being
unable to insert ad-hoc claims for that particular API, there are serious security concerns: a
leaked ID token can now be used not just to access the client, but also to invoke this API and
every other API willing to accept ID tokens from that client.
Whereas properly scoped tokens would contain the blast radius of a leak event (an access token
scoped to API A can only be used with A), many APIs accepting an ID token means that they would
all be compromised at once. This also makes it really hard to maintain separation between API:
if both A and B accept ID tokens, that means that when the client calls A, A can turn around and
use the same token it received from the client to invoke B. Although that might be acceptable at
times, in the general case, this should never happen as a side effect.
Lastly, I will mention that the use of ID tokens for calling APIs cannot be secured by sender
constraint, as the protocols supporting it don’t provide any mechanism to associate the ID token
with keys the client could use to prove legitimate possession.
For the sake of exhaustiveness, I want to acknowledge a particular situation where the use of
ID tokens for calling an API might not be disastrous, though it’s never as good as using access
tokens. Consider the case in which the client app and the API itself happen to be the same
logical application. That’s the scenario commonly described as “1st party app”, where both
ends have the same owner and are tightly coupled to implement a given solution. Think of a
social network API and its client app, for example. In this case, the solution won’t strictly require
delegation, the incoming token will likely be expected to identify the user, and the tokens issued
to that client won’t be accepted by any API other than the 1st party one (if you exclude cases
where individual app owners decide to accept them anyway, which are outside the control of
the solution’s designers).
From the end-user perspective, the client+API ensemble constituting the solution is a logical whole
- my experience of using my Twitter account through the Twitter app doesn’t usually require any
special consent where the APIs are explicitly called out. In that case, one could argue that the
component of the app requesting the token and the component implementing APIs are, in fact,
the same entity, which could be represented by the same identifier - and hence, here’s the crucial
step, be targeted by a token with the same audience… just like an ID token.
Once, in front of a beer, one of the authors of the OpenID Connect specification told me that
an ID token is just an access token with specialized semantics. That said, it’s still generally not
worth it to ever use ID tokens for calling APIs. Although narrowly defined 1st party scenarios do
exist, those would still be better off when implemented with access tokens (think about sender
constraint limitations mentioned above) and the risk of overreaching and using the ID token in
ways that expose you to serious security risks is just too great. I mentioned this particular case
here because you are likely to encounter that approach in the wild if you work in this space
long enough, and I wanted to empower you to understand the nuances and point of view of
the people following that approach: however, the best practice remains using access tokens for
calling APIs. If you need JWT access tokens, the aforementioned JWT profile for OAuth2 access
tokens is the reference to follow.
ID Tokens and the Back Channel
OpenID Connect offers multiple different ways of signing in. The one we studied in the preceding
chapter leverages the front channel. It relies on the implicit flow (that is, issuing an ID Token
directly from the authorization endpoint) plus form post (transmitting the token to backend hosted
logic, as it is the norm for redirect based apps). That flow is just the one that happens to have
the least number of moving parts, as it doesn’t require the client app to obtain, manage, and
use a client secret. It also is the flow that has more or less the same security characteristics as
traditional protocols such as SAML or WS-Federation, which are still in very wide use in mission-critical scenarios.
The authorization code grant we just studied in this chapter for calling the API can be, and commonly
is, used for performing sign-in operations - by obtaining ID tokens following the same steps we
studied for requesting an access token. Say that you are in a scenario in which, for some reason,
you don't want to disclose the bits of the ID token to the user’s browser: by using the authorization
code grant, you can make everything take place on the server side. You can just perform an
authorization code grant in the same way we did for getting a token for calling the API: you
just ask for an ID token as well. Note, that’s exactly what we did in our API calling scenario, by
including the openid scope in the initial request. All we need for making that operation count as
sign-in is to validate that ID token and create a front channel session on the base of its content.
The notable difference from the front channel is that, given that the client obtains the ID token
from a direct HTTPS connection with the token endpoint, there is no uncertainty about the
source from which the ID token bits came from. The client knows for certain that the ID token
comes directly from the authorization server, with no intermediaries that could have tampered
with the content in transit. And with origin and integrity verified, there is no need to validate the
ID token’s signature. Think about it: if you were to validate the signature, you’d use the key you
retrieved from the discovery document. And why do you trust that it is the right key? Because you
retrieved the discovery document over a direct HTTPS channel! The same assumptions hold for
the ID token retrieved from a direct connection with the token endpoint, which is why the client
can safely forgo the signature check in this case.
What’s very, very important to understand is that not having to verify the signature does NOT
mean that the client is allowed to skip token validation! The client is still meant to validate
audience, issuer, expiration times, and all the other checks that the OpenID Connect specification
describes for the ID Token validation. The signature is only one of the many checks a recipient
should perform to validate incoming tokens, even in the front channel case.
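To make the point concrete, here is a sketch of the non-signature checks, assuming the ID token payload has already been decoded into a claims object. In practice you would rely on a maintained OpenID Connect library rather than hand-rolling this, but the checks it performs are the ones listed here.

interface IdTokenClaims {
  iss: string;
  aud: string | string[];
  exp: number; // seconds since the epoch
}

function validateIdTokenClaims(claims: IdTokenClaims, expectedIssuer: string, clientId: string): void {
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (claims.iss !== expectedIssuer) throw new Error("unexpected issuer");
  if (!audiences.includes(clientId)) throw new Error("token not intended for this client");
  if (claims.exp * 1000 <= Date.now()) throw new Error("token expired");
  // ...plus the other checks the OpenID Connect specification mandates,
  // such as nonce verification when one was sent in the request.
}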
Obtaining an ID token via authorization code is technically more secure than receiving it through the
front channel. However, this technique is more onerous, as it requires the client to obtain, protect
and use an application credential - that has a management cost, associated risks (like forgetting
a secret in source control), performance, and availability challenges (extra server calls). If your
application only needs to sign in users and doesn’t have particular constraints about having tokens
transit through the browser, the front channel technique works fine - as demonstrated by many
years of successful SAML deployments using similar techniques to protect high-value scenarios.
If you are indeed in a situation that calls for higher security, or if you are already performing API
calls requiring the authorization code flow anyway, you might consider implementing sign-in via
the back channel.
The Userinfo Endpoint
A client requesting an ID token without specifying the profile and email scope values will receive
a skeleton token stating that user X (as expressed by an opaque identifier, usually) successfully
authenticated with issuer Y. The token also specifies the time of the operation and perhaps the
authentication method used.
There might be multiple reasons for which a client might opt for such barebone ID token content.
For example, a client might want such a token to use an easy to set up front channel sign-in flow
while avoiding disclosure of personally identifiable information (PII) to the browser. Alternatively,
clients might go that route simply to reduce the size of transferred data on a network that doesn’t
have a lot of bandwidth, or on a metered connection where bigger ID tokens might result in
higher costs.
The good news is that clients can opt to work with barebone ID tokens and still gain access to
user attributes when necessary. OpenID Connect introduced a new API endpoint, called Userinfo
endpoint, which can be used for retrieving information about the user by presenting an appropriate
access token - following the same OAuth2 bearer token API calling technique studied earlier in
this chapter. Whenever the client needs to know something about the user, whether it didn’t save
the initial ID token or received a barebone one, it reaches out to the Userinfo endpoint using a
previously obtained access token. It will receive substantially the same content that the client
would have gotten in an ID token requested with the profile and email scopes.
The first chapter described the evolution that led from OAuth 2 to OpenID Connect. A key
passage was about a particular way of abusing OAuth for simulating sign-in, where the ability
to successfully call an API with an access token was considered proof enough for the client to
consider a user signed in. That had several problems: access tokens could not be tied to a user
in particular (very important if you are trying to authenticate, that is, to sign in), could not be
proven to have been issued as part of a sign-in operation for that app in particular, and could not
be standardized, given that every provider protected APIs of a different shape (Facebook Graph,
and so on).
The Userinfo endpoint resolves the first and the 3rd problem. The Userinfo response does provide
information about the user that obtained the access token used to secure the call to begin with
- and it’s standard, hence generic SDKs can be built to work against it. That makes it possible
for a client to implement pure OAuth 2.0 to retrieve user information in a standardized fashion.
It is very important to realize, however, that successfully calling the Userinfo endpoint is NOT
equivalent to validating an ID token and alone CANNOT be used to implement sign-in: it does
NOT count as sign-in verification. Calling the Userinfo endpoint only proves that the corresponding
access token is valid and associated with the user identity whose attributes are returned: it does
NOT prove that the access token was issued for that particular client. OpenID Connect sign-in
operations ALWAYS require validating an ID token although, as we have seen, in some
circumstances the signature check can be skipped.
Another thing to take into account when considering using the Userinfo endpoint from a
confidential client is that all the discussions about the burden of using a secret apply here as well.
After all that preamble, let’s take a look at how an actual call to the Userinfo endpoint takes
place. As usual, we are going to explain each step - please refer to the numbered messages in
Figure 4.5.
Figure 4.5
1. Userinfo Request
The scenario in the diagram assumes that the client has already obtained a suitable access
token for calling the Userinfo endpoint. Invoking the Userinfo endpoint is simply an HTTP
request presenting that access token as a bearer token.
You might notice that in this particular network trace, the access token value looks different
from all the other tokens shown in the diagrams so far. Whereas token values in earlier
diagrams were always clipped for presentation purposes, and their shape suggested the
classic JWT encoding, the bits on display here are the entirety of the access token and
don’t appear to follow any known pattern. That's because calling the Userinfo endpoint
is precisely a scenario in which the use of opaque, formatless tokens makes sense. The
Userinfo endpoint is co-located with the authorization server: there is no need for cross-boundary
communication. The entity that issued the access token in the first place is
the same entity responsible for validating it during the Userinfo API call. That means that
the two tasks can access the exact same memory space. In concrete terms, this means
that the access token intended to access the Userinfo API doesn't need to be encoded
in any particular format. It can literally be the identifier of a row in a database that was
created at issuance time and can now be looked up at API invocation time, or any other
implementation-specific reference.
This is a luxury we cannot afford when the API being invoked is managed by a 3rd party and
hosted elsewhere. In this scenario, the parties involved are forced to rely on token validation
to compensate for the lack of shared memory between the entity issuing the token and the
entity consuming it.
2. Userinfo Response
The response returned by the Userinfo endpoint contains pretty much the same
list of claims carried by an ID token obtained via a request that includes the profile
scope.
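For reference, here’s a minimal sketch of the exchange in TypeScript. The tenant URL is a placeholder, and the access token is assumed to have been obtained through one of the grants studied earlier.

async function fetchUserinfo(accessToken: string) {
  const response = await fetch("https://example.auth0.com/userinfo", {
    // The access token travels as a bearer token, like in any other API call.
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) throw new Error(`Userinfo call failed: ${response.status}`);
  // Typical payload: { sub, name, email, picture, ... } - essentially the
  // claims a profile+email ID token would have carried.
  return response.json();
}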
The Hybrid Grant
The hybrid grant is, as the name suggests, a mix of multiple flows into one. It combines a sign-
in operation (getting an ID token from the front channel) and obtaining an access token for
invoking an API from the client backend (by requesting and redeeming an authorization code).
That saves network round trips, consolidates prompts and consent requests, and is, in general,
a very efficient way of performing a sign-in operation while getting ready to invoke API at the
same time. No diagram is shown for the hybrid grant, as you can easily piece it together yourself
by combining the web sign-in flow diagram in the preceding chapter and the authorization code
flow shown here. OpenID Connect is unique in this ability to mix and match sign-in and calling
APIs and having entities playing both roles: a “resource”, as in something being accessed as part
of the sign-in operation, and a client, consuming other resources such as API. The fact that the app
in OpenID Connect is always called a client, emphasizing the latter role and omitting the former,
is a nod to its OAuth 2 origins (and to the fact that “resource” in OAuth 2 is reserved for APIs).
The hybrid grant is a really powerful tool that is commonly used in applications. In fact, today, it's
pretty rare to be able to state that an app will forever either only require sign-in, or only call APIs.
It's usually a continuum, and the availability of this grant makes it easy to add one functionality
or the other by simply modifying either the implicit plus form_post grant or the authorization
code grant.
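If you want to visualize the starting point of such a flow, here’s a sketch of an authorization request under the same assumptions as the earlier examples (hypothetical tenant, placeholder values). The response_type value is what makes the grant hybrid: code feeds the API-calling leg, id_token the sign-in leg, and form_post delivers both to the backend.

const authorizeUrl = new URL("https://example.auth0.com/authorize");
authorizeUrl.search = new URLSearchParams({
  response_type: "code id_token", // the hybrid combination
  response_mode: "form_post",     // POST the results to the redirect_uri
  client_id: "YOUR_CLIENT_ID",
  redirect_uri: "https://app.example.com/callback",
  scope: "openid profile email read:appointment",
  nonce: "random-value-bound-to-session", // required when id_token is requested
}).toString();
// Redirect the browser to authorizeUrl.toString() to start the flow.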
Client Credentials Grant
In the last section of the chapter dedicated to invoking API, we will study the client credentials
grant - a flow defined by OAuth 2 for the cases where a client needs to get access tokens using
its own programmatic identity, rather than doing so on behalf of a user. Unlike the grants we
examined so far, the client credentials grant has no public client variant - it can only be performed
by a confidential client.
All the flows examined so far for calling API are designed to grant clients delegated access to
resources, that is to say, enabling clients to “borrow” some of the user’s privileges for the
duration of the operation.
There are a number of situations in which clients need to operate as themselves, rather than
on behalf of a user. These are scenarios in which the application has an identity and has direct
resource privileges in itself. That class of scenarios doesn’t require a user to be signed in or
otherwise present. Even if a user happens to be signed in at that time of access, their privileges
might not be the ones the client needs to exercise. A classic example of that scenario occurs
when an application needs to perform an operation that the currently signed-in user has no
privilege for. Imagine, for example, a continuous integration (CI) web app in which the final step
of a build process is taking the binaries of a compiled product and saving them in a particular
file share that regular users have no permission to write to.
One way of working around the problem would be to open the floodgates and give every user the
permission to access that share. That would preserve the CI’s ability to call the share in delegated
access mode. However, the risk for abuse would be very high: users might choose to exercise
their newly granted permissions directly, outside of the well-defined workflow the CI app enforces.
An alternative would be to give privileges for file share access to the application itself. In turn,
the application can feature logic that determines which users should be able to write to the
share. So, it can use its own write privileges to perform write operations only for the appropriate
user sessions, and only within the limits of what the CI logic requires. Said another way, by
granting the application itself the privileges required to access a resource, the responsibility of
determining who can do what is transferred from the authorization server to the application itself.
One common way of referring to the aforementioned pattern is to say that the application and
the resource operate as a trusted subsystem.
To use a real-world analogy, consider how a classic amusement park handles visitors’ access. At
the entrance, a visitor pays for a ticket and is given a bracelet or equivalent visible sign that the
individual paid for access. This sign does not need to bear any indication of the identity of the
wearer. Once the guest is in, she can enjoy every ride without any further access control check
beyond displaying the bracelet.
Similarly, once a user signs in with the CI web app, all the subsequent calls to the downstream
API will be performed as the web app itself, just by virtue of the fact that the user successfully
signed in. In a way, you can think of this as a resurgence of the concept of perimeter. However,
the big difference with traditional network perimeter is that the boundaries here are mostly
logical (API’s willingness to accept tokens issued to the CI app client) rather than physical (actual
network boundaries).
This class of patterns is pretty common in the context of microservices, where there is a gateway
that validates the identity of a caller. Once that check has been successfully performed, all the
subsequent calls from the gateway can be performed carrying tokens identifying the calling app
rather than the user. The user information might still be required, but it doesn’t strictly need to
travel in the token securing the call.
As is the case with every confidential client flow, the critical point here is putting particular
care in provisioning client credentials and maintaining them: for example, by making sure that
no entity other than the application has access to its credentials. Another critical aspect of the
scenario, not explicitly covered by the standards but of vital importance, is to carefully choose the
privileges assigned to the application and the application logic exercising them. The least privilege
principle should be the guiding criterion here.
Let's take a look at how the client credentials grant actually works on the wire: please refer to
Figure 4.6.
Figure 4.6
1. Access Token Request
The client application requests a token by contacting the token endpoint directly, similarly
to what we have observed in the server-side segments of all the grants we have studied
so far.
In the sample scenario we have discussed so far, the call is performed during a user
session - however, that is entirely arbitrary. Remember that the client credentials grant
only relies on the client’s own identity rather than requesting delegated authorization from
a user. So, from the OAuth 2 standpoint, the flow described here might just as well occur
outside of any user session, as long as the client is executed in a context where distribution
and protection of client credentials are possible.
The request is a customary HTTP POST, carrying the well-known client_id, client_secret,
and grant_type parameters, the latter set to client_credentials.
Observing the body of the POST message, one notable difference from all the grants
encountered so far is that the message for the token endpoint doesn’t contain any artifact
besides the client_secret. In contrast, the authorization code grant and the refresh token
grant both included some other artifact to redeem. Once again, this shows why those other
flows are conceivable with public clients as well, whereas the client credentials grant isn’t.
Here it’s opportune to stress that client credentials and the client credentials grant are two
separate, distinct concepts. Client ID and client secret are the client credentials assigned
to a confidential client application and are used to identify the client app in every grant
whenever communication with the token endpoint occurs. The client credentials grant is
a grant which happens to require only the client credentials, and no other artifact, to be
performed. It’s easy to get confused when using the terms loosely: whenever you hear
someone mentioning “client credentials”, it’s useful to be clear on whether they are talking
about the grant, or just about the client ID and client secret.
One last observation on the request message: the audience parameter is required to
indicate to the authorization server what resource the client is requesting access to. This
information is necessary for authorization servers that can protect multiple resource servers;
hence there’s no default resource the authorization server can refer to. As mentioned in
our earlier discussions about the audience parameter, the standard way of signaling that
information to the authorization server is through the resource parameter as defined in the
resource indicators specification, which was formalized into an RFC only a few months
before this writing.
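Put together, a sketch of the request might look as follows, with the usual caveats: hypothetical tenant, placeholder identifiers, and an audience value standing in for whatever API your authorization server protects.

async function getAppToken() {
  const response = await fetch("https://example.auth0.com/oauth/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "client_credentials",
      client_id: "YOUR_CLIENT_ID",         // placeholder
      client_secret: "YOUR_CLIENT_SECRET", // the only artifact presented
      audience: "https://api.example.com", // the resource being requested
    }),
  });
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
  return response.json(); // { access_token, token_type, expires_in, ... }
}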
2. Token Response
The token endpoint response is entirely unsurprising, carrying back the requested access
token.
Of course, there is no id_token, given that the grant didn’t entail user identity in any
capacity.
Notably absent is the refresh token, too. In this scenario, it would simply serve no purpose.
The refresh token is meant to allow a client app to obtain a new access token to substitute
an expired one, and to do so without bugging the user with an extra prompt. However,
there is no need to ask anything of a user here, as the client credentials are available to
the app whenever a new access token is needed.
Important note: the mechanism shouldn't be abused. Once a client requests and obtains
an access token, it should keep it around (stored with all the safety measures the task
requires) for the duration of its useful lifetime and use it whenever it needs to call an
API. Discarding still-valid access tokens and requesting a new access token from the
authorization server every time can be a costly anti-pattern, at all levels: security (every
time credentials are sent on the wire, there’s an opportunity for something to go wrong),
performance, and reliability.
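A minimal caching sketch, assuming the expires_in value returned by the token endpoint and the hypothetical getAppToken helper above; the small safety margin absorbs clock skew and in-flight time.

let cached: { token: string; expiresAt: number } | null = null;

async function getCachedAppToken(
  fetchToken: () => Promise<{ access_token: string; expires_in: number }>,
): Promise<string> {
  const marginMs = 60_000; // renew one minute before the declared expiration
  if (cached && cached.expiresAt - marginMs > Date.now()) {
    return cached.token; // still valid: no need to bother the token endpoint
  }
  const res = await fetchToken();
  cached = {
    token: res.access_token,
    expiresAt: Date.now() + res.expires_in * 1000,
  };
  return cached.token;
}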
Note that, in this particular case, Auth0 uses scope to represent what the client can do.
Scopes normally restrict the set of privileges the client can exercise to a subset of the
privileges that the user has, and here there is no user. Even if that appears not quite
appropriate, that's how Auth0 does it today: scope simply represents the privileges that
have been granted to the client application. There is no real security risk because of this:
if a resource server were to interpret the incoming scopes as the delegated authorization
concepts we discussed so far, the power they’d confer to the caller would be less, not more.
However, resource servers should keep in mind that no user was involved when the client
obtained the access token being used to protect that call.
This completes our journey to understand how to leverage OAuth2 and OpenID Connect to
invoke APIs from a traditional web application.
In the next chapter, we'll take a look at native clients, mobile clients and pretty much any
application that an end-user can directly operate… and that isn’t a browser.
Chapter 5 - Desktop and Mobile Apps
COMING SOON
Chapter 6 - Single Page Applications
COMING SOON