
OAuth2 and OpenID Connect:

The Professional Guide - Beta


by Vittorio Bertocci
curated by Andrea Chiarelli

Introduction

Chapter 1 - Introduction to Digital Identity
From User Passwords in Every App...
...to Directories
Cross-Domain SSO
The Password Sharing Anti-Pattern
Delegated Authorization: OAuth2
Layering Sign In on Top of OAuth2: OpenID Connect
Auth0: an Intermediary Keeping Complexity at Bay

Chapter 2 - OAuth2 and OpenID Connect
OAuth2 Roles
OAuth2 Grants and OIDC Flows

Chapter 3 - Web Sign-In
Confidential Clients
The Implicit Grant with form_post
A detailed walkthrough
Anatomy of an ID Token
Principles of Token Validation
Metadata and Discovery

Chapter 4 - Calling an API from a Web App
The Authorization Code Grant
Sidebar: Essential Authorization Concepts and Terminology
The Refresh Token Grant
Sidebar: Access Tokens vs. ID Tokens
ID Tokens and the Back Channel
The Userinfo Endpoint
The Hybrid Grant
Client Credentials Grant

Chapter 5 - Desktop and Mobile Apps [COMING SOON]

Chapter 6 - Single Page Applications [COMING SOON]
Introduction
This book will help you to make sense of OAuth, OpenID Connect, and the many moving parts

that come together to make authentication and delegated authorization happen.

You will discover how authentication and authorization requirements have changed over the years, and

how today’s standard protocols evolved and augmented their ancestors to meet those challenges

- problems and solutions locked in an ever-escalating arms race.

You will learn both the whys and the hows of OAuth2 and OpenID Connect. You will learn what

parts of the protocol are appropriate to use for each of the classic scenarios and app types

(sign-on for traditional web apps, Single Page Apps, calling APIs from desktop, mobile, and web

apps, and so on). We will examine every exchange and parameter in detail - putting everything

in context and always striving to see the reasons behind every implementation choice within

the larger picture.

After reading this book, you will have a clear understanding of the classic problems in authentication

and delegated authorization, the modern tools that open protocols offer to solve those problems,

and a working knowledge of OAuth2 and OpenID Connect. All that will allow you to make informed

design decisions - and even to know your way through troubleshooting and network traces.

Chapter 1 - Introduction to Digital Identity

In this chapter, you will grasp some of the essentials of identity, both in terms of concepts and the jargon that we like to use in this context. And you'll get a good feel for the problems, the classic dragons that we want to slay in the identity space, which also happen to be the things that Auth0 can take care of for our customers.

Without further ado, what is the deal with identity? Why is everyone always saying, "Oh, this is complicated"? Just look at the following picture. It is trivially simple: there are just two bodies here; in a basic physics course, this would be one of the easy problems.

Figure 1.1

I have a resource of some kind, and I have a user — an entity of some kind that wants to access

that resource in some capacity. It's just two things doing one action. Why is this so complicated?

Well, for one, there's the fact that this is mission-critical.

When something goes wrong in this scenario, it goes catastrophically wrong. And so, like every

mission-critical scenario, of course, it deserves our respect and our attention, and our preparation.

There is a lot of energy that goes into preventing this catastrophic scenario from coming true.

But in this specific domain of development, the thing that makes this complex is the Cartesian product of all the factors that come into play to determine what you have to do to arrive at a viable

solution. Consider the following factors:

Resource types: just think of all the types of resources you can have. Just a few years ago,

if you’d walk in a bank, you'd have a host, they’d have some central database, and that's it.

Today, conversely, pretty much everything is accessible programmatically. So you have the

API economy, you have serverless — all those buzzwords actually point to different ways

of exposing resources and, of course, websites, apps, and all the things that you use in

your daily life. Whenever you interact with a computer system, there is a kind of resource

that you have to connect to. And, from the point of view of a developer, implementing that

connection is actually a lot of work.

Development stacks: there are minor differences between development stacks that translate

into big differences in the code that you have to write for implementing access to a resource

and the way in which you interact with it. This is one level of complexity.

Identity sources: the other level of complexity is the sheer magnitude of the sources of

identities that you can use today.

Think of all the ways in which your own identity gets expressed online. You can be a member

of a social network, an employee of one company, a citizen of a country. And all of those

identities somewhat get expressed in a database somewhere, and that somewhere determines

how you pull this information out.

You connect to Facebook in a certain way. You connect to Active Directory in a different way.

You get recognized when you're paying your taxes to your country in yet another way. So,

again, we encounter another factor of complexity: if you want to extract identity from these

repositories, you have to find a way of doing it according to each repository’s requirements

and characteristics.

Client types: Finally, there are many more complexity factors, but I just want to mention

another one: the incredible richness with which we can consume information today. Think of

all the possible clients that you can use from your mobile phone and applications to websites,

to your watch. You can literally use anything you want to access the data. And again, this compounds the complexity, multiplying with the kinds of resources you want to access and the places from which you extract identity information. So, this picture might look simple, but it's anything but.

Now, what can Auth0 do for you to make this a bit more manageable? We offer many different

things but, in particular, the most salient component of our offering is our service. It is a service

that you can use for outsourcing most of the authentication functions that you need to have in

your solutions - so that you don't have to be exposed to that complexity. In particular, we offer:

ways of abstracting away the details of how you connect to multiple sources of identities.

Every identity provider will have a different style of doing the identity transactions, and we

abstract all of that away from you.

a way of dealing with the user-management lifecycle. We have user representations and

features for dealing with the lifecycle of users and similar.

a very large number of SDKs and samples, which help you to cross the last mile so that when

you're using a particular development stack, you can actually use components to connect

to Auth0 in a way that is aligned with the idiom that you're using in that context.

a degree of customization ability that is absolutely unprecedented in the industry. There is

no other service at this point that offers the same freedom you have with Auth0 to customize

your experience.

Now, when you need to connect your application to Auth0, you need to do something to tell

us, "Auth0, please do authentication". And that something in Auth0 is implemented using open

standards.

Open standards are agreements, wide consensus agreements that have been crafted by

consortiums of different actors in the industry. We identity professionals decided to work on

open standards when we came to the realization that everyone - users, customers, and vendors -

would be better off if we enshrined in common standards the common messages, common protocols, and some of the transactions that we knew needed to occur when you're doing

authentication, and similar. What happened back then is that we went to semi-expensive hotels

around the world, met with our peers across the industry, and argued about how applications should

present themselves when offering services in the context of an identity transaction. We discussed

similar considerations for identity providers. What kind of messages should be exchanged? We

literally argued message details down to the semicolon. That's how fun standards authoring is,

but it's all worth it: now that we have open standards and all vendors implemented the open

standards, you, as the customer, can choose which vendor you want to use without worries about

being locked into a particular technology or vendor. Above all, you can plan to introduce different

technologies afterward, without worrying about incompatibilities.

Of course, this is mostly theory: a bit like those simplified school problems disregarding friction

or the moon's gravity influencing the tides. In reality, there are always little details that you need to

iron out. But largely, if you've worked in our industry for the last couple of decades, you know that

we are so much better off now that we have those open standards we can rely on.

In identity management, you're going to get in touch with many protocols, some of which probably haven't even been invented yet.

The ones that are a daily occurrence nowadays are:

OpenID Connect, which is used for signing in

OAuth2, which is the basis of OpenID Connect and is a delegation protocol designed to help

you access third party APIs

JSON Web Token or JWT, which is a standard token format. Most of the tokens you'll be

working with are in this format

SAML, which is a somewhat legacy (but still very much alive) protocol that is used for doing

single sign-on across domains for browsers. SAML also defines a standard token format,

which has been very popular in the past and is still very much in use today.

From User Passwords in Every App...

Let’s spend the next few minutes going through a time-lapse-accelerated-whirlwind tour of how

authentication technologies evolved. My hope is that by going back to basics and revisiting this

somewhat simplified timeline, I'll have the opportunity to show you why things are the way they are

today. In doing so, I’ll also have the opportunity to introduce the right terms at the right time. By

being exposed to new terminology at the correct time, that is to say, when a given term first arose,

you will understand what the corresponding concepts mean in the most general terms. Contrast

that with the narrower interpretations of a term's meaning you'd end up with if you were exposed

to it only in the context of solving a specific problem. You might end up thinking that the problem

you are solving at the moment is the only thing the concept is good for, missing the big picture and

potentially stumbling into all sorts of future misunderstandings. We won't let that happen!

Let's go back to the absolute basics and think about the scenario that I described earlier in Figure

1.1 - the scenario in which I have one resource of some kind, let's say, a web application and a user,

and we want to connect the two. Now, what is identity in this context?

We won't get bogged down with philosophy and similar. Identity here can be defined in a very

operational, very precise fashion. We call digital identity the set of attributes that define a particular

user in the context of a function which is delivered by a particular application. What does it mean?

That means that if I am a bookseller, the relevant information I need about a user is largely their

credit card number, their shipping address, and the last ten books that the user bought. That's their

digital identity in that context. If I am the tax department, then the digital identity of a user is again,

a physical address, an identifier (in the USA, that's the Social Security number), and any other

information which is relevant to the business of extracting money from the citizen. If I am a service

that does DNA sequencing, the identity of my user is the username that they use for signing in,

their email address for notifications, and potentially their entire genome.

You can see how for all the various functionalities that we want to achieve, we actually have a

completely or nearly completely different set of identities. These might correspond to the same

physical person or not. It doesn't matter. From the point of view of designing our systems, that's

what the digital identity is. So, you could say that the digital identity of this user is this set of

attributes we can place in the application’s store. Now the problem of identity becomes: when

do I bring those particular attributes into context? The oldest trick in the world is to have the

resource and the user agree on something such as a shared secret of some sort. So, when the

user comes back to the site and presents that secret, demonstrating knowledge of it,

the website will say, okay, I know who you are, you’re the same user I saw yesterday. Here is

your set of attributes, welcome back. I authenticated the user. In summary, that means grabbing a set of credentials, sending them over, and comparing them against the credentials previously saved in a database. If they match, the user is authenticated.

This scenario is summarized in the following picture:

Figure 1.2
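
To make those mechanics concrete, here is a minimal Python sketch of the shared-secret check. It is illustrative only - the in-memory user store and function names are hypothetical, and a real system would add password policies, rate limiting, and a hardened storage backend.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory user store: username -> (salt, password hash).
# In a real application this would live in the application's database.
users = {}

def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    users[username] = (salt, digest)

def authenticate(username: str, password: str) -> bool:
    if username not in users:
        return False
    salt, stored = users[username]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, stored)

register("alice", "correct horse battery staple")
assert authenticate("alice", "correct horse battery staple")
assert not authenticate("alice", "wrong password")
```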

Now, you hear a lot of bad things about username and password... and they are all true. That's

unfortunate, but it's true. However, it is an extraordinarily simple schema, and as such, it is very,

very, very resilient. Even if we have more advanced technologies, which do more or less the

same job, passwords are still very popular. I predict that this year, like every year, someone will

say that this is the year in which passwords will die. But I think that passwords will still be around

for some time. My favorite metaphor for this is what happens in the natural world. Humans are

allegedly the pinnacle of evolution. However, there are still plenty of jellyfish in the sea. They are

so simple, and sure, we are more advanced, but I am ready to bet that there are more individual

jellyfish than there are humans. The fact that their body plan is simple doesn't mean that it is

not successful. You'll see, as we go through this history, that passwords are somewhat the building blocks that more advanced protocols layer on top of. Again, I'm not discounting the efforts

of eliminating passwords and using something better, but I'm just trying to set expectations that

it's still going to take some time.

... to Directories

Let's make things a bit more interesting. Imagine the scenario in which we have one user and

one application. Now, extend this scenario to the situation in which this user is an employee of

some company. There is a collection of applications being used by this particular user in the

context of the company’s business. Most applications are all part of what the user does in the

context of his or her employment. Imagine that one application is for expense notes, the other is

for accounting, the other is for warehouse management. Anything you can think of. A few years

ago, what happened was that we had a bunch of apps on a computer. Then, we had someone

showing up with a coaxial cable, installing token ring networks, and placing all these computers

in the network. But that alone didn't make the environment, and in particular the applications,

automatically network ready. What happened is that you'd have exactly the situation in which a user accessed different independent apps which knew nothing

about each other, and which replicated all the functionality that could have been easily centralized.

In particular, every user had different usernames and passwords - or I should say different

usernames, because, of course, people reuse their passwords. Every time users went to a new

app, they had to enter their credentials. And whenever a user had to leave the company, willingly

or not, the administrator had to go on a pilgrimage across all these various apps, chase down the user's entries in there, and deprovision them by hand, which of course is a tedious and error-prone process.

It's difficult. You often hear horror stories of disgruntled employees using procurement systems

to buy large amounts of items just to get back at their former bosses - and being able to do so because their credentials in the procurement system weren't revoked in a timely manner.

That wasn't a great situation, to say the least.

What happened is that the industry responded by introducing a new entity, which we call the

directory. The directory is still extremely popular. It is a software component, a service, which

centralizes a lot of the functionalities that you see in Figure 1.3.

Figure 1.3

Basically, the directory centralized credentials and attributes and made it redundant for applications

to implement their own identity management logic. At this point, users would simply sign in with

their own central directory, and from that moment onward, they'd have Single Sign-On access

to all the other applications. The application developers didn't actually have to code anything

for identity to achieve that result. In fact, now that the network infrastructure itself provided the

identity information, administrators could now take advantage of this centralized place to deal

with the user lifecycle. It can be said that the introduction of the directory is what truly created

identity administrators as a category of professionals. The ubiquitous availability of directories

created an ecosystem of tooling that helps people to run operations, identities, and similar. So,

a fantastic improvement - which was predicated on the perimeter. In order for all this to work as

intended, you had to have all the actors within that perimeter. The perimeter was often the office

building itself, with users actually walking in the building, sitting in front of a particular physical

device, and having direct “line of sight” with this cathedral in the center of the enterprise: the

directory, a central place knowing everything about everyone.

Cross-Domain SSO

Of course, we know from current business practices that this approach doesn't scale. It works

well when you are within one company, but there are so many business processes that require

having more than one company.

Think of a classic supplier or reseller. Any of those relationships requires spanning multiple

organizations. And so what happens is that when you have a user in one organization that needs

to access a different resource in a different organization, you have a problem. In fact, this user

does not exist in the resource side directory.

The first way in which the industry tried to give a solution to this problem was to introduce what

we call shadow accounts, which means provisioning the user to the resource side directory.

This is completely unsustainable, as it presents the same problems that we mentioned earlier at

a different scale when every application handled identity explicitly. Let's say that we have a user

whose lifecycle is managed in one place, their own home directory, but that has been provisioned

an entry in the resource side directory as well. When the user is deprovisioned from their home

directory, then there might be a trail of user accounts provisioned in other directories (such as

the resource side directory in our scenario) that are still around and that need to be manually

deprovisioned. That's, of course, a big problem because the deprovisioning isn't likely to happen in a timely manner - and, like any change in general, it is harder to reflect in distributed systems that aren't centrally

managed. Plus, imagine the complexity of having this company, which may be a reseller for many

other companies, but needs to duplicate somewhat the work that its customer companies are

already doing in their own directories for managing their own users. It's just not sustainable.

So, what happened was that, as is classic in computer science, we solved this problem by

adding a level of abstraction. We took the capabilities that we have seen for the local directory

case, and we just abstracted them away. We provided the same transactions, but we described

them in a way that is not dependent on network infrastructure. For example, Active Directory

and directories in general, rely on an authentication protocol called Kerberos, which is very much

integrated with the network layer and hence has specific network hardware requirements. Whereas, of

course, in this case of scenarios spanning multiple companies, we have to cross the chasm of the

public Internet and cannot afford to impose any requirements as requests will traverse unknown

network hardware.

What happened is that the big guys of that time, Sun, IBM and similar, sat at one table and came

up with this protocol called SAML, which stands for Security Assertion Markup Language. In a

nutshell, the protocol described a transaction in which a user can sign in in one place and then

show proof of signing in at another place and gain access. Here's how it works. We need something that fronts the actual resource with some software capable of speaking that protocol, which in this particular case is going to be what we call a middleware: a component that

stands between your application and the caller, intercepting traffic and executing logic before

the requests reach the actual application. Similar protocol capabilities would be exposed on the

identity provider side. In the topology shown in figure 1.3, we have the machine already fulfilling

the local directory duties (what we call the domain controller in the directory jargon), and we just

teach that machine to speak a different language, SAML, which can be considered somewhat of

a trading language that we can use for communication outside the company’s perimeter.

In order to close this transaction, what happens is that we need to introduce another concept:

trust. Think of the scenario we were describing earlier, the one within one single directory: in it,

every application and every user implicitly believes and trusts the domain controller. The network

software in itself, whenever you need to authenticate, will send you back to the domain controller

and the domain controller will do its authentication. It is just implicit, it's as natural as the air that

you're breathing because there is only one place that can perform authentication duties in the

entire network.

Now, look at this particular scenario:

Figure 1.4

The application within the Company 2 perimeter can be accessed by any of its business partners:

there is now a choice about where to get user identities from; there is no longer an obvious default source of users. We say that a resource trusts an identity provider or an authority

when that resource is willing to believe what the authority says about its users. If the authority

says: “this user is one of my users and successfully authenticated five minutes ago”, then the

resource will believe it. That's all trust means.

When you set up your middleware in front of your application, you typically configure it with the

coordinates of the identity providers that you trust. How does that come into play when you

actually make a transaction? Let's see how this works in an actual flow by describing in detail

each numbered step shown in the following figure:

Figure 1.5

In the first leg of the diagram, the user points the browser to the application and attempts to

GET a page (1). The middleware in front of the application intercepts the request, sees that the

user is not authenticated, and turns the request into an authentication request to the identity

provider (IdP), as it is configured as one of the trusted IdPs (2).

In concrete terms, the middleware will craft some kind of message, probably a URL with specific

query string parameters, and will redirect the browser against one particular endpoint associated

with the identity provider (3).

In this particular scenario, the target endpoint belongs to a local identity provider. You can see that

the call to the IdP authentication endpoint is occurring within the boundaries of the enterprise.

That means that that call will be authenticated using Kerberos, like any other call on the local

network. You can already see this layering of protocols, one on top of the other. Thanks to the

use of Kerberos and the fact that the user is already authenticated with the local directory, the

user will not have to enter any credentials during this call.

Next, the identity provider establishes that the user is already correctly authenticated, and

verifies that the resource is one of the resources that have been recorded and approved.

Because of those positive checks, the IdP issues to the user what we call a security token (4).

A security token is an artifact, a bunch of bits, which is used to carry a tangible proof that the

user successfully authenticated. Security tokens are digitally signed. What does it mean? A

digital signature is something that protects bits from tampering. Let's say that someone modifies

anything of those bits in transit: when the intended recipient tries to check the signature, it will

find that the signature does not compute. The recipient will know for sure that those bits have

been modified in transit.

This property is useful for two reasons. One reason is that given that we use public-key

cryptography, we expect that the private key that was used to perform the signature is only

accessible by the intended origin of this token. No one else in the universe can produce that signature but that particular party. Remember what we just said about trust: that property

can be used as proof that a token is coming from a specific entity, and in particular, whether it

is a trusted one.

The second reason is that given that the token content cannot be modified in transit without

breaking the signature, I can use tokens as a mechanism to provide the digital identity of a user

on the fly. Instead of having to negotiate in advance the acquisition of the attributes that define

the user (the user identity, according to our definition), as an application, I can receive those

attributes just in time, together with the token. This might be the first and the last time that this

particular user accesses this application, but thanks to the fact that there is a trust between the

two organizations, I didn't need to do any pre-provisioning steps.
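
As an illustration of the sign-and-verify mechanics just described, here is a minimal Python sketch using the third-party cryptography package. The token payload is a made-up example; real security tokens carry much more structure, but the tamper-detection principle is the same.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Issuer side: only the issuer holds the private key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
token_bytes = b'{"sub": "alice", "authenticated": true}'  # made-up payload
signature = private_key.sign(token_bytes, padding.PKCS1v15(), hashes.SHA256())

# Recipient side: verify with the issuer's published public key.
public_key = private_key.public_key()
try:
    public_key.verify(signature, token_bytes, padding.PKCS1v15(), hashes.SHA256())
    print("Signature checks out: untampered, and from the trusted issuer.")
except InvalidSignature:
    print("The bits were modified in transit, or signed by someone else.")
```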

The attributes that travel inside tokens are called claims. A claim is simply an attribute packaged

in a context that allows the recipient to decide whether to believe that the user does indeed

possess that attribute. Think about what happens when boarding a plane. If I present my passport

to the gate agents, they will be able to compare my name (as asserted by the passport) with the

name printed on my boarding pass and decide to let me go through. The gate agents will reach

that conclusion because they trust the government, the entity that issued my passport. If I pulled out a Post-it with my name jotted down in my scrawny chicken-scratch handwriting and presented it

to the gate agents in lieu of the passport, I'm probably not going to board the plane - in fact, I'm

likely going to be in trouble. The medium truly is the message in this case. The token really does

carry the context needed for deciding whether or not to trust that particular information. Attributes

inside tokens become claims. It is an important difference.

Once the identity provider issues a SAML token, it typically returns it to the browser inside an

HTML form, together with some JavaScript that triggers as soon as the page is loaded - POSTing

the token to the application, where it will be intercepted by the middleware (5).

The middleware looks at the token, establishes whether it's coming from a trusted source,

verifies that the signature hasn't been broken, and so on; if it's happy with all that, it

emits what we call a session cookie (6). The session cookie represents the fact that successful

authentication occurred. By setting a cookie to represent the session, the application will be

spared from having to do the token dance again for every subsequent request. The session

cookie is simply used for enabling the application to consider the user authenticated every time

the application receives a subsequent request.

This is how SAML solved the particular problem of cross-domain single sign-on. We’ll see that

this pattern of exchanging a token for a cookie will also occur with OpenID Connect.
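
Here is a rough sketch of that token-for-cookie pattern, written as a hypothetical Flask application. The route names and the validate_token helper are invented for illustration; the point is simply that the token is validated once and then exchanged for a session cookie.

```python
from flask import Flask, redirect, request, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"  # used to sign the session cookie

IDP_LOGIN_URL = "https://idp.example.com/sso"  # hypothetical IdP endpoint

def validate_token(token: str) -> dict:
    # Placeholder: a real implementation verifies the signature, the issuer,
    # the intended audience, and the validity window before trusting claims.
    return {"name": "alice"}

@app.route("/protected")
def protected():
    # Middleware-style check: a valid session cookie means the token
    # exchange already happened, so we skip the dance on later requests.
    if "user" in session:
        return f"Hello, {session['user']}!"
    return redirect(IDP_LOGIN_URL)

@app.route("/acs", methods=["POST"])
def assertion_consumer():
    token = request.form["token"]     # POSTed back by the IdP via the browser
    claims = validate_token(token)    # trusted issuer? intact signature?
    session["user"] = claims["name"]  # exchange the token for a session cookie
    return redirect("/protected")
```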

The Password Sharing Anti-Pattern

All this happened in the business world, but the consumer world also didn't stay still from the

identity perspective. One thing that happened was that, as we got more and more of our lives

online, we found ourselves more and more often with the need to access resources that we

handle in a certain application... from a different application.

Let me give a very concrete example. I guess that many of you have LinkedIn, and many of you

also have Gmail. Imagine the following scenario. Say that a user is currently already signed in to LinkedIn, in whatever way they want. The mechanics of how they got signed in to LinkedIn are not the point in this scenario. Say that LinkedIn wants to suggest that you invite all of your Gmail

contacts to become part of your LinkedIn network.

We are using LinkedIn and GMail only because they are familiar names with familiar use cases,

but we are in no way implying that they are really implemented in this way nor that they played

any direct role in authoring this book.

Now, how did LinkedIn use to do this? I'm using LinkedIn as an example here, but it's basically

the behavior of any similar service you can think of before the rise of delegated authorization.

Let’s take a look at this flow by following the steps in the following figure.

Figure 1.6

LinkedIn would actually ask you for your Gmail username and password, which are normally

stored and validated by Gmail (1). You provide LinkedIn with your Gmail credentials (2), and

then, LinkedIn would use them to actually access the Gmail APIs used by the Gmail app itself for

programmatic access to its own service (3). This would achieve what LinkedIn wants, which is

to call the APIs in Gmail for listing your contacts (4) and sending emails on your behalf.

What is the problem with this scenario? Many problems, but two, in particular, are impossible to

ignore.

The first problem is that granting access to your credentials to any entity that is not the custodian

of those credentials is always a bad idea. That is mostly because those different entities will not

have as much skin in the game as the entity that is actually the original place for those credentials.

If LinkedIn does not apply due diligence and saves those credentials in an insecure place... sure, they'd get bad PR, but it will not be the catastrophe that it would be for Gmail, whose users' access is now impacted. For example, Gmail users will need to change passwords, creating a

situation where they are highly likely to defect or at least to experience lower satisfaction with

the service.

Here’s the second bad thing. Although the intent that LinkedIn had with this transaction was good

(it is mutually beneficial both for me as a user and for LinkedIn as a service for me to expand my

network), the way in which they have implemented the function gives them way too much power.

LinkedIn can actually use this username and password to do whatever they want with my Gmail.

They can read my emails, they can delete emails selectively, they can send other emails, they can

do everything they want beyond the scenario originally intended - and that's clearly not good.

Delegated Authorization: OAuth2

In response to the challenges outlined at the end of the preceding section, the industry came up

with a way of working around the problem of giving too much power to applications.

OAuth2 was designed precisely to implement the delegated access scenario described earlier,

but without the bad properties we identified as part of the brute force approach. The defining

feature of the OAuth2 approach lies in the introduction of a new entity, the authorization server,

which explicitly handles operations related to delegated authorization. I won't go too much into

the details right now, because I'm going to bore you to death about it later on in this book.

Suffice to say here that the authorization server has two endpoints:

The authorization endpoint, designed to deal with the interaction with the end-user.

It's designed to allow the user to express whether they want a certain service to access

their resources in a certain fashion. The authorization endpoint handles the interactive

components of the delegated authorization transaction.

1 The first incarnation of OAuth was OAuth1, a protocol that resolved the delegated access scenario but had several limitations and
complications. The industry quickly came up with an evolution, named OAuth2, which solved those problems and completely supplanted OAuth1 for all intents and purposes. For that reason, in this text we only discuss OAuth2.

The token endpoint, which is designed to deal with software to software communication and

takes care of actually executing on the intent that the user expressed in terms of permission,

consent, delegation, and similar concepts. More details later on.

Please note: in the following discussion, we are assuming that the user is already signed in

to LinkedIn even before the described scenario plays out. We don't care how the sign-in occurred

in this context; we just assume it did. OAuth2, as you will hear over and over again, is not a

sign-in protocol.

Let's say that, as part of his or her LinkedIn session, the user gets to a point in which LinkedIn

wants to gain access to Gmail API on his or her behalf, as described in the last section for the

analogous scenario.

In the OAuth2 approach, that means that LinkedIn will cause the user to go to Gmail and grant

permission to LinkedIn to see their contacts and send mail on their behalf. Let’s follow this new

flow by taking a look at this figure:

Figure 1.7

LinkedIn follows the OAuth2 specification to craft an authorization request and redirect the user’s

browser to GMail’s authorization server and, in particular, the authorization endpoint (1).

The authorization endpoint is used by Gmail to prompt the user (2) for credentials if they are

not currently authenticated with the GMail web application. This is all within the natural order

of things. In fact, it's Gmail asking a Gmail user for Gmail credentials. So, no foul play here,

everything is fine. As soon as the user is authenticated, the Gmail authorization server will prompt

the end-user, saying something along the lines of, "Hey, I have this known client, LinkedIn, that

needs to access my own APIs using your privileges. In particular, they want to see your contacts,

and they want to send emails on your behalf. Are you okay with it?"

Once the user says okay, presumably, the authorization server emits an authorization code (3).

An authorization code is just an opaque string that constitutes a reminder for the authorization

server of the fact that the user did grant consent for those permissions for that particular client.

The authorization code is returned to LinkedIn via browser (4). From now on, the rest of the

transaction occurs on the server side.

Please note: before any of the described transactions could occur, LinkedIn had to go to the

authorization server and register itself as a known client. As part of the client registration operation,

LinkedIn received an identifier (called client id) and, most importantly, a client secret. The client

id and client secret will be used for proving LinkedIn’s identity as an application in requests sent

to GMail’s authorization server, in particular to its token endpoint. The remainder of the diagram

explanation will give you an example of how this occurs.

Now that it obtained an authorization code, LinkedIn will reach out to the token endpoint of the

authorization server (5) and will present its own credentials (client id and client secret) and

the authorization code, substantially saying, "Hey, this user consented for this and I'm LinkedIn.

Can I please get access to the resource I want?"

As an outcome of this, the authorization server will emit a new kind of token, which we call an

access token (6). The access token is an artifact that is used to grant to LinkedIn the ability to

access the Gmail APIs (7) on the user’s behalf, only within the scope of the permissions that the

user consented to (8).
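
To make the walkthrough concrete, here is a hedged Python sketch of steps 1, 5, and 6: building the authorization request that the browser is redirected to, and later redeeming the authorization code at the token endpoint. All endpoints, client credentials, and scope names are hypothetical placeholders, not Gmail's actual values.

```python
import secrets
from urllib.parse import urlencode

import requests

# Hypothetical endpoints and registration values, for illustration only.
AUTHORIZE_ENDPOINT = "https://accounts.example.com/authorize"
TOKEN_ENDPOINT = "https://accounts.example.com/token"
CLIENT_ID = "linkedin-client-id"
CLIENT_SECRET = "linkedin-client-secret"
REDIRECT_URI = "https://www.example-client.com/callback"

# Step 1: redirect the user's browser to the authorization endpoint.
params = {
    "response_type": "code",             # ask for an authorization code
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "scope": "contacts.read mail.send",  # the permissions being requested
    "state": secrets.token_urlsafe(16),  # anti-forgery value, echoed back
}
authorization_url = f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"

# Steps 5 and 6: once the code comes back via the browser, redeem it
# server to server at the token endpoint, authenticating as the client.
def redeem_code(code: str) -> dict:
    response = requests.post(TOKEN_ENDPOINT, data={
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    })
    return response.json()  # contains the access token, scoped to the consent
```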

This solves the excessive permissions problem described in The Password Sharing Anti-Pattern

section. In fact, as long as LinkedIn accesses the Gmail APIs only attempting operations the user

consented to, the requests to the API will succeed. As soon as LinkedIn tries to do something

different from the consented operations, like, for example, deleting emails, the endpoint will deny

LinkedIn access, because the access token accompanying the API call is scoped down to the

permissions the user consented to (in our example, read contacts and send emails). Scope is

the keyword that we use here to represent the permissions a client requested on behalf of the

user. This mechanism effectively solved the problem of excessive permissions, providing a way

to express and enforce delegated authorization.

What we described so far is the canonical OAuth2 use case, the one for which the protocol has

been originally designed. In practice, however, OAuth2 is used all over the place, and it incurs all sorts of abuses - that is, it gets used in ways in which OAuth2 wasn't designed to be used. Be on the lookout

for those problematic scenarios: every time you hear that some solution uses OAuth2, please

think of the canonical use case as described here first. OAuth2 supports many other scenarios,

and in this book, we will discuss most of them. However, the core intent is as expressed in the use

case we described in this section. Thinking about whether a solution is using OAuth2 in line with

the intent expressed here, or diverges from it significantly, is a useful mental tool to verify whether

you are dealing with a canonical scenario or if you need to brace for non-standard approaches.

Layering Sign In on Top of OAuth2: OpenID Connect

Let me give you a demonstration of one particularly common type of OAuth2 abuse. As OAuth2

and delegated authorization scenarios started gaining traction, many application developers

decided that they wanted to do more than just calling APIs. They wanted to achieve, in the consumer space, what we achieved with SAML. They wanted to allow users to sign in to their apps

reusing accounts living in a completely different system. Instantiating this new requirement in the

scenario we’ve been discussing, LinkedIn might like users with a Gmail account to be able to use

it to sign in to LinkedIn directly, without the need to create a LinkedIn account. In other words,

LinkedIn would just want users to be able to sign up for LinkedIn reusing their Gmail accounts.

This is a sound proposition because, in many cases, people typically aren't crazy about creating

new accounts, new passwords, and similar. So, making it possible to reuse accounts is not a bad

idea in itself.

However, OAuth2 was not designed to implement sign-in operations. Most providers only exposed

OAuth2 as a way of supporting delegated authorization for their API, and did not expose any

proper sign-in mechanism as it wasn’t the scenario they were after. That didn’t deter application

developers, who simply piggybacked on OAuth2 flows to achieve some kind of poor man's sign-in. Imagine the delegated authorization scenario described for the canonical OAuth2 flow and imagine it taking place with the user not being previously signed in to LinkedIn. The following

picture describes this flow:

Figure 1.8

LinkedIn can perform the dance to gain access to Gmail APIs without having any authenticated

user signed in yet (1). As soon as LinkedIn successfully accesses Gmail APIs (2), it might reason,

“Okay, this proves that the person interacting with my app has a legitimate account in Gmail”,

so LinkedIn might be satisfied by that and consider this user authenticated - which in practice

could be implemented by creating and saving a session cookie (3), as we did during sign-in

flows earlier, when we discussed the SAML approach.

This would be a good time to remind you that we are using LinkedIn and GMail only

because they are familiar names with familiar use cases, but we are in no way implying

that they are really implemented in this way.

This pattern for implementing sign-in is still a common practice today. A lot of people do this.

It's usually not a good idea, mainly because access tokens are opaque to the clients requesting

them, which makes many important details impossible to verify. For example, the fact that an

access token can be used for successfully calling an API doesn't really say anything about

whether that access token was issued for your client or for some other application. Someone

could have legitimately obtained that access token via another application (in our scenario

not as LinkedIn, but as some other app) and then somehow managed to inject the token in

the request. If LinkedIn just uses that token for calling the API and it reasons, “Okay, as long

as I can use this token to call the API without getting an error, I’ll consider the current user

authenticated”, then LinkedIn would be fooled into creating an authenticated session.

Another consequence of the fact that access tokens are opaque to clients is that an attacker

could get a token from a user and somehow inject it into the sign-up operation for a completely

different user. Once again, LinkedIn wouldn't know better because unless the API being called

returns information that can be used to identify the calling user, the sheer fact that the API

call succeeds will not provide any information the client can use to determine that an identity

swap occurred.

The attacks that I'm describing are known as Confused Deputy attacks, and they are a classic

shortcoming of piggybacking sign-in operations on top of OAuth2.

Even more aggravating: with this approach, there is no way to standardize the OAuth2 based

sign-in flow. In our model scenario, the last mile is a successful call to Gmail APIs. If I want to

apply the same pattern with Facebook, the last mile would be a successful call to the Facebook

Graph APIs, which are dramatically different from the GMail API. That makes it impossible to

enshrine this pattern in a single SDK that can be used to implement sign in with every provider

across the industry, even if they all correctly support OAuth2.

This is where the main players in the industry once again came together and decided to

introduce a new specification, called OpenID Connect, which formalizes how to layer sign-in on top of OAuth2. I'll go into painstakingly fine detail about that effort in the rest of the

book, but in a nutshell, the central point of the approach is the introduction of a new artifact,

which we call the ID token. The ID token can be issued by an authorization server via all the

flows OAuth2 defines. OpenID Connect describes how applications can, instead of asking

for an access token (or alongside access token requests), ask for an ID token. The following

picture summarizes one such flow:

Figure 1.9

An ID token is a token meant to be consumed by the client itself, as opposed to being used

by the client for accessing a resource. The characteristic of the ID token is that it has a fixed

format that clients can parse and validate. The use of a known format and the fact that the token

is issued for the client itself means that when a client requests and obtains an ID token, the

client can inspect and validate it - just like web apps secured via SAML inspected and validated

SAML tokens. It also means the ability to extract identity information from it, once again, just

like we learned is common practice with SAML. Those properties are what makes it possible

to achieve proper sign-in using OAuth2. The novelties introduced by OpenID Connect didn't

stop there: the new specification introduced new ways of requesting tokens, including one in

which the ID token can be presented to the client directly via the front channel, between the

browser and the application. That makes it possible to implement sign in very easily, just like

we have learned in the SAML case, without having to use secrets and a back-end integration flow as the canonical OAuth2 API invocation pattern required.
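
As a sketch of what "inspect and validate" looks like in practice, here is a hypothetical Python example using the PyJWT library. The issuer and client id are placeholders; note how checking the audience claim ties the token to this specific client, which is exactly the guarantee the bare OAuth2 access token could not provide.

```python
import jwt  # PyJWT
from jwt import PyJWKClient

ISSUER = "https://accounts.example.com"       # hypothetical provider
CLIENT_ID = "my-client-id"                    # this application's client id
JWKS_URL = f"{ISSUER}/.well-known/jwks.json"  # advertised via discovery

def validate_id_token(id_token: str) -> dict:
    # Fetch the issuer's public signing key matching the token's key id.
    signing_key = PyJWKClient(JWKS_URL).get_signing_key_from_jwt(id_token)
    # The fixed, known format lets the client check signature, issuer,
    # audience, and expiration before trusting any claim inside.
    return jwt.decode(
        id_token,
        signing_key.key,
        algorithms=["RS256"],
        audience=CLIENT_ID,  # was this token issued to *this* client?
        issuer=ISSUER,
    )
```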

What we have seen in this chapter can be thought of as a rough timeline for the sequence

of events that culminated with the creation of OpenID Connect. In the next chapters, we will

expand on the high-level flows described here, going deep into the details of the protocol.

Auth0: an Intermediary Keeping Complexity at Bay
What's the role of Auth0 in all this? You can think of Auth0 as an intermediary that has all the protocol capabilities needed to talk to pretty much any application or identity provider that supports the standard protocols, such as OAuth2, OpenID Connect, SAML, and WS-Federation.

Figure 1.10

You can simply integrate your application with Auth0, which, in a nutshell, is a super authorization

server, using any of the standard protocol flows we described in this chapter. From that moment

on, Auth0 can take over the authentication function: when it’s time to authenticate, your app

can redirect users to Auth0 and, in turn, Auth0 will talk to the different identity providers you

want to integrate with, in each case using whatever protocol each identity provider requires. If

the identity providers of choice are using one of the open protocols I mentioned, the integration

Auth0 needs to perform is very easy. But if they are using any proprietary approach, for the

application developer, it doesn't matter. Once the app redirects to Auth0, Auth0 takes care of

the integration details. For you, it's just a matter of flipping a switch saying, “I want to talk with

this particular identity provider” - the result, mediated by Auth0, will always come in the format

determined by the open protocol you chose to use for integrating with Auth0. In concrete terms, that's

what we meant earlier when we stated that Auth0 abstracts away the problem from you.

In addition, Auth0 offers a way of managing the lifecycle of a user. Auth0 maintains its own user

store; it integrates with external user stores and exposes various operations you can perform

for managing users. For example, you can have multiple accounts sourced from multiple identity

providers, that accrue to the same account in Auth0 and your app. You can normalize the set of

claims that you receive from different identity providers so that your application doesn't have to

contain any identity provider specific logic.

We also provide ways of injecting your own code at authentication time, so that if you want to

execute custom logic, for example, subscription, or billing, or any functionality which just makes

sense in your scenario to occur at the same time as authentication, you can easily achieve that.

You have full control over the experience your users will go through, as Auth0 allows you to

customize every aspect of the authentication UX. Auth0 makes it very easy for you to use this set of features, mostly by providing you with a dashboard that has a very simple point-and-click interface. You can also use Auth0's management APIs to achieve programmatic access to everything the dashboard does, and more.

That's it for Identity 101. It was a pretty quick whirlwind tour of the last 15 to 20 years of evolution

in the world of digital identity. In the next chapters, we'll spend a bit more time sweating the details.

Chapter 2 - OAuth2 and OpenID Connect

Let's dig a bit deeper, and specifically turn our attention to OAuth and OpenID Connect (OIDC)

as protocols.

Have you ever read any of the specifications of those protocols? I am an old hand at this: I

was working in this space when there was still CORBA, WS-Trust, and various other old man's

protocols. In the past, identity protocols tended to be extraordinarily complicated: they were

XML-based, and exhibited high-assurance features that made them hard to understand and

implement. For example, the cryptography they used supported what was called message-based

security - granting the ability to achieve secure communications even on plain HTTP. It was an

interesting property, but it came at the cost of really intricate message formatting rules that made

implementation costs prohibitive for everyone but the biggest industry players.

Now, the new crop of protocols, OAuth, OpenID Connect, and similar, are based on simple HTTP

and JSON - a reasonably simple format - and they heavily rely on the fact that everything occurs

on secure channels. This simple assumption enormously simplifies things: together with other

simplifications and cuts, this makes the new protocols more approachable and at least readable.

However, we are not exactly talking about Harry Potter. Ploughing through eighty-six pages of

intensely technical language, such as the ones constituting the OpenID Connect Core specification,

is a pretty big endeavor, even for committed professionals. If you work in the identity space,

you'll find yourself referring to the specifications in detail, over and over again, with a lawyer-like

focus on each and every single word - those documents are dense with meaning. You can also

see that the specifications have a pretty high cyclomatic complexity. That's to say: there are

multiple links that provide context and, usually, there is not a lot of redundancy. If there is a link

pointing to another specification defining a concept used in the current document, you've got to

follow the link and actually learn about that concept before you can make any further progress.

There's really a very large number of such specifications, even if you limit the scope to just one

or two hops from the core OpenID Connect and OAuth2 specs. All the specifications that

you see in the constellation of OAuth, and OpenID, and JWT, and JWS, and similar are the core,

describing the most fundamental aspects that come into play when handling the main scenarios

those specifications are meant to address. There is an entire ring of best practices or new

capabilities not shown here. The complete picture is, in fact, much larger.

Figure 2.1

The main reason for which I am showing you this is to dispel the notion, which a lot of people really

like to believe, that adding identity capabilities to one application is just a matter of reading the

spec. If you want to do modern identity, just read the OAuth2 and OpenID Connect specifications,

and you'll be fine. Of course, the reality is quite different. If that would be true, then not a lot of

people would be doing modern authentication nowadays.

In fact, reading all these things is our job, as identity professionals - as the ones who build identity

services, SDKs, quick starts, samples, and guides that developers can use for getting their job

done without necessarily having to be bogged down in the fine-grained details of the underlying

protocols. That said, given that the book you are reading is meant to be read by aspiring identity

professionals, the fine-grained details of the protocol are among the things we want to learn

about - and what you'll find in abundance in the rest of the text.

However, I dislike the classic academic approach so common in other learning material about

identity. There you just get the lecture and a laundry list of the concepts listed in these various

specifications - college style - and are expected to figure out on your own how they apply to your

scenarios. The messages, artifacts, and practices defined in those specifications are all there

for specific reasons. Typically, it is for addressing use cases and scenarios. It's just that their

language is such that it's not presented, usually, in a scenario-based approach, as it would not

be economical in a specification to do so. That's a great approach for formal descriptions and

keeping ambiguity to a minimum, but not great for actually understanding how to apply things

in concrete terms.

I'm going to turn things around, and actually, apart from giving you some basic definitions, I want

to operate at the scenario level. I want you to understand why things are the way they are and

how they are applied in particular solutions rather than just ask you to study for a test. In the

process, we will eventually end up covering all the main actors and all the main elements in the

specifications. Simply, we will not be following the traditional order in which those artifacts are

listed in the specs themselves. We'll just follow the order dictated by the jobs to be done that

we want to tackle.

OAuth2 Roles

Let's start with the few definitions that I mentioned we need before starting our scenario-based

journey through the specifications. OAuth2 and OpenID Connect define a number of primitives

that are required for describing what's going on during identity transactions.

In particular, OAuth2 introduces several canonical roles that different actors can play in the context

of an identity transaction. As OpenID Connect is built on OAuth2, it inherits those roles as well.

The first one is the resource owner. The resource owner is, quite simply, the user. Think of

the LinkedIn and Gmail scenario in the preceding chapter: the resource LinkedIn wants to

access is the user's Gmail inbox; hence the user in the scenario is the resource owner.

Then we have the resource server, which is the guardian of the resource, the gatekeeper

that you need to clear in order to obtain access. It typically is an API. In our model scenario,

the resource server is whatever protects the API that LinkedIn calls for enumerating contacts

and sending emails with Gmail on behalf of the resource owner.

Then, there is the client, probably the entity that is most salient for developers. The client,

from the OAuth2 perspective, is the application that needs to obtain access to the resource.

In our example, that would be the LinkedIn web application.

For OAuth2, which is a delegated authorization protocol and a resource access protocol, every application is modeled as a client. However, we'll see that when we start layering things on top of OAuth2 - for example, when we use OpenID Connect for signing in - very often what the spec jargon calls the client will, in fact, be the resource that we want to access. In that sentence, I use "resource" not in the OAuth sense, but in the general English-language sense of the word. You can see how calling the resource you want to gate access to a "client" might be confusing!

Now that you have seen in Chapter 1 how OpenID Connect was built on top of OAuth2 scenarios, you know why. That's because in OpenID Connect signing in means requesting an ID token, a token with special semantics, meant to be consumed by the requestor itself rather than used for accessing an external resource. Your application is both the client (because it requests the ID token) and the resource itself (because it consumes the token instead of using it for calling an API), but the term we end up using for describing the app in protocol terms is just client. That can be confusing for the non-initiated, but that's the way it is. I will often highlight this discrepancy throughout the book.

Finally, we have the authorization server, which, as defined in Chapter 1, Introduction to

Digital Identity, is the collection of endpoints used for driving the delegated authentication

scenarios described there (and many more).

The authorization server exposes the authorization endpoint, which is the place where users go for anything entailing interactivity. Practically speaking, the authorization endpoint serves back web pages. That's not always literally the case, as we'll see in the chapter about SPAs, but the cases in which we don't show a UI on the authorization endpoint are the exception.

The authorization server also features a token endpoint, that is, the endpoint apps typically talk to in programmatic fashion, performing the operations that actually retrieve tokens.

Authorization and token endpoints are defined in OAuth2 Core. OpenID Connect augments those with the discovery endpoint. This is a standard endpoint that advertises, in a machine-consumable format, the capabilities of the authorization server. For example, it lists information like the addresses of the two endpoints I just described. Another essential piece of information the discovery endpoint provides is the keys that OIDC clients should use for validating tokens issued by this particular authorization server, and so on, and so forth.

OAuth2 Grants and OIDC Flows

The most complicated things in the context of OAuth2 and OpenID Connect are usually what we call the grants. In a nutshell: grants are just the set of steps a client uses for obtaining some kind of credential from the authorization server, for the purpose of accessing a resource. As simple as that. OAuth2 defines a number of grants because each of them makes the best use of a different client type's ability to connect to the authorization server in its own way, according to that client type's peculiar security guarantees. Grants also serve the purpose of addressing different scenarios, such as scenarios where access is performed on behalf of the user vs. via privileges assigned to the client itself, and many more.

I won't go into the details of the various grants here because we are going to look at pretty much all of them inside out throughout this book. Suffice it to say at this point that there is a core set of grants originally defined by OAuth2: Authorization Code, Implicit, Resource Owner Credentials, Client Credentials, and Refresh Token. OpenID Connect introduces a new one, the Hybrid, which combines two particular OAuth2 grants into one single flow.

In addition to the grants defined by the core OAuth2 and OpenID Connect specifications, the OAuth2 working group at IETF and the OpenID Foundation continuously produce independent extensions, devised to address scenarios that weren't originally contemplated by the core specs, or were deemed too specific for inclusion. The ability to add new specifications to extend and specialize the core spec is a powerful mechanism, which helps the community receive the guidance it needs to address new scenarios as they arise.

The book will examine every essential grant in detail, with a particular emphasis on the scenarios for which a specific grant is most appropriate, the reasons behind the main features characterizing every grant, and the most important factors that need to be taken into account when choosing to solve a scenario with a specific grant.

Chapter 3 - Web Sign-In

Starting with this chapter, we are going to dive deeper into concrete scenarios. Let's begin with

the most common one: Web Sign-In.

Confidential Clients

Before I actually get into the mechanics of it, I have to make a couple of high-level introductions of artifacts and terminology that we use in the context of OAuth2 and OpenID Connect. In particular, I want to talk to you about client types.

A confidential client in OAuth2 is a client that has the ability to prove its own programmatic identity. It's any application to which the authorization server can assign a credential of some type - one that makes it possible for the app to prove its identity as a registered client to the authorization server during any request.

This typically happens with any app that is a singleton. Think of a website that is running on a

certain set of machines. Even if executing on a cluster, it's one logical entity running there. When

I provision my client by registering it at the authorization server, I have a clear identity for it. I

have URLs that determine where this client lives, and I have a flow for getting whatever secret

we want to agree upon, which I can save and protect locally.

Presumably, if the application is running on a server, the server administrator is the only person that can access that secret. Contrast all of this with applications that, for example, run on your device: those apps are anything but a singleton. Every phone will have a different instance of Slack, for example. When you download the application from the application store, there is no easy way for you to get a unique key that would represent that particular instance of a client.

You certainly cannot embed such a key in the code, because it would be decompiled in a second - and you'd be in trouble. Also, the device is always in the pockets of the people using it. It is outside of your control, so there is no way for you to protect the key for an extended period of time. A motivated hacker has infinite time to dig into the device, as opposed to a server that first needs to be breached before it can reveal its secrets.

In summary, confidential clients are clients to which it's appropriate to assign a secret. The classic scenario is a website running on a server.

But you can also think of an IoT scenario, in which you want to identify the device itself rather

than the user of a device.

Another scenario involves long-running processes.

For example, consider a continuous integration system, such as Jenkins, that compiles your product overnight, runs tests, and performs similar long-running tasks. It's likely that you'll want that

daemon to run with its own identity, as opposed to the identity of a user. In fact, if you use the

identity of a user, and then the user leaves the company, it may happen that everything grinds

to a halt, and no one knows why. This happens because very often people forget that a particular

user identity was used for running these scripts. So, assigning its own identity to the daemon is

a better option.

One subtlety here is that even if an application is a confidential client, not every grant that the application performs will require the use of a client credential. It is a capability that the application has, but it doesn't have to exercise it every time. There will be, in fact, scenarios, like the one we are about to explore, in which there is no need to use keys. Typically, the key is used for proving your identity as a client when you're asking for a token for accessing a different resource. Instead, we'll see that in the case of Web Sign-In, you are the resource.

The Implicit Grant with form_post

The grant that we're going to use here is the implicit grant with form_post. It is kind of a mouthful,

but, unfortunately, that's the way the protocol defines it. This is something that wasn't possible

before OpenID Connect. It is the easiest way to achieve Web Sign-In using OpenID Connect and

it is really similar to SAML. In fact, it basically follows the same steps that I've described when I

demonstrated the first SAML flow in the first chapter, Introduction to Digital Identity.

This grant constitutes the basis of something that only OpenID Connect can do, that is, combining signing in to a website with granting that website delegated permission to access an API. What we are going to do now is study half of that transaction. We'll only look at the sign-in part. When we talk about APIs, we'll look at the other half. Those two halves can be combined so that the experience for the user is truly streamlined. Also, in terms of design, combining sign-in and API invocation capabilities makes it possible for an application to play multiple roles. This is a really powerful scenario that wasn't possible before OpenID Connect.

Given that we're using the front channel, we don't need to use the application credentials. We'll see that there are security implications here and there, but, as just said, it is just like SAML.

Setting this thing up from a developer perspective is a thing of beauty. You just install your middleware in front of your application. Then, you use your configuration to point it to the discovery endpoint, as we mentioned in Chapter 2, OAuth2 and OpenID Connect, and just specify the identifier that you were assigned as a client when you registered your application. On the authorization server, you need to register the address where you want tokens to be returned to the app, and you're done.

A detailed walkthrough

Let's see in detail how the implicit grant with form_post works. Take a look at the scenario shown

by Figure 3.1:

We have a user with a browser, a web application protected by a middleware implementing

OpenID Connect, and an authorization server.

You might notice that, in this authorization server, I'm showing only the authorization endpoint

and the discovery endpoint. I don't show the token endpoint because, in this particular flow, we

don't use it.

The idea is that, as soon as this web application comes alive, the middleware will reach out

to the discovery endpoint and will learn everything it needs about the authorization server. In

particular, it will get the address of the authorization endpoint and the key to be used for checking

signatures. We'll show how all those steps occur in detail later on (see the Metadata and Discovery section). For now, we'll focus on the authentication phase proper.

Let's see how the flow plays out by describing each numbered step.

1. Request Protected Route on Web App


In the first step, the browser reaches out to the application to get one particular route - which happens to be protected, hence not accessible by anonymous requests.

Figure 3.1

2. Authorization Request Redirect
The middleware intercepts this call and emits an authorization request for the authorization

server in response. The HTTP response has an HTTP 302 status code, i.e. it's a redirect,

and has a number of parameters meant to communicate to the authorization server all the

information necessary to perform the required authentication operation.

Figure 3.2

It’s really important to understand the anatomy of this message since all the other messages that

we'll see will be a derivative of this. Here, we're going to touch on all the most relevant parameters.

Authorization endpoint. The first element is the authorization endpoint. That's the address where we expect the authorization server to expose its authorization endpoint functionality.

Client ID. The client_id parameter is the identifier of your application at the authorization server. The authorization server has a bundle of configuration settings associated with your app, and it will bring those into focus when it receives this particular client ID.

Response type. The response_type parameter indicates the artifact that I want. In this

particular case, I want to sign in, so I need an ID token. Consequently, the value of the

response_type parameter will be id_token. There is a large variety of artifacts that I can ask

for. I can also ask for combinations of artifacts: we'll see those combinations in detail.

Response mode. The response mode is the way in which I want these artifacts to be returned to me. I have all the choices that HTTP affords me. I can get things in the query string, but this is usually a bad idea because artifacts end up in the browser history. I can get the artifacts in a fragment, which is still part of the URL but not transmitted to a server. I can get them as a form post (form_post), which is what we are using here. In this case, we just want to make sure that we post the token to our client. This way, we don't place stuff in the query string, which, as mentioned, is generally a bad practice from the security perspective. The use of a POST also allows us to have large tokens. In fact, if you place stuff anywhere but in a form post, you might run into size limitations.

Redirect URI. The redirect_uri parameter has a very important role. It represents the address in my application where I expect tokens and artifacts to be returned. I need to specify this because the tokens that we use in this context are what we call bearer tokens.

Bearer tokens are tokens that can be used simply by possessing them. In other words, I can use one directly, without needing to do anything else, as other types of tokens might require. For example, other types of tokens may require me to also know a key and use it at the same time. But bearer tokens don't. You will hear much more about bearer tokens in the section about token validation (see Principles of Token Validation). So, it is imperative that I use only HTTPS, so that no intermediary can interpose itself and intercept traffic.

Also, it is very important that I specify the exact address I want the response to be sent back to. If I don't - and, for example, instead of doing a strict match with the address provided, the authorization server allows callers to attach further parameters - I put communication security at risk. What might happen - and it did happen in the past - is that there might be flaws in the development stack I'm using that will cause my request to be redirected elsewhere. That would mean shipping my bearer tokens to malicious actors, and that's all they'd need to impersonate me. OAuth2 and OpenID Connect are strict about this: the redirect URI that you specify in the request has to be an exact match of the one registered at the authorization server.

Scope. The scope parameter represents the reason for which I'm asking for the artifacts. In the example above, I specified openid, profile, and email, which are scopes that cause the authorization server to issue an ID token with a particular layout. The openid value is somewhat redundant with the earlier response type, but with profile and email I'm also asking to enrich the ID token with the profile and email information of the user, if present.

In short, with the scope, I am specifying the reason for which I want the artifacts I am requesting. We will see that, when we use APIs, we'll be asking for the particular delegated permissions we want to acquire.

Nonce. The nonce parameter is mostly a trick for preventing token injection. At request time,

I generate a unique identifier, and I save it somewhere (like in a cookie). This identifier is

sent to the authorization server, and eventually, the ID token that I receive back will have a

claim containing the same identifier. At that point, I'll be able to compare that claim with the

identifier that I saved, and I'll be confident that the token I received is the one I requested. If

I receive a token that has a different (or no) identifier, I have to conclude that the response

has been forged and the token injected.

It is worth mentioning that I specified form_post as the value for response_mode because the default response mode for the id_token response type is different (it would have been fragment); hence I had to override it explicitly. The following table shows the default response mode for each response type defined by OAuth2 and OpenID Connect. If I omit response_mode in the request, the authorization server will apply its default value.

response_type         default response_mode
------------------------------------------------------
code                  query
token                 fragment
id_token              fragment (query disallowed)
none                  query
code token            fragment (query disallowed)
code id_token         fragment (query disallowed)
id_token token        fragment (query disallowed)
code id_token token   fragment (query disallowed)
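Putting all of the parameters together, here is a hedged sketch, in Python, of how a middleware could assemble the authorization request URL it redirects the browser to. The endpoint, client ID, and redirect URI are placeholder values:

```python
import secrets
from urllib.parse import urlencode

# Placeholder values - substitute your own tenant, client ID, and app URL.
authorize_endpoint = "https://your-tenant.example.com/authorize"
nonce = secrets.token_urlsafe(16)  # save this (e.g., in a cookie) to check later

params = {
    "client_id": "YOUR_CLIENT_ID",
    "response_type": "id_token",   # we want an ID token for sign-in
    "response_mode": "form_post",  # override the default (fragment)
    "redirect_uri": "https://app.example.com/callback",
    "scope": "openid profile email",
    "nonce": nonce,
}

# The URL the middleware sends back in the 302 response's Location header
location = f"{authorize_endpoint}?{urlencode(params)}"
print(location)
```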

3. Authorization Request
The next step for the browser is to honor the 302 redirection and actually perform a GET

hitting the authorization endpoint with all the parameters I just described.

From now on, the authorization server does whatever it deems necessary to authenticate

a user and to prompt for consent. How this occurs isn't specified by OAuth2 or OpenID

Connect. The mechanics of user authentication, credentials gathering, and the like are a

completely private matter of the authorization server, as long as the eventual response

that comes back is in the format dictated by the standard. You can have multi-factor

authentication, multiple pages, one single page. It doesn't matter, as long as you come

out with a standard result.

4. Authorization Response
Once everything works out, you get an HTTP response with a 200 status code. This means

that you have successfully authenticated with the authorization server. The authorization

server will set a cookie that represents your session with it. So, if later on you need to hit

the authorization endpoint again, you will not have to enter credentials to sign in explicitly.

You might have to give more consent, for example, but you shouldn't have to re-enter

credentials.

The other important part to note here is the ID token, which is what we requested. It is being returned as a parameter in the form post that we are getting. You can see, in the body of the HTML being returned, that the JavaScript onload event is wired up to submit a form automatically, as in the sketch below.
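Conceptually, the page returned in this step looks something like the following sketch; the token value and form action are illustrative placeholders:

```html
<!-- Illustrative sketch of the auto-submitting page returned by the
     authorization server; all values are placeholders -->
<html>
  <body onload="document.forms[0].submit()">
    <form method="post" action="https://app.example.com/callback">
      <input type="hidden" name="id_token" value="eyJhbGciOi..." />
    </form>
  </body>
</html>
```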

5. Send the Token to the Application


As soon as the page returned by the authorization server gets rendered, it's going to post

the form to our application. This means that the requested ID token is finally sent to my

web application.

6. Token Validation and Web App Sessions Creation


What happens now is pretty much the same thing that we studied earlier in the web sign-

on scenario in the first chapter, Introduction to Digital Identity. The application receives the

ID token and decides whether it likes it or not according to all the various trust rules, and

what it has learned from the discovery endpoint. If it likes it, the app will emit an HTTP

302 response with its own cookie. Thanks to that cookie, representing an authenticated

session with my app, I will not need to get the ID token again as long as the cookie is valid.

Together with the cookie creation, the app emits an HTTP 302 response, which redirects

the browser to the original route it requested.

7. Request Protected Route with Authorization


As the browser honors the redirect, we end up where we started: we are requesting a

protected route, but this time we present a session cookie with it.

If you compare the original request in step 1 with this redirect, you will discover that it is exactly the same request, but with a cookie coming along.

8. Access the Protected Route


Finally, after this long back-and-forth, we can get our response, which is an HTTP 200

response with a page in the body.

From now on, every subsequent request toward the application will carry the session

cookie, proving that there is an authenticated session in place.

Anatomy of an ID Token

As we said earlier, the ID token is an artifact proving that a successful authentication occurred.

We have two ways of requesting it: using a response_type parameter with the id_token value

and using a scope parameter with the openid value.

The reason we have two mechanisms is that the authors of the specifications wanted it to be possible to use OpenID Connect even if your SDK was only based on OAuth2. In fact, at the time OAuth2 was defined, there was no ID token in the enumeration of response types. Since the scope is a completely generic parameter, the ability to use one particular scope that would cause the authorization server to return an ID token was a great way of being backward-compatible. Today, it's a great way of getting confused, but now that you know, you no longer run this risk.

OpenID Connect defines a fixed format for the ID token, the JSON Web Token (JWT) format. The

specification actually defines not just the format but the list of claims that must be present in an

ID token. In addition, it even tells you in normative terms what you need to do in order to validate

some of those claims. As we said, if I include a profile or email value in the scopes of my request,

I will cause the content of the ID token to look different.

Just to get a feeling of it, here you can see what you would normally see on the wire:

Figure 3.3

That's what a JWT normally looks like, with its Base64url-encoded components. If you go to

jwt.io, which is a very handy utility offered by Auth0, you can actually paste the bits of your ID

token and see it automatically decoded. The following picture shows an example of such decoding:

Figure 3.4

You can see on the right side that we have a header that describes the shape of this specific

JWT. In particular, by examining the header content, we find that this token is in JWT format,

what algorithm has been used for signing it and a reference to the key required to validate the

signature, which in this case corresponds to the key that we downloaded from the discovery

endpoint (more on that in a moment).

If you look at the payload, you'll find that it contains the actual information we were expecting to retrieve. Going into more detail, we have:

The issuer (iss), which is a string representing the source of the token, that is, the entity behind the authorization server - which, like the key, is also found via the discovery endpoint.

The audience (aud), which represents the particular application which the token has been

issued for. It is very important to check this claim. As an app receives this token, the middleware

used for validating it will compare what was configured to be the app identifier (in the case

of sign-in and ID tokens, that will correspond to the client ID of the app) with the audience

claim. If there is a mismatch, that means that someone stole a token from somewhere else,

and they're trying to trick the app into accepting it.

The issued-at (iat) and expiration (exp) are coordinates that are used for evaluating whether

this token is still within its validity window or if, being expired, it can no longer be accepted.

We'll see during the API discussion that access tokens and ID tokens typically have a limited

validity time.

All the other claims are pretty much identity information about the user, which are present in

the ID token only because I asked for profile and email in the scope parameter.
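To make these validation steps concrete, here is a minimal sketch using Python and the PyJWT library. The JWKS URL, issuer, client ID, and signing algorithm are placeholder assumptions; your authorization server's values will differ:

```python
import jwt  # PyJWT (>= 2.x, with the "crypto" extra installed)
from jwt import PyJWKClient

# Placeholder values, normally learned via discovery / app registration
jwks_uri = "https://your-tenant.example.com/.well-known/jwks.json"
expected_issuer = "https://your-tenant.example.com/"
client_id = "YOUR_CLIENT_ID"  # the audience of an ID token is your client ID

def validate_id_token(id_token: str, expected_nonce: str) -> dict:
    # Find the signing key via the kid in the token header
    signing_key = PyJWKClient(jwks_uri).get_signing_key_from_jwt(id_token)
    # decode() checks the signature, exp, iss, and aud in one call
    claims = jwt.decode(
        id_token,
        signing_key.key,
        algorithms=["RS256"],  # assumption; match your server's algorithm
        audience=client_id,
        issuer=expected_issuer,
    )
    # Compare the nonce claim with the value saved at request time
    if claims.get("nonce") != expected_nonce:
        raise ValueError("nonce mismatch: possible token injection")
    return claims
```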

Principles of Token Validation

We've been talking about validating tokens quite a lot, relying on the intuition that it entails

validating signatures and performing metadata discovery. Let's explore the matter in more detail,

and have a more organic discussion about what it means to validate tokens.

We have seen the function that tokens perform in a couple of scenarios. We have seen signing

in with SAML. We have seen access tokens for calling APIs, and in particular, right now, we have

seen how to use an ID token for signing in. All those scenarios entail an entity, the resource, receiving a token and making a decision about whether it entitles the caller to perform whatever operation the caller is attempting. How does the resource make that decision?

Subject Confirmation

Subject confirmation is a concept we inherit from SAML. In particular, the subject confirmation method determines the way in which a resource decides whether a token has been used correctly or not.

Bearer is the simplest. It is similar to finding 20 dollars on the floor. You pick up the money, go

wherever you want to use this money, use it, and you're going to get the good or service you

are paying for. No further questions will be asked because all it takes for using 20 dollars is to

own those 20 dollars and for them to change hands. That's the substance of the bearer subject

confirmation method. If you have the bits of a token in your possession, you are entitled to use

the token.

Proof of possession is something more advanced. In proof of possession, you have a token that

contains a key of some kind in some encrypted section. This encryption is specifically done for

the intended recipient of the token. The idea is that when a client obtains such a token, they

also receive a separate session key, the same key embedded in the encrypted section of the

token. When the client sends a message to the intended recipient, it attaches the token as in the

bearer case, but it also uses this session key to do something - like signing part of the message. When the resource receives the token and the message, it will validate the token in the usual way, as we described for bearer. That done, it will extract the session key from the portion that was encrypted for it. It'll use the session key to validate the signature in the message. If the validation works, the recipient will know for certain that the caller is the original requestor that obtained the token in the first place. Otherwise, the caller would not have been able to use the session key.

This mechanism is more secure than the bearer: an attacker intercepting the message would be

able to replay the token, but without knowledge of the session key, they would not be able to

perform the additional signature and provide proof of possession.

Today, substantially nobody is using proof of possession in OAuth2 or OpenID Connect. But proof of possession is now coming back. There is a specification, still in draft, which shows how to use the mechanism I just described in OAuth2 and OpenID Connect, but it is not mainstream at all. That specification is not yet an approved standard.

So, to all intents and purposes, you can think of bearer tokens as being the law of the land. There is another concept - the sender constraint - but I'll talk more about it when we deal with native clients (Chapter 5, Desktop and Mobile Apps).

Format Driven Validation Checks

In OAuth2, access tokens have no format. The standard doesn't specify any format, mostly because it was originally designed for a scenario where the authorization server and the resource server are co-located and can share memory.

Think, for example, of the scenario we described in the first chapter, where Gmail is the resource

server with its own APIs, and it's also the authorization server.

In that particular scenario, those two entities can share memory. They can have, for example,

a shared database. So, when a client asks for an access token, this access token can be just

an opaque string that happens to be the primary key in a specific table where the authorization

server saved the consent granted by the user to the client.

When the client makes a call to the resource server presenting this token, the resource server grabs the token and just uses it to find the correct row in the database and, within it, the consented permissions. The resource server uses that information for making an authorization decision.

This scenario is compliant with the spirit of the spec - and also the letter of the spec - and we

didn't need to mandate any specific format.

However, in the case of OpenID Connect, we did define a format for the ID token. We expected

the receiver actually to look inside a token and perform validation steps. This happens typically

when the resource server and the authorization server are not co-located, hence cannot use

shared memory to communicate. In those cases, you typically (but not always) rely on an agreed-

upon format.

Also, in the SAML case, we defined a format, a set of instructions on how to encode a token.

In the case of format-driven validation checks, there are certain constraints which apply pretty

much to every format, and in particular, to JWT:

Signature for integrity. Your token is signed, and we have seen the reasons for which we want

to sign a token: being sure of the token origin and preventing tampering in transit. The token

must provide some indication about the key and the algorithm used in order for its recipient to

be able to check its signature.

Infrastructural claims. Token formats will typically include infrastructural claims, meant to

provide information that the token recipient must validate to determine whether the incoming

token should be accepted. One notable example of those claim types is the issuer, which is to

say the identifier of the entity that issued (and signed) the token, and that should correspond to

one of the issuers trusted by the intended recipient. Another common infrastructural claim, the audience, says whom a token is meant for. You need the audience to have a way of validating that the token is actually for a specific recipient. You also need expiration time claims: tokens typically have restricted validity so that there is the opportunity to revoke them.

Those are all claims that you would expect tokens to have, and that the middleware is typically responsible for validating.

Alternative Validation Strategy: Introspection

There is a different way of validating tokens, which goes under the name of introspection. With

this approach, the resource receiving a token considers it opaque. It may happen because it

doesn’t have the capability to validate the token. It should be rare in the JWT case because

checking a JWT is pretty trivial, and it can be done in any dev stack. However, imagine that for

some reason, you cannot assume that incoming tokens are in a format that you know how to

validate.

You can take the incoming token and send it to the introspection endpoint, which is an additional

endpoint that can be exposed by authorization servers. Given that you connect to the introspection

endpoint using HTTPS, you can actually validate the identity of the server itself. You can be

confident that you are sending the token where it's meant to go, as opposed to a malicious site.

The authorization server examines the token, determines whether that token is valid or not, and, if it is valid, sends down the same channel the content of the token itself (e.g., claims).

In a nutshell, the resource server sends tokens back to the authorization server saying, "Please tell me whether this is valid or not." The authorization server can render a decision and send it back to the caller, along with the content of the token, so that the resource server can peek inside.
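As a sketch of what that exchange could look like from the resource server's side: RFC 7662 (OAuth2 Token Introspection) defines the request and response shapes, while the endpoint URL and credentials below are placeholder assumptions that vary by vendor:

```python
import requests

# Placeholder introspection endpoint and resource server credentials
introspection_endpoint = "https://your-tenant.example.com/oauth/introspect"

def introspect(token: str) -> dict:
    resp = requests.post(
        introspection_endpoint,
        data={"token": token},
        # The resource server authenticates to the authorization server
        auth=("RESOURCE_SERVER_CLIENT_ID", "RESOURCE_SERVER_SECRET"),
        timeout=5,
    )
    resp.raise_for_status()
    result = resp.json()
    # Per RFC 7662, "active" says whether the token is valid;
    # if true, the rest of the payload carries the token's claims.
    if not result.get("active"):
        raise ValueError("token is not active")
    return result
```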

Personally, I'm not crazy about introspection, mostly because it's brittle. You need to have the authorization server up and available, and if your application is very chatty, you might get throttled, for example. Also, with this approach, you need to wait for one extra network round trip before you can actually make an access control decision about the resource that you're calling. You might run out of outgoing HTTP connections, which typically live in a pool. It's a lot of work. Sometimes there are no alternatives. But in general, for Auth0, given that we always use JWTs and public key cryptography, it's normally just better if you validate your own token at your API.

Metadata and Discovery

The way in which token validation middleware discovers the values expected in valid tokens

is through the discovery endpoint. The middleware simply hits the /.well-known/openid-configuration URL, which is defined by OpenID Connect, and retrieves validation information according

to the specification.

The document published at this URL typically contains direct information that we need to have,

like the issuer value, the addresses of our authorization endpoint, and similar. It also connects

to a different file that contains the actual keys, which could be literally the bits of X.509 public

key certificates.

Let’s take a look at how middleware extracts validation information from the discovery endpoint

by following the numbered steps in Figure 3.5.

Figure 3.5

1. Request Configuration
At load time or even the first time that you receive a message, the middleware reaches

out to the discovery endpoint.

That's a simple matter of making an HTTP GET request to the /.well-known/openid-configuration endpoint of the authorization server.

2. Receive Configuration Document


What you get back is a big JSON document with all the values required to validate incoming

tokens.

For example, just to highlight some of these values, you have the address of the

authorization endpoint (authorization_endpoint), the value of the issuer (issuer), which is

the value that we are supposed to validate against, a list of claims which are supported

(claims_supported), the supported response modes (response_modes_supported), and

a pointer to the file where all keys are kept (jwks_uri).

3. Request Keys
The next step would be to actually make a GET request to the address at which the keys

are published.

4. Receive Keys
The result of that request will be another file containing a collection of keys with their

respective supported algorithm (alg), their identifier (kid), and the bits of the public key.

The middleware programmatically downloads all of that stuff and keeps it ready.

Those keys will occasionally roll, because it's good practice to change them. Your

middleware will simply have to reach out and re-download these keys when it happens.
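The whole sequence boils down to two HTTP GETs. Here is a minimal sketch in Python, assuming a placeholder authorization server URL:

```python
import requests

# Placeholder issuer URL - substitute your authorization server's
base = "https://your-tenant.example.com"

# Steps 1-2: fetch the discovery document
config = requests.get(f"{base}/.well-known/openid-configuration", timeout=5).json()
print(config["issuer"])                  # value to validate the iss claim against
print(config["authorization_endpoint"])  # where to send authorization requests

# Steps 3-4: fetch the keys the document points to
jwks = requests.get(config["jwks_uri"], timeout=5).json()
for key in jwks["keys"]:
    print(key["kid"], key.get("alg"))    # key identifier and algorithm
```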

Chapter 4 - Calling an API from a Web App

In this chapter, we move our attention to calling APIs. This is the quintessential scenario addressed by OAuth 2.0: delegated access to APIs is the main reason OAuth came to be in the first place. Most of the discussion will focus on the canonical grant OAuth 2.0 offers to address the delegated API access scenario, the Authorization Code grant. We'll also take a look at other grants, such as the Hybrid flow and the Client Credentials grant, which can be used to call APIs in slightly different scenarios.

The Authorization Code Grant

At a high level, the way we typically invoke an API from a web application is roughly the same

way we’d call an API from any client flavor. Details will differ, as we will see throughout the book.

Depending on the client’s flavor, we'll use different grants, with different properties. In particular,

in this chapter, we want to focus on the scenarios in which a web application calls an API from

its server-side code. To that purpose, we use the OAuth 2.0 authorization code grant. The

authorization code grant, code grant from now on for brevity, empowers one web application to

access an API on behalf of a user and within the boundaries of what the user granted consent

for. This is the grant we encountered when introducing OAuth 2.0 in Chapter 1.

In section Layering Sign In on Top of OAuth2: OpenID Connect of Chapter 1, we've seen that some people tried to stretch this grant to achieve sign-in, as opposed to invoking an API. In the same section, we have seen how, if you just use this grant to obtain and use access tokens for signing in, things don't work out that well. We have seen how OpenID Connect is layered on top of this grant to achieve sign-in the right way, and we'll have more considerations about it in this chapter. At this point, I just want to stress that what we are looking at in this chapter is aimed at calling APIs, and not at signing in.

Another important concept to grok upfront is that the code grant will only empower an

application to do up to as much as the user can already do and no more. If anything,

the application will usually end up having fewer access rights. Users cannot use the

code grant for granting application access to the resources the users themselves

don't own or have the rights for. When thinking about OAuth 2.0 and the code grant,

in particular, it's easy for people to get confused. They observe that APIs grant

access to a call depending on the presence of scopes in the token. That lends to

the credence that the scopes themselves are what grants the client the privileges

to access the resource. Actually, the scopes select what privileges the user already

has and is delegating to the client.

I just want to stress that the authorization code grant is a delegated flow. It allows

clients to do things on the user’s behalf, which means that the user’s capabilities are a

hard limit for what an application can do on the user’s behalf. In other words, a client

obtaining a token via code grant cannot do more than the user can do. If you need a

client to do more than the user can do, which is a common scenario, then you need to

switch to a different flow in which permissions are granted directly to the application

that needs it, with no user involvement. Clear as mud? Don’t worry. We'll revisit those

points later in the chapter.

In the last chapter, we explored how to perform web sign in through the front channel, which

afforded us the luxury to implement the full scenario without any secrets. As you witnessed in the

detailed descriptions of flows and network traces, no secret came into play. In the authorization

code grant, however, the use of an application credential such as a client secret is inevitable.

Whenever the web app redeems an authorization code, it needs to authenticate as a client to the

authorization server. The way in which we will approach the delegated API invocation scenario

will vary depending on whether one needs to access the APIs only while a user is present and

currently signed in in the application, or whether one needs to acquire permanent access to the

APIs and perform calls to these APIs even when no user is present.

My favorite example is an application that can publish tweets at an arbitrary time. Personally, I

don't like to wake up early in the morning: I really hate it. Nonetheless, it turns out that tweets

get the best exposure when they come out pretty early. The fact that I'm based on the West Coast makes things even worse: if I have to publish tweets manually at a time that should be

considered morning in the entire North America, I’d have to wake up real early. Luckily, there

are applications I can use for tweeting on my behalf at whatever time I schedule beforehand.

Those applications are a typical example of a client that needs to have an access token always

available for calling the Twitter API on my behalf, regardless of whether I am currently signed in with an active session or blissfully still asleep. This is one of the classic scenarios, offline access,

demonstrating the need and intended usage of a very important artifact - the refresh token.

Once again, we’ll explore this scenario in detail in this chapter.

Without further ado, let's dive into the details of the authorization code grant with the help of

the diagram in Figure 4.1.

Figure 4.1

The diagram depicts the usual actors we encountered in Chapter 2:

On the far left, the user and their browser

The authorization server, on top. Note that this time both the authorization and the token

endpoints are present in the picture, as both will come into play

A web application roughly in the middle

The API the web app needs to call as part of our scenario

Just like we did during the first explanation of the OAuth2 flow in Chapter 1, section Delegated

Authorization: OAuth2, here we assume that the user is already signed in with the web application.

We don't know how that sign-in operation occurred, and we don't care in this context - the API

invocation operation can be performed independently of the sign-in (although we will later see,

in the section on hybrid flow, that there are potential synergies there). Let’s examine the message

sequence in detail.

1. Route Request
The user hits a route of the web application that, in our sample scenario, allows the user to

book an appointment. Booking an appointment happens to require accessing the booking

API on behalf of the user; hence, accessing that route causes the web app to generate a

request for delegated access.

Note, if you compare the equivalent step in the flow described in Chapter 3, section

The Implicit Grant with form_post for the sign-in operation, you will notice that the web

app does not have a middleware in front to intercept the route request. In this case, the

route isn't the asset we want to protect: requesting that route just happens to be the thing

that triggers the need to acquire a token to call an API. The logic necessary to generate

the associated delegated authorization request is, in fact, inside the app codebase itself

(although it will often be implemented by an SDK, rather than from scratch).

2. Authorization Request
The reaction from the application to the request is somewhat familiar: a 302 HTTP status

code response with a message for the authorization server. You can see a number of differ-

ences with the equivalent step 2 in section The Implicit Grant with form_post of Chapter 3.

First, we are setting a cookie to track the nonce value (see Chapter 3, section Authorization

Request Redirect for more details), as besides the access token needed for accessing

the API, we'll also be asking for an ID token. The ID token is useful in this flow, knowing a

bit more about the transaction, given that the access token itself is opaque to the client.

More details later in this chapter.

Next, in the captured trace message, we have the authorization endpoint.

Ignoring the audience parameter for a second, the next entry is the client_id - representing

the client ID identifying the web app at the authorization server.

The response_type for this particular grant is code. We want to obtain a code from the

authorization endpoint, which the web app will later exchange via token endpoint for an

access token.

We don't need to specify the response mode because we are okay with a default response

mode, which in the case of code response type is query - meaning that we expect the

authorization server to return the authorization code in a query string parameter.

Next, we find the scope parameter. This message includes all the same scope values

encountered earlier, openid, profile, and email, indicating that we require an ID token

alongside the code. This time, however, we aren’t requesting an ID token for sign-in

purposes: we just want to have some information about who the resource owner granting permission in this transaction is. Without an ID token, that is to say, something the client

itself can consume, we would have no way to know. We'd just blindly get an access token

and use it, with no indication about the identity of the user who obtained it.

The scopes collection includes a scope value we didn’t encounter yet, read:appointment.

That scope value represents a permission exposed by the API we want to invoke: in other

words, one of the things that can be done when using that particular API, and that can

be gated by an authorization check. By presenting that scope value in the authorization

request, the client is saying to the authorization server: “This web application wants to

exercise the read:appointment privilege on behalf of the user”. That's something that the

authorization server needs to know. It will determine important details in the way the

request is handled, such as the content of the consent prompt presented to the user and

the actual outcome in granting the delegated permissions.

The next parameter represents the redirect URI, which you are already familiar with.

The last parameter in the captured message is the nonce, a token injection prevention

mechanism we already encountered earlier in the book.

Now that we covered every message parameter in detail, let’s revisit the audience

parameter. When requesting an access token for an API protected by Auth0, a client is

required to specify one extra parameter, called audience, indicating the identity of the

resource the client is requesting access to.

The core OAuth2 specification does not contain any parameter performing this function,

mostly because there is an underlying assumption (though not a requirement) that resource

server and authorization server are co-located. This assumption makes it unnecessary

to identify which resource server the request refers to. For a concrete example of this

scenario, consider how Facebook uses OAuth2 for gating access to its Graph API. The

Facebook authorization server can only issue access tokens for the Facebook Graph API;

there is no other resource server in the picture. The only latitude left to clients is to specify

different scopes for that one resource server, the Facebook Graph. Different scopes will

express different permissions and operations I intend to exercise, but they will all refer to

the same resource server, which doesn’t need to be explicitly named in the authorization

request. Similar considerations hold for Google, Dropbox, and other popular services:

whenever clients get tokens from those services, they are always calling the provider’s own

APIs, whose identity results self-evident from the context without requiring an identifier in

the request.When the solution includes a 3rd party authorization server, like in the case

of an Auth0 customer leveraging the Auth0 authorization server to secure its own custom

API, the topology makes it possible for the same authorization server to be used to gate

access for a multitude of resources, which can all live in different places. In that scenario,

the client does need the ability to specify which resource it intends to request access to.

There are multiple ways a message could be constructed to include explicit references

to a particular resource server. For example, an API might embed a resource server

identifier in individual scope strings themselves. However, the approach has issues: scope

strings could get really long and hard to read. Also, including multiple scopes referring to

different resources in the same request might generate ambiguity about which resources

the resulting access token could be used with.

Given those complications, Auth0 and other identity vendors decided to introduce a

dedicated parameter for identifying resources. Azure AD, for example, has a resource

parameter whose semantic is equivalent to Auth0’s audience.

Since those individual vendor decisions have been made, the IETF OAuth2 working group

officially recognized the usefulness of such primitives and issued a new specification,

OAuth2 Resource Indicators. This specification extends OAuth2 with a resource parameter,

which is, to all intents and purposes, equivalent to Auth0's audience. We plan to start

accepting those standard parameters too in a future update.

3. 302 Redirect Execution


Next, the browser executes the 302 HTTP status code redirection sending the message

we examined on its way toward the authorization endpoint.

4. Authorization Response
Upon receiving the authorization request, the authorization server takes care of the

interactive portion of the flow. The authorization endpoint decides what's necessary for authenticating the user and goes through it. Then it presents them with a consent prompt saying: "Hey, client X wants to read appointments on your behalf." The moment in which

the user grants consent, the authorization endpoint returns its response with the requested

authorization code in the query string, in accordance with the response_type we asked for.

Also, the response includes the usual set-cookie command with which the authorization

server records in the browser that an authentication session has been established.

5. Providing the Authorization Code to the Web App
At this point, the browser simply executes the redirect that will dispatch the authorization

code to the web application. From this moment on, the web application will continue the

flow on the server side.

6. Redeeming the Authorization Code


The web application combines the authorization code with its own client credential and

sends them in a message to the token endpoint. The message to the token endpoint is in the

form of an HTTP POST request where the app presents its client_id and client_secret, the

authorization code received from the front channel, and a new parameter, the grant_type.

The message layout is shown, annotated, in Figure 4.2.

Figure 4.2

Every time an application talks to the token endpoint, it has to specify the desired grant type, letting the authorization server know how to interpret the request. In this particular case, the desired flow is the authorization_code grant. That tells the authorization server to search for an authorization code in the message, and to consider the client ID and secret in the context of this specific grant. If, for example, the request had specified client_credentials as the grant type, a flow we'll discuss later on, then the authorization

server would have ignored the authorization code, would have looked only at the client

ID and client secret and would have considered only the identity of the client application

itself rather than the consent options of the resource owner implied by the authorization

code. In other words, the grant_type parameter is used to disambiguate the flow the client

expects the authorization server to perform.

The request also includes the audience for the reasons stated earlier. In this particular

case, audience is redundant. The authorization code has been granted in the context of

that audience, and the authorization server knows it - hence there’s no need to provide

it again in this request. However, some extra clarity can be beneficial: for example, this

helps to interpret what this request is for while examining a network trace, without the

need to correlate it with the earlier messages that led to this point.

Finally, the message contains a redirect_uri parameter. In this phase, the authorization

server doesn’t really have any opportunity of performing redirects, given that the client

is talking to the authorization server via a direct channel. Rather, the redirect_uri is used

as a security measure to prevent redirection URI manipulation - the authorization server

will verify that the redirect_uri presented here is identical to the one provided during the

authorization code request leg of the flow, preventing an attacker from performing URI

replacement (see https://tools.ietf.org/html/rfc6749#section-10.6).
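For illustration, here is a hedged sketch of how a web app's server-side code could redeem the code at the token endpoint. The endpoint URL, credentials, and audience values are placeholders:

```python
import requests

# Placeholder values; the token endpoint address normally comes from discovery
token_endpoint = "https://your-tenant.example.com/oauth/token"

def redeem_code(authorization_code: str) -> dict:
    resp = requests.post(
        token_endpoint,
        data={
            "grant_type": "authorization_code",
            "client_id": "YOUR_CLIENT_ID",
            "client_secret": "YOUR_CLIENT_SECRET",  # server-side only, never in a browser
            "code": authorization_code,
            "redirect_uri": "https://app.example.com/callback",
            "audience": "https://api.example.com/booking",  # Auth0-style parameter
        },
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()  # access_token, id_token, token_type, expires_in
```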

7. Receiving the Access Token in the Token Endpoint Response
Assuming that the request is accepted by the authorization server and processed without

issues, the grant concludes with a response message carrying the artifact originally

indicated by the response_type in step 2 - Authorization Request, in this case, an access

token. Here’s a breakdown of the response message content:

The requested access token

An ID token, in response to the presence of openid in the list of requested scope values

The token type, which is always Bearer for the time being - as discussed in the token validation

section

The expires_in parameter, expressing the time through which the access token should be

considered valid. Although at times the access token itself might contain that information, and

happen to be in a format that can be inspected, access tokens should always be treated as

opaque by clients. As such, expires_in needs to be provided as a parameter in the response

for the client to be able to use that information (for example, for deciding for how long an

access token should be cached).
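For reference, a token endpoint response carrying those elements might look like the following; the token values are abbreviated and the numbers are illustrative:

```json
{
  "access_token": "eyJhbGciOiJSUzI1NiIs...",
  "id_token": "eyJhbGciOiJSUzI1NiIs...",
  "token_type": "Bearer",
  "expires_in": 86400
}
```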

Important: access tokens should always be assumed and treated as opaque by client

applications because their content and format are a private matter between the authorization

server and the resource server. The terms of the agreement between the authorization

server and the resource server can change at any time: if the client app contains code

that relies on the ability to parse the access token content, even minor changes will break

that code - often without recourse.

Imagine a case in which access tokens, initially sent in the clear, start being encrypted in

a way that only the intended resource recipient can decrypt. Any client will lose access

to the token’s content. Client code relying on the ability to access the token content will

irremediably break. In summary, avoid logic in client applications that inspects the content

of access tokens. Note, examining the content of a token in a network trace is perfectly

fine for troubleshooting purposes, as the information will be consumed via debugging

tools, without generating code that can break in the future.

8. Using the Access Token to Call the API


Once the client obtains the requested access token, it can finally invoke the API: all it needs

to do is to include the access token bits in a classic REST call. In this particular example,

the call is a GET, but any REST invocation style is possible. The key feature in that message

is the Authorization HTTP header, exhibiting the Bearer authentication schema, carrying

the bits of the access token.

The OAuth2 Bearer Token Usage specification, the document describing how to use bearer

tokens obtained through OAuth2 for accessing resources, says that it's possible to place

the token elsewhere in the outgoing request, for example in the body of a call - or even in the request URL, as a query parameter. Encountering clients that send tokens in the body is

very rare. The use of the query string for sending access tokens is actively discouraged,

as it has important security downsides. Consider the case in which your client is running

in a browser: whenever a token is included in the query string, its bits will end up in the

browser history. Any attack that can dump the browser history will also expose the token.

Moreover, if the API call is immediately followed by a redirect, the query string will be

available to the redirect destination host in the Referer header: once again, that will expose

the token outside of the normal client-resource exchanges.

For those and other reasons, it is reasonable to expect that the near totality of the API calls

encountered in the wild that rely on OAuth2 will use the Authorization HTTP header.
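A sketch of such a call in Python, with a placeholder API URL:

```python
import requests

def call_booking_api(access_token: str) -> dict:
    # Hypothetical API URL; the key part is the Authorization header
    resp = requests.get(
        "https://api.example.com/booking/appointments",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()
```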

Authorization Code Grant and PKCE


The latest OAuth2 Security Best Current Practice (BCP) documents suggest that every

authorization code flow should leverage Proof Key for Code Exchange (RFC 7636), an extension

to the authorization code grant meant to protect the authorization code from being stolen in transit.

PKCE was originally devised for public clients, where it performs essential security functions that

we’ll describe in detail in the next chapter. Its use for confidential clients is not as critical, as there

are other measures already in place (state, nonce checks) mitigating other aspects coming into

play in associated attacks. This is why we have chosen to keep this section light and to defer introducing PKCE to the next chapter, by which point you will be more familiar with the original grants, and

it will be easier to add PKCE as an incremental step. However, we wanted to make a note of the

BCP guidance already here, so that if you read about it elsewhere you’ll know what it is all about.

Sidebar: Essential Authorization Concepts and Terminology

OAuth 2.0 offers a delegated authorization framework. Unfortunately, developers often disregard

the “delegated” part, and attempt to use OAuth primitives and flows to solve pure authorization

scenarios that the protocol hasn’t been explicitly designed to address. The outcome is solutions

that might appear to work in toy scenarios but fall short as soon as the approach is applied in

more realistic settings.

For that reason, it is a worthwhile investment to spend a few paragraphs discussing essential

concepts and terminology in authorization, spelling out explicitly their relationship with OAuth -

and in particular, what is part of OAuth and what is instead a property of the underlying resources

we are exposing.

Permissions
Imagine that you want to expose programmatic access to an existing resource. Depending on

the nature of the resource, there will be varying sets of operations that can be performed on it,

or with it. In the context of a document editing system, users will be able to see, read, comment

on, or modify documents. An API fronting a printer might expose the ability to print in black and

white or in color. Any kind of resource will have a set of permissions that make sense for that

particular resource, and that can be allowed or denied for a particular caller. A permission is just

that, a statement describing the type of things that can be done with a resource: document.read,

document.write, print:bw, print:color, mail:read, mail:send, and so on.

Permissions describe intrinsic properties of resources, which exist regardless of how those

resources are exposed. OAuth2 solutions might surface them if they happen to be useful in the

context of a delegated authorization solution involving those resources. Still, in the general case,

permissions exist in their own right and will be used outside of OAuth as well.

Privileges
A privilege is an assigned permission: it declares that a certain principal (say, John) can perform

a certain operation on a given resource (say, calling the printer API to print in full color).

As was the case for permissions, the concept of privilege exists independently of OAuth 2

(or any other higher-level protocol, for that matter). For example, the framework necessary to

describe privileges needs primitives for principals (users and apps to whom permissions might

be assigned), which OAuth 2 does not define.

The existence of permissions and privileges applied to a set of resources will influence the

behavior of OAuth 2 solutions based on those resources, but how that will happen is not described

directly in the protocol and messages defined in the OAuth 2 specification.

Scopes
Finally, we get to talk about an OAuth primitive. In the case in which a resource needs to be

exposed in the context of a delegated authorization solution, the scope is the primitive that

enables a client application to request exercising the privilege of a user for a particular permission

for a given resource. The client expresses this to the authorization server by including in the authorization request the scopes corresponding to the permissions being requested. When used with this semantic - that is, as lists of permissions for a given resource - scopes define the subset of user privileges that a client application wants to exercise on behalf of the user. Note that scopes can be used for other purposes: we have

seen examples of that in the case of openid (requesting the presence of an extra artifact, in that

case, the ID token) or profile, email (influencing returned content).
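To make the request side concrete, here is a minimal sketch in Python of how a client might assemble an authorization request carrying those scope values; the endpoint, client_id, redirect_uri, and the read:appointment scope are hypothetical placeholders that depend on your authorization server and the API it protects.

```python
import secrets
from urllib.parse import urlencode

params = {
    "response_type": "code",
    "client_id": "YOUR_CLIENT_ID",
    "redirect_uri": "https://webapp.example.com/callback",
    # OIDC scopes (openid, profile, email) alongside a permission-mapped scope
    "scope": "openid profile email read:appointment",
    "state": secrets.token_urlsafe(16),  # fresh anti-CSRF value for every request
}
authorize_url = "https://authserver.example.com/authorize?" + urlencode(params)
# The user's browser is then redirected to authorize_url.
```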

Effective Permissions
Consider a classic delegated authorization flow in which a client asks the authorization server for access to a resource. In particular, the client specifies what permissions will be required for

the operations it intends to perform on the resource. Upon receiving the request and authenticating

the user, the authorization server will typically prompt the user to grant the app delegated access

to the corresponding permissions. The user granting consent through that prompt is effectively

saying "Yes, I'm okay with this particular client exercising on my behalf the privileges being

requested".

Say, for example, that the client implements an email solution, and the permission it requests is

mail.read. The scope requested is mail.read and the access token being returned will include (by

value or by reference, depending on the format) mail.read.

Once the client obtains the access token, it will use it to make a call to the API, requesting to read

a list of email messages. The middleware protecting the API, upon receiving and validating the

access token, will verify that the scope it carries includes mail.read, the permission required by

the API to perform the read operation requested, and allow the request to move along.

But the authorization checks aren’t over yet! Imagine that the client requests the list of emails

from the inbox of a user who’s different from the user who granted consent and obtained the

access token. Should the API allow the request to succeed? Of course not! Scopes do not create

privileges where there are none. Scopes can grant a client a subset of the privileges a resource

owner has on a resource but can never add privileges the resource owner didn’t have to begin

with. The effective permissions are the intersection of the privileges a resource owner has

and the scopes that have been granted to the client. The effective permissions represent what

a client can actually do, and that can be a subset of what’s declared in the scopes. You always

need to check at runtime whether the scopes represent something that the resource owner

can actually do for the resource being accessed. Also, note that there is no guarantee that the

privileges the resource owner had at the moment of granting consent will be preserved forever.

Hence, even if your authorization server conflates scopes and privileges (for example, by only

allowing a user to consent if he or she possesses the corresponding privileges), nothing prevents

some of those privileges from being revoked at a later time. This makes it necessary for the API

to check rather than just relying on the scopes in the incoming access token.

This is one subtle point that is often misunderstood in the context of OAuth.
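To make those layered checks concrete, here is a minimal sketch of the verification an API might run before serving the request, assuming a JWT access token with a space-delimited scope claim and a sub claim; the function and parameter names are hypothetical.

```python
def authorize_mail_read(token_claims: dict, mailbox_owner_id: str,
                        owner_privileges: set) -> bool:
    """Sketch of the checks an API runs before returning a user's mail."""
    # 1. Delegation check: did the resource owner grant this client mail.read?
    granted_scopes = set(token_claims.get("scope", "").split())
    if "mail.read" not in granted_scopes:
        return False
    # 2. Privilege check: does the resource owner *currently* hold mail.read?
    #    Scopes never add privileges the owner doesn't have.
    if "mail.read" not in owner_privileges:
        return False
    # 3. Ownership check: the mailbox must belong to the user who consented.
    return token_claims.get("sub") == mailbox_owner_id
```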

Note that OAuth can also be used for application-to-application flows, in which no user is involved.

The client obtains an access token for a resource from the authorization server only through

its own client credentials, as opposed to requesting access on behalf of a resource owner. You

could say that in those scenarios, the client application itself is the resource owner: there is no

delegation, hence there’s no need for scopes to limit the privileges involved. We will study the

corresponding OAuth 2 grant, the client credentials grant, in a later section in this chapter. In this

case, it's not completely clear how permissions are expressed, as the core OAuth 2 specifications

don’t provide any mechanism to express assigned privileges (though there is a new specification,

the JWT Profile for OAuth2 Access Tokens, that does introduce some guidance about that).

Regardless of the implementation details of how those privileges are expressed, this is a case

in which privileges are actually carried in the token. There might be other cases where the

authorization server includes user privileges, roles, group memberships, and other authorization

information in the access token. Those cases are all valid and represent real, important scenarios.

However, they aren’t described by the specifications we are studying in this book, so we will not

add further details here.

Finally, consider that although scopes often map to permissions, that is not always the case.

Remember the openid scope? Its presence in a request just causes an ID token to be included in

the response from the authorization server. Or think about the profile scope, which, when added

in a request, causes the ID token to include claims that wouldn’t be present otherwise. Neither of those scopes maps to a resource permission. Scopes do correspond to permissions in many common cases, which might erroneously create the belief that scopes and permissions are the same concept, but it’s important to remember that they aren’t.

The Refresh Token Grant

Let's now go back to grants. I mentioned this in passing earlier on: tokens typically have an

expiration time. They have an expiration time because a token is caching a number of facts and

user attributes, and those facts might change after the token has been issued.

Also, the ability of a client to obtain a token at a given time doesn’t guarantee that the same

client will be able to reobtain the same token in the future. For example, the resource owner

might visit the authorization server and revoke consent for that client to obtain tokens with the

scopes previously granted. This makes the content of any previously issued tokens obsolete, as

they no longer reflect the current situation. The idea is that by endowing tokens with a short

duration, we ensure that the client cannot really use them (and hence, the information they

cache) for too long. Upon token expiration, clients will be forced to call back home and repeat

a request to obtain a new token. This new request creates the opportunity for the authorization

server to issue a new token containing up-to-date information, or to refuse to issue a new token

if conditions changed (e.g., the user account has been deleted from the system). The shorter

the token validity interval, the more up to date the issued information will be. Solutions typically

seek compromises that balance that with performance and traffic considerations.

Of course, this brings another challenge: although we do want up-to-date information, we

don't want to give users a bad experience to achieve that. The user should be blissfully unaware

of all the low-level mechanisms unfolding behind the scenes to achieve those updates. We need

to empower clients to renew tokens in a way that does not impact the user experience. The way

in which OAuth solved this is by introducing a new artifact, called the refresh token, and

associated grants using it to handle token renewals without displaying prompts.

The first step to work with refresh tokens is to request one. The OAuth 2 core specification

doesn’t define a mechanism to request refresh tokens, leaving the decision to issue one to

individual authorization servers. However, OpenID Connect does define a mechanism to request

refresh tokens, and the result is that a large number of OAuth 2 authorization servers adopt that

mechanism as their main (or even only) way of requesting refresh tokens.

Let’s revisit the authorization code grant examined in an earlier section and add a few small

changes, as shown in Figure 4.3.

Figure 4.3

The original message in step 3 carried the list of scope values the client required to request an

ID token with rich attributes content (openid, profile, email) and the access level required for the

operations the client intends to perform (read:appointment). The message in step 3 in Figure

4.3 contains an extra scope value, offline_access. This is a scope value defined in the OpenID

Connect core specification: its presence in a request asks an authorization server to include a

refresh token in its token endpoint response, alongside all the other artifacts (in this case, an ID

token and an access token). In particular, the validity of that refresh token will extend beyond

the duration of the authentication session within which it has been issued. Don’t worry if that’s

not very clear for now. We’ll expand on what that means later in this section.

If you observe step 7 in the diagram, you’ll see that as expected, the authorization server returns

a refresh token along with the usual access token and the ID token.

Now the client has a refresh token in its possession. Let's take a look at how the client uses

it, and in particular how the refresh token makes it possible to get new access tokens without

prompting the user again. The entire flow occurs on the server side, as it entails the client (in this

case, a web app whose code runs on the server) connecting directly to the token endpoint of the

authorization server. The browser, used to send the request and drive the interactive portions

of the transaction, is now entirely out of the picture. Follow the numbered steps in Figure 4.4.

Figure 4.4

1. Request Configuration
The first leg of the grant takes the form of a typical token endpoint request, analogous to

the code redemption request described earlier in the chapter.

Examining the request, you’ll encounter the following parameters (a sketch of the full request follows the list):

The usual client_id

The client_secret. This is a confidential client, hence requests to the token endpoint require

the client app to identify itself.

The new refresh_token parameter, carrying the refresh token bits received earlier.

The grant_type. As mentioned earlier, every request to the token endpoint must specify the

grant the client intends to use. In this case, the parameter value is refresh_token.

The redirect_uri parameter, included for the same security reasons specified in the code

redemption flow description.
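Putting those parameters together, the redemption request might look like the following Python sketch; the endpoint and credential values are placeholders.

```python
import requests

token_response = requests.post(
    "https://authserver.example.com/oauth/token",  # hypothetical token endpoint
    data={
        "grant_type": "refresh_token",
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",      # confidential client authentication
        "refresh_token": "STORED_REFRESH_TOKEN",    # received in the earlier grant
        "redirect_uri": "https://webapp.example.com/callback",
    },
)
token_response.raise_for_status()
tokens = token_response.json()  # access_token, id_token, scope, expires_in, ...
```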

2. Refresh Token Response


The authorization server response returns a new access token, a new ID token (because

the original request included openid), and the list of scopes that were granted when the

refresh token was obtained to begin with, in this case, during the authorization code grant.

The reason the authorization server returns the list of granted scopes is that the client

might not really know what this particular refresh token was originally granted with, or if

the conditions at the authorization server changed since its original issuance. Furthermore,

the client can request a certain list of scopes, but the authorization server can always

decide to return a subset of those scopes. If the authorization server didn't return the list of scopes granted in the context of this particular refresh token redemption, the client would have no way of knowing which scopes it actually obtained. Even if it remembered the ones originally requested, there would be no guarantee that such a list would still be accurate. Remember that the client is bound to consider the access token as opaque, hence it cannot simply look into the access token to find out.

In this particular case, the authorization server does not return a new refresh token

alongside the access and ID tokens. The client is expected to hold on to the refresh token

bits it received on the first flow and keep using it until expiration.

There are various scenarios in which the authorization server does include a new refresh

token at every refresh token grant. The most notable case is in the context of a security

measure called token rotation.

Token rotation guarantees that, whenever you use a refresh token, the bits of that particular

refresh token will no longer work for any future redemption attempts. Every use of a refresh

token will cause the authorization server to invalidate it and issue a new one, returned

alongside the refreshed access token. Clients need to be ready to discard old refresh

tokens and expect to store new ones at every renewal operation.
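Here is a minimal sketch of client logic that accommodates rotation, assuming a hypothetical per-user store exposing get_refresh_token and save_refresh_token methods.

```python
import requests

def redeem_refresh_token(store, user_id, token_endpoint, client_id, client_secret):
    """Redeem the stored refresh token, honoring rotation when the server applies it."""
    response = requests.post(token_endpoint, data={
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": store.get_refresh_token(user_id),
    })
    response.raise_for_status()
    tokens = response.json()
    # Under rotation, each response carries a fresh refresh token: persist it
    # immediately, because the one just redeemed is now invalid.
    if "refresh_token" in tokens:
        store.save_refresh_token(user_id, tokens["refresh_token"])
    return tokens["access_token"]
```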

Any attempt to use an old refresh token will cause the authorization server to conclude that

the request originator stole it. That might trigger protective measures, such as invalidating

all the other tokens that have been created in the same authenticated session, in case

the leak indicates a compromised application. Note that this measure might be overkill for

confidential clients, where use by legitimate clients is enforced by requiring applications

to use their client_secret when redeeming refresh tokens. However, it is extremely useful

for public clients, where apps can redeem refresh tokens without the need to exhibit any

app credentials. More details about this will be discussed in the next chapter on native

and mobile clients.

3. Calling the API


The new access token will be used exactly in the same way as the old one: all the

considerations about calling APIs according to the OAuth2 Bearer Token Usage specification

apply.

Some Considerations on Refresh Tokens


The fact that a client requests a refresh token including the scope offline_access signals to the

authorization server that the resulting refresh token’s lifetime will be decoupled from the lifetime

of the authenticated user session within which the grant was performed. In other words, whether

a user is signed in or not signed in with an application via the front channel doesn't really matter

with respect to whether the same application is able to redeem a refresh token. Also, the fact

that the app can still use a valid refresh token doesn't say anything about whether there’s an

active sign-in session for the user that helped obtain that refresh token in the first place. The two

things are completely separated. The scenario that offline_access is meant to support is the one

that I described at the beginning of the chapter, where a user wants to schedule a tweet to be

published at a future time regardless of whether the user will be signed in at that time or otherwise.

In more general terms, it addresses the case in which an application might need to obtain

a valid access token to invoke an API even if no user is present to tend to interaction requests.

One common mistake developers make is to interpret the ability of an application backend to

redeem a refresh token as proof that the user still has a session. Per the above explanation, this

is a dangerous mistake that can lead to resurrecting sessions already expired or terminated via

sign out, making front channel session management ineffective.

When developing applications that need to invoke APIs even without an active user session,

the app clearly needs to persist refresh tokens so that they are available independently of the

presence of an interactive session. Even for cases in which API calls are scoped to the interactive

session lifetime, tokens need to be saved somewhere other than in memory if you want to spare

users from having to go through token acquisition flows in case the webserver memory recycles.

Of course, persisting refresh tokens (and tokens in general) requires caution. It’s important to

make sure that tokens are stored per user, to prevent the possibility of a user ending up accessing

and using the refresh tokens associated with another user. That's just the same basic hygiene

required to enforce session separation, but when it comes to tokens, the need to follow best

practices is all the more critical given the high impact of identity mix-up and the complications

that derive from persisting user data beyond the interactive session lifetime.

To close the topic of refresh tokens for this chapter, here’s a last recommendation. Even if you

know the expiration time associated with a refresh token, you should still not rely on that in your

code. There are many reasons for which a refresh token might stop working, regardless of its

projected expiration. For example, a user could revoke consent, immediately invalidating refresh

tokens issued on the basis of previous consent. Another example: a resource server might change

policy and establish that, from that moment on, it will only accept access tokens obtained via

multi-factor authentication. This renders any refresh token obtained with a single-factor session

unable to obtain viable access tokens and forces the client to reobtain a new refresh token via

multi-factor authentication. Again, all this may happen regardless of the declared expiration of

the original refresh token. For all those reasons, it is prudent to develop client code assuming

that a refresh token might stop working at any time and embed appropriate error management

and remediation logic upfront.
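As a sketch of what that error management might look like, the fragment below treats the standard invalid_grant error from the token endpoint as the signal to discard the stored refresh token and send the user through an interactive flow again; the store interface and the exception type are hypothetical.

```python
import requests

class ReauthenticationRequired(Exception):
    """Raised when the user must go through an interactive flow again."""

def refresh_access_token(store, user_id, token_endpoint, client_auth: dict):
    response = requests.post(token_endpoint, data={
        "grant_type": "refresh_token",
        "refresh_token": store.get_refresh_token(user_id),
        **client_auth,  # client_id and client_secret
    })
    if response.status_code == 400 and response.json().get("error") == "invalid_grant":
        # The refresh token stopped working (revoked consent, policy change, ...):
        # discard it and fall back to an interactive authorization request.
        store.delete_refresh_token(user_id)
        raise ReauthenticationRequired()
    response.raise_for_status()
    return response.json()["access_token"]
```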

Sidebar: Access Tokens vs. ID Tokens

You have now had the opportunity to see both access tokens and ID tokens in action. Just as important,

you learned about the reasons for which both artifacts have been introduced by OAuth 2 and

OpenID Connect in the first place. It is worth stepping back for a moment and summarizing the

differences between the two token types, as confusion about when to use what is one of the

most common challenges you’ll encounter as an identity practitioner.

Access Tokens Recap


Access tokens are artifacts meant to enable a client application to access a resource, typically

on behalf of a resource owner bestowing the client application with delegated authorization. As

discussed, there is no token format mandated by OAuth 2.

Earlier on, we discussed the implications of the common topology where authorization server

and resource server are co-located, making it possible for them to access shared memory and

making the use of a format for access tokens unnecessary.

Conversely, consider an authorization server separated from the resource servers, as is the case with identity-as-a-service offerings like Auth0, where the same authorization server is shared by

multiple resource servers owned by different companies. This is a scenario that can really benefit

from agreeing on a format and using it for validating incoming tokens, even if the protocol doesn’t

offer anything out of the box. The use of JWT as a format for access tokens is so common that

there’s a standardization effort currently ongoing to define an interoperable profile for it - which

I happen to be currently driving. You can find the latest draft (at the time of writing) here.

At the cost of being pedantic, it should be stressed that, as a client app developer, you should

never write code that inspects the access token content. The fact that in some cases you might

know that a specific token format is being used doesn’t change this, as the reasons for which it’s

not a good idea are more about the contracts between client, resource and authorization server. In

fact, it will often be happenstance that you have a chance to look inside an access token, and the

situation might change at any time. The format used in an access token is a matter agreed upon by

the resource server and the authorization server, and the details can change at any time to their

discretion without informing the client. Any code predicated on assumptions about the access

token content will break as soon as those assumptions no longer hold, and on occasions without

any remediation. Think of information being removed, or the content being encrypted so that no entity but the access token's intended recipient can inspect it. Although during troubleshooting

it is legitimate for a developer to read whatever information is available, including the content of

captured tokens, developing code that does so routinely will very often result in downtimes and

serious production problems.

ID Token Recap
ID tokens are designed to support sign-in operations and, optionally, make authentication

information available to clients. They don’t contain any delegated authorization information

(though nothing prevents implementers from extending the default claims set described in the

specifications with their own custom values). ID tokens come into play during user sign-in, and

clients can use them to learn about what happened during the authentication flow. Whereas

clients should really not inspect access tokens, as discussed in detail just a few paragraphs

earlier, clients must look inside ID tokens - that’s part of the validation step described in the Web

Sign-In chapter and mandated by the OpenID Connect core specification.

One of the most common points of confusion about ID tokens is whether they can be used for

calling APIs. The short answer is that they shouldn’t. Let’s invest a few moments to understand

why people attempt that, and why it’s generally not a good idea.

ID tokens are designed to support sign-in operations. The client app is simultaneously the

requestor and the recipient of the ID token: once the token has been received by the client, it

has reached its intended destination and isn’t meant to travel any farther. All the client needs to

do with it is to validate it and extract user attributes, when present. Both are operations that can

be done locally, thanks to the fact that ID tokens have a fixed format, and the OpenID Connect

specification details how to perform validation. The ultimate proof that the ID token shouldn’t

leave the client app lies in the aud claim, formalizing that the client app is the intended recipient

by carrying its client_id value. We have discussed all this in Chapter 3, Anatomy of an ID Token.

Nonetheless, there are real-world situations in which client apps do use ID tokens for invoking

APIs. Often, that is due to designers not fully understanding the underlying protocols, and in particular, the role of the audience claim. For them, a JWT is a JWT, and an ID token is often easier to obtain, as it doesn’t require registering APIs, defining scopes, and adapting validation techniques to each specific authorization server's requirements. For example, some servers will not use JWT as the

format for access tokens and will require supporting introspection calls. Some others might not

be designed to protect 3rd-party APIs at all, hence not offering API registration and access token

issuance and validation features, but still issue ID tokens for sign-in purposes.

Nonetheless, in the general case, using ID tokens for invoking APIs has issues. The main problem

goes to the heart of why we have audiences in the first place. An API receiving an ID token can

only verify that the token was issued for that particular client: there’s nothing in the token saying

that it was issued with the intent to call this particular API. Besides the practical issue of being

unable to insert ad-hoc claims for that particular API, there are serious security concerns: a

leaked ID token can now be used not just to access the client, but also to invoke this API and all

the other APIs following the same strategy.

Whereas properly scoped tokens would contain the blast radius of a leak event (an access token

scoped to API A can only be used with A), many APIs accepting an ID token means that they would

all be compromised at once. This also makes it really hard to maintain separation between APIs:

if both A and B accept ID tokens, that means that when the client calls A, A can turn around and

use the same token it received from the client to invoke B. Although that might be acceptable at

times, in the general case, this should never happen as a side effect.

Lastly, I will mention that the use of ID tokens for calling APIs cannot be secured by sender

constraint, as the protocols supporting it won’t provide any mechanism to associate the ID token

to a channel between client and API.

For the sake of exhaustiveness, I want to acknowledge a particular situation where the use of

ID tokens for calling an API might not be disastrous, though it’s never as good as using access

tokens. Consider the case in which the client app and the API in itself happen to be the same

logical application. That’s the scenario commonly described as “1st party app”, where both

ends have the same owner and are tightly coupled to implement a given solution. Think of a

social network API and its client app, for example. In this case, the solution won’t strictly require

delegation, the incoming token will likely be expected to identify the user, and the tokens issued

to that client won’t be accepted by any API other than the 1st party one (if you exclude cases

where individual app owners decide to accept them anyway, which are outside the control of

the 1st party solution developer anyway).

From the end-user perspective, the client+API ensemble constituting the solution is a logical whole

- my experience of using my Twitter account through the Twitter app doesn’t usually require any

special consent where the APIs are explicitly called out. In that case, one could argue that the

component of the app requesting the token and the component implementing APIs are, in fact,

the same entity, which could be represented by the same identifier - hence, here’s the crucial

step, targeted by a token with the same audience… just like an ID token.

Once, in front of a beer, one of the authors of the OpenID Connect specification told me that

an ID token is just an access token with specialized semantics. That said, it’s still generally not

worth it to ever use ID tokens for calling APIs. Although narrowly defined 1st party scenarios do

exist, those would still be better off when implemented with access tokens (think about sender

constraint limitations mentioned above) and the risk of overreaching and using the ID token in

ways that expose you to serious security risks is just too great. I mentioned this particular case

here because you are likely to encounter that approach in the wild if you work in this space

long enough, and I wanted to empower you to understand the nuances and point of view of

the people following that approach: however, the best practice remains using access tokens for

calling APIs. If you need JWT access tokens, the aforementioned JWT profile for OAuth2 access

tokens is on its way.

ID Tokens and the Back Channel

OpenID Connect offers multiple different ways of signing in. The one we studied in the preceding

chapter leverages the front channel. It relies on the implicit flow (that is, issuing an ID Token

directly from the authorization endpoint) plus form post (transmitting the token to backend hosted

logic, as it is the norm for redirect based apps). That flow is just the one that happens to have

the least number of moving parts, as it doesn’t require the client app to obtain, manage, and

use a client secret. It also is the flow that has more or less the same security characteristics as

traditional protocols such as SAML or WS-Federation, which are still in very wide use in mission-

critical high-value scenarios.

The authorization code grant we just studied in this chapter for calling the API can be, and commonly is, used for performing sign-in operations - by obtaining ID tokens following the same steps we

studied for requesting an access token. Say that you are in a scenario in which, for some reason,

you don't want to disclose the bits of the ID token to the user’s browser: by using the authorization

code grant, you can make everything take place on the server side. You can just perform an

authorization code grant in the same way we did for getting a token for calling the API: you

just ask for an ID token as well. Note, that’s exactly what we did in our API calling scenario, by

including the openid scope in the initial request. All we need for making that operation count as

sign-in is to validate that ID token and create a front channel session on the base of its content.

The notable difference from the front channel is that, given that the client obtains the ID token

from a direct HTTPS connection with the token endpoint, there is no uncertainty about the

source from which the ID token bits came. The client knows for certain that the ID token

comes directly from the authorization server, with no intermediaries that could have tampered

with the content in transit. And with origin and integrity verified, there is no need to validate the

ID token’s signature. Think about it: if you were to validate the signature, you’d use the key you

retrieved from the discovery document. And why do you trust that it is the right key? Because you

retrieved the discovery endpoint over an HTTPS direct channel! The same assumptions hold for

the ID token retrieval from a direct connection with the token endpoint, which is why the client

can skip the signature verification.

What’s very, very important to understand is that not having to verify the signature does NOT

mean that the client is allowed to skip token validation! The client is still meant to validate

audience, issuer, expiration times, and all the other checks that the OpenID Connect specification

describes for the ID Token validation. The signature is only one of the many checks a recipient

should perform to validate incoming tokens, even in the front channel case.
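The following sketch illustrates those remaining checks, loosely following the ID token validation rules in the OpenID Connect Core specification; it is a simplified illustration, not a complete implementation.

```python
import time

def validate_id_token_claims(claims: dict, expected_issuer: str,
                             client_id: str, expected_nonce=None):
    """Claim checks that apply even when signature verification is skipped."""
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")
    audiences = claims.get("aud")
    if isinstance(audiences, str):
        audiences = [audiences]
    if client_id not in (audiences or []):
        raise ValueError("ID token not issued for this client")
    if claims.get("exp", 0) <= time.time():
        raise ValueError("ID token expired")
    if expected_nonce is not None and claims.get("nonce") != expected_nonce:
        raise ValueError("nonce mismatch")
```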

Obtaining an ID token via authorization code is technically more secure than receiving it through the

front channel. However, this technique is more onerous, as it requires the client to obtain, protect

and use an application credential - that has a management cost, associated risks (like leaving a secret in source control), and performance and availability challenges (extra server calls). If your application only needs to sign in users and doesn’t have particular constraints about having tokens

transit through the browser, the front channel technique works fine - as demonstrated by many

years of successful SAML deployments using similar techniques to protect high-value scenarios.

If you are indeed in a situation that calls for higher security, or if you are already performing API

calls requiring the authorization code flow anyway, you might consider implementing sign-in via

the back channel as described in this section.

The Userinfo Endpoint

A client requesting an ID token without specifying the profile and email scope values will receive

a skeleton token stating that user X (as expressed by an opaque identifier, usually) successfully

authenticated with issuer Y. The token also specifies the time and perhaps the authentication

modes, and no other info - in particular, no user attributes.

There might be multiple reasons for which a client might opt for such barebone ID token content.

For example, a client might want such a token to use an easy-to-set-up front channel sign-in flow

while avoiding disclosure of personally identifiable information (PII) to the browser. Alternatively,

clients might go that route simply to reduce the size of transferred data on a network that doesn't

have a lot of bandwidth, or on a metered connection where bigger ID tokens might result in the

user getting charged more for data use.

The good news is that clients can opt to work with barebone ID tokens and still gain access to

user attributes when necessary. OpenID Connect introduced a new API endpoint, called Userinfo

endpoint, which can be used for retrieving information about the user by presenting an appropriate

access token - following the same OAuth2 bearer token API calling technique studied earlier in

this chapter. Whenever the client needs to know something about the user, whether it didn’t save

the initial ID token or received a barebone one, it reaches out to the Userinfo endpoint using a

previously obtained access token. It will receive what is substantially the content that the client

would have gotten in an ID token requested with profile and email scopes.

The first chapter described the evolution that led from OAuth 2 to OpenID Connect. A key

passage was about a particular way of abusing OAuth for simulating sign-in, where the ability

to successfully call an API with an access token was considered proof enough for the client to

consider a user signed in. That had several problems: access tokens could not be tied to a user

in particular (very important if you are trying to authenticate, that is, to sign in), could not be

proven to have been issued as part of a sign-in operation for that app in particular, and could not

be standardized given that every provider protected APIs of different shapes (Facebook Graph,

Twitter API, etc.).

The Userinfo endpoint resolves the first and the third problem. The Userinfo response does provide

information about the user that obtained the access token used to secure the call to begin with

- and it’s standard, hence generic SDKs can be built to work against it. That makes it possible

for a client to implement pure OAuth 2.0 to retrieve user information in a standardized fashion.

It is very important to realize, however, that successfully calling the Userinfo endpoint is NOT equivalent to validating an ID token and alone CANNOT be used to implement sign-in.

Calling the Userinfo endpoint only proves that the corresponding access token is valid and

associated with the user identity whose attributes are returned: it does NOT prove that the

access token was issued for that particular client. OpenID Connect sign-in operations ALWAYS

require validating an ID token, although, as we have seen, in some circumstances the signature

check can be skipped from the validation checklist.

Another thing to take into account when considering using the Userinfo endpoint from a

confidential client is that all the discussions about the burden of using a secret apply here, as

that’s part of obtaining an access token.

After all that preamble, let’s take a look at how an actual call to the Userinfo endpoint takes

place. As usual, we are going to explain each step - please refer to the numbered messages in

the diagram in Figure 4.5.

Figure 4.5

1. Userinfo Request
The scenario in the diagram assumes that the client has already obtained a suitable access

token for calling the Userinfo endpoint. Invoking the Userinfo endpoint is simply an HTTP

GET request, attaching said access token in an authorization header.
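In code, the request can be as simple as the following Python sketch; the endpoint URL is a placeholder, and a real application should read the userinfo_endpoint value from the discovery document rather than hardcoding it.

```python
import requests

access_token = "OPAQUE_ACCESS_TOKEN"  # previously obtained from the authorization server

userinfo = requests.get(
    "https://authserver.example.com/userinfo",           # hypothetical endpoint
    headers={"Authorization": f"Bearer {access_token}"},
)
userinfo.raise_for_status()
print(userinfo.json())  # e.g. {"sub": "...", "name": "...", "email": "..."}
```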

You might notice that in this particular network trace, the access token value looks different

from all the other tokens shown in the diagrams so far. Whereas token values in earlier

diagrams were always clipped for presentation purposes, and their shape suggested the

classic JWT encoding, the bits on display here are the entirety of the access token and

don’t appear to follow any known pattern. That's because calling the Userinfo endpoint

is precisely a scenario in which the use of opaque, formatless tokens makes sense. The

Userinfo endpoint is co-located with the authorization server: there is no need for cross-

boundary communication. The entity that issued the access token in the first place is

the same entity responsible for validating it during the Userinfo API call. That means that

the two tasks can access the exact same memory space. In concrete terms, this means

that the access token intended to access the Userinfo API doesn't need to be encoded

in any particular format. It can literally be the identifier of a row in a database that was

created at issuance time and can now be looked up at API invocation time, or any other

technique relying on shared memory.

This is a luxury we cannot afford when the API being invoked is managed by a 3rd party and

hosted elsewhere. In this scenario, the parties involved are forced to rely on token validation

based on formats, introspection, and in general, techniques meant to accommodate the

lack of shared memory between the entity issuing the token and the entity consuming it.

2. Userinfo Response
The response returned by the Userinfo endpoint contains pretty much the same

list of claims carried by an ID token obtained via a request that includes the profile

scope.

The Hybrid Grant

The hybrid grant is, as the name suggests, a mix of multiple flows into one. It combines a sign-

in operation (getting an ID token from the front channel) and obtaining an access token for

invoking an API from the client backend (by requesting and redeeming an authorization code).

That saves network round trips, consolidates prompts and consent requests, and is, in general,

a very efficient way of performing a sign-in operation while getting ready to invoke APIs at the

same time. No diagram is shown for the hybrid grant, as you can easily piece it together yourself

by combining the web sign-in flow diagram in the preceding chapter and the authorization code

flow shown here. OpenID Connect is unique in this ability to mix and match sign-in and calling

APIs and having entities playing both roles: a “resource”, as in something being accessed as part

of the sign-in exchange, and a client, consuming other resources such as APIs. The fact that the app

in OpenID Connect is always called a client, emphasizing the latter role and omitting the former,

is a nod to its OAuth 2 origins (and to the fact that “resource” in OAuth 2 is reserved for APIs).

The hybrid grant is a really powerful tool that is commonly used in applications. In fact, today, it's

pretty rare to be able to state that an app will forever either only require sign-in, or only call APIs.

It's usually a continuum, and the availability of this grant makes it easy to add one functionality

or the other by simply modifying either the implicit plus form_post grant or the authorization

code grant.

Client Credentials Grant

In the last section of the chapter dedicated to invoking APIs, we will study the client credentials

grant - a flow defined by OAuth 2 for the cases where a client needs to get access tokens using

its own programmatic identity, rather than doing so on behalf of a user. Unlike the grants we

examined so far, the client credentials grant has no public client variant - it can only be performed

by a confidential client. All the flows examined so far for invoking APIs are designed to grant clients delegated

access to resources, that is to say, enabling clients to “borrow” some of the user’s privileges

when accessing resources.

There are a number of situations in which clients need to operate as themselves, rather than

on behalf of a user. These are scenarios in which the application has an identity and has direct

resource privileges in itself. That class of scenarios doesn’t require a user to be signed in or

otherwise present. Even if a user happens to be signed in at the time of access, their privileges

might not be the ones the client needs to exercise. A classic example of that scenario occurs

when an application needs to perform an operation that the currently signed in user has no

privilege for. Imagine, for example, a continuous integration (CI) web app in which the final step

of a build process is taking the binaries of a compiled product and saving them in a particular

share that no user has access to.

One way of working around the problem would be to open the floodgates and give every user the

permission to access that share. That would preserve the CI’s ability to call the share in delegated

access mode. However, the risk for abuse would be very high: users might choose to exercise

their privileges on that file share even outside of the CI process.

An alternative would be to give privileges for file share access to the application itself. In turn,

the application can feature logic that determines which users should be able to write to the

share. So, it can use its own write privileges to perform write operations only for the appropriate

user sessions, and only within the limits of what the CI logic requires. Said in another way, by

granting the application itself the privileges required to access a resource, the responsibility of

determining who can do what is transferred from the authorization server to the application itself,

which becomes the gatekeeper for the resource.

One common way of referring to the aforementioned pattern is to say that the application and

the downstream APIs it accesses constitute a trusted subsystem.

To use a real-world analogy, consider how a classic amusement park handles visitors’ access. At

the entrance, a visitor pays for a ticket and is given a bracelet or equivalent visible sign that the

individual paid for access. This sign does not need to bear any indication of the identity of the

wearer. Once the guest is in, she can enjoy every ride without any further access control check

other than the bracelet, broadcasting her right to be on the premises.

Similarly, once a user signs in with the CI web app, all the subsequent calls to the downstream

API will be performed as the web app itself, just by virtue of the fact that the user successfully

signed in. In a way, you can think of this as a resurgence of the concept of perimeter. However,

the big difference with a traditional network perimeter is that the boundaries here are mostly

logical (API’s willingness to accept tokens issued to the CI app client) rather than physical (actual

network boundaries).

This class of patterns is pretty common in the context of microservices, where there is a gateway

that validates the identity of a caller. Once that check has been successfully performed, all the

subsequent calls from the gateway can be performed carrying tokens identifying the calling app

rather than the user. The user information might still be required, but it doesn’t strictly need to

travel in an issued token.

As is the case with every confidential client flow, the critical point here is putting particular care into provisioning and maintaining client credentials: for example, by making sure that

no entity other than the application has access to its credentials. Another critical aspect of the

scenario, not explicitly covered by the standards but of vital importance, is to carefully choose the

privileges assigned to the application and the application logic exercising them. The least privilege

principle remains a key best practice in this scenario.

Let's take a look at how the client credentials grant actually works on the wire: please refer to

Figure 4.6.

Figure 4.6

1. Access Token Request
The client application requests a token by contacting the token endpoint directly, similarly

to what we have observed in the server-side segments of all the grants we have studied

so far.

In the sample scenario we have been discussing so far, the call is performed during a user

session - however that is entirely arbitrary. Remember that the client credentials grant

only relies on the client’s own identity rather than requesting delegated authorization from

a user. So, from the OAuth 2 standpoint, the flow described here might just as well occur

in a command-line tool, a long-running process, or in general, any kind of application

executed in a context where distribution and protection of client credentials are possible.

The request is a customary HTTP POST, carrying the well-known client_id, client_secret,

and grant_type (this time, set to client_credentials).

Observing the body of the POST message, one notable difference from all the grants

encountered so far is that the message for the token endpoint doesn’t contain any artifact

besides the client_secret. In contrast, the authorization code grant and the refresh token grant both included some other artifact to redeem. Once again, this shows why the other

flows are conceivable with public clients as well, whereas the client credentials grant isn’t

possible without, well, client credentials.

Here it’s opportune to stress that client credentials and the client credentials grant are two

separate, distinct concepts. Client ID and client secret are the client credentials assigned

to a confidential client application and are used to identify the client app in every grant

whenever communication with the token endpoint occurs. The client credentials grant is

a grant that happens to require only the client credentials, and no other artifact, to be

performed. It’s easy to get confused when using the terms loosely: whenever you hear

someone mentioning “client credentials”, it’s useful to be clear on whether they are talking

about the grant, or just about the client ID and client secret.

One last observation on the request message: the audience parameter is required to

indicate to the authorization server what resource the client is requesting access to. This

information is necessary for authorization servers that can protect multiple resource servers;

hence there’s no default resource the authorization server can refer to. As mentioned in

our earlier discussions about the audience parameter, the standard way of signaling that

information to the authorization server is through the resource parameter as defined in the

resource indicators specification, which was formalized into an RFC only a few months

ago. At the time of writing, Auth0 doesn’t support resource indicators.
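On the wire, the request might look like the following Python sketch; the endpoint and credential values are placeholders, and the audience parameter follows the Auth0 convention described above.

```python
import requests

response = requests.post(
    "https://authserver.example.com/oauth/token",  # hypothetical token endpoint
    data={
        "grant_type": "client_credentials",
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
        "audience": "https://api.example.com",     # the resource being targeted
    },
)
response.raise_for_status()
access_token = response.json()["access_token"]
```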

2. Token Response
The token endpoint response is entirely unsurprising, carrying back the requested access

token just as described for other grants.

Of course, there is no id_token, given that the grant didn’t entail user identity in any

capacity.

Notably absent is the refresh token, too. In this scenario, it would simply serve no purpose.

The refresh token is meant to allow a client app to obtain a new access token to substitute

an expired one, and to do so without bugging the user with an extra prompt. However,

there is no need to ask anything of a user here, as the client credentials are available to

the client app at any time to request a new token.

Important note: the mechanism shouldn't be abused. Once a client requests and obtains

an access token, it should keep it around (stored with all the safety measures the task

requires) for the duration of its useful lifetime and use it whenever it needs to call an

API. Discarding still-valid access tokens and requesting a new access token from the

authorization server every time can be a costly anti-pattern, at all levels: security (every

time credentials are sent on the wire, there’s an opportunity for something to go wrong),

performance (network calls), availability (possibility of being throttled, transient network

failures), and money (various providers charge per issued token).
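A minimal sketch of such caching logic follows, assuming a fetch_token callable that performs the client credentials grant and returns the parsed token endpoint response.

```python
import time

class AppTokenCache:
    """Reuse an app-only access token until shortly before it expires,
    instead of hitting the token endpoint on every API call."""

    def __init__(self, fetch_token):
        self._fetch_token = fetch_token  # runs the client credentials grant
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Renew a minute early to absorb clock skew and in-flight latency.
        if self._token is None or time.time() >= self._expires_at - 60:
            response = self._fetch_token()  # {"access_token": ..., "expires_in": ...}
            self._token = response["access_token"]
            self._expires_at = time.time() + response["expires_in"]
        return self._token
```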

Note that, in this particular case, Auth0 uses scope to represent what the client can do.

Given what we said earlier about scopes, this is a bit controversial: scopes normally restrict the set of privileges that the client can exercise to a subset of the privileges that the user has, and here there is no user. Even if it appears not quite

appropriate, that's how Auth0 does it today. It just represents the privileges that have been

granted to the client application. There is no real security risk because of this: if a resource

server were to interpret the incoming scopes as the delegated authorization concept we discussed so far, the power they’d confer on the caller would be less, not more. However,

it’s an exception that is important to be aware of.

3. Calling the API


As expected, the call to the API occurs as usual, without any dependency on how the client

obtained the access token being used to protect that call. This completes our journey to

understand how to leverage OAuth2 and OpenID Connect to invoke APIs from a traditional

web app, and in general, any confidential client.

In the next chapter, we'll take a look at native clients, mobile clients and pretty much any

application that an end-user can directly operate… and that isn’t a browser.

Chapter 5 - Desktop and Mobile Apps

COMING SOON

Chapter 6 - Single Page Applications

COMING SOON

