
+

Lectures 27 and 28 – Prediction and Optimization in Advertising and Kidney Allocation
IEOR 242 – Applications in Data Analysis
Fall 2019 – Paul Grigas

IEOR 242, Fall 2019 - Lecture 27
+ 2

Project Reminders
n Presentations are next week: by midnight on December 9 (YouTube) or during class time on December 10
n Presentation (slide deck + delivery, etc.) is 20% of your
project grade
n Five minute presentations
n Consider the presentation as a five minute “pitch” of your
project idea and what you have done so far
n The analysis does not have to be totally complete at this stage
as you will still have time left to work on the project
n Still, you should have done enough work in data processing and have at least run some initial models by this point
n The presentation should entice us to read your report to see
how the “story” ends

n Presentation and report should adequately address four


components (see document on bCourses, etc.)
n Motivation, data, analytics models, impact
IEOR 242, Fall 2019 - Lecture 27
+ 3

Project Reminders cont.

n Project report is due on Friday, December 20

n From the Guidelines Document on bCourses:


n The report should be ~4 pages of text
n This is not a hard constraint, but rather view 4 pages of text as
a target
n It is not necessary to sacrifice readability by stuffing
everything in the appendix, but please keep things
reasonable – aim for 4 pages of text
n Tell a complete story that summarizes what you have done
(successes hopefully, but also possibly failures)
n Please also submit your code (R, Python, …) and instructions
for how to reproduce your results

IEOR 242, Fall 2019 - Lecture 27


+ 4

Today’s Agenda

n Internet Advertising

n Linear Optimization Review

n Internet Advertising and “Predict-then-Optimize”

n A smarter way to train models in the predict-then-optimize setting

n Kidney Allocation with ML and Optimization (if


time)

IEOR 242, Fall 2019 - Lecture 27


+
Internet Advertising and
Optimization

IEOR 242, Fall 2019 - Lecture 27 5


+ 6

Analytics, ML, and Optimization –


One Perspective

(Image from Dimitris Bertsimas, Editor-in-Chief of the


INFORMS Journal on Optimization.)

IEOR 242, Fall 2019 - Lecture 27


+ 7

Online Advertising

n Total US advertising spend in 2015: $189.06 billion


n Online spend: $58.61 billion
n In 2016 online advertising exceeded TV advertising spend,
for the first time in US history
n Mobile advertising ~30% of online spend
n Sponsored search ad spend ~51% of non-mobile online
spend

n Google’s (global) ad revenue in 2017: $95.38


billion

IEOR 242, Fall 2019 - Lecture 27


+ 8

Google Advertising Revenue

Google Advertising Revenue ($ million) from 2000-2016

[Figure: bar chart of Google advertising revenue ($ million) by year, 2000-2016; vertical axis from $0 to $90,000]

IEOR 242, Fall 2019 - Lecture 27


+ 9

Google Advertising and AdWords

n Google AdWords was launched in 2000 to generate


some revenue

n Original idea proposed by Bill Gross from Idealab

n Advertisers run campaign themselves

n AdWords provided ~99% of Google’s revenue from


2004-2007 and now accounts for ~90% of revenue

n Excellent online site for advertisers:


n Heavy traffic – attracts largest group of potential buyers
n Google maintains clean and clear tone of results page
n Works for huge companies and small local businesses alike

IEOR 242, Fall 2019 - Lecture 27


+ 10

How Sponsored Search Advertising


Works
n Advertisers place bids for search queries

n Google determines the placement (order) of the advertisements


based on each ad’s “quality score”
n Predominant component of quality score is the “expected
revenue”: amount bid times estimated Click-Through Rate (CTR)
n Other possible “adjustment” factors: Relevance of ad copy to a
keyword, landing page quality and loading time, …

n Ads are displayed as “sponsored links” on the results page


n Advertisers only pay if a user clicks on their ad
n In contrast with traditional CPM concept in advertising
n The amount advertisers pay Google is determined by a “generalized
second-price auction”

n Google’s problem: accurately predict the CTR of an ad and use


this to determine whether/where to place an ad for each
particular user
IEOR 242, Fall 2019 - Lecture 27
+ 11

Vickrey Auctions

n Named after Professor William Vickrey


(Columbia University)
n awarded the Nobel Prize in Economics in 1996
n Announced on October 8, 1996
n Died of a heart attack on October 11, 1996

n Amount advertiser pays per click (PPC)


depends on the ad position
n Pay the minimum amount required to keep
position
n This is the price of the bidder ranked below you
(plus 1 cent more)
n Forces you to bid your true valuation

IEOR 242, Fall 2019 - Lecture 27


+ 12

Click-Through-Rates (CTRs)

n Even though an ad is displayed, it might not be


clicked on

n Click-Through Rate (CTR) is the probability that


an ad will be clicked on each time it is displayed
on a results page

n Can vary by the position on the page, keyword,


geographic location, user profile, device being
used for access, …

IEOR 242, Fall 2019 - Lecture 27


+ 13

Click-Through Rate Data From Random


Forests/Boosting Lecture 10
n 6,057 new advertisements shown to specified groups of users in a specified position on the Tencent search engine soso.com
n Dependent Variable: Click-through rate (CTR) –
the proportion of impressions yielding a click
n Independent Variables
n Number of words in the title and text of the advertisement
n Number of ads on the page and the position of this ad
n Average CTR of this advertiser’s ads, both across all ads and
across all ads in this position
n Average CTR of other ads for this query, both across all ads
and across all ads in this position
n Gender and age of the users viewing the advertisement
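
n For illustration, a minimal scikit-learn sketch of this kind of CTR model (not the course's actual code; the file name and column names here are hypothetical):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

ads = pd.read_csv("soso_ads.csv")          # hypothetical file with the features listed above
X = ads.drop(columns=["CTR"])              # ad, position, advertiser, and user features
y = ads["CTR"]                             # proportion of impressions that yielded a click
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=242)

rf = RandomForestRegressor(n_estimators=500, random_state=242)
rf.fit(X_tr, y_tr)
pred = rf.predict(X_te)
# r2_score on held-out data is a close proxy for the OSR2 reported on the next slide
print("OSR2 (approx.):", r2_score(y_te, pred), "MAE:", mean_absolute_error(y_te, pred))
```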

IEOR 242, Fall 2019 - Lecture 27


+ 14

Predictive Quality for soso.com Data


Out-of-Sample
Model Type          OSR2     MAE
Linear Regression   0.483    0.035
CART Model          0.508    0.034
Random Forest       0.587    0.031
Boosting            0.588    0.032

n Boosting and Random Forests outperform


everything else

n This behavior is typical for these methods

n But remember, R2 is not the only goal in life…

IEOR 242, Fall 2019 - Lecture 27


+ 15

Example Query with Ads

[Figure: example search results page for a query with depth = 4, showing sponsored ads in Positions 1 through 4]

IEOR 242, Fall 2019 - Lecture 27


+ 16

CTR vs. Position on Page

n CTR is typically higher for ads placed in higher


positions on the page

n In other words, CTRs tend to look something like


this:
Position on Page Click-Through-Rate (CTR)
First 0.08
Second 0.05
Third 0.025
Fourth 0.01
Fifth 0.005

IEOR 242, Fall 2019 - Lecture 27


+ 17

Google’s Optimization Problem

n Objective: Maximize revenue

n Decisions: Which ads to display for each query and what


position (order) to display them
n Since revenue is pay-per-click, it does not make sense to just display
the ads with the highest bids
n Google wants to display the ads that people are more likely to click
on
n Must account for (estimated) Click-Thru Rate (CTR) of ad
n Quality Score (QS) = $Bid x CTR x 1,000
n Order the position of ads by QS

n Constraints:
n Cannot exceed advertisers’ budgets
n Cannot display more ads in any position than the number of queries

IEOR 242, Fall 2019 - Lecture 27


+ 18

Today’s Example: Hotels near MIT


n Time Horizon: 1 day
n More typically weekly or monthly in practice

n Three Queries:
n “hotel near MIT”
n “MIT hotel”
n “Cambridge hotel”

n Four Bidders:
n Kendall Hotel
n Marriott Boston/Cambridge
n Royal Sonesta
n Hotel Marlowe

n Two positions to display on the results page

(Example from the Analytics Edge textbook)

(In reality, more bidders, more queries, and more positions …)

IEOR 242, Fall 2019 - Lecture 27


+ 19

Detailed Predicted Click-Thru-


Rates
“hotel near MIT”
Ad Position
Hotel
1 2 3 4
Kendall 0.097 0.061 0.030 0.012
Marriott 0.054 0.034 0.017 0.007
Sonesta 0.065 0.040 0.020 0.008
Marlowe 0.086 0.054 0.027 0.011

IEOR 242, Fall 2019 - Lecture 27


+ 20

Detailed Predicted Click-Thru-


Rates, cont.
“MIT hotel”
Ad Position
Hotel
1 2 3 4
Kendall 0.097 0.061 0.030 0.012
Marriott 0.054 0.034 0.017 0.007
Sonesta 0.076 0.047 0.024 0.009
Marlowe 0.086 0.054 0.027 0.011

IEOR 242, Fall 2019 - Lecture 27


+ 21

Detailed Predicted Click-Thru-


Rates, cont.
“Cambridge hotel”
Ad Position
Hotel
1 2 3 4
Kendall 0.081 0.051 0.025 0.010
Marriott 0.070 0.044 0.022 0.009
Sonesta 0.086 0.054 0.027 0.011
Marlowe 0.108 0.067 0.034 0.013

IEOR 242, Fall 2019 - Lecture 27


+ 22

Bids, Budgets, and Queries

              Bid
Hotel         “hotel near MIT”   “MIT hotel”   “Cambridge hotel”   Daily Budget
Kendall       $8                 $12           $0                  $10
Marriott      $25                $15           $25                 $50
Sonesta       $15                $0            $15                 $20
Marlowe       $15                $20           $10                 $30

Queries/day   15                 20            25

IEOR 242, Fall 2019 - Lecture 27


+
(Basic) Linear Optimization
Modeling

IEOR 242, Fall 2019 - Lecture 27 23


+ 24

Linear Optimization Models


n A linear optimization problem is the problem of minimizing (or maximizing) a linear objective function subject to linear constraints:

  minimize    c^T x
  subject to  A x ≤ b
              x ≥ 0

n Here c is a vector of objective function coefficients, x is a vector of decision variables, b is an m-dimensional vector, and A is an m x n matrix
IEOR 242, Fall 2019 - Lecture 27
+ 25

Common Mathematical
Ingredients
Common Mathematical Ingredients

n Objective to be optimized (minimize or maximize)
n A set of decision variables
n An objective function that expresses the objective in terms of the decision variables
n Constraints that limit or otherwise impose requirements on the relationships between the decision variables (a feasible solution is a set of decision variable values that satisfies the constraints)
n Nonnegativity conditions on the decision variables
IEOR 242, Fall 2019 - Lecture 27
+ 26

Linear Optimization Models

n Previously, the optimization problems we saw in


this class involved nonlinear functions, but were
unconstrained
n Nonlinear functions are more complicated than linear ones
n Constrained problems are more complicated than
unconstrained ones

IEOR 242, Fall 2019 - Lecture 27


+ 27

Some Examples of Applications of


Optimization Models
n American Airlines improved crew scheduling ($20 million/year), overbooking, and discount fare allocation; together these contribute ~$500 million/year
n NationalGrid: using optimization to schedule its gas
operations in Northeastern US ~$70 million in savings
n Grantham, Mayo, Van Otterloo & Co. (GMO),
Riversource Investments, others use optimization
models for portfolio and other investment management
optimization (“quantitative asset management”)
n … wide variety of applications in just about every
industrial sector

IEOR 242, Fall 2019 - Lecture 27


+ 28

Linear Optimization Example

n A pottery manufacturer can produce 4 types of


dining room service sets: English, Currier,
Primrose, and Bluetail. Primrose can be made by
two different methods.
n Each set uses clay, enamel, dry room time, and kiln
time. The manufacturer is committed to making
the same amount of Primrose using methods 1 and
2. How much of each set should be produced so as
to maximize profit? Assume the company can sell
everything it produces.

IEOR 242, Fall 2019 - Lecture 27


+ 29

LP Example

Data               English  Currier  Primrose 1  Primrose 2  Bluetail  Resource Availability
Clay (lbs.)        10       15       10          10          20        130
Enamel (lbs.)      1        2        2           1           1         13
Dry Room (hours)   3        1        6           6           3         45
Kiln (hours)       2        4        2           5           3         23
Earnings ($/Set)   51       102      66          66          89

IEOR 242, Fall 2019 - Lecture 27


+ 30

Define Variables

n These represent the decisions that the manager


needs to make (and are fully controllable)
n In this example, we need to decide how much of
each set to produce
n Therefore, our decision variables are
n E = # of English sets
n C = # of Currier sets
n P1 = # of Primrose type 1 sets
n P2 = # of Primrose type 2 sets
n B = # of Bluetail sets

IEOR 242, Fall 2019 - Lecture 27


+ 31

Construct the Objective Function

n This is the output you are trying to optimize

n It can be profit, revenue, costs, risk, returns, anything

n The objective function is made up of two parts:


n A linear function that includes our decision variables
n An optimization objective
(i.e. are we minimizing or maximizing?)

n In this example we are trying to get the most profit by


optimizing our production quantities, therefore:

n Maximize Profit = 51 E + 102 C + 66 P1 + 66 P2 + 89 B

IEOR 242, Fall 2019 - Lecture 27


+ 32

Construct Constraints

n Every linear optimization problem will have some


constraints to limit the feasible solution set

n Before writing equations, make a list of all the constraints in English; this will help you make sure you've covered all your bases

n In this example we are constrained by:


n We only have 130lbs of Clay
n We only have 13lbs of Enamel
n We only have 45hrs in the Dry Room
n We only have 23hrs in the Kiln
n We must make the same amount in both Primrose methods

IEOR 242, Fall 2019 - Lecture 27


+ 33

Construct Constraints, cont.

n Then translate English statements into equations:

n In this example:
n (Clay) 10 E + 15 C + 10 P1 + 10 P2 + 20 B ≤ 130
n (Enamel) 1 E + 2 C + 2 P1 + 1 P2 + 1 B ≤ 13
n (Dry Room) 3 E + 1 C + 6 P1 + 6 P2 + 3 B ≤ 45
n (Kiln) 2 E + 4 C + 2 P1 + 5 P2 + 3 B ≤ 23
n (Primrose) P1 – P2 = 0

IEOR 242, Fall 2019 - Lecture 27


+ 34

Nonnegativity Conditions
(Variable Restrictions)
n There are usually a few intuitive constraints that we
have to explicitly mention
n These include assumptions about decision variables being
nonnegative and/or integers

n In this example:
n E≥0
n C≥0
n P1 ≥ 0
n P2 ≥ 0
n B≥0

IEOR 242, Fall 2019 - Lecture 27


+ 35

The Full Model

Maximize 51 E + 102 C + 66 P1 + 66 P2 + 89 B

Subject to:
(Clay) 10 E + 15 C + 10 P1 + 10 P2 + 20 B ≤ 130
(Enamel) 1 E + 2 C + 2 P1 + 1 P2 + 1 B ≤ 13
(Dry Room) 3 E + 1 C + 6 P1 + 6 P2 + 3 B ≤ 45
(Kiln) 2 E + 4 C + 2 P1 + 5 P2 + 3 B ≤ 23
(Primrose) P1 – P2 = 0
(Nonnegativity) E, C, P1, P2, B ≥ 0
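
n For illustration, a minimal sketch of solving this same model in Python with scipy (the slides that follow solve it in a spreadsheet instead):

```python
from scipy.optimize import linprog

profit = [51, 102, 66, 66, 89]                 # E, C, P1, P2, B
A_ub = [[10, 15, 10, 10, 20],                  # clay (lbs.)
        [ 1,  2,  2,  1,  1],                  # enamel (lbs.)
        [ 3,  1,  6,  6,  3],                  # dry room (hours)
        [ 2,  4,  2,  5,  3]]                  # kiln (hours)
b_ub = [130, 13, 45, 23]
A_eq = [[0, 0, 1, -1, 0]]                      # P1 - P2 = 0
b_eq = [0]

# linprog minimizes, so negate the profit vector to maximize
res = linprog(c=[-p for p in profit], A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 5, method="highs")
print(res.x, -res.fun)                         # roughly [0, 2, 0, 0, 5] and profit 649
```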

IEOR 242, Fall 2019 - Lecture 27


+ 36

Spreadsheet with Data and Model

Data: English Currier Primrose 1 Primrose 2 Bluetail Resource Availability


Clay (lbs.) 10 15 10 10 20 130
Enamel (lbs.) 1 2 2 1 1 13
Dry Room (hours) 3 1 6 6 3 45
Kiln (hours) 2 4 2 5 3 23

Selling Prices ($/Set) $51 $102 $66 $66 $89

Decision Variables: English Currier Primrose 1 Primrose 2 Bluetail


# of Sets Produced 0 0 0 0 0

Objective Function: MAX 0

Constraints: LHS Inequality RHS


Clay 0 <= 130 lbs.
Enamel 0 <= 13 lbs.
Dry Room 0 <= 45 hours
Kiln 0 <= 23 hours
Primrose 0 = 0

IEOR 242, Fall 2019 - Lecture 27


+ 37

Spreadsheet with Solution

Data: English Currier Primrose 1 Primrose 2 Bluetail Resource Availability


Clay (lbs.) 10 15 10 10 20 130
Enamel (lbs.) 1 2 2 1 1 13
Dry Room (hours) 3 1 6 6 3 45
Kiln (hours) 2 4 2 5 3 23

Selling Prices ($/Set) $51 $102 $66 $66 $89

Decision Variables: English Currier Primrose 1 Primrose 2 Bluetail


# of Sets Produced 0 2 0 0 5

Objective Function: MAX 649

Constraints: LHS Inequality RHS


Clay 130 <= 130 lbs.
Enamel 9 <= 13 lbs.
Dry Room 17 <= 45 hours
Kiln 23 <= 23 hours
Primrose 0 = 0

IEOR 242, Fall 2019 - Lecture 27


+ 38

Common Mathematical
Ingredients
Common Mathematical Ingredients

Objective to be optimized:                      Maximize profit
A set of decision variables:                    E, C, P1, P2, B
An objective function that expresses the
objective in terms of the decision variables:   51 E + 102 C + 66 P1 + 66 P2 + 89 B
Constraints that limit or otherwise impose
requirements on the relationships between
the decision variables:                         (Clay) 10 E + 15 C + 10 P1 + 10 P2 + 20 B ≤ 130, …
Nonnegativity conditions on the
decision variables:                             E, C, P1, P2, B ≥ 0
IEOR 242, Fall 2019 - Lecture 27
+ 39

Common Mathematical Problem, cont.

n A constraint can be in the form of:


n an equality constraint ( = ) , or
n an inequality constraint ( ≤ or ≥ )

n Each constraint can be rearranged to have the following format, e.g.:


n 14.0 a – 12.25 b + … + 7.2 f ≥ 925.2

constraint function relation Right-Hand-Side (RHS)

n A feasible solution is an assignment of the decision variables that satisfies


all of the constraints (including the nonnegativity conditions)

n Goal: find a feasible solution that optimizes the objective function. This is
called a constrained optimization problem.

n An optimal solution is a feasible solution that achieves the best value of the
objective function over all other feasible solutions.

IEOR 242, Fall 2019 - Lecture 27


+ 40

Linear Optimization Model

n The problem is a linear optimization model if


n all constraints are linear functions
n the objective function is a linear function

9 A + 2 B – 5 C + 7 D – 174.7 is a linear function

9 A + 2 B·C + 33.3 is not linear (product of decision variables)

9 A + 3 B + D^2 + 18.7 is not linear (squared decision variable)

IEOR 242, Fall 2019 - Lecture 27


+ 41

Nonlinear Optimization Modeling

n A nonlinear optimization model is an optimization problem


where some of the functions are nonlinear:
n perhaps nonlinear objective function
n perhaps nonlinear constraint functions

n Nonlinear optimization models can be either moderately more


difficult to solve or extremely more difficult to solve, depending
on the mathematical structure of the nonlinear functions

n Where do nonlinear optimization problems most typically arise


in management? In engineering? In machine learning/statistics?

IEOR 242, Fall 2019 - Lecture 27


+ Integer and Binary Decision Variables in 42

Optimization

n A decision variable X is an integer variable if its values are


restricted to be the nonnegative whole numbers 0, 1, 2, 3, 4,
…. (the integers)

n A decision variable Y is a binary variable if its values are


restricted to be either 0 or 1
n binary variables are used to conveniently handle a lot of
modeling situations

n Using integer and/or binary decision variables can make the


optimization model moderately or extremely more difficult to
solve

n We will use integer variables today…

IEOR 242, Fall 2019 - Lecture 27


+
Internet Advertising and
Optimization

IEOR 242, Fall 2019 - Lecture 27 43


+ 44

Today’s Example: Hotels near MIT


n Time Horizon: 1 day
n More typically weekly or monthly in practice

n Three Queries:
n “hotel near MIT”
n “MIT hotel”
n “Cambridge hotel”

n Four Bidders:
n Kendall Hotel
n Marriott Boston/Cambridge
n Royal Sonesta
n Hotel Marlowe

n Two positions to display on the results page

(From the Analytics Edge textbook)

(In reality, more bidders, more queries, and more positions …)

IEOR 242, Fall 2019 - Lecture 27


+ 45

Detailed Predicted Click-Thru-


Rates
“hotel near MIT”
Ad Position
Hotel
1 2 3 4
Kendall 0.097 0.061 0.030 0.012
Marriott 0.054 0.034 0.017 0.007
Sonesta 0.065 0.040 0.020 0.008
Marlowe 0.086 0.054 0.027 0.011

IEOR 242, Fall 2019 - Lecture 27


+ 46

Detailed Predicted Click-Thru-


Rates, cont.
“MIT hotel”
Ad Position
Hotel
1 2 3 4
Kendall 0.097 0.061 0.030 0.012
Marriott 0.054 0.034 0.017 0.007
Sonesta 0.076 0.047 0.024 0.009
Marlowe 0.086 0.054 0.027 0.011

IEOR 242, Fall 2019 - Lecture 27


+ 47

Detailed Predicted Click-Thru-


Rates, cont.
“Cambridge hotel”
Ad Position
Hotel
1 2 3 4
Kendall 0.081 0.051 0.025 0.010
Marriott 0.070 0.044 0.022 0.009
Sonesta 0.086 0.054 0.027 0.011
Marlowe 0.108 0.067 0.034 0.013

IEOR 242, Fall 2019 - Lecture 27


+ 48

Bids, Budgets, and Queries

              Bid
Hotel         “hotel near MIT”   “MIT hotel”   “Cambridge hotel”   Daily Budget
Kendall       $8                 $12           $0                  $10
Marriott      $25                $15           $25                 $50
Sonesta       $15                $0            $15                 $20
Marlowe       $15                $20           $10                 $30

Queries/day   15                 20            25

IEOR 242, Fall 2019 - Lecture 27


+ 49

Bids and CTR for Ad Position 1

Bid and CTR for Ad Position 1

Hotel      “hotel near MIT”        “MIT hotel”             “Cambridge hotel”       Daily Budget
           Bid    CTR (Pos. 1)     Bid    CTR (Pos. 1)     Bid    CTR (Pos. 1)
Kendall    $8     0.097            $12    0.097            $0     0.081            $10
Marriott   $25    0.054            $15    0.054            $25    0.070            $50
Sonesta    $15    0.065            $0     0.076            $15    0.086            $20
Marlowe    $15    0.086            $20    0.086            $10    0.108            $30

Queries/day       15                      20                       25
IEOR 242, Fall 2019 - Lecture 27
+ 50
Quality Score = (expected bid revenue/click) x 1,000

Bid, CTR, Quality Score (QS), and Order

Hotel      “hotel near MIT”              “MIT hotel”                   “Cambridge hotel”             Daily Budget
           Bid   CTR    QS     Order     Bid   CTR    QS     Order     Bid   CTR    QS     Order
Kendall    $8    0.097  776    4         $12   0.097  1,164  2         $0    0.081  0      -         $10
Marriott   $25   0.054  1,350  1         $15   0.054  810    3         $25   0.070  1,750  1         $50
Sonesta    $15   0.065  975    3         $0    0.076  0      -         $15   0.086  1,290  2         $20
Marlowe    $15   0.086  1,290  2         $20   0.086  1,720  1         $10   0.108  1,080  3         $30

Queries/Day       15                          20                           25

776 = $8 x 0.097 x 1,000


IEOR 242, Fall 2019 - Lecture 27
+ 51

Bidding Landscapes

n Ordered set of bidders for each query

n Bidders for “hotel near MIT”: Kendall, Marriott, Sonesta, Marlowe


n Order by Quality Score (QS)
n Kendall
n QS = 776
n Marriott
n QS = 1,350
n Sonesta
n QS = 975
n Marlowe
n QS = 1,290

n Bidding Landscape for “hotel near MIT” is:


{Marriott, Marlowe, Sonesta, Kendall}
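
n A small Python sketch of this calculation (illustrative only), using the position-1 bids and CTRs for “hotel near MIT” from these slides:

```python
bids_pos1 = {"Kendall": (8, 0.097), "Marriott": (25, 0.054),
             "Sonesta": (15, 0.065), "Marlowe": (15, 0.086)}   # (bid, position-1 CTR)

qs = {h: bid * ctr * 1000 for h, (bid, ctr) in bids_pos1.items()}   # QS = bid x CTR x 1,000
landscape = sorted((h for h in qs if qs[h] > 0), key=qs.get, reverse=True)
print(qs)          # approximately {'Kendall': 776, 'Marriott': 1350, 'Sonesta': 975, 'Marlowe': 1290}
print(landscape)   # ['Marriott', 'Marlowe', 'Sonesta', 'Kendall']
```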
IEOR 242, Fall 2019 - Lecture 27
+ 52

Ordered Bidding Landscapes

Quality Score (QS) and Order

Hotel      “hotel near MIT”     “MIT hotel”       “Cambridge hotel”
           QS      Order        QS      Order     QS      Order
Kendall    776     4            1,164   2         0       -
Marriott   1,350   1            810     3         1,750   1
Sonesta    975     3            0       -         1,290   2
Marlowe    1,290   2            1,720   1         1,080   3

IEOR 242, Fall 2019 - Lecture 27


+ 53

Bidding Landscapes and Slates


n “hotel near MIT”
n Bidding Landscape is {Marriott, Marlowe, Sonesta, Kendall}

n “MIT hotel”
n Bidding Landscape is {Marlowe, Kendall, Marriott}

n “Cambridge hotel”
n Bidding Landscape is {Marriott, Sonesta, Marlowe}

n Suppose we can display at most k = 2 positions (k is the number of positions on


the results page)

n For “hotel near MIT” we can display any of the following ordered “slates”:
n { Marriott, Marlowe }, { Marriott, Sonesta }, { Marriott, Kendall }, { Marlowe,
Sonesta }, { Marlowe, Kendall }, { Sonesta, Kendall }, { Marriott }, { Marlowe },
{ Sonesta }, { Kendall }

n These are called the “slates” for the query “hotel near MIT”

IEOR 242, Fall 2019 - Lecture 27


+ 54

Slates

“hotel near MIT” “MIT hotel” “Cambridge hotel”

Slate Slate Slate

{Marriott, Marlowe} {Marlowe, Kendall} {Marriott, Sonesta}

{Marriott, Sonesta} {Marlowe, Marriott} {Marriott, Marlowe}

{Marriott, Kendall} {Kendall, Marriott} {Sonesta, Marlowe}

{Marlowe, Sonesta} {Marlowe} {Marriott}

{Marlowe, Kendall} {Kendall} {Sonesta}

{Sonesta, Kendall} {Marriott} {Marlowe}

{Marriott}

{Marlowe}

{Sonesta}

{Kendall}

IEOR 242, Fall 2019 - Lecture 27


+ 55

Decision Variables

n For each query, we need to decide how many times


each slate will be displayed

n xij = number of times slate j is displayed for query i


(these will be integer decision variables)

n 10 decision variables for “hotel near MIT”

n6 decision variables for “MIT hotel”

n6 decision variables for “Cambridge hotel”

IEOR 242, Fall 2019 - Lecture 27


+ 56

Decision Variables for Slates


“hotel near MIT”               “MIT hotel”                   “Cambridge hotel”
Slate                Variable  Slate                Variable Slate                Variable
{Marriott, Marlowe}  x10       {Marlowe, Kendall}   x20      {Marriott, Sonesta}  x30
{Marriott, Sonesta}  x11       {Marlowe, Marriott}  x21      {Marriott, Marlowe}  x31
{Marriott, Kendall}  x12       {Kendall, Marriott}  x22      {Sonesta, Marlowe}   x32
{Marlowe, Sonesta}   x13       {Marlowe}            x23      {Marriott}           x33
{Marlowe, Kendall}   x14       {Kendall}            x24      {Sonesta}            x34
{Sonesta, Kendall}   x15       {Marriott}           x25      {Marlowe}            x35
{Marriott}           x16
{Marlowe}            x17
{Sonesta}            x18
{Kendall}            x19

IEOR 242, Fall 2019 - Lecture 27


+ 57

Query Constraints

n Cannot display more slates than the (expected) number of each


query

n One constraint for each query:

“hotel near MIT”:


x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 ≤ 15

“MIT hotel”:
x20 + x21 + x22 + x23 + x24 + x25 ≤ 20

“Cambridge hotel”:
x30 + x31 + x32 + x33 + x34 + x35 ≤ 25

IEOR 242, Fall 2019 - Lecture 27


+ 58

Determining the Price-per-click


(PPC)
n How much should advertisers pay per click?

n Depends on the query and the position of the ad

n Consider the query “hotel near MIT”


n Marriott got the first position since 1,350 > 1,290
n This will still be true as long as their bid price P satisfies: P x 0.054 x 1,000 > 1,290
n They needed to have bid at least P = $23.88 + $0.01 = $23.89

n Marlowe got the second position since 1,290 > 975
n This will still be true as long as their bid price P satisfies: P x 0.086 x 1,000 > 975
n They needed to have bid at least P = $11.33 + $0.01 = $11.34

n Do this for every query and every bidder
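
n A sketch of this PPC rule in Python (illustrative; it assumes the convention on this slide of rounding down to the cent and adding $0.01, with the bottom ad in the landscape paying $0.01):

```python
import math

def ppc_for_query(qs, ctr1):
    """qs: hotel -> quality score; ctr1: hotel -> position-1 CTR (used in the QS).
    Returns hotel -> price per click; hotels with QS = 0 do not enter the landscape."""
    order = sorted((h for h in qs if qs[h] > 0), key=qs.get, reverse=True)
    prices = {}
    for rank, h in enumerate(order):
        if rank == len(order) - 1:
            prices[h] = 0.01                     # bottom ad in the landscape pays $0.01
        else:
            next_qs = qs[order[rank + 1]]        # QS of the bidder ranked just below
            prices[h] = math.floor(100 * next_qs / (ctr1[h] * 1000)) / 100 + 0.01
    return prices

qs = {"Kendall": 776, "Marriott": 1350, "Sonesta": 975, "Marlowe": 1290}
ctr1 = {"Kendall": 0.097, "Marriott": 0.054, "Sonesta": 0.065, "Marlowe": 0.086}
print(ppc_for_query(qs, ctr1))   # Marriott 23.89, Marlowe 11.34, Sonesta 11.94, Kendall 0.01
```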


IEOR 242, Fall 2019 - Lecture 27
+ 59
Quality Score = (expected bid revenue/click) x 1,000

Bid, CTR, Quality Score (QS), and Order

Hotel      “hotel near MIT”              “MIT hotel”                   “Cambridge hotel”             Daily Budget
           Bid   CTR    QS     Order     Bid   CTR    QS     Order     Bid   CTR    QS     Order
Kendall    $8    0.097  776    4         $12   0.097  1,164  2         $0    0.081  0      -         $10
Marriott   $25   0.054  1,350  1         $15   0.054  810    3         $25   0.070  1,750  1         $50
Sonesta    $15   0.065  975    3         $0    0.076  0      -         $15   0.086  1,290  2         $20
Marlowe    $15   0.086  1,290  2         $20   0.086  1,720  1         $10   0.108  1,080  3         $30

Queries/Day       15                          20                           25

IEOR 242, Fall 2019 - Lecture 27


+ 60

Derived Values of Price-Per-Click


(PPC)

Price-Per-Click
Hotel “hotel near MIT” “MIT hotel” “Cambridge hotel”
Kendall $0.01 $8.36 --
Marriott $23.89 $0.01 $18.43
Sonesta $11.94 -- $12.56
Marlowe $11.34 $13.54 $0.01

n Bottom ad in the bidding landscape pays PPC = $0.01

IEOR 242, Fall 2019 - Lecture 27


+ 61

Derived Price per Click (PPC),


cont.

Bid, Quality Score (QS), Order, and PPC

Hotel      “hotel near MIT”               “MIT hotel”                    “Cambridge hotel”              Daily Budget
           Bid   QS     Order  PPC        Bid   QS     Order  PPC        Bid   QS     Order  PPC
Kendall    $8    776    4      $0.01      $12   1,164  2      $8.36      $0    0      -      --         $10
Marriott   $25   1,350  1      $23.89     $15   810    3      $0.01      $25   1,750  1      $18.43     $50
Sonesta    $15   975    3      $11.94     $0    0      -      --         $15   1,290  2      $12.56     $20
Marlowe    $15   1,290  2      $11.34     $20   1,720  1      $13.54     $10   1,080  3      $0.01      $30

Queries/Day      15                           20                            25

IEOR 242, Fall 2019 - Lecture 27


+ 62

Budget Constraint for Each Bidder

n Each time a slate is displayed, the expected revenue to Google (which


equals the cost to the advertiser) is the PPC times the CTR for each ad in the
slate

n Each bidder’s budget constraint must be satisfied

n For Marlowe we have


11.34 x ( 0.054*x10 + 0.086*x13 + 0.086*x14 + 0.086*x17 )

+ 13.54 x (0.086*x20 + 0.086*x21 + 0.086*x23 )

+ 0.01 x ( 0.067*x31 + 0.067*x32 + 0.108*x35 ) ≤ 30

n Similar budget constraint for other bidders Kendall, Marriott, Sonesta

IEOR 242, Fall 2019 - Lecture 27


+ 63

Objective Function

n Maximize the revenue to Google

n Add up over all slates:

(expected revenue of slate) x (number of displays of slate)

n Consider query “hotel near MIT” and slate {Marriott, Sonesta}:


n Slate decision variable is x11

n Marriott contributes $23.89 x .054


n Sonesta contributes $11.94 x 0.040
n Total expected revenue is
$23.89 x 0.054 + $11.94 x 0.040 = $1.76766 ≈ $1.77

IEOR 242, Fall 2019 - Lecture 27


+ 64

(Expected) Revenue per Display

“hotel near MIT”                    “MIT hotel”                         “Cambridge hotel”
Slate / Revenue per Display         Slate / Revenue per Display         Slate / Revenue per Display
{Marriott, Marlowe} $1.90 {Marlowe, Kendall} $1.67 {Marriott, Sonesta} $1.97
{Marriott, Sonesta} $1.77 {Marlowe, Marriott} $1.16 {Marriott, Marlowe} $1.29
{Marriott, Kendall} $1.29 {Kendall, Marriott} $0.81 {Sonesta, Marlowe} $1.08
{Marlowe, Sonesta} $1.45 {Marlowe} $1.16 {Marriott} $1.29
{Marlowe, Kendall} $0.98 {Kendall} $0.81 {Sonesta} $1.08
{Sonesta, Kendall} $0.78 {Marriott} $0.00 {Marlowe} $0.00
{Marriott} $1.29
{Marlowe} $0.98
{Sonesta} $0.78
{Kendall} $0.00
IEOR 242, Fall 2019 - Lecture 27
+ 65

Google’s Completed Optimization


Model
Maximize revenue = 1.90 x10 + 1.77 x11 + 1.29 x12 + 1.45 x13 + 0.98 x14 + 0.78 x15 + 1.29 x16 + 0.98 x17 + 0.78 x18 + 0.00 x19
                 + 1.67 x20 + 1.16 x21 + 0.81 x22 + 1.16 x23 + 0.81 x24 + 0.00 x25
                 + 1.97 x30 + 1.29 x31 + 1.08 x32 + 1.29 x33 + 1.08 x34 + 0.00 x35

subject to:

Kendall budget:  0.01*(0.061 x12 + 0.061 x14 + 0.061 x15 + 0.097 x19) + 8.36*(0.061 x20 + 0.097 x22 + 0.097 x24) ≤ 10

Marriott budget: 23.89*(0.054 x10 + 0.054 x11 + 0.054 x12 + 0.054 x16) + 0.01*(0.034 x21 + 0.034 x22 + 0.054 x25) + 18.43*(0.070 x30 + 0.070 x31 + 0.070 x33) ≤ 50

Sonesta budget:  11.94*(0.040 x11 + 0.040 x13 + 0.065 x15 + 0.065 x18) + 12.56*(0.054 x30 + 0.086 x32 + 0.086 x34) ≤ 20

Marlowe budget:  11.34*(0.054 x10 + 0.086 x13 + 0.086 x14 + 0.086 x17) + 13.54*(0.086 x20 + 0.086 x21 + 0.086 x23) + 0.01*(0.067 x31 + 0.067 x32 + 0.108 x35) ≤ 30

Queries for “hotel near MIT”:   x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 ≤ 15
Queries for “MIT hotel”:        x20 + x21 + x22 + x23 + x24 + x25 ≤ 20
Queries for “Cambridge hotel”:  x30 + x31 + x32 + x33 + x34 + x35 ≤ 25

x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x30, x31, x32, x33, x34, x35 nonnegative integers
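
n For illustration, a sketch of this model in Python using the PuLP modeler (one possible tool; Google's actual systems are of course different). Only one budget constraint is written out; the other three follow the same pattern:

```python
import pulp

rev = {  # expected revenue per display of each slate variable (from this slide)
    "x10": 1.90, "x11": 1.77, "x12": 1.29, "x13": 1.45, "x14": 0.98, "x15": 0.78,
    "x16": 1.29, "x17": 0.98, "x18": 0.78, "x19": 0.00,
    "x20": 1.67, "x21": 1.16, "x22": 0.81, "x23": 1.16, "x24": 0.81, "x25": 0.00,
    "x30": 1.97, "x31": 1.29, "x32": 1.08, "x33": 1.29, "x34": 1.08, "x35": 0.00}

x = pulp.LpVariable.dicts("slate", list(rev), lowBound=0, cat="Integer")
model = pulp.LpProblem("sponsored_search", pulp.LpMaximize)
model += pulp.lpSum(rev[j] * x[j] for j in rev)                     # total expected revenue

# Query constraints: cannot display more slates than expected queries per day
model += pulp.lpSum(x[f"x1{k}"] for k in range(10)) <= 15, "hotel_near_MIT"
model += pulp.lpSum(x[f"x2{k}"] for k in range(6)) <= 20, "MIT_hotel"
model += pulp.lpSum(x[f"x3{k}"] for k in range(6)) <= 25, "Cambridge_hotel"

# Budget constraints (PPC x CTR coefficients from the slide); Marlowe shown here,
# Kendall, Marriott, and Sonesta are added the same way
model += (11.34 * (0.054*x["x10"] + 0.086*x["x13"] + 0.086*x["x14"] + 0.086*x["x17"])
          + 13.54 * (0.086*x["x20"] + 0.086*x["x21"] + 0.086*x["x23"])
          + 0.01 * (0.067*x["x31"] + 0.067*x["x32"] + 0.108*x["x35"])) <= 30, "Marlowe_budget"

model.solve()
print(pulp.value(model.objective))   # with all four budget constraints added, the optimum is about $108.42
```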

IEOR 242, Fall 2019 - Lecture 27


+ 66

Optimal Solution
“hotel near MIT”               “MIT hotel”                    “Cambridge hotel”
Slate                Optimal   Slate                Optimal   Slate                Optimal
                     Value                          Value                          Value
{Marriott, Marlowe}  9         {Marlowe, Kendall}   18        {Marriott, Sonesta}  25
{Marriott, Sonesta}  4         {Marlowe, Marriott}  1         {Marriott, Marlowe}  0
{Marriott, Kendall}  0         {Kendall, Marriott}  1         {Sonesta, Marlowe}   0
{Marlowe, Sonesta}   2         {Marlowe}            0         {Marriott}           0
{Marlowe, Kendall}   0         {Kendall}            0         {Sonesta}            0
{Sonesta, Kendall}   0         {Marriott}           0         {Marlowe}            0
{Marriott}           0
{Marlowe}            0
{Sonesta}            0
{Kendall}            0

Optimal revenue value = $108.42


IEOR 242, Fall 2019 - Lecture 27
+ 67

Optimal Solution and $/display

“hotel near MIT”                         “MIT hotel”                              “Cambridge hotel”
Slate         $/display  Optimal Value   Slate         $/display  Optimal Value   Slate         $/display  Optimal Value
{Marr, Marl}  $1.90      9               {Marl, Ken}   $1.67      18              {Marr, Son}   $1.97      25
{Marr, Son}   $1.77      4               {Marl, Marr}  $1.16      1               {Marr, Marl}  $1.29      0
{Marr, Ken}   $1.29      0               {Ken, Marr}   $0.81      1               {Son, Marl}   $1.08      0
{Marl, Son}   $1.45      2               {Marl}        $1.16      0               {Marr}        $1.29      0
{Marl, Ken}   $0.98      0               {Ken}         $0.81      0               {Son}         $1.08      0
{Son, Ken}    $0.78      0               {Marr}        $0.00      0               {Marl}        $0.00      0
{Marr}        $1.29      0
{Marl}        $0.98      0
{Son}         $0.78      0
{Ken}         $0.00      0

Optimal revenue value = $108.42

IEOR 242, Fall 2019 - Lecture 27


+ 68

Analyzing the Solution

n How well did the model do?


n The budgets for the advertisers are $10, $50, $20, $30
n The maximum revenue we could hope for is $110
n Our objective function value is $108.42

n We do not always display the most profitable slates. Why?

IEOR 242, Fall 2019 - Lecture 27


+ 69

Typical Improvements from


Optimization

n Most of the time, optimization increases objective over


intelligent “non-optimization” solutions by 3-10%

n For Google, a 1% improvement from optimization


translates to $794 million/year

n Indeed, Google's investment in optimization and analytics is huge for this reason (and for other reasons as well)

IEOR 242, Fall 2019 - Lecture 27


+ 70

Pros and Cons of the Optimization


Model

n What are some advantages of this optimization


model?

n What are some disadvantages of this optimization


model?

n Is the model easy to implement?

IEOR 242, Fall 2019 - Lecture 27


+ 71

Analyzing the Solution, again

n How well did the model do?


n The budgets for the advertisers are $10, $50, $20, $30
n The maximum revenue we could hope for is $110
n Our objective function value is $108.42

n What can possibly go wrong with using the objective


function value as a performance metric?
n The objective function is based on an expected value…with
probabilities that are sometimes very small
n Moreover, these probabilities are based on a predictive model for
CTR estimation which will always have some level of error

IEOR 242, Fall 2019 - Lecture 27


+ 72

“Predict-then-Optimize Pipeline”

n A crucial input to today’s lecture is accurate forecasting of the


click-through-rates

n Today’s optimization model is part of a multistage analytics


pipeline

Historical Data → Statistical Model (CTR prediction; also: impression arrival modeling, …) → Decision Strategies (profit/goal optimization; also: other concerns in reality?)

IEOR 242, Fall 2019 - Lecture 27


+ 73

(Legitimately) Evaluating the


Quality of our Solution
n The optimization model implies a policy:
n When a new search query arrives on Google, which slate
should be displayed?
n One approach: we can randomly sample among all feasible
slates, using the optimal decision variable values as relative
weights
n For example, for the query “hotel near MIT”, {Marriott,
Marlowe} gets relative weight 9, {Marriott, Sonesta} gets
relative weight 4, and {Marlowe, Sonesta} gets relative weight 2

n Then, how should we properly evaluate this policy?


n Using historical data? Simulation! (On a held-out test dataset)
n “In the wild”? A/B testing!
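
n A tiny Python sketch of this randomized policy for the “hotel near MIT” query, using the optimal values above as relative weights:

```python
import random

slates = [("Marriott", "Marlowe"), ("Marriott", "Sonesta"), ("Marlowe", "Sonesta")]
weights = [9, 4, 2]                                  # optimal x values for this query
shown = random.choices(slates, weights=weights, k=1)[0]
print(shown)                                         # the slate to display for this query
```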

IEOR 242, Fall 2019 - Lecture 27


+ 74

Critical Role of Search-Related


Advertising Optimization Models

n Optimization models for search-related advertising


is critical for search sites (not just Google of course)

IEOR 242, Fall 2019 - Lecture 27


+ 75

Critical Role of Search-Related


Advertising Optimization Models, cont.

IEOR 242, Fall 2019 - Lecture 27


+ 76

Critical Role of Search-Related


Advertising Optimization Models, cont.

IEOR 242, Fall 2019 - Lecture 27


+ 77

Critical Role of Search-Related


Advertising Optimization Models, cont.

IEOR 242, Fall 2019 - Lecture 27


+ 78

Critical Role of Search-Related


Advertising Optimization Models

n Actual optimization models used and how they are


solved are closely-guarded company secrets

n Real life engineering systems are complex and


involve many different parts

IEOR 242, Fall 2019 - Lecture 27


+ 79

Analytics in Online Advertising

n Analytics lies at the heart of all three core technologies


that enable the world wide web
n How web search is done
n How to forecast Click-Thru-Rates
n How to determine whether and where to place an ad for each
particular user

n Targeted online advertising is the revenue engine that


truly enables the world wide web
n Data privacy is an important concern
n Critical question: Can we understand the tradeoffs between
privacy and revenue?

IEOR 242, Fall 2019 - Lecture 27


+ 80

Advertiser’s Problem

n For details on how the advertisers might optimally take advantage of the auction mechanism, see Chapter 12 of The Analytics Edge

IEOR 242, Fall 2019 - Lecture 27


+
“Predict, then Optimize”
Setting

IEOR 242, Fall 2019 - Lecture 27 81


+ 82

“Predict-then-Optimize Pipeline”

n A crucial input to today’s lecture is accurate forecasting of the


click-through-rates

n Today’s optimization model is part of a multistage analytics


pipeline

Historical Data → Statistical Model (CTR prediction; also: impression arrival modeling, …) → Decision Strategies (profit/goal optimization; also: other concerns in reality?)

IEOR 242, Fall 2019 - Lecture 27


+ 83

Predict, then Optimize Setting

n Optimization problems arising in practice almost


always involve unknown parameters

n Often there is a relationship between the unknown


parameters and some contextual/auxiliary feature data
n Historical data can be leveraged to build a machine learning
model

n Common Approach (“Predict, then Optimize”):


n Predict unknown parameters using a previously trained ML
model
n “Plug in” predicted values, then optimize

IEOR 242, Fall 2019 - Lecture 27


+ 84

Shortest Path Example

n Maps and Waze need to predict edge


costs (travel time along each edge)

n Have access to features such as time,


day, historical traffic patterns, speed
limit, etc.

n Need to provide shortest path route


to driver using predicted edge costs

IEOR 242, Fall 2019 - Lecture 27


+ 85

An Observation

n Machine learning models are trained by minimizing


some form of prediction error on a training dataset
n Example: train a model by minimizing the least-squares loss
function, logistic loss function, hinge loss function…

n Training of the model does not consider the


optimization problem of interest
n The training procedure is “blind” to the downstream optimization task

IEOR 242, Fall 2019 - Lecture 27


+ 86

Predict-then-Optimize Framework

n We consider a nominal linear optimization problem of the form:

  z*(c) = min  c^T w
          s.t. w ∈ S

n S denotes the (fixed) constraint set

n w*(c) is an “optimization subroutine” for the problem above, i.e., w*(c) ∈ argmin { c^T w : w ∈ S } is the optimal solution of the problem as a function of c

IEOR 242, Fall 2019 - Lecture 27


+ 87

Key Ingredients of the Predict-


then-Optimize Framework

n Nominal Optimization Problem:  z*(c) = min { c^T w : w ∈ S }

n Training data:  (x1, c1), (x2, c2), …, (xn, cn)

n Key presumption: the cost vector c of the linear optimization problem is unknown, but related to some auxiliary features x
IEOR 242, Fall 2019 - Lecture 27
+ 88

Key Ingredients of the Predict-


then-Optimize Framework, cont.
n We want to build a model to predict c based on x
n Note that c is a d-dimensional vector and x is a p-dimensional vector

n We will use a linear model:
n Prediction will be a linear function of the features x

n Let B denote a d x p matrix of coefficients; then the predicted cost vector is:

  ĉ = B x
IEOR 242, Fall 2019 - Lecture 27


+ 89

Loss Function Minimization

n We will again appeal to the idea of regularized loss function minimization

n We need to choose a loss function ℓ(ĉ, c) that quantifies the error in making prediction ĉ when the actual cost vector is c

n A central question in this setting is how to construct the loss function

IEOR 242, Fall 2019 - Lecture 27


+ 90

Loss Function Minimization, cont.

n We need to choose a loss function ℓ(ĉ, c) that quantifies the error in making prediction ĉ when the actual cost vector is c

n Given the choice of loss function ℓ, regularization function R(·), and parameter λ, the training procedure is to solve the optimization problem:

  minimize over B:  (1/n) Σ_i ℓ(B xi, ci) + λ R(B)
IEOR 242, Fall 2019 - Lecture 27


+ 91

How to choose the loss function?

n This is (sort of) a regression problem, so a standard choice would be the least squares loss:

  ℓ(ĉ, c) = || ĉ - c ||^2

n Can we do better by accounting for the downstream optimization problem in the design of the loss function?

IEOR 242, Fall 2019 - Lecture 27


+ 92

Predict-then-Optimize Paradigm

n Predict: given a new feature vector x, make prediction ĉ = B x

n Optimize: make decision by solving for w*(ĉ)

n Incur cost c^T w*(ĉ) with respect to the “true” realized cost vector c

Predict  →  Optimize

IEOR 242, Fall 2019 - Lecture 27


+ 93

Smart “Predict, then Optimize”


(SPO) Loss Function
n After making decision w*(ĉ), we incur cost c^T w*(ĉ)

n We would have rather made the decision w*(c) with associated optimal cost z*(c)

n The difference c^T w*(ĉ) - z*(c) is the loss that we suffered and provides the definition of our SPO loss function:

  ℓ_SPO(ĉ, c) = c^T w*(ĉ) - z*(c)

IEOR 242, Fall 2019 - Lecture 27


+ 94

“Ideal” Learning Problem

n The ideal SPO learning problem would be:

  minimize over B:  (1/n) Σ_i ℓ_SPO(B xi, ci) + λ R(B)

n Unfortunately, this problem is very computationally difficult to solve (the SPO loss is non-convex and discontinuous in B)

IEOR 242, Fall 2019 - Lecture 27


+ 95

Relation to Binary Classification

n In fact, it turns out that the SPO loss is a special case of the classical 0-1 loss in binary classification

[Figure: plot of the 0-1 loss function]
n This equivalence happens with d = 1, S = [-1/2, 1/2], and true costs c ∈ {-1, +1}; in that case ℓ_SPO(ĉ, c) = 1(sign(ĉ) ≠ sign(c)), which is exactly the 0-1 misclassification loss

IEOR 242, Fall 2019 - Lecture 27


+ 96

Surrogate Loss Functions

n The connection with binary classification motivates the use and construction of surrogate loss functions

n A surrogate loss function is a loss function that is used in place of the “ideal” loss function

n Hinge loss (SVMs) and logistic loss are surrogates for the 0-1 loss in binary classification

[Figure: the 0-1 loss, hinge loss, and logistic loss plotted as functions of yi b^T xi]

IEOR 242, Fall 2019 - Lecture 27


+ 97

Surrogate SPO+ Loss Function

n We will construct a tractable upper bound on the SPO loss function, which will serve as a “surrogate” function for us to optimize

n We call this the SPO+ loss function:

n Definition:

  ℓ_SPO+(ĉ, c) = max over w ∈ S of { c^T w - 2 ĉ^T w } + 2 ĉ^T w*(c) - z*(c)

n Recall that z*(c) = min over w ∈ S of c^T w

IEOR 242, Fall 2019 - Lecture 27


+ 98

Surrogate SPO+ Loss Function

n SPO+ loss function:

  ℓ_SPO+(ĉ, c) = max over w ∈ S of { c^T w - 2 ĉ^T w } + 2 ĉ^T w*(c) - z*(c)

n Two important properties:
n ℓ_SPO+(ĉ, c) is a well-behaved (convex) function of ĉ
n We have the inequality ℓ_SPO(ĉ, c) ≤ ℓ_SPO+(ĉ, c)

n The next three slides provide a derivation of this surrogate SPO+ loss function

IEOR 242, Fall 2019 - Lecture 27


+ 99

(Advanced: Derivation of the


SPO+ Loss)
n For any , we can write

n Where

n Therefore

IEOR 242, Fall 2019 - Lecture 27


+ 100

(Advanced: Derivation of the


SPO+ Loss)

n Plugging in yields

IEOR 242, Fall 2019 - Lecture 27


+ 101

(Advanced: Derivation of the


SPO+ Loss)

n Finally, using yields

n This RHS is the definition of the SPO+ loss function:

IEOR 242, Fall 2019 - Lecture 27


+ 102

SPO+ Learning Problem

n Instead of the ideal SPO learning problem, we solve the surrogate SPO+ problem:

  minimize over B:  (1/n) Σ_i ℓ_SPO+(B xi, ci) + λ R(B)

n This problem can be solved efficiently with SGD just as we saw before
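
n A minimal NumPy sketch of SGD on the SPO+ loss (not the paper's code). It assumes access to an oracle w_star(c) that returns an optimal solution of the nominal problem; the subgradient used, 2(w*(c) - w*(2ĉ - c)) x^T, follows from the SPO+ definition above:

```python
import numpy as np

def spo_plus_sgd(X, C, w_star, lr=0.01, epochs=10, lam=0.0, seed=0):
    """Minimal SGD sketch for the SPO+ learning problem (illustrative only).
    X: n-by-p feature matrix, C: n-by-d matrix of realized cost vectors,
    w_star(c): oracle returning an optimal solution of min_{w in S} c^T w."""
    n, p = X.shape
    d = C.shape[1]
    B = np.zeros((d, p))                       # linear model: c_hat = B @ x
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, c = X[i], C[i]
            c_hat = B @ x
            # Subgradient of the SPO+ loss with respect to c_hat:
            #   2 * (w*(c) - w*(2*c_hat - c))
            g = 2.0 * (w_star(c) - w_star(2.0 * c_hat - c))
            B -= lr * (np.outer(g, x) + lam * B)   # SGD step with ridge-style penalty
    return B
```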

IEOR 242, Fall 2019 - Lecture 27


+ 103

Example: Shortest Path Problem

n Dataset: New York City Hourly Traffic Estimates


from 2010-2013

n Contains traffic estimates for individual links of the


NYC road directed network, obtained from taxi
trips

n Some work had to be done to impute missing


travel times in a reasonable way

IEOR 242, Fall 2019 - Lecture 27


+ 104

Example: Shortest Path Problem, cont.

n We focused on a particular neighborhood and fixed a starting point and end point

n Given travel time estimates, the problem of getting from Start to End is a linear optimization model

[Figure: road network of the chosen neighborhood, with the Starting Node and Ending Node marked]
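
n As a toy illustration of the “optimize” step, a shortest path on a tiny graph with predicted edge travel times (networkx here is just one convenient tool, not necessarily what was used):

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([("Start", "A", 4.0), ("A", "End", 3.5),    # predicted travel times
                           ("Start", "B", 2.0), ("B", "End", 6.0)])
path = nx.shortest_path(G, source="Start", target="End", weight="weight")
print(path)   # ['Start', 'A', 'End'] with these predicted times
```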
IEOR 242, Fall 2019 - Lecture 27
+ 105

Example: Shortest Path Problem,


cont.
n Training set contained data from 2010, 2011, 2012
n Testing set contained 2013 data
n Each observation corresponds to a particular hour of
the day on a particular date (e.g., January 1, 2010 at 3
AM)
n Each observation consists of 348 observed average
travel times – one for each edge in the graph of the
region. The average is over all observed travel times
during that particular hour.
n We used time information – day of week, month, and
hour – as the features

IEOR 242, Fall 2019 - Lecture 27


+ 106

Shortest Path Example, Results

n We compared a model was trained using the SPO+


loss with a model that uses linear regression (i.e.,
the least squares loss)

n We then looked at the average SPO loss of the two


models on the 2013 test data

Average SPO Loss for LR    Average SPO Loss for SPO+    Percent Improvement
105.47                     66.60                        38.88%

IEOR 242, Fall 2019 - Lecture 27


+ 107

Conclusions

n “Predict, then Optimize” setting provides one


perspective on the integration of ML and
optimization modeling

n Training a model in a way that is guided by the


task that you actually care about is a promising
direction

n More details in a recent paper I wrote (you can understand a lot of it, I promise!): https://arxiv.org/abs/1710.08005

IEOR 242, Fall 2019 - Lecture 27


+
Kidney Allocation and
Scoring Rules

IEOR 242, Fall 2019 - Lecture 27 108


+ 109

Kidney Transplants Background

n There are currently ~104,000 patients on the waiting list for cadaver (deceased donor) kidneys in the US, according to UNOS (the United Network for Organ Sharing)
n In 2016, there were 13,431 transplants of cadaver
kidneys in the U.S. and 5,629 transplants from
living donors
n About 5,000 patients die per year while on the
waiting list and about 4,000 are removed from the
list as “too sick to transplant”

IEOR 242, Fall 2019 - Lecture 27


+ 110

An Increasing Problem…

IEOR 242, Fall 2019 - Lecture 27


+ 111

The Need for Transplantation

n Kidney transplantation and maintenance dialysis


are currently the only two treatment options for
end-stage renal disease

n Dialysis treatment can last for at least 12 hours


each week and can lead to other health problems

n Studies demonstrate that a successful transplant,


especially before dialysis begins, can add on
average 10 to 15 years of life

IEOR 242, Fall 2019 - Lecture 27


+ 112

Two Analytics Problems

n Kidney Allocation for Deceased Donors:


n Once a kidney is procured, it can typically be preserved
for up to 36-48 hours
n Thus we have to quickly decide which patient on the
waiting list should receive the transplant

n Kidney Exchange for Living Donors:


n People in the family/friends circles are willing to donate a
kidney, but …
n … often donors are incompatible with their intended
recipients, which opens up the possibility of exchange

IEOR 242, Fall 2019 - Lecture 27


+ 113

Kidney Allocation Policies

n In today’s lecture, we’ll focus on kidney allocation


policies for deceased donors
n According to the OPTN (Organ Procurement and Transplantation Network), which manages the national waiting list, such a policy should:
n 1. Seek to achieve the best use of donated organs, and
avoid organ wastage
n 2. Set priority rankings based on sound medical judgment
n 3. Balance medical efficiency and equity, without
discriminating against patients based on their race, age,
blood type, etc.

IEOR 242, Fall 2019 - Lecture 27


+ 114

Kidney Allocation Policies, cont.

n It is also important for an allocation policy to be


sufficiently simple and transparent so that
patients and doctors can reasonably estimate their
chances of receiving an organ

n For these reasons, a scoring rule based policy is


employed:
n Each time a new organ is procured, calculate a score for
each patient
n Rank patients according to the scores and prioritize those
at the top of the list

IEOR 242, Fall 2019 - Lecture 27


+ 115

Kidney Allocation Policies, cont.

n The OPTN Kidney Transplantation Committee


(KTC) decided to revise their scoring rule policy in
2014

n Previous scoring rule was too inefficient and in


some ways unfair

n A major problem was that the scoring rule placed too high a weight on time spent waiting on the list, thus patient waiting time was dominating everything else

IEOR 242, Fall 2019 - Lecture 27


+ 116

Scoring Rule Components

n The OPTN Kidney Transplantation Committee identified


4 components that the new scoring rule should be
based on:
n LYFT(p, o) (life years from transplant) = the expected
incremental quality-adjusted life years gained of patient p
from receiving organ o, compared to remaining on dialysis
n DT(p) (dialysis time) = years the patient has already spent on
dialysis
n DPI(o) (donor profile index) = number between 0 and 1
indicating the quality of the donated organ (0 is highest
quality, 1 is lowest quality)
n CPRA(p) (calculated panel reactive antibody) = number between 0 and 100 measuring the sensitization of the patient, i.e., the patient's likelihood of rejecting a new organ. (0 corresponds to the lowest sensitization level and thus the smallest likelihood of rejecting.)

IEOR 242, Fall 2019 - Lecture 27


+ 117

A Dominant Proposal

n The KTC considered more than 28 different


scoring rules and utilized simulation to evaluate
their performance and tune their parameters

n Of these 28 proposals, the following formula for the


Kidney Allocation Score (KAS) proved to be
dominant:

KAS(p, o) = 0.8 LYFT(p, o) x (1 - DPI(o)) + 0.8 DT(p) x DPI(o) + 0.2 DT(p) + 0.04 CPRA(p)
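
n For illustration, this formula as a small Python function (a direct transcription of the rule above):

```python
def kas(lyft, dt, dpi, cpra):
    """Kidney Allocation Score for a patient-organ pair.
    lyft: life years from transplant; dt: years already on dialysis;
    dpi: donor profile index in [0, 1]; cpra: panel reactive antibody in [0, 100]."""
    return 0.8 * lyft * (1 - dpi) + 0.8 * dt * dpi + 0.2 * dt + 0.04 * cpra
```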

IEOR 242, Fall 2019 - Lecture 27


+ 118

Some Intuition

KAS(p, o) = 0.8 LYFT(p, o) x (1 - DPI(o)) + 0.8 DT(p) x DPI(o) + 0.2 DT(p) + 0.04 CPRA(p)

n LYFT and DT are scaled by the DPI such that higher


quality organs place more emphasis on LYFT and lower
quality organs place more emphasis on dialysis time

n Dialysis time still gets some weight regardless

n CPRA gets strong emphasis since highly sensitized patients should be given more donation opportunities (many of which they will have to turn down)

IEOR 242, Fall 2019 - Lecture 27


+ Designing an Allocation Policy
with Optimization and Linear
Regression

IEOR 242, Fall 2019 - Lecture 27 119


+ 120

Perfect Hindsight/Foresight
Scenario
n Suppose that it’s January 1, 2019, a genie grants you
one wish, and you decide (quite generously) to
redo all of the kidney transplants that happened in
2018. What should you do?

n Put another way, suppose that it’s December 31,


2018, a genie grants you one wish, and you decide
(quite generously) that you would like to know the
characteristics of every organ that will be procured
in 2019 and the characteristics of every patient on
the waiting list. What should you do?

IEOR 242, Fall 2019 - Lecture 27


+ 121

Perfect Hindsight/Foresight
Scenario, cont.
n In such a perfect hindsight/foresight scenario, you
might consider solving an optimization problem

n Suppose that there are n different score


components (DPI(o), DT(p) etc.)

n Consider the set of patient-organ pairs eligible for transplantation over the time horizon, i.e.,

  C = { (p, o) : patient p is eligible (compatible) to receive organ o }
IEOR 242, Fall 2019 - Lecture 27


+ 122

Patient-Organ Bipartite Graph

n The set C can be visualized as a bipartite graph

[Figure: bipartite graph with patient nodes 1, 2, …, P on one side, organ nodes 1, 2, …, O on the other, and an edge for each pair in C]

IEOR 242, Fall 2019 - Lecture 27


+ 123

Decision Variables

n Introduce binary decision variables x(p, o) such that x(p, o) = 1 if organ o is allocated to patient p, and x(p, o) = 0 otherwise

n How should a fractional value of x(p, o) be interpreted?

IEOR 242, Fall 2019 - Lecture 27


+ 124

Fairness Constraints

n We can impose constraints on the allocation outcomes


to incorporate fairness considerations

n For example, we may want to impose lower bounds for


a specific group of patients (such as patients of blood
type O) on:
n The probability of receiving a transplant
n The average LYFT gained among the transplant recipients
within this group
n The average time spent on dialysis among the transplant
recipients within this group

n All of the above may be modeled as linear constraints

IEOR 242, Fall 2019 - Lecture 27


+ 125

Fairness Constraints, cont.

n For example, consider a particular group of


patients G and suppose that we want to ensure that
at least L organs should be allocated to the patients
in group G

n This requirement may be modeled with the constraint:

  Σ over pairs (p, o) in C with p in G of x(p, o) ≥ L
IEOR 242, Fall 2019 - Lecture 27


+ 126

Fairness Constraints, cont.

n More generally, suppose that we have m fairness


constraints that may each be expressed as a linear
constraint

n Letting x denote the vector of allocation variables, these constraints may be described as

  A x ≥ b

n …for some matrix A and right-hand side vector b

IEOR 242, Fall 2019 - Lecture 27


+ 127

Linear Optimization Formulation

n Then the problem of maximizing the total life years from transplant while meeting the fairness requirements is:

  maximize    Σ over (p, o) in C of LYFT(p, o) x(p, o)
  subject to  Σ over o with (p, o) in C of x(p, o) ≤ 1   for each patient p
              Σ over p with (p, o) in C of x(p, o) ≤ 1   for each organ o
              A x ≥ b
              0 ≤ x(p, o) ≤ 1   for all (p, o) in C
IEOR 242, Fall 2019 - Lecture 27
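
n For illustration, a minimal PuLP sketch of this perfect-hindsight LP (hypothetical inputs; the fairness constraint shown is the single lower bound for a group G from the earlier slide):

```python
import pulp

def allocation_lp(C, lyft, patients, organs, G, L):
    """C: list of eligible (patient, organ) pairs; lyft[(p, o)]: LYFT values;
    G: a patient group with a fairness lower bound of L allocated organs."""
    x = pulp.LpVariable.dicts("x", C, lowBound=0, upBound=1)        # LP relaxation of x(p, o)
    m = pulp.LpProblem("kidney_allocation", pulp.LpMaximize)
    m += pulp.lpSum(lyft[pair] * x[pair] for pair in C)             # total LYFT
    for p in patients:                                              # each patient gets at most one organ
        m += pulp.lpSum(x[(p, o)] for o in organs if (p, o) in C) <= 1
    for o in organs:                                                # each organ goes to at most one patient
        m += pulp.lpSum(x[(p, o)] for p in patients if (p, o) in C) <= 1
    m += pulp.lpSum(x[(p, o)] for (p, o) in C if p in G) >= L       # fairness lower bound for group G
    m.solve()
    return {pair: x[pair].value() for pair in C}
```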


+ 128

Shadow Prices

n Suppose that we solve the previous linear


optimization problem to optimality

n Then, the m fairness constraints A x ≥ b have associated shadow prices λ1, …, λm (a.k.a. optimal dual variables)

n The reduced "cost" (benefit) associated with the shadow prices is:

  r(p, o) = LYFT(p, o) - (A^T λ)(p, o)

IEOR 242, Fall 2019 - Lecture 27


+ 129

Reduced “Costs”

n The reduced “cost” value r(p, o) has the interpretation that increasing x(p, o) by a small amount results in an increase (or decrease) of the objective function by r(p, o)

n There is a clear “benefit” from LYFT(p, o), but there is also a “penalty” due to the fact that it may be more difficult to meet the fairness constraints with an increase in x(p, o)

n The term (A^T λ)(p, o) captures this “penalty”


IEOR 242, Fall 2019 - Lecture 27
+ 130

Reduced Costs as a Scoring Rule

n The reduced cost value r(p, o) is a natural scoring rule

n When an organ of type o is procured, allocate it to the patient p with the largest value of r(p, o)

n What’s wrong with this scoring rule?


n We can only calculate the values in hindsight!
n This scoring rule does not account for the n different score
components (DPI(o), DT(p) etc.)

IEOR 242, Fall 2019 - Lecture 27


+ 131

Incorporating the Score


Components
n Let y1(p, o), y2(p, o), …, yn(p, o) denote the values of the n score components for each patient-organ pair (p, o)
n For example, we might have y1(p, o) = LYFT(p, o), y2(p, o) = DT(p), etc.

n Let's consider using linear regression to find weights β1, …, βn such that

  r(p, o) ≈ β1 y1(p, o) + β2 y2(p, o) + … + βn yn(p, o)

IEOR 242, Fall 2019 - Lecture 27


+ 132

Overall Approach

n 1. Using historical data, solve the linear optimization problem and obtain shadow prices λ

n 2. Compute the benefit of allocating organ o to patient p that takes into account the shadow prices:

  r(p, o) = LYFT(p, o) - (A^T λ)(p, o)

n 3. Use linear regression to find weights β1, …, βn such that:

  r(p, o) ≈ β1 y1(p, o) + … + βn yn(p, o)
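
n For illustration, step 3 as a least-squares fit in NumPy (with hypothetical placeholder data standing in for the quantities produced in steps 1 and 2):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((500, 4))        # hypothetical: the 4 score components for 500 (p, o) pairs
b_adj = rng.random(500)         # hypothetical adjusted benefits from steps 1 and 2

# Ordinary least squares: find weights beta with Y @ beta ≈ b_adj
beta, *_ = np.linalg.lstsq(Y, b_adj, rcond=None)
print(beta)                     # the fitted weights define the new scoring rule
```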

IEOR 242, Fall 2019 - Lecture 27


+ 133

Evaluation

n How would you evaluate the previously described


approach?

n This approach was developed in the paper


“Fairness, Efficiency and Flexibility in Organ
Allocation for Kidney Transplantation” by D.
Bertsimas, N. Trichakis, and V. F. Farias

n In the paper, they use highly detailed data from


2008 and a simulator made available by the KTC to
evaluate their approach

IEOR 242, Fall 2019 - Lecture 27


+ 134

Evaluation, cont.

n First, the 2008 dataset is split in half so that the first


half is used to solve the linear optimization
problem and train the linear regression model and
the second half is used to evaluate performance
with the simulator

n The previously described 4 score components


were used and constraints were imposed to induce
fairness across racial groups, age groups, dialysis
time, blood types, diagnosis types, and
sensitization levels

IEOR 242, Fall 2019 - Lecture 27


+ 135

Resulting Formula

n The formula resulting from the linear optimization


based approach is:

KAS(p, o) = LYFT(p,o) + g(DT(p)) + 0.12CPRA(p)

n …where g(DT(p)) is a piecewise linear function:

n Recall the dominant formula selected by the KTC:


KAS(p, o) = 0.8LYFT(p,o) * (1 – DPI(o)) + 0.8DT(p)*DPI(o) +
0.2DT(p) + 0.04CPRA(p)
IEOR 242, Fall 2019 - Lecture 27
+ 136

Simulation Results

n Simulation results of the KTC policy and the previously described new policy:

IEOR 242, Fall 2019 - Lecture 27


+ 137

Conclusions

n The described data-driven approach does not


require “hand tuning”, is flexible, and is adaptive
to changes in organ/patient trends over time

n The data-driven approach yields an 8.2%


improvement in net life years from transplant while
maintaining nearly identical fairness properties as
the KTC approach

n Is this data-driven approach being used in practice by the KTC? I don't know, and I don't think so :(

IEOR 242, Fall 2019 - Lecture 27
