Essays on Incentives and Measurement of Online Marketing Efforts

by

Ron Berman

Business Administration

Spring 2014
Essays on Incentives and Measurement of Online Marketing Efforts
Copyright 2014
by
Ron Berman
Abstract
This dissertation contains three essays that examine different aspects of online marketing
activities, the ability of marketers to measure the effectiveness of such activities, and the
design of experiments to aid in this measurement.
Chapter 2 examines the impact of search engine optimization (SEO) on the competition
between advertisers for organic and sponsored search results. The results show that a positive
level of search engine optimization may improve the search engine’s ranking quality and
thus the satisfaction of its visitors. In the absence of sponsored links, the organic ranking
is improved by SEO if and only if the quality provided by a website is sufficiently positively
correlated with its valuation for consumers. In the presence of sponsored links, the results
are accentuated and hold regardless of the correlation.
Chapter 3 examines the attribution problem faced by advertisers utilizing multiple adver-
tising channels. In these campaigns advertisers predominantly compensate publishers based
on effort (CPM) or performance (CPA), the latter combined with a process known as last-touch
attribution. Using an analytical model of an online campaign we show that CPA schemes cause
moral hazard, while the existence of a baseline conversion rate among consumers may create
adverse selection. The analysis identifies two strategies publishers may use in equilibrium –
free-riding on other publishers and exploitation of the baseline conversion rate of consumers.
Our results show that when no attribution is used, CPM compensation is more beneficial
to the advertiser than CPA payment because publishers free-ride on each other's efforts.
When an attribution process is added to the campaign, it creates a contest between the
publishers and as a result has the potential to improve the advertiser's profits when no baseline
exists. Specifically, we show that last-touch attribution can be beneficial for CPA campaigns
when the process is not too accurate or when advertising exhibits concavity in its effects on
consumers. Because last-touch attribution breaks down when noise is low, however, we develop
an attribution method based on the Shapley value that is beneficial under flexible campaign
specifications. To resolve the adverse selection created by the baseline, we propose that the
advertiser require publishers to run an experiment as proof of effectiveness.
Chapter 4 discusses the types of experiments an advertiser can run online and their required
sample sizes. We identify several shortcomings of the currently prevailing experimental
design that may result in longer experiments due to overestimation of the required sample
sizes.
We discuss how sequential analysis and a clearer specification of each experiment's goals can
make online experiments more efficient. Using these techniques we show that a significant
reduction in required sample sizes is achievable online.
To Racheli
and
To Zsolt
Who supported and guided me with excellent ideas and careful attention.
Contents
Contents
List of Figures
List of Tables
1 Introduction
Bibliography
List of Figures
A.1 Mixed strategy equilibrium of an all-pay auction as a function of the headstart of player 1.
List of Tables
Acknowledgments
I want to deeply thank my advisor Prof. Zsolt Katona for guiding me through the development
of the essays in this thesis, for his undivided attention, and for many helpful discussions. I would
also like to thank Prof. Ganesh Iyer and Prof. Shachar Kariv for their crucial and helpful
discussions.
This work has been partially supported by the California Management Review, the Joe
Shoong Foundation, Benton C. Coit and Lam Research.
Chapter 2 is reprinted by permission, Ron Berman and Zsolt Katona, The Role of Search
Engine Optimization in Search Marketing, Marketing Science. Forthcoming. Copyright
2013, the Institute for Operations Research and the Management Sciences, 5521 Research
Park Drive, Suite 200, Catonsville, Maryland 21228 USA.
Chapter 1
Introduction
During the past 20 years the Internet has changed the way marketers interact with consumers
and how consumers shop and consume content online. Whereas early online behavior
mimicked offline traditions in terms of advertising and shopping experience, the past 10 years
have seen a dramatic shift towards mass customization, individual targeting, and the use of
experimentation and mechanism design in place of traditional market research.
Although many factors contribute to this shift, a few select trends have had substantial
influence:
• Computing Power - The increase in computing power and ability to dynamically allo-
cate resources for solving complex problems just-in-time has made previously untackled
problems solvable. For example, firms today can estimate very large empirical models
using high dimensional data in an efficient manner. These models are used to estimate
click-through rates on keyword ads, consumer preferences for products, and more.
• Data Availability - There is an increase in both the breadth and depth of information
collected on each consumer today. Information such as location, purchase history,
individual characteristics and more help firms react to consumer behavior in a more
nuanced manner than before.
• Individual Targeting - The ability to collect and track consumer data and to compute
an appropriate dynamic response is reinforced by the ability to follow consumers over
multiple sites and devices. As such there is a unique one-to-one matching between
collected data and an individual consumer.
This dissertation touches on three aspects of online marketing activities: the impact
of incentives and market design on agents running marketing campaigns and competing
for profit, the ability to measure the performance of these agents when multiple activities
take place concurrently, and the design of large scale experiments that aid this measurement
process. The linking theme among the essays is that standard analysis to date has typically
been done in isolation, ignoring the multitude of stakeholders taking part in the process and
using intuition as a guide to the interpretation of results.
Chapter 2 uses a game theoretical model to analyze the competition between websites
to achieve a higher ranking on organic search engine results. This phenomenon, known as
Search Engine Optimization (SEO), constitutes a substantial effort in terms of time and
financial resources invested by websites today. The intuition of consumers and the search
engine, however, leads to the conclusion that this type of activity may degrade search engine
results and lower consumer welfare. We show that this intuition is misguided and that search
results can improve with some level of SEO. Since search engines cannot exactly infer the
quality of each website and its match to the search query, the search results will be noisy,
and SEO can serve as a mechanism to remedy these errors. When sponsored ads are added
to the mix and the websites can choose whether to compete for organic links or sponsored
links, however, we show that the search engine’s profit may decrease, although consumers
and websites may benefit. As a result, the search engine faces a tradeoff between allowing more
SEO, which increases consumer welfare and the volume of visitors, and the accompanying decrease
in profit. Our analysis identifies the relevant set of conditions and can serve as a guide for the
design of search environments.
Chapter 3 examines the measurement and compensation problem an advertiser faces
when contracting with multiple online agents to display ads. Examples of such agents are
a firm that buys sponsored ads on a search engine, a firm that performs SEO, and a firm that
runs a display ad campaign on another platform. The chapter focuses on display ad campaigns
with multiple channels that can autonomously decide on the number of ads to show and on
which consumers to target. I first build a game theoretical model that allows the analysis of
varied compensation schemes, and show that the current approaches for compensation and
measurement of campaign performance may result in moral hazard and adverse selection.
The conclusion of the analysis is that although performance metrics of campaigns may be
maximized by certain agents, the metrics themselves do not properly measure performance
and can be gamed by profit maximizing agents. Using concepts from cooperative game
theory, I then proceed to show how experimentation can be used to generate data that can
be used to estimate the true effectiveness of different advertising channels. An application on
real campaign data compares the current standard practice by firms to the proposed method
and identifies substantial discrepancies in current estimates of campaign effectiveness.
Chapter 4 expands on experimentation methods used online and focuses on the required
sample sizes to detect small effects of different online treatments. The analysis identifies
two approaches an experimenter may use to decrease sample sizes in experiments without
sacrificing the test's statistical power and validity. First, I show how the majority of current
online analyses needlessly collect more information than required or use flawed methodology.
Second, I show that the sequential nature of consumer arrival to websites, and thus to the
experiment, can be exploited to make early decisions about terminating experiments when
results are markedly better or worse than expected. The chapter includes a technical description
of the techniques that can be used to achieve these goals and substantially decrease the sample
sizes required in experiments.
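As a rough illustration of the gains this chapter documents, the sketch below contrasts a classical fixed-horizon sample size calculation for detecting a small lift in conversion rate with Wald's sequential probability ratio test, which stops as soon as the accumulated evidence is strong enough. The conversion rates and error targets are illustrative assumptions, and the sketch is not the specific procedure developed in Chapter 4.

import math
import random

# Illustrative comparison (assumed parameters): fixed-horizon sample size vs.
# the average stopping time of Wald's sequential probability ratio test (SPRT).
p0, p1 = 0.010, 0.012          # null and alternative conversion rates
alpha, beta = 0.05, 0.20       # type I and type II error targets

# Classical fixed-horizon sample size for a one-sample binomial test.
z_a, z_b = 1.645, 0.84         # one-sided z quantiles for alpha = 0.05, power = 0.80
n_fixed = ((z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2

# SPRT: accumulate the log-likelihood ratio visitor by visitor and stop when it
# crosses log(B) (accept H0) or log(A) (accept H1).
log_A, log_B = math.log((1 - beta) / alpha), math.log(beta / (1 - alpha))
llr_success = math.log(p1 / p0)
llr_failure = math.log((1 - p1) / (1 - p0))

def sprt_stopping_time(p_true, rng):
    llr, n = 0.0, 0
    while log_B < llr < log_A:
        n += 1
        llr += llr_success if rng.random() < p_true else llr_failure
    return n

rng = random.Random(0)
avg_n = sum(sprt_stopping_time(p1, rng) for _ in range(200)) / 200
print(f"fixed-horizon n ~ {n_fixed:,.0f}, average SPRT n under H1 ~ {avg_n:,.0f}")

Under the alternative hypothesis the sequential rule typically stops well before the fixed-horizon sample size is reached, which is the source of the savings discussed in Chapter 4.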
The structure of the chapters follows that of traditional marketing literature. Each
chapter begins with a detailed overview of the problem at hand and the results, followed
by a model and detailed analysis. Most technical proofs are relegated to the appendices, where
additional results and extensions of interest appear as well.
Chapter 2

The Role of Search Engine Optimization in Search Marketing
2.1 Overview
Consumers using a search engine face the option of clicking organic or sponsored links. The
organic links are ranked according to their relevance to the search query, while the sponsored
links are allocated to advertisers through a competitive auction. Since consumers tend to
trust organic links more, advertisers often try to increase their visibility in the organic list by
gaming the search engine’s ranking algorithm using techniques collectively known as search
engine optimization (SEO).[1]
A notable example of the dramatic impact an SEO campaign can have is that of JCPenney,
an American retailer, whose organic rankings skyrocketed during the 2010 holiday shopping
season, suddenly climbing to the top of the search results for many general keywords such as
“dresses”, “bedding” and “furniture”.[2] JCPenney eventually fired its SEO contractor after
finding out that it had used “black hat” techniques, which led to a punitive response from
Google. Search engine optimization is widespread in the world of online advertising; a 2010
survey of 1500 advertisers and agencies revealed that 90% of them engaged in SEO compared
to 81% who purchased sponsored links.[3] In the past few years, search engine optimization
has grown to become a multi-billion dollar business.[4]
This chapter explores the economics of the SEO process and its effects on consumers,
advertisers and search engines. Using a game theoretical model we fully characterize the
incentives and tradeoffs of all players in the ecosystem. Our model consists of (i) advertisers
with exogenous qualities and potentially correlated valuations for clicks, competing for the
attention of consumers, (ii) a search engine that offers both organic and sponsored links
[1] We focus only on “black hat” SEO which does not improve the actual relevance of the webpage to the query, but just games the ranking algorithm.
[2] “The Dirty Little Secrets of Search”, The New York Times, Feb 12, 2011.
[3] “The SEMPO Annual State of Search Survey 2010”.
[4] “US Interactive Marketing Forecast, 2009 to 2014”, Forrester Research, July 6, 2009.
and can set minimum bids, and (iii) consumers who engage in costly search to find the
highest quality site. In order to capture the effect of SEO, we model the imperfections in the
algorithms used by search engines, assuming that there is a measurement error that prevents
the search engine from perfectly ordering links according to quality. Advertisers can, in
turn, manipulate the potentially erroneous quality observations to their advantage through
SEO and improve their ranking. A key parameter of our model is the effectiveness of SEO,
determining the extent to which SEO efforts by advertisers affect the organic results.
We first ask how SEO changes the organic results and whether these changes are always
detrimental to consumers and high quality advertisers. The interest in this question stems
from the strong stance that search engines typically take against SEO by emphasizing the
potential downside on organic link quality. To justify their position, search engines typically
claim that manipulation of search engine results hurts consumer satisfaction and decreases
the welfare of “honest” sites. In contrast, search engines also convey the message that the
auction mechanism for sponsored links ensures that the best advertisers will obtain the links
of highest quality, resulting in higher social and consumer welfare. This reasoning suggests
that consumers should trust sponsored links more than organic links in equilibrium, and
would prefer to start searching on the sponsored side. A substantial contribution of using a
sophisticated model for consumers is that we are able to derive their optimal search behavior.
Contrary to claims by search engines, we find that search engines fight SEO because of the
trade-off advertisers face between investing in sponsored links and investing in influencing
organic rankings. Consequently, search engines may lose revenue if sites spend significant
amounts on SEO activities instead of on paid links and content creation.
To approach the issue of diminished welfare from SEO, we first focus on the case where
sponsored links are not available to advertisers and consumers. This base model serves as a
benchmark and gives us a deeper understanding of the nature of the competition for organic
links when using SEO activities. Our first result reveals that SEO can be advantageous
by improving the organic ranking. In the absence of sponsored links, this only happens
when advertiser quality and valuation are positively correlated. That is, if sites’ valuations
for consumers are correlated with their qualities then consumers are better off with some
positive level of SEO than without. By contrast, if there are sites that extract high value from
visitors yet provide them with low quality then SEO is generally detrimental to consumer
welfare. The SEO process essentially allows sites with a high value for consumers to correct
the search engine’s imperfect ranking through a contest.
The second question we ask focuses on the full interaction between organic and sponsored
links when SEO is possible. The institutional differences between the organic and sponsored
lists are critical to the understanding of our model. First, advertisers usually pay for SEO
services up front and the effects can take months to materialize. Bids for sponsored links,
on the other hand, can be frequently adjusted depending on the ordering of the organic
list. Second, SEO typically involves a lump sum payment for initial results and the variable
portion of the cost tends to be convex, whereas payment for sponsored links is on a per-click
basis with very little or no initial investment. Finally, there is substantial uncertainty as
to the outcome of the SEO process depending on the search engine algorithms, whereas
sponsored links are allocated through a deterministic auction.
Interestingly, the presence of sponsored links accentuates the results of the base model
and SEO favors the high quality advertiser regardless of the correlation between quality and
valuation. The intuition is that sponsored links act as a backup for high quality advertisers
in case they do not possess the top organic link. When consumers have low search costs, they
will eventually find the high quality advertiser, reducing the value of the organic position
for a low quality player. In equilibrium, consumers will start searching on the organic side
and high quality sites will have an increased chance of acquiring the organic link as SEO
becomes more effective.
Although SEO clearly favors high quality advertisers, we find that there is a strong
tension between the interests of consumers and the search engine. As advertisers spend
more on SEO and consumers are more likely to find what they are looking for on the organic
side, they are less likely to click on revenue generating sponsored links. This tension may
explain why search engines take such a strong stance against SEO, even though they favor a
similar mechanism on the sponsored side. Furthermore, we obtain an important normative
result that could help search engines mitigate the revenue loss due to SEO: we find that there
is an optimal minimum bid the search engine can set that is decreasing in the intensity of
SEO. Setting the minimum bid too high, however, could drive more advertiser dollars away
from the sponsored side towards SEO.
As common as the practice of SEO may be, research on the topic is scant. Many papers have
focused on sponsored links and some on the interaction between the two lists. In all of these
cases, however, the ranking of a website in the organic list is assumed exogenous, and the
possibility of investing in SEO is ignored. On the topic of sponsored search, works such as
those by Rutz and Bucklin (2011) and Ghose and Yang (2009) focus on consumer response to
search advertising and the different characteristics that impact advertising efficiency. Other
recent examples, such as those by Chen and He (2011), Athey and Ellison (2012) and Xu
et al. (2011) analyze models that include both consumers and advertisers as active players.
A number of recent papers study the interplay between organic and sponsored lists.
Katona and Sarvary (2010) show that the top organic sites may not have an incentive to bid
for sponsored links. In an empirical piece, Yang and Ghose (2010) show that organic links
have a positive effect on the click-through rates of paid links, potentially increasing profits.
Taylor (2012), White (2009) and Xu et al. (2012) study how the incentives of the search
engine to provide high quality organic results are affected by potential losses on sponsored
links. The general notion is that search engines have an incentive to provide lower quality
results in order to maximize revenues.
The work of Xing and Lin (2006) is the closest antecedent to our work. It defines
“algorithm quality” and “algorithm robustness” to describe the search engine’s ability to
accurately identify relevant websites. Their paper shows that when advertisers’ valuations
for organic links are high enough, SEO is sustainable and SEO service providers can then
free-ride on the search engine due to their “parasitic nature”. The relationship between
advertiser qualities and valuations and the strategic nature of consumer search are not taken
into account. An earlier work by Sen (2005) develops a theoretical model that examines the
optimal strategy of mixing between investing in SEO and buying ad placements. Surprisingly,
the model shows that SEO should not exist as part of an equilibrium strategy.
2.2 Model
We set up a static game in which consumers search for a phrase and advertisers compete for
their visits. We assume there is a monopolistic search engine that provides search results to
consumers by displaying links to one of two websites. These sites can also buy sponsored
links from the search engine. Whenever a consumer enters the search phrase, the search
engine ranks the sites according to a scoring mechanism, and presents one organic link
and one sponsored link according to the scores and bids of the sites. The incentives and
characteristics of the search engine, advertisers, and consumers are described below.
Proposition 1.
1. When ρ = 1, any α > 0 which is not too large improves the efficiency of the ranking and
consumer satisfaction. However, when ρ = −1, SEO is detrimental to consumer satis-
faction. For intermediate −1 < ρ < 1 values, SEO can improve consumer satisfaction
for some α values.
2. Suppose α is small. When ρ = −1, both sites' profits are decreasing in α. When ρ = 1,
sites' profits are decreasing in α, except for the higher quality site, whose profits are
increasing iff v_H > 2v_L.
The first part demonstrates the main effect of equilibrium SEO investments on the rank-
ing. The SEO mechanism gives both sites incentives to invest in trying to improve their
ranking, but favors bidders with high valuations. Since the search engine cannot measure
site qualities perfectly, this mechanism corrects some of the error when valuations are posi-
tively correlated with qualities. On the flip side, when lower quality sites have high valuations
for traffic, SEO creates incentives that are not compatible with the utilities of consumers.
In this latter case, the high valuation sites that are not relevant can get ahead by investing
in SEO. Examples are cases of “spammer” sites that intentionally mislead consumers. Con-
sumers gain little utility from visiting such sites, but these sites may profit from consumer
visits.
Closer examination of the proof suggests that the cross-partial derivative ∂²P(α, σ)/∂α∂σ
is positive for small α. This
suggests, somewhat counter-intuitively, that investments against SEO on the search engine’s
part complement investments in better search algorithms rather than substitute them. That
is, only search engines that are already very good at estimating true qualities should fight
hard against SEO. Nevertheless, as measurement error can depend on exogenous factors and
can vary from keyword to keyword, it may make sense to allow higher levels of SEO in areas
where the quality measurement is very noisy.
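A highly simplified Monte Carlo sketch can illustrate the direction of these effects. It is not the equilibrium model of this chapter: it simply assumes the search engine ranks sites by quality plus Gaussian measurement noise plus an SEO term equal to α times an effort taken to be proportional to each site's valuation, and it reports how often the higher quality site obtains the top organic link. All functional forms and parameter values are illustrative assumptions.

import random

# Illustrative sketch with assumed functional forms (not the chapter's model):
# observed score = quality + measurement noise + alpha * SEO effort, where the
# SEO effort is taken to be proportional to the site's valuation for clicks.
def prob_high_quality_wins(alpha, sigma, aligned, n_draws=20000, seed=1):
    rng = random.Random(seed)
    qualities = (1.0, 0.5)                       # site 1 is the high-quality site
    valuations = (1.0, 0.5) if aligned else (0.5, 1.0)
    wins = 0
    for _ in range(n_draws):
        scores = [q + rng.gauss(0.0, sigma) + alpha * v
                  for q, v in zip(qualities, valuations)]
        wins += scores[0] > scores[1]
    return wins / n_draws

for alpha in (0.0, 0.2, 0.5):
    p_pos = prob_high_quality_wins(alpha, sigma=0.5, aligned=True)
    p_neg = prob_high_quality_wins(alpha, sigma=0.5, aligned=False)
    print(f"alpha={alpha}: P(high quality ranked first) "
          f"aligned={p_pos:.2f}, reversed={p_neg:.2f}")

With aligned valuations the probability of a correct ranking rises with α, while with reversed valuations it falls, mirroring the contrast between the ρ = 1 and ρ = −1 cases of Proposition 1.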
To analyze the relationship between α and advertiser profits we focus on small levels[7] of
α. As the second part of the proposition shows, the player with the lower valuation is always
worse off with higher SEO effectiveness regardless of its quality. The only site that benefits
from SEO is the one with a quality advantage, and only if its valuation is substantially
higher than its competitor’s. The intuition follows from the fact that higher levels of SEO
emphasize the differences in valuations; the higher the difference the more likely that the
higher valuation will win. Importantly, an advantage in valuation only helps when the site
also has a higher quality, that is, spammer sites with low quality and high valuation will not
benefit from SEO due to the intense competition with better sites.
[7] This relationship can be quite complex in the general case.
The Role of Sponsored Links
We now examine how the availability of sponsored advertising changes the incentive of in-
vesting in SEO and the resulting link order. Since the search engine’s main source of revenue
comes from sponsored links, this analysis is crucial to understanding how SEO affects the
search engine’s revenue. We solve the model outlined in Section 2.2 with r < vH . That is,
at the minimum, sites with a high valuation will be able to pay for sponsored links. When
describing the intuition, we focus on the case of r < vL so that any site can afford sponsored
links.
In order to determine advertisers’ SEO efforts and sponsored bids, we also need to uncover
where consumers start their search process. We assume that consumers always incur a small,
but positive search cost. They have rational expectations and start with the link that gives
them the highest probability of finding a high quality result without searching further. The
following proposition summarizes our main results.
Proposition 2. There exists a c̄ > 0, such that if c < c̄ then
1. In the unique equilibrium consumers begin their search on the organic side.
2. If r < vL the likelihood of a high quality organic link is increasing in α for any −1 ≤
ρ ≤ 1.
3. If vL ≤ r, the likelihood of a high quality organic link is increasing in α iff ρ is high
enough.
4. The search engine’s revenue increases in α iff the likelihood of a high quality organic
link decreases.
In short, we prove that the presence of sponsored links accentuates the potential benefits
of SEO on increasing the quality of the organic link. As α increases and SEO becomes
more effective, the probability that the higher quality site acquires the organic link increases
even if advertisers’ qualities and their valuations for consumers are negatively correlated.
Contrary to the commonly held view that SEO often helps low quality sites climb to the top
of the organic list if they have enough resources, we find that in the presence of sponsored
links, low quality sites cannot take advantage of SEO. The intuition relies on the notion that
sponsored links serve as a second chance to acquire clicks from the search engine for the site
that does not possess the organic link. However, as a result of exhaustive consumer search,
high quality sites enjoy a distinct advantage as they are likely to be found no matter what
position they are in. Low quality advertisers, on the other hand, suffer if a higher quality
competitor is also on the search page. Thus a low quality site’s incentive to obtain the
organic link will be reduced, while high quality sites will face less competition in the SEO
game and will be more likely to win it. For high quality sites, the main value of acquiring
the top organic link is not merely the access to consumers. Instead, the high quality site
benefits from the organic link because it does not have to pay for the access to consumers,
as it would have to on the sponsored side.
In the ensuing equilibrium, high quality advertisers always spend more on SEO than their
low quality competitors. Since this increases the chances of high quality organic links, we
find that rational consumers start their search on the organic side. Consumers benefit from
finding a high quality link as early as possible, and thus more effective SEO increases their
welfare by increasing the likelihood of a high quality organic link. This fact, however, hurts
the search engine whose revenues decrease when the high quality advertiser competes less for
the sponsored link. The misalignment between consumer welfare and search engine profits
has already been recognized by White (2009) and Taylor (2012). Our results reconfirm this
tension and shed light on an interesting fact: The main danger of SEO for search engines is
not the disruption of the organic list which has long-term impact on reputation and visitors,
but rather decreased revenues on the sponsored side which are of a short-term nature. Often
advertisers pay third parties to conduct SEO services instead of paying the search engine for
sponsored links. The result from the advertiser’s perspective is not much different, but the
search engine is stripped of significant revenues.
The search engine has an important tool on the sponsored side – setting the minimum
bid that affects what the winning advertiser pays. In the absence of SEO, an increased
minimum bid directly increases the revenue from advertisers who have a valuation above the
minimum bid. When SEO is possible the situation is different:
Corollary 1. There exists an r̂(α) > 0 such that the search engine’s revenue is increasing
in r for r < r̂(α) and decreasing for r̂(α) < r < vL . When vL is high enough then r̂(α) is
the unique optimal minimum bid which is decreasing in α.
The inverse U-shape of the effect is a result of two opposing forces. An increasing min-
imum bid increases revenue directly. However, in the presence of SEO, a higher minimum
bid makes sites invest more in SEO, which makes the high quality site more likely to acquire
the organic link. This, in turn, will lower sponsored revenues as most of these revenues
come from the case when the low quality site possesses the organic link. The combination
of these two forces will make the search engine’s revenue initially increase with an increased
minimum bid, but begin to decrease when sites invest more in SEO. The maximal profit is
reached at a lower minimum bid as SEO becomes more effective (α increases). Finally, we
examine how a site’s revenues are affected by SEO.
Corollary 2. If r < vL and the two sites have different qualities, the profit of the higher
quality site increases, while the profit of the lower quality site decreases in α.
Chapter 3

Attribution in Online Advertising
3.1 Overview
Digital advertising campaigns in the U.S. commanded US $36.6 billion in revenues during
2012, with an annual growth rate of 19.7% over the past 10 years,[1] surpassing all other media
spending except broadcast TV. In many of these online campaigns advertisers choose to
deliver ads through multiple publishers with different media technologies (e.g., banners,
videos) that can reach overlapping target populations.
This chapter analyzes the attribution process that online advertisers perform to compen-
sate publishers following a campaign in order to elicit efficient advertising. Although this
process is commonly used to benchmark publisher performance, when asked about how the
publishers compare, advertisers’ responses range from “We don’t know” to “It looks like
publisher X is best, but our intuition says this is wrong.” In a recent survey,[2] for example,
only 26% of advertisers claimed they were able to measure their social media advertising
effectiveness, while only 37% of advertisers agreed that their Facebook advertising is effective.
In a time when consumers shift their online attention towards social media, it is surprising
to witness such low approval of its effectiveness.
To illustrate the potential difficulties in attribution from multiple publisher usage, Figure
3.1 depicts the performance of a car rental campaign exposed to more than 13 million online
consumers in the UK, when the number of converters3 and conversion rates are broken down
by the number of advertising publishers that consumers were exposed to. As can be seen, a
large number of converters were exposed to ads by more than one publisher; it also appears
that the conversion rate of consumers increases with the number of publishers they were
exposed to.
An important characteristic of such multi-publisher campaigns is that the advertisers do
not know a priori how effective each publisher may be. Such uncertainty may arise, e.g., when
[1] Source: 2012 IAB internet advertising revenue report.
[2] Source: “2013 Social Media Marketing Industry Report”, www.socialmediaexaminer.com
[3] Converters are car renters in this campaign. Conversion rate is the ratio of buyers to total consumers.
[Figure 3.1: Number of converters and conversion rate by the number of advertising channels (0–4) that consumers were exposed to.]
publishers can target consumers based on prior information, when using new untested ads or
because consumer visit patterns shift over time. Given that online campaigns collect detailed
browsing and ad-exposure history from consumers, we ask what obstacles this uncertainty
may create to the advertiser’s ability to properly mount a campaign.
The first obstacle that the advertiser faces during multi-publisher campaigns is that the
ads interact in a non-trivial manner to influence consumers. From the point of view of
the advertiser, getting consumers to respond to advertising constitutes a team effort by the
publishers. In such situations a classic result in the economics literature is that publishers
can piggyback on the efforts of other publishers, thus creating moral hazard (Holmstrom,
1982). If the advertiser tries to base its decisions solely on the measured performance of the
campaign, such free-riding may prevent it from correctly compensating publishers to elicit
efficient advertising.
A second obstacle an advertiser may face is lack of information about the impact of
advertising on different consumers. Since the decision to show ads to consumers is delegated
to publishers, the advertiser does not know what factors contributed to the decision to display
ads nor does it know the impact of individual ads on consumers. The publishers, on the
other hand, have more information about the behavior of consumers and their past actions,
especially on targeted websites with which consumers actively interact such as search-engines
and social-media networks. Such asymmetry in information about ad effectiveness may create
adverse selection – publishers who are ineffective will be able to display ads and claim their
effectiveness is high, with the advertiser being unable to measure their true effectiveness.
To address these issues advertisers use contracts that compensate the publishers based
on the data collected during a campaign. We commonly observe two types of contracts in
the industry: effort based and performance based contracts. In an effort based contract,
publishers receive payment based on the number of ads they showed during a campaign.
These schemes, commonly known as cost per mille (CPM), are popular for display (banner)
advertising, yet their popularity is declining in favor of performance based payments.
Performance based contracts, in contrast, compensate publishers by promising them a
share of the observed output of the campaign, e.g., number of clicks, website visits or pur-
chases. The popularity of these contracts, called Cost Per Action (CPA), has been on the
rise, prompting the need for an attribution process whose results are used to allocate com-
pensation. Among these methods, the popular last-touch method credits conversions to the
publisher that was last to show an ad (“touch the consumer”) prior to conversion. The ra-
tionale behind this method follows traditional sales compensation schemes – the salesperson
who “closes the deal” receives the commission.
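Operationally, last-touch credit can be computed directly from the exposure logs collected during a campaign. The short sketch below illustrates the bookkeeping on hypothetical converter paths; the publisher names and paths are invented for illustration and are not data from the campaign discussed later.

from collections import Counter

# Hypothetical exposure paths: each converter's sequence of ad touches in time
# order (illustrative publishers, not real campaign data).
converter_paths = [
    ["display_A", "search_B"],               # search_B was shown last -> gets the credit
    ["search_B", "display_A", "display_A"],  # display_A was the last touch
    ["social_C"],                            # a single-publisher path
]

# Last-touch attribution: credit each conversion to the final publisher on the path.
last_touch_credit = Counter(path[-1] for path in converter_paths)
total = sum(last_touch_credit.values())
for publisher, conversions in last_touch_credit.items():
    print(f"{publisher}: {conversions} conversions ({conversions / total:.0%} of credit)")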
This chapter uses analytical modeling to focus on the impact of different incentive schemes
and attribution processes on the decision of publishers to show ads and the resulting profits
of the advertisers. Our goal is to develop payment schemes that alleviate the effects of moral-
hazard and asymmetric information and yield improved results to the advertiser. To this end
Section 3.3 introduces a model of consumers, two publishers and an advertiser engaged in an
advertising campaign. Consumers in our model belong to one of two segments: a baseline
and a non-baseline segment. Baseline consumers are not impacted by ads yet purchase
products regardless. In contrast, exposure to ads from multiple publishers has a positive
impact on the purchase probabilities of non-baseline consumers. Our model allows for a
flexible specification of advertising impact, including increasing returns (convex effects) and
decreasing returns (concave effects) of multiple ad exposures. The publishers in our model
may have private information about whether consumers belong to the baseline and make
a choice regarding the number of ads to show to every consumer in each segment. The
advertiser, in its turn, designs the payment scheme to be used after the campaign as well as
the measurement process that will determine publisher effectiveness.
Section 3.4 uses a benchmark fixed share compensation scheme to show that the moral hazard
created by performance based compensation is more detrimental to advertiser profits than
the use of effort based compensation. We find that
CPM campaigns outperform CPA campaigns for every type of conversion function and under
quite general conditions. As ads from multiple publishers affect the same consumer, each
publisher experiences an externality from actions by other publishers and can reduce its
advertising effort, raising a question about the industry’s preference for this method. We
give a possible explanation for this behavior by focusing on single publisher campaigns in
which CPA may outperform CPM for convex conversion functions.
Since CPA campaigns suffer from under-provision of effort by publishers, we observe that
advertisers try to make these campaigns more efficient by employing an attribution process
such as last-touch. By adding this process advertisers effectively create a contest among the
publishers to receive a commission, and can counteract the effects of free-riding by incen-
tivizing publishers to increase their advertising efforts closer to efficient amounts. We include
attribution in our model through a function that allocates the commission among publishers
based on the publishers’ efforts and performance and has the following four requirements:
In Section 3.7 we investigate whether evidence exists for baseline exploitation or publisher
free-riding in real campaign data. The data we analyze comes from a car rental campaign in
the UK that was exposed to more than 13.4 million consumers. We observe that the budgets
allocated to publishers exhibit significant heterogeneity and their estimates of effectiveness
are highly varied when using last-touch methods. An estimate of publisher effectiveness when
interacting with other publishers, however, gives an indication for baseline exploitation as
predicted by our model, and lends credibility to the focus on the baseline in our analysis.
Evidence for such exploitation can be gleaned from Figure 3.2, which describes the conversion
behavior of consumers who were exposed to advertising only after visiting the car rental
website without purchasing. If we compare the conversion rate of consumers who were
[Figure 3.2: Number of converters and conversion rate by the number of channels (0–4) that consumers were exposed to after visiting the advertiser's website without purchasing.]
exposed to two or more publishers post-visit, it would appear that the advertising had little
effect compared to no exposure post-visit.
We posit that the publishers target consumers with high probability of buying in order to
be credited with the sale which is a by-product of the attribution method used by advertisers.
To try and identify publishers who free-ride on others, we calculate an estimate of average
marginal contributions of publishers based on the Shapley value, and use these estimates to
compare the performance of publishers to last-touch methods. Calculating this value poses a
significant computational burden and part of our contribution is a method to calculate this
value that takes into account specific structure of campaign data. The results, which were
communicated to the advertiser, show that a few publishers operate at efficient levels, while
others target high baseline consumers to game the compensation scheme. We are currently
collecting information about the changes in publisher behavior that result from employing
the Shapley value, and the results of this investigation are the ongoing
focus of research. To the best of our knowledge, this is the first large scale application of
this theoretical concept appearing in the literature.
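Because the exact Shapley computation requires evaluating every subset of publishers, its cost grows exponentially with the number of publishers. A standard way to keep the computation tractable, shown in the sketch below, is to sample random orderings of publishers and average marginal contributions; this generic approximation is not necessarily the structure-exploiting method developed in this chapter, and the subset conversion rates used as input are hypothetical.

import random

def shapley_estimate(publishers, conv_rate, n_samples=10000, seed=0):
    """Monte Carlo estimate of each publisher's Shapley value.

    conv_rate maps a frozenset of publishers to the conversion rate observed for
    consumers exposed to exactly that subset (assumed, illustrative input).
    """
    rng = random.Random(seed)
    values = {p: 0.0 for p in publishers}
    for _ in range(n_samples):
        order = publishers[:]
        rng.shuffle(order)                    # a random arrival order of publishers
        seen = frozenset()
        for p in order:
            # marginal contribution of p given the publishers already "present"
            values[p] += conv_rate[seen | {p}] - conv_rate[seen]
            seen = seen | {p}
    return {p: v / n_samples for p, v in values.items()}

# Illustrative subset conversion rates for three hypothetical publishers.
x = {frozenset(): 0.000,
     frozenset({"A"}): 0.004, frozenset({"B"}): 0.003, frozenset({"C"}): 0.001,
     frozenset({"A", "B"}): 0.009, frozenset({"A", "C"}): 0.005,
     frozenset({"B", "C"}): 0.004, frozenset({"A", "B", "C"}): 0.010}
print(shapley_estimate(["A", "B", "C"], x))

Averaging over sampled orderings converges to the exact Shapley value as the number of samples grows, at a cost that scales with the number of publishers rather than with the number of subsets.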
The discussion in Section 3.8 examines the impact of heterogeneity in consumer behavior
on publisher behavior and the experimentation mechanism. We conclude with consideration
of the managerial implications of proper attribution.
In a recent survey,[5] 54% of advertisers indicated they used a last-touch
method, while 42% indicated that being “unsure of how to choose the appropriate
method/model of attribution” is an impediment to adopting an attribution method. Re-
search focusing on the advertiser’s problem of measuring and compensating multiple pub-
lishers is quite recent, however, with the majority focusing on empirical applications to
specific campaign formats. Tucker (2012) analyzes the impact of better attribution tech-
nology on campaign decisions by advertisers. The paper finds that improved attribution
technology lowered the cost per attributed converter. The paper also overviews theoretical
predictions about the impact of refined measurement technology on advertising prices and
makes an attempt to verify these claims using the campaign data. Kireyev et al. (2013) and
Li and Kannan (2013) build specific attribution models for online campaign data using a
conversion model of consumers and interaction between publishers. They find that publish-
ers have strong interaction effects between one another which are typically not picked up by
traditional measurements.
On the theory side, classic mechanism design research on team compensation closely
resembles the problem an advertiser faces. Among the voluminous literature on cooperative
production and team compensation the classic work by Holmstrom (1982) analyzes team
compensation under moral hazard when team members have no private information. Our
contribution lies in the fact that the advertiser is a profit maximizing rather than a welfare
maximizing principal; we nevertheless find similar effects and design mechanisms to solve these issues.
Consumers
Consumers in the model visit both publishers’ sites and are exposed to advertising, resulting
in a probabilistic decision to “convert”. A conversion is any target action designated by the
advertiser as the goal of the campaign that can also be monitored by the advertiser directly.
Such goals can be the purchase of a product, a visit to the advertiser’s site or a click on an
ad.
The response of consumers to advertising depends on the effectiveness of advertising as
well as on the propensity of consumers to convert without seeing any ads which we call the
baseline conversion rate. The baseline captures the impact of various states of consumers
[5] Source: “Marketing Attribution: Valuing the Customer Journey” by EConsultancy and Google.
resulting from exogenous factors such as brand preference, frequency of purchase in steady
state and effects of offline advertising prior to the campaign. When each publisher i ∈ {1, 2}
shows q_i ads, we let (q_1 + q_2)^ρ denote the conversion rate of consumers who have a zero
baseline.[6] By denoting the baseline probability of conversion as s, the advertiser expects to
observe the following conversion rate after the campaign:

x(q_1, q_2) = s + (1 − s)(q_1 + q_2)^ρ    (3.1)
The values of ρ and s are determined by nature prior to the campaign and are ex-
ogenous. To focus on pure strategies of advertising, we assume that 0 < ρ < 2.7 The
assumption implies that additional advertising has a positive effect on the probability of
buying of a consumer, yet allows both increasing and decreasing returns. When ρ < 1 the
response of consumers to additional advertising has decreasing returns and publishers’ ads
are substitutes. When ρ > 1 publishers’ ads are complements.
Finally, we let the baseline s be distributed s ∼ Beta(α, β) with parameters α > 0, β > 0.
The flexible structure will let us understand the impact of various campaign environments
on the incentives of advertisers and publishers.
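The following small simulation sketch illustrates this consumer model. It assumes the expected conversion rate takes the mixture form of equation (3.1), with the baseline s drawn once by nature from the Beta(α, β) distribution; all numerical values below are arbitrary illustrations rather than estimates from the campaign data.

import random

# Illustrative simulation of the consumer model: a fraction s of consumers are
# baseline consumers who convert regardless of ads, while the rest convert with
# probability (q1 + q2)**rho (the assumed form of equation 3.1).
def simulate_campaign(q1, q2, rho, a, b, n_consumers=100000, seed=0):
    rng = random.Random(seed)
    s = rng.betavariate(a, b)              # nature draws the baseline once per campaign
    p_ads = (q1 + q2) ** rho               # conversion probability of non-baseline consumers
    conversions = 0
    for _ in range(n_consumers):
        if rng.random() < s:               # baseline consumer: converts regardless of ads
            conversions += 1
        elif rng.random() < p_ads:         # non-baseline consumer responds to total exposure
            conversions += 1
    return s, conversions / n_consumers

s, rate = simulate_campaign(q1=0.05, q2=0.05, rho=0.8, a=1.0, b=9.0)
print(f"baseline s = {s:.3f}, observed conversion rate = {rate:.3f}")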
Publishers
Publishers in the model make a simultaneous choice about the number of ads qi to show
to each consumer and try to maximize their individual profits. When showing these ads
publishers incur a cost resulting from their efforts to attract consumers to their websites.
We define the cost of showing q_i ads as q_i²/2. Both publishers have complete information about
the values of ρ and s, as well as the conversion function x and the cost functions.
At the end of the campaign, each publisher receives a payment bi from the advertiser
that may depend on the amount of ads that were shown and the conversion rate observed
by the advertiser. The profit of each publisher i is therefore:
u_i = b_i(q_1, q_2, x) − q_i²/2    (3.2)
The Advertiser
The advertiser’s goal is to maximize its own profit by choosing the payment contract bi to
use with each publisher prior to the campaign. The structure of the conversion function x,
as well as the value of ρ are known to the advertiser. Initially, we assume as a benchmark
that the baseline s is known to the advertiser, which we normalize to zero without loss of
generality. The goal of this assumption, to be relaxed later, is to distinguish the effects
[6] The additivity of advertising effects is not required but simplifies exposition. Asymmetric publisher effectiveness is discussed in Appendix B.2.
[7] Restricting ρ < 2 is sufficient for the existence of profitable pure strategies when costs are quadratic.
of strategic publisher interaction on the advertiser’s profit from the effects of additional
information the publishers may have about consumers.
Normalizing the revenue from each consumer to 1, the profit of the advertiser is then:

π = x(q_1, q_2) − b_1 − b_2    (3.3)

CPM contracts (cost per mille) are effort based contracts in which the advertiser pays each
publisher a price p^M_i per ad shown, yielding the following publisher profit:

u_i = q_i p^M_i − q_i²/2    (3.4)
CPA contracts (cost per action) are performance based contracts. In these contracts the
advertiser designates a target action to be carried out by a consumer, upon which time a
price p^A_i will be paid to the publishers involved in causing the action. The prices are defined
as a share of the revenue x, yielding the following publisher profit:

u_i = (q_1 + q_2)^ρ p^A_i − q_i²/2    (3.5)
The timing of the game is illustrated in Figure 3.3. The advertiser first decides on a
compensation scheme based on the observed efforts qi , performance x or both. The publishers
in turn learn the value of the baseline s and make a decision about how many ads qi to show
to the consumers. Consumers respond to ads and convert according to x(q1 , q2 ). Finally, the
advertiser observes qi and x, compensates each publisher with bi and payouts are realized.
Several features of the model make the analysis interesting and are considered in the
next sections. The first is that the interaction among the publishers is essentially that of a
team generating conversions. A well known result by Holmstrom (1982) shows that no fixed
allocation of output among team members can generate efficient outcomes without breaking
the budget. In our model, however, a principal is able to break the budget, yet its goal is
profit maximization rather than efficiency. Nonetheless, the externality that one publisher
causes on another by showing ads will create moral hazard under a CPA model as will be
presented in the next section.
[Figure 3.3: Timeline of the game - the baseline s is realized, publishers choose ad quantities, consumers respond according to x(q_1, q_2), and payouts π, u_i are made.]
The second feature is that under CPM payment neither the performance of the campaign
nor the effect of the baseline enters the publishers' utility function directly, and therefore
neither impacts a publisher's decision regarding the number of ads to show. Consequently,
if the advertiser does not use the performance of the campaign as part of the compensation
scheme, adverse selection will arise.
Finally, we note that both the effort of the publishers as well as the output of the
campaign are observed by the advertiser. Traditional analysis of team production problems
typically assumed one of these is unobservable by the advertiser and cannot be contracted
upon. Essentially, CPA campaigns ignore the observable effort while CPM campaigns ignore
the observable performance. As we will show, a primary effect of an attribution process is
to tie the two together into one compensation scheme.
We now proceed to analyze the symmetric publisher model under CPM and CPA pay-
ments. The analysis builds towards the inclusion of an attribution mechanism with a goal
of making multi-publisher campaigns more profitable for the advertiser.
Under a CPM contract, each publisher's first order condition gives q_i = p^M_i. Solving the
advertiser's problem for the optimal symmetric price then yields

q^M = p^M = arg max_p (2p)^ρ − 2p² = (ρ 2^(ρ−2))^(1/(2−ρ))    (3.7)

In contrast, under a CPA contract, publisher i will choose q_i to solve the first order
condition q_i = ρ(q_i + q_{−i})^(ρ−1) p^A_i. Invoking symmetry again, we expect p^A_1 = p^A_2 and q^A_1 = q^A_2,
as a result yielding:
q^A = (ρ 2^(ρ−1) p^A)^(1/(2−ρ))    (3.8)
We notice that the number of ads displayed in a CPA campaign increases with the price pA
offered to the publishers.
By performing the full analysis and solving for the equilibrium prices pM and pA offered
by the advertiser we find the following:
Proposition 3. When 0 < ρ < 2:
• q^A < q^M < q^*: the level of advertising under CPA is lower than the level under CPM.
Both of these are lower than the efficient level of advertising.
• π^M > π^A: the profit of the advertiser is higher when using CPM contracts.
• There exists a critical value ρ_c with 0 < ρ_c < 1 such that for ρ < ρ_c, u^A > u^M and CPA is
more profitable for the publishers. When ρ > ρ_c, u^M > u^A and CPM is more profitable
for the publishers.
Proposition 3 shows that using CPA causes the publishers to free-ride and not provide
enough effort to generate sales in the campaign. The intuition is that the externality each
publisher receives from the other publisher gives an incentive to lower efforts, which con-
sequently lowers total output of the campaign. Under CPM payment, however, publishers
do not experience this externality and cannot piggyback on efforts by other publishers. By
properly choosing a price for an impression, the advertiser can then incentivize the publishers
to show a higher number of ads.
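The free-riding force can be seen directly from the CPA profit function (3.5). The sketch below numerically computes publisher i's best response to its rival's ad quantity for an arbitrary, non-equilibrium price p^A (an illustrative value, not the price derived in the chapter): when ads are substitutes (ρ < 1), the best response falls as the rival advertises more.

# Numerical best response of publisher i under a CPA share pA:
# u_i = (q_i + q_other)**rho * pA - q_i**2 / 2 (equation 3.5), maximized over a grid.
def best_response(q_other, rho, pA, grid=4000, q_max=2.0):
    best_q, best_u = 0.0, float("-inf")
    for k in range(grid + 1):
        q = q_max * k / grid
        u = (q + q_other) ** rho * pA - q * q / 2
        if u > best_u:
            best_q, best_u = q, u
    return best_q

rho, pA = 0.5, 0.4                      # illustrative parameter values
for q_other in (0.0, 0.2, 0.5, 1.0):
    print(f"rival shows q = {q_other:.1f} ads -> best response {best_response(q_other, rho, pA):.2f}")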
In terms of profits, we observe that advertisers should always prefer to use CPM contracts
when multiple publishers are involved in a campaign. This counter-intuitive result stems from
the fact that the resulting under-provision of effort overcomes the gains from cooperation by
the publishers even when complementarities exist.
The final part of Proposition 3 gives one explanation to the market observation that
campaigns predominantly use CPA schemes. When the publishers have market power to
determine the payment scheme, e.g. the case of Google in the search market, the publishers
should prefer a CPA based payment when ρ is small, i.e., when publishers are extreme
substitutes. In this case, the possibility for free-riding is at its extreme, and even minute
changes in efforts by competing publishers increase the profits of each publisher significantly.
For example, if consumers are extremely responsive to advertising and a single ad is enough to
influence them to convert, any publisher that shows an ad following the first one immediately
receives a "free" commission. If a search engine, which typically arrives later in the consumer's
buying process, is aware of this, it will prefer CPA payment so it can free-ride on previous
publisher advertising.
A question that arises concerns the motivation of advertisers, in contrast to publishers, to
prefer CPA campaigns over CPM ones. The following corollary shows that when advertisers
do not take into account the interaction between the publishers, CPA campaigns are also
profitable for the advertiser.
Corollary 3. When there is one publisher in a campaign and 0 < ρ < 2:
• q^A > q^M iff ρ > 1: the publisher shows more ads under CPA payment.
• π^A > π^M iff ρ > 1: more revenue and more profit is generated for the advertiser when
using CPA payment and advertising has increasing returns (ρ > 1).
Corollary 3 reverses some of the results of Proposition 3 for the case of one publisher
campaigns. Since free-riding is not possible in these campaigns, we find that CPA campaigns
better coordinate the publisher and the advertiser when ads have increasing marginal returns,
while CPM campaigns are more efficient for decreasing marginal returns.
u^A_i = f_i(q_i, q_{−i}, x) x(q_1, q_2) p^A − q_i²/2    (3.9)
An initial observation is that the process creates a contest between the two publishers
for credit. Once ads have been shown, the investment has been sunk yet credit depends
on delayed attribution. It is well known (see, e.g., Sisak (2009) and Konrad (2007)) that
contests induce agents to overexert effort in equilibrium compared to a non-contest
situation. As a result the attribution process can be used to incentivize the publishers to
increase their efforts and show a number of ads closer to the integrated market levels.
In the next section we analyze the impact of the commonly used last-touch attribution
method, and compare it to a new method we developed, based on the Shapley value, to
attribute performance in online campaigns.
property described above. It also trivially has the 3 other properties. The second property
is that last-touch attribution makes use of the conversion rate only in a trivial manner. The
credit given to the publisher only depends on the number of ads shown to a consumer and
whether the consumer had converted. It does not depend on the actual conversion rate of
the consumer and therefore ignores the value of x.
It is useful to examine the equilibrium best response of the publishers in a CPA campaign
in order to understand the impact of last-touch attribution on the quantities of ads being
displayed. Recall that when no attribution is used, the publisher will display q ads according
to the solution of:

ρ(2q)^(ρ−1) p^A = q    (3.11)
When using last-touch attribution, a publisher faces a winner-take-all contest which increases
its marginal revenue when receiving credit for the conversion, even if the conversion rate
remains the same. In a CPA campaign the first order condition in a symmetric equilibrium
becomes:
(2q)^(ρ−1) (2f_1'(1) + ρ/2) p^A = q    (3.12)

where f_1'(1) is the marginal increase in the share of attribution when showing an additional
ad when q_1 = q_2. Comparing equations (3.11) and (3.12) we see that if 2f_1'(1) + ρ/2 > ρ,
then the publisher faces a higher marginal revenue for the same amount of effort. As a result
it will have an incentive to increase its effort in equilibrium when the conversion function
is concave compared to the case when no attribution was used. Gershkov et al. (2009)
show conditions under which such a tournament can achieve Pareto-optimal allocation when
symmetric team members use a contest to allocate the revenue among themselves. Whether
this contest is sufficient to compensate for free-riding in online campaigns remains yet to be
seen.
To answer this question we are required to perform the full analysis that considers the
price pA offered by the advertiser in equilibrium. In addition, the accuracy of the attribution
process which depends on the magnitude of the noise d has an impact and may yield ex-
aggerated effort by each publisher. Finally, the curvature of the conversion function x that
depends on the parameter ρ may also influence the efficiency of last-touch attribution.
When performing the complete analysis for both CPA and CPM campaigns, we find the
following:
[Figure: Publisher 1's profit u_1^A as a function of q_1 for noise levels d = 4, 6, and 10 - publisher 1's best response to publisher 2's strategy of showing q^(A−LT) ads when ρ = 1.]
Finally, a comparison of the profits the advertiser makes with and without last-touch
attribution yields the following result:
Corollary 4. When 0 < ρ < 2 − 4/(d − 1), π^(A−LT) > π^M > π^A and the advertiser makes higher
profit under last-touch attribution.
as follows:[8]

φ_i(x) = Σ_{S ⊆ M\{i}} [ |S|! (|M| − |S| − 1)! / |M|! ] (x_{S∪{i}} − x_S)    (3.13)
where M is the set of publishers and x is the set of conversion rates for different subsets of
publishers.
The value has the four properties mentioned in the previous section: Efficiency, Symme-
try, Null Player and Marginality.9 In addition, it is the unique allocation function that has
these properties with the addition of an additivity property over the space of cooperative
games defined by the conversion function x(·). For the case of two publishers M = 2 the
Shapley value reduces to:
φ_1 = [x(q_1 + q_2) − x(q_2) + x(q_1) − 0] / 2        φ_2 = [x(q_1 + q_2) − x(q_1) + x(q_2) − 0] / 2    (3.14)
Using the Shapley value has the benefit of directly using the marginal contribution of
the publishers to compensate them. In addition, the process’s accuracy does not depend on
exogenous noise and yields a pure strategy equilibrium for all values of ρ.
In a CPA campaign, the profit of a publisher will become: $u_i^{A-S} = \phi_i\, p^{A-S} - \frac{q_i^2}{2}$.
Solving for the symmetric equilibrium strategies and profits of the advertiser and publishers yields the following result:
Proposition 5. When 0 < ρ < 2, using the Shapley value for attribution yields $q^{A-S} = \left(\frac{\rho^2}{4}\left(2^{\rho-1} + 1\right)\right)^{\frac{1}{2-\rho}}$.
For $\rho < 2 - \frac{4}{d-1}$, $q^A < q^{A-S} < q^{A-LT}$.
The profit of the advertiser is higher under the Shapley value than under Last-Touch attribution iff $q^{A-S} > q^{A-LT}$, i.e. $d < \frac{4}{2-\rho} + 1$.
The profit of the publisher is higher under the Shapley value attribution than under regular CPM pricing iff ρ > 1.
Proposition 5 is a major result of this chapter, showing that the Shapley value can be
more profitable when publishers are complements. Contrary to Last-touch attribution, a
symmetric pure strategy equilibrium exists for any value of ρ, including very convex func-
tions. When considering lower values of ρ for which Last-Touch attribution improves the
efficiency of the campaign, we see that when the noise level d is low enough, the Shapley
value will yield better results for the advertiser if ρ > 1, while CPM will be better when
ρ < 1. Figure 3.5 depicts for which values of ρ and d each attribution and compensation scheme is more profitable.
8 This is a continuous version of the value.
9 Some of these properties can be shown to be derived from others.
[Figure 3.5: Values of ρ and d for which each compensation scheme (Last Touch, Shapley, CPM) is more profitable for the advertiser.]
The intuition behind this result can be illustrated best for extreme values of ρ. When
ρ < 1 and is extremely low, the initial ads have the most impact on the consumer. As a
result, there will be significant free-riding which Last-touch is best suited to solve, while the
marginal increase that the Shapley value allocates is not too high. When ρ > 1, however,
if the noise is low enough, the publishers will be inclined to show too many ads because of
the low uncertainty about their success of being the last one to show an ad. In essence, the
competition is too strong and overcompensates for free-riding. The Shapley value in this case
is better suited to incentivize the players, as the marginal increase from one publisher to two symmetric publishers is highest with a convex function.
To make use of the Shapley value in an empirical application, it is required that the
advertiser can observe the conversion rates of consumers who were exposed to publisher 1
solely, publisher 2 solely and to both of them together. In addition, when a baseline is
present, it cannot be assumed that not being exposed to ads yields no conversions.
The next section discusses the baseline and the use of experimentation to generate the
data required to calculate the Shapley value.
The presence of a baseline conversion rate may cause adverse selection: publishers can target consumers with high baselines to receive
credit for those conversions.
Specifically, if we consider again equation (3.12), the first order condition of a publisher showing q ads to all consumers now becomes:
$$\left[(2q)^{\rho-1}\left(2f_1'(1) + \frac{1}{2}\rho\right)(1 - s) + f_1'(1)\,s\right] p_A = q \qquad (3.15)$$
In the extreme case of s = 1, the publishers will elect to show advertising to baseline
consumers and be attributed credit.
To understand how experimentation may be beneficial for the advertiser in light of this
problem, we analyze a model with a single publisher, but now assume the baseline is non-
zero and known to the publisher. We also assume ρ = 1, and recall that s is distributed
Beta(α, β). Thus, if all consumers are exposed to q ads, the expected observed number of
converters will be $N(s + q(1 - s))$. We note however that if baseline consumers are not exposed to ads at all, the advertiser would still expect to observe $N(s + q(1 - s))$ converters.
When the advertiser is integrated with the publisher and can target specific consumers,
it can choose to show qb ads to baseline consumers and q ads to the non-baseline consumers.
If the cost of showing q ads to a consumer is $\frac{q^2}{2}$, the firm's profit from advertising is:
$$\pi(q, q_b; s) = N\left(s + q(1-s) - \frac{q_b^2}{2}s - \frac{q^2}{2}(1-s)\right) \qquad (3.16)$$
The insight gained from this specification is that when consumers have a high baseline, the
advertiser has a smaller population to affect with its ads, as consumers in the baseline would
convert anyway.
It is obvious that when the advertiser can target consumers exactly, it has no reason to
show ads to baseline consumers, and therefore will set qb = 0. The allocation of ads that
maximizes the advertiser’s profit under full information is then q ∗ = 1 and qb∗ = 0, while the
total number of ads shown will be N (1 − s). We call this strategy the optimal strategy and
note that the number of ads to show decreases in the magnitude of the baseline. The profit
achieved under the optimal strategy is $\pi^{max} = N\frac{\mu+1}{2}$ when µ is the expectation of s.10 This
profit increases with α, and decreases with β. This means that when higher baselines are
more probable in terms of mass above the expectation, a higher profit is expected.
Turning to the case of a firm with uncertainty about s, one approach the firm may choose
is to maximize the expected profit over s by showing a number of ads q to all consumers
independent of the baseline. This expected strategy solves:
$$\max_q\; E_s[\pi(q, q; s)] \qquad (3.17)$$
The achieved profit in this case can serve as a lower bound π min on profit the firm can
achieve in the worst case. Any additional information is expected to increase this profit; if
it does not, the firm can opt to choose the expected strategy.
10 $\mu = \frac{\alpha}{\alpha+\beta}$
The following result compares the expected strategy with the optimal one:
Lemma 1.
• The firm will choose to show $q^E = 1 - \mu$ ads when using the expected strategy.
• The firm's profit $\pi^{min}$ is lower than $\pi^{max}$ by $\frac{N}{2}(\mu - \mu^2)$.
Lemma 1 posits that the number of ads displayed using this strategy treats the market
as if s equals its expected value. As a result, the achieved profit increases with the expected
value of s. When this strategy is the only one available the value of full information to the
firm is highest when the expected baseline is close to 1/2.
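Lemma 1 can be checked numerically from the profit function in equation (3.16); the sketch below draws baselines from an arbitrary Beta(2, 3) prior, which is an illustrative assumption rather than a parameter taken from the text.

```python
import numpy as np

def profit(q, qb, s, N=1.0):
    """Profit of equation (3.16): q ads to non-baseline consumers, qb to baseline ones."""
    return N * (s + q * (1 - s) - 0.5 * qb ** 2 * s - 0.5 * q ** 2 * (1 - s))

alpha, beta_ = 2.0, 3.0                       # assumed Beta prior on the baseline s
mu = alpha / (alpha + beta_)
s = np.random.default_rng(0).beta(alpha, beta_, size=200_000)

pi_max = profit(1.0, 0.0, s).mean()           # optimal strategy: q* = 1, qb* = 0
pi_min = profit(1 - mu, 1 - mu, s).mean()     # expected strategy: q^E = 1 - mu to everyone

print(pi_max, (mu + 1) / 2)                   # both approximately (mu + 1) / 2
print(pi_max - pi_min, 0.5 * (mu - mu ** 2))  # gap approximately (N/2)(mu - mu^2)
```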
The most common strategy that firms employ in practice, however, is to learn the value
of s through experimentation. The firm can decide to not show ads to n < N consumers and
observe the number of converters in the sample. This information is then used to update
the firm’s belief about s and maximize q. We call this strategy the learning strategy.
When the firm observes k converters in the sample it will base the number of ads to show
on this updated belief (DeGroot, 1970). The expected profit of the firm in this case is:
$$n\,E_s[x(q = 0; s)] + (N - n)\,E_s\!\left[E_{k|s}\!\left[\max_q \pi(q; s) \,\middle|\, s\right]\right] \qquad (3.18)$$
The caveat here is that by designating consumers as the sample set, the firm forfeits
potential added profit from showing ads to these consumers. We are interested to know
when this strategy is profitable, and also how much can be gained from using it and under
what conditions.
Let n∗ denote the optimal sample size that maximizes (3.18) given the distribution of s.
As the distribution of the observed converters k is Bin(n, s), the posterior s|k is distributed
Beta(α + k, β + n − k). Using Lemma 1, the optimal number of ads to show when observing k converters becomes $q^*(\mu(k))$ when $\mu(k) = E_s[s|k] = \frac{\alpha+k}{\alpha+\beta+n}$. A comparative statics analysis
of the optimal sample size n∗ shows the following behavior:
Lemma 2 shows that unless the distribution of s is heavily skewed towards 0 by having
a large β parameter, even with small populations some experimentation can be useful. On
the flip side, when the distribution is heavily skewed towards 1 with very large α, the
high probability baseline makes it less valuable to experiment, and the optimal sample size
decreases.
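A minimal sketch of the learning strategy under the stated assumptions (ρ = 1, a Beta(α, β) prior on s, and a hold-out of n consumers who see no ads). The prior parameters, population size, and hold-out size below are arbitrary illustrations, not the optimal n* characterized in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta_, N, n = 2.0, 3.0, 100_000, 2_000      # illustrative values

profits = []
for _ in range(5_000):
    s = rng.beta(alpha, beta_)                     # true (unknown) baseline this replication
    k = rng.binomial(n, s)                         # converters in the unexposed hold-out
    mu_k = (alpha + k) / (alpha + beta_ + n)       # posterior mean of s given k
    q = 1 - mu_k                                   # Lemma 1 applied to the updated belief
    per_consumer = s + q * (1 - s) - 0.5 * q ** 2  # remaining consumers all receive q ads
    profits.append(n * s + (N - n) * per_consumer) # hold-out converts at the baseline only

mu = alpha / (alpha + beta_)
pi_expected = N * (mu + (1 - mu) ** 2 / 2)         # expected-strategy profit from Lemma 1
print(np.mean(profits), pi_expected)               # learning should dominate for large N
```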
Having set conditions for the optimal size of the sample during experiments, we now revisit our question: when is it profitable for the firm to learn compared to choosing an expected strategy? Our finding is that for a large enough population N, it is always more profitable to learn than to use an expected strategy.
Baseline Exploitation
When the advertiser is not integrated with the publisher, the publisher has a choice of
which consumers to target and how many ads to show to each segment. We can solve for
the behavior of an advertiser under CPM and CPA pricing in this special case without
attribution to get the following result:
Proposition 7.
• Under CPM the publisher will show $q^M = \frac{1-\mu}{2}$ ads to each consumer in both segments.
• Under CPA, the publisher will show $q^A = \frac{2\mu-1}{2\mu-2}$ when $\mu < \frac{1}{2}$. The ads will be shown to consumers only in the non-baseline segment. When $\mu > \frac{1}{2}$, the advertiser will opt to not use CPA at all.
• Under CPA the publisher will show a total number of ads which is higher than the efficient number $q^*$, as well as higher than $q^M$, for every value of s.
• The profit of the advertiser under CPM is higher than under CPA for any value of µ.
Proposition 7 exposes two seemingly contradicting results. Since under CPM payment
the publisher is paid for the amount of ads it shows, it will opt to show both q > 0 and
qb > 0 ads. Given the same price and cost for each ad displayed, it will show exactly the
same amount to both segments, which will be lower than the efficient amount of ads to show.
Specifically, when µ is high, i.e., the expectation of the baseline is high, the publisher will
lower its effort as the advertiser would have wanted. Under CPA, however, the publisher will
use an efficient allocation of ads in terms of targeting and will not show ads to the baseline
population. Since the publisher gets a commission from the baseline as well, however, it
experiences lower effective cost for each commission payment, and as a result will show
too many ads compared to the optimal amount. The apparent contradiction may be that
although the publisher now allocates its ads correctly under CPA compared to CPM, the
profit of the advertiser is still higher under CPM payment for low baseline values. The
11 This result assumes n is continuous. As n is discrete, the actual n* is slightly larger than this bound to allow for discrete sizes of samples.
intuition is that CPM allows the advertiser to internalize the strategy of the publisher and
control it through the price, while in CPA the advertiser will need to trade-off effective ads
for ineffective exploitation of the baseline if it lowers the price paid per conversion.
Adding Last-Touch attribution to the CPA process will only exacerbate the issue. If the publisher shows a different number of ads in each segment, the advertiser can infer which segment may be the baseline one and not compensate the publisher for it. The publisher, as
a result, will opt to show the same number of ads to all consumers, and the number of ads
shown will now depend on the size of the baseline population s. The result will be too many
ads shown by a CPA publisher to the entire population, and reduced profit to the advertiser.
Using the Shapley value, in contrast, will allocate revenue to the publisher only for
non-baseline consumers, as the Shapley value will control for the observed baseline through
experimentation. When solving for the total profit of the advertiser including the cost of
experimentation, it can be shown that Shapley value attribution in a CPA campaign reaches
a higher profit than CPM campaigns.
We thus advocate moving to an attribution process based on the Shapley value consid-
ering the adverse effects of the baseline. The next section discusses a preliminary analysis
of data from an online campaign using Last-Touch attribution to detect whether baseline
exploitation is indeed occurring.
To attribute the returns of the campaign, the advertiser computed last-touch attribution for the publishers based on the last ad they displayed to consumers. Table 3.2 shows the
12 An online consumer is measured by a unique cookie file on a computer.
attributed performance alongside the average cost per attributed conversion. We see that
the allocation of budgets correlates with the attributed performance of the publishers, while
the cost per conversion can be explained by different average sales through each publisher
and quantity discounts.13
Publisher No.   Type                          Attribution   Budget ($)   Cost per Converter ($)
1               Online Magazine               386           8,300        21.50
2               Travel Agency                 218           8,000.02     36.69
3               Travel Magazine               40            6,000        150
4               Display Network               168           -            -
5               Travel Search                 50            -            -
6               Display Network               1,330         13,200       9.92
7               Travel Search                 69            -            -
8               Media Exchange/Retargeting    3,769         33,200       8.80
Total                                         6,030         68,700       11.39

Table 3.2: Last Touch Attribution for the Car Rental Campaign
We observed that in order to achieve high profits, the advertiser needs to be able to
condition payment on estimates of the baseline as well as on the marginal increase of each
publisher over the sets of other publishers. This result extends to the case of many publishers, where for a set of publishers M the advertiser will need to observe and estimate $2^{|M|}$ measurements.
Even small campaigns utilizing 7 publishers require more than 100 of these estimates to
be used and reported. Current industry practices do not allow for such elaborate reporting
resulting in advertisers using statistics of these values. The common practice is to report
one value per publisher with the implicit assumption that if a publisher’s attribution value
is higher, so is its effectiveness.
                 logdiff
Publisher        coef          se
1                -0.657        (0.849)
2                -2.175***     (0.693)
3                -1.960***     (0.703)
4                -0.986        (0.751)
5                -1.559**      (0.691)
6                -1.689**      (0.744)
7                -0.588        (0.748)
8                -0.539        (0.813)
R2               0.650
Observations     88
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
If publishers were not attempting to game the last-touch method, we would expect to see their marginal
contribution estimates be close to their last-touch attribution in equilibrium. An issue that
arises with using marginal estimates from the data, however, is that the timing of ads being
displayed is endogenous and depends on a decision by the consumer to visit a publisher
and by the publisher to display the ad. The advertiser does not observe and cannot control
for this order, which might raise an issue with using ad view data as if it were created by random experimentation.
The use of the Shapley value, however, gives equal probability to the order of appear-
ance of a publisher when a few publishers show ads to the same consumers. The effect is a
randomization of order of arrival of ads when multiple ads are observed by the same con-
sumer. Because of this fact, using the Shapley value as is to estimate marginal contributions
will be flawed when not every order of arrival is possible. For example, the baseline effect
needs to be treated separately while special publishers such as retargeting publishers and
search publishers that can only show ads based on specific events need to be accounted for.
An additional hurdle to using the Shapley value is the computation time required as it is
exponential in the size of the input.
We developed a modified Shapley value estimation procedure to handle these issues. The
computational issues are addressed by using specific structure of the advertising campaign
data and will be described in Berman (2013).
Figure 3.6 compares the results from a last-touch attribution process to the Shapley value
estimation.
More than 1, 000 converters were reallocated to the baseline. In addition, a few pub-
lishers lost significant shares of their previously attributed contributions, showing evidence
of baseline exploitation. Using these attribution measures the advertiser has reallocated its
[Figure 3.6: Attributed converters per publisher (1 to 8) and in total, under Last Touch and Shapley attribution.]
budgets and significantly lowered its cost per converter. We are currently collecting the data
on the behavior of the publishers given this change in attribution method, to be analyzed in
the future.
3.8 Conclusion
As multi-publisher campaigns become more common and many new publisher forms appear
in the market, attribution becomes an important process for large advertisers. The more
publishers are added to a campaign, however, the more complex and prone to errors the
process becomes. Our two-publisher model has identified two issues that are detrimental to
the process – free-riding among team members and baseline exploitation. This measurement
issue arises because the data does not allow us to disentangle the effect of each publisher
accurately and using statistics to estimate this effect gives rise to free riding. Thus, set-
ting an attribution mechanism that does not take into account the equilibrium behavior of
publishers will give rise to moral hazard even when the actions of the publishers are fully
observable. On the other hand, if the performance of the campaign is not explicitly used
in the compensation scheme through an attribution mechanism, adverse selection cannot be
mitigated and ineffective publishers will be able to impersonate effective ones.
The method of last-touch attribution, as we have shown, has the potential to make CPA
campaigns more efficient than CPM campaigns under some conditions. In contrast, attri-
bution based on the Shapley value yields well behaved pure strategy equilibria that increase
profits over last-touch attribution when the noise is not too small. Adding experimentation
as a requirement to the contract does not lower the profits of the advertiser too much, and
allows for collection of the information required to calculate the Shapley value, as well as
Chapter 4
Reducing Sample Sizes in Large Scale Online Experiments
4.1 Overview
Online experiments have gained popularity as a leading method of performing market re-
search, measuring ROI and producing new software for startups, advertisers and other firms.
The two factors contributing to the increased popularity are the reduced cost of produc-
ing different versions to experiment with and cheap access to large consumer populations
through the Internet. As a result of this trend, a recent survey1 of 2,500 online marketers
has determined “Conversion Rate Optimization” to be the top priority for the coming years,
while another recent survey2 has determined that the most popular method for determining
marketing activity effectiveness is running an A/B Test.
A/B tests are randomized controlled trials in which two versions of a treatment to be
tested (A and B) are assigned to consumers arriving to a website or using an app. An
action by the consumer is designated as the target result of the experiment, and is called
a conversion. Some examples of such treatments are exposing the consumer to an ad or
displaying a different version of a webpage. Examples of conversions are the purchase of a
product, filling out a form or providing an email address.
When running these experiments, marketers are required to invest time in designing the
experiment through not only producing the different versions to test, but also by considering
treatment allocation, experiment run-time, sample sizes and statistical tests. This approach
fuses the traditionally separate positions of creative directors with that of planners and me-
dia buyers into one position requiring more rigorous and detailed analysis of the experiment
a-priori. The applied business literature supports this approach by stressing the importance
of properly applying the scientific method to business experiments, with examples from
1 Salesforce ExactTarget “2014 State of Marketing” report at: http://content.exacttarget.com/en/StateOfMarketing2014
2 Econsultancy “Conversion Rate Optimization Report 2013” at: http://econsultancy.com/reports/conversion-rate-optimization-report
both marketing (Anderson and Simester, 2011) and entrepreneurship (Blank, 2013). Conse-
quently, marketers who run online experiments typically utilize a software platform such as
Optimize.ly, Google Analytics or Adobe Test for experimental design.
Given the treatments to test, the software automatically allocates them to consumers,
tracks conversions, produces reports and performs statistical tests for the marketer. The
determination of the experiment’s sample size and the execution of hypotheses tests are
then relegated to the software.
This chapter discusses the standard online experimental design proposed by the leading
online platforms and focuses on the current methods used for sample size determination and
hypothesis testing. Since in many cases even small effect sizes have large economic value in terms of profit for a website, experimenters set ambitious goals for detection of effects, which may result in large sample requirements. The consequence of these large samples is that experiments need to run for a long period of time until reaching the desired population, and in many cases they may result in an inability to properly measure the effect of a campaign due to the low signal-to-noise ratio of the data, as documented in Lewis and Rao (2012b) and
Lewis et al. (2013).
The question arises, however, whether the standard test used by testing platforms, the
statistical test of inequality of conversion rates with fixed sample size, is the most efficient
test that can be used in an online setting. The intuition behind this question stems from
three insights about online experiments. The first is that the goal of the experiment, or
the decision to be taken given the result, impacts the required sample size to make the
right decision. The second is that the data collected in an experiment is stochastic and
collected sequentially, and the variation in it can be exploited. The third is that a-priori,
the experimenter many times makes a best-effort guess regarding the underlying effect sizes
to be determined, but it may turn out that the treatments are much worse or better than
previously hypothesized. We therefore focus on several techniques an experimenter can use
to lower the sample size needed in an experiment, either through matching the goal of the
experiment with the statistical test used, or through using sequential analysis methods.
When calculating the required sample sizes in an experiment, typically two types of
parameters are taken into account. The first type of parameters describe the desired effect
size to be detected and many times a baseline conversion rate to detect the change from. For
example, an experiment’s goal may be to detect an increase of at least 10% in conversion
rate above a 5% conversion rate. That is, we wish to detect treatments with conversion
rates above 5.5%. The other set of parameters set the acceptable error rates of the decision
process, namely the maximum levels for Type I and Type II error rates.
Typically there are two possible goals for an online experiment: to select the best treat-
ment or to determine whether a new treatment is better than a control. We call the former
a selection test and the latter a test of superiority, the difference depending on the cost of
picking treatment A over B for eventual use. If both treatments A and B have the same
cost of implementation and the goal is to pick the best one, we call the experiment a se-
lection experiment. If switching from the control treatment to a new one will bear some
additional cost, than the experimenter would like to make sure the new treatment outweighs
this cost by showing that the new treatment is superior to the control. We therefore label
this experiment a test of superiority.
Section 4.2 formally introduces the distinction between the possible types of experiment
goals and develops an understanding of the impact these have on reducing the possible
sample sizes in experiments. We show that although the goal of most online experiments
is to test superiority or select the best treatment, the current practice overestimates the
sample size required for these tasks as it uses a test of equality. As an example, when
selecting the best version out of two, it many times does not matter which version is selected
if their performance measures are close enough. This indifference zone for experiment results
effectively eliminates Type I errors and allows for achieving sample size requirements lower
by more than 80% compared to the standard test.
In Sections 4.3 and 4.4 the concept of sequential statistical tests is introduced and ap-
plied to relevant online questions. The concept of sequential analysis was introduced by
Wald et al. (1945) who developed the sequential probability ratio test (SPRT) during World
War II to test the improved performance of anti-aircraft guns. The main idea behind the test
is to use a sequence of likelihood ratios for the data to reject or accept the null hypothesis
while an experiment is running when an extreme result occurs. Since its initial development,
significant work has been done in the field of sequential analysis with the majority of appli-
cations carried out for medical trials. Ghosh and Sen (1991) contains a classical overview of
the developments of the different tests, while Bartroff et al. (2012) contains a more modern
treatment with focus on medical experimentation.
It is interesting to note that to the best of our knowledge sequential methods have seldom
been used in the fields of Business, Economics and Psychology, and seem completely non-
existent among marketing practitioners. Part of this puzzle can be attributed to the technical
nature of the statistics required to perform the tests, as well as the abundant number of
methods that may apply to a specific scenario. Another reason may be the lack of accessible
software libraries for carrying out the experimental design and performing the statistical
tests. This chapter therefore aims to synthesize the available literature and methods into
a coherent overview that can serve as a guideline for applying sequential techniques for
marketers. To this end, Sections 4.3, 4.4 and 4.5 review the most common scenarios online
marketers may encounter in their experiments and carefully describe the applicable statistical
approaches that can be used. A software library that has been developed while writing this
chapter will be distributed on the author’s site with the goal of making these techniques
accessible and easily applicable.
Section 4.6 describes the application of a sequential test on observational data from an
online experiment carried out by a software startup. The goal of the experiment was to
show whether the firm’s software has superior efficacy to the current best practice of a
marketing website. As the results show, the sequential test determines that the experiment has achieved its desired goal within 12 days, which would have allowed the length of the experiment to be reduced by approximately 25% compared to the original a-priori determined sample
size.
Lastly, Section 4.7 describes additional open questions and possible future avenues for
research in the application of sequential tests for marketing purposes.
$$n_{NEQ} = 2p(1-p)\left(\frac{\Phi^{-1}(1-\alpha/2) + \Phi^{-1}(1-\beta)}{d}\right)^2 \qquad (4.1)$$
when Φ is the cdf of the Normal distribution.
Similarly, for a superiority test with the same parameters, the sample size equals approximately:
$$n_{SUP} = 2p(1-p)\left(\frac{\Phi^{-1}(1-\alpha) + \Phi^{-1}(1-\beta)}{d}\right)^2 \qquad (4.2)$$
The case of the selection experiment is more interesting. As the goal defines, when the
two treatments are close enough (|pA −pB | < d), the experimenter is indifferent among which
treatment is selected as best. Effectively, this test has no Type I error, and we only need
to make sure that the test selects the highest performing treatment with high probability.
Assume, w.l.o.g., that $p_A \geq p_B + d$; then the test that picks treatment A if $\hat{p}_A > \hat{p}_B$ has the following (lowest) power when θ = d, termed the probability of correct selection:
$$Pr(\text{Correct Selection}) = Pr(\hat{p}_A - \hat{p}_B > 0 \mid p_A - p_B = d) = 1 - \Phi\!\left(\frac{-d}{\sigma}\right) = 1 - \beta \qquad (4.3)$$
Solving for n yields
$$n_{SEL} = 2p(1-p)\left(\frac{\Phi^{-1}(1-\beta)}{d}\right)^2 \qquad (4.4)$$
Calibrating with typical values of α = 0.05 and β = 0.1, we can observe that the test of
superiority requires ∼ 81.5% of the sample size required by the test of non-equivalence, while
the selection experiment only requires ∼ 15.6% of the sample size. The improvement does
not depend on the effect size d or the baseline rate p. This is a dramatic six-fold improvement
in sample size that is achieved when running selection tests, which are very common for new
products.
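The calibration above is straightforward to reproduce. The sketch below implements equations (4.1), (4.2) and (4.4) with scipy and prints the approximately 81.5% and 15.6% ratios; the 5% baseline and 10% lift are the example values used earlier in this section.

```python
from scipy.stats import norm

def n_nonequivalence(p, d, alpha=0.05, beta=0.1):
    """Equation (4.1): per-arm sample size for the test of non-equivalence."""
    return 2 * p * (1 - p) * ((norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)) / d) ** 2

def n_superiority(p, d, alpha=0.05, beta=0.1):
    """Equation (4.2): per-arm sample size for a one-sided superiority test."""
    return 2 * p * (1 - p) * ((norm.ppf(1 - alpha) + norm.ppf(1 - beta)) / d) ** 2

def n_selection(p, d, beta=0.1):
    """Equation (4.4): per-arm sample size for selection with indifference zone d."""
    return 2 * p * (1 - p) * (norm.ppf(1 - beta) / d) ** 2

p, d = 0.05, 0.005                     # detect a 10% lift over a 5% baseline (5.5% vs 5%)
n_eq = n_nonequivalence(p, d)
print(round(n_eq))                     # per-arm size of the standard equality test
print(n_superiority(p, d) / n_eq)      # ~0.815
print(n_selection(p, d) / n_eq)        # ~0.156
```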
The caveat in this comparison is that the absolute sample size required to detect small
effects compared to the baseline effect p may be very large compared to the timing constraints
of an experiment. Figure 4.1 displays the required sample sizes to detect a 10% increase for
different values of conversion rate p.
For a website with a thousand daily visitors willing to spend two weeks on an experiment,
there are not enough visitors to detect even a 15% increase in conversion rate using a supe-
riority test. This led Lewis and Rao (2012b) and Lewis et al. (2013) to provide convincing
evidence that measuring small yet economically meaningful effects may be very hard for
smaller firms and advertisers without access to extremely large populations.
Apart from using a test of inequality for superiority and selection experiments, the current
approach has two noticeable deficiencies. The first deficiency is that the effect size, d, is
selected as the difference of the two treatments regardless of the underlying baseline p of
the experiment. The a-priori determined sample size is therefore highly sensitive to the
specification of the baseline p. If p diverges significantly from the pre-specification, the test
may end as being too powerful or not powerful enough to make a decision.
The second deficiency is the fact that the standard test is powered at a specific effect size,
and does not adjust the sample size when the true effect is much larger or much smaller than
hypothesized. The fact that consumers arrive sequentially over time to be included in the
experiment means that early information about a substantially large or small effect size may
be used to terminate the experiment early with the correct conclusion. The method behind
this intuition is known as Sequential Analysis and will be introduced in the next section.
[Figure 4.1: Required per-arm sample sizes for the non-equivalence, superiority, and selection tests. Left panel: sample size required to detect a 10% increase in conversion rate, against the baseline conversion rate p (α = 0.05, β = 0.1). Right panel: sample size required against the percentage increase in conversion rate over a baseline of p = 0.1 (α = 0.05, β = 0.1).]
4. Otherwise, when $\underline{c} < LLR_n < \overline{c}$, continue the test and take another sample.
To understand the intuition behind the test, Figure 4.2 graphs the log likelihood-ratio
values of two random samples generated from Bernoulli distributions with parameters p =
0.08 and p = 0.12. The boundaries $\underline{c}$ and $\overline{c}$ are marked using the dashed lines. The values of
the LLRs constitute a random walk. When each of the lines crosses one of the boundaries, the
experiment is stopped and H0 or H1 is accepted. In the example’s case, testing H0 : p = 0.1
vs. H1 : p = 0.11 would have required a sample size of 9, 976 samples to achieve Type I error
of α = 0.05 and Type II error rate of β = 0.1. The simulation graphs show, however, that
less than 2, 000 draws were required to stop and accept H1 for p = 0.12 and less than 1, 100
draws were required before stopping to accept H0 with p = 0.08.
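A sketch of the simulation behind Figure 4.2. The boundaries below are the standard Wald approximations $\overline{c} = \log\frac{1-\beta}{\alpha}$ and $\underline{c} = \log\frac{\beta}{1-\alpha}$, which reproduce the 2.890 and −2.251 values shown; the random seed and loop structure are our own illustration.

```python
import math
import random

alpha, beta = 0.05, 0.10
p0, p1 = 0.10, 0.11                      # H0: p = 0.10  vs  H1: p = 0.11
c_hi = math.log((1 - beta) / alpha)      # upper boundary ~ 2.8904: stop and accept H1
c_lo = math.log(beta / (1 - alpha))      # lower boundary ~ -2.2513: stop and accept H0

def sprt(p_true, seed=0, max_n=50_000):
    """One Wald SPRT path on Bernoulli(p_true) draws; returns (decision, stopping time)."""
    rng = random.Random(seed)
    llr = 0.0
    for n in range(1, max_n + 1):
        x = rng.random() < p_true
        # per-observation log likelihood ratio of H1 against H0
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= c_hi:
            return "accept H1", n
        if llr <= c_lo:
            return "accept H0", n
    return "no decision", max_n

print(sprt(0.12))   # typically stops well before the fixed-sample size and accepts H1
print(sprt(0.08))   # typically stops even sooner and accepts H0
```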
[Figure 4.2: Log likelihood-ratio paths of the two simulated samples (p = 0.12 and p = 0.08) against the stopping boundaries $\overline{c}$ = 2.890372 and $\underline{c}$ = −2.251292; x-axis: sample size n.]
The SPRT has been shown to approximately achieve a Type I error rate of α and a Type II
error rate of β. As can be noticed from its description, the actual stopping time, denoted N ,
of the test is a random variable and depends on the arriving data. In theory, the experiment
may proceed indefinitely, yielding very large sample sizes. Consequently, a substantial amount of research has been dedicated to minimizing the expected sample size E[N] of the test under
various assumptions, while providing a definite upper bound for the maximum sample size
of the procedure.
One common solution is to choose an upper bound Nmax for the sample size n, and
truncate the experiment if this bound is reached. At this point, H0 is accepted if $LLR_{N_{max}} < 0$ and H1 is accepted if $LLR_{N_{max}} > 0$, while a tie is broken arbitrarily. Another option is to design the stopping boundaries $\underline{c}$ and $\overline{c}$ to change with n, and converge to meet at $N_{max}$.
An excellent overview of the research on the topic can be found in Ghosh and Sen (1991);
Jennison and Turnbull (1999); Bartroff et al. (2012). The majority of the applications of sequential analysis to date have been in the medical trials field, where the cost of experimentation
is high and ethical issues about adverse effects of drugs raise the need to stop experiments
early.
Except for the possibly unlimited sample size, a major issue with applying the standard SPRT to comparing two Bernoulli trials is that the test applies only to simple hypotheses of single parameter distributions, and most extensions apply to the exponential family of distributions, to which the two-armed Bernoulli trial does not belong. In our case of conversion rate comparison, however, we are interested in testing composite hypotheses $H_0: p_A \leq p_B$ vs. $H_1: p_A \geq p_B(1 + d)$. These hypotheses are composite since the underlying values of $p_A$ and $p_B$ are unknown in advance. As a result, we are interested in making use of the observed data to estimate $p_A$ and $p_B$, which may allow stopping earlier for very high or very low values of $p_A$.
We present two modified versions of sequential tests based on Hoel et al. (1976) and
Kulldorff et al. (2011) which take a maximum likelihood approach to the estimation of the
test statistic. The first procedure, the equality constrained maximum likelihood ratio SPRT
(eqMaxSPRT), uses a two sided boundary and allows for early stopping to both accept H0
or H1 . This approach is useful to minimize the expected sample sizes of the experiment
when H0 and H1 are true with equality, and in addition to stop early with high power when
$p_A \gg p_B$ or when $p_A \ll p_B$.
The second approach, which we term ineqMaxSPRT uses an inequality constrained max-
imum likelihood estimator for the test statistic, and only an upper bound $\overline{c}$ for stopping. In other words, $\underline{c} = -\infty$, and the test will never stop early to accept H0. The test is truncated
at a maximum sample Nmax , at which point H0 is accepted. This test is appropriate when
there is high probability that H1 is correct, or high cost to not stopping when H1 is correct
and the experimenter would like to stop as early as possible when this is true. Otherwise,
if both treatments are equivalent, the cost of continuing to experiment should be low. This
test, for example, can be used to detect a new treatment that is significantly worse than a
control by properly reversing the hypotheses or counting non-converters as converters.
Equality Constrained MaxSPRT
We first notice that the set of hypotheses $H_0: p_A = p_B$ vs. $H_1: p_A = p_B(1 + d)$ is composite, as there are sets of values for $p_A$ and $p_B$ that can satisfy these. The log-likelihood of the data with $s_A = \sum_j x_{Aj}$ and $s_B = \sum_j x_{Bj}$ conversions for arms A and B respectively, given values $p_A$ and $p_B$ of the true conversion rates and n samples from each arm, is:
$$LLR_n(s_A, s_B \mid p_A, p_B) = s_A \log p_A + (n - s_A)\log(1 - p_A) + s_B \log p_B + (n - s_B)\log(1 - p_B) \qquad (4.5)$$
To calculate the eqMaxSPRT statistic, we maximize the log-likelihood under each hypothesis, to receive the log likelihood-ratio:
$$eqMaxLLR_n(s_A, s_B) = \max_{p_A = p_B(1+d)} LLR_n(s_A, s_B \mid p_A, p_B) - \max_{p_A = p_B} LLR_n(s_A, s_B \mid p_A, p_B) \qquad (4.6)$$
The left-hand term estimates the log likelihood of the data under H1 by solving the quadratic equation resulting from the first order condition of the constrained maximization. The right-hand term results in an estimate of $\hat{p}_A = \hat{p}_B = \frac{s_A + s_B}{2n}$.
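A sketch of the statistic in equation (4.6). Instead of solving the quadratic first order condition in closed form, the H1-constrained likelihood is maximized numerically below; this is an illustrative implementation using scipy, not the code used for the simulations reported here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def loglik(sA, sB, n, pA, pB):
    """Binomial log-likelihood of sA and sB conversions out of n draws per arm."""
    return (sA * np.log(pA) + (n - sA) * np.log(1 - pA)
            + sB * np.log(pB) + (n - sB) * np.log(1 - pB))

def eq_max_llr(sA, sB, n, d):
    """eqMaxSPRT statistic of equation (4.6); compared to the same boundaries as the SPRT."""
    eps = 1e-9
    # H1: maximize over pB subject to pA = pB (1 + d)
    res = minimize_scalar(lambda pB: -loglik(sA, sB, n, pB * (1 + d), pB),
                          bounds=(eps, (1 - eps) / (1 + d)), method="bounded")
    ll_h1 = -res.fun
    # H0: closed-form maximizer p_hat = (sA + sB) / (2n)
    p_hat = min(max((sA + sB) / (2 * n), eps), 1 - eps)
    ll_h0 = loglik(sA, sB, n, p_hat, p_hat)
    return ll_h1 - ll_h0

# Illustrative counts: 1,000 draws per arm and a hypothesized 10% lift.
print(eq_max_llr(sA=125, sB=95, n=1000, d=0.1))
```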
To show the applicability of this test, we simulated 2, 000 experiments for different values
of pA when pB = 0.1 and d = 0.1, with up to 50, 000 draws in each experiment. For
each experiment we calculated the eqMaxSPRT statistic for each state, and determined the
stopping time of the experiment using the boundaries $\underline{c}$ and $\overline{c}$ described above, with α = 0.05
and β = 0.1. Table 4.1 shows the probability of rejecting H0 and the probability of not having
stopped by using n samples with n = 10, 000, n = 30, 000 and n = 50, 000 draws per arm.
         n = 10,000              n = 30,000              n = 50,000
pA       Prob.       Prob.       Prob.       Prob.       Prob.       Prob.
         Reject H0   No Stop     Reject H0   No Stop     Reject H0   No Stop
0.08     0           0           0           0           0           0
0.09     0.0005      0.0045      0.0005      0           0.0005      0
0.1      0.0305      0.256       0.0435      0.0105      0.0435      0.0005
0.105    0.22        0.482       0.4345      0.0695      0.4655      0.012
0.11     0.5935      0.331       0.8915      0.01        0.9015      0
0.12     0.988       0.0105      0.9985      0           0.9985      0

Table 4.1: Simulation results of probability of rejecting H0 and of not stopping, eqMaxSPRT.
pB = 0.1, d = 0.1, α = 0.05 and β = 0.1.
As expected, the power of the test increases as pA gets farther below 0.1 or above 0.11. In
addition the longer the maximum sample sizes n, the higher the power of the test. Finally,
for the range of values between pA = 0.1 and pA = 0.1 × (1 + 0.1) = 0.11, we see that the
probability of not stopping early is highest, as the test statistic is not powerful enough to
discriminate at these rates. This is the same phenomenon that would happen with a fixed
size test for values of θ0 < θ < θ1 .
Analyzing the expected sample size until stopping E[N ], however, sheds light on the
advantages of using the sequential method. Table 4.2 summarizes the expected sample sizes
until stopping to accept H0 or H1 for the series of simulated experiments described above.
The values in parentheses show the ratio between the expected sample size E[N] and the one-arm sample size of 16,094 required to test $H_0: p_A < p_B$ vs. $H_1: p_A \geq p_B$ for $p_B = 0.1$, with α = 0.05 and β = 0.1 when $d = \frac{p_A}{p_B} - 1 = 0.1$.
As can be noticed, when the data is drawn from distributions with pA ≥ pB (1+d) or with
pA ≤ pB , the expected sample size is substantially smaller than the fixed sample test size,
leading to improvement of 40% and more. This feature of the test, being more powerful and
requiring smaller samples with more extreme data farther outside the indifference zone, is a result of the monotonicity of the estimate of the constrained $\hat{p}_A$ in the number of successes $s_A$, which is itself monotone in the true $p_A$ in expectation. When the test is less powerful, however, for values of $p_B < p_A < p_B(1 + d)$, the expected sample size increases and may reach values close to the fixed sample test size, which does not warrant taking on the possibility of the test never stopping.
The problem of minimizing the maximum expected sample size E[N ] is known as the
Kiefer-Weiss problem (Kiefer et al., 1957). Several applicable solutions to a slight variation of
this problem for exponential families have been developed, a good example of which appears
in Huffman (1983). These solutions unfortunately do not apply to the case of comparing
two Bernoulli populations with composite hypotheses.
Inequality Constrained MaxSPRT
$$ineqMaxLLR_n(s_A, s_B) = \max_{p_A \geq p_B(1+d)} LLR_n(s_A, s_B \mid p_A, p_B) - \max_{p_A = p_B} LLR_n(s_A, s_B \mid p_A, p_B) \qquad (4.7)$$
The other major difference is that the test only has an upper bound $\overline{c}$ for early stopping and a maximum sample size $N_{max}$, chosen together so that the test attains the desired Type I and Type II error rates.
In their novel development of the inequality constrained MaxSPRT, Kulldorff et al. (2011) compare two-armed problems whose distributions can be reduced to single parameter distributions. As a result they could use numerical integration or exact calculations to find $N_{max}$ and $\overline{c}$. For our purpose, we can achieve similar results using simulation methods that estimate $Pr(ineqMaxLLR_n(s_A, s_B) > \overline{c})$ under $H_1$ and $H_0$ to find $\overline{c}$ and $N_{max}$.
Table 4.3 shows the probability of rejecting H0 : pA = pB under different conditions.
Comparing to Table 4.1 we can see that higher power is achieved for smaller sample sizes
with the inequality constrained MaxSPRT. Since the test is built using simulation techniques,
however, the error rate guarantees are only approximate, as can be seen for the values of
pA = 0.1, which reaches a Type I error of 0.0565 for a large sample. In our experiments
these values fluctuated between 0.02 and 0.07.
pA n= 10,000 n= 30,000 n= 50,000
0.08 0.005 0.0045 0.0045
0.09 0.014 0.0105 0.0105
0.1 0.0685 0.0575 0.0565
0.105 0.1685 0.3395 0.456
0.11 0.489 0.9165 0.9825
0.12 0.976 1 1
Table 4.3: Inequality MaxSPRT probability of rejecting H0
pB = 0.1, d = 0.1, α = 0.05 and β = 0.1.
Another characteristic of the inequality constrained MaxSPRT is its expected sample size
behavior. A key feature is that rejecting H0 will always require the maximum sample size,
while rejecting H1 might come at much earlier stages. Table 4.4 documents simulation results
for E[N ] for various conditions and should be compared with Table 4.2. As can be seen,
for values of pA much lower than pB or higher than pB (1 + d) the expected sample sizes can
be much smaller for the inequality constrained version. Closer to these values, however, the
expected sample size might inflate above the original fixed sample test size. This undesirable
property should be taken into account when choosing which test to use. Comparing to the
equality constrained version, however, we notice that the inequality constrained MaxSPRT
allows testing d = 0 vs. d > 0, while the equality constrained version cannot test this
assumption. Therefore, by setting d = 0 for the inequality test, one can avoid the inflated
sample sizes, and achieve a powerful test for detecting smaller effects of the treatment.
• Otherwise, continue sampling until n = Nmax and declare the arm with the highest
value of si as the best.
It is easy to show that this procedure will reach the same decision as the fixed sample
procedure, hence reaching the same probability of correct selection (Power). On the other
hand, this procedure has the advantage of being able to stop earlier if one of the arms turns
out to be much better than the other. Another advantage of this procedure is that the
possible states for $s_i - s_{-i}$ after n samples are discrete and finite, ranging from −n to n.
We can therefore exactly calculate the probability of stopping and probability of error given
this procedure.
Although the curtailed difference procedure has the advantage of a deterministic stopping
time, calculations show that the decrease in expected sample sizes is moderate at best. The
reason is that the procedure itself does not make use of the indifference zone to relax its
requirement for stopping.
We therefore propose to use the equality constrained MaxSPRT procedure to test the hypothesis $H_0: p_A = \frac{p_B}{1+d}$ vs. $H_1: p_A = p_B(1 + d)$. If H0 is accepted, we choose arm 2 as
the best, while if H1 is accepted, we pick arm 1 as the best. The procedure is similar to the
one described in Section 4.3 with a different null hypothesis and with setting α = β. This
ensures that the probability of correct selection is 1 − β as desired.
pA       Prob. Correct Selection   Prob. No Stop   E[N]
0.08     0.9965                    0               1,067 (0.36)
0.09     0.918                     0.016           1,700 (0.58)
0.1      1                         0.056           2,050 (0.69)
0.105    0.723                     0.0415          1,928 (0.65)
0.11     0.9005                    0.0095          1,639 (0.55)
0.12     0.9825                    0               1,003 (0.34)

Table 4.5: Simulation results for eqMaxSPRT test of Selection.
n = 6,000, β = 0.1, d = 0.1, pB = 0.1.
Table 4.5 summarizes the results of using this procedure to pick the highest among pA and
pB , with pB fixed at 0.1, for various values of pA , and an indifference zone of 10% (d = 0.1).
The fixed sample size test requires n = 2, 957 from each arm to reach a 90% probability of
correct selection. As can be seen, setting n = 6, 000 for the sequential test yields a high
probability of stopping early with the required probabilities of correct selection. The major
advantage of this approach is the lowered expected sample size compared to the curtailed
difference approach. The results in the table show that the expected sample sizes go as low
as 34% of the fixed test size, and do not surpass 70%.
4.5 Extensions
Unknown Sample Sizes
Various online scenarios do not allow for determining the total sample size of exposed con-
sumers, but rather provide information only on the number of converters resulting from the
different exposures. As an example, when running an online advertising campaign to test
two ad creatives, a budget is allocated to a network that will be used to display both versions
of the ad. The network can guarantee a certain ratio of ad displays between the versions
(e.g. 1:1 or 1:2), but cannot guarantee, and many times cannot determine the number of
consumers exposed to the ads, as exposures are done on an impression by impression basis.
Another example is using a search advertising campaign that gives the number of impressions
per ad, but not the number of consumers exposed to each ad.
In such cases it is impossible to tell how many of the treatments resulted in non-
conversions (failures). It is possible, however, to test an approximate hypothesis about
the ratios of pA and pB assuming their values are small and the samples are large. Sup-
pose that a sample of n consumers was exposed to treatment 2, and that the ratio of
sample sizes between treatment 2 and 1 is z. That is, n/z consumers were exposed to
treatment 1. The expected number of successes from treatment 1 is $np_A/z$, and the probability that a converter came from treatment 1 is $Pr(\text{Converter from 1}) = \frac{np_A}{np_A + nzp_B}$. Dividing by $np_B$ and letting $r = \frac{p_A}{p_B}$ we receive $Pr(\text{Converter from 1}) = \frac{r}{r+z}$.
Thus, we can consider the allocation of converters among the treatments as a Bernoulli variable with parameter $\frac{r}{r+z}$. Testing $H_0: r = 1$ vs. $H_1: r = 1 + d$ using the previously described tests allows testing superiority as well as performing selection without knowing n.
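A small sketch of this reduction; the converter counts and exposure ratio below are made up for illustration. Each observed converter is treated as a Bernoulli draw whose probability of having come from treatment 1 is r/(r + z):

```python
import math

conv_from_1, conv_from_2 = 260, 220   # hypothetical converter counts; exposures unknown
z = 1.0                               # planned exposure ratio of treatment 2 to treatment 1
d = 0.1                               # hypothesized lift: H1 states r = pA / pB = 1 + d

theta0 = 1.0 / (1.0 + z)              # Pr(converter came from treatment 1) under H0: r = 1
theta1 = (1 + d) / (1 + d + z)        # the same probability under H1: r = 1 + d

# Log likelihood ratio computed from converters only; compare to the SPRT boundaries.
llr = (conv_from_1 * math.log(theta1 / theta0)
       + conv_from_2 * math.log((1 - theta1) / (1 - theta0)))
print(llr)
```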
The disadvantage in this approach is that the Bernoulli approximation is not exact
and loses the information from non-converters that could be used to better distinguish the
likelihood-ratios (Cook et al., 2012). As a result, reaching an equally powerful test will require a larger number of converters and a larger total sample size. Other approaches which may be benefi-
cial include matching, where only instances of paired results of (xA , xB ) = (success, f ailure)
and (xA , xB ) = (f ailure, success) are being collected from the data. The interested reader
is directed to Wald et al. (1945) and to Tamhane (1985) for details.
Data Grouping
Performing a continuous sequential test with every additional observation is many times
undesirable and sometimes impossible. For example, many online tracking systems provide
only aggregated daily values for the number of exposures and converters in an experiment.
This restriction is a blessing in disguise as the limited number of tests to perform increases
the power of the test and lowers the potential error rate. Suppose the samples arrive in K
groups (e.g. days), each of size $n_k$, with $s_{Ak}$ and $s_{Bk}$ the number of successes in group k. Each $s_{ik}$ is then distributed binomially with parameter $p_i$, and its probability mass function can be calculated exactly. The standard eqMaxSPRT statistic can be used, but the limits $\underline{c}$ and $\overline{c}$ can be adjusted to be less conservative and allow for earlier stopping. The theory for
group sequential tests is well developed in Jennison and Turnbull (1999), and the R software
library gsDesign by Merck Corp. implements these techniques.
4.6 Application
To illustrate the use of the eqMaxSPRT test, we apply the technique to data collected
by Reactful.com in an experiment to compare the efficacy of their software on a branding
website. Reactful.com is a startup producing add-on software for websites that allow the
website to “react” dynamically to customer visits to enhance user experience. As an example,
the add-on may detect confusion by the consumer evident in its mouse hovering between too
many options and react with a pop-up screen to suggest an explanation. Another example
is detecting when a consumer is about to enter a purchasing process and suggesting more
information to the consumer to help them make a decision.
The experiment was run for 14 days in January 2014 on a branding website (known as a
“mini-site”) whose goal was to educate consumers about a new product on the market of a
well known CPG brand. The conversion was defined as the event of a consumer ordering a
free trial of the product.
The experiment was set-up so that consumers were randomly allocated with equal prob-
ability to a static version of the mini-site (Control) and the dynamic version using Reactful
(Treatment). The goal was to show superiority of the Reactful product over the control with
at least 10% improvement. Every day data was collected about the total number of visitors
for each version of the site (nA and nB ), version A being the treatment and B the control.
In addition, the number of converters under each version was counted.
The data is displayed in Table 4.6. Although the software assigned each consumer to a
treatment randomly with equal probability, the samples are not balanced. In addition, the
control version had an eventual overall 11.4% conversion rate.
Day nA (Reactful) sA (Reactful) nB (Control) sB (Control)
1 661 102 681 85
2 1,044 136 1,030 124
3 651 58 691 50
4 1,108 84 1,102 68
5 1,111 117 1,007 102
6 737 126 719 105
7 973 145 923 134
8 1,527 194 1,531 195
9 804 108 741 93
10 471 67 533 77
11 778 109 739 85
12 671 87 689 83
13 547 63 474 52
14 558 69 573 49
Total 11,641 1,465 11,433 1,302
Total Conversion Rate 0.126 0.114
Table 4.6: Results of Reactful.com experiment on CPG trial website
The fixed sample size for a test of superiority able to detect a 10% increase with α = 0.05
and β = 0.1 is 13,886 samples per arm. This is more than the sample size actually achieved in the experiment, and would have required approximately 2,500 additional samples for each
arm. The question is whether a sequential approach could determine if the treatment is
superior to the control.
As the data is daily aggregate data and not continuous, the standard SPRT techniques
should be modified to handle grouped data as discussed in Section 4.5. We should note,
however, that if at any point the group data crosses one of the boundaries for stopping, it
would have crossed that boundary at a possibly earlier time for a continuous test. Thus, we
can tell whether the test should have been stopped earlier or not. In addition, if we assume
that the test statistic can be approximated by straight lines between the group points, we
can use the standard eqMaxSPRT to decide whether to accept H0 or H1.
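As an illustration, the grouped statistic can be recomputed from the daily totals in Table 4.6 by accumulating the counts and maximizing the two constrained likelihoods numerically; the sketch below is our approximation of that procedure (handling unequal arm sizes directly), not the exact code behind Figure 4.3.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Daily counts from Table 4.6: (n_A, s_A, n_B, s_B); A = Reactful, B = Control.
days = [(661, 102, 681, 85), (1044, 136, 1030, 124), (651, 58, 691, 50),
        (1108, 84, 1102, 68), (1111, 117, 1007, 102), (737, 126, 719, 105),
        (973, 145, 923, 134), (1527, 194, 1531, 195), (804, 108, 741, 93),
        (471, 67, 533, 77), (778, 109, 739, 85), (671, 87, 689, 83),
        (547, 63, 474, 52), (558, 69, 573, 49)]

d = 0.10                                               # hypothesized minimum lift
c_hi, c_lo = np.log(0.9 / 0.05), np.log(0.1 / 0.95)    # Wald boundaries ~2.890 and ~-2.251

def loglik(nA, sA, nB, sB, pA, pB):
    return (sA * np.log(pA) + (nA - sA) * np.log(1 - pA)
            + sB * np.log(pB) + (nB - sB) * np.log(1 - pB))

def eq_max_llr(nA, sA, nB, sB):
    eps = 1e-9
    h1 = -minimize_scalar(lambda pB: -loglik(nA, sA, nB, sB, pB * (1 + d), pB),
                          bounds=(eps, (1 - eps) / (1 + d)), method="bounded").fun
    p0 = (sA + sB) / (nA + nB)                         # pooled MLE under H0
    h0 = loglik(nA, sA, nB, sB, p0, p0)
    return h1 - h0

nA = sA = nB = sB = 0
for day, (a, x, b, y) in enumerate(days, start=1):
    nA, sA, nB, sB = nA + a, sA + x, nB + b, sB + y
    stat = eq_max_llr(nA, sA, nB, sB)
    flag = "accept H1" if stat >= c_hi else ("accept H0" if stat <= c_lo else "")
    print(day, round(stat, 3), flag)   # per the chapter, the upper boundary is crossed around day 12
```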
Figure 4.3 compares the test statistic values to the stopping boundaries of the test. As
can be seen, the test could have been stopped after day 12 with a conclusion that Reactful’s
software increases conversions by at least 10%.
[Figure 4.3: Daily eqMaxSPRT LLR values for the Reactful.com experiment against the stopping boundaries $\overline{c}$ = 2.890372 and $\underline{c}$ = −2.251292; x-axis: day.]
4.7 Conclusion
The standard technique of testing for equality of conversion rates in A/B tests can be inef-
ficient when not matched with the goal of the test and when not exploiting the sequential
nature of arriving data. In this chapter we have shown two approaches that when combined
can lead to a substantial decrease in expected sample sizes of online experiments.
The first and simplest approach is to match the statistical test with the goal of the
experiment. By realizing that many experiments are aimed at selection, marketers can
frequently design powerful experiments which end sooner than previously hypothesized. The
second approach uses sequential analysis techniques to stop the experiment early when the
results show enough evidence for the efficacy or lack of efficacy of the treatment.
The field of sequential analysis is wide and contains many varieties of tests and designs
that can be used for a diverse set of online scenarios. Applying these techniques, however,
requires developing software to make the techniques accessible to researchers and practi-
tioners. Combining sequential techniques with Bayesian methods is a natural avenue for
further exploration of the topic. It should be noted, however, that there is a clear advantage
of using the frequentist approach when applying these dynamic techniques as they make
interpretation and application easy.
Future directions for research include the dynamic allocation of treatments to consumers
based on the historical result of the experiment to date. These adaptive methods balance
exploration and exploitation of the treatment arms to maximize the value generated by
the experiment, and are related to the classic multi-armed bandit problem. Performing
these experiments while combining them with an ongoing sequential procedure will prove
invaluable to the current development of online experimental techniques.
Bibliography
Eric T. Anderson and Duncan Simester. A step-by-step guide to smart business experiments.
Harvard Business Review, 89(3):98, 2011.
Susan Athey and Glenn Ellison. Position auctions with consumer search. The Quarterly
Journal of Economics, page forthcoming, 2012.
Jay Bartroff, Tze Leung Lai, and Mei-Chiung Shih. Sequential Experimentation in Clinical
Trials: Design and Analysis, volume 298. Springer, 2012.
Michael R. Baye and Heidrun C. Hoppe. The strategic equivalence of rent-seeking, innova-
tion, and patent-race games. Games and Economic Behavior, 44(2):217–226, 2003.
Robert E Bechhofer. An optimal sequential procedure for selecting the best bernoulli
process—a review. Naval research logistics quarterly, 32(4):665–674, 1985.
Ron Berman. An application of the shapley value for online advertising campaigns. Work
in Progress, 2013.
Thomas Blake, Chris Nosko, and Steven Tadelis. Consumer heterogeneity and paid search
effectiveness: A large scale field experiment. NBER Working Paper, pages 1–26, 2013.
Steve Blank. Why the lean start-up changes everything. Harvard Business Review, 91(5):
63–72, 2013.
Y. Chen and C. He. Paid placement: Advertising and search on the Internet. The Economic
Journal, 121(556):309–328, 2011.
Andrea J Cook, Ram C Tiwari, Robert D Wellman, Susan R Heckbert, Lingling Li, Patrick
Heagerty, Tracey Marsh, and Jennifer C Nelson. Statistical approaches to group sequential
monitoring of postmarket safety surveillance data: current state of the art for use in the
mini-sentinel pilot. Pharmacoepidemiology and drug safety, 21(S1):72–81, 2012.
Alex Gershkov, Jianpei Li, and Paul Schweinzer. Efficient tournaments within teams. The
RAND Journal of Economics, 40(1):103–119, 2009.
Anindya Ghose and Sha Yang. An empirical analysis of search engine advertising: Sponsored
search in electronic markets. Management Science, 55(10):1605–1622, 2009.
Bhaskar Kumar Ghosh and Pranab Kumar Sen. Handbook of sequential analysis. CRC Press,
1991.
DG Hoel, GH Weiss, and R Simon. Sequential tests for composite hypotheses with two
binomial populations. Journal of the Royal Statistical Society. Series B (Methodological),
pages 302–308, 1976.
Bengt Holmstrom. Moral hazard in teams. The Bell Journal of Economics, pages 324–340,
1982.
Christopher Jennison and Bruce W Turnbull. Group sequential methods with applications to
clinical trials. CRC Press, 1999.
Przemyslaw Jeziorski and Ilya Segal. What Makes Them Click: Empirical Analysis of Con-
sumer Demand for Search Advertising. SSRN eLibrary, 2009.
Zsolt Katona and Miklos Sarvary. The race for sponsored links: Bidding patterns for search
advertising. Marketing Science, 29(2):199–215, 2010.
Jack Kiefer, Lionel Weiss, et al. Some properties of generalized sequential probability ratio
tests. The Annals of Mathematical Statistics, 28(1):57–74, 1957.
Pavel Kireyev, Koen Pauwels, and Sunil Gupta. Do display ads influence search? attribution
and dynamics in online advertising. Working Paper, 2013.
René Kirkegaard. Favoritism in asymmetric contests: Head starts and handicaps. Games
and Economic Behavior, page forthcoming, 2012.
Vijay John Morgan Krishna. The winner-take-all principle in small tournaments. Advances
in Applied Microeconomics, 28:849–862, 2007.
BIBLIOGRAPHY 59
Martin Kulldorff, Robert L Davis, Margarette Kolczak, Edwin Lewis, Tracy Lieu, and
Richard Platt. A maximized sequential probability ratio test for drug and vaccine safety
surveillance. Sequential Analysis, 30(1):58–78, 2011.
Anja Lambrecht and Catherine Tucker. When does retargeting work? timing information
specificity. Timing Information Specificity (Dec 02, 2011), 2011.
Randall Lewis, Justin M Rao, and David H Reiley. Measuring the effects of advertising: The
digital frontier. Technical report, National Bureau of Economic Research, 2013.
Randall A Lewis and Justin M Rao. On the near impossibility of measuring advertising
effectiveness. Technical report, Working paper, 2012a.
Randall A Lewis and Justin M Rao. On the near impossibility of measuring advertising
effectiveness. Technical report, Working paper, 2012b.
Alice Li and P.K. Kannan. Modeling the conversion path of online customers. Working
Paper, 2013.
Puneet Manchanda, Jean-Pierre Dubé, Khim Yong Goh, and Pradeep K Chintagunta. The
effect of banner advertising on internet purchasing. Journal of Marketing Research, pages
98–108, 2006.
R. Preston McAfee and John McMillan. Optimal contracts for teams. International Eco-
nomic Review, pages 561–577, 1991.
Oliver J. Rutz and Randolph E Bucklin. From generic to branded: A model of spillover in
paid search advertising. Journal of Marketing Research, 48(1):87–102, 2011.
Ravi Sen. Optimal search engine marketing strategy. Int. J. Electron. Commerce, 10(1):
9–25, 2005.
Lee Sherman and John Deighton. Banner advertising: Measuring effectiveness and optimiz-
ing placement. Journal of Interactive Marketing, 15(2):60–64, 2001.
Ajit C Tamhane. Some sequential procedures for selecting the better bernoulli treatment
by using a matched samples design. Journal of the American Statistical Association, 80
(390):455–460, 1985.
BIBLIOGRAPHY 60
Greg Taylor. Search quality and revenue cannibalisation by competing search engines. Jour-
nal of Economics & Management Strategy, page forthcoming, 2012.
Catherine Tucker. The implications of improved attribution and measurability for online
advertising markets. 2012.
Alexander White. Search engines: Left side quality versus right side profits. working paper,
Touluse School of Economics, 2009.
Bo Xing and Zhangxi Lin. The impact of search engine optimization on online advertis-
ing market. In ICEC ’06: Proceedings of the 8th international conference on Electronic
commerce, pages 519–529, New York, NY, USA, 2006. ACM. ISBN 1-59593-392-1. doi:
http://doi.acm.org/10.1145/1151454.1151531.
Lizhen Xu, Jianqing Chen, and Andrew B. Whinston. Price competition and endogenous
valuation in search advertising. Journal of Marketing Research, 48(3):566–586, 2011.
Lizhen Xu, Jianqing Chen, and Andrew B. Whinston. Effects of the presence of organic
listing in search advertising. Information Systems Research, page forthcoming, 2012.
S. Yang and A. Ghose. Analyzing the relationship between organic and sponsored search
advertising: Positive, negative, or zero interdependence? Marekting Science, 29:602–623,
2010.
Song Yao and Carl F. Mela. A dynamic model of sponsored search advertising. Marketing
Science, 30(3):447–468, 2011.
Yi Zhu and Kenneth C. Wilbur. Hybrid advertising auctions. Marketing Science, 30(2):
249–273, 2011.
61
Appendix A
A.1 Proofs
Proof of Proposition 1:
Let Fεi −εj be the c.d.f. of a triangle distribution εi − εj ∼ T [−σ, σ] with mean zero and
fεi −εj be its p.d.f. Each website faces the following first order condition w.r.t. to their scores
resulting from the profit function:
s̄i − qi
vi · fεi −εj (s̄i − s̄j ) = , (A.1)
α2
where s̄i = Eεi [si ]. Let x = s̄i − s̄j and µ = qi − qj . By subtracting both F.O.Cs and using
the fact that fεi −εj is symmetric around zero we can rewrite the condition as:
x−µ
fεi −εj (x) = (A.2)
− vj )
α2 (vi
An interior solution x∗ would require both F.O.Cs and the S.O.Cs to hold as well as −σ ≤
µ µ
x∗ ≤ σ. When vi > vj and α2 ≥ −σ vi −v j
, or when vi < vj and α2 < σ vj −vi
, the equilibrium
σ 2 µ+σα2 (vi −vj ) µ
solution is s∗i − s∗j = x∗R = σ 2 +α2 (vi −vj )
. When vi < vj and α2 ≥ σ vj −v i
, or when vi > vj
µ σ 2 µ+σα2 (v −v )
and α2 < −σ vi −v j
, the equilibrium solution is s∗i − s∗j = x∗L = σ2 −α2 (vi −v
i j
j)
We can immediately verify that the condition σ > µ ensures that −σ ≤ x ≤ σ, while
2
α2 < vσH ensures that both the F.O.Cs and the S.O.Cs hold. Under the condition on α, the
equilibrium point is a unique extremum, and thus a global maximum.
To examine the effects of the equilibrium SEO investment on the ranking efficiency and
consumer satisfaction, we let P (α) denote the probability that the player with the highest
quality wins the organic link. Assume qH = q1 > q2 = qL . In the perfectly correlated
case x∗ = x∗R . We then have P (α) = Fε1 −ε2 (x∗R ) and P 0 (α) = fε1 −ε2 (x∗R ) ∂α ∂x
x=x∗R
> 0. In
the perfectly negatively correlated case, when ρ = −1, we have x∗ = x∗L , thus P 0 (α) =
APPENDIX A. APPENDIX FOR CHAPTER 2 62
α is small enough. For player 2, the same exercise shows there is no solution for α > 0 that
increases player 2’s profit.
Proof of Proposition 2: We use backward induction and first determine the sponsored bids
given the allocation of the organic link, then the SEO investments in three different cases
with respect to the site qualities. Initially, we assume that consumers start with the organic
link. Later we will show that this is an equilibrium strategy and that starting with the
sponsored link cannot be (Part 1 of the proposition). We will also determine the threshold
c. We start with the r < vL case and then show how the analysis changes for vL ≤ r < vH .
Let wO denote the organic and wS the sponsored winner. The main technique we use is to
compare the profits in equilibrium when the player occupies and does not occupy the organic
link. The difference between these profits is the value of the organic link for that player.
Case I: When qi = qj = qH , consumers stop searching at the organic link and do not
search further. This renders the sponsored link useless for both players leading to no valid
bids above the reserve price, r. The SEO game is therefore equivalent to the case with no
sponsored links.
Case II: When qi = qj = qL , consumers will not be satisfied with the organic link and
continue to the sponsored link as long as it does not lead to the same site. If site i is the
organic winner, then ctri = 0 for the sponsored link, leaving the sponsored link for site j 6= i
to win at a price per click equal to r. Since qi = qj , consumers do not go back to the organic
link, leaving 0 profits for site i. The organic link is worthless, therefore no site will invest in
SEO.
Case III: When qi = qH and qj = qL , consumers will stop at the organic link if wO = i.
Just as in Case I, no site will submit a valid bid higher than r. If wO = j, consumers will
not be satisfied with a low quality organic link and will continue searching, as long as the
sponsored link is different from the organic. As in Case II, ctri = 1 and ctrj = 0, leading
to wS = i at a price per click of r. Hence site i, with the high quality, will capture all the
demand regardless of which position it is in. When wO = i, this will lead to πiO = vi , but
when wO = j and wS = i, site i has to pay for the sponsored link and πiS = vi − r. The
value of winning the organic link will therefore be πiO − πiS = r for site i and πjO − πjS = 0
for site j. Applying the results of Proposition 1 with vi0 = r, vj0 = 0, qi0 = qH , qj0 = qL , we get
APPENDIX A. APPENDIX FOR CHAPTER 2 63
the optimal SEO efforts and the probability of a high quality organic link:
2
αr(σ − qH + qL ) ∗ 1 σ(σ − qH + qL )
∗ ∗
ei = , ej = 0, P = P (α|qi = qH , qj = qL ) = 1 − .
α2 r + σ 2 2 α2 r + σ 2
(A.4)
∗
P is increasing in α, that is, wO = i becomes more likely as α increases regardless of ρ,
proving Part 2 of the Proposition.
In Part 3, when vL ≤ r < vH the analysis is identical to the above except in Case III,
when wO = j and vi = vL < r. In this case site i with qi = qH cannot afford the sponsored
link and will profit πiO − πiS = vL − 0 = vL from getting the organic link, whereas site j will
profit πjO − πjS = vH − 0 = vH . According to Proposition 1 a higher α decreases Pr(wO = i),
2
but the probability of this case is Pr(qi = qH , qj = qL , vi = vL , v = vH ) = 1−ρ 4
, which
decreases with ρ and reaches 0 when ρ = 1. Thus, SEO will only increase the probability of
the high quality site acquiring the organic link if ρ is high enough, proving Part 3.
Returning to Part 1, combining the three cases, it is clear that the organic link is more
likely to be of high quality than the sponsored link. It is therefore rational for consumers
to start their search with the organic link. On the other hand, assuming that consumers
start with the sponsored link, redoing the same analysis shows that even then the organic
link is more likely to be high quality. Starting with the sponsored link is therefore never
an equilibrium strategy. Furthermore, in order to determine c, we need to calculate the
expected benefit of continuing the search when finding qL . This is simply (qH − qL ) Pr(qwS =
(1/2)(1−P ∗ ) ∗
qH |qwO = qL ) = (qH − qL ) (1/4)+(1/2)(1−P ∗ ) , where P is defined in (A.4). For a consumer to
even start searching it is sufficient to assume c < qL . Therefore,
1 − P∗
c = min qL , (qH − qL ) (A.5)
3/2 − P ∗
To prove Part 4, we only need to examine Case III, since neither consumer welfare nor
search engine revenue is affected by SEO in Case I and Case II. In Case III, consumers always
find qH eventually, but they are better off finding it right away, when wO = i. Therefore,
consumer welfare increases iff P (α) increases. On the other hand, search engine revenues
are higher when the low quality site acquires the organic link, that is, the revenue increases
iff P (α) decreases, proving Part 4.
Proof of Corollary 1: Consumers only click the sponsored link if the organic link is of
low quality. Thus, the search engine’s revenue is RSE = (1 − P (α)) · r, since the search
engine makes exactly r when the low quality site gets the organic link. From the proof of
2 (σ−q +q )2
Proposition 1, we can derive P (α) = P (α, r) = 34 − σ 4(σ H L
2 +rα2 )2 which is clearly increasing
SE
in r. Differentiating the revenue with respect to r yields ∂R∂r = 1 − P (α, r) − r · ∂P∂r (α,r)
=
1 σ 2 (σ−qH +qL )2 (σ 2 −rα2 )
4
+ 4(σ 2 +rα2 )3
. The above derivative is positive if r is below a suitable r̂(α), leading
to an inverse U-shaped revenue function below vL . The implicit function theorem yields that
r̂(α) is decreasing.
APPENDIX A. APPENDIX FOR CHAPTER 2 64
Proof of Corollary 2: When r < vL the higher quality site has an effective valuation of
r for the organic link, whereas the low quality site has an effective valuation of 0. From
the proof of Proposition 1, it is clear that the high quality site has an increasing chance of
acquiring the organic link and its profit increases as α increases.
σ(si − qi )
si vj
f∆ 1− = . (A.7)
σ vi vi α2
vj ∗
Again, if σ is high enough this yields a unique s∗i solution providing s∗j = s
vi i
and the
s∗i −qi s∗j −qj
e∗i= α
, =e∗j equilibrium efforts. We can then show that P (α) is increasing
α
(decreasing) depending on the relationship between (vi , vj ) and (qi , qj ) in the exact same
fashion as in the proof of Proposition 1.
describes the equilibria in contests with asymmetric players, while Siegel (2009) analyzed
such games under more general conditions. Our application is unique in that it considers the
cases where the initial asymmetry is biased by noise inherent in the quality measurement
process. Krishna (2007) and Athey and Nekipelov (2010) are two of the few examples taking
noise into consideration in an auction setting.
In this section we assume that two Web sites compete for a single organic link, but nlike
in our main model, we assume that sites observe the error made by the search engine in
assigning scores to them. Before deciding on their SEO investments, sites therefore observe
sSi = qi + i . Let the distribution of the error be simple: it takes the values of σ or −σ with
equal probabilities. We assume σ > |q1 −q2 |/2 to ensure that the error can affect the ordering
of sites, otherwise the error never changes the order of results and the setup is equivalent to
one with no error. We assume that valuations are exogenously given v1 , v2 and that qualities
are q1 > q2 since in the case of equal qualities, SEO does not matter.
When search engine optimization is not possible, i.e., when α = 0, sites cannot influence
their position among the search results. Since q1 ≥ q2 , the probability that the higher quality
site gets the organic link is P (0) = 43 . When search engine optimization becomes effective,
i.e., when α > 0, websites have a tool to influence the order of results knowing the score that
has been assigned to them sSi , which includes the error. The game thus becomes an all-pay
auction with headstarts.
Lemma 3. The game that sites play after observing their starting scores is equivalent to an
all-pay auction with headstarts.
Proof. All pay-auctions with headstarts are generalizations of basic all-pay auctions. In
traditional all-pay auctions players submit bids for an object that they have different valua-
tions for. The player with the highest bid wins the object, but all players have to pay their
bid to the auctioneer (hence the term “all-pay auction”). When the auctioneer does not
collect the revenues from the bids which are sunk, the game is called a contest. If players
have headstarts then the winner is the player with the highest score - the sum of bid and
headstart.
The level of headstart in our model depends on the starting scores and hence on the
error. For example, if q1 > q2 and ε1 = ε2 = 1, the error does not affect the order (which is
q1 ≥ q2 ) nor the difference between the starting scores (q1 − q2 ). Since SEO effectiveness is
α, an investment of b only changes the scores by αb, therefore the headstart of site 1 is q1 −q
α
2
.
As the size of the headstart decreases with α, the more effective SEO is, the less the initial
difference in scores matters. Even if site 1 is more relevant than site 2, it is not always the
case that it has a headstart. If ε1 = −1 and ε2 = 1 then sS1 = q1 − σ < sS2 = q2 + σ given
our assumption on the lower bound on σ. Thus, player 2 has a headstart of q2 +2σ−q α
1
. By
analyzing the outcome of the all-pay auction given the starting scores, we can determine the
expected utility of the SE and the websites.
We decompose the final scores of both sites into a headstart h and a bid as follows:
sS −sS
s̃1 = h + b1 and s̃F2 = b2 where h = 1 α 2 . The decomposed scores have the property that
F
APPENDIX A. APPENDIX FOR CHAPTER 2 66
s̃F1 ≥ s̃F2 ⇐⇒ sF1 ≥ sF2 for every b1 , b2 and thus preserve the outcome of the SEO game.
Since the investments are sunk and only the winner receives the benefits (with the exception
sS −sS
of a draw) the SEO game is equivalent to an all-pay auction with a headstart of h = 1 α 2 .
In the following, we present the solution of such a game to facilitate the presentation of the
remaining proofs.
All-pay auctions with complete information typically do not have pure-strategy Nash-
equilibria. In a simple auction with two players with valuations v1 > v2 , both players
mix between bidding 0 and v2 with different distributions. The generic two player all-pay
auction with headstarts has a unique mixed strategy equilibrium. When players valuations
are v1 ≥ v2 and player 1 has a headstart of h then s/he wins the auction with the following
probabilities:
1 h > v2
W1 (h) = P r(1 wins|h ≥ 0) = v2 h2
1 − 2v1 + 2v1 v2 h ≤ v2
v2
1 − 2v1 h ≥ v2 − v1
v1 −h2
2
W1 (h) = P r(1 wins|h < 0) = −v1 ≤ h < v2 − v1
2v01 v2 otherwise
For completeness, we specify the players’ cumulative bidding distributions. When h is posi-
tive,
0 b≤0
0 b≤0
1 − v2v−h
h+b
b ∈ (0, h]
F1 (b) = b ∈ (0, v2 − h] F2 (b) = 1
(A.8)
v2 1 − v2v−b b ∈ (h, v2 ]
1 b > v2 − h
1
1 b > v2
When h is negative,
0 b≤h 0 b≤0
b−h v2 −b
F1 (b) = v2
b ∈ (h, v2 + h] F2 (b) = 1 − v1 b ∈ (0, v2 ] (A.9)
1 b > v2 + h 1 b > v2
In our model, the value of the headstart is determined by the different realizations of the er-
rors ε1 , ε2 . There are four possible realizations with equal probability: h1 = h2 = q1 −qα
2
, h3 =
q1 −q2 +2σ q1 −q2 −2σ
α
and h4 = α
. Player 1, having the higher valuation, wins with the higher prob-
ability of v1 /2v2 and player 2’s surplus is 0. Thus, only the player with the highest valuation
makes a positive profit in expectation, but the chance of winning gives an incentive to the
other player to submit positive bids. In the case of an all-pay auction with headstarts the
equilibrium is very similar and the player with the highest potential score (valuation plus
headstart) wins with higher probability and the other player’s expected surplus is 0. The
winner’s expected surplus is equal to the sum of differences in valuations and headstarts.
Figure A.1 illustrates the probabilities that the two sites win and their payoffs as a function
of the headstart.
APPENDIX A. APPENDIX FOR CHAPTER 2 67
Figure A.1: Mixed strategy equilibrium of an all-pay auction as a function of the headstart of
player 1.
Sites’ valuations are v1 = 1.4 and v2 = 0.6. The probability that player 1 (player 2) wins is
weakly increasing(decreasing) in the headstart, similarly to the payoffs.
As we have seen when SEO is not possible and α = 0, we have P (0) = 3/4. Our goal
is therefore to determine whether the probability exceeds this value for any positive α SEO
effectiveness levels. It is useful, however, to begin with analyzing how the probability depends
on valuations and qualities for given α and σ values. The following Lemma summarizes our
initial results.
Lemma 4. For any fixed α and σ, P (α; σ, v1 , v2 , q1 , q2 ) is increasing in v1 and q1 and is
decreasing in v2 and q2 .
Proof. Since P (α) = 21 W1 (h1 ) + 14 W1 (h3 ) + 41 W1 (h4 ) and the headstart does not depend on
v1 and v2 , it is enough to show that W1 (·) is increasing in v1 and decreasing in v2 . These
easily follow from the definition of W1 (·). The results on q1 and q2 follow from the fact
that h1 , h3 , h4 are all increasing in q1 and decreasing in q2 , and W1 (·) depend on them only
through h in which it is increasing.
To show that our main results hold in this case, we derive the following.
Proposition 8.
1. For any σ > |q1 − q2 |/2, there exists a positive α̂ = α̂(σ, v1 , v2 , q1 , q2 ) SEO effectiveness
level such that P (α̂) ≥ P (0).
2. If v1 /v2 > 3/2 then for any σ > |q1 −q2 |/2, there exists a positive α̂ = α̂(σ, v1 , v2 , q1 , q2 )
such that P (α̂) > P (0).
APPENDIX A. APPENDIX FOR CHAPTER 2 68
v2 +v1 q1 −q2
3. If v1 < v2 and σ ≥ v2 −v1 2
then for any α > 0 we have P (α) ≤ P (0).
Proof. We use the notation Pi = P r(1 wins|hi ). Given the above described equilibrium
of the two-player all-pay auctions we have Pi = W1 (hi ). We further define α1 = q1v−q 2
2
,
q1 −q2 +2σ q2 −q1 +2σ 0 q2 −q1 +2σ
α3 = v2
, α4 = v1
, α4 = v1 −v2 . Note that P1 = P2 , since the headstarts in the
first two case are equal. Thus P (α) = 12 P1 + 14 P3 + 14 P4 , and P1 = 1 iff α ≤ α1 , P3 = 1 iff
v2
α ≤ α3 , P4 = 1 − 2v 1
iff α ≥ α40 . Furthermore, it is easy to check that α1 ≤ α3 , α4 ≤ α3 , and
0
α4 ≤ α4 .
We proceed by separating the three parts of the proposition:
• Part 2: In order to prove this part, we determine the α value that yields the highest
efficiency level for a given σ if v1 /v2 > 3/2. As noted above, P (α) is a linear com-
bination of W1 (h1 ), W1 (h3 ), W1 (h4 ). Since W1 (·) is continuous and h1 , h3 , h4 are all
continuous in α, it follows that P (α) is continuous in α. However, P (α) is not dif-
ferentiable everywhere, but there are only a finite number of points where it is not.
Therefore it suffices to examine the sign of P 0 (α) to determine whether it is increasing
or not. This requires tedious analysis, since depending on the value of σ the formula
describing P (α) is different in up to five intervals. We identify five different formulas
that P (α) can take in different intervals and take their derivatives:
(q1 − q2 − 2σ)2
P 0 (α) = PI0 (α) = if α4 ≤ α ≤ α1 &α40 ,
4α3 v1 v2
(q1 − q2 )2
P 0 (α) = PII
0
(α) = − if α1 ≤ α ≤ α4 ,
2α3 v1 v2
2(q1 − q2 )2 + (q1 − q2 + 2σ)2
P 0 (α) = PIII
0
(α) = − if α3 &α40 ≤ α,
4α3 v1 v2
4σ 2 − (q1 − q2 )(4σ + q1 − q2 )
P 0 (α) = PIV
0
(α) = if α1 &α4 ≤ α ≤ α40 ,
4α3 v1 v2
(q1 − q2 )(4σ + q1 − q2 )
P 0 (α) = PV0 (α) = − 3
if α3 &α40 ≤ α.
2α v1 v2
In any other range the derivative of P (α) is 0. It is clear from the above formulas
that PI0 (α) is always positive and that PII
0 0
(α), PIII (α), and PV0 (α) are always negative.
Furthermore, one can show that
√
0 1+ 2
PIV (α) > 0 iff σ > (q1 − q2 ).
2
This allows us to determine the maximal P (α) for different values of σ in four different
cases.
APPENDIX A. APPENDIX FOR CHAPTER 2 69
1. If q1 −q
2
2
≤ σ ≤ vv12 q1 −q
2
2
then α4 ≤ α40 ≤ α1 ≤ α3 and the derivative of P (α) takes
the following values in the five intervals respectively: 0, PI0 (α), 0, PII
0 0
(α), PIII (α).
Therefore P (α) is first constant, then increasing, then constant again and then
strictly decreasing. Thus, any value between α40 and α1 maximizes P (α). Using
the notation of Corollary 5, Â(σ) = [α40 , α1 ].
2. If vv21 q1 −q
2
2
≤ σ ≤ v1v+v
2
2 q1 −q2
2
then α4 ≤ α1 ≤ α40 ≤ α3 and the derivative of P (α)
takes the following values in the five intervals respectively: 0, PI0 (α), PIV 0
(α),
0 0
PII (α), PIII (α). Therefore P (α) is first constant, then decreasing, then strictly
0
increasing, then depending on the sign of PIV √
(α) increasing or decreasing, and
1+ 2
finally strictly decreasing. Therefore √
if σ < 2 (q1 −q2 ) then α1 maximizes P (α),
1+ 2
that is Â(σ) = {α1 }. If σ = 2 (q1 − q2 ) then √
P (α) is constant between α1 and
α4 , that is Â(σ) = [α1 , α4 ]. Finally, if σ = 2 (q1 − q2 ) then Â(σ) = {α40 }.
0 0 1+ 2
2 q1 −q2 q1 −q2
3. If v1v+v
2 2
≤ σ ≤ 2v2v−v
1
1 2
then α1 ≤ α4 ≤ α40 ≤ α3 and the derivative of
0 0
P (α) takes the following values in the five intervals respectively: 0, PII (α), PIV (α),
0 0 0 v1 +v2 q1 −q2 3 q1 −q2
PII (α), PIII (α). In this case PIV (α) > 0 since σ ≥ v2 ≥ (1 + 2 ) 2 >
√ q1 −q2 2
(1 + 2) 2 . Therefore P (α) is first constant, then decreasing, then strictly
increasing again and finally strictly decreasing. Thus, there are two candidates √
for the argmax: α1 and α40 . One can show that PIV (α40 ) > PII (α1 ) iff v1 > 2v2 ,
therefore α40 maximizes P (α) in this case.
q1 −q2
4. If 2v2v−v
1
1 2
≤ σ then α1 ≤ α4 ≤ α3 ≤ α40 and the derivative of P (α) takes
0 0
the following values in the five intervals respectively: 0, PII (α), PIV (α), PV0 (α),
0 0
PIII (α). Similarly to the previous case PIV (α) > 0, therefore P (α) is first con-
stant, then decreasing, then strictly increasing again and finally strictly decreas-
ing. Comparing the two candidates for the argmax yields that PIV (α3 ) > PII (α1 )
iff v1 > (3/2)v2 , that is α3 maximizes P (α) in this case.
In each of the cases above, it is clear that the maximum is higher than P (0) = 3/4. In
cases 1 and 2, P (α) is strictly increasing after a constant value of 3/4 and in cases 3
and 4 we directly compared to PII (α1 ) = 3/4. This completes the proof of Part 2.
• Part 3: One can derive the efficiency function for different cases as in Part 2. It follows
+v1 q1 −q2
that if σ ≥ vv22 −v 1 2
then P 0 (α) is first 0 then negative and finally positive. Therefore
P (α) either has a maximum in α = 0 or as it approaches infinity. However,
v1 1 3
P (α) −→ ≤ < = P (0).
α→∞ 2v2 2 4
The results are consistent with our main model, where the errors are not observed by
the firms prior engaging in SEO. We also examine how the benefits of SEO change with the
magnitude of the error. Let Â(σ) denote the set of α SEO effectiveness levels that maximize
APPENDIX A. APPENDIX FOR CHAPTER 2 70
the search engine’s traffic. For two sets A1 ⊆ R and A2 ⊆ R, we say that A1 A2 if and
only if for any α1 ∈ A1 there is an α2 ∈ A2 such that α2 ≤ α1 and for any α20 ∈ A2 there is
an α10 ∈ A1 such that α10 ≥ α20 .
Corollary 5. If v1 /v2 > 3/2, then the optimal SEO effectiveness is increasing as the variance
of the measurement error increases. In particular, for any σ1 > σ2 > 0, we have Â(σ1 )
Â(σ2 ).
Proof. In the proof of Proposition 8, we determined the values of α that maximize P (α) for
different σ’s. In summary:
if q1 −q ≤ σ ≤ vv21 q1 −q
0
[α4 , α1 ]
2 2
2 2 √
if vv12 q1 −q < σ ≤ (1 + 2) q1 −q
α1 2 2
√ q1 −q2
2 2
[α1 , α40 ] if σ = (1 √ + 2)2 2
Â(σ) =
α40 if (1 + 2) q1 −q 2
< σ ≤ v1v+v 2
2 q1 −q2
2
2 q1 −q2 q1 −q2
α40 if v1v+v ≤ ≤ v1
2 2
σ 2v2 −v1 2
q1 −q2
if 2v2v−v
α
3
1
1 2
≤ σ
It is straightforward to check that all of α1 , α3 , and α40 are increasing in σ and that the Â(σ)
is increasing over the entire range.
the error is large enough that it makes a difference, that is, we assume that q̃i ε < q̃i+1 ε̄ for
each 1 ≤ i ≤ n. Furthermore, let Φji be an indicator for site i appearing in location j among
the top k sites.
We treat consumer search as an exogenous process and assume that when site i is dis-
played in location j of the organic list, it receives βj clicks from a mass one of consumers.
We call this quantity the click-through rate. Given sites’ click-through rates, we define ti as
the total amount of visitor traffic a site receives in a list of k sites:
" k #
X
ti (b̃, q̃) = Eε̃ βj P r(Φji = 1) . (A.10)
j=1
The profit of site i is thus πi (b̃, q̃) = Ri (ti (b̃, q̃)) − b̃i . We let π = (π1 , . . . , πn ). The first order
conditions necessary for equilibrium are given by
Proof. We denote by Pij (b̃, q̃) the probability R that site i appears in location j among the top
k sites. This probability equals Pij (b̃, q̃) = Φji (b̃, q̃, ε̃)dFε̃ (ε̃). The total number of clicks site
i gets, ti , is therefore ti (b̃, q̃) = Jj=1 βj Pij (b̃, q̃).
P
A proportional increase of all bids in b̃ does not change the expected rankings of the
sites, and keeps the expected number of clicks constant for all sites: ti (b̃, q̃) = ti (η b̃, q̃) for
Pk η 6= ∂0. Since ti is homogeneous
any Pk ∂
of degree zero, by Euler’s homogeneous function theorem,
b̃ R (t
l=1 l ∂ b̃l i i ( b̃, q̃)) = ri l=1 l ∂ b̃l ti (b̃, q̃) = 0.
b̃
As a result, the following holds:
k k
X ∂ X ∂
πi (b̃, q̃) · b̃l = (b̃l Ri (ti (b̃, q̃))) − b̃i = −b̃i (A.12)
l=1
∂ b̃l l=1
∂ b̃l
Using Lemma 5, we can rewrite the first order conditions by defining a mapping b̃ = λ(τ )
that exists in some neighborhood of τ = 1:
d
τ πi (λi (τ ), τ λ−i (τ ), q) = −b̃i (A.14)
dτ
We let V = [0, v1 ] × . . . × [0, vk ] be the support of potential bids of players 1 to k, and define
D0 (b̃, q̃) = ∂∂b̃0 π(b̃, q̃) with the diagonal elements replaced with zeros. The following theorem
from Athey and Nekipelov (2010) establishes the conditions under which the mapping λ(τ )
exists locally around τ = 1 and globally for τ ∈ [0, 1], which yields the equilibrium bids of
the players.
Theorem 1 (Athey and Nekipelov (2010)). Assume that D0 is continuous in b̃. Suppose
that for each i = 1, . . . , k, ti (b̃, q̃) > 0, and that each πi is quasi-concave in b̃i on V and for
each b̃ its gradient contains at least one non-zero element. Then
1. An equilibrium exists if and only if for some δ > 0 the system of equations (A.14) has
a solution on τ ∈ [1 − δ, 1].
2. The conditions from part 1 are satisfied for all δ ∈ [0, 1] and so an equilibrium exists,
if D0 (b̃, q̃) is locally Lipschitz and non-singular for b̃ ∈ V except for a finite number of
points.
3. There is a unique equilibrium if and only if for some δ > 0 the system of equations
(A.14) has a unique solution on τ ∈ [1 − δ, 1].
4. The conditions from part 3 are satisfied for all δ ∈ [0, 1], so that there is a unique
equilibrium, if each element of ∂∂b̃0 π(b̃, q̃) is Lipschitz in b̃ and non-singular for b̃ ∈ V 1 .
The theorem shows that under very general conditions, websites would spend non-zero
efforts on SEO in equilibrium. We now proceed to analyze how positive levels of SEO
effectiveness α affect the satisfaction of consumers from the ranking of the organic list. To
analyze the incentives of the different websites, it is easier to transform the multiple links
contest into a game where websites choose the amount of traffic they would like to acquire
from organic clicks, which implicitly determines their bids. We define the vector of traffic for
each site i given the SEO effectiveness α and the vector of bids b̃ as tα (b̃) = (tα1 (b̃), . . . , tαn (b̃)).
For each player i, fixing the bids of other players as b−i , we can rewrite the first order condition
of each player as ∂π i
∂ti
= 0. The expected utility of consumers when searching through links
α α α
P
with traffic vector t is EU (t ) = i qi ti .
Analyzing the result of the SEO game with multiple links is hard. In addition, under
certain conditions, such as when the errors are small or α is very large, multiple equilibria
might exist as shown in Siegel (2009). We therefore proceed to analyze the special cases
1
Athey and Nekipelov (2010) give example conditions for the non-singularity of the matrix D0 in their
Lemma 2.
APPENDIX A. APPENDIX FOR CHAPTER 2 73
defined by Theorem 1 where an internal equilibrium exists for all players and the first order
conditions hold for players in equilibrium. For every α we define Tα = {tα |EU (tα ) ≥ EU (t0 )}
as the group of all traffic distributions over sites where the expected consumer utility is higher
than under the benchmark traffic distribution t0 .
The following proposition shows that under certain conditions, a positive level of SEO
can improve consumer satisfaction. These conditions are sufficient, but by no means neces-
sary. We conjecture that much weaker conditions can be found under which SEO improves
consumer satisfaction.
Proposition 9. For each α such that there exists a vector of non-negative functions M (t) =
(M1 (t), . . . Mk (t)) with
Mi (t) ti ∂π i+1
∂ti+1
(t)
> (A.15)
Mi+1 (t) ti+1 ∂π i
(t)
∂ti
∂πi+1
for every t ∈ Tα and ∂ti+1
(t) 6= 0, the equilibrium distribution of traffic tα∗ satisfies EU (tα∗ ) >
EU (t0 )
Proof. Recall that Tα contains all traffic distributions t = (t1 , . . . , tn ) for which the expected
utility of consumersPis weakly P
greater with an SEO effectiveness level of α than with α = 0,
implying EUP (t) = i qi · ti ≥ i qi · t0i .
Let βP = j βj be the sum of Pthe exogenous click-through rates. If we normalize the sum
of clicks i ti to 1 we have β = i ti . We then define, for each α, the mapping
Above, for convenience of notation, α was dropped and the first orders ∂π ∂ti
i
as well as the
traffic distributions ti are given under the specific α for each Fα . To simplify exposition we
assign t̃ = βt as the normalized traffic vector. This mapping has several special properties:
• The mapping maps a given traffic distribution to another, implicitly setting the re-
quired bids to reach this traffic distribution. The input and output distributions are
normalized to one, so the mapping is closed on traffic distributions. In addition, the
mapping is continuous.
• The fixed points of each mapping Fα are the equilibrium distributions of the SEO
game. To see this, note that when the first order conditions hold and are equal zero,
the mapping has a fixed point, and vice-versa.
As a result, showing that the fixed points of Fα are superior to t0 would prove that SEO
increases consumer utility in equilibrium. To see this, let t ∈ Tα . Then
! ∂πi
P ∂πj
X ti + Mi | ∂π
∂ti
i
| ti
X βM i | ∂ti
| − ti j Mj | ∂tj |
U (F (t̃)) − U (t̃) = qi ∂πj
− = qi ∂πj
(A.16)
β
P P
i β + j M j | ∂tj
| i β(β + j Mj | ∂tj
|)
P
As Mi (t) are non-negative and β = i ti ,
the difference in utilities is positive when:
! !
X ∂πi X ∂πj X X ∂πi
qi βMi − ti Mj = tj Mi (qi − qj ) > 0 (A.17)
i
∂ti j
∂t j j i
∂t i
Fix i, j and assume i < j, then qi ≥ qj . Looking at the couples of additions in the sum for
i, j we get
∂πi ∂πj ∂πi ∂πj
tj Mi (qi − qj ) + ti Mj (qj − qi ) = tj Mi − ti Mj (qi − qj ) (A.18)
∂ti ∂tj ∂ti ∂tj
Proposition 10. When (i) the decisions about the sponsored auction and SEO are made
simultaneously, (ii) consumers have a small, but positive search cost c, and (iii) r < vL :
1. The game has a unique equilibrium in which all consumers start their search with the
organic link.
APPENDIX A. APPENDIX FOR CHAPTER 2 75
e21
v1 − r + r Pr(s1 > s2 ) − .
2
Aside from the fixed v1 − r that the site make regardless of its SEO investment, this a special
case of what we saw in equation (2.2) in the paper. Site 1 will thus behave as if it had a
valuation r for the organic link, while its opponent had 0. The likelihood of site 1 acquiring
the organic link will be P (α; σ, r, 0, qH , qL ) which is increasing in α regardless of ρ. This
proves Part 2. For Part 1, it is easy to see that consumers are better off starting on the
organic side in this equilibrium. Similarly to the proof of Proposition 2, we can prove that
an equilibrium where consumers start on the sponsored side does not exist by redoing the
above steps assuming that they do start on the sponsored side. Finally, to prove Part 3, it is
trivial to see the search engine makes less money if the high quality site acquires the organic
link and as SEO becomes more efficient, this is more likely.
heterogeneous search costs across consumers. An important advantage of this setup is that
it allows us to examine consumers’ decision to visit the search engine and to understand how
SEO affects the search engine’s traffic.
Instead of fixing each consumer’s search cost at c ≥ 0, we now assume that consumers
have potentially different non-negative search costs distributed according to a distribution
with a support of [0, ∞) and a differentiable c.d.f., G. An important implication of having
consumers with different search costs is that some of them might have relatively high costs
so that they would only want to visit a single link. This leads to the emergence of an
equilibrium where consumers start their search with the sponsored link. We distinguish the
different types of equilibria depending on which side consumers start the search. We call the
equilibrium where consumers start with the organic link an O-type equilibrium, and we call
an equilibrium S-type if consumers start on the sponsored side.
Proposition 11. There is always one O-type equilibrium in which consumers start with
the organic link. When ρ is high enough and a large enough proportion of consumers have
high search costs, there is a second, S-type equilibrium in which consumers start with the
sponsored link.
Proof. We begin by showing that there is an O-type equilibrium, similarly to the proof of
Proposition 2. When consumers start with the organic link only the advertiser who does
not acquire the organic link will have a chance to get the sponsored link. When a high
quality player is in the organic position, the low quality competitor will not benefit from the
sponsored link. When a low quality player obtains the organic link, consumers with a low
search cost will click the sponsored link to find out if it is higher quality. Let ĉ(p) denote
the expected benefit of continuing to the sponsored link where p is the probability that a
high quality advertiser obtains the organic link when advertisers have different qualities and
valuations. Thus, consumers whose search costs is lower than the above benefit will search.
The proportion who continues is ϑ(p) = G(ĉ(p)) which is continuous in p. Performing the
same calculations as in the proof of Proposition 2, we get that the value for the site with the
high quality (denoted as site 1) to get the organic link is (1 − ϑ(p))v1 + ϑ(p)r, whereas site
2’s value is (1 − ϑ(p))v2 . Using the function P (α, vi , vj , qH , qL ) from the proof of Proposition
8, we obtain an equilibrium by solving p = P (α, (1 − ϑ(p))v1 + ϑ(p)r, (1 − ϑ(p))v2 , qH , qL ).
Since the derivative of the continuous P () function is less than 1 and the function takes a
positive value at p = 0 and less than 1 at p = 1, we obtain a unique solution. As long as
α is not too high, the organic link will be more likely to be high quality than low quality.
Therefore, consumers do not have an incentive to deviate and start with the sponsored link.
To show that existence of an S-type equilibrium assume that consumers start with the
sponsored link. When the organic link is obtained by the high quality site the sponsored
competition will be for the (1 − ϑ) proportion of consumer who only click on the first
(sponsored) link they encounter. When the low quality site obtains the organic link, the
sponsored competition is for all consumers (as the high quality either gets them all or none).
The sponsored link will thus always go to the advertiser with the higher per-click valuation
APPENDIX A. APPENDIX FOR CHAPTER 2 77
(as CTR’s will be the same, 1 for both players) as long as the minimum bids are exceeded,
otherwise there will be no sponsored link. In order for the minimum bid to be exceeded, we
need vH (1−ϑ) > r, that is 1−ϑ > r/vH . This condition is satisfied if enough consumers have
a high enough search cost so that they would never search, for example, 1 − G(qH − qL ) >
r/vH . If ρ is high enough then the player with the higher valuation is likely to be the high
quality advertiser. This makes it rational for consumers to start with the sponsored link,
as there is always a positive probability that the organic link will be acquired by the low
quality player.
The first type of equilibrium is a direct generalization of the one described in Proposition
2. If consumers start with the organic link, the sponsored link only serves as a backup and
the high quality advertiser has a higher chance of getting the organic link. Therefore, sites
take SEO seriously and the organic link will offer a higher expected quality to consumers who
rationally start their search there. However, when there are enough people with high search
costs who will never click more than one link there is an equilibrium in which consumers
start with the sponsored link. If sites expect a significant proportion of consumers to only
click the sponsored link they will take the sponsored auction seriously. If the site with the
higher quality is more likely to have a higher valuation (high ρ), it will win the sponsored
link no matter who acquires the organic link. Therefore, it is rational for consumers to start
with the organic link. SEO is not as important in the S-type equilibrium since most of the
competition will happen on the sponsored side and the organic link serves as a backup.
Although the S-type equilibrium only exists for a limited parameter range, it is at least
as important as the O-type equilibrium. When some consumers do very limited searches
and advertisers’ qualities are correlated with their valuations for consumers, it is plausible
for consumers to start with the sponsored link and advertisers to fight hard for them. The
multiplicity of equilibria may indeed explain the substantial differences observed between
sponsored click-through rates for different keywords (Jeziorski and Segal, 2009).
Taking the above analysis a step further, we directly examine how search costs impact
the outcome of SEO. In order to compare different search cost distributions, let G1 G2
denote the relation generated by first-order stochastic dominance.
Corollary 6. Suppose G1 G2 .
1. The likelihood of a high quality organic link in the O-type equilibrium is higher (lower)
under G1 than under G2 for high (low) values of ρ.
2. In the S-type equilibrium the likelihood of a high quality organic link is lower under G1
than under G2 .
Proof.
Part 1: Since G1 G2 , we have ϑ1 (p) = G(ĉ(p)) ≤ ϑ2 (p) = G2 (ĉ(p)). That is, when
search costs are higher, fewer consumer continue to the sponsored link. To determine the
probability of a high organic link, we obtain the solution of p = P (α, (1−ϑ(p))v1 +ϑ(p)r, (1−
APPENDIX A. APPENDIX FOR CHAPTER 2 78
ϑ(p))v2 , qH , qL ) as in the proof of Proposition 11. Note from the proof of Proposition 8 that
P (α, vi , vj , qH , qL ) is increasing in vi − vj . In our case vi − vj = (1 − ϑ)(v1 − v2 ) + ϑr which is
decreasing in ϑ when v1 > v2 , that is when ρ = 1 and increasing when v1 < v2 , that is when
ρ = −1. Since ϑ1 (p) is lower than ϑ2 (p), the solution for G1 () is higher (lower) than for G2
when ρ is high (low).
Part 2: The case of an S-type equilibrium is very similar, but this type of equilibrium
only exists when ρ is high. When the high quality advertiser has a high valuation, its benefit
of getting the organic link is ϑvL , the extra sponsored payment it would have to incur when
not getting the organic link. The low quality player has 0 valuation for the organic link,
therefore vi − vj = ϑvL which is increasing in ϑ, which completes the proof the same was as
in Part 1.
The results illustrate the different roles of SEO in the two types of equilibria. When con-
sumers start searching with the organic link, higher search costs generally lead to tougher
competition in SEO. The intuition is that higher search costs decrease consumers’ search
incentives and advertisers’ only chance to attract an increasing proportion of consumers is
through the organic link. Therefore, when valuations and qualities are correlated higher
search costs lead to a higher SEO investment on the high quality advertiser’s part. Con-
sequently, a lower percentage of consumers will move on to the sponsored side. This hurts
the high quality advertiser when its low quality competitor possesses the organic link and
incentivizes it to invest more in SEO. The opposite is true when valuations and qualities
are negatively correlated: as search costs go up the low quality advertiser (now with a high
valuation) has an increased incentive to fight for the organic link that becomes the only link
that an increasing proportion of consumers clicks on.
On the other hand, when consumers start with the sponsored link, the organic link only
serves as a backup. As search costs go up, fewer consumers continue to the organic link,
therefore its importance declines. Since this equilibrium only exists for correlated qualities
and valuation the smaller number of click on the organic link will reduce the high quality
advertiser’s incentive and chance of obtaining the organic link.
Finally, in addition to examining the differences in consumers searching behavior once
they arrive to the search engine, we also study their initial decision to visit. Comparing their
expected net payoff from visiting the search engine with an outside option of 0 allows us to
determine the amount of traffic and revenue the search engine receives.
Corollary 7. The search engine’s revenue is always decreasing as a high quality organic link
becomes more likely, even when the traffic to the search engine is increasing.
Proof.
In case of an O-type equilibrium: The expected benefit of moving on to the sponsored
link when encountering a low quality organic link is
where p1 is the probability of a high quality organic link when the advertisers have different
qualities and perfectly correlated valuations, p−1 when they perfectly negatively correlated
valuations, pH when both of them have high valuations and pL when both of them have
low valuations. The person with a search cost of ĉ will be therefore indifferent between
stopping and continuing. The same person will have an expected benefit of ĉ0 = q8H (2 + (1 +
ρ)2 p1 + (1 − ρ)2 p−1 + (1 + ρ)(1 − ρ)(pH + pL )) from visiting the search engine. Since ĉ0 > ĉ,
some consumers will stop searching after visiting the organic link. As the p values increase,
the benefit from visiting the search engine increases, but the ĉ threshold for clicking the
sponsored link decreases in each p. Therefore, even though the traffic to the search engine
strictly increases as any or all of the p values increases, the search engine’s revenue will
strictly decrease (as each click generates a revenue of r).
In case of an S-type equilibrium: The expected benefit of moving on to the organic link
2p−1 (1−ρ)2 +(1+ρ)(1−ρ)
when encountering a low quality sponsored link is ĉ = (qH −qL ) 2(1+ρ) 2 +4(1−ρ)2 +2(1+ρ)(1−ρ) . The
0 qH
same person will have an expected benefit of ĉ = 16 (4 + 3(1 + ρ) + (1 − ρ)2 ) from visiting
2
the search engine. Since ĉ0 > ĉ when the S-type of equilibrium exists (ρ > 0 is necessary),
the highest search costs visitors will only click on the sponsored link. Therefore, high search
cost consumers will not benefit from an increased expected organic quality and traffic, thus
revenue will not increase.
This result sheds more light on the fundamental tension between the search engine and
its visitors in their preference for a high quality organic link. We have already identified
the basic misalignment of incentives in the last part of Proposition 2, but in that case the
traffic to the search engine was exogenously fixed. Here, we show that even though a high
quality organic link makes consumers better off and attracts more traffic, it does not increase
revenues. The intuition is based on how consumers search. In the O-type equilibrium those
with a search cost below the expected benefit from visiting the search engine make the first
(organic) click. However, not all of them make the second (sponsored) click that would
generate revenue as the expected benefit of the second click is always lower than the first
one. Therefore, even though the search engine can attract more visitors by having a higher
expected quality organic link, the extra visitors will not generate revenue. In the S-type
equilibrium all visitors generate revenue, but the promise of a higher organic link will not
attract more visitors, as it only benefits low search cost consumers who visit anyway.
In both cases the misalignment between consumers’ and the search engine’s incentive
is clear. Although a higher quality organic link increases consumer welfare, it reduces the
search engine’s revenue. This phenomenon may explain why large search engines, such as
Google, take a stance against SEO that might potentially improve the quality of search
results.
APPENDIX A. APPENDIX FOR CHAPTER 2 80
Proposition 12.
1. When ρ is a sufficiently low negative correlation, mainstream consumers will start with
the organic link whereas niche consumers will start with the sponsored. A mainstream
organic link is less likely as α increases.
2. When ρ is sufficiently high positive correlation, mainstream consumers will start with
the sponsored link whereas niche consumers will start with the organic. A mainstream
organic link is more likely as α increases.
Proof.
We only need to examine the case when the two sites are of different type. If vH >
1 β
max β , 1−β vL and at least one consumer groups starts with the sponsored link, it is
straightforward to show that the advertiser with the highest valuation acquires the sponsored
link. Therefore, for low values of ρ the sponsored link is likely to be niche, whereas for high
values of ρ it is likely to be mainstream. Due to the error in the SEO process both sites have
a positive chance to get the organic link, therefore niche (mainstream) consumers will start
with the sponsored link in the former (latter) case and mainstream (niche) will start with the
organic link. The total sponsored payment will be vL (1 − β) when the mainstream site gets
the organic link and vL β otherwise. In the case of ρ = 1 the valuations for getting the organic
β
2
We assume vH > max β1 , 1−β vL and r < (1 − β)2 vL . Note that for fixed vL , vH , r these conditions
limit the value of β, therefore the results of this section only hold under a sufficient level of heterogeneity.
APPENDIX A. APPENDIX FOR CHAPTER 2 81
link will be vH (1 − β) + (2β − 1)vL for the mainstream site and vL (1 − β). Since β ≥ 1 − β,
the former is higher and SEO increases the chance of a mainstream organic link when ρ is
close to 1. When ρ = −1, the value of the organic link will be βvL and βvH + (1 − 2β)vL for
the mainstream and niche, respectively. In the assumed parameter range the latter is higher
resulting in a more likely niche organic link when ρ is close to −1.
The results contribute to our understanding of the different types of equilibria in the
previous section. We again identify two types of equilibria; one in which the majority of
consumers (mainstream) start with the organic link and another one in which the majority
of consumers start with the sponsored link. However, the equilibria in this case are unique
in each of the above parameter regions. The first part examines a typical mainstream-
niche scenario, where the mainstream advertiser cannot command as high of a margin as its
niche competitor. The high valuation of the niche firm will ensure its position in the top
sponsored position incentivizing niche customers to start searching on the sponsored side.
The mainstream product will have an advantage on the organic side as the search engine
wants to cater to the majority. However, the niche firm will invest more heavily in SEO
which decreases the likelihood of a mainstream organic link. SEO in this case decreases
consumer welfare,3 but increases the search engine’s revenue.
The second part identifies a more surprising scenario. When the product preferred by
the majority is able to command a higher margin then the majority of consumers start with
the sponsored link. We think of this type of market as one with a minority of consumers
who are well informed about a product category and a majority who are less informed. The
less informed customers can be charged a higher price which the more informed customers
are not willing to pay. Even though the majority of the customers start with sponsored link,
SEO will still increase consumer welfare by increasing the likelihood of a majority preferred
organic link. However, this reduces sites’ incentives to pay for the sponsored link, hurting the
search engine’s revenues. It is interesting to contrast these results with Proposition 2 where
consumers always start with the organic link. What makes this case different is a sufficient
level of consumer heterogeneity that leads some customers to focus on the sponsored link,
inducing more advertiser competition on the sponsored site, which in turn leads to more
consumer attention to the sponsored side. The transition is not continuous and the required
level of heterogeneity depends on the minimum bid r. Since the search engine is clearly
better off under the heterogeneous outcome, it should pay particular attention to setting the
minimum bid.
3
Mainstream consumers are worse off, whereas niche consumers are indifferent.
82
Appendix B
B.1 Proofs
Proof of Proposition 3. To find pA , notice that the profit of the advertiser is (2q A )ρ (1 − 2p).
1 ρ
Since q A ∼ p 2−ρ , we can drop the constants and solve for pA = arg maxp p 2−ρ (1−2p), yielding
1
ρ2 2−ρ
2
pA = ρ4 , and q A = 2
. The second order condition of each agent is:
ρ−2
ρ(ρ − 1) qi + q A pA − 1 < 0 (B.1)
1
2−ρ
ρ2 (ρ−1)
For ρ <= 1 it always holds, while for 1 < ρ < 2 if holds if qi > 4
− q A after
−ρ−1
plugging pA and collecting terms. The right hand side is negative if 2 (ρ − 1) < 1, which
holds for every 1 < ρ < 2
∗ 1
To show that q ∗ > q M , we notice that qqM = 2 2−ρ > 1 for 0 < ρ < 2. Similarly,
1
2−ρ
qM 2
qA
= ρ
> 1 for 0 < ρ < 2, which proves part 1 of the proposition.
To prove part 2, since q M > q A , the total revenue generated by the advertiser x(q1 , q2 ) is
2(q M )2 ρ
always larger under CPM. The share of profit given to the publisher under CPM is (2q M )ρ = 2 .
This is the same share pA given under a CPA contract. As a result, since revenues are strictly
larger and the same share is given, profits under CPM are larger.
To prove part 3, the difference in profit uA − uM of the publisher has a numeric root on
[0, 2] at ρc = 0.618185. The function has a unique extremum in this range at ρ = 0.246608,
which is a local maximum, and the the difference is zero at ρ = 0. Thus, it is positive below
ρc and negative above ρc proving part 3.
Proof of Corollary 3. In the single publisher case, q M = pM similarly to before, and solving
1
1 2 2−ρ
the advertiser optimization problem yields q M = ρ2 2−ρ . Under CPA, q A = ρ2 . We
immediately see that q A > q M ⇐⇒ ρ > 1.
APPENDIX B. APPENDIX FOR CHAPTER 3 83
ρ
The share of revenue given as payment to the publishers equals 2
in both cases. As a
result, when ρ > 1, π A > π M , and vice versa when ρ < 1.
Proof of Proposition 4 and Corollary 4. For completeness, we specify the resulting distribu-
tion function, f1 ( qq12 ):
1 q1 ≥ dq2
d2 q22 −2((d−1)d+1)q1 q2 +q12
−
2(d−1)2 q1 q2
q2 < q1 < dq2
q1 (q −dq )2
q2
f1 ( ) = 2 1
< q1 < q2 (B.2)
q2 2(d−1)2 q1 q2
1
d
q1 = q 2
2
0 q1 ≤ qd2
Proof of Lemma 2. The proof was built for θ ∈ [0, 1] assuming the effectiveness of advertising
is θq. In the text θ = 1.
The total profit from experimenting is:
p
βθ2 (2α + β 2 + β(α + N + 1) + N ) − 2βθ2 α(β + 1)(α + β + N ) + 2αN (α + β + 1)
2(α + β)(α + β + 1)
(B.3)
2
αβθ (α+β+N )
The second order with respect to n is − (α+β)(α+β+1)(α+β+n)3 and is negative for all α >
0,β > 0 and N > 0. At n = 0, the first order is positive when N > β(α+β)(1+α+β)
α
implying
the optimal sample size is positive. q
The solution to the first order condition is n∗ = α(N1+β
+α+β)
−(α+β) which is independent
of θ. Finally, calculation of the change with respect to α, β and N yield the conditions stated
in the lemma.
Proof of Proposition 6. The difference in profit from π min is
p
βθ2 (α + β) α(β + 2) + β 2 + β − 2 α(β + 1)(α + β + N ) + αN
(B.4)
2(α + β)2 (α + β + 1)
qb2 q2
M
u = N (qb s + q(1 − s))p − s − (1 − s) (B.5)
2 2
Maximizing the profit of the publisher yields qb = q = pM . Plugging into the advertiser’s
1−µ
profit and maximizing
h iover the expectation of s yields p = 2 , resulting in an advertiser
2
profit of N µ + (1−µ) 4
.
Performing a similar exercise for a CPA campaign, the publisher will opt not to show ads
to baseline consumers, as it receives commission for their conversions regardless of showing
1−2µ
them ads. Maximizing the publisher’s profit yields qb = 0 and q = pA , which yields p = 2(1−µ)
when plugged into the advertiser’s profit and maximized. This value is higher than 0 only
for µ < 12 , and for µ > 1/2 the advertiser will prefer not to use a CPA campaign. Comparing
q M to q A and q ∗ yields the second part of the proposition. The profit of the advertiser is
1
then N 1−µ which is lower than the CPM profit for any µ < 21 , concluding the proof.
APPENDIX B. APPENDIX FOR CHAPTER 3 85
Proof of Proposition 14. The following lemma from McAfee and McMillan (1991) establishes
conditions for payments offered by the advertiser to maximize its profit, and applies to our
model:
Lemma 6 (McAfee and McMillan (1991) Lemma 1). Suppose the payment functions bi
satisfy
Ex,θ−i [bi (θ)]|θi =0 = 0 (B.6)
and evoke in equilibrium outputs y ∗ (θ). Then the payments maximize the advertiser profits
subject to publisher individual rationality and incentive compatibility.
Theorem 2 of McAfee and McMillan (1991) shows that the payments in (B.17) yield an
optimal mechanism under the conditions that agents are complements in production and
∂y ∗ (θ)
when ∂zi j ≥ 0 for j 6= i. These conditions do not apply in our case when publishers are
substitutes in production.
We therefore proceed to show that the linear mechanism is optimal also under substitute
production. To prove these payments yield an optimal truthful mechanism, we first prove
the following:
Lemma 7. The optimal allocation of efficiency units y ∗ (θ) is unique, positive for positive
effectiveness and increases with self-reported type and decreases with reported types of other
publishers:
∂yi∗
• ∂θi
(θ̂i , θ−i ) ≥0
∂yj∗
• ∂θi
(θ̂i , θ−i ) ≤ 0 for j 6= i
Proof. Let $\pi(y, \theta) = x(y) - \gamma_1(y_1, \theta_1) - \gamma_2(y_2, \theta_2)$.
The first-order conditions are:
$$
1 - \frac{(2-\theta_i)\,y_i}{\theta_i^3} - y_{-i} = 0
\tag{B.7}
$$
with solutions:
$$
y_i^* = \frac{\theta_i^3\left(\theta_{-i}^3+\theta_{-i}-2\right)}{\theta_i^3\theta_{-i}^3-\theta_i\theta_{-i}+2\theta_i+2\theta_{-i}-4}
\tag{B.8}
$$
These are positive for 1 ≥ θ_i > 0.
The first principal minors of π are zero and the second is positive. As a result π is negative semidefinite, and $y^*$ is the unique maximum.
Using the implicit function theorem:
$$
\frac{\partial y_i^*}{\partial \theta_i} = \frac{2(\theta_i-3)(2-\theta_{-i})\,y_i}{\theta_i\left(\theta_i^3\theta_{-i}^3-\theta_i\theta_{-i}+2\theta_i+2\theta_{-i}-4\right)} \ge 0
\tag{B.9}
$$
$$
\frac{\partial y_i^*}{\partial \theta_{-i}} = -\frac{2\theta_i^3(\theta_{-i}-3)\,y_{-i}}{\theta_{-i}\left(\theta_i^3\theta_{-i}^3-\theta_i\theta_{-i}+2\theta_i+2\theta_{-i}-4\right)} \le 0
\tag{B.10}
$$
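The closed form (B.8) and the signs claimed in Lemma 7 can be verified symbolically. The sketch below (an added check, not part of the proof) takes the reconstructed first-order conditions in (B.7) as given, solves the two-publisher system with sympy, and spot-checks the comparative statics at an illustrative interior point 0 < θ < 1.

```python
# Solve the FOC system 1 - (2 - t_i)*y_i/t_i**3 - y_j = 0 (the reconstructed (B.7)),
# compare the solution with the closed form (B.8), and check the signs in (B.9) and (B.10).
import sympy as sp

t1, t2 = sp.symbols('theta1 theta2', positive=True)
y1, y2 = sp.symbols('y1 y2')

sol = sp.solve([1 - (2 - t1) * y1 / t1**3 - y2,
                1 - (2 - t2) * y2 / t2**3 - y1], [y1, y2], dict=True)[0]

den = t1**3 * t2**3 - t1 * t2 + 2 * t1 + 2 * t2 - 4
print(sp.simplify(sol[y1] - t1**3 * (t2**3 + t2 - 2) / den))   # expected: 0

pt = {t1: sp.Rational(1, 2), t2: sp.Rational(3, 4)}            # illustrative interior point
print(sp.diff(sol[y1], t1).subs(pt))                           # positive: own reported type
print(sp.diff(sol[y1], t2).subs(pt))                           # negative: other's reported type
```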
The optimal allocation of efficiency units is $y_i^* = \frac{\theta_i^3\left(\theta_{-i}^3+\theta_{-i}-2\right)}{\theta_i^3\theta_{-i}^3-\theta_i\theta_{-i}+2\theta_i+2\theta_{-i}-4}$.
We note that $u_i|_{\hat\theta_i=0} = 0$ since $\alpha_i = 0$ in this case, which is the first sufficient condition of Lemma 6.
The publisher will then choose to show $y_i$ ads that solve the first- and second-order conditions:
$$
\frac{\partial u_i}{\partial y_i} = \alpha_i(\hat\theta_i,\theta_{-i})\frac{\partial x}{\partial y_i} - \frac{\partial c}{\partial y} = 0
\tag{B.13}
$$
$$
\frac{\partial^2 u_i}{\partial y_i^2} = \alpha_i(\hat\theta_i,\theta_{-i})\frac{\partial^2 x}{\partial y_i^2} - \frac{\partial^2 c}{\partial y^2} = -\frac{\partial^2 c}{\partial y^2} < 0
\tag{B.14}
$$
The publisher will therefore choose $\hat y_i$ such that
$$
\alpha_i(\hat\theta_i,\theta_{-i}) = \frac{\frac{\partial c}{\partial y}(\hat y_i,\hat\theta_i)}{\frac{\partial x}{\partial y_i}\left(\hat y_i,\, y_{-i}^*(\hat\theta_i,\theta_{-i})\right)}
\tag{B.15}
$$
We note that $\hat y_i|_{\hat\theta_i=\theta_i} = y_i^*$, which is the second sufficient condition of Lemma 6.
We therefore need to prove that the payments $b_i$ are incentive compatible for the publishers. Denote $x_i = \frac{\partial x}{\partial y_i}$ and $c_y = \frac{\partial c}{\partial y}$.
Differentiating (B.15) with respect to $\hat\theta_i$ yields:
$$
\frac{\partial \hat y_i}{\partial \hat\theta_i} = \frac{\dfrac{\partial \alpha_i(\hat\theta_i,\theta_{-i})}{\partial \hat\theta_i} + \dfrac{c_y}{x_i^2}\,x_{ij}\,\dfrac{\partial y_j^*}{\partial \theta_i}(\hat\theta_i,\theta_{-i})}{\dfrac{c_{yy}}{x_i}} \ge 0
\tag{B.16}
$$
This inequality holds since $x_{ij}\frac{\partial y_j^*}{\partial \theta_i}(\hat\theta_i,\theta_{-i}) \ge 0$ by Lemma 7 and $\frac{\partial \alpha_i(\hat\theta_i,\theta_{-i})}{\partial \hat\theta_i} \ge 0$ by Theorem 3 of McAfee and McMillan (1991).
The remainder of the proof follows the proof of Theorem 2 in McAfee and McMillan (1991), p. 574. The fact that $\frac{\partial \hat y_i}{\partial \hat\theta_i} \ge 0$ is then sufficient to prove incentive compatibility.
Proof of Lemma 8. Let $q_i^* = \frac{y_i^*}{\theta_i}$. Then $\frac{\partial q_i^*}{\partial \theta_i} > 0$. As showing any number of ads other than $q_i^*$ yields zero revenue with positive costs, the publisher will prefer to show exactly $q_i^*$ ads.
Since $q_i^*$ and $y_i^*$ are both monotonically increasing in θ_i, choosing to show the optimal number of ads such that $\hat\theta_i = \theta_i$ is an equilibrium strategy for the publisher.
Finally, $b_i(x, q)$ is well defined. Suppose there are $\theta_i^1 \neq \theta_i^2$ such that $q_i^*(\theta_i^1) = q_i^*(\theta_i^2)$ yet $b_i(x, \theta_i^1) > b_i(x, \theta_i^2)$. Then, because the utility of the publisher increases with the payment, the publisher would prefer to claim its type is $\theta_i^1$ when its true type is $\theta_i^2$. This contradicts the truthfulness of the direct mechanism. Hence $b_i(x, \theta_i^1) = b_i(x, \theta_i^2)$.
Solving for the decisions of the publishers and the advertiser under CPM and CPA contracts yields the following results:
• Under a CPA contract, if θ < 1, the advertiser will contract only with publisher one. If θ > 1, the advertiser will contract only with publisher two.
We observe that the asymmetry of the publishers creates starkly different incentives for the advertiser and the publishers. Under CPM campaigns, having more effective publishers in the campaign increases the price offered by the advertiser to all publishers. As a result, publisher one will benefit when a better publisher joins the campaign, yet will suffer when a worse one joins.
Performance-based campaigns using CPA, in contrast, make the advertiser exclude the worst-performing publisher from showing ads. The intuition is that because conversions are generated by symmetric "production" input units of both publishers, the advertiser may just as well buy all of the input from the publisher who has the lowest cost of providing them. The only case in which it is optimal for the advertiser to make use of both publishers is when θ = 1 and they are symmetric.
1 It should be noted that this specification is equivalent to specifying equal costs while taking the conversion function to be x(q_1, ζq_2) for some value ζ.
Using a single publisher is significantly less efficient than when two are available to the advertiser. Adding an attribution process creates an opportunity for the shut-out publisher to compensate for its lower effectiveness with effort. The resulting asymmetric equilibrium is currently under investigation to understand the ramifications of the attribution process for such a campaign.
To understand the intuition behind the definition of these payment functions, we first note that $y^*(\theta)$ is the optimal allocation of efficiency units the advertiser would like to employ if the cost of advertising were the virtual cost. The advertiser needs to consider the virtual cost because the mechanism must incentivize high-effectiveness publishers to truthfully report their type rather than impersonate lower-effectiveness publishers. The advertiser then calculates a desired performance level for the campaign given the publishers' reports, and pays publishers only if the output matches or exceeds this level.
The payment gives each publisher a share $\alpha_i$ of $x - x(y^*(\theta))$, the difference between realized performance and the expected optimal output given the publishers' reports. In addition, the publisher is paid the expected optimal cost of showing ads, $c(y^*(\theta), \theta_i)$, corrected for the expected information rent.
We can now prove the following result, which shows that the payments in (B.17) yield the optimal result for the advertiser:
Proposition 14. When $\frac{\partial^2 x}{\partial y_i \partial y_j}\big(y^*(\theta)\big)\cdot\frac{\partial y_j^*(\theta)}{\partial \theta_i} \ge 0$, the payments in (B.17) yield optimal profit to the advertiser. They are incentive compatible and individually rational for publishers. In equilibrium, publishers will choose to generate $y^*(\theta)$ efficiency units.
Proposition 14 shows that when publishers are substitutes in production and the equilibrium allocations are substitutes, the linear contract is optimal, extending McAfee and McMillan (1991) to the case of substitution among the publishers.
The intuition behind this result is subtle. When the other publishers −i are of higher effectiveness, they will produce more output in equilibrium. The resulting externality on publisher i's profit will then be stronger, and as a result it will decide to increase its own output to compensate and recover its share of the profits. In equilibrium these effects cause the publisher to increase its output with its type, a result similar to the standard monotonicity result in single-agent mechanism design.
The optimal mechanism allows the advertiser to efficiently screen among publishers at the cost of giving positive rent to very effective publishers. The payment scheme is built as the sum of two separate payments: a payment for performance and a payment for the effort of displaying ads. By using the ratio of the marginal cost to the marginal productivity of the publisher as the publisher's share of performance, the advertiser is able to align the incentives of the publisher at the margin. In equilibrium the most effective publishers will show the full-information (first-best) number of ads, but will also receive the highest share of profit from the advertiser. The publishers with the lowest effectiveness will be excluded from the campaign and will receive zero profits. An interesting aspect of this payment scheme is that publishers receive less expected rent when compared to the two standard CPA and CPM
schemes, which is a result of using a combination of reported types and observed performance
of the campaign.
The optimal mechanism yields improved results compared to the standard compensation schemes, at the cost of requiring the assumption that publishers can report their effectiveness. In an advertising campaign, however, publishers can only choose the effort they spend in terms of the number of ads they show. To remedy this technical issue we employ the taxation principle to transform the direct mechanism into an indirect mechanism in which publishers choose the number of ads to show and the output they will achieve. Based on the observed performance and effort, the advertiser will pay $b_i$ in the following way:
Lemma 8. Let $b_i(x, q) = b_i(x, \hat\theta)$ when $q = y_i^*(\hat\theta)/\hat\theta_i$ and $b_i = 0$ otherwise. Then $b_i(x, q)$ yields the same equilibrium as the mechanism in (B.17).
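To make the indirect rule concrete, here is a schematic sketch (my own illustration). The helpers y_star, theta_hat_from_ads and b_direct are hypothetical placeholders for the equilibrium allocation, the inverse map from an observed ad count to an implied report, and the direct-mechanism payments in (B.17); none of them are defined in this form in the text.

```python
# Schematic illustration of the indirect payment rule in Lemma 8 (not from the text).
# y_star(i, theta_hat): equilibrium efficiency units for publisher i given reports.
# theta_hat_from_ads(i, q_i, theta_minus_i): report implied by an observed ad count, or None.
# b_direct(i, x, theta_hat): direct-mechanism payment b_i(x, theta_hat) from (B.17).

def indirect_payment(i, x, q_i, theta_minus_i, y_star, theta_hat_from_ads, b_direct):
    """Pay b_i(x, theta_hat) only when q_i = y_i*(theta_hat)/theta_hat_i; otherwise pay zero."""
    theta_hat_i = theta_hat_from_ads(i, q_i, theta_minus_i)
    if theta_hat_i is None:                                    # no report rationalizes q_i
        return 0.0
    theta_hat = {**theta_minus_i, i: theta_hat_i}              # full profile of implied reports
    if abs(q_i - y_star(i, theta_hat) / theta_hat_i) > 1e-9:   # off the prescribed schedule
        return 0.0
    return b_direct(i, x, theta_hat)
```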
Although this result is standard in mechanism design, it typically requires the assumption that publishers can be punished with arbitrary severity when they do not choose the output that the advertiser would prefer. We are able to show that in our case, not paying anything is sufficient for the mechanism to remain a truthful equilibrium.
The caveat, however, is that the resulting mechanism is highly non-linear in the effort of
publishers. The monotonicity of publisher effort also does not hold for many specifications,
which gives rise to multiple equilibria of the indirect mechanism. Another issue that arises is
the effect of the baseline conversion rate of consumers. This was not considered previously
and will prove to be detrimental to these mechanisms.
As it is highly unlikely that advertisers can implement such a mechanism in reality, we choose to develop a simpler mechanism that holds the potential for achieving profits closer to the full-information (first-best) profits.