Attribution Models
Xuhui Shao
Turn, Inc.
835 Main St.
Redwood City, CA 94063

Lexin Li
Department of Statistics
North Carolina State University
Raleigh, NC 27695
ABSTRACT
In digital advertising, attribution is the problem of assigning credit to one or more advertisements for driving the user
to a desirable action, such as making a purchase. Rather
than giving all the credit to the last ad a user sees, multi-touch attribution allows more than one ad to receive credit
based on their respective contributions. Multi-touch attribution is one of the most important problems in digital
advertising, especially when multiple media channels, such
as search, display, social, mobile and video, are involved. Due
to the lack of a statistical framework and a viable modeling approach, no true data-driven methodology exists
in the industry today. While predictive modeling has been thoroughly researched in recent years in the digital advertising
domain, the attribution problem focuses more on accurate
and stable interpretation of the influence of each user interaction on the final user decision than on user classification alone. Traditional classification models fail to achieve
those goals.
In this paper, we first propose a bivariate metric: one component measures the variability of the estimate, and the other measures
the accuracy of classifying the positive and negative users.
We then develop a bagged logistic regression model, which
we show achieves classification accuracy comparable to that of the
usual logistic regression, but a much more stable estimate of
individual advertising channel contributions. We also propose an intuitive and simple probabilistic model to directly
quantify the attribution of different advertising channels.
We then apply both the bagged logistic model and the probabilistic model to a real-world data set from a multi-channel
advertising campaign for a well-known consumer software
and services brand. The two models produce consistent general conclusions and thus offer useful cross-validation. The
General Terms
Algorithms, Performance, Theory
Keywords
Digital Advertising, Multi-touch Attribution Model, Bagged
Logistic Regression
1. INTRODUCTION
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
KDD'11, August 21-24, 2011, San Diego, California, USA.
Copyright 2011 ACM 978-1-4503-0813-7/11/08 ...$10.00.
2. A BIVARIATE METRIC
It is always of interest to identify if a user is to make a
3.
3.1
3.2
There has been intensive research on classification modeling in the literature. Some well-known examples include
support vector machines [8], neural networks [10], and other
methods designed specifically for online advertising in [6] and
[9]. See [7], [10] and [11] for a good review. Most of those
methods generate a complex model, and some are of
a black-box type. The resulting classification boundary is
rather flexible, so it can achieve competitive classification
accuracy. However, in attribution modeling, it is more
important to obtain a model that is stable and relatively easy
to interpret, so that advertisers can develop a clear strat-
$$P(y \mid x_i) = \frac{N_{\text{positive}}(x_i)}{N_{\text{positive}}(x_i) + N_{\text{negative}}(x_i)} \qquad (1)$$

$$P(y \mid x_i, x_j) = \frac{N_{\text{positive}}(x_i, x_j)}{N_{\text{positive}}(x_i, x_j) + N_{\text{negative}}(x_i, x_j)} \qquad (2)$$

for $i \neq j$. Here $y$ is a binary outcome variable denoting a conversion event (purchase or sign-up), and
$x_i$, $i = 1, \ldots, p$, denote $p$ different advertising channels.
$N_{\text{positive}}(x_i)$ and $N_{\text{negative}}(x_i)$ denote the number of
positive or negative users exposed to channel $i$, respectively, and $N_{\text{positive}}(x_i, x_j)$ and $N_{\text{negative}}(x_i, x_j)$
denote the number of positive or negative users exposed to both channels $i$ and $j$.
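The two count ratios in (1) and (2) can be computed in a single pass over user-level data. The following is a minimal sketch, assuming each user is represented as a pair of (set of channels the user was exposed to, conversion flag). The function name, the data layout and the toy channel names are illustrative assumptions, not part of the original study.

```python
from collections import Counter
from itertools import combinations


def conditional_conversion_rates(users):
    """Estimate the ratios in (1) and (2) from user-level exposure data.

    `users` is an iterable of (channels, converted) pairs (an assumed layout):
    `channels` is a collection of channel names, `converted` is a bool.
    Returns two dicts: per-channel rates and per-channel-pair rates.
    """
    pos1, neg1 = Counter(), Counter()  # N_positive(x_i), N_negative(x_i)
    pos2, neg2 = Counter(), Counter()  # N_positive(x_i, x_j), N_negative(x_i, x_j)
    for channels, converted in users:
        chans = sorted(set(channels))  # sort so each pair has one canonical order
        first, second = (pos1, pos2) if converted else (neg1, neg2)
        for c in chans:
            first[c] += 1
        for pair in combinations(chans, 2):  # pairs with i != j only
            second[pair] += 1
    p1 = {c: pos1[c] / (pos1[c] + neg1[c]) for c in set(pos1) | set(neg1)}
    p2 = {k: pos2[k] / (pos2[k] + neg2[k]) for k in set(pos2) | set(neg2)}
    return p1, p2


# Toy data with hypothetical channel names.
toy_users = [
    ({"search", "display"}, True),
    ({"search"}, False),
    ({"display", "email"}, False),
    ({"search", "display"}, False),
    ({"email"}, True),
]
p1, p2 = conditional_conversion_rates(toy_users)
print(p1["email"])                  # 1 positive and 1 negative user saw email
print(p2[("display", "search")])    # 1 positive and 1 negative user saw both
```

Pairs are stored under a sorted tuple so that (i, j) and (j, i) share one entry; a channel or pair never observed does not appear in the output, which avoids division by zero.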
4. NUMERICAL ANALYSIS
4.1 Data Background
4.2
Table 1: Comparison of the bagged logistic regression (BLR) and the usual logistic regression (LR) in terms
of the V-A-metric.

                 pc = 0.25             pc = 0.50             pc = 0.75
ps     Model     V-metric  A-metric    V-metric  A-metric    V-metric  A-metric
0.25   LR        2.053     0.091       1.934     0.091       2.006     0.091
0.25   BLR       0.257     0.142       0.688     0.093       0.824     0.091
0.50   LR        1.913     0.091       2.115     0.091       1.972     0.091
0.50   BLR       0.284     0.147       0.672     0.093       1.039     0.091
0.75   LR        1.868     0.091       2.053     0.091       1.968     0.091
0.75   BLR       0.327     0.147       0.743     0.093       1.294     0.091

4.3
4.4
We presented the user-level attribution analysis to the advertising team. Some interesting observations were made
when comparing the MTA model with the advertiser's existing
LTA model. The comparison is shown in Table 2 for a subset
of channels that are of particular interest to the advertising
team. As seen from the table, for search click, email click,
retail email click and social click, MTA and LTA produce very
similar numbers. Essentially, these types of user-initiated
responses are both highly correlated with the final purchase
decision and temporally very close to it.
On the other hand, the effectiveness of display ad networks varies widely. Overall, display ads (or banner
ads) are undervalued by the LTA model, since these ad impressions are usually further away in time from the purchase
action than, say, a search click. In addition, some ad networks
(for example, Network G) do much better under MTA and some
(for example, Network A) do much worse. This may
be attributed to a trick some ad networks play to game
the LTA model. It is called cookie bombing: a large
number of low-cost, almost invisible ads are shown to a large
number of users. While these impressions do not have much
real influence on users' decisions, they often appear
as the last ad impression a user sees and therefore get the
credit from the LTA model.
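To make the cookie-bombing effect concrete, the following is a minimal sketch of last-touch attribution, assuming each converting user is represented by a chronologically ordered list of channel touches. The function name, the path layout and the channel names are hypothetical and chosen only for illustration; this is not the advertiser's actual LTA implementation.

```python
from collections import Counter


def last_touch_attribution(paths):
    """Give each conversion's full credit to the last channel in the path.

    `paths` is a list of channel sequences in chronological order,
    one per converting user (an assumed data layout).
    """
    credit = Counter()
    for path in paths:
        if path:  # skip users with no recorded touches
            credit[path[-1]] += 1
    return credit


# Three converting users; all channel names are hypothetical.
paths = [
    ["display_g", "search_click"],
    ["display_g", "email_click"],
    ["display_g", "search_click"],
]
before = last_touch_attribution(paths)

# Simulate cookie bombing: a cheap impression from "network_a" is appended
# to the first two paths without changing who converted.
bombed = [p + ["network_a"] for p in paths[:2]] + paths[2:]
after = last_touch_attribution(bombed)

print(before)  # credit goes to search_click and email_click
print(after)   # most credit has moved to network_a
```

In this toy example the bombing channel collects two of the three conversions although the set of converting users is unchanged, while the early display touches receive no credit in either case. This mirrors why LTA can overvalue some networks and undervalue others.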
Table 2: Comparison of the MTA model with the LTA model for a subset of channels.

Channel                 MTA Total   LTA Total   Difference (LTA / MTA)
Search Click            17,494      17,017      97%
Email Click             6,938       7,340       106%
Display Network A       5,567       8,148       146%
Display Network G       2,037       470         23%
Display Network B       1,818       1,272       70%
Display Trading Desk    1,565       1,367       87%
Display Network C       1,494       1,373       92%
Display Network D       1,491       1,233       83%
Email View              1,420       458         32%
Display Network E       1,187       1,138       96%
Brand Campaign          907         1,581       174%
Social                  768         1,123       146%
Display Network H       746         284         38%
Display Network F       673         787         117%
Display Network I       489         136         28%
Retail Email Click      483         491         102%
Display Network J       222         92          41%
Retail Email            168         110         66%
Social Click            133         153         115%
Video                   58          31          54%

[Figure: two panels with horizontal axes labeled V1 through V39 and vertical-axis ticks at 5000 and 15000.]

5. DISCUSSION
In this article we proposed two statistical multi-touch attribution models. We also proposed a bivariate metric that
can be used to evaluate and select a data-driven MTA model.
We consider the main body of this work to fall under descriptive or interpretive modeling, a field that has been largely
ignored in comparison to predictive modeling. For digital
advertising, having the right attribution model is critically
important as it drives performance metrics, advertising in-

1. Select the right dimensions to model on. Unnecessary dimensions introduce noise and
make results difficult to interpret.
2. Control the dimensionality and cardinality. Higher dimensionality and cardinality either significantly
increase the amount of data needed for statistical significance or drown out the important conclusions.
6. REFERENCES