
Data-driven Multi-touch Attribution Models

Xuhui Shao

Lexin Li

Turn, Inc.
835 Main St.
Redwood City, CA 94063

Department of Statistics
North Carolina State University
Raleigh, NC 27695

[email protected]

ABSTRACT

In digital advertising, attribution is the problem of assigning credit to one or more advertisements for driving the user to a desirable action, such as making a purchase. Rather than giving all the credit to the last ad a user sees, multi-touch attribution allows more than one ad to receive credit based on their corresponding contributions. Multi-touch attribution is one of the most important problems in digital advertising, especially when multiple media channels, such as search, display, social, mobile and video, are involved. Due to the lack of a statistical framework and a viable modeling approach, a true data-driven methodology does not exist today in the industry. While predictive modeling has been thoroughly researched in recent years in the digital advertising domain, the attribution problem focuses more on accurate and stable interpretation of the influence of each user interaction on the final user decision rather than just user classification. Traditional classification models fail to achieve those goals.

In this paper, we first propose a bivariate metric: one component measures the variability of the estimate, and the other measures the accuracy of classifying the positive and negative users. We then develop a bagged logistic regression model, which we show achieves classification accuracy comparable to that of a usual logistic regression, but a much more stable estimate of individual advertising channel contributions. We also propose an intuitive and simple probabilistic model to directly quantify the attribution of different advertising channels. We then apply both the bagged logistic model and the probabilistic model to a real-world data set from a multi-channel advertising campaign for a well-known consumer software and services brand. The two models produce consistent general conclusions and thus offer useful cross-validation. The results of our attribution models also yield several important insights that have been validated by the advertising team. We have implemented the probabilistic model in the production advertising platform of the first author's company, and plan to implement the bagged logistic regression in the next product release. We believe the availability of such a data-driven multi-touch attribution metric and models is a breakthrough in the digital advertising industry.

Categories and Subject Descriptors

I.6.5 [Computing Methodologies]: Simulation and Modeling, Model Development

General Terms
Algorithms, Performance, Theory

Keywords
Digital Advertising, Multi-touch Attribution Model, Bagged
Logistic Regression

1. INTRODUCTION

Digital advertising started 16 years ago as a new medium where traditional print ads could appear [1]. As the internet continued to grow at an explosive rate, the advertising industry embraced digital advertising and has made it a $40 billion a year industry in the US alone. Digital advertising's appeal lies not only in its ability to precisely target different groups of consumers with customized ad messages and ad placements, but, probably more importantly, in its ability to track responses and performance almost instantaneously.
Advertising campaigns are often launched across multiple channels. Traditional advertising channels include outdoor billboards, TV, radio, newspapers and magazines, and direct mailing. Digital advertising channels include search, online display, social, video, mobile and email. In this article, we focus on the digital advertising channels. Typically, multiple advertising channels deliver advertisement impressions to a user. When the user then makes a purchase decision or signs up for a service being advertised, the advertiser wants to determine which ads have contributed to the user's decision. This step is critical in completing the feedback loop so that one can analyze, report on and optimize an advertising campaign. This problem of interpreting the influence of advertisements on the user's decision process is called the attribution problem.

Xuhui Shao is Chief Technology Officer, Turn, Inc.

Lexin Li is the corresponding author and Associate Professor, Department of Statistics, North Carolina State University.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
KDD'11, August 21-24, 2011, San Diego, California, USA.
Copyright 2011 ACM 978-1-4503-0813-7/11/08 ...$10.00.


Figure 1: An illustration of multi-touch attribution problem.

The goal of attribution modeling is to pinpoint the credit assignment of each positive user to one or more advertising touch points, which is illustrated in Figure 1. The resulting user-level assignment can be aggregated along different dimensions, including media channel, to derive overall insights. Attribution modeling is not to be confused with marketing mix modeling (MMM), which is limited to the temporal analysis of marketing channels and cannot perform any inference at the user level or along any dimension other than marketing channel.
To determine which media channel or which ad is to be credited, initially a simple rule was developed and quickly adopted by the online advertising industry: the last ad the user clicked on before making the purchase or sign-up decision (the conversion) gets 100% of the credit. This "last-click win" model was extended to include "last-view win" if none of the ads was clicked within a reasonable time window before user conversion. We call both of these models last-touch attribution (LTA), where a "touch" or "touch point" is defined to be any ad impression, click or advertising-related interaction the user has experienced from the advertiser. The last-touch attribution model is simple. However, it completely ignores the influences of all ad impressions except the last one. It is a highly flawed model, as pointed out by [2].
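To make the last-touch rule concrete, the following is a minimal sketch of one way it could be coded. The `Touch` record, the function name and the seven-day default window are illustrative assumptions; the text above only says "a reasonable time window", and applying the same window to views is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Touch:
    channel: str   # e.g. "search" or "display" (hypothetical labels)
    kind: str      # "click" or "view"
    time: float    # seconds on an arbitrary clock


def last_touch_credit(touches: List[Touch], conversion_time: float,
                      window: float = 7 * 24 * 3600) -> Optional[str]:
    """Return the channel that receives 100% of the credit, or None.

    Assumption: only touches inside `window` seconds before the
    conversion are eligible, for clicks and views alike.
    """
    eligible = [t for t in touches
                if 0 <= conversion_time - t.time <= window]
    clicks = [t for t in eligible if t.kind == "click"]
    # Last-click win; fall back to last-view win when nothing was clicked.
    pool = clicks if clicks else eligible
    if not pool:
        return None
    return max(pool, key=lambda t: t.time).channel
```

Note that every touch other than the single winner receives no credit at all, which is the limitation discussed above.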
Alternatively, the concept of a multi-touch attribution (MTA) model has recently been proposed, where more than one touch point can each have a fraction of the credit based on the true influence each touch point has on the outcome, i.e., the user's conversion decision. Atlas Institute, a division of Microsoft Advertising, first proposed the notion of MTA [2]. However, in that paper and other related research from Microsoft Atlas, there is no proposal for how to assign the percentage of credit statistically based on the campaign data. Clearsaleing is a consulting company specializing in attribution analysis, whose attribution model assigns equal fractions of credit to the first touch point, the last touch point, and collectively all the touch points in between [3]. While a data-driven custom model is described as available upon request, the methodology of the custom model is not publicized. Another company, C3 Metric, also offers a rule-based MTA model [4]. But like [3], their model assigns credit to certain touch points simply based on the temporal order of touch points and with fixed percentages. In our opinion, because the user's decision process is largely dependent on the advertiser, the product offer, and how advertising messages and creative design are structured, a desirable attribution model should be campaign-specific and be driven by a solid statistical analysis of user response data.
In addition to the lack of a true data-driven MTA model, a good metric to evaluate different MTA models is not available either. Intuitively, a good MTA model should have a high degree of accuracy in correctly classifying a user as positive (with a conversion action) or negative (without a conversion action). Equally or more important in digital advertising is that a good MTA model should provide a stable estimate of the contribution of each individual variable (for example, each media channel). Unlike in predictive modeling, the stability of the estimate is especially important here because the attribution model determines the performance metric for the ad campaign. Every advertising company and every advertising tactic is ultimately judged by the performance metric set forth in the attribution model. A performance metric must by definition give stable and reproducible results. Ideally, the attribution model should also be easy to interpret, as the results of attribution analysis are often used to derive insights into the ad campaign and its optimization strategy.
Although in recent years predictive modeling has been
thoroughly researched in the digital advertising domain, for
example in [5] and [6], the focus has been on the classification accuracy. The resulting models, many generated from
a black-box type predictive approach, are very hard to interpret. Furthermore, little attention has been paid to the
stability issue of the variable contribution estimate. There
is also the problem of variable correlation when one tries to
interpret the model coefficients directly, which was discussed
in section 4.4.2 of [7].
In this paper, we first propose a new bivariate metric.
One component of this metric measures the variability of
the estimate, and the other measures the accuracy of classifying the positive and negative users. We then develop a
bagged logistic regression model, which we show achieves a
comparable classification accuracy as a usual logistic regression, but a much more stable estimate of individual variable contributions. We also propose a simple and intuitive
probabilistic model to compute the attribution of different
variables based on a combination of first and second order
conditional probabilities. We evaluate both models using the
proposed bivariate metric, and find the two generate consistent results. We then analyze a large advertising campaign
data set, which has 72.5 million anonymized users with over
2 billion ad impressions coming from search, display, social,
email and video channels over a four-week period. As for
implementation, the probabilistic model has been deployed
in the production advertising system of the first author's
company. The bagged logistic regression model is currently
being developed for future product release in the production
system.
The rest of the paper is organized as follows. We present
the bivariate metric in Section 2, and the two data-driven
multi-touch attribution models in Section 3. We evaluate
the empirical performance of the proposed models in Section
4. We conclude the paper with a discussion.

2. A BIVARIATE METRIC

It is always of interest to identify whether a user will make a purchase or sign up for a service based on his exposure to various advertisement channels. This is a typical classification problem, where the outcome is binary, with positive meaning a user makes a purchase action and negative meaning otherwise, and the covariates are the numbers of touch points of different channels. Toward that end, we employ the usual misclassification error rate as part of an evaluation metric for an MTA model.
On the other hand, human behavior is complex and the user data are highly correlated. As a consequence, a simple MTA model, e.g., a usual logistic regression, could have highly variable estimates, which would make the model difficult to interpret. In addition, the high collinearity among attributes also causes strong variables to suppress weaker, correlated variables, as described in Section 4.4.2 of [7]. Therefore we aim to capture the variability of an MTA model in our model evaluation metric. Toward that goal, we employ the notion of standard deviation and also take advantage of the fact that advertising campaign data almost always have a large number of users.
More specifically, we first obtain a random subset of samples of both positive and negative users as a training data set, then another random subset as a testing data set. To avoid having too few positive users in the samples, we fix the ratio of positive versus negative users. In our numerical analysis, we have experimented with ratios of 1 : 1 and 1 : 4, and the two yield very similar results. For brevity we only report the results based on the 1 : 4 ratio below. We then fit an MTA model to the training data. We record the contribution of each advertisement channel, i.e., the coefficient estimate, from the fitted MTA model. We also evaluate the fitted model on the independent testing data and record the misclassification error rate.
We then repeat the above process multiple times in order
to compute the standard deviation of individual coefficient
estimates across multiple repetitions. We report the average of all standard deviations across different channels as
the variability measure (V-metric), and the average of misclassification error rates across data repetitions as the accuracy measure (A-metric). We evaluate an MTA model based
upon the bivariate metric of both the variability and the accuracy (the V-A-metric). A small A-metric indicates that
the model under investigation has a high accuracy of predicting the active or inactive user, while a small V-metric
indicates that the model has a stable estimate. Ideally a
good MTA model should have both metrics small.
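The procedure above can be sketched in a few lines. This is only an illustration under stated assumptions: `fit` and `predict` are hypothetical callables standing in for any MTA model, the sample standard deviation (n - 1 denominator) is used because the text does not say which form is meant, and `pos_fraction=0.2` corresponds to the 1 : 4 ratio.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]   # (covariates, label in {0, 1})


def va_metric(data: List[Row],
              fit: Callable[[List[Row]], List[float]],
              predict: Callable[[List[float], Sequence[float]], int],
              n_train: int, n_test: int, repetitions: int,
              pos_fraction: float = 0.2, seed: int = 0) -> Tuple[float, float]:
    """Return (V-metric, A-metric) over `repetitions` (>= 2) resamplings."""
    rng = random.Random(seed)
    pos = [r for r in data if r[1] == 1]
    neg = [r for r in data if r[1] == 0]

    def draw(n: int) -> List[Row]:
        # Fixed positive/negative ratio, sampled without replacement.
        n_pos = round(n * pos_fraction)
        return rng.sample(pos, n_pos) + rng.sample(neg, n - n_pos)

    coefs, errors = [], []
    for _ in range(repetitions):
        train, test = draw(n_train), draw(n_test)
        beta = fit(train)                      # one coefficient per channel
        coefs.append(beta)
        wrong = sum(predict(beta, x) != y for x, y in test)
        errors.append(wrong / len(test))

    # V-metric: mean over channels of the std. dev. across repetitions.
    v_metric = statistics.mean(statistics.stdev(col) for col in zip(*coefs))
    # A-metric: mean misclassification rate across repetitions.
    a_metric = statistics.mean(errors)
    return v_metric, a_metric
```

Passing the model in as two callables keeps the metric independent of the model being compared, so the same function could score a usual logistic regression, a bagged one, or any other candidate.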

3. MULTI-TOUCH ATTRIBUTION MODELS

3.1 A Bagged Logistic Regression

There has been intensive research on classification modeling in the literature. Some well-known examples include support vector machines [8], neural networks [10], and other unique methods designed for online advertising in [6] and [9]. See [7], [10] and [11] for a good review. Most of those methods generate a complex model, some of which are of a black-box type. The resulting classification boundary is rather flexible, so it can achieve a competent classification accuracy. However, in attribution modeling, it is more of a concern to obtain a model that is stable and relatively easy to interpret, so that advertisers can develop a clear strategy for resource allocation and optimization among multiple advertising channels.
The bagging approach as a meta-learning method was first proposed in [12]. One of the most popular bagged approaches is random forest [13], where decision tree models are stacked to increase performance and robustness. Bagged logistic regression is not of much interest in terms of predictive modeling, since it is more productive to combine nonlinear models in order to increase the prediction accuracy. It has been shown to be outperformed by the tree-based method [14]. On the other hand, the bagging approach possesses the ability to isolate variable collinearity, as discussed in Section 15.4.1 of [7].
In our context of attribution modeling, we combine the commonly used logistic regression, which is simple and easy to interpret, with the bagging idea, which helps reduce the estimation variability due to the highly correlated covariates. This results in the bagged logistic regression, which retains the ease of interpretation of a simple logistic model while achieving a stable and reproducible estimation result. More specifically, the bagged logistic regression is fitted using the following steps.
Step 1. For a given data set, sample a proportion p_s of all the sample observations and a proportion p_c of all the covariates. Fit a logistic regression model on the sampled covariates and the sampled data. Record the estimated coefficients.
Step 2. Repeat Step 1 for M iterations, and the final coefficient estimate for each covariate is taken as the average of estimated coefficients in M iterations.
The sample proportion p_s, the covariate proportion p_c, and the number of iterations M are the parameters of the bagged logistic regression. We will examine their choices in detail in Section 4. Our observations are that, for a range of values of p_s and p_c that are not close to either 0 or 1, the bagged logistic regression yields similar results. Besides, the results are not overly sensitive to the choice of M. When evaluating the model using the proposed V-A-metric, we find that the bagged logistic regression achieves a very similar misclassification rate (A-metric) but enjoys a much smaller variability (V-metric) compared to a usual logistic regression, which is desirable for attribution modeling.
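The two fitting steps of the bagged logistic regression can be sketched as follows. This is a minimal illustration, not the production implementation: the plain gradient-descent fitter `fit_logistic`, its learning rate and epoch count are assumptions, rows and covariates are sampled without replacement, and each coefficient is averaged over the iterations in which its covariate was selected. The paper does not spell out these last two choices, so they are one possible reading.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]   # (covariates, label in {0, 1})


def fit_logistic(rows: List[Row], cols: Sequence[int],
                 epochs: int = 200, lr: float = 0.5) -> Tuple[float, Dict[int, float]]:
    """Gradient-descent logistic regression restricted to columns `cols`."""
    b0 = 0.0
    beta = {c: 0.0 for c in cols}
    n = len(rows)
    for _ in range(epochs):
        g0 = 0.0
        g = {c: 0.0 for c in cols}
        for x, y in rows:
            z = b0 + sum(beta[c] * x[c] for c in cols)
            z = max(-30.0, min(30.0, z))          # avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted minus observed
            g0 += err
            for c in cols:
                g[c] += err * x[c]
        b0 -= lr * g0 / n
        for c in cols:
            beta[c] -= lr * g[c] / n
    return b0, beta


def bagged_logistic(rows: List[Row], p: int, ps: float = 0.5,
                    pc: float = 0.5, M: int = 100, seed: int = 0) -> List[float]:
    """Average per-covariate coefficients over M sub-sampled fits."""
    rng = random.Random(seed)
    n_rows = max(1, round(ps * len(rows)))
    n_cols = max(1, round(pc * p))
    sums, counts = [0.0] * p, [0] * p
    for _ in range(M):
        sample = rng.sample(rows, n_rows)          # Step 1: sample observations
        cols = rng.sample(range(p), n_cols)        # Step 1: sample covariates
        _, beta = fit_logistic(sample, cols)
        for c in cols:
            sums[c] += beta[c]
            counts[c] += 1
    # Step 2: average over the iterations where the covariate was used.
    return [s / k if k else 0.0 for s, k in zip(sums, counts)]
```

Because each fit sees only a subset of the covariates, a strong channel is sometimes absent, which gives weaker correlated channels a chance to show their own effect before the coefficients are averaged.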

3.2 A Simple Probabilistic Model

In addition to the bagged logistic regression model, we also develop a probabilistic model based on a combination of first- and second-order conditional probabilities. This new model is even simpler than a logistic model. Such model simplicity translates into both low estimation variability and ease of interpretation, at the cost of some accuracy. As such, compared to the bagged logistic model, we expect the new model to achieve a smaller V-metric but a larger A-metric. Our numerical analysis confirms this expectation. The probabilistic model is generated using the following steps:
Step 1. For a given data set, compute the empirical probability of the main factors,

P(y \mid x_i) = \frac{N_{positive}(x_i)}{N_{positive}(x_i) + N_{negative}(x_i)},    (1)

and the pair-wise conditional probabilities

P(y \mid x_i, x_j) = \frac{N_{positive}(x_i, x_j)}{N_{positive}(x_i, x_j) + N_{negative}(x_i, x_j)},    (2)

for i \neq j. Here y is a binary outcome variable denoting a conversion event (purchase or sign-up), and x_i, i = 1, ..., p, denote p different advertising channels. N_{positive}(x_i) and N_{negative}(x_i) denote the number of positive or negative users exposed to channel i, respectively, and N_{positive}(x_i, x_j) and N_{negative}(x_i, x_j) denote the number of positive or negative users exposed to both channels i and j.

Step 2. The contribution of channel i is then computed at each positive user level as:

C(x_i) = p(y \mid x_i) + \frac{1}{2 N_{j \neq i}} \sum_{j \neq i} \left\{ p(y \mid x_i, x_j) - p(y \mid x_i) - p(y \mid x_j) \right\},    (3)

where N_{j \neq i} denotes the total number of j's not equal to i. In this case it equals N - 1, or the total number of channels minus one (the channel i itself) for a particular user.

The model is essentially a second-order probability estimation. Due to the similarly designed advertising messages and the user's exposure to multiple media channels, there is a fair amount of overlap between the influences of different touch points. Therefore it is critically important to include the second-order interaction terms in the probability model. Theoretically we can go to third-order, fourth-order or higher interactions. However, the number of observations with the same third-order interaction drops significantly even for a data set as large as the one analyzed in Section 4. Therefore, it is of little practical use to attempt to estimate the empirical probability at the third or higher order. Furthermore, we make an important assumption in the probability model in that the net effect of the second-order interaction goes evenly to each of the two factors involved. Based on the Occam's Razor principle, we feel this is the minimal assumption we need to make without any data evidence to suggest otherwise. Focusing on the first- and second-order terms also helps to reduce any assumption to a minimum; for example, trying to split the effect in the third-order interactions can be more hazardous than in the second-order interactions.
In Section 4, we will employ both the bagged logistic regression model and the probabilistic model to analyze the same advertising campaign data set for overall attribution results across all main media channels. Our results show that, while there are small differences, the general conclusion is consistent between the two models.
The reason that we consider more than one model is the following. Digital advertising relies on a fair amount of subjectivity. Having two different modeling approaches gives advertisers the flexibility to choose. The bagged logistic regression model is more accurate and more flexible with a larger number of covariates. It is slightly more difficult to interpret. On the other hand, the probabilistic model is less accurate but much more intuitive to interpret. In addition, the results from both models can cross-validate the general conclusion reached in the overall advertising campaign analysis.

4. NUMERICAL ANALYSIS

4.1 Data Background

In this section, we analyze a large advertising campaign data set using both proposed methods. This is a 2010 advertising campaign of a consumer software and services company. The campaign ran over a four-week period. The size of the data set is over 300GB compressed. We sampled one-third, i.e., 72.5 million anonymous users. In total these 72.5 million users received over 2 billion ad impressions coming from search, display, social, email and video channels over a four-week period. Because search advertising is priced on a pay-per-click model, only search clicks are reported for each user. Furthermore, more than a dozen advertising networks or equivalent media buying channels are involved in delivering identically designed advertisements. In our study, there are 39 channels in total. It is an unresolved but critically important problem for the advertiser to determine the true effectiveness of each media buying channel. This attribution analysis is important not only for ranking the effectiveness of the channels, but also for deriving insights so that different optimization tactics can be deployed under different circumstances. We apply the bagged logistic regression model and the simple probabilistic model to analyze this data.
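As a hedged illustration of how equations (1) to (3) of the probabilistic model might be evaluated on user-level exposure records such as those in this campaign, the sketch below counts conversions per channel and per channel pair and then computes the per-user contributions. The record layout (a set of channel names plus a conversion flag) and both function names are assumptions made for the example. The sum in equation (3) is taken over the other channels the same user was exposed to, which is one reading of "for a particular user".

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple

User = Tuple[FrozenSet[str], bool]   # (channels seen, converted?)


def fit_probabilities(users: Iterable[User]):
    """Empirical P(y | x_i) and P(y | x_i, x_j), equations (1) and (2)."""
    pos1, tot1 = Counter(), Counter()
    pos2, tot2 = Counter(), Counter()
    for channels, converted in users:
        for c in channels:
            tot1[c] += 1
            pos1[c] += converted            # True counts as 1
        for pair in combinations(sorted(channels), 2):
            tot2[pair] += 1
            pos2[pair] += converted
    p1 = {c: pos1[c] / tot1[c] for c in tot1}
    p2 = {pair: pos2[pair] / tot2[pair] for pair in tot2}
    return p1, p2


def contributions(channels: FrozenSet[str], p1: Dict, p2: Dict) -> Dict[str, float]:
    """Equation (3) for one positive user's set of channels."""
    out = {}
    for i in channels:
        others = [j for j in channels if j != i]
        c = p1[i]
        if others:
            # Net second-order effect, split evenly between the two factors.
            net = sum(p2[tuple(sorted((i, j)))] - p1[i] - p1[j] for j in others)
            c += net / (2 * len(others))
        out[i] = c
    return out
```

For a user exposed to a single channel the sum is empty, so the contribution reduces to the first-order probability alone.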

4.2 Bagged Logistic Regression Analysis

In this section we examine the empirical performance of the bagged logistic regression model, and compare it with the usual logistic regression using the V-A-metric. In addition, we also examine the choice of the tuning parameters in the bagged logistic regression. The simulation setup is based upon the following scheme.
Step 1. Randomly sample a subset of N users as the training data. We choose N = 50,000, and the ratio between the active and inactive users is 1 : 4. (The results for the ratio of 1 : 1 are very similar, so are omitted for brevity.) This leads to 10,000 randomly selected active users and 40,000 inactive users.
Step 2. Randomly sample another independent subset of N users as the testing data.
Step 3. Fit the bagged logistic regression to the training data, with the pre-specified sample proportion p_s and the covariate proportion p_c, and obtain the coefficient estimate.
Step 4. Fit the usual logistic regression to the training data, and obtain the coefficient estimate.
Step 5. Evaluate the misclassification error rate of both regression models on the testing data.
Step 6. Repeat Steps 1 to 5 S = 100 times. Compute the V-A-metric for both regression models. Because each sampling is random, all data have a chance of being selected as training or testing data.
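The sampling in Steps 1 and 2 amounts to a fixed-ratio draw from the active and inactive user pools. A minimal sketch follows, assuming the two pools are already available as Python lists; the function name and the `ratio` argument are illustrative only.

```python
import random


def stratified_sample(active, inactive, n=50_000, ratio=(1, 4), rng=None):
    """Draw n users with a fixed active : inactive ratio, without replacement."""
    rng = rng or random.Random()
    n_active = n * ratio[0] // sum(ratio)     # 50,000 * 1 / 5 = 10,000
    n_inactive = n - n_active                 # 40,000
    return rng.sample(active, n_active), rng.sample(inactive, n_inactive)
```

Calling the function twice with the same generator gives one training draw and one testing draw. The two draws are made independently, so a user may appear in both; the scheme above does not say whether overlap was excluded.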
We set the sample proportion as p_s = 0.25, 0.5 and 0.75, and the covariate proportion as p_c = 0.25, 0.5 and 0.75, respectively. Table 1 reports the results. It is seen from the table that, when p_s and p_c are both close to zero, the bagged logistic model achieves a substantially smaller V-metric but also a worse A-metric compared to the usual logistic model. When p_s and p_c take some value in the middle range between zero and one, e.g., when p_s = 0.5 and p_c = 0.5, we clearly see that the bagged model achieves a variability measure that is much smaller than the variability of the usual logistic model, whereas the accuracy measures of the two models become almost identical. As p_s and p_c increase closer to one, the bagged model exhibits an A-metric that is essentially identical to that of the usual logistic model, but with a lower V-metric. As such, we recommend choosing p_s and p_c around 0.5 if both the variability and the accuracy are of concern. For the number of iterations M, we have experimented with a number of values and observe the same qualitative patterns. For brevity, we only report in Table 1 the results based on M = 1000 iterations. We also note that the V-metric for the usual logistic regression varies a little although it does not depend on the varying parameters p_s and p_c. This is due to the random sampling variation, which to some extent reflects how variable the usual logistic model can be for the advertising data: even a random subset of samples would cause visible estimation variation.

Table 1: Comparison of the bagged logistic regression (BLR) and the usual logistic regression (LR) in terms of the V-A-metric.

                    p_c = 0.25           p_c = 0.50           p_c = 0.75
p_s    Model    V-metric  A-metric   V-metric  A-metric   V-metric  A-metric
0.25   LR       2.053     0.091      1.934     0.091      2.006     0.091
0.25   BLR      0.257     0.142      0.688     0.093      0.824     0.091
0.50   LR       1.913     0.091      2.115     0.091      1.972     0.091
0.50   BLR      0.284     0.147      0.672     0.093      1.039     0.091
0.75   LR       1.868     0.091      2.053     0.091      1.968     0.091
0.75   BLR      0.327     0.147      0.743     0.093      1.294     0.091

4.3 Probabilistic Model Analysis

We next apply the simple probabilistic model to the same data set, and we evaluate the model with the V-A-metric. The resulting V-metric is 0.026, whereas the A-metric is 0.115. Comparing with the results in Table 1, we see that the probabilistic model achieves a very low variability due to its deterministic logic and simple model structure. On the other hand, its misclassification rate is higher than that of the bagged logistic model, which again is intuitively attributable to the low model complexity. These observations reflect the well-known bias-variance tradeoff. Although more complicated models, e.g., a higher-order probabilistic model, could improve estimation accuracy, they would also induce higher variation. Besides, higher-order models are often computationally infeasible for ad data of such a scale.
We also compare the bagged logistic regression model and the simple probabilistic model in terms of MTA user-level assignment. For the bagged logistic model, we take the linear term \hat{\beta}_i x_i as the contribution of channel i, where \hat{\beta}_i denotes the coefficient estimate based on the bagged model. For the simple probabilistic model, we use equation (3) to compute the user-level assignment for each channel. We resample the data S = 100 times, and show the box plot for the two models in Figure 2. First, we observe that the two models yield very similar patterns, suggesting a good agreement of the two models. Second, the bagged logistic regression model exhibits a relatively low variability across data re-sampling, whereas the simple probabilistic model shows an even smaller variability due to its model simplicity.
We also comment that, for ease of comparison, we choose the simplest feature construction scheme for all the models, i.e., we only encode the presence of each channel as a binary variable. The actual model can take on more complex features such as the creative design, web-site category, time of advertisement, and frequency of the user's exposure to the same ad, among others. While the scaling constants are different, both proposed models have a computational complexity of O(p^2 N), where p is the number of dimensions and N is the data sample size. These additional variables have been implemented in the production environment of the first author's company with the help of a cluster of multi-core Linux servers. The general conclusion reached in this paper extends well to those more complex models.

4.4 Interpretation of the Results

We presented the user-level attribution analysis to the advertising team. Some interesting observations were made when comparing the MTA model with the advertiser's existing LTA model. The comparison is shown in Table 2 for a subset of channels that are of particular interest to the advertising team. As seen from the table, for search click, email click, retail email click and social click, MTA and LTA get very similar numbers. Essentially, these types of user-initiated responses are both highly correlated with the final purchase decision and temporally very close to it.
On the other hand, the effectiveness of display ad networks varies widely. Overall, display ads (or banner ads) are undervalued by the LTA model since these ad impressions are usually further away in time from the purchase action than, say, a search click. In addition, some ad networks (for example, Network G) are doing much better under MTA than under LTA and some (for example, Network A) are doing much worse. This may be attributed to a trick some ad networks play in gaming the LTA model. It is called "cookie bombing", where a large number of low-cost, almost invisible ads are shown to a large number of users. While these impressions do not have much real influence on the user's decision, they appear quite often as the last ad impression the user sees and therefore get the credit from the LTA model.
262

Table 2: The MTA user-level attribution analysis.

bagged logistic regression model

5000

15000

Channel
Search Click
Email Click
Display Network A
Display Network G
Display Network B
Display Trading Desk
Display Network C
Display Network D
Email View
Display Network E
Brand Campaign
Social
Display Network H
Display Network F
Display Network I
Retail Email Click
Display Network J
Retail Email
Social Click
Video

V1 V3 V5 V7 V9 V11 V13 V15 V17 V19 V21 V23 V25 V27 V29 V31 V33 V35 V37 V39

15000

simple probabilistic model

LTA Total
17,017
7,340
8,148
470
1,272
1,367
1,373
1,233
458
1,138
1,581
1,123
284
787
136
491
92
110
153
31

Difference
97%
106%
146%
23%
70%
87%
92%
83%
32%
96%
174%
146%
38%
117%
28%
102%
41%
66%
115%
54%

5000

MTA Total
17,494
6,938
5,567
2,037
1,818
1,565
1,494
1,491
1,420
1,187
907
768
746
673
489
483
222
168
133
58

Figure 2: MTA user-level assignment for the bagged logistic regression
model and the simple probabilistic model.

Our models provided some important insights that helped
the advertiser gauge the true effectiveness of each media
channel and root out those gaming tactics. By this change
alone, it is estimated that the advertiser can improve
overall campaign performance by as much as 30%.

DISCUSSION

In this article we proposed two statistical multi-touch attribution
models. We also proposed a bivariate metric that
can be used to evaluate and select a data-driven MTA model.
We consider the main body of this work to fall under descriptive
or interpretive modeling, a field that has been largely
ignored in comparison to predictive modeling. For digital
advertising, having the right attribution model is critically
important, as it drives performance metrics, advertising
insights, and optimization strategy. We believe our work makes
a useful and unique contribution to this field.
Current state-of-the-art attribution models are represented
by [2], [3] and [4]. Compared to our proposed models, none
of the existing publicized models is statistically derived
from the advertising data in question. To apply those models,
one needs either to rely on some universal rule that
would result in identical assignment regardless of advertiser
or user context ([3] and [4]), or to come up
with some subjective assignment rule oneself based on human
intuition. By contrast, our methods are data-driven
and are based upon the most relevant advertising data, and
as such are believed to be more accurate and objective.
The probabilistic model is currently deployed in the production
environment of the first author's company. To the best of our
knowledge, it is the industry's first commercially available
data-driven multi-touch attribution model. Because of this,
at the time of this writing, a number of top-5
media holding companies and several Fortune 100 advertisers
have signed up to test this MTA model. We are also
planning to develop and deploy the bagged logistic regression
model as a follow-up version so that advertisers can
choose either model, to focus more on accuracy or more on
interpretation.
While we believe both methods are statistically sound,
making MTA models useful for digital advertising requires
additional heuristics in the following areas:

1. Select the right dimensions to model on. Introducing unnecessary
dimensions would introduce noise and
make results difficult to interpret.

2. Control the dimensionality and cardinality. Higher dimensionality
and cardinality would either significantly
increase the amount of data needed for statistical significance
or drown out the important conclusions.

3. Carefully encode variables so that domain knowledge
could help choose a compact yet effective model.
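
As an illustration of heuristics 2 and 3 above, the following minimal Python sketch shows one possible way to cap the cardinality of a channel dimension and to encode each user's exposures as binary indicators. It is not the procedure used in our production system. The channel labels, the `min_count` threshold, the "OTHER" bucket, and the function names `cap_cardinality` and `encode_exposures` are all hypothetical and chosen only for this example.

```python
from collections import Counter


def cap_cardinality(values, min_count):
    """Replace levels seen fewer than min_count times with 'OTHER'."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "OTHER" for v in values]


def encode_exposures(user_touches, levels):
    """One binary indicator per level: 1 if the user saw that level at least once."""
    seen = set(user_touches)
    return [1 if level in seen else 0 for level in levels]


# Hypothetical touch log: one channel label per ad impression.
touches = ["Search", "Display A", "Search", "Video",
           "Display A", "Social", "Search"]

# Rare channels (here: fewer than 2 impressions) are lumped together,
# which keeps the number of model dimensions small.
capped = cap_cardinality(touches, min_count=2)
levels = sorted(set(capped))

# Encode one hypothetical user who was exposed to Search and a rare channel.
user_vector = encode_exposures(["Search", "OTHER"], levels)
print(levels, user_vector)
```

Lumping rare levels trades detail for stability: fewer indicators mean less data is needed per coefficient, at the cost of no longer distinguishing the channels inside the lumped bucket. In practice the threshold would be chosen with domain knowledge of the campaign.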

There are a number of avenues for future research. First,
bagging is a wrapper method that can be applied
to many types of learning machines. For example, Random
Forest [13] is a very popular bagged decision tree model.
We chose logistic regression for its ease of implementation
and the simple interpretation of its coefficients. One area
of future development is to extend this MTA framework
to other learning machines, so that we can choose a more
powerful learning method while still being able to easily derive
the user-level attribution assignment. Another area of development
is formalizing the heuristics needed for building
specific types of MTA models that can address typical digital
advertising questions such as budget allocation, cross-channel
optimization, and message sequencing. The third
area is incorporating the MTA model into predictive advertising
models. The attribution model defines the success metric
of each advertising campaign. Because of the dominance
of the LTA model, many predictive models used today are
influenced by it. New predictive models will be needed when
advertisers start to adopt the new attribution model.

REFERENCES

[1] D'Angelo, F. Happy Birthday, Digital Advertising! 2009.
http://adage.com/digitalnext/post?article_id=139964
[2] Chandler-Pepelnjak, J. Atlas Institute, Microsoft
Advertising. Measuring ROI Beyond the Last Ad.
http://www.atlassolutions.com/uploadedFiles/Atlas/Atlas_Institute/Published_Content/dmi-MeasuringROIBeyondLastAd.pdf
[3] Clearsaleing Inc. Clearsaleing Attribution Model.
http://www.clearsaleing.com/product/accurate-attribution-management/
[4] C3 Metric, Inc. What is C3 Metric.
http://c3metrics.com/executive-summary/
[5] Provost, F., Dalessandro, B., Hook, R., Zhang, X.,
and Murray, A. Audience Selection for On-line Brand
Advertising: Privacy-friendly Social Network
Targeting. In Proceedings of the Fifteenth ACM
SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2009.
[6] Li, W., Wang, X., Zhang, R., Cui, Y., Mao, J., and
Jin, R. Exploitation and Exploration in a Performance
Based Contextual Advertising System. In Proceedings
of the Sixteenth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining,
2010.
[7] Hastie, T., Tibshirani, R., and Friedman, J. The
Elements of Statistical Learning: Data Mining,
Inference and Prediction, 2nd Edition, Springer, New
York, 2009.
[8] Cortes, C., and Vapnik, V. Support-Vector Networks.
Machine Learning, 20, 273-297, 1995.
[9] Jin, X., Li, Y., Mah, T., and Tong, J. Sensitive
Webpage Classification for Content Advertising. In
Proceedings of the 1st International Workshop on Data
Mining and Audience Intelligence for Advertising, 2007.
[10] Bishop, C.M. Neural Networks for Pattern
Recognition, Oxford University Press, 1996.
[11] Bishop, C.M. Pattern Recognition and Machine
Learning, Springer, 2007.
[12] Breiman, L. Bagging Predictors. Machine Learning,
24, 123-140, 1996.
[13] Breiman, L. Random Forests. Machine Learning, 45,
5-32, 2001.
[14] Perlich, C., Provost, F., and Simonoff, J.S. Tree
Induction vs. Logistic Regression: A Learning-Curve
Analysis. Journal of Machine Learning Research, 4,
211-255, 2003.

