Estimating CLV Using Aggregated Data - The Tuscan Lifestyles Case Revisited

PETER S. FADER is Frances and Pei-Yuan Chia Professor of Marketing at The Wharton School of the University of Pennsylvania; e-mail: [email protected]

BRUCE G. S. HARDIE is Associate Professor of Marketing at the London Business School; e-mail: [email protected]

KINSHUK JERATH is a doctoral candidate in the Marketing Department at The Wharton School of the University of Pennsylvania; e-mail: [email protected]

The Tuscan Lifestyles case (Mason, 2003) offers a simple twist on the standard view of how to value a newly acquired customer, highlighting how standard retention-based approaches to the calculation of expected customer lifetime value (CLV) are not applicable in a noncontractual setting. Using the data presented in the case (a series of annual histograms showing the aggregate distribution of purchases for two different cohorts of customers newly "acquired" by a catalog marketer), it is a simple exercise to compute an estimate of "expected 5-year CLV." If we wish to arrive at an estimate of CLV that includes the customer's "life" beyond five years, or are interested in, say, sorting out the purchasing process (while "alive") from the attrition process, we need to use a formal model of buying behavior that can be applied to such coarse data. To tackle this problem, we utilize the Pareto/NBD model developed by Schmittlein, Morrison, and Colombo (1987). However, existing analytical results do not allow us to estimate the model parameters using the data summaries presented in the case; we therefore derive an expression that enables us to do this. The resulting parameter estimates and subsequent calculations offer useful insights that could not have been obtained without the formal model. For instance, we were able to decompose the lifetime value into four factors, namely purchasing while active, dropout, the surge in sales in the first year, and the monetary value of the average purchase. We observed a kind of "triple jeopardy" in that the more valuable cohort proved to be better on the three most critical factors.

© 2007 Wiley Periodicals, Inc. and Direct Marketing Educational Foundation, Inc. The second author acknowledges the support of the London Business School Centre for Marketing.

JOURNAL OF INTERACTIVE MARKETING VOLUME 21 / NUMBER 3 / SUMMER 2007
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/dir.20085
TABLE 1  Tuscan Lifestyles Data: Number of Purchases per Year (Not Including Initial Purchase) for Each of Two Cohorts, Grouped by Size of Initial Purchase
partial bans on transborder data flows. Given general outsourcing trends, these laws can create real barriers to the implementation of models built on individual-level data.

For these reasons, it is often much harder to implement CLV models than a casual reading of the marketing literature might suggest. When the transaction data are summarized across customers, such as in Table 1, the concerns raised above evaporate. But this leads us back to the issues at the heart of this paper: how to compute CLV from such data.

Using the data provided above, we can easily arrive at an estimate of "expected 5-year CLV." But what about the customer's "life" beyond 5 years? And what if we wish to know more than just the mean purchase rate? For instance, suppose we are interested in sorting out the purchasing process (while "alive") from the attrition process? Any serious examination of CLV, and any corporate program that relies on it, should consider such questions. Unfortunately, they cannot be answered using these data alone. This situation is not unique; other researchers (e.g., Berger, Weinberg, & Hanna, 2003) have also relied on aggregate data and therefore are subject to similar limitations.

Thus, instead of using relatively simple "accounting" methods to tally up the past value of each customer segment, we need a formal model to capture the underlying purchase patterns and then project them out to future periods. This is where a stochastic model of customer behavior comes in. Such a model posits latent probabilistic processes which are presumed to underlie the observable behavior of each customer. In the CLV setting, we need to develop a probabilistic model that takes into account three distinct (but possibly interrelated) processes: (1) the purchase frequency of a customer while active, (2) the attrition in the customer base over time, and (3) the monetary value of the purchases. Such a model can be fit using recorded data for the early activity of the customer base, and future purchases can then be predicted. While the model is initially conceptualized at the level of the individual customer, it is then aggregated across a population of heterogeneous customers and estimated using data at the segment level or across the entire customer base (while still recognizing the underlying sources of heterogeneity).
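The individual-level "story" behind such a stochastic model can be made concrete with a small simulation. The sketch below is illustrative only (it is not code from the paper, and all parameter values are hypothetical assumptions): each customer draws a latent transaction rate and a latent dropout rate, lives for an exponentially distributed period, and makes Poisson purchases each year while alive.

```python
import numpy as np

def simulate_yearly_counts(n_customers, r, alpha, s, beta, n_years, seed=0):
    """Simulate annual repeat-purchase counts under the two-process story:
    transaction rates ~ gamma(r, alpha) across customers, dropout rates
    ~ gamma(s, beta), exponential lifetimes, Poisson purchasing while alive."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_customers)  # transaction rates
    mu = rng.gamma(shape=s, scale=1.0 / beta, size=n_customers)    # dropout rates
    tau = rng.exponential(scale=1.0 / mu)                          # unobserved lifetimes
    counts = np.zeros((n_customers, n_years), dtype=int)
    for t in range(n_years):
        alive = np.clip(tau - t, 0.0, 1.0)   # fraction of year (t, t+1] spent alive
        counts[:, t] = rng.poisson(lam * alive)
    return counts

counts = simulate_yearly_counts(1000, r=4.0, alpha=4.0, s=1.0, beta=2.5, n_years=5)
```

Tabulating each column of `counts` into a histogram produces exactly the kind of period-by-period summaries shown in Table 1, with the characteristic "shift towards zero" driven by attrition.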
In this paper, we invoke the Pareto/NBD framework (Schmittlein, Morrison, & Colombo, 1987), a parsimonious model of repeat buying behavior in a noncontractual setting that provides excellent predictive power using limited summary information about each customer. However, in its original form, the parameters of the Pareto/NBD cannot be estimated using aggregated data of the form given in Table 1. In the next section, we derive an expression that enables us to estimate the model parameters using such data. We then fit the model to the data and examine the key question: "What is a new Tuscan Lifestyles customer worth?"

MODEL DEVELOPMENT

The Pareto/NBD is a powerful stochastic model of purchasing in a noncontractual setting. It starts by assuming that a customer's relationship with the firm has two phases: he or she is "alive" for an unobserved period of time, then becomes permanently inactive. While alive, the customer is assumed to purchase "randomly" around his or her mean transaction rate. As such, a customer's sequence of purchases can appear to be somewhat lumpy/uneven at times, even though the unobserved buying rate is constant. The unobserved time at which the customer becomes permanently inactive is also the outcome of a probabilistic process governed by a dropout rate specific to the customer. We assume that customers are heterogeneous: both the transaction rates and dropout rates vary from person to person.

More formally, the assumptions of the model are as follows:

i. Customers go through two stages in their "lifetime" with a specific firm: They are alive for some period of time, then become permanently inactive.

ii. While alive, the number of transactions made by a customer follows a Poisson process with transaction rate λ.

iii. A customer's unobserved "lifetime" of length v (after which he is viewed as being permanently inactive) is exponentially distributed with dropout rate μ.

iv. Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter r and scale parameter α.

v. Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter β.

vi. The transaction rate λ and the dropout rate μ vary independently across customers.

Schmittlein, Morrison, and Colombo (1987) present a careful justification of these assumptions. The empirical validations of the model presented in Schmittlein and Peterson (1994) and Fader, Hardie, and Lee (2005) provide further (indirect) support for these assumptions.

Given these assumptions, it is possible to derive expressions for expected purchasing, mean (or median) lifetime, expected CLV, and so on. In order to compute these quantities, we need to know the values of the four model parameters: r, α (which characterize the distribution of transaction rates across the customer base) and s, β (which characterize the distribution of dropout rates across the customer base).

If we start by assuming that we know the exact timing of all the transactions associated with each customer, it turns out that we can estimate the four model parameters using a likelihood function that only requires "recency" (the time of the last purchase) and "frequency" (how many purchases occurred in a given time period) information for each customer. However, in many situations we do not have access to such data; for example, we may only have summaries such as those given in Table 1. The problem with such a data structure is that any longitudinal information about an individual customer is lost. Suppose someone made two repeat purchases in year 1; we do not know how many purchases they made in years 2–5. Does this mean we cannot apply the Pareto/NBD model?

If we reflect on the above model assumptions, we see that they tell a "story" about customer behavior that is not at all related to the nature of the data that might be available to estimate the model parameters. (This is the hallmark of a stochastic model: tell the story first, then deal with data issues later.)

Let the random variable X(t, t+1) denote the number of transactions observed in the time interval (t, t+1]. Referring back to the $50 group in Table 1, we see that X(0, 1) = 0 for 611 people, X(1, 2) = 1 for 1,441 people, and so on. If we can derive an expression for P(X(t, t+1) = x | r, α, s, β), we can construct the likelihood function needed to estimate the model parameters from such data. It can be shown that

P(X(t, t+1) = x | r, α, s, β)
  = δ_{x=0} [1 − (β/(β+t))^s]
    + [Γ(r+x)/(Γ(r) x!)] (α/(α+1))^r (1/(α+1))^x (β/(β+t+1))^s
    + α^r β^s [B(r+x, s+1)/B(r, s)] { B1 − Σ_{i=0}^{x} [Γ(r+s+i)/(Γ(r+s) i!)] B2 }   (4)
where

B1 = ₂F₁(r+s, s+1; r+s+x+1; (α−β−t)/α) / α^(r+s)                  if α ≥ β+t
   = ₂F₁(r+s, r+x; r+s+x+1; (β+t−α)/(β+t)) / (β+t)^(r+s)          if α ≤ β+t   (5)

B2 = ₂F₁(r+s+i, s+1; r+s+x+1; (α−β−t)/(α+1)) / (α+1)^(r+s+i)      if α ≥ β+t
   = ₂F₁(r+s+i, r+x; r+s+x+1; (β+t−α)/(β+t+1)) / (β+t+1)^(r+s+i)  if α ≤ β+t   (6)
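Equations (4)–(6) translate directly into code. The following is our own illustrative implementation (not from the paper), using SciPy's Gaussian hypergeometric function and log-gamma/log-beta functions for numerical stability. A useful sanity check is that the probabilities sum to one over x for any t.

```python
import math
from scipy.special import hyp2f1, gammaln, betaln

def pareto_nbd_interval_pmf(x, t, r, alpha, s, beta):
    """P(X(t, t+1) = x | r, alpha, s, beta), following equations (4)-(6)."""
    def B(i, shift):
        # shift = 0 with i = 0 gives B1; shift = 1 gives B2 for a given i
        a, b = alpha + shift, beta + t + shift
        c = r + s + x + 1
        if alpha >= beta + t:
            h = hyp2f1(r + s + i, s + 1, c, (alpha - beta - t) / a)
            return h * math.exp(-(r + s + i) * math.log(a))
        h = hyp2f1(r + s + i, r + x, c, (beta + t - alpha) / b)
        return h * math.exp(-(r + s + i) * math.log(b))

    part1 = (1.0 - (beta / (beta + t)) ** s) if x == 0 else 0.0
    part2 = math.exp(gammaln(r + x) - gammaln(r) - gammaln(x + 1)
                     + r * math.log(alpha / (alpha + 1)) - x * math.log(alpha + 1)
                     + s * (math.log(beta) - math.log(beta + t + 1)))
    tail = sum(math.exp(gammaln(r + s + i) - gammaln(r + s) - gammaln(i + 1)) * B(i, 1)
               for i in range(x + 1))
    part3 = math.exp(r * math.log(alpha) + s * math.log(beta)
                     + betaln(r + x, s + 1) - betaln(r, s)) * (B(0, 0) - tail)
    return part1 + part2 + part3
```

Note that both branches of (5) and (6) agree at α = β + t, since the hypergeometric argument is then zero; the two forms exist purely to keep that argument inside the unit interval.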
and ₂F₁(·) is the Gaussian hypergeometric function. (A step-by-step derivation of (3)–(6) is presented in the technical appendix.)

Given data summaries of the form presented in Table 1, we can estimate the four Pareto/NBD model parameters via the method of maximum likelihood in the following manner. Suppose we have a sample of T period-specific histograms that give us the distribution of the number of purchases across a fixed set of customers in each period (of equal length). Let n_{x,t} be the number of people who made x purchases in the tth period. (Referring back to the $50 group in Table 1, T = 5, n_{0,1} = 611, n_{1,2} = 1,441, and so on.) The sample log-likelihood function is given by

LL(r, α, s, β) = Σ_{t=1}^{T} Σ_{x} n_{x,t} ln[P(X(t−1, t) = x | r, α, s, β)].   (7)

This can be maximized using standard numerical optimization routines.

MODEL ESTIMATION RESULTS

We first applied the basic model to the data in Table 1. Using (7), we obtained the maximum likelihood estimates of the model parameters for each of the two cohorts of customers. We then generated the purchase histograms for the first 5 years exactly as given and compared these generated histograms to the original data. Looking closely at the raw data (Table 1), we can see that there is a large number of customers making one repeat purchase in the first year (for both cohorts). This number then drops sharply in the second year, after which it declines smoothly. On the other hand, the numbers of customers making more than one repeat purchase do not show any sharp variations. While the Pareto/NBD model is very flexible, it is not flexible enough to capture this year 1 aberration: not only did it miss the spike at x = 1, but in an attempt to capture this surge in the first year, the predictions for the later years were off as well. We therefore need to augment the model to accommodate the large number of customers making one repeat purchase in the first year. (In the absence of adequate knowledge of the true, underlying data generation process, one can ex post consider the characteristics of the collected data that might have led to such patterns. For instance, Tuscan Lifestyles might have offered a coupon to its new customers that would expire 1 year after their initial purchases.)

More formally, we add a single parameter for each group of customers to address this problem. We assume that, within the first year after making the initial purchase, a "hard-core" fraction p of the customers in the cohort make exactly one repeat purchase that year, with the remaining fraction 1 − p purchasing according to the basic Pareto/NBD process. This gives

P(X(t, t+1) = x) = p + (1 − p) P_PNBD(X(t, t+1) = x)   if t = 0 and x = 1
                 = (1 − p) P_PNBD(X(t, t+1) = x)       if t = 0 and x ≠ 1
                 = P_PNBD(X(t, t+1) = x)               if t > 0   (8)

where P_PNBD(·) is the basic Pareto/NBD probability given in (4). From here on, we will only use this "Pareto/NBD with spike" model for the results and analysis that follow. Applying the maximum likelihood estimation procedure in the same manner as for the basic Pareto/NBD model, we obtain the parameter estimates for each group as reported in Table 2.
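The estimation machinery of equations (7) and (8) can be sketched generically. In the code below, the spike adjustment and the histogram-based log-likelihood follow the equations as stated; however, to keep the sketch short and self-contained, a simple one-parameter geometric distribution stands in for the Pareto/NBD interval probability of equation (4), and the histogram counts are fabricated for illustration.

```python
import math
from scipy.optimize import minimize

def spike_pmf(x, t, p, base_pmf):
    """Equation (8): 'Pareto/NBD with spike' wrapper around any base interval pmf."""
    q = base_pmf(x, t)
    if t == 0:  # first year after the initial purchase
        return p + (1.0 - p) * q if x == 1 else (1.0 - p) * q
    return q

def log_likelihood(histograms, pmf):
    """Equation (7): LL = sum over periods t and counts x of n_{x,t} ln P(X = x)."""
    return sum(n * math.log(pmf(x, t))
               for t, counts in enumerate(histograms)   # t = 0 is the first year
               for x, n in enumerate(counts) if n > 0)

# Fabricated histograms: histograms[t][x] = number of customers with x purchases
histograms = [
    [600, 1400, 300, 80],   # year 1 shows a spike at x = 1
    [1500, 600, 200, 80],
    [1800, 400, 150, 30],
]

def neg_ll(v):
    theta, p = v
    base = lambda x, t: (1.0 - theta) * theta ** x   # geometric stand-in for eq. (4)
    return -log_likelihood(histograms, lambda x, t: spike_pmf(x, t, p, base))

res = minimize(neg_ll, x0=[0.4, 0.3], bounds=[(0.01, 0.99), (0.0, 0.95)])
theta_hat, p_hat = res.x
```

With the actual interval probability of equation (4) plugged in as `base_pmf`, the same structure yields the maximum likelihood estimates discussed in this section.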
TABLE 2  Parameter Estimates by Cohort

FIGURE 1  Heterogeneity in Transaction and Dropout Rates for the Two Cohorts

These parameters can be interpreted by plotting the various mixing distributions that they characterize. Figure 1 shows how the transaction rates (λ) and the dropout rates (μ) vary across the members of each cohort. The high values of r and α indicate that there is relatively little heterogeneity in the underlying transaction rate λ. Similarly, the high values of s and β indicate that there is little heterogeneity in the underlying dropout rate μ. Nevertheless, there are some noteworthy differences across the two groups. It is clear that the transaction rates tend to be higher for the ≥$50 group, albeit with a lower variance. The dropout rates are much closer across the two groups, but they tend to be slightly higher for the <$50 group. Finally, the p parameters indicate that a hard-core of roughly 60% of the customers make just one repeat purchase in the first year, and this proportion is about the same for each group.

These "stories" about the underlying behavioral tendencies within each segment seem to be plausible (and managerially interesting). Looking at the raw data alone provides no intuition about the interplay between the flow of transactions for active customers and the difference in dropout tendencies both within and across each of the customer groups.

Even stronger support for the model is offered in Figure 2; for each of the 5 years, we compare the actual distribution of the number of transactions per year with the corresponding fitted distribution (computed using (8) and the parameter estimates given in Table 2), by cohort. Table 3 shows the mean absolute percentage error (MAPE) between the actual and predicted numbers for the 5 years, individually and across all the years (combined). It is quite remarkable to see how well a five-parameter model can capture the different shapes that are seen within each set of histograms. More importantly, the model seems to do an excellent job of following the systematic "shift towards zero" as each group of customers slows down its collective level of purchasing over time. This is clear evidence that a substantial degree of customer dropout is taking place, and therefore confirms the need for the two different processes (purchasing and dropout) posited by the model.
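The link between "high shape values" and "little heterogeneity" can be made concrete: for a gamma mixing distribution with shape r and rate α, the mean is r/α and the coefficient of variation is 1/√r, so a large shape parameter means the individual rates cluster tightly around their mean, whatever the rate parameter. A small check (parameter values hypothetical, not the case estimates):

```python
import math

def gamma_summary(shape, rate):
    """Mean and coefficient of variation (sd/mean) of a gamma mixing distribution."""
    mean = shape / rate
    cv = 1.0 / math.sqrt(shape)   # depends only on the shape parameter
    return mean, cv

mean_lo, cv_lo = gamma_summary(shape=2.0, rate=2.0)    # low shape: wide spread of rates
mean_hi, cv_hi = gamma_summary(shape=50.0, rate=50.0)  # high shape: nearly homogeneous
```

Both distributions here have the same mean rate of 1.0 purchase per year, but the high-shape version describes a far more homogeneous customer base.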
FIGURE 2  Comparing the Actual (Solid Bar) and Fitted (Clear Bar) Distributions of the Number of Transactions per Year, by Cohort
FIGURE 3  Comparing the Average Annual Number of Transactions per Customer with the Corresponding Model-Based Estimates, by Cohort
FIGURE 4  Comparing the Actual (Solid Bar) and Predicted (Clear Bar) Distributions of the Number of Transactions in Years 4 and 5 (Given Parameters Estimated Using the First Three Years of Data), by Cohort
As in the Tuscan Lifestyles case, the value per transaction (revenue per transaction × contribution margin) can be factored out, and we can focus on forecasting the "flow" of future transactions, discounted to yield a present value. Fader, Hardie, and Lee (2005) call this quantity "discounted expected transactions," or DET; it is the effective discounted number of future transactions, so that

E(CLV) = margin × E(revenue/transaction) × DET.   (11)

Fader, Hardie, and Lee (2005) present an expression for DET as implied by the Pareto/NBD model. However, we cannot use it in this setting because of the modification to the basic model to accommodate the more-than-expected number of people making just one repeat purchase in the first year. We can therefore compute DET in the following manner:

DET = Σ_{t=0}^{∞} E[X(t, t+1)] / (1 + d)^(t+0.5),   (12)

where d is the annual discount factor and E[X(t, t+1)] is computed using the expression given in (10). As the transactions can occur at any point in time during the year, we discount them as if, on average, they occur in the middle of the year. Note that this expression for DET does not include the initial transaction that signals the start of the customer's relationship with the firm. (If we wish to include this transaction, which we would need to do if we wish to set an upper bound for acquisition spend, we simply add 1 to our estimate of DET.)

Coming to the revenue/transaction component of the definition of CLV, we note that the Tuscan Lifestyles case only provides annual summary statistics of the spending levels for each cohort (Mason, 2003, Exhibit 3). While it is easy to conceptualize stochastic models to capture the random variation in revenue/transaction over time (Fader, Hardie, & Lee, 2005), it would be difficult to reliably estimate the parameters of such a model given the limited data on monetary value available here. Since the two groups were defined on the basis of initial expenditure, this removes much of the cross-sectional variation in revenue/transaction. Thus, it is more appropriate to assume a constant level for the purchase amounts within each group of customers. The case data indicate that the mean spending level across the 5 years for the <$50 group is (32.09 + 41.78 + 51.05 + 52.43 + 53.63)/5 = $46.20 per transaction, while for the ≥$50 group it is (93.46 + 74.02 + 67.75 + 67.12 + 78.26)/5 = $76.12. Finally, we follow the case and use a fixed margin of 42% for every transaction, and a discount factor of 10% for our CLV calculations.

Using (10) and (12), we find that DET equals 2.36 for the <$50 group and 2.77 for the ≥$50 group. (In evaluating (12), we terminate the series at 100 years, which effectively represents infinity.) It follows that our estimate of expected CLV for the <$50 group is $46, while the expected CLV for a randomly-chosen member of the ≥$50 group is almost double this value, at $89. Clearly, a customer who makes a high-value first purchase with Tuscan Lifestyles is more valuable in the long run compared to a customer who makes a low-value first purchase; the lone datapoint of the value of the first purchase is reasonably discriminating in determining a customer's future worth. Most of this difference is due to the fact that the average order size is 65% higher for the ≥$50 cohort; in contrast, DET for the ≥$50 cohort is only 17% higher than the corresponding number for the <$50 cohort.

The equivalent 5-year DET numbers using the annual averages computed using the data in Table 1 are 2.04 and 2.39, resulting in "5-year lifetime value" estimates of $40 and $76, respectively. Because of the truncation at the end of 5 years, these numbers underestimate the true expected lifetime value by 14%.

Some analysts may be willing to live with a 14% error for the sake of analytical simplicity. However, we cannot be sure that the underestimation will always be so low, for instance, when the variation in transaction rates and dropout rates is high. For the data at hand, not only is the mean purchasing rate low and the dropout rate high for both cohorts, but the variation in transaction rates and dropout rates for the cohorts is also quite low. In other studies (e.g., Fader, Hardie, & Lee, 2005), considerably higher heterogeneity (along with faster purchasing and slower dropout) has been observed. Thus, the 14% underestimation in this case is a very modest number; in many other settings, the impact of ignoring the future when performing CLV calculations will likely be much larger. And beyond the CLV calculation per se, the use of the model offers many other useful diagnostics, as discussed earlier and below.

Referring back to Figure 1, the between-cohort differences in the distributions of the dropout rates are smaller than those for the transaction rates. While the mean (β/(s−1)) and median (β(2^(1/s) − 1)) lifetimes are slightly higher for the ≥$50 cohort (3.5 and 2.4 years versus 3.4 and 2.2 years), the differences in the survival curves (Figure 5, left side) are negligible. Thus, the differences in DET are driven by differences in the transaction rates. We note that the mean of the transaction rate distribution is 0.88 (purchases per annum while alive) for the <$50 cohort and 1.04 for the ≥$50 cohort. This difference is reflected in the
plots of expected cumulative transactions (undiscounted), given on the right side of Figure 5.

FIGURE 5  Plots of the Percentage of Customers Still Alive and the Expected Cumulative Number of Transactions per Customer for Years 1–25, by Cohort

As a final illustration of the value-added associated with the use of a stochastic model of buyer behavior, let us consider the question of variability in CLV (or DET). To explore this, we simulate purchase sequences for each customer, which are then discounted to give "discounted transaction" numbers. The between-customer distribution of this quantity is reported in Figure 6 for both cohorts. This figure shows how the discounted transactions are spread around the expected DET for each cohort; computing the average of these numbers yields the average DET for each cohort, as reported above. We note that while the variance in transaction rates is lower for the ≥$50 cohort (Figure 1), the variance in the discounted number of transactions is actually higher for this cohort (2.67 versus 2.14 for the <$50 cohort).

If we had sufficient data to estimate a stochastic model for revenue/transaction, we could augment our estimates of expected CLV by the full distribution of CLV across the customer base (and associated summary statistics).

FIGURE 6  Distribution of Discounted Number of Transactions per Customer, by Cohort

DISCUSSION AND CONCLUSIONS

The Tuscan Lifestyles case offers a simple new twist on the standard view of how to value a newly acquired customer, highlighting how standard retention-based
approaches to the calculation of expected CLV are impractical in a noncontractual setting. It is a simple exercise to use the data presented in the case to arrive at an estimate of "expected 5-year CLV." However, if we wish to arrive at an estimate that includes the customer's "life" beyond 5 years, or are interested in, say, sorting out the purchasing process (while alive) from the attrition process or computing the distribution of CLV, we need to use a formal model of buying behavior. While the Pareto/NBD model is a natural starting point, existing results do not allow us to estimate the model parameters using the data summaries presented in the case. A key contribution of this paper is the derivation of an expression that enables us to do this.

Our estimated parameters and subsequent calculations offer useful insights that could not have been obtained without the formal model. For instance, we were able to decompose the expected CLV into four factors, namely, purchasing while active, dropout, surge in sales in the first year, and monetary value of the average purchase. We observed a kind of "triple jeopardy" in that the more valuable cohort proved to be better on the three most critical factors (i.e., all but the first-year sales surge). This observation by itself deserves additional study, and may be the basis for an interesting "empirical generalization" about CLV differences across groups. By simply eye-balling the raw data, it might be possible to identify the existence of these factors, but it is impossible to assess their magnitudes and, more importantly, the difference in their magnitudes across the two cohorts. For example, one can observe a considerable dropout rate in both cohorts, but cannot ascertain how the within-cohort distributions for the dropout rates might be different. Similarly, a spike in purchases in the first year is quite evident from the histograms, but without the underlying "organic" model of purchase, the magnitude of the spike cannot be obtained.

It is easy to see how these insights and projections can be of use to the management of Tuscan Lifestyles (and many other firms that face similar issues). Besides being able to judge the economic efficiency of different kinds of acquisition strategies, the model presented here can help managers determine better ways to define cohorts: does it make the most sense to divide customers on the basis of initial expenditure, or would other kinds of splits yield more dramatic differences between groups of customers? These differences should be gauged not only in terms of overall expected CLV for each group but also in terms of the Pareto/NBD model components. Maybe a certain kind of split can lead to a greater degree of homogeneity in each group's transaction rates and/or dropout rates, thereby reducing some uncertainty about their future behavior and making it easier to target members of each group. There are clearly many substantive benefits that arise from this kind of analysis.

From a methodological standpoint, the move from detailed transaction data to histograms raises other questions as well. How about data structures that lie somewhere in between these two extremes? For instance, it is easy to imagine firms maintaining "interval-censored" data, that is, period-by-period counts for each customer. Some ideas about how to develop models using this kind of data structure are explored by Fader and Hardie (2005). Other questions relate to the length of the "window" for the censoring process (e.g., quarterly histograms versus yearly histograms) and the number of histograms needed to obtain stable parameter estimates. All in all, there are many promising research opportunities to be pursued down this path.

Although these methodological questions may be straying pretty far from the original issues raised in the Tuscan Lifestyles case, they provide proof of the healthy links that exist between well-formed managerial questions and appropriately constructed empirical models. New developments in one area frequently open up new possibilities in another area, to the benefit of everyone on both sides. We see Tuscan Lifestyles as the beginning of such a dialogue, and we look forward to continuing the conversation.

REFERENCES
Abramowitz, M., & Stegun, I. A. (Eds.). (1972). Handbook of Mathematical Functions. New York: Dover Publications.

Agrawal, R., & Srikant, R. (2000). Privacy-Preserving Data Mining. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, Texas, May 15–18 (pp. 439–450). New York: ACM.

Berger, P. D., Weinberg, B., & Hanna, R. C. (2003). Customer Lifetime Value Determination and Strategic Implications for a Cruise-Ship Company. Journal of Database Marketing & Customer Strategy Management, 11, 40–52.

Carey, P. (2004). Data Protection: A Practical Guide to UK and EU Law (2nd ed.). Oxford, UK: Oxford University Press.

Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Pacific Grove, CA: Duxbury.

Fader, P. S., & Hardie, B. G. S. (2002). A Note on an Integrated Model of Customer Buying Behavior. European Journal of Operational Research, 139, 682–687.

Fader, P. S., & Hardie, B. G. S. (2005). Implementing the Pareto/NBD Model Given Interval-Censored Data. Retrieved April 28, 2007, from http://brucehardie.com/notes/011/

Fader, P. S., & Hardie, B. G. S. (2006). Deriving an Expression for P(X(t) = x) Under the Pareto/NBD Model. Retrieved April 28, 2007, from http://brucehardie.com/notes/012/

Fader, P. S., Hardie, B. G. S., & Lee, K. L. (2005). RFM and CLV: Using Iso-value Curves for Customer Base Analysis. Journal of Marketing Research, 42, 415–430.

Mason, C. H. (2003). Tuscan Lifestyles: Assessing Customer Lifetime Value. Journal of Interactive Marketing, 17, 54–60.

Mielikäinen, T. (2004). Privacy Problems with Anonymized Transaction Databases. In S. Arikawa & E. Suzuki (Eds.), Discovery Science: Proceedings of the 7th International Conference (DS2004). Lecture Notes in Computer Science, 3245 (pp. 219–229). Berlin: Springer.

Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the Theory of Statistics (3rd ed.). New York: McGraw-Hill.

Schmittlein, D. C., Morrison, D. G., & Colombo, R. (1987). Counting Your Customers: Who Are They and What Will They Do Next? Management Science, 33, 1–24.

Schmittlein, D. C., & Peterson, R. A. (1994). Customer Base Analysis: An Industrial Purchase Process Application. Marketing Science, 13, 41–67.

Singleton, S. (2004). Tolley's Data Protection Handbook (3rd ed.). Croydon, Surrey: LexisNexis UK.
TECHNICAL APPENDIX

Schmittlein, Morrison, and Colombo (1987) and Fader and Hardie (2006) derive expressions for P(X(t) = x), where the random variable X(t) denotes the number of transactions observed in the time interval (0, t], as implied by the Pareto/NBD model assumptions. In this appendix, we derive the corresponding expression for P(X(t, t+1) = x), where the random variable X(t, t+1) denotes the number of transactions observed in the time interval (t, t+1].

The assumption that, while alive, a customer makes purchases according to a Poisson process with transaction rate λ implies that

P(X(t) = x | λ) = (λt)^x e^(−λt) / x!,   x = 0, 1, 2, ....

It also implies that, assuming the customer is alive through the time interval (t_a, t_b],

P(X(t_a, t_b) = x | λ) = [λ(t_b − t_a)]^x e^(−λ(t_b − t_a)) / x!,   x = 0, 1, 2, ....

iii. A customer's unobserved "lifetime" of length v (after which he is viewed as being inactive) is exponentially distributed with dropout rate μ:

f(v | μ) = μ e^(−μv).

iv. Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter r and scale parameter α:

g(λ | r, α) = α^r λ^(r−1) e^(−λα) / Γ(r).   (A1)

v. Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter β:

g(μ | s, β) = β^s μ^(s−1) e^(−μβ) / Γ(s).   (A2)

vi. The transaction rate λ and the dropout rate μ vary independently across customers.
Suppose we know an individual's unobserved latent characteristics $\lambda$ and $\mu$. For $x > 0$, there are two ways $x$ purchases could have occurred in the interval $(t, t+1]$:

i. The individual was alive at $t$ and remained alive through the whole interval; this occurs with probability $e^{-\mu(t+1)}$. The probability of the individual making $x$ purchases, given that he was alive during the whole interval, is $\lambda^x e^{-\lambda}/x!$. It follows that the probability of remaining alive through the interval $(t, t+1]$ and making $x$ purchases is
\[
\frac{\lambda^x e^{-\lambda}}{x!}\, e^{-\mu(t+1)}. \tag{A3}
\]
ii. The individual was alive at $t$ but "died" at some point $\omega \in (t, t+1]$, making $x$ purchases in the interval $(t, \omega]$. The probability of this occurring is
\[
\int_t^{t+1} \frac{[\lambda(\omega - t)]^x e^{-\lambda(\omega - t)}}{x!}\, \mu e^{-\mu\omega}\, d\omega
= e^{-\mu t} \int_0^1 \frac{\lambda^x \mu\, s^x e^{-(\lambda+\mu)s}}{x!}\, ds
\]
\[
= e^{-\mu t} \left(\frac{\lambda}{\lambda+\mu}\right)^{x} \left(\frac{\mu}{\lambda+\mu}\right) \left[1 - e^{-(\lambda+\mu)} \sum_{i=0}^{x} \frac{(\lambda+\mu)^i}{i!}\right], \tag{A4}
\]
where the second expression follows from the change of variable $s = \omega - t$, and the final step uses $\int_0^1 s^x e^{-as}\, ds = \frac{x!}{a^{x+1}}\bigl[1 - e^{-a}\sum_{i=0}^{x} a^i/i!\bigr]$ with $a = \lambda + \mu$.
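The closed form (A4) can be checked against direct quadrature of the defining integral over the death time. A minimal sketch (our own, SciPy assumed; the function names and parameter values are ours):

```python
import math
from scipy.integrate import quad

def die_in_interval(x, lam, mu, t):
    """(A4): P(x purchases in (t, t+1] and death occurs inside the interval)."""
    tail = 1.0 - math.exp(-(lam + mu)) * sum((lam + mu) ** i / math.factorial(i)
                                             for i in range(x + 1))
    return (math.exp(-mu * t) * (lam / (lam + mu)) ** x
            * (mu / (lam + mu)) * tail)

def die_in_interval_quad(x, lam, mu, t):
    """The same probability by numerical integration over the death time omega."""
    integrand = lambda w: ((lam * (w - t)) ** x * math.exp(-lam * (w - t))
                           / math.factorial(x)) * mu * math.exp(-mu * w)
    return quad(integrand, t, t + 1)[0]

# the two evaluations should agree for any x, lambda, mu, t
vals = [(die_in_interval(x, 1.2, 0.3, 2.0), die_in_interval_quad(x, 1.2, 0.3, 2.0))
        for x in range(4)]
```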
These two scenarios also hold for the case of $x = 0$ but need to be augmented by an additional reason as to why no purchases could have occurred in the interval $(t, t+1]$: the individual was dead at the beginning of the interval, which occurs with probability
\[
1 - e^{-\mu t}. \tag{A5}
\]
Combining (A3)–(A5) gives us the following expression for the probability of observing $x$ purchases in the interval $(t, t+1]$, conditional on $\lambda$ and $\mu$:
\[
P(X(t, t+1) = x \mid \lambda, \mu) = \delta_{x=0}\bigl[1 - e^{-\mu t}\bigr] + \frac{\lambda^x e^{-\lambda}}{x!}\, e^{-\mu(t+1)} + \left(\frac{\lambda}{\lambda+\mu}\right)^{x} \left(\frac{\mu}{\lambda+\mu}\right) e^{-\mu t}
- \left(\frac{\lambda}{\lambda+\mu}\right)^{x} \left(\frac{\mu}{\lambda+\mu}\right) e^{-\lambda} e^{-\mu(t+1)} \sum_{i=0}^{x} \frac{(\lambda+\mu)^i}{i!}. \tag{A6}
\]
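A useful sanity check on (A6): for any $\lambda$, $\mu$, and $t$ the probabilities must sum to one over $x$, since the customer is either dead at $t$, dies during the interval, or survives it. A small sketch (our own, standard library only):

```python
import math

def p_x_given_rates(x, lam, mu, t):
    """(A6): P(X(t, t+1] = x | lambda, mu)."""
    p = (1.0 - math.exp(-mu * t)) if x == 0 else 0.0        # dead at t -- (A5)
    p += (lam ** x * math.exp(-lam) / math.factorial(x)
          * math.exp(-mu * (t + 1)))                        # alive throughout -- (A3)
    w = (lam / (lam + mu)) ** x * (mu / (lam + mu))
    p += w * math.exp(-mu * t)                              # death inside the
    p -= (w * math.exp(-lam) * math.exp(-mu * (t + 1))      # interval -- (A4)
          * sum((lam + mu) ** i / math.factorial(i) for i in range(x + 1)))
    return p

total = sum(p_x_given_rates(x, 1.5, 0.4, 2.0) for x in range(100))
```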
In reality, we never know an individual's latent characteristics; we therefore remove the conditioning on $\lambda$ and $\mu$ by taking the expectation of (A6) over the distributions of $\Lambda$ and $M$:
\[
P(X(t, t+1) = x \mid r, \alpha, s, \beta) = \int_0^\infty\!\!\int_0^\infty P(X(t, t+1) = x \mid \lambda, \mu)\, g(\lambda \mid r, \alpha)\, g(\mu \mid s, \beta)\, d\lambda\, d\mu. \tag{A7}
\]
Substituting (A6) in (A7) and expanding, we can write
\[
P(X(t, t+1) = x \mid r, \alpha, s, \beta) = \delta_{x=0} A_1 + A_2 + A_3 - \sum_{i=0}^{x} \frac{1}{i!}\, A_4 \tag{A8}
\]
where
\[
A_1 = \int_0^\infty \bigl[1 - e^{-\mu t}\bigr]\, g(\mu \mid s, \beta)\, d\mu \tag{A9}
\]
\[
A_2 = \int_0^\infty\!\!\int_0^\infty \frac{\lambda^x e^{-\lambda} e^{-\mu(t+1)}}{x!}\, g(\lambda \mid r, \alpha)\, g(\mu \mid s, \beta)\, d\lambda\, d\mu \tag{A10}
\]
\[
A_3 = \int_0^\infty\!\!\int_0^\infty \left(\frac{\lambda}{\lambda+\mu}\right)^{x} \left(\frac{\mu}{\lambda+\mu}\right) e^{-\mu t}\, g(\lambda \mid r, \alpha)\, g(\mu \mid s, \beta)\, d\lambda\, d\mu \tag{A11}
\]
\[
A_4 = \int_0^\infty\!\!\int_0^\infty \left(\frac{\lambda}{\lambda+\mu}\right)^{x} \left(\frac{\mu}{\lambda+\mu}\right) (\lambda+\mu)^i\, e^{-\lambda} e^{-\mu(t+1)}\, g(\lambda \mid r, \alpha)\, g(\mu \mid s, \beta)\, d\lambda\, d\mu. \tag{A12}
\]
Solving (A9) and (A10) is straightforward, using standard results for gamma mixtures:
\[
A_1 = 1 - \left(\frac{\beta}{\beta+t}\right)^{s} \tag{A13}
\]
\[
A_2 = \frac{\Gamma(r+x)}{\Gamma(r)\, x!} \frac{\alpha^r}{(\alpha+1)^{r+x}} \left(\frac{\beta}{\beta+t+1}\right)^{s}. \tag{A14}
\]
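The first two building blocks have simple closed forms, $A_1 = 1 - (\beta/(\beta+t))^s$ and $A_2 = \frac{\Gamma(r+x)}{\Gamma(r)x!} \frac{\alpha^r}{(\alpha+1)^{r+x}} (\frac{\beta}{\beta+t+1})^s$, which follow from standard gamma integrals. They can be cross-checked numerically against the defining integrals (our own sketch, SciPy assumed; parameter values arbitrary):

```python
import math
from scipy.integrate import quad, dblquad
from scipy.special import gammaln

r, alpha, s, beta, t, x = 1.5, 2.0, 1.2, 3.0, 2.0, 1

# gamma densities with rate parameterization, as in (A1) and (A2)
g_lam = lambda l: alpha ** r * l ** (r - 1) * math.exp(-alpha * l) / math.gamma(r)
g_mu = lambda m: beta ** s * m ** (s - 1) * math.exp(-beta * m) / math.gamma(s)

# A1: closed form vs the defining integral (A9)
A1_closed = 1.0 - (beta / (beta + t)) ** s
A1_num = quad(lambda m: (1.0 - math.exp(-m * t)) * g_mu(m), 0, math.inf)[0]

# A2: closed form vs the defining double integral (A10)
A2_closed = (math.exp(gammaln(r + x) - gammaln(r) - gammaln(x + 1))
             * alpha ** r / (alpha + 1) ** (r + x)
             * (beta / (beta + t + 1)) ** s)
A2_num = dblquad(lambda m, l: (l ** x * math.exp(-l) / math.factorial(x)
                               * math.exp(-m * (t + 1)) * g_lam(l) * g_mu(m)),
                 0, 50.0, 0, 50.0)[0]   # effectively infinite upper limits
```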
To solve (A11), consider the transformation $Y = M/(\Lambda + M)$ and $Z = \Lambda + M$. Using the transformation technique (Casella & Berger, 2002, Section 4.3, pp. 156–162; Mood, Graybill, & Boes, 1974, Section 6.2, p. 204ff), it follows that the joint distribution of $Y$ and $Z$ is
\[
g(y, z \mid \alpha, \beta, r, s) = \frac{\alpha^r \beta^s}{\Gamma(r)\Gamma(s)}\, y^{s-1} (1-y)^{r-1} z^{r+s-1} e^{-z[\alpha + (\beta-\alpha)y]}. \tag{A15}
\]
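As a quick check on (A15), the transformed density should integrate to one over $y \in (0,1)$ and $z \in (0, \infty)$; a sketch of that check (our own, SciPy assumed, arbitrary parameter values):

```python
import math
from scipy.integrate import dblquad

r, alpha, s, beta = 1.4, 2.0, 1.2, 3.0

def g_yz(z, y):
    """(A15): joint density of Y = M/(Lambda+M) and Z = Lambda+M."""
    return (alpha ** r * beta ** s / (math.gamma(r) * math.gamma(s))
            * y ** (s - 1) * (1 - y) ** (r - 1) * z ** (r + s - 1)
            * math.exp(-z * (alpha + (beta - alpha) * y)))

# inner variable z on (0, 50] is effectively (0, inf); outer y on (0, 1)
mass = dblquad(g_yz, 0, 1, 0, 50.0)[0]
```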
Noting that the inverse of this transformation is $\lambda = (1-y)z$ and $\mu = yz$, it follows that
\[
A_3 = \int_0^\infty\!\!\int_0^1 y (1-y)^x e^{-yzt}\, g(y, z \mid \alpha, \beta, r, s)\, dy\, dz
\]
\[
= \frac{\alpha^r \beta^s}{\Gamma(r)\Gamma(s)} \int_0^1 y^{s} (1-y)^{r+x-1} \int_0^\infty z^{r+s-1} e^{-z[\alpha + (\beta + t - \alpha)y]}\, dz\, dy
\]
\[
= \frac{\alpha^r \beta^s}{B(r, s)\, \alpha^{r+s}} \int_0^1 y^{s} (1-y)^{r+x-1} \left[1 - \left(\frac{\alpha - (\beta+t)}{\alpha}\right) y\right]^{-(r+s)} dy,
\]
which, recalling Euler's integral for the Gaussian hypergeometric function,¹ equals
\[
\left(\frac{\beta}{\alpha}\right)^{s} \frac{B(r+x, s+1)}{B(r, s)}\; {}_2F_1\!\left(r+s,\, s+1;\, r+s+x+1;\, \frac{\alpha - (\beta+t)}{\alpha}\right). \tag{A16}
\]
1b t2 a
Looking closely at (A16), we see that the argument of the Gaussian hypergeometric function, , is
a
guaranteed to be bounded between 0 and 1 when a b t, thus ensuring convergence of the series representa-
a 1b t2
tion of the function. However, when a b t we can be faced with the situation where 6 1 , in which
a
case the series is divergent.
Applying the linear transformation (Abramowitz & Stegun, 1972, equation 15.3.4)
\[
{}_2F_1(a, b; c; z) = (1 - z)^{-a}\; {}_2F_1\!\left(a,\, c - b;\, c;\, \frac{z}{z-1}\right) \tag{A17}
\]
gives us
\[
A_3 = \frac{\alpha^r \beta^s}{(\beta+t)^{r+s}} \frac{B(r+x, s+1)}{B(r, s)}\; {}_2F_1\!\left(r+s,\, r+x;\, r+s+x+1;\, \frac{(\beta + t) - \alpha}{\beta+t}\right). \tag{A18}
\]
We note that the argument of the above Gaussian hypergeometric function is bounded between 0 and 1 when $\alpha < \beta + t$. We therefore present (A16) and (A18) as solutions to (A11), using (A16) when $\alpha \geq \beta + t$ and (A18) when $\alpha < \beta + t$. We can write this as
\[
A_3 = \alpha^r \beta^s\, \frac{B(r+x, s+1)}{B(r, s)}\, B_1 \tag{A19}
\]
¹ See http://functions.wolfram.com/07.23.07.0001.01
where
\[
B_1 = \begin{cases}
{}_2F_1\!\left(r+s,\, s+1;\, r+s+x+1;\, \dfrac{\alpha - (\beta+t)}{\alpha}\right) \Big/ \alpha^{r+s} & \text{if } \alpha \geq \beta + t \\[2ex]
{}_2F_1\!\left(r+s,\, r+x;\, r+s+x+1;\, \dfrac{(\beta+t) - \alpha}{\beta+t}\right) \Big/ (\beta+t)^{r+s} & \text{if } \alpha < \beta + t.
\end{cases} \tag{A20}
\]
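Equations (A19)–(A20) can be checked against direct numerical evaluation of the defining integral (A11); both branches of $B_1$ should agree with the double integral. A sketch (our own, SciPy assumed; function names and parameter values are ours):

```python
import math
from scipy.integrate import dblquad
from scipy.special import betaln, hyp2f1

def A3_closed(r, alpha, s, beta, t, x):
    """A3 via (A19)-(A20), branching on the sign of alpha - (beta + t)."""
    if alpha >= beta + t:
        B1 = (hyp2f1(r + s, s + 1, r + s + x + 1, (alpha - beta - t) / alpha)
              / alpha ** (r + s))
    else:
        B1 = (hyp2f1(r + s, r + x, r + s + x + 1, (beta + t - alpha) / (beta + t))
              / (beta + t) ** (r + s))
    return alpha ** r * beta ** s * math.exp(betaln(r + x, s + 1) - betaln(r, s)) * B1

def A3_numeric(r, alpha, s, beta, t, x):
    """Direct evaluation of the defining double integral (A11)."""
    g = lambda l, m: (alpha ** r * l ** (r - 1) * math.exp(-alpha * l) / math.gamma(r)
                      * beta ** s * m ** (s - 1) * math.exp(-beta * m) / math.gamma(s))
    f = lambda m, l: (l / (l + m)) ** x * (m / (l + m)) * math.exp(-m * t) * g(l, m)
    return dblquad(f, 0, 50.0, 0, 50.0)[0]   # effectively infinite upper limits

pairs = [(A3_closed(1.4, 6.0, 1.2, 2.0, 1.0, 2), A3_numeric(1.4, 6.0, 1.2, 2.0, 1.0, 2)),
         (A3_closed(1.4, 2.0, 1.2, 6.0, 1.0, 2), A3_numeric(1.4, 2.0, 1.2, 6.0, 1.0, 2))]
```

The first parameter set exercises the $\alpha \geq \beta + t$ branch, the second the $\alpha < \beta + t$ branch.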
To solve (A12), we also make use of the transformation $Y = M/(\Lambda + M)$ and $Z = \Lambda + M$. Given (A15), it follows that
\[
A_4 = \int_0^\infty\!\!\int_0^1 y (1-y)^x z^i e^{-(1-y)z} e^{-yz(t+1)}\, g(y, z \mid \alpha, \beta, r, s)\, dy\, dz
\]
\[
= \frac{\alpha^r \beta^s}{\Gamma(r)\Gamma(s)} \int_0^1 y^{s} (1-y)^{r+x-1} \int_0^\infty z^{r+s+i-1} e^{-z[\alpha + 1 + (\beta + t - \alpha)y]}\, dz\, dy
\]
\[
= \frac{\Gamma(r+s+i)\, \alpha^r \beta^s}{\Gamma(r)\Gamma(s)\, (\alpha+1)^{r+s+i}} \int_0^1 y^{s} (1-y)^{r+x-1} \left[1 - \left(\frac{\alpha - (\beta+t)}{\alpha+1}\right) y\right]^{-(r+s+i)} dy,
\]
which, recalling Euler's integral for the Gaussian hypergeometric function, equals
\[
\frac{\Gamma(r+s+i)}{\Gamma(r+s)} \frac{\alpha^r \beta^s}{(\alpha+1)^{r+s+i}} \frac{B(r+x, s+1)}{B(r, s)}\; {}_2F_1\!\left(r+s+i,\, s+1;\, r+s+x+1;\, \frac{\alpha - (\beta+t)}{\alpha+1}\right). \tag{A21}
\]
Noting that the argument of the Gaussian hypergeometric function is only guaranteed to be bounded between 0 and 1 when $\alpha \geq \beta + t$, we apply the linear transformation (A17), which gives us
\[
A_4 = \frac{\Gamma(r+s+i)}{\Gamma(r+s)} \frac{\alpha^r \beta^s}{(\beta+t+1)^{r+s+i}} \frac{B(r+x, s+1)}{B(r, s)}\; {}_2F_1\!\left(r+s+i,\, r+x;\, r+s+x+1;\, \frac{(\beta+t) - \alpha}{\beta+t+1}\right). \tag{A22}
\]
The argument of the above Gaussian hypergeometric function is bounded between 0 and 1 when $\alpha < \beta + t$. We therefore present (A21) and (A22) as solutions to (A12): we use (A21) when $\alpha \geq \beta + t$ and (A22) when $\alpha < \beta + t$. We can write this as
\[
A_4 = \alpha^r \beta^s\, \frac{\Gamma(r+s+i)}{\Gamma(r+s)} \frac{B(r+x, s+1)}{B(r, s)}\, B_2 \tag{A23}
\]
where
\[
B_2 = \begin{cases}
{}_2F_1\!\left(r+s+i,\, s+1;\, r+s+x+1;\, \dfrac{\alpha - (\beta+t)}{\alpha+1}\right) \Big/ (\alpha+1)^{r+s+i} & \text{if } \alpha \geq \beta + t \\[2ex]
{}_2F_1\!\left(r+s+i,\, r+x;\, r+s+x+1;\, \dfrac{(\beta+t) - \alpha}{\beta+t+1}\right) \Big/ (\beta+t+1)^{r+s+i} & \text{if } \alpha < \beta + t.
\end{cases} \tag{A24}
\]
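(A23)–(A24) admit the same style of numerical check against the defining integral (A12), here for a single value of the index $i$ (our own sketch, SciPy assumed; names and values are ours):

```python
import math
from scipy.integrate import dblquad
from scipy.special import betaln, gammaln, hyp2f1

def A4_closed(r, alpha, s, beta, t, x, i):
    """A4 via (A23)-(A24), branching on the sign of alpha - (beta + t)."""
    if alpha >= beta + t:
        B2 = (hyp2f1(r + s + i, s + 1, r + s + x + 1, (alpha - beta - t) / (alpha + 1))
              / (alpha + 1) ** (r + s + i))
    else:
        B2 = (hyp2f1(r + s + i, r + x, r + s + x + 1, (beta + t - alpha) / (beta + t + 1))
              / (beta + t + 1) ** (r + s + i))
    return (alpha ** r * beta ** s * math.exp(gammaln(r + s + i) - gammaln(r + s))
            * math.exp(betaln(r + x, s + 1) - betaln(r, s)) * B2)

def A4_numeric(r, alpha, s, beta, t, x, i):
    """Direct evaluation of the defining double integral (A12)."""
    g = lambda l, m: (alpha ** r * l ** (r - 1) * math.exp(-alpha * l) / math.gamma(r)
                      * beta ** s * m ** (s - 1) * math.exp(-beta * m) / math.gamma(s))
    f = lambda m, l: ((l / (l + m)) ** x * (m / (l + m)) * (l + m) ** i
                      * math.exp(-l - m * (t + 1)) * g(l, m))
    return dblquad(f, 0, 50.0, 0, 50.0)[0]   # effectively infinite upper limits

check = (A4_closed(1.4, 6.0, 1.2, 2.0, 1.0, 2, 1),
         A4_numeric(1.4, 6.0, 1.2, 2.0, 1.0, 2, 1))
```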
Substituting (A13), (A14), (A19), and (A23) in (A8) gives us the following expression for the probability of observing $x$ transactions in the time interval $(t, t+1]$:
\[
P(X(t, t+1) = x \mid r, \alpha, s, \beta) = \delta_{x=0}\left[1 - \left(\frac{\beta}{\beta+t}\right)^{s}\right] + \frac{\Gamma(r+x)}{\Gamma(r)\, x!} \left(\frac{\alpha}{\alpha+1}\right)^{r} \left(\frac{1}{\alpha+1}\right)^{x} \left(\frac{\beta}{\beta+t+1}\right)^{s}
\]
\[
+\; \alpha^r \beta^s\, \frac{B(r+x, s+1)}{B(r, s)} \left\{ B_1 - \sum_{i=0}^{x} \frac{\Gamma(r+s+i)}{\Gamma(r+s)} \frac{1}{i!}\, B_2 \right\} \tag{A25}
\]
where expressions for $B_1$ and $B_2$ are given in (A20) and (A24), respectively.
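Putting everything together, (A25) can be implemented in a few lines, with a natural sanity check that the probabilities sum to one over $x$. The sketch below is our own illustration (SciPy assumed; the parameter values are arbitrary, not estimates from the paper):

```python
import math
from scipy.special import betaln, gammaln, hyp2f1

def pmf_interval(x, r, alpha, s, beta, t):
    """(A25): P(X(t, t+1] = x | r, alpha, s, beta) under the Pareto/NBD model."""
    p = (1.0 - (beta / (beta + t)) ** s) if x == 0 else 0.0
    p += (math.exp(gammaln(r + x) - gammaln(r) - gammaln(x + 1))
          * (alpha / (alpha + 1)) ** r * (alpha + 1) ** (-x)
          * (beta / (beta + t + 1)) ** s)
    if alpha >= beta + t:        # branch per (A20) and (A24)
        B1 = (hyp2f1(r + s, s + 1, r + s + x + 1, (alpha - beta - t) / alpha)
              / alpha ** (r + s))
        B2 = lambda i: (hyp2f1(r + s + i, s + 1, r + s + x + 1,
                               (alpha - beta - t) / (alpha + 1))
                        / (alpha + 1) ** (r + s + i))
    else:
        B1 = (hyp2f1(r + s, r + x, r + s + x + 1, (beta + t - alpha) / (beta + t))
              / (beta + t) ** (r + s))
        B2 = lambda i: (hyp2f1(r + s + i, r + x, r + s + x + 1,
                               (beta + t - alpha) / (beta + t + 1))
                        / (beta + t + 1) ** (r + s + i))
    brace = B1 - sum(math.exp(gammaln(r + s + i) - gammaln(r + s) - gammaln(i + 1))
                     * B2(i) for i in range(x + 1))
    return p + alpha ** r * beta ** s * math.exp(betaln(r + x, s + 1) - betaln(r, s)) * brace

# probabilities over x should sum to one under either branch of the solution
total_lo = sum(pmf_interval(x, 0.6, 8.0, 0.7, 10.0, 1.0) for x in range(60))   # alpha < beta + t
total_hi = sum(pmf_interval(x, 0.6, 15.0, 0.7, 10.0, 1.0) for x in range(60))  # alpha >= beta + t
```

The gamma and beta functions are evaluated on the log scale (`gammaln`, `betaln`) to avoid overflow for larger $x$.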