Estimating CLV Using Aggregated Data: The Tuscan Lifestyles Case Revisited

This document summarizes a case study on estimating customer lifetime value (CLV) using aggregated data from Tuscan Lifestyles, a catalog marketer. It presents annual histograms showing the purchase patterns of two cohorts of customers grouped by their initial purchase amount. Standard CLV formulas that rely on retention rates cannot be applied in this non-contractual setting where customer attrition is unobserved. The document proposes using a probabilistic dropout process and survival probabilities to estimate CLV from the aggregated data. It also notes challenges in obtaining and utilizing detailed customer-level data for modeling due to privacy concerns, data protection laws, and the difficulty of anonymizing information.

ESTIMATING CLV USING AGGREGATED DATA: THE TUSCAN LIFESTYLES CASE REVISITED

PETER S. FADER, BRUCE G. S. HARDIE, AND KINSHUK JERATH

The Tuscan Lifestyles case (Mason, 2003) offers a simple twist on the standard view of how to value a newly acquired customer, highlighting how standard retention-based approaches to the calculation of expected customer lifetime value (CLV) are not applicable in a noncontractual setting. Using the data presented in the case (a series of annual histograms showing the aggregate distribution of purchases for two different cohorts of customers newly "acquired" by a catalog marketer), it is a simple exercise to compute an estimate of "expected 5-year CLV." If we wish to arrive at an estimate of CLV that includes the customer's "life" beyond five years, or are interested in, say, sorting out the purchasing process (while "alive") from the attrition process, we need to use a formal model of buying behavior that can be applied to such coarse data.

To tackle this problem, we utilize the Pareto/NBD model developed by Schmittlein, Morrison, and Colombo (1987). However, existing analytical results do not allow us to estimate the model parameters using the data summaries presented in the case. We therefore derive an expression that enables us to do this. The resulting parameter estimates and subsequent calculations offer useful insights that could not have been obtained without the formal model. For instance, we were able to decompose the lifetime value into four factors, namely purchasing while active, dropout, the surge in sales in the first year, and the monetary value of the average purchase. We observed a kind of "triple jeopardy" in that the more valuable cohort proved to be better on the three most critical factors.

PETER S. FADER is Frances and Pei-Yuan Chia Professor of Marketing at The Wharton School of the University of Pennsylvania; e-mail: faderp@wharton.upenn.edu. BRUCE G. S. HARDIE is Associate Professor of Marketing at the London Business School; e-mail: [email protected]. KINSHUK JERATH is a doctoral candidate in the Marketing Department at The Wharton School of the University of Pennsylvania; e-mail: kinshuk@wharton.upenn.edu. The second author acknowledges the support of the London Business School Centre for Marketing.

© 2007 Wiley Periodicals, Inc. and Direct Marketing Educational Foundation, Inc.
JOURNAL OF INTERACTIVE MARKETING VOLUME 21 / NUMBER 3 / SUMMER 2007
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/dir.20085

INTRODUCTION

Most standard introductions to the notion of customer lifetime value (CLV) center around a formula similar to

    CLV = m \sum_{t=0}^{\infty} \frac{r^t}{(1+d)^t}    (1)

(where m is the net cash flow per period, r is the retention rate, and d is the discount rate) and claim that this is the appropriate starting point for the calculation of lifetime value.

However, such an expression is not applicable in many business settings, particularly those that can be viewed as noncontractual. A defining characteristic of a noncontractual setting is that the time at which a customer becomes inactive is unobserved by the firm; customers do not notify the firm "when they stop being a customer. Instead they just silently attrite" (Mason, 2003, p. 55). This is in contrast to a contractual setting, where the time at which the customer becomes inactive is observed (e.g., when the customer fails to renew his or her subscription, or contacts the firm to cancel his or her contract). When the point at which the customer disappears is not observed, we cannot meaningfully utilize notions such as "retention rates," and therefore formulae along the lines of (1) are not appropriate. We can, however, capture the "silent attrition" phenomenon by using a probabilistic dropout process for each customer. We can define the "survival probability," S(t), for each customer at a given time t (i.e., the probability that the customer is "alive" at t). This leads to the following definitional expression for expected CLV:

    E(CLV) = \sum_{t=0}^{\infty} \frac{E[v(t)]\, S(t)}{(1+d)^t},    (2)

where E[v(t)] is the expected value (or net cash flow) of the customer at time t (if active). The challenge is to operationalize (2) in any given setting. (See, for example, Fader, Hardie, & Lee, 2005.)

One example of a noncontractual business setting is presented in the Tuscan Lifestyles case (Mason, 2003). This case provides a summary of repeat buying behavior for a group of 7,953 new customers over a 5-year period beginning immediately after their first-ever purchase. These data are presented in Table 1. We have five annual histograms for two groups of customers—the first comprising 4,657 customers with an initial purchase below $50 and the second comprising 3,296 customers with an initial purchase greater than or equal to $50. (Note that the "years" do not refer to calendar time, but reflect time since initial purchase.) The task in the case is to compute the value of a new Tuscan Lifestyles customer (i.e., estimate CLV).

Besides highlighting the inapplicability of the standard CLV formula, the Tuscan Lifestyles case also brings forth another important issue—the practical limitations of obtaining detailed transaction-level data. Even though many researchers have developed general frameworks and specific methods for modeling customer lifetime value using disaggregate data, few have carefully considered the difficult realities of firms' abilities (or inabilities) to deal with customer-level data. While many reporting systems are able to create simple data summaries for a fixed period of time (e.g., an annual histogram of the number of purchases), the process of extracting raw individual-level data can be a time-consuming task (especially if the information technology group is not directly involved with the project).

High-profile stories on data loss, such as the 2005 loss of tapes containing information on 3.9 million Citigroup customers or the August 2006 loss of a computer containing data on 26.5 million veterans by a Department of Veterans Affairs (VA) subcontractor, have justifiably made a number of companies wary of releasing customer-level data. Coupled with rising consumer concerns about privacy, this has motivated a major research stream in information systems called privacy-preserving data mining (e.g., Agrawal & Srikant, 2000). However, the process of "anonymizing" customer data can be challenging, making it only more difficult for marketing to get the information technology group to extract the data required for modeling. Furthermore, there are growing concerns regarding the extent to which privacy is actually preserved in anonymized databases (Mielikäinen, 2004).

Moreover, data protection laws in many countries (particularly in Europe) complicate the process of transferring raw data to the analyst (Carey, 2004; Singleton, 2004), even to the extent of there being

TABLE 1  Tuscan Lifestyles Data: Number of Purchases per Year (Not Including Initial Purchase) for Each of Two Cohorts, Grouped by Size of Initial Purchase

                  < $50 COHORT (YEAR)                    ≥ $50 COHORT (YEAR)
# ORDERS      1      2      3      4      5          1      2      3      4      5
    0       611   2739   3687   3730   3837        421   1643   2430   2535   2673
    1      3508   1441    671    661    626       2354   1120    562    548    463
    2       416    332    207    185    141        397    354    214    151    120
    3        94    100     68     56     38         91    121     53     37     30
    4        21     30     19     14     10         20     53     27     20      7
    5         7      9      1      9      3          6     12      6      3      2
    6                3      2      2                 5      5      2      2      1
    7                1      1                        1      3      1
    8                1                    2          1
    9                1                                      1
   10                       1
   11
   12
   13                                                              1
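The cohort-level means implied by Table 1 can be recovered directly from the histogram counts. A minimal sketch for the < $50 cohort (order counts of six or more, which involve only a handful of customers, are omitted, so the later-year means are very slight underestimates):

```python
# Year-by-year mean repeat purchases for the < $50 cohort (Table 1, rows 0-5).
# Customers with 6+ orders in a year are omitted, slightly understating years 2-5.
counts = {
    0: [611, 2739, 3687, 3730, 3837],
    1: [3508, 1441, 671, 661, 626],
    2: [416, 332, 207, 185, 141],
    3: [94, 100, 68, 56, 38],
    4: [21, 30, 19, 14, 10],
    5: [7, 9, 1, 9, 3],
}
cohort_size = 4657
means = [sum(x * counts[x][yr] for x in counts) / cohort_size for yr in range(5)]
# Year 1 mean ≈ 1.018 repeat purchases per customer
```

The year-1 mean works out to about 1.018 repeat purchases per customer, and the five-year total of roughly 2.37 sits just below the 2.39 average quoted later in the article, the small gap being the omitted 6+ rows.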

partial bans on transborder data flows. Given general outsourcing trends, these laws can create real barriers to the implementation of models built on individual-level data.

For these reasons, it is often much harder to implement CLV models than a casual reading of the marketing literature might suggest. When the transaction data are summarized across customers, such as in Table 1, the concerns raised above evaporate. But this leads us back to the issues at the heart of this paper: how to compute CLV from such data.

Using the data provided above, we can easily arrive at an estimate of "expected 5-year CLV." But what about the customer's "life" beyond 5 years? And what if we wish to know more than just the mean purchase rate? For instance, suppose we are interested in sorting out the purchasing process (while "alive") from the attrition process. Any serious examination of CLV—and any corporate program that relies on it—should consider such questions. Unfortunately, they cannot be answered using these data alone. This situation is not unique; other researchers (e.g., Berger, Weinberg, & Hanna, 2003) have also relied on aggregate data and are therefore subject to similar limitations.

Thus, instead of using relatively simple "accounting" methods to tally up the past value of each customer segment, we need a formal model to capture the underlying purchase patterns and then project them out to future periods. This is where a stochastic model of customer behavior comes in. Such a model posits latent probabilistic processes which are presumed to underlie the observable behavior of each customer. In the CLV setting, we need to develop a probabilistic model that takes into account three distinct (but possibly interrelated) processes: (1) the purchase frequency of a customer while active, (2) the attrition in the customer base over time, and (3) the monetary value of the purchases. Such a model can be fit using recorded data for the early activity of the customer base, and future purchases can then be predicted. While the model is initially conceptualized at the level of the individual customer, it is then aggregated across a population of heterogeneous customers and estimated using data at the segment level or across the entire customer base (while still recognizing the underlying sources of heterogeneity).
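The individual-level "story" behind such a model is easy to simulate. The sketch below is purely illustrative (the rates λ = 1.0 and μ = 0.5 are made-up values, not estimates from the case): it draws an exponentially distributed unobserved lifetime for one customer, counts Poisson purchases over the portion of the first year the customer is alive, and checks the simulated mean against the analytic value λ(1 − e^{−μ})/μ.

```python
import math
import random

def poisson_draw(rate, rng):
    """Sample a Poisson random variable (Knuth's multiplication method)."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def first_year_purchases(lam, mu, rng):
    """One customer: exponential unobserved lifetime, Poisson buying while alive."""
    lifetime = rng.expovariate(mu)          # latent attrition process
    alive = min(lifetime, 1.0)              # fraction of year 1 spent "alive"
    return poisson_draw(lam * alive, rng)   # latent purchase process

rng = random.Random(7)
lam, mu, n = 1.0, 0.5, 100_000
sim_mean = sum(first_year_purchases(lam, mu, rng) for _ in range(n)) / n
analytic = lam * (1 - math.exp(-mu)) / mu   # expected year-1 purchases
```

The monetary-value process, the third component listed above, could be layered on in the same spirit, but the aggregation across heterogeneous customers (the gamma mixing introduced below) is what lets such a story be estimated from segment-level summaries.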


In this paper, we invoke the Pareto/NBD framework (Schmittlein, Morrison, & Colombo, 1987), a parsimonious model of repeat buying behavior in a noncontractual setting that provides excellent predictive power using limited summary information about each customer. However, in its original form, the parameters of the Pareto/NBD cannot be estimated using aggregated data of the form given in Table 1. In the next section, we derive an expression that enables us to estimate the model parameters using such data. We then fit the model to the data and examine the key question: "What is a new Tuscan Lifestyles customer worth?"

MODEL DEVELOPMENT

The Pareto/NBD is a powerful stochastic model of purchasing in a noncontractual setting. It starts by assuming that a customer's relationship with the firm has two phases: he or she is "alive" for an unobserved period of time, then becomes permanently inactive. While alive, the customer is assumed to purchase "randomly" around his or her mean transaction rate. As such, a customer's sequence of purchases can appear to be somewhat lumpy/uneven at times, even though the unobserved buying rate is constant. The unobserved time at which the customer becomes permanently inactive is also the outcome of a probabilistic process governed by a dropout rate specific to the customer. We assume that customers are heterogeneous: both the transaction rates and dropout rates vary from person to person.

More formally, the assumptions of the model are as follows:

i. Customers go through two stages in their "lifetime" with a specific firm: They are alive for some period of time, then become permanently inactive.

ii. While alive, the number of transactions made by a customer follows a Poisson process with transaction rate λ.

iii. A customer's unobserved "lifetime" of length ω (after which he is viewed as being permanently inactive) is exponentially distributed with dropout rate μ.

iv. Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter r and scale parameter α.

v. Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter β.

vi. The transaction rate λ and the dropout rate μ vary independently across customers.

Schmittlein, Morrison, and Colombo (1987) present a careful justification of these assumptions. The empirical validations of the model presented in Schmittlein and Peterson (1994) and Fader, Hardie, and Lee (2005) provide further (indirect) support for these assumptions.

Given these assumptions, it is possible to derive expressions for expected purchasing, mean (or median) lifetime, expected CLV, and so on. In order to compute these quantities, we need to know the values of the four model parameters: r, α (which characterize the distribution of transaction rates across the customer base) and s, β (which characterize the distribution of dropout rates across the customer base).

If we start by assuming that we know the exact timing of all the transactions associated with each customer, it turns out that we can estimate the four model parameters using a likelihood function that only requires "recency" (the time of the last purchase) and "frequency" (how many purchases occurred in a given time period) information for each customer. However, in many situations we do not have access to such data; for example, we may only have summaries such as those given in Table 1. The problem with such a data structure is that any longitudinal information about an individual customer is lost. Suppose someone made two repeat purchases in year 1; we do not know how many purchases they made in years 2–5. Does this mean we cannot apply the Pareto/NBD model?

If we reflect on the above model assumptions, we see that they tell a "story" about customer behavior that is not at all related to the nature of the data that might be available to estimate the model parameters. (This is the hallmark of a stochastic model—tell the story first, then deal with data issues later.)

Let the random variable X(t, t+1) denote the number of transactions observed in the time interval (t, t+1]. Referring back to the < $50 group in Table 1, we see that X(0, 1) = 0 for 611 people, X(1, 2) = 1 for 1,441 people, and so on. If we can derive an expression for


P(X(t, t+1) = x) as implied by the Pareto/NBD model assumptions, we can then use it as a means of estimating the four model parameters given the data in Table 1.

Suppose we know an individual's unobserved latent characteristics λ and μ. For x > 0, there are two ways x purchases could have occurred in the interval (t, t+1]:

i. the individual was alive at t and remained alive through the whole interval, making x purchases during this interval, or

ii. the individual was alive at t but "died" at some point ω (< t+1), making x purchases in the interval (t, ω].

For the case of x = 0, there is an additional reason as to why no purchases could have occurred in the interval (t, t+1]: the individual was "dead" at t. Given model assumptions (ii) and (iii), we can derive the following expression for the probability of observing x purchases in the interval (t, t+1], conditional on λ and μ:

    P(X(t, t+1) = x \mid \lambda, \mu)
      = \delta_{x=0}\left[1 - e^{-\mu t}\right]
      + \frac{\lambda^x e^{-\lambda}}{x!}\, e^{-\mu(t+1)}
      + \left(\frac{\lambda}{\lambda+\mu}\right)^{\!x} \left(\frac{\mu}{\lambda+\mu}\right) e^{-\mu t}
      - \left(\frac{\lambda}{\lambda+\mu}\right)^{\!x} \left(\frac{\mu}{\lambda+\mu}\right) e^{-\lambda}\, e^{-\mu(t+1)} \sum_{i=0}^{x} \frac{(\lambda+\mu)^i}{i!},    (3)

where \delta_{x=0} equals 1 if x = 0, 0 otherwise.

In reality, we never know an individual's latent characteristics; we therefore remove the conditioning on λ and μ by taking into account the distributions of the transaction and dropout rates, giving us

    P(X(t, t+1) = x \mid r, \alpha, s, \beta)
      = \delta_{x=0}\left[1 - \left(\frac{\beta}{\beta+t}\right)^{\!s}\right]
      + \frac{\Gamma(r+x)}{\Gamma(r)\, x!} \left(\frac{\alpha}{\alpha+1}\right)^{\!r} \left(\frac{1}{\alpha+1}\right)^{\!x} \left(\frac{\beta}{\beta+t+1}\right)^{\!s}
      + \alpha^r \beta^s\, \frac{B(r+x,\, s+1)}{B(r,\, s)} \left\{ B_1 - \sum_{i=0}^{x} \frac{\Gamma(r+s+i)}{\Gamma(r+s)} \frac{1}{i!}\, B_2 \right\},    (4)

where

    B_1 = \begin{cases}
      {}_2F_1\!\left(r+s,\, s+1;\, r+s+x+1;\, \frac{\alpha-(\beta+t)}{\alpha}\right) \Big/ \alpha^{r+s} & \text{if } \alpha \ge \beta+t \\[4pt]
      {}_2F_1\!\left(r+s,\, r+x;\, r+s+x+1;\, \frac{(\beta+t)-\alpha}{\beta+t}\right) \Big/ (\beta+t)^{r+s} & \text{if } \alpha < \beta+t
    \end{cases}    (5)

    B_2 = \begin{cases}
      {}_2F_1\!\left(r+s+i,\, s+1;\, r+s+x+1;\, \frac{\alpha-(\beta+t)}{\alpha+1}\right) \Big/ (\alpha+1)^{r+s+i} & \text{if } \alpha \ge \beta+t \\[4pt]
      {}_2F_1\!\left(r+s+i,\, r+x;\, r+s+x+1;\, \frac{(\beta+t)-\alpha}{\beta+t+1}\right) \Big/ (\beta+t+1)^{r+s+i} & \text{if } \alpha < \beta+t
    \end{cases}    (6)


Journal of Interactive Marketing DOI: 10.1002/dir

and 2F1( ) is the Gaussian hypergeometric function. making one repeat purchase in the first year. (In the
(A step-by-step derivation of (3)–(6) is presented in absence of adequate knowledge of the true, underlying
the technical appendix.) data generation process, one can ex post consider the
characteristics of the collected data that might have
Given data summaries of the form presented in led to such patterns. For instance, Tuscan Lifestyles
Table 1, we can estimate the four Pareto/NBD model might have offered a coupon to its new customers that
parameters via the method of maximum likelihood in would expire 1 year after their initial purchases.)
the following manner. Suppose we have a sample of T
period-specific histograms that give us the distribu- More formally, we add a single parameter for each
tion of the number of purchases across a fixed set of group of customers to address this problem. We assume
customers in each period (of equal length). Let nx,t be that, within the first year after making the initial pur-
the number of people who made x purchases in the tth chase, a “hard-core” fraction p of the customers in the
period. (Referring back to the  $50 group in Table 1, cohort make exactly one repeat purchase that year,
T  5, n0,1  611, n1,2  1,441, and so on.) The sample with the remaining fraction 1 p purchasing accord-
log-likelihood function is given by ing to the basic Pareto/NBD process.

T 1  Under this augmented model, the probability that a


LL(r,a,s, b)  a a nx,t ln [P(X(t, t  1)  x 0r, a, s, b)]. customer makes x purchases in the (t  1)th period is
t0 x0
(7) P(X(t, t  1)  x 0 r, a, s, b, p) 

This can be maximized using standard numerical opti- p  (1 p)PPNBD (X(t, t  1)  x) if t  0 and x  1
mization routines. µ (1 p)PPNBD (X(t, t  1)  x) if t  0 and x 1
PPNBD (X(t, t  1)  x) if t  0
MODEL ESTIMATION RESULTS (8)
We first applied the basic model to the data in Table 1. where PPNBD( ) is the basic Pareto/NBD Probability
Using (7), we obtained the maximum likelihood given in (4). From here on, we will only use this
estimates of the model parameters for each of the two “Pareto/NBD with spike” model for the results and
cohorts of customers. We then generated the purchase analysis that follow. Applying the maximum likelihood
histograms for the first 5 years exactly as given and estimation procedure in the same manner as for the
compared these generated histograms to the original basic Pareto/NBD model, we obtain the parameter
data. Looking closely at the raw data (Table 1), we can estimates for each group as reported in Table 2.
see that there is a large number of customers making
one repeat purchase in the first year (for both cohorts). These parameters can be interpreted by plotting the
This number then drops sharply in the second year, various mixing distributions that they characterize.
after which it declines smoothly. On the other hand, the Figure 1 shows how the transaction rates (l) and the
numbers of customers making more than one repeat dropout rates (m) vary across the members of each
purchase do not show any sharp variations. While the cohort. The high values of r and a indicate that there
Pareto/NBD model is very flexible, it is not flexible
enough to capture this year 1 aberration—not only did
it miss the spike at x  1, but in an attempt to capture
this surge in the first year, the predictions for the later
years were off as well.
TABLE 2 Parameter Estimates by Cohort

Many researchers would be tempted to propose a more


COHORT ␶ ␣ S ␤ ␲
complicated story of buyer behavior in order to accom-
modate this aberration. However, inspired by Fader $50 32.83 37.21 12.13 37.74 0.63
and Hardie (2002), we accommodate this year 1 devia- $50 148.11 142.07 29.00 92.26 0.57

tion simply by adding a “spike” in the probability of

60 JOURNAL OF INTERACTIVE MARKETING
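Because (8) is a simple two-component mixture, it yields a proper probability distribution for any underlying Pareto/NBD term. A minimal sketch of the mixing logic, using a stand-in base pmf (a Poisson with an arbitrary rate) in place of the full hypergeometric expression in (4):

```python
import math

def spike_prob(x, t, pi, base_pmf):
    """Equation (8): mix a point mass at x = 1 into the year-1 distribution."""
    p = base_pmf(x, t)
    if t == 0:
        return pi + (1 - pi) * p if x == 1 else (1 - pi) * p
    return p

# Stand-in for the Pareto/NBD probability (4): a Poisson pmf, for illustration only.
base = lambda x, t: 1.1**x * math.exp(-1.1) / math.factorial(x)

year1 = [spike_prob(x, 0, 0.63, base) for x in range(40)]  # spiked first year
year2 = [spike_prob(x, 1, 0.63, base) for x in range(40)]  # untouched later years
```

Both lists sum to one, and the year-1 probability of exactly one repeat purchase is inflated by the point mass π, which is exactly the behavior the spike was introduced to capture.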


Journal of Interactive Marketing DOI: 10.1002/dir

Transaction Rates Dropout Rates These “stories” about the underlying behavioral ten-
≥ $50 dencies within each segment seem to be plausible (and
≥ $50
managerially interesting). Looking at the raw data
alone provides no intuition about the interplay between
f (␭) f(␮)
< $50 the flow of transactions for active customers and the dif-
< $50 ference in dropout tendencies both within and across
each of the customer groups.
0.0 0.5 1.0 1.5 0.00 0.25 0.50 0.75
␭ ␮
Even stronger support for the model is offered in
FIGURE 1 Figure 2; for each of the 5 years, we compare the
Heterogeneity in Transaction and Dropout Rates for the Two
actual distribution of the number of transactions per
Cohorts year with the corresponding fitted distribution
(computed using (8) and the parameter estimates
is relatively little heterogeneity in the underlying given in Table 2), by cohort. Table 3 shows the mean
transaction rate l. Similarly, the high values of s and absolute percentage error (MAPE) between the actu-
b indicate that there is little heterogeneity in the al and predicted numbers for the 5 years, individual-
underlying dropout rate m. Nevertheless, there are ly and across all the years (combined). It is quite
some noteworthy differences across the two groups. It remarkable to see how well a five-parameter model
is clear that the transaction rates tend to be higher can capture the different shapes that are seen with-
for the  $50 group, albeit with a lower variance. The in each set of histograms. More importantly, the
dropout rates are much closer across the two groups, model seems to do an excellent job of following the
but they tend to be slightly higher for the  $50 systematic “shift towards zero” as each group of cus-
group. Finally, the p parameters indicate that a hard- tomers slows down its collective level of purchasing
core of roughly 60% of the customers make just one over time. This is clear evidence that a substantial
repeat purchase in the first year, and this proportion degree of customer dropout is taking place, and
is about the same for each group. therefore confirms the need for the two different

< $50 Cohort

Year 1 Year 2 Year 3 Year 4 Year 5


4,000

2,000

0
0 1 2 3 4 5 6 7+ 0 1 2 3 4 5 6 7+ 0 1 2 3 4 5 6 7+ 0 1 2 3 4 5 6 7+ 0 1 2 3 4 5 6 7+

≥ $50 Cohort

Year 1 Year 2 Year 3 Year 4 Year 5


3,000

1,500

0
0 1 2 3 4 5 6 7+ 0 1 2 3 4 5 6 7+ 0 1 2 3 4 5 6 7+ 0 1 2 3 4 5 6 7+ 0 1 2 3 4 5 6 7+

FIGURE 2
Comparing the Actual (Solid Bar) and Fitted (Clear Bar) Distributions of the Number of Transactions per Year, by Cohort

ESTIMATING CLV USING AGGREGATED DATA 61
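The MAPE summary reported in Table 3 is a standard calculation; the exact cell-weighting is not spelled out in the text, so the sketch below is one plausible reading (averaging absolute percentage errors across histogram cells with nonzero actual counts, using made-up toy numbers):

```python
def mape(actual, fitted):
    """Mean absolute percentage error over cells with nonzero actual counts.

    One plausible reading of the Table 3 calculation; the article does not
    state how cells are weighted, so this is an assumption.
    """
    pairs = [(a, f) for a, f in zip(actual, fitted) if a != 0]
    return sum(abs(a - f) / a for a, f in pairs) / len(pairs) * 100

# Toy illustration (not case data): two histogram cells, each off by 10%.
example = mape([100, 50], [90, 55])   # = 10.0
```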


Journal of Interactive Marketing DOI: 10.1002/dir

customer per year and the actual annual averages is


Annual and Combined 5-Year MAPE examined in Figure 3. While there are some modest
TABLE 3 Between the Actual and Fitted Values deviations, the overall fit is good. Furthermore, these
for Each Cohort
annual deviations tend to cancel out. For the  $50
cohort, the actual average number of transactions
COHORT YEAR 1 YEAR 2 YEAR 3 YEAR 4 YEAR 5 COMBINED across the 5 years is 2.39, while the model estimate
$50 10.1% 15.3% 12.9% 3.7% 7.2% 9.8% is 2.40; for the  $50 cohort, both the actual and
$50 12.1% 19.2% 10.9% 7.1% 6.1% 11.1% predicted average number of transactions across the
5 years is 2.80.
One additional validation exercise is to determine how
robust the model is when we limit the number of
histograms used to estimate it. Such a task also serves
behavioral processes at the heart of the Pareto/NBD as a type of “holdout test” to see if it is appropriate to
model. project the behavioral patterns beyond the observed
5-year period. Instead of using all 5 years of data to
Another way of summarizing model fit is to compare estimate the model, we re-estimate the model using
the model-based estimate of the average annual num- only the first 3 years of data. We wish to see how well
ber of transactions per customer (E[X(t, t  1)]) with we can predict the histograms for years 4 and 5 despite
the observed averages (as computed using the data in the fact that no data from those years were used for
Table 1). Defining the random variable X(t) as the parameter estimation. Figure 4 offers a comparison
number of transactions occurring in the interval of the model predictions for this limited specification
(0, t], we know from Schmittlein et al. (1987) that and the actual values for years 4 and 5. The close
correspondence of these histograms provides strong
s 1 evidence of the model’s capabilities. The mean absolute
E[X(t) 0 r, a, s, b]  c1 a b d.
rb b
percentage errors for the predictions—individually
a(s 1) bt and across both years (combined)—are given in Table 4;
we note that they are only slightly worse than those
Clearly E[X(t, t  1)]  E[X(t  1)] E[X(t)], so numbers obtained when the years 4 and 5 data were
used to estimate the model parameters (Table 3). This
E[X(t, t  1) ƒ r, a, s, b]
is a tough test for any model, especially one that is
s 1 s 1 calibrated on such a limited amount of data.
ca b a b d.
rb b b
 (9)
a(s 1) b  t bt1 Having established the validity of our modeling
approach, we now turn to the question that motivated
For the five-parameter model (i.e., the basic the Mason (2003) case in the first place.
Pareto/NBD model augmented with a “spike” at x  1
for the first year), we have
WHAT IS A NEW TUSCAN LIFESTYLES
E[X(t, t  1) ƒ r, a, s, b, p] 
CUSTOMER WORTH?
s 1 The Pareto/NBD model enables us to predict the
c1 a b d
rb b
p  (1 p) if t  0 expected transaction stream for a new customer.
a(s 1) b1
µ s 1 s 1
However, to assess the expected lifetime value for a cus-
ca b a b d
rb b b
if t  0. tomer, we also need to predict the monetary amount
a(s 1) b  1 bt1 associated with each purchase. Following Fader,
(10) Hardie, and Lee (2005), if we can assume that the mon-
etary value of each transaction is independent of the
The correspondence between the model-based esti- underlying transaction process—something we must do
mates of the expected number of transactions per here, given the nature of the data given in the Tuscan

62 JOURNAL OF INTERACTIVE MARKETING


Lifestyles case—the value per transaction (revenue per transaction × contribution margin) can be factored out, and we can focus on forecasting the "flow" of future transactions, discounted to yield a present value. Fader, Hardie, and Lee (2005) call this quantity "discounted expected transactions," or DET; it is the effective number of repeat transactions that a customer will make, discounted back to the time of acquisition. In other words, a transaction that occurs, say, 10 years in the future is only worth a fraction of a transaction at time zero. DET is the sum of these "fractional transactions" and therefore captures both the overall number of them as well as their spread over time. This number of discounted expected transactions can then be rescaled by a value "multiplier" to yield an overall estimate of expected lifetime value:

    E(CLV) = \text{margin} \times E(\text{revenue/transaction}) \times \text{DET}.    (11)

[FIGURE 3: Comparing the Average Annual Number of Transactions per Customer with the Corresponding Model-Based Estimates, by Cohort]

[FIGURE 4: Comparing the Actual (Solid Bar) and Predicted (Clear Bar) Distributions of the Number of Transactions in Years 4 and 5 (Given Parameters Estimated Using the First Three Years of Data), by Cohort]

TABLE 4  Years 4 and 5 and Combined MAPE Between the Actual and Predicted Values for Each Cohort (Given Parameters Estimated Using the First 3 Years of Data)

COHORT   YEAR 4   YEAR 5   COMBINED
< $50      4.6%    10.0%      7.3%
≥ $50      8.3%     7.6%      8.0%

Fader, Hardie, and Lee (2005) present an expression for DET as implied by the Pareto/NBD model. However, we cannot use it in this setting because of the


Journal of Interactive Marketing DOI: 10.1002/dir

modification to the basic model to accommodate the more-than-expected number of people making just one repeat purchase in the first year. We can therefore compute DET in the following manner:

    DET = Σ_{t=0}^{∞} E[X(t, t+1)] / (1 + d)^{t+0.5},    (12)

where d is the annual discount factor and E[X(t, t+1)] is computed using the expression given in (10). As the transactions can occur at any point in time during the year, we discount them as if, on average, they occur in the middle of the year. Note that this expression for DET does not include the initial transaction that signals the start of the customer's relationship with the firm. (If we wish to include this transaction, which we would need to do if we wish to set an upper bound for acquisition spend, we simply add 1 to our estimate of DET.)

Coming to the revenue/transaction component of the definition of CLV, we note that the Tuscan Lifestyles case only provides annual summary statistics of the spending levels for each cohort (Mason, 2003, Exhibit 3). While it is easy to conceptualize stochastic models to capture the random variation in revenue/transaction over time (Fader, Hardie, & Lee, 2005), it would be difficult to estimate the parameters of such a model reliably given the limited data on monetary value available here. Since the two groups were defined on the basis of initial expenditure, this removes much of the cross-sectional variation in revenue/transaction. Thus, it is more appropriate to assume a constant level for the purchase amounts within each group of customers. The case data indicate that the mean spending level across the 5 years for the < $50 group is (32.09 + 41.78 + 51.05 + 52.43 + 53.63)/5 = $46.20 per transaction, while for the ≥ $50 group it is (93.46 + 74.02 + 67.75 + 67.12 + 78.26)/5 = $76.12. Finally, we follow the case and use a fixed margin of 42% for every transaction, and a discount factor of 10% for our CLV calculations.

Using (10) and (12), we find that DET equals 2.36 for the < $50 group and 2.77 for the ≥ $50 group. (In evaluating (12), we terminate the series at 100 years, which effectively represents infinity.) It follows that our estimate of expected CLV for the < $50 group is $46, while the expected CLV for a randomly chosen member of the ≥ $50 group is almost double this value, at $89. Clearly, a customer who makes a high-value first purchase with Tuscan Lifestyles is more valuable in the long run than a customer who makes a low-value first purchase; the lone data point of the value of the first purchase is reasonably discriminating in determining a customer's future worth. Most of this difference is due to the fact that the average order size is 65% higher for the ≥ $50 cohort; in contrast, DET for the ≥ $50 cohort is only 17% higher than the corresponding number for the < $50 cohort.

The equivalent 5-year DET numbers using the annual averages computed from the data in Table 1 are 2.04 and 2.39, resulting in "5-year lifetime value" estimates of $40 and $76, respectively. Because of the truncation at the end of 5 years, these numbers underestimate the true expected lifetime value by 14%.

Some analysts may be willing to live with a 14% error for the sake of analytical simplicity. However, we cannot be sure that the underestimation will always be so low, for instance, when the variation in transaction rates and dropout rates is high. For the data at hand, not only is the mean purchasing rate low and the dropout rate high for both cohorts, but the variation in transaction rates and dropout rates for the cohorts is also quite low. In other studies (e.g., Fader, Hardie, & Lee, 2005), considerably higher heterogeneity (along with faster purchasing and slower dropout) has been observed. Thus, the 14% underestimation in this case is a very modest number; in many other settings, the impact of ignoring the future when performing CLV calculations will likely be much larger. And beyond the CLV calculation per se, the use of the model offers many other useful diagnostics, as discussed earlier and below.

Referring back to Figure 1, the between-cohort differences in the distributions of the dropout rates are smaller than those for the transaction rates. While the mean (β/(s − 1)) and median (β(2^{1/s} − 1)) lifetimes are slightly higher for the ≥ $50 cohort (3.5 and 2.4 years versus 3.4 and 2.2 years), the differences in the survival curves (Figure 5, left side) are negligible. Thus, the differences in DET are driven by differences in the transaction rates. We note that the mean of the transaction rate distribution is 0.88 (purchases per annum while alive) for the < $50 cohort and 1.04 for the ≥ $50 cohort. This difference is reflected in the plots of expected cumulative transactions (undiscounted), given on the right side of Figure 5.

FIGURE 5
Plots of the Percentage of Customers Still Alive and the Expected Cumulative Number of Transactions per Customer for Years 1–25, by Cohort
[Figure omitted: two line charts over years 0–25, "% Alive" (left) and "E(Cum. Transactions)" (right), each comparing the < $50 and ≥ $50 cohorts.]

As a final illustration of the value added by using a stochastic model of buyer behavior, let us consider the question of variability in CLV (or DET). To explore this, we simulate purchase sequences for each customer, which are then discounted to give "discounted transaction" numbers. The between-customer distribution of this quantity is reported in Figure 6 for both cohorts. This figure shows how the discounted transactions are spread around the expected DET for each cohort; computing the average of these numbers yields the average DET for each cohort, as reported above. We note that while the variance in transaction rates is lower for the ≥ $50 cohort (Figure 1), the variance in the discounted number of transactions is actually higher for this cohort (2.67 versus 2.14 for the < $50 cohort).

FIGURE 6
Distribution of Discounted Number of Transactions per Customer, by Cohort
[Figure omitted: one histogram per cohort of the discounted number of transactions per customer (0–12), with relative frequencies up to about 0.20.]

If we had sufficient data to estimate a stochastic model for revenue/transaction, we could augment our estimates of expected CLV with the full distribution of CLV across the customer base (and associated summary statistics).

DISCUSSION AND CONCLUSIONS

The Tuscan Lifestyles case offers a simple new twist on the standard view of how to value a newly acquired customer, highlighting how standard retention-based
approaches to the calculation of expected CLV are impractical in a noncontractual setting. It is a simple exercise to use the data presented in the case to arrive at an estimate of "expected 5-year CLV." However, if we wish to arrive at an estimate that includes the customer's "life" beyond 5 years, or are interested in, say, separating the purchasing process (while alive) from the attrition process, or computing the distribution of CLV, we need to use a formal model of buying behavior. While the Pareto/NBD model is a natural starting point, existing results do not allow us to estimate the model parameters using the data summaries presented in the case. A key contribution of this paper is the derivation of an expression that enables us to do this.

Our estimated parameters and subsequent calculations offer useful insights that could not have been obtained without the formal model. For instance, we were able to decompose the expected CLV into four factors, namely, purchasing while active, dropout, the surge in sales in the first year, and the monetary value of the average purchase. We observed a kind of "triple jeopardy" in that the more valuable cohort proved to be better on the three most critical factors (i.e., all but the first-year sales surge). This observation by itself deserves additional study, and may be the basis for an interesting "empirical generalization" about CLV differences across groups. By simply eyeballing the raw data, it might be possible to identify the existence of these factors, but it is impossible to assess their magnitudes and, more importantly, the differences in their magnitudes across the two cohorts. For example, one can observe a considerable dropout rate in both cohorts, but cannot ascertain how the within-cohort distributions of the dropout rates might differ. Similarly, a spike in purchases in the first year is quite evident from the histograms, but without the underlying "organic" model of purchasing, the magnitude of the spike cannot be obtained.

It is easy to see how these insights and projections can be of use to the management of Tuscan Lifestyles (and many other firms that face similar issues). Besides being able to judge the economic efficiency of different kinds of acquisition strategies, the model presented here can help managers determine better ways to define cohorts: does it make the most sense to divide customers on the basis of initial expenditure, or would other kinds of splits yield more dramatic differences between groups of customers? These differences should be gauged not only in terms of overall expected CLV for each group but also in terms of the Pareto/NBD model components. Perhaps a certain kind of split can lead to a greater degree of homogeneity in each group's transaction rates and/or dropout rates, thereby reducing some of the uncertainty about their future behavior and making it easier to target members of each group. There are clearly many substantive benefits that arise from this kind of analysis.

From a methodological standpoint, the move from detailed transaction data to histograms raises other questions as well. What about data structures that lie somewhere in between these two extremes? For instance, it is easy to imagine firms maintaining "interval-censored" data, that is, period-by-period counts for each customer. Some ideas about how to develop models using this kind of data structure are explored by Fader and Hardie (2005). Other questions relate to the length of the "window" for the censoring process (e.g., quarterly histograms versus yearly histograms) and the number of histograms needed to obtain stable parameter estimates. All in all, there are many promising research opportunities to be pursued down this path.

Although these methodological questions may be straying pretty far from the original issues raised in the Tuscan Lifestyles case, they provide proof of the healthy links that exist between well-formed managerial questions and appropriately constructed empirical models. New developments in one area frequently open up new possibilities in another, to the benefit of everyone on both sides. We see Tuscan Lifestyles as the beginning of such a dialogue, and we look forward to continuing the conversation.

REFERENCES

Abramowitz, M., & Stegun, I. A. (Eds.). (1972). Handbook of Mathematical Functions. New York: Dover Publications.
Agrawal, R., & Srikant, R. (2000). Privacy-Preserving Data Mining. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, Texas, May 15–18 (pp. 439–450). New York: ACM.
Berger, P. D., Weinberg, B., & Hanna, R. C. (2003). Customer Lifetime Value Determination and Strategic Implications for a Cruise-Ship Company. Journal of Database Marketing & Customer Strategy Management, 11, 40–52.
Carey, P. (2004). Data Protection: A Practical Guide to UK and EU Law (2nd ed.). Oxford, UK: Oxford University Press.
Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Pacific Grove, CA: Duxbury.
Fader, P. S., & Hardie, B. G. S. (2002). A Note on an Integrated Model of Customer Buying Behavior. European Journal of Operational Research, 139, 682–687.
Fader, P. S., & Hardie, B. G. S. (2005). Implementing the Pareto/NBD Model Given Interval-Censored Data. Retrieved April 28, 2007, from http://brucehardie.com/notes/011/
Fader, P. S., & Hardie, B. G. S. (2006). Deriving an Expression for P(X(t) = x) Under the Pareto/NBD Model. Retrieved April 28, 2007, from http://brucehardie.com/notes/012/
Fader, P. S., Hardie, B. G. S., & Lee, K. L. (2005). RFM and CLV: Using Iso-value Curves for Customer Base Analysis. Journal of Marketing Research, 42, 415–430.
Mason, C. H. (2003). Tuscan Lifestyles: Assessing Customer Lifetime Value. Journal of Interactive Marketing, 17, 54–60.
Mielikäinen, T. (2004). Privacy Problems with Anonymized Transaction Databases. In S. Arikawa & E. Suzuki (Eds.), Discovery Science: Proceedings of the 7th International Conference (DS2004), Lecture Notes in Computer Science, 3245 (pp. 219–229). Berlin: Springer.
Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the Theory of Statistics (3rd ed.). New York: McGraw-Hill.
Schmittlein, D. C., Morrison, D. G., & Colombo, R. (1987). Counting Your Customers: Who Are They and What Will They Do Next? Management Science, 33, 1–24.
Schmittlein, D. C., & Peterson, R. A. (1994). Customer Base Analysis: An Industrial Purchase Process Application. Marketing Science, 13, 41–67.
Singleton, S. (2004). Tolley's Data Protection Handbook (3rd ed.). Croydon, Surrey: LexisNexis UK.


TECHNICAL APPENDIX

Schmittlein, Morrison, and Colombo (1987) and Fader and Hardie (2006) derive expressions for P(X(t) = x), where the random variable X(t) denotes the number of transactions observed in the time interval (0, t], as implied by the Pareto/NBD model assumptions. In this appendix, we derive the corresponding expression for P(X(t, t+1) = x), where the random variable X(t, t+1) denotes the number of transactions observed in the time interval (t, t+1].

Let us first review the assumptions underlying the Pareto/NBD model:

i. Customers go through two stages in their "lifetime" with a specific firm: they are "alive" for some period of time, then become permanently inactive.

ii. While alive, the number of transactions made by a customer follows a Poisson process with transaction rate λ. This implies that the probability of observing x transactions in the time interval (0, t] is given by

    P(X(t) = x | λ) = (λt)^x e^{−λt} / x!,  x = 0, 1, 2, ….

It also implies that, assuming the customer is alive through the time interval (t_a, t_b],

    P(X(t_a, t_b) = x | λ) = [λ(t_b − t_a)]^x e^{−λ(t_b − t_a)} / x!,  x = 0, 1, 2, ….

iii. A customer's unobserved "lifetime" of length v (after which he is viewed as being inactive) is exponentially distributed with dropout rate μ:

    f(v | μ) = μ e^{−μv}.

iv. Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter r and scale parameter α:

    g(λ | r, α) = α^r λ^{r−1} e^{−λα} / Γ(r).    (A1)

v. Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter β:

    g(μ | s, β) = β^s μ^{s−1} e^{−μβ} / Γ(s).    (A2)

vi. The transaction rate λ and the dropout rate μ vary independently across customers.
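The data-generating process in (i)–(vi) can be simulated directly. The sketch below is ours, not the paper's: the function name and the parameter values in the usage example are hypothetical placeholders, not fitted cohort parameters.

```python
import random

def simulate_x_in_interval(r, alpha, s, beta, t, rng):
    # One synthetic customer drawn from assumptions (i)-(vi); returns the
    # number of transactions falling in the interval (t, t+1].
    lam = max(rng.gammavariate(r, 1.0 / alpha), 1e-12)  # (iv) transaction rate, gamma(r, alpha)
    mu = max(rng.gammavariate(s, 1.0 / beta), 1e-12)    # (v) dropout rate, gamma(s, beta)
    life = rng.expovariate(mu)                          # (iii) exponential lifetime
    count, clock = 0, rng.expovariate(lam)              # (ii) Poisson purchase process
    while clock < life:
        if clock > t + 1.0:
            break                                       # past the interval of interest
        if clock > t:
            count += 1                                  # purchase falls in (t, t+1]
        clock += rng.expovariate(lam)
    return count
```

Averaging indicator counts from many such draws gives a Monte Carlo estimate of the P(X(t, t+1) = x) probabilities derived analytically below.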

Suppose we know an individual's unobserved latent characteristics λ and μ. For x ≥ 1, there are two ways x purchases could have occurred in the interval (t, t+1]:

i. The individual was alive at t and remained alive through the whole interval; this occurs with probability e^{−μ(t+1)}. The probability of the individual making x purchases, given that he was alive during the whole interval, is λ^x e^{−λ} / x!. It follows that the probability of remaining alive through the interval (t, t+1] and making x purchases is

    (λ^x e^{−λ} / x!) e^{−μ(t+1)}.    (A3)


ii. The individual was alive at t but "died" at some point v (< t+1), making x purchases in the interval (t, v]. The probability of this occurring is

    ∫_t^{t+1} ([λ(v−t)]^x e^{−λ(v−t)} / x!) μ e^{−μv} dv
      = e^{−μt} (λ^x μ / (λ+μ)^{x+1}) ∫_0^1 ((λ+μ)^{x+1} s^x e^{−(λ+μ)s} / x!) ds,

which, noting that the integrand is an Erlang-(x+1) pdf, equals

    (λ/(λ+μ))^x (μ/(λ+μ)) e^{−μt} [1 − e^{−(λ+μ)} Σ_{i=0}^{x} (λ+μ)^i / i!].    (A4)

These two scenarios also hold for the case of x = 0 but need to be augmented by an additional reason as to why no purchases could have occurred in the interval (t, t+1]: the individual was dead at the beginning of the interval, which occurs with probability

    1 − e^{−μt}.    (A5)

Combining (A3)–(A5) gives us the following expression for the probability of observing x purchases in the interval (t, t+1], conditional on λ and μ:

    P(X(t, t+1) = x | λ, μ) = δ_{x=0} [1 − e^{−μt}] + (λ^x e^{−λ} / x!) e^{−μ(t+1)}
      + (λ/(λ+μ))^x (μ/(λ+μ)) e^{−μt}
      − (λ/(λ+μ))^x (μ/(λ+μ)) e^{−λ} e^{−μ(t+1)} Σ_{i=0}^{x} (λ+μ)^i / i!.    (A6)

In reality, we never know an individual's latent characteristics; we therefore remove the conditioning on λ and μ by taking the expectation of (A6) over the distributions of Λ and M:

    P(X(t, t+1) = x | r, α, s, β) = ∫_0^∞ ∫_0^∞ P(X(t, t+1) = x | λ, μ) g(λ | r, α) g(μ | s, β) dλ dμ.    (A7)

Substituting (A1), (A2), and (A6) in (A7) gives us

    P(X(t, t+1) = x | r, α, s, β) = δ_{x=0} A_1 + A_2 + A_3 − Σ_{i=0}^{x} (1/i!) A_4,    (A8)

where

    A_1 = ∫_0^∞ [1 − e^{−μt}] g(μ | s, β) dμ,    (A9)

    A_2 = ∫_0^∞ ∫_0^∞ (λ^x e^{−λ} / x!) e^{−μ(t+1)} g(λ | r, α) g(μ | s, β) dλ dμ,    (A10)

    A_3 = ∫_0^∞ ∫_0^∞ (λ/(λ+μ))^x (μ/(λ+μ)) e^{−μt} g(λ | r, α) g(μ | s, β) dλ dμ,    (A11)

    A_4 = ∫_0^∞ ∫_0^∞ (λ/(λ+μ))^x (μ/(λ+μ)) (λ+μ)^i e^{−λ} e^{−μ(t+1)} g(λ | r, α) g(μ | s, β) dλ dμ.    (A12)


Solving (A9) and (A10) is trivial:

    A_1 = 1 − (β/(β+t))^s,    (A13)

    A_2 = (Γ(r+x) / (Γ(r) x!)) (α/(α+1))^r (1/(α+1))^x (β/(β+t+1))^s.    (A14)
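(A13) and (A14) can be verified by direct quadrature against the defining integrals (A9) and (A10). A sketch with arbitrary (hypothetical) parameter values, chosen with r, s > 1 so the simple trapezoidal rule behaves well near zero:

```python
from math import exp, lgamma, log, factorial

def gamma_pdf(v, shape, rate):
    # Gamma density in the shape/rate parameterisation of (A1) and (A2).
    if v <= 0.0:
        return 0.0
    return exp(shape * log(rate) + (shape - 1.0) * log(v) - rate * v - lgamma(shape))

def trapz(f, a, b, n=200000):
    # Simple trapezoidal rule.
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

r, alpha, s, beta, t, x = 1.3, 1.2, 1.6, 2.5, 2.0, 3  # arbitrary test values

# (A9) versus (A13):
a1_num = trapz(lambda m: (1.0 - exp(-m * t)) * gamma_pdf(m, s, beta), 0.0, 60.0)
a1_closed = 1.0 - (beta / (beta + t)) ** s

# (A10) factorises into two one-dimensional integrals; compare with (A14):
a2_num = (trapz(lambda l: l ** x * exp(-l) / factorial(x) * gamma_pdf(l, r, alpha), 0.0, 60.0)
          * trapz(lambda m: exp(-m * (t + 1.0)) * gamma_pdf(m, s, beta), 0.0, 60.0))
a2_closed = (exp(lgamma(r + x) - lgamma(r) - lgamma(x + 1))
             * (alpha / (alpha + 1.0)) ** r * (alpha + 1.0) ** (-x)
             * (beta / (beta + t + 1.0)) ** s)
```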

To solve (A11), consider the transformation Y = M/(Λ + M) and Z = Λ + M. Using the transformation technique (Casella & Berger, 2002, Section 4.3, pp. 156–162; Mood, Graybill, & Boes, 1974, Section 6.2, p. 204ff), it follows that the joint distribution of Y and Z is

    g(y, z | α, β, r, s) = (α^r β^s / (Γ(r) Γ(s))) y^{s−1} (1−y)^{r−1} z^{r+s−1} e^{−z(α−(α−β)y)}.    (A15)

Noting that the inverse of this transformation is λ = (1−y)z and μ = yz, it follows that

    A_3 = ∫_0^1 ∫_0^∞ y (1−y)^x e^{−yzt} g(y, z | α, β, r, s) dz dy
        = (α^r β^s / (Γ(r) Γ(s))) ∫_0^1 ∫_0^∞ y^s (1−y)^{r+x−1} z^{r+s−1} e^{−z(α−(α−(β+t))y)} dz dy
        = (α^r β^s / (B(r, s) α^{r+s})) ∫_0^1 y^s (1−y)^{r+x−1} [1 − ((α−(β+t))/α) y]^{−(r+s)} dy,

which, recalling Euler's integral for the Gaussian hypergeometric function,¹ equals

    A_3 = (β/α)^s (B(r+x, s+1) / B(r, s)) ₂F₁(r+s, s+1; r+s+x+1; (α−(β+t))/α).    (A16)
Looking closely at (A16), we see that the argument of the Gaussian hypergeometric function, (α−(β+t))/α, is guaranteed to be bounded between 0 and 1 when α ≥ β + t, thus ensuring convergence of the series representation of the function. However, when α < β + t we can be faced with the situation where (α−(β+t))/α < −1, in which case the series is divergent.
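Euler's integral for ₂F₁, namely ∫_0^1 y^{b−1} (1−y)^{c−b−1} (1−zy)^{−a} dy = B(b, c−b) ₂F₁(a, b; c; z), is easy to check numerically. A sketch with arbitrary (hypothetical) parameter values satisfying c > b > 0 and |z| < 1:

```python
from math import exp, lgamma

def hyp2f1(a, b, c, z, tol=1e-14):
    # Gauss series for 2F1(a, b; c; z); converges for |z| < 1.
    term = total = 1.0
    n = 0
    while abs(term) > tol and n < 10000:
        term *= (a + n) * (b + n) / (c + n) * z / (n + 1.0)
        total += term
        n += 1
    return total

def beta_fn(p, q):
    # Beta function via log-gammas.
    return exp(lgamma(p) + lgamma(q) - lgamma(p + q))

a, b, c, z = 1.3, 1.5, 3.8, 0.4   # arbitrary test point

# Left-hand side of Euler's integral, by the trapezoidal rule:
n = 200000
h = 1.0 / n
f = lambda y: y ** (b - 1.0) * (1.0 - y) ** (c - b - 1.0) * (1.0 - z * y) ** (-a)
lhs = h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, n)) + 0.5 * f(1.0))
rhs = beta_fn(b, c - b) * hyp2f1(a, b, c, z)
```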

Applying the linear transformation (Abramowitz & Stegun, 1972, equation 15.3.4)

    ₂F₁(a, b; c; z) = (1−z)^{−a} ₂F₁(a, c−b; c; z/(z−1))    (A17)

gives us

    A_3 = (α^r β^s / (β+t)^{r+s}) (B(r+x, s+1) / B(r, s)) ₂F₁(r+s, r+x; r+s+x+1; (β+t−α)/(β+t)).    (A18)

We note that the argument of the above Gaussian hypergeometric function is bounded between 0 and 1 when α ≤ β + t. We therefore present (A16) and (A18) as solutions to (A11), using (A16) when α ≥ β + t and (A18) when α < β + t. We can write this as

    A_3 = α^r β^s (B(r+x, s+1) / B(r, s)) B_1,    (A19)

¹ See http://functions.wolfram.com/07.23.07.0001.01


where

    B_1 = ₂F₁(r+s, s+1; r+s+x+1; (α−(β+t))/α) / α^{r+s}          if α ≥ β + t,
    B_1 = ₂F₁(r+s, r+x; r+s+x+1; (β+t−α)/(β+t)) / (β+t)^{r+s}    if α < β + t.    (A20)
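The linear transformation (A17), which moves between the two branches above, can likewise be spot-checked numerically. A sketch at an arbitrary (hypothetical) point where both z and z/(z−1) lie inside the unit interval of convergence:

```python
def hyp2f1(a, b, c, z, tol=1e-14):
    # Gauss series for 2F1(a, b; c; z); converges for |z| < 1.
    term = total = 1.0
    n = 0
    while abs(term) > tol and n < 10000:
        term *= (a + n) * (b + n) / (c + n) * z / (n + 1.0)
        total += term
        n += 1
    return total

a, b, c, z = 1.8, 0.9, 3.1, -0.6   # arbitrary test point; z/(z-1) = 0.375
lhs = hyp2f1(a, b, c, z)
rhs = (1.0 - z) ** (-a) * hyp2f1(a, c - b, c, z / (z - 1.0))
```

Both sides agree to series tolerance, which is exactly how (A18) and (A22) are obtained from (A16) and (A21).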

To solve (A12), we also make use of the transformation Y = M/(Λ + M) and Z = Λ + M. Given (A15), it follows that

    A_4 = ∫_0^1 ∫_0^∞ y (1−y)^x z^i e^{−(1−y)z} e^{−yz(t+1)} g(y, z | α, β, r, s) dz dy
        = (α^r β^s / (Γ(r) Γ(s))) ∫_0^1 ∫_0^∞ y^s (1−y)^{r+x−1} z^{r+s+i−1} e^{−z(α+1−(α−(β+t))y)} dz dy
        = (Γ(r+s+i) α^r β^s / (Γ(r) Γ(s) (α+1)^{r+s+i})) ∫_0^1 y^s (1−y)^{r+x−1} [1 − ((α−(β+t))/(α+1)) y]^{−(r+s+i)} dy,

which, recalling Euler's integral for the Gaussian hypergeometric function, equals

    A_4 = (Γ(r+s+i) / Γ(r+s)) (α^r β^s / (α+1)^{r+s+i}) (B(r+x, s+1) / B(r, s)) ₂F₁(r+s+i, s+1; r+s+x+1; (α−(β+t))/(α+1)).    (A21)

Noting that the argument of the Gaussian hypergeometric function is only guaranteed to be bounded between 0 and 1 when α ≥ β + t, we apply the linear transformation (A17), which gives us

    A_4 = (Γ(r+s+i) / Γ(r+s)) (α^r β^s / (β+t+1)^{r+s+i}) (B(r+x, s+1) / B(r, s)) ₂F₁(r+s+i, r+x; r+s+x+1; (β+t−α)/(β+t+1)).    (A22)

The argument of the above Gaussian hypergeometric function is bounded between 0 and 1 when α ≤ β + t. We therefore present (A21) and (A22) as solutions to (A12): we use (A21) when α ≥ β + t and (A22) when α < β + t. We can write this as

    A_4 = α^r β^s (Γ(r+s+i) / Γ(r+s)) (B(r+x, s+1) / B(r, s)) B_2,    (A23)

where

    B_2 = ₂F₁(r+s+i, s+1; r+s+x+1; (α−(β+t))/(α+1)) / (α+1)^{r+s+i}      if α ≥ β + t,
    B_2 = ₂F₁(r+s+i, r+x; r+s+x+1; (β+t−α)/(β+t+1)) / (β+t+1)^{r+s+i}    if α < β + t.    (A24)

Substituting (A13), (A14), (A19), and (A23) in (A8) gives us the following expression for the probability of observing x transactions in the time interval (t, t+1]:

    P(X(t, t+1) = x | r, α, s, β) = δ_{x=0} [1 − (β/(β+t))^s]
      + (Γ(r+x) / (Γ(r) x!)) (α/(α+1))^r (1/(α+1))^x (β/(β+t+1))^s
      + α^r β^s (B(r+x, s+1) / B(r, s)) { B_1 − Σ_{i=0}^{x} (Γ(r+s+i) / (Γ(r+s) i!)) B_2 },    (A25)

where expressions for B_1 and B_2 are given in (A20) and (A24), respectively.
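Equation (A25) can be implemented directly and fed into the DET and CLV calculations of the main text. The sketch below is ours: a plain Gauss series suffices for ₂F₁ because, on the appropriate branch, the arguments above lie in [0, 1); the parameter values in the self-check are hypothetical, while the DET values (2.36 and 2.77), spend levels, margin, and discount factor are the ones reported in the text.

```python
from math import exp, lgamma

def hyp2f1(a, b, c, z, tol=1e-14):
    # Gauss series for 2F1(a, b; c; z); converges for |z| < 1.
    term = total = 1.0
    n = 0
    while abs(term) > tol and n < 20000:
        term *= (a + n) * (b + n) / (c + n) * z / (n + 1.0)
        total += term
        n += 1
    return total

def p_x(x, r, alpha, s, beta, t):
    # Eq. (A25): P(X(t, t+1) = x | r, alpha, s, beta).
    part1 = (1.0 - (beta / (beta + t)) ** s) if x == 0 else 0.0
    part2 = (exp(lgamma(r + x) - lgamma(r) - lgamma(x + 1))
             * (alpha / (alpha + 1.0)) ** r * (alpha + 1.0) ** (-x)
             * (beta / (beta + t + 1.0)) ** s)
    # B(r+x, s+1) / B(r, s), via log-gammas.
    bratio = exp(lgamma(r + x) + lgamma(s + 1.0) - lgamma(r + s + x + 1.0)
                 - lgamma(r) - lgamma(s) + lgamma(r + s))
    if alpha >= beta + t:   # B1, first branch of (A20)
        b1 = hyp2f1(r + s, s + 1.0, r + s + x + 1.0,
                    (alpha - beta - t) / alpha) / alpha ** (r + s)
    else:                   # B1, second branch of (A20)
        b1 = hyp2f1(r + s, r + x, r + s + x + 1.0,
                    (beta + t - alpha) / (beta + t)) / (beta + t) ** (r + s)
    acc = 0.0
    for i in range(x + 1):  # sum of Gamma(r+s+i)/(Gamma(r+s) i!) * B2, per (A24)
        if alpha >= beta + t:
            b2 = hyp2f1(r + s + i, s + 1.0, r + s + x + 1.0,
                        (alpha - beta - t) / (alpha + 1.0)) / (alpha + 1.0) ** (r + s + i)
        else:
            b2 = hyp2f1(r + s + i, r + x, r + s + x + 1.0,
                        (beta + t - alpha) / (beta + t + 1.0)) / (beta + t + 1.0) ** (r + s + i)
        acc += exp(lgamma(r + s + i) - lgamma(r + s) - lgamma(i + 1.0)) * b2
    return part1 + part2 + alpha ** r * beta ** s * bratio * (b1 - acc)

def det_from_flow(expected_txns, d=0.10):
    # Eq. (12): mid-year discounting of the expected transaction flow.
    return sum(e / (1.0 + d) ** (t + 0.5) for t, e in enumerate(expected_txns))

def expected_clv(det, avg_spend, margin=0.42):
    # E(CLV) = contribution margin x average spend per transaction x DET.
    return margin * avg_spend * det
```

With E[X(t, t+1)] = Σ_x x·p_x(x, …) fed into `det_from_flow`, this reproduces the pipeline used in the text; plugging in the reported DET values gives expected_clv(2.36, 46.20) ≈ $46 and expected_clv(2.77, 76.12) ≈ $89. A useful self-check on the whole derivation is that, for any (hypothetical) parameter values, the p_x probabilities sum to one.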
