
Expert Systems with Applications 41 (2014) 387–396

A causal inference approach to measure price elasticity in Automobile Insurance

Leo Guelman (a,*), Montserrat Guillén (b)

(a) Royal Bank of Canada, RBC Insurance, 6880 Financial Drive, Mississauga, Ontario L5N 7Y5, Canada
(b) Dept. Econometrics, Riskcenter, University of Barcelona, Diagonal 690, E-08034 Barcelona, Spain
(*) Corresponding author. Tel.: +1 905 606 1175; fax: +1 905 286 4756. E-mail addresses: [email protected] (L. Guelman), [email protected] (M. Guillén).

Keywords: Causal inference; Price elasticity; Price optimization; Insurance

Abstract

Understanding the precise nature of price sensitivities at the individual policyholder level is extremely valuable for insurers. A rate increase has a direct impact on the premium customers are paying, but there is also an indirect impact as a result of the "causal" effect of the rate change on the customer's decision to renew the policy term. A rate increase may impair its intended impact on the overall profitability of the portfolio if it causes a large number of policyholders to lapse their policy and switch to an alternative insurer. The difficulty in measuring price elasticity from most insurance databases is that historical rate changes are reflective of a risk-based pricing exercise. As a result, the specific rate change to which a customer is exposed is a deterministic function of her observed covariates. The nature of the data is thus observational, rather than experimental. In this context, measuring the causal effect of a rate change on the policyholder's lapse outcome requires special modeling considerations. Conventional modeling approaches that aim to directly fit the lapse outcome as a function of the rate change and background covariates are likely to be inappropriate for the problem at hand. In this paper, we propose a causal inference framework to measure price elasticity in the context of Auto Insurance. One of the strengths of our approach is its transparency about the extent to which the database can support causal effects from rate changes. The model also allows us to more reliably estimate price-elasticity functions at the individual policyholder level. As the causal effect of a rate change varies across individuals, making an accurate rate change choice at the individual subject level is essential. The rate to which each subject is exposed could be optimized on the basis of individual characteristics, for the purpose of maximizing the overall expected profitability of the portfolio.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Cost-based pricing of individual risks is a fundamental concept in the actuarial ratemaking literature. The goal of ratemaking methodologies is to estimate the future costs related to the insurance coverage. The loss cost approach defines the price of an insurance policy as the ratio of the estimated costs of all expected future claims against the coverage provided by the policy to the risk exposure, plus expenses (Denuit et al., 2007). There is a wealth of actuarial literature regarding appropriate methodologies for using exposure and claims data in order to calculate indicated rates (Brown and Gottlieb, 2007; Finger, 2006).

A revised set of rates will impact the profitability of an insurance portfolio due to its direct impact on the premiums that policyholders are paying. However, there is also an indirect impact as a result of the policyholders' reaction to the rate change. As basic Auto Insurance is mandatory in many countries, a rate change exceeding a certain threshold will make a policyholder more likely to shop for an alternative insurer and potentially switch to another company. If the rate change causes a large number of customers to lapse their policy, the revised rates could impair their intended impact on the profitability of the insurance portfolio.

In recent years, insurers have been switching from pure cost-based pricing to demand-based pricing. Price optimization strategies (Towers Perrin, 2007) aim to integrate cost-based pricing and the customer's willingness to pay into an overall pricing framework. A key component of this framework involves predicting, to a high degree of accuracy, how customers will respond to alternative rate changes, conditional on the customer's characteristics being held fixed.¹

¹ An additional issue is the reaction to new products or cross-selling (see, for instance, Kaishev et al., 2012; Thuring et al., 2012).

0957-4174/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2013.07.059

If we let for a moment the rate change play the role of a treatment with varying 'dose' levels, the main problem involves the selection of optimal treatments for individuals on the basis of estimates of potential outcomes resulting from treatment alternatives. A similar kind of estimation problem is shared across many disciplines, ranging from economics to medicine. In this sense, the price elasticity problem can be conceived under a causal inference framework, which is typically interested in questions of the form "what would happen to a subject had she been exposed to treatment B instead of A?". The alternative choice B is a counterfactual with an associated potential outcome. Thus, considerations about potential outcomes from alternative treatment choices seem inescapable from the price elasticity estimation problem.

A randomized controlled experiment is generally the best approach for drawing statistical inferences about effects caused by treatments. The most effective way to measure price elasticity at the portfolio level would be to randomize the allocation of policyholders to various treatment levels and then measure the impact on retention. However, in the most common situation, insurance databases contain historical price changes which are reflective of a risk-based pricing exercise. Under this situation, treatment assignment is a deterministic function of the policyholder's observed risk characteristics. The nature of the data is thus observational rather than experimental, as randomization is not used to assign treatments. In the absence of experimental design, causal inference is more difficult and requires appropriate modeling techniques.

The standard actuarial approach to measure price elasticity in insurance is to model the policyholder's lapse behavior as a function of the rate change and the policyholder's covariates (Anderson et al., 2007; Smith et al., 2000; Yeo et al., 2001). The key assumption is that the inclusion of those covariates will adjust for the potential exposure correlations between price elasticity and other explanatory variables. This approach will be unreliable for estimating causal effects from observational data due to masked extrapolation problems, and the sensitivity of the results to unwarranted assumptions about the form of the extrapolation (Berk, 2004, p. 115; Guo and Fraser, 2009, p. 82; Rubin, 1973, 1979; Morgan and Winship, 2007, p. 129). The problem is even worse when the number of explanatory variables is large, as groups may differ in a multivariate direction and so non-overlap problems are more difficult to detect (Rubin, 1997). Standard statistical software can be remarkably deceptive for this objective because regression diagnostics do not include careful analysis of the distribution of the predictors across treatment groups. When the overlap is too limited, the data cannot support any causal conclusions about the differential effects of treatments (Englund et al., 2008; Guelman et al., 2012; Guillén et al., 2012).

In this article, we propose a method for estimating price elasticity with roots in Rubin's causal model (Rosenbaum and Rubin, 1983, 1984; Rubin and Waterman, 2006). One of the strengths of our approach is the transparency about the data support for estimating the impact of rate changes on customer retention at the portfolio level. The model also allows us to more reliably estimate individual price-elasticity functions. As the causal effect of a rate change varies across individuals, an accurate choice of the treatment at the individual subject level is essential. Each subject's treatment could be optimized on the basis of individual characteristics, and thus maximize the overall positive impact of the rate change intervention.

This article is organized as follows. We first formalize the price elasticity estimation problem from a causal inference perspective. We follow with an overview of the key assumptions required to derive unbiased estimates of average causal effects caused by treatment interventions from observational data. Propensity scores and matching algorithms are discussed next. The second half of the paper presents a detailed application of our approach to price elasticity estimation in the context of Auto Insurance. Managerial implications and a conclusion are outlined at the end.

2. Price elasticity as a causal inference problem

We postulate the problem in the context of Rubin's model of causal inference. This model conceptualizes the causal inference problem in terms of potential outcomes under each treatment, only one of which is observed for each subject. In this paper, we draw on the terminology and framework of experiments, and use the words treatment and rate change interchangeably. The notation introduced below will be used throughout the paper.

The insurance portfolio is composed of $L$ policyholders, $\ell = \{1, 2, \ldots, L\}$, characterized by a vector of pre-treatment covariates $x_\ell$. We consider the case of $T$ treatments (representing rate change levels), indexed by $t = \{1, 2, \ldots, T\}$. We let $Z_{\ell t}$ be a set of $T$ binary treatment indicators, such that $Z_{\ell t} = 1$ if subject $\ell$ received treatment $t$, and $Z_{\ell t} = 0$ otherwise. We postulate the existence of potential responses $r_{\ell t}$ to denote the renewal outcome² that would be observed from policyholder $\ell$ if assigned to treatment $t$. The observed response for subject $\ell$ is $R_\ell = \sum_{t=1}^{T} Z_{\ell t}\, r_{\ell t}$.

Our interest lies in estimating price elasticity, defined here as the expected renewal outcomes that result from, and are caused by, the rate change interventions. Here causation is in the sense of ceteris paribus, meaning that we hold all the policyholder's covariates constant. Our aim is to obtain an estimate of the price-elasticity functions at the policyholder level, $\hat{r}_{\ell t} \; \forall t = \{1, \ldots, T\}$, and in particular of differences of the form $\hat{r}_{\ell j} - \hat{r}_{\ell k}$, the causal effect of exposing subject $\ell$ to treatment $j$ rather than to treatment $k$ (for any $j \neq k$). We then use these individual estimates to construct an aggregate price-elasticity function at the portfolio level, $\hat{\mu}(t) = (1/L)\sum_{\ell=1}^{L} \hat{r}_{\ell t}$. If the variability of the causal effect $\hat{r}_{\ell j} - \hat{r}_{\ell k}$ is large over $L$, then the average may not represent the causal effect on a specific policyholder $\ell$. The assumption that the effect of $t$ is the same on every subject is known as the constant treatment effect, and it is relaxed in this study.

In the context of observational data, policyholders exposed to different rate change levels are not directly comparable. As a result, price-elasticity estimation requires adjustment for differences in the pre-treatment covariates. As discussed above, when the number of covariates is large and their distribution varies substantially among the different rate change levels, simple covariance adjustment methods are typically inadequate. In this paper, we propose using propensity scores (Rosenbaum and Rubin, 1983) and matching algorithms (Gu and Rosenbaum, 1993) as a method for removing all biases associated with differences in the pre-treatment variables. Our methodology offers a rigorous analysis of price elasticity in the context of Auto Insurance based on causal inference foundations. The next section discusses the method in detail.

3. The method

Without loss of generality, in this section we will present the method in a simplified case. We will focus on the binary treatment case, with $t = \{0, 1\}$, and let $Z_\ell = 1$ if subject $\ell$ received the first treatment (the treated subjects), and $Z_\ell = 0$ if it received the alternative treatment (the control subjects). In the context of this study, multi-valued treatments are handled by analyzing a set of binary treatment dichotomies. That is, given $T$ treatments, we analyze the $T(T-1)/2$ unordered dichotomies.³

² We denote the renewal outcome equal to 1 if the policyholder lapses (does not renew), and 0 otherwise.
³ For example, with three treatments ($T = 3$), there are $3 = T(T-1)/2$ unordered treatment dichotomies: {(1, 2), (1, 3), (2, 3)}.
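To fix ideas, the notation above can be expressed in a few lines of Python. The sketch below uses synthetic sizes and outcomes purely for illustration; in practice the estimates $\hat{r}_{\ell t}$ come from the models developed in Section 4.

```python
import numpy as np
from itertools import combinations

L, T = 1000, 5                      # hypothetical portfolio size and rate levels
rng = np.random.default_rng(0)

# One-hot treatment indicators Z[l, t] (each row sums to 1) and potential
# lapse outcomes r[l, t]; only one outcome per subject is ever observed.
Z = np.eye(T)[rng.integers(0, T, size=L)]
r = rng.binomial(1, 0.06, size=(L, T)).astype(float)

# Observed response R_l = sum_t Z_lt * r_lt.
R = (Z * r).sum(axis=1)

r_hat = r                           # placeholder for model-based estimates
effect_2_vs_1 = r_hat[:, 1] - r_hat[:, 0]   # subject-level effect of t=2 vs t=1
mu_hat = r_hat.mean(axis=0)         # aggregate mu_hat(t) = (1/L) sum_l r_hat[l, t]

# Multi-valued treatments are handled through T(T-1)/2 unordered dichotomies.
dichotomies = list(combinations(range(1, T + 1), 2))
assert len(dichotomies) == T * (T - 1) // 2   # 10 pairs for T = 5
```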

3.1. Unconfoundedness and common support

The fundamental problem of causal inference (Holland, 1986) is that each subject receives only a single treatment, and thus $r_{\ell t}$ is only observed for a single value of $t$. Hence, causal inference is, in a sense, a missing data problem because the counterfactual outcomes are never observed. In principle, if treatment assignment was randomized across the portfolio of policyholders, which implies that the assignment ignores the possible impact of the treatment on the outcomes, then estimating the average causal effects of rate changes would be straightforward. Randomization tends to balance observed and unobserved covariates across the treatments, as subjects are drawn from the same population. In this context, the average treatment effect (ATE) of treatment 1 relative to treatment 0 can be estimated by

$$\mathrm{ATE} = E[r_{\ell 1} \mid Z_\ell = 1] - E[r_{\ell 0} \mid Z_\ell = 0] = E[R_\ell \mid Z_\ell = 1] - E[R_\ell \mid Z_\ell = 0]. \quad (1)$$

Such randomization is unlikely to happen in most insurance databases from which price elasticity is to be estimated. In most cases, these databases contain historical price changes which are reflective of a risk-based pricing exercise. Under this situation, treatment assignment is a deterministic function of the policyholder's observed risk characteristics. The nature of the data is thus observational, as randomization is not used to assign treatments. In this setting, covariates are not likely to be balanced across treatment groups. Estimating causal effects is more difficult as now the groups are not directly comparable. However, much progress can be made under two assumptions. The first is the unconfoundedness assumption, which states that conditional on $x_\ell$, the outcomes $(r_{\ell 1}, r_{\ell 0})$ are independent of treatment $Z_\ell$,

$$(r_{\ell 1}, r_{\ell 0}) \perp Z_\ell \mid x_\ell. \quad (2)$$

This condition implies that treatment assignment may depend upon the observed covariates $x$, but not on unobserved covariates or potential responses after controlling for $x$. This assumption is non-testable, but very likely to hold in our study, as all the historical variables used to assign policyholders to rate change levels are observable covariates, and have been stored and are accessible to us for the modeling exercise.

The second assumption is the common support (a.k.a. overlap), which states that every unit in the population has a chance of receiving both treatments,

$$0 < p(x_\ell) \equiv P(Z_\ell = 1 \mid x_\ell) < 1, \quad (3)$$

where $p(x)$ is known as the propensity score, discussed in the next section. This assumption is at risk in situations where treatment assignments are based on 'hard rules' (e.g., every policyholder whose age exceeds some constant $c$ receives a rate change, and no rate change otherwise). However, in many situations, rate changes are implemented using much more convoluted frameworks, creating opportunities for finding common support situations. In Rosenbaum and Rubin (1983), unconfoundedness and common support together constitute a property known as strong ignorability, which is necessary for identifying average treatment effects.

Common support may not hold globally, but only for a subset of the covariate space. Causal effect estimation is still possible for the region of $x$ in which the treatment and control observations overlap. In the specific case that the support of $x$ for the treated is a subset of the support of $x$ for the control observations, a quantity of common interest is the average treatment effect for the treated (ATT), which is identifiable under unconfoundedness, and is estimated as

$$\mathrm{ATT}(Z_\ell = 1) = E[r_{\ell 1} \mid Z_\ell = 1] - E[r_{\ell 0} \mid Z_\ell = 1] = E_{x_\ell \mid Z_\ell = 1}\left\{ E[R_\ell \mid x_\ell, Z_\ell = 1] - E[R_\ell \mid x_\ell, Z_\ell = 0] \mid Z_\ell = 1 \right\}, \quad (4)$$

where the subscript $x_\ell \mid Z_\ell = 1$ indicates that the outer expectation is taken over the distribution of $x$ in the treated group. Finding treated and control observations with similar values of the covariates will be impractical, if not impossible, with many covariates. Alternative methods must be used, which are discussed in the next two sections.

3.2. Propensity score

The propensity score is the conditional probability of assignment to treatment 1 given the pre-treatment covariates,

$$p(x_\ell) = P(Z_\ell = 1 \mid x_\ell). \quad (5)$$

In a randomized experiment, treatment assignment is performed by a coin flip, and so the propensity score $p(x_\ell) = 1/2 \; \forall x_\ell$. In this case, the results in the two treatment groups are directly comparable as subjects are likely to be similar. In contrast, in an observational study, the propensity score is typically unknown, and must be estimated from observed quantities. Direct comparisons can be misleading as some individuals are more likely than others to receive one of the treatments, and so $p(x_\ell) \neq 1/2$ for some individuals. However, suppose we pair subjects with different treatments, but the same propensity score. The individual pairs might have different covariate values, but their differences will be irrelevant for predicting treatment assignment. Intuitively, this also suggests that the distribution of the observed covariates will be similar for treated and control subjects with the same propensity score. This thought is formalized by the balancing property of the propensity score, which states that treatment $Z$ and the observed covariates $x$ are conditionally independent given the propensity score $p(x)$,

$$Z \perp x \mid p(x). \quad (6)$$

Instead of having to match subjects exactly on their covariates $x$, the balancing property allows us to match only on a single variable, namely the propensity score, and this will tend to balance all the observed covariates. Notice that this property only ensures balance on the observed covariates. In that sense, randomization is a much more effective tool to balance covariates, as it also provides a basis for expecting that unobserved covariates and potential responses are also balanced.⁴

The balancing property holds independently of whether the treatment assignment is strongly ignorable or not. However, a second property of the propensity score is a key result which shows that if treatment assignment is strongly ignorable given $x$, then it is also strongly ignorable given the propensity score $p(x)$. That is, if (2) and (3) hold, then the following also hold,

$$(r_{\ell 1}, r_{\ell 0}) \perp Z_\ell \mid p(x_\ell), \quad (7)$$

$$0 < P(Z_\ell = 1 \mid p(x_\ell)) < 1. \quad (8)$$

This property states that if treatment assignment is strongly ignorable, then pair matching based on the propensity score is sufficient to produce unbiased estimates of the average treatment effect.

⁴ One should still expect imbalances on observed covariates to occur in a randomized setting; in fact, 1 out of 20 covariates should differ at the 0.05 level by chance alone.
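As an illustrative sketch, the linear propensity score and a simple common support check for one treatment dichotomy could be computed as follows (assuming scikit-learn; the range-intersection rule below is one simple convention for delimiting the overlap region, not the only possible one):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_propensity(X, z):
    """Estimate p(x) = P(Z = 1 | x) and return the linear predictor.

    Working on the log-odds scale avoids the compression of estimated
    probabilities near zero and one (see Section 3.3).
    """
    model = LogisticRegression(max_iter=1000).fit(X, z)
    return model.decision_function(X)   # log-odds of treatment assignment

def common_support(lp, z):
    """Flag units inside a simple common support region.

    Keeps units whose linear propensity score falls within the
    intersection of the ranges observed in both treatment groups.
    """
    lo = max(lp[z == 1].min(), lp[z == 0].min())
    hi = min(lp[z == 1].max(), lp[z == 0].max())
    return (lp >= lo) & (lp <= hi)
```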

3.3. Matching: a short review

The essential idea of matching algorithms is to find pairs of subjects, where one member of the pair has been exposed to the treatment group and the other to the control group, but they are otherwise identical in terms of their observed covariates before the time of exposure. Finding such pairs for all subjects is a difficult or impossible task when $x$ contains many covariates, and this is where the propensity score comes into play. Matching serves the following two purposes. First, if we can find among the $L$ subjects a total of $2J$ distinct subjects matched in $J$ pairs, we would have reconstructed a randomized experiment from observational data. Inference about average effects caused by treatments would then be straightforward. Second, having formed the $J$ closely matched pairs, we could use the observed response on one subject of the pair to fill in the 'missing' counterfactual response for the other subject of the pair, and thereby use the difference between the responses as an estimate of the subject-level causal effect (Rubin and Waterman, 2006).

Using matching to find pairs of subjects may be straightforward in concept, but there are many variants on how this can be achieved. The selection of a matching method essentially involves three choices. First, there is the definition of distance between treated and control subjects in terms of their observed covariate vectors. Second, there is the choice of the algorithm used to form the matched pairs so as to make the distance small. Lastly, there is the choice of the structure of the match, which involves deciding the number of treated and control subjects that should be included in each match set.

Let us first consider the definition of distance. A common method for multivariate matching is based on the Mahalanobis distance (Cochran and Rubin, 1973; Rubin, 1979). The Mahalanobis distance between any two subjects, say $\ell_1$ and $\ell_2$, with covariate vectors $x_{\ell_1}$ and $x_{\ell_2}$ is

$$MD(x_{\ell_1}, x_{\ell_2}) = \left\{ (x_{\ell_1} - x_{\ell_2})^\top S^{-1} (x_{\ell_1} - x_{\ell_2}) \right\}^{1/2}, \quad (9)$$

where $S$ is the sample covariance matrix of $X$, the data matrix of observations containing the $x_\ell^\top$ row vectors, $\ell = \{1, \ldots, L\}$. The Mahalanobis distance is appropriate for multivariate Normal data, but it can exhibit odd behavior in the presence of highly skewed distributions or heavily tied covariates, such as rare binary variables. A more robust alternative is to use the rank-based Mahalanobis distance (Rosenbaum, 2010)

$$RMD(x_{\ell_1}, x_{\ell_2}) = \left\{ (r(x_{\ell_1}) - r(x_{\ell_2}))^\top (URU)^{-1} (r(x_{\ell_1}) - r(x_{\ell_2})) \right\}^{1/2}, \quad (10)$$

where each covariate in $x_\ell$ is replaced by its ranks $r(x_\ell)$, $R$ is the covariance matrix of the ranks, and $U$ is a diagonal matrix whose elements are the ratios of the standard deviation of the untied ranks to the standard deviation of the tied ranks of the covariates.

Propensity score matching involves matching a treated unit to the nearest control unit based on the distance along the propensity score

$$PS(x_{\ell_1}, x_{\ell_2}) = |\hat{p}(x_{\ell_1}) - \hat{p}(x_{\ell_2})|. \quad (11)$$

In practice, the propensity score must be estimated by, for example, a logistic model. In that case, distance is generally defined in terms of the estimated linear predictor, rather than on the estimated propensity score $\hat{p}(x)$. This avoids compression of probabilities near zero and one. Additionally, the linear predictor is often more nearly normally distributed, which has a technically justified advantage under certain data conditions and matching methods (see Rosenbaum and Rubin, 1985; Rubin, 1976).

Matched samples may be evaluated based on two different, but desirable, features. One is based on the balance criterion, which refers to obtaining a similar distribution of the observed covariates $x$ for treated and control units. The other is based on a stronger criterion defined by distance, which is judged by the closeness of the individual matched pairs in terms of their covariate values. The main disadvantage of propensity score matching is that matched units with the same estimated propensity score may have different patterns of covariates $x$, and this is ignored by (11). A hybrid alternative, the Mahalanobis distance with propensity score calipers $MP(x_{\ell_1}, x_{\ell_2})$, insists that subjects be close on the propensity score, but once this is achieved, the values of $x$ matter. This distance is set to infinity if $p(x_{\ell_1})$ and $p(x_{\ell_2})$ differ by more than a caliper of width $w$, and otherwise it is the Mahalanobis distance. That is,

$$MP(x_{\ell_1}, x_{\ell_2}) = \begin{cases} MD(x_{\ell_1}, x_{\ell_2}), & \text{if } PS(x_{\ell_1}, x_{\ell_2}) \le w, \\ \infty, & \text{if } PS(x_{\ell_1}, x_{\ell_2}) > w. \end{cases} \quad (12)$$

The width $w$ of the caliper is generally specified as a multiple of the standard deviation of the propensity score, with a value required to obtain balance on the propensity score. Instead of setting $MP(x_{\ell_1}, x_{\ell_2}) = \infty$ for violations of the propensity score constraint, it may be more appropriate to add a penalty function to the distance (Rosenbaum, 2002). The matching algorithm will attempt to respect the caliper, but will prefer to slightly violate it for a few matched pairs when the caliper cannot be satisfied for all the pairs.

A matching algorithm attempts to find pairs of subjects based on the defined distance. A common approach is a best-first or greedy algorithm. Let $L_1$ and $L_0$ be the number of treated and control subjects, respectively, and assume $L_1 \le L_0$. Units under each treatment are first randomly ordered, and the first treated subject is paired with the nearest control subject, then the second treated subject is paired with the nearest of the remaining $L_0 - 1$ control subjects, and so on. A greedy algorithm will not generally find an optimal pair match in the sense of minimizing the total distance within pairs. The key difficulty is that two or more treated units may have the same control as their closest match, and greedy matching resolves this problem arbitrarily. The alternative to greedy matching is optimal pair matching, which can be reduced to finding a flow of minimum cost in a certain network (Rosenbaum, 1989). This is a standard combinatorial optimization problem for which readily available algorithms exist (Bertsekas, 1998).

Finally, there is the choice of the structure of the match. This may be performed using pair-matching or 1-to-1 matching, matching to a fixed number of $m \ge 2$ controls, or matching with a variable number of controls. The optimal structure for producing similarity within matched sets can be shown to be a full match (Rosenbaum, 1991), where some matched sets may contain one treated subject with one or more controls, while other match sets may contain multiple treated units with one control. This is intuitive because the flexible arrangement of a full match may group several controls with a single treated subject in regions of the covariate space where controls are plentiful, and similarly, it may group several treated subjects with a single control in regions where treated subjects are relatively plentiful. Also, because a full match includes as special cases all of the other matching structures, it will produce match sets that are at least as close as those produced by any of those structures.
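A minimal sketch of the caliper distance (12) and optimal pair matching follows (assuming SciPy). A large finite penalty stands in for the infinite distance so the solver stays well-posed, and, for plain 1-to-1 pair matching, a rectangular assignment solver attains the same optimum as the minimum-cost network flow formulation:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

def caliper_distance(X_t, X_c, lp_t, lp_c, w):
    """Mahalanobis distance with a propensity score caliper, Eq. (12).

    X_t, X_c: covariate matrices for treated and control units;
    lp_t, lp_c: their linear propensity scores; w: caliper width.
    Pairs whose propensity scores differ by more than w receive a large
    penalty in place of the infinite distance of Eq. (12).
    """
    S_inv = np.linalg.inv(np.cov(np.vstack([X_t, X_c]).T))
    D = cdist(X_t, X_c, metric="mahalanobis", VI=S_inv)
    ps_gap = np.abs(lp_t[:, None] - lp_c[None, :])
    return np.where(ps_gap <= w, D, 1e9)

def optimal_pair_match(D):
    """Optimal pair matching: minimize the total within-pair distance.

    For 1-to-1 pair matching, the rectangular assignment problem solved
    here gives the same optimum as the minimum-cost network flow
    formulation of Rosenbaum (1989).
    """
    treated_idx, control_idx = linear_sum_assignment(D)
    return list(zip(treated_idx, control_idx))
```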

4. An application to Auto Insurance price elasticity estimation

4.1. The data

The data used for this analysis were extracted from a large database from a major Canadian direct insurer. It consists of L = 329,000 Auto Insurance policies that were given a renewal offer from June 2010 to May 2012, along with more than 60 pre-treatment covariates describing various characteristics of the policy, the vehicle and the driver.

The company sends a renewal package to its customers 45 days prior to the expiry date of the current policy term. This package clearly specifies what the new rate for the upcoming policy year would be, in case the customer decides to renew. The new rate could either be lower, equal, or higher than the current rate. The treatment is the rate change to which the customer was exposed, and was computed as the percentage change in premium from the current to the new rate. This is a continuous variable, but for the purpose of this study, it was categorized into 5 ordered values t = {1, 2, ..., 5 = T}.

The response variable is the renewal outcome of the policy (renewed or lapsed), which was measured 30 days after the effective date of the new policy term. Up to that point, the customer is guaranteed to have her money back in case she decides to terminate the policy. As expected, Table 1 shows that the lapse rate increases with the rate change level.⁵ In addition, the price sensitivity appears to be higher for price increases relative to price decreases. However, as discussed above, differences in lapse rates among groups are not directly comparable, as they might be driven by differences in the covariates.

Table 1
Lapse rates by rate change level.

Rate change level   Rate change (%)   N Obs.    Lapse rate (%)
1                   [−20, 0)           63,212    3.03
2                   [0, 5)             44,609    3.45
3                   [5, 10)            40,455    4.44
4                   [10, 20)           51,283    7.77
5                   [20, 40]           30,948   14.22
All                                   230,507    5.92

Note. This table displays the five rate change levels with their associated rate change intervals, number of policyholders and average lapse rates.

4.2. Building the model

In this study, we followed Rubin's model of causal inference as described in Section 2. Our work has also been influenced by Fahner (2012), but ours is clearly different in the exposition, the specific application and the details of the model building process.

Our ultimate goal is to obtain estimates of lapse probabilities for each policyholder under each treatment. As each policyholder is only exposed to a single treatment, the rest remain counterfactual. The ideal way to think about the unobserved counterfactual outcomes is that they are missing values, and therefore they should be multiply imputed to represent their uncertainty.

We outline below the conceptual steps involved in the estimation process (a code sketch follows at the end of this subsection). In short, we first fit a series of lapse probability models, one for each rate change level. We subsequently use propensity scores and matching algorithms to find pairs of policyholders who were exposed to distinct rate change levels, but are otherwise comparable in terms of their pre-treatment covariates. Having found those pairs, we then use the estimated lapse probability from each subject of the pair to fill in the 'missing' counterfactual response for the other subject of the pair. Finally, we fit a "global" model, which allows us to predict price elasticity under each rate change level and value of the covariates.

1. Estimate a lapse model for each individual treatment. For each treatment $t = \{1, 2, \ldots, 5 = T\}$, obtain an estimate of the lapse probability $\hat{r}_{\ell t}(x_\ell)$ by regressing $r_{\ell t}$ on $x_\ell$ based only on the subjects that received treatment $t$. That is, estimate $E[r_{\ell t} \mid x_\ell, t]$.
2. Propensity score analysis and matching. This step involves:
   (a) Given the five treatments ($T = 5$), estimate the propensity scores $p(x_\ell)$ for all $10 = T(T-1)/2$ treatment dichotomies, and identify common support (i.e., overlap) regions. Specifically, given a treatment dichotomy $(j, k)$, estimate $E[Z_{\ell j} \mid x_\ell, t = (j, k)]$.
   (b) For each treatment dichotomy $(j, k)$, form pairs of policyholders (one from each treatment) using one of the matching algorithms described in Section 3.3.
3. Infer the counterfactual outcomes from the matched pairs. Consider a matched pair including subjects $\ell_1$ and $\ell_2$, which have been exposed to treatments $j$ and $k$, respectively. We use the estimate $\hat{r}_{\ell_2 k}$ to fill in for the counterfactual outcome of subject $\ell_1$ under treatment $k$. Similarly, we use the estimate $\hat{r}_{\ell_1 j}$ to fill in for the counterfactual outcome of subject $\ell_2$ under treatment $j$. The causal effect of exposing subject $\ell_1$ to treatment $j$ rather than to treatment $k$ can then be obtained by differencing the observed and counterfactual outcomes between the matched pairs, $\hat{r}_{\ell_1 j} - \hat{r}_{\ell_2 k}$. In the case subject $\ell_1$ cannot find a match among the subjects treated with $k$, the data cannot support causal effect estimates for this subject and treatment dichotomy, at least not without making strong external assumptions involving model-based extrapolation.
4. Develop a "global model" of the response. Develop a global model $\hat{\hat{r}}_{\ell t}(x_\ell)$, obtained by fitting the estimates $\hat{r}_{\ell t}$ of the observed responses, plus the estimates of a subset of the counterfactual responses (i.e., as far as the overlap situation permits), on the vector of observed characteristics $x_\ell$ and treatment level $t$. This model allows us to predict the response for each treatment $t$ and value of $x$. That is, estimate $E[\hat{r}_{\ell t} \mid x_\ell, t]$.

The idea is that up to step 3, we try to avoid risky extrapolation by restricting inference to the overlap regions only. Only in step 4 may we choose to ignore the overlap structure by inferring the full combinatorial set (all covariates $x$ and treatments $t$). The inclusion of the counterfactual responses in the estimation of the global model reduces the exposure to extrapolation problems. It is our experience, and that of others (Fahner, 2012), that this approach gives more control and insight during the modeling process than trying to fit a global model directly to the observed data points.

⁵ In this analysis, the rate change was combined from all the different coverages (third party liability, damage to the car, etc.). Additionally, it would be relevant to investigate the potential heterogeneity in price elasticity from the individual coverages.
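Steps 1 and 3 of this process can be sketched as follows (assuming scikit-learn; the gradient boosting classifier is a stand-in for whichever per-treatment lapse model is used in step 1, the function name is ours, and the matched pairs are taken as given from step 2):

```python
from sklearn.ensemble import GradientBoostingClassifier

def fill_in_counterfactuals(X, t_obs, lapse, matched_pairs, T=5):
    """Sketch of steps 1 and 3 of Section 4.2.

    matched_pairs maps each dichotomy (j, k) to a list of index pairs
    (l1, l2), with l1 treated at level j and l2 at level k, as produced
    by the matching of Section 3.3 (step 2, not repeated here).
    """
    # Step 1: one lapse model per rate change level, fit only on the
    # subjects that actually received that level: E[r_lt | x_l, t].
    models = {
        t: GradientBoostingClassifier().fit(X[t_obs == t], lapse[t_obs == t])
        for t in range(1, T + 1)
    }

    # Step 3: within each matched pair, the partner's estimate fills in
    # the missing counterfactual lapse probability.
    cf = {}
    for (j, k), pairs in matched_pairs.items():
        for l1, l2 in pairs:
            cf[(l1, k)] = models[k].predict_proba(X[[l2]])[0, 1]
            cf[(l2, j)] = models[j].predict_proba(X[[l1]])[0, 1]

    # Step 4 would fit a "global" response model on the observed estimates
    # plus these imputed counterfactuals, with (x, t) as inputs (the paper
    # uses a GAM for this, Section 4.5).
    return models, cf
```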

4.3. Propensity score estimates

In practice, the propensity score (5) is generally estimated with a logistic regression model. Accurate estimates require including all variables that simultaneously influence the treatment status and the outcome, and require the correct functional form specification. The balancing property of the propensity score (6) is then examined to determine if refinements to the model are required. This can be accomplished by stratifying the estimated propensity score at the quintiles of its distribution, and then testing whether balance has been achieved for each covariate within each stratum (Rosenbaum and Rubin, 1984). If there are significant differences in the distribution of a covariate between the treatment and comparison groups within a stratum, then adding higher-order terms or interactions of the covariate may improve balance. Failure to satisfy this condition under all model specifications would allow us to conclude that the treatments do not overlap along all dimensions.

As can be anticipated, moving back and forth from balance statistics to changing the model specification is a monotonous process. In the context of this study, much better results, with considerably less model tuning required, were obtained by estimating the propensity scores using Gradient Boosting Models (GBM) (Friedman, 2002). GBM models the log-odds of treatment assignment, $h(x_\ell) = \log(p(x_\ell)/(1 - p(x_\ell)))$, by iteratively fitting a collection of simple regression trees, and then combining them to produce a "strong" learning algorithm. Fit is measured by the Bernoulli deviance, $-\sum_{\ell=1}^{L} \left( Z_\ell h(x_\ell) - \log(1 + \exp(h(x_\ell))) \right)$, with smaller values indicating a better fit. These models have a number of appealing properties for propensity score estimation. First, GBM is a general data modeling algorithm that allows for a flexible non-linear effect of the covariates on the propensity score. Results are invariant under order-preserving transformations of covariates, so there is no need to consider functional form revisions of the variables (e.g., log, power, square-root, etc.). Second, as the propensity score is estimated from a sequence of tree-based models, complex interactions are identified within the fitting process. Last, GBM has a built-in variable selection procedure, in the sense that the estimated model does not necessarily use all the covariates. For an overview of GBM with an application to Auto Insurance ratemaking see Guelman (2012).

An important aspect in fitting propensity score models is to realize that the goal is to obtain estimates of the propensity score that statistically balance the covariates between each treatment dichotomy, rather than ones that estimate the true propensity score as accurately as possible. Thus, the model parameters should not be chosen to minimize prediction error, but to maximize covariate balance. The estimated propensity scores may tend to overfit the data, in the sense of producing better covariate balance than would be expected under chance in the data set used to construct the score, but this is not a problem given the objective (Joffe and Rosenbaum, 1999).

In this study, similarly to McCaffrey et al. (2004), GBM models were selected⁶ to minimize the average standardized absolute mean difference in the covariates (ASAM); a sketch of this computation is given at the end of Section 4.5. For each covariate, we calculated the absolute value of the difference between the mean for the treatment group and the weighted⁷ mean for the comparison group, divided by the standard deviation for the treatment group. We subsequently averaged these values across all covariates to obtain the ASAM.

Fig. 1 shows the distribution of the final fitted propensity scores for each treatment dichotomy. The propensity scores are labeled "Linear Propensity Scores" to reflect that they are on the log-odds scale. These plots provide a simple, yet powerful, diagnostic on the data examined. We note that the overlap between distributions tends to be much higher for rate changes that are closer, relative to those that are further apart. A key strength of the propensity score method is that it dramatically alerts us to this fact. For example, it is clear that fewer of the subjects in the treatment dichotomy (5, 1) are similar relative to those in the dichotomy (2, 1). This suggests that finding appropriate matches will be relatively more difficult in the former dichotomy. Also, in most treatment dichotomies, there are subjects exposed to one rate change level with higher estimated propensity scores relative to the other rate change level, indicating there is a combination of covariate values not appearing in both groups. The next section provides key insights for understanding the nature of the differences in the propensity score distributions.

4.4. Matching and covariate balance

In this study, we tested the matching algorithms, distance definitions and matching structures described in Section 3.3. The best results, in the sense of producing closely matched pairs and balanced matched samples, were obtained by optimal pair matching using the Mahalanobis distance, including the propensity score as an additional covariate, and propensity score calipers. Specifically, for each treatment dichotomy, we used a minimum-cost network solver, as described in Hansen and Klopfer (2006), to find optimal matched pairs of policyholders (one from each rate change level), such that the sum of the distances (12) between the matched pairs is minimized. There is a trade-off between producing closely matched pairs and maximizing the number of matches thus obtained. The widths of the propensity score calipers in (12) were selected to obtain a good compromise between these two objectives.

Table 2 displays the balance results for some of the most important covariates used in estimating the propensity score. Balance is shown for the first four treatment dichotomies before and after matching. Notice that before matching, the means of the covariates differ considerably within each treatment dichotomy. This is more evident for dichotomies with larger rate change differences. This provides insights for understanding the nature of the distributional differences in the propensity scores (see Fig. 1 in Section 4.3). After matching, the differences in the means of the covariates between groups diminished substantially. For instance, in the treatment dichotomy (5, 1), we started with 30,948 subjects treated with t = 5 and a comparison group of 63,212 subjects treated with t = 1. Subjects treated with t = 5 have, on average, a lower current premium (premium), are less likely to have a Home Insurance policy with the company (home), less likely to have more than one vehicle (multi_vehicle), less likely to have policy discounts through an employer benefit (group), less likely to be in the best driving record category (drv_rec7), and less likely to have the full coverage option (full_cov). By design, the matching algorithm required exact matches on "at-fault" accidents during the prior year (accident). This is to ensure we are controlling for premium changes resulting from accidents caused by the driver, as opposed to rate changes strictly driven by the company. The matched sample is composed of 26,592 subjects (13,296 from each treatment).

When checking covariate balance, it is important not only to examine differences in means, but to check more general summaries of the distribution. Fig. 2 shows the empirical QQ-plot for the variable premium in the (5, 1) treatment pair before and after matching. Balance for this variable is improved by matching.

4.5. Price-elasticity functions

Now that we have achieved good balance among all treatment dichotomies, we can proceed by estimating the global model as discussed in the last step of Section 4.2. This model allows us to obtain estimates of lapse probabilities under each rate change level $t$ and covariates $x$.

We fitted this model using a Generalized Additive Model (Hastie and Tibshirani, 1990), with continuous covariates represented by penalized cubic regression splines. The degree of smoothness of model terms was estimated as part of the fitting process and selected by Generalized Cross-Validation (GCV). Interaction terms between the rate change and each covariate were tested and added into the model guided by the GCV scores as well as domain expertise. This allows for heterogeneity in price elasticities to be estimated at the individual policyholder level. As previously discussed, understanding the precise nature of the individual price sensitivities can be extremely valuable. Each subject's treatment could be optimized on the basis of the individual characteristics.

Having estimated the global model, we then averaged the individual estimates to construct aggregate price-elasticity functions at the portfolio level. This was done for various subpopulations within the insurance portfolio. A subpopulation of the portfolio can be obtained by restricting the values of the covariates to a subset $\tilde{x}$. Suppose the portfolio has $L_{\tilde{x}} = \{\ell : x_\ell \in \tilde{x}\}$ subjects with covariate values in this subset. The estimated price elasticity for subpopulation $\tilde{x}$ and treatment $t$ is defined as $\widehat{PE}(\tilde{x}, t) = \sum_{\forall \ell : x_\ell \in \tilde{x}} \hat{\hat{r}}_{\ell t}(x_\ell) / L_{\tilde{x}}$.

⁶ Model selection for GBM fundamentally involves selecting the values of two tuning parameters: the size of the individual trees fitted at each iteration and the number of fitted trees.
⁷ The weights for subject $\ell$ in the comparison group are defined by $w_\ell = \hat{p}(x_\ell)/(1 - \hat{p}(x_\ell))$, the odds that a subject with covariates $x_\ell$ will be assigned to treatment.
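A minimal NumPy sketch of the ASAM criterion from Section 4.3, with the odds weighting of footnote 7 (variable names are ours):

```python
import numpy as np

def asam(X, z, p_hat):
    """Average standardized absolute mean difference (ASAM).

    For each covariate: |treatment mean - weighted comparison mean|,
    divided by the treatment-group standard deviation, then averaged
    across covariates. Comparison units are weighted by the odds
    w = p_hat / (1 - p_hat), as in footnote 7.
    """
    Xt, Xc = X[z == 1], X[z == 0]
    w = p_hat[z == 0] / (1 - p_hat[z == 0])
    treat_mean = Xt.mean(axis=0)
    comp_mean = np.average(Xc, axis=0, weights=w)
    sd = Xt.std(axis=0, ddof=1)
    return np.mean(np.abs(treat_mean - comp_mean) / sd)
```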

Fig. 1. Estimated propensity scores for all treatment dichotomies. Given a treatment dichotomy (j, k), each plot illustrates the distribution of the probability of assignment to rate change level j relative to level k, conditional on the background covariates. Within each dichotomy, the rate change level with fewer units is represented by j, and the other level by k. The propensity scores are labeled "Linear Propensity Scores" to reflect that they are on the log-odds scale.

The results are illustrated in Fig. 3. The plots show the estimated lapse rate measured at each rate change level for the selected subpopulations. For ease of interpretation, continuous covariates were categorized at the quartiles of their distribution (this is labeled with the numbers 1 to 4 in the plots). There is a clear interaction effect between the rate change level and "at-fault" accidents during the prior year (accident). Insureds with recent accidents already expect a rate increase and thus have a lower price sensitivity. Also, as expected, the higher the current premium (premium), the higher the price elasticity for a given rate change, but this relation tends to be much stronger with the increase in the rate change level. Similarly, younger policyholders (age) and relatively newer customers (term) tend to be more price-elastic. All the remaining variables have the expected effect on price elasticity.⁸ Overall, price elasticity tends to be higher for rate increases relative to rate decreases. A rate increase provides an incentive to shop for an alternative insurer, whereas a rate decrease does prevent customers from switching, but to a lesser extent.

5. Managerial implications: price optimization

In this section, we briefly illustrate an application of the derived estimates of price sensitivities to assist managers in optimizing the expected profit of an insurance portfolio. The question is: which rate change should be applied to each policyholder to maximize the overall expected profit for the company, subject to a fixed overall retention rate? By understanding the precise nature of the price elasticities at the policyholder level, the individual rates can be optimized based on each customer's willingness to pay. The causal inference framework used to derive estimates of lapse probabilities at the individual subject level under each rate change scenario allows us to solve this problem effectively.

The problem can be expressed as an integer program. As before, the portfolio is composed of $L$ policyholders, $\ell = \{1, 2, \ldots, L\}$, characterized by a vector of pre-treatment covariates $x_\ell$. Each subject can be exposed to a rate change level $t = \{1, 2, \ldots, 5 = T\}$, and we let $Z_{\ell t}$ be a binary indicator that takes a value of 1 if subject $\ell$ is exposed to rate change $t$ and 0 otherwise. $RC_t$ is the actual rate change associated with treatment $t$. The lapse estimates $\hat{\hat{r}}_{\ell t}$ represent the lapse probability of subject $\ell$ if exposed to rate change level $t$. In addition, $P_\ell$ is the current premium, $\widehat{LR}_{\ell t}$ the predicted loss ratio (i.e., the ratio of the predicted insurance losses relative to premium),⁹ and $\alpha$ the overall lapse rate of the portfolio.

The objective function is to maximize the expected profit of the portfolio,

$$\max_{Z_{\ell t} \; \forall \ell, \forall t} \; \sum_{\ell=1}^{L} \sum_{t=1}^{T} Z_{\ell t} \left[ P_\ell (1 + RC_t)(1 - \widehat{LR}_{\ell t})(1 - \hat{\hat{r}}_{\ell t}) \right] \quad (13)$$

subject to the following constraints:

$$\sum_{t=1}^{T} Z_{\ell t} = 1 \quad \forall \ell, \quad (14a)$$

$$Z_{\ell t} \in \{0, 1\}, \quad (14b)$$

$$\sum_{\ell=1}^{L} \sum_{t=1}^{T} Z_{\ell t}\, \hat{\hat{r}}_{\ell t} / L \le \alpha. \quad (14c)$$

Eqs. (14a) and (14b) ensure that each policyholder is assigned a rate change level, and (14c) ensures that the portfolio has a lapse rate which does not exceed $\alpha$.

⁸ An additional issue is the role of product understanding in the renewal decision. In some cases, the level of insurance literacy can be limited, and this potentially influences decisions.
⁹ Specifically, $\widehat{LR}_{\ell t} = \widehat{LC}_\ell / P_\ell(1 + RC_t)$, where $\widehat{LC}_\ell$ represents the expected loss cost for policyholder $\ell$. The expected loss cost was derived using a variety of qualitative customer attributes together with traditional quantitative rating risk factors to accurately predict the likelihood that each policyholder $\ell$ may experience a claim in the future and the expected claim cost. The data supporting this analysis was based on 5 years of exposure and claim experience from the same insurer, with losses including the most recent case reserve estimates. The final loss cost estimates also include an overall base level adjustment for pure IBNR (Incurred But Not Reported) and development of known claims.
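As a sketch, the integer program (13)–(14c) can be prototyped with an off-the-shelf solver; the code below assumes the PuLP modeling library, which is an illustrative choice rather than the implementation used in the study. Re-solving over a grid of $\alpha$ values and interpolating (see footnote 10 below) traces the efficient frontier shown in Fig. 4.

```python
import pulp

def optimize_rate_changes(P, RC, r_hat, LR_hat, alpha):
    """Sketch of the integer program (13)-(14c).

    P[l]: current premium; RC[t]: rate change for level t;
    r_hat[l][t]: estimated lapse probability; LR_hat[l][t]: predicted
    loss ratio; alpha: maximum allowed portfolio lapse rate.
    """
    L, T = len(P), len(RC)
    prob = pulp.LpProblem("price_optimization", pulp.LpMaximize)
    Z = pulp.LpVariable.dicts("Z", (range(L), range(T)), cat="Binary")

    # Objective (13): expected profit over retained policies.
    prob += pulp.lpSum(
        Z[l][t] * P[l] * (1 + RC[t]) * (1 - LR_hat[l][t]) * (1 - r_hat[l][t])
        for l in range(L) for t in range(T)
    )
    # (14a)-(14b): exactly one rate change level per policyholder.
    for l in range(L):
        prob += pulp.lpSum(Z[l][t] for t in range(T)) == 1
    # (14c): expected portfolio lapse rate capped at alpha.
    prob += pulp.lpSum(
        Z[l][t] * r_hat[l][t] for l in range(L) for t in range(T)
    ) <= alpha * L

    prob.solve()
    return [[pulp.value(Z[l][t]) for t in range(T)] for l in range(L)]
```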

Table 2
Balance results of the covariates before and after matching for the first four treatment dichotomies.

                     Pair (2, 1)       Pair (3, 1)       Pair (4, 1)       Pair (5, 1)
                     t = 2    t = 1    t = 3    t = 1    t = 4    t = 1    t = 5    t = 1
premium (Avg.)
  Before             2,012    2,362    2,041    2,362    2,144    2,362    2,130    2,362
  After              2,113    2,111    2,285    2,230    2,410    2,507    2,463    2,487
yrs_lic (Avg.)
  Before             25.9     22.9     25.5     22.9     24.7     22.9     22.5     22.9
  After              25.0     25.2     24.0     24.2     22.3     22.5     20.8     22.0
home (%)
  Before             77.6     78.6     73.6     78.6     70.9     78.6     63.6     78.6
  After              77.8     79.1     74.1     75.2     71.9     73.4     69.1     71.5
multi_vehicle (%)
  Before             54.8     57.2     44.4     57.2     37.9     57.2     29.6     57.2
  After              57.2     58.7     50.2     51.0     46.8     47.1     43.5     40.8
group (%)
  Before             12.0     14.0     12.6     14.0     8.2      14.0     6.2      14.0
  After              12.2     12.3     12.0     12.3     11.5     11.1     8.8      8.4
drv_rec7 (%)
  Before             65.8     55.6     64.1     55.6     62.1     55.6     40.8     55.6
  After              60.8     61.9     55.5     57.3     48.5     51.4     38.2     43.2
full_cov (%)
  Before             93.5     89.4     90.1     89.4     84.9     89.4     70.9     89.4
  After              92.8     93.0     90.1     88.7     84.1     82.0     80.4     77.5
accident (%)
  Before             1.82     1.82     2.16     1.82     2.75     1.82     10.80    1.82
  After              1.90     1.90     2.20     2.20     3.22     3.22     5.17     5.17
lease_flag (%)
  Before             13.4     13.1     11.7     13.1     10.5     13.1     8.42     13.1
  After              13.4     13.6     12.2     12.6     11.5     12.4     10.8     9.9
veh_age (Avg.)
  Before             5.87     6.14     6.67     6.14     7.19     6.14     8.48     6.14
  After              5.96     5.94     6.59     6.72     7.08     7.11     7.42     7.84
prop_score (Avg.)
  Before             0.465    0.377    0.507    0.315    0.655    0.280    0.642    0.175
  After              0.437    0.436    0.438    0.431    0.499    0.490    0.429    0.419
N Obs.
  Before             44,609   63,212   40,455   63,212   51,283   63,212   30,948   63,212
  After              37,171   37,171   28,873   28,873   23,684   23,684   13,296   13,296

Note. This table displays the means of the most relevant covariates before and after matching across the first four treatment dichotomies. The differences in the means of the covariates between groups diminished substantially for the matched subjects.

Fig. 2. Empirical QQ-plot of premium before and after matching in the (5, 1) treatment dichotomy. This figure displays the quantiles of premium for treatment 5 vs. treatment 1 before (left) and after (right) matching. A 45-degree reference red line is also plotted (indicating perfect balance). Balance for this variable was improved after matching.

We solved this optimization problem using the data discussed in Section 4.1 along with the estimated lapse probabilities from Section 4.5. The results for a sequence of (1 − α) values¹⁰ are illustrated in Fig. 4. The efficient frontier represents the maximum expected profit that this company can obtain at a given desired retention rate. The expected profit is expressed in terms of change, measured in percentage points, relative to the current profit level of the company. This insurer may choose to be at any given point on the efficient frontier depending on its strategic objectives of market share and profitability. However, any point below the efficient frontier is suboptimal in that it is possible to increase profits while maintaining the retention level, or alternatively, increase retention while maintaining profitability. For instance, at the current state, the company may choose to move in the "A" direction and increase profits by almost 18%, without sacrificing customer retention. Alternatively, the company may choose to shift in the "B" direction and increase retention with no loss in profits. This might be a good strategy if the company is aiming to gain market share. Finally, it may choose to move in the "C" direction if the objective is to retain only the most profitable customers. In this sense, the causal effects estimated in this article can serve as a basis for making commercial decisions that keep the portfolio in good shape.

Another consideration relates to situations where the insurer may want to limit a certain class of risks in the portfolio, or write more business in certain regions or distribution channels. These situations could be handled by imposing additional constraints in the optimization problem. At the optimum, the company maximizes expected profits subject to these additional constraints.

6. Conclusion

This paper considers a shift in the paradigm to measure price elasticity in the context of Auto Insurance, from traditional statistical analysis to a causal inference approach. Price elasticity is ultimately concerned with the effect of a rate change on each policyholder's renewal outcome. The problem that motivates the study is therefore not associational but causal in nature. As each policyholder is exposed to a single rate change level, the rest remain counterfactual. The counterfactual model of causality developed by Rubin represents a useful framework to conceive the price elasticity estimation problem. Under this framework, counterfactuals are thought of as missing values, which are multiply imputed to represent their uncertainty.

Additionally, rate changes reflected in most insurance databases are not the result of a carefully designed experiment, but of a risk-based pricing model. Addressing causal inference questions in the presence of observational data requires appropriate data analysis methods. Conventional analysis based on statistical or algorithmic data models (regression, decision trees, neural nets, etc.), which attempt to directly fit the observed data points, is subject to hidden extrapolation problems without warnings. We have shown that the propensity score is a straightforward method that alerts the analyst to inadequately overlapping covariate distributions. Under these situations, the data may not support causal conclusions without relying on untestable assumptions about the form of the extrapolation. Further, we have shown that optimal pair matching is a useful method for identifying common support regions, within which estimates of the counterfactual renewal outcomes can be derived locally. These estimates were subsequently used jointly with the estimates of the observed renewal outcomes to obtain a global price-elasticity function. This function allowed us to predict the renewal outcome for the full combinatorial set of rate change levels and covariates.

¹⁰ The optimization problem was solved for 8 equally spaced values of α, which were then interpolated.

Fig. 3. Price-elasticity functions. The plots illustrate the average estimated lapse rate measured at each rate change level for selected subpopulations within the insurance
portfolio. Continuous covariates were categorized at the quartiles of their distribution (this is labeled with the numbers 1 to 4 in the plots).

Moreover, valuable insights can be gained by knowing the current


company’s position of market share and profitability relative to the
optimal values given by the efficient frontier. The relevant mana-
30 gerial decision is then to determine in which direction the com-
pany should move towards the frontier, as each solution point
Ef
fic
ie

places a different weight on each of these objectives.


nt
fro

Some decision model components can be sensitive to the type


Expected profit (%)

nt
ie

of insurer. In particular, this type of portfolio analysis is more rel-


r

20
evant to a direct insurer relative to a broker-based insurer. In the
later context, the ability to optimize the rates based on price elas-
ticity considerations is reduced, as the renewal decision is not nec-
essarily driven by the client, but likely to be influenced by the
10 broker, possibly due to the commission rates offered from the var-
C A ious competitors. Another consideration is in relation to individu-
als with insurance plans from their employment agreements. The
applicability of the proposed model in this case is highly depen-
dent on the regulatory pricing environment. For instance, if instead
B
0 of having a unique employee discount, the regulation allows for
Current state different discount levels based on price elasticity considerations,
then the model still applies.
Fig. 4. Expected profit efficient frontier (expected profit, in %, vs. retention rate 1 − α). The efficient frontier represents the maximum expected profit that the company can obtain at a given desired retention rate. The expected profit is expressed as a change, measured in percentage points, relative to the profit at the current state. The current situation for this company is not optimal, in that it is possible to obtain an increase in profit at the current retention level, or a higher retention at the current profit.

Although the methodology can be more involved than the conventional approach, it offers a rigorous analysis of causal effects from non-experimental data. We hope this paper will stimulate greater appreciation of the importance of causal inference and its relevance for price elasticity estimation in an insurance context.
Acknowledgments

LG thanks Royal Bank of Canada, RBC Insurance. MG thanks ICREA Academia and the Ministry of Science/FEDER Grant ECO2010-21787-C03-01. We are grateful to the editor and the reviewer for their thoughtful comments that helped us improve a prior version of this article. All errors remain our own responsibility.