Methods for Causal Inference in Marketing
Foundations and Trends® in Marketing
DOI: 10.1561/1700000080
Zezhen (Dawn) He
University of Rochester
[email protected]
Vithala R. Rao
Cornell University
[email protected]
Boston — Delft
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise,
without prior written permission of the publishers.
Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222
Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal
use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users
registered with the Copyright Clearance Center (CCC). The ‘services’ for users can be found on
the internet at: www.copyright.com
For those organizations that have been granted a photocopy license, a separate system of payment
has been arranged. Authorization does not extend to other kinds of copying, such as that for
general distribution, for advertising or promotional purposes, for creating new collective works,
or for resale. In the rest of the world: Permission to photocopy must be obtained from the
copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA;
Tel. +1 781 871 0245; www.nowpublishers.com; [email protected]
now Publishers Inc. has an exclusive license to publish this material worldwide. Permission
to use this content must be obtained from the copyright license holder. Please apply to now
Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail:
[email protected]
Editors
Dawn Iacobucci
Vanderbilt University
Leonard Lee
National University of Singapore
Sharon Ng
National University of Singapore
Koen Pauwels
Northeastern University
Stefano Puntoni
University of Pennsylvania
William Rand
North Carolina State University
Bernd Schmitt
Columbia University
Gerrit van Bruggen
Erasmus University
Hema Yoganarasimhan
University of Washington
Juanjuan Zhang
Massachusetts Institute of Technology
Editorial Scope
Foundations and Trends® in Marketing publishes survey and tutorial articles
in the following topics:
Contents
1 Introduction 3
Acknowledgments 98
Appendices 99
References 122
ABSTRACT
Establishing causal relationships between the marketing variables under a firm's control and outcome measures such as sales and profits is essential for the successful operation of a business. This goal is often hindered by a lack of suitable experimental data, owing to the cost and feasibility constraints of conducting randomized experiments. Accordingly, researchers have employed observational and quasi-experimental data for causal inference. The evolution of causal inference methods is closely intertwined with advancements in business and technology, particularly as we enter a digital era characterized by big data and multichannel marketing.
Against this background, this monograph provides a systematic review of recent developments in causal inference methods and their applications in marketing. For each causal inference method, we discuss five recently published academic papers in marketing research that employ it.
1 Introduction
Marketing is a business function that relates the firm to its customers and end-consumers. It is an essential function of businesses and other organizations (see Kotler and Keller, 2012). On the business side, the function involves activities such as product design, sales forecasting, the design and execution of advertising strategy, and sales and distribution. The marketing function is similar in other types of organizations, even though the terminology may differ.
Concurrent with these developments in business, the academic field of marketing has blossomed over the last 80 years or so. The discipline is thriving, as reflected in the high visibility of academic marketing associations and journals. Three associations, the American Marketing Association (AMA), the European Marketing Academy (EMAC), and the Institute for Operations Research and the Management Sciences (INFORMS), have played a significant role in the discipline's growth. Among AMA's premier journals, the Journal of Marketing (JM) is in its 88th year of publication and the Journal of Marketing Research (JMR) is in its 61st year. The International Journal of Research in Marketing (IJRM) of EMAC is in its 40th year of publication.
Group A: A1. Survey data; A2. Experimental data*; A3. Archival data; A4. Panel data (choices and durations); A5. Media ratings data; A6. Sales, prices, and advertising data.
Group B: B1. Qualitative research; B2. Product/service reviews; B3. Videos, pictures; B4. Consumer search data; B5. Data on physiological measurements (e.g., eye tracking); B6. Neuroscience-related data; B7. Genetic data.
Group C: C1. Social relationships data; C2. Social games data; C3. Postings to social media.
Note: *A special case of these data is conjoint analysis data (ratings or choices).
1 Although there are slight differences between observational data and quasi-experimental data, we use them interchangeably in this monograph.
2 This monograph focuses exclusively on econometrics-based methods and does not cover the Directed Acyclic Graph (DAG) framework developed by Pearl and his colleagues; see Pearl (2009a) for the DAG approach. We will, however, briefly describe the debate between Rubin and Pearl (Pearl, 2009b; Rubin, 1974).
Appendices
A Python Code for Generating Simulated Data
1 To enhance the replicability of the code, we have made the code available on GitHub: https://github.com/zhesimon/Methods-for-Causal-Inference-in-Marketing.
import numpy as np
import pandas as pd
import random
from scipy.stats import norm
from sklearn.preprocessing import MinMaxScaler

np.random.seed(1)
TV_areas, n_periods = 40, 10
age_mean, age_std = 38, 5
income_household_mean, income_household_std = 70000, 10000
female_rate, female_std = 50, 10
pct_days_promo_mean, pct_days_promo_std = 50, 15

data = []
for i in range(TV_areas):
    TV_area = i + 1
    age = int(np.random.normal(age_mean, age_std))
    income = int(np.random.normal(income_household_mean, income_household_std))
    female = round(np.random.normal(female_rate, female_std), 2)
    pct_days_promo = round(np.random.normal(pct_days_promo_mean, pct_days_promo_std), 2)
    for j in range(n_periods):
        TV_area_data = {
            'period': j + 1,
            'TV_areas': TV_area,
            'avg_age': age,
            'avg_income': income,
            '%female': female,
            'pct_days_promo': pct_days_promo,
        }
        data.append(TV_area_data)
df = pd.DataFrame(data)
# treatment effect = 3
df.to_csv('simulated_TV_areas.csv', index=False)
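The code that constructs new_df1 (the control areas) and new_df2 (the copies that become the treated areas) is not included in this excerpt. A minimal sketch of one plausible construction, assuming the treated areas are copies of the first 20 areas relabeled as areas 21-40, is:

# Assumed construction (not from the original script): split df into control areas
# and copies that will be relabeled and perturbed as treated areas below
new_df1 = df[df['TV_areas'] <= 20].copy()   # control TV areas 1-20
new_df2 = df[df['TV_areas'] <= 20].copy()   # copies that will become treated areas 21-40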
new_df2['TV_areas'] = new_df2['TV_areas'] + 20
# error ~ N(0, 2) for percent of days the brand was sold on promotion
new_df2['pct_days_promo'] = df.groupby('TV_areas')['pct_days_promo'].transform(lambda x: x + np.random.normal(0, 2)).round(2)
df = pd.concat([new_df1, new_df2]).reset_index(drop=True)
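The instrument, the omitted variable, and the model coefficients used below are defined earlier in the script and are not shown in this excerpt; one illustrative setup (all values assumed) might be:

# Assumed setup (values are illustrative, not from the original script)
n = 1000
advertising_costs = np.random.uniform(10, 20, size=n)   # instrument
omitted_variable = np.random.normal(0, 1, size=n)       # unobserved confounder
pi0, pi1, omit_exp = 5, 2, 1.5                           # first-stage coefficients
beta0, beta1, omit_sales = 10, 3, 2                      # outcome-equation coefficients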
# X: advertising expenditure
advertising_expenditure = pi0 + pi1 * advertising_costs + omit_exp * omitted_variable + np.random.uniform(-2, 2, size=n)
# Y: sales
sales = beta0 + beta1 * advertising_expenditure + omit_sales * omitted_variable + np.random.uniform(-5, 5, size=n)
data = pd.DataFrame({
    'Ad_Costs': advertising_costs,
    'Ad_Expenditure': advertising_expenditure,
    'Omitted_variable': omitted_variable,
    'Sales': sales
})
data.to_csv('IV_Data.csv', index=False)
data = []
# num_samples is defined earlier in the script (not shown in this excerpt)
for _ in range(num_samples):
    brandimage = random.uniform(1, 5)
    price = random.uniform(1, 5)
    service = random.uniform(1, 5)
    rating = brandimage * 0.3 + price * -0.2 + service * 0.4
    data.append({
        'Brandimage': brandimage,
        'Price': price,
        'Service': service,
        'Rating_': rating})
df = pd.DataFrame(data)
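The weight vector betas used below (one weight per control unit) is defined earlier in the script and is not shown in this excerpt; a hypothetical stand-in could be:

# Assumed stand-in: 50 control-unit weights (the original values are not shown)
betas = list(np.round(np.random.uniform(0, 1, 50), 2))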
# creates variables beta1, ..., beta50 and assigns them values from betas
for i in range(1, 51):
    globals()[f'beta{i}'] = betas[i - 1]
print(beta1, beta2)
print(betas)
# generates a dictionary containing 50 key-value pairs; keys are "mu1", "mu2", ..., "mu50",
# values are random integers in the range (5, 15)
mu_gen = {}
for i in range(1, 51):
    mu_gen[f"mu{i}"] = random.randint(5, 15)
print(mu_gen)
n = 200
treated_period = int(n/2)
random_state = 100
data = pd.DataFrame(index=range(n))
# control_units is the list of control-unit column names in `data`
# (the code that defines it and fills those columns is not shown in this excerpt)
# data_w holds the weighted control units Y[i]*w, weighted by their corresponding beta values
data_w = pd.DataFrame(index=range(n))
for i, Y in enumerate(control_units):
    data_w[control_units[i] + 'w'] = betas[i] * data[control_units[i]]
data_w['error'] = data['error']
data_w.to_csv('Data_w.csv', index=False)
data['Y0_treated'] = data_w[list(data_w.columns)].sum(axis=1)
# treatment effect for the treated unit (post-treatment periods 101-200),
# which equals 10 + eps, eps ~ Uniform(-0.1, +0.1)
data['eps_te'] = np.concatenate((np.zeros(treated_period, dtype=int),
                                 np.random.uniform(-0.1, 0.1, treated_period) + 10))
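The excerpt ends before the treated unit's observed outcome is assembled; a minimal sketch of one plausible completion (file and column names assumed) is:

# Assumed completion: observed outcome of the treated unit = counterfactual + treatment effect
data['Y_treated'] = data['Y0_treated'] + data['eps_te']
data['Period'] = range(1, n + 1)
data.to_csv('Synth_Panel.csv', index=False)   # hypothetical file name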
B Stata Code for Analysis of Data

Nearest-Neighbor Matching
import delimited "/simulated_TV_areas_nnmatch.csv", clear
rename female pct_female
*Dependent var: purchase rate
teffects nnmatch (purchase_rate_t avg_age avg_income pct_female pct_days_promo) (treatment), nneighbor(1)
*Dependent var: percent of first-time buyers
teffects nnmatch (pct_buyer_1sttime_t avg_age avg_income pct_female pct_days_promo) (treatment), nneighbor(1)
Instrumental Variable
import delimited "/IV_Data.csv", clear
*Manually run 2sls
reg ad_expenditure ad_costs
gen constructed_ad_expenditure = _b[_cons] + _b[ad_costs] * ad_costs
reg sales constructed_ad_expenditure
*IV
ivregress 2sls sales (ad_expenditure = ad_costs)
*Test of endogeneity: check whether ad_expenditure is endogenous
estat endog
*Check whether the instrument is weak
estat firststage
Synthetic Control Method
*Performing synthetic control method using all 50 control states and all pre-treatment periods
use "/Synth_Panel.dta", clear
tsset State Period
synth Sales Sales(1(1)100), trunit(0) trperiod(101) fig keep(100period_50state)
graph export "/Graph_100period_50state.png", replace
*Performing synthetic control method using partial (25) control states and all pre-treatment periods
use "/Synth_Panel.dta", clear
keep if State<26
tsset State Period
synth Sales Sales(1(1)100), trunit(0) trperiod(101) fig keep(100period_25state)
graph export "/Graph_100period_25state.png", replace
*Performing synthetic control method using all 50 control states and 50 pre-treatment periods
use "/Synth_Panel.dta", clear
keep if Period>50
tsset State Period
synth Sales Sales(51(1)100), trunit(0) trperiod(101) fig keep(50period_50state)
graph export "/Graph_50period_50state.png", replace
*Calculating treatment effects using DID (reshape data from wide format to long format)
use "/100period_50state.dta", clear
drop _Co_Number _W_Weight
reshape long _Y_, i(_time) j(State) string
gen treatment = 0
replace treatment = 1 if _time >= 101 & State == "treated"
encode State, generate(nState)
xtset nState
xtdidregress (_Y_)(treatment), group(nState) time(_time)
Differences-In-Differences
use "/Synth_Panel.dta", clear
*DID result using State 2 as control (which has a weight of beta = 0.8 when generating the treated unit in our simulation)
keep if State == 2 | State == 0
gen treatment = 0
replace treatment = 1 if Period >= 101 & State == 0
xtset State
xtdidregress (Sales)(treatment), group(State) time(Period)
C ADID, Alternative Methods for ATT Estimation, and Double Machine Learning
$\hat{\Delta}_{1t} = y^{1}_{1t} - y^{0}_{1t}$ = treatment effect for the first unit at time $t$, with $t \geq T_1 + 1$.

We can describe the observed data as $y_{it} = d_{it}\, y^{1}_{it} + (1 - d_{it})\, y^{0}_{it}$, where $d_{it} = 1$ if the $i$-th unit receives treatment at time $t$ and 0 otherwise. Given the above assumption on timing and units treated, the ATT estimator that averages $\hat{\Delta}_{1t}$ over the post-treatment period is $\hat{\Delta}_1 = \frac{1}{T_2} \sum_{t=T_1+1}^{T} \hat{\Delta}_{1t}$, where $T_2 = T - T_1$ is the number of post-treatment time periods and $\hat{\Delta}_{1t} = y^{1}_{1t} - y^{0}_{1t}$. While the data are available for $y^{1}_{1t}$ (the observed treated outcome), they are not available for $y^{0}_{1t}$ (the untreated counterfactual), so this quantity must be estimated. The methods differ according to the model used for this estimation. Table C.1 below shows the models and differences.
To summarize, the ADID method is more flexible than DID because it controls for slope differences, in addition to intercept differences, in the pre-treatment periods. This also means that it only requires the treated unit's trend to be parallel to a slope-adjusted trend of the control units in the pre-treatment periods. Compared with synthetic control methods, which require no intercept and weights that sum to one, ADID requires equal weights, but the weights can sum to any value.
Table C.1: Models for $y^{0}_{1t}$, formulas for $\hat{y}^{0}_{1t}$ and the ATT, and comparisons with HCW.

ADID
  Model for $y^{0}_{1t}$: $y^{0}_{1t} = \delta_1 + \delta_2\, \bar{y}^{0}_{co,t} + \text{error}$; $t = 1, \dots, T_1$
  Formula for $\hat{y}^{0}_{1t}$: $\hat{y}^{0}_{1t,\mathrm{ADID}} = \hat{\delta}_1 + \hat{\delta}_2\, \bar{y}^{0}_{co,t}$
  Formula for ATT: $\frac{1}{T_2} \sum_{t=T_1+1}^{T} \big(y^{1}_{1t} - \hat{y}^{0}_{1t,\mathrm{ADID}}\big)$
  Comments: Compared with HCW, if the $\beta_j$'s are equal, yields ADID.

DID
  Model for $y^{0}_{1t}$: $y^{0}_{1t} = \delta_1 + \bar{y}^{0}_{co,t} + \text{error}$; $t = 1, \dots, T_1$
  Formula for $\hat{y}^{0}_{1t}$: $\hat{y}^{0}_{1t,\mathrm{DID}} = \hat{\delta}_1 + \bar{y}^{0}_{co,t}$
  Formula for ATT: $\frac{1}{T_2} \sum_{t=T_1+1}^{T} \big(y^{1}_{1t} - \hat{y}^{0}_{1t,\mathrm{DID}}\big)$
  Comments: Compared with HCW, if the $\beta_j$'s are equal and positive and sum to 1, yields DID.

SC
  Model for $y^{0}_{1t}$: $y^{0}_{1t} = z_t'\beta + \text{error}$; $\beta_{SC}$ minimizes the error sum of squares
  Formula for $\hat{y}^{0}_{1t}$: $\hat{y}^{0}_{1t,\mathrm{SC}} = z_t'\hat{\beta}_{SC}$
  Formula for ATT: $\frac{1}{T_2} \sum_{t=T_1+1}^{T} \big(y^{1}_{1t} - \hat{y}^{0}_{1t,\mathrm{SC}}\big)$
  Comments: Compared with HCW, if the $\beta_j$'s are all positive and sum to 1 with no intercept, yields SC.

MSC
  Model for $y^{0}_{1t}$: Same as for SC, with the conditions $\beta_1 = 0$; $\sum_{j=2}^{N} \beta_j = 1$; $\beta_j \geq 0$
  Formula for $\hat{y}^{0}_{1t}$: $\hat{y}^{0}_{1t,\mathrm{MSC}} = z_t'\hat{\beta}_{MSC}$
  Formula for ATT: $\frac{1}{T_2} \sum_{t=T_1+1}^{T} \big(y^{1}_{1t} - \hat{y}^{0}_{1t,\mathrm{MSC}}\big)$
  Comments: Compared with HCW, if the $\beta_j$'s are all positive, yields MSC.

HCW
  Model for $y^{0}_{1t}$: $y^{0}_{1t} = z_t'\beta + \text{error}$; $t = 1, \dots, T_1$; $\hat{\beta}_{OLS}$ is the least squares estimate of $\beta$
  Formula for $\hat{y}^{0}_{1t}$: $\hat{y}^{0}_{1t,\mathrm{HCW}} = z_t'\hat{\beta}_{OLS}$
  Formula for ATT: $\frac{1}{T_2} \sum_{t=T_1+1}^{T} \big(y^{1}_{1t} - \hat{y}^{0}_{1t,\mathrm{HCW}}\big)$
  Comments: This is the base case against which comparisons are made.
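To illustrate how the estimators in Table C.1 differ, the following minimal Python sketch (simulated data; all variable names and parameter values are our own illustrative assumptions, not part of the monograph's code) computes the HCW, ADID, and DID counterfactuals for a single treated unit and the corresponding ATT estimates. The SC and MSC estimators additionally constrain the weights (non-negativity, sum to one) and would require a constrained optimizer, so they are omitted from this sketch.

import numpy as np

rng = np.random.default_rng(0)
T, T1 = 200, 100                      # total and pre-treatment periods
T2 = T - T1                           # post-treatment periods
N_co = 5                              # number of control units

# Simulated panel: control units and a treated unit with true ATT = 10
z = rng.normal(10, 1, size=(T, N_co))            # control-unit outcomes z_t
beta_true = np.array([0.3, 0.2, 0.2, 0.2, 0.1])  # weights generating the treated unit
y1 = z @ beta_true + rng.normal(0, 0.5, T)       # untreated potential outcome of unit 1
y1[T1:] += 10                                    # add the treatment effect after T1

pre = slice(0, T1)

# HCW: OLS of y1 on the control outcomes (with intercept) in the pre-treatment periods
X_hcw = np.column_stack([np.ones(T1), z[pre]])
b_hcw = np.linalg.lstsq(X_hcw, y1[pre], rcond=None)[0]
y0_hcw = np.column_stack([np.ones(T), z]) @ b_hcw

# ADID: intercept and a common slope on the average of the controls
zbar = z.mean(axis=1)
X_adid = np.column_stack([np.ones(T1), zbar[pre]])
d_adid = np.linalg.lstsq(X_adid, y1[pre], rcond=None)[0]
y0_adid = d_adid[0] + d_adid[1] * zbar

# DID: intercept only, slope on the control average fixed at one
d1_did = (y1[pre] - zbar[pre]).mean()
y0_did = d1_did + zbar

# ATT = post-treatment average of (observed treated outcome - estimated counterfactual)
att = {name: (y1[T1:] - y0[T1:]).mean()
       for name, y0 in [("HCW", y0_hcw), ("ADID", y0_adid), ("DID", y0_did)]}
print(att)   # each estimate should be close to the true effect of 10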
D Useful Resources
Gu, X. and P. K. Kannan (2021). "The dark side of mobile app adoption: Examining the impact on customers' multichannel purchase". Journal of Marketing Research. 58(2): 246–264.
Lu, S., K. Rajavi, and I. Dinner (2021). "The effect of over-the-top media services on piracy search: Evidence from a natural experiment". Marketing Science. 40(3): 548–568.
Neyman, J. (1979). "C(α) tests and their use". Sankhyā: The Indian Journal of Statistics, Series A: 1–21.
Shi, Z., X. Liu, and K. Srinivasan (2022). "Hype news diffusion and risk of misinformation: The Oz effect in health care". Journal of Marketing Research. 59(2): 327–352.
References
Tan, A.-H. (1999). “Text mining: The state of the art and the challenges”.
Proceedings of the PAKDD 1999 Workshop on Knowledge Discovery
from Advanced Databases. 8: 65–70.
Thistlethwaite, D. L. and D. T. Campbell (1960). “Regression-discon-
tinuity analysis: An alternative to the ex post facto experiment”.
Journal of Educational Psychology. 51(6): 309.
Thomas, M. (2020). “Spillovers from mass advertising: An identification
strategy”. Marketing Science. 39(4): 807–826.
Tian, Z., R. Dew, and R. Iyengar (2023). “Mega or Micro? Influencer
selection using follower elasticity”. Journal of Marketing Research.
doi: 10.1177/00222437231210267.
Tirunillai, S. and G. J. Tellis (2017). “Does offline TV advertising affect
online chatter? Quasi-experimental analysis using synthetic control”.
Marketing Science. 36(6): 862–878.
Unal, M. and Y.-H. Park (2023). “Fewer clicks, more purchases”. Man-
agement Science. 69(12): 7317–7334.
Wager, S. and S. Athey (2018). “Estimation and inference of hetero-
geneous treatment effects using random forests”. Journal of the
American Statistical Association. 113(523): 1228–1242.
Wang, Y., M. S. Qin, X. Luo, and Y. Kou (2022). “Frontiers: How
support for Black Lives Matter impacts consumer responses on
social media”. Marketing Science. 41(6): 1029–1044.
Wickens, C. D. and S. R. Dixon (2007). “The benefits of imperfect
diagnostic automation: A synthesis of the literature”. Theoretical
Issues in Ergonomics Science. 8(3): 201–212.
Xu, Y. (2017). “Generalized synthetic control method: Causal inference
with interactive fixed effects models”. Political Analysis. 25(1): 57–
76.
Yan, S., K. M. Miller, and B. Skiera (2022). “How does the adoption
of ad blockers affect news consumption?” Journal of Marketing
Research. 59(5): 1002–1018.
Yazdani, E., S. Gopinath, and S. Carson (2018). “Preaching to the choir:
The chasm between top-ranked reviewers, mainstream customers,
and product sales”. Marketing Science. 37(5): 838–851.