Session3 and 4 - RKS - PredictiveAnalytics
Rakesh K Singh
Predictive Analytics
Marketers are constantly thinking of new and more efficient ways to engage
with their customers.
Companies are looking for new ways to differentiate themselves from
competitors in order to retain customers.
Predictive analytics might be a solution that will allow them to better
understand and retain customers and acquire new ones more effectively.
Top 5 Reasons
1. Predict trends
2. Understand customers
3. Improve business performance
4. Drive strategic decision-making
5. Predict behavior
[Figure: diagram relating Y to X1, X2 and X3]
Multiple Regression Decision Process
Measurement error can be addressed through either:
Summated scales, or
Structural equation modeling procedures.
Meeting Multiple Regression Objectives
Only structural equation modeling (SEM) can directly
accommodate measurement error, but using summated scales
can mitigate it when using multiple regression.
When in doubt, include potentially irrelevant variables (as they
can only confuse interpretation) rather than possibly omitting a
relevant variable (which can bias all regression estimates).
Stage 2: Research Design of a Multiple
Regression Analysis
Issues to consider . . .
Sample size,
Unique elements of the dependence relationship can use
dummy variables as independents
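A nonmetric independent variable with k categories is usually entered into a regression as k - 1 dummy (0/1) variables, with the omitted category serving as the reference group. The following is a minimal Python sketch of that coding, not part of the original slides; the function name `dummy_code` and the region labels are hypothetical.

```python
def dummy_code(values, reference):
    """Return (column_names, rows) of 0/1 indicators, omitting `reference`."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category not present in data")
    # One indicator column per category except the reference category.
    kept = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Hypothetical region labels, used only for illustration.
regions = ["north", "south", "west", "south"]
names, coded = dummy_code(regions, reference="north")
for label, row in zip(regions, coded):
    print(label, dict(zip(names, row)))
```

A case in the reference category is coded 0 on every dummy, so each dummy coefficient is read as a difference from that reference group.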
Sample Size Considerations
Simple regression can be effective with a sample size of 20, but
maintaining power at .80 in multiple regression requires a minimum
sample of 50 and preferably 100 observations for most research
situations.
The minimum ratio of observations to variables is 5 to 1, but the
preferred ratio is 15 or 20 to 1, and this should increase when
stepwise estimation is used.
Maximizing the degrees of freedom improves generalizability and
addresses both model parsimony and sample size concerns.
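The rules of thumb above can be turned into a quick check before estimation. This is only a sketch in Python: the function name `check_sample_size` and the dictionary keys are hypothetical, and the thresholds are the ones quoted above (50 and 100 observations, ratios of 5 to 1 and 15 to 1).

```python
def check_sample_size(n_obs, n_predictors):
    """Compare a design against the sample-size rules of thumb quoted above."""
    if n_predictors < 1:
        raise ValueError("at least one independent variable is required")
    ratio = n_obs / n_predictors
    return {
        "ratio": ratio,
        "meets_minimum_n": n_obs >= 50,         # minimum sample of 50
        "meets_preferred_n": n_obs >= 100,      # preferably 100 observations
        "meets_minimum_ratio": ratio >= 5,      # at least 5 observations per variable
        "meets_preferred_ratio": ratio >= 15,   # preferred 15 (or 20) to 1
    }


report = check_sample_size(n_obs=100, n_predictors=4)
print(report)
```

The check does not encode the stepwise case, because the text says only that the ratio should increase without giving a figure.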
Variable Transformations
Variable Selection Approaches
Confirmatory (Simultaneous)
Sequential Search Methods:
Stepwise (variables not removed once included in regression
equation).
Forward Inclusion & Backward Elimination.
Hierarchical.
Combinatorial (All-Possible-Subsets)
Theory must be a guiding factor in evaluating the final regression model
because:
Confirmatory Specification, the only method to allow direct testing
of a pre-specified model, is also the most complex from the
perspectives of specification error, model parsimony and achieving
maximum predictive accuracy.
Sequential search (e.g., stepwise), while maximizing predictive
accuracy, represents a completely automated approach to model
estimation, leaving the researcher almost no control over the final
model specification.
Combinatorial estimation, while considering all possible models,
still removes control from the researcher in terms of final model
specification even though the researcher can view the set of roughly
equivalent models in terms of predictive accuracy.
No single method is best, and the prudent strategy is to use a
combination of approaches to capitalize on the strengths of each to
reflect the theoretical basis of the research question.
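To see why the combinatorial (all-possible-subsets) approach produces so many candidate models, the subsets can be enumerated directly. This is an illustrative Python sketch only; the helper `all_subsets` is hypothetical, and the variable names are borrowed from the HBAT perception variables simply as labels.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every non-empty subset of the candidate predictors."""
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


candidates = ["X6", "X7", "X8", "X9"]
models = list(all_subsets(candidates))
print(len(models))  # number of candidate models for 4 predictors
```

With k candidate predictors there are 2^k - 1 non-empty subsets, so 4 predictors give 15 models and 13 predictors give 8191. That growth is one reason the researcher still needs theory to choose among the roughly equivalent models.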
Interpretation
Coefficient of Determination
Variables Entered
Multicollinearity?
Let's work on it!
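As a reminder of what the coefficient of determination measures, the sketch below fits a one-predictor least-squares line and computes R squared as 1 minus the ratio of residual to total sum of squares. It is a minimal Python illustration with made-up numbers, not HBAT data; the function names `simple_ols` and `r_squared` are hypothetical.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up illustration data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]
r2 = r_squared(y, fitted)
print(round(r2, 3))
```

For these numbers the fitted line is y = 2.2 + 0.6x and R squared works out to 0.6, meaning 60% of the variation in y is accounted for by x.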
Description of HBAT Primary Database Variables
Variable  Description                                        Variable Type
Data Warehouse Classification Variables
X1        Customer Type                                      nonmetric
X2        Industry Type                                      nonmetric
X3        Firm Size                                          nonmetric
X4        Region                                             nonmetric
X5        Distribution System                                nonmetric
Performance Perceptions Variables
X6        Product Quality                                    metric
X7        E-Commerce Activities/Website                      metric
X8        Technical Support                                  metric
X9        Complaint Resolution                               metric
X10       Advertising                                        metric
X11       Product Line                                       metric
X12       Salesforce Image                                   metric
X13       Competitive Pricing                                metric
X14       Warranty & Claims                                  metric
X15       New Products                                       metric
X16       Ordering & Billing                                 metric
X17       Price Flexibility                                  metric
X18       Delivery Speed                                     metric
Outcome/Relationship Measures
X19       Satisfaction                                       metric
X20       Likelihood of Recommendation                       metric
X21       Likelihood of Future Purchase                      metric
X22       Current Purchase/Usage Level                       metric
X23       Consider Strategic Alliance/Partnership in Future  nonmetric
Dataset HBAT
* Pvalue averages X6 (Product Quality) with X13 (Competitive Pricing) reversed as 11 - X13.
COMPUTE Pvalue=(x6+(11-x13))/2.
EXECUTE.
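For readers who want to check the computed variable by hand, the same arithmetic can be written in Python. This is a sketch only: the function name `compute_pvalue` is hypothetical and the input scores are invented for illustration.

```python
def compute_pvalue(x6, x13):
    """Mirror of the SPSS line COMPUTE Pvalue=(x6+(11-x13))/2."""
    reversed_x13 = 11 - x13          # X13 is reversed before averaging
    return (x6 + reversed_x13) / 2   # mean of X6 and reversed X13


# Invented scores, for illustration only.
pvalue = compute_pvalue(x6=8.0, x13=5.0)
print(pvalue)
```

With X6 = 8.0 and X13 = 5.0 the reversed X13 is 6.0, so Pvalue is 7.0. A higher X13 score lowers Pvalue, which is the effect of the reversal.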
Check Assumptions
Correlations (Pearson Correlation)
                    X19 - Satisfaction  Postsales  Marketing  Tsupport  Pvalue
X19 - Satisfaction  1.000               .632       .454       .234      .462
Postsales           .632                1.000      .311       .173      .093
Marketing           .454                .311       1.000      .082      -.157
Tsupport            .234                .173       .082       1.000     .112
In this case the highest variance inflation factor (VIF) is 1.155, for
Postsales. A value this close to 1 indicates that there is some association
among the independent variables, but not enough to be a problem in
predicting the dependent variable.
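The collinearity figure quoted above can be cross-checked from the correlations among the independent variables, because the variance inflation factors are the diagonal elements of the inverse of the predictor correlation matrix. The Python sketch below assumes the correlations shown in the Correlations table apply to the same cases as the regression, and fills the Pvalue row by symmetry from the Pvalue column; the helper names `invert` and `vif_from_correlations` are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlations(corr):
    """Return the diagonal of the inverse correlation matrix (the VIFs)."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


names = ["Postsales", "Marketing", "Tsupport", "Pvalue"]
corr = [
    [1.000, 0.311, 0.173, 0.093],
    [0.311, 1.000, 0.082, -0.157],
    [0.173, 0.082, 1.000, 0.112],
    [0.093, -0.157, 0.112, 1.000],   # assumed by symmetry
]
vifs = dict(zip(names, vif_from_correlations(corr)))
for name, value in vifs.items():
    print(name, round(value, 3))
```

With these inputs the Postsales value comes out at about 1.155, in line with the figure quoted above. A VIF of exactly 1 would mean a predictor is uncorrelated with the others.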
Estimating Regression Model
GRAPH
/SCATTERPLOT(BIVAR)=COO_1
WITH SRE_1
/MISSING=LISTWISE.
Outlier removal
USE ALL.
COMPUTE filter_$=(COO_1 <= 0.04).
VARIABLE LABEL filter_$ 'COO_1 <= 0.04 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
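The filter logic above flags each case as selected (1) or not selected (0) according to its Cook's distance, then restricts the analysis to the selected cases. A minimal Python sketch of the same rule follows; the function name `select_by_cooks_distance` is hypothetical and the distances are invented for illustration, while the 0.04 cutoff is the one used in the syntax above.

```python
def select_by_cooks_distance(cooks_d, cutoff=0.04):
    """Return 1 (selected) or 0 (not selected) per case, like filter_$."""
    return [1 if d <= cutoff else 0 for d in cooks_d]


# Invented Cook's distance values, for illustration only.
cooks_d = [0.001, 0.012, 0.055, 0.040, 0.210]
flags = select_by_cooks_distance(cooks_d)
kept = [d for d, keep in zip(cooks_d, flags) if keep]
print(flags)
print(kept)
```

As with the SPSS filter, flagged cases are excluded from the analysis rather than deleted, so the model can be re-estimated with and without them for comparison.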
Results Validation
Non-metric variables
Control variables