Final Project Marketing Analytics
1. Significance F: Interpretation
Our Significance F value is essentially zero (8.1069E-18). This is a good sign because it
shows that the regression model as a whole explains a statistically significant share of
the variation in the data, so at least one of the selected variables has a significant
relationship with the dependent variable. Significance F is the p-value of the overall
F-test, so the smaller it is, the stronger the evidence; because ours is far below 0.05,
the regression model is useful for our analysis.
2. Regression: Convenience and Variety
The convenience and variety variables have coefficients of 0.148 and 0.451,
respectively. These results tell us that both factors have a positive relationship with the
dependent variable, price. Holding the other variable constant, a one-unit increase in
convenience is associated with a 0.148 increase in price, and a one-unit increase in
variety with a 0.451 increase.
3. P-value interpretation:
The p-value for the convenience variable is 0.007 and for variety is nearly zero
(1.5182E-12). This indicator determines whether we can reject the null hypothesis that a
coefficient equals zero. The p-value for the intercept is 7.2087E-07, which is also almost
zero. Because all of these p-values are well below 0.05, we reject the null hypothesis in
each case: the results are statistically significant and show a close association between
the independent variables and the dependent variable.
4. R^2 Interpretation:
This indicator is the percentage of the variance of the dependent variable that the model
explains. It measures the strength of the relationship between the dependent variable and
the model. In this case, the R^2 is 0.217, which is lower than we would have aimed for. In
other words, the model explains only about 22% of the variation of the response variable
around its mean. Nevertheless, that doesn’t mean that the regression model is not
valuable for our study.
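As a check on the R^2 discussed in item 4, the value can be reproduced from the sums of
squares in the Section 2 tables. The following is a minimal Python sketch under that
assumption; the variable names are ours and are not part of the original output.

```python
# Sums of squares and sample size as reported in the Section 2 ANOVA table.
ss_regression = 80.6315919
ss_total = 371.827692
n_observations = 325
n_predictors = 2  # convenience and variety

# R^2 is the share of the total variation captured by the regression.
r_squared = ss_regression / ss_total

# Adjusted R^2 penalizes the number of predictors in the model.
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)

print(f"R^2          = {r_squared:.4f}")
print(f"Adjusted R^2 = {adjusted_r_squared:.4f}")
```

Both results agree with the R^2 and adjusted R^2 shown in the regression statistics table.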
Summary
Overall, our Significance F value indicates that the model as a whole is statistically
significant. Similarly, the coefficients show the type of relationship the variables have
with the dependent variable, which is a positive, direct relationship. The p-values tell us
whether there is an association between the independent variables and the dependent
variable. They show a clear association between the variables we are analyzing: the p-value
for variety, for example, is 1.5182E-12, almost zero. On the other hand, the R^2 indicates
that most of the variation in the dependent variable is not explained by the model. The
model therefore provides valuable information, but we should expect discrepancies between
its predictions and the observed results.
Section 3: Logistic Regression
1. What is the value of the intercept?
The value of the intercept is -0.885.
2. What is the value of the coefficient for “Patronage”, and what is the p-value of the
coefficient?
The value of the coefficient for “Patronage” is 0.34. The p-value is 0.0029.
3. Compute the predicted probability of signing up for promotional
email communication.
1) Number of Chipotle visits in the past 3 months is 0.
=1/(1+EXP(-(-0.885+0.34*0))) = 0.29214273
2) Number of Chipotle visits in the past 3 months is 1.
=1/(1+EXP(-(-0.885+0.34*1))) = 0.36702522
3) Number of Chipotle visits in the past 3 months is 3.
=1/(1+EXP(-(-0.885+0.34*3))) = 0.53369884
4. Overall error rate and Evaluation.
For this study, the overall error rate is 29.85%, which gives our prediction model an
accuracy of about 70%. However, the error rate for the people who actually want to receive
promotional emails is 86.67%, while the error rate for the customers who do not want to
receive emails is just 2.73%. In this analysis it is more important to identify the people
who want to receive emails than those who don’t. The confusion matrix therefore indicates
that the model, at the 0.5 cutoff used here, is not useful enough to study our clients.
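The predicted probabilities in question 3 come from the logistic function applied to the
intercept and the “Patronage” coefficient. The Python sketch below repeats the same
calculation; the function and constant names are our own, and the coefficients are the
rounded values reported above.

```python
import math

# Rounded coefficients reported for the logistic regression.
INTERCEPT = -0.885
PATRONAGE_COEF = 0.34


def signup_probability(visits: float) -> float:
    """Predicted probability of signing up for promotional emails."""
    log_odds = INTERCEPT + PATRONAGE_COEF * visits
    return 1 / (1 + math.exp(-log_odds))


for visits in (0, 1, 3):
    print(f"{visits} visits -> {signup_probability(visits):.4f}")
```

With these rounded coefficients, the predicted probability only passes the 0.5 cutoff at
roughly 2.6 visits (0.885 / 0.34), which is consistent with the model predicting very few
sign-ups.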
Section 4: K-means clustering
• Describe each cluster’s characteristics
Cluster 1 has positive cluster values for all 3 variables of healthy, taste and ambience
(0.712, 0.294, and 0.334), and the pivot table shows the average importance people gave to
each item: 5, 5, and 4.4, respectively. This shows that cluster 1 wants a restaurant with
healthy and tasty food and a good atmosphere overall. On the other hand, cluster 2 gave
lower importance to those variables, with averages of 4.07, 4, and 3.5 and negative cluster
values of -0.658, -3.386, and -1.032, respectively. Finally, 2 of the 3 cluster values for
cluster 3 are negative (-1.118, 0.294, and -0.349), with rating averages of 3.77, 5, and 4,
respectively.
In summary, cluster 1 is characterized as the largest and most homogeneous cluster, in which
people value all 3 variables highly, while cluster 2 has the lowest values, showing that
these people don’t value those elements as much. Meanwhile, cluster 3 is the most
heterogeneous group, where participants rate each item differently. These clear differences
help us determine the ideal target cluster for Chipotle.
• Select one target cluster and provide a rationale for targeting this specific group.
The restaurant should target the group of people who most value Chipotle’s main strengths.
In this case, we recommend that Chipotle target cluster 1 because its members pay real
attention to the taste and healthy ingredients of the recipes. Similarly, Chipotle must
invest in and keep improving its atmosphere and taste ratings. People who value these
variables the most usually recommend their favorite place to their friends and family,
which represents an opportunity for the business to reach potential customers as well.
Section 5. Analysis of Your Choice
1. Variety and price
Variety has a positive relationship with the variable price, with a p-value of
almost zero. In other words, variety can influence the price: when the restaurant
offers more diverse dishes and recipes on its menu, it can raise prices and justify
this pricing strategy through higher customer satisfaction.
2. Target group cluster 1
The first group of participants places heavy value on good ambience, tasty dishes, and
healthy ingredients. As mentioned above, Chipotle must focus on cluster 1 because
its members prefer the elements that the restaurant does best. More investment must be
made to offer healthy and tasty food. Similarly, the restaurant’s environment needs
to be continuously improved in order to retain loyal customers and attract potential
clients who heard about the restaurant through word-of-mouth recommendations. The
restaurant must aim to become recognized throughout the city with a large number of good
reviews, since demanding customers such as those in cluster 1 remain loyal when they
find a good place to eat.
Section 6. Summary and Marketing Recommendations
Summary
In this study, we analyzed demographics and people’s preferences, and compared
information from other restaurants similar to Chipotle. From the data we collected, we
can say that although Chick-fil-A attracts higher-income customers, Chipotle can attract
younger people by offering a good ambience, a variety of flavors and healthy food. In
addition, the data shows that the restaurant is able to increase prices if it invests in
the variety of its food and in the overall environment of the business. The multiple
regression model is statistically significant; nevertheless, it is not able to explain all
the variation in the dependent variable. Similarly, the logistic regression, evaluated with
the confusion matrix, is not able to reliably identify the people who would want to receive
an email from the restaurant. On the other hand, the predicted probability that someone
accepts promotional emails even if they haven’t visited the place is 29.2%, and if they
visited the restaurant 3 times the probability increases to 53.37%. Finally, cluster 1
contains the ideal customers for Chipotle because they place much more value on
characteristics such as taste, healthiness and ambience when choosing a restaurant.
Every restaurant’s data is divided by gender (female and male); with this information, we
can identify which type of customers each restaurant attracts the most. Finally, the
overall average across all businesses is shown at the bottom.
Tables Section 2.
Regression Statistics
Multiple R (correlation coefficient)   0.46567374
R^2                                    0.21685204
R^2 adjusted                           0.21198776
Standard error                         0.95096568
Observations                           325

Analysis of Variance
             df    SS           MS           F            Significance F
Regression   2     80.6315919   40.3157959   44.5805636   8.1069E-18
Residual     322   291.1961     0.90433572
Total        324   371.827692
The multiple regression output provides the coefficients of the independent variables, the
standard error, the p-values and the residuals. It also includes the analysis of variance,
whose Significance F value helps us evaluate the overall model.
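The F statistic and Significance F in the analysis of variance can be recomputed from the
sums of squares and degrees of freedom. The Python sketch below is one way to do it using
only the standard library; the variable names are ours. It relies on the fact that, when
the regression has exactly 2 degrees of freedom, the upper-tail probability of the F
distribution has the closed form (1 + 2F/df_residual)^(-df_residual/2). That shortcut is an
assumption specific to this two-predictor model and does not apply to other models.

```python
# Values from the Section 2 analysis of variance table.
ss_regression, df_regression = 80.6315919, 2
ss_residual, df_residual = 291.1961, 322

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# The F statistic compares explained to unexplained variation.
f_statistic = ms_regression / ms_residual

# Closed-form upper-tail probability, valid only because df_regression == 2.
significance_f = (1 + 2 * f_statistic / df_residual) ** (-df_residual / 2)

print(f"F              = {f_statistic:.4f}")
print(f"Significance F = {significance_f:.4e}")
```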
Section 3 Tables.
Metric              Value
# Iterations Used   2
Residual DF         323
Residual Deviance   399.7277984
Multiple R2         0.022571421

Predictor   Criteria      Included
Intercept   16.83220121   TRUE
patronage   19.67231557   TRUE
Confusion Matrix
Actual\Predicted   0     1
0                  214   6
1                  91    14

Error Report
Class     # Cases   # Errors   % Error
0         220       6          2.72727
1         105       91         86.6666
Overall   325       97         29.8461
Metrics
Metric                 Value
Accuracy (#correct)    228
Accuracy (%correct)    70.1538462
Specificity            0.97272727
Sensitivity (Recall)   0.13333333
Precision              0.7
F1 score               0.224
Success Class          1
Success Probability    0.5
The logistic regression output includes several tables with information about the
predictors in our model. The confusion matrix cross-tabulates the actual and predicted
classes; below it is the error report with the error percentage for each class.
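The figures in the error report and the metrics table all derive from the four counts in
the confusion matrix. The Python sketch below recomputes them; the variable names are ours,
and class 1 (signing up) is treated as the success class, as in the metrics table.

```python
# Counts from the confusion matrix (rows = actual, columns = predicted).
true_negatives, false_positives = 214, 6   # actual class 0
false_negatives, true_positives = 91, 14   # actual class 1

total = true_negatives + false_positives + false_negatives + true_positives

accuracy = (true_positives + true_negatives) / total
overall_error = 1 - accuracy

# Share of actual non-subscribers classified correctly.
specificity = true_negatives / (true_negatives + false_positives)
# Share of actual subscribers classified correctly.
sensitivity = true_positives / (true_positives + false_negatives)
# Share of predicted subscribers who actually subscribed.
precision = true_positives / (true_positives + false_positives)
f1_score = 2 * precision * sensitivity / (precision + sensitivity)

print(f"Accuracy      = {accuracy:.4f}")
print(f"Overall error = {overall_error:.4f}")
print(f"Specificity   = {specificity:.4f}")
print(f"Sensitivity   = {sensitivity:.4f}")
print(f"Precision     = {precision:.4f}")
print(f"F1 score      = {f1_score:.4f}")
```

The low sensitivity is the key number here: only 14 of the 105 respondents who signed up
were predicted correctly.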
Section 4 Tables: K-Means Clustering
Cluster     importanthealthy   importanttaste   importantambience
Cluster 1    0.712              0.294            0.334
Cluster 2   -0.658             -3.386           -1.032
Cluster 3   -1.118              0.294           -0.349
For K-means clustering we have the 3 variables (healthy, taste, and ambience), with
participants divided into 3 clusters according to how they rated the importance of each
factor.
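K-means assigns each participant to the cluster whose center is closest. The Python sketch
below illustrates that assignment step, assuming the values in the table above are cluster
centers on a standardized scale. The example respondent and the function name are
hypothetical and do not come from our data.

```python
import math

# Cluster centers (healthy, taste, ambience) from the Section 4 table.
CLUSTER_CENTERS = {
    "Cluster 1": (0.712, 0.294, 0.334),
    "Cluster 2": (-0.658, -3.386, -1.032),
    "Cluster 3": (-1.118, 0.294, -0.349),
}


def nearest_cluster(scores):
    """Return the name of the cluster center closest in Euclidean distance."""
    return min(
        CLUSTER_CENTERS,
        key=lambda name: math.dist(scores, CLUSTER_CENTERS[name]),
    )


# Hypothetical respondent with standardized scores for the 3 variables.
example_respondent = (0.5, 0.3, 0.2)
print(nearest_cluster(example_respondent))
```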