Bnad Case Assignment 1 - Hunter Bona
DATE: 3/23/2023
Introduction
After looking through your impressively large dataset on the wines you sell and procure,
I reached several conclusions. There is at least one difference in average rating, and at least
one difference in average price, among the 4 wine types. The type-level averages also suggest a
positive relationship between wine price and wine rating. Skewness varies across the wine types,
while every type has a kurtosis value under 3, meaning the distributions are flatter, with
lighter tails, than a normal distribution. The average prices were heavily affected by the large
outliers in the cost of the different wines. ANOVA tests on both the average ratings and the
average prices produced p-values lower than 0.05, so I rejected the null hypothesis in both
cases. Further explanation of these findings is provided below.
Data Analysis
Descriptive Statistics
Averages: The average ratings and average prices differ across the 4 wine types, yet the two
measures follow a similar pattern. As best visualized by the bar graph at the end of this
document, a type that ranks low on average rating also tends to rank low on average price. For
example, rose had the lowest average rating and the lowest average price, while sparkling wine
had the highest average rating as well as the highest average price. The mean ratings of the 4
types were all within roughly 0.25 rating points of each other; assuming that 5 is the best
possible rating, that leaves little difference in the public’s opinion of each type. The mean
prices vary much more, but prices are also measured on a much larger scale than ratings.
Skewness/Kurtosis: The skewness varies considerably across the wine types: red and white wine
have left-skewed distributions, while rose and sparkling have right-skewed distributions. For
kurtosis, rose had the highest value and sparkling wine had the lowest. It is worth mentioning
that all the kurtosis values were under 3, the value for a normal distribution, so every type
has negative excess kurtosis: a flatter curve with lighter tails than a normal distribution.
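To make the two measures concrete, here is a minimal sketch of how skewness and excess kurtosis (kurtosis minus 3, the quantity that turns negative when kurtosis is under 3) can be computed. The function names and the sample prices are assumptions made for illustration and are not taken from the winery dataset. The formulas are the plain population versions; spreadsheet and statistics packages often apply small-sample corrections, so their results may differ slightly.

```python
from statistics import fmean


def central_moment(values, order):
    """Average of (x - mean) ** order over all values (population form)."""
    mean = fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Negative for a left-skewed sample, positive for a right-skewed one."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3; negative means flatter than a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical prices for illustration only, NOT the winery's data.
example_prices = [8.0, 9.5, 10.0, 11.0, 12.5, 14.0, 30.0]
print("skewness:", round(skewness(example_prices), 3))
print("excess kurtosis:", round(excess_kurtosis(example_prices), 3))
```

A positive skewness here reflects the single expensive bottle stretching the right tail, which is the same effect described for the right-skewed wine types.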
Outliers: The gap between the maximum and minimum is far larger for prices than for ratings,
largely because of the scale each metric is measured on. Ratings fall within a narrow range, so
no single rating can pull the mean far, and outliers have little effect on the rating means.
This is not the case for wine prices: for example, the cheapest red wine that the winery holds
costs $3.55, while the most expensive wine they hold costs a little over $3400. Such extreme
prices pull the mean upward, so the mean alone is not a reliable measure of the typical price.
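The effect of a single extreme price on the mean can be illustrated with a short sketch. The price list below is hypothetical: only the $3.55 figure echoes the cheapest red wine mentioned above, and 3400.00 merely stands in for the most expensive bottle. Comparing the mean with the median is one common way to see how much an outlier distorts the average, and it is offered here as an assumption about how one might check this, not as the method used in the analysis.

```python
from statistics import fmean, median

# Hypothetical prices for illustration only, NOT the winery's data.
prices = [3.55, 12.00, 15.50, 18.00, 22.00, 3400.00]

mean_price = fmean(prices)                # pulled far upward by the 3400.00 bottle
median_price = median(prices)             # middle value, barely affected by it
mean_without_top = fmean(prices[:-1])     # mean after dropping the extreme bottle

print(f"mean with outlier:    {mean_price:.2f}")
print(f"median:               {median_price:.2f}")
print(f"mean without outlier: {mean_without_top:.2f}")
```

When the mean sits far above the median, as it does in this made-up list, the median is usually the better summary of a typical price.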
Because the p-value of 3.18E-74 is less than the significance level of 0.05, we can reject the null
hypothesis and can conclude there is at least one difference in average rating across the 4 wine
types.
Recommendation: With more testing, you can try to find out which specific wines pull the mean
rating down the most and consider replacing those wines with better-rated ones in order to
increase the mean rating.
Because the p-value of 1.47565E-52 is less than the significance level of 0.05, we can reject the
null hypothesis and can conclude there is at least one difference in average price across the 4
wine types.
Recommendation: With further testing, you could try lowering the prices of the more expensive
wines while raising the prices of the cheaper wines, and see how this affects the wines’
ratings, since price may have a subconscious effect on the consumer.
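For readers who want to see what the two ANOVA tests above compute, here is a minimal sketch of the one-way ANOVA F statistic. The function name and the ratings are assumptions made for illustration, and the numbers are not the winery's data. The sketch stops at the F statistic and its degrees of freedom; turning F into a p-value requires the F distribution, which a statistics package provides (for example, `scipy.stats.f_oneway` returns both the statistic and the p-value).

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ratings per wine type for illustration only.
ratings_by_type = {
    "red": [3.9, 4.0, 4.1, 3.8],
    "white": [3.8, 3.9, 3.7, 3.9],
    "rose": [3.6, 3.7, 3.8, 3.7],
    "sparkling": [4.0, 4.1, 4.2, 3.9],
}
f_stat, df_b, df_w = one_way_anova_f(list(ratings_by_type.values()))
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the type averages differ by more than the spread within each type would explain, which is what a very small p-value reflects.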
In the overlapping bar graph, I was able to see a positive relationship between average price
and average rating. As seen in the graph, the wine type with the lowest average price also had
the lowest average rating, and the wine type with the highest average price also had the highest
average rating. This pattern suggests that a more expensive wine is more likely to be rated
higher out of 5 than a cheaper wine, although a comparison of only 4 type averages cannot
confirm this for individual wines.
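One way to check the price and rating relationship on individual wines, rather than on the 4 type averages, is a Pearson correlation coefficient. The sketch below is an assumption about how that check could be done, not part of the original analysis; the function name and the price and rating pairs are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect positive linear relationship."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x, mean_y = fmean(xs), fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical (price, rating) pairs for illustration only.
example_prices = [6.0, 9.0, 14.0, 25.0, 40.0, 80.0]
example_ratings = [3.5, 3.7, 3.8, 4.0, 4.1, 4.3]
print("r =", round(pearson_r(example_prices, example_ratings), 3))
```

A value near +1 would support the pattern seen in the bar graph, while a value near 0 would mean price says little about an individual wine's rating.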
Conclusion
In conclusion, my analysis of the large dataset on the various wines revealed at least one
difference in average rating and at least one difference in average price among the 4 wine
types. The type averages also point to a positive relationship between wine price and rating,
and prices could likely be further optimized. ANOVA tests on both the average ratings and the
average prices resulted in p-values lower than 0.05, leading me to reject the null hypothesis in
both cases.
Appendix
Figure 1 - Red Wine Descriptive Statistics