RA Presentation
Group No 03
1. Introduction
Overview of the Dataset
Key Variables in the Data Set
● Date: Sale date
● Age & Age Group: Buyer’s age and age bracket
● Customer Gender: Buyer’s gender
● Country/State: Customer location
● Product Category/Sub-Category: Type of product sold
● Product: Specific product sold
● Order Quantity: Number of units sold
● Unit Cost: Cost of one unit of the product
● Unit Price: Selling price of one unit of the product
● Revenue: Total revenue from the sale
Key Questions to Address
What are the main factors driving revenue?
Identify the impact of unit price, order quantity, and customer age on revenue.
Objective and Methodology
Steps:
Checked for missing values in the critical columns: Revenue, Unit Price, Order Quantity, and Customer Age.
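A minimal Python sketch of this missing-value check is shown below. The original analysis was done in R (where `colSums(is.na(df))` is a common way to count missing values per column), and its code is not reproduced here. The function names, the toy rows, and the convention that `None`, blank strings, and `NaN` all count as missing are assumptions made for illustration.

```python
import math

# Columns the slides single out as critical for the regression.
CRITICAL_COLUMNS = ["Revenue", "Unit_Price", "Order_Quantity", "Customer_Age"]


def is_missing(value):
    """Treat None, blank strings and NaN floats as missing (an assumed convention)."""
    if value is None:
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return False


def count_missing(rows, columns=CRITICAL_COLUMNS):
    """Return {column: number of rows in which that column is missing}."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            # A key that is absent from the row also counts as missing.
            if is_missing(row.get(col)):
                counts[col] += 1
    return counts


def drop_incomplete(rows, columns=CRITICAL_COLUMNS):
    """Keep only the rows that have a value in every critical column."""
    return [r for r in rows if not any(is_missing(r.get(c)) for c in columns)]


# Hypothetical toy rows, not taken from the real data set.
rows = [
    {"Revenue": 950.0, "Unit_Price": 120.0, "Order_Quantity": 8, "Customer_Age": 34},
    {"Revenue": None, "Unit_Price": 45.0, "Order_Quantity": 2, "Customer_Age": 29},
    {"Revenue": 300.0, "Unit_Price": float("nan"), "Order_Quantity": 3, "Customer_Age": ""},
]
print(count_missing(rows))
# {'Revenue': 1, 'Unit_Price': 1, 'Order_Quantity': 0, 'Customer_Age': 1}
print(len(drop_incomplete(rows)))  # 1
```

Dropping incomplete rows is only one option. Imputation is the usual alternative when many rows are affected.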
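Feature Scaling:
Putting all numeric columns on a comparable scale.
Example:
• Age and Purchase Amount are scaled so that neither one dominates the other.
Methods:
• Normalization: Rescaling values to lie between 0 and 1.
• Standardization: Rescaling values to have a mean of 0 and a standard deviation of 1.
Why: Keeps variables with large numeric ranges from outweighing those with small ranges, so all predictors are treated on an equal footing. In linear regression, scaling does not change the fitted values or R-squared, but it makes the coefficients directly comparable.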
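The two scaling methods can be sketched in plain Python as shown below. This is an illustration rather than the group's R code. In R, `scale()` performs the same standardization. The function names and the sample ages are hypothetical.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly into [0, 1]: the minimum maps to 0, the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores with mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator, like R's sd()
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


ages = [19, 25, 35, 47, 64]  # hypothetical customer ages
print(min_max_normalize(ages))  # smallest age -> 0.0, largest -> 1.0
print(standardize(ages))        # values centred on 0
```

Min-max normalization is sensitive to outliers because a single extreme value sets the whole range. Standardization is usually the safer default for long-tailed columns.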
Univariate Analysis:
Check Assumptions of Linear Regression
1. Linearity
2. Homoscedasticity
3. Multicollinearity
We can check for multicollinearity using the Variance Inflation Factor (VIF): a VIF value above 5 indicates high multicollinearity.
4. Normality of Residuals
The residuals should be approximately normally distributed. This can be checked using a Q-Q plot or a histogram of the residuals.
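Both checks can be sketched with NumPy. The `vif` function applies the definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The `qq_points` function returns the coordinates of a normal Q-Q plot. In R, the usual equivalents are `car::vif()` and `qqnorm()`/`qqline()`. The function names and the simulated data are assumptions for illustration.

```python
import numpy as np
from statistics import NormalDist


def vif(X):
    """Variance Inflation Factor for each column of the predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
    column j on all the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))  # values above 5 flag high multicollinearity
    return np.array(out)


def qq_points(residuals):
    """Coordinates for a normal Q-Q plot: theoretical quantiles vs. sorted,
    standardized residuals. Points close to the line y = x suggest normality."""
    r = np.sort(np.asarray(residuals, dtype=float))
    z = (r - r.mean()) / r.std(ddof=1)
    n = len(r)
    # Plotting positions (i - 0.5) / n, a common choice for Q-Q plots.
    theo = np.array([NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])
    return theo, z


rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)  # nearly a copy of x1
print(vif(np.column_stack([x1, x2, x3])).round(1))  # x1 and x3 get large VIFs
```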
Feature Selection via Stepwise Regression
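The slides do not say which stepwise variant was used. The sketch below shows one plausible version, backward elimination by AIC: start from all predictors and repeatedly drop the one whose removal lowers AIC the most. AIC is computed as n*log(RSS/n) + 2k, which is the form R's `extractAIC()` reports for linear models and that `step()` compares. The function names are hypothetical.

```python
import numpy as np


def ols_aic(X, y):
    """AIC of an OLS fit with intercept, as n*log(RSS/n) + 2k,
    where k counts the estimated coefficients (intercept included)."""
    n = len(y)
    if X.shape[1] > 0:
        A = np.column_stack([np.ones(n), X])
    else:
        A = np.ones((n, 1))  # intercept-only model
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def backward_stepwise(X, y, names):
    """Backward elimination: drop the predictor whose removal lowers AIC the
    most, and stop when no single removal lowers AIC any further."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = list(range(X.shape[1]))
    best = ols_aic(X[:, keep], y)
    while keep:
        # AIC of every model that leaves out exactly one current predictor.
        trials = [(ols_aic(X[:, [k for k in keep if k != j]], y), j) for j in keep]
        aic, drop = min(trials)
        if aic >= best:
            break
        best = aic
        keep.remove(drop)
    return [names[k] for k in keep], best
```

R's `step()` can also search forward or in both directions. This sketch only removes predictors.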
Final Model:
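Writing out the final model with the coefficient estimates reported in the Model Evaluation section (rounded as shown there) gives:

```latex
\widehat{\text{Revenue}} = -84.74 + 1.27\,\text{Unit\_Price} + 20.39\,\text{Order\_Quantity} + 0.57\,\text{Customer\_Age}
```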
5. Model Evaluation
Interpretation of Coefficients:
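• Intercept (-84.74): The intercept is the expected revenue when all the independent variables are zero. Because no customer has an age of zero, it mainly anchors the regression line and has no direct practical meaning.
• Unit_Price (1.27): Each one-unit increase in unit price is associated with an increase of about 1.27 in revenue, holding order quantity and customer age constant. This coefficient is highly significant (p-value less than 2e-16).
• Order_Quantity (20.39): Each additional unit ordered is associated with about 20.39 more revenue, holding the other predictors constant. The t-value of 76.59 and the p-value of less than 2e-16 indicate a very strong positive relationship between order quantity and revenue.
• Customer_Age (0.57): Each additional year of customer age is associated with about 0.57 more revenue. Although this effect is much smaller than those of unit price and order quantity, it is statistically significant (p-value = 0.00388), suggesting that older customers tend to contribute slightly more revenue.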
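To make the interpretation concrete, the sketch below plugs the reported coefficients into the fitted equation. The input values (a unit price of 100, 2 units, and a customer aged 40) are hypothetical. The point is that changing one predictor by one unit, while holding the others fixed, changes the prediction by exactly that predictor's coefficient.

```python
# Coefficients as reported in the Model Evaluation section (rounded).
COEF = {
    "Intercept": -84.74,
    "Unit_Price": 1.27,
    "Order_Quantity": 20.39,
    "Customer_Age": 0.57,
}


def predict_revenue(unit_price, order_quantity, customer_age):
    """Point prediction from the fitted linear model."""
    return (COEF["Intercept"]
            + COEF["Unit_Price"] * unit_price
            + COEF["Order_Quantity"] * order_quantity
            + COEF["Customer_Age"] * customer_age)


base = predict_revenue(100, 2, 40)
print(round(predict_revenue(100, 3, 40) - base, 2))  # 20.39: one extra unit ordered
print(round(predict_revenue(100, 2, 41) - base, 2))  # 0.57: customer one year older
```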
Statistical Significance:
All the predictor variables (Unit_Price, Order_Quantity, and Customer_Age) are statistically significant at the 0.01 level, meaning each one contributes to predicting revenue even after accounting for the other two.
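The reported p-values come from t statistics, each computed as the estimate divided by its standard error. The sketch below computes a two-sided p-value and the 0.01-level decision with the standard library. It assumes the residual degrees of freedom are large enough that the t distribution is essentially standard normal. That assumption belongs to the sketch: R's `summary()` uses the exact t distribution. The function names are hypothetical.

```python
import math


def two_sided_p(t_value):
    """Approximate two-sided p-value for a t statistic.

    Uses the standard normal distribution in place of the t distribution,
    which is accurate only when the residual degrees of freedom are large."""
    # For a standard normal Z, P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    return math.erfc(abs(t_value) / math.sqrt(2))


def is_significant(t_value, alpha=0.01):
    """True when the approximate p-value is below the significance level."""
    return two_sided_p(t_value) < alpha


print(two_sided_p(1.96))      # roughly 0.05
print(two_sided_p(76.59))     # underflows to 0.0; R reports such values as < 2e-16
print(is_significant(76.59))  # True
```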
Model Fit:
Multiple R-squared (0.6863): Approximately 68.63% of the variance in revenue is explained by the model, which indicates a strong relationship between the predictors and the outcome.
Adjusted R-squared (0.6863): The value is nearly identical to the Multiple R-squared. With a large sample and only three predictors, the penalty for each extra predictor is negligible, so the near-equality mainly reflects sample size. It does not by itself show whether adding predictors would help.
Residual Standard Error (733.2): Predictions typically differ from actual revenue by roughly 733 (in the same units as revenue). This error is fairly large, but it should be judged against the scale of the revenue values.
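The three fit statistics can be computed directly from an ordinary least squares fit, as in the NumPy sketch below. For p predictors plus an intercept, it follows the standard definitions: R-squared = 1 - SS_res / SS_tot, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), and residual standard error = sqrt(SS_res / (n - p - 1)). The function name and the four-point toy data are hypothetical.

```python
import numpy as np


def fit_summary(X, y):
    """OLS with intercept; returns (R-squared, adjusted R-squared, residual standard error)."""
    X = np.asarray(X, dtype=float)  # shape (n, p)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1 - ss_res / ss_tot
    # The factor (n - 1) / (n - p - 1) approaches 1 when n is much larger than p.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    rse = float(np.sqrt(ss_res / (n - p - 1)))  # same units as y
    return r2, adj_r2, rse


r2, adj_r2, rse = fit_summary([[1], [2], [3], [4]], [1, 2, 3, 5])
print(round(r2, 4), round(adj_r2, 4), round(rse, 4))  # 0.9657 0.9486 0.3873
```

With only four points, the adjustment visibly lowers R-squared. With a large sample, the gap shrinks towards zero.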
F-statistic (8.243e+04, p < 2.2e-16):
With an extremely high F-statistic and a p-value < 2.2e-16, we reject the null hypothesis that all slope coefficients are zero. Taken together, the predictors explain a statistically significant share of the variation in revenue.
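For a model with an intercept and p predictors (here p = 3), the overall F-statistic tests whether all slope coefficients are zero. It can be written in terms of R-squared:

```latex
\begin{aligned}
H_0 &: \beta_{\text{Unit\_Price}} = \beta_{\text{Order\_Quantity}} = \beta_{\text{Customer\_Age}} = 0 \\
F &= \frac{R^2 / p}{(1 - R^2) / (n - p - 1)} \sim F_{p,\; n - p - 1} \quad \text{under } H_0
\end{aligned}
```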
Residual Analysis:
The median residual is -45, which is small relative to the residual standard error (733.2). This suggests the model shows no large systematic bias for a typical transaction.
The residuals are, however, strongly asymmetric (ranging from -1,785 to 54,507). A few very large positive residuals show that the model substantially underestimates revenue for some extreme transactions.
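A residual summary like the one described can be reproduced with Python's `statistics` module. The helper names are hypothetical, and the quartile method (`method="inclusive"`) is a choice made for this sketch. `share_beyond` offers one simple way to quantify how many points sit in the long tail.

```python
import statistics


def residual_summary(residuals):
    """Min, quartiles, median and max of the residuals, similar in spirit to
    the 'Residuals:' block printed by R's summary() for lm fits."""
    q1, median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    return {"Min": min(residuals), "1Q": q1, "Median": median,
            "3Q": q3, "Max": max(residuals)}


def share_beyond(residuals, threshold):
    """Fraction of residuals whose absolute value exceeds the threshold."""
    return sum(abs(r) > threshold for r in residuals) / len(residuals)
```

For example, applied to the model's residuals, `share_beyond(residuals, 3 * 733.2)` would report the fraction of transactions lying more than three residual standard errors from the fit.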
Cross-Validation:
Cross-validation helps detect overfitting and gives a more robust estimate of how well the model generalizes to unseen data. The RMSE and R-squared obtained from cross-validation confirm that the model's performance is consistent.
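The slides do not state how many folds were used. The sketch below therefore assumes plain k-fold cross-validation with k = 5 (a common default) for an ordinary least squares model. Each fold is held out once, the model is fitted on the remaining rows, and RMSE and R-squared are computed on the held-out fold only. In R, this is often done with `caret::train()` and `trainControl(method = "cv")`. The Python function here is a hypothetical illustration.

```python
import numpy as np


def kfold_cv(X, y, k=5, seed=0):
    """k-fold cross-validation of an OLS model with intercept.

    X must be 2-D with shape (n, p). Returns a list of (RMSE, R-squared)
    pairs, each computed on one held-out fold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)  # shuffle before splitting
    folds = np.array_split(idx, k)
    results = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        A_tr = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_te = np.column_stack([np.ones(len(test)), X[test]])
        err = y[test] - A_te @ beta
        rmse = float(np.sqrt(np.mean(err ** 2)))
        r2 = 1 - float(err @ err) / float(np.sum((y[test] - y[test].mean()) ** 2))
        results.append((rmse, r2))
    return results


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(500, 2))  # simulated predictors
y_demo = 1 + 3 * X_demo[:, 0] + 2 * X_demo[:, 1] + rng.normal(size=500)
for rmse, r2 in kfold_cv(X_demo, y_demo):
    print(f"RMSE={rmse:.3f}  R2={r2:.3f}")
```

Similar RMSE and R-squared values across folds are what "consistent performance" means in practice. A fold that scores much worse would point to instability or influential observations.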
6. Interpretation of Results
Interpretation of P-values:
Unit Price:
• High revenue is linked with higher unit prices and larger order quantities.
• Sales volume decreases at very high prices, indicating customer price sensitivity.
Customer Age:
• Adults aged 35-64 contribute most to revenue.
• Youth (<25) prefer lower-priced mountain bikes.
Seasonal Trends:
• Revenue is highest in June, followed by December and April.
• Align promotions with seasonal peaks for maximum sales impact.
Geographic Trends:
• Australia and Germany show significant high-revenue transactions.
• Emerging markets like Spain and the Netherlands offer growth opportunities.
8. Conclusion
Key Recommendations for Business Actions:
Targeted Marketing Strategies:
1. Youth Customers (<25): Focus on affordable product lines like mountain bikes
through online campaigns.
2. Adults (35-64): Promote premium products (e.g., road bikes, electric bikes) via
personalized in-store experiences.
Product Portfolio Optimization:
3. Prioritize road and mountain bikes while managing pricing to avoid deterring sales.
4. Expand into the electric bike market to tap into eco-friendly transportation demand.
Seasonal Promotions:
5. Capitalize on peak summer sales by aligning pricing and inventory strategies with
customer demand.
Geographic Expansion:
6. Focus on emerging markets like Spain and the Netherlands, customizing strategies
to fit local preferences and sensitivities.
Limitations and Future Opportunities
Data Limitations:
The analysis is based on historical data. Incorporating real-time trends and evolving customer
behavior is crucial for future accuracy.
Further Research:
Implement predictive modeling (e.g., machine learning) to better forecast future sales and
optimize business strategies.
9. References
● Source: https://www.kaggle.com/datasets/sadiqshah/bike-sales-in-europe
● Source: https://www.spss-tutorials.com/spss-kolmogorov-smirnov-test-for-normality/
● Source: https://www.mordorintelligence.com/industry-reports/europe-bicycle-market
● Montgomery, D. C., & Runger, G. C. Applied Statistics and Probability for Engineers. John Wiley & Sons.
● Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples).
Biometrika, 52(3/4), 591-611.
Group Members:
● D/DBA/23/0039 – D.M.D.N Dissanayaka
● D/DBA/23/0030 – H.A.H.D Wijayarathna
● D/DBA/23/0040 – P.T.G.S.T Thawalpitiya
● D/DBA/23/0024 – K.S Samarawickrama
● D/DBA/23/0015 – P.V.R Hirushi
● D/DBA/23/0034 – N.D.T.V Nawagamuwa
THANK YOU!