RA Presentation

Regressions Analysis project PPT

Uploaded by

2001remeena
Copyright
© All Rights Reserved
Regression Analysis on

The Market of Bike Sales in Europe

Group No 03
1. Introduction
Overview of the Dataset

• The dataset is a comprehensive collection of European bike sales from 2010 to 2020.

• It contains 113,036 transactions and 18 variables.

• It includes information on product categories, sales volumes, revenue, and customer profiles.

Key Variables in the Data Set
● Date: Sale date
● Age & Age group: Buyer's age details
● Customer Gender: Buyer's gender
● Country/State: Location of customer
● Product Category/Sub-Category: Type of product sold
● Product: Specific product sold
● Order Quantity: Number of units sold
● Unit Cost: Product cost
● Unit Price: The selling price of the product
● Revenue: Total revenue from the product

Key Questions to Address
What are the main factors generating Revenue?
Identify the impact of unit price, order quantity, and customer age on revenue.

What is the impact of customer age group?
Explore how age affects purchasing behavior, order quantity, and price sensitivity.

What is the Relationship Between Unit Price & Order Quantity?
Analyze how changes in price and order volume influence revenue.

Are there combined effects between the independent variables?

Objective

The primary goal is to build a model that explains the variance in revenue based on the features in the dataset. By answering the key questions, we aim to provide actionable insights for:
● Optimizing Pricing Strategies
● Managing Inventory
● Customizing Marketing Approaches

Methodology

This analysis utilizes multiple regression techniques to analyze the relationships between the dependent variable (revenue) and independent variables.

Furthermore, we will conduct exploratory data analysis (EDA) to identify trends, outliers, and patterns within the dataset.

2. Data Preprocessing
Handling Missing Data

Objective: Ensure data quality by identifying and handling missing values.

Steps:
Checked for missing values in critical columns: Revenue, Unit Price, Order Quantity, Customer Age.

Output: No missing values found in the critical columns, ensuring data integrity for further analysis.

Data Cleaning: Removed any rows with missing values in non-essential columns.
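
The missing-value check described here could be sketched in R roughly as follows. The data frame `sales` and its values are invented for illustration, and the underscore column names are an assumption based on the variable names used in the modeling slides.

```r
# Toy stand-in for the bike sales data (values are invented for illustration)
sales <- data.frame(
  Revenue         = c(950, 360, 2401, 88),
  Unit_Price      = c(120, 45, 2384, 22),
  Order_Quantity  = c(8, 8, 1, 4),
  Customer_Age    = c(19, 49, 35, 28),
  Customer_Gender = c("M", "F", NA, "F")   # a non-essential column with one gap
)

critical <- c("Revenue", "Unit_Price", "Order_Quantity", "Customer_Age")

# Count missing values in each critical column
missing_counts <- colSums(is.na(sales[critical]))
print(missing_counts)

# Drop rows that still contain a missing value in any other column
sales_clean <- na.omit(sales)
print(nrow(sales_clean))
```
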
Encoding Categorical Variables
Converting text data into numbers so the model can understand it.

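Example:

• Turn Gender into numbers: "Male" = 1, "Female" = 0.

• Change Payment Method into categories: Card = 1, Cash = 2, etc. (codes follow the alphabetical order of the labels).

Why: Regression models need numbers, not words, to make calculations.

R Code:

# 0/1 indicator, matching the example above: "Male" = 1, "Female" = 0
data$Gender <- as.numeric(data$Gender == "Male")
# factor() orders labels alphabetically; as.numeric() returns the codes 1, 2, ...
data$Item_Purchased <- as.numeric(factor(data$Item_Purchased))
data$Payment_Method <- as.numeric(factor(data$Payment_Method))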
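
As an alternative to integer codes, R can expand a categorical column into 0/1 indicator (dummy) columns, which avoids implying an order between categories. The sketch below uses an invented `toy` data frame; `lm()` applies the same expansion automatically when a predictor is stored as a factor.

```r
# Invented example data with two categorical columns
toy <- data.frame(
  Payment_Method = factor(c("Cash", "Card", "Transfer", "Card")),
  Gender         = factor(c("Male", "Female", "Female", "Male"))
)

# model.matrix() builds the design matrix that lm() would use:
# one 0/1 column for every non-reference level of each factor
design <- model.matrix(~ Payment_Method + Gender, data = toy)
print(design)
```

The first level in alphabetical order ("Card", "Female") acts as the reference category and gets no column of its own.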

Feature Scaling:
Making sure all number columns are on a similar scale.

Example:
• Age and Purchase Amount are scaled so that one doesn’t overpower the other.

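Methods:
• Normalization: Changing values to fit between 0 and 1.
• Standardization: Adjusting values to have a mean of 0 and a standard deviation of 1.

Why: Puts the predictors on a comparable scale so that coefficient magnitudes can be compared directly. For ordinary least squares, scaling changes the coefficients but not the fitted values or R-squared.

R Code:

# scale() standardizes to mean 0 and standard deviation 1.
# It returns a one-column matrix, so as.numeric() converts it back to a vector.
data$Customer_Age <- as.numeric(scale(data$Customer_Age))
data$Purchase_Amount <- as.numeric(scale(data$Purchase_Amount))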
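
For reference, both scaling methods can be written out by hand. This is a minimal sketch on an invented vector of ages, not the project data.

```r
age <- c(20, 30, 40, 50, 60)   # invented ages

# Normalization (min-max): (x - min) / (max - min), result lies in [0, 1]
age_norm <- (age - min(age)) / (max(age) - min(age))

# Standardization (z-score): (x - mean) / sd, result has mean 0 and sd 1
age_std <- (age - mean(age)) / sd(age)

print(age_norm)
print(age_std)
```

The standardized vector is the same result that `scale()` produces with its default arguments.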
3. Exploratory Data Analysis (EDA)
• This section aims to uncover key insights from the European bike sales data.

• The focus of our analysis is primarily on understanding revenue and how it is influenced by factors such as customer demographics, order quantities, and unit prices.

• We'll use both univariate and bivariate analysis to explore these relationships before moving to modeling.

Univariate Analysis:

Revenue: Positively skewed distribution with outliers
This histogram shows that most sales generate lower revenue, with some outliers representing high-value transactions. This could indicate bulk orders or premium products.

Order Quantity: Mostly small orders with a few large bulk orders
Most orders involve smaller quantities, though some larger bulk orders stand out as outliers. These large orders might correspond to special deals or wholesale purchases.
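
A histogram of this kind can be drawn with base R. The sketch below uses simulated right-skewed values rather than the real revenue column, so the numbers are illustrative only.

```r
set.seed(1)
# Simulated right-skewed revenue values (not the real data)
revenue <- rlnorm(1000, meanlog = 6, sdlog = 1)

# For a positively skewed variable the mean sits above the median
print(c(mean = mean(revenue), median = median(revenue)))

# Histogram: most values are small, with a long tail to the right
hist(revenue, breaks = 50, main = "Revenue", xlab = "Revenue")
```
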
Bivariate Analysis:

1. Higher unit prices are often associated with higher revenue, but interestingly, some high-revenue transactions occur at mid-range prices. This suggests that premium products, although fewer in number, significantly impact revenue.

2. Higher order quantities generally correspond to higher revenue. However, we also observe a few outliers where small orders produce substantial revenue, likely due to high-value items.
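
Bivariate relationships like these are usually inspected with scatter plots and correlation coefficients. The following sketch runs on simulated data; treating revenue as price times quantity is an assumption made only to generate plausible values.

```r
set.seed(2)
n <- 200
unit_price     <- sample(c(5, 35, 120, 540, 2400), n, replace = TRUE)
order_quantity <- sample(1:30, n, replace = TRUE)
revenue        <- unit_price * order_quantity   # simulated, not the real data

# Pairwise Pearson correlations
print(cor(cbind(revenue, unit_price, order_quantity)))

# Side-by-side scatter plots of revenue against each predictor
op <- par(mfrow = c(1, 2))
plot(unit_price, revenue, xlab = "Unit price", ylab = "Revenue")
plot(order_quantity, revenue, xlab = "Order quantity", ylab = "Revenue")
par(op)
```
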
Additional Insights

• In terms of geography, Australia and Canada have a few outliers with extremely high revenue, likely due to specific large transactions.
• In product categories, the Bikes category dominates the revenue, while Accessories and Clothing contribute much less. This highlights the importance of focusing on bike sales as a major revenue driver.
• June and December stand out as the top months. This suggests potential seasonality, with summer and the holiday season driving higher bike sales.
• These insights can help the company plan inventory and promotional strategies.
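
Category and month totals of this kind can be computed with `aggregate()`. The rows below are invented, and the column names `Date`, `Product_Category` and `Revenue` are assumed from the variable list.

```r
# Invented rows standing in for the sales data
sales <- data.frame(
  Date             = as.Date(c("2015-06-03", "2015-06-20", "2015-12-11", "2016-02-07")),
  Product_Category = c("Bikes", "Accessories", "Bikes", "Clothing"),
  Revenue          = c(2400, 60, 1800, 90)
)

# Total revenue per product category
by_category <- aggregate(Revenue ~ Product_Category, data = sales, FUN = sum)

# Total revenue per calendar month ("01" to "12")
sales$Month <- format(sales$Date, "%m")
by_month <- aggregate(Revenue ~ Month, data = sales, FUN = sum)

print(by_category)
print(by_month)
```
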
4. Model Building for Revenue Prediction
Objective:
To predict the Revenue generated from bike sales using relevant variables from the dataset.

The basic linear regression model uses three predictor variables:

• Unit_Price: The price per unit of the bike.
• Order_Quantity: The number of units ordered by the customer.
• Customer_Age: The age of the customer making the purchase.

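
A model with these three predictors can be fitted with `lm()`. The sketch below runs on simulated data with arbitrary coefficients, so its output will not match the results reported for the real dataset.

```r
set.seed(3)
n <- 500
sales <- data.frame(
  Unit_Price     = runif(n, 5, 2500),
  Order_Quantity = sample(1:30, n, replace = TRUE),
  Customer_Age   = sample(17:80, n, replace = TRUE)
)
# Simulated response; the coefficients used here are arbitrary
sales$Revenue <- 50 + 1.2 * sales$Unit_Price + 20 * sales$Order_Quantity +
  0.5 * sales$Customer_Age + rnorm(n, sd = 100)

# Multiple linear regression of revenue on the three predictors
model <- lm(Revenue ~ Unit_Price + Order_Quantity + Customer_Age, data = sales)
print(summary(model))
```
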
Check Assumptions of Linear Regression
1. Linearity

2. Homoscedasticity

3. Multicollinearity
We can check for multicollinearity using the Variance Inflation Factor (VIF): a VIF value above 5 indicates high multicollinearity.

4. Normality of Residuals
The residuals should be approximately normally distributed. This can be checked using a Q-Q plot or histogram of residuals.
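
These checks might look roughly like this in base R. The data are simulated, and the VIF is computed by hand from its definition instead of through an add-on package, which is a choice made for this sketch only.

```r
set.seed(4)
n <- 300
d <- data.frame(
  Unit_Price     = runif(n, 5, 2500),
  Order_Quantity = sample(1:30, n, replace = TRUE),
  Customer_Age   = sample(17:80, n, replace = TRUE)
)
d$Revenue <- 50 + 1.2 * d$Unit_Price + 20 * d$Order_Quantity + rnorm(n, sd = 100)
fit <- lm(Revenue ~ Unit_Price + Order_Quantity + Customer_Age, data = d)

# VIF for predictor j is 1 / (1 - R2_j), where R2_j comes from
# regressing predictor j on the remaining predictors
predictors <- c("Unit_Price", "Order_Quantity", "Customer_Age")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = d))$r.squared
  1 / (1 - r2)
})
print(vif)

# Linearity and homoscedasticity: residuals against fitted values
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality of residuals: Q-Q plot
qqnorm(resid(fit))
qqline(resid(fit))
```

A patternless, evenly spread cloud in the residual plot supports linearity and constant variance, while points close to the line in the Q-Q plot support normality.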

Feature Selection via Stepwise Regression

Stepwise Regression iteratively adds or removes predictors based on the Akaike Information Criterion (AIC), which balances model fit and complexity.

The stepwise regression retained all three variables (Unit_Price, Order_Quantity, and Customer_Age): removing any of them did not lower the AIC. As a result, all three were kept in the final model.
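
AIC-based stepwise selection is available in base R through `step()`. The sketch below applies it to simulated data in which all three predictors genuinely affect revenue; it is not a reproduction of the project's run.

```r
set.seed(5)
n <- 300
d <- data.frame(
  Unit_Price     = runif(n, 5, 2500),
  Order_Quantity = sample(1:30, n, replace = TRUE),
  Customer_Age   = sample(17:80, n, replace = TRUE)
)
d$Revenue <- 50 + 1.2 * d$Unit_Price + 20 * d$Order_Quantity +
  5 * d$Customer_Age + rnorm(n, sd = 100)

full <- lm(Revenue ~ Unit_Price + Order_Quantity + Customer_Age, data = d)

# step() drops or re-adds terms for as long as doing so lowers the AIC
selected <- step(full, direction = "both", trace = 0)
print(formula(selected))
```
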

Final Model:

Revenue = β0 + β1 (Unit Price) + β2 (Order Quantity) + β3 (Customer Age) + ε

5. Model Evaluation
Interpretation of Coefficients:
• Intercept (-84.74): The intercept represents the expected value of revenue when all the independent variables are zero. Because a sale with zero price, zero quantity, and a customer aged zero cannot occur, the negative value has no direct business meaning.

• Unit_Price (1.27): Holding the other variables constant, each $1 increase in unit price is associated with a $1.27 increase in revenue. This coefficient is highly significant, with a p-value less than 2e-16.

• Order_Quantity (20.39): Each additional unit ordered is associated with a $20.39 increase in revenue. This is a very strong positive relationship, as shown by the t-value of 76.59 and a p-value of less than 2e-16.

• Customer_Age (0.57): Though the effect of age is smaller compared to unit price and order quantity, it is statistically significant (p-value = 0.00388), indicating that older customers are likely to contribute slightly more revenue.
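
To see how the coefficients combine, here is a worked prediction for a hypothetical order. The input values (unit price 100, quantity 10, age 35) are made up for illustration.

```r
# Coefficients as reported for the fitted model
b0      <- -84.74
b_price <- 1.27
b_qty   <- 20.39
b_age   <- 0.57

# Hypothetical order: unit price 100, quantity 10, customer aged 35
predicted <- b0 + b_price * 100 + b_qty * 10 + b_age * 35
print(predicted)   # 266.11
```

The terms contribute -84.74 + 127.00 + 203.90 + 19.95, so order quantity and unit price account for nearly all of the predicted value.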
Statistical Significance:
All the predictor variables (Unit_Price, Order_Quantity, and Customer_Age) are
statistically significant at the 0.01 level, meaning they have a significant effect on
predicting revenue.

Model Fit:
• Multiple R-squared (0.6863): Approximately 68.63% of the variance in revenue is explained by the model. This suggests a strong relationship between the predictors and the outcome.
• Adjusted R-squared (0.6863): The value is nearly identical to the Multiple R-squared. This is expected with only three predictors and more than 113,000 observations: the penalty for model size is negligible, so the fit is not inflated by the number of predictors.
• Residual Standard Error (733.2): The typical prediction error is about 733 revenue units. This is relatively high, but it is expected given the scale of the revenue values.
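
The link between these fit statistics can be checked with the standard formulas. This sketch assumes that all 113,036 transactions were used to fit the three-predictor model, which the slides do not state explicitly.

```r
r2 <- 0.6863    # Multiple R-squared as reported
n  <- 113036    # transactions in the dataset (assumed to be the fitting sample)
p  <- 3         # predictors in the model

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic expressed through R-squared
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(adj_r2, 4))
print(f_stat)
```

Under that assumption the adjusted value rounds to the same 0.6863, and the F-statistic comes out close to the reported 8.243e+04.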
F-statistic (8.243e+04, p < 2.2e-16):
With an extremely high F-statistic and a p-value < 2.2e-16, we can conclude that the
model significantly explains the variation in revenue and that the relationship between
the predictors and revenue is statistically significant.

Residual Analysis:
• The median residual is -45, which is close to zero relative to the residual standard error of 733.2. This suggests that the model does not systematically overestimate or underestimate revenue for typical transactions.
• The residuals range from -1785 to 54507. This spread is wide and strongly asymmetric: the long positive tail means the model underestimates a small number of very high-revenue transactions.

Cross-Validation:
Cross-validation helps detect overfitting and provides a more robust evaluation of the model's generalizability. The RMSE and R-squared from cross-validation confirm consistent model performance.
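
A k-fold cross-validation of the regression can be written in base R without extra packages. The sketch below uses simulated data and k = 5; the number of folds used in the project is not stated, so that value is an assumption.

```r
set.seed(6)
n <- 500
d <- data.frame(
  Unit_Price     = runif(n, 5, 2500),
  Order_Quantity = sample(1:30, n, replace = TRUE),
  Customer_Age   = sample(17:80, n, replace = TRUE)
)
d$Revenue <- 50 + 1.2 * d$Unit_Price + 20 * d$Order_Quantity + rnorm(n, sd = 100)

k <- 5
fold <- sample(rep(1:k, length.out = n))   # random fold label for every row

# Fit on k - 1 folds, evaluate on the held-out fold, repeat for each fold
cv <- sapply(1:k, function(i) {
  train <- d[fold != i, ]
  test  <- d[fold == i, ]
  fit   <- lm(Revenue ~ Unit_Price + Order_Quantity + Customer_Age, data = train)
  pred  <- predict(fit, newdata = test)
  c(RMSE = sqrt(mean((test$Revenue - pred)^2)),
    R2   = cor(test$Revenue, pred)^2)
})

print(cv)             # one column per fold
print(rowMeans(cv))   # average held-out RMSE and R-squared
```

Similar RMSE and R-squared values across the folds are what "consistent model performance" refers to.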

6. Interpretation of Results
Interpretation of P-values:
Unit Price:
• P-value: < 2e-16 (highly significant)
• Strong positive impact on revenue.
• A $1 increase in unit price is associated with a $1.27 rise in revenue, holding the other variables constant.

Order Quantity:
• P-value: < 2e-16 (highly significant)
• Positive relationship with revenue.
• Each additional unit sold is associated with a $20.39 increase in revenue, holding the other variables constant.

Customer Age:
• P-value: 0.0039 (statistically significant)
• Smaller effect compared to other variables.
• Older customers contribute slightly more to revenue.

Summary
• Unit Price and Order Quantity are the most critical factors.
• Customer Age has a modest but significant impact.
7. Insights and Findings
Revenue Drivers:
• High revenue linked with higher unit prices and larger order quantities.
• Sales volume decreases with very high prices, indicating customer price sensitivity.

Seasonality:
• Highest revenue in June, followed by December and April.
• Align promotions with seasonal peaks for maximum sales impact.

Customer Demographics:
• Adults aged 35-64 contribute most to revenue.
• Youth (<25) prefer lower-priced mountain bikes.

Geographic Trends:
• Australia and Germany show significant high-revenue transactions.
• Emerging markets like Spain and the Netherlands offer growth opportunities.
8. Conclusion
Key Recommendations for Business Actions:
Targeted Marketing Strategies:
1. Youth Customers (<25): Focus on affordable product lines like mountain bikes
through online campaigns.
2. Adults (35-64): Promote premium products (e.g., road bikes, electric bikes) via
personalized in-store experiences.
Product Portfolio Optimization:
3. Prioritize road and mountain bikes while managing pricing to avoid deterring sales.
4. Expand into the electric bike market to tap into eco-friendly transportation demand.
Seasonal Promotions:
5. Capitalize on peak summer sales by aligning pricing and inventory strategies with
customer demand.
Geographic Expansion:
6. Focus on emerging markets like Spain and the Netherlands, customizing strategies
to fit local preferences and sensitivities.
Limitations and Future Opportunities
Data Limitations:

The analysis is based on historical data. Incorporating real-time trends and evolving customer
behavior is crucial for future accuracy.

Further Research:

Explore income-based and geographical segmentation to refine marketing.

Implement predictive modeling (e.g., machine learning) to better forecast future sales and
optimize business strategies.

9. References
● Kaggle dataset, Bike Sales in Europe: https://www.kaggle.com/datasets/sadiqshah/bike-sales-in-europe

● SPSS Tutorials, Kolmogorov-Smirnov test for normality: https://www.spss-tutorials.com/spss-kolmogorov-smirnov-test-for-normality/

● Mordor Intelligence. Europe Bicycle Market: Growth Trends & Forecasts (2024 - 2029). https://www.mordorintelligence.com/industry-reports/europe-bicycle-market

● Montgomery, D. C., & Runger, G. C. (2014). Applied Statistics and Probability for Engineers. John Wiley & Sons.

● Discovering Statistics Using IBM SPSS Statistics. Sage.

● Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3/4), 591-611.

Group Members :-
● D/DBA/23/0039 – D.M.D.N Dissanayaka
● D/DBA/23/0030 – H.A.H.D Wijayarathna
● D/DBA/23/0040 – P.T.G.S.T Thawalpitiya
● D/DBA/23/0024 – K.S Samarawickrama
● D/DBA/23/0015 – P.V.R Hirushi
● D/DBA/23/0034 – N.D.T.V Nawagamuwa

THANK YOU!
