RA Presentation

Regression Analysis project PPT

Uploaded by

2001remeena


Regression Analysis on

The Market of Bike Sales in Europe

Group No 03
1. Introduction
Overview of the Dataset

● The dataset is a comprehensive collection of European bike sales from 2010 to 2020.
● It contains 113,036 transactions and 18 variables.
● It includes information on product categories, sales volumes, revenue, and customer profiles.

Key Variables in the Data Set
● Date: Sale date
● Age & Age Group: Buyer’s age details
● Customer Gender: Buyer’s gender
● Country/State: Location of customer
● Product Category/Sub-Category: Type of product sold
● Product: Specific product sold
● Order Quantity: Number of units sold
● Unit Cost: Product cost
● Unit Price: The selling price of the product
● Revenue: Total revenue from the product

Key Questions to Address
What are the main factors generating revenue?
Identify the impact of unit price, order quantity, and customer age on revenue.

What is the impact of customer age group?
Explore how age affects purchasing behavior, order quantity, and price sensitivity.

What is the relationship between unit price and order quantity?
Analyze how changes in price and order volume influence revenue.

Are there combined effects between the independent variables?
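One way to test the last question is to add an interaction term and compare nested models. The sketch below is a minimal illustration on synthetic data: the data frame `sales`, its simulated values, and the choice of the Unit_Price × Order_Quantity interaction are assumptions for demonstration, not the group's analysis of the Kaggle file.

```r
# Synthetic stand-in for the bike sales data (hypothetical values, not the Kaggle file)
set.seed(42)
n <- 500
sales <- data.frame(
  Unit_Price     = round(runif(n, 2, 2500), 2),
  Order_Quantity = sample(1:30, n, replace = TRUE),
  Customer_Age   = sample(17:87, n, replace = TRUE)
)
# By construction, revenue is price x quantity plus noise, so the two predictors interact
sales$Revenue <- sales$Unit_Price * sales$Order_Quantity + rnorm(n, sd = 50)

# Additive model vs. the same model with a Unit_Price:Order_Quantity interaction term
additive         <- lm(Revenue ~ Unit_Price + Order_Quantity + Customer_Age, data = sales)
with_interaction <- lm(Revenue ~ Unit_Price * Order_Quantity + Customer_Age, data = sales)

# Nested-model F-test: a small p-value indicates the interaction term improves the fit
comparison <- anova(additive, with_interaction)
print(comparison)
p_interaction <- comparison[["Pr(>F)"]][2]
```

With the real data, the same two `lm()` calls would simply be run on the actual columns.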

Objective
The primary goal is to build a model that explains the variance in revenue based on the features in the dataset. By answering the key questions, we aim to provide actionable insights for:
● Optimizing Pricing Strategies
● Managing Inventory
● Customizing Marketing Approaches

Methodology
This analysis uses multiple regression techniques to analyze the relationships between the dependent variable (revenue) and the independent variables. Furthermore, we will conduct exploratory data analysis (EDA) to identify trends, outliers, and patterns within the dataset.
2. Data Preprocessing
Handling Missing Data

Objective: Ensure data quality by identifying and handling missing values.

Steps:
Checked for missing values in the critical columns: Revenue, Unit Price, Order Quantity, and Customer Age.

Output: No missing values were found in the critical columns, ensuring data integrity for further analysis.

Data Cleaning: Removed any rows with missing values in non-essential columns.
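The check described above can be sketched in base R. This is a minimal illustration on a hypothetical toy data frame (`data` and all of its values, including the `State` column, are made up), not the group's actual script; the critical column names follow the list above.

```r
# Hypothetical toy data frame; with the real data, `data` would be the loaded CSV
data <- data.frame(
  Revenue        = c(120, 950, 64, 2400),
  Unit_Price     = c(60, 950, 32, 1200),
  Order_Quantity = c(2, 1, 2, 2),
  Customer_Age   = c(34, 45, 19, 52),
  State          = c("Bayern", NA, "Victoria", "Oregon")  # non-essential column
)

critical <- c("Revenue", "Unit_Price", "Order_Quantity", "Customer_Age")

# Count missing values in each critical column
na_counts <- colSums(is.na(data[, critical]))
print(na_counts)

# Keep only complete rows, which drops rows with gaps in non-essential columns
cleaned <- data[complete.cases(data), ]
print(nrow(cleaned))
```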
Encoding Categorical Variables
Converting text data into numbers so the model can understand it.

Example:

• Turn Gender into numbers: "Male" = 1, "Female" = 0.

• Change Payment Method into categories: Cash = 1, Card = 2, etc.

Why: Regression models need numbers, not words, to make calculations.

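R Code:

# Binary: Male = 1, Female = 0 (as.numeric(factor()) would give Female = 1, Male = 2)
data$Gender <- ifelse(data$Gender == "Male", 1, 0)

# Nominal: keep as factors; lm() turns each level into a 0/1 dummy column
data$Item_Purchased <- factor(data$Item_Purchased)
data$Payment_Method <- factor(data$Payment_Method)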
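To see how R handles a factor inside a regression, `model.matrix()` displays the numeric design matrix that `lm()` builds from it. The sketch below uses a hypothetical toy table (`toy`, with an invented "Voucher" level) and also shows the alphabetical coding produced by `as.numeric(factor())`.

```r
# Hypothetical toy column; the level "Voucher" is invented for illustration
toy <- data.frame(
  Payment_Method = factor(c("Cash", "Card", "Cash", "Voucher")),
  Revenue        = c(100, 250, 120, 80)
)

# model.matrix() shows the numeric design matrix lm() builds from a factor:
# one 0/1 dummy column per level except the reference level ("Card", first alphabetically)
X <- model.matrix(Revenue ~ Payment_Method, data = toy)
print(X)

# For contrast, as.numeric(factor()) assigns codes alphabetically starting at 1
codes <- as.numeric(factor(c("Male", "Female", "Male")))
print(codes)  # 2 1 2
```

Dummy columns avoid implying that one category is "larger" than another, which arbitrary integer codes would do in a linear model.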

Feature Scaling:
Making sure all number columns are on a similar scale.

Example:
• Age and Purchase Amount are scaled so that one doesn’t overpower the other.

Methods:
• Normalization: Changing values to fit between 0 and 1.
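• Standardization: Adjusting values to have a mean of 0 and a standard deviation of 1.

Why: Puts predictors measured in different units on a common scale, so their coefficients can be compared directly. (For ordinary least squares, scaling changes the units of the coefficients but not the model's fit or predictions.)

R Code:

# scale() returns a one-column matrix; as.numeric() keeps a plain vector
data$Customer_Age <- as.numeric(scale(data$Customer_Age))
data$Purchase_Amount <- as.numeric(scale(data$Purchase_Amount))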
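Both scaling methods listed above can be written in a few lines of base R. This is a sketch on hypothetical age values; `normalize()` is a helper defined here, not a built-in function.

```r
# Hypothetical ages; the same functions apply to any numeric column
age <- c(18, 25, 40, 55, 72)

# Normalization (min-max): rescales values to the [0, 1] range
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
age_norm <- normalize(age)

# Standardization (z-score): mean 0, standard deviation 1
age_std <- as.numeric(scale(age))

print(round(age_norm, 3))
print(round(age_std, 3))
```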
3. Exploratory Data Analysis (EDA)
• This section aims to uncover key insights from the European bike sales data.

• Our analysis focuses primarily on understanding revenue and how it is influenced by factors such as customer demographics, order quantities, and unit prices.

• We use both univariate and bivariate analysis to explore these relationships before moving to modeling.

Univariate Analysis:

Revenue: Positively skewed distribution with outliers
This histogram shows that most sales generate lower revenue, with some outliers representing high-value transactions. This could indicate bulk orders or premium products.

Order Quantity: Mostly small orders with a few large bulk orders
Most orders involve smaller quantities, though some larger bulk orders stand out as outliers. These large orders might correspond to special deals or wholesale purchases.
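The skewness and outliers described above can be quantified rather than only read off a histogram. The sketch below assumes synthetic log-normal revenue values (not the real data), a simple moment-based skewness estimate, and the common 1.5 × IQR rule for flagging outliers; the slides do not state which outlier rule was used.

```r
# Synthetic, right-skewed revenue values (log-normal), used only for illustration
set.seed(1)
revenue <- round(rlnorm(1000, meanlog = 6, sdlog = 1.2))

# Moment-based sample skewness: positive values indicate a long right tail
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# 1.5 x IQR rule: values above the upper fence are flagged as outliers
q <- quantile(revenue, c(0.25, 0.75))
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
outliers <- revenue[revenue > upper_fence]

cat("Skewness:", round(skewness(revenue), 2), "\n")
cat("Outliers above fence:", length(outliers), "\n")
# hist(revenue, breaks = 50)  # histogram of the kind shown on the slide
```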
Bivariate Analysis:

1. Higher unit prices are often associated with higher revenue, but interestingly, some high-revenue transactions occur at mid-range prices. This suggests that premium products, although fewer in number, significantly impact revenue.

2. Higher order quantities generally correspond to higher revenue. However, we also observe a few outliers where small orders produce substantial revenue, likely due to high-value items.
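The two bivariate relationships above can be summarized with correlation coefficients. This sketch uses synthetic transactions in which revenue is simply price × quantity (an assumption for illustration) and compares Pearson correlation with rank-based Spearman correlation, which is less affected by extreme high-revenue points.

```r
# Synthetic transactions (hypothetical); revenue built as price x quantity
set.seed(7)
n <- 1000
unit_price <- round(rlnorm(n, meanlog = 4, sdlog = 1.3), 2)
order_qty  <- sample(1:30, n, replace = TRUE)
revenue    <- unit_price * order_qty

# Pearson measures linear association; Spearman works on ranks
cor_price <- c(pearson  = cor(unit_price, revenue),
               spearman = cor(unit_price, revenue, method = "spearman"))
cor_qty   <- c(pearson  = cor(order_qty, revenue),
               spearman = cor(order_qty, revenue, method = "spearman"))
print(round(rbind(cor_price, cor_qty), 3))
# plot(unit_price, revenue, log = "xy")  # scatter plot of the kind shown on the slide
```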
Additional Insights

• In terms of geography, Australia and Canada have a few outliers with extremely high revenue, likely due to specific large transactions.
• In product categories, the Bikes category dominates revenue, while Accessories and Clothing contribute much less. This highlights the importance of focusing on bike sales as a major revenue driver.
• June and December stand out as the top revenue months. This suggests potential seasonality, with summer and the holiday season driving higher bike sales.
• These insights can help the company plan inventory and promotional strategies.
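A monthly ranking like the one above comes from grouping revenue by month. The sketch below uses base R's `aggregate()` on a hypothetical toy table; its numbers are invented only to show the mechanics and are not taken from the dataset.

```r
# Hypothetical sales records; with the real data, use its Month and Revenue columns
sales <- data.frame(
  Month   = c("June", "December", "June", "April", "March", "December", "June"),
  Revenue = c(1200, 900, 800, 700, 300, 650, 400)
)

# Total revenue per month, sorted from highest to lowest
monthly <- aggregate(Revenue ~ Month, data = sales, FUN = sum)
monthly <- monthly[order(monthly$Revenue, decreasing = TRUE), ]
print(monthly)
```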
4. Model Building for Revenue Prediction
Objective:
To predict the revenue generated from bike sales using relevant variables from the dataset.

The baseline linear regression model uses three predictors:

● Unit_Price: The price per unit of the bike.
● Order_Quantity: The number of units ordered by the customer.
● Customer_Age: The age of the customer making the purchase.
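A model of this form can be fitted with `lm()`. The sketch below runs on a synthetic stand-in for the dataset; the simulation coefficients (1.5, 15, 0.8) and noise level are arbitrary assumptions, not the estimates reported later in this presentation.

```r
# Synthetic stand-in for the bike sales data (hypothetical values)
set.seed(123)
n <- 2000
bikes <- data.frame(
  Unit_Price     = round(runif(n, 2, 2500), 2),
  Order_Quantity = sample(1:32, n, replace = TRUE),
  Customer_Age   = sample(17:87, n, replace = TRUE)
)
# Arbitrary simulation coefficients, NOT the estimates reported in the slides
bikes$Revenue <- 1.5 * bikes$Unit_Price + 15 * bikes$Order_Quantity +
  0.8 * bikes$Customer_Age + rnorm(n, sd = 100)

# Baseline multiple linear regression with the three predictors
model <- lm(Revenue ~ Unit_Price + Order_Quantity + Customer_Age, data = bikes)
print(summary(model))  # coefficients, p-values, R-squared, and F-statistic
```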
Check Assumptions of Linear Regression
1. Linearity
2. Homoscedasticity
3. Multicollinearity: We can check for multicollinearity using the Variance Inflation Factor (VIF). A VIF value above 5 indicates high multicollinearity.
4. Normality of Residuals: The residuals should be approximately normally distributed. This can be checked using a Q-Q plot or a histogram of residuals.
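Checks 3 and 4 can be sketched in base R. Here the VIF is computed by hand (the `car` package's `vif()` is a common alternative) on synthetic predictors where `x3` is deliberately built to be nearly collinear with `x1`; `vif_manual()` is a hypothetical helper. Note that `shapiro.test()` only accepts between 3 and 5,000 observations, so with all 113,036 rows a Q-Q plot or a random subsample would be needed.

```r
# Synthetic predictors (hypothetical); x3 is nearly collinear with x1
set.seed(99)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.2)
y  <- 2 * x1 + x2 + rnorm(n)
d  <- data.frame(y, x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the others
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_manual(d, c("x1", "x2", "x3"))
print(round(vifs, 2))  # values above 5 flag high multicollinearity

# Normality of residuals: Shapiro-Wilk test (valid for 3 to 5000 observations)
fit <- lm(y ~ x1 + x2 + x3, data = d)
sw  <- shapiro.test(residuals(fit))
print(sw)
# qqnorm(residuals(fit)); qqline(residuals(fit))  # Q-Q plot
```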

Feature Selection via Stepwise Regression

Stepwise regression iteratively adds or removes predictors based on the Akaike Information Criterion (AIC), which balances model fit and complexity.

The stepwise regression confirmed that all three variables, Unit_Price, Order_Quantity, and Customer_Age, are significant contributors to predicting revenue. As a result, all three were retained in the final model.
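In R, AIC-based stepwise selection is available through `step()` in the base `stats` package (AIC is its default criterion). This sketch uses synthetic data with one deliberately irrelevant predictor, `noise_var` (a hypothetical name); `direction = "both"` is an assumption, since the slides do not say which search direction was used.

```r
# Synthetic data (hypothetical): y depends on x1 and x2; noise_var is irrelevant
set.seed(2024)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise_var = rnorm(n))
d$y <- 3 * d$x1 - 2 * d$x2 + rnorm(n)

full <- lm(y ~ x1 + x2 + noise_var, data = d)

# Bidirectional search: at each step, apply the single add/drop that lowers AIC most
selected <- step(full, direction = "both", trace = 0)
print(formula(selected))
kept <- attr(terms(selected), "term.labels")
```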

Final Model:

Revenue = β0 + β1 (Unit Price) + β2 (Order Quantity) + β3 (Customer Age) + ε

5. Model Evaluation
Interpretation of Coefficients:
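• Intercept (-84.74): The intercept represents the expected value of revenue when all the independent variables are zero. Because a unit price, order quantity, and customer age of zero lie outside the data, it anchors the regression line rather than having a practical interpretation.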

• Unit_Price (1.27): This coefficient is highly significant, with a p-value less than 2e-16.

• Order_Quantity (20.39): This also indicates a very strong positive relationship between order
quantity and revenue, as shown by the t-value of 76.59 and a p-value of less than 2e-16.

• Customer_Age (0.57): Though the effect of age is smaller compared to unit price and order
quantity, it is statistically significant (p-value = 0.00388), indicating that older customers are
likely to contribute slightly more revenue.
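To make the coefficients concrete, the fitted equation can be evaluated for a single transaction. This worked sketch uses the estimates listed above; the transaction (unit price 100, 2 units, age 35) is hypothetical, and it assumes the predictors are in their original, unscaled units (the coefficients would mean something different if Customer_Age had been standardized during preprocessing).

```r
# Coefficient estimates as reported on this slide
b <- c(intercept = -84.74, unit_price = 1.27, order_qty = 20.39, age = 0.57)

# Hypothetical helper: predicted revenue for one transaction (unscaled predictors assumed)
predict_revenue <- function(unit_price, order_qty, age) {
  b[["intercept"]] + b[["unit_price"]] * unit_price +
    b[["order_qty"]] * order_qty + b[["age"]] * age
}

predict_revenue(unit_price = 100, order_qty = 2, age = 35)  # about 102.99
```

Each term adds independently here (127 + 40.78 + 19.95 - 84.74), which is exactly what the additive model assumes.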
Statistical Significance:
All the predictor variables (Unit_Price, Order_Quantity, and Customer_Age) are
statistically significant at the 0.01 level, meaning they have a significant effect on
predicting revenue.

Model Fit:
• Multiple R-squared (0.6863): Approximately 68.63% of the variance in revenue is explained by the model. This suggests a strong relationship between the predictors and the outcome.
• Adjusted R-squared (0.6863): The value is nearly identical to the Multiple R-squared, indicating that the model is not inflated by unnecessary predictors; with over 113,000 observations and only three predictors, the adjustment is negligible.
• Residual Standard Error (733.2): The error is relatively high, which is expected given the scale of the revenue values.
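The near-identical values follow directly from the adjusted R-squared formula. This arithmetic check uses the reported R-squared and assumes all 113,036 transactions entered the model (the slides do not state the exact residual degrees of freedom).

```r
# Values from this slide; n assumes all 113,036 rows entered the model
r2 <- 0.6863
n  <- 113036
p  <- 3  # number of predictors

# Adjusted R^2 penalizes R^2 for the number of predictors relative to n
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))  # 0.6863: the penalty is negligible at this sample size
```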
F-statistic (8.243e+04, p < 2.2e-16):
With an extremely high F-statistic and a p-value < 2.2e-16, we can conclude that the
model significantly explains the variation in revenue and that the relationship between
the predictors and revenue is statistically significant.
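The F-statistic can likewise be recovered from R-squared, which is a useful consistency check on the reported output. This sketch reuses the reported R-squared and the same sample-size assumption; `pf()` gives the upper-tail p-value.

```r
r2 <- 0.6863
n  <- 113036  # assumes all rows were used
p  <- 3

# Overall F-test: explained variance per predictor over residual variance per residual df
f_stat  <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)
print(signif(f_stat, 4))  # about 82430, in line with the reported 8.243e+04
```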

Residual Analysis:
• The residuals are generally well-behaved, with a median of -45, suggesting that the model does not systematically overestimate or underestimate the revenue.
• The spread of residuals is wide and asymmetric (from -1,785 to 54,507), which indicates some variability in how well the model fits different points; at the extreme high end, the model substantially underestimates revenue.

Cross-Validation:
Cross-validation guards against overfitting by evaluating the model on data it was not trained on, providing a more robust estimate of the model’s generalizability. The RMSE and R-squared from cross-validation confirm consistent model performance.
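A k-fold cross-validation of the three-predictor model can be written in base R without extra packages. This is a sketch under stated assumptions: synthetic data, k = 5 folds, and out-of-fold R-squared computed as 1 - SSE/SST; the slides do not report which k or tool was used.

```r
# Synthetic data (hypothetical); with the real data, replace `d`
set.seed(11)
n <- 1000
d <- data.frame(Unit_Price     = runif(n, 2, 2500),
                Order_Quantity = sample(1:32, n, replace = TRUE),
                Customer_Age   = sample(17:87, n, replace = TRUE))
d$Revenue <- 1.3 * d$Unit_Price + 20 * d$Order_Quantity + rnorm(n, sd = 300)

k <- 5
folds <- sample(rep(1:k, length.out = n))  # random fold assignment
rmse <- r2 <- numeric(k)

for (i in 1:k) {
  train <- d[folds != i, ]
  test  <- d[folds == i, ]
  fit   <- lm(Revenue ~ Unit_Price + Order_Quantity + Customer_Age, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$Revenue - pred)^2))
  # Out-of-fold R^2: 1 - SSE / SST on the held-out fold
  r2[i] <- 1 - sum((test$Revenue - pred)^2) / sum((test$Revenue - mean(test$Revenue))^2)
}

cat("Mean CV RMSE:", round(mean(rmse), 1), " Mean CV R^2:", round(mean(r2), 3), "\n")
```

Similar RMSE and R-squared values across folds are what "consistent performance" would look like in practice.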

6. Interpretation of Results
Interpretation of P-values:

Unit Price:
• P-value: < 2e-16 (highly significant)
• Strong positive impact on revenue.
• A $1 increase in unit price is associated with a $1.27 rise in revenue, holding order quantity and customer age constant.

Order Quantity:
• P-value: < 2e-16 (highly significant)
• Positive relationship with revenue.
• Each additional unit sold is associated with a $20.39 increase in revenue, holding the other variables constant.

Customer Age:
• P-value: 0.0039 (statistically significant)
• Smaller effect compared to the other variables.
• Older customers contribute slightly more to revenue.

Summary
• Unit Price and Order Quantity are the most critical factors.
• Customer Age has a modest but significant impact.
7. Insights and Findings
Revenue Drivers:
• High revenue is linked with higher unit prices and larger order quantities.
• Sales volume decreases at very high prices, indicating customer price sensitivity.

Seasonality:
• Highest revenue in June, followed by December and April.
• Align promotions with seasonal peaks for maximum sales impact.

Customer Demographics:
• Adults aged 35-64 contribute most to revenue.
• Youth (<25) prefer lower-priced mountain bikes.

Geographic Trends:
• Australia and Germany show significant high-revenue transactions.
• Emerging markets like Spain and the Netherlands offer growth opportunities.
8. Conclusion
Key Recommendations for Business Actions:
Targeted Marketing Strategies:
1. Youth Customers (<25): Focus on affordable product lines like mountain bikes
through online campaigns.
2. Adults (35-64): Promote premium products (e.g., road bikes, electric bikes) via
personalized in-store experiences.
Product Portfolio Optimization:
3. Prioritize road and mountain bikes while managing pricing to avoid deterring sales.
4. Expand into the electric bike market to tap into eco-friendly transportation demand.
Seasonal Promotions:
5. Capitalize on peak summer sales by aligning pricing and inventory strategies with
customer demand.
Geographic Expansion:
6. Focus on emerging markets like Spain and the Netherlands, customizing strategies
to fit local preferences and sensitivities.
Limitations and Future Opportunities
Data Limitations:

The analysis is based on historical data. Incorporating real-time trends and evolving customer
behavior is crucial for future accuracy.

Further Research:

Explore income-based and geographical segmentation to refine marketing.

Implement predictive modeling (e.g., machine learning) to better forecast future sales and
optimize business strategies.

9. References
● Source: https://www.kaggle.com/datasets/sadiqshah/bike-sales-in-europe

● Source: https://www.spss-tutorials.com/spss-kolmogorov-smirnov-test-for-normality/

● Europe Bicycle Market: Growth Trends & Forecasts (2024 - 2029). Source: https://www.mordorintelligence.com/industry-reports/europe-bicycle-market

● Montgomery, D. C., & Runger, G. C. (2014). Applied Statistics and Probability for Engineers. John Wiley & Sons.

● Discovering Statistics Using IBM SPSS Statistics. Sage.

● Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3/4), 591-611.

Group Members:
● D/DBA/23/0039 – D.M.D.N Dissanayaka
● D/DBA/23/0030 – H.A.H.D Wijayarathna
● D/DBA/23/0040 – P.T.G.S.T Thawalpitiya
● D/DBA/23/0024 – K.S Samarawickrama
● D/DBA/23/0015 – P.V.R Hirushi
● D/DBA/23/0034 – N.D.T.V Nawagamuwa

THANK YOU!

Marketing Analytics Unit 3
No ratings yet
Marketing Analytics Unit 3
54 pages
AdventureWorks Report
No ratings yet
AdventureWorks Report
10 pages
Tejaswi
No ratings yet
Tejaswi
44 pages
Final BDM
No ratings yet
Final BDM
22 pages
Rossmann Sales Prediction Presentation
No ratings yet
Rossmann Sales Prediction Presentation
35 pages
AnmolSharma_RandomMotors
No ratings yet
AnmolSharma_RandomMotors
35 pages
1A
No ratings yet
1A
62 pages
Rm Shiwangi
No ratings yet
Rm Shiwangi
17 pages
ReCell_Project.pdf
No ratings yet
ReCell_Project.pdf
21 pages
Regression Analysis Random Motors
80% (10)
Regression Analysis Random Motors
19 pages
Math_Internal_Assesment_AA_SL
No ratings yet
Math_Internal_Assesment_AA_SL
20 pages
PM Guided Project Sample Business Report.docx
No ratings yet
PM Guided Project Sample Business Report.docx
35 pages
Statistics Assignment Group 21
No ratings yet
Statistics Assignment Group 21
12 pages
Consumer Spending Habits Analysis-2313049-ASSIGNMENT1
No ratings yet
Consumer Spending Habits Analysis-2313049-ASSIGNMENT1
8 pages
DSREPORT (1)
No ratings yet
DSREPORT (1)
10 pages
Group 9 Paper Presentation
No ratings yet
Group 9 Paper Presentation
24 pages
Business Data Mining Week 8
No ratings yet
Business Data Mining Week 8
24 pages
Final Project
No ratings yet
Final Project
39 pages
Rashmi Jeswani Capstone
No ratings yet
Rashmi Jeswani Capstone
84 pages
Review Six Variables For Ten Time Series
No ratings yet
Review Six Variables For Ten Time Series
14 pages
Regression Analysis Report_Sanjeev Kumar_24MSG1R43
No ratings yet
Regression Analysis Report_Sanjeev Kumar_24MSG1R43
6 pages
Marketing and Retail Analytics - Project 1
No ratings yet
Marketing and Retail Analytics - Project 1
34 pages
Supermarket_Sales_Analysis_Algorithm- by Data Analaysis
No ratings yet
Supermarket_Sales_Analysis_Algorithm- by Data Analaysis
2 pages
SPSS 4
No ratings yet
SPSS 4
10 pages
E Commerce
No ratings yet
E Commerce
20 pages
MRA Project - Shehroz Khan
67% (3)
MRA Project - Shehroz Khan
19 pages
FILE_2620
No ratings yet
FILE_2620
24 pages
Project Predictive Modeling PDF
100% (1)
Project Predictive Modeling PDF
58 pages
ML Project
100% (1)
ML Project
10 pages
BS MINI PROJECT 2
No ratings yet
BS MINI PROJECT 2
5 pages
Big Data Jury
No ratings yet
Big Data Jury
21 pages
GMC Final Project - Maha
No ratings yet
GMC Final Project - Maha
20 pages
ML Project Presentation
No ratings yet
ML Project Presentation
9 pages
Report Group 8 Final
No ratings yet
Report Group 8 Final
13 pages
Marketing & Retail Analytics-Milestone 1 - 300521
71% (14)
Marketing & Retail Analytics-Milestone 1 - 300521
18 pages
Part A Doc 1
No ratings yet
Part A Doc 1
21 pages
Marketing (603a)
No ratings yet
Marketing (603a)
7 pages
BADM (1)
No ratings yet
BADM (1)
9 pages
Google Analytics Customer Revenue Prediction PDF
No ratings yet
Google Analytics Customer Revenue Prediction PDF
14 pages
E Commerce Project
No ratings yet
E Commerce Project
12 pages
Olist Kasyapa
No ratings yet
Olist Kasyapa
22 pages
Predict Future Sales Group1 Presentation
No ratings yet
Predict Future Sales Group1 Presentation
11 pages
Marketing & Retail Analysis Project - Part A (Brahma Chari)
No ratings yet
Marketing & Retail Analysis Project - Part A (Brahma Chari)
28 pages
SS Teamproject Documentation
No ratings yet
SS Teamproject Documentation
33 pages
Revenue Predictor - Udit Ennam PDF
No ratings yet
Revenue Predictor - Udit Ennam PDF
30 pages
Google Merchandise Store Data Analysis: - Google Analytics Customer Revenue Prediction
No ratings yet
Google Merchandise Store Data Analysis: - Google Analytics Customer Revenue Prediction
15 pages
Report
No ratings yet
Report
9 pages
Regression Analysis Random Motors Project
100% (1)
Regression Analysis Random Motors Project
22 pages
AML Assignment 1 1
No ratings yet
AML Assignment 1 1
4 pages
Ex 5.1 Customer Behaviour Prediction
No ratings yet
Ex 5.1 Customer Behaviour Prediction
8 pages
Main+Projects+Rubrics+-+PM+-+Coded+%28NEW%29
No ratings yet
Main+Projects+Rubrics+-+PM+-+Coded+%28NEW%29
2 pages
Regression Analysis
No ratings yet
Regression Analysis
9 pages
Business Analytics Course
No ratings yet
Business Analytics Course
11 pages
Msai349 Project Final Report
No ratings yet
Msai349 Project Final Report
5 pages
Assignment
No ratings yet
Assignment
2 pages
Management Accounting: Business Strategy & Performance: Decision-Making by Numbers
From Everand
Management Accounting: Business Strategy & Performance: Decision-Making by Numbers
Commerce Central
No ratings yet
Business Intelligence Questions, Analytical & Reporting Hint
From Everand
Business Intelligence Questions, Analytical & Reporting Hint
Dr. Zemelak Goraga
No ratings yet
How to Optimise Your Supply Chain to Make Your Firm Competitive!
From Everand
How to Optimise Your Supply Chain to Make Your Firm Competitive!
Andrei Besedin
2/5 (2)
Value Nets (Review and Analysis of Bovet and Martha's Book)
From Everand
Value Nets (Review and Analysis of Bovet and Martha's Book)
BusinessNews Publishing
No ratings yet
The Profit Zone (Review and Analysis of Slywotzky and Morrison's Book)
From Everand
The Profit Zone (Review and Analysis of Slywotzky and Morrison's Book)
BusinessNews Publishing
No ratings yet