Project = Customer Segmentation for E-commerce
By
SCHOOL OF COMPUTING
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with “A++” grade by NAAC
Jeppiaar Nagar, Rajiv Gandhi Salai, Chennai – 600 119
www.sathyabama.ac.in
BONAFIDE CERTIFICATE
This is to certify that this Product Report is the bonafide work of NITHIN
NAGABUSHANAM (42110906) who carried out the Design entitled “A STUDY ON
CUSTOMER SEGMENTATION FOR E-COMMERCE” under my supervision from July
2023 to November 2023.
Design Supervisor
MS.GracelinSheena, M.E(Ph.D).
DECLARATION
I, NITHIN NAGABUSHANAM (42110906), hereby declare that the Product Design Report
entitled “CUSTOMER SEGMENTATION FOR E-COMMERCE”, done by me under the
guidance of MS.GracelinSheena, is submitted in partial fulfilment of the requirements
for the award of Bachelor of Engineering degree in Computer Science and Engineering.
DATE: 05/10/2023
ACKNOWLEDGEMENT
I convey my thanks to Dr. T.Sasikala M.E., Ph. D, Dean, School of Computing, Dr.
L.Lakshmanan M.E., Ph.D., Head of the Department of Computer Science and
Engineering for providing me necessary support and details at the right time during
the progressive reviews.
I wish to express my thanks to all Teaching and Non-teaching staff members of the
Department of Computer Science and Engineering who were helpful in many ways
for the completion of the project.
ABSTRACT
TABLE OF CONTENTS
1. INTRODUCTION
4.1 INTRODUCTION
4.2 AIM
4.3 OBJECTIVES
5.2 CODE
6. CONCLUSION
7. REFERENCE
1.INTRODUCTION
RFM Segmentation: RFM stands for Recency, Frequency, and Monetary value. This
method segments customers based on how recently they made a purchase, how often
they make purchases, and how much money they spend. This is particularly popular
for identifying high-value customers.
Segmentation by Engagement: This includes factors like how often customers open
emails, interact on social media, or participate in loyalty programs. Highly engaged
customers might receive special offers or exclusive content.
customers use for shopping (e.g., mobile, desktop, app) can help optimize the user
experience and marketing efforts for each segment.
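The RFM segmentation described above can be sketched in a few lines of pandas. This is a minimal illustration, not the method used in this project: the transaction table, its column names (customer_id, order_date, amount), and all values are hypothetical.

```python
import pandas as pd

# Hypothetical transaction log; column names and values are assumptions.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2022-11-20",
        "2023-02-10", "2023-02-25", "2023-03-10",
    ]),
    "amount": [50.0, 70.0, 20.0, 30.0, 45.0, 60.0],
})

# Reference date: one day after the most recent order in the log.
snapshot = transactions["order_date"].max() + pd.Timedelta(days=1)

# One row per customer: days since last order, number of orders, total spend.
rfm = transactions.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)
print(rfm)
```

Customers with low recency and high frequency and monetary values would be candidates for a high-value segment. How the three columns are binned or scored is a separate design choice.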
e-commerce, various methods are used to categorize customers based on their
characteristics and behaviors. This can include demographic information like age,
gender, and location, as well as purchase history, browsing patterns, and engagement
with the website or app. Advanced techniques like machine learning and data analysis
are often employed to identify patterns and create segments. These segments are
then used to personalize marketing campaigns, recommend products, and enhance
the overall customer experience.
3.E-commerce Platforms:
*Shopify: Shopify provides tools and apps that allow for segmentation based on
various criteria, including purchase history and browsing behavior.
*Magento: Magento offers segmentation options as part of its marketing suite,
allowing for personalized marketing campaigns.
*Tealium: Tealium is another CDP that enables businesses to unify customer data
from various sources and create actionable segments.
1.Limited Data Sources: Some systems may have restrictions on the types and
sources of data they can incorporate. This can lead to incomplete or biased customer
profiles.
4.Lack of Real-Time Data: Some systems may not provide real-time data updates,
potentially leading to outdated customer profiles and less accurate segmentation.
6.Privacy Concerns: With increasing regulations around data privacy (such as GDPR
and CCPA), some systems may face challenges in handling customer data while
maintaining compliance.
9.Scalability Issues: As a business grows, some systems may not scale effectively
to handle larger volumes of customer data and may become less efficient.
and resources.
4.1 INTRODUCTION
Customer satisfaction is a critical factor in the airline industry, as it directly impacts the
success and reputation of an airline. The following points discuss various aspects
of airline customer satisfaction:
b. Comfort and Space: Cabin comfort, seat pitch, legroom, and in-flight
amenities all contribute to passenger satisfaction.
e. Safety and Security: Passengers expect a high level of safety and security.
Airlines that prioritize safety measures enhance customer satisfaction.
d. Training and Empowering Staff: Providing extensive customer service training
for airline personnel ensures they can handle passenger concerns effectively
and professionally.
4. Loyalty Programs:
• Loyalty programs, such as frequent flyer programs, are effective in enhancing
customer satisfaction. These programs offer rewards, discounts, and special
treatment to loyal customers, incentivizing them to continue flying with the
airline.
6. Feedback Collection:
• Regularly collecting feedback from passengers through surveys and online
reviews helps airlines understand customer needs and make necessary
improvements.
4.2 AIM:
• The aim is to improve customer satisfaction by a specific target, X (for
example, 50%), within a set timeframe, Y (for example, 12 months). The
actual values of X and Y should be determined based on the current state of
customer satisfaction, industry benchmarks, and the airline's resources and
capabilities.
4.3 OBJECTIVES
• The airline industry faces numerous challenges, including fuel costs,
regulatory constraints, and economic factors. Balancing customer satisfaction
with operational efficiency and profitability can be difficult.
a. Service Improvement: Airlines can use the insights gained from customer
satisfaction analysis to identify areas of their services that need improvement.
For example, if the analysis reveals that customers are dissatisfied with in-flight
meals, the airline can work on enhancing their catering services.
d. Marketing and Branding: The findings can inform marketing strategies. Airlines
can highlight their strengths and address weaknesses in their advertising and
branding efforts, attracting more customers and retaining existing ones.
Overall, the analysis of airline customer satisfaction is a strategic tool that helps
airlines enhance their services, tailor their offerings, and optimize their
operations to meet the evolving demands and expectations of passengers.
This, in turn, can lead to improved customer loyalty, revenue growth, and long-
term success in a highly competitive industry.
5.EXPERIMENTAL ANALYSIS
1. Data Collection:
• Obtain the dataset containing information about airline customer satisfaction. This
dataset may include features like flight details, customer demographics, and their
satisfaction ratings.
2. Data Preprocessing:
• Clean the data by handling missing values, outliers, and data consistency issues.
• Convert categorical variables into numerical format using techniques like one-hot
encoding or label encoding.
4. Feature Engineering:
• Create new features if necessary, such as aggregating data or transforming
variables.
• Feature selection may be required to identify the most relevant features for
modeling.
6. Model Selection:
• Choose an appropriate machine learning model for predicting customer
satisfaction. Common models for binary classification tasks like this include logistic
regression, decision trees, random forests, support vector machines, and neural
networks. The choice depends on the dataset size, complexity, and your familiarity
with the model.
8. Model Evaluation:
• Evaluate the model's performance using appropriate evaluation metrics. For binary
classification, metrics like accuracy, precision, recall, F1-score, and ROC-AUC are
commonly used.
• Use techniques like cross-validation to ensure the model's robustness.
9. Model Interpretation (Optional):
• Depending on the model used, interpret the results to understand which features
contribute most to customer satisfaction.
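The workflow steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the data is synthetic (generated with NumPy, not the airline dataset), only three columns are used, and logistic regression is chosen as one simple baseline among the models listed above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Small synthetic stand-in for the airline data (all values are made up).
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "Class": rng.choice(["Business", "Eco", "Eco Plus"], size=n),
    "Seat comfort": rng.integers(0, 6, size=n),
    "Arrival Delay in Minutes": rng.exponential(15.0, size=n),
})
# Synthetic target: higher seat comfort makes "satisfied" more likely.
data["satisfaction"] = np.where(
    data["Seat comfort"] + rng.normal(0, 1, size=n) > 2.5,
    "satisfied", "dissatisfied",
)
# Inject a few missing values so the imputation step has something to do.
missing_rows = data.sample(10, random_state=0).index
data.loc[missing_rows, "Arrival Delay in Minutes"] = np.nan

# Data preprocessing: impute missing values, one-hot encode categoricals.
median_delay = data["Arrival Delay in Minutes"].median()
data["Arrival Delay in Minutes"] = data["Arrival Delay in Minutes"].fillna(median_delay)
X = pd.get_dummies(data.drop(columns="satisfaction"), columns=["Class"], dtype=int)
y = (data["satisfaction"] == "satisfied").astype(int)

# Model selection and evaluation: fit a baseline, score it on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))
```

Median imputation and one-hot encoding are only two of the options named in the steps. Label encoding or dropping incomplete rows would slot into the same places.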
5.1 DATASET NEEDS
The dataset provided by Invistico Airlines contains valuable information about their
customers' experiences and satisfaction levels. With the aim of predicting future
customer satisfaction and improving service quality, this dataset encompasses
various customer attributes and feedback on different aspects of their flights. The
primary objectives of this dataset are to predict customer satisfaction and identify
areas for service improvement.
13. Inflight entertainment: Customer rating of inflight entertainment options.
14. Online support: Customer rating of online customer support.
15. Ease of Online booking: Customer rating of ease of online booking.
16. On-board service: Customer rating of on-board service provided by the airline.
17. Leg room service: Customer rating of leg room service provided during the flight.
18. Baggage handling: Customer rating of baggage handling.
19. Checkin service: Customer rating of check-in service.
20. Cleanliness: Customer rating of cabin cleanliness.
21. Online boarding: Customer rating of online boarding process.
22. Departure Delay in Minutes: The departure delay in minutes for each flight.
23. Arrival Delay in Minutes: The arrival delay in minutes for each flight
All rating features are measured on a scale from 0 to 5, where higher values indicate
greater satisfaction.
5.2 CODE:
import pandas as pd

# Load the dataset and preview the first rows.
df = pd.read_csv("C:/Users/nithi/Desktop/Invistico_Airline.csv")
df.head()
[output1] =
# Seeing the shape of the data.
df.shape
[output2]=
(129880, 23)
[output3]=0
[output4]=
# Seeing information about data.
df.info()
[output5]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 satisfaction 129880 non-null object
1 Gender 129880 non-null object
2 Customer Type 129880 non-null object
3 Age 129880 non-null int64
4 Type of Travel 129880 non-null object
5 Class 129880 non-null object
6 Flight Distance 129880 non-null int64
7 Seat comfort 129880 non-null int64
8 Departure/Arrival time convenient 129880 non-null int64
9 Food and drink 129880 non-null int64
10 Gate location 129880 non-null int64
11 Inflight wifi service 129880 non-null int64
12 Inflight entertainment 129880 non-null int64
13 Online support 129880 non-null int64
14 Ease of Online booking 129880 non-null int64
15 On-board service 129880 non-null int64
16 Leg room service 129880 non-null int64
17 Baggage handling 129880 non-null int64
18 Checkin service 129880 non-null int64
19 Cleanliness 129880 non-null int64
20 Online boarding 129880 non-null int64
21 Departure Delay in Minutes 129880 non-null int64
22 Arrival Delay in Minutes 129487 non-null float64
dtypes: float64(1), int64(17), object(5)
memory usage: 22.8+ MB
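The df.info() output above shows 129487 non-null values out of 129880 rows for "Arrival Delay in Minutes", so 393 values are missing in that column. A per-column count makes such gaps easier to spot. The sketch below uses a small toy frame (hypothetical values) in place of df.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; the values are made up for illustration.
toy = pd.DataFrame({
    "Departure Delay in Minutes": [0, 12, 5, 40],
    "Arrival Delay in Minutes": [0.0, np.nan, 7.0, 35.0],
})

# Count missing values per column, as a complement to df.info().
missing = toy.isnull().sum()
print(missing)

# Count fully duplicated rows.
print(toy.duplicated().sum())
```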
# Print the value counts of every categorical and rating column.
for i in categorical_features:
    print(df[i].value_counts())
    print('-' * 50)
[output6]=
satisfied 71087
dissatisfied 58793
Name: satisfaction, dtype: int64
--------------------------------------------------
Female 65899
Male 63981
Name: Gender, dtype: int64
--------------------------------------------------
Loyal Customer 106100
disloyal Customer 23780
Name: Customer Type, dtype: int64
--------------------------------------------------
Business travel 89693
Personal Travel 40187
Name: Type of Travel, dtype: int64
--------------------------------------------------
Business 62160
Eco 58309
Eco Plus 9411
Name: Class, dtype: int64
--------------------------------------------------
3 29183
2 28726
4 28398
1 20949
5 17827
0 4797
Name: Seat comfort, dtype: int64
--------------------------------------------------
4 29593
5 26817
3 23184
2 22794
1 20828
0 6664
Name: Departure/Arrival time convenient, dtype: int64
--------------------------------------------------
3 28150
4 27216
2 27146
1 21076
5 20347
0 5945
Name: Food and drink, dtype: int64
--------------------------------------------------
3 33546
4 30088
2 24518
1 22565
5 19161
0 2
Name: Gate location, dtype: int64
--------------------------------------------------
4 31560
5 28830
3 27602
2 27045
1 14711
0 132
Name: Inflight wifi service, dtype: int64
--------------------------------------------------
4 41879
5 29831
3 24200
2 19183
1 11809
0 2978
Name: Inflight entertainment, dtype: int64
--------------------------------------------------
4 41510
5 35563
3 21609
2 17260
1 13937
0 1
Name: Online support, dtype: int64
--------------------------------------------------
4 39920
5 34137
3 22418
2 19951
1 13436
0 18
Name: Ease of Online booking, dtype: int64
--------------------------------------------------
4 40675
5 31724
3 27037
2 17174
1 13265
0 5
Name: On-board service, dtype: int64
--------------------------------------------------
4 39698
5 34385
3 22467
2 21745
1 11141
0 444
Name: Leg room service, dtype: int64
--------------------------------------------------
4 48240
5 35748
3 24485
2 13432
1 7975
Name: Baggage handling, dtype: int64
--------------------------------------------------
4 36481
3 35538
5 27005
2 15486
1 15369
0 1
Name: Checkin service, dtype: int64
--------------------------------------------------
4 48795
5 35916
3 23984
2 13412
1 7768
0 5
Name: Cleanliness, dtype: int64
--------------------------------------------------
4 35181
3 30780
5 29973
2 18573
1 15359
0 14
Name: Online boarding, dtype: int64
[output7]:
3 33546
4 30088
2 24518
1 22567
5 19161
Name: Gate location, dtype: int64
--------------------------------------------------
4 31560
5 28830
3 27602
2 27045
1 14843
Name: Inflight wifi service, dtype: int64
--------------------------------------------------
4 39920
5 34137
3 22418
2 19951
1 13454
Name: Ease of Online booking, dtype: int64
--------------------------------------------------
4 40675
5 31724
3 27037
2 17174
1 13270
Name: On-board service, dtype: int64
--------------------------------------------------
4 36481
3 35538
5 27005
2 15486
1 15370
Name: Checkin service, dtype: int64
--------------------------------------------------
4 48795
5 35916
3 23984
2 13412
1 7773
Name: Cleanliness, dtype: int64
--------------------------------------------------
4 35181
3 30780
5 29973
2 18573
1 15373
Name: Online boarding, dtype: int64
--------------------------------------------------
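The counts above are consistent with the rare rating 0 having been recoded as 1 (for Gate location, 22565 + 2 = 22567; for Inflight wifi service, 14711 + 132 = 14843). The code that produced this output is not part of the surviving text, so the following is only a sketch of one way to do it, shown on a toy frame with hypothetical values.

```python
import pandas as pd

# Toy frame standing in for df; the values are made up for illustration.
toy = pd.DataFrame({
    "Gate location": [0, 1, 3, 4, 0],
    "Cleanliness": [5, 0, 2, 4, 4],
})

# Assumed refinement: treat a rating of 0 as the lowest valid rating, 1.
rating_columns = ["Gate location", "Cleanliness"]
toy[rating_columns] = toy[rating_columns].replace(0, 1)

for col in rating_columns:
    print(toy[col].value_counts())
    print('-' * 50)
```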
Following the refinement of the rating features
# Numerical columns
numerical_features = ["Age", "Flight Distance", "Departure Delay in Minutes", 'Arrival Delay in Minutes']
plt.figure(figsize=(8, 6))
[output8]:
df.drop('Arrival Delay in Minutes', axis=1, inplace=True)
[output9]
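The surviving text does not state why "Arrival Delay in Minutes" is dropped. One common reason for such a step is that two columns are almost perfectly correlated, so one of them adds little information. The sketch below illustrates that check on toy values; both the 0.9 threshold and the motivation are assumptions, not the report's stated method.

```python
import pandas as pd

# Toy delays standing in for the two delay columns (values are made up).
toy = pd.DataFrame({
    "Departure Delay in Minutes": [0, 10, 25, 60, 120],
    "Arrival Delay in Minutes": [0, 8, 30, 55, 125],
})

# Pearson correlation between the two delay columns.
corr = toy.corr()
delay_corr = corr.loc["Departure Delay in Minutes", "Arrival Delay in Minutes"]
print(delay_corr)

# Assumed rule of thumb: drop one column of a pair correlated above 0.9.
if delay_corr > 0.9:
    toy = toy.drop(columns="Arrival Delay in Minutes")
print(list(toy.columns))
```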
# Iterate through each rating column
In these graphs, our objective is to discern whether ratings have a direct impact on
customer satisfaction or, in other words, to identify the most influential features
affecting customer satisfaction. Our analysis reveals that "Departure/Arrival time
convenient" and "Gate location" appear to have almost no effect on customer
satisfaction. The ratio of satisfied to dissatisfied customers is nearly equal for these
two features.
# Setting the title, labels, and rotation for the x-axis ticks on a plot
plt.title('Distribution of {}'.format(col))
plt.xlabel('Rating')
plt.ylabel('Count')
plt.xticks(rotation=0)
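The visual comparison described above can also be expressed numerically: for each rating column, compute the share of satisfied customers at each rating level and see how much that share varies. The sketch below uses a toy frame with hypothetical values, constructed so that one column has no relationship with satisfaction and the other a strong one.

```python
import pandas as pd

# Toy data (made up): "Gate location" is unrelated to satisfaction,
# while "Inflight entertainment" separates the two groups completely.
toy = pd.DataFrame({
    "Gate location": [1, 2, 1, 2, 1, 2, 1, 2],
    "Inflight entertainment": [1, 1, 2, 2, 4, 4, 5, 5],
    "satisfaction": ["dissatisfied"] * 4 + ["satisfied"] * 4,
})

spread = {}
for col in ["Gate location", "Inflight entertainment"]:
    # Share of satisfied customers within each rating level.
    share = pd.crosstab(toy[col], toy["satisfaction"], normalize="index")["satisfied"]
    # A spread near 0 means the rating barely changes the satisfied share.
    spread[col] = share.max() - share.min()
print(spread)
```

A spread close to zero corresponds to the "nearly equal ratio" pattern noted for "Departure/Arrival time convenient" and "Gate location".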
# Iterate through each categorical column
for col in ['Gender', 'Customer Type', 'Type of Travel', 'Class']:
    plt.figure(figsize=(10, 6))
    sns.countplot(data=df, x=col, hue='satisfaction', palette=['salmon', 'skyblue'])
[output11]=
# Creating a side-by-side histogram subplot for Flight Distance and Satisfaction distribution
plt.figure(figsize=(12, 5))
# Creating the right subplot
plt.subplot(1, 2, 2)
plt.hist(df['satisfaction'], bins=5, color='salmon', edgecolor='black')
plt.xlabel('Satisfaction')
plt.ylabel('Frequency')
plt.title('Satisfaction Distribution')
[output12] =
# Label Encoding.
from sklearn.preprocessing import LabelEncoder

# Encode each categorical column as integer labels.
label_encoder = LabelEncoder()
for i in ['Gender', 'Customer Type', 'Type of Travel']:
    X[i] = label_encoder.fit_transform(X[i])
X.head()
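Reusing a single LabelEncoder across columns works for encoding, but after the loop the encoder only remembers the classes of the last column it was fitted on. If the integer codes need to be mapped back to the original labels later, one option is to keep one encoder per column. The sketch below shows this on a toy frame; the encoders dictionary is a hypothetical helper, not part of the original code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for X (values taken from the categories seen above).
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Female"],
    "Customer Type": ["Loyal Customer", "disloyal Customer", "Loyal Customer"],
})

# Keep a separate fitted encoder for every column.
encoders = {}
for col in toy.columns:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])
print(toy)

# Codes can be translated back to the original labels when needed.
print(encoders["Gender"].inverse_transform([0, 1]))
```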
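# Iterate over each model and evaluate its accuracy using cross-validation.
for model in models:
    scores = cross_val_score(model, X_train, y_train, cv=5)
    # cross_val_score fits clones of the model, so fit the model itself before predicting.
    model.fit(X_train, y_train)
    # Predict y-predict.
    y_pred = model.predict(X_test)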
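The evaluation loop above can be sketched as a self-contained comparison. The definition of models does not survive in the text, so the three classifiers below are an assumption drawn from the candidates named in the Model Selection step, and the data is synthetic (scikit-learn's make_classification) rather than the airline dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for X and y.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumed candidate models; the original list is not shown in the report.
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
]

results = {}
for model in models:
    # Mean 5-fold cross-validated accuracy on the training split.
    cv_accuracy = cross_val_score(model, X_train, y_train, cv=5).mean()
    # Fit on the full training split, then score on the held-out test split.
    model.fit(X_train, y_train)
    test_accuracy = model.score(X_test, y_test)
    results[type(model).__name__] = (cv_accuracy, test_accuracy)

for name, (cv_accuracy, test_accuracy) in results.items():
    print(f"{name}: cv={cv_accuracy:.3f} test={test_accuracy:.3f}")
```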
color='skyblue')
plt.yticks(range(len(sorted_idx)), features[sorted_idx])
plt.xlabel('Feature Importance')
plt.title('Feature Importance')
plt.show()
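The names sorted_idx and features in the plotting code above suggest that feature importances were sorted before plotting. The sketch below shows one way to obtain such a ranking. The use of a random forest is an assumption, and the data is synthetic: the target is built to depend only on "Inflight entertainment", so that column should rank highest.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic ratings (made up); the target depends only on entertainment.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "Inflight entertainment": rng.integers(1, 6, size=n),
    "Gate location": rng.integers(1, 6, size=n),
})
y = (X["Inflight entertainment"] >= 4).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Sort feature indices from least to most important.
features = X.columns.to_numpy()
sorted_idx = np.argsort(forest.feature_importances_)
for idx in sorted_idx:
    print(features[idx], round(forest.feature_importances_[idx], 3))
```

The arrays features[sorted_idx] and forest.feature_importances_[sorted_idx] are in the form a horizontal bar chart like the one above expects.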
output
6.CONCLUSION
The analysis of customer satisfaction is not only a strategic advantage but a necessity.
Airlines that invest in understanding and improving customer satisfaction are better
positioned to retain and attract passengers, achieve financial success, and adapt to
changing market conditions.
7.REFERENCE
1. https://www.mdpi.com/0718-1876/18/1/29
2. https://www.omniconvert.com/blog/customer-segmentation-models/
3. https://github.com/Sillians/E-Commerce-Customer-Segmentation-Project
4. https://github.com/karthickr7/E-commerce-Customer-Segmentation