Project = Customer Segmentation for E-commerce


“A STUDY ON CUSTOMER SEGMENTATION FOR E-COMMERCE”

Submitted in partial fulfillment of the requirements for the award of
Bachelor of Engineering degree in Computer Science and Engineering

By

NITHIN NAGABUSHANAM (42110906)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SCHOOL OF COMPUTING

SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY

(DEEMED TO BE UNIVERSITY)

Accredited with Grade “A++” by NAAC
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI – 600119
OCTOBER – 2023

SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with “A++” grade by NAAC
Jeppiaar Nagar, Rajiv Gandhi Salai, Chennai – 600 119
www.sathyabama.ac.in

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

This is to certify that this Product Report is the bonafide work of NITHIN
NAGABUSHANAM (42110906) who carried out the Design entitled “A STUDY ON
CUSTOMER SEGMENTATION FOR E-COMMERCE” under my supervision from July
2023 to November 2023.

Design Supervisor

Ms. Gracelin Sheena, M.E., (Ph.D.)

Head of the Department

Dr. L. LAKSHMANAN, M.E., Ph.D.

Submitted for Viva voce Examination held on

Internal Examiner External Examiner

DECLARATION

I, NITHIN NAGABUSHANAM (42110906), hereby declare that the Product Design Report
entitled “CUSTOMER SEGMENTATION FOR E-COMMERCE”, done by me under the
guidance of Ms. Gracelin Sheena, is submitted in partial fulfilment of the requirements
for the award of Bachelor of Engineering degree in Computer Science and Engineering.

DATE: 05/10/2023

PLACE: Chennai SIGNATURE OF THE CANDIDATE

ACKNOWLEDGEMENT

I am pleased to acknowledge my sincere thanks to the Board of Management of
SATHYABAMA for their kind encouragement in doing this project and for completing
it successfully. I am grateful to them.

I convey my thanks to Dr. T. Sasikala, M.E., Ph.D., Dean, School of Computing, and Dr.
L. Lakshmanan, M.E., Ph.D., Head of the Department of Computer Science and
Engineering, for providing me with the necessary support and details at the right time
during the progressive reviews.

I would like to express my sincere and deep sense of gratitude to my Design Supervisor,
Ms. Gracelin Sheena, whose valuable guidance, suggestions and constant
encouragement paved the way for the successful completion of the project work.

I wish to express my thanks to all Teaching and Non-teaching staff members of the
Department of Computer Science and Engineering who were helpful in many ways
for the completion of the project.

ABSTRACT

This project report, entitled “A study on customer segmentation as a strategic
approach to enhance the performance of an E-commerce platform”, is a data analysis
project that explores the factors influencing customer satisfaction and behavior using
a dataset of customer interactions on an E-commerce platform. By analyzing user
behavior and purchase patterns, we aim to uncover insights that can inform marketing
strategies, enhance the product experience, and improve customer engagement.

The methodology involves applying unsupervised machine learning techniques,
together with MySQL, Python, and Power BI, to a dataset containing transactional
records of customers over a one-year period. The dataset encompasses attributes such
as purchase frequency, order value, product categories purchased, and customer
demographic information. Notably, "High-Value Buyers" contribute disproportionately
to the platform's revenue, while "Discount Seekers" display sensitivity to promotions
and sales events. "Fashion Enthusiasts" consistently engage with clothing and
accessories categories, presenting opportunities for personalized fashion
recommendations.

Airline customer satisfaction is a critical factor in the aviation industry, directly
impacting an airline's reputation and customer loyalty. In this project, we conducted
an in-depth analysis of airline customer satisfaction, encompassing exploratory data
analysis (EDA) and predictive modeling. Our goal was to understand the key drivers of
customer satisfaction and build a model that can predict whether a customer is
satisfied or not.

These findings enable E-commerce businesses to tailor marketing campaigns,
optimize product recommendations, and refine inventory management strategies. In
conclusion, this project underscores the value of customer segmentation in
understanding and serving diverse E-commerce customer groups. The insights gained
contribute to informed decision-making for marketing strategies, ultimately fostering
customer satisfaction, retention, and revenue growth.

TABLE OF CONTENTS

CHAPTER NO TITLE

1. INTRODUCTION

2. SATISFACTION EXISTING SYSTEM

3. LIMITATIONS OF THE EXISTING SYSTEMS

4. AIRLINES CUSTOMER SATISFACTION

4.1 INTRODUCTION

4.2 AIM

4.3 OBJECTIVES

4.4 USE OF THIS DATA ANALYSIS

5. EXPERIMENTAL ANALYSIS

5.1 DATASET NEEDS

5.2 CODE

6. CONCLUSION

7. REFERENCE

1. INTRODUCTION

1.1: Introduction to Customer Segmentation in E-commerce

Customer segmentation for e-commerce refers to the process of categorizing a
company's customer base into distinct groups or segments based on shared
characteristics or behaviors. This allows businesses to better understand their
customers, tailor their marketing efforts, and provide more personalized shopping
experiences. Here are some common ways e-commerce companies may segment
their customers:

Demographic Segmentation: This involves categorizing customers based on
demographic information such as age, gender, income, education level, and marital
status. For example, a company might target a specific age group for a particular
product line.

Geographic Segmentation: This involves segmenting customers based on their
location, such as country, city, or region. This can be crucial for e-commerce
businesses, especially if they have to consider factors like shipping costs and delivery
times.

Psychographic Segmentation: This involves understanding customers' lifestyle,
interests, values, and personality traits. It helps in creating marketing messages that
resonate with specific customer groups.

Behavioral Segmentation: This is based on customer behavior, such as their
purchase history, browsing behavior, frequency of purchases, and engagement with
marketing materials. It can help in identifying patterns and predicting future behavior.

RFM Segmentation: RFM stands for Recency, Frequency, and Monetary value. This
method segments customers based on how recently they made a purchase, how often
they make purchases, and how much money they spend. This is particularly popular
for identifying high-value customers.

Segmentation by Purchase History: This involves categorizing customers based on
the types of products they have purchased in the past. For instance, a clothing retailer
might distinguish between customers who primarily buy casual wear and those who
purchase formal attire.

Segmentation by Engagement: This includes factors like how often customers open
emails, interact on social media, or participate in loyalty programs. Highly engaged
customers might receive special offers or exclusive content.

Segmentation by Device or Platform: Understanding which devices or platforms
customers use for shopping (e.g., mobile, desktop, app) can help optimize the user
experience and marketing efforts for each segment.

Segmentation by Customer Lifecycle Stage: Customers can be categorized based
on where they are in their relationship with the company, such as new customers,
repeat customers, loyal customers, and dormant customers.

Segmentation by Customer Value: This involves distinguishing between high-value
and low-value customers based on metrics like lifetime value (LTV) or average order
value (AOV).

By effectively segmenting their customer base, e-commerce businesses can create
targeted marketing campaigns, tailor product recommendations, improve customer
service, and ultimately enhance the overall shopping experience. This can lead to
increased customer satisfaction, loyalty, and ultimately, higher revenue.
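
To make the RFM and customer-value ideas above concrete, the following is a minimal sketch of how recency, frequency, monetary value, and average order value might be computed with pandas. The order log, its column names (`customer_id`, `order_date`, `order_value`), and all values are invented for illustration; they are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical order log: one row per order (all values are made up).
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-03-10", "2023-02-20",
        "2023-01-15", "2023-02-15", "2023-03-28",
    ]),
    "order_value": [120.0, 80.0, 40.0, 200.0, 150.0, 250.0],
})

# Reference date for recency: the day after the last order in the log.
snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

# One row per customer with the three RFM measures.
rfm = orders.groupby("customer_id").agg(
    recency_days=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("order_value", "sum"),
)

# Average order value (AOV) follows from monetary value and frequency.
rfm["aov"] = rfm["monetary"] / rfm["frequency"]
print(rfm)
```

In this toy log, customer C has ordered most recently, most often, and for the largest total, so an RFM-based scheme would place C in the highest-value segment. How the three measures are binned or scored into segments is a separate design choice that this sketch does not prescribe.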

1.2: What is Customer Segmentation?

Customer segmentation is the art of categorizing your customer base into distinct
groups based on shared characteristics or behaviors. This method allows e-commerce
businesses to delve deeper into the needs and preferences of their clientele, offering
tailored experiences that resonate on a personal level.

1.3: Why is Customer Segmentation Crucial for E-commerce?

In an era defined by information overload, generic one-size-fits-all approaches fall
short in capturing the attention and loyalty of today's discerning consumers. Customer
segmentation empowers e-commerce platforms to cut through the noise and deliver
precisely what each segment desires. This not only enhances customer satisfaction
but also drives conversions and bolsters long-term customer relationships.

1.4: Benefits of Effective Customer Segmentation

The benefits of customer segmentation in e-commerce are manifold. It enables
businesses to refine their product offerings, target marketing efforts with precision,
optimize user experiences, and ultimately, boost their bottom line. Additionally, it
fosters a sense of loyalty and brand affinity among customers who feel genuinely
understood and valued.

2. SATISFACTION EXISTING SYSTEM

In the existing system of customer segmentation in e-commerce, various methods are
used to categorize customers based on their characteristics and behaviors. This can
include demographic information like age, gender, and location, as well as purchase
history, browsing patterns, and engagement with the website or app. Advanced
techniques like machine learning and data analysis are often employed to identify
patterns and create segments. These segments are then used to personalize marketing
campaigns, recommend products, and enhance the overall customer experience. The
main categories of existing tools are listed below.

1. Customer Relationship Management (CRM) Systems:

*Salesforce: Salesforce offers a robust CRM platform with features for customer
segmentation, marketing automation, and personalized communication.
*HubSpot: HubSpot provides CRM and marketing automation tools that allow for
customer segmentation based on factors such as behavior and demographics.

2. Email Marketing Platforms:

*MailChimp: MailChimp includes segmentation features that allow e-commerce
businesses to target specific groups of customers based on their behavior and
preferences.
*Constant Contact: This platform offers customer segmentation capabilities for email
marketing campaigns.

3. E-commerce Platforms:

*Shopify: Shopify provides tools and apps that allow for segmentation based on
various criteria, including purchase history and browsing behavior.
*Magento: Magento offers segmentation options as part of its marketing suite,
allowing for personalized marketing campaigns.

4. Marketing Automation Platforms:

*Marketo: Marketo is a popular marketing automation platform that includes features
for customer segmentation, lead scoring, and personalized marketing.
*Pardot (by Salesforce): Pardot is another marketing automation tool with robust
segmentation capabilities.

5. Data Analytics and Business Intelligence Tools:

*Google Analytics: While not primarily a segmentation tool, Google Analytics
provides insights into user behavior, which can be used to segment customers for
marketing purposes.
*Tableau: Tableau is a powerful business intelligence tool that can be used to analyze
customer data and create custom segments.

6. Customer Data Platforms (CDPs):

*Segment: Segment is a customer data platform that helps businesses collect, clean,
and manage customer data for better segmentation and personalization.
*Tealium: Tealium is another CDP that enables businesses to unify customer data
from various sources and create actionable segments.

7. AI-Powered Customer Segmentation Tools:

*Optimizely (formerly Episerver): Utilizes AI-driven tools for customer segmentation
and personalization.
*Dynamic Yield: Employs AI to create personalized experiences and segment
customers based on behavior and preferences.

8. Open-Source Tools and Frameworks:

*R and Python with Machine Learning Libraries: Data scientists and analysts often
use programming languages like R and Python, along with libraries like scikit-learn,
for custom customer segmentation models.
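
As an illustration of the open-source route, here is a minimal sketch of a custom segmentation model built with scikit-learn's `KMeans`. The two features (orders per year and average order value), the synthetic customers, and the choice of two clusters are assumptions made for the example; k-means is used here as one common unsupervised technique, not as the method of any tool listed above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic customers: columns are [orders per year, average order value].
occasional = rng.normal(loc=[2, 30], scale=[0.5, 5], size=(50, 2))
high_value = rng.normal(loc=[20, 200], scale=[2, 20], size=(50, 2))
X = np.vstack([occasional, high_value])

# Scale the features so neither column dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means with two clusters (an assumed number of segments).
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)

# Describe each segment by its mean values in the original units.
for k in range(2):
    print(k, X[labels == k].mean(axis=0).round(1))
```

With real data, the number of clusters would normally be chosen by inspecting a criterion such as inertia or silhouette score across several values rather than fixed in advance.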

3. LIMITATIONS OF THE EXISTING SYSTEMS

1. Limited Data Sources: Some systems may have restrictions on the types and
sources of data they can incorporate. This can lead to incomplete or biased customer
profiles.

2. Static Segmentation: Many systems rely on predefined rules for segmentation.
This can be less effective in capturing evolving customer behaviors and preferences.

3. Single-Dimensional Segmentation: Some systems may only allow for
segmentation based on one or two criteria, which may not capture the full complexity
of customer behavior.

4. Lack of Real-Time Data: Some systems may not provide real-time data updates,
potentially leading to outdated customer profiles and less accurate segmentation.

5. Overlooking Contextual Information: Systems may not always consider external
factors or contextual information that could influence customer behavior (e.g.,
seasonal trends, economic shifts).

6. Privacy Concerns: With increasing regulations around data privacy (such as GDPR
and CCPA), some systems may face challenges in handling customer data while
maintaining compliance.

7. Inability to Handle Unstructured Data: Some systems may struggle with
processing unstructured data sources, like social media interactions or customer
reviews, which can provide valuable insights.

8. Scalability Issues: As a business grows, some systems may not scale effectively
to handle larger volumes of customer data and may become less efficient.

9. Cost and Resource Intensiveness: Implementing and maintaining sophisticated
customer segmentation systems can require a significant investment of time, money,
and resources.

10. Difficulty in Cross-Channel Integration: Achieving seamless customer
segmentation across various marketing channels (email, social media, website, etc.)
can be challenging for some systems.

4. EXAMPLE - Airlines Customer Satisfaction (EDA + Modeling)

4.1 INTRODUCTION

Customer satisfaction is a critical factor in the airline industry, as it directly impacts the
success and reputation of an airline. The main aspects of airline customer satisfaction
are outlined below.

1. Importance of Customer Satisfaction:

• Customer satisfaction is the cornerstone of any successful airline. Happy and
satisfied passengers are more likely to become loyal customers, recommend
the airline to others, and provide positive feedback. Airlines that prioritize
customer satisfaction tend to outperform their competitors in terms of
revenue and reputation.

2. Key Factors Affecting Customer Satisfaction:

a. Punctuality: On-time performance is a significant driver of customer
satisfaction. Passengers expect their flights to depart and arrive on schedule.

b. Comfort and Space: Cabin comfort, seat pitch, legroom, and in-flight
amenities all contribute to passenger satisfaction.

c. Customer Service: The attitude and professionalism of airline staff, both on
the ground and in the air, play a crucial role in passenger satisfaction.

d. Baggage Handling: Efficient and reliable baggage handling, including minimal
lost luggage incidents, is essential for a positive experience.

e. Safety and Security: Passengers expect a high level of safety and security.
Airlines that prioritize safety measures enhance customer satisfaction.

3. Strategies to Improve Customer Satisfaction:

a. Investing in Aircraft and Cabin Upgrades: Modern and well-maintained aircraft
with comfortable cabins contribute to passenger satisfaction.

b. Streamlined Booking and Check-In Processes: Easy and hassle-free booking
and check-in processes improve the overall experience.

c. Enhancing In-Flight Entertainment: Offering a variety of entertainment options
can make long flights more enjoyable.

d. Training and Empowering Staff: Providing extensive customer service training
for airline personnel ensures they can handle passenger concerns effectively
and professionally.

e. Clear Communication: Timely and clear communication with passengers,
especially during delays or disruptions, can mitigate frustration.

4. Loyalty Programs:

• Loyalty programs, such as frequent flyer programs, are effective in enhancing
customer satisfaction. These programs offer rewards, discounts, and special
treatment to loyal customers, incentivizing them to continue flying with the
airline.

5. Handling Customer Complaints:

• Airlines should have effective processes in place to address passenger
complaints and issues promptly. Quick resolutions can turn a dissatisfied
customer into a satisfied and loyal one.

6. Feedback Collection:

• Regularly collecting feedback from passengers through surveys and online
reviews helps airlines understand customer needs and make necessary
improvements.

7. Sustainability and Environmental Concerns:

• Many passengers now consider an airline's commitment to sustainability and
environmental responsibility in their overall satisfaction. Airlines that adopt
eco-friendly practices and technologies can appeal to environmentally
conscious customers.

8. Challenges in Achieving High Customer Satisfaction:

4.2 AIM:

• The aim is to increase customer satisfaction by a specific target (X, for example
a 50% improvement) within a set timeframe (Y, for example 12 months). The
actual values of X and Y should be determined based on the current state of
customer satisfaction, industry benchmarks, and the airline's resources and
capabilities.

4.3 OBJECTIVES

• The airline industry faces numerous challenges, including fuel costs,
regulatory constraints, and economic factors. Balancing customer satisfaction
with operational efficiency and profitability can be difficult.

4.4 Use Of This Data Analysis

The analysis of airline customer satisfaction, encompassing both Exploratory Data
Analysis (EDA) and modeling, serves several essential purposes:

a. Service Improvement: Airlines can use the insights gained from customer
satisfaction analysis to identify areas of their services that need improvement.
For example, if the analysis reveals that customers are dissatisfied with in-flight
meals, the airline can work on enhancing their catering services.

b. Resource Allocation: By understanding the factors that most strongly influence
customer satisfaction, airlines can allocate resources more efficiently. They can
prioritize investments in areas that have the greatest impact on customer
experience.

c. Pricing Strategies: Airlines can adjust their pricing strategies based on
customer preferences. For instance, if passengers are willing to pay more for
flights with superior in-flight entertainment, the airline can adjust pricing for such
services.

d. Marketing and Branding: The findings can inform marketing strategies. Airlines
can highlight their strengths and address weaknesses in their advertising and
branding efforts, attracting more customers and retaining existing ones.

e. Customer Segmentation: Airlines can segment their customer base based on
preferences and satisfaction levels. This allows for more personalized
marketing and service offerings to different customer segments. For example,
loyal business travelers may have different needs than leisure travelers.

f. Competitive Analysis: Airlines can compare their customer satisfaction scores
to those of competitors. This information is invaluable for benchmarking and
staying competitive in the market.

g. Operational Efficiency: Identifying factors that influence delays, cancellations,
and customer complaints can lead to operational improvements. Reducing
disruptions and resolving issues swiftly can positively impact satisfaction.

h. Predictive Modeling: Airlines can use predictive models to forecast customer
satisfaction for future flights or services, helping them anticipate potential issues
and proactively address them.

i. Feedback Loops: Analyzing customer satisfaction can create a feedback loop
where airlines continuously gather feedback, make improvements, and monitor
how these changes affect satisfaction over time.

j. Regulatory Compliance: Some industries, including airlines, are subject to
regulations regarding customer satisfaction and service quality. By conducting
this analysis, airlines can ensure they meet or exceed regulatory requirements.

k. Financial Impact: Improved customer satisfaction can lead to higher customer
retention rates, increased repeat business, and positive word-of-mouth
referrals, all of which have a direct impact on an airline's financial performance.

l. Risk Management: Identifying factors that lead to customer dissatisfaction can
help airlines manage risks related to customer complaints, lawsuits, or negative
publicity.

Overall, the analysis of airline customer satisfaction is a strategic tool that helps
airlines enhance their services, tailor their offerings, and optimize their
operations to meet the evolving demands and expectations of passengers.
This, in turn, can lead to improved customer loyalty, revenue growth, and
long-term success in a highly competitive industry.

5. EXPERIMENTAL ANALYSIS

1. Data Collection:
• Obtain the dataset containing information about airline customer satisfaction. This
dataset may include features like flight details, customer demographics, and their
satisfaction ratings.

2. Data Preprocessing:
• Clean the data by handling missing values, outliers, and data consistency issues.
• Convert categorical variables into numerical format using techniques like one-hot
encoding or label encoding.

3. Exploratory Data Analysis (EDA):
• Explore the data using summary statistics, data visualizations, and descriptive
statistics to gain insights into the dataset. Some key EDA steps include:
• Univariate analysis: Understand the distribution of individual variables.
• Bivariate analysis: Explore relationships between variables.
• Visualizations like histograms, bar plots, scatter plots, and correlation matrices can
be helpful.

4. Feature Engineering:
• Create new features if necessary, such as aggregating data or transforming
variables.
• Feature selection may be required to identify the most relevant features for
modeling.

5. Splitting the Data:
• Split the dataset into a training set and a testing set to evaluate the model's
performance.

6. Model Selection:
• Choose an appropriate machine learning model for predicting customer
satisfaction. Common models for binary classification tasks like this include logistic
regression, decision trees, random forests, support vector machines, and neural
networks. The choice depends on the dataset size, complexity, and your familiarity
with the model.

7. Model Evaluation:
• Evaluate the model's performance using appropriate evaluation metrics. For binary
classification, metrics like accuracy, precision, recall, F1-score, and ROC-AUC are
commonly used.
• Use techniques like cross-validation to ensure the model's robustness.

8. Model Interpretation (Optional):
• Depending on the model used, interpret the results to understand which features
contribute most to customer satisfaction.

9. Deploy and Monitor (Optional):
• If the model performs well, you can deploy it to make predictions for new data.
Regularly monitor the model's performance and retrain it as needed.

10. Report and Visualization:
• Create a report summarizing the findings from EDA, model performance, and any
insights gained from the model. Visualizations can help convey the results
effectively. EDA is crucial for understanding the data, while modeling aims to
predict customer satisfaction accurately. Regularly iterate on your analysis and
model to improve quality.
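
The preprocessing, splitting, model selection, and evaluation steps above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the analysis performed in this report: the column names (`travel_class`, `seat_comfort`, `delay_minutes`), the rule used to generate the `satisfied` label, and the choice of logistic regression are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(42)
n = 400

# Synthetic stand-in for survey data (not the real dataset).
data = pd.DataFrame({
    "travel_class": rng.choice(["Eco", "Business"], size=n),
    "seat_comfort": rng.integers(0, 6, size=n),
    "delay_minutes": rng.exponential(20, size=n),
})

# Made-up rule: comfortable seats and short delays tend to mean "satisfied".
score = data["seat_comfort"] - data["delay_minutes"] / 20 + rng.normal(0, 1, n)
data["satisfied"] = (score > 1.5).astype(int)

# Preprocessing: one-hot encode the categorical column.
X = pd.get_dummies(data.drop(columns="satisfied"),
                   columns=["travel_class"], dtype=float)
y = data["satisfied"]

# Splitting: hold out 25% for testing, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Model selection: logistic regression as a simple binary classifier.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluation: accuracy, F1-score, and ROC-AUC on the held-out set.
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred)
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.3f}  f1={f1:.3f}  roc_auc={auc:.3f}")

# Robustness check: 5-fold cross-validated accuracy on the full data.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                            cv=5, scoring="accuracy")
print("cv accuracy:", cv_scores.round(3))
```

Stratifying the split keeps the proportion of satisfied and dissatisfied customers the same in the training and test sets, which makes the reported metrics easier to compare across runs.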

5.1. Dataset needs

The dataset provided by Invistico Airlines contains valuable information about their
customers' experiences and satisfaction levels. With the aim of predicting future
customer satisfaction and improving service quality, this dataset encompasses
various customer attributes and feedback on different aspects of their flights. The
primary objectives of this dataset are to predict customer satisfaction and identify
areas for service improvement.

1. satisfaction: The overall satisfaction level of the customer. It is a categorical
variable with options "satisfied" or "dissatisfied".
2. Gender: The gender of the customer. It is a categorical variable with options
"Male" or "Female".
3. Customer Type: Whether the customer is a "Loyal Customer" or a "disloyal
Customer".
4. Age: The age of the customer.
5. Type of Travel: The type of travel, such as "Personal Travel" or "Business
travel".
6. Class: The class of travel, such as "Eco", "Eco Plus", or "Business".
7. Flight Distance: The distance of the flight.
8. Seat comfort: Customer rating of seat comfort.
9. Departure/Arrival time convenient: Customer rating of convenience of
departure/arrival times.
10. Food and drink: Customer rating of food and drink quality.
11. Gate location: Customer rating of gate location.
12. Inflight wifi service: Customer rating of inflight Wi-Fi service.
13. Inflight entertainment: Customer rating of inflight entertainment options.
14. Online support: Customer rating of online customer support.
15. Ease of Online booking: Customer rating of ease of online booking.
16. On-board service: Customer rating of on-board service provided by the airline.
17. Leg room service: Customer rating of leg room service provided during the flight.
18. Baggage handling: Customer rating of baggage handling.
19. Checkin service: Customer rating of check-in service.
20. Cleanliness: Customer rating of cabin cleanliness.
21. Online boarding: Customer rating of online boarding process.
22. Departure Delay in Minutes: The departure delay in minutes for each flight.
23. Arrival Delay in Minutes: The arrival delay in minutes for each flight.

All rating features are measured on a scale from 0 to 5, where higher values indicate
greater satisfaction.

5.2 CODE:

# Import libraries
# To begin, let's import the necessary libraries that we'll be using throughout this notebook.

# Data Manipulation Libraries


import numpy as np
import pandas as pd

# Data Visualization Libraries


import seaborn as sns
import matplotlib.pyplot as plt

# Machine Learning Models


from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Find the name of the dataset file (Kaggle input directory)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Load the dataset from the local CSV file
df = pd.read_csv("C:/Users/nithi/Desktop/Invistico_Airline.csv")
df.head()

[output1] =

# Seeing the shape of the data.
df.shape

[output2]=
(129880, 23)

# Seeing if there are duplicated rows.


df.duplicated().sum()

[output3]=0

# seeing if there are null values.


df.isna().sum()

[output4]=

# Seeing information about data.
df.info()

[output5]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 satisfaction 129880 non-null object
1 Gender 129880 non-null object
2 Customer Type 129880 non-null object
3 Age 129880 non-null int64
4 Type of Travel 129880 non-null object
5 Class 129880 non-null object
6 Flight Distance 129880 non-null int64
7 Seat comfort 129880 non-null int64
8 Departure/Arrival time convenient 129880 non-null int64
9 Food and drink 129880 non-null int64
10 Gate location 129880 non-null int64
11 Inflight wifi service 129880 non-null int64
12 Inflight entertainment 129880 non-null int64
13 Online support 129880 non-null int64
14 Ease of Online booking 129880 non-null int64

15 On-board service 129880 non-null int64
16 Leg room service 129880 non-null int64
17 Baggage handling 129880 non-null int64
18 Checkin service 129880 non-null int64
19 Cleanliness 129880 non-null int64
20 Online boarding 129880 non-null int64
21 Departure Delay in Minutes 129880 non-null int64
22 Arrival Delay in Minutes 129487 non-null float64
dtypes: float64(1), int64(17), object(5)
memory usage: 22.8+ MB

# Columns treated as categorical (labels and 0-5 ratings)
categorical_features = ['satisfaction', 'Gender', 'Customer Type',
                        'Type of Travel', 'Class', 'Seat comfort',
                        'Departure/Arrival time convenient', 'Food and drink',
                        'Gate location', 'Inflight wifi service',
                        'Inflight entertainment', 'Online support',
                        'Ease of Online booking', 'On-board service',
                        'Leg room service', 'Baggage handling',
                        'Checkin service', 'Cleanliness', 'Online boarding']

# Print the frequency of each value in every categorical column
for i in categorical_features:
    print(df[i].value_counts())
    print('-' * 50)

[output6]=

satisfied 71087
dissatisfied 58793
Name: satisfaction, dtype: int64
--------------------------------------------------
Female 65899
Male 63981
Name: Gender, dtype: int64
--------------------------------------------------
Loyal Customer 106100
disloyal Customer 23780
Name: Customer Type, dtype: int64
--------------------------------------------------
Business travel 89693
Personal Travel 40187
Name: Type of Travel, dtype: int64
--------------------------------------------------
Business 62160
Eco 58309
Eco Plus 9411
Name: Class, dtype: int64
--------------------------------------------------
3 29183
2 28726

4 28398
1 20949
5 17827
0 4797
Name: Seat comfort, dtype: int64
--------------------------------------------------
4 29593
5 26817
3 23184
2 22794
1 20828
0 6664
Name: Departure/Arrival time convenient, dtype: int64
--------------------------------------------------
3 28150
4 27216
2 27146
1 21076
5 20347
0 5945
Name: Food and drink, dtype: int64
--------------------------------------------------
3 33546
4 30088
2 24518
1 22565
5 19161
0 2
Name: Gate location, dtype: int64
--------------------------------------------------
4 31560
5 28830
3 27602
2 27045
1 14711
0 132
Name: Inflight wifi service, dtype: int64
--------------------------------------------------
4 41879
5 29831
3 24200
2 19183
1 11809
0 2978
Name: Inflight entertainment, dtype: int64
--------------------------------------------------
4 41510
5 35563
3 21609
2 17260

1 13937
0 1
Name: Online support, dtype: int64
--------------------------------------------------
4 39920
5 34137
3 22418
2 19951
1 13436
0 18
Name: Ease of Online booking, dtype: int64
--------------------------------------------------
4 40675
5 31724
3 27037
2 17174
1 13265
0 5
Name: On-board service, dtype: int64
--------------------------------------------------
4 39698
5 34385
3 22467
2 21745
1 11141
0 444
Name: Leg room service, dtype: int64
--------------------------------------------------
4 48240
5 35748
3 24485
2 13432
1 7975
Name: Baggage handling, dtype: int64
--------------------------------------------------
4 36481
3 35538
5 27005
2 15486
1 15369
0 1
Name: Checkin service, dtype: int64
--------------------------------------------------
4 48795
5 35916
3 23984
2 13412
1 7768
0 5
Name: Cleanliness, dtype: int64

--------------------------------------------------
4 35181
3 30780
5 29973
2 18573
1 15359
0 14
Name: Online boarding, dtype: int64

The dataset consists of 129,880 rows and 23 columns. We observed no duplicated records,
but there are 393 missing values, all of them in the 'Arrival Delay in Minutes' column.
The data distribution is largely balanced across the various features.
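The checks summarized above (dataset size, duplicated rows, missing values per column) can be sketched as follows. This is a minimal illustration on a small, hypothetical DataFrame named `toy`; the values are invented for the example and are not taken from the airline dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the airline dataset (hypothetical values, not the real data).
toy = pd.DataFrame({
    "Age": [25, 41, 33, 58],
    "Departure Delay in Minutes": [0, 12, 5, 0],
    "Arrival Delay in Minutes": [0.0, np.nan, 7.0, 0.0],
})

n_rows, n_cols = toy.shape                   # dataset size
n_duplicates = int(toy.duplicated().sum())   # rows identical to an earlier row
missing_per_column = toy.isna().sum()        # missing values in each column

print(f"{n_rows} rows, {n_cols} columns, {n_duplicates} duplicated rows")
# Show only the columns that actually contain missing values.
print(missing_per_column[missing_per_column > 0])
```

Run against the full dataframe, the same three calls would be expected to yield the figures quoted above.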

# Check that the rating features now take values from 1 to 5
for i in ['Gate location', 'Inflight wifi service', 'Ease of Online booking',
          'On-board service', 'Checkin service', 'Cleanliness',
          'Online boarding']:
    print(df[i].value_counts())
    print('-' * 50)

[output7]:

3 33546
4 30088
2 24518
1 22567
5 19161
Name: Gate location, dtype: int64
--------------------------------------------------
4 31560
5 28830
3 27602
2 27045
1 14843
Name: Inflight wifi service, dtype: int64
--------------------------------------------------
4 39920
5 34137
3 22418
2 19951
1 13454
Name: Ease of Online booking, dtype: int64
--------------------------------------------------
4 40675
5 31724
3 27037

2 17174
1 13270
Name: On-board service, dtype: int64
--------------------------------------------------
4 36481
3 35538
5 27005
2 15486
1 15370
Name: Checkin service, dtype: int64
--------------------------------------------------
4 48795
5 35916
3 23984
2 13412
1 7773
Name: Cleanliness, dtype: int64
--------------------------------------------------
4 35181
3 30780
5 29973
2 18573
1 15373
Name: Online boarding, dtype: int64
--------------------------------------------------
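Following the refinement of the rating features, the seven features checked above take
only the values 1 to 5. The counts indicate that the former 0 ratings were merged into
rating 1 (for example, Gate location: 22,565 + 2 = 22,567).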
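The refinement step itself is not shown in this part of the report. The sketch below illustrates one way such a recoding could be done, under the assumption that a 0 rating is treated as the lowest valid rating, 1. The DataFrame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy ratings; 0 stands for an out-of-scale entry.
toy = pd.DataFrame({
    "Gate location": [0, 1, 3, 5, 0],
    "Cleanliness": [4, 0, 2, 5, 1],
})
rating_cols = ["Gate location", "Cleanliness"]

# Assumed rule: recode a 0 rating as 1, the lowest valid rating.
toy[rating_cols] = toy[rating_cols].replace(0, 1)

# Every rating should now lie in the 1-5 range.
in_range = bool(toy[rating_cols].isin([1, 2, 3, 4, 5]).all().all())
print(in_range)
print(toy["Gate location"].value_counts().sort_index())
```

Whether 0 should be mapped to 1, dropped, or treated as missing is a modelling decision; the merged counts in the output above are only consistent with the first option.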

# Numerical columns
numerical_features = ["Age", "Flight Distance", "Departure Delay in Minutes",
                      "Arrival Delay in Minutes"]

plt.figure(figsize=(8, 6))

# Calculate the correlation matrix for the numerical columns
correlation_matrix = df[numerical_features].corr()

# Create a heatmap using the correlation matrix
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')

# Set the title of the heatmap
plt.title('Correlation Heatmap')

# Display the heatmap
plt.show()

[output8]:

Data Visualization and Analysis

# Drop the 'Arrival Delay in Minutes' column from the dataframe
df.drop('Arrival Delay in Minutes', axis=1, inplace=True)

# List of rating features to be considered for analysis
rating_features = ['Seat comfort', 'Departure/Arrival time convenient',
                   'Food and drink', 'Online support', 'Ease of Online booking',
                   'On-board service', 'Baggage handling', 'Checkin service',
                   'Cleanliness', 'Online boarding']

# Set up a two-column grid with enough rows for every rating feature
n_rows = (len(rating_features) + 1) // 2
fig, axes = plt.subplots(nrows=n_rows, ncols=2, figsize=(16, 16))

# Iterate through the rating features and create bar plots
for i, feature in enumerate(rating_features):
    row, col = divmod(i, 2)
    sns.barplot(x=feature, y='satisfaction', data=df, ax=axes[row, col],
                palette=['salmon', 'skyblue'])

    # Set the title, x-axis label, and y-axis label for each subplot
    axes[row, col].set_title(f'Satisfaction vs {feature}')
    axes[row, col].set_xlabel(feature)
    axes[row, col].set_ylabel('Satisfaction')

# Adjust subplot layout
plt.tight_layout()

# Show the plots
plt.show()

[output9]

In these graphs, our objective is to discern whether the ratings have a direct impact on
customer satisfaction, in other words, to identify the features that most influence
customer satisfaction. Our analysis reveals that "Departure/Arrival time convenient"
and "Gate location" appear to have almost no effect on customer satisfaction: the ratio
of satisfied to dissatisfied customers is nearly equal for these two features.

# Iterate through each rating column
for col in rating_features:
    plt.figure(figsize=(10, 6))
    sns.countplot(data=df, x=col, hue='satisfaction',
                  palette=['salmon', 'skyblue'])

    # Set the title, axis labels, and x-axis tick rotation for the plot
    plt.title('Distribution of {}'.format(col))
    plt.xlabel('Rating')
    plt.ylabel('Count')
    plt.xticks(rotation=0)

    # Display the plot
    plt.show()
[output10]
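The visual comparison of satisfied and dissatisfied counts per rating can also be checked numerically. The sketch below is one possible approach, not the method used in the report: it computes the share of satisfied customers at each rating level with `pd.crosstab` and reports how much that share varies across levels. The helper `satisfied_spread`, the DataFrame `toy`, and the label strings "satisfied" and "dissatisfied" are assumptions made for the illustration.

```python
import pandas as pd

# Hypothetical toy data; the label strings are assumptions.
toy = pd.DataFrame({
    "Gate location":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Online boarding": [5, 1, 5, 1, 5, 4, 1, 2],
    "satisfaction": ["satisfied", "dissatisfied", "satisfied", "dissatisfied",
                     "satisfied", "satisfied", "dissatisfied", "dissatisfied"],
})


def satisfied_spread(data, feature):
    """Return max minus min of the satisfied share across rating levels."""
    # normalize="index" makes every row (rating level) sum to 1.
    share = pd.crosstab(data[feature], data["satisfaction"], normalize="index")
    return share["satisfied"].max() - share["satisfied"].min()


for feature in ["Gate location", "Online boarding"]:
    print(f"{feature}: spread = {satisfied_spread(toy, feature):.2f}")
```

A spread close to 0 means the satisfied share barely changes with the rating, which is the pattern described above for features with almost no effect. A large spread points to a feature whose rating is closely tied to satisfaction.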

# Iterate through each categorical column
for col in ['Gender', 'Customer Type', 'Type of Travel', 'Class']:
    plt.figure(figsize=(10, 6))
    sns.countplot(data=df, x=col, hue='satisfaction',
                  palette=['salmon', 'skyblue'])

    # Set the title, labels, and legend for a plot depicting
    # customer satisfaction by a specific feature
    plt.title('Customer Satisfaction by {}'.format(col))
    plt.xlabel(col)
    plt.ylabel('Count')
    plt.legend(title='Satisfaction', loc='upper right')

    # Display the plot
    plt.show()

[output11]=

# Creating side-by-side histograms for the Flight Distance and
# Satisfaction distributions
plt.figure(figsize=(12, 5))

# Creating the left subplot
plt.subplot(1, 2, 1)
plt.hist(df['Flight Distance'], bins=20, color='skyblue', edgecolor='black')
plt.xlabel('Flight Distance (miles)')
plt.ylabel('Frequency')
plt.title('Flight Distance Distribution')

# Creating the right subplot
plt.subplot(1, 2, 2)
plt.hist(df['satisfaction'], bins=5, color='salmon', edgecolor='black')
plt.xlabel('Satisfaction')
plt.ylabel('Frequency')
plt.title('Satisfaction Distribution')

# Ensuring proper spacing and layout between subplots
plt.tight_layout()

# Displaying the combined subplots
plt.show()

[output12] =

# Split data into X and y.
X = df.drop("satisfaction", axis=1)
y = df["satisfaction"]

# One-hot encoding.
X = pd.get_dummies(X, columns=['Class'])

# Label encoding.
label_encoder = LabelEncoder()
for i in ['Gender', 'Customer Type', 'Type of Travel']:
    X[i] = label_encoder.fit_transform(X[i])

# Select the features you want to scale
selected_features = X[["Age", "Flight Distance", "Departure Delay in Minutes"]]

# Create a scaler object.
scaler = StandardScaler()

# Fit scaler on the selected features.
scaler.fit(selected_features)

# Transform the selected features with the scaler.
selected_features_scaled = scaler.transform(selected_features)

# Replace the original columns with the scaled values in the DataFrame
X[["Age", "Flight Distance", "Departure Delay in Minutes"]] = selected_features_scaled

X.head()
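In the preprocessing above, the scaler is fitted on the full dataset before any train/test split. An optional refinement, sketched here on synthetic data, is to wrap the preprocessing and the model in a scikit-learn `Pipeline` so that scaling and encoding are re-fitted on the training part of every cross-validation fold. The DataFrame `toy`, the `target` rule, and the choice of columns are hypothetical and serve only to make the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data that mimics a few of the report's columns.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Age": rng.integers(18, 80, n),
    "Flight Distance": rng.integers(100, 5000, n),
    "Class": rng.choice(["Business", "Eco", "Eco Plus"], n),
})
# Toy target: longer flights are labelled satisfied (an arbitrary rule).
target = np.where(toy["Flight Distance"] > 2500, "satisfied", "dissatisfied")

# Scale the numerical columns and one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["Age", "Flight Distance"]),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Class"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

# The scaler and encoder are re-fitted on the training part of each fold.
scores = cross_val_score(pipeline, toy, target, cv=5)
print(f"Mean accuracy: {scores.mean():.3f}")
```

With this layout the validation folds never influence the scaling statistics, so the cross-validated accuracy is a slightly more conservative estimate.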

# Define a list of models to evaluate
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(),
    RandomForestClassifier()
]

# Split the data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Iterate over each model and evaluate its accuracy using cross-validation.
for model in models:
    scores = cross_val_score(model, X_train, y_train, cv=5)

    # Print the mean accuracy score for the current model
    print(f"{model.__class__.__name__}: Mean Accuracy = {scores.mean()}")

# Build the random forest model (default hyperparameters are used here).
model = RandomForestClassifier()

# Fit the model.
model.fit(X_train, y_train)

# Predict labels for the test set.
y_pred = model.predict(X_test)

# Evaluate the accuracy of the predictions.
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.3f}")
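Accuracy alone does not show which class the model gets wrong. As a complement, the sketch below illustrates scikit-learn's `confusion_matrix` and `classification_report` on a handful of hypothetical labels. The lists `y_true` and `y_hat` and the label strings are invented for the example and are not results from this project.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true labels and predictions, used only to illustrate the calls.
y_true = ["satisfied", "satisfied", "dissatisfied",
          "dissatisfied", "satisfied", "dissatisfied"]
y_hat = ["satisfied", "dissatisfied", "dissatisfied",
         "dissatisfied", "satisfied", "satisfied"]

labels = ["dissatisfied", "satisfied"]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_hat, labels=labels)
print(cm)

# Per-class precision, recall, and F1 score.
print(classification_report(y_true, y_hat, labels=labels))
```

In the project itself, the same two calls could be applied to `y_test` and `y_pred` from the random forest above.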

# Get feature importances
feature_importance = model.feature_importances_

# Sort feature importance indices
sorted_idx = np.argsort(feature_importance)

# Retrieve feature names
features = X.columns

# Plot feature importance
plt.figure()
plt.barh(range(len(sorted_idx)), feature_importance[sorted_idx], align='center',
         color='skyblue')
plt.yticks(range(len(sorted_idx)), features[sorted_idx])
plt.xlabel('Feature Importance')
plt.title('Feature Importance')
plt.show()

output

6.CONCLUSION

Customer segmentation is an indispensable strategy in the dynamic realm of e-commerce.
By categorizing a diverse customer base into distinct segments based on shared
characteristics, behaviors, and preferences, businesses can unlock a multitude of
benefits.

Effective customer segmentation empowers e-commerce platforms to craft personalized
experiences that resonate with individual customers. This precision in targeting leads
to higher customer satisfaction, increased conversion rates, and stronger long-term
loyalty.

Moreover, segmentation allows businesses to refine product offerings, optimize
marketing strategies, and enhance the overall shopping experience. By understanding
the nuanced needs of different customer groups, e-commerce platforms can rise above
the noise and forge enduring connections with their audience.

The analysis of customer satisfaction is not only a strategic advantage but a necessity.
Airlines that invest in understanding and improving customer satisfaction are better
positioned to retain and attract passengers, achieve financial success, and adapt to
changing market conditions.

The project on Airlines Customer Satisfaction, which combines Exploratory Data
Analysis (EDA) and modeling, represents a comprehensive effort to understand, predict,
and improve customer satisfaction within the airline industry.

In a dynamic and competitive industry like aviation, understanding and improving
customer satisfaction is essential for long-term success. This project provides
airlines with actionable insights and strategies to enhance their services, retain and
attract customers, and achieve financial sustainability. It underscores the significance
of data-driven decision-making and continuous efforts to meet customer expectations
in an ever-evolving market. The success of this project ultimately lies in the airline's
commitment to implementing the recommendations and adapting to changing
customer preferences and industry dynamics.

