Online Sales Data Analysis

Uploaded by

Amine Benseddiq


# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/online-sales-data/Details.csv
/kaggle/input/online-sales-data/Orders.csv

# reading the datasets

details_df = pd.read_csv("/kaggle/input/online-sales-data/Details.csv")
orders_df = pd.read_csv("/kaggle/input/online-sales-data/Orders.csv")

Displaying the first rows of the details dataset

details_df.head()

   Order ID  Amount  Profit  Quantity     Category      Sub-Category  PaymentMode
0   B-25681    1096     658         7  Electronics  Electronic Games          COD
1   B-26055    5729      64        14    Furniture            Chairs          EMI
2   B-25955    2927     146         8    Furniture         Bookcases          EMI
3   B-26093    2847     712         8  Electronics          Printers  Credit Card
4   B-25602    2617    1151         4  Electronics            Phones  Credit Card

Displaying the first rows of the orders dataset

orders_df.head()

   Order ID  Order Date  CustomerName           State     City  Year  Month
0   B-26055  2018-10-03     Harivansh   Uttar Pradesh  Mathura  2018     10
1   B-25993  2018-03-02        Madhav           Delhi    Delhi  2018      3
2   B-25973  2018-01-24   Madan Mohan   Uttar Pradesh  Mathura  2018      1
3   B-25923  2018-12-27         Gopal     Maharashtra   Mumbai  2018     12
4   B-25757  2018-08-21      Vishakha  Madhya Pradesh   Indore  2018      8

Exploring the datasets

# Details Dataset
print(details_df.info())
print(details_df.describe())

# Orders Dataset
print(orders_df.info())
print(orders_df.describe())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order ID 1500 non-null object
1 Amount 1500 non-null int64
2 Profit 1500 non-null int64
3 Quantity 1500 non-null int64
4 Category 1500 non-null object
5 Sub-Category 1500 non-null object
6 PaymentMode 1500 non-null object
dtypes: int64(3), object(4)
memory usage: 82.2+ KB
None
Amount Profit Quantity
count 1500.000000 1500.00000 1500.000000
mean 291.847333 24.64200 3.743333
std 461.924620 168.55881 2.184942
min 4.000000 -1981.00000 1.000000
25% 47.750000 -12.00000 2.000000
50% 122.000000 8.00000 3.000000
75% 326.250000 38.00000 5.000000
max 5729.000000 1864.00000 14.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order ID 500 non-null object
1 Order Date 500 non-null object
2 CustomerName 500 non-null object
3 State 500 non-null object
4 City 500 non-null object
dtypes: object(5)
memory usage: 19.7+ KB
None
Order ID Order Date CustomerName State City
count 500 500 500 500 500
unique 500 307 336 19 25
top B-26055 24-11-2018 Shreya Maharashtra Indore
freq 1 7 6 94 71

print(details_df.isnull().sum())
print(orders_df.isnull().sum())

Order ID 0
Amount 0
Profit 0
Quantity 0
Category 0
Sub-Category 0
PaymentMode 0
dtype: int64
Order ID 0
Order Date 0
CustomerName 0
State 0
City 0
dtype: int64

Neither dataset contains null values.

Converting the Order Date column to Date type

# Converting the Order Date column to Date type
# (format="mixed" infers the format per value and requires pandas >= 2.0)
orders_df["Order Date"] = pd.to_datetime(orders_df["Order Date"], format="mixed")
orders_df["Year"] = orders_df["Order Date"].dt.year
orders_df["Month"] = orders_df["Order Date"].dt.month
orders_df.head()

Order ID Order Date CustomerName State City Year
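
The `describe()` output for the orders dataset lists `24-11-2018` as the most frequent raw `Order Date`, which suggests the file stores dates day-first (DD-MM-YYYY). That is an inference from the printed output, not something the notebook states. Under that assumption, a string such as `10-03-2018` is ambiguous, because pandas reads ambiguous strings month-first by default. The sketch below uses made-up sample strings to show the difference and how an explicit format removes the ambiguity.

```python
import pandas as pd

# Hypothetical sample strings in the assumed DD-MM-YYYY layout.
raw = pd.Series(["10-03-2018", "24-11-2018"])

# Default parsing of an ambiguous string is month-first (October 3).
month_first = pd.to_datetime("10-03-2018")

# dayfirst=True flips the interpretation (March 10).
day_first = pd.to_datetime("10-03-2018", dayfirst=True)

# An explicit format parses every row the same way.
parsed = pd.to_datetime(raw, format="%d-%m-%Y")

print(month_first.month, day_first.month)
print(parsed.dt.month.tolist())
```

If the source file really is day-first, passing `dayfirst=True` or an explicit `format` to `pd.to_datetime` would keep the derived `Month` column consistent; whether that applies here depends on the actual file contents.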

# Merge datasets on 'Order ID'

merged_df = pd.merge(details_df, orders_df, on='Order ID',
how='inner')

# Display the merged dataset to inspect the results

merged_df.head()

   Order ID  Amount  Profit  Quantity     Category      Sub-Category  \
0   B-25681    1096     658         7  Electronics  Electronic Games
1   B-25681    1625     -77         3  Electronics            Phones
2   B-25681     523     204         7     Clothing          Trousers
3   B-25681      44      -3         1     Clothing             Saree
4   B-25681     243     -14         2    Furniture            Chairs

   PaymentMode  Order Date  CustomerName           State    City  Year  Month
0          COD  2018-04-06        Bhawna  Madhya Pradesh  Indore  2018      4
1          EMI  2018-04-06        Bhawna  Madhya Pradesh  Indore  2018      4
2          COD  2018-04-06        Bhawna  Madhya Pradesh  Indore  2018      4
3   Debit Card  2018-04-06        Bhawna  Madhya Pradesh  Indore  2018      4
4          COD  2018-04-06        Bhawna  Madhya Pradesh  Indore  2018      4
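
The merged preview repeats order `B-25681` on every row because the details dataset holds 1500 line items while the orders dataset holds 500 unique orders, so the join is many-to-one. A minimal sketch of how that assumption could be checked at merge time is shown below. The toy frames and their values are made up; only the column name `Order ID` comes from the notebook.

```python
import pandas as pd

# Toy frames mirroring the notebook's key column (values are made up).
details = pd.DataFrame({
    "Order ID": ["B-1", "B-1", "B-2", "B-3"],
    "Amount": [100, 50, 75, 20],
})
orders = pd.DataFrame({
    "Order ID": ["B-1", "B-2"],
    "City": ["Indore", "Mumbai"],
})

# validate="many_to_one" raises MergeError if "Order ID" repeats in `orders`.
# indicator=True adds a "_merge" column showing where each row was found.
checked = pd.merge(details, orders, on="Order ID", how="left",
                   validate="many_to_one", indicator=True)

# Line items without a matching order; an inner join would drop them silently.
unmatched = checked[checked["_merge"] == "left_only"]
print(len(checked), len(unmatched))
```

Using a left join with `indicator=True` for the check makes dropped rows visible before switching back to `how='inner'` for the analysis.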

Which product category has the highest sales?

# Sales by Category
category_sales = merged_df.groupby('Category')['Amount'].sum().reset_index()
print(category_sales)

Category Amount
0 Clothing 144323
1 Electronics 166267
2 Furniture 127181

Visualizing the sales by category

plt.figure(figsize=(12, 6))
sns.barplot(x='Category', y='Amount', data=category_sales)
plt.title('Total Sales by Category')
plt.xlabel('Category')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.show()

Observation: Electronics has the highest total sales amount (166267), followed by Clothing (144323) and Furniture (127181).
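
To put the three totals in proportion, each category's share of overall sales can be computed from the same grouped frame. The sketch below rebuilds `category_sales` from the figures printed above rather than from the CSV files; the `Share (%)` column name is an illustrative choice, not part of the notebook.

```python
import pandas as pd

# Totals copied from the printed category_sales output.
category_sales = pd.DataFrame({
    "Category": ["Clothing", "Electronics", "Furniture"],
    "Amount": [144323, 166267, 127181],
})

# Share of total sales per category, rounded to one decimal place.
total = category_sales["Amount"].sum()
category_sales["Share (%)"] = (category_sales["Amount"] / total * 100).round(1)

# Category with the largest total sales amount.
top_category = category_sales.loc[category_sales["Amount"].idxmax(), "Category"]

print(category_sales.sort_values("Amount", ascending=False))
print(top_category)
```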

What are the most sold products in Electronics?

electronics_df = merged_df[merged_df["Category"]=="Electronics"]
subcategory_sales = electronics_df.groupby("Sub-Category")["Amount"].sum().reset_index()
most_sold_products = subcategory_sales.sort_values(by="Amount", ascending=False)

plt.figure(figsize=(12,6))
sns.barplot(x="Sub-Category",y="Amount",data=most_sold_products)
plt.title("Total sales by products")
plt.xlabel("Products")
plt.ylabel("Total Sales Amount")
plt.show()

Observation: Printers have the highest total sales amount among the Electronics sub-categories.

What are the cities and states that have the most sales?

# Sales by City
city_sales = merged_df.groupby('City')['Amount'].sum().reset_index().sort_values(by='Amount', ascending=False)

# Sales by State
state_sales = merged_df.groupby('State')['Amount'].sum().reset_index().sort_values(by='Amount', ascending=False)

Visualizing sales by city and state

# Set up the matplotlib figure
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(14, 12))

# Bar chart for Sales by City

sns.barplot(x='Amount', y='City', data=city_sales.head(10),
ax=axes[0], palette='viridis')
axes[0].set_title('Top 10 Cities by Sales')
axes[0].set_xlabel('Total Sales')
axes[0].set_ylabel('City')

# Bar chart for Sales by State

sns.barplot(x='Amount', y='State', data=state_sales.head(10),
ax=axes[1], palette='viridis')
axes[1].set_title('Top 10 States by Sales')
axes[1].set_xlabel('Total Sales')
axes[1].set_ylabel('State')

# Adjust layout to prevent overlap

plt.tight_layout()

# Show the plots

plt.show()

Observation: The charts show that "Maharashtra" and "Madhya Pradesh" are the states with
the highest total sales, and that "Indore" and "Mumbai" are the cities with the highest
total sales. These charts rank by sales amount (Amount), not by profit.
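
The charts in this section rank locations by `Amount`. Ranking by `Profit` is a separate aggregation and can give a different order. The sketch below illustrates this on made-up rows that reuse the notebook's column names; the state labels, values, and the `total_sales` / `total_profit` names are hypothetical.

```python
import pandas as pd

# Toy rows using the notebook's column names (values are made up).
df = pd.DataFrame({
    "State": ["A", "A", "B", "B", "C"],
    "Amount": [500, 300, 400, 100, 600],
    "Profit": [-50, 20, 90, 30, 10],
})

# Named aggregation: one row per state with both totals.
by_state = (
    df.groupby("State")
      .agg(total_sales=("Amount", "sum"), total_profit=("Profit", "sum"))
      .reset_index()
)

# The top state can differ depending on the metric used for ranking.
top_by_sales = by_state.sort_values("total_sales", ascending=False)["State"].iloc[0]
top_by_profit = by_state.sort_values("total_profit", ascending=False)["State"].iloc[0]
print(top_by_sales, top_by_profit)
```

In this toy data the state with the most sales is not the one with the most profit, which is why the two metrics are worth reporting separately.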

Which months have the most sales?

monthly_sales = merged_df.groupby("Month")["Amount"].sum().reset_index().sort_values(by="Amount", ascending=False)

plt.figure(figsize=(14, 6))
month_names = ['January', 'February', 'March', 'April', 'May', 'June',
'July', 'August', 'September', 'October', 'November', 'December']
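# Fix the bar order to calendar months 1-12 so the tick labels below line up with the bars
sns.barplot(x="Month", y="Amount", data=monthly_sales, order=list(range(1, 13)))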
plt.title("Sales amount by month")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.xticks(ticks=range(12),labels=month_names)
plt.show()

Observation: January has the highest sales, followed by August and October.
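
The month chart labels its bars by position, so the labels only match the data when all twelve months are present and in calendar order. A minimal sketch of one way to make that explicit is shown below, using made-up totals: the series is reindexed to months 1 through 12 and the numbers are mapped to names with the standard-library `calendar` module (names are English under the default locale).

```python
import calendar
import pandas as pd

# Toy monthly totals with most months missing (values are made up).
monthly_sales = pd.DataFrame({"Month": [8, 1, 10], "Amount": [300, 500, 200]})

# Reindex to all 12 months so missing months appear as 0 and rows
# stay in calendar order regardless of how the frame was sorted.
full_year = (
    monthly_sales.set_index("Month")["Amount"]
    .reindex(range(1, 13), fill_value=0)
)

# calendar.month_name[1] is "January", ..., calendar.month_name[12] is "December".
full_year.index = [calendar.month_name[m] for m in full_year.index]
print(full_year)
```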
