Online Sales Data Analysis
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load
import os
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/online-sales-data/Details.csv
/kaggle/input/online-sales-data/Orders.csv
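The listing above shows the two input files. The cell that loads them is not shown, so the following is only a minimal sketch of how that step might look. The names details_df and orders_df match those used later in the notebook; the helper load_sales_data and the pairing of each file with each name are assumptions.

```python
from pathlib import Path

import pandas as pd


def load_sales_data(data_dir):
    """Read Details.csv and Orders.csv from data_dir (hypothetical helper)."""
    data_dir = Path(data_dir)
    # Assumption: Details.csv is loaded as details_df and Orders.csv as orders_df.
    details_df = pd.read_csv(data_dir / "Details.csv")
    orders_df = pd.read_csv(data_dir / "Orders.csv")
    return details_df, orders_df


# On Kaggle the call would presumably be:
# details_df, orders_df = load_sales_data("/kaggle/input/online-sales-data")
```

Keeping the directory as a parameter lets the same helper run outside Kaggle against a local copy of the files.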
PaymentMode
0 COD
1 EMI
2 EMI
3 Credit Card
4 Credit Card
# Details Dataset
print(details_df.info())
print(details_df.describe())
# Orders Dataset
print(orders_df.info())
print(orders_df.describe())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order ID 1500 non-null object
1 Amount 1500 non-null int64
2 Profit 1500 non-null int64
3 Quantity 1500 non-null int64
4 Category 1500 non-null object
5 Sub-Category 1500 non-null object
6 PaymentMode 1500 non-null object
dtypes: int64(3), object(4)
memory usage: 82.2+ KB
None
Amount Profit Quantity
count 1500.000000 1500.00000 1500.000000
mean 291.847333 24.64200 3.743333
std 461.924620 168.55881 2.184942
min 4.000000 -1981.00000 1.000000
25% 47.750000 -12.00000 2.000000
50% 122.000000 8.00000 3.000000
75% 326.250000 38.00000 5.000000
max 5729.000000 1864.00000 14.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order ID 500 non-null object
1 Order Date 500 non-null object
2 CustomerName 500 non-null object
3 State 500 non-null object
4 City 500 non-null object
dtypes: object(5)
memory usage: 19.7+ KB
None
Order ID Order Date CustomerName State City
count 500 500 500 500 500
unique 500 307 336 19 25
top B-26055 24-11-2018 Shreya Maharashtra Indore
freq 1 7 6 94 71
print(details_df.isnull().sum())
print(orders_df.isnull().sum())
Order ID 0
Amount 0
Profit 0
Quantity 0
Category 0
Sub-Category 0
PaymentMode 0
dtype: int64
Order ID 0
Order Date 0
CustomerName 0
State 0
City 0
dtype: int64
Neither dataset contains null values.
Category Amount
0 Clothing 144323
1 Electronics 166267
2 Furniture 127181
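The category totals above are the kind of result a group-by on the merged table produces. The merge cell itself is not shown, so this is a sketch under assumptions: that merged_df comes from joining the two frames on "Order ID", and that the totals come from summing Amount per Category. The rows below are made up for illustration and are not the real data.

```python
import pandas as pd

# Tiny made-up stand-ins for the two datasets (not the real data).
details_df = pd.DataFrame({
    "Order ID": ["B-1", "B-1", "B-2", "B-3"],
    "Amount": [100, 50, 200, 30],
    "Category": ["Clothing", "Electronics", "Electronics", "Furniture"],
})
orders_df = pd.DataFrame({
    "Order ID": ["B-1", "B-2", "B-3"],
    "State": ["Maharashtra", "Goa", "Maharashtra"],
})

# Assumption: the tables join on "Order ID". validate="many_to_one" raises
# an error if an Order ID appears more than once in orders_df.
merged_df = details_df.merge(
    orders_df, on="Order ID", how="inner", validate="many_to_one"
)

# Total sales amount per category.
category_sales = merged_df.groupby("Category")["Amount"].sum().reset_index()
print(category_sales)
```

The validate argument is optional, but it guards against accidentally duplicating line items when the key on the order side is not unique.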
plt.figure(figsize=(12,6))
sns.barplot(x="Sub-Category",y="Amount",data=most_sold_products)
plt.title("Total sales by products")
plt.xlabel("Products")
plt.ylabel("Total Sales Amount")
plt.show()
Observation: Printers are the top-selling product in the Electronics category.
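The cell that builds most_sold_products is not shown. One way such a ranking might be computed is sketched below, assuming "most sold" means the largest total Amount; ranking by Quantity would measure units sold instead and could give a different order. The helper top_subcategories and the sample rows are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Category": ["Electronics", "Electronics", "Electronics", "Clothing"],
    "Sub-Category": ["Printers", "Phones", "Printers", "Saree"],
    "Amount": [300, 250, 120, 90],
})


def top_subcategories(frame, category):
    """Total Amount per Sub-Category within one Category, largest first."""
    subset = frame[frame["Category"] == category]
    return (
        subset.groupby("Sub-Category")["Amount"]
        .sum()
        .reset_index()
        .sort_values(by="Amount", ascending=False)
        .reset_index(drop=True)
    )


electronics_sales = top_subcategories(df, "Electronics")
print(electronics_sales)
```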
# Sales by State
state_sales = (
    merged_df.groupby('State')['Amount']
    .sum()
    .reset_index()
    .sort_values(by='Amount', ascending=False)
)
plt.figure(figsize=(14, 6))
month_names = ['January', 'February', 'March', 'April', 'May', 'June',
'July', 'August', 'September', 'October', 'November', 'December']
sns.barplot(x="Month",y="Amount",data = monthly_sales)
plt.title("Sales amount by month")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.xticks(ticks=range(12),labels=month_names)
plt.show()
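The plot above relies on a monthly_sales table whose construction is not shown. A minimal sketch of one way to build it follows, assuming the month is taken from "Order Date" and that the dates use the day-month-year layout seen in the describe() output (for example 24-11-2018). The sample rows are made up.

```python
import pandas as pd

# Made-up rows; dates use the day-month-year layout seen in the data.
df = pd.DataFrame({
    "Order Date": ["24-11-2018", "05-01-2018", "17-01-2018"],
    "Amount": [100, 40, 60],
})

# Parse the text dates explicitly so that 05-01-2018 is read as 5 January.
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%d-%m-%Y")
df["Month"] = df["Order Date"].dt.month

# Sum per month, then reindex to 1..12 so that months without orders
# appear with a total of 0.
monthly_sales = (
    df.groupby("Month")["Amount"]
    .sum()
    .reindex(range(1, 13), fill_value=0)
    .rename_axis("Month")
    .reset_index()
)
print(monthly_sales)
```

Reindexing to all twelve months matters for the plot: plt.xticks(ticks=range(12), labels=month_names) only labels the bars correctly when every month has a row.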