
Python

This notebook imports pandas and reads an Excel file of e-commerce sales data into a DataFrame. It displays the head and tail of the DataFrame, reports its shape and data types, checks for duplicates and missing values, and calls drop() on an unneeded column. The DataFrame holds 34,954 rows of sales data with fields such as product SKU, site, year, quantity and price. A second part loads a weekly retail sales file and prepares regressors with statsmodels' add_constant.

Uploaded by

Abhishek Das
Copyright
© All Rights Reserved

12/30/22, 12:35PM Untitled35.ipynb - Colaboratory

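import pandas as pd
import numpy as np
# Load the e-commerce sales workbook from the mounted Google Drive folder
ECom_Sales = pd.read_excel(r"/content/drive/MyDrive/Python programming/Sales.xlsx")
display(ECom_Sales)  # display() is provided by the Jupyter/Colab environment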

ParentSKU Site_Id Year Month Product_Category Unit Quantity Price
0 F0033 AHMEDABAD 2021 4 0 NO 1621.0 54
1 F0033 AHMEDABAD 2021 5 0 NO 651.0 51
2 F0033 AHMEDABAD 2021 6 0 NO 457.0 46
3 F0033 AHMEDABAD 2021 7 0 NO 1985.0 41
4 F0033 AHMEDABAD 2021 8 0 NO 6.0 38
... ... ... ... ... ... ... ... ...
34949 T0270 VIJAYAWADA 2019 9 1 NO 1008.0 40
34950 T0270 VIJAYAWADA 2019 10 1 NO 1179.0 40
34951 T0270 VIJAYAWADA 2019 12 1 NO 81.0 40
34952 T0270 VIJAYAWADA 2020 1 1 NO 580.0 40
34953 T0270 VIJAYAWADA 2020 2 1 NO 180.0 40

34954 rows × 19 columns

ECom_Sales.head(10)

ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP Pa
0 F0033 AHMEDABAD 2021 4 0 NO 1621.0 54 82557 20 93640 80.0
1 F0033 AHMEDABAD 2021 5 0 NO 651.0 51 31473 21 35760 80.0
2 F0033 AHMEDABAD 2021 6 0 NO 457.0 46 19580 12 22293 70.0
3 F0033 AHMEDABAD 2021 7 0 NO 1985.0 41 78144 0 78144 70.0
4 F0033 AHMEDABAD 2021 8 0 NO 6.0 38 228 0 228 70.0
5 F0033 AHMEDABAD 2021 9 0 NO 1440.0 43 57795 0 57795 70.0
6 F0033 AHMEDABAD 2021 11 0 NO 1522.0 46 64639 0 64639 70.0
7 F0033 AHMEDABAD 2021 12 0 NO 3168.0 45 136603 0 136603 70.0
8 F0033 AHMEDABAD 2018 1 0 NO 1584.0 45 68425 0 68425 70.0
9 F0033 AHMEDABAD 2018 2 0 NO 5981.0 46 258791 0 258791 70.0

ECom_Sales.tail(10)

https://colab.research.google.com/drive/1Y04qDxqyKa-fKrl6WhkGnWtTuMJlOBiS?usp=sharing#scrollTo=OCU9fW7esLlY&printMode=true 1/7

ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP
34944 T0270 VIJAYAWADA 2019 1 1 NO 225.0 34 7369 0 7369 50.0
34945 T0270 VIJAYAWADA 2019 2 1 NO 75.0 34 2456 0 2456 50.0
34946 T0270 VIJAYAWADA 2019 6 1 NO 2820.0 33 91630 0 91630 50.0
34947 T0270 VIJAYAWADA 2019 7 1 NO 816.0 40 32070 0 32070 60.0
34948 T0270 VIJAYAWADA 2019 8 1 NO 228.0 40 8667 0 8667 60.0
34949 T0270 VIJAYAWADA 2019 9 1 NO 1008.0 40 38321 0 38321 60.0
34950 T0270 VIJAYAWADA 2019 10 1 NO 1179.0 40 44215 0 44215 60.0
34951 T0270 VIJAYAWADA 2019 12 1 NO 81.0 40 2947 0 2947 60.0
34952 T0270 VIJAYAWADA 2020 1 1 NO 580.0 40 21125 0 21125 60.0
34953 T0270 VIJAYAWADA 2020 2 1 NO 180.0 40 6550 0 6550 60.0

len(ECom_Sales.index)

34954

len(ECom_Sales.columns)

19

ECom_Sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34954 entries, 0 to 34953
Data columns (total 19 columns):
# Column Non-Null Count Dtype
0 ParentSKU 34954 non-null object
1 Site_Id 34954 non-null object
2 Year 34954 non-null int64
3 Month 34954 non-null int64
4 Product_Category 34954 non-null int64
5 Unit 34954 non-null object
6 Quantity 34954 non-null float64
7 Price 34954 non-null int64
8 Net_Sales 34954 non-null int64
9 Cash_Discount 34954 non-null int64
10 Customer_Amount 34954 non-null int64
11 MRP 34954 non-null float64
12 Pack_Size 34954 non-null float64
13 Pack_Unit_Id 34954 non-null object
14 State 34954 non-null object
15 Zone 34954 non-null object
16 Master_Category 34954 non-null int64
17 Size 34954 non-null object
18 Colour_Specification 34954 non-null object
dtypes: float64(3), int64(8), object(8)
memory usage: 5.1+ MB
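Both counts can also be read from `.shape` in a single call. A minimal sketch on a hypothetical three-row stand-in frame (for the real ECom_Sales the tuple would be (34954, 19), matching the len() outputs above):

```python
import pandas as pd

# Hypothetical three-row stand-in for ECom_Sales, using values from the head() output above
df = pd.DataFrame({
    "ParentSKU": ["F0033", "F0033", "F0033"],
    "Site_Id": ["AHMEDABAD", "AHMEDABAD", "AHMEDABAD"],
    "Year": [2021, 2021, 2021],
    "Quantity": [1621.0, 651.0, 457.0],
})

n_rows, n_cols = df.shape  # .shape is a (rows, columns) tuple
print(n_rows, n_cols)      # 3 4

# The same numbers the notebook obtains from len() on the index and on the columns
assert n_rows == len(df.index)
assert n_cols == len(df.columns)
```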

ECom_Sales.duplicated()

0 False
1 False
2 False
3 False
4 False
...
34949 False
34950 False
34951 False
34952 False
34953 False
Length: 34954, dtype: bool
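The printed boolean Series shows only its first and last five entries, so it cannot reveal whether any duplicates exist elsewhere. Summing it counts them directly, and `keep=False` lists every copy including the first. A minimal sketch on hypothetical toy rows, where row 2 is a deliberate copy of row 0:

```python
import pandas as pd

# Hypothetical toy rows: row 2 repeats row 0 exactly
df = pd.DataFrame({
    "ParentSKU": ["F0033", "F0033", "F0033", "T0270"],
    "Month": [4, 5, 4, 1],
    "Quantity": [1621.0, 651.0, 1621.0, 225.0],
})

dup_mask = df.duplicated()                   # True for each repeat after the first occurrence
n_dups = int(dup_mask.sum())                 # number of redundant rows
all_copies = df[df.duplicated(keep=False)]   # every row that has an identical twin
print(n_dups)       # 1
print(all_copies)   # rows 0 and 2
```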

ECom_Sales.drop(columns=["Colour_Specification"])  # returns a new DataFrame; not assigned, so ECom_Sales keeps all 19 columns

ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP
0 F0033 AHMEDABAD 2021 4 0 NO 1621.0 54 82557 20 93640 80.0
1 F0033 AHMEDABAD 2021 5 0 NO 651.0 51 31473 21 35760 80.0
2 F0033 AHMEDABAD 2021 6 0 NO 457.0 46 19580 12 22293 70.0
3 F0033 AHMEDABAD 2021 7 0 NO 1985.0 41 78144 0 78144 70.0
4 F0033 AHMEDABAD 2021 8 0 NO 6.0 38 228 0 228 70.0
... ... ... ... ... ... ... ... ... ... ... ... ...
34949 T0270 VIJAYAWADA 2019 9 1 NO 1008.0 40 38321 0 38321 60.0
34950 T0270 VIJAYAWADA 2019 10 1 NO 1179.0 40 44215 0 44215 60.0
34951 T0270 VIJAYAWADA 2019 12 1 NO 81.0 40 2947 0 2947 60.0
34952 T0270 VIJAYAWADA 2020 1 1 NO 580.0 40 21125 0 21125 60.0
34953 T0270 VIJAYAWADA 2020 2 1 NO 180.0 40 6550 0 6550 60.0

34954 rows × 18 columns
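`drop()` returns a new DataFrame and leaves the original untouched, which is why the later dtypes and isna() outputs still list Colour_Specification and 19 columns. A minimal sketch with a hypothetical one-row frame (the value "x" is a placeholder, not real data) contrasts calling drop() with reassigning its result:

```python
import pandas as pd

# Hypothetical one-row frame; "x" is a placeholder, not a real Colour_Specification value
df = pd.DataFrame({"ParentSKU": ["F0033"], "Colour_Specification": ["x"], "Quantity": [1621.0]})

trimmed = df.drop(columns=["Colour_Specification"])  # new DataFrame without the column
assert "Colour_Specification" in df.columns          # the original still has it

df = df.drop(columns=["Colour_Specification"])       # reassigning makes the change stick
print(list(df.columns))  # ['ParentSKU', 'Quantity']
```

Passing `inplace=True` also modifies the frame, but reassignment keeps each step explicit and is the style often recommended for current pandas.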

ECom_Sales.dtypes


ParentSKU object
Site_Id object
Year int64
Month int64
Product_Category int64
Unit object
Quantity float64
Price int64
Net_Sales int64
Cash_Discount int64
Customer_Amount int64
MRP float64
Pack_Size float64
Pack_Unit_Id object
State object
Zone object
Master_Category int64
Size object
Colour_Specification object
dtype: object

ECom_Sales.isna()

ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP P
0 False False False False False False False False False False False False
1 False False False False False False False False False False False False
2 False False False False False False False False False False False False
3 False False False False False False False False False False False False
4 False False False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ...
34949 False False False False False False False False False False False False
34950 False False False False False False False False False False False False
34951 False False False False False False False False False False False False
34952 False False False False False False False False False False False False
34953 False False False False False False False False False False False False

34954 rows × 19 columns

ECom_Sales.isna().sum() / ECom_Sales.shape[0]  # fraction of missing values in each column

ParentSKU 0.0
Site_Id 0.0
Year 0.0
Month 0.0
Product_Category 0.0
Unit 0.0
Quantity 0.0
Price 0.0
Net_Sales 0.0
Cash_Discount 0.0
Customer_Amount 0.0
MRP 0.0
Pack_Size 0.0
Pack_Unit_Id 0.0
State 0.0
Zone 0.0
Master_Category 0.0
Size 0.0
Colour_Specification 0.0
dtype: float64
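Dividing the count of missing values by the number of rows is the same as taking the mean of each boolean column, so `isna().mean()` gives these fractions in one step. A minimal sketch on a hypothetical frame with deliberately missing entries (the real ECom_Sales has none, as the all-zero output above shows):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; the real ECom_Sales has no missing values
df = pd.DataFrame({
    "Quantity": [1621.0, np.nan, 457.0, 6.0],
    "Site_Id": ["AHMEDABAD", None, None, "AHMEDABAD"],
})

frac_a = df.isna().sum() / df.shape[0]  # the notebook's formulation
frac_b = df.isna().mean()               # mean of a boolean column = share of True values
print(frac_b)                           # Quantity -> 0.25, Site_Id -> 0.5

assert (frac_a == frac_b).all()
```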

ECom_Sales.duplicated()

0 False
1 False
2 False
3 False
4 False
...
34949 False
34950 False
34951 False
34952 False
34953 False
Length: 34954, dtype: bool


ECom_Sales.drop_duplicates()  # returns a de-duplicated copy; not assigned, so ECom_Sales is unchanged

ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP
0 F0033 AHMEDABAD 2021 4 0 NO 1621.0 54 82557 20 93640 80.0
1 F0033 AHMEDABAD 2021 5 0 NO 651.0 51 31473 21 35760 80.0
2 F0033 AHMEDABAD 2021 6 0 NO 457.0 46 19580 12 22293 70.0
3 F0033 AHMEDABAD 2021 7 0 NO 1985.0 41 78144 0 78144 70.0
4 F0033 AHMEDABAD 2021 8 0 NO 6.0 38 228 0 228 70.0
... ... ... ... ... ... ... ... ... ... ... ... ...
34949 T0270 VIJAYAWADA 2019 9 1 NO 1008.0 40 38321 0 38321 60.0
34950 T0270 VIJAYAWADA 2019 10 1 NO 1179.0 40 44215 0 44215 60.0
34951 T0270 VIJAYAWADA 2019 12 1 NO 81.0 40 2947 0 2947 60.0
34952 T0270 VIJAYAWADA 2020 1 1 NO 580.0 40 21125 0 21125 60.0
34953 T0270 VIJAYAWADA 2020 2 1 NO 180.0 40 6550 0 6550 60.0

34950 rows × 19 columns
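The row count above (34950 versus 34954) implies that drop_duplicates() found four fully repeated rows, even though the truncated duplicated() output showed only False values. As with drop(), the result has to be reassigned to persist. A minimal sketch on hypothetical toy rows, which also shows a key-based check; the choice of ParentSKU, Site_Id, Year and Month as the record key is an assumption, not something the notebook states:

```python
import pandas as pd

# Hypothetical toy rows: row 1 repeats row 0 exactly; row 2 shares its key but not its Quantity
df = pd.DataFrame({
    "ParentSKU": ["F0033", "F0033", "F0033", "T0270"],
    "Site_Id": ["AHMEDABAD", "AHMEDABAD", "AHMEDABAD", "VIJAYAWADA"],
    "Year": [2021, 2021, 2021, 2019],
    "Month": [4, 4, 4, 1],
    "Quantity": [1621.0, 1621.0, 700.0, 225.0],
})

before = len(df)
df = df.drop_duplicates()   # keeps the first of each group of identical rows; reassigned to persist
removed = before - len(df)
print(removed)              # 1

# Assumed record key: one row per SKU, site, year and month
key_cols = ["ParentSKU", "Site_Id", "Year", "Month"]
key_dups = int(df.duplicated(subset=key_cols).sum())
print(key_dups)             # 1 (rows 0 and 2 share a key but differ in Quantity)
```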

# Numeric columns only (the default)
ECom_Sales.describe()

Year Month Product_Category Quantity Price Net_Sales Cash_Discount Customer_Amount
count 34954.000000 34954.000000 34954.000000 34954.000000 34954.000000 3.495400e+04 34954.000000 3.495400e+04 34954.0
mean 2019.201779 6.592293 5.649682 648.388627 170.610173 6.821097e+04 0.828003 6.882714e+04 267.8
std 1.153980 3.480937 2.305392 1692.646968 151.154044 1.311820e+05 7.367032 1.316198e+05 229.2
min 2018.000000 1.000000 0.000000 1.000000 27.000000 2.800000e+01 0.000000 2.800000e+01 33.8
25% 2018.000000 4.000000 5.000000 113.000000 95.000000 1.524125e+04 0.000000 1.536900e+04 153.3
50% 2019.000000 7.000000 7.000000 275.000000 120.000000 3.379600e+04 0.000000 3.413150e+04 190.0
75% 2020.000000 10.000000 7.000000 616.000000 178.000000 7.258750e+04 0.000000 7.341675e+04 295.0
max 2021.000000 12.000000 8.000000 52226.000000 1023.000000 5.448294e+06 325.000000 5.448294e+06 1500.0

# Include categorical (object) columns as well
ECom_Sales.describe(include='all')

ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Disco
count 34954 34954 34954.000000 34954.000000 34954.000000 34954 34954.000000 34954.000000 3.495400e+04 34954.000
unique 52 25 NaN NaN NaN 2 NaN NaN NaN
top F0099 DELMDK NaN NaN NaN NO NaN NaN NaN
freq 803 1607 NaN NaN NaN 34953 NaN NaN NaN
mean NaN NaN 2019.201779 6.592293 5.649682 NaN 648.388627 170.610173 6.821097e+04 0.828
std NaN NaN 1.153980 3.480937 2.305392 NaN 1692.646968 151.154044 1.311820e+05 7.367
min NaN NaN 2018.000000 1.000000 0.000000 NaN 1.000000 27.000000 2.800000e+01 0.000
25% NaN NaN 2018.000000 4.000000 5.000000 NaN 113.000000 95.000000 1.524125e+04 0.000
50% NaN NaN 2019.000000 7.000000 7.000000 NaN 275.000000 120.000000 3.379600e+04 0.000
75% NaN NaN 2020.000000 10.000000 7.000000 NaN 616.000000 178.000000 7.258750e+04 0.000
max NaN NaN 2021.000000 12.000000 8.000000 NaN 52226.000000 1023.000000 5.448294e+06 325.000
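`describe(include='all')` mixes numeric and categorical statistics in one wide table that the printout truncates. Excluding numeric columns keeps only count, unique, top and freq, and `value_counts()` gives the full frequency table for a single column; that is useful for Unit, which the table above shows has 2 unique values with "NO" in 34953 rows. A minimal sketch on hypothetical rows; "OTHER" is a placeholder because the second Unit value is not visible in the output:

```python
import pandas as pd

# Hypothetical rows; "OTHER" stands in for the Unit value not shown in the notebook
df = pd.DataFrame({
    "Unit": ["NO", "NO", "NO", "OTHER"],
    "Site_Id": ["AHMEDABAD", "DELMDK", "DELMDK", "VIJAYAWADA"],
    "Quantity": [1621.0, 651.0, 457.0, 6.0],
})

cat_summary = df.describe(exclude="number")  # count / unique / top / freq for non-numeric columns
unit_counts = df["Unit"].value_counts()      # every distinct value with its frequency
print(cat_summary)
print(unit_counts)
```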

ECom_Sales.plot(x='Quantity', y='Price', kind='scatter')


<matplotlib.axes._subplots.AxesSubplot at 0x7f5b782d8340>
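A scatter plot shows the shape of the Quantity/Price relationship; a correlation coefficient summarizes it in one number. Because Quantity is strongly right-skewed (median 275, maximum 52226 in the describe() output), a rank-based Spearman correlation may be more informative than Pearson's. A minimal sketch using the five Quantity/Price pairs from the head() output; Spearman is computed here as the Pearson correlation of the ranks, which is its definition:

```python
import pandas as pd

# The first five (Quantity, Price) pairs from the head() output above
df = pd.DataFrame({
    "Quantity": [1621.0, 651.0, 457.0, 1985.0, 6.0],
    "Price": [54, 51, 46, 41, 38],
})

pearson = df["Quantity"].corr(df["Price"])                 # linear association
spearman = df["Quantity"].rank().corr(df["Price"].rank())  # Pearson on ranks = Spearman
print(round(pearson, 3), round(spearman, 3))
```

Five rows are far too few to draw conclusions; on the full ECom_Sales the same two calls would run over all 34954 rows.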

ECom_Sales.groupby('Product_Category').Quantity.sum().plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x7f5b780e1ac0>
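The bar chart above sums Quantity per Product_Category. Computing the aggregate as a Series first makes the totals inspectable and lets them be sorted before plotting. A minimal sketch on hypothetical toy rows (the category codes and quantities are illustrative only):

```python
import pandas as pd

# Hypothetical toy rows; category codes and quantities are illustrative only
df = pd.DataFrame({
    "Product_Category": [0, 0, 1, 7, 7, 7],
    "Quantity": [1621.0, 651.0, 225.0, 100.0, 50.0, 25.0],
})

qty_by_cat = (
    df.groupby("Product_Category")["Quantity"]
      .sum()
      .sort_values(ascending=False)  # largest totals first
)
print(qty_by_cat)  # 0 -> 2272.0, 1 -> 225.0, 7 -> 175.0

# With matplotlib installed, the sorted totals can be drawn as before:
# qty_by_cat.plot(kind="bar")
```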

import pandas as pd
import numpy as np
import statsmodels.api as sm
import pandas.util.testing as tm  # deprecated and unused here; this import triggers the FutureWarning below
from sklearn.model_selection import train_test_split
# Load the weekly retail sales workbook
SMART = pd.read_excel(r"/content/drive/MyDrive/Python programming/Retail.xlsx")
display(SMART)

<ipython-input-39-9381febfad9d>:4: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.
import pandas.util.testing as tm
Store Date Weekly_Sales Temperature Fuel_Price CPI Unemployment
0 1 2022-02-05 1643690.90 42.31 2.572 211.096358 8.106
1 1 2022-02-12 1641957.44 38.51 2.548 211.242170 8.106
2 1 2022-02-19 1611968.17 39.93 2.514 211.289143 8.106
3 1 2022-02-26 1409727.59 46.63 2.561 211.319643 8.106
4 1 2022-03-05 1554806.68 46.50 2.625 211.350143 8.106
... ... ... ... ... ... ... ...
6430 45 2012-09-28 713173.95 64.88 3.997 192.013558 8.684
6431 45 2012-10-05 733455.07 64.89 3.985 192.170412 8.667
6432 45 2012-10-12 734464.36 54.47 4.000 192.327265 8.667
6433 45 2012-10-19 718125.53 56.47 3.969 192.330854 8.667
6434 45 2012-10-26 760281.43 58.85 3.882 192.308899 8.667

6435 rows × 7 columns

SMART=pd.read_excel(r"/content/drive/MyDrive/Python programming/Retail.xlsx")
display(SMART)


Store Date Weekly_Sales Temperature Fuel_Price CPI Unemployment
0 1 2022-02-05 1643690.90 42.31 2.572 211.096358 8.106
1 1 2022-02-12 1641957.44 38.51 2.548 211.242170 8.106
2 1 2022-02-19 1611968.17 39.93 2.514 211.289143 8.106
3 1 2022-02-26 1409727.59 46.63 2.561 211.319643 8.106
4 1 2022-03-05 1554806.68 46.50 2.625 211.350143 8.106
... ... ... ... ... ... ... ...
6430 45 2012-09-28 713173.95 64.88 3.997 192.013558 8.684
6431 45 2012-10-05 733455.07 64.89 3.985 192.170412 8.667
6432 45 2012-10-12 734464.36 54.47 4.000 192.327265 8.667
6433 45 2012-10-19 718125.53 56.47 3.969 192.330854 8.667
6434 45 2012-10-26 760281.43 58.85 3.882 192.308899 8.667

6435 rows × 7 columns

SMART.describe()

Store Weekly_Sales Temperature Fuel_Price CPI Unemployment
count 6435.000000 6.435000e+03 6435.000000 6435.000000 6435.000000 6435.000000
mean 23.000000 1.046965e+06 60.663782 3.358607 171.578394 7.999151
std 12.988182 5.643666e+05 18.444933 0.459020 39.356712 1.875885
min 1.000000 2.099862e+05 -2.060000 2.472000 126.064000 3.879000
25% 12.000000 5.533501e+05 47.460000 2.933000 131.735000 6.891000
50% 23.000000 9.607460e+05 62.670000 3.445000 182.616521 7.874000
75% 34.000000 1.420159e+06 74.940000 3.735000 212.743293 8.622000
max 45.000000 3.818686e+06 100.140000 4.468000 227.232807 14.313000

SMART.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6435 entries, 0 to 6434
Data columns (total 7 columns):
# Column Non-Null Count Dtype
0 Store 6435 non-null int64
1 Date 6435 non-null datetime64[ns]
2 Weekly_Sales 6435 non-null float64
3 Temperature 6435 non-null float64
4 Fuel_Price 6435 non-null float64
5 CPI 6435 non-null float64
6 Unemployment 6435 non-null float64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 352.0 KB

SMART.dtypes

Store int64
Date datetime64[ns]
Weekly_Sales float64
Temperature float64
Fuel_Price float64
CPI float64
Unemployment float64
dtype: object

# Independent variables
CP = SMART[['CPI', 'Unemployment', 'Fuel_Price']]
X = sm.add_constant(CP)  # prepends a column of ones named "const" for the intercept term
X.head(5)

/usr/local/lib/python3.8/dist-packages/statsmodels/tsa/tsatools.py:142: FutureWarning: In a future version of pandas all arguments
x = pd.concat(x[::order], 1)
const CPI Unemployment Fuel_Price
0 1.0 211.096358 8.106 2.572
1 1.0 211.242170 8.106 2.548
2 1.0 211.289143 8.106 2.514
3 1.0 211.319643 8.106 2.561
4 1.0 211.350143 8.106 2.625
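The printout ends after building X, before any model is fitted. With statsmodels the usual next step would be `sm.OLS(y, X).fit()`; the target y is not shown in the notebook, and Weekly_Sales is only an assumed candidate. The sketch below reproduces what add_constant plus ordinary least squares compute using NumPy alone, on hypothetical synthetic regressors in the ranges reported by SMART.describe() and a noise-free synthetic target, so the known coefficients should be recovered:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50
# Hypothetical regressors within the ranges reported by SMART.describe(); not the real data
CP = pd.DataFrame({
    "CPI": rng.uniform(126, 228, n),
    "Unemployment": rng.uniform(3.9, 14.3, n),
    "Fuel_Price": rng.uniform(2.5, 4.5, n),
})
# Synthetic, noise-free target with known coefficients
y = 1000 + 2.0 * CP["CPI"] - 50.0 * CP["Unemployment"] + 300.0 * CP["Fuel_Price"]

# Same layout as sm.add_constant(CP): a leading column of ones named "const"
X = CP.copy()
X.insert(0, "const", 1.0)

# Ordinary least squares: choose b to minimise ||y - X b||^2
coef, *_ = np.linalg.lstsq(X.to_numpy(), y.to_numpy(), rcond=None)
params = pd.Series(coef, index=X.columns)
print(params.round(4))
```

With real data, statsmodels' fitted results would add standard errors and p-values on top of these point estimates; the NumPy version only shows the estimation step.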
