Python
import pandas as pd
import numpy as np
ECom_Sales=pd.read_excel(r"/content/drive/MyDrive/Python programming/Sales.xlsx")
display(ECom_Sales)
ECom_Sales.head(10)
ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP Pa
ECom_Sales.tail(10)
https://colab.research.google.com/drive/1Y04qDxqyKa-fKrl6WhkGnWtTuMJlOBiS?usp=sharing#scrollTo=OCU9fW7esLlY&printMode=true 1/7
12/30/22, 12:35PM Untitled35.ipynb - Colaboratory
ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP
34944 T0270 VIJAYAWADA 2019 1 1 NO 225.0 34 7369 0 7369 50.0
34945 T0270 VIJAYAWADA 2019 2 1 NO 75.0 34 2456 0 2456 50.0
...
34953 T0270 VIJAYAWADA 2020 2 1 NO 180.0 40 6550 0 6550 60.0
len(ECom_Sales.index)
34954
ECom_Sales.info()
0 ParentSKU 34954 non-null object
1 Site_Id 34954 non-null object
2 Year 34954 non-null int64
3 Month 34954 non-null int64
4 Product_Category 34954 non-null int64
5 Unit 34954 non-null object
6 Quantity 34954 non-null float64
7 Price 34954 non-null int64
8 Net_Sales 34954 non-null int64
9 Cash_Discount 34954 non-null int64
10 Customer_Amount 34954 non-null int64
11 MRP 34954 non-null float64
12 Pack_Size 34954 non-null float64
13 Pack_Unit_Id 34954 non-null object
14 State 34954 non-null object
15 Zone 34954 non-null object
16 Master_Category 34954 non-null int64
17 Size 34954 non-null object
18 Colour_Specification 34954 non-null object
dtypes: float64(3), int64(8), object(8)
memory usage: 5.1+ MB
ECom_Sales.duplicated()
0 False
1 False
2 False
3 False
4 False
...
34949 False
34950 False
34951 False
34952 False
34953 False
Length: 34954, dtype: bool
# drop() returns a new DataFrame; ECom_Sales keeps the column unless the result is assigned
ECom_Sales.drop(columns=["Colour_Specification"])
ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP
... ... ... ... ... ... ... ... ... ... ... ... ...
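`DataFrame.drop` returns a new frame by default, which is why `Colour_Specification` still appears in the `dtypes` listing that follows. A minimal sketch of keeping the result, using a small made-up frame (the names `toy` and `trimmed` and all values are illustrative, not taken from Sales.xlsx):

```python
import pandas as pd

# Toy stand-in for ECom_Sales; the values are invented for illustration.
toy = pd.DataFrame({
    "ParentSKU": ["A1", "B2"],
    "Quantity": [10.0, 5.0],
    "Colour_Specification": ["RED", "BLUE"],
})

# drop() leaves `toy` untouched and returns a new DataFrame.
trimmed = toy.drop(columns=["Colour_Specification"])

print(list(toy.columns))      # still contains Colour_Specification
print(list(trimmed.columns))  # column removed only in the assigned result
```

Assigning back to the same name (`toy = toy.drop(...)`) would make the change stick for later cells.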
ECom_Sales.dtypes
ParentSKU object
Site_Id object
Year int64
Month int64
Product_Category int64
Unit object
Quantity float64
Price int64
Net_Sales int64
Cash_Discount int64
Customer_Amount int64
MRP float64
Pack_Size float64
Pack_Unit_Id object
State object
Zone object
Master_Category int64
Size object
Colour_Specification object
dtype: object
ECom_Sales.isna()
ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP P
0 False False False False False False False False False False False False
1 False False False False False False False False False False False False
2 False False False False False False False False False False False False
3 False False False False False False False False False False False False
4 False False False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ...
34949 False False False False False False False False False False False False
34950 False False False False False False False False False False False False
34951 False False False False False False False False False False False False
34952 False False False False False False False False False False False False
34953 False False False False False False False False False False False False
ECom_Sales.isna().sum()/ECom_Sales.shape[0]
ParentSKU 0.0
Site_Id 0.0
Year 0.0
Month 0.0
Product_Category 0.0
Unit 0.0
Quantity 0.0
Price 0.0
Net_Sales 0.0
Cash_Discount 0.0
Customer_Amount 0.0
MRP 0.0
Pack_Size 0.0
Pack_Unit_Id 0.0
State 0.0
Zone 0.0
Master_Category 0.0
Size 0.0
Colour_Specification 0.0
dtype: float64
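The ratio above is the per-column share of missing values. Because `isna()` yields booleans, taking `mean()` gives the same figure in one step. A small sketch on made-up data (the `toy` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Invented values with one missing entry per column.
toy = pd.DataFrame({
    "Price": [34, 40, np.nan, 27],
    "Unit": ["NO", None, "NO", "NO"],
})

share_long = toy.isna().sum() / toy.shape[0]  # form used in the notebook
share_short = toy.isna().mean()               # equivalent shorthand

print(share_short)
```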
ECom_Sales.duplicated()
0 False
1 False
2 False
3 False
4 False
...
34949 False
34950 False
34951 False
34952 False
34953 False
Length: 34954, dtype: bool
# drop_duplicates() returns a new DataFrame; assign the result to keep it
ECom_Sales.drop_duplicates()
ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Discount Customer_Amount MRP
... ... ... ... ... ... ... ... ... ... ... ... ...
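The Boolean Series printed by `duplicated()` is hard to scan at 34,954 rows; summing it gives a count directly. `drop_duplicates()` returns a new frame, so its result has to be assigned to take effect. A sketch on a made-up frame (`toy`, `n_dupes` and `deduped` are illustrative names):

```python
import pandas as pd

# Invented rows; the second row repeats the first exactly.
toy = pd.DataFrame({
    "ParentSKU": ["T0270", "T0270", "T0271"],
    "Year": [2019, 2019, 2019],
    "Net_Sales": [7369, 7369, 2456],
})

# Count rows that are identical to an earlier row.
n_dupes = int(toy.duplicated().sum())

# Keep the de-duplicated result and renumber the index.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes, len(toy), len(deduped))
```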
# Numeric (non-categorical) columns only
ECom_Sales.describe()
count 34954.000000 34954.000000 34954.000000 34954.000000 34954.000000 3.495400e+04 34954.000000 3.495400e+04 34954.0
mean 2019.201779 6.592293 5.649682 648.388627 170.610173 6.821097e+04 0.828003 6.882714e+04 267.8
std 1.153980 3.480937 2.305392 1692.646968 151.154044 1.311820e+05 7.367032 1.316198e+05 229.2
min 2018.000000 1.000000 0.000000 1.000000 27.000000 2.800000e+01 0.000000 2.800000e+01 33.8
25% 2018.000000 4.000000 5.000000 113.000000 95.000000 1.524125e+04 0.000000 1.536900e+04 153.3
50% 2019.000000 7.000000 7.000000 275.000000 120.000000 3.379600e+04 0.000000 3.413150e+04 190.0
75% 2020.000000 10.000000 7.000000 616.000000 178.000000 7.258750e+04 0.000000 7.341675e+04 295.0
max 2021.000000 12.000000 8.000000 52226.000000 1023.000000 5.448294e+06 325.000000 5.448294e+06 1500.0
# All columns, including categorical
ECom_Sales.describe(include='all')
ParentSKU Site_Id Year Month Product_Category Unit Quantity Price Net_Sales Cash_Disco
count 34954 34954 34954.000000 34954.000000 34954.000000 34954 34954.000000 34954.000000 3.495400e+04 34954.000
freq 803 1607 NaN NaN NaN 34953 NaN NaN NaN
mean NaN NaN 2019.201779 6.592293 5.649682 NaN 648.388627 170.610173 6.821097e+04 0.828
std NaN NaN 1.153980 3.480937 2.305392 NaN 1692.646968 151.154044 1.311820e+05 7.367
min NaN NaN 2018.000000 1.000000 0.000000 NaN 1.000000 27.000000 2.800000e+01 0.000
25% NaN NaN 2018.000000 4.000000 5.000000 NaN 113.000000 95.000000 1.524125e+04 0.000
50% NaN NaN 2019.000000 7.000000 7.000000 NaN 275.000000 120.000000 3.379600e+04 0.000
75% NaN NaN 2020.000000 10.000000 7.000000 NaN 616.000000 178.000000 7.258750e+04 0.000
max NaN NaN 2021.000000 12.000000 8.000000 NaN 52226.000000 1023.000000 5.448294e+06 325.000
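In the summaries above, `Product_Category` is stored as `int64`, so `describe()` reports a mean and quartiles for what appear to be category codes. If that reading is right (an assumption; the notebook does not say what the codes mean), casting the column to `category` makes `describe()` report the number of levels and the most frequent one instead. A sketch on invented values:

```python
import pandas as pd

# Invented rows; only the column names come from the notebook.
toy = pd.DataFrame({
    "Product_Category": [7, 7, 5, 1, 7],
    "Quantity": [225.0, 75.0, 180.0, 113.0, 275.0],
})

# Treat the integer codes as labels rather than quantities.
toy["Product_Category"] = toy["Product_Category"].astype("category")

# For a categorical column describe() returns count, unique, top and freq.
summary = toy["Product_Category"].describe()
print(summary)
```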
ECom_Sales.plot(x='Quantity',y='Price',kind = 'scatter')
<matplotlib.axes._subplots.AxesSubplot at 0x7f5b782d8340>
ECom_Sales.groupby('Product_Category').Quantity.sum().plot(kind='bar')
<matplotlib.axes._subplots.AxesSubplot at 0x7f5b780e1ac0>
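The bar chart above is built from a per-category total. Inspecting that intermediate Series before plotting makes it easier to check the numbers behind the bars. A sketch with invented rows (only the column names come from the notebook; sorting the totals is an optional extra):

```python
import pandas as pd

# Invented rows standing in for ECom_Sales.
toy = pd.DataFrame({
    "Product_Category": [1, 1, 5, 7, 7, 7],
    "Quantity": [225.0, 75.0, 180.0, 113.0, 275.0, 616.0],
})

# Total quantity per category, largest first.
totals = (
    toy.groupby("Product_Category")["Quantity"]
       .sum()
       .sort_values(ascending=False)
)
print(totals)

# totals.plot(kind="bar") would then draw the sorted bars (needs matplotlib).
```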
import pandas as pd
import numpy as np
import statsmodels.api as sm
import pandas.util.testing as tm  # deprecated and unused below; triggers the FutureWarning shown in the output
from sklearn.model_selection import train_test_split
SMART=pd.read_excel(r"/content/drive/MyDrive/Python programming/Retail.xlsx")
display(SMART)
<ipython-input-39-9381febfad9d>:4: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.
import pandas.util.testing as tm
Store Date Weekly_Sales Temperature Fuel_Price CPI Unemployment
SMART=pd.read_excel(r"/content/drive/MyDrive/Python programming/Retail.xlsx")
display(SMART)
6432 45 2012-10-12 734464.36 54.47 4.000 192.327265 8.667
6433 45 2012-10-19 718125.53 56.47 3.969 192.330854 8.667
6434 45 2012-10-26 760281.43 58.85 3.882 192.308899 8.667
6435 rows × 7 columns
SMART.describe()
Store Weekly_Sales Temperature Fuel_Price CPI Unemployment
count 6435.000000 6.435000e+03 6435.000000 6435.000000 6435.000000 6435.000000
mean 23.000000 1.046965e+06 60.663782 3.358607 171.578394 7.999151
std 12.988182 5.643666e+05 18.444933 0.459020 39.356712 1.875885
SMART.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6435 entries, 0 to 6434
Data columns (total 7 columns):
# Column Non-Null Count Dtype
SMART.dtypes
Store int64
Date datetime64[ns]
Weekly_Sales float64
Temperature float64
Fuel_Price float64
CPI float64
Unemployment float64
dtype: object
# Independent variables (predictors)
CP=pd.DataFrame(SMART,columns=['CPI','Unemployment','Fuel_Price'])
# Add an intercept column named 'const' for statsmodels
X=sm.add_constant(CP)
X.head(5)