PDF Notebook
In [2]: df = pd.read_csv('aerofit.csv')  # assumes `import pandas as pd` ran in an earlier cell not shown here
In [3]: df.head(10)
Out[3]:
Product Age Gender Education MaritalStatus Usage Fitness Income Miles
In [4]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Product 180 non-null object
1 Age 180 non-null int64
2 Gender 180 non-null object
3 Education 180 non-null int64
4 MaritalStatus 180 non-null object
5 Usage 180 non-null int64
6 Fitness 180 non-null int64
7 Income 180 non-null int64
8 Miles 180 non-null int64
dtypes: int64(6), object(3)
memory usage: 12.8+ KB
In [5]: df.isnull().sum()
Out[5]: Product 0
Age 0
Gender 0
Education 0
MaritalStatus 0
Usage 0
Fitness 0
Income 0
Miles 0
dtype: int64
In [6]: df['Product'].value_counts()
Out[6]: KP281 80
KP481 60
KP781 40
Name: Product, dtype: int64
In [12]: df.groupby('Product')['Education'].median()
Out[12]: Product
KP281 16.0
KP481 16.0
KP781 18.0
Name: Education, dtype: float64
In [13]: df.groupby('Product')['Education'].describe()
Out[13]:
count mean std min 25% 50% 75% max
Product
Out[19]: <AxesSubplot:>
In [ ]: # If 200 units of KP781 are sold, who will buy more of them, and how many?
In [ ]: # If 2300 units of KP281 are sold, who will buy more, married or unmarried customers, and by what percentage?
Out[22]:
Product KP281 KP481 KP781
Gender
Female 40 29 7
Male 40 31 33
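The input cell that produced Out[22] did not survive the export. A table of this shape is what `pd.crosstab` returns, so the following is a minimal sketch of how it could be reproduced, not necessarily the call the notebook used. The `toy` frame is a hypothetical stand-in rebuilt from the counts printed above, since the rows of aerofit.csv are not shown here; with the real data the call would take `df['Gender']` and `df['Product']` instead.

```python
import pandas as pd

# Counts copied from Out[22]; the raw aerofit.csv rows are not reproduced here.
counts = {
    ("Female", "KP281"): 40, ("Female", "KP481"): 29, ("Female", "KP781"): 7,
    ("Male", "KP281"): 40, ("Male", "KP481"): 31, ("Male", "KP781"): 33,
}

# Expand the counts into one row per customer (hypothetical stand-in for df).
rows = [(gender, product) for (gender, product), n in counts.items() for _ in range(n)]
toy = pd.DataFrame(rows, columns=["Gender", "Product"])

# Contingency table: Gender as rows, Product as columns, cell = number of customers.
gender_by_product = pd.crosstab(toy["Gender"], toy["Product"])
print(gender_by_product)
```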
Out[25]:
Product KP281 KP481 KP781
Gender
In [ ]: # If 1200 units of KP281 are sold, how many are bought by females? Females are 40 of the 80 KP281 buyers (50%), so 0.5 x 1200 = 600
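The arithmetic in the comment above is a conditional probability, P(Female | KP281), scaled by the number of units sold. A minimal sketch of that calculation follows, assuming a column-normalized crosstab is an acceptable way to get the conditional shares. The `toy` frame is a hypothetical stand-in rebuilt from the printed gender-by-product counts, and `units_sold` is simply the 1200 from the question.

```python
import pandas as pd

# Counts copied from the gender-by-product table in this notebook.
counts = {
    ("Female", "KP281"): 40, ("Female", "KP481"): 29, ("Female", "KP781"): 7,
    ("Male", "KP281"): 40, ("Male", "KP481"): 31, ("Male", "KP781"): 33,
}
rows = [(gender, product) for (gender, product), n in counts.items() for _ in range(n)]
toy = pd.DataFrame(rows, columns=["Gender", "Product"])  # hypothetical stand-in for df

# normalize="columns" divides each cell by its column total,
# so each column holds P(Gender | Product) and sums to 1.
cond = pd.crosstab(toy["Gender"], toy["Product"], normalize="columns")
p_female_given_kp281 = cond.loc["Female", "KP281"]  # 40 / 80

units_sold = 1200  # figure taken from the question
expected_female_buyers = units_sold * p_female_given_kp281
print(expected_female_buyers)  # 600.0
```

This treats the sample proportions as if they carried over unchanged to future sales, which is the same assumption the comment makes.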
Out[26]:
Product KP281 KP481 KP781 All
Gender
Female 40 29 7 76
Male 40 31 33 104
All 80 60 40 180
In [ ]: # Across the whole dataset, what share of customers are males who bought KP481? Joint probability: 31/180 = 17.2%
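The 17.2% above is a joint probability: the Male/KP481 cell divided by the grand total of 180. One way to get every joint probability at once is a crosstab normalized over all cells. This is a sketch under the assumption that the table in Out[28] was built this way, which the export does not confirm. As before, `toy` is a hypothetical stand-in rebuilt from the printed counts.

```python
import pandas as pd

# Counts copied from the gender-by-product table in this notebook.
counts = {
    ("Female", "KP281"): 40, ("Female", "KP481"): 29, ("Female", "KP781"): 7,
    ("Male", "KP281"): 40, ("Male", "KP481"): 31, ("Male", "KP781"): 33,
}
rows = [(gender, product) for (gender, product), n in counts.items() for _ in range(n)]
toy = pd.DataFrame(rows, columns=["Gender", "Product"])  # hypothetical stand-in for df

# normalize="all" divides every cell by the grand total (joint probabilities);
# margins=True adds the "All" row and column holding the marginal probabilities.
joint = pd.crosstab(toy["Gender"], toy["Product"], margins=True, normalize="all")

p_male_and_kp481 = joint.loc["Male", "KP481"]  # 31 / 180
print(f"P(Male and KP481) = {p_male_and_kp481:.1%}")  # 17.2%
```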
Out[28]:
Product KP281 KP481 KP781 All
Gender
In [ ]: # Insights