Cardio Good Fitness Dataset
import pandas as pd
import matplotlib.pyplot as plt  # needed for the plt.* calls used below
import seaborn as sb
customer_data = pd.read_csv("CardioGoodFitness.csv")
In [3]: # Check that the data was loaded into the DataFrame correctly.
customer_data.head(5)
Out[3]: Product Age Gender Education MaritalStatus Usage Fitness Income Miles
customer_data.info()
<class 'pandas.core.frame.DataFrame'>
Observation:
1) There are 9 columns in total: 3 are categorical and the remaining 6 are numerical.
2) There are 180 rows in total, and no column contains null values.
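The two observations above can also be checked programmatically instead of being read off the info() output. The following is a minimal sketch on a tiny stand-in frame; the values are hypothetical and only the column names follow the dataset. It treats every non-numeric column as categorical, which is an assumption of this sketch.

```python
import pandas as pd

# Tiny stand-in frame (hypothetical values, column names taken from the dataset).
sample = pd.DataFrame({
    "Product": ["TM195", "TM498", "TM798"],
    "Age": [18, 25, 40],
    "Gender": ["Male", "Female", "Male"],
    "Income": [29562, 50028, 90886],
})

# Count missing values per column; all zeros means there are no nulls.
null_counts = sample.isnull().sum()

# Split the columns by dtype: numeric vs. everything else.
numerical_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(null_counts)
print("categorical:", categorical_cols)
print("numerical:", numerical_cols)
```

Running the same three calls on customer_data should reproduce the column split and the null check described in the observation.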
customer_data_copy = customer_data.copy()
In [7]: # We use this method to show statistical information about categorical as well as num
customer_data.describe(include='all') # does not carry much information about categorica
As seen above, the dataset has 9 columns in total: 6 are numerical and 3 are
categorical. Numerical data can be further classified as continuous or discrete. The
columns Education, Fitness and Usage are discrete in nature, so we convert them to the
category dtype using the astype() method.
for col in ['Education', 'Fitness', 'Usage']:
    customer_data[col] = customer_data[col].astype('category')
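To illustrate what the conversion changes, here is a small sketch on made-up values (the numbers are hypothetical; only the column names come from the dataset). After the conversion, describe() no longer summarizes the converted columns as numbers.

```python
import pandas as pd

# Hypothetical values; Education, Usage and Fitness are discrete, Miles is continuous.
sample = pd.DataFrame({
    "Education": [14, 16, 16, 18],
    "Usage": [3, 4, 3, 5],
    "Fitness": [3, 4, 3, 5],
    "Miles": [66, 94, 85, 160],
})

discrete_cols = ["Education", "Usage", "Fitness"]
for col in discrete_cols:
    sample[col] = sample[col].astype("category")

# describe() now reports only the columns that are still numeric.
numeric_summary = sample.describe().T

print(sample.dtypes)
print(numeric_summary.index.tolist())
```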
In [9]: # Describe the numerical data and transpose the result to make it easier to read.
customer_data.describe().T
Age.
Customer ages range from 18 to 50.
Income.
Incomes range from 30 k to 100 k.
Most of the population has an income between 44 k and 58 k.
The difference between the mean and the median is large, and the mean lies to the right of the median, so the data is
right-skewed.
Very high standard deviation.
Miles.
Miles range from 21 to 360.
Most of the population falls between 66 and 114 miles.
The mean lies to the right of the median, so the data is right-skewed.
Very high standard deviation.
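The skew argument used for Income and Miles (mean to the right of the median) can be checked numerically. This is a minimal sketch on a hypothetical sample, not the dataset itself; Series.skew() returns the sample skewness, where a positive value indicates a longer right tail.

```python
import pandas as pd

# Hypothetical right-skewed sample: a few large values pull the mean upward.
miles = pd.Series([40, 60, 66, 80, 94, 100, 114, 200, 360])

mean_value = miles.mean()
median_value = miles.median()
skewness = miles.skew()  # positive for a right-skewed sample

print("mean:", round(mean_value, 2))
print("median:", median_value)
print("mean > median:", mean_value > median_value)
print("skewness > 0:", skewness > 0)
```

On the real data the same check would be customer_data.Miles.skew() and customer_data.Income.skew().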
Visualization
In [10]: # Sample matplotlib example.
a = [1,2,5,6,7]
b = [1,2,3,4,5]
c = [9,8,3,4,5]
plt.plot(a,b)
plt.plot(a,c)
plt.xlabel('a Data')
plt.title('Test Data')
plt.show();
Univariate Analysis
In [11]: plt.hist(customer_data.Age)
plt.show();
plt.title('Histogram of age')
plt.legend(['age']);  # pass a list; a bare string is split into single characters
plt.title("Histogram")
plt.show()
Observation
localhost:8889/nbconvert/html/Great Learning/CardioGoodFitnessDataset.ipynb?download=false 5/27
8/24/22, 1:57 PM CardioGoodFitnessDataset
In [14]: customer_data.Product.unique()
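group_data = customer_data.groupby('Product')  # assumed definition; the cell that creates group_data is not shown
TM195_data = group_data.get_group('TM195')
TM498_data = group_data.get_group('TM498')
TM798_data = group_data.get_group('TM798')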
plt.subplots(3,1,figsize= (8,10))
plt.subplot(311)
plt.title('Age')
plt.ylim(0,30)
plt.xlabel('TM195')
plt.subplot(312)
plt.ylim(0,30)
plt.xlabel('TM498')
plt.subplot(313)
plt.ylim(0,30)
plt.xlabel('TM798')
plt.show();
Observation :
Inference :
Most buyers are in the 20 to 30 age group. We cannot determine customer
preference from age alone.
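The per-product comparison behind this inference can be sketched as one histogram per product with a shared y-limit and shared bins, so the three panels are directly comparable. The data below is hypothetical, and the loop over axes is one possible way to write it, not necessarily how the original cells drew the plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages per product (column names follow the dataset).
sample = pd.DataFrame({
    "Product": ["TM195"] * 4 + ["TM498"] * 3 + ["TM798"] * 3,
    "Age": [22, 24, 28, 35, 23, 26, 31, 25, 27, 30],
})

products = ["TM195", "TM498", "TM798"]
age_bins = list(range(15, 55, 5))  # same bin edges in every panel

fig, axes = plt.subplots(3, 1, figsize=(8, 10))
for ax, product in zip(axes, products):
    ages = sample.loc[sample["Product"] == product, "Age"]
    ax.hist(ages, bins=age_bins)
    ax.set_ylim(0, 30)      # same y-range in every panel
    ax.set_xlabel(product)
axes[0].set_title("Age")
fig.tight_layout()
```

In a notebook, finish with plt.show() to display the figure.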
plt.legend(['TM195','TM498','TM798']);
plt.show();
As seen earlier, age is spread across all products, so we cannot categorize customers
based on age.
plt.subplot(311)
plt.title('Miles')
plt.ylim(0,60)
plt.xlabel('TM195')
plt.subplot(312)
plt.ylim(0,60)
plt.xlabel('TM498')
plt.subplot(313)
plt.ylim(0,60)
plt.xlabel('TM798')
plt.show()
Observation :
Inference :
In [19]: plt.ylim(0,16)
plt.xlim(25000, 110000)  # xlim takes only the lower and upper limit
plt.legend(['TM195','TM498','TM798'])
plt.title('Incomes')
plt.show();
Observation :
Inference :
In [20]: sb.histplot(x=customer_data.Income, kde=True, stat='density');  # histplot replaces the deprecated distplot
The income distribution shows two peaks and appears right-skewed,
indicating outliers on the right.
In [21]: sb.boxplot(x=customer_data.Income);
Observation :
As seen in the earlier plot, outliers are present at the high end of the data.
Within the IQR, the data is spread fairly evenly.
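The points a boxplot draws as outliers can also be listed explicitly. The sketch below applies the conventional 1.5 × IQR rule (the default whisker length in seaborn's boxplot) to a hypothetical income sample; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical incomes with one large value on the right.
income = pd.Series([30000, 38000, 44000, 48000, 50000, 53000, 58000, 64000, 104000])

q1 = income.quantile(0.25)
q3 = income.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = income[(income < lower_fence) | (income > upper_fence)]

print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers.tolist())
```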
Inference :
In [22]: sb.histplot(x=customer_data.Miles, kde=True, stat='density')
Out[22]: <AxesSubplot:xlabel='Miles', ylabel='Density'>
In [23]: sb.boxplot(x=customer_data.Income);
Observation :
Inference :
Out[24]: Product Age Gender Education MaritalStatus Usage Fitness Income Miles
In [25]: sb.countplot(x=customer_data.Product);
Observation :
Inference :
We can guess that TM195 is a more economical or more popular model than the other
two models.
In [26]: sb.countplot(x=customer_data.Gender);
Out[28]: <AxesSubplot:xlabel='Product', ylabel='count'>
Observation :
Inference :
In [29]: sb.countplot(x=customer_data.MaritalStatus);
Out[30]: <AxesSubplot:xlabel='MaritalStatus', ylabel='count'>
Out[31]: <AxesSubplot:xlabel='Education', ylabel='count'>
Observation :
Customers with 16 years of education are the largest group of buyers.
Again, education is not the best predictor of customer preference.
Observation :
The TM195 is used by customers who plan to use it 2 to 5 times a week.
The TM498 is used by customers who plan to use it 2 to 5 times a week.
The TM798 is used by customers who plan to use it 3 to 7 times a week.
Inference :
TM195 and TM498 are preferred by customers who plan moderate usage.
TM798 is preferred by customers who plan heavy usage.
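The usage ranges per product can be read from a cross-tabulation rather than from the plot. A minimal sketch, assuming hypothetical rows with the dataset's Product and Usage column names:

```python
import pandas as pd

# Hypothetical rows; Usage is the planned number of uses per week.
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Usage": [2, 3, 4, 3, 3, 5, 6],
})

# Rows are products, columns are usage levels, cells are customer counts.
usage_by_product = pd.crosstab(sample["Product"], sample["Usage"])

# Minimum and maximum planned usage for each product.
usage_range = sample.groupby("Product")["Usage"].agg(["min", "max"])

print(usage_by_product)
print(usage_range)
```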
Observation :
Inference :
Bivariate Analysis
ScatterPlot:
If the data on the y axis increases with the data on the x axis, the data is positively correlated.
If the data on the y axis decreases as the data on the x axis increases, the data is negatively correlated.
If the data on the y axis neither increases nor decreases with the data on the x axis, the data
does not show any correlation.
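The three cases above correspond to the sign of the Pearson correlation coefficient. The sketch below computes it with Series.corr on hypothetical values; the helper describe_correlation and its threshold of 0.1 are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical income and miles values that rise together.
sample = pd.DataFrame({
    "Income": [30000, 40000, 50000, 60000, 90000],
    "Miles": [50, 85, 75, 110, 180],
})

def describe_correlation(r, threshold=0.1):
    """Label a correlation coefficient; the threshold is an arbitrary illustrative cutoff."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "none"

# Series.corr uses the Pearson method by default.
r_income_miles = sample["Income"].corr(sample["Miles"])
# Negating one variable flips the sign of the coefficient.
r_reversed = sample["Income"].corr(-sample["Miles"])

print(round(r_income_miles, 2), describe_correlation(r_income_miles))
print(round(r_reversed, 2), describe_correlation(r_reversed))
```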
A slight positive correlation can be observed between income and miles. This may be
because high-income customers buy the TM798 model, which has a higher
range of miles.
Out[39]: <seaborn.axisgrid.FacetGrid at 0x1d7d16f2790>
Partnered customers plan to run more miles than single customers.
Multivariate Analysis
In [40]: sb.catplot( x = "Product", y = 'Miles', hue = 'Gender', data = customer_data,kind =
Observation :
Inference :
In [41]: sb.pairplot(customer_data);
In [42]: sb.pairplot(customer_data_copy);
corr
In [44]: plt.figure(figsize=(14,4))
# sb.heatmap(corr)
In [45]: customer_data_copy.groupby('Product').mean(numeric_only=True)  # skip the non-numeric columns
Summary
TM195 is the most economical, beginner-level choice; TM798 is the expert-level fitness choice.
Product | Summary
TM195 | Most popular. Preferred among people in the lower income range, with a fitness level of 3 or less and usage of fewer than 4 times a week.
TM498 |
TM798 | Least sold. High-end model. Preferred among people in the higher income range, with a fitness level above 4 and usage of more than 4 times a week.
Technical summary
Many of the variables are right-skewed.
We may encounter a class imbalance problem.
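One way to quantify the class imbalance concern is to look at the share of each product and the ratio between the largest and smallest class. This is a minimal sketch with hypothetical counts (8, 6 and 4 rows); the actual shares should be computed from customer_data.Product.

```python
import pandas as pd

# Hypothetical product labels; the counts are made up for illustration.
products = pd.Series(["TM195"] * 8 + ["TM498"] * 6 + ["TM798"] * 4, name="Product")

# Fraction of rows belonging to each product.
class_share = products.value_counts(normalize=True)

# Ratio of the most common class to the least common one (1.0 means balanced).
imbalance_ratio = class_share.max() / class_share.min()

print(class_share)
print("imbalance ratio:", imbalance_ratio)
```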