
Cardio Good Fitness Dataset

The market research team at Adright analyzed customer data from CardioGood Fitness to identify profiles of customers for each treadmill product. The team collected data on 180 individuals who purchased a treadmill in the prior 3 months, including variables like product purchased, demographics, income, and expected fitness goals. Descriptive statistics showed differences across products in customer characteristics like age, income, and planned miles walked/run per week.

Uploaded by

Techie Guys

8/24/22, 1:57 PM CardioGoodFitnessDataset

Cardio Good Fitness Case Study - Descriptive Statistics

The market research team at Adright is assigned the task of identifying the profile of the typical
customer for each treadmill product offered by CardioGood Fitness. The market research
team decides to investigate whether there are differences across the product lines with
respect to customer characteristics. The team decides to collect data on individuals who
purchased a treadmill at a CardioGood Fitness retail store during the prior three months. The
data are stored in the CardioGoodFitness.csv file.

The team identifies the following customer variables to study:


product purchased, TM195, TM498, or TM798;
gender;
age, in years;
education, in years;
relationship status, single or partnered;
annual household income;
average number of times the customer plans to use the treadmill each week;
average number of miles the customer expects to walk/run each week;
and self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

In [1]: # Import required libraries.

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sb

In [2]: # Load data into the dataframe.

customer_data = pd.read_csv("CardioGoodFitness.csv")

In [3]: # Check that the data were loaded correctly into the dataframe.

customer_data.head(5)

Out[3]: Product Age Gender Education MaritalStatus Usage Fitness Income Miles

0 TM195 18 Male 14 Single 3 4 29562 112

1 TM195 19 Male 15 Single 2 3 31836 75

2 TM195 19 Female 14 Partnered 4 3 30699 66

3 TM195 19 Male 12 Single 3 3 32973 85

4 TM195 20 Male 13 Partnered 4 2 35247 47

In [4]: # Know more about data.

customer_data.info()

localhost:8889/nbconvert/html/Great Learning/CardioGoodFitnessDataset.ipynb?download=false 1/27



<class 'pandas.core.frame.DataFrame'>

RangeIndex: 180 entries, 0 to 179

Data columns (total 9 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Product 180 non-null object

1 Age 180 non-null int64

2 Gender 180 non-null object

3 Education 180 non-null int64

4 MaritalStatus 180 non-null object

5 Usage 180 non-null int64

6 Fitness 180 non-null int64

7 Income 180 non-null int64

8 Miles 180 non-null int64

dtypes: int64(6), object(3)

memory usage: 12.8+ KB

Observation:

1) There are 9 columns in total: 3 are categorical and the remaining 6 are numerical.

2) There are 180 rows in total, and no column contains null values.

In [5]: # Create a copy of the data before altering anything.

customer_data_copy = customer_data.copy()

In [6]: # Describe statistical information about data.

customer_data.describe() # it will show all the numerical column data.

Out[6]: Age Education Usage Fitness Income Miles

count 180.000000 180.000000 180.000000 180.000000 180.000000 180.000000

mean 28.788889 15.572222 3.455556 3.311111 53719.577778 103.194444

std 6.943498 1.617055 1.084797 0.958869 16506.684226 51.863605

min 18.000000 12.000000 2.000000 1.000000 29562.000000 21.000000

25% 24.000000 14.000000 3.000000 3.000000 44058.750000 66.000000

50% 26.000000 16.000000 3.000000 3.000000 50596.500000 94.000000

75% 33.000000 16.000000 4.000000 4.000000 58668.000000 114.750000

max 50.000000 21.000000 7.000000 5.000000 104581.000000 360.000000

In [7]: # Show summary statistics for both categorical and numerical columns.
customer_data.describe(include='all') # adds little information about the categorical columns


Out[7]: Product Age Gender Education MaritalStatus Usage Fitness In

count 180 180.000000 180 180.000000 180 180.000000 180.000000 180.0

unique 3 NaN 2 NaN 2 NaN NaN

top TM195 NaN Male NaN Partnered NaN NaN

freq 80 NaN 104 NaN 107 NaN NaN

mean NaN 28.788889 NaN 15.572222 NaN 3.455556 3.311111 53719.5

std NaN 6.943498 NaN 1.617055 NaN 1.084797 0.958869 16506.6

min NaN 18.000000 NaN 12.000000 NaN 2.000000 1.000000 29562.0

25% NaN 24.000000 NaN 14.000000 NaN 3.000000 3.000000 44058.7

50% NaN 26.000000 NaN 16.000000 NaN 3.000000 3.000000 50596.5

75% NaN 33.000000 NaN 16.000000 NaN 4.000000 4.000000 58668.0

max NaN 50.000000 NaN 21.000000 NaN 7.000000 5.000000 104581.0

The dataset has 9 columns in total: 6 are numerical and 3 are
categorical. Numerical data can be further classified as continuous or discrete. The
columns Education, Fitness and Usage are discrete in nature, so we convert them to the
category type using the astype() method.

In [8]: # Change the discrete numerical columns to the category type.

for col in ['Education','Fitness','Usage']:
    customer_data[col] = customer_data[col].astype('category')

With the five-point summary, we can see the following things:

the range of the data;
the interquartile range (IQR);
the difference between the mean and the median (50%);
if the mean and median coincide or are close to each other, there is not much skewness in the data;
if the mean is to the left of the median, the data are left skewed;
if the mean is to the right of the median, the data are right skewed;
if the standard deviation is very large relative to the range, the data are
widely dispersed.
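
The checks listed above can be sketched in a few lines. This is a minimal illustration and not part of the original notebook: the helper name `five_point_summary` and the sample values are hypothetical, and comparing the mean with the median is only a rough hint of skew direction.

```python
import pandas as pd


def five_point_summary(values: pd.Series) -> dict:
    """Return the range, IQR and a rough skew hint for a numeric Series."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    mean = values.mean()
    if mean > median:
        skew_hint = "right"
    elif mean < median:
        skew_hint = "left"
    else:
        skew_hint = "none"
    return {
        "range": values.max() - values.min(),
        "iqr": q3 - q1,
        "mean_minus_median": mean - median,
        "skew_hint": skew_hint,
    }


# Hypothetical sample with one large value on the right-hand side.
sample = pd.Series([20, 40, 60, 80, 300])
summary = five_point_summary(sample)
print(summary)
```

In the notebook, the same helper could be applied to a column such as `customer_data['Income']`, assuming the column is still numeric at that point.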

In [9]: # Describe the numerical data and transpose the result to make it easier to read.

customer_data.describe().T

Out[9]: count mean std min 25% 50% 75% max

Age 180.0 28.788889 6.943498 18.0 24.00 26.0 33.00 50.0

Income 180.0 53719.577778 16506.684226 29562.0 44058.75 50596.5 58668.00 104581.0

Miles 180.0 103.194444 51.863605 21.0 66.00 94.0 114.75 360.0

Age.
Customer ages range from 18 to 50.


The middle 50% of customers are aged 24 to 33.
The mean (28.8) is only slightly above the median (26), so there is not much skewness in the data.

Income.
Customer incomes range from about 30k to 105k.
The middle 50% of customers have incomes of about 44k to 59k.
The difference between the mean and the median is larger, and the mean lies to the right of the median, so the data are
right skewed.
The standard deviation is very high.

Miles.
Miles range from 21 to 360.
The middle 50% of customers plan 66 to about 115 miles.
The mean lies to the right of the median, so the data are right skewed.
The standard deviation is very high.

Visualization
In [10]: # Sample matplotlib example.
a = [1,2,5,6,7]
b = [1,2,3,4,5]
c = [9,8,3,4,5]
plt.plot(a,b)
plt.plot(a,c)
plt.xlabel('a Data')
plt.ylabel('b Data and c Data')
plt.title('Test Data')
plt.legend(['b Data','c Data'])
plt.show();  # plt.show() takes no data arguments

Univariate Analysis
In [11]: plt.hist(customer_data.Age)

plt.show();


In [12]: bins = [15,20,25,30,35,40,45,50]
plt.hist(customer_data.Age, bins, edgecolor = 'black')
plt.title('Histogram of age')
plt.legend(['Age']);  # labels go in a list; a bare string is split into single characters

We can observe that most of the customers are aged between 20 and 30.

In [13]: bins = [18,20,22,25,28,30,32,35,40,45,50]

plt.hist(customer_data.Age,bins, edgecolor = 'red')

plt.title("Histogram")

plt.show()

Observation

1. The age range 22 to 28 has the greatest number of customers.
2. Very few customers are older than 40 (about 10%).
3. About 90% of customers are in the age range 20 to 40.

In [14]: customer_data.Product.unique()

Out[14]: array(['TM195', 'TM498', 'TM798'], dtype=object)

In [15]: group_data = customer_data.groupby(customer_data.Product)

TM195_data = group_data.get_group('TM195')

TM498_data = group_data.get_group('TM498')

TM798_data = group_data.get_group('TM798')

In [16]: bins = [15,20,25,30,35,40,45,50]

plt.subplots(3,1,figsize= (8,10))

plt.subplot(311)

plt.title('Age')

plt.hist(TM195_data.Age, bins, edgecolor = 'red')

plt.ylim(0,30)

plt.xlabel('TM195')

plt.subplot(312)

plt.hist(TM498_data.Age, bins, edgecolor = 'red')

plt.ylim(0,30)

plt.xlabel('TM498')

plt.subplot(313)

plt.hist(TM798_data.Age, bins, edgecolor = 'red')

plt.ylim(0,30)

plt.xlabel('TM798')

plt.show();


Observation :

The 20 to 30 age group makes up the majority of TM195 buyers.
The 20 to 30 age group makes up the majority of TM498 buyers.
The 20 to 30 age group makes up the majority of TM798 buyers.

Inference :

Most buyers are in the 20 to 30 age group. We cannot determine customer preference
based on age.

In [17]: plt.hist(TM195_data.Age, edgecolor = 'Black')

plt.hist(TM498_data.Age, edgecolor = 'black')

plt.hist(TM798_data.Age, edgecolor = 'black')

plt.legend(['TM195','TM498','TM798']);

plt.show();


As seen earlier, ages are spread similarly across all products, so we cannot categorize customers
based on age.

In [18]: bins = range(0,401,50)  # edges 0 to 400, so the 360-mile maximum falls inside a bin

plt.subplots(3, 1, figsize = (8,10))

plt.subplot(311)

plt.title('Miles')

plt.hist(TM195_data.Miles, bins, edgecolor = 'red')

plt.ylim(0,60)

plt.xlabel('TM195')

plt.subplot(312)

plt.hist(TM498_data.Miles, bins, edgecolor= 'red')

plt.ylim(0,60)

plt.xlabel('TM498')

plt.subplot(313)

plt.hist(TM798_data.Miles, bins, edgecolor= 'red')

plt.ylim(0,60)

plt.xlabel('TM798')

plt.show()


Observation :

TM195 is bought by people who plan to run up to 200 miles per week.
TM498 is bought by people who plan to run up to 250 miles per week.
TM798 is bought by people who plan to run up to 360 miles per week.

Inference :

TM195 is a low-end model, used by beginners and non-professionals.
TM498 is a medium-range model.
TM798 is a high-end model preferred by people planning to run more miles.
Based on the number of miles a customer plans to run, we can predict the customer's
preference.

In [19]: plt.ylim(0,16)
plt.xlim(25000,110000)  # xlim takes only the lower and upper limits
plt.hist(TM195_data.Income, edgecolor = 'black')
plt.hist(TM498_data.Income, edgecolor = 'black')
plt.hist(TM798_data.Income, edgecolor = 'black')
plt.legend(['TM195','TM498','TM798'])
plt.title('Incomes')
plt.show();


Observation :

TM195 is bought by people whose income ranges from 30k to 65k.
TM498 is bought by people whose income ranges from 30k to 65k.
TM798 is bought by people whose income ranges from 48k to 100k.

Inference :

TM195 is low cost compared to the other models.
TM498 is also in the lower price range.
TM798 is an expensive model compared to the others.
Income range is also a good predictor of customer preference.

In [20]: sb.histplot(x=customer_data.Income, kde=True, stat='density');  # histplot replaces the deprecated distplot

The income distribution shows two peaks and appears right skewed,
which indicates outliers on the right.


In [21]: sb.boxplot(x=customer_data.Income);  # pass the column as the keyword argument x

Observation :

As seen in the earlier plot, outliers are present at the higher values of the data.
Within the IQR, the data are fairly evenly distributed.

Inference :

Customers are widely spread in the higher income range.

In [22]: sb.histplot(x=customer_data.Miles, kde=True, stat='density')  # histplot replaces the deprecated distplot

Out[22]: <AxesSubplot:xlabel='Miles', ylabel='Density'>

Our customers are widely spread in the higher miles range.


In [23]: sb.boxplot(x=customer_data.Miles);  # Miles, not Income: the observations below are about miles

Observation :

Outliers are present at the higher values of the data.
The outlying customers plan to run more than 180 miles per week.
Within the IQR, the data are fairly evenly distributed.

Inference :

Customers are widely spread in the higher miles range.
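
The 180-mile mark mentioned above is close to the upper whisker limit of a box plot. As a rough check, assuming the common 1.5 × IQR rule (the default whisker length in matplotlib and seaborn) and the Miles quartiles reported by describe() earlier, the fences work out as follows. The variable names are illustrative.

```python
# Quartiles of Miles as reported by customer_data.describe() earlier in the notebook.
q1 = 66.0
q3 = 114.75

# Assumption: the usual 1.5 * IQR rule for box-plot whiskers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr}")
print(f"lower fence = {lower_fence}")
print(f"upper fence = {upper_fence}")
```

Under this rule, values above roughly 188 miles fall outside the upper whisker. The lower fence is negative, so with a minimum of 21 miles no low outliers are possible.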

In [24]: customer_data[customer_data['Miles'] > 180]

Out[24]: Product Age Gender Education MaritalStatus Usage Fitness Income Miles

23 TM195 24 Female 16 Partnered 5 5 44343 188

84 TM498 21 Female 14 Partnered 5 4 34110 212

142 TM798 22 Male 18 Single 4 5 48556 200

148 TM798 24 Female 16 Single 5 5 52291 200

152 TM798 25 Female 18 Partnered 5 5 61006 200

155 TM798 25 Male 18 Partnered 6 5 75946 240

166 TM798 29 Male 14 Partnered 7 5 85906 300

167 TM798 30 Female 16 Partnered 6 5 90886 280

170 TM798 31 Male 16 Partnered 6 5 89641 260

171 TM798 33 Female 18 Partnered 4 5 95866 200

173 TM798 35 Male 16 Partnered 4 5 92131 360

175 TM798 40 Male 21 Single 6 5 83416 200

176 TM798 42 Male 18 Single 5 4 89641 200


In [25]: sb.countplot(x=customer_data.Product);  # pass the column as the keyword argument x

Observation :

TM195 is the most sold model in the last 3 months.
The sales ranking is TM195 > TM498 > TM798.
The ratio is approximately 4 : 3 : 2.
About twice as many customers prefer TM195 as prefer TM798.

Inference :

We can guess that TM195 is a more economical or popular model than the other
two models.

In [26]: sb.countplot(x=customer_data.Gender);  # pass the column as the keyword argument x


Male and female customers are in a ratio of roughly 4 : 3 (104 of the 180 customers are male).

In [27]: sb.countplot(x=customer_data.Gender, hue = customer_data.Product);  # pass the column as the keyword argument x

In [28]: sb.countplot(x=customer_data.Product, hue = customer_data.Gender)  # pass the column as the keyword argument x

Out[28]: <AxesSubplot:xlabel='Product', ylabel='count'>

Observation :

TM195 is bought by an equal number of male and female customers.
TM498 is bought by slightly more male customers than female customers.
TM798 is bought by more male customers than female customers.
The numbers of male and female customers do not vary much between the products.

Inference :


Gender is not a great predictor of customer preference.

In [29]: sb.countplot(x=customer_data.MaritalStatus);  # pass the column as the keyword argument x

In [30]: sb.countplot(x=customer_data.MaritalStatus, hue = customer_data.Product)  # pass the column as the keyword argument x

Out[30]: <AxesSubplot:xlabel='MaritalStatus', ylabel='count'>

In [31]: sb.countplot(x=customer_data.Education, hue = customer_data.Product)  # pass the column as the keyword argument x

Out[31]: <AxesSubplot:xlabel='Education', ylabel='count'>


Observation :

Customers with 16 years of education are the largest group of buyers.
Again, Education is not the best predictor of customer preference.

In [32]: sb.countplot(x=customer_data.Usage, hue = customer_data.Product);  # pass the column as the keyword argument x

Observation :

TM195 is bought by customers who plan to use it 2 to 5 times a week.
TM498 is bought by customers who plan to use it 2 to 5 times a week.
TM798 is bought by customers who plan to use it 3 to 7 times a week.

Inference :

TM195 and TM498 are preferred by customers who plan moderate usage.
TM798 is preferred by customers who plan heavy usage.

In [33]: sb.countplot(x=customer_data.Fitness, hue = customer_data.Product);  # pass the column as the keyword argument x

Observation :

TM195 is bought by customers whose self-rated fitness is 1 to 5.
TM498 is bought by customers whose self-rated fitness is 1 to 4.
TM798 is bought by customers whose self-rated fitness is 3 to 5.
Many customers have rated themselves as highly fit.

Inference :

TM195 and TM498 are bought by customers of varied fitness levels.
TM798 is bought by customers who rated themselves as highly fit.

In [34]: sb.countplot(x=customer_data.Usage, hue = customer_data.Fitness);  # pass the column as the keyword argument x

Bivariate Analysis.

ScatterPlot:

if the data on the y axis increase as the data on the x axis increase, the data are positively correlated;
if the data on the y axis decrease as the data on the x axis increase, the data are negatively correlated;
if the data on the y axis neither increase nor decrease with the data on the x axis, the data
do not show any correlation.
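
To make these three cases concrete, the sketch below computes the Pearson correlation coefficient by hand for three small made-up series. The function `pearson_r` and the sample lists are illustrative only and do not come from the treadmill data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one shared x series and three different y series.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]   # tends to increase with x
falling = [5, 4, 3, 2, 1]  # decreases as x increases
flat = [2, 1, 3, 1, 2]     # no linear trend with x

for name, y in [("rising", rising), ("falling", falling), ("flat", flat)]:
    print(f"{name}: r = {pearson_r(x, y):.2f}")
```

A coefficient near +1 or -1 corresponds to a clear upward or downward pattern in the scatter plot, and a coefficient near 0 corresponds to no linear pattern.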

In [35]: sb.jointplot(x = 'Age', y = 'Miles', data = customer_data);

There is no definite pattern between Age and Miles.

In [36]: sb.jointplot( x = 'Age', y = 'Income', data=customer_data);


Age and Income seem to be positively correlated.

In [37]: sb.jointplot( x = 'Income', y = 'Miles', data = customer_data);

A slight positive correlation can be observed between Income and Miles. This may be
because high-income customers tend to buy the TM798 model, whose buyers plan a higher
range of miles.


In [38]: sb.catplot( x = 'Gender', y = 'Income', data = customer_data);

Male customers have a higher income range than female customers.

In [39]: sb.catplot( x = "MaritalStatus", y = 'Miles', data = customer_data, kind = 'box')

Out[39]: <seaborn.axisgrid.FacetGrid at 0x1d7d16f2790>

Partnered customers plan to run more miles than single customers.

Multivariate Analysis
In [40]: sb.catplot( x = "Product", y = 'Miles', hue = 'Gender', data = customer_data,kind =


Observation :

TM195 : average planned miles male = 90 ; female = 75.
TM498 : average planned miles male = 90 ; female = 90.
TM798 : average planned miles male = 160 ; female = 175.

Inference :

For the most part, male and female customers plan to run similar distances.
The average planned miles for users of TM195 and TM498 are almost the same.
The average planned miles for users of TM798 are in the high range.

In [41]: sb.pairplot(customer_data);

sb.pairplot(customer_data, diag_kind = 'kde');

sb.pairplot(customer_data, hue = 'Product');


In [42]: sb.pairplot(customer_data_copy);


In [43]: corr = customer_data_copy.corr(numeric_only=True)  # skip the text columns (numeric_only needs pandas 1.5 or later)

corr

Out[43]: Age Education Usage Fitness Income Miles

Age 1.000000 0.280496 0.015064 0.061105 0.513414 0.036618

Education 0.280496 1.000000 0.395155 0.410581 0.625827 0.307284

Usage 0.015064 0.395155 1.000000 0.668606 0.519537 0.759130

Fitness 0.061105 0.410581 0.668606 1.000000 0.535005 0.785702

Income 0.513414 0.625827 0.519537 0.535005 1.000000 0.543473

Miles 0.036618 0.307284 0.759130 0.785702 0.543473 1.000000

A heatmap helps visualize differing strengths. Here it is used to visualize the
correlations.

In [44]: plt.figure(figsize=(14,4))

# sb.heatmap(corr)

sb.heatmap(corr,annot = True, cmap = 'RdYlGn');


We can find high correlation between the following pairs of variables:

Miles and Usage (0.76)
Miles and Fitness (0.79)
Income and Education (0.63)
Fitness and Usage (0.67)
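
The same list can be read off the correlation matrix programmatically. The sketch below rebuilds the matrix from the values printed in Out[43] and keeps the pairs whose absolute correlation exceeds a cutoff. The 0.6 threshold and the names `THRESHOLD` and `strong_pairs` are assumptions chosen for illustration, not part of the original analysis.

```python
import pandas as pd

cols = ["Age", "Education", "Usage", "Fitness", "Income", "Miles"]

# Correlation values copied from Out[43].
corr = pd.DataFrame(
    [
        [1.000000, 0.280496, 0.015064, 0.061105, 0.513414, 0.036618],
        [0.280496, 1.000000, 0.395155, 0.410581, 0.625827, 0.307284],
        [0.015064, 0.395155, 1.000000, 0.668606, 0.519537, 0.759130],
        [0.061105, 0.410581, 0.668606, 1.000000, 0.535005, 0.785702],
        [0.513414, 0.625827, 0.519537, 0.535005, 1.000000, 0.543473],
        [0.036618, 0.307284, 0.759130, 0.785702, 0.543473, 1.000000],
    ],
    index=cols,
    columns=cols,
)

THRESHOLD = 0.6  # assumed cutoff for a "high" correlation

# Visit each unordered pair once (upper triangle, diagonal excluded).
strong_pairs = []
for i, first in enumerate(cols):
    for second in cols[i + 1:]:
        r = corr.loc[first, second]
        if abs(r) > THRESHOLD:
            strong_pairs.append((first, second, r))

# Strongest correlation first.
strong_pairs.sort(key=lambda pair: abs(pair[2]), reverse=True)
for first, second, r in strong_pairs:
    print(f"{first} and {second}: r = {r:.2f}")
```

With this cutoff, exactly the four pairs named above are selected; a different threshold would give a different list.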

In [45]: customer_data_copy.groupby(customer_data_copy.Product).mean(numeric_only=True)  # average only the numeric columns

Out[45]: Age Education Usage Fitness Income Miles

Product

TM195 28.55 15.037500 3.087500 2.9625 46418.025 82.787500

TM498 28.90 15.116667 3.066667 2.9000 48973.650 87.933333

TM798 29.10 17.325000 4.775000 4.6250 75441.575 166.900000

Summary
TM195 is the most economical and a beginner's choice; TM798 is the choice for
expert-level fitness.

The customer characteristics of TM195 and TM498 do not vary much.

TM195

Most popular.
Preferred among people with a lower income range, a fitness level of 3 or less, and
usage of less than 4 times a week.

TM498

Less popular than TM195.
Low income group, moderate fitness level.

TM798

Least sold.
High-end model.


Preferred among people with a higher income range, a fitness level above 4, and usage of
more than 4 times a week.

Technical summary.
Many variables are right skewed.
A class imbalance problem may be encountered.
