Heart Attacks Analysis

This document contains code for exploring and visualizing a cardiovascular disease dataset. It loads the data and inspects the data types and the missing-value counts. It then plots the distributions, relationships, and outliers of variables such as age, resting blood pressure, and cholesterol, using bar charts, count plots, histograms, a pair plot, a correlation heatmap, and box plots. Two pie charts compare the distribution of the 'thal' variable between patients with and without heart disease.

Uploaded by

Narasimha

In [ ]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]: df = pd.read_excel('data.xlsx')

In [ ]: # Exploratory Data Analysis (EDA) Part

In [5]: df.head()

Out[5]:
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   63    1   3       145   233    1        0      150      0      2.3      0   0     1       1
1   37    1   2       130   250    0        1      187      0      3.5      0   0     2       1
2   41    0   1       130   204    0        0      172      0      1.4      2   0     2       1
3   56    1   1       120   236    0        1      178      0      0.8      2   0     2       1
4   57    0   0       120   354    0        1      163      1      0.6      2   0     2       1

In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 303 non-null int64
1 sex 303 non-null int64
2 cp 303 non-null int64
3 trestbps 303 non-null int64
4 chol 303 non-null int64
5 fbs 303 non-null int64
6 restecg 303 non-null int64
7 thalach 303 non-null int64
8 exang 303 non-null int64
9 oldpeak 303 non-null float64
10 slope 303 non-null int64
11 ca 303 non-null int64
12 thal 303 non-null int64
13 target 303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.3 KB

In [8]: df.isnull().sum()
Out[8]: age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64

In [11]: variables = pd.read_excel('variable description.xlsx')


variables

Out[11]:
    variable  description
0   age       age in years
1   sex       (1 = male; 0 = female)
2   cp        chest pain type
3   trestbps  resting blood pressure (in mm Hg on admission...
4   chol      serum cholestoral in mg/dl
5   fbs       (fasting blood sugar > 120 mg/dl) (1 = true; ...
6   restecg   resting electrocardiographic results
7   thalach   maximum heart rate achieved
8   exang     exercise induced angina (1 = yes; 0 = no)
9   oldpeak   ST depression induced by exercise relative to...
10  slope     the slope of the peak exercise ST segment
11  ca        number of major vessels (0-3) colored by flou...
12  thal      3 = normal; 6 = fixed defect; 7 = reversable ...
13  target    1 or 0
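
The description above lists thal codes 3, 6 and 7, while the rows printed by df.head() contain the values 1 and 2. A minimal sketch for checking which codes actually occur, built only from the five rows shown earlier. The names `sample`, `documented_codes` and `undocumented` are illustrative and not part of the original notebook; on the full data the same check would run on df["thal"].

```python
import pandas as pd

# The five rows printed by df.head(), restricted to two columns.
sample = pd.DataFrame({
    "thal": [1, 2, 2, 2, 2],
    "target": [1, 1, 1, 1, 1],
})

# Codes listed in the variable description
# (3 = normal; 6 = fixed defect; 7 = reversable defect).
documented_codes = {3, 6, 7}

# Codes that appear in the data but are not covered by the description.
observed_codes = set(sample["thal"].unique().tolist())
undocumented = sorted(observed_codes - documented_codes)

print(sample["thal"].value_counts().sort_index())
print("codes not covered by the description:", undocumented)
```

If the list is not empty, the labels used in later plots probably need to be checked against the dataset's own documentation.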

In [24]: plt.figure(figsize=(12,8))
df["age"].value_counts().plot(kind="bar")
plt.title('Age Distribution')
plt.show()

In [30]: plt.figure(figsize=(12,8))
sns.countplot(x='age', hue='target', data=df)
plt.show()

In [32]: plt.figure(figsize=(12,8))
sns.histplot(x='age', hue='target', data=df)
plt.show()

In [33]: plt.figure(figsize=(12,8))
sns.countplot(x='sex', hue='target', data=df)
plt.show()

#creating subplots

In [43]: df.nunique()
Out[43]: age 41
sex 2
cp 4
trestbps 49
chol 152
fbs 2
restecg 3
thalach 91
exang 2
oldpeak 40
slope 3
ca 5
thal 4
target 2
dtype: int64
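
The unique-value counts can be used to separate categorical from numerical columns instead of listing them by hand. A minimal sketch, using the counts copied from Out[43]; the cutoff `max_levels = 5` is an assumption chosen for illustration and is not stated in the notebook.

```python
import pandas as pd

# Unique-value counts copied from Out[43].
n_unique = pd.Series({
    "age": 41, "sex": 2, "cp": 4, "trestbps": 49, "chol": 152,
    "fbs": 2, "restecg": 3, "thalach": 91, "exang": 2, "oldpeak": 40,
    "slope": 3, "ca": 5, "thal": 4, "target": 2,
})

max_levels = 5  # assumed cutoff: columns with at most 5 distinct values

# Boolean indexing keeps the original column order.
cat_features = n_unique[n_unique <= max_levels].index.tolist()
num_features = n_unique[n_unique > max_levels].index.tolist()

print(cat_features)
print(num_features)
```

On the real DataFrame, `n_unique` would simply be `df.nunique()`.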

In [44]: fig,axes=plt.subplots(nrows=3, ncols=3, figsize=(12,10))

cat_features=['sex', 'cp', 'fbs', 'restecg','exang', 'slope', 'ca', 'thal',
              'target']
# First attempt: no ax= is passed, so each countplot is drawn on the
# current axes instead of on its own panel of the 3x3 grid.
for idx,feature in enumerate(cat_features):
    if feature!='target':
        sns.countplot(x=feature,hue='target',data=df)

In [48]: fig,axes=plt.subplots(nrows=3, ncols=3, figsize=(12,10))

cat_features=['sex', 'cp', 'fbs', 'restecg','exang', 'slope', 'ca', 'thal',
              'target']
for idx,feature in enumerate(cat_features):
    if feature!='target':
        # place each feature on its own panel: row idx//3, column idx%3
        ax=axes[idx//3,idx%3]
        sns.countplot(x=feature,hue='target',ax=ax,data=df)
        #print(idx/3, idx%3)
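
The row and column computed in the loop above map a flat loop index onto the 3x3 grid. A small sketch of that mapping using only the standard library; `positions` and `ncols` are illustrative names.

```python
cat_features = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal',
                'target']
ncols = 3  # number of columns in the subplot grid

# divmod(idx, ncols) returns (idx // ncols, idx % ncols), i.e. (row, column).
positions = {}
for idx, feature in enumerate(cat_features):
    if feature != 'target':
        positions[feature] = divmod(idx, ncols)

for feature, (row, col) in positions.items():
    print(f"{feature:8s} -> axes[{row}, {col}]")
```

Because 'target' is skipped, eight of the nine panels are filled and axes[2, 2] stays empty.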
In [41]: df.columns

Out[41]: Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
      dtype='object')


In [67]: #create pie chart - having heart diseases
plt.figure(figsize=(12,8))
labels = 'normal','fixed defect','reversable defect'
sizes=[6,130,28]
explode=[0,0.1,0]
colors=['pink','orange','purple']
plt.pie(sizes,labels=labels,autopct='%.1f%%',colors=colors, startangle = 140,explode=explode)
plt.axis('equal')
plt.title('Thalassemia with Heart Diseases')
plt.show()

#create pie chart - having no heart diseases


plt.figure(figsize=(12,8))
labels = 'normal','fixed defect','reversable defect'
sizes=[12,36,89]
explode=[0,0,0.2]
colors=['pink','orange','purple']
plt.pie(sizes,labels=labels,autopct='%.1f%%',colors=colors,explode=explode)
plt.axis('equal')
plt.title('Thalassemia with No Heart Diseases')
plt.show()
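
The slice sizes in the two pie charts are typed in by hand. One way to derive such counts from the data is a cross-tabulation of thal against target. The sketch below uses a small made-up frame (`toy`) purely for illustration; its values are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for df (made-up values); the notebook would use the loaded DataFrame.
toy = pd.DataFrame({
    "thal":   [1, 2, 2, 3, 3, 3, 2, 1],
    "target": [1, 1, 1, 1, 0, 0, 0, 0],
})

# Rows: thal code, columns: target value.
counts = pd.crosstab(toy["thal"], toy["target"])

# One list of slice sizes per pie chart, ordered by thal code.
sizes_disease = counts[1].tolist()
sizes_no_disease = counts[0].tolist()

print(counts)
print(sizes_disease, sizes_no_disease)
```

On the real data, Out[43] reports four distinct thal values, so the cross-tabulation would have four rows while the pie charts above use three labels; the extra code would need to be handled explicitly.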
In [69]: num_var=['age','trestbps','chol','thalach','oldpeak']
sns.pairplot(df[num_var+['target']], hue='target')
plt.show()

In [71]: #create a plot to understand relationship between age & chol, according to target
sns.lmplot(x='age', y='chol',hue='sex', col='target', data = df)
plt.show()

In [74]: plt.figure(figsize=(12,8))
sns.heatmap(df.corr(), annot=True)
Out[74]: <AxesSubplot: >
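
The heatmap shows every pairwise correlation. When only the relationship with target is of interest, the corresponding column of the correlation matrix can be sorted instead. A minimal sketch on a made-up frame (`toy`, illustrative values only, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for df with made-up values.
toy = pd.DataFrame({
    "age":     [40, 50, 60, 70],
    "thalach": [180, 170, 150, 140],
    "target":  [1, 1, 0, 0],
})

# Pearson correlation of each feature with target, most negative first.
corr_with_target = toy.corr()["target"].drop("target").sort_values()
print(corr_with_target)
```

With the real DataFrame, `toy` would be replaced by `df`.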

In [78]: plt.figure(figsize=(12,8))
plt.subplot(1,3,1)
sns.boxplot(y=df['age'])
plt.title('Age', fontsize=15)

plt.subplot(1,3,2)
sns.boxplot(y=df['trestbps'])
plt.title('Resting blood pressure', fontsize=15)

plt.subplot(1,3,3)
sns.boxplot(y=df['chol'])
plt.title('Cholesterol level', fontsize=15)

plt.show()
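
By default, seaborn's box plot draws points beyond 1.5 times the interquartile range as outliers. The same rule can be applied numerically. The sketch below uses only the five chol values printed by df.head(), so the fences are illustrative and will differ on the full column.

```python
import pandas as pd

# chol values from the five rows shown by df.head().
chol = pd.Series([233, 250, 204, 236, 354], name="chol")

# Quartiles and interquartile range.
q1, q3 = chol.quantile(0.25), chol.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = chol[(chol < lower) | (chol > upper)]
print(f"fences: [{lower}, {upper}]")
print(outliers.tolist())
```

On the full data, `chol` would be `df['chol']`, and the same steps apply to age and trestbps.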
