Heart Disease Risk Factor Data Analysis Midterm Data 2 - Jupyter Notebook
Heart Disease Risk Factor Data Analysis
Heart disease is the leading cause of death for men, women, and people of most racial and
ethnic groups in the United States. One person dies every 33 seconds in the United States
from cardiovascular disease. Several health conditions can increase the risk of heart disease.
These are called risk factors.
HighBP : Indicates if the person has been told by a health professional that they have High
Blood Pressure (0 = No; 1 = Yes).
HighChol : Indicates if the person has been told by a health professional that they have High
Blood Cholesterol (0 = No; 1 = Yes).
CholCheck : Indicates if the person has had their cholesterol levels checked within the
last 5 years (0 = No; 1 = Yes).
BMI: Body Mass Index, calculated by dividing the person's weight (kilograms) by the square
of their height (meters).
Smoker: Indicates if the person has smoked at least 100 cigarettes (0 = No; 1 = Yes).
Diabetes : Indicates if the person has no history of diabetes (0), is currently pre-diabetic (1),
or has either type of diabetes (2).
PhysActivity : Indicates if the person has some form of physical activity in their day-to-day
routine (0 = No; 1 = Yes).
Fruits : Indicates if the person consumes 1 or more fruit(s) daily (0 = No; 1 = Yes).
Veggies : Indicates if the person consumes 1 or more vegetable(s) daily (0 = No; 1 = Yes).
HvyAlcoholConsump: Indicates if the person has more than 14 drinks per week (0 = No; 1 =
Yes).
AnyHealthcare : Indicates if the person has any form of health insurance (0 = No; 1 = Yes).
NoDocbcCost : Indicates if the person wanted to visit a doctor within the past 1 year but
couldn’t, due to cost (0 = No; 1 = Yes).
GenHlth : Indicates the person's rating of their general health, ranging from 1
(excellent) to 5 (poor).
MentHlth : Indicates the number of days within the past 30 days that the person had bad
mental health.
PhysHlth : Indicates the number of days within the past 30 days that the person had bad
physical health.
DiffWalk : Indicates if the person has difficulty while walking or climbing stairs (0 = No; 1 =
Yes).
Sex : Indicates the gender of the person, where 0 is female and 1 is male.
Age : Indicates the age class of the person, from 1 (18 to 24 years) up to 13 (80 years or
older); each class in between spans 5 years.
Education : Indicates the highest year of school completed, with 0 being never attended or
kindergarten only and 6 being 4 years of college or more.
Income : Indicates the total household income, ranging from 1 (at least $10,000) to 6
($75,000+).
#Added Data: The following columns were added to replace the numerical categories given in
the original data with words or age ranges for graphing purposes: Age_range,
Blood_Pressure, Cholesterol, Smoke_Habits, Stroke_History, Fruit_Diet, Physical_Activity,
Alc_habits, Veggie_Diet
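As a minimal sketch of how such label columns could be derived, the snippet below builds a lookup from the 13 age classes to age-range labels, following the Age description above. The helper name `age_range_label`, the label format, and the `YES_NO` lookup are assumptions for illustration, not the notebook's actual code.

```python
def age_range_label(code: int) -> str:
    """Return a label such as '18-24' for age class 1 or '80+' for class 13."""
    if not 1 <= code <= 13:
        raise ValueError(f"age class must be between 1 and 13, got {code}")
    if code == 1:
        return "18-24"
    if code == 13:
        return "80+"
    low = 25 + 5 * (code - 2)  # class 2 starts at 25, then 5-year steps
    return f"{low}-{low + 4}"

# Lookup tables (hypothetical label wording) for the coded columns.
AGE_LABELS = {code: age_range_label(code) for code in range(1, 14)}
YES_NO = {0: "No", 1: "Yes"}

print(AGE_LABELS)
```

With pandas, a lookup like this can be applied to a column via `Series.map`, for example `df["Age"].map(AGE_LABELS)`.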
Import Packages
In [ ]: import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import numpy as np
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv('/content/drive/MyDrive/Dataset/heart_disease_health_i
In [ ]: plt.style.use('ggplot')
Sample Data
This dataset has 22 columns and 253,681 rows.
In [ ]: df.head()
Out[195]: HeartDiseaseorAttack HighBP HighChol CholCheck BMI Smoker Stroke Diabetes PhysA
5 rows × 23 columns
8 rows × 22 columns
This BMI data has outliers. The max BMI is 98.0 kg/m^2, but the 50th percentile (median) BMI
is 27 kg/m^2. This outlier was not excluded because it is still relevant.
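One common way to flag values like this is the 1.5 × IQR rule. The sketch below applies it to a small hypothetical list of BMI values (not the dataset itself), using only the standard library; the variable names are illustrative.

```python
import statistics

# Hypothetical BMI values chosen so the median is 27 and the max is 98.
bmi = [22, 24, 25, 27, 27, 29, 31, 33, 98]

# Quartiles via the inclusive method (data treated as the whole population).
q1, q2, q3 = statistics.quantiles(bmi, n=4, method="inclusive")
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Values outside the fences are flagged, not removed.
outliers = [value for value in bmi if value < lower_fence or value > upper_fence]
print("median:", q2, "upper fence:", upper_fence, "outliers:", outliers)
```

Flagging without dropping matches the choice made here: the extreme BMI stays in the analysis but is identified explicitly.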
Out[220]: Text(0.5, 1.0, 'Number of Individuals with Heart Disease in Each Age
Group')
The bar graph shows that the number of individuals with heart disease rises with age group,
which is consistent with the fact that the risk for heart disease increases after the age of 65.
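Because a bar graph of counts also reflects how many people fall into each age group, a per-group rate can be a useful companion. The sketch below computes both on hypothetical toy records; the column names follow the notebook, but the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical toy records; the real notebook would use df loaded from the CSV.
toy = pd.DataFrame({
    "Age_range": ["18-24"] * 4 + ["65-69"] * 4,
    "HeartDiseaseorAttack": [0, 0, 0, 0, 1, 0, 1, 0],
})

# cases = number with heart disease; rate = share of the age group affected.
by_age = toy.groupby("Age_range")["HeartDiseaseorAttack"].agg(cases="sum", rate="mean")
print(by_age)
```

Since the column is coded 0/1, the group mean is the fraction of that age group with heart disease.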
In [ ]: df1["BMI"].plot(kind='hist', color = 'maroon',bins = 15, figsize=(15,10
plt.xlabel("BMI")
plt.ylabel("Number of Individuals with Heart Disease")
plt.title("BMI of Individuals with Heart Disease")
plt.xticks(np.arange(0, 100, 5))
In [ ]: Hdisease_BP=df[['HeartDiseaseorAttack', "Blood_Pressure"]].groupby('Hea
Hdisease_BP
In this dataset, 75.0% of individuals with heart disease also had high blood pressure. By
comparison, 39.6% of individuals without heart disease had high blood pressure (90,901 of
229,787). This makes sense since high blood pressure can damage arteries by making them
less elastic. This decreases the flow of blood and oxygen, leading to heart disease. This
process is pictured below:
In [ ]: Hdisease_Chol=df[['HeartDiseaseorAttack', "Cholesterol"]].groupby('Hear
Hdisease_Chol
In this dataset, 70% of individuals with heart disease also had high cholesterol. This
makes sense since with high cholesterol, you can develop fatty deposits in your blood
vessels. Eventually, these deposits grow, making it difficult for enough blood to flow through
your arteries. Those deposits can break suddenly and form a clot that causes a heart attack.
This process is pictured below:
In this dataset, 2.8% of individuals without heart disease had a history of stroke, compared
with 16.5% of individuals with heart disease. It makes sense that a history of stroke is more
common among individuals with heart disease than among those without it, because strokes
and heart disease are closely related. With heart disease, plaque build-up and blood clots in
arteries supplying blood to the brain can cause a stroke.
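The two stroke percentages can be reproduced from the group counts used in the notebook's percentage calculations. The helper `share` below is a hypothetical name introduced for this sketch.

```python
def share(with_factor: int, without_factor: int) -> float:
    """Fraction of a group that has the risk factor."""
    return with_factor / (with_factor + without_factor)

# Counts taken from the notebook's percentage calculations.
stroke_given_hd = share(3937, 19956)        # heart disease group
stroke_given_no_hd = share(6355, 223432)    # no heart disease group

print(f"{stroke_given_hd:.1%} vs {stroke_given_no_hd:.1%}")
```

Each denominator is the size of one heart disease group, so the two shares are directly comparable even though the groups differ greatly in size.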
In [ ]: Hdisease_Smoke=df[['HeartDiseaseorAttack', "Smoke_Habits"]].groupby('He
Hdisease_Smoke
In this dataset, 62% of individuals with heart disease had smoked at least 100 cigarettes in
their lifetime. By comparison, 42.5% of individuals without heart disease had smoked at least
100 cigarettes in their lifetime (97,622 of 229,787). The higher share among those with heart
disease makes sense because smoking increases the formation of plaque in blood vessels,
and chemicals in cigarette smoke cause the blood to thicken and form clots inside veins and
arteries.
In [ ]: #Percentage Calculations
# Each share = count with the risk factor / total count in that heart disease group
#BP
not_bp_percent = 90901/(90901 + 138886)  # no heart disease
bp_percent = 17928/(17928 + 5965)  # heart disease
print (round(bp_percent, 4))
print (round(not_bp_percent, 4))
#Chol
# NOTE: these counts repeat the BP counts above; substitute the counts from Hdisease_Chol
not_chol_percent = 90901/(90901 + 138886)
chol_percent = 17928/(17928 + 5965)
print (round(chol_percent, 4))
print (round(not_chol_percent, 4))
#Stroke
stroke_percent = 3937/(19956 + 3937)
print (round(stroke_percent, 4))
notstroke_percent = 6355/(223432 + 6355)
print (round(notstroke_percent, 4))
#smoke
smoke_percent = 14801/(14801 + 9092)
print (round(smoke_percent, 4))
notsmoke_percent = 97622/(132165 + 97622)
print (round(notsmoke_percent, 4))
0.7503
0.3956
0.7503
0.3956
0.1648
0.0277
0.6195
0.4248
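Typing counts by hand makes copy-paste slips easy, so one alternative is to let pandas compute the row shares directly. The sketch below uses `pd.crosstab` with `normalize="index"` on hypothetical toy rows; the column names match the dataset, but the values are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows: 4 people with heart disease, 6 without.
toy = pd.DataFrame({
    "HeartDiseaseorAttack": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "HighBP":               [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

# normalize="index" divides each row by its total, giving the share of
# each heart disease group with and without high blood pressure.
shares = pd.crosstab(toy["HeartDiseaseorAttack"], toy["HighBP"], normalize="index")
print(shares)
```

Each row of `shares` sums to 1, so the column for `HighBP == 1` reads directly as the percentage of that group with high blood pressure.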
According to the bar graphs, for individuals diagnosed with heart disease, the majority of
recent lifestyle habits reflect healthy choices. These recent lifestyle habits include daily fruit
and vegetable consumption, number of alcoholic drinks per week, and participation in daily
physical activity. This could indicate lifestyle changes being made as recommended by a
health professional.
Conclusion
This data analysis reinforces that blood pressure, cholesterol levels, history of stroke, and
smoking habits are all related to heart disease. It shows that as age increases, so does the
risk of heart disease, and that many people with heart disease are overweight. It also shows
that the majority of the individuals diagnosed with heart disease in this data were making
healthy choices.