Deep Learning Project Report
Deep Learning Project Report
The data set used in this analysis is the Heart Disease Dataset from the UCI Machine
Learning Repository. It includes information on 303 patients with 14 features related to their
medical history and diagnostic test results. The dataset is sourced from four different
hospitals and is commonly used for benchmarking heart disease prediction models.
Summary of Attributes
The following actions were taken to prepare the data for modeling:
Normalized the Age, Trestbps, Chol, Thalach, and Oldpeak columns to standardize
the range of values
One-hot encoded categorical variables such as CP, Restecg, Slope, and Thal
Split the data into training and testing sets with a 70:30 ratio
Model Training
Model Variations
Hyperparameter Tuning
For each model, hyperparameters such as learning rate, batch size, and the number of epochs
were tuned using grid search and cross-validation to find the optimal settings.
Recommended Model
After evaluating the performance of all models, Model 2 (neural network with two hidden
layers and dropout regularization) was selected as the final model. It achieved the highest
accuracy of 85% on the test set while maintaining good generalization performance.
Age, Chest pain type (CP), Maximum heart rate achieved (Thalach), and
Exercise-induced angina (Exang) were the most significant predictors of heart
disease.
The deep learning model effectively captured non-linear relationships in the data,
leading to improved prediction accuracy.
Regularization techniques such as dropout helped prevent overfitting and improved
the model's generalization to unseen data.
Python Code:
import pandas as pd
import numpy as np
import tensorflow as tf
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/
processed.cleveland.data"
column_names = [
df = pd.read_csv(url, names=column_names)
df = df.apply(pd.to_numeric, errors='coerce')
df.fillna(df.mean(), inplace=True)
X = df.drop("target", axis=1)
numeric_transformer = Pipeline(steps=[
("scaler", StandardScaler())
])
categorical_transformer = Pipeline(steps=[
("onehot", OneHotEncoder(handle_unknown="ignore"))
])
preprocessor = ColumnTransformer(
transformers=[
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)
model = Sequential()
model.add(Dropout(0.5))
model.add(Dense(32, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1, activation="sigmoid"))
history = model.fit(
X_train, y_train,
validation_split=0.2,
epochs=100,
batch_size=32,
callbacks=[early_stopping],
verbose=2
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.show()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()
Next Steps
To further improve the model, the following steps are recommended:
Collect additional data to increase the training set size and improve model robustness.
Explore feature engineering techniques to create new features that may enhance
predictive performance.
Investigate other deep learning architectures such as recurrent neural networks
(RNNs) or ensemble methods to potentially achieve better results.