Dovdush_KN-305_lab3

Uploaded by

multifunctionalbot

Practical Work No. 3

in the course "Information Technologies of Smart Systems"

on the topic "Cardiology Clinic"

Completed by:

student of group KN-305

Pavlo Dovbush
In [1]: !pip install numpy
!pip install matplotlib
!pip install pandas
!pip install seaborn
!pip install tabulate

Defaulting to user installation because normal site-packages is not writeable

Requirement already satisfied: numpy in c:\users\олеся\appdata\roaming\python\python310\site-packages (1.23.5)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: matplotlib in c:\users\олеся\appdata\roaming\python\python310\site-packages (3.7.1)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (1.0.7)
Requirement already satisfied: cycler>=0.10 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (4.39.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (1.4.4)
Requirement already satisfied: numpy>=1.20 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (1.23.5)
Requirement already satisfied: packaging>=20.0 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (23.0)
Requirement already satisfied: pillow>=6.2.0 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: six>=1.5 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas in c:\users\олеся\appdata\roaming\python\python310\site-packages (1.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from pandas) (2022.7.1)
Requirement already satisfied: numpy>=1.21.0 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from pandas) (1.23.5)
Requirement already satisfied: six>=1.5 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: seaborn in c:\users\олеся\appdata\roaming\python\python310\site-packages (0.12.2)
Requirement already satisfied: numpy!=1.24.0,>=1.17 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from seaborn) (1.23.5)
Requirement already satisfied: pandas>=0.25 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from seaborn) (1.5.3)
Requirement already satisfied: matplotlib!=3.6.1,>=3.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from seaborn) (3.7.1)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.0.7)
Requirement already satisfied: cycler>=0.10 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (4.39.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.4.4)
Requirement already satisfied: packaging>=20.0 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (23.0)
Requirement already satisfied: pillow>=6.2.0 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from pandas>=0.25->seaborn) (2022.7.1)
Requirement already satisfied: six>=1.5 in c:\users\олеся\appdata\roaming\python\python310\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.1->seaborn) (1.16.0)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: tabulate in c:\users\олеся\appdata\roaming\python\python310\site-packages (0.9.0)

In [2]: import os.path

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import scipy.stats as stats

In [3]: import warnings

warnings.simplefilter('ignore')

In [4]: pd.set_option('display.max_columns', 500)

pd.set_option('display.max_rows', 500)

Read the dataset

In [5]: print(os.path.exists("dataset_3.csv"))

True

In [6]: ds = pd.read_csv("dataset_3.csv")
ds.head()

Out[6]:
   Unnamed: 0   Age  Sex  ChestPainType  RestingBP  Cholesterol  FastingBS  RestingECG  MaxHR  ExerciseAngina  Oldpeak  ST_Slope  HeartDisease
0           0  40.0    M            ATA      140.0        289.0        0.0      Normal  172.0               N      0.0        Up           0.0
1           1  49.0    F            NAP        NaN        180.0        NaN      Normal  156.0               N      1.0      Flat           1.0
2           2  37.0    M            ATA      130.0        283.0        0.0          ST    NaN               N      0.0        Up           0.0
3           3  48.0    F            ASY      138.0        214.0        0.0      Normal  108.0               Y      1.5      Flat           1.0
4           4  54.0    M            NAP      150.0        195.0        0.0      Normal  122.0               N      0.0        Up           0.0

In [7]: print('columns count - ',len(ds.columns), '\n')

print('columns: ',list(ds.columns))

columns count - 13

columns: ['Unnamed: 0', 'Age', 'Sex', 'ChestPainType', 'RestingBP', 'Cholesterol', 'FastingBS', 'RestingECG', 'MaxHR', 'ExerciseAngina', 'Oldpeak', 'ST_Slope', 'HeartDisease']

Missing data imputation

In [8]: ds.shape

Out[8]: (918, 13)

In [9]: ds.dtypes

Out[9]: Unnamed: 0 int64

Age float64
Sex object
ChestPainType object
RestingBP float64
Cholesterol float64
FastingBS float64
RestingECG object
MaxHR float64
ExerciseAngina object
Oldpeak float64
ST_Slope object
HeartDisease float64
dtype: object

In [10]: for col in ds.columns:
    if ds[col].isnull().values.any():
        print("Missing data in ", col, ds[col].isnull().sum())

Missing data in Age 45

Missing data in Sex 18
Missing data in ChestPainType 18
Missing data in RestingBP 36
Missing data in Cholesterol 82
Missing data in FastingBS 45
Missing data in RestingECG 27
Missing data in MaxHR 91
Missing data in ExerciseAngina 9
Missing data in Oldpeak 73
Missing data in ST_Slope 91
Missing data in HeartDisease 64
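
To put these counts in context, each can be expressed as a share of the 918 rows reported by `ds.shape`. The sketch below is an illustrative addition rather than a cell from the notebook; `missing_counts` and `missing_share` are hypothetical names, and the numbers are copied from the output above.

```python
# Counts copied from the output of cell 10; 918 is the row count from ds.shape.
N_ROWS = 918
missing_counts = {
    "Age": 45, "Sex": 18, "ChestPainType": 18, "RestingBP": 36,
    "Cholesterol": 82, "FastingBS": 45, "RestingECG": 27, "MaxHR": 91,
    "ExerciseAngina": 9, "Oldpeak": 73, "ST_Slope": 91, "HeartDisease": 64,
}

def missing_share(counts, n_rows):
    """Return {column: percent of rows missing}, rounded to one decimal."""
    return {col: round(100 * n / n_rows, 1) for col, n in counts.items()}

shares = missing_share(missing_counts, N_ROWS)
# Print the columns from the largest share of missing values to the smallest.
for col, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{col:15s} {pct:5.1f}%")
```

No column is missing more than roughly a tenth of its values, which is the kind of situation where simple median or mode imputation is commonly tried first.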

In [11]: def impute_na(df, variable, value):
    """Return column `variable` of `df` with missing values replaced by `value`."""
    return df[variable].fillna(value)

In [12]: Age_median = ds['Age'].median()
RestingBP_median = ds['RestingBP'].median()
Cholesterol_median = ds['Cholesterol'].median()
FastingBS_median = ds['FastingBS'].median()
MaxHR_median = ds['MaxHR'].median()
Oldpeak_median = ds['Oldpeak'].median()
# Series.mode() returns a Series (ties are possible), so take its first entry
# to get a scalar that fillna() applies to every missing row
Sex_mode = ds['Sex'].mode()[0]
ChestPainType_mode = ds['ChestPainType'].mode()[0]
RestingECG_mode = ds['RestingECG'].mode()[0]
ExerciseAngina_mode = ds['ExerciseAngina'].mode()[0]
ST_Slope_mode = ds['ST_Slope'].mode()[0]
HeartDisease_median = ds['HeartDisease'].median()

In [13]: # Numeric columns: replace missing values with the median
ds['Age'] = impute_na(ds, 'Age', Age_median)
ds['RestingBP'] = impute_na(ds, 'RestingBP', RestingBP_median)
ds['Cholesterol'] = impute_na(ds, 'Cholesterol', Cholesterol_median)
ds['FastingBS'] = impute_na(ds, 'FastingBS', FastingBS_median)
ds['MaxHR'] = impute_na(ds, 'MaxHR', MaxHR_median)
ds['Oldpeak'] = impute_na(ds, 'Oldpeak', Oldpeak_median)
ds['HeartDisease'] = impute_na(ds, 'HeartDisease', HeartDisease_median)

# Categorical columns: replace missing values with the most frequent category
ds['Sex'] = impute_na(ds, 'Sex', Sex_mode)
ds['ChestPainType'] = impute_na(ds, 'ChestPainType', ChestPainType_mode)
ds['RestingECG'] = impute_na(ds, 'RestingECG', RestingECG_mode)
ds['ExerciseAngina'] = impute_na(ds, 'ExerciseAngina', ExerciseAngina_mode)
ds['ST_Slope'] = impute_na(ds, 'ST_Slope', ST_Slope_mode)

# Forward-fill any categorical values that are still missing
ds['Sex'] = ds['Sex'].ffill()
ds['ChestPainType'] = ds['ChestPainType'].ffill()
ds['RestingECG'] = ds['RestingECG'].ffill()
ds['ExerciseAngina'] = ds['ExerciseAngina'].ffill()
ds['ST_Slope'] = ds['ST_Slope'].ffill()

In [14]: for col in ds.columns:
    if ds[col].isnull().values.any():
        print("Missing data in ", col, ds[col].isnull().sum())
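
The two imputation strategies used above (median for numeric columns, most frequent category for categorical ones) can be illustrated without pandas. This is a minimal sketch on toy lists, not the notebook's own code; `impute_median` and `impute_mode` are hypothetical helpers and `None` stands in for `NaN`.

```python
from collections import Counter
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def impute_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

ages = [40.0, None, 37.0, 48.0, 54.0]   # toy data, not the real Age column
sexes = ["M", None, "M", "F", "M"]      # toy data, not the real Sex column

print(impute_median(ages))
print(impute_mode(sexes))
```

The median is less sensitive to extreme values than the mean, which is one common reason to prefer it for columns such as Cholesterol.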

Categorical encoding
In [15]: ds.nunique()

Out[15]: Unnamed: 0 918

Age 50
Sex 2
ChestPainType 4
RestingBP 66
Cholesterol 217
FastingBS 2
RestingECG 3
MaxHR 118
ExerciseAngina 2
Oldpeak 51
ST_Slope 3
HeartDisease 2
dtype: int64

In [16]: ds['Sex'].unique()

Out[16]: array(['M', 'F'], dtype=object)

In [17]: ds['ChestPainType'].unique()

Out[17]: array(['ATA', 'NAP', 'ASY', 'TA'], dtype=object)

In [18]: ds['RestingECG'].unique()

Out[18]: array(['Normal', 'ST', 'LVH'], dtype=object)

In [19]: ds['ExerciseAngina'].unique()

Out[19]: array(['N', 'Y'], dtype=object)

In [20]: ds['ST_Slope'].unique()

Out[20]: array(['Up', 'Flat', 'Down'], dtype=object)

In [21]: from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

In [22]: ds['Sex'] = le.fit_transform(ds['Sex'])

ds['ChestPainType'] = le.fit_transform(ds['ChestPainType'])
ds['RestingECG'] = le.fit_transform(ds['RestingECG'])
ds['ExerciseAngina'] = le.fit_transform(ds['ExerciseAngina'])
ds['ST_Slope'] = le.fit_transform(ds['ST_Slope'])
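
`LabelEncoder` assigns integer codes according to the sorted order of the distinct labels, which is why `F`/`M` become 0/1 and `ASY`/`ATA`/`NAP`/`TA` become 0 to 3 in the encoded table. The sketch below reproduces that mapping in plain Python for illustration; `label_encode` is a hypothetical helper, not part of scikit-learn.

```python
def label_encode(values):
    """Map each distinct label to its position in the sorted label list."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return [index[v] for v in values], classes

# Toy sample using the ChestPainType categories seen in the dataset.
codes, classes = label_encode(["ATA", "NAP", "ATA", "ASY", "NAP", "TA"])
print(classes)
print(codes)
```

Because the same `le` object is re-fitted for every column, it only remembers the classes of the last one (`ST_Slope`); keeping one encoder per column would be needed to decode the other columns later. The integer codes also imply an ordering that the categories may not really have.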

In [23]: ds.head(10)

Out[23]:
   Unnamed: 0   Age  Sex  ChestPainType  RestingBP  Cholesterol  FastingBS  RestingECG  MaxHR  ExerciseAngina  Oldpeak  ST_Slope  HeartDisease
0           0  40.0    1              1      140.0        289.0        0.0           1  172.0               0      0.0         2           0.0
1           1  49.0    0              2      130.0        180.0        0.0           1  156.0               0      1.0         1           1.0
2           2  37.0    1              1      130.0        283.0        0.0           2  138.0               0      0.0         2           0.0
3           3  48.0    0              0      138.0        214.0        0.0           1  108.0               1      1.5         1           1.0
4           4  54.0    1              2      150.0        195.0        0.0           1  122.0               0      0.0         2           0.0
5           5  39.0    1              2      120.0        339.0        0.0           1  138.0               0      0.0         2           0.0
6           6  45.0    0              1      130.0        237.0        0.0           1  170.0               0      0.0         2           0.0
7           7  54.0    1              1      110.0        208.0        0.0           1  142.0               0      0.0         2           0.0
8           8  37.0    1              0      140.0        207.0        0.0           1  130.0               1      1.5         1           1.0
9           9  48.0    0              1      120.0        284.0        0.0           1  120.0               1      0.0         2           1.0

In [24]: def diagnostic_plots(df, variable):
    # function takes a dataframe (df) and
    # the variable of interest as arguments

    # define figure size
    plt.figure(figsize=(16, 4))

    # histogram
    plt.subplot(1, 3, 1)
    sns.histplot(df[variable], bins=30)
    plt.title('Histogram')

    # Q-Q plot
    plt.subplot(1, 3, 2)
    stats.probplot(df[variable], dist="norm", plot=plt)
    plt.ylabel('Variable quantiles')

    # boxplot
    plt.subplot(1, 3, 3)
    sns.boxplot(y=df[variable])
    plt.title('Boxplot')

    plt.show()

In [25]: diagnostic_plots(ds, 'Age')

In [26]: diagnostic_plots(ds, 'RestingBP')

In [27]: diagnostic_plots(ds, 'Cholesterol')

In [28]: diagnostic_plots(ds, 'MaxHR')

In [29]: diagnostic_plots(ds, 'FastingBS')

In [30]: diagnostic_plots(ds, 'Oldpeak')
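
The histogram and Q-Q plot are read by eye. A numeric cross-check that is sometimes used alongside them is the skewness coefficient: close to 0 for a symmetric distribution, positive for a long right tail. The following is a minimal sketch on toy data, not a cell from the notebook; `skewness` is a hypothetical helper (`scipy.stats.skew` is the library counterpart).

```python
def skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [0, 0, 0, 0, 1, 1, 2, 5]   # toy data with a long right tail

print(skewness(symmetric))
print(skewness(right_skewed))
```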

Data Scaling
In [31]: from sklearn.preprocessing import MinMaxScaler,StandardScaler
mms = MinMaxScaler() # Normalization
ss = StandardScaler() # Standardization

ds['Oldpeak'] = mms.fit_transform(ds[['Oldpeak']])
ds['Age'] = ss.fit_transform(ds[['Age']])
ds['RestingBP'] = ss.fit_transform(ds[['RestingBP']])
ds['Cholesterol'] = ss.fit_transform(ds[['Cholesterol']])
ds['MaxHR'] = ss.fit_transform(ds[['MaxHR']])
ds.head()

Out[31]:
   Unnamed: 0        Age  Sex  ChestPainType  RestingBP  Cholesterol  FastingBS  RestingECG      MaxHR  ExerciseAngina   Oldpeak  ST_Slope  HeartDisease
0           0  -1.473387    1              1   0.427330     0.846142        0.0           1   1.443735               0  0.295455         2           0.0
1           1  -0.496724    0              2  -0.127534    -0.202998        0.0           1   0.780688               0  0.409091         1           1.0
2           2  -1.798941    1              1  -0.127534     0.788391        0.0           2   0.034759               0  0.295455         2           0.0
3           3  -0.605242    0              0   0.316357     0.124257        0.0           1  -1.208455               1  0.465909         1           1.0
4           4   0.045866    1              2   0.982193    -0.058621        0.0           1  -0.628288               0  0.295455         2           0.0

A machine learning model does not understand the units in which features are measured. It treats every input as a plain number and has no notion of what that number actually represents. This is why the data needs to be scaled.

There are two options for scaling: 1) normalization and 2) standardization. Many algorithms assume, or work best when, the data is roughly normally (Gaussian) distributed. Normalization is therefore applied to features whose values do not follow a normal distribution, while standardization is applied to features that are approximately normally distributed but whose values are very large or very small compared with the other features.

Normalization: the Oldpeak feature was normalized (MinMaxScaler) because its distribution is right-skewed. Standardization: Age, RestingBP, Cholesterol and MaxHR were standardized (StandardScaler) because these features are approximately normally distributed.
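
The two transformations can be written out by hand to show what the scalers compute. This is a minimal sketch in plain Python, not the scikit-learn implementation. The Oldpeak extremes of -2.6 and 6.2 are an assumption inferred from the scaled values in Out[31] (0.0 maps to 0.295455, 1.0 to 0.409091, 1.5 to 0.465909); the notebook does not print them.

```python
from statistics import fmean, pstdev

def min_max_scale(values):
    """Normalization: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: subtract the mean, divide by the population std."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Toy Oldpeak-like values; -2.6 and 6.2 are the assumed column extremes.
oldpeak = [-2.6, 0.0, 1.0, 1.5, 6.2]
print([round(v, 6) for v in min_max_scale(oldpeak)])

# Toy values with mean 5 and population standard deviation 2.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

Normalization keeps the shape of the distribution and only squeezes it into [0, 1]; standardization recentres the feature at 0 with unit variance, so the standardized columns above contain both negative and positive values.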

Modeling
In [32]: from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from lightgbm import LGBMClassifier
from sklearn.neural_network import MLPClassifier

In [33]: X = ds.drop(['HeartDisease'] ,axis = 1)

y = ds['HeartDisease']
X

Out[33]:
     Unnamed: 0        Age  Sex  ChestPainType  RestingBP  Cholesterol  FastingBS  RestingECG      MaxHR  ExerciseAngina   Oldpeak  ST_Slope
  0           0  -1.473387    1              1   0.427330     0.846142        0.0           1   1.443735               0  0.295455         2
  1           1  -0.496724    0              2  -0.127534    -0.202998        0.0           1   0.780688               0  0.409091         1
  2           2  -1.798941    1              1  -0.127534     0.788391        0.0           2   0.034759               0  0.295455         2
  3           3  -0.605242    0              0   0.316357     0.124257        0.0           1  -1.208455               1  0.465909         1
  4           4   0.045866    1              2   0.982193    -0.058621        0.0           1  -0.628288               0  0.295455         2
...         ...        ...  ...            ...        ...          ...        ...         ...        ...             ...       ...       ...
913         913  -0.930797    1              3  -1.237261     0.210883        0.0           1  -0.213883               0  0.431818         1
914         914   1.565119    1              0   0.649275    -0.077871        1.0           1   0.159081               0  0.681818         1
915         915   0.371420    1              0  -0.127534    -0.674630        0.0           1  -0.918371               1  0.431818         1
916         916   0.371420    0              1  -0.127534     0.336010        0.0           0   1.526616               0  0.352273         1
917         917  -1.690423    1              2   0.316357    -0.251124        0.0           1   1.485176               0  0.295455         2

918 rows × 12 columns

In [34]: X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=0.2, random_state=21)
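
With `test_size=0.2` and 918 rows, the split sizes can be worked out by hand. The sketch below is an illustrative addition; it assumes the test share is rounded up and the remainder goes to training, and it cross-checks that assumption against the Logistic Regression test accuracy reported in the output of cell 36.

```python
import math

n_samples = 918          # rows in X, from the output of cell 33
test_size = 0.2          # as passed to train_test_split

# Assumption: the test share is rounded up and the rest goes to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)

# Cross-check: an accuracy must equal k / n_test for some integer k.
reported_test_accuracy = 0.8315217391304348   # Logistic Regression, cell 36
correct = round(reported_test_accuracy * n_test)
print(correct, abs(correct / n_test - reported_test_accuracy) < 1e-12)
```

A test set of this size means each additional correct prediction moves test accuracy by about half a percentage point, which is worth keeping in mind when comparing models whose scores differ only slightly.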

In [35]: models = [
    {
        "name": "Logistic Regression",
        "estimator": LogisticRegression(),
        "hyperparameters": {
            "penalty": ["l2"],
            "C": [0.01, 0.1, 1, 10],
            "max_iter": [500]
        }
    },
    {
        "name": "Gradient Boosting",
        "estimator": GradientBoostingClassifier(),
        "hyperparameters": {
            "n_estimators": [100],
            "learning_rate": [0.1],
            "max_depth": [3]
        }
    },
    {
        "name": "Random Forest",
        "estimator": RandomForestClassifier(),
        "hyperparameters": {
            "n_estimators": [100, 200, 300],
            "max_depth": [3, 5, 10],
            "min_samples_split": [2, 5, 10],
            "min_samples_leaf": [1, 2, 4]
        }
    },
    {
        "name": "Decision Tree",
        "estimator": DecisionTreeClassifier(),
        "hyperparameters": {
            "criterion": ["gini", "entropy"],
            "max_depth": [3, 5, 10],
            "min_samples_split": [2, 5, 10],
            "min_samples_leaf": [1, 2, 4]
        }
    },
    {
        "name": "K-Nearest Neighbors",
        "estimator": KNeighborsClassifier(),
        "hyperparameters": {
            "n_neighbors": [3, 5, 7],
            "weights": ["uniform", "distance"],
            "algorithm": ["auto", "ball_tree", "kd_tree", "brute"]
        }
    },
    {
        "name": "Naive Bayes",
        "estimator": GaussianNB(),
        "hyperparameters": {
            "var_smoothing": [1e-9, 1e-10, 1e-11, 1e-12]
        }
    },
    {
        "name": "AdaBoost",
        "estimator": AdaBoostClassifier(),
        "hyperparameters": {
            "n_estimators": [50, 100, 200],
            "learning_rate": [0.01, 0.1, 1],
            "algorithm": ["SAMME", "SAMME.R"]
        }
    },
]

Choose the best parameters for each model

In [36]: accuracies = []
train_accuracies = []
best_models = {}

for model in models:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print(f"Training {model['name']}...")
        grid_search = GridSearchCV(
            estimator=model['estimator'],
            param_grid=model['hyperparameters'],
            scoring='accuracy',
            cv=10
        )
        grid_search.fit(X_train, y_train)

    # evaluate the model's performance
    best_model = grid_search.best_estimator_

    # Calculate training accuracy
    y_train_pred = best_model.predict(X_train)
    train_accuracy = accuracy_score(y_train, y_train_pred)
    train_accuracies.append((model['name'], train_accuracy))

    # Calculate testing accuracy
    y_test_pred = best_model.predict(X_test)
    test_accuracy = accuracy_score(y_test, y_test_pred)
    accuracies.append((model['name'], test_accuracy))

    best_models[model['name']] = best_model

    print(f"Best parameters for {model['name']}:{grid_search.best_params_}")
    print("\033[1m--------------------------------------------------------\033[0m")
    print(f"Training accuracy for {model['name']}:{train_accuracy}")
    print("\033[1m--------------------------------------------------------\033[0m")
    print(f"Testing accuracy for {model['name']}:{test_accuracy}")
    print("\033[1m--------------------------------------------------------\033[0m")

Training Logistic Regression...

Best parameters for Logistic Regression:{'C': 10, 'max_iter': 500, 'penalty': 'l2'}
--------------------------------------------------------
Training accuracy for Logistic Regression:0.8174386920980926
--------------------------------------------------------
Testing accuracy for Logistic Regression:0.8315217391304348
--------------------------------------------------------
Training Gradient Boosting...
Best parameters for Gradient Boosting:{'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
--------------------------------------------------------
Training accuracy for Gradient Boosting:0.9223433242506812
--------------------------------------------------------
Testing accuracy for Gradient Boosting:0.8478260869565217
--------------------------------------------------------
Training Random Forest...
Best parameters for Random Forest:{'max_depth': 5, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}
--------------------------------------------------------
Training accuracy for Random Forest:0.8746594005449592
--------------------------------------------------------
Testing accuracy for Random Forest:0.8532608695652174
--------------------------------------------------------
Training Decision Tree...
Best parameters for Decision Tree:{'criterion': 'gini', 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 2}
--------------------------------------------------------
Training accuracy for Decision Tree:0.8787465940054496
--------------------------------------------------------
Testing accuracy for Decision Tree:0.7989130434782609
--------------------------------------------------------
Training K-Nearest Neighbors...
Best parameters for K-Nearest Neighbors:{'algorithm': 'auto', 'n_neighbors': 7, 'weights': 'distance'}
--------------------------------------------------------
Training accuracy for K-Nearest Neighbors:1.0
--------------------------------------------------------
Testing accuracy for K-Nearest Neighbors:0.6739130434782609
--------------------------------------------------------
Training Naive Bayes...
Best parameters for Naive Bayes:{'var_smoothing': 1e-09}
--------------------------------------------------------
Training accuracy for Naive Bayes:0.8174386920980926
--------------------------------------------------------
Testing accuracy for Naive Bayes:0.842391304347826
--------------------------------------------------------
Training AdaBoost...
Best parameters for AdaBoost:{'algorithm': 'SAMME.R', 'learning_rate': 0.1, 'n_estimators': 50}
--------------------------------------------------------
Training accuracy for AdaBoost:0.8365122615803815
--------------------------------------------------------
Testing accuracy for AdaBoost:0.8532608695652174
--------------------------------------------------------
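
A grid search tries every combination of the listed hyperparameter values and scores each one with cross-validation. The sketch below shows that enumeration for the Random Forest grid from cell 35 in plain Python. It is an illustration of the idea, not scikit-learn's implementation; `expand_grid` is a hypothetical helper.

```python
from itertools import product

def expand_grid(grid):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))

# The Random Forest grid from cell 35.
rf_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

candidates = list(expand_grid(rf_grid))
cv_folds = 10   # the cv value passed to GridSearchCV in cell 36
print(len(candidates), "candidates,", len(candidates) * cv_folds, "fits")
print(candidates[0])
```

The number of fits grows multiplicatively with each added hyperparameter, which explains why the Random Forest search takes noticeably longer than, for example, the single-candidate Gradient Boosting grid. With its default settings `GridSearchCV` also refits the winning candidate once on the whole training set.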

Create the models with the best hyperparameters

In [37]: # create the Logistic Regression model with the best hyperparameters
log_reg_model = LogisticRegression(
    C=10,
    max_iter=500,
    penalty='l2'
)

# create the Random Forest model with the best hyperparameters
# (values taken from the grid search output above)
rf_model = RandomForestClassifier(
    max_depth=5,
    min_samples_leaf=2,
    min_samples_split=2,
    n_estimators=100
)

# create the Gradient Boosting model with the best hyperparameters
gb_model = GradientBoostingClassifier(
    learning_rate=0.1,
    max_depth=3,
    n_estimators=100
)

# create the Decision Tree model with the best hyperparameters
# (values taken from the grid search output above)
dt_model = DecisionTreeClassifier(
    criterion='gini',
    max_depth=5,
    min_samples_leaf=1,
    min_samples_split=2
)

# create the K-Nearest Neighbors model with the best hyperparameters
knn_model = KNeighborsClassifier(
    algorithm='auto',
    n_neighbors=7,
    weights='distance'
)

# create the Naive Bayes model with the best hyperparameters
nb_model = GaussianNB(
    var_smoothing=1e-09
)

# create the AdaBoost model with the best hyperparameters
ab_model = AdaBoostClassifier(
    algorithm='SAMME.R',
    learning_rate=0.1,
    n_estimators=50
)

Train models
In [38]: # Train Logistic Regression model
log_reg_model.fit(X_train, y_train)

# Train Random Forest model

rf_model.fit(X_train, y_train)

# Train Gradient Boosting model

gb_model.fit(X_train, y_train)

# Train Decision Tree model

dt_model.fit(X_train, y_train)

# Train K-Nearest Neighbors model

knn_model.fit(X_train, y_train)

# Train Naive Bayes model

nb_model.fit(X_train, y_train)

# Train AdaBoost model

ab_model.fit(X_train, y_train)

Out[38]: AdaBoostClassifier(learning_rate=0.1)

Features importance
In [39]: # fit the model whose importances are plotted (Gradient Boosting)
gb_model.fit(X_train, y_train)

# get feature importances
importances = gb_model.feature_importances_

# get feature names

feature_names = X.columns

# sort feature importances in descending order

indices = np.argsort(importances)[::-1]

# plot feature importances

plt.figure(figsize=(10,5))
plt.title("Feature Importances")
plt.bar(range(len(indices)), importances[indices])
plt.xticks(range(len(indices)), feature_names[indices], rotation='vertical')
plt.show()
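
The `np.argsort(importances)[::-1]` step orders the features from most to least important before plotting. The same ranking can be sketched in plain Python as shown below. The importance values here are made up purely for illustration and are not the model's results; `rank_features` is a hypothetical helper.

```python
def rank_features(names, importances):
    """Return (name, importance) pairs sorted from most to least important."""
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)

# Made-up importances for illustration only; they are NOT the model's values.
names = ["Age", "MaxHR", "Oldpeak", "ST_Slope"]
importances = [0.10, 0.15, 0.25, 0.50]

for name, score in rank_features(names, importances):
    print(f"{name:10s} {score:.2f}")
```

Note that `X` still contains the `Unnamed: 0` column, which is just the row index of the CSV file, so it will also appear as a bar in the chart; any importance assigned to it says nothing about the patients themselves.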