CKD Prediction Project Report
A PROJECT REPORT
Submitted by
of
KOTHRIKALAN, SEHORE
DEC 2021
VIT BHOPAL UNIVERSITY, KOTHRIKALAN, SEHORE
MADHYA PRADESH – 466114
BONAFIDE CERTIFICATE
Certified that this project report titled “CKD PREDICTION USING ML” is the
knowledge, the work reported at this time is not related to any other
_______________ _______________
School of Computer Science and Engineering School of Computer Science and Engineering
Deep gratitude to Dr Mayuri A V R, Head of the Department, School of Computer Science and
Engineering for her invaluable support and encouragement throughout this project.
Our internal guide, Dr J. Manikandan, has provided valuable suggestions throughout the work on
Our thanks go out to all the technical and teaching staff at the School of Computer Science and
As a final note, we are deeply indebted to our parents who have given us tremendous support
List of Abbreviations
List of Figures and Graphs
List of Tables
Abstract
1 CHAPTER-1:
1.1 Introduction
2.4 Summary
3 CHAPTER-3: REQUIREMENT ARTIFACTS
3.1 Introduction
3.2.1 Hardware
3.2.2 Software
3.2.3 Libraries
3.3 Summary
4 CHAPTER-4:
5 CHAPTER-5:
5.1 Outline
5.2 Technical coding and code solutions
6 CHAPTER-6: PROJECT OUTCOME AND APPLICABILITY
6.1 Outline
6.2 Significant project outcomes
6.4 Inference
7 CHAPTER-7:
Appendix A
Appendix B
References
LIST OF FIGURES AND GRAPHS
3 Real-Time Usage
LIST OF TABLES
1 Hardware Requirements
ABSTRACT
Chronic kidney disease (CKD), or chronic renal disease, has become a major health issue with a
steadily growing incidence. It is now a cause of global morbidity and mortality, even in developing
countries. According to a report published by researchers and medical professionals from the
Department of Nephrology, the All India Institute of Medical Sciences, and the Director-General
Health Services, Ministry of Health and Family Welfare, Government of India, the approximate
prevalence of CKD is 800 per million population.
Early detection and characterization are considered critical factors in the management and
control of chronic kidney disease. Efficient data mining techniques can reveal and extract
hidden insights from clinical and laboratory patient data, which can help physicians identify the
severity stage of the disease as accurately as possible. Within the healthcare domain, this project
aims to build a model for risk level prediction in CKD that considers all of the symptoms and
causes contributing to it. The symptoms are the attributes that define the different stages of kidney
disease. Based on these stages, a set of patient records can be classified to identify the class of
kidney disease to which a patient may belong. Classifying patients also makes the dominant
attributes of CKD easy to recognize, and solutions targeted at those dominant attributes can then be
proposed to slow the progression of CKD.
To construct a model for risk prediction of kidney disease, various machine learning techniques can
be applied and their performance compared with respect to the accuracy, specificity and sensitivity
of the models. Before any machine learning technique is applied, feature selection is needed to
identify the dominant attributes. A feature selection method based on random forest is used to
select these dominant attributes. This paper is mainly concerned with the use of machine learning
techniques, namely neuro-fuzzy systems and clustering, the latter being a form of unsupervised
learning.
PURPOSE:
The objective of this project is to develop a model for risk level prediction in CKD while considering
all of the symptoms and causes involved. CKD causes waste to build up in the body; treatment can
help, but the condition cannot be cured. Lab tests or imaging are always required, and in later
stages, filtering the blood with a machine (dialysis) or a transplant may be required. As a result,
CKD is a substantial financial burden on patients, healthcare services, and the government.
With advances in technology, machine learning algorithms are now used to detect and predict
diseases with greater accuracy. The symptoms are the criteria that define the different stages of
kidney disease. By categorizing patient records according to these stages, one can identify the
class of kidney disease a patient might suffer from.
METHODOLOGY:
Data mining techniques have been used to define new and understandable patterns from which
classification templates are constructed. Supervised and unsupervised learning techniques require
the construction of models based on prior analysis and are used in medical and clinical diagnostics
for classification and regression. The four popular machine learning algorithms used are SVM, KNN,
decision tree, and random forest, which give the best diagnostic results. Machine learning
techniques build predictive/classification models in two stages:
1. the training phase, in which a model is constructed from a set of training data together with
the expected outputs; and
2. the validation stage, which estimates the quality of the trained model on a validation
dataset whose expected outputs are withheld from the model.
All four algorithms are supervised algorithms that can be used to solve classification and
regression problems.
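The two stages above can be sketched as follows. This is a minimal illustration rather than the project's actual training code: the data is synthetic (generated with `make_classification`), and the row count, feature count, split ratio and hyperparameters are all assumptions chosen only to make the example runnable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for patient records (assumption: 400 rows, 10 features).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)

# Stage 1 uses the training split; stage 2 uses the held-out validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# SVM and KNN are distance based, so they are scaled first; the tree models are not.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

def compare_models(models, X_train, y_train, X_val, y_val):
    """Train each model, then score its predictions against the held-out labels."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)      # training phase: features and expected outputs
        y_pred = model.predict(X_val)    # validation phase: the model sees features only
        scores[name] = accuracy_score(y_val, y_pred)
    return scores

scores = compare_models(models, X_train, y_train, X_val, y_val)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Keeping the validation labels away from `fit` is what makes the resulting accuracy an honest estimate of how each model would behave on unseen records.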
FINDINGS:
Through the process of working on this project, we came to the realization of just how much work
goes into writing, training and implementing a model. A project's success or failure is largely
determined by patience, teamwork, effort and time, in addition to technical knowledge and skills.
This project was designed to make human lives better, and with the completion of this project we
found out that with additional resources, the project could be further expanded to cover new
horizons, and ultimately benefit and better the lives of a large number of people across the world.
CHAPTER - 1
1.1 Introduction:
Our Chronic Kidney Disease Prediction model uses machine learning to make quick, easy and, most
importantly, accurate predictions of whether or not a person has the disease, based on the
symptoms entered by the user.
Our aim and motivation were to use our knowledge of machine learning to produce a model that can
easily, quickly and precisely predict chronic kidney disease. Detecting CKD through medical
institutions requires many tedious tests that cost both money and time, one or both of which are
often unavailable to middle-class and lower-middle-class people and those below the poverty line.
We therefore aim to make detection of the disease quicker, more cost-efficient and more accurate,
so that patients can spend on treating a confirmed disease rather than on trying to detect the
disease itself.
Our main objective is to create a program that takes input from users and outputs correct
prediction results. To make this possible we used ‘Matplotlib’, ‘Seaborn’, ‘Pandas’ and ‘Plotly Express’.
Along with this, we used the National Health Records to procure a dataset to train our model.
This work proposes a workflow to predict CKD status from clinical data, incorporating data
preprocessing, a missing value handling method based on collaborative filtering, and attribute
selection. The research also considers the practical aspects of data collection and highlights the
importance of incorporating domain knowledge when using machine learning for CKD status
prediction.
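The workflow described above (handle missing values, then select attributes) can be illustrated with a short sketch. It is not the project's implementation: the records are hypothetical toy data, `KNNImputer` is swapped in as a simple neighbour-based stand-in for the collaborative filtering step, and attributes are ranked by random forest importance, as mentioned in the abstract. Column names such as `marker_a` are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Hypothetical toy records: two informative columns and one pure-noise column.
n = 200
label = rng.integers(0, 2, size=n)
records = pd.DataFrame({
    "marker_a": label * 2.0 + rng.normal(0, 0.5, n),
    "marker_b": label * -1.5 + rng.normal(0, 0.5, n),
    "noise": rng.normal(0, 1, n),
})

# Blank out roughly 10% of the values to mimic incomplete clinical records.
missing = rng.random(records.shape) < 0.10
records = records.mask(missing)

# Step 1: fill each gap from the most similar rows
# (a stand-in for the collaborative filtering method, not the same algorithm).
imputer = KNNImputer(n_neighbors=5)
filled = pd.DataFrame(imputer.fit_transform(records), columns=records.columns)

# Step 2: rank the attributes by random forest importance and keep the top two.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(filled, label)
ranking = pd.Series(forest.feature_importances_, index=filled.columns)
ranking = ranking.sort_values(ascending=False)
selected = list(ranking.index[:2])

print(ranking)
print("Selected attributes:", selected)
```

In a real setting, the number of attributes to keep would be chosen with domain knowledge and validation results rather than fixed at two.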
The team divided the work evenly. While everyone contributed equally to the core project code and
the accompanying presentations, other tasks, such as designing the website and creating the
various recommendation lists and links, were handled individually by some members.
1.6 Summary:
The reason our project works so well is that we have implemented a personal touch that you
usually cannot find in the other corporate-mandated apps and websites. We aim to please and
relieve our users and help them identify their illness at an early stage and avail medical help at the
earliest.
CHAPTER - 2
2.1 Introduction:
With the initial idea settled, we began learning as much as we could about Chronic Kidney Disease,
its symptoms, and how to implement its prediction with the tools at hand.
Fuzzy logic has been applied to the classification of patients with CKD, but a few of the existing
classifiers do not fit the data set in this context. Data mining techniques are used to draw specific
conclusions about the characteristics of different types of patients with kidney disease.
On analyzing the currently available models, we also found that some of the machine learning
approaches being considered are not viable for large volumes of data.
2.4 Summary:
It is always teamwork and investigation that make a project like this successful. Working
together as a single unit, where everyone contributes equally to bring forth something of true
value and promise while learning many new things along the way, is something truly special.
CHAPTER-3
REQUIREMENT ARTIFACTS
3.1 Introduction:
To translate our project into reality, we used various methods and artifacts to make it possible just
the way we had it in our minds.
3.2.1 Hardware:
Table 1: Hardware Requirements
Component    Requirement
RAM          4 GB
GPU          Not required
Storage      5 GB HDD
Webcam       Not required
3.2.2 Software:
● VS Code
● NumPy
● Spyder
● Jupyter
3.2.3 Libraries:
● Matplotlib
● Seaborn
● Pandas
● Plotly Express
Our primary goal was to build software that takes input from patients about the symptoms they are
experiencing and accurately predicts whether or not they have chronic kidney disease and
whether they should contact their doctors.
5.1 Outline:
The main body of the code loads the kidney disease dataset, cleans and encodes it, selects the
most relevant features, trains the classifiers and shows the prediction in a small desktop window.
Given below is the full program, complete with solutions and output based on running the code.
import pandas as pd  # data loading and manipulation
import numpy as np  # numerical operations on data
import matplotlib.pyplot as plt  # static, animated and interactive plots
import seaborn as sns  # statistical plots
import plotly.express as px  # interactive plots
import tkinter as tk  # desktop GUI
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
# Load the dataset and the description file that maps abbreviated column names to full names.
df = pd.read_csv(r'C:\Users\RohanRVC\Documents\Kidney_disease_predicition/kidney_disease.csv')
df.head()
columns = pd.read_csv('C:/Users/RohanRVC/Documents/Kidney_disease_predicition/data_description.txt', sep='-')
columns = columns.reset_index()
columns.columns = ['cols', 'abb_col_names']
columns
# Replace the abbreviated column names with the full names.
df.columns = columns['abb_col_names'].values
df.head()
df.dtypes
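def convert_dtype(df, feature):
    # Coerce a column to numeric; unparseable entries become NaN.
    df[feature] = pd.to_numeric(df[feature], errors='coerce')
# Call site missing from the listing. Assumption: these text-typed columns are converted (first two names assumed).
for feature in ['packed cell volume', 'white blood cell count', 'red blood cell count']:
    convert_dtype(df, feature)
df.dtypes
df.drop('id', axis=1, inplace=True)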
def extract_cat_num(df):
    # Split the columns into categorical (object dtype) and numerical ones.
    cat_col = [col for col in df.columns if df[col].dtype == 'object']
    num_col = [col for col in df.columns if df[col].dtype != 'object']
    return cat_col, num_col

cat_col, num_col = extract_cat_num(df)
cat_col
num_col
# Remove stray tab characters from the category labels.
df['diabetes mellitus'] = df['diabetes mellitus'].replace(to_replace={'\tno': 'no', '\tyes': 'yes'})
df['coronary artery disease'] = df['coronary artery disease'].replace(to_replace='\tno', value='no')
df['class'] = df['class'].replace(to_replace='ckd\t', value='ckd')
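# Histograms of the numerical features.
plt.figure(figsize=(30, 20))
for i, feature in enumerate(num_col):
    plt.subplot(5, 3, i + 1)
    df[feature].hist()
    plt.title(feature)
len(cat_col)
# Count plots of the categorical features.
plt.figure(figsize=(20, 20))
for i, feature in enumerate(cat_col):
    plt.subplot(4, 3, i + 1)
    sns.countplot(x=df[feature])
# Class balance, drawn on its own figure.
plt.figure()
sns.countplot(x=df['class'])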
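# Summary statistics of red blood cell count, grouped by red blood cell status and class.
df.groupby(['red blood cells', 'class'])['red blood cell count'].agg(['count', 'mean', 'median', 'min', 'max'])
df.columns
# Density of red blood cell count for each class.
grid = sns.FacetGrid(df, hue='class', aspect=2)
grid.map(sns.kdeplot, 'red blood cell count')
grid.add_legend()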
# Helper functions that automate the analysis.
def violin(col):
    # Violin plot of one column per class ('class' on the x-axis; the listing had x='color').
    fig = px.violin(df, y=col, x='class', color='class', box=True)
    return fig.show()

def scatters(col1, col2):
    # Scatter plot of two columns, coloured by class.
    fig = px.scatter(df, x=col1, y=col2, color='class')
    return fig.show()

# Plot the density curve of any column, per class, with a single call.
def kde_plot(feature):
    grid = sns.FacetGrid(df, hue='class', aspect=2)
    grid.map(sns.kdeplot, feature)
    grid.add_legend()
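data = df.copy()
data.head()
data[num_col].isnull().sum()
# The listing references random_sample without defining it. Assumption: missing numeric
# values were filled by sampling from the observed values of the same column.
def impute_random(feature):
    n_missing = data[feature].isnull().sum()
    random_sample = data[feature].dropna().sample(n_missing, random_state=0)
    random_sample.index = data[data[feature].isnull()].index
    data.loc[data[feature].isnull(), feature] = random_sample
for col in num_col:
    impute_random(col)
data[num_col].isnull().sum()
data[cat_col].isnull().sum()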
def impute_mode(feature):
    # Fill missing categorical values with the most frequent value of the column.
    mode = data[feature].mode()[0]
    data[feature] = data[feature].fillna(mode)

for col in cat_col:
    impute_mode(col)
data[cat_col].isnull().sum()
data.head()
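# Convert categorical columns to numbers with label encoding,
# for example normal -> 0 and abnormal -> 1.
le = LabelEncoder()
# The encoding loop is missing from the listing. Assumption: every categorical column is encoded.
for col in cat_col:
    data[col] = le.fit_transform(data[col])
data.head()
# ind_col and dep_col are not defined in the listing. Assumption: 'class' is the target
# and all remaining columns are the independent features.
ind_col = [col for col in data.columns if col != 'class']
dep_col = 'class'
X = data[ind_col]
y = data[dep_col]
X.head()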
# Score every feature against the target with the chi-squared test.
ordered_rank_features = SelectKBest(score_func=chi2, k=20)
ordered_feature = ordered_rank_features.fit(X, y)
ordered_feature.scores_
datascores = pd.DataFrame(ordered_feature.scores_, columns=['Score'])
dfcols = pd.DataFrame(X.columns)
# Pair each feature name with its score.
features_rank = pd.concat([dfcols, datascores], axis=1)
features_rank.columns = ['features', 'Score']
features_rank
# Keep the ten highest-scoring features.
features_rank.nlargest(10, 'Score')
selected_columns = features_rank.nlargest(10, 'Score')['features'].values
selected_columns
X_new = data[selected_columns]
X_new.head()
X_new.shape
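# The split is missing from the listing. Assumption: a 75/25 train/test split with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.25, random_state=0)
print(X_train.shape)
print(X_test.shape)
y_train.value_counts()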
# Hyperparameter search space for XGBoost.
params = {
    'learning_rate': [0.05, 0.20, 0.25],
    'max_depth': [5, 8, 10],
    'min_child_weight': [1, 3, 5, 7],
    'gamma': [0.0, 0.1, 0.2, 0.4],
    'colsample_bytree': [0.3, 0.4, 0.7],
}
classifier = XGBClassifier()
random_search = RandomizedSearchCV(classifier, param_distributions=params, n_iter=5,
                                   scoring='roc_auc', n_jobs=-1, cv=5, verbose=3)
random_search.fit(X_train, y_train)
random_search.best_estimator_
random_search.best_params_
# Use the tuned model; RandomizedSearchCV refits it on the whole training split by default.
classifier = random_search.best_estimator_
y_pred = classifier.predict(X_test)
y_pred
confusion_matrix(y_test, y_pred)
accuracy_score(y_test, y_pred)
print("Accuracy is ", int(accuracy_score(y_test, y_pred) * 100), "%")
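The abstract names accuracy, specificity and sensitivity as the comparison criteria, while the listing above reports accuracy only. A minimal sketch of how the other two could be read off the confusion matrix is given below. The helper name `sensitivity_specificity` and the example labels are hypothetical, and the function assumes binary 0/1 labels. If the labels `ckd` and `notckd` are label encoded, `ckd` becomes 0, so `positive=0` would be the appropriate setting for this dataset.

```python
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary 0/1 labels."""
    negative = 1 - positive
    # With labels=[negative, positive], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[negative, positive]).ravel()
    sensitivity = tp / (tp + fn)  # share of diseased patients that were detected
    specificity = tn / (tn + fp)  # share of healthy patients correctly cleared
    return sensitivity, specificity

# Hypothetical labels: 1 = disease present, 0 = healthy.
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true_demo, y_pred_demo)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

For a screening tool, sensitivity usually matters more than raw accuracy, because a missed case is costlier than a false alarm that a follow-up test can rule out.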
class Predictor:
    # As listed, this class reads a heart-disease dataset and its 13 feature columns.
    def train(self):
        df = pd.read_csv(r'C:\Users\RohanRVC\Documents\heart attack predictor/dataset.csv')
        dataset = df
        self.standardScaler = StandardScaler()
        columns_to_scale = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
                            'exang', 'oldpeak', 'slope', 'ca', 'thal']
        dataset[columns_to_scale] = self.standardScaler.fit_transform(dataset[columns_to_scale])
        y = dataset['target']
        X = dataset.drop(['target'], axis=1)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
        self.knn_classifier = KNeighborsClassifier(n_neighbors=8)
        # Fit on the training split only, so the test score is not inflated by leakage.
        self.knn_classifier.fit(X_train, y_train)
        score = self.knn_classifier.score(X_test, y_test)
        print('--Training Complete--')
        print('Score: ' + str(score))

    def predict(self, row):
        # row holds the 13 feature values in the same order as columns_to_scale.
        user_df = np.array(row, dtype=float).reshape(1, 13)
        user_df = self.standardScaler.transform(user_df)
        predicted = self.knn_classifier.predict(user_df)
        print("Predicted: " + str(predicted[0]))
        return predicted[0]

def onClick():
    # age, gender, cp, ... are the tk input variables of the form; their definitions
    # are not part of this listing.
    row = [[age.get(), gender.get(), cp.get(), tbps.get(), chol.get(), fbs.get(), restecg.get(),
            thalach.get(), exang.get(), oldpeak.get(), slope.get(), ca.get(), thal.get()]]
    print(row)
    predictor = Predictor()
    predictor.train()  # the model must be trained before it can predict
    # The listing called an undefined has_disease(); assumption: label 1 means disease present.
    o = predictor.predict(row) == 1
    root2 = tk.Toplevel(root)
    root2.title("Prediction Window")
    if o:
        print("Person Has Chronic Kidney Disease")
        la = "Person Has Chronic Kidney Disease"
        tk.Label(root2, text=la, font=("times new roman", 20), fg="white", bg="maroon",
                 height=2).grid(row=15, column=1)
    else:
        print("Person Is Healthy")
        la = "Person Is Healthy"
        tk.Label(root2, text=la, font=("times new roman", 20), fg="white", bg="green",
                 height=2).grid(row=15, column=1)
    return True

root = tk.Tk()
root.title("Chronic Kidney Disease Predictor")
tk.Label(root, text="Fill your Details", font=("times new roman", 12)).grid(row=0)
root.mainloop()
6.1 Outline
The project was designed with a simple need in mind: to help predict chronic kidney disease at an
early stage while keeping the process quick, easy and cost-effective.
The program achieved the original objective of using our knowledge of machine learning to
produce a model that can easily, quickly and precisely predict chronic kidney disease. Detecting
CKD through medical institutions requires many tedious tests that cost both money and time, one
or both of which are often unavailable to middle-class and lower-middle-class people and those
below the poverty line. We therefore aim to make detection of the disease quicker, more
cost-efficient and more accurate, so that patients can spend on treating a confirmed disease
rather than on trying to detect the disease itself.
The project can be used at home as a personal accessibility tool. It can also be used in offices
and workplaces to help employees work in good health without the looming possibility of
undetected disease.
6.4 Inference
Working on this project has taught us that it is important to make projects of this nature
accessible to the public so that the maximum number of people can benefit from them. The effort
and time spent on designing such projects ultimately contribute towards making human lives
easier, and hence more such projects should be actively encouraged and funded.
CHAPTER - 7
7.1 Outline
As a final note, we would like to thank our faculty and project guides for giving us enough time to
explore this wonderful topic and create something that would be helpful to anyone who needs it.
Our study was built upon data that includes age, sex, comorbidities, and medications.
However, laboratory test results were not included. Therefore, our approach is appropriate for a
population study but is not recommended for assisting clinicians in assessing the risk for an
individual patient. For clinical practice, decisions based on laboratory tests would be more reliable.
Adapting our method for clinical application would require further analysis and evaluation in a
clinical trial. We would also need to evaluate the model on more recent data to identify potential
model drift. If the model were used for decisions such as whether to set up a new dialysis center or
to launch a public awareness campaign, the performance metrics we obtained are adequate. These
models can also aid in proactively identifying CKD-susceptible individuals in an entire region or in
large groups without the need for laborious physical tests. The study is also limited by geography
and demographics, which constrains the generalizability of the model to the global population or
to a different region. Noise in the data from human and technical errors, which is difficult to
identify, may also affect the performance of the model.
In the near future, if our idea gains enough recognition and support, we would like to expand
our model to predict other diseases beyond Chronic Kidney Disease, such as heart disease. We
would like to keep adding to it so that we can help the healthcare industry with our knowledge and
technology as much as we can and make this world a better place.
7.4 Inference
Working on this project provided us with invaluable technical and teamwork experience. We also
realized that the effort and time spent on designing such projects are ultimately contributing towards
making human lives easier. Hence, such projects need to be encouraged and funded.