Thyroid Disease Classification Using Machine Learning Project
The thyroid gland is a vascular gland and one of the most important organs of the human body.
The two main types of thyroid disorder are hyperthyroidism and hypothyroidism.
Thyroid-related blood tests are used to detect these disorders, but the resulting data is often ambiguous and noisy.
Data-cleansing methods were used to prepare the data so that the analytics can indicate a patient's risk of developing the disease.
Machine learning plays a decisive role in disease prediction.
The machine learning algorithms SVM (Support Vector Machine), Random Forest Classifier, XGB Classifier and ANN (Artificial Neural Network) are used to predict a patient's risk of developing thyroid disease.
Technical Architecture:
1.1 Overview:
Define Problem / Problem Understanding
o Specify the business problem
o Business requirements
o Literature Survey
o Social or Business Impact
Data Collection & Preparation
o Collect the dataset
o Data Preparation
Exploratory Data Analysis
o Descriptive statistics
o Visual Analysis
Model Building
o Training the model in multiple algorithms
o Testing the model
Performance Testing & Hyperparameter Tuning
o Testing model with multiple evaluation metrics
o Comparing model accuracy before & after applying hyperparameter tuning
Model Deployment
o Save the best model
o Integrate with Web Framework
Project Demonstration & Documentation
o Record an explanation video of the project's end-to-end solution
o Project documentation: step-by-step project development procedure
Milestone 1: Define Problem / Problem Understanding
There are many popular open sources for collecting data, e.g. kaggle.com and the UCI repository.
In this project, we used the drug200.csv data, which was downloaded from kaggle.com.
Please refer to the link given below to download the dataset.
Link: https://www.kaggle.com/prathamtripathi/drug-classification
Our dataset may be in .csv, Excel, .txt, .json or another format. We can read the dataset with the help of pandas.
In pandas, the read_csv() function reads a CSV dataset. As its parameter, we pass the path of the CSV file.
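A minimal sketch of this loading step. The file name thyroid_data.csv and the helper load_dataset are hypothetical placeholders; substitute the path of the CSV you downloaded. pd.read_csv() accepts either a file path or an open file-like object.

```python
import pandas as pd

# Hypothetical file name; replace with the path to your downloaded CSV file.
DATA_PATH = "thyroid_data.csv"


def load_dataset(path=DATA_PATH) -> pd.DataFrame:
    """Read the CSV dataset into a pandas DataFrame."""
    data = pd.read_csv(path)
    return data
```

Usage: data = load_dataset() reads the file, and data.head() shows the first five rows for a quick look at the columns.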
Activity 2: Data Pre-processing
Now that we understand the data, let's pre-process it.
The downloaded dataset is not suitable for training a machine learning model as-is, because it may contain a lot of noise, so we need to clean it properly to get good results. This activity includes the following steps.
Handling missing values
Descriptive analysis
Splitting the dataset into x and y
Handling categorical values
Checking correlation
Converting data types
Splitting the dataset into training and test sets
Handling imbalanced data
Applying StandardScaler
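The last three steps in this list can be sketched together, assuming x is an already encoded, numeric DataFrame and y holds the class labels. The source does not say which imbalance technique it used; this sketch swaps in simple random oversampling with sklearn.utils.resample (SMOTE from the imbalanced-learn package is a common alternative). The function name split_balance_scale and its default parameters are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample


def split_balance_scale(x: pd.DataFrame, y, test_size=0.2, random_state=42):
    # Stratified split keeps the class proportions similar in both sets
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # Random oversampling (training set only): resample each smaller class
    # with replacement until it matches the size of the largest class.
    train = x_train.copy()
    train["_target"] = np.asarray(y_train)
    largest = train["_target"].value_counts().max()
    parts = []
    for _, group in train.groupby("_target"):
        if len(group) < largest:
            group = resample(group, replace=True, n_samples=largest,
                             random_state=random_state)
        parts.append(group)
    balanced = pd.concat(parts)
    y_train_bal = balanced.pop("_target").to_numpy()

    # Fit the scaler on the training data only, then apply it to the test set
    scaler = StandardScaler()
    x_train_scaled = scaler.fit_transform(balanced)
    x_test_scaled = scaler.transform(x_test)
    return x_train_scaled, x_test_scaled, y_train_bal, np.asarray(y_test), scaler
```

Oversampling and scaler fitting happen after the split so that no information from the test set leaks into training.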
Activity 2.1: Checking for null values
To check for null values, we use the data.isnull() function, and we chain .sum() to count the null values in each column. From the image below, we found that there are no null values in our dataset, so we can skip the missing-value handling step.
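A minimal sketch of this check; the helper names count_nulls and has_missing_values are hypothetical.

```python
import pandas as pd


def count_nulls(data: pd.DataFrame) -> pd.Series:
    """Return the number of null (NaN/None) cells in each column."""
    # isnull() marks each missing cell as True; sum() adds up the Trues per column
    return data.isnull().sum()


def has_missing_values(data: pd.DataFrame) -> bool:
    # True if at least one column contains at least one null
    return bool(count_nulls(data).any())
```

Note that isnull() only detects NaN/None. Placeholders stored as strings (for example "?") are not counted, which is one reason converting object columns to float matters later.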
Removing redundant attributes from the dataset.
Checking whether any 'age' value is above 100, and dropping the rows with age > 100.
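A sketch of both steps. The source does not list which attributes it treats as redundant, so the names in REDUNDANT_COLUMNS are hypothetical, as is the helper drop_redundant_and_outliers.

```python
import pandas as pd

# Hypothetical column names; replace with the attributes you consider redundant.
REDUNDANT_COLUMNS = ["TSH_measured", "T3_measured"]


def drop_redundant_and_outliers(data: pd.DataFrame, max_age: int = 100) -> pd.DataFrame:
    # errors="ignore" skips any listed column that is not present
    data = data.drop(columns=REDUNDANT_COLUMNS, errors="ignore")
    too_old = data["age"] > max_age
    print(f"Rows with age > {max_age}: {int(too_old.sum())}")
    # Drop only those rows; rows with a missing age are kept
    return data[~too_old]
```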
Converting the data type from object to float so that the values are processed correctly, and checking info about the data.
Here, the columns 'TSH', 'T3', 'TT4', 'T4U', 'FTI' and 'TBG' hold object values, and we convert them to float.
Then we can check the data type information of the dataset with x.info().
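One way to do this conversion, sketched under the assumption that some cells may hold non-numeric placeholders. It uses pd.to_numeric with errors="coerce", so such cells become NaN instead of raising an error; the source only says the columns are converted to float, so this choice is an assumption. The helper name convert_to_float is hypothetical.

```python
import pandas as pd

# Columns the document lists as having the object data type
NUMERIC_COLUMNS = ["TSH", "T3", "TT4", "T4U", "FTI", "TBG"]


def convert_to_float(x: pd.DataFrame) -> pd.DataFrame:
    x = x.copy()
    for col in NUMERIC_COLUMNS:
        # errors="coerce" turns values that cannot be parsed as numbers into NaN;
        # astype(float) makes integer-looking columns float as well
        x[col] = pd.to_numeric(x[col], errors="coerce").astype(float)
    return x
```

Usage: x = convert_to_float(x) followed by x.info() should list these six columns as float64.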
Activity 2.4: Handling Categorical Values
Since our dataset contains categorical data, we must convert it to integer or binary encoding.
Encoding techniques convert categorical features into numerical features. There are several such techniques; in our project we use Ordinal Encoding and Label Encoding.
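A sketch with scikit-learn's OrdinalEncoder and LabelEncoder. The source does not say which columns get which encoder; here OrdinalEncoder is assumed for the categorical feature columns and LabelEncoder for the target, which is the use scikit-learn intends for LabelEncoder. The helper name encode is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder


def encode(x: pd.DataFrame, y, categorical_cols):
    x = x.copy()
    # OrdinalEncoder maps each category to an integer code (categories sorted per column)
    ordinal = OrdinalEncoder()
    x[categorical_cols] = ordinal.fit_transform(x[categorical_cols])
    # LabelEncoder maps the target classes to 0..n_classes-1
    label = LabelEncoder()
    y_encoded = label.fit_transform(y)
    # Return the fitted encoders so new data can be encoded the same way
    return x, y_encoded, ordinal, label
```

Keeping the fitted encoders lets the deployed app encode form input identically, and label.inverse_transform() maps predicted codes back to class names.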
Load the saved model. Importing the flask module into the project is mandatory. An object of the Flask class is our WSGI application. The Flask constructor takes the name of the current module (__name__) as its argument.
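A minimal sketch of these first lines of the app script, assuming the best model was saved with pickle. The helper load_model and the file name model.pkl are hypothetical.

```python
import pickle

from flask import Flask


def load_model(path):
    """Load a model that was saved earlier with pickle.dump()."""
    with open(path, "rb") as f:
        return pickle.load(f)


# An object of the Flask class is our WSGI application; the constructor
# takes the name of the current module.
app = Flask(__name__)

# Hypothetical file name; point this at wherever the best model was saved.
MODEL_PATH = "model.pkl"
```

Usage: model = load_model(MODEL_PATH). Only unpickle files you created or trust, because loading a pickle can execute arbitrary code.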
Here we route our app to the predict() function. This function retrieves all the values from the HTML page via a POST request and stores them in an array. This array is passed to the model.predict() function, which returns the prediction. The prediction is then rendered in the text placeholder that we defined earlier in the submit.html page.
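A sketch of such a predict() route, under several assumptions: the route path /predict, the form field names in FEATURES, and the helper create_app are all hypothetical, and submit.html is replaced by an inline stand-in template via render_template_string so the example runs without a templates folder. A typical project would call render_template("submit.html", prediction_text=...) instead.

```python
import numpy as np
from flask import Flask, render_template_string, request

# Hypothetical form field names; they must match the inputs in the HTML form
# and the column order used during training.
FEATURES = ["age", "TSH", "T3", "TT4", "T4U", "FTI"]

# Inline stand-in for submit.html so the sketch is self-contained
SUBMIT_HTML = "<p>{{ prediction_text }}</p>"


def create_app(model, features=FEATURES):
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Read each value from the POST request, in training-column order
        values = [float(request.form[name]) for name in features]
        # model.predict() expects a 2-D array: one row per sample
        prediction = model.predict(np.array([values]))[0]
        return render_template_string(
            SUBMIT_HTML, prediction_text=f"Predicted class: {prediction}"
        )

    return app
```

Reading the fields by name, rather than iterating over request.form.values(), keeps the feature order independent of how the HTML form is laid out.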
Main Function: