Mdp
Tomorrow, I have my interview. I have this project to describe. The name of the project is 'Multiple
Disease Prediction Model.' Please help me understand each file here. I've another file named app.py
whose code is:
import os
import pickle
import streamlit as st
from streamlit_option_menu import option_menu
heart_disease_model = pickle.load(open(f'{working_dir}/saved_models/heart_disease_model.sav',
'rb'))
['Diabetes Prediction',
'Heart Disease Prediction',
'Parkinsons Prediction'],
menu_icon='hospital-fill',
icons=['activity', 'heart', 'person'],
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/16
default_index=0)
# page title
st.title('Diabetes Prediction using ML')
with col1:
Pregnancies = st.text_input('Number of Pregnancies')
with col2:
Glucose = st.text_input('Glucose Level')
with col3:
BloodPressure = st.text_input('Blood Pressure value')
with col1:
SkinThickness = st.text_input('Skin Thickness value')
with col2:
Insulin = st.text_input('Insulin Level')
with col3:
BMI = st.text_input('BMI value')
with col1:
DiabetesPedigreeFunction = st.text_input('Diabetes Pedigree Function value')
with col2:
Age = st.text_input('Age of the Person')
diab_prediction = diabetes_model.predict([user_input])
if diab_prediction[0] == 1:
diab_diagnosis = 'The person is diabetic'
else:
diab_diagnosis = 'The person is not diabetic'
st.success(diab_diagnosis)
# page title
st.title('Heart Disease Prediction using ML')
with col1:
age = st.text_input('Age')
with col2:
sex = st.text_input('Sex')
with col3:
cp = st.text_input('Chest Pain types')
with col1:
trestbps = st.text_input('Resting Blood Pressure')
with col2:
chol = st.text_input('Serum Cholesterol in mg/dl')
with col3:
fbs = st.text_input('Fasting Blood Sugar > 120 mg/dl')
with col1:
restecg = st.text_input('Resting Electrocardiographic results')
with col2:
thalach = st.text_input('Maximum Heart Rate achieved')
with col3:
exang = st.text_input('Exercise Induced Angina')
with col1:
oldpeak = st.text_input('ST depression induced by exercise')
with col2:
slope = st.text_input('Slope of the peak exercise ST segment')
with col3:
ca = st.text_input('Major vessels colored by fluoroscopy')
with col1:
thal = st.text_input('thal: 0 = normal; 1 = fixed defect; 2 = reversible defect')
user_input = [age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal]
heart_prediction = heart_disease_model.predict([user_input])
if heart_prediction[0] == 1:
heart_diagnosis = 'The person is having heart disease'
else:
heart_diagnosis = 'The person does not have any heart disease'
st.success(heart_diagnosis)
# page title
st.title("Parkinson's Disease Prediction using ML")
with col1:
fo = st.text_input('MDVP:Fo(Hz)')
with col2:
fhi = st.text_input('MDVP:Fhi(Hz)')
with col3:
flo = st.text_input('MDVP:Flo(Hz)')
with col4:
Jitter_percent = st.text_input('MDVP:Jitter(%)')
with col5:
Jitter_Abs = st.text_input('MDVP:Jitter(Abs)')
with col1:
RAP = st.text_input('MDVP:RAP')
with col2:
PPQ = st.text_input('MDVP:PPQ')
with col3:
DDP = st.text_input('Jitter:DDP')
with col4:
Shimmer = st.text_input('MDVP:Shimmer')
with col5:
Shimmer_dB = st.text_input('MDVP:Shimmer(dB)')
with col1:
APQ3 = st.text_input('Shimmer:APQ3')
with col2:
APQ5 = st.text_input('Shimmer:APQ5')
with col3:
APQ = st.text_input('MDVP:APQ')
with col4:
DDA = st.text_input('Shimmer:DDA')
with col5:
NHR = st.text_input('NHR')
with col1:
HNR = st.text_input('HNR')
with col2:
RPDE = st.text_input('RPDE')
with col3:
DFA = st.text_input('DFA')
with col4:
spread1 = st.text_input('spread1')
with col5:
spread2 = st.text_input('spread2')
with col1:
D2 = st.text_input('D2')
with col2:
PPE = st.text_input('PPE')
parkinsons_prediction = parkinsons_model.predict([user_input])
if parkinsons_prediction[0] == 1:
parkinsons_diagnosis = "The person has Parkinson's disease"
else:
parkinsons_diagnosis = "The person does not have Parkinson's disease"
st.success(parkinsons_diagnosis)
To assist you in understanding your project, "Multiple Disease Prediction Model," I'll explain the code in
your `app.py` file, along with analyzing the three notebooks (`diabetes.ipynb`, `heart.ipynb`, and
`Parkinsons.ipynb`) you've provided. These components are likely interconnected.
Overview of `app.py`
The `app.py` file acts as the frontend interface using Streamlit. Its main functions:
1. Load pre-trained machine learning models from serialized files (`.sav`) stored in the
`saved_models` directory.
2. Provide an interactive interface for predicting:
Diabetes
Heart Disease
Parkinson's Disease
3. Allow navigation between disease prediction pages using a sidebar menu.
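The model-loading step can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper `load_models` and the `MODEL_FILES` mapping are hypothetical names, while the three file names and the `saved_models` directory are the ones used in the project. Only unpickle files you trust, because `pickle.load` can execute arbitrary code.

```python
import pickle
from pathlib import Path

# File names as used in the project; the dictionary keys are hypothetical labels.
MODEL_FILES = {
    "diabetes": "diabetes_model.sav",
    "heart": "heart_disease_model.sav",
    "parkinsons": "parkinsons_model.sav",
}


def load_models(model_dir):
    """Load each model listed in MODEL_FILES from model_dir into a dict."""
    models = {}
    for name, filename in MODEL_FILES.items():
        # The context manager closes the file even if unpickling fails.
        with open(Path(model_dir) / filename, "rb") as f:
            models[name] = pickle.load(f)
    return models
```

In `app.py` this could be called as `load_models(f'{working_dir}/saved_models')`, after which each page picks its model from the returned dictionary.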
Key Sections:
1. Model Loading: Each pre-trained model is read from its `.sav` file in `saved_models` with
`pickle.load`.
2. Sidebar Navigation: The sidebar enables switching between the three disease prediction
interfaces.
3. Prediction Pages: Each page collects user input (medical features) and passes it to the
corresponding ML model, which returns a class label (1 = disease predicted, 0 = not).
4. Prediction Logic:
Collect user input via `st.text_input()`.
Convert inputs to numerical format (`float()`).
Use the `.predict()` method of the loaded models to make predictions.
Display the result on the page.
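The conversion step matters because `st.text_input()` always returns a string. Below is a minimal sketch of that step, kept free of Streamlit so it can run on its own. The helper name `to_features` is hypothetical and not part of `app.py`.

```python
def to_features(raw_values):
    """Convert text-field values to floats, reporting the first bad field."""
    features = []
    for position, text in enumerate(raw_values, start=1):
        try:
            features.append(float(text))
        except ValueError:
            # Empty or non-numeric text would otherwise fail inside the model call.
            raise ValueError(f"Input {position} is not a number: {text!r}") from None
    return features


# Example: three values as they would arrive from st.text_input()
user_input = to_features(["6", "148", "33.6"])
print(user_input)  # [6.0, 148.0, 33.6]
```

The resulting list can then be passed as `model.predict([user_input])`, matching the calls in `app.py`.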
Next Steps
1. Diabetes Notebook:
Imports `numpy`, `pandas`, and scikit-learn modules, including `svm`.
Loads a dataset named `diabetes.csv`.
Uses the SVM (Support Vector Machine) algorithm to train the model.
2. Heart Disease Notebook:
Imports similar libraries, with a focus on `LogisticRegression` for model training.
Loads a dataset named `heart.csv`.
Implements logistic regression for prediction.
3. Parkinson's Disease Notebook:
Similar imports and uses SVM for training.
Loads a dataset named `parkinsons.csv`.
Next Steps:
Let me proceed to extract and summarize the preprocessing and model training logic.
I have extracted the relevant preprocessing and model training steps from each notebook. Each
notebook contains 7 key code blocks related to these steps. Let me summarize the key points for each:
Diabetes Notebook (`diabetes.ipynb`)
1. Preprocessing:
Loads `diabetes.csv` and splits data into features and target (`Outcome`).
Uses `train_test_split` to create training and testing datasets.
2. Model Training:
Trains an SVM classifier on the training data.
Evaluates accuracy using `accuracy_score`.
3. Model Saving:
Saves the trained model as a `.sav` file using `pickle.dump`.
Heart Disease Notebook (`heart.ipynb`)
1. Preprocessing:
Loads `heart.csv` and splits data into features and target (`target`).
Splits the dataset into training and testing sets.
2. Model Training:
Trains a Logistic Regression model on the training data.
Measures accuracy on test data.
3. Model Saving:
Serializes the trained model into a `.sav` file.
Parkinson's Disease Notebook (`Parkinsons.ipynb`)
1. Preprocessing:
Loads `parkinsons.csv` and separates features and target (`status`).
Performs training/testing split.
2. Model Training:
Uses an SVM classifier for training.
Evaluates its performance with accuracy metrics.
3. Model Saving:
Stores the model using `pickle.dump`.
Please answer
Project Overview
The project is a machine learning-based application that predicts the likelihood of three diseases:
1. Diabetes
2. Heart Disease
3. Parkinson's Disease
The application has a user-friendly interface built with Streamlit, where users input medical data to
receive predictions. A separate ML model was trained for each disease on its own dataset, and the three
models are stored as `.sav` files.
Key Components
1. app.py
Loads the pre-trained models for each disease using the `pickle` library.
Provides a Streamlit-based user interface:
A sidebar menu for selecting the disease type.
Pages for user input and displaying predictions.
Handles user input and passes it to the respective model for prediction.
2. diabetes.ipynb
Dataset: `diabetes.csv`
Contains features like pregnancies, glucose level, blood pressure, BMI, etc., and a target
column (`Outcome`).
Algorithm: Support Vector Machine (SVM).
Workflow:
Dataset preprocessing: Splitting into features and labels, and creating training/testing sets.
Training the SVM classifier.
Model evaluation using `accuracy_score`.
Saving the model using `pickle.dump`.
3. heart.ipynb
Dataset: `heart.csv`
Includes features like age, sex, chest pain type, blood pressure, cholesterol, etc., and a target
column (`target`).
Algorithm: Logistic Regression.
Workflow:
Preprocessing: Splitting features and labels, and creating training/testing sets.
Training the Logistic Regression model.
Testing the model for accuracy.
Saving the trained model.
4. Parkinsons.ipynb
Dataset: `parkinsons.csv`
Features include voice-related attributes such as jitter, shimmer, HNR, etc., with `status` as
the target.
Algorithm: Support Vector Machine (SVM).
Workflow:
Data preprocessing: Separating features and labels, and creating train/test splits.
Model training with SVM.
Model evaluation.
Saving the model.
How the Components Work Together
1. Each model was trained and saved independently in the respective notebook.
2. `app.py` loads these saved models and uses them to make predictions based on user inputs.
3. The interactive interface ensures a seamless experience for testing predictions for all three
diseases.
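The hand-off between a notebook and `app.py` can be sketched end to end. This is an illustration on made-up data, not the project's code: the toy arrays are invented, and `pickle.dumps`/`pickle.loads` stand in for writing and reading the `.sav` file.

```python
import pickle

import numpy as np
from sklearn import svm

# Made-up, linearly separable data: two features, binary label.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# "Notebook" side: train and serialize.
classifier = svm.SVC(kernel="linear")
classifier.fit(X, y)
saved_bytes = pickle.dumps(classifier)

# "app.py" side: deserialize and predict on one new sample.
loaded_model = pickle.loads(saved_bytes)
sample = [[6.2, 6.1]]  # 2-D input: one row per person
prediction = loaded_model.predict(sample)
print(prediction[0])
```

scikit-learn does not guarantee that a pickled model loads correctly under a different library version, so the app should run the same scikit-learn version that was used for training.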
Next Steps
Let’s break down the Diabetes Prediction Model Notebook (`diabetes.ipynb`) in detail:
1. Importing Libraries
python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
2. Loading the Dataset
python
diabetes_dataset = pd.read_csv('/content/diabetes.csv')
Example (first 5 rows):
3. Preprocessing
The dataset is split into features (X) and target labels (Y):
python
# Features: every column except the label
X = diabetes_dataset.drop(columns='Outcome')
# Target: 1 = diabetic, 0 = not diabetic
Y = diabetes_dataset['Outcome']
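The next step, per the earlier summary, is `train_test_split`; the call itself is not shown here. A minimal sketch on a made-up table follows, where `test_size=0.2`, `stratify=Y`, and `random_state=2` are assumed values rather than the notebook's confirmed settings:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows that only mimic the shape of diabetes.csv.
toy = pd.DataFrame({
    "Glucose": [85, 168, 90, 150, 99, 160, 110, 175, 95, 155],
    "BMI": [26.6, 43.1, 23.3, 35.3, 25.6, 31.0, 28.1, 39.3, 24.8, 33.6],
    "Outcome": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})
X = toy.drop(columns="Outcome")
Y = toy["Outcome"]

# Hold out 20% for testing; stratify keeps the class ratio equal in both parts.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2
)
print(X.shape, X_train.shape, X_test.shape)  # (10, 2) (8, 2) (2, 2)
```

Fixing `random_state` makes the split reproducible, which helps when comparing accuracy between runs.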
4. Model Training
python
classifier = svm.SVC(kernel='linear')
classifier.fit(X_train, Y_train)
5. Model Evaluation
python
# Accuracy on the training data (arguments: true labels, then predictions)
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
# Accuracy on the held-out test data
X_test_prediction = classifier.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
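Accuracy is simply the fraction of predictions that match the true labels. Here is a small worked example with made-up labels, checking the hand computation against `accuracy_score`:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up model output

# Manual computation: count matching positions, divide by the total.
correct = sum(t == p for t, p in zip(y_true, y_pred))
manual_accuracy = correct / len(y_true)

# accuracy_score expects the true labels first, then the predictions.
library_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, library_accuracy)  # 0.75 0.75
```

A training accuracy far above the test accuracy would suggest overfitting, which is why both values are computed.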
6. Model Saving
python
import pickle
filename = 'diabetes_model.sav'
# Use a context manager so the file is closed after writing
with open(filename, 'wb') as f:
    pickle.dump(classifier, f)
Key Considerations
Why SVM?
Effective for smaller datasets with clear margins of separation.
Handles high-dimensional spaces well.
Limitations:
Could explore other models (e.g., Random Forest) for potential performance improvement.
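One way to act on that suggestion is to compare candidates with cross-validation. The sketch below is hedged: it runs on synthetic data from `make_classification`, whereas a real comparison would use the project's datasets, and the hyperparameters are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a medical dataset: 200 rows, 8 features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

candidates = {
    "SVM (linear)": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validation gives a mean accuracy per model.
mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: {mean_scores[name]:.3f}")
```

Cross-validation averages over several splits, so the comparison depends less on one lucky or unlucky train/test partition.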
Would you like me to explain the other notebooks or dive deeper into SVM?
Let’s break down the Heart Disease Prediction Model Notebook (`heart.ipynb`) and Parkinson's
Disease Prediction Model Notebook (`parkinsons.ipynb`) in detail:
Heart Disease Prediction Model (`heart.ipynb`)
1. Importing Libraries
python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
2. Loading the Dataset
python
heart_data = pd.read_csv('/content/heart.csv')
Dataset includes medical features like age, sex, chest pain types, resting blood pressure,
cholesterol, etc., with a `target` column (1: Heart Disease, 0: No Heart Disease).
age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal
63   1    3   145       233   1    0        150      0      2.3      0      0   1
3. Preprocessing
python
# Features: every column except the label
X = heart_data.drop(columns='target')
# Target: 1 = heart disease, 0 = no heart disease
Y = heart_data['target']
4. Model Training
python
model = LogisticRegression()
model.fit(X_train, Y_train)
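Logistic regression can also report a probability rather than only a 0/1 label, through `predict_proba`. The sketch below uses synthetic data; `max_iter=1000` is an assumption, raised from the default because unscaled features sometimes trigger a convergence warning.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in with 13 features, the same count as the heart inputs.
X, y = make_classification(n_samples=300, n_features=13, random_state=1)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# predict returns the class label; predict_proba returns [P(class 0), P(class 1)].
label = model.predict(X[:1])[0]
probabilities = model.predict_proba(X[:1])[0]
print(label, probabilities)
```

Showing the probability next to the label would let the app express how confident the model is, which a bare "has disease / no disease" message cannot.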
5. Model Evaluation
python
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
6. Model Saving
python
import pickle
filename = 'heart_disease_model.sav'
# Use a context manager so the file is closed after writing
with open(filename, 'wb') as f:
    pickle.dump(model, f)
Parkinson's Disease Prediction Model (`parkinsons.ipynb`)
1. Importing Libraries
python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
2. Loading the Dataset
python
parkinsons_data = pd.read_csv('/content/parkinsons.csv')
Dataset includes 22 features related to vocal parameters (e.g., jitter, shimmer, etc.) and a `status`
column (1: Parkinson’s, 0: Healthy).
3. Preprocessing
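The preprocessing code for this notebook is not shown here. Based on the earlier summary (features separated from the `status` target, then a train/test split), it might look like the sketch below. The toy rows, the choice of three columns, and the split parameters are assumptions for illustration only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows using three of the voice-feature column names.
parkinsons_toy = pd.DataFrame({
    "MDVP:Fo(Hz)": [119.9, 122.4, 116.7, 197.1, 199.2, 198.4, 120.5, 201.0],
    "HNR": [21.0, 19.1, 20.6, 26.8, 30.9, 30.8, 20.1, 29.5],
    "PPE": [0.28, 0.37, 0.33, 0.09, 0.06, 0.07, 0.31, 0.08],
    "status": [1, 1, 1, 0, 0, 0, 1, 0],
})

# Separate the features from the label column.
X = parkinsons_toy.drop(columns="status")
Y = parkinsons_toy["status"]

# Assumed split: 25% held out, fixed seed for reproducibility.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=2
)
print(X_train.shape, X_test.shape)  # (6, 3) (2, 3)
```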
4. Model Training
python
model = SVC(kernel='linear')
model.fit(X_train, Y_train)
5. Model Evaluation
python
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
6. Model Saving
python
import pickle
filename = 'parkinsons_model.sav'
# Use a context manager so the file is closed after writing
with open(filename, 'wb') as f:
    pickle.dump(model, f)
Summary
Each model is trained, evaluated, and saved for use in the Streamlit app (`app.py`). Let me know if you'd
like help integrating or enhancing this project!