DS Assignment
Deadline: 15-03-2025
Objective
This assignment will help students apply machine learning techniques to predict
diseases using patient data. The focus will be on data preprocessing, exploratory data
analysis (EDA), feature engineering, training simple ML models with hyperparameter
tuning, and deploying a prediction web app using Flask/Streamlit.
Problem Statement
You are a Data Scientist working for a healthcare analytics company. Your task is to
build a machine learning model that predicts whether a patient is at risk of a
particular disease based on their health parameters. The dataset contains medical
records such as age, BMI, glucose levels, and other diagnostic features. Your goal is to
develop a classification model that can accurately predict the presence or absence of
a disease.
Dataset
• Age
• Gender
• Blood Pressure
• Cholesterol Levels
• Glucose Levels
Students can choose one of these datasets or any similar real-world dataset.
1. Load the dataset and display the first few rows.
• Summary statistics
• Correlation matrix
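The loading and EDA steps above could be sketched as follows. This is a minimal illustration only: the small in-memory table and its column names (age, bmi, glucose, disease) are assumptions standing in for whichever dataset is chosen, and in practice the data would be read from a file, for example with pd.read_csv.

```python
import pandas as pd

# Hypothetical stand-in for the chosen dataset (values are made up for
# illustration); replace with pd.read_csv(<path to your dataset>).
df = pd.DataFrame({
    "age": [25, 47, 52, 36, 61, 44],
    "bmi": [22.1, 30.5, 27.8, 24.3, 33.0, 28.9],
    "glucose": [85, 140, 130, 95, 160, 120],
    "disease": [0, 1, 1, 0, 1, 0],
})

# Display the first few rows
print(df.head())

# Summary statistics (count, mean, std, min, quartiles, max) per column
summary = df.describe()
print(summary)

# Pearson correlation matrix between all numeric columns
corr = df.corr()
print(corr)
```

With a real dataset, df.info() and df.isnull().sum() are also commonly used at this stage to check column types and missing values.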
Deliverables:
• Python code with EDA
Deliverables:
1. Split the dataset into training and testing sets (80-20 or 70-30 split).
• Logistic Regression
• Decision Tree
• Random Forest
4. Train and evaluate models using metrics such as accuracy, precision,
recall, and F1-score.
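One possible way to carry out the split, training, and evaluation steps with scikit-learn is sketched below. The synthetic data from make_classification, the 80-20 split, and the fixed random_state values are assumptions made so the example runs on its own; they stand in for the preprocessed patient features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (assumption: stands in for the
# preprocessed patient dataset).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=42
)

# 80-20 split; stratify keeps the class balance similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the training set and score it on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    print(name, results[name])
```

For disease prediction, recall is often worth reading alongside accuracy, since a missed positive case is usually the costlier error.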
Deliverables:
3. Apply feature selection techniques and retrain the model if necessary.
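As an illustration of the feature selection step, the sketch below uses univariate selection (SelectKBest with the ANOVA F-test) and retrains a logistic regression on the reduced feature set. The choice of technique, the value k=5, and the synthetic data are all assumptions; other approaches such as recursive feature elimination or tree-based importances would fit this step equally well.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: stands in for the patient dataset).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the selector on the training set only, so that no information
# from the test set influences which features are kept.
selector = SelectKBest(score_func=f_classif, k=5)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Retrain on the selected features and evaluate on the held-out set.
model = LogisticRegression(max_iter=1000).fit(X_train_sel, y_train)
test_accuracy = model.score(X_test_sel, y_test)

selected = selector.get_support(indices=True)
print("Selected feature indices:", selected)
print("Test accuracy with selected features:", test_accuracy)
```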
Deliverables:
2. Develop a Flask or Streamlit web application where users can input
patient details and receive a disease prediction.
3. Write a report summarizing:
• Problem statement
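For the web application in step 2, a minimal Flask sketch is shown below. It is a simplification under several assumptions: the feature names (age, bmi, glucose), the /predict route, and the JSON request format are hypothetical, and the model is trained on synthetic data at startup rather than loaded from a saved file. A full submission would typically add an HTML form, or use Streamlit instead.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical input fields; use the features of your own dataset.
FEATURES = ["age", "bmi", "glucose"]

# Assumption: a model trained on synthetic data stands in for the trained
# model, which would normally be loaded from disk (e.g. with joblib).
X, y = make_classification(
    n_samples=200, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Read patient details from a JSON body and return a prediction."""
    payload = request.get_json(silent=True) or {}
    try:
        row = [float(payload[name]) for name in FEATURES]
    except (KeyError, TypeError, ValueError):
        # A field is missing or is not a number.
        return jsonify(error=f"expected numeric fields: {FEATURES}"), 400
    prediction = int(model.predict([row])[0])
    probability = float(model.predict_proba([row])[0][1])
    return jsonify(prediction=prediction, probability=probability)
```

Saved as app.py, this can be started with `flask --app app run` and queried by sending a POST request with a JSON body such as {"age": 50, "bmi": 28.0, "glucose": 120} to /predict.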
Deliverables:
Submission Instructions
• Naming Convention:
StudentID_LastName_FirstName_ML_Assignment.pdf
Additional Notes