
Validation Report for Diabetes Prediction Model

1. Introduction

This report presents the results of various techniques applied to improve the performance of a
diabetes prediction model. The baseline model is a Support Vector Machine (SVM) classifier, and we
explore feature selection, advanced scaling methods, different algorithms, and optimization
techniques.

2. Baseline Model

Model: Support Vector Machine (SVM)

Cross-validation: 5-fold

Metrics:

- Accuracy: 0.7650

- Precision (weighted): 0.7623

- Recall (weighted): 0.7650

- F1-score (weighted): 0.7636

- ROC-AUC: 0.8234
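The baseline evaluation above can be sketched as follows. This is a minimal illustration, not the report's actual code: the real diabetes dataset is not included here, so `make_classification` stands in for the feature matrix `X` and labels `y`, and the SVM hyperparameters are scikit-learn defaults.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data; in the report, X and y come from the diabetes dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# SVM with standardised inputs, evaluated with 5-fold cross-validation
# using the same weighted metrics reported above.
model = make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))

scores = cross_validate(
    model, X, y, cv=5,
    scoring=["accuracy", "precision_weighted", "recall_weighted",
             "f1_weighted", "roc_auc"],
)
print({k: round(v.mean(), 4) for k, v in scores.items() if k.startswith("test_")})
```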

3. Feature Selection

3.1 Correlation-based Feature Selection

Model: SVM with uncorrelated features

Metrics:

- Accuracy: 0.7712

- Precision (weighted): 0.7695

- Recall (weighted): 0.7712

- F1-score (weighted): 0.7703

- ROC-AUC: 0.8301
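One common way to obtain the "uncorrelated features" used above is to drop one feature from each highly correlated pair. The helper below is a hedged sketch of that idea; the 0.9 threshold and the column names in the usage example are illustrative assumptions, not values taken from the report.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from each pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Illustrative usage: column "b" duplicates "a" and gets dropped.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [2, 4, 6, 8, 10],
                   "c": [5, 1, 4, 2, 3]})
reduced = drop_correlated(df, threshold=0.9)
```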
3.2 Random Forest Feature Importance

Model: SVM with selected features

Metrics:

- Accuracy: 0.7789

- Precision (weighted): 0.7775

- Recall (weighted): 0.7789

- F1-score (weighted): 0.7782

- ROC-AUC: 0.8378
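Random Forest importance-based selection is typically wired up with scikit-learn's `SelectFromModel`; the reduced matrix then feeds the SVM. A sketch under the same stand-in-data assumption (the default threshold keeps features whose importance exceeds the mean):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=42)

# Fit a Random Forest, then keep features with above-mean importance.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=42))
X_sel = selector.fit_transform(X, y)
print(X.shape, "->", X_sel.shape)
```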

4. Advanced Feature Scaling

Model: SVM with RobustScaler

Metrics:

- Accuracy: 0.7681

- Precision (weighted): 0.7659

- Recall (weighted): 0.7681

- F1-score (weighted): 0.7670

- ROC-AUC: 0.8267
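`RobustScaler` centres each feature on its median and scales by the interquartile range, so outliers (common in clinical measurements such as insulin levels) have far less influence than under `StandardScaler`. A tiny worked example:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with an outlier: median = 3, IQR = 4 - 2 = 2.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Each value becomes (x - median) / IQR, so the outlier cannot
# distort the centre or spread used for the other samples.
X_scaled = RobustScaler().fit_transform(X)
print(X_scaled.ravel())  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```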

5. Advanced Algorithms

5.1 Gradient Boosting

Metrics:

- Accuracy: 0.7843

- Precision (weighted): 0.7836

- Recall (weighted): 0.7843

- F1-score (weighted): 0.7839

- ROC-AUC: 0.8456
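A gradient boosting run matching the evaluation protocol above might look like this; the hyperparameters shown are scikit-learn defaults chosen for illustration, since the report does not list them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Sequentially fitted shallow trees, each correcting the previous ones.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=42)
auc = cross_val_score(gb, X, y, cv=5, scoring="roc_auc").mean()
print(round(auc, 4))
```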

5.2 XGBoost

Metrics:
- Accuracy: 0.7901

- Precision (weighted): 0.7895

- Recall (weighted): 0.7901

- F1-score (weighted): 0.7898

- ROC-AUC: 0.8534

5.3 Stacking Classifier

Metrics:

- Accuracy: 0.7924

- Precision (weighted): 0.7919

- Recall (weighted): 0.7924

- F1-score (weighted): 0.7921

- ROC-AUC: 0.8567
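A stacking classifier combines out-of-fold predictions from several base learners through a meta-model. The base-learner lineup below (SVM, Random Forest, Gradient Boosting with a logistic-regression meta-model) is a plausible assumption based on the models discussed in this report, not a confirmed configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=42)

# Base learners' out-of-fold predictions feed the meta-model, which
# learns how to weight each base learner's output.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
auc = cross_val_score(stack, X, y, cv=3, scoring="roc_auc").mean()
print(round(auc, 4))
```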

6. Hyperparameter Tuning

Model: XGBoost with RandomizedSearchCV

Metrics:

- Accuracy: 0.7978

- Precision (weighted): 0.7974

- Recall (weighted): 0.7978

- F1-score (weighted): 0.7976

- ROC-AUC: 0.8623

7. Class Imbalance Handling

Model: SMOTE + Tuned XGBoost

Metrics:

- Accuracy: 0.7956

- Precision (weighted): 0.7951

- Recall (weighted): 0.7956

- F1-score (weighted): 0.7953

- ROC-AUC: 0.8601

8. Feature Engineering

Model: Polynomial Features + Tuned XGBoost

Metrics:

- Accuracy: 0.7934

- Precision (weighted): 0.7929

- Recall (weighted): 0.7934

- F1-score (weighted): 0.7931

- ROC-AUC: 0.8589

9. Summary and Recommendations

The baseline SVM model achieved an ROC-AUC of 0.8234. Feature selection using Random Forest importance improved this to 0.8378, and the XGBoost model raised it further to 0.8534. Hyperparameter tuning of XGBoost produced our best model, with an ROC-AUC of 0.8623.

SMOTE did not significantly improve results, suggesting that class imbalance is not a major issue in this dataset. Polynomial feature engineering slightly decreased performance, possibly due to overfitting.

Recommendations:

1. Use the tuned XGBoost model as the final model for diabetes prediction.

2. Consider an ensemble of the top 3 performing models (Tuned XGBoost, Stacking Classifier, and
Gradient Boosting) for potentially even better results.

3. Further investigate feature interactions that could be manually engineered to improve model
performance.

4. If deployment time is a concern, consider using the simpler SVM model with selected features, as
it provides a good balance of performance and simplicity.

10. Next Steps

1. Perform a more extensive hyperparameter search for the XGBoost model.

2. Explore other advanced ensemble methods like LightGBM or CatBoost.

3. Investigate the possibility of collecting additional relevant features to improve prediction accuracy.

4. Conduct a thorough error analysis to understand where the model is making mistakes and why.

5. Develop a simple interpretability layer to explain model predictions to end-users.
