Report
A high-risk pregnancy is one that threatens the health or life of the mother or her
foetus. It often requires specialized care from specially trained providers.
Some pregnancies become high risk as they progress, while some women are at
increased risk for complications even before they get pregnant for a variety of
reasons.
Early and regular prenatal care helps many women have healthy pregnancies and
deliveries without complications.
High-risk pregnancy is a term that covers a wide variety of common conditions.
Many of them are related to pre-existing conditions that were present before
the pregnancy, or to conditions that develop during pregnancy or during
delivery.
Aims and Objectives
The aim of this project is to build a machine learning model that predicts the risk
level in pregnant patients. The dataset contains several factors that influence patient
health, and we use them to classify patients as having a low, medium or high risk of
pregnancy complications. This is achieved by training a machine learning model
on past patient records that contain these features.
Methodology
The whole process of a machine learning project is explained in this flowchart.
It starts with data collection and cleaning, continues through model training
and evaluation, and ends with model deployment.
https://www.kaggle.com/datasets/csafrit2/maternal-health-risk-data?resource=download
The data was collected from different hospitals, community clinics and maternal health
care centres through an IoT-based risk monitoring system.
Data Pre-processing
The dataset has 1014 rows and 7 columns of data entries.
There are no missing values in the data.
The number of duplicate rows is 562.
Most of the columns are numeric.
The RiskLevel column was converted from a string data type to a numeric data
type to make it fit for analysis and modelling.
To get familiar with the dataset, we used the describe() function to gather
summary statistics.
These plots show how the RiskLevel column depends on age.
This plot shows that the risk level increases with age.
This plot shows the distribution of heart rate by risk level.
The heat map below shows how the features are related to one another.
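The pre-processing steps listed above can be sketched with pandas. This is a minimal illustration, not the project's actual notebook: the six-row frame stands in for the real CSV, and the column names other than RiskLevel, the label strings ("low risk", "mid risk", "high risk") and the 0/1/2 encoding are assumptions based on the public Kaggle dataset.

```python
import pandas as pd

# Tiny stand-in for the real CSV. Column names and label strings are
# assumptions based on the public Kaggle dataset, not taken from this report.
df = pd.DataFrame({
    "Age": [25, 35, 29, 25, 50, 35],
    "SystolicBP": [130, 140, 90, 130, 140, 140],
    "DiastolicBP": [80, 90, 70, 80, 90, 90],
    "BS": [15.0, 13.0, 8.0, 15.0, 7.5, 13.0],
    "BodyTemp": [98.0, 98.0, 100.0, 98.0, 98.0, 98.0],
    "HeartRate": [86, 70, 80, 86, 76, 70],
    "RiskLevel": ["high risk", "high risk", "mid risk",
                  "high risk", "low risk", "high risk"],
})

# Size of the dataset as (rows, columns).
print(df.shape)

# Total number of missing values across all columns.
missing_total = int(df.isnull().sum().sum())

# Rows that repeat an earlier row exactly.
duplicate_rows = int(df.duplicated().sum())

# Convert the string labels to integers (assumed ordinal encoding).
risk_map = {"low risk": 0, "mid risk": 1, "high risk": 2}
df["RiskLevel"] = df["RiskLevel"].map(risk_map)

# Summary statistics and the pairwise correlations behind a heat map.
summary = df.describe()
corr = df.corr()
print(summary)
print(corr.round(2))
```

The correlation matrix `corr` is the table a heat map would visualise. Whether duplicate rows should be dropped before training is a separate decision that this sketch does not make.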
Model Training And Evaluation
Because this is a classification problem, we tried different classification
algorithms in order to find the model with the highest accuracy.
Eight (8) machine learning algorithms were evaluated on the resulting dataset.
The algorithms are:
KNN, DecisionTreeClassifier, AdaBoostClassifier, RandomForest,
BaggingClassifier, XGBClassifier, CalibratedClassifierCV and GaussianNB.
Of the classification models compared in this study, the best-performing model was
the RandomForestClassifier, so we selected it as the final model.
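A comparison of this kind could look roughly like the sketch below. It is an illustration under stated assumptions rather than the evaluation actually run for this report: the data is synthetic (generated with make_classification), the 5-fold cross-validation and all hyperparameters are assumed, and XGBClassifier is left out because it comes from the separate xgboost package.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the maternal health features.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

# Candidate models with default or assumed settings.
candidates = {
    "KNN": KNeighborsClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "BaggingClassifier": BaggingClassifier(random_state=0),
    "CalibratedClassifierCV": CalibratedClassifierCV(GaussianNB(), cv=3),
    "GaussianNB": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Report the candidates from best to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")

best = max(scores, key=scores.get)
print("Best model on this synthetic data:", best)
```

The ranking printed here applies only to the synthetic data. On an imbalanced dataset, a metric such as macro-averaged F1 may be a useful complement to accuracy.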
Model Deployment
The model was deployed using the Flask framework in the Python programming
language and was hosted on the Heroku platform. The saved model and
transformer were deployed together so that the transformer can process the new
“facts” and “issue_area” values returned by the form on the frontend of the project
before the model makes predictions for the input given by the user of the web app.
Here are the links:
Heroku:
Link to repo:
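A prediction endpoint of this kind might be sketched as follows. This is not the deployed app: the /predict route, the JSON field names and the label mapping are hypothetical, and a small pipeline fitted on made-up rows stands in for the saved model and transformer, which a real app would load from disk.

```python
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical input fields and label names (assumptions, not from the report).
FEATURES = ["Age", "SystolicBP", "DiastolicBP", "BS", "BodyTemp", "HeartRate"]
LABELS = {0: "low risk", 1: "mid risk", 2: "high risk"}

# Stand-in for the saved transformer and model: fitted here on made-up rows
# so the sketch runs on its own. A real app would load the saved artefacts.
toy_X = [
    [25, 120, 80, 7.0, 98.0, 70],
    [35, 140, 90, 13.0, 98.0, 70],
    [50, 160, 100, 15.0, 99.0, 86],
]
toy_y = [0, 1, 2]
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=10, random_state=0),
)
model.fit(toy_X, toy_y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Validate the submitted values and return the predicted risk level."""
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": "missing fields", "fields": missing}), 400
    try:
        row = [[float(payload[name]) for name in FEATURES]]
    except (TypeError, ValueError):
        return jsonify({"error": "all fields must be numeric"}), 400
    label = int(model.predict(row)[0])
    return jsonify({"risk_level": LABELS[label]})

# To serve locally, call app.run() or use the `flask run` command.
```

Bundling the transformer and the model in one pipeline means the same scaling is applied at prediction time as at training time, which is the reason for deploying the two together.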
Conclusion
Recommendations
References
Using Machine Learning to Predict Complications in Pregnancy: A Systematic
Review,
https://www.frontiersin.org/articles/10.3389/fbioe.2021.780389/full
Pregnancy Outcome Prediction study,
https://en.wikipedia.org/wiki/Pregnancy_Outcome_Prediction_study
Team Members