Machine Learning-Based Maternal Health Risk Predic

This document discusses the development of a machine learning model to predict maternal health risks using data collected from IoT devices. The model was developed using random forest classification and evaluated on a dataset of 1014 instances and six risk factors. Exploratory data analysis was used to identify the most important risk factors. The best model achieved an accuracy of 93.14% and was deployed in an Android application to generate risk alerts and medical reports. The goal is to help improve maternal health monitoring and outcomes.


DOI: https://doi.org/10.52756/ijerr.2023.v32.012
Int. J. Exp. Res. Rev., Vol. 32: 145-159 (2023)

Machine learning-based maternal health risk prediction model for IoMT framework
Subhash Mondal, Amitava Nag, Anup Kumar Barman and Mithun Karmakar*

Department of Computer Science and Engineering, Central Institute of Technology Kokrajhar, Kokrajhar, Assam, India
E-mail/Orcid Id:
SM, [email protected], https://orcid.org/0000-0002-4203-8467; AN, [email protected], https://orcid.org/0000-0003-4408-7307; AKB, [email protected], https://orcid.org/0000-0002-5697-0431; MK, [email protected], https://orcid.org/0000-0003-0683-8800

Article History:
Received: 15th Apr., 2023
Accepted: 28th Jul., 2023
Published: 30th Aug., 2023

Keywords: Maternal Health Risk, Internet of Medical Things (IoMT), Prediction Model, Exploratory Data Analysis (EDA), Android-based Application, Random Forest Classifier

Abstract: The Internet of Things (IoT) is vital as it offers extensive applicability in various fields, including healthcare. For monitoring risk levels and predicting abnormalities during pregnancy, IoT devices provide a means to collect real-time health data, enabling continuous monitoring and analysis in Internet of Medical Things (IoMT) environments. By integrating IoT devices into the system, crucial signs such as Heart Rate (HR), Systolic and Diastolic Blood Pressure (BP), Fetal Movements (FM), and Temperature (T) can be tracked remotely and non-invasively. This allows for the timely detection of abnormalities or potential risk factors during pregnancy, empowering healthcare professionals to intervene proactively and provide personalized care. This research focuses on developing a system for observing and predicting the maternal risk level in the IoT environment, mainly in remote areas. The goal is to improve maternal health and reduce maternal and child mortality rates, in line with the significant decline targeted by the United Nations for 2030. The research utilizes analytical tools and Machine Learning (ML) algorithms to analyze health data and risk factors associated with pregnancy. The acquired dataset contains various risk factors categorized and classified based on intensity. After comparing the experimental results of different ML models, Exploratory Data Analysis (EDA) approaches were applied to determine the most effective risk factors. The fine-tuned Random Forest (RF) Classifier achieves the highest accuracy of 93.14%. An Android-based application has also been developed to deploy the prediction model and determine risk levels based on the different parameters.

*Corresponding Author: [email protected]

Introduction
Maternal Health Risk (MHR) refers to potential health problems arising during pregnancy, childbirth, and postpartum. According to WHO, around 280,000 women die from pregnancy complications each year, which means a woman dies approximately every two minutes (WHO, 2023). Various factors increase maternal and childbirth mortality, including the shortage of doctors and nurses and constraints of location, time, and distance (Redondi et al., 2013). According to WHO's report in 2020, around 800 women die daily due to poor resources and care (Castillejo et al., 2013). Despite recent technological advances, the rate of maternal death remains high, making it difficult to ensure both the mother's and child's safety during pregnancy. Pregnancy-related risks can be reduced in this scenario by anticipating complications and taking precautions.
Some studies have been conducted in recent years to predict certain risks that can occur during pregnancy and to predict the birth method best suited to mothers' pregnancy characteristics. For example, Pereira et al. (2015) used different supervised ML algorithms to predict the best


delivery method among vaginal, cesarean, forceps, and vacuum delivery. In another study, Chen et al. (2011) used Neural Network (NN) and Decision Tree (DT) algorithms to predict the factors associated with preterm birth. Similarly, Rawashdeh et al. (2020) used Random Forest (RF), DT, K-Nearest Neighbors (KNN), and NN to predict the risk of premature birth. Different Machine Learning techniques are used for different data types, with varying results and performance.
This research study focuses on deploying an ML classifier-based prediction model that determines health risk during the maternal period. Initially, in the first stage, five ML classifiers, namely RF, DT, KNN, Logistic Regression, and Support Vector Machine, were deployed after performing some data preprocessing techniques on the acquired dataset, which consists of 1014 instances and six related factors that contribute to determining the “Risk Level” as the target outcome in a multiclass classification. In the second stage of the prediction model, an extensive data analysis was performed across all features using multiple Exploratory Data Analysis (EDA) techniques to decide which features contribute most to predicting the outcome level. The best-performing RF model was then deployed on the processed dataset after eliminating the noncontributing feature identified by EDA. Under the best configurable test condition, the processed RF model performed well with an improved accuracy of 91.18%. Hyper-parameter tuning was applied using Grid Search CV to derive the best estimator values corresponding to each parameter. The best hyper-parameterized RF model was evaluated under the same test condition and achieved the highest accuracy of 93.14%.

Motivation
Predicting MHR aims to improve the overall health of pregnant women and their babies. MHR can occur during pregnancy, childbirth, and the postpartum period. However, it is most prevalent during pregnancy, when women are at a higher risk of developing health issues, which can lead to miscarriage and death in certain circumstances (Hussain et al., 2014). By identifying and assessing the potential health risks early on, healthcare professionals can take measures to prevent, manage, or treat these conditions.
Predicting health risks can also help healthcare systems to allocate resources more effectively. Healthcare providers can prioritize their care by identifying women at higher risk. It can also empower pregnant women with information about their risk factors and allow them to make informed decisions about their health.
Overall, the motivation behind predicting maternal health risks and implementing the prediction as an Android application is to enhance pregnant women's health and well-being, reduce complications, and improve outcomes for both mothers and babies.
The study aims to develop an Android application that integrates with IoT devices, such as wearable sensors and remote monitoring systems, to predict and mitigate maternal health risks that arise during pregnancy. The article mainly focuses on the following:
● To introduce an IoT-based framework capable of monitoring maternal health. The data samples collected by the medical sensors/devices (blood pressure, body temperature, heart rate, etc.) are fed directly into machine learning models for maternal health risk prediction.
● To create and deploy the ML model in an Android-based application that generates emergency alerts and medical reports for the user, their relatives, and doctors.
● To perform feature selection via the Exploratory Data Analysis (EDA) approach to decide the important and relevant factors contributing to maternal health risk prediction.

Related works
This section reviews related literature that used approaches such as Neural Networks (NN), ML classifiers, and ensemble techniques that combine different architectures to predict maternal health risk factors. Some of the studies focus on monitoring systems during pregnancy.
Ali Raza et al. (2022) proposed an ensemble method, BiLTCN, that combined the NN-based BiLSTM, a Temporal Convolutional Network, and a Decision Tree as a classifier, using a clinical dataset of 1218 instances collected by an IoT-enabled system. After balancing the data using SMOTE, the proposed system achieved an average accuracy of 88%. They also applied feature selection techniques and used SVM along with BiLTCN, claiming 98% accuracy on the reduced-feature model.
Ahmed et al. (2020) conducted research using ML models and concluded that the Logistic Model Tree (LMT) classifier performs better in analyzing the factors related to maternal health. Data collected by an IoT-enabled system were fed to the LMT model, producing 90% accuracy.
A mortality-rate prediction model was developed using ML models, and the two-class SVM model produced the best accuracy of 86.7% compared to other models (Rani and Kumar, 2021). Also, Akbulut et al. (2018) developed a fetal health monitoring system using the Decision Forest Model, which achieved an accuracy of 89.5% under test conditions compared to other ML models. Sarhaddi et al. (2021) proposed an IoT-based maternal health monitoring system for long-term use that monitors pregnant women the entire time.
Assaduzzaman et al. (2023) focused on ML models to predict risk factors for maternal health, using a dataset that was preprocessed and subjected to feature engineering techniques to develop a prediction model with RF and other ML classifiers; among them, RF was the top model with an accuracy of 90%. Pereira et al. (2020) addressed the health monitoring of maternal risk factors using six ML models and applied the feature elimination technique RFE to the feature set. The RF classifier with RFE achieved the highest mean accuracy of 93.24%. Pawar et al. (2022) deployed eight ML models using the k-fold cross-validation technique to classify maternal risk into three classes. Among the models, RF provided the best results, with a mean accuracy of 70.21%.
Maternal health risk prediction aims to develop and implement models and systems that can effectively predict the risk associated with maternal health outcomes during pregnancy. It involves research, data collection, model development, result validation, and implementation to improve maternal health care and reduce mortality rates. The concepts used in this study are ML, IoT, and Software Development (Android application).
ML techniques have an important role in maternal health risk prediction. They have been widely used in predicting the mode of childbirth and assessing the potential maternal risk during pregnancy. These techniques allow us to develop prediction models that analyze data and identify patterns, correlations, and predictive factors that give rise to adverse maternal health outcomes. Machine Learning can be utilized through Data Analysis and Feature Selection, Model Development, Training and Validation, and Predictive analysis. In a classification task that predicts a specific disease, malware, or condition, ML techniques make it possible to reduce the dimension of the features using feature selection techniques or data analysis approaches, and to combine the predictions of different models using ensemble techniques (Islam et al., 2023).
Upcoming challenges in the medical field include the development of modern IoT devices, the environment provided by technology enhancement, and the use of IoT applications. With the recent development of Medical 4.0 in the healthcare sector, everything is now connected through IoT nodes, from hospital beds to patients' physical and biological characteristics. The application of Medical 4.0 in healthcare sectors is discussed by Haleem et al. (2022), who provide details on decreasing the cost of healthcare in underdeveloped or developed countries. Patient data is digitalized, and IoT technology is transforming doctor-centric treatment at a hospital or clinic into patient-centric approaches. Medical 4.0 is embedded with Industry 4.0 at the manufacturing level with high safety, security, and privacy and is more effective (Oliveira et al., 2021; Al-Jaroodi et al., 2020). The IoT has a significant role in maternal health risk prediction. It can provide real-time monitoring, data collection, and connectivity between devices. In this research study, three types of IoT devices (heart rate, blood pressure, and body temperature measuring) will be used; these devices will provide real-time data for risk assessment. Many IoT-based software applications have been developed to increase patient satisfaction through smooth communication among hospitals, which stay connected through IoT-enabled applications regardless of physical location (Pang et al., 2018; Gupta et al., 2020; Celdrán et al., 2018; Jaleel et al., 2020).
From the above analysis, we note that there is a lack of work on automatic health risk prediction and monitoring of women during the maternal period. Therefore, the proposed work is important because it integrates IoT and ML to automatically diagnose abnormalities in women at an early stage of the maternal period.

Figure 1. IoT-based automated smart maternal health risks monitoring system: based on machine learning

Materials and methods
The proposed system is an Android-based maternal health risk prediction system in an IoT environment, designed to analyze data from IoT devices and predict the health risk level of a woman during pregnancy. Its primary objective is to improve maternal health outcomes by identifying high-risk cases early on. The system architecture of the proposed model based on ML classifiers is depicted in Fig. 1. The system workflow is explained step by step in the phases below.
Data Collection: The system would gather relevant data about pregnant women, including age, blood pressure (from IoT device), blood sugar, body temperature (from IoT device), and heart rate (from IoT device).
Data Preprocessing: The system would preprocess raw data to make it suitable for further analysis and modelling.
Exploratory Data Analysis (EDA): EDA is an approach for analyzing and visualizing data to gain insights, understand the underlying patterns, and identify relationships between variables. It helps in understanding the structure of the data, detecting outliers, and assessing variables.
Feature Selection/Feature Engineering: This is the process of choosing a subset from a large set of available features in a dataset.
Machine Learning Models: The system would then utilize machine learning algorithms to analyze the collected data and identify patterns and correlations between risk factors and potential health risks.
Risk Level Assessment: Based on the analysis, the system would assign a risk level to each pregnant woman, indicating the likelihood and severity of potential health risks. This scoring system can help prioritize high-risk cases for further and immediate medical attention.
Early Warning: The system can generate alerts and notifications for healthcare professionals and registered family members when a patient's risk level crosses a certain threshold.
Android Application Deployment: Deploying an Android app that utilizes machine learning models involves several steps. First, the machine learning model must be trained and optimized for mobile deployment. Then, the model is integrated into the Android app, ensuring compatibility and efficient resource usage. Finally, the app and the embedded machine learning model are packaged together, allowing users to benefit from the intelligent functionalities.

Implementation details
As per the proposed system architecture, in the first approach we collected the raw dataset from an open source and performed data preprocessing to transform it into a processed dataset suitable for ML model deployment, in order to detect any risk of abnormalities during pregnancy. The ML-based prediction model in the IoT environment has two stages: stage 1 performs the initial model prediction, and in the second stage EDA is applied for feature selection before the final model deployment. The detailed architecture is depicted in Figure 2.
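The Risk Level Assessment and Early Warning phases of the workflow described above can be sketched as a simple threshold check. This is a minimal illustration, not the authors' implementation: the risk-level label strings, their ordering, the default threshold, and the names `RISK_ORDER`, `Alert`, `should_alert`, and `build_alert` are all assumptions introduced here.

```python
from dataclasses import dataclass

# Assumed ordering of risk levels; the label strings are illustrative only.
RISK_ORDER = {"low risk": 0, "mid risk": 1, "high risk": 2}


@dataclass
class Alert:
    patient_id: str
    risk_level: str
    recipients: tuple


def should_alert(risk_level: str, threshold: str = "high risk") -> bool:
    """Return True when the predicted level reaches the alert threshold."""
    return RISK_ORDER[risk_level] >= RISK_ORDER[threshold]


def build_alert(patient_id, risk_level, recipients, threshold="high risk"):
    """Create an Alert for the registered recipients, or None below threshold."""
    if not should_alert(risk_level, threshold):
        return None
    return Alert(patient_id, risk_level, tuple(recipients))


# Hypothetical usage: one patient above the threshold, one below it.
alert = build_alert("patient-001", "high risk", ["doctor", "family member"])
no_alert = build_alert("patient-002", "low risk", ["doctor"])
print(alert)
print(no_alert)
```

Keeping the threshold as a parameter lets a deployment decide whether mid-risk predictions should also trigger notifications.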


Figure 2. Two-stage prediction model workflow diagram

Table 1. Dataset feature description with null values

#   Column/Feature   #NullValues   Dtype
0   Age              0             int64
1   SystolicBP       0             int64
2   DiastolicBP      0             int64
3   BS               0             float64
4   BodyTemp         0             float64
5   HeartRate        0             int64
6   RiskLevel        0             object
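
A summary like Table 1 can be produced with pandas. The sketch below uses a tiny made-up sample (the values are illustrative, not records from the dataset); only the column names follow Table 1, and the variable names `df` and `summary` are assumptions.

```python
import pandas as pd

# Tiny illustrative sample; values are made up, column names follow Table 1.
df = pd.DataFrame({
    "Age": [25, 35, 29],
    "SystolicBP": [130, 140, 90],
    "DiastolicBP": [80, 90, 70],
    "BS": [15.0, 13.0, 8.0],
    "BodyTemp": [98.0, 98.0, 100.0],
    "HeartRate": [86, 70, 80],
    "RiskLevel": ["high risk", "high risk", "low risk"],
})

# Count missing values and report the dtype of every column.
summary = pd.DataFrame({
    "NullValues": df.isnull().sum(),
    "Dtype": df.dtypes.astype(str),
})
print(summary)
```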

Figure 3. Target outcome data distribution

Table 2. Experimental results of the 1st-stage model prediction

Model   Acc (%)   Pre     Re      KMA (%)   Fs
RF      86.275    0.864   0.863   83.556    0.863
DT      86.275    0.862   0.863   81.471    0.862
KNN     72.549    0.729   0.725   68.312    0.722
SVM     68.628    0.684   0.686   67.979    0.672
LR      65.686    0.655   0.657   63.700    0.643

Data preprocessing
Data preprocessing includes cleaning data, removing or replacing impossible or null values, and checking categorical features. The dataset does not contain any null values. To convert the categorical “RiskLevel” column to numerical values, the Label Encoding technique was used. To standardize each feature value to the range between 0 and 1, normalization was applied to the entire raw dataset using the MinMax scaler.
During the data analysis phase, we checked the data distribution of the target column; the target label is multivalued and categorical in nature, and the classes do not have equal numbers of instances. The class distribution is depicted in Figure 3.
Our first approach to deploying the ML-based model used all the features as independent input variables and the actual risk level values as the target. For model creation, we split the dataset in the ratio 0.90:0.10, with 90% of the instances used for model training and the rest for model validation.
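
The preprocessing and splitting steps above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: the data are randomly generated stand-ins (only the column names follow Table 1), the risk-level label strings are assumed, and `random_state=42` is an arbitrary choice, not a setting reported in the paper.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Synthetic stand-in data; only the column names follow Table 1.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "Age": rng.integers(15, 60, n),
    "SystolicBP": rng.integers(80, 160, n),
    "DiastolicBP": rng.integers(50, 100, n),
    "BS": rng.uniform(6.0, 19.0, n),
    "BodyTemp": rng.uniform(98.0, 103.0, n),
    "HeartRate": rng.integers(60, 90, n),
    "RiskLevel": rng.choice(["low risk", "mid risk", "high risk"], n),
})

# Label-encode the categorical target column as integers.
encoder = LabelEncoder()
y = encoder.fit_transform(df["RiskLevel"])

# Scale every feature into the range [0, 1] with the MinMax scaler.
scaler = MinMaxScaler()
X = scaler.fit_transform(df.drop(columns="RiskLevel"))

# 90:10 split into training and validation sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42)
print(X_train.shape, X_test.shape)
```

The scaler is fitted on the full dataset here because the text describes normalizing the entire raw dataset; fitting it on the training split only is a common alternative that avoids leaking validation statistics into training.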
Model training & validation
After preprocessing, we moved to the model training stage. Five ML multiclass classifiers, namely Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (KNN), and Logistic Regression (LR), were trained on the training dataset in a Python environment under the best available configuration. The classification report was derived using the performance metrics Accuracy (Acc), Precision (Pre), Recall (Re), and F1 Score (Fs) to evaluate model performance on the test dataset.
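
Training the five classifiers and computing the four metrics can be sketched as follows. The example runs on synthetic three-class data generated with `make_classification`, so the printed numbers are unrelated to Table 2. The default hyperparameters, the weighted averaging of precision, recall, and F1, and the `results` dictionary are assumptions, since the text does not state these details.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data with six features (stand-in for the real dataset).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

# The five classifiers named in the text, with default settings (assumed).
models = {
    "RF": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "LR": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Weighted averaging is an assumption; the paper does not specify it.
    pre, re, fs, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0)
    results[name] = {"Acc": accuracy_score(y_test, y_pred),
                     "Pre": pre, "Re": re, "Fs": fs}

for name, m in results.items():
    print(f"{name}: Acc={m['Acc']:.3f} Pre={m['Pre']:.3f} "
          f"Re={m['Re']:.3f} Fs={m['Fs']:.3f}")
```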
The cross-validation (CV) technique was also applied to the entire dataset to cope with the limited number of instances and to guard against model overfitting and underfitting. The five-fold CV results and all other metric outcomes of the deployed models are presented in Table 2. The Confusion Matrix (CM) of the best-performing RF model is depicted in Figure 4. The accuracy (Acc) results of the deployed models are depicted in Figure 5.

Figure 4. The CM of the unbalanced RF model

Figure 5. Accuracy comparison of the deployed ML models

The results obtained after implementation were moderate, so we used SMOTE (Synthetic Minority Over-Sampling Technique), an algorithm used to address class imbalance in supervised learning problems. It is designed to oversample the minority class by creating synthetic examples. Both under-sampling and over-sampling have their disadvantages: data loss for under-sampling and overfitting for over-sampling. SMOTE reduces these drawbacks because it creates new synthetic examples to balance the data rather than discarding or duplicating records. The resulting accuracy still left room for improvement, but for a multiclass problem it was encouraging. The model's outcome is tabulated in Table 3.

Table 3. Experimental findings of the models deployed on the balanced dataset

Model   Acc (%)   Pre     Re      KMA (%)   Fs
RF      90.164    0.901   0.902   88.140    0.901
DT      88.525    0.885   0.885   86.867    0.883
KNN     77.049    0.769   0.770   73.088    0.768
SVM     67.213    0.663   0.672   68.696    0.655
LR      65.574    0.659   0.656   58.932    0.657

Figure 6. The CM of the balanced RF model

Figure 7. In 1st stage, accuracy comparison of all models

Exploratory Data Analysis (EDA)
Before the second stage of model training and testing, we performed exploratory data analysis on the features. Three types of EDA were carried out, treating each feature as a factor: Univariate, Bivariate, and Multivariate analysis of the six features with respect to the target variable “RiskLevel.”

Univariate analysis
Univariate analysis separately explores the distribution of each variable in a data set. It looks at the

range of values and the central tendency of the values. Univariate analysis does not look at relationships between variables (as bivariate and multivariate analyses do); rather, it summarizes each variable independently. The method used for univariate analysis depends on whether the variable is categorical or numerical. For numerical variables, we explore the shape of the distribution (which can be either symmetric or skewed) using histograms and density plots. For categorical variables, we use bar plots to visualize the absolute and proportional frequency distributions.
The univariate analyses were performed using histograms and boxplots of all the features, depicted in Figures 8 and 9.
Observation: Almost all variables have outliers that cause skewed distributions. We ignore those outliers for now because the values seem natural in this case, except for “Heart Rate,” which has an outlier that is too far from the other values.

Figure 8. The histogram and boxplot of the Age and SystolicBP

Figure 9. The histogram and boxplot of the Diastolic BP, BS, Body Temp, and Heart Rate

Bivariate analysis
Bivariate analysis studies the relationship between two variables. It helps to find out whether there is an association between the variables and, if so, how strong it is. One variable is dependent, while the other is independent. We used correlation coefficients to measure the strength of the relationship between two variables, and scatter plots to show the patterns formed by the two variables. The heatmap of the correlations among the features and with the target column is depicted in Figure 10.

Figure 10. The Correlation Heatmap of all features

Observation: “Systolic BP” and “Diastolic BP” are highly correlated. As the heatmap shows, they have a positive correlation with a coefficient of 0.79. This means that the SystolicBP and DiastolicBP variables contain highly similar information, with very little or no variance in information. This is known as multicollinearity, which undermines the statistical significance of an independent

variable. We can remove one of them because we do not want a redundant variable when training our model. However, we will dig deeper to decide whether we need to remove one of these variables and, if so, which one.
We used histograms with hue mapping to visualize the distribution of the predictor variables by target class, presented in Figures 11 and 12.

Figure 11. Bivariate histogram diagram of features concerning target outcome

Figure 12. Bivariate histogram diagram of features concerning target outcome

Observation: As mentioned before, the "Heart Rate" variable has an outlier with an unnatural value of 7 bpm. Health risks seem to increase with heart rate.

Multivariate analysis
Multivariate analysis involves analyzing multiple variables (more than two) to identify possible associations and relationships among them. More specifically, we tried associating more than one predictor variable with the response variable.

Figure 13. Multivariate histogram of Body Temp and Heart Rate concerning Risk Level

In this case, we analyzed the impact of two different predictor variables simultaneously on the "RiskLevel" variable. We used a scatter plot, since all the predictor variables are numerical, and coloured the points by Risk Level value. We
analyzed the risk level by considering two variables at a time. We observed in the previous two stages that “Heart Rate” and “Body Temperature” were highly correlated with the response. In this case, only one scatter plot is provided, in Figure 13.
Observation: Pregnant women with higher body temperature seem to have a higher health risk, regardless of their heart rate. As noted in the previous analysis, most pregnant women in this dataset have a body temperature of 98 °F. The HeartRate variable is therefore not very helpful in this case.

Table 4. Proposed prediction model experimental results

Model          Acc (%)   Pre     Re      KMA (%)   Fs
Processed-RF   91.176    0.917   0.911   90.897    0.912
Tuned-RF       93.137    0.937   0.932   93.111    0.932

Figure 14. The CM of the processed and tuned RF model

Figure 15. The accuracy comparison of the processed and tuned RF model

Discussion
In this dataset, several variables have outliers, but most of those values still make sense in real life. The only variable that has an outlier with an unreasonable value is "Heart Rate." In this variable, two observations have a heart rate of 7 bpm (beats per minute). The average resting heart rate for adults ranges from 60 to 100 beats per minute, and the lowest recorded resting heart rate in human history was 25 bpm. Therefore, we drop these two records with a heart rate of seven, because that value does not make sense and is most likely an input error.
We do not overwrite the original variable with the processed data; instead, we store it in a new variable so that it can be compared with the original data. Then, after conducting several analyses of the predictor variables, we conclude that the "Heart Rate” variable is less helpful in determining the health risks of pregnant women, so it is safe to remove that variable. If we delete that variable, one might wonder why we also drop the records with outliers in the HeartRate variable. The answer is that they contain an input error, so the records may not be legitimate. The label may also be incorrect, which would mislead the training process and make the model less accurate.
From the analysis of the acquired dataset using EDA techniques, we conclude that the BS level is the most important variable in determining the health level of pregnant women. Pregnant women with high blood glucose levels tend to have high health risks. Over 75% of pregnant women with a BS of 8 or more have a high health risk. BS also has a relatively strong positive correlation with Age, Systolic BP, and Diastolic BP, so pregnant women with high Age, Systolic BP, and Diastolic BP must be vigilant. Age is also an important variable: the health risks of pregnant women seem to start increasing from the age of 25 years. Systolic BP and Diastolic BP have a strong relationship, as evidenced by the correlation coefficient of 0.79. Body Temp does not give much information because more than 79% of the values are 98 °F. However, this variable shows that pregnant women
with a body temperature above 98.4°F tend to have a greater health risk. The last one is Heart Rate, the least relevant variable in determining the health level of pregnant women.

Figure 16. The performance comparison of the different RF models (Default-RF, Balanced-RF, Processed-RF, and Tuned-RF) in terms of Acc (%), Pre, Re, KMA (%), and Fs.

Experimental results
Following the Second Approach, this section reports the results obtained by performing EDA, training and testing the best-performing ML model (the RF classifier), and then fine-tuning that model using Grid Search CV along with k-fold CV.
After applying EDA and eliminating the feature “Heart Rate,” the prediction model is trained on 90% of the instances and validated on the remaining 10%. The accuracy is observed to improve under the same test conditions. We then fine-tuned the RF model over a grid of hyperparameter values using Grid Search CV for better prediction outcomes. The results for the processed data and the hyper-tuned RF model are summarized in Table 4. The CM of the processed and tuned RF models and their accuracy comparison are depicted in Figures 14 and 15, respectively. Finally, the performance improvement of the RF prediction model is clearly noticeable and is represented in Figure 16.

Conclusion & future scope
This work culminates in the construction of a stage prediction model. In the initial phase of constructing the classification model, five machine-learning classifiers were employed. Among these classifiers, the Random Forest (RF) classifier demonstrated an accuracy of 86.28% when applied to the obtained dataset. Subsequently, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to balance the initial dataset. This resulted in an accuracy of 90.16% in the testing phase, within the optimal customizable setting. During the subsequent phase, feature engineering and data cleaning procedures were executed, involving the removal of data outliers and the deletion of extraneous variables. As a result, the accuracy of the model improved to 91.18%. The results indicate that the suggested model generalizes better when applied to the processed dataset. In addition, hyperparameter tuning was conducted to determine the optimal hyperparameter values for the Random Forest method. With the optimal hyperparameters determined by the Grid Search CV tuning technique, the model achieves an enhanced accuracy of 93.14%. Cross-validation with a five-fold data-splitting methodology over the entire dataset resulted in a mean accuracy of 93.11%. This outcome suggests a stable prediction model that is not prone to overfitting.
A future extension of this research is a real-time alert and intervention system that acts on the risk predictions to send timely notifications to healthcare professionals, allowing personalized care. The Android app's user experience and interface can be improved to ensure its effectiveness and widespread adoption. A feedback mechanism that gathers input from healthcare professionals and pregnant women can be incorporated to enhance the usability and accessibility of the system. Because the study involves collecting sensitive health data, robust data privacy and security measures are of utmost importance: a strong encryption technique can be developed, and compliance with privacy regulations should be ensured to protect the confidentiality and integrity of the collected data.
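
As an illustration of the workflow summarized in this section, the following is a minimal Python sketch using scikit-learn. It is not the authors' code. The data are synthetic stand-ins (the real dataset is not reproduced here), the 25 bpm plausibility threshold, the hyperparameter grid, and all variable names are assumptions made for the example, and the SMOTE balancing step is omitted. The accuracies it produces are therefore unrelated to the values reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the dataset (assumption): 1014 instances,
# five generic features and three risk classes.
X5, y = make_classification(
    n_samples=1014, n_features=5, n_informative=4, n_redundant=1,
    n_classes=3, random_state=0,
)

# Hypothetical "Heart Rate" column with two implausible 7 bpm readings,
# mimicking the two outlier records described in the text.
rng = np.random.default_rng(0)
heart_rate = rng.integers(60, 101, size=len(y)).astype(float)
heart_rate[[10, 20]] = 7.0
X_full = np.column_stack([X5, heart_rate])

# Data cleaning: drop records with an implausible heart rate
# (the 25 bpm cut-off is an assumption), then drop the Heart Rate feature.
keep = X_full[:, -1] >= 25
X_clean = X_full[keep][:, :-1]
y_clean = y[keep]

# 90% of the instances for training, 10% held out for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.10, stratify=y_clean, random_state=0
)

# Grid search over a small, illustrative hyperparameter grid (assumption),
# scored by accuracy with five-fold cross-validation on the training split.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)

# Five-fold cross-validation of the tuned configuration over the whole cleaned set.
cv_scores = cross_val_score(search.best_estimator_, X_clean, y_clean, cv=5)
mean_cv_accuracy = cv_scores.mean()

print("best parameters:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 4))
print("mean 5-fold accuracy:", round(mean_cv_accuracy, 4))
```

Comparing the held-out accuracy with the mean five-fold accuracy is one simple way to check the stability claim: a large gap between the two would hint at overfitting to a particular split.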

Conflicts of interest
There are no known conflicts of interest for the authors in the publication of this work.

How to cite this Article:
Subhash Mondal, Amitava Nag, Anup Kumar Barman and Mithun Karmakar (2023). Machine Learning-based maternal health risk prediction model for IoMT framework. International Journal of Experimental Research and Review, 32, 145-159.
DOI: https://doi.org/10.52756/ijerr.2023.v32.012
