Machine learning-based maternal health risk prediction model for IoMT framework
Subhash Mondal, Amitava Nag, Anup Kumar Barman and Mithun Karmakar*
Article History: Received: 15th Apr., 2023; Accepted: 28th Jul., 2023; Published: 30th Aug., 2023
Keywords: Maternal Health Risk, Internet of Medical Things (IoMT), Prediction Model, Exploratory Data Analysis (EDA), Android-based Application, Random Forest Classifier
Abstract: The Internet of Things (IoT) is vital as it offers extensive applicability in various fields, including healthcare. For monitoring and predicting abnormalities related to the risk level during pregnancy, IoT devices provide a means to collect real-time health data, enabling continuous monitoring and analysis in Internet of Medical Things (IoMT) environments. By integrating IoT devices into the system, crucial signs such as Heart Rate (HR), Systolic and Diastolic Blood Pressure (BP), Fetal Movements (FM), and Temperature (T) can be tracked remotely and non-invasively. This allows for the timely detection of abnormalities or potential risk factors during pregnancy, empowering healthcare professionals to intervene proactively and provide personalized care. This research focuses on developing a system for observing and predicting the maternal risk level in the IoT environment, mainly in remote areas. The goal is to improve maternal health and reduce maternal and child mortality rates, in line with the significant decline called for by the United Nations targets for 2030. The research utilizes analytical tools and Machine Learning (ML) algorithms to analyze health data and risk factors associated with pregnancy. The acquired dataset contains various risk factors categorized and classified based on intensity. After comparing the experimental results of different ML models, Exploratory Data Analysis (EDA) approaches were applied to determine the most effective risk factors. The fine-tuned Random Forest Classifier (RF) achieves the highest accuracy of 93.14%. An Android-based application has also been developed to deploy the prediction model and determine risk levels based on the different parameters.
Figure 1. IoT-based automated smart maternal health risks monitoring system: based on machine learning
Materials and methods
The proposed system is an Android-based maternal health risk prediction system in an IoT environment, designed to analyze data from IoT devices and predict the health risk level of a woman during pregnancy. Its primary objective is to improve maternal health outcomes by identifying high-risk cases early on. The system architecture of the proposed model based on ML classifiers is depicted in Fig. 1. The system workflow is explained step by step in the phases below.
Data Collection: The system would gather relevant data about pregnant women, including age, blood pressure (from IoT device), blood sugar, body temperature (from IoT device) and heart rate (from IoT device).
Data Preprocessing: The system would preprocess raw data to make it suitable for further analysis and modelling.
Exploratory Data Analysis (EDA): EDA is an approach for analyzing and visualizing data to gain insights, understand the underlying patterns, and identify relationships between variables. It helps in understanding the structure of the data, detecting outliers, and assessing variables.
Feature Selection/Feature Engineering: It is the process of choosing a subset from a large set of available features in a dataset.
Machine Learning Models: The system would then utilize machine learning algorithms to analyze the collected data and identify patterns and correlations between risk factors and potential health risks.
Risk Level Assessment: Based on the analysis, the system would assign a risk level to each pregnant woman, indicating the likelihood and severity of potential health risks. This scoring system can help prioritize high-risk cases for further and immediate medical attention.
Early Warning: The system can generate alerts and notifications for healthcare professionals and registered family members when a patient's risk level crosses a certain threshold.
Android Application Deployment: Deploying an Android app that utilizes machine learning models involves several steps. First, the machine learning model must be trained and optimized for mobile deployment. Then, the model is integrated into the Android app, ensuring compatibility and efficient resource usage. Finally, the app and the embedded machine learning model are packaged so that users can benefit from the intelligent functionalities.
Implementation details
Following the proposed system architecture, we first collected the raw dataset from an open source and applied data preprocessing techniques to transform it into a processed dataset, on which the ML models were deployed to decide whether there is any risk of abnormalities during pregnancy. The prediction model based on the ML technique in the IoT environment has two stages: stage 1 performs the initial model prediction, and in stage 2 EDA is applied for feature selection ahead of the final model deployment. The detailed architecture is depicted in Figure 2.
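The Risk Level Assessment and Early Warning phases described above can be illustrated with a minimal sketch. The paper does not specify the risk labels, the threshold, or the notification mechanism, so the label names (`low`, `mid`, `high`), the `Alert` type, and both function names are hypothetical, and "crosses a threshold" is interpreted here as "reaches or exceeds" it.

```python
from dataclasses import dataclass

# Assumed ordering of risk levels; the label names are illustrative placeholders.
RISK_ORDER = {"low": 0, "mid": 1, "high": 2}


@dataclass
class Alert:
    patient_id: str
    risk_level: str
    recipients: list


def early_warning(patient_id, risk_level, recipients, threshold="high"):
    """Return an Alert when risk_level reaches the threshold, else None."""
    if risk_level not in RISK_ORDER or threshold not in RISK_ORDER:
        raise ValueError(f"unknown risk level: {risk_level!r} / {threshold!r}")
    if RISK_ORDER[risk_level] >= RISK_ORDER[threshold]:
        # Copy the recipient list so later changes by the caller do not leak in.
        return Alert(patient_id, risk_level, list(recipients))
    return None


def prioritize(predictions):
    """Sort (patient_id, risk_level) pairs so the highest risk comes first."""
    return sorted(predictions, key=lambda p: RISK_ORDER[p[1]], reverse=True)
```

In a deployment, the returned `Alert` would be handed to whatever notification channel the application uses; that part is outside the scope of this sketch.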
DOI: https://doi.org/10.52756/ijerr.2023.v32.012
Int. J. Exp. Res. Rev., Vol. 32: 145-159 (2023)
Table 2. Experimental results of the 1st-stage model prediction
Technique, an algorithm used to address the class
Table 3. Experimental findings of the balanced employed models
Figure 9. The histogram and boxplot of the Diastolic BP, BS, Body Temp, and Heart Rate
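Boxplots such as those in Figure 9 conventionally mark points beyond 1.5 times the interquartile range (IQR) from the quartiles as outliers. The paper does not state which rule its plots use, so the sketch below assumes that common convention; the function names are hypothetical and the sample values in the usage note are invented for illustration only.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" treats the data as the whole population of interest.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]
```

For example, calling `flag_outliers` on a made-up list of heart rates in which one reading is 7 would flag only that reading, since it lies far below the lower fence.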
range of values and the central tendency of the values. Univariate data analysis does not look at relationships between variables (as bivariate and multivariate analysis do); rather, it summarizes each variable independently. The methods used for univariate analysis depend on whether the variable is categorical or numerical. For a numerical variable, we explore the shape of the distribution (which can be either symmetric or skewed) using histograms and density plots. We use bar plots to visualize the absolute and proportional frequency distributions of categorical variables.
The univariate analyses were performed using the histograms and boxplots of all the features, depicted in Figures 8 and 9.
Observation: Almost all variables have outliers that skew the distribution. We ignore those outliers for now because the values seem natural in this case, except for “Heart Rate.” That variable has an outlier that is too far from the other values.
Bivariate analysis
Bivariate analysis helps study the relationship between two variables. It helps to find out whether there is an association between the variables and, if so, how strong that association is. One variable here is dependent, while the other is independent. We used correlation coefficients to measure the strength of the relationship between two variables. We also used scatter plots to show the patterns formed by the two variables. A heatmap of the correlations among the features and with the target column, derived to check how strongly the features are related, is depicted in Figure 10.
Observation: “Systolic BP” and “Diastolic BP” are highly correlated. As the graph shows, they have a positive correlation with a correlation coefficient of 0.79. This means that the SystolicBP and DiastolicBP variables contain highly similar information, with very little or no variance in information. This problem is known as multicollinearity, which undermines the statistical significance of an independent variable.
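The correlation check behind this observation can be sketched as follows. The paper does not say which correlation coefficient or which tooling was used; this example assumes the Pearson coefficient, and the function names and the `cutoff` value of 0.75 are hypothetical choices made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, cutoff=0.75):
    """Return (name_a, name_b, r) for feature pairs with |r| >= cutoff."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= cutoff:
                pairs.append((a, b, r))
    return pairs
```

Passing a mapping of feature names to value lists to `highly_correlated_pairs` returns the pairs that are candidates for multicollinearity; with a cutoff of 0.75, a pair with a coefficient of 0.79 such as the one reported above would be included.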
Figure 13. Multivariate histogram of Body Temp and Heart Rate concerning Risk Level
analyzed the risk level by considering two variables at a time. We observed that in the previous two stages, “Heart Rate” and “Body Temperature” were highly correlated with the response. In this case, only one scatter plot is provided for the conclusion, in Figure 13.
Observation: Pregnant women with a higher body temperature seem to have a higher health risk, regardless of their heart rate. As noted in the previous analysis, most pregnant women in this observation have a body temperature of 98 °F. The HeartRate variable is of limited help in this case.
Table 4. Proposed prediction model experimental results
Model          Accuracy (%)  Precision  Recall  KMA (%)  F1-score
Processed-RF   91.176        0.917      0.911   90.897   0.912
Tuned-RF       93.137        0.937      0.932   93.111   0.932
“Seven” because that value does not make sense and is most likely an input error.
We do not store the processed data in the original variable; instead, we store it in a new variable so that it can be compared with the original data. Then, after conducting several analyses of the predictor variables, we conclude that the “Heart Rate” variable is less helpful in determining the health risks of pregnant women, so it is safe to remove that variable. If we delete that variable, one might wonder why we drop records with outliers on the HeartRate variable. The answer is that such a record contains an input error, so the record as a whole may not be legitimate. Its label may also be incorrect, which would mislead the training process and make the model less accurate.
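The two clean-up steps described here (dropping records with an implausible heart rate into a new variable, then removing the HeartRate feature) can be sketched as below. This is an illustration under assumptions: the plausibility bounds are hypothetical and not taken from the paper, records are modelled as plain dictionaries, and the field names other than `HeartRate` are placeholders.

```python
import copy

# Hypothetical plausibility bounds in beats per minute; not taken from the paper.
MIN_PLAUSIBLE_HR = 30
MAX_PLAUSIBLE_HR = 220


def drop_implausible_heart_rate(records, low=MIN_PLAUSIBLE_HR, high=MAX_PLAUSIBLE_HR):
    """Return a new list without records whose HeartRate is outside [low, high].

    The input list is left untouched so the two versions can be compared.
    """
    return [copy.deepcopy(r) for r in records if low <= r["HeartRate"] <= high]


def drop_feature(records, name):
    """Return new records with one feature (for example 'HeartRate') removed."""
    return [{k: v for k, v in r.items() if k != name} for r in records]
```

Filtering the records first and dropping the feature afterwards mirrors the reasoning above: a record with an input error is removed as a whole, even though the column that exposed the error is not used by the final model.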