Storage Method For Medical and Health Big Data Based On Distributed Sensor Network
Journal of Sensors
Volume 2023, Article ID 8506485, 10 pages
https://doi.org/10.1155/2023/8506485
Research Article
Received 9 August 2022; Revised 3 October 2022; Accepted 13 October 2022; Published 3 February 2023
Copyright © 2023 Hui Chen et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
A popular medical big data architecture uses embedded medical diagnostic devices with multiple sensors to monitor and collect medical data, sends the measured data to the corresponding health monitoring centers over multipurpose wireless networks, and coordinates the necessary measures with family medical service centers and regional medical service departments. However, healthcare big data is characterized by large volume, fast growth, multimodality, high value, and privacy sensitivity, so organizing and managing it in a unified and efficient way is an important research direction. To address the low balance and poor security in storing data collected by distributed sensor networks in healthcare systems, we propose a distributed storage algorithm for big data in healthcare systems. The platform adopts the Hadoop distributed file system and a distributed file storage framework as the healthcare big data storage solution, and implements data integration, multidimensional data query, and analysis mining components based on the Spark-SQL data query tool, the Spark machine learning algorithm library, and its mining and analysis pipeline, respectively. A distributed storage model of big data with three storage levels is constructed using a cloud storage architecture: the upper level handles high data access, the middle level handles data connection, and the lower level handles data archiving. The data storage intensity and levels are calculated from the known data granularity, odds, and elasticity to realize big data storage. Experiments verify that the algorithm has high distribution balance and low load balance in the storage process.
model improves the system data analysis to further optimize the query performance of the storage system. The advantages and disadvantages of existing distributed medical data storage systems are shown in Figure 1.

Researchers have studied the business requirements of healthcare data storage systems and the limitations of Hadoop extensively, and the improvement methods for Hadoop-based healthcare data storage systems can be summarized as follows. HDFS uses data blocks as its read and write units and stores metadata in the memory of the NameNode, but healthcare data contains a large number of small files, which strains NameNode memory. In addition, the Hadoop replica storage policy makes it easy for nodes with frequent read and write operations to reach the load threshold and trigger the load balancing operation of the system several times. Therefore, the performance of a Hadoop-based medical health data storage system can be optimized by improving the small file processing strategy and the replica selection strategy of Hadoop [8–10]. The Hadoop distributed system has the advantages of low cost, high scalability, and high reliability, and is suitable for storing time-sensitive medical health data but cannot meet the demand of real-time storage. HDFS aims to achieve high throughput at the cost of high latency and is not suitable for low latency read requests, but medical health data involves many read and fetch operations, and long response times affect the user experience. Combining MapReduce, Spark, and other big data analysis technologies for parallel processing of data sets is the key to analyzing the value of the data. Many healthcare data storage solutions based on improved Hadoop storage systems have been proposed, and good research results have been achieved in system storage performance optimization, efficient retrieval, and data analysis [11].

Since the reform and opening-up, medical and health care construction in China has gradually developed, organized by geographic region. Because medical and health care is built around medical and health care services, these services have been continuously promoted by the government, and medical and health care construction provides convenience for residents' lives and greatly improves their quality of life. In recent years, the social service function of medical and health care in major cities has been strong, infrastructure construction has achieved leapfrog development, and a medical and health care service system covering four levels (city, district, street, and residence) has been established. At present, research on modeling the probabilistic characteristics of big data distributed storage for the regional distribution of medical and health density falls into two main categories: probability density models that use a characteristic probability distribution function to simulate the regional distribution of medical and health density, and fitting estimation models driven by historical data on the regional distribution of medical and health density. The former lacks accuracy and universality because many factors affect the distributed storage of big data with health care density area distribution and because the spatial and temporal distributions differ widely, which makes it difficult to form a generalized application. The latter uses the historical operational data of health care density area distribution as the sample base and builds a data-driven model of the probabilistic characteristics of big data distributed storage with health care density area distribution; because it is based on historical data, it generalizes better. The Beta distribution is used to fit the prediction error of big data distributed storage with health care density area distribution, and the distribution of this prediction error is then used to determine the size of the energy storage capacity [12–15]. The t-distribution with a shift factor and a scaling factor is used to describe the big data distributed storage with regional distribution of medical health density, and the model parameters are then estimated from historical data samples. A third-order Gaussian distribution function was used to fit the probability distribution of the longitudinal moments of the big data distributed storage of the regional distribution of health care density, and good results were achieved.

The prediction errors of the big data distributed storage of health care density regional distribution were modeled using exponential and normal distribution functions, respectively, and the parameters of the distribution functions were then estimated using maximum likelihood estimation and the least squares method. However, the modeling approaches above all use a priori distribution models to simulate the probability density of big data distributed storage of health care density area distribution, so they share two drawbacks. First, the effect of parameter estimation on sample data relies on an a priori definition set subjectively, and the convergence of the fitted model is difficult to guarantee if the assumptions of the a priori model are biased [16–18]. Second, the differences in the spatial-temporal distribution of the distributed storage of big data with regional distribution of healthcare density make it necessary to use different probability density distributions for different regions, which does not meet the requirement of universally adaptable modeling.

Although the Hadoop-based approach to medical and health data storage has great practical value, it is not applicable to some specific application scenarios because the density region distribution of medical and health resources and patient groups is not considered. Optimal extraction of the density region distribution can effectively improve data quality in a big data environment. It requires obtaining the density value near each data quality sample, identifying the region where the samples are aggregated, and then completing the extraction. The traditional method first forms the original transaction data set and gives the distribution rules of the
Figure 1: Strengths and weaknesses of existing distributed medical data storage systems (panels: Advantages, Disadvantages).
data, but neglects to give the region where the data samples are aggregated, resulting in low extraction accuracy.

A time series-based method has been proposed for optimal extraction of the density region distribution in a big data environment. The method first uses a time series model to identify the time series of each data state volume, classifies the density region distribution in the time series, uses a high-density clustering method to obtain the density value near each data quality sample, identifies the region where the samples are aggregated, and introduces the label movement speed into the sliding window adaptive adjustment process to complete the optimal extraction of the density region distribution. Therefore, this paper proposes a big data distributed storage algorithm for medical and health systems. The algorithm is constructed on a cloud storage architecture and accounts for the density region distribution through a density estimation algorithm, in order to balance the storage system against actual demand, protect the stored data against attack, and realize distributed encrypted storage of big data.

2. Related Work

2.1. Traditional Medical Health Data Storage System. Currently, mature hospital systems mainly include HIS (Hospital Information System), EMRS (Electronic Medical Record System), RIS (Radiology Information System), and PACS (Picture Archiving and Communication System). The schematic diagram of the construction of the in-hospital medical and health storage system is shown in Figure 2. Traditional healthcare data storage systems mostly use relational databases, such as MySQL and SQL Server, which organize data through a relational model and store each record as a row in a two-dimensional table; however, relational databases must satisfy a predefined relational model, and each record has a fixed data length [19, 20]. Because an in-hospital system serves only a single business or a single data type of the hospital, the amount of data stored and managed is relatively small, so a relational database can meet the demand.

With the continuous development of network and information technology, the scale and complexity of medical and health data keep growing, which leads to the following limitations of using relational databases to store large-scale medical and health data: (1) Medical and health data contain much unstructured data, but the structure of a relational database is relatively fixed and cannot be applied to the storage of unstructured data. (2) A relational database is limited by the storage capacity of a single machine and cannot be applied to medical and health care big data storage scenarios. Although relational databases support distributed expansion, the installation and maintenance costs are high because of the complex partitioning rules of distributed relational databases. (3) The scalability of relational databases is poor, and it is difficult to share data among different medical and health institutions. (4) Reads and writes in a relational database must go through SQL parsing, and the performance of concurrent reads and writes on large-scale data is weak. (5) The volume of data is too large, which makes it difficult for data analysis software to analyze the data effectively and accurately. In summary, the traditional relational database can no longer meet the storage needs of terabytes and petabytes of medical and health data in the era of big data [21, 22].

2.2. Distributed Medical and Health Data Storage System. After a long period of development, data storage systems have gradually evolved from stand-alone storage systems to storage systems that support distributed expansion. Subsequently, distributed solutions for relational databases and NoSQL databases that natively support distributed storage have emerged. This section introduces the Hadoop distributed storage system and NoSQL databases, respectively. Hadoop is a mainstream distributed system supporting massive data storage and processing, including the Hadoop Distributed File System (HDFS), MapReduce, the Hadoop Database (HBase), and other important components [23, 24]. Among them, HDFS is the data storage and management center of the Hadoop system, with high fault tolerance, efficient writing, and other characteristics. The NameNode is responsible for managing the metadata and the DataNodes of the file system, and the DataNode is the actual working node of the file system, which is responsible for storing and retrieving data and sending the stored block information to the NameNode
operations for all object domains in a determined high-density subset.

Although the above method is of great practical value, it is not applicable to some specific application scenarios. For example, when exploring changes in animal habits by collecting migration data of North American hoofed animals, the data is obtained by radio telemetry with large positioning errors and sampling intervals, and large errors are introduced when extracting metadata features such as speed and curvature, resulting in extremely unreliable classification results. In addition, trajectory data obtained by radar scanning, Wi-Fi indoor positioning, cellular positioning, Flickr photo location data, etc. have similar statistical characteristics. For this kind of data, if the trajectories of different categories overlap severely in space, their separability is generally considered weak; conversely, if the trajectories are separated in space to some degree, their location-related features can be fully exploited [30].

The two-dimensional space containing the two-dimensional trajectory segments is divided, with the minimum description length (MDL) as the criterion for selecting the granularity of the division, and rectangular homogeneous regions containing only one type of trajectory are extracted as features. Compared with the trajectory pattern feature method, this method not only improves the classification accuracy of trajectories but also improves the training efficiency of the classifier. However, this method assumes that the significant regions are approximately rectangular, which is not always true in practice. In addition, to reduce the search complexity of the optimal classification, the method projects onto the x- and y-axes and selects the division points of each axis alternately, which limits how well the division can follow the trajectory cluster distribution. To address this limitation, a spatial region merging strategy has been proposed to extract homogeneous regions; however, it does not remove the restriction to rectangular regions and therefore remains strongly limited. In addition, a Gaussian mixture model (GMM) has been proposed to fit the distribution of trajectory segments in space, which eliminates the defects of the region division method and extends the approach to classifying trajectory data in 3D or even higher dimensions.

3. Methods

3.1. Model Structure. The distributed storage architecture for medical and health care big data proposed in this paper is shown in Figure 4. The architecture consists of an application layer, a storage layer, and a platform layer. The application layer consists of the clients of the HIS and PACS systems, which provide users with an operation interface, information management, image viewing, and other functions. The storage layer is a two-level storage model with a local side and a cloud side. The local side consists of the HIS server and the PACS server, which can be built on local servers and are responsible for storing and managing the structured information data and recent image data of the hospital; the cloud side is built on a large-scale FastDFS distributed cluster, which is responsible for the permanent storage of long-term files. The platform layer is a virtual platform built on top of the infrastructure by virtualization technology, which facilitates the provision of cloud services through the rational and efficient use of server resources.

Figure 4: Model structure.

3.2. Distributed Sensor Network. A medical health monitoring system based on a distributed sensor network is a networked physiological monitoring and physiotherapy system for collecting users' body status data; it should support automatic recording, continuous monitoring, warning notification, intelligent judgment, self-correction, and standard transmission. The noninvasive physiological signal monitoring system is an important part of the monitoring system. It consists of multiple sensors that measure medical data, including important vital signs such as blood pressure, blood glucose, heart rate, blood oxygen concentration, and arterial oxygen pressure saturation. For example, a noninvasive wristwatch blood pressure monitor can be worn like a watch to monitor blood pressure and record pulse rate 24/7 without discomfort over long periods. Over the Internet, medical monitoring data from the distributed sensor network is transmitted by multiple complementary wireless networks to a specific health monitoring center, where it is integrated into the permanent electronic medical record of the designated user. As a result, the medical staff at the health monitoring center can monitor various vital signs of the user at any appropriate time, and if any
abnormal physical signs are detected, the medical staff will give appropriate medical instructions before the condition deteriorates and then take steps to treat it. Specialists at the health monitoring center can also accurately locate the user, consult with the doctor at his or her home monitoring center, and coordinate with local medical services using the fastest delivery method to provide timely medical assistance. The goal of the health monitoring system is to monitor the health status of the user anytime and anywhere. Two typical situations are therefore considered: the user is at home or near his or her residence, or the user is far from home or in another city. Considering these two situations, the authors propose a distributed health monitoring system in which health monitoring centers are distributed across regions. In case 1, the user's medical monitoring data is sent to the home health monitoring center; in case 2, it is sent to the corresponding visiting health monitoring center.

3.3. Distributed Storage Algorithm. When storage nodes acquire data storage requirements, the distributed data storage continuously sends preservation requests. Using the storage capacity analysis and data storage hierarchy designed in this paper, the data is preserved if the demand of Equation (7) is met after computation; otherwise, Equations (1)–(7) are executed repeatedly. The data storage process is organized into three levels: the upper level handles high data access, the lower level handles data archiving, and the middle level connects the upper and lower levels. The upper level of the data storage process is mainly represented by Equations (1)–(7). If the adoption probability of distributed data is denoted $P(x)$, the expectation of the adoption probability, $E[P(x)]$, is inversely related to the elastic expectation, $E[T(x)]$, so the elastic expectation of distributed data is calculated through

$$E[P(x)] = \frac{\lambda - \lambda^{2} E[T(x)]^{2} - E[T(x)]}{E[T(x)]}. \quad (1)$$

is necessary to control the preservation capacity during data storage and achieve continuous data storage by modifying the data granularity. The result obtained from Equation (1) affects the smoothness of data storage and is normally negative, but if the data granularity is small, the result of Equation (1) may be positive. Therefore, adjusting the data elasticity $T(x)$ through the granularity rate $p$ can relieve data storage congestion and reduce the storage space occupied. Since $T(x)$ and $p$ are negatively correlated, $T(x)$ can maintain its original value by means of Equation (2).

$$T(x) \Rightarrow p. \quad (2)$$

The expected value $E[T(x) \Rightarrow p]$ can reach the time function and is therefore described by Equation (3) as:

$$E[T(x) \Rightarrow p] = \int E[T(x)]. \quad (3)$$

Therefore, based on Equation (3), if $p$ is still taken to be the data preservation access granularity rate at this stage, $T(x)$ at the next moment is given by

$$T(x) = \int E[T(x)] + E[T(x) \Rightarrow p]. \quad (4)$$

Since the distributed storage has limited bandwidth during big data storage, $T(x)$ can be completely covered by the storage hierarchy, while the confirmed coverage association $\Delta t$ corresponding to the random moment can be expressed by

$$\Delta T(x) = \int_{\Delta}^{\Delta t} p \sqrt{\Delta^{2} - T(x)^{2}}. \quad (5)$$

Based on Equation (5), the big data distributed storage intensity index $\Delta\lambda$ is described by
ervation of the distributed storage of big data.

Figure 5: Training set and test set loss convergence during training (horizontal axis: Epoch, 1 to 30; legend: Training set, Validation set).

3.4. Density Area Distribution. In the big data environment, most of the original state data time series contain multiple feature data, so in the process of optimizing the extraction of feature data in the big data environment, it is necessary to use the time series model to divide all the collected data states into multivariate continuous time series, give the high-quality data state volume amplitude change law, extract the feature data state characteristics, and calculate the feature data on the time. The effect of the feature data on the time series fitting is calculated, and the residuals of the time series fitting of each data distribution state are obtained. The specific steps are detailed as follows: suppose $\alpha_{df}^{\prime\,yy}$ represents the number of each data state set in the big data environment, and $X_{th}$ represents the value of the data state volume $l$ at the moment $h$. Using Equation (9), all the collected data states are divided into a multivariate continuous time series

$$rt_{ghpp}^{\prime} = \frac{\alpha_{df}^{\prime\,yy} \ast X_{th}}{p_{f}^{\prime\,gg}}, \quad (9)$$
Figure 7: Comparison of read rates of different algorithms (Mb/s). Horizontal axis: amount of data (Mb), 2000 to 10000. Legend: this article's algorithm; huge amount of spatial data cloud storage and query algorithms; Hadoop-based big data storage algorithm.

them in a distributed database. The healthcare big data platform uses the distributed database SequoiaDB to store data. A SequoiaDB distributed database contains data nodes, catalog nodes, and coordination nodes. When an application sends an access request to a coordination node, the coordination node first determines the optimal data nodes by communicating with the catalog node and distributes the query task, and finally aggregates the query results of each data node and returns them to the application.

The data computing platform uses the Spark computing framework, which supports a variety of data storage models and can be combined with Hadoop to share storage resources and computation in a Hadoop cluster. Spark can compute on data that is accessed frequently and centrally and keep such data in memory to improve access efficiency. Users submit data requests on the healthcare platform, and the platform analyzes the user input and presents the data. In addition to direct list display, the platform also provides graphical display: it encodes the statistically classified data as graphics using the mainstream visualization technology HTML5 and the chart drawing library Chart.js, and presents the data as statistical chart reports.
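The request flow among coordination, catalog, and data nodes described above can be illustrated with a small scatter-gather sketch. It is not SequoiaDB's API: the `DataNode`, `Catalog`, and `Coordinator` classes, the modulo placement on `patient_id`, and the record fields are all hypothetical names chosen for illustration, and the data lives in memory only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, int]


@dataclass
class DataNode:
    """Hypothetical data node that keeps its records in memory."""
    name: str
    records: List[Record] = field(default_factory=list)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        return [r for r in self.records if predicate(r)]


class Catalog:
    """Hypothetical catalog: knows which data node stores which key."""

    def __init__(self, nodes: List[DataNode]) -> None:
        self.nodes = nodes

    def node_for(self, patient_id: int) -> DataNode:
        # Assumption: simple modulo placement on an integer key.
        return self.nodes[patient_id % len(self.nodes)]

    def nodes_for(self, patient_ids: Optional[Iterable[int]] = None) -> List[DataNode]:
        # Without a key hint, every data node must be asked.
        if patient_ids is None:
            return list(self.nodes)
        chosen: Dict[str, DataNode] = {}
        for pid in patient_ids:
            node = self.node_for(pid)
            chosen[node.name] = node
        return list(chosen.values())


class Coordinator:
    """Hypothetical coordination node: routes, scatters, and aggregates."""

    def __init__(self, catalog: Catalog) -> None:
        self.catalog = catalog

    def insert(self, record: Record) -> None:
        self.catalog.node_for(record["patient_id"]).records.append(record)

    def query(self, predicate: Callable[[Record], bool],
              patient_ids: Optional[Iterable[int]] = None) -> List[Record]:
        results: List[Record] = []
        # Scatter the query to the nodes the catalog selects, then gather.
        for node in self.catalog.nodes_for(patient_ids):
            results.extend(node.query(predicate))
        return sorted(results, key=lambda r: r["patient_id"])


nodes = [DataNode(f"data-{i}") for i in range(3)]
coord = Coordinator(Catalog(nodes))
for pid, hr in [(0, 72), (1, 118), (2, 95), (3, 130), (4, 101), (5, 64)]:
    coord.insert({"patient_id": pid, "heart_rate": hr})

# Query all nodes for records with an elevated heart rate.
high = coord.query(lambda r: r["heart_rate"] > 100)
```

When the caller supplies `patient_ids`, the catalog narrows the scatter to the nodes that can hold those keys, which is the role the text assigns to the catalog lookup; a real deployment would rely on the database's own partitioning and client driver instead.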
4.2. Experimental Results. When raw big data is stored with an unbalanced distribution, local hotspots form easily: some nodes carry heavy loads while others remain idle. Therefore, the original big data distribution balance of the different algorithms is analyzed on the basis of the distribution balance degree. The time required by the different algorithms to store different quantities of big data is analyzed, and the results are shown in Table 2. According to Table 2, the storage time of all three algorithms increases gradually as the data quantity grows. When the data quantity is 2000 Mb, the storage time of the massive spatial data cloud storage and query algorithm is the highest, reaching 67 s. When the data quantity is 20000 Mb, the storage time of this algorithm remains the highest of the three at 156 s, which is 15 s higher than the Hadoop-based big data storage algorithm and 84 s higher than the algorithm in this paper. The storage time of the algorithm in this paper is as low as 26 s, so using it can effectively reduce the big data storage time.

Analyzing the ability to read and write data can effectively determine the real-time performance of a storage algorithm. The results, shown in Figures 7 and 8, compare the algorithms through data reading and writing operations at different data volumes. According to Figures 7 and 8, the read data rates of the three algorithms are 79 Mb/s, 46 Mb/s, and 51 Mb/s when the data volume is 2000 Mb. The read/write rates of all three algorithms improve as the data volume increases, but when the data volume reaches 10000 Mb, the read data rate of the Hadoop-based big data storage algorithm remains the lowest of the three. At this data volume, the algorithm in this paper reaches read and write data rates of 93 Mb/s and 94 Mb/s, respectively, the highest among the three algorithms, and it also keeps the highest read and write data rates at the other data volumes. This shows that the algorithm in this paper has high read and write data rates and can realize faster distributed storage of big data.

The utilization rate of the density area distribution under different numbers of iterations is analyzed by comparing the three algorithms, and the results are shown in Figure 9. According to Figure 9, the utilization rate of the data density area distribution of all three algorithms increases with the number of iterations, and that of the algorithm in this paper always remains above 90% between 100 and 600 iterations, which indicates that its utilization rate of the data density area distribution is high.

5. Conclusion

In the era of big data, the scale of medical and health data expands dramatically, and the data presents multimodal characteristics. The traditional relational database can no longer guarantee efficient storage and fast response for massive data, and for this reason distributed storage technology provides a new approach to storing massive medical and health data. Based on the advantages of HDFS, HBase, and MapReduce, the Hadoop-based healthcare data storage system further optimizes storage and query performance to realize a smart healthcare storage system that integrates high throughput, fast location, and efficient analysis. The medical health data storage system based on a distributed database can meet the demand for unified storage and fast response of multimodal medical health data and provides platform support for subsequent multimodal data analysis and medical health data mining.

In this paper, we study a distributed storage algorithm for medical and health care big data that considers the density area distribution, design the process of big data distributed storage on a cloud storage architecture, and use the density area distribution algorithm to complete the distribution and decryption of the stored big data, so that the stored big data can be used with maximum efficiency. We verify the storage capability of the algorithm through experiments. The load balance is lower, and the encryption is more resistant to attack. In future research, the big data distributed storage algorithm can be further optimized so that it can be applied to various fields. We also plan to study distributed medical and health data storage schemes for privacy data protection and for medical and health knowledge inference.

Data Availability

The datasets used during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

[1] J. Mittendorfer and M. Niederreiter, “Striking complexity of the photon field in medical devices with heterogeneous density distribution and challenges for industrial irradiators,” Radiation Physics and Chemistry, vol. 190, p. 109778, 2022.
[2] R. A. Jordan, G. Sydney, and E. Andrea, “Relevance of spatial and temporal trends in nymphal tick density and infection prevalence for public health and surveillance practice in long-term endemic areas: a case study in Monmouth County, NJ,” Journal of Medical Entomology, vol. 4, p. 4, 2022.
[3] G. He, Z. Ma, X. Wang, Z. Xiao, and J. Dong, “Does the improvement of regional eco-efficiency improve the residents' health conditions: empirical analysis from China's provincial data,” Ecological Indicators, vol. 124, article 107387, 2021.
[4] E. C. Emond, A. Bousse, L. Brusaferri, B. F. Hutton, and K. Thielemans, “Improved PET/CT respiratory motion compensation by incorporating changes in lung density,” IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 99, pp. 1–1, 2020.
[5] L. F. Knudsen, A. J. Terkelsen, P. D. Drummond, and F. Birklein, “Complex regional pain syndrome: a focus on the autonomic nervous system,” Clinical Autonomic Research, vol. 29, no. 4, pp. 457–467, 2019.
[6] A. Jalali, C. Martin, R. E. Nelson et al., “Provider practice competition and adoption of Medicare's oncology care model,” Medical Care, vol. 58, no. 2, p. 1, 2019.
[7] D. A. Marshall, L. Burgos-Liz, K. S. Pasupathy et al., “Transforming healthcare delivery: integrating dynamic simulation modelling and big data in health economics and outcomes research,” PharmacoEconomics, vol. 34, no. 2, pp. 115–126, 2016.
[8] K. Kaur and R. Rani, “A smart polyglot solution for big data in healthcare,” IT Professional, vol. 17, no. 6, pp. 48–55, 2015.
[9] D. Lopez and G. Manogaran, “A survey of big data architectures and machine learning algorithms in healthcare,” International Journal of Biomedical Engineering and Technology, vol. 25, no. 2/3/4, p. 182, 2017.
[10] R. K. Gisele, “Big data in healthcare,” Journal of healthcare Communications, vol. 1, no. 4, 2016.
[11] F. Leppert and W. Greiner, “Big data in healthcare - opportunities and challenges,” Value in Health, vol. 19, no. 7, pp. A463–A463, 2016.
[12] H. Chang, “Book review: data-driven healthcare & analytics in a big data world,” Healthcare Informatics Research, vol. 21, no. 1, p. 61, 2015.
[13] P. K. Sahoo, S. K. Mohapatra, and S. L. Wu, “Analyzing healthcare big data with prediction for future health condition,” IEEE Access, vol. 4, pp. 9786–9799, 2017.
[14] C. C. Yang and P. Veltri, “Intelligent healthcare informatics in big data era,” Artificial Intelligence in Medicine, vol. 65, no. 2, pp. 75–77, 2015.
[15] F. Firouzi, A. M. Rahmani, K. Mankodiya et al., “Internet-of-things and big data for smarter healthcare: from device to architecture, applications and analytics,” Future Generation Computer Systems, vol. 78, pp. 583–586, 2017.
[16] H. A. Al Hamid, S. M. M. Rahman, M. S. Hossain, A. Almogren, and A. Alamri, “A security model for preserving the privacy of medical big data in a healthcare cloud using a fog computing facility with pairing-based cryptography,” IEEE Access, vol. 5, pp. 22313–22328, 2017.
[17] S. Ryu and T. M. Song, “Big data analysis in healthcare,” Healthcare Informatics Research, vol. 20, no. 4, pp. 247–248, 2014.
[18] T. M. Song and R. Seewon, “Big data analysis framework for healthcare and social sectors in Korea,” Healthcare Informatics Research, vol. 21, no. 1, pp. 3–9, 2015.
[19] M. S. Hossain and G. Muhammad, “Healthcare big data voice pathology assessment framework,” IEEE Access, vol. 4, no. 99, p. 1, 2017.
[20] J. Wu, H. Li, S. Cheng, and Z. Lin, “The promising future of healthcare services: when big data analytics meets wearable technology,” Information & Management, vol. 53, no. 8, pp. 1020–1033, 2016.
[21] H. He, Z. Du, W. Zhang, and A. Chen, “Optimization strategy of Hadoop small file storage for big data in healthcare,” Journal of Supercomputing, vol. 72, no. 10, pp. 3696–3707, 2016.
[22] S. Rallapalli, R. R. Gondkar, and U. Ketavarapu, “Impact of processing and analyzing healthcare big data on cloud computing environment by implementing Hadoop cluster,” Procedia Computer Science, vol. 85, pp. 16–22, 2016.
[23] S. S. Tan, G. Gao, and S. Koch, “Big data and analytics in healthcare,” Methods of Information in Medicine, vol. 54, no. 6, pp. 546–547, 2015.
[24] C. Vaitsis, G. Nilsson, and N. Zary, “Visual analytics in healthcare education: exploring novel ways to analyze and represent big data in undergraduate medical education,” Peer J, vol. 2, article e683, 2014.
[25] J. Adler-Milstein and A. K. Jha, “Healthcare's "big data" challenge,” The American Journal of Managed Care, vol. 19, no. 7, pp. 537–538, 2013.
[26] F. A. Batarseh and E. A. Latif, “Assessing the quality of service using big data analytics: with application to healthcare,” Big Data Research, vol. 4, pp. 13–24, 2016.
[27] L. A. Tawalbeh, R. Mehmood, E. Benkhelifa, and H. Song, “Mobile cloud computing model and big data analysis for healthcare applications,” IEEE Access, vol. 4, no. 99, pp. 6171–6180, 2017.
[28] D. V. Dimitrov, “Medical internet of things and big data in healthcare,” Healthcare Informatics Research, vol. 22, no. 3, pp. 156–163, 2016.
[29] E. Kai, P. P. Ghosh, S. Inoue, and A. Ahmed, “Gram health big data for smart healthcare applications,” BME, vol. 51, 2013.
[30] M. U. S. U. Sarwar, M. K. Hanif, R. Talib, A. Mobeen, and M. Aslam, “A survey of big data analytics in healthcare,” The Science and Information (SAI) Organization Limited, vol. 6, 2017.