IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 8, NO. 1, JANUARY 2021

Big Data Analytics in Healthcare — A Systematic Literature Review and Roadmap for Practical Implementation
Sohail Imran, Tariq Mahmood, Ahsan Morshed, and Timos Sellis, Fellow, IEEE

Abstract—The advent of healthcare information management systems (HIMSs) continues to produce large volumes of healthcare data for patient care and compliance and regulatory requirements at a global scale. Analysis of this big data allows for boundless potential outcomes for discovering knowledge. Big data analytics (BDA) in healthcare can, for instance, help determine causes of diseases, generate effective diagnoses, enhance QoS guarantees by increasing efficiency of the healthcare delivery and effectiveness and viability of treatments, generate accurate predictions of readmissions, enhance clinical care, and pinpoint opportunities for cost savings. However, BDA implementations in any domain are generally complicated and resource-intensive with a high failure rate and no roadmap or success strategies to guide the practitioners. In this paper, we present a comprehensive roadmap to derive insights from BDA in the healthcare (patient care) domain, based on the results of a systematic literature review. We initially determine big data characteristics for healthcare and then review BDA applications to healthcare in academic research focusing particularly on NoSQL databases. We also identify the limitations and challenges of these applications and justify the potential of NoSQL databases to address these challenges and further enhance BDA healthcare research. We then propose and describe a state-of-the-art BDA architecture called Med-BDA for the healthcare domain which solves all current BDA challenges and is based on the latest zeta big data paradigm. We also present success strategies to ensure the working of Med-BDA along with outlining the major benefits of BDA applications to healthcare. Finally, we compare our work with other related literature reviews across twelve hallmark features to justify the novelty and importance of our work. The aforementioned contributions of our work are collectively unique and clearly present a roadmap for clinical administrators, practitioners and professionals to successfully implement BDA initiatives in their organizations.

Index Terms—Big data analytics (BDA), big data architecture, healthcare, NoSQL data stores, patient care, roadmap, systematic literature review.

Manuscript received June 29, 2020; revised July 21, 2020; accepted July 22, 2020. This work was supported by two research grants provided by the Karachi Institute of Economics and Technology (KIET) and the Big Data Analytics Laboratory at the Institute of Business Administration (IBA-Karachi). Recommended by Associate Editor Qinglong Han. (Corresponding author: Tariq Mahmood.)
Citation: S. Imran, T. Mahmood, A. Morshed, and T. Sellis, “Big data analytics in healthcare — A systematic literature review and roadmap for practical implementation,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 1, pp. 1–22, Jan. 2021.
S. Imran is with the Faculty of Computer Science, Karachi Institute of Economics and Technology, Karachi 75190, Pakistan (e-mail: sohail@pafkiet.edu.pk).
T. Mahmood is with the Faculty of Computer Science, Institute of Business Administration, Karachi 75270, Pakistan (e-mail: [email protected]).
A. Morshed is with the School of Engineering and Technology, CQ University, Melbourne 3000, Australia (e-mail: [email protected]).
T. Sellis is with the Data Science Research Institute, Swinburne University of Technology, Hawthorn 3122, Australia (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JAS.2020.1003384

I. Introduction

The advent of healthcare information management systems (HIMSs) is now generating huge volumes of patient-centered, granular-level healthcare data. The high velocity of this data influences the relationship of hospitals and clinics with their patients and necessitates the use of analytics to tap into the needs, attitudes, preferences, and characteristics of clinical entities such as patients and practitioners [1]–[3]. Hence, HIMSs are now required to implement different data deployment, management and analytics strategies with the usage of state-of-the-art big data tools, techniques and technologies in order to utilize and handle the transformation of the heterogeneous healthcare data into valuable and useful insights [4]. In fact, big data is already motivating the use of new architectures to transfer the operational models and data-centric architectures of HIMSs [5], [6]. Also, big data in healthcare is rapidly changing with the advent of system development approaches that are highly compatible with widely distributed systems, particularly non-relational NoSQL technology for big data ingestion, storage, management, querying and analysis, e.g., through the use of MongoDB’s and Apache Hadoop’s ecosystems [7], [8].

The process of analyzing big data, or big data analytics (BDA), can tackle large-volume, high-velocity data streams enabling personalized medicine, which provides physicians with a more comprehensive (in-depth) understanding of an individual’s health. For instance, BDA can be applied to improve diagnostic treatment decisions amidst unaided human inference [9], [10]. The focus on the potential benefits of BDA has never subsided in research papers, technical blogs, and videos, motivating researchers to design solutions to address the aforementioned issues [11]. However, BDA has presented challenges in multiple business domains in the last decade. There is considerable hesitation to invest in big data technologies due to lack of standardization, a rapidly-evolving technology stack, complicated architecture design, a skill set which is difficult to learn, high resource and cost requirements, and data management, storage, access and analysis challenges. Another issue is the lack of a standard protocol of communication between the BDA team and the
business side; the BDA team typically does not have enough background knowledge of the business domain to model the analytics as per business requirements and the business side does not have the appropriate analytics knowledge (algorithms, technology stack, etc.) to tune and guide the BDA results according to personal needs. In fact, Gartner estimated that 85% of big data and BDA projects were failing in 2019 due to the aforementioned issues [12]. BDA applications in healthcare are also (currently) plagued by these issues.

In this paper, we thoroughly investigate the domain of BDA applications in the healthcare sector, particularly with respect to patient care because a majority of healthcare big data sources are related to patient care, as are the majority of research works related to BDA for healthcare. Our intention is to provide a roadmap to clinical practitioners for BDA applications in healthcare. Previously, researchers have applied data science, business intelligence and data warehousing techniques to enhance patient care [13]–[19]. These applications, although useful and numerous, are created with considerably limited and small datasets and their usability in the presence of big data cannot be guaranteed. They are also not sufficient to justify clinical use [20]–[22]. Big data is far more complex, varied, and voluminous and requires different data management tools and technologies to obtain better insights as compared to traditional data mining-based analytics. Considering the rapidly expanding big data space and the importance of patient care, it becomes important to clearly investigate and determine the exact BDA applications in this domain, their achieved benefits and the difficult challenges which need to be addressed for further research in this area.

Our vision of a roadmap in this paper is comprehensive and unique and based on the following requirements. We initially need to define the characteristics of big data as applicable to healthcare; it is generally known that HIMSs integrate, manage and synchronize big data which is characterized by 4 V’s (volume, velocity, variety, value) at a general level [23]. We need to understand the meaning of these 4 V’s in the context of healthcare, and also check their compliance with the target dataset. Rapidly-expanding and powerful NoSQL technology has alone solved many of the big data management problems since 2007, particularly through the use of Apache Hadoop and its ecosystem [7], [8]. Hence, we need to investigate and describe the current NoSQL applications in healthcare with academic research or other types of online content, and also highlight the benefits which have been achieved with these applications. We then need to determine the exact challenges being faced by the healthcare big data community, both with and without the application of these NoSQL data stores. In fact, a roadmap needs to be presented which solves these challenges in a concrete way by highlighting the untapped potential of NoSQL databases for the healthcare sector. For this, guidance needs to be provided particularly with respect to the implementation architecture for healthcare BDA. Designing a software architecture for BDA is complicated due to numerous analytical tasks which need to interact with each other over a complicated and large technology stack. Some guidance is provided by the lambda and kappa architectures but these have serious limitations [24]. The newly introduced zeta architecture [25] solves these issues and in our opinion, is an ideal solution for healthcare big data companies if it can be properly formalized. An architecture proposal also needs to be coupled with a success strategy, because many BDA projects have failed in recent years due to lack of strategic direction in leading BDA projects [3].

We address the aforementioned requirements for our roadmap specification through two main research questions (MRQ1 and MRQ2). We define MRQ1 as follows:
1) MRQ1: What is healthcare big data, and how has it been analyzed in research using BDA applications, and what challenges and benefits do these applications have in assisting patients, doctors, physicians and other medical practitioners?
To answer MRQ1, we divide it into the following four sub-research questions (SRQs):
a) SRQ1: Do healthcare datasets exhibit the characteristics and properties of big data? (answered in Section IV-B)
b) SRQ2: What are the challenges identified in research literature in applying BDA to healthcare? (answered in Section V)
c) SRQ3: What are the applications of BDA in healthcare in research literature specifically with regard to NoSQL technologies? (answered in Section VI)
d) SRQ4: What are the benefits of BDA applications in healthcare? (answered in Section VII)
MRQ2 builds upon the results of MRQ1 and we define it as follows:
2) MRQ2: Can the evolving NoSQL technology solve the current BDA challenges, what is the most relevant BDA architecture for such a solution, and what are the strategies by which it can be ensured that this solution will be successful in clinical and medical industries?
To answer MRQ2, we divide it into the following three SRQs:
a) SRQ5: What is the potential of the state-of-the-art and rapidly evolving NoSQL technology stack in addressing the challenges in BDA applications to healthcare? (answered in Section VIII)
b) SRQ6: How can BDA architecture incorporating NoSQL and other big data technologies be used as a guidance for future BDA implementations in the healthcare sector? (answered in Section IX)
c) SRQ7: What are the practical strategies which can be employed by healthcare professionals to ensure successful execution of this BDA architecture? (answered in Section X)

The remainder of the paper is organized as follows. In Section II, we describe the methodology for our systematic literature review and describe the relevant background on big data in Section III. In Section IV, we describe the important dimensions of healthcare big data along with big data characteristics extracted from the relevant literature (SRQ1). In Section V, we identify and classify the challenges in the relevant literature (SRQ2), and in Section VI, we describe all relevant NoSQL applications for a BDA healthcare setting (SRQ3) followed by the identified benefits in Section VII (SRQ4). In Section VIII, we identify the potential benefits of NoSQL databases to improve healthcare BDA applications
(SRQ5), followed by our proposal of the Med-BDA architecture for BDA healthcare in Section IX (SRQ6) and success strategies in Section X (SRQ7) to allow practitioners to implement these improvements in their organizations. In Section XI, we compare the contributions of our work across twelve hallmark features with other related literature reviews pertaining to BDA healthcare and finally conclude our paper with future research directions in Section XII.

II. Research Methodology

To answer SRQ1–SRQ7, we conducted a systematic literature review focusing on the following research domains: healthcare analytics, big data applications in healthcare, BDA applications in healthcare, NoSQL healthcare applications, and NewSQL healthcare applications. NewSQL is the preferred type of NoSQL database in industry because it provides ACID guarantees, as relational databases do [7], [8]. Our search queries (described later on) are based on more popular terms related to these domains. We have selected these domains to include the complete set of big data technologies in the market. Of particular interest to us are the more popular and successful solutions like Apache Hadoop and MongoDB, along with the cloud solutions of Amazon (AWS) and Microsoft (Azure) [26]. We targeted all types of academic research content as well as non-research content (e.g., technical blogs and company websites). For the research content, we selected Google Scholar, which is the most comprehensive search for computer science content, along with four other well-known sources, i.e., IEEE, Springer, Elsevier, and ACM. Content from remaining sources (Wiley, Taylor & Francis, etc.) was retrieved by Google Scholar, which indexes content from all other computer science-related sources through mutual contracts [27]. Healthcare research content is also indexed by Google Scholar, e.g., the US National Library of Medicine (www.ncbi.nlm.nih.gov) [28]. We focused on research from 2005 onwards, but did not ignore the more historical content if we deemed it essential. We selected Mendeley due to its increased usage and better features to manage our citations after a survey of other tools [29]–[33]. To retrieve the non-research content, we used the Google search engine.

We adopted the following three-step methodology to filter out the relevant subset of research articles from our Mendeley database. In the first step, we filtered articles based on their titles, i.e., the extent to which these titles matched our selected research domains. In the second step, we filtered the first-step articles based on their abstracts, and in the third step, we filtered the second-step articles based on their research content (after reading the first 2 pages). Following are the six basic search queries: “big data”, “NoSQL”, “NewSQL”, “big data tools”, “big data techniques”, and “big data analytics”. We combined each of these queries with “healthcare” and then with “healthcare analytics”, giving us a total of 18 queries. We considered these queries generic enough to extract content related to our sub-research questions, i.e., challenges, applications, architecture, benefits, potential, and success stories of healthcare big data.

The results of our article filtration methodology are shown in Table I. Title filtration gave us 260 articles, out of which we filtered 150 after abstract filtration, and finally, 99 articles after text filtration which we use to answer our seven sub-research questions. Also, Table II shows the distribution of our 260 title-filtered articles with respect to digital sources; the majority of articles were retrieved by Google Scholar (70) while IEEE provided the minimum number of relevant papers (33), with ACM and Springer providing 52 and 57 articles, respectively. Finally, Google Search Engine retrieved 4 relevant technical blogs with our 18 search queries which were all retrieved in the title-filtration stage. In Table III, we show the distribution of content type for our 99 selected articles; the majority of these are published in journals (74) while conference and other publishing methods have a comparatively reduced frequency. In Table IV, we show the distribution of these 99 articles with respect to SRQ1–SRQ7; here, parentheses represent repetition as a given article could be answering multiple sub-research questions. Articles discussing BDA healthcare challenges are the most frequent, followed by applications, big data characteristics, benefits and potential of BDA for healthcare. Articles focusing on the use of BDA architectures or presenting success strategies are least frequent, and none of them propose any architecture or present a roadmap. Also, the year-wise distribution of the 99 articles is shown in Fig. 1, which shows a well-defined peak in publications from 2011 to 2014 corresponding to a spark of interest in BDA applications brought about by the increasing popularity of several NoSQL databases, particularly MongoDB (introduced in 2010), Redis (2009), Apache Hadoop (2007 onwards), Apache Spark (2014) for speeding-up Hadoop along with AWS cloud services (2009 onwards). This is supported at least by the use of Hadoop and MongoDB in our extracted papers. However, from 2017 onwards, academic research has apparently dwindled due to the complicated nature of healthcare data and the BDA process. Such a trend has also been seen in the telecommunications sector [24]. The academic and corporate healthcare companies then apparently need the comprehensive roadmap presented in this paper to solve their BDA implementation issues and extract value from datasets. To drill down further, we present the break-down of 260 papers (filtered through title) with respect to distribution of search queries over digital sources in Fig. 2 (with six basic queries), Fig. 3 (with six queries combined with healthcare (HC)), and Fig. 4 (with six queries combined with healthcare analytics (HA)). All four technical blogs were retrieved with the “Big Data HA” search query in the title-filtration stage. Some of the important insights we can derive from these figures are given below:
1) The hyped terms “big data” and “big data analytics” have been used most frequently by authors and were retrieved in the majority of relevant content, while “NoSQL”, “NewSQL”, “techniques”, and “tools” retrieved relatively fewer relevant articles.
2) The distribution of content seems uniform across all digital sources for the terms “big data” and “big data analytics”.
3) The term “healthcare” is more commonly-used by authors (and retrieved more relevant content) as compared to
“healthcare analytics”.
4) The large body of papers retrieved with “big data” and “big data analytics” discuss more generic topics like big data characteristics, challenges, benefits, etc., but do not present any roadmap or concrete NoSQL-based application to enhance and motivate research in this domain; this has been done to a limited extent in papers retrieved with other keywords.
5) Overall, it is apparent that research on NoSQL applications of BDA to healthcare, and on solutions to their implementation problems through big data tools and techniques, is limited.
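As a minimal sketch of how the 18 search queries behind these observations can be enumerated, the Python snippet below takes the six basic queries quoted in Section II and uses each one alone, with "healthcare", and with "healthcare analytics". The names BASIC_QUERIES, SUFFIXES, and build_queries are illustrative assumptions, not part of the authors' tooling, and the exact strings submitted to each digital source may have differed.

```python
from itertools import product

# The six basic queries quoted in Section II.
BASIC_QUERIES = [
    "big data",
    "NoSQL",
    "NewSQL",
    "big data tools",
    "big data techniques",
    "big data analytics",
]

# "" stands for the basic query used on its own (assumed representation).
SUFFIXES = ["", "healthcare", "healthcare analytics"]


def build_queries(basic=BASIC_QUERIES, suffixes=SUFFIXES):
    """Return each basic query alone and combined with each suffix."""
    # Suffixes form the outer loop, so the result is grouped as:
    # basic queries, then the HC variants, then the HA variants.
    return [f"{query} {suffix}".strip() for suffix, query in product(suffixes, basic)]


queries = build_queries()
print(len(queries))
for q in queries:
    print(q)
```

Putting the suffixes in the outer loop groups the output in the same way as Figs. 2–4: the basic queries first, then the HC variants, then the HA variants.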

Fig. 1. Year-wise distribution of selected 99 articles (x-axis: year; y-axis: frequency).

Fig. 2. Digital source distribution for six basic search queries (queries: Big Data, NoSQL, NewSQL, Big Data Tools, Big Data Techniques, Big Data Analytics; sources: IEEE Xplore, Elsevier, ACM, Springer, Google Scholar).

TABLE I
Article Filtration Results
Filtration step        Total articles
Title filtration       260
Abstract filtration    150
Text filtration        99

TABLE II
Distribution of Title-Filtered Articles wrt Digital Sources
Digital sources         Studies
IEEE                    33
Elsevier                44
ACM                     52
Google Scholar          70
Springer                57
Google Search Engine    4
Total                   260
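The counts in Tables I and II can be cross-checked with a few lines of arithmetic. The sketch below (variable and function names are assumptions chosen for illustration) confirms that the per-source counts add up to the 260 title-filtered articles and computes the share of articles each later filtration step retained.

```python
# Counts copied from Table I (filtration steps, in order).
FUNNEL = [("title", 260), ("abstract", 150), ("text", 99)]

# Counts copied from Table II (title-filtered articles per digital source).
SOURCES = {
    "IEEE": 33,
    "Elsevier": 44,
    "ACM": 52,
    "Google Scholar": 70,
    "Springer": 57,
    "Google Search Engine": 4,
}


def retention(funnel):
    """Fraction of articles kept by each step relative to the previous step."""
    return {
        step: kept / previous
        for (_, previous), (step, kept) in zip(funnel, funnel[1:])
    }


source_total = sum(SOURCES.values())
rates = retention(FUNNEL)
overall = FUNNEL[-1][1] / FUNNEL[0][1]

print(f"sum over sources: {source_total}")
for step, rate in rates.items():
    print(f"kept after {step} filtration: {rate:.1%}")
print(f"kept overall: {overall:.1%}")
```

Under these figures, abstract filtration kept about 58% of the title-filtered articles and text filtration kept 66% of those, so roughly 38% of the initial pool reached the final set.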
TABLE III
Distribution of 99 Selected Articles wrt Content Type
Content type        Frequency
Technical report    7
PhD thesis          1
Book                3
Conference          10
Journal             74
Technical blogs     4
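The counts in Table III and the SRQ counts of Table IV admit a simple consistency check. The sketch assumes that a parenthesized number in Table IV counts articles already listed under another sub-research question; this is one reading of the caption rather than something stated explicitly, and all identifiers are illustrative.

```python
# Counts copied from Table III.
CONTENT_TYPES = {
    "Technical report": 7,
    "PhD thesis": 1,
    "Book": 3,
    "Conference": 10,
    "Journal": 74,
    "Technical blogs": 4,
}

# (articles listed, repetitions in parentheses) per question, from Table IV.
# Assumption: a repetition is an article already counted under another SRQ.
SRQ_COUNTS = {
    "SRQ1": (23, 0),
    "SRQ2": (40, 3),
    "SRQ3": (32, 7),
    "SRQ4": (14, 4),
    "SRQ5": (11, 7),
    "SRQ6": (5, 5),
    "SRQ7": (2, 2),
}

content_total = sum(CONTENT_TYPES.values())
listed = sum(count for count, _ in SRQ_COUNTS.values())
repeated = sum(rep for _, rep in SRQ_COUNTS.values())
distinct = listed - repeated
journal_share = CONTENT_TYPES["Journal"] / content_total

print(f"Table III total: {content_total}")
print(f"Table IV: {listed} listed - {repeated} repeated = {distinct} distinct")
print(f"journal share: {journal_share:.1%}")
```

With that reading, the 127 listed entries minus 28 repetitions give 99 distinct articles, matching the totals of Tables I and III, and journals account for about 75% of the selected articles.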

TABLE IV
Distribution of 99 Selected Articles wrt SRQ1–SRQ7 (Numbers in Parentheses Represent Repetitions)
Sub-research question    Article distribution
SRQ1                     23
SRQ2                     40 (3)
SRQ3                     32 (7)
SRQ4                     14 (4)
SRQ5                     11 (7)
SRQ6                     5 (5)
SRQ7                     2 (2)

Fig. 3. Digital source distribution for six basic search queries + healthcare (HC).

III. Background on Big Data

Ever since the emergence of the Internet, the volume of corporate data has been increasing; nowadays, processing terabytes of data on a daily basis is common practice in retail, financial, healthcare and other representative sectors. The rise of social networking platforms has increased the size of big data further into petabytes and exabytes, particularly in the case of E-Commerce companies like Amazon, Google, and Yahoo [34]. Although solutions for big data are evolving rapidly, a large amount of effort is still needed to standardize them at a global scale [35]–[38]. For instance, one recommendation for healthcare is to view data-driven
approaches as tools to facilitate understanding of these clinical entities, not as a disruptive process [39].

Fig. 4. Digital source distribution for six basic search queries + healthcare analytics (HA) (queries: Big Data HA, NoSQL HA, NewSQL HA, Big Data Tools HA, Big Data Techniques HA, Big Data Analytics HA; sources: IEEE Xplore, Elsevier, ACM, Springer, Google Scholar, Google SE).

A. Big Data: Definition and Characteristics

Francis Diebold first formally introduced big data as “the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. In this new and exciting world, sample sizes are no longer fruitfully measured in “number of observations”, but rather in, say, megabytes. Even data accruing at the rate of several gigabytes per day was not uncommon” [40]. In this paper, we consider four standard characteristics of big data, i.e., the well-known “3 V’s” (volume, variety, and velocity) presented by [41] and value. Volume refers to size of data (e.g., in terabytes), velocity refers to the incoming data speed (streaming or batch), variety refers to different data types (relational, images, text, videos, etc.) and value refers to insights derived through BDA. These 4 V’s are the essence to validate the measurement of data characteristics of any big data use case [42]–[44]. An excellent classification of big data is given in [5] as follows:
1) Big Data Properties: Specifying the challenges that can occur due to the presence of 4 V’s.
2) New Data Models: Specifying the need for novel types of models for big data which can deal with common issues such as referential integrity, provenance, data linking, and data life-cycle implementation.
3) New Analytics: Specifying the need for BDA based on data science and real-time analysis mechanisms.
4) Infrastructure and Tools: Specifying the need for novel infrastructure mechanisms related to network, data storage, high performance computing, along with tools for heterogeneous multi-provider services integration, data processing, data-centric security models (for trusted infrastructure), and data-centric service models.
5) Source and Target: Specifying the novel breed of input and output tools for big data that deal with data sources capturing high speed/velocity data generated from a variety of smart sensors, delivery of data to consumers, actionable systems, and implementation of visualization techniques.

B. Big Data Storage

Relational databases (Oracle, MySQL, SQL Server, etc.) are not capable of efficiently handling data-intensive and large-scale applications that involve the storage and management of huge volumes and high-frequency reads and writes, while at the same time flexibly coping with different types of data models and supporting heterogeneous data [45]–[47]. Even though relational databases like PostgreSQL and others have tried to address these challenges, they are not efficient and effective compared to standard big data storage, which is typically called NoSQL data stores [48], [7], [8]. NoSQL technologies have been able to solve a majority of data management, data storage, data processing, and data querying problems in BDA initiatives. They hence comprise a crucial component of BDA architectures. NoSQL stores having a global impact are MongoDB, CouchBase, Cassandra, Neo4J, Redis, Amazon’s NoSQL DynamoDB and Microsoft Azure’s NoSQL CosmosDB along with Apache Hadoop and its ecosystem [8]. Stores like MongoDB, Redis, and CouchDB can store and query huge volumes of streaming big data along with optimizing query latencies. Almost all standard NoSQL stores now satisfy the ACID requirements of relational databases allowing users to always view and query complete, updated data. Such NoSQL stores are also known as NewSQL stores and also work in a distributed scenario, e.g., MongoDB and Redis [48], [49], [7].

Hadoop is a state-of-the-art solution to address the complex issues of BDA initiatives, primarily because it brings forth a whole ecosystem of tools and technologies to solve these issues. It brings forth a shared-nothing and distributed architecture, which spawns new cluster nodes to address increased storage and computational power requirements. The addition of a node is a seamless process with all processing and data distribution among multiple nodes occurring transparently. It performs ingestion, wrangling (cleaning), storage, querying and analysis on big data (petabyte scale) both effectively and efficiently, through fault-tolerance on low-cost, commodity storage and servers. A key feature of Hadoop is that it separates data storage from data processing [50]. Hadoop has 4 components: 1) Hadoop Common comprising common libraries, 2) HDFS which is a default storage and file system as well as the communication backbone between nodes, 3) Hadoop YARN (Yet Another Resource Negotiator) which manages resources for running Hadoop’s MapReduce tasks, and 4) Hadoop MapReduce which is the programming paradigm for processing data stored in HDFS.

Hadoop’s ecosystem is shown in Fig. 5 and comprises open-source projects maintained by the Apache Software Foundation. HBase is Hadoop’s persistent database which runs atop HDFS while Hive is Hadoop’s warehouse which can be used to execute SQL queries over HDFS data and store this data in the form of tables (much like relational stores) [51]. Through Pig, one can easily write programming code related to Hadoop MapReduce for data processing and Mahout is used to perform Machine Learning on HDFS data. Sqoop is used to fetch data from relational and other sources and insert it into
HDFS while Zookeeper is used to maintain coordination between the nodes (it is required for any Hadoop cluster setup). Chukwa and Flume allow ingestion of big log data files and their monitoring, while Ambari and Hue (not shown) are used to monitor the health of Hadoop’s cluster. Avro is a data serialization format for storing HDFS data on a hard disk and Oozie can create workflows for Hadoop jobs. In fact, Hadoop is not suitable for interactive and iterative jobs due to the overhead of iterative reads and writes to and from the disk [52], [53]. Spark [52] is another Apache project, developed to overcome the shortcomings of Hadoop. It avoids most of the disk input/output operations by executing processes in memory when possible. Spark processing has been shown to be up to 100× faster than MapReduce.

Fig. 5. Hadoop components and ecosystem. Layers shown: data management (Oozie: workflow monitoring; Chukwa: monitoring; Flume: monitoring; Zookeeper: management); data access (Hive: query; Pig: data flow; Mahout: machine learning; Avro: serialization; Sqoop: RDBMS connector); data processing (MapReduce: cluster management; YARN: cluster and resource management); data storage (Hadoop Common, HDFS, Hadoop MapReduce, HBase: database).

Here is how Hadoop handles the 4 V’s requirement:
1) Volume: Hadoop splits large-volume data and data-processing between multiple data nodes (slaves). With the increase in the processing workload or data volume on each individual data node, the data can be split by adding more data nodes.
2) Velocity: Hadoop can store streaming data in HDFS and avoids, or at least postpones, the latency from storing it directly into a relational database.
3) Variety: Hadoop has no specific variety requirement for storing data on HDFS; it is able to store data as operating system files without prior processing or checks. Any variety of data can hence be stored and there is no need to understand and define the data structure beforehand.
4) Value: The Hadoop ecosystem allows big data organizations to use these multiple technologies in a standard architecture to derive value or insights from big data. There have been many such experiments across different corporate domains in the last decade and in this paper, we mention those related to healthcare.

C. Big Data Analytics (BDA)

Data analytics can extract unknown patterns and trends, hidden correlations, and other useful information from data (value). BDA does the same with big data. BDA results enable

benefits.

The BDA process is far more comprehensive and complicated due to the presence of the 4 V’s. Traditional analytical methods for RDBMSs cannot always be successfully applied over big data. Following are three differences identified from our research literature [4], [55]:
1) Data Structure: Structured data in an RDBMS is processed after collecting and organizing it with sophistication and ease. Big data, on the other hand, cannot be integrated and organized easily due to its heterogeneity and its streaming, real-time nature.
2) Sample of Analysis: Conventional data analysis uses a pre-designed question approach, in which a sample is selected from the existing known population and the process of analysis begins after posing research question(s). This process is based on collection of data to seek appropriate answers to those questions. In BDA, data sizes are large if velocity and variety are both high, so acquiring the right sample cannot always be guaranteed. Thus, it operates on both known and unknown populations.
3) Nature of Data: Traditional analytics performs analysis on data at rest, i.e., data residing statically in a database. BDA also handles streaming and high velocity data.

Within the last decade or so, these differences have given rise to novel research directions focused on creating new analytics algorithms, techniques, frameworks, models, tools, architectures and designs particularly targeted only for BDA. The most notable have been the design and use of a large variety of NoSQL databases [4], [56], which can be grouped into three categories: a) computational and processing, e.g., MapReduce and Apache Spark; b) storage, e.g., HDFS; and c) analytics, e.g., Apache Mahout and Apache Hive [56], [57].

IV. Big Data in Healthcare

Over the last decade, the use of electronic medical records (EMRs) by the medical staff, physicians and doctors has grown substantially. Most EMR usage is focused on patient care [10], [58]. Medical science research is bringing smart wearable devices and new technologies for patients, which generate big data from the continuous monitoring of vitals by the doctors in real-time, in an attempt to reduce the frequency of physical patient visits to the hospital [59].

A. Data in Healthcare

HIMSs are rapidly becoming a necessity at a global scale. Physicians, doctors, patient care staff, and management staff have started to use HIMSs regularly. An HIMS provides an interface between patient care and medical researchers. It becomes indispensable, especially in a medical emergency. The mainstay of HIMSs is databases to store patient-related data including treatment plans, prescriptions, and diagnoses
analytics professionals, predictive modellers and data [60]. HIMS is a patient care data intensive system. There are
scientists to analyse and examine huge volumes of data sets many processes in an HIMS generating new data every minute
containing variety of data forms that may be undiscovered by at the same time (see Fig. 6). These processes are producing
traditional analytics processes [54]. BDA can lead to multiple records for each patient of same data. Along with
competitive advantages, improved operational efficiency, patient care referenced data, new derived data is also created
better service, more effective new opportunities and other [61].
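The contrast drawn above between analytics on data at rest and analytics on streaming data can be illustrated with a small sketch. The following Python fragment is illustrative only and is not taken from any system surveyed here; the record fields (`mrn`, `heart_rate`) and the sample readings are hypothetical. It computes a per-patient mean heart rate twice: once in batch mode over a stored list, and once incrementally, updating a running mean as each HIMS record arrives so that the full history need not be kept.

```python
from collections import defaultdict

# Hypothetical HIMS vital-sign records; field names and values are assumptions.
records = [
    {"mrn": "123-9", "heart_rate": 72},
    {"mrn": "543-2", "heart_rate": 88},
    {"mrn": "123-9", "heart_rate": 76},
    {"mrn": "123-9", "heart_rate": 80},
    {"mrn": "543-2", "heart_rate": 92},
]


def batch_means(stored):
    """Data at rest: scan the full stored data set, then average per patient."""
    grouped = defaultdict(list)
    for rec in stored:
        grouped[rec["mrn"]].append(rec["heart_rate"])
    return {mrn: sum(vals) / len(vals) for mrn, vals in grouped.items()}


class StreamingMeans:
    """Streaming: keep only a count and a running mean per patient."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, rec):
        mrn = rec["mrn"]
        self.count[mrn] += 1
        # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
        self.mean[mrn] += (rec["heart_rate"] - self.mean[mrn]) / self.count[mrn]


stream = StreamingMeans()
for rec in records:  # records are consumed one at a time, as they would arrive
    stream.update(rec)

at_rest = batch_means(records)
```

Both approaches yield the same per-patient means on this toy input; the streaming variant simply avoids re-reading stored data, which matters when records arrive continuously.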

Fig. 6. Data generators for an HIMS (smart wearables, clinical staff, clinical labs, surgical wards, patient, pharmacy, admin and finance, and physician/practitioner/doctor).

An HIMS can have many data sources, on the basis of which we can classify the different healthcare data types as follows [62]–[65]:
1) Clinical Data, e.g., measurements of clinical judgements, fluid intake-output, vital signs and clinical examination (including Boolean questions such as “Does the patient use any drugs? Did the patient previously undergo any surgical operation? Did any family member of the patient suffer a specific disease?”).
2) Administrative Data, e.g., patient admissions, number of beds available, and rate of usage of medical equipment.
3) Finance Data, e.g., data related to medical insurance, patient fees, adjustments, and diagnoses-related group costing.
4) Medical Imaging Data, e.g., test results of Ultrasound-Mammography, magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and Radiography.
5) Laboratory Test Data, e.g., Protein Blood test results, Urine test results, Enzyme, and Blood Sugar test results.
From the above, we can deduce the following modes of clinical data collection.
1) Oral Collection, e.g., when the patient provides responses to oral questions regarding patient history. Oral data can be registered on paper, fed into the HIMS, or fed directly into a handheld device [66].
2) Manual Collection, e.g., checking blood sugar with a test stick, blood pressure, respirations per minute, fluid output (with a catheter), or physical examination by the medical doctor [67].
3) Autonomous Collection, e.g., laboratory and medical imaging results, along with smart patient monitoring data which are also stored autonomously. Images are usually compressed with simple lossless and near-lossless methods and usually require large storage space. Standards used for storage and transmission include picture archiving and communication system (PACS) and digital imaging and communications in medicine (DICOM).

B. Big Data Characteristics in Healthcare
BDA is all about the integration, valuation, management, synchronization and analysis of high volume, variety and velocity of data [23]. Big data is particularly defined by its heterogeneous nature [21] and has changed the culture of doing research, management and business, from a data perspective. Existing traditional data analysis approaches cannot cope with the frequency of change, variety and increase in size of data. Therefore, the architecture of the traditional RDBMS requires essential evolutions [68].
Considering the healthcare domain, the implementation of quality patient care services requires a better strategic relationship with patients. Strategic patient relationship management uses technology, processes, techniques, information and medical staff as components of the BDA process. However, this process is highly affected by the heterogeneous, fast-growing volume of in-motion patient care data. Research work has primarily identified four V's (volume, variety, velocity, and value) regarding healthcare big data, as a result of initial processing and big data classification [69]–[73] (see Fig. 7). We now discuss these V's in the context of healthcare as follows:
1) Volume: The volume of healthcare data is growing rapidly; it was reported to exceed 500 petabytes and was expected to grow 50-fold to 25 000 petabytes by 2020. The major sources of this data are HIMSs, which generate new data every minute [65], [70]. Medical imagery in particular contributes much to this volume. Improvements in the quality of medical images have increased image resolution. Hence, the size of medical images (previously not more than several kilobytes or megabytes) now ranges from megabytes to gigabytes. Another major contributor to data volume is the need to store patient history in EMRs, due to which the size of an EMR can easily reach gigabyte scale. For research purposes, a number of provider organizations retain masked patient data for an indefinite period [70], [74].
2) Variety: The variety of patient care data is directly linked to the data types mentioned above, i.e., clinical, administrative, finance, medical imaging, and laboratory testing. There is also unstructured data such as text notes from nursing and clinical staff, along with videos, images and information from monitoring equipment and smart wearable sensors, all creating a wider variety of data types and formats [75]. As the popularity of healthcare gadgets grows, data from these streams is expected to be integrated with patient care data in the near future. It is a complex challenge to combine these diverse types of data to diagnose accurately and prescribe the best treatment and cure for a specific patient. To resolve this, the healthcare industry is already moving towards big data and analytics [74].
3) Velocity: Healthcare data can be recorded either manually by medical staff or autonomously through smart sensors. The former has low velocity and is typically used by data warehouse and analytics solutions in “batch” mode. This cannot compete with the real-time, high-frequency and high-velocity sensor data, which is driven by the growing use of smart sensors, high-resolution medical images and video. Real-time data applications, such as early detection of infections and drug discovery, could help reduce patient mortality and morbidity and could also help prevent hospital outbreaks [75]. In fact, high velocity can easily overwhelm HIMS’s ability to


store and analyse streaming data [76]. BDA has revolutionized healthcare analytics with its capability to perform real-time analytics on high velocity data. Velocity also opens prospects for enhancing outputs by integrating several activities of the healthcare value chain, such as integration between wards and laboratories in an operationally viable bench-to-bed paradigm [77].
4) Value: The real driver for using big data in healthcare is ultimately the identification of valuable information which can potentially improve patient care [78]. For this, the healthcare industry is focusing on operational efficiencies and business process enhancements. The latter aims to reduce fraud, waste, and costs by applying more efficient approaches for service delivery, data analysis, management, and integration. The former aims to discover new techniques of providing patient care while efficiently allocating healthcare services [79], [80]. Data-driven healthcare organizations are shifting from conventional monitoring reports to discovery of insights to overcome traditional ineffectiveness and develop smoother workflows for better coordination among healthcare staff and patients and improved patient care [20], [22], [81].

Fig. 7. The 4 V’s of big data identified in healthcare research literature. Volume: tons of data generated by various departments of hospitals and clinics. Variety: different forms of data from different sources of hospitals and clinics. Velocity: real-time data, with high speed, generated from smart sensors. Value: valuable deep insights for improved patient care and actionable outcomes.

V. Challenges in Healthcare BDA
We have identified five challenges being faced by the healthcare industry in the application of BDA. These are shown in Fig. 8. We describe them as follows.

Fig. 8. The Challenges in Application of Big Data Analytics to Healthcare (data security, access control, interoperability, data and analytics reliability, and data provenance).

A. Confidentiality and Data Security
The misappropriation of patient healthcare information may lead to issues in the patient’s employment and/or insurance coverage [39]. This adverse effect is directly correlated to confidentiality risks and data access [82]. Neglecting privacy for medical and scientific data may result in public data which can be accessed openly. This confidentiality challenge would require changes in legislation involving healthcare delivery [83]. Legislative changes with regard to data security and confidentiality could provide a more flexible framework that could be helpful in the adoption of BDA technologies [84]. However, analytics on sensitive patient care data is still a challenge for the adoption of BDA in the data-driven health sector [85].

B. Granular Access Control
Granular access control in healthcare enables the responsibilities, privileges, rights and roles of patients and hospital medical users to be set such that users related to the hospital are given privileges only to their relevant data or functional area of the system [86]–[90]. Ensuring a high level of usability and security when accessing the relevant pieces of data is an often-cited challenge in BDA application to healthcare [91]–[93]. The specific problems with granular access control are as follows:
1) Successfully tracking the privacy policy integrity,
2) Successfully tracking user access,
3) Difficulty of keeping track of secrecy/security policies and requirements in a cluster-based big data environment,
4) Keeping track of multiple users in a cluster-based big data ecosystem,
5) The risk of privacy invasion when different user types (patients and healthcare professionals) access different components of the big data ecosystem simultaneously [94], and
6) The successful implementation of mandatory access control with proper application of secrecy/security requirements [95].

C. Interoperability
Interoperability between the different healthcare data types in order to achieve some healthcare strategic vision is a major challenge [43], [96], [97]. This challenge demands an agreement on common data sets, developing common interfaces, recording health information, and defining quality healthcare standards, policies, languages and clinical standards [98], [99]. In the presence of multiple components and their different users, it remains unclear how one can enhance big healthcare data interoperability across the different data sources and types [100].

D. Data and Analytics Reliability
Maintaining the reliability of data and BDA results is another core problem in the application of BDA to healthcare [98], [101], [102]. We have seen the different data types which can be generated in the healthcare domain, the different modes through which the data can be collected, and the different methods of storing this data. Along with this is the problem of high data velocity and integrating data variety [61], [63], [66], [67]. These complex dynamics can potentially decrease reliability of data and analytics results due to the

following situations [96], [98], [101]–[104]:
a) There is an increased chance of erroneous data entry in the manual mode (through humans).
b) The data integration process can remain unoptimized due to high data diversity occurring at high velocity.
c) Different components of HIMSs may be managing data at different volumes and velocities, making the BDA process heterogeneous.
d) The pre-BDA extract transform load (ETL) process, i.e., cleaning of dirty data and developing an understanding of the healthcare data lake, can turn out to be very complicated and inefficient.
e) Due to the difficulty of data integration, it could be necessary to learn different BDA models for different HIMS components/data sources, hence increasing the complexity of the overall process (lower efficiency and more BDA models to maintain).
f) If data is to be sampled for BDA, it is complicated to acquire representative samples from high velocity data streams.
g) The BDA models operating on streaming data lakes are potentially inaccurate due to inappropriate sampling or frequent change in patterns; these models then need to be learned at a lower velocity, which can itself compromise the final BDA outputs.
h) In a data pipeline based on 3 V’s, incorporating a permanent BDA infrastructure with a traditional analytics pipeline is a time-consuming activity potentially requiring technical trade-offs/compromises.
i) Considering the large number of BDA techniques, tools, and algorithms available, it could be time consuming to select the right personalized BDA solution, particularly in the case of non-availability of BDA experts. If this search is not guided by extensive experimentation, BDA results will be incorrect and/or unreliable.
j) Inadequate training of healthcare staff in the use of BDA can lead to sub-optimal performance, hence minimizing BDA benefits.

E. Data Provenance
Data management and provenance is another challenge for BDA applications in healthcare. Effective coordination of multiple departments in the health sector to use big data is a complex task [105]. In big data systems, the segregation of duties differs from that in operational systems. It is unclear how responsibilities in healthcare big data systems are divided across other relevant bodies of healthcare. Improved healthcare data management is necessary for effective data usage to facilitate access [20], [22]. Some vulnerabilities related to big data storage are consistency, data provenance, confidentiality, and integrity. Malfunctioning infrastructure of big data applications is a major threat to data integrity. In the applications of big data, the provenance meta-data is similar to meta-data. It contains the provenance for the infrastructure of big data itself. The complexity of the provenance information contained in the metadata of the big data system increases with the growth of volume of data. There is a wide variety of sources from which to collect big data. Of paramount importance in this environment is how trustworthy the data is. Protection of the provenance meta-data can be effective in the verification of multiple data sources [43].

VI. Big Data Applications to Healthcare
In this section, we will describe the applications of BDA to healthcare extracted from the results of our systematic literature review. These applications are centered around five NoSQL types, i.e., key-value stores, columnar stores, document stores, graph stores, and hybrid stores. We will describe each type with a healthcare example and then describe the research works using that NoSQL type. Based on an analysis of NoSQL applications to healthcare [106], we extract the following important NoSQL properties of interest to healthcare:
1) Scaling Out: Scaling horizontally from tens to thousands of nodes for storing and processing ever increasing volumes of EMRs.
2) Automated Scaling: Autonomous scaling out of EMR data in case a node capacity or user query hit ratio crosses some threshold.
3) Reliability: Reliability and fault-tolerance of the BDA process is achieved through replication of EMR data in distributed data execution mode.
4) Data Model Options: Flexibility in choosing the data model to cater for structured, semi-structured and unstructured EMR data streams.
5) CAP Theorem Compliance: Ensuring either availability of EMR data to the queries, or the consistency of this data, in the face of data distribution (partitioning).
6) Compliance with Eventual Consistency: In case EMR data consistency is compromised at run-time, there is a standard guarantee that it will eventually become consistent at some later point of time.
7) NewSQL Compliance: If healthcare administrators are strict on both consistency and availability, then NewSQL solutions can offer both, along with complete compliance with ACID properties; in essence, this is RDBMS-based EMR mapped onto big data.
8) Optimized Query Execution: Most of the NoSQL/NewSQL solutions have personalized query execution engines, which would remain optimized for EMR data with 3 V’s.
9) Cost-Effective: The standard big data solutions (e.g., Hadoop ecosystem and MongoDB) are open-source and hence, would incur zero purchase cost for a BDA healthcare infrastructure.

A. Key-Value Stores
Key-value stores are databases based on the key-value model, in which each value is mapped to a key, i.e., a given value is identified through its key. A snapshot of a key-value store from the healthcare domain is shown in Fig. 9. This store consists of three databases, i.e., patient, practitioner, and diagnosis. The unique key of a Patient DB is the patient’s medical record number (MRN), and the values contain the first and last names, age, and the list of symptoms. Two values for patients John Buck and Jack


Owen are shown. The key for the Practitioner DB comprises the doctor’s employee number (EMN) and values contain the doctor’s department and the respective list of consultation clinics. The key for a diagnosis DB comprises a diagnosis number (local to the hospital), and values comprise the list of laboratory tests executed along with the list of symptoms. “Breaking up” table-based data in this way into a multitude of flexible, lightweight key-value pairs leads to remarkably better query response times as compared to traditional RDBMSs [55].

Fig. 9. A snapshot of key-value store from healthcare domain. Patient DB (key: MRN): values {John, Buck, 32, [Shallow Breathing]} and {Jack, Owen, 41, [Pain in Chest, Sweating, Vomiting]}. Practitioner DB (key: EMN): values {Pediatrics, [Clinic 3]} and {Orthopaedics, [Clinic 1, Clinic 2, Clinic 5]}. Diagnosis DB (key: DiagNumber): values {[T1, T3, T17], [Arthritis]} and {[T1, T4, T9, CT1, X4], [Brain Tumor, High Blood Pressure, High Cholesterol]}. The example keys 543-2 and 123-9 also appear in the figure.

Authors of [107] addressed the issue of inadequate collaborative patient care by applying and developing ontology and its corresponding rules by designing a cross-domain, reusable, evidence-based knowledge base. They design a clinical context model for the u-healthcare domain. This model stores data in the form of a key-value store, integrates data from diverse mobile platforms, and is formalized as a set of ontologies. On a similar note, contextual information (CI) is applied by [108] to develop a healthcare model based on ontology. The basic data structures in CI are key-value pairs. A value in CI is used as an environment variable. The proposed ontology for healthcare includes service systems in several spaces, e.g., office, home, etc., and several devices, e.g., computer, mobile devices, etc. This ontology has been implemented in ubiquitous environments for personalized healthcare services. Finally, in [109], the author presents a framework for integrating key-value stores within a typical HIMS architecture. The core benefits of the efficient key-value approach are patient monitoring, clinical predictions, and corresponding simulations, all done in real time. Another benefit is the scalability of the framework to include more hospitals, and the offering of the framework on the cloud.

B. Columnar Stores
The idea of columnar stores was initially conceived by Google and implemented in their BigTable columnar store [110]. In a columnar store, a single table is dynamically distributed over a cluster. There is no stringent requirement of avoiding null values (as in RDBMS). Columnar stores can easily be quite sparse, with each new row having a different schema. So, a single column can remain empty across thousands of rows and there is no storage cost for these null values. Columns can also be combined together to form column families. The model is scalable in that columns can be removed or added at run-time, with millions of columns also possible in a single store. For performance reasons, these columns are sorted on the disk, minimizing random access. Overall, the storage efficiency is enhanced in columnar stores [111]. The disadvantage is the update operation. While in RDBMS an update of tuples with a foreign key can be enough, a column-oriented big data database may require an update of all values in a column for all records.
A snapshot of a healthcare-based columnar store is shown in Fig. 10. Here, we form two column families, one for the patient’s residence location (City, Province) and the other for the patient’s vitals (blood pressure (BP) and body temperature (Temp)). We notice the sparsity of data, and also that it is not necessary for every column in a family to be populated. “PJB” and “Punjab” represent the same province (Prov.) but this flexibility of using different strings is allowed. The surgery column (Surg.) lists the surgery type given to a patient and it is obviously sparse, since only a small sample of patients generally undergoes surgery. The Satis. column, which represents patient satisfaction as acquired by a survey, is also sparse as not all patients provide responses. It is possible to add hundreds to thousands more columns related to a patient’s EMR.

Fig. 10. A snapshot of columnar store from healthcare domain. Columns: MRN; column family of location (City, Prov.); column family of vitals (BP, Temp); Doctor; LT; Surg.; Satis. Rows are sparse, e.g., MRN 224-3 has only BP 119/78, Temp 100 and Satis. High; MRN 123-9 has City LHR and Prov. PJB; MRN 445-7 has City ISB, Prov. Punjab and BP 156/99.

To predict and efficiently manage the patient’s disease, the authors in [112] propose a patient-customized healthcare system based on Hadoop with text mining (PHSHT). PHSHT consists of a text mining-based Hadoop module (TMHM), a medical data collection module (MDCM), a disease management and prediction module (DMPM), and a disease rules creation module (DRCM). These modules operate as follows:
1) MDCM: It stores healthcare big data in HBase, which is divided into both structured and unstructured entities.
2) TMHM: It converts unstructured data to structured form through text mining, and distributes it collectively with other structured data in HBase.
3) DRCM: It uses conditional probability set theory (CPST) to generate rules associating the relevant set of patient’s EMR attributes with the diagnosis.

4) DMPM: It provides customized patient medical services by comparing a patient’s family history, patient’s current status and patient’s information with the disease rules stored in the DRCM. The most important service of DMPM is its capability of efficient prediction of a patient’s disease based on the patient’s current and historical status [112].
A more concrete work on the use of HBase in healthcare is presented in [113]. According to the authors, EMR data is typically spread out in different Excel files or related applications. These files can grow in volume and it is difficult to query them collectively. They also contain diverse data. The authors then motivate the use of HBase to integrate these files, and show a simple experimental run of their idea on a 2-node cluster. These runs demonstrate the efficiency of HBase with respect to Excel file queries. Moreover, in [114], the authors implement a complicated POC for a Canadian healthcare project to load 30 TBs of healthcare data to an HBase cluster with all base processing done by MapReduce. The authors show that data ingestion and loading with HBase takes a long time (one month), the MapReduce process has significant limitations, and it is difficult to establish a schema for complicated healthcare data. However, the authors highlight key-value databases as the best solution for healthcare data.

C. Document Stores
In document stores, the smallest atomic data storage unit is a document, which is semi-structured and comprises a collection of key-value pair data [65], [115]. Every document has its unique identifier and serializes tabular or object-based data by encoding it in semi-structured formats such as JavaScript Object Notation (JSON), binary JSON (BSON), and XML. In this regard, document stores are better than RDBMSs or object databases on many fronts. They consume less storage, offer more efficient query access, and have a highly flexible schema.
A snapshot of a document store for the healthcare domain is shown in Fig. 11. It comprises three different databases, i.e., Patient, Practitioner, and Diagnosis. We show several documents for each database. A document is equivalent to one row of an RDBMS table. Notice the following:
1) It is not necessary that every attribute (key) is recorded in the same way in a given document, e.g., names and symptoms are recorded differently in the patient DB, as are clinics in the practitioner DB.
2) It is not necessary for documents in a DB to follow the same schema. Specifically: a) attribute relationship status (RelSt) is included in the second document in the patient DB but not in the first, b) attribute Designation is recorded in the second document in the practitioner DB but not in the first or third, c) attribute Clinic is not recorded in the third document and d) attribute prescription criteria (PresCrit) is included in the second document in the diagnosis DB but not in the first one.

Fig. 11. A snapshot of a document store from healthcare domain.
Patient DB: {MRN: 134-1, First Name: John, Last Name: Buck, Age: 32, Symptoms: Shallow Breathing}; {MRN: 144-2, Name: Jack Owen, Symptoms: [Pain in Chest, Sweating, Vomiting], RelSt: Married}.
Practitioner DB: {EMN: 446, Department: Pediatrics, Clinic: Clinic 3}; {EMN: 650, Department: Ortho, Clinic: [Clinic 1, Clinic 2, Clinic 5], Designation: Senior Doctor}; {EMN: 650, Department: Neurosurgery}.
Diagnosis DB: {MRN: 543-2, Tests: [T1, T3, T17], Diagnosis: Arthritis}; {MRN: 123-9, Tests: [T1, T4, T9, CT1, X4], Diagnosis: [Brain Tumor, High Blood Pressure, High Cholesterol], PresCrit: [Blood Thinning, Minimizing epilepsies, Increasing Energy]}.

In [65], the authors create and test two databases using open source Apache CouchDB, which is a well-known document store. First, they loaded 1 949 753 images in a larger database through Ruby scripting, while complying with the digital imaging and communications in medicine (DICOM) standard. Then, they used CouchDB for database replication and attachment management in a distributed fashion. The authors applied the MapReduce paradigm to execute queries over CouchDB, which demonstrated efficiency in information retrieval as compared to the RDBMS scenario.

D. Graph Stores
In a graph database, data is represented in the form of a graph (either directed or undirected). Data is associated with each node and each edge. A snapshot of such a scenario for the healthcare domain is shown in Fig. 12. Here, the nodes represent two patients named Dave and Sharon, and a practitioner named Kate. Practitioners are identified by EMN and patients by MRN. An undirected edge between Dave and Sharon shows similarity: both live in USA and were diagnosed with cancer in 2016. The two directed edges show that Kate treated Dave and Sharon: Dave passed away (Mortality) after one year of treatment (Time), while Sharon recovered after three years.

Fig. 12. A snapshot of a graph store from healthcare domain. Nodes: patient {MRN: 334-6, Name: Dave, Age: 22}, patient {MRN: 421-7, Name: Sharon, Age: 34}, and practitioner {EMN: 314, Name: Kate, Age: 43}. Undirected edge between the patients: {Diagnosis: Cancer, Year: 2016, Country: USA}. Two directed “Treated” edges from the practitioner carry Time (1 year; 3 years) and Mortality attributes.

Physicians, patients, hospitals, insurance companies, and other healthcare agencies have investigated the usefulness of a graph database in the healthcare domain [116]. In [117], the authors propose and implement a social network data structure to integrate the variety of healthcare data and facilitate the practitioners in their treatment processes. They show average performance results over basic graph operations, i.e., betweenness, centrality, closeness, and eccentricity. Also, in [118], a web-based graph database is implemented to illustrate the flowchart of a complicated cancer treatment, along with


some basic statistical graphs, e.g., cancer incidence per age group and per gender.
Moreover, the authors of [119] compare an application of the Neo4j graph database (https://neo4j.com) and the MySQL RDBMS. They propose transformation rules for mapping a normalized RDBMS schema to Neo4j. Results over two queries demonstrate the efficiency of Neo4j over MySQL. The authors also present an efficient decision support implementation for medical experts, with rules designed by analyzing the whole big data system. For the analysis and integration of medical reports, [120] presented a graph database framework called Gpf4Med to introduce an effective and efficient healthcare research tool. The framework was based on the architectural design and implemented BDA by taking exponential growth of data into consideration.

E. Hybrid Stores
The term “hybrid” in the NoSQL domain implies the use of more than one NoSQL store in combination. However, the method of this combination is not clearly defined and is left to the user. In academic research, there are four articles which employ this definition of hybrid. Specifically, in [121], the authors implement a proof-of-concept (POC) for a Czech healthcare center to manage healthcare big data through the Vertica NoSQL hybrid. The four-step BDA process followed by the authors includes data management, data storage, data analytics, and data visualization. Primarily, the execution time for querying TBs of data is reduced as the number of Vertica nodes is increased to 5. In [122], the authors implement a POC to benchmark a hybrid architecture of MongoDB, HBase and Cassandra on e-health clouds for an industrial project based in India. The primary components are a query interface, a query administrator (which converts queries to MapReduce code), and translators for the hybrid NoSQL arrangement. The authors execute some basic queries on the cloud to validate the query efficiency of this hybrid. In [123], the authors implement a cloud-based POC comprising a hybrid of MongoDB, PostgreSQL, and Neo4j for specific healthcare data types, within the context of an Indian project for data portability between clouds. The FHIR standard (http://www.hl7.org/implement/standards/fhir/) is used for prototyping the selected data and the authors present some basic execution results to validate the approach. In [124], the authors implement a POC to compare the performance of three NoSQL databases, i.e., BaseX, eXistdb, and Berkeley DB with CouchBase. They validate the superior performance of CouchBase for high-end big data workloads.

F. Gaps in BDA Applications to Healthcare
In summary, we derive the following gaps and limitations regarding applications of big data solutions to the healthcare domain:
1) The frequency of practical BDA implementations using NoSQL data stores in published research is limited; there are only 13 such articles.
2) The standard NoSQL technology stack currently comprises several tens of successful solutions which have not been employed in published research, e.g., Redis, Riak, Aerospike, BerkeleyDB, Apache Cassandra, and many tools of Hadoop’s ecosystem [8], [125]–[128].
3) Highly successful in-memory NoSQL technologies like Apache Spark and Redis have not been used, nor has their potential been fully realized, in BDA healthcare applications.
4) None of the research works proposes a formalized healthcare architecture for BDA, e.g., using a lambda, kappa or zeta architecture.

VII. Benefits of BDA Applications to Healthcare
In this section, we mention the specific benefits of BDA applications to healthcare identified from our research papers. These benefits can be particularly realized if heterogeneous healthcare data can be successfully converted to knowledge [4]. BDA can enhance patient care, decision-making and healthcare planning [67], and identify best practices and effective treatments [129]. Nurses also benefit from big data, since nursing care is related not only to the assessment of the patients’ clinical needs, but also to understanding and focusing on the psychological and social problems of the patient [130], [131]. The BDA benefits are illustrated in Fig. 13 and summarized below:
1) Better Healthcare: BDA empowers medical professionals to improve quality of life, cure diseases, avoid preventable deaths and predict epidemics. It can reduce medical errors and improve healthcare outcomes [9].
2) Better Patient Care: BDA revolutionizes patient care by identifying infections swiftly and suggesting the right treatments to patients. It also promotes personalized care for specific patients. This can help patients to manage their health effectively, e.g., medication adherence, diet, exercise, etc. [132].
3) Better Medical Care: BDA can help hospitals and clinics to store, digitally collate and analyze their patients’ condition-related data so that patients receive the best medical care. Through smart devices, patients can be monitored and treated irrespective of location. This provides better 24/7 medical care and is similar to having medical staff in every patient’s room [132].
4) Better Healthcare Value: BDA can effectively reduce the costs of processing and storing healthcare data and then apply sophisticated big data techniques to transform that patient-centered data into valuable outcomes [78], [133].
5) Better Care Delivery: BDA can be helpful in preventing duplication of treatment and unnecessary laboratory tests by instantly accessing and tracking the patient’s medical history to determine the progress of the patient’s condition. This on-time coordination of the patients’ records can be used to increase effectiveness and efficiency of care delivery. In emergencies, by delivering patient-related information at the right time, BDA provides better healthcare delivery [79], [134].

VIII. Potential of NoSQL Applications to Healthcare
NoSQL technologies have been able to solve a majority of data management problems and have had a global impact. It is imperative to enhance NoSQL applications to healthcare big data. We conducted several Google searches to verify that the

[Figure: layered architecture diagram with the labels Data sources, Ingestion, Security & governance, Healthcare data lake, Data cleaning, SQL, BI, DWH, and Predictive analytics / Machine learning.]

Fig. 13. Med-BDA: A state-of-the-art BDA architecture for healthcare.
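
As a rough illustration of the layered flow in Fig. 13, the following minimal sketch orders the layers named in the figure and applies the store assignments that Section IX gives for the hybrid data lake (EHR data to Cassandra, IoT data to MongoDB, bioinformatics and genomics data to Redis, pharmaceutical data to PostgreSQL). The names COMMON_LAYERS, ANALYTICAL_LAYERS, STORE_BY_DATA_TYPE and plan_pipeline are hypothetical and are not part of Med-BDA; the code only builds a list of step names and does not contact any data store.

```python
from typing import List

# Hypothetical layer names, taken from the labels in Fig. 13 (order of data flow).
COMMON_LAYERS = [
    "data_sources",
    "ingestion",
    "security_and_governance",
    "healthcare_data_lake",
]

# Analytical layers that sit on top of the data lake.
ANALYTICAL_LAYERS = {"sql", "bi", "dwh", "predictive_analytics"}

# Store assignments as described for the hybrid database in Section IX.
STORE_BY_DATA_TYPE = {
    "ehr": "Cassandra",
    "iot": "MongoDB",
    "bioinformatics": "Redis",
    "genomics": "Redis",
    "pharmaceutical": "PostgreSQL",
}


def plan_pipeline(data_type: str, requirement: str) -> List[str]:
    """Return the ordered step names for one analytical requirement."""
    if requirement not in ANALYTICAL_LAYERS:
        raise ValueError(f"unknown analytical requirement: {requirement}")
    if data_type not in STORE_BY_DATA_TYPE:
        raise ValueError(f"no store assigned for data type: {data_type}")
    steps = list(COMMON_LAYERS)
    # Annotate the data lake step with the store chosen for this data type.
    steps[-1] = f"healthcare_data_lake[{STORE_BY_DATA_TYPE[data_type]}]"
    # Each analytical layer works on cleaned data from the lake.
    steps += ["cleaning", requirement]
    return steps


if __name__ == "__main__":
    print(plan_pipeline("iot", "predictive_analytics"))
```

Keeping the store assignment in a single mapping is only one possible design choice; it makes the "one access mechanism" idea visible without committing to any particular driver or query engine.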

same idea is being recommended in various industrial blogs and commentaries, described as follows:
1) In [135], technical analyst Martin stresses the importance of using MongoDB, Hadoop and the Neo4j graph store to manage and store healthcare big data. He particularly mentions unstructured, geo-spatial and sensor healthcare data, and stresses the need to ensure success in BDA initiatives through careful selection of the NoSQL data stores.
2) The company MarkLogic has successfully implemented its proprietary NoSQL database to solve healthcare big data problems of the American Psychological Association [136]. MarkLogic stresses that NoSQL is necessary as RDBMSs are now incapable of handling healthcare big data.
3) The MongoDB company lists its successful applications to store and query healthcare big data on its website [137]. According to MongoDB, “Healthcare companies rely on MongoDB to address a broad variety of use cases while at the same time meeting compliance standards and improving healthcare outcomes”. Some use cases are 360-degree patient view, population management for at-risk demographics, and lab data management and analytics.
4) The CouchBase company has implemented an architecture which uses its NoSQL database to solve healthcare data management problems [138]. According to the company, this database is suitable for healthcare due to high data availability, robust connections between health mobile devices, best-in-class performance, flexibility, security, regulatory compliance and scalability.
5) In [139], the authors conduct a POC to store and query representative electronic health records (EHRs) in MongoDB, in the context of a healthcare project in Botswana. They propose a MongoDB schema for EHR and mention the benefits of this implementation for the healthcare sector (e.g., efficiency of query execution, scalability, reduced costs, load balancing, etc.).
6) In a technical report for a healthcare project in Romania [140], the authors concretely define the limitations of RDBMSs and motivate the use of NoSQL databases to solve the data management problems for patient monitoring data. They propose the use of SimpleDB, CouchDB, and MongoDB as document databases; Voldemort, Riak, Scalaris, and Memcached as key-value stores; and HBase and Cassandra as wide columnar stores. They also propose MySQL Cluster, VoltDB, Clustrix, ScaleBase, and NimbusDB as scalable relational systems which can, to some extent, solve healthcare BDA problems.

IX. Med-BDA: A State-of-the-Art BDA Architecture for Healthcare
In our opinion, the core reason for the limited NoSQL applications for healthcare BDA is the lack of a standardized architecture, primarily because: 1) the more well-known lambda and kappa BDA architectures are both complicated and expensive to implement, 2) the state-of-the-art Zeta architecture solves the issues of lambda and kappa but there is no guidance on how to implement it for the healthcare sector, 3) a rapidly expanding NoSQL technology stack makes it difficult to decide on a particular store, 4) there is a lack of available expertise to handle the complicated configuration of NoSQL stores and their programming within a BDA architecture, 5) there is a need for extensive trial-and-error to tune the usage of NoSQL data stores. To solve these issues, we propose a layered BDA architecture for healthcare big data which we label as Med-BDA (shown in Fig. 13) and in the next section, we define


success strategies on how to ensure a successful BDA initiative with Med-BDA. It is important to stress that the application of Med-BDA requires a BDA requirement from the clinical managers. This requirement could be related to one or more BDA specifications, namely: data assessment and data quality management (under the umbrella of data governance), SQL-based querying, business intelligence, data warehousing or predictive analytics (machine learning). Following are the hallmark characteristics of Med-BDA:
1) State-of-the-Art Technology Stack: We have designed Med-BDA based on thorough research of NoSQL and other big data tools and technologies with respect to their performance (read, write, query), scalability, ease-of-use, successful applications, user acceptability, limitations, and community support (a group of graduate students participated in this activity over a period of 3 months; for the sake of brevity, the details are outside the scope of this paper). We also used our previous knowledge of designing BDA architectures, e.g., our work done for the telecom sector [24].
2) Comprehensive: Med-BDA is designed for all types of healthcare analytics and BDA applications.
3) Zeta Architecture: Med-BDA follows the state-of-the-art Zeta BDA architecture proposed by MapR technologies, which solves the limitations of the historical lambda and kappa BDA architectures and enhances efficiency, resource utilization and NoSQL tool management [25].
4) Python: Med-BDA architecture development is completely based on the Python language, which is currently the top-most big data programming language according to the popularity of programming language index [141].
5) Hybrid Database: Med-BDA employs a hybrid database which combines several NoSQL stores and a relational store under one access mechanism (detailed below). To the best of our knowledge, this is the first proposal of a hybrid for healthcare and is necessary to cater for the complicated and diverse nature of healthcare data.
6) Data Governance: Med-BDA is the first BDA architecture to incorporate the requirement of data governance, a rapidly-expanding technology which ensures data quality, security and management throughout the organization and is a top-most analytical trend in 2020 [142], [143].
7) Meta-Data: In Med-BDA, we record the relevant meta-data at each layer, depicted by the icon labeled “M” (with yellow background), as per requirement of data governance practices.
8) Master Data: Med-BDA also implements master data management, which is a critical activity to maintain a clean, updated and ubiquitous version of the most important data in an organization.
9) Micro-Services and Containerization: In the context of Zeta, Med-BDA uses micro-services implemented with containerization. Containers are small, light-weight software components with pre-installed functionalities. A containerized BDA architecture has numerous containers interacting with each other (called orchestration) in a plug-and-play fashion, which greatly enhances resource optimization and reduces time and cost of running the architecture. In Med-BDA, we have proposed the use of the well-known Docker tool for containerization (whale icon) [144]. Each unique software component and process in Med-BDA, specifically data store, data ingestion, data governance and security process, analytical engine, data query and processing technologies and visualization tools, will be running in its own Docker container. All Docker containers within Med-BDA will coordinate with each other using either Docker Swarm, Docker Compose or Kubernetes (see [144] for more details).
10) DevOps: Development in Med-BDA follows the DevOps technology, allowing continuous development, testing, integration and live testing, all coordinated through the well-known GitLab software. DevOps is the de-facto standard in an architecture development process and is globally applied [145].
In terms of the above features, we now describe Med-BDA’s layers as follows (in order of analytical data flow):
1) Data Sources: This layer comprises all potential healthcare data sources, namely (from the top), in-patient, out-patient, human resource, EHRs, all types of medical databases, pharmaceutical, health insurance, patient surveys, IoT-related (smart devices), bioinformatics, genomics and social networking data.
2) Ingestion: This layer ingests data from Data Sources, based on the BDA requirement. Ingestion APIs should be developed in Python. The meta-data to be recorded includes the name of connected sources, the time and schedule of ingestion, amount of data ingested, etc. Apache Kafka is the de-facto standard for ingesting data streams, and we recommend the same. The ingestion activity will initially pose configuration issues (as is standard for any open-source tool usage) but with tuning (represented by the icon “T” on green background), the issues will be solved. Tuning represents change of parameters and ingestion methodology.
3) Security & Data Governance: This layer implements the required data security practices, e.g., to anonymize the data, as required by clinical regulatory authorities. This is part of a data governance initiative which initially assesses the quality of the data and then implements standard rules throughout the clinical organization to improve the current quality and ensure that errors in data and analytical processes do not occur in the future [143].
4) Healthcare Data Lake: This layer inserts the audited and secure healthcare data into a hybrid database, which forms our data lake. The implementation of a lake is now standard practice in BDA [146]. For our hybrid, we propose the use of MongoDB, Redis and Apache Cassandra (running on Hadoop) as NoSQL stores and PostgreSQL as the relational store, the latter being the best relational store for BDA and used extensively by Amazon Web Services cloud. Redis can also be used as a caching service. Due to the use of Hadoop, it is also possible to execute data warehousing through Apache Hive and faster processing through Apache Spark during analytics later on. We are confident that all types of healthcare data (to be used for BDA) can be accommodated in our hybrid database. For instance:
a) Complicated and high volume EHR data can be stored in Cassandra to cater for the scalability requirement and provides

excellent query performance.
b) The IoT healthcare data can be stored in MongoDB which has many successful real-time analytics use cases.
c) Bioinformatics and genomics data can be stored in Redis as key-value pairs as there are long and complicated value patterns.
d) Pharmaceutical data is more regular in nature (drug research, supply chain, sales, etc.) and can be stored in PostgreSQL.
In this layer, we recommend implementation of master data management (MDM) practices which are associated with data governance [143] along with the meta data repository (MDR). Note that meta-data will be stored in this layer extensively along with fine-tuning of the hybrid database to all the incoming data ingestion streams. Although the tuning process can get lengthy (e.g., see [114]), our choice of the data stores is expected to solve these issues more efficiently.
5) SQL: This layer allows execution of SQL queries on clean and processed data (ETL) from the lake. For this, we propose Apache Drill which is the best choice to execute SQL queries on NoSQL and relational data stores. The SQL layer could also require the use of Spark (over Hadoop) for processing before SQL is applied. Meta-data is recorded and tuning for Drill and Spark usage will be required.
6) BI: This layer allows execution of Business Intelligence queries on clean and processed data (ETL) from the lake. As the BI tool, we recommend using PowerBI, the leader in Gartner’s magic quadrant for BI in 2020 [147]. PowerBI can be applied to both batch data (using Spark over Hadoop) and streaming data (using Spark Streaming over Hadoop). Meta-data is recorded and tuning for Spark usage will be required.
7) DWH: This layer allows execution of data warehousing queries on clean and processed data (ETL) from the lake. For this, both Hive and Spark can be used. Meta-data is recorded and tuning for Spark and Hive usage will be required. DWH outputs (multi-dimensional aggregate values) can be viewed in PowerBI.
8) Predictive Analytics: This layer allows execution of machine learning on clean and processed data (ETL) from the lake. For this, both Apache Mahout (using MapReduce over Hadoop) and MLlib (using Spark over Hadoop) can be used. Meta-data is recorded and tuning for Mahout or MLlib usage will be required. Machine learning outputs (prediction accuracies, etc.) can be viewed and analyzed in PowerBI.
We note that all client interaction through PowerBI dashboards will occur with REST APIs which are standard methods of communicating between GUIs and back-end architectures. The orange spiral (bottom-right) implies that the above data flows are repeatable for different types of analytics. So, given an analytical requirement, a subset of the pool of Docker containers can be used to set up an analytical pipeline on-the-fly and this pipeline can be terminated once the requirement is complete. It can be easily verified that Med-BDA is state-of-the-art and potentially more effective, efficient, cost-effective, and comprehensive as compared to other published BDA architectures for healthcare [114], [121]–[123].

X. Success Strategies for BDA in Healthcare
In this section, we will lay down a roadmap to achieve a successful BDA implementation for healthcare based on our systematic review and the Med-BDA architecture. The roadmap consists of the following steps:
1) Understanding the Healthcare Domain: The initial task of the BDA team is to understand the particular healthcare process, i.e., acquire the domain knowledge. For instance, EHR analysis requires the BDA team to understand the in-patient, out-patient and other treatment-related processes. To gather this knowledge, the team can use YouTube, online blogs and also domain experts (healthcare managers). This activity should last from 2 weeks to 1 month.
2) Specifying the Problem Statement: The next task of the BDA team is to interact with the clinical domain experts to extract the exact problem(s) which need to be solved and document them. This will serve as the crux for the whole project. This step is necessary because sometimes even the experts are as unsure about the requirements as the BDA team. Multiple discussion sessions on this topic will evolve the specific problem(s). This activity should be completed within 2 weeks.
3) Identify and Understand Data: The next task is to identify the relevant data sources for solving the above problems and understand the current schema of this data, particularly the names of the attributes. For this, the BDA team will need active communication with the IT department which typically manages the database. The output is a data glossary document which will assist the team in later stages of data usage. This activity should last from 1 week to a maximum of 3 weeks, and is followed by data ingestion.
4) Data Governance Application: The healthcare organizations need to implement a data governance initiative which is managed through the creation of a council comprised of all the business managers, IT managers and BDA managers in the organization. The council ensures enterprise-wide data security and regulatory requirements, data assessment for data entry errors (missing values, incorrect values, duplicated values, outlier values), and implementation of corrective rules at all data entry points to ensure autonomous removal of these errors and other related data issues in future data. Data governance is particularly important in the context of the general data protection regulation (GDPR), an EU regulation on data privacy which is now being adopted globally at a rapid pace since it came into effect in 2018 [143].
5) Maintaining the Technology Stack: The big data technology stack has been evolving rapidly in the last 15 years. Our proposal of Med-BDA is based on the current standard stack. To achieve success, this needs to evolve based on the latest, well-known and most usable tools (always determined over several comparison parameters).
6) Automatic Update of BDA Results: The BDA results from Med-BDA are not a one-time execution output. Data sources will keep on pumping big healthcare data into Med-BDA. So, for each analytics process in Med-BDA, a pipeline in Python should be developed which autonomously takes the new batch of data and executes the whole BDA process until visualization in PowerBI or Apache Drill. One application is


autonomous machine learning where a predictive model autonomously updates itself to cater for new training data, while maintaining the predictive accuracy.
7) Investment in Hardware vs Cloud: If allowed by regulatory bodies, medical institutions should implement Med-BDA on the cloud (AWS or Azure). This will save them the hassle of buying hardware (server machines) and maintaining a complicated network consisting of tens of Docker containers interacting with each other in different ways. Otherwise, Med-BDA will be implemented in-house over dedicated server machines (at least 3–5) along with NAS (Network Attached Storage) as backup. This in-house setup is definitely more expensive than a cloud-based installation.
8) Investment in BDA Skillset: One of the major reasons for BDA project failures has been the lack of relevant skill sets. The medical institution and/or the BDA vendor should invest in developing a team with a core BDA skill set, specifically, expertise in Python, Linux platform development, Hadoop Ecosystem and NoSQL store installation and usage, and architecture development skills.
It is important to mention that all major BDA healthcare challenges presented in Section V can be successfully addressed by Med-BDA and our success strategies, specified below:
9) Confidentiality and Data Security: Med-BDA’s security and data governance layer allows implementation of data anonymization, security requirements and regulatory compliances to protect the patients’ treatment, insurance and other clinical data. The activities in this layer will be automated and the governance team will monitor these activities on a regular basis.
10) Granular Access Control: Providing data and information access control to clinical employees, at any level of data security, is another feature of the security and data governance layer. Software programming is used to automatically execute the access rules whenever an employee logs on to the system. In fact, all data security practices can be managed extremely effectively with the right data governance tools and team (for details, refer to [143]).
11) Interoperability: Med-BDA’s hybrid database allows interoperability between different healthcare data types, by storing this data variety in different NoSQL databases and controlling them through a unified interface. We have already mentioned that this is fast becoming a practice in BDA applications [146].
12) Data and Analytics Reliability: All issues in maintaining reliability of data and analytics in healthcare BDA (listed in Section V-D) can now be solved through Med-BDA’s technology stack. For this, we have divided them in the following headings:
a) Data entry errors: Manual data entry errors are completely eliminated through application of automated, hard-coded data rules determined by the governance team after an initial data assessment activity of the healthcare database.
b) Data integration: The data integration process can be managed effectively at the ingestion and data lake layer. Basic data cleaning before data integration occurs at the ingestion layer, and the integrated schema (for the hybrid database) is programmed in advance so ingested data is automatically integrated within the hybrid at high data speeds. The use of NoSQL allows more flexibility in data integration than relational databases, allowing a single BDA pipeline to cater for healthcare data integrated in different ways within the same database [8].
c) Data comprehension: Our hybrid database allows different NoSQL query engines to run concurrently over different healthcare data types, hence providing enhanced data comprehension more efficiently as compared to traditional technologies. Python’s programming modules are then used to further explore and understand the data, e.g., through the standard Pandas and NumPy libraries [141].
d) Data sampling: The tuning process at the ingestion layer can determine the right sample of data from real-time or near real-time data streams, due to the use of Apache Kafka, the capability of NoSQL databases to store streaming data, and containerization. In fact, testing of these samples also occurs at high speed (not in the traditional batch-based fashion).
e) Infrastructure and technology stack: Our proposal of Med-BDA solves this problem through the use of a well-researched, effective, efficient and previously successful technology stack and architecture.
f) Inadequate training: We mention developing the BDA skillset within the clinical organization as a success strategy which could involve several employee trainings, e.g., on Docker, NoSQL databases, and containerization.
13) Data Provenance: To ensure data provenance, we record meta-data at all data activity points in Med-BDA and the choice of Med-BDA’s technology stack solves all data and analytics reliability requirements.
We note that, currently, there are many different types of BDA problems in healthcare, e.g., related to patient care, pharmacy, health insurance, IoT-related (e.g., body area networks), bioinformatics and genomics. The Med-BDA architecture is generic enough to be applicable to each of these problems, particularly due to the plug-and-play nature of its zeta architecture. For each application, the roadmap defines the application process and Med-BDA provides the implementation details. In other words, the technology stack for any type of healthcare BDA application will remain exactly the same as we have proposed in Med-BDA. However, we cannot generalize this to other domains (finance, telecommunications, retail, agriculture, etc.) because the data management dynamics of each domain are unique and require a unique, tailored architecture (e.g., see [24] for a proposed BDA architecture for the telecom industry).

XI. Comparison to Related Literature Review and Commentary Papers
In this section, we compare the hallmark features of our work to the following nine (9) selected literature review and commentary papers of BDA applications to healthcare: [148]–[151], [103], [152]–[155] (to the best of our knowledge, this list is complete as of June 2020). The hallmark features of our work are as follows:
1) Systematic Literature Review (SLR): This feature records

TABLE V
Comparison of Related Review and Commentary Papers to Our Work. SLR = Systematic Literature Review; CHRT = Characteristics of Big Data; BNFT = Benefits of BDA; APP = BDA Applications; CHLN = Challenges of BDA Applications; PTNL = Potential of BDA Applications; LIM = Limitations and Gaps of BDA; SS = Success Strategies of BDA Initiatives; DT = Big Data Types; Process = BDA Process; Architecture = BDA Architecture; NoSQL = NoSQL Databases

Paper     SLR  CHRT  BNFT     APP      CHLN     PTNL  LIM  SS   DT   Process  Architecture  NoSQL
[148]     No   No    Yes (L)  Yes (L)  Yes (L)  Yes   No   No   Yes  Yes      No            No
[149]     No   Yes   Yes      Yes (L)  No       Yes   No   Yes  Yes  Yes      No            No
[150]     No   Yes   No       Yes (L)  Yes      No    No   No   No   Yes      No            No
[151]     No   Yes   Yes (L)  Yes      Yes      Yes   No   No   Yes  Yes (L)  No            No
[103]     No   No    No       Yes (L)  No       No    No   No   No   No       No            No
[152]     No   Yes   No       Yes      Yes      Yes   No   No   Yes  No       No            No
[153]     Yes  Yes   No       Yes (L)  No       Yes   Yes  No   Yes  No       No            No
[154]     Yes  Yes   No       Yes (L)  No       Yes   Yes  No   Yes  No       No            No
[155]     No   No    Yes      Yes      Yes      No    Yes  Yes  Yes  Yes      No            Yes
Our Work  Yes  Yes   Yes      Yes      Yes      Yes   Yes  Yes  Yes  Yes      Yes           Yes

whether the compared paper is an SLR (or not). Among the compared papers, a non-SLR paper is a commentary paper. Since our work is an SLR, we need to compare it against both SLRs and commentary papers. Note that an SLR applies a more effective and robust method of paper extraction than a commentary paper, which does not execute any research methodology.
2) Big Data Characteristics (CHRT): This feature records whether the compared paper determines big data characteristics in healthcare data (or not). We have answered this through a formal research question (SRQ1).
3) Benefits of BDA Applications (BNFT): This feature records whether the compared paper investigates the benefits of BDA applications to healthcare (or not). We have answered this through a formal research question (SRQ4).
4) BDA Applications (APP): This feature records whether the compared paper extracts papers related to BDA applications to healthcare (or not). We extracted applications through a formal research question (SRQ3) and categorized our applications in the following dimensions of NoSQL (Section VI): scaling out, automated scaling, reliability, data model options, CAP theorem compliance, eventual consistency, NewSQL compliance, optimized query execution, and cost-effectiveness.
5) Challenges of BDA Applications (CHLN): This feature records whether the compared paper investigates the challenges of BDA applications to healthcare (or not). We have answered this through a formal research question (SRQ2).
6) Potential of BDA Applications (PTNL): This feature records whether the compared paper investigates the potential of BDA applications to healthcare (or not).
7) Limitations of BDA Applications (LIM): This feature records whether the compared paper investigates the limitations and gaps of BDA applications to healthcare (or not). We specifically mention them in Section VI-F.
8) Success Strategies of BDA Applications (SS): This feature records whether the compared paper investigates the strategies of ensuring a successful BDA initiative in the healthcare domain (or not).
9) Data Sources of BDA Applications (DT): This feature records whether the compared paper deals with BDA applications across all possible healthcare data sources (or not). In our work, we are dealing with all the datasets.
10) Process of BDA Applications (Process): This feature records whether the compared paper proposes and discusses a BDA process for healthcare (or not).
11) Architecture of BDA Applications (Architecture): This feature records whether the compared paper proposes and discusses a BDA architecture for healthcare (or not).
12) NoSQL Applications (NoSQL): This feature records whether the compared paper discusses BDA applications with respect to NoSQL databases (or not). This feature is important, considering that big data has to be stored in NoSQL databases, which itself has a strong impact on the ensuing analytics (Section III-D).
In Table V, we compare our work with the nine selected papers across the hallmark features. Our proposal of Med-BDA is unique in that none of these papers has proposed any standard BDA architecture, although some of them list steps for a BDA process. Note that in Med-BDA, we also define the process to be followed along with the architecture. Also, only [155] proposes the use of NoSQL stores for healthcare BDA besides our work, showing that the other works are not perfectly aligned with the latest trends (as shown in Section VIII). The word “L” in Table V means “limited”; for columns BNFT, CHLN, and Process, this means that the benefits, challenges, and process are defined superficially and for APP, it means that no formal attempt was made to extract all application-related papers (true for 75% of the papers). Also, only two works are SLRs (like our work) with the rest being simple commentary papers with no formal review methodology. The first [153] reviews only 22 papers as compared to our 80, with applications limited to predictive analytics for healthcare operations and supply chain management. The second [154] reviews 65 papers as compared to our 80 with applications limited to machine learning, cloud-based, heuristic-based, agent-based, and hybrid mechanisms. This paper contains


long tables difficult to interpret in one go. Big data characteristics, its potential for healthcare, and big data types are the features mentioned most frequently across all papers. The limitations of BDA research are mentioned in only three works, while success strategies are mentioned in only two. Overall, we have shown that this paper combines a set of hallmark features which have not been collectively addressed in any previous paper.⁵

⁵ A detailed discussion of the nine compared papers is outside the scope of this work; we invite the reader to go through these papers for further information.

XII. Conclusion and Future Work

The healthcare sector is now facing a deluge of big data from multiple heterogeneous sources. BDA is hence a core requirement to treat patients effectively and run all data-related clinical operations smoothly. However, the BDA process is complicated to a large degree and has seen much failure in different industrial sectors due to increased costs, a skill set which is difficult to acquire, a rapidly expanding technology stack, and increased management overhead. We have previously tried to present a roadmap which solves these issues to some extent for the telecommunications sector [24]. In this paper, we have adopted a more aggressive approach to solving these issues for the healthcare sector through a systematic literature review. We verify big data characteristics for healthcare, identify the current BDA challenges, extract and review the BDA applications based on NoSQL databases, identify their limitations, motivate the further use of NoSQL databases to solve all BDA challenges, propose a state-of-the-art BDA architecture called Med-BDA based on the latest zeta paradigm, provide a list of success strategies to ensure a successful execution of Med-BDA, identify all potential benefits of BDA, and finally, compare our work with related literature reviews of healthcare analytics to show that our aforementioned contributions are unique.

Our work endeavors to assist healthcare organizations in planning an actionable roadmap comprising big data technical experts, BDA experts, and technology and process enhancements, in order to carry out a comprehensive valuation of their current BDA abilities, capabilities, and value drivers, and to prioritize their BDA-related goals. By applying our roadmap, organizations can develop a cost-effective and timely strategy for their BDA initiatives that leads to incremental benefits. A limitation of our roadmap is that our proposed architecture has not yet been implemented in practice; the first prototype implementation is scheduled to start in July 2020 at Aga Khan University Hospital, a well-known international chain of hospitals (https://hospitals.aku.edu/pakistan/Pages/default.aspx). The future work primarily involves experimenting with Med-BDA in the light of our proposed success strategies. In proposing Med-BDA, we have concretely solved the BDA problem for the healthcare sector because the implementation team now has a solution methodology and tool for every possible challenge. We will be proposing extensions to Med-BDA (if needed) in case of any abnormal changes in the BDA technology stack. Finally, it is important to mention that variants of Med-BDA can be developed for other important big data applications, e.g., analysis of IoT-based production data [156], learning compressed data representations through latent factor models [157], and analysis of mobile data streams [158]. To develop the architecture, the particular requirements of a given domain need to be analyzed first, and then the technology stack can be selected by big data domain experts based on these requirements.

References

[1] N. V. Chawla and D. A. Davis, “Bringing big data to personalized healthcare: A patient-centered framework,” J. Gen. Intern. Med., vol. 28, no. S3, pp. 660–665, Jun. 2013.
[2] A. R. Reddy and P. S. Kumar, “Predictive big data analytics in healthcare,” in Proc. 2nd Int. Conf. Computational Intelligence & Communication Technology, Ghaziabad, India, 2016.
[3] R. Kohli and S. S. L. Tan, “Electronic health records: How can IS researchers contribute to transforming healthcare?” MIS Quart., vol. 40, no. 3, pp. 553–573, Sept. 2016.
[4] H. Chen, R. H. L. Chiang, and V. C. Storey, “Business intelligence and analytics: From big data to big impact,” MIS Quart., vol. 36, no. 4, pp. 1165–1188, Dec. 2012.
[5] Y. Demchenko, C. Ngo, and P. Membrey, “Architecture framework and components for the big data ecosystem Draft Version 0.2,” System and Network Engineering, SNE technical report SNE-UVA-2013-02, Sept. 2013.
[6] C. M. Tucker, M. Marsiske, K. G. Rice, J. J. Nielson, and K. Herman, “Patient-centered culturally sensitive health care: Model testing and refinement,” Health Psychol., vol. 30, no. 3, pp. 342–350, May 2011.
[7] G. Harrison, Next Generation Databases: NoSQL, NewSQL, and Big Data. Apress, 2015.
[8] X. Wu, S. Kadambi, D. Kandhare, and A. Ploetz, Seven NoSQL Databases in a Week: Get Up and Running with the Fundamentals and Functionalities of Seven of the Most Popular NoSQL Databases Kindle. USA: Packt Publishing, 2018.
[9] K. Jee and G. H. Kim, “Potentiality of big data in the medical sector: Focus on how to reshape the healthcare system,” Healthc. Inform. Res., vol. 19, no. 2, pp. 79–85, Jun. 2013.
[10] J. King, V. Patel, and M. F. Furukawa, “Physician adoption of electronic health record technology to meet meaningful use objectives: 2009–2012,” The Office of the National Coordinator for Health Information Technology, Tech. Rep., Dec. 2012.
[11] V. Mayer-Schönberger and K. Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think. Eamon Dolan, 2014.
[12] S. Axryd. Why 85% of big data projects fail. [Online]. Available: https://www.digitalnewsasia.com/insights/why-85-big-data-projects-fail. Accessed on: Apr. 16, 2019.
[13] I. Yoo, P. Alafaireet, M. Marinov, K. Pena-Hernandez, R. Gopidi, J. F. Chang, and L. Hua, “Data mining in healthcare and biomedicine: A survey of the literature,” J. Med. Syst., vol. 36, no. 4, pp. 2431–2448, May 2012.
[14] D. Tomar and S. Agarwal, “A survey on data mining approaches for healthcare,” Int. J. Bio-Sci. Bio-Technol., vol. 5, no. 5, pp. 241–266, Oct. 2013.
[15] H. C. Koh and G. Tan, “Data mining applications in healthcare,” J. Healthc. Inf. Manage., vol. 19, no. 2, pp. 64–72, Feb. 2005.
[16] S. Patel and H. Patel, “Survey of data mining techniques used in healthcare domain,” Int. J. Inf. Sci. Techn., vol. 6, no. 1–2, pp. 53–60, Mar. 2016.
[17] R. Sujatha, R. Sumathy, and Nithya R A, “A survey of health care prediction using data mining,” Int. J. Innov. Res. Sci., Eng. Technol., vol. 5, no. 8, pp. 14538–14543, Aug. 2016.
[18] P. Horstmeier. Healthcare business intelligence: What your strategy needs. [Online]. Available: https://www.healthcatalyst.com/healthcare-business-intelligence-data-warehouse, Accessed on: Jan. 1, 2016.

[19] H. Smalltree. Business intelligence case study: Hospital BI helps healthcare. [Online]. Available: https://searchbusinessanalytics.techtarget.com/news/1507291/Business-intelligence-case-study-Hospital-BI-helps-healthcare, Accessed on: Jul. 20, 2006.
[20] M. Karlberg and M. Skaliotis, “Big data for official statistics – Strategies and some initial European applications,” United Nations Economic Commission for Europe, Geneva, Switzerland, Tech. Rep., Sept. 2013.
[21] O. Ola and K. Sedig, “The challenge of big data in public health: An opportunity for visual analytics,” Online J. Public Health Inf., vol. 5, no. 3, pp. 223, Feb. 2014.
[22] B. Kayyali, D. Knott, and S. Van Kuiken, “The big-data revolution in US health care: Accelerating value and innovation,” McKinsey & Company, Tech. Rep., Apr. 2013.
[23] Information Resources Management Association, Healthcare Administration. IGI Global, 2015.
[24] H. Zahid, T. Mahmood, A. Morshed, and T. Sellis, “Big data analytics in telecommunications: Literature review and architecture recommendations,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 18–38, Jan. 2020.
[25] MapR, “Zeta architecture and the data-centric enterprise,” 2020. [Online]. Available: https://mapr.com/solutions/zeta-enterprise-architecture/.
[26] Wikibon, “Hadoop-NoSQL software and services market forecast 2012–2017,” 2013. [Online]. Available: wikibon.org/wiki/v/.
[27] M. L. Rethlefsen, D. L. Rothman, and D. S. Mojon, Internet Cool Tools for Physicians. Berlin, Germany: Springer, 2009, pp. 37–40.
[28] R. Vine, “Google scholar,” J. Med. Libr. Assoc., vol. 94, no. 1, pp. 97–99, Jan. 2006.
[29] WU Libraries, “Comprehensive comparison of reference managers: Mendeley vs. Zotero vs. Docear,” 2012. [Online]. Available: https://isg.beel.org/blog/2014/01/15/comprehensive-comparison-of-reference-managers-mendeley-vs-zotero-vs-docear/.
[30] “How to choose: Zotero, Mendeley, or EndNote,” 2017. [Online]. Available: http://libguides.wustl.edu/choose.
[31] “Mendeley: Comparing citation managers,” 2017. [Online]. Available: http://libguides.lib.msu.edu/mendeley/comparison.
[32] “Comparison chart,” 2017. [Online]. Available: https://www.library.wisc.edu/services/citation-managers/comparison-chart/.
[33] “Readcube,” 2020. [Online]. Available: https://www.readcube.com/home.
[34] Y. J. Chen, Y. C. Su, Y. M. Chen, and C. Y. Huang, “Design and implementation of a medical knowledge service system for cross-organization healthcare collaboration,” in Proc. 6th IEEE Int. Conf. Industrial Informatics, Daejeon, South Korea, 2008.
[35] E. Gasiorowski Denis, “Big plans for big data,” 2017. [Online]. Available: https://www.iso.org/news/2014/03/Ref1821.html.
[36] Sokrati, “Importance of standardizing your big-data,” 2017. [Online]. Available: https://sokrati.com/engineering/standardizing-big-data/.
[37] J. Stevens, “Standardization and big data,” 2017. [Online]. Available: https://www.artezio.com/pressroom/blog/standardization-and-big-data.
[38] T. Olavsrud, “Big data leaders and users unite around standardization,” 2015. [Online]. Available: https://www.cio.com/article/2884666/big-data/big-data-leaders-and-users-unite-around-standardization.html.
[39] B. Feldman, E. M. Martin, and T. Skotnes, “Big data in healthcare hype and hope,” Dr. Bonnie 360, Tech. Rep., Oct. 2012.
[40] F. X. Diebold, “‘Big data’ dynamic factor models for macroeconomic measurement and forecasting,” in Advances in Economics and Econometrics, Eighth World Congress of the Econometric Society Cambridge, Cambridge, UK, 2000, pp. 115–122.
[41] D. Laney, “3D data management: Controlling data volume, velocity, and variety,” META Group, Tech. Rep., Feb. 2001.
[42] J. S. Ward and A. Barker, Undefined by data: A survey of big data definitions. 2013. [Online]. Available: https://arxiv.org/abs/1309.5821
[43] R. Bellazzi, “Big data and biomedical informatics: A challenging opportunity,” Yearb. Med. Inform., vol. 9, no. 1, pp. 8–13, May 2014.
[44] E. Morley-Fletcher, “Big data healthcare: An overview of the challenges in data intensive healthcare,” 2013. [Online]. Available: http://ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=3499.
[45] G. Luo, “MLBCD: A machine learning tool for big clinical data,” Health Inf. Sci. Syst., vol. 3, no. 1, pp. 3, Sept. 2015.
[46] E. F. Codd, “A relational model of data for large shared data banks,” Commun. ACM, vol. 13, no. 6, pp. 377–387, Jun. 1970.
[47] K. Orend, “Analysis and classification of NoSQL databases and evaluation of their ability to replace an object-relational persistence layer,” M.S. thesis, Technische Universität München, Germany, 2010.
[48] N. Marz and J. Warren, Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Greenwich, USA: Manning Publications, 2015.
[49] B. G. Tudorica and C. Bucur, “A comparison between several NoSQL databases with comments and notes,” in Proc. RoEduNet Int. Conf. 10th Edition: Networking in Education and Research, Iasi, Romania, 2011.
[50] Q. Yao, Y. Tian, P. F. Li, L. L. Tian, Y. M. Qian, and J. S. Li, “Design and development of a medical big data processing system based on Hadoop,” J. Med. Syst., vol. 39, no. 3, pp. 23, Feb. 2015.
[51] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy, “Hive – A petabyte scale data warehouse using Hadoop,” in Proc. IEEE 26th Int. Conf. Data Engineering, Long Beach, USA, 2010, pp. 996–1005.
[52] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proc. 2nd USENIX Conf. Hot Topics in Cloud Computing, Boston, USA, 2010.
[53] G. M. Siddesh, S. Hiriyannaiah, and K. G. Srinivasa, “Driving big data with Hadoop technologies,” in Handbook of Research on Cloud Infrastructures for Big Data Analytics, P. Raj and G. C. Deka, Eds. IGI Global, 2014, pp. 232–262.
[54] K. Sravanthi and T. S. Reddy, “Applications of big data in various fields,” Int. J. Comput. Sci. Inf. Technol., vol. 6, no. 5, pp. 4629–4632, 2015.
[55] K. Michael and K. W. Miller, “Big data: New opportunities and new challenges [guest editors’ introduction],” Computer, vol. 46, no. 6, pp. 22–24, Jun. 2013.
[56] D. Zeng and R. Lusch, “Big data analytics: Perspective shifting from transactions to ecosystems,” IEEE Intell. Syst., vol. 28, no. 2, pp. 2–5, Mar. 2013.
[57] M. Pospiech and C. Felden, “Big data – A state-of-the-art,” in Proc. 18th Americas Conf. Information Systems, Detroit, USA, 2012.
[58] R. L. Sallam, C. Howson, C. J. Idoine, T. Oestreich, J. L. Richardson, and J. A. Tapadinhas. Magic quadrant for business intelligence and analytics platforms. [Online]. Available: https://www.gartner.com/doc/3611117/magic-quadrant-business-intelligence-analytics, Accessed on: Feb. 01, 2017.
[59] J. A. Menius Jr and M. D. Rousculp, “Growth in health care data causing an evolution in the pharmaceutical industry,” North Carol. Med. J., vol. 75, no. 3, pp. 188–190, Jun. 2014.
[60] S. Salas-Vega, A. Haimann, and E. Mossialos, “Big data and health care: Challenges and opportunities for coordinated policy development in the EU,” Health Syst. Reform, vol. 1, no. 4, pp. 285–300, May 2015.
[61] F. F. Costa, “Big data in biomedicine,” Drug Dis. Today, vol. 19, no. 4, pp. 433–440, Apr. 2014.
[62] A. Carstensen and K. Sandkuhl, “Coordination of inter-organisational healthcare processes: Experiences from combining process- and document centred modelling,” in Proc. Communication and Coordination in Business Processes: The Int. Workshop, Kiruna, Sweden, 2005.
[63] S. Schneeweiss, “Learning from big health care data,” N. Engl. J. Med., vol. 370, no. 23, pp. 2161–2163, Jun. 2014.
[64] S. Zillner and S. Neururer, “Technology roadmap development for big data healthcare applications,” KI – Künstl. Intell., vol. 29, no. 2, pp. 131–141, Nov. 2015.


[65] O. Schmitt and T. A. Majchrzak, “Using document-based databases for medical information systems in unreliable environments,” in Proc. 9th Int. Conf. Information Systems for Crisis Response and Management, Vancouver, Canada, 2012.
[66] M. J. C. Nuijten, “The selection of data sources for use in modelling studies,” PharmacoEconomics, vol. 13, no. 3, pp. 305–316, Mar. 1998.
[67] R. Thorlby, S. Jorgensen, B. Siegel, and J. Z. Ayanian, “How health care organizations are using data on patients’ race and ethnicity to improve quality of care,” Milbank Quart., vol. 89, no. 2, pp. 226–255, Jun. 2011.
[68] P. D. Clayton and G. Hripcsak, “Decision support in healthcare,” Int. J. Bio-Med. Comput., vol. 39, no. 1, pp. 59–66, Apr. 1995.
[69] R. Lenz and M. Reichert, “IT support for healthcare processes – Premises, challenges, perspectives,” Data Knowl. Eng., vol. 61, no. 1, pp. 39–58, Apr. 2007.
[70] R. C. Brownson, J. G. Gurney, and G. H. Land, “Evidence-based decision making in public health,” J. Public Health Manage. Pract., vol. 5, no. 5, pp. 86–97, Sept. 1999.
[71] B. Reeder, D. Revere, R. A. Hills, J. G. Baseman, and W. B. Lober, “Public health practice within a health information exchange: Information needs and barriers to disease surveillance,” Online J. Public Health Inform., vol. 4, no. 3, pp. ojphi.v4i3.4277, Dec. 2012.
[72] M. Goddard, D. Mowat, C. Corbett, C. Neudorf, P. Raina, and V. Sahai, “The impacts of knowledge management and information technology advances on public health decision-making in 2010,” Health Inform. J., vol. 10, no. 2, pp. 111–120, Jun. 2004.
[73] M. M. Hansen, T. Miron-Shatz, A. Y. S. Lau, and C. Paton, “Big data in science and healthcare: A review of recent literature and perspectives: Contribution of the IMIA social media working group,” Yearb. Med. Inform., vol. 9, no. 1, pp. 21–26, Aug. 2014.
[74] B. B. Cohen, S. Franklin, and J. K. West, “Perspectives on the Massachusetts community health information profile (MassCHIP): Developing an online data query system to target a variety of user needs and capabilities,” J. Public Health Manage. Pract., vol. 12, no. 2, pp. 155–160, Mar.–Apr. 2006.
[75] F. J. Ohlhorst, Big Data Analytics: Turning Big Data into Big Money. Hoboken, USA: Wiley, 2013.
[76] P. V. Raja, E. Sivasankar, and R. Pitchiah, “Framework for smart health: Toward connected data from big data,” in Intelligent Computing and Applications, D. Mandal, R. Kar, S. Das, and B. K. Panigrahi, Eds. New Delhi, India: Springer, 2015, pp. 423–433.
[77] M. Mian, A. Teredesai, D. Hazel, S. Pokuri, and K. Uppala, “Work in progress – In-memory analysis for healthcare big data,” in Proc. IEEE Int. Congr. Big Data, Anchorage, USA, 2014.
[78] H. D. Miller, “From volume to value: Better ways to pay for health care,” Health Aff., vol. 28, no. 5, pp. 1418–1428, Sept. 2009.
[79] J. Roski, G. W. Bo-Linn, and T. A. Andrews, “Creating value in health care through big data: Opportunities and policy implications,” Health Aff., vol. 33, no. 7, pp. 1115–1122, Jul. 2014.
[80] A. Gandomi and M. Haider, “Beyond the hype: Big data concepts, methods, and analytics,” Int. J. Inf. Manage., vol. 35, no. 2, pp. 137–144, Apr. 2015.
[81] W. Raghupathi and J. Tan, “Strategic IT applications in health care,” Commun. ACM, vol. 45, no. 12, pp. 56–61, Dec. 2002.
[82] H. C. Kum and S. Ahalt, “Privacy-by-design: Understanding data access models for secondary data,” in AMIA Jt. Summits Transl. Sci. Proc., vol. 2013, pp. 126–130, Mar. 2013.
[83] M. Peeters, “Free movement of patients: Directive 2011/24 on the application of patients’ rights in cross-border healthcare,” Eur. J. Health Law, vol. 19, no. 1, pp. 29–60, Mar. 2012.
[84] I. S. Rubinstein, “Big data: The end of privacy or a new beginning?” Int. Data Priv. Law, vol. 3, no. 2, pp. 74–87, May 2013.
[85] S. Imran and I. Hyder, “Security issues in databases,” in Proc. 2nd Int. Conf. Future Information Technology and Management Engineering, Sanya, China, 2009, pp. 541–545.
[86] P. Nisen and F. Rockhold, “Access to patient-level data from GlaxoSmithKline clinical trials,” N. Engl. J. Med., vol. 369, no. 5, pp. 475–478, Aug. 2013.
[87] H. M. Krumholz, J. S. Ross, C. P. Gross, E. J. Emanuel, B. Hodshon, J. D. Ritchie, J. B. Low, and R. Lehman, “A historic moment for open science: The Yale University open data access project and Medtronic,” Ann. Intern. Med., vol. 158, no. 12, pp. 910–911, Jun. 2013.
[88] I. Khanna, “Drug discovery in pharmaceutical industry: Productivity challenges and trends,” Drug Dis. Today, vol. 17, no. 19–20, pp. 1088–1102, Oct. 2012.
[89] M. M. Mello, J. K. Francer, M. Wilenzick, P. Teden, B. E. Bierer, and M. Barnes, “Preparing for responsible sharing of clinical trial data,” N. Engl. J. Med., vol. 369, no. 17, pp. 1651–1658, Oct. 2013.
[90] J. S. Ross, R. Lehman, and C. P. Gross, “The importance of clinical trial data sharing: Toward more open science,” Circ.: Cardiovasc. Qual. Outcomes, vol. 5, no. 2, pp. 238–240, Mar. 2012.
[91] P. C. Tang, J. S. Ash, D. W. Bates, J. M. Overhage, and D. Z. Sands, “Personal health records: Definitions, benefits, and strategies for overcoming barriers to adoption,” J. Am. Med. Inform. Assoc., vol. 13, no. 2, pp. 121–126, Mar. 2006.
[92] D. J. Ballantyne and M. Mulhall, “Method and apparatus for electronically accessing and distributing personal health care information and services in hospitals and homes,” U.S. Patent 5 867 821, Feb. 2, 1999.
[93] I. Iakovidis, “Towards personal health record: Current situation, obstacles and trends in implementation of electronic healthcare record in Europe,” Int. J. Med. Inform., vol. 52, no. 1–3, pp. 105–115, Oct. 1998.
[94] K. Caine and R. Hanania, “Patients want granular privacy control over health information in electronic medical records,” J. Am. Med. Inform. Assoc., vol. 20, no. 1, pp. 7–15, Jan. 2013.
[95] Y. Demchenko, Z. M. Zhao, P. Grosso, A. Wibisono, and C. de Laat, “Addressing big data challenges for scientific data infrastructure,” in Proc. IEEE 4th Int. Conf. Cloud Computing Technology and Science, Taipei, China, 2012, pp. 614–617.
[96] L. H. Curtis, J. Brown, and R. Platt, “Four health data networks illustrate the potential for a shared national multipurpose big-data network,” Health Aff., vol. 33, no. 7, pp. 1178–1186, Jul. 2014.
[97] M. Frisse, A. Wilcox, D. Sittig, M. Kahn, and M. H. Lopez, “Clinical informatics, CER, and PCOR: Building blocks for meaningful use of big data in health care,” AcademyHealth, Oct. 31, 2012.
[98] W. Raghupathi and V. Raghupathi, “Big data analytics in healthcare: Promise and potential,” Health Inf. Sci. Syst., vol. 2, no. 1, Feb. 2014.
[99] D. A. Gritzalis, “Enhancing security and improving interoperability in healthcare information systems,” Med. Inform., vol. 23, no. 4, pp. 309–323, Jan. 1998.
[100] A. Berler, S. Pavlopoulos, and D. Koutsouris, “Design of an interoperability framework in a regional healthcare system,” in Proc. 26th Annu. Int. Conf. IEEE Engineering in Medicine and Biology Society, San Francisco, USA, 2004.
[101] M. H. Kuo, T. Sahama, A. W. Kushniruk, E. M. Borycki, and D. K. Grunwell, “Health big data analytics: Current perspectives, challenges and potential solutions,” Int. J. Big Data Intell., vol. 1, no. 1–2, pp. 114–126, Jan. 2014.
[102] S. Hoffman and A. Podgurski, “The use and misuse of biomedical data: Is bigger really better?” Am. J. Law Med., vol. 39, no. 4, pp. 497–538, Dec. 2013.
[103] R. Nambiar, R. Bhardwaj, A. Sethi, and R. Vargheese, “A look at challenges and opportunities of big data analytics in healthcare,” in Proc. IEEE Int. Conf. Big Data, Silicon Valley, USA, 2013.
[104] S. D. Fihn, J. Francis, C. Clancy, C. Nielson, K. Nelson, J. Rumsfeld, T. Cullen, J. Bates, and G. L. Graham, “Insights from advanced analytics at the Veterans Health Administration,” Health Aff., vol. 33, no. 7, pp. 1203–1211, Jul. 2014.
[105] European Commission, “Together for health: A strategic approach for the EU 2008–2013,” Commission of the European Communities, Brussels, Tech. Rep., Oct. 2007.
[106] M. Ercan and M. Lane, “An evaluation of the suitability of NoSQL databases for distributed EHR systems,” in Proc. 25th Australasian Conf. Information Systems, Auckland, New Zealand, 2014.
[107] J. Kim and K. Y. Chung, “Ontology-based healthcare context information model to implement ubiquitous environment,” Multimed. Tools Appl., vol. 71, no. 2, pp. 873–888, Jul. 2014.
[108] H. Q. Yu, X. Zhao, X. Zhen, F. Dong, E. J. Liu, and G. Clapworthy, “Healthcare-event driven semantic knowledge extraction with hybrid data repository,” in Proc. 4th Edition of the Int. Conf. Innovative Computing Technology, Luton, UK, 2014.
[109] M. Mazurek, “Applying NoSQL databases for operationalizing clinical data mining models,” in Proc. 10th Int. Conf. Beyond Databases, Architectures, and Structures, Ustron, Poland, 2014, pp. 527–536.
[110] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, “Bigtable: A distributed storage system for structured data,” in Proc. 7th Symp. Operating Systems Design and Implementation, Seattle, USA, 2006, pp. 205–218.
[111] G. Matei, “Column-oriented databases, an alternative for analytical environment,” Data. Syst. J., vol. 1, no. 2, pp. 3–16, 2010.
[112] B. Lee and E. Jeong, “A design of a patient-customized healthcare system based on the Hadoop with text mining (PHSHT) for an efficient disease management and prediction,” Int. J. Software Eng. Appl., vol. 8, no. 8, pp. 131–150, 2014.
[113] C. T. Yang, J. C. Liu, W. H. Hsu, H. W. Lu, and W. C. C. Chu, “Implementation of data transform method into NoSQL database for healthcare data,” in Proc. Int. Conf. Parallel and Distributed Computing, Applications and Technologies, Taipei, China, 2013, pp. 198–205.
[114] D. Chrimes, M. H. Kuo, A. W. Kushniruk, and B. Moa, “Interactive big data analytics platform for healthcare and clinical services,” Global J. Eng. Sci., vol. 1, no. 1, Sept. 2018.
[115] A. Lith and J. Mattsson, “Investigating storage solutions for large data – A comparison of well performing and scalable data storage solutions for real time extraction and batch insertion of data,” M.S. thesis, Chalmers Univ. Technology, Göteborg, Sweden, 2010.
[116] Y. Park, M. Shankar, B. H. Park, and J. Ghosh, “Graph databases for large-scale healthcare systems: A framework for efficient data management and data services,” in Proc. IEEE 30th Int. Conf. Data Engineering Workshops, Chicago, USA, 2014.
[117] M. Baglioni, S. Pieroni, F. Geraci, F. Mariani, S. Molinaro, M. Pellegrini, and E. Lastres, “A new framework for distilling higher quality information from health data via social network analysis,” in Proc. IEEE 13th Int. Conf. Data Mining Workshops, Dallas, USA, 2013.
[118] P. Conde, T. Alonso, I. Garau, P. Roca, and J. Oliver, “Treatment of medical databases and their graphical representation on the internet,” Med. Inform. Internet Med., vol. 31, no. 3, pp. 195–204, Jan. 2006.
[119] S. Batra and C. Tyagi, “Comparative analysis of relational and graph databases,” Int. J. Soft Comput. Eng. (IJSCE), vol. 2, no. 2, pp. 509–512, May 2012.
[120] E. Torres-Serrano, “A large-scale graph processing system for medical imaging information based on DICOM-SR,” Int. J. Image Min., vol. 1, no. 2–3, pp. 143–158, Jan. 2015.
[121] M. Štufi, B. Bačić, and L. Stoimenov, “Big data analytics and processing platform in Czech Republic healthcare,” Appl. Sci., vol. 10, no. 5, pp. 1705, Mar. 2020.
[122] M. P. Gopinath, G. S. Tamilzharasi, S. L. Aarthy, and R. Mohanasundram, “An analysis and performance evaluation of NoSQL databases for efficient data management in e-health clouds,” Int. J. Pure Appl. Math., vol. 117, no. 21, pp. 177–197, 2017.
[123] K. Kaur and R. Rani, “Managing data in healthcare information systems: Many models, one solution,” Computer, vol. 48, no. 3, pp. 52–59, Mar. 2015.
[124] S. M. Freire, D. Teodoro, F. Wei-Kleiner, E. Sundvall, D. Karlsson, and P. Lambrix, “Comparing the performance of NoSQL approaches for managing archetype-based electronic health record data,” PLoS One, vol. 11, no. 3, pp. e0150069, Mar. 2016.
[125] DB-Engines, “DB-Engines ranking of key-value stores,” 2017. [Online]. Available: https://db-engines.com/en/ranking/key-value+store.
[126] DB-Engines, “DB-Engines ranking of document stores,” 2017. [Online]. Available: https://db-engines.com/en/ranking/document+store.
[127] DB-Engines, “DB-Engines ranking of graph DBMS,” 2017. [Online]. Available: https://db-engines.com/en/ranking/graph+dbms.
[128] DB-Engines, “DB-Engines ranking of wide column stores,” 2017. [Online]. Available: https://db-engines.com/en/ranking/wide+column+store.
[129] K. L. Chen and H. Lee, “The impact of big data on the healthcare information systems,” in Transactions of the Int. Conf. Health Information Technology Advancement, 2013.
[130] S. Zillner, N. Lasierra, W. Faix, and S. Neururer, “User needs and requirements analysis for big data healthcare applications,” Stud. Health Technol. Inform., vol. 205, pp. 657–661, Aug. 2014.
[131] H. Boinepelli, “Applications of big data,” in Big Data: A Primer. New Delhi, India: Springer, 2015, pp. 161–179.
[132] L. Hood, J. C. Lovejoy, and N. D. Price, “Integrating big data and actionable health coaching to optimize wellness,” BMC Med., vol. 13, no. 1, pp. 4, Jan. 2015.
[133] E. Begoli, T. Dunning, and C. Frasure, “Real-time discovery services over large, heterogeneous and complex healthcare datasets using schema-less, column-oriented methods,” in Proc. IEEE 2nd Int. Conf. Big Data Computing Service and Applications, Oxford, UK, 2016.
[134] J. Lawler, A. Joseph, and H. Howell-Barber, “A big data analytics methodology program in the health sector,” Inf. Syst. Edu. J., vol. 14, no. 3, pp. 63–75, May 2016.
[135] Martin, “Big data in healthcare,” 2016. [Online]. Available: https://www.martinsights.com/?p=853.
[136] MarkLogic, “Health information systems mobilized by NoSQL solutions,” 2016. [Online]. Available: https://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/xeon-e5-v3-marklogic-healthcare-database-migration.pdf.
[137] MongoDB, “Healthcare,” 2020. [Online]. Available: https://www.mongodb.com/industries/healthcar.
[138] Couchbase, “Why Couchbase NoSQL for healthcare,” 2020. [Online]. Available: https://www.couchbase.com/solutions/nosql-for-healthcare.
[139] R. Sreekanth, G. V. Madhava Rao, and S. Nanduri, “Big data electronic health records data management and analysis on cloud with MongoDB: A NoSQL database,” Int. J. Adv. Eng. Global Technol., vol. 3, no. 7, pp. 946–949, Jul. 2015.
[140] C. Dobre and F. Xhafa, “NoSQL technologies for real time (patient) monitoring,” in Medical Imaging: Concepts, Methodologies, Tools, and Applications, Information Resources Management Association, Ed. IGI Global, 2016.
[141] PYPL, “PYPL popularity of programming language,” 2020. [Online]. Available: http://pypl.github.io/PYPL.html.
[142] T. Trends, “Most important business intelligence trends for 2020,” 2020. [Online]. Available: https://medium.com/@akki.greatlearning/most-important-business-intelligence-trends-for-2020-1fe65e4389ab.
[143] J. Ladley, Data Governance: How to Design, Deploy, and Sustain an Effective Data Governance Program. 2nd ed. Waltham, USA: Academic Press, 2019.
[144] S. P. Kane and K. Matthias, Docker: Up & Running: Shipping Reliable Containers in Production. 2nd ed. USA: O’Reilly Media, 2018.
[145] G. Kim, P. Debois, J. Willis, J. Humble, and J. Allspaw, The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations. Portland, USA: IT Revolution Press, 2016.
[146] A. Gorelik, The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science. Sebastopol, California: O’Reilly Media, 2019.
[147] J. Richardson, R. Sallam, K. Schlegel, A. Kronz, and J. L. Sun, “2020 Gartner magic quadrant for analytics and business intelligence platforms,” 2020. [Online]. Available: https://info.microsoft.com/ww-landing-2020-gartner-magic-quadrant-for-analytics-and-business-intelligence.html?LCID=EN-US.
[148] W. Raghupathi and V. Raghupathi, “Big data analytics in healthcare:

Promise and potential,” Health Inf. Sci. Syst., vol. 2, no. 1, pp. 3, Feb. 2014.
[149] Y. C. Wang, L. A. Kung, and T. A. Byrd, “Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations,” Technol. Forecasting Soc. Change, vol. 126, pp. 3–13, Jan. 2018.
[150] A. Belle, R. Thiagarajan, S. M. R. Soroushmehr, F. Navidi, D. A. Beard, and K. Najarian, “Big data analytics in healthcare,” BioMed Res. Int., vol. 2015, pp. 370194, Jul. 2015.
[151] J. M. Sun and C. K. Reddy, “Big data analytics for healthcare,” in Proc. 19th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Chicago, USA, 2013.
[152] L. Hong, M. Q. Luo, R. X. Wang, P. X. Lu, W. Lu, and L. Lu, “Big data in health care: Applications and challenges,” Data Inf. Manage., vol. 2, no. 3, pp. 175–197, Dec. 2018.
[153] M. M. Malik, S. Abdallah, and M. Ala’raj, “Data mining and predictive analytics applications for the delivery of healthcare services: A systematic literature review,” Ann. Oper. Res., vol. 270, no. 1-2, pp. 287–312, Nov. 2018.
[154] A. Pashazadeh and N. J. Navimipour, “Big data handling mechanisms in the healthcare applications: A comprehensive and systematic literature review,” J. Biomed. Inform., vol. 82, pp. 47–62, Jun. 2018.
[155] D. Tomar, J. P. Bhati, P. Tomar, and G. Kaur, “Migration of healthcare relational database to NoSQL cloud database for healthcare analytics and management,” in Healthcare Data Analytics and Management: A Volume in Advances in Ubiquitous Sensing Applications for Healthcare, N. Dey, C. Bhatt, A. S. Ashour, and S. J. Fong, Eds. Amsterdam, The Netherlands: Elsevier, 2019, pp. 59–87.
[156] K. Ding and P. J. Jiang, “RFID-based production data analysis in an IoT-enabled smart job-shop,” IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 128–138, Jan. 2018.
[157] M. S. Shang, X. Luo, Z. G. Liu, J. Chen, Y. Yuan, and M. C. Zhou, “Randomized latent factor model for high-dimensional and sparse matrices from industrial applications,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 131–141, Jan. 2019.
[158] M. Ghahramani, M. C. Zhou, and G. Wang, “Urban sensing based on mobile phone data: Approaches, applications, and challenges,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 3, pp. 627–637, May 2020.

Sohail Imran is an Assistant Professor and a doctoral candidate at the PAF-Karachi Institute of Economics and Technology, Pakistan. He has more than 15 years of teaching experience in databases, data science, and big data analytics, and more than 10 years of training experience in databases (SQL and NoSQL), big data infrastructure, and data science for different institutes, universities, and the corporate sector. His research work is focused on mapping OLAP data warehousing schema into the distributed Hadoop environment. Specifically, he has developed a framework which creates dimension and fact tables over HBase and Hive in a NoSQL schema-less manner and computes aggregates through SQL-over-Hadoop technologies (Presto, Drill, Spark SQL). This functionality is made scalable through containerization and more efficient through the use of Apache Spark.

Tariq Mahmood is an Associate Professor at the Faculty of Computer Science, Institute of Business Administration (IBA), Pakistan. He received the Ph.D. degree in machine learning from the University of Trento, Italy, and the M.S. degree in statistical machine learning from Université Pierre et Marie Curie (Paris 6), France. He has published around 20 international journal papers and 35 conference papers, with a total of 691 citations and an h-index of 12 (Google Scholar). His research interests include BDA, deep learning, and machine learning/data science. He heads the Big Data Analytics Laboratory at IBA, with a focus on imparting data science and big data certifications to students and industry professionals, implementing BDA-related industrial projects, and researching the BDA technology stack, particularly to develop BDA architectures for different types of streaming and non-streaming data. He also consults for various local industries on business intelligence, data governance, BDA, and machine learning.

Ahsan Morshed is a Lecturer in ICT at CQ University, Australia. Previously, he was a Research Fellow in Data Analytics at Swinburne University of Technology and a Senior Project Officer at RMIT University. He was also a Postdoctoral Fellow at CSIRO (Australia) on sensor data integration and machine learning, and an Information Management Specialist in the OEKC division at the Food and Agriculture Organization (FAO) of the UN in Rome, Italy. During his time at FAO, he acquired extensive skills in metadata standards, knowledge organization systems, ontologies, Linked Open Data management, and information management tools. His research interests are big data, data science, the semantic web, linked open data, and semantic machine learning. He holds the Ph.D. degree from the University of Trento, Italy. Dr. Morshed has 50 peer-reviewed publications (book, book chapter, journals, conference and workshop papers), with 229 citations and an h-index of 6 (Google Scholar).

Timos Sellis (F’09) is a Professor at Swinburne University of Technology, Australia. He holds the diploma from the National Technical University of Athens (NTUA), Greece, the M.Sc. degree from Harvard University, USA, and the Ph.D. degree from the University of California at Berkeley, USA. Timos has a significant international research reputation in big data, data analytics, data integration, and spatiotemporal database systems. He is a Fellow of the Association for Computing Machinery (ACM) for his contributions to database query optimisation, spatial data management, and data warehousing, and also an Institute of Electrical and Electronics Engineers (IEEE) Fellow for his contributions to database query optimisation and spatial data management. In 2018 he was awarded the IEEE TCDE Impact Award, in recognition of his impact in the field and for contributions to database systems research and broadening the reach of data engineering research. Before joining Swinburne, Timos was the Director of the Institute for Management of Information Systems and Professor at the National Technical University of Athens. He has also held the role of Director, Big Data Lab at RMIT University.
