Data and Information Management, 2018; 2(3): 175–197

Literature Review Open Access

Liang Hong, Mengqi Luo, Ruixue Wang, Peixin Lu, Wei Lu*, Long Lu*

Big Data in Health Care: Applications and Challenges

https://doi.org/10.2478/dim-2018-0014
received May 8, 2018; accepted June 18, 2018.

Liang Hong, Mengqi Luo, Ruixue Wang, Peixin Lu: School of Information Management, Wuhan University, Wuhan, P. R. China

Abstract: The concept of Big Data is popular in a
variety of domains. The purpose of this review was to
summarize the features, applications, analysis
approaches, and challenges of Big Data in health care.
Big Data in health care has its own features, such as
heterogeneity, incompleteness, timeliness and longevity,
privacy, and ownership. These features bring a series of
challenges for data storage, mining, and sharing to
promote health-related research. To deal with these
challenges, analysis approaches focusing on Big Data in
health care need to be developed and laws and
regulations for making use of Big Data in health care
need to be enacted. From a patient perspective,
application of Big Data analysis could bring about
improved treatment and lower costs. In addition to
patients, government, hospitals, and research
institutions could also benefit from the Big Data in
health care.

Keywords: Big Data, public health, cloud computing, medical applications

*Corresponding authors: Wei Lu, School of Information Management, Wuhan University, Wuhan, P. R. China, E-mail: weilu@whu.edu.cn; Long Lu, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, U.S.A; School of Information Management, Wuhan University, Wuhan, P. R. China, E-mail:

1 Introduction
Big Data, the generic term for data sets of structured and unstructured data that are so large and complex that traditional software, algorithms, and data repositories are inadequate to collect, process, analyze, and store them (Asante-Korang & Jacobs, 2016; Kyoungyoung Jee & Gang Hoon Kim, 2013; Khoury & Ioannidis, 2014; Tan, Gao, & Koch, 2015), has become an intensively studied area in recent years. With the development of the Internet, the mobile Internet, the Internet of things, social media, biology, finance, and digital medicine, the volume of data has increased dramatically. Big Data not only describes the large size of data, as its name suggests, but also implies rapid data processing ability and novel technology and approaches for handling the data (Krumholz, 2014). After entering the 21st century, Big Data went through a series of evolutionary steps, and software for suitable environments has been developed. With the growth of information exchanges, Big Data has expanded not only in its size but also in data technology. In terms of its five main characteristics, volume, variety, velocity, variability, and veracity, state-of-the-art techniques, technologies, and equipment are required to deal with Big Data in correlation analysis, clustering analysis, modeling, prediction, and hypothesis verification. Thus, advanced hardware and software are required for data acquisition, extraction, processing, analysis, and storage. Currently, infrastructure for Big Data includes servers, storage systems, cloud service, and networking equipment. Software for Big Data includes parallel and distributed file systems, retrieval software, and data-mining software (Anderson & Chang, 2015).

The advanced analytical technologies developed for Big Data have driven its applications in many areas such as combating crime, business execution, finance, Global Positioning System (GPS), commerce, travel, urban informatics, meteorology, genomics, complex physics simulations, biology, environmental research, and health care (Chen, Mao, & Liu, 2014). Health care data are one of the driving forces of Big Data. With advanced data generation technology, the volume of data is increasing exponentially. For example, as can be seen from the Human Genome Project completed in 2003, one single genome in human DNA occupies 100–150 gigabytes (Marx, 2013; O’Driscoll, Daugelaite, & Sleator, 2013). In terms of data size, Big Data in health care exceeded 150 exabytes after 2011 (Wang, Kung, Ting, & Byrd, 2015), and a study showed that data size in health care is estimated to be around 40 ZB in 2020, about 50 times the 2009 figure of 0.8 ZB (O’Driscoll et al., 2013) (Fig. 1A).
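
As a quick check of the data-volume figures quoted in the Introduction, the short Python sketch below recomputes the roughly 50-fold increase from 0.8 ZB (2009) to about 40 ZB (2020). The implied annual growth rate is not stated in the source; it is derived here under the assumption of constant compound growth, and the variable names are illustrative.

```python
# Figures quoted in the text for health care data volume, in zettabytes (ZB).
size_2009_zb = 0.8
size_2020_zb = 40.0
years = 2020 - 2009

# Overall growth factor between the two quoted data points.
growth_factor = size_2020_zb / size_2009_zb

# Assumption (not from the source): growth compounds at a constant yearly rate.
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.0f}x over {years} years")
print(f"implied annual growth rate: {annual_rate:.1%}")
```

Under that assumption, a 50-fold increase over 11 years corresponds to a yearly growth rate of a little over 40%.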

Open Access. © 2018 Liang Hong et al., published by Sciendo. This work is licensed under the Creative Commons Attribution-
NonCommercial-NoDerivatives 3.0 License.

Figure 1A Data explosion in health care

Figure 1B Literature explosion searched with “health care” in PubMed

In addition, as researchers continue to make progress in health care, there is a dramatic explosion in the quantity of research literature (Fig. 1B).

2 Major Types and Sources of Big Data in Health Care

Health care has become an important issue in developed countries and middle-income countries (Kyoungyoung Jee & Gang Hoon Kim, 2013). Big Data in health care can be classified into four main types based on the data sources, i.e., Big Data in medicine, also named medical/clinical Big Data; Big Data in public health and behavior; Big Data in medical experiments; and Big Data in medical literature. Table 1 summarizes the information of major data types.

2.1 Big Data in medicine and clinics

Big Data in medicine and clinics includes various types and large amounts of data generated from hospitals, such as clinical data and medical imaging. It is often closely associated with doctors and patients. In other words, Big Data in medicine is generated from historical clinical activities (Tsumoto, Hirano, & Iwata, 2013) and has significant effects on the medical industry. For instance, it can assist in planning treatment paths for patients, processing clinical decision support (CDS), and improving health care technology and systems (Kyoungyoung Jee & Gang Hoon Kim, 2013).
In the medical domain, Big Data comes from hospital information resources, surgeons’ work, activities of anesthesia, physical examinations, radiography, magnetic resonance imaging (MRI), computed tomography (CT), information of patients, pharmacy, treatment, medical imaging, and imaging reports (Tan et al., 2015; Wang & Alexander, 2013). These clinical activities generate a large number of records including identification information of patients, diagnosis, medicine scheme, notes from physicians, and sensor data (Tan et al., 2015; Wang & Alexander, 2013). Major data from clinical activities are electronic health record (EHR)/electronic medical record (EMR), personal health record (PHR), and medical images. EMR comprises structured and unstructured data that contain all the medical activity information of the patients and is often used for treatment and treatment decisions, while EHR is associated with health-related information for individuals such as medical information and financial information, which are closely related to the health care of the individuals (L. Wang & Alexander, 2013; Wu et al., 2017). The differences between EHR and EMR are as follows. EHR can be shared between different systems in different organizations (Heart, Ben-Assuli, & Shabtai, 2017; Joshi & Yesha, 2012; L. Wang & Alexander, 2013) and is the whole-life record of a patient from birth to death stored in the medical institution, while EMR is the complete record of a patient’s disease stored in the hospital. EHR focuses on health management of residents, while EMR focuses on clinical diagnosis of patients. EHR also contains data of demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics, and billing information (M, 2014). EMR is the record of the care delivery organization (CDO) and belongs to the CDO, while EHR is a subset of the CDO’s records and belongs to the patients or stakeholders (Garets & Davis, 2007). EHRs are adopted by many countries, generating about 500 petabytes of data in
Table 1
Summary of Major Data Types of Big Data in Health Care

Data type: Big Data in medicine and clinics

Data name: Electronic health record (EHR)/electronic medical record (EMR)
Data description: Standard data collection of medical and health information for patients that can be shared in different organizations (Gunter & Terry, 2005). Often comes from medical activities and public health data
Data acquisition: Hospital information resource, surgery’s work, activities of anesthesia, physical examination, radiography, magnetic resonance imaging (MRI), computed tomography (CT), information of patient, pharmacy, treatment, medical imaging, imaging report, identification information of patient, clinical diagnosis, medicine scheme, notes from physician, sensor data (Belle et al., 2015; Wang & Alexander, 2013), patient demographics, clinic or inpatient notes, electronic reports
Technology/database/system: Medical record data exchange standards: Health Level 7 (HL7), Continuity of Care Record (CCR), Continuity of Care Document (CCD), controlled medical vocabulary (CMV), computerized provider order entry (CPOE) (Valdes, Kibbe, Tolleson, Kunik, & Petersen, 2004) (Garets & Davis, 2007), Allscripts, Epic Systems, Practice Fusion, NextGen Healthcare, clinical decision support systems, pharmacy management system, EMR Adoption Model (Wang & Alexander, 2013) (Garets & Davis, 2007), NoSQL database, clinical data repository (CDR) (Garets & Davis, 2007)

Data name: Personal health record (PHR)
Data description: As its name suggests, it is the health-related data and information of patients (Tang, Ash, Bates, Overhage, & Sands, 2006) and about people’s lifelong health information. It is available for further use (Chen et al., 2012)
Data acquisition: Allergies and adverse drug reactions, chronic diseases, family history, illnesses and hospitalizations, imaging reports, laboratory test results, medications and dosing, prescription record, surgeries and other procedures, vaccinations and observations of daily living, as reported by patients (Rumsfeld, Joynt, & Maddox, 2016)
Technology/database/system: Cloud computing, Health Insurance Portability and Accountability Act (HIPAA), and HL7 (Chen et al., 2012); stored on paper, like printed laboratory reports, copies of clinic notes, and health histories created by the individual; electronic devices such as personal computer-based software, CD, DVD, and smart card; web applications such as HealthVault and PatientsLikeMe; and cloud servers (Chen et al., 2012)

Data name: Medical images
Data description: Data that present visual information of the interior of the human body
Data acquisition: X-ray, CT, histology, positron-emission tomography (PET), radiography, MRI, nuclear medicine, elastography, tactile imaging, photoacoustic imaging, echocardiography (Kovalev & Kalinovsky, 2015), ultrasonography, angiography
Technology/database/system: Statistical shape models (SSMs), medial models, clustering, active appearance models (AAMs), active shape models (ASMs) (Heimann & Meinzer, 2009), image segmentation algorithm, fuzzy C-means (FCM) algorithm (Zhang & Chen, 2004), image registration, picture archiving and communication systems (PACS), Super PACS, RIS, and Digital Imaging and Communications in Medicine (DICOM) (Luo, Wu, Gopukumar, & Zhao, 2016)

Data type: Big Data in public health and behavior

Data name: Electrocardiogram
Data description: Electrical graph recording heartbeat activity of a person in a period of time like 1 minute
Data acquisition: Electrocardiograph (ECG) signal
Technology/database/system: MIT-BIH Arrhythmia Database, American Heart Association (AHA) database, Common Standards for Electrocardiography database, ST-T database, Physikalisch-Technische Bundesanstalt (PTB) and Paroxysmal Atrial Fibrillation (PAF)

Data name: Vitals
Data description: Mainly refer to four signs (temperature, pulse, respiratory rate, and blood pressure) and other physiological data outside the health-care setting (Rumsfeld et al., 2016)
Data acquisition: Temperature, pulse, respiratory rate, and blood pressure
Technology/database/system: Mobile technology, portable equipment, wearable system, and advanced devices like smartphones with third-party applications (HealthKit from Apple, Google Fit from Google, and S Health from Samsung), Android watches and Google Glass (Safavi & Shukur, 2014), and medical devices like implantable cardioverter–defibrillators (Rumsfeld et al., 2016)

Data type: Big Data in medical experiment

Data name: -omics data
Data description: Biology information data in molecular-level catalog (Skotnes, 2012). Reflects characteristics of the individual for treatment (Rumsfeld et al., 2016)
Data acquisition: Genomics, transcriptomics: whole genome sequencing, RNA-seq; metabolomics: nuclear magnetic resonance (NMR), mass spectrometry; proteomics: mass spectrometry; methylomics: pyrosequencing and ChIP-on-chip
Technology/database/system: Data Analysis Tool Extension (DAnTE) and DanteR

Data name: Molecular biology experiment
Data description: Interaction and regulation of biological activity within cells, such as interactions between DNA, RNA, proteins, and biosynthesis
Data acquisition: Molecular cloning, polymerase chain reaction (PCR), macromolecule blotting and probing, microarrays, and next-generation sequencing
Technology/database/system: NCBI

Data name: Human body samples
Data description: Data and samples of cells, tissues, and organs in human body (Bagayoko, Dufour, Chaacho, Bouhaddou, & Fieschi, 2010)
Data acquisition: Cells, tissues, and organs
Technology/database/system: Mayo Clinic Biobanks (http://specimencentral.com/biobank-directory/)

Data name: Clinical trials
Data description: Experiments for evaluating new medical treatment (e.g., drug, device) (Kanagaraj & Sumathi, 2012)
Data acquisition: Drug efficacy, toxicity, new treatment devices, and procedures
Technology/database/system: ClinicalTrials.gov

Data type: Big Data in medical literature

Data name: Journal/conference article
Data description: Research articles written by researchers
Data acquisition: Pubmed.com, New England Journal of Medicine, Lancet, Nature, Science, and Cell
Technology/database/system: Website of journal articles, Google Scholar, and Science Citation Index (SCI)

Data name: Structured knowledge
Data description: MeSH and International Classification of Diseases 10th revision (ICD-10)
Data acquisition: Database in MeSH
Technology/database/system: NCBI

2012, which is expected to reach 25,000 petabytes by 2020 (Feldman, Martin, & Skotnes, 2012).
PHR comes from a variety of patient health and social information; its main role is as a data source for medical analysis and clinical decision support (Poulymenopoulou et al., 2015). It includes data of allergies and adverse drug reactions (ADRs), chronic diseases, family history, illnesses and hospitalizations, imaging reports, laboratory test results, medications and dosing, prescription records, surgeries and other procedures, vaccinations, and observations of daily living (ODLs). Unlike other document or text data, medical imaging mainly comes from X-ray, CT, histology, PET, radiography, magnetic resonance imaging (MRI), nuclear medicine, ultrasound, elastography, tactile imaging, photoacoustic imaging, echocardiography, and so on. It contains visual elements, and this means that data are usually very large (Kovalev & Kalinovsky, 2015).

2.2 Big Data in public health and behavior

Big Data in public health and behavior focuses on the physiological data of users that are often collected by portable equipment (Yan, Qin, Fan, & Wang, 2014), such as electrocardiogram, vitals, contagion, wearable device, daily health record, sports, and diet. Electrocardiogram is the electrical graph recording heartbeat activity of a person in a period of time, e.g., 1 minute; the recording process involves putting electrodes on the skin. Vitals, short for vital signs, include temperature, pulse, respiratory rate, and blood pressure. These are the four most important signs of the body’s function. Wearable device in public health refers to equipment that records details about lifestyle and vitals of people, which can assist physicians in treatment and diagnosis for patients. Advanced devices such as smartphones with third-party applications (HealthKit from Apple, Google Fit from Google, and S Health from Samsung), Android watches, and Google Glass have been developed with sensors in the health care area (Safavi & Shukur, 2014). Since people have become more concerned with their own health on a day-to-day basis, ODLs have come to play a key role in recording personal daily health and behavior, signs, and symptoms of patients (Backonja et al., 2012).
Additionally, data of sports and diet of people also contribute significantly to Big Data in public health and behavior. In the Apple iTunes store alone, there are more than 40,000 health care apps available (Aitken & Gauntlett, 2013). It was predicted that, by 2017, more than 1.7 billion people would have downloaded health care apps.
In terms of infectious diseases in public health, there is a well-known case in which Google successfully predicted the time and scale of an influenza outbreak by analyzing search engine data.

2.3 Big Data in medical experiment

This part of Big Data mainly focuses on molecular biology, human body data set, clinical trials, biology samples, gene sequences, and clinical and medical research laboratory tests and “omics” data (Table 1).
Molecular biology, a vital part of both biological and medical experiments, focuses on interaction and regulation of biological activities within cells, such as interactions between DNA, RNA, and proteins and biosynthesis (Fenderson & Bruce, 2008). It has a close relationship with fields of biochemistry and genetics in research of proteins and genes (Lodish, 2008). The main techniques of molecular biology include molecular cloning, polymerase chain reaction (PCR), macromolecule blotting and probing, microarrays, and so on. Human body data sets include

samples of cells, tissues, and organs in the human body, as well as cross-sectional photographs of the human body in the Visible Human Project, which is used to visualize the anatomy of the human body in support of medical activities (Vesna, 2000). Similar to human body data sets, biological laboratory specimens also come from sampling of the human body and are stored in a biorepository. When a new drug, novel vaccine, or new medical device has been created, clinical trials should be conducted before it comes into use. A clinical trial, a kind of experiment or observation in medical or clinical research, is a procedure for evaluating the effectiveness of a new medical treatment through study on human volunteers (DerSimonian & Laird, 1986). Gene sequencing, mainly referring to DNA sequencing, is a medical research activity of obtaining the precise order of nucleotides within DNA. This process results in a large amount of data for recording DNA sequences. Medical research is often performed by researchers in universities, research institutions, and industry. The objective of their work is to make breakthroughs in cellular, molecular, and physiological mechanisms in humans for health care; fundamental parts of it also include molecular biology, medical genetics, immunology, neuroscience, and psychology (Obenshain, 2004). Omics data are the biology information data in the molecular level catalog, which include genomics, proteomics, metabolomics, transcriptomics, epigenomics, lipidomics, immunomics, glycomics, and RNomics (Wu et al., 2017).

2.4 Big Data in medical literature

As the medical/clinical area has developed, research articles as well as structured knowledge are currently produced at a high speed. Additionally, there are also many older materials in the medical/clinical area. This literature makes a significant contribution to Big Data in health care.

2.5 Hospital information system (HIS) and its evolution

Technology for Big Data storage and processing like the Cassandra database has been applied; the main characteristic of this tool is that it can accommodate about two million columns in one row, making it more convenient to deal with large volumes of data (Kyoungyoung Jee & Gang Hoon Kim, 2013). In Big Data, including those in health care, one of the most popular processing tools, Hadoop, created by Apache, uses the concept of distribution to handle tremendous volumes of data (Asante-Korang & Jacobs, 2016; Kyoungyoung Jee & Gang Hoon Kim, 2013). In terms of data management, data warehouses are used for supporting decision-making, online transaction processing (OLTP), and online analysis processing (OLAP) (Sheta & Eldeen, 2013). In addition, machine learning in data mining seems to be the most popular technological approach in Big Data analysis, and technologies such as retrieval, web mining, decision trees, support vector machines (SVMs), clustering, neural networks, network analysis, knowledge maps, Natural Language Processing (NLP), and Multi-Layer Perceptron (MLP) approaches have been used. For instance, named-entity recognition is one of the most important techniques in BioNLP, used in recognizing particular entity processes such as gene normalization and event extraction (Usami, Cho, Okazaki, & Tsujii, 2011). Various techniques for -omics data analysis, such as amplified fragment length polymorphism (AFLP) for DNA fingerprinting and interpretation, validation tools for -omics data (Hassani S, 2010), and the statistical tools data analysis tool extension (DAnTE) and data analysis tool extension R (DanteR), have emerged with different usages (Polpitiya et al., 2008; Taverner et al., 2012). In addition to the techniques in data processing, techniques for health care data have progressed in HISs. For example, a typical system is developed for data collection, data management, and data sharing in Hospital Information System (HIS) (Abernethy, Wheeler, & Bull, 2011). Currently, new technologies and new models have been found to be effective for structured and unstructured Big Data in health care. Data mining, as well as NLP, has been incorporated in the Big Data platform to handle complex problems oriented toward scientific research.
As a sociotechnical subsystem, HIS is commonly featured in presenting quality community for historical data resource, information, and knowledge in health care for hospital administration and patient health care (Bagayoko & Dufour, 2010; Kanagaraj & Sumathi, 2011; Roberts, 1985; Tsumoto et al., 2013) (Table 2). HIS was developed only for administrative management usage in the early 1960s and gradually expanded to information management after 1970 (Pai & Huang, 2011). Broadly speaking, there are many types of HIS. For instance, PACS, short for picture archiving and communication systems, is a common HIS for storing and transferring digital images (Joshi & Yesha, 2012). Additionally, laboratory information system (LIS), radiology information system (RIS), ultrasound information system (UIS), EHR system, EMR system, and PHR system are also included (He, Jin, Zhao, & Xiang, 2010; Joshi & Yesha, 2012).
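
The idea of distribution attributed to Hadoop earlier in this section can be illustrated with a toy map/reduce count. The sketch below is plain Python rather than the Hadoop API: the partitions, patient identifiers, and function names are hypothetical, and the diagnosis codes are only illustrative ICD-10 labels. Each partition is counted independently (the map step, which a cluster would run on separate nodes), and the partial counts are then merged (the reduce step).

```python
from collections import Counter
from functools import reduce


def map_partition(records):
    """Map step: count diagnosis codes within a single data partition."""
    return Counter(code for _patient_id, code in records)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_codes(partitions):
    # On a real cluster each partition would be processed on a different node;
    # here the map step simply runs in a loop to keep the sketch self-contained.
    partial_counts = [map_partition(p) for p in partitions]
    return reduce(reduce_counts, partial_counts, Counter())


# Hypothetical records (patient id, diagnosis code), split into three partitions.
partitions = [
    [("p1", "J10"), ("p2", "E11"), ("p3", "J10")],
    [("p4", "E11"), ("p5", "I10")],
    [("p6", "J10")],
]

totals = count_codes(partitions)
print(dict(totals))
```

Because each partition is processed without reference to the others, the map step can be spread across machines, and only the small partial counts need to be combined.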
In terms of handling HL7 format data, the Open Archival Information System (OAIS) model was applied (Celesti, Fazio, Romano, & Villari, 2016). HIS presents the ability to capture, store, and process health care data and often requires a large number of techniques to assist it. In other words, one of the major research challenges is how to integrate advanced techniques of information processing into HIS (Roberts, 1985). Cloud computing, a technique for data storage and sharing, is widely used in information systems. The use of cloud computing in HIS is well known and very common for data processing, data backup, and information sharing between different organizations, such as cloud-based PACS and cloud-based EHR systems (He et al., 2010; Joshi & Yesha, 2012; Kanagaraj & Sumathi, 2011). Cloud security requires a high-quality security management platform covering many aspects, including data security, application security, system security, network security, and physical security. Additionally, novel techniques have been proposed to improve the quality of HIS. For example, in order to achieve data-level interoperability, an adaptive AdapteR Interoperability ENgine (ARIEN) mediation system was proposed (Khan et al., 2014) for HIS with different health care standards. Open-source software is also available for supporting Hospital Information System (HIS) development. According to Bagayoko & Dufour (2010), web infrastructure, server operation systems, developer tools, and databases are commonly used in Europe and North America.

Table 2
Systems for Acquiring Medical/Clinical Big Data

System: HIS
Description: Hospital information system; the system provides quality community for historical data resource, information, and knowledge in health care for hospital administration and patient health care (Bagayoko et al., 2010; Kanagaraj & Sumathi, 2011; Sirintrapun & Artz, 2016; Tsumoto, Hirano, & Iwata, 2013)

System: LIS
Description: Laboratory information system; often used to collect, store, archive, process, extract, and analyze data in the laboratory; this system aims to improve turnaround time (TAT) of records, quality of resource utilization, and public health support (Blaya et al., 2007; Sepulveda & Young, 2013)

System: RIS
Description: Radiology information system; it is used to capture and store data including images, demographic and clinical information, and so on, also assisting in patient registration, report repository, and physician directory with advanced technology (Nance, Meenan, & Nagy, 2013)

System: PACS (ultrasound PACS, endoscope PACS)
Description: Picture archiving and communication systems; it is a common HIS for storage and transferring of digital images (Joshi & Yesha, 2012)

System: EMR
Description: EMR system is used to maintain medical records and store, process, and retrieve information. Its aim is to ensure accuracy of information in order to provide patient control and transparency, interdepartmental communication, and great reporting capabilities for treatment (Kumar & Aldrich, 2010)

System: Cost accounting
Description: System for collecting, recording, classifying, analyzing, summarizing, allocating, and evaluating financial cost in the medical area

System: Physical examination system
Description: System for checking signs of patient

3 Unique Features of Big Data in Health Care

In addition to the “5V” features of Big Data, Big Data in health care has its own unique features, such as heterogeneity, incompleteness, timeliness and longevity, data privacy, and ownership.

3.1 Heterogeneity

Big Data in health care often has incompatible formats, which can be classified into structured and unstructured data. For example, some EHRs collect data in structured formats, and International Classification of Diseases 10th revision (ICD-10) codes are structured (Asante-Korang & Jacobs, 2016). However, the majority of Big Data in health care is
182 Liang Hong et
al.

unstructured, including data from CT, MRI, X-ray,


(SPECT) images, MRI, and EEG are a function of time
Holter monitoring, angiography, and laboratories
and thus have a strong timeliness. Keeping medical/
(Swan, 2013).
health information current is a major challenge for Big
The sources of the Big Data in health care can
Data in health care analytics, and HIS should maximize
be classified into four categories (Table 1). There is a
the timeliness of data. At the same time, storage time of
shortage of tools to analyze the information from these
medical records is different among hospitals. For some
heterogeneous sources. A German calciphylaxis registry
familial or genetic diseases, it is useful to know the
proposed a framework and developed a tool to integrate
family history in order to support medical decision-
medical record, imaging data, and signal data for the
making. To this point, there is no link between one’s
purpose of improving knowledge of rare diseases
medical records with those of his/her family members.
(Deserno et al., 2014). Windridge and Bober (2014)
proposed a kernel-based framework to analyze
heterogeneous data in the medical domain, which
addressed the missing data problem presented by
3.4 Data privacy
patients with sparse or absent data modalities. Using the
Owing to the sensitivity of health care data, there are
kernel method, regression and classification of
significant concerns regarding privacy and security
heterogeneous medical information can be achieved.
(Clemens Scott Kruse et al., 2016; Naito, 2014).
Cismondi et al. (2013) developed a classifier to determine
Extreme care should be taken to protect patient privacy,
which missing data of ICUs should be imputed and
and privacy concerns pose limitations in linking
which should not be. Through a simulated test bed, the
external data to individual insured data, which may
performance of this method is improved compared with
improve consumer health-related experience and
that of the previous work.
personalize service and care (Yuen-Reed & Mojsilović,
2016). Because of the centralization of much health care
information, the data are highly vulnerable to attacks
3.2 Incompleteness
(Mohr, Burns, Schueller, Clarke, & Klinkman, 2013).
Owing to privacy issues, Herland et al. (2014) used
To the extent that the data created by monitoring
synthesized EMR/EHR and PHRs with help from a
devices consist of continuous data streams, such as
medical professional to conduct their research. Health
electrocardiogram, it is difficult to consistently save it in the longitudinal record (Kruse, Goswamy, Raval, & Marawi, 2016). It is too expensive to store all the Big Data in health care, a situation that leads to data incompleteness. Additionally, the EHR requires doctors or nurses to record disease information of patients, such as medications and allergies, and this process may also lead to data incompleteness (Hong, Kaur, Farrokhyar, & Thoma, 2015). In Menelik II Referral Hospital, inpatient medical record completeness was 73%, which is low against the standard. Medical records not only support direct patient care but also support clinical audit, epidemiology, medical research, and resource allocation. Improving the completeness of medical records is important to improve the quality of health care (Tola, Abebe, Gebremariam, & Jikamo, 2017).

3.3 Timeliness and longevity

For HIS, there is a delay between the time when EHR information is entered into the HIS and the point when the EHR is available for electronic access (Medicare & Medicaid Services, 2010). Medical signals such as electrocardiogram (ECG), Single Photon Emission Computed Tomography

care mobile phone applications, such as Google Health, promise consumers “complete control over your data,” meaning that personal information will not be sold or shared without the consumer’s explicit permission (Steinbrook, 2008). In different countries, there are two patterns of policies and regulations to protect the data in health care. In one pattern, based on the basic privacy laws, governments pass additional laws, policies, and regulations to protect personal health care information, such as HIPAA in the US, the Health Records and Information Privacy Act 2002 in Australia, and the Medical Privacy Act and Healthcare Insurance Act in France. In the other pattern, taking personal health care information as part of personal information or sensitive information, governments pass laws to protect personal information or sensitive information, such as the Data Protection Act in England and the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada.

3.5 Ownership

Although consumers who have medical needs legally own their health data, which may be stored and controlled by
hospitals, physicians, laboratories, clinics, pharmacies, and government agencies in innumerable, incompatible data silos, consumers may lack access to and control over their own health care data. To solve this problem, the cooperative, which is an old and successful form of corporation that is entirely owned by citizens, is an effective approach. Each consumer has one account that stores and manages all health care data. Consumers can share subsets or all of the data for research purposes (Pentland, Reid, & Heibeck, 2013).

4 Importance of Big Data in Health Care

It is important to extract valuable information and discard useless fragments from Big Data. As the main issue for this discussion, Big Data in health care could produce considerable economic benefits with the application of Big Data analytics (BDA). For example, a significant amount of money could be saved in the health care industry (Asante-Korang & Jacobs, 2016). Additionally, BDA could be applied to clinical diagnosis, medical research, hospital management, and fundamental demands in medicine. Through the use of Big Data techniques, patients may have personalized medicine and patient-centric care. This argument supposes that Big Data would help to provide novel approaches to deal with issues in health care (Kruse, Goswamy, Raval, & Marawi, 2016).

4.1 The perspective of the research institution and the hospital

Research institutions could better understand the mechanisms and effects of newly developed drugs through BDA. For example, BDA could be used to reprocess cancer data to hunt for new cancer drugs (Marx, 2013). By using statistical tools and algorithms, researchers could improve clinical trial design and reduce trial failures (Raghupathi & Raghupathi, 2014).

Physicians could use clinical decision support systems (CDSS) with BDA to make more informed decisions, which may improve the quality of patient care (Jee & Kim, 2013; Kim, Park, Yi, & Kim, 2014). By allowing Big Data to influence clinical decision-making, new practices and treatment guidelines from clinical research may be integrated and lead to an optimized result. BDA and computer-aided diagnostics may be used to save time in cancer detection, reducing the false-positive rate of cancer diagnosis (Costa, 2014). Now in the cardiology area, computing and Big Data technology enable cardiologists to read patients’ medical records via smartphones, which is helpful in identifying emergency cases in need of immediate treatment (Hsieh, Li, & Yang, 2013).

4.2 The perspective of the government or the public

BDA could reduce costs in the medical domain, estimated at approximately 8% of national health care expenditures for the US government (Manyika et al., 2011). In Italy, by exploiting the admissions for “laparoscopic appendectomy” surgery in different sanitary districts, it was possible to categorize districts based on cost efficiency and timeliness by using the number of admissions and the average days of hospitalization. This data analysis provides automatic and continuous monitoring of the sanitary districts. The results of this data analysis provide useful insights into reducing cost and increasing the effectiveness and efficacy of health care services (Mancini, 2014a).

BDA could help governments prevent the spread of infectious diseases. In Pakistan, BDA with smartphone technology helped in the detection and prevention of dengue fever epidemics at an early stage. The method was also used to detect outbreaks of flu epidemics in the US (Pentland et al., 2013). Governments can thus respond more quickly to epidemics and help people avoid the disease.

BDA has the potential to reveal regional health problems. For example, Duke University led a project that involved building an integrated clinical data warehouse by combining millions of patient records from their EHRs with geographic information system data (Braunstein, 2015). Based on the combined data, this project reveals the social determinants of health.

4.3 The perspective of patients and their relatives

Using health care mobile phone applications and other online health-related websites, patients can store, retrieve, manage, and share their health data. Over the long term, this process will improve health care and decrease costs, especially for patients who have complicated chronic conditions (Steinbrook, 2008), such as diabetes. Some diabetes applications offer a variety of functions, including medication or insulin logs, self-monitoring
blood glucose recording, and prandial insulin dose calculators (Demidowich, Lu, Tamler, & Bloomgarden, 2012), and others integrate health care providers who can access the patients’ records and formulate personalized feedback. Thus, patients can take the right treatments and live healthier, more comfortable lives (Asri, Mousannif, Al Moatassime, & Noel, 2015).

Through Big Data techniques, patients may have personalized medicine and patient-centric care (Chawla & Davis, 2013; Collins, 2016). Chawla and Davis (2013) constructed a framework called the Collaborative Assessment and Recommendation Engine (CARE) for patient-centered disease prediction and management. It can generate personalized disease predictions and management plans. In addition, through BDA, three drugs have been identified and used in specific groups of cancer patients. Dabrafenib is used to treat melanoma carrying the BRAF V600E mutation; trastuzumab, a targeted therapy, is used to treat breast cancer with amplification or overexpression of the gene encoding Her2/Neu; and imatinib is used to treat different types of tumor that contain the fusion protein BCR-ABL (Costa, 2014).

Through BDA, patients may have their diseases detected earlier, receive treatment earlier, and have better outcomes (Jee & Kim, 2013; Kim et al., 2014). In daily life, BDA can help patients and their relatives monitor their respective conditions.

5 Common Approaches for Analyzing Big Data in Health Care

With the growing awareness of data as an asset, more and more data-mining approaches are adopted in order to gain insights from large volumes of data. In medicine and health care, a data-rich environment generates an enormous amount of data every day. Thus, we need to use data-mining approaches such as classification, clustering, regression analysis, and association rules to analyze big health care data.

5.1 Classification

Classification is the process of organizing data into categories for its most effective and efficient use. Classification is widely applied in mining health care data. Several representative studies in this area are introduced here.

Primary care influences child health outcomes by managing illness and providing preventive and health promotion services. New Zealand is in a strong position to analyze patterns of childhood morbidity due to universal enrollment with a primary care provider at birth. However, analyzing morbidity patterns within these extracted data is problematic because primary care practices do not consistently or frequently use diagnostic labeling and there is marked variability between clinicians and conditions. A study conducted by MacRae et al. (2015) aimed to extend the use of Pattern Recognition Over Standard Aesculapian Information Collections (PROSAIC) to identify childhood respiratory conditions within primary care consultations by building an algorithm to classify the unstructured clinical narrative written by clinicians. Three independent sets of 1,200 child consultation records were randomly extracted from a data set of all general practitioner consultations in participating practices between January 1, 2008, and December 31, 2013, for children younger than 18 years of age (n=754,242). Each consultation record within these sets was independently classified by two expert clinicians as respiratory or non-respiratory and subclassified according to respiratory diagnostic categories to create three “gold standard” sets of classified records. These three “gold standard” record sets were used to train, test, and validate the algorithm. Then, sensitivity, specificity, positive predictive value, and F-measure were calculated to illustrate the algorithm’s ability to replicate judgments of expert clinicians within the 1,200-record “gold standard” validation set. This algorithm, which uses primary care Big Data, can accurately classify the content of clinical consultations. It enables accurate estimation of the prevalence of childhood respiratory illness in primary care and the resultant service utilization. The algorithm is able to analyze very large data sets, including routinely recorded unstructured clinical narratives. These data sets would be impractical to analyze manually.

Frantzidis et al. (2010) applied data classification techniques to emotion recognition for health care applications, taking into account the bidirectional emotion theory model that treats emotions as mixtures of two (orthogonal and independent) dimensions, namely, valence and arousal. Specifically, this paper uses classification rules derived from the C4.5 algorithm and a pattern classifier based on the Mahalanobis distance. It then favors the role of multiphysiological recordings for the enhancement of emotion discrimination and the use of metadata structure designs via the extensible markup language (XML) for linking the various system components.
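The evaluation measures reported for the MacRae et al. (2015) classifier in this section (sensitivity, specificity, positive predictive value, and F-measure) can all be derived from a binary confusion matrix. The Python sketch below is illustrative only: the function name `binary_metrics` and the toy labels are assumptions made for this example and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, and F-measure for 0/1 labels.

    y_true holds the reference ("gold standard") labels and y_pred the
    algorithm's labels; 1 = respiratory, 0 = non-respiratory.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on the positive class
    specificity = ratio(tn, tn + fp)   # recall on the negative class
    ppv = ratio(tp, tp + fp)           # positive predictive value (precision)
    f_measure = ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f_measure": f_measure,
    }


# Hypothetical toy labels, not data from the study.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(gold, pred)
```

Reporting all four measures together is useful because a classifier can score well on one (for example, specificity on a mostly non-respiratory sample) while missing many true cases.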
Fan et al. (2011) developed a hybrid model named case-based reasoning and fuzzy decision tree (CBFDT) for medical data classification in two medical domains: breast cancer diagnosis and liver disorder diagnosis. In this paper, they introduced the method and algorithm of a case-based fuzzy decision tree (FDT) model for medical classification problems. Two medical data sets, liver disorders and Breast Cancer Wisconsin, are selected from the University of California Irvine (UCI) database. More than 900 data sets are used to conduct this experiment. Decision tree induction is free from parametric assumptions, and it generates a reasonable tree by progressively selecting attributes to branch the tree. By combining all kinds of medical features of the liver disorders and Breast Cancer Wisconsin databases, this research applies an FDT to develop a forecasting model for generating decision rules in disease classification. This classification model integrates a data clustering technique, an FDT, and genetic algorithms (GAs) to construct a medical classification system based on a medical database. It can be divided into four major steps: (1) screening the medical database from the UCI data set; (2) clustering the case library into smaller cases; (3) establishing the FDT; and finally (4) outputting the classification results.

Clinical data usually contain numerous features with a small sample size, leading to degradation in the accuracy and efficiency of the system through the curse of dimensionality. Classifier performance degrades in high-dimensional data sets because irrelevant features not only lead to insufficient classification accuracy but also add extra difficulties in finding potentially useful knowledge. Azar and Hassanien (2015) presented a linguistic hedges neuro-fuzzy classifier with selected features (LHNFCSF) for dimensionality reduction, feature selection, and classification. The new classifier is compared with other classifiers for different classification problems. All data sets are in the public domain. The data sets are breast cancer Wisconsin diagnostic, breast cancer Wisconsin prognostic, erythemato-squamous disease, and thyroid disease. These data sets are obtained from the well-known UCI machine learning repository. The results indicate that applying LHNFCSF not only reduces the dimensions of the problem but also improves classification performance by discarding redundant, noise-corrupted, or unimportant features. The results strongly suggest that the proposed method not only helps reduce the dimensionality of large data sets but also can speed up the computation time of a learning algorithm and simplify the classification tasks.

Estella et al. (2012) designed an advanced system for autonomously classifying brain MRI images of neurodegenerative diseases, with the main purpose of assisting in decision-making in classification tasks. The method was tested on data from a large database (more than 1,500 patients were analyzed), with sensitivity and specificity close to 90%, which are considerably better than those predicted by human experts.

5.2 Clustering

Clustering is the task of grouping a set of objects in such a way that objects in the same cluster are more similar to each other than to those in other clusters. Clustering techniques are widely used for exploratory data analysis, with applications including patient segmentation, outlier health care data detection, disease prediction, and clustering of patients.

Elbattah and Molloy (2017) employed clustering in order to realize the segmentation of patients from a data-driven viewpoint. The Irish Hip Fracture Database (IHFD) is the primary source of data used in the study. Its records contain ample information about patients’ journeys from admission to discharge. Then, a set of data pre-processing procedures are conducted for two purposes: (1) dealing with data anomalies and (2) extraction of additional features that are considered as indicators of care quality. In this paper, the authors use the k-means algorithm as the partitioned clustering approach. The k-means clustering uses a simple iterative technique to group points in a data set into clusters that contain similar characteristics.

Christy et al. (2015) proposed two outlier detection algorithms: distance-based outlier detection and cluster-based outlier detection. The main purpose of the algorithms was to remove outliers that are irrelevant or only weakly relevant to the analysis of health care data. Experimental evaluation based on the metrics of F-score and likelihood ratio shows that the cluster-based outlier detection method outperforms the distance-based outlier detection method.

Huang and Yao (2016) proposed a novel clustering approach for multidimensional physical health data based on artificial ant colony optimization. This method is determined through testing to be an effective and efficient approach to clustering health and medical data for further analysis.

Paul and Hoque (2010) proposed to use the background knowledge of the medical domain in the clustering process to predict the likelihood of diseases. The developed algorithm can handle both continuous and discrete data and perform clustering based on anticipated likelihood attributes with core attributes of disease in each data point. In this paper, its effectiveness has been demonstrated by testing it on a real-world patient data set.
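The "simple iterative technique" behind k-means, mentioned in the discussion of Elbattah and Molloy (2017), alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its members. The following Python sketch illustrates that loop under stated assumptions: it is a generic textbook version (Lloyd's algorithm), not the authors' implementation, and the two features (age, length of stay in days) and their values are hypothetical rather than IHFD data.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points chosen at random.
    centroids: List[Point] = list(rng.sample(list(points), k))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical patients described by (age, length of stay in days).
patients = [
    (65.0, 5.0), (67.0, 6.0), (66.0, 4.0),
    (88.0, 20.0), (90.0, 22.0), (86.0, 19.0),
]
centroids, labels = kmeans(patients, k=2)
```

In practice the features would usually be standardized first, because k-means relies on Euclidean distance and a feature with a large numeric range would otherwise dominate the grouping.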
Hastie et al. (2005) conducted a test in which 188 individuals (59.0% female) completed several psychological instruments and underwent ischemic, pressure, and thermal pain assessments. Then, 13 separate pain measures were obtained by using three experimental pain modalities with several parameters tested within each modality. Cluster analyses of PSI scores revealed four distinct clusters, and significant correlations were found between psychological measures and index scores. These findings highlight the need for future investigation to identify patterns of responses across different pain modalities in order to more accurately characterize individual differences in responses to experimental pain.

5.3 Regression analysis

Regression analysis is widely used in analyzing health care Big Data for estimating the relationships among variables or properties. The main research issues include trend features of data sequences, prediction of data sequences, and relationships between data.

With the emergence of administrative databases, the ability to access longitudinal patient data to adjust for comorbidity has improved considerably. This raises the issue of the most appropriate lookback period to determine patients’ disease status for risk estimation. Most research has used relatively short lookback durations, but longer lookback periods are likely to capture more conditions per patient, as well as assign comorbidities to a greater proportion of patients. Preen et al. (2006) conducted research to determine the impact of different comorbidity ascertainment lookback periods on modeling post-hospitalization mortality and readmission. Data were extracted for ~1.1 million patients admitted to hospital in Washington State from July 1990 to December 1996. Hierarchically nested Cox regressions were used to model mortality within one year and readmission within 30 days of index separation. Additionally, deaths within one year and readmissions within 30 days of index hospitalization were analyzed using logistic regression, and the receiver operating characteristic (ROC) area under the curve (AUC) was determined for each hierarchically nested lookback model in order to estimate the predictive power of different models. The result is that longer lookback resulted in more comorbidity being identified. For the entire sample, 46.8% of comorbidity observed across the five-year lookback period was recorded at index hospitalization. For readmission, lookback periods of five years perform better than shorter durations for both patient groups.

Risk adjustment is an important component of outcomes and quality analysis in surgical health care. However, there are some concerns that should be addressed if risk-adjustment models avoid subjective data elements, such as history of comorbidities, and rely on objective data, such as laboratory values or other machine-collected variables that do not require subjective interpretation and input of hospital personnel. A study by Anderson and Chang (2015) was conducted to determine whether machine-collected data elements could perform as well as a traditional, full risk-adjustment model that includes other physician-assessed and physician-recorded data elements. This research uses all available National Surgical Quality Improvement Program (NSQIP) data from January 1, 2005, to December 31, 2010. This nationally validated program measures more than 135 variables on each patient and follows up each patient for 30 days postoperatively. The primary analysis included all patients in the database who were categorized as having had an operation performed by a general surgeon or surgeons in some surgery subspecialties and having an adverse event. Multivariate logistic regression models were created to predict either mortality or any complication in the inpatient setting or within 30 days of surgery. The researchers then compared the ROC AUC of each regression using objective preoperative risk variables to its corresponding regression with all variables. A total of 745,053 patients were included. The difference in AUC between models with all variables and models with only objective variables ranged from −0.0073 to 0.1944 for mortality and from 0.0198 to 0.0687 for complications. These data suggest that it is possible to create a risk-adjustment system with a high discriminatory value based only on objective variables. By restricting data collection to objective data, we can reduce concerns about reliability and validity as well as the threat of gaming the system by attempting to increase the risk score of patients through subjective variables.

Kennedy et al. (2013) conducted a retrospective cohort study. In this paper, they identified all Veterans Health Administration (VHA) patients without recent cerebral and cardiovascular (CCV) events treated at twelve facilities from 2003 to 2007 and predicted risk using the Framingham risk score (FRS), logistic regression, generalized additive modeling, and gradient tree boosting.

Oztekin et al. (2009) used three different variable selection methods on a large and feature-rich data set to generate a consolidated set of factors and use them to develop Cox regression models for heart–lung graft survival. The main objective of this study was to improve the prediction of outcomes following combined heart–
lung transplantation by proposing an integrated data-mining methodology. The data files were obtained from the United Network for Organ Sharing (UNOS) using a formal data requisition procedure. The complete data set consists of 443 variables and 61,391 records. These variables included the socio-demographic and health-related factors of both the donor and the recipients. There are also procedure-related factors included in the data set. The results indicated that the proposed integrated data-mining methodology using Cox hazard models better predicted graft survival with different variables than the conventional approaches commonly used in the literature.

5.4 Association rules

Association rule mining aims to discover associations between items in large databases. The typical association rule mining methods include Apriori (Agrawal, Imieliński, & Swami, 1993) and Frequent Pattern (FP)-tree growth (Han, Pei, & Yin, 2000). Association rule mining is normally a two-step process: in the first step, frequent item-sets are discovered (i.e., item-sets whose support is no less than a minimum support), and in the second step, association rules are derived from the frequent item-sets using some measures of interestingness.

Antonie et al. (2001) used the Apriori algorithm to discover association rules among the features extracted from the mammography database and the category to which each mammogram belongs. They constrained the association rules to be discovered such that the antecedent of the rules is composed of a conjunction of features from the mammogram, while the consequent of the rules is always the category to which the mammogram belongs. Once the association rules are found, they are used to construct a classification system that categorizes the mammograms as normal, malignant, or benign.

In a medical database, the most complete and detailed information is anamnesis data, which contain disease name, prescription, patient’s detail information, etc. Through this method, it is possible to find the association rules between diseases. Driven by this, Kuo et al. (2007) proposed a novel framework of data mining that clusters the data first and then follows with association rule mining. The first stage uses the ant system-based clustering algorithm (ASCA) and ant k-means (AK) to cluster the database, while the ant colony system (ACS)-based association rule mining algorithm is applied to mine the association rules for each cluster. Experimentation on the data sets provided by the National Health Insurance Plan of Taiwan demonstrates that the proposed method can find the hidden rules that may occur less often but have robust relationships.

6 Systems and Applications for Analyzing Big Data in Health Care

Big Data can provide support across many aspects of health care. BDA has made progress to different degrees in CDS, remote medical information services, public health, disease pattern analysis, and personalized medicine. There are some specific applications and potential opportunities in these areas.

6.1 CDSS

A CDSS can provide a large amount of medical support for clinicians, helping them to make diagnoses and choose the best treatments. CDSS helps in supplementing the knowledge of clinicians, preventing human negligence, and reducing the costs while improving the quality of medical treatment. Representative data-driven CDSSs include the Health Evaluation Through Logical Processing (HELP) system, Quick Medical Reference (QMR) system, Iliad system, and MYCIN system.

6.1.1 The HELP system

The Health Evaluation Through Logical Processing system (Gardner, Pryor, & Warner, 1999) is the first data-driven clinical decision-making system and HIS. The system uses the knowledge base to make decisions from the multi-source clinical data stored in its integrated clinical database. For example, a serum potassium of 6.2 mEq/L will trigger an elevated potassium alert to the nurse caring for a patient via a digital pager. Time-driven decision-making capabilities are also available within the HELP system. Using natural language processing, data from transcribed reports such as handwritten medical records have become a major source of data for decision-making.

The HELP system consists of a knowledge base, decision-making processor, data and time driver, data review alerts, accounting system, longitudinal patient data repository, and other components.
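The data-driven alerting described for the HELP system (for example, the elevated potassium alert) can be pictured as a rule that is evaluated whenever a new result is stored in the clinical database. The Python sketch below is a minimal illustration and not the HELP implementation: the `LabResult` and `AlertRule` types, the one-rule knowledge base, and the 6.0 mEq/L threshold are all assumptions chosen for this example, and the threshold is not a clinical recommendation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LabResult:
    patient_id: str
    test_name: str
    value: float
    unit: str


@dataclass(frozen=True)
class AlertRule:
    test_name: str
    upper_limit: float  # hypothetical threshold, for illustration only
    message: str


# Hypothetical knowledge base containing a single data-driven rule.
RULES: Sequence[AlertRule] = (
    AlertRule("serum_potassium", upper_limit=6.0, message="Elevated potassium"),
)


def evaluate(result: LabResult, rules: Sequence[AlertRule] = RULES) -> Optional[str]:
    """Return alert text if a newly stored result violates a rule, else None."""
    for rule in rules:
        if rule.test_name == result.test_name and result.value > rule.upper_limit:
            return (f"{rule.message}: {result.value} {result.unit} "
                    f"(patient {result.patient_id})")
    return None


# Storing a new result triggers rule evaluation (the "data-driven" path).
alert = evaluate(LabResult("P001", "serum_potassium", 6.2, "mEq/L"))
no_alert = evaluate(LabResult("P002", "serum_potassium", 4.1, "mEq/L"))
```

A time-driven rule would use the same kind of knowledge base but be evaluated on a schedule rather than on the arrival of new data.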
6.1.2 The QMR system

QMR is a typical CDSS to help physicians, using the knowledge base of INTERNIST-1/CADUCEUS. This knowledge base is widely used as a medical book, which contains 750 diseases, 5,000 clinical symptoms, and more than 50,000 disease relationships. QMR was one of the earliest CDSSs to use artificial intelligence and a probability ranking system.

Because many of the diseases in the system are rare and documented, an ad hoc scoring model is proposed to encode the relationship between specific clinical symptoms and disease. One of the factors limiting the use of QMR is that its knowledge base needs to be constantly updated. The significance of QMR lies in its powerful knowledge base, which is used as the basic model of other knowledge base systems.

6.1.3 The Iliad system

Iliad is a medical expert consulting system developed by the University of Utah School of Medicine. It is used as a consultation tool or a simulation training tool for CDS and teaching (Lincoln, 1998).

The Iliad consultant utilizes a number of inferencing mechanisms to emulate the strategy of a medical expert in working with a patient. The knowledge in Iliad is represented in Bayesian and Boolean frames. These frames permit the use of sensitivities and specificities to describe the relationship of a disease to its manifestations and provide a basis for explaining its conclusions. Iliad has four basic components: the inference engine, the user interface, the data driver, and the best information algorithm.

6.1.4 The MYCIN system

MYCIN is an interactive expert system for the diagnosis and treatment of central nervous system infections (Berner, 2003). It is composed of three subsystems: consultation, interpretation, and rules. According to the clinical manifestations and laboratory results of patients, MYCIN imitates the expert reasoning process, assists clinicians in determining bacterial species, and makes clinical recommendations. The system uses if–then inference rules and encodes expert judgment in more than 400 such rules.

6.2 Remote medical information systems

The aggregated electrocardiogram (ECG) and images from hospitals worldwide can become Big Data, which could be used to develop an e-consultation program helping on-site practitioners deliver appropriate treatment. Real-time teleconsultation and telediagnosis of ECG and images can be practiced via an e-platform for clinical, research, and educational purposes.

With respect to large-scale data research, Chia and Syed (2011) used Big Data computing to generate a predictor of the mortality risk for patients with acute coronary syndromes. This predictor was developed through data mining and machine learning, based on 24-hour continuous ECG readings over 4,000 patients’ trials. In each trial, 24-hour ECG readings were collected in a two-year period. This Big Data-based predictor can predict over 50% of deaths with fewer false positives as compared with the traditional ECG analysis, which was conducted based on a smaller segment of ECG signals. This approach can be easily extended to other clinical and non-clinical applications focused on approximate sequential pattern discovery in massive time-series data sets.

To make telemedicine more efficient, medical wearable devices that apply Big Data-mining and analysis techniques are used. For example, patients with dementia (such as Alzheimer type) need to be looked after day and night in order to manage their negative behaviors, which requires a large input of labor and capital. To resolve this problem, real-time health monitoring devices have been developed to capture a large amount of data. Based on these real-time data, it can be determined whether a patient with dementia is in agitation or not. At the same time, medical Big Data also pose challenges to data cleaning; poor-quality data should be identified and rejected to ensure that the results of data mining are right. Moreover, data captured from remote monitoring devices can be mined to realize long-term prognoses.

A Context Processing Algorithm (CPA) (Moore, Xhafa, Barolli, & Thomas, 2013) is proposed to address the issues encountered in decision support in medical diagnosis and potential prognoses based on the event–condition–action (ECA) rule concept. CPA regards captured Big Data as a kind of contextual information to carry out data processing in intelligent context-aware systems.

On the basis of Big Data, pervasive remote medical systems are designed for both healthy and ill people. Páez et al. (2015) proposed an architecture including the application of cloud computing, Big Data, and Internet of things approaches to make sure chronic or non-chronic patients as well as healthy people are monitored
in different environments. Family members, emergency systems, and hospitals can interact with the patients whenever and wherever possible.

While Big Data promotes the function of medical remote monitoring and diagnosis, the development of telemedicine also enriches the connotation of Big Data. Traditionally, medical Big Data refers to EHR and remote monitoring health data. However now, medical Big Data, including user’s behaviors, physical strength, and mental state data, has been rapidly generated (Redmond et al., 2014). Technological advances in the medical field, such as medical video communications, also provide a new type of medical Big Data. For instance, a light-field-based 3D cloud telemedicine system (Wang, Xiang, Pickering, & Zhang, 2016) that combines Big Data analysis with 3D technologies is proposed to mine big video data.

6.3 Applications in public health

In the field of public health, BDA represents a new solution that can mine web-based and social media data to predict disease outbreaks based on consumers’ searches, social content, and query activity. Systems in public health also support clinicians and epidemiologists performing analyses across patients and care venues to help identify disease trends and drug safety.

BDA is often used for monitoring of disease networking. An example is Google’s use of BDA to study the timing and location of search engine queries to predict disease outbreaks. Research shows that one-third of consumers currently use social networking for health care purposes (Facebook, YouTube, blogs, Google, Twitter). As demand for access to health information from social networking sites continues to proliferate, BDA can potentially support key prevention programs such as disease surveillance and outbreak management.

The Global Burden of Disease Study (GBD) is a comprehensive regional and global research program of disease burden that assesses mortality and disability from major diseases, injuries, and risk factors. GBD is a collaboration of more than 1,800 researchers using medical Big Data from 127 countries. The 2015 report (Collaborators, 2017) showed that globally, diarrhea was a leading cause of death among all ages, as well as a leading cause of disability-adjusted life years (DALYs) because of its disproportionate impact on young children.

BDA is also widely applied to supervise drug safety, particularly ADRs, and identify susceptible population. ADR is defined as an appreciably harmful or unpleasant reaction resulting from an intervention related to the use
of a medicinal product (Edwards & triage, decompensation, adverse events, and treatment
Aronson, 2000). ADR can be used in the optimization for diseases affecting multiple organ
field of medical administration and systems.
warrants prevention, specific treatment,
alteration of the dosage regimen, and
withdrawal of the product. 6.4 Applications in disease pattern
With the help of Big Data, health analysis and personalized medicine
departments or medical companies can
efficiently take actions when they detect Hay et al. (2013) imported new sources of data, such as
potential ADRs among the people who social data, to relevant environmental information to
take the medication. In 2004, Wilson et create a dynamic and real-time global infectious disease
al. proposed that Knowledge Discovery map. On the basis of infectious disease risk maps,
in Databases (KDD) is a more effective human
way to determine the presence and assess
the strength of ADR signals. At this
point, numerous data- mining techniques
have been used in drug safety, such as
cluster analysis, link analysis, deviation
detection, and disproportionality
assessment.
As Big Data emerges, health social
media sites are regarded as a fast and direct
data resource for scientist to get first-
hand ADR information. Compared with
ADRs recorded by health professionals,
spontaneous reporting of data on health
social media sites is much more abundant,
open, and timely. Owing to the advantages
discussed earlier, Christopher et al. (2009)
used association mining and proportional
reporting ratio to analyze the detected
ADRs for different drugs on the basis of
social data. Given the prosperity of medical
research especially in the ADR field and
the advantages of Big Data, Shah et al.
(2012) believed that Big Data in
biomedical informatics will grow
considerably. There is no doubt that the
age of data-medicine is poised to create a
proactive, predictive, preventive,
participatory, and patient-centered health
system.
Apart from the great potential shown
in drug safety, Big Data can also achieve
powerful effects in identifying
susceptible populations. A large
collection of EHRs accumulated by
various medical treatments provides an
opportunity to dig out the statistical
model of high-risky people. The model
aims to reduce the cost of health care and
conserve limited resources in health
value. Bates et al. (2014) suggested
that identifying and managing six
practical use cases’ data is the way to use
predictive medical systems. The use cases
include high-cost patients, readmissions,
beings can deepen their knowledge of infectious
diseases and improve the ability to triage spatially and
issue infectious disease outbreak alerts. Lazer et al.
(2014) stated that “Big Data hubris” is the often implicit
assumption that Big Data is a substitute for, rather than
a supplement to, traditional data collection and analysis.
Given that most Big Data cannot reach the standard of
scientific statistical analysis, the results can have
large errors. Additionally, medical algorithms are not
constant. On the contrary, they are dynamic and
undergo a continuous series of
adjustments.
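The proportional reporting ratio (PRR) used for ADR signal detection in Section 6.3 shows how strongly such statistics depend on the underlying counts, which is one reason results mined from noisy Big Data can carry large errors. The sketch below is a minimal illustration of the standard PRR formula, not the procedure of any cited study: it assumes a 2x2 table of report counts, and the function name and the counts are hypothetical.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of report counts.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each drug group needs at least one report")
    if c == 0:
        raise ValueError("PRR is undefined when no comparison report has the event")
    rate_drug = a / (a + b)    # share of the drug's reports that mention the event
    rate_other = c / (c + d)   # same share among all other drugs
    return rate_drug / rate_other


# Hypothetical counts, for illustration only.
prr = proportional_reporting_ratio(a=20, b=180, c=50, d=1950)
print(round(prr, 2))
```

A PRR well above 1 suggests the event is reported disproportionately often for the drug. With small counts, a handful of extra or missing reports changes the ratio substantially, so such a value is a signal to investigate rather than evidence of causation.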
Big medical data can be applied not only to mining
public medical patterns but also to personalized medical
care. At present, health care is moving from a disease-
centered model toward a patient-centered model. In a
disease-centered model, physicians’ decision making is
centered on the clinical expertise and data from medical
evidence and various tests. In a patient-centered model,
patients actively participate in their own care and
receive services focused on individual needs and
preferences.
Personalized healthcare is a data-driven approach: a
patient-centered medical model that assesses the
relationships among patients who are exposed to similar
risk, lifestyle, and environmental factors. In light of
these thoughts, Chawla and Davis (2013) developed a
system named
CARE that uses a collaborative filtering method to
capture patient similarities and produces personalized
disease profiles for personalized disease risk
predictions.
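As an illustration of the collaborative filtering idea behind such systems, the sketch below scores a patient's risk for diseases they have not yet been diagnosed with, using the diagnoses of similar patients. It is a minimal example under stated assumptions and not the CARE algorithm itself: the toy patient records, the choice of Jaccard similarity, and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Similarity of two diagnosis sets: shared diseases / all diseases."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_risks(target, others):
    """Score diseases the target patient does not yet have.

    Each other patient votes for their own diseases, weighted by how
    similar their history is to the target's. Scores lie in [0, 1].
    """
    weights = {pid: jaccard(target, hist) for pid, hist in others.items()}
    total = sum(weights.values())
    scores = {}
    if total == 0:
        return scores  # no similar patients, so no prediction
    for pid, hist in others.items():
        for disease in hist - target:
            scores[disease] = scores.get(disease, 0.0) + weights[pid] / total
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical toy records: patient id -> set of diagnosed conditions.
patients = {
    "p1": {"hypertension", "diabetes", "retinopathy"},
    "p2": {"hypertension", "diabetes", "neuropathy"},
    "p3": {"asthma"},
}
new_patient = {"hypertension", "diabetes"}
risks = predict_risks(new_patient, patients)
print(risks)
```

Here the two patients who share the new patient's history contribute equally, while the dissimilar patient contributes nothing. A real system would work from far larger EHR collections and a richer similarity measure.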
Panahiazar et al. (2014) presented the main
challenges from four standpoints: the variety, quality,
volume, and velocity of the data. Alyass et al. (2015)
proposed that personalized medicine may widen the
growing gap in health systems between rich and poor
countries. Moreover, they attributed the slow transition
from conventional to personalized medicine to several
factors: generation of cost-effective high-throughput
data, hybrid education and multidisciplinary teams, data
storage and processing, data integration and
interpretation, and individual and global economic
relevance.

7 Challenges for Mining Big Data in Health Care

7.1 Data mining

Clinical Big Data contains a large amount of
unstructured data such as natural language or other
handwritten data (Jee & Kim, 2013) whose integration,
analysis, and storage bring a certain degree of
difficulty. At the current stage, it is inefficient to
share structured data among agencies, and the sharing of
unstructured data among the same organizations is even
more difficult to achieve. Determining how to
effectively mine a large amount of unstructured data
will continue to be a major challenge (Sejdic, 2014).
One of the characteristics of Big Data is variability in
data sources (Dieringer & Schlotterer, 2003), and
medical data themselves are highly time-sensitive; for
example, personalized medical care has high timeliness
requirements. The medical industry places extreme
demands on data processing speed, especially when the
patient’s condition deteriorates rapidly. In addition,
when using real-time applications such as cloud
computing to access and analyze data, the privacy and
security of patient data are also a challenge (Jee &
Kim, 2013). Cloud computing now offers new possibilities
for mining and sharing medical Big Data. However, there
are also several challenges to be overcome before cloud
computing can become more practical. First, although
cloud computing offers an easy and flexible way to mine
resources, it also increases the risk of privacy
disclosure, a fact that is particularly evident in
fields such as clinical informatics and public health
informatics. Second, in medicine, a large amount of data
(at the petabyte level) often has to be imported to or
exported from the cloud. Network bandwidth constraints
affect the speed of data transmission and also increase
the cost of cloud computing (J. J. Chen, Qian, Yan, &
Shen, 2013). At present, the attention to Big Data
focuses mainly on its accuracy; timely and accurate data
mining is another challenge, which is still in the
initial stage (Abenstein & Tompkins, 1982; Xu et al.).

7.2 Data storage

The current difficulties in data storage are mainly due
to high costs. Medical data costs arise mainly from
three aspects. First, the huge amount of medical data is
one of the sources of storage costs. With the
development of medical information, the medical industry
has produced a large amount of data, ranging from
medical diagnostic images to pathological analysis of
maps. For example, regional medical data are usually
derived from a region with millions of people and
hundreds of medical institutions, and the amount of data
continues to grow. In accordance with the relevant
provisions of the medical industry, a patient’s data
typically need to be retained for more than 50 years.
The data of this patient not only contain a large number
of online or real-time data but also include a variety
of data such as diagnosis and medication recommendations
in CDS, various structured data tables, non-(semi-)
structured text documents, medical images, and other
information. The massive size of the data inevitably
increases the cost and difficulty of storage. Second,
there are costs associated with moving the data from one
place to another as well as analyzing them. Finally, the
types of medical data are diverse, including numerical
data that record various disease tests, as well as
various diagnostic images, records made by doctors and
nurses, and even diagnostic speech, video, and other
unstructured data. Unstructured data are more difficult
to store, analyze, and manipulate than structured data.
They also, to a certain extent, increase the cost of
storage. It is also a challenge to maintain safety and
privacy in the process of storing, extracting, and
downloading patient-related data (Youssef, 2014).

7.3 Data sharing

7.3.1 Limited data standardization and interoperability

The current standards and technologies are inadequate
to meet the requirements of the integrative applications
of health care Big Data. The difficulties are twofold.
First, the data lack uniform standards, a consistent
description format, and consistent presentation methods.
Second, integrating structured, semi-structured, and
unstructured data at different levels is difficult. At
the same time, each database uses different software and
data formats; the latter in particular make data
comparison, analysis, transfer, sharing, and other
processes more difficult (Chawla & Davis, 2013; Mohr,
Burns, Schueller, Clarke, & Klinkman, 2013; W.
Raghupathi & Raghupathi, 2014). Data integration can
also reduce costs. Hillestad et al. (2005) compared
health care with the use of IT in other industries and
estimated that the use of interoperable electronic
medical record (EMR) systems can save $142–371 billion.

7.3.2 Information barriers

The users of medical Big Data cover a wide range, such
as hospital clinics, regional medical centers, medical
insurance companies, drug management analysis units, and
medical equipment monitoring centers. The corresponding
data resources are scattered in different data pools,
including hospital medical records, settlement and cost
data, medical firms’ records, academic medical research
data, residents’ health records collected by regional
health information platforms, and population and public
health data from government surveys. There is not much
connection between these data sets. At the same time,
the data sharing mechanism is imperfect due to the
information barriers among hospitals, scientific
research institutions, and other institutions (Kruse,
Goswamy, Raval, & Marawi, 2016). For example, in China,
medical institutions have limited communication and
sharing with each other as a whole (Rui, Y., 2015). With
the globalization of data, Big Data in health care will
also face varying degrees of language, terminology, and
standardization barriers (Kruse et al., 2016).

7.3.3 Volume of data

The massive volume of health care Big Data, at the
terabyte (TB) level and even the petabyte (PB) level, is
now beyond the capabilities of personal computers and
network file sharing programs, so a new sharing
mechanism is urgently needed (Kruse et al., 2016;
Service).

7.3.4 Insufficient data integration

More data integration is needed. The data have not yet
been fully embedded in business processes and
organizational management practices. For example, in
many cases, patient monitoring data have not yet been
integrated into clinical diagnosis and treatment, and
clinical data have not yet been integrated into public
health services and infectious disease monitoring (Dai,
2016).

7.4 Data privacy

Health care data are more sensitive and centralized
than other types of Big Data. There are significant
concerns regarding confidentiality (Mancini, 2014b; D.
C. Mohr et al., 2013). However, no perfect solution to
the problem of patient data privacy protection has yet
emerged. Patient data leakage may have unpredictable
consequences (including injury, discrimination, and
others). There are many real cases at home and abroad.
Big Data technology exposes personal medical data to a
greater risk. Some people even believe that in the era
of Big Data, protecting personal privacy is impossible
(Schadt, 2012). The problem can be alleviated by special
processing (such as de-identification and digital
identity encryption), but the identification and
de-identification of information still require people or
applications to process

identifiable information that may cause the patient’s
health information to be misappropriated by others
without the patient’s knowledge or authorization
(Rothstein, 2010). Big Data increases the risks to
patient data for two reasons. First is the risk of the
data itself. The data can be copied and preserved
without space and time constraints, and this feature is
characterized by high risk and long-term risk under Big
Data conditions. Second is the risk of Big Data
technology. Under Big Data technology conditions, even
if a Big Database uses anonymous, encrypted personal
data, a residual risk remains that a user’s identity can
be re-identified: personal identities can be
re-determined by data linkage technology, because Big
Data uses pseudonymized personal confidential data that
have been anonymized but retain a residual risk of
re-identification (Ward, 2014). This risk is greater
when different data sets are linked. De-anonymization is
an attack in which anonymous data and other sources of
data are compared in order to re-identify the anonymous
data sources (Yom-Tov, 2016). For example, comparing
voter registration data and hospital discharge data can
determine whether a person is sick. Voter registration
data contain date of birth, sex, zip code, address, date
last voted, name, date registered, and other details.
Hospital discharge data contain date of birth, sex, zip
code, diagnosis, ethnicity, medication, procedure, visit
date, and other information. By comparing the same
fields in the two data sources, such as date of birth,
sex, and zip code, an attacker can determine the
specific source and then determine the subject’s illness
and voting situation. In the example in Table 3, through
the comparison of these two data sources, it is not
difficult to determine that the person whose date of
birth, sex, and zip code are 06/18/90, female, and
77889, respectively, is Angela and that she is suffering
from diabetes.

Table 3
An Example of Data Privacy Breach

Voter registration data (publicly available)          | Hospital discharge data
Name    Sex     Zip code  Date of birth  Address      | Sex     Zip code  Date of birth  Disease
Angela  Female  77889     06/18/90       Arizona      | Female  77889     06/18/90       Diabetes
Harry   Male    83456     02/14/76       California   | Male    66723     07/19/88       Anemia
Harley  Female  76231     09/15/92       Connecticut  | Male    32412     10/01/79       Malnutrition

Also in the future, in order to better achieve
individualized treatment, our individual genomes may
be added to the EHR. The individual genome is private,
and the gene sequence may lead to many privacy-related
issues. Lin et al. (2004) found that “Specifying DNA
sequence at only 30 to 80 statistically independent SNP
positions will uniquely define a single person”. As
such, privacy protection becomes the focus.

7.5 Data technologies and talent

As described in the main characteristics of Big Data,
in terms of data size, Big Data in health care exceeded
150 exabytes after 2011 (Y. C. Wang et al., 2015). A
study showed that data size in health care is estimated
to be around 40 ZB in 2020 (Fig. 1) (O’Driscoll,
Daugelaite, & Sleator, 2013). The complexity of the
data is also growing rapidly, with data diversity, fast
change, low value density, and other complex features
becoming increasingly significant. This complexity
poses a serious challenge to traditional computing and
information technology (Hey, 2012). At present, it is
difficult for a distributed system to provide
availability, consistency, and partition tolerance all
at once. It is also difficult to solve problems such as
health care data collection, real-time processing and
dynamic indexing, and the lack of prior knowledge
(Zhang, Zhou, Du, Luo, & Mei, 2014). Even some widely
used Big Data technologies have their challenges. For
example, Hadoop helps solve the storage problems of
Big Data, reduces the cost of data storage, and
improves the speed of operation. However, Hadoop is
faced with the technical problems of low security and
data that cannot be interconnected (Augustine, 2014;
Jee & Kim, 2013). In addition, promoting the
development of health care Big Data applications needs
human experts who have both clinical and analytic
knowledge (Mavandadi et al., 2012). According to
McKinsey, even in the U.S., the leading information
technology power, the related talent gap will reach
140,000–190,000 people in 2018 (Manyika et al., 2011).
Many of the data technologies today, including Hadoop
and cloud computing, are challenging for many
businesses, especially small firms. The skills required
are in many
cases not simple; they involve data mining, analysis,
manipulation, and other techniques that are too difficult
and expensive for most small firms to master (Jee &
Kim, 2013). At present, only a small number of
companies in the world have mastered the core technology
of Big Data analysis. The world needs more data analysts
who can use information technology to visualize the data
before presenting them to policy makers. Finally, we
also need personnel who have mastered professional
technology management, data processing technology, and
medical data management. They can use the appropriate
management model to make the information infrastructure
a continuous research and application platform, ensure
continuity, and achieve cross-cutting cooperation
(Sepulveda, 2013; Youssef, 2014).

8 Conclusions

Medical research that integrates Big Data will
contribute to a higher level of human health at a broader
and deeper level. This paper summarizes and introduces
the related research of medical data at home and abroad
in recent years. This paper mainly introduces the related
concepts of medical Big Data, the background, and the
main applications, and it introduces several key
technologies related to medical Big Data. In addition,
we summarize and reflect on the opportunities and
challenges in the study of big medical data. In general,
the current research on medical data is not yet mature;
there are many problems that need to be resolved. In
order to take full advantage of the profound patterns
contained in the massive data, Big Data storage,
mining, analysis, and related talent are essential. These
technologies and talents will support research on health
care Big Data and further serve a wide range of medical
applications such as public health, medical care,
medical insurance, and many others.

Acknowledgments: ML wrote sections 1 and 2, RW
wrote sections 3 and 4, LH wrote sections 5 and 6, and PL
wrote sections 7 and 8. WL provided critical suggestions
for the paper. LL designed the paper structure,
integrated all sections, and supervised the paper writing.
We thank Lina Zhou and Ni Wen for assistance in
literature search. This paper is supported in part by The
National Key Research and Development Program of
China (No. 2016YFB1000603), Key Program of the Major
Research Plan of the National Natural Science Foundation
of China (No. 91646206), National Natural Science
Foundation of China grants (Nos. 31601083 and
61772375), and the Recruitment Program of Global
Experts (No. 104413100019).

References

Abenstein, J. P., & Tompkins, W. J. (1982). A new data-reduction algorithm for real-time ECG analysis. IEEE Transactions on Biomedical Engineering, 29(1), 43–48.
Abernethy, A. P., Wheeler, J. L., & Bull, J. (2011). Development of a health information technology-based data system in community-based hospice and palliative care. American Journal of Preventive Medicine, 40(5, Suppl 2), S217–S224.
Agrawal, R., Imieliński, T., & Swami, A. (1993, May). Mining association rules between sets of items in large databases. In B. Peter, & J. Sunshil (Eds.), Proceeding of the ACM SIGMOD Conference on Management of Data (pp. 207-216). Washington, DC: ACM Press.
Aitken, M., & Gauntlett, C. (2013). Patient apps for improved healthcare: from novelty to mainstream. IMS Institute for Healthcare Informatics. Retrieved from https://www.mendeley.com/catalogue/patient-apps-improved-healthcare-novelty-mainstream/
Alyass, A., Turcotte, M., & Meyre, D. (2015). From big data analysis to personalized medicine for all: Challenges and opportunities. BMC Medical Genomics, 8(1), 33.
Anderson, J. E., & Chang, D. C. (2015). Using electronic health records for surgical quality improvement in the era of big data. JAMA Surgery, 150(1), 24-29.
Antonie, M. L., Zaïane, O. R., & Coman, A. (2001). Application of data mining techniques for medical image classification. Proceedings of the Second International Conference on Multimedia Data Mining, 94-101. doi:10.1.1.23.9742
Asante-Korang, A., & Jacobs, J. P. (2016). Big Data and paediatric cardiovascular disease in the era of transparency in healthcare. Cardiology in the Young, 26(8), 1597–1602.
Asri, H., Mousannif, H., Al Moatassime, H., & Noel, T. (2015, June). Big data in healthcare: challenges and opportunities. Proceedings of 2015 International Conference on Cloud Computing Technologies and Applications, Marrakech, Morocco.
Augustine, D. P. (2014). Leveraging big data analytics and Hadoop in developing India’s healthcare services. International Journal of Computers and Applications, 89(16), 44–50.
Azar, A. T., & Hassanien, A. E. (2015). Dimensionality reduction of medical big data using neural-fuzzy classifier. Soft Computing, 19(4), 1115–1127.
Backonja, U., Kim, K., Casper, G. R., Patton, T., Ramly, E., & Brennan, P. F. (2012, June). Observations of daily living: putting the “personal” in personal health records. NI 2012: 11th International Congress on Nursing Informatics, Montreal, Canada.
Bagayoko, C. O., Dufour, J. C., Chaacho, S., Bouhaddou, O., & Fieschi, M. (2010). Open source challenges for hospital information system (HIS) in developing countries: A pilot project in Mali. BMC Medical Informatics and Decision Making, 10(22), 1-13.
Bamidis, P. D. (2010). On the classification of emotional biosignals evoked while viewing affective pictures: An integrated data-mining-based approach for healthcare applications. IEEE Transactions on Information Technology in Biomedicine, 14(2), 309–318.
Bates, D. W., Saria, S., Ohno-Machado, L., Shah, A., & Escobar, G. (2014). Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Affairs, 33(7), 1123–1131.
Belle, A., Thiagarajan, R., Soroushmehr, S. M., Navidi, F., Beard, D. A., & Najarian, K. (2015). Big data analytics in healthcare. BioMed Research International, 2015:370194, 1-16.
Berner, E. S. (2003). Diagnostic decision support systems: How to determine the gold standard? Journal of the American Medical Informatics Association, 10(6), 608–610.
Blaya, J. A., Shin, S. S., Yagui, M. J., Yale, G., Suarez, C. Z., Asencios, L. L., Fraser, H. S. (2007). A web-based laboratory information system to improve quality of care of tuberculosis patients in Peru: Functional requirements, implementation and usage statistics. BMC Medical Informatics and Decision Making, 7(1), 33–43.
Braunstein, M. L. (2015). Health big data and analytics. Practitioner’s Guide to Health Informatics (pp. 133–149). Berlin, Germany: Springer International Publishing.
Celesti, A., Fazio, M., Romano, A., & Villari, M. (2016). A hospital cloud-based archival information system for the efficient management of HL7 big data. 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). Opatija, Croatia.
Centers for Medicare & Medicaid Services (CMS), HHS. (2010). Medicare and Medicaid programs; electronic health record incentive program. Final rule, Federal Register, 75(144), 44313–44588. PMID:20677415
Chawla, N. V., & Davis, D. A. (2013). Bringing big data to personalized healthcare: A patient-centered framework. Journal of General Internal Medicine, 28(3, Suppl 3), S660–S665.
Chen, J., Qian, F., Yan, W., & Shen, B. (2013). Translational biomedical informatics in the cloud: Present and future. BioMed Research International, 2013, 658925. PMID:23586054
Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171–209.
Chen, T. S., Liu, C. H., Chen, T. L., Chen, C. S., Bau, J. G., & Lin, T. C. (2012). Secure dynamic access control scheme of PHR in cloud computing. Journal of Medical Systems, 36(6), 4005–4020.
Chia, C.-C., & Syed, Z. (2011). Computationally generated cardiac biomarkers: Heart rate patterns to predict death following coronary attacks. Proceedings of the 2011 SIAM International Conference on Data Mining, 735-746.
Christopher C. Yang, H. Y., Jiang, L., & Zhang, M. (2009). Social media mining for drug safety signal detection. Proceedings of the 2012 international workshop on Smart health and wellbeing.
Christy, A., Gandhi, G. M., & Vaithyasubramanian, S. (2015). Cluster based outlier detection algorithm for healthcare data. Procedia Computer Science, 50, 209–215.
Cismondi, F., Fialho, A. S., Vieira, S. M., Reti, S. R., Sousa, J. M., & Finkelstein, S. N. (2013). Missing data in medical databases: Impute, delete or classify? Artificial Intelligence in Medicine, 58(1), 63–72.
Collins, B. (2016). Big data and health economics: Strengths, weaknesses, opportunities and threats. PharmacoEconomics, 34(2), 101–106.
Costa, F. F. (2014). Big data in biomedicine. Drug Discovery Today, 19(4), 433–440.
Dai, T. (2016). Health and medical big data development perspective. Journal of Medical Informatics, 37(2), 2–8.
Demidowich, A. P., Lu, K., Tamler, R., & Bloomgarden, Z. (2012). An evaluation of diabetes self-management applications for Android smartphones. Journal of Telemedicine and Telecare, 18(4), 235–238.
DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177–188.
Deserno, T. M., Haak, D., Brandenburg, V., Deserno, V., Classen, C., & Specht, P. (2014). Integrated image data and medical record management for rare disease registries. A general framework and its instantiation to the German Calciphylaxis Registry. Journal of Digital Imaging, 27(6), 702–713.
Dieringer, D., & Schlotterer, C. (2003). Microsatellite analyser (MSA): A platform independent analysis tool for large microsatellite data sets. Molecular Ecology Notes, 3(1), 167–169.
Docherty, A. (2014). Big Data—Ethical perspectives. Anaesthesia, 69(4), 390–391.
Edwards, I. R., & Aronson, J. K. (2000). Adverse drug reactions: Definitions, diagnosis, and management. Lancet, 356(9237), 1255–1259.
Fan, C.-Y., Chang, P.-C., Lin, J.-J., & Hsieh, J. C. (2011). A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification. Applied Soft Computing, 11(1), 632–644.
Feldman, B., Martin, E. M., & Skotnes, T. (2012). Big data in healthcare: Hype and hope. Dr. Bonnie, 2012(1), 122–125.
Fenderson, B. A. (2008). Molecular Biology of the Cell, 5th Edition. Medicine & Science in Sports & Exercise, 40(9), 1709.
Frantzidis, C. A., Bratsas, C., Klados, M. A., Konstantinidis, E., Lithari, C. D., Vivas, A. B., Gardner, R. M., Pryor, T. A., & Warner, H. R. (1999). The HELP hospital information system: Update 1998. International Journal of Medical Informatics, 54(3), 169–182.
Garets, D., & Davis, M. (2007). Electronic medical records vs Electronic health records: Yes, there is a difference. Zhongguo Yiyuan, 11(5), 38–39.
Gunter, T. D., & Terry, N. P. (2005). The emergence of national electronic health record architectures in the United States and Australia: Models, costs, and questions. Journal of Medical Internet Research, 7(1), 13-15.
Han, J., Pei, J., & Yin, Y. (2000, May). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Proceedings of the 2000 ACM SIGMOD international conference on Management of data (pp. 1-12), Texas, USA.
Hassani S, M. H., Qannari E M, et al. (2010). Analysis of -omics data: Graphical interpretation- and validation tools in multi-block methods. Chemometrics and Intelligent Laboratory Systems, 104(1), 140–153.
Hastie, B. A., Riley, J. L., Robinson, M. E., Glover, T., Campbell, C. M., Staud, R., & Fillingim, R. B. (2005). Cluster analysis of multiple experimental pain modalities. Pain, 116(3), 227–237.
Hay, S. I., George, D. B., Moyes, C. L., & Brownstein, J. S. (2013). Big data opportunities for global infectious disease surveillance. PLoS Medicine, 10(4), e1001413.
He, C., Jin, X., Zhao, Z., & Xiang, T. (2010, December). A cloud computing solution for hospital information system. Paper presented at the 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, Xiamen, China.
Heart, T., Ben-Assuli, O., & Shabtai, I. (2017). A review of PHR, EMR and EHR integration: A more personalized healthcare and public health policy. Health Policy and Technology, 6(1), 20–25.
Heimann, T., & Meinzer, H. P. (2009). Statistical shape models for 3D medical image segmentation: A review. Medical Image Analysis, 13(4), 543-563.
Herland, M., Khoshgoftaar, T. M., & Wald, R. (2014). A review of data mining using big data in health informatics. Journal of Big Data, 1(2), 1–35.
Hillestad, R., Bigelow, J., Bower, A., Girosi, F., Meili, R., Scoville, R., & Taylor, R. (2005). Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Affairs, 24(5), 1103-1117.
Hong, C. J., Kaur, M. N., Farrokhyar, F., & Thoma, A. (2015). Accuracy and completeness of electronic medical records obtained from referring physicians in a Hamilton, Ontario, plastic surgery practice: a prospective feasibility study. Plastic Surgery, 23(1), 48.
Hsieh, J. C., Li, A. H., & Yang, C. C. (2013). Mobile, cloud, and big data computing: Contributions, challenges, and new directions in telecardiology. International Journal of Environmental Research and Public Health, 10(11), 6131–6153.
Huang, X. J., & Yao, Y. (2016, August). Multi-dimensions clustering approach for physical health data based on artificial ant colony optimization. Paper presented at the 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China.
Jee, K., & Kim, G. H. (2013). Potentiality of big data in the medical sector: Focus on how to reshape the healthcare system. Healthcare Informatics Research, 19(2), 79–85.
Joshi, K., & Yesha, Y. (2012). Workshop on analytics for big data generated by healthcare and personalized medicine domain. Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, 267-269.
Kanagaraj, G., & Sumathi, A. C. (2011, December). Proposal of an open-source cloud computing system for exchanging medical images of a Hospital Information System. Paper presented at the 3rd International Conference on Trendz in Information Sciences & Computing (TISC2011), Chennai, India.
Kennedy, E. H., Wiitala, W. L., Hayward, R. A., & Sussman, J. B. (2013). Improved cardiovascular risk prediction using nonparametric regression and electronic health record data. Medical Care, 51(3), 251–258.
Kovalev, V., & Kalinovsky, A. (2015). Big Medical Data: Image Mining, Retrieval and Analytics. Paper presented at Big Data and Predictive Analytics, Minsk, Belarus.
Krumholz, H. M. (2014). Big data and new knowledge in medicine: The thinking, training, and tools needed for a learning health system. Health Affairs, 33(7), 1163–1170.
Kruse, C. S., Goswamy, R., Raval, Y., & Marawi, S. (2016). Challenges and opportunities of big data in health care: A systematic review. JMIR Medical Informatics, 4(4), e38.
Kumar, S., & Aldrich, K. (2010). Overcoming barriers to electronic medical record (EMR) implementation in the US healthcare system: A comparative study. Health Informatics Journal, 16(4), 306–318.
Kuo, R., Lin, S., & Shih, C. (2007). Mining association rules through integration of clustering analysis and ant colony system for health insurance database in Taiwan. Expert Systems with Applications, 33(3), 794-808.
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). Big data. The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.
Lin, Z., Owen, A. B., & Altman, R. B. (2004). Genetics: Genomic research and human subject privacy. Science, 305(5681), 183.
Lincoln, M. J. (1998). Applying commonly available expert systems in physician assistant education. Perspective on Physician Assistant Education, 9(3), 144–151.
Lodish, H. (2008). Molecular cell biology. San Francisco, CA: W.H. Freeman and Company.
Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: A literature review. Biomedical Informatics Insights, 8, 1–10.
M, T. T. (2014). Mobile Tech Contributions to Healthcare & Patient Experiences. Retrieved from http://topmobiletrends.com/mobile-technologycontributions-Patient-experience-parmar/
MacRae, J., Darlow, B., McBain, L., Jones, O., Stubbe, M., Turner, N., & Dowell, A. (2015). Accessing primary care Big Data: The development of a software algorithm to explore the rich content of consultation records. BMJ Open, 5(8), e008160.
Mancini, M. (2014). Exploiting big data for improving healthcare services. Journal of e-Learning and Knowledge Society, 10(2), 23-33.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. Retrieved from McKinsey Global Institute website: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
Khan, W. A., Khattak, A. M., Hussain, M., Amin, M. B., Afzal, M., Marx, V. (2013). Biology: The big challenges of big data.
Nugent, C., & Lee, S. (2014). An adaptive semantic based Nature, 498(7453), 255–260.
mediation system for data interoperability among Health Mavandadi, S., Dimitrov, S., Feng, S., Yu, F., Yu, R., Sikora, U., &
Information Systems. Journal of Medical Systems, 38(8), 1-18. Ozcan, A. (2012). Crowd-sourced BioGames: Managing the big
Khoury, M. J., & Ioannidis, J. P. A. (2014). Medicine. Big data meets data problem for next-generation lab-on-a-chip platforms. Lab
public health. The New Zealand Medical Journal, 346(6213), on a Chip, 12(20), 4102–4106.
1054–1055. Mohr, D. C., Burns, M. N., Schueller, S. M., Clarke, G., & Klinkman,
M. (2013). Behavioral intervention technologies: Evidence
Kim, T.-W., Park, K.-H., Yi, S.-H., & Kim, H.-C. (2014). A big data
review and recommendations for future research in
framework for u-Healthcare systems utilizing vital signs.Paper
mental health. General Hospital Psychiatry, 35(4), 332–
presented at 2014 International Symposium on Computer,
338.
Consumer and Control, Taichung, Taiwan.
Moore, P., Xhafa, F., Barolli, L., & Thomas, A. (2013, October).
Monitoring and detection of agitation in dementia: Towards
real-time and big-data solutions. Paper presented at the 2013
Eighth International Conference on P2P, Parallel, Grid, Cloud
data mean for wearable sensor systems? Yearbook of Medical
and Internet Computing, Compiegne, France.
Informatics, 9(1), 135–142.
Naito, M. (2014). Utilization and application of public health
Roberts, E. B. (1985). Health information systems. Clinics
data in descriptive epidemiology. Journal of
in Laboratory Medicine, 23(5), 672–676.
Epidemiology, 24(6), 435–436.
Rothstein, M. A. (2010). Is deidentification sufficient to protect
Nance, J. W., Jr., Meenan, C., & Nagy, P. G. (2013). The future of
health privacy in research? The American Journal of Bioethics,
the radiology information system. AJR. American Journal of
10(9), 3–11.
Roentgenology,200(5), 1064–1070.
Rui, Y. (2015). Medical big data: The next industry windy spot.
Obenshain, M. K. (2004). Application of data mining techniques
Business School,[Chinese], 4, 100-103.
to healthcare data. Infection Control and Hospital
Rumsfeld, J. S., Joynt, K. E., & Maddox, T. M. (2016). Big
Epidemiology, 25(8), 690–695.
data analytics to improve cardiovascular care:
O’Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data’,
Promise and challenges. Nature Reviews. Cardiology,
Hadoop and cloud computing in genomics. Journal of
13(6), 350–359.
Biomedical Informatics,46(5), 774–781.
Safavi, S., & Shukur, Z. (2014). Conceptual privacy framework
Oztekin, A., Delen, D., & Kong, Z. J. (2009). Predicting the graft
for health information on wearable device. PLoS One,
survival for heart-lung transplantation patients: An integrated
9(12), e114306.
data mining methodology. International Journal of Medical
Schadt, E. E.(2012). The changing privacy landscape in the era of big
Informatics, 78(12), e84–e96.
data. Molecular Systems Biology, 8(1), 612.
Páez, D. G., Rodríguez, M. D. B., Sánz, E. P., Villalba, M. T., & Gil,
Sejdić, E. (2014). Medicine: Adapt current tools for handling big
R. M. (2015). Big data processing using wearable devices for
data. Nature, 507(7492), 306.
wellbeing and healthy activities promotion. In I. Cleland, L.
Sepulveda, J. L., & Young, D. S. (2013). The ideal laboratory
Guerrero, & J. Bravo (Eds.), IWAAL: Ambient assisted living.
information system. Archives of Pathology & Laboratory
ICT- based Solutions in Real Life Situations (pp. 196–205).
Medicine, 137(8), 1129–1140.
Cham, Switzerland: Springer.
Sepulveda, M. J.(2013). From worker health to citizen health:
Pai, F. Y., & Huang, K. I. (2011). Applying the technology
Moving upstream. Journal of Occupational and Environmental
acceptance model to the introduction of healthcare
Medicine, 55(12, Suppl), S52–S57.
information systems. Technological Forecasting and Social
Service, R. F.(2013). Biology’s dry future. Science, 342(6155), 186–
Change, 78(4), 650–660.
189.
Panahiazar, M., Taslimitehrani, V., Jadhav, A., & Pathak, J.
Shah, N. H., & Tenenbaum, J. D. (2012). The coming age of data-
(2014, October). Empowering personalized medicine
driven medicine: Translational bioinformatics’ next frontier.
with big data and semantic web technology: Promises,
Journal of the American Medical Informatics Association,
Challenges, and Use Cases. 2014 IEEE International
19(e1), e2–e4.
Conference on Big Data, Washington, DC.
Sheta, O. E., & Eldeen, A. N. (2013). The technology of using a
Paul, R., & Hoque, A. S. M. L. (2010). Clustering medical data to
data warehouse to support decision-making in health care.
predict the likelihood of diseases. 2010 Fifth International
International Journal of Database Management Systems,
Conference on Digital Information Management, 44-49.
5(3),75-86.
Thunder Bay, Canada.
Sirintrapun, S. J., & Artz, D. R. (2016). Health information systems.
Pentland, A., Reid, T., & Heibeck, T. (2013). Big data and health:
Clinics in Laboratory Medicine, 36(1), 133.
Revolutionizing medicine and public health. Report of the Big
Steinbrook, R. (2008). Personally controlled online health data—The
Data andd Health Working Group 2013. Retrieved from http://
next big thing in medical care? The New England Journal of
www.wish-qatar.org/summits/wish-2013/forums-research-
Medicine, 358(16), 1653–1656.
chairs/big-data-healthcare/
Swan, M. (2013). The quantified self: Fundamental disruption in big
Polpitiya, A. D., Qian, W. J., Jaitly, N., Petyuk, V. A., Adkins, J. N.,
data science and biological discovery. Big Data, 1(2), 85–99.
Camp, D. G.,…Smith, R. D. (2008). DAnTE: A statistical tool for
Tan, S. S., Gao, G., & Koch, S. (2015). Big Data and Analytics
quantitative analysis of -omics data. Bioinformatics, 24(13),
in Healthcare. Methods of Information in Medicine, 54(6),
1556–1558.
546–547.
Poulymenopoulou, M., Malamateniou, F., Prentza, A.,
Tang, P. C., Ash, J. S., Bates, D. W., Overhage, J. M., & Sands, D.
&Vassilacopous, G. (2015). Challenges of evolving PINCLOUD
Z. (2006). Personal health records: Definitions, benefits,
PHR into a PHR-based health analytics system. Paper presented
and strategies for overcoming barriers to adoption.
at the Proceedings of the European, Mdediterranean & Middle
Journal of the American Medical Informatics Association,
Eastern Conference on Information Systems EMCIS.
13(2), 121–126.
Preen, D. B., Holman, C. D., Spilsbury, K., Semmens, J. B., &
Taverner, T., Karpievitch, Y. V., Polpitiya, A. D., Brown, J. N., Dabney,
Brameld, K. J. (2006). Length of comorbidity lookback
A. R., Anderson, G. A., & Smith, R. D. (2012). DanteR: An
period affected regression model performance of
extensible R-based tool for quantitative analysis of -omics data.
administrative health data. Journal of Clinical
Bioinformatics (Oxford, England), 28(18), 2404–2406.
Epidemiology,59(9), 940–946.
Tola, K., Abebe, H., Gebremariam, Y., & Jikamo, B. (2017).
Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in
Improving Completeness of Inpatient Medical Records in
healthcare: Promise and potential. Health Information
Menelik II Referral Hospital, Addis Ababa, Ethiopia.
Science and Systems, 2(1), 3.
Advances in Public Health, 2017, 1–5.
Redmond, S. J., Lovell, N. H., Yang, G. Z., Horsch, A., Lukowicz,
Tony, H., Stewart, T., & Kristin, T. (2012). The fourth paradigm: Data
P., Murrugarra, L., & Marschollek, M. (2014). What does
-intensive scientific discover. Berlin, Germany : Springer-
big
Verlag Berlin Heidelberg.
Tsumoto, S., Hirano, S., & Iwata, H. (2013). Mining nursing care plan
Zhang, Z., Zhou, Y., Du, S. H., Luo, X. Q., & Mei, T. (2014). Medical
from data extracted from hospital information system. Paper
big data and the facing opportunities and challenge. Journal of
presented at the 2013 IEEE/ACM International Conference on
Medical Informatics, 6, 2–8.
Advances in Social Networks Analysis and Mining, Niagara
Falls, ON, Canada.
Usami, Y., Cho, H. C., Okazaki, N., & Tsujii, J. I. (2011). Automatic
acquisition of huge training data for bio-medical named entity
recognition. Proceedings of BioNLP 2011 Workshop 5, 65-73.
Valdes, I., Kibbe, D. C., Tolleson, G., Kunik, M. E., & Petersen, L. A.
(2004). Barriers to proliferation of electronic medical records.
Journal of Innovation in Health Informatics, 12(1), 3–9.
Vesna, V. (2000). The Visible Human Project: Informatic bodies and
posthuman medicine. AI & Society, 14(2), 262–263.
Wang, L., & Alexander, C. A. (2013). Applications of automated
identification technology in EHR/EMR. International Journal of
Public Health Science, 2(3), 109–122.
Wang, Y., Kung, L., Ting, C., & Byrd, T. A. (2015). Beyond a
technical perspective: Understanding big data capabilities in
health care. Proceedings of 48th Annual Hawaii International
Conference on System Sciences 48( pp.3044-3053). Hawaii,
USA.
Ward, J. C. (2014). Oncology reimbursement in the era of
personalized medicine and big data. Journal of Oncology
Practice 10(2), 83–86.
White, S. E. (2013). De-identification and the sharing of big
data. Journal of American Health Information
Management Association, 84(4), 44–47.
Wilson, A. M., Thabane, L., & Holbrook, A. (2004). Application
of data mining techniques in pharmacovigilance. British
Journal of Clinical Pharmacology, 57(2), 127–134.
Windridge, D., & Bober, M. (2014). A kernel-based framework for
medical big-data analytics. In A. Holzinger & I. Jursica (Eds.),
Interactive knowledge discovery and data mining in
biomedical informatics (pp. 197-208). Berlin, Germany:
Springer-Verlag.
Wu, P. Y., Cheng, C. W., Kaddi, C. D., Venugopalan, J., Hoffman,
R., & Wang, M. D. (2017). –Omic and electronic health
record big data analytics for precision medicine. IEEE
Transactions on Biomedical Engineering, 64(2), 263–273.
Xiang, W., Wang, G., Pickering, M. & Zhang, Y. (2016). Big video
data for light-field-based 3D telemedicine. IEEE Network,
30(3), 30–38.
Xu, J., Wise, C., Varma, V., Fang, H., Ning, B., Hong, H., Kaput,
J. (2010). Two new Array Track libraries for personalized
biomedical research. BMC Bioinformatics, 11(Suppl 6),
S6.
Yan, Y., Qin, X., Fan, J., & Wang, L. (2014). A review on healthcare
big data research. E-Science Technology & Application,
[Chinese], 5(6), 3-16.
Yom-Tov, E. (2016). Crowdsourced health: How what you do on the
Internet will improve medicine. Cambridge, MA: Mit Press.
Youssef, A. E. (2014). A framework for secure healthcare systems
based on big data analytics in mobile cloud computing
environments. The International Journal of Ambient Systems
and Applications, 2(2), 1-11.
Yuen-Reed, G., & Mojsilović, A. (2016). The role of big data and
analytics in health payer transformation to consumer-centricity.
In C. Weaver, M. Ball, G. Kim & J. Kiel (Eds.), Healthcare
information management systems (pp. 399–420). Switzerland:
Springer.
Zhang, D. Q., & Chen, S. C. (2004). A novel kernelized fuzzy c-means
algorithm with application in medical image segmentation.
Artificial Intelligence in Medicine, 32(1), 37–50.
