Big Data in Health Care: Applications and Challenges
Journal 2018;
xyz 2017; 1 (2): AoP
122–135
2 Major Types and Sources of Big Data in Health Care

Health care has become an important issue in developed countries and middle-income countries (Kyoungyoung Jee & Gang Hoon Kim, 2013). Big Data in health care can be classified into four main types based on the data sources, i.e., Big Data in medicine, also named as medical/clinical Big Data; Big Data in public health and behavior; Big Data in medical experiments; and Big Data in medical literature. Table 1 summarizes the information of major data types.

organizations (Heart, Ben-Assuli, & Shabtai, 2017; Joshi & Yesha, 2012; L. Wang & Alexander, 2013) and is the whole-life record of a patient from birth to death stored in the medical institution, while EMR is the complete record of patient’s disease stored in the hospital; EHR focuses on health management of residents, while EMR focuses on clinical diagnosis of patients; EHR also contains data of demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics, and billing information (M, 2014); EMR is the record of care delivery organization (CDO) and belongs to CDO, while EHR is the subset of CDO and belongs to the patients or stakeholders (Garets & Davis, 2007). EHRs are adopted by many countries, generating about 500 petabytes of data in
Table 1
Summary of Major Data Types of Big Data in Health Care
Personal health record (PHR) | As its name suggests, it is the health-related data and information of patients (Tang, Ash, Bates, Overhage, & Sands, 2006) and about people’s lifelong health information. It is available for further use (Chen et al., 2012) | Allergies and adverse drug reactions, chronic diseases, family history, illnesses and hospitalizations, imaging reports, laboratory test results, medications and dosing, prescription record, surgeries and other procedures, vaccinations and observations of daily living, and reported by patients (Rumsfeld, Joynt, & Maddox, 2016) | Cloud computing, Health Insurance Portability and Accountability Act (HIPAA), and HL7 (Chen et al., 2012); stored in paper like printed laboratory reports, copies of clinic notes, and health histories created by the individual; electronic devices such as personal computer-based software, CD, DVD, and smart card; web applications such as HealthVault and PatientsLikeMe; and cloud servers (Chen et al., 2012)
Medical images | Data that present visual information of interior human body | X-ray, CT, histology, positron-emission tomography (PET), radiography, MRI, nuclear medicine, elastography, tactile imaging, photoacoustic imaging, echocardiography (Kovalev & Kalinovsky, 2015), ultrasonography, angiography | Statistical shape models (SSMs), medial models, clustering, active appearance models (AAMs), active shape models (ASMs) (Heimann & Meinzer, 2009), image segmentation algorithm, fuzzy C-means (FCM) algorithm (Zhang & Chen, 2004), image registration, picture archiving and communication systems, Super PACS (Picture Archiving and Communication Systems), RIS, and digital image communication in medicine (DICOM) (Luo, Wu, Gopukumar, & Zhao, 2016)
Liang Hong et al.
Table 1 (continued)
Summary of Major Data Types of Big Data in Health Care
Vitals | Mainly refer to four signs (temperature, pulse, respiratory rate, and blood pressure) and other physiological data outside the health-care setting (Rumsfeld et al., 2016) | Temperature, pulse, respiratory rate, and blood pressure | Mobile technology, portable equipment, wearable system, and advanced devices like smartphones with third-party applications (HealthKit from Apple, Google Fit from Google, and S Health from Samsung), Android watches and Google glasses (Safavi & Shukur, 2014), and medical devices like implantable cardioverter–defibrillators (Rumsfeld et al., 2016)
-omics data | Biology information data in molecular-level catalog (Skotnes, 2012). Reflects characteristics of individual for treatment (Rumsfeld et al., 2016) | Genomics, transcriptomics – whole genome sequencing, RNA seq, metabolomics – Nuclear Magnetic Resonance (NMR), mass spectrometry, proteomics – mass spectrometry, methylomics – pyrosequencing, and ChIP-on-chip | Data analysis tool extension (DAnTE) and DanteR
Big Data in medical experiment
Human body samples | Data and samples of cells, tissues, and organs in human body (Bagayoko, Dufour, Chaacho, Bouhaddou, & Fieschi, 2010) | Cells, tissues, and organs | Mayo Clinic Biobanks (http://specimencentral.com/biobank-directory/)
Table 1 (continued)
Summary of Major Data Types of Big Data in Health Care
Classification of
Diseases 10th
revision (ICD-10)
2012, which is expected to reach 25,000 petabytes by 2020 (Feldman, Martin, & Skotnes, 2012). PHR comes from a variety of patient health and social information; the main role of it is as a data source for medical analysis and clinical decision support (Poulymenopoulou et al., 2015). It includes data of allergies and adverse drug reactions (ADRs), chronic diseases, family history, illnesses and hospitalizations, imaging reports, laboratory test results, medications and dosing, prescription records, surgeries and other procedures, vaccinations, and observations of daily living (ODLs). Unlike other document or text data, medical imaging mainly comes from X-ray, CT, histology, PET, radiography, magnetic resonance imaging (MRI), nuclear medicine, ultrasound, elastography, tactile imaging, photoacoustic imaging, echocardiography, and so on. It contains visual elements, and this means that data are usually very large (Kovalev & Kalinovsky, 2015).

such as smartphones with third-party applications (HealthKit from Apple, Google Fit from Google, and S Health from Samsung), Android watches, and Google Glasses have been developed with sensors in the health care area (Safavi & Shukur, 2014). Since people have become more concerned with their own health on a day-to-day basis, ODLs have come to play a key role in recording personal daily health and behavior, signs, and symptoms of patients (Backonja et al., 2012). Additionally, data of sports and diet of people also contribute significantly to Big Data in public health and behavior. In the Apple iTunes store alone, there are more than 40,000 health care apps available (Aitken & Gauntlett, 2013). In 2017, it is predicted that more than 1.7 billion people will have downloaded health care apps.

In terms of infectious diseases in public health, there is a well-known case in which Google successfully predicted the time and scale of an influenza by analyzing the search engine results.
samples of cells, tissues, and organs in human body, as well as cross-sectional photographs of the human body in the visible human project, which is used to visualize anatomy of human body in support of medical activities (Vesna, 2000). Similar to human body data sets, biological laboratory specimen also comes from sampling of human body and it is stored in biorepository. When a new drug, novel vaccine, or new medical device has been created, clinical trials should be carried out before it comes into use. Clinical trial, a kind of experiment or observation in medical or clinical research, is a procedure of evaluating the effectiveness of new medical treatment through study on human volunteers (DerSimonian & Laird, 1986). Gene sequencing, mainly referring to DNA sequencing, is a medical research activity of obtaining precise order of nucleotides within DNA. This process results in a large amount of data for recording DNA sequences. Medical research is often performed by researchers in universities, research institutions, and industry. The objective of their work is to make breakthrough in cellular, molecular, and physiological mechanisms in human for health care; fundamental parts of it also include molecular biology, medical genetics, immunology, neuroscience and psychology (Obenshain, 2004). Omics data are the biology information data in the molecular level catalog, which include genomics, proteomics, metabolomics, transcriptomics, epigenomics, lipidomics, immunomics, glycomics, and RNomics (Wu et al., 2017).

2.4 Big Data in medical literature

As the medical/clinical area has developed, currently, research articles as well as the structured knowledge are produced at a high speed. Additionally, there are also many older materials in the medical/clinical area. This literature makes a significant contribution to Big Data in health care.

2.5 Hospital information system (HIS) and its evolution

Technology for Big Data storage and processing like the Cassandra database has been applied; the main characteristic of this tool is that it can accommodate about two million columns in one row, making it more convenient to deal with large volumes of data (Kyoungyoung Jee & Gang Hoon Kim, 2013). In Big Data, including those in health care, one of the most popular processing tools, Hadoop, created by Apache, uses the concept of distribution to handle tremendous volumes of data (Asante-Korang & Jacobs, 2016; Kyoungyoung Jee & Gang Hoon Kim, 2013). In terms of data management, data warehouses are used for supporting decision-making, online transaction processing (OLTP), and online analysis processing (OLAP) (Sheta & Eldeen, 2013). In addition, machine learning in data mining seems to be the most popular technological approach in Big Data analysis, and some technologies such as retrieval, web mining, decision tree, support vector machines (SVMs), clustering, neural network, network analysis, knowledge maps, and Natural Language Processing (NLP) and Multi-Layer Perceptron (MLP) approaches have been used. For instance, named-entity recognition is one of the most important techniques in BioNLP, used in recognizing particular entity processes such as gene normalization and event extraction (Usami, Cho, Okazaki, & Tsujii, 2011). Various techniques for –omics data analysis, such as amplified fragment length polymorphism (AFLP) for DNA fingerprinting and interpretation, validation tools for –omics data (Hassani S, 2010), and statistical tools data analysis tool extension (DAnTE) and data analysis tool extension R (DanteR) for –omics data analysis have emerged with different usages (Polpitiya et al., 2008; Taverner et al., 2012). In addition to the techniques in data processing, techniques for health care data have progressed in HISs. For example, a typical system is developed for data collection, data management, and data sharing in Hospital Information System (HIS) (Abernethy, Wheeler, & Bull, 2011). Currently, new technologies and new models have been found to be effective for structured and unstructured Big Data in health care. Data mining, as well as NLP, has been incorporated in the Big Data platform to handle complex scientific research-oriented problems.

As a sociotechnical subsystem, HIS is commonly featured in presenting quality community for historical data resource, information, and knowledge in health care for hospital administration and patient health care (Bagayoko & Dufour, 2010; Kanagaraj & Sumathi, 2011; Roberts, 1985; Tsumoto et al., 2013) (Table 2). HIS was developed only for administrative management usage in the early 1960s and gradually expanded to information management after 1970 (Pai & Huang, 2011). Broadly speaking, there are many types of HIS. For instance, PACS, short for picture archiving and communication systems, is a common HIS for storing and transferring digital images (Joshi & Yesha, 2012). Additionally, laboratory information system (LIS), radiology information system (RIS), ultrasound information system (UIS), and EHR system, EMR system and PHR system are also included (He, Jin, Zhao, & Xiang, 2010; Joshi & Yesha, 2012). In
Table 2
Systems for Acquiring Medical/Clinical Big Data
System | Description
HIS | Hospital information system; the system provides quality community for historical data resource, information, and knowledge in healthcare for hospital administration and patient health care (Bagayoko et al., 2010; Kanagaraj & Sumathi, 2011; Sirintrapun & Artz, 2016; Tsumoto, Hirano, & Iwata, 2013)
LIS | Laboratory information system; often used to collect, restore, archive, process, extract, and analyze data in laboratory; this system aims to improve efficiency of turn-around-times (TAT) of records, quality of resource utilization, and public health supporting (Blaya et al., 2007; Sepulveda & Young, 2013)
RIS | Radiology information system; it is used to capture and store data including images, demographic and clinical information, and so on, also assisting in patient registration, report repository, and physician directory with advanced technology (Nance, Meenan, & Nagy, 2013)
PACS (super sound PACS, endoscope PACS) | Picture archiving and communication systems; it is a common HIS for storage and transferring of digital images (Joshi & Yesha, 2012)
EMR | EMR system is used to maintain medical records and store, process, and retrieve information. Its aim is to ensure accuracy of information in order to provide patient control and transparency, interdepartmental communication, and great reporting capabilities for treatment (Kumar & Aldrich, 2010)
Cost accounting | System for collecting, recording, classifying, analyzing, summarizing, allocating, and evaluating financial cost in the medical area
terms of handling HL7 format data, the open archive information system model was applied (Celesti, Fazio, Romano, & Villari, 2016). HIS presents the ability to capture, store, and process health care data and often requires a large number of techniques to assist it. In other words, one of the major research challenges is how to integrate advanced techniques of information processing into HIS (Roberts, 1985). Cloud computing, a technique for data storage and sharing, is widely used in information system. The use of cloud computing in HIS is well known and very common for data processing, data backup, and information sharing between different organizations, such as cloud-based PACS and cloud-based EHR systems (He et al., 2010; Joshi & Yesha, 2012; Kanagaraj & Sumathi, 2011). Cloud security requires a high-quality security management platform that covers many aspects, including data security, application security, system security, network security, and physical security. Additionally, novel techniques have been proposed to improve the quality of HIS. For example, in order to achieve data-level interoperability, an adaptive AdapteR Interoperability ENgine (ARIEN) mediation system was proposed (Khan et al., 2014) for HIS with different health care standards. Open-source software is also available for supporting Hospital Information System (HIS) development. According to Bagayoko & Dufour (2010), web infrastructure, server operation systems, developer tools, and databases are commonly used in Europe and North America.

3 Unique Features of Big Data in Health Care

In addition to the “5V” features of Big Data, Big Data in health care has its own unique features, such as heterogeneity, incompleteness, timeliness and longevity, data privacy, and ownership.

3.1 Heterogeneity

Big Data in health care often has incompatible formats, which can be classified into structured and unstructured data. For example, some EHR collect data in structured formats and International Classification of Diseases 10th revision (ICD-10) are structured (Asante-Korang & Jacobs, 2016). However, the majority of Big Data in health care is
unstructured, including data from CT, MRI, X-ray, Holter monitoring, angiography, and laboratories (Swan, 2013).

The sources of the Big Data in health care can be classified into four categories (Table 1). There is a shortage of tools to analyze the information from these heterogeneous sources. A German calciphylaxis registry proposed a framework and developed a tool to integrate medical record, imaging data, and signal data for the purpose of improving knowledge of rare diseases (Deserno et al., 2014). Windridge and Bober (2014) proposed a kernel-based framework to analyze heterogeneous data in the medical domain, which addressed the missing data problem presented by patients with sparse or absent data modalities. Using the kernel method, regression and classification of heterogeneous medical information can be achieved. Cismondi et al. (2013) developed a classifier to determine which missing data of ICUs should be imputed and which should not be. Through a simulated test bed, the performance of this method is improved compared with that of the previous work.

3.2 Incompleteness

To the extent that the data created by monitoring devices consist of continuous data streams, such as electrocardiogram, it is difficult to consistently save it in the longitudinal record (Clemens Scott Kruse, Rishi Goswamy, Yesha Raval, & Sarah Marawi, 2016). It is too expensive to store all the Big Data in health care, a situation that leads to data incompleteness. Additionally, the EHR requires doctors or nurses to record disease information of patients, such as medications and allergies, and this process may also lead to data incompleteness (Hong, Kaur, Farrokhyar, & Thoma, 2015). In Menelik II Referral Hospital, inpatient medical record completeness was 73%, which is low against the standard. Medical records not only support direct patient care but also support clinical audit, epidemiology, medical research, and resource allocation. Improving the completeness of medical records is important to improve the quality of health care (Tola, Abebe, Gebremariam, & Jikamo, 2017).

3.3 Timeliness and longevity

(SPECT) images, MRI, and EEG are a function of time and thus have a strong timeliness. Keeping medical/health information current is a major challenge for Big Data in health care analytics, and HIS should maximize the timeliness of data. At the same time, storage time of medical records is different among hospitals. For some familial or genetic diseases, it is useful to know the family history in order to support medical decision-making. To this point, there is no link between one’s medical records with those of his/her family members.

3.4 Data privacy

Owing to the sensitivity of health care data, there are significant concerns regarding privacy and security (Clemens Scott Kruse et al., 2016; Naito, 2014). Extreme care should be taken to protect patient privacy, and privacy concerns pose limitations in linking external data to individual insured data, which may improve consumer health-related experience and personalize service and care (Yuen-Reed & Mojsilović, 2016). Because of the centralization of much health care information, the data are highly vulnerable to attacks (Mohr, Burns, Schueller, Clarke, & Klinkman, 2013). Owing to privacy issues, Herland et al. (2014) used synthesized EMR/EHR and PHRs with help from a medical professional to conduct their research. Health care mobile phone applications, such as Google Health, promise consumers “complete control over your data,” meaning that personal information will not be sold or shared without the consumer’s explicit permission (Steinbrook, 2008). In different countries, there are two patterns of policies and regulations to protect the data in health care. In one pattern, based on the basic privacy laws, governments pass additional laws, policies, and regulations to protect personal health care information, such as HIPAA in the US, Health Records and Information Privacy Act 2002 in Australia, and Medical Privacy Act and Healthcare Insurance Act in France. In the other pattern, taking personal health care information as part of personal information or sensitive information, governments pass laws to protect personal information or sensitive information, such as the Data Protection Act in England and the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada.
hospitals, physicians, laboratories, clinics, pharmacies, and government agencies in innumerable, incompatible data silos, consumers may lack access to and control over their own health care data. To solve this problem, the cooperative, which is an old and successful form of corporation that is entirely owned by citizens, is an effective approach. Each consumer has one account that stores and manages all health care data. They can share subsets or all the data for research purposes (Pentland, Reid, & Heibeck, 2013).

4 Importance of Big Data in Health Care

It is important to extract valuable information and discard useless fragments from Big Data. As the main issue for this discussion, Big Data in health care could produce considerable economic benefits with the application of Big Data analytics (BDA). For example, a significant amount of money could be saved in the health care industry (Asante-Korang & Jacobs, 2016). Additionally, it would be applied in clinical diagnosis, medical research, hospital management, and fundamental demand in medicine. Through the use of Big Data techniques, patients may have personalized medicine and patient-centric care. This argument supposes that Big Data would help to provide novel approaches to deal with issues in health care (C. S. Kruse, R Goswamy, Y Raval, & S Marawi, 2016).

4.1 The perspective of the research institution and the hospital

Research institutions could better understand the mechanisms and effects of newly developed drugs through BDA. For example, it could also reprocess cancer data to hunt for new cancer drugs (Marx, 2013). Through using statistical tools and algorithms, researchers could improve the clinical trial design and reduce trial failures (Wullianallur Raghupathi & Raghupathi, 2014).

Physicians could use clinical decision support systems (CDSS) with BDA to make more informed decisions, which may improve the quality of patient care (K. Jee & G.-H. Kim, 2013; Kim, Park, Yi, & Kim, 2014). Allowing Big Data to influence clinical decision-making, new practices, and treatment guidelines within clinical research may be integrated and lead to an optimized result. BDA and computer-aided diagnostics may be used to save time in cancer detection, reducing the false-positive rate of cancer diagnosis (Costa, 2014). Now in the cardiology area, computing and Big Data technology enable cardiologists to read patients’ medical record via smartphones, which are helpful in identifying emergency cases in need of immediate treatment (Hsieh, Li, & Yang, 2013).

4.2 The perspective of the government or the public

BDA could reduce costs in the medical domain, estimated at approximately 8% of national health care expenditures for the US government (Manyika et al., 2011). In Italy, by exploiting the admissions for “laparoscopic appendectomy” surgery in different sanitary districts, it was possible to categorize districts based on cost efficiency and timeliness by using the number of admissions and the average days of hospitalization. This data analysis provides an automatic and continuous monitoring of the sanitary districts. The results of this data analysis provide useful insights into reducing cost and increasing the effectiveness and efficacy of health care services (Mancini, 2014a).

BDA could help governments prevent the spread of infectious diseases. In Pakistan, BDA with smartphone technology helped in detection and prevention of the early stage of the dengue fever epidemics. The method was also used to detect outbreaks of flu epidemics in the US (Pentland et al., 2013). Governments can thus respond more quickly to epidemics and help people avoid the disease.

BDA has the potential to reveal regional health problems. For example, Duke University led a project that involved building an integrated clinical data warehouse by combining millions of patient records from their EHRs with geographic information system data (Braunstein, 2015). Based on the combined data, this project reveals the social determinants of health.

4.3 The perspective of patients and their relatives

Using health care mobile phone applications and other online health-related websites, patients can store, retrieve, manage, and share their health data. Over the long term, this process will improve health care and decrease costs, especially for patients who have complicated chronic conditions (Steinbrook, 2008), such as diabetes. Some diabetes applications offer a variety of functions, including medication or insulin logs, self-monitoring
blood glucose recording, and prandial insulin dose calculators (Demidowich, Lu, Tamler, & Bloomgarden, 2012), and others integrate health care providers who can access the patients’ records and formulate personalized feedback. Thus, patients can take the right treatments and live healthier, more comfortable lives (Asri, Mousannif, Al Moatassime, & Noel, 2015).

Through Big Data techniques, patients may have personalized medicine and patient-centric care (Chawla & Davis, 2013; Collins, 2016). Chawla and Davis (2013) constructed a framework called the Collaborative Assessment and Recommendation Engine (CARE) for patient-centered disease prediction and management. It can generate personalized disease predictions and management plans. In addition, through BDA, three drugs have been identified and used in specific groups of cancer patients: dabrafenib is used to treat melanoma with the BRAF mutation V600E; trastuzumab, a targeted therapy, is used to treat breast cancer with amplification or overexpression of the gene encoding Her2/Neu; and imatinib is used to treat different types of tumor that contain the fusion protein BCR-ABL (Costa, 2014).

Through BDA, patients may have their diseases detected earlier, receive treatment earlier, and have better outcomes (K. Jee & G.-H. Kim, 2013; Kim et al., 2014). In daily life, BDA can help patients and their relatives monitor their respective conditions.

promotion services. New Zealand is in a strong position to analyze patterns of childhood morbidity due to universal enrollment with a primary care provider at birth. However, analyzing morbidity patterns within these extracted data is problematic because primary care practices do not consistently or frequently use diagnostic labeling and there is marked variability between clinicians and conditions. A study conducted by MacRae et al. (2015) aimed to extend the use of Pattern Recognition Over Standard Aesculapian Information Collections (PROSAIC) to identify childhood respiratory conditions within primary care consultations by building an algorithm to classify the unstructured clinical narrative written by clinicians. Three independent sets of 1,200 child consultation records were randomly extracted from a data set of all general practitioner consultations in participating practices between January 1, 2008, and December 31, 2013, for children younger than 18 years of age (n=754,242). Each consultation record within these sets was independently classified by two expert clinicians as respiratory or non-respiratory and subclassified according to respiratory diagnostic categories to create three “gold standard” sets of classified records. These three “gold standard” record sets were used to train, test, and validate the algorithm. Then, sensitivity, specificity, positive predictive value, and F-measure were calculated to illustrate the algorithm’s ability to replicate judgments of expert clinicians within the 1,200 record “gold standard” validation set. This algorithm that uses primary care Big Data can accurately classify the content
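
The four evaluation measures named in the validation step (sensitivity, specificity, positive predictive value, and F-measure) all follow from a 2×2 confusion matrix of classifier output against the expert "gold standard" labels. The following is a minimal sketch under stated assumptions, not the PROSAIC implementation: the label lists are made-up toy data (1 = respiratory, 0 = non-respiratory), and the names `ConfusionCounts`, `count_outcomes`, and `classification_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # gold positive, predicted positive
    fp: int  # gold negative, predicted positive
    tn: int  # gold negative, predicted negative
    fn: int  # gold positive, predicted negative


def count_outcomes(gold, predicted):
    """Tally the 2x2 confusion matrix from paired binary labels."""
    pairs = list(zip(gold, predicted))
    return ConfusionCounts(
        tp=sum(1 for g, p in pairs if g and p),
        fp=sum(1 for g, p in pairs if not g and p),
        tn=sum(1 for g, p in pairs if not g and not p),
        fn=sum(1 for g, p in pairs if g and not p),
    )


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c):
    """Compute the four measures from confusion-matrix counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall on true positives
    specificity = _ratio(c.tn, c.tn + c.fp)
    ppv = _ratio(c.tp, c.tp + c.fp)          # positive predictive value
    # F-measure: harmonic mean of PPV (precision) and sensitivity (recall).
    f_measure = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f_measure": f_measure,
    }


# Toy labels only; they are not taken from the study.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = count_outcomes(gold, predicted)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In practice these measures would be computed on the held-out validation set only, so that they reflect agreement with the expert clinicians on records the algorithm has not seen during training.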
breast cancer diagnosis and liver disorder diagnosis. In than 1,500 patients were analyzed), with a sensitivity
this paper, they introduced the method and algorithm of and specificity close to 90%, which are considerably
of a case-based fuzzy decision tree (FDT) model for better than those predicted by human experts.
medical classification problems. Two medical data sets
including liver disorders and Breast Cancer Wisconsin
are selected from University of California Irvine (UCI) 5.2 Clustering
database. More than 900 data sets are used to conduct
this experiment. Decision tree induction is free from Clustering is the task of grouping a set of objects in such
parametric assumptions, and it generates a reasonable a way that objects in the same cluster are more similar
tree by progressively selecting attributes to branch the to each other than those in other clusters. Clustering
tree. By combining all kinds of medical features of liver techniques are widely used for exploratory data analysis,
disorders and Breast Cancer Wisconsin database, this with applications including patient segmentation, outlier
research applies an FDT to develop a forecasting model health care data detection, disease prediction, and
for generating decision rules in disease classification. clustering of patients.
This classification model integrates a data clustering Elbattah & Molloy (2017) employed clustering in order
technique, an FDT, and genetic algorithms (GAs) to to realize the segmentation of patients from a data-driven
construct a medical classification system based on viewpoint. The Irish Hip Fracture Database (IHFD) is
medical database. It can be divided into four major steps: the primary source of data used in the study. Its records
(1) screening medical database from UCI data set; (2) contain ample information about patients’ journeys from
clustering case library into smaller cases; (3) establishing admission to discharge. Then, a set of data pre-processing
FDT; and finally (4) outputting the classification results. procedures are conducted for two purposes: (1) dealing
Clinical data usually contain numerous features with data anomalies and (2) extraction of additional
with small sample size, leading to degradation in the accuracy and efficiency of the system through the curse of dimensionality. This degrades the classifier system’s performance on high-dimensional data sets because irrelevant features not only lead to insufficient classification accuracy but also add extra difficulties in finding potentially useful knowledge. Azar and Hassanien (2015) presented a linguistic hedges neuro-fuzzy classifier with selected features (LHNFCSF) for dimensionality reduction, feature selection, and classification. The new classifier is compared with other classifiers on different classification problems. All data sets are in the public domain: the breast cancer Wisconsin diagnostic, breast cancer Wisconsin prognostic, erythemato-squamous disease, and thyroid disease data sets, obtained from the well-known UCI machine learning repository. The results indicate that applying LHNFCSF not only reduces the dimensions of the problem but also improves classification performance by discarding redundant, noise-corrupted, or unimportant features. The results strongly suggest that the proposed method not only helps reduce the dimensionality of large data sets but can also speed up the computation time of a learning algorithm and simplify the classification tasks.

Estella et al. (2012) designed an advanced system for autonomously classifying brain MRI images of neurodegenerative diseases, with the main purpose of assisting in decision-making in classification tasks. The method was tested on data from a large database (more

features that are considered as indicators of care quality. In this paper, the authors use the k-means algorithm as the partitioned clustering approach. The k-means clustering uses a simple iterative technique to group points in a data set into clusters that contain similar characteristics.

Christy et al. (2015) proposed two outlier detection algorithms: distance-based outlier detection and cluster-based outlier detection. The main purpose of the algorithms was to remove outliers that are irrelevant or only weakly relevant to the analysis of health care data. Experimental evaluation based on the metrics of F-score and likelihood ratio shows that the cluster-based outlier detection method outperforms the distance-based outlier detection method.

Huang and Yao (2016) proposed a novel clustering approach for multidimensional physical health data based on artificial ant colony optimization. Testing showed this method to be an effective and efficient approach to clustering health and medical data for further analysis.

Paul and Hoque (2010) proposed using background knowledge of the medical domain in the clustering process to predict the likelihood of diseases. The developed algorithm can handle both continuous and discrete data and perform clustering based on anticipated likelihood attributes with core attributes of disease in data point. In this paper, its effectiveness has been demonstrated by testing it on a real-world patient data set.
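The iterative grouping that k-means performs, as described earlier in this section, can be sketched as follows. This is a minimal illustration and not the implementation used in any of the cited studies: the function name `kmeans`, the Euclidean distance, the random seed, and the toy (age, systolic blood pressure) values are all assumptions made for the example.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Group `points` (tuples of numbers) into k clusters with Lloyd's iteration."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k data points chosen at random
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical toy records: (age in years, systolic blood pressure in mmHg).
patients = [(25, 118), (31, 121), (28, 115), (67, 150), (72, 158), (70, 149)]
labels, centroids = kmeans(patients, k=2)
print(labels)
```

On this toy input the three younger and the three older records end up in separate clusters. In practice the features would usually be standardized first, since k-means is sensitive to the scale of each variable.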
Hastie et al. (2005) conducted a test in which 188 individuals (59.0% female) completed several psychological instruments and underwent ischemic, pressure, and thermal pain assessments. Then, 13 separate pain measures were obtained by using three experimental pain modalities with several parameters tested within each modality. Cluster analyses of PSI scores revealed four distinct clusters, and significant correlations were found between psychological measures and index scores. These findings highlight the need for future investigation to identify patterns of responses across different pain modalities in order to more accurately characterize individual differences in responses to experimental pain.

5.3 Regression analysis

Regression analysis is widely used in analyzing health care Big Data for estimating the relationships among variables or properties. The main research issues include trend features of data sequences, prediction of data sequences, and relationships between data.

With the emergence of administrative databases, the ability to access longitudinal patient data to adjust for comorbidity has improved considerably. This raises the issue of the most appropriate lookback period to determine patients’ disease status for risk estimation. Most research has used relatively short lookback durations, but longer lookback periods are likely to capture more conditions per patient, as well as assign comorbidities to a greater proportion of patients. Preen et al. (2006) conducted research to discover the impact of different comorbidity ascertainment lookback periods on modeling post-hospitalization mortality and readmission. Data were extracted for ~1.1 million patients admitted to hospital in Western Australia from July 1990 to December 1996. Hierarchically nested Cox regressions were used to model mortality within one year and readmission within 30 days of index separation. Additionally, deaths within one year and readmissions within 30 days of index hospitalization were analyzed using logistic regression, and the receiver operator characteristic (ROC) area under the curve (AUC) was determined for each hierarchically nested lookback model in order to estimate the predictive power of the different models. Longer lookback resulted in more comorbidity being identified. For the entire sample, 46.8% of comorbidity observed across the five-year lookback period was recorded at index hospitalization. For readmission, lookback periods of five years perform better than shorter durations for both patient groups.

Risk adjustment is an important component of outcomes and quality analysis in surgical health care. However, there are some concerns that should be addressed if risk-adjustment models avoid subjective data elements, such as history of comorbidities, and rely on objective data, such as laboratory values or other machine-collected variables that do not require subjective interpretation and input of hospital personnel.

A study by Anderson and Chang (2015) was conducted to determine whether machine-collected data elements could perform as well as a traditional, full risk-adjustment model that includes other physician-assessed and physician-recorded data elements. This research used all available National Surgical Quality Improvement Program (NSQIP) data from January 1, 2005, to December 31, 2010. This nationally validated program measures more than 135 variables on each patient and follows up each patient for 30 days postoperatively. The primary analysis included all patients in the database who were categorized as having had an operation performed by a general surgeon or surgeons in some surgery subspecialties and having an adverse event. Multivariate logistic regression models were created to predict either mortality or any complication in the inpatient setting or within 30 days of surgery. The researchers then compared the ROC AUC of each regression using objective preoperative risk variables to its corresponding regression with all variables. A total of 745,053 patients were included. The difference in AUC between models with all variables and models with only objective variables ranged from −0.0073 to 0.1944 for mortality and from 0.0198 to 0.0687 for complications. These data suggest that it is possible to create a risk-adjustment system with a high discriminatory value based only on objective variables. By restricting data collection to objective data, we can reduce concerns about reliability and validity as well as threats of gaming the system by attempting to increase the risk score of patients through subjective variables.

Kennedy et al. (2013) conducted a retrospective cohort study. In this paper, they identified all Veterans Health Administration (VHA) patients without recent cerebral and cardiovascular (CCV) events treated at twelve facilities from 2003 to 2007 and predicted risk using the Framingham risk score (FRS), logistic regression, generalized additive modeling, and gradient tree boosting.

Oztekin et al. (2009) used three different variable selection methods on a large and feature-rich data set to generate a consolidated set of factors and used them to develop Cox regression models for heart–lung graft survival. The main objective of this study was to improve the prediction of outcomes following combined heart–
lung transplantation by proposing an integrated data-mining methodology. The data files were obtained from United Network for Organ Sharing (UNOS) using a formal data requisition procedure. The complete data set consists of 443 variables and 61,391 records. These variables included the socio-demographic and health-related factors of both the donor and the recipients. There are also procedure-related factors included in the data set. The results indicated that the proposed integrated data-mining methodology using Cox hazard models better predicted graft survival with different variables than the conventional approaches commonly used in the literature.

5.4 Association rules

the data sets provided by the National Health Insurance Plan of Taiwan demonstrates that the proposed method can find the hidden rules that may occur less often but have robust relationships.

6 Systems and Applications for Analyzing Big Data in Health Care

Big Data can provide support across many aspects of health care. BDA has made progress to different degrees in CDS, remote medical information services, public health, disease pattern analysis, and personalized medicine. There are some specific applications and potential opportunities in these areas.
QMR is a typical CDSS to help physicians, using the knowledge base of INTERNIST-1/CADUCEUS. This knowledge base is widely used as a medical book, which contains 750 diseases, 5,000 clinical symptoms, and more than 50,000 disease relationships. QMR was one of the earliest CDSSs to use artificial intelligence and a probability ranking system.

Because many of the diseases in the system are rare and documented, an ad hoc scoring model is proposed to encode the relationship between specific clinical symptoms and disease. One of the factors limiting the use of QMR is that its knowledge base needs to be constantly updated. The significance of QMR lies in its powerful knowledge base, which is used as the basic model of other knowledge base systems.

6.1.3 The Iliad system

Iliad is a medical expert consulting system developed by the University of Utah School of Medicine. It is used as a consultation tool or a simulation training tool for CDS and teaching (Lincoln, 1998).

The Iliad consultant utilizes a number of inferencing mechanisms to emulate the strategy of a medical expert in working with a patient. The knowledge in Iliad is represented in Bayesian and Boolean frames. These frames permit the use of sensitivities and specificities to describe the relationship of a disease to its manifestations and provide a basis for explaining its conclusions. Iliad has four basic components: the inference engine, the user interface, the data driver, and the best information algorithm.

6.1.4 The MYCIN system

MYCIN is an interactive expert system for the diagnosis and treatment of central nervous system infections (Berner, 2003). It is composed of three subsystems: consultation, interpretation, and rules. According to the clinical manifestations and laboratory results of patients, MYCIN imitates the expert reasoning process, assists clinicians in determining bacterial species, and makes clinical recommendations. The system adopts the method of if–then inference rules and contains more than 400 rules that embody expert judgment.

The aggregated electrocardiogram (ECG) and images from hospitals worldwide can become Big Data, which could be used to develop an e-consultation program helping on-site practitioners deliver appropriate treatment. Real-time teleconsultation and telediagnosis of ECG and images can be practiced via an e-platform for clinical, research, and educational purposes.

With respect to large-scale data research, Chia and Syed (2011) used Big Data computing to generate a predictor of the mortality risk for patients with acute coronary syndromes. This predictor was developed through data mining and machine learning, based on 24-hour continuous ECG readings over 4,000 patients’ trials. In each trial, 24-hour ECG readings were collected in a two-year period. This Big Data-based predictor can predict over 50% of deaths with fewer false positives as compared with the traditional ECG analysis, which was conducted based on a smaller segment of ECG signals. This approach can be easily extended to other clinical and non-clinical applications focused on approximate sequential pattern discovery in massive time-series data sets.

To make telemedicine more efficient, medical wearable devices that apply Big Data-mining and analysis techniques are used. For example, patients with dementia (such as Alzheimer type) need to be looked after day and night in order to manage their negative behaviors, which requires a large input of labor and capital. To resolve this problem, real-time health monitoring devices have been developed to capture a large amount of data. Based on these real-time data, it can be determined whether a patient with dementia is in agitation or not. At the same time, medical Big Data also pose challenges to data cleaning; poor-quality data should be identified and rejected to ensure that the results of data mining are correct. Moreover, data captured from remote monitoring devices can be mined to realize long-term prognoses.

A Context Processing Algorithm (CPA) (Moore, Xhafa, Barolli, & Thomas, 2013) is proposed to address the issues encountered in decision support in medical diagnosis and potential prognoses based on the event–condition–action (ECA) rule concept. CPA regards captured Big Data as a kind of contextual information to carry out data processing in intelligent context-aware systems.

On the basis of Big Data, pervasive remote medical systems are designed for both healthy and ill people. Páez et al. (2015) proposed an architecture including the application of cloud computing, Big Data, and Internet of things approaches to ensure that chronic or non-chronic patients as well as healthy people are monitored
in different environments. Family members, emergency systems, and hospitals can interact with the patients whenever and wherever possible.

While Big Data promotes the function of medical remote monitoring and diagnosis, the development of telemedicine also enriches the connotation of Big Data. Traditionally, medical Big Data refers to EHR and remote monitoring health data. Now, however, medical Big Data that include users’ behaviors, physical strength, and mental state data are being rapidly generated (Redmond et al., 2014). Technological advances in the medical field, such as medical video communications, also provide a new type of medical Big Data. For instance, a light-field-based 3D cloud telemedicine system (Wang, Xiang, Pickering, & Zhang, 2016) that combines Big Data analysis with 3D technologies is proposed to mine big video data.

6.3 Applications in public health

In the field of public health, BDA represents a new solution that can mine web-based and social media data to predict disease outbreaks based on consumers’ searches, social content, and query activity. Systems in public health also support clinicians and epidemiologists performing analyses across patients and care venues to help identify disease trends and drug safety.

BDA is often used for monitoring of disease networking. An example is Google’s use of BDA to study the timing and location of search engine queries to predict disease outbreaks. Research shows that one-third of consumers currently use social networking for health care purposes (Facebook, YouTube, blogs, Google, Twitter). As demand for access to health information from social networking sites continues to proliferate, BDA can potentially support key prevention programs such as disease surveillance and outbreak management.

The Global Burden of Disease Study (GBD) is a comprehensive regional and global research program of disease burden that assesses mortality and disability from major diseases, injuries, and risk factors. GBD is a collaboration of more than 1,800 researchers using medical Big Data from 127 countries. The 2015 report (Collaborators, 2017) showed that globally, diarrhea was a leading cause of death among all ages, as well as a leading cause of disability-adjusted life years (DALYs) because of its disproportionate impact on young children.

BDA is also widely applied to supervise drug safety, particularly ADRs, and to identify susceptible populations. ADR is defined as an appreciably harmful or unpleasant reaction resulting from an intervention related to the use of a medicinal product (Edwards & Aronson, 2000). ADR can be used in the field of medical administration and warrants prevention, specific treatment, alteration of the dosage regimen, and withdrawal of the product.

With the help of Big Data, health departments or medical companies can efficiently take actions when they detect potential ADRs among the people who take the medication. In 2004, Wilson et al. proposed that Knowledge Discovery in Databases (KDD) is a more effective way to determine the presence and assess the strength of ADR signals. At this point, numerous data-mining techniques have been used in drug safety, such as cluster analysis, link analysis, deviation detection, and disproportionality assessment.

As Big Data emerges, health social media sites are regarded as a fast and direct data resource for scientists to obtain first-hand ADR information. Compared with ADRs recorded by health professionals, spontaneous reporting of data on health social media sites is much more abundant, open, and timely. Owing to these advantages, Christopher et al. (2009) used association mining and the proportional reporting ratio to analyze the detected ADRs for different drugs on the basis of social data. Given the prosperity of medical research, especially in the ADR field, and the advantages of Big Data, Shah et al. (2012) believed that Big Data in biomedical informatics will grow considerably. There is no doubt that the age of data-medicine is poised to create a proactive, predictive, preventive, participatory, and patient-centered health system.

Apart from the great potential shown in drug safety, Big Data can also achieve powerful effects in identifying susceptible populations. A large collection of EHRs accumulated by various medical treatments provides an opportunity to build statistical models of high-risk people. Such a model aims to reduce the cost of health care and conserve limited resources in health value. Bates et al. (2014) suggested that identifying and managing the data of six practical use cases is the way to use predictive medical systems. The use cases include high-cost patients, readmissions, triage, decompensation, adverse events, and treatment optimization for diseases affecting multiple organ systems.

6.4 Applications in disease pattern analysis and personalized medicine

Hay et al. (2013) combined new sources of data, such as social data, with relevant environmental information to create a dynamic and real-time global infectious disease map. On the basis of infectious disease risk maps, human
beings can deepen their knowledge of infectious diseases and improve the ability to triage spatially and issue infectious disease outbreak alerts. Lazer et al. (2014) stated that “Big Data hubris” is the often implicit assumption that Big Data is a substitute for, rather than a supplement to, traditional data collection and analysis. Given that most Big Data cannot reach the standard of scientific statistical analysis, there is no doubt that the results can have large errors. Additionally, medical algorithms are not constant. On the contrary, they are dynamic and undergo a continuous series of adjustments.

Big medical data can be applied not only to mining public medical patterns but also to personalized medical care. At present, health care is moving from a disease-centered model toward a patient-centered model. In a disease-centered model, physicians’ decision making is centered on clinical expertise and data from medical evidence and various tests. In a patient-centered model, patients actively participate in their own care and receive services focused on individual needs and preferences.

Personalized healthcare is a data-driven approach. It is a patient-centered medical model that assesses the relationships among patients who are exposed to similar risk, lifestyle, and environmental factors. In light of these thoughts, Chawla and Davis (2013) developed a system named CARE that uses a collaborative filtering method to capture patient similarities and produces personalized disease profiles for personalized disease risk predictions.

Panahiazar et al. (2014) presented the main challenges from the standpoints of the variety, quality, volume, and velocity of the data. Alyass et al. (2015) proposed that personalized medicine may widen the growing gap in health systems between rich and poor countries. Moreover, they attributed the slow transition from conventional to personalized medicine to several factors: generation of cost-effective high-throughput data, hybrid education and multidisciplinary teams, data storage and processing, data integration and interpretation, and individual and global economic relevance.

7 Challenges for Mining Big Data in Health Care

7.1 Data mining

Clinical Big Data contains a large amount of unstructured data such as natural language or other handwritten data (Jee & Kim, 2013) whose integration, analysis, and storage bring a certain degree of difficulty. At the current stage, it is inefficient to share structured data among agencies, and the sharing of unstructured data among the same organizations is even more difficult to achieve. Determining how to effectively mine a large amount of unstructured data will continue to be a major challenge (Sejdic, 2014). One of the characteristics of Big Data is variability in data sources (Dieringer & Schlotterer, 2003), and medical data themselves are highly time-sensitive; for example, personalized medical care has high timeliness requirements. The medical industry places extreme demands on data processing speed, especially when the patient’s condition deteriorates rapidly. In addition, when using real-time applications such as cloud computing to access and analyze data, the privacy and security of patient data are also a challenge (Jee & Kim, 2013). Cloud computing now offers new possibilities for mining and sharing medical Big Data. However, there are also several challenges to be overcome before cloud computing can become more practical. First, although cloud computing offers an easy and flexible way to mine resources, it also increases the risk of privacy disclosure, a fact that is particularly evident in fields such as clinical informatics and public health informatics. Second, in medicine, large amounts of data (at the petabyte level) often need to be imported to or exported from the cloud. Network bandwidth constraints affect the speed of data transmission and also increase the cost of cloud computing (J. J. Chen, Qian, Yan, & Shen, 2013). At present, the attention to Big Data focuses mainly on its accuracy; timely and accurate data mining is another challenge, which is still in the initial stage (Abenstein & Tompkins, 1982; Xu et al.).

7.2 Data storage

The current difficulties in data storage are mainly due to high costs. Medical data costs arise mainly from three aspects. The huge amount of medical data is one of the sources of storage costs. With the development of medical information, the medical industry has produced a large amount of data, ranging from medical diagnostic images to pathological analysis of maps. For example, regional medical data are usually derived from a region with millions of people and hundreds of medical institutions, and the amount of data continues to grow. In accordance with the relevant provisions of the medical industry, a patient’s data typically need to be retained for more than 50 years. The data of this patient not only contain a large number of online or real-time data but also include
a variety of data such as diagnosis and medication recommendations in CDS, various structured data tables, non-(semi-) structured text documents, medical images, and other information. The massive size of the data inevitably increases the cost and difficulty of storage. There are also costs associated with moving them from one place to another as well as analyzing them. Finally, the types of medical data are diverse, including numerical data that record various disease tests, as well as various diagnostic images, records made by doctors and nurses, and even diagnostic speech, video, and other unstructured data. Unstructured data are more difficult to store, analyze, and manipulate than structured data. They also, to a certain extent, increase the cost of storage. It is also a challenge to maintain safety and privacy in the process of storing, extracting, and downloading patient-related data (Youssef, 2014).

7.3 Data sharing

7.3.1 Limited data standardization and interoperability

The current standards and technologies are inadequate to meet the requirements of the integrative applications of health care Big Data. The difficulties are twofold. First, the data lack uniform standards, consistent description formats, and presentation methods. Second, integrating structured, semi-structured, and unstructured data at different levels is difficult. At the same time, each database uses different software and data formats; the latter in particular makes data comparison, analysis, transfer, sharing, and other processes more difficult (Chawla & Davis, 2013; Mohr, Burns, Schueller, Clarke, & Klinkman, 2013; W. Raghupathi & Raghupathi, 2014). Data integration can also reduce the cost. Hillestad et al. (2005) compared health care with the use of IT in other industries and estimated that the use of interoperable electronic medical record (EMR) systems can save $142–371 billion.

7.3.2 Information barriers

Users of Big Data in the medical field cover a wide range, such as hospital clinics, regional medical centers, medical insurance companies, drug management analysis units, and medical equipment monitoring centers. The corresponding data resources are scattered in different data pools, including hospital medical records, settlement and cost data, medical firms’ records, academic medical research data, residents’ health records collected by regional health information platforms, and population and public health data of government surveys. There is not much connection between these data sets. At the same time, the data sharing mechanism is imperfect due to the information barriers among hospitals, scientific research institutions, and other institutions (Kruse, Goswamy, Raval, & Marawi, 2016). For example, in China, medical institutions have limited communication and sharing with each other as a whole (Rui, Y., 2015). With the globalization of data, Big Data in health care will also face varying degrees of language, terminology, and standardization barriers (Kruse et al., 2016).

7.3.3 Volume of data

The massive volume of health care Big Data at the terabyte (TB) level and even petabyte (PB) level is now beyond the capabilities of personal computers and network file sharing programs, so a new sharing mechanism is urgently needed (Kruse et al., 2016; Service).

7.3.4 Insufficient data integration

More data integration is needed. The data have not yet been fully embedded in business processes and organizational management practices. For example, in many cases, patient monitoring data have not yet been integrated into clinical diagnosis and treatment, and clinical data have not yet been integrated into public health services and infectious disease monitoring (Tao, D. A. I., 2016).

7.4 Data privacy

Health care data are more sensitive and centralized than other types of Big Data. There are significant concerns regarding confidentiality (Mancini, 2014b; D. C. Mohr et al., 2013). However, for the problem of patient data privacy protection, no perfect solution has yet emerged. Patient data leakage may have unpredictable consequences (including injury, discrimination, and others). There are many real cases at home and abroad. Big Data technology makes personal medical data face a greater risk. Some people even believe that in the era of Big Data, protecting personal privacy is impossible (Schadt, 2012). The problem can be alleviated by special processing (such as de-identification and digital identity encryption), but the identification and de-identification of information still require people or applications to process
identifiable information that may cause the patient’s health information to be misappropriated by others without the patient’s knowledge or authorization (Rothstein, 2010). Big Data increases the risks to patient data for two reasons. First is the risk of the data itself. The data can be copied and preserved without space and time constraints, which creates high and long-term risk under Big Data conditions. Second is the risk of Big Data technology. Under Big Data technology conditions, even if a large database uses anonymized, encrypted personal data, user identities can still be re-identified: the pseudonymized personal confidential data retain a residual risk of re-identification, and personal identities can be re-determined by data linkage technology (Ward, 2014). This risk is greater when different data sets are linked. De-anonymization is an attack in which anonymous data and other sources of data are compared in order to re-identify the anonymous data sources (Yom-Tov, E, 2016). For example, comparing voter registration data and hospital discharge data can determine whether a person is sick. Voter registration data contains date of birth, sex, zip code, address, date last voted, name, date registered, and other details. Hospital discharge data contains date of birth, sex, zip code, diagnosis, ethnicity, medication, procedure, visit date, and other information. By comparing the same fields in the two data sources, such as date of birth, sex, and zip code, an attacker can determine the specific source and then determine the subject’s illness and voting situation. In the example in Table 3, through the comparison of these two data sources, it is not difficult to determine that the person whose date of birth, sex, and zip code are 06/18/90, female, 77889, respectively, is Angela and she is suffering from diabetes.

Table 3
An Example of Data Privacy Breach
Voter registration data: Name | Sex | Zip code | Date of birth | Address
Hospital discharge data: Sex | Zip code | Date of birth | Disease

Also in the future, in order to better achieve individualized treatment, our individual genomes may be added to the EHR. The individual genome is private, and the gene sequence may lead to many privacy-related issues. Lin et al. (2004) found that “Specifying DNA sequence at only 30 to 80 statistically independent SNP positions will uniquely define a single person”. As such, privacy protection becomes the focus.

7.5 Data technologies and talent

As described in the main characteristics of Big Data, in terms of data size, Big Data in health care exceeded 150 exabytes after 2011 (Y. C. Wang et al., 2015). A study showed that data size in health care is estimated to be around 40 ZB in 2020 (Fig. 1) (O’Driscoll, Daugelaite, & Sleator, 2013). The complexity of the data is also growing rapidly, with data diversity, fast change, low value density, and other complex features becoming increasingly significant. Their complexity poses a serious challenge to traditional computing and information technology (Tony Hey, 2012.06). At present, it is difficult to accommodate the availability, consistency, and partition fault tolerance of a distributed system all at once. It is also difficult to solve the issues of health care data collection, real-time processing and dynamic indexing, lack of prior knowledge, and other difficult issues (Zhang Zhen, Zhou Yi, Du Shou-hong, Luo Xue-qiong, Mei Tian, 2014). Even some widely used Big Data technologies have their challenges. For example, Hadoop helps solve the storage problems of Big Data and also reduces the cost of data storage and improves the speed of operation. However, Hadoop is faced with the technical problems of low security and data that cannot be interconnected (Augustine, 2014. Mar; K. Jee & G. H. Kim, 2013). In addition, promoting the development of health care Big Data applications needs human experts who have both clinical and analytic knowledge (Mavandadi et al., 2012). According to McKinsey, even in the U.S., the leading information technology power, the related talent gap will reach 140,000–190,000 in 2018 (James Manyika, 2011). Many of the data technologies today, including Hadoop and cloud computing, are challenging for many businesses,
especially small firms. The skills required are often not simple; they involve data mining, analysis, manipulation, and other techniques that are too difficult and expensive for most small firms to master (Jee & Kim, 2013). At present, only a small number of companies in the world have mastered the core technology of Big Data analysis. The world needs more data analysts who can use information technology to visualize the data before presenting it to policy makers. Finally, we also need personnel who have mastered technology management, data processing technology, and medical data management. They can use an appropriate management model to make the information infrastructure a continuous research and application platform, ensure continuity, and achieve cross-cutting cooperation (Sepulveda, 2013; Youssef, 2014).
8 Conclusions
Medical research that integrates Big Data will contribute to a higher level of human health, both more broadly and more deeply. This paper summarizes recent research on medical data at home and abroad. It introduces the concepts, background, and main applications of medical Big Data, along with several key technologies related to it. In addition, we summarize and reflect on the opportunities and challenges in the study of medical Big Data. In general, current research on medical data is not yet mature, and many problems remain to be resolved. To take full advantage of the patterns contained in the massive data, Big Data storage, mining, and analysis technologies and related talent are essential. These technologies and talents will support research on health care Big Data and further serve a wide range of medical applications, such as public health, medical care, and medical insurance.
Acknowledgments: ML wrote sections 1 and 2, RW wrote sections 3 and 4, LH wrote sections 5 and 6, and PL wrote sections 7 and 8. WL provided critical suggestions for the paper. LL designed the paper structure, integrated all sections, and supervised the paper writing. We thank Lina Zhou and Ni Wen for assistance in literature search. This paper is supported in part by the National Key Research and Development Program of China (No. 2016YFB1000603), the Key Program of the Major Research Plan of the National Natural Science Foundation of China (No. 91646206), National Natural Science Foundation of China grants (Nos. 31601083 and 61772375), and the Recruitment Program of Global Experts (No. 104413100019).
References
Abenstein, J. P., & Tompkins, W. J. (1982). A new data-reduction algorithm for real-time ECG analysis. IEEE Transactions on Biomedical Engineering, 29(1), 43–48.
Abernethy, A. P., Wheeler, J. L., & Bull, J. (2011). Development of a health information technology-based data system in community-based hospice and palliative care. American Journal of Preventive Medicine, 40(5, Suppl 2), S217–S224.
Agrawal, R., Imieliński, T., & Swami, A. (1993, May). Mining association rules between sets of items in large databases. In P. Buneman & S. Jajodia (Eds.), Proceedings of the ACM SIGMOD Conference on Management of Data (pp. 207–216). Washington, DC: ACM Press.
Aitken, M., & Gauntlett, C. (2013). Patient apps for improved healthcare: From novelty to mainstream. IMS Institute for Healthcare Informatics. Retrieved from https://www.mendeley.com/catalogue/patient-apps-improved-healthcare-novelty-mainstream/
Alyass, A., Turcotte, M., & Meyre, D. (2015). From big data analysis to personalized medicine for all: Challenges and opportunities. BMC Medical Genomics, 8(1), 33.
Anderson, J. E., & Chang, D. C. (2015). Using electronic health records for surgical quality improvement in the era of big data. JAMA Surgery, 150(1), 24–29.
Antonie, M. L., Zaïane, O. R., & Coman, A. (2001). Application of data mining techniques for medical image classification. Proceedings of the Second International Conference on Multimedia Data Mining, 94–101. doi:10.1.1.23.9742
Asante-Korang, A., & Jacobs, J. P. (2016). Big Data and paediatric cardiovascular disease in the era of transparency in healthcare. Cardiology in the Young, 26(8), 1597–1602.
Asri, H., Mousannif, H., Al Moatassime, H., & Noel, T. (2015, June). Big data in healthcare: Challenges and opportunities. Proceedings of 2015 International Conference on Cloud Computing Technologies and Applications, Marrakech, Morocco.
Augustine, D. P. (2014). Leveraging big data analytics and Hadoop in developing India’s healthcare services. International Journal of Computers and Applications, 89(16), 44–50.
Azar, A. T., & Hassanien, A. E. (2015). Dimensionality reduction of medical big data using neural-fuzzy classifier. Soft Computing, 19(4), 1115–1127.
Backonja, U., Kim, K., Casper, G. R., Patton, T., Ramly, E., & Brennan, P. F. (2012, June). Observations of daily living: Putting the “personal” in personal health records. NI 2012: 11th International Congress on Nursing Informatics, Montreal, Canada.
Bagayoko, C. O., Dufour, J. C., Chaacho, S., Bouhaddou, O., & Fieschi, M. (2010). Open source challenges for hospital information system (HIS) in developing countries: A pilot project in Mali. BMC Medical Informatics and Decision Making, 10(22), 1–13.
Bates, D. W., Saria, S., Ohno-Machado, L., Shah, A., & Escobar, G. (2014). Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Affairs, 33(7), 1123–1131.
Belle, A., Thiagarajan, R., Soroushmehr, S. M., Navidi, F., Beard, D. A., & Najarian, K. (2015). Big data analytics in healthcare. BioMed Research International, 2015, 370194, 1–16.
Berner, E. S. (2003). Diagnostic decision support systems: How to determine the gold standard? Journal of the American Medical Informatics Association, 10(6), 608–610.
Blaya, J. A., Shin, S. S., Yagui, M. J., Yale, G., Suarez, C. Z., Asencios, L. L., … Fraser, H. S. (2007). A web-based laboratory information system to improve quality of care of tuberculosis patients in Peru: Functional requirements, implementation and usage statistics. BMC Medical Informatics and Decision Making, 7(1), 33–43.
Braunstein, M. L. (2015). Health big data and analytics. Practitioner’s Guide to Health Informatics (pp. 133–149). Berlin, Germany: Springer International Publishing.
Celesti, A., Fazio, M., Romano, A., & Villari, M. (2016). A hospital cloud-based archival information system for the efficient management of HL7 big data. 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). Opatija, Croatia.
Centers for Medicare & Medicaid Services (CMS), HHS. (2010). Medicare and Medicaid programs; electronic health record incentive program. Final rule. Federal Register, 75(144), 44313–44588. PMID:20677415
Chawla, N. V., & Davis, D. A. (2013). Bringing big data to personalized healthcare: A patient-centered framework. Journal of General Internal Medicine, 28(3, Suppl 3), S660–S665.
Chen, J., Qian, F., Yan, W., & Shen, B. (2013). Translational biomedical informatics in the cloud: Present and future. BioMed Research International, 2013, 658925. PMID:23586054
Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171–209.
Chen, T. S., Liu, C. H., Chen, T. L., Chen, C. S., Bau, J. G., & Lin, T. C. (2012). Secure dynamic access control scheme of PHR in cloud computing. Journal of Medical Systems, 36(6), 4005–4020.
Chia, C.-C., & Syed, Z. (2011). Computationally generated cardiac biomarkers: Heart rate patterns to predict death following coronary attacks. Proceedings of the 2011 SIAM International Conference on Data Mining, 735–746.
Christopher C. Yang, H. Y., Jiang, L., & Zhang, M. (2009). Social media mining for drug safety signal detection. Proceedings of the 2012 International Workshop on Smart Health and Wellbeing.
Christy, A., Gandhi, G. M., & Vaithyasubramanian, S. (2015). Cluster based outlier detection algorithm for healthcare data. Procedia Computer Science, 50, 209–215.
Cismondi, F., Fialho, A. S., Vieira, S. M., Reti, S. R., Sousa, J. M., & Finkelstein, S. N. (2013). Missing data in medical databases: Impute, delete or classify? Artificial Intelligence in Medicine, 58(1), 63–72.
Collins, B. (2016). Big data and health economics: Strengths, weaknesses, opportunities and threats. PharmacoEconomics, 34(2), 101–106.
Costa, F. F. (2014). Big data in biomedicine. Drug Discovery Today, 19(4), 433–440.
Dai, T. (2016). Health and medical big data development perspective. Journal of Medical Informatics, 37(2), 2–8.
Demidowich, A. P., Lu, K., Tamler, R., & Bloomgarden, Z. (2012). An evaluation of diabetes self-management applications for Android smartphones. Journal of Telemedicine and Telecare, 18(4), 235–238.
DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177–188.
Deserno, T. M., Haak, D., Brandenburg, V., Deserno, V., Classen, C., & Specht, P. (2014). Integrated image data and medical record management for rare disease registries. A general framework and its instantiation to the German Calciphylaxis Registry. Journal of Digital Imaging, 27(6), 702–713.
Dieringer, D., & Schlotterer, C. (2003). Microsatellite analyser (MSA): A platform independent analysis tool for large microsatellite data sets. Molecular Ecology Notes, 3(1), 167–169.
Docherty, A. (2014). Big Data—Ethical perspectives. Anaesthesia, 69(4), 390–391.
Edwards, I. R., & Aronson, J. K. (2000). Adverse drug reactions: Definitions, diagnosis, and management. Lancet, 356(9237), 1255–1259.
Fan, C.-Y., Chang, P.-C., Lin, J.-J., & Hsieh, J. C. (2011). A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification. Applied Soft Computing, 11(1), 632–644.
Feldman, B., Martin, E. M., & Skotnes, T. (2012). Big data in healthcare: Hype and hope. Dr. Bonnie, 2012(1), 122–125.
Fenderson, B. A. (2008). Molecular Biology of the Cell, 5th Edition. Medicine & Science in Sports & Exercise, 40(9), 1709.
Frantzidis, C. A., Bratsas, C., Klados, M. A., Konstantinidis, E., Lithari, C. D., Vivas, A. B., … Bamidis, P. D. (2010). On the classification of emotional biosignals evoked while viewing affective pictures: An integrated data-mining-based approach for healthcare applications. IEEE Transactions on Information Technology in Biomedicine, 14(2), 309–318.
Gardner, R. M., Pryor, T. A., & Warner, H. R. (1999). The HELP hospital information system: Update 1998. International Journal of Medical Informatics, 54(3), 169–182.
Garets, D., & Davis, M. (2007). Electronic medical records vs electronic health records: Yes, there is a difference. Zhongguo Yiyuan, 11(5), 38–39.
Gunter, T. D., & Terry, N. P. (2005). The emergence of national electronic health record architectures in the United States and Australia: Models, costs, and questions. Journal of Medical Internet Research, 7(1), 13–15.
Han, J., Pei, J., & Yin, Y. (2000, May). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (pp. 1–12), Texas, USA.
Hassani S, M. H., Qannari E M, et al. (2010). Analysis of -omics data: Graphical interpretation- and validation tools in multi-block methods. Chemometrics and Intelligent Laboratory Systems, 104(1), 140–153.
Hastie, B. A., Riley, J. L., Robinson, M. E., Glover, T., Campbell, C. M., Staud, R., & Fillingim, R. B. (2005). Cluster analysis of multiple experimental pain modalities. Pain, 116(3), 227–237.
Hay, S. I., George, D. B., Moyes, C. L., & Brownstein, J. S. (2013). Big data opportunities for global infectious disease surveillance. PLoS Medicine, 10(4), e1001413.
He, C., Jin, X., Zhao, Z., & Xiang, T. (2010, December). A cloud computing solution for hospital information system. Paper presented at the 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, Xiamen, China.
Heart, T., Ben-Assuli, O., & Shabtai, I. (2017). A review of PHR, EMR and EHR integration: A more personalized healthcare and public health policy. Health Policy and Technology, 6(1), 20–25.
Heimann, T., & Meinzer, H. P. (2009). Statistical shape models for 3D medical image segmentation: A review. Medical Image Analysis, 13(4), 543–563.
Herland, M., Khoshgoftaar, T. M., & Wald, R. (2014). A review of data mining using big data in health informatics. Journal of Big Data, 1(2), 1–35.
Hillestad, R., Bigelow, J., Bower, A., Girosi, F., Meili, R., Scoville, R., & Taylor, R. (2005). Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Affairs, 24(5), 1103–1117.
Hong, C. J., Kaur, M. N., Farrokhyar, F., & Thoma, A. (2015). Accuracy and completeness of electronic medical records obtained from referring physicians in a Hamilton, Ontario, plastic surgery practice: A prospective feasibility study. Plastic Surgery, 23(1), 48.
Hsieh, J. C., Li, A. H., & Yang, C. C. (2013). Mobile, cloud, and big data computing: Contributions, challenges, and new directions in telecardiology. International Journal of Environmental Research and Public Health, 10(11), 6131–6153.
Huang, X. J., & Yao, Y. (2016, August). Multi-dimensions clustering approach for physical health data based on artificial ant colony optimization. Paper presented at the 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China.
Jee, K., & Kim, G. H. (2013). Potentiality of big data in the medical sector: Focus on how to reshape the healthcare system. Healthcare Informatics Research, 19(2), 79–85.
Joshi, K., & Yesha, Y. (2012). Workshop on analytics for big data generated by healthcare and personalized medicine domain. Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, 267–269.
Kanagaraj, G., & Sumathi, A. C. (2011, December). Proposal of an open-source cloud computing system for exchanging medical images of a Hospital Information System. Paper presented at the 3rd International Conference on Trendz in Information Sciences & Computing (TISC2011), Chennai, India.
Kennedy, E. H., Wiitala, W. L., Hayward, R. A., & Sussman, J. B. (2013). Improved cardiovascular risk prediction using nonparametric regression and electronic health record data. Medical Care, 51(3), 251–258.
Khan, W. A., Khattak, A. M., Hussain, M., Amin, M. B., Afzal, M., Nugent, C., & Lee, S. (2014). An adaptive semantic based mediation system for data interoperability among Health Information Systems. Journal of Medical Systems, 38(8), 1–18.
Khoury, M. J., & Ioannidis, J. P. A. (2014). Medicine. Big data meets public health. Science, 346(6213), 1054–1055.
Kim, T.-W., Park, K.-H., Yi, S.-H., & Kim, H.-C. (2014). A big data framework for u-Healthcare systems utilizing vital signs. Paper presented at the 2014 International Symposium on Computer, Consumer and Control, Taichung, Taiwan.
Kovalev, V., & Kalinovsky, A. (2015). Big Medical Data: Image Mining, Retrieval and Analytics. Paper presented at Big Data and Predictive Analytics, Minsk, Belarus.
Krumholz, H. M. (2014). Big data and new knowledge in medicine: The thinking, training, and tools needed for a learning health system. Health Affairs, 33(7), 1163–1170.
Kruse, C. S., Goswamy, R., Raval, Y., & Marawi, S. (2016). Challenges and opportunities of big data in health care: A systematic review. JMIR Medical Informatics, 4(4), e38.
Kumar, S., & Aldrich, K. (2010). Overcoming barriers to electronic medical record (EMR) implementation in the US healthcare system: A comparative study. Health Informatics Journal, 16(4), 306–318.
Kuo, R., Lin, S., & Shih, C. (2007). Mining association rules through integration of clustering analysis and ant colony system for health insurance database in Taiwan. Expert Systems with Applications, 33(3), 794–808.
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). Big data. The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.
Lin, Z., Owen, A. B., & Altman, R. B. (2004). Genetics: Genomic research and human subject privacy. Science, 305(5681), 183.
Lincoln, M. J. (1998). Applying commonly available expert systems in physician assistant education. Perspective on Physician Assistant Education, 9(3), 144–151.
Lodish, H. (2008). Molecular cell biology. San Francisco, CA: W. H. Freeman and Company.
Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: A literature review. Biomedical Informatics Insights, 8, 1–10.
M, T. T. (2014). Mobile Tech Contributions to Healthcare & Patient Experiences. Retrieved from http://topmobiletrends.com/mobile-technologycontributions-Patient-experience-parmar/
MacRae, J., Darlow, B., McBain, L., Jones, O., Stubbe, M., Turner, N., & Dowell, A. (2015). Accessing primary care Big Data: The development of a software algorithm to explore the rich content of consultation records. BMJ Open, 5(8), e008160.
Mancini, M. (2014). Exploiting big data for improving healthcare services. Journal of e-Learning and Knowledge Society, 10(2), 23–33.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. Retrieved from McKinsey Global Institute website: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
Marx, V. (2013). Biology: The big challenges of big data. Nature, 498(7453), 255–260.
Mavandadi, S., Dimitrov, S., Feng, S., Yu, F., Yu, R., Sikora, U., & Ozcan, A. (2012). Crowd-sourced BioGames: Managing the big data problem for next-generation lab-on-a-chip platforms. Lab on a Chip, 12(20), 4102–4106.
Mohr, D. C., Burns, M. N., Schueller, S. M., Clarke, G., & Klinkman, M. (2013). Behavioral intervention technologies: Evidence review and recommendations for future research in mental health. General Hospital Psychiatry, 35(4), 332–338.
Moore, P., Xhafa, F., Barolli, L., & Thomas, A. (2013, October). Monitoring and detection of agitation in dementia: Towards real-time and big-data solutions. Paper presented at the 2013
Eighth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, Compiegne, France.
Naito, M. (2014). Utilization and application of public health data in descriptive epidemiology. Journal of Epidemiology, 24(6), 435–436.
Nance, J. W., Jr., Meenan, C., & Nagy, P. G. (2013). The future of the radiology information system. AJR. American Journal of Roentgenology, 200(5), 1064–1070.
Obenshain, M. K. (2004). Application of data mining techniques to healthcare data. Infection Control and Hospital Epidemiology, 25(8), 690–695.
O’Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data’, Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), 774–781.
Oztekin, A., Delen, D., & Kong, Z. J. (2009). Predicting the graft survival for heart-lung transplantation patients: An integrated data mining methodology. International Journal of Medical Informatics, 78(12), e84–e96.
Páez, D. G., Rodríguez, M. D. B., Sánz, E. P., Villalba, M. T., & Gil, R. M. (2015). Big data processing using wearable devices for wellbeing and healthy activities promotion. In I. Cleland, L. Guerrero, & J. Bravo (Eds.), IWAAL: Ambient assisted living. ICT-based Solutions in Real Life Situations (pp. 196–205). Cham, Switzerland: Springer.
Pai, F. Y., & Huang, K. I. (2011). Applying the technology acceptance model to the introduction of healthcare information systems. Technological Forecasting and Social Change, 78(4), 650–660.
Panahiazar, M., Taslimitehrani, V., Jadhav, A., & Pathak, J. (2014, October). Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases. 2014 IEEE International Conference on Big Data, Washington, DC.
Paul, R., & Hoque, A. S. M. L. (2010). Clustering medical data to predict the likelihood of diseases. 2010 Fifth International Conference on Digital Information Management, 44–49. Thunder Bay, Canada.
Pentland, A., Reid, T., & Heibeck, T. (2013). Big data and health: Revolutionizing medicine and public health. Report of the Big Data and Health Working Group 2013. Retrieved from http://www.wish-qatar.org/summits/wish-2013/forums-research-chairs/big-data-healthcare/
Polpitiya, A. D., Qian, W. J., Jaitly, N., Petyuk, V. A., Adkins, J. N., Camp, D. G., … Smith, R. D. (2008). DAnTE: A statistical tool for quantitative analysis of -omics data. Bioinformatics, 24(13), 1556–1558.
Poulymenopoulou, M., Malamateniou, F., Prentza, A., & Vassilacopous, G. (2015). Challenges of evolving PINCLOUD PHR into a PHR-based health analytics system. Paper presented at the Proceedings of the European, Mediterranean & Middle Eastern Conference on Information Systems EMCIS.
Preen, D. B., Holman, C. D., Spilsbury, K., Semmens, J. B., & Brameld, K. J. (2006). Length of comorbidity lookback period affected regression model performance of administrative health data. Journal of Clinical Epidemiology, 59(9), 940–946.
Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: Promise and potential. Health Information Science and Systems, 2(1), 3.
Redmond, S. J., Lovell, N. H., Yang, G. Z., Horsch, A., Lukowicz, P., Murrugarra, L., & Marschollek, M. (2014). What does big data mean for wearable sensor systems? Yearbook of Medical Informatics, 9(1), 135–142.
Roberts, E. B. (1985). Health information systems. Clinics in Laboratory Medicine, 23(5), 672–676.
Rothstein, M. A. (2010). Is deidentification sufficient to protect health privacy in research? The American Journal of Bioethics, 10(9), 3–11.
Rui, Y. (2015). Medical big data: The next industry windy spot. Business School, [Chinese], 4, 100–103.
Rumsfeld, J. S., Joynt, K. E., & Maddox, T. M. (2016). Big data analytics to improve cardiovascular care: Promise and challenges. Nature Reviews. Cardiology, 13(6), 350–359.
Safavi, S., & Shukur, Z. (2014). Conceptual privacy framework for health information on wearable device. PLoS One, 9(12), e114306.
Schadt, E. E. (2012). The changing privacy landscape in the era of big data. Molecular Systems Biology, 8(1), 612.
Sejdić, E. (2014). Medicine: Adapt current tools for handling big data. Nature, 507(7492), 306.
Sepulveda, J. L., & Young, D. S. (2013). The ideal laboratory information system. Archives of Pathology & Laboratory Medicine, 137(8), 1129–1140.
Sepulveda, M. J. (2013). From worker health to citizen health: Moving upstream. Journal of Occupational and Environmental Medicine, 55(12, Suppl), S52–S57.
Service, R. F. (2013). Biology’s dry future. Science, 342(6155), 186–189.
Shah, N. H., & Tenenbaum, J. D. (2012). The coming age of data-driven medicine: Translational bioinformatics’ next frontier. Journal of the American Medical Informatics Association, 19(e1), e2–e4.
Sheta, O. E., & Eldeen, A. N. (2013). The technology of using a data warehouse to support decision-making in health care. International Journal of Database Management Systems, 5(3), 75–86.
Sirintrapun, S. J., & Artz, D. R. (2016). Health information systems. Clinics in Laboratory Medicine, 36(1), 133.
Steinbrook, R. (2008). Personally controlled online health data—The next big thing in medical care? The New England Journal of Medicine, 358(16), 1653–1656.
Swan, M. (2013). The quantified self: Fundamental disruption in big data science and biological discovery. Big Data, 1(2), 85–99.
Tan, S. S., Gao, G., & Koch, S. (2015). Big Data and Analytics in Healthcare. Methods of Information in Medicine, 54(6), 546–547.
Tang, P. C., Ash, J. S., Bates, D. W., Overhage, J. M., & Sands, D. Z. (2006). Personal health records: Definitions, benefits, and strategies for overcoming barriers to adoption. Journal of the American Medical Informatics Association, 13(2), 121–126.
Taverner, T., Karpievitch, Y. V., Polpitiya, A. D., Brown, J. N., Dabney, A. R., Anderson, G. A., & Smith, R. D. (2012). DanteR: An extensible R-based tool for quantitative analysis of -omics data. Bioinformatics (Oxford, England), 28(18), 2404–2406.
Tola, K., Abebe, H., Gebremariam, Y., & Jikamo, B. (2017). Improving Completeness of Inpatient Medical Records in Menelik II Referral Hospital, Addis Ababa, Ethiopia. Advances in Public Health, 2017, 1–5.
Tony, H., Stewart, T., & Kristin, T. (2012). The fourth paradigm: Data-intensive scientific discovery. Berlin, Germany: Springer-Verlag Berlin Heidelberg.
Tsumoto, S., Hirano, S., & Iwata, H. (2013). Mining nursing care plan from data extracted from hospital information system. Paper presented at the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Niagara Falls, ON, Canada.
Usami, Y., Cho, H. C., Okazaki, N., & Tsujii, J. I. (2011). Automatic acquisition of huge training data for bio-medical named entity recognition. Proceedings of BioNLP 2011 Workshop 5, 65–73.
Valdes, I., Kibbe, D. C., Tolleson, G., Kunik, M. E., & Petersen, L. A. (2004). Barriers to proliferation of electronic medical records. Journal of Innovation in Health Informatics, 12(1), 3–9.
Vesna, V. (2000). The Visible Human Project: Informatic bodies and posthuman medicine. AI & Society, 14(2), 262–263.
Wang, L., & Alexander, C. A. (2013). Applications of automated identification technology in EHR/EMR. International Journal of Public Health Science, 2(3), 109–122.
Wang, Y., Kung, L., Ting, C., & Byrd, T. A. (2015). Beyond a technical perspective: Understanding big data capabilities in health care. Proceedings of the 48th Annual Hawaii International Conference on System Sciences (pp. 3044–3053). Hawaii, USA.
Ward, J. C. (2014). Oncology reimbursement in the era of personalized medicine and big data. Journal of Oncology Practice, 10(2), 83–86.
White, S. E. (2013). De-identification and the sharing of big data. Journal of American Health Information Management Association, 84(4), 44–47.
Wilson, A. M., Thabane, L., & Holbrook, A. (2004). Application of data mining techniques in pharmacovigilance. British Journal of Clinical Pharmacology, 57(2), 127–134.
Windridge, D., & Bober, M. (2014). A kernel-based framework for medical big-data analytics. In A. Holzinger & I. Jurisica (Eds.), Interactive knowledge discovery and data mining in biomedical informatics (pp. 197–208). Berlin, Germany: Springer-Verlag.
Wu, P. Y., Cheng, C. W., Kaddi, C. D., Venugopalan, J., Hoffman, R., & Wang, M. D. (2017). –Omic and electronic health record big data analytics for precision medicine. IEEE Transactions on Biomedical Engineering, 64(2), 263–273.
Xiang, W., Wang, G., Pickering, M., & Zhang, Y. (2016). Big video data for light-field-based 3D telemedicine. IEEE Network, 30(3), 30–38.
Xu, J., Wise, C., Varma, V., Fang, H., Ning, B., Hong, H., … Kaput, J. (2010). Two new ArrayTrack libraries for personalized biomedical research. BMC Bioinformatics, 11(Suppl 6), S6.
Yan, Y., Qin, X., Fan, J., & Wang, L. (2014). A review on healthcare big data research. E-Science Technology & Application, [Chinese], 5(6), 3–16.
Yom-Tov, E. (2016). Crowdsourced health: How what you do on the Internet will improve medicine. Cambridge, MA: MIT Press.
Youssef, A. E. (2014). A framework for secure healthcare systems based on big data analytics in mobile cloud computing environments. The International Journal of Ambient Systems and Applications, 2(2), 1–11.
Yuen-Reed, G., & Mojsilović, A. (2016). The role of big data and analytics in health payer transformation to consumer-centricity. In C. Weaver, M. Ball, G. Kim, & J. Kiel (Eds.), Healthcare information management systems (pp. 399–420). Switzerland: Springer.
Zhang, D. Q., & Chen, S. C. (2004). A novel kernelized fuzzy c-means algorithm with application in medical image segmentation. Artificial Intelligence in Medicine, 32(1), 37–50.
Zhang, Z., Zhou, Y., Du, S. H., Luo, X. Q., & Mei, T. (2014). Medical big data and the facing opportunities and challenge. Journal of Medical Informatics, 6, 2–8.