FEBIM2022BigDataEthics Bigdata
All content following this page was uploaded by Ben S Liu on 26 April 2022.
Victor Chang1 a , Rahman Olamide Eniola2 b, Ben Shaw-Ching Liu3 c , and Mitra Arami4 a
1Department of Operations and Information Management, Aston Business School, Aston University, Birmingham, UK
2 Cybersecurity, Information Systems and AI Research Group, School of Computing, Engineering and Digital Technologies,
Teesside University, UK
3 Department of Marketing, Lender School of Business Center, Quinnipiac University Hamden, CT 06518, USA
4 Pardis Limited, London and EM Normandie Business School, France
Keywords: Ethics for AI and Data Science; Ethical framework; Ethics for smart healthcare.
Abstract: There has been significant growth in big data technology in healthcare in recent years. However, the potential
of big data analytics is affected by various ethical and security concerns, which have hampered the application
of big data analytics in healthcare. Recently, numerous studies have been conducted on the emerging big data
ethical issues in healthcare. While most of the literature reflects on privacy and security questions, it has not
objectively examined the possible discriminatory impact of big data analytics. This mixed-method
project aims to highlight various ethical problems in big data analytics while also providing an in-depth insight
into the biased results derivable from big data analytics and the effects of such outcomes.
a https://orcid.org/0000-0002-8012-5852
b https://orcid.org/0000-0001-9799-861X
c https://orcid.org/0000-0002-2950-9607
c https://orcid.org/0000-0001-6855-9888
Figure 1: Healthcare Analytics market in the USA, by end
user, 2018 – 2028 (USD Million)
2 LITERATURE REVIEW
Big data and big data analytics are arguably the pillars of other disruptive technologies, providing the necessary business insights for patients, experts, and government (Wong, Zhou, and Zhang, 2019). Big data analytics is the method of storing, processing, and analyzing vast collections of data to find trends and other valuable knowledge (Heyman et al., 2004). These massive and complex big data collections are manipulated and managed using various computational methods such as machine learning and artificial intelligence (Ward and Barker, 2013). The advent of advanced technology has provided conditions and procedures for voluminous databases to be compiled and processed, resulting in informed decision-making in addressing health problems (Raja et al., 2020).

Big data has emerged as a promising option with the potential to revolutionize the healthcare system by lowering costs and optimizing treatment process, delivery, and management (Patil and Seshadri, 2014). The application of big data comes with some ethical issues that demand careful consideration (Camilleri, 2020). Suresh and Guttag (2019) explain how bias problems occur, how they apply to specific applications, and how they inspire various solutions. They also present a framework for understanding analytical bias at a higher level of abstraction to facilitate constructive dialogue and solution development.

Notwithstanding the amount of data generated in healthcare, the underlying challenge remains in the integration of structured and unstructured health data. According to Dridi et al. (2020), approximately 80% of clinical data is unstructured and widely underutilized once generated. Different clinical data formats, such as scanned medical documents, prescriptions, patient registries, and clinician notes, result in poor standardization of healthcare data, making it more difficult for EHR systems to handle and more prone to bias from data preprocessing (Cave et al., 2019; Dridi et al., 2020).

Patient privacy invasion is an emerging problem in big data analytics. Patients' behavior and sentiment data can be obtained from various online sources. For example, an online drug retailer may have recorded the purchase of a particular medication, a ride-hailing app may have recorded a visit to a clinic or lab, or a social media app may have recorded patients' interactions with a medical web page. Furthermore, patients' data can also be extracted unethically via health-care-specific applications and wearable devices.

Also, we studied several publications to better grasp the potential discriminatory effects of big data analytics on subjects and the popular drivers of discrimination or inequality. Different writers arrived at different conclusions. Big data analytics may result in unintentional discrimination (Žliobaitė, 2017; Sonawane and Irabashetti, 2015). Žliobaitė (2017) established that discrimination is indirect, not by the analyst's intention but because of the structure and noise of experimental data. Such algorithms may systematically disfavor persons belonging to particular groups or categories, rather than depending purely on individual merits.

Conversely, other academic studies emphasized intentional discrimination (e.g., Kuempel, 2016; Sonawane and Irabashetti, 2015). According to Kuempel (2016), data brokers frequently combine raw components of personal data in a discriminatory way, leaving customers exposed to exploitative and distasteful marketing techniques. The effect of utilizing such a biased dataset with sensitive information is that the affected individuals or groups of people would be exposed to direct discrimination.

Figure 3: Power BI architecture

3 RESEARCH QUESTIONS AND BIG DATA ANALYTICS ARCHITECTURE

The first step of the research was to identify relevant research questions. The main research question is, "given the many applications and benefits of big data and big data analytics in healthcare, do the ethical risks overshadow the benefits?"

To answer this main question, we need to find answers to the following sub-questions:
1. What are the applications of big data in healthcare?
2. What are the current ethical issues of healthcare big data analytics?
3. What is the cause of discrimination in big data analytics?

The big data analytics framework utilized in this project is a blend of many steps that explains the big data analytics procedure (shown in Figure 3). The first phase in the framework is data preparation, which involves ETL, i.e., the Extraction, Transformation, and Loading of the data. Extraction is the process of determining the data type to be utilized and collecting it from different data sources, such as existing databases and repositories, APIs, and the cloud. Data transformation is the next step, in which data is transformed, aggregated, and loaded into the Power Business Intelligence (BI) dashboard. The transformation step ensures the: (1) handling of inconsistencies and missing values in the data; (2) elimination of duplicate data; (3) removal of useless data; and (4) sorting of data into the appropriate type. Figure 3 illustrates the overview of the Power BI analytics procedure.

The visualization step involves taking the processed outputs and transforming them into meaningful insights by viewing the results in diagrams, KPIs, or other easy-to-understand formats. It is crucial to ensure that results can be interpreted by those with no previous experience or expertise. Unlike other tools, Power BI allows the integration of different programming languages. The advantage of Power BI is that Python and R functionalities can be applied alongside the DAX and M-language formulas, which gives a better result due to the combined strengths of different programming languages.

4 APPLICATION AND BENEFITS OF BIG DATA ANALYTICS

4.1 Preventive Medicine

Preventive medicine is arguably the most innovative application of big data analysis, which employs cutting-edge data analytics methods for disease detection and classification, association analytics, and clustering, with the promise of efficiently discovering valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard data (Razzak et al., 2020). Appropriate disease prevention involves identifying and treating at-risk patients. To increase therapeutic adherence, several preventative strategies are employed. Pertinent data, such as body temperature, pulse, and blood pressure, are electronically collected, enabling automated risk prediction. Consequently, the increased usage has contributed significantly to the appropriateness of big data analytics in healthcare (Rehman et al., 2021). The aggregate of these data is analyzed to assist patients with diets, reminders of preventative care, personalized medical care, follow-up on prior consultations and medicines, and counseling (Razzak et al., 2020).

Due to the considerably broad customer base, relatively few regulatory obligations, and ease of access to wearable devices and medical apps, personalized medical care has significantly increased its market size, as shown in Figure 4.

Figure 4: Trend of Personalized Medicine (2012-2022)

4.2 Evidence-Based Healthcare

Traditional healthcare is changing from expedient and discretionary decision-making to evidence-based medical practices (Piai and Claps, 2013). Evidence-based care is a healthcare practice in which decisions about patients' conditions are based on scientific proof. By consolidating data from various outlets, big data supports evidence-based treatment. The data trends and patterns would provide sufficient support for diagnosis and treatment (Piai and Claps, 2013).

4.3 Enhancement of Public Health Monitoring

The analysis of healthcare data with ground-breaking methods aids in analyzing epidemic trends and in monitoring disease outbreaks and the spread of disease. This approach improves public health monitoring, education, and reaction time. An excellent example is the COVID-19 pandemic surveillance system in the United Kingdom, which offers a daily update on infection rates for the user's postcode district, generates a risk score, and communicates it to the user. Furthermore, the app allows users to check into a specific place, recording their presence at that particular time and date. The app also stores an individual's check-ins with the names and IDs of such locations, which are used with the test and trace teams to inform users of their association with a particular area at a given time. For example, suppose someone visits a local bar and tests positive for coronavirus. In that case, the app alerts everyone who has also checked in at the same place to self-isolate or quarantine.

of specific articles related to the proposed research. We used an inclusion basis to choose big data and healthcare papers to find relevant papers to answer research questions based on predefined keywords. Our aim is to support developing an emerging ethical framework for Healthcare big data, as shown in Figu

Figure 5: Emerging Ethical Issues in Healthcare Big data
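The four transformation checks listed for the data preparation phase can be illustrated with a small sketch. This is not the project's actual Power BI pipeline: the records, the column names (`patient_id`, `age`, `glucose`, `visit_date`, `notes`), and the decision to mark unusable values as missing rather than impute them are all assumptions made for illustration, and plain Python is used so the example has no dependencies.

```python
from datetime import date

# Hypothetical raw records, as they might arrive from different sources.
RAW_ROWS = [
    {"patient_id": "P001", "age": "54", "glucose": "148",
     "visit_date": "2021-03-02", "notes": "n/a"},
    {"patient_id": "P001", "age": "54", "glucose": "148",
     "visit_date": "2021-03-02", "notes": "n/a"},   # exact duplicate
    {"patient_id": "P002", "age": "", "glucose": "85",
     "visit_date": "2021-03-05", "notes": ""},      # missing age
    {"patient_id": "P003", "age": "47", "glucose": "not recorded",
     "visit_date": "2021-03-09", "notes": "x"},     # inconsistent value
]

# (3) Columns assumed to be useful; everything else ("notes") is dropped.
KEEP_COLUMNS = ("patient_id", "age", "glucose", "visit_date")


def to_int_or_none(value):
    """(1) Treat blanks and non-numeric entries as missing (None)."""
    try:
        return int(value.strip())
    except (ValueError, AttributeError):
        return None


def transform(rows):
    """Apply the four checks and return cleaned, typed records."""
    seen = set()
    cleaned = []
    for row in rows:
        # (4) Convert each kept field to an appropriate type.
        record = {
            "patient_id": row["patient_id"].strip(),
            "age": to_int_or_none(row.get("age")),
            "glucose": to_int_or_none(row.get("glucose")),
            "visit_date": date.fromisoformat(row["visit_date"]),
        }
        # (2) Skip records identical to one already kept.
        key = tuple(record[column] for column in KEEP_COLUMNS)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    for record in transform(RAW_ROWS):
        print(record)
```

How missing values are handled is itself a design choice with ethical weight: dropping or imputing records can shift which patient groups are represented in the data that reaches the dashboard.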
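The check-in alerting described in Section 4.3 can be illustrated with a small sketch. It does not reproduce the actual contact-tracing app: the check-in log, the user and venue identifiers, and the three-hour overlap window are hypothetical values chosen only to show how stored check-ins could be matched against a positive case.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical check-in log: (user_id, venue_id, check-in time).
CHECK_INS = [
    ("alice", "bar-17", datetime(2021, 1, 8, 19, 30)),
    ("bob", "bar-17", datetime(2021, 1, 8, 20, 15)),
    ("carol", "bar-17", datetime(2021, 1, 9, 12, 0)),
    ("dave", "cafe-03", datetime(2021, 1, 8, 19, 45)),
]

# Assumed overlap window; the source does not state the real rule.
WINDOW = timedelta(hours=3)


def users_to_alert(check_ins, positive_user):
    """Return users who checked in at the same venue as the positive
    user within WINDOW of that user's check-in."""
    by_venue = defaultdict(list)
    for user, venue, when in check_ins:
        by_venue[venue].append((user, when))

    alerts = set()
    for user, venue, when in check_ins:
        if user != positive_user:
            continue
        for other, other_when in by_venue[venue]:
            if other != positive_user and abs(other_when - when) <= WINDOW:
                alerts.add(other)
    return sorted(alerts)


if __name__ == "__main__":
    print(users_to_alert(CHECK_INS, "alice"))
```

Even in this toy form, the matching step requires retaining who was where and when, which is the kind of sensitive location data the ethical discussion in this paper is concerned with.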
Figure 12: Big data and AI in Healthcare

Figure 13: Sensitive Data

The HINT (NCI, 2020) survey results (shown in Figure 14 below) indicate that, while the majority of respondents are concerned about unauthorized access to their health records, they have confidence that medical providers and institutions would value their voice and therefore keep their data secure.

7 EVALUATION AND DISCUSSION OF FINDINGS

7.1 Conclusion and Implications

Removing private information to increase patients' anonymity is a powerful method of protecting patient data. The difficulty lies in determining which highly sensitive features can be removed from the data. In cases such as the coronavirus pandemic, however, the use of sensitive patient data such as location may improve governments' and research institutions' ability to combat the threat more quickly through a surveillance system that provides location data used to curb the current crisis. The diabetes data, on the other hand, has features that could give rise to discriminatory and stereotypical generalizations.

Data scientists must be mindful that utilizing these large amounts of data comes at the expense of human liberty and social autonomy. Efforts to lessen the risks of using these data must be monitored under established legislative measures, such as the General Data Protection Regulation (GDPR). The Human-Centered Design approach must guide the intent and goals of data usage, including its processing, analysis, warehousing, and dataset sharing.

The following are the main conclusions observed from these principles and criteria for operational use of data-driven healthcare analytics: