Machine Learning Algorithms and Applications in Engineering
Edited by
Prasenjit Chatterjee, Morteza Yazdani,
Francisco Fernández-Navarro, and
Javier Pérez-Rodríguez
First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2023 Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author
and publisher cannot assume responsibility for the validity of all materials or the consequences
of their use. The authors and publishers have attempted to trace the copyright holders of all
material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged
please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted,
reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means,
now known or hereafter invented, including photocopying, microfilming, and recording, or in
any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.
copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact
[email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and
are used only for identification and explanation without intent to infringe.
ISBN: 9780367569129 (hbk)
ISBN: 9780367612559 (pbk)
ISBN: 9781003104858 (ebk)
DOI: 10.1201/9781003104858
Typeset in Palatino
by Newgen Publishing UK
Contents
Preface......................................................................................................................vii
Organization of the Book....................................................................................... ix
The Editors.............................................................................................................xiii
Index...................................................................................................................... 315
areas, all of them with a high impact index (Q1). His teaching experience has mainly been in diverse undergraduate subjects within the official teaching load assigned to the Department of Quantitative Methods of Loyola Andalucía University.
1
Machine Learning for Smart Health Care
Rehab A. Rayan
Department of Epidemiology, High Institute of Public Health,
Alexandria University, Egypt
[email protected]
CONTENTS
1.1 Introduction..................................................................................................... 1
1.2 Major Applications of ML in Health Care.................................................. 3
1.2.1 Medical Diagnostics........................................................................... 3
1.2.2 Precision Health.................................................................................. 5
1.2.3 Monitoring Health.............................................................................. 7
1.3 Opportunities and Limitations..................................................................... 9
1.4 Conclusions................................................................................................... 10
1.1 Introduction
Machine learning (ML) incorporates advanced algorithms working on disparate big data to reveal valuable trends that even skilled experts would find challenging to discern. Nowadays, ML applications span several disciplines such as gaming (Silver et al. 2018), product recommendation (Batmaz et al. 2019), and self-driving cars (Bojarski et al. 2016), while in medicine the examples are the Human Genome Project (Venter et al. 2001) and cancer omics (genomics and proteomics) (Zhang et al. 2019; Ellis et al. 2013). Gathering and exploring health-related big data has the potential to shift medicine into a data-centered and outcomes-focused field with advances in detecting, diagnosing, and managing diseases. Molecular and phenotypic datasets have been developed covering genetic examination for individualized cancer therapy, high-resolution three-dimensional anatomical images of body parts, histological examination of biopsies, and smartwatches with biosensors for monitoring heart rates and alerting about abnormalities
DOI: 10.1201/9781003104858-1
(Shilo et al. 2020). Such big data supply the raw material for a future of early,
precise diagnosis, customized therapies, and continuous monitoring of
health.
ML could promote health care by unlocking the potential of health big data. Prior applications of ML in diagnosis and care have been promising in detecting breast cancer from X-rays (McKinney et al. 2020), finding novel antibiotics (Stokes et al. 2020), predicting gestational diabetes early from digital medical records (Artzi et al. 2020), and determining patients who share a molecular signature of therapeutic responses (Zitnik et al. 2019). ML digital pattern recognition could tackle complicated health big data in cases where manual exploration is infeasible or ineffective. Several diseases involve complicated alterations that manifest differently across patients and require diligent exploration and cautious evaluation of disparate big data to ascertain unique patterns for diagnosis and management, thereby helping health care providers and researchers find and describe valuable insights in such big data (Rajkomar et al. 2019). The functionality of a new algorithm could be rigorously evaluated against previously proven associations between either quantitative biomarkers or qualitative data (which vary based on demographics and ecological exposures) and patient health outcomes.
More research models have been developed for gathering and assembling big data that relate attributes to a health condition, which could be applied in training and testing ML techniques. Cancer models could accumulate molecular profiles from testing frameworks or patients' samples together with data on diagnosis, treatment, and prognosis; for instance, the Cancer Dependency Map has gathered data on genomic stability, multimodal molecular profiles, and therapeutic responses from thousands of cancer cell lines. By applying innovative algorithms, these models could bring about a shift in knowledge about illnesses and advance the anticipation of health outcomes (Tsherniak et al. 2017).
ML is derived from artificial intelligence (AI), which involves techniques to let machines show learning and reasoning similar to humans; ML itself emphasizes building algorithms that learn from data. Major ML classes involve supervised learning, where datasets are linked to a certain outcome: classification techniques handle categorical outcomes such as "healthy" or "diseased," while regression models handle continuous ones such as levels of response to treatment; semi-supervised or unsupervised techniques, which classify data into specific sets that could then be manually tagged and linked to an outcome; ensemble learning, in which findings from many digital frameworks are integrated to give a final recommendation, enabling more precise estimations via frameworks that scale up to novel data; deep learning (DL), which applies artificial neural networks, inspired by similar networks in the human brain, to identify trends and relations in the data and is valuable for operating on unstructured data like text, speech, or images; and Bayesian learning, where prior information is encoded into
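To make the supervised-learning split above concrete, here is a minimal sketch, assuming synthetic data and scikit-learn (neither is from this chapter): a classifier for a categorical outcome ("healthy" vs. "diseased") and a regressor for a continuous one (level of response to treatment).

```python
# A minimal sketch of the two supervised-learning settings described above,
# using scikit-learn on synthetic data (all values here are invented).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # e.g., 10 biomarker measurements per patient

# Classification: categorical outcome ("healthy" = 0, "diseased" = 1).
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y_class, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
print("classification accuracy:", clf.score(Xte, yte))

# Regression: continuous outcome (level of response to treatment).
y_reg = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=500)
Xtr, Xte, ytr, yte = train_test_split(X, y_reg, random_state=0)
reg = RandomForestRegressor(random_state=0).fit(Xtr, ytr)
print("regression R^2:", reg.score(Xte, yte))
```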
FIGURE 1.1
ML applications in health.
response to therapy. Deep learning (DL) could analyze and interpret medical images, and many recent studies have indicated that programs applying ML, such as computer-aided detection (CAD), can interpret radiological scans comparably to radiologists. For instance, a DL-based CAD program could precisely detect diabetic retinopathy (Gulshan et al. 2016) and detect all grades of in situ or invasive breast cancer comparably to medical experts (McKinney et al. 2020). Therefore, with the help of big data, DL-based techniques could perform similarly to professionals over a spectrum of diagnostic tasks in medical imaging with high precision (Liu et al. 2019).
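To illustrate how such DL-based CAD systems are commonly structured, the sketch below defines a small convolutional classifier for single-channel scan patches; the architecture, input size, and class count are illustrative assumptions, not the published systems cited above.

```python
# A hedged sketch of a convolutional image classifier of the kind used in
# DL-based CAD; layer sizes and input shape are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cad_model(input_shape=(128, 128, 1), n_classes=2):
    # Convolution + pooling blocks extract image features; a dense head
    # maps them to diagnostic classes (e.g., normal vs. suspicious).
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cad_model()
model.summary()  # training would use model.fit(images, labels, ...)
```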
Molecular assays could determine abnormalities in genes and measure levels of gene expression and abundant proteins in different samples such as tissues, saliva, or blood, where ML could identify complicated biomarkers linked to different diseases, inform patients' outcomes, and determine proper management plans. In cancer biology, for instance, applications include using nucleosome positioning (Heitzer et al. 2019) and DNA methylation (Kang et al. 2017) in the blood to estimate the tissue of origin of a tumor, measuring the stage of pathway induction in cells from biopsies or other samples (Way and Greene 2018), applying magnetic resonance imaging to estimate genetic characteristics of brain tumors (P. Chang et al. 2018), and integrating imaging with omics to predict outcomes in cancer patients (Chaudhary et al. 2018; Mobadersany et al. 2018). ML could also identify patients who are deprived of sleep by exploring blood mRNA, indicating the adverse effect of inadequate sleep on health (Laing et al. 2019). With the help of coordinated data from different sources and biomarkers, ML models promise to work more precisely than current techniques, which are usually confined to a small set of biomarkers reflecting only limited aspects of complicated diseases.
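One common way to let ML surface candidate biomarkers from such assays is a sparsity-inducing classifier whose nonzero coefficients point at informative features; the sketch below is a hedged illustration on synthetic gene-expression data, with the gene count and penalty strength assumed.

```python
# A hedged sketch of biomarker identification: an L1-penalized logistic
# regression keeps only a few informative expression features (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_patients, n_genes = 200, 100
expression = rng.normal(size=(n_patients, n_genes))
# Assume, for illustration only, that genes 3 and 7 relate to disease status.
disease = (expression[:, 3] - expression[:, 7] > 0).astype(int)

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(expression, disease)

candidates = np.nonzero(model.coef_[0])[0]
print("candidate biomarker genes:", candidates)  # should include 3 and 7
```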
Cooperative human–machine diagnostic techniques, as shown in Figure 1.1, promise to combine the advantages of both humans and machines: the health care provider arrives at a diagnosis by combining all the accessible data, including those generated through ML platforms (Ahuja 2019). Hence, ML could automate routine diagnoses, flag troublesome cases in need of further human intervention, and offer more valuable data for reaching a diagnosis (Ardila et al. 2019). Integrating knowledge from both health care experts and innovative algorithms would therefore advance diagnoses; however, there is a need to evaluate the biologic utility of these functionalities. Hence, prior to broad installation and implementation, it is vital to ensure the transparency of ML applications regarding the goals, qualities, measurable functionalities, and constraints of particular algorithms and their verification processes (Cai et al. 2019). Assisting health care providers in applying ML programs to reach correct conclusions thus enhances decision-making. ML-based programs could build confidence in the health care system, enabling more knowledge about the implicit biologic pathways in illnesses (Ching et al. 2018).
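The cooperative workflow just described can be approximated with a simple confidence threshold: the model handles predictions it is sure of and routes the rest to a clinician. The sketch below assumes any fitted probabilistic classifier; the function name and the 0.9 threshold are illustrative assumptions.

```python
# A minimal sketch of human-machine triage: automate confident predictions
# and flag uncertain cases for human review (threshold is an assumed value).
import numpy as np

def triage(model, X, threshold=0.9):
    proba = model.predict_proba(X)      # class probabilities per case
    confidence = proba.max(axis=1)
    auto = confidence >= threshold      # cases the model decides itself
    review = ~auto                      # cases routed to a clinician
    return proba.argmax(axis=1), auto, review

# Usage with any fitted scikit-learn classifier `clf` and new cases `X_new`:
# labels, auto_mask, review_mask = triage(clf, X_new)
# print(f"{review_mask.mean():.0%} of cases flagged for human review")
```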
ML, coupled with more advances in techniques for clinical investigations,
requires considering the balance between detection rates for diseases,
patient outcomes, and distinct elements influencing health and quality of
life. Applying ML techniques would elevate the rate of detecting diseases,
and differentiating severe from minor diseases would be necessary to deter-
mine subgroups of diseases, choose highly effective therapies, and eliminate
overtreatment. For ML to promote health care, thoughtfully formulating clin-
ical objectives and the related testing and validation measures are required.
1.2.2 Precision Health
In precision health, a promising ML application, health care is customized
as per the patient’s disease profile. In precision oncology, an initial model
for ML in precision health, cancer is managed according to the molecular
features of the tumor. Lately, molecular biomarkers for a subject, such as levels of gene expression or physical transformations, usually inform the choice
1.2.3 Monitoring Health
Treating complex diseases would shift from curative toward managing approaches. Such a broad approach to managing health would preserve health across several diseases and the natural aging process. Managing health would require continuous monitoring of all health aspects for likely diseases and personalizing therapies according to patients' responses, hence the key role of ML. Figure 1.2 shows the integration of data and ML for inclusive, ongoing, and precision health monitoring, where data collected at both homes and clinics are combined through predictive frameworks. Inclusive frameworks promise better performance since they integrate more personal data and are adaptable to any setting.
Beyond the clinical environment, wearables and intelligent home electronics could be used to manage health by gathering vast quantities of high-quality health-related data for ML techniques to recommend timely interventions, lifestyle changes, or referral to a health care provider for consultation. Today, wearables have built-in biosensors to monitor movement, pulse, respiration rate, oxygen levels, body temperature, and blood pressure, among other indicators.
FIGURE 1.2
Inclusive ML framework.
Test models showed that data from wearables could be used for managing diabetes (Chang et al.
2016), detecting atrial fibrillation (Bumgarner et al. 2018), diagnosing Parkinson's disease early (Lonini et al. 2018), tracking blood cholesterol levels (Fu and Guo 2018), monitoring compliance with therapeutic regimens (Car et al. 2017), and prompting an alert on a cardiac arrest (Sahoo, Thakkar, and Lee 2017). Speech-based home aids could identify agonal breathing, an audible early sign of a heart attack (Chan et al. 2019). Soon, ML applications could detect more biomarkers from audio and wearable sensors' data, perhaps by integrating data across various platforms, where DL and classical supervised learning might build models from such data.
Applying ML to data gathered via smartphones is also promising for diagnostics. DL techniques could analyze smartphone-captured photos to detect various forms of dermal tumors (Esteva et al. 2017) and to identify diabetic retinopathy (Micheletti et al. 2016). Lately, researchers have found that smartphone-gathered sensory data such as voice, response time, and accelerometer data could be processed using ML to monitor symptoms and progression of Parkinson's disease (Ginis et al. 2016). These tested models showed that ML-based wearables, home devices, and smartphones could collect valuable data involving biometric measures, images, dietary intake, and ecological data (Vermeulen et al. 2020). By linking such data with diagnoses, ML could recognize patterns within the data and point to a certain diagnosis.
Managing health implies continuous monitoring of the subject's body functions and behavior via wearables and home devices, coupled with results from regular blood examinations. Using baseline activities and functions, individualized frameworks could be generated by tailoring population-wide frameworks to gathered personal data, enabling the accumulation of individual baselines and the detection of deviations that might reflect changes in health.
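A minimal sketch of the individualized-baseline idea, assuming a daily resting-heart-rate series with invented numbers: each day's reading is compared against the person's own rolling baseline, and strong deviations are flagged. The window size and z-score cutoff are illustrative assumptions.

```python
# A hedged sketch of personal-baseline monitoring: flag days whose resting
# heart rate deviates strongly from the individual's own rolling baseline.
import numpy as np

def flag_deviations(daily_values, window=30, z_cutoff=3.0):
    daily_values = np.asarray(daily_values, dtype=float)
    flags = []
    for i in range(window, len(daily_values)):
        baseline = daily_values[i - window:i]  # this person's recent history
        z = (daily_values[i] - baseline.mean()) / (baseline.std() + 1e-9)
        if abs(z) > z_cutoff:
            flags.append(i)  # a day that might warrant a consultation alert
    return flags

# Synthetic example: a stable resting heart rate that shifts upward on day 60.
rng = np.random.default_rng(2)
heart_rate = rng.normal(62, 1.5, 90)
heart_rate[60:] += 12  # assumed health-related deviation
print("flagged days:", flag_deviations(heart_rate))
```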
Applying individualized frameworks, ML techniques could monitor all abnormalities in a person and alert them when a consultation with a health care provider is warranted. Besides, tracking subjects who search online for their symptoms, for example, self-reported health problems like losing weight, bronchitis, coughing, and chest pain, along with machine-trained multipersonal patterns, could be used to detect pancreatic and lung tumors early (White and Horvitz 2017; Paparrizos, White, and Horvitz 2016), hence notifying a health care professional or the patient to suggest seeking medical care when a more severe condition might underlie the apparently simple searched symptoms. However, several privacy-related problems are expected.
Using ML, health care providers would be able to interpret the highly pre-
cise molecular testing and imaging to recognize significant biomarkers and
reach a final diagnosis. Digital search results and multidimensional modeling
for dissimilar patients would enable diagnosing diseases requiring therapies
and provide informed therapeutic options. Following diagnosis and therapy,
managing health starts again with continuous monitoring of personal health.
Rewarding techniques for sharing data, which enrich the variety of datasets available for learning, are also required, such as local and global standards for sharing data that acquire data from both large hospitals and small clinics. ML applications that enhance patients' therapeutic responses in large hospitals might not suit small clinics because of variability in patients and general care. Yet, the ideal target of gathering health-related data for ML requires acquiring data from properly representative patient populations to build precise ML frameworks that generalize to the diverse public. Hence, focused attempts considering factors like patient condition before therapy, therapeutic regimens, age, sex, race, ethnicity, and ecological risks are needed (Goecks et al. 2020).
A thorough analysis of ML techniques applied in medicine is required, particularly with ongoing learning. The functionality of an ML system is best tested through the precision of longitudinal predictions. An iterative ML technique might involve training with retrospective data, deploying algorithms, and evaluating the precision of the resulting predictions. Deployment-gathered data, and further retrospective big data, could retrain and enhance the algorithm, followed by a cycle of testing for deployment. Assessing ongoing learning systems, like those used in monitoring health, which adapt to changes in behaviors or health conditions, would possibly need strengthening such a cycle and applying deployment-gathered data to identify both barriers and flaws. Besides, measuring confidence intervals is vital, as some ML applications tolerate uncertain predictions better than others; hence, confidence intervals could inform decision making.
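One simple way to attach such confidence measures to a model's output, sketched here on synthetic data with an assumed number of resamples, is the bootstrap: retrain on resampled datasets and report the spread of the resulting predictions.

```python
# A minimal sketch of bootstrap confidence intervals for a single prediction;
# the data, model, and resample count are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=150)
x_new = np.zeros((1, 4))  # the case whose outcome we want to predict

preds = []
for _ in range(200):  # bootstrap resamples
    idx = rng.integers(0, len(X), len(X))  # sample rows with replacement
    m = LinearRegression().fit(X[idx], y[idx])
    preds.append(m.predict(x_new)[0])

lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"95% interval for the prediction: [{lo:.2f}, {hi:.2f}]")
```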
Iteratively training and deploying ML applications faces regulatory limitations, since the majority of diagnostic and therapeutic applications presume fixed data models. Upgrading models with novel data or adapting them for new diagnoses and therapies would require continuous assessment of the reliability and precision of the predictions. Thus, there is a need for real or simulated datasets that are prospective and multidimensional, though costly, for robustly assessing ML applications in medicine.
1.4 Conclusions
ML incorporates advanced algorithms working on disparate big data to reveal valuable trends that even skilled experts would find challenging to discern. ML could promote health care by unlocking the potential of health big data. This chapter discussed how ML could promote health care via better medical diagnostics, precision health, and health monitoring, highlighting opportunities and successful early applications, and ending with challenges that hinder achieving the full potential of ML in health. With growing powerful ML supercomputers and the infrastructure for gathering and
References
Ahuja, Abhimanyu S. 2019. “The Impact of Artificial Intelligence in Medicine on the
Future Role of the Physician.” PeerJ 7 (October). doi:10.7717/peerj.7702
Ardila, Diego, Atilla P. Kiraly, Sujeeth Bharadwaj, Bokyung Choi, Joshua J. Reicher, Lily Peng, Daniel Tse, et al. 2019. "End-to-End Lung Cancer Screening with Three-Dimensional Deep Learning on Low-Dose Chest Computed Tomography." Nature Medicine 25(6): 954–61. doi:10.1038/s41591-019-0447-x
Artzi, Nitzan Shalom, Smadar Shilo, Eran Hadar, Hagai Rossman, Shiri Barbash-Hazan, Avi Ben-Haroush, Ran D. Balicer, Becca Feldman, Arnon Wiznitzer, and Eran Segal. 2020. "Prediction of Gestational Diabetes Based on Nationwide Electronic Health Records." Nature Medicine 26(1): 71–76. doi:10.1038/s41591-019-0724-8
Batmaz, Zeynep, Ali Yurekli, Alper Bilge, and Cihan Kaleli. 2019. “A Review on Deep
Learning for Recommender Systems: Challenges and Remedies.” Artificial
Intelligence Review 52(1): 1–37. doi:10.1007/s10462-018-9654-y
Björnsson, Bergthor, Carl Borrebaeck, Nils Elander, Thomas Gasslander, Danuta
R. Gawel, Mika Gustafsson, Rebecka Jörnsten, et al. 2019. “Digital Twins to
Personalize Medicine.” Genome Medicine 12(1): 4. doi:10.1186/s13073-019-0701-3
Bojarski, Mariusz, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat
Flepp, Prasoon Goyal, Lawrence D. Jackel, et al. 2016. “End to End Learning
for Self-Driving Cars.” ArXiv:1604.07316 [Cs], April. http://arxiv.org/abs/
1604.07316.
Brown, Benjamin P., Yun-Kai Zhang, David Westover, Yingjun Yan, Huan Qiao, Vincent Huang, Zhenfang Du, et al. 2019. "On-Target Resistance to the Mutant-Selective EGFR Inhibitor Osimertinib Can Develop in an Allele Specific Manner Dependent on the Original EGFR Activating Mutation." Clinical Cancer Research,
Ellis, Matthew J., Michael Gillette, Steven A. Carr, Amanda G. Paulovich, Richard D.
Smith, Karin K. Rodland, R. Reid Townsend, et al. 2013. “Connecting Genomic
Alterations to Cancer Biology with Proteomics: The NCI Clinical Proteomic
Tumor Analysis Consortium.” Cancer Discovery 3 (10): 1108–1112. doi:10.1158/
2159-8290.CD-13-0219
Esteva, Andre, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen
M. Blau, and Sebastian Thrun. 2017. “Dermatologist-Level Classification of Skin
Cancer with Deep Neural Networks.” Nature 542 (7639): 115–118. doi:10.1038/
nature21056
Fu, Yusheng and Jinhong Guo. 2018. “Blood Cholesterol Monitoring with Smartphone
as Miniaturized Electrochemical Analyzer for Cardiovascular Disease
Prevention.” IEEE Transactions on Biomedical Circuits and Systems 12(4): 784–790.
doi:10.1109/TBCAS.2018.2845856
Ginis, Pieter, Alice Nieuwboer, Moran Dorfman, Alberto Ferrari, Eran Gazit,
Colleen G. Canning, Laura Rocchi, Lorenzo Chiari, Jeffrey M. Hausdorff, and
Anat Mirelman. 2016. “Feasibility and Effects of Home-Based Smartphone-
Delivered Automated Feedback Training for Gait in People with Parkinson’s
Disease: A Pilot Randomized Controlled Trial.” Parkinsonism & Related Disorders
22 (January): 28–34. doi:10.1016/j.parkreldis.2015.11.004
Goecks, Jeremy, Vahid Jalili, Laura M. Heiser, and Joe W. Gray. 2020. "How Machine Learning Will Transform Biomedicine." Cell 181(1): 92–101. doi:10.1016/j.cell.2020.03.022
Gulshan, Varun, Lily Peng, Marc Coram, Martin C. Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, et al. 2016. "Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs." JAMA 316(22): 2402–2410. doi:10.1001/jama.2016.17216
Heitzer, Ellen, Imran S. Haque, Charles E. S. Roberts, and Michael R. Speicher. 2019. "Current and Future Perspectives of Liquid Biopsies in Genomics-Driven Oncology." Nature Reviews Genetics 20(2): 71–88. doi:10.1038/s41576-018-0071-5
Huang, Cai, Evan A. Clayton, Lilya V. Matyunina, L. DeEtte McDonald, Benedict B.
Benigno, Fredrik Vannberg, and John F. McDonald. 2018. “Machine Learning
Predicts Individual Cancer Patient Responses to Therapeutic Drugs with High
Accuracy.” Scientific Reports 8 (1): 16444. doi:10.1038/s41598-018-34753-5
Kang, Shuli, Qingjiao Li, Quan Chen, Yonggang Zhou, Stacy Park, Gina Lee, Brandon
Grimes, et al. 2017. “CancerLocator: Non-Invasive Cancer Diagnosis and Tissue-
of-Origin Prediction Using Methylation Profiles of Cell-Free DNA.” Genome
Biology 18(1): 53. doi:10.1186/s13059-017-1191-5
Kreimeyer, Kory, Matthew Foster, Abhishek Pandey, Nina Arya, Gwendolyn Halford,
Sandra F. Jones, Richard Forshee, Mark Walderhaug, and Taxiarchis Botsis.
2017. “Natural Language Processing Systems for Capturing and Standardizing
Unstructured Clinical Information: A Systematic Review.” Journal of Biomedical
Informatics 73: 14–29. doi:10.1016/j.jbi.2017.07.012
Kurnit, Katherine C., Ecaterina E. Ileana Dumbrava, Beate Litzenburger, Yekaterina
B. Khotskaya, Amber M. Johnson, Timothy A. Yap, Jordi Rodon, et al. 2018.
“Precision Oncology Decision Support: Current Approaches and Strategies for
the Future.” Clinical Cancer Research: An Official Journal of the American Association
for Cancer Research 24(12): 2719–2731. doi:10.1158/1078-0432.CCR-17-2494
Laing, Emma E., Carla S. Möller-Levet, Derk-Jan Dijk, and Simon N. Archer. 2019.
“Identifying and Validating Blood MRNA Biomarkers for Acute and Chronic
Insufficient Sleep in Humans: A Machine Learning Approach.” Sleep 42 (1).
doi:10.1093/sleep/zsy186
Lasso, Gorka, Sandra V. Mayer, Evandro R. Winkelmann, Tim Chu, Oliver Elliot, Juan
Angel Patino-Galindo, Kernyu Park, Raul Rabadan, Barry Honig, and Sagi D.
Shapira. 2019. “A Structure-Informed Atlas of Human-Virus Interactions.” Cell
178(6): 1526–1541.e16. doi:10.1016/j.cell.2019.08.005
Liu, Xiaoxuan, Livia Faes, Aditya U. Kale, Siegfried K. Wagner, Dun Jack Fu, Alice Bruynseels, Thushika Mahendiran, et al. 2019. "A Comparison of Deep Learning Performance against Health-Care Professionals in Detecting Diseases from Medical Imaging: A Systematic Review and Meta-Analysis." The Lancet Digital Health 1(6): e271–e297. doi:10.1016/S2589-7500(19)30123-2
Lonini, Luca, Andrew Dai, Nicholas Shawen, et al. 2018. "Wearable Sensors for Parkinson's Disease: Which Data Are Worth Collecting for Training Symptom Detection Models." npj Digital Medicine 1: 64. doi:10.1038/s41746-018-0071-z
McKinney, Scott Mayer, Marcin Sieniek, Varun Godbole, Jonathan Godwin, Natasha Antropova, Hutan Ashrafian, Trevor Back, et al. 2020. "International Evaluation of an AI System for Breast Cancer Screening." Nature 577(7788): 89–94. doi:10.1038/s41586-019-1799-6
Metzcar, John, Yafei Wang, Randy Heiland, and Paul Macklin. 2019. “A Review of
Cell-Based Computational Modeling in Cancer Biology.” JCO Clinical Cancer
Informatics 3: 1–13. doi:10.1200/CCI.18.00069
Micheletti, J. Morgan, Andrew M. Hendrick, Farah N. Khan, David C. Ziemer, and
Francisco J. Pasquel. 2016. “Current and Next Generation Portable Screening
Devices for Diabetic Retinopathy.” Journal of Diabetes Science and Technology
10(2): 295–300. doi:10.1177/1932296816629158
Mobadersany, Pooya, Safoora Yousefi, Mohamed Amgad, David A. Gutman, Jill S.
Barnholtz-Sloan, José E. Velázquez Vega, Daniel J. Brat, and Lee A. D. Cooper.
2018. “Predicting Cancer Outcomes from Histology and Genomics Using
Convolutional Networks.” Proceedings of the National Academy of Sciences
115(13): E2970–E2979. doi:10.1073/pnas.1717139115
Paparrizos, John, Ryen W. White, and Eric Horvitz. 2016. “Screening for Pancreatic
Adenocarcinoma Using Signals from Web Search Logs: Feasibility Study and
Results.” Journal of Oncology Practice 12 (8): 737–744. doi:10.1200/JOP.2015.010504
Rajkomar, Alvin, Jeffrey Dean, and Isaac Kohane. 2019. “Machine Learning in
Medicine.” New England Journal of Medicine, April. Massachusetts Medical
Society. www.nejm.org/doi/10.1056/NEJMra1814259.
Sahoo, Prasan Kumar, Hiren Kumar Thakkar, and Ming-Yih Lee. 2017. “A Cardiac
Early Warning System with Multi Channel SCG and ECG Monitoring for Mobile
Health.” Sensors (Basel, Switzerland) 17(4). doi:10.3390/s17040711
Shickel, Benjamin, Patrick Tighe, Azra Bihorac, and Parisa Rashidi. 2018. “Deep
EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic
Health Record (EHR) Analysis.” IEEE Journal of Biomedical and Health Informatics
22(5): 1589–1604. doi:10.1109/JBHI.2017.2767063
Shilo, Smadar, Hagai Rossman, and Eran Segal. 2020. "Axes of a Revolution: Challenges and Promises of Big Data in Healthcare." Nature Medicine 26(1): 29–38. doi:10.1038/s41591-019-0727-5
2
Predictive Analysis for Flood Risk Mapping
Utilizing Machine Learning Approach
Kandivali, India
2 Assistant Professor, Department of Electronics Engineering, Thakur College of Engineering & Technology, Kandivali, India
3 Professor, Department of Electronics Engineering, Thakur College of Engineering & Technology, Kandivali, India
4 Associate Professor, Department of Electronics Engineering, Thakur College of Engineering & Technology, Kandivali, India
CONTENTS
2.1 Introduction................................................................................................... 18
2.1.1 Machine Learning (ML)................................................................... 20
2.1.2 ML Algorithms Used for Flood Risk Mapping............................ 20
2.1.2.1 Artificial Neural Networks............................................... 20
2.1.2.2 Multilayer Perceptron........................................................ 20
2.1.2.3 Adaptive Neuro Fuzzy Inference System
(ANFIS)................................................................................ 21
2.1.2.4 Wavelet Neuro Networks (WNN)................................... 21
2.1.2.5 Support Vector Machine (SVM)....................................... 21
2.2 Methodology................................................................................................. 21
2.2.1 Study Area......................................................................................... 21
2.2.2 Remote Sensing................................................................................. 24
2.2.3 LULC Map Creation......................................................................... 24
2.2.4 State Selection................................................................................... 24
2.2.5 Markov Model.................................................................................. 25
2.2.6 Transition Matrix Calculations....................................................... 25
2.3 Results............................................................................................................ 27
2.4 Conclusion..................................................................................................... 28
DOI: 10.1201/9781003104858-2
2.1 Introduction
Land cover refers to the surface cover of the ground, whether it is vegetation, water, or bare soil. In short, land cover indicates the physical type of the land (water, snow, grassland, soil). Land use refers to human activities that are directly related to the land (agricultural land, canals, built-up land, and other human-made characteristics). Together, they form a pattern called the land use and land cover (LULC) pattern. This pattern is an outcome of socioeconomic and natural factors and their utilization by humans in time and space [1]. Land use and land cover mapping is carried out to study the land utilization pattern and to plan and manage the land resource, either to avoid economic loss due to natural factors or to improve the ecological balance in the system. It is also used for planning and developing land parcels. The assessment of risks and the development of risk maps for future land use and infrastructure development are essential. Therefore, the change in the land use and land cover (LULC) pattern is detected. This pattern is affected by floods that occur in a particular region. Floods can cause devastation to human lives, property, and possessions as well as disruptions in communications. Floods tend to occur when rainfall is very high, absorption is very low, and overflows are not controllable. Some other factors responsible for floods are erratic rainfall, an increase in the number of low-lying areas, rising sea level, poor sewage systems, and low absorption capability of soil. Using predictive analysis and remote sensing methods, floods can be predicted in advance, which will not only help in detecting imminent patterns but will also help us prepare beforehand to tackle such situations and come up with proper contingency plans. We plan to model and predict future impacts as well as the rate of occurrence of floods using image processing and predictive analysis.
Earlier, when no remotely sensed data and computer assistance were available, land use/land cover changes were detected with the aid of tracing papers and topographical sheets. But this method was tedious, inefficient, and inaccurate: studying large areas required a considerable amount of effort and time. Conventional ground methods used for land use mapping are labor-intensive, time-consuming, and performed less frequently [2]. Thus, with the advent of satellite remote sensing techniques, which allow easy collection of data over a variety of scales and resolutions, preparing accurate land use and land cover maps became feasible. Monitoring changes at regular intervals of time also became relatively simpler. In the case of an inaccessible region, the only method of acquiring the required data is by applying this technique. Today, remote sensing and geographical information system (GIS) technology has enabled ecologists and natural resource managers to acquire timely data and observe periodic changes. Predicting floods in any location has remained a longstanding challenge, one that plays a major role in emergency management.
Remote sensing makes it easier to locate floods that have spread over a large region, thereby making it easier to plan rescue missions quickly. Remote sensing is also a relatively cheap and efficient method in comparison to traditional methods. Predictive analytics offers a unique opportunity to identify trends and patterns that can be used to anticipate future outcomes. Implementation of predictive mapping techniques became easier with the advent of predictive modeling techniques and their simple integration with other technologies. Thus, by combining both remote sensing and predictive analysis, a model can be created that will help in the detection of floods well in advance.
High-definition satellite images can be integrated with socioeconomic data in order to build environmental, economic, and social threat prediction models. Using these models, one can build demand-driven applications to help public and private organizations understand, prepare, and respond to economic, social, and humanitarian losses in a timely manner. Such models can be turned into near-real-time remote sensing applications by adding additional layers of response tools, like alerts and advisories. Ideally, these technologies could be scaled up from local to global for emergency management and risk management. Some of the benefits they can offer are disaster risk reduction, event prediction for timely response and recovery, and allowing stakeholders to better target investments and protect assets. With the help of this study, governments can formulate suitable policies that are in the interest of the people and that will also help in attaining ecological stability.
Among cataclysmic events, floods are the most dangerous, causing severe harm to human life, infrastructure, farming, and financial frameworks [1]. Flood risk analysis is undertaken to research land use and is also used in the planning and management of future land resources, either to prevent economic loss due to natural causes or to enhance the system's ecological balance. Using predictive analysis and remote sensing methods, floods can be predicted in advance, which will not only help in detecting imminent patterns but will also help us prepare beforehand to tackle such situations and come up with proper contingency plans. Implementation of predictive mapping techniques became simpler with the advent of predictive modeling techniques and their simple integration with other technologies [1]. Robust and precise prediction contributes significantly to water resource management strategies, policy recommendations and research, as well as further forecasting and evacuation. Governments are under pressure to set up dependable and precise maps of flood risk regions and to prepare further for effective management of floods [1]. Governments will be able to devise suitable policies with the help of this study that are in people's interest and that will eventually help achieve ecological stability.
2.1.2.2 Multilayer Perceptron
For a more sophisticated modeling approach, the backpropagation learning algorithm calculates the propagation error in hidden network nodes individually. Nonlinear activation functions resolve the major drawbacks of a linear activation function by allowing backpropagation, because they have an input-related derivative function [1]. They also allow various layers of neurons to be stacked to create a deep neural network. Because of its many variables, an MLP is considered tougher to optimize.
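As a minimal sketch of the above, the code below trains a small multilayer perceptron whose hidden layers use a nonlinear (ReLU) activation, which is what permits backpropagation through stacked layers; the data and layer sizes are illustrative assumptions.

```python
# A hedged MLP sketch: nonlinear (ReLU) hidden layers trained by
# backpropagation via scikit-learn; sizes are assumed for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 6))  # e.g., six flood-related indicators
y = (np.sin(X[:, 0]) + X[:, 1] ** 2 > 1).astype(int)  # a nonlinear target

mlp = MLPClassifier(hidden_layer_sizes=(32, 16),  # two stacked hidden layers
                    activation="relu",
                    max_iter=1000,
                    random_state=0)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```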
2.2 Methodology
2.2.1 Study Area
Mumbai is the capital of the state of Maharashtra, one of the most populous states. It is located on a peninsula on the western coast of India, bordered by the Arabian Sea to the south and west, Mira Bhayander and Thane to the north, and Navi Mumbai to the east [2]. The study area, that is, Mumbai suburban district, has an area of 446 sq. km and a population of 9,356,962 according to the 2011 census [3].
TABLE 2.1
Literature Review of Work Done by Various Authors over Similar Tracks
1. National Institute of Technology, Rourkela (2011). "Land use and land cover change detection at Sukinda Valley using remote sensing and GIS," Biswajit Majumdar. Inference: mapping of land use and land cover pattern.
2. Journal of Applied Science Environment Management (2018). "Land use/land cover change analysis using Markov-based model for Eleyele Reservoir," Bello Ho. Inference: Markov-based LULC changes with respect to water bodies.
3. International Multidisciplinary Scientific Geo Conference (2018). "Accuracy analysis of the inland waters detection," Marina Gudelj, Mateo Gasparovic, Mladen Zrinjski. Inference: workflow for this project.
4. Landscape and Environment 10 (2016). "Specific features of NDVI, NDWI and MNDWI as reflected in LC categories," Szilard Szabo, Zoltan Gasci, Boglarka Balazs. Inference: NDWI, NDVI, MNDWI coverages for water bodies LULC study.
FIGURE 2.1
Geographical Region of Mumbai Suburban Region.
Thus, the density of Mumbai suburban district is 20,980 people per square kilometer. With elevations ranging from 10 meters to 15 meters, the city generally lies just above sea level, with an average elevation of 14 meters [4, 5]. This makes the city highly prone to flooding and waterlogging. Figure 2.1 shows a geographical map of Mumbai.
2.2.2 Remote Sensing
Earlier, when no remotely sensed data was available, changes in LULC patterns were detected using tracing papers and topographic sheets. But that approach was slow, inefficient, and unreliable, and studying wide areas involved a considerable amount of time, effort, and cost [5, 6]. Conventional solutions are labor-intensive, time-consuming, and less often performed. With the advent of satellite remote sensing techniques, preparing detailed land use and land cover maps and tracking changes at regular intervals of time has become comparatively simpler [7]. In the case of inaccessible areas, the only method of data acquisition is by applying this technique. Quantum geographical information system (QGIS) is the software tool used here for remote sensing purposes.
The bands of the LISS-III satellite and their wavelengths are shown in Table 2.2.
2.2.4 State Selection
Since the Markov model is a stochastic state-based transition model, the selection of states is an important step. The states need to be clearly distinguishable from the available LULC maps, as that will provide greater accuracy [10].
TABLE 2.2
Bands of LISS-III Satellite and Their Respective Wavelengths
2.2.5 Markov Model
We have seen that whenever a sequence of chance experiments forms an independent trials process, the possible outcomes for each experiment are the same and occur with the same probability [11, 12]. Further, knowledge of the outcomes of previous experiments does not influence our predictions for the outcome of the next experiment. The distribution of the outcomes of a single trial is sufficient to construct a tree and a tree measure for a sequence of n trials, and we can answer any probability question about these trials by using this tree measure. Modern probability theory studies chance processes for which knowledge of previous outcomes influences predictions for future experiments. At a fundamental level, whenever we observe a sequence of chance experiments, all of the past outcomes could influence our predictions for the next experiment [6, 12].
A Markov chain calculates how much land is estimated to change from the latest date to the projected date [7]. In this method, the output is the transition probabilities: a matrix that records the likelihood that each land cover class will move to each other class [9, 12]. Through the Markov chain simulation, analysis of LULC images from two different dates creates the transition matrices, a transition area matrix, and a set of conditional probability images. The transition probability matrix for the period 2008–2015 was determined to predict the 2019 LULC map and thus any future date of prediction [13]. The probability of one pixel moving to another LULC class or staying in the original class can be calculated by producing a transition matrix of probabilities. The transition matrix is central to predicting a future classification map [14]. If the tested hypothesis shows no significant differences between the observed LULC and the predicted LULC, the model is considered successful for future predictions [15, 16].
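The transition-probability computation described above can be sketched directly: given two classified maps of the same area at different dates, count pixel class-to-class moves and row-normalize. The toy two-class maps below are invented stand-ins for the study's LULC data.

```python
# A hedged sketch of estimating a Markov transition matrix from two
# classified LULC maps (toy data; classes 0 and 1 stand for LULC classes).
import numpy as np

def transition_matrix(map_t1, map_t2, n_classes):
    counts = np.zeros((n_classes, n_classes))
    for a, b in zip(map_t1.ravel(), map_t2.ravel()):
        counts[a, b] += 1  # one pixel moved from class a to class b
    return counts / counts.sum(axis=1, keepdims=True)  # row-normalize

rng = np.random.default_rng(5)
lulc_2008 = rng.integers(0, 2, size=(100, 100))
# Assume, for illustration, that 20% of pixels change class between dates.
changed = rng.random((100, 100)) < 0.2
lulc_2015 = np.where(changed, 1 - lulc_2008, lulc_2008)

P = transition_matrix(lulc_2008, lulc_2015, n_classes=2)
print("transition matrix:\n", P)
print("two-step transition probabilities:\n", P @ P)  # toward a later date
```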
FIGURE 2.2
State Markov Model.
FIGURE 2.3
Block Diagram of the proposed model.
FIGURE 2.4
NDWI Map of the year 2011.
2.3 Results
Using QGIS software as a tool for remote sensing, the study area was analyzed over a period of five years, from 2011 to 2015. During this analysis, NDWI values were calculated for each year and their corresponding maps were generated. Figure 2.4 shows the NDWI map of the year 2011. The NDWI maps and rainfall data for the years 2011–2015 are shown in Figure 2.5. In order to have a clear picture of the water bodies present in the study area, Band 5 of the LISS-III satellite is used for short-wave infrared sensing. These NDWI values, along with rainfall data for the above-mentioned years, are taken as parameters or variables for carrying out predictive analysis using the Markov chain model. Figure 2.6 shows the NDWI map of the year 2015, whereas Figure 2.7 shows the rainfall data of the year 2015.
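For reference, NDWI is computed per pixel as (Green − NIR)/(Green + NIR), and the modified index (MNDWI) substitutes the short-wave infrared band (here, LISS-III Band 5) for NIR. A minimal sketch follows, assuming the band rasters are already loaded as arrays; the random arrays are placeholders.

```python
# A minimal sketch of per-pixel NDWI/MNDWI computation from band arrays;
# the random arrays below are placeholders for actual LISS-III rasters.
import numpy as np

def ndwi(green, other):
    # (Green - NIR) / (Green + NIR); pass SWIR instead of NIR for MNDWI.
    eps = 1e-9  # avoids division by zero on dark pixels
    return (green - other) / (green + other + eps)

rng = np.random.default_rng(6)
green_band = rng.uniform(0, 1, (100, 100))  # assumed green-band reflectance
swir_band = rng.uniform(0, 1, (100, 100))   # assumed Band 5 (SWIR) reflectance

mndwi = ndwi(green_band, swir_band)
water_mask = mndwi > 0  # a simple threshold: positive values suggest water
print("water fraction:", water_mask.mean())
```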
The Markov model, a stochastic state-based transition model, is used here with two different states. The states need to be clearly distinguishable from the available LULC maps, as that will provide greater accuracy. In this case, two states are decided: the first is the state of being in flood (A), and the second is the state of not being in flood (B).
FIGURE 2.5
Rainfall Data of the year 2011.
So, basically, there are two states, viz. A and B. The transition probabilities are calculated and the transition matrix is formed depending on these states. The transition matrix calculated for the years 2011–2015 is shown in Figure 2.8.
Figure 2.9 shows the implementation of the two-state Markov model using the MATLAB simulation tool. The simulation runs for five time steps, and the transition probabilities of each element are depicted in it.
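A hedged Python equivalent of that simulation is sketched below; the transition probabilities are placeholders, since the study's estimated values appear only in Figure 2.8.

```python
# A hedged sketch of the two-state (A = flood, B = no flood) Markov chain
# simulation; the probabilities below are placeholders, not the study's values.
import numpy as np

P = np.array([[0.3, 0.7],   # P(A->A), P(A->B)  -- assumed values
              [0.1, 0.9]])  # P(B->A), P(B->B)  -- assumed values

state = np.array([0.0, 1.0])  # start in state B (no flood)
for step in range(5):         # five time steps, as in the MATLAB simulation
    state = state @ P
    print(f"step {step + 1}: P(flood) = {state[0]:.3f}")

# Long-run behavior: the stationary distribution pi solves pi = pi @ P.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()
print("stationary P(flood):", round(pi[0], 3))
```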
2.4 Conclusion
The geographic analysis of the study area, viz. coverage area, population, and elevation above sea level, is performed to make a prediction about the possibility of flooding and waterlogging. Superior-quality satellite images are integrated
FIGURE 2.6
NDWI Map of the year 2015.
FIGURE 2.7
Rainfall Data of the year 2015.
FIGURE 2.8a
Two State Markov Models for the year 2011–2015.
FIGURE 2.8b
Two State Markov Models for the year 2011–2015.
FIGURE 2.9
State Markov Model Implementation on MATLAB.
References
[1] Mosavi, A.; Ozturk, P.; Chau, K.-W. Flood Prediction Using Machine Learning
Models: Literature Review. Water 2018, 10, 1536.
[2] www.census2011.co.in/census/district/357-mumbai-city.html
[3] http://pibmumbai.gov.in/English/PDF/E2013_PR798.PDF
[4] Biswajit Majumdar, “Land use and land cover change detection at Sukinda Valley
using remote sensing and GIS”, National Institute of Technology Rourkela, 2011.
[5] Rahel Hamad, Heiko Balzter, Kamal Kolo, "Predicting land use/land cover changes using a CA-Markov model under two different scenarios", Sustainability, 2018.
[6] Jannes Muenchow, Patrik Schratz, Alexander Brenning, “RQGIS: Integrating R
with QGIS for statistical computing,” R Journal, 2017.
[7] Junaida Sulaiman and Siti Hajar Wahab (2018). Heavy Rainfall Forecasting Model Using Artificial Neural Network for Flood Prone Area. In: IT Convergence and Security 2017, Lecture Notes in Electrical Engineering, 449, Springer, Singapore, pp. 68–76.
[8] S. Khatri and H. Kasturiwale, "Quality assessment of Median filtering techniques
for impulse noise removal from digital images," 2016 3rd International
Conference on Advanced Computing and Communication Systems (ICACCS),
2016, pp. 1–4, doi: 10.1109/ICACCS.2016.7586331.
[9] J. Abbot, J. Marohasy, "Input selection and optimization for monthly rainfall forecasting in Queensland, Australia, using artificial neural networks." Atmospheric Research, pp. 166–178, 2014.
[10] A. Jain, S. Prasad Indurthy, Closure to "Comparative analysis of event-based rainfall-runoff modeling techniques—deterministic, statistical, and artificial neural networks." Journal of Hydrologic Engineering, vol. 9, pp. 551–553, 2004.
[11] R.C. Deo, M. Sahin, Application of the artificial neural network model for pre-
diction of monthly standardized precipitation and evapotranspiration index
using hydrometeorological parameters and climate indices in eastern Australia.
Atmospheric Research, pp. 65–81, 2015.
[12] M.K. Tiwari, C. Chatterjee, Development of an accurate and reliable hourly flood forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. Journal of Hydrology, pp. 458–470, 2010.
[13] C.A. Guimarães Santos, G.B.L.d. Silva. Daily streamflow forecasting using a
wavelet transform and artificial neural network hybrid models. Hydrological
Sciences Journal, pp. 312–324, 2014.
[14] S. Supratid, T. Aribarg, S. Supharatid. An integration of stationary wavelet
transform and nonlinear autoregressive neural network with exogenous
input for baseline and future forecasting of reservoir inflow. Water Resources
Management, vol. 31, pp. 4023–4043, 2017.
[15] E. Dubossarsky, J.H. Friedman, J.T. Ormerod, M.P. Wand. Wavelet-based gradient boosting. Statistics and Computing, vol. 26, pp. 93–105, 2016.
[16] K. Kasiviswanathan, J. He, K. Sudheer, J.-H. Tay. Potential application of wavelet neural network ensemble to forecast streamflow for flood management. Journal of Hydrology, pp. 161–173, 2016.
[17] L.A. Zadeh. Soft computing and fuzzy logic. In Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers by Lotfi A. Zadeh, World Scientific: pp. 796–804, 1996.
[18] P.S. Yu, T.C. Yang, S.Y. Chen, C.M. Kuo, H.W. Tseng. Comparison of random forests and support vector machine for real-time radar-derived rainfall forecasting. Journal of Hydrology, vol. 552, pp. 92–104, 2017.
[19] S. Li, K. Ma, Z. Jin, Y. Zhu. A new flood forecasting model based on SVM and boosting learning algorithms. In Evolutionary Computation (CEC), IEEE Conference, pp. 1343–1348, 2016.
[20] E. Danso-Amoako, M. Scholz, N. Kalimeris, Q. Yang, J. Shao. Predicting dam failure risk for sustainable flood retention basins: A generic case study for the wider greater Manchester area. Computers, Environment and Urban Systems, vol. 36, pp. 423–433, 2012.
[21] Y.B. Dibike, S. Velickov, D. Solomatine, M.B. Abbott. Model induction with support vector machines: Introduction and applications. Journal of Computing in Civil Engineering, vol. 15, pp. 208–216, 2001.
[22] Rahel Hamad, Heiko Balzter, Kamal Kolo, Predicting land use/ land cover
changes using a CA-Markov model under two different scenarios, 2018.
3
Machine Learning for Risk Analysis
CONTENTS
3.1 Introduction................................................................................................... 35
3.1.1 Machine Learning............................................................................. 36
3.1.2 Risk Analysis..................................................................................... 37
3.2 Risk Assessment and Risk Management................................................... 40
3.2.1 Risk Management............................................................................. 40
3.3 Risks in the Business World........................................................................ 41
3.3.1 Models in the Field of Risk Management..................................... 43
3.4 Machine Learning Techniques for Risk Management and
Assessment.................................................................................................... 44
3.4.1 Challenges of Machine Learning in Risk Management.............. 46
3.4.2 Machine Learning Use Cases in the Financial Sector.................. 48
3.5 Case Studies to Understand the Role of Machine Learning in
Risk Management......................................................................................... 49
3.6 Conclusion..................................................................................................... 51
3.1 Introduction
Machine learning (ML) is a new development in technology with the capability of replacing the explicit programming of devices. It generally hinges on the concept that a substantial amount of data is provided, and on the basis of that data and some algorithms, the machine is trained
DOI: 10.1201/9781003104858-3
and different machine models are built. The decisions made by the machines on the basis of the data provided are immensely efficient and accurate. With increased use of technology, ingenious crimes and risks are also escalating. In the modern world, risk management is one of the most important elements to take care of, and its importance is intensifying. This chapter provides information regarding how machine learning has been applied in risk assessment (Apostolakis, 2004; Aven, 2012).
Large-scale organizations, companies, and institutions are prone to risks like fraud; consequently, they are turning to various machine learning techniques that can prevent or abate these frauds or risks. The chapter is an amalgamation of various strategies for changing the perspective of risk assessment. On the basis of risk assessment with machine learning, various case studies covering different aspects are presented in the chapter. These case studies effectively portray the seriousness of risk assessment, elucidate techniques for diminishing risks, and act as a reference for the future. Several applications are also incorporated in this chapter, which lead us to the significance of risk assessment in several industries and organizations for protection from fraud and deception, together with the integration of machine learning in this particular field (Chen, 2008; Cheng, 2016).
The essence of this chapter relates to the arguments that provide solutions for risks found in the literature, followed by a conclusion that covers machine learning in the field of risk management as a whole.
3.1.1 Machine Learning
Machine learning can be described as a subcategory of artificial intelligence (AI) whose main focus is examining and recognizing patterns and arrangements in data to facilitate features such as training, thinking, decision making, learning, and researching without human interference. Machine learning allows the user to feed an enormous amount of data to a computer algorithm and lets the computer examine and analyze the data to make recommendations based on the input received. If some features require redesigning, they are classified and improved for a better future design (Creedy, 2011; Comfort, 2019).
The main aim of the technology is to produce an easy-to-use mechanism that makes decisions via a computational algorithm. Variables, algorithms, and innovations are accountable for making decisions. Insight into how a solution is derived is needed to better understand the working of such systems and the path taken to reach the result.
In the initial stages of implementing the algorithm, input data for which the result is already known is fed to the machine. Adjustments are then made until the machine produces that result (Diekmann, 1992; Durga, 2009; Goodfellow, 2016).
FIGURE 3.1
Machine learning algorithm process.
The efficiency of the result depends on the amount and quality of the data fed to the machine, as shown in Figure 3.1.
3.1.2 Risk Analysis
Risk refers to the occurrence of undesired events while running a project, which creates a negative impact on the achievement of the goals of that project. Risk is the possibility of an unwanted or harmful event. Risk analysis is done by adopting different methods to check the probability of a risk occurring, and it is performed in order to remove or minimize the chances of harmful events occurring. Risk analysis is desirable because the occurrence of risk is directly proportional to the amount of loss faced in a project. Every technology or project has a chance of succeeding as well as failing, and the analysis is done to reduce the chance of failure, which is called the risk of failure (Kaplan 1981; Hastie, 2009; Hauge, 2015; Haugen, 2015; Khakzad, 2015).
Risk analysis includes identifying the type of hazard that can be associated with the project, which may be a chemical, mechanical, or even technical hazard. It is an analytical process whose aim is to know all the desired information related to undesirable events. It involves an analysis of the hazards that have occurred in the past and also those that have a probability of occurring in the future (Khakzad, 2013a; 2013b). It analyzes not only the hazards but also their consequences for the system so that appropriate measures can be taken. The main objective is to increase the chances of success and at the same time minimize the cost or investment in the project. As important as risk analysis is, it is even more difficult to perform. The traditional method of using employees requires a lot of time to complete the process (King, 2001; Kongsvik, 2015; Landucci, 2016a, 2016b).
But now, with the growth of industries, the amount of data generated is very high, more operations are performed on a project than before, and the time to complete that project is shorter. So, in order to meet the demand-versus-time ratio, industries use technical and more reliable methods such as ML. Risk analysis can be done by the following methods:
(i) Hazard identification: The first step toward risk analysis is identifying the problems, risks, or hazards that may be faced in the future. The number of hazards determines the level of risk for the project: the more hazards there are, the higher the risk.

FIGURE 3.2
Components of risk analysis.
3.2.1 Risk Management
The next step is to manage the risks that have been assessed and analyzed. Different strategies are used to manage risks, and they aim to remove the negative effects of a risk and increase productivity. Some people try to manage risk by avoiding it, but that is not the correct way to deal with hazards: if they are not addressed at an early stage, they grow into something serious. The correct way to manage risks is, first of all, to accept them (Paltrinieri, 2019). Risk management is done on the basis of priority: the risk with the highest probability of occurrence, or the risk that can cause the greatest damage, is dealt with first, and risks expected to cause smaller losses follow in descending order. Hence, risks must be classified on the basis of loss and probability.
Generally, the amount of loss is given more weight than the frequency of occurrence. Management is the most important step, because if risks are not managed properly there is no point in identifying and assessing them. Unmanaged risks reduce a project's profit, durability, quality, and reliability; they can also damage a company's brand value and erode users' trust. The limitations of this phase are the limited techniques and resources available to manage the hazards and the added cost of risk management. The steps in risk management are:
(i) Identifying the risk, its source, and its domain
(ii) Considering its impact on the system
(iii) Estimating its probability of occurrence
(iv) Assessing its impact on effective cost
(v) Assessing its consequences for the project
(vi) Classifying risks on the basis of priority
(vii) Assessing the constraints that may be faced
(viii) Describing user needs and the agenda for the activity
(ix) Discussing and communicating to decide on the management technique
(x) Developing an analytical approach to manage the risk
(xi) Organizing the resources and budget required for the management process
Thus, risk management is based on the results of risk assessment, and the appropriate techniques are then chosen by managers.
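As an illustration of the priority-based classification described above, the following sketch ranks a small, invented risk register by expected loss (probability times loss), breaking ties by loss size to reflect the chapter's note that the amount of loss is generally weighted more heavily than frequency of occurrence. The register and the scoring rule are assumptions for illustration, not a method prescribed in this chapter.

```python
# Illustrative risk register: each risk has an occurrence probability
# and an estimated loss if it occurs (units are arbitrary).
risks = [
    {"name": "supplier delay",  "probability": 0.30, "loss": 40_000},
    {"name": "data breach",     "probability": 0.05, "loss": 500_000},
    {"name": "equipment fault", "probability": 0.20, "loss": 120_000},
]

# Score each risk by expected loss; break ties by raw loss, since
# loss size is weighted more heavily than frequency of occurrence.
for risk in risks:
    risk["expected_loss"] = risk["probability"] * risk["loss"]

ranked = sorted(risks, key=lambda r: (r["expected_loss"], r["loss"]),
                reverse=True)
for rank, risk in enumerate(ranked, start=1):
    print(rank, risk["name"], risk["expected_loss"])
```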
standards, the old methods of risk management and analysis require modification. The major risks that firms face are the following:
trouble can be caused if a payment is not made on time, and the same applies when a company is deep in debt. International businesses face huge financial risks. Returning to the example of cosmetics, where a person wanted to sell the products in the international market: if the person tries to sell the goods in the United States, the UK, France, and India, the company may have to bear the conversion charges of these different currencies. This kind of exposure falls under financial risk.
(v) Reputational risk: One thing common to all businesses, whether a small firm or a multinational, is reputation. If the reputation is positive, selling products and recruiting employees become easier. If the reputation is damaged, revenue drops immediately, employees may leave the company, advertising agencies may lose interest in working with it, and so on (Yang, 2015).
(c) Equal opportunity: the true-positive rate is the same for each protected class.
(d) Equalized odds: the true-positive and false-positive rates are both the same for each protected class.
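Both criteria can be checked directly from per-class confusion-matrix rates. The following minimal sketch, with invented labels, predictions, and group membership, computes the true-positive and false-positive rate for each protected class; equal opportunity compares only the TPRs, while equalized odds compares both rates.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Return (TPR, FPR) for the rows belonging to one protected class."""
    y_true, y_pred = np.asarray(y_true)[group], np.asarray(y_pred)[group]
    tpr = np.mean(y_pred[y_true == 1] == 1)   # true-positive rate
    fpr = np.mean(y_pred[y_true == 0] == 1)   # false-positive rate
    return tpr, fpr

# Hypothetical outcomes for two protected classes A and B.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
is_a   = np.array([True, True, True, True, False, False, False, False])

tpr_a, fpr_a = group_rates(y_true, y_pred, is_a)
tpr_b, fpr_b = group_rates(y_true, y_pred, ~is_a)
# Equal opportunity compares only TPRs; equalized odds compares both rates.
print(f"A: TPR={tpr_a:.2f} FPR={fpr_a:.2f}; B: TPR={tpr_b:.2f} FPR={fpr_b:.2f}")
```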
(iii) Feature engineering: This is the practice of transforming data to create inputs that machine learning algorithms can use. It means selecting significant characteristics from a fresh pool of data and converting them into forms fit for machine learning. Feature engineering makes ML model development more complicated than traditional modeling, first because machine learning models can combine far more information, and second because unstructured data sources require feature engineering at the preprocessing stage before training can start.
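As a small illustration of this step, the sketch below (assuming pandas and scikit-learn; the transaction records and derived features are invented) scales a numeric column, one-hot encodes a categorical one, and derives a new indicator feature from a timestamp.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw pool of transaction records.
raw = pd.DataFrame({
    "amount":    [120.0, 15.5, 900.0, 43.2],
    "channel":   ["web", "atm", "web", "pos"],
    "timestamp": pd.to_datetime(["2022-01-03 09:15", "2022-01-03 23:40",
                                 "2022-01-04 02:05", "2022-01-04 14:30"]),
})

features = pd.DataFrame(index=raw.index)
# Numeric feature rescaled so magnitudes are comparable across columns.
features["amount_scaled"] = StandardScaler().fit_transform(
    raw[["amount"]]).ravel()
# Categorical feature converted to one-hot indicator columns.
features = features.join(pd.get_dummies(raw["channel"], prefix="channel"))
# Derived feature: transactions at unusual hours are often informative.
features["night_hours"] = raw["timestamp"].dt.hour.isin(range(0, 6)).astype(int)
print(features)
```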
(iv) Hyperparameters: Hyperparameters are the variables that define the system's composition and govern how the network is trained. It is very important to understand these variables and select them appropriately; common approaches to hyperparameter selection combine current industry practice with expert judgment.
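One common way to operationalize this selection is a cross-validated grid search over candidate values. The sketch below uses scikit-learn's GridSearchCV on a synthetic dataset; the candidate grid is an invented example of values that might come from industry practice and expert judgment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Candidate hyperparameter values, e.g. drawn from industry practice
# and expert judgment as the text suggests.
grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# 5-fold cross-validation scores every combination in the grid.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid=grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```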
(v) Production readiness: Machine learning models, despite being algorithmic, require a great deal of computation, an aspect that must be considered during model development. Validation has previously been used to evaluate the variety of model risks connected with a model's usage, and for machine learning that scope expands. Limits must be set on the data flowing through the model, keeping in mind the model's run time and architecture.
(vi) Dynamic model calibration: In some types of models, the parameters change dynamically to reflect patterns in the data. Validators must decide which form of dynamic calibration is best suited to the firm, evaluating factors such as the development of a monitoring plan and controls that keep risk in line with the model's intended use.
ML techniques produce models that are quite complex in nature, as they use many variables and other parameters relative to the output generated, which makes them heavy as well as complex, as shown in Figure 3.3. Owing to this complexity of design, interpretation time also increases.

FIGURE 3.3
Machine learning assessment model techniques.
TABLE 3.1
ML Algorithms for Risk Management

Risk type         ML algorithms
Credit risk       Neural networks, SVM, k-nearest neighbor, random forest,
                  lasso regression, cluster analysis
Liquidity risk    SVM, ANN, Bayesian networks
Market risk       GELM, cluster analysis, SOM, Gaussian mixtures
Operational risk  Nonlinear clustering, neural networks, k-nearest neighbor,
                  naïve Bayes, decision tree
RegTech risk      SVM
(i) Fraud detection: Machine learning is now widely used in fraud detection, mainly to detect credit card fraud in the banking system, where it has achieved a significantly higher success rate than earlier approaches. Banks have set up various monitoring and surveillance systems to ensure security; these systems track the payment fraud that occurs frequently. A traditional fraud-model engine applies rules learned from historical payments and blocks a fraudulent trade as soon as it is detected.
Training, testing, and validating machine learning algorithms becomes possible because credit card payments provide large data sets. Classification algorithms are trained on historical data carrying identified fraud and nonfraud tags, most of it nonfraudulent. These historical payment data sets give a clear view of a card's characteristics, distinguished by transactions, owner, and payment history.
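Production fraud engines are proprietary, but the underlying pattern, training a classifier on historical payments tagged fraud/nonfraud where fraud is rare, can be sketched as follows. The features and the synthetic fraud rate are assumptions for illustration; class weighting is one common way to handle the heavy nonfraud majority.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 5000
# Hypothetical historical payments: amount, hour of day, merchant-risk score.
X = np.column_stack([rng.lognormal(3, 1, n), rng.integers(0, 24, n),
                     rng.random(n)])
# Fraud is rare: only a few percent of rows carry the "fraud" tag here.
y = (rng.random(n) < 0.02 + 0.1 * (X[:, 2] > 0.9)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
# class_weight="balanced" counteracts the heavy nonfraud majority.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```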
(ii) Credit risk: The term "credit risk" denotes the prospect of loss due to a borrower failing to make payments. A credit risk management system (CRMS) is therefore implemented to absorb losses by analyzing the bank's soundness and capital, along with its loan-loss provisions, at any instant. Such systems have improved the transparency demanded after the financial crises, with greater emphasis placed on regularly examining what banks know about their customers and credit risks.
Suitable credit risk management improves the overall performance of the system and provides a competitive benefit. Various prediction models are built to predict what kind of borrower an applicant will be, and this is where machine learning techniques are applied; they yield positive results and high success rates on problems that include credit risk.
(iii) Supervision of conduct and market abuse in trading: Another use of machine learning in risk management is surveillance of the conduct loopholes created mainly by traders in financial institutions. Trading irregularities often lead to economic as well as reputational damage. To counter such misconduct, numerous automated systems have been developed to monitor dealers' trading behavior with increased accuracy and distinctive ways of identifying abuse.
On average, banks are exposed to various risks, such as credit risk, market risk, foreign exchange risk, foreign risk, and fraud. These risks are treacherous for a bank's reputation and its profitable operations; they are latent losses for the banks, which can be held to account for the obligations upon them. Risks are unpredictable and can develop at any time. How useful would it be if a risk could be specified or predicted? To anticipate upcoming risks, machine learning can be applied. All possible risks are first analyzed and reviewed, and during this analysis a large amount of data, mostly unstructured, is collected from market information, customer reviews, metadata, and similar sources.
Machine learning is all about working with data: it processes data so that machines can act on it, and the machine produces the desired output on the basis of the processed input. When machine learning algorithms are applied to this large amount of unstructured data, the machine works from the data and generates an output, and that output can serve as a prediction of risks that might arise in the near future.
For the system to give accurate results, the large amount of unstructured data collected must be prepared properly. Once this preparation is finished, the data is converted into structured, labeled data, and the various algorithms are run on this final data set. The algorithms mould the data into a processing framework from which the final output is generated, and the output belongs to the same category as the data collected.
FIGURE 3.4
Role of machine learning in risk management for disaster management.
A disaster can be defined as a natural affliction that can lead to human, environmental, and property losses. Disasters are unpredictable, but several technologies are now used to predict an approximate time for such calamities. The labeled data collected for machine learning is used to predict disasters as a sequence of occurrences, and various processes can forecast disasters on the basis of their probability of occurrence, as shown in Figure 3.4. During these processes several errors may creep in; indicators of such errors include the misfiring of an appliance or apparatus, excess maintenance, the amount of supplementary work required, and so on.
Everyone relies on the final output of the machines that predict upcoming disasters, so errors in these indicators cannot be afforded. Consequently, risk assessment in fields like disaster management becomes necessary to protect a specific area from sudden tragedy and irreplaceable losses, and machine learning includes several techniques that guard against such risks in this area.
For all the issues and problems, looking at the positives of a situation is the more crucial exercise, and the future of machine learning in risk management looks worthwhile. This is evident from the fact that development costs have been reduced, and the time-consuming nature of the process has also decreased markedly. One such instance is Banco Bilbao Vizcaya Argentaria (BBVA), a financial services company in Spain.
3.6 Conclusion
Throughout this chapter, the various aspects and techniques of machine learning engaged in risk assessment to improve the process have been discussed, and the importance of machine learning in risk assessment has been duly acknowledged. Machine learning has the potential to work wonders in risk assessment by incorporating methodical procedures and models that produce accurate risk analyses through effective monitoring of large and complex datasets. Applying machine learning in this sector leads to the conclusion that these techniques can be used to analyze huge amounts of data with efficient predictive analysis. Several use cases have also been discussed, such as supervision of conduct and market abuse in trading, fraud detection, and credit risk.
The models and case studies covered in the chapter serve as a blueprint for the use of machine learning across industries and organizations, in fields such as risk management in banks and in disaster management. The major issue technology can address is countering risk, since machine learning provides models and techniques that can minimize or prevent such risks. These techniques and models change with different sets of labeled data as the procedure requires. They bring stability to the field and help protect personal and national information that must be treated as highly confidential. Research in risk assessment is still ongoing, and several measures have already been implemented, yet many areas remain untouched by machine learning. It can ease the burden on many industries and organizations, since potential risks can be identified and suitable methods employed when the probability of a risk is high.
In the end, one can say that humans are entering an era in which even complex problems will be solved easily and efficiently, saving both time and money. The day is not far off when technologies such as AI and ML will be employed as a solution to almost every problem.
References
Apostolakis, G.E. 2004. How useful is quantitative risk assessment? Risk Analysis
24: 515–520.
Aggarwal, P.K., Grover, P.S., Ahuja, L. 2018. Exploring quality aspects of smart mobile
phone applications. Journal of Advanced Research in Dynamical and Control Systems
11: 292–297.
Aggarwal, P.K., Grover, P.S., Ahuja, L. 2019. Assessing quality of mobile applications
based on a hybrid MCDM approach. International Journal of Open Source Software
and Processes, 10: 51–65.
Aven, T. 2012. The risk concept—historical and recent development trends. Reliab.
Eng. Syst. Saf. 99: 33–44.
Aven, T., Krohn, B.S. 2014. A new perspective on how to understand, assess and
manage risk and the unforeseen. Reliab. Eng. Syst. Saf. 121:1–10.
Bucelli, M., Paltrinieri, N., Landucci, G., 2018. Integrated risk assessment for oil and
gas installations in sensitive areas. Ocean Eng. 150: 377–390.
Chen, H., Moan, T., Verhoeven, H. 2008. Safety of dynamic positioning operations on
mobile offshore drilling units. Reliab. Eng. Syst. Saf. 93: 1072–1090.
Cheng, H.-T., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhye, H., Anderson,
G., Corrado, G., Chai, W., Ispir, M. 2016. Wide and deep learning for recom-
mender systems. In: Proceedings of the 1st Workshop on Deep Learning for
Recommender Systems. ACM, pp. 7–10.
Comfort, L.K. 2019. The Dynamics of Risk: Changing Technologies, Complex Systems, and
Collective Action in Seismic Policy. Princeton, NJ: Princeton University Press.
Creedy, G.D. 2011. Quantitative risk assessment: how realistic are those frequency
assumptions? J. Loss Prev. Process Ind. 24: 203–207.
De Marchi, B., Ravetz, J.R. 1999. Risk management and governance: a post-normal
science approach. Futures 31: 743–757.
Durga Rao, K., Gopika, V., Sanyasi Rao, V.V.S., Kushwaha, H.S., Verma, A.K., Srividya,
A. 2009. Dynamic fault tree analysis using Monte Carlo simulation in probabil-
istic safety assessment. Reliab. Eng. Syst. Saf. 94: 872–883.
Diekmann, E.J. 1992. Risk analysis: lessons from artificial intelligence. Int. J. Proj.
Manag. 10, 75–80.
Goodfellow, I.J., Bengio, Y., Courville, A. 2016. Deep Learning. The MIT Press,
Cambridge, MA.
Hastie, T., Tibshirani, R., Friedman, J. 2009. Unsupervised learning. In: The Elements of
Statistical Learning. Springer, Berlin, pp. 485–585.
Haugen, S., Vinnem, J.E. 2015. Perspectives on risk and the unforeseen. Reliab. Eng.
Syst. Saf. 137: 1–5.
Jain, P., Sharma, A., Ahuja, L. 2018. The impact of agile software development pro-
cess on the quality of software product. In: Proceedings of the International
Conference on Reliability, Infocom Technologies and Optimization (Trends and
Future Directions) (ICRITO), pp. 812–815.
Jain, P., Sharma, S. 2019. Prioritizing factors used in designing of test cases: An ISM-
MICMAC based analysis. In: Proceedings of International Conference on Issues
and Challenges in Intelligent Computing Techniques (ICICT), India.
Jain, P., Sharma, A., Aggarwal, P.K. 2020. Key attributes for a quality mobile applica-
tion. In Proceedings of the International Conference on Cloud Computing, Data
Science and Engineering Confluence, 50–54.
Kaplan, S., Garrick, B.J., 1981. On the quantitative definition of risk. Risk Anal.
1: 11–27.
Khakzad, N. 2015. Application of dynamic Bayesian network to risk analysis of
domino effects in chemical infrastructures. Reliab. Eng. Syst. Saf. 138: 263–272.
Khakzad, N., Khan, F., Amyotte, P. 2013a. Risk-based design of process systems using
discrete-time Bayesian networks. Reliab. Eng. Syst. Saf. 109: 5–17.
Khakzad, N., Khan, F., Amyotte, P. 2013b. Quantitative risk analysis of offshore
drilling operations: a Bayesian approach. Saf. Sci. 57: 108–117.
King, G., Zeng, L. 2001. Logistic regression in rare events data. Polit. Anal. 9: 137–163.
Kongsvik, T., Almklov, P., Haavik, T., Haugen, S., Vinnem, J.E., Schiefloe, P.M. 2015.
Decisions and decision support for major accident prevention in the process
industries. J. Loss Prev. Process Ind. 35: 85–94.
Landucci, G., Paltrinieri, N. 2016a. A methodology for frequency tailorization
dedicated to the Oil & Gas sector. Process Saf. Environ. Prot. 104: 123–141.
Landucci, G., Paltrinieri, N. 2016b. Dynamic evaluation of risk: from safety indicators
to proactive techniques. Chem. Eng. Trans. 53: 169–174.
Lasi, H., Fettke, P., Kemper, H.-G., Feld, T., Hoffmann, M. 2014. Industry 4.0. Bus. Inf.
Syst. Eng. 6: 239–242.
Musgrave, G.L. 2013. James Owen Weatherall, the physics of Wall Street: A brief his-
tory of predicting the unpredictable. Bus. Econ. 48: 203–204.
Nivolianitou, Z.S., Leopoulos, V.N., Konstantinidou, M. 2004. Comparison of
techniques for accident scenario analysis in hazardous systems. J. Loss Prev.
Process Ind. 17: 467–475.
Nobre, F.S., 2009. Designing Future Information Management Systems. IGI Global.
Noh, Y., Chang, K., Seo, Y., Chang, D. 2014. Risk-based determination of design
pressure of LNG fuel storage tanks based on dynamic process simulation
combined with Monte Carlo method. Reliab. Eng. Syst. Saf. 129: 76–82.
Nývlt, O., Haugen, S., Ferkl, L. 2015. Complex accident scenarios modelled and
analysed by stochastic Petri nets. Reliab. Eng. Syst. Saf. 142: 539–555.
Nývlt, O., Rausand, M. 2012. Dependencies in event trees analyzed by Petri nets.
Reliab. Eng. Syst. Saf. 104: 45–57.
Øien, K., Utne, I.B., Herrera, I.A. 2011. Building safety indicators: Part 1 – theoretical foundation. Safety Sci. 49: 148–161.
Paltrinieri, N., Khan, F., Cozzani, V. 2015. Coupling of advanced techniques for
dynamic risk management. J. Risk Res. 18: 910–930.
Paltrinieri, N., Reniers, G. 2017. Dynamic risk analysis for Seveso sites. J. Loss Prev
Process Ind. 49: Part A: 111–119.
Paltrinieri, N., Comfort, L., Reniers, G. 2019. Learning about risk: Machine learning for risk assessment. Safety Sci. 118: 475–486.
Pasman, H., Reniers, G. 2014. Past, present and future of quantitative risk assessment
(QRA) and the incentive it obtained from land-use planning (LUP). J. Loss Prev.
Process Ind. 28: 2–9.
Svozil, D., Kvasnicka, V., Pospichal, J. 1997. Introduction to multi-layer feed-forward
neural networks. Chemom. Intell. Lab. Syst. 39: 43–62.
Tanwar, S., Bhatia, Q., Patel, P., Kumari, A., Singh, P.K., Hong, W.C. 2020. Machine
learning adoption in blockchain-based smart applications: The challenges, and
a way forward. IEEE Access. 8: 474–488.
Villa, V., Paltrinieri, N., Khan, F., Cozzani, V. 2016a. Towards dynamic risk analysis: a
review of the risk assessment approach and its limitations in the chemical pro-
cess industry. Saf. Sci. 89. https://doi.org/10.1016/j.ssci.2016.06.002
Villa, V., Paltrinieri, N., Khan, F., Cozzani, V., 2016b. A short overview of risk analysis
background and recent developments. In: Dynamic Risk Analysis in the Chemical
and Petroleum Industry: Evolution and Interaction with Parallel Disciplines in
the Perspective of Industrial Application. 10.1016/B978-0-12-803765-2.00001-9.
Wolpert, D.H., 2002. The supervised learning no-free-lunch theorems. Soft Comput.
Ind. 25–42.
Yang, X., Haugen, S., 2015. Classification of risk to support decision-making in haz-
ardous processes. Safety Sci. 80: 115–126.
4
Machine Learning Techniques Enabled Electric Vehicle
CONTENTS
4.1 Introduction................................................................................................... 56
4.1.1 Artificial Intelligence Technology to Enhance EV
Production and Support the Deployment of
Electric Vehicles................................................................................ 56
4.1.2 Artificial Intelligence Used to Supercharge Battery
Development for Electric Vehicles................................................. 58
4.1.3 A Smarter Approach to Battery Testing........................................ 58
4.1.4 Wider Applications.......................................................................... 59
4.1.5 A Review on AI-based Predictive Battery Management
System for E-Mobility...................................................................... 60
4.2 Reverse Engineering with AI for Electric Power Steering...................... 61
4.3 Artificial Intelligence Technology to Enhance EV Production and
Support the Deployment of Electric Vehicles........................................... 62
4.4 Artificial Intelligence Used to Supercharge Battery
Development for Electric Vehicles............................................................. 63
4.5 How AI Helps Build Better Batteries......................................................... 65
4.6 Machine Learning Supercharges Battery Development......................... 67
4.7 AI-Based Predictive Battery Management System for E-Mobility........ 67
4.8 Uses of Artificial Intelligence for Electric Vehicle
Control Applications.................................................................................... 69
4.9 Conclusion..................................................................................................... 71
DOI: 10.1201/9781003104858-4
4.1 Introduction
Machine learning is expected to play a vital role in the coming industrial revolution, and the evolution of ML and AI has great implications for the development of electric vehicles in various ways. Battery performance can make or break the electric vehicle experience, from driving range to charging time to the lifetime of the car. Artificial intelligence has now made dreams like recharging an EV in the time it takes to stop at a gas station a more likely reality, and it could help improve other aspects of battery technology.
For decades, advances in electric vehicle batteries have been limited by a
major bottleneck: evaluation times. At every stage of the battery develop-
ment process, new technologies must be tested for months or even years to
determine how long they will last. But now, a team led by Stanford professors
Stefano Ermon and William Chueh has developed a machine learning-based
method that slashes these testing times by 98 percent. Although the group
tested their method on battery charge speed, they said it can be applied
to numerous other parts of the battery development pipeline and even to
nonenergy technologies.
“In battery testing, you have to try a massive number of things, because
the performance you get will vary drastically,” said Ermon, an assistant pro-
fessor of computer science. “With AI, we’re able to quickly identify the most
promising approaches and cut out a lot of unnecessary experiments.”
Given the quantity of data, however, a major challenge for the team was that it would take human experts about 32 weeks to sift through EV user reviews in order to extract useful insights from the free text.
So, almost on the side, we started experimenting with deep learning nat-
ural language processing techniques to unlock some insights there. It turned
out that the review data was an untapped source of research innovation for
us. We quickly realized that the application of AI/ML to this data could both
accelerate policy analysis and reduce science and technology (S&T) research
evaluation costs.
Consequently, by deploying deep learning techniques to analyze those
EV user reviews, we were able to show how machine learning tools could
be used to quickly analyze streaming data for policy evaluation in near
real time and provide new insight into important electric vehicle charging
infrastructure policies, such as the need to focus on the quality of the user
experience and evidence supporting public involvement in EV charging
network buildout.
By displacing gasoline and diesel fuels, electric cars and fleets reduce
emissions from the transportation sector, thus offering important public
health benefits. However, public confidence in the reliability of charging
infrastructure remains a fundamental barrier to adoption. Using large-scale
social data and machine learning from 12,720 electric vehicle (EV) charging
stations, we provide national evidence on how well the existing charging
infrastructure is serving the needs of the rapidly expanding population of
EV drivers in 651 core-based statistical areas in the United States. We deploy
supervised machine learning algorithms to automatically classify unstruc-
tured text reviews generated by EV users. Extracting behavioral insights at
a population scale has been challenging given that streaming data can be
costly to classify. Using computational approaches, we reduce processing
times for research evaluation from weeks of human processing to just
minutes of computation. Contrary to theoretical predictions, we find that
stations at private charging locations do not outperform public charging
locations provided by the government. Overall, nearly half of the drivers
who use mobility applications have faced negative experiences at EV char-
ging stations in the early growth years of public charging infrastructure, a
problem that needs to be fixed as the market for electrified and sustainable
transportation expands.
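The paper's own classifiers are not reproduced here, but the general recipe of supervised text classification, turning unstructured review text into structured labels at scale, can be sketched with a standard bag-of-words pipeline. The reviews and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented examples of EV charging-station reviews with sentiment tags.
reviews = ["charger was broken again", "fast and easy to use",
           "screen frozen, wasted 30 minutes", "plenty of open stalls",
           "card reader rejected my payment", "worked perfectly"]
labels = ["negative", "positive", "negative", "positive",
          "negative", "positive"]

# TF-IDF features plus a linear classifier: a common baseline for
# classifying streaming review text in near real time.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, labels)
print(model.predict(["the plug would not release, support unhelpful"]))
```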
The study, published by Nature on February 19, 2020, was part of a larger
collaboration among scientists from Stanford, Massachusetts Institute of
Technology (MIT), and the Toyota Research Institute that bridges foun-
dational academic research and real- world industry applications. The
goal: finding the best method for charging an EV battery in 10 minutes that
maximizes the battery’s overall lifetime. The researchers wrote a program
that, based on only a few charging cycles, predicted how batteries would
respond to different charging approaches. The software also decided in real
time what charging approaches to focus on or ignore. By reducing both the
length and number of trials, the researchers cut the testing process from
almost two years to 16 days.
“We figured out how to greatly accelerate the testing process for extreme
fast charging,” said Peter Attia, who co-led the study while he was a graduate
student. “What’s really exciting, though, is the method. We can apply this
approach to many other problems that, right now, are holding back battery
development for months or years.”
and different approaches—and when to exploit, or zero in, on the most prom-
ising ones.”
The team used this power to their advantage in two key ways. First, they
used it to reduce the time per cycling experiment. In a previous study, the
researchers found that instead of charging and recharging every battery until
it failed—the usual way of testing a battery’s lifetime—they could predict
how long a battery would last after only its first 100 charging cycles. This is
because the machine learning system, after being trained on a few batteries
cycled to failure, could find patterns in the early data that presaged how long
a battery would last.
Second, machine learning reduced the number of methods they had to
test. Instead of testing every possible charging method equally, or relying on
intuition, the computer learned from its experiences to quickly find the best
protocols to test.
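The published model is not reproduced here, but the idea of predicting cycle life from early-cycle features can be sketched with a toy regression. Everything below, the feature, the synthetic cell data, and the linear model, is an illustrative stand-in, not the Stanford/MIT/Toyota method itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_cells = 40
# Synthetic stand-in for an early-cycle feature, e.g. a summary of how
# the discharge-capacity curve shifts over the first 100 cycles.
early_feature = rng.normal(0.0, 1.0, n_cells)
# Synthetic "cycles to failure", correlated with the early feature.
cycle_life = 800 + 150 * early_feature + rng.normal(0, 30, n_cells)

# Fit on cells cycled to failure, then predict for a new cell after
# only its early cycles, instead of cycling it until it dies.
model = LinearRegression().fit(early_feature.reshape(-1, 1), cycle_life)
new_cell = np.array([[0.7]])      # feature from a new cell's first cycles
print(f"Predicted cycle life: {model.predict(new_cell)[0]:.0f} cycles")
```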
By testing fewer methods for fewer cycles, the study’s authors quickly
found an optimal ultra-fast-charging protocol for their battery. In addition
to dramatically speeding up the testing process, the computer’s solution was
also better—and much more unusual—than what a battery scientist would
likely have devised, said Ermon.
“It gave us this surprisingly simple charging protocol—something we
didn’t expect,” Ermon said. Instead of charging at the highest current at the
beginning of the charge, the algorithm’s solution uses the highest current
in the middle of the charge. “That’s the difference between a human and a
machine: The machine is not biased by human intuition, which is powerful
but sometimes misleading.”
4.1.4 Wider Applications
The researchers said their approach could accelerate nearly every piece of
the battery development pipeline: from designing the chemistry of a battery
to determining its size and shape, to finding better systems for manufac-
turing and storage. This would have broad implications not only for electric
vehicles but for other types of energy storage, a key requirement for making
the switch to wind and solar power on a global scale.
“This is a new way of doing battery development,” said Patrick Herring,
coauthor of the study and a scientist at the Toyota Research Institute.
“Having data that you can share among a large number of people in aca-
demia and industry, and that is automatically analyzed, enables much faster
innovation.”
The study’s machine learning and data collection system will be made
available for future battery scientists to freely use, Herring added. By using
this system to optimize other parts of the process with machine learning,
battery development—and the arrival of newer, better technologies—could
accelerate by an order of magnitude or more, he said.
The potential of the study’s method extends even beyond the world of
batteries, Ermon said. Other big data testing problems, from drug devel-
opment to optimizing the performance of X-rays and lasers, could also be
revolutionized by the use of machine learning optimization. And ultimately,
he said, it could even help to optimize one of the most fundamental processes
of all.
“The bigger hope is to help the process of scientific discovery itself,” Ermon
said. “We’re asking: Can we design these methods to come up with hypoth-
eses automatically? Can they help us extract knowledge that humans could
not? As we get better and better algorithms, we hope the whole scientific dis-
covery process may drastically speed up.”
electric vehicles. Three major research topics are covered in the chapter: state of charge (SoC) and state of health (SoH) estimation for the battery pack, and estimation of the remaining driving range.
it would then be possible to view the vehicle from any angle and to jump into and zoom through the entire design of the vehicle. Some truly modern CAD systems also allow the design to be loaded into a simulator program, which could make it possible to act as though the vehicle already exists and see how it runs.
At the Cybernetic AI Self-Driving Car Institute, they are creating AI systems for self-driving vehicles, and there is a clear indication that the same industry-wide pursuit of reverse engineering that applies to conventional vehicles is now under way for AI self-driving vehicles as well. The five key stages of processing in an AI self-driving vehicle are sensor data collection and interpretation, sensor fusion, virtual world model updating, AI action plan updating, and issuing car command controls. The most visible physical parts of an AI self-driving vehicle are its sensors: radar sensors, sonic sensors, cameras, light detection and ranging (LIDAR), inertial measurement units (IMUs), and others. Every carmaker and tech firm is keen to know which sensors other organizations are using in their AI self-driving vehicles. It is a fairly open field at present, with no dominant supplier; indeed, the various companies that make sensors suitable for AI self-driving vehicles are locked in a fierce battle to gain traction for their products.
patterns in the early data that foretold how long a battery would last. Second, machine learning reduced the number of methods they needed to test: rather than testing every possible charging method equally, or relying on intuition, the computer learned from its experience to quickly locate the best protocols to test.
By testing fewer methods for fewer cycles, the study's authors quickly found an optimal ultrafast charging protocol for their battery. Besides significantly accelerating the testing process, the computer's answer was also better, and much more unusual, than what a battery researcher would likely have conceived. It produced a surprisingly simple charging protocol, something the team did not expect: rather than charging at the highest current at the start of the charge, the algorithm's solution uses the highest current in the middle of the charge. That is the difference between a human and a machine: the machine is not biased by human intuition, which is powerful but sometimes misleading.
The researchers said their approach could accelerate virtually every part of the battery development pipeline, from designing a battery's chemistry to determining its size and shape to finding better systems for manufacturing and storage. This would have wide ramifications not only for electric vehicles but for other kinds of energy storage, a critical requirement for the global shift to wind and solar power. The study's AI and data collection framework will be made available for future battery researchers to use freely, Herring added. By using this framework to improve other parts of the process with AI, battery development, and with it the arrival of newer, better technologies, could accelerate by an order of magnitude or more. The potential of the study's method extends even beyond the world of batteries.
Advances in electric vehicle batteries have regularly been limited by the enormous bottleneck of evaluation times. At each phase of the battery development process, new technologies must be tested for months or even years to determine how long they will last. Stanford University researchers have now developed an AI-based method that cuts these testing times by 98 percent. The researchers tried the method on battery charge speed but state that it can be applied to many other parts of battery development. The method could make the dream of recharging an electric battery in the time it takes to stop at a service station a more likely reality; artificial intelligence is poised to add some juice to electric vehicles by accelerating improvements in battery technology.
Researchers are getting closer to delivering batteries with the improvements most sought by designers and marketers of electric vehicles: batteries that are safer, recharge faster, and are more practical than the generation of lithium-ion batteries now in use. Within five years, specialists say, electric vehicles will reach price parity with
the age of the electrical current and had around 20,000 potential combinations. Using conventional computational techniques to screen those candidates might have taken five years; AI assessed them in nine days. The AI is now becoming much more valuable as the solvents and electrolytes are changed to improve capacity and life cycle, since there has been more time to train it.
Besides assisting with materials research, AI reduces the time it takes to test batteries. Previously, optimizing new battery-cell designs was a process that often required charging and discharging the batteries thousands of times. Now, AI's capacity to rapidly investigate the huge amounts of data gathered during battery testing lets researchers make predictions about performance much faster, reducing the number of tests that must be run. The anode stores lithium and releases lithium ions (lithium minus an electron) when the battery is discharging; the separator lets lithium ions pass through while electrons are forced to travel separately, producing an electric current. AI also enables researchers to identify synthesized organic compounds that could improve the anode's ability to hold lithium ions. The cathode works when the battery is recharged, when the lithium ions and electrons travel back to the anode by different paths. With AI's help, scientists are developing an iodine-based alternative to cobalt, an expensive, hard-to-recycle heavy metal used in most lithium-ion batteries. The electrolyte, normally a blend of salts and solvents, supports the movement of the lithium ions; with AI, scientists can test new electrolyte formulations faster, helping them identify mixtures with higher voltage and higher flash points. Faster charging commonly damages the separator, shortening the battery's lifetime and potentially leading to fires, and AI helps analysts find the sweet spot balancing charging speeds, charging currents, charging frequency, and battery lifetime.
Researchers and analysts around the globe are increasingly turning to Artificial Intelligence (AI) and so-called robo-scientists to help them discover everything from new antibiotics and drugs to new spray-on solar panel materials and vaccines, so it should come as no great surprise that AI is now being used to help develop new battery tech. "In battery testing, you have to try a massive number of things, because the performance you get will vary drastically," said Ermon, an assistant professor of computer science at Stanford, referring to how researchers traditionally hunt for new battery breakthroughs; he led the new effort to use AI to develop promising batteries for electric vehicles. "With AI, we're able to quickly identify the most promising approaches and cut out a lot of unnecessary experiments." The study, published by Nature, was part of a larger collaboration among researchers from Stanford University, MIT, and the Toyota Research Institute that bridges foundational academic research and real-world industry applications. Their objective: to find the best method for charging an EV battery in a short time that maximizes the battery's overall lifetime.
nominal voltage, power density, and cost. In EVs, a smart battery management system (BMS) is one of the fundamental components: it not only gauges the battery's state precisely but also guarantees safe operation and prolongs battery life. Accurate estimation of the state of charge (SOC) of a Li-ion battery is a very challenging task, because the Li-ion battery is a highly time-varying, nonlinear, and complex electrochemical system.
This section explains how a Li-ion battery works, outlines the essential features of a smart BMS, and thoroughly surveys its SOC estimation techniques, which are grouped into four main categories according to their nature. A critical discussion of their benefits, limitations, and the estimation errors reported in different studies is given, and several recommendations that depend on advances in technology are made to improve online estimation. The section also addresses the concerns facing current BMSs. State estimation of a battery, covering state of charge, state of health, and state of life, is a basic task for a BMS; by examining the most recent approaches to battery state estimation, the future challenges for BMSs are presented and potential solutions are proposed.
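Among the SOC estimation families surveyed, the simplest open-loop baseline is Coulomb counting, which integrates current over time against the pack capacity. The sketch below uses made-up capacity and current values; a real BMS must correct this estimate for sensor drift, temperature, and aging, which is where the model-based and learning-based methods discussed above come in.

```python
# Minimal Coulomb-counting sketch: SOC(t) = SOC(0) - (1/C) * integral of I dt.
# The capacity and the current trace below are made-up illustrative values.
capacity_ah = 50.0          # pack capacity in ampere-hours
soc = 0.90                  # initial state of charge (fraction)
dt_h = 1.0 / 3600.0         # one-second samples, expressed in hours

current_a = [20.0] * 600 + [5.0] * 1200   # discharge current samples (A)
for i in current_a:
    soc -= i * dt_h / capacity_ah         # positive current discharges
    soc = max(0.0, min(1.0, soc))         # clamp to the physical range

print(f"Estimated SOC after {len(current_a)} s: {soc:.3f}")
```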
4.9 Conclusion
There is an increasing demand for scalable and autonomous management systems. We have proposed an AI-based EV, which would eliminate fossil fuel (petrol and diesel) consumption. A universal approach to reverse engineering electric power steering (EPS) for the purpose of external control has also been presented, designed, and executed. The primary objective of the related study was to solve the problem of precisely predicting the dynamic trajectory of an autonomous vehicle; this was accomplished by deriving a new equation for determining the lateral tire forces and adjusting some of the vehicle parameters under road-test conditions. The expert systems were made more flexible and effective for the present application by introducing hybrid artificial intelligence with logical reasoning. The innovation offers a solution to the major problem of liability when an autonomous transport vehicle is involved in a collision.
5
A Comparative Analysis of Established
Techniques and Their Applications
in the Field of Gesture Detection
CONTENTS
5.1 Introduction................................................................................................... 74
5.2 Challenges and Areas of Improvements................................................... 76
5.2.1 Image Acquisition Challenges........................................................ 76
5.2.2 Gesture Tracking, Segmentation and Identification
Challenge........................................................................................... 76
5.2.3 Feature Extraction Challenges in Gesture Detection and
Identification..................................................................................... 77
5.2.4 Limitations and Challenges in Gesture Gradation or
Categorization................................................................................... 78
5.2.5 Limitations of End-User Gesture Analysis
Customization................................................................................... 79
5.3 Related Fields and Special Mentions......................................................... 79
5.3.1 Collaborative Learning of Gesture Recognition and
3D Hand Pose Estimation with Multi-Order
Feature Analysis............................................................................... 79
5.3.2 Gesture Analysis and Organizational Research:
The Development and Application of a Protocol for
Naturalistic Settings......................................................................... 80
5.3.3 Gesture and Language Trajectories in Early
Development: An Overview from the Autism Spectrum
Disorder Perspective........................................................................ 80
5.3.4 Learning Individual Styles of Conversational Gesture............... 81
DOI: 10.1201/9781003104858-5
5.1 Introduction
In day-to-day conversations and tasks, information reinforcement and clarification play a cardinal role, and sign language is one medium for providing them. Sign language is an irreplaceable component of daily communication; it may take variegated forms, such as visual motions or hand gestures, but it aims to convey the same message and fulfil the purpose of information reinforcement. It involves multiple body parts: fingers and hands are the popular ones, but it also incorporates facial gestures, the head, the body, and the arms, which may be less well known in everyday communities. For those with hearing, listening, or visual disabilities, sign language is the only means of communication and expression, yet people with these disabilities, who depend completely on sign language to interact with the outer world, find doing so cumbersome because many people are unaware of the full scope of sign language. Moreover, while the use and knowledge of sign language is common among communities with visual disabilities, it is not very widespread among communities with hearing disabilities. This poses a challenge for these communities in interacting with or conveying their message to the outer world, as they rely completely on sign language, and it remains an unsolved issue to this day.
While most sign language involves the upper half of the body [1], some sign language methods use variegated shapes and movements of the whole body [2]. The most commonly used gestures are hand gestures, which can be classified into several types, namely conversational, controlling, manipulative, and communicative gestures [3]. Because of the highly structured and organized attributes of sign language, the field of computer vision and its various algorithms suit it well [4]. The study here compiles contemporary expertise on gesture detection implemented with supervised learning and image segmentation algorithms, along with their associated characteristics, while proposing modifications for the inclusion of a wider audience, with a prominent emphasis on people with communication disabilities. Furthermore, the work done in this chapter may culminate in the development of real-world applications such as determining

FIGURE 5.1
Limitations and challenges in gesture gradation or categorization, elaborated pictorially.
interaction but also build up their confidence, further aiding the treatment of the disorder. After discerning differences in gesture analysis, the research identifies multiple differences between the trajectories of gesture frequency at different stages of ASD, as well as differences based on the patient's age; tracking the trajectory can also indicate the course, speed, and effectiveness of treatment, if given. Beyond this, early diagnosis and detection can also be related to the neurological paths that patients follow as their disorder progresses or slips into a dormant state. These insights can be substantial in finding cures for the respective disorders.
Andrea et al. [42] designed an LSTM model for the videographic detection of autism spectrum disorder. The study, conducted with 20 individuals with a diagnosis of autism, fed short video clips showing both healthy and autistic participants to the neural network. The experiment consisted of 48 trials of a neutral hand gesture, grasping an object, filtered for discrepancies. Evaluation with a threshold value ranging from 0.5 to 0.95 gave accuracy values of 83.33 percent and recall of 85 percent. The framework has been deemed competitive with state-of-the-art LSTM systems, and the video data set has been published for public use.
Ahmedt-Aristizabal et al. [43] applied computer vision to build a facial-expression gesture analysis system for the early detection of epilepsy. The system consists of a video feed recording blinking, chewing, gazing, and jaw motion at 25 frames per second, an R-CNN trained with the WIDER data set, and a deep CNN for feature extraction, followed by an LSTM with many-to-one layering for the evaluation. For result generation, a multi-fold cross-validation method was applied, measuring both the reliability and the consistency of the system. The reported results show an average accuracy of 96.58 percent with an AUC of 0.9926. The authors consider this MTLE-based detection a success; however, the absence of large public datasets of seizure facial expressions leaves room for overfitting-based bias in the results, as the paper itself acknowledges.
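Several of the surveyed systems share the same skeleton: a CNN extracts per-frame features, and an LSTM with many-to-one layering aggregates them over time. The PyTorch sketch below shows only that skeleton; the layer sizes, clip shape, and class count are invented, and it is far smaller than any of the published networks.

```python
import torch
import torch.nn as nn

class CnnLstmGesture(nn.Module):
    """Per-frame CNN features fed to an LSTM, many-to-one output."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(              # tiny stand-in feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, clips):                  # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # (b*t, 32)
        out, _ = self.lstm(feats.view(b, t, -1))
        return self.head(out[:, -1])           # last step: many-to-one

model = CnnLstmGesture()
dummy = torch.randn(2, 16, 3, 64, 64)          # 2 clips of 16 frames each
print(model(dummy).shape)                      # torch.Size([2, 10])
```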
Ye [44] combines a 3D convolutional neural network with the multi-modal features of a recurrent neural network to obtain a superior network. In their experiments, the C2D model is leveraged to extract metric data from short video clips of ASL communication. Using a greedy approach to stitch together the clips with the highest confidence scores, the proposed hybrid framework, called 3DRCNN, can train and work with any of the many ASL variants in use. For the data set, the team used a modified Kinect sensor, in association with the ASL Association, to create short clips of 99 different hand gestures as well as 100 one-minute video clips of sentences. After training on the Sports1M dataset, testing showed an accuracy of 69.2 percent for person-dependent comparison and 65.2 percent for person-independent analysis. Both scores beat those of two other systems, LRCN and C3D-only methods.
In another paper, Rastagoo et al. [45] began by building a data set from only 10 participants, who provided 10,000 RGB videos covering over 90 hand gestures in the Persian language. The model is a pipeline: a CNN extracts a heatmap of the hand using key points, and boundary boxes project a skeleton of the hand onto a plane, which is fed to the proposed framework, named 3DCNN. A total of five such parallel streams generates a composite modal joint heatmap, which is processed by an LSTM network. All possible combinations of the data streams with the LSTM, including varying the number of streams, were recorded, with the best result
5.5 Conclusion
Throughout this chapter, the discussion has focused on gesture detection frameworks that have been proposed, developed, and implemented in real-world scenarios. Gesture detection has many uses, from the reasons given in the first sections of the chapter to intuitive new applications yet to be discovered. This is not surprising: in an age when digital technology reaches every nook and cranny of society, access to a camera, and the ability to use it, is trivial. As such, anyone can leverage the fruits of research in the field of gesture analysis, with the main focus still remaining on 'American sign language', one of the most widely used ways to communicate using palm and hand gestures.
In this chapter, the various challenges in acquiring and processing data sets were discussed, with examples detailing the reasons for them. Thereafter, the requirements for developing usable and efficient frameworks were examined, followed by applications of gesture analysis for the benefit of society. As previously stated, the most important and obvious use of the technique is in developing tools for communicating with those who are challenged in some form; however, it can also be adapted for use in novel ways, such as the detection and treatment of autism. Another related field that received a brief mention was detecting sleep and fatigue levels in students attending online classes, as well as building 3D models of complex actions from simple images. There is no doubt that, with time, these techniques will be refined by the work of the community, leading to other fruitful discoveries.
Finally, an extended literature review of current techniques with something unique to offer was compiled in the review section. Instead of simply stating the work done in those studies, relevant details such as the data sets used, the deep learning techniques implemented in the pipelines, and the outcomes and comparative analyses performed have been condensed and presented, to aid quick retrieval of important information without access to the original work. A few minor observations can be made: among the techniques evaluated, a large number relied on an LSTM to unify the data in the final layer of the model. Deep learning-based techniques were the most prevalent, with plain CNNs used mostly to extract features from image and video data sets. Most of the studies also had to create their own data sets, which they subsequently made public, which can only be seen as a huge asset to the entire research community.
References
[1] Bellugi U, Fischer S (1972) A comparison of sign language and spoken language.
Cognition 1: 173–200.
[2] Yang R, Sarkar S, Loeding B (2010) Handling movement epenthesis and hand
segmentation ambiguities in continuous sign language recognition using nested
dynamic programming. IEEE Trans Pattern Anal Mach Intell 32: 462–477.
[3] Wu Y, Huang TS (1999) Vision-based gesture recognition: a review. In: International Gesture Workshop, Springer, pp. 103–115.
[4] Wu Y, Huang TS (1999) Human hand modeling, analysis and animation in the
context of HCI. In: Image Processing, ICIP 99. Proceedings. 1999 International
Conference, IEEE, pp. 6–10.
[5] Starner T, Weaver J, Pentland A (1998) Real-time American sign language recog-
nition using desk and wearable computer-based video. IEEE Trans Pattern Anal
Mach Intell 20:1371–1375.
[6] Grobel K, Assan M (1997) Isolated sign language recognition using hidden
Markov models. In: Systems, Man, and Cybernetics, 1997. Computational cyber-
netics and simulation. 1997 IEEE International Conference, IEEE, pp. 162–167.
[7] Vogler C, Metaxas D (1998) ASL recognition based on a coupling between
HMMs and 3D motion analysis. In: Computer Vision, 1998. Sixth international
conference, IEEE, pp. 363–369.
[8] Vogler C, Metaxas D (1997) Adapting hidden Markov models for ASL recogni-
tion by using three-dimensional computer vision methods. In: Systems, Man, and
Cybernetics, Computational cybernetics and simulation. 1997 IEEE international
conference, IEEE, pp. 156–161.
[9] Suarez J, Murphy RR (2012) Hand gesture recognition with depth images: A review. IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication, Sept.
[10] Ferrone A, Maita F, Maiolo L, Arquilla M (2016) Wearable band for hand gesture recognition based on strain sensors. IEEE RAS/EMBS International Conference on Biomedical Robotics and Biomechatronics, 26–29 June.
A Comparative Analysis of Techniques in Gesture Detection 89
[11] Plouffe G, Ana-Maria C. Static and dynamic hand gesture recognition in depth
data using dynamic time warping, IEEE Transactions on Instrumentation and
Measurement, Vol. 65, no. 2, February 2016.
[12] Wei Lu, Zheng Tong, Jinghui Chu. Dynamic hand gesture recognition with leap
motion controller. IEEE signal Processing Letters, Vol 23, No. 9, September 2016.
[13] Alani AA, Georgina Cosma, Aboozar and McGinnity TM. Hand gesture recog-
nition using an adapted convolutional neural network with data augmentation.
4th IEEE International Conference on Information Management, 2018.
[14] Avola D, Bernardi M, Cinque L, Luca Foresti G, Massaroni C. Exploiting recur-
rent neural networks and leap motion controller for the recognition of sign lan-
guage and semaphoric hand gestures. Journal of Latex Class Files, Vol. 14, No 8,
August 2015.
[15] Zabulis X, Baltzakis H, Argyros A. Vision-based hand gesture recognition for
human–computer interaction. Gesture, 1–56, 2009.
[16] Chakraborty B, Sarma D, Bhuyan M, Macdorman K. Review of constraints on
visionbased gesture recognition for human–computer interaction (2018).
[17] Bauer, B., Karl-Friedrich, K. (2002) Towards an automatic sign language recog-
nition system using subunits. In: Wachsmuth, I., Sowa, T. (eds.) GW 2001. LNCS
(LNAI), vol. 2298, pp. 64–75. Springer, Heidelberg.
[18] Zhu, Y., Yang, Z., & Yuan, B. (2013, April). Vision based hand gesture recognition.
In 2013 International Conference on Service Sciences (ICSS) (pp. 260–265). IEEE.
[19] Chakraborty, B. K., Sarma, D., Bhuyan, M. K., & MacDorman, K. F. (2018).
Review of constraints on vision-based gesture recognition for human–computer
interaction. IET Computer Vision, 12(1), 3–15.
[20] Liang, RH, Ouhyoung M. (1998, April). A real-time continuous gesture recogni-
tion system for sign language. In Proceedings third IEEE international confer-
ence on automatic face and gesture recognition (pp. 558–567). IEEE.
[21] Morency, LP, Quattoni A, Darrell T. (2007, June). Latent-dynamic discriminative
models for continuous gesture recognition. In 2007 IEEE conference on com-
puter vision and pattern recognition (pp. 1–8). IEEE.
[22] Morency LP, Christoudias CM, Darrell, T. (2006, November). Recognizing gaze
aversion gestures in embodied conversational discourse. In Proceedings of the
8th international conference on Multimodal interfaces (pp. 287–294).
[23] Krauss, RM, Chen Y, Chawla P. (1996). Nonverbal behavior and nonverbal
communication: What do conversational hand gestures tell us?. In Advances in
Experimental Social Psychology (Vol. 28, pp. 389–450). Academic Press, London.
[24] Vega K, Cunha M, Fuks H. (2015, March). Hairware: the conscious use of
unconscious auto-contact behaviors. In Proceedings of the 20th International
Conference on Intelligent User Interfaces, pp. 78–86.
[25] Mitra S, Acharya T. Gesture recognition: a survey. IEEE Trans. Syst. Man Cybern.
Part C Appl. Rev. 37(3), 311–324 (2007).
[26] Wobbrock JO, Morris MR, and Wilson AD. (2009). User-defined gestures for sur-
face computing. Proc. CHI ‘09, 1083–1092.
[27] Morris MR, Wobbrock JO, Wilson AD. (2010). Understanding users’ preferences
for surface gestures. Proc. GI ‘10, 261–268.
[28] Avrahami D, Hudson SE, Moran TP, Williams BD. (2001). Guided gesture
support in the paper PDA. Proc. UIST ‘01, 197–198.
[29] Bigdelou A, Schwarz L, Navab. (2012). An adaptive solution for intra-operative
gesture-based human-machine interaction. Proc. IUI ‘12, 75–83.
6
Brain–Computer Interface for Dream Visualization
CONTENTS
6.1 Dream: Introduction..................................................................................... 94
6.1.1 Theories of Dream............................................................................ 94
6.1.1.1 Wish-Fulfillment Theory................................................... 95
6.1.1.2 Information-Processing Theory....................................... 95
6.1.1.3 Activation Synthesis Theory............................................ 95
6.1.1.4 Physiological Function Theory........................................ 96
6.1.2 Origin of Dream................................................................................ 96
6.1.2.1 Dream versus Sleep........................................................... 96
6.1.2.2 Dream versus Memory...................................................... 97
6.1.2.3 Human Visual Cortex........................................................ 98
6.1.2.4 Hippocampus and Amygdala.......................................... 99
6.2 Brain–Computer Interface......................................................................... 100
6.2.1 BCI Tools.......................................................................................... 101
6.2.1.1 Electroencephalography (EEG)...................................... 101
6.2.1.2 Electrocardiography (ECG)............................................ 101
6.2.1.3 Magnetic Resonance Imaging (MRI)............................. 102
6.2.1.4 Magnetoencephalography (MEG)................................. 102
6.2.2 Procedure......................................................................................... 103
6.2.2.1 Signal Acquisition............................................................ 103
6.2.2.2 Pattern Recognition......................................................... 103
6.2.2.3 Pattern Classification....................................................... 104
6.2.2.4 Command Generation..................................................... 104
DOI: 10.1201/9781003104858-6
6.1 Dream: Introduction
Philosophers around the world regard dreams as one of the most curious phenomena in human beings [1]. A dream is a succession of images and sounds occurring inside the mind during sleep [2]. Researchers have found that visual and auditory brain regions are active during certain stages of sleep [3]. Some researchers have found that the hippocampus, amygdala, visual cortex, auditory cortex, and motor cortex are the major regions active while a sleeping subject is dreaming [4]. However, most oneirologists have stated in various articles that dreams originate in the centrally located brain stem but are associated with other important cortical areas as well.
There are various futuristic research ideas related to dreams, and if some of them can be implemented, they might benefit our community. A dreamer may spend six to eight hours every night sleeping, roughly the same amount of time spent in the workplace during the day [5]. The question, then, is whether the time spent sleeping can be put to use in any way. Dreams might be a solution for making sleeping time useful: if it were possible to control dream activities and contents, humans could grow financially by increasing their effective work hours.
6.1.1 Theories of Dream
Since the early sixteenth century, scientists have been striving to uncover the fundamental causes of dreaming, but no one is yet sure about them [6]. Various philosophical theories have been proposed by scientists and philosophers over time, and some popular theories of dreaming (Figure 6.1) are briefly described here.
FIGURE 6.1
Various theories of dreaming.
6.1.1.1 Wish-Fulfillment Theory
Wish-fulfillment theory was proposed by the Austrian oneirologist Sigmund Freud. He believed that dream contents are related to the wishes or desires of the dreamer. Generally, a dreamer sees content in a dream that reflects his unfulfilled desires in real life [7]. The most common example of such a dream is sexual arousal during sleep, when an adult dreamer might be involved in sexual or romantic activities in his dream. Other examples are feeling sadness or happiness, participating in a traditional ceremony, visiting new or ancient places, and finding valuable things or properties [8].
6.1.1.2 Information-Processing Theory
This theory likens dreaming to the process of computation in a computer system: getting input from the external environment, processing the data, and finally generating output. In this theory, the data transmission paths and processes in the human brain are organized just as the hardware circuits and physical devices are organized in a computer architecture [9].
It is well known that data and information inside a computer system travel through buses from one device to another. In the same manner, neurotransmitters travel through the axons and dendrites of neuron cells in the nervous system, or biological neural network [10]. Further, the central part of the neuron cell, also known as the soma, is responsible for collecting and modifying information in the form of chemical compositions, or neurotransmitters. The same concept also applies to dream content generation inside the brain [11].
6.1.2 Origin of Dream
Scientists have imaged the many regions of the brain that are active while dreaming, from the brain stem to the cortex through the thalamus [16]. Most observations show that brain regions are active in a manner similar to the waking state. If a dreamer sees visual imagery while dreaming, it relates to the visual cortex in the occipital lobe of the brain; in the same way, other cortical areas relate to their corresponding activities [17]. The hippocampus and amygdala are responsible for feeling emotions and for sensation activities [18].
FIGURE 6.2
Neocortical activations during REM sleep.
(Source: www.mentalhealthsciences.com/)
TABLE 6.1
Time Distribution for Various Sleep Stages (minutes)

Stage            Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Total
NREM Stage-1        5         5         5         5         5        25
NREM Stage-2       50        40        40        30        20       180
NREM Stage-3       20        20        10        10        10        70
NREM Stage-4        5         5         5         5         5        25
REM                10        20        30        40        50       150
Total              90        90        90        90        90   90×5=450+25*=480
FIGURE 6.3
Long-term memory formation.
(Source: www.nature.com)
FIGURE 6.4
Visual cortex activation during REM sleep.
(Source: www.nature.com)
FIGURE 6.5
Position of hippocampus inside the limbic system.
(Source: https://medimagery.com/)
FIGURE 6.6
Position of amygdala inside the limbic system.
(Source: www.rebalanceclinic.co.nz)
Researchers believe that, like the hippocampus and other cortical regions, the amygdala is also active during dreaming and participates in the emotional activities of dreams. A dreamer can feel fear, anger, anxiety, and violence owing to the amygdala [30].
6.2 Brain–Computer Interface
At an early stage of this research work, core concepts and anatomical classifications of the brain–computer interface (BCI) were published. The present chapter focuses on an important application in the domain of dream physiology (oneirology) [31]. The previous section described the important brain organs and activities associated with dreaming; this section discusses some important modalities and techniques frequently used in the domain of the brain–computer interface [32].
In clinical bionics, the brain–computer interface is a very popular term among biomedical engineers and researchers. In general, the brain–computer interface is a communication pathway between the human brain and a computer system [33]. Fundamentally, this domain has two broad subdomains on which a researcher must focus: modalities and procedures.
6.2.1 BCI Tools
Researchers use a tool, or modality, for extracting signals or images from the brain. Some popular modalities are briefly described here.
6.2.1.1 Electroencephalography (EEG)
Electrical activity measurement is the most popular study among brain
researchers. Electroencephalography (EEG) is one of the major technologies
for electrical signal acquisition from the human brain by placing electrodes
noninvasively on the scalp during the experiment [37]. EEG has various
applications in the domain of brain–computer interface such as medical diag-
nosis, game playing, skill improvement, and psycho-physiological analysis.
This information can be extracted from various neocortical regions of the
brain. As stated in the first section, dreaming is directly associated with
memory consolidation happening inside the various neocortical regions of
the brain [38]. The EEG modality is used for tracking the electrical activities occurring during memory consolidation in REM sleep. Electrodes must be placed systematically (the 10–20 system) over cortical regions; some major regions are the visual cortex, auditory cortex, motor cortex, and the thalamus region [39]. That means electrical activities must be measured from the frontal, occipital, and temporal lobes to obtain different features of dream contents.
6.2.1.2 Electrocardiography (ECG)
Electrocardiography is also used to measure electrical activities inside the human body, but in general it is used for recording and analyzing the electrical activity of the heart and associated organs. It cannot be used directly for brain diagnosis, but researchers sometimes use it to study brain-instructed activities occurring in other organs of the body [40].
6.2.1.4 Magnetoencephalography (MEG)
The basic concept of MEG is also magnetism but, unlike MRI, no external magnet is used in this technology; it is based on the magnetic properties of blood. Magnetoencephalography, or MEG scanning, is an imaging technique that identifies brain activity by measuring the small magnetic fields produced in the brain [48]. The scan can be used to generate a magnetic image to pinpoint the source of seizures. In this imaging technique, magnetic fields are detected by extremely sensitive devices [49]. This device, known as the superconducting quantum interference device (SQUID), is frequently used to detect magnetic fields. The MEG scanner detects and amplifies the magnetic signals produced by the neurons of active brain regions and, unlike MRI, it does not emit radiation or magnetic fields [50].
A comparative study of the EEG, ECG, MRI, and MEG signal acquisition modalities is shown in Table 6.2.
TABLE 6.2
Comparative Study of Various Signals Acquisition Modalities
6.2.2 Procedure
The brain–computer interface is a mediating method and procedure for establishing communication between the dreamer and the computer system. A BCI communication system is built from various activities and tools, both software and hardware. In this section, a detailed description of the steps in developing a BCI system is presented.
6.2.2.1 Signal Acquisition
In the formal working procedure of a brain–computer interface system, signal acquisition is the first phase, in which a hardware modality is used to extract signals from the brain. Generally, the hardware modality is associated with the brain directly or indirectly [51]. The type of connection depends on the modality; it may be wired or wireless, and it may be invasive, semi-invasive, or noninvasive. Consider the example of an EEG machine, which may be a wired or wireless system [52]. In early times, researchers used wired EEG systems, but nowadays most dream researchers use mobile or wireless EEG modalities. In an EEG-based experiment, an electrode cap carrying multiple electrodes is placed on the scalp. These electrodes read the electrical stimuli from the brain regions, depending on the number of channels designed into the machine [53].
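To make this step concrete, below is a minimal sketch of loading and filtering such a recording, assuming the cap's output has been saved to an EDF file (the file name rem_sleep.edf is a placeholder) and using the open-source MNE-Python library; the standard 10–20 montage mentioned above supplies the electrode positions.

```python
import mne

# Load the raw multi-channel EEG recording into memory
# ("rem_sleep.edf" is a placeholder file name).
raw = mne.io.read_raw_edf("rem_sleep.edf", preload=True)

# Attach the standard 10-20 electrode layout; channels whose names do
# not match the montage are ignored rather than raising an error.
montage = mne.channels.make_standard_montage("standard_1020")
raw.set_montage(montage, on_missing="ignore")

# Band-pass filter to the range typically retained in sleep-EEG studies.
raw.filter(l_freq=0.5, h_freq=40.0)

print(raw.info)  # channel count, sampling rate, filter settings
```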
6.2.2.2 Pattern Recognition
Recorded raw data may contain noise and redundancy, so in this phase patterns or features must be identified in the recorded brain signals. Pattern recognition is the task of analyzing the signals to distinguish significant signal features from the raw material and representing them in a standard form suitable for translation into commands during feature translation [54]. When the input signals to an algorithm are too large and suspected to be redundant, they can be transformed into a reduced set of features. Sometimes, unnecessary portions of the raw data set must be removed; the process of removing such data is known as dimensionality reduction. The extracted features are expected to contain valuable and relevant information so that the desired task can be performed using this reduced pattern instead of the complete source data [55].
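As an illustration of this phase, the following sketch computes band-power features, one common choice for EEG feature extraction, and then applies principal component analysis as the dimensionality-reduction step described above; the epoch array, sampling rate, and band limits are assumed values.

```python
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import PCA

sfreq = 256.0  # assumed sampling rate in Hz
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power_features(epochs):
    """Mean spectral power per channel in each classical EEG band."""
    freqs, psd = welch(epochs, fs=sfreq, axis=-1)
    feats = []
    for lo, hi in bands.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[..., mask].mean(axis=-1))  # (n_epochs, n_channels)
    # Flatten to one feature vector per epoch: n_channels * n_bands values.
    return np.concatenate(feats, axis=-1)

# Placeholder data: 100 epochs, 8 channels, 10 s at 256 Hz.
X = band_power_features(np.random.randn(100, 8, 2560))
X_reduced = PCA(n_components=10).fit_transform(X)  # dimensionality reduction
```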
6.2.2.3 Pattern Classification
Pattern classification is the process of categorizing data based on their features in order to reduce the variability of feature values. This stage classifies the extracted feature signals, taking their different features into account [56]. The responsibility of the feature classifier algorithm is to use the feature vector provided by the feature extractor to assign the object to a category [57]. However, complete classification is often impossible, so a more common task is to determine the probability of each of the possible categories. The difficulty of the classification depends on the variation in feature values for objects in the same category relative to the variation between feature values for objects in different categories [58]. The variation of feature values for objects in the same category may be due to the complexity of the features or to noise in the signals.
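A brief sketch of this stage follows, using a logistic regression classifier on placeholder feature vectors and hypothetical category labels; the probability output corresponds to the per-category probabilities just discussed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.random.randn(100, 10)           # placeholder reduced feature vectors
y = np.random.randint(0, 3, size=100)  # placeholder category labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probabilities = clf.predict_proba(X_test)  # probability of each category
predictions = clf.predict(X_test)          # most probable category
```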
6.2.2.4 Command Generation
Finally, based on the evaluation of the data, decisions can be taken by the computer to perform a specific task by generating an appropriate command. This stage translates the signals into meaningful commands for any connected device [59]. The classified feature signals are translated by the feature translation algorithm, which converts them into the appropriate commands for the specific operations performed by the connected device [60]. In this context, the source feature signals are known as the independent variable and the targeted device control commands as the dependent variable; the translation process converts the former into the latter. Feature translation algorithms may be linear or nonlinear, using statistical analysis and neural networks, respectively [61].
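As a toy sketch of this translation, a classified signal (the independent variable) is mapped to a device command (the dependent variable); the category indices and command names here are purely hypothetical.

```python
# Hypothetical mapping from classified signal categories to device commands.
COMMANDS = {0: "RENDER_VISUAL_SCENE", 1: "PLAY_AUDIO", 2: "LOG_EMOTION"}

def translate(category: int) -> str:
    """Translate a classified feature signal into a device command."""
    return COMMANDS.get(category, "NO_OP")  # fall back to a no-op command

print(translate(1))  # -> PLAY_AUDIO
```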
6.3 Deep Learning
Intelligence is a key attribute of the human brain; that is why the human being is known as the supreme creature of nature. It is well known that intelligence could not occur randomly within muscular tissues, but is only possible through the
TABLE 6.3
Comparative Study of Various Deep Learning Models
Type | Stacked RBM | Multilayer Perceptron | Multilayer Perceptron | Hybrid Network
6.3.2.1 Keras
Google Brain developed TensorFlow, which serves as a backend for Keras. Keras is an open-source deep learning framework created by Google engineer François Chollet; it makes it easy to build and evaluate models by writing just a few lines of code [80]. Keras is the best framework for beginners to start with: it was created to be user-friendly and easy to work with in Python, and it ships with many pre-trained models, such as VGG and Inception. Beyond ease of learning, it supports TensorFlow in the backend and is used in deploying models [81]. Keras was created to be user-friendly, modular, easy to extend, and to work with Python. The API was designed for human beings, not machines, and follows best practices for reducing cognitive load. Neural layers, cost functions, optimizers, initialization schemes, activation functions, and regularization schemes are all standalone modules that can be combined to create new models [82]. New modules are simple to add as new classes and functions. Models are defined in Python code, not separate model configuration files.
The biggest reasons to use Keras stem from its guiding principles, primarily user-friendliness. Beyond ease of learning and of model building, Keras offers broad adoption, support for a wide range of production deployment options, integration with at least five backend engines (TensorFlow, CNTK, Theano, MXNet, and PlaidML), and strong support for multiple GPUs and distributed training. Further, Keras is backed by Google, Microsoft, Amazon, Apple, Nvidia, Uber, and others [83].
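The modularity described above can be seen in a minimal sketch in which standalone layer, optimizer, loss, and metric modules are combined into a model in a few lines (the layer sizes are arbitrary placeholders).

```python
from tensorflow import keras

# Stack standalone layer modules into a model.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    keras.layers.Dense(3, activation="softmax"),
])

# Combine optimizer, loss, and metric modules.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.summary()  # prints the layer-by-layer architecture
```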
6.3.2.2 Caffe
Convolutional architecture for fast feature embedding (Caffe) is an open-source deep learning framework developed by the Berkeley AI Research group. The framework is available as free open-source software under a BSD license and supports both research and industrial applications in artificial intelligence [84]. Caffe is characterized by its speed, scalability, and modularity. It works with CPUs and GPUs and is scalable across multiple processors. Most developers use Caffe for its speed: it can process 60 million images per day on a single NVIDIA K40 GPU [85]. Caffe has many contributors who update and maintain the framework, and it is suitable for industrial applications in the fields of machine vision, multimedia, and speech. Caffe works best in computer vision models compared to other domains of deep learning [86].
Caffe can work with many different types of deep learning architectures and is suitable for architectures such as CNN, LRCN, and LSTM. A large number of pre-configured training models are available to the user, allowing a quick introduction to machine learning and the use of neural networks [87]. Platforms for Caffe include Linux distributions such as Ubuntu, as well as macOS and Docker containers; for Windows, installation solutions are available on GitHub. For the Amazon AWS cloud, Caffe is available as a preconfigured Amazon Machine Image [88].
6.3.2.3 PyTorch
Nowadays, PyTorch is one of the most popular deep learning frameworks among researchers around the world. It is an open-source framework developed by the Facebook AI Research group; it offers a pythonic way of implementing deep learning models and provides all the services and functionalities of the Python environment. It allows automatic differentiation, which helps speed up the backpropagation process [89]. PyTorch comes with various modules, such as torchvision, torchaudio, and torchtext, which make it flexible for natural language processing (NLP) and computer vision [90]. PyTorch is more flexible for researchers than for developers. It provides better performance than Keras and Caffe, but it is a lower-level API focused on working directly with array expressions [91].
PyTorch is a Python-based scientific computing package that serves as a replacement for NumPy to exploit the power of GPUs, and as a deep learning research platform that provides maximum flexibility and speed [92]. It ensures an easy-to-use API, which helps with usability and understanding [93]. It is fast and feels native, ensuring easy coding and fast processing. The support for CUDA means that code can run on the GPU, decreasing the time needed to run it and increasing the overall performance of the system [94].
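The following small sketch illustrates these points: NumPy-like tensors, automatic differentiation for backpropagation, and CUDA execution when a GPU is available (the toy objective is arbitrary).

```python
import torch

# Run on the GPU via CUDA when available, otherwise on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4, 3, device=device)
w = torch.randn(3, 1, device=device, requires_grad=True)

loss = (x @ w).pow(2).mean()  # a toy scalar objective
loss.backward()               # automatic differentiation fills w.grad
print(w.grad)                 # gradient of the objective w.r.t. w
```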
6.3.2.4 MATLAB
In the domain of electronics and communication, MATLAB has long been the most popular software tool, though it is losing popularity among modern engineers and researchers. A dedicated Deep Learning Toolbox has been developed under the MATLAB umbrella for processing and analyzing data sets using new models, pre-trained models, and apps [95]. It provides interfaces for designing and implementing most of the popular neural network models, such as CNN, RNN, GAN, and LSTM. Here, CNN and LSTM are used to perform classification and regression on image, time series, and text data; further, GAN and Siamese networks can be used with automatic differentiation, custom training loops, and shared weights [96]. MATLAB uses its own programming language and environment, different from the Python environment, and the tool was originally used for signal processing. However, models built in the Python environment with the popular deep learning tools described in the previous sections (TensorFlow, Caffe, and PyTorch) can be imported into the MATLAB environment by using the ONNX format. The toolbox also supports transfer learning with DarkNet-53, ResNet-50, NASNet, SqueezeNet, and many other pretrained models [97].
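As a hedged sketch of this interchange, a small PyTorch model is exported to an .onnx file, which the Deep Learning Toolbox can then import on the MATLAB side; the model architecture and file name are placeholders.

```python
import torch

# A placeholder model standing in for a trained network.
model = torch.nn.Sequential(
    torch.nn.Linear(32, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 3),
)

# Export to the ONNX interchange format for import into MATLAB.
dummy_input = torch.randn(1, 32)  # example input fixing the tensor shapes
torch.onnx.export(model, dummy_input, "model.onnx")
```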
Deep Network Designer and Experiment Manager are additional applications shipped with the MATLAB Deep Learning Toolbox. Deep Network Designer helps in designing, analyzing, and training neural networks graphically [98,99]. Experiment Manager helps in managing multiple deep learning experiments: keeping track of training parameters, analyzing results, comparing code from different experiments, visualizing layer activations, and graphically monitoring training progress [100]. Training can easily be sped up on multiple-GPU machines or scaled up to clusters and clouds, including the NVIDIA GPU Cloud and Amazon EC2 GPU instances [101].
A comparative study of the Keras, Caffe, PyTorch, and MATLAB deep learning frameworks is shown in Table 6.4.
TABLE 6.4
Comparative Study of Various Deep Learning Frameworks
Parameter | Keras | Caffe | PyTorch | MATLAB (Deep Learning Toolbox)
6.4 Conclusion
This chapter provides an overview of a dream communication system. It is not easy to implement such a system, but this is a step toward a significant technology that may become available to people in the near future. It is considered that information is processed inside the brain in the same manner as in a computer, both while dreaming and while awake; hence, information-processing theory is the most significant among the theories of dreaming. Further, most dreams occur during the REM sleep stage. It can also be seen that most memory consolidation happens during REM sleep, meaning that temporary memory is transferred from the hippocampus to the associated neocortical regions for permanent storage, also known as long-term memory. Dreaming is the result of memory consolidation, which is a way of learning by strengthening cortical connections of chemical compositions, or neurotransmitters. It is possible to read and perform functional activities during REM-sleep dreaming by using the various hardware modalities frequently employed in the domain of brain–computer interface technology. Finally, these signals or imageries must be mapped using deep learning-based models, with good audio and video quality.
References
[1] Thomas Metzinger, Why are dreams interesting for philosophers? The example
of minimal phenomenal selfhood, plus an agenda for future research, Frontiers
in Psychology, 4:746, 2013.
[2] Walinga and Charles Stangor, Introduction to Psychology, BC-Campus Open
Education.
[3] Ricardo A. Velluti, Interactions between sleep and sensory physiology, J. Sleep
Res., 1997, 6, 61–77.
[4] Julian Mutz, Amir-Homayoun Javadi, Exploring the neural correlates of dream
phenomenology and altered states of consciousness during sleep, Neuroscience
of Consciousness, Volume 2017, Issue 1, 2017, nix009, https://doi.org/10.1093/
nc/nix009
[5] Gwen Dewar, Sleep Requirements, Parenting Science Store.
[6] Sander van der Linden, The science behind dreaming, Scientific American, 2011.
[7] Sigmund Freud, The Interpretation of Dreams, 1900, https://psychclassics.yorku.
ca/Freud/Dreams/dreams.pdf.
[8] Wei Zhang and Benyu Guo, Freud’s dream interpretation: A different perspec-
tive based on the self-organization theory of dreaming, Front. Psychol., Vol 9,
2018, DOI=10.3389/fpsyg.2018.01553.
[9] Eugen Tarnow, How dreams and memory may be related, Neuropsychoanalysis, 2014.
[10] Chester A. Pearlman, REM sleep and information processing: Evidence from
animal studies, Neurosci. Biobehav. Rev., 1979, 57–68.
[11] Erin J. Wamsley and Robert Stickgold, Dreaming and offline memory pro-
cessing, Curr. Biol., 2010 Dec 7;20(23):R1010-3. doi: 10.1016/j.cub.2010.10.045.
PMID: 21145013; PMCID: PMC3557787.
[12] Michael S. Franklin, The role of dreams in the evolution of the human mind,
Evolutionary Psychology, January 2005. doi:10.1177/147470490500300106.
[13] J. Allan Hobson, Robert W. McCarley, The brain as a dream state generator: An
activation-synthesis hypothesis of the dream process, The American Journal of
Psychiatry, 1977.
[14] Eiser AS. Physiology and psychology of dreams. Semin Neurol. 2005
Mar;25(1):97-105. doi: 10.1055/s-2005-867078. PMID: 15798942.
[15] Lin Edwards, Dreams may have an important physiological function, Medical
Xpress, 2009.
[16] Gent TC, Bandarabadi M, Herrera CG, Adamantidis AR. Thalamic dual control
of sleep and wakefulness. Nat Neurosci. 2018 Jul;21(7):974–984. doi: 10.1038/
s41593-018-0164-7. Epub 2018 Jun 11. PMID: 29892048; PMCID: PMC6438460.
[17] Brown RE, Basheer R, McKenna JT, Strecker RE, McCarley RW. Control of
sleep and wakefulness. Physiol Rev. 2012 Jul;92(3):1087– 187. doi: 10.1152/
physrev.00032.2011. PMID: 22811426; PMCID: PMC3621793.
[18] Lukas T. Oesch, Mary Gazea, Thomas C. Gent, Mojtaba Bandarabadi, Carolina
Gutierrez Herrera, Antoine R. Adamantidis, REM sleep stabilizes hypothal-
amic representation of feeding behavior, Proceedings of the National Academy of
Sciences, 117 (32), 19590–19598, 2020, https://doi.org/10.1073/pnas.192190911.
[19] Corsi-Cabrera M, Velasco F, Del Río-Portilla Y, Armony JL, Trejo-Martínez
D, Guevara MA, Velasco AL. Human amygdala activation during rapid eye
movements of rapid eye movement sleep: an intracranial study. J Sleep Res. 2016
Oct;25(5):576–582. doi: 10.1111/jsr.12415. Epub 2016 May 5. PMID: 27146713.
[20] Brain Basics: Understanding Sleep. www.ninds.nih.gov/Disorders/Patient-
Caregiver-Education/Understanding-Sleep.
[21] Sleep Physiology, National Academies Press, 2006.
[22] Memories involve replay of neural firing patterns. www.nih.gov/news-events/
nih-research-matters/memories-involve-replay-neural-firing-patterns.
[23] Rasch B, Born J. About sleep's role in memory. Physiol Rev. 2013;93(2):681–766.
doi:10.1152/physrev.00032.2012
[24] Takeuchi Tomonori, Duszkiewicz Adrian J. and Morris Richard G. M. 2014.
The synaptic plasticity and memory hypothesis: encoding, storage and
[83] Martin Heller, What is keras? The deep neural network API explained, www.
infoworld.com/article/3336192/what-is-keras-the-deep-neural-network-api-
explained.html
[84] Why use keras? https://mran.microsoft.com/snapshot/2018-01-07/web/
packages/keras/vignettes/why_use_keras.html
[85] Gautam Ramuvel, Best open source frameworks for machine learning, https://
medium.com/coinmonks/5-best-open-source-frameworks-for-machine-learn
ing-739d06170601
[86] What is caffe—The deep learning framework, https://codingcompiler.com/
what-is-caffe/
[87] Evan Shelhamer, Deep learning for computer vision with caffe and cuDNN, https://
developer.nvidia.com/blog/deep-learning-computer-vision-caffe-cudnn/
[88] Martin Heller, Caffe: deep learning conquers image classification, www.infowo
rld.com/article/3154273/review-caffe-deep-learning-conquers-image-classif
ication.html
[89] Caffe Python 3.6 NVidia GPU Production on Ubuntu, https://aws.ama
zon.com/marketplace/pp/Jetware-Caffe-Python-36-NVidia-GPU-Product
ion-on-U/
[90] Keras vs PyTorch vs Caffe: Comparing the implementation of CNN, https://
analyticsindiamag.com/keras-vs-pytorch-vs-caffe-comparing-the-impleme
ntation-of-cnn/
[91] Deep learning frameworks compared: Keras vs PyTorch vs Caffee, https://cont
ent.techgig.com/deep-learning-frameworks-compared-keras-vs-pytorch-vs-
caffee/articleshow/77480133.cms
[92] Tensorflow vs Keras vs Pytorch: Which framework is the best? https://med
ium.com/@AtlasSystems/tensorflow-vs-keras-vs-pytorch-which-framework-
is-the-best-f92f95e11502
[93] Implementing deep neural networks using PyTorch, https://medium.com/
edureka/pytorch-tutorial-9971d66f6893
[94] Deploying Pytorch in Python via a Rest API with Flask, https://pytorch.org/
tutorials/intermediate/flask_rest_api_tutorial.html
[95] Cuda Semantics, https://pytorch.org/docs/stable/notes/cuda.html
[96] Deep Learning Toolbox, www.mathworks.com/products/deep-learning.html
[97] Introducing deep learning with MATLAB, https://in.mathworks.com/campai
gns/offers/deep-learning-with-matlab.html
[98] Pretrained deep neural networks, https://in.mathworks.com/help/deeplearn
ing/ug/pretrained-convolutional-neural-networks.html
[99] Build networks with deep network designer, www.mathworks.com/help/
deeplearning/ug/build-networks-with-deep-network-designer.html
[100] Experiment manager, www.mathworks.com/help/deeplearning/ref/experi
mentmanager/
[101] Cloud and data centre, www.nvidia.com/en-in/data-center/gpu-cloud-
computing/
7
Machine Learning and Data Analysis
Based Breast Cancer Classification
CONTENTS
7.1 Introduction..................................................................................................117
7.1.1 Review of the Literature.................................................................118
7.2 Methodology................................................................................................118
7.2.1 Evaluation of Machine Learning Models.................................... 122
7.2.2 Data Description............................................................................. 122
7.3 Heat Map of Correlation........................................................................... 123
7.4 Results and Discussions............................................................................ 123
7.5 Conclusion................................................................................................... 128
7.1 Introduction
The current study preprocesses cancer patient data, and machine learning models are trained with these data to classify breast cancer. The study consists of five parts: (i) data collection and understanding, (ii) data preprocessing, (iii) training the ML models, (iv) evaluation, and (v) deployment. We extracted the required data from images of the cancer cells of 570 patients, taking 30 attributes from the images to classify the cancer. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe the characteristics of the cell nuclei present in the image. Before applying machine learning techniques to the data set, we need to preprocess and wrangle the data to obtain a better data set. For analyzing the data set, we need specific Python libraries and packages: Pandas for data manipulation and analysis, NumPy for mathematical operations on the data set, Matplotlib for data visualization, Seaborn for making heatmaps, and the Sklearn package for normalization of the data set. In data preprocessing, we remove unnecessary columns from the data frame, change categorical values to numerical values, and standardize the data.
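As a sketch of these preprocessing steps, suppose the patient data have been exported to a CSV file with an id column, a diagnosis column (M/B), and the 30 numeric feature columns; the file and column names are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("breast_cancer.csv")  # placeholder file name

df = df.drop(columns=["id"])                             # drop unnecessary column
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})  # categorical -> numeric

X = df.drop(columns=["diagnosis"])         # 30 feature columns
y = df["diagnosis"]                        # class labels
X_std = StandardScaler().fit_transform(X)  # standardize the features
```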
Machine learning is often used to train machines to handle data more efficiently. Here, we applied supervised learning algorithms to train the models: KNN, decision tree, logistic regression, and support vector machine (SVM). The main advantage of using machine learning is that, once an algorithm learns what to do with data, it can do its work automatically. We then evaluate these models using the Metrics library of the Sklearn package, select the model with the highest accuracy, and deploy it to classify the cancer of a new patient from his cancer cell data.
7.2 Methodology
Here, we describe a few concepts and algorithms that we used to construct the machine learning models for breast cancer classification.
(i) Data preprocessing: Raw data are highly vulnerable to missing values, outliers, noise, and inconsistencies because of the vastness of the data, their multiple sources, and their gathering methods [6]. Poor data quality profoundly affects the results of machine learning models, so preprocessing techniques must be applied to the data to obtain better results [7]. After preprocessing, the data must be transformed into the form required for training the ML models [8]. Data preprocessing methods include data cleaning, filling in missing values, removing outliers, and standardizing the data [9]. Missing values in a column can be filled with the mean of the column data; outliers can be removed using the binning process. Standardization of the data gives a better classification because all the features are constricted to the range [–1, 1] [10]. We split the data into training and test data sets in the ratio 7:3; the training data set is used to train the ML model and the test data set to evaluate it [11]. Cross-validation techniques belong to the conventional approaches used to ensure good generalization and avoid overtraining. The basic idea is to divide the data set T into two subsets: one subset is used for training while the other is left out, and the performance of the final model is evaluated on it. The primary purpose of cross-validation is to achieve a stable and confident estimate of the model performance [12]. Cross-validation techniques can also be used when evaluating and mutually comparing several models or various training algorithms, or when seeking optimal model parameters [13].
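A minimal sketch of the 7:3 split and of K-fold cross-validation follows, with placeholder data standing in for the standardized features and diagnosis labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.random.randn(570, 30)      # placeholder feature matrix
y = np.random.randint(0, 2, 570)  # placeholder binary labels

# 7:3 split into training and test data sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 5-fold cross-validation on the training data.
model = KNeighborsClassifier(n_neighbors=7)
scores = cross_val_score(model, X_train, y_train, cv=5)
print(scores.mean())  # stable estimate of model performance
```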
(ii) Training of ML models: The purpose of machine learning is to learn from the data. Many studies have addressed how to make machines learn by themselves, and mathematicians and programmers have applied several approaches to this problem, among them KNN, logistic regression, decision tree, SVM, and random forest. Logistic regression is a classification function that builds a single multinomial logistic regression model with a single estimator [14]. SVM is a supervised machine learning technique that revolves around the notion of a margin on either side of a hyperplane separating two data classes; maximizing the margin, thereby creating the largest possible distance between the separating hyperplane and the instances on either side of it, has been proven to reduce an upper bound on the expected generalization error [14]. Decision trees group attributes by sorting them based on their values and are used mainly for classification. Each tree consists of nodes and branches: each node represents an attribute in a group to be classified, and each branch represents a value that the node can take. In k-NN classification, the output is a class membership: an object is classified by a plurality vote of its neighbors, being assigned to the class most common among its k nearest neighbors [14].
$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$$

$$\mathrm{cost}(\hat{Y}, Y) = \frac{1}{2}\left(\hat{Y} - Y\right)^2$$

$$J(\beta) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{cost}(\hat{Y}_i, Y_i)$$

Sigmoid function: $\sigma(\hat{Y}) = \dfrac{1}{1 + e^{-\hat{Y}}}$

Probability that the output is 1: $p(Y = 1 \mid X) = \sigma(\hat{Y})$

Probability that the output is 0: $p(Y = 0 \mid X) = 1 - \sigma(\hat{Y})$

If $\sigma(\hat{Y})$ is greater than or equal to 0.5, logistic regression returns 1 as the output; otherwise it returns 0.
SVM uses different kernel functions to map the data into a high-dimensional space, where the data can be separated linearly. The kernel expansion used in SVM has the form

$$y(x) = \sum_{i=1}^{N} w_i \,\varphi\left(x - x_i\right)$$
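The sketch below trains and compares the supervised models described above, computing accuracy and F1 score via Sklearn's metrics module; the data are placeholders for the preprocessed patient features.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

X = np.random.randn(570, 30)      # placeholder features
y = np.random.randint(0, 2, 570)  # placeholder labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Decision tree": DecisionTreeClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "SVM (RBF)": SVC(kernel="rbf"),
    "Random forest": RandomForestClassifier(),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name, accuracy_score(y_te, pred), f1_score(y_te, pred))
```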
7.2.2 Data Description
The data used in the current study for classifying breast cancer comprise 570 patients of a Wisconsin hospital, with 32 attributes per patient. The attribute information is given below.
ID number and diagnosis (M = malignant, B = benign) are the general attributes.
Ten real-valued attributes are considered: (a) radius (mean of distances from the center to points on the perimeter), (b) texture (standard deviation of gray-scale values), (c) perimeter, (d) area, (e) smoothness (local variation in radius lengths), (f) compactness (perimeter²/area − 1.0), (g) concavity (severity of concave portions of the contour), (h) concave points (number of concave portions of the contour), (i) symmetry, and (j) fractal dimension ("coastline approximation" − 1).
The mean, standard error, and "worst" or largest value (mean of the three largest values) of these features were computed for each image, resulting in 30 features.
FIGURE 7.1
Correlational plot.
(i) KNN
(a) The accuracy of this model is 93 percent with k = 7 neighbors.
(b) Maximum K-fold cross-validation accuracy is 93.75 percent with k = 5 folds.
FIGURE 7.2
Accuracy graph for KNN.
FIGURE 7.3
Cross validation accuracy graph for KNN.
(ii) Decision Tree
The accuracy is 89.9 percent. Maximum K-fold cross-validation accuracy is 93.6 percent with k = 14 folds.
The F1 score of the decision tree algorithm is 92.61 percent.
FIGURE 7.4
Accuracy graph for Decision tree.
FIGURE 7.5
Accuracy graph for Logistic regression.
(iii) Logistic Regression
The accuracy is 92.9 percent. Maximum K-fold cross-validation accuracy is 93.77 percent with k = 5 folds.
The F1 score of the logistic regression algorithm is 90.5 percent.
FIGURE 7.6
Accuracy graph for SVM (Rbf).
FIGURE 7.7
Accuracy graph for SVM (poly).
FIGURE 7.8
Accuracy graph for SVM (sigmoid).
FIGURE 7.9
Accuracy graph for Random Forest.
(iv) SVM (Rbf)
The accuracy is 92.89 percent. Maximum K-fold cross-validation accuracy is 95 percent with k = 12 folds.
The F1 score of the SVM (RBF) algorithm is 90.78 percent.
(v) SVM (Poly)
The accuracy is 89.89 percent. Maximum K-fold cross-validation accuracy is 91.37 percent with k = 12 folds.
The F1 score of the SVM (Poly) algorithm is 84.61 percent.
(vi) SVM (Sigmoid)
The accuracy is 88.16 percent. Maximum K-fold cross-validation accuracy is 88.07 percent with k = 7 or 8 folds.
The F1 score of the SVM (Sigmoid) algorithm is 84.5 percent.
(vii) Random Forest
The accuracy is 93.49 percent. Maximum K-fold cross-validation accuracy is 96.08 percent with k = 9 folds.
The F1 score of the random forest algorithm is 93.5 percent.
From the above results, we can conclude that the random forest model is the best so far, with 96.08 percent cross-validation accuracy and a 0.935 F1 score.
7.5 Conclusion
The current study attempts to solve the problem of classifying breast cancer based on data obtained from raw images of breast cancer. We considered various state-of-the-art machine learning algorithms, including K-nearest neighbor, decision tree, support vector machine, and random forest, and found that random forest is the best model for classifying breast cancer. All the experiments and analyses were performed using Python libraries. This study helps formulate the process of diagnosing breast cancer from the preprocessed data obtained from raw images. In future work, deep learning-based tools and techniques could be used to improve the predictive power and make the model more efficient. A feature selection process could also be added to the existing study to improve the prediction accuracy.
References
[1] A. Bhardwaj and A. Tiwari, “Breast Cancer Diagnosis Using Genetically
Optimized Neural Network Model,” Expert Syst. Appl., pp. 1–15, 2015.
[2] A. O. Ibrahim and S. M. Shansuddin, “Intelligent breast cancer diagnosis based
on enhanced Pareto optimal and multilayer perceptron neural network,” Int.
J. Comput. Aided Eng. Technol., vol. 10, no. 5, 2018.
[3] N. Liu, E. Qi, M. Xu, B. Gao, and G. Liu, “A novel intelligent classification model
for breast cancer diagnosis,” Inf. Process. Manag., vol. 56, no. 3, pp. 609–623, 2019.
[4] N. Zemmal, N. Azizi, N. Dey, and M. Sellami, “Adaptive semi-supervised
support vector machine semi supervised learning with features cooperation
for breast cancer classification,” J. Med. Imaging Heal. Informatics, vol. 6, no. 1,
pp. 53–62, 2016.
[5] A. Helwan, J. B. Idoko, and R. H. Abiyev, “Machine learning techniques for clas-
sification of breast tissue,” Procedia Comput. Sci., vol. 120, no. 2017, pp. 402–410,
2018, doi: 10.1016/j.procs.2017.11.256.
[6] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Elsevier, Amsterdam, 2011.
[7] J. Han, “Proceedings of the 1996 ACM SIGMOD international conference on
management of data,” in Data Mining Techniques, 1996, p. 545.
[8] Soukup, Tom, and Davidson, Lan, Visual Data Mining: Techniques and Tools for
Data Visualization and Mining. John Wiley & Sons, 2002.
[9] G. Wang, Jianyong, and Karypis, “On efficiently summarizing categorical
databases,” Knowl. Inf. Syst., vol. 9, no. 1, pp. 19–37, 2006.
[10] W. S. Alasadi, A. Suad, and Bhaya, “Review of data preprocessing techniques in
data mining,” J. Eng. Appl. Sci., vol. 12, no. 16, pp. 4102–4107, 2017.
[11] H.R. Bowden, J. Gavin, and Graeme C. Dandy, and Maier, “Input determination
for neural network models in water resources applications. Part 1—background
and methodology,” J. Hydrol., vol. 301, no. 1–4, pp. 75–92, 2005.
[12] W. G. Cochran, Sampling Techniques. John Wiley & Sons, 2007.
[13] D. Fernandes, Stenio, Carlos Kamienski, Judith Kelner, and Denio Mariz, and
Sadok, “A stratified traffic sampling methodology for seeing the big picture,”
Comput. Networks, vol. 52, no. 14, pp. 2677–2689, 2008.
[14] J. Osisanwo, J.E.T. Akinsola, O. Awodele, J.O. Hinmikaiye, O. Olakanmi, and
Akinjobi, “Supervised machine learning algorithms: classification and com-
parison,” Int. J. Comput. Trends Technol., vol. 48, no. 3, pp. 128–138, 2017.
8
Accurate Automatic Functional
Recognition of Proteins: Overview and
Current Computational Challenges
CONTENTS
8.1 Biological Framework................................................................................ 131
8.2 Identifying the Protein Functions............................................................ 133
8.3 Automatic Functional Annotation of Proteins....................................... 134
8.4 Challenges................................................................................................... 137
8.1 Biological Framework
Bioinformatics is a discipline that binds biology and computer science, dealing with tasks such as the acquisition, storage, analysis, and diffusion of biological data. The data this field handles very often include DNA and amino acid sequences. Bioinformatics uses advanced computational approaches to answer a wide variety of problems in molecular biology. Traditionally, the alignment of sequences, gene finding, the establishment of evolutionary relationships, and the prediction of the three-dimensional forms of sequences were the main problems addressed by this field (Mount, 2004). More recently, new trends have appeared, including the functional prediction of proteins.
Proteins are complex chemical macromolecules that play a fundamental role in life within organisms (O'Connor et al., 2010). According to their primary structure, they can be seen as long chains of amino acids connected to each other; amino acids are small organic molecules linked by peptide bonds. These linear chains fold in space, forming a three-dimensional structure. The way of folding depends on the amino acids that compose the chain1 and determines the properties of the protein.
The importance of proteins lies in the many functions they perform in
organisms. For example, at the structural level, they make up the majority
of cell material; at the regulatory level, enzymes are proteins; at the immune
level, antibodies are glycoproteins. They constitute more than 50 percent of
the dry weight of cells (Liu, 2020), and it can be generally stated that they are
involved in regulating all the processes that take place in living beings, and
in the vast majority of cases, their molecular function is determined by the
relationship they establish with other proteins (Braun and Gingras, 2012) and
with other molecules in the environment.
Because it is so important to know the function of the different proteins in an organism, many scientific projects currently focus on trying to elucidate their behaviour, regulation, and possible activity from a biochemical, biological, biomedical, or bioinformatics angle. Hundreds of thousands of articles are published every year describing the activity of proteins in different situations, because protein function is a crucial factor in analysing cellular mechanisms, identifying the functional changes that lead to possible problems at a systemic level, and discovering new tools for the prevention, diagnosis, and treatment of diseases. In other words, knowing the function of proteins is synonymous with understanding life at a molecular level, which has a series of relevant implications for the pharmaceutical and biomedical industries.
In late December 2019, the World Health Organization (WHO) was noti-
fied of several cases of pneumonia of unknown aetiology, including severe
cases. Shortly after, the new coronavirus SARS-CoV-2 (Zhu et al., 2020) was
identified as the causative agent. International health authorities recorded
a rapid spread of the virus, calling the outbreak of SARS-CoV-2 infections a
global pandemic and declaring it a Public Health Emergency of International
Concern. Given the seriousness, numerous costly research projects began
to be developed around the pathogen. To date, after genome sequencing and
analysis, several genes of the coronavirus that causes COVID-19 have been
discovered, which encode the synthesis of 39 proteins (Gordon et al., 2020)
for the time being. In some cases, it has been possible to identify the functions
they play. However, in others, they are still a mystery.
When discussing the function of proteins, it should be borne in mind that
this is a concept that can have different meanings depending on the different
contexts and/or biological levels. Thus, the biochemical/molecular level,
the cellular level and the phenotypic level can be considered. Therefore, the
function of a protein can be categorized into three main groups:
• Molecular function
• Biological process
• Cellular component
TABLE 8.1
Number of Sequences (patterns) with Experimental Annotations in Data Sets Grouped
by Sub-ontologies
FIGURE 8.1
Overall performance evaluation at two levels.
8.4 Challenges
In the light of the aforementioned, after an exhaustive review of the specific literature on the problem of protein function prediction, a number of challenges have been identified that could be of great interest to scientists involved in research projects on this topic. First, new assessment metrics should be proposed for supervised classifiers implemented to solve the problem of automatic functional annotation. Models have traditionally been evaluated with metrics derived from precision and recall; such metrics (e.g., the F-score) tend to be biased towards the majority classes (as is the correct classification rate [CCR]) and do not consider the class hierarchy present in the problem. The F-score also assumes that the class label distributions and the predicted distributions are equal. In short, the F-score is not a suitable metric to measure the goodness of fit of a classifier in a typical bioinformatics scenario with multiple, imbalanced classes. Moreover, this problem presents hierarchical classes, so classification errors should not all carry the same weight.
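To illustrate this bias numerically, the sketch below builds a toy imbalanced multi-label problem (hypothetical GO-term indicator vectors) in which a degenerate classifier that predicts only the majority label still attains a high micro-averaged F-score, whereas the macro average, which weights every class equally, exposes the failure.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# 1000 proteins, 5 hypothetical GO terms; term 0 is heavily over-represented.
prevalence = np.array([0.9, 0.05, 0.05, 0.05, 0.05])
y_true = (rng.random((1000, 5)) < prevalence).astype(int)

# A degenerate classifier that only ever predicts the majority term.
y_pred = y_true.copy()
y_pred[:, 1:] = 0

print("micro-F1:", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("macro-F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
```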
Second, it is necessary to develop scalable machine learning models for the problem under study. Functional protein annotation is a classification problem with more than 100k instances and more than 5k classes, and it is multi-label in nature. State-of-the-art machine learning algorithms face serious computational difficulties with problems of this typology because the models scale with the sample size of the problem. To overcome these difficulties, it would be appealing to test lightweight alternatives (compared to classical algorithms) and to propose models whose computational complexity is adjusted to the needs of the problem. Moreover, these models will have to be designed taking into account, on the one hand, that this is a multi-label problem and, on the other hand, that they must rely on different and diverse sources of information, which makes an ensemble perspective ideal.
Finally, in line with the above challenge, it should again be noted that the data sets found in the protein function prediction problem contain more than
Notes
1 The way of folding does not depend only on the sequence of amino acids. For example, prions are proteins that fold in a different way from the native protein but do not change their amino acid sequence. There are proteins in the cell that are responsible for the correct folding of newly generated proteins. The physicochemical environment may also affect the final result of the folding.
2 Number of entries in UniProtKB/Swiss-prot.
References
Braun, P. and Gingras, A.C. (2012). History of protein–protein interactions: From egg
white to complex networks. Proteomics, 12(10), 1478-1498.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K. and
Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics,
10(1), 421.
Gene Ontology Consortium (2019). The gene ontology resource: 20 years and still
GOing strong. Nucleic Acids Research, 47 (D1), D330–D338.
Gordon, D.E., Jang, G.M., Bouhaddou, M. et al. (2020). A SARS-CoV-2 protein inter-
action map reveals targets for drug repurposing. Nature 583, 459–468.
Götz, S., García-Gómez, J.M., Terol, J., Williams, T.D., Nagaraj, S.H., Nueda, M.J., ...
& Conesa, A. (2008). High-throughput functional annotation and data mining
with the Blast2GO suite. Nucleic Acids Research, 36(10), 3420–3435.
Jeffery, C.J. (2018). Protein moonlighting: what is it, and why is it important?
Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1738),
20160523.
Kulmanov, M., Khan, M. A. and Hoehndorf, R. (2018). DeepGO: predicting protein
functions from sequence and interactions using a deep ontology-aware classi-
fier. Bioinformatics, 34(4), 660–668.
Liu, S. (2020). Bioprocess Engineering: Kinetics, Sustainability, and Reactor Design. Elsevier.
Mount, D.W. (2004). Sequence and genome analysis. Bioinformatics. Cold Spring
Harbour Laboratory Press: Cold Spring Harbour, 2.
O’Connor C.M., Adams, J.U. and Fairman, J. (2010). Essentials of Cell Biology.
Cambridge, MA: NPG Education, 1, 54.
Peled, S., Leiderman, O., Charar, R., Efroni, G., Shav-Tal, Y. and Ofran, Y. (2016). De novo protein function prediction using DNA binding and RNA binding proteins as a test case. Nature Communications, 7(1), 1–9.
Radivojac, P., Clark, W.T., Oron, T.R., Schnoes, A. M., Wittkop, T., Sokolov, A. … and
Pandey, G. (2013). A large-scale evaluation of computational protein function
prediction. Nature Methods, 10(3), 221–227.
Rifaioglu, A.S., Doğan, T., Martin, M.J., Cetin-Atalay, R. and Atalay, V. (2019).
DEEPred: Automated protein function prediction with multi-task feed-forward
deep neural networks. Scientific Reports, 9(1), 1–16.
Slatko, B.E., Gardner, A.F. and Ausubel, F.M. (2018). Overview of next-generation
sequencing technologies. Current Protocols in Molecular Biology, 122(1), e59.
Spolaôr, N., Cherman, E.A., Monard, M.C. and Lee, H.D. (2013). A comparison of
multi-label feature selection methods using the problem transformation
approach. Electronic Notes in Theoretical Computer Science, 292, 135–151.
Tsoumakas, G. and Katakis, I. (2007). Multi- label classification: An overview.
International Journal of Data Warehousing and Mining (IJDWM), 3(3), 1–13.
UniProt Consortium (2018). UniProt: the universal protein knowledgebase. Nucleic
Acids Research, 46(5), 2699.
Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J. and Zhang, Y. (2015). The I-TASSER
Suite: protein structure and function prediction. Nature Methods, 12(1), 7.
Zheng, S. (2016). IRAS: High-throughput identification of novel alternative splicing
regulators. Methods in Enzymology (Vol. 572, pp. 269–289). Academic Press.
Zhu, N., Zhang, D., Wang, W., Li, X., Yang, B., Song, J., … and Niu, P. (2020). A novel
coronavirus from patients with pneumonia in China, 2019. New England Journal
of Medicine, 382 (8), pp. 727–733.
9
Taxonomy of Shilling Attack Detection
Techniques in Recommender System
CONTENTS
9.1 Introduction................................................................................................. 142
9.1.1 Collaborative Filtering in Recommender Systems.................... 143
9.1.1.1 User-Based Collaborative Filtering (UBCF)................. 143
9.1.1.2 Item-Based Collaborative Filtering (IBCF)................... 144
9.2 Profile Injection Attack Method............................................................... 145
9.3 Shilling Attack Detection........................................................................... 146
9.4 Rating........................................................................................................... 146
9.5 Time Interval............................................................................................... 146
9.6 Classification............................................................................................... 147
9.6.1 Using the Rating Parameter.......................................................... 147
9.6.2 Detection of Attack Profiles........................................................... 147
9.6.2.1 Hilbert–Huang Transform and Support
Vector Machine (HHT-SVM)........................................ 148
9.6.2.2 Re-scale AdaBoost......................................................... 148
9.6.2.3 Variable-Length Partitions with Neighbor
Selection.......................................................................... 148
9.6.2.4 Principal Component Analysis.................................... 149
9.6.2.5 Discrete Wavelet Transform and Support
Vector Machine (DWTSVM)......................................... 149
9.6.2.6 Principal Component Analysis and Perturbation..... 149
9.6.2.7 Semi-supervised Learning Using the Shilling
Attack Detection (SEMI-SAD)..................................... 150
9.6.2.8 Support Vector Machine and Target Item
Analysis (SVM-TIA)...................................................... 150
9.6.2.9 Rating Deviation from Mean Agreement................... 150
9.6.2.10 Novel Shilling Attack Detection.................................. 150
9.1 Introduction
Within the last 20 years, recommender systems have emerged as one of the most efficient techniques for dealing with information overload, suggesting information of potential interest to online users. They are helpful to businesses selling merchandise, as they increase the selling rate, cross-sales and customers' loyalty. As a result, customers tend to come back to the sites that best serve their desires. Recommendation strategies are typically categorized into three main approaches: (i) content-based recommendation, (ii) collaborative filtering and (iii) hybrid recommendation approaches. Recommendation systems, particularly collaborative filtering (CF)-based systems, have been introduced with success to filter out irrelevant resources (Si and Li 2020, Sarwar et al. 2001). Among recommender systems, the collaborative filtering recommender system (CFRS) is considered one of the most popular and productive techniques. CFRS works on the principle that similar users have similar tastes. However, collaborative filtering is both a source of strength and of vulnerability for recommender systems due to its open and interactive nature. Generally, a user-based CF algorithm makes recommendations by searching for similar user patterns, which are illustrated by the preferences of numerous totally non-identical people (Si and Li 2020). If profiles contain biased information, they may be mistaken for real users and eventually lead to biased recommendations. Therefore, relevant data gets buried under a good deal of irrelevant information.
FIGURE 9.1
Framework of user-based collaborative filtering (UBCF).
Figure 9.1 shows the framework of UBCF. This technique involves two tasks. First, the k-nearest neighbours (kNN) of the active user are found using a similarity function; the kNN-based algorithm is the most popular CF algorithm. Data is represented as a $u \times i$ user-item matrix, where an entry $(u, i)$ holds the rating user $u$ gave to item $i$, if he/she rated it, or null otherwise. Using the Pearson correlation given in Equation (9.1), the similarity between users is computed (Chirita et al. 2005) as
$$W_{ij} = \frac{\sum_{k \in I}\left(R_{ik} - \bar{R}_i\right)\left(R_{jk} - \bar{R}_j\right)}{\sqrt{\sum_{k \in I}\left(R_{ik} - \bar{R}_i\right)^2}\,\sqrt{\sum_{k \in I}\left(R_{jk} - \bar{R}_j\right)^2}} \quad (9.1)$$

where $I$ represents the set of items, $R_{ik}$ and $R_{jk}$ are the ratings users $i$ and $j$ gave to item $k$, and $\bar{R}_i$ and $\bar{R}_j$ are the average ratings of users $i$ and $j$, respectively.
Finally, using the kNN formula, the prediction for user $i$ and item $a$ can be computed as

$$P_{ia} = \bar{R}_i + \frac{\sum_{j=1}^{k} W_{ij}\left(R_{ja} - \bar{R}_j\right)}{\sum_{j=1}^{k} W_{ij}} \quad (9.2)$$
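To make Equations (9.1) and (9.2) concrete, here is a minimal sketch (our illustration, not code from the cited works) that predicts a rating in a toy user-item matrix; NaN marks unrated entries, and taking absolute weights in the denominator is a common stabilising choice:

```python
import numpy as np

def pearson_similarity(R, i, j):
    """Pearson correlation between users i and j over co-rated items (Eq. 9.1)."""
    co = ~np.isnan(R[i]) & ~np.isnan(R[j])            # items rated by both users
    if co.sum() < 2:
        return 0.0
    di = R[i, co] - np.nanmean(R[i])                  # deviations from each user's mean rating
    dj = R[j, co] - np.nanmean(R[j])
    denom = np.sqrt((di ** 2).sum()) * np.sqrt((dj ** 2).sum())
    return float(di @ dj / denom) if denom else 0.0

def predict_rating(R, i, a, k=2):
    """kNN prediction of user i's rating for item a (Eq. 9.2)."""
    others = [u for u in range(R.shape[0]) if u != i and not np.isnan(R[u, a])]
    sims = sorted(((pearson_similarity(R, i, u), u) for u in others), reverse=True)[:k]
    num = sum(w * (R[u, a] - np.nanmean(R[u])) for w, u in sims)
    den = sum(abs(w) for w, _ in sims)                # absolute weights for stability
    return np.nanmean(R[i]) + (num / den if den else 0.0)

# Toy 4-user x 4-item rating matrix; NaN marks unrated entries.
R = np.array([[5, 3, np.nan, 1],
              [4, np.nan, 4, 1],
              [1, 1, np.nan, 5],
              [1, np.nan, 5, 4]], dtype=float)
print(predict_rating(R, i=0, a=2))                    # estimate user 0's rating of item 2
```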
FIGURE 9.2
Framework of item-based collaborative filtering (IBCF).
In IBCF, the similarity between two items $i$ and $j$ is computed as the cosine of the angle between their rating vectors:

$$\text{sim}(i, j) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\|\,\|\vec{j}\|} \quad (9.3)$$
Then, the prediction for an item is computed by using a weighted sum: the prediction score $R_{ub}$ is calculated as a weighted average of the user's ratings on similar items.
TABLE 9.1
General Shilling Profile
Items:    i_A1 … i_Aj | i_D1 … i_Dl | i_α1 … i_αl | i_P
Ratings:  δ(i_A1) … δ(i_Aj) | σ(i_D1) … σ(i_Dl) | Null … Null | γ(i_P)
9.4 Rating
Rating is a measurement of the quality of something, especially when
compared with other things of the same type.
9.5 Time Interval
A clock breaks time into intervals of hours, minutes and seconds. An interval is a discrete measurement of time between two events.
9.6 Classification
In this section, a classification of shilling attack detection techniques is presented. Based on the parameter used, the techniques are classified into two types: (i) those using the rating parameter and (ii) those using the rating and time-interval parameters. The techniques are further classified into three types based on their output: (i) attack profile detection, (ii) detection of both attack items and attack profiles, and (iii) attack item detection. The classification is shown in Figure 9.3.
FIGURE 9.3
Classification.
9.6.2.2 Re-scale AdaBoost
This detection method improves overall performance in two respects. First, it extracts well-designed features from user properties; using statistical properties based on the various attack models, it makes hard identification circumstances simpler to handle. Then, following the general idea of re-scale boosting (RBoosting) and AdaBoost, it uses a variation of AdaBoost known as re-scale AdaBoost (RAdaBoost) (Yang et al. 2016). The RAdaBoost detection technique operates on the extracted features. If the parameters are selected suitably, the RBoosting technique is essentially an optimal boosting-type computation. RAdaBoost may be employed in conjunction with several alternative kinds of machine learning (ML) algorithms to upgrade the capability of shilling attack recognition. The technique also puts extra stress on the concerned attacks, and on a difficult classification task it clearly improves the predictive capability.
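The RAdaBoost variant itself is not reproduced here; the sketch below only shows the underlying boosting setup with scikit-learn's standard AdaBoost on hypothetical per-profile features (the feature matrix and labels are synthetic assumptions):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical feature matrix: one row per user profile, columns are
# statistical features extracted from ratings (e.g., rating deviation,
# neighbour similarity, filler size). Labels: 1 = attack profile.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Plain AdaBoost; RAdaBoost additionally re-scales the ensemble at each step.
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```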
9.6.2.6 Principal Component Analysis and Perturbation
In this technique, principal component analysis (PCA) is applied twice: before and after the application of Gaussian noise to every user profile. The shilling attacks are then discovered by merging the results obtained from the two PCAs. The technique achieves higher accuracy in experiments because the ratings of a shilling profile (SP) deviate less from the average rating, so the effect of injecting perturbation into SPs is greater than that on legitimate profiles. The experimental results confirm that this technique outperforms the initial PCA and that injecting perturbation is useful for shilling attack detection.
Another clustering-based approach builds a binary decision tree (BDT) in which attack profiles are gathered in a leaf node. The BDT is constructed by recursively clustering the training data with the k-means clustering algorithm to locate the fake attack profiles: the user-item matrix is divided into two distinct clusters at each level, and the intra-cluster correlation coefficient is calculated for each internal node. The process is repeated until at most a predefined number of users remain in any leaf node. The BDT is then traversed to detect anomalies with the intra-cluster correlation coefficient and to label the node holding all or most of the attack profiles.
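A minimal sketch of the recursive binary clustering just described, assuming rating profiles as matrix rows and a hypothetical maximum leaf size; the real method additionally scores nodes with the intra-cluster correlation coefficient before labelling attack leaves:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_bdt(profiles, indices, max_leaf=10, depth=0, tree=None):
    """Recursively split user profiles into a binary decision tree with k-means (k=2)."""
    if tree is None:
        tree = []
    if len(indices) <= max_leaf:
        tree.append((depth, indices))              # leaf node: candidate attack cluster
        return tree
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles[indices])
    for side in (0, 1):
        build_bdt(profiles, indices[labels == side], max_leaf, depth + 1, tree)
    return tree

# Toy data: 40 genuine profiles plus 8 highly correlated (attack-like) profiles.
rng = np.random.default_rng(1)
genuine = rng.integers(1, 6, size=(40, 20)).astype(float)
attacks = np.tile(rng.integers(1, 6, size=(1, 20)), (8, 1)).astype(float)
profiles = np.vstack([genuine, attacks + rng.normal(scale=0.1, size=attacks.shape)])

for depth, idx in build_bdt(profiles, np.arange(len(profiles))):
    print(f"leaf at depth {depth}: users {idx.tolist()}")
```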
9.6.7.3 Data Tracking
The data tracking detection technique (Qi et al. 2018) is designed for big-data environments and relies on new data features. The technique uses an extended Kalman filter, which quickly tracks and accurately predicts the rating status of an item based on two new detection attributes: short-term average change activity (SACA) and short-term variance change activity (SVCA). The detector then flags an abnormal item by comparing predicted ratings with actual ones.
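The extended Kalman filter itself is not reproduced here; under simplifying assumptions, the sketch below only illustrates rolling change features in the spirit of SACA and SVCA, computed from an item's rating stream with a hypothetical window length:

```python
import numpy as np

def short_term_changes(ratings, window=10):
    """Rolling change in mean (SACA-like) and variance (SVCA-like) of an item's ratings."""
    r = np.asarray(ratings, dtype=float)
    means = np.array([r[i:i + window].mean() for i in range(len(r) - window + 1)])
    varis = np.array([r[i:i + window].var() for i in range(len(r) - window + 1)])
    return np.abs(np.diff(means)), np.abs(np.diff(varis))

# Toy stream: normal ratings, then a burst of injected 5-star ratings.
stream = [3, 4, 2, 3, 4, 3, 2, 3, 4, 3, 3, 2, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5]
saca, svca = short_term_changes(stream, window=5)
print("suspicious positions:", np.where(saca > 0.5)[0].tolist())
```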
TABLE 9.2
Advantages and Disadvantages of Shilling Attack Detection Techniques

1. Novel shilling attack detection (Bilge et al. 2014b)
   Advantages: Gives promising results for almost all filler sizes and attack sizes.
   Disadvantages: Detects only three specific types of attack, namely the bandwagon, segment and average attacks. When all the profiles in the system are genuine, it misjudges genuine profiles as malicious ones.

2. Hilbert–Huang transform and support vector machine (HHT-SVM) (Fuzhi et al. 2014)
   Advantages: Decomposes each rating series and extracts Hilbert spectrum-based features for characterizing profile injection attacks; the SVM then distinguishes genuine user profiles from attacker profiles.
   Disadvantages: As soon as new types of attacks occur, the SVM classifier needs to be re-trained offline.

3. Re-scale AdaBoost (Yang et al. 2016)
   Advantages: Extracts well-designed features from user profiles; to improve detection performance, these features are established on the statistical properties of the various attack models, making hard detection scenarios easier to handle.
   Disadvantages: The detection rate is low for attacks with small attack and filler sizes. It cannot effectively detect Power User Attack-Aggregate Similarity (PUA-AS), Power User Attack-Number of Ratings (PUA-NR) and Power User Attack-In Degree (PUA-ID) attacks. The generic and type-specific features used as extracted features are not enough to depict the attacks' material characteristics.

4. Rating Deviation from Mean Agreement (RDMA) (Si and Li 2020; Burke et al. 2006)
   Advantages: Successfully detects attack profiles that are random, average and bandwagon.
   Disadvantages: Unable to detect the segment attack and the love/hate attack.

14. Attack detection in time series (Zhang et al. 2006)
   Advantages: The time series of the sample average and sample entropy features can expose attack events under reasonable assumptions about their duration. An optimal window size can be derived theoretically, provided the number of attack profiles is known, to best identify the rating distribution changes caused by attacks.
   Disadvantages: During an attack, normal users' rating patterns sometimes also change, so it becomes difficult to identify shilling attackers.

15. Dynamic time interval segmentation (Xia et al. 2015)
   Advantages: Divides the life cycle of each item into several time intervals, dynamically at checkpoints, and detects anomalous items through a hypothesis test.
   Disadvantages: Detection performance against the target shifting attack needs improvement. It is not able to minimize the impact of attacks effectively after identifying the suspicious intervals of each item.

16. UnRAP (Si and Li 2020)
   Advantages: Detects shilling profiles by analysing a user profile's rating deviation on the target item.
   Disadvantages: Detects only the attack user profiles for individual times.

17. Data tracking (Qi et al. 2018)
   Advantages: Adapted to big data processing, with high detection efficiency.
   Disadvantages: Not applicable to handling large data in distributed systems.
9.7 Conclusion
A recommender system (RS) is an application that helps users select relevant products on the internet. To reduce the damage of a shilling attack and maintain a good quality of recommendation, the security of RSs is a significant issue. Fake user profiles created by shilling attack (SA) methods can, however, be accurately detected by recent SA detection techniques, since the rating patterns of these attack profiles differ from those of real users. Being one of the most efficient ways of handling the problem of information overload, CFRSs are very much exposed to such attacks.
9.7.1 Future Direction
Future work consists in addressing the shortcomings of these shilling attack detection techniques. For instance, in the case of the dynamic time interval segmentation technique, the detection performance against the target shifting attack needs to be enhanced. A shortcoming of the data tracking technique is that it is not applicable to distributed systems handling giant data. The problem with HHT-SVM is that, when new kinds of attacks are conducted, it becomes difficult to detect attack profiles, and the SVM classifier needs to be re-trained offline. Overcoming these problems is the direction for future work.
References
Bhaumik R, Burke R, Mobasher B (2007). Effectiveness of crawling attacks against
web-based recommender systems. Proceedings of the 5th workshop on intelli-
gent techniques for web personalization (ITWP-07).
Bhaumik R, Mobasher B, Burke R. (2011). A clustering approach to unsupervised
attack detection in collaborative recommender systems. Proceedings of the
International Conference on Data Mining (DMIN). The Steering Committee of
the World Congress in Computer Science, Computer Engineering and Applied
Computing (World Comp), p. 1.
Bhaumik R, Williams C, Mobasher B, Burke R (2006 Jul 16). Securing collaborative
filtering against malicious attacks through anomaly detection. In Proceedings of
the 4th Workshop on Intelligent Techniques for Web Personalization (ITWP’06),
Boston. Vol. 6, p. 10.
Bilge A, Gunes I, Polat, H. (2014a) Robustness analysis of privacy-preserving model-
based recommendation schemes. Expert Syst Appl, 41(8): 3671–3681.
Bilge A, Ozdemir Z, Polat H. (2014b). A novel shilling attack detection method. Proc
Computer Science, 31: 165–174.
Wang Y et al. (2018). A comparative study on shilling detection methods for trust-
worthy recommendations. Journal of Systems Science and Systems Engineering
27(4): 458–478.
Wu Z, Cao J, Mao B, Wang Y. (2011, October). Semi-SAD: applying semi-supervised
learning to shilling attack detection. In Proceedings of the fifth ACM conference
on Recommender systems. ACM. pp. 289–292.
Xia H., Fang B, Gao M, Ma H, Tang Y, Wen J. (2015). A novel item anomaly detec-
tion approach against shilling attacks in collaborative recommendation systems
using the dynamic time interval segmentation technique. Information Sciences
306: 150–165.
Xin, Z, Hong X, Yuqi S (2019). Meta-path and matrix factorization based shilling
detection for collaborate filtering. International Conference on Collaborative
Computing: Networking, Applications and Worksharing. Springer, Cham,
pp. 3–16.
Yang, Z., et al. (2016). Re-scale AdaBoost for attack detection in collaborative filtering
recommender systems. Knowledge-Based Systems 100: 74–88.
Zhang, S, Chakrabarti A, Ford J, Makedon F (2006). Attack detection in time series for
recommender systems. In Proceedings of the 12th ACM SIGKDD international
conference on Knowledge discovery and data mining, ACM. pp. 809–814.
Zhou W et al. (2016). SVM-TIA: a shilling attack detection method based on SVM and target item analysis in recommender systems. Neurocomputing 210: 197–205.
10
Machine Learning Applications in
Real-World Time Series Problems
Córdoba, Spain
[email protected]
CONTENTS
10.1 Introduction................................................................................................. 161
10.2 Real-World Applications of TSDM Using ML Algorithms.................... 164
10.2.1 Massive Missing Data Reconstruction in Wave
Height Time Series......................................................................... 164
10.2.2 Detection and Prediction of Tipping Points............................... 165
10.2.3 Prediction of Fog Formation......................................................... 167
10.2.4 Prediction of Other Convective Situations................................. 169
10.3 Summary and Other Related Works........................................................ 170
10.1 Introduction
This first section introduces the topic presented and the related state-of-the-
art developments. Time series data mining (TSDM) mainly consists of the
following tasks: anomaly detection (Blázquez-García et al., 2020), classifica-
tion (Ismail-Fawaz et al., 2019), analysis and preprocessing (Hamilton, 1994),
segmentation (Keogh et al., 2004), clustering (Liao, 2005) and prediction
(Weigend, 2018). More concretely, this chapter is focused on the applications
of time series preprocessing, segmentation and prediction to real-world problems.
Time series analysis and preprocessing are considered previous steps for other TSDM tasks. One of the most important tasks is the imputation of missing values in time series, which is essential for the application of most subsequent TSDM techniques.
FIGURE 10.1
Segmentation procedure for a time series of length N = 20 and four segments: $s_1 = \{y_1, \ldots, y_6\}$, $s_2 = \{y_6, \ldots, y_9\}$, $s_3 = \{y_9, \ldots, y_{17}\}$ and $s_4 = \{y_{17}, \ldots, y_{20}\}$.
In the transition from one segment to the next, each cut point belongs to two segments, in such a way that it belongs both to the previous and to the next segment. Figure 10.1 graphically represents a segmentation procedure.
As mentioned above, segmentation is used to achieve different objectives. In the literature, there are two main groups of objectives. On the one
hand, this technique is carried out to discover segment similarities or useful
patterns over time. Methods within this context try to optimize the division
of time series and then group the segments into different clusters. Several
methods have been proposed during the last two decades. Initially, Abonyi
et al. (2003) stated that all points in a time series belonging to the same cluster
are contiguous in time. After that, many authors proposed methods to group
segments instead of points. Tseng et al. (2009) developed an algorithm in
which similarities in wavelet space are considered to guide the segmentation,
resulting in a clustering of segments of different length. Besides, a signifi-
cant clustering of subsequent time series was addressed using two efficient
methods (Rakthanmanon et al., 2012). As can be seen, all these segmenta-
tion procedures require clustering algorithms, given that they aim to make
groups of segments in order to discover useful similarities. In this sense,
several algorithms for time series clustering have been proposed recently
(Guijo-Rubio et al., 2020a), aiming to obtain time series groups with similar
characteristics or based on segments typologies.
On the other hand, time series segmentation is also applied for approxi-
mating time series; in other words, to reduce the number of time series points.
In this case, the procedure is performed by selecting those points whose approximation is the most accurate with respect to the real time series; that is, the approximation aims to minimize the information loss or the approximation
error. One of the main purposes is to mitigate the difficulty of processing and
memory requirements. A well-known approach in this context is the use of
piecewise linear approximations, where linear regressions or interpolations
are used for modelling each segment (Keogh et al., 2004). Moreover, Fu
(2011) presented an approach for approximating time series by defining it as
an optimization problem that could be solved by evolutionary algorithms.
Lately, other authors have developed a novel approach based on connected
lines under a predefined maximum error bound (Zhao et al., 2016).
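As an illustration of this approximation objective, the following sketch implements a simple greedy top-down piecewise linear approximation by interpolating between cut points; it is a generic textbook-style variant, not one of the specific algorithms cited above:

```python
import numpy as np

def pla_topdown(y, n_points=6):
    """Greedy top-down piecewise linear approximation of a series y."""
    x = np.arange(len(y), dtype=float)
    cuts = [0, len(y) - 1]                         # always keep the end points
    while len(cuts) < n_points:
        approx = np.interp(x, sorted(cuts), np.asarray(y)[sorted(cuts)])
        err = np.abs(np.asarray(y) - approx)
        err[cuts] = -1.0                           # never re-pick an existing cut point
        cuts.append(int(err.argmax()))             # add the worst-approximated point
    cuts = sorted(cuts)
    return cuts, np.interp(x, cuts, np.asarray(y)[cuts])

y = np.sin(np.linspace(0, 3 * np.pi, 50)) + 0.05 * np.random.default_rng(0).normal(size=50)
cuts, approx = pla_topdown(y, n_points=8)
print("cut points:", cuts, "MAE:", float(np.abs(y - approx).mean()))
```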
Finally, the prediction of a time series is formally defined as follows: given a time series $Y = \{y_i\}$ $(i = 1, \ldots, N)$, the prediction consists in the determination of the value $y_{N+T}$, $T$ being a future instant of time. This procedure learns from the known past values in order to generate a model able to estimate future ones accurately. Traditional statistical models are still widely used for this task; for example, the COVID-19 pandemic spread in Saudi Arabia has been forecasted by AutoRegressive Integrated Moving Average (ARIMA) models (Alzahrani et al., 2020). Nevertheless, nowadays there is an increasing interest in developing novel ML algorithms for prediction, such as artificial neural networks (ANNs) and even more advanced approaches.
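As an illustration of this kind of statistical forecasting, the following minimal sketch (an illustrative example under assumed data and model order, not the setup of the cited study) fits an ARIMA model with statsmodels to a synthetic series and forecasts three future values:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic AR(1) series standing in for an observed time series Y = {y_i}.
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.8 * y[t - 1] + rng.normal()

model = ARIMA(y, order=(1, 0, 0)).fit()        # fit ARIMA(p=1, d=0, q=0)
print(model.forecast(steps=3))                 # predictions for y_{N+1}, ..., y_{N+3}
```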
successfully applied over Lake Ontario. Finally, physical methods have also been combined with ML approaches for wave height estimation (Casas-Prat et al., 2014).
As most ML models lack interpretability, we proposed a new method in Durán-Rosal et al. (2015): a product unit neural network trained by an evolutionary algorithm (EPUNN), applied through a two-stage procedure. This methodology was applied to six buoys located in the Gulf of Alaska, United States.
The first stage consisted in performing an initial reconstruction using transfer functions and the neighbour correlation method. On the one hand, transfer functions are based on the analysis of the correlation and the estimation of a gappy time series by means of a complete one (note that the correlation between them needs to be higher than a predefined threshold). On the other hand, the neighbour correlation method estimates these missing values by adding information from the most highly correlated buoys.
Once both methods have recovered the missing values, we keep the best
one, that is, the one resulting in the smallest error. After that, the best recovery
for each time series is used as input for the EPUNN of the second stage. In this
sense, the two most correlated inputs (concerning the original time series) are
used to get the definitive reconstructed time series.
The results achieved for the six buoys located in the Gulf of Alaska indicated that EPUNNs are suitable for this kind of problem, given their accuracy and their interpretability. Besides, they can be represented as linear models by applying the natural logarithm to the inputs, easing understanding and interpretation. As a remark, better reconstructions were achieved for coastal buoys, given that the availability of values was higher.
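A minimal sketch of the first-stage idea, under simplifying assumptions (synthetic buoy series, a plain linear fit standing in for the transfer functions): the gappy target series is reconstructed from its most correlated complete neighbour:

```python
import numpy as np

rng = np.random.default_rng(0)
base = np.cumsum(rng.normal(size=300))            # shared underlying signal
neighbours = np.stack([base + rng.normal(scale=s, size=300) for s in (0.5, 1.0, 2.0)])
target = base + rng.normal(scale=0.5, size=300)
target[100:140] = np.nan                          # a massive gap to reconstruct

obs = ~np.isnan(target)
# Pick the neighbour most correlated with the target on the observed samples.
corr = [np.corrcoef(n[obs], target[obs])[0, 1] for n in neighbours]
best = neighbours[int(np.argmax(corr))]

# Fit target ~ a * neighbour + b on the observed part, then fill the gap.
a, b = np.polyfit(best[obs], target[obs], deg=1)
filled = np.where(obs, target, a * best + b)
print("max correlation:", max(corr), "gap filled:", not np.isnan(filled).any())
```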
FIGURE 10.2
Best ANN obtained by MOEA using PUs as basis functions. V and D are the velocity and wind
direction, respectively, T corresponds to the temperature, H represents the humidity, P is the
pressure, and RVR is the runway visual range previously defined.
The results showed that the combination of a FW and a DW with the algo-
rithm KDLOR is outstanding, producing the best results in terms of min-
imum sensitivity and AMAE (the average of ordinal classification error made
for each class). This methodology could lead to an improvement in the safety
and profitability of aviation operations at airports affected by low-visibility
events.
The authors of this chapter have addressed some other applications in real-
world problems since time series can be found in a wide variety of fields, as
was mentioned above. A genetic algorithm (GA) was proposed in combination with a likelihood-based segmentation procedure to automatically recognize financial patterns in European stock market indexes (Durán-Rosal et al., 2017a). The detection
and prediction of extreme waves using the retrieved time series obtained by
oceanographic buoys were also solved by applying a two-stage algorithm
(Durán-Rosal et al., 2017b). First, a segmentation algorithm, resulting from
an evolutionary approach hybridized with a likelihood-based segmentation
involving a beta distribution, was used to detect the extreme events. Second,
an evolutionary ANN was proposed to predict them. Finally, in Guijo-Rubio
et al. (2020d), evolutionary ANNs were successfully applied for the predic-
tion of global solar radiation at the radiometric station of Toledo (Spain)
using satellite-based measurements.
Apart from this, there is a vast number of works using time series in other applications. The potential of this type of temporal data is significant due to the wide variety of solutions given to these problems using ML techniques. The future lies in the massive collection of data in real time, its processing and the instantaneous generation of automatic ML techniques to model and extract knowledge from it.
References
Abonyi, J., Feil, B., Nemeth, S., Arva. P. 2003. Fuzzy clustering based segmentation of
time-series. In Advances in Intelligent Data Analysis V, 275–285. https://doi.org/
10.1007/978-3-540-45231-7_26
Alzahrani, S. I., Aljamaan, I. A., Al-Fakih, E. A. 2020. Forecasting the spread of the
COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under
current public health interventions. Journal of Infection and Public Health, 1(7),
914–919. https://doi.org/10.1016/j.jiph.2020.06.001
Belda, S., Pipia, L., Morcillo-Pallarés, P., Rivera-Caicedo, J. P., Amin, E., De Grave, C.,
Verrelst, J. 2020. DATimeS: A machine learning time series GUI toolbox for gap-
filling and vegetation phenology trends detection. Enviromental Modelling and
Software 127, 104666. https://doi.org/10.1016/j.envsoft.2020.104666
Balas, C., Koç, L., Balas, L. 2004. Predictions of missing wave data by recurrent
neuronets. Journal of Waterway, Port, Coastal, and Ocean Engineering, 130(5), 256–
265. https://doi.org/10.1061/(ASCE)0733-950X(2004)130:5(256)
Bergot, T., Terradellas, E., Cuxart, J., Mira, A., Liechti, O., Mueller, M., Nielsen, N.W.
2007. Intercomparison of single-column numerical models for the prediction of
radiation fog. Journal of Applied Meteorology and Climatology, 46, 504–521. https://
doi.org/10.1175/JAM2475.1
Bhattacharya, B., Shrestha, D., Solomatine, D. 2003. Neural networks in reconstructing
missing wave data in sedimentation modelling. In Proceedings of the 30th IAHR
Congress, 500, 770–778.
Blázquez-García, A., Conde, A., Mori, U., Lozano, J. A. 2020. A review on outlier/
anomaly detection in time series data. ArXiv:2002.04236 [Cs], https://arxiv.org/
abs/2002.04236
Bremnes, J. B., Michaelides, S. C. 2007. Probabilistic visibility forecasting using
neural networks. In Fog and Boundary Layer Clouds: Fog Visibility and Forecasting.
Springer, pp. 1365–1381. https://doi.org/10.1007/s00024-007-0223-6
Cao, Z., Cai, H. 2016. Identification of forcing mechanisms of convective initi-
ation over mountains through high-resolution numerical simulations.
Advances in Atmospheric Sciences, 33(10), 1104. https://doi.org/10.1007/s00
376-016-6198-4
Casas-Prat, M., Wang, X. L., Sierra, J. P. 2014. A physical-based statistical method for
modeling ocean wave heights. Ocean Modelling, 73, 59–75. https://doi.org/
10.1016/j.ocemod.2013.10.008
Chandra, R., Chand, S. 2016. Evaluation of co-evolutionary neural net-
work architectures for time series prediction with mobile application in
finance. Applied Soft Computing, 49, 462–473. https://doi.org/10.1016/
j.asoc.2016.08.029
Chmielecki, R. M., Raftery, A. E. 2011. Probabilistic visibility forecasting using
Bayesian model averaging. Monthly Weather Review, 139, 1626–1636. https://doi.
org/10.1175/2010MWR3516.1
Colabone, R. D. O., Ferrari, A. L., Vecchia, F. A. D. S., Tech, A. R. B. 2015. Application
of artificial neural networks for fog forecast. Journal of Aerospace Technology and
Management, 7, 240–246. https://doi.org/10.5028/jatm.v7i2.446
Cornejo-Bueno, L., Casanova-Mateo, C., Sanz-Justo, J., Cerro-Prada, E., Salcedo-Sanz,
S. 2017. Efficient prediction of low-visibility events at airports using machine-
learning regression. Boundary-Layer Meteorology, 165, 349–370. https://doi.org/
10.1007/s10546-017-0276-8
Dakos, V., Carpenter, S. R., Brock, W. A., Ellison, A. M., Guttal, V., Ives, A. R., Kefi,
S., Livina, V., Seekell, D. A., Van Nes, E. H, et al. 2012. Methods for detecting
early warnings of critical transitions in time series illustrated using simulated
ecological data. PloS One, 7(7), e41010. https://doi.org/10.1371/journal.
pone.0041010
Durán-Rosal, A. M., Hervás-Martínez, C., Tallón-Ballesteros, A. J., Martínez-Estudillo,
A. C., Salcedo-Sanz, S. 2015. Massive missing data reconstruction in ocean buoys
with evolutionary product unit neural networks. Ocean Engineering, 117, 292–
301. https://doi.org/10.1016/j.oceaneng.2016.03.053
Durán-Rosal, A. M., de la Paz-Marín, M., Gutiérrez, P. A., Hervás-Martínez, C.
2017a. Identifying market behaviours using European Stock Index time series
by a hybrid segmentation algorithm. Neural Processing Letters, 46(3), 767–790.
https://doi.org/10.1007/s11063-017-9592-8
Durán-Rosal, A. M., Fernández, J. C., Gutiérrez, P. A., Hervás-Martínez, C. 2017b.
Detection and prediction of segments containing extreme significant wave
heights. Ocean Engineering, 142, 268–279. https://doi.org/10.1016/j.ocean
eng.2017.07.009.
Durán-Rosal, A. M., Fernández, J. C., Casanova-Mateo, C., Sanz-Justo, J., Salcedo-
Sanz, S., Hervás-Martínez, C. 2018. Efficient fog prediction with multi-objective
evolutionary neural networks. Applied Soft Computing 70, 347–358. https://doi.
org/10.1016/j.asoc.2018.05.035.
Fabbian, D., de Dear, R., Lellyett, S., 2007. Application of artificial neural network
forecasts to predict fog at Canberra international airport. Weather Forecast, 22,
372–381. https://doi.org/10.1175/WAF980.1.
Fu, T. C. (2011). A review on time series data mining. Engineering Applications of
Artificial Intelligence, 24(1), 164–181.
Guijo-Rubio, D., Gutiérrez, P. A., Casanova-Mateo, C., Sanz-Justo, J., Salcedo-Sanz,
S., Hervás-Martínez, C. 2018. Prediction of low-visibility events due to fog
using ordinal classification. Atmospheric Research, 214, 64–73. https://doi.org/
10.1016/j.atmosres.2018.07.017.
Guijo-Rubio, D., Durán-Rosal, A. M., Gutiérrez, P. A., Troncoso, A., Hervás-Martínez,
C. 2020a. Time-series clustering based on the characterization of segment typ-
ologies. IEEE Transactions on Cybernetics (early access). http://doi.org/10.1109/
TCYB.2019.2962584
Guijo-Rubio, D., Gutiérrez, P. A., Casanova-Mateo, C., et al. 2020b. Prediction of con-
vective clouds formation using evolutionary neural computation techniques.
Neural Computing and Applications, 33, 13917–13929. https://doi.org/10.1007/
s00521-020-04795-w
Guijo-Rubio, D., Casanova-Mateo, C., Sanz-Justo, J., Gutiérrez, P. A., Cornejo-Bueno,
S., Hervás, C., Salcedo-Sanz, S. 2020c. Ordinal regression algorithms for the ana-
lysis of convective situations over Madrid-Barajas airport. Atmospheric Research,
236, 104798. https://doi.org/10.1016/j.atmosres.2019.104798
Guijo-Rubio, D., Durán-Rosal, A. M., Gutiérrez, P. A., et al. 2020d. Evolutionary artifi-
cial neural networks for accurate solar radiation prediction. Energy, 210, 118374.
https://doi.org/10.1016/j.energy.2020.118374
Gunaydin, K. 2008. The estimation of monthly mean significant wave heights by using
artificial neural network and regression methods. Ocean Engineering, 35(14–15),
1406–1415. https://doi.org/10.1016/j.oceaneng.2008.07.008
Karevan, Z., & Suykens, J. A. (2020). Transductive LSTM for time-series prediction: An
application to weather forecasting. Neural Networks, 125, 1–9.
Hamilton, J. D. 1994. Time Series Analysis, volume 2. Princeton University Press,
Princeton, NJ.
Ismail Fawaz, H., Forestier, G., Weber, J. et al. 2019. Deep learning for time series clas-
sification: a review. Data Mining and Knowledge Discovery 33, 917–963. https://
doi.org/10.1007/s10618-019-00619-1
Keogh, E., Chu, S., Hart, D., Pazzani, M. 2004. Segmenting time series: A survey and
novel approach. Data mining in Time Series Databases, 1–21. https://doi.org/
10.1142/9789812565402_0001
Lenton, T. M. 2011. Early warning of climate tipping points. Nature Climate Change, 1,
201–209. https://doi.org/10.1038/nclimate1143
Liao, T. W. 2005. Clustering of time series data—a survey. Pattern Recognition, 38(11),
1857–1874. https://doi.org/10.1016/j.patcog.2005.01.025
Londhe, S. 2008. Soft computing approach for real-time estimation of missing wave
heights. Ocean Engineering, 35(11–12), 1080–1089. https://doi.org/10.1016/
j.oceaneng.2008.05.003
López, I., Andreu, J., Ceballos, S., de Alegría, I. M., Kortabarria, I. 2013. Review of
wave energy technologies and the necessary power- equipment. Renewable
and Sustainable Energy Reviews, 27, 413–434. https://doi.org/10.1016/
j.rser.2013.07.009
Zhao, H., Dong, Z., Li, T., Wang, X., Pang, C. 2016. Segmenting times series with
connected lines under maximum error bound. Information Sciences, 345, 1–8.
https://doi.org/10.1016/j.ins.2015.09.017
Zhou, J., Huang, Z. 2018. Recover missing sensor data with iterative imputing net-
work. In Workshops at the Thirty- Second AAAI Conference on Artificial
Intelligence.
11
Prediction of Selective Laser Sintering
Part Quality Using Deep Learning
CONTENTS
11.1 Introduction.............................................................................................. 177
11.1.1 Aim of the Chapter.................................................................... 180
11.2 Selective Laser Sintering Additive Manufacturing............................. 180
11.3 Machine Learning.................................................................................... 181
11.3.1 Data.............................................................................................. 182
11.3.2 Models......................................................................................... 182
11.3.3 Training....................................................................................... 182
11.3.4 Learning Type............................................................................ 183
11.4 Deep Neural Network Learning............................................................ 183
11.5 An Illustration Case................................................................................. 184
11.5.1 Dataset for the Chapter............................................................. 185
11.5.2 Deep Neural Network Parameters.......................................... 185
11.5.3 K-fold-Cross-validation for Training the Deep
Neural Networks....................................................................... 187
11.5.4 Overfitting.................................................................................. 187
11.5.5 Results and Discussion............................................................. 188
11.6 Conclusions............................................................................................... 189
11.1 Introduction
Manufacturing is the pillar of the economy that transforms raw materials into products. It faces a growing need for individualisation.
FIGURE 11.1
A typical SLS additive manufacturing process.
A fresh layer of metal powder is formed over the build platform by the roller in the AM machine. A powder removal platform is moved down the powder removal port to remove the excess metal powder from the machine. These two processes, powder layering and powder melting, take place alternately till all products are produced. Each layer is joined to the adjacent layers by the laser-energy-fused material powder. In the end, the parts on the build platform are extracted from the AM machine, and the machine is cleaned for the next AM part production. Next, the parts on the build platform often undergo a thorough heat treatment to remove the thermal stress. Finally, the parts are cut from the platform for post-processing actions such as removing the support structure and surface polishing.
Selective laser sintering (SLS) has the merits of recycling the leftover unprocessed metal powder, time efficiency, energy efficiency and geometrical freedom in product design. SLS is a promising AM process for applications in areas such as aerospace and automotive manufacturing.
11.3 Machine Learning
Machine learning is a promising area that employs existing data to predict or respond to future data. It is used for computational statistics, pattern recognition and artificial intelligence, and it is vital for fields such as spam filtering and facial recognition, where framing explicit algorithms for the task is not feasible. Machine learning is a technique that uses machines, that is, computers with software, to find insights from existing data; it also refers to the ability of machines to learn from their environment. Machines have been employed to help humans since the start of civilisation. Machine learning is a process of using an algorithm to transform input data into parameters that can be used to interpret future data. This section now describes some key terms used in machine learning.
11.3.1 Data
Every learning technique is based upon data. A data set is employed to train the machine system. Data sets are gathered by people for training and may be of very large size. Machine control systems can also gather data from system operation using sensors; these data are employed to identify the parameters or to train the machine system.
11.3.2 Models
Models are usually employed in machine learning systems. A model provides the mathematical framework for a machine learning system. A model is made by a person and depends upon human observations and experiences. For instance, viewed from the top, a bus of length l and width w can be modelled as a rectangle the size of a standard parking slot. Models are often man-made structures that give a framework to machine learning, but sometimes machine learning develops its models without any man-made structure.
11.3.3 Training
A machine learning system relates input data to an output, and training is required to perform this work. Just as people need training to do their tasks, the machine also needs training through its learning system. Training is provided by feeding the machine system a known input and the corresponding known output, so that the models or the data in the learning system are adjusted to learn the relationship; sometimes this is similar to curve fitting or regression. Given sufficient training data, the machine system gains the ability to generate correct outputs for new inputs. For instance, after feeding thousands of rat images, labelled as rats, to a face recognition machine system, the system will be able to identify new rat images as rats. If the training data are not sufficient in quantity or variety, the system may face problems in identifying the rats.
11.3.4 Learning Type
This section describes the main types of learning.

11.4 Deep Neural Network Learning
The hidden layers are used to project the input data onto a space of multiple dimensions, where the given input data may be examined from various perspectives. The greater the number of hidden layers, the more hidden patterns can potentially be identified in a given data set. But increasing the number of layers creates obstacles to training and running the deep neural network, such as the following.
(i) A severe vanishing gradient problem. Due to the increased depth of the neural network, it becomes difficult for the first layers of the deep neural network to receive the forecasting error, which degrades the effectiveness of training considerably.
(ii) The risk of overfitting goes up considerably with the increased complexity of the deep neural network architecture, since more parameters need to be trained.
(iii) Increased computational power is required to deal with the increased complexity of the deep neural network and the amount of training data.
But these problems have been solved to some degree with advances in deep neural network learning.
$$X = [w, l_t, d_t, v, T_s, T_e, M_s] \quad (11.1)$$
Here, the deep neural network is capable of expressing the relationship between the contraction ratio and the SLS parameters. This chapter considers a study [47] for the demonstration of deep learning. The specimen material is taken as HBI, a polystyrene composite, as in the study [47]. The shrinkage ratio [47] is defined as

$$Y\% = \frac{S_D - S_M}{S_D} \times 100 \quad (11.4)$$

where $S_D$ and $S_M$ are the CAD model value and the measured value, respectively.
TABLE 11.1
The SLS Parameters with Levels [47]
one node for the response characteristic, that is, the contraction ratio of the SLS-manufactured product. There were 10 hidden layers in the deep neural network. Owing to the vast variations in the values of the seven input SLS parameters and of the output product quality characteristic, that is, the contraction ratio, the input characteristics as well as the output characteristic should be normalised as below.
With mean of characteristic $X_i$ and sample size $N$, $X_{\text{mean}}$ is calculated as

$$X_{\text{mean}} = \frac{\sum_{i=1}^{N} X_i}{N} \quad (11.5)$$

$$X_{\text{sd}} = \sqrt{\frac{\sum_{i=1}^{N}\left(X_i - X_{\text{mean}}\right)^2}{N}} \quad (11.6)$$

$$X_{\text{NormDeviate}} = \frac{X_i - X_{\text{mean}}}{X_{\text{sd}}} \quad (11.7)$$
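A minimal sketch of Equations (11.5)-(11.7) with NumPy; the sample values are illustrative only:

```python
import numpy as np

X = np.array([120.0, 150.0, 90.0, 200.0, 170.0])    # one SLS characteristic, N = 5

X_mean = X.sum() / len(X)                           # Eq. (11.5)
X_sd = np.sqrt(((X - X_mean) ** 2).sum() / len(X))  # Eq. (11.6), population form
X_norm = (X - X_mean) / X_sd                        # Eq. (11.7), normal deviates

print(X_norm.round(3))                              # zero mean, unit standard deviation
```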
The rectified linear unit (ReLU), $f(Z_{\text{NormDeviate}}) = \max(0, Z_{\text{NormDeviate}})$, is the activation function employed when moving from one layer to the next [50]. This activation function is very popular owing to its faster learning in deep neural networks with multiple layers of neurons [51]. The sigmoid function, $f(Z_{\text{NormDeviate}}) = 1/(1 + \exp(-Z_{\text{NormDeviate}}))$, was employed as the activation function of the deep neural network's output layer because the output is real-valued. For training of the deep neural network, the error over the training data, $E_{od}(w)$, was defined as
$$E_{od}(w) = \frac{1}{2N}\sum_{i=1}^{N}\left(Y_{\text{predicted}} - Y_{\text{real}}\right)^2 \quad (11.8)$$

where $Y_{\text{real}}$, $Y_{\text{predicted}}$ and $N$ are the actual magnitude of the experimental output, the predicted magnitude of the model and the total number of SLS-printed parts, respectively.
11.5.4 Overfitting
Overfitting is a problem that commonly occurs in machine learning when a model is very accurate on the training data set but not on the validation data set. When overfitting occurs, the neural network model learns the noise found within the training data set rather than the actual relationship among the parameters in the data set. To avoid overfitting, the dropout technique and the weight decay regularisation technique are employed. Weight decay is an important technique to prevent weights from growing very large in value without real need. It does so by adding a term to the loss function that penalises large weights [54,55], as in the equation below:
$$E_{d}(w) = E_{od}(w) + \frac{1}{2}\,\lambda \sum_{i} w_i^2 \quad (11.9)$$
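A minimal sketch, assuming the TensorFlow/Keras toolchain cited in [48], of a deep regression network that combines L2 weight decay in the spirit of Equation (11.9) with dropout; the layer sizes, decay factor and dropout rate are illustrative assumptions, not the chapter's exact settings:

```python
import tensorflow as tf

def build_model(n_inputs=7, n_hidden=10, width=64, lam=1e-3, drop=0.2):
    """ReLU hidden layers with L2 weight decay and dropout; sigmoid output."""
    reg = tf.keras.regularizers.l2(lam)            # adds lam * w_i^2 terms to the loss
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_inputs,))])
    for _ in range(n_hidden):
        model.add(tf.keras.layers.Dense(width, activation="relu", kernel_regularizer=reg))
        model.add(tf.keras.layers.Dropout(drop))   # randomly silence units while training
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
    return model

model = build_model()
model.summary()
```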
The average absolute error (AAE) was used to evaluate the prediction performance:

$$\text{AAE} = \frac{\sum_{i=1}^{N}\left|Y_{i,\text{predicted}} - Y_{i,\text{real}}\right|}{N} \quad (11.10)$$
11.5.5 Results and Discussion
Figure 11.2 exhibits the 8-fold cross-validation strategy for the deep neural network, with 440 epochs for every fold and a learning rate of 0.001. The convergence of the validation and training curves shows that the deep neural network does not overfit. Further, the value of AAE is 1.53 for the training phase and 1.54 for the validation phase.
FIGURE 11.2
The 8-fold-cross-validation strategy for deep neural network.
This shows that the average AAE is similar for the training and validation phases. The standard deviation for the training phase is 0.21; this low standard deviation shows that the deep neural network does not tend to change considerably across different training data subsets, which indicates a good forecasting ability of the deep neural network.
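A minimal sketch of the k-fold cross-validation loop with the AAE metric of Equation (11.10), on synthetic stand-in data (the chapter's data set is not reproduced here); the small network and the reduced epoch count are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

def make_net():
    """A small normalised-input regression network with dropout."""
    m = tf.keras.Sequential([
        tf.keras.Input(shape=(7,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    m.compile(optimizer=tf.keras.optimizers.Adam(0.001), loss="mse")
    return m

rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 7))                      # 7 normalised SLS parameters
y = rng.uniform(size=(120, 1))                      # normalised shrinkage ratio in [0, 1]

aaes = []
for tr, va in KFold(n_splits=8, shuffle=True, random_state=0).split(X):
    net = make_net()                                # fresh network for every fold
    net.fit(X[tr], y[tr], epochs=50, verbose=0)     # the chapter uses 440 epochs per fold
    pred = net.predict(X[va], verbose=0)
    aaes.append(float(np.abs(pred - y[va]).mean())) # AAE, Eq. (11.10)

print(f"validation AAE: {np.mean(aaes):.3f} +/- {np.std(aaes):.3f}")
```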
11.6 Conclusions
In this chapter, the quality of a product produced by selective laser sintering, in terms of minimum shrinkage ratio, was predicted from important SLS parameters such as the surrounding working temperature, laser scanning speed, layer thickness, scanning mode, hatch distance, laser power and interval time. The relationship between the SLS parameters and the contraction ratio was modelled using a deep neural network because the SLS variables are numerous and nonlinearly related. The machine learning system employed supervised deep learning, with the process parameters as input characteristics and the product quality, in terms of minimum shrinkage ratio, as the output characteristic. Weight decay and the dropout technique were employed to overcome the overfitting problem found in the deep neural network technique. The predicted output characteristic was compared with the actual one. The shrinkage ratio found by the deep neural network can be employed to derive shrinkage compensation information for the SLS process.
References
[1] Kang, H.S., J.Y. Lee, S. Choi, H. Kim, J.H. Park, and J.Y. Son et al. 2016. Smart
manufacturing: past research, present findings, and future directions. Int J Precis
Eng Manuf Green Technol, 3(January (1)):111–28.
[2] Kusiak, A. 2017. Smart manufacturing must embrace big data. Nature, 544 (April
(7648)): 23–25.
[3] Lee, J., B. Bagheri, and H.A. Kao. 2015. Cyber-physical systems architecture for
industry 4.0 based manufacturing systems. Manuf Lett, 3 (January): 18–23.
[4] Lee, J., H. Davari, J. Singh, and V. Pandhare. 2018. Industrial artificial intelligence for industry 4.0-based manufacturing systems. Manuf Lett, 18 (October): 20–23.
[5] Kumar, L. and P.K. Jain. 2010. Selection of additive manufacturing technology.
Adv Production Eng Mgmt J, 5(2): 75–84.
[6] Kumar, L. and P. K. Jain (2006). Rapid prototyping: a review, issues and problems,
International Conference on CARs & FOF,VIT, India, 1: 126–138.
[7] Kumar, L. and R. A. Khan (2004). Rapid design and manufacturing. Global
Conference on flexible System Management (GLOGIFT) March, JMI, Delhi,
1: 13–15.
[8] Kumar, L., M. Shoeb, and A. Haleem (2020). An overview of additive manufac-
turing technologies. Studies in Indian Place Names, 40(10), 441–450.
[9] Kumar, L., M. Shoeb, A. Haleem, and M. Javaid (2022). Composites in context
to Additive Manufacturing, CIMS-2020-International Conference on Industrial
and Manufacturing Systems, NIT Jalandhar, Punjab, India, Lecture Notes
on Multidisciplinary Industrial Engineering, Springer, Cham, 491–503, ISBN
978-3-030-73494-7.
[10] Kumar L. and P.K. Jain (2022). Carbon conscious and artificial immune system
optimization modeling of metal powder additive manufacturing scheduling.
In Computational Intelligence for Manufacturing Process Advancements, eds. P.
Chatterjee, D.P. Željko-Stević, S. Chakraborty, and S. Bhattacharyya, Taylor &
Francis, CRC Press (Accepted for publication by Editor).
[11] Lu, B, D. Li, and X. Tian. 2015. Development trends in additive manufacturing
and 3D printing. Engineering, 1(1):85–89.
[12] Derby, B. 2015. Additive manufacture of ceramics components by ink jet printing.
Engineering, 1(1):113–123.
[13] Gu, D., C. Ma, M. Xia, D. Dai, and Q. Shi. 2017. A multi scale understanding of
the thermodynamic and kinetic mechanisms of laser additive manufacturing.
Engineering, 3(5): 675–684.
[14] Herzog, D., V. Seyda, E. Wycisk, and C. Emmelmann. 2016. Additive manufac-
turing of metals. Acta Mater, 117(15): 371–392.
[15] Liu, L., Q. Ding, Y. Zhong, J. Zou, J. Wu, and Y.L. Chiu et al. 2018. Dislocation network in additive manufactured steel breaks strength–ductility trade-off. Mater Today, 21(4): 354–361.
[16] Gorsse, S., C. Hutchinson, M. Gouné, and R. Banerjee. 2017. Additive manu-
facturing of metals: a brief review of the characteristic microstructures and
properties of steels, Ti–6Al–4V and high-entropy alloys. Sci Technol Adv Mater,
18(1): 584–610.
DL in Prediction of Selective Laser Sintering Part Quality 191
[17] Santos, E.C., M. Shiomi, K. Osakada, and T. Laoui. 2006. Rapid manufac-
turing of metal components by laser forming. Int J Mach Tools Manuf, 46(12–
13): 1459–1468.
[18] Chen, X., C. Wang, X. Ye, Y. Xiao, and S. Huang. 2001. Direct slicing from power
SHAPE models for rapid prototyping. Int J Adv Manuf Technol, 17(7): 543–547
doi:10.1007/s001700170156
[19] Li, X.S., M. Han, and Y.S. Shi. 2001. Model of shrinking and curl distortion for
SLS prototypes. Chin J Mech Eng, 12(8): 887–889.
[20] John, D.W. and R.D. Carl. 1998. Advances in modeling the effects of selected
parameters on the SLS process. Rapid Prototyping J, 4 (2): 90–96. doi:10.1108/
13552549810210257
[21] Yang, H.J., P.J. Huang, and S.H. Lee. 2002. A study on shrinkage compensation of
SLS process by using the Taguchi method. Int J Mach Tools Manuf, 42(10): 1203–
1212. doi:10.1016/S0890-6955 (02)00070-6
[22] Masood, S.H., W. Ratanaway, and P. Iovenitti. 2003. A genetic algorithm for best
part orientation system for complex parts in rapid prototyping. J Mater Process
Technol, 139(3):110–116. doi:10.1016/S0924-0136(03)00190-0
[23] Bai, P.K., J. Cheng, B. Liu, and W.F. Wang. 2006. Numerical simulation of
temperature field during selective laser sintering of polymer coated molyb-
denum powder. Trans Nonferrous Met Soc China, 16 (3):603–607. doi:10.1016/
S1003-6326(06)60264-1
[24] Arni, R.K. and S.K. Gupta. 1999. Manufacturability analysis for solid freeform
fabrication. In: Proceedings of DETC 1999 ASME Design Engineering Technical
conference, Vegas, NV, 1–12.
[25] Armillotta, A. and G.F. Biggioggero. 2001. Control of prototyping surface finish
through graphical simulation. In: Proceedings of the 7th ADM International
conferences, Grand Hotel, Rimini, Italy, 17–24.
[26] Shi, Y., J. Liu, and S. Huang. 2002. The research of the SLS process optimization
based on the hybrid of neural network and expert system. In: Proceedings of the
International Conference on Manufacturing Automation, 409–418.
[27] Zheng, H.Z., J. Zhang, S.Q. Lu, G.H. Wang, and Z.F. Xu. 2006. Effect of core–
shell composite particles on the sintering behavior and properties of nano-
Al2O3/polystyrene composite prepared by SLS. Mater Lett, 60(9–10): 1219–1223.
doi:10.1016/j.matlet. 2005.11.003
[28] Kohavi, R. and F. Provost. 1998. Glossary of terms. Mach Learn, 30(2–
3): 271–274.
[29] Géron, A. 2017. Hands-on Machine Learning with ScikitLearn and Tensor Flow:
Concepts, Tools, and Techniques to Build Intelligent Systems. Boston, MA: O’Reilly
Media Inc.
[30] Devlin, J., M. Chang, K. Lee, and K. Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv, 1810.04805.
[31] Anusuya, M.A. and S.K. Katti. 2010. Speech recognition by machine-a review.
arXiv, 1001–2267.
[32] Krizhevsky, A., I. Sutskever, and G.E. Hinton. 2012. ImageNet classification
with deep convolutional neural networks. In: F. Pereira, C.J.C. Burges, L. Bottou,
and K.Q. Weinberger, eds. Advances in Neural Information Processing Systems
25, Proceedings of Neural Information Processing Systems, 2012, December 3–6,
Lake Tahoe, NV, 1097–1105.
192 Machine Learning Algorithms and Applications in Engineering
[33] Ondruska, P. and I. Posner. 2016. Deep tracking: seeing beyond seeing using
recurrent neural networks, arXiv, 1602.00991.
[34] Mehrotra, P., J.E. Quaicoe, and R. Venkatesan. 1996. Speed estimation of induc-
tion motor using artificial neural networks. IEEE Trans Neural Netw, 6: 881–886.
[35] Hornik, K., M. Stinchcombe, and H. White.1989. Multilayer feed forward
networks are universal approximators. Neural Netw, 2(5): 359–366. doi:10.1016/
0893-6080(89)90020-8
[36] Lecun, Y., Y. Bengio, and G. Hinton. 2015. Deep learning. Nature, 521 (May
(7553)): 436–44.
[37] Hinton, G.E. and R.R. Salakhutdinov. 2006. Reducing the dimensionality of data
with neural networks. Science, 313 (July (5786)): 504–507.
[38] Hinton, G.E., S. Osindero, and Y.W. Teh. 2006. A fast learning algorithm for deep
belief nets. Neural Comput, 18(7): 1527–1554.
[39] Hirschberg, J. and C.D. Manning. 2015. Advances in natural language pro-
cessing. Science, 349 (July (6245)): 261–266.
[40] Chan, W., N. Jaitly, Q. Le, and O. Vinyals. 2016. Listen, attend and spell: neural
network for large vocabulary conversational speech recognition. Proc. IEEE-ICASSP, 4960–4964.
[41] Liu, W., Z. Wang, X. Liu, N. Zeng, Y. Liu, and F.E. Alsaadi. 2017. A survey of
deep neural network architectures and their applications. Neuro Computing,
234(April): 11–26.
[42] Mamoshina, P., A. Vieira, E. Putin, and A. Zhavoronkov. 2016. Applications of
deep learning in biomedicine. Mol Pharm, 13(May (5)):1445–1454.
[43] Chen, C., A. Seff, A. Kornhauser, and J. Xiao. 2015. DeepDriving: learning
affordance for direct perception in autonomous driving. Proc. IEEE-ICCV,
2722–2730.
[44] Covington, P., J. Adams, and E. Sargin. 2016. Deep neural networks for YouTube
recommendations. Proc. ACM—RecSys, 191–198.
[45] He, K., X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image rec-
ognition. Proc. IEEE-CVPR, 770–778.
[46] Sze, V., Y.H. Chen, T.J. Yang, and J.S. Emer. 2017. Efficient processing of deep neural
networks: a tutorial and survey. Proc. IEEE, 105 (December (12)): 2295–2329.
[47] Rong-Ji, W., L. Xin-hua, W. Qing-ding, and L. Lingling (2009). Optimizing pro-
cess parameters for selective laser sintering based on neural network and gen-
etic algorithm. Int J Adv Manuf Technol, 42: 1035–1042.
[48] Abadi, M., P. Barham, J. Chen, Z. Chen, A. Davis, and J. Dean et al. 2016.
TensorFlow: a system for large-scale machine learning. 12th USENIX Symp.
Oper. Syst. Des. Implement. (OSDI’ 16). https://doi.org/10.1038/nn.3331
[49] Zelle, J. 2010. Python programming: an introduction to computer science.
https:// doi.org/10.2307/2529413
[50] Glorot, X., A. Bordes, and Y. Bengio. 2011. Deep sparse rectifier neural
networks. AISTATS’ 11 Proc 14th Int Conf Artif Intell Stat, 15: 315– 23.
doi:10.1.1.208.6449
[51] LeCun, Y.A., Y. Bengio, and G.E. Hinton. 2015. Deep learning. Nature, 521: 436–444. https://doi.org/10.1038/nature14539
[52] Kingma, D.P. and J.L. Ba. 2017. Adam: a method for stochastic optimization. Int Conf Learn Represent. arXiv, 1412.6980.
12
CBPP: An Efficient Algorithm for Privacy-
Preserving Data Publishing of 1:M Micro
Data with Multiple Sensitive Attributes
CONTENTS
12.1 Introduction.............................................................................................. 196
12.2 Related Works........................................................................................... 197
12.3 Contributions............................................................................................ 200
12.4 Correlation Attacks and Those Scenarios............................................. 200
12.5 Implementation of the Proposed CBPP Algorithm............................. 202
12.5.1 Generalization and Allocation of Batch Id............................. 204
12.6 CBPP Algorithm....................................................................................... 205
12.7 Evaluation................................................................................................. 207
12.8 Measurement of Information Loss........................................................ 207
12.8.1 Query Accuracy......................................................................... 207
12.9 Conclusion and Future Directions......................................................... 208
12.1 Introduction
Preserving individuals' privacy is an essential concern during and after data is shared with third parties. In the digital era, data-owning organizations are increasingly uneasy about data privacy. With the rapid growth in data generation, masking data without disclosing an individual's sensitive information has become a challenging task. Organizations such as health sectors, pharmaceutical agencies, and government sectors often share their data with researchers and third parties for various analyses. The data publisher therefore needs to take responsibility for preserving privacy during data publishing. The data publisher [1] must be trustworthy and must have complete knowledge of the relevant privacy laws and regulations before disclosing the data to the data recipients. The data publisher is responsible for both the privacy and the utility of the data: the data must be properly anonymized so that the sensitive information of the original data remains unknown to the recipient, while the utility of the original data is preserved so that the information needed can be acquired correctly. The publisher needs to ensure that proper models and techniques have been applied to the original data before disclosure, leaving no clue to sensitive information. Two types of privacy are of major concern before publishing data: (i) individual privacy and (ii) collective privacy. Revealing explicit identifiers such as name and id can directly breach a particular individual's privacy; to protect personal privacy, the direct identifiers should be removed and the sensitive attributes related to the individual need to be made private and anonymized. Safeguarding individual privacy alone may not be adequate, since learning an individual's sensitive information may also allow inferring information about a group of individuals. Therefore, sensitive knowledge about the data set also needs to be preserved; preserving sensitive knowledge inferred from the data set is termed collective privacy preservation [2]. The data set consists of two kinds of attributes: (i) sensitive attributes and (ii) non-sensitive attributes [3]. Distinctive care needs to be given to sensitive attributes, as they contain sensitive information about an individual; these attributes should not be disclosed to third parties. Non-sensitive attributes in the dataset are published for the purpose of analysis and are collectively termed the quasi-identifier. The data publisher discloses the non-sensitive attributes to the third party; however, the non-sensitive attributes can collectively reveal personal information about an individual if they are linked with other external sources. Data anonymization should therefore be carried out properly by adopting suitable privacy methods and models, and the privacy-preserving models and methods implemented on the data set should balance privacy and utility. The basic notion of privacy-preserved data publishing is illustrated in Figure 12.1.
FIGURE 12.1
Privacy-preserved data publishing.
12.2 Related Works
Earlier research concentrated on privacy-preserving data publishing for 1:1 datasets. In the real world, however, an individual might have multiple records with different sensitive attributes, called a 1:M dataset. Consider an individual who has cancer: the disease comes with many side effects such as weight loss, vomiting, and hair loss, and the same individual visits different doctors in the hospital for treatment. Each time he visits the hospital, a record is registered in the hospital database. Such a scenario leads to a 1:M dataset. When 1:1 privacy models are implemented on 1:M datasets, various privacy breaches arise. Several techniques have been proposed for 1:1 datasets, such as slicing [5], Mondrian [6], suppression [7], clustering, and multi-sensitive bucketization [8].
Another method, (k, km)-anonymity [9], was proposed, which divided the attributes into relational and transactional attributes. Its limitation is that if the information loss for the relational attributes is minimized, the information loss for the transactional attributes increases, and vice versa. An efficient approach, (p, k)-angelization, was proposed for the anonymization of multiple sensitive attributes and provides an optimal balance between privacy and utility [10]. A novel method called overlapped slicing was proposed for privacy-preserving data publishing with multiple sensitive attributes and has proven to provide better utility [11].
Anatomization prevents the generalization of quasi-identifiers; thus, the information loss is significantly less and higher privacy results. Moreover, the anatomization method is efficient due to the publishing of multiple
Record | Name | Pid | Sex | Age | Zipcode | Disease | Treatment | Symptom | Doctor | Diagnostic Method
(Name and Pid are explicit identifiers; Sex, Age, and Zipcode are quasi-identifiers; the remaining columns are sensitive attributes.)
tp1 | Avan | 1 | * | 20-30 | 142** | HIV | ART | Infection | John | Elisa Test
tp2 | Avan | 1 | * | 20-30 | 142** | Influenza | Medicine | Fever | Alice | RITD Test
tp3 | Avan | 1 | M | 2* | 142** | Dyspepsia | Antibiotics | Abdominal Pain | Victor | Ultrasound
tp4 | Becon | 2 | M | 2* | 142** | Lung Cancer | Radiation | Weight Loss | Alice | MRI Scan
tp5 | Becon | 2 | M,F | 26 | 1420* | Influenza | Medicine | Fever | Alice | RITD Test
tp6 | Canty | 3 | M,F | 28 | 1420* | HIV | ART | Weight Loss | John | Elisa Test
tp7 | Denny | 4 | M | 25-45 | 14249 | Abdominal Cancer | Chemotherapy | Abdominal Pain | Bob | Chest Xray
tp8 | Emy | 5 | F | 25-45 | 13084 | Covid19 | Antibiotics | Fever | Dave | RT-PCR Test
tp9 | Emy | 5 | M,F | 24-45 | 13084 | Asthma | Medication | Chest Tightness | Alice | Methacholine Challenge Test
tp10 | Frank | 6 | M,F | 24-45 | 13064 | Asthma | Medication | Shortness of Breath | Suzan | Methacholine Challenge Test
tp11 | Lisa | 7 | M,F | 24-45 | 13318 | Lupus | Medicine | Joint Pain | Jane | ANA Test
tp12 | Lisa | 7 | F | 2* | 1**** | Myocarditis | Medicine | Abnormal Heartbeat | Patrick | ECG
tp13 | Ram | 8 | M | 2* | 1**** | Asthma | Medication | Shortness of Breath | Suzan | Methacholine Challenge Test
tp14 | Ram | 8 | M | 2* | 1**** | Obesity | Nutrition Control | Eating Disorders | Sana | Body Mass Index
anonymize the sensitive attributes. The f-slip model thwarts five correlation attacks: (i) background knowledge attack, (ii) multiple sensitive correlation attack, (iii) quasi-identifier correlation attack, (iv) non-membership correlation attack, and (v) membership correlation attack [27]. As per the literature, the commonly used 1:M datasets are INFORMS and YouTube [25,26,27].
The chapter is organized as follows: Section 12.3 discusses the contributions of the work. Section 12.4 validates the various correlation attacks with scenarios. Section 12.5 discusses the implementation of the CBPP algorithm, and Section 12.6 presents the CBPP algorithm itself. Sections 12.7 and 12.8 describe the evaluation and the measurement of information loss, validating the effectiveness of the CBPP algorithm. Section 12.9 concludes the work with future directions.
12.3 Contributions
Privacy-preserving data publishing models should protect microdata with high privacy and low information loss. Though various algorithms have been proposed to balance privacy and utility, the challenge remains unsolved. The significant contributions of the work are as follows:
(1) Attacks using background knowledge correlation: The intruder can infer the sensitive attributes of an individual if he possesses significant background knowledge about the individual. Case scenario 1 explains background knowledge correlation attacks.
TABLE 12.2
Anonymity

Record | Name | Pid | Sex | Age | Zip code | Disease | Treatment | Symptom | Doctor | Diagnostic Method
(Name and Pid are explicit identifiers; Sex, Age, and Zip code are quasi-identifiers; the remaining columns are sensitive attributes.)
tp1 | Avan | 1 | M | 27 | 14248 | HIV | ART | Infection | John | Elisa Test
tp2 | Avan | 1 | M | 27 | 14248 | Influenza | Medicine | Fever | Alice | RITD Test
tp3 | Avan | 1 | M | 27 | 14248 | Dyspepsia | Antibiotics | Abdominal Pain | Victor | Ultrasound
tp4 | Becon | 2 | M | 26 | 14206 | Lung Cancer | Radiation | Weight Loss | Alice | MRI Scan
tp5 | Becon | 2 | M | 26 | 14206 | Influenza | Medicine | Fever | Alice | RITD Test
tp6 | Canty | 3 | F | 28 | 14207 | HIV | ART | Weight Loss | John | Elisa Test
tp7 | Denny | 4 | M | 25 | 14249 | Abdominal Cancer | Chemotherapy | Abdominal Pain | Bob | Chest Xray
tp8 | Emy | 5 | F | 44 | 13084 | Covid19 | Antibiotics | Fever | Dave | RT-PCR Test
tp9 | Emy | 5 | F | 44 | 13084 | Asthma | Medication | Chest Tightness | Alice | Methacholine Challenge Test
tp10 | Frank | 6 | M | 45 | 13064 | Asthma | Medication | Shortness of Breath | Suzan | Methacholine Challenge Test
tp11 | Lisa | 7 | F | 24 | 13318 | Lupus | Medicine | Joint Pain | Jane | ANA Test
tp12 | Lisa | 7 | F | 24 | 13318 | Myocarditis | Medicine | Abnormal Heartbeat | Patrick | ECG
tp13 | Ram | 8 | M | 22 | 14421 | Asthma | Medication | Shortness of Breath | Suzan | Methacholine Challenge Test
tp14 | Ram | 8 | M | 22 | 14421 | Obesity | Nutrition Control | Eating Disorders | Sana | Body Mass Index
(2) Attacks using quasi-identifier correlation: The intruder can perform a quasi-identifier correlation attack by correlating quasi-identifier values such as age, zip code, and gender to infer an individual's sensitive attribute values and complete information. Case scenarios 1 and 2 explain quasi-identifier correlation attacks.
(3) Attacks using non-membership correlation: The intruder can perform a non-membership correlation attack if he can infer the non-existence of an individual from the data set. Case scenarios 1, 2, and 3 explain non-membership correlation attacks.
Scenario 1: If the intruder possesses basic information about an individual, he can gather sensitive information about that particular individual. If the intruder knows that Emy is a female, age > 40, from zip code 13084, he can easily infer that Emy falls into either equivalence class 4 or 5 of Table 12.2. If the intruder also has the strong background knowledge that Emy often suffers from breathlessness, the probability of inferring Emy's record from equivalence classes 4 and 5 is high.
Scenario 2: If the intruder possesses background knowledge about the individual as well as quasi-identifier information, he can easily link the sensitive attributes of the individual using the quasi-identifiers. If the intruder knows that Becon is a male, age > 25, zip code 14206, then the intruder can easily infer that Becon falls into equivalence classes 2 and 3 of Table 12.2. With strong background knowledge, he can quickly identify Becon's sensitive attribute values.
Scenario 3: If the intruder has strong background knowledge and possesses quasi-identifier information about the individual, that is, Avan is a male, age < 30, zip code 14248, and Avan is highly infected with a deadly disease, then he can easily infer that the records of Avan fall into equivalence classes 1 and 2 of Table 12.2. The existence of the individual Avan is thus identified, which leads to privacy breaches.
Since traditional models and algorithms cannot thwart such privacy breaches when applied to a 1:M dataset, an efficient CBPP algorithm is proposed to resist background knowledge attacks, non-membership correlation attacks, and quasi-identifier correlation attacks. The CBPP algorithm partitions the original data set into two tables, the quasi-identifier table and the sensitive attribute table, linked together by a batch id, as explained below.
Consider the original sample dataset Ts in Table 12.1. When the CBPP algorithm is applied, it converts the original data set into two tables. The original dataset Ts contains multiple sensitive attributes for a particular individual, who shares the same quasi-identifiers such as age, zip code, and gender. The patient id is used only for reference and is removed during the publishing of data. Records having the same patient id are merged; for example, the first three records of Table 12.1 with patient id 1 are merged (i.e., tp1 ∪ tp2 ∪ tp3). After merging the records of each individual, an aggregated table Tsa is formed. Anatomization is then performed on the aggregated dataset Tsa, partitioning it into two tables: (1) the quasi-identifier table Tsq and (2) the sensitive attribute table Tss. Let qd denote the quasi-identifier values of table Tsq, with the set of quasi-identifier attributes (qd1, qd2, qd3, ..., qdn); the quasi-identifier table Tsq comprises age, sex, and zip code. Let sd denote the sensitive attribute values of Tsa, with the set of sensitive attributes (sd1, sd2, sd3, ..., sdn); the sensitive attributes are disease, treatment, symptom, doctor, and diagnostic method. The quasi-identifier attribute values are generalized and a batch id is allotted, as shown in Table 12.3. The sensitive attribute table is formed according to the batch id, as shown in Table 12.4. During the anonymization process the data gets shuffled; thus a batch id, written bid, is allocated to link the records in the quasi-identifier and sensitive attribute tables.
TABLE 12.3
Quasi-Identifier Table Tsq

Sex | Age | Zip Code | Batch Id
M | 25-45 | 13000-15000 | 2
M | 25-45 | 13000-15000 | 2
F | 25-45 | 13000-15000 | 2
M | 25-45 | 13000-15000 | 2
F | 41-60 | 13000-15000 | 3
M | 41-60 | 13000-15000 | 3
F | 15-24 | 13000-15000 | 1
M | 15-24 | 13000-15000 | 1
TABLE 12.4
Sensitive Attribute Batch Table Tss

Disease | Treatment | Symptom | Physician | Diagnosis Method | Batch Id
FIGURE 12.2
Architectural diagram of the CBPP algorithm.
12.6 CBPP Algorithm
The algorithm is divided into two parts, each elaborated below. In Algorithm 1, the original data set is passed as an input argument in line 1. In line 2, the multiple records of an individual are merged using the group-by function so that each individual has only one entry. In line 3, anatomization is performed to partition the original table Ts into two tables: (a) the quasi-identifier table (Tsq) and (b) the sensitive attribute table (Tss). In line 4, the quasi-identifiers are identified as sex, age, and zip code. New lists for age and zip code are created in lines 5 and 6, and in line 7 a new list is created for the batch id. If the age values range from 15 to 60, the optimal number of distributions is 3, that is, 15–24, 25–44, and 45–60. In lines 8 to 10, the quasi-identifier age is generalized; for example, if a person's age is 23, the value 15–24 is appended to the list. Similarly, in lines 11–13, distributions are formed for the quasi-identifier zip code, and the values are appended to the zip code list in lines 14 and 15. In the sample data set of this work, the zip code ranges from 13000 to 15000, so in Table 12.3 the zip code distribution is 13000–15000 for all individuals. According to the algorithm, the total number of possible batch id combinations is 2*3*3 + 3 = 21, whereas for the original sample data set of this work the total number of possible batch id combinations is 2*3*1 + 3 = 9.
In Algorithm 2, the sensitive attributes are arranged according to the batch id and the quasi-identifiers that have been used. In line 1, the anatomized sensitive attribute table is passed as an argument. In lines 2 to 4, three lists are created for each sensitive attribute; specifically, three lists are created because the number of distributions made in the quasi-identifiers is 3, and those quasi-identifier distributions in turn give rise to the batch ids. From line 5, the values are iterated according to age, and each value is sent to the list corresponding to its batch id. A similar procedure is performed for all the sensitive attributes, and the sensitive values are clubbed together according to batch id.
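Algorithm 2 can likewise be pictured, again only as a hedged sketch rather than the authors' listing, as routing every sensitive value into one of three per-batch lists and shuffling within each batch:

    import random

    def cbpp_algorithm2(tss, n_batches=3):
        sensitive_cols = [c for c in tss.columns if c != "bid"]
        # Lines 2-4: one list per batch id for every sensitive attribute
        batches = {b: {c: [] for c in sensitive_cols} for b in range(1, n_batches + 1)}
        # Line 5 onward: each value goes to the list matching its batch id
        for _, row in tss.iterrows():
            for c in sensitive_cols:
                batches[row["bid"]][c].append(row[c])
        # Shuffle within each batch so positions cannot re-link a record
        for b in batches:
            for c in sensitive_cols:
                random.shuffle(batches[b][c])
        return batches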
12.7 Evaluation
The CBPP algorithm is implemented in Python, and the experiments are conducted on a machine running Windows 10 with 8 GB RAM, 1 TB storage, and a 128 GB SSD. The CBPP algorithm is applied to the real-world 1:M dataset INFORMS, which contains 230,231 records; after grouping of records and removal of duplicates, the size of the dataset is 40,126. The birth year, month, sex, and race are chosen as quasi-identifiers, and education year, income, and poverty line are taken as sensitive attributes. The information loss is measured using query accuracy.
Query Error = (∑(QI) − ∑(Org)) / count(Org)    (12.1)
The query error compares COUNT queries executed on the anonymized micro-table data set and on the original data set to measure information loss. In this case, the total number of possible batch id combinations is 9, of which only one particular combination is the key that links the quasi-identifier table and the sensitive batch table. Hence, the batch id is the first line of protection: the probability that the attacker successfully finds the correct combination is 1/9 ≈ 0.11, which is very low. The number of batch id combinations can be increased by making more distributions in age and sex, further decreasing the probability of identifying the batch id combination. Figure 12.3 clearly shows that information loss is inversely proportional to the number of values; therefore, information loss decreases as the number of records increases. In k-anonymity, by contrast, information loss increases as the dimensionality increases [18], as shown in Figure 12.3. The CBPP algorithm thus provides better utility when compared with the traditional method, k-anonymity.
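As a rough sketch of how equation (12.1) and the batch-id probability can be evaluated (the predicate interface and DataFrame layout are illustrative assumptions, not the chapter's code):

    def query_error(published, original, predicate):
        # COUNT query on the anonymized micro-table versus the original data set
        qi = len(published[predicate(published)])
        org = len(original[predicate(original)])
        return abs(qi - org) / len(original)

    # Probability of guessing the batch-id combination linking the two tables
    combinations = 2 * 3 * 1 + 3   # = 9 for the sample data set
    p_crack = 1 / combinations     # about 0.11, the first line of protection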
Consider that, in some cases, the attacker gets through the batch id. The intruder then still has to form the proper combinations of all the shuffled sensitive values to obtain the complete details about the individuals. The relation between the number of distributions and the probability of cracking the combination is given in Figure 12.4.

FIGURE 12.3
Information loss.

FIGURE 12.4
Probability of identification of quasi-identifier combination versus number of distributions.
References
[1] Rashid A. and Yasin N., Privacy preserving data publishing: review. International
Journal of Physical Sciences, 10(7), 239 (2015).
[2] Sheppard, E., Data Privacy is a collective concern, 2020. https://medium.com/
opendatacharter/data-privacy-is-a-collective-concern-8ebad29b25ce
[3] Vasudevan L., Sukanya S., and Aarthi N., Privacy preserving data mining
using cryptographic role based access control approach, Proceedings of the
International MultiConference of Engineers and Computer Scientists, Hong
Kong, March 19–21, 2008, p. 474.
[4] Churi, P.P. and Pawar, A.V., A systematic review on privacy preserving data publishing techniques, Journal of Engineering Science and Technology Review, 12(6), 17–25 (2019).
[5] Li, T., Li, N., Zhang, J., and Molloy, I. Slicing: a new approach for privacy-
preserving data publishing, IEEE Transactions on Knowledge and Data Engineering,
24(3), 561–574 (2010).
[6] Wantong Zheng, Zhongye Wang, Tongtong Lv, Yong Ma, and Chunfu Jia,
K-anonymity algorithm based on improved clustering, International Conference
on Algorithms and Architectures for Parallel Processing, 11335, 426–476 (2018).
[7] Elanshekhar, N. and Shedge, R. An effective anonymization technique of big
data using suppression slicing method, International Conference on Energy,
Communication, Data Analytics and Soft Computing (ICECDS), 2500– 2504
(2017).
[8] Radha, D. and Valli Kumari Vatsavayi, Bucketize. Protecting privacy on multiple
numerical sensitive attribute, Advances in Computational Sciences and Technology,
10(5), 991–1008 (2017).
[9] Puri, V., Sachdeva, S., and Parmeet Kaur, Privacy preserving publication of
relational and transaction data: Survey on the anonymization of patient data,
Computer Science Review, 32, 45–61 (2019).
[10] Anjum, A., Ahmad, N., Malik, U.R., Zubair, S., and Shahzad, B., An efficient
approach for publishing micro data for multiple sensitive attributes, The Journal
of Supercomputing, 74, 5127–5155 (2018).
[11] Widodo, Budiardjo, E.K., and Wibowo, W.C. Privacy preserving data publishing
with multiple sensitive attributes based on overlapped slicing. Information, 10,
362 (2019).
[12] Lin Yao, Zhenyu Chen, Xin Wang, Dong Liu, and Guowei Wu, Sensitive label privacy preservation with anatomization for data publishing, IEEE Transactions on Dependable and Secure Computing, 18(2), 904–917 (2019).
[13] Susan, V.S. and Christopher, T. Anatomisation with slicing: a new privacy pres-
ervation approach for multiple sensitive attributes. SpringerPlus 5, 964 (2016).
[14] Jayapradha, J., Prakash, M., and Harshavardhan Reddy, Y., Privacy preserving
data publishing for heterogeneous multiple sensitive attribute with personalized
privacy and enhanced utility, Systematic Reviews of Pharmacy, 11(9), 1055–1066
(2020).
[15] Yuelei Xiao and Haiqi Li, Privacy preserving data publishing for multiple sensi-
tive attributes based on security level, Information, MDPI, 11, 1–27 (2020).
[16] Das, D. and Bhattacharyya, D.K., Decomposition: improving l-diversity for mul-
tiple sensitive attributes, International Conference on Computer Science and
Information Technology 403–412 (2012).
[17] Yang Ye, Liu Yu, Chi Wang, Depang Lv, and Jianhua Feng, Decomposition: privacy
preservation for multiple sensitive attributes. International Conference on
Database Systems for Advanced Applications, 486–490 (2009).
[18] Kiruthika, S. and Mohamed Raseen, M., Enhanced slicing models for preserving
privacy in data publication, International Conference on Current Trends in
Engineering and Technology, IEEE, 1–8 (2013).
[19] Khan, R., Tao, X., Anjum, A., Sajjad, H., Khan, A., and Amiri, F. Privacy pre-
serving for multiple sensitive attributes against fingerprint correlation attack
satisfying c-diversity. Wireless Communications and Mobile Computing, 1–18
(2020).
[20] Bennati, S. and Kovacevic, A., Privacy metric for trajectory data based on k-anonymity, l-diversity and t-closeness (2020). https://arxiv.org/pdf/2011.09218v1.pdf
[21] Pika, A., Wynn, M.T., Budiono, S., ter Hofstede, A.H.M., van der Aalst, W.M.P., and Reijers, H.A., Privacy-preserving process mining in healthcare, International Journal of Environmental Research and Public Health, 17(5), 1612 (2020).
[22] Aggarwal, C.C., On k-anonymity and the curse of dimensionality, 31st International Conference on Very Large Data Bases, 901–909 (2005).
[23] Bild, R., Kuhn, K.A., and Prasser, F., Better safe than sorry – implementing reliable health data anonymization, Studies in Health Technology and Informatics, 270, 68–72 (2020).
[24] Wang, R., Zhu, Y., Chen, T., and Chang, C. Privacy-preserving algorithms for
multiple sensitive attributes satisfying t-closeness. Journal of Computer Science
Technology 33, 1231–1242 (2018).
[25] Gong, Qiyuan, Junzhou Luo, Ming Yang, Weiwei Ni, and Xiao-Bai Li, Anonymizing 1:M microdata with high utility, Knowledge-Based Systems, 1–12 (2016).
13
Classification of Network Traffic on
ISP Link and Analysis of Network
Bandwidth during COVID-19
CONTENTS
13.1 Introduction.............................................................................................. 214
13.2 Methodology............................................................................................. 214
13.2.1 Network Topology..................................................................... 217
13.2.2 Data Cleaning and Preprocessing........................................... 218
13.2.3 Data Visualization with Tableau.............................................. 218
13.3 Traffic Classification Using Classification Algorithms....................... 220
13.3.1 Comparison of Classifiers......................................................... 222
13.4 Bandwidth Requirement Prediction Using Time Series Model
(ARIMA).................................................................................................... 223
13.4.1 Feature Extraction Selection..................................................... 223
13.4.2 Auto-Regressive Integrated Moving Average Model
(ARIMA)..................................................................................... 224
13.4.3 Time Series Data: Trends and Seasonality.............................. 225
13.4.4 Time Series Data: Seasonal Patterns........................................ 227
13.5 Implementing the ARIMA Model.......................................................... 230
13.5.1 Building ARIMA Model............................................................ 231
13.5.2 Printing the Forecasted Values of Bandwidth....................... 232
13.5.3 Model Evaluation...................................................................... 234
13.6 Conclusion: Business Benefits of the Models....................................... 234
13.6.1 Network Traffic Classification................................................. 234
13.6.2 Time Series Prediction............................................................... 235
13.1 Introduction
Nowadays, ISP networks are more complex than before: the number of applications running on clients and servers keeps increasing, leading to a network resource management problem. The IT professionals tasked with maintaining the network face serious challenges in determining which applications consume the resources or degrade network performance. Analyzing the performance metrics collected from clients and servers using traditional methods is no longer enough to decide correctly whether an application consumes excessive network resources. Improving network performance is not an easy task, and even a network bandwidth upgrade might not be an optimal solution to the problem of high network utilization.
In this study, performance metrics from a real ISP network are collected to take advantage of machine learning, business intelligence, and data analytics, and to compare and show the benefits of using those techniques in analyzing network performance metrics. The cleaned performance data is analyzed to find patterns and correlations that give a better understanding of application and network performance. The data set is then used to apply machine learning algorithms that predict future network performance under certain conditions. Implementing these techniques will cut the cost of running a network, reduce investigation time whenever a problem occurs, and make IT professionals' lives much easier.
Due to the COVID-19 situation, almost the entire workforce across organizations and various businesses was working from home or remotely. This resulted in heavy utilization of internet bandwidth on the ISP side. As a result, it is critical for ISPs to understand and classify the many types of network traffic that pass through their network. Internet service providers can use this classification to successfully control the network performance of their internet links and deliver a high-quality service to their consumers, leading to high customer satisfaction.
Network traffic classification is also significant for network security and for intrusion detection, QoS, and other features. Forecasting internet bandwidth demand would also greatly assist ISPs in efficiently planning their network resources. ISPs are interested in centralized measurements and in detecting problems with specific customers before they complain about the difficulties and, if possible, before the consumers discover the problems at all.
13.2 Methodology
An experiment is conducted to ascertain the effects of applications on resource consumption in the network by extracting performance metrics. The data is analyzed using machine learning and data analytics techniques, in addition to finding any correlations that exist between those metrics. The performance metrics are extracted from the client as well as the server; the clients access files or web applications over the internet links.
The methodology is outlined as follows:
• Data collection
• Parsing the data—data cleaning, scaling, and normalization as required
• Generating graphs—to find trends and patterns in utilizations, find
correlations
• Data modelling using ML techniques to predict the class of network
traffic
• Predictive modelling to forecast the bandwidth requirement
• Analysis technique: machine learning classifiers.
Data Collection Method: This is the first and most crucial step, which comprises data collection; the real-time network traffic is recorded in this step. There are several programs for capturing network traffic; the tcpdump utility, for instance, can capture real-time network information. Here, the SolarWinds packet capture and analysis program is utilized to gather network traffic. Application traffic such as WWW, DNS, FTP, P2P, and Telnet is recorded. Random network connection details were captured for the duration of one year, as shown in Figure 13.1.
FIGURE 13.1
Random network connections captured from mid-July 2019 to mid-July 2020.
FIGURE 13.2
Network traffic data is collected for one of the links from the ISP.
Network traffic data is collected for the defined duration for one of the links from the ISP, as shown in Figure 13.2. This is done by masking client-specific information to avoid compliance issues.
Data Set Information: The following attributes are available in the data set for analysis.

Variable | Description
Peak TX Mbps | Peak transmit speed on the link on that day, in Mbps
Peak RX Mbps | Peak receive speed on the link on that day, in Mbps
Max_users | Maximum number of user connections on that day
Bandwidth | Bandwidth utilized on the link in a day
13.2.1 Network Topology
Network and application performance is measured through several performance metrics such as bandwidth, throughput, disk time, number of packets sent/received per second, and number of bytes sent/received per second. The network used in the experiment has several tools to collect network and application metrics, as in Figure 13.3. The metrics are collected on the hosts and on the communication link between the two end points.
The performance metrics, categorized by the tool used to collect them, are as follows: flow-level metrics (Flow ID, Source IP, Destination IP, Source port, Destination port, Protocol, Protocol type, L7_protocol, Application name) and link-level metrics (Bandwidth, Transfer rate, Receive rate, Max_Users).

FIGURE 13.3
Network performance metric tools.
FIGURE 13.5
Traffic increased almost exponentially from March 2020.
FIGURE 13.6
Increased internet users from April 1, 2020.
FIGURE 13.7
High increase during lockdown period.
FIGURE 13.8
Network traffic classification model.
13.3.1 Comparison of Classifiers
After the various machine learning algorithms have been implemented, the simulation tool provides precise results regarding the applied methods, such as accuracy, training time, and recall. The four different classifiers used in this exercise are KNN, decision tree, random forest, and naïve Bayes. When compared with the other algorithms, the random forest algorithm produces extremely accurate results. The accuracy results of applying these machine learning techniques are compared in the chart in Figure 13.9.
A classification accuracy of 69 percent is achieved with the naïve Bayes algorithm. With k=4, a classification accuracy of 89 percent is achieved with the KNN algorithm; reducing k from 4 to 3 and rerunning increases the accuracy to 91 percent. The decision tree gives an improved accuracy of 92.89 percent compared with KNN. Random forest gives the highest accuracy, 99.9 percent, among all the algorithms attempted in classifying the network traffic.
FIGURE 13.9
Accuracy results of the applied machine learning techniques.
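The chapter's classifier code is not included in this extract; a minimal scikit-learn sketch of such a comparison might look as follows, where the synthetic data merely stands in for the captured flow features and application labels:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Stand-in for flow features (ports, protocol, rates) and application classes
    X, y = make_classification(n_samples=5000, n_features=10, n_informative=6,
                               n_classes=5, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        random_state=42)

    classifiers = {
        "KNN (k=4)": KNeighborsClassifier(n_neighbors=4),
        "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
        "Decision tree": DecisionTreeClassifier(random_state=42),
        "Random forest": RandomForestClassifier(random_state=42),
        "Naive Bayes": GaussianNB(),
    }
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        print(name, round(accuracy_score(y_test, clf.predict(X_test)), 4))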
There are no null values, as the data is a TCP dump from the tool. All the network parameters captured for each timestamp were examined; the transmit speed parameters and the user count are used as features in the prediction model. The RX parameters were dropped from the data set for further modelling, and the timestamp column was then set as the index of the data set, as required for time series modelling.
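A minimal pandas sketch of this preparation step (the column names follow the data set description in Section 13.2, while the file name is hypothetical):

    import pandas as pd

    df = pd.read_csv("isp_link_metrics.csv", parse_dates=["timestamp"])  # hypothetical file
    df = df.drop(columns=["Peak RX Mbps"])        # RX parameters dropped
    df = df.set_index("timestamp").sort_index()   # index required for time series modelling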
Subplots of all the variables against the timestamp index provide insights into the data variation and trend, as shown in Figure 13.10. There is a substantial increase in network throughput after April 2020, exactly when most work started happening online across industries. This put a heavy load on ISPs, as almost the entire workforce across the world was working remotely while offices were shut.
FIGURE 13.10
Subplots of all variables against the timestamp index.
The ARIMA module in Python is utilized to create the time series model. It is critical to examine the trends and seasonality of the time series data before moving further with the model construction. Comparing 2019 and 2020, there is a significant difference in bandwidth utilization, as shown in Figure 13.11.
FIGURE 13.11
Difference in bandwidth.
FIGURE 13.12
Removing the seasonality.
FIGURE 13.13
Visualization of TX speeds.
FIGURE 13.14
Maximum number of users.
FIGURE 13.15
Upward trends of individual network parameters from April 2020.
FIGURE 13.16
First order difference of the ‘bandwidth’ data series.
FIGURE 13.17
First order differencing plots of TX speeds.
FIGURE 13.18
First order differencing plots of maximum users.
FIGURE 13.19
First order differencing of all network parameters.
All the time series variables are plotted again on the same graph to see how they look. The first-order differencing of all network parameters is plotted in one graph, as in Figure 13.19, to recheck for any seasonal pattern. From this visualization, a certain amount of seasonality in the time series data can be observed for all the variables, as in Figure 13.20, along with an upward trend from the second quarter of 2020 onward.
FIGURE 13.20
Seasonality in the time series data for all the variables.
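First-order differencing of this kind is a one-liner in pandas; a sketch assuming the indexed DataFrame df from the preprocessing step above:

    import matplotlib.pyplot as plt

    # First-order difference removes the trend before fitting the ARIMA model
    diff = df.diff().dropna()
    diff.plot(title="First-order difference of all network parameters")
    plt.show()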
The data set is split into training data and testing data. For forecasting the next six values of bandwidth, six observations are held out in the test data set; the forecasts are then mapped to the start index of the test data for comparison.
The forecasted data frame is then concatenated with the original data frame and plotted (Figure 13.21).
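The chapter's listing is not included in this extract; a minimal statsmodels sketch of the split, the six-step forecast, and the concatenation described above, where the order (1, 1, 1) is an illustrative assumption rather than the tuned model:

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    series = df["Bandwidth"]
    train, test = series[:-6], series[-6:]        # hold out six observations

    fitted = ARIMA(train, order=(1, 1, 1)).fit()  # assumed (p, d, q); tune via ACF/PACF
    forecast = fitted.forecast(steps=6)
    forecast.index = test.index                   # map forecasts onto the test start index

    combined = pd.concat([series.rename("Observed"),
                          forecast.rename("Forecast")], axis=1)
    combined.plot(title="Bandwidth: observed versus forecast")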
Using the ARIMA model that was built, the bandwidth requirement for the next two months is predicted and checked against the current spike in network utilization due to the COVID situation. The two-month forecast is plotted along with the original data set to estimate the bandwidth required in August and September 2020, as shown in Figure 13.22. The range of forecast values shows the interval within which the predictions will fall for future dates. As can be seen, the bandwidth requirement remains in the range of 800 to 1000 during the August and September 2020 period as per the time series model, as in Figure 13.23.
FIGURE 13.21
The predictions are in line with high bandwidth utilization in 2020.
FIGURE 13.22
Bandwidth required in August and September 2020.
FIGURE 13.23
Bandwidth range of 800–1000 during August and September 2020.
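In statsmodels the forecast interval mentioned above is available through get_forecast; a sketch continuing from the fitted model, assuming daily sampling so that two months is roughly 61 steps:

    pred = fitted.get_forecast(steps=61)   # August-September 2020 horizon
    mean_forecast = pred.predicted_mean
    conf_int = pred.conf_int()             # interval within which predictions fall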
FIGURE 13.24
Evaluation of this proposed model using the diagnostic plot.
13.5.3 Model Evaluation
The evaluation of the proposed model can be done using the diagnostic plot shown in Figure 13.24.
Standardized residuals: The residual errors appear to have a uniform variance and fluctuate around a mean of zero. The density plot indicates a normal distribution with a mean of zero.
Normality: The dots on the Q–Q plot should all be aligned with the red line; any large deviations would indicate a skewed distribution.
The ACF plot, also known as the correlogram, shows that the residual errors are not autocorrelated. Any autocorrelation would imply that the residual errors have a pattern that is not explained by the model, in which case more Xs (predictors) would need to be added. In general, the model appears to be a decent fit.
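In statsmodels the four diagnostic panels discussed above come from a single call on the fitted results object; a sketch under the same assumptions as before:

    import matplotlib.pyplot as plt

    # Standardized residuals, density, Q-Q plot, and correlogram in one figure
    fitted.plot_diagnostics(figsize=(10, 8))
    plt.show()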
number or protocol). Using classified network traffic, tasks such as monitoring, discovery, control, and optimization can be performed. The overall purpose of network traffic classification is to improve network performance.
The packets can be labelled once they have been classified as belonging to a specific application. These markings or flags assist the network router in determining the proper service policies for particular network flows.
This study's classification approach can be used in conjunction with network or packet monitoring software. This allows the creation of a real-time dashboard that shows application-specific network use, which will aid in the rapid and efficient regulation of traffic flow across any internet link.
14
Integration of AI/ML in 5G Technology toward Intelligent Connectivity, Security, and Challenges
Myanmar
[email protected]
CONTENTS
14.1 Introduction.............................................................................................. 240
14.2 Overview of 5G........................................................................................ 240
14.2.1 Enabling Technologies............................................................... 241
14.2.2 Key Enablers for 5G................................................................... 241
14.2.3 5G Requirements........................................................................ 243
14.3 Artificial Intelligence (AI) and Machine Learning.............................. 243
14.3.1 Amalgam of AI and ML in 5G.................................................. 244
14.3.2 Security and Privacy.................................................................. 244
14.3.3 AI Learning-Based Scheme for 5G........................................... 246
14.4 Wireless AI................................................................................................ 246
14.5 AI in Context with 5G............................................................................. 248
14.6 Intelligent Connectivity........................................................................... 249
14.7 5G Security................................................................................................ 250
14.8 Challenges................................................................................................. 250
14.8.1 5G Complexity............................................................................ 250
14.8.2 Security and Privacy.................................................................. 251
14.8.3 Trade-off in Speed...................................................................... 251
14.9 Research Scope......................................................................................... 251
14.10 Conclusion................................................................................................ 252
14.1 Introduction
In the continuously evolving communication network architecture, integrating diverse ranges of devices with unique requirements into a single network has created sophisticated challenges for network security. Recent developments in 5G networks and beyond are facilitating the immense growth of data communication by providing higher data rates. 5G is a versatile network facing many challenges in meeting the rapidly growing demands for access to wireless services with ultra-low latency and high data rates. 5G today is the central technology behind many state-of-the-art advancements such as the IoT, smart grid, unmanned aerial systems, and self-driving vehicles. 5G is required to be characterized by high flexibility in design and in resource management and allocation, in order to satisfy the expanding needs of this heterogeneous network and its end users [1–3].
Many leading wireless research groups foresee artificial intelligence (AI) as the next big "game-changing" technology, ready to furnish 5G with the flexibility and intelligence required. Hence, many researchers have examined the efficiency of this approach in numerous parts of 5G wireless communications, including modulation, channel coding, interference management, scheduling, 5G slicing, caching, energy efficiency, and cyber security. ML and AI algorithms can be utilized to process and analyze the cross-domain data needed in 5G in a considerably more effective manner, enabling quick decisions and thereby easing the network's complexity and decreasing the maintenance cost. The cross-domain data includes geographic information, engineering parameters, and other data to be used by AI and ML to better estimate peak traffic, optimize the network for capacity expansion, and enable more intelligent coverage through effective interference estimation [4].
14.2 Overview of 5G
The quick adoption of 5G technology promises a staggering number of new devices. For instance, the Cisco Annual Internet Report (2018–2023) estimates that machine-to-machine (M2M) communication will grow 2.4-fold, from 6.1 billion connections in 2018 to 14.7 billion by 2023, amounting to 1.8 M2M connections for every member of the global population by 2023. The exponential growth in connected devices, along with the introduction of 5G technology, is expected to challenge efficient and dependable allocation of network resources. In addition, the massive deployment of the Internet of Things and of devices connected to the internet may pose a genuine threat to network security if they are not handled appropriately [5,6]. 5G networks are expected to support a much higher level of heterogeneity (in terms of connected devices and networks) compared with their predecessors. For example, 5G supports intelligent vehicles, smart homes, smart buildings, and smart cities. Also, the IoT in the 5G design will involve more robust and versatile strategies to deal with the basic security issues at both the network and device sides. The security of such a network will be considerably more complicated because of external as well as local intrusion. Artificial intelligence and ML can provide solutions by classifying sensitive security relations such as identity, authentication, and verification. Security and privacy in 5G-IoT will cover all the layers, such as identity protection, privacy, and end-to-end user protection. For example, the key authentication mechanism from the end device to the core network and onward to the service provider, while hiding the key identifier, is still a complex issue. AI and ML can also play a significant part in key authentication along with effectively limiting masquerading attacks.
14.2.1 Enabling Technologies
5G will enable connectivity for billions of devices in the IoT and the connected world. Thus, there are three significant categories of use cases for 5G.
1 | NFV (network function virtualization) | Improved resource utilization, throughput, and energy savings; data-centric communication | Isolation, resource allocation, fairness, revenue/price optimization, and mobility
TABLE 14.2
5G Requirements as Compared to 4G

S. No. | Requirement | Description
1 | Speed | 1–10 Gbps connections to end points in the field (i.e., not a theoretical maximum)
2 | Latency | 1 millisecond end-to-end round-trip delay
3 | Bandwidth utilization | 1000× bandwidth per unit area
4 | Devices to be connected | 10–100× the number of connected devices
5 | Availability | Perceived availability of 99.999%
6 | Coverage | 100%
7 | Energy usage | 90% reduction in network energy use
8 | Device battery life | Up to a ten-year battery life for low-power, machine-type devices
14.2.3 5G Requirements
Lately, there have been several perspectives about the definitive form that 5G technology should take, with two main views of what 5G ought to be:
14.3.1 Amalgam of AI and ML in 5G
Increased bandwidth, higher spectrum use, and higher data rates in 5G networks have also enlarged the threat and security landscape, from individual devices to the service provider network. Consequently, the network ought to be smart enough to manage these challenges in real time, and ML and AI methods could help model robust dynamic algorithms that recognize network issues and furnish possible solutions as they arise. In the short to medium term, AI and ML can be utilized to detect threats and counter them with robust and versatile security algorithms [13]. In the long term, a completely automated security mechanism is envisioned for timely response to threats and cyberattacks.
The 5G network is expected to support a much higher level of heterogeneity (in terms of connected devices and networks) compared with its predecessors. For example, 5G supports intelligent vehicles, smart homes, smart buildings, and smart cities. Additionally, the IoT in the 5G network design will involve more robust and versatile strategies to deal with the basic security issues at both the network and device sides [13, 14].
middle, for example, identity, verification, and protection of user details. Security and privacy in 5G-IoT will cover all the layers, such as identity protection, privacy, and end-to-end protection. Catering for the security and privacy of information from these various frameworks, each with very different security requirements, becomes a tedious undertaking [15]. AI and ML, with an outline of the SBA and the security requirements of the distinct end systems, can identify and correct these issues in real time by classifying and clustering unexpected threats. AI and ML can help in creating security mechanisms by building trust models, device security, and data verification to provide precise security for the whole 5G-IoT network [16,17].
TABLE 14.3
AI Techniques for 5G

S. No. | AI Learning Scheme | Learning or Training Model | Application
14.4 Wireless AI
5G communication and network management are enabled by machine learning and deep learning. For each model, the benefits and weaknesses of AI-enabled procedures are discussed as follows:
(a) M-MIMO and beamforming: Massive MIMO is one element of 5G. Using a large number of radiating antennas, 5G can concentrate the transmission and reception of signal power into small regions. However, several issues are associated with this technology, and AI/ML has been applied in
14.6 Intelligent Connectivity
Intelligent connectivity is a new concept based on the combination of three significant technologies, 5G, IoT, and AI/ML, intended to accelerate the development of disruptive digital services. The concept facilitates the connection of devices through a fast, low-latency mobile network, that is 5G, which gathers digital data through machines and sensors, which is the capability of IoT. Analysis and contextualization by AI/ML then finally produce significant results that are helpful to users [25,26]. This would enable transformative new capabilities in the majority of business areas, for example transport, manufacturing, medical care, public well-being, security, and so forth.
14.7 5G Security
Generally, AI and ML algorithms are data hungry in nature, which means that data is needed to train a model for effective operation. In the 5G era, data generation, storage, and management are comparatively easy because of high computational power, exponential data growth, and abundant data sources. Using AI and ML, the network can be maintained, accessed, and examined for potential threats, attacks, and vulnerabilities at a lower computing cost and with a reasonable infrastructure [21, 30–32].
AI and ML models can be used to detect suspicious activities in real time by analyzing the patterns and parameters of network traffic. Classification algorithms can identify anomalies by monitoring network parameters such as throughput and network error logs. Clustering algorithms can be used to categorize different types of threats as well as loopholes in network security. Models such as membership inference attacks and generative adversarial networks (GANs) can create synthetic datasets, which support new safety measures such as testing and deploying advanced security protocols and algorithms. Research on privacy-preserving AI and ML models has seen significant progress in secure computation, encryption, privacy protection, and federated learning. Hybrid models are created by adopting techniques from different fields to make models efficient, faster, and more generalizable [21, 33, 34].
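As a concrete illustration (a minimal sketch using scikit-learn, with hypothetical network KPI features, not code from the cited works), an unsupervised detector can flag suspicious flows and a clustering step can group them into candidate threat types:

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical traffic features: [throughput_mbps, error_rate, latency_ms]
normal = rng.normal([500.0, 0.01, 20.0], [50.0, 0.005, 5.0], size=(1000, 3))
attacks = rng.normal([50.0, 0.30, 200.0], [20.0, 0.10, 50.0], size=(30, 3))
traffic = np.vstack([normal, attacks])

# Unsupervised anomaly detection over network parameters.
detector = IsolationForest(contamination=0.03, random_state=0).fit(traffic)
labels = detector.predict(traffic)  # -1 = anomaly, 1 = normal
anomalies = traffic[labels == -1]

# Cluster the flagged flows into candidate threat groups.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(anomalies)
print(len(anomalies), "suspicious flows in", len(set(groups)), "clusters")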
14.8 Challenges
14.8.1 5G Complexity
AI/ML algorithms should be deployed so that they run in real time for communication on wireless devices. However, many wireless devices have limited memory and computing capability, which is not suitable for complex algorithms. Collecting large numbers of samples and training deep learning models take considerable time, which is a major barrier to deploying them on wireless devices with limited power and capacity. In some cases, the larger the number of samples and the longer the training time, the higher the accuracy of recognizing the signal and network features. However, acquiring more samples and training the models for longer results in slow feedback. Thus, machine learning models should be designed to achieve the best accuracy with fewer samples and within a short time frame [34].
14.8.3 Trade-off in Speed
For certain problems, the reliability of these techniques is lower than that of conventional methods in wireless communications. For example, deep learning can compete with least squares (LS) and minimum mean square error (MMSE) estimators in wireless channel estimation for massive MIMO, but slow feedback characterizes these methods. AI/ML inference may also lengthen the system response time, because most wireless devices do not have access to cloud computing, and even when they do, communication with cloud servers introduces additional delays [37–40].
14.9 Research Scope
To facilitate the integration of AI/ML, research efforts are required in several directions. For example, the acceleration of deep neural networks must advance through parallel computing, faster computation, and distributed computing. Distributed deep learning systems present an opportunity for 5G to build intelligence into its systems and deliver high throughput and ultra-low latency. There have been several recent efforts to accelerate deep neural networks.
5G-IoT security and privacy need more research in the areas of authentication, authorization, access control, and privacy preservation. The current 3GPP-defined networks use functional node specifications and abstract interfaces, but in 5G-IoT the physical network will serve as the core infrastructure, and security assurance will be the key challenge to manage. At this stage, semi-supervised AI-assisted solutions best suit such systems. With the advancement of AI algorithms, these systems are expected to become fully automated in the future.
14.10 Conclusion
In this chapter, we focused on the integration of AI/ML into 5G networks with respect to different constraints. We examined several case studies covering modulation classification, channel coding, massive MIMO, caching, energy efficiency, and network security. From this in-depth review, AI-enabled 5G communication and networking emerges as a promising approach that can give wireless networks the intelligence, efficiency, and flexibility needed to manage scarce radio resources well and deliver a high quality of service to users. However, further effort is still needed to reduce the complexity of deep learning so that it can be executed on time-sensitive networks and low-power devices, and to test the models in more realistic scenarios. Nowadays, with the power and pervasiveness of data, many practitioners are adapting their knowledge and expanding their toolkits with AI-based models, algorithms, and practices, particularly in the 5G world, where even a few milliseconds of latency can make a difference.
References
[1] C. Zhang, P. Patras, and H. Haddadi, “Deep learning in mobile and wireless
networking: A survey,” IEEE Communications Surveys & Tutorials, 2019.
[2] M. Soltani, V. Pourahmadi, A. Mirzaei, and H. Sheikhzadeh, “Deep learning-
based channel estimation,” IEEE Communications Letters, vol. 23, no. 4, pp. 652–
655, 2019.
[3] S. Gao, P. Dong, Z. Pan, and G. Y. Li, “Deep learning based channel estimation
for massive MIMO with mixed-resolution ADCS,” IEEE Communications Letters,
vol. 23, no. 11, pp. 1989–1993, 2019.
[4] H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for channel estimation
and signal detection in ofdm systems,” IEEE Wireless Communications Letters,
vol. 7, no. 1, pp. 114–117, 2017.
[5] Devasis Pradhan and K.C. Priyanka, “RF-energy harvesting (RF-EH) for sustainable ultra dense green network (SUDGN) in 5G green communication,” Saudi Journal of Engineering and Technology, 5(6), 2020. doi:10.36348/sjet.2020.v05i06.001
[6] Devasis Pradhan and K.C. Priyanka, “A comprehensive study of renewable
energy management for 5G green communications: Energy saving techniques
and its optimization,” Journal of Seybold Report 1533– 9211, 25(10), pp. 270–
284. 2020.
[7] Devasis Pradhan and A. Dash, “An overview of beam forming techniques
toward the high data rate accessible for 5G networks,” International Journal
of Electrical, Electronics and Data Communication, 2320–2084, 2321–2950, 8(12),
pp. 1–5, 2020.
[8] Devasis Pradhan and R. Rajeswari, “5G-green wireless network for communication with efficient utilization of power and cognitiveness,” in Jennifer S. Raj (ed.), International Conference on Mobile Computing and Sustainable Informatics. Springer, Cham, 2021, pp. 325–335.
[9] Devasis Pradhan, P. K. Sahu, A. Dash, and Hla Myo Tun, “Sustainability of 5G
green network toward D2D communication with RF-energy techniques,” 2021
International Conference on Intelligent Technologies (CONIT), 2021, pp. 1–10,
doi:10.1109/CONIT51480.2021.9498298
[10] Y. Wang, M. Narasimha, and R. W. Heath, “Mmwave beam prediction with situ-
ational awareness: A machine learning approach,” in 2018 IEEE 19th International
Workshop on Signal Processing Advances in Wireless Communications
(SPAWC). IEEE, 2018, pp. 1–5.
[11] E. Balevi and J. G. Andrews, “Deep learning-based channel estimation for high-
dimensional signals,” arXiv preprint arXiv:1904.09346, 2019.
[12] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation
for beamspace mmwave massive mimo systems,” IEEE Wireless Communications
Letters, vol. 7, no. 5, pp. 852–855, 2018.
[13] M. S. Safari and V. Pourahmadi, “Deep ul2dl: Channel knowledge transfer from
uplink to downlink,” arXiv preprint arXiv:1812.07518, 2018.
[14] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive mimo csi feed-
back,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 748–751, 2018.
[15] M. Khani, M. Alizadeh, J. Hoydis, and P. Fleming, “Adaptive neural signal
detection for massive mimo,” arXiv preprint arXiv:1906.04610, 2019.
[16] G. Gao, C. Dong, and K. Niu, “Sparsely connected neural network for massive
MIMO detection,” EasyChair, Tech. Rep., 2018.
[17] M. Alrabeiah and A. Alkhateeb, “Deep learning for TDD and FDD
massive MIMO: Mapping channels in space and frequency,” arXiv preprint
arXiv:1905.03761, 2019.
[18] C. Zhang, P. Patras, and H. Haddadi, “Deep learning in mobile and wireless
networking: A survey,” IEEE Communications Surveys Tutorials, vol. 21, pp. 2224–
2287, 2019.
[19] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar, “Adversarial
machine learning,” in Proceedings of the 4th ACM workshop on Security and
artificial intelligence, pp. 43–58, ACM, 2011.
[20] Devasis Pradhan, P.K. Sahu, K. Rajeswari, and Hla Myo Tun, “A study of local-
ization in 5G green network (5G-GN) for futuristic cellular communication,”
The 3rd International Conference on Communication, Devices and Computing
(ICCDC 2021), India, 16–18 August 2021.
[21] G. A. Plan, “An inefficient truth.” Global Action Plan Report, 2007.
[22] Hla Myo Tun, “Radio network planning and optimization for 5G telecom-
munication system based on physical constraints”, Journal of Computer Science
Research, 3(1), January 2021. https://doi.org/10.30564/jcsr.v3i1.2701
[23] W. Song, F. Zeng, J. Hu, Z. Wang, and X. Mao, “An unsupervised learning-based method for multi-hop wireless broadcast relay selection in urban vehicular networks,” in IEEE Vehicular Technology Conference, 2017.
[24] Pradhan, D., Sahu, P. K., Dash, A., & Tun, H. M. (2021, June). Sustainability of
5G Green Network toward D2D Communication with RF-Energy Techniques.
In 2021 International Conference on Intelligent Technologies (CONIT) (pp. 1–10). IEEE.
[25] M. S. Parwez, D. B. Rawat, and M. Garuba, “Big data analytics for user-activity
analysis and user- anomaly detection in mobile wireless network,” IEEE
Transactions on Industrial Informatics, vol. 13, no. 4, pp. 2058–2065, 2017.
[26] L.-C. Wang and S. H. Cheng, “Data-driven resource management for ultra-dense
small cells: An affinity propagation clustering approach,” IEEE Transactions on
Network Science and Engineering, vol. 4697, no. c, pp. 1–1, 2018.
[27] U. Challita, L. Dong, and W. Saad, “Deep learning for proactive resource allo-
cation in LTE-U networks,” in European Wireless 2017, 23rd European Wireless
Conference, 2017.
[28] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent
neural networks,” Tech. Rep., 2013.
[29] Q.V. Le, N. Jaitly, and G.E. Hinton Google, “A simple way to initialize recurrent
networks of rectified linear units,” Tech. Rep., 2015.
[30] G. Alnwaimi, S. Vahid, and K. Moessner, “Dynamic heterogeneous learning
games for opportunistic access in LTE-based macro/femtocell deployments,”
IEEE Transactions on Wireless Communications, vol. 14, no. 4, pp. 2294–2308, 2015.
[31] D. D. Nguyen, H. X. Nguyen, and L. B. White, “Reinforcement learning with
network-assisted feedback for heterogeneous RAT selection,” IEEE Transactions
on Wireless Communications, vol. 16, no. 9, pp. 6062–6076, 2017.
[32] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to
optimize: Training deep neural networks for interference management,” IEEE
Transactions on Signal Processing, vol. 66, no. 20, pp. 5438–5453, 2018.
[33] Y. Junhong and Y. J. Zhang, “Drag: Deep reinforcement learning based base
station activation in heterogeneous networks,” IEEE Transactions on Mobile
Computing, vol. 19, no. 9, pp. 2076–2087, 2019.
[34] J. Liu, B. Krishnamachari, S. Zhou, and Z. Niu, “Deepnap: Data-driven base
station sleeping operations through deep reinforcement learning,” IEEE Internet
of Things Journal, vol. 5, no. 6, pp. 4273–4282, 2018.
[35] M. Kozlowski, R. McConville, R. Santos-Rodriguez, and R. Piechocki, “Energy
efficiency in reinforcement learning for wireless sensor networks,” arXiv preprint
arXiv:1812.02538, 2018.
[36] B. Matthiesen, A. Zappone, E. A. Jorswieck, and M. Debbah, “Deep learning for
optimal energy-efficient power control in wireless interference networks,” arXiv
preprint arXiv:1812.06920, 2018.
[37] C. Gutterman, E. Grinshpun, S. Sharma, and G. Zussman, “Ran resource
usage prediction for a 5G slice broker,” in Proceedings of the Twentieth ACM
International Symposium on Mobile Ad Hoc Networking and Computing.
ACM, 2019, pp. 231–240.
[38] M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for pro-
active caching in cloud-based radio access networks with mobile users,” IEEE
Transactions on Wireless Communications, vol. 16, no. 6, pp. 3520–3535, 2017.
[39] X. Lu, L. Xiao, C. Dai, and H. Dai, “Uav- aided cellular communications
with deep reinforcement learning against jamming,” arXiv preprint
arXiv:1805.06628, 2018.
[40] M. Sadeghi and E. G. Larsson, “Physical adversarial attacks against end-to-end
autoencoder communication systems,” IEEE Communications Letters, vol. 23, no.
5, pp. 847–850, 2019.
15
Electrical Price Prediction using
Machine Learning Algorithms
CONTENTS
15.1 Introduction.............................................................................................. 256
15.2 Literature Review..................................................................................... 257
15.3 Methodology............................................................................................. 258
15.3.1 Univariate Forecasting Models................................................. 258
15.3.1.1 Autoregressive Model............................................... 258
15.3.1.2 Moving Average (MA).............................................. 259
15.3.1.3 ARMA.......................................................................... 259
15.3.1.4 ARIMA........................................................................ 259
15.3.1.5 LSTM........................................................................... 260
15.3.2 Multivariate Forecasting Models............................................. 260
15.3.2.1 DNN............................................................ 260
15.3.2.2 CNN............................................................................. 261
15.3.2.3 CNN-LSTM................................................................. 261
15.3.2.4 CNN-LSTM-DNN..................................................... 261
15.4 Electrical Demand Forecasting............................................................... 261
15.4.1 Dataset.......................................................................................... 261
15.4.2 Data Preprocessing..................................................................... 262
15.4.3 Experiment.................................................................................. 262
15.4.3.1 Univariate Time-Series Forecasting........................ 262
15.4.3.2 Multivariate Time-Series Forecasting..................... 264
15.5 Results and Discussions.......................................................................... 264
15.5.1 For Univariate Models............................................................... 264
15.5.2 For Multivariate Models............................................................ 266
15.6 Conclusions............................................................................................... 267
15.7 Limitations and Future Scope................................................................ 268
15.1 Introduction
Energy is essential for the economic development of any country. In the
case of developing countries, the energy sector assumes critical importance
given the ever-increasing energy needs requiring huge investments to meet
them. Electricity has influenced our day-to-day life; in industry, it powers large machines that produce essential utilities and industrial and consumer products. Electricity provides the basis for the economic development of any country, because a consistent and reliable power source creates a variety of businesses and job opportunities, and it provides access to online resources and information.
Electricity is a secondary form of energy, because it is generated by converting energy from other sources: renewable sources, for example, solar energy, hydro energy, and onshore and offshore energy, and non-renewable sources such as coal, lignite, natural gas, and other fossil fuels. Owing to its properties, it cannot be stored in large quantities. Sometimes there is an imbalance between the generation and supply of electrical energy, which causes fluctuations in electricity prices. Extreme price volatility has forced power generators to
hedge not only volume but also price risk. Price forecasts from a few hours to
a few months ahead have been of particular interest to power portfolio man-
agers. A power generation company capable of predicting volatile wholesale
prices with a reasonable level of accuracy can adjust its strategy and its own
production or consumption schedule to reduce the risk or maximize profits
in day-ahead trading. It helps management take the right decisions and
manage resources accordingly and work on the shortage. Lack of sufficient
power generation capacity, poor transmission and distribution infrastruc-
ture, high costs of supply to remote areas, or simply lack of affordability for
electricity are among the biggest hurdles for extending grid-based electricity.
Forecasting of power price plays an essential role in the electricity industry,
as it provides the basis for making decisions in power system planning and
operation.
Electricity price forecasting is a time-series forecasting problem. Therefore, in this research work, electricity prices are forecast by applying machine learning (ML) algorithms to an open-source dataset. Classical forecasting models and combinations of recurrent neural networks (RNN) and convolutional neural networks (CNN) are used for price prediction. Classical models such as moving average, autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA), together with long short-term memory (LSTM), are used for electricity price prediction. Electricity prices are governed by different variables known as price determinants. Therefore, after the application of the classical forecasting models and the RNN, multivariate forecasting models are applied to the same dataset.
Consequently, the results obtained from the models applied are compared
257
based on mean absolute percentage error (MAPE) and mean absolute error
(MAE) values for both univariate models and multivariate models. The
graphs of the actual price and forecasting price are also plotted to visualize
the results obtained.
15.2 Literature Review
Researchers have already applied ML algorithms for forecasting the change
in prices. Rolnick et al. defined the climate change problem and encouraged
the use of ML algorithms for forecasting the adverse effects of climate change
[1]. Nti et al. presented a systematic review of electricity load forecasting and categorized the forecasting methods into industrial and artificial intelligence (AI) methods [2]. Aishwarya et al. applied ML models such as Bayesian
linear regression, boosted decision tree regression, decision forest regression,
and statistical analysis method for predicting the number of customers with
internal data and external data in a ubiquitous environment [3]. Mir et al.’s
comparative review shows that the time-series modeling approach has been extensively used for long- and medium-term forecasting, while AI-based techniques for short-term forecasts remain prevalent in the literature [4]. Johannesen
et al. compared various regression tools based on the lowest MAPE value
and concluded that random forest regressor provides better short-term load
prediction (30 min) and K-nearest neighbor (KNN) offers relatively better
long-term load prediction (24 h) [5]. Ahmad and Chen determined that weather change is responsible for the change in energy consumption patterns in the domestic, commercial, and industrial sectors and concluded that LSBoost performance is more modest than the LMSR and NARM for monthly, seasonal, and yearly ahead intervals [6]. Fattah et al. developed an ARIMA
model for demand forecasting of a product using the Box–Jenkins time-series
approach [7]. For understanding the nature of time series and the objective
of analysis, Athiyarath et al. explained and applied the concepts of LSTM,
ARIMA, and CNN [8]. Kaushik et al. evaluated different statistical, neural,
and ensemble techniques in their ability to predict patients’ weekly average
expenditures on certain pain medications. Two statistical models, persistence
(baseline) and ARIMA, multilayer perceptron model, LSTM model and an
ensemble model were used to predict the expenditures on two different pain
medications [9]. Ahmad et al. used artificial neural networks (ANN) with
a nonlinear autoregressive exogenous multivariable inputs model, multi-
variate linear regression model, and adaptive boosting model to predict
energy demand in a smart grid environment [10]. Allee et al. used a data-
driven approach for demand prediction using survey and smart meter data
from 1,378 Tanzanian mini-grid customers. Applied support vector machines
15.3 Methodology
This section presents the various univariate and multivariate forecasting
models that were applied in this work.
15.3.1 Univariate Forecasting Models
15.3.1.1 Autoregressive Model
An autoregressive model of order p, AR(p), expresses the next value of the series as a linear combination of its past values:

$$P_t = \delta + \phi_1 P_{t-1} + \phi_2 P_{t-2} + \cdots + \phi_p P_{t-p} + A_t$$

where
• $P_{t-1}, P_{t-2}, \ldots, P_{t-p}$ are the past series values (lags),
• $A_t$ is white noise (i.e., randomness), and
• $\delta$ is defined by $\delta = \mu\bigl(1 - \sum_{i=1}^{p} \phi_i\bigr)$, with $\mu$ the mean of the series.
15.3.1.3 ARMA
An ARMA model is used to describe a weakly stationary stochastic time series in terms of two polynomials, the first for the autoregression and the second for the moving average. In this model, the impact of past lags along with the past residuals is considered for predicting future values:

$$P_t = \delta + \sum_{i=1}^{p} \beta_i P_{t-i} + \sum_{j=1}^{q} \alpha_j A_{t-j} + A_t$$

where
• $\beta_i$ are the coefficients of the AR part, and
• $\alpha_j$ are the coefficients of the MA part.
15.3.1.4 ARIMA
ARIMA is a class of models that explain a given time series data based on its
own previous values, that is, its own lags and the lagged forecast errors, so
that Equations 15.5–15.7 can be used to forecast future values. The ARIMA
model is used when the mean of the time series is not constant, that is, the
time series is not stationary. Therefore, the data is prepared by a degree of dif-
ferencing in order to make it stationary, that is, to remove trend and seasonal
structures that negatively affect the regression model:
$$a_t = \phi_1 Z_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t \tag{15.7}$$

where
• $a_t$ = data points of the non-stationary time series,
• $\phi_1 Z_{t-1}$ = the autoregressive part,
• $\theta_1 \epsilon_{t-1}$ = the moving average part, and
• $\epsilon_t$ = the error term.
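As a brief sketch (using statsmodels, with a synthetic price series standing in for the real data), an ARIMA(p, d, q) model can be fitted and used to forecast as follows:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic hourly price series; replace with the actual dataset.
idx = pd.date_range("2018-01-01", periods=500, freq="H")
prices = pd.Series(50 + np.cumsum(np.random.default_rng(0).normal(0, 1, 500)), index=idx)

train, test = prices[:400], prices[400:]
fitted = ARIMA(train, order=(2, 1, 2)).fit()   # p = 2 lags, d = 1 differencing, q = 2 MA terms
forecast = fitted.forecast(steps=len(test))    # out-of-sample price forecast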
15.3.1.5 LSTM
LSTMs are a special kind of RNN, designed precisely to avoid the long-term dependency problem: remembering information for long periods of time is practically their default behavior. They possess sigmoid neural net layers and pointwise multiplication operations. A sigmoid layer outputs numbers between 0 and 1, describing how much of each component should be let through; a value of 0 means “let nothing through,” while a value of 1 means “let everything through.” The vanishing gradient effect is reduced by implementing three gates along with the hidden state. The three gates are commonly referred to as the input, output, and forget gates, and an LSTM uses them to protect and control the cell state. Efficient applications of LSTM networks can be found in research fields such as human trajectory prediction, traffic forecasting, speech recognition, and weather prediction.
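The following is a minimal Keras sketch (an illustration under assumed window and layer sizes, not the chapter's exact code) of a univariate LSTM forecaster that maps a 24-step input window to the next price:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Dummy data standing in for the scaled price series: (samples, timesteps, features).
X = np.random.rand(256, 24, 1).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = keras.Sequential([
    layers.Input(shape=(24, 1)),
    layers.LSTM(64),   # gated recurrence: input, forget, and output gates
    layers.Dense(1),   # next-step price
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)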
15.3.2 Multivariate Forecasting Models
15.3.2.1 DNN
In a deep neural network (DNN), each neuron computes

$$p_i^l = f\Bigl(\sum_{k=1}^{J} w_{i,k}\, q_k + b_i\Bigr) \tag{15.8}$$

where $p_i^l$ is the output of neuron $i$ in the current layer $l$, $q_k$ is the output of neuron $k$ in the previous layer, $w_{i,k}$ is the set of weights, $b_i$ is the bias, and $f$ is a nonlinear activation function.
15.3.2.2 CNN
CNNs were traditionally designed for image recognition systems but can also be used as time-series forecasting models. CNNs handle high-dimensional inputs efficiently, which can make them perform better than other neural networks on such data. CNN models have three prominent layers: (i) a convolutional layer, (ii) a pooling layer, and (iii) a flatten layer. A flatten layer is used between the convolutional layers and the dense layer to reduce the feature maps to a single one-dimensional vector.
15.3.2.3 CNN-LSTM
This architecture is also known as long-term recurrent convolutional net-
work (LRCN) model. The CNN-LSTM architecture uses CNN layers for fea-
ture extraction on input data combined with LSTMs to support sequence
prediction. A CNN-LSTM can be defined by adding CNN layers at the front
end followed by LSTM layers with a dense layer at the output. These models
were developed for visual time-series prediction problems.
15.3.2.4 CNN-LSTM-DNN
In order to perform complex tasks, the neural networks need to be trained
deeper. When deeper networks start to converge, the problem of degrad-
ation occurs. In degradation, with the increase in network depth, accuracy
gets saturated and then decreases rapidly. Unexpectedly, such degradation is
not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error. Skip connections, also known as shortcut connections, are introduced to overcome this problem. This type of connection skips some of the layers in the neural network and feeds the output of one layer as the input to later layers. In this CNN-LSTM model, the output is skipped to a common DNN.
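A minimal functional-API sketch of this idea, under assumed layer sizes (not the authors' exact architecture), routes the CNN features both through the LSTM and directly, via a skip connection, to the common DNN head:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(24, 4))                   # 24 timesteps x 4 features
conv = layers.Conv1D(32, 3, activation="relu")(inputs)
conv = layers.MaxPooling1D(2)(conv)
lstm_out = layers.LSTM(32)(conv)                      # sequential path through the LSTM
skip = layers.Flatten()(conv)                         # skip connection bypassing the LSTM
merged = layers.concatenate([lstm_out, skip])         # both paths meet at the DNN head
head = layers.Dense(64, activation="relu")(merged)    # common DNN layer
outputs = layers.Dense(1)(head)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")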
15.4 Electrical Demand Forecasting
15.4.1 Dataset
The dataset used in this work is drawn from transmission service operator (TSO) data. It is unique because it contains hourly data for electrical prices together with the TSO's own forecasts for consumption and pricing. The work focuses on predicting electric prices better than the forecasts already present in the data. The metrics used for comparison are MAPE and MAE, defined below:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left|\frac{A_t - F_t}{A_t}\right| \tag{15.9}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left|A_t - F_t\right| \tag{15.10}$$
where
At =actual value
Ft =forecast value
n =total number of datapoints
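These two metrics translate directly into NumPy (a straightforward implementation of Equations 15.9 and 15.10):

import numpy as np

def mape(actual, forecast):
    actual, forecast = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def mae(actual, forecast):
    actual, forecast = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast)))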
15.4.2 Data Preprocessing
In order to apply ML algorithms, the raw dataset is first cleaned and
preprocessed. After cleaning the data, that is eliminating all the null values
and noises, the dataset is converted to a usable format. The next step is fea-
ture scaling, after which the data is split into testing and training dataset.
For univariate time-series forecasting, a single feature is extracted and the
value for total actual price is predicted. Subsequently, a min–max scaler is
employed to scale the dataset. The dataset is divided into two parts: 80% as a
training dataset and 20% as a testing dataset.
For multivariate time-series forecasting, multiple features, that is, energy
consumption, price, day of the week, and month of the year are extracted.
Then the scaling of the features is done using min–max scaler.
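A minimal sketch of these preprocessing steps is shown below; the file and column names are placeholders for the open-source dataset, not its documented schema:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("energy_dataset.csv", parse_dates=["time"]).dropna()  # remove null values
features = pd.DataFrame({
    "consumption": df["total load actual"],   # placeholder column names
    "price": df["price actual"],
    "day_of_week": df["time"].dt.dayofweek,
    "month": df["time"].dt.month,
})
scaled = MinMaxScaler().fit_transform(features)   # min-max feature scaling
split = int(0.8 * len(scaled))                    # 80% train / 20% test, no shuffling
train, test = scaled[:split], scaled[split:]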
15.4.3 Experiment
15.4.3.1 Univariate Time-Series Forecasting
After data preprocessing and feature selection, the selected forecasting
models are successfully applied using Jupyter notebook. Jupyter notebook
is an open-source workbench used to create and share documents having
codes, projects, equations, and visualization. The forecasted and actual
prices are plotted in the graphs shown in Figure 15.1. From the figure, it
can be concluded that the forecasted price is closest to actual price for the
LSTM model. MAPE and MAE values obtained for each model is shown in
Table 15.1.
FIGURE 15.1
The actual and the forecasted price is plotted in the graph, predicted by the univariate
forecasting models.
TABLE 15.1
The MAPE, RMSE, and MAE Values Obtained
for Each Univariate Model
1. A three-layer DNN (one layer plus the common bottom two layers)
2. A CNN with two layers of 1D convolutions with max pooling
3. A LSTM with two LSTM layers
4. A CNN-stacked LSTM with layers from models 2 and 3 feeding into the
common DNN layer. We are using the same layers from the CNN and
LSTM model, stacking the CNN as input to the pair of LSTMs.
5. A CNN is stacked LSTM with a skip connection to the common
DNN layer.
The same CNN and LSTM layers as the previous models are used this time
with a skip connection direct to the common DNN layer.
Figure 15.2 shows that the LSTM appears to oscillate over a longer period than the other models, while the CNN also captures the intraday oscillations. The CNN-stacked LSTM shows how these two learned behaviors are combined. Table 15.2 shows the MAE values obtained by the different models.
TABLE 15.2
The MAPE Values Obtained by the Different Multivariate Models
FIGURE 15.3
Comparison between the applied univariate models on the basis of MAE.
…in comparison to all the other forecasting models. If MAPE is taken as the metric of comparison, the LSTM model gives the lowest MAPE, which shows that the LSTM forecasts are closer to the actual prices than those of the other models. Figure 15.4 shows the MAPE values of all univariate models; the MAPE of ARIMA is the highest, that is, the ARIMA model's accuracy is the lowest. Figure 15.1 compares the actual and forecast prices; the prices forecast by LSTM overlap with the actual price curve. Figure 15.3 shows the MAE obtained from all the applied univariate models; the MAE value for ARIMA is again the highest.
FIGURE 15.4
Comparison between the applied univariate models on the basis of MAPE.
FIGURE 15.5
Comparison between the applied multivariate models on the basis of MAPE.
…of LSTM skip is more than that of CNN stacked, which is why the forecast obtained from CNN stacked is closer to the actual values than that of CNN skip.
15.6 Conclusions
Electricity plays a key role in the economic development of any nation. All
the power generation companies in the market aim to utilize all the available
FIGURE 15.6
Comparison between the applied multivariate models on the basis of MAE.
resources to the fullest, and improve the generation, consumption, and supply
of electricity produced in order to maximize profit generation. Electricity price
prediction plays a vital role in accomplishing the aims of the power companies.
The accurate prediction of the electric prices enables the power managers to
make decisions regarding the raw materials used and the production and
supply of electricity. In this chapter, the idea is to apply different univariate
and multivariate machine learning algorithms for electricity price prediction.
Jupyter notebook, an open-source workbench, is used for coding and visual-
izing the results. Electric price predictions involve historic data of factors such
as load and time in hours, days, and weeks. The machine learning algorithms
perform well and handle a large amount of data. Out of the univariate models,
LSTM outperforms all the other methods. Out of multivariate models, LSTM
skip performs better than LSTM stacked. CNN and DNN give satisfactory
results. As managers aim to understand the dynamics of the market, the methods discussed in this chapter can give them a concrete basis for their decisions. Lower forecasting errors lead to more accurate decisions in resource planning and capacity planning, which in turn improves the profit share of the power generation companies.
References
[1] Rolnick, D., Donti, P.L., Kaack, L.H., Kochanski, K., Lacoste, A., Sankaran,
K., Ross, A.S., Milojevic- Dupont, N., Jaques, N., Waldman- Brown, A., and
Luccioni, A., 2019. Tackling climate change with machine learning. arXiv pre-
print arXiv:1906.05433.
[2] Nti, I.K., Teimeh, M., Nyarko-Boateng, O., and Adekoya, A.F., 2020. Electricity
load forecasting: a systematic review. Journal of Electrical Systems and Information
Technology, 7(1), 1–19.
[3] Aishwarya, K., Aishwarya Rao, Nikita Kumari, Akshit Mishra, and Rashmi,
M.R., 2020. Food demand prediction using machine learning, International
Research Journal of Engineering and Technology, 7(6), 3672–3675.
[4] Mir, A.A., Alghassab, M., Ullah, K., Khan, Z.A., Lu, Y., and Imran, M., 2020.
A review of electricity demand forecasting in low and middle income coun-
tries: The demand determinants and horizons. Sustainability, 12(15), 5931.
[5] Johannesen, N.J., Kolhe, M., and Goodwin, M., 2019. Relative evaluation of
regression tools for urban area electrical energy demand forecasting. Journal of
Cleaner Production, 218, 555–564.
[6] Ahmad, T. and Chen, H., 2019. Nonlinear autoregressive and random forest
approaches to forecasting electricity load for utility energy management
systems. Sustainable Cities and Society, 45, 460–473.
[7] Fattah, J., Ezzine, L., Aman, Z., El Moussami, H., and Lachhab, A., 2018.
Forecasting of demand using ARIMA model. International Journal of Engineering
Business Management, 10, 1–9.
[8] Athiyarath, S., Paul, M., and Krishnaswamy, S., 2020. A comparative study and
analysis of time series forecasting techniques. SN Computer Science, 1, 1–7.
[9] Kaushik, S., Choudhury, A., Sheron, P.K., Dasgupta, N., Natarajan, S., Pickett,
L.A., and Dutt, V., 2020. AI in healthcare: time-series forecasting using statistical,
neural, and ensemble architectures. Frontiers in Big Data, 3, 4.
[10] Ahmad, T. and Chen, H., 2018. Potential of three variant machine-learning
models for forecasting district level medium- term and long- term energy
demand in smart grid environment. Energy, 160, 1008–1020.
[11] Allee, A., Williams, N.J., Davis, A., and Jaramillo, P., 2021. Predicting initial elec-
tricity demand in off-grid Tanzanian communities using customer survey data
and machine learning models. Energy for Sustainable Development, 62, 56–66.
[12] Dong, B., Cao, C., and Lee, S.E., 2005. Applying support vector machines to pre-
dict building energy consumption in tropical region. Energy and Buildings, 37(5),
545–553.
[13] Mohammed, J. and Mohamed, M., 2021. The forecasting of electrical energy
consumption in morocco with an autoregressive integrated moving average
approach, Hindawi Mathematical Problems in Engineering, 2021, Article ID
6623570, 9.
[14] Hasanah, R.N., Indratama, D., Suyono, H., Shidiq, M., and Abdel-Akher, M.,
2020. Performance of genetic algorithm-support vector machine (GA-SVM) and
autoregressive integrated moving average (ARIMA) in electric load forecasting.
Journal FORTEI-JEERI, 1(1), 60–69.
[15] Aurna, Md. N.F., Rubel, T.M., Siddiqui, T.A., Karim, T., Saika, S., Md. Arifeen,
M., Mahbub, T.N., Salim Reza, S.M., and Kabir H. 2021. Time series analysis of
16
Machine Learning Application to Predict
the Degradation Rate of Biomedical Implants
CONTENTS
16.1 Introduction................................................................................................. 271
16.2 Related Work............................................................................................... 275
16.3 Proposed Methodology............................................................................. 277
16.4 Conclusion................................................................................................... 280
16.1 Introduction
A bone fracture, often termed a broken bone, is a medical condition in which the shape or contour of a bone changes due to the impact of external forces or injuries, under many biological as well as mechanical circumstances such as injuries during physical activities, vehicle accidents, accidental falls, weakening of the bones with age, or an underlying disease [1]. Under fracture conditions, the broken or cracked bone is stabilized and supported so that it can handle the weight of the body during movement while the fracture heals. Some fractures are healed from outside the body using plaster casts, but using them gives rise to critical issues. To reduce the risk of infection from external supports, surgical procedures are performed to implant supports internally to stabilize fractured bones with some implants
TABLE 16.1
Different Types of Fractures and Their Internal Fixators
FIGURE 16.1
Bone healing procedure.
such as plates, screws, nails, or wires [2]. Different types of implant devices
used for fracture healing are shown in Table 16.1.
These implants are of two types: non-biodegradable and biodegradable. Non-biodegradable implants have to be removed by a second surgery after the fracture has healed. Biodegradable implants, by contrast, degrade inside the body, so a second surgery is not needed after the fractured bones have healed. Degradation of biological implant materials occurs by an electrochemical reaction in the presence of an electrolyte, which results in the formation of oxides, hydroxides, hydrogen gas, or other compounds, as shown in Figure 16.1. Metals are used as bone implant materials in non-biodegradable implantation to fix cracked, deformed, worn-out, or broken bones.
The artificial replacements are made of metals, polymers, or ceramics, and must be strong enough as well as flexible enough for everyday movement, as shown in Figure 16.2. So, while choosing a material, it is necessary to ascertain its mechanical strength, flexibility, and biocompatibility. Biomaterials are synthetic materials, either degradable or non-degradable in nature, but they must possess good load-bearing capacity if they are to be used inside the human body [3–5].
Orthopedic biomaterials are implanted near the bone fracture to provide
support and to heal the bone tissues. The end of fractured bones is connected
to implant devices, which are fixed with metal pins or screws. After the
healing of fractured bones, these screws and pins are removed. If these screws
are made up of non-biodegradable materials, after healing, they need to be
removed by a second operation.
FIGURE 16.2
Bone fracture implant materials, issues, and research scope areas.
…device, its surface and mechanical properties have to be studied. If the selection of the material or its alloy composition is not accurate, it may lead to failure of the implant through loosening, osteolysis, wear, or toxic effects [8]. Subsequently, the surrounding environment of the implant device and the tissues where the device is placed must be analyzed. Therefore, for better functioning of the biomedical implant, the choice of the appropriate material is highly important.
The selection of materials for implantation is a crucial step toward successful long-term implants, and the implant material and its properties need to be studied carefully for a successful treatment.
The characteristics of the implant materials studied are modulus of elasti-
city, compressive strength, tensile strength, shear strength, yield, and fatigue
strength, ductility, hardness, corrosive properties (crevice corrosion, pitting
corrosion, galvanic corrosion, electrochemical corrosion), surface tension,
surface energy, and surface roughness [8]. Table 16.2 shows the material
properties that can be considered for implantations.
The biodegradation progression is driven by three major factors: chemical, mechanical, and biological interactions. In the case of chemical-based deterioration of the polymer, the degradation rate is highly dependent on the polymer's composition, crystallinity, and molecular structure, as well as its hydrophobic or hydrophilic nature [2].
The chemical degradation process is achieved by breaking the polymer's molecular chains, breaking the cross-linking structure, or interfering with
TABLE 16.2
Classification of Biomaterials

Category | Materials
Bio-toxic | Gold, Co–Cr alloys, stainless steel, niobium, tantalum; polyethylene, polyamide, polymethylmethacrylate, polytetrafluoroethylene, polyurethane
Bio-inert | Commercially pure titanium, titanium alloy (Ti-6Al-4V), aluminium oxide, zirconium oxide
Bio-active | Hydroxyapatite, tricalcium phosphate, bio-glass, carbon silicon
Biodegradable | Magnesium, zinc, calcium, iron; calcium phosphate, silica, alumina; silk, collagen, polylactic acid
its crystallinity. Bulk or surface degradation occurs in the body. In bulk degradation, as in hydrophilic polymers, a faster degradation rate is attained through conversion into water-soluble materials. Surface degradation takes place in hydrophobic polymers; it keeps the inner structure intact and offers better control over the degradation rate [3].
In the case of degradation at the biological level, the materials are exposed to body fluids, which changes the chemical composition of the polymers. The degradation can occur through enzymatic, oxidative, or hydrolytic methods.
However, the interaction level of the tissue and its behavior at the implant site depend on the material's physical, biological, and chemical nature. On the basis of their nature, implant device materials are categorized as shown in Table 16.2.
16.2 Related Work
Dehestani et al. [9] experimentally investigated the mechanical properties and corrosion behavior of iron and hydroxyapatite (HA) composites for biodegradable implant applications. They observed that mechanical strength decreases with increasing HA content and decreasing HA particle size, whereas the corrosion rate increases. Fe–2.5 wt% HA emerged as the strongest composite.
Rajan et al. [10] investigated zinc–magnesium composite implant materials, examining the mechanical, corrosion, and biological properties of magnesium. The mechanical strength is close to that of bone, and the corrosion rate was observed to be 0.38 mm/year with a 12% elongation rate. Richard et al. [11] experimentally investigated the corrosion properties of Fe–Mn–Si alloys for biodegradable medical implants; a corrosion rate of 0.24–0.44 mm/year was observed.
Tong et al. [12] studied the microstructure, mechanical properties, biocompatibility, and degradation behavior of a Zn–Ge alloy for biodegradable implant materials.
Borjali et al. [22] and Bedi et al. [23] used machine learning methods to pre-
dict the wear rate of biomedical implants and to validate them by quantifying
the prediction error.
16.3 Proposed Methodology
The following properties need to be studied while selecting materials for the manufacture of implant devices:
Tensile strength: Implants should have high tensile strength, which lowers the stress at the fracture interface.
Yield strength: Implants should have high yield strength to prevent brittle fracture under cyclic loads.
Degradation rate: The degradation rate should match the healing rate, and the implant material should be chosen according to the length of time it must remain in the body.
FIGURE 16.3
Proposed training process.
…has to make samples and perform tests on them to determine their mechanical and biological behavior. Analyzing a material in this way is a long task, and if the material fails testing at any point, the entire effort becomes worthless. So, to reduce the time consumed, a machine learning approach is used to predict the suitability of the material composition used for implant design. This process is performed in two steps: training and testing. Figure 16.3 illustrates the training process of the proposed model; similarly, Figure 16.4 represents the testing process.
FIGURE 16.4
Testing process.
Form S-groups
Main loop:
    while iter < max_iteration:
        Select features
        Perform training process
        Aggregate training rules
        Update decision
Final decision
End
…be different from each other. Each random forest module can be trained with different parameters of the implant materials, and bagging, random selection, or boosting strategies can be used to select the training samples.
In the proposed architecture, bagging rules form the basis of the ensemble module: each module is trained individually, and the modules are then aggregated by applying a combination method. During the testing phase, an aggregation or voting strategy across all machine learning modules decides the class label of the test data. In the ensemble random forest architecture, n training sample sets are constructed for n individual modules, and different training sample sets are used in order to improve the aggregation result and achieve higher efficiency.
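A minimal scikit-learn sketch of this bagging-plus-aggregation scheme is given below; the feature set and target are illustrative placeholders, and the estimator parameter name assumes a recent scikit-learn release:

import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
# Placeholder implant-material descriptors, e.g., tensile strength, yield
# strength, corrosion rate, porosity; the target is the degradation rate.
X = rng.random((200, 4))
y = rng.random(200)

# Each base module is a random forest trained on a bootstrap sample (bagging);
# the predictions of the n modules are averaged at test time.
ensemble = BaggingRegressor(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    n_estimators=5,   # n individual modules
    random_state=0,
).fit(X, y)
prediction = ensemble.predict(X[:1])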
16.4 Conclusion
Biomedical implants are used in many applications such as hip replacement,
femur bone replacement, and dental implants. Research has been ongoing on
making better implant designs and selecting the best material composition
for making implants last longer, in addition to reducing the body’s reaction
to their presence. For proper healing of the bone, implant devices are required
to provide mechanical strength and stability. Choosing the proper material
for making implant devices can improve osseointegration. Mechanical
strength or tensile strength of the implants must be compatible with the nat-
ural bone. But selection of the material manually and its experimental ana-
lysis to determine its degradation rate take a very long time. The primary
focus of this work is to use the power of machine learning for effective bone
healing without any adverse effects. The use of machine learning will help in
deciding the suitable implant material to enable fast healing of the bone frac-
ture without any side effects.
References
[1] Claes, L. and Ignatius, A. Development of new, biodegradable implants. Chirurg.
73, 990–996 (2002). https://doi.org/10.1007/S00104-002-0543-0.
[2] Li, C., Guo, C., Fitzpatrick, V., Ibrahim, A., Zwierstra, M.J., Hanna, P., Lechtig, A.,
Nazarian, A., Lin, S.J., and Kaplan, D.L. Design of biodegradable, implantable
devices towards clinical translation. Nat. Rev. Mater. 51(5), 61–81 (2019). https://
doi.org/10.1038/s41578-019-0150-z.
[3] Hofmann, G.O. Biodegradable implants in traumatology: A review on the state-
of-the-art. Arch. Orthop. Trauma Surg. 1143 (114), 123–132 (1995). https://doi.
org/10.1007/BF00443385.
[4] Liu, Y., Zheng, Y., and Hayes, B. Degradable, absorbable or resorbable—what
is the best grammatical modifier for an implant that is eventually absorbed by
the body? Sci. China Mater. 60, 377–391 (2017). https://doi.org/10.1007/S40
843-017-9023-9.
[5] Karpouzos, A., Diamantis, E., Farmaki, P., Savvanis, S., and Troupis, T.
Nutritional aspects of bone health and fracture healing. J. Osteoporos. (2017).
https://doi.org/10.1155/2017/4218472.
[6] Radha, R. and Sreekanth, D. Insight of magnesium alloys and composites for
orthopedic implant applications—A review. J. Magnes. Alloy. 5, 286–312 (2017).
https://doi.org/10.1016/J.JMA.2017.08.003.
[7] Wang, W., Han, J., Yang, X., Li, M., Wan, P., Tan, L., Zhang, Y., and Yang, K. Novel
biocompatible magnesium alloys design with nutrient alloying elements Si, Ca
and Sr: structure and properties characterization. Mater. Sci. Eng. B. 214, 26–36
(2016). https://doi.org/10.1016/J.MSEB.2016.08.005.
[8] Li, H., Yang, H., Zheng, Y., Zhou, F., Qiu, K., and Wang, X. Design and
characterizations of novel biodegradable ternary Zn- based alloys with IIA
nutrient alloying elements Mg, Ca and Sr. Mater. Des. 83, 95–102 (2015). https://
doi.org/10.1016/J.MATDES.2015.05.089.
[9] Dehestani, M., Adolfsson, E., and Stanciu, L.A. Mechanical properties and
corrosion behavior of powder metallurgy iron-hydroxyapatite composites for
biodegradable implant applications. Mater. Des. 109, 556–569 (2016). https://
doi.org/10.1016/J.MATDES.2016.07.092.
[10] Kottuparambil, R.R., Bontha, S., Rangarasaiah, R.M., Arya, S.B., Jana, A., Das,
M., Balla, V.K., Amrithalingam, S., and Prabhu, T.R. Effect of zinc and rare-earth
element addition on mechanical, corrosion, and biological properties of magne-
sium. J. Mater. Res. 33, 3466–3478 (2018). https://doi.org/10.1557/JMR.2018.311.
[11] Drevet, R., Zhukova, Y., Malikova, P., Dubinskiy, S., Korotitskiy, A., Pustov, Y.,
and Prokoshkin, S. Martensitic transformations and mechanical and corrosion
properties of Fe-Mn-Si alloys for biodegradable medical implants. MMTA. 49,
1006–1013 (2018). https://doi.org/10.1007/S11661-017-4458-2.
[12] Tong, X., Zhang, D., Zhang, X., Su, Y., Shi, Z., Wang, K., Lin, J., Li, Y., Lin, J., and
Wen, C. Microstructure, mechanical properties, biocompatibility, and in vitro
corrosion and degradation behavior of a new Zn–5Ge alloy for biodegradable
implant materials. Acta Biomater. 82, 197–204 (2018). https://doi.org/10.1016/
J.ACTBIO.2018.10.015.
[13] Wątroba, M., Bednarczyk, W., Kawałko, J., Mech, K., Marciszko, M., Boelter,
G., Banzhaf, M., and Bała, P. Design of novel Zn-Ag-Zr alloy with enhanced
strength as a potential biodegradable implant material. Mater. Des. 183, 108154
(2019). https://doi.org/10.1016/J.MATDES.2019.108154.
[14] Xia, D., Liu, Y., Wang, S., Zeng, R.C., Liu, Y., Zheng, Y., and Zhou, Y. In vitro
and in vivo investigation on biodegradable Mg-Li-Ca alloys for bone implant
application. Sci. China Mater. 62, 256–272 (2018). https://doi.org/10.1007/S40
843-018-9293-8.
[15] Suryavanshi, A., Khanna, K., Sindhu, K.R., Bellare, J., and Srivastava, R.
Development of bone screw using novel biodegradable composite orthopedic
biomaterial: from material design to in vitro biomechanical and in vivo biocom-
patibility evaluation. Biomed. Mater. 14 (2019). https://doi.org/10.1088/1748-
605X/AB16BE.
[16] Zhang, Z.Y., Guo, Y.H., Zhao, Y.T., Chen, G., Wu, J.L., and Liu, M.P. Effect of
reinforcement spatial distribution on mechanical properties of MgO/ ZK60
nanocomposites by powder metallurgy. Mater. Charact. 150, 229–235 (2019).
https://doi.org/10.1016/J.MATCHAR.2019.02.024.
[17] Yang, H., Jia, B., Zhang, Z., Qu, X., Li, G., Lin, W., Zhu, D., Dai, K., and Zheng,
Y. Alloying design of biodegradable zinc as promising bone implants for load-
bearing applications. Nat. Commun. 11, 1–16 (2020). https://doi.org/10.1038/
s41467-019-14153-7.
[18] Razzaghi, M., Kasiri- Asgarani, M., Bakhsheshi- Rad, H.R., and Ghayour, H.
Microstructure, mechanical properties, and in-vitro biocompatibility of nano-
NiTi reinforced Mg– 3Zn-0.5Ag alloy: Prepared by mechanical alloying for
implant applications. Compos. Part B Eng. 190, 107947 (2020). https://doi.org/
10.1016/J.COMPOSITESB.2020.107947.
[19] Cilla, M., Borgiani, E., Martínez, J., Duda, G.N., and Checa, S. Machine learning
techniques for the optimization of joint replacements: Application to a short-
stem hip implant. PLoS One. 12 (2017). https://doi.org/10.1371/JOURNAL.
PONE.0183755.
[20] Chatterjee, S., Dey, S., Majumder, S., RoyChowdhury, A., and Datta, S.
Computational intelligence based design of implant for varying bone conditions.
Int. J. Numer. Method. Biomed. Eng. 35 (2019). https://doi.org/10.1002/
CNM.3191.
[21] Niculescu, B., Faur, C.I., Tataru, T., Diaconu, B.M., and Cruceru, M. Investigation
of biomechanical characteristics of orthopedic implants for tibial plateau
fractures by means of deep learning and support vector machine classification.
Appl. Sci. 10, 4697 (2020). https://doi.org/10.3390/APP10144697.
[22] Borjali, A., Monson, K., and Raeymaekers, B. Predicting the polyethylene wear
rate in pin-on-disc experiments in the context of prosthetic hip implants: Deriving
a data-driven model using machine learning methods. Tribol. Int. 133, 101–110
(2019). https://doi.org/10.1016/J.TRIBOINT.2019.01.014.
[23] Bedi P., Goyal S.B., Rajawat A.S., Shaw R.N., and Ghosh A. (2022) A framework
for personalizing atypical web search sessions with concept-based user profiles
using selective machine learning techniques. In: Bianchini M., Piuri V., Das S.,
and Shaw R.N. (eds), Advanced Computing and Intelligent Technologies. Lecture
Notes in Networks and Systems, vol. 218. Springer, Singapore. https://doi.org/
10.1007/978-981-16-2164-2_23
17
Predicting the Outcomes of Myocardial
Infarction Using Neural Decision Forest
CONTENTS
17.1 Introduction.............................................................................................. 284
17.2 Neural Decision Forests.......................................................................... 284
17.3 Literature Review..................................................................................... 286
17.4 Research Gap and Objective of the Research....................................... 289
17.5 Data Collection......................................................................................... 289
17.6 Data Analysis............................................................................................ 289
17.6.1 Attribute 113 as FIBR_PREDS................................................... 290
17.6.2 Attribute 114 as PREDS_TAH................................................... 290
17.6.3 Attribute 115 as JELUD_TAH................................................... 290
17.6.4 Attribute 116 as FIBR_JELUD................................................... 291
17.6.5 Attribute 117 as A_V_BLOK.................................... 291
17.6.6 Attribute 118 as OTEK_LANC................................................. 291
17.6.7 Attribute 119 as RAZRIV........................................................... 291
17.6.8 Attribute 120 as DRESSLER...................................................... 291
17.6.9 Attribute 121 as ZSN.................................................................. 291
17.6.10 Attribute 122 as REC_IM........................................................... 292
17.6.11 Attribute 123 as P_IM_STEN.................................................... 292
17.6.12 Attribute 124 as LET_IS............................................................. 292
17.7 Model Training......................................................................................... 292
17.7.1 Parameters Used for Training the Model................................ 292
17.7.2 Parameters for Training a Neural Decision Tree Model....... 293
17.7.3 Parameters for Training a Neural Decision
Forest Model................................................................................ 293
17.8 Results........................................................................................................ 293
17.9 Conclusions............................................................................................... 294
17.1 Introduction
Myocardial infarction (MI) has become a serious silent killer in recent times. It is very hard to predict the likely outcome for a patient suffering from MI. Many factors are involved, such as diabetes, excessive alcohol consumption, high blood pressure, lack of exercise, high blood cholesterol, smoking, and a poor diet. Most patients suffering from MI have coronary artery disease (CAD). MI is usually caused by a coronary artery blockage resulting from the rupture of an atherosclerotic plaque. An early-stage diagnosis with the help of tests such as electrocardiograms (ECGs), blood tests, and coronary angiography may help detect the presence of the disease. In general, chest pain is one of the symptoms of MI; the pain may travel to the shoulder, to the arm, and even to the back. Feeling faint, shortness of breath, cold sweat, and tiredness are a few other symptoms that can be observed in the patient. MI may lead to cardiac arrest, heart failure, or an abnormal heartbeat.
Predicting MI outcomes is quite challenging because many factors affect them. Mortality is high in the first year among patients suffering from acute myocardial infarction (AMI), and a large number of people die from AMI even before reaching a hospital. This is because of the huge uncertainty in predicting MI complications and outcomes: MI can occur with or without complications. At the same time, it is observed that half of the patients with acute or subacute MI have complications that worsen the disease and may even result in the loss of the patient's life. It is hard even for experienced specialists to predict these complications. But with the help of techniques such as deep learning, applied to previous patient data, prediction is feasible to some extent.
FIGURE 17.1
Architecture of the neural decision tree.
…make predictions for the dataset provided as input. Random forests, or random decision forests, create a large number of decision trees at training time. In classification tasks, the class selected by most of the trees is the output of the random forest, whereas in regression, the average prediction of the individual trees is the output. The purpose of using the random forest concept in the NDF is to reduce overfitting to the training dataset. The deep neural decision forest (deep NDF) technique bridges classification trees and the representation-learning approach by training them in an end-to-end manner. Here, the concept of a differentiable, stochastic decision tree model comes into play: it builds on the representation learning organized in the initial layers of a (deep) CNN.
The deep NDF is a large collection of interconnected decision trees, each of which makes a prediction for a given input sample, and the predictions are averaged to obtain the forest prediction. Figure 17.1 shows the architecture of a single neural decision tree. In other words, a random forest uses a divide-and-conquer policy, combining simple models to achieve very high performance.
In the NDF model, a number of neural decision trees are trained at the same
time. The average output of the trees is calculated as the final output of
the NDF.
The NDF consists of many neural decision trees in which each tree has to
learn two types of weights, “pi” and “decision_fn.” “pi” represents the
probability distribution of the classes present in the tree leaves, whereas
“decision_fn” represents the probability of routing to each leaf node. There
are four steps in the working of the neural decision tree. In the first step,
the model takes the input features in the form of a single vector containing
all the features of an instance in the batch; this vector is generated with
the help of a CNN. In the second step, the model randomly selects a subset of
input features using “used_features_mask.” In the third step, for each input
instance taken from the second step, the model computes the probabilities
(mu) of reaching the tree leaves by iteratively performing stochastic routing
through the tree levels. Finally, in the fourth step, we get the final output
by combining the class probabilities at the leaves with the probabilities of
reaching the leaves. A minimal code sketch of these four steps is given below.
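The following sketch illustrates the four steps above in TensorFlow/Keras; the framework choice and all names not quoted in the text (for example, `used_features_rate` and the class names) are illustrative assumptions, not the authors' exact code.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

class NeuralDecisionTree(keras.layers.Layer):
    """One differentiable decision tree following the four steps above."""

    def __init__(self, depth, num_features, used_features_rate, num_classes):
        super().__init__()
        self.depth = depth
        self.num_leaves = 2 ** depth

        # Step 2: fix a random subset of input features for this tree.
        num_used = int(used_features_rate * num_features)
        sampled = np.random.choice(np.arange(num_features), num_used, replace=False)
        self.used_features_mask = tf.constant(
            np.eye(num_features)[sampled], dtype=tf.float32
        )

        # "pi": class-probability distributions stored at the leaves.
        self.pi = tf.Variable(
            tf.random.normal([self.num_leaves, num_classes]), trainable=True
        )
        # "decision_fn": one sigmoid unit per node, giving routing probabilities
        # (nodes are numbered breadth-first from 1; index 0 is unused).
        self.decision_fn = keras.layers.Dense(self.num_leaves, activation="sigmoid")

    def call(self, features):
        batch_size = tf.shape(features)[0]
        # Steps 1-2: keep only the randomly selected features.
        features = tf.matmul(features, self.used_features_mask, transpose_b=True)
        decisions = tf.expand_dims(self.decision_fn(features), axis=2)
        decisions = tf.concat([decisions, 1.0 - decisions], axis=2)

        # Step 3: stochastic routing level by level gives mu, the probability
        # of each instance reaching each leaf.
        mu = tf.ones([batch_size, 1, 1])
        begin, end = 1, 2
        for level in range(self.depth):
            mu = tf.tile(tf.reshape(mu, [batch_size, -1, 1]), (1, 1, 2))
            mu = mu * decisions[:, begin:end, :]
            begin, end = end, end + 2 ** (level + 1)
        mu = tf.reshape(mu, [batch_size, self.num_leaves])

        # Step 4: combine leaf-reaching probabilities with the leaf class
        # distributions to produce the prediction.
        return tf.matmul(mu, keras.activations.softmax(self.pi))

class NeuralDecisionForest(keras.layers.Layer):
    """Averages the predictions of several neural decision trees."""

    def __init__(self, num_trees, depth, num_features, used_features_rate, num_classes):
        super().__init__()
        self.trees = [
            NeuralDecisionTree(depth, num_features, used_features_rate, num_classes)
            for _ in range(num_trees)
        ]

    def call(self, inputs):
        # The forest output is the mean of the individual tree outputs.
        return tf.add_n([tree(inputs) for tree in self.trees]) / len(self.trees)
```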
17.3 Literature Review
Ibrahim et al. [1] state that accurate detection of AMI at an early stage is
crucial for reducing the mortality rate, as it allows the timely provision of
medical intervention. It is already known that machine learning has proved
its potential in aiding the diagnosis of diseases. Ibrahim et al. [1] used
713,447 extracted ECG samples, along with the related auxiliary data, from
the longitudinal and comprehensive ECG-ViEW II database for predicting AMI.
The authors also conducted experiments using XGBoost, a decision-tree-based
model. The research revolves around creating a framework that can be used to
detect AMI at an early stage. The best-performing CNN, recurrent neural
network (RNN), and XGBoost models achieve prediction accuracies of 89.9,
84.6, and 97.5 percent and areas under the ROC curve of 90.7, 82.9, and 96.5
percent, respectively. The importance and value of machine learning
techniques in the prediction of cardiovascular disease are clearly
demonstrated in the paper.
Lenselink et al. [2] studied risk prediction models (RPMs) for CAD built
from CAD-related variables. The use of such predictors in clinical practice
is very limited because of the unavailability of a proper description of the
model, a method for external validation, and head-to-head comparisons. The
authors used the Tufts PACE CPM Registry and a systematic PubMed search to
identify RPMs for CAD prediction, and all the selected models were externally
validated in three large cohorts, namely, UK Biobank, LifeLines, and
PREVEND. The authors took two endpoints, MI as a primary endpoint and CAD as
a secondary endpoint, into consideration for validating every RPM
externally; the latter comprises MI, coronary artery bypass grafting, and
percutaneous coronary intervention. They calculated the C-index (model
discrimination), intercept and regression slope (calibration), and accuracy
(Brier score) to compare the selected RPMs, and used linear regression
analysis to estimate the calibration ability of an RPM. In the paper, 28 RPMs
were selected, but no best-performing RPM was identified, as the C-index of
most of the RPMs was 0.706 ± 0.049, 0.778 ± 0.097, and 0.729 ± 0.074 for the
prediction of MI in the three cohorts.
The physical examination can reveal hypotension; tachypnea and fever can be
common; and, because of distended neck veins, right ventricular failure may
be suspected. Last but not least, if the patient has developed pulmonary
edema, he or she may have wheezing and rales.
Panju et al. [5] focus on the features that help in increasing or decreasing
the probability of AMI. The selected features are the history, the physical
examination of the patient, and ECG data. The ECG data is included because
the doctor usually interprets it as the immediate initial clinical assessment
of the patient's condition. In the first step, patients were placed into
three diagnostic groupings on the basis of acute chest pain, and these
groupings were then contrasted with the categorization of chest pain
according to whether MI was present or not. The symptoms of MI are described
briefly, along with the signs of MI, the mechanism of chest pain, and the
conditions that usually present with other MI-related symptoms. The paper
discusses, in detail, the role of the accuracy and precision of the history,
the physical examination of the patient, and the ECG data in the
identification of MI. The clinical data related to these features, along with
the associated likelihood ratios (LRs), is considered for the prediction
rules for AMI, taken with a broad set of inclusion criteria. The paper
concludes that the most crucial clinical feature is chest pain radiating
toward the arm, which increases the probability of MI. Precisely, patients
with MI are twice as likely to have chest pain radiating toward the left arm
as patients without MI, while the probability of chest pain radiating toward
the right arm is three times higher and the probability of chest pain
radiating toward both arms is seven times higher than in patients without MI.
Rossiev et al. [6] discuss a computer expert system to forecast four
different types of complications that may appear in a patient suffering from
MI during the hospital period. The neural network used in the paper gathers
experience while training on input data from real clinical cases. The chosen
dataset contains four types of attributes that have high priority or are more
crucial. The authors divide the main task into eight subtasks: four binary
subtasks with outcomes of 1 or 0 for having or not having the complication,
respectively, and four subtasks with numerical outputs. For network training,
the backpropagation technique is used; the objective of training is to
minimize the estimation function (toward a global minimum). The paper
concludes that there are great possibilities for creating a neural network
expert system for predicting the complications of MI, and that neural
networks accelerate the process of building such a system because they do not
require mathematical algorithms for solving each task. The output from this
expert system is probabilistic: the doctor can take help from the expert
system by feeding it the input, and make his or her own decision afterward.
17.5 Data Collection
The MI complications dataset is taken from the UCI Machine Learning
Repository [7]. The dataset has 1,700 instances with 124 unique attributes,
but it also has a few missing values (about 7.6 percent). Figure 17.2 shows
the first 20 instances of the dataset, with the header row giving the name of
each attribute, each starting with a unique ID.
Out of the 124 attributes, the first column is the patient's ID, columns
2–112 are used as input data for prediction, and columns 113–124 are the
attributes to be predicted, that is, the possible outcomes (complications).
The prediction is made for the end of the first 72 h after admission to the
hospital. A minimal loading sketch is given below.
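A minimal sketch of loading the dataset and separating inputs from targets, assuming pandas; the file name `MI.data` and the use of “?” for missing values follow the UCI distribution of this dataset, but should be checked against the downloaded copy.

```python
import pandas as pd

# Load the raw UCI file; "?" marks missing values (assumption based on
# the UCI distribution of this dataset).
df = pd.read_csv("MI.data", header=None, na_values="?")

patient_id = df.iloc[:, 0]    # column 1: patient ID
X = df.iloc[:, 1:112]         # columns 2-112: input attributes
y = df.iloc[:, 112:124]       # columns 113-124: complications to predict

print(X.shape, y.shape)       # expected: (1700, 111) (1700, 12)
```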
17.6 Data Analysis
As there are many attributes in the dataset, we discuss only the attributes
that are to be predicted (the complications). We give details regarding the
values of these attributes, their significance in relation to MI,
FIGURE 17.2
First 20 instances of the dataset.
the number of instances, and their percentage in the dataset. Missing values
are not present in the attributes to be predicted (the complications), but
they are present in the remaining attributes that are taken as input to the
model. A brief description of the 12 attributes (113–124) to be predicted is
given below.
“1” stands for “yes” and accounts for 42 instances (2.47 percent), with no
missing values.
17.7 Model Training
For training the model, we used an 80–20 split for training and testing,
respectively. Each of the attributes in columns 113–124 is taken individually
as the target, with columns 2–112 as the input, so a separate prediction is
made for each complication. We measured the performance of the model at
different settings by varying parameters such as the number of epochs, the
number of trees, and the depth of the trees. The objective of the research is
to find the NDF model that gives the highest performance for predicting
attributes 113–124 (the complications) by varying the parameters discussed
below; a sketch of this per-complication training loop follows.
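A hedged sketch of this per-complication procedure, assuming scikit-learn for the split and the `NeuralDecisionTree` layer sketched earlier; the helper name and the simple median imputation are illustrative assumptions, since the chapter does not state how missing inputs were handled.

```python
from sklearn.model_selection import train_test_split
from tensorflow import keras

def train_for_complication(X, y, target_col, num_classes,
                           depth=5, used_features_rate=1.0,
                           epochs=10, batch_size=100):
    """Train and evaluate one model for a single complication column."""
    X = X.fillna(X.median())  # simple imputation (assumption)
    target = y.iloc[:, target_col].values
    X_train, X_test, y_train, y_test = train_test_split(
        X.values, target, test_size=0.2, random_state=42  # 80-20 split
    )
    inputs = keras.Input(shape=(X.shape[1],))
    tree = NeuralDecisionTree(depth, X.shape[1], used_features_rate, num_classes)
    model = keras.Model(inputs, tree(inputs))
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["sparse_categorical_accuracy"],  # metric reported in 17.8
    )
    model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs)
    return model.evaluate(X_test, y_test)  # [loss, accuracy]
```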
17.8 Results
We evaluated our model on the input data with various parameter settings. We
checked the loss and the sparse categorical accuracy of the model and found
the most suitable parameters for the given objective and input data. The best
performance of the model is achieved with a batch size of 100 and 10 epochs.
In the case of a single neural decision tree model, we use a depth of 5 and
take all features as input features (used_feature_rate = 1.0).
In the case of the neural decision forest, we use 10 trees (num_trees = 10)
and a depth of 10, while taking only half of the total input features
(used_feature_rate = 0.5), selected randomly. An illustrative instantiation
follows.
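For illustration, a hedged instantiation of the forest with the settings just listed, reusing the classes sketched earlier; the binary `num_classes = 2` is an assumption for a yes/no complication, as some complication attributes may have more classes.

```python
# 111 input attributes (columns 2-112); parameters from Section 17.8.
forest = NeuralDecisionForest(
    num_trees=10,
    depth=10,
    num_features=111,
    used_features_rate=0.5,   # half of the input features, chosen randomly
    num_classes=2,            # assumption: binary complication (yes/no)
)
```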
The performance of a single neural decision tree model, where we use all
input features (columns 2–112) for the prediction of attributes 113–124
(stated above), is shown in Table 17.1. The number of epochs used is 10, and
the depth of the tree is 5.
The performance of the NDF, which consists of many neural decision trees,
where we use all input features (columns 2–112) for the prediction of
attributes 113–124, is shown in Table 17.2. The number of epochs used is 10,
the number of trees is 5, and the depth of the trees is 5.
TABLE 17.1
Performance of a Single Neural Decision Tree Model
TABLE 17.2
Performance of the Neural Decision Forest Model
17.9 Conclusions
The use of machine learning in the field of health care and medicine has
shown very good results and opens up tremendous possibilities for the benefit
of society. Machine learning also provides clues related to diseases that are
quite difficult to predict with mathematical calculations alone, by using
sophisticated techniques such as NDFs. With the NDF method, the prediction of
the possible outcomes (complications) of a disease becomes easy, and it
provides high accuracy despite missing values in the input data. This gives
patients the opportunity to become aware of the symptoms of the disease in
advance and to prepare accordingly or get treatment as soon as possible.
In future work, more complex machine learning algorithms are required to
increase the efficiency of the prediction models. Moreover, there is a need
to create an algorithm that works efficiently not only for the people of a
particular geographical area but for people from different communities and
locations. To avoid overfitting, the dataset should include patients with a
variety of demographics, lifestyles, ages, and medical histories.
References
[1] Ibrahim L, Mesinovic M, Yang K-W, Eid MA. Explainable prediction of acute
myocardial infarction using machine learning and Shapley values. IEEE Access,
2020;8:210410–210417. doi: 10.1109/ACCESS.2020.3040166.
[2] Lenselink C, Ties D, Pleijhuis R, van der Harst P. Validation and
comparison of 28 risk prediction models for coronary artery disease. Eur. J.
Prev. Cardiol., 2021. doi: 10.1093/eurjpc/zwab095.
[3] Smith LN et al. Acute myocardial infarction readmission risk prediction
models. Circ. Cardiovasc. Qual. Outcomes, 2018;11(1). doi: 10.1161/
CIRCOUTCOMES.117.003885.
[4] Mechanic OJ, Gavin M, Grossman SA. Acute Myocardial Infarction. [Updated
2022 May 9]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls
Publishing; 2022 Jan. Available from: www.ncbi.nlm.nih.gov/books/NBK459269/.
[5] Panju AA, et al. Is this patient having a myocardial infarction? JAMA,
1998;280(14):1256. doi: 10.1001/jama.280.14.1256.
[6] Rossiev DA, Golovenkin SE, Shulman VA, Matjushin GV. Neural networks for
forecasting of myocardial infarction complications. In The Second
International Symposium on Neuroinformatics and Neurocomputers, pp. 292–298.
doi: 10.1109/ISNINC.1995.480871.
[7] https://archive.ics.uci.edu/ml/datasets/Myocardial+infarction+complications
[8] Saleh M, Ambrose JA. Understanding myocardial infarction. F1000Res. 2018
Sep 3;7:F1000 Faculty Rev-1378. doi: 10.12688/f1000research.15096.1. PMID:
30228871; PMCID: PMC6124376.
[9] Wu J, Qiu J, Xie E, Jiang W, Zhao R, Qiu J, Zafar MA, Huang Y, Yu AC.
Predicting in-hospital rupture of type A aortic dissection using Random
Forest. J. Thorac. Dis., 2019;11(11):4634–4646.
[10] Chandler AB, Chapman I, Erhardt LR, et al. Coronary thrombosis in
myocardial infarction. Report of a workshop on the role of coronary
thrombosis in the pathogenesis of acute myocardial infarction. Am. J.
Cardiol. 1974;34(7):823–833. doi: 10.1016/0002-9149(74)90703-6.
[11] Thygesen K, Alpert JS, Jaffe AS, et al.: Third universal definition of myocardial
infarction. Circulation. 2012;126(16):2020–35. doi: 10.1161/CIR.0b013e31826e1058
[12] DeWood MA, Spores J, Notske R, et al. Prevalence of total coronary occlusion
during the early hours of transmural myocardial infarction. N Engl J Med.
1980;303(16):897–902. doi: 10.1056/NEJM198010163031601
[13] Ambrose JA, Najafi A. Strategies for the prevention of coronary artery disease
complications: Can we do better? Am J Med. 2018; pii: S0002–9343(18)30382-6.
doi: 10.1016/j.amjmed.2018.04.006
[14] Bassand JP, Hamm CW, Ardissino D, et al. Guidelines for the diagnosis and
treatment of non-ST-segment elevation acute coronary syndromes: The task force
for the diagnosis and treatment of non-ST-segment elevation acute coronary
syndromes of the European Society of Cardiology. Eur Heart J 2007;28:1598–660.
[15] Mandelzweig L, Battler A, Boyko V, et al. The second Euro Heart Survey
on acute coronary syndromes: Characteristics, treatment, and outcome of
patients with ACS in Europe and the Mediterranean Basin in 2004. Eur Heart J
2006;27:2285–2293.
[16] Sanchis-Gomar F, Perez-Quilis C, Leischik R, Lucia A. Epidemiology of coronary
heart disease and acute coronary syndrome. Ann Transl Med 2016;4:256–256.
[17] Frohlich ED, Quinlan PJ. Coronary heart disease risk factors: public impact of
initial and later-announced risks. Ochsner J 2014;14:532–537.
18
Image Classification Using Contrastive
Learning
CONTENTS
18.1 Introduction.............................................................................................. 298
18.2 Background............................................................................................... 298
18.3 Implementation........................................................................................ 300
18.3.1 Self-Supervised Learning.......................................................... 300
18.3.2 Contrastive Learning................................................................. 300
18.3.3 SimCLR........................................................................................ 301
18.3.3.1 Dataset......................................................................... 302
18.3.3.2 Data Augmentation................................................... 302
18.3.3.3 Extraction of Representation Vectors with
Neural Network Encoder......................................... 303
18.3.3.4 Nonlinear Projection Head....................................... 306
18.3.3.5 Normalized Temperature-Scaled
Cross-Entropy Loss Function for
Contrastive Prediction.............................................. 307
18.3.3.6 Training of ResNet-18 using NT- Xent Loss........... 308
18.4 Results and Discussion............................................................................ 309
18.4.1 Data Augmentation: Original and Augmented Image......... 309
18.4.2 Layers of Neural Network Base Encoder and
Projection Head.......................................................................... 310
18.4.3 Training Losses............................................................................311
18.4.4 Training of ResNet-18 using NT-Xent Loss............................ 312
18.5 Conclusion and Future Work................................................................. 312
18.1 Introduction
The problem of learning visual representations effectively without human
supervision can be addressed by contrastive learning. This work is based on
learning visual representations using the SimCLR framework [1]. Visual
representations are image representation vectors on which supervised linear
or unsupervised classifiers can be trained for accurate image recognition and
classification. These representations can be learned by training deep
learning models like ResNets on labeled datasets like ImageNet. But labeling
and annotating data is a time-consuming and elaborate process and can be
avoided by using alternative learning techniques like self-supervised
learning, where the training data is automatically labeled by finding and
utilizing correlations between various input features.
Contrastive learning of visual representations can be done by efficiently
finding similar and dissimilar images. For understanding contrastive
representation learning, we interpreted the key elements of the SimCLR
framework.
The rest of the chapter describes the background of previous work in the
field of using self-supervised and contrastive learning to achieve
state-of-the-art results; how the SimCLR framework has been used and
implemented in this work; visualization of the results obtained from the
implementation; and the future scope and applications of our work.
18.2 Background
In this chapter, we explored and highlighted how a simple framework for
contrastive learning of visual representations, referred to as SimCLR [1],
can be used to learn useful image representations.
18.3 Implementation
18.3.1 Self-Supervised Learning
The self-supervised learning method is used to train computers to do tasks
without the need to provide labeled data manually (Figure 18.1). It is a
subset of unsupervised learning in which the data is labeled, categorized,
and analyzed by the system itself, and conclusions are drawn by the machine
on the basis of connections and correlations. The system learns and attains
the capability to know the different parts of any object by encoding it, so
that recognition can be done from any angle. Only then can the object be
classified correctly and provide context for analysis to come up with the
desired output.
18.3.2 Contrastive Learning
Contrastive learning is a method for finding similar and dissimilar things by
training an ML model to classify between similar and dissimilar images
(Figure 18.2). Contrastive learning contrasts positive pairs against negative
pairs and thereby learns representations.
FIGURE 18.1
Self-supervised learning results.
FIGURE 18.2
Expected contrastive learning prediction output.
18.3.3 SimCLR
SimCLR learns visual representations by maximizing the agreement between
differently augmented views of the same image via a contrastive loss in the
latent space (Figure 18.3). The framework includes the following major
processes: data augmentation; extraction of representation vectors with a
neural network base encoder; a nonlinear projection head; and a normalized
temperature-scaled cross-entropy (NT-Xent) contrastive loss.
FIGURE 18.3
SimCLR framework.
18.3.3.1 Dataset
We used a manually created ImageNet dataset containing 1,250 images for
training (250 images for each of the five categories) and 250 images for
testing (50 images for each of the five categories). The five categories used
for the SimCLR framework analysis are car, dog, bear, donut, and jean.
18.3.3.2 Data Augmentation
Data augmentation has previously been used for supervised or unsupervised
representation learning [4,6,13] and for contrastive prediction tasks by
making changes in the architecture. In the SimCLR setting, for a batch of
N images, applying the composition of data augmentation operations mentioned
below yields 2N augmented images, as shown in Figure 18.4. Then, for a
particular positive pair of images (i, j) from the 2N images, the other
2(N–1) augmented images are considered negative examples, as displayed in
Figure 18.5.
In our analysis, as ImageNet images are always of different sizes, each of
the images in our dataset went through the first data augmentation operation
and was cropped and resized to size 224 × 224 (Figure 18.6). This was done by
standard random cropping [14], with a random crop size of up to 1.0 in area
of the original size, followed by resizing of the cropped image. Thus, we
obtained the first set of augmented images. This implementation is carried
out in PyTorch and makes use of the RandomResizedCrop class of the transforms
package of the torchvision library, as shown in the pseudocode in Figure
18.7. Additionally, in another data augmentation operation, color distortion
was applied to get another set of augmented images.
FIGURE 18.4
2N images formed from batch of N images on performing data augmentation operations.
FIGURE 18.5
From the 2N images, 2(N–1) images considered as negative pairs.
FIGURE 18.6
Data augmentation highlighted in main SimCLR algorithm.
FIGURE 18.7
Data augmentation displayed in pseudo code.
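A minimal sketch of this augmentation pipeline, assuming torchvision; the color-distortion parameters are illustrative assumptions, since the chapter only names the operations.

```python
import torchvision.transforms as T

# Color distortion (the strength s is an illustrative assumption).
s = 1.0
color_jitter = T.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)

augment = T.Compose([
    # Random crop (up to 1.0 in area of the original) and resize to 224x224.
    T.RandomResizedCrop(size=224),
    T.RandomApply([color_jitter], p=0.8),
    T.ToTensor(),
])

# Each image x yields two correlated views that form a positive pair:
# x_i, x_j = augment(x), augment(x)
```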
FIGURE 18.8
Encoder function for representation vectors in main SimCLR algorithm.
FIGURE 18.9
ResNet-18 with top layer replaced by fully connected layers and last layer replaced with
nonlinear classifier.
FIGURE 18.10
Nonlinear projection head used after representation in main SimCLR algorithm.
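A hedged sketch of the base encoder and nonlinear projection head shown in Figures 18.8–18.10, assuming torchvision's ResNet-18; the class and dimension names are illustrative.

```python
import torch.nn as nn
import torchvision

class SimCLRModel(nn.Module):
    def __init__(self, feature_dim=128):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        in_dim = resnet.fc.in_features          # 512 for ResNet-18
        resnet.fc = nn.Identity()               # drop the classification layer
        self.encoder = resnet                   # f(.): image -> representation h
        self.projection = nn.Sequential(        # g(.): h -> latent vector z
            nn.Linear(in_dim, in_dim),
            nn.ReLU(),
            nn.Linear(in_dim, feature_dim),
        )

    def forward(self, x):
        h = self.encoder(x)     # representation used for downstream tasks
        z = self.projection(h)  # used only for the contrastive loss
        return h, z
```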
FIGURE 18.11
Removal of the last layer and the ReLU layer, and the method to see the t-SNE visualization.
Figure 18.12 shows the removal of the last layer of the projection head; its
effect can be visualized using t-SNE.
FIGURE 18.12
Removal of the last layer of the projection head and the method to see the t-SNE visualization.
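A small sketch of the t-SNE inspection step, assuming scikit-learn and matplotlib; `vectors` and `labels` are hypothetical placeholders for the layer outputs and the image categories.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

vectors = np.random.rand(250, 512)      # placeholder for real layer outputs
labels = np.random.randint(0, 5, 250)   # placeholder category ids

# Project the high-dimensional vectors to 2D and color by category.
emb = TSNE(n_components=2).fit_transform(vectors)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)
plt.title("t-SNE of representation vectors")
plt.show()
```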
FIGURE 18.13
Loss function for similarity maximization between vectors.
FIGURE 18.14
NT-Xent Loss function code.
FIGURE 18.15
Linear classifier code.
We train with a batch size of 256 using a cloud GPU and decay the learning
rate with a cosine decay schedule without restarts. We use the SGD optimizer
with square-root learning rate scaling for analysis purposes, as we have
considered a small batch size and a small number of epochs; training with SGD
becomes unstable with large batch sizes, in which case the LARS optimizer can
be used [1]. Figure 18.14 shows the code for the loss function, where tau is
a temperature hyperparameter that makes the loss function more expressive;
here, the similarity between two vectors a and b is the dot product of their
respective unit vectors a_cap and b_cap (i.e., their cosine similarity).
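A hedged PyTorch sketch of the NT-Xent loss just described (cf. Figure 18.14); the function name and the default tau value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_i, z_j, tau=0.5):
    """Normalized temperature-scaled cross-entropy loss for N positive pairs.

    z_i, z_j: [N, d] projection-head outputs for the two augmented views.
    """
    n = z_i.size(0)
    z = torch.cat([z_i, z_j], dim=0)            # [2N, d]
    z = F.normalize(z, dim=1)                   # unit vectors (a_cap, b_cap)
    sim = z @ z.t() / tau                       # cosine similarities scaled by tau
    sim.fill_diagonal_(float("-inf"))           # exclude self-similarity
    # For index k, the positive example sits N positions away.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```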
Testing accuracy and loss versus number-of-epochs graphs are generated, and
these are discussed in the next section.
TABLE 18.1
Two Images Each from “Donut” and “Bear” Category Obtained by Applying
Composition of Random Crop and Resize and Color Distortion on the Original Picture
The conclusion here is that even though the positive pairs are identified
efficiently, no single transformation obtained after data augmentation is
enough to learn good representations; the composition discussed above,
however, is good for learning generalizable features.
t-SNE visualization of training dataset; t-SNE visualization of testing
dataset.
The conclusions drawn from the above discussion and results are as
follows:
TABLE 18.2
t-SNE Visualizations of the Last Layer Vectors of Train (10% of 1,250 = 125)
and Test (250) Images
TABLE 18.3
t-SNE Visualizations of the Second Last Layer Vectors of Train (10% of
1,250 = 125) and Test (250) Images
FIGURE 18.16
Code to visualize training losses graph.
18.4.3 Training Losses
We modified the architecture of our ResNet-18 model by replacing its top and
last layers with fully connected layers and a classifier, respectively. On
training the ResNet-18 model, the training losses calculated by the NT-Xent
loss function are plotted against the number of epochs used for training
using matplotlib.pyplot (Figures 18.16 and 18.17).
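A minimal sketch of such plotting code (cf. Figure 18.16), assuming matplotlib; the loss values are illustrative placeholders, not measured results.

```python
import matplotlib.pyplot as plt

losses = [4.8, 4.3, 3.9, 3.6, 3.4, 3.2, 3.1, 3.0, 2.9, 2.9]  # placeholder values

plt.plot(range(1, len(losses) + 1), losses)
plt.xlabel("Number of epochs")
plt.ylabel("NT-Xent training loss")
plt.title("Training losses vs. number of epochs")
plt.show()
```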
FIGURE 18.17
Training losses versus no. of epochs graph.
TABLE 18.4
Accuracy and Losses Graphs Plotted While Training a Linear Classifier on 10 Percent
Labeled Training Data
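The linear evaluation behind Table 18.4 can be sketched as follows, assuming PyTorch: a single linear layer trained on frozen encoder representations of the 10 percent labeled subset. The layer size and optimizer settings are illustrative assumptions, not the exact code of Figure 18.15.

```python
import torch
import torch.nn as nn

# Train only a linear classifier on top of frozen representations h = f(x).
classifier = nn.Linear(512, 5)   # 512-d ResNet-18 features, 5 categories
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def linear_eval_step(h, labels):
    """One optimization step on a batch of precomputed representations."""
    optimizer.zero_grad()
    loss = criterion(classifier(h), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```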
References
1. Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton,
“A simple framework for contrastive learning of visual representations”, In
the Proceedings of the 37th International Conference on Machine Learning,
Vienna, Austria, July 2020.
2. Hadsell, R., Chopra, S., and LeCun, Y., “Dimensionality reduction by learning
an invariant mapping”, In 2006 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR’06), volume 2, pp. 1735–1742.
IEEE, 2006.
3. R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal,
Phil Bachman, Adam Trischler, and Yoshua Bengio, “Learning deep
representations by mutual information estimation and maximization”, arXiv
preprint arXiv:1808.06670, August 2018.
4. Philip Bachman, R Devon Hjelm, and William Buchwalter, “Learning
representations by maximizing mutual information across views”, In the
33rd Conference on Neural Information Processing Systems (NeurIPS 2019),
Vancouver, Canada, 2019.
5. Aaron van den Oord, Yazhe Li, and Oriol Vinyals, “Representation learning
with contrastive predictive coding”, arXiv preprint arXiv:1807.03748, 2018.
6. Olivier J. Henaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch,
S. M. Ali Eslami, and Aaron van den Oord, “Data-efficient image recogni-
tion with contrastive predictive coding”, In the Proceedings of the 37th
International Conference on Machine Learning, Vienna, Austria, 2020
7. Zhirong Wu, Yuanjun Xiong, Stella Yu, and Dahua Lin, “Unsupervised fea-
ture learning via non-parametric instance discrimination”, In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3733–
3742, 2018.
8. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick, “Momentum
contrast for unsupervised visual representation learning”, arXiv preprint
arXiv:1911.05722, 2019.
9. Ishan Misra and Laurens van der Maaten, “Self-supervised learning of
pretext-invariant representations”, arXiv preprint arXiv:1912.01991, 2019.
10. Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin
Riedmiller, and Thomas Brox, “Discriminative unsupervised feature learning
with exemplar convolutional neural networks”, In Advances in Neural
Information Processing Systems, pp. 766–774, 2014.
11. Yonglong Tian, Dilip Krishnan, and Phillip Isola, “Contrastive Multiview
coding”, arXiv preprint arXiv:1906.05849, 2019.
12. Ye, M., Zhang, X., Yuen, P. C., and Chang, S.-F. “Unsupervised embedding
learning via invariant and spreading instance feature”, In the Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6210–
6219, 2019.
13. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “Imagenet classi-
fication with deep convolutional neural networks”, In Advances in Neural
Information Processing Systems, pp. 1097–1105, 2012.
14. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew
Rabinovich, “Going Deeper with Convolutions”, In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
15. https://medium.com/analytics-vidhya/understanding-simclr-a-simple-framework-for-contrastive-learning-of-visual-representations-d544a9003f3c.
Index
Note: Page numbers in bold refer to tables and those in italic refer to figures.
t-SNE visualizations: last layer vectors 310; second last layer vectors 311
UniProtKB/Swiss-Prot consortium 134
univariate models 264–6, 266
unsupervised learning 45, 46, 183
unsupervised retrieval of attack profiles (UnRAP) detection technique 153
user-based CA algorithm 142
user-based collaborative filtering (UBCF) 143–4, 144
wave height time series: oceanography buoys 164; two-staged procedure 165
wavelet neuro networks (WNN) 21
wavelets transform (WT) 21
wireless AI: channel coding 247; cognitive radio 247–8; energy-efficient network 247; 5G network slicing 247; M-MIMO and beamforming 246–7; modulation regulation 248
wish-fulfillment theory 95