Anand J. Kulkarni · Suresh Chandra Satapathy (Editors)

Optimization in Machine Learning and Applications
Algorithms for Intelligent Systems
Series Editors
Jagdish Chand Bansal, Department of Mathematics, South Asian University,
New Delhi, Delhi, India
Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee,
Roorkee, Uttarakhand, India
Atulya K. Nagar, Department of Mathematics and Computer Science,
Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms
for intelligent systems with their applications to various real-world problems. It
covers research related to autonomous agents, multi-agent systems, behavioral
modeling, reinforcement learning, game theory, mechanism design, machine
learning, meta-heuristic search, optimization, planning and scheduling, artificial
neural networks, evolutionary computation, swarm intelligence and other algo-
rithms for intelligent systems.
The book series includes recent advancements, modifications, and applications of artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy systems, autonomous and multi-agent systems, machine learning, and other areas related to intelligent systems. The material will be beneficial for graduate students, post-graduate students, and researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to researchers from other fields who are unaware of the power of intelligent systems, e.g., researchers in bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians, and medical practitioners.
The series publishes monographs, edited volumes, advanced textbooks and
selected proceedings.
Editors

Anand J. Kulkarni
Department of Mechanical Engineering
Symbiosis Institute of Technology
Pune, Maharashtra, India

Suresh Chandra Satapathy
School of Computer Engineering
Kalinga Institute of Industrial Technology (KIIT)
Bhubaneswar, Odisha, India
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Chapter 1
Use of Artificial Neural Network for Abnormality Detection in Medical Images
1 Introduction
crossings and rib-vessel crossings, which can be mistaken for malignant tumors. Thus, the precise detection of the cancer depends heavily on reducing such factors that tend to produce false-positive (FP) images and on a considerable increase in the true-positive (TP) results.
There are a number of ways to reduce faulty results. These methods mostly work based on the extraction and classification of features. Some authors tried feature extraction and implemented it with artificial neural networks (ANN) [3, 4]. The critical task is to identify faulty cells; even for experienced radiologists, distinguishing normal and abnormal cells is a difficult and risky job. The tool specifically used by pathologists is whole-slide imaging (WSI), but it lacks automation and feature classification, which are important for the early diagnosis of disease [5].
2 Literature Survey
In [4], nonlinear classifiers are used for analysis. The aim of that research is to carry out experiments to improve image quality; in various medical images, improvement has been achieved by using an adaptive fractional-order derivative, and improved image quality helps accurate diagnosis [13]. A system has been developed for preliminary self-diagnosis; as per the authors' claim, users give details about their symptoms, and the diagnosis is made based on the available database [14]. The authors of [15] considered various images collected from MRI, CT, and PET for analysis. Image segmentation based on a deep convolutional neural network has been used, and different fusion schemes, such as fusing at the feature-learning level, at the classifier level, and at the decision-making level, have been evaluated [15].
3 Proposed Method
There are various techniques that have been used for the classification of cancer cells. Cancer-affected regions have been captured by X-ray images, endoscopy, microscopically zoomed biopsy, etc. In all these methods, the automation needed in diagnosis for better results is lacking. To overcome the drawbacks of the existing systems, a new enhanced and effective methodology is implemented.
The system block diagram is shown in Fig. 1. In this chapter, we propose classification algorithms using an ANN to build a prototype for the automatic classification of a tumor as malignant or benign.
For this research, we need authenticated, unaltered, and unprocessed [13] images on which the algorithms are applied to enhance the precision of classification. The experimental database has been collected from private hospitals and from the Japanese Society of Radiological Technology (JSRT), which is the only open-browsing database for medical image processing research work.
(Fig. 1: system block diagram; the input image passes through the neural network classifier to produce the final result of the application.)
The image database used in the experiment has 24 images with minor growth of cells, 24 images with major growth, and 24 images with tuberculosis, for a total of 72 images. Each image is 512 × 512 pixels in size. The images were obtained by browsing the public database of JSRT [11]. MATLAB software has been used for preprocessing of the data. The scanned image is saved at a size of 512 × 512 pixels. After scanning, the image quality is affected by artifacts such as non-uniform intensity, speckle, and shift. Preprocessing therefore removes the noise present in the scanned images while keeping the essential details of the image [9]; image filtering helps here. Filtering can be done by median filtering, given as f(x, y) = median{g(s, t)}, where the pixel at target coordinates (x, y) is replaced by the median of the pixel values g(s, t) in its neighborhood.

The first step is segmentation, which helps in separating the background from the tumor cells [10]. With the help of segmentation, it is possible to remove the bony structure, which makes analysis easier. In the prescribed method, the affected area is delineated by its peripheral coordinates. The image is binarized by assigning logical value 1 to the field area and logical value 0 to the remaining area. Masks can be used to find edges in the images by checking for discontinuities in the pixel values. The threshold value is taken as the valley point between two peaks on the histogram; this valley gives an approximate threshold for segmenting a nodule from the lung region. It has been observed that unwanted gray pixels also get segmented in this method. Morphological erosion and dilation can be applied to remove these artifacts while maintaining the details necessary for further processing.

In image processing, feature extraction is an important stage that uses algorithms and techniques to detect and isolate the desired parts or shapes (features) of a given image [12]. Parameters such as the affected area and its size have been obtained from the image. Pixels with value 1 denote the segmented tumor. The algorithm is implemented and analyzed using 2 × 2 pixel patterns; the peripheral pixels determine the perimeter of the tumor in the image. In the morphology of tumors, the shape of a tumor is generally circular. A change in shape is identified by the index

I = 4πA / P²

where P is the perimeter of the tumor and A is the area of the tumor in pixels. Based on contrast and texture, cell classification is possible.
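To make the pipeline concrete, the following is a minimal Python sketch of the steps just described (an assumption: the chapter used MATLAB, and Otsu's method here stands in for the histogram-valley threshold; function and variable names are illustrative):

    import numpy as np
    from scipy import ndimage
    from skimage import filters, measure, morphology

    def nodule_features(img):
        # Median filtering: each pixel is replaced by the median of its 3x3 neighborhood.
        denoised = ndimage.median_filter(img, size=3)
        # Histogram-valley thresholding, approximated here by Otsu's method (assumption).
        binary = denoised > filters.threshold_otsu(denoised)
        # Erosion followed by dilation removes small segmentation artifacts.
        cleaned = morphology.binary_dilation(morphology.binary_erosion(binary))
        # Take the largest connected region as the candidate nodule.
        regions = measure.regionprops(measure.label(cleaned))
        nodule = max(regions, key=lambda r: r.area)
        A, P = nodule.area, nodule.perimeter
        # Shape index I = 4*pi*A / P**2: close to 1 for circular regions.
        return A, P, 4.0 * np.pi * A / P ** 2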
For the classification of a tumor as malignant or benign, the nodes of the neural network are the average gray level, standard deviation, smoothness, third moment, uniformity, entropy, contrast, and energy. During the training process on malignant and benign images, the classifier sets a threshold for each of these nodes; upon testing, it classifies the image based on the values obtained, assigning it to the most strongly matched class, i.e., malignant or benign. Classification is done based on the values obtained by calculation using the formulas given in the following equations [1, 3, 6–8]. P(Zi) is an estimate of the probability of occurrence of gray level Zi, and m is the mean gray level. The nth moment of Z about its mean is defined as follows.
M = Σ_{i=0}^{L−1} (Zi − m)^n P(Zi)    (1)

Third Moment = Σ_{i=0}^{L−1} (Zi − m)^3 P(Zi)    (3)

Uniformity, U = Σ_{i=0}^{L−1} P²(Zi)    (4)

Entropy, e = − Σ_{i=0}^{L−1} P(Zi) log2 P(Zi)    (5)
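As an illustration, a minimal Python/NumPy sketch of these texture descriptors, computed from a normalized gray-level histogram (the function name and the 256-level default are assumptions):

    import numpy as np

    def texture_features(img, levels=256):
        hist, _ = np.histogram(img, bins=levels, range=(0, levels))
        p = hist / hist.sum()                          # P(Z_i), estimated from the histogram
        z = np.arange(levels)                          # gray levels Z_i
        m = np.sum(z * p)                              # mean gray level
        third_moment = np.sum((z - m) ** 3 * p)        # Eq. (3)
        uniformity = np.sum(p ** 2)                    # Eq. (4)
        nonzero = p[p > 0]                             # avoid log2(0) in the entropy sum
        entropy = -np.sum(nonzero * np.log2(nonzero))  # Eq. (5)
        return m, third_moment, uniformity, entropy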
The proposed system will not only classify images as malignant or benign but will also detect and diagnose malignancy. By combining deep learning and data mining techniques, a fully automated, high-precision system can be developed for the detection, classification, and even diagnosis of cancer. Figure 3 shows the flowchart of the complete system functioning.
The results obtained are shown in Figs. 4, 5, 6, 7, 8, 9, 10, and 11 and in Tables 1 and 2. From these results, it is concluded that the artificial neural network classifier can distinguish benign from malignant X-ray images more accurately. Table 1 gives the comparison for benign images, and Table 2 gives the details for malignant images. The comparison considers various parameters such as average gray level, standard deviation, smoothness, third moment, uniformity, and entropy. In the first step of the algorithm, the image is captured by scanning, which is part of preprocessing. Scanning may introduce noise into the image; hence, the image is filtered and converted to gray form, as shown in Fig. 4. Image segmentation is one of the most important steps to separate the lung area from the rib structure; for better analysis and diagnosis, the shoulder bones, ribs, etc., must be filtered out. Segmentation separates the portion of interest from the complete image, as shown in Fig. 5. This segmentation has been carried out for benign and malignant cases, as shown in Figs. 6 and 7. For experimental purposes, a benign image has been taken, as shown in Fig. 8; the segmentation process applied to this image gives the results shown in Fig. 9. Figures 10 and 11 give the details of the GUI developed for user interfacing.
We can add more nodes or neurons to this network for further precision in the analysis and classification. Merging data mining and deep learning will make this prototype a full-fledged automated cancer classifier and detector. The key feature of the implemented algorithm is information processing using artificial intelligence: an artificial neural network contains a large number of interconnected elements called neurons, which can be trained to solve specific problems.
References
10. Johra FT, Shuvo MMH (2016) Detection of breast cancer from histopathology image and
classifying benign and malignant state using fuzzy logic. In: 2016 3rd international conference
on electrical engineering and information communication technology (ICEEICT). Dhaka, pp
1–5
11. Economou GK et al (1994) Medical diagnosis and artificial neural networks: a medical expert system applied to pulmonary diseases. In: Proceedings of the IEEE workshop on neural networks
for signal processing. Ermioni, Greece, pp 482–489
12. Lenzi C, Pasian M, Bozzi M, Perregrini L, Caorsi S (2017) MM-waves modulated Gaussian
pulse radar breast cancer imaging approach based on artificial neural network: preliminary
assessment study. In: 2017 Mediterranean microwave symposium (MMS). Marseille, pp 1–4
13. Krouma H, Ferdi Y, Taleb-Ahmed A (2018) Neural adaptive fractional-order differential-
based algorithm for medical image enhancement. In: 2018 international conference on signal,
image, vision and their applications (SIVA). Guelma, Algeria, pp 1–6
14. Aljurayfani M, Alghernas S, Shargabi A (2019) Medical self-diagnostic system using artificial
neural networks. In: 2019 international conference on computer and information sciences
(ICCIS). Sakaka, Saudi Arabia, pp 1–5
15. Guo Z, Li X, Huang H, Guo N, Li Q (2018) Medical image segmentation based on multi-modal
convolutional neural network: study on image fusion schemes. In: 2018 IEEE 15th international
symposium on biomedical imaging (ISBI 2018), Washington, DC, pp 903–907
Chapter 2
Deep Learning Techniques for Crime Hotspot Detection
1 Introduction
Preventing crimes is more profitable to a society than solving crimes after they occur. It is essential for police forces across the world to have prior knowledge of the probable locations of future crimes, for more efficient utilization of
police resources. Hotspot analysis is a major part of crime mapping studies. A crime
hotspot is defined as an area that has a greater than the average number of crimi-
nal or disorder events, or an area where people have a higher than average risk of
victimization. Accuracy and time complexity are the two major constraints associ-
ated with this problem, since real-time results are necessary for a quick response to
changing conditions. Also, accuracy is hard to achieve since the problem depends
on many dynamic parameters, like gang behavior. Statistical approaches to the prob-
lem are more time consuming and are not able to provide the real-time results that
are needed. A deep learning-based approach would provide much faster results for
a better response from the police force. Also, accuracy would be improved due to the
ability of the deep neural network to find complex relations that are hidden in the
raw data.
Deep learning techniques have been successfully used in similar applications due to their ability to find complex relationships between high-dimensional inputs and the corresponding outputs, and they have been proven to work well on huge datasets with large numbers of parameters. The input datasets are made correlated by introducing some overlap in the time intervals of consecutive datasets; this correlation helps the deep neural network identify the features of the dataset much better. The raw dataset was converted into heat maps so that two-dimensional convolutional filters could be applied to find both local and more global features of the dataset.
Crime analysis involves exploiting data about crimes to enable law enforcement to
better apprehend criminals and prevent crimes. Data used by crime analysts includes
the time and locations of crimes and a variety of characteristics, such as methods of
entry and items stolen, that vary with the type of crime. Crime analysts use these data
with methodologies like aggregate crime rate analysis, hotspots, and space-time point
process modeling to analyze and predict the spatial patterns of crimes. Ring-shaped
hotspot detection is important for a variety of application domains where finding a
ring-shaped hotspot may help focus domain users’ efforts on a specific region. For example, finding a ring-shaped hotspot may focus public security officials’ efforts on the inner circle of a ring when searching for a possible crime source [1, 2]. Various
criminal profiling and criminal behavioral studies [3, 4] have been used as the basis
for modeling the probability of a site being selected by criminals as their preferred
locations for a crime.
Spatial scan statistics are used to determine hotspots in spatial data and are widely
used in epidemiology and bio-surveillance. In [5], experiments regarding the compu-
tational study of spatial scan statistics were performed. An algorithm was proposed
to find the largest discrepancy region in a domain. Approximation algorithms were
developed using these discrepancy functions, which could be used for spatial scan
analysis of crime locations.
Aggregate crime rate analysis uses sample units, such as neighborhoods, cities,
and schools, to explain the variation in crime rates across those units. The statistical
basis of this analytic approach is well established, and it is modeled as a regression
problem. Many regression methods, especially Poisson regressions, are well studied
and broadly used [6–8].
Hotspot models use past crime data to identify unusual clusters of criminal
incidents within a well-defined region. These clusters are commonly referred to
as hotspots representing areas that contain unusual amounts of crimes. The term
‘hotspots’ has become part of the lexicon of crime analysts. In [9], the computational aspects of spatial scan statistics are studied extensively. First, an exact algorithm
for finding the largest discrepancy region is described. Then, a new approximation
algorithm is proposed for a large class of discrepancy functions to improve approx-
imation. A survey of the existing techniques for the identification of geographic
hotspots for crimes and other applications has been given in some papers [10, 11].
In [12], spatiotemporal hotspots are selected through density estimation techniques
using both kernel methods and mixture models.
Various studies have also been conducted on the effects of sociological and envi-
ronmental parameters on crime rates. Cohen and Felson [13] postulate, based on human ecological theory, that crime rates rise as activities disperse away from households. In [14], large data samples of crimes have
been collected, and geographic location-based profiling has been done for various
types of crimes. In [15, 16], the authors investigate various hidden aspects like the
physical and social characteristics of crime sites and people’s perceptions of crime
locations and the policies that create or maintain these locations.
In [17], the author defines various types of hotspots and elaborates on the accepted
theories about the probable root causes for the occurrence of hotspots. Chainey and
Dando [18] discuss the different statistical tests and other techniques to effectively
identify the crime hotspots. Four different datasets were used to compare the different
proposed statistical methods. Some spatial analysis tools to closely study the spatial
patterns and locational contexts of crime are examined in [19]. These tools are being
used by police forces in various parts of the world with different levels of success.
2 Proposed Methodology
The block diagram in Fig. 1 describes the various stages by which the experiments
proposed in this paper have been implemented. The raw data consists of the date on which each crime occurred and the corresponding geographical location of each crime, as defined by the latitude and longitude where the crime occurred. The prepro-
cessing of the dataset is done by dividing the crimes into many sets based on the
date of occurrence of the crimes. Each set includes crimes that have occurred in a
two-week period. Also, two sets that have consecutive two-week intervals have an
overlapping period of one week.
Fig. 1 Block diagram describing various steps involved in the proposed idea
The next step is the generation of heat maps for each of these two-week intervals.
The intensity gradation of each pixel is a grayscale value proportional to the number
of crimes that happened in the area of geographic location corresponding to the
pixel. The best possible circular and ring-shaped hotspots have been identified for
each heat map, using the metric log likelihood ratio (LLR). Once it was identified
that the circular hotspot had better performance, it was chosen to train a deep neural
network which takes the heat maps of past timeframes as inputs and gives the best
circular hotspot of the immediate next timeframe as the output.
A heat map is a thematic map in which areas are shaded or patterned in proportion
to the measurement of the statistical variable being displayed on the map, such as
population density or per capita income. Heat maps provide an easy way to visualize
how a measurement varies across a geographic area or show the level of variability
within a region.
Heat maps are made use of to represent the crimes in a 2-D map projection based
on the location of crime. The data is pre-processed before converting into heat maps.
The dataset contains all crimes that occurred between January 1, 2010, and July 31,
2018, a period of more than 8½ years. The parameters for each crime include date
and time of occurrence, latitude and longitude of the location. The preprocessing
involves splitting of the dataset into various sets based on the date of occurrence of
the crimes. Each set consists of crimes that happened within a time interval of 14 days,
and consecutive sets have an overlap of 7 days. For example, the first set contains
crimes between January 1, 2010, and January 14, 2010. The second set contains
crimes between January 8, 2010, and January 21, 2010, and so on.
The total area in which the crimes in a set occurred is called an activity set. The
area is divided into a grid of size P × P. Now, each crime is mapped onto a pixel
on the P × P activity area, corresponding to its actual location on the map. One
pixel could have more than one crime happening there. So, the grayscale activity
area is graded according to the number of crimes happening there. Lighter shades
of gray indicate a smaller number of crimes, and darker shades of gray indicate more
crimes. White-colored pixels indicate locations without any crimes, and black pixels
are locations where maximum crimes have occurred.
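A minimal sketch of this windowing and heat-map construction in Python with pandas and NumPy (the column names 'date', 'lat', and 'lon' and the function name are assumptions, not from the chapter):

    import numpy as np
    import pandas as pd

    def crime_heatmaps(df, P=50, window_days=14, stride_days=7):
        # Split crimes into overlapping 14-day windows (7-day stride) and bin
        # each window into a P x P count matrix over the fixed study area.
        lat0, lat1 = df['lat'].min(), df['lat'].max()
        lon0, lon1 = df['lon'].min(), df['lon'].max()
        maps, t = [], df['date'].min()
        while t + pd.Timedelta(days=window_days) <= df['date'].max():
            win = df[(df['date'] >= t) &
                     (df['date'] < t + pd.Timedelta(days=window_days))]
            counts, _, _ = np.histogram2d(win['lat'], win['lon'], bins=P,
                                          range=[[lat0, lat1], [lon0, lon1]])
            maps.append(counts)   # larger counts = darker heat-map pixels
            t += pd.Timedelta(days=stride_days)
        return maps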
2.2 Hotspots
A ring-shaped hotspot R is uniquely defined by its inner radius (denoted by r0 ), outer radius (r1 ),
and coordinates of the center of the concentric circles (xr , yr ). Similarly, a circular
hotspot C is uniquely defined by its radius (denoted by rc ) and its center coordinates
(xc , yc ). The statistical metric used to find the best hotspot for each heat map is log
likelihood ratio (LLR).
For an activity area of size P × P, all possible circular hotspots are considered where the radius rc varies between 5 and P/2 pixels. The center coordinates are also varied between 0 and P. Similarly, for ring-shaped hotspots, the inner radius r0 varies from 5 to P/2, and the outer radius r1 varies from r0 + 5 to P/2. The best hotspot for each
shape is selected from this set of hotspots using the metric LLR.
The likelihood ratio expresses how many times more likely the data is under one
model than the other. This likelihood ratio, or equivalently its logarithm, can then be
compared to a critical value to decide whether or not to reject the null model. When
the logarithm of the likelihood ratio is used, the statistic is known as a log likelihood
ratio statistic.
The equation for computing the log likelihood ratio is defined as

LLR = log[ (C/B)^C × ((A − C)/(A − B))^(A−C) ]    (1)

where

B = (A × area(R)) / area(S)
for a null hypothesis that the crime points(activities) are distributed uniformly across
the activity area S. In the above equation, B denotes the expected number of activities
within a given hotspot R, C denotes the observed number of activities within the
hotspot, and A denotes the total number of activity points present in the whole
activity area S (refer Appendix for detailed derivation).
The likelihood ratio is the product of two terms; the first term, (C/B)^C, denotes the likelihood ratio of the crime activities inside the hotspot, and the second term, ((A − C)/(A − B))^(A−C), denotes the likelihood ratio of the crime activities outside the hotspot. Thus, the product
gives the likelihood ratio of actual distribution as against the distribution of the null
hypothesis. The null hypothesis assumes that the distribution of crime points across
the activity area follows uniform distribution, since no prior information about any
clustering is available.
Higher values of log likelihood ratio for a given hotspot indicate that the distribu-
tion of crimes within the hotspot is higher when compared to the expected number
of crimes according to the null hypothesis of uniform distribution. If the distribution
exactly matches the null hypothesis, the LLR value computed would be 0. A negative
LLR for a given hotspot indicates that the given hotspot has fewer crimes
than the expected number of crimes. A hotspot is considered to be valid only if the
given hotspot has a positive LLR value. The best hotspot is considered to be the
hotspot for which the LLR is maximum for the given heat map. In this manner, the
best hotspots are computed for all the heat maps. For each heat map, both ring-shaped
and circular hotspots are computed.
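For illustration, a minimal Python/NumPy sketch of this exhaustive scan for circular hotspots, using Eq. (1) in the expanded log form derived in the Appendix (function names are illustrative; the scan is slow but direct):

    import numpy as np

    def llr(C, B, A):
        # Log likelihood ratio of Eq. (1) in expanded form (see Appendix);
        # guard the cases where a log term would be undefined.
        if C <= B or C <= 0 or C >= A:
            return -np.inf
        return C * np.log(C / B) + (A - C) * np.log((A - C) / (A - B))

    def best_circular_hotspot(heat):
        # Exhaustive scan over radius and center; heat is a P x P count matrix.
        P = heat.shape[0]
        A = heat.sum()
        yy, xx = np.mgrid[0:P, 0:P]
        best_score, best_params = -np.inf, None
        for r in range(5, P // 2 + 1):
            for xc in range(P):
                for yc in range(P):
                    mask = (xx - xc) ** 2 + (yy - yc) ** 2 <= r ** 2
                    C = heat[mask].sum()
                    B = A * mask.sum() / (P * P)   # expected count under H0
                    score = llr(C, B, A)
                    if score > best_score:
                        best_score, best_params = score, (r, xc, yc)
        return best_score, best_params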
A deep neural network is a type of artificial neural network with a large number of
hidden layers. The advantage of a deep neural network is that it can find very complex
relations between input and output. The lower-level layers of a DNN identify low-
level features like curves and edges that are more local in nature. The higher-level
layers can identify increasingly global features which are more complex. Such a
technique is particularly effective in cases where the dataset is huge and very complex.
From the results obtained, it was concluded that circular hotspots had a better
performance compared to ring-shaped hotspots. So, circular hotspots were used in
the training phase of the deep neural network.
The input layer to the DNN consists of a matrix that would contain the grayscale
heat maps of an arbitrary number of consecutive intervals. The output layer of the
DNN is to be a 3 × 1 vector which contains the parameters (radius and center coor-
dinates) that represent the best hotspot for the immediately next future interval.
The rectified linear unit (ReLU) activation function is defined as f(x) = x⁺ = max(0, x).
• Maxpool Layer: For example, assume a 100 × 100 matrix representing the initial input and a 2 × 2
filter that runs over the input. A stride of 2 means the (d x, dy) for stepping over
the input will be (2, 2), and will not overlap regions. Then, the resulting output
will be a 50 × 50 matrix. For each of the regions represented by the filter, we will
take the maximum value of that region and create a new, output matrix where each
element is the max of a region in the original input.
• Dropout Layer: Dropout refers to ignoring units (i.e., neurons) during the training
phase of certain set of neurons which is chosen at random. These units are not
considered during a particular forward or backward pass. At each training stage,
individual nodes are either dropped out of the net with probability 1 − p or kept with
probability p, so that a reduced network is left; incoming and outgoing edges to a
dropped out node are also removed. Dropout layers are used to prevent overfitting
in the neural network. A fully connected layer occupies most of the parameters,
and hence, neurons develop codependency among each other during training, which curbs the individual power of each neuron, leading to overfitting of the training data.
Due to overfitting, the neural network will give a good performance for training
data, but performs poorly for other data inputs, including testing data. In this
experiment, the dropout layers are employed with a dropout probability of 20%.
Also, the deep neural network employed in this project consists of repetition of
2-D convolutional layers, maxpool layers, and dropout layers.
• Flattening Layer: The flattening step is needed to make use of fully connected
layers after some convolutional layers. Fully connected layers do not have a local
limitation like convolutional layers (which only observe some local part of an
image by using convolutional filters). This means we can combine all the found
local features of the previous convolutional layers. Each feature map channel in
the output of a CNN layer is a flattened 2-D array created by adding the results
of multiple 2-D kernels (one for each channel in the input layer). For example, a
flattening layer converts a 25 × 25 × 8 three-dimensional layer into a 5000 × 1
one-dimensional layer. A flattening layer is usually used before a fully connected
layer.
• Fully Connected Layer: A fully connected layer is placed just before the output
layer. All neurons from the previous layer are connected to all neurons in the output
layer. The activation function used is the rectified linear unit function. The output
layer consists of three neurons, for the parameters (rc , xc and yc ) representing the
predicted circular hotspot which is the expected output of the deep neural network.
The deep neural network modeled in this project tackles a regression problem where
the target output is a 3 × 1 vector. The performance of the deep neural network is
analyzed by the predicted output for testing data as compared to the corresponding
target outputs. The performance metrics used to evaluate the neural network in this
project are defined below.
• Mean Square Error (MSE): The mean squared error (MSE) squares the differ-
ence of all corresponding elements of target vector and predicted vector before
summing them all. The equation below defines the mean squared error.
MSE = (1/n) Σ (y − ŷ)²    (2)
where y is the target (actual) output vector, ŷ is the predicted output vector, and n is
the total number of data points. The effect of the square term in the MSE equation
is most apparent with the presence of outliers in the data. Each residual in MSE
contributes quadratically to the total mean squared error. This ultimately means
that outliers in the data will contribute to much higher total error in the MSE, as
compared to mean absolute error. Similarly, the model will be penalized more for
making predictions that differ greatly from the corresponding actual value. This
is to say that large differences between actual and predicted are punished more in
MSE than in MAE.
• Mean Absolute Error (MAE): The mean absolute error (MAE) is the simplest
regression error metric. The residual for every data point is calculated by taking
only the absolute value of each so that negative and positive residuals do not can-
cel out. Then, the average of all these residuals is calculated. Effectively, MAE
describes the typical magnitude of the residuals. The formal equation for mean
absolute error is
MAE = (1/n) Σ |y − ŷ|    (3)
The MAE is also the most intuitive of the metrics since we only observe the
absolute difference between the data and the model’s predictions. Because we use
the absolute value of the residual, the MAE does not indicate underperformance
or overperformance of the model (whether or not the model under or overshoots
actual data). Each residual contributes proportionally to the total amount of error,
meaning that larger errors will contribute linearly to the overall error. A small
MAE suggests that the model is great at prediction, while a large MAE suggests
that the model may have trouble in certain areas. A MAE of 0 means that the
model is a perfect predictor of the outputs. While the MAE is easily interpretable,
using the absolute value of the residual often is not as desirable as squaring this
difference. Depending on how the model should treat outliers, or extreme values,
in your data, you may want to bring more attention to these outliers or downplay
them. The issue of outliers can play a major role in which error metric you use.
MAE requires more complicated tools such as linear programming to compute
the gradient. MAE is more robust to outliers since it does not use the square. On the other hand, MSE is more useful when large errors, whose consequences are much bigger than those of equivalent smaller ones, are a concern.
• Mean Absolute Percentage Error (MAPE): The mean absolute percentage error
(MAPE) is the percentage equivalent of MAE. The equation looks just like that of
MAE, but with adjustments to convert everything into percentages. The equation
for MAPE is

MAPE = (1/n) Σ |(y − ŷ)/y| × 100%    (4)
Just as MAE is the average magnitude of error produced by your model, the MAPE
is how far the model’s predictions are off from their corresponding outputs on aver-
age. Like MAE, MAPE also has a clear interpretation since percentages are easier
for people to conceptualize. Both MAPE and MAE are robust to the effects of
outliers thanks to the use of absolute value. However, for all of its advantages,
MAPE is a weaker measure when compared to MAE. Many of MAPE’s weak-
nesses actually stem from the use of the division operation. Because everything is scaled by the actual value, MAPE is undefined for data points where the value
is 0. Similarly, the MAPE can grow unexpectedly large if the actual values are
exceptionally small themselves. Finally, the MAPE is biased toward predictions
that are systematically less than the actual values themselves. That is to say, MAPE
will be lower when the prediction is lower than the actual compared to a prediction
that is higher by the same amount.
• Cosine Proximity: Cosine proximity is same as cosine similarity, which is a mea-
sure of similarity between two nonzero vectors of an inner product space that
measures the cosine of the angle between them. In this case, note that unit vectors
are maximally similar if they are parallel and maximally dissimilar if they are
orthogonal (perpendicular). This is analogous to the cosine, which is unity (max-
imum value) when the segments subtend a zero angle and zero (uncorrelated)
when the segments are perpendicular. Cosine proximity loss function computes
the cosine proximity between the predicted value and actual value, which is defined
as follows:
CP = − (y · ŷ) / (‖y‖ ‖ŷ‖)    (5)
The bounds between 0 and 1 apply for any number of dimensions, and the cosine
similarity is most commonly used in high-dimensional positive spaces. One advan-
tage of cosine similarity is its low complexity, especially for sparse vectors: only
the nonzero dimensions need to be considered.
• Percentage Deviation of Log Likelihood Ratio: Another method used in this
project to evaluate the performance of prediction of the deep neural network is
to compare the log likelihood ratio (LLR) of the predicted hotspot and the actual
target hotspot for all testing data. The target hotspot is the best hotspot computed
using the LLR metric and represents the best hotspot for the given input, because
that hotspot has the highest value of LLR among all possible locations. So, the
LLR value of the predicted hotspot would always be less than or equal to (in the best case)
the LLR value of actual hotspot.
If the predicted hotspot is perfectly equal to the target hotspot, then both LLR values
would be same, and the difference in LLR values would be zero. The difference in
LLR values gives a measure of how efficient the predicted hotspot is, as compared
to the best possible hotspot. The error is measured as a percentage of the LLR
of target hotspot, for better comparison. A low percentage in deviation of LLR
values shows that the predicted hotspot is almost as efficient as the actual hotspot,
even if the deviation in radius and center coordinates of hotspot is larger. To find
the mean percentage deviation of LLR, the percentage deviation values for every
testing data are assumed to be the sample outcomes from a Gaussian distributed
random variable. Then, the mean of the above defined Gaussian random variable
would give the mean percentage deviation of log likelihood ratio between actual
and predicted hotspots.
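For reference, a compact Python/NumPy sketch of the metrics in Eqs. (2)–(5) (the function name is illustrative; y and ŷ are stacked target and predicted vectors):

    import numpy as np

    def regression_metrics(y, y_hat):
        err = y - y_hat
        mse = np.mean(err ** 2)                 # Eq. (2)
        mae = np.mean(np.abs(err))              # Eq. (3)
        mape = np.mean(np.abs(err / y)) * 100   # Eq. (4); undefined where y == 0
        cos_prox = -np.sum(y * y_hat) / (np.linalg.norm(y) * np.linalg.norm(y_hat))  # Eq. (5)
        return mse, mae, mape, cos_prox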
The dataset has been divided into 452 different sets based on the date on which each
crime has occurred. The dataset contains crimes that were reported in the city of
Los Angeles, California from January 1, 2010, to July 31, 2018. The dataset has
been divided into sets where each set contains all crimes that have occurred over a
14-day period and with consecutive sets having an overlap of a 7-day interval. This
timeline-based approach is useful in creating a deep learning network which will use
past crime data to predict the future hotspots.
Figure 2 shows heat maps from ten consecutive sets over the time period between
January 4, 2016, and March 20, 2016. It has been observed that there is an average
of 8468 crimes within a two-week period. From the heat maps shown in Fig. 2, we
can see that although there is an observable variation between distribution of crimes
between different time intervals, there is also a significant correlation between heat
maps of consecutive time intervals. This correlation is achieved as a result of the
overlap of seven days introduced between consecutive heat maps. This ensures that
around half of the crime locations are the same for consecutive heat maps, which results in a high correlation. This correlation is introduced to help the deep neural network find the time-related variations in crime locations more efficiently, and subsequently to predict more efficiently the future hotspot, which will have a strong correlation with the past heat maps.
With the 452 heat maps that have been generated from the dataset, the next step
involved is to calculate the best hotspot for each heat map. In this experiment, both
ring-shaped hotspots and circular hotspots have been investigated and compared. A
ring-shaped hotspot is defined by its inner radius, outer radius, and center coordinates.
Similarly, a circular hotspot is uniquely defined by its radius and center coordinates.
The metric used to determine the best hotspot is log likelihood ratio. The best hotspot
is determined by calculating the log likelihood ratio for all possible hotspots by
scanning across the heat map.
For a ring-shaped hotspot, the heat map is scanned for all possible combinations
of inner and outer radii, and center coordinates. The inner radius is varied between
5 and 20, the outer radius is varied between 10 and 25, with the gap between inner
and outer radii varying between 5 and 20. Also, both x and y values of the center
coordinates are varied between 10 and 40, thus ensuring that all possible hotspot
locations are considered. Similarly, for a circular hotspot, the radius is varied for
all values between 5 and 25, and the x and y coordinates of the center are varied
independently between 5 and 45. The log likelihood ratio has been calculated for all
these possible hotspots, and a hotspot is considered valid only if the LLR value is
positive. The hotspot with the largest LLR value among the valid hotspots has been
selected as the best hotspot for the corresponding heat map.
Figure 3 illustrates the best ring-shaped and circular hotspots for various heat
maps, as computed using the log likelihood ratio. The figure also denotes the LLR
value for the best hotspots. It is observed that the circular hotspot generally has
higher LLR values as compared to ring-shaped hotspots. Table 1 shows a comparison
of performance between the circular and ring-shaped hotspots.
Fig. 3 Illustration for various heat maps and the corresponding best ring-shaped hotspots and
circular hotspots
Table 1 Hidden layers of the deep neural network, along with the related parameters that define each layer, like the activation function. The number of past heat maps used to predict the future hotspot is fixed as 8, so the input matrix has a size of 100 × 200 (by concatenating 8 images of size 50 × 50 each)
Layer No. Layer name Layer input size Layer output size
1 2-D convolution (5 × 5 × 32) 100 × 200 100 × 200 × 32
2 Maxpool (1 × 2) 100 × 200 × 32 100 × 100 × 32
3 Dropout (20%) 100 × 100 × 32 100 × 100 × 32
4 2-D convolution (3 × 3 × 16) 100 × 100 × 32 100 × 100 × 16
5 Maxpool (2 × 2) 100 × 100 × 16 50 × 50 × 16
6 Dropout (20%) 50 × 50 × 16 50 × 50 × 16
7 2-D convolution (3 × 3 × 8) 50 × 50 × 16 50 × 50 × 8
8 Maxpool (2 × 2) 50 × 50 × 8 25 × 25 × 8
9 Dropout (20%) 25 × 25 × 8 25 × 25 × 8
10 Flatten 25 × 25 × 8 5000 × 1
11 Fully connected ReLU 5000 × 1 3 × 1
It is concluded that the circular hotspots have a better performance than ring-shaped hotspots, based on the higher average LLR value, and also take significantly less time to compute as compared to ring-shaped hotspots. This helps in providing better real-time results, as computational time is also a constraint in this application. So, only circular hotspots are considered for the subsequent phases of experiments.
The heat maps serve as input for the deep learning network implemented in the
next phase of the experiment. The network is designed to take an arbitrary number
of consecutive heat maps as input and give the hotspot parameters of the next future
heat map as the output. So, the heat maps and corresponding circular hotspots are
split into training data and testing data. Out of 452 images, 402 are used for training,
and 50 are used for testing.
The deep neural network is trained using the 402 heat map images. Four different
networks are trained, with varying number of heat maps used to predict the next
hotspot. The number of heat maps used to predict the future hotspot is varied as
4, 6, 8, and 12, and the performance of the four networks is compared. The deep
neural network has three neurons in the output layer corresponding to the radius and
center coordinates of the output circular hotspot. So, the target outputs are similarly
arranged as a 3 × 1 vector.
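A hedged sketch of the network in Table 1 using Keras (an assumption: the chapter does not name its framework), with the input formed by concatenating 8 heat maps of size 50 × 50 into a 100 × 200 image:

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(100, 200, 1)),            # 8 stacked 50x50 heat maps
        layers.Conv2D(32, (5, 5), padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=(1, 2)),       # 100x200x32 -> 100x100x32
        layers.Dropout(0.2),
        layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),       # 100x100x16 -> 50x50x16
        layers.Dropout(0.2),
        layers.Conv2D(8, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),       # 50x50x8 -> 25x25x8
        layers.Dropout(0.2),
        layers.Flatten(),                            # 25x25x8 -> 5000
        layers.Dense(3, activation='relu'),          # (r_c, x_c, y_c)
    ])
    model.compile(optimizer='adam', loss='mse', metrics=['mae', 'mape'])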
Figure 4 shows some visual examples of the predictions made by the deep neural
networks and a comparison of the performance based on log likelihood ratio values.
Table 2 gives a comparison between the four different neural networks that were
trained using different sample sizes of past data for every iteration of training. The
performance of a neural network for regression is compared using various regression
metrics like mean squared error, mean absolute error, mean absolute percentage error,
cosine proximity, and mean percentage deviation of log likelihood ratio (Table 3).
Fig. 4 Illustration for various heat maps with actual and predicted hotspots
Table 3 Comparing performance of different deep neural networks, with varying number of previous heat maps as inputs, denoted by N = 4, 6, 8, 12

        MSE     MAE    MAPE (%)  Cosine proximity (%)  LLR % deviation (%)  Training time (h)
N = 4   13.092  4.412  56.6      36.9                  42.51                1.85
N = 6   7.850   3.669  43.09     43.65                 36.72                3.20
N = 8   7.197   3.200  41.18     46.5                  35.48                4.36
N = 12  7.334   3.382  41.78     44.93                 36.33                6.82
A methodology based on deep learning has been proposed to predict the best circular hotspot for a future timeframe using the heat map distributions of crimes in past timeframes. The results show that the deep learning approach can provide significantly good prediction results.
This paper provides a clear direction for future researchers in the domain of crime
hotspot analysis and related topics. Metrics other than LLR can also be applied to
compare different hotspots. Various other deep neural network architectures can be tried to improve on the performance of the proposed network. Moreover, the heat map-based methodology can be adapted to related applications where activities can be represented as points on a geographic map. Some areas where this approach can be implemented are epidemiology and natural disasters like forest fires and cyclones.
Appendix
Let S be the activity area where the crime points are distributed, and let R be the
subset of S which indicates the candidate hotspot area. Let A denote the total number
of activities in the activity area. Let C denote the actual observed number of crime
points within the hotspot. Assuming the null hypothesis H0 that the crime points are
uniformly distributed within the activity area S, the expected number of activities
within the hotspot R, denoted by B, can be defined as

B = (A × Area(R)) / Area(S)
The probability that a given crime point lies within the hotspot, if the null hypothesis were true, is given by B/A, and the probability for the point to be outside the hotspot is (A − B)/A.
Assuming all the points are distributed independently, and that null hypothesis is
true, the total probability that exactly C points are present within the hotspot R is
given by
P0 = (B/A)^C × ((A − B)/A)^(A−C) = [B^C × (A − B)^(A−C)] / A^A
The alternate hypothesis H1 considers that the null hypothesis is not true. In this case, the probability that any point lies within the hotspot is given by C/A, and the probability that the point is outside the hotspot is (A − C)/A. The probability that exactly C points are present within the hotspot R if the actual distribution is true is given by
P1 = (C/A)^C × ((A − C)/A)^(A−C) = [C^C × (A − C)^(A−C)] / A^A
The likelihood ratio is given by the expression P1/P0. Taking the logarithm, we get the expression for the log likelihood ratio:

LLR = log(P1/P0)
    = log{ [C^C × (A − C)^(A−C) / A^A] × [A^A / (B^C × (A − B)^(A−C))] }
    = log{ (C/B)^C × [(A − C)/(A − B)]^(A−C) }
    = C × log(C/B) + (A − C) × log[(A − C)/(A − B)]
References
1. Eftelioglu E, Shekhar S, Kang JM, Farah CC (2016) Ring-shaped hotspot detection. IEEE
Trans Knowl Data Eng 28:3367–3381
2. Eftelioglu E, Shekhar S, Oliver D, Zhou X, Evans MR, Xie Y, Kang JM (2014) Ring-shaped
hotspot detection: a summary of results. In: Proceedings of IEEE international conference on
data mining, pp 815–820
3. Xue Y, Brown DE (2003) A decision model for spatial site selection by criminals: a foundation
for law enforcement decision support. IEEE Trans Syst Man Cybern Part C: Appl Rev 33(1):78–
85
4. Turvey BE (2011) Criminal profiling: an introduction to behavioral evidence analysis. Elsevier,
Amsterdam
5. Agarwal D, McGregor A, Phillips JM, Venkatasubramanian S, Zhu Z (2006) Spatial scan
statistics: approximations and performance study. In: Proceedings of the 12th ACM SIGKDD
international conference on knowledge discovery and data mining, pp 24–33
6. Peterson JJ (2009) Regression analysis of count data. Technometrics 41(4):371–371
7. Gardner W, Mulvey EP, Shaw EC (1995) Regression analyses of counts and rates: Poisson,
overdispersed Poisson, and negative binomial models. Psychol Bull 118(3):392–404
8. Osgood DW (2000) Poisson-based regression analysis of aggregate crime rates. J Quant Crim-
inol 16(1):21–43
9. Harries K (1999) Mapping crime: principle and practice. CDRC, NIJ. https://www.ncjrs.gov/
pdffiles1/nij/178919.pdf
10. Bremer S (2000) An exploration of the methods for detecting hot spots and changes in hot spot
locations. M.S. thesis, Univ. Virginia, Charlottesville, VA
11. Dalton J (1999) Bandwidth selection for kernel density estimation of geographic point pro-
cesses. M.S. thesis, Univ. Virginia, Charlottesville, VA
12. Brown D, Liu H, Xue Y (2001) Mining preferences from spatial-temporal data. https://doi.org/
10.1137/1.9781611972719.26
13. Cohen LE, Felson M (1979) Social change and crime rate trends: a routine activity approach.
Am Sociol Rev 44(4):588–608
14. Rossmo DK (1999) Geographic profiling. CRC Press, Boca Raton, FL, USA
15. Brantingham PJ, Brantingham PL (1981) Environmental criminology. Sage Publications, Bev-
erly Hills, CA, USA
16. Brantingham PL, Brantingham PJ (1993) Environment, routine and situation: toward a pattern
theory of crime. Routine Activ Ration Choice: Adv Criminol Theory 5:259–294
17. Eck JE (2005) Crime hot spots: what they are, why we have them, and how to map them.
In: Mapping crime: understanding hot spots. NIJ, pp 1–14. http://discovery.ucl.ac.uk/11291/
1/11291.pdf
18. Chainey S, Dando J (2005) Methods and techniques for understanding crime hot spots. In:
Mapping crime: understanding hot spots. NIJ, pp 15–34. http://discovery.ucl.ac.uk/11291/1/
11291.pdf
19. Cameron JG, Leitner M (2005) Spatial analysis tools for identifying hot spots. In: Mapping
crime: understanding hot spots. NIJ, pp 35–64. http://discovery.ucl.ac.uk/11291/1/11291.pdf
Chapter 3
Optimization Techniques for Machine Learning
1 Introduction
Machine Learning (ML) is one of the areas of Artificial Intelligence (AI). It aims to extract and automatically exploit the crucial information present in large databanks.
It refers to the development, analysis, and implementation of methods that enable
a machine to evolve through a learning process and, thus, to perform tasks that are
difficult or impossible to achieve by means of conventional algorithms.
ML algorithms draw on a variety of sources that combine different disciplines:
statistics and data analysis [1], symbolic learning [2, 3], neural learning, inductive
logic programming, reinforcement learning, statistical learning [4], support vector
machines [5], expert committees, Bayesian inference and Bayesian networks [6],
evolutionary algorithms (genetic algorithms, evolutionary strategies, genetic pro-
gramming), databases, human–machine interfaces, etc. The optimization of learning
methods saves storage space and prediction time by reducing the size of the obtained
models. This is essential for applications that require short response times.
In this study, we tried to address the following objectives:
– To introduce the history, techniques, and application of machine learning to novice
researchers;
– To provide a comprehensive review of machine learning methods;
– To identify the specific applications areas to which the commonly used learning
methods are applied;
– To summarize the most popular optimization techniques used in machine learning;
S. T. Zouggar
Department of Economics, Oran 2 University, Oran, Algeria
e-mail: [email protected]
A. Adla (B)
Department of Computer Science, Oran 1 University, Oran, Algeria
e-mail: [email protected]
– To discuss the strengths and the shortcomings of these techniques and highlight
potential research directions.
The review presented here differs from previous works in that, in addition to describing the history and techniques of machine learning, it also gives a critical review of
currently available selection measures.
The rest of the chapter is organized as follows: Sect. 2 provides an overview
of machine learning. In Sect. 3, we present a general description of decision tree.
Section 4 outlines ensemble methods, the different ways to generate them, and their
selection. Finally, concluding remarks and future work are given in Sect. 5.
2 Machine Learning
Machine learning became a major concern of artificial intelligence in the late 1970s, when expert systems faced the challenge of acquiring existing expertise. It aims to
build hypotheses from examples. The resulting hypotheses are judged according
to two criteria: predictive efficiency (with respect to data) and intelligibility (with
respect to the expert or the user) [7].
Machine learning is the development of programs that improve with experience.
Its applications are numerous and concern a wide variety of fields. Examples include
pattern recognition, in particular, speech and written word recognition, process con-
trol and fault diagnosis, etc. [8]. According to [9], the knowledge produced by the
machine, in other words coming from machine learning, is not necessarily of a log-
ical nature; it can take various forms: neural network, algebraic model, geometric
model, etc. Simon [10] defines learning as “any change in the system that allows it to
perform a task better the second time, when repeating the same task or another task drawn from the same population.” Learning involves generalization from experience.
Why do we want a machine to learn to recognize an illness, for example, when
humans have so far not done too badly? Various reasons may explain this need:
• The scarcity of specialists;
• The impossibility for humans to access certain environments that are hostile or difficult to access for reasons of cost or delay.
For example, in some clinical cases, the diagnosis of a disease is impossible
without surgery. Developing an automatic diagnostic system would prevent some patients from undergoing surgery needlessly and would allow the community to reduce health expenses and spare patients unnecessary procedures.
There are two kinds of machine learning methods.
Empirical learning methods are based on the acquisition of knowledge from exam-
ples. Empirical learning methods include case-based reasoning (CBR), artificial neu-
ral networks, decision trees, and genetic algorithms. These methods are divided
between analogy-based learning methods and induction learning methods.
Learning Methods by Analogy. Approaches based on analogy transfer knowledge
from a well-known task to a less well-known one. Thus, it is possible to learn new
concepts or to derive new solutions from similar known concepts and solutions. Two
concepts become very important in the definition of learning by analogy: transfer and similarity, as exemplified by case-based reasoning (CBR) systems [11].
Induction Learning Methods. In this approach, one seeks to acquire general
rules representing the knowledge obtained from examples. The induction learning
algorithm receives a set of learning examples and must produce classification rules
that allow classifying the new examples. This algorithm can operate in a supervised
or unsupervised manner [12, 13].
Supervised Learning. The goal is to find a general, feature-based description of a class without having to enumerate all the examples of this class [14]. Learning
strives toward two competing objectives:
1. An explanation of the studied concept, i.e., of the distribution of the examples in classes;
2. A decision or prediction function allowing a class (insulin dependence, for example) to be assigned to examples (patients) for which it is unknown.
The goal of supervised learning is to construct a prediction model, also called a classifier, which allows identifying an attribute Y to predict, called the endogenous variable, class, variable to explain, or variable to predict, from a set of explanatory attributes X, called exogenous variables, explanatory variables, or predictors.
• The prediction model or classification function ϕ is built on a subset Ωa of the population, called the learning sample;
• An individual ω belongs to the sample;
• The attribute to predict Y associates with each individual of Ωa a class belonging to the set of classes C = {c1, …, cm}:

Y : Ωa → C, ω ↦ Y(ω)
• Each exogenous (explanatory) variable Xj is defined by:

Xj : Ωa → Ej, ω ↦ Xj(ω)

where Ej = {e1j, e2j, …, epj} is the set of modalities (values) of Xj.
Unsupervised learning. The unsupervised learning system considers a set of
objects or examples without knowing if these objects belong or not to the same
class. It tries to find the regularities between the examples while carrying out the best
possible clusterings. These clusters of similar objects are called prototypes [15].
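As a hedged illustration of such clustering, a minimal scikit-learn sketch (the algorithm choice and data are illustrative; the chapter names no specific method):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(100, 4)            # 100 unlabeled examples, 4 descriptors
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)
    # Each cluster of similar examples plays the role of a learned prototype.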
The examples in this section are used to illustrate the different concepts associated with the aforementioned methods. Three learning bases from the medical field are used: the DIABETES database, which groups patients to be classified as Type I or Type II; the ULCERE base, which represents patients suffering or not from ulcer perforation; and the MONITDIAB database, used to detect classes of complications for diabetics.
ULCERE Learning Base. The base consists of 130 individuals, 12 descriptors, and a two-valued class indicating the existence or not of ulcer perforation. The 12 exogenous variables are Xj (j = 1, …, 12), with Xj ∈ {DEPIG, AGC, PYROSIS, DB, VOUM, DAP, BEPIG, DCR, CEPIG, FEVER, EMAT, DEDPP}; E11 = {0, 1} represents the set of values of the variable EMAT, and the variable to predict, noted CLASS, takes its values in {0: Unperforated Ulcer, 1: Perforated Ulcer} (Table 2).
3 Decision Trees
Decision trees emerged with the AID algorithm, “Automatic Interaction Detection” [18], which uses regression trees for prediction. Among the improvements to AID, the CHAID method, “CHi-square AID” [19], for example, is used for classification.
The real success of these methods resided in the development of the CART and ID3 algorithms [2, 13], which laid down the theoretical and applied foundations of a new research field. Quinlan [3, 13] then proposed a set of heuristics to improve his system: he proposed C4.5 in 1993 and then C5.0, implemented in commercial software.
Info(X, Ω) = Σ_{i=1}^{n} (|Ωi| / |Ω|) × Info(Ωi)    (1)
In the case of the DIABETES database, Eq. (1) gives the information provided by the State variable; in the case of the ULCERE database, it gives the information provided by the DEPIG variable.
Consider the quantity Gain(X, Ω) defined as follows: Gain(X, Ω) = Info(Ω) − Info(X, Ω). The gain represents the difference between the information needed to identify an element of Ω and the information needed to identify an element of Ω after obtaining the value of the attribute X. This is the information gain due to the attribute X. In the DIABETES example, the gain for the State variable is computed in the same way. If we consider the Assoc attribute, we find that Info(Assoc, Ω) equals 0.36 and Gain(Assoc, Ω) equals 0.28; we deduce that the Assoc variable offers more information than the State attribute. The notion of gain is used to rank attributes and build a decision tree. At each node is placed the attribute that has the largest gain compared to the others. The advantage of this ordering is to create a small decision tree which allows identifying a record with a small number of questions.
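A hedged Python sketch of Eq. (1) and of the gain may help; the toy attribute and class values below are illustrative, not the DIABETES data.

```python
# Info(Omega): Shannon entropy of the class labels; Info(X, Omega): the
# weighted average of the entropies of the sub-populations Omega_i induced
# by the modalities of X; Gain(X, Omega) = Info(Omega) - Info(X, Omega).
from collections import Counter
from math import log2

def info(labels):
    """Shannon entropy Info(Omega) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_x(values, labels):
    """Info(X, Omega): weighted entropy of the partition induced by X (Eq. 1)."""
    n = len(labels)
    total = 0.0
    for v in set(values):
        sub = [y for x, y in zip(values, labels) if x == v]
        total += (len(sub) / n) * info(sub)
    return total

def gain(values, labels):
    return info(labels) - info_x(values, labels)

X_state = [0, 0, 1, 1, 1]   # a candidate splitting attribute (toy values)
Y_class = [0, 0, 0, 1, 1]   # the class to predict
print(round(gain(X_state, Y_class), 3))   # 0.42 on this toy example
```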
ID3 is the first popular decision tree algorithm, proposed by Quinlan in 1986 [13] for supervised classification. The tree is un-pruned, non-incremental and greedy, and Shannon's entropy is used for data partitioning.
ID3 algorithm;
Input: X (exogenous variables), Y (class), learning sample Ωa;
If Ωa is empty then return a node of value failure;
If Ωa consists of identical values for the class then return a node labeled by the value of that class;
If X is empty then return a simple node whose value is the most frequent class value in Ωa;
D ← argmax_{Xj} gain(Xj, Ωa) with Xj in X;
{dji with i = 1 … p}: the values of the variable D;
{Ωai with i = 1 … p}: the subsets of Ωa composed of the individuals having value dji for the variable D;
Root the tree at D, with arcs labeled dj1, …, djp going to the subtrees ID3(X−D, Y, Ωa1), ID3(X−D, Y, Ωa2), …, ID3(X−D, Y, Ωap);
Output: ID3 decision tree;
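The pseudocode above can be turned into a compact recursive implementation; the sketch below reuses the gain(values, labels) helper from the previous sketch and represents attributes as column indices. It is an illustration of the algorithm, not the authors' implementation.

```python
# Compact recursive ID3 following the pseudocode above; the tree is returned
# as nested dicts of the form {attribute_index: {value: subtree}}.
from collections import Counter

def id3(rows, labels, attrs):
    if not rows:                      # empty sample -> failure node
        return "failure"
    if len(set(labels)) == 1:         # pure node -> leaf labeled by the class
        return labels[0]
    if not attrs:                     # no attribute left -> majority class
        return Counter(labels).most_common(1)[0][0]
    # D <- the attribute with the largest information gain
    d = max(attrs, key=lambda a: gain([r[a] for r in rows], labels))
    tree = {}
    for v in set(r[d] for r in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[d] == v]
        tree[v] = id3([r for r, _ in sub], [y for _, y in sub],
                      [a for a in attrs if a != d])
    return {d: tree}
```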
By applying the ID3 algorithm, on an excerpt of the Diabetes base composed of
132 patients, we obtain the tree (see Fig. 1).
– From a decision tree, we can extract rules in the form:
If <Condition> Then <Class ci> (degree of likelihood = nbre_instances_ci / effectif_total_feuille, i.e., the number of instances of ci over the total size of the leaf)
– From the tree of Fig. 1, we can extract sixteen rules. The extraction is done from the root toward the leaves of the tree. For example, we have the following rules:
– If Weight = 0 and CDC = 0, then TD = 1 ⇔ If Weight = 'Normal' and CDC = 'Diabetic Foot,' then Diabetes Type II.
– If Weight = 2, then TD = 0 ⇔ If Weight = 'Obese,' then Diabetes Type I.
– If Weight = 1 and CDC = 3, then TD = 0 ⇔ If Weight = 'Lean' and CDC = 'Retinopathy,' then Diabetes Type I.
– If Weight = 1 and CDC = 4, then TD = 0 ⇔ If Weight = 'Lean' and CDC = 'Hyperosmolar coma,' then Diabetes Type I.
Decision trees are easily interpretable because of their graphical representation and have good prediction and generalization performance. According to the 2001 study conducted by Piatetsky-Shapiro on his site dedicated to the industrial market for extracting knowledge from data, decision trees are used by more than 50% of the population surveyed.
In the study conducted in 2007, in response to the question "What are the most used data mining tools in the last 12 months?", 62.2% of respondents cited decision trees (http://www.kdnuggets.com/polls/2007/dataminingmethods.htm). These statistical studies confirm the importance of these methods, mainly because of their ease of use and their interpretability; these properties make them widely used in areas that require justification for decision making, as in the medical field.
The most important building block of a decision tree classifier is the measure used to assess the quality of a partition. These measures belong to two main categories: those based on entropy and those based on the notion of distance. A partition quality measure called the distance-based new information measure (NIM) is proposed in [22]. It allows generating smaller trees with high performance.
Description of the Measure. The following notations are used:
– n: the total number of individuals in the learning sample Ωa;
– ni: the number of individuals of class i;
– esj: modality s of the variable Xj;
– nsj: the number of individuals associated with the modality s of the variable Xj;
– nisj: the number of individuals of class i associated with the modality s of the variable Xj;
– m: the number of modalities of the class.
The measure NIM uses two functions:
– The importance function, denoted Imp, which takes a variable modality as parameter: letting esj be the modality s of the variable Xj, Imp(Xj = esj) = Imp(esj) = Σi=1..m |nisj − nsj/m|;
– The function f, which sums Imp over all the modalities of a variable (or, for the learning sample Ωa, over the class counts), as used in the steps below.
For the chosen variable Xj and for each of its modalities e1j, e2j, …, epj:
– If |Imp(esj) − nsj| = 0, then the branch associated with the modality leads to a leaf.
– If |Imp(esj) − nsj| ≠ 0, then we repeat step 1 by considering the remaining variables and only the subpopulation associated with the branch labeled by the modality esj.
Step 4:
– End the process when all nodes are "pure" leaves.
Partitions' Generation: Application on MONITDIAB. To illustrate, we consider the MONITDIAB example for monitoring diabetics. We assign to σ the value 0.
Step 1: At the beginning of learning, the learning sample Ωa has the initial configuration presented in Table 4:
f(Ωa) = |4 − 20/3| + |13 − 20/3| + |3 − 20/3| = 2.66 + 6.34 + 3.66 = 12.66; the value |f(Ωa) − n| = |12.66 − 20| ≠ 0, so we deduce that the concerned node is not terminal.
Step 2: We label the initial node with one of the 13 exogenous variables.
For the TD variable, whose modalities are Type I and Type II:
Imp(Type I) = |4 − 9/3| + |5 − 9/3| + |0 − 9/3| = 6, Imp(Type II) = |0 − 11/3| + |8 − 11/3| + |3 − 11/3| = 8.66. Then
f(TD) = 6 + 8.66 = 14.66. The calculation is done in the same way for the remaining variables: f(Var) = 14.66, f(IMC) = 6.63, f(Glyc) = 10.66, f(HBANC) = 13.34, f(EFO) = 18.65, f(Crea) = 13.32, f(Urée) = 12.66, f(McrAlb) = 13.29, f(Cc) = 12.66, f(Neuropath) = 20.64, f(ECG) = 21.97, f(DA) = 13.97.
We choose the variable that maximizes the function f. So the ECG variable is
chosen to split the root node (see Fig. 2).
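The Imp and f computations can be checked against the worked TD example; the sketch below reconstructs them as Imp(esj) = Σi |nisj − nsj/m|, inferred from the numbers above (the text rounds 8.67 to 8.66 and 14.67 to 14.66).

```python
# Imp of one modality from its per-class counts n_isj, and f(X_j) as the sum
# of Imp over the modalities of X_j; counts reproduce the TD example on the
# MONITDIAB sample (m = 3 complication classes).
def imp(class_counts):
    """Importance of one modality, given its per-class counts n_isj."""
    n_sj = sum(class_counts)
    m = len(class_counts)
    return sum(abs(n_isj - n_sj / m) for n_isj in class_counts)

def f(modality_counts):
    """f(X_j): sum of Imp over all modalities of the variable X_j."""
    return sum(imp(counts) for counts in modality_counts)

type_i  = [4, 5, 0]   # per-class counts for modality 'Type I'  (n_sj = 9)
type_ii = [0, 8, 3]   # per-class counts for modality 'Type II' (n_sj = 11)
print(imp(type_i), round(imp(type_ii), 2))   # 6.0 and 8.67
print(round(f([type_i, type_ii]), 2))        # f(TD) = 14.67
```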
The proposed algorithm named IDT_NIM performs recursive partitioning as
adopted by the ID3 method. The partitioning of the generated child nodes is done
in the same way as the partitioning of the root node. The process stops when all the
obtained nodes are homogeneous leaves. The different steps are described by the
following pseudocode:
IDT_NIM algorithm;
Input: X (exogenous variables), Y (class), Ωa (learning sample);
Calculate f(Ωa);
If |f(Ωa) − n| = 0 then the tree reduces to the root node;
D ← argmax_{Xj} f(Xj, Ωa), Xj in X;
{edj (d = 1 … k)}: the set of the k modalities of D;
{Ωaj (j = 1 … k)}: the subsets of Ωa associated with the value edj of D;
If |Imp(edj) − |Ωaj|| = 0 then from D generate a leaf associated with the modality edj, whose size is |Ωaj|;
Otherwise from D generate the subtree IDT_NIM(X−D, Y, Ωaj) associated with the modality edj;
Output: IDT_NIM tree.
An experimental study given in [22] shows the interest of NIM compared to
Shannon’s entropy and the gain ratio.
4 Ensemble Methods
The instability of classification models, for example those based on decision trees, resides in the fact that insignificant changes in the learning sample can cause large changes in the generated classification rules. Therefore, the rules generated from two similar samples differing in only a few examples can be completely different. Different models or hypotheses (H) are constructed from "almost" similar samples, which complicates the decision-making process. The theoretical quality of a hypothesis H can be calculated by measuring the deviation, for each example x of X, between the result of H and that of y.
– The models for which the prediction error is less than 0.5 are good enough. The probability of error of a set J of models is equal to the probability that at least J/2 models are mistaken, which follows a binomial distribution.
Model aggregation goes through two stages: a diversification stage, which selects different models so as to minimize error correlation (diversification results in covering different regions of the instance space), followed by an integration stage that combines them to maximize the space covered. This integration can be static (vote or average of the basic predictions) or dynamic (an adaptive process that integrates the basic predictions, i.e., meta-learning).
Diversification by Resampling. There are four types of diversification by
resampling:
Bagging. Bagging (bootstrap aggregating) is a resampling method introduced by Breiman in 1996 [23]. Given a learning sample Ωa and a prediction method, called the basic rule, which builds on Ωa a predictor ĥ(., Ωa), bagging consists in drawing with replacement several bootstrap samples (Ωaθ1, …, Ωaθq), applying the basic rule (a decision tree) to each of them to generate a collection of predictors ĥ(., Ωaθ1), …, ĥ(., Ωaθq), and finally aggregating these predictors by vote or averaging.
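A minimal Python sketch of this bagging scheme, assuming a numeric feature matrix and integer class labels; the sample sizes and the number q of bootstrap replicates are illustrative.

```python
# Minimal bagging sketch: q bootstrap samples drawn with replacement, a tree
# fitted on each, and a static integration by majority vote over the trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def bagging_fit(X, y, q=25):
    """Return q trees, each fitted on one bootstrap replicate of (X, y)."""
    n = len(y)
    trees = []
    for _ in range(q):
        idx = rng.integers(0, n, size=n)      # draw with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Aggregate the collection of predictors by majority vote."""
    votes = np.stack([t.predict(X) for t in trees])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

X, y = rng.random((100, 5)), rng.integers(0, 2, 100)   # illustrative data
print(bagging_predict(bagging_fit(X, y), X[:5]))
```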
The proposed measures are based on diversity and/or performance. The paths are
hill-climbing ones or based on genetic algorithms.
Multi-objective function. This function [39, 40] drives a directed hill-climbing ensemble pruning (DHCEP) [38] search in a homogeneous ensemble of C4.5 trees [3]. The selected sub-ensemble must strike a compromise between maximum diversity and minimum error rate.
The motivation behind the joint use of the two criteria is that there is a relation between the individual performance of the classifiers and their diversity: the more precise the classifiers, the less they disagree. Using only one of the two properties is not sufficient to find the best performing sub-ensemble. The multi-objective function is based on this compromise between individual tree performance and diversity of trees. A reduced number of trees allows a gain in memory space and computing time that can be very significant for large samples and real-time applications.
The function S is given by:

S = (Σi=1..n θi² − X²) / (nk − X) + α (Σj=1..k ej² − X²) / (kn − X)    (3)
The Pruning Ensemble using Diversity and Accuracy (PEDA) algorithm given below summarizes the steps to simplify the B trees generated by bagging, using a hill-climbing path:
PEDA algorithm;
Input: B = {A1, …, Ak};
Eval: validation or pruning sample;
Neighborhood(ϕj): function that returns the sub-ensembles of models obtained from ϕj by adding a model (tree);
1. Initialize(ϕ0);
2. Calculate S(ϕ0, Eval);
3. If ∃ϕj such that S(ϕ0, Eval) < S(ϕj, Eval), where ϕj ∈ Neighborhood(ϕ0), then ϕ0 = argmin_{ϕj} S(ϕj, Eval); go to 3;
Output: a sub-ensemble ϕ0, ϕ0 ⊆ B.
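The loop can be sketched as a greedy forward hill-climbing search; the score function S (the multi-objective function of Eq. (3)) is left abstract, and the minimization direction follows the argmin of the pseudocode. This is a sketch under those assumptions, not the authors' implementation.

```python
# Greedy hill-climbing pruning in the spirit of PEDA: grow the sub-ensemble
# by adding, at each step, the tree that most improves (decreases) the score.
def peda(trees, score, eval_set):
    """trees: the bagged ensemble B; score(sub_ensemble, eval_set) -> float."""
    current, remaining = [], list(trees)
    best = float("inf")
    improved = True
    while improved and remaining:
        improved = False
        # Neighborhood(current): sub-ensembles obtained by adding one tree
        candidates = [(score(current + [t], eval_set), t) for t in remaining]
        s, t = min(candidates, key=lambda c: c[0])
        if s < best:                      # keep climbing while S improves
            best, improved = s, True
            current.append(t)
            remaining.remove(t)
    return current                        # a sub-ensemble of the bagged trees
```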
Entropy function. The entropy function is a diversity-based function presented in [41] and used in [31] to simplify heterogeneous ensembles (ensembles of different models). The function, denoted fE, is given by:

fE = (1/n) Σj=1..n [1 / (T − ⌈T/2⌉)] min{nc(xj), T − nc(xj)}    (4)

where T is the number of classifiers and nc(xj) is the number of classifiers that classify the example xj correctly.
The fE measure was used with two paths: a hill-climbing path and a path based on genetic algorithms. For a search based on genetic algorithms, suppose that we have an ensemble of four trees C = {T1, T2, T3, T4}; the chromosome ch1 = (1 0 1 0) corresponds to the fact that the trees T1 and T3 are chosen in the sub-ensemble. To the two trees correspond classification vectors on Ωv. Assuming that |Ωv| = 2, we associate, for example, with T1 and T3 the classification vectors (1 0)t and (0 1)t respectively. Calculating the fitness function ffE for chromosome ch1 is then equivalent to evaluating fE (Eq. 4) on the two selected trees.
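Under the usual reading of Eq. (4) (T classifiers, n examples, nc(xj) the number of classifiers that classify example xj correctly), the measure and the chromosome example can be sketched as follows; the matrix encoding is an assumption for illustration.

```python
# Entropy diversity measure of Eq. (4): maximal when, for each example, the
# selected classifiers split as evenly as possible between right and wrong.
import math

def f_entropy(correct_matrix):
    """correct_matrix[t][j] = 1 if classifier t classifies example j correctly."""
    T = len(correct_matrix)
    n = len(correct_matrix[0])
    scale = 1.0 / (T - math.ceil(T / 2))
    total = 0.0
    for j in range(n):
        nc = sum(correct_matrix[t][j] for t in range(T))
        total += scale * min(nc, T - nc)
    return total / n

# Chromosome ch1 = (1 0 1 0) selects T1 and T3; their correctness vectors on
# the two validation examples are (1 0) and (0 1), as in the example above.
ch1 = [[1, 0], [0, 1]]
print(f_entropy(ch1))   # 1.0: the two selected trees disagree on every example
```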
5 Conclusion
Inference of classifiers from examples is an old but still active research field in the machine learning community. Classification methods, particularly those based on decision trees, are of major interest given the application results obtained in different fields. Their major strength, compared to other classification methods, resides in their intelligibility: they produce ranking functions that are meaningful in themselves. In addition, these methods have good prediction and generalization performance. However, they mainly suffer from the complexity and instability of the generated models. Indeed, complex models make these methods lose the interpretability that made them among the most widespread methods in the classification field, while instability reduces the credibility of the tool, making it highly dependent on the data.
Among the measures proposed for selecting segmentation variables, the new information measure (NIM) [22] is less complex than information-theoretic or distance measurements. NIM, used in the greedy partitioning algorithm Induction of Decision Tree with the New Information Measure (IDT_NIM), allows generating trees of reduced size with similar or even superior performance. For homogeneous or heterogeneous ensemble selection, diversity- and/or performance-based functions [29, 30, 42] are used with hill climbing and genetic algorithms. The obtained sub-ensembles are smaller and more efficient than the initial ensemble.
Throughout this chapter, we have underlined several points for deepening and future work. First, the NIM measure can be used in sensitive areas where there is class imbalance. Applications in these areas are very frequent, and the imbalance, resulting in data that are scarce but critical, may lead to serious economic and strategic consequences in case of an assignment error, for example, diagnosing a subject as healthy while suffering from cancer. The decentering proposed in [43] may also be used to favor scarce cases in a learning sample. As for the multi-objective function and the entropy function, they can be used in random forest selection, knowing that a random forest ensemble improves the performance of bagging [44].
References
1. Bzdok D, Altman N, Krzywinski M (2018) Statistics versus machine learning. Nat Methods
15(4)
2. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees.
Wadsworth International Group
3. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann
4. Kodratoff Y (1998) Technique et outils de l’extraction de connaissances à partir de données.
Université Paris-Sud, Revue SIGNAUX (92)
5. Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers.
In: 5th annual workshop on computational learning theory. ACM, Pittsburgh, pp 144–152
6. Kim J, Pearl J (1987) CONVINCE: a conversational inference consolidation engine. IEEE Trans Syst Man Cybern 17:120–132
7. Sebag M (2001) Apprentissage automatique, quelques acquis, tendances et défis. L.M.S: Ecole
Polytechnique
8. Denis F, Gilleron R (1996) Notes de cours sur l’apprentissage automatique. Université de Lille
9. Kodratoff Y (1997) L’extraction de connaissance à partir de données: un nouveau sujet pour la
recherche scientifique. Revue électronique READ
10. Simon H (1983) Why should machines learn? In: Machine learning: an artificial intelligence
approach, vol 1
11. Carbonell JG (1983) Learning by analogy: formulating and generalizing plans from past experience. In: Michalski RS, Carbonell JG, Mitchell TM (eds) Machine learning, an artificial intelligence approach. Tioga Press, Palo Alto, CA
12. Langley P, Simon HA (1995) Applications of machine learning and rule induction. Technical
Report 95-1, Institute for the Study of Learning and Expertise
13. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
14. Denis F, Gilleron R (1997) Apprentissage à partir d’exemples. Université Charles de Gaulle,
Lille 3
15. Dayan P, Sahani M, Deback G (1999) Unsupervised learning. In: Wilson RA, Keil F (eds) The
MIT encyclopedia of the cognitive sciences
16. Mitchell T (1997) Machine learning. McGraw-Hill Publishing Company, McGraw-Hill Series
in Computer Science (Artificial Intelligence)
17. Taleb Zouggar S, Adla A (2013) On generating and simplifying decision trees using tree
automata models. INFOCOMP J 12(2):32–43
18. Morgan JN, Sonquist JA (1963) Problems in the analysis of survey data, and a proposal. J Am
Stat Assoc 58:415–434
19. Kass G (1980) An exploratory technique for investigating large quantities of categorical data.
Appl Stat 29(2):119–127
20. Friedman JH (1977) A recursive partitioning decision rule for non parametric classification.
IEEE Trans Comput 26(4):404–408
21. Partalas I, Tsoumakas G, Vlahavas I (2012) A study on greedy algorithms for ensemble prun-
ing. Technical Report TR-LPIS-360-12, LPIS, Dept. of Informatics, Aristotle University of
Thessaloniki, Greece
22. Taleb Zouggar S, Adla A (2017) Proposal for measuring quality of decision trees partition. Int
J Decis Support Syst Technol 9(4):16–36
23. Breiman L (1996) Heuristics of instability and stabilization in model selection. Ann Stat 24(6):2350–2383
24. Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an
application to boosting. In: The 2nd European conference, EuroCOLT ’95. Springer-Verlag,
pp 23–37
25. Breiman L (2000) Randomizing outputs to increase prediction accuracy. Mach Learn 40:229–
242
26. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans
Pattern Anal Mach Intell 20(8):832–844
1 Introduction
The classification of medical images is a diagnostic technique that assigns images to different categories based on similarity measurements. The identification of the type of tumor in abnormal brain images is considered one of the important uses of classification. Manual diagnosis of brain tumor tissues is time-consuming due to the complexity of brain tissue, and it depends on the operator's condition. Moreover, experts are needed to examine the images, so the common, older methods become inefficient in the absence of these people. Therefore, the use of automatic methods is very useful for examining tumors in a precise manner. Nowadays, the use of MRI images has attracted a lot of attention due to the simpler analysis needed to determine the tumor and its characteristics [1]. The relevant MRI modalities are usually proton density (PD), T1-weighted, T2-weighted, and FLAIR [2]. T2-W images have higher weights and denser textures, and their color tends to be white. This property makes cancer tissues easier to detect, because the growth of cancer cells produces a higher cell density in the target area.
In the field of tumor diagnosis with computer-aided diagnosis (CAD), different classification algorithms have been applied to MRI images, and different results have been obtained [3]. The methods for classifying MRI images can be divided into two categories: traditional methods and deep learning methods. In general, the steps involved in these algorithms can be divided into pre-processing, feature extraction, dimension reduction, and classification.
2 Literature Review
There are various methods for pre-processing MRI images. Intensity normalization is one of the areas with many applications in the pre-processing of MRI images, being essential for quantitative texture analysis and for improving image contrast. Six methods associated with intensity normalization (intensity scaling, contrast stretch normalization, histogram normalization, histogram stretching, histogram equalization, and Gaussian kernel normalization) are introduced in [4]. According to the results of [5], the histogram normalization method performs better than the other methods; in this method, the intensity values are generated by applying histogram normalization to the original images. However, [5] notes that histogram equalization is less successful in medical images because it obliterates small details.
In this section, a new algorithm is proposed for classifying brain tumors in MRI images; its flowchart is shown in Fig. 1 (pre-processing, feature extraction, dimension reduction and classification of the image classes, from meningioma to pituitary). The algorithm includes four main steps. In the first step, pre-processing is performed on the images using the histogram equalization technique. In the second step, seven features are extracted using the GLCM technique and 1000 features are extracted from the MRI images using the GoogleNet technique,
and in the third step, because the GoogleNet method creates many features and hence high computational complexity in the classification, dimension reduction is performed on the GoogleNet features using the PCA technique, through which 100 important features are finally identified. Finally, in the fourth step, the OVO-MV method is used for classification. In this algorithm, the classes are divided into the maximal set of binary subsets, and each binary subset is predicted by the majority vote of seven classification algorithms. In this study, the k-fold cross-validation method is used for dividing the data into training and test data, and the MSE is used to calculate the classification error. This study is carried out to increase the accuracy and reduce the prediction error in the classification of brain tumors in MRI images; the increased accuracy depends on two factors. The first factor is feature extraction, where the use of appropriate feature extraction methods can have a great impact on classification accuracy, and the second factor is the use of an appropriate classifier.
The combination of the OVO-MV classification algorithm and GoogleNet features can increase classification accuracy and reduce classification error in comparison with a single classifier, which is the main difference between our method and the state-of-the-art methods. There are three hypotheses. In the first hypothesis, it is expected that, owing to the efficiency of GoogleNet, this technique generates suitable features. In the second hypothesis, it is predicted that using the OVO-MV method can increase classification accuracy and reduce classification error in comparison with a single classifier, and in the third hypothesis, it is expected that no single classifier can give good results for all data. In the following, the MRI images are described first. Then histogram equalization, GLCM and GoogleNet, together with the PCA method, are described. Finally, the OVO-MV algorithm is presented.
In this study, 900 MRI images in T1-W form have been collected from the Southern Medical University of Guangzhou website [37] to create a valid database. Three kinds of brain tumor, meningioma (900 slices), glioma (900 slices), and pituitary tumor (900 slices), have been detected in these images. An example of the MRI images, as well as the type of tumor with their classes, is shown in Fig. 2. All images are resized to 227 × 227.
Histogram equalization automatically adjusts the intensity values. In this method, the histogram of the output image becomes uniform, and the image contrast is increased as much as possible. Histogram equalization is calculated according to Eq. (1) for each pixel:
equalization is calculated according to Eq. (1) for each pixel:
cdf(v) − cdfmin
h(v) = round × (L − 1) (1)
(w × h) − 1
where h(v) is the value of the histogram, cdf(v) is the value of the cumulative distribution function for the pixel value v, cdfmin is the minimum value of the cumulative distribution function, w is the image width, h is the image height, and L is the number of gray levels used, which in most cases is 256. In Fig. 3, the left image is the original image, and the right image is the result of applying the histogram equalization method. At this stage, the pre-processing operations are applied with the histogram equalization method to all MRI images.
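A hedged numpy sketch of Eq. (1); it assumes an 8-bit single-channel image (L = 256) and follows the denominator (w × h) − 1 as given above.

```python
# Histogram equalization per Eq. (1): map each gray level v through the
# scaled cumulative histogram of the image.
import numpy as np

def histogram_equalize(img, L=256):
    hist = np.bincount(img.ravel(), minlength=L)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                  # cdf_min of Eq. (1)
    n_pixels = img.size                        # w * h
    h = np.round(np.clip(cdf - cdf_min, 0, None)
                 / (n_pixels - 1) * (L - 1)).astype(img.dtype)
    return h[img]                              # apply the mapping pixel-wise

img = np.random.randint(0, 256, (227, 227), dtype=np.uint8)  # placeholder slice
eq = histogram_equalize(img)
```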
3.3 GLCM
Nowadays, researchers working with artificial intelligence use deep learning to create powerful computing systems. In particular, convolutional neural networks are used for feature extraction and classification [38, 39]. Convolutional neural networks were designed to model in detail how the human visual system works. A convolutional neural network is a kind of deep learning model that contains a large number of convolution and pooling layers. The input of a convolutional neural network is usually an image, and its output is a high-level feature vector corresponding to one class. Hidden layers in a convolutional neural network include convolution layers, pooling layers, and fully connected layers [40]. The simple structure of a convolutional neural network is shown in Fig. 5. A convolution layer includes trainable weights and biases, which, in the form of filters with different dimensions and depths, are applied to the input layers; a feature map is created for each sample and filter, and connecting these feature maps to each other forms the convolution layer. A pooling layer is a nonlinear down-sampling function, which can be a maximum, an average, or even a least-squares norm; applying this layer causes the input dimension to decrease gradually. The fully connected layer is the final layer with high-level features, and each neuron in this layer connects with one of the feature maps in the previous layer.
There are two ways to use convolutional neural networks. In the first, the network is trained from scratch using a large data set; in the second, pre-trained convolutional neural networks are used for feature extraction [9]. In this study, a pre-trained network called GoogleNet is used for feature extraction. GoogleNet was proposed in [11]. In this model, a new concept called inception is introduced; each inception module includes six convolutional layers and two pooling layers. Based on Fig. 6, the model includes two convolutional layers, three pooling layers, and nine inception layers.
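One plausible way to obtain the 1000 features per image mentioned above is to take the activations of the final 1000-unit layer of a pre-trained GoogLeNet, for example via torchvision; this is an illustrative sketch, not the authors' pipeline, and it assumes a 3-channel input image (GoogLeNet expects 224 × 224 RGB inputs, so the chapter's 227 × 227 slices would be resized).

```python
# Illustrative pre-trained feature extraction with torchvision's GoogLeNet;
# the 1000-dimensional output of the final layer is used as the feature vector.
import torch
from torchvision import models, transforms

model = models.googlenet(weights="IMAGENET1K_V1")
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),                 # GoogLeNet's input size
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(rgb_image):
    """rgb_image: a 3-channel PIL image (grayscale slices need conversion)."""
    x = preprocess(rgb_image).unsqueeze(0)         # add a batch dimension
    with torch.no_grad():
        return model(x).squeeze(0).numpy()         # 1000 features per image
```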
PC1 = w11 x1 + w12 x2 + · · · + w1p xp
PC2 = w21 x1 + w22 x2 + · · · + w2p xp
…                                          (2)
PCp = wp1 x1 + wp2 x2 + · · · + wpp xp
Using this technique, a total of 100 principal features of the GoogleNet features have
been identified, in which these 100 features contain 84% of the variance.
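The dimension-reduction step then amounts to projecting the GoogleNet feature matrix onto 100 principal components; a sketch with scikit-learn, using placeholder data of the stated shape:

```python
# PCA projecting the 1000 GoogleNet features onto 100 principal components;
# Eq. (2) gives each component as a linear combination of the original features.
import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(900, 1000)          # placeholder for the 900 x 1000 matrix
pca = PCA(n_components=100)
reduced = pca.fit_transform(features)         # 900 x 100 matrix fed to the classifier
print(pca.explained_variance_ratio_.sum())    # the chapter reports ~0.84 on its data
```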
3.6 Classification
3.6.1 OVO-MV
Considering the pseudo-code shown in Fig. 7: in the first line, the training and test data are separated by k-fold. Then, in line six, the training process is done for each classifier and each binary class (i, j), after which prediction on the test data is done with the model created in line six. This process repeats for all classifiers, DT, K-NN, LDA, LR, NB, SVM, and SVM-RBF, in each binary class. The majority vote is computed from line 11 to line 19. To see how the majority vote is calculated, we refer to Table 3. In Table 3, we assume that the binary class (1,2) has four rows. The predictions of the classifiers occupy columns two to eight, and the last column is obtained as the majority vote of those columns. For example, in the first row, all classifiers have predicted class 1; therefore, the first element of the last column equals one. In the last row, four classifiers have predicted class two and three classifiers class one; therefore, the last element of the last column equals two. We use three-fold cross-validation for dividing the data into training and test data; in this method, all data are considered as test data once.
The MSE is used to calculate the prediction error for all of the classifiers in each pair class, and is obtained from Eq. (3), where ŷi is the predicted class and yi is the real class. The average of the MSE over the three folds is taken as the final classification error. The last row of Table 3 shows the calculation of the MSE; it is obtained from the difference between the real label (first
Table 3 Majority vote method and MSE calculation for binary classes (1,2)
Real classes DT K-NN LDA LR NB SVM SVM-RBF Majority vote
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
2 2 1 2 2 2 1 2 2
2 2 1 2 2 1 1 2 2
MSE 0 0.5 0 0 0.25 0.5 0 0
column) and the predicted label of each classifier. In Table 3, the classifiers DT, LDA, LR, SVM-RBF, and the majority vote show good performance, and all labels are predicted correctly.
MSE = (1/n) Σi=1..n (ŷi − yi)²    (3)
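One OVO-MV round for a binary class pair can be sketched as follows: the seven classifiers are trained on the pair's examples, the final label is their majority vote, and the error is the MSE of Eq. (3). Hyper-parameters here are scikit-learn defaults, not the settings of Table 4.

```python
# One OVO-MV round for a binary class pair (i, j): train seven classifiers,
# take the majority vote of their predictions, score with the MSE of Eq. (3).
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

classifiers = [DecisionTreeClassifier(), KNeighborsClassifier(),
               LinearDiscriminantAnalysis(), LogisticRegression(max_iter=1000),
               GaussianNB(), SVC(kernel="linear"), SVC(kernel="rbf")]

def ovo_mv_predict(X_train, y_train, X_test):
    preds = np.stack([c.fit(X_train, y_train).predict(X_test)
                      for c in classifiers])          # 7 x n_test predictions
    return stats.mode(preds, axis=0, keepdims=False).mode

def mse(y_pred, y_true):                              # Eq. (3) for one pair class
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)
```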
4 Data Analysis
In this section, the features of GLCM and GoogleNet are classified using the OVO-MV algorithm to evaluate the proposed approach. Given that the OVO-MV algorithm uses the majority vote of seven classification algorithms, the parameters of each classification algorithm are set according to Table 4.
The data include the seven features extracted by GLCM and the 100 important features from GoogleNet, which are classified into three classes. Since GLCM extracts seven features from each MRI image, we have a database with 900 rows and 7 columns. The GoogleNet features form a database with 900 rows and 100 columns. In the remainder of this section, the classification results for the GLCM and GoogleNet features are shown and compared with each other.
4.1.1 GLCM
The results of the OVO-MV algorithm based on GLCM features are shown in Tables 5, 6, and 7. Since a total of seven features are extracted from the 900 MRI images using the GLCM technique, a database with 900 rows and seven columns is obtained, in which the rows correspond to the images and the columns to the features that serve as inputs to the classification algorithm. Since these data have three classes, there are three binary classes. The MSEs of the classifiers in fold 1 in Table 5 show that, among the classifiers, LDA and DT have the minimum and SVM the maximum average MSE. The last column shows the MSE of the majority vote of the classifiers; there is a classification error only in the binary class (2,3). The average MSE of the majority vote is 0.04, which shows that the majority vote performs well in the classification. Table 6 shows the MSEs of the classifiers in fold 2. The average classifier error in the last row shows that the best and worst performance are obtained by LDA and SVM, respectively. The error of the majority vote in the last column is 0.04, which shows that most classifiers performed well. The MSEs of the classifiers in fold 3 are presented in Table 7. DT, K-NN, LR, and the majority vote achieved the best performance, whereas SVM did not perform well compared with the other algorithms. The averages of the MSEs in the last rows of Tables 5, 6, and 7 are shown in Fig. 8; LDA, DT, majority vote, LR, K-NN, SVM-RBF, NB, and SVM have the lowest MSE, in that order.
Fig. 8 Comparison of the average of MSEs in three folds in classifiers for GLCM features
4.1.2 GoogleNet
Fig. 9 Comparison of the average of MSEs in three folds in classifiers for GoogleNet features
low MSEs. This is also evident in Table 9, where most classifiers perform well in fold 2. The MSEs of the classifiers in Table 10 show that the GoogleNet features have good quality and lead to a proper separation of the classes. The averages of the MSEs in the last rows of Tables 8, 9, and 10 are shown in Fig. 9. The minimum MSE in this figure is obtained by DT, K-NN, LDA, and the majority vote, and the maximum MSE by SVM. The LR classifier also performed well.
This study introduced a new algorithm for classifying brain tumors in 900 MRI images. Four steps, pre-processing, feature extraction, dimension reduction, and classification using the OVO-MV algorithm, were defined in order to classify the MRI images. In the first step, pre-processing was performed on the images using the histogram equalization method. In the second step, seven features were extracted using the GLCM method and 1000 features using the GoogleNet method; the PCA method was then used to reduce the dimensionality and dependence arising from the many GoogleNet features, and 100 main features were finally identified from the GoogleNet features. In the fourth step, the OVO-MV algorithm with two phases was introduced. In the first phase, the three-fold cross-validation method was used to divide the training and test data; then, binary classification was performed in which the data classes were divided into the maximal set of binary subsets, and seven classification algorithms were used as a heterogeneous group to classify each binary subset. The classification algorithms consisted of seven classifiers: DT, K-NN, LDA, LR, NB, SVM, and SVM-RBF. According to the results, the proposed method achieved high accuracy in the classification of brain tumors with GoogleNet features, for which most of the classifiers performed better than with GLCM features. Although a highly accurate classification was achieved by the OVO-MV algorithm, this method may have some limitations, including an increased classification run time and decreased classification accuracy on some problems. Also, according to the comparative results, no single classifier could provide appropriate results on all data, and in most cases better results can be achieved using the majority vote method. For future work, a clustering phase could be added to the current work for segmentation of the MRI images. Metaheuristic search methods such as the league championship algorithm [42, 43], optics inspired optimization [44, 45], and find-fix-finish-exploit-analyze [46] can be used for clustering. There are also efficient algorithms, such as ACDEA [47], for determining the optimum number of clusters to increase clustering performance.
References
12. Zöllner FG, Emblem KE, Schad LR (2012) SVM-based glioma grading: optimization by feature
reduction analysis. Zeitschrift für medizinische Physik 22(3):205–214
13. Dash M, Liu H (1997) Feature selection for classification. Intel Data Anal 1(1–4):131–156
14. Coppersmith D, Hong SJ, Hosking JR (1999) Partitioning nominal attributes in decision trees.
Data Min Knowl Disc 3(2):197–217
15. Fletcher-Heath LM et al (2001) Automatic segmentation of non-enhancing brain tumors in
magnetic resonance images. Artif Intell Med 21(1):43–63
16. Guo Y, Hastie T, Tibshirani R (2006) Regularized linear discriminant analysis and its application
in microarrays. Biostatistics 8(1):86–100
17. Freedman DA (2009) Statistical models: theory and practice. Cambridge University Press
18. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1. Springer
series in statistics, New York
19. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other
kernel-based learning methods. Cambridge University Press
20. Chaplot S, Patnaik L, Jagannathan N (2006) Classification of magnetic resonance brain images
using wavelets as input to support vector machine and neural network. Biomed Signal Process
Control 1(1):86–92
21. Wasserman PD (1993) Advanced methods in neural computing. Wiley
22. Specht DF (1990) Probabilistic neural networks. Neural Netw 3(1):109–118
23. Du K-L, Swamy M (2014) Radial basis function networks. In: Neural networks and statistical
learning. Springer, pp 299–335
24. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
25. Kang S, Cho S, Kang P (2015) Constructing a multi-class classifier using one-against-one
approach with different binary classifiers. Neurocomputing 149:677–682
26. Knerr S, Personnaz L, Dreyfus G (1990) Single-layer learning revisited: a stepwise procedure
for building and training a neural network. In: Neurocomputing: algorithms, architectures and
applications, vol 68(41–50), p 71
27. Galar M et al (2011) An overview of ensemble methods for binary classifiers in multi-
class problems: experimental study on one-vs-one and one-vs-all schemes. Pattern Recogn
44(8):1761–1776
28. Kang S, Cho S (2015) Optimal construction of one-against-one classifier based on meta-
learning. Neurocomputing 167:459–466
29. Dean BL et al (1990) Gliomas: classification with MR imaging. Radiology 174(2):411–415
30. El-Dahshan E-SA, Hosny T, Salem A-BM (2010) Hybrid intelligent techniques for MRI brain
images classification. Digit Signal Proc 20(2):433–441
31. Marshkole N, Singh BK, Thoke A (2011) Texture and shape based classification of brain tumors
using linear vector quantization. Int J Comput Appl 30(11):21–23
32. Zhang Y et al (2011) A hybrid method for MRI brain image classification. Expert Syst Appl
38(8):10049–10053
33. Saritha M, Joseph KP, Mathew AT (2013) Classification of MRI brain images using combined
wavelet entropy based spider web plots and probabilistic neural network. Pattern Recogn Lett
34(16):2151–2156
34. Ortiz A et al (2013) Improving MRI segmentation with probabilistic GHSOM and multiobjec-
tive optimization. Neurocomputing 114:118–131
35. Zahran B (2014) Classification of brain tumor using neural network. Int Rev Comput Softw
(IRECOS) 9(4):673–678
36. Gaikwad SB, Joshi MS (2015) Brain tumor classification using principal component analysis
and probabilistic neural network. Int J Comput Appl 120(3)
37. Cheng J, School of Biomedical Engineering, Southern Medical University, Guangzhou, China
38. LeCun Y, Kavukcuoglu K, Farabet C (2010) Convolutional networks and applications in vision.
In: Proceedings of 2010 IEEE international symposium on circuits and systems (ISCAS). IEEE
39. Nagi J et al (2011) Max-pooling convolutional neural networks for vision-based hand ges-
ture recognition. In: 2011 IEEE international conference on signal and image processing
applications (ICSIPA). IEEE
40. Liu T et al (2015) Implementation of training convolutional neural networks. arXiv preprint
arXiv:1506.01195
41. Sharma S (1996) Applied multivariate techniques. Wiley, Canada
42. Kashan AH (2011) An efficient algorithm for constrained global optimization and application
to mechanical engineering design: League championship algorithm (LCA). Comput Aided Des
43(12):1769–1792
43. Kashan AH (2014) League Championship Algorithm (LCA): an algorithm for global
optimization inspired by sport championships. Appl Soft Comput 16:171–200
44. Kashan AH (2015) An effective algorithm for constrained optimization based on optics inspired
optimization (OIO). Comput Aided Des 63:52–71
45. Kashan AH (2015) A new metaheuristic for optimization: optics inspired optimization (OIO).
Comput Oper Res 55:99–125
46. Kashan AH, Tavakkoli-Moghaddam R, Gen M (2017) A warfare inspired optimization algo-
rithm: the Find-Fix-Finish-Exploit-Analyze (F3EA) metaheuristic algorithm. In: Proceedings
of the tenth international conference on management science and engineering management.
Springer
47. Balavand A, Kashan AH, Saghaei A (2018) Automatic clustering based on Crow Search
Algorithm-Kmeans (CSA-Kmeans) and Data Envelopment Analysis (DEA). Int J Comput
Intell Sys 11(1):1322–1337
Chapter 5
Predictive Analysis of Lake Water Quality Using an Evolutionary Algorithm
1 Introduction
One of the preconditions for the existence of living organisms and the sustainability of planet earth is water. It plays a crucial role in socio-economic development, ecological sustainability and economic growth. The exponential increase in population has put stress on the limited natural resources, and water is one of these overstressed natural resources. Over 3.6 billion people worldwide are already living in potentially water-scarce areas for at least one month per year, and this might increase to 4.8–5.7 billion by 2050. The World Economic Forum's Global Risk Report 2018 states that among the most pressing environmental challenges facing us are extreme weather events and temperatures, accelerating biodiversity loss and pollution of air, soil and water [33]. As per the World Water Vision Report, the crisis is no longer about having too little water to satisfy our wants, but about the proper management of the available water [35]. Lakes are one of the vital sources of fresh water. They also provide us with prime opportunities for recreation, tourism, and cottage or residential living. They have historical and traditional values and also serve as a source of raw drinking water for municipalities, industry and an irrigation
source for agriculture; they also work to replenish groundwater, positively influence the water quality of downstream watercourses and prevent flooding. The global water resource situation shows that, of the whole accessible water, fresh water is only 3%. Of this 3%, surface water is 0.3%; of this 0.3% surface water, 87% is in natural lakes or artificial reservoirs, 11% is in swamps and only 2% is in rivers [30]. It is therefore worthwhile taking efforts to save the water in our lakes.
The quality of water may be described in terms of the concentration and dissolved or particulate state of the organic and inorganic material present in the water; physical characteristics of the water add to this quality assessment. Long-term, standardised measurement of water quality may be termed monitoring. Monitoring is carried out to estimate nutrient fluxes discharged by rivers or groundwater to lakes and other water bodies. It is also used to check whether any unexpected change is occurring in water quality. Monitoring helps to determine trends in the quality of the water or aquatic environment and how the quality is affected by the release of contaminants, by other anthropogenic activities and by waste treatment operations.
Monitoring of water quality is based on the collection of data. Data collection points are selected at given geographical locations in the water body. Water quality variables are described by the longitude and latitude of the sampling or measurement site (x and y coordinates) and by the depth at which the sample is taken (vertical coordinate z). Monitoring data must also be recorded with the time t at which the sample is taken. Thus, c = f(x, y, z, t), where c is the concentration of any physical, chemical or biological variable. Monitoring data must, therefore, provide a precise determination of these parameters to be used for data interpretation and water quality assessments.
All assessment programs start by critically scrutinising the real need for water quality information, since water resources serve several competing beneficial uses. There are two types of monitoring programs, depending on how many assessment objectives have to be met. Single-objective monitoring is set up to address one problem area only.
This involves a set of variables, such as pH, alkalinity and some cations for acid rain; nutrients and chlorophyll pigments for eutrophication; various nitrogenous compounds for nitrate pollution; or sodium, calcium, chloride and a few other elements for irrigation. Multi-objective monitoring may cover multiple water uses and provide data for more than one assessment program, such as drinking water supply, industrial manufacturing, fisheries or aquatic life, thereby involving a large set of variables. The assessment objectives may focus on the spatial distribution of quality (high station number), on trends (high sampling frequency) or on pollutants. Full coverage of all three requirements is virtually impossible, or very costly.
Water quality monitoring can help researchers predict and learn from natural
processes in the environment and determine human impacts on an ecosystem. These
measurement efforts can also assist in restoration projects or ensure environmental
standards are being met. Many researchers have worked on prediction/forecasting
of water quality. However, more work needs to be done in terms of effectiveness,
reliability, accuracy, as well as usability of the current water quality management
methodologies.
Many meta-heuristic algorithms are known today in computer science, including random optimisation, simulated annealing and even greedy algorithms. Evolutionary algorithms are one such family. They are used to discover solutions to problems free of human preconceptions or biases, and their adaptive nature can generate solutions comparable to, and often better than, the best human efforts. They use mechanisms inspired by biological evolution, such as reproduction, mutation, recombination and selection.
The study presents models that forecast faecal coliform, biochemical oxygen demand and chemical oxygen demand one month in advance for the Gangapur reservoir located in the state of Maharashtra. The models are developed with 18 input parameters, viz. temperature (Temp, °C); electrical conductivity general (EC_GEN); electrical conductivity field (EC_FLD; µmho/cm); pH (general and field) (pH_GEN, pH_FLD; pH units); dissolved oxygen (DO; mg/L); total dissolved solids (TDS; mg/L); total coliforms (T-col; MPN/100 mL); total phosphorus (P-Tot; mg P/L); total nitrogen oxidised (NO2 + NO3; mg N/L); ammonia nitrogen (NH3–N; mg N/L); sodium (Na; mg/L); chemical oxygen demand (COD; mg/L); carbonate (CO3; mg/L); chloride (Cl; mg/L); biochemical oxygen demand (BOD3-27, 3 days; mg/L); total alkalinity (ALK-Tot; mg CaCO3/L); and faecal coliform (F-col; MPN/100 mL).
Monthly water quality data collected by the Maharashtra Water Resources Department, Hydrological Data Users Group (HDUG), from March 2001 to January 2015 are used in the present study. The number of sampling points is generally taken equal to the rounded value of the logarithm of the lake area in square kilometres. The surface area of the lake under consideration is about 22.86 km2; therefore, data from a single sampling point are sufficient to represent the lake water quality [12]. Nashik (20°
02′ N, 73° 50′ E) is situated on both banks of the Godavari River, extending in an east–west direction along its banks and those of its tributaries Nasardi, Waghadi and Darna. Nashik is famous for the religious gathering "Kumbh Mela", which adversely affects the environment and public health. The problems arising out of such activities are mainly associated with mass bathing, cloth washing, idol immersion, nirmalya visarjan, etc. Domestic waste is disposed of in the river through nallas in unsewered areas. Outside the municipal boundaries, agricultural activities are carried out on a massive scale on both banks of the Godavari River. Because of social events and the farming activities of vineyards, there is an adverse impact on the river ecosystem, which therefore calls for regular monitoring of the river water quality to define the level of pollution and take immediate remedial measures to restore the quality. Gangapur dam (22.86 km2) is an earthen dam constructed on the Godavari River. Water from the reservoir is used for drinking purposes, irrigation and pisciculture. The Gangapur dam headworks on the river Godavari supply piped water to almost 1.6 million residents of the Nashik Municipal Corporation area [5]; it is the primary source of water for domestic and industrial use in Nashik city. Nashik has sewage treatment plants with a combined capacity of 270.5 m3/day. About 78 effluent-generating industries in MIDC Satpur are just 18 km away from the Gangapur reservoir, and most of them belong to industrial sectors with a water pollution index score of 60 and above [29].
The various water quality parameters that must be monitored for the assessment and prediction of Gangapur lake water quality are physical parameters like temperature, pH and turbidity, together with DO, DO saturation, TDS, total coliform, P, NO2 and NO3, Na, COD, hardness, chlorides, BOD and alkalinity.
We have used software developed under the talent project N° 9800463 entitled “Data
to Knowledge—D2K” funded by the Danish Technical Research Council (STVF)
and the Danish Hydraulic Institute (DHI).
F-col, BOD and COD are core water quality parameters given in the water quality criteria of 2002 [33]. The presence of faecal coliform bacteria in the aquatic environment indicates contamination of the water with the faecal material of man or other animals. BOD measures the approximate amount of biodegradable organic matter present in water and serves as an indicator of the extent of water pollution. Monthly water quality data collected by the Maharashtra Water Resources Department, Hydrological Data Users Group (HDUG), from March 2001 to January 2015 are used in the present study for model verification.
Significant input parameters for all models have been found by genetic programming, which is used for the cause-effect models. The previous concentrations of all significant parameters (t to t-6) are used for the hybrid cause-effect models (cause-effect with time-step models).
F-col(t + 1) = f(EC_FLD, pH_GEN, DO, T-col, P-Tot, COD, BOD, F-col)t    (1)
BOD(t + 1) = f(EC_FLD, pH_GEN, pH_FLD, NO2 + NO3, Na, COD, ALK-Tot, BOD)t    (2)
From these equations, the parameters having more than 2% recurrence are treated as significant parameters. The values of the significant parameters at times t to t-6 may influence the forecasting process [21, 22]. With these significant parameters, GP equations were evolved to develop a relationship between the output at time t + 1 and the significant input parameters with time steps from t to t-6. The control parameters and function sets used for the GP runs are summarised in Tables 1 and 2, respectively. The flow chart for F-col is presented in Fig. 2 based on Eq. 4, and the flow charts for BOD and COD can be developed using Eqs. 5 and 6, respectively. Of the available data sets, 75% of the data is used for training and 25% for testing in all runs.
The maximum initial tree size was restricted to 45, and the maximum tree size was selected to be 15, because GP tends to evolve uncontrollably large trees if the tree size is not limited [22]. A maximum tree size of 15 has another advantage: restricting the size evolves simple expressions that are easy to interpret and contain only 4–8 variables, which are the most significant and comfortable to handle [20–22].
The values of the population size, number of children to be produced, objective type, cross-over rate and mutation rate were fixed by trial and error and by referring to earlier researchers' work [12, 21, 22, 35]. For the GP runs, different simple mathematical operations are used as function sets. Small and simple function sets, as presented in Table 2, are used; GP is very creative at choosing simple functions and creating what it needs by combining them [22]. A simple function set also leads to the evolution of simple GP models, which are easy to interpret. With complex functions, the models are difficult to understand, and they are therefore avoided in the present study.
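The chapter's runs use the D2K software; purely as an illustration of such a GP run, the sketch below uses the gplearn library with the small function set of Table 2 and with placeholder data and control values (gplearn has no direct maximum-tree-size control; its parsimony coefficient plays a similar role).

```python
# Symbolic-regression GP run in the spirit of Tables 1 and 2; population size,
# generations and rates are illustrative stand-ins for the chapter's settings.
import numpy as np
from gplearn.genetic import SymbolicRegressor

X = np.random.rand(150, 8)       # placeholder: 8 significant inputs at time t
y = np.random.rand(150)          # placeholder: F-col at time t + 1

gp = SymbolicRegressor(population_size=500,
                       generations=30,
                       function_set=('add', 'sub', 'mul', 'div', 'sqrt'),
                       p_crossover=0.9,
                       p_subtree_mutation=0.05,
                       parsimony_coefficient=0.01,   # discourages large trees
                       random_state=0)
gp.fit(X[:112], y[:112])         # ~75% for training, as in the chapter
print(gp._program)               # the evolved symbolic expression
```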
2.4 Cross-validation
The data are procured from 2001 to 2015, with one value per month for each parameter; the sample size is comparatively small. Training data and validation data are prepared in advance. Training data are used for learning; validation data are never used for learning. The final evaluation is done on the validation data, which gives a fair judgement of whether the program has acquired an acceptable level of generalisation without overfitting.
Noise and errors are generally involved in the learning data for various reasons. Therefore, if training is conducted until the fitness value (the forecasting error) has reached a minimum, the program has learnt not only what was required to model, the phenomenon of interest, but also the errors in the particular set of data used for training. If the validation data have been used during training, it is no longer possible to measure the robustness of the solution. Therefore, it is considered an excellent practice to prepare a third validation data set (an independent data set), entirely separate from the training set, to assess the model; the more independent the data set, the more robust the solution obtained. Cross-validation is the process used when data sets are small. Part of the data is excluded, and learning is performed with the remainder of the set; the excluded part is then used for the test. This procedure is repeated with different portions excluded from the original data until all of the data have been used. Ten trials with different sets of data were taken for testing, and the mean values of the scores are the index of robustness [11]. Trials were carried out for the F-col cause-effect model with and without k-fold validation. Since GP-evolved equations relating input and output variables might shed
physical insight into the ecological processes involved, they are used to identify the
significant variables [20, 21, 23].
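The repeated exclude-and-test procedure is ordinary k-fold cross-validation; a generic sketch, with the model-fitting and scoring functions left abstract:

```python
# k-fold cross-validation: each part of the (small) data set is excluded once,
# the model is trained on the remainder, and the mean score over the folds
# serves as the index of robustness.
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(X, y, fit, score, k=10):
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=0).split(X):
        model = fit(X[train_idx], y[train_idx])     # learning data only
        scores.append(score(model, X[test_idx], y[test_idx]))
    return np.mean(scores)                          # index of robustness
```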
Selection of the significant input parameters is one of the most critical steps. A large number of inputs may lead to the curse of dimensionality [20–22]. The computational complexity and memory requirements of the model increase with the input dimensionality, which results in an increase in the time needed to build the model. As the number of input parameters increases, more training samples are needed. Adding irrelevant inputs increases the number of local minima in the error surface, which results in poor model accuracy. Interpreting complex models is complicated, and if simple models achieve comparable results, one should select those. In time series, as the lag length increases, the complexity of the model also increases. Thus, the selection of an appropriate set of significant inputs plays an important role, and the GP-evolved equations are again used to identify the significant variables [21, 22]. For the F-col, BOD and COD models, 47, 65 and 54 GP equations, respectively, were evolved for 30-day-ahead forecasting, as shown in Table 3. A similar exercise for the hybrid cause-effect models is shown in Table 5.
Table 3 Number of equations evolved to find significant inputs for cause-effect models
Trial No. | Function set | % of training | F-col | BOD | COD
1. | +, −, *, /, sqrt | 75 | 10 | 12 | 10
2. | +, −, *, /, sqrt | 80 | 05 | 16 | 10
3. | +, −, *, /, sqrt | 85 | 11 | 14 | 13
4. | +, −, *, /, sqrt | 90 | 10 | 10 | 11
5. | +, −, *, / | 75 | 11 | 13 | 10
Total number of equations evolved | | | 47 | 65 | 54
GP evolves equations which contain the most significant variables out of the 18 input parameters. Significance is measured by the number of times a variable is selected in the equations. Table 4 shows a summary of the recurrence of the input variables; the significant parameters are those whose number of terms is more than 2% of the total number of terms in the GP equations (Table 5).
For the faecal coliform model, eight significant input parameters are identified: EC_FLD, pH_GEN, DO, T-col, P-Tot, COD, BOD and F-col(t). For the BOD model, eight significant input variables are determined: EC_FLD, pH_GEN, pH_FLD, NO2 + NO3, Na, COD, ALK-Tot and BOD, whereas for COD, four significant input variables are selected, viz. Temp, Na, COD and Cl. A summary for the hybrid cause-effect models is presented in Table 6.
Training data and validation data are prepared in advance for the model runs. The final
evaluation is performed with the validation data, providing a reasonable judgement of
whether the program has acquired an acceptable level of generalisation without
overfitting. Cross-validation is executed by excluding part of the data and learning
with the remainder of the data set; the excluded part is then used for the test. This
procedure is repeated with different portions excluded from the original data until all
the data have been used. Thus, the trials were executed ten times with different data
sets for testing and cross-validation [11].
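The leave-part-out procedure described above can be written compactly. The sketch below is a minimal illustration in Python, assuming the data are held in NumPy arrays; train_model and evaluate are hypothetical stand-ins for the GP training and scoring steps and are not part of the original study's code.

import numpy as np

def cross_validate(X, y, train_model, evaluate, k=10, seed=0):
    # Exclude one part in turn, learn on the remainder, test on the
    # excluded part, and average the scores over all k trials.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))           # shuffle the sample indices once
    folds = np.array_split(idx, k)          # k roughly equal parts
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        model = train_model(X[train_idx], y[train_idx])
        scores.append(evaluate(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))           # mean score = index of robustness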
Results of all cause-effect models are shown in Table 7 and Fig. 3a, b, c. The
correlation coefficient (CC), root-mean-square error (RMSE), coefficient of
determination (R²) and coefficient of efficiency (CE) between forecasted and observed
values are presented. The values of the significant parameters at times t to t−6 may
influence the forecasting process [20–22]. Using the significant parameters, GP
equations were evolved to relate the output at time t+1 to the significant variables at
time steps t to t−6. Table 8 and Fig. 2 show the results of the forecasted models.
For both model types, the correlation coefficient (CC), coefficient of determination
(R²), root-mean-square error (RMSE) and coefficient of efficiency (CE) are used in the
present study to evaluate the performance of the various models generated by GP. The
correlation coefficient is selected as the degree-of-collinearity criterion of
forecasting quality and has been widely used for model evaluation.
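All four measures can be computed directly from the observed and forecasted series. The following is a minimal sketch, assuming R² is taken as the square of CC and CE as the Nash–Sutcliffe coefficient of efficiency, which are the usual conventions in hydrological modelling; it is illustrative rather than the study's own code.

import numpy as np

def performance_metrics(observed, forecasted):
    # CC: Pearson correlation between observed and forecasted series
    o = np.asarray(observed, dtype=float)
    f = np.asarray(forecasted, dtype=float)
    cc = np.corrcoef(o, f)[0, 1]
    rmse = np.sqrt(np.mean((o - f) ** 2))        # root-mean-square error
    r2 = cc ** 2                                 # coefficient of determination
    ce = 1.0 - np.sum((o - f) ** 2) / np.sum((o - o.mean()) ** 2)  # efficiency
    return {"CC": cc, "RMSE": rmse, "R2": r2, "CE": ce}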
Table 4 Recurrence and contribution factor of each parameter in all equations for the BOD, COD and F-col cause-effect models

BOD model:
Input variable   Recurrence   Contribution factor in %
Temp             5            1.19
EC_GEN           5            1.19
EC_FLD           9            2.14
pH_GEN           22           5.23
pH_FLD           9            2.14
T-col            2            0.47
P-Tot            4            0.95
NO2+NO3          27           6.42
NH3-N            1            0.23
Na               27           6.42
F-col            0            0
COD              62           14.76
CO3              1            0.23

COD model:
Input variable   Recurrence   Contribution factor in %
Temp             63           23.07
EC_GEN           1            0.36
EC_FLD           0            0
pH_GEN           2            0.73
pH_FLD           0            0
T-col            1            0.36
P-Tot            0            0
NO2+NO3          5            1.83
NH3-N            0            0
Na               6            2.19
F-col            1            0.36
COD              170          62.27
CO3              0            0

F-col model:
Input variable   Recurrence   Contribution factor in %
EC_FLD           29           8.73
pH_GEN           25           7.53
pH_FLD           5            1.50
DO               9            3.20
DO_Sat.%         3            0.90
TDS              3            0.90
T-col            88           26.50
P-Tot            33           9.93
NO2+NO3          4            1.20
Na               2            0.60
COD              9            2.71
Cl               4            1.20
BOD              58           17.46
Table 5 Number of equations evolved to find significant inputs for hybrid cause-effect models

Trial No.  Function set            % of training   No. of equations evolved
                                                   F-col   BOD   COD
1.         +, −, *, /              75              10      12    10
2.         +, −, *, /, sqrt        80              12      11    11
3.         +, −, *, /              85              10      10    10
4.         +, −, *, /, pow(x, 2)   90              10      11    10
Total number of equations evolved                  42      44    41
Fig. 3 Cause-effect model results, observed values versus GP forecasts: a F-col model (MPN/100 mL), b BOD model (mg/L), c COD model (mg/L); time in months
Fig. 4 a Time series plot of hybrid BOD model (with an observed time lag), b time series plot of
hybrid BOD model (after executing time lag correction)
Fig. 5 a Time series plot of the hybrid COD model (with an observed time lag), b time series plot
of the hybrid COD model (after executing time lag correction)
network performance in daily flow predictions [36]. In the present study, SA is used
to estimate the error frequency of all BOD models with time steps and of the COD model
with time steps. Table 9 indicates the average repetition time of the error cycle for
the hybrid cause-effect models of BOD and COD. A trial version of XLSTAT is used to
perform SA.
Runs were executed with a transformed data set obtained by differencing, converting
each ith element into its difference from the (i−k)th element. The correction is applied
to the respective data set, and new models were developed with the corrected data sets
for GP. It was observed that the time lag is removed and the relationship between input
and output is better mapped. Time series plots before and after removal of the time lag
for the hybrid cause-effect BOD and COD models are presented in Figs. 4a, b and 5a, b.
Results of both models before and after the time lag correction are shown in Table 10.
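The differencing transform used above can be sketched in a few lines. This is a minimal illustration, not the study's own implementation, of converting each ith element into its difference from the (i−k)th element and of mapping differenced forecasts back to the original scale.

import numpy as np

def difference(series, k=1):
    # d[i] = x[i] - x[i-k]; the first k elements have no (i-k)th
    # predecessor and are dropped.
    s = np.asarray(series, dtype=float)
    return s[k:] - s[:-k]

def undifference(diffed, series, k=1):
    # Invert the transform: x[i] = d[i] + x[i-k], so forecasts made on
    # the differenced scale can be mapped back to the original scale.
    s = np.asarray(series, dtype=float)
    return diffed + s[:-k]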
Time series plot for hybrid cause-effect models for F-col is presented in Fig. 6.
Fig. 6 Time series plot of the hybrid cause-effect F-col model (F-col in MPN/100 mL; time in months)
References
1. Azhagesan R (1999) Water quality parameters and water quality standards for different uses.
National Water Academy Report
2. Banzhaf W, Nordin P, Keller RE, Francone FD (1998) Genetic programming: an introduction; on the automatic evolution of computer programs and its applications. Morgan Kaufmann Publishers Inc, San Francisco, California
3. Bartram J, Ballance R, World Health Organization & United Nations Environment Programme
(1996). Water quality monitoring: a practical guide to the design and implementation of fresh-
water quality studies and monitoring programs. In: Bartram J, Ballance R (eds). E & FN Spon,
London. http://www.who.int/iris/handle/10665/41851
4. Brameier M (2004) On linear genetic programming. Ph.D. thesis, University of Dortmund. https://pdfs.semanticscholar.org/31c8/a5e106b80c07c1c0f74bcf42de6d24de2bf1.pdf
5. Chavan A, Sharma MP, Bhargava R (2009) Water quality assessment of the Godavari river. National conference on hydraulics. HydroNepal J Water Energy Environ 1:31–34. https://doi.org/10.3126/hn.v5i0.2483
6. Coppola E, Rana A, Poultonx M, Szidarovszky F, Uhl V (2005) A neural network model for
predicting aquifer water level elevations. Ground Water 43(2):231–243
7. Dawson CW, Wilby RL (1999) Hydrological modelling using artificial neural networks. Prog
Phys Geogr Earth Environ 25(01):80–108. https://doi.org/10.1177/030913330102500104
8. Dogan E, Koklu R, Sengorur B (2009) Modelling biological oxygen demand of the Melen River
in Turkey using an artificial neural network technique. J Environ Manage 90(2):1229–1235
9. Francone FD, Markus C, Banzhaf W, Nordin P (1999) Homologous crossover in genetic
programming. Proc Genet Evol Comput Conf 2:1021–1026
10. Guven A (2009) Linear genetic programming for time-series modelling of daily flow rate. J
Earth Syst Sci 118(02):137–146
11. Iba H, Hasegawa Y, Paul TK (2009) Applied genetic programming and machine learning. CRC Press International Series on Computational Intelligence, Boca Raton
12. Jadhav MS, Khare KC, Warke AS (2015) Water quality prediction of Gangapur reservoir
(India) using LS-SVM and genetic programming. Lakes Reservoirs Res Manag 20(04):275–
284. https://doi.org/10.1111/lre.12113
13. Jadhav MS, Khare KC, Warke AS (2014) Selection of significant input parameters for water
quality prediction-a comparative approach. Int J Res Advent Technol 2(03):81–90
14. Khovanova NA, Shaikhina T, Mallick KK (2015) Neural networks for analysis of trabecular
bone in osteoarthritis. Bioinspired, Biomimetic Nanobiomaterials 4(1):90–100
15. Koza JR (1992) Genetic programming: on the programming of computers using natural
selection. A Bradford book. MIT Press, Cambridge, Massachusetts, London, England
16. LeBaron B, Weigend AS (1998) A bootstrap evaluation of the effect of data splitting on financial time series. IEEE Trans Neural Networks, pp 213–220
17. Lermontov A, Yokoyama L, Lermontov M, Machado MAS (2009) River quality analysis using
fuzzy water quality index: Ribeira do Iguape river watershed, Brazil. Ecol Ind 9(6):1188–1197
18. Londhe SN, Dixit PR (2012) Genetic programming—new approaches and successful applica-
tions. In: Soto SV (ed) 8/12. In Tech Publications
19. Londhe S, Charhate S (2010) Comparison of data-driven modelling techniques for river flow
forecasting. Hydrol Sci J 55(7):1163–1174
20. Muttil N, Chau K (2007) Machine learning paradigms for selecting ecologically significant
input variable. Eng Appl Artif Intell 20(06):735–744. https://doi.org/10.1016/j.engappai.2006.
11.016
21. Muttil N, Chau K (2006) Neural network and genetic programming for modelling coastal algal
blooms. Int J Environ Pollut 28(3–4):223–238. https://doi.org/10.1504/IJEP.2006.011208
22. Muttil N, Lee JHW (2005) Genetic programming for analysis and real-time prediction of coastal
algal blooms. Ecol Model 189(03):363–376. https://doi.org/10.1016/j.ecolmodel.2005.03.018
23. Muttil N, Lee JHW, Jayawardena AW (2004) Real-time prediction of coastal algal blooms
using genetic programming. In: 6th international conference on hydro informatics. Singapore,
pp 890–897. https://doi.org/10.1142/9789812702838_0110
24. Najah A, Elshafie A, Karim OA, Jaffar O (2009) Prediction of johor river water quality
parameters using artificial neural networks. Eur J Sci Res 28(3):422–435
25. Nordin JP (1997). Evolutionary program induction of binary machine code and its application.
Ph.D. dissertation, Department of Computer Science, University of Dortmund
26. Palani S, Liong S-Y, Tkalich P (2008) An ANN application for water quality forecasting. Mar
Pollut Bull 56:1586–1597
27. Preis A, Ostfeld A (2008) A coupled model tree–genetic algorithm scheme for flow and water
quality predictions in watersheds. J Hydrol 349:364–375
28. Recknagel F, Cao H, Kim B, Takamura N, Welk A (2006) Unravelling and forecasting algal
population dynamics in two lakes different in morphometry and eutrophication by neural and
evolutionary computation. Ecol Inform 1(2):133–151
29. Sawant R (2015). A comprehensive study of polluted river stretches and preparation of action
plan of river Godavari from Nashik downstream to Paithan. The report, Aavanira Biotech
P. Ltd., Maharashtra Pollution Control Board. http://mpcb.gov.in/ereports/pdf/GodavariRiver_
ComprehensiveStudyReport.pdf
30. Shiklomanov I (1993) Water in crisis: a guide to the world’s freshwater resources. In: Gleick
PH (ed). Oxford University Press, New York, pp 13–25, https://www.academia.edu/902661/
Water_in_Crisis_Chapter_2_Oxford_University_Press_1993
31. Tikhe SS, Khare KC, Londhe SN (2015) Multicity seasonal air quality index forecasting using
soft computing techniques. Adv Environ Res 4(02):83–104. https://doi.org/10.12989/aer.2015.
4.2.083
32. Shaikhina T, Khovanova NA (2017) Handling limited datasets with neural networks in med-
ical applications: a small-data approach. Artif Intell Med 75:51–63. https://doi.org/10.1016/j.
artmed.2016.12.003
33. US Environmental Protection Agency (2009). Technical assistant document for reporting of
daily air quality—air quality index. Research Triangle Park, North Carolina
34. Wang W, Chau K, Xu D, Chen X (2015) Improving forecasting accuracy of annual runoff time
series using ARIMA based on EEMD decomposition. Water Resour Manage 29(08):2655–
2675. https://doi.org/10.1007/s11269-015-0962-6
35. Water Quality Criteria (2002) https://www.epa.gov/sites/production/files/2018-12/documents/
national-recommended-hh-criteria-2002.pdf
36. Whigham PA, Recknagel F (1999) Predictive modelling of plankton dynamics in fresh-
water lakes using genetic programming. The Information Science Discussion Paper Series.
Department of Information Science, University of Otago, Dunedin, New Zealand, pp 1–7
37. Wu CL, Chau KW, Li YS (2009) Methods to improve neural network performance in daily
flows. J Hydrol 372(1–4):80–93
38. Xiang, Y, Jiang L (2009) Water quality prediction using LS-SVM and particle swarm optimiza-
tion. In: Conference proceedings of the second international workshop on knowledge discovery
and data mining, WKDD 2009, Moscow, Russia. https://doi.org/10.1109/wkdd.2009.217
Chapter 6
A Survey on the Latest Development
of Machine Learning in Genetic
Algorithm and Particle Swarm
Optimization
D. K. Sarmah (B)
Symbiosis Institute of Technology, Symbiosis International (Deemed University), 412115 Pune,
MH, India
e-mail: [email protected]
As depicted in Fig. 1, there are five categories of optimization algorithms. The first
two are single-variable optimization algorithms [69] and multi-variable optimization
algorithms [69]. Single-variable optimization algorithms are classified into two groups:
(a) direct methods [113] and (b) gradient-based methods [113]. Direct methods use only
the values of the objective function to guide the search process; no derivative
information of the objective function is involved. The gradient-based methods, in
contrast, utilize first-order/second-order derivatives to guide the search. Very few
single-variable optimization problems exist in real scenarios; hence, multi-variable
optimization algorithms are of greater practical importance. These algorithms are
likewise partitioned into (a) direct and (b) gradient-based techniques. The third
category comprises constrained optimization algorithms [7], which attempt to identify
the optimal solution within a feasible search region and are most often used to solve
engineering optimization problems. The fourth category covers specialized optimization
problems, classified into integer programming [71] and geometric programming [44].
Integer programming deals with integer design variables, whereas geometric programming
handles objective functions and constraints written in a particular form. The last
category is described as non-traditional optimization algorithms, namely (a) the genetic
algorithm (GA) [73] and (b) simulated annealing [27].
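To make the direct versus gradient-based distinction concrete, the sketch below contrasts golden-section search (a direct method that queries only objective values) with Newton's method (which uses first- and second-order derivatives) on a simple single-variable problem. It is an illustrative example, not drawn from the cited sources.

import math

def golden_section(f, a, b, tol=1e-6):
    # Direct method: only objective function values are evaluated.
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def newton(df, d2f, x, tol=1e-9, max_iter=50):
    # Gradient-based method: derivatives guide the search directly.
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

f = lambda x: (x - 2) ** 2 + 1                              # minimum at x = 2
print(golden_section(f, 0.0, 5.0))                          # ~2.0, derivative-free
print(newton(lambda x: 2 * (x - 2), lambda x: 2.0, x=0.0))  # 2.0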
The broad categories of techniques for solving engineering optimization problems are
heuristics [63] and metaheuristics [45]. Heuristic techniques often get trapped in local
optima because they are strongly problem dependent and exploit all the problem
parameters and specifications. Metaheuristic algorithms, on the other hand, are not
problem dependent; they explore the solution space more thoroughly to obtain a better
solution and can be used as black boxes.

Fig. 1 Categories of optimization algorithms: single-variable and multi-variable algorithms (each with direct and gradient-based methods), constrained optimization algorithms, specialized optimization problems (integer programming and geometric programming) and non-traditional algorithms (genetic algorithms and simulated annealing)
There is no assurance that metaheuristic algorithms reach the global optimal solution,
in contrast to exact iterative methods. They are nevertheless very useful for
identifying near-optimal solutions to real-world combinatorial problems, since they can
search very large sets of feasible solutions efficiently. These algorithms are generally
referred to as nature-inspired optimization techniques [126], which are quite popular
among researchers nowadays. Such techniques comprise collections of algorithms that seek
inspiration from various phenomena observed in nature. As shown by Kumar et al. [65],
the broad category of nature-inspired optimization algorithms is partitioned into three
groups: (a) bio-inspired, (b) swarm intelligence and (c) physical–chemical systems.
These techniques are successfully used to solve NP-hard problems.
The promising algorithms under each category are listed in Fig. 2. The recognition of
these algorithms has increased owing to their success in finding optimal solutions to
complex, real-world computational problems, although each of them has its own
limitations.
On the other side, there is the emergent field of ML, which shows potential for solving
computationally demanding real-world problems. The DL algorithm is one of the strongest
developments of this field; it is rapidly gaining importance and transforming real-world
practice by enhancing the performance of computer-based procedures. ML is a sub-branch
of computational intelligence/soft computing which works on principles inspired by the
human mind. It attempts to find good solutions to NP-hard problems in polynomial time,
which is often challenging for existing exact algorithms. As shown in Fig. 3, the
science of soft computing is divided into four components: (a) ML, (b) fuzzy logic,
(c) evolutionary computation and (d) methods involving probability computations.
ML algorithms are also referred to as predictive algorithms, which design a mathematical
model centred on certain training data. The nature of these algorithms varies with the
allotted task, problem, input and output, and they are categorized into three groups:
(a) supervised learning [119], (b) unsupervised learning [33] and (c) reinforcement
learning [78], as depicted in Fig. 4.

Fig. 3 Components of soft computing

The common functionality of any group of ML algorithms is to mimic human common sense in
identifying hidden characteristics or features for analysing new data. Supervised
learning algorithms work on pairs of input data and required output(s), referred to as
training examples, which help to prepare a mathematical model to predict new data or to
improve the precision of its outputs.
Supervised learning is classified into two subgroups, i.e. classification and
regression. The second group, unsupervised learning algorithms, deals with input data
only; there are no corresponding output values. Such algorithms build a mathematical
model by identifying common features of the raw data and learn from the occurrence of
those features in each new bit of data. Unsupervised learning is partitioned into three
subgroups, i.e. clustering, associative and dimensionality reduction. Clustering is used
when similar types of data need to be grouped into one cluster; K-means [38] and
K-nearest neighbours (KNNs) [130] are well-known algorithms used here. The second
subgroup, associative learning, is used to find the closeness or togetherness of
frequently used items; the Apriori algorithm is the acknowledged algorithm in this
category. The third subgroup, dimensionality reduction, is employed for complex problems
involving thousands of input parameters. The well-known algorithm here is principal
component analysis (PCA) [52], which projects high-dimensional input parameters onto
fewer dimensions (for example, two-dimensional inputs onto one dimension). The third
group, reinforcement learning, focuses on maintaining a balance between exploration and
exploitation, with decisions taken by the system based on the preceding action
performed. It is divided into three subgroups: (a) deep reinforcement learning [80],
(b) inverse reinforcement learning [2] and (c) apprenticeship learning [85]. The
practical and most common use cases of these three groups are depicted in Figs. 5, 6
and 7.
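As a concrete illustration of two of these subgroups, the sketch below clusters synthetic data with K-means and reduces it from two dimensions to one with PCA. It assumes scikit-learn is available and is only an illustrative example, not taken from the surveyed works.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),       # two artificial groups of points
               rng.normal(5, 1, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # clustering
X_1d = PCA(n_components=1).fit_transform(X)     # dimensionality reduction: 2-D to 1-D

print(labels[:5], X_1d[:3].ravel())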
The widely used ML algorithms are linear regression, logistic regression,
clustering/K-means, support vector machine (SVM) [128], decision trees [62], Naïve
Bayes [116], etc. The complex and advanced form of ML is referred to as DL, which
employs the concept of the neural network (NN) [118]. A NN is a model used in ML that
solves complex problems by modelling the data using neurons; intelligent decisions are
made by structuring the NN in a layered form. One of the simplest forms of NN is the
artificial neural network (ANN) [128, 129], which consists of three layers of neurons:
(a) input layer, (b) hidden layer and (c) output layer. DL is considered a subset of ML
in which models use multiple layers to extract high-level features; such neural networks
are recognized as deep neural networks (DNNs) [15]. DL algorithms play a very important
role in many of these applications.
Figs. 5, 6 and 7 Common use cases of the three ML groups, including clustering (market segmentation, social graph analysis), data mining, computer vision (object detection, facial recognition), dimensionality reduction with PCA (image compression, bioinformatics), neural networks, deep learning and robotics
Two widely used DL architectures are the convolutional neural network (CNN) [47] and
the recurrent neural network (RNN) [81]. A DNN typically follows the strategy of a
feed-forward network (FFN), where data moves from the input layer to the output layer
without looping back. In an RNN, data can move in either direction, forward or backward.
RNNs are mainly used to solve sequential problems of the types (a) one-to-many,
(b) many-to-one and (c) many-to-many. The most common applications of RNNs are
handwriting recognition, speech recognition, natural language processing, sentiment
analysis, question answering, anomaly detection in time series, log data analysis (Web
data), sensor data analysis (time series), video classification, etc. CNN, on the other
side, is mostly used for image data; this architecture can be applied to any prediction
problem, such as classification or regression, where image data is the input. A variety
of processes/algorithms can be applied to single or multiple types of ML algorithms to
improve their performance. However, no single algorithm can solve all types of problems,
as every algorithm has its own pros and cons. Thus, there is a need to explore further
how ML algorithms match particular real-world applications. The efficiency and
optimality of algorithms on a particular application also open a new direction for
researchers: hybridizing nature-inspired optimization algorithms with ML.
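The difference in data movement between the two architectures can be made concrete with a few lines of NumPy. The sketch below is an illustrative toy, not a production architecture, contrasting a single feed-forward pass with a recurrent pass whose hidden state loops back across time steps.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
Wx = rng.normal(0, 0.1, (n_hid, n_in))     # input-to-hidden weights
Wh = rng.normal(0, 0.1, (n_hid, n_hid))    # hidden-to-hidden (recurrent) weights

def feed_forward(x):
    # FFN: data moves from input towards output without looping back.
    return np.tanh(Wx @ x)

def recurrent(sequence):
    # RNN: the hidden state is fed back, so each step sees past context;
    # returning only the final state gives a many-to-one mapping.
    h = np.zeros(n_hid)
    for x in sequence:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

seq = [rng.normal(size=n_in) for _ in range(5)]
print(feed_forward(seq[0]).shape, recurrent(seq).shape)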
Fig. Deep learning architectures, including the deep stacking network (DSN)
Hsieh et al. [41] integrated GA and NN to build a financial early warning system and
compared it with the back-propagation neural network (BPNN), logistic regression
analysis (LR) and quadratic discriminant analysis (QDA). To validate the results,
financial information from different banks of the Taiwanese banking industry, collected
from 1998 to 2002, is used. Furthermore, Azadeh et al. [4] proposed a new technique to
estimate electrical energy consumption by integrating GA and ANN, considering a case
study of the Iranian agriculture sector from 1981 to 2005. Several parameters are used
to predict electricity demand: GA is applied to tune these parameters, and ANN is used
for forecasting the electricity consumption rate. The results are validated against
regression analysis and a time series approach. One of the combinatorial NP-hard
optimization problems, job shop scheduling, is efficiently solved by hybridizing GA and
ML. Lee et al. [72] combined the strengths of GA and ML to build a system that solves
this
optimization problem; the obtained results are quite satisfactory in comparison with
contemporary methods. Further, genetic programming (GP) has been used to predict stock
prices.
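Since several of the hybrids discussed below rest on the same GA machinery, a minimal real-coded GA loop is sketched here for orientation. The operators chosen (tournament selection, uniform crossover, Gaussian mutation) are common defaults rather than the configurations used in the surveyed papers, and the fitness function is a toy stand-in for a model score.

import random

def genetic_algorithm(fitness, n_genes, pop_size=20, generations=50,
                      p_mut=0.1, lo=-1.0, hi=1.0):
    # Minimal real-coded GA: tournament selection, uniform crossover,
    # Gaussian mutation; `fitness` is maximized.
    pop = [[random.uniform(lo, hi) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = random.sample(pop, 2)
            return a if fitness(a) > fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [g1 if random.random() < 0.5 else g2
                     for g1, g2 in zip(p1, p2)]             # uniform crossover
            child = [g + random.gauss(0, 0.1) if random.random() < p_mut else g
                     for g in child]                        # Gaussian mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy stand-in for a model score: best parameters approach (0.3, -0.7)
best = genetic_algorithm(lambda p: -(p[0] - 0.3) ** 2 - (p[1] + 0.7) ** 2,
                         n_genes=2)
print(best)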
Kaboudan [57] proposed a profitable prediction approach in which GP is utilized to
develop regression models that drive a single day-trading strategy (SDTS). The proposed
work was validated on six stocks over 50 successive trading days and experimentally
produced high returns on investment in comparison with similar approaches. The
combination of GA and NN further helped to optimize both the topology and the weights of
a feed-forward network [114]. A hybrid intelligent algorithm was developed by Ding et
al. [25] to estimate river water quality by combining three techniques, i.e. principal
component analysis (PCA) [125], GA [6] and the back-propagation neural network (BPNN)
[10]. It is observed that merging these three techniques predicts the water quality
accurately, which further helps to reduce the associated real-time risk [60]. Carbonne
et al. [17] worked on the problem of profile identification, where sorting a number of
profiles contextually and recognizing the profile of a particular individual are quite
challenging. The authors developed a framework by customizing the vector space model;
GA is then applied to train this model and to identify and compare the similarity of
two profiles. Experiments showed the method to be suitable for profile clustering and
for finding the similarity between profiles.
Further, a review study by Chiroma et al. [18] addresses the optimization of NNs through
GA. The authors analysed several NN design issues and the limitations of solving complex
problems by employing GA, and presented the state of the art. Furthermore, Such et al.
[107] investigated the question "Is GA suitable for training a deep artificial neural
network (DANN)?" The authors used a population-based GA to gradually evolve the weights
of a DNN. The paper validated this question by applying GA to deep reinforcement
learning (DRL) benchmark problems such as Atari 2600 [12, 14, 77] and Humanoid
Locomotion in the MuJoCo simulator [14, 99, 100, 111], obtaining satisfactory results in
comparison with existing algorithms. GA has also been applied with ML for visual
recognition tasks. Tian et al. [112] proposed an approach that optimizes the CNN model
for different visual data sets; the optimization is done by GA using pre-trained CNN
models as the population. Experimental results prove the efficiency of the proposed
framework in comparison with contemporary techniques. GA is further applied with ML to
optimize the travel time between each pair of locations and identify the best travel
route [70]. In this article, a tree-based ML model, the standard XGBoost model, is
applied to a huge data set to handle various categorical features; GA is then applied
to this trained model to plan the optimal journey. This work could be further extended
by incorporating the Google Maps API for route planning.
Further, Kaur et al. [56] developed a system for English character recognition by
hybridizing NN and GA, where a back-propagation algorithm is used with GA to work on the
extracted features of characters. On the other side, Sun et al. [108] presented a novel
method to identify a suitable CNN architecture for an image classification problem, with
GA employed to discover the architecture. The proposed work is validated on benchmark
data sets by comparing with contemporary methods.
Table 2 A list of applications as per state of the art by combining PSO and ML
Application Author Year References
To diagnose unexplained syncope Gao et al. 2006 [30]
Shortest path problems Lima et al. 2009 [48]
Time series forecasting Neto et al. 2009 [84]
To optimize the input weights and hidden biases of single-hidden-layer feed-forward neural networks (SLFN) Han et al. 2012 [37]
To detect breast cancer Zhang et al. 2012 [127]
Gender classification of real-world face images Nazir et al. 2014 [83]
Representations of feature construction Dai et al. 2014 [22]
Accuracy detection for intrusion attacks Bamakan et al. 2015 [8]
Designing artificial neural network (ANN) Garro et al. 2015 [29]
Detecting travel mode Xiao et al. 2015 [120]
To optimize DL parameters using PSO Qolomany et al. 2017 [89]
High performing robot controllers Mario et al. 2017 [74]
Hyperparameter selection method Ye 2017 [124]
Twitter application Jayasekara 2018 [50]
Web spamming Singh et al. 2018 [97]
Predicting voltage instability Ibrahim et al. 2018 [46]
To diminish lung nodule false positives on computed tomography scans Silva et al. 2018 [98]
To develop a multi-criteria recommender system Hamada et al. 2018 [36]
Real-world NP-hard problem Ding et al. 2019 [24]
Tunnel settlement forecasting Hu et al. 2019 [76]
Further, Neto et al. [84] proposed a model integrating PSO and ANN to solve time series
forecasting problems, as ANN works well in forecasting systems where decision-making is
involved. In this research, the ANN parameters are adjusted efficiently using PSO, and
six real-world time series are used for testing to analyse the results. In 2012, Han et
al. [37] presented an improved PSO applied to the extreme learning machine (ELM) to
optimize the input weights and hidden biases of single-hidden-layer feed-forward neural
networks (SLFN); the proposed work was found to be more efficient than traditional ELM
methods. Further, Zhang et al. [127] developed a novel NN classifier to detect breast
cancer, improving the efficiency of traditional classifier methods by combining floating
centroid methods with PSO; testing was performed on a UCI ML data set. Also, a novel
method was developed by Nazir et al. [83] for gender classification of real-world face
images in an unconstrained setting. The local features of an image are extracted through
local binary patterns (LBP), and the classification accuracy is improved by merging the
extracted features with clothing features. The PSO algorithm and GA are selected to
identify the optimal number of features, which are treated as input for a support vector
machine (SVM). Experimentally, an improvement in classification accuracy is observed
compared with existing methods.
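For orientation, the canonical PSO loop introduced by Kennedy and Eberhart [58] is sketched below. The inertia and acceleration coefficients are typical textbook defaults rather than values from any of the surveyed papers, and the sphere objective is a toy stand-in for, e.g., a model's validation error.

import numpy as np

def pso(objective, dim, n_particles=30, iters=100,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0, seed=0):
    # Canonical PSO minimizer with the standard velocity update
    # v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x).
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))     # particle positions
    v = np.zeros_like(x)                            # particle velocities
    pbest = x.copy()
    pbest_val = np.apply_along_axis(objective, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(objective, 1, x)
        better = vals < pbest_val                   # update personal bests
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()    # update the global best
    return gbest

# Toy objective; in the hybrids above it would be, e.g., a validation error
print(pso(lambda p: float(np.sum(p ** 2)), dim=3))  # approaches [0, 0, 0]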
Further, feature construction is considered an important step by Dai et al. [22]. The
authors proposed two novel representation techniques, PSOFCPair and PSOFCArray, to
overcome the limitations of traditional approaches; experimentally, classification
performance is improved by identifying a new high-level feature, in contrast to existing
methods. Furthermore, PSO is combined with an ML technique to improve the detection
accuracy for intrusion attacks by Bamakan et al. [8]. A novel model is proposed by
merging multiple-criteria linear programming, a classification method, with PSO. The
proposed work is evaluated on the KDD CUP 99 data set, and the results demonstrate
better performance than the two benchmark classifiers mentioned in the paper. An ANN is
designed automatically using the PSO algorithm by Garro et al. [29]; three PSO variants
are employed, namely basic PSO, second-generation particle swarm optimization (SGPSO)
and a new model of PSO called NMPSO. Experimentally, the proposed work exhibits better
efficiency than conventional methods. Xiao et al. [120] proposed a model for detecting
travel mode by merging NN with the PSO algorithm. A smartphone-based travel survey is
conducted covering four travel modes, specifically walk, bike, bus and car; the
positioning data is collected through GPS for testing, resulting in improved accuracy in
comparison with contemporary methods. A research work is proposed by Qolomany et al.
[89] to optimize DL parameters using PSO. Two parameters of any DL network are
considered, i.e. the number of layers in the network and the number of neurons in each
layer; the experimental results showcase optimized tuning of these parameters in
comparison with the grid search method.
Further, Mario et al. [74] proposed a research work on high-performing robot
controllers. The authors considered multi-robot obstacle avoidance as a benchmark
optimization problem and compared the results of PSO with Q-learning. PSO exhibited good
results for certain testing scenarios and for different evaluation parameters such as
performance efficiency, total evaluation time and overall behaviour. In recent work, Ye
[124] worked on hyperparameter selection to optimize values for the network training
phase, considering one important hyperparameter, the learning rate. An efficient
approach is proposed by integrating the advantages of PSO and the steepest gradient
descent algorithm, which allows the model to automatically identify an optimized network
structure for the DNN and to optimize the hyperparameters as well. The several
experiments considered in this work showcase better results than existing frameworks. In
one of the latest reports, by Jayasekara [50], ML and PSO are applied to a Twitter
application. The important concept in this application is feature selection posed as an
optimization problem. Feature selection has several applications, such as data
classification, image classification, cluster analysis, data analysis, image retrieval,
opinion mining and review analysis. Different methods, such as wrapper methods and
filters, are applied to solve this optimization problem; however, the best results are
obtained with PSO. Further, tweet data clustering is performed by applying the PSO
algorithm after pre-processing. Experimental results show that PSO clustering performs
quite satisfactorily in comparison with hierarchical and partitioning clustering
techniques.
There are different applications in which DL is applied. Singh et al. [97] worked on
similar lines, considering the important application of Web spamming, one of the major
challenges for search engines. Optimal feature selection plays a significant role in
reducing Web spam. The authors describe a novel method where PSO is used with the
properties of the correlation-based feature selection (CFS) technique to identify
relevant and optimal features. During experimentation, five classifiers are evaluated on
the Web spam-2006 data set; the results indicate success both in the number of selected
features and in accuracy. The challenging application of predicting voltage instability
is considered by Ibrahim et al. [46]. In their paper, the recurrent neural network (RNN)
[88] is investigated and the PSO algorithm is applied to train the RNN for projecting
voltage instability; to validate the effectiveness of the proposed work, the
back-propagation (BP) algorithm [10] is also applied to train the RNN and the results
are compared. PSO also works efficiently with CNN to diminish lung nodule false
positives on computed tomography scans, as proposed by Silva et al. [98]; the efficiency
of the presented work is validated on two databases, the Lung Image Database Consortium
and Image Database Resource Initiative (LIDC-IDRI). Further, PSO is employed with ANN to
develop a multi-criteria recommender system by Hamada et al. [36]; a multi-criteria data
set of movie recommendations is selected for experiments, demonstrating high prediction
accuracy in comparison with recent approaches. Also, Ding et al. [24] highlighted the
limitations of asynchronous and traditional reinforcement learning algorithms for
solving a real-world problem. To address these limitations, the authors applied PSO to
an asynchronous reinforcement learning algorithm to generate the optimal solution; this
novel version is referred to as asynchronous PSO. The authors further developed a new
algorithm based on asynchronous PSO and backward Q-learning, referred to as APSO-BQSA,
whose effectiveness is also evident in the paper. Tunnel settlement forecasting, a major
challenge for construction companies seeking to avoid unexpected disasters, is addressed
by Hu et al. [76]. Identifying the limitations of traditional forecasting methods,
namely model-based methods and artificial intelligence (AI)-enhanced methods, the
authors extended the AI approach by integrating PSO with support vector regression
(SVR), the back-propagation neural network (BPNN) and the extreme learning machine
(ELM). This work is validated experimentally in two large cities of China by forecasting
the settlement of tunnel structures.
It can be observed from the state of the art that PSO is a powerful technique for
solving real-world optimization problems. Moreover, when joined with ML techniques, it
outperforms in several real-time applications and shows improvement in many ways, as the
latest applications solved by PSO demonstrate.
The study reveals various real-world and practical applications in which GA or PSO is
applied with ML techniques to enhance solution efficiency or to identify an optimized
solution to a complex problem. In this discussion, the advantages of both algorithms for
solving a problem are evaluated and elaborated. As explained in the methodologies
(Sect. 2), by combining ML with either PSO or GA, the drawbacks of each algorithm are
mitigated and an improved solution is obtained. In this paper, a sincere effort is made
to help researchers understand the concepts more thoroughly by citing and explaining
most of the modern research related to these fields. In today's world, there are
thousands of applications in banking and financial services, government services,
education, health care, transportation, etc., which directly or indirectly use
optimization techniques and ML. Therefore, there is a constant demand for improved
optimization techniques along with security. By harnessing the power of ML together with
the highly scalable computing power of today's computers, researchers can give a new
direction to the world. Researchers can also focus on the security aspects of algorithms
and on other nature-inspired optimization techniques combined with ML, so that new
possibilities can be explored.
References
32. Geem ZW (2010) State-of-the-art in the structure of harmony search algorithm. In: Recent
advances in harmony search algorithm, studies in computational intelligence. Springer, Berlin,
Heidelberg
33. Ghahramani Z (2003) Unsupervised learning. Summer school on machine learning. In:
Advanced lectures on machine learning, lecture notes in computer science, 3176, pp 72–112
34. Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn
3(2–3):95–99
35. Goldberg DE, Deb K (1991) A comparative analysis of selection schemes used in genetic algorithms. In: Foundations of genetic algorithms, 1. Morgan Kaufmann Publishers Inc, pp 69–93
36. Hamada M, Hassan M (2017) Artificial neural networks and particle swarm optimization algorithms for preference prediction in multi-criteria recommender systems. Informatics 5(25):1–16
37. Han F, Yao HF, Ling QH (2011) An Improved extreme learning machine based on particle
swarm optimization. Int Conf Intell Comput Bio-Inspired Comput Appl 6840:699–704
38. Harbi SH, Smith VJR (2006) Adapting k-means for supervised clustering. Appl Intell
24(3):219–226
39. Harpham C, Dawson CW, Brown MR (2004) A review of genetic algorithms applied to
training radial basis function networks. Neural Comput Appl 13(3):193–201
40. Harifi S, Khalilian M, Mohammadzadeh J, Ebrahimnejad S (2019) Emperor Penguins colony:
a new metaheuristic algorithm for optimization. Evol Intel 12(2):211–226
41. Hsieh JC, Chang PC, Chen SH (2006) Integration of genetic algorithm and neural network
for financial early warning system: an example of Taiwanese Banking Industry. In: First
international conference on innovative computing, information and control, 1, 30 Aug.–1
Sept. 2006, IEEE, Beijing, China
42. Hosseini HS (2009) The intelligent water drops algorithm: a nature-inspired swarm-based
optimization algorithm. Int J Bio-Inspired Comput 1(1/2):71–79
43. Huan TT, Kulkarni AJ, Kanesan J (2016) Ideology algorithm: a socio-inspired optimization
methodology. Neural Comput Appl 1–32. (https://doi.org/10.1007/s00521-016-2379-4)
44. Huang CH (2013) Engineering design by geometric programming. Mathematical problems
in engineering, 2013, Article ID 568098, pp 1–8
45. Hussain K, Salleh MNM, Cheng S, Shi Y (2018) Metaheuristic research: a comprehensive
survey. Artif Intell Rev (https://doi.org/10.1007/s10462-017-9605-z)
46. Ibrahim AM, El-Amary NH (2018) Particle Swarm Optimization trained recurrent neural
network for voltage instability prediction. J Electr Syst Inform Technol 5(2):216–228
47. Indolia S, Goswami AK, Mishra SP, Asopa P (2018) Conceptual understanding of convolu-
tional neural network-a deep learning approach. Procedia Comput Sci 132:679–688
48. Iima H, Kuroe Y (2009) Swarm reinforcement learning algorithm based on particle swarm
optimization whose personal bests have lifespans. In: International conference on neural
information processing, part of the lecture notes in computer science book series, 5864, pp
169–178
49. Javid AA (2011) Anarchic society optimization: a human-inspired method. In: Evolutionary
computation, CEC, 2011 IEEE Congress, IEEE, New Orleans, USA, pp 2586–2592
50. Jayasekara D (2018) Machine learning—particle swarm optimization (PSO) and Twit-
ter, https://medium.com/pythondatasciencezerotohero/machine-learning-particle-swarm-
optimization-pso-and-twitter-c952a9ace499
51. Jones AJ (1993) Genetic algorithms and their applications to the design of neural networks.
Neural Comput Appl 1(1):32–45
52. Jolliffe IT (2002) Introduction. In: Principal component analysis, Springer series in statistics.
Springer, New York, NY, pp 1–9
53. Joshi AS, Kulkarni O, Kakandikar GM, Nandedkar VM (2017) Cuckoo search optimization-a
review. Mater Today: Proc 4(8):7262–7269
54. Jung SH (2003) Queen-bee evolution for genetic algorithms. Electron Lett 39(6):575–576
55. Kashan AH (2009) League championship algorithm: a new algorithm for numerical func-
tion optimization. In: International conference on soft computing and pattern recognition,
SOCPAR09, IEEE, Singapore, pp 43–48
56. Kaur R, Singh B, Gobindgarh I, Sahib BF, Sahib F (2011) A Hybrid neural approach for
character recognition system. Int J Comput Sci Inform Technol 2(2):721–726
57. Kaboudan MA (2000) Genetic programming prediction of stock prices. Comput Econ
16(3):207–236
58. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of IEEE
international conference on neural networks, 4, IEEE, pp 1942–1948
59. Karaboga D, Basturk B (2007) A powerful and efficient algorithm for numerical function
optimization: artificial bee colony (ABC) algorithm. J Global Optim 39:459–471
60. Krausmann E, Cruz AM, Salzano E (2017) Chapter 14—reducing natech risk: organizational
measures. Natech Risk Assessment and Management, Reducing the Risk of Natural-Hazard
Impact on Hazardous Installations, pp 227–235
61. Krishnanand KN, Ghose D (2006) Glowworm swarm based optimization algorithm for
multimodal functions with collective robotics applications. Multiagent Grid Syst Int J
2:209–222
62. Kotsiantis SB (2013) Decision trees: a recent overview. Artif Intell Rev 39(4):261–283
63. Kleining G, Witt H (2000) The Qualitative Heuristic approach: a methodology for discovery in
psychology and the social sciences. Rediscovering the method of introspection as an example.
Forum Q Soc Res 1(1), Article 13
64. Koehn P (1994) Combining genetic algorithms and neural networks: the encoding problem. Master of Science thesis, The University of Tennessee, Knoxville, pp 1–67
65. Kumar M, Kulkarni AJ, Satapathy SC (2018) Socio evolution & learning optimiza-
tion algorithm: a socio-inspired optimization methodology. Fut Generation Comput Syst
81:252–272
66. Kulkarni AJ, Durugkar IP, Kumar M (2013) Cohort intelligence: a self supervised learning
behavior. In: Systems, man, and cybernetics, SMC, IEEE international conference. IEEE,
Manchester, UK, pp 1396–1400
67. Kumar V, Chhabra JK, Kumar D (2015) Differential search algorithm for multiobjective
problems. Procedia Comput Sci 48:22–28
68. Kuo HC, Lin CH (2013) Cultural evolution algorithm for global optimizations and its
applications. J Appl Res Technol 11(4):510–522
69. Lindfield GR, Penny JET (2012) 8—optimization methods. In: Numerical methods 3rd edn.
Science Direct, pp 371–432
70. Lazovskiy V (2018) Travel time optimization with machine learning and genetic algo-
rithm. Towards Data Science (https://towardsdatascience.com/travel-time-optimization-with-
machine-learning-and-genetic-algorithm-71b40a3a4c2)
71. Louveaux Q, Skutella M (2016) Integer programming and combinatorial optimization. In:
18th international conference, IPCO 2016, Liège, Belgium, June 1–3, 2016, proceedings, part
of the lecture notes in computer science, 9682
72. Lee CV, Piramuthu S, Tsai YK (2010) Job shop scheduling with a genetic algorithm and
machine learning. Int J Prod Res 35(4):1171–1191
73. Man KF, Tang KS, Kwong S (1996) Genetic algorithms: concepts and applications [in
engineering design]. IEEE Trans Industr Electron 43(5):519–534
74. Mario ED, Talebpour Z, Martinoli A (2013) A Comparison of PSO and reinforcement learning
for multi-robot obstacle avoidance. IEEE Congress on Evolutionary Computation, Cancún,
México, June 20–23, 2013, pp 149–156
75. Marinakisa Y, Marinaki M (2014) A bumble bees mating optimization algorithm for the open
vehicle routing problem. Swarm Evol Comput 15:80–94
76. Hu M, Li W, Yan K, Ji Z, Hu H (2019) Modern machine learning techniques for univariate tunnel settlement forecasting: a comparative study. Math Probl Eng 2019:1–12
77. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller
M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement
learning. Nature 518:529–541
78. Moerland TM, Broekens J, Jonker CM (2018) Emotion in reinforcement learning agents and
robots: a survey. Mach Learn 107(2):443–480
79. Moosavian N (2015) Soccer league competition algorithm for solving knapsack problems.
Swarm Evol Comput 20:14–22
80. Mousavi SS, Schukat M, Howley E (2016) Deep reinforcement learning: an overview. In:
Proceedings of SAI intelligent systems conference, Lecture notes in networks and systems,
16, pp 426–440
81. Mulder WD, Bethard S, Moens MF (2015) A survey on the application of recurrent neural
networks to statistical language modelling. Comput Speech Lang 30(1):61–98
82. Nanda SJ, Panda G (2014) A survey on nature inspired metaheuristic algorithms for partitional
clustering. Swarm Evol Comput 16:1–18
83. Nazir M, Majid-Mirza A, Khan SA (2014) PSO-GA based optimized feature selection using
facial and clothing information for gender classification. J Appl Res Technol 12(1):5–163
84. Neto PSG, Petry GG, Aranildo RLJ, Ferreira TAE (2009) Combining artificial neural network
and particle swarm system for time series forecasting. In: International joint conference on
neural networks, IEEE, 14–19 June 2009, Atlanta, GA, USA
85. Ng AY (2006) Reinforcement learning and apprenticeship learning for robotic control. In:
International conference on algorithmic learning theory, lecture notes in computer science,
4264, pp 29–31
86. Parpinelli RS, Lopes HS (2011) An eco-inspired evolutionary algorithm applied to numerical
optimization. In: Third world congress on nature and biologically inspired computing, 19–21
Oct. 2011, IEEE, Salamanca, Spain
87. Pham D, Ghanbarzadeh A, Koç E, Otri S, Rahim S, Zaidi M (2005) The Bees algorithm
technical note. Manufacturing Engineering Centre, Cardiff University, UK, pp 1–57
88. Poznyak TI, Oria IC, Poznyak AS (2019) Chapter 3: background on dynamic neural networks. In: Ozonation and biodegradation in environmental engineering, dynamic neural network approach, pp 57–74
89. Qolomany B, Maabreh M, Al-Fuqaha A, Gupta A, Benhaddou D (2017) Parameters opti-
mization of deep learning models using particle swarm optimization. In: 13th international
wireless communications and mobile computing conference (IWCMC), IEEE, 26–30 June
2017, Valencia, Spain
90. Rabanal P, Rodríguez I, Rubio F (2017) Applications of river formation dynamics. J Comput
Sci 22:26–35
91. Rashedi E, Nezamabadi-pour H, Saryazdi S (2009) GSA: a gravitational search algorithm. Inf Sci 179(13):2232–2248
92. Rao RV, Savsani VJ, Vakharia DP (2012) Teaching–learning-based optimization: an opti-
mization method for continuous non-linear large scale problems. Inform Sci 183(1):1–15
93. Robbins GE, Plumbley MD, Hughes JC, Fallside F, Prager R (1993) Generation and adaptation
of neural networks by evolutionary techniques (GANNET). Neural Comput Appl 1(1):23–31
94. Rosenberg L (2016) Artificial Swarm Intelligence vs human experts. In: International joint
conference on neural networks (IJCNN), 24–29 July 2016, IEEE, Vancouver, BC, Canada
95. Sang HY, Duan PY, Li JQ (2018) An effective invasive weed optimization algorithm for
scheduling semiconductor final testing problem. Swarm Evol Comput 38:42–53
96. Satapathy S, Naik A (2016) Social group optimization (SGO): a new population evolutionary
optimization technique. Complex Intel Syst 2(3):173–203
97. Singh S, Singh AK (2018) Web-spam features selection using CFS-PSO. Procedia Comput
Sci 125:568–575
98. Silva GLF, Valente TLA, Silva AC, Paiva ACD, Gattass M (2018) Convolutional neural network-based PSO for lung nodule false positive reduction on CT images. Comput Methods Programs Biomed 162:109–118
99. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization.
In: Proceedings of the 32nd international conference on machine learning, PMLR, 37, pp
1889–1897
100. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
101. Serrano W (2019) Genetic and deep learning clusters based on neural networks for man-
agement decision structures. Neural Comput Appl 1–25 https://doi.org/10.1007/s00521-019-
04231-8
102. Serrano W (2018) The random neural network with a genetic algorithm and deep learning
clusters. In Fintech: Smart Investment, Imperial College London
103. Shapiro AF (2002) The merging of neural networks, fuzzy logic, and genetic algorithms.
Insurance Mathe Econ 31(1):115–131
104. Shafti LS, Pérez E (2004) Machine learning by multi-feature extraction using genetic algo-
rithms. In: Ibero-American Conference on Artificial Intelligence, Part of the Lecture Notes
in Computer Science, 3315, pp 246–255
105. Shi Y (2011) Brain storm optimization algorithm. In: International conference in Swarm
Intelligence, advances in swarm intelligence, part of the lecture notes in computer science,
6728, pp 303–309
106. Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global
optimization over continuous spaces. J Glob Optim 11(4):341–359
107. Such FP, Madhavan V, Conti E, Lehman J, Stanley KO, Clune J (2018) Deep Neuroevolu-
tion: genetic algorithms are a competitive alternative for training deep neural networks for
reinforcement learning. Neural Evol Comput 1–14
108. Sun Y, Zhang M, Yen GG (2019) Automatically designing CNN architectures using genetic algorithm for image classification. arXiv:1808.03818v2, pp 1–12
109. Sun J, Zhang H, Zhang Q, Chen H (2018) Balancing exploration and exploitation in multiob-
jective evolutionary optimization. In: Proceedings of the genetic and evolutionary computation
conference companion, GECCO’18, Kyoto, Japan—July 15–19, 2018
110. Suryansh S (2018) Genetic algorithms + neural networks = best of both worlds. Towards
Data Science (https://towardsdatascience.com/gas-and-nns-6a41f1e8146d)
111. Todorov E, Erez T, Tassa Y (2012) MuJoCo: A physics engine for model-based control. In:
international conference on intelligent robots and systems (IROS) (https://doi.org/10.1109/
iros.2012.6386109)
112. Tian H, Pouyanfar S, Chen J, Chen SC, Iyengar SS (2018) Automatic convolutional neural
network selection for image classification using genetic algorithms. In: IEEE international
conference on information reuse and integration (IRI), 6–9 July 2018, IEEE, Salt Lake City,
UT, USA
113. Venter G (2010) Review of optimization techniques. In: Encyclopedia of aerospace engineer-
ing. Wiley
114. Vizitiu I, Popescu F (2010) GANN system to optimize both topology and neural weights of
a feed forward neural network. (https://doi.org/10.1109/iccomm.2010.5509105)
115. Watanabe Y, Mizuguchi N, Fujii Y (1998) Solving optimization problems by using a Hopfield
neural network and genetic algorithm combination. Syst Comput Japan 29:68–74
116. Webb GJ (2010) Naïve Bayes. In: Encyclopedia of machine learning, pp 30–45
117. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans
Evol Comput 1(1):67–82
118. Wu B (1992) An introduction to neural networks and their applications in manufacturing. J
Intell Manuf 3(6):391–403
119. Xia X, Lo D, Wang X, Yang X, Li S, Sun J (2013) A comparative study of supervised
learning algorithms for re-opened bug prediction. In: 17th European conference on software
maintenance and reengineering, 5–8 March 2013, IEEE, Genova, Italy
120. Xiao G, Juan Z, Gao J (2015) Travel mode detection based on neural networks and particle
Swarm optimization. Information 6:522–535
121. Xu Y, Cui Z, Zeng J (2010) Social emotional optimization algorithm for nonlinear constrained
optimization problems. In: Swarm, evolutionary, and memetic computing, SEMCCO 2010.
In: Lecture notes in computer science, 6466, Springer, Berlin, Heidelberg, pp 583–590
122. Yang X (2010) A new metaheuristic bat-inspired algorithm. Nature Inspired Cooperative
Strategies for Optimization (NICSO 2010). Stud Comput Intell 284:65–74
123. Yang XS (2012) Flower pollination algorithm for global optimization. In: International conference on unconventional computing and natural computation, part of the lecture notes in computer science, 7445, pp 240–249
124. Ye F (2017) Particle swarm optimization-based automatic parameter selection for deep neural
networks and its applications in large-scale and high-dimensional data. Plos One, 12(12)
(https://doi.org/10.1371/journal.pone.0188746)
125. Youcai Z, Sheng H (2017) Chapter four—pollution characteristics of industrial construction
and demolition waste. In: Pollution control and resource recovery, industrial construction and
demolition wastes, pp 51–101
126. Zang H, Zhang S, Hapeshi K (2010) A review of nature-inspired algorithms. J Bionic Eng
7(Supplement):S232–S237
127. Zhang L, Wang L, Wang X, Liu K, Abraham A (2012) Research of neural network classifier
based on FCM and PSO for breast cancer classification. In: International conference on hybrid
artificial intelligence systems, part of the lecture notes in computer science book series, 7208,
pp 647–654
128. Zhang Y (2012) Support vector machine classification algorithm and its application. Int Conf
Info Comput Appl Commun Comput Info Sci 308:179–186
129. Zou J, Han Y, So SS (2008) Overview of artificial neural networks. In: DJ Livingstone (eds)
Artificial neural networks, methods in molecular biology™, 458, Human Press
130. Zhang S, Zong M, Sun K, Liu Y, Cheng D (2014) Efficient kNN algorithm based on graph sparse reconstruction. In: International conference on advanced data mining and applications, ADMA 2014, lecture notes in computer science, 8933, pp 356–369
Chapter 7
A Hybridized Data Clustering for Breast
Cancer Prognosis and Risk Exposure
Using Fuzzy C-means and Cohort
Intelligence
1 Introduction
Cancer is a type of disease in which cells in the body grow and mutate in a disorderly
and uncontrollable manner, triggered by certain genetic abnormalities. This
uncontrollable growth of cells may result in the formation of a mass of cells called a
tumor, which may be malignant or benign. Cancer is one of the leading causes of death in
the world and can be classified into different types depending on the area of the body
where it originates and the type of cell it comprises (or resembles). Breast cancer, the
most commonly diagnosed cancer in females, is a result of abnormal and unruly growth of
the cells in the breast tissues. The cells divide rapidly and form a lump of extra
tissue, called a tumor, which can be either malignant (cancerous) or benign
(non-cancerous). As per a recent article on global cancer statistics by Bray et al. [7],
breast cancer is the leading cause of cancer death in many countries, and around 2.1
million women, largely in the age group of 40–55 years, were diagnosed worldwide with
this disease in 2018 alone. Studies also establish that the major contributing factors
for breast cancer are gender and age, with genetic mutations set off by the aging
process and lifestyle changes rather than by hereditary factors.
Fortunately, the mortality rate has declined in recent years, with improved
prognostic and diagnostic techniques and more effective treatments and medicines.
Earlier, clinical approaches like mammography, surgical biopsy, magnetic resonance
imaging (MRI) and fine needle aspiration (FNA) were practiced, among which mammography
is highly recommended for predicting and diagnosing breast cancer [12]. Detection of the
seeded region (tumor mass) and distinguishing it from the background tissue relied on
morphological operators together with image processing and image segmentation methods.
Image processing and image segmentation methods [16, 34] focus primarily on abnormality
detection from a mammogram. These methods rely on some form of preprocessing to filter
the noise from a mammogram. Various researchers propose different image segmentation
techniques which separate out the region of interest (RoI), i.e., the probable area where the
tumor could be concentrated. This is then followed by image processing techniques to
identify/classify abnormalities in the RoI [17]. Over the past years, researchers
have augmented the techniques of data mining and/or machine learning with clinical
methods to improve the quality of results obtained for breast cancer prediction/detection.
Machine learning [18, 27] is a branch of artificial intelligence in which machines acquire
the competence to learn on their own without being explicitly programmed. Machine
learning methods [2, 3, 33] like Support Vector Machine, Decision Tree, Naïve Bayes,
Artificial Neural Network [1, 5] and Bayesian Network are being extensively used for
applications focusing on early detection and prognosis of cancer [19] and its type.
Enrichment of the data collected from different sources and robust data mining methods
play a crucial part in the medical field. Machine learning algorithms extensively use data
mining techniques for building models to predict the future outcome of given data. Data
mining [13] is the analytical process of exploring and retrieving useful hidden patterns or
relationships between the variables of a given dataset. These consistent patterns and
findings are then validated by applying the detected pattern to a new subset of data.
Clustering and classification are popular data mining methods that have been used on
cancer datasets for early detection of cancer in patients on the basis of parameters in the
datasets [29]. A popular classification technique, K-nearest neighbor (KNN), is used for
cancer prognosis [26, 28]; its performance depends on the number of neighbors and the
percentage of data used. KNN is highly dependent on the distance measure (for example,
Euclidean or Manhattan distance). Such methods are effective in terms of classification
performance but are often time consuming. Data clustering techniques partition a set of
unstructured data objects into clusters [31], where the clusters are formed in such a way
that the data objects within one cluster exhibit more similarity to each other than to the
objects of another cluster. Data clustering may be categorized as hierarchical or partitional
clustering. Hierarchical clustering [11] groups objects into a connected tree-like structure
in which clusters are connected to each other. Hierarchical clustering is further divided
into top-down (divisive) and bottom-up (agglomerative) approaches. The article by Jain
et al. [15] summarizes the advantages of hierarchical clustering: firstly, no advance
initialization of the number of
clusters is needed, and secondly, the cluster formation does not depend on initial conditions.
However, hierarchical clustering is rigid in nature, i.e., a data object assigned to a cluster
cannot move to another. Also, the overlapping of clusters cannot be eliminated due to a
lack of knowledge about the clusters' initial shape and size. With partitional clustering
[24], only one set of clusters is created, where the data is grouped into several disjoint
clusters. Thus, partitional clustering is best suited to larger datasets. The advantages of
hierarchical clustering happen to be the disadvantages of partitional clustering and vice
versa. A commonly used partitional algorithm is K-means [14], a hard clustering technique,
fast and simple in nature, which partitions the data by minimizing a mean square error
measure. Although it is a popular method for clustering, it faces a few limitations. Firstly,
its performance depends on the initial centroid choice; secondly, the objective function is
not convex, and hence it contains local minima. It is also highly sensitive to noise and
outliers. K-means clustering [10] was used to assess the impact of clustering on the Breast
Cancer Wisconsin (BCW) diagnostic dataset, a popular dataset used across various studies
focusing on detection and prognosis of breast cancer.
Another clustering algorithm, fuzzy C-means (FCM) [6, 32], is an iterative method of
clustering most frequently used in pattern recognition and image processing. It is based on
the concept of minimizing an objective function to achieve the least-squared error. FCM is
an unsupervised soft clustering method, in which data objects on the boundaries between
classes need not fully belong to a single class. Every data point is assigned a membership
value between 0 and 1 for each cluster on the basis of the distance between the cluster
center and the data point [9, 35]. It employs fuzzy partitioning such that a data point can
belong to all groups with some degree, and after every iteration, the membership degrees
and cluster centers get updated. Factors like the distance measure, cluster shape and the
scattering of data points in 2D space make fuzzy C-means well suited to larger datasets.
Fuzzy C-means is a better classification method in comparison with K-means [8, 30]
because of its unsupervised soft clustering nature. However, FCM also depends on center
initialization, is computationally more involved than K-means, and takes more computation
time. To overcome certain limitations of FCM, hybrid versions of this clustering technique
have been proposed in recent years. Certain heuristic approaches are also used to optimize
the performance of traditional FCM. For example, genetic algorithm (GA) has been
hybridized with various machine learning and data mining methods. An article by Jain
et al. [23] proposes a system combining fuzzy concepts with a genetic algorithm, referred
to as a genetic fuzzy rule-based system (GFRBS). The system attains high performance and
provides interpretable features. Table 1 lists a few relevant studies on breast cancer prognosis
which use various machine learning/data mining approaches and hybrid methods.
The current work proposes a hybrid fuzzy C-means method compounded with an
optimization technique for robust and superior data clustering, to overcome possible
limitations of FCM. The cohort intelligence (CI) optimization algorithm [20, 21] is used
in the current work to hybridize the traditional FCM, and the performance of this hybrid
methodology is validated on the Wisconsin breast cancer dataset.
In the current research, a novel hybridized data clustering model referred to as the fuzzy
cohort intelligence algorithm (FCI) is proposed. The proposed algorithm converges more
swiftly and attains more precise solutions by avoiding getting trapped in local minima.
The rest of the article is organized as follows: Sect. 2 discusses the algorithmic framework
of the proposed technique, also covering the basic FCM method and the working of the CI
optimizer. Section 3 details the experimental findings and compares the performance of
the proposed hybridized FCI with the traditional FCM method. Finally, Sect. 4 concludes
the study.
The FCM is an autonomous partitional clustering method in which data objects lying
on the borderline between classes may not entirely belong to a single class. Data points
located on the cluster boundaries are not mandated to belong to a particular cluster; they
may be members of multiple clusters, and a fuzzy membership value determines their
degree of association with a certain cluster. Every data object is assigned a membership
value in [0, 1] based on the distance of the data point from a cluster center.
Let $X = [x_1, x_2, \ldots, x_N]$, where $x_i \in \mathbb{R}^D$, be the set of $N$ data objects
to be clustered, let $C = [c_1, c_2, \ldots, c_A]$ be the set of $A$ clusters represented by their
centers, and let $U = [u_1, u_2, \ldots, u_N]$ be the set of fuzzy membership functions for the
$A$ clusters. In this procedure, each data object in $X$ is assigned to one of the $A$ clusters
in such a manner that the objective function is minimized. The objective function is the sum,
over the membership function $u_{ij}$, of the squared Euclidean distance between each $x_i$
and $c_j$, defined as

$$ S = F(U, C) = \min \sum_{i=1}^{N} \sum_{j=1}^{A} u_{ij}^{m} \left\| x_i - c_j \right\|^{2}, \qquad 1 \le m < \infty \tag{1} $$
where
• m is a real number that acts as the fuzziness index, m ∈ [1, ∞).
• $c_j \ne \emptyset$, ∀ j ∈ {1, 2, …, A}, is the center of cluster j.
• $u_{ij}$ is the fuzzy membership function which represents the membership of the ith
data point to the jth cluster center.
• k = {1, 2, …, n}, where n is the number of iterations.
The method uses fuzzy partitioning such that each data point belongs to all clusters with
some degree, and the membership degrees and cluster centers get updated with every
iteration. Thus, the algorithm performs fuzzy partitioning through iterative optimization,
and the partitions become fuzzier with increasing m. This iterative process continues as
long as the computed value of the objective function improves; it stops when the
improvement between the current and the previous iteration falls below a threshold value
ε (where 0 < ε < 1). The fuzzy membership function $u_{ij}$ is defined as
$$ u_{ij} = \frac{1}{\sum_{k=1}^{A} \left( \frac{\left\| x_i - c_j \right\|}{\left\| x_i - c_k \right\|} \right)^{\frac{2}{m-1}}} \tag{2} $$
and
$$ c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \tag{3} $$

where $c_j$ represents the jth cluster center.
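Equations (1)–(3) together define one FCM iteration: the memberships are recomputed from the current centers, then the centers from the new memberships. A minimal NumPy sketch of such an iteration is given below; it is an illustration only, and the function and variable names are not taken from the chapter's MATLAB implementation.

```python
import numpy as np

def fcm_step(X, C, m=2.0):
    """One FCM iteration: update memberships (Eq. 2), then centers (Eq. 3),
    and report the objective (Eq. 1). X: (N, D) data, C: (A, D) centers."""
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)   # (N, A) distances
    d = np.fmax(d, 1e-12)                                       # guard: point on a center
    # Eq. (2): u_ij = 1 / sum_k (||x_i - c_j|| / ||x_i - c_k||)^(2/(m-1))
    U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    # Eq. (3): c_j = sum_i u_ij^m x_i / sum_i u_ij^m
    Um = U ** m
    C_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
    # Eq. (1): S = sum_i sum_j u_ij^m ||x_i - c_j||^2
    S = float((Um * d ** 2).sum())
    return U, C_new, S
```

The loop repeats this step until the drop in S between two successive iterations falls below the threshold ε.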
The current work presents a hybridized algorithm referred to as fuzzy cohort intelligence
(fuzzy-CI), which hybridizes the basic fuzzy C-means algorithm with the CI optimizer with
the aim of improving the clusters generated. The hybrid methodology attempts to optimize,
and thus minimize, the objective function of FCM, resulting in improved data cluster
formation for a given dataset/clustering problem. An optimized objective function indicates
optimized centroids, better partitioning of the data and thus better recognition of patterns
in a dataset. The hybridized algorithmic approach may be used to optimize cluster formation
to augment prediction accuracy. The amalgamation allows the proposed algorithm to
converge more rapidly and attain a more precise solution by avoiding getting stuck in
local minima.
Steps of Fuzzy-CI Algorithm
Consider the objective function of the basic FCM which needs to be minimized, given as
Eq. (1); the CI optimizer attempts to optimize this equation. In the current study, the
behavior of each candidate z in the cohort is taken to be its objective function value,
where

$$ S^z = F\left(u^z, c^z\right), \quad c^z = \left[c_1^z, c_2^z, \ldots, c_j^z, \ldots, c_A^z\right] \tag{5} $$

$$ c_j^z = \left[x_1^z, x_2^z, \ldots, x_j^z, \ldots, x_D^z\right] \tag{6} $$
For each candidate, randomly generate the A initial cluster centers, as described in
Eq. (2), where $C_A = [c_1, c_2, \ldots, c_A]$.
Step 3: Every candidate then calculates its fuzzy membership measure $u_{ij}$, using
Eq. (2), for k = 1, 2, …, n, i = 1, 2, …, N and j = 1, 2, …, A.
Step 4: Each candidate determines its new cluster centers using Eq. (3), described as
$c_1^z, c_2^z, \ldots, c_A^z$, where z = 1, 2, …, Z and j = 1, 2, …, A. For example, if
A = 3, the new cluster centers formed by candidate z(1) may be represented as
$z_1 = (c_1^1, c_2^1, c_3^1)$ and for candidate z(2) as $z_2 = (c_1^2, c_2^2, c_3^2)$.
Step 5: At each iteration, every candidate computes its objective function $S^z$ using
Eq. (1), where z = 1, 2, …, Z; $S^z$ represents the overall behavior of a specific candidate
in the cohort at iteration n.
Step 6: Every candidate in the cohort instinctively attempts to enhance its behavior by
updating it. A candidate does this by observing the behavior of the other candidates in
the cohort as well as its own. It may then choose to follow the behavior of a better-behaving
candidate. The probability $p^z$ of choosing the behavior $S^z$ of a candidate z is
calculated using
$$ p^z = \frac{1/S^z}{\sum_{z=1}^{Z} 1/S^z} \tag{8} $$
Step 7: Every candidate may pursue a certain behavior, and this behavior is selected using
the roulette wheel selection approach in the FCI. Using the roulette wheel, each candidate
may choose which corresponding behavior $S^{Z[*]} = f\left(C_1^{Z[*]}, C_2^{Z[*]}, \ldots, C_A^{Z[*]}\right)$
is to be followed. Roulette wheel selection is a probabilistic selection approach, used in
the current study to recommend a fitter/better behavior. This approach gives every behavior
a chance of being selected at least once, based purely on its quality. The process helps each
candidate to prefer a better behavior with the help of the associated probability $p^Z$ from
Eq. (8), since $p^Z$ is directly proportional to the quality of the behavior $S^Z$.
Step 8: Every candidate shrinks its sampling interval $\alpha_i^{z[*]}$ for every feature
$c_i^{z[*]}$ to the close neighborhood of the followed candidate and forms a new sampling
interval using Eq. (9). Following or learning from a certain behavior means that the current
sampling interval associated with every $S^z$ is updated to the close neighborhood of the
candidate being followed.

$$ \alpha_i^{z[*]} \in \left[ c_i^{z[*]} - \frac{\alpha_i}{2},\; c_i^{z[*]} + \frac{\alpha_i}{2} \right] \tag{9} $$

where the interval width is reduced at every learning attempt as $\alpha_i \leftarrow \alpha_i \times r$
(r being the sampling interval reduction factor). Here, '*' indicates that the behavior to be
followed is selected at random by the candidates and is not known beforehand.
Step 9: After having updated its features (i.e., the cluster centers), each candidate
computes the updated objective function according to Eq. (1).
Step 10: This iterative process continues (between steps 3 and 9) till the cohort converges,
i.e., when either of the following conditions becomes true:
• if no significant improvement is noticed in the behavior $S^z$ of every candidate in the
cohort, i.e.,
$$ \left| \max\left(S^Z\right)^{cn} - \max\left(S^Z\right)^{cn-1} \right| \le \varepsilon, $$
where cn is the current iteration;
• if the maximum number of iterations (max_iterations) is reached.
Step 11: If either of the two conditions is fulfilled, then take the best behavior from the set
of candidate behaviors $S^Z$ (the best objective function value) as the concluding objective
function value and stop; if not, continue from step 3.
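The following condensed Python sketch traces steps 3–11 for Z candidates. It is a simplified illustration under stated assumptions: the helper fcm_objective evaluates Eq. (1) as in the earlier FCM sketch, and the candidate count Z, reduction factor r and tolerance eps are placeholder values, not the chapter's experimental settings.

```python
import numpy as np

def fcm_objective(X, C, m=2.0):
    """Eq. (1) evaluated for a candidate's cluster centers C."""
    d = np.fmax(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), 1e-12)
    U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    return float(((U ** m) * d ** 2).sum())

def fuzzy_ci(X, A, Z=5, r=0.95, max_iter=50, eps=1e-4):
    """Condensed fuzzy-CI loop: Z candidates carry A centers each, follow
    better behaviours via roulette wheel (Eq. 8) and shrink their sampling
    intervals around the followed centers (Eq. 9)."""
    rng = np.random.default_rng(0)
    lo, hi = X.min(axis=0), X.max(axis=0)
    centers = rng.uniform(lo, hi, size=(Z, A, X.shape[1]))
    width = hi - lo                        # per-feature sampling interval
    prev_best = np.inf
    for _ in range(max_iter):
        S = np.array([fcm_objective(X, c) for c in centers])   # behaviours S^z
        p = (1.0 / S) / (1.0 / S).sum()                        # Eq. (8)
        for z in range(Z):                                     # roulette wheel
            followed = centers[rng.choice(Z, p=p)]
            # Eq. (9): sample new centers in the shrunk interval around the
            # followed candidate's centers
            centers[z] = rng.uniform(followed - width / 2, followed + width / 2)
        width *= r                                             # shrink interval
        best = S.min()
        if abs(prev_best - best) <= eps:                       # Step 10
            break
        prev_best = best
    return centers[S.argmin()]                                 # best behaviour
```

The roulette wheel draw makes better behaviors (smaller $S^z$) proportionally more likely to be followed, while the shrinking interval gradually shifts the cohort from exploration to exploitation.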
This section discusses the dataset used to test the proposed hybridized data clustering
algorithm, and the findings are reported. To evaluate the functioning of FCI, a comparative
analysis is conducted with the traditional FCM approach tested on the same dataset. The
computations were executed in MATLAB R2013a on a Mac OS platform with a 1.6 GHz
Intel Core i5 processor and 8 GB RAM.
The Wisconsin Breast Cancer (WBC) dataset [4, 25] from the UCI Machine Learning
Repository is used to validate the proposed algorithm. The dataset contains 699 instances
(each amounting to a single clinical case), including 16 with missing values. It includes
nine features (as shown in Table 3), each of which is assigned an integer value between
1 and 10, and a class output attribute. This class output reports/classifies a benign or
malignant breast cancer diagnosis for a particular data object.
In the current study, the WBC dataset was used to validate the performance of the
proposed FCI. The clustering performance of fuzzy C-means was also tested on the WBC
dataset and then compared with the hybridized FCI. A total of 40 trials were carried out
with each method, and the number of clusters 'A' was known prior to solving the
clustering problem. The simulation tries to optimize the cluster centers of the fuzzy
C-means clustering algorithm using CI. For every trial, the input to the system included
the dataset as a csv file (N = 683, A = 3) and random initial cluster centers. Parameters
like the best solution produced (Best), the worst value (Worst) recorded for the objective
function, the mean value of the solutions (Average), the standard deviation (Std. Deviation),
the average running time (R.T.) and the number of function evaluations (FE) across the
trials were recorded for both the FCI and FCM algorithms. The simulation results are
presented in Table 4. The results indicate that the hybridized fuzzy C-means, i.e., the FCI,
is superior in performance to FCM. The optimizer clearly aids in improved data clustering,
as can be seen in Table 4: the objective function is minimized by FCI for all the criteria
(best case, average case and even the worst trial run). It can also be seen that the hybridized
fuzzy C-means shows a more consistent performance than the traditional FCM, even
though the optimizer itself is heuristic in nature and has an aspect of randomness to it,
and the traditional FCM initializes with a random seed (i.e., random cluster centers at
the start).
Thus, it can be inferred that the optimized FCI lends a more consistent performance to
the fuzzy C-means, making it more robust, with a smaller standard deviation. This may be
attributed to the strength of CI, which has strong capabilities of reaching better and more
accurate solutions by avoiding getting stuck in local minima, and which also leads the
algorithm to converge much more quickly.
Figure 1 illustrates the cluster formation graphically for both FCM and the hybridized
FCI with selected attributes on the WBC dataset for three clusters. Figure 1a shows the
cluster formation after the traditional FCM was applied to the dataset. It shows that the
centroids suggested by FCM result in overlapping clusters, hinting at weaker cluster
formation due to the cluster centers. This may also lead to weaker predictive qualities if
these clusters were used for further classification of unknown data. Figure 1b shows
well-formed clusters, as the objective function achieved a better, minimized value with
the fuzzy C-means hybridized with the CI optimizer. It may also be noted that there is a
large difference in the FE values for FCM and the hybrid FCI (Table 4). This is because
the hybrid version uses CI, where multiple agents (known as candidates of the cohort)
explore the solution space, with each candidate making its own function evaluation
(calculation of the objective function) at every iteration. However, even with higher FE,
the hybrid FCI runs much faster while also yielding improved cluster formations (as seen
from the running times of both clustering algorithms in Table 4).

Fig. 1 Cluster formation on WBC dataset using FCM (a) and hybridized FCI (b), respectively
Figures 2 and 3 illustrate the behavior plots of FCM and FCI, respectively. Figure 2
shows how the behavior 'S' progresses and moves steadily toward convergence. On the
other hand, the hybrid FCI model (Fig. 3), with five different candidates in the cohort,
each of which has its own set of behaviors, shows all the candidates approaching
convergence much faster and exploiting the solution space more gradually as they near
convergence.
[Fig. 2: behaviour (S) versus iterations for the traditional FCM]
[Fig. 3: behaviour (S) versus iterations for the five candidates of the FCI cohort]
4 Conclusion
This paper presents a hybrid fuzzy-CI procedure for data clustering. The hybrid
algorithm combines the advantages of two algorithms: fuzzy C-means is hybridized with
the CI optimizer to enhance the cluster formation capabilities of traditional FCM. The
proposed method is tested on the Wisconsin Breast Cancer (WBC) dataset. The blend of
fuzzy C-means and the stochastic CI allows the proposed algorithm to converge faster
with improved and more accurate clustering. The results of the hybridized FCI were then
compared with traditional fuzzy C-means. The empirical results indicate that the algorithm
produces higher-quality clusters with a much lower standard deviation on this dataset. In
the future, the performance of the traditional FCM could be improved and validated by
comparing with contemporary metaheuristics. A very recent and promising class of
optimization frameworks includes the socio-inspired metaheuristics, which are evolutionary
algorithms inspired by the social behavior of humans in various societal setups. Another
avenue for research could be to use a modified CI algorithm that is self-adaptive in nature,
which would aid in further optimization of the traditional FCM.
References
1. Agrawal S, Agrawal J (2015) Neural network techniques for cancer prediction: a survey. Proc
Comput Sci 60:769–774
2. Ahmad LG, Eshlaghy AT, Poorebrahimi A, Ebrahimi M, Razavi AR (2013) Using three
machine learning techniques for predicting breast cancer recurrence. J Health Med Inform
4(124):3
3. Asri H, Mousannif H, Al Moatassime H, Noel T (2016) Using machine learning algorithms
for breast cancer risk prediction and diagnosis. Proc Comput Sci 83:1064–1069
28. Odajima K, Pawlovsky AP (2014) A detailed description of the use of the kNN method for
breast cancer diagnosis. In: 2014 7th international conference on biomedical engineering and
informatics (BMEI). IEEE, pp 688–692
29. Ojha U, Goel S (2017) A study on prediction of breast cancer recurrence using data min-
ing techniques. In: 2017 7th international conference on cloud computing, data science and
engineering-confluence. IEEE, pp 527–530
30. Panda S, Sahu S, Jena P, Chattopadhyay S (2012) Comparing fuzzy-C means and K-means
clustering techniques: a comprehensive study. In: Advances in computer science, engineering
and applications. Springer, Berlin, Heidelberg, pp 451–460
31. Ramani R, Valarmathy S, Vanitha NS (2013) Breast cancer detection in mammograms based
on clustering techniques—a survey. Int J Comput Appl 62(11)
32. Suganya R, Shanthi R (2012) Fuzzy c-means algorithm—a review. Int J Sci Res Publ 2(11):1
33. Suthaharan S (2016) Machine learning models and algorithms for big data classification. Integr
Ser Inf Syst 36:1–12
34. Verma A, Khanna G (2016) A survey on image processing techniques for tumor detection
in mammograms. In: 2016 3rd international conference on computing for sustainable global
development (INDIACom). IEEE, pp 988–993
35. Yang MS (1993) A survey of fuzzy clustering. Math Comput Model 18(11):1–16
Chapter 8
Development of Algorithm for Spatial
Modelling of Climate Data
for Agriculture Management
for the Semi-arid Area of Maharashtra
in India
1 Introduction
Agriculture is the backbone of the Indian economy. It not only provides food grains and
other raw materials but also provides employment opportunities to more than 50% of the
population [22]. It acts as a major source of income and also provides food and fodder for
livestock. It is a major contributor to the national income and brings foreign exchange to
the country [12, 17, 19]. The semiarid and arid regions contribute 67% of the net sown
area in India. The semiarid region of India extends over 218 districts across 14 states [23].
The states in the northern region include Rajasthan, Punjab, Gujarat and Haryana, and the
southern regions include Maharashtra, Karnataka, Tamil Nadu and Telangana. Out of the
174 million hectares of cropped area in India, 131 million hectares lie under semiarid
regions [10]. In spite of the major
contribution of rainfed agriculture to Indian agriculture, the region faces problems such
as low productivity of the major rainfed crops and degradation in the socioeconomic
conditions of small and marginal farmers. The agricultural crop production system in
this region is greatly influenced by climatic parameters such as rainfall, temperature and
evapotranspiration [15, 16, 24]. An increase in temperature due to climate change increases
the potential evapotranspiration and thus increases the crop water requirement by 10% in
the semiarid and arid regions of India [18]. The uncertainty in rainfall and limited irrigation
facilities affect the crop yield in this region. Variations in the climate affect the crop
management activities and the credit and investment management of the farm. It becomes
difficult for farmers to adjust their farm management activities and the amount of
investment to be made in crop production inputs. It affects the season cycle, and because
of this, the gap between crop yield and investment affects the income prospects of the
farmers [13]. The variability in rainfall also has a major effect on crop yield in the region.
The distribution of rainfall during the crop growth cycle is uneven, with scarce amounts
received when in need and high amounts when already in abundance, thereby adversely
affecting the crop yield in the respective regions [1, 3–8, 18]. The mid-season growth of
semiarid crops is affected by these prolonged rainless spells [21].
The area selected for the study is a semiarid region of Satara district, Maharashtra. It
covers the eastern part of Satara district and, administratively, five talukas of the district,
namely Khandala, Koregaon, Phaltan, Man and Khatav. The geographical location of the
study area lies between 17°22′54.807″N to 18°10′57.579″N and 73°52′14.2566″E to
74°54′35.0238″E, which corresponds to an area of 5454.80 km². The agriculture in the
study area is completely dependent on rainfall. The region suffers from climate change,
and it was classified as a drought region for 20% of the years between 1991 and 2011 [9,
14, 26]. The maximum number of continuous dry days affects the crop growth and
irrigation scheduling in the study area [3]. There is a strong need for a model which will
provide early warnings to the farmers in this study area about the spatial variation of
climate parameters. The current study proposes an algorithm for spatial modelling of the
climate data provided by the Indian Meteorological Department (IMD).
2 Methodology
This section explains the step-by-step method to design the spatial modelling of climate
data. The spatial data generated with the mentioned algorithm is validated with the
real-time satellite data of the Tropical Rainfall Monitoring Mission (TRMM). Figure 1
shows the flow of the proposed architecture for the climate data processing system, named
"Day wise Spatial Climate Data Generation Process (DSCDGP)", which consists of
processes like export to database, day-wise table generation, spatial modeling of climate
data and spatial modeling of climate requirements for the crop growth period.
The grid-wise climate data collected from the Indian Meteorological Department was
imported into a Microsoft Excel workbook by using the text-to-columns utility in Excel.
For both types of climate data, rainfall and temperature, a separate workbook was created
and stored in one common folder named "Input". This folder was given as input to the
next process for exporting to the Oracle database. Table 2 shows the
[Fig. 1 flow: Module 1 (export climate database to Oracle) → Module 2 (day-wise table
generation) → Module 3 (spatial modeling of rainfall, temperature and reference
evapotranspiration) → Module 4 (spatial modeling of climate requirements for crop
growth period), producing the spatial DB of climate data]
Fig. 1 Overview of proposed day wise spatial climate data generation process (DSCDGP)
sample of the database collected from IMD for rainfall data, for 1st January 2012. In
Table 2, the top left cell shows the date for which the data was collected; the remaining
column headings show the longitude values, and the row headings show the latitude
values.
The objective of this module was to read all the workbook files from the specified "Input"
folder and export the data to the Oracle database. The algorithm for this process, for which
the code was written in Java, is as follows:
Step 1: Import the Apache Poor Obfuscation Implementation (POI) libraries in the Java
file. Apache POI is the Java Application Programming Interface (API) from Apache
for accessing Microsoft Office documents.
Step 2: Create an object of the HSSFWorkbook class from Apache POI to represent
the workbook and locate a sheet.
Step 3: Read the number of rows and columns from the entire worksheet.
Step 4: Locate the cell values for the selected study area from the worksheet and
read those values from the total rows and columns.
Step 5: Generate the Oracle “Insert table” query script (.sql) by concatenating cell
values.
Step 6: Execute the “Insert table” .sql script in Oracle database.
After Step 6, the tables are created in the Oracle database. Table 3 shows the sample
structure of the table created in the Oracle database.
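The chapter's module is written in Java with Apache POI; the sketch below mirrors the same steps 2–6 in Python using the openpyxl library (an illustrative assumption, as are the table and column names).

```python
from openpyxl import load_workbook

def excel_to_insert_script(xlsx_path, table, sql_path):
    """Read grid-wise climate values from a workbook and emit an Oracle
    INSERT script (Python/openpyxl analogue of the Java/POI module)."""
    ws = load_workbook(xlsx_path, read_only=True).active
    rows = ws.iter_rows(values_only=True)
    header = next(rows)                    # first row: date cell + longitude headings
    lons = header[1:]
    with open(sql_path, "w") as f:
        for row in rows:
            lat, values = row[0], row[1:]  # row heading: latitude
            for lon, val in zip(lons, values):
                if val is not None:        # skip empty grid cells
                    f.write(f"INSERT INTO {table} (lat, lon, value) "
                            f"VALUES ({lat}, {lon}, {val});\n")
```

The generated .sql script can then be executed against the Oracle database, as in Step 6.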
The objective of this module was to read the day-wise climate data from the master table
created in Module 2 and create separate tables (views) for each date, for the selected
latitudes and longitudes. Thus, for a year, 365/366 views are created for the selected area
at the end of this module. For this, an Oracle script was written to read the data from the
master table and create day-wise separate Oracle views. For this process execution, the
concept of cursors in SQL was used in an Oracle script file: an Oracle cursor is a memory
area (context area) which holds the results of a SQL query. The algorithm for this process,
written in Structured Query Language (SQL), is as follows:
Step 1: Create a cursor which finds the distinct dates from the base table.
Step 2: Open the cursor, and for each date from the cursor, retrieve the data from
the base table and create a view.
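As an illustration of what the cursor produces, the following Python sketch emits the equivalent day-wise CREATE VIEW statements (the master table, view naming scheme and column names are hypothetical).

```python
def daywise_view_script(dates, master_table="climate_master"):
    """Emit one CREATE VIEW per distinct date, mirroring the Oracle cursor
    that iterates over the dates found in the master table."""
    stmts = []
    for d in dates:                      # e.g. '2012_01_01'
        stmts.append(
            f"CREATE OR REPLACE VIEW v_{d} AS "
            f"SELECT lat, lon, value FROM {master_table} "
            f"WHERE obs_date = DATE '{d.replace('_', '-')}';"
        )
    return "\n".join(stmts)
```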
This module was designed to develop the spatial representation of the day-wise tabular
rainfall and temperature data created in the previous step. For this process, a model was
written in ArcGIS. The model iteratively reads the tables from the database and creates an
XY layer from each. Daily average temperature and rainfall surfaces were calculated using
the Inverse Distance Weighted (IDW) interpolation method from each day-wise XY layer.
The mathematical model of IDW is based on the principle that values closest to the
prediction location have higher weight than values far away from it; the weight reduces as
the measured value moves away from the location to be predicted [6, 20, 25]. From the
interpolated image, the study area was extracted to generate day-wise climate parameter
maps for the study area (Fig. 2).
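The IDW estimate at a prediction location is the weighted mean of the measured values with weights $w_i = 1/d_i^p$. A minimal NumPy sketch follows (the power p = 2 is a common default, not a value taken from the chapter).

```python
import numpy as np

def idw(xy_known, values, xy_query, power=2.0):
    """Inverse Distance Weighted interpolation: nearer stations get higher
    weight, w_i = 1/d_i^p, and the estimate is the weighted mean.
    xy_known: (K, 2) station coordinates, values: (K,), xy_query: (Q, 2)."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    d = np.fmax(d, 1e-9)                 # query point coinciding with a station
    w = 1.0 / d ** power
    return (w @ values) / w.sum(axis=1)  # (Q,) interpolated values
```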
In the next step of this module, the reference evapotranspiration was calculated from
the spatial data prepared in the previous module for minimum and maximum temperature.
Reference evapotranspiration is the amount of water evaporated from the soil surface [2].
The reference evapotranspiration was calculated using the Hargreaves Potential
Evapotranspiration (PET) method (Eq. 1) [11]:
$$ \text{PET} = 0.0023 \times R_a \times \left(T_{mean} + 17.8\right) \times \sqrt{T_{max} - T_{min}} \tag{1} $$
where
R_a is the total incoming extraterrestrial solar radiation,
T_max is the daily maximum temperature,
T_min is the daily minimum temperature, and
T_mean = (T_max + T_min)/2 is the daily mean temperature.
The algorithm for this process, for which the code was written in Python, is as
follows:
Step 1: Import the arcpy library.
Step 2: Set the workspace for the code, as the path where the Spatial data of
temperature is stored.
Step 3: For each date, read the minimum and maximum temperature spatial data
from the database.
Step 4: Calculate the reference evapotranspiration using Eq. 1.
Step 5: Repeat the step for all the days of the year.
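Applied cell-wise over the temperature grids, the computation in Step 4 reduces to simple raster arithmetic. The chapter's code operates on arcpy rasters; the NumPy sketch below shows the same Eq. (1) arithmetic under assumed units (R_a as equivalent evaporation in mm/day, temperatures in °C).

```python
import numpy as np

def hargreaves_pet(ra, tmax, tmin):
    """Hargreaves reference evapotranspiration (Eq. 1), applied element-wise
    to gridded daily data. ra: extraterrestrial radiation grid (mm/day
    equivalent, an assumed unit), tmax/tmin: daily max/min temperature grids."""
    tmean = (tmax + tmin) / 2.0
    return 0.0023 * ra * (tmean + 17.8) * np.sqrt(np.fmax(tmax - tmin, 0.0))
```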
At the end of this proposed system, "Day wise Spatial Climate Data Generation Process
(DSCDGP)", the climate requirements for the crop growth period were calculated from
the 365/366 days of spatial data for the year. The calculation of the climate requirements
includes the total rainfall, the average minimum and maximum temperature and the total
crop water requirement. The algorithm for this process, for which the code was written in
Python, is as follows:
• Algorithm for total rainfall during the crop growth cycle:
Step 1: Read the start date of the crop growth cycle.
Step 2: Read the end date of the crop growth cycle.
Step 3: Iterate through the database of spatial data created in Module 3, from the start
date to the end date, and calculate the total of all the maps.
Step 4: Save the result of Step 3 as spatial data.
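A short Python sketch of this date-range accumulation follows (representing the day-wise maps as a date-keyed dictionary of grids is an illustrative assumption).

```python
import numpy as np

def seasonal_total(rasters_by_date, start, end):
    """Sum the day-wise rainfall grids between the crop cycle's start and
    end dates; rasters_by_date maps ISO date string -> 2D numpy array."""
    days = [d for d in sorted(rasters_by_date) if start <= d <= end]
    total = np.zeros_like(rasters_by_date[days[0]])
    for d in days:
        total += rasters_by_date[d]      # accumulate over the growth cycle
    return total
```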
The spatial representation of climate data was done for the years 2010–2013 by applying
DSCDGP (Figs. 4, 5, 6 and 7). The analysis of the generated spatial data includes the
study of the variation of climate parameters such as rainfall, temperature and reference
evapotranspiration for the study area. For the study area, the reference evapotranspiration
derived from rainfall and temperature shows that the daily minimum reference
evapotranspiration ranges between 6.34 and 7.57 mm d−1 and the maximum reference
evapotranspiration ranges between 16.33 and 17.33 mm d−1. The analysis also shows that
there was a decrease in rainfall and temperature, and thus the reference evapotranspiration
also decreased, from 2010 to 2011. There was not much variation in evapotranspiration
from 2011 to 2012. The results also revealed that, as the temperature increased and
rainfall decreased, the reference evapotranspiration increased from 2012 to 2013. The
trend analysis of reference evapotranspiration shows that it is higher from January to May,
and from June onwards it decreases as the temperature reduces and rainfall increases
(Table 4). Figure 3 shows the variation of the relationship between climate parameters
for the year 2013.
The analysis of climate data for the study area concludes that there is uneven distribution
of rainfall and a continuing increase in the maximum temperature. The study revealed that
the average maximum temperature for the study area has gradually increased above 38 °C.
Because of the increase in temperature and erratic rainfall, the reference evapotranspiration
has increased for the study area, and this has affected the soil moisture content and the
fertility of the soil in the region. The studies also show that the increase in reference
evapotranspiration has increased the crop evapotranspiration and, because of the increase
in water evaporated by the crop, has increased the crop water requirement. The lack of
water availability and the rainfall-dependent agriculture have affected the crop yield of
the region.
The results of DSCDGP were validated with the Tropical Rainfall Monitoring Mission
(TRMM) data for the years 2012 and 2013. The monthly 0.25° × 0.25° TRMM data
product 3B43 was collected for the study area from the Goddard Earth Sciences Data and
Information Services Center (GES DISC). The study area was extracted from the TRMM
data, and the monthly average, maximum and minimum rainfall were computed from the
extracted data. The correlation between the TRMM and IMD average rainfall was observed
to be 0.865 for the year 2012 and 0.990 for the year 2013 (Fig. 8). This validates the
proposed DSCDGP system.
4 Conclusion
The suggested method proposes a system for climate data processing named "Day wise
Spatial Climate Data Generation Process (DSCDGP)", which has automated the process
of generating spatial representations of climate data. This process offers agricultural
experts an easy technique to study the spatial variation of climate parameters and helps
them with contingency planning for the study area. The current research has also validated
the grid-wise Indian Meteorological Department (IMD) rainfall data against the Tropical
Rainfall Monitoring Mission (TRMM) satellite rainfall data. In future work, the model will
use predicted climate data from IMD for the upcoming season and soil data for the farmer
from the selected taluka and village. From the daily spatial climate data, the crop-growth-
stage-wise variation of climate parameters will help farmers with micro-level planning of
agriculture in the study area. With the real-time availability of IMD data, the model will
provide early warnings of climate variation that can help farmers decide which crops to
grow. If the monsoon is delayed, then contingency planning of crops can be done as per
the rainfall.
References
1. Aggarwal et al (2010) Managing climatic risks to combat land degradation and enhance food
security: key information needs. Proc Environ Sci 1:305–312
2. Allen RG, Pereira LS, Raes D, Smith M (1998) Crop evapotranspiration-guidelines for
computing crop water requirements-FAO irrigation and drainage paper 56. FAO, Rome
300(9):D05109
3. Atal KR, Zende AM (2015) Wet and dry spell characteristics of semi-arid region, western
Maharashtra, India. In: E-proceedings of the 36th IAHR world congress, deltas of the future and
what happens upstream, pp 1–7. International Association for Hydro-Environment Engineering
and Research-IAHR, The Hague, The Netherlands
4. Balaghi et al (2010) Managing climatic risks for enhanced food security: key information
capabilities. Proc Environ Sci 1:313–323
5. Bantilan MCS, Aupama KV (2006) Vulnerability and adaptation in dryland agriculture in
India’s SAT: experiences from ICRISAT’s village-level studies. J SAT Agric Res 2(1):1–14
6. Burrough PA, McDonnell RA (1998) Principles of geographical information systems. Oxford
University Press Inc., New York, pp 333–340
7. Coe R, Stern RD (2011) Assessing and addressing climate-induced risk in sub-Saharan rainfed
agriculture: lessons learned. Exp Agric 47(02):395–410
8. Cooper PJM, Dimes J, Rao KPC, Shapiro B, Shiferaw B, Twomlow S (2008) Coping better
with current climatic variability in the rain-fed farming systems of sub-Saharan Africa: an
essential first step in adapting to future climate change? Agric Ecosyst Environ 126(1):24–35
9. Dawane PR (2015) A comparative study of dairy co-operative unions in Satara district. Doctoral
dissertation. http://hdl.handle.net/10603/34954
10. Gautam R, Rao J (2007) Integrated water management-concepts of rainfed agriculture. Cen-
tral Research Institute of Dryland Agriculture (CRIDA), IARI. http://nsdl.niscair.res.in/jspui/
bitstream/123456789/554/1/Conceptsofrainfedagriculture-Formatted.pdf
11. Hargreaves GH (1994) Simplified coefficients for estimating monthly solar radiation in North
America and Europe, departmental paper, Department of boiler and irrigation engineering,
Utah State University, Logan, Utah
12. Himani (2014) An analysis of agriculture sector in Indian economy. IOSR J Human Soc Sci
19(1):47–54
13. Hochman Z, Horan H, Reddy DR, Sreenivas G, Tallapragada C, Adusumilli R, Roth CH (2017)
Smallholder farmers managing climate risk in India: 1. Adapting to a variable climate. Agric
Syst 150:54–66
14. Jagannath B (2014) Rainfall trend in drought prone region in eastern part of Satara district of
Maharashtra, India. Euro Acad Res 2(1):329–340
15. Krishna Kumar K, Rupa Kumar K, Ashrit RG, Deshpande NR, Hansen JW (2004) Climate
impacts on Indian agriculture. Int J Climatol 24(11):1375–1393
16. Mall RK, Singh R, Gupta A, Srinivasan G, Rathore LS (2006) Impact of climate change on
Indian agriculture: a review. Clim Change 78(2–4):445–478
17. Mathur AS, Das S, Sircar S (2006) Status of agriculture in India: trends and prospects. Econ
Polit Weekly 41(52):5327–5336
18. Meinke H, Nelson R, Kokic P, Stone R, Selvaraju R, Baethgen W (2006) Actionable climate
knowledge: from analysis to synthesis. Clim Res 33(1):101–110
19. Pandey MM (2009) Indian agriculture—an introduction [Country Report]. Asian and Pacific
centre for agricultural engineering and machinery (APCAEM). Thailand, Country Report, India
20. Philip GM, Watson DF (1982) A precise method for determining contoured surfaces. Aust Pet
Explor Assoc J 22(1):205–212
21. Sarker RP, Biswas BC (1978) Agricultural meteorology in India: a status report. In: Agrocli-
matological research needs of the semi-arid tropics, proceedings of the international workshop
on the agroclimatological research needs of the semi-arid tropics. International Crops Research
Institute for the Semi-Arid Tropics
22. Sharma VP (2011) India’s agricultural development under the new economic regime: policy
perspective and strategy for the 12th five year plan. Indian Institute of Management
23. Singh HP, Venkateswarlu B, Vittal KPR, Ramachandran K (2000) Management of rainfed agro-
ecosystem. In: Proceedings of the international conference on managing natural resources for
sustainable agricultural production in the 21st century, pp 14–18
24. Sinha SK, Singh GB, Rai M (1998) Decline in crop productivity in Haryana and Punjab: myth
or reality. Indian Council of Agricultural Research, New Delhi, India
25. Watson DF, Philip GM (1985) A refinement of inverse distance weighted interpolation. Geo-
processing 2(4):315–327
26. Zende AM, Nagarajan R, Atal KR (2012) Rainfall trend in semi arid region-Yerala river basin
of western Maharashtra, India. Int J Adv Technol 3:137–145
Chapter 9
A Survey on Human Group Activity
Recognition by Analysing Person Action
from Video Sequences Using Machine
Learning Techniques
1 Introduction
S. Kulkarni
D.Y. Patil College of Engineering, Akurdi, Pune, India
e-mail: [email protected]
S. Jadhav
Army Institute of Technology, Dighi, Pune, India
e-mail: [email protected]
S. Kulkarni · D. Adhikari (B)
MIT Academy of Engineering, Alandi (D), Pune, India
e-mail: [email protected]
The majority of previous work [2, 3] on GAR is modelled on a small group of actions
with comprehensible structural-level information. In [9], the authors modelled the scenes
using 2D polygonal shapes, and each person in the group is considered over a period of
time. The model is applied to abnormality detection in surveillance. The application of
rigid 3D polygonal formations to represent parade group activity is discussed in [10];
the authors treat the entire group as a single activity instead of considering every person
individually.
In [11], the authors employ probabilistic, highly structured techniques for recognizing
actions such as the way American football is played. In [12], recognizing multi-player
games and player strategies is discussed. Specific group behaviour activity is recognized
in [13] with the help of multiple cameras. Two hierarchical clustering approaches are
proposed in [14] for real-time surveillance in a challenging environment. The major
problem of these frameworks is their design for particular types of activities with a fixed
strategy; as a result, they cannot be applied to more general activities. Stochastic
representations are presented in [15, 16], which describe both spatial and temporal
engagements between group members, intended for more general group activities.
However, many researchers encode the representation of actions manually. The
above-mentioned approaches are able to recognize group activities automatically, which
is important for surveillance and sports analytics applications.
Various ML techniques [4] have been employed for modelling automatic human activity
recognition. Interactions between people are approximated based on probability
distributions using a hidden Markov model (HMM) with sequence learning. The HMM
framework [17] can be used to model stochastic processes where the non-observable state
of the system is governed by a Markov process. The observable sequences of the system
have an essential probabilistic dependency on that hidden state, and an HMM computing
the probability of an observed sequence is used for recognizing activities.
A layered probabilistic representation of HMMs has been successfully applied to
sequence learning of actions. A single-layer HMM faces overfitting problems due to
limited training data. In [17], a two-layer HMM structure is discussed that has the benefit
over previous works of discriminating group actions from individual actions: the layer
I-HMM represents the individual person's action, and the G-HMM designates the group
action. Most of these earlier methods are developed for a fixed number of group members
and cannot handle a changing number of group members. For automatic group activity
detection, an asynchronous HMM is implemented in [18] that handles a changing number
of group members. In [19], symmetric group activity is captured by an HMM model for
recognizing symmetric activities by computation of probability.
Though HMM integrates temporal information, it has the drawback of requiring
large-scale training data [20]. HMMs are less efficient for relating and differentiating
complicated temporal interactions along several trajectories in group activities. Besides,
in group activity recognition, handling the motion uncertainties of an individual person is
an essential issue. Since the uncertain motion of persons differs inherently in group
activities, the recognition accuracy may be significantly affected. Thus, it is essential to
build a more flexible recognition framework [21].
However, these approaches have restrictions in identifying scene-related actions due to
the neglect of relationships between persons and their surrounding scene. In unpredictable
situations, the HMM model becomes complex and limited in representing interactions
between people.
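As a concrete illustration of the recognize-by-probability scheme described above, the sketch below fits one HMM per activity class and labels a new feature sequence with the class whose model yields the highest log-likelihood. It uses the hmmlearn library as an assumption for illustration; none of the surveyed works is tied to this implementation.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed library, not from the surveyed works

def train_activity_hmms(sequences_by_label, n_states=4):
    """Fit one HMM per activity class from its training sequences
    (each sequence is a (T_i, D) array of per-frame features)."""
    models = {}
    for label, seqs in sequences_by_label.items():
        X = np.vstack(seqs)                       # stacked feature sequences
        lengths = [len(s) for s in seqs]          # sequence boundaries
        models[label] = GaussianHMM(n_components=n_states, n_iter=50).fit(X, lengths)
    return models

def recognize(models, seq):
    """Label a new (T, D) sequence by the class with the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(seq))
```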
relationships, and [24, 25] formulate ST-AOG with Monte Carlo Tree Search (MCTS)
cost-sensitive inference between persons' actions. This model leads to more challenging
learning problems. In [31], the authors implemented a fully connected graph to discover
subgroups of interacting people. In [27, 33], the authors modelled latent adaptive structures
and grouping nodes [30] to discriminate interactions in a scene.
These frameworks are trained using handcrafted features and cannot straightforwardly
be adopted in deep-learning learned-feature models. Effectively combining a graphical
model with a deep neural network [36, 38] captures dependencies between persons and
gains competitively better accuracy than the latent max-margin graphical method [27].
These methodologies for GAR are often not feasible because they frequently involve a
high computational cost. It is very difficult to generalize a higher-order interactional
framework using the graphical method. Various graphical structures [39] have been
explored to model pair-wise interaction context; however, they cannot represent the entire
interaction context adequately and efficiently. In [39], group activity recognition is used
to optimize multi-target tracking of pair-wise interactions between people using a
hypergraph Bayesian technique. This hypergraph solution is efficiently applicable to
real-world applications, as camera calibration is not essential.
Recent studies that develop contextual information models for an individual person and
nearby persons do not adequately represent the spatial and temporal reliability in group
actions. To overcome this problem, Kaneko et al. [40] illustrate a technique to assimilate
the individual recognition information through fully connected conditional random fields
(CRFs), which describe every relation among the people in a video frame and adjust the
relation strengths by means of the amount of their similarity.
where x denotes the person action labels, y is the group action label, and f(x) is the
predicted label. The inference problem is solved by optimizing the model parameters w to
find the best group action label y for the person action labels x. Maximizing the distance
between the hyperplanes requires minimizing w, which is an optimization problem and
can be written as in Eq. (2):
$$ \min_{w} \; \lambda \left\| w \right\|^{2} + \sum_{i=1}^{n} \max\left(0,\; 1 - y_i \left\langle x_i, w \right\rangle\right) \tag{2} $$
In Eq. (2), the first term is the regularizer of the SVM and the second term is the loss.
The regularizer λ balances margin maximization against the loss. For a misclassified
(margin-violating) sample, the weight vector w is updated using the gradients of both
terms; for a properly classified sample, only the regularizer contributes to the update.
The improved discriminative capacity of SVMs is robust and thus appropriate for GAR
in noisy surroundings. The kernel matrix involved in SVM is proficient at handling
high-dimensional data in the optimization process.
The major computational challenge in SVM learning is loss-augmented inference, i.e.,
finding the most violated group activity labelling. In SVM, kernel selection is a tricky
task on which the output accuracy depends for a given task. Moreover, the input vectors
of an SVM require fixed dimensions, whereas in GAR each sequence can have a variable
interval.
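A minimal sketch of this training rule for a linear SVM optimized by stochastic gradient descent on the objective of Eq. (2) is given below (the learning rate, λ and epoch count are illustrative choices).

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=20):
    """SGD on Eq. (2): lambda*||w||^2 + sum_i max(0, 1 - y_i <x_i, w>).
    y must be in {-1, +1}. A margin-violating sample contributes its hinge
    gradient; otherwise only the regularizer gradient is applied."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * X[i].dot(w) < 1:        # margin violation
                w -= lr * (2 * lam * w - y[i] * X[i])
            else:                             # correctly classified
                w -= lr * (2 * lam * w)
    return w
```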
Hand-engineered, static-feature human activity models [22–40] are not suitable for
automatic high-level learning. The state-of-the-art methods for GAR consist of handcrafted
feature extractors over densely or sparsely sampled significant points (e.g. HOG and SIFT)
in a Bag-of-Words static feature model, which is not suitable for continuous learning.
These features are then used to learn interactions between people. Manually selected
features and static interaction models require an independent design for each application.
These models are incapable of handling dynamic environments due to the static nature of
the feature model. Thus, it is essential to develop techniques for online activity recognition
based on automatic learning of the feature models for GAR, from unlabelled data, in an
unsupervised manner for newly arriving instances.
Recently, deep learning has been applied effectively in several areas such as computer
vision, natural language processing, audio recognition and bioinformatics. In [41], the
authors implemented automatically selected deep hybrid features for continuous active
learning. These deep learning methods ensure significant improvement in the performance
of action recognition in computer vision [42].
A deep model needs to learn the spatiotemporal relations between persons [36]. GAR is
a higher-level representation that captures scene-level actions, and the spatiotemporal
relations change across different group activities. For complex group activities, the
handcrafted feature approach is limited in representational power, as it uses a linear model.
The most recent subfield of ML, deep learning, is able to act as a bridge between big video
data and intelligent group activity learning. Deep learning approaches are capable of
representing high-level video data and classifying patterns by assembling multiple layers
of statistical units in a hierarchical structural design [43]. Most of the earlier group activity
recognition approaches do not deal with a high-order interactional framework and are
limited in offering a flexible and scalable structure. Deep learning-based methods have
end-to-end effectiveness within a trainable model for higher-level reasoning [44].
Most previous works take the approach of indirectly modelling structure with frame-level
classifiers run successively over a video at multiple temporal scales, which satisfies neither
accuracy nor computational efficiency.
Group activity recognition needs the sequential nature of frames, conveying individual
person actions and interactions among persons with dynamic temporal information. A
recurrent neural network (RNN) handles variable-length space–time inputs and dynamic
temporal behaviour, as it contains nonlinear units. RNNs are broadly appropriate for video
analysis tasks such as activity recognition [44]. To model dynamics from the person level
to the entire group, a deep model assembling several layers of RNNs has been recommended.
Visual recognition approaches emphasize deep learning methodologies that associate
comparatively low-level models' outputs to interpret higher-level compositional scenes;
this remains a challenging task. In [38], graphical models are integrated with deep neural
networks. Additionally, RNN models highlight the dynamics of human interaction as
collective group activity. RNN models suffer from vanishing gradients, which causes them
to neglect long-range human interaction dynamics.
Recurrent neural networks based on long short-term memory (LSTM) models have
achieved decent success in a great variety of applications with temporally sequenced data.
Sequence learning with RNN/LSTM over video frames delivers improved performance
in describing group-level dynamics, compared with recognizing group actions from a
single video frame.
In recent times, LSTM has turned out to be excellent at modelling the dynamics of
individual person action identification, owing to its capability of capturing sequential
temporal motion facts. An LSTM includes an additional 'memory cell' module for keeping
information over longer periods, which permits it to learn long-term dependencies of
human interaction dynamics [42].
In [44], the authors present an end-to-end model, trained with a combination of
backpropagation and the REINFORCE method, for action recognition in video, which
directly predicts the temporal bounds of actions. In multi-person events, though many
persons are acting, only a small group of persons contribute to a definite event in the
scene. In [45], the authors proposed a method which acquires time-varying features at
every time instant that are processed using an RNN to indicate the people responsible for
the event classification. The bidirectional LSTM hidden states are then used by an attention
model to recognize the 'key' player at each instant.
Recursive networks, including LSTM, accept ordered input sequences. However, in
GAR the positions of person-level features carry no natural order. To recognize group
activity, hierarchical deep models have been assembled [46, 47]. Additionally, such models
require explicit tags for person actions, which are expensive and rigid for recognizing
activities in sports like ice hockey.
For classifying group activity in ice hockey, as suggested in [48], a deep learning model
combines aggregated features of each person's data in the context of the activities in ice
hockey games. In [49], a hierarchical relational deep network model learns relational
feature representations between persons in a scene that can efficiently classify person and
group activity.
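In the spirit of these hierarchical models, the PyTorch sketch below stacks a person-level LSTM, an order-free max-pooling over persons, and a group-level LSTM. The layer sizes and the pooling choice are illustrative assumptions, not a reproduction of any surveyed architecture.

```python
import torch
import torch.nn as nn

class HierarchicalGAR(nn.Module):
    """Two-level LSTM for group activity recognition: a person-level LSTM
    encodes each individual's feature sequence, the encodings are max-pooled
    over persons per frame, and a group-level LSTM classifies the clip."""
    def __init__(self, feat_dim, hidden=128, n_classes=8):
        super().__init__()
        self.person_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.group_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, persons, time, feat_dim)
        b, p, t, d = x.shape
        h, _ = self.person_lstm(x.reshape(b * p, t, d))   # per-person dynamics
        h = h.reshape(b, p, t, -1).max(dim=1).values      # pool over persons per frame
        g, _ = self.group_lstm(h)                         # group-level temporal model
        return self.classifier(g[:, -1, :])               # clip-level logits
```

Max-pooling over the person axis makes the representation invariant to the unordered positions of person-level features, the issue noted above; a cross-entropy loss over the output logits trains the whole stack end-to-end.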
In [48], the authors proposed a bidirectional LSTM network for group interaction
prediction that incorporates both global motion and the detailed local action dynamics of
each individual. In [50], GAR is implemented by a confidence-energy recurrent network
(CERN) through minimization of the energy and maximization of the confidence measure
of predictions.
In group activities, recognizing multi-person interactions and the information of every
individual is a challenging problem. Group activity analysis is required in several
applications, such as societal incident prediction. A semantics-based GAR structure is
proposed in [51] which uses a two-stage LSTM model that accomplishes higher accuracy
and effectiveness. CNN features perform well in the task of scene classification. It is
extremely essential to be able to predict a group activity in real time for some application
scenarios, e.g. sports analytics.
6 Conclusion
In this review article, a brief introduction to GAR using ML models is given to offer an
insight to readers interested in this domain. ML in GAR started with probabilistic structure
modelling, followed by layered HMM models for sequence learning of actions. However,
these approaches had limitations in the complexity of inference under unexpected
circumstances, owing to the neglect of relationships between persons and their surrounding
scene. GAR based on handcrafted-feature machine learning models resolves the complexity
in scene-related actions by considering an individual action context model and a
person-to-person interaction graphical model along with an SVM classifier. A graphical
model, however, cannot represent the entire group interaction context adequately and
efficiently. This complexity has recently been addressed by modern developments in
learned features using deep learning models. Deep learning-based methods offer end-to-end
effectiveness for higher-level reasoning in GAR.
References
1. Blank M, Gorelick L, Shechtman E, Irani M, Basri R (2005) Actions as space-time shapes. In:
ICCV
2. Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In:
ICPR
3. Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of
the IEEE international conference on computer vision, pp 3551–3558
4. Turaga P, Chellappa R, Subrahmanian VS, Udrea O (2008) Machine recognition of human
activities: a survey. IEEE Trans Circuits Syst Video Technol 18(11):1473
5. Aggarwal JK, Ryoo MS (2011) Human activity analysis. ACM Comput Surv 43(3):1–43
6. Ke S-R, Thuc H, Lee Y-J, Hwang J-N, Yoo J-H, Choi K-H (2013) A review on video-based
human activity recognition. Computers 2(2):88–131
7. Vahora SA, Chauhan NC (2017) A comprehensive study of group activity recognition methods
in video. Indian J Sci Technol 10(23):1–11
8. Stergiou A, Poppe R (2018) Understanding human-human interactions: a survey. arXiv:1808.
00022
9. Vaswani N, Roy Chowdhury A, Chellappa R (2003) Activity recognition using the dynamics
of the configuration of interacting objects. In: 2003 IEEE computer society conference on
computer vision and pattern recognition, proceedings, vol 2, pp II-633
10. Khan SM, Shah M (2005) Detecting group activities using rigidity of formation. In: Proceedings
of the 13th annual ACM international conference on multimedia, pp 403–406
11. Intille SS, Bobick AF (2001) Recognizing planned multiperson action. Comput Vis Image
Underst 81(3):414–445
12. Moore D, Essa I (2002) Recognizing multitasked activities from video using stochastic context-
free grammar. In: AAAI/IAAI, pp 770–776
13. Cupillard F, Brémond F, Thonnat M (2002) Group behavior recognition with multiple cameras.
In: Sixth IEEE workshop on applications of computer vision, proceedings, pp 177–183
14. Chang M-C, Krahnstoever N, Lim S, Yu T (2010) Group level activity recognition in crowded
environments across multiple cameras. In: 7th IEEE international conference on advanced
video and signal based surveillance, pp 56–63
15. Ryoo MS, Aggarwal JK (2009) Stochastic representation and recognition of high-level group
activities: describing structural uncertainties in human activities. In: 2009 IEEE computer
society conference on computer vision and pattern recognition workshops, pp 11–11
16. Ryoo MS, Aggarwal JK (2011) Stochastic representation and recognition of high-level group
activities. Int J Comput Vision 93(2):183–200
17. Zhang D, Gatica-Perez D, Bengio S, McCowan I (2006) Modeling individual and group actions
in meetings with layered HMMs. IEEE Trans Multimedia 8(3):509–520
18. Lin W, Sun M-T, Poovendran R, Zhang Z (2010) Group event detection with a varying number
of group members for video surveillance. IEEE Trans Circuits Syst Video Technol 20(8):1057–
1067
19. Zaidenberg S, Boulay B, Brémond F (2012) A generic framework for video understanding
applied to group behavior recognition. In: IEEE ninth international conference on advanced
video and signal-based surveillance. Beijing, pp 136–142
20. Guo P, Miao Z, Zhang X, Shen Y, Wang S (2012) Coupled observation decomposed hidden
markov model for multiperson activity recognition. IEEE Trans Circuits Syst Video Technol
22(9):1306–1320
21. Lin W, Chu H, Wu J, Sheng B, Chen Z (2013) A heat-map-based algorithm for recognizing
group activities in videos. IEEE Trans Circuits Syst Video Technol 23(11):1980–1992
22. Choi W, Shahid K, Savarese S (2009) What are they doing? Collective activity classification
using spatio-temporal relationship among people. In: IEEE 12th international conference on
computer vision workshops, ICCV workshops, pp 1282–1289
23. Gupta A, Srinivasan P, Shi J, Davis LS (2009) Understanding videos, constructing plots learning
a visually grounded storyline model from annotated videos. In: IEEE conference on computer
vision and pattern recognition, pp 2012–2019
24. Amer MR, Xie D, Zhao M, Todorovic S, Zhu S-C (2012) Cost-sensitive top-down/bottom-
up inference for multiscale activity recognition. European conference on computer vision.
Springer, Berlin, Heidelberg, pp 187–200
25. Amer MR, Todorovic S, Fern A, Zhu S-C (2013) Monte carlo tree search for scheduling
activity recognition. In: Proceedings of the IEEE international conference on computer vision,
pp 1353–1360
26. Lan T, Wang Y, Yang W, Mori G (2010) Beyond actions: discriminative models for contextual
group activities. In: Advances in neural information processing systems, pp 1216–1224
27. Lan T, Wang Y, Yang W, Robinovitch SN, Mori G (2012) Discriminative latent models for
recognizing contextual group activities. IEEE Trans Pattern Anal Mach Intell 34(8):1549–1562
28. Lan T, Sigal L, Mori G (2012) Social roles in hierarchical models for human activity recognition.
In: IEEE conference on computer vision and pattern recognition, pp 1354–1361
29. Choi W, Savarese S (2012) A unified framework for multi-target tracking and collective activity
recognition. In: European conference on computer vision. Springer, Berlin, Heidelberg, pp
215–230
30. Amer MR, Lei P, Todorovic S (2014) Hirf: hierarchical random field for collective activity
recognition in videos. In: European conference on computer vision. Springer, pp 572–585
31. Choi W, Chao YW, Pantofaru C, Savarese S (2012) Discovering groups of people in images.
In: European conference on computer vision. Springer, pp 417–433
32. Khamis S, Morariu VI, Davis LS (2012) Combining per-frame and per-track cues for
multi-person action recognition. European conference on computer vision. Springer, Berlin,
Heidelberg, pp 116–129
33. Hajimirsadeghi H, Mori G (2015) Learning ensembles of potential functions for structured pre-
diction with latent variables. In: Proceedings of the IEEE international conference on computer
vision, pp 4059–4067
34. Zhu Y, Nayak NM, Roy-Chowdhury AK (2013) Contextaware modeling and recognition of
activities in video. In: Computer vision and pattern recognition (CVPR), IEEE conference, pp
2491–2498
35. Tran KN, Gala A, Kakadiaris IA, Shah SK (2014) Activity analysis in crowded environments
using social cues for group discovery and human interaction modeling. Pattern Recogn Lett
44:49–57
36. Deng Z, Zhai M, Chen L, Liu Y, Muralidharan S, Roshtkhari MJ, Mori G (2015) Deep structured
models for group activity recognition. In: British machine vision conference, pp 179.1–179
37. Hajimirsadeghi H, Yan W, Vahdat A, Mori G (2015) Visual recognition by counting instances: a
multi-instance cardinality potential kernel. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 2596–2605
38. Deng Z, Vahdat A, Hu H, Mori G (2016) Structure inference machines: recurrent neural
networks for analyzing relations in group activity recognition. In: Proceedings of the IEEE
conference on computer vision and pattern recognition, pp 4772–4781
39. Li W, Chang MC, Lyu S (2018) Who did what at where and when: simultaneous multi-person
tracking and activity recognition. arXiv:1807.01253
40. Kaneko T, Shimosaka M, Odashima S, Fukui R, Sato T (2014) A fully connected model for
consistent collective activity recognition in videos. Pattern Recogn Lett 43:109–118
41. Hasan M, Roy-Chowdhury AK (2015) A continuous learning framework for activity recogni-
tion using deep hybrid feature models. IEEE Trans Multimedia 17(11):1909–1922
42. Bisagno N, Zhang B, Conci N (2018) Group LSTM: group trajectory prediction in crowded
scenarios. In: Proceedings of the European conference on computer vision (ECCV)
43. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video
classification with convolutional neural networks. In: Proceedings of the IEEE conference on
computer vision and pattern recognition, pp 1725–1732
44. Yeung S, Russakovsky O, Mori G, Fei-Fei L (2016) End-to-end learning of action detection
from frame glimpses in videos. In: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp 2678–2687
45. Ramanathan V, Huang J, Abu-El-Haija S, Gorban A, Murphy K, Fei-Fei L (2016) Detecting
events and key actors in multi-person videos. In: Proceedings of the IEEE conference on
computer vision and pattern recognition, pp 3043–3053
46. Ibrahim MS, Muralidharan S, Deng Z, Vahdat A, Mori G (2016) A hierarchical deep temporal
model for group activity recognition. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 1971–1980
47. Ibrahim MS, Muralidharan S, Deng Z, Vahdat A, Mori G (2016) Hierarchical deep temporal
models for group activity recognition. arXiv:1607.02643
48. Tora MR, Chen J, Little JJ (2017) Classification of puck possession events in ice hockey. In:
2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp
147–154
49. Ibrahim MS, Mori G (2018) Hierarchical relational networks for group activity recognition and
retrieval. In: Proceedings of the European conference on computer vision (ECCV), pp 721–736
50. Shu T, Todorovic S, Zhu S-C (2017) CERN: confidence-energy recurrent network for group
activity recognition. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 5523–5531
51. Li X, Chuah MC (2017) Sbgar: semantics based group activity recognition. In: Proceedings of
the IEEE international conference on computer vision, pp 2876–2885
52. Shu X, Tang J, Qi G-J, Liu W, Yang J (2018) Hierarchical long short-term concurrent memory
for human interaction recognition. arXiv:1811.00270
53. Tsunoda T, Komori Y, Matsugu M, Harada T (2017) Football action recognition using hierarchi-
cal LSTM. In: Proceedings of the IEEE conference on computer vision and pattern recognition
workshops, pp 99–107
54. Yan R, Tang J, Shu X, Li Z, Tian Q (2018) Participation-contributed temporal dynamic model
for group activity recognition. In: ACM multimedia conference on multimedia conference, pp
1292–1300
55. Azar SM, Atigh MG, Nickabadi A (2018) A multi-stream convolutional neural network
framework for group activity recognition. arXiv:1812.10328
56. Bagautdinov T, Alahi A, Fleuret F, Fua P, Savarese S (2017) Social scene understanding: end-
to-end multi-person action localization and collective activity recognition. In: Proceedings of
the IEEE conference on computer vision and pattern recognition, pp 4315–4324
57. Biswas S, Gall J (2018) Structural recurrent neural network (SRNN) for group activity analysis.
In: IEEE winter conference on applications of computer vision (WACV), pp 1625–1632
58. Tang Y, Wang Z, Li P, Lu J, Yang M, Zhou J (2018) Mining semantics-preserving attention for
group activity recognition. In: 2018 ACM multimedia conference on multimedia conference,
pp 1283–1291
1 Introduction
With the introduction of an artificial intelligence news anchor by China's state news
agency Xinhua, the world of journalism has witnessed the adoption of the next level
of technology [2]. The ongoing transformation of the media landscape continues
unabated across the globe. These radical digital advancements and innovations can
be attributed to sea changes in information and communication technologies (ICTs)
[3]. Such a digital revolution is instrumental for the development of a nation. However,
the perception and implementation of ICTs differ between technologically advanced
and technologically marginalized nations. Moreover, they have invited numerous
deliberations, which vary from sector to sector in which the technology is utilized.
Since technology is one of the key factors in development, its positioning
by international development agencies merits discussion. The United Nations
asserts that the use of technology is required to minimize poverty, which can
drive society towards sustainable development. Hence, the use of technology and
human development cannot be isolated from each other. However, such technological
solutions should be used judiciously for societal development.
Artificial intelligence (AI) has been an important part of the technology industry.
As an academic discipline, AI came into existence in 1956 [4], and since then it
has experienced alternating waves of optimism and pessimism. AI, an area of computer
science, stresses the creation of intelligent machines that work and react like human
beings. To this end, computers with AI cover aspects of speech recognition, learning,
planning, and problem solving. AI can be divided into analytical, human-inspired,
and humanized artificial intelligence [5].
In the twenty-first century, AI is being used in health care, automotive, finance
and economics, video games, the military, and auditing, as well as in advertising,
journalism, and various other branches of media and communication. It has become
instrumental in resolving issues in computer science, software engineering, and
operations research [1, 6]. Hence, AI can be associated with any area in which
the efficiency of a human being can be enhanced.
Machine learning (ML), a subset of AI, is the scientific study of the algorithms and
statistical models that computer systems use to perform various assignments without
explicit instructions [7]. Machines are used extensively in agriculture, banking,
communication, sentiment analysis, software engineering, user behavior analytics,
search engines, and the like. Even though its applications are important across
fields, machine learning suffers from certain shortcomings: a lack of suitable data,
biases in choosing the data set, wrong algorithms, a lack of resources, and poor
evaluation can all cause such technology to underperform.
Taking Marshall McLuhan's famous dictum 'the medium is the message' [8] literally,
we find that the medium has received more emphasis since applications of AI came
into practice. As a result, the process of communication, including the source,
message, and receiver, is also influenced by AI-driven technology. Manuel Castells,
however, takes a different stand.
In the context of discussing the medium of communication and AI, digital journalism
and online activism come into the picture. The two are interrelated: online activism,
which is technology-driven, is mobilizing social movements [9]. Going beyond the
information society, however, Manuel Castells [10] opines that the key to social
structure and social movements is not the technology but the social networks that
manage the technologies used for information dissemination. Along with the
importance of the message, the vitality of the medium cannot be ignored.
The ICTs used for internal communication play a vital role in any organization
[11]. They are also being utilized in the process of teaching and learning. Both teach-
ing and learning are forms of communication. The power of technology is immense
4 What is Journalism?
alternative digital platforms like The Quint, The Wire, Firstpost, and Daily O is con-
siderably free from government and corporate interferences in terms of disseminating
information.
The rise of community media, a source of alternative media, could be one solution
for giving voice to the voiceless [24–26]. Citizen journalism, one form of alternative
media, is expanding its space in a field dominated by mainstream, business-driven
media outlets [27]. However, it cannot replace trained journalists, who are part and
parcel of the news ecology. Everyone can agree on one thing: the world is grappling
with the challenge of differentiating truth from myth [4, 28].
Similarly, media education in India is not free from flaws. Journalism education
is at a crossroads given the swift pace of digitization and globalization of media
[29]. Unfortunately, classroom teaching in several places has not been able to
accommodate these changes. Hence, media education pedagogy should take account
of technological improvements, including the foray of artificial intelligence.
Max Boenke, head of video at Berliner Morgenpost, has stated that news organizations
now frequently use 360° video for stories that may not always be interesting, which
may deter audiences from watching the news content further. On the contrary, a
practitioner previously associated with BBC Research & Development has opined
that if VR content is well made, it can empower the journalism field and work many
wonders [33]. This type of technology is very useful for science communication.
A study has found that journalism has become a driving force in mainstreaming
VR. The scope of journalism has expanded in terms of topic and style. However, the
use of VR has brought challenges to journalistic norms and practices [34]. Another
study has found that within the VR domain, journalism remains a minor segment.
Nonetheless, VR has enabled the emergence of immersive journalism, which has
fueled the media industry and media education. Emerging theoretical and conceptual
frameworks are providing a fillip to future academic and industry endeavors in
immersive journalism. Hence, the impact of VR on journalism remains a mixed bag
of advantages and disadvantages.
Similarly, AR has had a significant impact on journalism. The content of journalism
has undergone multiple changes with the advent of AR. It has enhanced audience
engagement in ways not available in traditional forms of disseminating information.
Moreover, this technology provides more contextualized information in the age of
fast-paced journalism [35]. A further aspect of AR is that it has fueled citizen
journalism and user-generated content, produced, distributed, and consumed by
citizens themselves. It is no surprise that VR and AR are change agents in the field
of journalism. However, AI has started proving more influential than VR and AR.
Chinese news apps like Jinri Toutiao, Qutoutiao, and Kuaibao are widely used to
provide personalized news from a range of news providers [32]. AI enables media
outlets to personalize content and make better recommendations to their audiences.
By virtue of robot journalism, more and more stories and videos can be incorporated.
AI provides technological support to journalists in the age of information overload.
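As a sketch of the simplest content-based form such personalization can take (a generic illustration in Python, not the actual architecture of any app named above; the articles and reading history are invented), candidates can be ranked by textual similarity to a reader's history:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = ["election results announced in key states",
            "stock market rallies on strong earnings",
            "local team wins the championship final"]
reading_history = ["markets close higher after earnings beat"]

vec = TfidfVectorizer()
doc_matrix = vec.fit_transform(articles + reading_history)
# Similarity between the reader's history and each candidate article
scores = cosine_similarity(doc_matrix[-1], doc_matrix[:-1]).ravel()
ranked = sorted(zip(scores, articles), reverse=True)
print(ranked[0][1])  # the most relevant article for this reader
```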
Journalistic practice has taken new directions in the light of understanding,
researching, and implementing AI. The main developments can be summarized as follows.
Quantitative formats have become a new phenomenon in modern-day journalism
[37]. This new kind of journalism has created a special space in academic literature
and media practice. With the application of AI, the quantitative format of journalism
has been transformed to the next level. As a result, the production, distribution, and
consumption of media content have been redefined.
When an increased volume of news content is produced and distributed automatically
for audience consumption, it is called automated journalism. It is an algorithmic
process that converts data sets into news stories written for human interest and
readability [41, 42]. This is only possible when AI is used in newsrooms. AI
mobilizes the newsroom in varied ways: streamlining the media production process,
automating routinized tasks, crunching more data, exploring media insights,
minimizing fake news, and delivering on requirements.
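A minimal sketch of the template-driven core of such a pipeline (illustrative only; production natural language generation systems used by real vendors are far richer) shows how a structured record becomes readable copy:

```python
def generate_story(record):
    """Turn one row of structured data into a short news item (toy example)."""
    template = ("{home} beat {away} {home_score}-{away_score} on {date}, "
                "with {top_scorer} scoring {goals} goals.")
    return template.format(**record)

match = {"home": "City FC", "away": "United", "home_score": 3,
         "away_score": 1, "date": "Sunday", "top_scorer": "A. Rao",
         "goals": 2}
print(generate_story(match))
```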
Leading media houses like The New York Times, Reuters, The Washington Post,
Quartz, Yahoo, the Associated Press, The Guardian, and the BBC have adopted AI
in their newsrooms. In an experimental mode, The New York Times launched its AI
project 'Editor' in 2015 to simplify the journalistic production process in the
newsroom. When writing an article, a journalist can use tags to highlight phrases,
headlines, or the main points of the text. Using various AI tools, The New York
Times has also attempted to moderate readers' comments, encouraging constructive
discussion while filtering out abusive remarks. The BBC, needless to say, has a huge
amount of data comprising news, features, and videos. Since 2012, it has been using
Juicer, a data extraction tool that links all this data to make it more accessible and
meaningful. Since 2016, Reuters has been using AI with assistance from the semantic
technology company Graphiq. With the help of AI, it is able to provide data-driven
news stories that are visually stimulating and easy to understand. Apart from speedy
access to data, AI also allows publishers to present information as simple tables or
charts [43].
The use of the Heliograf smart software at The Washington Post; Automated
Insights, a prominent natural language generation vendor, at Yahoo; Semantic
Discovery and NewsWhip at the Associated Press; and chatbot media interfaces at The
Guardian and Quartz are all indicators of AI adoption in newsrooms worldwide.
In India, this format of journalism is yet to take off. Leading media houses like
The Times of India, Hindustan Times, The Hindu, The Telegraph, The Indian Express,
NDTV, and India Today may experiment with AI to speed up the journalistic process.
The use of AI in journalistic practice has several advantages. Firstly, AI has helped
overcome certain contemporary journalistic issues. Journalists are able to analyze
data from several sources; apart from analyzing images, they can convert spoken
words into text, and text into audio and video. They are thus able to overcome
problems of information overload, lack of credibility, and shoddy journalism.
Secondly, journalists today face the issues of fake news and misinformation, and
Professor Kalina Bontcheva has further identified the prevalence of fake information
on social media [44]. With the help of AI, journalists can deliver enhanced news
quality and accuracy by identifying and dismantling fake news, and they benefit from
rapid automated fact-checking [45]. Thirdly, AI has quickened the news editing
process in line with a given editorial policy, bringing relief to journalists who slog
through tedious newsroom routines: software is available to collect news and later
rephrase it according to the prescribed editorial policy without any human
intervention. The Associated Press uses urbs to distribute news stories to various
media houses [36]. Fourthly, AI has facilitated a personalized news agenda, which
differs from one media house to another. By virtue of content personalization, it can
provide news services in multiple languages, keeping larger audiences across the
globe in mind. Fifthly, AI has propelled the speed of journalistic practice. Robot
reporters are able to produce news stories [1] at a faster pace, and the Associated
Press has confirmed that AI has enhanced its customer services more than tenfold.
Lastly, AI has provided a robust defense against manipulation and propaganda that
can endanger a nation's security. The Chinese government is using AI to track
objectionable content, dissent, and propaganda messages, and certain countries are
using AI to probe foreign interference in elections by analyzing content on Facebook
and other social media outlets [1, 46].
to adopt AI in order to avoid human resource costs apart from speeding up newsroom
processing. Fourthly, legitimate concerns can grip journalistic practice driven by AI.
As of now, technological developments offer no solutions to the legal problems
emanating from algorithm-generated content about private citizens; news
organizations may not be able to defend against legal issues arising from
algorithm-driven news stories on Google and similar digital news platforms. Fifthly,
data utilization has been an issue in AI [47, 48]. The security and privacy of data
have often been obstacles for developers and governments to overcome. To deliver
correct, objective, and accurate data, news organizations using AI must shoulder
ethical duties for the time being.
9 Way Forward
In the age of science, technology plays a vital role in society, and it keeps changing
with the pace of time. Therefore, in the context of AI and machine learning in
journalism, what needs to be automated should be automated. AI has reorganized
the newsroom as never before in several developed countries, and a participatory
culture is being exercised in newsroom setups across the globe. However, the
adoption of technologies should not push this professional field into a tailspin.
Whether the prediction that the newsroom of 2025 will be run by AI comes true will
be seen in the years to come.
To some, larger volumes of media content will in future be produced with the help of
AI. In one study report, 78% of industry respondents agreed that it is high time
to invest in artificial intelligence in the field of journalism [32]. Technology through
AI can act as an enabler of better and more impactful journalism. It can pave the
professional way toward aligning media content with social good [46].
As the domain of journalism is technology-driven, the industry will shift from time
to time with changes in AI. However, even as AI comes further into play, it need not
pose a threat to the profession and employment [38]. It can add further value for
journalists in the digital age. Machines will not completely replace journalists;
rather, they will enhance journalistic skills in more sophisticated ways. The presence
of human journalists is indispensable no matter how much technology changes.
The use of AI in the field of journalism in India will be a learning and experimental
curve. Ramesh Menon, an author and award-winning journalist, asserts that AI is
already being tried by the Chinese in newsrooms to write news stories and features,
and other countries are testing it too. It is just a matter of time before AI becomes
dominant in Indian newsrooms, and even media management systems will use it to
figure out consumer profiles and needs, keeping up with changing times and stiff
competition. We do not know what the next five or ten years are going to bring, and
we are at a loss in the classroom over how to prepare media students for the future.
We do not know how penetrative AI is going to be and how it will affect jobs.
Will Indian media houses invest in AI to write news stories? Of course they will.
And why not? After all, if you feed in the required information, the robot will
figure out how to pick up relevant information, the kind of intro to write, how to
structure the story logically, what conclusion it should have based on the research it
does from the Internet, and the graphs, illustrations, and photographs to be secured
for the story without copyright issues. Who will say no to this? However, the fact
is that the best stories will come from writer–journalists who can put fine detail,
empathy, drama, color, and analysis into their stories. What is really good about the
changing scenario where AI will come in is that tomorrow we can get robots to do
the routine work that today takes 80% of journalists' time. This can help reporters
and editors concentrate on big-ticket stories that require a lot of footwork: getting
to the right people, getting them to talk, analyzing the present, and even talking of
the way forward. Their time can be better utilized if they have robots to help with
the normal sundry work. Dynamic changes are happening, and we as journalists can
see that. AI and robots will write stories in the future, and they will get better at it
as humans fine-tune them. Fine-tuning has to be done, as there have been instances
of AI going completely wrong in figuring out a news story. Instead of being
overexcited, we must be very cautious. The human interface, therefore, cannot be
completely ruled out, as the human intelligence to tell right from wrong will remain
dominant. Imagine what will happen if AI gives a wrong headline or a wrong
interpretation.
It will just be a matter of time before AI-assisted automated reporting systems and
machine learning techniques sift through massive data to write news reports.
Whether we like it or not, this is going to affect media jobs; in another five years, we
will know. That is why media schools must start teaching techniques that will equip
students for the future rather than getting stuck on the inverted pyramid style of
writing, which the robot will do. Students will have to have different skills, and
media schools will do well to stress ethics, which robots cannot.
Interestingly, Google has given $805,000 to the British news agency Press
Association to build software that will gather, automate, and write nearly 30,000
local stories every month. Labeled RADAR (Reporters and Data and Robots), the
software will automate local reporting with large public databases from government
agencies or the local police.
Yonhap, a news agency in South Korea, has introduced an automated reporting
system to produce news on football games. Machine learning algorithms are already
being employed to write stories by Thomson Reuters, the Associated Press, and The
New York Times, while others use them to beef up their research. Web sites hungry
for content and news Web sites eager to be first with news and analysis are going to
use AI shortly. Very soon, one will not be able to even think of quality content
generation, at the speed required, without AI. Menon concludes that we might be
apprehensive about losing jobs, but in the final analysis no one will be able to replace
a good journalist who can write stylistically and turn phrases into very readable
copy, or sit down and use his or her knowledge to analyze the turn of historic events.
Suffice it to say, AI will have an immense impact on the ecosystem of the media
market around the globe. On the one hand, the technology has ample scope to create
social good, helping humans navigate the required data out of a huge pool through
personalized recommendations. On the other hand, AI can manufacture media
content to suit human whims in ways that may not benefit humankind, which could
only happen by deceiving media audiences. Through a business model of utter
manipulation, it may reduce social good to business good, which can only be a
short-lived business bubble. Hence, ethical challenges need to be amicably resolved,
and the industry must strike a balance between its use and human resources. In time,
there will be a clarion call to use AI in the field of journalism for the greater interest
of humankind.
References
1. Peiser J (2019, February 5) The rise of the robot reporter. The New York Times. Retrieved from
May 26, 2019. https://www.nytimes.com/2019/02/05/business/media/artificial-intelligence-
journalism-robots.html
2. Kuo L (2018, November 9) World’s first AI news anchor unveiled in China. The Guardian.
Retrieved from July 20, 2019. https://www.theguardian.com/world/2018/nov/09/worlds-first-ai-
news-anchor-unveiled-in-china
3. Wölker A, Powell TE (2018) Algorithms in the newsroom? News readers’ perceived credibility
and selection of automated journalism. Journalism, 1–18
4. Simon HA (1965) The shape of automation for men and management. Harper & Row, New
York
5. Kaplan A, Haenlein M (2018) Siri, Siri in my hand, who's the fairest in the land? On the
interpretations, illustrations and implications of artificial intelligence. Bus Horiz 62(1):15–25
6. Clark J (2015, 8 December) Why 2015 Was a breakthrough year in artificial intelligence.
Bloomberg News. Retrieved from July 18, 2019. https://www.bloomberg.com/news/articles/
2015-12-08/why-2015-was-a-breakthrough-year-in-artificial-intelligence
7. Salathé M, Vu DQ, Khandelwal S, Hunter DR (2013) The dynamics of health behavior senti-
ments on a large online social network. EPJ Data Science, 2(4). DOI:https://doi.org/10.1140/
epjds16
8. McLuhan M (1967) Understanding media: the extensions of man. Sphere Books, London
9. Basu P, De S (2016) Social media and social movement: contemporary online activism in Asia.
Media Watch 7(2):226–243
10. Castells M (2012) Networks of outrage and hope: social movements in the internet age. Polity
Press, Cambridge, UK
11. O’Donovan T (1998) The impact of information technology on internal communication. Educ
Inf Technol 3(1):3–26
12. Trivedi J (2014) Effectiveness of social media communications on Gen Y’s attitude and
purchase intentions. J Manag Outlook 4(2):30–41
13. Chakraborty U, Bhat S (2017) Credibility of online reviews and its impact on brand image.
Manag Res Rev 41(1):148–164
14. Chakraborty U, Bhat S (2018) Online reviews and its impact on brand equity. Int J Internet
Mark Advertising 12(2):159–180
15. Encyclopædia Britannica (2019) Journalism. Retrieved from July 10, 2019. https://www.
britannica.com/topic/journalism
16. Patankar S (2015) Facebook as platform for news dissemination, possibilities of research on
Facebook in Indian context. Amity J Media Commun Stud 6(2):49–56
17. Jaggi R, Ghosh M, Prakash G, Patankar S (2017) Health and fitness articles on Facebook-a
content analysis. Indian J Public Health Res Dev 8(4):762–767. https://doi.org/10.5958/0976-
5506.2017.00428.4
18. Kusuma KS (2018) Media, technology and protest: an Indian experience. Language
in India, 18(7). Retrieved from August 12, 2019. http://languageinindia.com/july2018/
kusumamediatechnologyprotest.pdf
19. Bulatova M, Kungurova O, Shtukina E (2019) Recognizing the role of blogging as a journalistic
practice in Kazakhstan. Media Watch 10(2):374–386
20. Yusuf Ahmed IS, Idid SA, Ahmad ZA (2018) News consumption through SNS platforms:
extended motivational model. Media Watch 9(1):18–36
21. Ghosh M (2019) Understanding the news seeking behavior online: a study of young audiences
in India. Media Watch 10:55–63
22. Biswal SK (2017) Role of the media in a democracy revisited. Vidura 9(1):19–20
23. Pradhan A, Narayanan S (2016) New media and social-political movements. In Narayan SS,
Narayanan S (eds) India connected. Sage, New Delhi
24. Dash B (2015) Community radio movement: an unending struggle in India. J Dev Commun
26(1):88–94
25. Dash B (2016) Media for empowerment: a study of community radio initiatives in Bundelkhand.
(PhD Thesis). Tata Institute of Social Sciences, Mumbai
26. Pavarala V, Malik K (2007) Other voices: the struggle for community radio in India. Sage,
Thousand Oaks, California
27. Biswal SK (2019) Exploring the role of citizen journalism in rural India. Media Watch 10:43–54
28. Simons M (2017, April 15) Journalism faces a crisis worldwide—we might be entering a new
dark age. The Guardian. Retrieved from July 15, 2019. https://www.theguardian.com/media/
2017/apr/15/journalism-faces-a-crisis-worldwide-we-might-be-entering-a-new-dark-age
29. Raman U (2015, December 1) Failure of communication: India must face up to the
rift between its newsrooms and classrooms. The Caravan. Retrieved from July 19,
2019. https://caravanmagazine.in/perspectives/failure-of-communication-rift-between-india-
newsrooms-clasrooms
30. Virtual Reality Society (2017) What is virtual reality? Retrieved from July 9, 2019. https://
www.vrs.org.uk/virtual-reality/what-is-virtual-reality.html
31. Owen T, Pitt F, Aronson-Rath R, Milward J (2015) Virtual reality journalism. Retrieved from
July 10, 2019. https://www.cjr.org/tow_center_reports/virtual_reality_journalism.php
32. Newman N (2019) Journalism, media, and technology trends and predictions 2019. Digital
News Project, 2019. Retrieved from May 31, 2019. https://reutersinstitute.politics.ox.ac.uk/
sites/default/files/2019-01/Newman_Predictions_2019_FINAL_2.pdf
33. BBC Academy (2017, November 7) Virtual reality journalism: is it the new real-
ity for news? Retrieved from July 10, 2019. https://www.bbc.co.uk/academy/en/articles/
art20171107112942639
34. Mabrook R, Singer JB (2019) Virtual reality, 360° video, and journalism studies: conceptual
approaches to immersive technologies. Journal Stud. DOI: https://doi.org/10.1080/1461670x.
2019.1568203
35. Pavlik JP, Bridges F (2013) The emergence of augmented reality (AR) as a storytelling medium
in journalism. Journal Commun Monogr 15(1):4–59
36. Graefe A (2016) Guide to automated journalism. Columbia Journalism Review, New York.
Retrieved from July 19, 2019. https://www.cjr.org/tow_center_reports/guide_to_automated_
journalism.php
37. Coddington M (2015) Qualifying journalism’s quantitative turn: a typology for evaluating
data journalism, computational journalism, and computer-assisted reporting. Digit Journal
3(3):331–348
38. Veglis A, Bratsas C (2017) Towards a taxonomy of data journalism. J Media Critiques
3(11):109–121
39. Williams II D (2017, December 6) The history of augmented reality (Infographic). Huff-
Post. Retrieved from July 9, 2019. https://www.huffpost.com/entry/the-history-of-augmented_
b_9955048
40. Hamilton JT, Turner F (2009) Accountability through algorithm: developing the field of
computational journalism. Behavioral Sciences Summer Workshop, Stanford. Retrieved
from July 16, 2019. http://web.stanford.edu/~fturner/Hamilton%20Turner%20Acc%20by%
20Alg%20Final.pdf/
41. Carlson M (2015) The robotic reporter: automated journalism and the redefinition of labor,
compositional forms, and journalistic authority. Digit Journal 3(3):416–431
42. Galily Y (2018) Artificial intelligence and sports journalism: is it a sweeping change? Technol
Soc 54:47–51
43. Underwood C (2019, January 31) Automated journalism—AI applications at New York Times,
Reuters, and Other Media Giants. Emerj Artificial Intelligence Research. Retrieved from July
20, 2019. https://emerj.com/ai-sector-overviews/automated-journalism-applications/
44. Ali W, Hassoun M (2019) Artificial intelligence and automated journalism: contemporary
challenges and new opportunities. Int J Media Journal Mass Commun 5(1):40–49
45. Graves L (2018) Understanding the promise and limits of automated fact-checking. Retrieved
from July 16, 2019. https://reutersinstitute.politics.ox.ac.uk/sites/default/files/2018-02/graves_
factsheet_180226%20FINAL.pdf
46. Sullivan D (2016, December 24) Google’s top results for ‘did the Holocaust happen’ now
expunged of denial sites. Retrieved July 15, 2019. https://searchengineland.com/google-
holocaust-denial-site-gone-266353
47. Monti M (2019) Automated journalism and freedom of information: ethical and juridi-
cal problems related to AI in the press field. Retrieved from July 10, 2019. http://www.
opiniojurisincomparatione.org/opinio/article/view/126
48. Wang W, Siau K (2018) Ethical and moral issues with AI: a case study on healthcare robots.
In: Twenty-fourth Americas conference on information systems
Chapter 11
The Space of Artificial Intelligence
in Public Relations: The Way Forward
1 Introduction
The industry of media and communication keeps evolving and moving ceaselessly
forward. Various types of mass communication, namely journalism, advertising,
Public Relations (PR), social media, audio, film and television, and photography,
have been witnessing sea changes with technological interventions. Imparting
education, itself a form of communication, is influenced by technology, and certain
technologies have made their mark in educational pedagogy [15]. Going further,
technology is used in varied other services, including the banking sector: banks offer
chatbots to improve customer service, and chatbots form an information system that
is essential for examining customer experiences [18]. Mobile communication and
other means of information and communication technologies are being used for
healthcare facilities [11, 12]. E-governance is able to meet the requirements of
citizens in building a progressive nation [10]. Moreover, with the explosion of the
Internet, the e-commerce sector is expanding, within which the optimum utilization
of big data opens bigger business possibilities [6]. Since consumers are active on
digital platforms, it is imperative to understand online reviews of functional and
hedonic brand images, which are required for the promotion of business and
branding [2]. Moreover, in the Indian context, Pandey [13] finds that the Internet, a
technological innovation, could speed up the developmental process.
Understanding the pattern of communication in journalism, film, or business by
tapping big data is essential. Overall, communication can be art-oriented or
business-oriented. Business communication, an applied form of communication,
remains an essential element of business management: it is the information
disseminated among people within and outside an organization.
ML is the scientific study of the algorithms and statistical models that computer
systems use to execute assignments without explicit instructions. ML algorithms are
being used in various fields, and ML is a subset of artificial intelligence (AI). With the
development of AI, ML, and natural language processing, along with new
technological platforms, it is feasible to remove humans from the processing of large
quantities of publicly available data [16]. Such machine applications have changed
the process of communication; traditionally, communication had been considered a
human process merely mediated by technology [3]. Adopting AI, the Associated
Press has changed its pattern of producing and distributing news. The technology
is being used in roles ranging from interpersonal interlocutor to content producer:
Amazon's Alexa, for example, is programmed to meet human queries and needs. AI
is automating and accelerating communication, and social processes are
increasingly reliant on it [5]. Therefore, there is a departure from the historical role
of media toward a new, emerging role in business and social communication.
Business communication, a form of communication with technological
interventions, is becoming more efficient at disseminating consumer information [9].
Such machine applications are also useful in the hiring process and the recruitment
industry, benefiting clients and candidates alike [19]. Therefore, such machine
utilization can be commercially oriented.
Needless to say, AI has slowly come into play in the communications industry. In the
domain of PR, AI has the capacity to frame data-driven content, handle crises, and
anticipate upcoming media trends. As of now, only prominent PR agencies have been
able to tap the power of AI in their daily work. It is being used to enhance people's
capabilities; as a result, people working in PR firms can spend more of their time on
creative activities [14]. Bourne [1] finds that ignoring AI may lower the level of
diversity in PR functions. Therefore, such technologies have become essential for
making PR activities effective.
With inputs from AI, new campaigns can be created, and a PR firm can get rid of
guesswork. Automation and ML help professionals understand which elements will
make PR campaigns successful. Since a machine works faster than human beings,
fast and accurate decisions can be taken, which benefits the client concerned. It
helps in understanding and foreseeing trends, which is ultimately required for the
decision-making process [14]. Such machine inputs have proved fruitful for both
qualitative and quantitative decisions, assisting in sorting out the timing, content,
medium, and audience of a campaign. By employing AI, PR persons can produce
hyper-specific materials best suited to their clients' requirements, reducing time
wasted on content creation for a specified audience.
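As a purely illustrative sketch (the campaign features, labels, and numbers below are hypothetical, not drawn from any cited study), a simple classifier can relate past campaign elements to outcomes and score a planned campaign:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per past campaign: [budget_in_thousands, used_video,
# influencer_count, posts_per_week]; label: 1 = campaign met its reach target
X = np.array([[50, 1, 3, 5], [10, 0, 0, 2], [80, 1, 5, 7], [20, 0, 1, 3]])
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
planned = np.array([[40, 1, 2, 4]])
print(model.predict_proba(planned)[0, 1])  # estimated success probability
```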
AI has brought relief to PR professionals from mundane tasks. Routinized or
repetitive work is easily accomplished with intervention from AI. By virtue of this
technology, Robotic Process Automation (RPA) is making several regular tasks
possible: scheduling calendars, structuring meeting notes, and similar work are done
by machines in the firm. The technology relieves PR persons of work like
administration, crunching numbers, and organizing files. Empowered by it, they are
able to create, organize, and prioritize tasks in their firms to meet their clients'
requirements. Several instances reveal that PR firms have started automating things;
earnings reports are one such instance, freeing time for creative assignments. Since
much of the work is completed with AI, PR persons are more engaged in project
ideation and in venturing into newer avenues.
PR tools such as Web sites and social media pages on Facebook, Instagram, and
Twitter can all be managed efficiently by AI.
PR research is another area where AI can be extremely effective. Real-time
sourcing of information about the company posted on the Web by media outlets or
other users can be carried out instantly. Analysis of opinion polls, consumer
feedback, and the monitoring of various platforms can be entrusted to AI. With the
growth of such technology, it is not difficult to use AI to draft press releases.
It is possible to present a technically correct press release based on the information
fed into the system and the situation at hand. However, this is one area where the
technology may fail to make a mark, for only humans can truly gauge the pulse of
the audience and the general public at large. A press release from an AI-operated
robot may be transparent, simple, and direct, but it may miss the human touch of
understanding and conveying the message emotionally. Secondly, while AI may
deliver efficiency, trust between an organization and its stakeholders could suffer.
People trust a brand that delivers on the promise of 'customer service', not one that
opts for the easy way out with AI. Also, when it comes to disaster management and
crisis PR, only humans can truly take decisions and win back trust from the
company's stakeholders.
Another eminent media educator who has been associated with PR research finds
that the use of AI in PR is a little-discussed topic and an abysmally underused
practice in India. At the global level, however, its importance has been acknowledged
in the form of research for PR campaigns, automation of routine yet important tasks,
analysis of people's sentiments, and crisis communication. Since PR is understood
as an image-building exercise by an organization, the effective use of media has
become highly automated, and PR will eventually depend upon AI for many types of
groundwork. At the same time, PR tries to mold public opinion toward a favorable
image of the organization, so human acumen and wit are essential to handle any
difficult situation. Therefore, human skills supported by AI to speed up the process
are the best way to conceptualize, execute, and complete PR activities. One cannot
depend completely on AI for such activities, since it does not differentiate between
people and machines and does not know how to handle human emotions. On the
other side, humans are not as fast as machines and can often be biased with data. So
AI should be used to compensate for the lacunae of human beings, and human beings
should be used to compensate for the shortcomings of AI; only then can we get the
best results.
A researcher in the field of PR and AI observes that data crunching is a big issue
requiring ample time. AI and ML cater to the needs of millennial consumers and
provide PR persons with a unique user experience in business communication.
Services such as 'speech-to-text conversion', 'sentiment analysis', 'massive data
analysis', and 'identification of common problems' are worth mentioning.
Though the need for AI is increasing, the requirement for humans does not decrease;
only the approach is changing, for our benefit. Some strata of the business world
consider AI a threat, but if we look back at how technology has helped us, we have
transitioned from letters to e-mails and WhatsApp. We cannot deny that life has
become much easier and communication grows stronger with each passing day.
Mudita Mishra, a media educator in the field of PR, asserts that the business of
PR is to craft an image for an organization or a person and then help sustain it by
managing reputation and, if one arises, crisis. In an era dominated by technological
evolution, the field of Corporate Communication (CorpComm) in PR has not been
left untouched by deliberations on AI. If one looks at CorpComm as a model wherein
communication is directed at internal and external publics, who are the stakeholders
of the organization, then there could not be a better time to appreciate the possibility
of integrating AI and CorpComm. This is understood better once we acknowledge
that PR in general has always had to deal with a certain public distrust, in that the
public has found it difficult since the early days to believe that corporate
communicators are telling the truth. On the contrary, corporate communicators have
been looked upon as defenders of the wrong-doings of organizations or whatever
other entities they represent, presenting such information in a manipulated way.
The possible role of AI and its applications can be a revolution that helps contest
this widely held disbelief in two broad ways. Firstly, by helping the public find
comfort and belief in the authenticity of communication coming from corporate
houses, since such communications will no longer be the sole proprietary of humans,
so to speak, but will be validated by 'machines'. Secondly, and more importantly,
the former reasoning will stand because the primary data for crafting such
communications will itself be gathered with the help of AI. This technology will be
able to sense and pick up the various data points generated by countless
conversations among consumers, audiences, and citizens, that is, the external public
in general. All in all, CorpComm integrated with AI makes a very robust case for the
image of PR itself, a case that the PR industry has until now been fighting solely
with the aid of its human representatives. AI will help bring an unbiased perspective
to this mix, or at least, hopefully, a perception of unbiased communication coming
from corporations.
Professor Pradeep Nair, a media educator and researcher, opines that AI
technologies have revolutionized the methods of teaching PR as a subject and as a
practice. They have brought a paradigm shift in PR education by making teaching
pedagogy more approachable, and they make the learning process more collaborative
by engaging both teachers and students in real-life corporate situations. Today, AI is
used in teaching PR for designing teaching modules and for engaging students in
assignments, assessment, and the evaluation of student projects. It is used to assess
the subjective understanding of students by designing instructional content to match
their immediate needs. It provides multiple digital platforms for interacting with and
instructing students about emerging PR practices, thus making PR a more structured
and streamlined academic discipline. By producing smart audio-visual content, a
teacher has an opportunity to help students understand the PR industry, improve
their insights into consumer needs, and create fine-tuned PR messages. The use of
AI in teaching PR helps media educators adopt a utilitarian approach, analyzing the
most prevalent trends among students and addressing them accordingly. It also helps
media educators teach students how PR companies improve their services with the
help of high-speed data to understand the digital DNA, so that tailored, customized
PR messages can be designed as per the requirements of the market.
5 Concluding Remarks
Acknowledgements The researcher sincerely thanks Sneha Verghese, Archana Kumari, and
Prerona Sengupta for their insightful comments on the topic.
Chapter 12
Roulette Wheel Selection-Based
Computational Intelligence Technique to
Design an Efficient Transmission Policy
for Energy Harvesting Sensors
1 Introduction
Internet of Things (IoT) and machine learning are getting much attention in recent
years. Besides the connectivity of computers and mobile phones, Internet of Things
empowers the connectivity among billions of ‘things’ and devices through Inter-
net or local area networks (LAN). Multifarious applications of IoT include, but are not limited to, household needs, industrial applications, wireless sensors, etc. Most of
these applications need gathering and transmission of sensed data round the clock.
Enabling these billions of devices requires a continuous supply of energy for their
uninterrupted functioning. Conventional power supply may not be feasible for all
applications, especially those in the wireless domain, and the usage of batteries requires
timely monitoring and replacement. As a result of the unprecedented growth in IoT-enabled devices, maintenance of these power resources has become a hefty exercise and has led to the evolution of energy harvesting (EH) sensors as a viable option [1].
EH sensors harvest energy from natural resources in small amounts, store it in a rechargeable battery and use it immediately for their needs [2, 3, 18]. EH sensors contribute to green communication and can operate independently over long periods of time. These EH devices are finding considerable application in wireless sensor networks (WSN) because of the benefits mentioned [17–26]. EH sensors are relatively low-cost devices and operate with a minimal amount of energy. So their
prevalent presence can be seen in many applications like monitoring and controlling
the environment, especially in remote and dangerous areas [22]. In EH applications,
the harvesting process and phenomenon need to be properly analysed and adapted; trans-
mission energy management with a deterministic harvesting process has been studied in [14]. However, energy is harvested from the environment and is thus unlikely to be deterministic. Due to the sporadic nature of resource availability, the chance of energy harvesting in a given time interval can be treated as a stochastic process with some harvesting probability [30, 31]. The quantity of harvested energy also depends on various factors and varies from time to time, and this should also be considered while
setting up the simulation environment [13, 14, 20]. Different power management
schemes have been studied in [14, 29]. In a communication model, either the transmitting node, the receiving node, or both can be capable of energy harvesting.
Considerable research work has been carried out with transmitter nodes alone being
energy harvesting capable [31, 32], receiving nodes alone being harvesting capable
[27, 28] and both transmitting and receiving nodes harvesting capable [23–26]. In
this paper, we consider that the transmitting nodes are capable of harvesting energy
from the environment. To evaluate the performance of any transmission policy, a communication model with a performance metric needs to be considered [6, 9, 15, 16, 31]. The performance metric can aim at the optimization of any single key parameter or of the overall energy utilization, such that it defines the efficiency of the communication model under all constraints.
1.1 Background
Because of their replenishing abilities and prolonged lifetime, energy harvesting sensors have found a place in communication models [4–11]. Significant research work has
been carried out on different factors aiming at achieving better performance of the
system. An efficient multi-stage energy transfer system, which has the relation among
various components of the system and their optimal selection according to the needs,
is presented in [13]. Adapting those guidelines in hardware components considerably
increases the capacity of an EH sensor. Multi-parametric programming approaches
with adjustments to different crucial parameters such as buffer size, sampling rate,
timing and routing are studied in [21, 29]. In [4], a directional water-filling algorithm to minimize the transmission completion time of the communication session while maximizing the throughput has been introduced. An online dynamic program-
ming framework to control the admissions into the data buffer is derived in [5]. Energy management policies stabilizing the data queues and optimizing the delay properties in a single-user communication model under a linear approximation are studied in [6]. Throughput optimal energy allocation with a time-constrained slotted setting in an
energy harvesting system is studied in [8]. Some other performance metrics of an EH
sensor that have been studied in the literature include the minimization of transmis-
sion time [9], improving the quality of coverage [15], maximization of short-term
throughput [16], and optimizing throughput while minimizing delay [6]. Apart from these, the main aim of any communication model also includes the faithful transmission of the collected data. Packet drop probability, or packet outage probability, gives a measure of successful transmission. In [31, 32], packet drop probability has been considered as the performance metric.
2 System Model
As mentioned, EH sensors harvest energy from natural resources such as solar, vibration, mechanical and wind sources [13, 14, 18, 33]. Due to the sporadic nature of resource availability, the harvesting process cannot be deterministic; it can only be treated as a stochastic process with a probability P_harv of energy being harvested at the beginning of every slot time. In this paper, we consider wind energy as the source of harvesting. The intensity of the wind usually varies from period to period adhering to the environmental conditions [35, 36]. Wind power density changes from place to place. The wind power at any selected site is proportional to the cube of the wind speed; therefore, the wind power
density (WPD) can be written as [35, 37, 38]:
$$\mathrm{WPD} = \frac{1}{2n}\,\rho \sum_{i=1}^{n} \nu_i^{3} = \frac{1}{2}\,\rho\,\nu^{3} \quad \left[\frac{\text{W}}{\text{m}^{2}}\right] \qquad (1)$$
where WPD is the wind power density in W/m², ρ is the air density in kg/m³, ν is the mean wind speed in m/s and n is the number of observations in the specific time period. This wind power density is estimated closely using the Weibull distribution function in [35] as
$$p(\nu) = \frac{1}{2}\,\rho A \nu^{3} \qquad (2)$$

$$\mathrm{WPD} = \frac{p(\nu)}{A} = \frac{1}{A}\int_{0}^{\infty} \frac{1}{2}\,\rho A \nu^{3} f(\nu)\,d\nu = \frac{1}{2}\,\rho c^{3}\,\Gamma\!\left(1 + \frac{3}{g}\right) \qquad (3)$$
where p(ν) denotes the power available in watts, A denotes the rotor swept area in m², Γ denotes the mathematical gamma function and f(ν) denotes the two-parameter Weibull function with c as the Weibull scale parameter in m/s and g as the Weibull shape parameter.
$$\Gamma(n) = \int_{0}^{\infty} e^{-x} x^{n-1}\,dx \qquad (4)$$

$$f(\nu) = \frac{g}{c}\left(\frac{\nu}{c}\right)^{g-1} \exp\!\left(-\left(\frac{\nu}{c}\right)^{g}\right) \qquad (5)$$
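As an illustrative cross-check of (1)–(5), the short Python sketch below compares the closed form in (3) with a Monte Carlo average of (1); the air density and Weibull parameters are placeholder assumptions, not the site values adopted below.

```python
import math
import random

rho = 1.225      # air density in kg/m^3 (standard sea-level value, assumed)
c, g = 6.0, 2.0  # Weibull scale (m/s) and shape parameters (placeholders)

# Closed form (3): WPD = 0.5 * rho * c^3 * Gamma(1 + 3/g)
wpd_analytic = 0.5 * rho * c ** 3 * math.gamma(1.0 + 3.0 / g)

# Monte Carlo version of (1): average 0.5 * rho * v^3 over Weibull wind speeds
random.seed(1)
n = 200_000
wpd_mc = sum(0.5 * rho * random.weibullvariate(c, g) ** 3 for _ in range(n)) / n

print(f"WPD closed form : {wpd_analytic:.1f} W/m^2")
print(f"WPD Monte Carlo : {wpd_mc:.1f} W/m^2")  # the two should agree closely
```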
For simulating the harvesting environment, the Weibull scale (c) and shape (g) parameters of the Taralkatti area, Karnataka, have been adopted [35]. To capture the randomness in wind and to evaluate the robustness of the designed policies, g and c values for four months are considered. Therefore, the quantity of harvested energy can be any of the values formed by all possible combinations of g and c. The performance of the transmission policies has been evaluated and presented for a range of harvesting probabilities (P_harv).
where B_{m,n} represents the battery level and E_{m,n} represents the energy spent in transmitting a data packet at slot n of time frame m. The packet re-transmission model with re-transmission index (K) of four is shown in Fig. 1.

Fig. 1 Packet re-transmission model with energy harvesting for re-transmission index, K = 4
To imitate the wireless channel and its fading effects, a Rayleigh fading channel with
additive white Gaussian noise has been considered [32, 34]. A discrete channel model
is used for covering the fading gains of the wireless channel. All the possible fading
gains or channel gains are considered and covered by a discrete channel gain set, G = {γ_1, γ_2, ..., γ_N}. Besides a reduction in memory, discretization of channel gains also brings all the benefits of quantization [31]. These states can be computed based on
the underlying Doppler frequency and fading distribution, following the procedures
mentioned in [39, 40]. The channel is considered to be a block fading channel [31], which implies that the channel coherence time is larger than the frame duration and that the channel changes every frame, i.e. T_frame ≪ T_coherence [34]. So, the channel characteristics remain constant for one frame time, as shown in Fig. 2. The Rayleigh distribution of the channel gain (γ) with scale parameter α is defined as:
$$R(\alpha) = \frac{\gamma}{\alpha^{2}} \exp\!\left(\frac{-\gamma^{2}}{2\alpha^{2}}\right) \qquad (7)$$

$$R_{\text{mean}}(\alpha) = \alpha \sqrt{\frac{\pi}{2}} \qquad (8)$$

$$R_{\text{sd}}(\alpha) = \alpha \sqrt{\frac{4-\pi}{2}} \qquad (9)$$
where N_0 represents the power spectral density of the additive white Gaussian noise (AWGN) and l represents the packet length, i.e. the number of bits per packet.
Therefore, once we fix the desired accuracy of the communication system in terms of its probability of error, we can derive a relation between the channel gain and the transmission energy as
$$\gamma_m \times E_{m,n} = \frac{\left[Q^{-1}\!\left(1 - (1 - P_e)^{1/l}\right)\right]^{2} \times N_0}{2} \qquad (11)$$
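To make (11) concrete, a minimal Python sketch is given below; the target packet error probability, packet length and noise density are illustrative assumptions rather than the settings used in the experiments.

```python
from statistics import NormalDist

def tx_energy(gamma_m, Pe=1e-2, l=1024, N0=1e-9):
    """Energy implied by (11): gamma_m * E = [Q^{-1}(1 - (1 - Pe)^(1/l))]^2 * N0 / 2.

    Pe, l and N0 are placeholder values for illustration only.
    """
    p_bit = 1.0 - (1.0 - Pe) ** (1.0 / l)      # equivalent per-bit error target
    q_inv = NormalDist().inv_cdf(1.0 - p_bit)  # Q^{-1}(p) = Phi^{-1}(1 - p)
    return q_inv ** 2 * N0 / (2.0 * gamma_m)

# A higher channel gain estimate lowers the energy needed for the first attempt.
for gain in (4.0, 8.0):
    print(gain, tx_energy(gain))
```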
From (11), it is evident that if we estimate the channel gain value γ_m, we can estimate the amount of transmission energy required for the packet's first transmission attempt (k = 1). Therefore, in this paper, we estimate the channel gain to determine the next possible state S_{m+1,1}, and in turn E_{m+1,1}, so as to transmit the data packet with an optimal transmission energy. The state of the EH node at the Kth slot of the mth frame will be S_{m,K} = (E_{m,K}, γ_m, B_{m,K}, R_{m,K}, K). The transition of state from T_{m,K} to T_{m+1,1} will be
A buffer D of size K will be maintained, and whenever the EH node receives a NACK on the first attempt and an ACK in a subsequent attempt, the deviation between the two channel gain estimates is recorded and stored in this buffer. So, the buffer D holds the most recent K deviations of faulty estimations, and these deviations are used in finding the energy for the next transmission attempt. The incremental energy is taken as the root mean square value of the deviations stored in D.
The state transfer from slot T_{m,n} (S_{m,n}) to slot T_{m,n+1} (S_{m,n+1}) can be observed as:

$$S_{m,n+1} = \begin{cases} \left(0,\ \gamma_m,\ B_{m,n+1},\ 0,\ K\right) & \text{if } k = K \\[4pt] \left(0,\ \gamma_m,\ B_{m,n+1},\ 1,\ k+1\right) & \text{if } k < K \ \&\ R_{m,n} = 1 \\[4pt] \left(0,\ \gamma_m,\ B_{m,n+1},\ 0,\ k+1\right) & \text{if } k < K \ \&\ R_{m,n} = 0 \ \&\ B_{m,n+1} < E_{m+1,1} \\[4pt] \left(E_{m,n+1},\ \gamma_m,\ B_{m,n+1},\ 0,\ k+1\right) & \text{if } k < K \ \&\ R_{m,n} = 0 \ \&\ B_{m,n+1} > E_{m+1,1} \end{cases} \qquad (15)$$
Our main aim is to reduce the packet drop probability to as low as possible. Therefore, in addition to the above-mentioned state transitions, a special case is introduced when 0.7·B_{m,n+1} ≤ E_{m+1,1} < B_{m,n+1} and k = K. In this scenario, a transmission attempt is made, as it may result in R_{m,n} = 1.
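The rule (15), together with the special last-attempt case just described, can be transcribed directly. The Python sketch below assumes the state tuple layout S_{m,n} = (E, γ, B, R, k) used above; it is a sketch of the transition logic, not the chapter's simulation code.

```python
def next_state(S, B_next, E_next, K):
    """One application of (15). S = (E, gamma, B, R, k);
    B_next = B_{m,n+1}, E_next = E_{m+1,1} (estimated energy for the next attempt)."""
    E, gamma, B, R, k = S
    if k == K:
        # Special case: the required energy fits in the battery and exceeds 70%
        # of it, so one more attempt is made in the hope of an ACK.
        if 0.7 * B_next <= E_next < B_next:
            return (E_next, gamma, B_next, 0, K)
        return (0, gamma, B_next, 0, K)          # out of attempts: packet dropped
    if R == 1:                                   # ACK received on the last attempt
        return (0, gamma, B_next, 1, k + 1)
    if B_next < E_next:                          # NACK, battery insufficient
        return (0, gamma, B_next, 0, k + 1)
    return (E_next, gamma, B_next, 0, k + 1)     # NACK, retry with estimated energy
```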
Channel gain estimation plays a vital role in energy estimation and, in turn, in state estimation. An accurate channel gain estimate reduces the number of re-transmissions and leads to an efficient transmission policy design. In this paper, we consider the well-known artificial neural network (ANN) and extreme learning machine (ELM) techniques in addition to the proposed maximum matched distribution (MMD) technique, and we compare the packet outage probability of all these techniques. The initial sample value for the channel gain estimate is taken as twice the mean channel gain value. From the next sample onwards, the policies use their respective estimation techniques. All these computational intelligence techniques require a lead on channel gain history
to estimate the next possible sample value. As mentioned in Sect. 2.3, Rayleigh dis-
tribution is used for simulating the wireless channel. While evaluating performance for a Rayleigh fading channel of mean gain γ_mid, these models are trained with a sequence of mean gain γ_mid whose length equals one-third of the total number of slots. However, the channel gains to be estimated are simulated as a Rayleigh sequence with mean value varying arbitrarily between γ_mid − 2 and γ_mid + 2, to measure the robustness to variations from the trained values. A two-hidden-layer ANN with a resilient back-propagation mechanism is considered [44, 45]. Additive hidden nodes with the log-sigmoid function are considered for the ELM to estimate the next channel gain sample [41–43]. The most recent history of ten channel gain samples is considered while estimating the next channel gain sample in the ANN and ELM. In the MMD model, a transition probability distribution matrix T is constructed using the history of channel gain samples. In T, each row and each column correspond to the discretized channel gains. The elements represent the transition probability from one state to the other. Therefore, a selected row of T represents the transition probabilities from that channel gain to all other channel gain values. The algorithm for constructing T is as follows:
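A minimal Python sketch of one way to carry out this construction is given below; it assumes each gain sample is digitized to the nearest level of the discrete set G and that transitions are counted between consecutive history samples.

```python
import numpy as np

def build_T(history, levels):
    """Transition probability matrix T from a channel gain history.

    history : past channel gain estimates
    levels  : 1-D array holding the discretized gain set G
    Entry (i, j) estimates the probability of moving from level i to level j.
    """
    levels = np.asarray(levels)
    # Digitize every sample to the index of its nearest discrete gain level.
    idx = [int(np.argmin(np.abs(levels - s))) for s in history]
    T = np.zeros((len(levels), len(levels)))
    for a, b in zip(idx[:-1], idx[1:]):   # count observed transitions
        T[a, b] += 1.0
    rows = T.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0                 # leave never-visited rows at zero
    return T / rows                       # normalize rows into probabilities
```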
Once T is formed, each individual row is normalized so that it represents transition probabilities. To estimate the next channel gain sample, the latest channel gain estimate is considered. This sample is digitized, and its corresponding row of T, which represents the transition probabilities, is taken. A roulette wheel is formed with the ten most probable states of the transition probabilities, and one of them is selected as the next channel estimate based on the roulette wheel selection mechanism.
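A sketch of this selection step, building on the build_T sketch above (the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_next_gain(T, levels, last_gain, top=10):
    """Roulette wheel selection of the next channel gain estimate.

    The row of T for the digitized last estimate is taken; its `top` most
    probable destination states form the wheel, and one is drawn with
    probability proportional to its transition probability.
    """
    levels = np.asarray(levels)
    row = T[int(np.argmin(np.abs(levels - last_gain)))]
    top_idx = np.argsort(row)[-top:]        # the ten most probable states
    weights = row[top_idx]
    if weights.sum() == 0:                  # untrained row: fall back to uniform
        weights = np.ones_like(weights)
    return float(levels[rng.choice(top_idx, p=weights / weights.sum())])
```

The weighted draw is exactly the roulette wheel: each candidate state occupies an arc of the wheel proportional to its transition probability.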
From the observations of Table 1, it can be understood that with an increase in harvested energy, whether through an increase in the harvesting probability or in the re-transmission index, the performance of all the policies improves. It can also be observed that the improvements are somewhat more significant in the case of the transmission policy that uses the MMD-RW mechanism to estimate the channel gain. Another important observation is that when the channel gain is low, the performance of MMD-RW is not as prominent as in higher channel gain scenarios; the performance of the policy employing the ANN improves moderately, but the change in outage probability is not as remarkable as for the policy employing the MMD-RW technique. This clearly indicates that the transmission policy employing the MMD-RW method is more effective but consumes a little extra energy for transmitting a packet. Therefore, if that little extra requirement is fulfilled, its performance could be even better.
Understandably, the extra energy cannot be assured from natural resources, as it is not in the control of the EH node. Increasing the re-transmission index helps but also increases the delay in transmitting subsequent packets from source to destination. Therefore, we propose the concept of a collaborative transmission policy. According to this policy, when an EH node runs out of energy, i.e. the required estimated energy for transmitting a packet is more than the battery reserve, it seeks the help
of other nearby EH nodes in the network. An EH node that has sufficient energy for transmitting its own packet as well as the requesting node's packet will then transmit both packets. In this manner, more efficient data transmission can be achieved by means of collaboration. Selection of the EH node that aids the requesting node depends on a reward factor; the node with the highest reward factor is given priority.
To illustrate the collaborative transmission policy, a wireless sensor network
with EH sensor nodes is considered. At each EH node, a metric space of its sur-
rounding nodes is maintained. In this work, a three-dimensional Euclidean space is considered, so every EH node has prior knowledge of its adjacent nodes and the respective Euclidean distances. The Euclidean distance between two points (x_1, y_1, z_1) and (x_2, y_2, z_2) is measured as
$$d_{12} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} \qquad (16)$$
Battery insufficiency occurs when the required transmission energy of the packet
is higher than the battery reserve.
In this case, the node looks for the status of its neighbour nodes by seeking their status vector S and evaluates the reward factor associated with each of them. The rewarding factor R varies inversely with distance, as path loss increases with distance.
Table 1 Packet drop probability or outage probability for re-transmission index K = 4 and K = 6 with different energy harvesting probabilities, P_harv, and mean channel gains, γ_mid
Attempt MMD-RW ELM ANN
K = 4 K = 6 K = 4 K = 6 K = 4 K = 6
Harvesting probability, P_harv = 0.3 and γ_mid = 6
1 0.4115 0.2295 0.3915 0.2642 0.3895 0.2695
2 0.4218 0.2320 0.3870 0.2697 0.3835 0.2738
3 0.4128 0.2298 0.3860 0.2585 0.3832 0.2538
4 0.4190 0.2385 0.3972 0.2662 0.3935 0.2600
5 0.4205 0.2375 0.3907 0.2750 0.3872 0.2730
Avg 0.4171 0.2335 0.3905 0.2667 0.3874 0.2660
Harvesting probability, P_harv = 0.3 and γ_mid = 8
1 0.2807 0.1235 0.2635 0.1930 0.2767 0.1817
2 0.2840 0.1242 0.2550 0.1943 0.2715 0.1883
3 0.2860 0.1225 0.2612 0.1963 0.2737 0.1923
4 0.2830 0.1270 0.2520 0.1875 0.2697 0.1870
5 0.2908 0.1258 0.2590 0.1983 0.2750 0.1975
Avg 0.2849 0.1246 0.2581 0.1939 0.2733 0.1894
Harvesting probability, P_harv = 0.5 and γ_mid = 6
1 0.1832 0.0742 0.2050 0.1805 0.2013 0.1030
2 0.1742 0.0775 0.1940 0.2000 0.1960 0.1060
3 0.1710 0.0762 0.2060 0.1935 0.2000 0.0980
4 0.1772 0.0788 0.2010 0.1890 0.1990 0.1055
5 0.1810 0.0757 0.2023 0.1835 0.2025 0.1030
Avg 0.1773 0.0765 0.2017 0.1893 0.1998 0.1031
Harvesting probability, P_harv = 0.5 and γ_mid = 8
1 0.0843 0.0288 0.1708 0.1667 0.1643 0.1625
2 0.0887 0.0318 0.1752 0.1590 0.1730 0.1485
3 0.0840 0.0300 0.1812 0.1570 0.1742 0.1512
4 0.0910 0.0275 0.1782 0.1745 0.1718 0.1690
5 0.0890 0.0280 0.1812 0.1600 0.1752 0.1557
Avg 0.0874 0.0292 0.1773 0.1634 0.1717 0.1574
Harvesting probability, P_harv = 0.7 and γ_mid = 6
1 0.0890 0.0340 0.1660 0.1457 0.1598 0.1427
2 0.0870 0.0333 0.1737 0.1593 0.1663 0.1400
3 0.0830 0.0338 0.1817 0.1552 0.1730 0.1740
4 0.0862 0.0348 0.1802 0.1532 0.1745 0.1462
5 0.0860 0.0320 0.1750 0.1562 0.1668 0.1353
Avg 0.0862 0.0336 0.1753 0.1539 0.1681 0.1476
Harvesting probability, P_harv = 0.7 and γ_mid = 8
1 0.0293 0.0090 0.1633 0.1368 0.1608 0.1330
2 0.0285 0.0085 0.1603 0.1312 0.1585 0.1258
3 0.0225 0.0088 0.1600 0.1292 0.1555 0.1217
4 0.0290 0.0083 0.1530 0.1418 0.1507 0.1380
5 0.0293 0.0090 0.1683 0.1332 0.1650 0.1288
Avg 0.0277 0.0087 0.1610 0.1344 0.1581 0.1295
Let EH node i be in battery shortage and trying to evaluate the reward factor associated with EH node l. Then,

$$R^{i,l}_{m,n} = \frac{B^{l}_{m,n}}{d_{i,l}} + \left(B^{l}_{m,n} - E^{i}_{m,n}\right)\,\delta_{R^{l}_{m,n-1},\,1} \qquad (18)$$

where R^{i,l}_{m,n} is the rewarding factor of EH node l with respect to EH node i, E^{i}_{m,n} denotes the estimated energy required for transmitting the packet of EH node i, B^{l}_{m,n} denotes the battery reserve of EH node l during slot n of time frame m, R^{l}_{m,n-1} is the acknowledgement of EH node l at the instant T_{m,n-1}, and δ_{i,j} is the Kronecker delta function.
After evaluating the rewarding factor of all the EH nodes in its vicinity, EH node i requests the node with the highest reward factor to transmit the packet.
$$L = \arg\max_{l \in V} \left(R^{i,l}_{m,n}\right) \qquad (19)$$
where L denotes the selected node to request the packet transmission and V denotes
the neighbourhood of EH node i. Once the EH node L transmits the packet of node
i, its battery reserve will be updated to

$$B^{L}_{m,n} = B^{L}_{m,n} - E^{i}_{m,n} \qquad (20)$$
where E_u is the energy spent in transmitting a packet per unit distance among the nodes. If none of the neighbouring nodes has a positive R^{i,l}_{m,n}, or if EH node i does not have the battery reserve even to transmit the packet to EH node L, or if the packet receives a NACK (R_{m,n} = −1) after transmission, the result is a packet drop or outage once the packet runs out of the maximum number of transmission attempts (K).
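A minimal Python sketch stringing together (16), (18) and (19) for this selection step; the neighbour data structure and its field names are illustrative assumptions.

```python
import math

def euclidean(p, q):
    """Distance (16) between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def select_helper(pos_i, E_i, neighbours):
    """Evaluate (18) for every neighbour of node i and return the node with the
    highest reward factor, as in (19). Each neighbour is a dict with assumed
    fields 'pos' (3-D coordinates), 'battery' (B^l) and 'last_ack' (R^l_{m,n-1})."""
    best, best_reward = None, 0.0
    for node in neighbours:
        d = euclidean(pos_i, node['pos'])               # nodes at distinct positions
        delta = 1.0 if node['last_ack'] == 1 else 0.0   # Kronecker delta in (18)
        reward = node['battery'] / d + (node['battery'] - E_i) * delta
        if reward > best_reward:                        # only positive rewards qualify
            best, best_reward = node, reward
    return best   # None when no neighbour has a positive reward factor
```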
5 Numerical Results
Experiments are carried out to evaluate and compare the performance of all the proposed transmission policies under various affecting factors such as the harvesting probability (P_harv), mean channel gain (γ_mid) and re-transmission index (K). The results are summarized and presented in Table 1. Further, the impact of the collaborative transmission policy in achieving even better results is studied, and the results are tabulated in Table 2. MMD-RW represents the transmission policy that employed the maximum matched distribution model-based technique with roulette wheel selection to estimate the channel gain. ELM represents the transmission policy with the extreme learning machine technique to estimate the channel gain, and ANN represents the transmission policy that utilizes an artificial neural network to estimate the channel gain.
Table 2 Packet drop probability or outage probability for the collaborative re-transmission policy with re-transmission index K = 4, different energy harvesting probabilities, P_harv, and mean channel gains, γ_mid
Attempt Without collaboration With collaborative policy
RW ANN RW ANN RW(RW) ANN(RW) RW(ANN) ANN(ANN)
Harvesting probability, P_harv = 0.3 and γ_mid = 6
1 0.4158 0.3842 0.4105 0.3890 0.2542 0.2705 0.2452 0.2690
2 0.3960 0.3693 0.3937 0.3665 0.2375 0.2630 0.2243 0.2610
3 0.4030 0.3563 0.4083 0.3688 0.2452 0.2675 0.2430 0.2660
4 0.4020 0.3625 0.4065 0.3703 0.2437 0.2650 0.2362 0.2590
5 0.3995 0.3655 0.4073 0.3795 0.2540 0.2717 0.2370 0.2715
Avg 0.4033 0.3676 0.4053 0.3748 0.2469 0.2675 0.2371 0.2653
Harvesting probability, P_harv = 0.3 and γ_mid = 8
1 0.2880 0.2732 0.2855 0.2925 0.1600 0.2268 0.1557 0.2225
2 0.2940 0.2692 0.2960 0.2750 0.1570 0.2253 0.1520 0.2200
3 0.2815 0.2652 0.2858 0.2637 0.1565 0.2308 0.1525 0.2238
4 0.3040 0.2913 0.3090 0.3010 0.1745 0.2490 0.1658 0.2432
5 0.2968 0.2695 0.3035 0.2863 0.1593 0.2288 0.1510 0.2253
Avg 0.2929 0.2737 0.2960 0.2837 0.1615 0.2321 0.1554 0.2270
Harvesting probability, P_harv = 0.5 and γ_mid = 6
1 0.1757 0.2200 0.1767 0.2182 0.1050 0.2065 0.0988 0.2050
2 0.1742 0.2188 0.1735 0.2172 0.1075 0.2100 0.1035 0.2062
3 0.1875 0.2235 0.1802 0.2255 0.1118 0.2075 0.1045 0.2115
4 0.1822 0.2238 0.1730 0.2190 0.1160 0.2092 0.1047 0.2047
5 0.1643 0.2150 0.1740 0.2097 0.1037 0.2040 0.1003 0.2020
Avg 0.1768 0.2202 0.1755 0.2179 0.1088 0.2074 0.1024 0.2059
Harvesting probability, P_harv = 0.5 and γ_mid = 8
1 0.0770 0.1305 0.0783 0.1375 0.0503 0.1275 0.0520 0.1260
2 0.0785 0.1305 0.0808 0.1435 0.0450 0.1278 0.0510 0.1280
3 0.0777 0.1393 0.0725 0.1445 0.0465 0.1340 0.0473 0.1355
4 0.0732 0.1365 0.0760 0.1390 0.0437 0.1313 0.0418 0.1315
5 0.0650 0.1123 0.0675 0.1288 0.0370 0.1100 0.0413 0.1108
Avg 0.0743 0.1298 0.0750 0.1387 0.0445 0.1261 0.0467 0.1264
Harvesting probability, P_harv = 0.7 and γ_mid = 6
1 0.0808 0.1747 0.0803 0.1760 0.0540 0.1745 0.0542 0.1747
2 0.0858 0.1742 0.0860 0.1787 0.0610 0.1732 0.0597 0.1725
3 0.0717 0.1792 0.0775 0.1735 0.0510 0.1777 0.0490 0.1777
4 0.0757 0.1742 0.0745 0.1797 0.0525 0.1742 0.0460 0.1742
5 0.0777 0.1787 0.0765 0.1727 0.0503 0.1790 0.0515 0.1767
Avg 0.0783 0.1762 0.0790 0.1761 0.0538 0.1757 0.0521 0.1752
Harvesting probability, P_harv = 0.7 and γ_mid = 8
1 0.0285 0.1462 0.0222 0.1462 0.0205 0.1460 0.0138 0.1457
2 0.0280 0.1485 0.0238 0.1490 0.0187 0.1485 0.0165 0.1482
3 0.0272 0.1560 0.0270 0.1568 0.0182 0.1557 0.0175 0.1555
4 0.0195 0.1475 0.0265 0.1455 0.0140 0.1467 0.0182 0.1475
5 0.0290 0.1552 0.0270 0.1497 0.0182 0.1550 0.0203 0.1545
Avg 0.0264 0.1507 0.0253 0.1494 0.0179 0.1504 0.0173 0.1503
Fig. 3 Battery utilization and energy harvesting for the first 100 time slots of transmission. Environmental conditions: harvesting probability, P_harv = 0.6, mean channel gain, γ_mid = 10 and re-transmission index, K = 4
The Weibull scale and shape parameters corresponding to the Taralkatti, Karnataka, region for four months have been adopted, so the number of probable combinations which decide the amount of harvested energy is sixteen. This helps in assessing the suitability of the proposed policies to tough environmental conditions. The frequency of energy harvesting is governed by the harvesting probability. The initial battery level at an EH node is taken as 70% of the total capacity. Battery levels of all the transmission policies for the first hundred time slots are shown in Fig. 3, under harvesting probability P_harv = 0.6, re-transmission index K = 4, and mean channel gain γ_mid = 10. An increase in battery level indicates energy harvesting, and the dips in battery level over the slots indicate the energy spent in transmitting the packets.
The harvesting probability indicates the frequency of energy getting harvested from the natural resources. The higher the probability, the higher the energy availability in the battery to spend on transmitting packets. This results in fewer non-transmissions due to the lack of sufficient energy (15), and in turn a reduction in packet outage probability. The probability of harvesting and the amount of energy harvested each time depend entirely on environmental conditions. The reduction in packet drop probability with increasing harvesting probability can be seen in Fig. 4.
Fig. 4 Variation in packet drop probability against harvesting probability under a fixed environment of mean channel gain, γ_mid = 7 and re-transmission index, K = 4
From (11), it can be understood that the channel gain, or fading gain, directly affects the amount of energy required to transmit a packet from the EH node. If the channel gain is higher, the energy required is lower and the battery is not exhausted as quickly. This results in fewer packet outages due to lack of energy, which directly reduces the packet drop probability. Simulations are carried out for a range of channel gain variations. As mentioned earlier, for a considered γ_mid, the computational intelligence techniques are trained with a history of channel gain samples with mean channel gain γ_mid, whereas the actual channel gain samples to be estimated vary arbitrarily from γ_mid − 2 to γ_mid + 2. From Table 1, a significant reduction in packet drop probability with higher γ_mid can be observed. For a good channel with higher gains, packet outages are fewer and performance is higher.
The higher the K, the higher the number of slots, and the greater the chance for energy harvesting as well as transmission attempts. Higher harvesting increases the battery reserve and reduces the packet drop probability. It may happen that the EH node transmits the packet with energy close to the required value in the (K − 1)th attempt; if another attempt is allowed, it may result in an ACK. Therefore, the chance of reaching the actual required energy increases with an increase in the re-transmission index. Though a higher K gives better performance, it significantly increases the delay. The effect of K on packet drop probability is quite evident from the results presented in Table 1.
6 Conclusion
References
1. Rabaey JM, Ammer MJ, da Silva JL, Patel D, Roundy S (Jul 2000) PicoRadio supports ad hoc ultra-low power wireless networking. IEEE Computer Society, pp 42–48
2. Paradiso JA, Starner T (2005) Energy scavenging for mobile and wireless electronics. IEEE Pervasive Computing, pp 18–27
3. Chalasani S, Conrad JM (2008) A survey of energy harvesting sources for embedded systems. IEEE Southeast Conf., pp 442–447
4. Ozel O, Tutuncuoglu K, Yang J, Ulukus S, Yener A (Sep. 2011) Transmission with energy
harvesting nodes in fading wireless channels: Optimal policies. IEEE J. Sel. Areas Commun.
29(8):1732–1743
5. Lei J, Yates R, Greenstein L (February 2009) A generic model for optimizing single-hop
transmission policy of replenishable sensors. IEEE Trans. Wireless Commun. 8:547–551
6. Sharma V, Mukherji U, Joseph V, Gupta S (April 2010) Optimal energy management policies
for energy harvesting sensor nodes. IEEE Trans. Wireless Commun. 9:1326–1336
7. Gatzianas M, Georgiadis L, Tassiulas L (February 2010) Control of wireless networks with
rechargeable batteries. IEEE Trans. Wireless Commun. 9:581–593
8. Ho C, Zhang R (June 2010) Optimal energy allocation for wireless communications powered by energy harvesters. In: IEEE ISIT
9. Yang J, Ulukus S (March 2010) Transmission completion time minimization in an energy harvesting system. In: CISS
10. Yang J, Ulukus S (Jan 2012) Optimal packet scheduling in an energy harvesting communication system. IEEE Trans. Commun.
11. Tutuncuoglu K, Yener A (Mar 2012) Optimum transmission policies for battery limited energy harvesting nodes. IEEE Trans. Wireless Commun.
12. Stanojev I, Simeone O, Bar-Ness Y, Kim D (2007) On the energy efficiency of hybrid-ARQ protocols in fading channels. In: Proc. ICC, pp 3173–3177
13. Jiang X, Polastre J, Culler D (2005) Perpetual environmentally powered sensor networks. In: Proc. 4th ACM/IEEE IPSN, pp 463–468
14. Kansal A, Hsu J, Zahedi S, Srivastava MB (Sep. 2007) Power management in energy harvesting
sensor networks. ACM Trans. Embedded Comput. Syst. 6(4):32–66
15. Seyedi A, Sikdar B (2010) Energy efficient transmission strategies for body sensor networks with energy harvesting. IEEE Transactions on Communications
16. Tutuncuoglu K, Yener A (2011) Short-term throughput maximization for battery limited energy harvesting nodes
17. Shenqiu Z, Seyedi A, Sikdar B (Aug. 2013) An analytical approach to the design of energy
harvesting wireless sensor nodes. IEEE Trans. Wireless Commun. 12(8):4010–4024
18. Roundy S, Steingart D, Frechette L, Wright PK, Rabaey JM (Jan 2004) Power sources for wireless sensor networks. In: Proc. First European Workshop on Wireless Sensor Networks (EWSN '04), pp 1–17
19. Mao S, Cheung MH, Wong VWS (Jul 2014) Joint energy allocation for sensing and transmission in rechargeable wireless sensor networks. IEEE Trans. Vehicular Tech. 63(6):2862–2875
20. Zhang B, Simon R, Aydin H (2011) Maximal utility rate allocation for energy harvesting wireless sensor networks. In: Proc. ACM Int. Conf. Model., Anal., Simul. Wireless Mobile Syst., pp 7–16
21. Liu R-S, Sinha P, Koksal CE (March 2010) Joint energy management and resource allocation in rechargeable sensor networks. In: Proc. IEEE INFOCOM
22. Stankovic JA, Abdelzaher TE, Lu C, Sha L, Hou JC (Jul. 2003) Real time communication and
coordination in embedded sensor networks. Proc. IEEE 91(7):1002–1022
23. Zhou S, Chen T, Chen W, Niu Z (Mar. 2015) Outage minimization for a fading wireless link
with energy harvesting transmitter and receiver. IEEE J. Sel. Areas Commun. 33(3):496–511
24. Sharma MK, Murthy CR (2014) Packet drop probability analysis of ARQ and HARQ-CC with energy harvesting transmitters and receivers. In: Proc. IEEE Global Signal Inf. Process., Atlanta, GA, USA, Dec, pp 148–152
25. Doshi J, Vaze R (2014) Long term throughput and approximate capacity of transmitter-receiver energy harvesting channel with fading. In: Proc. IEEE Int. Conf. Commun. Syst., Macau, China, Nov, pp 46–50
26. Yadav A, Goonewardena M, Ajib W, Elbiaze H (2015) Novel retransmission scheme for energy harvesting transmitter and receiver. In: Proc. IEEE Int. Conf. Commun., London, U.K., Jun, pp 4810–4815
27. Mahdavi-Doost H, Yates RD (2013) Energy harvesting receivers: finite battery capacity. In: Proc. IEEE Int. Symp. Inf. Theory, Istanbul, Turkey, Mar, pp 1799–1803
28. Yates RD, Mahdavi-Doost H (2013) Energy harvesting receivers: optimal sampling and decoding policies. In: Proc. IEEE Global Signal Inf. Process., Austin, TX, USA, Dec, pp 367–370
29. Moser C, Thiele L, Brunelli D, Benini L (Apr. 2010) Adaptive Power Management for Envi-
ronmentally Powered Systems. IEEE Trans. Comput. 59(4):478–491
30. Lei J, Yates R, Greenstein L (Feb. 2009) A generic model for optimizing single-hop transmission
policy of replenishable sensors. IEEE Trans. Wireless Commun. 8(2):547–551
31. Aprem A, Murthy CR, Mehta NB (Oct. 2013) Transmit power control policies for energy
harvesting sensors with retransmissions. IEEE J. Sel. Topics Signal Process. 7(5):895–906
32. Yadav A, Goonewardena M, Ajib W, Dobre OA, Elbiaze H (Dec 2017) Energy management for energy harvesting wireless sensors with adaptive retransmission. IEEE Transactions on Communications
33. Roseveare N, Natarajan B (Jan 2014) An alternative perspective on utility maximization in energy-harvesting wireless sensor networks. IEEE Trans. Vehicular Tech. 63(1)
34. Medepally B, Mehta NB, Murthy CR (2009) Implications of energy profile and storage on energy harvesting sensor link performance. In: IEEE Global Telecommunications Conference, GLOBECOM 2009
35. Sharma A, Saxena BK, Rao KVS (2017) Comparison of wind speed, wind directions, and Weibull parameters for sites having same wind power density. In: IEEE Intl. Conf. on Technological Advancements in Power and Energy
36. Kong F, Dong C, Liu X, Zeng H (Nov 2014) Quantity versus quality: optimal harvesting wind power for the smart grid. Proc. of the IEEE 102(11):1762–1776
37. Masseran N (Mar. 2015) Evaluating wind power density models and their statistical properties.
Energy 84:533–541
Author Index

B
Balavand, Alireza, 51
Bansode, Nutan V., 1
Biswal, Santosh Kumar, 155, 169

G
Gopi, E. S., 13, 177
Gouda, Nikhil Kumar, 155

H
Husseinzadeh Kashan, Ali, 51

J
Jadhav, Mrunalini, 69
Jadhav, Sangeeta, 141

K
Khare, Kanchan, 69
Kulkarni, Anand J., 113
Kulkarni, Rushikesh, 69

M
Mahammad, Shaik, 177

N
Nair, Sankar N., 13

R
Rajarapollu, Prachi R., 1

S
Sarmah, Dipti Kapoor, 91
Satapathy, Suresh Chandra, 113
Singh, T. P., 127

Y
Yogesh, Vineetha, 177

Z
Zouggar, Souad Taleb, 31