Springer Tracts in Nature-Inspired Computing
Bio-inspired Algorithms for Data Streaming and Visualization, Big Data Management, and Fog Computing
Springer Tracts in Nature-Inspired Computing
Series Editors
Xin-She Yang, School of Science and Technology, Middlesex University, London,
UK
Nilanjan Dey, Department of Information Technology, Techno India College of
Technology, Kolkata, India
Simon Fong, Faculty of Science and Technology, University of Macau, Macau,
Macao
The book series is aimed at providing an exchange platform for researchers to
summarize the latest research and developments related to nature-inspired
computing in the most general sense. It includes analysis of nature-inspired
algorithms and techniques, inspiration from natural and biological systems,
computational mechanisms and models that imitate them in various fields, and
the applications to solve real-world problems in different disciplines. The book
series addresses the most recent innovations and developments in nature-inspired
computation, algorithms, models and methods, implementation, tools, architectures,
frameworks, structures, applications associated with bio-inspired methodologies
and other relevant areas.
The book series covers the topics and fields of Nature-Inspired Computing,
Bio-inspired Methods, Swarm Intelligence, Computational Intelligence,
Evolutionary Computation, Nature-Inspired Algorithms, Neural Computing, Data
Mining, Artificial Intelligence, Machine Learning, Theoretical Foundations and
Analysis, and Multi-Agent Systems. In addition, case studies, implementation of
methods and algorithms as well as applications in a diverse range of areas such as
Bioinformatics, Big Data, Computer Science, Signal and Image Processing,
Computer Vision, Biomedical and Health Science, Business Planning, Vehicle
Routing and others are also an important part of this book series.
The series publishes monographs, edited volumes and selected proceedings.
Editors
Simon James Fong, University of Macau, Taipa, China
Richard C. Millham, Durban University of Technology, Durban, South Africa
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
The purpose of this book is to provide insights into recently developed bio-inspired algorithms within the emerging trends of fog computing, sentiment analysis, and data streaming. It also aims to provide a more comprehensive approach to big data management, from the pre-processing phase through analytics to visualisation. Although the application domains of these new algorithms may be mentioned, the algorithms are not confined to any particular application domain. Instead, they provide an update on emerging research areas such as data streaming, fog computing, and the phases of big data management.
This book begins with a description of bio-inspired algorithms and how they are developed, with an applied focus on missing value extrapolation (an area of big data pre-processing). The book then proceeds to chapters on identifying features through deep learning, an overview of data mining, recognising association rules, data streaming, data visualisation, business intelligence, and current big data tools.
One reason for writing this book is that the bio-inspired approach does not receive much attention. Yet it continues to show considerable promise and diversity in its approaches to many issues in big data and streaming. This book outlines the use of these algorithms in all phases of data management, not just a specific phase such as data mining or business intelligence. Most chapters demonstrate the effectiveness of a selected bio-inspired algorithm by evaluating it experimentally against other algorithms. One chapter provides an overview and evaluation of traditional algorithms, both sequential and parallel, for use in data mining. This chapter is complemented by another chapter that uses a bio-inspired algorithm for data mining, so that the reader can choose the most appropriate algorithm for data mining within a particular context. All chapters provide references for further reading, and selected chapters also include ideas for future research.
Chapter 1
The Big Data Approach Using Bio-Inspired Algorithms: Data Imputation
1 Introduction
In this chapter, the concept of big data is defined based on five characteristics, namely velocity, volume, value, veracity, and variety. Once big data is defined, its sequential phases, namely data cleansing, data mining, and visualization, are described. Each phase consists of several sub-phases or steps, and these steps are briefly described. A number of methods may be employed to manipulate data. In this chapter, we look at an approach for data imputation, that is, the extrapolation of missing values in data. The concept of genetic algorithms is presented, along with their offshoot, meta-heuristic algorithms. A specialized type of meta-heuristic algorithm, the bio-inspired algorithm, is introduced with several example algorithms. As an example, a bio-inspired algorithm based on the kestrel is introduced, using the steps outlined by Zang et al. (2010) for developing a bio-inspired algorithm. This kestrel algorithm is used as an approach for data imputation within the big data phases framework.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
S. J. Fong and R. C. Millham (eds.), Bio-inspired Algorithms for Data Streaming and Visualization, Big Data Management, and Fog Computing, Springer Tracts in Nature-Inspired Computing, https://doi.org/10.1007/978-981-15-6695-0_1
2 R. Millham et al.
The definition of big data varies from one author to another. A common definition is that big data denotes high-volume, complicated data sets that come from heterogeneous sources (Banupriya and Vijayadeepa 2015). Because definitions vary so widely, big data is often described by its characteristics of velocity, volume, value, veracity, and variety, which constitute the framework of big data. Velocity relates to how quickly incoming data needs to be evaluated and results produced (Longbottom and Bamforth 2013). Volume relates to the amount of data to be processed. Veracity relates to the accuracy of the results emerging from big data processes. Value is the degree of worth that the user will obtain from the big data analysis.
of more complex individuals (as in the wolf search algorithm) (Tang et al. 2012), or the single behaviour of an individual (Agbehadji et al. 2016b). Within these categories, such as particle swarm, there are many types (such as artificial bee colony). Within these types, the same algorithm has many applications, for example image processing and route optimization (Selvaraj et al. 2014).
A major category of bio-inspired algorithms is particle swarm algorithms. Particle swarm optimization is a bio-inspired technique that mimics the swarm behaviour of animals such as fish schools or bird flocks (Kennedy and Eberhart 1995). The behaviour of the swarm is determined by how particles adapt and make decisions in changing their positions within a space relative to the positions of neighbouring particles. The advantage of swarm behaviour is that the particles' individual decisions lead to local interactions among them, which in turn give rise to emergent behaviour (Krause et al. 2013). Swarm algorithms that focus on finding near-optimal solutions include the firefly algorithm, the bat algorithm (Yang and Deb 2009), and cuckoo search (Yang and Deb 2009).
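The swarm behaviour described above can be illustrated with a minimal particle swarm sketch. It is an illustration under stated assumptions, not an implementation from this chapter. It uses the widely used velocity and position update with an inertia weight `w`, a later refinement of the 1995 formulation. The sphere objective, the parameter values, and the names `sphere` and `pso` are chosen here for illustration only.

```python
import random


def sphere(x):
    """Illustrative objective (assumed): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards own best + pull towards the swarm's best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, best_val = pso(sphere)
print(best, best_val)
```

Each particle is pulled towards its own best position and towards the swarm's best position. These local interactions are what produce the emergent search behaviour.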
The firefly algorithm is based on the short, rhythmic flashes that fireflies produce. Fireflies use this flashing light to attract possible prey, to attract mating partners, and to act as a warning signal. The firefly signalling system consists of the rhythmic flash, the frequency of the flashing light, and the time period of flashing. This signalling system is governed by simplified basic rules underlying firefly behaviour:
• one firefly can be attracted to another;
• this attractiveness is proportional to the level of brightness of each firefly;
• brightness is affected by the landscape (Yang 2010a, b, c).
The attraction formulation is based on the following assumptions:
(a) Each firefly attracts other fireflies that have a weaker flash.
(b) This attraction depends on the brightness of the flash, which decreases as the distance between the fireflies increases.
(c) The firefly with the brightest flash is not attracted to any other firefly, and its flight is random (Yang 2010a, b, c).
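The three assumptions above can be sketched using the attraction step commonly given for the firefly algorithm:

x_i ← x_i + β0·exp(−γ·r²)·(x_j − x_i) + α·(rand − 0.5)

The function names, the sphere objective, and the parameter values are assumptions made for this illustration, not values from this chapter. The same applies to the 0.95 factor that shrinks the random step each iteration.

```python
import math
import random


def sphere(x):
    """Illustrative objective (assumed): a lower value means a brighter firefly."""
    return sum(v * v for v in x)


def firefly_minimize(objective, dim=2, n=15, iters=50, beta0=1.0, gamma=0.1,
                     alpha=0.5, lo=-5.0, hi=5.0, seed=1):
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    cost = [objective(x) for x in xs]    # brightness is the inverse of cost
    b = min(range(n), key=lambda k: cost[k])
    best_x, best_val = xs[b][:], cost[b]

    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:    # (a) the brighter firefly j attracts i
                    r2 = sum((a - c) ** 2 for a, c in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)   # (b) fades with distance
                    xs[i] = [a + beta * (c - a) + alpha * (rng.random() - 0.5)
                             for a, c in zip(xs[i], xs[j])]
                    cost[i] = objective(xs[i])
        b = min(range(n), key=lambda k: cost[k])
        if cost[b] < best_val:
            best_x, best_val = xs[b][:], cost[b]
        # (c) the brightest firefly is attracted to no one and moves randomly
        xs[b] = [a + alpha * (rng.random() - 0.5) for a in xs[b]]
        cost[b] = objective(xs[b])
        alpha *= 0.95                    # shrink random steps so the swarm settles
    return best_x, best_val


print(firefly_minimize(sphere))
```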
The signal of this flashing light is governed by a simplified basic rule, which forms the basis of firefly behaviour. A genetic algorithm uses operators, namely mutation, crossover, and selection. The firefly algorithm instead uses the attractiveness and brightness of its flashing light. The two algorithms are similar in that both generate an initial population that is updated at each iteration via a fitness function.

In terms of firefly behaviour, the brighter fireflies attract the fireflies nearest to them. Fireflies whose brightness falls below a defined threshold are removed from the subsequent population. The brightest fireflies, whose brightness exceeds a specified threshold, constitute the next generation. This process continues until either a termination criterion (the best possible solution) is met or the maximum number of iterations is reached.

Brightness serves to attract the weaker fireflies, and this mimics the extrapolation of missing values in a dataset. The fireflies represent known values. Those with the brightest light (indicating closeness to the missing values as well as nearness to the set of data that includes the missing value) are selected as suitable replacements for the missing value entries.
The bat search algorithm is another bio-inspired search technique, grounded in the behaviour of micro-bats within their natural environment (Yang 2010a, b, c). Bats are known for a distinctive behaviour called echolocation, which helps them orient themselves and find prey within their habitat. The search strategy of a bat, whether to navigate or to capture prey, is governed by the pulse rate and loudness of its cry. The pulse rate governs the refinement of the best possible solution, while the loudness affects the acceptance of the best possible solution (Fister et al. 2014). Like a genetic algorithm, the bat search algorithm begins with random initialization and evaluation of the newly generated population, and after multiple iterations the best possible solution is output. In contrast to the wolf search algorithm, which uses attractiveness, the bat search algorithm uses its pulse rate and loudness to steer its search for a near-optimal solution. The bat search algorithm has been applied to several optimization problems to find the best possible solution.
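Below is a compact sketch of the bat search loop, following the commonly cited formulation:
• frequency f = f_min + (f_max − f_min)·β, with β random;
• velocity v ← v + (x − x*)·f;
• a local random walk around the best solution x* when a random number exceeds the pulse rate;
• acceptance of improvements governed by loudness, which then decays while the pulse rate rises.

The objective, the step scale 0.1, and all parameter values are illustrative assumptions.

```python
import math
import random


def sphere(x):
    """Illustrative objective (assumed)."""
    return sum(v * v for v in x)


def bat_minimize(objective, dim=2, n=20, iters=200, f_min=0.0, f_max=2.0,
                 loud_decay=0.9, pulse_growth=0.9, r0=0.5,
                 lo=-5.0, hi=5.0, seed=2):
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    fit = [objective(xi) for xi in x]
    loud = [1.0] * n     # loudness A_i: decides whether an improvement is accepted
    pulse = [0.0] * n    # pulse rate r_i: a higher rate means fewer local walks
    b = min(range(n), key=lambda i: fit[i])
    best, best_val = x[b][:], fit[b]

    for t in range(1, iters + 1):
        mean_loud = sum(loud) / n
        for i in range(n):
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [vd + (xd - bd) * freq for vd, xd, bd in zip(v[i], x[i], best)]
            cand = [min(hi, max(lo, xd + vd)) for xd, vd in zip(x[i], v[i])]
            if rng.random() > pulse[i]:
                # local random walk around the current best, scaled by mean loudness
                cand = [bd + 0.1 * rng.uniform(-1.0, 1.0) * mean_loud for bd in best]
            f_new = objective(cand)
            if f_new <= fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, f_new
                loud[i] *= loud_decay                                  # quieter
                pulse[i] = r0 * (1.0 - math.exp(-pulse_growth * t))    # pulses more
            if f_new <= best_val:
                best, best_val = cand[:], f_new
    return best, best_val


print(bat_minimize(sphere))
```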
air disturbances from flying prey (especially flying insects) as indicators of prey", and can move "with precision through a changing airstream". Kestrels can flap their wings and adjust their long tails in order to stay in one place (denoted as a still position) in a "changing airstream".

In perch mode, kestrels often perch on high fixed structures such as poles or trees. They change their perch position every few minutes before performing a thorough search of their local territory, which requires "less energy than a hovering hunt". This search is denoted as "local exploitation", based on individual hunting behaviour. While in perch mode, the kestrel uses its ultraviolet detection capacity to discover potential prey, such as voles, near its perch area. This behaviour suggests that, while perched, the kestrel conserves energy and focuses its ultraviolet detection capabilities on spotting slow-moving prey on the ground.

Regardless of perch or hovering mode, skill development also plays a role. Individual kestrels with better "perch and hovering skills" used over a larger search area have a better chance to swoop down faster on their prey or flee from their enemies than "individual kestrels that develop hunting skills in local territories" (Varland 1991). Consequently, a successful hunt requires combining hunting skills from both hovering and perch modes.
To characterize the kestrel more precisely, the following traits are taken as its defining behaviour:
(1) Soaring: provides a wider search space (global exploration) within the kestrel's visual coverage area
(2) Perching: enables a thorough search, or local exploitation, within a visual coverage radius
(a) Behaviour involves "frequent bobbing of head" to find the best position of attack
(b) Using a trail, the kestrel identifies potential prey and then glides to capture it
(d) “New trails are more attractive than old trails”. Thus, the trail decay, as the trail
evaporates, depends on “the half-life of the trail”.
Following the steps of Zang et al. (2010), a model that represents kestrel behaviour is expressed mathematically. The following kestrel characteristics, with their mathematical equivalents, are provided below:
• Encircling behaviour
This encircling behaviour occurs when the "kestrel randomly shifts (or changes)" its "centre of circling direction" in response to detecting the current position of prey. When the prey moves from its present position, the kestrel randomly shifts the "centre of circling direction" in order to recognize the prey's present position. It then correspondingly alters its encircling behaviour to encircle the prey. The movement of prey thus leads the kestrel to adopt the best possible position to strike. This encircling behaviour, D (Kumar 2015), is denoted in Eq. 1 as:
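$$\vec{D} = \vec{C} \cdot \vec{x}_p(t) - \vec{x}(t) \tag{1}$$
$$\vec{C} = 2 \cdot \vec{r}_1 \tag{2}$$
$$\vec{x}(t+1) = \vec{x}_p(t) - \vec{A} \cdot \vec{D} \tag{3}$$
$$\vec{A} = 2 \cdot z \cdot \vec{r}_2 - z \tag{4}$$
$$z = z_{hi} - (z_{hi} - z_{low}) \cdot \frac{itr}{Max\_itr} \tag{5}$$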
where itr is the current iteration and Max_itr is the maximum number of iterations, at which the search stops. $z_{hi}$ denotes the upper bound of 2, and $z_{low}$ denotes the lower bound of 0. Any other kestrels included in this search for prey update their positions based on the best position of the leading kestrel. In addition, the change in the kestrels' position in the airstream depends on the "frequency of bobbing", attractiveness, and "trail evaporation". These dependent variables are denoted as follows:
(a) Frequency of bobbing
The bobbing frequency is used to determine the sight distance measurement within the search space. This is denoted in Eq. 6 as follows:
$$f_{t+1}^{k} = f_{\min} + (f_{\max} - f_{\min}) \cdot \alpha \tag{6}$$
(b) Attractiveness
Attractiveness β denotes the light reflection from trails, which is expressed in Eq. (7) as follows:
$$\beta(r) = \beta_o e^{-\gamma r^2} \tag{7}$$
$$s(x_i, x_c) = \left( \sum_{k=1}^{n} |x_{i,k} - x_{c,k}|^{\lambda} \right)^{1/\lambda} \tag{8}$$
$$V \le s(x_i, x_c) \tag{9}$$
where $x_i$ denotes the current sight measurement and $x_c$ indicates all possible adjacent sight measurements near $x_i$. n is the total number of adjacent sights, λ is the order (a value of 1 or 2), and V is the visual range.
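Equations 7 and 8 can be evaluated directly, as in the minimal sketch below. The function names and sample points are hypothetical, and γ = 1 is an assumed value. The text does not state how r in Eq. 7 is obtained, so it is passed in as an explicit argument rather than derived from Eq. 8.

```python
import math


def minkowski(x_i, x_c, lam=2):
    """Eq. 8: order-lam distance (lam=1 Manhattan, lam=2 Euclidean)."""
    return sum(abs(a - b) ** lam for a, b in zip(x_i, x_c)) ** (1.0 / lam)


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Eq. 7: beta(r) = beta0 * exp(-gamma * r**2)."""
    return beta0 * math.exp(-gamma * r ** 2)


x_i, x_c = [0.0, 0.0], [3.0, 4.0]
print(minkowski(x_i, x_c, lam=1))   # 7.0
print(minkowski(x_i, x_c, lam=2))   # 5.0
print(attractiveness(0.0))          # 1.0 at zero distance
```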
(c) Trail evaporation
A trail may be defined as a way to form and maintain a line (Dorigo and Gambardella 1997). In meta-heuristic algorithms, ants use trails to track the path from their home to a food source while avoiding getting fixed on just one food source. Thus, these trails enable multiple food sources to be used within a search space (Agbehadji 2011). As ants search continuously, trails are developed with elements attached to them. These elements help ants communicate with each other about the position of food sources. Consequently, other ants constantly follow this path while depositing elements so that the trail remains fresh.

In the same manner that ants use trails, "kestrels use trails to search for food sources". Unlike those of ants, these trails are created by prey and thus indicate to kestrels the availability of food sources. The assumption with the kestrel is that the elements left by prey (urine, faeces, etc.) are similar to those left on an ant trail. In addition, when the food source indicated by a trail is exhausted, kestrels no longer pursue that path, because the trail elements begin to reduce with "time at an exponential rate". As its elements reduce, the trail becomes old. This reduction reflects the unstable nature of trail elements. If there are N "unstable substances" with an "exponential decay rate" of γ, then the reduction of N over time t is expressed as follows (Spencer 2002):
$$\frac{dN}{dt} = -\gamma N \tag{10}$$
Because these elements are unstable, there is "randomness in the decay process". Consequently, the rate of decay (γ) with respect to time (t) can be re-defined as follows:
$$\gamma_t = \gamma_o e^{-\lambda t} \tag{11}$$
where $\gamma_o$ is a "random initial value" of trail elements that is reduced at each iteration. t is the number of iterations/generations/time steps, with t ∈ [0, Max_itr], where Max_itr is the maximum number of iterations.
$$\gamma_t \rightarrow \begin{cases} \text{trail is new}, & \text{if } \gamma_t > 1 \\ 0, & \text{otherwise} \end{cases} \tag{12}$$
$$\lambda = \frac{\varphi_{\max} - \varphi_{\min}}{t_{1/2}} \tag{13}$$
where:
• λ is "the decay constant";
• $\varphi_{\max}$ is the maximum number of elements in a trail;
• $\varphi_{\min}$ is the minimum number of elements in a trail;
• $t_{1/2}$ is the "half-life period of a trail", after which the trail has become "old and unattractive" for pursuing prey.
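The trail-ageing rule in Eqs. 11-13 can be traced numerically. This sketch follows the equations exactly as stated in the text. The values φmax = 10, φmin = 2, t½ = 40, and γo = 5 are hypothetical. γo is fixed rather than drawn at random so that the output is repeatable.

```python
import math


def decay_constant(phi_max, phi_min, half_life):
    """Eq. 13 as stated in the text: lambda = (phi_max - phi_min) / t_half."""
    return (phi_max - phi_min) / half_life


def decay_rate(gamma0, lam, t):
    """Eq. 11: gamma_t = gamma0 * exp(-lambda * t)."""
    return gamma0 * math.exp(-lam * t)


def trail_is_new(gamma_t):
    """Eq. 12: the trail counts as new while gamma_t > 1."""
    return gamma_t > 1


lam = decay_constant(phi_max=10, phi_min=2, half_life=40)   # hypothetical inputs
gamma0 = 5.0   # the text draws this at random; fixed here for a repeatable output
new_steps = [t for t in range(20) if trail_is_new(decay_rate(gamma0, lam, t))]
print(lam, new_steps)   # 0.2 [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

With these values the trail stays "new" for the first nine iterations and is treated as old from iteration 9 onwards.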
Lastly, the kestrel updates its location using the following equation:
$$x_{t+1}^{k} = x_t^{k} + \beta_o e^{-\gamma r^2} (x_j - x_i) + f_t^{k} \tag{14}$$
where $x_{t+1}^{k}$ signifies the present optimal location of the kestrel and $x_t^{k}$ is its preceding location.
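A minimal sketch of the location update in Eq. 14, combined with the bobbing frequency of Eq. 6. The text does not define x_i, x_j, or r in Eq. 14 explicitly, so this sketch assumes the following:
• x_i is the kestrel's current position;
• x_j is a better-placed kestrel;
• r is the Euclidean distance between them;
• α in Eq. 6 is drawn uniformly from [0, 1].

f_t^k is added to every coordinate. The names and the values f_min = 0 and f_max = 0.1 are hypothetical.

```python
import math
import random


def bobbing_frequency(f_min, f_max, alpha):
    """Eq. 6: f = f_min + (f_max - f_min) * alpha."""
    return f_min + (f_max - f_min) * alpha


def kestrel_move(x_i, x_j, f_k, beta0=1.0, gamma=1.0):
    """Eq. 14 under the stated reading: step towards x_j, scaled by attractiveness,
    plus the bobbing-frequency term f_k on every coordinate."""
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))   # squared Euclidean distance
    beta = beta0 * math.exp(-gamma * r2)
    return [a + beta * (b - a) + f_k for a, b in zip(x_i, x_j)]


rng = random.Random(0)
f_k = bobbing_frequency(f_min=0.0, f_max=0.1, alpha=rng.random())  # alpha ~ U(0, 1), assumed
print(kestrel_move([0.0, 0.0], [1.0, 0.0], f_k))
```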
• Fitness function
A fitness function is applied to evaluate how well an algorithm performs against some criterion, such as the quality of the estimate for a missing value. In the case of missing value estimation, performance is measured in terms of "minimizing the deviation of data points from the estimated value". A number of performance measures may be used, such as mean absolute error (MAE), root mean square error (RMSE), and mean square error (MSE).
In this chapter, the fitness function for the kestrel search algorithm uses the mean absolute error (MAE) as its performance measure to determine the quality of the estimated missing values. MAE was selected for the fitness function because it takes the absolute value of each deviation, so negative deviations cannot offset positive ones. This allows the modelled behaviour of the kestrel to fine-tune its estimates more precisely without concern for negative values. The MAE is expressed in Eq. (15) as follows:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |o_i - x_i| \tag{15}$$
where $x_i$ indicates the estimated value at the ith position in the dataset, $o_i$ denotes the observed data point at the ith position "in the sampled dataset, and n is the number of data points in the sampled dataset".
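Equation 15 translates directly into code. The example values below are hypothetical. The second call shows why absolute deviations suit a fitness function: errors of +1 and −1 would average to zero if signed, but each contributes fully to the MAE.

```python
def mae(observed, estimated):
    """Eq. 15: mean absolute error between observed o_i and estimated x_i."""
    if not observed or len(observed) != len(estimated):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - x) for o, x in zip(observed, estimated)) / len(observed)


print(mae([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))   # 0.5
# Signed errors +1 and -1 would average to 0; absolute errors do not cancel.
print(mae([1.0, 1.0], [0.0, 2.0]))             # 1.0
```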
• Velocity
The velocity of the kestrel as it moves from its current optimal location in a "changing airstream" is expressed as:
$$v_{t+1}^{k} = v_t^{k} + x_t^{k} \tag{16}$$
Any variation in velocity is governed by the inertia weight ω (also denoted as the convergence parameter). This "inertia weight has a linearly" diminishing value. Thus, velocity is denoted in Eq. 17 as follows:
$$v_{t+1}^{k} = \omega v_t^{k} + x_t^{k} \tag{17}$$
where:
• ω is the "convergence parameter";
• $v_t^{k}$ is the "initial velocity";
• $x_t^{k}$ is the best location of the kestrel;
• $v_{t+1}^{k}$ is the present best velocity of the kestrel.

Kestrels explore the search space to discover the optimal solution. In doing so, they constantly update their velocity, random encircling, and location towards the best estimated solution.
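The encircling update (Eqs. 1-5) and the inertia-weighted velocity (Eq. 17) can be sketched as below. The following are assumptions, not statements from the text:
• r1 and r2 are drawn uniformly from [0, 1], and the update is applied per coordinate, as in the grey wolf optimizer described by Kumar (2015);
• z_hi = 2 and z_low = 0, following the bounds given after Eq. 5;
• the inertia bounds `w_max=0.9` and `w_min=0.4` are hypothetical, since the chapter only says that ω diminishes linearly.

```python
import random


def z_schedule(itr, max_itr, z_hi=2.0, z_low=0.0):
    """Eq. 5: z falls linearly from z_hi to z_low over the run."""
    return z_hi - (z_hi - z_low) * itr / max_itr


def encircle(x, x_prey, z, rng):
    """Eqs. 1-4 applied per coordinate; r1, r2 ~ U(0, 1) is an assumption."""
    new = []
    for xd, pd in zip(x, x_prey):
        c = 2 * rng.random()            # Eq. 2
        a = 2 * z * rng.random() - z    # Eq. 4, so a lies in [-z, z]
        d = c * pd - xd                 # Eq. 1
        new.append(pd - a * d)          # Eq. 3
    return new


def inertia(itr, max_itr, w_max=0.9, w_min=0.4):
    """Linearly diminishing inertia weight; the bounds are hypothetical."""
    return w_max - (w_max - w_min) * itr / max_itr


def update_velocity(v, x_best, omega):
    """Eq. 17: v_{t+1} = omega * v_t + x_t, with x_t the kestrel's best location."""
    return [omega * vd + xd for vd, xd in zip(v, x_best)]


rng = random.Random(0)
print(z_schedule(250, 500))   # 1.0, halfway between z_hi and z_low
print(encircle([1.0, 2.0], [3.0, 4.0], z_schedule(250, 500), rng))
```

As z shrinks towards zero, the encircling step in Eq. 3 collapses onto the prey position, which is how the search moves from exploration to exploitation.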
Zang et al. (2010) set out steps for developing a new bio-inspired algorithm. Under those steps, once certain aspects of the selected animal's behaviour have been mathematically modelled, a pseudo-code or algorithm that incorporates parts of this model is developed. It serves both to simulate the animal's behaviour and to discover the best possible solution to a given problem.
The kestrel algorithm is given in Table 1.
Once the newly developed bio-inspired algorithm has been determined, the next step, according to Zang et al. (2010), is to test it experimentally. The kestrel's encircling behaviour and its adaptability to different hunting contexts [either high above as in hovering or near the ground as in perching] (Agbehadji et al. 2016a) make it applicable to a variety of steps and phases of big data mining. For testing, however, the step of estimating missing values within the data cleansing phase was chosen.
Following the prescription of Zang et al. (2010) for developing a bio-inspired algorithm, the parameters of the algorithm are then set. The initial parameters for the kestrel search algorithm (KSA) were set as βo = 1 with visual range = 1. For Eq. 5, the lower and upper bounds were set to $z_{low}$ = 0.2 and $z_{hi}$ = 0.9, respectively. A maximum of 500 iterations/generations was set to give the algorithm a better opportunity to further refine the best estimated values in each iteration.
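As a worked check of Eq. 5 with these settings, z falls linearly from 0.9 at the first iteration to 0.2 at the last. The intermediate value below is simple arithmetic on Eq. 5. The range stated for A additionally assumes that $r_2$ in Eq. 4 lies in [0, 1], which the text does not state.

$$\begin{aligned}
z &= 0.9 - (0.9 - 0.2)\,\frac{itr}{500} = 0.9 - 0.0014\,itr \\
z(0) &= 0.9, \qquad z(250) = 0.9 - 0.35 = 0.55, \qquad z(500) = 0.2 \\
A &= z\,(2 r_2 - 1) \in [-z,\ z]
\end{aligned}$$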
Further to the rule of Zang et al. (2010), the algorithm is tested against appropriate data. The test data was a representative dataset matrix of 46 rows and 9 columns, with multiple values missing in each row. This matrix was designed to allow thorough testing of how well the KSA algorithm estimates missing values. This testing produced Fig. 1, for which a "sample set of data (46 by 9 matrix) with multiple missing values in the row matrix was used in order to provide a thorough test of missing values in each row of a matrix". The test results are shown in Fig. 2.
Fig. 1 (image not reproduced; x-axis: Iterations)
Figure 2 shows a single graph of the fitness function value of the KSA algorithm over "500 iterations". As can be seen in this graph, the "curve ascends and descends steeply during the beginning iterations and then gradually converges at the best possible solution at the end of 500 iterations/generations". The steps within the curve represent the search for a best solution within a particular region of the search space. This search uses a random method until a solution is found, after which another region is explored. The curve characteristics indicate that, in the initial iterations, the KSA algorithm "quickly maximizes the search space and then gradually minimizes" until it converges to the best possible optimal value.
Fig. 2 (image not reproduced): fitness function value of the KSA algorithm over 500 iterations (x-axis: Iterations)
common in the domain of real-time stock trading with missing data values. In real-time trading, each stock value is recorded together with a timestamp. To recover the correct timestamp where timestamps are incorrect or missing, every data entry point is checked against the internal system clock to estimate the likely missing timestamp (Narang 2013). However, this timestamp extrapolation method has the disadvantages of high computational cost and slower system response time for huge volumes of data.
There are other ways to handle missing data. Conventional approaches include ignoring missing attributes or filling in missing values with a global constant (Quinlan 1989). Both carry a real risk of degrading the quality of the patterns discovered from these values. Another approach, proposed by Grzymala-Busse et al. (2005), is the closest fit method, in which the same attributes from similar cases are used to extrapolate the missing attributes. Other approaches to extrapolation include maximum likelihood, genetic programming, Expectation-Maximization (EM), and the "machine learning approach (such as autoencoder neural network)" (Lakshminarayan et al. 1999).
• Closest fit Method
This method determines the closest value of the missing data attribute through the closest fit algorithm, based on the same attributes from similar cases. Using the closest fit algorithm, the distance between cases (such as cases x and y) is based on the Manhattan distance formula given below:
$$\mathrm{distance}(x, y) = \sum_{i=1}^{n} \mathrm{distance}(x_i, y_i)$$
where:
$$\mathrm{distance}(x, y) = \begin{cases} 0, & \text{if } x = y \\ 1, & \text{if } x \text{ and } y \text{ are symbolic and } x \ne y, \text{ or } x = ? \text{ or } y = ? \\ \dfrac{|x - y|}{r}, & \text{if } x \text{ and } y \text{ are numbers and } x \ne y \end{cases}$$
where r denotes the difference between the maximum and minimum of the known values of the numeric attribute with missing values (Grzymala-Busse et al. 2005).
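The closest fit distance can be sketched as below. `None` stands for a missing value ("?"), and r is computed per numeric attribute from its known values. Two simplifying assumptions are not taken from Grzymala-Busse et al. (2005): only complete cases are considered as donors, and r defaults to 1.0 where it is undefined. The function names and the sample cases are hypothetical.

```python
def attribute_ranges(rows):
    """r per attribute: max - min of the known numeric values (1.0 if undefined)."""
    ranges = []
    for column in zip(*rows):
        nums = [v for v in column if isinstance(v, (int, float))]
        spread = max(nums) - min(nums) if nums else 0.0
        ranges.append(spread if spread > 0 else 1.0)
    return ranges


def attr_distance(x, y, r):
    if x is None or y is None:      # "?" (missing) on either side
        return 1.0
    if x == y:
        return 0.0
    if isinstance(x, str) or isinstance(y, str):
        return 1.0                  # symbolic values that differ
    return abs(x - y) / r           # numeric values that differ


def case_distance(a, b, ranges):
    return sum(attr_distance(x, y, r) for x, y, r in zip(a, b, ranges))


def closest_fit_impute(target, cases):
    """Fill each None in target from the nearest complete case."""
    ranges = attribute_ranges(cases + [target])
    complete = [c for c in cases if None not in c]
    nearest = min(complete, key=lambda c: case_distance(target, c, ranges))
    return [n if t is None else t for t, n in zip(target, nearest)]


cases = [[1.0, "red", 10.0], [2.0, "blue", 20.0], [8.0, "red", 80.0]]
print(closest_fit_impute([1.5, "red", None], cases))   # [1.5, 'red', 10.0]
```

The first case is nearest: it matches the symbolic attribute and differs by only 0.5/7 on the numeric one. Ties are broken by list order, because Python's `min` returns the first minimum.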
• Maximum likelihood:
$$L(\theta \mid Y_{observed}) = \int f(Y_{observed}, Y_{missing} \mid \theta) \, dY_{missing}$$
where $Y_{observed}$ denotes the observed data, $Y_{missing}$ is the missing data, and θ is the parameter of interest to be predicted (Little and Rubin 1987). Subsequently, the likelihood function is expressed by:
$$L(\theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$$
where $f(y \mid \theta)$ is the probability density function of the observations y, and θ is the set of parameters to be predicted given n independent observations (Allison 2012). The value of θ that maximizes the likelihood function must first be determined before a maximum likelihood prediction can be calculated.
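Below is a small numeric illustration of maximizing L(θ). It rests on assumptions not made in the text:
• the observations are independent and normally distributed with known σ = 1;
• only the mean μ is estimated;
• the values are missing at random.

Under these assumptions, integrating a missing y_i over its density contributes a factor of 1, so the likelihood reduces to the product over the observed values. The data are hypothetical. The grid search stands in for the closed-form answer, which is the mean of the observed values.

```python
import math


def normal_log_likelihood(data, mu, sigma):
    """log L(theta) = sum_i log f(y_i | mu, sigma) for independent normal data."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - mu) ** 2 / (2 * sigma ** 2) for y in data)


sample = [4.2, None, 5.1, 3.9, None, 4.8]          # hypothetical; None = missing
observed = [y for y in sample if y is not None]    # missing y_i integrate out to 1
grid = [i / 100 for i in range(300, 601)]          # candidate mu values 3.00 .. 6.00
mu_hat = max(grid, key=lambda m: normal_log_likelihood(observed, m, sigma=1.0))
print(mu_hat)   # 4.5, the mean of the observed values
```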
Suppose that there are n independent observations on k variables ($y_1, y_2, \ldots, y_k$) "with no missing data"; then the likelihood function "is denoted as":
$$L = \prod_{i=1}^{n} f(y_{i1}, y_{i2}, \ldots, y_{ik}; \theta)$$
However, suppose that data is missing for individual observation i for $y_1$ and $y_2$. Then, the likelihood of the individual missing data is dependent on the likelihood
As the missing variables are continuous, the joint likelihood is the integral over all potential values of the two variables that contain the missing values in the dataset. Thus, the joint likelihood is expressed as:
5 Conclusion
The chapter introduced the concept of big data with its characteristics, namely velocity, volume, value, veracity, and variety. It introduced the phases of big data management, which include data cleansing and mining. Techniques used during some of these phases were presented. A new category of algorithm, bio-inspired algorithms,
References
Abdella, M., & Marwala, T. (2006). The use of genetic algorithms and neural networks to
approximate missing data in database. Computing and Informatics, 24, 1001–1013.
Agbehadji, I. E. (2011). Solution to the travel salesman problem, using omicron genetic algorithm.
Case study: Tour of national health insurance schemes in the Brong Ahafo region of Ghana. M.
Sc. (Industrial Mathematics) Thesis. Kwame Nkrumah University of Science and Technology.
Available https://doi.org/10.13140/rg.2.1.2322.7281.
Agbehadji, I. E., Fong, S., & Millham, R. C. (2016a). Wolf Search Algorithm for Numeric
Association Rule Mining.
Agbehadji, I. E., Millham, R., & Fong, S. (2016b). Wolf search algorithm for numeric association
rule mining. In 2016 IEEE International Conference on Cloud Computing and Big Data Analysis
(ICCCBDA 2016). Chengdu, China. https://doi.org/10.1109/ICCCBDA.2016.7529549.
Allison, P. D. (2012). Handling missing data by maximum likelihood. Statistical horizons. PA, USA:
Haverford.
Banupriya, S., & Vijayadeepa, V. (2015). Data flow of motivated data using heterogeneous
method for complexity reduction. International Journal of Innovative Research in Computer
and Communication Engineering, 2(9).
Cha, S. H., & Tappert, C. C. (2009). A genetic algorithm for constructing compact binary decision
trees. Journal of Pattern Recognition Research, 4(1), 1–13.
Dorigo, M., & Gambardella, L. M. (1997). Ant colony system: A cooperative learning approach to
the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1), 53–66.
Dorigo, M., Birattari, M., & Stutzle, T. (2006). Ant colony optimization. IEEE Computational
Intelligence Magazine, 1(4), 28–39.
Fister, I. J., Fister, D., Fong, S., & Yang, X.-S. (2014). Towards the self-adaptation of the bat
algorithm. In Proceedings of the IASTED International Conference Artificial Intelligence and
Applications (AIA 2014), February 17–19, 2014 Innsbruck, Austria.
Fong, S. J. (2016). Meta-Zoo heuristic algorithms. Islamabad, Pakistan: INTECH.
Grzymala-Busse, J. W., Goodwin, L. K., & Zheng, X. (2005). Handling missing attribute values in preterm birth data sets.
Honkavaara, J., Koivula, M., Korpimäki, E., Siitari, H., & Viitala, J. (2002). Ultraviolet vision and
foraging in terrestrial vertebrates. Oikos, 98(3), 505–511.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of IEEE
International Conference on Neural Networks (pp. 1942–1948), Piscataway, NJ.
Krause, J., Cordeiro, J., Parpinelli, R. S., & Lopes, H. S. (2013). A survey of swarm algorithms
applied to discrete optimization problems. Swarm intelligence and bio-inspired computation:
Theory and applications (pp. 169–191). Elsevier Science & Technology Books.
Kumar, R. (2015). Grey wolf optimizer (GWO). Available: https://drrajeshkumar.files.wordpress.com/2015/05/wolf-algorithm.pdf. Accessed 3 May 2017.
Lakshminarayan, K., Harp, S. A., & Samad, T. (1999). Imputation of missing data in industrial
databases. Applied Intelligence, 11, 259–275.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Longbottom, C., & Bamforth, R. (2013). Optimising the data warehouse. Dealing with large volumes
of mixed data to give better business insights. Quocirca.
Narang, R. K. (2013). Inside the black box: A simple guide to quantitative and high frequency trading (2nd ed.). USA: Wiley. Available: https://leseprobe.buch.de/imagesadb/78/04/78041046-b4fd-4cae-b31d-3cb2a2e67301.pdf. Accessed 20 May 2018.
Quinlan, J. R. (1989). Unknown attribute values in induction. In Proceedings of the Sixth
International Workshop on Machine Learning (pp. 164–168). Ithaca, N.Y.: Morgan Kaufmann.
Selvaraj, C., Kumar, R. S., & Karnan, M. (2014). A survey on application of bio-inspired algorithms.
International Journal of Computer Science and Information Technologies, 5(1), 366–370.
Shrubb, M. (1982). The hunting behaviour of some farmland Kestrels. Bird Study, 29(2), 121–128.
Spencer, R. L. (2002). Introduction to MATLAB. Available: https://www.physics.byu.edu/courses/computational/phys330/matlab.pdf. Accessed 10 Sept 2017.
Tang, R., Fong, S., Yang, X.-S., & Deb, S. (2012). Wolf search algorithm with ephemeral memory. In 2012 Seventh International Conference on Digital Information Management (ICDIM) (pp. 165–172), 22–24 August 2012, Macau. https://doi.org/10.1109/icdim.2012.6360147.
Varland, D. E. (1991). Behavior and ecology of post-fledging American Kestrels.
Vlachos, C., Bakaloudis, D., Chatzinikos, E., Papadopoulos, T., & Tsalagas, D. (2003). Aerial hunting behaviour of the lesser kestrel Falco naumanni during the breeding season in Thessaly (Greece). Acta Ornithologica, 38(2), 129–134. Available: http://www.bioone.org/doi/pdf/10.3161/068.038.0210. Accessed 10 Sept 2016.
Yang, X.-S. (2010a). Firefly algorithms for multimodal optimization.
Yang, X.-S. (2010b). A new metaheuristic bat-inspired algorithm. In Nature inspired cooperative strategies for optimization (NICSO 2010) (pp. 65–74).
Yang, X.-S. (2010c). Firefly algorithm, stochastic test functions and design optimisation. International Journal of Bio-Inspired Computation, 2(2), 78–84.
Yang, X.-S., & Deb, S. (2009, December). Cuckoo search via Lévy flights. In 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC 2009) (pp. 210–214). IEEE.
Zang, H., Zhang, S., & Hapeshi, K. (2010). A review of nature-inspired algorithms. Journal of
Bionic Engineering, 7, S232–S237.
Israel Edem Agbehadji graduated from the Catholic University College of Ghana with a B.Sc. in Computer Science in 2007. He received an M.Sc. in Industrial Mathematics from the Kwame Nkrumah University of Science and Technology in 2011 and a Ph.D. in Information Technology from Durban University of Technology (DUT), South Africa, in 2019. He is a member of the ICT Society of DUT research group in the Faculty of Accounting and Informatics, and an IEEE member. He has lectured undergraduate courses at both DUT, South Africa, and a private university in Ghana, and has supervised several undergraduate research projects. Prior to his academic career, he held various managerial positions, including management information systems manager for the National Health Insurance Scheme and postgraduate degree programme manager at a private university in Ghana. Currently, he works as a Postdoctoral Research Fellow at DUT, South Africa, on a joint collaborative research project between South Africa and South Korea. His research interests include big data analytics, the Internet of Things (IoT), fog computing and optimization algorithms.
Hongji Yang received his Ph.D. in Software Engineering from Durham University, England, and his M.Sc. and B.Sc. in Computer Science from Jilin University, China. With over 400 publications, he is a full professor at the University of Leicester, England. Prof. Yang has been an IEEE Computer Society Golden Core Member since 2010, an EPSRC peer review college member since 2003, and Editor in Chief of the International Journal of Creative Computing.
Chapter 2
Parameter Tuning onto Recurrent Neural Network and Long Short-Term Memory (RNN-LSTM) Network for Feature Selection in Classification of High-Dimensional Bioinformatics Datasets
1 Introduction
This introduction describes the characteristics of big data and reviews methods and search strategies for feature selection. In the current era of big data, reducing the volume of a dataset may be “achieved by selecting relevant features for classification.” Besides volume, big data is also characterized by velocity, value, veracity and variety. The characteristic of velocity relates to “how fast incoming data need to be processed and how quickly the receiver of information needs the results from the processing system” (Longbottom and Bamforth 2013). Volume refers to the amount of data to be processed, and value refers to what a user will gain from data analysis. Other characteristics of big data include “variety and veracity.” The characteristic of variety looks at “different structures of data such as text and images, while the characteristic of veracity focuses on authenticity of the data source.” While these characteristics (i.e., volume, value, variety and veracity) are significant in any big data analytics, it is important to reduce the volume of a dataset and produce value (relevant and useful features) with reduced computational cost given
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
S. J. Fong and R. C. Millham (eds.), Bio-inspired Algorithms for Data Streaming and Visualization, Big Data Management, and Fog Computing, Springer Tracts in Nature-Inspired Computing,
https://doi.org/10.1007/978-981-15-6695-0_2