Data Analytics
Concepts, Techniques, and Applications
Edited by
Mohiuddin Ahmed and Al-Sakib Khan Pathan
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all material or the consequences of their use. The authors
and publishers have attempted to trace the copyright holders of all material reproduced in this
publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let us know so we
may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Contents
Acknowledgments...............................................................................................ix
Preface.................................................................................................................xi
List of Contributors........................................................................................... xv
Index.................................................................................................................417
Acknowledgments
I am grateful to the Almighty Allah for blessing me with the opportunity to work
on this book. It is my first time as a book editor and I express my sincere gratitude
to Al-Sakib Khan Pathan for guiding me throughout the process. The book editing
journey enhanced my patience, communication, and tenacity. I am thankful to
all the contributors, critics, and the publishing team. Last but not least, my very
best wishes to my family members, whose support and encouragement contributed
significantly to the completion of this book.
Mohiuddin Ahmed
Centre for Cyber Security and Games
Canberra Institute of Technology, Australia
Preface
Introduction
Big data is a term for datasets that are so large or complex that traditional data
processing applications are inadequate for them. The significance of big data has
been recognized very recently and there are various opinions on its definition.
In layman’s terms, big data reflects the datasets that cannot be perceived, acquired,
managed, and processed by the traditional information technology (IT) and
software/hardware tools in an efficient manner. Communities such as scientific
and technological enterprises, research scholars, data analysts, and technical
practitioners define big data differently. Because large amounts of data arrive
at high speed, a new set of efficient data analysis techniques is required.
In addition to this, the term data science has gained a lot of attention from both the
academic research community and industry. Data analytics has therefore become
an essential component for any organization. For instance, in health care, financial
trading, Internet of Things (IoT) smart cities, or cyber-physical systems, one can
readily see the role of data analytics. However, with these diverse applica-
tion domains, new research challenges are also arising. In this context, this book on
data analytics will provide a broader picture of the concepts, techniques, applica-
tions, and open research directions in this area. In addition, the book is expected to
serve as a single source of reference for acquiring knowledge on the emerging trends
and applications of data analytics.
Section I contains six chapters that cover the fundamental concepts of data
analytics. These chapters reflect the important knowledge areas, such as machine
learning, regression, clustering, information retrieval, and graph analysis. Section II
has six chapters that cover the major techniques of data analytics, such as the transition
from regular databases to big data, big graph analysis tools and techniques, and
game theoretical approaches for big data analysis. The rest of the chapters in this
section cover topics that lead to newer research domains, namely project management,
Industry 4.0, and dark data. These topics are considered emerging trends in
data analytics. Section III is dedicated to the applications of data analytics in differ-
ent domains, such as education, traffic offenses, sports data visualization, and, last
but not least, two interesting chapters on cybersecurity for big data analytics
with a specific focus on the health care sector.
Tamanna Motahar is a lecturer at the electrical and computer engineering depart-
ment of North South University. She completed her master’s degree in electri-
cal engineering from the University of Alberta, Canada. She graduated summa
cum laude with a BSc in computer engineering from the American International
University, Bangladesh. She completed her high-school education at Mymensingh Girls'
Cadet College, Bangladesh. Her interdisciplinary research is based on optical
electromagnetics and nano-optics. Her current research areas include the Internet
of Things, human–computer interaction, and big data. She teaches Junior Design
classes for computer science and engineering students at North South University
and also mentors several groups for their technical projects. Her email address
is [email protected].
Asma Noor graduated in 2017 from Charles Sturt University with a master’s degree
in information technology, specializing in networking and system analysis. In 2011,
she obtained her master’s degree in international relations from the University of
Balochistan, Pakistan. In 2007, she obtained a bachelor’s degree in business adminis-
tration from Iqra University, Pakistan, specializing in marketing. Her hobbies include
reading and writing. Her interests in the field of information technology include cloud
computing, information security, Internet of Things, and big data analytics.
Ahsanur Rahman is an assistant professor in the electrical and computer engineering
department at North South University. He completed his PhD in computer science
at Virginia Tech, Blacksburg, VA. Before that, he worked as a lecturer
in the computer science and engineering department at
the American International University Bangladesh. He obtained his bachelor’s
degree in computer science and engineering from Bangladesh University of
Engineering and Technology. His research interest lies in the areas of compu-
tational systems biology, graph algorithms, hypergraphs, and big data. He is
mentoring several groups at NSU for their research projects. His email address
is [email protected].
An Introduction to Machine Learning
Mark A. Norrie
icare
Contents
1.1 A Definition of Machine Learning................................................................3
1.1.1 Supervised or Unsupervised?..............................................................4
1.2 Artificial Intelligence.....................................................................................5
1.2.1 The First AI Winter...........................................................................8
1.3 ML and Statistics..........................................................................................9
1.3.1 Rediscovery of ML..........................................................................13
1.4 Critical Events: A Timeline.........................................................................14
1.5 Types of ML................................................................................................19
1.5.1 Supervised Learning........................................................................19
1.5.2 Unsupervised Learning....................................................................20
1.5.3 Semisupervised Learning.................................................................21
1.5.4 Reinforcement Learning..................................................................21
1.6 Summary....................................................................................................22
1.7 Glossary......................................................................................................23
References............................................................................................................29
1.2 Artificial Intelligence
The two main disciplines involved predate computers by quite a margin. They are
artificial intelligence (AI) and classical statistics. Interest in AI extends historically
all the way back to classical antiquity. Once the Greeks convincingly demonstrated
that thought and reason are basic physical processes, it was hoped that machines
could also be built to demonstrate thought, reason, and intelligence.
One of the key milestones in the development of AI came from the well-known pioneer
of computing, Alan Turing. In 1950, Turing suggested that it should be
possible to create a machine that could learn and become artificially intelligent.
A couple of years after this, people began to write programs to play games
with humans, such as checkers on early IBM computers (rather large computers
at that, as transistorized machines had not yet arrived, and integrated circuits (ICs)
and complementary metal-oxide-semiconductor (CMOS) technology were so far in
the future that nobody had given the matter much thought).
Leading thinkers, such as John von Neumann, long advocated thinking of (and
designing) computer and information systems architectures based on an under-
standing of the anatomy and physiology of the brain, particularly its massively
parallel nature. In the Silliman lectures of 1956, which the dying von Neumann
was too ill to give, this approach is set out in detail. The lectures were published
posthumously as The Computer and the Brain (1958) by Yale University Press, New
Haven and London [1].
In 1957, Frank Rosenblatt, a research psychologist, invented the
Perceptron, a single-layer neural network, while working at the Cornell Aeronautical
Laboratory. In Figure 1.2, we can see Dr. Rosenblatt’s original Perceptron design.
LEARNS BY DOING
In the first fifty trials, the machine made no distinction between them.
It then started registering a Q for the left squares and O for the right squares.
Dr. Rosenblatt said he could explain why the machine learned only in highly
technical terms. But he said the computer had undergone a self-induced
change in the wiring diagram.
The first Perceptron will have about 1,000 electronic association cells
receiving electrical impulses from an eye-like scanning device with 400
photo-cells. The human brain has 10,000,000,000 responsive cells, including
100,000,000 connections with the eyes.
Interestingly, the IBM 704 computer they were using was state-of-the-art and would
be worth over US$17 million in today's money. Five months later, in the December
6th issue of The New Yorker, an article appeared in which Dr. Rosenblatt was also
interviewed [3]. The article was imaginatively entitled Rival. He was quoted saying
similar things to what had appeared in the earlier New York Times article. This was
the first true hype around AI, and when it finally came off the rails, it became the
cause of the first AI Winter.
There is absolutely no doubt that the Perceptron was an astonishing intellectual
and engineering achievement. For the first time ever, a non-human agent (and non-
biological for that matter) could actually demonstrate learning by trial and error.
Rosenblatt published a paper (1958) [4] and a detailed report, later published as
a book, on the Perceptron, and he used this material in his lecture classes for a
number of years (1961, 1962) [5,6].
these days), and they also pulled the plug on major research funding. In the 1960s,
enormous amounts of money had been given to various AI researchers (Minsky,
Rosenblatt, etc.) pretty much to spend as they liked.
In 1969, the Mansfield Amendment was passed as part of the Defense
Authorization Act of 1970. This required DARPA to restrict its support for basic
research to only those things that were directly related to military functions and
operational requirements. All major funding for basic AI research was withdrawn.
While it is true that there had been a number of high-profile failures, in hindsight
it can be seen that the optimism was partly correct. Today almost all advanced
technologies have elements of AI within them, as Ray Kurzweil remarked in his
2006 book The Singularity Is Near: "many thousands of AI applications are deeply
embedded in the infrastructure of every industry." [9]
To Kurzweil, the AI winter has ended, but there have been two highly significant
ones, the first from 1974 to 1980 and the second from 1987 to 1993. There
have also been a host of minor incidents or episodes, e.g., the failure of machine
translation in 1966 and the abandonment of connectionism in 1970.
It is still the case, apparently, that venture capitalists as well as government
officials get a bit nervous at the suggestion of AI, and so much of the AI research
community undertook a rebranding exercise. They also refocused research on smaller,
more tractable problems. This recalls the philosophy behind the UNIX operating
system that Dennis Ritchie and his colleagues espoused: concentrate on doing one
thing at a time and doing it well.
écarter le moins possible. This sentence translates to English as: "This shows that the
method of least squares reveals the center around which the measured values from
the experiment are distributed so as to deviate from it as little as possible."
Gauss also published on this method in 1809 [12]. A controversy erupted (similar
to the one between Newton and Leibniz over priority in the invention of
calculus). It turns out that although it is highly likely that Gauss knew about
this method before Legendre did, he didn’t publish on it or talk about it widely,
so Legendre can be rightly regarded as the inventor of the least squares method
(Stigler, 1981) [13].
Stigler also states: “When Gauss did publish on least squares, he went far beyond
Legendre in both conceptual and technical development, linking the method to
probability and providing algorithms for the computation of estimates.”
As further evidence of this, Gauss’s 1821 paper included a version of the Gauss–
Markov theorem [14].
The next person in the story of least squares regression is Francis Galton. In
1886, he published Regression towards Mediocrity in Hereditary Stature, which was
the first use of the term regression [15]. He defined the difference between the height
of a child and the mean height as a deviate.
Galton defined the law of regression as: “The height deviate of the offspring
is on average two thirds of the height deviate of its mid-parentage” where mid-
parentage refers to the average height of the two parents. Mediocrity is the old term
for average. Interestingly, it has taken on a vernacular meaning today as substan-
dard; however, the correct meaning is average. For Galton, regression was only ever
meaningful in the biological context.
His work was further extended by Udny Yule (1897) [16] and Karl Pearson
(1903) [17]. Yule’s 1897 paper on the theory of correlation introduced the term
variables for numerical quantities, since he said their magnitude varies. He also used
the term correlation and wanted this to be used instead of causal relation, presaging
the long debate that correlation does not imply causation. In some cases, this debate
still rages today.
Yule and Pearson’s work specified that linear regression required the variables to
be distributed in a Gaussian (or normal) manner.
They assumed that the joint distribution of both the independent and depen-
dent variables was Gaussian; however, this assumption was weakened by Fisher, in
1922, in his paper: The goodness of fit of regression formulae, and the distribution of
regression coefficients [19].
Fisher assumed that the dependent variable was Gaussian, but that the joint
distribution needn’t be. This harkens back to the thoughts that Gauss was express-
ing a century earlier.
Another major advance was made by John Nelder and Robert Wedderburn
in 1972, when they published the seminal paper "Generalized Linear Models" in
the Journal of the Royal Statistical Society [20]. They developed a class of generalized
linear models, which included the normal, Poisson, binomial, and gamma
distributions.
The link function relates a linear predictor to a response that need not be normally
distributed, so a model can combine linear and nonlinear components. A
maximum likelihood procedure was demonstrated for fitting the models. This is the way
we do logistic regression nowadays, which is arguably the single most important statistical
procedure employed by actuaries and in modern industry (particularly the finance
industry).
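As a concrete illustration, the following is a minimal Python sketch of logistic regression, a generalized linear model with the logit link, fitted by maximum likelihood via iteratively reweighted least squares (Newton's method). The synthetic dataset, coefficient values, and iteration settings are assumptions made purely for the example.

```python
import numpy as np

# A minimal sketch of logistic regression fitted by maximum likelihood
# (iteratively reweighted least squares), assuming a small synthetic dataset.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept + 2 predictors
true_beta = np.array([-0.5, 1.0, 2.0])                          # illustrative coefficients
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))           # Bernoulli responses

beta = np.zeros(X.shape[1])
for _ in range(25):                       # Newton-Raphson / IRLS iterations
    p = 1 / (1 + np.exp(-X @ beta))       # mean via the logit link
    W = p * (1 - p)                       # binomial variance function
    grad = X.T @ (y - p)                  # score vector
    hess = X.T @ (X * W[:, None])         # observed information
    step = np.linalg.solve(hess, grad)
    beta += step
    if np.max(np.abs(step)) < 1e-8:
        break

print("estimated coefficients:", beta)
```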
The multiple linear regression model is given by

yi = β0 + β1x1i + β2x2i + … + βKxKi + εi (1.1)

where
yi is the dependent variable for the ith observation
β0 is the Y intercept
β1, β2, …, βK are the population partial regression coefficients
x1i, x2i, …, xKi are the observed values of the independent variables x1, x2, …, xK
εi is the random error term
and k = 1, 2, 3, …, K indexes the explanatory variables
The model written in vector form is

y = Xθ + ε (1.2)

where

E(ε) = 0 (1.3)

This embodies the assumption that the εj are all uncorrelated, i.e., they have zero
means and the same variance σ².
The means are then given by

μj = Σ (i = 1 to K) xij θi,  j = 1, 2, …, n (1.4)

and the quantity to be minimized is

Σj ( yj − Σi xij θi )² (1.5)

which is the sum of squares (Eq. 1.5) minimized in the least squares method [21].
Figure 1.4 graphically demonstrates a line of best fit, i.e., the line which mini-
mizes the squared distance between all the points and the line. This is the basis of
linear regression. The slope of the line corresponds to the regression coefficients and
the intercept is the first term in our model above. Note that this corresponds to the
well-known equation of a straight line: y = ax + b, where the slope is given by a and
b is the intercept.
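The calculation behind such a line of best fit can be sketched in a few lines of Python. The synthetic data and the use of NumPy's least squares solver are illustrative choices for this sketch; the design matrix simply stacks the x values next to a column of ones for the intercept, mirroring the minimization in Eq. (1.5).

```python
import numpy as np

# A minimal sketch of fitting the line of best fit y = a*x + b by least squares.
# The data values are illustrative assumptions.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=2.0, size=x.size)   # noisy observations

# Design matrix: one column for the slope, one column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Solve min ||y - X theta||^2; theta = (a, b) minimizes the sum of squares.
theta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
a, b = theta
print(f"slope a = {a:.3f}, intercept b = {b:.3f}")
```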
Regression methods have remained a continuing part of statistics and are still being
used and extended today. Modern extensions include ridge regression and lasso
regression.
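A brief sketch of how these extensions behave in practice follows, assuming scikit-learn is available; the data, penalty strengths (alpha), and the sparse coefficient pattern are illustrative assumptions. Ridge shrinks coefficients toward zero, while lasso can set some of them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# A sketch comparing ordinary least squares with its penalized extensions.
# Data, alpha values, and coefficient pattern are illustrative assumptions.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
coef = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])      # two informative predictors
y = X @ coef + rng.normal(scale=0.5, size=100)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),           # shrinks coefficients
                    ("Lasso", Lasso(alpha=0.1))]:          # can zero out coefficients
    model.fit(X, y)
    print(name, np.round(model.coef_, 2))
```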
1.3.1 Rediscovery of ML
In a sense, to take up the slack, expert systems came into vogue during the first
AI Winter. By the end of it, they were waning in popularity and are hardly used
these days.
After about a 15-year hiatus, ML was revived with the invention of backpropa-
gation. This took the Perceptron model to a new level. This helped break the logjam
where it was assumed that neural nets would never amount to anything. The key
paper in 1986 was by David Rumelhart, Geoff Hinton, and Ronald J. Williams [22].
It was entitled “Learning Representations by Back-Propagating Errors” and was
published in Nature, the world’s most prestigious scientific journal.
In Figure 1.5, the essential architecture and the process of backpropagation
are shown. The network topology has three layers (there may be more than this,
but three layers is a fairly common arrangement). Weighted signals propagate forward
through the layers. Each node in the network has two units: a summing function
(for the weighted inputs) and an activation unit that applies a nonlinear transfer function.
Common transfer functions include the log-sigmoid, which outputs 0 to 1, and the
tan-sigmoid, which outputs –1 to 1.
The weighted sums are combined to predict y, which is compared with the actual
target vector z. The difference is known as the error signal; it is transmitted
back through the layers, causing a re-weighting, and the network then runs forward
for another comparison. The algorithm proceeds to find an optimal value for the output by
means of gradient descent. Sometimes, local minima in the error surface can
disrupt the process of finding a global minimum, but if the number of nodes is
increased, this problem resolves, according to Rumelhart, Hinton, and Williams
(1986).
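The following Python sketch walks through this forward/backward cycle on the classic XOR problem, using log-sigmoid activations throughout. The layer sizes, learning rate, and iteration count are assumptions made for illustration, and, as noted above, a particular run can occasionally stall in a local minimum.

```python
import numpy as np

# A minimal sketch of backpropagation in a three-layer network (input, hidden,
# output) with log-sigmoid units, trained by gradient descent on XOR.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
z = np.array([[0], [1], [1], [0]], dtype=float)            # target vector

rng = np.random.default_rng(3)
W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)  # input -> hidden
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)  # hidden -> output
lr = 0.5                                                   # illustrative learning rate

for _ in range(10000):
    # Forward pass: weighted sums followed by nonlinear transfer functions.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # Error signal between prediction y and target z, propagated backwards.
    err_out = (y - z) * y * (1 - y)                        # output-layer delta
    err_hid = (err_out @ W2.T) * h * (1 - h)               # hidden-layer delta

    # Gradient-descent re-weighting before the next forward pass.
    W2 -= lr * h.T @ err_out;  b2 -= lr * err_out.sum(axis=0)
    W1 -= lr * X.T @ err_hid;  b1 -= lr * err_hid.sum(axis=0)

print(np.round(y, 2))                                      # should approximate [0, 1, 1, 0]
```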
Research on neural nets exploded. Three years after that, Kurt Hornik, Maxwell
Stinchcombe, and Halbert White published Multilayer Feedforward Networks Are
Universal Approximators [23]. As they said, "backpropagation is the most common
neural net model used today; it overcomes the limitations of the Perceptron by using
a combination of a hidden layer and a sigmoid function."
Development of neural nets has expanded enormously, and much current
interest is in deep belief networks, which are neural networks with multiple hidden
layers and are topologically quite complex. They have enjoyed recent success
in allowing images to be classified accurately, as well as in many other use cases.
Today, it is fair to say that Google makes extensive use of neural nets (almost
exclusively) and has also open-sourced such important technologies as
TensorFlow, a machine learning framework, and SLING, a natural language frame
semantics parser of immense power.
1945, von Neumann architecture: A draft report that John von Neumann wrote on Eckert and
Mauchly's EDVAC proposal was widely circulated and became the basis for the von Neumann
architecture. All digital electronic computers today use this architecture [24].
1950, Turing test: In a famous paper, "Computing Machinery and Intelligence," published in the
journal Mind, Alan Turing introduces what he calls the Imitation Game [25]. Later on, this
becomes a more generalized proposition which in effect states that if a person were unable to tell a
machine's responses apart from a human being's, the machine could be considered intelligent.
1952, Computer checkers programs: Arthur Samuel created a program to play checkers at IBM's
Poughkeepsie Laboratory.
1957, Perceptron: Frank Rosenblatt invents the Perceptron. This is the first time that a machine
can be said to learn something. This invention creates a massive amount of excitement and is
widely covered in influential media (The New York Times, The New Yorker magazine, and many
others). Some of the claims made are quite extraordinary examples of hyperbole, and this will be
the direct cause of the first AI Winter.
References
Von Neumann, John (1958) The Computer and the Brain, Yale University Press, New Haven and
London.
Anonymous (1958) New Navy Device Learns by Doing; Psychologist Shows Embryo of
Computer Designed to Read and Grow Wiser, The New York Times, July 1958, p. 25.
Mason, Harding, Stewart, D., Gill, Brendan (1958) The Talk of the Town: Rival, The New
Yorker, December 1958 issue, pp. 44–45.
Rosenblatt, Frank (1958) The Perceptron: A probabilistic model for information storage and
organization in the brain, Psychological Review, Vol. 65, No. 6, pp. 386–408.
Rosenblatt, Frank (1961) Principles of Neurodynamics: Perceptrons and the Theory of Brain
Mechanisms, Cornell Aeronautical Laboratory, Inc., Cornell University, New York.
Rosenblatt, Frank (1962) Principles of Neurodynamics: Perceptrons and the Theory of Brain
Mechanisms, Spartan Books, Washington, DC, pp. xvi, 616.
Minsky, Marvin and Papert, Seymour (1969) Perceptrons: An Introduction to Computational
Geometry, MIT Press, Cambridge, MA.
Lighthill, James (1973) Artificial Intelligence: A General Survey, Artificial Intelligence: A Paper
Symposium, Science Research Council.
Kurzweil, Ray (2006) The Singularity Is Near, Viking Press, New York.
Jordan, Michael Irwin (2014) Comments in Relation to Statistics and Machine Learning,
http://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/; 09 Sep
2014.
Legendre, Adrien-Marie (1805, 1806) Nouvelles Méthodes pour la Détermination des Orbites des
Comètes, Courcier, Paris.
Gauss, Carl Friedrich (1809) Theoria Motus Corporum Coelestium in Sectionibus Conicis
Solem Ambientium, Friedrich Perthes and I.H. Besser, Hamburg.
Stigler, Stephen M. (1981) Gauss and the invention of least squares, The Annals of Statistics,
Vol. 9, No. 3, pp. 465–474.
Gauss, Carl Friedrich (1821) Theoria Combinationis Observationum Erroribus Minimis
Obnoxiae, H. Dieterich, Göttingen.
Galton, Francis (1886) Regression towards mediocrity in hereditary stature, The Journal of the
Anthropological Institute of Great Britain and Ireland, Vol. 15, pp. 246–263.
Yule, George Udny (1897) On the theory of correlation, Journal of the Royal Statistical Society,
Vol. 60, No. 4, pp. 812–854.
Pearson, Karl (1903) The law of ancestral heredity, Biometrika, Vol. 2, No. 2, pp. 211–236.
Pearl, Raymond (1910) Letter Raymond Pearl letter to Karl Pearson, about disagreement on
hereditary theory and his removal as an editor of Biometrika (2/15/1910),
www.dnalc.org/view/12037-Raymond-Pearl-letter-to-Karl-Pearson-about-disagreement-on-
hereditary-theory-and-his-removal-as-an-editor-of-Biometrika-2-15-1910-.html.
Fisher, Ronald Aylmer (1922) The goodness of fit of regression formulae, and the distribution of
regression coefficients, Journal of the Royal Statistical Society, Vol. 85, No. 4, pp. 597–612.
Nelder, John and Wedderburn, Robert William Maclagan (1972) Generalized linear models,
Journal of the Royal Statistical Society, Series A (General), Vol. 135, No. 3, pp. 370–384.
Stuart, Alan and Ord, J. Keith (1991) Kendall's Advanced Theory of Statistics, Volume 2:
Classical Inference and Relationship, 5th edition, Edward Arnold, London.
Rumelhart, David; Hinton, Geoff and Williams, Ronald J. (1986) Learning representations by
back-propagating errors, Nature, Vol. 323, pp. 533–536.
Hornik, Kurt; Stinchcombe, Maxwell and White, Halbert (1989) Multilayer feedforward networks
are universal approximators, Neural Networks, Vol. 2, pp. 359–366.
Von Neumann, John (1945) First Draft of a Report on the EDVAC, Contract No.
W-670-ORD-4926 between the United States Army Ordnance Department and the University of
Pennsylvania, Moore School of Electrical Engineering, University of Pennsylvania, June 30,
1945.
Turing, Alan Mathison (1950) Computing machinery and intelligence, Mind, Vol. LIX, No. 236,
pp. 433–460.
Ho, Tin Kam (1995) Random decision forests, Proceedings of the Third International
Conference on Document Analysis and Recognition, Montreal, Quebec, IEEE, Vol. 1, pp. 278–282.
Cortes, Corinna and Vapnik, Vladimir (1995) Support-vector networks, Machine Learning,
Kluwer Academic Publishers, Dordrecht, Vol. 20, No. 3, pp. 273–297.
Baum, Leonard E., Petrie, Ted, Soules, George and Weiss, Norman (1970) A maximization
technique occurring in the statistical analysis of probabilistic functions of Markov chains, The
Annals of Mathematical Statistics, Vol. 41, No. 1, pp. 164–171.
Bellman, Richard (1957) A Markovian decision process, Journal of Mathematics and
Mechanics, Vol. 6, No. 5, pp. 679–684.