Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS
Abstract
In this paper, we compare three popular algorithms for hyperparameter optimization (Grid Search, Random Search, and Genetic Algorithm) and apply them to neural architecture search (NAS). We use these algorithms to search for the architecture of a convolutional neural network. Experimental results on the CIFAR-10 dataset demonstrate the performance differences between the compared algorithms. The comparison is based on the execution time of each algorithm and the accuracy of the models it proposes.
Keywords: Neural architecture search, grid search, random search, genetic algorithm,
hyperparameter optimization
1. Introduction
Over the last few years, convolutional neural networks (CNNs) and their variants have achieved strong results on a wide variety of machine learning problems and applications (Shin et al. (2016); Krizhevsky et al. (2012); Wu (2019); Xu et al. (2014); Goodfellow et al. (2014)). However, each of the currently known architectures was designed by human experts in machine learning (Simonyan and Zisserman (2014); Szegedy et al. (2014); He et al. (2015a)). Today, the number of tasks that can be solved with neural networks is growing rapidly, and designing a neural network architecture by hand has become a long, slow, and expensive process. Designing a good neural network remains a major challenge.
A typical CNN architecture consists of several convolutional, pooling, and fully-connected layers. While designing a network architecture, an expert has to make many design choices: the number of layers of each type (convolution, pooling, dense, etc.), the ordering of the layers, and the hyperparameters of each layer, such as the receptive field size, stride, and padding of a convolutional layer.
Many research efforts have been made in the field of so-called automated machine learning (Auto-ML) (Zhong et al. (2017); Cai et al. (2018)). Zoph and Le (2016) propose a reinforcement learning-based method for neural architecture search. Hebbal et al. (2019) experiment with Bayesian optimization using deep Gaussian processes. Pham et al. (2018) propose another approach based on parameter sharing. Jin et al. (2018) and Weill et al. (2019) presented two of the most inspiring frameworks for NAS and Auto-ML, called Auto-Keras and AdaNet respectively.

Figure 1: An illustration of different architecture spaces. Each node in the graphs corresponds to a layer in a neural network: a convolutional or pooling layer, etc. An edge from layer Li to layer Lj denotes that Lj receives the output of Li as input. Left: an element of a chain-structured space. Right: an element of a more complex search space with additional layer types, multiple branches, and skip connections (Elsken et al. (2018)).
Despite these advances in automated machine learning, in this paper we attempt to use classic hyperparameter optimization algorithms to find an optimal neural network architecture. We compare the execution time of the above algorithms to determine which one proposes the model with the highest score in the least time.
2. Search space
The search space defines which neural network architectures can be discovered by the algorithm in use. There are many methods and strategies for neural architecture search of CNNs. In most cases, experts build an architecture from scratch by alternating convolutional and fully-connected layers.
The simplest search space is the space of chain-structured neural networks, as illustrated in Figure 1 (Elsken et al. (2018)).
Rather than designing the entire convolutional network, one can design smaller modules and then connect them together to form a network (Pham et al. (2018)). Using this approach, a neural network architecture can be written as a sequence of layers, and the search space is parametrized by the following (a minimal encoding sketch is given after the list):

• n, the number of layers;
• the hyperparameters of every layer (e.g., kernel size and number of filters for a convolutional layer).
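As a concrete illustration, this parametrized search space can be encoded as a dictionary of candidate values per hyperparameter. The names and ranges below are illustrative assumptions for a sketch, not the exact configuration used in our experiments.

# A minimal sketch of a chain-structured search space. Each architecture is
# described by the number of convolutional/dense cells and per-layer
# hyperparameters; the concrete ranges are assumptions for illustration.
search_space = {
    "n_conv_cells": [1, 2],          # convolutional cells to stack
    "n_dense_cells": [1, 2, 3, 4],   # dense cells to stack
    "filters": [32, 64],             # filters per convolutional layer
    "kernel_size": [3],              # receptive field of each convolution
    "dense_units": [256, 512],       # units in each dense cell
    "dropout_rate": [0.2, 0.5],      # dropout applied inside a cell
}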
3. Search strategy
Grid Search. The traditional method of hyperparameter optimization is grid search, which simply performs an exhaustive search over a given subset of the hyperparameter space of the training algorithm (Figure 4). Because the hyperparameter space may include dimensions with real-valued or unbounded ranges, boundaries usually have to be specified before grid search can be applied. Grid search suffers from the curse of dimensionality, but it is often easy to parallelize, since the hyperparameter settings it evaluates are independent of each other.
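A minimal sketch of grid search over the search space above, assuming a hypothetical helper build_and_evaluate(params) that trains a model for a fixed number of epochs and returns its validation accuracy (this helper stands in for the training procedure and is not part of the original code):

import itertools

def grid_search(search_space, build_and_evaluate):
    # Exhaustively evaluate every combination in the search space.
    keys = list(search_space)
    best_params, best_score = None, float("-inf")
    # itertools.product enumerates the full Cartesian product of all candidate
    # values, which is why grid search scales poorly with dimensionality.
    for values in itertools.product(*(search_space[k] for k in keys)):
        params = dict(zip(keys, values))
        score = build_and_evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

Because each combination is evaluated independently, the loop body can be distributed across workers with no coordination beyond collecting the scores.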
Random Search. Random search replaces the exhaustive enumeration of all combinations with a random selection of them. It is easily applied to discrete settings, and the method generalizes to continuous and mixed spaces. Random search can outperform grid search, especially when only a small number of hyperparameters actually affect the performance of the machine learning algorithm.
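Under the same assumptions (the hypothetical build_and_evaluate helper and the dictionary-style search space), random search only changes how configurations are chosen; the trial budget n_trials below is an assumed value, not one from our experiments:

import random

def random_search(search_space, build_and_evaluate, n_trials=10, seed=42):
    # Evaluate a fixed number of randomly sampled configurations.
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Sample one value per hyperparameter instead of enumerating all of them.
        params = {k: rng.choice(v) for k, v in search_space.items()}
        score = build_and_evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score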
Figure 2: An illustration of a basic CNN architecture (input → Convolution → BiasAdd → ReLU → BatchNorm → MaxPooling → Dropout → Convolution Cell → Flatten → Dense Cell → Dropout → Dense). The search space defines the number of convolutional and dense blocks.
Figure 3: An illustration of a convolutional block (Convolution → BiasAdd → ReLU → BatchNorm → MaxPooling → Dropout, with boolean flags such as biases, pooling, and batch_norm). This block accepts the following parameters as required: input, filters, kernel size. If additional parameters are specified, the output is passed through the BatchNormalization and MaxPooling layers.
Figure 4: An illustration of a grid search space. We manually set ranges of possible parameter values and the algorithm performs an exhaustive search over them. In other words, grid search is pure brute force and can take a very long time to execute.
Additionally, we shift the training images horizontally and vertically, and randomly flip them horizontally.
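A minimal sketch of this augmentation using Keras' ImageDataGenerator; the shift range of 10% is an assumed value, since only the kinds of transformations are stated above:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Horizontal/vertical shifts plus random horizontal flips, as described above.
# The 10% shift range is an assumption for illustration.
datagen = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
# datagen.flow(x_train, y_train, batch_size=128) would then feed model.fit(...).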
Search spaces. We apply each of the algorithms to a single search space: the macro search space over the basic convolutional model, extended by adding convolutional cells to it (Section 2).
Training details. All convolutional kernels are initialized with He uniform initialization (He et al. (2015b)). We also apply L2 weight decay with a rate of 10^-4 and set the kernel size of the convolutions to 3. The first convolutional layer uses 32 filters; each convolutional cell uses 64 filters. All convolutions are followed by BatchNormalization and MaxPooling. The parameters of each network are trained with the Adamax optimizer (Kingma and Ba (2014)), where the learning rate is set to 2e-3 and the other parameters are left at their defaults. We set the dropout rate to 0.2 in each convolutional cell. The dropout rate in the basic model is 0.5 and the number of units in the dense cells is 512. Each architecture search is run for 50 epochs on an Nvidia Tesla K80 GPU. The basic model achieves 76% accuracy after 50 epochs.
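A hedged Keras sketch of this per-model training setup (He uniform initialization, L2 weight decay of 10^-4, 3×3 kernels, 32 filters in the first convolution, Adamax with learning rate 2e-3). The exact layer layout of the basic model is simplified here, and the helper name build_basic_model is ours, not taken from the original code:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_basic_model(num_classes=10):
    # Simplified basic model: one convolutional block plus one dense cell.
    model = keras.Sequential([
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(
            32, kernel_size=3, padding="same",
            kernel_initializer="he_uniform",
            kernel_regularizer=regularizers.l2(1e-4),
        ),
        layers.ReLU(),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adamax(learning_rate=2e-3),
        # Integer class labels are assumed, as returned by keras.datasets.cifar10.
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model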
Results. The whole training procedure for 2 × 4 = 8 models (length of the conv cells list × length of the dense cells list) took ≈4.3 hours.
[Results table, partially recovered; one surviving row: 2 | 1 | 0.58M | 82 | 0.57]
As can be seen, the best accuracy is about 83%, achieved by a model with 2 convolutional and 2 dense cells.
The best model achieves about 86% accuracy. This algorithm is faster than Grid Search, but with a larger number of runs it would take much longer.
5. Final thoughts
We tested the neural architecture search approach with three popular algorithms: Grid Search, Random Search, and Genetic Algorithm. Almost all of the tested algorithms take a long time to search for the best model. Grid Search is too slow, and Random Search is limited by the distributions chosen over the search space. The most inspiring is the evolutionary algorithm, where we encode the architecture parameters as a genome, although evolution may also take a long time before it finds the best model.
1. We also had to limit the number of MaxPooling layers in the convolutional cells to prevent dimension errors when the input dimension becomes too small. We skip MaxPooling if the dimension of the input falls below (?, 2, 2, ?).
2. We place MaxPooling layers randomly in convolutional blocks to prevent the output from becoming too low-dimensional.
Regarding hyperparameter optimization, it is difficult to say which of the above algorithms will show the best results. If the model and the search space are not too large, then grid search or random search may be a good choice. But if the model has many layers and the search space is large, then the evolutionary algorithm may be the best choice.
5.1 Summary
Grid search is a brute-force algorithm: it performs an exhaustive search over a given subset of the hyperparameter space, that is, it trains and tests every possible combination of the network parameters we provide. If the search space is too large, do not choose this algorithm. Random search may be a little faster, but it does not guarantee the best results.
Finally, if our search space is large, then the best choice is the evolutionary algorithm. It also takes a long time to run, but we can control this through the number of generations and the size of the population. Each individual represents a solution in the search space for a given problem and is coded as a finite-length vector of components. These variable components are analogous to genes, so a chromosome (individual) is composed of several genes (variable components).
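As a hedged illustration of this encoding in our setting, a genome can simply be the vector of architecture hyperparameters. The crossover and mutation operators below are generic textbook variants given as a sketch, not necessarily the exact operators used in our experiments:

import random

rng = random.Random(0)

# Each gene is one hyperparameter; a chromosome (individual) is the full vector.
GENE_POOL = {
    "n_conv_cells": [1, 2],
    "n_dense_cells": [1, 2, 3, 4],
    "dense_units": [256, 512],
    "dropout_rate": [0.2, 0.5],
}

def random_individual():
    return {k: rng.choice(v) for k, v in GENE_POOL.items()}

def crossover(parent_a, parent_b):
    # Uniform crossover: each gene is inherited from one of the two parents.
    return {k: rng.choice([parent_a[k], parent_b[k]]) for k in GENE_POOL}

def mutate(individual, rate=0.1):
    # With a small probability, replace a gene with a fresh random value.
    return {
        k: (rng.choice(GENE_POOL[k]) if rng.random() < rate else v)
        for k, v in individual.items()
    }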
When there are many parameters to optimize, the genetic algorithm tends to perform faster than the others. Choose it, and evolution will do the rest for you.
References
Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. Path-level network
transformation for efficient architecture search. CoRR, abs/1806.02639, 2018. URL http://arxiv.org/abs/1806.02639.
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Neural architecture search: A
survey, 2018.
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image
recognition, 2015a.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers:
Surpassing human-level performance on imagenet classification, 2015b.
Ali Hebbal, Loic Brevault, Mathieu Balesdent, El-Ghazali Talbi, and Nouredine Melab.
Bayesian optimization using deep gaussian processes, 2019.
Haifeng Jin, Qingquan Song, and Xia Hu. Efficient neural architecture search with network
morphism. CoRR, abs/1806.10282, 2018. URL http://arxiv.org/abs/1806.10282.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep
convolutional neural networks. Commun. ACM, 60:84–90, 2012.
Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural
architecture search via parameter sharing. CoRR, abs/1802.03268, 2018. URL http://arxiv.org/abs/1802.03268.
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition, 2014.
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir
Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper
with convolutions, 2014.
Charles Weill, Javier Gonzalvo, Vitaly Kuznetsov, Scott Yang, Scott Yak, Hanna Mazzawi,
Eugen Hotaj, Ghassen Jerfel, Vladimir Macko, Ben Adlam, Mehryar Mohri, and Corinna
Cortes. Adanet: A scalable and flexible framework for automatically learning ensembles.
CoRR, abs/1905.00080, 2019. URL http://arxiv.org/abs/1905.00080.
Li Xu, Jimmy SJ Ren, Ce Liu, and Jiaya Jia. Deep convolutional neural network for
image deconvolution. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence,
and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27,
pages 1790–1798. Curran Associates, Inc., 2014. URL http://papers.nips.cc/paper/5485-deep-convolutional-neural-network-for-image-deconvolution.pdf.
Zhao Zhong, Junjie Yan, and Cheng-Lin Liu. Practical network blocks design with q-
learning. CoRR, abs/1708.05552, 2017. URL http://arxiv.org/abs/1708.05552.
Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning.
CoRR, abs/1611.01578, 2016. URL http://arxiv.org/abs/1611.01578.