Application of Cellular Neural Networks in Semantic Segmentation

András Fülöp, András Horváth
Pazmany Peter Catholic University, Faculty of Information Technology and Bionics
Budapest, Hungary
Abstract—The popularity of convolutional neural networks and deep learning based approaches has increased continuously in the past years. These methods have enabled the solution of various practical problems, but they are still not heavily exploited in the embedded domain, which requires low-power implementations of these architectures. Cellular neural networks can provide an analogue and power-efficient implementation of these networks, which also enables the exploitation of non-Boolean, beyond-CMOS elements such as memristive structures. In this paper we demonstrate on simple datasets how cellular neural networks with different cell dynamics can be efficiently applied to the problem of semantic segmentation.

Index Terms—Cellular neural networks, Convolutional neural networks, Semantic segmentation

I. INTRODUCTION

The application of deep neural networks brought a paradigm shift in computer vision and data processing [1], which was mainly boosted by convolutional neural networks (ConvNN) [2]. These methods enabled the solution of problems which were not solvable before and significantly increased the state-of-the-art accuracy on various complex problems.

Apart from the clear benefits of these approaches, they also come with disadvantages: these techniques require a large and general dataset (representing the distribution of the real problem cases) instead of a mathematical definition of the problem. Collecting such a dataset usually requires a huge effort, but as an advantage it can also ensure thorough testing and is especially beneficial in practical problems with vague definitions (e.g. classification of dogs, detection of happy faces, etc.) which are difficult to construct using mathematical formalism.

These methods also require a fairly high amount of computation. For example, classifying a single image with the DenseNet-121 architecture requires 2.88 GFLOPs [3]. Although the typical operations required by these architectures are well supported by modern platforms, such as GPUs, the billions of operations still require a significant amount of energy. A single inference with DenseNet-121 requires approximately 8.78 Joules on modern, low-power mobile processors such as the ARM Cortex-A57 chip [4]. The energy requirement is even higher in case of semantic segmentation problems, since they usually require a more complex network architecture and work with higher-dimensional outputs. Because of this, the design of low-power implementations for semantic segmentation problems became more and more important in the past years.

One possible way to reduce this high energy consumption, and in this way to enable the application of multi-layered convolutional neural networks in embedded systems, could be the analogue implementation of these structures, substituting the digital approaches. This could also enable the non-Boolean implementation of these architectures using beyond-Moore's-law devices such as spintronic oscillators [5] or memristive devices [6].

Cellular neural networks (CellNN) can provide an efficient, analogue implementation of various image processing tasks, and they were successfully applied in computer vision in the past decades [7]. The analogue and continuous-time dynamics of CellNNs can result in lower power consumption and smaller delay in practice [8]. It was shown in [9] that CellNNs can be efficiently used as an analogue implementation of ConvNNs, providing low power with high computational capabilities. In this implementation the feedback template of CellNNs was not utilized at all, reducing the functionality of each layer to a single convolution.

It was demonstrated in [10] how the optimization methods of ConvNNs, namely gradient descent [11] and backpropagation [12], can be exploited to optimize the programmed templates of cellular neural networks, including the feedback, feed-forward and bias terms, for solving classification tasks.

In [8] and [9] analogue implementations of deep neural networks exploiting cellular connections were demonstrated on classification problems, where it was also shown that these approaches can result in a four times lower energy consumption than traditional digital implementations.

In classification problems one typically maps the original high-dimensional input to a single-dimensional vector where the values (usually referred to as logits) represent the likelihood of the input belonging to a given class or category. Convolutional networks naturally contain this gradual dimension reduction layer by layer, meanwhile CellNNs process the same dimensional inputs and always generate an output with the same spatial dimensions. To work around this problem in classification, either extra steps are introduced to downsample the output maps of CellNNs, or unnecessarily large feature maps are processed in the deeper layers. Both approaches increase the overall energy consumption of these systems.

In the past years classification has been substituted in most
computer vision applications by segmentation, since segmentation usually provides a better understanding of scenes than classification or detection with bounding boxes. Segmentation methods may vary depending on the selected architecture (U-Net [13], SegNet [14], Mask-RCNN [15], RetinaNet [16]) or even on the exact specification of the segmentation problem (semantic segmentation [17], instance segmentation [18] or amodal segmentation [19]), but all of these approaches generate an output with the same spatial coordinates as the input, which fits ideally to cellular architectures.

In this paper we demonstrate how the network parameters of CellNNs can be trained using gradient based algorithms, enabling efficient solutions to segmentation problems.
II. CELLULAR NEURAL NETWORKS

CellNNs and ConvNNs stem from the same motivation, which is the replication of the operation of neurons in the mammalian vision system. The predecessors of both methods can be traced back to the Neocognitron [20]. This architecture modeled the structure of the human retina and the primary visual cortex in a simplified way, and these aspects, as an inspiration, can be found in both ConvNNs and CellNNs. The most characteristic difference between the two architectures is the alternation of convolutional and pooling layers in ConvNNs, which were modelled after the simple and complex cells of the V1 area of the primary visual cortex and the spatial representation of these cells in the cortex or the retina. Meanwhile the topology of CellNNs follows more closely the structure of the human retina [21]. This structure with local coupling ensures the cellularity of the network, which is its most important aspect and provides feasibility in practice.

ConvNNs are implemented on Boolean, discrete-time architectures, meanwhile in case of CellNNs all computation happens in continuous time and in an analogue way.
The state of a standard CellNN cell can be defined by the following differential equation:

dx_{i,j}(t)/dt = -x_{i,j}(t) + \sum_{k,l \in S_{i,j}} A_{i,j,k,l} y[x_{i,j,k,l}(t)] + \sum_{k,l \in S_{i,j}} B_{i,j,k,l} u_{i,j,k,l}(t) + Z_{i,j}    (1)

Where x represents the state variable, u is the input, A, B and Z denote the connecting weights (the templates of the array), and i, j, k and l are index variables which select a cell in the processing array.

y defines the output function of each cell, which can be calculated the following way:

y = f_out(x) = (1/2)|x + 1| - (1/2)|x - 1|    (2)

Discretizing Equation 1 and setting all values in A to zero, one can implement a convolutional layer. In ConvNNs the most commonly applied non-linearity is the rectified linear unit (ReLU), which thresholds all negative values to zero, as opposed to the threshold levels of a CellNN (below -1 and above 1):

ReLU(x) = max(0, x)    (3)

Another important operator in ConvNNs is the pooling operator, which selects the maximum intensity in a small neighbourhood of the image:

P_max(I_{i,j}) = max(R_{i,j})    (4)

Where P_max is the maximum pooling operator, I is the input feature map and R is a two-dimensional region which is selected for pooling. Maximum pooling performs well in practice, adds extra non-linearity to the network and is easy to calculate (only an index has to be stored to propagate the error back).
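To make the relation between Eq. (1)-(4) and a standard convolutional layer concrete, the short sketch below (our illustration in Python/PyTorch, not the simulator released with this paper; the 3x3 template size, the time step and the number of iterations are assumptions) performs explicit Euler steps of the single-channel state equation. With the feedback template A set to zero, the settled state is simply a convolution of the input with B plus the bias, and the ConvNN-style ReLU and max-pooling of Eq. (3)-(4) are shown for comparison.

```python
import torch
import torch.nn.functional as F

def f_out(x):
    # Standard CellNN output non-linearity, Eq. (2): 0.5*|x+1| - 0.5*|x-1|
    return 0.5 * (x + 1).abs() - 0.5 * (x - 1).abs()

def cellnn_euler_step(x, u, A, B, z, dt=0.1):
    """One explicit Euler step of Eq. (1) for a single-channel array.

    x, u : state and input maps of shape (batch, 1, H, W)
    A, B : 3x3 feedback and feed-forward templates, shape (1, 1, 3, 3)
    z    : scalar bias term
    """
    coupling = F.conv2d(f_out(x), A, padding=1) + F.conv2d(u, B, padding=1) + z
    dx = -x + coupling                      # right-hand side of Eq. (1)
    return x + dt * dx                      # Euler update

# With A = 0, iterating the step towards equilibrium yields x = B*u + z,
# i.e. a single convolution, whose output f_out(x) plays a role analogous to
# the ReLU (Eq. 3) and max-pooling (Eq. 4) stages of a ConvNN:
u = torch.randn(1, 1, 64, 86)
A = torch.zeros(1, 1, 3, 3)
B = torch.randn(1, 1, 3, 3) * 0.1
x = torch.zeros_like(u)
for _ in range(50):
    x = cellnn_euler_step(x, u, A, B, z=0.0)
y = f_out(x)
pooled = F.max_pool2d(F.relu(y), kernel_size=2)   # ConvNN-style ReLU + pooling of Eq. (3)-(4)
```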
A. Leaky CellNN Non-linearity

Gradient based training proved to be a useful tool in a wide range of practical problems, but unfortunately it can not be directly applied to CellNNs because of the characteristics of the standard CellNN non-linearity. This output function (f_out(x)) does not provide any gradients if the input x is below minus one or above plus one.

These input elements providing zero gradient can prevent the training of the corresponding subgraphs of the network and usually result in inefficiencies. This phenomenon was also observed and presents as a problem in the training of ConvNNs, usually referred to as the "dead ReLU problem" [22].

Because of this, a modified output function inspired by the Leaky-ReLU function was introduced in [10]:

y = { -1 + α(x + 1),  if x < -1
    { x,              if -1 ≤ x ≤ 1    (5)
    { 1 + α(x - 1),   if x > 1

Where α is a small number, typically 10^-4, which determines the leakage and also the gradient values below -1 and above 1.
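A minimal sketch (ours, using the α = 10^-4 value quoted above) of the leaky output function of Eq. (5). Unlike the standard non-linearity of Eq. (2), it has a small but non-zero slope outside [-1, 1], so gradients can still flow through saturated cells during training.

```python
import torch

def leaky_f_out(x, alpha=1e-4):
    """Leaky CellNN output non-linearity of Eq. (5)."""
    return torch.where(
        x < -1, -1 + alpha * (x + 1),             # slope alpha below -1
        torch.where(x > 1, 1 + alpha * (x - 1),   # slope alpha above +1
                    x))                            # identity in [-1, 1]

x = torch.linspace(-3, 3, 7, requires_grad=True)
y = leaky_f_out(x)
y.sum().backward()
print(x.grad)   # alpha outside [-1, 1], 1 inside: no region is completely gradient-free
```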
B. Multichannel CellNN

Another important distinguishing feature of ConvNNs is their multiple channels in each layer. The application of different parallel feature maps can also be observed with CellNNs, but these are generally handled independently from each other, whereas in ConvNNs the feature maps are typically combined with each other in every layer. Motivated by this idea, an extended CellNN architecture was introduced in [10] using multiple layers (as a cascade) and multiple feature maps in each layer. These multiple feature maps can be denoted as an additional dimension next to the spatial dimensions of the array, similarly as it is done with ConvNNs:

dx_{i,j,k}(t)/dt = -x_{i,j,k}(t) + \sum_{l,m,n \in S_{i,j,k}} A_{i,j,k,l,m,n} y[x_{i,j,k,l,m,n}(t)] + \sum_{l,m,n \in S_{i,j,k}} B_{i,j,k,l,m,n} u_{i,j,k,l,m,n}(t) + Z_{i,j,k}    (6)

Where i, j and l, m represent the indices of the spatial dimensions, and k (input channel index) and n (output channel index) represent the different channels, the feature maps. This general notation allows the arbitrary weighted summation of the feature maps, similarly to ConvNNs. In practice other specific summation procedures can be implemented as well, such as depth-wise separable convolutions [23] or dilated convolutions [24], which can also be expressed using Equation 6, but we were not investigating these possibilities in our experiments.

It was demonstrated in [10] that a multichannel, multi-layered cellular network with the leaky non-linearity can be trained for classification using gradient based optimization methods. To overcome the problem of continuous-time operation, the Euler method [25] can be used, which is commonly applied in the discretization of ordinary differential equations and was also applied in previous CellNN simulators.
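To illustrate how Eq. (6) combined with the Euler discretization becomes a building block that gradient based optimizers can handle, the sketch below (ours, not the released simulator; template size, step count, time step and initialization are assumptions) unrolls a fixed number of Euler steps of a multichannel CellNN layer in PyTorch. Because every step consists of differentiable tensor operations, the templates A and B and the bias Z can be trained with backpropagation through the unrolled trajectory.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiChannelCellNNLayer(nn.Module):
    """Unrolled Euler integration of the multichannel state equation, Eq. (6)."""

    def __init__(self, in_ch, out_ch, steps=20, dt=0.1, alpha=1e-4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(out_ch, out_ch, 3, 3) * 0.01)  # feedback templates
        self.B = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)   # feed-forward templates
        self.Z = nn.Parameter(torch.zeros(out_ch))                        # bias terms
        self.steps, self.dt, self.alpha = steps, dt, alpha

    def f_out(self, x):
        # Leaky output non-linearity of Eq. (5)
        return torch.where(x < -1, -1 + self.alpha * (x + 1),
               torch.where(x > 1, 1 + self.alpha * (x - 1), x))

    def forward(self, u):
        x = torch.zeros(u.size(0), self.B.size(0), u.size(2), u.size(3), device=u.device)
        bu = F.conv2d(u, self.B, padding=1)          # input coupling is constant over time
        for _ in range(self.steps):                  # explicit Euler unrolling
            dx = -x + F.conv2d(self.f_out(x), self.A, padding=1) + bu + self.Z.view(1, -1, 1, 1)
            x = x + self.dt * dx
        return self.f_out(x)

layer = MultiChannelCellNNLayer(in_ch=3, out_ch=32)
out = layer(torch.randn(4, 3, 64, 86))               # (4, 32, 64, 86), differentiable w.r.t. A, B, Z
```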
III. MEMRISTIVE CELLULAR NEURAL NETWORKS

Memristive cellular neural networks for computer vision were introduced in [26]. In these architectures the cell dynamics determining the operation of the array are more complex than the equation defined in Eq. 1 and show memristive characteristics. CellNNs with memristive cell dynamics can be described by the following equations (the description and parameters are taken from [27] and [28]):

\dot{x}_m = g(x_m, v_m)
i_m = x_m^{-1} v_m    (7)
\dot{v}_m = ( -x_m^{-1} v_m + \sum_{k,l=-1}^{1} (i_{a_{k,l}} + i_{b_{k,l}}) + i_z ) / C_X

where x_m, i_m and v_m denote the state, current and voltage of the memristive cells. Coupling happens through the voltage of the cell via the output and input currents between the cells. The memristor currents are represented by the memristor state-dependent Ohm law i_j = x_j^{-1} v_j, and the output current of a cell can be given as i_out = f_out(v_x) g_lin, where g_lin is a linear parameter (a conductance) and f_out is the CellNN non-linearity. Here we have used the leaky version defined in Equation 5 in our experiments.

The evolution of the memristive state is defined by:

g(x_m, v_m) = ( -β v_m + ((β - α)/2) (|v_m + V_t| - |v_m - V_t|) ) · sign(v_m) f^p(v_m)    (8)

where V_t denotes the positive real-valued switching threshold voltage, whereas α and β are non-negative real coefficients with units Ω V^{-1} s^{-1}, determining the slopes of the piecewise linear function. Meanwhile f^p(v_m) is defined as:

f^p(v_m) = { 1 - ( (x_m - x_on)/(x_off - x_on) - 1 )^{2p},  if v_m > 0
           { 1 - ( (x_m - x_on)/(x_off - x_on) )^{2p},      if v_m < 0    (9)
           { 1 - (1/2) [ ( (x_m - x_on)/(x_off - x_on) - 1 )^{2p} + ( (x_m - x_on)/(x_off - x_on) )^{2p} ],  if v_m = 0

Where x_on and x_off define the domain of the possible memristor states and p is a positive integer, imposing the boundary values.

The parameters of the memristive cells and the normalization method of the inputs were the same as presented in [10].

A multi-layered, multichannel CellNN can be created in a straightforward manner using memristive cell dynamics as well, using Equation 7 to define the dynamics of the cells and Equation 6 to define the coupling of these elements in the architecture.
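To make Eqs. (7)-(9) easier to follow, the sketch below (our illustration, not the released simulator; the way the coupling currents are pre-summed into a single argument is our simplification) advances one memristive cell by a single Euler step: the memristor state x_m is driven by the windowed drift term of Eq. (8)-(9), while the cell voltage v_m integrates the memristor current, the coupling currents and the bias current according to Eq. (7).

```python
def sign(v):
    return (v > 0) - (v < 0)

def window_fp(x_m, v_m, x_on, x_off, p):
    # Boundary window f^p of Eq. (9)
    r = (x_m - x_on) / (x_off - x_on)
    if v_m > 0:
        return 1 - (r - 1) ** (2 * p)
    if v_m < 0:
        return 1 - r ** (2 * p)
    return 1 - 0.5 * ((r - 1) ** (2 * p) + r ** (2 * p))

def g(x_m, v_m, alpha, beta, v_t, x_on, x_off, p):
    # State drift of Eq. (8): threshold-type piecewise-linear term times sign and window
    drift = -beta * v_m + 0.5 * (beta - alpha) * (abs(v_m + v_t) - abs(v_m - v_t))
    return drift * sign(v_m) * window_fp(x_m, v_m, x_on, x_off, p)

def memristive_cell_step(x_m, v_m, i_coupling, i_z, c_x,
                         alpha, beta, v_t, x_on, x_off, p, dt):
    # One Euler step of the cell equations (7);
    # i_coupling is the already-summed feedback and feed-forward current i_a + i_b.
    i_m = v_m / x_m                       # memristor Ohm law: i_m = x_m^-1 * v_m
    dx_m = g(x_m, v_m, alpha, beta, v_t, x_on, x_off, p)
    dv_m = (-i_m + i_coupling + i_z) / c_x
    return x_m + dt * dx_m, v_m + dt * dv_m
```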
We have created a simple and modular simulator using these elements in Python, implementing it in the two most popular machine learning frameworks, both in Tensorflow and in Pytorch, where the architectures, datasets, error functions and optimization methods can be varied in a modular manner. Our simulator enables the definition of cellular architectures with various cell dynamics (including memristive ones) and provides easy-to-use application of gradient based optimization algorithms with the help of automatic differentiation [29]. Our simulator can be downloaded from: http://users.itk.ppke.hu/~horan/cnn/Simulators

IV. RESULTS

In this section we would like to briefly introduce our simulations and results through two simple semantic segmentation problems. We list the most important parameters of our simulation setups, but all of our codes will be made publicly available in the final version for the sake of reproducibility.
A. Datasets for Semantic Segmentation

We have selected the M2NIST dataset: multidigit MNIST for semantic segmentation [30]. This dataset contains images consisting of three digits which are superposed from the MNIST dataset [31]. The segmentation task of separating background and foreground pixels (belonging to the digits) is a simple problem in this dataset, but the desired output of each pixel is represented as eleven values (one for each digit class and one representing background pixels), so the type of the digits also has to be classified by the network.

We have extended this setup by increasing the complexity of the background and substituted the black background pixels of MNIST using images from the CIFAR-10 dataset [32]. We refer to this dataset as CM2NIST. This dataset creates a small, easy-to-investigate problem for semantic segmentation, which contains all important aspects from the separation of foreground and background pixels to the detection of object shapes and features, meanwhile the small spatial dimensions ensure that the dataset can be processed in a relatively short time, which serves ideally the investigation of neural networks with complex cell dynamics. The input image dimensions were 64 × 86; we randomly combined the 50000 training images of CIFAR-10, overlaid three randomly selected MNIST digits on each of them, and created an independent test set accordingly.

Randomly selected sample images from both datasets can be seen in Figure 1.

Figure 1. Sample images from the M2NIST dataset (upper row), where the digits have to be classified and segmented simultaneously (the segmentation of foreground and background pixels is a simple task in this case). The lower row contains our modified version of the dataset, where the background pixels were substituted using images from the CIFAR10 dataset. The expected outputs are not presented in the figure, since they are 11 × 64 × 86 images.
B. Simulations

We have created three architectures to test their performance on the two previously described datasets. We have implemented a digital, discrete-time convolutional network based on the U-Net architecture [13]. Our network contained 32, 64 and 128 channels in the three downscaling layers, which were implemented by strided convolutions, and 128, 64, 32 and 11 channels in the upscaling layers, which were implemented using transposed convolutions.

We have also implemented a cellular structure where the feature maps were not downscaled and upscaled; each feature map was represented by a 64 × 86 image in each layer. This network contained 32, 64, 128 and 11 channels respectively. Two different versions of this network were implemented, one containing the standard CellNN cell dynamics and the other the memristive cell dynamics described in Section III. The number of layers, channels and parameters was the same in case of both implementations.
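For reference, a rough structural sketch (ours) of the U-Net inspired digital baseline described above; only the 32/64/128 downscaling and 128/64/32/11 upscaling channel plan is taken from the text, while kernel sizes, activations and the absence of skip connections are our assumptions. The cellular variants replace this encoder-decoder with a cascade of full-resolution multichannel layers such as the one sketched in Section II-B.

```python
import torch
import torch.nn as nn

class UNetLikeBaseline(nn.Module):
    """Strided-convolution encoder and transposed-convolution decoder."""

    def __init__(self, in_ch=3, n_classes=11):
        super().__init__()
        self.down = nn.Sequential(                        # 32, 64, 128 channels, stride-2 downscaling
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.up = nn.Sequential(                          # 128, 64, 32, 11 channels on the way up
            nn.ConvTranspose2d(128, 128, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.Conv2d(32, n_classes, 1),
        )

    def forward(self, x):
        return self.up(self.down(x))

# 64x88 is used here so the spatial size is divisible by 8; how the 86-pixel width
# is handled in the paper (padding or cropping) is not stated, so we do not model it.
logits = UNetLikeBaseline()(torch.randn(1, 3, 64, 88))    # -> (1, 11, 64, 88)
```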
We would like to emphasize that comparing the complexity of the networks is difficult, since it can be done along different dimensions. Comparison of the operations is difficult since the cell dynamics are more complex in case of the memristive version, but the interconnections and the number of cells are identical. Also, the architectures are difficult to compare since the operations are digital floating point operations in case of the convolutional network and analogue diffusions in case of the cellular ones. Because of this we chose to compare the implementations along the number of parameters. In our implementations the number of trainable parameters of all three variants is the same.
Training for all network variants was implemented using batches of 16 with the ADAM optimizer and an initial learning rate of 10^-4, and we trained the networks for ten epochs. We have used softmax cross-entropy as the loss function for training, and the performance was measured on the independent test set, calculating both the cross-entropy loss and the intersection over union (IoU) between the expected and actual responses.
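For completeness, a short sketch (ours) of how the two reported measures can be computed for an eleven-class output; the exact reductions (pixel-averaged cross-entropy, IoU averaged over the classes present in the batch) are our assumptions, since the text does not spell them out.

```python
import torch
import torch.nn.functional as F

def evaluate(logits, target):
    """logits: (N, 11, H, W) raw network outputs; target: (N, H, W) class indices 0-10."""
    xent = F.cross_entropy(logits, target)            # softmax cross-entropy, averaged over pixels
    pred = logits.argmax(dim=1)
    ious = []
    for c in range(logits.size(1)):
        inter = ((pred == c) & (target == c)).sum().float()
        union = ((pred == c) | (target == c)).sum().float()
        if union > 0:                                  # skip classes absent from both maps
            ious.append(inter / union)
    return xent.item(), torch.stack(ious).mean().item()
```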
The results can be seen in Table I. As it can be seen from the results, CellNNs provided solutions with higher accuracy (lower cross-entropy loss and higher IoU) than their digital counterpart. Also, the analogue implementation can yield lower energy consumption, enabling the application of CellNNs for semantic segmentation problems in machine vision. Exploiting even more complex cell dynamics, such as memristive cells, can further increase the accuracy of the architecture. We have to emphasize that these results come from simulations, and the effect of noise in the analogue implementation and its robustness have to be investigated in detail to enable practical applications. But we think the results demonstrate that analogue networks, even with complex cell dynamics, can be trained using gradient based methods and might enable the application of complex neural networks using non-Boolean, beyond-CMOS elements in the future.

Table I. Accuracy results on the test sets of the two investigated datasets: M2NIST and CM2NIST, which contains MNIST digits over CIFAR10 background images. The measurements are the average cross-entropy loss values (XENT-Loss) on the test set and intersection over union measures (IoU). The three investigated architectures are represented in the columns, containing results for a U-Net inspired convolutional neural network (UNET) as a baseline and the implementations of CellNN based architectures, one built on standard CellNN cell dynamics (CellNN) and another containing memristive cells (Mem-CellNN). We have also approximated the energy consumption of a single inference of each network, which can be seen in the last row. The calculations were based on the data published in [8]. No energy consumption was calculated for the memristive cells, since these dynamics were available only in simulation and no data was available for a possible power consumption of such cells.

Architecture         UNET      CellNN    Mem-CellNN
M2NIST XENT-Loss     0.2481    0.1835    0.1442
M2NIST IoU           0.9470    0.9637    0.9870
CM2NIST XENT-Loss    0.7804    0.1783    0.1563
CM2NIST IoU          0.8891    0.9390    0.9819
Energy               670 µJ    75 µJ     -

V. CONCLUSION

In this paper we have demonstrated how CellNNs can be used for semantic segmentation problems, exploiting a multichannel, multi-layered structure. These architectures contain neither downscaling nor upscaling steps, which results in an efficient implementation compared to their convolutional counterparts. The weights of these networks were optimized using gradient based methods, which is, to the best of the authors' knowledge, the first such application for training CellNNs for semantic segmentation.

We have demonstrated on simple datasets, which contain all important aspects of semantic segmentation problems, that an analogue, cellular implementation can provide an energy efficient alternative. In case of our implementation the analogue, continuous-time network required 8.9 times less energy than the digital implementation, which also had lower overall accuracy. We have also demonstrated using memristive dynamics that these architectures might be implementable using non-Boolean elements such as memristors.

ACKNOWLEDGMENT

This research has been partially supported by the Hungarian Government by the following grant: 2018-1.2.1-NKP00008: Exploring the Mathematical Foundations of Artificial Intelligence.
REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[3] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[4] E. García-Martín, C. F. Rodrigues, G. Riley, and H. Grahn, "Estimation of energy consumption in machine learning," Journal of Parallel and Distributed Computing, vol. 134, pp. 75–88, 2019.
[5] T. Roska, A. Horvath, A. Stubendek, F. Corinto, G. Csaba, W. Porod, T. Shibata, and G. Bourianoff, "An associative memory with oscillatory cnn arrays using spin torque oscillator cells and spin-wave interactions architecture and end-to-end simulator," in 2012 13th International Workshop on Cellular Nanoscale Networks and their Applications. IEEE, 2012, pp. 1–3.
[6] F. Corinto, M. Forti, and L. O. Chua, "Memristor cellular neural networks computing in the flux-charge domain," in Nonlinear Circuits and Systems with Memristors. Springer, pp. 343–372.
[7] Á. Zarándy, A. Horváth, and P. Szolgay, "Cnn technology-tools and applications," IEEE Circuits and Systems Magazine, vol. 18, no. 2, pp. 77–89, 2018.
[8] Q. Lou, C. Pan, J. McGuinness, A. Horvath, A. Naeemi, M. Niemier, and X. S. Hu, "A mixed signal architecture for convolutional neural networks," ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 15, no. 2, pp. 1–26, 2019.
[9] A. Horváth, M. Hillmer, Q. Lou, X. S. Hu, and M. Niemier, "Cellular neural network friendly convolutional neural networks: Cnns with cnns," in Proceedings of the Conference on Design, Automation & Test in Europe. European Design and Automation Association, 2017, pp. 145–150.
[10] A. Fülöp and A. Horváth, "Template optimization in cellular neural networks using gradient based approaches," in 2020 European Conference on Circuit Theory and Design (ECCTD). IEEE, 2020, pp. 1–4.
[11] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of COMPSTAT'2010. Springer, 2010, pp. 177–186.
[12] Y. LeCun, D. Touresky, G. Hinton, and T. Sejnowski, "A theoretical framework for back-propagation," in Proceedings of the 1988 Connectionist Models Summer School, vol. 1. CMU, Pittsburgh, Pa: Morgan Kaufmann, 1988, pp. 21–28.
[13] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[14] V. Badrinarayanan, A. Kendall, and R. Cipolla, "Segnet: A deep convolutional encoder-decoder architecture for image segmentation," arXiv preprint arXiv:1511.00561, 2015.
[15] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask r-cnn," in Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017, pp. 2980–2988.
[16] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[17] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[18] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, "Learning rich features from rgb-d images for object detection and segmentation," in European Conference on Computer Vision. Springer, 2014, pp. 345–360.
[19] Y. Zhu, Y. Tian, D. N. Metaxas, and P. Dollár, "Semantic amodal segmentation," in CVPR, vol. 2, 2017, p. 7.
[20] K. Fukushima and S. Miyake, "Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition," in Competition and Cooperation in Neural Nets. Springer, 1982, pp. 267–285.
[21] L. O. Chua and T. Roska, "The cnn paradigm," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 40, no. 3, pp. 147–156, 1993.
[22] M. M. Lau and K. H. Lim, "Investigation of activation functions in deep belief network," in Control and Robotics Engineering (ICCRE), 2017 2nd International Conference on. IEEE, 2017, pp. 201–206.
[23] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
[24] Y. Wei, H. Xiao, H. Shi, Z. Jie, J. Feng, and T. S. Huang, "Revisiting dilated convolution: A simple approach for weakly- and semi-supervised semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7268–7277.
[25] E. Süli and D. F. Mayers, An Introduction to Numerical Analysis. Cambridge University Press, 2003.
[26] X. Hu, G. Feng, S. Duan, and L. Liu, "A memristive multilayer cellular neural network with applications to image processing," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1889–1901, 2016.
[27] A. Ascoli, I. Messaris, R. Tetzlaff, and L. O. Chua, "Theoretical foundations of memristor cellular nonlinear networks: Stability analysis with dynamic memristors," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 4, pp. 1389–1401, 2019.
[28] R. Tetzlaff, A. Ascoli, I. Messaris, and L. O. Chua, "Theoretical foundations of memristor cellular nonlinear networks: Memcomputing with bistable-like memristors," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 2, pp. 502–515, 2019.
[29] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in pytorch," 2017.
[30] L. Zhou, "Multidigit mnist for semantic segmentation," 2018. [Online]. Available: https://www.kaggle.com/zhoulingyan0228/m2nist-segmentation-u-net
[31] Y. LeCun, "The mnist database of handwritten digits," http://yann.lecun.com/exdb/mnist/, 1998.
[32] A. Krizhevsky, V. Nair, and G. Hinton, "The cifar-10 dataset," online: http://www.cs.toronto.edu/kriz/cifar.html, 2014.