Deep Learning Using Rectified Linear Units (ReLU)
ABSTRACT
We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU is used as an activation function in DNNs, with the Softmax function as their classification function. However, there have been several studies on using a classification function other than Softmax, and this study is an addition to those. We accomplish this by thresholding the raw output scores of the network at zero, i.e. f(o) = max(0, o), and taking the arg max of the thresholded scores as the class prediction.

2 METHODOLOGY
2.1 Machine Intelligence Library
Keras[4] with the Google TensorFlow[1] backend was used to implement the deep learning algorithms in this study, with the aid of other scientific computing libraries: matplotlib[7], numpy[14], and scikit-learn[11].
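For orientation, the following is a minimal sketch of the corresponding Python import stack; the version check is ours, and no specific library versions are claimed by the study.

import keras                  # high-level deep learning API [4]
import tensorflow as tf       # backend for Keras [1]
import matplotlib             # plotting [7]
import numpy as np            # numerical computing [14]
import sklearn                # preprocessing, metrics, cross validation [11]

# Print the versions of the installed stack.
print(keras.__version__, tf.__version__, matplotlib.__version__,
      np.__version__, sklearn.__version__)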
Consequently, we have

\[ p_k = \frac{\exp(o_k)}{\sum_{k=0}^{n-1} \exp(o_k)} \quad (3) \]

Hence, the predicted class would be ŷ,

\[ \hat{y} = \arg\max_{i \in 1, \ldots, N} p_i \quad (4) \]
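As a concrete illustration of Eqs. 3 and 4, the following is a minimal numpy sketch; the function names and example scores are ours, not taken from the study's code.

import numpy as np

def softmax(o):
    # Eq. 3: p_k = exp(o_k) / sum_k exp(o_k); subtracting max(o) improves numerical stability
    e = np.exp(o - np.max(o))
    return e / np.sum(e)

def predict(o):
    # Eq. 4: the predicted class is the index with the largest probability
    return int(np.argmax(softmax(o)))

raw_scores = np.array([2.0, 1.0, 0.1])
print(softmax(raw_scores))   # class probabilities summing to 1
print(predict(raw_scores))   # -> 0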
2.4.2 Rectified Linear Units (ReLU). ReLU is an activation function introduced by [6], which has strong biological and mathematical underpinnings. In 2011, it was demonstrated to further improve the training of deep neural networks. It works by thresholding values at 0, i.e. f(x) = max(0, x). Simply put, it outputs 0 when x < 0 and, conversely, it outputs a linear function when x ≥ 0 (refer to Figure 1 for a visual representation).

The backpropagation algorithm (see Eq. 8) is the same as in the conventional softmax-based deep neural network,

\[ \frac{\partial \ell(\theta)}{\partial \theta} = \sum_i \left[ \frac{\partial \ell(\theta)}{\partial p_i} \left( \sum_k \frac{\partial p_i}{\partial o_k} \frac{\partial o_k}{\partial \theta} \right) \right] \quad (8) \]

Algorithm 1 shows the rudimentary gradient-descent algorithm for a DL-ReLU model.

Algorithm 1: Mini-batch stochastic gradient descent training of a neural network with the rectified linear unit (ReLU) as its classification function.
  Input: {x^(i) ∈ ℝ^m}_{i=1}^{n}, θ
  Output: W
  for number of training iterations do
    for i = 1, 2, . . . , n do
      ∇θ = ∇θ − (θ · y) / (max(0, θh + b) · ln 10)
      θ = θ − α · ∇θ ℓ(θ; x^(i))

Any standard gradient-based learning algorithm may be used. We used adaptive momentum estimation (Adam) in our experiments.
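The following is a schematic Python sketch of Algorithm 1: the loop structure and the update θ ← θ − α · ∇θ ℓ(θ; x^(i)) follow the pseudocode above, while grad_loss is a placeholder of ours that stands in for the gradient of the chosen loss rather than the exact expression in Algorithm 1.

import numpy as np

def relu(x):
    # f(x) = max(0, x): zero for negative inputs, identity otherwise
    return np.maximum(0.0, x)

def sgd_train(theta, data, grad_loss, alpha=1e-3, num_iterations=10):
    # Per-example stochastic gradient descent, mirroring the loop structure of Algorithm 1.
    for _ in range(num_iterations):          # for number of training iterations do
        for x_i, y_i in data:                # for i = 1, 2, ..., n do
            g = grad_loss(theta, x_i, y_i)   # placeholder for ∇θ ℓ(θ; x^(i))
            theta = theta - alpha * g        # θ = θ − α · ∇θ ℓ(θ; x^(i))
    return theta

In the experiments, this generic update is handled by the Adam optimizer rather than a hand-written loop.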
Table 1: Architecture of VGG-like CNN from Keras[4].

Layer (type)                     Output Shape          Param #
conv2d_1 (Conv2D)                (None, 14, 14, 32)    320
conv2d_2 (Conv2D)                (None, 12, 12, 32)    9248
max_pooling2d_1 (MaxPooling2)    (None, 6, 6, 32)      0
dropout_1 (Dropout)              (None, 6, 6, 32)      0
conv2d_3 (Conv2D)                (None, 4, 4, 64)      18496
conv2d_4 (Conv2D)                (None, 2, 2, 64)      36928
max_pooling2d_2 (MaxPooling2)    (None, 1, 1, 64)      0
dropout_2 (Dropout)              (None, 1, 1, 64)      0
flatten_1 (Flatten)              (None, 64)             0
dense_1 (Dense)                  (None, 256)            16640
dropout_3 (Dropout)              (None, 256)            0
dense_2 (Dense)                  (None, 10)             2570

Table 3: MNIST Classification. Comparison of FFNN-Softmax and FFNN-ReLU models in terms of % accuracy. The training cross validation is the average cross validation accuracy over 10 splits. Test accuracy is on unseen data. Precision, recall, and F1-score are on unseen data.

Metrics / Models             FFNN-Softmax    FFNN-ReLU
Training cross validation    ≈ 99.29%        ≈ 98.22%
Test accuracy                97.98%          97.77%
Precision                    0.98            0.98
Recall                       0.98            0.98
F1-score                     0.98            0.98
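For reference, the following is a minimal tf.keras sketch that reproduces the layer shapes and parameter counts of Table 1; the dropout rates, convolutional activations, and reshaping of the input into a 16 × 16 map are our assumptions, since the table only records shapes and parameter counts.

import tensorflow as tf

def build_vgg_like_cnn(num_classes=10):
    # Reproduces the layer stack of Table 1 on a 16x16x1 input
    # (the PCA-reduced 256-dimensional MNIST features reshaped to an image).
    return tf.keras.Sequential([
        tf.keras.Input(shape=(16, 16, 1)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # -> (14, 14, 32), 320 params
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # -> (12, 12, 32), 9248 params
        tf.keras.layers.MaxPooling2D((2, 2)),                   # -> (6, 6, 32)
        tf.keras.layers.Dropout(0.25),                          # rate is an assumption
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),  # -> (4, 4, 64), 18496 params
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),  # -> (2, 2, 64), 36928 params
        tf.keras.layers.MaxPooling2D((2, 2)),                   # -> (1, 1, 64)
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Flatten(),                              # -> (64,)
        tf.keras.layers.Dense(256, activation="relu"),          # 16640 params
        tf.keras.layers.Dropout(0.5),                           # rate is an assumption
        tf.keras.layers.Dense(num_classes),                     # 2570 params; Softmax or ReLU head is the variable under study
    ])

build_vgg_like_cnn().summary()   # prints shapes and parameter counts matching Table 1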
Figure 2: Confusion matrix of FFNN-ReLU on MNIST classification.

All models used the Adam[8] optimization algorithm for training, with the default learning rate α = 1 × 10⁻³, β₁ = 0.9, β₂ = 0.999, ϵ = 1 × 10⁻⁸, and no decay.
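That optimizer configuration corresponds to the following tf.keras sketch; the commented compile call is illustrative only, and the loss shown is an assumption rather than a statement of the study's exact setup.

import tensorflow as tf

# Adam with the default hyperparameters stated above: α = 1e-3, β1 = 0.9, β2 = 0.999, ϵ = 1e-8.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8,
)

# Illustrative usage with a compiled Keras model (loss choice is an assumption):
# model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])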
3.1 MNIST
We implemented both the CNN and FFNN defined in Tables 1 and 2 on normalized, PCA-reduced features, i.e. from 28 × 28 (784) dimensions down to 16 × 16 (256) dimensions.
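A minimal scikit-learn sketch of this preprocessing step is given below; the normalization scheme and the reshaping of the 256 PCA components into a 16 × 16 map for the CNN are our assumptions.

import numpy as np
from sklearn.decomposition import PCA

def preprocess_mnist(X):
    # X: (n_samples, 784) array of flattened 28x28 MNIST images.
    X = X.astype("float32") / 255.0     # normalize pixel values (assumed scheme)
    pca = PCA(n_components=256)         # 28x28 (784) -> 16x16 (256) dimensions
    X_flat = pca.fit_transform(X)
    X_maps = X_flat.reshape(-1, 16, 16, 1)  # flat features for the FFNN, maps for the CNN
    return X_flat, X_maps

# Example with stand-in data:
X_demo = np.random.rand(300, 784)
X_flat, X_maps = preprocess_mnist(X_demo)
print(X_flat.shape, X_maps.shape)   # (300, 256) (300, 16, 16, 1)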
In training a FFNN with two hidden layers for MNIST classification, we found the results described in Table 3.

Despite the fact that the Softmax-based FFNN had a slightly higher test accuracy than the ReLU-based FFNN, both models had 0.98 for their F1-score. These results imply that the FFNN-ReLU is on par with the conventional FFNN-Softmax.

Figures 2 and 3 show the predictive performance of both models for MNIST classification on its 10 classes. Values of correct prediction in the matrices seem to be balanced, as in some classes the ReLU-based FFNN outperformed the Softmax-based FFNN, and vice-versa.

In training a VGG-like CNN[4] for MNIST classification, we found the results described in Table 4.

Table 4: MNIST Classification. Comparison of CNN-Softmax and CNN-ReLU models in terms of % accuracy. The training cross validation is the average cross validation accuracy over 10 splits. Test accuracy is on unseen data. Precision, recall, and F1-score are on unseen data.

Metrics / Models             CNN-Softmax    CNN-ReLU
Training cross validation    ≈ 97.23%       ≈ 73.53%
Test accuracy                95.36%         91.74%
Precision                    0.95           0.92
Recall                       0.95           0.92
F1-score                     0.95           0.92

The CNN-ReLU was outperformed by the CNN-Softmax because it converged more slowly, as can be seen from the training accuracies in cross validation (see Table 5). However, despite its slower convergence, it was able to achieve a test accuracy higher than 90%. Granted, it is lower than the test accuracy of the CNN-Softmax by ≈ 4%, but further optimization may be done on the CNN-ReLU to achieve on-par performance with the CNN-Softmax.
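The precision, recall, and F1-score figures reported in Tables 3 and 4 are the standard per-class metrics on the unseen test set; a minimal scikit-learn sketch, with stand-in labels purely for illustration, is:

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Stand-in labels for illustration; in the study these are the unseen MNIST test labels and predictions.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

print(classification_report(y_true, y_pred, digits=2))  # precision, recall, F1-score per class
print(confusion_matrix(y_true, y_pred))                 # the counts behind confusion matrices such as Figure 2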
5 ACKNOWLEDGMENT
We express our appreciation for the VGG-like Convnet source code in Keras[4], as it was the CNN model used in this study.
REFERENCES
[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen,
Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, San-
jay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard,
Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Leven-
berg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike
Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul
Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals,
Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng.
2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.
(2015). http://tensorflow.org/ Software available from tensorflow.org.
[2] Abien Fred Agarap. 2017. A Neural Network Architecture Combining Gated
Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection
in Network Traffic Data. arXiv preprint arXiv:1709.03082 (2017).
[3] Abdulrahman Alalshekmubarak and Leslie S Smith. 2013. A novel approach
combining recurrent neural network and support vector machines for time series
classification. In Innovations in Information Technology (IIT), 2013 9th International
Conference on. IEEE, 42–47.
[4] François Chollet et al. 2015. Keras. https://github.com/keras-team/keras. (2015).
[5] Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and
Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances
in Neural Information Processing Systems. 577–585.
[6] Richard HR Hahnloser, Rahul Sarpeshkar, Misha A Mahowald, Rodney J Douglas,
and H Sebastian Seung. 2000. Digital selection and analogue amplification coexist
in a cortex-inspired silicon circuit. Nature 405, 6789 (2000), 947.
[7] J. D. Hunter. 2007. Matplotlib: A 2D graphics environment. Computing In Science
& Engineering 9, 3 (2007), 90–95. https://doi.org/10.1109/MCSE.2007.55
[8] Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimiza-
tion. arXiv preprint arXiv:1412.6980 (2014).
[9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classifica-
tion with deep convolutional neural networks. In Advances in neural information