Comparing Techniques Such as Dropout and Test-Set Cross-Validation to Reduce Overfitting, and
Modifying Class Thresholds to Reduce Class Overlap, to Improve Classification Using a
Deeper Neural Network
Priyanka Jambulingam Amuthalakshmi
Abstract. In this paper, I discuss an artificial deeper neural network (DNN) implemented to solve a classification problem
based on a real-world oil-well dataset. The oil-well dataset has many overlapping classes, which makes classification difficult [1].
In this paper I aim to reduce both overfitting and the overlapping-classes problem. To minimize overfitting, I compare the performance
of techniques such as dropout and test-set cross-validation in my deeper neural network and choose the technique that best suits the
dataset. To minimize the overlapping-classes problem, I describe a method of modifying the balance of “False Positives” and “False
Negatives” produced by the deeper neural network in classification. The deeper neural network is implemented in
PyTorch/Python. First, a simple neural network is designed; it is then extended to a deeper neural network by an appropriate
selection of the number of hidden layers. I also discuss the reasons for choosing the specific loss function, activation function
and optimizer for the DNN, as well as the reason for selecting a particular number of hidden layers for the deeper neural
network. The performance of a deeper neural network with 3 hidden layers and a simple neural network with one hidden
layer, with the same techniques implemented, is also compared. My deeper neural network model shows results similar to a
research paper that uses the same dataset.
Keywords: Dropout, Test-set cross-validation, Deeper neural network, Overlapping classes
1. Introduction
The deeper neural network I implemented aims to solve a classification problem with 10 inputs (features) and three outputs (targets).
I use a dataset obtained from 3 oil-wells in an oil reservoir in the North West Shelf, offshore Western Australia [1]. I chose
this dataset because I can evaluate and compare the performance of my deeper neural network (DNN) on data from 3 oil-wells,
which also makes it easier to generalize my DNN to a wider range of data. Additionally, the problem is related to natural resources,
which is an area of interest to me.
The dataset contains a set of rock samples from the three wells that were manually classified by geologists, based on 10 different
features, into the quality classes “Fracture”, “Ok” and “Good” [1]. The objective is to model a deeper neural network for
classifying the quality of a rock sample as “Fracture”, “Ok” or “Good” using these 10 inputs (types of well measurement). The
10 features used as inputs are GR (Gamma Ray), RDEV (Deep Resistivity), RMEV (Shallow Resistivity), RXO (Flushed
Zone Resistivity), RHOB (Bulk Density), NPHI (Neutron Porosity), PEF (Photoelectric Factor), DT (Sonic Travel Time),
porosity (PHI) and permeability (logK).
A considerable problem with this dataset is that it has many overlapping classes, which affects the accuracy of the DNN's predictions [1],
and the small size of the dataset can lead to overfitting. Solving these problems is therefore an important part of building the deeper neural network.
To build an enhanced deeper neural network for this classification problem, I first modified the raw data from the dataset to be
compatible as input to the DNN. Secondly, I designed a basic three-layer neural network with one input layer, one hidden layer and one
output layer, and tested its performance with different specifications until the output accuracy was acceptable
for a simple neural network. Thirdly, I extended the simple neural network to a deeper neural network by experimenting with different
numbers of hidden layers, and finally chose the number of layers with which the deep neural network gave the best accuracy. The
same specifications used for the simple neural network were also used for the deeper neural network, because the DNN also performed
best with those specifications. I applied test-set cross-validation and the dropout technique to reduce overfitting and compared their
performance against the basic deeper neural network. I chose the dropout technique because it showed a better result than test-set
cross-validation in my deeper neural network. Next, I applied a simple method of modifying the threshold of each class to achieve a
balance between “False Positives” and “False Negatives” in classification. I also use a research paper, “A Clustering Assisted
Method for Fuzzy Rule Extraction and Pattern Classification” by H. Kuo, T. Gedeon, and P. Wong [1], which uses the same dataset, to
highlight the trade-off between the performance of my DNN and their neural network. Finally, the same techniques were also applied
to the simple neural network with one hidden layer to compare the performance of the simple neural network and the deeper
neural network.
The research problems of this paper are:
1. Designing a simple neural network with specifications that produce an acceptable accuracy.
2. Extending the simple neural network to a deeper neural network by carefully selecting the number of hidden layers.
3. Comparing the performance of the DNN with different techniques to reduce overfitting on this dataset, and choosing the most suitable
technique for this dataset.
4. Reducing the overlapping-classes problem by modifying the threshold of each class to achieve a balance between “False
Positives” and “False Negatives” in classification.
5. Comparing the performance of the simple neural network and the deep neural network with the same techniques implemented, as
mentioned above.
These classifications are important in the oil-well industry because they are used to determine the productivity of an oil-well and the
performance of the reservoir, as well as the capacity of the oil-well to produce oil.
2. Method
2.1 Data-Preprocessing:
All data are normalised between 0 and 1. The data is divided into training and test sets for each well in the same way as in the
research paper mentioned above [1]. The train/test proportions are not modified, for ease of performance comparison with
that paper [1]. The data was shuffled, since it was sorted by target class in the initial dataset. Shuffling data improves the
predictive performance of a deeper neural network and its quality [2], and also helps the DNN generalize without overfitting, to
some extent [2]. The data is split into inputs and targets: the first 10 columns are inputs and the last column is the target. Tensors were
created in PyTorch to hold the inputs and targets. All string target values are converted into numeric values for use in the DNN.
The conversions are: class ‘Frac’: 0, class ‘OK’: 1, class ‘Good’: 2.
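A minimal sketch of this preprocessing is given below. The file name "well1.csv" and the exact file format are assumptions for illustration; the real dataset has the 10 well measurements in the first 10 columns and the class label in the last column.

```python
import pandas as pd
import torch

# Load one well's data (file name assumed for illustration).
data = pd.read_csv("well1.csv")

# Map the string class labels to integers as described above.
label_map = {"Frac": 0, "OK": 1, "Good": 2}
data.iloc[:, -1] = data.iloc[:, -1].map(label_map)

# Shuffle, since the raw file is sorted by target class.
data = data.sample(frac=1, random_state=0).reset_index(drop=True)

# Min-max normalise each of the 10 input columns into [0, 1].
features = data.iloc[:, :10].astype(float)
features = (features - features.min()) / (features.max() - features.min())

# Build the tensors PyTorch expects: float inputs, long-integer targets.
X = torch.tensor(features.values, dtype=torch.float32)
y = torch.tensor(data.iloc[:, -1].values.astype(int), dtype=torch.long)
```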
2.2 Designing The Simple Neural Network:
Following Gedeon, Milne and Skidmore's method in their research paper “Classifying Dry Sclerophyll Forest From
Augmented Satellite Data: Comparing Neural Network, Decision Tree & Maximum Likelihood” [3], different network topologies
were tested with different learning rates, numbers of hidden neurons, numbers of hidden layers and numbers
of epochs. I finally decided on a three-layer neural network (one hidden layer) with 20 hidden neurons, 10 input neurons and 3
output neurons. I set the learning rate to 0.01 and the number of epochs to 1600 because my neural network performed best at those values.
More than 1600 epochs led to overfitting and thus sharply deteriorated the performance of the NN, while fewer than 1600 epochs led
to underfitting and resulted in low performance. Similarly, with learning rates higher than 0.01 the NN was unable to reach the region
of lowest error due to drastic oscillation, while with learning rates lower than 0.01 the NN got stuck in local
minima.
The input layer with 10 input neurons receives the 10 features that are split out during data-preprocessing. The three outputs are the
three classes “Frac”, “Ok” and “Good”, and the output layer gives predictions for these three classes. My NN implements a standard,
simple backpropagation model. The network uses cross-entropy loss as the loss function, mini-batch gradient descent as the optimizer and
sigmoid as the activation function for the hidden neurons; the reasons for these choices are discussed below.
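A sketch of this architecture in PyTorch is shown below, with the layer sizes described above.

```python
import torch
import torch.nn as nn

# The three-layer network described above: 10 inputs, one hidden
# layer of 20 sigmoid units, and 3 output neurons.
class SimpleNet(nn.Module):
    def __init__(self, n_in=10, n_hidden=20, n_out=3):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        # Sigmoid activation on the hidden layer only; the raw output
        # scores go straight to CrossEntropyLoss, which applies softmax.
        h = torch.sigmoid(self.hidden(x))
        return self.out(h)
```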
An activation function defines the output of a node of a neural network given a set of inputs. The main reason for using the sigmoid
function is that it is nonlinear and gives an output value in the range of 0 to 1 [4]. It is especially used in neural networks where we have to
predict a probability as output [4]. This makes sigmoid functions easy to understand and work with.
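For reference, the sigmoid function is

\sigma(x) = \frac{1}{1 + e^{-x}},

which maps any real-valued input into the interval (0, 1).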
Gradient descent is an optimization algorithm used for finding the weights of a neural network. It uses the errors in the model's
predictions on training data to update the model in order to reduce future error. Mini-batch gradient descent divides the training
data into smaller batches, calculates the errors and updates the model weights batch by batch [5]. I chose mini-batch gradient descent
because its update frequency is higher, which allows more robust convergence and helps avoid local minima. A function that
is used to calculate the “loss” of a neural network is called a loss function. In my neural network, I use the cross-entropy loss function
because cross-entropy performs better than other common loss functions, such as squared error, on classification problems [6].
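A training-loop sketch with the settings reported above (cross-entropy loss, plain SGD over mini-batches, learning rate 0.01, 1600 epochs) is given below. The batch size of 16 is an assumption for illustration; X and y are the tensors built during data-preprocessing.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

for epoch in range(1600):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)  # cross-entropy on the raw scores
        loss.backward()                  # backpropagate the errors
        optimizer.step()                 # one mini-batch gradient step
```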
2.3 Extending the simple neural network to a DNN:
The simple neural network was then extended to a deeper neural network. This extension was done, rather than directly designing
a DNN, because in this paper both the neural network and the deep neural network are implemented with the
same techniques discussed above and their performance is compared, so it is necessary to design both the neural network and the DNN with
their best specifications.
Deeper neural networks are more complex neural networks, i.e., neural networks with more than two layers, designed with a
deep learning architecture. The number of hidden layers with which the DNN gave the maximum performance
was chosen by comparing the performance of the DNN with different numbers of hidden layers; DNNs with up to 8 hidden layers
were tested. The performance of the DNN increased up to 3 hidden layers and started decreasing after that: the DNN with 3 hidden
layers gave better performance than the DNN with 2 hidden layers, but beyond 3 hidden layers the accuracy of the DNN
decreased monotonically. Since the DNN with 3 hidden layers gave the best performance, it was chosen as the base DNN. Similarly,
different numbers of hidden neurons in the range 10 to 30 were tested with the DNN; the DNN with 20 hidden neurons in each hidden
layer gave the maximum accuracy and was therefore chosen. The learning rate and the number of epochs were kept at 0.01 and
1600 respectively, as for the simple neural network, since the DNN's best performance was also obtained with those
specifications.
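A minimal sketch of the chosen architecture (three hidden layers of 20 sigmoid units each) is given below.

```python
import torch
import torch.nn as nn

# The deeper network with the chosen configuration:
# three hidden layers of 20 sigmoid units each.
class DeepNet(nn.Module):
    def __init__(self, n_in=10, n_hidden=20, n_out=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(n_in, n_hidden),
             nn.Linear(n_hidden, n_hidden),
             nn.Linear(n_hidden, n_hidden)])
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        for layer in self.layers:
            x = torch.sigmoid(layer(x))
        return self.out(x)
```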
2.4 Reducing Overfitting: Dropout And Test-Set Cross-Validation:
One of the common problems that occurs during DNN training is overfitting. Overfitting happens when a network learns so much
detail and noise from the training dataset that it negatively impacts the performance of the network on new data. With
a small dataset like the oil-well data, overfitting is especially difficult to avoid. To solve the problem of overfitting, the flexibility
of the network model should be constrained, but too little flexibility also reduces the performance of the network. Some good techniques to
reduce overfitting are L1 regularization, L2 regularization, dropout and cross-validation.
I tried the dropout and cross-validation techniques with both the simple neural network and the DNN to avoid overfitting. Overfitting is a critical
problem in neural networks, and the best solution for preventing it is to use more training data [7]; with a small dataset
like the oil-well data, the next best step is to use regularization techniques and cross-validation [7].
Dropout is a technique that was invented to address the problem of overfitting [7]. The key idea is that during training, some units are
randomly dropped from the network. Dropping out a unit/node means removing it temporarily from the network along with its
incoming and outgoing connections. The nodes are dropped randomly with a specified probability. Dropout is applied
to the network only while training; in PyTorch this is achieved by putting the network into training mode (network.train()) while training
and evaluation mode (network.eval()) while testing [7]. In my neural network and DNN each unit is dropped with a fixed
probability of 0.2. Classic values of the dropout rate lie in the range 0.2 to 0.5 [7]; since there is a possibility of underfitting when choosing a
higher rate (0.5), I chose 0.2 to avoid underfitting.
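The sketch below shows the dropout variant of the deeper network; nn.Dropout(p=0.2) zeroes each hidden unit with probability 0.2 during training and is a no-op in evaluation mode.

```python
import torch
import torch.nn as nn

class DeepNetDropout(nn.Module):
    def __init__(self, n_in=10, n_hidden=20, n_out=3, p=0.2):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(n_in, n_hidden),
             nn.Linear(n_hidden, n_hidden),
             nn.Linear(n_hidden, n_hidden)])
        self.drop = nn.Dropout(p)  # drop each hidden unit with probability 0.2
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        for layer in self.layers:
            x = self.drop(torch.sigmoid(layer(x)))
        return self.out(x)

model = DeepNetDropout()
model.train()  # enable dropout while training
# ... training loop as before ...
model.eval()   # disable dropout before evaluating on the test set
```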
Cross-validation is a powerful technique to reduce overfitting, and can also be seen as an effective way to gauge the quality of
the deeper neural network. The idea is to split the training data into training and validation sets and to use these to tune the model
[8]. There are different types of cross-validation based on how the validation set is obtained from the training data;
some popular techniques are k-fold cross-validation, leave-one-out cross-validation and test-set cross-
validation. I use test-set cross-validation because it is cheaper and simpler than the other cross-validation techniques. In test-set
cross-validation, 30 percent of the training data is randomly chosen as the validation set and the accuracy on the validation
set is calculated at every epoch. The DNN is trained until the accuracy on the validation set has reached its highest value.
Since the accuracy on the validation set oscillates, I stop training my deeper neural network if
the validation accuracy has not improved for 100 epochs.
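A sketch of this early-stopping procedure is given below. For brevity it uses full-batch updates rather than mini-batches; X and y are the tensors built during data-preprocessing and DeepNet is the model sketched earlier.

```python
import torch
import torch.nn as nn

model = DeepNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Hold out 30% of the training data as a validation set.
n_val = int(0.3 * len(X))
perm = torch.randperm(len(X))
X_val, y_val = X[perm[:n_val]], y[perm[:n_val]]
X_tr, y_tr = X[perm[n_val:]], y[perm[n_val:]]

best_acc, stale = 0.0, 0
for epoch in range(1600):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_tr), y_tr)
    loss.backward()
    optimizer.step()

    # Measure accuracy on the held-out validation set.
    model.eval()
    with torch.no_grad():
        acc = (model(X_val).argmax(dim=1) == y_val).float().mean().item()
    if acc > best_acc:
        best_acc, stale = acc, 0
    else:
        stale += 1
    if stale >= 100:  # no improvement for 100 epochs: stop early
        break
```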
2.5 Reducing The Overlapping-Classes Problem By Modifying Thresholds:
One of the main complications of this dataset is that it has highly overlapping classes [1]. From a series of experiments, I conclude
that the regions overlap mostly between the “Ok” and “Good” classes. I carefully evaluated the confusion matrix obtained for
each oil-well and noticed a lot of “False Positives” and “False Negatives” distributed between the “Ok” and “Good”
columns.
To overcome these “False Positives” and “False Negatives” I used a simple technique from Gedeon, Milne and Skidmore's
research paper “Classifying Dry Sclerophyll Forest From Augmented Satellite Data: Comparing Neural Network, Decision
Tree & Maximum Likelihood” [3]. They vary the threshold between 0.4 and 0.75 in a two-class classification to minimise false
positive results. I decided to implement a similar technique for the three-class classification. I first applied a softmax to my 3 output values
to obtain a probability for each class, and then manually decided at what values to predict each class. I varied the threshold of the “Good”
class from 0.2 to 0.3 and compared the performance, concentrating mainly on the “Good” class since the overlap between “Good” and
“OK” was predominant. When the threshold of the “Good” class was varied from 0.2 to 0.3, the accuracy was higher with a threshold
of 0.3. This means I predict the class “Good” if the probability of that class is greater
than 0.3; otherwise, I predict the class with the maximum probability. A threshold of 0.2 gave much worse
results than the simple neural network and the plain DNN, so a “Good”-class threshold of 0.3 was chosen.
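The sketch below implements this decision rule: softmax the three output scores, predict “Good” (class 2) whenever its probability exceeds 0.3, and otherwise fall back to the arg-max class.

```python
import torch
import torch.nn.functional as F

def predict_with_threshold(model, x, good_class=2, threshold=0.3):
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)  # probabilities for the 3 classes
    preds = probs.argmax(dim=1)             # default: arg-max class
    # Override with "Good" wherever its probability exceeds the threshold.
    preds[probs[:, good_class] > threshold] = good_class
    return preds
```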
2.6 Implementing The Same Techniques On The Simple Neural Network (One Hidden Layer) For Performance Comparison:
To evaluate the performance of the deeper neural network, I decided to compare its performance with the simple neural network. The
techniques above (dropout, test-set cross-validation and threshold modification) were also implemented on the simple neural network:
a dropout rate of 0.2 and test-set cross-validation were applied to the simpler neural network, and the threshold of the “Good” class was
modified in the same way.
3. Results And Discussion
3.1 Overall Performance:
Some techniques improved performance, while others had a negative effect. Overall, the deeper neural network showed better
performance than the simple neural network in every case. The accuracy on the oil-well 1 dataset is higher for all the techniques,
in both the deeper neural network and the simple neural network, than on the other datasets; this may be because all the techniques
used work well for the oil-well 1 dataset. That said, the accuracy on the oil-well 2 dataset is the lowest for all the techniques. The
table below shows the performance of the networks with the different techniques implemented:
[Tables: classification accuracy per oil-well for the simple neural network and the deeper neural network. Columns: Oil-Well; Simple/Deeper Neural Network; With Test-Set Cross-Validation; With Drop-out; Modifying Threshold of Each Class (Final Enhanced); Research Paper Result. The numeric entries are not recoverable here.]
The deeper neural network performs better than the simple neural network on all the datasets; the positive effect of adding
more layers to a neural network is obvious from this result. That said, adding more layers can also spoil the performance of deeper
neural networks: for example, in my DNN the performance decreased when more than three hidden layers were added.
3.2 Effect Of Test-Set Cross-Validation:
The table above shows that test-set cross-validation does not improve the performance of either the NN or the DNN. This is because
the advantage of cross-validation is offset by the 30 percent of the training data lost to the validation set, an impact
that is more prominent because of the small dataset available. This experiment highlights the importance of sufficient
data for neural network performance: the more labelled data available, the better the performance of a deeper neural
network [7].
3.3 Effect Of Dropout:
As shown in the table above, adding dropout gave better results than test-set cross-validation for both networks. This is
because dropout reduces overfitting without losing data: each node is dropped with equal probability at every epoch, so no
training data is sacrificed. I chose dropout over test-set cross-validation for my networks because I can use all the data for training, which
leads to better performance, whereas I lose 30% of the training data with test-set cross-validation. The DNN with dropout performs better
than the simple NN with dropout.
3.4 Confirming And Reducing The Class Overlap:
To confirm the presence of overlapping classes, I carried out an additional two-class experiment.
Two-class classifications were carried out with the simple neural network to separate “Frac” from (“Ok” or “Good”), “Ok” from
(“Frac” or “Good”), and “Good” from (“Frac” or “Ok”), as sketched below. A testing accuracy of over 90 percent was obtained
for the “Frac” versus (“Ok” or “Good”) classification, but only about 60 percent for the other two. In my view, this confirms the
overlap between the “Ok” and “Good” classes.
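A minimal sketch of the one-vs-rest relabelling used for these confirmatory two-class experiments; y is the integer target tensor built during data-preprocessing (0 = “Frac”, 1 = “OK”, 2 = “Good”).

```python
import torch

y_frac_vs_rest = (y == 0).long()  # "Frac" vs ("OK" or "Good")
y_ok_vs_rest   = (y == 1).long()  # "OK" vs ("Frac" or "Good")
y_good_vs_rest = (y == 2).long()  # "Good" vs ("Frac" or "OK")
# Each relabelled target can then be used to train the simple network
# with 2 output neurons instead of 3.
```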
The threshold-modification experiment was carried out on a network with the dropout technique implemented. The results were better
than those of the plain simple neural network, but for some datasets were lower than the performance of the NN and DNN with only
the dropout technique. This is because the overlap between the classes is too complicated to be fully rectified by thresholding
each class.
3.5 Performance Comparison Between Neural Network And Deeper Neural Network:
On the whole, with all the chosen techniques implemented, the deeper neural network showed better results than the simple neural network.
From this we can infer that a deeper neural network is better suited to this dataset than a simple one-hidden-layer neural network.
Comparing my deeper neural network with the research paper that uses the same dataset, with the same class ratios, and a fuzzy
classification method [1]: although the deeper neural network with only the dropout technique showed better results than the deeper
neural network with dropout and threshold modification, the final enhanced network still showed results similar to those of the
research paper.
References:
[1] Kuo, H., Gedeon, T., Wong, P.: A Clustering Assisted Method for Fuzzy Rule Extraction and Pattern Classification. In: 6th International
Conference on Neural Information Processing (1999)
[2] Weninger, F., Bergmann, J., Schuller, B.: Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural
Network Toolkit (2015)
[3] Gedeon, T., Skidmore, A.K., Milne, L.K.: Classifying Dry Sclerophyll Forest From Augmented Satellite Data: Comparing
Neural Network, Decision Tree & Maximum Likelihood.
[4] Cybenko, G.: Approximation by Superpositions of a Sigmoidal Function, pp. 303-314. Mathematics of Control, Signals and
Systems (1989)
[5] Li, M., Zhang, T., Chen, Y., Smola, A.J.: Efficient Mini-batch Training for Stochastic Optimization (2014)
[6] Kline, D.M., Berardi, V.L.: Revisiting Squared-Error and Cross-Entropy Functions for Training Neural Network Classifiers, pp.
310-318. Neural Computing and Applications (2005)
[7] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A Simple Way to Prevent
Neural Networks from Overfitting. Journal of Machine Learning Research 15 (2014)
[8] Forman, G., Scholz, M.: Apples-to-Apples in Cross-Validation Studies: Pitfalls in Classifier Performance Measurement,
pp. 49-57 (2010)