Computer Science Coursework: DeepLearning Library
Aamir Soni
Contents

1 Analysis
  1.1 Introduction
  1.2 Application of AI/ML libraries
  1.3 Machine Learning Techniques
  1.4 Neural Networks
  1.5 Methods of Research
    1.5.1 Coursera
    1.5.2 Khan Academy
    1.5.3 Medium
  1.6 Current Systems
    1.6.1 Tensorflow
    1.6.2 Pytorch and Theano
  1.7 Target Audience and Clients
  1.8 Aims & Objectives
    1.8.1 My Core Aims
    1.8.2 Desired Objectives
2 Design
  2.1 Overview
    2.1.1 Algorithms
      2.1.1.1 Neural Networks
        2.1.1.1.1 Forward Propagation
        2.1.1.1.2 Back-propagation
        2.1.1.1.3 Loss-Functions
          2.1.1.1.3.1 Squared-error
          2.1.1.1.3.2 Soft-max cost function
      2.1.1.2 Optimisers
        2.1.1.2.1 Momentum Optimiser
        2.1.1.2.2 RMS Optimiser
        2.1.1.2.3 Adaptive Moment Estimation (Adam) Optimiser
      2.1.1.3 Matrix Operations
        2.1.1.3.1 Matrix Multiplication
        2.1.1.3.2 Convolution
        2.1.1.3.3 Matrix Transpose
        2.1.1.3.4 Basic Matrix Operations
  2.2 Data-Structures & Diagrams
  2.3 User-Interface
3 Technical Solution
  3.1 Base Classes Used
  3.2 Techniques Used
    3.2.1 Linear Algebra Techniques Used
    3.2.2 Techniques - Neural Networks
    3.2.3 Optimisation Techniques Used
4 Testing
  4.1 Non-erroneous Tests
    4.1.1 Matrix Class
    4.1.2 Volume Class
    4.1.3 Mapping Class
    4.1.4 Net Class
    4.1.5 Neural Networks Test
    4.1.6 Screen shots
  4.2 Erroneous Testing
    4.2.1 Matrix Class
    4.2.2 Volume Class
5 Evaluation
  5.1 Meeting the Objectives
    5.1.1 Core
    5.1.2 Extension
  5.2 Re-Interviewing Clients
  5.3 Revisiting the problem
6 Code Listings
  6.1 NeuralDot.Tensor
  6.2 NeuralDot.Matrix
  6.3 NeuralDot.Volume
  6.4 NeuralDot.Layer
  6.5 NeuralDot.Dense
  6.6 NeuralDot.Conv
  6.7 NeuralDot.Reshape
  6.8 NeuralDot.MaxPool
  6.9 NeuralDot.Net
  6.10 NeuralDot.Mapping
  6.11 NeuralDot.netData
  6.12 NeuralDot.Optimiser
  6.13 NeuralDot.GradientDescentOptimiser
  6.14 NeuralDot.AdvancedOptimisers
  6.15 NeuralDot.Momentum
  6.16 NeuralDot.RMS
  6.17 NeuralDot.ADAM
Analysis
1.1 Introduction
Machine Learning has recently become a very popular field in Computer Science thanks to an area known as "Deep Learning", which has allowed researchers to achieve record-breaking results in tasks such as computer vision, natural language processing, drug synthesis, health care, financial trading and physical simulation, many of which were difficult to tackle before the rise of Deep Learning. There are therefore many libraries available online for researchers, AI scientists and students with the relevant knowledge to carry out their tasks; however, for students who are passionate but have very limited knowledge of AI, learning how to use a machine learning library can be a daunting experience.
The hardest part, however, is the theory behind the machine learning that is required to make full use of such a library: the mathematics behind the algorithms can require graduate-level knowledge, causing many students to give up before building their very first ML model.
1.3 Machine Learning Techniques
All machine learning models work in the same way, with the only difference being the inference process. Every machine learning model has two different phases: the training phase and the inference phase.
Before progressing to the training phase, a machine learning engineer should have a training set. A training set is split into two different parts: one part contains the examples used to make predictions, while the other contains the labels for those examples. As an example, consider home face recognition: say you wanted to make an ML model that can detect faces inside a house. The training data would in this case be the face images of the people that live in the house, and the labels would be the name attached to each face in the training set.
The training phase then consists of the following steps:
1. The training examples are passed through the model to produce predictions (the inference step).
2. The outputs of the inference are then compared with the labels of the training set to find the error for each training iteration using a cost function.
3. The parameters of the ML model are updated using a back-propagation algorithm.
This training phase is then repeated many times until the error is less than the acceptable bound that was set.
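As a toy illustration of this training loop, the self-contained sketch below (not NeuralDot code) repeatedly nudges a single parameter to reduce a squared error until the error falls below an acceptable bound; real models update many parameters at once using back-propagation.

Module TrainingLoopSketch
    Sub Main()
        Dim w As Double = 0.0                                  'the single model parameter
        Dim x As Double = 2.0                                  'one training example
        Dim target As Double = 6.0                             'its label
        Dim learningRate As Double = 0.1
        Dim errorValue As Double = Double.MaxValue
        While errorValue > 0.0001
            Dim prediction As Double = w * x                   'inference phase
            errorValue = 0.5 * (prediction - target) ^ 2       'cost function (squared error)
            w -= learningRate * (prediction - target) * x      'parameter update (gradient step)
        End While
        Console.WriteLine("Learned w = " & w)                  'approaches 3, since 3 * 2 = 6
    End Sub
End Module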
1.5.1 Coursera
I will be using Coursera as it will allow me to gain a better and deeper understanding of machine learning. Currently, I am taking a course in deep learning which will give me the foundations and skills required to understand neural networks and also expose me to the different variations of NNs, such as convolutional networks, recurrent neural networks and LSTM networks. By being exposed to these new algorithms, I will be able to improve the functionality of my library, making it more flexible to use, as there are constant advancements being made in the field of AI. By making the library more flexible, the user won't feel restricted when experimenting with new ideas; this could include a different cost function, activation function or even a different layer.
1.5.3 Medium
I will also be using medium as a method of research, as there have been many articles on medium that explain the
different machine learning algorithms. I will be using Medium as it will expose me to the different machine learning
algorithms and approaches people have taken. This will help me to learn from other people’s techniques and the
approaches they took when encountering a problem.
1.6.1 Tensorflow
Currently, Tensorflow is the leading AI library and is aimed primarily at researchers and industrial use. Tensorflow, while being one of the best libraries for machine learning in general, requires extensive knowledge of everything from linear algebra to optimisation. This makes it very limiting for many students, as many of these topics are not studied at A level, and all they want is an easy-to-use machine learning library that allows them to experiment with their ML models.
In addition, Tensorflow also lacks user-friendliness, as making even a simple Neural Network requires a lot of work. This is because Tensorflow's main purpose isn't building NNs but allowing the user to create a computational graph that they can experiment with. This means that when creating a simple neural network, the user needs to set up the dimensions of the matrices and the biases being used, when in fact all that should be required is a single number representing the number of neurons in that layer. After this, the user also needs to mathematically define the cost function being used and create placeholders for the training set and the test set. All this makes it very difficult for a user who isn't a confident programmer and simply wants to experiment with simple NNs.
1.7 Target Audience and Clients
One student I interviewed, Nitish Bala, said: "The library should be easy to use by making sure many of the parameters, such as the initial weights, are already defined, and creating a NN shouldn't require someone to know all the maths behind it. The library should also make it easy for users to add convolutional layers and define their own layers, which can also be trained by a pre-defined gradient descent optimiser."
Another student, Basim Khajwal, said: "The library should allow the user to create their own back-propagation algorithms and also allow the user to experiment with different models. The library should be easy to use by limiting the amount of setting up required by the user, such as matrix sizes and defining cost functions from their definitions. Finally, users should be allowed to create many NNs at the same time to compare the performance of one model against another."
A third student, Taha Rind, said: "The library shouldn't be too complicated, and experimenting with different parameters should be easy to do. Adding a dense layer should at minimum require the user to input the number of neurons in the layer and the layer activation, and there should be some advanced optimisation algorithms, such as momentum, to train these dense networks. Finally, the library should also allow the user to see the parameters learnt and the gradients of any dense layer."
A fourth student I interviewed, Mujahid Mamaniat, said: "The library must offer easy manipulation of matrices by making sure many of the required functions, such as rotation, reshaping and adding/removing a column in a matrix, are already implemented. The library must also allow the user to manipulate images in RGB format, which may be done through the use of lists of matrices. Finally, making a neural network should require little effort and should not be difficult."
Finally, a fifth student I interviewed, Jamie Stirling, said: "The library should have extra focus on dense nets, as many beginners do not understand how conv-layers work. I remember when I tried experimenting with my first ML library: it was difficult to use because it supported many different types of layers, which over-complicated it, as I didn't understand most of what it offered; all I knew about was dense nets. Therefore, making a dense net should be incredibly easy, as that is all many beginners in AI know about. One way in which it could be made easy would be by making many of the parameters optional, such as the initial weights. By having many optional parameters, I think the user would worry less about whether they have implemented the net correctly."
From these interviews it is clear that my target audience is looking for an easy-to-use machine learning library that allows them to experiment easily with different configurations of Neural Networks. I will be focusing more on dense nets, as it is clear that many beginners do not have a wide knowledge of NNs. Therefore, I will try to make it as easy as possible to create a dense net, and also add some extra functionality to dense nets, making them more useful for the user to work with. This extra functionality will include allowing the user to view the gradients of the dense layers, enabling users to view how the gradients change as the network learns. This will allow the user to develop an insight into NNs, easing the way for beginners in AI. Furthermore, I will also be including conv-layers, as many students will quickly learn the basics of NNs and will want to move on to complex data such as images or sound; including conv-layers will therefore allow users to experiment with all kinds of data. However, my main focus will be on dense nets, as many beginners will not have the necessary skills to use conv nets and will just want to experiment with dense layers due to their limited knowledge. Finally, I will also be conducting interviews during the making of the project, repeatedly asking users which parts of the library can still be improved.
1. The user can manipulate matrices by joining two matrices together, splitting a given matrix, iterating through the columns or values of a matrix, and applying a one-hot encoding function to a given matrix (a usage sketch follows this list).
2. The user can multiply, add, subtract and transpose a given matrix.
3. The user can multiply, add, subtract and transpose a list of matrices together, i.e. volumes.
4. The user can create their own Dense Neural Networks.
5. The user can choose which activation function they would like to use and can also experiment with their own activation functions.
6. The user can tune the hyperparameters, such as the learning rate, the number of neurons in a layer, the number of layers and the loss function.
7. The user can train the network using a back-propagation algorithm.
8. The gradient descent algorithms should implement stochastic gradient descent, batch gradient descent as well as mini-batch gradient descent.
9. The user can view the weights of the network, i.e. the learned parameters of the network.
10. The user can view the gradients for a specific layer in a dense net, given the gradients for the layer above.
11. The user can add convolutional layers to their network, which they can tune by changing the hyperparameters such as the kernel dimensions, layer activation and kernel strides.
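Below is a usage sketch of the matrix-manipulation objectives, assuming the Matrix constructors and methods shown later in the class diagram of section 2.2 (Matrix(Double(,)), transpose, join, oneHot, print); the exact signatures may differ in the final library.

Imports NeuralDot

Module ObjectivesSketch
    Sub Main()
        Dim a As New Matrix(New Double(,) {{1, 2}, {3, 4}})    '2x2 matrix built from literal values
        Dim at As Matrix = a.transpose()                       'objective 2: transpose a matrix
        Dim joined As Matrix = Matrix.join(a, at, 2)           'objective 1: join two matrices together
        Dim labels As New Matrix(New Double(,) {{0, 2, 1}})    'class indices for three examples
        Dim encoded As Matrix = labels.oneHot(3)               'objective 1: one-hot encode the labels
        joined.print()
        encoded.print()
    End Sub
End Module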
Design
2.1 Overview
For the aims to be met, it is important to have a clear idea of what needs to be done in order to meet the objectives set. In this section, we will discuss the core algorithms, the core data-structures and the user-interface of the library. The library will from now on be referred to as the NeuralDot library.
2.1.1 Algorithms
The main algorithms that will be used in the making of the NeuralDot library are as follows:
1. Neural Networks
2. Optimisation
3. Matrix Operations
I will be discussing each of these algorithms in detail separately, along with their respective pseudo-code.
Throughout this paper, we will be using the standard notation to avoid any unnecessary confusion:

$n_x$ : input size
$n_y$ : output size
$L$ : number of layers in the Neural Network
$n_h^{[l]}$ : number of hidden units in layer $l$
$m$ : number of examples in the data set
$X \in \mathbb{R}^{n_x \times m}$ : input matrix, can also be referred to as $a^{[0]}$
$x^{(i)} \in \mathbb{R}^{n_x}$ : $i$th example represented as a column vector
$Y \in \mathbb{R}^{n_y \times m}$ : label matrix for $X$
$y^{(i)} \in \mathbb{R}^{n_y}$ : output label for the $i$th example
$W^{[l]} \in \mathbb{R}^{n_h^{[l]} \times n_h^{[l-1]}}$ : weight matrix in layer $l$
$b^{[l]} \in \mathbb{R}^{n_h^{[l]}}$ : bias vector in layer $l$
$z^{[l]} \in \mathbb{R}^{n_h^{[l]}}$ : product vector in layer $l$
$g^{[l]}$ : activation function in layer $l$
$a^{[l]} \in \mathbb{R}^{n_h^{[l]}}$ : activation vector in layer $l$
$\hat{y} \in \mathbb{R}^{n_y}$ : predicted output vector, can also be denoted as $a^{[L]}$
Here is an example of a simple Neural Network, with the notation included.
This network is called a 2-layer network, as it has one hidden layer and one output layer.
This process is then repeated through every layer until the final layer, where the error is calculated. Using the notation we have established, we can now write this mathematically as:
\[ z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = g^{[l]}(z^{[l]}) \]
Forward propagation for the convolutional layers is the same as for the dense layers, the only difference being that instead of matrix multiplication, the operator is now a cross-correlation between 3D volumes. In pseudo-code this can be written as:
Algorithm 2 Convolution Forward Propagation Algorithm
1: net ← network being used to make the prediction
2: x ← mini-batch
3: stridesx ← user defined
4: stridesy ← user defined
5: padding ← user defined
6:
7: for each layer in net do
8:   x = Volume.conv2d(layer.filter, x, stridesx, stridesy, padding) + layer.b   (applying a 2D convolution, using x as input and the layer's filter as the kernel, with strides = (stridesx, stridesy))
9:   x = layer.act(x)
return x
2.1.1.1.2 Back-propagation
Back-propagation is another algorithm with many variations, which will be discussed further in section 2.1.1.2. Although there are many techniques for back-propagation, they all have the same goal: reduce the error of the network. The essence of back-propagation is therefore to find the weight matrices that approximate a given function to an appropriate degree of accuracy in a given interval. Back-propagation is a recursive algorithm that uses memoization to propagate backwards through the network and find the derivative of each weight w.r.t. (with respect to) a cost. The cost being used is set before training and is different for each task. The notation used in explaining back-propagation is $\delta^{[l]} = \frac{\partial E}{\partial z^{[l]}}$, the error term of layer $l$, where $E$ denotes the cost.
The equations that will be used for back-propagation for a Dense layer are:
\[ \delta^{[l]} = \left( (W^{[l+1]})^T \delta^{[l+1]} \right) \odot g'^{[l]}(z^{[l]}) \]
\[ \frac{\partial E}{\partial W^{[l]}} = \delta^{[l]} (a^{[l-1]})^T \]
\[ \frac{\partial E}{\partial b^{[l]}} = \delta^{[l]} \]
In summation form the first of these may be written as:
\[ \delta^{[l]}_k = \sum_m \delta^{[l+1]}_m w^{[l+1]}_{m,k} \, g'^{[l]}(z^{[l]}_k) \]
The equations that will be used for back-propagation for a Conv layer are:
\[ \delta^{[l]}_{x,y} = \delta^{[l+1]}_{x,y} * \mathrm{rot}_{180^{\circ}}(w^{[l+1]}_{x,y}) \, f'(a^{[l]}_{x,y}) \]
\[ \frac{\partial E}{\partial W^{[l]}} = \delta^{[l]}_{x,y} * f\!\left(\mathrm{rot}_{180^{\circ}}(o^{[l-1]}_{x,y})\right) \]
\[ \frac{\partial E}{\partial b^{[l]}} = \delta^{[l]} \]
In these equations, $w$ denotes a Volume, i.e. a list of matrices with equal dimensions.
In summation form the gradient of the error w.r.t. $w^{[l]}$ in a conv net will be:
\[ \delta^{[l]} = \mathrm{rot}_{180^{\circ}}\left\{ \sum_{m=0}^{k_1 - 1} \sum_{n=0}^{k_2 - 1} \delta^{[l+1]}_{i+m,\,j+n} \, w^{[l+1]}_{m,n} \, f'(x_{i,j}) \right\} \]
Finally, the corresponding updates for a layer in the net will be:
\[ W := W - \alpha \frac{\partial E}{\partial W}, \qquad b := b - \alpha \frac{\partial E}{\partial b} \]
These updates will be applied to every layer in the network on each iteration.
These equations can be easily derived by extending the chain rule to matrix multiplication, i.e. by using the lemma $\frac{\partial}{\partial x}(x^T a) = a$.
In pseudo-code this may be written as:
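The full pseudo-code operates on whole weight matrices; the self-contained sketch below (plain arrays, sigmoid activation assumed, and not the library's own listing) shows the summation form above for a single dense layer.

Module DenseBackpropSketch
    'Computes the error term delta of layer l from the weights and error term of layer l + 1,
    'assuming a sigmoid activation so that g'(z) = g(z) * (1 - g(z)).
    Function LayerDelta(ByVal wNext(,) As Double, ByVal deltaNext() As Double, ByVal z() As Double) As Double()
        Dim nNext As Integer = wNext.GetLength(0)       'units in layer l + 1 (rows of W[l+1])
        Dim nUnits As Integer = wNext.GetLength(1)      'units in layer l (columns of W[l+1])
        Dim delta(nUnits - 1) As Double
        For k As Integer = 0 To nUnits - 1
            Dim s As Double = 0
            For m As Integer = 0 To nNext - 1
                s += wNext(m, k) * deltaNext(m)         'entry k of (W[l+1])^T * delta[l+1]
            Next
            Dim gz As Double = 1 / (1 + Math.Exp(-z(k)))
            delta(k) = s * gz * (1 - gz)                'multiply by g'(z[l]_k)
        Next
        Return delta
    End Function
End Module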
These back-propagation algorithms require a sample of the training data. If a single example is used for each back-propagation step, this is known as stochastic gradient descent, whereas if every example is used in the training sample, this is known as batch gradient descent. Batch gradient descent is more prone to local minima but can take less time to train the network, whereas stochastic gradient descent can take more time but is less prone to local minima. Therefore, it is important that the user selects something in between by splitting the training data into several mini-batches. This is known as mini-batch gradient descent. Stochastic gradient descent and batch gradient descent are then just special cases of mini-batch gradient descent with $m_b = 1$ and $m_b = m$ respectively, where $m_b$ is the mini-batch size.
2.1.1.1.3 Loss-Functions
The back-propagation algorithms in section 2.1.1.1.2 rely on a loss function. The loss function, also referred to as the cost function, measures how badly the network is performing. The goal of the network is therefore to minimise the loss function, which is done through the use of the back-prop algorithm. This allows the network to learn from the data: a decreasing loss implies that the net is learning from the data, while a converging loss implies that the learning is slowing down, possibly because the net has learnt the data or has reached a global/local minimum. There is a wide array of cost functions available, but the two most commonly used here are the soft-max (cross-entropy) cost and the squared error.
Minimising a function is a classic problem in calculus and relies upon the derivative of the function. Furthermore, the back-propagation algorithm in section 2.1.1.1.2 relies upon the derivative of the loss function. Therefore, it is necessary to be able to compute the derivative of the cost function, as it is used to back-propagate through the network.
2.1.1.1.3.1 Squared-error
The squared-error loss function measures the average of the sum of the squared error for each data-point.
In mathematical form, the squared error and its derivative may be written as:
\[ E = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y} - y)^2, \qquad \frac{dE}{d\hat{y}} = \hat{y} - y \]
These results can be proved by using the chain rule, i.e. using the result $\frac{dy}{dt} = \frac{dy}{dx}\frac{dx}{dt}$.
The squared error cost function is commonly used for regression tasks and image recognition, and is also used to train many other types of ML model besides neural networks.
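As an illustration of these formulas, a minimal self-contained sketch (not the library's own loss code) of the squared-error cost and its element-wise derivative is:

Module SquaredErrorSketch
    Function SquaredError(ByVal yHat() As Double, ByVal y() As Double) As Double
        Dim m As Integer = yHat.Length
        Dim total As Double = 0
        For i As Integer = 0 To m - 1
            total += (yHat(i) - y(i)) ^ 2               'squared error for example i
        Next
        Return total / (2 * m)                          'average with the conventional factor of 1/2
    End Function

    Function SquaredErrorDerivative(ByVal yHat() As Double, ByVal y() As Double) As Double()
        Dim d(yHat.Length - 1) As Double
        For i As Integer = 0 To yHat.Length - 1
            d(i) = yHat(i) - y(i)                       'dE/dyhat_i = yhat_i - y_i
        Next
        Return d
    End Function
End Module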
2.1.1.1.3.2 Soft-max cost function
Finally, the cross-entropy loss function works by taking in the vector of probabilities ($\hat{y}$) for each class and applying the natural logarithm to the reciprocal of each element in this vector. Each element is then multiplied by its corresponding $y_c$, which produces a vector whose elements are then summed up to produce the output of the cross-entropy loss function.
Using mathematical notation, the cross-entropy loss function and its respective derivative may then be written as:
\[ E = -\sum_{c=1}^{n} y_c \log(\hat{y}_c), \qquad \frac{dE}{d\hat{y}} = \hat{y} - y \]
An interesting point to note is that the derivative for the squared error and the cross-entropy loss function is the same; for cross-entropy this form arises when it is paired with a soft-max output layer and the derivative is taken with respect to the pre-activation values.
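As a short worked example, if the label is $y = (0, 1, 0)^T$ and the network predicts the probabilities $\hat{y} = (0.2, 0.7, 0.1)^T$, then
\[ E = -\sum_{c} y_c \log(\hat{y}_c) = -\log(0.7) \approx 0.357, \]
and the loss shrinks towards zero as the probability assigned to the correct class approaches 1.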
2.1.1.2 Optimisers
As discussed in the previous section, we have seen that by using matrix calculus we can formulate an algorithm to back-propagate through the neural network and find the respective derivatives for each weight matrix. Once these derivatives have been found, the most basic way to update the weights is as shown in section 2.1.1.1.2. However, there are many other alternatives which have been shown to work much better. The alternative optimisation methods that I will also be implementing are:
• Momentum Optimisation
• RMS Optimiser
• Adaptive Moment Estimation (ADAM) Optimiser
Momentum is an alternative approach that uses an extra parameter, the "momentum" term, which is used to calculate a "velocity" variable on each update. The intuition behind this is that when the gradient is high, i.e. the network is learning quickly, the velocity parameter increases, and when the learning is slow the velocity parameter decreases. This effect makes the network less susceptible to local minima. The pseudo algorithm for Momentum is shown in Algorithm 4.
Algorithm 4 Momentum
1: α ← user defined, default 0.01
2: β ← user defined, default 0.9
3: T ← user defined number of iterations
4: t ← 0 (initialise time step)
5: v_dw ← 0 (momentum term for W)
6: v_db ← 0 (momentum term for b)
7:
8: for t < T do
9:   for each layer in net do
10:    compute ∂E/∂w and ∂E/∂b using back-prop with a mini-batch
11:
12:    dw = ∂E/∂w
13:    db = ∂E/∂b
14:
15:    v_dw = β v_dw + (1 − β) dw   (calculating the new momentum term for w)
16:    v_db = β v_db + (1 − β) db   (calculating the new momentum term for b)
17:
18:    W = W − α v_dw   (updating the weights using the weight momentum term)
19:    b = b − α v_db   (updating the bias using the bias momentum term)
Algorithm 5 RMS Optimiser
1: α ← user defined, default 0.01
2: β ← user defined, default 0.9
3: t ← 0 (initialise time step)
4: T ← user defined number of iterations
5: s_dw ← 0 (RMS term for the weights)
6: s_db ← 0 (RMS term for the biases)
7:
8: for t < T do
9:   for each layer in net do
10:    compute ∂E/∂w and ∂E/∂b using back-prop with a mini-batch
11:
12:    dw = ∂E/∂w
13:    db = ∂E/∂b
14:
15:    s_dw = β s_dw + (1 − β) dw²
16:    s_db = β s_db + (1 − β) db²
17:
18:    W = W − α dw / (√s_dw + ε)   (updating the weights using the normalised gradient values)
19:    b = b − α db / (√s_db + ε)   (updating the biases using the normalised gradient values)
Algorithm 6 Adam Optimiser
1: α ← user defined, default 0.01
2: β1, β2 ∈ [0, 1) (exponential decay rates for the moment estimates)
3: m_dw ← 0 (initialise 1st moment vector for the weight matrix)
4: v_dw ← 0 (initialise 2nd moment vector)
5: m_db ← 0 (initialise 1st moment vector for the bias matrix)
6: v_db ← 0 (initialise 2nd moment vector)
7: t ← 0 (initialise time step)
8: T ← user defined number of iterations
9:
10: for t < T do
11:   for each layer in net do
12:     compute ∂E/∂w and ∂E/∂b using back-prop with a mini-batch
13:
14:     dw = ∂E/∂w
15:     db = ∂E/∂b
16:
17:     m_dw = β1 m_dw + (1 − β1) dw
18:     v_dw = β2 v_dw + (1 − β2) dw²
19:
20:     m_db = β1 m_db + (1 − β1) db
21:     v_db = β2 v_db + (1 − β2) db²
22:
23:     m̂_dw = m_dw / (1 − β1^t)
24:     m̂_db = m_db / (1 − β1^t)
25:
26:     v̂_dw = v_dw / (1 − β2^t)
27:     v̂_db = v_db / (1 − β2^t)
28:
29:     W = W − α m̂_dw / (√v̂_dw + ε)
30:     b = b − α m̂_db / (√v̂_db + ε)
2.1.1.3 Matrix Operations
The matrix operations that will be used in the library are:
• Matrix Multiplication
• Convolution
• Matrix Transpose
• Basic Matrix Operations

2.1.1.3.1 Matrix Multiplication
Algorithm 7 Matrix Multiplication
1: A ← user defined parameter
2: B ← user defined parameter
3:
4: n = A.shape(0)   (number of rows of A)
5: m = B.shape(1)   (number of columns of B)
6: p = A.shape(1)   (number of columns of A, which must equal the number of rows of B)
7:
8: C ← Matrix(n, m)   (C will be the resulting matrix of the matrix multiplication)
9: for 1 ≤ i ≤ n do   (looping over the rows of A)
10:   for 1 ≤ j ≤ m do   (looping over the columns of B)
11:     C_ij = 0
12:     for 1 ≤ k ≤ p do   (iterating over the columns of A and the rows of B and summing the products)
13:       C_ij = C_ij + (A_ik ∗ B_kj)
return C
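As a worked example of Algorithm 7, each entry of the product is $C_{ij} = \sum_k A_{ik} B_{kj}$, so for two $2 \times 2$ matrices:
\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix} \]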
2.1.1.3.2 Convolution
Convolution is another operation that is widely used in deep learning. Convolution, or more correctly cross-correlation, is used in image recognition as it allows the network to learn a representation that is equivariant to translations. This speeds up learning, as a traditional deep net would require many training iterations to learn this kind of spatial relationship in a given image. The pseudo-code for convolution is shown in Algorithm 8.
Algorithm 8 Convolution
1: M ← user defined Matrix – input for the convolution operation
2: kernel ← user defined Matrix
3: stridesx ← user defined integer
4: stridesy ← user defined integer
5:
6: hkernel ← kernel.shape(0)   (kernel.shape(0) returns the height of the kernel)
7: wkernel ← kernel.shape(1)   (kernel.shape(1) returns the width of the kernel)
8: hm ← M.shape(0)
9: wm ← M.shape(1)
10:
11: c ← Matrix((hm − hkernel)/stridesy + 1, (wm − wkernel)/stridesx + 1)   (c will be the output matrix for the convolution operation)
12:
13: (the following code uses a sliding window to stride over the input M with the kernel)
14: for 0 ≤ i ≤ hm − hkernel step stridesy do
15:   for 0 ≤ j ≤ wm − wkernel step stridesx do
16:     c_ij = dotsum(kernel, M.item[i, i + hkernel, j, j + wkernel])   (dotsum multiplies the two matrices element-wise and then sums the elements of the result)
return c
In convolutional networks, the input can also have a depth, meaning the convolution operation is applied to a volume instead of a matrix. The dotsum in this case is the same, but the kernel is multiplied by each layer in the input to produce a volume, whose elements are then summed up. Furthermore, the kernel being used can also be a volume. The process is again very similar, with the only difference being that the convolution operation takes each layer of the kernel and convolves it with the input. This generates a volume as the output of the convolution operation.
In mathematical terms, the convolution (cross-correlation) operation can be written as follows:
\[ (f \star g)[n] = \sum_{m=-\infty}^{\infty} f^{*}(m)\, g[m+n] \qquad (2.2) \]
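As a small worked example of a valid cross-correlation with strides of 1, a $3 \times 3$ input and a $2 \times 2$ kernel produce a $2 \times 2$ output:
\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \star \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 + 5 & 2 + 6 \\ 4 + 8 & 5 + 9 \end{pmatrix} = \begin{pmatrix} 6 & 8 \\ 12 & 14 \end{pmatrix} \]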
• Normalising matrices
The pseudo-code for the reshaping of a matrix is as follows:
One crucial aspect of machine learning is data processing. This is extremely important, as the data needs to be processed properly for ML models to work well. Therefore, I will be adding as much functionality as I can to allow the user to process their data as easily as possible. This will include functions such as removing columns/rows, rotating matrices, inverting the oneHot operation on a matrix, padding a matrix and many other functions that make the manipulation of data easy for the user. These functions will come in extremely handy when the user is dealing with images as data for conv-nets, which will require volumes; very often the data is in RGB format, which is essentially a volume of depth 3. Therefore, if the user wants to manipulate and process their data to make the training of the net as easy as possible, it is important that the most-used functions are all predefined, as the most important aspect of my library is to make machine learning as easy as possible. This brings me one step closer to achieving that goal, as the user will not need to define these functions themselves, which could be a daunting task, especially for beginners, and could put many people off before they even get to the machine learning itself. Furthermore, from the pseudo-code for the back-propagation and forward-propagation algorithms, it is clear that I will be using many of these matrix operations myself.
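The sketch below illustrates the kind of data-processing calls intended, assuming the shared helpers listed in the section 2.2 class diagram (padd, rotate, remove_index) and the Matrix(rows, cols, mean, std) constructor; the exact signatures may differ in the implementation.

Imports NeuralDot

Module PreprocessingSketch
    Sub Main()
        Dim img As New Matrix(28, 28, 0, 1)                        'a 28x28 matrix drawn from N(0, 1), standing in for one image channel
        Dim padded As Matrix = Matrix.padd(img, 2, 2)              'zero-pad the matrix before a convolution
        Dim rotated As Matrix = Matrix.rotate(padded, 2)           'rotate the matrix (theta assumed to count quarter-turns)
        Dim trimmed As Matrix = Matrix.remove_index(rotated, 1, 0) 'remove a row/column at the given index and axis
        trimmed.print()
    End Sub
End Module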
2.2 Data-Structures & Diagrams

[UML class diagrams for the NeuralDot library. They show: the Tensor interface (print, transposeSelf, clone, normalize, getshape); the Matrix and Volume classes that implement Tensor, with their constructors, accessors (item, split), operations (arange, reshape, transpose, rotate, conv/conv2d, maxpool, padd, addcol, join, oneHot, invOneHot, broadcast, op, cast, mean) and iterators; the generic Layer(Of Tensor) interface with its Dense, Conv, MaxPool and Reshape implementations, their constructors and the Dense gradient functions; the Mapping class holding an activation function and its derivative; the netData class; the Net class (predict, save, load); and the Optimizer base class (run, resetParameters, calculateCost, splitdata, calculateGradients) with its GradientDescentOptimizer and AdvancedOptimisers subclasses.]
The figures above show the UML for the NeuralDot library. The class Tensor is a base class for the classes Matrix and Volume, as both classes have functions in common. This base class is also necessary, as the Layer class is generic over Tensor; it is therefore necessary to have Volume and Matrix inherit from Tensor, as some layers will be of type Volume, and some of type Matrix. The Matrix class includes all the functions, iterators and subroutines that the user may use, and the same applies for the Volume class. In addition to the sub-routines, functions and iterators, I have also added shared operators to both classes, such as +, -, *, and /. These operators make using matrices and volumes more accessible and intuitive for beginners, making the library more user-friendly to work with.
The Layer class is a generic interface of type Tensor. The classes Dense, Conv, MaxPool and Reshape all extend Layer, as they all share the same functions and are all layers. The Layer class includes all the main functionality required by any specific layer, such as updating the layer, retrieving the parameters and so on. Besides the class Dense, every other class that implements Layer does not have any functions other than the ones it overrides from the base class. The class Dense, which implements Layer(Of Matrix), has 2 extra functions, each called gradient, which return the gradient for a particular back-prop iteration given the previous layer's gradients; the only difference between them is that one of them only works if it is the final layer in the net. These extra functions are there so that the advanced optimisation methods can be used to train dense nets, allowing the user to gain an intuition into which optimisation method works best for a particular architecture or data set. Furthermore, these extra functions also allow the user to view the gradients for a particular training iteration, which is something the users asked for in the interview process; I have therefore added these extra functions to the Dense class only, as my main focus was on dense nets.
Finally, the class Net adds functionality to the network, enabling users to add layers to their net and choose the loss function being used, as well as the activations at each layer. Another piece of functionality I added to the Net class is that the user can save models in a list, enabling the user to keep many models. This is so that the user can experiment with different architectures, optimisation techniques and loss functions and then compare the impact each has on the overall outcome of the Net. After saving these models, the user can then load a net back from the list of checkpoints and re-use the loaded Net.
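As a rough illustration of this workflow, the sketch below builds a small net, trains it with the gradient-descent optimiser and saves a checkpoint. The Net constructor arguments, the addDense helper and the predefined sigmoid Mapping are assumptions made purely for illustration; predict, save, load and the GradientDescentOptimizer signature are taken from the class diagram in section 2.2.

Imports NeuralDot

Module NetWorkflowSketch
    Sub Main()
        Dim trainingData As IEnumerable(Of Tuple(Of Tensor, Tensor)) = Nothing 'assumed to be prepared beforehand
        Dim model As New Net(784)                              'assumed constructor taking the input size
        model.addDense(64, Mapping.sigmoid)                    'hypothetical helper for adding a dense layer
        model.addDense(10, Mapping.sigmoid)
        Dim trainer As New GradientDescentOptimizer(model, trainingData)
        Dim losses = trainer.run(0.1D, True, 32)               'learning rate, print loss, batch size
        model.save()                                           'store the current model as a checkpoint
        Dim restored As Net = model.load(0)                    'load checkpoint 0 back for comparison
    End Sub
End Module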
2.3 User-Interface
Because my program is a library, I will not have a conventional user-interface. However, I will make my library as easy as possible to use, as this was very clearly stated by my clients.
The main methods through which I will interact with my users are exception handlers and the comments placed in my code. An important point to note about exceptions, however, is that there are two kinds: those that I purposefully raise in my code, and those raised because the VB runtime finds an error in my code or the user's code. The second kind is something I will need to keep to a minimum, as the user will not know where the error specifically is. I will achieve this by thoroughly testing my library and taking care of any overflow problems that could occur anywhere within it.
On the other hand, exceptions written by myself will tell the user where they went wrong and give details of how they can avoid it. These exceptions will make the library more accessible to the user, as they know exactly where they went wrong.
Finally, comments can also be used as a way to guide users on how certain algorithms work or what certain variables and parameters represent. This will allow the user to feel less restricted when using my library, as they will know exactly what each function does at each stage.
Technical Solution
In this section, I will be explaining some of the advanced algorithms used in my project and also some of the base
classes that were used throughout my project.
3.1 Base Classes Used
Below is the code used to create the three most important base classes in my project:
Public Interface Tensor
    'The base class Tensor will be inherited by the Volume and the Matrix class.
    'Both Volume and Matrix are tensors and have a number of functions in common.
    Sub print() 'Prints out the values of the Tensor. It is necessary that every child implements this, as the user may want to see all the values the Tensor holds.
    Sub transposeSelf() 'Transposes the Tensor in place. This is a useful operation as the transpose is used many times in deep-nets, especially for back-prop.
    Function clone() As Tensor 'Used by all Tensors when cloning every layer. Returns an identical Tensor, with the same values and same state.
    Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor 'Used to normalise the values held by the Tensor.
    Function getshape() As List(Of Integer) 'Returns the shape of the Tensor as a list, since tensors can have an arbitrary number of dimensions.
End Interface
Imports NeuralDot
    'The applicability of this function depends upon how the forward propagation works in this layer.

    Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
        model = _net
        dataxy = xydata
    End Sub

    MustOverride Sub resetParameters() 'Resets the parameters being used to train the net, including the iterations variable.

            Dim temp As New List(Of Tuple(Of Tensor, Tensor)) 'The temp list will store the examples for a particular batch of the gradient descent
            For n As Integer = 0 To batchSize - 1
                temp.Add(dataxy(batchNum * batchSize + n))
            Next
            batchdata.Add(temp.AsEnumerable)
        Next
        Return batchdata
    End Function 'This function organises all the data examples into separate batches for mini-batch gradient descent
End Class
3.2 Techniques Used
3.2.1 Linear Algebra Techniques Used
Below is the code for some of the matrix operations that are being used in my project:
        Return cloned
    End Function 'completed -- function returns an identical matrix (a clone)

    Public Shared Function conv(ByVal m As Matrix, ByVal kernel As Matrix, Optional ByVal stridesx As Integer = 1, Optional ByVal stridesy As Integer = 1, Optional ByVal padding As String = "valid") As Matrix
        Dim paddy, paddx As Integer
        If padding = "full" Then
            'For a full convolution, the Matrix is first zero-padded such that every element in the matrix can be used to convolve with the kernel, and then a valid convolution is applied.
            m = Matrix.padd(m, kernel.shape.Item1 - 1, kernel.shape.Item2 - 1)
            Return conv(m, kernel, stridesx, stridesy, "valid")
        End If
        If padding = "same" Then
            If ((m.shape.Item1 Mod stridesy) = 0) Then
                paddy = Math.Max(kernel.shape.Item1 - stridesy, 0)
            Else
                paddy = Math.Max(kernel.shape.Item1 - (m.shape.Item1 Mod stridesy), 0)
            End If
            If ((m.shape.Item2 Mod stridesx) = 0) Then
                paddx = Math.Max(kernel.shape.Item2 - stridesx, 0)
            Else
                paddx = Math.Max(kernel.shape.Item2 - (m.shape.Item2 Mod stridesx), 0)
            End If
            m = Matrix.addcol(m, 1, Math.Floor(paddy / 2), 1)
            m = Matrix.addcol(m, m.shape.Item1 + 1, paddy - Math.Floor(paddy / 2), 1)
            m = Matrix.addcol(m, 1, Math.Floor(paddx / 2), 0)
            m = Matrix.addcol(m, m.shape.Item2 + 1, paddx - Math.Floor(paddx / 2), 0)
            'The amount of padding done for SAME convolution follows the TensorFlow guidelines for the amount of padding.
            Return conv(m, kernel, stridesx, stridesy, "valid")
        ElseIf padding = "valid" Then
            Dim result As New Matrix(Math.Truncate((m.shape.Item1 - kernel.getshape(0)) / stridesy) + 1, Math.Truncate((m.shape.Item2 - kernel.getshape(1)) / stridesx) + 1)
            Dim i As Integer = 0
            'The following code is used to compute the resulting convolved Matrix.
            'The dot product is used, as convolution is essentially a series of dot products.
            For Each S In submatrix(m, kernel.shape.Item2, kernel.shape.Item1, stridesx, stridesy)
                result.values(Math.Truncate(i / result.shape.Item2), i Mod result.shape.Item2) = Matrix.dotsum(S, kernel)
                i += 1
            Next
            Return result
        End If
        Console.WriteLine(padding)
        Throw New System.Exception("Padding must be either valid, same or full")
    End Function 'COMPLETED -- Returns the convolution after a kernel has been applied
    Public Overloads Shared Function join(ByVal m As Matrix, ByVal n As Matrix, ByVal index As Integer) As Matrix
        If m.shape.Item1 <> n.shape.Item1 Then
            'An error is thrown here if both Matrices do not have the same number of rows.
            Throw New System.Exception("Number of Rows must be the same for both Matrices")
        End If
        Dim result As New Matrix(m.shape.Item1, m.shape.Item2 + n.shape.Item2)
        Dim i As Integer = 0
        For k As Integer = 0 To m.shape.Item2 - 1
            If i = index Then
                i += n.shape.Item2
            End If
            For l As Integer = 0 To m.shape.Item1 - 1
                result.values(l, i) = m.item(l + 1, k + 1)
            Next
            i += 1
        Next
        For k As Integer = 0 To n.shape.Item2 - 1
            For l As Integer = 0 To m.shape.Item1 - 1
                result.values(l, index + k) = n.item(l + 1, k + 1)
            Next
        Next
        Return result
    End Function 'COMPLETED -- Concatenates a Matrix (n) to another Matrix (m) at the specified index
    Public Function maxpool(ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer) As Matrix
        Dim result As New Matrix(((Me.shape.Item1 - kernely) / stridesy) + 1, ((Me.shape.Item2 - kernelx) / stridesx) + 1)
        Dim i As Integer = 0
        'The following code selects the maximum element out of each submatrix.
        For Each m In submatrix(Me, kernelx, kernely, stridesx, stridesy) 'submatrix is an iterator function of type IEnumerable
            result.values(Math.Truncate(i / result.shape.Item2), i Mod result.shape.Item2) = m.max()
            i += 1
        Next
        Return result
    End Function 'Completed -- Applies max-pooling to a matrix with a kernel of size = (kernelx, kernely)
        For j As Integer = 1 To Me.getshape(1)
            Dim maxval As Double = Double.MinValue 'Variable used to store the maximum item in this column vector
            Dim pos As Integer = 0 'Variable used to store the position of the maximum item in the vector
            For i As Integer = 1 To Me.getshape(0)
                If Me.item(i, j) > maxval Then 'If this item is greater than the current maximum, assign maxval to the item and pos to its position
                    maxval = Me.item(i, j)
                    pos = i
                End If
            Next
            result.item(1, j) = pos - 1
        Next
        Return result
    End Function 'COMPLETED -- Returns the inverse of one-hot encoding applied to a matrix

    Public Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor Implements Tensor.normalize
        Dim means As Matrix = Matrix.sum(Me, 0) / Me.getshape(0) 'Finds the column means
        Dim stds As Matrix = Matrix.op(AddressOf Math.Pow, (Matrix.sum(Matrix.op(AddressOf Math.Pow, Me, 2), 0) - Matrix.op(AddressOf Math.Pow, Me, 2) * Me.getshape(0)) / (Me.getshape(0) - 1), 0.5) 'Finds the standard deviation for each column
result.values(i, j) = sum
Next
Next
Return result
End Function 'COMPLETED -- Returns the product of Matrix Multiplication
For j As Integer = 1 To m.getshape(1) Step stepx
Yield m.item(i, j)
Next
Next
End Function 'COMPLETED -- Returns items from a matrix, using a stepsize of (stepx, stepy)
Below is the code for some of the volume operations that are being used in my project:
    Public Property split(ByVal i_start As Integer, ByVal i_end As Integer, ByVal j_start As Integer, ByVal j_end As Integer, ByVal k As Integer) As Matrix
        Get
            Dim result As New Matrix(i_end - i_start + 1, j_end - j_start + 1)
            For i As Integer = i_start To i_end
                For j As Integer = j_start To j_end
                    result.item(i - i_start + 1, j - j_start + 1) = Me.item(i, j, k)
                Next
            Next
            Return result
        End Get
        Set(ByVal value As Matrix)
            For i As Integer = i_start To i_end
                For j As Integer = j_start To j_end
                    Me.item(i, j, k) = value.item(i - i_start + 1, j - j_start + 1)
                Next
            Next
        End Set
    End Property 'COMPLETED -- Property used to set/select a portion of a volume

    Public Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor Implements Tensor.normalize
        Dim n As Integer = Me.shape.Item1 * Me.shape.Item2 * Me.values.Count
        Dim means As Double = Me.values.Select(Function(x) Matrix.sum(x).item(1, 1)).Sum / n
        Dim stds As Double = Math.Sqrt((Me.values.Select(Function(x) Matrix.sum(x * x).item(1, 1)).Sum - (mean * mean * n)) / (n - 1))
        Dim result As New List(Of Matrix)
        For Each M In Volume.Items(Me)
            result.Add((M - means) / stds)
        Next
        Return New Volume(result)
    End Function 'COMPLETED -- Returns a volume whose layers are normalised using all the elements in the volume
        Dim result As New Matrix(Me.shape.Item1, Me.shape.Item2)
        If axis = 2 Then
            For Each M In Me.values
                result += M
            Next
            Return result / Me.values.Count
        Else
            Throw New System.Exception("axis 0 or 1 has not yet been implemented for Volume")
        End If
    End Function 'COMPLETED -- Returns the mean of a volume along a specified dimension. Currently only works for axis = 2

    Public Shared Function conv2d(ByVal v As Volume, ByVal kernels As Volume, Optional stridesx As Integer = 1, Optional stridesy As Integer = 1, Optional padding As String = "valid") As Volume
        'conv2d applies a convolution in 2 dimensions. This means that every 2d kernel is applied to every layer in the volume.
        Dim result_values As New List(Of Matrix) : Dim all_channels As List(Of Matrix) = Items(v).ToList

    Public Shared Function maxpool(ByVal filter As Volume, ByVal kernely As Integer, ByVal kernelx As Integer, ByVal stridesy As Integer, ByVal stridesx As Integer) As Volume
        Dim result As New List(Of Matrix)
        For Each M In Items(filter)
            result.Add(M.maxpool(kernelx, kernely, stridesx, stridesy))
        Next
        Return New Volume(result)
    End Function 'COMPLETED -- Returns the max-pooling of a volume using a kernel of shape (kernelx, kernely) and step size (stridesx, stridesy)
    Public Shared Function op(ByVal x As Volume, ByVal f As Func(Of Matrix, Matrix, Matrix), ByVal y As Volume) As Volume
        Dim result As New List(Of Matrix)
        For i As Integer = 0 To x.values.Count - 1
            result.Add(f.Invoke(x.values(i), y.values(i)))
        Next
        Return New Volume(result)
    End Function 'COMPLETED -- Applies a function f(Matrix, Matrix) -> Matrix to corresponding layers of the Volumes x and y

    Public Shared Function op(ByVal v As Volume, ByVal f As Func(Of Matrix, Matrix, Matrix), ByVal m As Matrix) As Volume
        Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
        For Each x In Items(v)
            result.values.Add(f.Invoke(x, m))
        Next
        Return result
    End Function 'COMPLETED -- Applies a function f(Matrix, Matrix) -> Matrix to each layer of the Volume v together with the Matrix m

    Public Shared Function cast(ByVal matrixList As List(Of Matrix), ByVal rows As Integer, ByVal cols As Integer) As List(Of Volume)
        Dim l_v As New List(Of Volume)
        For Each M In matrixList
            l_v.Add(Volume.cast(M, rows, cols))
        Next
        Return l_v
    End Function 'COMPLETED -- Casts a list of matrices into volumes element-wise
    Public Shared Function cast(ByVal v As Volume, ByVal rows As Integer, ByVal cols As Integer) As Matrix
        If rows * cols <> v.shape.Item1 * v.shape.Item2 * v.values.Count Then
            Throw New System.Exception("Dimensions for matrix must only be sufficient to store all the items in the Volume")
        End If
        Dim result As New Matrix(rows, cols)
        Dim i As Integer = 0
        For Each M In Volume.Items(v)
            For Each k In Matrix.val(M)
                result.item((Math.Truncate(i / result.getshape(1)) + 1), (i Mod result.getshape(1)) + 1) = k 'Assigning each element in result its corresponding value in the Volume v
                i += 1
            Next
        Next
        Return result
    End Function 'COMPLETED -- Function used to cast a Volume into a matrix of shape = (rows, cols)

    Public Shared Function cast(ByVal m As Matrix, ByVal rows As Integer, ByVal cols As Integer) As Volume
        Dim result As New Volume(rows, cols, m.getshape(0) * m.getshape(1) / (rows * cols))
        Dim i As Integer = 0
        For Each d In Matrix.val(m)
            result.item(((Math.Truncate(i / cols)) Mod rows) + 1, (i Mod cols) + 1, Math.Truncate(i / (cols * rows))) = d
            i += 1
        Next
        Return result
    End Function 'COMPLETED -- Function casts a matrix into a volume of shape = (rows, cols)
Technique used (& class implemented) — How it works

- DenseNetworks forward-prop: see section 2.1.1.1.1
- ConvNetworks forward-prop: see section 2.1.1.1.1
- Dense back-prop: The back-propagation algorithm for dense nets works recursively. The gradients for a layer depend upon the gradient of the layer above. The transpose of the layer above's weight matrix is multiplied by those gradients, and this product is then element-wise multiplied by the derivative of the activation function, which gives the gradient of the loss w.r.t. this layer's bias. To find the gradient w.r.t. the weight matrix, the gradient w.r.t. the bias is multiplied by the transpose of the layer's input.
- Convolution back-prop: The back-propagation for the convolutional layers is more complicated, because we are dealing with volumes instead of matrices and differentiating w.r.t. the cross-correlation function. To find the gradient of the loss w.r.t. a filter in a specific CNN layer, the previous layer's gradients are multiplied element-wise by the kernel and by the derivative of the activation function that was used to produce those outputs. This produces a 3D volume, so we take the mean across the layers of this volume, which results in a matrix corresponding to the gradients for a specific layer of the filter volume.
The following code shows the forward-propagation procedure for the layers.
For the Dense layer:
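The full listing appears in section 6.5 (NeuralDot.Dense). As a minimal self-contained sketch of what the dense-layer forward pass computes, a = g(W x + b), using plain arrays and a sigmoid activation rather than the library's Matrix class:

Module DenseForwardSketch
    Function DenseForward(ByVal w(,) As Double, ByVal b() As Double, ByVal x() As Double) As Double()
        Dim units As Integer = w.GetLength(0)           'number of neurons in this layer
        Dim a(units - 1) As Double
        For i As Integer = 0 To units - 1
            Dim z As Double = b(i)
            For j As Integer = 0 To w.GetLength(1) - 1
                z += w(i, j) * x(j)                     'z = W * x + b for neuron i
            Next
            a(i) = 1 / (1 + Math.Exp(-z))               'sigmoid activation g(z)
        Next
        Return a
    End Function
End Module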
Finally, the following code shows the back-propagation (update) procedure for the layers.
For the Dense layer:
Public Overridable Overloads Function update(ByVal l_r As Decimal, ByVal prev_delta As
,→ Tensor, ByVal ParamArray param() As Tensor) As Tensor Implements Layer(Of
,→ Matrix).update
Dim grads As IEnumerable(Of Matrix) = gradient(prev_delta, param) 'grads returns (dw,
,→ db)
w -= l_r * grads(0) : b -= l_r * grads(1) 'Parameters are being updated
Return grads(1) 'Function returns db, as it will be needed for the next layer's update during back-prop
End Function 'Function updates the parameters using prev_delta and param
For j As Integer = 0 To x_in.shape.Item2 - kernelx Step stridesx
dx_channel_sum.split(i + 1, i + kernely, j + 1, j + kernelx, 0) =
,→ filter.values(f) * dz.item(Math.Truncate(k / dz.shape.Item2) + 1,
,→ (k Mod dz.shape.Item2) + 1, f)
k += 1
Next
Next
dx.Add(dx_channel_sum.mean(2))
Next
Next
The following algorithms were used to train the net: ADAM, RMS, Momentum and the standard back-propagation algorithm.
The code below is the implementation of the standard back-propagation algorithm.
pred = model.predict(vector.Item1)
errors.Add(model.loss.d(pred, vector.Item2) *
,→ model.netLayers.Peek.parameters.Last)
Next
Dim deltas As New Stack(Of Tensor)
deltas.Push((New Volume(errors) / batch.Count).mean(2))
The code below is the implementation of the Momentum optimisation algorithm.
Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal
,→ batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 1 Then
Throw New System.Exception("Momentum requires 1 parameters for training")
End If
For Each batch In batches 'Looping through each batch, as we are doing mini-batch
,→ gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) =
,→ MyBase.calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2 'This line retrieves the gradients for w & b
'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses
End Function
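The extracted listing above omits the momentum update itself inside the batch loop. By analogy with the RMS implementation shown next (v_dw and v_db are the moving-average accumulators declared in the Momentum class), the missing loop presumably resembles this sketch rather than being the verbatim source:

For layer As Integer = 0 To model.netLayers.Count - 1
    'Exponentially weighted average of the gradients, with param(0) playing the role of beta
    v_dw(layer) = v_dw(layer) * param(0) + (1 - param(0)) * dw(layer)
    v_db(layer) = v_db(layer) * param(0) + (1 - param(0)) * db(layer)
    'Step each layer's parameters in the direction of the smoothed gradients
    model.netLayers(layer).deltaUpdate(-l_r * v_dw(layer), -l_r * v_db(layer))
Next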
The code below is the implementation of the RMS optimisation algorithm.
Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal
,→ batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 1 Then
Throw New System.Exception("RMS requires 1 parameters for training")
End If
For Each batch In batches 'Looping through each batch, as we are doing mini-batch
,→ gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2
'Following code applies the RMS optimisation technique to the network
For layer As Integer = 0 To model.netLayers.Count - 1
s_dw(layer) = s_dw(layer) * param(0) + (1 - param(0)) * dw(layer) * dw(layer)
s_db(layer) = s_db(layer) * param(0) + (1 - param(0)) * db(layer) * db(layer)
model.netLayers(layer).deltaUpdate(-l_r * dw(layer) * (1 /
,→ (Matrix.op(Function(x) Math.Sqrt(x), s_dw(layer)) + 0.000001)), -l_r *
,→ db(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_db(layer)) +
,→ 0.000001)))
Next
Next
'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses
End Function
The code below is the implementation of the ADAM optimisation algorithm.
Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal
,→ batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 2 Then
Throw New System.Exception("Adam requires 2 parameters for training")
End If
For Each batch In batches 'Looping through each batch, as we are doing mini-batch
,→ gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2
'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses
End Function
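As with Momentum, the Adam update loop itself did not survive extraction. Assuming the usual Adam moment estimates and the same deltaUpdate pattern as the RMS code above (the accumulator names v_dw, v_db, s_dw and s_db are assumptions by analogy, and bias correction of the moment estimates is omitted), a sketch of the missing loop is:

For layer As Integer = 0 To model.netLayers.Count - 1
    'First-moment (momentum) estimate, param(0) = beta1
    v_dw(layer) = v_dw(layer) * param(0) + (1 - param(0)) * dw(layer)
    v_db(layer) = v_db(layer) * param(0) + (1 - param(0)) * db(layer)
    'Second-moment (RMS) estimate, param(1) = beta2
    s_dw(layer) = s_dw(layer) * param(1) + (1 - param(1)) * dw(layer) * dw(layer)
    s_db(layer) = s_db(layer) * param(1) + (1 - param(1)) * db(layer) * db(layer)
    'Adam step: smoothed gradient scaled by the inverse RMS of the gradients
    model.netLayers(layer).deltaUpdate(
        -l_r * v_dw(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_dw(layer)) + 0.000001)),
        -l_r * v_db(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_db(layer)) + 0.000001)))
Next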
Testing
In this section, I will now be testing my library to make sure every function and sub-routine created works correctly. It is important that every component works correctly; otherwise, the result could be an unsuccessful product that does not fulfil the needs of my target audience.
Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Instantiating Matrix with initial values set | Matrix instantiated with the set values | As expected | N/A
2 | Instantiating Matrix with fixed size | Matrix instantiated with the set size | As expected | N/A
3 | Instantiating Matrix with values ∼ N(0, 1) | Matrix instantiated with all values ∼ N(0, 1) | As expected | N/A
Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Sum of 2 matrices | Matrix of correct values returned | As expected | N/A
2 | Difference of 2 matrices | Matrix of correct values returned | As expected | N/A
3 | Element-wise multiplication between 2 matrices | Matrix of correct values returned | As expected | N/A
4 | Element-wise division of 2 matrices | Matrix of correct values returned | As expected | N/A
5 | Matrix Multiplication | Matrix of correct values returned | As expected | N/A
6 | Clockwise rotation of matrix | Matrix of correct values returned | As expected | N/A
7 | MaxPool operation on a matrix | Matrix of correct values returned | As expected | N/A
8 | Maximum item operation on matrix | Correct value returned | As expected | N/A
9 | Matrix Convolution | Matrix of correct values returned | As expected | N/A
10 | Sum of 2 matrices using broadcasting | Matrix of correct values returned | As expected | N/A
11 | DotSum operation between two matrices | Matrix of correct values returned | As expected | N/A
12 | Equality between 2 matrices | Correct Boolean output returned | As expected | N/A
13 | Scalar multiplication between a double and matrix | Matrix of correct values returned | As expected | N/A
14 | Scalar addition between a double and matrix | Matrix of correct values returned | As expected | N/A
Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Function f(x) -> y, where x is a double, is applied on a matrix | Function returns a matrix with each value being the result of the function f(x) | As expected | N/A
2 | Function f(x, y) -> y, where x and y are doubles each from a separate matrix, is applied on a matrix | Function returns a matrix with each value being the result of the function f(x, y) | As expected | N/A
Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Sum of 2 Volumes | Volume of correct values returned | As expected | N/A
2 | Subtraction between 2 Volumes | Volume of correct values returned | As expected | N/A
3 | Element-wise multiplication between 2 Volumes | Volume of correct values returned | As expected | N/A
4 | Element-wise division between 2 Volumes | Volume of correct values returned | As expected | N/A
5 | Matrix Multiplication between 2 Volumes | Volume of correct values returned | As expected | N/A
6 | Element-wise division between 2 Volumes | Volume of correct values returned | As expected | N/A
7 | Scalar multiplication between Volume and double | Volume of correct values returned | As expected | N/A
8 | Scalar addition between Volume and double | Volume of correct values returned | As expected | N/A
9 | Convolution 2d between Volumes | Volume of correct values returned | As expected | N/A
10 | Maxpooling between 2 volumes | Volume of correct values returned | As expected | N/A
Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | List to volume cast | Function needs to convert a list of matrices to a volume, given height and width | As expected | N/A
2 | Volume to matrix cast | Function needs to convert a volume to a matrix, given the height and width of the matrix | As expected | N/A
3 | Matrix to volume cast | Function needs to convert a matrix to a volume, given the height and width of the volume | As expected | N/A
Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Instantiating a Mapping with activations specified and their respective derivatives | f and d are set to the values specified | As expected | N/A
2 | Instantiating a Mapping with loss functions specified and their respective derivatives | f and d are set to the values specified | As expected | N/A
Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Instantiating Net with loss function and data features specified | Net loss-function and net-feature variables are set to the specified values | As expected | N/A
I will now be testing the dense layers, using the squared error as my loss function and 500 training iterations.
The training data will be x = matrix.join(x1, x2), where x1 ∼ N(3, 1) with x1.shape = (5, 50), and x2 ∼ N(0, 2) with x2.shape = (5, 50).
This means that I will have a data-set of 100 examples with 5 features, and the network will try to distinguish between the two data-sets. If an example belongs to x1, then the net needs to predict a 1; otherwise it needs to predict a 0.
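For concreteness, the training data described above could be built with the library roughly as follows. This is a hedged sketch: the constructors, matrix.join and the item property are the ones shown in the code listings, but the exact way the labels are paired with the examples for training is not shown here.

Dim x1 As New Matrix(5, 50, 3, 1)            '50 examples with 5 features, values ~ N(3, 1)
Dim x2 As New Matrix(5, 50, 0, 2)            '50 examples with 5 features, values ~ N(0, 2)
Dim x As Matrix = Matrix.join(x1, x2, 50)    'Shape (5, 100): columns 1-50 from x1, 51-100 from x2
Dim y As New Matrix(1, 100, 0)               'Labels: 1 for the x1 examples, 0 for the x2 examples
y.item(1, 1, 1, 50) = New Matrix(1, 50, 1)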
To see if the net has learnt the data, I will be testing it with a test set of the same shape as x but with different values. In these tests the net should learn the function to an appropriate degree of accuracy, with all hyper-parameters kept constant besides the mini-batch size and learning rate, and it should reach a loss of 0.01 or lower for a test to be counted as successful.
I will be recording the average loss for a particular test over 100 different runs of the same architecture and optimiser.
The table below shows the results of the tests. The net architecture describes the number of layers in each net and the activation used.
From this table, we can see that the gradient descent optimisers are working as they should and that the net is calculating the gradients correctly. This integration test shows that the constituent parts are functioning together correctly, as no exceptions were raised and no overflow occurred, which suggests the algorithms have been implemented carefully. Furthermore, the time taken for this test was 2 hours, as each test was repeated 100 times. Finally, this test was repeated a further 8 times using different training sets, and the results were all successful.
The conv layers, unlike the dense layers, do not have these advanced optimisation methods; therefore, I will only need to test the gradient descent optimiser (GDO) for the conv layers. One important point to note is that training the conv layers can take a long time, possibly around 6 hours depending upon the size of the data, if the tests are to be repeated many times to obtain reliable results.
The data-set I will be using for the conv layers will be the MNIST data set, which consists of hand-written digits. This data-set has been collected for the purpose of machine learning, so by using it I will be able to achieve reliable results.
The notation being used to represent the net architecture is:
• c = (f, kx, ky, sx, sy)a will denote a convolutional layer, with a kernel of dimension (kx, ky, f), strides of (sx, sy), activation a and padding set to the default value "valid". The activations that will be used are relu (r), sigmoid (s) and swish (sw).
• m = (kx, ky, sx, sy) will denote a max-pooling layer, with a kernel of dimension (kx, ky) and strides of (sx, sy).
The dense layers that follow on after the conv layers will use the same notation as before.
Finally, the softmax function will be used at the end to make a prediction for a specific class, and the loss function
that will be used will be the softmax cross entropy cost function. Due to the limited amount of time and available
computational resources, only 3 tests could be done to train a CNN using 1000 iterations, and each test is repeated
200 times to achieve reliable results for the final loss. The data consists of 50 images of size 28x28x1. The final
loss in this case is measured using another 50 images. Finally, in these tests the network should correctly classify
at least 90% of the test data.
In the second test an exception was thrown due to an overflow.
This overflow exception was because the input was too large before the softmax function was applied. I solved this problem through using the fact that \( \frac{e^{x_i}}{\sum_j e^{x_j}} = \frac{e^{x_i - m}}{\sum_j e^{x_j - m}} \), where \(m\) is the maximum element of \(x\).
Therefore, I have changed the line of code:
to
This works because both the numerator and denominator of the fraction are divided by \(e^{x.max()}\), where x.max() is the maximum element in the matrix; the shifted exponents are all at most 0, so every exponential term is at most 1 and cannot overflow.
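For reference, the stabilised expression corresponds to the softmax_act mapping listed later in section 6.10. As an illustration (not the verbatim changed line, which did not survive extraction), the forward mapping becomes:

Matrix.exp(x - x.max()) / Matrix.sum(Matrix.exp(x - x.max()), 0).item(1, 1)

whereas the presumed earlier form exponentiated x directly, allowing the exponentials to overflow for large inputs.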
After changing this line, the nets worked correctly. This shows that the convolutional networks are working as they should: the nets on average classified more than 90% of the images correctly for each test, which means the CNNs can now distinguish between images of the digits 0 to 9. Finally, the time taken to train the CNNs was approximately 19 hours, mainly because the convolution operation has time complexity O(n^4).
4.1.6 Screen shots
Figure 4.5: Results of first model(out of the 200) for Test No1 on Test Set: part 1
Figure 4.6: Results of first model for Test No1 on Test Set: part 2
Figure 4.7: Results of first model for Test No2 on Test Set: part 1
Figure 4.8: Results of first model for Test No2 on Test Set: part 2
Figure 4.9: Results of first model for Test No3 on Test Set: part 1
Figure 4.10: Results of first model for Test No3 on Test Set: part 2
Below are the YouTube links for the videos of the matrix, Volume and dense network tests. A video could not be provided for the convolutional neural network tests, as training the conv nets took more than 19 hours in total, which is why screenshots were provided instead.
I was not able to record the tests for the dense layers, as training each architecture 100 times on different functions for the sake of reliability takes a long time - approximately 8 hours in total. However, I can instead record a dense net learning a function while being trained with a variety of different optimisers.
In the playlist above, we used a dense net to learn a function. We then made the function (data) much more difficult, and we see that the ADAM optimiser is able to train the net to learn the function to a good degree of accuracy, while the RMS optimiser is not able to do this.
Finally, the link for the testing of some of the functions of the Matrix and Volume classes is below:
Note: not every function was tested in the video; however, in the testing section every function was tested thoroughly.
4.2 Erroneous Testing
While it is important that the library works for valid inputs, it is also important to make sure the library handles invalid inputs and overflows correctly. Therefore, I will be testing the Matrix class with erroneous inputs and overflows, and checking how it handles them through the use of exceptions.
Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Sum of 2 matrices with invalid dimensions, i.e. shapes not conforming | Exception occurred throwing "shapes do not conform for addition" | As expected | N/A
2 | Difference of 2 matrices with dimensions not conforming | Exception occurred throwing "shapes do not conform for addition" | As expected | N/A
3 | Element-wise multiplication between 2 matrices with invalid dimensions | Exception occurred throwing "shapes do not conform for element-wise multiplication" | As expected | N/A
4 | Element-wise division of 2 matrices with invalid dimensions | Exception occurred throwing "shapes do not conform for element-wise multiplication" | As expected | N/A
5 | Matrix Multiplication between 2 matrices with invalid dimensions | Exception occurred throwing "shapes do not conform for matrix multiplication" | As expected | N/A
6 | DotSum operation between 2 matrices with different shapes | Exception occurred throwing "shapes do not conform for matrix multiplication" | As expected | N/A
Evaluation
Finally, I will now evaluate my project by looking back at the objectives set, comparing the project against them, and determining whether the library meets the required objectives.
2. The user can multiply, add, subtract and transpose a given matrix.
3. The user can multiply, add, subtract and transpose a list of matrices together, i.e volumes.
These objectives specify the linear algebra part of my project, and it is important that I meet them because this functionality is used by every class in my project. I have met these objectives thoroughly, as the Matrix class offers a wide range of functions and sub-routines, including:
• Addition/subtraction/division/multiplication
• Matrix multiplication
• Reshaping a matrix
• Transposing
• Iterators over columns, values, and lists of matrices
• One-hot encoding and its inverse
• Applying a function to the items of the matrix
Likewise, the same can be said of Volume, as it offers a wide range of functions including the ones specified in the objectives. Clearly, these objectives have been met.
8. The gradient descent algorithms should implement stochastic gradient descent, batch gradient descent as well
as mini-batch gradient descent.
9. The user can view the weights of the network i.e learned parameters of the network
10. The user can view the gradients for a specific layer in a dense-net, given the gradients for the layer above.
These objectives specify the neural networks part of my project, which is the most important aspect as it is the main functionality of the library. The project has met all these objectives: it allows users to create dense neural networks layer by layer, and for each layer lets the user choose the layer activation, the number of neurons and the loss function. The user also has a choice of gradient descent optimisers such as Adam, Momentum, RMS or the standard back-propagation algorithm, all of which implement mini-batch gradient descent. By implementing mini-batch gradient descent, the user can choose their batch size, which enables them to use stochastic, batch or mini-batch gradient descent. Furthermore, the library allows the user to view the parameters of a layer, which include the weights, bias and output, and to view the gradients for a specific layer in a dense net given the gradients for the layer above.
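As a rough illustration of how a user selects an optimiser and batch size (a sketch based on the constructor and run signatures in the code listings; the learning rate, batch size and beta value here are arbitrary):

Dim opt As New RMS(model, trainingData)             'model is a Net; trainingData is an IEnumerable of (input, target) Tensor pairs
Dim losses As List(Of Tensor) = opt.run(0.01D, True, 10, 0.9D) 'learning rate, print losses, batch size, beta

A batch size of 1 gives stochastic gradient descent, while a batch size equal to the number of examples gives batch gradient descent.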
One aspect that could be further improved is the efficiency of training, through the use of parallel programming. This would speed up a lot of the heavy computation, saving time for the user and making it much easier to run tests. However, due to the limited time I was not able to do this, as parallel programming would introduce new problems that require a certain level of expertise to deal with.
Finally, the project meets these objectives, but if the user wanted to add many layers to a net or have 1000 neurons in a layer, it could get out of hand, as the time required for a single back-propagation iteration would grow very quickly with the size of the network.
11. The user can add convolutional layers to their network, which they can tune by changing the hyper parameters
such as the kernel dimensions, layer activation and kernel strides.
The project meets this objective, as users can create a convolutional layer and choose the settings for that layer, including the kernel dimensions, layer activation and kernel strides.
Overall, all the core objectives have been met to a satisfactory degree and there is very little that can be done
to improve upon the project.
5.1.2 Extension
The extension objectives are:
1. The user can choose from a wide range of optimisation algorithms such as momentum, RMS and Adam to
train their dense networks
2. The user can define their own back propagation algorithms to train their neural networks
5.2 Re-Interviewing Clients
I re-interviewed the clients and asked them for their opinion on the final product. These are some of the responses I received:
The first student, Nitish Bala said, ”The library is extremely intuitive to use and manipulating with data-structures
such as matrices and volumes is very easy to do. The optimisation methods also work well and it is very easy to
define your own optimisation techniques. Overall, I think this library is easy to use and making dense nets has
never been any easier than this.”
The second student, Basim Khajwal said, ”The library is easy to use and offers a variety of optimisation
techniques. One particular part, I like a lot about this library is that it allows the user to define their own layers,
which I think is a big plus. Overall, I think this library is the perfect library to go to for a beginner, if they would
like to get a hands on intro to machine learning.”
The third student, Taha Rind said, ”I like the design of this library as it makes the library very obvious to use.
I quite like the fact that making a simple dense network can be done in 5 lines of code, which makes this library
appealing to use.”
The fourth student, Mujahid Mamaniat said, ”Making a neural network including a CNN is very easy to do, as
creating a network only requires setting the parameters for each layer. Another aspect I like about the library is that
although it offers many features such as enabling users to create their own layers, optimisers, activation functions
it doesn’t sacrifice out on the design of the software like many other ML libraries out there.”
Finally, Jamie Stirling said, ”I like the way the library has an extra focus on dense nets as this is what many
libraries are lacking. I also like the way the library allows users to create their own activations and loss functions,
without making the library too difficult to work with. Overall, I think this library is perfect for a beginner as it is
not complicated but at the same time offers many features which might interest the user even more.”
Therefore, it is quite clear from this feedback that the library has been successful in delivering a machine learning library for beginners.
The problem that I am trying to solve, however, can never be solved entirely, mainly because it is an open problem. The needs of the users are constantly changing and machine learning is always advancing. It was only recently that dense nets became the go-to technique for machine learning, so in the future users may require different algorithms to suit their needs. For the current climate of machine learning, however, this solution does meet the required objectives of the original problem: offering an easy-to-use machine learning library for beginners.
Code Listings
6.1 NeuralDot.Tensor
Sub print() 'The sub print is used to print out the values of the Tensor. It is necessary that every child inherits this, as the user may want to see all the values the Tensor holds
Sub transposeSelf() 'This subroutine transposes the Tensor. This is a useful operation as
,→ Transpose is used many times in deep-nets, especially for back-prop
Function clone() As Tensor 'This function will be used by all Tensors, when cloning every
,→ layer. This clone function returns the exact same Tensor, with
'the same values and same state.
Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As
,→ Tensor 'This function will be used to normalise the values
'in a matrix.
Function getshape() As List(Of Integer) 'Function returns the shape of the Tensor. Function
,→ returns a list as the tensors can have an arbitrary number
'of dimensions.
End Interface
6.2 NeuralDot.Matrix
Imports NeuralDot
Public Class Matrix
Implements Tensor
'The tState variable can either be True or False. This is used to denote whether the matrix
,→ has been transposed or not
'Constructors
Public Sub New(ByVal initial_values(,) As Double)
values = initial_values
shape = New Tuple(Of Integer, Integer)(initial_values.GetLength(0),
,→ initial_values.GetLength(1))
End Sub 'COMPLETED -- This constructor is used to initialize a matrix with initial values
Public Sub New(ByVal rows As Integer, ByVal cols As Integer, ByVal mean As Decimal, ByVal
,→ std As Decimal)
Me.New(rows, cols)
For j As Integer = 0 To cols - 1
For i As Integer = 0 To rows - 1
Me.values(i, j) = Math.Round(norm(mean, std), 2)
Next
Next
End Sub 'COMPLETED -- Constructor instantiates a matrix with normally distributed values
Public Sub New(ByVal rows As Integer, ByVal cols As Integer, ByVal value As Decimal)
Me.New(rows, cols)
For i As Integer = 0 To rows - 1
For j As Integer = 0 To cols - 1
Me.values(i, j) = value
Next
Next
End Sub 'COMPLETED -- Constructor instantiates a matrix with all values the same
Public Shared Function arange(ByVal rows As Integer, ByVal cols As Integer) As Matrix
Dim result As New Matrix(rows, cols)
Dim i As Double = 0
For Each d In val(result)
result.item(Math.Truncate(i / cols) + 1, (i Mod cols) + 1) = i
i += 1
Next
Return result
End Function 'COMPLETED -- Returns a Matrix that has all the values in an incrementing
,→ order. Useful for debugging and Testing Other functions
Dim result As New Matrix(rows, cols)
Dim i As Integer = 0
For Each d In val(Me)
result.item(Math.Truncate(i / cols) + 1, (i Mod cols) + 1) = d
i += 1
Next
Return result
End Function 'COMPLETED -- Reshapes a matrix into another matrix with shape = (rows, cols)
Public Property item(ByVal i_start As Integer, ByVal i_end As Integer, ByVal j_start As
,→ Integer, ByVal j_end As Integer) As Matrix
Get
Dim result As New Matrix(i_end - i_start + 1, j_end - j_start + 1)
For i As Integer = i_start To i_end
For j As Integer = j_start To j_end
result.item(i - i_start + 1, j - j_start + 1) = Me.item(i, j)
Next
Next
Return result
'Get is used to return a Submatrix from a matrix
End Get
Set(ByVal value As Matrix)
For i As Integer = i_start To i_end
For j As Integer = j_start To j_end
Me.item(i, j) = value.item(i - i_start + 1, j - j_start + 1)
Next
Next
'Set is used to set a submatrix of a matrix
End Set
End Property 'COMPLETED -- Property is used to set or get a submatrix of a matrix
Public Shared Function randn(ByVal rows As Integer, ByVal cols As Integer) As Matrix
Return New Matrix(rows, cols, 0, 1)
End Function 'Completed -- Function returns a matrix of values with normal dist
Public Sub transposeSelf() Implements Tensor.transposeSelf
tState = Not tState
shape = New Tuple(Of Integer, Integer)(Me.shape.Item2, Me.shape.Item1)
End Sub 'COMPLETED -- Subroutine used to swap "ij" and also swap indices of shape -
,→ transposes the current matrix - Does not create an instance
Public Shared Function remove_index(ByVal m As Matrix, ByVal row As Integer, ByVal axis As
,→ Integer) As Matrix
'If axis is 0 then a column is deleted else if axis is 1 then a row is deleted
'The row parameter is used to state which row or column is being deleted. The index
,→ starts from 0
If axis = 1 Then
m.transposeSelf()
'The matrix(M) is being transposed as removing a row from the matrix(M) is equivalent to removing a column in the transposed matrix
End If
If axis <> 0 And axis <> 1 Then
Throw New System.Exception("Axis must be 0 or 1")
'An exception is thrown as the axis is limited to 0 or 1, meaning column or row, respectively.
ElseIf Not (0 <= row AndAlso row < m.shape.Item2) Then
Throw New System.Exception("Index was out of range")
'An exception is thrown as the row parameter is not within the acceptable range, so
,→ that the row/column can be deleted
End If
'The following is used to copy all the values of the matrix onto the matrix result. The
,→ elements of the column/row that will be removed will not be copied on to the matrix
,→ result
For j As Integer = 0 To m.shape.Item2 - 1
index_i = 0
If Not (j = row) Then
For i As Integer = 0 To m.shape.Item1 - 1
result.values(index_i, index_j) = m.item(i + 1, j + 1)
index_i += 1
Next
index_j += 1
End If
Next
If axis = 1 Then
result.transposeSelf() 'The matrix is transposed to turn it back into its
,→ original form. The matrix will only be transposed if the axis is 1 as only then
,→ the Matrix is transposed.
End If
Return result
End Function 'COMPLETED --Removes a row/col from the list
Public Shared Function conv(ByVal m As Matrix, ByVal kernel As Matrix, Optional ByVal
,→ stridesx As Integer = 1, Optional ByVal stridesy As Integer = 1, Optional ByVal padding
,→ As String = "valid") As Matrix
Dim paddy, paddx As Integer
If padding = "full" Then
'For a full convolution, the Matrix should be first zero-padded such that the every
,→ element in the matrix can be used to convolve with the kernel, and then a Valid
,→ convolution is applied.
m = Matrix.padd(m, kernel.shape.Item1 - 1, kernel.shape.Item2 - 1)
Return conv(m, kernel, stridesx, stridesy, "valid")
End If
If padding = "same" Then
If ((m.shape.Item1 Mod stridesy) = 0) Then
paddy = Math.Max(kernel.shape.Item1 - stridesy, 0)
Else
paddy = Math.Max(kernel.shape.Item1 - (m.shape.Item1 Mod stridesy), 0)
End If
If ((m.shape.Item2 Mod stridesx) = 0) Then
paddx = Math.Max(kernel.shape.Item2 - stridesx, 0)
Else
paddx = Math.Max(kernel.shape.Item2 - (m.shape.Item2 Mod stridesx), 0)
End If
m = Matrix.addcol(m, 1, Math.Floor(paddy / 2), 1)
m = Matrix.addcol(m, m.shape.Item1 + 1, paddy - Math.Floor(paddy / 2), 1)
m = Matrix.addcol(m, 1, Math.Floor(paddx / 2), 0)
m = Matrix.addcol(m, m.shape.Item2 + 1, paddx - Math.Floor(paddx / 2), 0)
'The amount of padding done for SAME convolution follows the TensorFlow guidelines for the amount of padding
Return conv(m, kernel, stridesx, stridesy, "valid")
ElseIf padding = "valid" Then
Dim result As New Matrix(Math.Truncate((m.shape.Item1 - kernel.getshape(0)) /
,→ stridesy) + 1, Math.Truncate((m.shape.Item2 - kernel.getshape(1)) / stridesx) +
,→ 1)
Dim i As Integer = 0
'The following code is used to compute the resulting convolved Matrix
'The dot product is used as convolution is essentially a series of dot products
For Each S In submatrix(m, kernel.shape.Item2, kernel.shape.Item1, stridesx,
,→ stridesy)
result.values(Math.Truncate(i / result.shape.Item2), i Mod result.shape.Item2)
,→ = Matrix.dotsum(S, kernel)
i += 1
Next
Return result
End If
Console.WriteLine(padding)
Throw New System.Exception("Padding must be either valid, same or full")
End Function 'COMPLETED -- Returns the convolution after a kernel has been applied
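'Example (a hedged usage sketch, not part of the original listing): applying a kernel with
'unit strides and "same" padding, so the output keeps the input's dimensions:
'    Dim y As Matrix = Matrix.conv(m, kernel, 1, 1, "same")
'A "valid" convolution would instead shrink the output to
'(m.rows - kernel.rows + 1, m.cols - kernel.cols + 1) for unit strides.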
Public Overloads Shared Function addcol(ByVal m As Matrix, ByVal index As Integer, ByVal n
,→ As Integer, ByVal axis As Integer) As Matrix
Select Case axis
Case 1
'The matrix(M) is being transposed as adding a row to the matrix(M) is equivalent to adding a column in the transposed matrix
Return addcol(m.transpose, index, n, 0).transpose
Case 0
If index <= 0 Or index > m.shape.Item2 + 1 Then
'The error is thrown here as the index is outside the acceptable bounds for a column to be inserted
Throw New System.Exception("Index must be greater than zero, and at most one more than the size of the dimension in which the col/row is being added")
End If
Dim output As New Matrix(m.shape.Item1, m.shape.Item2 + n) 'adds "n" extra
,→ columns
'The following code inserts all the values from the Matrix (M) into the output
,→ Matrix
For j As Integer = 0 To m.shape.Item1 - 1
For i As Integer = 0 To index - 2
output.values(j, i) = m.item(j + 1, i + 1)
Next
Next
For j As Integer = 0 To output.shape.Item1 - 1
For i As Integer = index To output.shape.Item2 - 1 - n + 1
output.values(j, i + n - 1) = m.item(j + 1, i)
Next
Next
Return output
Case Else
'Error is thrown here as only a column or row can only be inserted
Throw New System.Exception("axis can only be 0 or 1")
End Select
End Function 'COMPLETED -- Adds "n" number of cols/rows to a matrix
Public Overloads Shared Function join(ByVal m As Matrix, ByVal n As Matrix, ByVal index As
,→ Integer) As Matrix
If m.shape.Item1 <> n.shape.Item1 Then
'An error is thrown here is both Matrices do not have the same number of rows
Throw New System.Exception("Number of Rows must be the same for both Matrices")
End If
Dim result As New Matrix(m.shape.Item1, m.shape.Item2 + n.shape.Item2)
Dim i As Integer = 0
For k As Integer = 0 To m.shape.Item2 - 1
If i = index Then
i += n.shape.Item2
End If
For l As Integer = 0 To m.shape.Item1 - 1
result.values(l, i) = m.item(l + 1, k + 1)
Next
i += 1
Next
For k As Integer = 0 To n.shape.Item2 - 1
For l As Integer = 0 To m.shape.Item1 - 1
result.values(l, index + k) = n.item(l + 1, k + 1)
Next
Next
Return result
End Function 'COMPLETED -- Concatenates a Matrix(M) to another Matrix(N) at the specified
,→ axis
Public Function maxpool(ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx
,→ As Integer, ByVal stridesy As Integer) As Matrix
Dim result As New Matrix(((Me.shape.Item1 - kernely) / stridesy) + 1, ((Me.shape.Item2
,→ - kernelx) / stridesx) + 1)
Dim i As Integer = 0
'Following code is used to select the Maximum element out of each submatrix
For Each m In submatrix(Me, kernelx, kernely, stridesx, stridesy) 'submatrix is a
,→ function of type ienumerable
result.values(Math.Truncate(i / result.shape.Item2), i Mod result.shape.Item2) =
,→ m.max()
i += 1
Next
Return result
End Function 'Completed -- Applies maxpooling to a matrix with a kernel of size = (kernelx,
,→ kernely)
Dim maxval As Double = Double.MinValue 'Variable used to store the maximum item in
,→ this column vector
Dim pos As Integer = 0 'Variable used to store the position of the maximum item in
,→ the vector
For i As Integer = 1 To Me.getshape(0)
If Me.item(i, j) > maxval Then 'If the current item is greater than the maximum so far, assign maxval to that item and "pos" to the position of that maximum value
maxval = Me.item(i, j)
pos = i
End If
Next
result.item(1, j) = pos - 1
Next
Return result
End Function 'COMPLETED -- Returns the inverse of Onehot encoding to a matrix
Public Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double =
,→ 1) As Tensor Implements Tensor.normalize
Dim means As Matrix = Matrix.sum(Me, 0) / Me.getshape(0) 'Finds the column means
Dim stds As Matrix = Matrix.op(AddressOf Math.Pow, (Matrix.sum(Matrix.op(AddressOf
,→ Math.Pow, Me, 2), 0) - Matrix.op(AddressOf Math.Pow, Me, 2) * Me.getshape(0)) /
,→ (Me.getshape(0) - 1), 0.5) 'Finds the std for this particular colum
Public Shared Function op(ByVal f As Func(Of Double, Double), ByVal m As Matrix) As Matrix
Dim result As New Matrix(m.shape.Item1, m.shape.Item2)
For i As Integer = 1 To m.shape.Item1
For j As Integer = 1 To m.shape.Item2
result.values(i - 1, j - 1) = f.Invoke(m.item(i, j))
Next
Next
Return result
End Function 'COMPLETED -- Applies a function(double -> double) elementwise to a matrix
Public Shared Function op(ByVal f As Func(Of Double, Double, Double), ByVal m As Matrix,
,→ ByVal n As Double) As Matrix
Dim result As New Matrix(m.shape.Item1, m.shape.Item2)
For i As Integer = 1 To m.shape.Item1
For j As Integer = 1 To m.shape.Item2
result.values(i - 1, j - 1) = f.Invoke(m.item(i, j), n) 'Applies function
,→ "f(x,y)" using parameters (m_ij, n)
Next
Next
Return result
End Function 'COMPLETED -- Applies a function ((double, double) -> double) elementwise to a matrix, using "n" as the second parameter
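'Example (usage sketch): the element-wise square root used by the RMS optimiser,
'    Dim rooted As Matrix = Matrix.op(Function(x) Math.Sqrt(x), s)
'and squaring every element with the two-argument overload,
'    Dim squared As Matrix = Matrix.op(AddressOf Math.Pow, m, 2)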
Public Shared Function mean(ByVal m As Matrix, ByVal axis As Integer) As Matrix
Dim n As Integer = m.getshape(axis)
Return sum(m, axis) / n
End Function 'COMPLETED -- Returns a reduced matrix by calculating the mean in the
,→ direction of "axis"
Return c
End Function 'COMPLETED -- Adds two matrices together elementwise
Next
Return result
End Function 'COMPLETED -- Returns the multiplication of two matrices elementwise
End Operator 'COMPLETED -- Returns the difference of a matrix with a decimal number
Return result
End Operator 'COMPLETED -- Compares each value in the matrix elementwise, with another item
,→ and returns 1 if value is larger,
'Else returns 0 If value was smaller
'Iterators
Public Shared Iterator Function val(ByVal m As Matrix, Optional stepx As Integer = 1,
,→ Optional stepy As Integer = 1) As IEnumerable(Of Double)
For i As Integer = 1 To m.getshape(0) Step stepy
For j As Integer = 1 To m.getshape(1) Step stepx
Yield m.item(i, j)
Next
Next
End Function 'COMPLETED -- Returns items from a matrix, using a stepsize of (stepx, stepy)
'All iterators defined
'Matrix Class Completed
Public Shared Function norm(ByVal mean As Decimal, ByVal std As Decimal) As Double
Randomize()
Return Math.Sqrt(-2 * Math.Log(Rnd())) * Math.Cos(2 * Math.PI * Rnd()) * std + mean
End Function 'Used to Create random normal numbers using the box-muller transform
End Class
6.3 NeuralDot.Volume
Imports NeuralDot
'Constructors
Public Sub New(ByVal initial_values As List(Of Matrix))
values = initial_values
shape = New Tuple(Of Integer, Integer)(initial_values(0).getshape(0),
,→ initial_values(0).getshape(1))
End Sub 'This constructor is used to initialise the volume with initial values
Public Sub New(ByVal h As Integer, ByVal w As Integer, ByVal d As Integer, Optional ByVal
,→ val As Double = 0)
For i As Integer = 1 To d
values.Add(New Matrix(h, w, val))
Next
shape = New Tuple(Of Integer, Integer)(h, w)
End Sub 'Constructor creates a volume of shape = (h, w, d) with all values set to "val"
Public Sub New(ByVal h As Integer, ByVal w As Integer, ByVal d As Integer, ByVal mean As
,→ Double, ByVal std As Double)
For i As Integer = 1 To d
values.Add(New Matrix(h, w, mean, std))
Next
shape = New Tuple(Of Integer, Integer)(h, w)
End Sub 'Constructor creates a volume of shape = (h, w, d) with all values normally distributed with a set mean and std
'Accessors
Public Property split(ByVal i_start As Integer, ByVal i_end As Integer, ByVal j_start As
,→ Integer, ByVal j_end As Integer, ByVal k As Integer) As Matrix
Get
Dim result As New Matrix(i_end - i_start + 1, j_end - j_start + 1)
For i As Integer = i_start To i_end
For j As Integer = j_start To j_end
result.item(i - i_start + 1, j - j_start + 1) = Me.item(i, j, k)
Next
Next
Return result
End Get
Set(ByVal value As Matrix)
For i As Integer = i_start To i_end
For j As Integer = j_start To j_end
Me.item(i, j, k) = value.item(i - i_start + 1, j - j_start + 1)
Next
Next
End Set
End Property 'COMPLETED -- Property used to set/select a portion of a volume
Public ReadOnly Property split(ByVal i_start As Integer, ByVal i_end As Integer, ByVal
,→ j_start As Integer, ByVal j_end As Integer) As Volume
Get
Dim result As New List(Of Matrix)
For Each M In Items(Me)
result.Add(M.item(i_start, i_end, j_start, j_end))
Next
Return New Volume(result)
End Get
End Property 'COMPLETED -- Property used to return a Volume of shape (i_end - i_start + 1, j_end - j_start + 1, depth), where each layer is the selected sub-matrix of the corresponding layer in the original volume
'Volume Operations
Public Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double =
,→ 1) As Tensor Implements Tensor.normalize
Dim n As Integer = Me.shape.Item1 * Me.shape.Item2 * Me.values.Count
Dim means As Double = Me.values.Select(Function(x) Matrix.sum(x).item(1, 1)).Sum / n
Dim stds As Double = Math.Sqrt((Me.values.Select(Function(x) Matrix.sum(x * x).item(1,
,→ 1)).Sum - (mean * mean * n)) / (n - 1))
Dim result As New List(Of Matrix)
For Each M In Volume.Items(Me)
result.Add((M - means) / stds)
Next
Return New Volume(result)
End Function 'COMPLETED -- Returns a volume whose layers are normalised using all the elements in the volume
'Joint Operations
Return op(x, AddressOf Matrix.add, y)
End Operator 'COMPLETED -- Adds a scalar to a Volume elementwise
Public Shared Function conv2d(ByVal v As Volume, ByVal kernels As Volume, Optional stridesx
,→ As Integer = 1, Optional stridesy As Integer = 1, Optional padding As String = "valid")
,→ As Volume
'conv2d applies a convolution in 2 dimensions: every 2D kernel is applied to every layer in the volume.
Dim result_values As New List(Of Matrix) : Dim all_channels As List(Of Matrix) =
,→ Items(v).ToList
Public Shared Function maxpool(ByVal filter As Volume, ByVal kernely As Integer, ByVal
,→ kernelx As Integer, ByVal stridesy As Integer, ByVal stridesx As Integer) As Volume
Dim result As New List(Of Matrix)
For Each M In Items(filter)
result.Add(M.maxpool(kernelx, kernely, stridesx, stridesy))
Next
Return New Volume(result)
End Function 'COMPLETED -- Returns the maxpooling of a volume using a kernel of shape
,→ (kernelx, kernely) and step size = (stridesx, stridesy)
Public Shared Function op(ByVal x As Volume, ByVal f As Func(Of Matrix, Matrix, Matrix),
,→ ByVal y As Volume) As Volume
Dim result As New List(Of Matrix)
For i As Integer = 0 To x.values.Count - 1
result.Add(f.Invoke(x.values(i), y.values(i)))
Next
Return New Volume(result)
End Function 'COMPLETED -- Applies a function f(Matrix, Matrix) -> Matrix, to all the
,→ layers in the Volumes x and y
Public Shared Function op(ByVal v As Volume, ByVal f As Func(Of Matrix, Matrix, Matrix),
,→ ByVal m As Matrix) As Volume
Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
For Each x In Items(v)
result.values.Add(f.Invoke(x, m))
Next
Return result
End Function 'COMPLETED -- Applies a function "f" to the Volume "v" and the Matrix "m" layer-wise
Public Shared Function op(ByVal m As Matrix, ByVal f As Func(Of Matrix, Matrix, Matrix),
,→ ByVal v As Volume) As Volume
Return op(v, f, m)
End Function 'COMPLETED -- Overload with the Matrix and Volume arguments swapped; applies "f" layer-wise
Public Shared Function op(ByVal v As Volume, ByVal f As Func(Of Matrix, Double, Matrix),
,→ ByVal y As Double) As Volume
Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
For Each x In Items(v)
result.values.Add(f.Invoke(x, y))
Next
Return result
End Function 'COMPLETED -- Function applies a function "f(v,y)" to a volume "v" and a
,→ double "y"
Public Shared Function op(ByVal y As Double, ByVal f As Func(Of Matrix, Double, Matrix),
,→ ByVal v As Volume) As Volume
Return Volume.op(v, f, y)
End Function 'COMPLETED -- Function applies a function "f(v,y)" to a volume "v" and a
,→ double "y"
Public Shared Function op(ByVal f As Func(Of Matrix, Matrix), ByVal v As Volume) As Volume
Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
For Each x In Items(v)
result.values.Add(f.Invoke(x))
Next
Return result
End Function 'COMPLETED -- Applies a function layer-wise to a Volume
Public Shared Function op(ByVal f As Func(Of Double, Double), ByVal v As Volume) As Volume
Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
For Each m In Items(v)
result.values.Add(Matrix.op(f, m))
Next
Return result
End Function 'COMPLETED -- Applies a function elementwise to a Volume
'All Joint Operations Defined
'Iterators
Public Overloads Shared Iterator Function Items(ByVal v As Volume) As IEnumerable(Of
,→ Matrix)
For Each M In v.values
Yield M
Next
End Function 'COMPLETED -- Returns an IEnumerable of all the layers in the Volume
'Casts
Public Shared Function cast(ByVal matrixList As List(Of Matrix), ByVal rows As Integer,
,→ ByVal cols As Integer) As List(Of Volume)
Dim l_v As New List(Of Volume)
For Each M In matrixList
l_v.Add(Volume.cast(M, rows, cols))
Next
Return l_v
End Function 'COMPLETED -- Casts a list of matrices into volumes elementwise
Public Shared Function cast(ByVal v As Volume, ByVal rows As Integer, ByVal cols As
,→ Integer) As Matrix
If rows * cols <> v.shape.Item1 * v.shape.Item2 * v.values.Count Then
Throw New System.Exception("Dimensions for matrix must only be sufficient to store
,→ all the items in the Volume")
End If
Dim result As New Matrix(rows, cols)
Dim i As Integer = 0
For Each M In Volume.Items(v)
For Each k In Matrix.val(M)
result.item((Math.Truncate(i / result.getshape(1)) + 1), (i Mod
,→ result.getshape(1)) + 1) = k 'Assigning each element in result with
'its corresponding value in the Volume "v"
i += 1
Next
Next
Return result
End Function 'COMPLETED -- Function used to cast a Volume into a matrix of shape = (rows,
,→ cols)
Public Shared Function cast(ByVal m As Matrix, ByVal rows As Integer, ByVal cols As
,→ Integer) As Volume
Dim result As New Volume(rows, cols, m.getshape(0) * m.getshape(1) / (rows * cols))
Dim i As Integer = 0
For Each d In Matrix.val(m)
result.item(((Math.Truncate(i / (cols))) Mod rows) + 1, (i Mod (cols)) + 1,
,→ Math.Truncate(i / ((cols * rows)))) = d
i += 1
Next
Return result
End Function 'COMPLETED -- Function casts a matrix into a volume of shape = (rows, cols)
'All Casts defined
'Volume Class Completed
End Class
6.4 NeuralDot.Layer
'Function will be used to forward propagate through a layer
Function clone() As Layer(Of T)
'Function will be used to clone a layer. This is useful when saving a model, as all layers will need to be cloned.
Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor, ByVal
,→ ParamArray param() As Tensor) As Tensor
'This function is used when a layer depends upon the previous layer's parameters. Therefore, this function is a MustInherit, as when users define their
'own layers they may need to use it, depending upon how forward propagation works within the layers
6.5 NeuralDot.Dense
Imports NeuralDot
'x_in, w, z, b, a are all variables of type matrix that will be used by the net.
'Relationship between variables Is z = matmul(w,x_in) + b : a = act(z) & w = (units,
,→ prev_units)
'w and b are the parameter that will be optimised to lower cost
'x_in is the input into the layer and act is the activation for this current layer
Public Sub New(ByVal prevUnits As Integer, ByVal layerUnits As Integer, ByVal layerAct As
,→ Mapping, ByVal mean As Double, ByVal std As Double)
units = layerUnits : act = layerAct
w = New Matrix(layerUnits, prevUnits, mean, std)
b = New Matrix(layerUnits, 1, mean, std)
End Sub 'Constructor initialises parameters that will be tuned
Public Overridable Function clone() As Layer(Of Matrix) Implements Layer(Of Matrix).clone
Dim cloned As New Dense(Me.w.getshape(1), units, act, 0, 0)
cloned.w = w.clone : cloned.b = b.clone
Return cloned
End Function 'Function used to clone a layer when saving a model. Note: a forward propagation must be made before training the clone, as variables such as the layer outputs will not have been instantiated.
End Class
6.6 NeuralDot.Conv
Imports NeuralDot
'x_in, filter, z, b, a are all variables, with the trainable variables being only filter
,→ and b
'Relationship between variables is: z = conv2d(x_in, filter) + b : a = act(z)
'act is the activation function being used for this layer
'kernelx, kernely denote the width and height of the kernel being applied, respectively.
'stridesx, stridesy and padding denote the properties of the type of conv2d
Public Sub New(ByVal _filters_depth As Integer, ByVal _kernelx As Integer, ByVal _kernely
,→ As Integer, ByVal _stridesx As Integer, ByVal _stridesy As Integer,
ByVal _padding As String, ByVal _act As Mapping, ByVal mean As Double, ByVal
,→ std As Double)
kernelx = _kernelx : kernely = _kernely : filters_depth = _filters_depth
stridesx = _stridesx : stridesy = _stridesy : act = _act : padding = _padding
filter = New Volume(kernely, kernelx, filters_depth, mean, std) : b = New Volume(1, 1,
,→ _filters_depth, mean, std)
End Sub 'Constructor initialises parameters that will be tuned
b += deltaParams(1)
'sub-routine allows user to make their own update to the layer - Useful when user wants
,→ to create/test their own optimization algorithms
End Sub
For f As Integer = 0 To filter.values.Count - 1
Dim k As Integer = 0
For i As Integer = 0 To x_in.shape.Item1 - kernely Step stridesy
For j As Integer = 0 To x_in.shape.Item2 - kernelx Step stridesx
dx_channel_sum.split(i + 1, i + kernely, j + 1, j + kernelx, 0) =
,→ filter.values(f) * dz.item(Math.Truncate(k / dz.shape.Item2) + 1,
,→ (k Mod dz.shape.Item2) + 1, f)
k += 1
Next
Next
dx.Add(dx_channel_sum.mean(2))
Next
Next
6.7 NeuralDot.Reshape
Imports NeuralDot
'This reshape layer isn't limited to transforming from a conv-layer to a dense-layer, but
,→ can be used to go from one user-defined layer, of type Volume
'to another of type Matrix
Public Sub New(ByVal _v_rows As Integer, ByVal _v_cols As Integer, ByVal _v_depth As
,→ Integer)
v_rows = _v_rows : v_cols = _v_cols : v_depth = _v_depth : units = (_v_cols * _v_rows *
,→ _v_depth)
End Sub
Dim cloned As New Reshape(v_rows, v_cols, v_depth)
Return cloned
End Function 'Function used to clone a layer used when saving a model.
End Class
6.8 NeuralDot.MaxPool
Imports NeuralDot
Public Sub New(ByVal _kernelx As Integer, ByVal _kernely As Integer, ByVal _stridesx As
,→ Integer, ByVal _stridesy As Integer)
kernelx = _kernelx : kernely = _kernely
stridesx = _stridesx : stridesy = _stridesy
End Sub
'Following code will find the gradient w.r.t the maxpooled elements.
'This algorithm can be derived through chain-rule and matrix-algebra.
For x_channel As Integer = 0 To x_channels.Count - 1
Dim x_slices As List(Of Matrix) = Matrix.submatrix(x_channels(x_channel), kernelx,
,→ kernely, stridesx, stridesy).ToList
Dim temp As New Matrix(x_in.shape.Item1, x_in.shape.Item2)
Dim k As Integer = 0
Next
6.9 NeuralDot.Net
Imports NeuralDot
Public Sub AddDense(ByVal units As Integer, ByVal act As Mapping, Optional mean As Double =
,→ 0, Optional std As Double = 1)
'A dense layer consists of a set of units, with the activation applied after the matrix multiplication, and the initial values set.
'The shape of the transformation matrix = (units,units_in_prev_layer)
If netLayers.Count = 0 Then 'If no layer set, then units_in_prev = netFeatures
netLayers.Push(New Dense(netFeatures, units, act, mean, std))
ElseIf netLayers.Peek.GetType = GetType(Reshape) Then
'If prevLayers was a conv_layer then units_in_prev = reshape.units
netLayers.Push(New Dense(DirectCast(netLayers.Peek(), Reshape).units, units, act,
,→ mean, std))
Else
netLayers.Push(New Dense(DirectCast(netLayers.Peek(), Dense).units, units, act,
,→ mean, std))
End If
End Sub
Public Sub AddConv(ByVal filters As Integer, ByVal kernelx As Integer, ByVal kernely As
,→ Integer, ByVal stridesx As Integer, ByVal stridesy As Integer,
ByVal act As Mapping, Optional ByVal padding As String = "valid", Optional
,→ ByVal mean As Double = 0, Optional ByVal std As Double = 1)
netLayers.Push(New Conv(filters, kernelx, kernely, stridesx, stridesy, padding, act,
,→ mean, std))
End Sub 'This sub-routine adds a conv-layer, with the properties defined by the user
Public Sub AddMaxPool(ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As
,→ Integer, ByVal stridesy As Integer)
netLayers.Push(New MaxPool(kernelx, kernely, stridesx, stridesy))
End Sub 'This sub-routine adds a MaxPooling layer, with properties defined by the user
Public Sub AddReshape(ByVal v_rows As Integer, ByVal v_cols As Integer, ByVal v_depth As
,→ Integer)
netLayers.Push(New Reshape(v_rows, v_cols, v_depth))
End Sub 'This sub adds a reshape-layer that will be used to reshape a volume into a matrix.
,→ This is used to go from one-type of layer of type Tensor
'to another of type Tensor
End Class
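To illustrate how the Add* routines above compose, here is a hedged sketch of building a small dense network. The Net constructor's exact parameter list is not shown in the extracted listing; it is assumed here to take the loss Mapping and the number of input features, so treat this as illustrative only.

Dim model As New Net(Mapping.squared_error, 5)   'Assumed constructor: loss function and number of input features
model.AddDense(10, Mapping.relu)                 'Hidden layer with 10 neurons
model.AddDense(1, Mapping.sigmoid)               'Output layer with a single neuron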
6.10 NeuralDot.Mapping
'act is a delegate function that is used to map a matrix of numbers to another matrix of
,→ numbers via a function "f" which maps a double to a double,
'i.e f(double) -> double
'loss is a delegate function that will be used to find the error and will be used for
,→ backprop. loss(x, y) - > e,
'where "x" represent the prediction, "y" represents the true value And "e" represents the
,→ error
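'A user-defined activation can be created in the same style as the built-ins below
'(an illustrative sketch, not part of the original listing):
'    Dim cube As New Mapping(Function(x) x * x * x,
'                            Function(x) 3 * (x * x))
'The first lambda is the forward mapping f and the second is its derivative, both operating on whole matrices.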
'The following are the lists of activation functions defined, and therefore can be used by
,→ the network
Public Shared linear As New Mapping(Function(x) x, Function(x) New Matrix(x.getshape(0),
,→ x.getshape(1), 1))
Public Shared relu As New Mapping(Function(x) x.max(0), Function(x) 0 < x)
Public Shared sigmoid As New Mapping(AddressOf sigmoidAct, AddressOf sigmoidDerivative)
Public Shared tanh As New Mapping(Function(x) Matrix.op(AddressOf Math.Tanh, x),
,→ Function(x) (1 / Matrix.op(AddressOf Math.Cosh, x)) * (1 / Matrix.op(AddressOf
,→ Math.Cosh, x)))
Public Shared swish As New Mapping(Function(x) x * sigmoid.f(x), Function(x) sigmoid.f(x) *
,→ x * sigmoid.d(x))
Public Shared softmax_act As New Mapping(Function(x) Matrix.exp(x - x.max()) /
,→ Matrix.sum(Matrix.exp(x - x.max()), 0).item(1, 1), Function(x) softmax_act.f(x) * (1 -
,→ softmax_act.f(x)))
'The following code for the sigmoid function is for the extra speed-up, as the derivative of sigmoid is s*(1-s), where s = sigmoid(x)
Private Shared Function sigmoidAct(ByVal x As Matrix) As Matrix
Return 1 / (1 + Matrix.op(AddressOf Math.Exp, -1 * x))
End Function
'The following are the lists of loss functions defined, and therefore can be used by the
,→ network
'The x value represents the prediction by the net, whereas the y value represents the
,→ actual value
Public Shared squared_error As New Mapping(Function(x, y) (x - y) * (x - y) * 0.5,
,→ Function(x, y) (x - y))
Public Shared softmax As New Mapping(Function(x, y) -1 * Matrix.op(AddressOf Math.Log, x) *
,→ y, Function(x, y) (x - y))
End Class
6.11 NeuralDot.netData
Imports NeuralDot
'The xdata and ydata variables will store the inputs and the corresponding outputs,
,→ respectively
End Function
'This function extracts data stored in a CSV file and stores the data in a matrix.
End Class
6.12 NeuralDot.Optimiser
Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
model = _net
dataxy = xydata
End Sub
MustOverride Sub resetParameters() 'This sub-routine resets the parameters being used to
,→ train the net. This includes the iterations variable.
For batchNum As Integer = 0 To dataxy.Count / batchSize - 1
Dim temp As New List(Of Tuple(Of Tensor, Tensor)) 'The Temp list will store the
,→ examples for a particular batch for the gradient descent
For n As Integer = 0 To batchSize - 1
temp.Add(dataxy(batchNum * batchSize + n))
Next
batchdata.Add(temp.AsEnumerable)
Next
Return batchdata
End Function 'This function will organise all the data examples into separate batches for mini-batch gradient descent
End Class
6.13 NeuralDot.GradientDescentOptimiser
Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)
resetParameters()
End Sub
End Class
6.14 NeuralDot.AdvancedOptimisers
Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)
'The following code, makes sure that the net is a dense net only
For Each layer In _net.netLayers
If layer.GetType <> GetType(Dense) Then
Throw New System.Exception("For AdvancedOptimizers, Network must be a dense
,→ neural network")
End If
Next
model.predict(xydata.ToList(0).Item1) 'By running this line of code, Object reference
,→ errors are avoided as any non initialised variables are then initialised.
End Sub
End Class
6.15 NeuralDot.Momentum
Class Momentum
Inherits AdvancedOptimisers
Private v_dw, v_db As New List(Of Matrix)
Public Sub New(ByVal _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)
resetParameters()
End Sub
Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal
,→ batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 1 Then
Throw New System.Exception("Momentum requires 1 parameters for training")
End If
For Each batch In batches 'Looping through each batch, as we are doing mini-batch
,→ gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) =
,→ MyBase.calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2 'This line retrieves the gradients for w & b
'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses
End Function
End Class
6.16 NeuralDot.RMS
Class RMS
Inherits AdvancedOptimisers
Private s_dw, s_db As New List(Of Matrix)
Public Sub New(ByVal _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)
resetParameters()
End Sub
For n As Integer = 0 To model.netLayers.Count - 1
Dim layer_par As List(Of Tensor) = model.netLayers(n).parameters
s_dw.Add(New Matrix(layer_par(0).getshape(0), layer_par(0).getshape(1)))
s_db.Add(New Matrix(layer_par(1).getshape(0), layer_par(1).getshape(1)))
Next
End Sub
Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal
,→ batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 1 Then
Throw New System.Exception("RMS requires 1 parameters for training")
End If
For Each batch In batches 'Looping through each batch, as we are doing mini-batch
,→ gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2
'Following code applies the RMS optimisation technique to the network
For layer As Integer = 0 To model.netLayers.Count - 1
s_dw(layer) = s_dw(layer) * param(0) + (1 - param(0)) * dw(layer) * dw(layer)
s_db(layer) = s_db(layer) * param(0) + (1 - param(0)) * db(layer) * db(layer)
model.netLayers(layer).deltaUpdate(-l_r * dw(layer) * (1 /
,→ (Matrix.op(Function(x) Math.Sqrt(x), s_dw(layer)) + 0.000001)), -l_r *
,→ db(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_db(layer)) +
,→ 0.000001)))
Next
Next
'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses
End Function
End Class
6.17 NeuralDot.ADAM
Public Sub New(ByVal _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)
resetParameters()
End Sub
Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal
,→ batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 2 Then
Throw New System.Exception("Adam requires 2 parameters for training")
End If
For Each batch In batches 'Looping through each batch, as we are doing mini-batch
,→ gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2
Next
'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses
End Function
End Class