NEURAL NETWORKS RESEARCH
CHAPTER 1
The first example is a neural network that learns to recognise
handwritten digits.
Each pixel of the image can be a neuron, with a number representing its
grayscale value - that number is known as its activation.
The pixels, in this case 28*28 = 784 of them, make up the first LAYER of the
neural network - imagine all the pixels lined up vertically as the first
layer.
The last layer has 10 neurons, one for each digit 0-9; the activation of each
represents how much the system thinks the image corresponds to that digit.
For this example let's imagine two hidden layers, each with 16 neurons.
The activation state of each layer is determined by the activation state of
the previous layer - the layers are very interlinked.
Each link between neurons in adjacent layers also carries a weight -
again just a number value.
We can take all the activation states and calculate a weighted sum; if we
make the weights associated with a certain pixel pattern larger, the
weighted sum effectively detects that pattern.
Another thing to consider is that the weighted sum can be very large; a
common thing to do is use a function to squish it into the range 0-1, e.g.
the sigmoid function 1/(1 + e^(-x)). This maps very negative inputs close
to zero and large positive ones close to one. You can also add a bias, e.g.
subtract 10 from the weighted sum, to determine WHEN the weighted sum
is meaningful enough to activate the neuron.
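A minimal sketch of what one such neuron computes, assuming numpy and
random placeholder values:

import numpy as np

def sigmoid(x):
    # squishes any real number into the range 0-1
    return 1 / (1 + np.exp(-x))

a = np.random.rand(784)    # previous-layer activations (grayscale values 0-1)
w = np.random.randn(784)   # one weight per incoming connection
b = -10.0                  # bias: weighted sum must exceed 10 before the neuron fires

activation = sigmoid(np.dot(w, a) + b)   # a single neuron's activation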
For this overall example there are 13,002 weights and biases we can tweak,
so learning refers to finding the right weights and biases.
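That count follows from the layer sizes 784, 16, 16, 10:

weights: 784*16 + 16*16 + 16*10 = 12544 + 256 + 160 = 12960
biases:  16 + 16 + 10 = 42
total:   12960 + 42 = 13002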
The activation calculation can be written with vectors and matrices, which
is the common notation for it: a' = sigmoid(Wa + b), where W is the weight
matrix, a the previous layer's activations and b the bias vector.
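A minimal sketch of one layer's vectorised calculation, again assuming
numpy and random placeholder parameters:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

a0 = np.random.rand(784)        # input layer: the 784 pixel activations
W1 = np.random.randn(16, 784)   # weight matrix into the first hidden layer
b1 = np.random.randn(16)        # one bias per hidden neuron

a1 = sigmoid(W1 @ a0 + b1)      # the whole layer at once: a' = sigmoid(Wa + b)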
CHAPTER 2
-how neural networks learn
Uses training data with labels; the network adjusts the weights and biases
to improve its performance.
Each neuron is connected to all neurons in the previous layer; the weights
are a measure of the strength of each connection, and the bias indicates
how large the weighted sum must be before the neuron becomes active.
You can start by initializing all biases and weights randomly. This will
obviously be very inaccurate, so we calculate the cost, similar in spirit to
chi-squared: take the difference between the current output activations
and the ones they should be, and square it. The bigger the cost, the further
we are from an accurate network.
The cost function averages this squared difference over MANY training
examples.
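A minimal sketch of that averaged cost, assuming the network's outputs
and the one-hot labels are stacked into numpy arrays:

import numpy as np

def cost(outputs, targets):
    # outputs, targets: shape (num_examples, 10)
    # squared difference per example, averaged over the whole training set
    return np.mean(np.sum((outputs - targets) ** 2, axis=1))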
To improve, we nudge the weights and biases in the direction of the
negative gradient of the cost function; this method is known as gradient
descent, and it converges towards a local minimum of the cost function.
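A toy illustration of gradient descent on a one-variable cost (not the real
network, just the idea; the learning rate is an assumed step size):

# minimise C(x) = (x - 3)^2, whose gradient is 2(x - 3)
x = 0.0
learning_rate = 0.1
for step in range(100):
    grad = 2 * (x - 3)
    x = x - learning_rate * grad   # step against the gradient
# x ends up close to 3, the minimum of the cost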
CHAPTER 3
This is about backpropagation, the method neural networks use to
compute the gradient of the cost and so to learn.
-you can adjust a neuron's activation by increasing the bias, increasing the
weights, or changing the activations from the previous layer
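A minimal hand-rolled sketch of those three levers for a single sigmoid
neuron, via the chain rule (made-up sizes and values, not a library API):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.2, 0.7, 0.1])   # activations from the previous layer
w = np.random.randn(3)          # weights into this neuron
b = 0.0                         # its bias
y = 1.0                         # desired activation

z = np.dot(w, x) + b
a = sigmoid(z)

# chain rule for the cost C = (a - y)^2
dC_dz = 2 * (a - y) * a * (1 - a)
dC_dw = dC_dz * x   # lever 1: adjust the weights
dC_db = dC_dz       # lever 2: adjust the bias
dC_dx = dC_dz * w   # lever 3: change the previous layer's activations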