3b. Hidden Unit Dynamics
Encoder Networks

Inputs    Outputs
10000     10000
01000     01000
00100     00100
00010     00010
00001     00001
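As a concrete companion to this table, here is a minimal numpy sketch (not taken from the slides; all names and hyperparameters are illustrative) that trains an N-M-N encoder by batch gradient descent on exactly these identity patterns:

    import numpy as np

    # Minimal N-M-N encoder: with N=5, M=2 the five one-hot patterns
    # must be squeezed through a 2-dimensional hidden layer.
    N, M = 5, 2
    rng = np.random.default_rng(0)
    X = np.eye(N)                        # inputs = targets (identity task)
    W1 = rng.normal(0, 0.1, (N, M)); b1 = np.zeros(M)
    W2 = rng.normal(0, 0.1, (M, N)); b2 = np.zeros(N)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    eta = 1.0
    for epoch in range(20000):
        H = np.tanh(X @ W1 + b1)         # hidden codes, shape (N, M)
        Y = sigmoid(H @ W2 + b2)         # reconstructed outputs
        dY = Y - X                       # cross-entropy gradient at the logits
        dH = (dY @ W2.T) * (1.0 - H**2)  # backprop through tanh
        W2 -= eta * H.T @ dY / N; b2 -= eta * dY.mean(0)
        W1 -= eta * X.T @ dH / N; b1 -= eta * dH.mean(0)

    # Each row of H is one input's point in the M-dimensional
    # hidden unit space (the space the exercise below asks you to draw).
    print(np.round(np.tanh(X @ W1 + b1), 2))

Setting N, M = 8, 3 gives the 8-3-8 encoder discussed below.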
8–3–8 Encoder

Hinton Diagrams

➛ used to visualize higher dimensions
➛ white = positive, black = negative

[Figure: ALVINN-style steering network with a 30x32 sensor input retina, 4 hidden units, and 30 output units ranging from Sharp Left through Straight Ahead to Sharp Right]
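A Hinton diagram takes only a few lines of matplotlib. This sketch follows the spirit of the standard matplotlib gallery demo; the function name and scaling choices are my own, not the slides':

    import numpy as np
    import matplotlib.pyplot as plt

    def hinton(matrix, ax=None):
        """Hinton diagram: square area shows |weight|,
        white = positive, black = negative."""
        ax = ax or plt.gca()
        ax.set_facecolor("gray")
        ax.set_aspect("equal")
        max_w = np.abs(matrix).max()
        for (y, x), w in np.ndenumerate(matrix):
            color = "white" if w > 0 else "black"
            size = np.sqrt(abs(w) / max_w)   # area proportional to |w|
            ax.add_patch(plt.Rectangle((x - size / 2, y - size / 2),
                                       size, size,
                                       facecolor=color, edgecolor=color))
        ax.set_xlim(-1, matrix.shape[1])
        ax.set_ylim(-1, matrix.shape[0])
        ax.invert_yaxis()

    # e.g. the input-to-hidden weights of an 8-3-8 encoder
    hinton(np.random.default_rng(1).normal(size=(8, 3)))
    plt.show()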
Exercise:
➛ Draw the hidden unit space for 2-2-2, 3-2-3, 4-2-4 and 5-2-5 encoders.
➛ Represent the input-to-hidden weights for each input unit by a point, and the
  hidden-to-output weights for each output unit by a line.
➛ Now consider the 8-3-8 encoder with its 3-dimensional hidden unit space.
  → what shape would be formed by the 8 points representing the
    input-to-hidden weights for the 8 input units?
  → what shape would be formed by the planes representing the
    hidden-to-output weights for each output unit?

Hint: think of two Platonic solids, which are “dual” to each other.
Weight Space Symmetry (8.2)

➛ swap any pair of hidden nodes, and the overall function will be the same
➛ on any hidden node, reverse the sign of all incoming and outgoing weights
  (assuming a symmetric transfer function); the overall function is again the same
  (both symmetries are checked numerically in the sketch after these lists)
➛ hidden nodes with identical input-to-hidden weights would in theory never
  separate, so they all have to begin with different random weights
➛ in practice, all hidden nodes may try to do a similar job at first, then
  gradually specialize.

Controlled Nonlinearity

➛ for small weights, each layer implements an approximately linear function,
  so multiple layers also implement an approximately linear function.
➛ for large weights, the transfer function approximates a step function,
  so computation becomes digital and learning becomes very slow.
➛ with typical weight values, a two-layer neural network implements a function
  which is close to linear, but takes advantage of a limited degree of
  nonlinearity (illustrated in the second sketch below).
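The two weight-space symmetries can be verified numerically. This numpy sketch (mine, not from the slides) builds a small tanh network, swaps two hidden nodes, and flips all the signs around a third; both transformed networks compute exactly the same function:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)  # 3 in, 4 hidden
    W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)  # 2 out

    def net(x, W1, b1, W2, b2):
        return np.tanh(x @ W1 + b1) @ W2 + b2

    x = rng.normal(size=(5, 3))
    y = net(x, W1, b1, W2, b2)

    # 1. Swap hidden nodes 0 and 1 (columns of W1/b1, rows of W2).
    p = [1, 0, 2, 3]
    y_swap = net(x, W1[:, p], b1[p], W2[p, :], b2)

    # 2. Flip the sign of all weights into and out of hidden node 2;
    #    tanh(-z) = -tanh(z), so the output is unchanged.
    W1f, b1f, W2f = W1.copy(), b1.copy(), W2.copy()
    W1f[:, 2] *= -1; b1f[2] *= -1; W2f[2, :] *= -1
    y_flip = net(x, W1f, b1f, W2f, b2)

    print(np.allclose(y, y_swap), np.allclose(y, y_flip))  # True True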
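A second sketch (again illustrative numbers of my choosing) shows controlled nonlinearity directly: scaling the weights moves tanh between its near-linear and step-like regimes.

    import numpy as np

    z = np.linspace(-1.0, 1.0, 5)
    for scale in (0.1, 1.0, 10.0):
        print(scale, np.round(np.tanh(scale * z), 3))

    # scale 0.1: outputs ~= 0.1 * z, i.e. approximately linear
    # scale 10 : outputs jump between -1 and +1, approximately a step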
[Figure: Twin Spirals training data, with x and y in the range −6 to 6]
For example, this Twin Spirals problem is difficult to learn with a 2-layer network,
but it can be learned using a 3-layer network.
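The Twin Spirals data can be generated with the usual Lang and Witbrock (1988) construction; the constants below follow the common benchmark formulation and are illustrative rather than the slide's own.

    import numpy as np

    n = 97                              # points per spiral
    i = np.arange(n)
    phi = i / 16 * np.pi                # angle grows along the spiral
    r = 6.5 * (104 - i) / 104           # radius shrinks toward the centre
    x1, y1 = r * np.cos(phi), r * np.sin(phi)   # spiral 1
    x2, y2 = -x1, -y1                            # spiral 2, rotated 180 deg
    X = np.concatenate([np.stack([x1, y1], 1), np.stack([x2, y2], 1)])
    labels = np.concatenate([np.zeros(n), np.ones(n)])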
[Figures: responses of the second hidden layer, and the network output, for the Twin Spirals task]
Vanishing / Exploding Gradients
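The standard argument can be illustrated numerically: backpropagation multiplies one derivative factor per layer, so repeated factors below 1 shrink the gradient geometrically while factors above 1 blow it up. A minimal sketch (mine, not from the slides):

    # For the sigmoid, the derivative is at most 0.25, so even in the
    # best case the gradient shrinks by a factor of 4 per layer.
    depth = 20
    grad = 1.0
    for layer in range(depth):
        grad *= 0.25
    print(grad)        # ~9e-13 after 20 layers: a vanishing gradient

    # Conversely, per-layer factors above 1 explode:
    print(1.5 ** 20)   # ~3325 after 20 layers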
Activation Functions (6.3)

[Figure: activation function plots over −4 ≤ x ≤ 4]
Activation Functions
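The slides present these as plots; for reference, here are standard closed-form definitions of three common activation functions in numpy (my selection, the slide may list others):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # range (0, 1)

    def tanh(z):
        return np.tanh(z)                 # range (-1, 1), symmetric

    def relu(z):
        return np.maximum(0.0, z)         # range [0, inf)

    z = np.linspace(-4, 4, 9)
    for f in (sigmoid, tanh, relu):
        print(f.__name__, np.round(f(z), 2))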