cs231n: Understanding CNNs
Conv/FC Filters. The second common strategy is to visualize the weights. These are usually most interpretable
on the first CONV layer which is looking directly at the raw pixel data, but it is possible to also show the filter
weights deeper in the network. The weights are useful to visualize because well-trained networks usually display
nice and smooth filters without any noisy patterns. Noisy patterns can be an indicator of a network that hasn’t
been trained for long enough, or possibly a very low regularization strength that may have led to overfitting.
Typical-looking filters on the first CONV layer (left), and the 2nd CONV layer (right) of a trained AlexNet. Notice that the first-
layer weights are very nice and smooth, indicating a nicely converged network. The color/grayscale features are clustered
because the AlexNet contains two separate streams of processing, and an apparent consequence of this architecture is that
one stream develops high-frequency grayscale features and the other low-frequency color features. The 2nd CONV layer
weights are not as interpretable, but it is apparent that they are still smooth, well-formed, and absent of noisy patterns.
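As a concrete illustration, the snippet below plots the first-layer filters of a pretrained AlexNet as an image grid. This is a minimal sketch assuming a recent torchvision; the layer index and filter shape are specific to torchvision's AlexNet implementation.

```python
import torch
import torchvision
import matplotlib.pyplot as plt

model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
model.eval()

# First CONV layer: 64 filters of shape 3x11x11, looking directly at RGB pixels.
filters = model.features[0].weight.data.clone()            # (64, 3, 11, 11)

# Arrange the filters into a single image grid; scale_each rescales every
# filter to [0, 1] independently so its color structure is visible.
grid = torchvision.utils.make_grid(filters, nrow=8, padding=1,
                                   normalize=True, scale_each=True)

plt.figure(figsize=(6, 6))
plt.imshow(grid.permute(1, 2, 0).numpy())                  # CHW -> HWC for matplotlib
plt.axis("off")
plt.title("First-layer CONV filters")
plt.show()
```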
One problem with this approach is that ReLU neurons do not necessarily have any semantic meaning by
themselves. Rather, it is more appropriate to think of multiple ReLU neurons as the basis vectors of some space
that represents image patches. In other words, the visualization is showing the patches at the edge of the
cloud of representations, along the (arbitrary) axes that correspond to the filter weights. This can also be seen by
the fact that neurons in a ConvNet operate linearly over the input space, so any arbitrary rotation of that space is
a no-op. This point was further argued in Intriguing properties of neural networks by Szegedy et al., where they
perform a similar visualization along arbitrary directions in the representation space.
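The "arbitrary rotation is a no-op" argument can be checked numerically in a toy linear setting: if a code is produced by a linear map and read out by a linear classifier, rotating the code space while folding the inverse rotation into the classifier leaves the scores unchanged, so the individual axes carry no special meaning. The numpy sketch below uses made-up dimensions purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_code, n_classes = 20, 8, 5

W = rng.standard_normal((d_code, d_in))       # layer producing the code
V = rng.standard_normal((n_classes, d_code))  # linear classifier on the code
x = rng.standard_normal(d_in)

# Random rotation of the code space (orthogonal matrix from a QR decomposition).
Q, _ = np.linalg.qr(rng.standard_normal((d_code, d_code)))

scores_original = V @ (W @ x)
scores_rotated  = (V @ Q.T) @ ((Q @ W) @ x)   # rotate the code, un-rotate in the classifier

print(np.allclose(scores_original, scores_rotated))  # True
```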
To produce an embedding, we can take a set of images and use the ConvNet to extract the CNN codes (e.g. in
AlexNet the 4096-dimensional vector right before the classifier, and crucially, including the ReLU non-linearity).
We can then plug these into t-SNE and get a 2-dimensional vector for each image. The corresponding images can
then be visualized in a grid:
t-SNE embedding of a set of images based on their CNN codes. Images that are nearby each other are also close in the CNN
representation space, which implies that the CNN "sees" them as being very similar. Notice that the similarities are more often
class-based and semantic rather than pixel- and color-based. For more details on how this visualization was produced, the
associated code, and related visualizations at different scales, refer to t-SNE visualization of CNN codes.
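A rough sketch of this pipeline is shown below, assuming torchvision and scikit-learn are available; `image_paths` is a hypothetical list of image files, and the 4096-dimensional code is taken after the ReLU that follows AlexNet's penultimate fully-connected layer.

```python
import numpy as np
import torch
import torchvision
import torchvision.transforms as T
from PIL import Image
from sklearn.manifold import TSNE

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()

# Keep everything up to (and including) the ReLU after the penultimate FC layer,
# dropping the final classifier layer, so the output is the 4096-d CNN code.
feature_extractor = torch.nn.Sequential(
    model.features, model.avgpool, torch.nn.Flatten(),
    *list(model.classifier.children())[:-1],
)

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

codes = []
with torch.no_grad():
    for path in image_paths:                  # image_paths: your own image files (hypothetical)
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        codes.append(feature_extractor(x).squeeze(0).numpy())
codes = np.stack(codes)                       # (N, 4096)

# t-SNE maps the 4096-d codes down to a 2-d position for each image.
xy = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(codes)
```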
Occluding parts of the image
One way to investigate which part of the image a classification decision comes from is to plot the probability of the class of
interest as a function of the position of an occluder object: we slide a zeroed-out (grey) patch over the image and record the
class probability at each position.
Three input images (top). Notice that the occluder region is shown in grey. As we slide the occluder over the image we record
the probability of the correct class and then visualize it as a heatmap (shown below each image). For instance, in the left-most
image we see that the probability of Pomeranian plummets when the occluder covers the face of the dog, giving us some level
of confidence that the dog's face is primarily responsible for the high classification score. Conversely, zeroing out other parts of
the image has a relatively negligible impact.
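The experiment itself is only a few lines. The sketch below assumes a pretrained model and a preprocessed input tensor `x` of shape (1, 3, H, W), with `target_class` the index of the class of interest (both hypothetical inputs); the patch size, stride, and fill value are arbitrary choices.

```python
import torch
import torch.nn.functional as F

def occlusion_heatmap(model, x, target_class, patch=32, stride=16, fill=0.0):
    """Slide a zeroed-out patch over the image and record the probability of
    the target class at each position; low values mark regions the prediction
    depends on."""
    model.eval()
    _, _, H, W = x.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heatmap = torch.zeros(rows, cols)
    with torch.no_grad():
        for i in range(rows):
            for j in range(cols):
                occluded = x.clone()
                r, c = i * stride, j * stride
                occluded[:, :, r:r + patch, c:c + patch] = fill   # occlude one patch
                probs = F.softmax(model(occluded), dim=1)
                heatmap[i, j] = probs[0, target_class]
    return heatmap
```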
Visualizing the data gradient and friends
Data Gradient (Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps); a minimal sketch of this method is given below.
DeconvNet.
Guided Backpropagation.
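Of these, the data gradient is the simplest to sketch: the gradient of the raw class score with respect to the input pixels indicates which pixels most influence that score. The helper below is a minimal illustration; the pretrained model and preprocessed input tensor are assumed to be given.

```python
import torch

def saliency_map(model, x, target_class):
    """Data gradient / saliency map: gradient of the class score w.r.t. the input pixels."""
    model.eval()
    x = x.clone().requires_grad_(True)          # x: (1, 3, H, W), hypothetical input
    scores = model(x)                           # raw (unnormalized) class scores
    scores[0, target_class].backward()          # backprop the score of the class of interest
    # Take the maximum absolute gradient over the color channels for each pixel.
    return x.grad.detach().abs().max(dim=1)[0].squeeze(0)   # (H, W)
```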
Fooling ConvNets
Explaining and Harnessing Adversarial Examples
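The fast gradient sign method from that paper is essentially a one-step perturbation: move every pixel by a small epsilon in the direction that increases the loss, which is often enough to flip the predicted class while the change stays imperceptible. The sketch below assumes a pretrained model, a preprocessed input `x`, and its label tensor `y` (all hypothetical); clamping the result back to a valid pixel range is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.01):
    """Fast gradient sign method: one signed gradient step on the image itself."""
    model.eval()
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()
```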
Comparing ConvNets to Human labelers
What I learned from competing against a ConvNet on ImageNet