CVPR-2015
For a small experiment on CIFAR-10, see the blog post 【Keras-Inception v1】CIFAR-10.
1 Background and Motivation
The authors' work was largely inspired by two earlier works: NIN and the theoretical analysis in [2].


Larger DNN models (more depth, more width) tend to perform better, but this comes with two major drawbacks:
- more prone to overfitting, and hungrier for data (annotation is not cheap; fine-grained classification requires expert annotators)
- more computational resources consumed
The fundamental way to address these problems is to make the neural network sparser: when the distribution of a dataset can be represented by a sparse network, a sparse representation can be obtained by analyzing the correlation of activations and clustering highly correlated neurons ("Their main result [2] states that if the probability distribution of the dataset is representable by a large, very sparse deep neural network, then the optimal network topology can be constructed layer after layer by analyzing the correlation statistics of the preceding layer activations and clustering neurons with highly correlated outputs.").
[2] suggests a layer-by-layer construction where one should analyze the correlation statistics of the last layer and cluster them into groups of units with high correlation.
This echoes the Hebbian principle ("neurons that fire together, wire together"). An everyday example: ring a bell and then feed a dog; after a while, the dog salivates at the sound of the bell alone, because the connection between the neurons that "hear" the bell and the neurons that "control" salivation has been strengthened. More precisely, the Hebbian principle states that if two neurons frequently fire (produce action potentials) at the same time, the connection between them is strengthened; otherwise it weakens.
[2] plus the Hebbian principle form the theoretical support for the sparse structure design.
So the authors came up with a clustering-style module (Inception), combined with 1×1 convolutions to keep things sparse, and the results outperform the competition; truly impressive.
2 Advantages / Contributions
2.1 Advantages
- 12 times fewer parameters than AlexNet, despite more depth and more width, with higher accuracy
- outperforms the state of the art at ILSVRC 2014 (classification and detection challenges)
2.2 Contributions
Our approach yields solid evidence that moving to sparser architectures is a feasible and useful idea in general.
3 Innovation
The design of the Inception module, combined with NIN-style 1×1 convolutions to reduce computation.
4 Method
Origin of the name "Inception":
- 【NIN】《Network In Network》
- the "We need to go deeper" internet meme
"Deep" here has two meanings:
- the literal depth of the network
- a deeper level of design (the form of the inception module)
Caffe code; Caffe network visualization tool.
For a Keras version, see the series 【Keras-Inception v1】CIFAR-10.
For the full architecture diagram, see the last section of this post, "8 GoogleNet".
Input: 224×224. Feature-map sizes at the inception stages:
- inception3: 224 / 2^3 = 28
- inception4: 224 / 2^4 = 14
- inception5: 224 / 2^5 = 7
In Table 1, the green numbers (the outputs of the four parallel branches) add up to the output channels, e.g. 256 = 64 + 128 + 32 + 32 for inception (3a); a minimal sketch of such a module is given below.
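As a concrete illustration, here is a minimal tf.keras sketch of one Inception module using the inception (3a) filter counts from Table 1. This is my own sketch under the assumption of a Keras-style API (layer choices and the toy input shape are mine), not the authors' Caffe model or the linked Keras series:

```python
# Minimal sketch of one Inception module, filter counts taken from inception (3a) in Table 1.
# Illustrative only -- not the authors' Caffe implementation.
from tensorflow.keras import Input, Model, layers

def inception_3a(x):
    # branch 1: 1x1 convolution
    b1 = layers.Conv2D(64, 1, padding='same', activation='relu')(x)
    # branch 2: 1x1 "reduce" followed by 3x3 convolution
    b2 = layers.Conv2D(96, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(128, 3, padding='same', activation='relu')(b2)
    # branch 3: 1x1 "reduce" followed by 5x5 convolution
    b3 = layers.Conv2D(16, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(32, 5, padding='same', activation='relu')(b3)
    # branch 4: 3x3 max pooling followed by the 1x1 "pool proj" (cf. Q4 below)
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(32, 1, padding='same', activation='relu')(b4)
    # concatenate along the channel axis: 64 + 128 + 32 + 32 = 256 channels
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])

inp = Input((28, 28, 192))            # inception (3a) sees 28x28x192 feature maps
out = inception_3a(inp)
print(Model(inp, out).output_shape)   # (None, 28, 28, 256)
```

The 1×1 "reduce" layers before the 3×3 and 5×5 branches shrink the channel count first, which is where most of the computational savings come from (this relates to Q3 below).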
Auxiliary classifiers:
- train: the losses of the auxiliary classifiers (after inception 4a and 4d; they combat the vanishing gradient and provide regularization) were weighted by 0.3 (a loss-weighting sketch follows below)
- inference: these auxiliary networks are discarded
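Below is a minimal sketch of the 0.3 loss weighting, assuming tf.keras; the tiny backbone and single auxiliary head are stand-ins of my own, whereas GoogLeNet attaches two auxiliary heads (after inception 4a and 4d):

```python
# Sketch: weight the auxiliary classifier's loss by 0.3 during training, drop it at inference.
# The backbone below is a toy stand-in for GoogLeNet; shapes and layer names are illustrative.
from tensorflow.keras import Input, Model, layers

inp = Input((224, 224, 3))
x = layers.Conv2D(32, 3, strides=2, activation='relu')(inp)
x = layers.Conv2D(64, 3, strides=2, activation='relu')(x)

# auxiliary head attached to an intermediate feature map
aux = layers.GlobalAveragePooling2D()(x)
aux_out = layers.Dense(1000, activation='softmax', name='aux')(aux)

# main head on top of the full backbone
x = layers.Conv2D(128, 3, strides=2, activation='relu')(x)
main = layers.GlobalAveragePooling2D()(x)
main_out = layers.Dense(1000, activation='softmax', name='main')(main)

train_model = Model(inp, [main_out, aux_out])
train_model.compile(optimizer='sgd',
                    loss='categorical_crossentropy',
                    loss_weights={'main': 1.0, 'aux': 0.3})  # auxiliary loss weighted by 0.3

# inference: keep only the main output, i.e. discard the auxiliary branch
deploy_model = Model(inp, main_out)
```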
5 Dataset
- ILSVRC 2014 Classification Challenge (1000 classes)
  - training: about 1.2 million images
  - validation: 50,000 images
  - testing: 100,000 images
- ILSVRC 2014 Detection Challenge (200 classes)
6 Experiments
6.1 ILSVRC 2014 Classification Challenge
A nod to the "OG"s (comparison with earlier winning approaches).
Ensembling was used (the multi-crop evaluation is applied at test time).
6.2 ILSVRC 2014 Detection Challenge
A nod to the "OG"s (comparison with earlier winning approaches).
Single model vs. ensemble: GoogLeNet gains much more from ensembling ("team fight") than Deep Insight does, although as a single model ("solo") it is outperformed.
7 Conclusion / Future work
Still, our approach yields solid evidence that moving to sparser architectures is a feasible and useful idea in general.
Q1: what do V/S mean in Section 7? (valid and same padding; see the sketch after this list)
Q2: what does the "Contextual model" in Table 5 refer to?
Q3: why use 1×1 convolutions before the 3×3 and 5×5 convolutions, but after max pooling? (My feeling: the former imitates NIN, the latter is purely to reduce computation.)
Q4: what is "pool proj"?
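Regarding Q1, a quick sketch of "valid" (V) versus "same" (S) padding, assuming tf.keras; the sizes below are a toy example of mine, not taken from the paper:

```python
# 'V' = valid padding: no padding, so a 3x3 convolution shrinks the feature map.
# 'S' = same padding: zero padding so that output size = ceil(input size / stride).
from tensorflow.keras import Input, Model, layers

inp = Input((28, 28, 192))
v = layers.Conv2D(16, 3, strides=1, padding='valid')(inp)  # 28 -> 26
s = layers.Conv2D(16, 3, strides=1, padding='same')(inp)   # 28 -> 28

print(Model(inp, v).output_shape)  # (None, 26, 26, 16)
print(Model(inp, s).output_shape)  # (None, 28, 28, 16)
```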
8 GoogleNet
Use of the auxiliary classifiers:
- train: the losses of the auxiliary classifiers (after inception 4a and 4d) were weighted by 0.3
- inference: these auxiliary networks are discarded
Role of the auxiliary classifiers:
- combat the vanishing gradient (speed up convergence)
- provide regularization (my understanding: regularization prevents overfitting, a bit like requiring the intermediate steps of a solution to be correct, not just the final answer; put differently, higher layers tend to fit complex structure and lower layers tend to fit simple structure, and since the data contains both, attaching auxiliary classifiers to lower layers keeps the network from leaning too heavily on complex structure)
Inception v3's take on the auxiliary classifiers is as follows:
- "The original motivation was to push useful gradients to the lower layers to make them immediately useful and improve the convergence during training by combating the vanishing gradient problem in very deep networks." However, their experiments show that the auxiliary classifiers do not actually speed up convergence; only near the end of training does the network with them end up slightly more accurate than the one without.
- "[the belief] that these branches help evolving the low-level features is most likely misplaced. Instead, we argue that the auxiliary classifiers act as regularizer. This is supported by the fact that the main classifier of the network performs better if the side branch is batch-normalized or has a dropout layer."
In other words, the v3 authors argue that the auxiliary classifiers in GoogleNet do not really combat gradient vanishing (they do not speed up convergence); they act more like regularizers.