CVPR-2015
For a small experiment on CIFAR-10, see the blog post 【Keras-Inception v1】CIFAR-10.
1 Background and Motivation
The authors' work was largely inspired by two earlier works: NIN and the theoretical analysis in [2].


Larger DNN models (more depth, more width) tend to perform better, but this comes with two major drawbacks:
- more prone to overfitting, and hungrier for data (annotation is not cheap; fine-grained classification requires expert annotators)
- more computational resources consumed
The fundamental way to address these problems is to make the neural network sparser: when the distribution of a dataset can be represented by a sparse network, a sparse representation can be obtained by analyzing the correlation of activations and clustering highly correlated neurons ("Their main result [2] states that if the probability distribution of the dataset is representable by a large, very sparse deep neural network, then the optimal network topology can be constructed layer after layer by analyzing the correlation statistics of the preceding layer activations and clustering neurons with highly correlated outputs.").
[2] suggests a layer-by-layer construction where one should analyze the correlation statistics of the last layer and cluster them into groups of units with high correlation.
This echoes the Hebbian principle ("neurons that fire together, wire together"). An everyday example: ring a bell and then feed a dog; after a while, the dog salivates at the sound of the bell alone, because the connection between the neurons that "hear" the bell and the neurons that "control" salivation has been strengthened. More precisely, the Hebbian principle states that if two neurons frequently fire (produce action potentials) at the same time, the connection between them is strengthened; otherwise it weakens.
[2] plus the Hebbian principle form the theoretical support for the sparse structure design.
So the authors came up with a clustering-style module (Inception), combined with 1×1 convolutions to keep things sparse, and the results outperform the competition; truly impressive.
2 Advantages / Contributions
2.1 Advantages
- 12 times fewer parameters than AlexNet, despite more depth and more width, with higher accuracy
- outperforms the state of the art at ILSVRC 2014 (classification and detection challenges)
2.2 Contributions
Our approach yields solid evidence that moving to sparser architectures is a feasible and useful idea in general.
3 Innovation
The design of the Inception module, combined with NIN-style 1×1 convolutions to reduce computation.
4 Method
Origin of the name "Inception":
- 【NIN】《Network In Network》
- the "We need to go deeper" internet meme
"Deep" here has two meanings:
- the literal depth of the network
- a deeper level of design (the form of the inception module)
Caffe code; Caffe network visualization tool.
For a Keras version, see the series 【Keras-Inception v1】CIFAR-10.
For the full architecture diagram, see the last section of this post, "8 GoogleNet".
Input: 224×224. Feature-map sizes at the inception stages:
- inception3: 224 / 2^3 = 28
- inception4: 224 / 2^4 = 14
- inception5: 224 / 2^5 = 7
In Table 1, the green numbers (the outputs of the four parallel branches) add up to the output channels, e.g. 256 = 64 + 128 + 32 + 32 for inception (3a); a minimal sketch of such a module is given below.
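As a concrete illustration, here is a minimal tf.keras sketch of one Inception module using the inception (3a) filter counts from Table 1. This is my own sketch under the assumption of a Keras-style API (layer choices and the toy input shape are mine), not the authors' Caffe model or the linked Keras series:

```python
# Minimal sketch of one Inception module, filter counts taken from inception (3a) in Table 1.
# Illustrative only -- not the authors' Caffe implementation.
from tensorflow.keras import Input, Model, layers

def inception_3a(x):
    # branch 1: 1x1 convolution
    b1 = layers.Conv2D(64, 1, padding='same', activation='relu')(x)
    # branch 2: 1x1 "reduce" followed by 3x3 convolution
    b2 = layers.Conv2D(96, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(128, 3, padding='same', activation='relu')(b2)
    # branch 3: 1x1 "reduce" followed by 5x5 convolution
    b3 = layers.Conv2D(16, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(32, 5, padding='same', activation='relu')(b3)
    # branch 4: 3x3 max pooling followed by the 1x1 "pool proj" (cf. Q4 below)
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(32, 1, padding='same', activation='relu')(b4)
    # concatenate along the channel axis: 64 + 128 + 32 + 32 = 256 channels
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])

inp = Input((28, 28, 192))            # inception (3a) sees 28x28x192 feature maps
out = inception_3a(inp)
print(Model(inp, out).output_shape)   # (None, 28, 28, 256)
```

The 1×1 "reduce" layers before the 3×3 and 5×5 branches shrink the channel count first, which is where most of the computational savings come from (this relates to Q3 below).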
Auxiliary classifiers:
- train: the losses of the auxiliary classifiers (after inception 4a and 4d; they combat the vanishing gradient and provide regularization) were weighted by 0.3 (a loss-weighting sketch follows below)
- inference: these auxiliary networks are discarded
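Below is a minimal sketch of the 0.3 loss weighting, assuming tf.keras; the tiny backbone and single auxiliary head are stand-ins of my own, whereas GoogLeNet attaches two auxiliary heads (after inception 4a and 4d):

```python
# Sketch: weight the auxiliary classifier's loss by 0.3 during training, drop it at inference.
# The backbone below is a toy stand-in for GoogLeNet; shapes and layer names are illustrative.
from tensorflow.keras import Input, Model, layers

inp = Input((224, 224, 3))
x = layers.Conv2D(32, 3, strides=2, activation='relu')(inp)
x = layers.Conv2D(64, 3, strides=2, activation='relu')(x)

# auxiliary head attached to an intermediate feature map
aux = layers.GlobalAveragePooling2D()(x)
aux_out = layers.Dense(1000, activation='softmax', name='aux')(aux)

# main head on top of the full backbone
x = layers.Conv2D(128, 3, strides=2, activation='relu')(x)
main = layers.GlobalAveragePooling2D()(x)
main_out = layers.Dense(1000, activation='softmax', name='main')(main)

train_model = Model(inp, [main_out, aux_out])
train_model.compile(optimizer='sgd',
                    loss='categorical_crossentropy',
                    loss_weights={'main': 1.0, 'aux': 0.3})  # auxiliary loss weighted by 0.3

# inference: keep only the main output, i.e. discard the auxiliary branch
deploy_model = Model(inp, main_out)
```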
5 Dataset
- ILSVRC 2014 Classification Challenge (1000 classes)
  - training: about 1.2 million images
  - validation: 50,000 images
  - testing: 100,000 images
- ILSVRC 2014 Detection Challenge (200 classes)
6 Experiments
6.1 ILSVRC 2014 Classification Challenge
A nod to the "OG"s (comparison with earlier winning approaches).
Ensembling was used (the multi-crop evaluation is applied at test time).
6.2 ILSVRC 2014 Detection Challenge
A nod to the "OG"s (comparison with earlier winning approaches).
Single model vs. ensemble: GoogLeNet gains much more from ensembling ("team fight") than Deep Insight does, although as a single model ("solo") it is outperformed.
7 Conclusion / Future work
Still, our approach yields solid evidence that moving to sparser architectures is a feasible and useful idea in general.
Q1: what do V/S mean in Section 7? (valid and same padding; see the sketch after this list)
Q2: what does the "Contextual model" in Table 5 refer to?
Q3: why use 1×1 convolutions before the 3×3 and 5×5 convolutions, but after max pooling? (My feeling: the former imitates NIN, the latter is purely to reduce computation.)
Q4: what is "pool proj"?
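Regarding Q1, a quick sketch of "valid" (V) versus "same" (S) padding, assuming tf.keras; the sizes below are a toy example of mine, not taken from the paper:

```python
# 'V' = valid padding: no padding, so a 3x3 convolution shrinks the feature map.
# 'S' = same padding: zero padding so that output size = ceil(input size / stride).
from tensorflow.keras import Input, Model, layers

inp = Input((28, 28, 192))
v = layers.Conv2D(16, 3, strides=1, padding='valid')(inp)  # 28 -> 26
s = layers.Conv2D(16, 3, strides=1, padding='same')(inp)   # 28 -> 28

print(Model(inp, v).output_shape)  # (None, 26, 26, 16)
print(Model(inp, s).output_shape)  # (None, 28, 28, 16)
```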
8 GoogleNet
Use of the auxiliary classifiers:
- train: the losses of the auxiliary classifiers (after inception 4a and 4d) were weighted by 0.3
- inference: these auxiliary networks are discarded
Role of the auxiliary classifiers:
- combat the vanishing gradient (speed up convergence)
- provide regularization (my understanding: regularization prevents overfitting, a bit like requiring the intermediate steps of a solution to be correct, not just the final answer; put differently, higher layers tend to fit complex structure and lower layers tend to fit simple structure, and since the data contains both, attaching auxiliary classifiers to lower layers keeps the network from leaning too heavily on complex structure)
Inception v3's take on the auxiliary classifiers is as follows:
- "The original motivation was to push useful gradients to the lower layers to make them immediately useful and improve the convergence during training by combating the vanishing gradient problem in very deep networks." However, their experiments show that the auxiliary classifiers do not actually speed up convergence; only near the end of training does the network with them end up slightly more accurate than the one without.
- "[the belief] that these branches help evolving the low-level features is most likely misplaced. Instead, we argue that the auxiliary classifiers act as regularizer. This is supported by the fact that the main classifier of the network performs better if the side branch is batch-normalized or has a dropout layer."
In other words, the v3 authors argue that the auxiliary classifiers in GoogleNet do not really combat gradient vanishing (they do not speed up convergence); they act more like regularizers.