Classic Classification CNN Models, Part 5: Inception v2/v3

Introduction

Inception v2 and Inception v3 were presented by their authors in a single paper, so we also cover them together in a single blog post.

Google's Inception series of models was originally proposed to tackle two problems with CNN classification models. The first is how to keep classification performance improving as the network gets deeper, instead of saturating at a certain depth the way a plain VGG-style network does (this is also the problem ResNet targets). The second is how to cut a model's computation and memory cost substantially while improving, or at least not hurting, its classification accuracy. Of the two, the authors care most about the second: in today's mobile-internet world, deploying a complex CNN on mobile devices with limited compute and storage, and running it efficiently there, has far greater practical value.

In Inception v1, the GoogLeNet model entered in the competition used only about 5 million parameters. By comparison, AlexNet used about 60 million, and VGG as many as 180 million (mainly because the latter two rely on large FC layers).

In Inception v2 the authors went further and looked for other ways to reduce computation and the number of trainable parameters, and the resulting model reaches a higher state-of-the-art classification accuracy while using fewer parameters.

General principles for model design

For the 'alchemy' that is CNN model design and training, the Googlers, after much experimentation and reflection, distilled the following 'alchemical rules', none of which has yet been falsified. The design of Inception v2/v3 grows directly out of them.

Use bottlenecks with care

If classification accuracy is what you care about, avoid bottleneck modules where you can (see my earlier Inception v1 post for background), and especially avoid them in the early stages of the network. The authors view a CNN as a DAG (directed acyclic graph) in which information flows from bottom to top, and every bottleneck throws away part of that information. So when you do use bottlenecks to save computation and memory, go easy: do not let a 1x1 conv slash the number of feature-map channels too drastically, and if a reduction module is really needed, place it in the later layers of the model.

Wider layers do help

Setting aside the extra computation and memory, increasing the number of kernels in each conv layer helps the model capture and represent local information more effectively; more parameters per layer simply means more representational power per layer. It also makes the model converge faster, in the sense of needing fewer training iterations overall, not necessarily less total wall-clock training time.

Deeper layers (with smaller feature maps) can be compressed aggressively

This one is purely an empirical observation. For feature maps later in the network, you can use a bottleneck-style module to shrink the channel count before running a heavier computation such as a 3x3 conv. The 1x1 reduction op not only does not hurt accuracy, it actually speeds up convergence. The authors' tentative explanation is that at these later stages the smaller feature maps have strongly correlated adjacent units (channels), i.e. plenty of redundancy, so they can be squeezed hard (fewer output channels) without worrying about information loss; if anything the information ends up better organized. (I admit this passage reads like hand-waving at first, and I am a bit lost myself. Since when did SCI papers get as cryptic as Li Shangyin's poetry?)

Balance network depth and width

The Googlers treat deep network design as an optimization problem: given a compute/memory budget, combine and stack layers and modules so as to maximize classification accuracy. This is, of course, also the core idea behind the currently fashionable AutoML.

Their conclusion (again drawn from experiments) is that a successful CNN design must grow depth and width together; a tall-and-skinny or a short-and-fat network never does as well as a well-proportioned one.

Changes introduced in Inception v2

Further factorizing convolutions with larger kernels

Inception v2 carries over the sparse-representation idea of the Inception v1 module. If a sparse Inception module can express multi-scale information so effectively, why not go a step further and factorize its larger-kernel conv layers as well? The Network in Network paper argued that an expressive, complex network can be built from simpler small networks, so adding yet another level of composition might bring us a little closer to the combinatorial complexity of biological neurons. The figure below shows the Inception module used in Inception v1.

The basic Inception module

Factorizing a large kernel into a stack of smaller kernels

The first step is to factorize a 5x5 conv into two stacked 3x3 convs, which cuts the computation to roughly (3x3 + 3x3)/(5x5) = 72% of the original. The figure below shows why the replacement is valid.

A 5x5 conv is equivalent to two stacked 3x3 convs
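The saving is easy to check with a quick sketch. The code walkthrough below uses Caffe prototxt, but for a self-contained illustration here is a PyTorch version; the channel counts and input size are assumptions for the example, not the paper's exact configuration.

import torch
import torch.nn as nn

# A single 5x5 conv vs. two stacked 3x3 convs covering the same 5x5 receptive field.
# Channel counts and input size are illustrative only.
in_ch, out_ch = 64, 64
conv_5x5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2, bias=False)
conv_3x3_x2 = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

x = torch.randn(1, in_ch, 35, 35)
assert conv_5x5(x).shape == conv_3x3_x2(x).shape   # same output resolution
print(n_params(conv_3x3_x2) / n_params(conv_5x5))  # (3*3 + 3*3) / (5*5) = 0.72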

Applying this directly upgrades the Inception module we used in Inception v1 into the new module shown below.

Inception module variant one

Factorizing symmetric convolutions into asymmetric ones

This time a 3x3 conv is factorized into a 1x3 conv followed by a 3x1 conv, which again cuts the cost to roughly (1x3 + 3x1) / (3x3) = 67% of the original. The figure below illustrates the replacement. The authors take the idea further: any nxn conv can be replaced by a 1xn conv followed by an nx1 conv to save computation and memory.

A 3x3 conv is equivalent to a 1x3 conv followed by a 3x1 conv
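The same bookkeeping for the asymmetric factorization, again as a rough PyTorch sketch (n = 3 and the channel count are illustrative; the paper applies the same factorization with larger n as well):

import torch
import torch.nn as nn

# An n x n conv vs. a 1 x n conv followed by an n x 1 conv.
n, ch = 3, 64
conv_nxn = nn.Conv2d(ch, ch, kernel_size=n, padding=n // 2, bias=False)
conv_asym = nn.Sequential(
    nn.Conv2d(ch, ch, kernel_size=(1, n), padding=(0, n // 2), bias=False),
    nn.Conv2d(ch, ch, kernel_size=(n, 1), padding=(n // 2, 0), bias=False),
)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

x = torch.randn(1, ch, 17, 17)
assert conv_nxn(x).shape == conv_asym(x).shape   # same output resolution
print(n_params(conv_asym) / n_params(conv_nxn))  # (1*n + n*1) / (n*n) = 2/n, about 0.67 for n = 3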

Its use gives rise to yet another Inception module variant, shown below.

Inception module variant two

Analyzing the role of the added auxiliary classifiers

In Inception v1, the authors added two extra classification loss heads, one in the middle and one in the lower part of the model, to reduce the problem of vanishing gradients during backpropagation in a deep model.

In Inception v2 the authors keep these extra loss heads, but they rethink their earlier explanation and frankly disown it. Their view now (this was 2015) is that the real role of the extra losses is to regularize the trainable parameters. So they tried adding BN or dropout to the FC layers of these auxiliary heads, found that the classification results did improve a little, and happily announced this 'major' new finding.

Use of the auxiliary (aux) loss
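Purely as an illustration of "an auxiliary head with BN on its FC layer", here is a hypothetical PyTorch sketch; the pooling size, channel counts, and FC width are assumptions for the example, not the exact configuration from the paper or the prototxt.

import torch
import torch.nn as nn

# Hypothetical auxiliary classifier head; all sizes below are illustrative.
aux_head = nn.Sequential(
    nn.AvgPool2d(kernel_size=5, stride=3),           # e.g. 17x17 -> 5x5
    nn.Conv2d(768, 128, kernel_size=1, bias=False),  # 1x1 channel reduction
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.Flatten(),
    nn.Linear(128 * 5 * 5, 1024),                    # the FC layer of the aux head...
    nn.BatchNorm1d(1024),                            # ...with BN added (the 'BN-auxiliary' idea)
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1000),                           # auxiliary logits over the 1000 classes
)

x = torch.randn(8, 768, 17, 17)  # assumed intermediate feature map
print(aux_head(x).shape)         # torch.Size([8, 1000])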

A more efficient downsampling scheme

Deep CNNs repeatedly apply pooling layers to shrink the feature-map size, which inevitably discards information along the way. To limit the loss, the usual practice is to expand the number of channels in the same proportion whenever a pooling layer shrinks the feature maps (a pattern clearly visible in VGG and followed by essentially every CNN design since).

There are two conventional ways to implement this. One is to expand the channels first (usually with a 1x1 conv) and then pool to reduce the feature-map size; the drawback is that the 1x1 conv now runs at full resolution and is very expensive. The other is to pool first and then use a 1x1 conv to expand the channels; but pooling first inevitably throws information away outright, and expanding the channels afterwards is rather a case of shutting the stable door after the horse has bolted. The figure below shows these two traditional options.

The two traditional downsampling approaches

So the authors propose their own, genuinely clever scheme: run a pooling branch and a stride-2 conv branch in parallel, each directly reducing the feature-map size, and then concatenate the resulting feature maps. Neat enough to raise a glass to. The figure below illustrates the method.

The newly proposed, more efficient downsampling method
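A minimal sketch of the parallel downsampling idea, assuming illustrative channel counts and input size (the actual Inception v3 reduction block adds a second conv branch, 1x1 -> 3x3 -> stride-2 3x3, as the mixed_3 excerpt in the code walkthrough below shows):

import torch
import torch.nn as nn

class GridReduction(nn.Module):
    """Parallel stride-2 conv branch and stride-2 pooling branch, concatenated."""
    def __init__(self, in_ch, conv_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, conv_ch, kernel_size=3, stride=2, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2)

    def forward(self, x):
        # Both branches halve the grid size; concatenation grows the channel count
        # without an expensive full-resolution 1x1 expansion beforehand.
        return torch.cat([self.conv(x), self.pool(x)], dim=1)

x = torch.randn(1, 288, 35, 35)
y = GridReduction(288, 384)(x)
print(y.shape)  # torch.Size([1, 672, 17, 17])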

The final Inception v2/Inception v3 models

With all of the above in place, Inception v2/v3 practically falls out by itself. See the table below.

The Inception v2 and Inception v3 models

Structurally, the only difference between the v2 and v3 models is that the auxiliary loss head in Inception v3 uses BN for regularization.

Regularizing the model with label smoothing

The authors argue that the softmax loss concentrates too hard on making the model predict the correct label and push ever further away from all the incorrect ones. A model trained this way may generalize poorly to new data (i.e. it slides easily into overfitting). They therefore correct the loss with a prior distribution over the labels. The loss they finally use is shown below.

The regularized training loss
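In code, the smoothed objective is just a weighted mix of the usual cross entropy and a cross entropy against the uniform prior u(k) = 1/K. A minimal PyTorch sketch (eps = 0.1 and the uniform prior follow the paper; the batch size and K = 1000 are illustrative):

import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, eps=0.1):
    # loss = (1 - eps) * H(q, p) + eps * H(u, p), with q the one-hot target and u uniform
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)  # -log p(correct class)
    uniform_ce = -log_probs.mean(dim=-1)                           # cross entropy against the uniform prior
    return ((1.0 - eps) * nll + eps * uniform_ce).mean()

logits = torch.randn(8, 1000)
target = torch.randint(0, 1000, (8,))
print(label_smoothing_ce(logits, target))
# Recent PyTorch versions expose the same idea as F.cross_entropy(logits, target, label_smoothing=0.1).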

Experimental results

The figure below compares Inception v3 with other models.

Experimental results comparing Inception v3 with other models

Code walkthrough

As usual, let us look at the model design through the Inception v3 prototxt file in Intel Caffe, focusing on the two newly introduced Inception module designs. Reading the prototxt file directly is awkward, since the configuration file is very long; it is easier to load the Caffe prototxt into Netscope and explore the network graphically.

In Intel Caffe the model can be found at models/intel_optimized_models/benchmark/googlenet_v3/train_val.prototxt.

The first excerpt (the mixed_3 block) is in fact the efficient grid-reduction (downsampling) module described above: a stride-2 3x3 conv branch and a 1x1 -> 3x3 -> stride-2 3x3 branch run in parallel with a stride-2 max-pooling branch, and the branch outputs are concatenated.

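# Branch 1: a 3x3 conv with stride 2 (384 outputs) that reduces the grid size directly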
layer {
name: "mixed_3_conv_conv2d"
type: "Convolution"
bottom: "ch_concat_mixed_2_chconcat"
top: "mixed_3_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 384
bias_term: false
pad: 0
kernel_size: 3
stride: 2
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_3_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_3_conv_conv2d"
top: "mixed_3_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_3_conv_relu"
type: "ReLU"
bottom: "mixed_3_conv_conv2d_bn"
top: "mixed_3_conv_conv2d_relu"
}
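# Branch 2: 1x1 reduction (64) -> 3x3 (96) -> 3x3 with stride 2 (96)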
layer {
name: "mixed_3_tower_conv_conv2d"
type: "Convolution"
bottom: "ch_concat_mixed_2_chconcat"
top: "mixed_3_tower_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
bias_term: false
pad: 0
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_3_tower_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_3_tower_conv_conv2d"
top: "mixed_3_tower_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_3_tower_conv_relu"
type: "ReLU"
bottom: "mixed_3_tower_conv_conv2d_bn"
top: "mixed_3_tower_conv_conv2d_relu"
}
layer {
name: "mixed_3_tower_conv_1_conv2d"
type: "Convolution"
bottom: "mixed_3_tower_conv_conv2d_relu"
top: "mixed_3_tower_conv_1_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 96
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_3_tower_conv_1_batchnorm"
type: "BatchNorm"
bottom: "mixed_3_tower_conv_1_conv2d"
top: "mixed_3_tower_conv_1_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_3_tower_conv_1_relu"
type: "ReLU"
bottom: "mixed_3_tower_conv_1_conv2d_bn"
top: "mixed_3_tower_conv_1_conv2d_relu"
}
layer {
name: "mixed_3_tower_conv_2_conv2d"
type: "Convolution"
bottom: "mixed_3_tower_conv_1_conv2d_relu"
top: "mixed_3_tower_conv_2_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 96
bias_term: false
pad: 0
kernel_size: 3
stride: 2
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_3_tower_conv_2_batchnorm"
type: "BatchNorm"
bottom: "mixed_3_tower_conv_2_conv2d"
top: "mixed_3_tower_conv_2_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_3_tower_conv_2_relu"
type: "ReLU"
bottom: "mixed_3_tower_conv_2_conv2d_bn"
top: "mixed_3_tower_conv_2_conv2d_relu"
}
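# Branch 3: 3x3 max pooling with stride 2, applied to the same input as the conv branches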
layer {
name: "max_pool_mixed_3_pool"
type: "Pooling"
bottom: "ch_concat_mixed_2_chconcat"
top: "max_pool_mixed_3_pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
pad: 0
}
}

The next excerpt (the mixed_2 block) is one of the regular early Inception modules, all of whose branches keep stride 1: a plain 1x1 branch, a 1x1 -> 5x5 branch, a 1x1 -> 3x3 -> 3x3 branch (a 5x5 receptive field built from two stacked 3x3 convs), and an average-pool -> 1x1 branch, concatenated at the end.

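# Branch 1: a plain 1x1 conv (64 outputs)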
layer {
name: "mixed_2_conv_conv2d"
type: "Convolution"
bottom: "ch_concat_mixed_1_chconcat"
top: "mixed_2_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
bias_term: false
pad: 0
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_2_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_2_conv_conv2d"
top: "mixed_2_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_2_conv_relu"
type: "ReLU"
bottom: "mixed_2_conv_conv2d_bn"
top: "mixed_2_conv_conv2d_relu"
}
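# Branch 2: 1x1 reduction (48) -> 5x5 conv (64)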
layer {
name: "mixed_2_tower_conv_conv2d"
type: "Convolution"
bottom: "ch_concat_mixed_1_chconcat"
top: "mixed_2_tower_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 48
bias_term: false
pad: 0
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_2_tower_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_2_tower_conv_conv2d"
top: "mixed_2_tower_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_2_tower_conv_relu"
type: "ReLU"
bottom: "mixed_2_tower_conv_conv2d_bn"
top: "mixed_2_tower_conv_conv2d_relu"
}
layer {
name: "mixed_2_tower_conv_1_conv2d"
type: "Convolution"
bottom: "mixed_2_tower_conv_conv2d_relu"
top: "mixed_2_tower_conv_1_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
bias_term: false
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_2_tower_conv_1_batchnorm"
type: "BatchNorm"
bottom: "mixed_2_tower_conv_1_conv2d"
top: "mixed_2_tower_conv_1_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_2_tower_conv_1_relu"
type: "ReLU"
bottom: "mixed_2_tower_conv_1_conv2d_bn"
top: "mixed_2_tower_conv_1_conv2d_relu"
}
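# Branch 3: 1x1 reduction (64) -> 3x3 (96) -> 3x3 (96), a 5x5 receptive field built from two stacked 3x3 convs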
layer {
name: "mixed_2_tower_1_conv_conv2d"
type: "Convolution"
bottom: "ch_concat_mixed_1_chconcat"
top: "mixed_2_tower_1_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
bias_term: false
pad: 0
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_2_tower_1_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_2_tower_1_conv_conv2d"
top: "mixed_2_tower_1_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_2_tower_1_conv_relu"
type: "ReLU"
bottom: "mixed_2_tower_1_conv_conv2d_bn"
top: "mixed_2_tower_1_conv_conv2d_relu"
}
layer {
name: "mixed_2_tower_1_conv_1_conv2d"
type: "Convolution"
bottom: "mixed_2_tower_1_conv_conv2d_relu"
top: "mixed_2_tower_1_conv_1_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 96
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_2_tower_1_conv_1_batchnorm"
type: "BatchNorm"
bottom: "mixed_2_tower_1_conv_1_conv2d"
top: "mixed_2_tower_1_conv_1_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_2_tower_1_conv_1_relu"
type: "ReLU"
bottom: "mixed_2_tower_1_conv_1_conv2d_bn"
top: "mixed_2_tower_1_conv_1_conv2d_relu"
}
layer {
name: "mixed_2_tower_1_conv_2_conv2d"
type: "Convolution"
bottom: "mixed_2_tower_1_conv_1_conv2d_relu"
top: "mixed_2_tower_1_conv_2_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 96
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_2_tower_1_conv_2_batchnorm"
type: "BatchNorm"
bottom: "mixed_2_tower_1_conv_2_conv2d"
top: "mixed_2_tower_1_conv_2_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_2_tower_1_conv_2_relu"
type: "ReLU"
bottom: "mixed_2_tower_1_conv_2_conv2d_bn"
top: "mixed_2_tower_1_conv_2_conv2d_relu"
}
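# Branch 4: 3x3 average pooling (stride 1, pad 1) -> 1x1 conv (64)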
layer {
name: "AVE_pool_mixed_2_pool"
type: "Pooling"
bottom: "ch_concat_mixed_1_chconcat"
top: "AVE_pool_mixed_2_pool"
pooling_param {
pool: AVE
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "mixed_2_tower_2_conv_conv2d"
type: "Convolution"
bottom: "AVE_pool_mixed_2_pool"
top: "mixed_2_tower_2_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
bias_term: false
pad: 0
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_2_tower_2_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_2_tower_2_conv_conv2d"
top: "mixed_2_tower_2_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_2_tower_2_conv_relu"
type: "ReLU"
bottom: "mixed_2_tower_2_conv_conv2d_bn"
top: "mixed_2_tower_2_conv_conv2d_relu"
}

Finally, here is the Inception module variant with asymmetric convolutions (the mixed_9 block, referred to above as variant two). Note that in this implementation the 1x3 and 3x1 convs branch off the same input in parallel and are concatenated, rather than being stacked one after the other.

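# Branch 1: a plain 1x1 conv (320 outputs)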
layer {
name: "mixed_9_conv_conv2d"
type: "Convolution"
bottom: "ch_concat_mixed_8_chconcat"
top: "mixed_9_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 320
bias_term: false
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
kernel_h: 1
kernel_w: 1
}
}
layer {
name: "mixed_9_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_conv_conv2d"
top: "mixed_9_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_conv_relu"
type: "ReLU"
bottom: "mixed_9_conv_conv2d_bn"
top: "mixed_9_conv_conv2d_relu"
}
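# Branch 2: 1x1 reduction (384), followed by parallel 1x3 and 3x1 convs taking the same input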
layer {
name: "mixed_9_tower_conv_conv2d"
type: "Convolution"
bottom: "ch_concat_mixed_8_chconcat"
top: "mixed_9_tower_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 384
bias_term: false
pad: 0
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_9_tower_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_tower_conv_conv2d"
top: "mixed_9_tower_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_tower_conv_relu"
type: "ReLU"
bottom: "mixed_9_tower_conv_conv2d_bn"
top: "mixed_9_tower_conv_conv2d_relu"
}
layer {
name: "mixed_9_tower_mixed_conv_conv2d"
type: "Convolution"
bottom: "mixed_9_tower_conv_conv2d_relu"
top: "mixed_9_tower_mixed_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 384
bias_term: false
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
pad_h: 0
pad_w: 1
kernel_h: 1
kernel_w: 3
}
}
layer {
name: "mixed_9_tower_mixed_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_tower_mixed_conv_conv2d"
top: "mixed_9_tower_mixed_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_tower_mixed_conv_relu"
type: "ReLU"
bottom: "mixed_9_tower_mixed_conv_conv2d_bn"
top: "mixed_9_tower_mixed_conv_conv2d_relu"
}
layer {
name: "mixed_9_tower_mixed_conv_1_conv2d"
type: "Convolution"
bottom: "mixed_9_tower_conv_conv2d_relu"
top: "mixed_9_tower_mixed_conv_1_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 384
bias_term: false
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
pad_h: 1
pad_w: 0
kernel_h: 3
kernel_w: 1
}
}
layer {
name: "mixed_9_tower_mixed_conv_1_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_tower_mixed_conv_1_conv2d"
top: "mixed_9_tower_mixed_conv_1_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_tower_mixed_conv_1_relu"
type: "ReLU"
bottom: "mixed_9_tower_mixed_conv_1_conv2d_bn"
top: "mixed_9_tower_mixed_conv_1_conv2d_relu"
}
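# Branch 3: 1x1 reduction (448) -> 3x3 (384), again followed by parallel 1x3 and 3x1 convs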
layer {
name: "mixed_9_tower_1_conv_conv2d"
type: "Convolution"
bottom: "ch_concat_mixed_8_chconcat"
top: "mixed_9_tower_1_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 448
bias_term: false
pad: 0
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_9_tower_1_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_tower_1_conv_conv2d"
top: "mixed_9_tower_1_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_tower_1_conv_relu"
type: "ReLU"
bottom: "mixed_9_tower_1_conv_conv2d_bn"
top: "mixed_9_tower_1_conv_conv2d_relu"
}
layer {
name: "mixed_9_tower_1_conv_1_conv2d"
type: "Convolution"
bottom: "mixed_9_tower_1_conv_conv2d_relu"
top: "mixed_9_tower_1_conv_1_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 384
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_9_tower_1_conv_1_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_tower_1_conv_1_conv2d"
top: "mixed_9_tower_1_conv_1_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_tower_1_conv_1_relu"
type: "ReLU"
bottom: "mixed_9_tower_1_conv_1_conv2d_bn"
top: "mixed_9_tower_1_conv_1_conv2d_relu"
}
layer {
name: "mixed_9_tower_1_mixed_conv_conv2d"
type: "Convolution"
bottom: "mixed_9_tower_1_conv_1_conv2d_relu"
top: "mixed_9_tower_1_mixed_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 384
bias_term: false
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
pad_h: 0
pad_w: 1
kernel_h: 1
kernel_w: 3
}
}
layer {
name: "mixed_9_tower_1_mixed_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_tower_1_mixed_conv_conv2d"
top: "mixed_9_tower_1_mixed_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_tower_1_mixed_conv_relu"
type: "ReLU"
bottom: "mixed_9_tower_1_mixed_conv_conv2d_bn"
top: "mixed_9_tower_1_mixed_conv_conv2d_relu"
}
layer {
name: "mixed_9_tower_1_mixed_conv_1_conv2d"
type: "Convolution"
bottom: "mixed_9_tower_1_conv_1_conv2d_relu"
top: "mixed_9_tower_1_mixed_conv_1_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 384
bias_term: false
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
pad_h: 1
pad_w: 0
kernel_h: 3
kernel_w: 1
}
}
layer {
name: "mixed_9_tower_1_mixed_conv_1_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_tower_1_mixed_conv_1_conv2d"
top: "mixed_9_tower_1_mixed_conv_1_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_tower_1_mixed_conv_1_relu"
type: "ReLU"
bottom: "mixed_9_tower_1_mixed_conv_1_conv2d_bn"
top: "mixed_9_tower_1_mixed_conv_1_conv2d_relu"
}
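# Branch 4: 3x3 average pooling (stride 1, pad 1) -> 1x1 conv (192)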
layer {
name: "AVE_pool_mixed_9_pool"
type: "Pooling"
bottom: "ch_concat_mixed_8_chconcat"
top: "AVE_pool_mixed_9_pool"
pooling_param {
pool: AVE
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "mixed_9_tower_2_conv_conv2d"
type: "Convolution"
bottom: "AVE_pool_mixed_9_pool"
top: "mixed_9_tower_2_conv_conv2d"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 192
bias_term: false
pad: 0
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
}
}
layer {
name: "mixed_9_tower_2_conv_batchnorm"
type: "BatchNorm"
bottom: "mixed_9_tower_2_conv_conv2d"
top: "mixed_9_tower_2_conv_conv2d_bn"
batch_norm_param {
}
}
layer {
name: "mixed_9_tower_2_conv_relu"
type: "ReLU"
bottom: "mixed_9_tower_2_conv_conv2d_bn"
top: "mixed_9_tower_2_conv_conv2d_relu"
}

References

  • Christian Szegedy et al., Rethinking the Inception Architecture for Computer Vision, 2015
  • github.com/intel/caffe
  • ethereon.github.io/nets
