[Blog Archive] TensorFlow: Understanding GoogLeNet in Depth

Preface

GoogLeNet was the winner of ILSVRC 2014. Its name pays homage to the classic LeNet-5, and it was built mainly by a team at Google; see the paper Going Deeper with Convolutions. Related work includes LeNet-5, Gabor filters, and Network-in-Network. Network-in-Network improved on the traditional CNN, easily beating AlexNet with a fraction of the parameters; the final Network-in-Network Caffe model is only about 29 MB. GoogLeNet borrows heavily from the Network-in-Network idea, which I walk through in detail below.

Network-in-Network

On the left of the figure is the ordinary linear convolution layer of a CNN. A linear convolution layer is fine for extracting linearly separable features, but when the features we need are highly nonlinear, we have to add many more filters to capture all the latent patterns. That creates a problem: too many filters mean too many parameters, and the network becomes overly complex and too expensive to compute.

The paper improves on this in two ways:

1. An improved convolution layer, MLPconv, which performs a more elaborate computation than a plain convolution at each local patch (right side of the figure above), raising each layer's ability to recognize complex features. A rough analogy: in a traditional CNN, each convolution layer is a worker that can only do one narrow task, so you must add a huge number of filters to cover every feature type you need; an MLPconv layer is more capable and can handle several kinds of tasks at once, so far fewer filters suffice.
2. Global average pooling in place of the final fully connected layers, which hold most of a traditional CNN's parameters and tend to hurt generalization (AlexNet has to rely on dropout to recover it). A minimal sketch of this head follows.
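To make the second point concrete, here is a minimal sketch of a global-average-pooling head (written against the TensorFlow 1.x API of this post's era; the placeholder shape matches the NIN design below, where the last mlpconv emits one 6x6 map per class):

import tensorflow as tf  # TF 1.x API

# Output of the last mlpconv layer: one 6x6 feature map per class.
features = tf.placeholder(tf.float32, [None, 6, 6, 1000])

# Global average pooling: each class score is just the spatial mean
# of its feature map. The head has zero parameters, unlike a fully
# connected layer, which would need 6*6*1000 weights per class here.
logits = tf.reduce_mean(features, axis=[1, 2])   # -> [None, 1000]
probs = tf.nn.softmax(logits)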

In the end, the authors designed a four-layer Network-in-Network plus a global average pooling layer for the ImageNet classification task:

from kaffe.tensorflow import Network

class NiN(Network):
    def setup(self):
        (self.feed("data")
             .conv(11, 11, 96, 4, 4, padding="VALID", name="conv1")
             .conv(1, 1, 96, 1, 1, name="cccp1")
             .conv(1, 1, 96, 1, 1, name="cccp2")
             .max_pool(3, 3, 2, 2, name="pool1")
             .conv(5, 5, 256, 1, 1, name="conv2")
             .conv(1, 1, 256, 1, 1, name="cccp3")
             .conv(1, 1, 256, 1, 1, name="cccp4")
             .max_pool(3, 3, 2, 2, padding="VALID", name="pool2")
             .conv(3, 3, 384, 1, 1, name="conv3")
             .conv(1, 1, 384, 1, 1, name="cccp5")
             .conv(1, 1, 384, 1, 1, name="cccp6")
             .max_pool(3, 3, 2, 2, padding="VALID", name="pool3")
             .conv(3, 3, 1024, 1, 1, name="conv4-1024")
             .conv(1, 1, 1024, 1, 1, name="cccp7-1024")
             .conv(1, 1, 1000, 1, 1, name="cccp8-1024")
             .avg_pool(6, 6, 1, 1, padding="VALID", name="pool4")
             .softmax(name="prob"))

The basic network structure is as above; the code lives at github.com/ethereon/caf. Because of a recent job change I currently have no machine to run it on, nor to draw a proper diagram of the structure; I will add both later. One thing worth pointing out: the intermediate cccp1 and cccp2 layers (cross channel pooling) are equivalent to convolution layers with 1x1 kernels. The Caffe implementation of NIN is as follows:

name: "nin_imagenet"layers { top: "data" top: "label" name: "data" type: DATA data_param { source: "/home/linmin/IMAGENET-LMDB/imagenet-train-lmdb" backend: LMDB batch_size: 64 } transform_param { crop_size: 224 mirror: true mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean" } include: { phase: TRAIN }}layers { top: "data" top: "label" name: "data" type: DATA data_param { source: "/home/linmin/IMAGENET-LMDB/imagenet-val-lmdb" backend: LMDB batch_size: 89 } transform_param { crop_size: 224 mirror: false mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean" } include: { phase: TEST }}layers { bottom: "data" top: "conv1" name: "conv1" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 96 kernel_size: 11 stride: 4 weight_filler { type: "gaussian" mean: 0 std: 0.01 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "conv1" top: "conv1" name: "relu0" type: RELU}layers { bottom: "conv1" top: "cccp1" name: "cccp1" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 96 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "cccp1" top: "cccp1" name: "relu1" type: RELU}layers { bottom: "cccp1" top: "cccp2" name: "cccp2" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 96 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "cccp2" top: "cccp2" name: "relu2" type: RELU}layers { bottom: "cccp2" top: "pool0" name: "pool0" type: POOLING pooling_param { pool: MAX kernel_size: 3 stride: 2 }}layers { bottom: "pool0" top: "conv2" name: "conv2" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 2 kernel_size: 5 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "conv2" top: "conv2" name: "relu3" type: RELU}layers { bottom: "conv2" top: "cccp3" name: "cccp3" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "cccp3" top: "cccp3" name: "relu5" type: RELU}layers { bottom: "cccp3" top: "cccp4" name: "cccp4" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "cccp4" top: "cccp4" name: "relu6" type: RELU}layers { bottom: "cccp4" top: "pool2" name: "pool2" type: POOLING pooling_param { pool: MAX kernel_size: 3 stride: 2 }}layers { bottom: "pool2" top: "conv3" name: "conv3" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 384 pad: 1 kernel_size: 3 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.01 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "conv3" top: "conv3" name: "relu7" type: RELU}layers { bottom: "conv3" top: "cccp5" name: "cccp5" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 384 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" 
value: 0 } }}layers { bottom: "cccp5" top: "cccp5" name: "relu8" type: RELU}layers { bottom: "cccp5" top: "cccp6" name: "cccp6" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 384 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "cccp6" top: "cccp6" name: "relu9" type: RELU}layers { bottom: "cccp6" top: "pool3" name: "pool3" type: POOLING pooling_param { pool: MAX kernel_size: 3 stride: 2 }}layers { bottom: "pool3" top: "pool3" name: "drop" type: DROPOUT dropout_param { dropout_ratio: 0.5 }}layers { bottom: "pool3" top: "conv4" name: "conv4-1024" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 1024 pad: 1 kernel_size: 3 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "conv4" top: "conv4" name: "relu10" type: RELU}layers { bottom: "conv4" top: "cccp7" name: "cccp7-1024" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 1024 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.05 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "cccp7" top: "cccp7" name: "relu11" type: RELU}layers { bottom: "cccp7" top: "cccp8" name: "cccp8-1024" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 1000 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" mean: 0 std: 0.01 } bias_filler { type: "constant" value: 0 } }}layers { bottom: "cccp8" top: "cccp8" name: "relu12" type: RELU}layers { bottom: "cccp8" top: "pool4" name: "pool4" type: POOLING pooling_param { pool: AVE kernel_size: 6 stride: 1 }}layers { name: "accuracy" type: ACCURACY bottom: "pool4" bottom: "label" top: "accuracy" include: { phase: TEST }}layers { bottom: "pool4" bottom: "label" name: "loss" type: SOFTMAX_LOSS include: { phase: TRAIN }}
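As an aside, the claim above that the cccp layers are equivalent to 1x1 convolutions is easy to verify numerically. A small NumPy check (channel count chosen to match cccp1):

import numpy as np

# A 1x1 convolution mixes channels at each pixel independently:
# it is a fully connected layer applied per spatial position.
H, W, C_in, C_out = 6, 6, 96, 96
x = np.random.randn(H, W, C_in)
w = np.random.randn(C_in, C_out)   # the 1x1 kernel, as a matrix
b = np.random.randn(C_out)

conv_1x1 = x @ w + b                        # 1x1 "convolution"
fc = x.reshape(-1, C_in) @ w + b            # per-pixel FC layer
assert np.allclose(conv_1x1.reshape(-1, C_out), fc)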

You can also view NIN as deepening the network: by increasing depth (and each NIN block's representational power) and swapping the original fully connected layers for an average pooling layer, the number of filters needed, and with it the model's parameter count, drops sharply. The paper's experiments show it matches AlexNet's performance with a final model size of only 29 MB.
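The size reduction is easy to sanity-check: almost all of AlexNet's roughly 60M parameters sit in its three fully connected layers, which is exactly what global average pooling eliminates (layer sizes from the AlexNet paper):

# Back-of-the-envelope: where AlexNet's parameters live.
fc6 = 256 * 6 * 6 * 4096 + 4096   # flattened conv5 output -> 4096
fc7 = 4096 * 4096 + 4096
fc8 = 4096 * 1000 + 1000
print("AlexNet FC parameters: %.1fM" % ((fc6 + fc7 + fc8) / 1e6))  # ~58.6M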

Once you understand NIN, GoogLeNet no longer feels mysterious.

GoogLeNet

Pain points

  • The bigger the CNN, the more parameters it has and the more compute it needs; an overly complex model also overfits more easily;
  • In a CNN, adding layers brings a matching increase in the computational resources required;
  • A sparse network is acceptable in principle, but sparse data structures are usually very inefficient to compute with (see the sketch after this list).
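The last point is easy to demonstrate. Below is a rough NumPy/SciPy sketch comparing a dense matrix-vector product against the "same" product with half the weights zeroed out and stored sparsely (timings vary by machine; the point is that moderate sparsity rarely beats a dense BLAS call despite doing less arithmetic):

import time
import numpy as np
import scipy.sparse as sp

n = 2048
dense_w = np.random.randn(n, n).astype(np.float32)
# Same shape, but 50% of entries zeroed out and stored in CSR format.
sparse_w = sp.random(n, n, density=0.5, format="csr", dtype=np.float32)
x = np.random.randn(n).astype(np.float32)

for label, w in [("dense", dense_w), ("csr, 50% nonzero", sparse_w)]:
    start = time.time()
    for _ in range(200):
        w.dot(x)
    print("%s: %.3fs for 200 matvecs" % (label, time.time() - start))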

Inception module

The Inception module starts from the observation that convolution kernels of several different sizes can capture the information of clusters at different scales in an image. For convenience, the paper uses 1x1, 3x3, and 5x5 kernels, plus a parallel 3x3 max pooling branch. This naive version hides a serious computational trap, though: each Inception module's output filter count is the sum of the filter counts of all its branches, so after several layers the number of filters, and the compute the model demands, blows up. Recall from Network-in-Network that 1x1 convolutions can reduce dimensionality effectively (expressing as much information as possible with fewer filters); the paper therefore proposes the "Inception module with dimension reduction", which cuts the number of filters, and hence the model's complexity, without sacrificing representational power:
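To see how much the 1x1 reduction buys, count multiply-accumulates for a single 5x5 branch. The channel numbers below are those of inception_4b in the code later in this post (its input is inception_4a's 512-channel output; the 14x14 spatial size is from the paper's table):

# 5x5 branch of inception (4b): input is 14x14x512
# (4a's branch outputs: 192 + 208 + 48 + 64 = 512 channels).
H = W = 14
C_in, C_red, C_out = 512, 24, 64

naive = H * W * (5 * 5 * C_in * C_out)           # direct 5x5 convolution
reduced = H * W * (1 * 1 * C_in * C_red          # 1x1 reduction first...
                   + 5 * 5 * C_red * C_out)      # ...then the 5x5 conv
print("naive:   %.1fM multiply-adds" % (naive / 1e6))    # ~160.6M
print("reduced: %.1fM multiply-adds" % (reduced / 1e6))  # ~9.9M, ~16x less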

Overall architecture of GoogLeNet

The basic code for building GoogLeNet in TensorFlow:

from kaffe.tensorflow import Network

class GoogleNet(Network):
    def setup(self):
        (self.feed("data")
             .conv(7, 7, 64, 2, 2, name="conv1_7x7_s2")
             .max_pool(3, 3, 2, 2, name="pool1_3x3_s2")
             .lrn(2, 2e-05, 0.75, name="pool1_norm1")
             .conv(1, 1, 64, 1, 1, name="conv2_3x3_reduce")
             .conv(3, 3, 192, 1, 1, name="conv2_3x3")
             .lrn(2, 2e-05, 0.75, name="conv2_norm2")
             .max_pool(3, 3, 2, 2, name="pool2_3x3_s2")
             .conv(1, 1, 64, 1, 1, name="inception_3a_1x1"))

        (self.feed("pool2_3x3_s2")
             .conv(1, 1, 96, 1, 1, name="inception_3a_3x3_reduce")
             .conv(3, 3, 128, 1, 1, name="inception_3a_3x3"))

        (self.feed("pool2_3x3_s2")
             .conv(1, 1, 16, 1, 1, name="inception_3a_5x5_reduce")
             .conv(5, 5, 32, 1, 1, name="inception_3a_5x5"))

        (self.feed("pool2_3x3_s2")
             .max_pool(3, 3, 1, 1, name="inception_3a_pool")
             .conv(1, 1, 32, 1, 1, name="inception_3a_pool_proj"))

        (self.feed("inception_3a_1x1", "inception_3a_3x3",
                   "inception_3a_5x5", "inception_3a_pool_proj")
             .concat(3, name="inception_3a_output")
             .conv(1, 1, 128, 1, 1, name="inception_3b_1x1"))

        (self.feed("inception_3a_output")
             .conv(1, 1, 128, 1, 1, name="inception_3b_3x3_reduce")
             .conv(3, 3, 192, 1, 1, name="inception_3b_3x3"))

        (self.feed("inception_3a_output")
             .conv(1, 1, 32, 1, 1, name="inception_3b_5x5_reduce")
             .conv(5, 5, 96, 1, 1, name="inception_3b_5x5"))

        (self.feed("inception_3a_output")
             .max_pool(3, 3, 1, 1, name="inception_3b_pool")
             .conv(1, 1, 64, 1, 1, name="inception_3b_pool_proj"))

        (self.feed("inception_3b_1x1", "inception_3b_3x3",
                   "inception_3b_5x5", "inception_3b_pool_proj")
             .concat(3, name="inception_3b_output")
             .max_pool(3, 3, 2, 2, name="pool3_3x3_s2")
             .conv(1, 1, 192, 1, 1, name="inception_4a_1x1"))

        (self.feed("pool3_3x3_s2")
             .conv(1, 1, 96, 1, 1, name="inception_4a_3x3_reduce")
             .conv(3, 3, 208, 1, 1, name="inception_4a_3x3"))

        (self.feed("pool3_3x3_s2")
             .conv(1, 1, 16, 1, 1, name="inception_4a_5x5_reduce")
             .conv(5, 5, 48, 1, 1, name="inception_4a_5x5"))

        (self.feed("pool3_3x3_s2")
             .max_pool(3, 3, 1, 1, name="inception_4a_pool")
             .conv(1, 1, 64, 1, 1, name="inception_4a_pool_proj"))

        (self.feed("inception_4a_1x1", "inception_4a_3x3",
                   "inception_4a_5x5", "inception_4a_pool_proj")
             .concat(3, name="inception_4a_output")
             .conv(1, 1, 160, 1, 1, name="inception_4b_1x1"))

        (self.feed("inception_4a_output")
             .conv(1, 1, 112, 1, 1, name="inception_4b_3x3_reduce")
             .conv(3, 3, 224, 1, 1, name="inception_4b_3x3"))

        (self.feed("inception_4a_output")
             .conv(1, 1, 24, 1, 1, name="inception_4b_5x5_reduce")
             .conv(5, 5, 64, 1, 1, name="inception_4b_5x5"))

        (self.feed("inception_4a_output")
             .max_pool(3, 3, 1, 1, name="inception_4b_pool")
             .conv(1, 1, 64, 1, 1, name="inception_4b_pool_proj"))

        (self.feed("inception_4b_1x1", "inception_4b_3x3",
                   "inception_4b_5x5", "inception_4b_pool_proj")
             .concat(3, name="inception_4b_output")
             .conv(1, 1, 128, 1, 1, name="inception_4c_1x1"))

        (self.feed("inception_4b_output")
             .conv(1, 1, 128, 1, 1, name="inception_4c_3x3_reduce")
             .conv(3, 3, 256, 1, 1, name="inception_4c_3x3"))

        (self.feed("inception_4b_output")
             .conv(1, 1, 24, 1, 1, name="inception_4c_5x5_reduce")
             .conv(5, 5, 64, 1, 1, name="inception_4c_5x5"))

        (self.feed("inception_4b_output")
             .max_pool(3, 3, 1, 1, name="inception_4c_pool")
             .conv(1, 1, 64, 1, 1, name="inception_4c_pool_proj"))

        (self.feed("inception_4c_1x1", "inception_4c_3x3",
                   "inception_4c_5x5", "inception_4c_pool_proj")
             .concat(3, name="inception_4c_output")
             .conv(1, 1, 112, 1, 1, name="inception_4d_1x1"))

        (self.feed("inception_4c_output")
             .conv(1, 1, 144, 1, 1, name="inception_4d_3x3_reduce")
             .conv(3, 3, 288, 1, 1, name="inception_4d_3x3"))

        (self.feed("inception_4c_output")
             .conv(1, 1, 32, 1, 1, name="inception_4d_5x5_reduce")
             .conv(5, 5, 64, 1, 1, name="inception_4d_5x5"))

        (self.feed("inception_4c_output")
             .max_pool(3, 3, 1, 1, name="inception_4d_pool")
             .conv(1, 1, 64, 1, 1, name="inception_4d_pool_proj"))

        (self.feed("inception_4d_1x1", "inception_4d_3x3",
                   "inception_4d_5x5", "inception_4d_pool_proj")
             .concat(3, name="inception_4d_output")
             .conv(1, 1, 256, 1, 1, name="inception_4e_1x1"))

        (self.feed("inception_4d_output")
             .conv(1, 1, 160, 1, 1, name="inception_4e_3x3_reduce")
             .conv(3, 3, 320, 1, 1, name="inception_4e_3x3"))

        (self.feed("inception_4d_output")
             .conv(1, 1, 32, 1, 1, name="inception_4e_5x5_reduce")
             .conv(5, 5, 128, 1, 1, name="inception_4e_5x5"))

        (self.feed("inception_4d_output")
             .max_pool(3, 3, 1, 1, name="inception_4e_pool")
             .conv(1, 1, 128, 1, 1, name="inception_4e_pool_proj"))

        (self.feed("inception_4e_1x1", "inception_4e_3x3",
                   "inception_4e_5x5", "inception_4e_pool_proj")
             .concat(3, name="inception_4e_output")
             .max_pool(3, 3, 2, 2, name="pool4_3x3_s2")
             .conv(1, 1, 256, 1, 1, name="inception_5a_1x1"))

        (self.feed("pool4_3x3_s2")
             .conv(1, 1, 160, 1, 1, name="inception_5a_3x3_reduce")
             .conv(3, 3, 320, 1, 1, name="inception_5a_3x3"))

        (self.feed("pool4_3x3_s2")
             .conv(1, 1, 32, 1, 1, name="inception_5a_5x5_reduce")
             .conv(5, 5, 128, 1, 1, name="inception_5a_5x5"))

        (self.feed("pool4_3x3_s2")
             .max_pool(3, 3, 1, 1, name="inception_5a_pool")
             .conv(1, 1, 128, 1, 1, name="inception_5a_pool_proj"))

        (self.feed("inception_5a_1x1", "inception_5a_3x3",
                   "inception_5a_5x5", "inception_5a_pool_proj")
             .concat(3, name="inception_5a_output")
             .conv(1, 1, 384, 1, 1, name="inception_5b_1x1"))

        (self.feed("inception_5a_output")
             .conv(1, 1, 192, 1, 1, name="inception_5b_3x3_reduce")
             .conv(3, 3, 384, 1, 1, name="inception_5b_3x3"))

        (self.feed("inception_5a_output")
             .conv(1, 1, 48, 1, 1, name="inception_5b_5x5_reduce")
             .conv(5, 5, 128, 1, 1, name="inception_5b_5x5"))

        (self.feed("inception_5a_output")
             .max_pool(3, 3, 1, 1, name="inception_5b_pool")
             .conv(1, 1, 128, 1, 1, name="inception_5b_pool_proj"))

        (self.feed("inception_5b_1x1", "inception_5b_3x3",
                   "inception_5b_5x5", "inception_5b_pool_proj")
             .concat(3, name="inception_5b_output")
             .avg_pool(7, 7, 1, 1, padding="VALID", name="pool5_7x7_s1")
             .fc(1000, relu=False, name="loss3_classifier")
             .softmax(name="prob"))

The code is from github.com/ethereon/caf, where the author wraps the basic layer operations; once you understand the network structure, putting GoogLeNet together is straightforward. Once I'm settled at my new company, I'll try rewriting the GoogLeNet network on top of tflearn.

GoogLeNet on TensorFlow

For convenience, I rewrote GoogLeNet with tflearn. The only deviation from the Caffe model is the padding in a few places: the branches of each inception module must keep identical spatial sizes for the concat, changing the pads one by one is fiddly, and I didn't know how to carry the pad values over from the Caffe prototxt, so I set padding to "same" everywhere. The full code:

# -*- coding: utf-8 -*-
""" GoogLeNet.

Applying "GoogLeNet" to Oxford's 17 Category Flower Dataset classification task.

References:
    - Szegedy, Christian, et al. Going deeper with convolutions.
    - 17 Category Flower Dataset. Maria-Elena Nilsback and Andrew Zisserman.

Links:
    - [GoogLeNet Paper](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf)
    - [Flower Dataset (17)](http://www.robots.ox.ac.uk/~vgg/data/flowers/17/)
"""

from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.merge_ops import merge
from tflearn.layers.estimator import regression
import tflearn.datasets.oxflower17 as oxflower17

X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))

network = input_data(shape=[None, 227, 227, 3])

# Stem: conv / pool / LRN, as in the Caffe model.
conv1_7_7 = conv_2d(network, 64, 7, strides=2, activation="relu", name="conv1_7_7_s2")
pool1_3_3 = max_pool_2d(conv1_7_7, 3, strides=2)
pool1_3_3 = local_response_normalization(pool1_3_3)
conv2_3_3_reduce = conv_2d(pool1_3_3, 64, 1, activation="relu", name="conv2_3_3_reduce")
conv2_3_3 = conv_2d(conv2_3_3_reduce, 192, 3, activation="relu", name="conv2_3_3")
conv2_3_3 = local_response_normalization(conv2_3_3)
pool2_3_3 = max_pool_2d(conv2_3_3, kernel_size=3, strides=2, name="pool2_3_3_s2")

inception_3a_1_1 = conv_2d(pool2_3_3, 64, 1, activation="relu", name="inception_3a_1_1")
inception_3a_3_3_reduce = conv_2d(pool2_3_3, 96, 1, activation="relu", name="inception_3a_3_3_reduce")
inception_3a_3_3 = conv_2d(inception_3a_3_3_reduce, 128, filter_size=3, activation="relu", name="inception_3a_3_3")
inception_3a_5_5_reduce = conv_2d(pool2_3_3, 16, filter_size=1, activation="relu", name="inception_3a_5_5_reduce")
inception_3a_5_5 = conv_2d(inception_3a_5_5_reduce, 32, filter_size=5, activation="relu", name="inception_3a_5_5")
inception_3a_pool = max_pool_2d(pool2_3_3, kernel_size=3, strides=1)
inception_3a_pool_1_1 = conv_2d(inception_3a_pool, 32, filter_size=1, activation="relu", name="inception_3a_pool_1_1")
# merge the inception_3a_* branches along the channel axis
inception_3a_output = merge([inception_3a_1_1, inception_3a_3_3, inception_3a_5_5, inception_3a_pool_1_1], mode="concat", axis=3)

inception_3b_1_1 = conv_2d(inception_3a_output, 128, filter_size=1, activation="relu", name="inception_3b_1_1")
inception_3b_3_3_reduce = conv_2d(inception_3a_output, 128, filter_size=1, activation="relu", name="inception_3b_3_3_reduce")
inception_3b_3_3 = conv_2d(inception_3b_3_3_reduce, 192, filter_size=3, activation="relu", name="inception_3b_3_3")
inception_3b_5_5_reduce = conv_2d(inception_3a_output, 32, filter_size=1, activation="relu", name="inception_3b_5_5_reduce")
inception_3b_5_5 = conv_2d(inception_3b_5_5_reduce, 96, filter_size=5, activation="relu", name="inception_3b_5_5")  # activation added; the original omitted it
inception_3b_pool = max_pool_2d(inception_3a_output, kernel_size=3, strides=1, name="inception_3b_pool")
inception_3b_pool_1_1 = conv_2d(inception_3b_pool, 64, filter_size=1, activation="relu", name="inception_3b_pool_1_1")
# merge the inception_3b_* branches
inception_3b_output = merge([inception_3b_1_1, inception_3b_3_3, inception_3b_5_5, inception_3b_pool_1_1], mode="concat", axis=3, name="inception_3b_output")

pool3_3_3 = max_pool_2d(inception_3b_output, kernel_size=3, strides=2, name="pool3_3_3")

inception_4a_1_1 = conv_2d(pool3_3_3, 192, filter_size=1, activation="relu", name="inception_4a_1_1")
inception_4a_3_3_reduce = conv_2d(pool3_3_3, 96, filter_size=1, activation="relu", name="inception_4a_3_3_reduce")
inception_4a_3_3 = conv_2d(inception_4a_3_3_reduce, 208, filter_size=3, activation="relu", name="inception_4a_3_3")
inception_4a_5_5_reduce = conv_2d(pool3_3_3, 16, filter_size=1, activation="relu", name="inception_4a_5_5_reduce")
inception_4a_5_5 = conv_2d(inception_4a_5_5_reduce, 48, filter_size=5, activation="relu", name="inception_4a_5_5")
inception_4a_pool = max_pool_2d(pool3_3_3, kernel_size=3, strides=1, name="inception_4a_pool")
inception_4a_pool_1_1 = conv_2d(inception_4a_pool, 64, filter_size=1, activation="relu", name="inception_4a_pool_1_1")
inception_4a_output = merge([inception_4a_1_1, inception_4a_3_3, inception_4a_5_5, inception_4a_pool_1_1], mode="concat", axis=3, name="inception_4a_output")

inception_4b_1_1 = conv_2d(inception_4a_output, 160, filter_size=1, activation="relu", name="inception_4b_1_1")  # name fixed; the original reused "inception_4a_1_1"
inception_4b_3_3_reduce = conv_2d(inception_4a_output, 112, filter_size=1, activation="relu", name="inception_4b_3_3_reduce")
inception_4b_3_3 = conv_2d(inception_4b_3_3_reduce, 224, filter_size=3, activation="relu", name="inception_4b_3_3")
inception_4b_5_5_reduce = conv_2d(inception_4a_output, 24, filter_size=1, activation="relu", name="inception_4b_5_5_reduce")
inception_4b_5_5 = conv_2d(inception_4b_5_5_reduce, 64, filter_size=5, activation="relu", name="inception_4b_5_5")
inception_4b_pool = max_pool_2d(inception_4a_output, kernel_size=3, strides=1, name="inception_4b_pool")
inception_4b_pool_1_1 = conv_2d(inception_4b_pool, 64, filter_size=1, activation="relu", name="inception_4b_pool_1_1")
inception_4b_output = merge([inception_4b_1_1, inception_4b_3_3, inception_4b_5_5, inception_4b_pool_1_1], mode="concat", axis=3, name="inception_4b_output")

inception_4c_1_1 = conv_2d(inception_4b_output, 128, filter_size=1, activation="relu", name="inception_4c_1_1")
inception_4c_3_3_reduce = conv_2d(inception_4b_output, 128, filter_size=1, activation="relu", name="inception_4c_3_3_reduce")
inception_4c_3_3 = conv_2d(inception_4c_3_3_reduce, 256, filter_size=3, activation="relu", name="inception_4c_3_3")
inception_4c_5_5_reduce = conv_2d(inception_4b_output, 24, filter_size=1, activation="relu", name="inception_4c_5_5_reduce")
inception_4c_5_5 = conv_2d(inception_4c_5_5_reduce, 64, filter_size=5, activation="relu", name="inception_4c_5_5")
inception_4c_pool = max_pool_2d(inception_4b_output, kernel_size=3, strides=1)
inception_4c_pool_1_1 = conv_2d(inception_4c_pool, 64, filter_size=1, activation="relu", name="inception_4c_pool_1_1")
inception_4c_output = merge([inception_4c_1_1, inception_4c_3_3, inception_4c_5_5, inception_4c_pool_1_1], mode="concat", axis=3, name="inception_4c_output")

inception_4d_1_1 = conv_2d(inception_4c_output, 112, filter_size=1, activation="relu", name="inception_4d_1_1")
inception_4d_3_3_reduce = conv_2d(inception_4c_output, 144, filter_size=1, activation="relu", name="inception_4d_3_3_reduce")
inception_4d_3_3 = conv_2d(inception_4d_3_3_reduce, 288, filter_size=3, activation="relu", name="inception_4d_3_3")
inception_4d_5_5_reduce = conv_2d(inception_4c_output, 32, filter_size=1, activation="relu", name="inception_4d_5_5_reduce")
inception_4d_5_5 = conv_2d(inception_4d_5_5_reduce, 64, filter_size=5, activation="relu", name="inception_4d_5_5")
inception_4d_pool = max_pool_2d(inception_4c_output, kernel_size=3, strides=1, name="inception_4d_pool")
inception_4d_pool_1_1 = conv_2d(inception_4d_pool, 64, filter_size=1, activation="relu", name="inception_4d_pool_1_1")
inception_4d_output = merge([inception_4d_1_1, inception_4d_3_3, inception_4d_5_5, inception_4d_pool_1_1], mode="concat", axis=3, name="inception_4d_output")

inception_4e_1_1 = conv_2d(inception_4d_output, 256, filter_size=1, activation="relu", name="inception_4e_1_1")
inception_4e_3_3_reduce = conv_2d(inception_4d_output, 160, filter_size=1, activation="relu", name="inception_4e_3_3_reduce")
inception_4e_3_3 = conv_2d(inception_4e_3_3_reduce, 320, filter_size=3, activation="relu", name="inception_4e_3_3")
inception_4e_5_5_reduce = conv_2d(inception_4d_output, 32, filter_size=1, activation="relu", name="inception_4e_5_5_reduce")
inception_4e_5_5 = conv_2d(inception_4e_5_5_reduce, 128, filter_size=5, activation="relu", name="inception_4e_5_5")
inception_4e_pool = max_pool_2d(inception_4d_output, kernel_size=3, strides=1, name="inception_4e_pool")
inception_4e_pool_1_1 = conv_2d(inception_4e_pool, 128, filter_size=1, activation="relu", name="inception_4e_pool_1_1")
inception_4e_output = merge([inception_4e_1_1, inception_4e_3_3, inception_4e_5_5, inception_4e_pool_1_1], axis=3, mode="concat")

pool4_3_3 = max_pool_2d(inception_4e_output, kernel_size=3, strides=2, name="pool_3_3")

inception_5a_1_1 = conv_2d(pool4_3_3, 256, filter_size=1, activation="relu", name="inception_5a_1_1")
inception_5a_3_3_reduce = conv_2d(pool4_3_3, 160, filter_size=1, activation="relu", name="inception_5a_3_3_reduce")
inception_5a_3_3 = conv_2d(inception_5a_3_3_reduce, 320, filter_size=3, activation="relu", name="inception_5a_3_3")
inception_5a_5_5_reduce = conv_2d(pool4_3_3, 32, filter_size=1, activation="relu", name="inception_5a_5_5_reduce")
inception_5a_5_5 = conv_2d(inception_5a_5_5_reduce, 128, filter_size=5, activation="relu", name="inception_5a_5_5")
inception_5a_pool = max_pool_2d(pool4_3_3, kernel_size=3, strides=1, name="inception_5a_pool")
inception_5a_pool_1_1 = conv_2d(inception_5a_pool, 128, filter_size=1, activation="relu", name="inception_5a_pool_1_1")
inception_5a_output = merge([inception_5a_1_1, inception_5a_3_3, inception_5a_5_5, inception_5a_pool_1_1], axis=3, mode="concat")

inception_5b_1_1 = conv_2d(inception_5a_output, 384, filter_size=1, activation="relu", name="inception_5b_1_1")
inception_5b_3_3_reduce = conv_2d(inception_5a_output, 192, filter_size=1, activation="relu", name="inception_5b_3_3_reduce")
inception_5b_3_3 = conv_2d(inception_5b_3_3_reduce, 384, filter_size=3, activation="relu", name="inception_5b_3_3")
inception_5b_5_5_reduce = conv_2d(inception_5a_output, 48, filter_size=1, activation="relu", name="inception_5b_5_5_reduce")
inception_5b_5_5 = conv_2d(inception_5b_5_5_reduce, 128, filter_size=5, activation="relu", name="inception_5b_5_5")
inception_5b_pool = max_pool_2d(inception_5a_output, kernel_size=3, strides=1, name="inception_5b_pool")
inception_5b_pool_1_1 = conv_2d(inception_5b_pool, 128, filter_size=1, activation="relu", name="inception_5b_pool_1_1")
inception_5b_output = merge([inception_5b_1_1, inception_5b_3_3, inception_5b_5_5, inception_5b_pool_1_1], axis=3, mode="concat")

# Global average pooling + dropout, then a 17-way softmax classifier.
pool5_7_7 = avg_pool_2d(inception_5b_output, kernel_size=7, strides=1)
pool5_7_7 = dropout(pool5_7_7, 0.4)
loss = fully_connected(pool5_7_7, 17, activation="softmax")

network = regression(loss, optimizer="momentum",
                     loss="categorical_crossentropy",
                     learning_rate=0.001)
model = tflearn.DNN(network, checkpoint_path="model_googlenet",
                    max_checkpoints=1, tensorboard_verbose=2)
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, snapshot_step=200,
          snapshot_epoch=False, run_id="googlenet_oxflowers17")
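Why "same" padding everywhere keeps the concat valid: with stride 1 and SAME padding, every branch preserves the input's spatial size, which is exactly the constraint the inception concat imposes. A quick check of TensorFlow's output-size formulas:

import math

# Output spatial size for TensorFlow's two padding modes.
def out_size(n, k, s, padding):
    if padding == "SAME":
        return math.ceil(n / s)
    return math.ceil((n - k + 1) / s)  # VALID

# All four branches of an inception module must produce the same
# spatial size, or the channel-wise concat fails. With stride 1 and
# SAME padding, every branch preserves the 28x28 input:
n = 28
for k in (1, 3, 5):
    print("conv %dx%d SAME ->" % (k, k), out_size(n, k, 1, "SAME"))
print("pool 3x3   SAME ->", out_size(n, 3, 1, "SAME"))
# With VALID padding the branch sizes would differ (28, 26, 24),
# which is why the rewrite above simply uses "same" everywhere.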

If you are interested, please take a look at the corresponding caffe model prototxt and help me check it for mistakes. I have submitted the code to the official tflearn repository (add GoogLeNet(Inception) in Example); if you have TensorFlow installed, just install tflearn and see whether anything is off. I have no GPU machine here, so training is slow. The TensorBoard curves below are not as clean as the earlier AlexNet ones, mainly because I haven't run as many epochs. Also, my host ran out of disk space while writing checkpoints, so I had to restore and resume training, and the TensorBoard plots now look a little different on each load; still, the raw logs show the loss steadily converging, so I paste the plots here anyway.
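Incidentally, resuming after the disk-full crash only required reloading the last checkpoint before calling fit again. A minimal sketch, reusing network, X, and Y from the script above (the step number in the checkpoint filename is hypothetical; tflearn appends the global step to checkpoint_path):

model = tflearn.DNN(network, checkpoint_path="model_googlenet",
                    max_checkpoints=1, tensorboard_verbose=2)
# Load the weights saved at (hypothetical) global step 8000, then resume.
model.load("model_googlenet-8000")
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, snapshot_step=200,
          snapshot_epoch=False, run_id="googlenet_oxflowers17")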

The network structure. There is a bug here, possibly in TensorBoard itself: the GoogLeNet graph may simply be too large (about 1.3 MB) and refuses to download in Chrome; Firefox seems to handle it:

For convenience, here are some of the training logs I saved; the convergence is clearly visible:

