First Taste of MXNet: inception-resnet-v2 from Model to Predict

Preface

Finding a furnace that suits you is of vital importance to any cultivator of the deep arts; the quality of the furnace directly decides whether the elixir succeeds. The road is a thousand years long, and seeking a fitting furnace is itself part of this long, long journey of cultivation.

Why learn MXNet? Readers familiar with this blog know that I have been following TensorFlow for a while and have recommended TFLearn many times, so why suddenly write about MXNet? The reason is simple: no money. TensorFlow demands a lot of compute; it is convenient to use, but its GPU memory footprint is high and it is not particularly fast. That is fine for company projects, but it costs too much time when playing with fun side projects. Admittedly MXNet has fewer resources right now, and fewer interesting open-source projects are built on it, but none of that is a real problem. One more point: MXNet is built by a group of people whose names you actually know, and the thought of playing with the same toys as the big names is exciting in itself.

MXNet's documentation has long been criticized by some enthusiasts, and indeed there is not much of it. But considering that the developers build this wheel (no, this furnace!) in their spare time, it is hard to match the polished docs of other frameworks. Fortunately, the computer-vision side is fairly easy to get started with. Here are some code samples and notes from when I started picking up MXNet (I had heard of it long ago but never used it): mxnet 101. I did not study it in great depth, just enough to learn how to use it for CV and complete a project end to end.

A New Formula: inception-resnet-v2

Every new formula is the crystallization of years of experience from the elders of the deep arts. Given the same raw materials, different formulas can perform worlds apart, and they have become the signatures of legendary predecessors.

The name alone tells you it cannot escape its ties to resnet and inception (googlenet being inception-v1): it is a rather complicated network. How complicated? If you have played with TFLearn, look at my code, run it, and open the graph in TensorBoard. (An earlier merged version turned out to be missing batch normalization; I fixed it and opened a PR, which had not yet been merged at the time of writing: add inception-resnet-v2 in branch inception-resnet-v2 #450.) In short, the "formula" is very complicated; for details see Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. If you understand the resnet and googlenet architectures, it is easy to work out. The TFLearn code below follows the inception-resnet-v2 under tf.slim. The basic code structure:

# -*- coding: utf-8 -*-
""" inception_resnet_v2.

Applying inception_resnet_v2 to Oxford's 17 Category Flower Dataset classification task.

References:
    Inception-v4, Inception-ResNet and the Impact of Residual Connections
    on Learning
    Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi.

Links:
    http://arxiv.org/abs/1602.07261

"""

from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.layers.core import input_data, dropout, flatten, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.utils import repeat
from tflearn.layers.merge_ops import merge
from tflearn.data_utils import shuffle, to_categorical
import tflearn.activations as activations
import tflearn.datasets.oxflower17 as oxflower17


def block35(net, scale=1.0, activation='relu'):
    tower_conv = conv_2d(net, 32, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_1x1')
    tower_conv1_0 = conv_2d(net, 32, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
    tower_conv1_1 = conv_2d(tower_conv1_0, 32, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0b_3x3')
    tower_conv2_0 = conv_2d(net, 32, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
    tower_conv2_1 = conv_2d(tower_conv2_0, 48, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0b_3x3')
    tower_conv2_2 = conv_2d(tower_conv2_1, 64, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0c_3x3')
    tower_mixed = merge([tower_conv, tower_conv1_1, tower_conv2_2], mode='concat', axis=3)
    tower_out = conv_2d(tower_mixed, net.get_shape()[3], 1, normalizer_fn='batch_normalization', activation=None, name='Conv2d_1x1')
    net += scale * tower_out
    if activation:
        if isinstance(activation, str):
            net = activations.get(activation)(net)
        elif hasattr(activation, '__call__'):
            net = activation(net)
        else:
            raise ValueError("Invalid Activation.")
    return net


def block17(net, scale=1.0, activation='relu'):
    tower_conv = conv_2d(net, 192, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_1x1')
    tower_conv_1_0 = conv_2d(net, 128, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
    tower_conv_1_1 = conv_2d(tower_conv_1_0, 160, [1, 7], normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0b_1x7')
    tower_conv_1_2 = conv_2d(tower_conv_1_1, 192, [7, 1], normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0c_7x1')
    tower_mixed = merge([tower_conv, tower_conv_1_2], mode='concat', axis=3)
    tower_out = conv_2d(tower_mixed, net.get_shape()[3], 1, normalizer_fn='batch_normalization', activation=None, name='Conv2d_1x1')
    net += scale * tower_out
    if activation:
        if isinstance(activation, str):
            net = activations.get(activation)(net)
        elif hasattr(activation, '__call__'):
            net = activation(net)
        else:
            raise ValueError("Invalid Activation.")
    return net


def block8(net, scale=1.0, activation='relu'):
    tower_conv = conv_2d(net, 192, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_1x1')
    tower_conv1_0 = conv_2d(net, 192, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
    tower_conv1_1 = conv_2d(tower_conv1_0, 224, [1, 3], normalizer_fn='batch_normalization', name='Conv2d_0b_1x3')
    tower_conv1_2 = conv_2d(tower_conv1_1, 256, [3, 1], normalizer_fn='batch_normalization', name='Conv2d_0c_3x1')
    tower_mixed = merge([tower_conv, tower_conv1_2], mode='concat', axis=3)
    tower_out = conv_2d(tower_mixed, net.get_shape()[3], 1, normalizer_fn='batch_normalization', activation=None, name='Conv2d_1x1')
    net += scale * tower_out
    if activation:
        if isinstance(activation, str):
            net = activations.get(activation)(net)
        elif hasattr(activation, '__call__'):
            net = activation(net)
        else:
            raise ValueError("Invalid Activation.")
    return net


# Data loading and preprocessing
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(299, 299))

num_classes = 17
dropout_keep_prob = 0.8

network = input_data(shape=[None, 299, 299, 3])
conv1a_3_3 = conv_2d(network, 32, 3, strides=2, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_1a_3x3')
conv2a_3_3 = conv_2d(conv1a_3_3, 32, 3, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_2a_3x3')
conv2b_3_3 = conv_2d(conv2a_3_3, 64, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_2b_3x3')
maxpool3a_3_3 = max_pool_2d(conv2b_3_3, 3, strides=2, padding='VALID', name='MaxPool_3a_3x3')
conv3b_1_1 = conv_2d(maxpool3a_3_3, 80, 1, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_3b_1x1')
conv4a_3_3 = conv_2d(conv3b_1_1, 192, 3, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_4a_3x3')
maxpool5a_3_3 = max_pool_2d(conv4a_3_3, 3, strides=2, padding='VALID', name='MaxPool_5a_3x3')

tower_conv = conv_2d(maxpool5a_3_3, 96, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b0_1x1')

tower_conv1_0 = conv_2d(maxpool5a_3_3, 48, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b1_0a_1x1')
tower_conv1_1 = conv_2d(tower_conv1_0, 64, 5, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b1_0b_5x5')

tower_conv2_0 = conv_2d(maxpool5a_3_3, 64, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b2_0a_1x1')
tower_conv2_1 = conv_2d(tower_conv2_0, 96, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b2_0b_3x3')
tower_conv2_2 = conv_2d(tower_conv2_1, 96, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b2_0c_3x3')

tower_pool3_0 = avg_pool_2d(maxpool5a_3_3, 3, strides=1, padding='same', name='AvgPool_5b_b3_0a_3x3')
tower_conv3_1 = conv_2d(tower_pool3_0, 64, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b3_0b_1x1')

tower_5b_out = merge([tower_conv, tower_conv1_1, tower_conv2_2, tower_conv3_1], mode='concat', axis=3)

net = repeat(tower_5b_out, 10, block35, scale=0.17)

tower_conv = conv_2d(net, 384, 3, normalizer_fn='batch_normalization', strides=2, activation='relu', padding='VALID', name='Conv2d_6a_b0_0a_3x3')
tower_conv1_0 = conv_2d(net, 256, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_6a_b1_0a_1x1')
tower_conv1_1 = conv_2d(tower_conv1_0, 256, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_6a_b1_0b_3x3')
tower_conv1_2 = conv_2d(tower_conv1_1, 384, 3, normalizer_fn='batch_normalization', strides=2, padding='VALID', activation='relu', name='Conv2d_6a_b1_0c_3x3')
tower_pool = max_pool_2d(net, 3, strides=2, padding='VALID', name='MaxPool_1a_3x3')
net = merge([tower_conv, tower_conv1_2, tower_pool], mode='concat', axis=3)
net = repeat(net, 20, block17, scale=0.1)

tower_conv = conv_2d(net, 256, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
tower_conv0_1 = conv_2d(tower_conv, 384, 3, normalizer_fn='batch_normalization', strides=2, padding='VALID', activation='relu', name='Conv2d_0a_1x1')

tower_conv1 = conv_2d(net, 256, 1, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_0a_1x1')
tower_conv1_1 = conv_2d(tower_conv1, 288, 3, normalizer_fn='batch_normalization', strides=2, padding='VALID', activation='relu', name='Conv2d_1a_3x3')

tower_conv2 = conv_2d(net, 256, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
tower_conv2_1 = conv_2d(tower_conv2, 288, 3, normalizer_fn='batch_normalization', name='Conv2d_0b_3x3', activation='relu')
tower_conv2_2 = conv_2d(tower_conv2_1, 320, 3, normalizer_fn='batch_normalization', strides=2, padding='VALID', activation='relu', name='Conv2d_1a_3x3')

tower_pool = max_pool_2d(net, 3, strides=2, padding='VALID', name='MaxPool_1a_3x3')
net = merge([tower_conv0_1, tower_conv1_1, tower_conv2_2, tower_pool], mode='concat', axis=3)

net = repeat(net, 9, block8, scale=0.2)
net = block8(net, activation=None)

net = conv_2d(net, 1536, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_7b_1x1')
net = avg_pool_2d(net, net.get_shape().as_list()[1:3], strides=2, padding='VALID', name='AvgPool_1a_8x8')
net = flatten(net)
net = dropout(net, dropout_keep_prob)
loss = fully_connected(net, num_classes, activation='softmax')

network = tflearn.regression(loss, optimizer='rmsprop',
                             loss='categorical_crossentropy',
                             learning_rate=0.0001)
model = tflearn.DNN(network, checkpoint_path='inception_resnet_v2',
                    max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir="./tflearn_logs/")
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=32, snapshot_step=2000,
          snapshot_epoch=False, run_id='inception_resnet_v2_17flowers')

If you want to run it, give TFLearn a try. Note that you need to modify conv_2d: here I added a normalizer_fn argument to the stock conv_2d in order to use batch_normalization.

Refining the Elixir with MXNet

With a different furnace, even the same formula is refined in a different way.

Before setting out to implement inception-resnet-v2 in MXNet, I had written essentially no mxnet beyond the code in mxnet-101. No matter: there are plenty of formulas from other masters to learn from, and here I drew in particular on symbol_inception-bn.py. First, to cut down the line count, I created a ConvFactory as in that reference. Unlike inception-bn, however, inception-resnet-v2 needs both a with-activation and a without-activation variant, so the inception-resnet-v2 ConvFactory looks like this:

def ConvFactory(data, num_filter, kernel, stride=(1, 1), pad=(0, 0), act_type="relu", mirror_attr={}, with_act=True):
    conv = mx.symbol.Convolution(data=data, num_filter=num_filter, kernel=kernel, stride=stride, pad=pad)
    bn = mx.symbol.BatchNorm(data=conv)
    if with_act:
        act = mx.symbol.Activation(data=bn, act_type=act_type, attr=mirror_attr)
        return act
    else:
        return bn

The rest is then straightforward: just follow the network downward:

def get_symbol(num_classes=1000, input_data_shape=(64, 3, 299, 299)):
    data = mx.symbol.Variable(name='data')
    conv1a_3_3 = ConvFactory(data=data, num_filter=32, kernel=(3, 3), stride=(2, 2))
    conv2a_3_3 = ConvFactory(conv1a_3_3, 32, (3, 3))
    conv2b_3_3 = ConvFactory(conv2a_3_3, 64, (3, 3), pad=(1, 1))
    maxpool3a_3_3 = mx.symbol.Pooling(data=conv2b_3_3, kernel=(3, 3), stride=(2, 2), pool_type='max')
    conv3b_1_1 = ConvFactory(maxpool3a_3_3, 80, (1, 1))
    conv4a_3_3 = ConvFactory(conv3b_1_1, 192, (3, 3))
    maxpool5a_3_3 = mx.symbol.Pooling(data=conv4a_3_3, kernel=(3, 3), stride=(2, 2), pool_type='max')

    tower_conv = ConvFactory(maxpool5a_3_3, 96, (1, 1))
    tower_conv1_0 = ConvFactory(maxpool5a_3_3, 48, (1, 1))
    tower_conv1_1 = ConvFactory(tower_conv1_0, 64, (5, 5), pad=(2, 2))

    tower_conv2_0 = ConvFactory(maxpool5a_3_3, 64, (1, 1))
    tower_conv2_1 = ConvFactory(tower_conv2_0, 96, (3, 3), pad=(1, 1))
    tower_conv2_2 = ConvFactory(tower_conv2_1, 96, (3, 3), pad=(1, 1))

    tower_pool3_0 = mx.symbol.Pooling(data=maxpool5a_3_3, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type='avg')
    tower_conv3_1 = ConvFactory(tower_pool3_0, 64, (1, 1))
    tower_5b_out = mx.symbol.Concat(*[tower_conv, tower_conv1_1, tower_conv2_2, tower_conv3_1])

Then it stops being straightforward: a block35 structure has to be called repeatedly. A repeat function is easy to implement: given the number of repetitions, the function to call, and its arguments, just call it that many times:

def repeat(inputs, repetitions, layer, *args, **kwargs):
    outputs = inputs
    for i in range(repetitions):
        outputs = layer(outputs, *args, **kwargs)
    return outputs
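Since repeat knows nothing about mxnet, a quick pure-Python sanity check (with a toy stand-in for a layer, purely for illustration) shows it is nothing more than repeated application:

```python
def repeat(inputs, repetitions, layer, *args, **kwargs):
    # Apply `layer` to `inputs` `repetitions` times, threading the output through.
    outputs = inputs
    for i in range(repetitions):
        outputs = layer(outputs, *args, **kwargs)
    return outputs

# Toy "layer": add a constant. Ten repetitions add 10 * step.
result = repeat(0, 10, lambda x, step: x + step, step=2)
print(result)  # 20
```

In the network code, `layer` is block35/block17/block8 and the extra kwargs (e.g. scale) are forwarded unchanged on every call.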

That part is simple, but block35 is problematic: this sub-block must output the same number of channels as its input. Coming from TensorFlow, where reading a Variable's shape is easy, this is awkward in MXNet. Not knowing how to do it, I opened an issue, How can i get the shape with the net?, then dug through the API and found infer_shape; the friendly MXNet support crew also pointed me to it. I tried it and it works: you can get the shape, but you must feed in the shape of a 4-d tensor, such as (64, 3, 299, 299), and it infers the shape the corresponding symbol would have when the graph runs. So I wrote it like this:

def block35(net, input_data_shape, scale=1.0, with_act=True, act_type='relu', mirror_attr={}):
    assert len(input_data_shape) == 4, 'input_data_shape should be len of 4, your input_data_shape is len of %d' % len(input_data_shape)
    _, out_shape, _ = net.infer_shape(data=input_data_shape)
    tower_conv = ConvFactory(net, 32, (1, 1))
    tower_conv1_0 = ConvFactory(net, 32, (1, 1))
    tower_conv1_1 = ConvFactory(tower_conv1_0, 32, (3, 3), pad=(1, 1))
    tower_conv2_0 = ConvFactory(net, 32, (1, 1))
    tower_conv2_1 = ConvFactory(tower_conv2_0, 48, (3, 3), pad=(1, 1))
    tower_conv2_2 = ConvFactory(tower_conv2_1, 64, (3, 3), pad=(1, 1))
    tower_mixed = mx.symbol.Concat(*[tower_conv, tower_conv1_1, tower_conv2_2])
    tower_out = ConvFactory(tower_mixed, out_shape[0][1], (1, 1), with_act=False)

    net += scale * tower_out
    if with_act:
        act = mx.symbol.Activation(data=net, act_type=act_type, attr=mirror_attr)
        return act
    else:
        return net

Does this feel clumsy to you? It feels clumsy to me too, but I have always been an engineer who "doesn't sweat the details", so I let it go. Once this was done, block17 and block8 followed; they are simple and very similar, so I will not go over them.

The next stretch then came together quickly:

net = repeat(tower_5b_out, 10, block35, scale=0.17, input_num_channels=320)
tower_conv = ConvFactory(net, 384, (3, 3), stride=(2, 2))
tower_conv1_0 = ConvFactory(net, 256, (1, 1))
tower_conv1_1 = ConvFactory(tower_conv1_0, 256, (3, 3), pad=(1, 1))
tower_conv1_2 = ConvFactory(tower_conv1_1, 384, (3, 3), stride=(2, 2))
tower_pool = mx.symbol.Pooling(net, kernel=(3, 3), stride=(2, 2), pool_type='max')
net = mx.symbol.Concat(*[tower_conv, tower_conv1_2, tower_pool])
net = repeat(net, 20, block17, scale=0.1, input_num_channels=1088)
tower_conv = ConvFactory(net, 256, (1, 1))
tower_conv0_1 = ConvFactory(tower_conv, 384, (3, 3), stride=(2, 2))
tower_conv1 = ConvFactory(net, 256, (1, 1))
tower_conv1_1 = ConvFactory(tower_conv1, 288, (3, 3), stride=(2, 2))
tower_conv2 = ConvFactory(net, 256, (1, 1))
tower_conv2_1 = ConvFactory(tower_conv2, 288, (3, 3), pad=(1, 1))
tower_conv2_2 = ConvFactory(tower_conv2_1, 320, (3, 3), stride=(2, 2))
tower_pool = mx.symbol.Pooling(net, kernel=(3, 3), stride=(2, 2), pool_type='max')
net = mx.symbol.Concat(*[tower_conv0_1, tower_conv1_1, tower_conv2_2, tower_pool])

net = repeat(net, 9, block8, scale=0.2, input_num_channels=2080)
net = block8(net, with_act=False, input_num_channels=2080)

net = ConvFactory(net, 1536, (1, 1))
net = mx.symbol.Pooling(net, kernel=(1, 1), global_pool=True, stride=(2, 2), pool_type='avg')
net = mx.symbol.Flatten(net)
net = mx.symbol.Dropout(data=net, p=0.8)
net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes)
softmax = mx.symbol.SoftmaxOutput(data=net, name='softmax')

I was delighted: done, and so easy. (Please ignore all the pad values for now; I could not find a pad-free version, so bear with them.) Next came a sample test, again with 17flowers. Following mxnet-101 on converting a dataset to binary, I first wrote a small script to collect the image list, indices, and each image's label_index; that was quick (see my mxnet-101 for details). Then grab the data and start running. Ready? Go!

Eh? The cart will not start. What on earth? infer_shape is complaining? Fine, check the API. In TensorFlow, padding is "valid" or "same"; MXNet has neither. Neither! You have to compute it yourself. What? No valid, no same, I don't know how! After writing all this, give up now? No. After reading the TensorFlow docs and more material: "same" means the output keeps the same size as the input, while "valid" needs no pad at all. So wherever TensorFlow uses "same", you have to set the corresponding pad value in MXNet: for a kernel of 3, pad=1; for a kernel of 5, pad=2. I changed things back and forth, printing the shape after every layer, and after about six hours finally debugged it step by step. But something was still off: after repeating block35 ten times, why did the shape differ from the one in the comments of tf.slim's inception-resnet-v2? It was almost 4 a.m. by then; I thought it was running, so why would it not add up? Could the TensorFlow comment be wrong? It could: the Americans' arithmetic really was off. I filed an issue and someone quickly fixed and committed it, may be an error in slim.nets.inception_resnet_v2 #634, though it apparently still had not been merged at the time of writing.
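The rule above is easy to verify with a little arithmetic: at stride 1 with an odd kernel k, TensorFlow's "same" corresponds to pad = (k - 1) // 2 in MXNet, per the standard convolution output-size formula. A minimal sketch (the helper names are mine, not from either framework):

```python
def same_pad(kernel):
    # Pad that keeps output size == input size at stride 1 (odd kernels).
    return (kernel - 1) // 2

def conv_out_size(in_size, kernel, stride=1, pad=0):
    # Standard convolution output-size formula.
    return (in_size + 2 * pad - kernel) // stride + 1

print(same_pad(3), same_pad(5))                 # 1 2, matching pad=1 and pad=2 above
print(conv_out_size(35, 3, pad=same_pad(3)))    # 35: "same" keeps the size
print(conv_out_size(35, 3, pad=0))              # 33: "valid" shrinks it
```

Printing `conv_out_size` after each layer is exactly the shape-by-shape check used during the six-hour debugging session.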

Everything OK, start running. With 17flowers it converges quickly. Lacking the resources to test a larger dataset, I submitted it as-is; the code was rough, but at least it was written step by step. Still, something was off. Later a kind soul on github (surely also a heavyweight) pointed out that Pooling has a global_pool option for global pooling. Wow, what a great thing, and TensorFlow does not have it. That is exactly why in TensorFlow you use get_shape to fetch the tensor's width and height for the final pooling, and I had foolishly done the same in MXNet; it was the only reason I needed input_data_shape to infer each layer's shape. With global_pool, what do I need infer_shape for? The blockxx functions don't need it either: the channel count can be computed by hand and passed in directly, and get_symbol can stay as before without any input_data_shape! Thanks to zhreshold for the tip; after the change everything works. The MXNet maintainers were refactoring some code and had not merged yet, but no matter: once they are done I will tidy up inception-resnet-v2 and open a PR.
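What global_pool does is simply pool over the entire spatial extent, whatever it happens to be. A small NumPy sketch of the idea (my own illustration, not MXNet code):

```python
import numpy as np

# A fake NCHW feature map: batch 2, 2080 channels, 8x8 spatial.
feat = np.random.rand(2, 2080, 8, 8)

# Global average pooling: collapse H and W entirely, which is what
# a pooling op with global_pool enabled and pool_type 'avg' computes.
pooled = feat.mean(axis=(2, 3), keepdims=True)

print(pooled.shape)  # (2, 2080, 1, 1)
```

Because the spatial axes are collapsed regardless of their size, no shape inference is needed at graph-construction time, which is precisely what makes infer_shape unnecessary here.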

def block35(net, input_num_channels, scale=1.0, with_act=True, act_type='relu', mirror_attr={}):
    tower_conv = ConvFactory(net, 32, (1, 1))
    tower_conv1_0 = ConvFactory(net, 32, (1, 1))
    tower_conv1_1 = ConvFactory(tower_conv1_0, 32, (3, 3), pad=(1, 1))
    tower_conv2_0 = ConvFactory(net, 32, (1, 1))
    tower_conv2_1 = ConvFactory(tower_conv2_0, 48, (3, 3), pad=(1, 1))
    tower_conv2_2 = ConvFactory(tower_conv2_1, 64, (3, 3), pad=(1, 1))
    tower_mixed = mx.symbol.Concat(*[tower_conv, tower_conv1_1, tower_conv2_2])
    tower_out = ConvFactory(tower_mixed, input_num_channels, (1, 1), with_act=False)

    net += scale * tower_out
    if with_act:
        act = mx.symbol.Activation(data=net, act_type=act_type, attr=mirror_attr)
        return act
    else:
        return net

With that, inception-resnet-v2 was written, though only tested on a small dataset. Later I ran into Mu Li (李沐) on zhihu, promptly cozied up to him, and ended up with a machine, so I am now testing a somewhat larger dataset, 102flowers (not that it is large either). Later I will ask him to help mount a bigger disk to hold ImageNet and benchmark performance; for now 102flowers works and the results are decent.

The Elixir Is Done

The grade of a golden elixir is read from its markings; different materials leave different markings and call for different standards. acc is the most common way to judge the grade.

Split 102flowers 9:1 into training and validation sets and train for 300 epochs (on a small dataset it seems you need more epochs for decent performance, whereas some folks report inception-bn needing only 50 epochs on imagenet). inception-resnet-v2 really is big: even on a dataset this small, 300 epochs take about a day, though that is still much faster than tensorflow.


Running Out of Metaphors (predict)

Here is a simple inference example, as a way of learning how to use the trained model. Note that it is best to use Python's opencv (cv2), since MXNet officially uses opencv. The skimage-based examples I found online never worked for me, presumably a BGR-to-RGB issue; with cv2 it works after calling cv2.cvtColor(img, cv2.COLOR_BGR2RGB):

import mxnet as mx
import logging
import numpy as np
import cv2

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

num_round = 260
prefix = "102flowers"
model = mx.model.FeedForward.load(prefix, num_round, ctx=mx.cpu(), numpy_batch_size=1)
# synset = [l.strip() for l in open('Inception/synset.txt').readlines()]


def PreprocessImage(path, show_img=False):
    # load image and convert BGR (cv2 default) to RGB
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    mean_img = mx.nd.load("mean.bin").values()[0].asnumpy()
    print(img.shape)
    print(mean_img.shape)
    img = cv2.resize(img, (299, 299))
    # HWC -> CHW
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)
    img = img - mean_img
    img = img[np.newaxis, :]
    print(img.shape)
    return img


right = 0
sum = 0
with open("test.lst", "r") as fread:
    for line in fread.readlines()[:20]:
        sum += 1
        batch = "../day2/102flowers/" + line.split("\t")[2].strip("\n")
        print(batch)
        batch = PreprocessImage(batch, False)
        prob = model.predict(batch)[0]
        pred = np.argsort(prob)[::-1]
        # Get top1 label
        # top1 = synset[pred[0]]
        top_1 = pred[0]
        if top_1 == int(line.split("\t")[1]):
            print("top1 right")
            right += 1

print("top 1 accuracy: %f" % (right / (1.0 * sum)))

This uses the model weights from epoch 260. Because I clumsily deleted the 9:1 split's test.lst, I can only compute against the 7:3 split's test.lst for now, so the final accuracy will be somewhat inflated; but that is not the point.

Summary (still out of metaphors)

After such a thoroughly enjoyable trip through MXNet, here are the pitfalls I hit, shared for your benefit:

  • You cannot read a tensor's shape directly and must go through infer_shape, which led me down many detours while designing the code;
  • When running im2rec, I had not shuffled the prepared train.lst and test.lst. On 102flowers I never noticed, but later while training a porn-image filter I found the training accuracy was 1 right from the start; the likely cause was the unshuffled train.lst (I had assumed the shuffle parameter in ImageRecordIter made it unnecessary). After shuffling, the accuracy no longer started at 1;
  • The pad values: it took a lot of digging to resolve, and the documentation says little about it; for someone moving from tensorflow to MXNet like me, this is a sizable pitfall;
  • Prediction: the example in the MXNet github repo did not work for me; the example on the official site uses cv2 rather than the skimage seen in some tutorials. Given that mxnet requires opencv at install time, cv2 and skimage probably differ in some conventions, so I switched to the cv2-based predict version; also remember cv2.cvtColor(img, cv2.COLOR_BGR2RGB) after reading an image.
  • Prediction again: in MXNet, not specifying mean.bin when constructing ImageRecordIter does not mean training skips subtracting the mean image. I had misread it as "no mean subtraction needed" and kept getting wrong results. Since training generates mean.bin by itself, I guessed this was the cause: after loading it with mean_img = mx.nd.load("mean.bin").values()[0].asnumpy() and subtracting it from the raw image, the results were fine. Overall, though, the whole flow is still more involved than predicting with tf.slim.
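For the shuffle pitfall above, a minimal sketch of shuffling .lst entries before writing them out for im2rec (the file names and paths are illustrative, not from the project):

```python
import random

# Each .lst line is "index<TAB>label<TAB>relative/path.jpg".
lines = [
    "0\t3\tflowers/img_0001.jpg",
    "1\t5\tflowers/img_0002.jpg",
    "2\t3\tflowers/img_0003.jpg",
    "3\t1\tflowers/img_0004.jpg",
]

random.seed(0)         # deterministic here, purely for the example
random.shuffle(lines)  # shuffle in place before writing train.lst

print("\n".join(lines))
```

Shuffling at list-generation time means the record file itself is already mixed, so you are not relying on the iterator's shuffle behavior.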

Strengths:

  • Fast, fast, fast. I did not measure precisely, but it is at least two to three times faster than tensorflow;
  • Low memory usage: with the same batch size and model, tf blows past 12 GB of GPU memory while MXNet needs only a bit over 7 GB;
  • im2rec is convenient: compared to writing code for tfrecord under tensorflow, it is easier to pick up; just remember to shuffle when generating train.lst and test.lst yourself;
  • global_pool in Pooling is a gem that tensorflow lacks;

Recommended reading:

MXNet's dynamic graph interface, Gluon
How should AI practitioners choose an open-source deep learning framework | Hard Innovation Open Course
Which companies in China use caffe, torch, TensorFlow, paddle, etc., and which use home-grown frameworks?
Why hasn't the powerful MXNet caught on?
