Machine Learning Advanced Notes, Part 7 | A First Taste of MXnet
Introduction
For a while now, the "Machine Learning Advanced Notes" series has focused on hands-on work with TensorFlow (readers who want the TensorFlow material can jump straight to the recommended reading at the bottom of this article), helping you go from zero to advanced, step by step. Having praised TensorFlow plenty, I should admit its drawback is just as obvious: it is hungry for compute. It is convenient to use, but its GPU-memory footprint is high and it is not particularly fast. That is fine for company projects, but it eats far too much time when you just want to play with something fun on your own.
In short: I'm broke!
So today I am starting a new thread to introduce another excellent and powerful deep learning framework: MXnet. Resources for MXnet are still relatively scarce, and interesting open-source projects built on it are fewer, but never mind, none of that is a real problem. Its strengths are that it is flexible enough, fast enough, and easy to extend with new functionality; on top of that, MXnet is built by a group of people whose names you will recognize. Getting to play with the same toys as the masters, just thinking about it is exciting! Let's get started :)
Preface
For a cultivator of the deep arts, finding a furnace that suits you is crucial: the quality of the furnace decides whether the elixir comes out at all. The road is a thousand years long, and seeking out the right furnace is itself part of the long journey of cultivation.
Why learn mxnet? Readers of my blog know that I have been focused on TensorFlow lately and have plugged TFlearn more than once, so why am I suddenly writing about MXnet? Because, again, I'm broke: TensorFlow demands too much compute, and while it is convenient, its memory footprint is high and it is too slow for hobby projects. MXnet's resources are still comparatively thin and there are fewer interesting open-source projects built on it, but that is no real obstacle, and the chance to use the same tools as the well-known people who built it is motivation enough.
MXnet's documentation takes some flak from fans, and it is indeed sparse, but considering the developers build this wheel (no, this furnace!) in their spare time, it is hard to expect docs as polished as the bigger frameworks'. The good news is that the computer-vision side is fairly easy to get started with. Here are the code and notes from my own recent first contact with MXnet (I had heard of it long ago but never used it): mxnet 101. Nothing deeply researched, just enough to learn how to use it for CV and carry a project end to end.
A new recipe: inception-resnet-v2
Every new recipe distills years of experience from the deep-learning elders. The same ingredients can behave completely differently under different recipes, which is why recipes become the signatures of legendary predecessors.
One look at the name tells you it is tangled up with both resnet and inception (googlenet being inception-v1): a rather complicated network structure. How complicated? If you have played with tflearn, go look at the code I wrote, run it, and open the graph in tensorboard. (A version merged earlier turned out to be missing batch normalization; I fixed it and opened a PR, which had not yet been merged at the time of writing: "add inception-resnet-v2 in branch inception-resnet-v2" #450.) In short, the "recipe" is genuinely complex; for the details see Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Anyone familiar with the resnet and googlenet architectures should find it easy to follow. The tflearn code below is based on the inception-resnet-v2 under tf.slim. The basic code structure:
# -*- coding: utf-8 -*-
""" inception_resnet_v2.

Applying inception_resnet_v2 to Oxford's 17 Category Flower Dataset classification task.

References:
    Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
    Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi.

Links:
    http://arxiv.org/abs/1602.07261
"""

from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.layers.core import input_data, dropout, flatten, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.utils import repeat
from tflearn.layers.merge_ops import merge
from tflearn.data_utils import shuffle, to_categorical
import tflearn.activations as activations
import tflearn.datasets.oxflower17 as oxflower17


def block35(net, scale=1.0, activation='relu'):
    tower_conv = conv_2d(net, 32, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_1x1')
    tower_conv1_0 = conv_2d(net, 32, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
    tower_conv1_1 = conv_2d(tower_conv1_0, 32, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0b_3x3')
    tower_conv2_0 = conv_2d(net, 32, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
    tower_conv2_1 = conv_2d(tower_conv2_0, 48, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0b_3x3')
    tower_conv2_2 = conv_2d(tower_conv2_1, 64, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0c_3x3')
    tower_mixed = merge([tower_conv, tower_conv1_1, tower_conv2_2], mode='concat', axis=3)
    tower_out = conv_2d(tower_mixed, net.get_shape()[3], 1, normalizer_fn='batch_normalization', activation=None, name='Conv2d_1x1')
    net += scale * tower_out
    if activation:
        if isinstance(activation, str):
            net = activations.get(activation)(net)
        elif hasattr(activation, '__call__'):
            net = activation(net)
        else:
            raise ValueError("Invalid Activation.")
    return net


def block17(net, scale=1.0, activation='relu'):
    tower_conv = conv_2d(net, 192, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_1x1')
    tower_conv_1_0 = conv_2d(net, 128, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
    tower_conv_1_1 = conv_2d(tower_conv_1_0, 160, [1, 7], normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0b_1x7')
    tower_conv_1_2 = conv_2d(tower_conv_1_1, 192, [7, 1], normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0c_7x1')
    tower_mixed = merge([tower_conv, tower_conv_1_2], mode='concat', axis=3)
    tower_out = conv_2d(tower_mixed, net.get_shape()[3], 1, normalizer_fn='batch_normalization', activation=None, name='Conv2d_1x1')
    net += scale * tower_out
    if activation:
        if isinstance(activation, str):
            net = activations.get(activation)(net)
        elif hasattr(activation, '__call__'):
            net = activation(net)
        else:
            raise ValueError("Invalid Activation.")
    return net


def block8(net, scale=1.0, activation='relu'):
    tower_conv = conv_2d(net, 192, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_1x1')
    tower_conv1_0 = conv_2d(net, 192, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
    tower_conv1_1 = conv_2d(tower_conv1_0, 224, [1, 3], normalizer_fn='batch_normalization', name='Conv2d_0b_1x3')
    tower_conv1_2 = conv_2d(tower_conv1_1, 256, [3, 1], normalizer_fn='batch_normalization', name='Conv2d_0c_3x1')
    tower_mixed = merge([tower_conv, tower_conv1_2], mode='concat', axis=3)
    tower_out = conv_2d(tower_mixed, net.get_shape()[3], 1, normalizer_fn='batch_normalization', activation=None, name='Conv2d_1x1')
    net += scale * tower_out
    if activation:
        if isinstance(activation, str):
            net = activations.get(activation)(net)
        elif hasattr(activation, '__call__'):
            net = activation(net)
        else:
            raise ValueError("Invalid Activation.")
    return net


# Data loading and preprocessing
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(299, 299))

num_classes = 17
dropout_keep_prob = 0.8

network = input_data(shape=[None, 299, 299, 3])
conv1a_3_3 = conv_2d(network, 32, 3, strides=2, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_1a_3x3')
conv2a_3_3 = conv_2d(conv1a_3_3, 32, 3, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_2a_3x3')
conv2b_3_3 = conv_2d(conv2a_3_3, 64, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_2b_3x3')
maxpool3a_3_3 = max_pool_2d(conv2b_3_3, 3, strides=2, padding='VALID', name='MaxPool_3a_3x3')
conv3b_1_1 = conv_2d(maxpool3a_3_3, 80, 1, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_3b_1x1')
conv4a_3_3 = conv_2d(conv3b_1_1, 192, 3, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_4a_3x3')
maxpool5a_3_3 = max_pool_2d(conv4a_3_3, 3, strides=2, padding='VALID', name='MaxPool_5a_3x3')

tower_conv = conv_2d(maxpool5a_3_3, 96, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b0_1x1')

tower_conv1_0 = conv_2d(maxpool5a_3_3, 48, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b1_0a_1x1')
tower_conv1_1 = conv_2d(tower_conv1_0, 64, 5, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b1_0b_5x5')

tower_conv2_0 = conv_2d(maxpool5a_3_3, 64, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b2_0a_1x1')
tower_conv2_1 = conv_2d(tower_conv2_0, 96, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b2_0b_3x3')
tower_conv2_2 = conv_2d(tower_conv2_1, 96, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b2_0c_3x3')

tower_pool3_0 = avg_pool_2d(maxpool5a_3_3, 3, strides=1, padding='same', name='AvgPool_5b_b3_0a_3x3')
tower_conv3_1 = conv_2d(tower_pool3_0, 64, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_5b_b3_0b_1x1')

tower_5b_out = merge([tower_conv, tower_conv1_1, tower_conv2_2, tower_conv3_1], mode='concat', axis=3)

net = repeat(tower_5b_out, 10, block35, scale=0.17)

tower_conv = conv_2d(net, 384, 3, normalizer_fn='batch_normalization', strides=2, activation='relu', padding='VALID', name='Conv2d_6a_b0_0a_3x3')
tower_conv1_0 = conv_2d(net, 256, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_6a_b1_0a_1x1')
tower_conv1_1 = conv_2d(tower_conv1_0, 256, 3, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_6a_b1_0b_3x3')
tower_conv1_2 = conv_2d(tower_conv1_1, 384, 3, normalizer_fn='batch_normalization', strides=2, padding='VALID', activation='relu', name='Conv2d_6a_b1_0c_3x3')
tower_pool = max_pool_2d(net, 3, strides=2, padding='VALID', name='MaxPool_1a_3x3')
net = merge([tower_conv, tower_conv1_2, tower_pool], mode='concat', axis=3)
net = repeat(net, 20, block17, scale=0.1)

tower_conv = conv_2d(net, 256, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
tower_conv0_1 = conv_2d(tower_conv, 384, 3, normalizer_fn='batch_normalization', strides=2, padding='VALID', activation='relu', name='Conv2d_0a_1x1')

tower_conv1 = conv_2d(net, 256, 1, normalizer_fn='batch_normalization', padding='VALID', activation='relu', name='Conv2d_0a_1x1')
tower_conv1_1 = conv_2d(tower_conv1, 288, 3, normalizer_fn='batch_normalization', strides=2, padding='VALID', activation='relu', name='Conv2d_1a_3x3')

tower_conv2 = conv_2d(net, 256, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_0a_1x1')
tower_conv2_1 = conv_2d(tower_conv2, 288, 3, normalizer_fn='batch_normalization', name='Conv2d_0b_3x3', activation='relu')
tower_conv2_2 = conv_2d(tower_conv2_1, 320, 3, normalizer_fn='batch_normalization', strides=2, padding='VALID', activation='relu', name='Conv2d_1a_3x3')

tower_pool = max_pool_2d(net, 3, strides=2, padding='VALID', name='MaxPool_1a_3x3')
net = merge([tower_conv0_1, tower_conv1_1, tower_conv2_2, tower_pool], mode='concat', axis=3)

net = repeat(net, 9, block8, scale=0.2)
net = block8(net, activation=None)

net = conv_2d(net, 1536, 1, normalizer_fn='batch_normalization', activation='relu', name='Conv2d_7b_1x1')
net = avg_pool_2d(net, net.get_shape().as_list()[1:3], strides=2, padding='VALID', name='AvgPool_1a_8x8')
net = flatten(net)
net = dropout(net, dropout_keep_prob)
loss = fully_connected(net, num_classes, activation='softmax')

network = tflearn.regression(loss, optimizer='RMSprop',
                             loss='categorical_crossentropy',
                             learning_rate=0.0001)
model = tflearn.DNN(network, checkpoint_path='inception_resnet_v2',
                    max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir="./tflearn_logs/")
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=32, snapshot_step=2000,
          snapshot_epoch=False, run_id='inception_resnet_v2_17flowers')
If you want to run it, go get tflearn, and note the change inside conv_2d: I added a normalizer_fn argument to conv_2d itself so that it can apply batch_normalization.
Refining the elixir with MXnet
Different furnaces, even given the same recipe, do not refine it in quite the same way.
Before setting out to implement inception-resnet-v2 in MXnet, I had written essentially no mxnet beyond the code in mxnet-101. No matter, there are plenty of recipes from other masters to consult; here I leaned on symbol_inception-bn.py. First, to cut down on boilerplate, I followed it in creating a ConvFactory; unlike inception-bn, though, inception-resnet-v2 needs a variant that can skip the activation. So its ConvFactory looks like this:
def ConvFactory(data, num_filter, kernel, stride=(1, 1), pad=(0, 0),
                act_type="relu", mirror_attr={}, with_act=True):
    conv = mx.symbol.Convolution(data=data, num_filter=num_filter, kernel=kernel, stride=stride, pad=pad)
    bn = mx.symbol.BatchNorm(data=conv)
    if with_act:
        act = mx.symbol.Activation(data=bn, act_type=act_type, attr=mirror_attr)
        return act
    else:
        return bn
From there it is straightforward: just write the network down, layer by layer:
def get_symbol(num_classes=1000, input_data_shape=(64, 3, 299, 299)):
    data = mx.symbol.Variable(name='data')
    conv1a_3_3 = ConvFactory(data=data, num_filter=32, kernel=(3, 3), stride=(2, 2))
    conv2a_3_3 = ConvFactory(conv1a_3_3, 32, (3, 3))
    conv2b_3_3 = ConvFactory(conv2a_3_3, 64, (3, 3), pad=(1, 1))
    maxpool3a_3_3 = mx.symbol.Pooling(data=conv2b_3_3, kernel=(3, 3), stride=(2, 2), pool_type='max')
    conv3b_1_1 = ConvFactory(maxpool3a_3_3, 80, (1, 1))
    conv4a_3_3 = ConvFactory(conv3b_1_1, 192, (3, 3))
    maxpool5a_3_3 = mx.symbol.Pooling(data=conv4a_3_3, kernel=(3, 3), stride=(2, 2), pool_type='max')

    tower_conv = ConvFactory(maxpool5a_3_3, 96, (1, 1))
    tower_conv1_0 = ConvFactory(maxpool5a_3_3, 48, (1, 1))
    tower_conv1_1 = ConvFactory(tower_conv1_0, 64, (5, 5), pad=(2, 2))

    tower_conv2_0 = ConvFactory(maxpool5a_3_3, 64, (1, 1))
    tower_conv2_1 = ConvFactory(tower_conv2_0, 96, (3, 3), pad=(1, 1))
    tower_conv2_2 = ConvFactory(tower_conv2_1, 96, (3, 3), pad=(1, 1))

    tower_pool3_0 = mx.symbol.Pooling(data=maxpool5a_3_3, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type='avg')
    tower_conv3_1 = ConvFactory(tower_pool3_0, 64, (1, 1))
    tower_5b_out = mx.symbol.Concat(*[tower_conv, tower_conv1_1, tower_conv2_2, tower_conv3_1])
Then it stops being straightforward: the block35 structure has to be called repeatedly. A repeat function is easy to write; given the number of repetitions, the function to call, and its arguments, just call it that many times:
def repeat(inputs, repetitions, layer, *args, **kwargs):
    outputs = inputs
    for i in range(repetitions):
        outputs = layer(outputs, *args, **kwargs)
    return outputs
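Since the helper is framework-agnostic, it can be sanity-checked with plain Python before wiring it into symbols; a quick sketch (the definition is repeated here so the snippet runs standalone, and `double` is just a stand-in "layer"):

```python
def repeat(inputs, repetitions, layer, *args, **kwargs):
    # Thread `inputs` through `layer` the given number of times.
    outputs = inputs
    for i in range(repetitions):
        outputs = layer(outputs, *args, **kwargs)
    return outputs


def double(x):
    # Stand-in for a network block: any callable of one main argument works.
    return x * 2


print(repeat(1, 3, double))  # -> 8, i.e. 1 -> 2 -> 4 -> 8
```

With mxnet symbols, `layer` would be something like `block35` and the extra `*args`/`**kwargs` carry its scale and channel arguments through unchanged.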
That part is easy, but block35 itself has a problem: the sub-structure must output the same number of channels as its input. In tensorflow this was trivial, since you can always read a Variable's shape, but in MXnet it is awkward. Not knowing how to do it, I opened an issue, "How can i get the shape with the net?", then went digging through the api and found infer_shape (mxnet's "customer support" folks pointed me the same way). I tried it and it works: it can recover the shape, but you must feed it a full 4-d tensor shape such as (64, 3, 299, 299), from which it infers the corresponding symbol's shape as the graph would see it at run time. So I wrote it like this:
def block35(net, input_data_shape, scale=1.0, with_act=True, act_type='relu', mirror_attr={}):
    assert len(input_data_shape) == 4, \
        "input_data_shape should be len of 4, your input_data_shape is len of %d" % len(input_data_shape)
    _, out_shape, _ = net.infer_shape(data=input_data_shape)
    tower_conv = ConvFactory(net, 32, (1, 1))
    tower_conv1_0 = ConvFactory(net, 32, (1, 1))
    tower_conv1_1 = ConvFactory(tower_conv1_0, 32, (3, 3), pad=(1, 1))
    tower_conv2_0 = ConvFactory(net, 32, (1, 1))
    tower_conv2_1 = ConvFactory(tower_conv2_0, 48, (3, 3), pad=(1, 1))
    tower_conv2_2 = ConvFactory(tower_conv2_1, 64, (3, 3), pad=(1, 1))
    tower_mixed = mx.symbol.Concat(*[tower_conv, tower_conv1_1, tower_conv2_2])
    tower_out = ConvFactory(tower_mixed, out_shape[0][1], (1, 1), with_act=False)

    net += scale * tower_out
    if with_act:
        act = mx.symbol.Activation(data=net, act_type=act_type, attr=mirror_attr)
        return act
    else:
        return net
Does this feel awkward to you? It felt awkward to me too, but I have always been an engineer who doesn't sweat the small stuff, so I let it go and moved on. block17 and block8 follow the same pattern and are just as simple, so I won't repeat them.
The next stretch then went quickly:
net = repeat(tower_5b_out, 10, block35, scale=0.17, input_num_channels=320)
tower_conv = ConvFactory(net, 384, (3, 3), stride=(2, 2))
tower_conv1_0 = ConvFactory(net, 256, (1, 1))
tower_conv1_1 = ConvFactory(tower_conv1_0, 256, (3, 3), pad=(1, 1))
tower_conv1_2 = ConvFactory(tower_conv1_1, 384, (3, 3), stride=(2, 2))
tower_pool = mx.symbol.Pooling(net, kernel=(3, 3), stride=(2, 2), pool_type='max')
net = mx.symbol.Concat(*[tower_conv, tower_conv1_2, tower_pool])
net = repeat(net, 20, block17, scale=0.1, input_num_channels=1088)
tower_conv = ConvFactory(net, 256, (1, 1))
tower_conv0_1 = ConvFactory(tower_conv, 384, (3, 3), stride=(2, 2))
tower_conv1 = ConvFactory(net, 256, (1, 1))
tower_conv1_1 = ConvFactory(tower_conv1, 288, (3, 3), stride=(2, 2))
tower_conv2 = ConvFactory(net, 256, (1, 1))
tower_conv2_1 = ConvFactory(tower_conv2, 288, (3, 3), pad=(1, 1))
tower_conv2_2 = ConvFactory(tower_conv2_1, 320, (3, 3), stride=(2, 2))
tower_pool = mx.symbol.Pooling(net, kernel=(3, 3), stride=(2, 2), pool_type='max')
net = mx.symbol.Concat(*[tower_conv0_1, tower_conv1_1, tower_conv2_2, tower_pool])

net = repeat(net, 9, block8, scale=0.2, input_num_channels=2080)
net = block8(net, with_act=False, input_num_channels=2080)

net = ConvFactory(net, 1536, (1, 1))
net = mx.symbol.Pooling(net, kernel=(1, 1), global_pool=True, stride=(2, 2), pool_type='avg')
net = mx.symbol.Flatten(net)
net = mx.symbol.Dropout(data=net, p=0.8)
net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes)
softmax = mx.symbol.SoftmaxOutput(data=net, name='softmax')
Great, done, that was easy! (Please politely ignore all the pad values for now; I could not find a version without pads to copy from, so bear with me.) Next, a sample run. It's the 17flowers dataset again, and following mxnet-101 on how to convert a dataset to binary, I first wrote a small py script to collect the list of all images together with each image's index and its label_index. That was quickly sorted.
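The list-building step can be sketched like this; the directory layout (one sub-folder per class) and the helper name are my assumptions for illustration, not the exact script from mxnet-101:

```python
import os


def make_image_list(root_dir, exts=('.jpg', '.jpeg', '.png')):
    """Scan root_dir/<class_name>/* and emit im2rec-style rows:
    index<TAB>label_index<TAB>relative_path (one row per image)."""
    classes = sorted(d for d in os.listdir(root_dir)
                     if os.path.isdir(os.path.join(root_dir, d)))
    rows, idx = [], 0
    for label, cls in enumerate(classes):
        cls_dir = os.path.join(root_dir, cls)
        for fname in sorted(os.listdir(cls_dir)):
            if fname.lower().endswith(exts):
                # im2rec expects tab-separated: running index, label index, path
                rows.append('%d\t%d\t%s' % (idx, label, os.path.join(cls, fname)))
                idx += 1
    return rows
```

Writing the returned rows to train.lst / test.lst is then a plain file write; the shuffling caveat discussed later applies before that write.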
See my mxnet-101 for the details. Then grab the data and start the run. Ready? Go!
Eh? The car won't start. Hang on, what are all these errors? infer_shape has a problem? Fine, back to the api... and then: tensorflow's "valid" and "same" padding do not exist in mxnet. They don't exist. You have to compute pad yourself. What?! No valid, no same, I don't know how to do that!!!
After writing this much, give up? No way. Reading the tensorflow docs and some other material made it click: "same" means the output keeps the input's spatial size, and "valid" simply applies no padding, so no pad needs setting. So wherever tensorflow used "same", the corresponding pad has to be set explicitly in mxnet: a kernel of 3 needs pad=1, a kernel of 5 needs pad=2. Tweaking these back and forth and printing the shape after every layer took me about six hours of step-by-step debugging until it ran. But then something was still off: after repeating block35 ten times, why did my shape differ from the one annotated in tf.slim's inception-resnet-v2?
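The rule worked out above (kernel 3 needs pad 1, kernel 5 needs pad 2) is just arithmetic, and it pays to write it down once next to the standard convolution output-size formula; a small sketch (helper names are mine):

```python
def same_pad(kernel):
    """Stride-1 'same' padding for an odd kernel: output size equals input size."""
    return (kernel - 1) // 2


def conv_out_size(size, kernel, stride=1, pad=0):
    """Standard convolution output-size formula (floor division, as mxnet uses)."""
    return (size + 2 * pad - kernel) // stride + 1


print(same_pad(3), same_pad(5))  # -> 1 2

# 'same' with stride 1 preserves the spatial size:
assert conv_out_size(35, 3, 1, same_pad(3)) == 35
assert conv_out_size(35, 5, 1, same_pad(5)) == 35

# 'valid' means pad=0; e.g. the stem's 3x3/stride-2 conv takes 299 -> 149:
print(conv_out_size(299, 3, 2, 0))  # -> 149
```

Printing `conv_out_size` layer by layer is exactly the shape-tracing exercise described above, just without rebuilding the symbol each time.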
Good grief. By then it was nearly 4 a.m.; I thought I had it running, so why wouldn't the numbers add up? Surely the tensorflow comments couldn't be wrong? Apparently the folks over there have a little arithmetic problem too. I filed an issue, "may be an error in slim.nets.inception_resnet_v2" #634, and someone fixed and committed it promptly, though it seems it still has not been merged.
With everything ok, the run started: on 17flowers it converged quickly. Lacking the resources to test a bigger dataset, I submitted as-is. The code was ugly, but at least it was written step by step. Still, something about it was never quite right, until a kind soul on github (surely another master) pointed out that Pooling has a global_pool option for global pooling. What a great thing, and tensorflow doesn't have it; over there you use get_shape to fetch the tensor's width and height for the pooling, and I had foolishly been doing the same in mxnet, which is why I needed input_data_shape to infer each layer's shape for the global pool. With global_pool, what do I need infer_shape for at all? The blockxx functions don't need it either: the channel counts can simply be computed by hand and passed in as a number, and get_symbol can stay as it was, with no input_data_shape whatsoever!!!
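For intuition, what `Pooling` with `global_pool=True` and `pool_type='avg'` computes is simply an average over the entire spatial extent, whatever the input size happens to be; a NumPy stand-in (a sketch of the operation, not MXnet's implementation):

```python
import numpy as np


def global_avg_pool(x):
    """NCHW feature map -> (N, C, 1, 1): average over the full spatial extent.
    No height/width needs to be known in advance, which is exactly why
    global pooling removes the need for shape inference here."""
    return x.mean(axis=(2, 3), keepdims=True)


feat = np.arange(2 * 3 * 4 * 4, dtype=np.float64).reshape(2, 3, 4, 4)
pooled = global_avg_pool(feat)
print(pooled.shape)  # -> (2, 3, 1, 1)
```

The same function works unchanged on an 8x8 or 35x35 map, whereas an explicit `avg_pool_2d` over `get_shape()[1:3]` has to be re-parameterized per input size.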
Thanks to zhreshold's tip, everything is ok and updated. The mxnet folks are refactoring some of the code at the moment and my change has not been merged yet, but no matter; once they are done I will tidy up inception-resnet-v2 and submit another pr (coach, I want to be an mxnet contributor).
def block35(net, input_num_channels, scale=1.0, with_act=True, act_type='relu', mirror_attr={}):
    tower_conv = ConvFactory(net, 32, (1, 1))
    tower_conv1_0 = ConvFactory(net, 32, (1, 1))
    tower_conv1_1 = ConvFactory(tower_conv1_0, 32, (3, 3), pad=(1, 1))
    tower_conv2_0 = ConvFactory(net, 32, (1, 1))
    tower_conv2_1 = ConvFactory(tower_conv2_0, 48, (3, 3), pad=(1, 1))
    tower_conv2_2 = ConvFactory(tower_conv2_1, 64, (3, 3), pad=(1, 1))
    tower_mixed = mx.symbol.Concat(*[tower_conv, tower_conv1_1, tower_conv2_2])
    tower_out = ConvFactory(tower_mixed, input_num_channels, (1, 1), with_act=False)

    net += scale * tower_out
    if with_act:
        act = mx.symbol.Activation(data=net, act_type=act_type, attr=mirror_attr)
        return act
    else:
        return net
And with that, inception-resnet-v2 was written, though only tested on a small dataset. Later I ran into 李沐 on zhihu, promptly cozied up to him, and ended up with a machine, on which I am now testing a slightly bigger dataset. It is honestly still not big: 102flowers. Later I will ask him to help mount a bigger disk so ImageNet fits and performance can be measured properly; for now 102flowers is doing fine, and the results look decent.
The elixir is formed
An elixir's grade is judged by its pattern; different ingredients yield different patterns, each judged by its own standard. acc is the most common way to grade a finished elixir.
I split 102flowers 9:1 into training and validation sets and set 300 epochs (with a dataset this small, more epochs seem to be needed for decent performance; I have seen someone get away with only 50 epochs for inception-bn on imagenet). inception-resnet-v2 really is a big network: even on such a small dataset, 300 epochs take about a day, though compared with tensorflow that is much faster.
Out of metaphors (predict)
Here is a simple inference example, as an exercise in using the trained model. Note that it is best to stick with python's opencv (the cv2 library), since that is what mxnet uses officially. The skimage-based versions I found online never quite worked for me, presumably a bgr-to-rgb issue; with cv2, adding cv2.cvtColor(img, cv2.COLOR_BGR2RGB) makes it work:
import mxnet as mx
import logging
import numpy as np
import cv2
import scipy.io as sio

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
num_round = 260
prefix = "102flowers"
model = mx.model.FeedForward.load(prefix, num_round, ctx=mx.cpu(), numpy_batch_size=1)
# synset = [l.strip() for l in open('Inception/synset.txt').readlines()]


def PreprocessImage(path, show_img=False):
    # load image
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    mean_img = mx.nd.load('mean.bin').values()[0].asnumpy()
    print img.shape
    print mean_img.shape
    img = cv2.resize(img, (299, 299))
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)
    img = img - mean_img
    img = img[np.newaxis, :]
    print img.shape

    return img

right = 0
sum = 0
with open('test.lst', 'r') as fread:
    for line in fread.readlines()[:20]:
        sum += 1
        batch = '../day2/102flowers/' + line.split("\t")[2].strip("\n")
        print batch
        batch = PreprocessImage(batch, False)
        prob = model.predict(batch)[0]
        pred = np.argsort(prob)[::-1]
        # Get top1 label
        # top1 = synset[pred[0]]
        top_1 = pred[0]
        if top_1 == int(line.split("\t")[1]):
            print 'top1 right'
            right += 1

print 'top 1 accuracy: %f' % (right / (1.0 * sum))
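Incidentally, for this particular conversion cv2.cvtColor is just a channel reversal, which is easy to see (and to replace) in NumPy; a stand-in sketch, where the function name is mine rather than a cv2 or mxnet API:

```python
import numpy as np


def bgr_to_rgb(img):
    """HWC image: reverse the last (channel) axis, i.e. B,G,R -> R,G,B.
    For 3-channel images this matches cv2.cvtColor(img, cv2.COLOR_BGR2RGB)."""
    return img[:, :, ::-1]


bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # pure blue in BGR layout
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0].tolist())  # -> [0, 0, 255]: blue ends up in the R..G..B last slot
```

This also explains why a pipeline built on skimage (which reads RGB) silently disagrees with one built on cv2 (which reads BGR): the channel order differs at load time, and the mean image baked into mean.bin only matches one of them.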
This uses the model weights from epoch 260. Because I clumsily deleted the test.lst from the 9:1 split, I can only compute against the 7:3 split's test.lst for now, so the final accuracy will read somewhat high; that is beside the point here.
Summary (still out of metaphors)
After such a satisfying trip through mxnet, here are the pits I fell into, shared for your benefit:
- There is no direct way to read a tensor's shape; going through infer_shape sent the code design down many detours;
- For im2rec, my prepared train.lst and test.lst were not shuffled. On 102flowers I never noticed, but later, training a porn-image filter, the training accuracy was suspiciously 1 right from the start; I traced it to the unshuffled train.lst (I had assumed the shuffle parameter on ImageRecordIter made shuffling the list unnecessary). After fixing it, the training accuracy no longer started at 1;
- The pad values took a lot of reading to sort out, and the documentation says little about them: a sizeable pit for anyone coming to mxnet from tensorflow like me;
- Predict: the example in the mxnet github repo did not work for me, while the official site's example uses cv2 rather than the skimage seen in various samples. Since mxnet requires opencv at install time, cv2 and skimage presumably differ on some convention, so I switched to the cv2-based predict, and after reading each image I call cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
- Still on predict: I had not specified mean.bin when constructing the ImageRecordIter, but that does not mean training skipped mean subtraction. I initially assumed no mean image needed subtracting at predict time and kept getting wrong results; since training generates a mean.bin by itself, I guessed that was the cause, loaded it with mean_img = mx.nd.load('mean.bin').values()[0].asnumpy(), subtracted it from the raw image, and the results came out right. Even so, the whole flow is clumsy next to tf.slim's predict.
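To dodge the un-shuffled .lst pitfall above, shuffling the rows before handing them to im2rec is a one-liner; a sketch (the helper name and the fixed seed are mine):

```python
import random


def shuffle_lst(lines, seed=42):
    """Return a shuffled copy of .lst rows, so im2rec does not pack the
    examples class-by-class; a fixed seed keeps the split reproducible."""
    out = list(lines)
    random.Random(seed).shuffle(out)
    return out


rows = ['%d\t0\timg_%d.jpg' % (i, i) for i in range(10)]
shuffled = shuffle_lst(rows)
print(sorted(shuffled) == rows)  # -> True: same rows, different order
```

Shuffle once on the list, write it out, and only then run im2rec; relying solely on the iterator's shuffle parameter was exactly the mistake described above.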
Advantages
- Fast, fast, fast. I did no rigorous measurements, but it is at least two to three times faster than tensorflow for this workload;
- Low memory footprint: with the same batch size and model, tensorflow blows past a 12g card while mxnet needs only a little over 7g;
- im2rec is convenient: much easier to pick up than writing code for tensorflow's tfrecord, just remember to shuffle when generating your own train.lst and test.lst;
- global_pool under Pooling is a gem that tensorflow lacks.
That is all! Next up are some tests on the ImageNet Dataset, which should be even more interesting.
Related reading:
Machine Learning Advanced Notes, Part 6 | Understanding Fast Neural Style in Depth
Machine Learning Advanced Notes, Part 5 | Understanding VGG and Residual Network in Depth
Machine Learning Advanced Notes, Part 4 | Understanding GoogLeNet in Depth
Machine Learning Advanced Notes, Part 3 | Understanding Alexnet in Depth
Machine Learning Advanced Notes, Part 2 | Understanding Neural Style in Depth
Machine Learning Advanced Notes, Part 1 | TensorFlow Installation and Basics
This article is contributed by the UCloud kernel and virtualization R&D team. About the author:
Burness (@段石石), deep learning R&D engineer at UCloud's platform R&D center, tflearn Contributor & tensorflow Contributor. He has worked on e-commerce recommendation and precision-marketing algorithms, and focuses on distributed deep learning frameworks and computer vision research. He enjoys playing with algorithms and open-source projects, occasionally joins a data competition for fun, and is at heart a geek, obsessed with new technologies and new skills.
You can find him on Github: http://hacker.duanshishi.com/
The "UCloud機構號" account exclusively shares technical insight, industry news, and everything else you may want to know about cloud computing.
Questions and follows are welcome o(*////▽////*)q~
That's all.