[Blog Archive] TensorFlow: A Deep Dive into AlexNet

Preface

After reading some TensorFlow documentation and a few interesting projects, I realized this area runs deeper than it looks and deserves a proper study from the ground up, especially the CV side, which I find particularly interesting. Over the coming while I will dig into the models that scored well in the ImageNet competition: AlexNet, GoogLeNet, VGG (yes, the pretrained model used earlier in the neural network posts), and deep residual networks.

ImageNet Classification with Deep Convolutional Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks describes the model that Hinton and his student Alex Krizhevsky used in the 2012 ImageNet Challenge. It set a new record for image classification, and from then on deep learning kept surpassing the state of the art on image tasks, eventually even reaching the point of beating human-level performance. While reading the paper I ran into many optimization techniques I had seen in passing but never studied closely. The paper explains how AlexNet achieved such strong results, so without further ado, let's get into it.

The figure shows the basic AlexNet network structure from Caffe. Since the paper's own figure is rather abstract, I used Caffe's draw_net to draw the network structure.

The Basic Structure of AlexNet

AlexNet has 8 layers in total: the first 5 are convolutional and the last 3 are fully connected. The paper notes that removing any one of the convolutional layers noticeably degrades the results. Here is what each layer looks like:

  • First convolutional layer: the input image is 227*227*3 (the paper says 224*224*3, which appears to be a slight inconsistency). It uses 96 kernels of shape (96, 11, 11, 3) with a stride of 4 pixels horizontally and vertically, producing a 55*55 grid of convolution outputs per kernel, followed by response normalization (actually Local Response Normalization, discussed below) and pooling. The pooling layer in Caffe's alexnet seems to differ slightly from the paper; AlexNet used two GPUs, which is why the first convolutional layer is drawn with two halves in the figure. Pooling uses pool_size=(3,3) with a stride of 2 pixels, yielding 96 feature maps of 27*27.
  • Second convolutional layer: 256 kernels (again split across the two GPUs, 128 kernels of 5*5*48 each), with pad_size=(2,2) and a stride of 1 pixel (thanks to a reader for the correction), producing 27*27 convolution outputs, followed by LRN and then pooling with a 3*3 window and a stride of 2 pixels, yielding 256 feature maps of 13*13.
  • The third and fourth layers have neither LRN nor pooling, and the fifth layer has only pooling. The third layer uses 384 kernels of 3*3*256 (pad_size=(1,1) pads the 13*13 input to 15*15; a (3,3) kernel with a stride of 1 pixel gives 384*13*13). The fourth layer uses 384 kernels (pad_size=(1,1) pads to 15*15; a (3,3) kernel with a stride of 1 pixel gives 384*13*13). The fifth layer uses 256 kernels (pad_size=(1,1) pads to 15*15; a (3,3) kernel gives 256*13*13, then pool_size=(3,3) with a stride of 2 pixels gives 256*6*6).
  • Fully connected layers: the first two have 4096 neurons each, and the final softmax output has 1000 classes (ImageNet). Note that in the Caffe figure the fully connected layers consist of relu, dropout, and innerProduct.
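As a quick sanity check on the sizes above, the standard output-size formula for conv/pool layers can be coded up in a few lines (a small sketch of my own, not from the paper):

```python
def conv_out(size, kernel, stride, pad=0):
    """Spatial output size of a conv/pool layer:
    floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

c1 = conv_out(227, 11, 4)       # conv1: 11x11, stride 4, no pad -> 55
p1 = conv_out(c1, 3, 2)         # pool1: 3x3, stride 2          -> 27
c2 = conv_out(p1, 5, 1, pad=2)  # conv2: 5x5, stride 1, pad 2   -> 27
p2 = conv_out(c2, 3, 2)         # pool2                         -> 13
c3 = conv_out(p2, 3, 1, pad=1)  # conv3                         -> 13
c4 = conv_out(c3, 3, 1, pad=1)  # conv4                         -> 13
c5 = conv_out(c4, 3, 1, pad=1)  # conv5                         -> 13
p5 = conv_out(c5, 3, 2)         # pool5                         -> 6
print([c1, p1, c2, p2, c3, c4, c5, p5])  # [55, 27, 27, 13, 13, 13, 13, 6]
```

The numbers match the 55, 27, 13, and 6 sizes quoted layer by layer above, and also show why the 227 input (rather than the paper's 224) is needed for conv1 to come out at exactly 55.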

(Thanks to AnaZou for pointing out some earlier issues above.) The paper also notes that its figure describes the two-GPU setup, which may genuinely differ a bit from Caffe's alexnet; but that is probably not the main point. In practice, just refer directly to Caffe's alexnet network definition, where every layer is spelled out in detail; the basic structure matches the description above.

Why AlexNet Achieves Such Good Results

Having covered AlexNet's basic network structure, you probably have questions about some of the pieces, such as LRN, ReLU, and dropout; anyone who has touched deep learning has likely heard of or looked into them. Below I follow the paper's account of why each of these improves the network's final performance.

ReLU Nonlinearity

Generally, newcomers to neural networks who have not yet dug into deep learning are more familiar with the other two activation functions (the functions that introduce genuine nonlinearity, letting a network fit nonlinear functions): tanh(x) and the sigmoid (1+e^(-x))^(-1), whereas ReLU (Rectified Linear Units) is f(x)=max(0,x). A deep convolutional network with ReLUs trains several times faster than the equivalent network with tanh. The figure below shows, for a four-layer convolutional network on CIFAR-10, the number of iterations needed to reach 25% training error with tanh versus ReLU:

The solid and dashed lines show the training error for ReLU and tanh respectively; ReLU clearly converges faster than tanh.
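One intuition for that speedup can be seen directly from the derivatives. The sketch below (my own illustration, not from the paper) compares ReLU with tanh in NumPy:

```python
import numpy as np

def relu(x):
    # ReLU: f(x) = max(0, x)
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
out = relu(x)  # [0., 0., 0., 0.5, 2.]

# tanh saturates: its derivative 1 - tanh(x)^2 shrinks toward 0 for
# large |x|, so gradients vanish; ReLU's derivative is exactly 1 for
# every x > 0, so gradients flow through unattenuated.
grad_tanh = 1.0 - np.tanh(x) ** 2   # grad_tanh[0] ~ 0.07 at x = -2
grad_relu = (x > 0).astype(float)   # [0., 0., 0., 1., 1.]
```

ReLU is also just a max, with no exponential to evaluate, which makes each forward and backward pass cheaper as well.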

Local Response Normalization

After applying ReLU f(x)=max(0,x), the activations no longer fall in a bounded range the way tanh or sigmoid outputs do, so a normalization step usually follows ReLU. LRN is one such method (I believe it is proposed in this paper, though I am not entirely certain). It is inspired by the neuroscience concept of "lateral inhibition", in which an active neuron suppresses its neighbors.
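To make the idea concrete, here is a minimal NumPy sketch of LRN across the channel axis, using the same parameterization as TensorFlow's tf.nn.local_response_normalization (depth_radius, bias, alpha, beta, with the values used later in this post); the NHWC layout and the loop-based implementation are my own simplifications:

```python
import numpy as np

def lrn(a, depth_radius=2, bias=1.0, alpha=2e-5, beta=0.75):
    """Local Response Normalization over the last (channel) axis:
        b_i = a_i / (bias + alpha * sum_{j=i-r..i+r} a_j^2) ** beta
    Each channel is damped by the summed squared activations of its
    neighboring channels -- the "lateral inhibition" effect.
    """
    sq = a ** 2
    out = np.empty_like(a)
    n = a.shape[-1]
    for i in range(n):
        lo, hi = max(0, i - depth_radius), min(n, i + depth_radius + 1)
        out[..., i] = a[..., i] / (bias + alpha * sq[..., lo:hi].sum(axis=-1)) ** beta
    return out

x = np.ones((1, 2, 2, 5), dtype=np.float32)  # toy NHWC activation map
y = lrn(x)  # every value is slightly damped below 1.0
```

Note that with alpha=0 the function is the identity, so the whole effect comes from the neighborhood sum in the denominator.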

Dropout

Dropout is another concept that comes up often; it is quite effective at preventing overfitting in neural networks. Whereas ordinary models such as linear models use regularization to prevent overfitting, dropout works in neural networks by modifying the network structure itself. For a given layer, neurons are randomly dropped according to a defined probability while the number of input and output neurons stays unchanged; parameters are then updated with the usual learning procedure, and on the next iteration a different random set of neurons is dropped, repeating until training ends.
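A minimal NumPy sketch of the mechanism, using the "inverted dropout" convention common today (survivors are scaled at training time, so nothing needs rescaling at test time; the paper itself instead halves the activations at test time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, keep_prob=0.8, train=True):
    """Inverted dropout: at training time, zero each unit with
    probability 1 - keep_prob and scale survivors by 1/keep_prob,
    so the expected activation is unchanged at test time."""
    if not train:
        return a
    mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
    return a * mask / keep_prob

x = np.ones((4, 1000))
y = dropout(x, keep_prob=0.8)
# roughly 20% of the units are zeroed; survivors become 1/0.8 = 1.25,
# keeping the mean activation close to 1
```

Because each iteration samples a fresh mask, the network is effectively training an ensemble of thinned sub-networks that share weights, which is where the regularization effect comes from.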

Data Augmentation

The simplest way to boost model performance and prevent overfitting is to add more data, but even that takes some strategy. The paper randomly extracts 227*227 patches from the 256*256 images (224*224 in the paper), and additionally extends the dataset through a PCA-based perturbation of the RGB channels. This expands the dataset very effectively. There are many more options depending on your use case, such as basic image transformations like increasing or decreasing brightness, various filtering operations, and so on. It is an especially effective technique when your dataset is not large.
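The two cheapest augmentations mentioned above, random cropping plus a random horizontal flip, can be sketched in a few lines of NumPy (227 here matches the Caffe model; the paper's text says 224 — the function and names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_flip(img, crop=227):
    """Take a random crop x crop patch from an HxWxC image and flip it
    horizontally half of the time -- each source image can thus yield
    many distinct training examples."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch

img = np.zeros((256, 256, 3), dtype=np.uint8)
patch = random_crop_flip(img)
print(patch.shape)  # (227, 227, 3)
```

From a 256*256 image this gives (256-227+1)^2 = 900 possible crop positions, times 2 for the flip, which is exactly the kind of cheap multiplication of the training set the paper relies on.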

That covers what I consider the essentials of the paper: the basic network structure plus a handful of small tricks against overfitting, all of which should be useful guidance for my own later projects.

AlexNet On Tensorflow

Caffe's AlexNet is defined in /models/bvlc_alexnet/train_val.prototxt if you want to see the concrete network structure. Here I will put together some TensorFlow-based AlexNet code; this version is from cs.toronto.edu/~guerzho

from numpy import *
import os
from pylab import *
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
import time
from scipy.misc import imread
from scipy.misc import imresize
import matplotlib.image as mpimg
from scipy.ndimage import filters
import urllib
from numpy import random
import tensorflow as tf

from caffe_classes import class_names

train_x = zeros((1, 227, 227, 3)).astype(float32)
train_y = zeros((1, 1000))
xdim = train_x.shape[1:]
ydim = train_y.shape[1]

net_data = load("bvlc_alexnet.npy").item()

def conv(input, kernel, biases, k_h, k_w, c_o, s_h, s_w, padding="VALID", group=1):
    """From https://github.com/ethereon/caffe-tensorflow"""
    c_i = input.get_shape()[-1]
    assert c_i % group == 0
    assert c_o % group == 0
    convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)
    if group == 1:
        conv = convolve(input, kernel)
    else:
        # split input and kernels along the channel axis and convolve per group,
        # emulating the two-GPU grouping of the original AlexNet
        input_groups = tf.split(3, group, input)
        kernel_groups = tf.split(3, group, kernel)
        output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)]
        conv = tf.concat(3, output_groups)
    return tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape().as_list())

x = tf.Variable(i)  # NOTE: `i` is the input image loaded earlier in the original script

# conv1: conv(11, 11, 96, 4, 4, padding="VALID", name="conv1")
k_h = 11; k_w = 11; c_o = 96; s_h = 4; s_w = 4
conv1W = tf.Variable(net_data["conv1"][0])
conv1b = tf.Variable(net_data["conv1"][1])
conv1_in = conv(x, conv1W, conv1b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=1)
conv1 = tf.nn.relu(conv1_in)

# lrn1: lrn(2, 2e-05, 0.75, name="norm1")
radius = 2; alpha = 2e-05; beta = 0.75; bias = 1.0
lrn1 = tf.nn.local_response_normalization(conv1, depth_radius=radius, alpha=alpha, beta=beta, bias=bias)

# maxpool1: max_pool(3, 3, 2, 2, padding="VALID", name="pool1")
k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = "VALID"
maxpool1 = tf.nn.max_pool(lrn1, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding)

# conv2: conv(5, 5, 256, 1, 1, group=2, name="conv2")
k_h = 5; k_w = 5; c_o = 256; s_h = 1; s_w = 1; group = 2
conv2W = tf.Variable(net_data["conv2"][0])
conv2b = tf.Variable(net_data["conv2"][1])
conv2_in = conv(maxpool1, conv2W, conv2b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group)
conv2 = tf.nn.relu(conv2_in)

# lrn2: lrn(2, 2e-05, 0.75, name="norm2")
radius = 2; alpha = 2e-05; beta = 0.75; bias = 1.0
lrn2 = tf.nn.local_response_normalization(conv2, depth_radius=radius, alpha=alpha, beta=beta, bias=bias)

# maxpool2: max_pool(3, 3, 2, 2, padding="VALID", name="pool2")
k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = "VALID"
maxpool2 = tf.nn.max_pool(lrn2, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding)

# conv3: conv(3, 3, 384, 1, 1, name="conv3")
k_h = 3; k_w = 3; c_o = 384; s_h = 1; s_w = 1; group = 1
conv3W = tf.Variable(net_data["conv3"][0])
conv3b = tf.Variable(net_data["conv3"][1])
conv3_in = conv(maxpool2, conv3W, conv3b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group)
conv3 = tf.nn.relu(conv3_in)

# conv4: conv(3, 3, 384, 1, 1, group=2, name="conv4")
k_h = 3; k_w = 3; c_o = 384; s_h = 1; s_w = 1; group = 2
conv4W = tf.Variable(net_data["conv4"][0])
conv4b = tf.Variable(net_data["conv4"][1])
conv4_in = conv(conv3, conv4W, conv4b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group)
conv4 = tf.nn.relu(conv4_in)

# conv5: conv(3, 3, 256, 1, 1, group=2, name="conv5")
k_h = 3; k_w = 3; c_o = 256; s_h = 1; s_w = 1; group = 2
conv5W = tf.Variable(net_data["conv5"][0])
conv5b = tf.Variable(net_data["conv5"][1])
conv5_in = conv(conv4, conv5W, conv5b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group)
conv5 = tf.nn.relu(conv5_in)

# maxpool5: max_pool(3, 3, 2, 2, padding="VALID", name="pool5")
k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = "VALID"
maxpool5 = tf.nn.max_pool(conv5, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding)

# fc6: fc(4096, name="fc6")
fc6W = tf.Variable(net_data["fc6"][0])
fc6b = tf.Variable(net_data["fc6"][1])
fc6 = tf.nn.relu_layer(tf.reshape(maxpool5, [1, int(prod(maxpool5.get_shape()[1:]))]), fc6W, fc6b)

# fc7: fc(4096, name="fc7")
fc7W = tf.Variable(net_data["fc7"][0])
fc7b = tf.Variable(net_data["fc7"][1])
fc7 = tf.nn.relu_layer(fc6, fc7W, fc7b)

# fc8: fc(1000, relu=False, name="fc8")
fc8W = tf.Variable(net_data["fc8"][0])
fc8b = tf.Variable(net_data["fc8"][1])
fc8 = tf.nn.xw_plus_b(fc7, fc8W, fc8b)

# prob: softmax(name="prob")
prob = tf.nn.softmax(fc8)

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
output = sess.run(prob)

# Output: print the top-5 predicted classes and their probabilities
inds = argsort(output)[0, :]
for i in range(5):
    print(class_names[inds[-1 - i]], output[0, inds[-1 - i]])

This version is written against raw TensorFlow: it is long and a bit cumbersome to read, and it also loads a network model exported from Caffe. Here is a somewhat simpler version from blog.csdn.net/chenriwei:

# Input data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf

# Network hyperparameters
learning_rate = 0.001
training_iters = 200000
batch_size = 64
display_step = 20

# Network parameters
n_input = 784    # input dimensionality
n_classes = 10   # label dimensionality
dropout = 0.8    # dropout keep probability

# Placeholder inputs
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)

# Convolution
def conv2d(name, l_input, w, b):
    return tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(l_input, w, strides=[1, 1, 1, 1], padding="SAME"), b), name=name)

# Max pooling
def max_pool(name, l_input, k):
    return tf.nn.max_pool(l_input, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding="SAME", name=name)

# Local response normalization
def norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input, lsize, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name=name)

# The whole network
def alex_net(_X, _weights, _biases, _dropout):
    # Reshape the input vector into an image
    _X = tf.reshape(_X, shape=[-1, 28, 28, 1])
    # Convolution layer
    conv1 = conv2d("conv1", _X, _weights["wc1"], _biases["bc1"])
    # Pooling (downsampling) layer
    pool1 = max_pool("pool1", conv1, k=2)
    # Normalization layer
    norm1 = norm("norm1", pool1, lsize=4)
    # Dropout
    norm1 = tf.nn.dropout(norm1, _dropout)
    # Convolution
    conv2 = conv2d("conv2", norm1, _weights["wc2"], _biases["bc2"])
    # Pooling
    pool2 = max_pool("pool2", conv2, k=2)
    # Normalization
    norm2 = norm("norm2", pool2, lsize=4)
    # Dropout
    norm2 = tf.nn.dropout(norm2, _dropout)
    # Convolution
    conv3 = conv2d("conv3", norm2, _weights["wc3"], _biases["bc3"])
    # Pooling
    pool3 = max_pool("pool3", conv3, k=2)
    # Normalization
    norm3 = norm("norm3", pool3, lsize=4)
    # Dropout
    norm3 = tf.nn.dropout(norm3, _dropout)
    # Fully connected layer: flatten the feature maps first
    dense1 = tf.reshape(norm3, [-1, _weights["wd1"].get_shape().as_list()[0]])
    dense1 = tf.nn.relu(tf.matmul(dense1, _weights["wd1"]) + _biases["bd1"], name="fc1")
    # Fully connected layer
    dense2 = tf.nn.relu(tf.matmul(dense1, _weights["wd2"]) + _biases["bd2"], name="fc2")
    # Output layer
    out = tf.matmul(dense2, _weights["out"]) + _biases["out"]
    return out

# All network parameters
weights = {
    "wc1": tf.Variable(tf.random_normal([3, 3, 1, 64])),
    "wc2": tf.Variable(tf.random_normal([3, 3, 64, 128])),
    "wc3": tf.Variable(tf.random_normal([3, 3, 128, 256])),
    "wd1": tf.Variable(tf.random_normal([4 * 4 * 256, 1024])),
    "wd2": tf.Variable(tf.random_normal([1024, 1024])),
    "out": tf.Variable(tf.random_normal([1024, 10]))
}
biases = {
    "bc1": tf.Variable(tf.random_normal([64])),
    "bc2": tf.Variable(tf.random_normal([128])),
    "bc3": tf.Variable(tf.random_normal([256])),
    "bd1": tf.Variable(tf.random_normal([1024])),
    "bd2": tf.Variable(tf.random_normal([1024])),
    "out": tf.Variable(tf.random_normal([n_classes]))
}

# Build the model
pred = alex_net(x, weights, biases, keep_prob)

# Loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluation
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize all shared variables
init = tf.initialize_all_variables()

# Launch training
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reaching max iterations
    while step * batch_size < training_iters:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # fetch a batch
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout})
        if step % display_step == 0:
            # Compute accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            # Compute loss
            loss = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            print("Iter " + str(step * batch_size) + ", Minibatch Loss= " + "{:.6f}".format(loss) + ", Training Accuracy= " + "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")
    # Test accuracy
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images[:256], y: mnist.test.labels[:256], keep_prob: 1.}))

This builds an alexnet-style network on MNIST; the input_data helper can be found on TensorFlow's GitHub. This version is much simpler.

Later I was happy to discover that tflearn ships an AlexNet example for classifying the Oxford flowers dataset. Thanks to tflearn's wrappers around everyday layers, the code comes in at under 50 lines. I also looked at the internal layer implementations; they are quite good and worth referring to when writing your own code. The code is at github.com/tflearn/tfle.

from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

import tflearn.datasets.oxflower17 as oxflower17
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))

# Building "AlexNet"
network = input_data(shape=[None, 227, 227, 3])
network = conv_2d(network, 96, 11, strides=4, activation="relu")
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = conv_2d(network, 256, 5, activation="relu")
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = conv_2d(network, 384, 3, activation="relu")
network = conv_2d(network, 384, 3, activation="relu")
network = conv_2d(network, 256, 3, activation="relu")
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = fully_connected(network, 4096, activation="tanh")
network = dropout(network, 0.5)
network = fully_connected(network, 4096, activation="tanh")
network = dropout(network, 0.5)
network = fully_connected(network, 17, activation="softmax")
network = regression(network, optimizer="momentum",
                     loss="categorical_crossentropy", learning_rate=0.001)

# Training
model = tflearn.DNN(network, checkpoint_path="model_alexnet",
                    max_checkpoints=1, tensorboard_verbose=2)
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, snapshot_step=200,
          snapshot_epoch=False, run_id="alexnet_oxflowers17")

Running experiments with the tflearn version of AlexNet, the basic results seen on TensorBoard are as follows:

The alexnet graph looks like this:

Reference

1. ImageNet Classification with Deep Convolutional Neural Networks

2. blog.csdn.net/chenriwei

3. cs.toronto.edu/~guerzho

4. github.com/tflearn/tfle

5. github.com/BVLC/caffe/b

