A First Shot at TensorFlow (1): CNN Image Classification

You can't learn deep learning only by reading papers and source code; you have to write code yourself. I recently spent some time properly studying TensorFlow and, along the way, wrote a simple CNN for image classification. I ran into quite a few problems, but solved them one by one and learned a lot. The point of this post is the implementation, not the final result.

The dataset I use is CIFAR-10, which you can get from the official CIFAR-10 page.

My IDE was IPython Notebook (not great for this kind of work; I'd recommend using .ipynb as little as possible).

The model has only a few layers, because my laptop can't run anything bigger quickly: two convolutional layers, two fully connected layers, and a softmax classifier at the end.

1. Data preprocessing

First, reading in the CIFAR-10 data. I followed the data-loading format used in the old cs231n assignments.

import tensorflow as tf
import numpy as np
import os
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

BATCH_SIZE = 64
NUM_CLASS = 10

# read one CIFAR-10 batch file
def load_CIFAR_batch(filename):
    import pickle
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='bytes')
        # print(datadict)
        X = datadict[b'data']
        Y = datadict[b'labels']
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
        Y = np.array(Y)
        return X, Y

def load_CIFAR10(ROOT):
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)

    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte


def read_data(num_training=49000, num_validation=1000, num_test=1000):
    cifar10_dir = 'data/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Transpose so that channels come first
    X_train = X_train.transpose(0, 3, 1, 2).copy()
    X_val = X_val.transpose(0, 3, 1, 2).copy()
    X_test = X_test.transpose(0, 3, 1, 2).copy()

    # Package data into a dictionary
    return {
        'X_train': X_train, 'y_train': y_train,
        'X_val': X_val, 'y_val': y_val,
        'X_test': X_test, 'y_test': y_test,
    }

data = read_data()
for x, y in data.items():
    print('%s: ' % x, y.shape)

"""
Output:
y_test: (1000,)
X_test: (1000, 3, 32, 32)
y_train: (49000,)
X_train: (49000, 3, 32, 32)
y_val: (1000,)
X_val: (1000, 3, 32, 32)
"""

2. Layer implementation

There isn't much to say about loading the data; imitating existing code is enough. Next come the functions that implement each layer: the convolutional layer, the batch-norm layer, and the fully connected layers.

# 5x5 convolution; variables live inside a named scope, and the try/except
# switches the scope to reuse mode if the variables already exist.
def conv2d(value, output_dim, k_h = 5, k_w = 5, strides = [1, 2, 2, 1], name = 'conv2d'):
    with tf.variable_scope(name):
        try:
            weights = tf.get_variable('weights',
                [k_h, k_w, value.get_shape()[-1], output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable('biases',
                [output_dim], initializer = tf.constant_initializer(0.0))
        except ValueError:
            tf.get_variable_scope().reuse_variables()
            weights = tf.get_variable('weights',
                [k_h, k_w, value.get_shape()[-1], output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable('biases',
                [output_dim], initializer = tf.constant_initializer(0.0))
        conv = tf.nn.conv2d(value, weights, strides = strides, padding = 'SAME')
        conv = conv + biases
        return conv

# batch normalization, using the contrib batch_norm imported above
def batch_norm_layer(value, is_train = True, name = 'batch_norm'):
    with tf.variable_scope(name) as scope:
        if is_train:
            return batch_norm(value, decay = 0.9, epsilon = 1e-5, scale = True,
                              is_training = is_train, updates_collections = None, scope = scope)
        else:
            return batch_norm(value, decay = 0.9, epsilon = 1e-5, scale = True,
                              is_training = is_train, reuse = True,
                              updates_collections = None, scope = scope)

# fully connected layer: x * W + b
def linear_layer(value, output_dim, name = 'fully_connected'):
    with tf.variable_scope(name):
        try:
            weights = tf.get_variable('weights',
                [value.get_shape()[1], output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable('biases',
                [output_dim], initializer = tf.constant_initializer(0.0))
        except ValueError:
            tf.get_variable_scope().reuse_variables()
            weights = tf.get_variable('weights',
                [value.get_shape()[1], output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable('biases',
                [output_dim], initializer = tf.constant_initializer(0.0))
        return tf.matmul(value, weights) + biases

# final classifier: linear layer followed by softmax
def softmax(value, output_dim, name = 'softmax'):
    with tf.variable_scope(name):
        try:
            weights = tf.get_variable('weights',
                [value.get_shape()[1], output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable('biases',
                [output_dim], initializer = tf.constant_initializer(0.0))
        except ValueError:
            tf.get_variable_scope().reuse_variables()
            weights = tf.get_variable('weights',
                [value.get_shape()[1], output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable('biases',
                [output_dim], initializer = tf.constant_initializer(0.0))
        return tf.nn.softmax(tf.matmul(value, weights) + biases)

When implementing each layer function I used variable scopes, so that every variable has its own name. To guard against the "variable already exists" error, I catch the ValueError with try/except and switch the scope to reuse mode, which neatly avoids the variable-reuse problems that show up when the graph-building code is run more than once.
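As a side note, if your TensorFlow version is 1.4 or newer, a cleaner way to get the same behaviour (a sketch, not the code used in this post) is to open the scope with reuse=tf.AUTO_REUSE, which creates the variables on the first call and reuses them afterwards:

# Minimal sketch, not the code above: AUTO_REUSE removes the need for
# any ValueError handling. Requires TensorFlow >= 1.4.
def conv2d_auto(value, output_dim, k_h=5, k_w=5, strides=[1, 1, 1, 1], name='conv2d'):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        weights = tf.get_variable('weights',
            [k_h, k_w, value.get_shape()[-1], output_dim],
            initializer=tf.truncated_normal_initializer(stddev=0.02))
        biases = tf.get_variable('biases', [output_dim],
            initializer=tf.constant_initializer(0.0))
        return tf.nn.conv2d(value, weights, strides=strides, padding='SAME') + biases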

The convolutional layer needs little explanation: after creating the weights and biases, it mostly comes down to calling tf.nn.conv2d. For the batch-norm layer I also used TensorFlow's built-in implementation. tf.nn.softmax() turns the computed scores into class probabilities, from which the cross-entropy loss is then computed; for background on softmax and the cross-entropy loss, see the CS231n notes on linear classification (CS231n課程筆記翻譯:線性分類筆記(下)).

3. Model implementation

Below are the model and the function that computes the loss:

def CNN(image, train = True):
    # conv -> relu -> max pool -> batch norm
    conv1 = conv2d(image, 64, k_h = 5, k_w = 5, strides = [1, 1, 1, 1], name = 'cnn_conv2d1')
    conv1 = tf.nn.relu(conv1, name = 'relu1')
    pool1 = tf.nn.max_pool(conv1, ksize = [1, 3, 3, 1], strides = [1, 2, 2, 1],
                           padding = 'SAME', name = 'cnn_pool1')
    norm1 = batch_norm_layer(pool1, is_train = train, name = 'cnn_norm1')

    # conv -> relu -> batch norm -> max pool
    conv2 = conv2d(norm1, 64, k_h = 5, k_w = 5, strides = [1, 1, 1, 1], name = 'cnn_conv2d2')
    conv2 = tf.nn.relu(conv2, name = 'relu2')
    norm2 = batch_norm_layer(conv2, is_train = train, name = 'cnn_norm2')
    pool2 = tf.nn.max_pool(norm2, ksize = [1, 3, 3, 1], strides = [1, 2, 2, 1],
                           padding = 'SAME', name = 'cnn_pool2')

    # flatten, then two fully connected layers
    dim = int(pool2.get_shape()[1]) * int(pool2.get_shape()[2]) * int(pool2.get_shape()[3])
    pool2 = tf.reshape(pool2, [-1, dim])
    fc1 = linear_layer(pool2, 384, name = 'cnn_fc1')
    fc1 = tf.nn.relu(fc1, name = 'relu3')

    fc2 = linear_layer(fc1, 192, name = 'cnn_fc2')
    fc2 = tf.nn.relu(fc2, name = 'relu4')

    # softmax classifier, output shape [batch_size, NUM_CLASS]
    softmax_result = softmax(fc2, NUM_CLASS, name = 'cnn_softmax')
    return softmax_result

def cal_loss(scores, labels):
    cross_entropy = -tf.reduce_mean(labels * tf.log(scores))
    return cross_entropy

Each layer in the model is given a different name, so the weights and biases of each layer live in different variable scopes and don't collide.

The input is a Tensor, so get_shape() is used to read off its dimensions. The CNN function ultimately returns a Tensor of shape [batch_size, 10], which is then passed to cal_loss to compute the mean cross-entropy loss.

4. Training

Finally, the training loop. For the input data I define two placeholders, images and y, so the same graph works with any batch size. The labels y are read in as a 1-D vector, but the softmax loss needs them with shape [batch_size, 10], so they are converted to one-hot encoding.
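As a quick illustration of what tf.one_hot does (the label values below are made up for the example):

# Hypothetical toy example: three integer labels expanded to one-hot rows.
labels = tf.constant([3, 0, 9])                 # shape [3]
one_hot_labels = tf.one_hot(labels, depth=10)   # shape [3, 10]
with tf.Session() as s:
    print(s.run(one_hot_labels))  # row i has 1.0 at column labels[i], 0.0 elsewhere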

For the optimization step I use tf.train.AdamOptimizer(0.0002, beta1 = 0.5).minimize(loss); in my tests Adam converged many times faster than SGD.
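For reference, the SGD setups I compared against would be built the same way; the learning rates below are only illustrative values, not the ones from my runs:

# Plain SGD for comparison (illustrative learning rate).
sgd_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# SGD with momentum, also for comparison.
momentum_step = tf.train.MomentumOptimizer(0.01, momentum=0.9).minimize(loss)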

I also use TensorFlow's checkpointing mechanism to save the model during training, so that the next run can resume from where the previous one stopped.

y_train = data['y_train']
y_val = data['y_val']
X_val = data['X_val'].transpose(0, 2, 3, 1)
X_test = data['X_test'].transpose(0, 2, 3, 1)
y_test = data['y_test']
X_train = data['X_train'].transpose(0, 2, 3, 1)

# non-trainable variables used to remember where training stopped
global_step = tf.Variable(0, name = 'global_step', trainable = False)
curr_epoch = tf.Variable(0, name = 'curr_epoch', trainable = False)
curr_batch_idx = tf.Variable(0, name = 'curr_batch_idx', trainable = False)
value = tf.placeholder(tf.int32, [], name = 'value')
images = tf.placeholder(tf.float32, [None, 32, 32, 3], name = 'images')
y = tf.placeholder(tf.int32, [None], name = 'y')
_y = tf.one_hot(y, depth = 10, on_value=None, off_value=None, axis=None, dtype=None, name='one_hot')

t_vars = tf.trainable_variables()
softmax_result = CNN(images)
loss = cal_loss(softmax_result, _y)

train_step = tf.train.AdamOptimizer(0.0002, beta1 = 0.5).minimize(loss)

correct_prediction = tf.equal(tf.to_int32(y), tf.to_int32(tf.argmax(softmax_result, 1)))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

op_assign1 = tf.assign(curr_epoch, value)
op_assign2 = tf.assign(curr_batch_idx, value)

check_path = "data/CNN/model.ckpt"
saver = tf.train.Saver()
sess = tf.InteractiveSession()
init = tf.global_variables_initializer()
sess.run(init)
saver.restore(sess, check_path)

# resume from the saved epoch / batch index
epoch_ckpt = curr_epoch.eval()
idx_ckpt = curr_batch_idx.eval()
print(idx_ckpt)
for epoch in range(epoch_ckpt, 100):
    batch_idx = int(49000 / 64)
    sess.run(op_assign1, feed_dict = {value: epoch})
    for idx in range(idx_ckpt, batch_idx):
        sess.run(op_assign2, feed_dict = {value: idx + 1})
        batch_images = X_train[idx*64:idx*64+64]
        batch_labels = y_train[idx*64:idx*64+64]
        sess.run(train_step, feed_dict = {images: batch_images, y: batch_labels})
        if idx % 100 == 0:
            print("Epoch: %d [%4d/%4d] loss: %.8f, accuracy: %.8f"
                  % (epoch, idx, batch_idx,
                     loss.eval({images: X_test, y: y_test}),
                     accuracy.eval({images: X_test, y: y_test})))
            saver.save(sess, check_path)
    idx_ckpt = 0
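One caveat about the code above: saver.restore fails if no checkpoint exists yet, for example on the very first run. A small guard like the sketch below (assuming the same check_path) restores only when a checkpoint file is actually present:

# Sketch of a guarded restore: load weights only if a checkpoint already exists,
# otherwise keep the freshly initialized variables from sess.run(init).
ckpt = tf.train.get_checkpoint_state(os.path.dirname(check_path))
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
else:
    print("No checkpoint found, training from scratch")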

For tensors like loss and accuracy, whose values depend on the input data, you can call eval(feed_dict={...}) to fetch their values. Every 100 training steps I feed in the test data to get the current test loss and test accuracy.

The training results were as follows:

After roughly seven or eight epochs, the test accuracy and loss had basically stabilized, with classification accuracy around 75%. Later, in the second half of epoch 8, the loss suddenly became nan (not a number). I hadn't fully pinned down the cause, but my guess was that once the model fits the training data, some predicted probabilities reach exactly 0 and the log inside the loss blows up. After adding an eps inside the log, training ran to 11 epochs without any nan, so that does seem to have been the problem.
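Concretely, the eps fix just keeps the argument of the log away from zero; an arguably more standard fix is to drop the explicit tf.nn.softmax and apply TensorFlow's fused, numerically stable cross-entropy op to the raw logits. Both variants are sketched below (scores means the softmax output, logits the pre-softmax scores):

# Variant 1: add a small eps inside the log so log(0) can never occur.
def cal_loss_eps(scores, labels, eps=1e-10):
    return -tf.reduce_mean(labels * tf.log(scores + eps))

# Variant 2 (more standard): feed raw logits to the fused, numerically
# stable op instead of doing tf.nn.softmax + tf.log by hand.
def cal_loss_logits(logits, labels):
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))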

The network I wrote is small and its accuracy is low, but this exercise still taught me a lot. When you only read TensorFlow code, it all looks simple and easy to understand, but actually writing it yourself surfaces plenty of problems, and I feel my understanding of TF is now much deeper. I also spent a lot of time on the ValueError issue, which taught me a few strategies for dealing with it, and convinced me never to write deep learning code in .ipynb again.

