TensorFlow from 1 to 2 | Chapter 5: Experts Only! Implementing a CNN in TensorFlow

01-30


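If you are reading this article, then by the standard of TensorFlow's official tutorial "Deep MNIST for Experts" you have reached Expert Level. Congratulations.

Leaving aside whether that is an exaggeration, look at it another way: if a time machine could take you back just 5 years, this CNN implementation would let you dominate ImageNet, unbeaten. Even without the time machine, it remains a powerful weapon today: a great many older algorithms are waiting to be retired, and a great many products in vertical domains are waiting for a next-generation replacement. Isn't that enough?

The previous article, "TensorFlow from 1 to 2 | Chapter 4: Dissecting the CNN Architecture", laid the theoretical groundwork for CNNs. This article turns to the code and looks at how TensorFlow implements a CNN that pushes recognition accuracy above 99%.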

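How the code is analyzed

A quick reminder of how I analyze code.

Rather than going through the code line by line, I prefer to first clear up the language and tool knowledge that the code relies on, and only then scan the logic. That is why the sections "Python essentials" and "TensorFlow essentials" come first.

Of course, you can also jump straight to the code: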

tf_2-5_cnn.py: a CNN that recognizes MNIST digits, based on the official tutorial "Deep MNIST for Experts" with minor tweaks;

tf_2-5_cnn_fashion_mnist.py (https://github.com/EthanYuan/TensorFlow/blob/master/TF1_3/tf_2-5_cnn_fashion_mnist.py): a CNN that recognizes Fashion-MNIST (http://www.jianshu.com/p/2ed1707c610d).

Runtime environment:

1. Python 3.6.2;

2. TensorFlow 1.3.0, CPU version.

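Python essentials

with

The code analyzed in this article uses the with statement heavily, so it deserves a few words.

with is used together with a context manager object.

A context manager object is an object that implements the context manager protocol: its class must define the __enter__() and __exit__() methods.

Given a statement of the following form:

    with ContextManagerObject [as target]:
        Body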

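it means four things:

1. The with block automatically calls the __enter__() method of the context manager object before Body starts;

2. The with block automatically calls the __exit__() method of the context manager object when Body finishes; __exit__() is always called, even if an exception interrupts Body before it completes;

3. If an exception occurs in Body and __exit__() returns False, the exception continues to propagate upward; if it returns True, the exception is suppressed;

4. The optional as target is bound not to the context manager object itself, but to the return value of its __enter__() call.

In short, the with statement automates two operations for a context manager object, enter and exit, and takes exceptions fully into account. For resource-like objects that should be released as soon as they are no longer needed, such as file handles and database connections, this is a concise and robust way to write the code.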
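The four points above can be checked with a small, self-contained sketch. The class name Resource, the helper run, and the events list are hypothetical names introduced only for this illustration; they do not appear in the article's code.

```python
class Resource:
    """Hypothetical context manager that records what happens to it."""

    def __init__(self, suppress):
        self.suppress = suppress  # value __exit__ returns when the body fails
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return "handle"  # this value, not the Resource, is bound by `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        # True swallows an exception raised in the body, False re-raises it.
        return self.suppress


def run(suppress):
    """Run a failing body; return (target, events, did the exception escape)."""
    manager = Resource(suppress)
    target = None
    escaped = False
    try:
        with manager as target:
            manager.events.append("body")
            raise ValueError("failure inside the body")
    except ValueError:
        escaped = True
    return target, manager.events, escaped


print(run(suppress=True))   # ('handle', ['enter', 'body', 'exit'], False)
print(run(suppress=False))  # ('handle', ['enter', 'body', 'exit'], True)
```

In both runs __exit__() is called even though the body raises; only its return value decides whether the caller ever sees the exception.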

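For more detail, I recommend an older article, "A Brief Look at Python's with Statement" (《淺談Python的with語句》).

TensorFlow essentials

The with statement was covered above mainly because it goes hand in hand with TensorFlow's tf.name_scope.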

tf.name_scope

First, get a feel for it with a piece of "toy" code I put together:

    import tensorflow as tf

    # Variables created inside the scope get the "V1/" prefix.
    with tf.name_scope('V1'):
        a1 = tf.Variable([50])              # default node name: Variable
        a2 = tf.Variable([100], name='a1')  # explicit node name: a1

    assert a1.name == 'V1/Variable:0'
    assert a2.name == 'V1/a1:0'

    # Operation nodes are prefixed too; a repeated name gets a "_1" suffix.
    with tf.name_scope("V2"):
        a1 = tf.add(a1, a2, name="Add_Variable_a1")
        a2 = tf.multiply(a1, a2, name="Add_Variable_a1")

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        assert a1.name == 'V2/Add_Variable_a1:0'
        assert sess.run(a1) == 150
        assert a2.name == 'V2/Add_Variable_a1_1:0'
        assert sess.run(a2) == 15000

    # Recover the variable node that no Python name refers to any more.
    a2 = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='V1/a1:0')[0]
    assert a2.name == 'V1/a1:0'

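As you can see, with is paired with two kinds of object here.

One is the resource-like tf.Session. When used by hand, you must remember to call tf.Session.close afterwards to release it; paired with with, its __exit__() is called automatically to do the release.

The other, which is the focus of this section, is not a "resource" at all but the object returned by tf.name_scope(). Nodes defined inside such a with block automatically get the name scope prepended to their name attribute:

Variable nodes defined with tf.Variable have the prefix added to their name attribute;
Operation nodes defined with tf.add and tf.multiply have the prefix added to their name attribute as well.

Note: the name attribute of nodes defined with tf.get_variable is not affected; tf.get_variable has to be paired with tf.variable_scope to produce a similar effect.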

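What is a TensorFlow name scope good for? Mainly two things:

1. It acts as a namespace. Name scopes can also be nested, which makes the nodes of a large computational graph easier to manage;

2. It controls visualization: a hierarchical computational graph can be generated, in which nodes are collapsed by name scope, as in the figure below.

Figure: collapsed nodes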

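If the explanation above still leaves questions, read the following notes carefully:

The Python variables a1, a2, a3 and so on returned by tf.Variable() are references to nodes; they have nothing whatsoever to do with the nodes' name attribute;
A node's name attribute identifies it within the computational graph. A Python-level reference variable does not, and it can be rebound to another node at any time;
If the Python level loses its reference to a node, the node does not disappear and is not garbage-collected; the second-to-last line of the toy code shows how to get it back;
For the concept of the Node, the basic building block of a TensorFlow computational graph, see "TensorFlow from 0 to 1 - 2 - TensorFlow Core Programming".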

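CNN architecture

With the obstacles cleared, we can finally start building the CNN.

The CNN built in TensorFlow's official "Deep MNIST for Experts" is comparable in depth to LeNet-5, with 5 hidden layers, but its convolutional layers have far more filters:

Input layer: placeholder;
reshape;
Hidden layer 1: conv1, convolutional layer, 32 filters;
Hidden layer 2: pool1, pooling layer;
Hidden layer 3: conv2, convolutional layer, 64 filters;
Hidden layer 4: pool2, pooling layer;
Hidden layer 5: fc1, fully connected layer;
dropout;
fc2, output layer.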

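Count the weights in the network:

5x5x1x32 + 5x5x32x64 + 7x7x64x1024 + 1024x10 = 800 + 51200 + 3211264 + 10240 = 3273504

This not particularly deep CNN has more than three million parameters, two orders of magnitude more than the shallow neural network used earlier to recognize MNIST. A closer look, however, shows that the two convolutional layers account for a tiny share of the weights; the layer responsible for the surge in parameters is the fully connected layer fc1.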
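The arithmetic above is easy to reproduce. The short sketch below multiplies out the four weight shapes used by the network and reports fc1's share of the total; like the calculation above, it ignores biases. The dictionary and the helper count are names chosen for this illustration.

```python
# Weight tensor shapes of the four trainable layers (biases not counted,
# matching the calculation in the text).
weight_shapes = {
    "conv1": (5, 5, 1, 32),
    "conv2": (5, 5, 32, 64),
    "fc1": (7 * 7 * 64, 1024),
    "fc2": (1024, 10),
}


def count(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


weights = {name: count(shape) for name, shape in weight_shapes.items()}
total = sum(weights.values())
fc1_share = weights["fc1"] / total

print(weights["conv1"], weights["conv2"], weights["fc1"], weights["fc2"])
print(total)                # 3273504
print(round(fc1_share, 3))  # 0.981
```

So roughly 98% of the weights sit in fc1, while the two convolutional layers together contribute 52,000.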

The figure below shows the finished computational graph. Thanks to name scopes, it displays the skeleton of the network clearly at "layer" granularity:

Figure: CNN

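Tensors and Filters

The sample code is written with more of an engineering mindset: construction of the CNN is wrapped in the function deepnn. Inside the function, the with blocks make the network's feed-forward path quite clear, so I will not go over it again.

This section focuses on the Tensors flowing through the nodes of the graph we build, and the Filters that take part in the computation:

Tensor: rank 4, with shape [batch, height, width, channels];
Filter: rank 4, with shape [height, width, channels, F-amount];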

deepnn is defined as follows (omitted parts are replaced with ……):

    def deepnn(x):
        with tf.name_scope('reshape'):
            x_image = tf.reshape(x, [-1, 28, 28, 1])

        with tf.name_scope('conv1'):
            W_conv1 = weight_variable([5, 5, 1, 32])
            ……

        with tf.name_scope('pool1'):
            h_pool1 = max_pool_2x2(h_conv1)

        with tf.name_scope('conv2'):
            W_conv2 = weight_variable([5, 5, 32, 64])
            ……

        with tf.name_scope('pool2'):
            h_pool2 = max_pool_2x2(h_conv2)

        with tf.name_scope('fc1'):
            W_fc1 = weight_variable([7 * 7 * 64, 1024])
            b_fc1 = bias_variable([1024])
            h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

        with tf.name_scope('dropout'):
            ……

        with tf.name_scope('fc2'):
            W_fc2 = weight_variable([1024, 10])
            b_fc2 = bias_variable([10])
            y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
        return y_conv, keep_prob

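Tensors - [batch, height, width, channels]:

1. x_image = tf.reshape(x, [-1, 28, 28, 1]): before the data can be fed into the two-dimensional convolutional network, it has to be reshaped from [batch, 784] to [-1, 28, 28, 1]. The -1 in the batch position adapts to however many inputs there are, the height and width positions hold the original height and width of the input image, and the last position is the number of channels in the original image, 1 (a grayscale image has a single channel);

2. h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64]): when the output of the convolutional part is fed into the fully connected layer, it has to be flattened back into a rank-2 Tensor.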
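How the -1 in the two reshape calls gets resolved can be sketched in plain Python. The helper resolve_shape is hypothetical (it is not a TensorFlow function); it only reproduces the rule that a single -1 is replaced by whatever size makes the element count match. The batch size of 50 is the one used in the training loop of the full listing.

```python
def resolve_shape(num_elements, shape):
    """Replace a single -1 in `shape` with the size that makes it fit."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if num_elements % known != 0:
        raise ValueError("shape does not divide the number of elements")
    return [num_elements // known if dim == -1 else dim for dim in shape]


batch = 50  # images per training batch

# [batch, 784] -> [batch, 28, 28, 1] before the first convolution
print(resolve_shape(batch * 784, [-1, 28, 28, 1]))          # [50, 28, 28, 1]

# [batch, 7, 7, 64] -> [batch, 3136] before the fully connected layer
print(resolve_shape(batch * 7 * 7 * 64, [-1, 7 * 7 * 64]))  # [50, 3136]
```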

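Filters - [height, width, channels, F-amount]:

1. W_conv1 = weight_variable([5, 5, 1, 32]): the filters of the first convolutional layer. The height and width positions give the size of the convolution kernel, the channels position gives the number of filter channels (matching the input), and the last position, F-amount, says there are 32 filters (the official documentation defines it from the output's point of view as output channels, which is equally reasonable);

2. W_conv2 = weight_variable([5, 5, 32, 64]): the filters of the second convolutional layer, again with a 5x5 kernel, 32 channels, and 64 filters.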
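To see why the filter's channels position must match the input and why F-amount becomes the number of output channels, here is a deliberately naive pure-Python convolution for a single image. It is a sketch under stated assumptions, not how TensorFlow computes it: stride 1, zero padding that keeps the spatial size, odd filter sizes, and no kernel flipping. The function name conv2d_same is hypothetical.

```python
def conv2d_same(image, filters):
    """Naive stride-1, size-preserving 2-D convolution for one image.

    image:   nested lists indexed [row][col][in_channel]
    filters: nested lists indexed [f_row][f_col][in_channel][out_channel]
    Returns nested lists indexed [row][col][out_channel].
    """
    height, width, in_ch = len(image), len(image[0]), len(image[0][0])
    f_h, f_w = len(filters), len(filters[0])
    if len(filters[0][0]) != in_ch:
        raise ValueError("filter channels must match input channels")
    out_ch = len(filters[0][0][0])
    pad_h, pad_w = (f_h - 1) // 2, (f_w - 1) // 2  # assumes odd filter sizes

    output = [[[0.0] * out_ch for _ in range(width)] for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for i in range(f_h):
                for j in range(f_w):
                    rr, cc = r + i - pad_h, c + j - pad_w
                    if not (0 <= rr < height and 0 <= cc < width):
                        continue  # outside the image: contributes zero
                    for k in range(in_ch):
                        for o in range(out_ch):
                            output[r][c][o] += (
                                image[rr][cc][k] * filters[i][j][k][o])
    return output


# A 3x3 single-channel image of ones and two 3x3 filters
# (the first all ones, the second all twos): filter shape [3, 3, 1, 2].
image = [[[1.0] for _ in range(3)] for _ in range(3)]
filters = [[[[1.0, 2.0]] for _ in range(3)] for _ in range(3)]
result = conv2d_same(image, filters)

print(len(result), len(result[0]), len(result[0][0]))  # 3 3 2
print(result[1][1])  # [9.0, 18.0], the centre sees all nine pixels
print(result[0][0])  # [4.0, 8.0], a corner sees only four pixels
```

One input channel and two filters give an output with two channels, which is exactly the relationship between [5, 5, 1, 32] and the 32 feature maps produced by conv1.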

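Strides

To avoid repeating code, the convolution and pooling operations are wrapped as well. The filter strides that were missing above are defined here.

Definition of conv2d:

    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

strides=[1, 1, 1, 1]: in strides, the first value (batch) and the last value (channels) are normally 1; the second (height) and third (width) are also set to 1 here.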

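Definition of max_pool_2x2:

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

ksize=[1, 2, 2, 1]: the pooling filter has a fixed size. Pooling is done channel by channel, so the first value (batch) and the last value (channels) are normally 1; the second (height) and third (width) are set to 2 here, giving a field of view shaped like the Chinese character 田 (a 2x2 grid);

strides=[1, 2, 2, 1]: in strides, the first value (batch) and the last value (channels) are normally 1; the second (height) and third (width) are set to 2 here, so the 2x2 window sweeps from left to right and from top to bottom without overlapping.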
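What a 2x2 window with stride 2 does to one channel can be shown on a small grid in plain Python. This is a minimal sketch that assumes even height and width, so no padding is involved; the function name max_pool_2x2_grid is hypothetical and works on a nested list rather than a Tensor.

```python
def max_pool_2x2_grid(feature_map):
    """2x2 max pooling with stride 2 over a single-channel 2-D grid.

    Assumes even height and width, so no padding is needed.
    """
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled


grid = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 8],
    [2, 6, 7, 3],
]
print(max_pool_2x2_grid(grid))  # [[4, 5], [6, 9]]
```

Each output value is the maximum of one non-overlapping 2x2 block, so height and width are both halved.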

Filters also take a padding parameter. The official documentation gives the following calculation:

padding == 'SAME': output_spatial_shape = ceil(input_spatial_shape / strides);

padding == 'VALID': output_spatial_shape = ceil((input_spatial_shape - (spatial_filter_shape - 1)) / strides).
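Applying these two formulas to the network explains where the 7 * 7 * 64 in fc1 comes from. The sketch below evaluates them along one spatial dimension for the sequence conv1, pool1, conv2, pool2; output_size is a hypothetical helper written for this illustration.

```python
import math


def output_size(input_size, filter_size, stride, padding):
    """Spatial output size along one dimension, per the two formulas above."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - (filter_size - 1)) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# (filter size, stride) for conv1, pool1, conv2, pool2, all with SAME padding
layers = [(5, 1), (2, 2), (5, 1), (2, 2)]

size = 28
trace = [size]
for filter_size, stride in layers:
    size = output_size(size, filter_size, stride, "SAME")
    trace.append(size)

print(trace)                           # [28, 28, 14, 14, 7]
print(output_size(28, 5, 1, "VALID"))  # 24
```

With SAME padding the convolutions keep the 28x28 size and only the two pooling layers shrink it, to 14x14 and then 7x7. With VALID padding the first 5x5 convolution alone would already reduce 28 to 24.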

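Test results

Running the code gives results essentially in line with TensorFlow's official figures:

MNIST recognition accuracy reaches 99.3%, clearly ahead of the shallow neural network;
60 epochs take 4 hours of CPU time, which is close to unbearable; deeper networks really need a GPU.

With the same architecture, the network was retrained on the Fashion MNIST dataset, and validation accuracy reached 92.64%. This gives a glimpse of how versatile CNNs are.

Figure: Fashion MNIST training output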

Full code

    import argparse
    import sys

    from tensorflow.examples.tutorials.mnist import input_data
    import tensorflow as tf

    FLAGS = None


    def deepnn(x):
        """deepnn builds the graph for a deep net for classifying digits.

        Args:
          x: an input tensor with the dimensions (N_examples, 784), where 784 is
            the number of pixels in a standard MNIST image.

        Returns:
          A tuple (y, keep_prob). y is a tensor of shape (N_examples, 10), with
          values equal to the logits of classifying the digit into one of 10
          classes (the digits 0-9). keep_prob is a scalar placeholder for the
          probability of dropout.
        """
        # Reshape to use within a convolutional neural net.
        # Last dimension is for "features" - there is only one here, since images
        # are grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
        with tf.name_scope('reshape'):
            x_image = tf.reshape(x, [-1, 28, 28, 1])

        # First convolutional layer - maps one grayscale image to 32 feature maps.
        with tf.name_scope('conv1'):
            W_conv1 = weight_variable([5, 5, 1, 32])
            b_conv1 = bias_variable([32])
            h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

        # Pooling layer - downsamples by 2X.
        with tf.name_scope('pool1'):
            h_pool1 = max_pool_2x2(h_conv1)

        # Second convolutional layer -- maps 32 feature maps to 64.
        with tf.name_scope('conv2'):
            W_conv2 = weight_variable([5, 5, 32, 64])
            b_conv2 = bias_variable([64])
            h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

        # Second pooling layer.
        with tf.name_scope('pool2'):
            h_pool2 = max_pool_2x2(h_conv2)

        # Fully connected layer 1 -- after 2 round of downsampling, our 28x28 image
        # is down to 7x7x64 feature maps -- maps this to 1024 features.
        with tf.name_scope('fc1'):
            W_fc1 = weight_variable([7 * 7 * 64, 1024])
            b_fc1 = bias_variable([1024])
            h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

        # Dropout - controls the complexity of the model, prevents co-adaptation of
        # features.
        with tf.name_scope('dropout'):
            keep_prob = tf.placeholder(tf.float32)
            h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

        # Map the 1024 features to 10 classes, one for each digit
        with tf.name_scope('fc2'):
            W_fc2 = weight_variable([1024, 10])
            b_fc2 = bias_variable([10])
            y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
        return y_conv, keep_prob


    def conv2d(x, W):
        """conv2d returns a 2d convolution layer with full stride."""
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


    def max_pool_2x2(x):
        """max_pool_2x2 downsamples a feature map by 2X."""
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')


    def weight_variable(shape):
        """weight_variable generates a weight variable of a given shape."""
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)


    def bias_variable(shape):
        """bias_variable generates a bias variable of a given shape."""
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)


    def main(_):
        # Import data
        mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True,
                                          validation_size=10000)

        # Create the model
        x = tf.placeholder(tf.float32, [None, 784])

        # Define loss and optimizer
        y_ = tf.placeholder(tf.float32, [None, 10])

        # Build the graph for the deep net
        y_conv, keep_prob = deepnn(x)

        with tf.name_scope('loss'):
            cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
                labels=y_, logits=y_conv)
            cross_entropy = tf.reduce_mean(cross_entropy)

        with tf.name_scope('adam_optimizer'):
            train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

        with tf.name_scope('accuracy'):
            correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
            correct_prediction = tf.cast(correct_prediction, tf.float32)
            accuracy = tf.reduce_mean(correct_prediction)

        graph_location = 'MNIST/logs/tf2-4/train'
        print('Saving graph to: %s' % graph_location)
        train_writer = tf.summary.FileWriter(graph_location)
        train_writer.add_graph(tf.get_default_graph())

        best = 0
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for epoch in range(60):
                # One epoch here is 1000 mini-batches of 50 images.
                for _ in range(1000):
                    batch = mnist.train.next_batch(50)
                    train_step.run(
                        feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
                # Evaluate on the validation set with dropout switched off.
                accuracy_validation = accuracy.eval(
                    feed_dict={
                        x: mnist.validation.images,
                        y_: mnist.validation.labels,
                        keep_prob: 1.0})
                print('epoch %d, validation accuracy %s' % (
                    epoch, accuracy_validation))
                best = (best, accuracy_validation)[
                    best <= accuracy_validation]

        # Test trained model
        print("best: %s" % best)


    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--data_dir', type=str, default='../MNIST/',
                            help='Directory for storing input data')
        FLAGS, unparsed = parser.parse_known_args()
        tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Download tf_2-5_cnn.py
