A Python Implementation of the LeNet-5 Model

07-10

From the column: Deep Learning + Natural Language Processing (NLP)

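This article implements the classic LeNet-5 model with the TensorFlow library for Python, then trains and tests it on the CIFAR-10 dataset. All of the code has been run and tested; readers can also download the Jupyter Notebook file from my GitHub.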

The LeNet-5 Model

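LeNet-5 is a convolutional neural network model proposed by Yann LeCun in 1998. It was originally designed for handwritten digit recognition and is one of the most representative experimental systems among early convolutional neural networks. Link to the original paper. The structure of the model is shown below.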

The CIFAR-10 Dataset

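The CIFAR-10 dataset consists of 60000 32×32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, so an individual training batch may contain more images from one class than another. Taken together, the five training batches contain exactly 5000 images from each class. Below are the classes in the dataset, along with 10 randomly selected images from each class: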

Goal

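The goal of this article is to build the LeNet model and to train and test it on the CIFAR-10 dataset, using Python and the TensorFlow library. To make the dataset easier to handle, please download the helper.py file and place it in your working directory. Download link.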

Python version: Python 3.5.4
TensorFlow version: tensorflow 1.1.0

Getting the Data

Dataset download link: CIFAR-10 dataset for python. Please download and extract it.

Preprocessing Functions

Normalization

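Each image has shape [32, 32, 3], where the third dimension holds the RGB channels and every channel value lies in [0, 255]. Dividing every pixel value by 255 therefore normalizes the data to the range [0, 1].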

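import numpy as np

def normalize(x):
    """
    Normalize to the range [0, 1].
    x: array of pixel data
    return: the normalized data
    """
    return x / 255

normalize(np.array([120, 329, 255]))  # test (329 lies outside the valid pixel range and maps above 1)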

One-hot Encoding

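The labels need to be one-hot encoded. First create an all-zero array whose length equals the number of classes, then set the position that corresponds to the label to 1. Labels are zero-based, so a label of 4 is encoded as [0,0,0,0,1,0,0,0,0,0].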

def one_hot_encode(x):
    """
    One-hot encoding.
    : x: list of labels
    : return: one-hot encoded labels as a NumPy array
    """
    output = np.zeros((len(x), 10))  # one row per label, one column per class
    for i in range(len(x)):
        output[i][x[i]] = 1  # set the position that matches the label to 1
    return output

one_hot_encode(np.array([1, 3, 6, 2, 4, 9]))  # test
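To make the zero-based indexing concrete, here is a minimal pure-Python sketch of the same idea. The name `one_hot` and the range check are illustrative assumptions, not part of the article's code.

```python
def one_hot(labels, n_classes=10):
    """Return one row per label with a single 1.0 at the label's zero-based index."""
    rows = []
    for label in labels:
        if not 0 <= label < n_classes:
            raise ValueError("label {} is outside 0..{}".format(label, n_classes - 1))
        row = [0.0] * n_classes
        row[label] = 1.0  # index 4 is the fifth position
        rows.append(row)
    return rows


encoded = one_hot([4, 0, 9])
print(encoded[0])
```

The first row has its 1.0 in the fifth position, because label 4 maps to index 4.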

Preprocessing and Saving All the Data

With the preprocessing functions above, we can preprocess all of the data and save it.

# helper.py must be downloaded and saved in the working directory; the download link is in the "Goal" section
import helper
import pickle

helper.preprocess_and_save_data('cifar-10-batches-py', normalize, one_hot_encode)

# Load the saved validation data
valid_features, valid_labels = pickle.load(open('preprocess_validation.p', mode='rb'))

Building the Basic Modules of a Convolutional Network

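This chapter focuses on building the basic modules of a convolutional neural network with the TensorFlow library: the input layer, convolution layer, pooling layer, fully connected layer, and output layer.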

Image Input

Define a tensor based on the shape of the input images.

import tensorflow as tf

def neural_net_image_input(image_shape):
    """
    Return a tensor for the input images.
    : image_shape: shape of the images
    : return: Tensor of the correct shape
    """
    return tf.placeholder(tf.float32, [None, image_shape[0], image_shape[1], image_shape[2]], name='x')

neural_net_image_input([32, 32, 3])  # test

Note that the image shape [32, 32, 3] becomes a tensor of shape (?, 32, 32, 3). The first dimension is the batch size; the question mark means that batches of any size can be fed in later.

Label Input

Define a tensor based on the number of label classes.

def neural_net_label_input(n_classes):
    """
    Return a tensor for the input labels.
    : n_classes: number of label classes
    : return: Tensor of the correct shape
    """
    return tf.placeholder(tf.float32, [None, n_classes], name='y')

neural_net_label_input(10)  # test

The returned tensor has shape (?, 10); the question mark again means that batches of any size can be fed in later.

Dropout Parameter

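The dropout algorithm is used later on. In short, it reduces overfitting by randomly switching off a fraction of the neurons while the network is being trained. We therefore need a parameter, keep_prob, which is the fraction of neurons that are kept (not the fraction that is switched off).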

def neural_net_keep_prob_input():
    """
    Return the keep probability as a tensor. It is the parameter of the dropout
    algorithm and gives the fraction of neurons that are kept.
    """
    return tf.placeholder(tf.float32, None, name='keep_prob')
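As a rough illustration of what keep_prob controls, the sketch below applies dropout to a plain Python list. It assumes the "inverted dropout" convention that `tf.nn.dropout` follows in TensorFlow 1.x, where surviving values are scaled by 1/keep_prob so the expected sum stays the same. The function name `dropout` and the explicit `rng` argument are illustrative assumptions.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale survivors by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, 0.9, rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(kept_fraction, mean_after)
```

With keep_prob = 0.9, roughly 90% of the values survive and the mean stays close to the original 1.0. At evaluation time keep_prob is set to 1.0, which leaves the values unchanged.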

Convolution and Pooling Layers

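Look at the structure of LeNet-5: input layer -> convolution layer -> pooling layer -> convolution layer -> pooling layer -> fully connected layer 1 -> fully connected layer 2 -> output layer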

We can therefore build a convolution-pooling module and a fully connected module, and then combine these modules to form the LeNet-5 network.

def conv2d_avgpool(x_tensor, conv_num_outputs, conv_ksize, conv_strides, pool_ksize, pool_strides, keep_prob):
    """
    Apply a convolution layer and a pooling layer to x_tensor.
    x_tensor: TensorFlow Tensor
    conv_num_outputs: number of outputs of the convolution layer
    conv_ksize: kernel size of the convolution
    conv_strides: strides of the convolution
    pool_ksize: kernel size of the pooling layer
    pool_strides: strides of the pooling layer
    keep_prob: dropout keep probability
    : return: the tensor after the convolution and pooling layers
    """
    weight = tf.Variable(tf.random_normal([conv_ksize[0], conv_ksize[1], x_tensor.get_shape().as_list()[3], conv_num_outputs]))
    bias = tf.Variable(tf.random_normal([conv_num_outputs]))
    x = tf.nn.conv2d(x_tensor, weight, strides=[1, conv_strides[0], conv_strides[1], 1], padding='VALID')
    x = tf.nn.bias_add(x, bias)
    x = tf.nn.dropout(x, keep_prob)
    x = tf.nn.avg_pool(x, ksize=[1, pool_ksize[0], pool_ksize[1], 1], strides=[1, pool_strides[0], pool_strides[1], 1], padding='SAME')
    x = tf.nn.sigmoid(x)
    return x

# Test (the original call used the undefined name conv2d_avg_pool)
conv2d_avgpool(tf.placeholder(tf.float32, [None, 32, 32, 3]), 6, (5, 5), (1, 1), (2, 2), (2, 2), 0.8)
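To show what the pooling and activation steps of this module do to the numbers, here is a small pure-Python sketch of 2×2 average pooling with stride 2 followed by a sigmoid, on a single 4×4 feature map. It assumes even height and width, as is the case for the 28×28 and 10×10 maps in this network, so no padding is involved. The names `avg_pool_2x2` and `sigmoid` and the sample values are illustrative assumptions.

```python
import math


def avg_pool_2x2(grid):
    """Average non-overlapping 2x2 windows of a 2-D list with even height and width."""
    height, width = len(grid), len(grid[0])
    pooled = []
    for r in range(0, height, 2):
        row = []
        for c in range(0, width, 2):
            window = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(sum(window) / 4.0)
        pooled.append(row)
    return pooled


def sigmoid(x):
    """Logistic function, applied element-wise after pooling in the module above."""
    return 1.0 / (1.0 + math.exp(-x))


feature_map = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 0.0],
    [2.0, 2.0, -1.0, 1.0],
    [2.0, 2.0, 1.0, -1.0],
]
pooled = avg_pool_2x2(feature_map)  # 4x4 -> 2x2
activated = [[sigmoid(v) for v in row] for row in pooled]
print(pooled)
print(activated)
```

Each output value is the mean of one 2×2 window, so the spatial size is halved while the number of channels is unchanged.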

Flatten Layer

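A fully connected layer is the hidden layer of an ordinary neural network and takes one-dimensional input, whereas convolution and pooling layers work on two-dimensional feature maps. A Flatten layer therefore sits between the pooling layer and the fully connected layer and flattens each example into a one-dimensional vector. Why is Flatten not counted as a layer of the convolutional neural network? Because it has no parameters and does not change the values in any way; it is better understood as a change of data shape than as a layer of the network.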

import tensorflow.contrib as tfc

def flatten(x_tensor):
    """
    Reshape x_tensor to (batch size, flattened image size).
    x_tensor: input tensor
    return: output tensor
    """
    return tfc.layers.flatten(x_tensor)

flatten(tf.placeholder(tf.float32, [None, 16, 16, 10]))  # test
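The effect of flattening on a single example can be sketched in plain Python. The helper name `flatten_image` is an illustrative assumption; the 16×16×10 shape is the one used in the test call above.

```python
def flatten_image(image):
    """Flatten one height x width x channels nested list into a flat list (row-major)."""
    return [value for row in image for pixel in row for value in pixel]


height, width, channels = 16, 16, 10
image = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
flat = flatten_image(image)
print(len(flat))
```

The flattened length is height × width × channels, here 16 × 16 × 10 = 2560, which is why the fully connected test below uses a placeholder of width 2560.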

Fully Connected Layer

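A fully connected layer corresponds to the hidden layer of an ordinary neural network. The only parameter we need to specify is the output size (the number of neurons).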

def fully_conn(x_tensor, num_outputs):
    """
    Define a fully connected layer.
    : x_tensor: input tensor
    : num_outputs: output size
    : return: tensor
    """
    return tfc.layers.fully_connected(x_tensor, num_outputs)

fully_conn(tf.placeholder(tf.float32, [None, 2560]), 120)  # test
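For intuition, the computation of one fully connected layer on a single example can be sketched as follows. `tf.contrib.layers.fully_connected` applies ReLU by default when `activation_fn` is not given, so the sketch uses ReLU as well; the function name `dense_relu` and the small weights are illustrative assumptions.

```python
def dense_relu(inputs, weights, biases):
    """Compute relu(inputs . weights + biases) for one example.

    weights[i][j] connects input i to output unit j.
    """
    outputs = []
    for j in range(len(biases)):
        total = biases[j]
        for i, value in enumerate(inputs):
            total += value * weights[i][j]
        outputs.append(max(0.0, total))  # ReLU clips negative sums to zero
    return outputs


inputs = [1.0, 2.0]
weights = [[1.0, -1.0, 0.5],
           [0.5, -1.0, 0.0]]
biases = [0.0, 1.0, 1.0]
result = dense_relu(inputs, weights, biases)
print(result)
```

The second unit receives a negative weighted sum and is therefore clipped to 0.0, while the other two pass through unchanged.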

Output Layer

Last comes the output layer. Since we have 10 classes, the output layer has size 10.

def output(x_tensor, num_outputs):
    """
    Define the output layer.
    : x_tensor: input tensor
    : num_outputs: output size
    : return: tensor
    """
    return tfc.layers.fully_connected(inputs=x_tensor, num_outputs=num_outputs, activation_fn=None)

output(tf.placeholder(tf.float32, [None, 120]), 10)  # test

The LeNet-5 Model

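With the modules above, we can now assemble the LeNet-5 model like building blocks. For convenience, the structure diagram of the model is shown here again: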

def LeNet(x, keep_prob):
    """
    Build LeNet.
    : x: input tensor
    : keep_prob: dropout keep probability
    : return: Tensor
    """
    # C1 and S2 layers
    x = conv2d_avgpool(x, 6, (5, 5), (1, 1), (2, 2), (2, 2), keep_prob)
    # C3 and S4 layers
    x = conv2d_avgpool(x, 16, (5, 5), (1, 1), (2, 2), (2, 2), keep_prob)
    # C5 layer
    weight = tf.Variable(tf.random_normal([5, 5, x.get_shape().as_list()[3], 120]))
    bias = tf.Variable(tf.random_normal([120]))
    x = tf.nn.conv2d(x, weight, strides=[1, 5, 5, 1], padding='VALID')
    x = tf.nn.bias_add(x, bias)
    x = flatten(x)
    # F6 layer
    x = fully_conn(x, 84)
    # output layer
    x = output(x, 10)
    return x
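It can help to trace the tensor shapes through this network by hand. The sketch below does so for a 32×32×3 input, using the standard output-size rules for 'VALID' and 'SAME' padding, and also counts the trainable parameters. The helper names `valid_size` and `same_size` are illustrative assumptions; the layer settings are read off the code above.

```python
import math


def valid_size(n, kernel, stride):
    """Output length along one dimension with 'VALID' padding."""
    return (n - kernel) // stride + 1


def same_size(n, stride):
    """Output length along one dimension with 'SAME' padding."""
    return math.ceil(n / stride)


size, channels = 32, 3  # CIFAR-10 input: 32x32 pixels, 3 color channels
shapes = []
params = 0

# The two convolution + pooling blocks: (name, number of filters)
for name, out_channels in [("C1/S2", 6), ("C3/S4", 16)]:
    params += 5 * 5 * channels * out_channels + out_channels  # weights + biases
    size = valid_size(size, 5, 1)  # 5x5 convolution, stride 1, VALID
    size = same_size(size, 2)      # 2x2 average pooling, stride 2, SAME
    channels = out_channels
    shapes.append((name, size, size, channels))

# C5: 5x5 convolution with 120 filters, stride 5, VALID
params += 5 * 5 * channels * 120 + 120
size = valid_size(size, 5, 5)
channels = 120
shapes.append(("C5", size, size, channels))

flat = size * size * channels  # length after flatten
params += flat * 84 + 84       # F6
params += 84 * 10 + 10         # output layer

for shape in shapes:
    print(shape)
print(flat, params)
```

Under these assumptions the feature maps shrink from 32×32 to 14×14, then 5×5, then 1×1, so C5 sees exactly one 5×5 window and its stride has no effect on the result. The total comes to 62,006 trainable parameters for this 3-channel variant.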

Next, we build the model from the modules above and define the loss function and the optimizer.

tf.reset_default_graph()

# Inputs
x = neural_net_image_input((32, 32, 3))
y = neural_net_label_input(10)
keep_prob = neural_net_keep_prob_input()

# Model
logits = LeNet(x, keep_prob)

# Name logits Tensor, so that it can be loaded from disk after training
logits = tf.identity(logits, name='logits')

# Loss function and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer().minimize(cost)

# Use accuracy as the metric
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
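The loss and the metric defined above can be worked through by hand on a tiny batch. The sketch below computes the softmax cross-entropy per example, averages it into `cost`, and computes `accuracy` by comparing argmax positions. The helper names and the three-class logits are illustrative assumptions, not values from the article.

```python
import math


def cross_entropy(logits, one_hot_label):
    """Softmax cross-entropy for one example, computed in a numerically stable way."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    log_total = math.log(sum(math.exp(v - peak) for v in logits))
    return -sum(t * (v - peak - log_total) for t, v in zip(one_hot_label, logits))


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

losses = [cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
cost = sum(losses) / len(losses)  # mean over the batch, like tf.reduce_mean

correct = [argmax(l) == argmax(y) for l, y in zip(batch_logits, batch_labels)]
accuracy = sum(correct) / len(correct)
print(cost, accuracy)
```

The second example has equal logits, so the predicted distribution is uniform and its loss is log 3; its prediction is also counted as wrong, which gives an accuracy of 0.5 on this batch.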

Training the Neural Network

Training Function

Because we train in batches, we first define a training function.

def train_neural_network(session, optimizer, keep_probability, feature_batch, label_batch):
    """
    Train on one batch of the training set.
    : session: TensorFlow session
    : optimizer: the optimizer
    : keep_probability: dropout parameter
    : feature_batch: batch of features
    : label_batch: batch of labels
    """
    session.run(optimizer, feed_dict={x: feature_batch, y: label_batch, keep_prob: keep_probability})

Printing Status

Training takes a long time, so we define a function that prints intermediate status.

def print_stats(session, feature_batch, label_batch, cost, accuracy):
    """
    Print the loss on the given batch and the accuracy on the validation set.
    : session: TensorFlow session
    : feature_batch: batch of features
    : label_batch: batch of labels
    : cost: loss function
    : accuracy: accuracy
    """
    loss = session.run(cost, feed_dict={x: feature_batch, y: label_batch, keep_prob: 1.})
    # Use the session argument here; the original called the global sess
    valid_acc = session.run(accuracy, feed_dict={x: valid_features, y: valid_labels, keep_prob: 1.})
    print(loss)
    print(valid_acc)

Training

epochs = 30
batch_size = 128
keep_probability = 0.9
save_model_path = './image_classification'

print('Training...')
with tf.Session() as sess:
    # Initializing the variables
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(epochs):
        # Loop over all batches
        n_batches = 5
        for batch_i in range(1, n_batches + 1):
            for batch_features, batch_labels in helper.load_preprocess_training_batch(batch_i, batch_size):
                train_neural_network(sess, optimizer, keep_probability, batch_features, batch_labels)
            print('Epoch {:>2}, CIFAR-10 Batch {}: '.format(epoch + 1, batch_i), end='')
            print_stats(sess, batch_features, batch_labels, cost, accuracy)

    # Save Model
    saver = tf.train.Saver()
    save_path = saver.save(sess, save_model_path)
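The inner loop above relies on helper.py to split each CIFAR-10 batch file into mini-batches of batch_size examples. helper.py is not shown in the article, so the following is only a guess at what such a splitter might look like; the name `iterate_batches` is hypothetical.

```python
def iterate_batches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices of at most batch_size items."""
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]


features = list(range(10))
labels = [v % 2 for v in features]
batches = list(iterate_batches(features, labels, 4))
sizes = [len(batch_features) for batch_features, _ in batches]
print(sizes)
```

When the number of examples is not a multiple of batch_size, the last mini-batch is simply smaller than the others.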

The final accuracy obtained was 48.9%.

Testing

Finally, we evaluate the algorithm on the test set.

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import tensorflow as tf
import pickle
import helper
import random

try:
    if batch_size:
        pass
except NameError:
    batch_size = 64

save_model_path = './image_classification'
n_samples = 4
top_n_predictions = 3

def test_model():
    # File name kept as in the original; it must match the file that helper.py wrote
    test_features, test_labels = pickle.load(open('preprocess_training.p', mode='rb'))
    loaded_graph = tf.Graph()

    with tf.Session(graph=loaded_graph) as sess:
        # Load model
        loader = tf.train.import_meta_graph(save_model_path + '.meta')
        loader.restore(sess, save_model_path)

        # Get Tensors from loaded model
        loaded_x = loaded_graph.get_tensor_by_name('x:0')
        loaded_y = loaded_graph.get_tensor_by_name('y:0')
        loaded_keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')
        loaded_logits = loaded_graph.get_tensor_by_name('logits:0')
        loaded_acc = loaded_graph.get_tensor_by_name('accuracy:0')

        # Get accuracy in batches for memory limitations
        test_batch_acc_total = 0
        test_batch_count = 0

        for train_feature_batch, train_label_batch in helper.batch_features_labels(test_features, test_labels, batch_size):
            test_batch_acc_total += sess.run(
                loaded_acc,
                feed_dict={loaded_x: train_feature_batch, loaded_y: train_label_batch, loaded_keep_prob: 1.0})
            test_batch_count += 1

        print('Testing Accuracy: {}'.format(test_batch_acc_total/test_batch_count))

        # Print Random Samples
        random_test_features, random_test_labels = tuple(zip(*random.sample(list(zip(test_features, test_labels)), n_samples)))
        random_test_predictions = sess.run(
            tf.nn.top_k(tf.nn.softmax(loaded_logits), top_n_predictions),
            feed_dict={loaded_x: random_test_features, loaded_y: random_test_labels, loaded_keep_prob: 1.0})
        helper.display_image_predictions(random_test_features, random_test_labels, random_test_predictions)

test_model()

Output:

INFO:tensorflow:Restoring parameters from ./image_classification
Testing Accuracy: 0.49238528481012656
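One detail of the evaluation loop is worth noting: it averages the per-batch accuracies without weighting them by batch size. If the number of test images is not a multiple of batch_size, the smaller last batch counts as much as a full one, so the result can differ slightly from the accuracy over all images. The sketch below illustrates the difference with made-up counts; both function names and the numbers are illustrative assumptions.

```python
def mean_of_batch_accuracies(batches):
    """Unweighted mean of per-batch accuracies; batches are (correct, size) pairs."""
    return sum(correct / size for correct, size in batches) / len(batches)


def overall_accuracy(batches):
    """Accuracy over all examples: total correct divided by total size."""
    return sum(correct for correct, _ in batches) / sum(size for _, size in batches)


# Two full batches at 50% accuracy and a short final batch at 100% (made-up numbers)
batches = [(32, 64), (32, 64), (16, 16)]
unweighted = mean_of_batch_accuracies(batches)
weighted = overall_accuracy(batches)
print(unweighted, weighted)
```

In this exaggerated example the unweighted mean overstates the accuracy because the short batch is over-represented. With many full batches and a single short one, as in the test above, the gap is usually small.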

That's all. Thanks for reading.