Study Notes TF029: Implementing an Advanced Convolutional Network

02-03

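CIFAR-10 is a classic dataset of 60,000 32x32 color images, split into 50,000 training images and 10,000 test images. It is labeled with 10 classes of 6,000 images each: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The classes do not overlap. The companion dataset CIFAR-100 has 100 classes. Both were collected by Geoffrey Hinton, often called the godfather of deep learning, and his students Alex Krizhevsky and Vinod Nair, from the 80 million tiny images dataset. The state-of-the-art error rate is about 3.5%, which takes more than ten hours of GPU training. Detailed benchmarks and rankings are at http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html . According to LeCun, existing convolutional neural networks have essentially solved the CIFAR-10 problem.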

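The model here is adapted from Alex Krizhevsky's cuda-convnet. Trained for 3,000 batches of 128 samples each, it reaches about 73% accuracy. Training takes a few dozen seconds on a single GTX 1080 and much longer on a CPU. With 100k batches combined with learning rate decay (reducing the learning rate by a fixed ratio at regular intervals), accuracy can reach about 86%. The model has about 1 million trainable parameters, and one prediction takes about 20 million arithmetic operations. Three techniques are used: L2 regularization of the weights; data augmentation such as image flipping and random cropping to create more samples; and an LRN layer with each convolution and max pooling stage to improve generalization.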
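The learning rate decay mentioned above can be sketched in a few lines. This is only an illustration of a staircase schedule, not the schedule used by cuda-convnet or the TensorFlow tutorial; the function name decayed_learning_rate and the values 0.1, 10000, and 0.5 are assumptions chosen for the example.

```python
def decayed_learning_rate(base_lr, step, decay_steps, decay_rate):
    """Staircase decay: multiply base_lr by decay_rate once every decay_steps steps."""
    num_decays = step // decay_steps
    return base_lr * decay_rate ** num_decays


# Hypothetical schedule: start at 0.1 and halve every 10,000 steps.
for step in (0, 9999, 10000, 25000):
    print(step, decayed_learning_rate(0.1, step, 10000, 0.5))
```

The rate stays constant inside each interval and drops by the same ratio at every boundary, which is the behavior the text describes.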

Download the TensorFlow Models repository and use the CIFAR-10 data classes it provides: git clone https://github.com/tensorflow/models.git . The code is in models/tutorials/image/cifar10.

Import the usual libraries, NumPy and time, along with the TensorFlow Models classes that automatically download and read the CIFAR-10 data.

Define batch_size, the number of training steps max_steps, and the default path for the downloaded CIFAR-10 data.

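Define a weight initialization function that uses tf.truncated_normal to initialize weights from a truncated normal distribution and attaches an L2 loss to the weight, which amounts to L2 regularization. Overfitting caused by too many features is usually eased by reducing the number of features or by penalizing the weights of unimportant ones, and regularization helps find which feature weights to penalize: using a feature now costs some loss. L1 regularization produces sparse features, with most useless feature weights set to 0. L2 regularization keeps feature weights from growing too large, so they stay relatively even. The parameter wl controls the size of the L2 loss: tf.nn.l2_loss computes the L2 loss of the weight, and tf.multiply multiplies it by wl to give the final weight loss. tf.add_to_collection stores the weight loss in a collection named losses, which is used later to compute the network's total loss.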
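To make the weight loss concrete, here is a plain-Python sketch of the arithmetic. tf.nn.l2_loss computes half the sum of squares; the helper names l2_loss, l1_loss, and weight_loss and the sample weights below are made up for illustration and are not part of the tutorial code.

```python
def l2_loss(weights):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return sum(w * w for w in weights) / 2.0


def l1_loss(weights):
    """Sum of absolute values, shown only for comparison with L2."""
    return sum(abs(w) for w in weights)


def weight_loss(weights, wl):
    """Scale the L2 loss by the coefficient wl."""
    return wl * l2_loss(weights)


weights = [0.5, -1.0, 2.0]           # made-up weight values
print(l2_loss(weights))              # (0.25 + 1.0 + 4.0) / 2
print(weight_loss(weights, 0.004))
print(weight_loss(weights, 0.0))     # wl = 0 switches the penalty off
```

Because the penalty grows with the square of each weight, the largest weight (2.0 here) dominates the L2 loss, which is why L2 regularization discourages large weights in particular.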

Use the cifar10 class to download the dataset and extract it to the default location.

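Use the distorted_inputs function of cifar10_input to produce the training data, both features and labels. It returns wrapped tensors, and each execution yields batch_size samples. cifar10_input.distorted_inputs applies data augmentation: random horizontal flips (tf.image.random_flip_left_right), a random 24x24 crop (tf.random_crop), random brightness and contrast (tf.image.random_brightness, tf.image.random_contrast), and standardization (tf.image.per_image_whitening, renamed tf.image.per_image_standardization in later TensorFlow releases, which subtracts the mean and divides by the standard deviation so that the data has zero mean and unit variance). This yields more, noisier samples: one image becomes many, which enlarges the sample set and improves accuracy. Augmentation consumes a lot of CPU time, so distorted_inputs speeds it up with 16 independent threads; the function creates a thread pool internally and schedules it through a TensorFlow queue.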
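The two geometric augmentations are easy to picture on a small grid. The sketch below is a plain-Python stand-in for the idea behind tf.random_crop and tf.image.random_flip_left_right, not their implementation; the helper names and the toy image, whose pixels simply record their own coordinates, are assumptions for illustration.

```python
import random


def random_crop(image, size, rng):
    """Cut a size x size window at a random position from a 2-D grid of pixels."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def random_flip_left_right(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


rng = random.Random(0)
# A made-up 32x32 "image" whose pixel value is its (row, column) position.
image = [[(r, c) for c in range(32)] for r in range(32)]
samples = [random_flip_left_right(random_crop(image, 24, rng), rng)
           for _ in range(4)]
print([(s[0][0], len(s), len(s[0])) for s in samples])
```

Each call returns a different 24x24 view of the same source image, which is how one picture turns into many training samples.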

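Use cifar10_input.inputs to generate the test data. It only crops the central 24x24 region of each image and standardizes it; no random distortion is applied.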
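The test-time preprocessing can be sketched the same way. This is a single-channel, plain-Python approximation under stated assumptions: the helper names are made up, and the floor of 1/sqrt(N) on the divisor follows the documented behavior of tf.image.per_image_standardization, which guards against dividing by zero on a uniform image.

```python
import math


def center_crop(image, size):
    """Keep the central size x size window of a 2-D grid."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def standardize(image):
    """Shift to zero mean and scale to unit variance.

    The divisor is floored at 1/sqrt(N) so a constant image does not
    divide by zero.
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    scale = max(std, 1.0 / math.sqrt(n))
    return [[(p - mean) / scale for p in row] for row in image]


# A made-up 32x32 grayscale image with increasing pixel values.
image = [[float(r * 32 + c) for c in range(32)] for r in range(32)]
crop = center_crop(image, 24)
normalized = standardize(crop)
print(len(normalized), len(normalized[0]), crop[0][0])
```

Unlike the training pipeline, every step here is deterministic, so the same test image always produces the same input.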

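Create the input placeholders for the features and the labels. batch_size is used when defining the network structure, so the first dimension of each placeholder shape, the number of samples, must be fixed in advance and cannot be None. The image dimensions are 24x24, the size after cropping, with 3 color channels for RGB.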

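First convolutional layer. variable_with_weight_loss creates and initializes the kernel parameters: 5x5 kernels, 3 color channels, 64 kernels, with an initialization standard deviation of 0.05 and wl (weight loss) set to 0, so this layer is not L2-regularized. tf.nn.conv2d convolves the input image_holder with stride 1 and SAME padding. The bias is initialized to 0 and added to the convolution result, followed by a ReLU activation for nonlinearity. Next comes a max pooling layer with size 3x3 and stride 2x2; because the size and the stride differ, neighboring pooling windows overlap, which increases the richness of the data. Finally tf.nn.lrn applies LRN to the result.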
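With SAME padding the spatial output size depends only on the input size and the stride: it is the input size divided by the stride, rounded up. A small sketch of how the 24x24 input shrinks through this layer follows; the helper name same_output_size is made up for illustration.

```python
import math


def same_output_size(input_size, stride):
    """Spatial output size of a conv or pooling op with SAME padding."""
    return int(math.ceil(input_size / float(stride)))


size = 24                                        # cropped input is 24x24
after_conv1 = same_output_size(size, 1)          # stride-1 convolution
after_pool1 = same_output_size(after_conv1, 2)   # stride-2 max pooling
print(after_conv1, after_pool1)
```

The convolution keeps the 24x24 resolution and the pooling halves it to 12x12. Since the pooling window is 3 wide but moves only 2 pixels at a time, adjacent windows share one row or column.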

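LRN originates from the paper in which Alex Krizhevsky entered the ImageNet competition with a CNN. It imitates lateral inhibition in biological nervous systems by creating competition among the activities of local neurons: larger responses become relatively larger and neurons with smaller responses are suppressed, which improves generalization. In that paper, adding LRN lowered the CNN's top-1 error rate by 1.4%. LRN is useful for activation functions with no upper bound, such as ReLU, because it picks out the larger responses among several nearby kernels. It is not suited to activations such as Sigmoid, which have fixed bounds and already suppress overly large values.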
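The normalization itself is a short formula: each channel value is divided by (bias + alpha * sum of squares over nearby channels) raised to beta. The sketch below applies it to the channel vector at a single pixel, following the formula in the tf.nn.lrn documentation; the defaults mirror the arguments used in this post, while the function name, the sample responses, and the exaggerated alpha=1.0 are assumptions for illustration.

```python
def lrn(channels, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75):
    """Normalize each channel value by the squared activity of nearby channels.

    channels holds the responses of all kernels at one pixel position.
    """
    out = []
    for d, value in enumerate(channels):
        lo = max(0, d - depth_radius)
        hi = min(len(channels), d + depth_radius + 1)
        sqr_sum = sum(x * x for x in channels[lo:hi])
        out.append(value / (bias + alpha * sqr_sum) ** beta)
    return out


# Same response of 1.0 in channel 0, with a quiet and a loud neighbor.
quiet = lrn([1.0, 0.0, 0.0], alpha=1.0)
loud = lrn([1.0, 50.0, 0.0], alpha=1.0)
print(quiet[0], loud[0])
```

The response in channel 0 is pushed down much harder when its neighbor fires strongly, which is the competition effect described above.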

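Second convolutional layer. It is like the first, except that the third dimension of the kernel shape, the number of input channels, is 64, and the bias is initialized to 0.1 instead of 0. The order of the last two layers is also swapped: LRN comes first, then max pooling.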

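First fully connected layer. Flatten the output of the two convolutional stages: tf.reshape turns each sample into a one-dimensional vector, and get_shape gives the flattened length. variable_with_weight_loss initializes the weight with 384 hidden units and a normal-distribution standard deviation of 0.04; the bias is initialized to 0.1. The weight loss is set to a nonzero value, 0.004, so that all of this layer's parameters are constrained by L2 regularization to avoid overfitting. A ReLU activation follows.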

Second fully connected layer. It is the same as the first, but with 192 hidden units.

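Last layer. Create the weight with a normal-distribution standard deviation equal to the reciprocal of the previous hidden layer's size, and leave it out of the L2 regularization. The softmax is applied in the loss computation, so the inference output does not need a softmax to give the final classification: comparing the raw per-class outputs is enough, because softmax preserves their order.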
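A quick check of that last point: softmax is monotonic, so the largest logit is also the largest probability. The logits below are made-up scores for 10 classes, and softmax and argmax are small helper functions written for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [1.2, -0.3, 4.1, 0.0, 2.2, -1.5, 0.7, 3.9, -2.0, 0.1]
probs = softmax(logits)
print(argmax(logits), argmax(probs))
```

Both calls pick the same class, so skipping the softmax at inference time changes nothing about the predicted label.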

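That completes the network from input to output. Designing a CNN comes down to arranging the convolutional, pooling, and fully connected layers and their order, setting hyperparameters, and applying tricks. The structure of this network:

conv1: convolutional layer with ReLU activation

pool1: max pooling

norm1: LRN

conv2: convolutional layer with ReLU activation

norm2: LRN

pool2: max pooling

local3: fully connected layer with ReLU activation

local4: fully connected layer with ReLU activation

logits: model inference output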
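The layer shapes above are enough to check the figure of about 1 million trainable parameters. The sketch below only does the counting; the helper names are made up, and the flattened size of 6x6x64 assumes the 24x24 input is halved by each of the two stride-2 poolings with SAME padding.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


# 24x24 input, halved by each stride-2 SAME pooling: 24 -> 12 -> 6.
flat_dim = 6 * 6 * 64

layers = [
    ("conv1", conv_params(5, 5, 3, 64)),
    ("conv2", conv_params(5, 5, 64, 64)),
    ("local3", dense_params(flat_dim, 384)),
    ("local4", dense_params(384, 192)),
    ("logits", dense_params(192, 10)),
]
for name, count in layers:
    print(name, count)
total_params = sum(count for _, count in layers)
print("total", total_params)
```

The pooling and LRN layers contribute no parameters. Most of the total sits in local3, the first fully connected layer, which is also one of the two layers given a nonzero L2 weight loss.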

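Compute the CNN's loss. tf.nn.sparse_softmax_cross_entropy_with_logits combines the softmax and the cross entropy loss in one op. tf.reduce_mean averages the cross entropy, and tf.add_to_collection adds that cross entropy loss to the losses collection. tf.add_n sums everything in the losses collection to give the final loss, which consists of the cross entropy loss plus the L2 weight losses of the two fully connected layers. Pass the logits node and the label placeholder to the loss function to get the final loss.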
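For one sample, the combined op amounts to the negative log of the softmax probability assigned to the true class. A plain-Python sketch of that quantity follows; the function names are made up, and this illustrates the math rather than TensorFlow's implementation.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross entropy of one sample: -log(softmax(logits)[label]).

    Computed as log-sum-exp minus the true-class logit, which avoids
    overflow for large logits.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    """Average the per-sample losses over a batch."""
    losses = [sparse_softmax_cross_entropy(l, y)
              for l, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


uniform = [0.0] * 10
print(sparse_softmax_cross_entropy(uniform, 3))   # ln(10), about 2.303
```

A network that assigns equal scores to all 10 classes has a cross entropy of ln(10), about 2.3. A total loss well above that at the start of training would presumably come from the L2 weight losses added to the same collection.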

Use the Adam optimizer with a learning rate of 1e-3.
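For readers who have not seen Adam written out, here is one update step for a single scalar parameter, following the formulation in the original Adam paper. It is a sketch only: tf.train.AdamOptimizer arranges the bias correction slightly differently, and the function name adam_step and the toy objective are assumptions. The defaults for beta1, beta2, and eps match the optimizer's documented defaults.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimize f(x) = x^2 (gradient 2x) from x = 1.0; a toy problem.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    x, m, v = adam_step(x, 2 * x, m, v, t)
    print(t, x)
```

Because the update divides by the root of the squared-gradient average, each early step moves the parameter by roughly the learning rate regardless of the gradient's scale.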

tf.nn.in_top_k computes the top-k accuracy of the output. Here k is set to 1, that is, the accuracy of the highest-scoring class.
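The top-k test can be sketched without TensorFlow: a prediction counts as correct when fewer than k classes score strictly higher than the true class. The helper names and the sample scores below are made up for illustration.

```python
def in_top_k(scores, target, k):
    """True if fewer than k classes score strictly higher than the target."""
    higher = sum(1 for s in scores if s > scores[target])
    return higher < k


def top_k_accuracy(batch_scores, targets, k=1):
    """Fraction of samples whose target class is within the top k."""
    hits = [in_top_k(s, t, k) for s, t in zip(batch_scores, targets)]
    return sum(hits) / float(len(hits))


batch_scores = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.9], [0.4, 0.1, 0.8]]  # made-up logits
targets = [1, 2, 2]
print(top_k_accuracy(batch_scores, targets, k=1))
print(top_k_accuracy(batch_scores, targets, k=2))
```

The second sample misses at k=1 but counts at k=2, so accuracy can only stay the same or rise as k grows.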

Create a default session with tf.InteractiveSession and initialize all model parameters.

Start the thread queue for image augmentation, which uses 16 threads for speed.

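Training. In each step, session.run first evaluates images_train and labels_train to fetch a batch of training data, which is then fed to train_op and loss. Record the time of each step, and every 10 steps compute and display the current loss, the number of samples trained per second, and the time spent per batch, to monitor the whole run. On a GTX 1080 this trains about 1,800 samples per second; with batch_size 128 that is about 0.066 s per batch. The loss starts around 4.6 and falls to about 1.0 after 3,000 steps.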

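Evaluate the model's accuracy on the test set. The test set has 10,000 samples, fed in batch by batch with the same fixed batch_size. First compute how many batches are needed to cover all samples. In each step, session.run fetches a batch of images_test and labels_test, and then runs top_k_op to count how many samples in the batch the model predicts correctly at top 1. Summing these counts gives the number of correct predictions over the whole test set.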
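The batch arithmetic is worth spelling out, because 10,000 is not a multiple of 128. The sketch below reuses the variable names from this post; precision_at_1 is a hypothetical helper added for illustration.

```python
import math

num_examples = 10000
batch_size = 128

# Round up so that every test sample is covered at least once.
num_iter = int(math.ceil(num_examples / float(batch_size)))
total_sample_count = num_iter * batch_size
print(num_iter, total_sample_count)


def precision_at_1(correct_per_batch, batch_size):
    """Fraction correct, given the number of correct predictions per batch."""
    return sum(correct_per_batch) / float(len(correct_per_batch) * batch_size)
```

The loop runs 79 batches and therefore scores 10,112 samples, slightly more than the test set holds, so the accuracy is divided by the number of samples evaluated rather than by 10,000. Presumably a few test images are read twice as the input queue wraps around.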

Compute and print the accuracy.

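The accuracy is about 73%. Increasing max_steps should raise it gradually. With a large max_steps, training with SGD and learning rate decay gets close to 86%. L2 regularization and the LRN layers improve both the model's accuracy and its generalization.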

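Data augmentation creates several copies of each image, which gets more use out of every image and keeps the model from overfitting to image structure. It exploits a property of images: they carry a lot of redundant information, so they remain recognizable under various kinds of noise. A network that recognizes images correctly despite the noise generalizes better. In deep learning, accuracy generally keeps improving as long as enough samples are provided. Larger and more complex network models can reach higher accuracy, but need more data to train. In Alex Krizhevsky's cuda-convnet results on CIFAR-10, the lowest error rate without data augmentation was 17%; with data augmentation it fell to 11%.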

import time

import numpy as np
import tensorflow as tf

import cifar10
import cifar10_input

max_steps = 3000   # number of training steps (batches)
batch_size = 128
data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'

def variable_with_weight_loss(shape, stddev, wl):
    # Create a weight from a truncated normal and attach an L2 loss scaled by wl.
    var = tf.Variable(tf.truncated_normal(shape, stddev=stddev))
    if wl is not None:
        weight_loss = tf.multiply(tf.nn.l2_loss(var), wl, name='weight_loss')
        tf.add_to_collection('losses', weight_loss)
    return var

def loss(logits, labels):
    # Mean cross entropy plus every weight loss stored in the 'losses' collection.
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=labels, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)
    return tf.add_n(tf.get_collection('losses'), name='total_loss')

###

cifar10.maybe_download_and_extract()

# Training data with augmentation; test data with a center crop only.
images_train, labels_train = cifar10_input.distorted_inputs(
    data_dir=data_dir, batch_size=batch_size)
images_test, labels_test = cifar10_input.inputs(
    eval_data=True, data_dir=data_dir, batch_size=batch_size)
# images_train, labels_train = cifar10.distorted_inputs()
# images_test, labels_test = cifar10.inputs(eval_data=True)

# The batch dimension is fixed because the network definition uses it.
image_holder = tf.placeholder(tf.float32, [batch_size, 24, 24, 3])
label_holder = tf.placeholder(tf.int32, [batch_size])

# logits = inference(image_holder)

# conv1 -> pool1 -> norm1
weight1 = variable_with_weight_loss(shape=[5, 5, 3, 64], stddev=5e-2, wl=0.0)
kernel1 = tf.nn.conv2d(image_holder, weight1, [1, 1, 1, 1], padding='SAME')
bias1 = tf.Variable(tf.constant(0.0, shape=[64]))
conv1 = tf.nn.relu(tf.nn.bias_add(kernel1, bias1))
pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                       padding='SAME')
norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)

# conv2 -> norm2 -> pool2 (LRN before pooling this time)
weight2 = variable_with_weight_loss(shape=[5, 5, 64, 64], stddev=5e-2, wl=0.0)
kernel2 = tf.nn.conv2d(norm1, weight2, [1, 1, 1, 1], padding='SAME')
bias2 = tf.Variable(tf.constant(0.1, shape=[64]))
conv2 = tf.nn.relu(tf.nn.bias_add(kernel2, bias2))
norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                       padding='SAME')

# local3: flatten, then a fully connected layer with L2 weight loss
reshape = tf.reshape(pool2, [batch_size, -1])
dim = reshape.get_shape()[1].value
weight3 = variable_with_weight_loss(shape=[dim, 384], stddev=0.04, wl=0.004)
bias3 = tf.Variable(tf.constant(0.1, shape=[384]))
local3 = tf.nn.relu(tf.matmul(reshape, weight3) + bias3)

# local4: second fully connected layer
weight4 = variable_with_weight_loss(shape=[384, 192], stddev=0.04, wl=0.004)
bias4 = tf.Variable(tf.constant(0.1, shape=[192]))
local4 = tf.nn.relu(tf.matmul(local3, weight4) + bias4)

# logits: linear output, no softmax and no L2 weight loss
weight5 = variable_with_weight_loss(shape=[192, 10], stddev=1/192.0, wl=0.0)
bias5 = tf.Variable(tf.constant(0.0, shape=[10]))
logits = tf.add(tf.matmul(local4, weight5), bias5)

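# Note: this rebinds the name loss from the function to the loss tensor.
loss = loss(logits, label_holder)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # 0.72
top_k_op = tf.nn.in_top_k(logits, label_holder, 1)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Start the input queue threads that feed the augmentation pipeline.
tf.train.start_queue_runners()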

###

for step in range(max_steps):
    start_time = time.time()
    image_batch, label_batch = sess.run([images_train, labels_train])
    _, loss_value = sess.run([train_op, loss],
                             feed_dict={image_holder: image_batch,
                                        label_holder: label_batch})
    duration = time.time() - start_time

    if step % 10 == 0:
        examples_per_sec = batch_size / duration
        sec_per_batch = float(duration)
        format_str = 'step %d, loss = %.2f (%.1f examples/sec; %.3f sec/batch)'
        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))

###

num_examples = 10000
import math
# Float division so the batch count rounds up under Python 2 as well.
num_iter = int(math.ceil(num_examples / float(batch_size)))
true_count = 0
total_sample_count = num_iter * batch_size
step = 0
while step < num_iter:
    image_batch, label_batch = sess.run([images_test, labels_test])
    predictions = sess.run([top_k_op],
                           feed_dict={image_holder: image_batch,
                                      label_holder: label_batch})
    true_count += np.sum(predictions)
    step += 1

precision = float(true_count) / total_sample_count
print('precision @ 1 = %.3f' % precision)

References:

《TensorFlow實戰》
