TensorFlow Tutorial #01 - Simple Linear Model

01-29

by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube

Chinese translation by thrillerist / GitHub

If you repost this article, please include a link to it.

_______________________________________________________________________________________

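Introduction

This tutorial demonstrates the basic workflow of using TensorFlow with a simple linear model. After loading the MNIST data-set of hand-written digit images, we define and optimize a simple mathematical model in TensorFlow. The results are then plotted and discussed.

You should be familiar with basic linear algebra, Python, and the Jupyter Notebook editor. A basic understanding of machine learning and classification also helps.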

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix

This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:

tf.__version__

0.12.0-rc1

Load Data

The MNIST data-set is about 12 MB and will be downloaded automatically if it is not located in the given path.

from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets("data/MNIST/", one_hot=True)

Extracting data/MNIST/train-images-idx3-ubyte.gz
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST data-set has now been loaded. It consists of 70,000 images and associated labels (i.e. the class of each image). The data-set is split into 3 mutually exclusive sub-sets. We only use the training-set and the test-set in this tutorial.

print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))

Size of:
- Training-set: 55000
- Test-set: 10000
- Validation-set: 5000

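One-Hot Encoding

The data-set has been loaded with so-called One-Hot encoding. This means the labels have been converted from a single number to a vector whose length equals the number of possible classes. All elements of the vector are zero except for the $i$'th element, which is one and means that the class is $i$. For example, the One-Hot encoded labels for the first 5 images in the test-set are: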

data.test.labels[0:5, :]

array([[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])

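We also need the classes as single numbers for various comparisons and performance measures, so we convert the One-Hot encoded vectors to a single number by taking the index of the highest element. Note that the word 'class' is a keyword in Python, so we use the name 'cls' instead.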

data.test.cls = np.array([label.argmax() for label in data.test.labels])

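Now we can see the class for the first five images in the test-set. Compare these to the One-Hot encoded vectors above. For example, the class for the first image is 7, which corresponds to a One-Hot encoded vector where all elements are zero except for the element with index 7 (counting from zero).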

data.test.cls[0:5]

array([7, 2, 1, 0, 4])
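
To make the round trip between class numbers and One-Hot vectors concrete, here is a minimal sketch in plain Python (no TensorFlow or NumPy). The helper names one_hot and argmax are illustrative only and are not part of the tutorial's code; the class numbers are the five test-set classes shown above.

```python
def one_hot(cls, num_classes):
    """Return a list of length num_classes with 1.0 at index cls, 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[cls] = 1.0
    return vec


def argmax(vec):
    """Return the index of the largest element (the first one if there are ties)."""
    return max(range(len(vec)), key=lambda i: vec[i])


# The classes of the first five test-set images, as printed above.
classes = [7, 2, 1, 0, 4]

# Encode each class number as a One-Hot vector of length 10 ...
labels = [one_hot(c, 10) for c in classes]

# ... and decode it again by taking the index of the largest element.
decoded = [argmax(v) for v in labels]

print(labels[0])
print(decoded)
```

Decoding with argmax recovers exactly the class numbers that were encoded, which is what the list comprehension over data.test.labels does for the whole test-set.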

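Data dimensions

The data dimensions are used in several places in the source-code below. In computer programming it is generally best to use variables and constants rather than hard-coding specific numbers every time they are needed. This means the numbers only have to be changed in one place. Ideally these would be inferred from the data that has been read, but here we just write the numbers.

# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of classes, one class for each of 10 digits.
num_classes = 10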
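
As a rough sketch of what inferring these numbers from the data could look like, the hypothetical helper below derives them from one flattened image and one One-Hot label. It assumes square images and uses plain Python lists as stand-ins for the real arrays; infer_dimensions is not part of the tutorial.

```python
import math


def infer_dimensions(images, labels):
    """Derive image and label dimensions from the data itself.

    Assumes every image is a flattened square image and every label
    is a One-Hot vector. This helper is hypothetical, for illustration.
    """
    img_size_flat = len(images[0])
    img_size = int(round(math.sqrt(img_size_flat)))
    if img_size * img_size != img_size_flat:
        raise ValueError("images are not square")
    num_classes = len(labels[0])
    return img_size, img_size_flat, (img_size, img_size), num_classes


# Stand-in data with the same per-item lengths as MNIST.
fake_images = [[0.0] * 784 for _ in range(3)]
fake_labels = [[0.0] * 10 for _ in range(3)]

img_size, img_size_flat, img_shape, num_classes = infer_dimensions(
    fake_images, fake_labels)
print(img_size, img_size_flat, img_shape, num_classes)
```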

Helper-function for plotting images

Function used to plot 9 images in a 3x3 grid, and to write the true and predicted classes below each image.

def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

Plot a few images to see if the data is correct

# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

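TensorFlow Graph

The entire purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than if the same calculations were performed directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computational graph that must be executed, while NumPy only knows the computation of a single mathematical operation at a time.

TensorFlow can also automatically calculate the gradients that are needed to optimize the variables of the graph so as to make the model perform better. This is because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be calculated using the chain-rule for derivatives.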
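
As a small illustration of the chain-rule idea, the sketch below builds a two-node "graph" by hand in plain Python and checks the analytic gradient against a finite-difference estimate. The functions forward and gradients and the toy expression y = (w*x + b)^2 are assumptions made for this example; TensorFlow derives such gradients automatically for its own graphs.

```python
def forward(w, b, x):
    """Evaluate a tiny two-node graph: z = w*x + b, then y = z*z."""
    z = w * x + b
    y = z * z
    return z, y


def gradients(w, b, x):
    """Gradients of y with respect to w and b via the chain rule."""
    z, _ = forward(w, b, x)
    dy_dz = 2.0 * z      # derivative of the square node
    dz_dw = x            # derivative of the affine node w.r.t. w
    dz_db = 1.0          # derivative of the affine node w.r.t. b
    return dy_dz * dz_dw, dy_dz * dz_db


w, b, x = 0.5, -1.0, 3.0
grad_w, grad_b = gradients(w, b, x)

# Finite-difference check of dy/dw as an independent estimate.
eps = 1e-6
numeric_w = (forward(w + eps, b, x)[1] - forward(w - eps, b, x)[1]) / (2 * eps)

print(grad_w, grad_b, numeric_w)
```

Each node only needs to know its own local derivative; multiplying the local derivatives along the path from output to variable gives the gradient of the whole expression.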

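TensorFlow can also take advantage of multi-core CPUs as well as GPUs. Google has also built special chips for TensorFlow, called TPUs (Tensor Processing Units), which are reported to be even faster than GPUs.

A TensorFlow graph consists of the following parts, which are detailed below:

Placeholder variables used to change the input to the graph.
Model variables that are going to be optimized so as to make the model perform better.
The model, which is essentially just a mathematical function that calculates some output given the input in the placeholder variables and the model variables.
A cost measure that can be used to guide the optimization of the variables.
An optimization method which updates the variables of the model.

In addition, the TensorFlow graph may also contain various debugging statements, e.g. for logging data to be displayed using TensorBoard, which is not covered in this tutorial.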

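Placeholder variables

Placeholder variables serve as the input to the graph, and we may change them each time we execute the graph. We call this feeding the placeholder variables, and it is demonstrated further below.

First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. This is a so-called tensor, which just means that it is a multi-dimensional vector or matrix. The data-type is set to float32 and the shape is set to [None, img_size_flat], where None means that the tensor may hold an arbitrary number of images, each image being a vector of length img_size_flat.

x = tf.placeholder(tf.float32, [None, img_size_flat])

Next we have the placeholder variable for the true labels associated with the images that were input in the placeholder variable x. The shape of this placeholder variable is [None, num_classes], which means it may hold an arbitrary number of labels, and each label is a vector of length num_classes, which is 10 in this case.

y_true = tf.placeholder(tf.float32, [None, num_classes])

Finally we have the placeholder variable for the true class of each image in the placeholder variable x. These are integers, and the dimensionality of this placeholder variable is set to [None], which means the placeholder variable is a one-dimensional vector of arbitrary length.

y_true_cls = tf.placeholder(tf.int64, [None])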

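Variables to be optimized

Apart from the placeholder variables that were defined above and which feed input data into the model, there are also some model variables that must be changed by TensorFlow so as to make the model perform better on the training data.

The first variable that must be optimized is called weights. It is defined here as a TensorFlow variable that is initialized with zeros and whose shape is [img_size_flat, num_classes], so it is a 2-dimensional tensor (or matrix) with img_size_flat rows and num_classes columns.

weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))

The second variable that must be optimized is called biases and is defined as a 1-dimensional tensor (or vector) of length num_classes.

biases = tf.Variable(tf.zeros([num_classes]))

Model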

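This simple mathematical model multiplies the images in the placeholder variable x with the weights and then adds the biases.

The result is a matrix of shape [num_images, num_classes]. Because x has shape [num_images, img_size_flat] and weights has shape [img_size_flat, num_classes], the product of the two matrices has shape [num_images, num_classes], and the biases vector is then added to each row of that matrix.

logits = tf.matmul(x, weights) + biases

Now logits is a matrix with num_images rows and num_classes columns, where the element in the $i$'th row and $j$'th column is an estimate of how likely the $i$'th input image is to be of the $j$'th class.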
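
The shape bookkeeping can be checked on a toy example. The sketch below uses plain Python lists with 2 images, 3 pixels per image and 4 classes; these sizes, the numbers, and the helpers matmul and add_bias are assumptions chosen only to keep the arithmetic readable.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add_bias(matrix, bias):
    """Add the bias vector to every row of the matrix."""
    return [[value + bias[j] for j, value in enumerate(row)] for row in matrix]


# 2 images with 3 "pixels" each: shape [num_images, img_size_flat] = [2, 3].
x_toy = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 1.0]]

# Shape [img_size_flat, num_classes] = [3, 4].
weights_toy = [[0.1, 0.2, 0.3, 0.4],
               [0.5, 0.6, 0.7, 0.8],
               [1.0, 0.0, -1.0, 0.5]]

# One bias per class: shape [num_classes] = [4].
biases_toy = [0.0, 1.0, 0.0, -1.0]

logits_toy = add_bias(matmul(x_toy, weights_toy), biases_toy)

# One row per image, one column per class: shape [2, 4].
print(len(logits_toy), len(logits_toy[0]))
print(logits_toy)
```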

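However, these estimates are rough and difficult to interpret because the numbers may be very small or very large, so we want to normalize them so that each row of the logits matrix sums to one and each element lies between zero and one. This is calculated using the so-called softmax function, and the result is stored in y_pred.

y_pred = tf.nn.softmax(logits)

The predicted class can be calculated from the y_pred matrix by taking the index of the largest element in each row.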

y_pred_cls = tf.argmax(y_pred, dimension=1)
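
For readers who want to see the normalization spelled out, here is a minimal softmax for a single row of logits in plain Python. The example logits are made up, and subtracting the row maximum is a common numerical-stability trick added here as an assumption; it is not a statement about how tf.nn.softmax is implemented.

```python
import math


def softmax(row):
    """Normalize one row of logits so its elements are positive and sum to 1."""
    m = max(row)  # subtracting the maximum does not change the result
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


logits_row = [2.0, 1.0, 0.1]   # made-up logits for one image and 3 classes
probs = softmax(logits_row)

# The predicted class is the index of the largest probability.
pred_cls = max(range(len(probs)), key=lambda i: probs[i])

print(probs)
print(sum(probs), pred_cls)
```

Because softmax preserves the ordering of the elements, taking the argmax of the logits or of the softmax output yields the same predicted class.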

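Cost-function to be optimized

To make the model better at classifying the input images, we must somehow change the variables for weights and biases. To do this we first need to know how well the model currently performs by comparing the predicted output of the model, y_pred, to the desired output, y_true.

The cross-entropy is a performance measure used in classification. It is a continuous function that is never negative, and it equals zero if the predicted output of the model exactly matches the desired output. The goal of optimization is therefore to minimize the cross-entropy, bringing it as close to zero as possible by changing the weights and biases of the model.

TensorFlow has a built-in function for calculating the cross-entropy. Note that it takes the values of the logits as input, because it also calculates the softmax internally.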

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_true)

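We have now calculated the cross-entropy for each of the image classifications, so we have a measure of how well the model performs on each image individually. But in order to use the cross-entropy to guide the optimization of the model's variables we need a single scalar value, so we simply take the average of the cross-entropy over all the image classifications.

cost = tf.reduce_mean(cross_entropy)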
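
The following sketch works through the same two steps by hand: a cross-entropy value per image, then their mean as the scalar cost. It operates on already-normalized probabilities rather than logits, and the three probability vectors are made up for illustration.

```python
import math


def cross_entropy(probs, one_hot):
    """Cross-entropy between predicted probabilities and a One-Hot label."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


label = [0.0, 1.0, 0.0]   # the true class is 1

# Three made-up predictions for the same label.
perfect = cross_entropy([0.0, 1.0, 0.0], label)   # prediction matches exactly
unsure = cross_entropy([0.2, 0.5, 0.3], label)    # right class, low confidence
wrong = cross_entropy([0.7, 0.1, 0.2], label)     # most weight on a wrong class

# The scalar cost is the mean over all images in the batch.
per_image = [perfect, unsure, wrong]
cost_value = sum(per_image) / len(per_image)

print(per_image)
print(cost_value)
```

The value is zero only for the perfect prediction and grows as less probability is assigned to the true class, which is why minimizing the mean pushes the model toward correct, confident predictions.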

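Optimization method

Now that we have a cost measure that must be minimized, we can create an optimizer. In this case it is the basic form of Gradient Descent, with the step-size set to 0.5.

Note that optimization is not performed at this point. In fact, nothing is calculated at all; we just add the optimizer-object to the TensorFlow graph for later execution.

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)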
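
The update rule behind basic gradient descent is simply "move each variable a small step against its gradient". The sketch below applies it to a one-variable toy cost, (w - 3)^2, which is an assumption chosen so the minimum is known to be at w = 3. The learning rate of 0.1 belongs to this toy problem and is unrelated to the 0.5 used in the tutorial.

```python
def cost(w):
    """Toy cost with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate, num_iterations):
    """Repeatedly step against the gradient, recording the cost after each step."""
    history = [cost(w)]
    for _ in range(num_iterations):
        w = w - learning_rate * gradient(w)
        history.append(cost(w))
    return w, history


w_final, history = gradient_descent(w=0.0, learning_rate=0.1, num_iterations=50)

print(w_final)
print(history[0], history[-1])
```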

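Performance measures

We need a few more performance measures to display the progress to the user.

This is a vector of booleans indicating whether the predicted class equals the true class of each image.

correct_prediction = tf.equal(y_pred_cls, y_true_cls)

The following calculates the classification accuracy by first type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then calculating the average of these numbers.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))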
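
The same three steps (compare, cast, average) look like this in plain Python. The predicted classes are made up for illustration; the true classes are the first five test-set classes.

```python
pred_cls = [7, 2, 1, 0, 9]   # made-up predictions; the last one is wrong
true_cls = [7, 2, 1, 0, 4]   # true classes of the first five test images

# Step 1: a list of booleans, True where the prediction is correct.
correct = [p == t for p, t in zip(pred_cls, true_cls)]

# Step 2: cast booleans to floats, so False -> 0.0 and True -> 1.0.
as_float = [float(c) for c in correct]

# Step 3: the mean of those numbers is the fraction of correct predictions.
acc = sum(as_float) / len(as_float)

print(correct)
print(acc)
```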

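TensorFlow Run

Create TensorFlow session

Once the TensorFlow graph has been created, we have to create a TensorFlow session, which is used to execute the graph.

session = tf.Session()

Initialize variables

The variables for weights and biases must be initialized before we start optimizing them.

session.run(tf.global_variables_initializer())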

Helper-function to perform optimization iterations

There are 55,000 images in the training-set. It takes a long time to calculate the gradient of the model using all of these images. We therefore use Stochastic Gradient Descent, which only uses a small batch of images in each iteration of the optimizer.

batch_size = 100

Function for performing a number of optimization iterations so as to gradually improve the weights and biases of the model. In each iteration, a new batch of data is selected from the training-set, and TensorFlow then executes the optimizer using those training samples.

def optimize(num_iterations):
    for i in range(num_iterations):
        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = data.train.next_batch(batch_size)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        # Note that the placeholder for y_true_cls is not set
        # because it is not used during training.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)
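
The tutorial relies on data.train.next_batch to supply the batches. The sketch below is not that function's implementation; it is a hypothetical stand-alone generator, iterate_batches, showing one common way to draw mini-batches: shuffle the example indices, walk through them in chunks, and reshuffle when a full chunk is no longer available.

```python
import random


def iterate_batches(num_examples, batch_size, num_iterations, seed=0):
    """Yield lists of example indices, one list per optimization iteration.

    Hypothetical helper for illustration: indices are shuffled once, consumed
    in consecutive chunks, and reshuffled when fewer than batch_size remain.
    """
    rng = random.Random(seed)
    order = list(range(num_examples))
    rng.shuffle(order)
    pos = 0
    for _ in range(num_iterations):
        if pos + batch_size > num_examples:
            rng.shuffle(order)
            pos = 0
        yield order[pos:pos + batch_size]
        pos += batch_size


# Tiny example: 10 training examples, batches of 4, 5 iterations.
batches = list(iterate_batches(num_examples=10, batch_size=4, num_iterations=5))

for batch in batches:
    print(batch)
```

With batch_size = 100, running 1000 iterations processes 100,000 image samples, which is roughly 1.8 passes over the 55,000 training images.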

Helper-functions to show performance

Dict with the test-set data to be used as input to the TensorFlow graph. Note that we must use the correct names for the placeholder variables in the TensorFlow graph.

feed_dict_test = {x: data.test.images,
                  y_true: data.test.labels,
                  y_true_cls: data.test.cls}

Function for printing the classification accuracy on the test-set.

def print_accuracy():
    # Use TensorFlow to compute the accuracy.
    acc = session.run(accuracy, feed_dict=feed_dict_test)

    # Print the accuracy.
    print("Accuracy on test-set: {0:.1%}".format(acc))

Function for printing and plotting the confusion matrix using scikit-learn.

def print_confusion_matrix():
    # Get the true classifications for the test-set.
    cls_true = data.test.cls

    # Get the predicted classifications for the test-set.
    cls_pred = session.run(y_pred_cls, feed_dict=feed_dict_test)

    # Get the confusion matrix using sklearn.
    cm = confusion_matrix(y_true=cls_true,
                          y_pred=cls_pred)

    # Print the confusion matrix as text.
    print(cm)

    # Plot the confusion matrix as an image.
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)

    # Make various adjustments to the plot.
    plt.tight_layout()
    plt.colorbar()
    tick_marks = np.arange(num_classes)
    plt.xticks(tick_marks, range(num_classes))
    plt.yticks(tick_marks, range(num_classes))
    plt.xlabel('Predicted')
    plt.ylabel('True')

Function for plotting examples of images from the test-set that have been mis-classified.

def plot_example_errors():
    # Use TensorFlow to get a list of boolean values
    # whether each test-image has been correctly classified,
    # and a list for the predicted class of each image.
    correct, cls_pred = session.run([correct_prediction, y_pred_cls],
                                    feed_dict=feed_dict_test)

    # Negate the boolean array.
    incorrect = (correct == False)

    # Get the images from the test-set that have been
    # incorrectly classified.
    images = data.test.images[incorrect]

    # Get the predicted classes for those images.
    cls_pred = cls_pred[incorrect]

    # Get the true classes for those images.
    cls_true = data.test.cls[incorrect]

    # Plot the first 9 images.
    plot_images(images=images[0:9],
                cls_true=cls_true[0:9],
                cls_pred=cls_pred[0:9])

Helper-function to plot the model weights

Function for plotting the weights of the model. 10 images are plotted, one for each digit that the model is trained to recognize.

def plot_weights():
    # Get the values for the weights from the TensorFlow variable.
    w = session.run(weights)

    # Get the lowest and highest values for the weights.
    # This is used to correct the colour intensity across
    # the images so they can be compared with each other.
    w_min = np.min(w)
    w_max = np.max(w)

    # Create figure with 3x4 sub-plots,
    # where the last 2 sub-plots are unused.
    fig, axes = plt.subplots(3, 4)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Only use the weights for the first 10 sub-plots.
        if i<10:
            # Get the weights for the i'th digit and reshape it.
            # Note that w.shape == (img_size_flat, 10)
            image = w[:, i].reshape(img_shape)

            # Set the label for the sub-plot.
            ax.set_xlabel("Weights: {0}".format(i))

            # Plot the image.
            ax.imshow(image, vmin=w_min, vmax=w_max, cmap='seismic')

        # Remove ticks from each sub-plot.
        ax.set_xticks([])
        ax.set_yticks([])

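Performance before any optimization

The accuracy on the test-set is 9.8%. This is because the model has only been initialized and not optimized at all, so it always predicts that the image shows a zero digit, as demonstrated in the plot below, and it so happens that 9.8% of the images in the test-set are zero digits.

print_accuracy()

Accuracy on test-set: 9.8%

plot_example_errors()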
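
Why does an untrained model always answer "zero"? With all weights and biases at zero, every logit is zero, softmax gives each class the same probability, and an argmax that returns the first index on ties picks class 0. The plain-Python sketch below reproduces that reasoning; the tie-breaking behaviour is an assumption that is consistent with the 9.8% figure above.

```python
import math

num_classes = 10

# With zero weights and zero biases, x * weights + biases is zero for any image.
logits = [0.0] * num_classes

# Softmax of equal logits is the uniform distribution.
exps = [math.exp(v) for v in logits]
probs = [e / sum(exps) for e in exps]

# Python's max returns the first maximal element, so ties resolve to index 0.
pred_cls = max(range(num_classes), key=lambda i: probs[i])

print(probs)
print(pred_cls)
```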

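Performance after 1 optimization iteration

Already after a single optimization iteration, the model has increased its accuracy on the test-set from 9.8% to 40.7%. This means that it mis-classifies the images about 6 out of 10 times, as demonstrated on a few examples below.

optimize(num_iterations=1)

print_accuracy()

Accuracy on test-set: 40.7%

plot_example_errors()

The weights can also be plotted, as shown below. Positive weights are red and negative weights are blue. These weights can be intuitively understood as image-filters.

For example, the weights used to determine if an image shows a zero-digit have a positive reaction (red) to an image of a circle, and a negative reaction (blue) to images with content in the centre of the circle.

Similarly, the weights used to determine if an image shows a one-digit react positively (red) to a vertical line in the centre of the image, and react negatively (blue) to images with content surrounding that line.

Note that the weights mostly look like the digits they are supposed to recognize. This is because only one optimization iteration has been performed, so the weights have only been trained on 100 images. After training on several thousand images, the weights become more difficult to interpret because they have to recognize many variations of how digits can be written.

plot_weights()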

Performance after 10 optimization iterations

# We have already performed 1 iteration.
optimize(num_iterations=9)

print_accuracy()

Accuracy on test-set: 78.2%

plot_example_errors()

plot_weights()

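Performance after 1000 optimization iterations

After 1000 optimization iterations, the model only mis-classifies about one in ten images. As demonstrated below, some of the mis-classifications are justified because the images are very hard to determine with certainty even for humans, while others are quite obvious and should have been classified correctly by a good model. But this simple model cannot reach much better performance, and more complex models are therefore needed.

# We have already performed 10 iterations.
optimize(num_iterations=990)

print_accuracy()

Accuracy on test-set: 91.7%

plot_example_errors()

The model has now been trained for 1000 optimization iterations, with each iteration using 100 images from the training-set. Because of the great variety of the images, the weights have now become difficult to interpret, and we may doubt whether the model truly understands how digits are composed from lines, or whether it has just memorized many different variations of pixels.

plot_weights()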

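We can also print and plot the so-called confusion matrix, which lets us see more details about the mis-classifications. For example, it shows that images actually depicting a 5 have sometimes been mis-classified as every other possible digit, but mostly as 3, 6 or 8.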

print_confusion_matrix()

[[ 957 0 3 2 0 5 11 1 1 0]
[ 0 1108 2 2 1 2 4 2 14 0]
[ 4 9 914 19 15 5 13 14 35 4]
[ 1 0 16 928 0 28 2 14 13 8]
[ 1 1 3 2 939 0 10 2 6 18]
[ 10 3 3 33 10 784 17 6 19 7]
[ 8 3 3 2 11 14 915 1 1 0]
[ 3 9 21 9 7 1 0 959 2 17]
[ 8 8 8 38 11 40 14 18 825 4]
[ 11 7 1 13 75 13 1 39 4 845]]
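
The printed matrix can be read programmatically as well. The sketch below copies the numbers above into a plain Python list (rows are true classes, columns are predicted classes, following scikit-learn's convention) and recovers the overall accuracy and the most frequent confusions for one digit. The helper top_confusions is illustrative and not part of the tutorial.

```python
# Confusion matrix copied from the output above.
confusion = [
    [957, 0, 3, 2, 0, 5, 11, 1, 1, 0],
    [0, 1108, 2, 2, 1, 2, 4, 2, 14, 0],
    [4, 9, 914, 19, 15, 5, 13, 14, 35, 4],
    [1, 0, 16, 928, 0, 28, 2, 14, 13, 8],
    [1, 1, 3, 2, 939, 0, 10, 2, 6, 18],
    [10, 3, 3, 33, 10, 784, 17, 6, 19, 7],
    [8, 3, 3, 2, 11, 14, 915, 1, 1, 0],
    [3, 9, 21, 9, 7, 1, 0, 959, 2, 17],
    [8, 8, 8, 38, 11, 40, 14, 18, 825, 4],
    [11, 7, 1, 13, 75, 13, 1, 39, 4, 845],
]

# Correct classifications lie on the diagonal.
total = sum(sum(row) for row in confusion)
num_correct = sum(confusion[i][i] for i in range(len(confusion)))
overall_accuracy = num_correct / total


def top_confusions(true_digit, k=3):
    """Return the k digits that true_digit is most often mistaken for."""
    row = confusion[true_digit]
    others = [d for d in range(len(row)) if d != true_digit]
    return sorted(others, key=lambda d: row[d], reverse=True)[:k]


print(total, num_correct, overall_accuracy)
print(top_confusions(5))
```

The diagonal sums to 9174 out of 10,000 test images, which agrees with the 91.7% accuracy reported above, and the most frequent wrong predictions for a true 5 are 3, 8 and 6.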

We are now done using TensorFlow, so we close the session to release its resources.

# This has been commented out in case you want to modify and experiment
# with the Notebook without having to restart it.
# session.close()

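Exercises

These are a few suggestions for exercises that may help improve your skills with TensorFlow. Hands-on experience is important for learning how to use TensorFlow properly.

You may want to back up this Notebook before making any changes.

Change the learning-rate for the optimizer.
Change the optimizer to e.g. AdagradOptimizer or AdamOptimizer.
Change the batch-size to e.g. 1 or 1000.
How do these changes affect the performance?
Do you think these changes will have the same effect (if any) on other classification problems and mathematical models?
Do you get the exact same results if you run the Notebook multiple times without changing any parameters? Why or why not?
Change the function plot_example_errors() so it also prints the logits and y_pred values for the mis-classified examples.
Use sparse_softmax_cross_entropy_with_logits instead of softmax_cross_entropy_with_logits. This may require several changes to multiple places in the source-code. Discuss the advantages and disadvantages of using the two methods.
Remake the program yourself without looking at the source-code.
Explain to a friend how the program works.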

