A First Try at TensorFlow (2): Generating Handwritten Digits with a GAN

01-29

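This is the second installment of my hands-on TensorFlow introduction. Today I wrote a GAN myself and used it to generate handwritten digits. I had read a fair amount of GAN source code before and found the styles all fairly similar, so today I implemented one in the code style I like best.

The dataset is the well-known MNIST. Each image has shape [28, 28, 1]. The training set has 60000 images and the test set has 10000, so 70000 images in total are available for training the GAN.

The GAN variant is DCGAN (deep convolutional GAN), combined with the condition from CGAN: a condition constrains the content of the images the GAN generates.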

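The IDE is GVim (that is, Vim on Windows).

My network structure is shown in the figure below:

(Forgive my laziness: the network diagram is hand-drawn.)

The code is split into 4 parts: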

read_data
ops
model
train

The layer types used are:

conv (convolutional layer)
deconv (deconvolution layer)
linear (linear layer)
batch_norm (batch normalization layer)
lrelu/relu/sigmoid (nonlinear activation layers)

1. Data preprocessing and loading

import os
import numpy as np
import tensorflow as tf

def read_data():
    # the MNIST files live in ./data/mnist
    data_dir = os.path.join("data", "mnist")

    # read training data
    # the image files start with a 16-byte header, the label files with an 8-byte header
    with open(os.path.join(data_dir, "train-images.idx3-ubyte"), "rb") as fd:
        loaded = np.fromfile(file = fd, dtype = np.uint8)
    trainX = loaded[16:].reshape((60000, 28, 28, 1)).astype(float)

    with open(os.path.join(data_dir, "train-labels.idx1-ubyte"), "rb") as fd:
        loaded = np.fromfile(file = fd, dtype = np.uint8)
    trainY = loaded[8:].reshape((60000)).astype(float)

    # read test data
    with open(os.path.join(data_dir, "t10k-images.idx3-ubyte"), "rb") as fd:
        loaded = np.fromfile(file = fd, dtype = np.uint8)
    testX = loaded[16:].reshape((10000, 28, 28, 1)).astype(float)

    with open(os.path.join(data_dir, "t10k-labels.idx1-ubyte"), "rb") as fd:
        loaded = np.fromfile(file = fd, dtype = np.uint8)
    testY = loaded[8:].reshape((10000)).astype(float)

    # merge both sets into one 70000-sample training set
    X = np.concatenate((trainX, testX), axis = 0)
    y = np.concatenate((trainY, testY), axis = 0)

    print(X[:2])
    # set the random seed: reseeding before each shuffle gives both arrays the same order
    seed = 233
    np.random.seed(seed)
    np.random.shuffle(X)
    np.random.seed(seed)
    np.random.shuffle(y)

    # scale pixel values from 0-255 to 0-1
    return X/255, y

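First, store the downloaded MNIST files in the mnist folder inside the data folder under the current directory. The code reads the training and test sets and merges them into a single training set of 70000 samples. It then shuffles with NumPy: setting the same seed before each shuffle puts the two arrays into the same random order. Finally, X is scaled to the range 0 to 1 (the raw X values are integers from 0 to 255), and the labels y form a vector of shape [70000].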
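The seed trick can be checked on toy arrays. This is a minimal sketch, assuming NumPy's legacy global random API (the same one read_data uses). The helper name shuffle_in_unison and the toy data are made up for illustration.

```python
import numpy as np

def shuffle_in_unison(X, y, seed=233):
    """Shuffle X and y in place so that row i of X still matches y[i]."""
    # Reseeding before each shuffle replays the same sequence of random
    # swaps, which lines up only because both arrays have the same length.
    np.random.seed(seed)
    np.random.shuffle(X)
    np.random.seed(seed)
    np.random.shuffle(y)
    return X, y

# Toy data: each row of X is filled with its own label, so pairs are easy to check.
y = np.arange(10, dtype=float)
X = np.repeat(y[:, None], 3, axis=1)

X, y = shuffle_in_unison(X, y)
pairs_intact = bool(np.all(X[:, 0] == y))
print(pairs_intact)
```

An alternative that avoids reseeding is to draw one index array with np.random.permutation(len(X)) and index both arrays with it.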

2. Implementing the layers

import tensorflow as tf
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def linear_layer(value, output_dim, name = "linear_connected"):
    with tf.variable_scope(name):
        try:
            weights = tf.get_variable("weights",
                [int(value.get_shape()[1]), output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable("biases",
                [output_dim], initializer = tf.constant_initializer(0.0))
        except ValueError:
            # the variables already exist: switch this scope to reuse and fetch them
            tf.get_variable_scope().reuse_variables()
            weights = tf.get_variable("weights",
                [int(value.get_shape()[1]), output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable("biases",
                [output_dim], initializer = tf.constant_initializer(0.0))
        return tf.matmul(value, weights) + biases

def conv2d(value, output_dim, k_h = 5, k_w = 5, strides = [1,1,1,1], name = "conv2d"):
    with tf.variable_scope(name):
        try:
            weights = tf.get_variable("weights",
                [k_h, k_w, int(value.get_shape()[-1]), output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable("biases",
                [output_dim], initializer = tf.constant_initializer(0.0))
        except ValueError:
            tf.get_variable_scope().reuse_variables()
            weights = tf.get_variable("weights",
                [k_h, k_w, int(value.get_shape()[-1]), output_dim],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable("biases",
                [output_dim], initializer = tf.constant_initializer(0.0))
        conv = tf.nn.conv2d(value, weights, strides = strides, padding = "SAME")
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
        return conv

def deconv2d(value, output_shape, k_h = 5, k_w = 5, strides = [1,1,1,1], name = "deconv2d"):
    with tf.variable_scope(name):
        try:
            # filter layout for conv2d_transpose: [height, width, out_channels, in_channels]
            weights = tf.get_variable("weights",
                [k_h, k_w, output_shape[-1], int(value.get_shape()[-1])],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable("biases",
                [output_shape[-1]], initializer = tf.constant_initializer(0.0))
        except ValueError:
            tf.get_variable_scope().reuse_variables()
            weights = tf.get_variable("weights",
                [k_h, k_w, output_shape[-1], int(value.get_shape()[-1])],
                initializer = tf.truncated_normal_initializer(stddev = 0.02))
            biases = tf.get_variable("biases",
                [output_shape[-1]], initializer = tf.constant_initializer(0.0))
        deconv = tf.nn.conv2d_transpose(value, weights, output_shape, strides = strides)
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())
        return deconv

def conv_cond_concat(value, cond, name = "concat"):
    value_shapes = value.get_shape().as_list()
    cond_shapes = cond.get_shape().as_list()

    with tf.variable_scope(name):
        # broadcast cond over the spatial grid, then append it along the channel axis
        return tf.concat([value, cond * tf.ones(value_shapes[0:3] + cond_shapes[3:])], 3, name = name)

def batch_norm_layer(value, is_train = True, name = "batch_norm"):
    with tf.variable_scope(name) as scope:
        if is_train:
            return batch_norm(value, decay = 0.9, epsilon = 1e-5, scale = True,
                              is_training = is_train, updates_collections = None, scope = scope)
        else:
            return batch_norm(value, decay = 0.9, epsilon = 1e-5, scale = True,
                              is_training = is_train, reuse = True,
                              updates_collections = None, scope = scope)

def lrelu(x, leak = 0.2, name = "lrelu"):
    with tf.variable_scope(name):
        return tf.maximum(x, x*leak, name = name)

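The linear, conv, and bn layers are the same ones used in the earlier CNN post. Here they additionally carry the try/except pattern that guards against the ValueError raised when a variable already exists.

The deconv layer is a deconvolution layer, also called a transposed convolution layer. It performs the operation a convolutional layer applies during backpropagation, so anyone familiar with how backpropagation works in a convolutional network will find it easy to understand. Given the input and output sizes, the filter, and the strides, you can call the function TensorFlow already wraps for it (tf.nn.conv2d_transpose).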
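The size bookkeeping for these layers can be sketched in plain Python. This assumes SAME padding, as in the conv2d layer above. The helper names are made up for illustration and are not TensorFlow functions. The sketch also hints at why the deconv layer takes an explicit output shape: more than one output size maps back to the same input size.

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a SAME-padded convolution."""
    return math.ceil(size / stride)

def deconv_same_candidates(size, stride):
    """Output sizes that a SAME-padded forward convolution with this stride
    would map back to `size`. A transposed convolution may produce any of them."""
    return [o for o in range(1, size * stride + 1)
            if conv_same_out(o, stride) == size]

# Discriminator: two stride-2 convolutions shrink the image 28 -> 14 -> 7.
d_sizes = [28]
for _ in range(2):
    d_sizes.append(conv_same_out(d_sizes[-1], 2))
print(d_sizes)

# Generator: each stride-2 deconv has two valid output sizes,
# so the wanted one (14, then 28) has to be stated explicitly.
print(deconv_same_candidates(7, 2))
print(deconv_same_candidates(14, 2))
```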

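conv_cond_concat joins the four-dimensional data used by the convolutional layers, [batch_size, w, h, c], with the condition y. The first three dimensions of the two tensors must be made the same size before tf.concat can be applied.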
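The same broadcast-then-concatenate idea can be tried in NumPy without building a graph. This is a rough sketch, not the TensorFlow op itself. The function name conv_cond_concat_np and the toy shapes are assumptions chosen for illustration.

```python
import numpy as np

def conv_cond_concat_np(value, cond):
    """Tile cond over the spatial grid of value and append it as extra channels."""
    b, h, w, _ = value.shape
    # cond has shape [b, 1, 1, k]; multiplying by ones broadcasts it to [b, h, w, k]
    tiled = cond * np.ones((b, h, w, cond.shape[3]))
    return np.concatenate([value, tiled], axis=3)

batch, classes = 4, 10
labels = np.array([3, 1, 4, 1])
# one-hot labels reshaped to [batch, 1, 1, classes], like yb in the model
yb = np.eye(classes)[labels].reshape(batch, 1, 1, classes)
feature_map = np.random.rand(batch, 7, 7, 128)

out = conv_cond_concat_np(feature_map, yb)
print(out.shape)
```

Every spatial position of a sample ends up carrying the same one-hot label in its last 10 channels.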

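lrelu (leaky ReLU) is an improved version of relu: it computes max(x, leak*x), so negative inputs are scaled down instead of being zeroed. It is used as the paper calls for.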
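A quick NumPy check of that formula, as a sketch only. The name lrelu_np and the sample values are made up, and leak = 0.2 mirrors the default in ops.

```python
import numpy as np

def lrelu_np(x, leak=0.2):
    # For 0 < leak < 1, max(x, leak * x) keeps positive inputs unchanged
    # and scales negative inputs by leak.
    return np.maximum(x, leak * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
out = lrelu_np(x)
relu_out = np.maximum(x, 0.0)  # plain relu zeroes the negative inputs
print(out)
print(relu_out)
```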

3. model

import tensorflow as tf
from ops import *

BATCH_SIZE = 64

def generator(z, y, train = True):
    # note: batch norm below always runs with is_train = True; the train argument is not used
    yb = tf.reshape(y, [BATCH_SIZE, 1, 1, 10], name = "g_yb")
    z_y = tf.concat([z, y], 1, name = "g_z_concat_y")

    linear1 = linear_layer(z_y, 1024, name = "g_linear_layer1")
    bn1 = tf.nn.relu(batch_norm_layer(linear1, is_train = True, name = "g_bn1"))

    bn1_y = tf.concat([bn1, y], 1, name = "g_bn1_concat_y")
    linear2 = linear_layer(bn1_y, 128*49, name = "g_linear_layer2")
    bn2 = tf.nn.relu(batch_norm_layer(linear2, is_train = True, name = "g_bn2"))
    bn2_re = tf.reshape(bn2, [BATCH_SIZE, 7, 7, 128], name = "g_bn2_reshape")

    bn2_yb = conv_cond_concat(bn2_re, yb, name = "g_bn2_concat_yb")
    deconv1 = deconv2d(bn2_yb, [BATCH_SIZE, 14, 14, 128], strides = [1, 2, 2, 1], name = "g_deconv1")
    bn3 = tf.nn.relu(batch_norm_layer(deconv1, is_train = True, name = "g_bn3"))

    bn3_yb = conv_cond_concat(bn3, yb, name = "g_bn3_concat_yb")
    deconv2 = deconv2d(bn3_yb, [BATCH_SIZE, 28, 28, 1], strides = [1, 2, 2, 1], name = "g_deconv2")
    # sigmoid maps the output into (0, 1), the same range as the input images
    return tf.nn.sigmoid(deconv2)

def discriminator(image, y, reuse = False):
    # reuse = True shares the variables created by an earlier call
    if reuse:
        tf.get_variable_scope().reuse_variables()

    yb = tf.reshape(y, [BATCH_SIZE, 1, 1, 10], name = "d_yb")
    image_yb = conv_cond_concat(image, yb, name = "d_image_concat_yb")
    conv1 = conv2d(image_yb, 11, strides = [1, 2, 2, 1], name = "d_conv1")
    lr1 = lrelu(conv1, name = "d_lrelu1")

    lr1_yb = conv_cond_concat(lr1, yb, name = "d_lr1_concat_yb")
    conv2 = conv2d(lr1_yb, 74, strides = [1, 2, 2, 1], name = "d_conv2")
    bn1 = batch_norm_layer(conv2, is_train = True, name = "d_bn1")
    lr2 = lrelu(bn1, name = "d_lrelu2")
    lr2_re = tf.reshape(lr2, [BATCH_SIZE, -1], name = "d_lr2_reshape")

    lr2_y = tf.concat([lr2_re, y], 1, name = "d_lr2_concat_y")
    linear1 = linear_layer(lr2_y, 1024, name = "d_linear_layer1")
    bn2 = batch_norm_layer(linear1, is_train = True, name = "d_bn2")
    lr3 = lrelu(bn2, name = "d_lrelu3")

    lr3_y = tf.concat([lr3, y], 1, name = "d_lr3_concat_y")
    linear2 = linear_layer(lr3_y, 1, name = "d_linear_layer2")

    # raw logits: the sigmoid is applied inside the loss function
    return linear2

def sampler(z, y, train = True):
    # reuse the generator variables to produce sample images during training
    tf.get_variable_scope().reuse_variables()
    return generator(z, y, train = train)

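The G model follows the network diagram drawn earlier exactly, so there is nothing difficult about it. At most, the strides of the deconv layers need to be worked out correctly, but the diagram could only be drawn once those were calculated anyway. The return value goes through sigmoid, which maps it into (0, 1), matching the range of the input images.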
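To make the shape bookkeeping concrete, here is a plain-Python trace of the tensor shapes through the generator, written from the layer sizes in the model code. It is only a sketch: the helper concat_last is made up, and no TensorFlow is involved.

```python
BATCH_SIZE, Z_DIM, N_CLASSES = 64, 100, 10

def concat_last(shape, extra):
    """Shape after concatenating `extra` features along the last axis."""
    return shape[:-1] + [shape[-1] + extra]

trace = []
shape = concat_last([BATCH_SIZE, Z_DIM], N_CLASSES)   # z joined with y
trace.append(("z_y", shape))
shape = [BATCH_SIZE, 1024]                            # g_linear_layer1
shape = concat_last(shape, N_CLASSES)                 # bn1 joined with y
trace.append(("bn1_y", shape))
shape = [BATCH_SIZE, 128 * 49]                        # g_linear_layer2
shape = [BATCH_SIZE, 7, 7, 128]                       # reshape: 7 * 7 * 128 == 128 * 49
shape = concat_last(shape, N_CLASSES)                 # bn2 joined with yb
trace.append(("bn2_yb", shape))
shape = [BATCH_SIZE, 14, 14, 128]                     # g_deconv1, stride 2
shape = concat_last(shape, N_CLASSES)                 # bn3 joined with yb
trace.append(("bn3_yb", shape))
shape = [BATCH_SIZE, 28, 28, 1]                       # g_deconv2, stride 2
trace.append(("output", shape))

for name, s in trace:
    print(name, s)
```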

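The D model is also written exactly from the diagram. There are two points to note. The first is the reuse argument. Why is it needed? As the first post explained, reuse is mainly a way to share variables. Why does a GAN need shared variables? Because the same D is first fed real data and then fed fake data, so a single train step uses D's variables twice. Sharing has to be enabled; otherwise a new set of variables would be created for the fake data.

The second point is that the return value does not go through sigmoid. During training I compute the loss only with sigmoid_cross_entropy_with_logits, which applies the sigmoid internally, so it must be given raw logits that have not been through sigmoid.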
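One reason to pass logits is numerical stability. The sketch below compares a naive cross entropy computed on the sigmoid output with the stable form given in the TensorFlow documentation for sigmoid_cross_entropy_with_logits, max(x, 0) - x*z + log(1 + exp(-|x|)). The function names are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def naive_loss(logit, label):
    """Cross entropy computed on the sigmoid output."""
    p = sigmoid(logit)
    return -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)

def loss_from_logits(logit, label):
    """Stable form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# For moderate logits the two forms agree.
a = naive_loss(2.0, 1.0)
b = loss_from_logits(2.0, 1.0)
print(abs(a - b) < 1e-9)

# For a large logit the sigmoid rounds to exactly 1.0 in float64,
# so the naive form evaluates log(0) and fails; the logit form stays finite.
try:
    naive_loss(40.0, 0.0)
    naive_failed = False
except ValueError:
    naive_failed = True
big = loss_from_logits(40.0, 0.0)
print(naive_failed, big)
```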

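The final sampler model generates images during training. It exists purely to avoid adding a reuse argument to generator. Adding reuse to generator and reusing the variables there would work just as well; writing it this way is simply a bit clearer.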

4. train

# -*- coding: utf-8 -*-
import scipy.misc
import numpy as np
import tensorflow as tf
import os
from read_data import *
from ops import *
from model import *

BATCH_SIZE = 64

def save_images(images, size, path):
    # images: [N, h, w, 1], already in (0, 1) because the generator ends in sigmoid
    h, w = images.shape[1], images.shape[2]

    # tile the images into a size[0] x size[1] grid
    merge_img = np.zeros((h * size[0], w * size[1], 3))

    for idx, image in enumerate(images):
        i = idx % size[1]
        j = idx // size[1]
        merge_img[j*h:j*h+h, i*w:i*w+w, :] = image

    return scipy.misc.imsave(path, merge_img)

def train():

    #read data
    X, Y = read_data()

    #global_step to record the step of training
    global_step = tf.Variable(0, name = "global_step", trainable = True)

    #set the data placeholder
    y = tf.placeholder(tf.int32, [BATCH_SIZE], name = "y")
    _y = tf.one_hot(y, depth = 10, on_value=None, off_value=None, axis=None, dtype=None, name="one_hot")
    z = tf.placeholder(tf.float32, [BATCH_SIZE, 100], name = "z")
    images = tf.placeholder(tf.float32, [BATCH_SIZE, 28, 28, 1], name = "images")

    #model
    G = generator(z, _y)
    #train real data
    D = discriminator(images, _y)
    #train generated data; reuse = True shares D's variables with the call above
    _D = discriminator(G, _y, reuse = True)

    #calculate loss using sigmoid cross entropy
    d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = D, labels = tf.ones_like(D)))
    d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = _D, labels = tf.zeros_like(_D)))
    g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = _D, labels = tf.ones_like(_D)))
    d_loss = d_loss_real + d_loss_fake

    #split the trainable variables by their name prefix
    t_vars = tf.trainable_variables()
    d_vars = [var for var in t_vars if "d_" in var.name]
    g_vars = [var for var in t_vars if "g_" in var.name]

    #the optimizers create new variables, so reuse must be off here
    with tf.variable_scope(tf.get_variable_scope(), reuse = False):
        d_optim = tf.train.AdamOptimizer(0.0002, beta1 = 0.5).minimize(d_loss, var_list = d_vars, global_step = global_step)
        #beta1 here (the flattened listing had beta2, which looks like a typo)
        g_optim = tf.train.AdamOptimizer(0.0002, beta1 = 0.5).minimize(g_loss, var_list = g_vars, global_step = global_step)

    #tensorboard
    train_dir = "logs"
    z_sum = tf.summary.histogram("z", z)
    d_sum = tf.summary.histogram("d", D)
    d__sum = tf.summary.histogram("d_", _D)
    g_sum = tf.summary.histogram("g", G)

    d_loss_real_sum = tf.summary.scalar("d_loss_real", d_loss_real)
    d_loss_fake_sum = tf.summary.scalar("d_loss_fake", d_loss_fake)
    g_loss_sum = tf.summary.scalar("g_loss", g_loss)
    d_loss_sum = tf.summary.scalar("d_loss", d_loss)

    g_sum = tf.summary.merge([z_sum, d__sum, g_sum, d_loss_fake_sum, g_loss_sum])
    d_sum = tf.summary.merge([z_sum, d_sum, d_loss_real_sum, d_loss_sum])

    #initial
    init = tf.global_variables_initializer()
    sess = tf.InteractiveSession()
    writer = tf.summary.FileWriter(train_dir, sess.graph)

    #save
    saver = tf.train.Saver()
    check_path = "save/model.ckpt"

    #sample
    sample_z = np.random.uniform(-1, 1, size = (BATCH_SIZE, 100))
    sample_labels = Y[0:BATCH_SIZE]

    #make sample
    sample = sampler(z, _y)

    #run
    sess.run(init)
    #saver.restore(sess, check_path)

    #train
    for epoch in range(10):
        batch_idx = int(70000/64)
        for idx in range(batch_idx):
            batch_images = X[idx*64:(idx+1)*64]
            batch_labels = Y[idx*64:(idx+1)*64]
            batch_z = np.random.uniform(-1, 1, size = (BATCH_SIZE, 100))

            #one D step
            _, summary_str = sess.run([d_optim, d_sum],
                                      feed_dict = {images: batch_images,
                                                   z: batch_z,
                                                   y: batch_labels})
            writer.add_summary(summary_str, idx+1)

            #one G step
            _, summary_str = sess.run([g_optim, g_sum],
                                      feed_dict = {images: batch_images,
                                                   z: batch_z,
                                                   y: batch_labels})
            writer.add_summary(summary_str, idx+1)

            d_loss1 = d_loss_fake.eval({z: batch_z, y: batch_labels})
            d_loss2 = d_loss_real.eval({images: batch_images, y: batch_labels})
            D_loss = d_loss1 + d_loss2
            G_loss = g_loss.eval({z: batch_z, y: batch_labels})

            #every 20 batch output loss
            if idx % 20 == 0:
                print("Epoch: %d [%4d/%4d] d_loss: %.8f, g_loss: %.8f" % (epoch, idx, batch_idx, D_loss, G_loss))

            #every 100 batch save a picture
            if idx % 100 == 0:
                sap = sess.run(sample, feed_dict = {z: sample_z, y: sample_labels})
                samples_path = "sample"
                save_images(sap, [8,8],
                            os.path.join(samples_path, "test_%d_epoch_%d.png" % (epoch, idx)))

            #every 500 batch save model
            if idx % 500 == 0:
                saver.save(sess, check_path, global_step = idx + 1)
    sess.close()

if __name__ == "__main__":
    train()

The tensor _y is derived from the y placeholder. Its main purpose is to convert y into one-hot encoding of shape [BATCH_SIZE, 10].
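For intuition, the same encoding can be reproduced in NumPy. This is a sketch only. The helper one_hot_np is made up, and the labels are cast to integers because read_data returns them as floats.

```python
import numpy as np

def one_hot_np(labels, depth=10):
    """Row i is all zeros except for a 1.0 at column labels[i]."""
    labels = np.asarray(labels).astype(np.int64)
    # indexing the identity matrix by label picks out the matching unit row
    return np.eye(depth, dtype=np.float32)[labels]

encoded = one_hot_np([5.0, 0.0, 9.0])
print(encoded.shape)
print(encoded[0])
```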

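The order in the model is: the generator first produces fake data, then the real data is fed to D, and then the fake data is fed to D.

The loss computes the real loss and the fake loss separately, and their sum is D's loss. This should be easy to follow.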
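The three losses can be reproduced on made-up logits with NumPy. The sketch relies on the identity that sigmoid cross entropy with label 1 equals softplus(-x), and with label 0 equals softplus(x). The function names and sample logits are assumptions for illustration.

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)), written so that large |x| does not overflow
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def gan_losses(real_logits, fake_logits):
    d_loss_real = np.mean(softplus(-real_logits))  # label 1 for real images
    d_loss_fake = np.mean(softplus(fake_logits))   # label 0 for generated images
    g_loss = np.mean(softplus(-fake_logits))       # G wants its images scored as 1
    return d_loss_real + d_loss_fake, g_loss

real = np.array([4.0, 3.0])
# D is confident: it gives the generated images negative logits.
d_confident, g_confident = gan_losses(real, np.array([-4.0, -3.0]))
# D is fooled: it scores the generated images like real ones.
d_fooled, g_fooled = gan_losses(real, np.array([4.0, 3.0]))

print(d_confident < d_fooled, g_confident > g_fooled)
```

When D separates the two kinds of input well, d_loss is small and g_loss is large; when D is fooled, the relation flips.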

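I set up some summaries to watch in TensorBoard and a saver to store the model; most of this follows other people's code. Training runs batch by batch: each batch trains D once and then G once. According to the paper, D should be trained k times for each G step, but according to Goodfellow himself, one D step per G step generally works fine too.

Every 100 batches a sample image is generated. The results from my final run look like this.

The last image, enlarged, looks like this:

As you can see, some of the generated digits look very similar to the real data, while others are still somewhat broken. Then again, the digits in the real MNIST data are pretty ugly to begin with, so I stopped training here.

Here are the losses from the last few rounds of training:

Some g_loss values are small and some are large, which suggests that some images are already quite realistic while others are not. In general, when d_loss is small g_loss is large, and when d_loss is large g_loss is small; training continues through this back-and-forth contest. My model may not have converged yet, but the generated results already look acceptable, so I did not train further. It is a bit of a heavy load for a laptop, after all.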

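5. References

The code draws on this expert's post, DL(Deep Learning)小記. I like his code style, so I imitated it. Toward the end there were some bugs I did not know how to debug, so I changed my code until it was close to his. Still, most of ops, model, and train I implemented independently after understanding them myself.

Alongside writing the code, it helps to study how GANs work: GAN原理學習筆記 (GAN principles study notes).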