TensorFlow Exercise (4): Fast Style Transfer (fast-style-transfer)

This post is a simple TensorFlow implementation of the classic style-transfer paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" by Justin Johnson of Fei-Fei Li's lab; anyone who has watched cs231n will recognize him.

Framework: TensorFlow (>= 1.4) + Python 2.7 + slim

Training dataset: COCO2014, images.cocodataset.org/ (about 13 GB); the style images come from lengstrom/fast-style-transfer.

Code: 2012013382/tensorflow-slim-fast-style-transfer

Hardware: a Titan X (12 GB) or a reasonably fast CPU.

Algorithm

The model has two parts: an image transformation network (Image Transform Net) that generates the stylized image, and a loss network (VGG-16) used to compute the losses.

Figure 1: Algorithm overview [1]

As shown in Figure 1, the input image x is passed through the style transfer network to produce the output \hat{y}. We want this output to match the input image x in content (i.e., the content target y_{c}) and to match the style image (the style target y_{s}) in style. All three are therefore fed into the loss network (VGG-16), which produces the corresponding losses (a content loss and a style loss). Training the whole system end-to-end then yields the desired stylized images.
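For reference, the overall training objective is a weighted sum of the two losses; a minimal sketch in the paper's notation (the weights correspond to CONTENT_WEIGHT and STYLE_WEIGHT in the training script below, and this implementation omits the paper's total-variation term):

\min_{\theta}\ \lambda_{c}\,\ell_{content}(\hat{y}, y_{c}) + \lambda_{s}\,\ell_{style}(\hat{y}, y_{s}), \quad \text{where } \hat{y} = f_{\theta}(x)

Only the transform network's parameters \theta are trained; the VGG-16 weights stay fixed.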

Content loss

The content loss is simply a loss between the stylized image and the original image. Note that it is not a pixel-to-pixel loss but a loss computed on convolutional feature maps. Because the VGG-16 used here is pre-trained on ImageNet, its convolutional layers describe image content well, and a plain l2 loss on these features is sufficient.
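In the paper's notation [1], if \phi_{j}(x) denotes the activations of the chosen VGG-16 layer j (conv3_3 in the training script below) with shape C_{j} \times H_{j} \times W_{j}, the content loss is the normalized squared Euclidean distance between feature maps; the content_loss function in the model code implements essentially this with tf.nn.l2_loss:

\ell_{content}^{\phi,j}(\hat{y}, y_{c}) = \frac{1}{C_{j} H_{j} W_{j}} \left\| \phi_{j}(\hat{y}) - \phi_{j}(y_{c}) \right\|_{2}^{2}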

Style loss

The style loss is computed with Gram matrices, which can be loosely understood as correlations between channel responses with the spatial information discarded; this matches our intuitive notion of style. For more background, see the CSDN post "Gram Matrices理解" and the Zhihu question "Batch normalization和Instance normalization的對比?".
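Concretely, for a feature map \phi_{j}(x) of shape H_{j} \times W_{j} \times C_{j}, the Gram matrix and the per-layer style loss used in [1] are (this is what the gram and style_loss functions in the model code compute):

G_{j}(x)_{c,c'} = \frac{1}{C_{j} H_{j} W_{j}} \sum_{h=1}^{H_{j}} \sum_{w=1}^{W_{j}} \phi_{j}(x)_{h,w,c}\, \phi_{j}(x)_{h,w,c'}, \qquad \ell_{style}^{\phi,j}(\hat{y}, y_{s}) = \left\| G_{j}(\hat{y}) - G_{j}(y_{s}) \right\|_{F}^{2}

Because the spatial dimensions are summed out, G_{j} is a C_{j} \times C_{j} matrix regardless of image size, which is why it captures style (texture statistics) rather than spatial content.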

Implementation

The dataset is COCO2014, with a bit over 80,000 images; download it and extract it into the project root. We first write the paths of all training images into a list saved as train.list, so that during training we can conveniently read one batch of images from disk at each step. TensorFlow has better input pipelines, but this cruder approach is easier to follow.

Put the following into convert_images_to_list.sh:

#!/bin/bash
> train.list
COUNT=-1
for folder in $1/*
do
    COUNT=$[$COUNT + 1]
    for imagesFolder in "$folder"
    do
        echo "$imagesFolder" >> train.list
    done
done

and run the following in a terminal:

./convert_images_to_list.sh train2014/

to generate the train.list file.
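As an aside, the "better reading method" mentioned above could be a tf.data pipeline. The sketch below is only illustrative (it assumes TF >= 1.4; make_dataset and its parameters are hypothetical names, not part of the original repo):

import tensorflow as tf

IMG_SIZE = 256

def make_dataset(list_file='train.list', batch_size=4):
    # Decode and resize one image, given its path read from train.list.
    def _parse(path):
        img = tf.image.decode_jpeg(tf.read_file(path), channels=3)
        img = tf.image.resize_images(img, [IMG_SIZE, IMG_SIZE])
        return tf.cast(img, tf.float32)
    dataset = tf.data.TextLineDataset(list_file)
    dataset = dataset.map(_parse).shuffle(1000).batch(batch_size).repeat()
    return dataset

# Usage:
# image_batch = make_dataset().make_one_shot_iterator().get_next()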

Data preprocessing

The only preprocessing is subtracting the mean pixel value of all images from each training image.
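In other words, each pixel becomes x - \mu with \mu = (123.68,\ 116.779,\ 103.939), the per-channel mean provided by lengstrom/fast-style-transfer (MEAN_PIXEL in the code below).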

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from scipy.misc import imread, imresize
import numpy as np
import cv2

COCO_image_path = 'train2014/'
IMG_SIZE = 256
IMG_CHANNEL = 3
# Mean pixel for all images in the set. It is provided by https://github.com/lengstrom/fast-style-transfer
MEAN_PIXEL = np.array([[123.68, 116.779, 103.939]])

# Read image function for the style image or a test image.
def read_image(filename, BATCH=False, batch_size=64):
    if BATCH:
        img = imread(filename)
        img = imresize(img, (IMG_SIZE, IMG_SIZE, IMG_CHANNEL))
        img_batch = []
        for _ in range(batch_size):
            img_batch.append(img)
        img_batch = np.array(img_batch).astype(np.float32)
        return img_batch
    else:
        img = imread(filename)
        img = imresize(img, (IMG_SIZE, IMG_SIZE, IMG_CHANNEL))
        img_batch = []
        img_batch.append(img)
        img_batch = np.array(img_batch).astype(np.float32)
        return img_batch

# Get batches from the data set. I know there are better methods in Tensorflow to get input data,
# but I just read them from the prepared list for simplicity.
def get_batches(filename, batch_index, batch_size=4):
    lines = open(filename, 'r')
    images = []
    lines = list(lines)
    image_indices = range(len(lines))
    count = 0
    for i in image_indices[batch_index: batch_index + batch_size]:
        if count >= batch_size:
            break
        count += 1
        dirname = lines[i].strip('\n').split()
        img = imread(dirname[0])
        img = imresize(img, (IMG_SIZE, IMG_SIZE, IMG_CHANNEL))
        # The only process for input images is subtracting the mean value of each channel.
        if len(img.shape) < 3:
            # Grayscale image: replicate it into three channels and subtract the per-channel mean.
            timg = img
            img = np.zeros((IMG_SIZE, IMG_SIZE, IMG_CHANNEL)).astype(np.float32)
            img[:, :, 0] = timg - MEAN_PIXEL[0, 0]
            img[:, :, 1] = timg - MEAN_PIXEL[0, 1]
            img[:, :, 2] = timg - MEAN_PIXEL[0, 2]
            # Debug visualization; uncomment to inspect converted grayscale images
            # (left active it would block training at cv2.waitKey).
            # cv2.namedWindow('test win', flags=0)
            # cv2.imshow('test win', img)
            # cv2.waitKey(0)
        images.append(img)
    images_np = np.array(images).astype(np.float32)
    batch_index = batch_index + batch_size
    return images_np, batch_index

Model

The model is built with slim.

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets

'''
This file only contains the generator net and the loss net (vgg_16).
We apply instance normalization and resize conv2d.
We utilize slim to build the network for its simplicity.
Functions instance_norm, resize_conv2d and gram are provided by
https://github.com/hzy46/fast-neural-style-tensorflow
'''

def conv2d_slim(x, filter_num, kernel_size, strides, name):
    return slim.conv2d(x, filter_num, [kernel_size, kernel_size], stride=strides,
                       weights_regularizer=slim.l2_regularizer(1e-6),
                       biases_regularizer=slim.l2_regularizer(1e-6),
                       padding='SAME', activation_fn=None, scope=name)

def instance_norm(x):
    epsilon = 1e-9
    mean, var = tf.nn.moments(x, [1, 2], keep_dims=True)
    return tf.div(tf.subtract(x, mean), tf.sqrt(tf.add(var, epsilon)))

def residual(x, filter_num, kernel_size, strides, name):
    with tf.variable_scope(name):
        conv1 = conv2d_slim(x, filter_num, kernel_size, strides, 'conv1')
        conv2 = conv2d_slim(tf.nn.relu(conv1), filter_num, kernel_size, strides, 'conv2')
        residual = x + conv2
        return residual

def resize_conv2d(x, filters_num, kernel_size, strides, training, name):
    with tf.variable_scope(name):
        height = x.get_shape()[1].value if training else tf.shape(x)[1]
        width = x.get_shape()[2].value if training else tf.shape(x)[2]
        new_height = height * strides * 2
        new_width = width * strides * 2
        x_resized = tf.image.resize_images(x, [new_height, new_width],
                                           tf.image.ResizeMethod.NEAREST_NEIGHBOR)
        return conv2d_slim(x_resized, filters_num, kernel_size, strides, 'conv1')

def generator(image, training):
    image = tf.pad(image, [[0, 0], [10, 10], [10, 10], [0, 0]], mode='REFLECT')
    with tf.variable_scope('generator'):
        conv1 = tf.nn.relu(instance_norm(conv2d_slim(image, 32, 9, 1, 'conv1')))
        conv2 = tf.nn.relu(instance_norm(conv2d_slim(conv1, 64, 3, 2, 'conv2')))
        conv3 = tf.nn.relu(instance_norm(conv2d_slim(conv2, 128, 3, 2, 'conv3')))
        res1 = residual(conv3, 128, 3, 1, 'res1')
        res2 = residual(res1, 128, 3, 1, 'res2')
        res3 = residual(res2, 128, 3, 1, 'res3')
        res4 = residual(res3, 128, 3, 1, 'res4')
        res5 = residual(res4, 128, 3, 1, 'res5')
        deconv1 = tf.nn.relu(instance_norm(resize_conv2d(res5, 64, 3, 1, training, 'deconv1')))
        deconv2 = tf.nn.relu(instance_norm(resize_conv2d(deconv1, 32, 3, 1, training, 'deconv2')))
        deconv3 = tf.nn.tanh(instance_norm(conv2d_slim(deconv2, 3, 9, 1, 'deconv3')))
        # Re-value the tanh output from [-1, 1] to [0, 255].
        y = (deconv3 + 1.0) * 127.5
        # Remove the 10-pixel reflection padding added at the input.
        height = tf.shape(y)[1]
        width = tf.shape(y)[2]
        y = tf.slice(y, [0, 10, 10, 0], tf.stack([-1, height - 20, width - 20, -1]))
        return y

# Loss model: VGG-16 provided by slim.
def loss_model(x):
    # x = x / 127.5 - 1
    logits, endpoints_dict = nets.vgg.vgg_16(x, spatial_squeeze=False)
    return logits, endpoints_dict

# Content loss
def content_loss(endpoints_mixed, content_layers):
    loss = 0
    for layer in content_layers:
        A, B, _ = tf.split(endpoints_mixed[layer], 3, 0)
        size = tf.size(A)
        loss += tf.nn.l2_loss(A - B) * 2 / tf.to_float(size)
    return loss

# Style loss
def style_loss(endpoints_mixed, style_layers):
    loss = 0
    for layer in style_layers:
        _, B, C = tf.split(endpoints_mixed[layer], 3, 0)
        size = tf.size(B)
        loss += tf.nn.l2_loss(gram(B) - gram(C)) * 2 / tf.to_float(size)
    return loss

# Gram matrix
def gram(layer):
    shape = tf.shape(layer)
    num_images = shape[0]
    width = shape[1]
    height = shape[2]
    num_filters = shape[3]
    features = tf.reshape(layer, tf.stack([num_images, -1, num_filters]))
    grams = tf.matmul(features, features, transpose_a=True) / tf.to_float(width * height * num_filters)
    return grams

Training

By default the model is trained for one epoch over the dataset, and all parameters match the paper. The hyperparameters STYLE_WEIGHT and CONTENT_WEIGHT follow github.com/hzy46/fast-neural-style-tensorflow.

#coding: utf-8
from __future__ import print_function
from __future__ import division
import tensorflow as tf
import tensorflow.contrib.slim as slim
import model
import data_processing
import numpy as np
from os.path import join
from scipy.misc import imsave

BATCH_SIZE = 4
IMAGE_SIZE = 256
CHANNEL_NUM = 3
CONTENT_LAYERS = ["vgg_16/conv3/conv3_3"]
STYLE_LAYERS = ["vgg_16/conv1/conv1_2", "vgg_16/conv2/conv2_2", "vgg_16/conv3/conv3_3", "vgg_16/conv4/conv4_3"]
# STYLE_WEIGHT and CONTENT_WEIGHT are provided by https://github.com/hzy46/fast-neural-style-tensorflow
# STYLE_WEIGHT=50.0,  CONTENT_WEIGHT=1.0 for candy style.
# STYLE_WEIGHT=180.0, CONTENT_WEIGHT=1.0 for cubist style.
# STYLE_WEIGHT=220.0, CONTENT_WEIGHT=1.0 for feathers style.
# STYLE_WEIGHT=100.0, CONTENT_WEIGHT=1.0 for mosaic style.
# STYLE_WEIGHT=250.0, CONTENT_WEIGHT=1.0 for scream style.
# STYLE_WEIGHT=200.0, CONTENT_WEIGHT=1.0 for udnie style.
# STYLE_WEIGHT=220.0, CONTENT_WEIGHT=1.0 for wave style.
STYLE_WEIGHT = 220.0
CONTENT_WEIGHT = 1.0
MODEL_PATH = 'model/vgg_16.ckpt'
STYLE_IMAGE_PATH = 'style_images/wave.jpg'
TRAIN_IMAGE_PATH = 'train.list'
TRAIN_CHECK_POINT = 'model/trained_model/wave/'
TEST_IMAGE_PATH = 'test/test.jpg'
IMAGE_SAVE_PATH = 'image/'
CHECK_POINT_PATH = 'log/'
LEARNING_RATE = 1e-3
EPOCH_NUM = 1
DATA_SIZE = 82783

with tf.Graph().as_default():
    image = tf.placeholder(tf.float32, [None, IMAGE_SIZE, IMAGE_SIZE, CHANNEL_NUM])
    style_image = tf.placeholder(tf.float32, [None, IMAGE_SIZE, IMAGE_SIZE, CHANNEL_NUM])
    # Generate the stylized image.
    generated_image = model.generator(image, training=True)
    squeezed_generated_image = tf.image.encode_jpeg(tf.cast(tf.squeeze(generated_image, [0]), tf.uint8))
    # Obtain loss-model layers for the content image, generated image and style image.
    _, endpoints_mixed = model.loss_model(tf.concat([image, generated_image, style_image], 0))
    variables_to_restore = slim.get_variables_to_restore(include=['vgg_16'])
    restorer = tf.train.Saver(variables_to_restore)
    # Content loss
    content_loss = model.content_loss(endpoints_mixed, CONTENT_LAYERS)
    # Style loss
    style_loss = model.style_loss(endpoints_mixed, STYLE_LAYERS)
    loss = STYLE_WEIGHT * style_loss + CONTENT_WEIGHT * content_loss

    tf.summary.scalar('losses/content_loss', CONTENT_WEIGHT * content_loss)
    tf.summary.scalar('losses/style_loss', STYLE_WEIGHT * style_loss)
    tf.summary.scalar('losses/loss', loss)
    tf.summary.image('generated', generated_image)
    tf.summary.image('origin', image)
    summary = tf.summary.merge_all()

    # Only train the generator network.
    variables_for_training = slim.get_variables_to_restore(include=['generator'])
    gradients = tf.gradients(loss, variables_for_training)
    grad_and_var = list(zip(gradients, variables_for_training))
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
    opt_op = optimizer.apply_gradients(grads_and_vars=grad_and_var)
    # Only save parameters of the generator model.
    variables_to_save = slim.get_variables_to_restore(include=['generator'])
    saver = tf.train.Saver(variables_to_save, max_to_keep=100)

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        train_writer = tf.summary.FileWriter(CHECK_POINT_PATH, sess.graph)
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        # Restore the pre-trained VGG-16 variables from disk.
        restorer.restore(sess, MODEL_PATH)
        # Obtain the style image batch.
        style_image_batch = data_processing.read_image(STYLE_IMAGE_PATH, True, BATCH_SIZE)
        step = 0
        for epoch in range(EPOCH_NUM):
            batch_index = 0
            for i in range(DATA_SIZE // BATCH_SIZE):
                image_batch, batch_index = data_processing.get_batches(TRAIN_IMAGE_PATH, batch_index, BATCH_SIZE)
                _, batch_ls, style_ls, content_ls, summary_str = sess.run(
                    [opt_op, loss, style_loss, content_loss, summary],
                    feed_dict={image: image_batch, style_image: style_image_batch})
                step += 1
                if i % 10 == 0:
                    print('Epoch %d, Batch %d of %d, loss is %.3f, style loss is %.3f, content loss is %.3f'
                          % (epoch + 1, i, DATA_SIZE // BATCH_SIZE, batch_ls, STYLE_WEIGHT * style_ls, content_ls))
                    train_writer.add_summary(summary_str, step)
                    # Stylize the test image with the current weights and save it for inspection.
                    test_image = data_processing.read_image(TEST_IMAGE_PATH)
                    styled_image = sess.run(squeezed_generated_image, feed_dict={image: test_image})
                    # imsave(join(IMAGE_SAVE_PATH, 'epoch' + str(epoch + 1) + '.jpg'), styled_image)
                    with open('training_image/res.jpg', 'wb') as img_s:
                        img_s.write(styled_image)
                if i % 1000 == 0:
                    # Save the generator parameters.
                    saver.save(sess, join(TRAIN_CHECK_POINT, 'model.ckpt'), global_step=step)
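Since the script writes scalar and image summaries to CHECK_POINT_PATH ('log/'), training progress can be monitored with TensorBoard, for example:

tensorboard --logdir log/

This shows the content, style, and total losses as well as the generated and original image summaries as training proceeds.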

Testing

#coding: utf-8
from __future__ import print_function
from __future__ import division
import tensorflow as tf
import tensorflow.contrib.slim as slim
import model
import numpy as np
from os.path import join
from scipy.misc import imread, imresize, imsave

TEST_IMAGE_PATH = 'test/test.jpg'
MODEL_PATH = 'model/trained_model/wave/model.ckpt-20001'
IMAGE_SAVE_PATH = './'

# Read the test image.
test_img = imread(TEST_IMAGE_PATH)
t_shape = test_img.shape
test_img = imresize(test_img, (t_shape[0], t_shape[1], t_shape[2]))
test_imgg = []
test_imgg.append(test_img)
test_imgg = np.array(test_imgg).astype(np.float32)

with tf.Graph().as_default():
    test_image = tf.placeholder(tf.float32, [None, t_shape[0], t_shape[1], t_shape[2]])
    generated_image = model.generator(test_image, training=False)
    squeezed_generated_image = tf.squeeze(generated_image, [0])
    restorer = tf.train.Saver()
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        restorer.restore(sess, MODEL_PATH)
        styled_image = sess.run(squeezed_generated_image, feed_dict={test_image: test_imgg})
        imsave(join(IMAGE_SAVE_PATH, 'test.jpg'), np.squeeze(styled_image))

Results

The batch size is 4, and one full pass (epoch) over the dataset takes roughly two hours on a TITAN X.

The results are shown in Figures 3 and 4.

Figure 2: Style image

Figure 3: Stylized result

Figure 4: Stylized result

Figure 5: Results at different training steps

Figure 6: Results at different training steps

Figure 7: Results at different training steps

Figure 8: Results at different training steps

Figure 10: Results at different training steps

Figure 11: Results at different training steps

Figure 12: Results at different training steps

Figure 13: Results at different training steps

All test images come from the free image site Pixabay (Stunning Free Images · Pixabay).

References

[1] Perceptual Losses for Real-Time Style Transfer and Super-Resolution

[2] hzy46/fast-neural-style-tensorflow

[3] lengstrom/fast-style-transfer

