[Blog Archive] TensorFlow: A Deep Dive into Neural Style

01-29

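Preface

The earlier post, TensorFlow Primer (1), briefly covered how to install the GPU version of TensorFlow on Ubuntu and ran a fairly basic LR algorithm on MNIST. TensorFlow can do far more than that, though, and much of it is fun. This post looks at using TensorFlow with a CNN to apply Neural Style to artistic photos. First I'll explain in detail how the paper A Neural Algorithm of Artistic Style works, then I'll walk through an open-source TensorFlow implementation of Neural Style to appreciate its author's craft.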

A Neural Algorithm of Artistic Style

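In art, and in painting especially, artists create distinctive visual experiences by composing different content and styles and letting them influence one another. Given two images, current techniques are fully capable of letting a computer recognize what the images contain. Style is much more abstract. To a computer an image is just pixels, yet the human eye can readily tell one painter's style from another's. Is style made up of some more complex features? When I first read deep learning papers, the takeaway was that a multi-layer network is essentially a way to find more complex, more intrinsic features, so in principle a multi-layer network should be able to extract something interesting about an image's style. That is what this paper does: it uses a convolutional neural network (a pre-trained VGG network model) to reconstruct Content and Style separately, and during synthesis it minimizes the content loss and the style loss together (in practice there is also a denoising loss), so that the synthesized image reconstructs both the content and the style more faithfully.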

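Paper overview

The paper's workflow figure shows the whole neural style pipeline, and understanding it is key to following the paper's logic. It has two main parts: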

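Content Reconstruction: the lower part of the figure is Content Reconstruction, corresponding to layers a, b, c, d, e of the CNN. Note that what is labelled Content Representations at the start is not the original picture but the image data after it has passed through the pre-trained VGG network model (think of it as the picture as a machine such as a classifier sees it, so if you visualized it directly you might have no idea what it shows). That model was built for object recognition; here it is used to produce the image's Content Representations. Once this is clear the rest is easier: the content is reconstructed from each of five convolutional layers, and the authors found experimentally that reconstruction from the first three layers works better, while layers d and e lose some detail and keep the more high-level information.
Style Reconstruction: reconstructing style is more involved, since style is hard to model. The Style Representation is produced much like the Content Representation, again by the VGG network model; the difference lies in how a, b, c, d, e are handled. The style reconstruction is computed on different subsets of CNN layers, namely conv1_1 (a), [conv1_1, conv2_1] (b), [conv1_1, conv2_1, conv3_1] (c), [conv1_1, conv2_1, conv3_1, conv4_1] (d), and [conv1_1, conv2_1, conv3_1, conv4_1, conv5_1] (e). Style reconstructed this way matches the image's own style at each of these scales while discarding the global arrangement of the scene.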
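The layer subsets used for the style reconstruction follow a simple cumulative pattern. The sketch below only illustrates that pattern; the helper name `cumulative_layer_subsets` is made up for this example and does not come from the paper or the project.

```python
# Style layers named in the text above, from shallow to deep.
STYLE_LAYER_ORDER = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]


def cumulative_layer_subsets(layers):
    """Return [layers[:1], layers[:2], ..., layers[:n]] (hypothetical helper)."""
    return [layers[: k + 1] for k in range(len(layers))]


subsets = cumulative_layer_subsets(STYLE_LAYER_ORDER)

# Pair each subset with the panel letter used in the figure (a to e).
for label, subset in zip("abcde", subsets):
    print(label, subset)
```

Each later subset adds one deeper layer, so reconstruction (e) draws on all five layers at once.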

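Methods

With those two points understood, what remains is the math of the model. The loss is computed separately for Content and Style. The content loss is the simpler of the two.

Here F^l is the layer-l representation of the generated image and P^l is the layer-l representation of the original picture; the loss is defined as the squared error between the two feature representations.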
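For reference, the content loss in the paper (Gatys et al.) can be written as below. The image symbols p (original picture) and x (generated image) and the indices i (filter) and j (position within the feature map) follow the paper's notation and are not defined in the text above.

```latex
\mathcal{L}_{content}(\vec{p}, \vec{x}, l)
  = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```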

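The style loss is built much like the content loss, except that it compares Gram matrices of the feature maps rather than the feature maps themselves, and it sums the errors over the outputs of every style layer.

Here A^l is the layer-l style representation of the original style picture, and G^l is the layer-l style representation of the generated image.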
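Written out in the paper's notation, the style loss takes roughly the following form. N_l (number of filters in layer l), M_l (size of each feature map) and the per-layer weights w_l come from the paper and are not defined in the text above, so treat this as a reference sketch.

```latex
\begin{align}
G^{l}_{ij} &= \sum_{k} F^{l}_{ik} F^{l}_{jk} \\
E_{l} &= \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2} \\
\mathcal{L}_{style}(\vec{a}, \vec{x}) &= \sum_{l} w_{l} E_{l}
\end{align}
```

The first line is the Gram matrix: entry (i, j) is the inner product of the flattened feature maps of filters i and j, which is why spatial layout drops out of the style representation.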

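With the losses defined, an optimizer minimizes the model's loss. Note that the paper itself has only the content loss and the style loss; the source code additionally includes a denoising (total variation) loss.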
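As a reference sketch, the paper's objective and the one used in the source code can be written side by side. The symbols w_tv, n_y and n_x are my own shorthand for the code's `tv_weight`, `tv_y_size` and `tv_x_size`; in the code the content and style weights are already folded into `content_loss` and `style_loss`.

```latex
\begin{align}
\text{paper:}\quad
\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x})
  &= \alpha \, \mathcal{L}_{content}(\vec{p}, \vec{x})
   + \beta \, \mathcal{L}_{style}(\vec{a}, \vec{x}) \\
\text{code:}\quad
\mathcal{L}
  &= \mathcal{L}_{content} + \mathcal{L}_{style} + \mathcal{L}_{tv} \\
\mathcal{L}_{tv}
  &= w_{tv} \left(
       \frac{1}{n_y} \sum_{h,w,c} \left( x_{h+1,w,c} - x_{h,w,c} \right)^{2}
     + \frac{1}{n_x} \sum_{h,w,c} \left( x_{h,w+1,c} - x_{h,w,c} \right)^{2}
     \right)
\end{align}
```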

I won't go into the optimization method here; TensorFlow has built-in optimizers such as Adam for this.
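For readers curious about what such an optimizer does, here is a minimal NumPy sketch of the standard Adam update applied to a toy quadratic. It is not TensorFlow's implementation; the function name `adam_minimize`, the toy target and the hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np


def adam_minimize(grad_fn, x0, learning_rate=0.1, steps=500,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient, using the Adam update rule."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)  # running mean of gradients
    v = np.zeros_like(x)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return x


# Toy problem: minimize ||x - target||^2, whose gradient is 2 * (x - target).
target = np.array([1.0, -2.0, 3.0])
result = adam_minimize(lambda x: 2 * (x - target), x0=np.zeros(3))
print(result)
```

In neural style the variable being optimized plays the role of `x` here: it is the image itself, not the network weights.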

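Reading the TensorFlow implementation

Project on GitHub: https://github.com/anishathalye/neural-style

The code consists mainly of three files: neural_style.py, stylize.py, and vgg.py. I'll skip the basic interface code and go straight to the core: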

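    # Excerpt (TensorFlow 1.x API); shape, network, content and
    # content_features are defined earlier in the surrounding function.
    import numpy as np
    import tensorflow as tf
    import vgg  # vgg.py from the same project

    # Compute the content features with a single forward pass.
    g = tf.Graph()
    with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
        image = tf.placeholder('float', shape=shape)
        net, mean_pixel = vgg.net(network, image)
        content_pre = np.array([vgg.preprocess(content, mean_pixel)])
        content_features[CONTENT_LAYER] = net[CONTENT_LAYER].eval(
            feed_dict={image: content_pre})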

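This loads the imagenet-vgg-verydeep-19.mat model, and on top of it vgg.net builds the network containing the five stages a to e mentioned earlier (conv1_1, conv2_1, conv3_1, conv4_1, conv5_1). Each key of net names one layer. content_pre is the content image after preprocessing with mean_pixel, and evaluating net[CONTENT_LAYER] on it yields the content feature stored in content_features.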

    # Compute the style features (one Gram matrix per style layer).
    for i in range(len(styles)):
        g = tf.Graph()
        with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
            image = tf.placeholder('float', shape=style_shapes[i])
            net, _ = vgg.net(network, image)
            style_pre = np.array([vgg.preprocess(styles[i], mean_pixel)])
            for layer in STYLE_LAYERS:
                features = net[layer].eval(feed_dict={image: style_pre})
                # Flatten to one row per spatial position, one column per filter.
                features = np.reshape(features, (-1, features.shape[3]))
                gram = np.dot(features.T, features) / features.size
                style_features[i][layer] = gram

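This works the same way as the content features, except that the loss is computed differently (the style loss is a total that includes the loss from every layer's output). That is why CONTENT_LAYER = 'relu4_2' names a single layer while STYLE_LAYERS = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1') lists five, each being the ReLU output that follows the corresponding convolution.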
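The Gram matrix step in the style-feature code above can be tried in isolation. The sketch below mirrors the `np.reshape` and `np.dot` lines on a random array that stands in for a real VGG feature map; the function name `gram_matrix` and the array sizes are assumptions made for this example.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a (1, H, W, C) feature map, divided by its element count."""
    channels = features.shape[3]
    flat = np.reshape(features, (-1, channels))  # shape (H*W, C)
    return np.dot(flat.T, flat) / flat.size      # shape (C, C)


rng = np.random.default_rng(0)
fake_features = rng.standard_normal((1, 4, 5, 3))  # stand-in for net[layer].eval(...)
gram = gram_matrix(fake_features)
print(gram.shape)

# Shuffling the spatial positions leaves the Gram matrix unchanged, which is
# why it captures texture-like statistics but not the layout of the scene.
order = rng.permutation(4 * 5)
shuffled = fake_features.reshape(-1, 3)[order].reshape(1, 4, 5, 3)
print(np.allclose(gram, gram_matrix(shuffled)))
```

The result is always a square matrix with one row and column per filter, regardless of the image size, which is what lets style images of different shapes be compared against the generated image.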

Then comes the loss minimization:

    # Excerpt (TensorFlow 1.x API); on Python 3, reduce needs
    # `from functools import reduce`.
    with tf.Graph().as_default():
        if initial is None:
            # Note: noise is computed here but not used in this excerpt.
            noise = np.random.normal(size=shape, scale=np.std(content) * 0.1)
            initial = tf.random_normal(shape) * 0.256
        else:
            initial = np.array([vgg.preprocess(initial, mean_pixel)])
            initial = initial.astype('float32')
        image = tf.Variable(initial)
        net, _ = vgg.net(network, image)

        # content loss
        content_loss = content_weight * (2 * tf.nn.l2_loss(
            net[CONTENT_LAYER] - content_features[CONTENT_LAYER]) /
            content_features[CONTENT_LAYER].size)
        # style loss
        style_loss = 0
        for i in range(len(styles)):
            style_losses = []
            for style_layer in STYLE_LAYERS:
                layer = net[style_layer]
                _, height, width, number = map(lambda i: i.value, layer.get_shape())
                size = height * width * number
                feats = tf.reshape(layer, (-1, number))
                gram = tf.matmul(tf.transpose(feats), feats) / size
                style_gram = style_features[i][style_layer]
                style_losses.append(2 * tf.nn.l2_loss(gram - style_gram) / style_gram.size)
            style_loss += style_weight * style_blend_weights[i] * reduce(tf.add, style_losses)
        # total variation denoising
        tv_y_size = _tensor_size(image[:,1:,:,:])
        tv_x_size = _tensor_size(image[:,:,1:,:])
        tv_loss = tv_weight * 2 * (
            (tf.nn.l2_loss(image[:,1:,:,:] - image[:,:shape[1]-1,:,:]) /
                tv_y_size) +
            (tf.nn.l2_loss(image[:,:,1:,:] - image[:,:,:shape[2]-1,:]) /
                tv_x_size))
        # overall loss
        loss = content_loss + style_loss + tv_loss

        # optimizer setup
        train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)

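This matches the formulas above one for one, apart from the extra total variation denoising term. Once the total loss is defined, AdamOptimizer is called to iterate and minimize it. Note that although the tensor operations themselves work on whole arrays, the code spells out every slice and normalization by hand, so it can be a bit of a headache to read. Once I'm more familiar with TensorFlow I'll come back and see whether this part of the computation can be made a little more efficient.

Readers who want to understand this code in detail can clone the project and study it closely; it is a good way to learn TensorFlow.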
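To make the total variation term easier to follow, here is a NumPy sketch that mirrors the slicing in the code above. It is an illustration, not the project's code: `l2_loss` reimplements what `tf.nn.l2_loss` computes (half the sum of squares), and the function name `tv_loss` and the default `tv_weight=1.0` are assumptions for this example.

```python
import numpy as np


def l2_loss(t):
    """NumPy stand-in for tf.nn.l2_loss: sum(t ** 2) / 2."""
    return np.sum(t ** 2) / 2.0


def tv_loss(image, tv_weight=1.0):
    """Total variation loss for an image batch of shape (N, H, W, C)."""
    dy = image[:, 1:, :, :] - image[:, :-1, :, :]  # differences between vertical neighbours
    dx = image[:, :, 1:, :] - image[:, :, :-1, :]  # differences between horizontal neighbours
    return tv_weight * 2 * (l2_loss(dy) / dy.size + l2_loss(dx) / dx.size)


# A constant image has no variation at all.
flat = np.ones((1, 4, 4, 3))
print(tv_loss(flat))

# An image whose value equals its column index changes by exactly 1 per
# horizontal step and not at all vertically.
ramp = np.tile(np.arange(4.0).reshape(1, 1, 4, 1), (1, 4, 1, 3))
print(tv_loss(ramp))
```

Because each term is divided by the number of differences, the loss is a mean squared neighbour difference, so it does not grow simply because the image is larger.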

Neural Style Demo

You can also try it on some interesting pictures of your own and see what the results look like.

Reference

[1] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015).

[2] Pre-trained VGG network: http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat

[3] Neural Style with TensorFlow: https://github.com/anishathalye/neural-style