Deep Learning: Typing Out CycleGAN Line by Line, TensorFlow Version (ops.py: Batch_Norm vs. instance_norm)

02-02

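A line-by-line walkthrough of the source code, explained in as much detail as I can.

Thanks for stopping by, everyone!

Source code: CycleGAN-tensorflow

Paper: [1703.10593] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

For a long time I felt there was no real difference between Batch_Norm and instance_norm, and that instance_norm was just a one-dimensional version of Batch_Norm. That turns out to be badly wrong!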

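import tensorflow as tf  # TensorFlow 1.x; tf.contrib does not exist in 2.x

def batch_norm(x, name="batch_norm"):
    # decay=0.9 is the moving-average decay; updates_collections=None applies the moving-average updates in place.
    return tf.contrib.layers.batch_norm(x, decay=0.9, updates_collections=None, epsilon=1e-5, scale=True, scope=name)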

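Batch_Norm normalizes over the whole batch. The usual motivation is that the distribution of a layer's inputs otherwise shifts from batch to batch, which makes training harder. It is applied to each neuron (each feature channel) separately: the activations that a batch produces for that feature are shifted and rescaled to mean 0 and variance 1, so every batch reaches the next layer with the same statistics. This needs two quantities per feature, the batch mean and the batch variance, and the object being normalized is the batch. Because forcing the mean and variance to fixed values restricts the distribution of the features and reduces the network's expressive power, two learnable parameters, $\gamma$ and $\beta$, are introduced to rescale and shift each feature dimension, which restores that expressive power. During training the network keeps a moving average of the per-batch mean and variance (decay=0.9 in the code above). Once training ends, these four quantities (moving mean, moving variance, $\gamma$, $\beta$) are fixed and are simply loaded and used at test time.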
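To make the training/test split concrete, here is a minimal NumPy sketch of the idea. It is not the tf.contrib implementation: the function names batch_norm_train and batch_norm_test, the NHWC shapes, and the synthetic data (mean 5, standard deviation 2) are assumptions made up for illustration.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, moving_mean, moving_var, decay=0.9, eps=1e-5):
    """One training step of batch norm on an NHWC tensor (illustrative only)."""
    # Statistics are pooled over every sample in the batch: one value per channel.
    mean = x.mean(axis=(0, 1, 2))
    var = x.var(axis=(0, 1, 2))
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    # Exponential moving averages, stored for use at test time.
    moving_mean = decay * moving_mean + (1 - decay) * mean
    moving_var = decay * moving_var + (1 - decay) * var
    return y, moving_mean, moving_var

def batch_norm_test(x, gamma, beta, moving_mean, moving_var, eps=1e-5):
    """Inference: reuse the stored statistics instead of the batch's own."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

rng = np.random.default_rng(0)
channels = 3
gamma, beta = np.ones(channels), np.zeros(channels)
moving_mean, moving_var = np.zeros(channels), np.ones(channels)

# Hypothetical training loop: every batch is drawn with mean 5 and std 2.
for _ in range(200):
    batch = rng.normal(loc=5.0, scale=2.0, size=(8, 4, 4, channels))
    y, moving_mean, moving_var = batch_norm_train(batch, gamma, beta, moving_mean, moving_var)

test_out = batch_norm_test(batch, gamma, beta, moving_mean, moving_var)

print(y.mean(axis=(0, 1, 2)))   # per-channel mean of the normalized batch
print(moving_mean, moving_var)  # drift toward the data's mean and variance
```

The four stored arrays (moving_mean, moving_var, gamma, beta) are all that batch_norm_test needs, which is the sense in which they are "fixed and loaded" after training.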

Batch_Norm paper

Why does Batch Normalization work so well in deep learning?

YJango: YJango's Batch Normalization, an introduction

def instance_norm(input, name="instance_norm"):
    with tf.variable_scope(name):
        depth = input.get_shape()[3]
        scale = tf.get_variable("scale", [depth], initializer=tf.random_normal_initializer(1.0, 0.02, dtype=tf.float32))
        offset = tf.get_variable("offset", [depth], initializer=tf.constant_initializer(0.0))
        mean, variance = tf.nn.moments(input, axes=[1,2], keep_dims=True)
        epsilon = 1e-5
        inv = tf.rsqrt(variance + epsilon)
        normalized = (input-mean)*inv
        return scale*normalized + offset

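instance_norm, by contrast, is much simpler. For each sample and each channel along the depth axis, it takes the mean and variance over the spatial axes (axes=[1,2], i.e. height and width), subtracts the mean, and divides by the standard deviation. The source code multiplies by tf.rsqrt(variance + epsilon), which is the same thing as dividing by the standard deviation. This also helps speed up training. To give some expressive capacity back, the result is then multiplied by scale and offset is added. CycleGAN is trained with a batch_size of 1, so batch_norm is a poor fit here. Apart from the per-channel scale and offset, instance_norm is plain standardization: there are no moving averages to keep track of, and the same computation runs in training and at test time.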
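The same computation can be sketched in NumPy to check two points: multiplying by the reciprocal square root equals dividing by the standard deviation, and the statistics are one per sample and per channel. The shapes and random data below are assumptions for illustration, and np stands in for the TensorFlow ops.

```python
import numpy as np

def instance_norm(x, scale, offset, eps=1e-5):
    """Instance norm on an NHWC tensor, following the same steps as the TensorFlow code."""
    # One mean/variance per (sample, channel), taken over height and width.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    inv = 1.0 / np.sqrt(var + eps)  # NumPy equivalent of tf.rsqrt(variance + epsilon)
    return scale * (x - mean) * inv + offset

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 8, 8, 3))  # hypothetical batch of 2 images
scale, offset = np.ones(3), np.zeros(3)

y = instance_norm(x, scale, offset)

# The same result written as an explicit division by the standard deviation.
mean = x.mean(axis=(1, 2), keepdims=True)
std = np.sqrt(x.var(axis=(1, 2), keepdims=True) + 1e-5)
divided = (x - mean) / std

print(np.allclose(y, divided))  # the two formulations agree
print(y.mean(axis=(1, 2)).shape)  # (2, 3): one statistic per sample and channel
```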

Keras version:

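# Assumed imports (Keras 2 on the TensorFlow 1.x backend); the original snippet does not show them.
import tensorflow as tf
from keras import backend as K
from keras import initializers
from keras.layers import Layer

class InstanceNormalization(Layer):
    def __init__(self, axis=-1, epsilon=1e-5, **kwargs):
        super(InstanceNormalization, self).__init__(**kwargs)
        self.axis = axis
        self.epsilon = epsilon

    def build(self, input_shape):
        dim = input_shape[self.axis]
        if dim is None:
            raise ValueError('Axis ' + str(self.axis) + ' of input tensor should have a defined dimension but the layer received an input with shape ' + str(input_shape) + '.')
        shape = (dim,)

        # gamma and beta play the roles of scale and offset in the TensorFlow version.
        self.gamma = self.add_weight(shape=shape, name='gamma', initializer=initializers.random_normal(1.0, 0.02))
        self.beta = self.add_weight(shape=shape, name='beta', initializer='zeros')
        self.built = True

    def call(self, inputs, training=None):
        # Per-sample, per-channel statistics over height and width.
        mean, var = tf.nn.moments(inputs, axes=[1,2], keep_dims=True)
        # epsilon is passed by keyword: in some Keras versions the sixth positional argument is axis.
        return K.batch_normalization(inputs, mean, var, self.beta, self.gamma, epsilon=self.epsilon)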

To sum up: Batch_Norm standardizes across the different samples within a batch, while instance_norm standardizes within a single sample.
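One way to see this difference is to put the same image into two different batches and compare the outputs. The sketch below is a NumPy illustration under assumed shapes and data. The helper normalize and the constants BATCH_AXES and INSTANCE_AXES are hypothetical names, and the learnable scale and offset are left out to isolate the statistics.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Standardize an NHWC tensor over the given axes (no scale or offset)."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

BATCH_AXES = (0, 1, 2)   # batch norm: statistics pooled over all samples
INSTANCE_AXES = (1, 2)   # instance norm: statistics per sample

rng = np.random.default_rng(0)
image = rng.normal(size=(1, 4, 4, 3))
dark = rng.normal(loc=-5.0, size=(1, 4, 4, 3))
bright = rng.normal(loc=5.0, size=(1, 4, 4, 3))

# The same image, batched with two different neighbours.
batch_a = np.concatenate([image, dark])
batch_b = np.concatenate([image, bright])

bn_changes = not np.allclose(normalize(batch_a, BATCH_AXES)[0],
                             normalize(batch_b, BATCH_AXES)[0])
in_changes = not np.allclose(normalize(batch_a, INSTANCE_AXES)[0],
                             normalize(batch_b, INSTANCE_AXES)[0])

# With a single-sample batch the two sets of statistics coincide.
same_at_batch_one = np.allclose(normalize(image, BATCH_AXES),
                                normalize(image, INSTANCE_AXES))

print(bn_changes, in_changes, same_at_batch_one)  # True False True
```

With batch statistics the output for an image depends on what else is in the batch. With instance statistics it does not.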

Finally, here is a look at the results:

Feel free to follow my WeChat public account: huangxiaobai880

https://www.zhihu.com/video/932226129343823872