How do you use batch normalization in TensorFlow?

I have tried several versions of batch normalization, including the one in tf.contrib, the one in slim, and a few versions found on Stack Overflow, and none of them work correctly. The code for these versions is shown below:

Version 1: from Stack Overflow

def batch_norm_layer(x, train_phase, scope_bn="bn"):
    bn_train = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
                          updates_collections=None,
                          is_training=True,
                          reuse=None,  # is this right?
                          trainable=True,
                          scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
                              updates_collections=None,
                              is_training=False,
                              reuse=True,  # is this right?
                              trainable=True,
                              scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z

Version 2: from tensorflow.contrib.layers.python.layers.layers import batch_norm, used directly.

In every version above, is_training is True (or a placeholder) during training and False at prediction time, yet the predictions still vary with batch_size and are unstable; in other words, every batch_normalization_layer above behaves incorrectly. Could someone point out where I went wrong and how to use it properly? Sample code would be much appreciated, thanks!


I had the same question as the OP. After trying quite a few versions, this is the one that has worked well for me so far:

import numpy as np
import tensorflow as tf

def batch_norm_layer(x, train_phase, scope_bn):
    with tf.variable_scope(scope_bn):
        # learnable shift and scale, one per channel
        beta = tf.Variable(tf.constant(0.0, shape=[x.shape[-1]]), name="beta", trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[x.shape[-1]]), name="gamma", trainable=True)
        # batch statistics over every axis except the channel axis
        axises = np.arange(len(x.shape) - 1)
        batch_mean, batch_var = tf.nn.moments(x, axises, name="moments")
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            # update the moving averages, then return the batch statistics
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # training: batch statistics (plus EMA update); inference: the moving averages
        mean, var = tf.cond(train_phase, mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed
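
For what it's worth, here is a minimal usage sketch of the layer above (the input size, placeholder name, and feed values are illustrative and not part of the original answer):

# Illustrative usage only: assumes batch_norm_layer from above is in scope.
x = tf.placeholder(tf.float32, [None, 128])
train_phase = tf.placeholder(tf.bool, name="train_phase")
y = batch_norm_layer(x, train_phase, scope_bn="bn")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(32, 128).astype(np.float32)
    # Training step: feed True so batch statistics are used and the EMA is updated.
    sess.run(y, feed_dict={x: batch, train_phase: True})
    # Inference: feed False so the accumulated moving averages are used,
    # making the output independent of the batch composition.
    sess.run(y, feed_dict={x: batch, train_phase: False})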


These are small tweaks on top of the top-voted answer (author: 于洋, link: 于洋:怎樣在tensorflow中使用batch normalization?). The main change is dropping the unnecessary moments computation in the inference branch; the parameter names and defaults follow PyTorch.

import tensorflow as tf
from tensorflow.python.training.moving_averages import assign_moving_average

def batch_norm(x, train, eps=1e-05, decay=0.9, affine=True, name=None):
    with tf.variable_scope(name, default_name="BatchNorm2d"):
        params_shape = x.shape[-1:]  # static channel dimension; tf.get_variable needs a static shape
        moving_mean = tf.get_variable("mean", params_shape,
                                      initializer=tf.zeros_initializer,
                                      trainable=False)
        moving_variance = tf.get_variable("variance", params_shape,
                                          initializer=tf.ones_initializer,
                                          trainable=False)

        def mean_var_with_update():
            # batch statistics over every axis except the channel axis,
            # computed only in the training branch
            mean, variance = tf.nn.moments(x, list(range(len(x.shape) - 1)), name="moments")
            with tf.control_dependencies([assign_moving_average(moving_mean, mean, decay),
                                          assign_moving_average(moving_variance, variance, decay)]):
                return tf.identity(mean), tf.identity(variance)

        mean, variance = tf.cond(train, mean_var_with_update,
                                 lambda: (moving_mean, moving_variance))
        if affine:
            beta = tf.get_variable("beta", params_shape,
                                   initializer=tf.zeros_initializer)
            gamma = tf.get_variable("gamma", params_shape,
                                    initializer=tf.ones_initializer)
            x = tf.nn.batch_normalization(x, mean, variance, beta, gamma, eps)
        else:
            x = tf.nn.batch_normalization(x, mean, variance, None, None, eps)
        return x


For example code, see Udacity's batch-norm tutorial, where you will find three notebooks:

  • Batch_Normalization_Lesson.ipynb - shows you how batch normalization works
  • Batch_Normalization_Exercises.ipynb - exercises in which you implement batch normalization yourself
  • Batch_Normalization_Solutions.ipynb - solutions to those exercises

Tips:

  1. Added is_training, a placeholder to store a boolean value indicating whether or not the network is training.
  2. Passed is_training to the conv_layer and fully_connected functions.
  3. Each time we call run on the session, we added to feed_dict the appropriate value for is_training.
  4. Moved the creation of train_opt inside a with tf.control_dependencies... statement. This is necessary to get the normalization layers created with tf.layers.batch_normalization to update their population statistics, which we need when performing inference (see the sketch after this list).
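
Roughly, the pattern those tips describe looks like this (the layer sizes, loss, and optimizer below are illustrative placeholders, not code from the notebooks):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 784])       # illustrative input size
labels = tf.placeholder(tf.float32, [None, 10])
is_training = tf.placeholder(tf.bool, name="is_training")

# training=is_training switches between batch statistics and population statistics.
hidden = tf.layers.dense(inputs, 256, use_bias=False)
hidden = tf.layers.batch_normalization(hidden, training=is_training)
hidden = tf.nn.relu(hidden)
logits = tf.layers.dense(hidden, 10)

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))

# Tip 4: the moving mean/variance updates are collected in GraphKeys.UPDATE_OPS,
# so the train op must depend on them, otherwise the population statistics never
# update and inference becomes unstable.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_opt = tf.train.AdamOptimizer().minimize(loss)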


martin-gorner/tensorflow-mnist-tutorial also has examples of using batch normalization.


Use slim.batch_norm(input, is_training=True).

You also need to add the following lines.
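
(The exact lines did not survive in this copy; presumably they are the usual update-ops wiring, roughly as sketched here, with the placeholder shape and stand-in loss being purely illustrative:)

import tensorflow as tf
import tensorflow.contrib.slim as slim

x = tf.placeholder(tf.float32, [None, 10])                # illustrative input
net = slim.batch_norm(x, is_training=True)
loss = tf.reduce_mean(tf.square(net))                     # stand-in loss
# slim's BN registers its moving_mean/moving_variance updates in UPDATE_OPS,
# so the train op has to depend on them.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)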

I'm not sure whether this is right; if there are mistakes, please correct me.


First, version 1 is problematic.

Second, the BN in contrib.layers and the one in slim are exactly the same; take a look at slim's __init__.

Third, 于洋's train_phase is the flag that says whether the network is currently training.

Finally, slim's BN is definitely fine, but it is not a matter of just calling it and being done. If you only call it, you will see the accuracy oscillate up and down endlessly... moving_mean and moving_variance have to be updated, and control_dependencies is what guarantees they are refreshed on every training step. So conceptually I think 于洋's answer is usable. If you want to call slim's BN directly, your update_op must be made a dependency of the train_op.

Last of all, read the official TF docs more; how to use BN is actually explained quite clearly there.


Easy to use batch norm layer · Issue #1122 · tensorflow/tensorflow covers this; the two key points are:

  1. Make sure the moving average and variance actually get updated during training.
  2. Adjust the BN layer's decay parameter: with the default decay of 0.999 the moving statistics converge very slowly, so a smaller value such as 0.9 gives reliable inference after shorter training runs.


Set the updates_collections parameter of the batch_norm function to None.

updates_collections: Collections to collect the update ops for computation. The updates_ops need to be executed with the train_op. If None, a control dependency would be added to make sure the updates are computed in place.
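
Concretely, that option looks roughly like this (the input shape and the slim import path are just for illustration):

import tensorflow as tf
import tensorflow.contrib.slim as slim

x = tf.placeholder(tf.float32, [None, 128])   # illustrative input
is_training = tf.placeholder(tf.bool)

# With updates_collections=None the moving-average updates run in place through a
# control dependency, so no separate UPDATE_OPS / train_op wiring is needed
# (simpler, at the cost of a small synchronization overhead per step).
net = slim.batch_norm(x, is_training=is_training,
                      decay=0.9, updates_collections=None)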

Or just go with the version from the answer above.

