How do you use batch normalization in TensorFlow?
I have tried several versions of batch normalization, including the one in tf.contrib, the one in slim, and a few found on Stack Overflow, and none of them work. The code for these versions is below.

Version 1, from Stack Overflow:

```python
def batch_norm_layer(x, train_phase, scope_bn='bn'):
    bn_train = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
                          updates_collections=None, is_training=True,
                          reuse=None,  # is this right?
                          trainable=True, scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
                              updates_collections=None, is_training=False,
                              reuse=True,  # is this right?
                              trainable=True, scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z
```

Version 2: from tensorflow.contrib.layers.python.layers.layers import batch_norm, called directly.

In every version above, is_training is True (or a placeholder) during training and False at prediction time, yet the predictions change with batch_size and are unstable. In other words, every batch_normalization_layer above is wrong. Please take a look at where I went wrong and how to use it correctly, ideally with sample code. Thanks!
I had the same question as the OP. After trying quite a few versions, this is the one that has worked well for me so far:
```python
import numpy as np
import tensorflow as tf

def batch_norm_layer(x, train_phase, scope_bn):
    with tf.variable_scope(scope_bn):
        # Learnable shift (beta) and scale (gamma), one per channel
        beta = tf.Variable(tf.constant(0.0, shape=[x.shape[-1]]),
                           name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[x.shape[-1]]),
                            name='gamma', trainable=True)
        # Batch statistics over every axis except the channel axis
        axes = list(np.arange(len(x.shape) - 1))
        batch_mean, batch_var = tf.nn.moments(x, axes, name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            # Refresh the moving averages before returning the batch stats
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # Batch stats during training, moving averages at inference
        mean, var = tf.cond(train_phase, mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed
```
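A minimal usage sketch of the layer above (the placeholder shape and session calls are illustrative, not part of the original answer):

```python
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
train_phase = tf.placeholder(tf.bool, name='train_phase')
bn_out = batch_norm_layer(x, train_phase, 'bn1')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(32, 64).astype(np.float32)
    # Training: batch statistics are used and the moving averages updated
    sess.run(bn_out, feed_dict={x: batch, train_phase: True})
    # Inference: the accumulated moving averages are used instead
    sess.run(bn_out, feed_dict={x: batch, train_phase: False})
```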
This makes some detail-level changes on top of the top-voted answer (author: 于洋, link: 于洋: How do you use batch normalization in TensorFlow?). It mainly removes the unnecessary moments computation, and the parameters are modeled after PyTorch's.
```python
import tensorflow as tf
from tensorflow.python.training.moving_averages import assign_moving_average

def batch_norm(x, train, eps=1e-05, decay=0.9, affine=True, name=None):
    with tf.variable_scope(name, default_name='BatchNorm2d'):
        # get_variable needs a static shape, so use x.shape, not tf.shape(x)
        params_shape = x.shape[-1:]
        moving_mean = tf.get_variable('mean', params_shape,
                                      initializer=tf.zeros_initializer(),
                                      trainable=False)
        moving_variance = tf.get_variable('variance', params_shape,
                                          initializer=tf.ones_initializer(),
                                          trainable=False)

        def mean_var_with_update():
            # Batch statistics over every axis except the channel axis;
            # the axes argument must be static Python ints
            axes = list(range(len(x.shape) - 1))
            mean, variance = tf.nn.moments(x, axes, name='moments')
            with tf.control_dependencies([
                    assign_moving_average(moving_mean, mean, decay),
                    assign_moving_average(moving_variance, variance, decay)]):
                return tf.identity(mean), tf.identity(variance)

        # moments (and the moving-average updates) run only in the training branch
        mean, variance = tf.cond(train, mean_var_with_update,
                                 lambda: (moving_mean, moving_variance))
        if affine:
            beta = tf.get_variable('beta', params_shape,
                                   initializer=tf.zeros_initializer())
            gamma = tf.get_variable('gamma', params_shape,
                                    initializer=tf.ones_initializer())
            x = tf.nn.batch_normalization(x, mean, variance, beta, gamma, eps)
        else:
            x = tf.nn.batch_normalization(x, mean, variance, None, None, eps)
        return x
```
For sample code, see Udacity's batch-norm tutorial, where you will find three notebooks:
- Batch_Normalization_Lesson.ipynb - shows you how batch normalization works
- Batch_Normalization_Exercises.ipynb - exercises where you implement batch normalization yourself
- Batch_Normalization_Solutions.ipynb - solutions to those exercises
Tips:
- Added is_training, a placeholder to store a boolean value indicating whether or not the network is training.
- Passed is_training to the conv_layer and fully_connected functions.
- Each time we call run on the session, we added to feed_dict the appropriate value for is_training.
- Moved the creation of train_opt inside a with tf.control_dependencies... statement. This is necessary to get the normalization layers created with tf.layers.batch_normalization to update their population statistics, which we need when performing inference.
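A condensed sketch of those tips, assuming a toy one-layer classifier (the placeholder shapes, layer sizes, and optimizer here are illustrative stand-ins, not taken from the notebooks):

```python
import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool, name='is_training')

hidden = tf.layers.dense(inputs, 128, use_bias=False)
hidden = tf.layers.batch_normalization(hidden, training=is_training)
hidden = tf.nn.relu(hidden)
logits = tf.layers.dense(hidden, 10)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# tf.layers.batch_normalization puts its moving-average updates in
# GraphKeys.UPDATE_OPS; creating train_opt under this control dependency
# makes every training step also refresh the population statistics
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_opt = tf.train.AdamOptimizer().minimize(loss)

# sess.run(train_opt, feed_dict={inputs: ..., labels: ..., is_training: True})
# sess.run(logits, feed_dict={inputs: ..., is_training: False})
```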
martin-gorner/tensorflow-mnist-tutorial has examples of using batch normalization.
Use slim.batch_norm(input, is_training=True), and you also need to add a few more lines, shown below.
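(The original snippet appears to have been lost from this answer; based on the update_op dependency described in the answers below, it was presumably the standard UPDATE_OPS pattern. A minimal sketch, with inputs, is_training, loss, and the optimizer as hypothetical stand-ins:)

```python
import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 28, 28, 16])
is_training = tf.placeholder(tf.bool)

net = slim.batch_norm(inputs, is_training=is_training)
loss = tf.reduce_mean(tf.square(net))

# slim.batch_norm collects its moving_mean/moving_variance updates in
# GraphKeys.UPDATE_OPS; without this dependency they are never run and
# inference results drift with batch_size
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```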
I am not sure everything below is right; please correct me if there are mistakes.

First, version 1 is problematic.

Second, the BN in contrib.layers and in slim is exactly the same; take a look at slim's __init__.

Third, 于洋's train_phase is a tag for whether you are currently training.

Finally, slim's BN is certainly fine, but it is not just a matter of calling it and being done. If you only call it, you will find the accuracy bobbing up and down without end... moving_mean and moving_variance need to be updated, and control_dependencies is used to guarantee they are updated before each training step, so conceptually I think 于洋's answer is usable. If you want to call slim's BN directly, your update_op must have a dependency on the train_op.

Last of all, read the official TF docs more; they explain how to use BN quite clearly.
Easy to use batch norm layer. · Issue #1122 · tensorflow/tensorflow
- Make sure the moving average and variance are updated during training
- Adjust the BN layer's decay parameter
Set the updates_collections parameter of the batch_norm function to None. From the docstring:

updates_collections: Collections to collect the update ops for computation. The updates_ops need to be executed with the train_op. If None, a control dependency would be added to make sure the updates are computed in place.
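A minimal illustration of that option with slim's batch_norm (inputs and is_training are hypothetical placeholders):

```python
import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 100])
is_training = tf.placeholder(tf.bool)

# With updates_collections=None, the moving statistics are updated in place
# through a control dependency, so no separate update_op needs to be wired
# to the train_op (simpler, though potentially slower)
net = slim.batch_norm(inputs, is_training=is_training,
                      updates_collections=None)
```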
Or just use one of the versions from the answers above.