How do you use batch normalization in TensorFlow?
I have tried several versions of batch normalization, including the one in tf.contrib, the one in slim, and a few found on Stack Overflow, and none of them work. The code for these versions is below.

Version 1, from Stack Overflow:

```python
def batch_norm_layer(x, train_phase, scope_bn='bn'):
    bn_train = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
                          updates_collections=None, is_training=True,
                          reuse=None,  # is this right?
                          trainable=True, scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
                              updates_collections=None, is_training=False,
                              reuse=True,  # is this right?
                              trainable=True, scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z
```

Version 2: from tensorflow.contrib.layers.python.layers.layers import batch_norm, called directly.

In every version above, is_training is True (or a placeholder) during training and False at prediction time, yet the predictions change with batch_size and are unstable. In other words, every batch_normalization_layer above is wrong. Please take a look at where I went wrong and how to use it correctly, ideally with sample code. Thanks!
I had the same question as the OP. After trying quite a few versions, this is the one that has worked well for me so far:
```python
import numpy as np
import tensorflow as tf

def batch_norm_layer(x, train_phase, scope_bn):
    with tf.variable_scope(scope_bn):
        # Learnable shift (beta) and scale (gamma), one per channel
        beta = tf.Variable(tf.constant(0.0, shape=[x.shape[-1]]),
                           name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[x.shape[-1]]),
                            name='gamma', trainable=True)
        # Batch statistics over every axis except the channel axis
        axes = list(np.arange(len(x.shape) - 1))
        batch_mean, batch_var = tf.nn.moments(x, axes, name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            # Refresh the moving averages before returning the batch stats
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # Batch stats during training, moving averages at inference
        mean, var = tf.cond(train_phase, mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed
```
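A minimal usage sketch of the layer above (the placeholder shape and session calls are illustrative, not part of the original answer):

```python
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
train_phase = tf.placeholder(tf.bool, name='train_phase')
bn_out = batch_norm_layer(x, train_phase, 'bn1')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(32, 64).astype(np.float32)
    # Training: batch statistics are used and the moving averages updated
    sess.run(bn_out, feed_dict={x: batch, train_phase: True})
    # Inference: the accumulated moving averages are used instead
    sess.run(bn_out, feed_dict={x: batch, train_phase: False})
```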
This makes some detail-level changes on top of the top-voted answer (author: 于洋, link: 于洋: How do you use batch normalization in TensorFlow?). It mainly removes the unnecessary moments computation, and the parameters are modeled after PyTorch's.
```python
import tensorflow as tf
from tensorflow.python.training.moving_averages import assign_moving_average

def batch_norm(x, train, eps=1e-05, decay=0.9, affine=True, name=None):
    with tf.variable_scope(name, default_name='BatchNorm2d'):
        # get_variable needs a static shape, so use x.shape, not tf.shape(x)
        params_shape = x.shape[-1:]
        moving_mean = tf.get_variable('mean', params_shape,
                                      initializer=tf.zeros_initializer(),
                                      trainable=False)
        moving_variance = tf.get_variable('variance', params_shape,
                                          initializer=tf.ones_initializer(),
                                          trainable=False)

        def mean_var_with_update():
            # Batch statistics over every axis except the channel axis;
            # the axes argument must be static Python ints
            axes = list(range(len(x.shape) - 1))
            mean, variance = tf.nn.moments(x, axes, name='moments')
            with tf.control_dependencies([
                    assign_moving_average(moving_mean, mean, decay),
                    assign_moving_average(moving_variance, variance, decay)]):
                return tf.identity(mean), tf.identity(variance)

        # moments (and the moving-average updates) run only in the training branch
        mean, variance = tf.cond(train, mean_var_with_update,
                                 lambda: (moving_mean, moving_variance))
        if affine:
            beta = tf.get_variable('beta', params_shape,
                                   initializer=tf.zeros_initializer())
            gamma = tf.get_variable('gamma', params_shape,
                                    initializer=tf.ones_initializer())
            x = tf.nn.batch_normalization(x, mean, variance, beta, gamma, eps)
        else:
            x = tf.nn.batch_normalization(x, mean, variance, None, None, eps)
        return x
```
For sample code, see Udacity's batch-norm tutorial, where you will find three notebooks:
- Batch_Normalization_Lesson.ipynb - shows you how batch normalization works
- Batch_Normalization_Exercises.ipynb - exercises where you implement batch normalization yourself
- Batch_Normalization_Solutions.ipynb - solutions to those exercises
Tips:
- Added is_training, a placeholder to store a boolean value indicating whether or not the network is training.
- Passed is_training to the conv_layer and fully_connected functions.
- Each time we call run on the session, we added to feed_dict the appropriate value for is_training.
- Moved the creation of train_opt inside a with tf.control_dependencies... statement. This is necessary to get the normalization layers created with tf.layers.batch_normalization to update their population statistics, which we need when performing inference.
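A condensed sketch of those tips, assuming a toy one-layer classifier (the placeholder shapes, layer sizes, and optimizer here are illustrative stand-ins, not taken from the notebooks):

```python
import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool, name='is_training')

hidden = tf.layers.dense(inputs, 128, use_bias=False)
hidden = tf.layers.batch_normalization(hidden, training=is_training)
hidden = tf.nn.relu(hidden)
logits = tf.layers.dense(hidden, 10)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# tf.layers.batch_normalization puts its moving-average updates in
# GraphKeys.UPDATE_OPS; creating train_opt under this control dependency
# makes every training step also refresh the population statistics
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_opt = tf.train.AdamOptimizer().minimize(loss)

# sess.run(train_opt, feed_dict={inputs: ..., labels: ..., is_training: True})
# sess.run(logits, feed_dict={inputs: ..., is_training: False})
```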
martin-gorner/tensorflow-mnist-tutorial has examples of using batch normalization.
Use slim.batch_norm(input, is_training=True), and you also need to add a few more lines, shown below.
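(The original snippet appears to have been lost from this answer; based on the update_op dependency described in the answers below, it was presumably the standard UPDATE_OPS pattern. A minimal sketch, with inputs, is_training, loss, and the optimizer as hypothetical stand-ins:)

```python
import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 28, 28, 16])
is_training = tf.placeholder(tf.bool)

net = slim.batch_norm(inputs, is_training=is_training)
loss = tf.reduce_mean(tf.square(net))

# slim.batch_norm collects its moving_mean/moving_variance updates in
# GraphKeys.UPDATE_OPS; without this dependency they are never run and
# inference results drift with batch_size
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```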
I am not sure everything below is right; please correct me if there are mistakes.

First, version 1 is problematic.

Second, the BN in contrib.layers and in slim is exactly the same; take a look at slim's __init__.

Third, 于洋's train_phase is a tag for whether you are currently training.

Finally, slim's BN is certainly fine, but it is not just a matter of calling it and being done. If you only call it, you will find the accuracy bobbing up and down without end... moving_mean and moving_variance need to be updated, and control_dependencies is used to guarantee they are updated before each training step, so conceptually I think 于洋's answer is usable. If you want to call slim's BN directly, your update_op must have a dependency on the train_op.

Last of all, read the official TF docs more; they explain how to use BN quite clearly.
Easy to use batch norm layer. · Issue #1122 · tensorflow/tensorflow
- Make sure the moving average and variance are updated during training
- Adjust the BN layer's decay parameter
Set the updates_collections parameter of the batch_norm function to None. From the docstring:

updates_collections: Collections to collect the update ops for computation. The updates_ops need to be executed with the train_op. If None, a control dependency would be added to make sure the updates are computed in place.
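A minimal illustration of that option with slim's batch_norm (inputs and is_training are hypothetical placeholders):

```python
import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 100])
is_training = tf.placeholder(tf.bool)

# With updates_collections=None, the moving statistics are updated in place
# through a control dependency, so no separate update_op needs to be wired
# to the train_op (simpler, though potentially slower)
net = slim.batch_norm(inputs, is_training=is_training,
                      updates_collections=None)
```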
Or just use one of the versions from the answers above.