How can I use batch normalization in TensorFlow?

Question

I would like to use batch normalization in TensorFlow. I found the related C++ source code in core/ops/nn_ops.cc. However, I did not find it documented on tensorflow.org.

BN has different semantics in MLPs and CNNs, so I am not sure what exactly this BN does.

I did not find a method called MovingMoments either.

dga · Accepted Answer

Update July 2016: The easiest way to use batch normalization in TensorFlow is through the higher-level interfaces provided in contrib/layers, tflearn, or slim.

Previous answer if you want to DIY: The documentation string for this op has improved since the release; see the docs comment in the master branch instead of the one you found. It clarifies, in particular, that the mean and variance inputs are the outputs of tf.nn.moments.

You can see a very simple example of its use in the batch_norm test code. For a more real-world example, I've included below the helper class and usage notes that I wrote up for my own use (no warranty provided!):

"""A helper class for managing batch normalization state. This class is designed to simplify adding batch normalization (http://arxiv.org/pdf/1502.03167v3.pdf) to your model by managing the state variables associated with it. Important use note: The function get_assigner() returns an op that must be executed to save the updated state. A suggested way to do this is to make execution of the model optimizer force it, e.g., by: update_assignments = tf.group(bn1.get_assigner(), bn2.get_assigner()) with tf.control_dependencies([optimizer]): optimizer = tf.group(update_assignments) """ import tensorflow as tf class ConvolutionalBatchNormalizer(object): """Helper class that groups the normalization logic and variables. Use: ewma = tf.train.ExponentialMovingAverage(decay=0.99) bn = ConvolutionalBatchNormalizer(depth, 0.001, ewma, True) update_assignments = bn.get_assigner() x = bn.normalize(y, train=training?) (the output x will be batch-normalized). """ def __init__(self, depth, epsilon, ewma_trainer, scale_after_norm): self.mean = tf.Variable(tf.constant(0.0, shape=[depth]), trainable=False) self.variance = tf.Variable(tf.constant(1.0, shape=[depth]), trainable=False) self.beta = tf.Variable(tf.constant(0.0, shape=[depth])) self.gamma = tf.Variable(tf.constant(1.0, shape=[depth])) self.ewma_trainer = ewma_trainer self.epsilon = epsilon self.scale_after_norm = scale_after_norm def get_assigner(self): """Returns an EWMA apply op that must be invoked after optimization.""" return self.ewma_trainer.apply([self.mean, self.variance]) def normalize(self, x, train=True): """Returns a batch-normalized version of x.""" if train: mean, variance = tf.nn.moments(x, [0, 1, 2]) assign_mean = self.mean.assign(mean) assign_variance = self.variance.assign(variance) with tf.control_dependencies([assign_mean, assign_variance]): return tf.nn.batch_norm_with_global_normalization( x, mean, variance, self.beta, self.gamma, self.epsilon, self.scale_after_norm) else: mean = 
self.ewma_trainer.average(self.mean) variance = self.ewma_trainer.average(self.variance) local_beta = tf.identity(self.beta) local_gamma = tf.identity(self.gamma) return tf.nn.batch_norm_with_global_normalization( x, mean, variance, local_beta, local_gamma, self.epsilon, self.scale_after_norm)
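For reference, in training mode the class above feeds the batch moments into tf.nn.batch_norm_with_global_normalization, which applies the standard transform y = gamma * (x - mean) / sqrt(variance + epsilon) + beta (gamma is only multiplied in when scale_after_normalization is True). The following is a plain-Python sketch of that arithmetic for a single feature. It is an illustration with hypothetical helper names, not TensorFlow code. As in tf.nn.moments, the variance is the population variance (divided by N, not N - 1).

```python
import math


def moments(values):
    """Mean and population variance (divide by N), as tf.nn.moments computes them."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


def batch_norm_1d(values, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize one feature across the batch, then scale by gamma and shift by beta."""
    mean, variance = moments(values)
    inv_std = 1.0 / math.sqrt(variance + epsilon)
    return [gamma * (v - mean) * inv_std + beta for v in values]


batch = [1.0, 2.0, 3.0, 4.0]
print(moments(batch))  # (2.5, 1.25)
print(batch_norm_1d(batch, epsilon=0.0))
```

With epsilon set to 0 the output has mean 0 and variance 1. A non-zero epsilon only guards against dividing by a near-zero variance.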

Note that I called it ConvolutionalBatchNormalizer because it pins the use of tf.nn.moments to compute over axes 0, 1, and 2, whereas for non-convolutional use you might only want axis 0.
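The choice of axes is what separates the MLP and CNN semantics raised in the question. With axes [0], each feature of a [batch, features] activation gets its own statistics. With axes [0, 1, 2], each channel of a [batch, height, width, channels] activation is averaged over the batch and over all spatial positions. Below is a plain-Python sketch of both reductions on nested lists. The helper names are hypothetical and no TensorFlow is involved.

```python
def _mean_var(values):
    """Mean and population variance of a flat list."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n


def feature_moments_nd(x):
    """x shaped [N][D]; like tf.nn.moments(x, [0]): one (mean, var) per feature."""
    return [_mean_var([row[d] for row in x]) for d in range(len(x[0]))]


def channel_moments_nhwc(x):
    """x shaped [N][H][W][C]; like tf.nn.moments(x, [0, 1, 2]): one (mean, var) per channel."""
    num_channels = len(x[0][0][0])
    return [
        _mean_var([pixel[c] for image in x for row in image for pixel in row])
        for c in range(num_channels)
    ]


# Fully connected activations: batch of 2, 3 features.
fc = [[1.0, 2.0, 3.0],
      [3.0, 2.0, 1.0]]
print(feature_moments_nd(fc))  # [(2.0, 1.0), (2.0, 0.0), (2.0, 1.0)]

# Conv activations: batch of 1, a 2x1 spatial grid, 2 channels.
conv = [[[[0.0, 10.0]], [[2.0, 30.0]]]]
print(channel_moments_nhwc(conv))  # [(1.0, 1.0), (20.0, 100.0)]
```

Per-channel statistics keep the normalization consistent with weight sharing: every spatial position of a feature map is scaled and shifted by the same gamma and beta.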

Feedback appreciated if you use it.

Matthew Rahtz · Answer

As of TensorFlow 1.0 (February 2017), there is also the high-level tf.layers.batch_normalization API included in TensorFlow itself.

It's super simple to use:

# Set this to True for training and False for testing
training = tf.placeholder(tf.bool)

x = tf.layers.dense(input_x, units=100)
x = tf.layers.batch_normalization(x, training=training)
x = tf.nn.relu(x)

...except that it adds extra ops to the graph (for updating its mean and variance variables) in such a way that they won't be dependencies of your training op. You can either just run the ops separately:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
sess.run([train_op, extra_update_ops], ...)

or add the update ops as dependencies of your training op manually, then just run your training op as normal:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_op = optimizer.minimize(loss)
...
sess.run([train_op], ...)
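Skipping those update ops matters because, with training=False, the layer normalizes with its moving mean and variance, and those only change when the update ops run. The sketch below is plain Python for a single feature. It assumes the layer's default momentum of 0.99, its default initial values (moving mean 0, moving variance 1), and the update rule moving = momentum * moving + (1 - momentum) * batch_stat. The helper names `update_moving_stats` and `simulate` are hypothetical and are not part of TensorFlow.

```python
def update_moving_stats(moving_mean, moving_var, batch_mean, batch_var, momentum=0.99):
    """What one run of the UPDATE_OPS does to a single feature's moving statistics."""
    new_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    new_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return new_mean, new_var


def simulate(steps, run_update_ops, batch_mean=5.0, batch_var=4.0):
    """Run `steps` training steps on data whose true mean/variance are 5 and 4."""
    moving_mean, moving_var = 0.0, 1.0  # initial values of the moving statistics
    for _ in range(steps):
        # train_op runs here; the moving statistics change only if UPDATE_OPS run too.
        if run_update_ops:
            moving_mean, moving_var = update_moving_stats(
                moving_mean, moving_var, batch_mean, batch_var)
    return moving_mean, moving_var


print(simulate(1000, run_update_ops=False))  # (0.0, 1.0)
print(simulate(1000, run_update_ops=True))   # close to (5.0, 4.0)
```

If the update ops never run, inference keeps normalizing with mean 0 and variance 1. This is usually why a model behaves well in training mode and badly at test time.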

bgshi · Answer

The following works fine for me, and there is no need to call the EMA apply op from outside.

import numpy as np
import tensorflow as tf


def batch_norm(x, n_out, phase_train, scope='bn'):
    """
    Batch normalization on convolutional maps.
    Args:
        x:           Tensor, 4D BHWD input maps
        n_out:       integer, depth of input maps
        phase_train: boolean tf.Variable or placeholder, true indicates training phase
        scope:       string, variable scope
    Return:
        normed:      batch-normalized maps
    """
    with tf.variable_scope(scope):
        beta = tf.Variable(tf.constant(0.0, shape=[n_out]),
                           name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[n_out]),
                            name='gamma', trainable=True)
        batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            # The EMA update is created inside the training branch only,
            # so it does not run when phase_train is False.
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        mean, var = tf.cond(phase_train,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed

Example:

import math

n_in, n_out = 3, 16
ksize = 3
stride = 1
phase_train = tf.placeholder(tf.bool, name='phase_train')
input_image = tf.placeholder(tf.float32, name='input_image')
kernel = tf.Variable(tf.truncated_normal([ksize, ksize, n_in, n_out],
                                         stddev=math.sqrt(2.0/(ksize*ksize*n_out))),
                     name='kernel')
conv = tf.nn.conv2d(input_image, kernel, [1, stride, stride, 1], padding='SAME')
conv_bn = batch_norm(conv, n_out, phase_train)
relu = tf.nn.relu(conv_bn)

with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    for i in range(20):
        test_image = np.random.rand(4, 32, 32, 3)
        sess_outputs = session.run([relu],
                                   {input_image.name: test_image, phase_train.name: True})
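The tf.train.ExponentialMovingAverage used above maintains each shadow value with the documented rule shadow = decay * shadow + (1 - decay) * value. The plain-Python sketch below shows why decay=0.5 makes the inference-time statistics follow only the last few batches, while a decay closer to 1, such as 0.99, averages over many more. The shadow start value of 0.0 is an assumption chosen for illustration, and `ema_trace` is a hypothetical helper.

```python
def ema_trace(values, decay, initial=0.0):
    """Apply shadow = decay * shadow + (1 - decay) * value for each value in turn."""
    shadow = initial
    trace = []
    for value in values:
        shadow = decay * shadow + (1.0 - decay) * value
        trace.append(shadow)
    return trace


batch_means = [4.0] * 5  # the same batch mean observed five times
print(ema_trace(batch_means, decay=0.5))   # [2.0, 3.0, 3.5, 3.75, 3.875]
print(ema_trace(batch_means, decay=0.99))  # still far from 4.0 after five steps
```

A small decay adapts quickly but yields noisy estimates from only a few batches. A large decay gives smoother population statistics but needs many more training steps to converge.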

Pinocchio · Answer

There is also an "official" batch normalization layer coded by the developers. It is not very well documented, but here is how I believe it should be used:

import tensorflow as tf
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def batch_norm_layer(x, train_phase, scope_bn):
    # Each batch_norm call is built inside its own tf.cond branch: tensors
    # created outside tf.cond are computed whichever branch is taken, so the
    # in-place moving-average update of the training call would also run
    # at inference time.
    return tf.cond(
        train_phase,
        lambda: batch_norm(x, decay=0.999, center=True, scale=True,
                           updates_collections=None,
                           is_training=True,
                           reuse=None,  # first call creates the variables
                           trainable=True,
                           scope=scope_bn),
        lambda: batch_norm(x, decay=0.999, center=True, scale=True,
                           updates_collections=None,
                           is_training=False,
                           reuse=True,  # second call shares them
                           trainable=True,
                           scope=scope_bn))

To use it, you need to create a placeholder for train_phase that indicates whether you are in the training or inference phase (as in train_phase = tf.placeholder(tf.bool, name='phase_train')). Its value can be fed during inference or training through a tf.Session, as in:

test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_:batch_ytest, train_phase: False})

or during training:

sess.run(fetches=train_step, feed_dict={x: batch_xs, y_:batch_ys, train_phase: True})

I'm pretty sure this is correct, based on the discussion on GitHub.

There seems to be another useful link:

http://r2rt.com/implementing-batch-normalization-in-tensorflow.html
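The two branches are built in one shared scope (reuse=True) because training and inference must read the same beta, gamma and moving statistics, while differing in which mean and variance they normalize with. Below is a plain-Python sketch of that split for a single feature. SimpleBatchNorm is a hypothetical illustration, not the contrib layer, although its decay and epsilon arguments play the same roles (moving-average decay and variance offset).

```python
import math


class SimpleBatchNorm:
    """Hypothetical single-feature batch norm whose parameters are shared by both phases."""

    def __init__(self, decay=0.999, epsilon=1e-3):
        self.gamma, self.beta = 1.0, 0.0              # learned scale and shift
        self.moving_mean, self.moving_var = 0.0, 1.0  # population estimates
        self.decay, self.epsilon = decay, epsilon

    def __call__(self, batch, train_phase):
        if train_phase:
            # Training: normalize with this batch's statistics and update the estimates.
            n = len(batch)
            mean = sum(batch) / n
            var = sum((v - mean) ** 2 for v in batch) / n
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            # Inference: only read the accumulated estimates; nothing is updated.
            mean, var = self.moving_mean, self.moving_var
        inv_std = 1.0 / math.sqrt(var + self.epsilon)
        return [self.gamma * (v - mean) * inv_std + self.beta for v in batch]


bn = SimpleBatchNorm(decay=0.9)
for _ in range(200):
    bn([1.0, 3.0], train_phase=True)  # batch mean 2, variance 1
print(bn([2.0], train_phase=False))   # close to [0.0]
```

The inference call works even on a batch of one example, because it does not depend on the statistics of the batch being evaluated.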

Martina Marek · Answer

You can simply use the built-in batch_norm layer:

# Both calls share one scope so the inference branch reuses the training variables.
batch_norm = tf.cond(is_train,
    lambda: tf.contrib.layers.batch_norm(prev, activation_fn=tf.nn.relu,
                                         is_training=True, reuse=None, scope='bn'),
    lambda: tf.contrib.layers.batch_norm(prev, activation_fn=tf.nn.relu,
                                         is_training=False, reuse=True, scope='bn'))

where prev is the output of your previous layer (which can be either a fully connected or a convolutional layer) and is_train is a boolean placeholder. Just use batch_norm as the input to the next layer.

jrock · Answer

Since someone recently edited this, I'd like to clarify that this is no longer an issue.

This answer does not seem to be correct: when phase_train is set to false, it still updates the EMA mean and variance. This can be verified with the following code snippet.

# Uses the batch_norm function from the answer above.
sess = tf.InteractiveSession()  # lets .eval() and .run() use a default session

x = tf.placeholder(tf.float32, [None, 20, 20, 10], name='input')
phase_train = tf.placeholder(tf.bool, name='phase_train')

# generate random noise to pass into batch norm
x_gen = tf.random_normal([50, 20, 20, 10])
y = x_gen.eval()

bn = batch_norm(x, 10, phase_train)
tf.initialize_all_variables().run()

train_step = lambda: bn.eval({x: x_gen.eval(), phase_train: True})
test_step = lambda: bn.eval({x: y, phase_train: False})
test_step_c = lambda: bn.eval({x: y, phase_train: True})

# Verify that this is different as expected, two different x's have different norms
print(train_step()[0][0][0])
print(train_step()[0][0][0])

# Verify that this is the same as expected, same x's (y) have same norm
print(test_step_c()[0][0][0])
print(test_step_c()[0][0][0])

# THIS IS DIFFERENT but should be the same, should only be reading from the ema.
print(test_step()[0][0][0])
print(test_step()[0][0][0])
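The property this snippet checks is that inference must only read the moving averages: feeding the same y twice with phase_train=False has to give the same output. The stripped-down plain-Python sketch below shows the symptom that appears if the EMA update also executes in the inference branch, which is one way the reported behavior can arise. It covers mean-centering only (no variance or scaling), and the helper names are hypothetical.

```python
DECAY = 0.5  # same decay as the ExponentialMovingAverage in the batch_norm function


def ema_update(moving_mean, batch):
    """shadow = decay * shadow + (1 - decay) * batch_mean."""
    batch_mean = sum(batch) / len(batch)
    return DECAY * moving_mean + (1 - DECAY) * batch_mean


def inference_buggy(state, batch):
    # Illustrated bug: the EMA update still runs in the inference branch.
    state["moving_mean"] = ema_update(state["moving_mean"], batch)
    return [v - state["moving_mean"] for v in batch]


def inference_correct(state, batch):
    # Inference only reads the moving average; nothing is updated.
    return [v - state["moving_mean"] for v in batch]


y = [3.0, 5.0]
buggy_state = {"moving_mean": 0.0}
print(inference_buggy(buggy_state, y))  # [1.0, 3.0]
print(inference_buggy(buggy_state, y))  # [0.0, 2.0]  (same input, different output)

correct_state = {"moving_mean": 2.0}
print(inference_correct(correct_state, y))  # [1.0, 3.0]
print(inference_correct(correct_state, y))  # [1.0, 3.0]
```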

Maksim Khaitovich · Answer

Using TensorFlow's built-in batch_norm layer, below is the code to load data, build a network with one hidden ReLU layer and L2 regularization, and apply batch normalization to the input and to the hidden layer's output. It runs fine and trains fine. Just FYI, this example is mostly built on the data and code from the Udacity Deep Learning course. P.S. Yes, parts of this were covered in earlier answers, but I decided to gather everything into one code snippet so that you have an example of the whole network training process with batch normalization, and its evaluation.

# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle

pickle_file = '/home/maxkhk/Documents/Udacity/DeepLearningCourse/SourceCode/tensorflow/examples/udacity/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

image_size = 28
num_labels = 10

def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
    # Map 2 to [0.0, 1.0, 0.0 ...], 3 to [0.0, 0.0, 1.0 ...]
    labels = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# For NeuralNetwork model code is below
# We will use SGD for training to save our time. Code is from Assignment 2
# beta is the new parameter - controls level of regularization.
# Feel free to play with it - the best one I found is 0.001
# notice, we introduce L2 for both biases and weights of all layers

batch_size = 128
beta = 0.001

# building tensorflow graph
graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # introduce batchnorm
    tf_train_dataset_bn = tf.contrib.layers.batch_norm(tf_train_dataset)

    # now let's build our new hidden layer
    # that's how many hidden neurons we want
    num_hidden_neurons = 1024
    # its weights
    hidden_weights = tf.Variable(
        tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
    hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))

    # now the layer itself. It multiplies data by weights, adds biases
    # and takes ReLU over result
    hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset_bn, hidden_weights) + hidden_biases)

    # adding the batch normalization layer
    hidden_layer_bn = tf.contrib.layers.batch_norm(hidden_layer)

    # time to go for output linear layer
    # out weights connect hidden neurons to output labels
    # biases are added to output labels
    out_weights = tf.Variable(
        tf.truncated_normal([num_hidden_neurons, num_labels]))
    out_biases = tf.Variable(tf.zeros([num_labels]))

    # compute output
    out_layer = tf.matmul(hidden_layer_bn, out_weights) + out_biases
    # our real output is a softmax of prior result
    # and we also compute its cross-entropy to get our loss
    # Notice - we introduce our L2 here
    loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=tf_train_labels, logits=out_layer) +
        beta*tf.nn.l2_loss(hidden_weights) +
        beta*tf.nn.l2_loss(hidden_biases) +
        beta*tf.nn.l2_loss(out_weights) +
        beta*tf.nn.l2_loss(out_biases)))

    # now we just minimize this loss to actually train the network
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Nice, now let's calculate the predictions on each dataset for evaluating the
    # performance so far
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(out_layer)
    # Note: the validation and test paths below skip both batch_norm layers.
    valid_relu = tf.nn.relu(
        tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
    valid_prediction = tf.nn.softmax(
        tf.matmul(valid_relu, out_weights) + out_biases)

    test_relu = tf.nn.relu(
        tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)

# now is the actual training on the ANN we built
# we will run it for some number of steps and evaluate the progress after
# every 500 steps

# number of steps we will train our ANN
num_steps = 3001

# actual training
with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 500 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(
                valid_prediction.eval(), valid_labels))
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
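tf.nn.l2_loss returns half the sum of squares (no square root), so each beta term in the loss above adds beta * sum(w**2) / 2. Because the L2 terms are scalars, adding them inside tf.reduce_mean gives the same result as adding them after it. Below is a plain-Python sketch of that arithmetic with toy values. The helper names are hypothetical and the numbers are not taken from the model.

```python
def l2_loss(values):
    """Same quantity as tf.nn.l2_loss: half the sum of squares."""
    return sum(v * v for v in values) / 2.0


def regularized_loss(data_loss, param_groups, beta=0.001):
    """Data loss plus beta * l2_loss for each parameter group."""
    return data_loss + beta * sum(l2_loss(p) for p in param_groups)


hidden_weights = [0.5, -1.0, 2.0]  # toy values
out_biases = [1.0, 1.0]            # toy values
print(l2_loss(hidden_weights))     # 2.625
print(regularized_loss(1.0, [hidden_weights, out_biases], beta=0.1))
```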

Rob Romijnders · Answer

So here is a simple example of using this batchnorm class (the ConvolutionalBatchNormalizer from the accepted answer):

from bn_class import *

with tf.name_scope('Batch_norm_conv1') as scope:
    ewma = tf.train.ExponentialMovingAverage(decay=0.99)
    bn_conv1 = ConvolutionalBatchNormalizer(num_filt_1, 0.001, ewma, True)
    update_assignments = bn_conv1.get_assigner()
    a_conv1 = bn_conv1.normalize(a_conv1, train=bn_train)
    h_conv1 = tf.nn.relu(a_conv1)