tf.layers.batch_normalization grande erreur de test

Question

J'essaie d'utiliser la normalisation par lots. J'ai essayé d'utiliser tf.layers.batch_normalization sur un simple conv net pour mnist.

J'obtiens une grande précision pour l'étape du train (> 98%) mais une très faible précision de test (<50%). J'ai essayé de changer les valeurs de momentum (j'ai essayé 0.8.0.9,0.99,0.999) et de jouer avec des tailles de lot, mais cela se comporte toujours essentiellement de la même manière. Je l'entraîne sur 20 000 itérations.

mon code

# Input placeholders x = tf.placeholder(tf.float32, [None, 784], name='x-input') y_ = tf.placeholder(tf.float32, [None, 10], name='y-input') is_training = tf.placeholder(tf.bool) # inut layer input_layer = tf.reshape(x, [-1, 28, 28, 1]) with tf.name_scope('conv1'): #Convlution #1 ([5,5] : [28x28x1]->[28x28x6]) conv1 = tf.layers.conv2d( inputs=input_layer, filters=6, kernel_size=[5, 5], padding="same", activation=None ) #Batch Norm #1 conv1_bn = tf.layers.batch_normalization( inputs=conv1, axis=-1, momentum=0.9, epsilon=0.001, center=True, scale=True, training = is_training, name='conv1_bn' ) #apply relu conv1_bn_relu = tf.nn.relu(conv1_bn) #apply pool ([2,2] : [28x28x6]->[14X14X6]) maxpool1=tf.layers.max_pooling2d( inputs=conv1_bn_relu, pool_size=[2,2], strides=2, padding="valid" ) with tf.name_scope('conv2'): #convolution #2 ([5x5] : [14x14x6]->[14x14x16] conv2 = tf.layers.conv2d( inputs=maxpool1, filters=16, kernel_size=[5, 5], padding="same", activation=None ) #Batch Norm #2 conv2_bn = tf.layers.batch_normalization( inputs=conv2, axis=-1, momentum=0.999, epsilon=0.001, center=True, scale=True, training = is_training ) #apply relu conv2_bn_relu = tf.nn.relu(conv2_bn) #maxpool2 ([2,2] : [14x14x16]->[7x7x16] maxpool2=tf.layers.max_pooling2d( inputs=conv2_bn_relu, pool_size=[2,2], strides=2, padding="valid" ) #fully connected 1 [7*7*16 = 784 -> 120] maxpool2_flat=tf.reshape(maxpool2,[-1,7*7*16]) fc1 = tf.layers.dense( inputs=maxpool2_flat, units=120, activation=None ) #Batch Norm #2 fc1_bn = tf.layers.batch_normalization( inputs=fc1, axis=-1, momentum=0.999, epsilon=0.001, center=True, scale=True, training = is_training ) #apply reliu fc1_bn_relu = tf.nn.relu(fc1_bn) #fully connected 2 [120-> 84] fc2 = tf.layers.dense( inputs=fc1_bn_relu, units=84, activation=None ) #apply relu fc2_bn_relu = tf.nn.relu(fc2) #fully connected 3 [84->10]. Output layer with softmax y = tf.layers.dense( inputs=fc2_bn_relu, units=10, activation=None ) #loss cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)) tf.summary.scalar('cross entropy', cross_entropy) correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) tf.summary.scalar('accuracy',accuracy) #merge summaries and init train writer sess = tf.Session() merged = tf.summary.merge_all() train_writer = tf.summary.FileWriter(log_dir + '/train' ,sess.graph) test_writer = tf.summary.FileWriter(log_dir + '/test') train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) init = tf.global_variables_initializer() sess.run(init) with sess.as_default(): def get_variables_values(): variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES) values = {} for variable in variables: values[variable.name[:-2]] = sess.run(variable, feed_dict={ x:batch[0], y_:batch[1], is_training:True }) return values for i in range(t_iter): batch = mnist.train.next_batch(batch_size) if i%100 == 0: #test-set summary print('####################################') values = get_variables_values() print('moving variance is:') print(values["conv1_bn/moving_variance"]) print('moving mean is:') print(values["conv1_bn/moving_mean"]) print('gamma is:') print(values["conv1_bn/gamma/Adam"]) print('beta is:') print(values["conv1_bn/beta/Adam"]) summary, acc = sess.run([merged,accuracy], feed_dict={ x:mnist.test.images, y_:mnist.test.labels, is_training:False }) else: summary, _ = sess.run([merged,train_step], feed_dict={ x:batch[0], y_:batch[1], is_training:True }) if i%10 == 0: train_writer.add_summary(summary,i)

Je pense que le problème est que le Moving_mean/var n'est pas mis à jour. J'imprime la variable moving_mean/var pendant l'exécution et j'obtiens: la variance mobile est: [1. 1. 1. 1. 1. 1.] la moyenne mobile est: [0. 0. 0. 0. 0. 0.] gamma est: [-0.00055969 0.00164391 0.00163301 -0.00206227 -0.00011434 -0.00070161] beta est: [-0.00232835 -0.00040769 0.00114277 -0.0025414 -0.00049697 0.00221556]

Quelqu'un a une idée de ce que je fais mal?

Matthew Rahtz · Accepted Answer

Les opérations qui tf.layers.batch_normalization les ajouts pour mettre à jour la moyenne et la variance ne sont pas automatiquement ajoutés en tant que dépendances de l'exploitation du train - donc si vous ne faites rien de plus, ils ne sont jamais exécutés. (Malheureusement, la documentation ne le mentionne pas actuellement. J'ouvre un problème à ce sujet.)

Heureusement, les opérations de mise à jour sont faciles à obtenir, car elles sont ajoutées au tf.GraphKeys.UPDATE_OPS collection. Ensuite, vous pouvez soit exécuter les opérations supplémentaires manuellement:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) sess.run([train_op, extra_update_ops], ...)

Ou ajoutez-les en tant que dépendances de votre opération d'entraînement, puis exécutez simplement votre opération d'entraînement comme d'habitude:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) with tf.control_dependencies(extra_update_ops): train_op = optimizer.minimize(loss) ... sess.run([train_op], ...)