Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Question

Can anyone clearly explain the difference between 1D, 2D, and 3D convolutions in CNNs (deep learning), with examples?

runhani · Answer

I want to explain using the picture from C3D.

In a nutshell, the convolution direction and the output shape are what matter!

↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑

just 1 direction (time axis) to calculate the conv
input = [W], filter = [k], output = [W]
ex) input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (interior values; with zero padding the two edge values come out as 0.75)
output shape is a 1D array
example) graph smoothing
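To make the smoothing example concrete, here is a minimal plain-Python sketch (no TensorFlow) of a 1D convolution with stride 1 and zero padding, similar in spirit to padding='SAME'. The helper name `conv1d_same` is only an illustration, not a library function. Like tf.nn.conv1d it computes a cross-correlation (the filter is not flipped), which makes no difference for this symmetric filter.

```python
def conv1d_same(signal, kernel):
    """Slide `kernel` along `signal` with stride 1, zero-padding the borders
    so the output has the same length as the input (odd kernel length assumed)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


smoothed = conv1d_same([1, 1, 1, 1, 1], [0.25, 0.5, 0.25])
print(smoothed)  # [0.75, 1.0, 1.0, 1.0, 0.75]
```

The interior values match the [1,1,1,1,1] quoted above; only the two edge values drop to 0.75, because one filter tap falls on the zero padding there.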

tf.nn.conv1d code - Toy Example

    # TensorFlow 1.x graph-mode API (tf.Session). On TensorFlow 2.x,
    # tf.compat.v1.Session() with tf.compat.v1.disable_eager_execution() plays the same role.
    import tensorflow as tf
    import numpy as np

    sess = tf.Session()

    ones_1d = np.ones(5)
    weight_1d = np.ones(3)
    strides_1d = 1

    in_1d = tf.constant(ones_1d, dtype=tf.float32)
    filter_1d = tf.constant(weight_1d, dtype=tf.float32)

    in_width = int(in_1d.shape[0])
    filter_width = int(filter_1d.shape[0])

    # conv1d expects input [batch, width, channels]
    # and filter [width, in_channels, out_channels]
    input_1d = tf.reshape(in_1d, [1, in_width, 1])
    kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
    output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
    print(sess.run(output_1d))

↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑

2 directions (x, y) to calculate the conv
output shape is a 2D matrix
input = [W, H], filter = [k, k], output = [W, H]
example) Sobel edge filter
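As an illustration of the Sobel example, the sketch below applies a horizontal-gradient Sobel kernel to a tiny image with one vertical edge. It is plain Python with zero padding and stride 1; the helper `conv2d_same` and the test image are assumptions made up for this example.

```python
def conv2d_same(image, kernel):
    """2D cross-correlation, stride 1, zero padding ('SAME'); odd kernel sizes assumed."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):          # first conv direction
        for x in range(w):      # second conv direction
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - ph, x + dx - pw
                    if 0 <= yy < h and 0 <= xx < w:  # outside the image counts as zero
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


# Horizontal-gradient Sobel kernel: responds to vertical edges.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

edges = conv2d_same(image, sobel_x)
for row in edges:
    print(row)
# [0, 0, 3, 3, 0, -3]
# [0, 0, 4, 4, 0, -4]
# [0, 0, 4, 4, 0, -4]
# [0, 0, 3, 3, 0, -3]
```

The input is a 2D matrix and so is the output. The response is non-zero only around the edge, plus an artifact in the last column where the bright region meets the zero padding.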

tf.nn.conv2d - Toy Example

    ones_2d = np.ones((5,5))
    weight_2d = np.ones((3,3))
    strides_2d = [1, 1, 1, 1]

    in_2d = tf.constant(ones_2d, dtype=tf.float32)
    filter_2d = tf.constant(weight_2d, dtype=tf.float32)

    in_width = int(in_2d.shape[0])
    in_height = int(in_2d.shape[1])

    filter_width = int(filter_2d.shape[0])
    filter_height = int(filter_2d.shape[1])

    # conv2d expects input [batch, height, width, channels]
    # and filter [height, width, in_channels, out_channels]
    input_2d = tf.reshape(in_2d, [1, in_height, in_width, 1])
    kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

    output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
    print(sess.run(output_2d))

↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑

3 directions (x, y, z) to calculate the conv
output shape is a 3D volume
input = [W, H, L], filter = [k, k, d], output = [W, H, M]
d < L is important! It is what makes the output a volume
example) C3D
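Why d < L matters can be checked with the usual output-size formula for a convolution without padding ('VALID'). The sketch below is plain Python; the function name and the value L = 16 are illustrative assumptions, not taken from C3D.

```python
def valid_output_size(input_size, filter_size, stride=1):
    """Output length along one axis for a convolution without padding ('VALID')."""
    return (input_size - filter_size) // stride + 1


L = 16  # input depth, e.g. the number of frames in a clip (illustrative value)

for d in (3, 16):
    M = valid_output_size(L, d)
    print(f"d = {d:2d}, L = {L} -> output depth M = {M}")
# d =  3, L = 16 -> output depth M = 14
# d = 16, L = 16 -> output depth M = 1
```

With d < L the filter has room to slide along z, so the output keeps a depth axis (M > 1). With d = L there is only one position along z and the output collapses to a single 2D slice.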

tf.nn.conv3d - Toy Example

    ones_3d = np.ones((5,5,5))
    weight_3d = np.ones((3,3,3))
    strides_3d = [1, 1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_3d = tf.constant(weight_3d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])
    in_depth = int(in_3d.shape[2])

    filter_width = int(filter_3d.shape[0])
    filter_height = int(filter_3d.shape[1])
    filter_depth = int(filter_3d.shape[2])

    # conv3d expects input [batch, depth, height, width, channels]
    # and filter [depth, height, width, in_channels, out_channels]
    input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
    kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

    output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
    print(sess.run(output_3d))

↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ... ↑↑↑↑

Even though the input is 3D, ex) 224x224x3, 112x112x32
the output shape is not a 3D volume but a 2D matrix
because the filter depth = L must match the input channels = L
2 directions (x, y) to calculate the conv! Not 3D
input = [W, H, L], filter = [k, k, L], output = [W, H]
output shape is a 2D matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is (stacked 2D) 3D = 2D x N matrix.
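The list above can be sketched in plain Python: the filter spans all L channels, moves along y and x only, and yields one 2D map; N filters give N stacked maps. The helper name and sizes are illustrative assumptions, and the sketch uses no padding ('VALID'), so the maps here are 3x3 rather than the same size as the input.

```python
def conv2d_multichannel(image, filt):
    """'VALID' 2D cross-correlation of an [H][W][L] input with one [k][k][L] filter.
    The filter moves along y and x only; the L channels are summed at each position."""
    h, w, channels = len(image), len(image[0]), len(image[0][0])
    k = len(filt)
    assert len(filt[0][0]) == channels, "filter depth must equal input channels"
    return [
        [
            sum(
                image[y + dy][x + dx][c] * filt[dy][dx][c]
                for dy in range(k) for dx in range(k) for c in range(channels)
            )
            for x in range(w - k + 1)
        ]
        for y in range(h - k + 1)
    ]


H, W, L, K, N = 5, 5, 3, 3, 4  # illustrative sizes; N = number of filters
image = [[[1] * L for _ in range(W)] for _ in range(H)]
filters = [[[[1] * L for _ in range(K)] for _ in range(K)] for _ in range(N)]

one_map = conv2d_multichannel(image, filters[0])            # one filter -> 2D map
stacked = [conv2d_multichannel(image, f) for f in filters]  # N filters -> N stacked maps

print(len(one_map), len(one_map[0]), one_map[0][0])       # 3 3 27
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 4 3 3
```

Each output value is 3 x 3 x 3 = 27 because every tap of the all-ones filter lands on a one. Note that this sketch stacks the maps filter-first, whereas TensorFlow keeps the channel axis last.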

conv2d - LeNet, VGG, ... for 1 filter

    in_channels = 32 # 3 for RGB, 32, 64, 128, ...
    ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
    # filter must have 3d-shape with in_channels
    weight_3d = np.ones((3,3,in_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_3d = tf.constant(weight_3d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_3d.shape[0])
    filter_height = int(filter_3d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

    # one filter -> the squeezed output is a 2D matrix
    output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
    print(sess.run(output_2d))

conv2d - LeNet, VGG, ... for N filters

    in_channels = 32 # 3 for RGB, 32, 64, 128, ...
    out_channels = 64 # 128, 256, ...
    ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
    # filter must have 3d-shape x number of filters = 4D
    weight_4d = np.ones((3,3,in_channels, out_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_4d = tf.constant(weight_4d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_4d.shape[0])
    filter_height = int(filter_4d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

    # output stacked shape is 3D = 2D x N matrix
    output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
    print(sess.run(output_3d))

↑↑↑↑ Bonus: 1x1 conv in CNN - GoogLeNet, ... ↑↑↑↑

1x1 conv is confusing when you think of it as a 2D image filter like Sobel
for 1x1 conv in a CNN, the input is 3D-shaped, as above.
it computes depth-wise filtering: a weighted sum across the L channels at each pixel
input = [W, H, L], filter = [1, 1, L], output = [W, H]
the stacked output shape is 3D = 2D x N matrix.
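A small plain-Python sketch may help here: a 1x1 conv never looks at neighbouring pixels, it only mixes the L channels of each pixel into N new channels. The helper name `conv1x1`, the weights layout [L][N] and the numbers are assumptions chosen for illustration.

```python
def conv1x1(image, weights):
    """1x1 convolution: image is [H][W][L], weights is [L][N].
    Each output pixel is a weighted sum of the L input channels at that same pixel."""
    n_in, n_out = len(weights), len(weights[0])
    return [
        [
            [sum(pixel[c] * weights[c][n] for c in range(n_in)) for n in range(n_out)]
            for pixel in row
        ]
        for row in image
    ]


# 2x2 image with L = 3 channels, mapped to N = 2 channels (illustrative values).
image = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [0, 0, 1]]]
weights = [[1, 0],
           [1, 0],
           [1, 2]]  # filter 0 sums all channels; filter 1 doubles the last channel

out = conv1x1(image, weights)
print(out)  # [[[6, 6], [15, 12]], [[24, 18], [1, 2]]]
```

The spatial size (2x2) is unchanged and only the channel count goes from 3 to 2, which is why 1x1 convs are commonly used to shrink or expand the channel dimension.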

tf.nn.conv2d - special case 1x1 conv

    in_channels = 32 # 3 for RGB, 32, 64, 128, ...
    out_channels = 64 # 128, 256, ...
    ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
    # filter must have 3d-shape x number of filters = 4D, with 1x1 spatial size
    weight_4d = np.ones((1,1,in_channels, out_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_4d = tf.constant(weight_4d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_4d.shape[0])
    filter_height = int(filter_4d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

    # output stacked shape is 3D = 2D x N matrix
    output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
    print(sess.run(output_3d))

Animation (2D conv with 3D inputs)

- Original link: LINK
- Author: Martin Görner
- Twitter: @martin_gorner
- Google +: plus.google.com/+MartinGorne

Bonus: 1D Convolutions with 2D input

↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑

↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑

Even though the input is 2D, ex) 20x14
the output shape is not 2D but a 1D matrix
because the filter height = L must match the input height = L
1 direction (x) to calculate the conv! Not 2D
input = [W, L], filter = [k, L], output = [W]
output shape is a 1D matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is (stacked 1D) 2D = 1D x N matrix.
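Here is the same idea as a plain-Python sketch for the 20x14 input mentioned above: the filter covers all L = 14 rows and slides along W only. The helper name, the filter width k = 3 and N = 2 are illustrative assumptions, and no padding is used ('VALID'), so the output length is 18 rather than 20.

```python
def conv1d_multichannel(signal, filt):
    """'VALID' 1D cross-correlation of a [W][L] input with one [k][L] filter.
    The filter spans all L rows/channels and slides along W only."""
    width, k = len(signal), len(filt)
    return [
        sum(signal[x + dx][c] * filt[dx][c]
            for dx in range(k) for c in range(len(filt[0])))
        for x in range(width - k + 1)
    ]


W, L, K, N = 20, 14, 3, 2  # 20x14 input as in the text; K and N are illustrative
signal = [[1] * L for _ in range(W)]
filters = [[[1] * L for _ in range(K)] for _ in range(N)]

one_row = conv1d_multichannel(signal, filters[0])            # one filter -> 1D output
stacked = [conv1d_multichannel(signal, f) for f in filters]  # N filters -> 2D output

print(len(one_row), one_row[0])       # 18 42
print(len(stacked), len(stacked[0]))  # 2 18
```

Each value is 3 x 14 = 42 because the all-ones filter covers 3 positions times 14 channels of ones.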

Bonus C3D

    in_channels = 32 # 3, 32, 64, 128, ...
    out_channels = 64 # 3, 32, 64, 128, ...
    ones_4d = np.ones((5,5,5,in_channels))
    weight_5d = np.ones((3,3,3,in_channels,out_channels))
    strides_3d = [1, 1, 1, 1, 1]

    in_4d = tf.constant(ones_4d, dtype=tf.float32)
    filter_5d = tf.constant(weight_5d, dtype=tf.float32)

    in_width = int(in_4d.shape[0])
    in_height = int(in_4d.shape[1])
    in_depth = int(in_4d.shape[2])

    filter_width = int(filter_5d.shape[0])
    filter_height = int(filter_5d.shape[1])
    filter_depth = int(filter_5d.shape[2])

    input_4d = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
    kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

    # N filters over a 3D volume -> the output is a stacked 4D tensor (plus batch)
    output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
    print(sess.run(output_4d))

    sess.close()

Input & Output in TensorFlow

Summary

Jerry Liu · Answer

CNN 1D, 2D, or 3D refers to the convolution direction, rather than the input or filter dimension.
For a 1-channel input, CNN2D is equivalent to CNN1D when the kernel length equals the input length (1 conv direction).
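A quick way to see this is a plain-Python sketch (no padding, stride 1; the helper name and the numbers are illustrative assumptions): when the kernel height equals the input height, a 2D convolution can only slide along x, so the output has a single row.

```python
def conv2d_valid(image, kernel):
    """'VALID' 2D cross-correlation of a single-channel [H][W] image with a [kh][kw] kernel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(kh) for dx in range(kw))
            for x in range(w - kw + 1)
        ]
        for y in range(h - kh + 1)
    ]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]  # H = 3, W = 4, one channel
kernel = [[1, 0],
          [0, 1],
          [1, 1]]          # kernel height 3 == input height

out = conv2d_valid(image, kernel)
print(out)  # [[26, 30, 34]]
```

There is exactly one valid position along y, so the only remaining conv direction is x. The result is the same as a 1D convolution over the width that treats the 3 rows as 3 input channels.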