
keras BLSTM for sequence tagging

I'm relatively new to neural networks, so please excuse my ignorance. I'm trying to adapt the keras BLSTM example here. The example reads in texts and classifies them as 0 or 1. I want a BLSTM that does something much like POS tagging; extras like lemmatization or other advanced features aren't necessary, a basic model is all I need. My data is a list of sentences, and each word is assigned a category from 1 to 8. I want to train a BLSTM that can use this data to predict the category of each word in an unseen sentence.

e.g. input = ['Le', 'chien', 'est', 'rouge'] gives output = [2, 4, 3, 7]
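For context, a minimal sketch of the integer encoding such a model consumes (the word_ids lookup table here is hypothetical; in practice it would be built from the training vocabulary):

import numpy as np
from keras.preprocessing import sequence

# hypothetical word-to-index lookup built from the training corpus
word_ids = {'Le': 12, 'chien': 3, 'est': 55, 'rouge': 4}

sentence = ['Le', 'chien', 'est', 'rouge']
tags = [2, 4, 3, 7]

x = [word_ids[w] for w in sentence]  # [12, 3, 55, 4]
# both the word ids and the tags get zero-padded to a fixed length,
# so index 0 should be reserved for padding, not used as a real id
x_padded = sequence.pad_sequences([x], maxlen=18, value=0)
y_padded = sequence.pad_sequences([tags], maxlen=18, value=0)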

If the keras example isn't the best route, I'm open to other suggestions.

I currently have this:

'''Train a Bidirectional LSTM.'''

from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Model
from keras.layers import Dense, Dropout, Embedding, LSTM, Input, merge
from prep_nn import prep_scan


np.random.seed(1337)  # for reproducibility
max_features = 20000
batch_size = 16
maxlen = 18

print('Loading data...')
(X_train, y_train), (X_test, y_test) = prep_scan(nb_words=max_features,
                                                 test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print("Pad sequences (samples x time)")
# type issues here? float/int?
X_train = sequence.pad_sequences(X_train, maxlen=maxlen, value=0.)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen, value=0.)  # pad with zeros

print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

# need to pad y too, because there is more than 1 output value, not classification?
y_train = sequence.pad_sequences(np.array(y_train), maxlen=maxlen, value=0.)
y_test = sequence.pad_sequences(np.array(y_test), maxlen=maxlen, value=0.)

print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)

# this is the placeholder tensor for the input sequences
# (note: this rebinds `sequence`, shadowing the keras.preprocessing.sequence import above)
sequence = Input(shape=(maxlen,), dtype='int32')

# this embedding layer will transform the sequences of integers
# into vectors of size 128
embedded = Embedding(max_features, 128, input_length=maxlen)(sequence)

# apply forwards LSTM
forwards = LSTM(64)(embedded)
# apply backwards LSTM
backwards = LSTM(64, go_backwards=True)(embedded)

# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.5)(merged)
# the number of units in this Dense has to correspond to the output matrix?
output = Dense(17, activation='sigmoid')(after_dp)

model = Model(input=sequence, output=output)

# try using different optimizers and different optimizer configs
model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])

print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
nb_epoch=4,
          validation_data=[X_test, y_test])

X_test_new = np.array([[0,0,0,0,0,0,0,0,0,12,3,55,4,34,5,45,3,9],[0,0,0,0,0,0,0,1,7,65,34,67,34,23,24,67,54,43,]])

classes = model.predict(X_test_new, batch_size=16)
print(classes)

My output is the right dimension, but gives me floats between 0 and 1. I think this is because it's still looking for binary classification. Does anyone know how to fix this?
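(A quick way to see the mismatch, assuming the model built above: the network emits one 17-dim sigmoid vector per sentence, whereas per-word tagging needs one class distribution per timestep.)

print(model.output_shape)  # (None, 17): one vector per whole sentence
# per-word tagging needs (None, maxlen, nb_classes) instead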

Solved

Just make sure the labels are binary arrays:

(X_train, y_train), (X_test, y_test), maxlen, word_ids, tags_ids = prep_model(
    nb_words=nb_words, test_len=75)

# sample weights: 1.0 for real tokens, 0.0 for padded timesteps (tag 0)
W = (y_train > 0).astype('float')

print(len(X_train), 'train sequences')
print(int(len(X_train)*val_split), 'validation sequences')
print(len(X_test), 'heldout sequences')

# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')

# this embedding layer will transform the sequences of integers
# into vectors of size `hidden`
embedded = Embedding(nb_words, output_dim=hidden,
                     input_length=maxlen, mask_zero=True)(sequence)

# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,
                 go_backwards=True)(embedded)

# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.15)(merged)

# TimeDistributed for sequence
# change activation to sigmoid?
output = TimeDistributed(
    Dense(output_dim=nb_classes,
          activation='softmax'))(after_dp)

model = Model(input=sequence, output=output)

# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adam',
              sample_weight_mode='temporal')

print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
nb_epoch=epochs,
          shuffle=True,
          validation_split=val_split,
          sample_weight=W)
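The snippet above assumes the integer tags have already been one-hot encoded; as a minimal sketch of that step (assuming y_train and y_test are padded (samples, maxlen) integer arrays and nb_classes counts the padding class 0):

import numpy as np
from keras.utils.np_utils import to_categorical

# one-hot encode every timestep so the labels match the
# TimeDistributed softmax output: (samples, maxlen, nb_classes)
y_train = np.array([to_categorical(seq, nb_classes) for seq in y_train])
y_test = np.array([to_categorical(seq, nb_classes) for seq in y_test])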
ChrisDH

Solved. The main problem was reshaping the data for the classification categories into binary arrays. I also used TimeDistributed and set return_sequences to True.
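To turn the per-timestep softmax output back into integer tags, a sketch (same variable names as above):

probs = model.predict(X_test, batch_size=batch_size)  # (samples, maxlen, nb_classes)
pred_tags = probs.argmax(axis=-1)  # one integer tag per timestep; 0 where padded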

ChrisDH

I know this thread is very old, but I hope I can help.

I modified the model into a binary model:

sequence = Input(shape=(X_train.shape[1],), dtype='int32')

embedded = Embedding(max_features, embed_dim, input_length=X_train.shape[1], mask_zero=True)(sequence)

# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,go_backwards=True)(embedded)

# concatenate the outputs of the 2 LSTMs
merged = concatenate([forwards, backwards])
after_dp = Dropout(0.15)(merged)
# now add an LSTM layer without return_sequences (fed from the dropout layer)
lstm_normal = LSTM(hidden)(after_dp)

# TimeDistributed for sequence
# change activation to sigmoid?
#output = TimeDistributed(Dense(output_dim=2,activation='sigmoid'))(after_dp)
# I changed the output layer from TimeDistributed to a plain Dense because of
# the dimensionality problem, with output_dim=1 (binary output)
output = Dense(output_dim=1,activation='sigmoid')(lstm_normal)

model = Model(input=sequence, output=output)

# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
# I changed model.compile to binary and removed the sample_weight_mode parameter
model.compile(loss='binary_crossentropy',
              metrics=['accuracy'], optimizer='adam',
              )

print(model.summary())


###################################
#this is the line of training

model.fit(X_train, Y_train,
          batch_size=128,
          epochs=10,
          shuffle=True,
          validation_split=0.2,
          #sample_weight=W
         )

# At this point training works fine:
Train on 536000 samples, validate on 134000 samples
Epoch 1/10
536000/536000 [==============================] - 1814s 3ms/step - loss: 0.4794 - acc: 0.7679 - val_loss: 0.4624 - val_acc: 0.7784
Epoch 2/10
536000/536000 [==============================] - 1829s 3ms/step - loss: 0.4502 - acc: 0.7857 - val_loss: 0.4551 - val_acc: 0.7837
Epoch 3/10
 99584/536000 [====>.........................] - ETA: 23:10 - loss: 0.4291 - acc: 0.7980
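For completeness, a sketch of using this binary variant at prediction time (assuming X_test is preprocessed the same way as X_train): the sigmoid output is one probability per sample, thresholded at 0.5 for a hard label.

probs = model.predict(X_test, batch_size=128)  # (samples, 1) sigmoid probabilities
labels = (probs > 0.5).astype('int32')  # 0/1 label per sample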
Chevelle