How to insert a Keras model into a scikit-learn pipeline?

Question

I am using a custom scikit-learn pipeline (sklearn.pipeline.Pipeline) together with RandomizedSearchCV for hyperparameter optimization. This works great.

Now I would like to insert a Keras model as the first step of the pipeline. The parameters of the model should be optimized. The computed (fitted) Keras model should then be used later in the pipeline by other steps, so I think I have to store the model as a global variable so that the other pipeline steps can use it. Is this right?

I know that Keras offers wrappers for the scikit-learn API, but the problem is that these wrappers already perform classification/regression, whereas I only want to compute the Keras model and nothing else.

How can this be done?

For example, I have a method which returns the model:

    def create_model(file_path, argument2, ...):
        ...
        return model

The method needs some fixed parameters such as a file path, but X and y are not needed (or can be ignored). The parameters of the model should be optimized (number of layers, etc.).

Felipe Almeida · Answer

You need to wrap your Keras model as a scikit-learn model first, and then proceed as usual.

Here is a quick example (I have omitted the imports for brevity).

Here is a full blog post with this one and many other examples: Scikit-learn Pipeline Examples

    # Create a function that returns a model, taking as parameters the things
    # you want to verify using cross-validation and model selection.
    def create_model(optimizer='adagrad',
                     kernel_initializer='glorot_uniform',
                     dropout=0.2):
        model = Sequential()
        model.add(Dense(64, activation='relu', kernel_initializer=kernel_initializer))
        model.add(Dropout(dropout))
        model.add(Dense(1, activation='sigmoid', kernel_initializer=kernel_initializer))
        model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        return model

    # Wrap the model using the function you created. The network is a binary
    # classifier (sigmoid output, binary cross-entropy), so use KerasClassifier.
    clf = KerasClassifier(build_fn=create_model, verbose=0)

    # Just create the pipeline.
    pipeline = Pipeline([
        ('clf', clf)
    ])

    pipeline.fit(X_train, y_train)
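Since the question is about tuning with RandomizedSearchCV, it may help to see how the parameters of a pipeline step are addressed in a search. The sketch below is not part of the original answer. It assumes scikit-learn's MLPClassifier as a stand-in for the wrapped Keras model so that it runs without Keras, and the synthetic data set and parameter ranges are made up for illustration. With the Keras wrapper above, the arguments of create_model (optimizer, kernel_initializer, dropout) should be addressable the same way, for example as clf__dropout.

```python
# Sketch only: MLPClassifier is a scikit-learn stand-in for the wrapped
# Keras model; the data and the parameter ranges are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", MLPClassifier(max_iter=300, random_state=0)),
])

# Keys follow the "<step name>__<parameter name>" convention.
param_distributions = {
    "clf__hidden_layer_sizes": [(16,), (32,), (16, 16)],
    "clf__alpha": [1e-4, 1e-3, 1e-2],
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=4,
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)

# The refitted model is an attribute of the best pipeline, so no global
# variable is needed to reach it.
best_model = search.best_estimator_.named_steps["clf"]
```

The search clones the pipeline for every candidate, so the fitted model should be read from search.best_estimator_ rather than from the pipeline object that was passed in.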

Ahmad · Answer

This is a modification of the RBM example in the sklearn documentation ( http://scikit-learn.org/stable/auto_examples/neural_networks/plot_rbm_logistic_classification.html#sphx-glr-auto-examples-neural-networks-plot-rbm-logistic-classification-py )

but with the neural network implemented in Keras with the TensorFlow backend.

    # -*- coding: utf-8 -*-
    """
    Created on Mon Nov 27 17:11:21 2017

    @author: ZED
    """
    from __future__ import print_function

    print(__doc__)

    # Authors: Yann N. Dauphin, Vlad Niculae, Gabriel Synnaeve
    # License: BSD

    import numpy as np
    import matplotlib.pyplot as plt

    from scipy.ndimage import convolve
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.utils import np_utils
    from sklearn import datasets, metrics
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import BernoulliRBM
    from sklearn.pipeline import Pipeline

    #%%
    # Setting up

    def Nudge_dataset(X, Y):
        """
        This produces a dataset 5 times bigger than the original one,
        by moving the 8x8 images in X around by 1px to left, right, down, up
        """
        direction_vectors = [
            [[0, 1, 0],
             [0, 0, 0],
             [0, 0, 0]],

            [[0, 0, 0],
             [1, 0, 0],
             [0, 0, 0]],

            [[0, 0, 0],
             [0, 0, 1],
             [0, 0, 0]],

            [[0, 0, 0],
             [0, 0, 0],
             [0, 1, 0]]]

        shift = lambda x, w: convolve(x.reshape((8, 8)), mode='constant',
                                      weights=w).ravel()
        X = np.concatenate([X] +
                           [np.apply_along_axis(shift, 1, X, vector)
                            for vector in direction_vectors])
        Y = np.concatenate([Y for _ in range(5)], axis=0)
        return X, Y

    # Load Data
    digits = datasets.load_digits()
    X = np.asarray(digits.data, 'float32')
    X, Y = Nudge_dataset(X, digits.target)
    X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling

    X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                        test_size=0.2,
                                                        random_state=0)

    #%%
    def create_model():
        model = Sequential()
        model.add(Dense(100, input_dim=64))
        model.add(Activation('tanh'))

        # Other layer (disabled):
        # model.add(Dense(500))
        # model.add(Activation('tanh'))

        model.add(Dense(10))
        model.add(Activation('softmax'))
        # Compile model. The output is a 10-class softmax trained on one-hot
        # targets, so the loss is categorical cross-entropy.
        model.compile(loss='categorical_crossentropy', optimizer='adadelta',
                      metrics=['accuracy'])
        return model

    rbm = BernoulliRBM(random_state=0, verbose=True)

    # This is the model you want; it is in sklearn format.
    clf = KerasClassifier(build_fn=create_model, verbose=0)

    classifier = Pipeline(steps=[('rbm', rbm), ('VNN', clf)])

    #%%
    # Training

    # Hyper-parameters. These were set by cross-validation,
    # using a GridSearchCV. Here we are not performing cross-validation to
    # save time.
    rbm.learning_rate = 0.06
    rbm.n_iter = 20
    # More components tend to give better prediction performance, but larger
    # fitting time
    rbm.n_components = 64

    # Adapt targets to a one-hot matrix
    yTrain = np_utils.to_categorical(Y_train, 10)

    # Training the RBM + neural network pipeline
    classifier.fit(X_train, yTrain)

    #%%
    # Evaluation

    print()
    print("NN using RBM features:\n%s\n" % (
        metrics.classification_report(
            Y_test,
            classifier.predict(X_test))))

    #%%
    # Plotting

    plt.figure(figsize=(4.2, 4))
    for i, comp in enumerate(rbm.components_):
        plt.subplot(10, 10, i + 1)
        plt.imshow(comp.reshape((8, 8)), cmap=plt.cm.gray_r,
                   interpolation='nearest')
        plt.xticks(())
        plt.yticks(())
    plt.suptitle('64 components extracted by RBM', fontsize=16)
    plt.subplots_adjust(0.08, 0.02, 0.92, 0.85, 0.08, 0.23)

    plt.show()
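The plotting code above reads rbm.components_ after fitting the pipeline, which works because a Pipeline without caching fits its step objects in place. This also bears on the global-variable part of the question: a fitted step can be fetched from the pipeline itself. The sketch below is a reduced illustration, not the answer's code. It assumes LogisticRegression in place of the Keras classifier so it runs with scikit-learn only, and the small n_components and n_iter values are chosen just to keep it fast.

```python
# Sketch only: LogisticRegression stands in for the Keras classifier, and
# the RBM settings are deliberately small to keep the run short.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

digits = load_digits()
X = np.asarray(digits.data, "float32")
X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling
y = digits.target

rbm = BernoulliRBM(n_components=16, n_iter=2, random_state=0)
pipe = Pipeline(steps=[("rbm", rbm),
                       ("logistic", LogisticRegression(max_iter=200))])
pipe.fit(X, y)

# Look the fitted step up by name instead of keeping a global variable.
fitted_rbm = pipe.named_steps["rbm"]
print(fitted_rbm is rbm)             # same object, fitted in place
print(fitted_rbm.components_.shape)  # one row per RBM component
```

If the pipeline is created with the memory argument, or is cloned by a search such as RandomizedSearchCV, the steps are copies, so looking them up through named_steps on the fitted pipeline is the safer habit.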