How to use the spaCy lemmatizer to get a word's base form

Question

I am new to spaCy and I want to use its lemmatizer, but I don't know how. I would like to pass in a string of words and get back the string with each word in its base form.

Examples:

'words' => 'word'
'did' => 'do'

Thanks.

damio · Answer

The previous answer is convoluted and can't be edited, so here is a more conventional one.

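    # Make sure you have downloaded the English model first:
    #   python -m spacy download en
    # Note: the 'en' shortcut applies to older spaCy releases; spaCy v3 and
    # later use full package names such as en_core_web_sm.
    import spacy

    nlp = spacy.load('en')
    doc = nlp(u"Apples and oranges are similar. Boots and hippos aren't.")

    # Print each token, its lemma ID (an integer) and its lemma string.
    for token in doc:
        print(token, token.lemma, token.lemma_)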

Output:

    Apples 6617 apples
    and 512 and
    oranges 7024 orange
    are 536 be
    similar 1447 similar
    . 453 .
    Boots 4622 boot
    and 512 and
    hippos 98365 hippo
    are 536 be
    n't 538 not
    . 453 .

Taken from the official Lightning tour.
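The question asks for the whole string back with every word in its base form, which the loop above does not quite produce. A minimal sketch of one way to do that follows. The helper name `lemmatize_text`, the `demo` function and the default model name `en_core_web_sm` are assumptions made for illustration and are not part of the original answer; on older spaCy installs the model name would be `'en'` as shown above.

```python
def lemmatize_text(nlp, text):
    """Return `text` with every token replaced by its lemma string.

    `nlp` is any loaded spaCy pipeline (hypothetical helper, not a spaCy API).
    """
    doc = nlp(text)
    # token.lemma_ is the readable lemma; token.lemma is only its integer ID.
    return " ".join(token.lemma_ for token in doc)


def demo(model_name="en_core_web_sm"):
    # Imported lazily so the helper above can be reused without loading spaCy.
    # The model name is an assumption; use whichever model you have installed.
    import spacy

    nlp = spacy.load(model_name)
    print(lemmatize_text(nlp, "Apples and oranges are similar."))
```

Call `demo()` once a model is installed. Tokens are joined with single spaces, so punctuation ends up separated from the preceding word; if the original spacing matters, appending `token.whitespace_` to each lemma instead of joining on spaces is one option.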

RAVI · Answer

Code:

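    # Legacy example: written for Python 2 and an early spaCy release.
    # In newer spaCy versions the English class lives in spacy.lang.en.
    import os
    from spacy.en import English, LOCAL_DATA_DIR

    data_dir = os.environ.get('SPACY_DATA', LOCAL_DATA_DIR)
    nlp = English(data_dir=data_dir)
    doc3 = nlp(u"this is spacy lemmatize testing. programming books are more better than others")

    # Print each token, its lemma ID and its lemma string.
    for token in doc3:
        print token, token.lemma, token.lemma_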

Output:

    this 496 this
    is 488 be
    spacy 173779 spacy
    lemmatize 1510965 lemmatize
    testing 2900 testing
    . 419 .
    programming 3408 programming
    books 1011 book
    are 488 be
    more 529 more
    better 615 better
    than 555 than
    others 871 others

Example reference: here

joel · Answer

If you only want to use the Lemmatizer on its own, you can do it the following way.

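    # These import paths match the spaCy v2 releases this answer was written
    # for; later versions reorganised the lemmatizer.
    from spacy.lemmatizer import Lemmatizer
    from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES

    lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)

    # The second argument is the part-of-speech tag of the word.
    lemmas = lemmatizer(u'ducks', u'NOUN')
    print(lemmas)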

Output:

['duck']
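To give a feel for what the three tables passed to the Lemmatizer are for, here is a deliberately simplified, standalone sketch of a rule-based lemmatizer built from an index of known lemmas, an exceptions table and suffix rules. It is not spaCy's implementation: the names `TOY_INDEX`, `TOY_EXC`, `TOY_RULES` and `toy_lemmatize` are hypothetical, and the tiny tables are invented for illustration only.

```python
# Toy data, invented for this sketch; real lemmatizer tables are far larger.
TOY_INDEX = {"NOUN": {"duck", "word", "box"}, "VERB": {"do", "walk"}}
TOY_EXC = {"NOUN": {"mice": ["mouse"]}, "VERB": {"did": ["do"]}}
TOY_RULES = {
    "NOUN": [("ses", "s"), ("es", "e"), ("es", ""), ("s", "")],
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
}


def toy_lemmatize(word, pos):
    """Return a list of candidate lemmas for `word` tagged as `pos`."""
    word = word.lower()

    # 1. Irregular forms are looked up directly in the exceptions table.
    exceptions = TOY_EXC.get(pos, {})
    if word in exceptions:
        return list(exceptions[word])

    # 2. Otherwise try each suffix rule and sort the candidates by whether
    #    the index of known lemmas confirms them.
    known, unknown = [], []
    index = TOY_INDEX.get(pos, set())
    for suffix, replacement in TOY_RULES.get(pos, []):
        if not word.endswith(suffix):
            continue
        candidate = word[: len(word) - len(suffix)] + replacement
        if not candidate:
            continue
        bucket = known if candidate in index else unknown
        if candidate not in bucket:
            bucket.append(candidate)

    # 3. Prefer confirmed lemmas, then unconfirmed ones, then the word itself.
    return known or unknown or [word]


print(toy_lemmatize("ducks", "NOUN"))  # ['duck']
print(toy_lemmatize("did", "VERB"))    # ['do']
```

The part-of-speech argument matters because the same surface form can follow different rules as a noun or as a verb, which is why the call above passes `u'NOUN'` alongside the word.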