How to split a list of strings into sublists of strings by a specific string element

Question

I have a list of words like the one below. I want to split the list at each '.' element. Is there a better or more idiomatic way to do this in Python 3?

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']
    result = []
    tmp = []
    for Elm in a:
        # Compare by value: `is not` tests object identity and only
        # appears to work because short strings happen to be interned.
        if Elm != '.':
            tmp.append(Elm)
        else:
            result.append(tmp)
            tmp = []
    print(result)
    # result: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]

Update

I added test cases to make sure it is handled correctly.

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']
    b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']
    c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']

    def split_list(list_data, split_Word='.'):
        result = []
        sub_data = []
        for Elm in list_data:
            # Compare by value (`!=`), not by identity (`is not`).
            if Elm != split_Word:
                sub_data.append(Elm)
            else:
                # Skip empty groups caused by leading or repeated separators.
                if len(sub_data) != 0:
                    result.append(sub_data)
                sub_data = []
        # Keep whatever follows the last separator.
        if len(sub_data) != 0:
            result.append(sub_data)
        return result

    print(split_list(a))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]
    print(split_list(b))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]
    print(split_list(c))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]

Transhuman · Accepted Answer

Use itertools.groupby:

    from itertools import groupby

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']
    result = [list(g) for k, g in groupby(a, lambda x: x == '.') if not k]
    print(result)
    # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]
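To see why this works: groupby collapses consecutive items that share the same key into one run, so with the key x == '.' the list alternates between runs of words (key False) and runs of separators (key True). Here is a minimal sketch that wraps the idea in a function and checks it against the three test lists from the question; the name split_by is made up for illustration.

```python
from itertools import groupby

def split_by(items, sep='.'):
    """Split items into sublists at every element equal to sep (hypothetical helper)."""
    # Each group is a run of consecutive items with the same key;
    # keep only the runs whose key is False, i.e. the non-separators.
    return [list(group)
            for is_sep, group in groupby(items, lambda x: x == sep)
            if not is_sep]

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']

print(split_by(a))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]
print(split_by(b))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]
print(split_by(c))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]
```

Because separator runs are simply skipped, leading, trailing, and repeated '.' elements never produce empty sublists.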

Scott Boston · Answer

You can do all of this in a "one-liner" using a list comprehension and the string methods join, split and strip, with no additional libraries.

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']
    b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']
    c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']

    In [5]: [i.strip().split(' ') for i in ' '.join(a).split('.') if len(i) > 0]
    Out[5]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]

    In [8]: [i.strip().split(' ') for i in ' '.join(b).split('.') if len(i) > 0]
    Out[8]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]

    In [9]: [i.strip().split(' ') for i in ' '.join(c).split('.') if len(i) > 0]
    Out[9]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]

@Craig suggested a simpler version:

    [s.split() for s in ' '.join(a).split('.') if s]
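One caveat worth checking against your data: this approach round-trips the list through a single string, so it assumes that no element contains a space or a '.' of its own. The sketch below shows what happens otherwise; the sample list d is made up for illustration, and the plain loop is included only as a point of comparison.

```python
d = ['version', '3.7', 'is', 'out', '.', 'New York', 'is', 'big', '.']

# Round trip through one string: inner dots and spaces become split points.
via_string = [s.split() for s in ' '.join(d).split('.') if s]
print(via_string)
# [['version', '3'], ['7', 'is', 'out'], ['New', 'York', 'is', 'big']]

# Element-wise comparison keeps every element intact.
element_wise, current = [], []
for word in d:
    if word == '.':
        if current:
            element_wise.append(current)
        current = []
    else:
        current.append(word)
if current:
    element_wise.append(current)
print(element_wise)
# [['version', '3.7', 'is', 'out'], ['New York', 'is', 'big']]
```

For lists of plain words like the ones in the question, both approaches give the same result.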

Óscar López · Answer

Here is another approach that uses only standard list operations (no dependencies on other libraries!). First we find the split points, then we build sublists around them; note that the first element is treated as a special case:

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']

    # -1 stands in for a separator just before the first element.
    indexes = [-1] + [i for i, x in enumerate(a) if x == '.']
    [a[indexes[i]+1:indexes[i+1]] for i in range(len(indexes)-1)]
    # => [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]
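As written, this relies on the list ending with '.': anything after the last separator is dropped, and a leading '.' yields an empty first sublist. One possible extension, sketched here under the assumption that empty sublists are unwanted, adds len(items) as a closing sentinel and filters out empty slices. The function name split_at_indexes is hypothetical.

```python
def split_at_indexes(items, sep='.'):
    """Index-based split that also handles leading and trailing items (sketch)."""
    # -1 and len(items) act as virtual separators before the first
    # and after the last element.
    indexes = [-1] + [i for i, x in enumerate(items) if x == sep] + [len(items)]
    # Slice between each pair of neighbouring separators.
    chunks = [items[start + 1:end] for start, end in zip(indexes, indexes[1:])]
    # Drop the empty slices produced by leading, trailing, or adjacent separators.
    return [chunk for chunk in chunks if chunk]

b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']

print(split_at_indexes(b))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]
print(split_at_indexes(c))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]
```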

Ajax1234 · Answer

You can rebuild the string with ' '.join and then use a regex:

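    import re

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']
    # Raw strings avoid invalid-escape warnings for \s and \.
    # all(b) discards the [''] group left over after the trailing '.'.
    new_s = [b for b in [re.split(r'\s', i) for i in re.split(r'\s*\.\s*', ' '.join(a))] if all(b)]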

Output:

[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]

RoadRunner · Answer

I couldn't resist, I just wanted to have some fun with this great question:

    import itertools

    a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']
    b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']
    c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.', 'yes']

    def split_dots(lst):
        # Start positions: index 0 plus the index right after each '.'.
        dots = [0] + [i+1 for i, e in enumerate(lst) if e == '.']
        # From each start position, take items until the next '.'.
        result = [list(itertools.takewhile(lambda x: x != '.', lst[dot:])) for dot in dots]
        # Drop the empty groups.
        return list(filter(lambda x: x, result))

    print(split_dots(a))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]
    print(split_dots(b))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]
    print(split_dots(c))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice'], ['yes']]

MSeifert · Answer

This answer requires installing a third-party library: iteration_utilities ¹. Its split function makes this task straightforward:

    >>> from iteration_utilities import split
    >>> a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'Nice', '.']
    >>> list(filter(None, split(a, '.', eq=True)))
    [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]

Instead of using the eq parameter, you can also pass a custom function that decides where to split:

    >>> list(filter(None, split(a, lambda x: x=='.')))
    [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'Nice']]

If you want to keep the '.' elements, you can also use the keep_before argument:

    >>> list(filter(None, split(a, '.', eq=True, keep_before=True)))
    [['this', 'is', 'a', 'cat', '.'], ['hello', '.'], ['she', 'is', 'Nice', '.']]

Note that the library only makes this more convenient; it is easily possible (see the other answers) to accomplish this task without installing an additional library.

The filter can be removed if you do not expect '.' to appear at the beginning or end of the list you are splitting.

¹ I am the author of this library. It is available via pip, or via conda from the conda-forge channel.