Python glob but against a list of strings instead of the filesystem

Question

I want to be able to match a glob-format pattern against a list of strings, rather than against actual files in the filesystem. Is there a way to do this, or to easily convert a glob pattern to a regular expression?

Nizam Mohamed · Accepted Answer

Good artists copy; great artists steal.

I steal ;)

fnmatch.translate translates the glob wildcards ? and * into the regex . and .* respectively. I modified it so that it doesn't: here they become [^/] and [^/]*, which never match a path separator.

import re


def glob2re(pat):
    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """
    i, n = 0, len(pat)
    res = ''
    while i < n:
        c = pat[i]
        i = i + 1
        if c == '*':
            # res = res + '.*'
            res = res + '[^/]*'
        elif c == '?':
            # res = res + '.'
            res = res + '[^/]'
        elif c == '[':
            j = i
            if j < n and pat[j] == '!':
                j = j + 1
            if j < n and pat[j] == ']':
                j = j + 1
            while j < n and pat[j] != ']':
                j = j + 1
            if j >= n:
                res = res + '\\['
            else:
                stuff = pat[i:j].replace('\\', '\\\\')
                i = j + 1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                res = '%s[%s]' % (res, stuff)
        else:
            res = res + re.escape(c)
    # Scoped (?s:...) group instead of a trailing '(?ms)': global flags
    # that are not at the start of the pattern are an error since Python 3.11.
    return '(?s:' + res + r')\Z'

This one works like fnmatch.filter; both re.match and re.search work.

def glob_filter(names, pat):
    return (name for name in names if re.match(glob2re(pat), name))
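A caveat on the match/search remark, added here rather than taken from the answer: re.match anchors at the start of the string and re.search does not, so the two agree only when the pattern's first element pins the start, as it does for the patterns tested on this page. A minimal check, using a hand-written regex of the same shape glob2re produces for the glob b*:

```python
import re

# Separator-aware translation of the glob 'b*' (same shape as glob2re output):
# a literal 'b', any run of non-'/' characters, then end of string.
regex = re.compile(r'(?s:b[^/]*)\Z')

print(bool(regex.match('bar')))       # True
print(bool(regex.match('foo/bar')))   # False: match() is anchored at the start
print(bool(regex.search('foo/bar')))  # True: search() finds 'bar' at the end
```

So for patterns that may begin with a wildcard, re.match (or re.fullmatch) is the safer choice.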

The glob patterns and strings found on this page pass the test.

pat_dict = {
    'a/b/*/f.txt': ['a/b/c/f.txt', 'a/b/q/f.txt', 'a/b/c/d/f.txt', 'a/b/c/d/e/f.txt'],
    '/foo/bar/*': ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar'],
    '/*/bar/b*': ['/foo/bar/baz', '/foo/bar/bar'],
    '/*/[be]*/b*': ['/foo/bar/baz', '/foo/bar/bar'],
    '/foo*/bar': ['/foolicious/spamfantastic/bar', '/foolicious/bar'],
}

for pat in pat_dict:
    print('pattern :\t{}\nstrings :\t{}'.format(pat, pat_dict[pat]))
    print('matched :\t{}\n'.format(list(glob_filter(pat_dict[pat], pat))))

Martijn Pieters · Answer

The glob module uses the fnmatch module for individual path elements.

This means the path is split into a directory name and a filename, and if the directory name contains meta characters (any of the characters [, * or ?), these are expanded recursively.
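The same per-element idea can be applied to plain strings. A minimal sketch, assuming '/'-separated paths; match_segments is a hypothetical helper, not part of the glob module, and unlike glob it does not stop * from matching names that start with a dot:

```python
import fnmatch


def match_segments(path, pattern, sep='/'):
    """Match `path` against a glob `pattern` one path element at a time.

    Hypothetical helper: both strings are split on `sep` and each element
    is matched with fnmatch.fnmatchcase, so a wildcard can never consume
    a separator. Paths with a different number of elements never match.
    """
    path_parts = path.split(sep)
    pattern_parts = pattern.split(sep)
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatch.fnmatchcase(part, pat)
               for part, pat in zip(path_parts, pattern_parts))


paths = ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar', '/foo/bar/x/baz']
print([p for p in paths if match_segments(p, '/*/bar/b*')])
# ['/foo/bar/baz', '/foo/bar/bar']
```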

If you have a list of strings that are plain filenames, the fnmatch.filter() function is all you need:

import fnmatch

matching = fnmatch.filter(filenames, pattern)

but if they contain full paths you need to do more work, as the generated regular expression doesn't take path segments into account (wildcards don't exclude the separators, nor are they adjusted for cross-platform path matching).
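To illustrate both cases (a small example of our own, not from the answer):

```python
import fnmatch

# Plain file names: fnmatch.filter is all that is needed.
filenames = ['notes.txt', 'setup.py', 'readme.txt', 'data.csv']
print(fnmatch.filter(filenames, '*.txt'))
# ['notes.txt', 'readme.txt']

# Full paths: the generated regex ignores path segments, so '*' also
# matches across '/' and '/foo/*' picks up the nested entry as well.
paths = ['/foo/bar', '/foo/bar/baz', '/spam/eggs']
print(fnmatch.filter(paths, '/foo/*'))
# ['/foo/bar', '/foo/bar/baz']
```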

You can build a simple trie from the paths, then match your pattern against that:

import fnmatch
import glob
import os.path
from itertools import product


# Cross-Python dictionary views on the keys
if hasattr(dict, 'viewkeys'):
    # Python 2
    def _viewkeys(d):
        return d.viewkeys()
else:
    # Python 3
    def _viewkeys(d):
        return d.keys()


def _in_trie(trie, path):
    """Determine if path is completely in trie"""
    current = trie
    for elem in path:
        try:
            current = current[elem]
        except KeyError:
            return False
    return None in current


def find_matching_paths(paths, pattern):
    """Produce a list of paths that match the pattern.

    * paths is a list of strings representing filesystem paths
    * pattern is a glob pattern as supported by the fnmatch module
    """
    if os.altsep:  # normalise
        pattern = pattern.replace(os.altsep, os.sep)
    pattern = pattern.split(os.sep)

    # build a trie out of path elements; efficiently search on prefixes
    path_trie = {}
    for path in paths:
        if os.altsep:  # normalise
            path = path.replace(os.altsep, os.sep)
        _, path = os.path.splitdrive(path)
        elems = path.split(os.sep)
        current = path_trie
        for elem in elems:
            current = current.setdefault(elem, {})
        current.setdefault(None, None)  # sentinel

    matching = []

    current_level = [path_trie]
    for subpattern in pattern:
        if not glob.has_magic(subpattern):
            # plain element, element must be in the trie or there are
            # 0 matches
            if not any(subpattern in d for d in current_level):
                return []
            matching.append([subpattern])
            current_level = [d[subpattern] for d in current_level
                             if subpattern in d]
        else:
            # match all next levels in the trie that match the pattern;
            # skip the None sentinel, which fnmatch cannot handle
            matched_names = fnmatch.filter(
                {k for d in current_level for k in d if k is not None},
                subpattern)
            if not matched_names:
                # nothing found
                return []
            matching.append(matched_names)
            current_level = [d[n] for d in current_level
                             for n in _viewkeys(d) & set(matched_names)]

    return [os.sep.join(p) for p in product(*matching)
            if _in_trie(path_trie, p)]
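To make the data structure concrete, here is only the trie-building step, pulled out into a hypothetical build_trie helper that assumes '/' as the separator instead of os.sep. Each path element becomes a nested key, a None key marks where a complete path ends, and the leading '' key comes from the root '/':

```python
def build_trie(paths, sep='/'):
    """Build a nested-dict trie of path elements (hypothetical helper).

    Mirrors the construction in find_matching_paths: each element is a
    key, and a None key marks the end of a complete path.
    """
    trie = {}
    for path in paths:
        node = trie
        for elem in path.split(sep):
            node = node.setdefault(elem, {})
        node[None] = None  # sentinel: a full path ends here
    return trie


trie = build_trie(['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar'])
print(trie)
# {'': {'foo': {'bar': {'baz': {None: None}, 'bar': {None: None}}}, 'spam': {'eggs': {'baz': {None: None}}}}}
```

Shared prefixes such as /foo/bar are stored once, which is why matching can proceed level by level.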

This mouthful can quickly find matches using globs anywhere along the path:

>>> paths = ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/foo/bar/*')
['/foo/bar/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/*/bar/b*')
['/foo/bar/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/*/[be]*/b*')
['/foo/bar/baz', '/foo/bar/bar', '/spam/eggs/baz']

Veedrac · Answer

On Python 3.4+, you can simply use PurePath.match.

import pathlib

pathlib.PurePath(path_string).match(pattern)

On Python 3.3 or earlier (including 2.x), get pathlib from PyPI.

Note that to get platform-independent results (whether you need them depends on why you are running this), you'd want to explicitly use PurePosixPath or PureWindowsPath.
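A short illustration of the explicit flavours (sample paths are our own; the last two checks mirror examples in the pathlib documentation). Relative patterns are matched from the right-hand end of the path, and the Windows flavour matches case-insensitively:

```python
from pathlib import PurePosixPath, PureWindowsPath

paths = ['foo/bar.txt', 'spam/BAR.TXT', 'foo/eggs.txt']

# Explicit flavours behave the same on every operating system.
posix = [p for p in paths if PurePosixPath(p).match('*.txt')]
windows = [p for p in paths if PureWindowsPath(p).match('*.txt')]

print(posix)    # ['foo/bar.txt', 'foo/eggs.txt']  (case-sensitive)
print(windows)  # ['foo/bar.txt', 'spam/BAR.TXT', 'foo/eggs.txt']  (case-insensitive)

# A relative pattern is matched from the right-hand end of the path.
print(PurePosixPath('/a/b/c.py').match('b/*.py'))  # True
print(PurePosixPath('/a/b/c.py').match('a/*.py'))  # False
```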

mu 無 · Answer

While fnmatch.fnmatch can be used directly to check whether a pattern matches a filename, you can also use the fnmatch.translate method to generate the regular expression from a given fnmatch pattern:

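>>> import fnmatch
>>> fnmatch.translate('*.txt')
'.*\\.txt\\Z(?ms)'

(That is the output of older Python versions; from Python 3.6 on the result is '(?s:.*\\.txt)\\Z'.)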

From the documentation:

fnmatch.translate(pattern)

Return the shell-style pattern converted to a regular expression.
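When the same pattern is applied to many strings, translating and compiling it once avoids repeated work. A sketch with sample names of our own; note that the compiled regex is always case-sensitive, whereas fnmatch.fnmatch first normalizes case with os.path.normcase (which matters on Windows):

```python
import fnmatch
import re

names = ['a.txt', 'b.py', 'notes.txt', 'archive.txt.gz']

# Translate the glob once, compile it, then reuse the regex for every name.
txt_re = re.compile(fnmatch.translate('*.txt'))
print([n for n in names if txt_re.match(n)])
# ['a.txt', 'notes.txt']
```

'archive.txt.gz' is rejected because the translated pattern is anchored at the end of the string.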

Jason S · Answer

never mind, I found it. I want the fnmatch module.

Carson Gee · Answer

I wanted to add support for recursive glob patterns, i.e. things/**/*.py, and to have relative path matching so that example*.py doesn't match folder/example_stuff.py.

Here is my approach:

import re
from fnmatch import translate as fnmatch_translate
from os import path


def recursive_glob_filter(files, glob):
    # NOTE: relies on fnmatch.translate() emitting '.*' for every '*', which
    # holds up to Python 3.8; from 3.9 on, runs of '*' are collapsed, so the
    # '**' replacement below no longer takes effect.

    # Convert to regex and add start of line match
    pattern_re = '^' + fnmatch_translate(glob)

    # fnmatch does not escape path separators so escape them
    if path.sep in pattern_re and not r'\{}'.format(path.sep) in pattern_re:
        pattern_re = pattern_re.replace('/', r'\/')

    # Replace `*` with one that ignores path separators
    sep_respecting_wildcard = r'[^\{}]*'.format(path.sep)
    pattern_re = pattern_re.replace('.*', sep_respecting_wildcard)

    # And now for `**` we have `[^\/]*[^\/]*`, so replace that with `.*`
    # to match all patterns in-between
    pattern_re = pattern_re.replace(2 * sep_respecting_wildcard, '.*')
    compiled_re = re.compile(pattern_re)
    return filter(compiled_re.search, files)
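Because the approach above depends on the exact text fnmatch.translate emits, a self-contained alternative is to translate the pattern yourself. The sketch below is not the answerer's code: glob_to_regex is a hypothetical name, '/' is assumed to be the only separator, character classes are not supported, and '**/' is taken to match zero or more directories, so things/**/*.py also matches things/a.py (the convention pathlib's glob uses).

```python
import re


def glob_to_regex(pattern):
    """Translate a glob with '**' support into a compiled regex (a sketch).

    Assumptions: '/' is the only separator; '*' and '?' never match '/';
    '**/' matches zero or more whole directories; any other '**' matches
    anything. '[...]' classes are not supported and are matched literally.
    """
    parts = []
    i, n = 0, len(pattern)
    while i < n:
        if pattern.startswith('**/', i):
            parts.append('(?:[^/]*/)*')  # zero or more directories
            i += 3
        elif pattern.startswith('**', i):
            parts.append('.*')           # anything, including '/'
            i += 2
        elif pattern[i] == '*':
            parts.append('[^/]*')        # stays within one path element
            i += 1
        elif pattern[i] == '?':
            parts.append('[^/]')
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile('(?s:' + ''.join(parts) + r')\Z')


files = ['things/a.py', 'things/x/y/b.py', 'things/readme.md',
         'example_one.py', 'folder/example_stuff.py']

recursive = glob_to_regex('things/**/*.py')
print([f for f in files if recursive.match(f)])
# ['things/a.py', 'things/x/y/b.py']

top_level = glob_to_regex('example*.py')
print([f for f in files if top_level.match(f)])
# ['example_one.py']
```

Using match() rather than search() keeps the pattern anchored at the start, which is what stops example*.py from matching folder/example_stuff.py.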

NumesSanguis · Answer

An extension of @Veedrac's PurePath.match answer that can be applied to a list of strings:

# Python 3.4+
from pathlib import Path

path_list = ["foo/bar.txt", "spam/bar.txt", "foo/eggs.txt"]

# convert string to pathlib.PosixPath / .WindowsPath, then apply PurePath.match to list
print([p for p in path_list if Path(p).match("ba*")])  # "*ba*" also works
# output: ['foo/bar.txt', 'spam/bar.txt']

print([p for p in path_list if Path(p).match("*o/ba*")])
# output: ['foo/bar.txt']

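It is preferable to use pathlib.Path() rather than pathlib.PurePath(), because then you don't have to worry about the underlying filesystem. (Note that match() is purely lexical with either class, and PurePath() likewise picks the POSIX or Windows flavour of the current platform.)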