SimpleJSON and NumPy array

Question

What is the most efficient way of serializing a numpy array using simplejson?

Alex Martelli · Accepted Answer

I'd use simplejson.dumps(somearray.tolist()) as the most convenient approach (if I was still using simplejson at all, which implies being stuck with Python 2.5 or earlier; 2.6 and later have a standard library module json which works the same way, so of course I'd use that if the Python release in use supported it ;-).
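A minimal sketch of that tolist() round trip, using the standard-library json module (simplejson exposes the same dumps/loads calls). The sample values are illustrative, and the dtype of the rebuilt array is inferred by NumPy rather than stored in the JSON.

```python
import json

import numpy as np

somearray = np.array([[1.5, 2.5], [3.5, 4.5]])

# tolist() converts the array into nested Python lists that json understands.
encoded = json.dumps(somearray.tolist())
print(encoded)  # [[1.5, 2.5], [3.5, 4.5]]

# np.array() rebuilds the array; the shape comes from the nesting of the lists.
restored = np.array(json.loads(encoded))
print(restored.shape)  # (2, 2)
```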

In a quest for greater efficiency, you could subclass json.JSONEncoder (in json; I don't know if the older simplejson already offered such customization possibilities) and, in the default method, special-case instances of numpy.ndarray by turning them into lists or tuples "just in time". I rather doubt you'd gain enough by such an approach, in terms of performance, to justify the effort.
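If you want to check the performance question on your own data, a rough sketch along these lines may help. The class name JustInTimeEncoder, the array size and the repeat count are all assumptions for illustration, and the timings will vary by machine, so none are quoted here.

```python
import json
import timeit

import numpy as np


class JustInTimeEncoder(json.JSONEncoder):  # hypothetical name
    def default(self, obj):
        # Called only for objects json cannot serialize on its own.
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super().default(obj)


arr = np.arange(10000, dtype=np.float64)

# Both routes should produce the same JSON text.
direct = json.dumps(arr.tolist())
via_encoder = json.dumps(arr, cls=JustInTimeEncoder)

t_direct = timeit.timeit(lambda: json.dumps(arr.tolist()), number=20)
t_encoder = timeit.timeit(lambda: json.dumps(arr, cls=JustInTimeEncoder), number=20)
print("tolist() first: %.4f s, encoder subclass: %.4f s" % (t_direct, t_encoder))
```

The subclass is mainly a convenience when arrays are buried inside larger structures, since both routes end up calling tolist() anyway.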

tlausch · Answer

In order to keep dtype and dimension, try this:

    import base64
    import json
    import numpy as np

    class NumpyEncoder(json.JSONEncoder):

        def default(self, obj):
            """If input object is an ndarray it will be converted into a dict
            holding dtype, shape and the data, base64 encoded.
            """
            if isinstance(obj, np.ndarray):
                if obj.flags['C_CONTIGUOUS']:
                    obj_data = obj.data
                else:
                    cont_obj = np.ascontiguousarray(obj)
                    assert(cont_obj.flags['C_CONTIGUOUS'])
                    obj_data = cont_obj.data
                # On Python 2 b64encode returns a str. On Python 3 it returns
                # bytes, which must be decoded before json can serialize them.
                data_b64 = base64.b64encode(obj_data)
                return dict(__ndarray__=data_b64,
                            dtype=str(obj.dtype),
                            shape=obj.shape)
            # Let the base class default method raise the TypeError
            return super(NumpyEncoder, self).default(obj)


    def json_numpy_obj_hook(dct):
        """Decodes a previously encoded numpy ndarray with proper shape and dtype.

        :param dct: (dict) json encoded ndarray
        :return: (ndarray) if input was an encoded ndarray
        """
        if isinstance(dct, dict) and '__ndarray__' in dct:
            data = base64.b64decode(dct['__ndarray__'])
            return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
        return dct

    # np.float64 replaces the np.float alias, which newer NumPy releases removed
    expected = np.arange(100, dtype=np.float64)
    dumped = json.dumps(expected, cls=NumpyEncoder)
    result = json.loads(dumped, object_hook=json_numpy_obj_hook)

    # None of the following assertions will be broken.
    assert result.dtype == expected.dtype, "Wrong Type"
    assert result.shape == expected.shape, "Wrong Shape"
    assert np.allclose(expected, result), "Wrong Values"
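To see why keeping the dtype can matter, here is a small sketch of what happens to a float32 array when it goes through plain lists instead. The sample values are arbitrary.

```python
import json

import numpy as np

original = np.array([0.1, 0.2, 0.3], dtype=np.float32)

# Round trip through plain lists: the values survive, the dtype does not.
restored = np.array(json.loads(json.dumps(original.tolist())))
print(original.dtype, "->", restored.dtype)  # float32 -> float64

# If the caller records the dtype somewhere, it can be reapplied on load.
restored32 = np.array(json.loads(json.dumps(original.tolist())), dtype=np.float32)
```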

Russ · Answer

I found this json subclass code for serializing one-dimensional numpy arrays within a dictionary. I tried it and it works for me.

    import json
    import numpy

    class NumpyAwareJSONEncoder(json.JSONEncoder):
        def default(self, obj):
            # Only one-dimensional arrays are converted; anything else falls through
            if isinstance(obj, numpy.ndarray) and obj.ndim == 1:
                return obj.tolist()
            return json.JSONEncoder.default(self, obj)

My dictionary is "results". Here's how I write to the file "data.json":

    j = json.dumps(results, cls=NumpyAwareJSONEncoder)
    with open("data.json", "w") as f:
        f.write(j)
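Reading the file back is not shown above. A minimal sketch, assuming a small made-up results dictionary and a temporary directory in place of the working directory, might look like this:

```python
import json
import os
import tempfile

import numpy as np


class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray) and obj.ndim == 1:
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)


# Hypothetical data standing in for the "results" dictionary.
results = {"label": "run-1", "values": np.array([1.0, 2.0, 3.0])}

path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w") as f:
    json.dump(results, f, cls=NumpyAwareJSONEncoder)

with open(path) as f:
    loaded = json.load(f)

# Arrays come back as plain Python lists; convert the ones you need.
values = np.array(loaded["values"])
```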

unutbu · Answer

This shows how to convert a 1D NumPy array to JSON and back to an array:

    try:
        import json
    except ImportError:
        import simplejson as json
    import numpy as np

    def arr2json(arr):
        return json.dumps(arr.tolist())

    def json2arr(astr, dtype):
        return np.fromiter(json.loads(astr), dtype)

    arr = np.arange(10)
    astr = arr2json(arr)
    print(repr(astr))
    # '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'

    dt = np.int32
    arr = json2arr(astr, dt)
    print(repr(arr))
    # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    # (the repr also shows dtype=int32 where int32 is not the platform default integer)

Building on tlausch's answer, here is a way to JSON-encode a NumPy array while preserving the shape and dtype of any NumPy array, including those with a complex (structured) dtype.

    import base64
    import io

    class NDArrayEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, np.ndarray):
                output = io.BytesIO()
                np.savez_compressed(output, obj=obj)
                # Decode so the base64 payload is a str that json can serialize on Python 3
                return {'b64npz': base64.b64encode(output.getvalue()).decode('ascii')}
            return json.JSONEncoder.default(self, obj)


    def ndarray_decoder(dct):
        if isinstance(dct, dict) and 'b64npz' in dct:
            output = io.BytesIO(base64.b64decode(dct['b64npz']))
            output.seek(0)
            return np.load(output)['obj']
        return dct

    # Make expected non-contiguous structured array.
    # int64 so that each 8-byte element can be viewed as an (int32, float32) pair.
    expected = np.arange(10, dtype=np.int64)[::2]
    expected = expected.view('<i4,<f4')
    dumped = json.dumps(expected, cls=NDArrayEncoder)
    result = json.loads(dumped, object_hook=ndarray_decoder)

    assert result.dtype == expected.dtype, "Wrong Type"
    assert result.shape == expected.shape, "Wrong Shape"
    assert np.array_equal(expected, result), "Wrong Values"

HerrIvan · Answer

If you want to apply Russ's method to n-dimensional numpy arrays, you can try this:

    class NumpyAwareJSONEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, numpy.ndarray):
                if obj.ndim == 1:
                    return obj.tolist()
                else:
                    return [self.default(obj[i]) for i in range(obj.shape[0])]
            return json.JSONEncoder.default(self, obj)

This will simply turn an n-dimensional array into a list of lists with depth "n". To cast such lists back into a numpy array, my_nparray = numpy.array(my_list) will work regardless of the list "depth".
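As a quick check of the claim above, the sketch below encodes a 3-dimensional array with this encoder and rebuilds it. It also compares the result against calling tolist() directly, since ndarray.tolist() already returns nested lists for multi-dimensional arrays. The array contents are arbitrary.

```python
import json

import numpy as np


class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            if obj.ndim == 1:
                return obj.tolist()
            # Recurse over the first axis until 1-D slices remain.
            return [self.default(obj[i]) for i in range(obj.shape[0])]
        return json.JSONEncoder.default(self, obj)


arr = np.arange(24).reshape(2, 3, 4)

via_encoder = json.dumps(arr, cls=NumpyAwareJSONEncoder)
via_tolist = json.dumps(arr.tolist())  # tolist() nests to the same depth

# The nesting depth of the lists determines the rebuilt shape.
my_nparray = np.array(json.loads(via_encoder))
print(my_nparray.shape)  # (2, 3, 4)
```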

The Doctor · Answer

You can also solve this with just a function passed to json.dumps, like this:

    json.dumps(np.array([1, 2, 3]), default=json_numpy_serializer)

With

    import numpy as np

    def json_numpy_serializer(o):
        """ Serialize numpy types for json

        Parameters:
            o (object): any python object which fails to be serialized by json

        Example:

        >>> import json
        >>> a = np.array([1, 2, 3])
        >>> json.dumps(a, default=json_numpy_serializer)

        """
        numpy_types = (
            np.bool_,
            # np.bytes_, -- python `bytes` class is not json serializable
            # np.complex64,  -- python `complex` class is not json serializable
            # np.complex128,  -- python `complex` class is not json serializable
            # np.complex256,  -- special handling below
            # np.datetime64,  -- python `datetime.datetime` class is not json serializable
            np.float16,
            np.float32,
            np.float64,
            # np.float128,  -- special handling below
            np.int8,
            np.int16,
            np.int32,
            np.int64,
            # np.object_  -- should already be evaluated as python native
            np.str_,
            np.timedelta64,
            np.uint8,
            np.uint16,
            np.uint32,
            np.uint64,
            np.void,
        )
        if isinstance(o, np.ndarray):
            return o.tolist()
        elif isinstance(o, numpy_types):
            return o.item()
        elif isinstance(o, np.float128):
            # np.float128 only exists on platforms with extended precision
            return o.astype(np.float64).item()
        # elif isinstance(o, np.complex256): -- no python native for np.complex256
        #     return o.astype(np.complex128).item() -- python `complex` class is not json serializable
        else:
            raise TypeError("{} of type {} is not JSON serializable".format(repr(o), type(o)))

Validated with:

    import json

    # np.complex256 and np.float128 only exist on platforms with extended precision
    need_additional_json_handling = (
        np.bytes_,
        np.complex64,
        np.complex128,
        np.complex256,
        np.datetime64,
        np.float128,
    )

    # np.typeDict was a deprecated alias of np.sctypeDict
    numpy_types = tuple(set(np.sctypeDict.values()))

    for numpy_type in numpy_types:
        print(numpy_type)

        if numpy_type == np.void:
            # complex dtypes evaluate as np.void, e.g.
            numpy_type = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
        elif numpy_type in need_additional_json_handling:
            print('python native can not be json serialized')
            continue

        a = np.ones(1, dtype=numpy_type)
        json.dumps(a, default=json_numpy_serializer)

ankostis · Answer

To improve on Russ's answer, I would also include the np.generic scalars:

    class NumpyAwareJSONEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, np.ndarray) and obj.ndim == 1:
                return obj.tolist()
            elif isinstance(obj, np.generic):
                # NumPy scalars (np.int64, np.float32, ...) become native Python values
                return obj.item()
            return json.JSONEncoder.default(self, obj)
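A short sketch of why the scalar branch is useful: on Python 3 a NumPy integer scalar such as np.int64 is not accepted by json on its own, so a dictionary that mixes scalars and arrays needs both branches. The record contents are made up for illustration.

```python
import json

import numpy as np


class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray) and obj.ndim == 1:
            return obj.tolist()
        elif isinstance(obj, np.generic):
            return obj.item()
        return json.JSONEncoder.default(self, obj)


record = {"count": np.int64(3), "mean": np.float32(0.5), "data": np.array([1, 2])}

# Without the custom encoder, the NumPy scalar is rejected.
try:
    json.dumps(record)
    plain_failed = False
except TypeError:
    plain_failed = True

encoded = json.dumps(record, cls=NumpyAwareJSONEncoder)
print(encoded)  # {"count": 3, "mean": 0.5, "data": [1, 2]}
```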

Luindil · Answer

I just discovered tlausch's answer to this question and realized it gives an almost correct answer to my problem, but at least for me it does not work in Python 3.5 because of several errors: (1) infinite recursion and (2) the data was saved as None.

Since I cannot comment directly on the original answer yet, here is my version:

    import base64
    import json
    import numpy as np

    class NumpyEncoder(json.JSONEncoder):
        def default(self, obj):
            """If input object is an ndarray it will be converted into a dict
            holding dtype, shape and the data, base64 encoded.
            """
            if isinstance(obj, np.ndarray):
                if obj.flags['C_CONTIGUOUS']:
                    obj_data = obj.data
                else:
                    cont_obj = np.ascontiguousarray(obj)
                    assert(cont_obj.flags['C_CONTIGUOUS'])
                    obj_data = cont_obj.data
                data_b64 = base64.b64encode(obj_data)
                # Decode the base64 bytes so json can serialize them as a str
                return dict(__ndarray__=data_b64.decode('utf-8'),
                            dtype=str(obj.dtype),
                            shape=obj.shape)
            # Defer to the base class for any other type, so that it raises
            # TypeError instead of silently returning None
            return super(NumpyEncoder, self).default(obj)


    def json_numpy_obj_hook(dct):
        """Decodes a previously encoded numpy ndarray with proper shape and dtype.

        :param dct: (dict) json encoded ndarray
        :return: (ndarray) if input was an encoded ndarray
        """
        if isinstance(dct, dict) and '__ndarray__' in dct:
            data = base64.b64decode(dct['__ndarray__'])
            return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
        return dct

    # np.float64 replaces the np.float alias, which newer NumPy releases removed
    expected = np.arange(100, dtype=np.float64)
    dumped = json.dumps(expected, cls=NumpyEncoder)
    result = json.loads(dumped, object_hook=json_numpy_obj_hook)

    # None of the following assertions will be broken.
    assert result.dtype == expected.dtype, "Wrong Type"
    assert result.shape == expected.shape, "Wrong Shape"
    assert np.allclose(expected, result), "Wrong Values"
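One detail worth knowing when decoding this way: np.frombuffer applied to a bytes object gives a read-only array, because it shares memory with the immutable bytes. The sketch below reproduces just the encode/decode step, using tobytes() as a stand-in for the buffer handling above, and shows that a copy is needed before modifying the result.

```python
import base64

import numpy as np

original = np.arange(6, dtype=np.float64).reshape(2, 3)

# Encode the raw bytes as base64 text, as the encoder above does.
payload = base64.b64encode(original.tobytes()).decode("utf-8")

# Decode: frombuffer wraps the immutable bytes without copying them.
decoded = np.frombuffer(base64.b64decode(payload), dtype="float64").reshape(2, 3)
print(decoded.flags.writeable)  # False

# Take a copy if the array has to be modified afterwards.
writable = decoded.copy()
writable[0, 0] = 42.0
```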

John Zwinck · Answer

A quick, though not truly optimal, way is to use Pandas:

    import pandas as pd
    pd.Series(your_array).to_json(orient='values')