Read a JSON file as a pandas DataFrame?

Question

I'm using Python 3.6 and trying to load a JSON file (350 MB) as a pandas DataFrame using the code below. However, I get the following error:

data_json_str = "[" + ",".join(data) + "]"
TypeError: sequence item 0: expected str instance, bytes found

How can I fix this error?

import pandas as pd

# read the entire file into a python array
with open('C:/Users/Alberto/nutrients.json', 'rb') as f:
    data = f.readlines()

# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)

# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON object
# basically... add square brackets to the beginning
# and end, and have all the individual business JSON objects
# separated by a comma
data_json_str = "[" + ",".join(data) + "]"

# now, load it into pandas
data_df = pd.read_json(data_json_str)

Stephen Rauch · Accepted Answer

If you open the file in binary mode ('rb'), you get bytes, and str.join() only accepts strings. How about opening it in text mode instead:

with open('C:/Users/Alberto/nutrients.json', 'rU') as f:
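To make the difference concrete, here is a minimal sketch that writes a small line-delimited file, shows the join failing on bytes, and then rebuilds the JSON array string in text mode. The sample records and the temporary file are made up for illustration. Note that the 'U' flag was deprecated in Python 3 and removed in Python 3.11, so the sketch uses plain 'r', which already applies universal newlines in text mode.

```python
import json
import os
import tempfile

# Hypothetical sample data: one JSON object per line, as in the question.
sample = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Binary mode yields bytes, which str.join() rejects with a TypeError.
    with open(path, "rb") as f:
        raw_lines = f.readlines()
    try:
        ",".join(raw_lines)
        join_failed = False
    except TypeError:
        join_failed = True

    # Text mode yields str, so the same join works.
    with open(path, "r", encoding="utf-8") as f:
        text_lines = [line.rstrip() for line in f]

# Wrap the comma-separated objects in brackets to form one JSON array.
data_json_str = "[" + ",".join(text_lines) + "]"
records = json.loads(data_json_str)
```

The resulting data_json_str is a str, so it can be parsed by json.loads or by pandas.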

cs95 · Answer

Judging from your code, it looks like you are loading a file that contains one JSON object per line. read_json supports a lines argument for data like this:

data_df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)

Note
Remove lines=True if you have a single JSON object instead of individual JSON objects on each line.
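As a small illustration of the two cases, the sketch below parses the same records once as line-delimited JSON and once as a single JSON array. The sample strings are hypothetical and stand in for the real file; io.StringIO is used only so the example runs without touching the disk.

```python
import io

import pandas as pd

# Hypothetical sample data, not the nutrients.json file from the question.
json_lines = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
json_array = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'

# One JSON object per line: lines=True is required.
df_lines = pd.read_json(io.StringIO(json_lines), lines=True)

# A single JSON document (here an array of objects): omit lines=True.
df_array = pd.read_json(io.StringIO(json_array))
```

Both calls should produce a two-row frame with the columns id and name.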

James Doepp - pihentagyu · Answer

Using the json module, you can parse the JSON into a Python object and then create a DataFrame from it:

import json
import pandas as pd

with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = json.load(f)

df = pd.DataFrame(data)
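One caveat worth checking before relying on this approach: json.load expects the whole file to be a single JSON document. The sketch below, using made-up sample strings, suggests what happens in each case. If the file holds one object per line, as the question's code implies, json.load raises a json.JSONDecodeError.

```python
import io
import json

# Hypothetical samples: the same records as a single document and as
# line-delimited JSON.
single_document = '[{"id": 1}, {"id": 2}]'
line_delimited = '{"id": 1}\n{"id": 2}\n'

# A single JSON document parses in one call.
data = json.load(io.StringIO(single_document))

# Line-delimited input is not one valid document, so json.load fails
# when it reaches the second object.
try:
    json.load(io.StringIO(line_delimited))
    parsed_whole = True
except json.JSONDecodeError:
    parsed_whole = False
```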

A.Emad · Answer

If you want to convert it into an array of JSON objects, I think this will do what you want:

import json

data = []
with open('nutrients.json', errors='ignore') as f:
    for line in f:
        data.append(json.loads(line))

print(data[0])