KeyError: ('count', 'occurred at index 0')

Question

I'm trying to follow the example in Use Python & Pandas to Create a D3 Force Directed Network Diagram.

But on the line below I get the error KeyError: ('count', 'occurred at index 0'):

 temp_links_list = list(grouped_src_dst.apply(lambda row: {"source": row['source'], "target": row['target'], "value": row['count']}, axis=1))

I'm new to Python. What is the problem here?

Edited code

import pandas as pd
import json
import re

pcap_data = pd.read_csv('C:\packet_metadata.csv', index_col='No.')
dataframe = pcap_data
src_dst = dataframe[["Source","Destination"]]
src_dst.rename(columns={"Source":"source","Destination":"target"}, inplace=True)
grouped_src_dst = src_dst.groupby(["source","target"]).size().reset_index()
grouped_src_dst.rename(columns={'count':'value'}).to_dict(orient='records')
unique_ips = pd.Index(grouped_src_dst['source']
                      .append(grouped_src_dst['target'])
                      .reset_index(drop=True).unique())

But

print(grouped_src_dst.columns.tolist())
['source', 'target', 0]

Final code

import pandas as pd
import json
import re

pcap_data = pd.read_csv('C:\packet_metadata.csv', index_col='No.')
dataframe = pcap_data
src_dst = dataframe[["Source","Destination"]]
src_dst.sample(10)
grouped_src_dst = src_dst.groupby(["Source","Destination"]).size().reset_index()
d={0:'value',"Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d)
unique_ips = pd.Index(L['source']
                      .append(L['target'])
                      .reset_index(drop=True).unique())
group_dict = {}
counter = 0
for ip in unique_ips:
    breakout_ip = re.match("^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$", ip)
    if breakout_ip:
        net_id = '.'.join(breakout_ip.group(1,2,3))
        if net_id not in group_dict:
            counter += 1
            group_dict[net_id] = counter
        else:
            pass
temp_links_list = list(L.apply(lambda row: {"source": row['source'], "target": row['target'], "value": row['value']}, axis=1))

jezrael · Accepted Answer

I think the problem is the name of the column count: it is either missing or contains a space, like ' count'.

#check columns names
print (grouped_src_dst.columns.tolist())
['count', 'source', 'target']
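
Even when the printed list looks right at a glance, a stray space or a non-string label can hide in it. A minimal sketch, assuming a small made-up frame (its column labels are hypothetical, not taken from the question's CSV), that shows each label through repr and strips whitespace from string labels only:

```python
import pandas as pd

# Hypothetical frame: one label has a leading space, one is the integer 0
df = pd.DataFrame({"source": ["a"], "target": ["b"], " count": [3], 0: [7]})

# repr() makes stray whitespace and non-string labels visible
labels = [repr(c) for c in df.columns]
print(labels)

# Strip whitespace from string labels; leave other labels (such as 0) untouched
cleaned = df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)
print(cleaned.columns.tolist())
```

After this, row['count'] would resolve for the ' count' column, while the integer label 0 still needs an explicit rename.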

Sample:

grouped_src_dst = pd.DataFrame({'source':['a','s','f'],
                                'target':['b','n','m'],
                                'count':[0,8,4]})
print (grouped_src_dst)
   count source target
0      0      a      b
1      8      s      n
2      4      f      m

f = lambda row: {"source": row['source'], "target": row['target'], "value": row['count']}
temp_links_list = list(grouped_src_dst.apply(f, axis=1))
print (temp_links_list)
[{'value': 0, 'source': 'a', 'target': 'b'}, {'value': 8, 'source': 's', 'target': 'n'}, {'value': 4, 'source': 'f', 'target': 'm'}]

The simplest solution is to rename the column count and use DataFrame.to_dict:

print (grouped_src_dst.rename(columns={'count':'value'}).to_dict(orient='records'))
[{'value': 0, 'source': 'a', 'target': 'b'}, {'value': 8, 'source': 's', 'target': 'n'}, {'value': 4, 'source': 'f', 'target': 'm'}]

EDIT1:

pcap_data = pd.read_csv('C:\packet_metadata.csv', index_col='No.')
grouped_src_dst = pcap_data.groupby(["Source","Destination"]).size().reset_index()
d = {0:'value', "Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d).to_dict(orient='records')

Sample:

pcap_data = pd.DataFrame({'Source':list('aabbccdd'),
                          'Destination':list('eertffff')})
print (pcap_data)
  Destination Source
0           e      a
1           e      a
2           r      b
3           t      b
4           f      c
5           f      c
6           f      d
7           f      d

grouped_src_dst = pcap_data.groupby(["Source","Destination"]).size().reset_index()
print (grouped_src_dst)
  Source Destination  0
0      a           e  2
1      b           r  1
2      b           t  1
3      c           f  2
4      d           f  2

d = {0:'value', "Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d).to_dict(orient='records')
print (L)
[{'value': 2, 'source': 'a', 'target': 'e'}, {'value': 1, 'source': 'b', 'target': 'r'}, {'value': 1, 'source': 'b', 'target': 't'}, {'value': 2, 'source': 'c', 'target': 'f'}, {'value': 2, 'source': 'd', 'target': 'f'}]
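
The counts column is labelled 0 because size() returns an unnamed Series, and reset_index() then falls back to 0 for the values column. As an alternative sketch, Series.reset_index accepts a name argument, so the counts can be named directly and only the two key columns need renaming afterwards. The data mirrors the sample frame above; the variable name links is only illustrative:

```python
import pandas as pd

pcap_data = pd.DataFrame({'Source': list('aabbccdd'),
                          'Destination': list('eertffff')})

# Name the counts column while turning the grouped Series back into a frame
links = (pcap_data.groupby(['Source', 'Destination'])
                  .size()
                  .reset_index(name='value')
                  .rename(columns={'Source': 'source', 'Destination': 'target'})
                  .to_dict(orient='records'))
print(links)
```

This avoids relying on the integer key 0 in the rename mapping.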

unique_ips = pd.Index(grouped_src_dst['Source']
                      .append(grouped_src_dst['Destination'])
                      .reset_index(drop=True).unique())
print (unique_ips)
Index(['a', 'b', 'c', 'd', 'e', 'r', 't', 'f'], dtype='object')

import numpy as np
unique_ips = np.unique(grouped_src_dst[['Source','Destination']].values.ravel()).tolist()
print (unique_ips)
['a', 'b', 'c', 'd', 'e', 'f', 'r', 't']