Comment prévisualiser une partie d'un grand pandas DataFrame, dans le cahier iPython?

Question

Je commence tout juste à utiliser pandas dans le cahier IPython et rencontre le problème suivant: lorsqu'un DataFrame lu à partir d'un fichier CSV est petit, le cahier IPython l'affiche dans un fichier Nice. vue table. Quand le DataFrame est grand, quelque chose comme ceci est ouput:

In [27]: evaluation = readCSV("evaluation_MO_without_VNS_quality.csv").filter(["solver", "instance", "runtime", "objective"]) In [37]: evaluation Out[37]: <class 'pandas.core.frame.DataFrame'> Int64Index: 333 entries, 0 to 332 Data columns: solver 333 non-null values instance 333 non-null values runtime 333 non-null values objective 333 non-null values dtypes: int64(1), object(3)

Je voudrais voir une petite partie du bloc de données sous forme de tableau juste pour m'assurer qu'il est dans le bon format. Quelles sont mes options?

DSM · Accepted Answer

Dans ce cas, où le DataFrame est long mais pas trop large, vous pouvez simplement le trancher:

>>> df = pd.DataFrame({"A": range(1000), "B": range(1000)}) >>> df <class 'pandas.core.frame.DataFrame'> Int64Index: 1000 entries, 0 to 999 Data columns: A 1000 non-null values B 1000 non-null values dtypes: int64(2) >>> df[:5] A B 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4

ix est obsolète.

Si c'est large et long, j'ai tendance à utiliser .ix:

>>> df = pd.DataFrame({i: range(1000) for i in range(100)}) >>> df.ix[:5, :10] 0 1 2 3 4 5 6 7 8 9 10 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5

tagoma · Answer

# Say you have a df object containing your dataframe df.head(5) # will print out the first 5 rows df.tail(5) # will print out the 5 last rows # Note: it is similar to R

bigbug · Answer

J'écris une méthode pour montrer les quatre coins des données et monkey-patch à dataframe pour le faire:

def _sw(df, up_rows=10, down_rows=5, left_cols=4, right_cols=3, return_df=False): ''' display df data at four corners A,B (up_pt) C,D (down_pt) parameters : up_rows=10, down_rows=5, left_cols=4, right_cols=3 usage: df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10]) df.sw(5,2,3,2) df1 = df.set_index(['A','B'], drop=True, inplace=False) df1.sw(5,2,3,2) ''' #pd.set_printoptions(max_columns = 80, max_rows = 40) ncol, nrow = len(df.columns), len(df) # handle columns if ncol <= (left_cols + right_cols) : up_pt = df.ix[0:up_rows, :] # screen width can contain all columns down_pt = df.ix[-down_rows:, :] else: # screen width can not contain all columns pt_a = df.ix[0:up_rows, 0:left_cols] pt_b = df.ix[0:up_rows, -right_cols:] pt_c = df[-down_rows:].ix[:,0:left_cols] pt_d = df[-down_rows:].ix[:,-right_cols:] up_pt = pt_a.join(pt_b, how='inner') down_pt = pt_c.join(pt_d, how='inner') up_pt.insert(left_cols, '..', '..') down_pt.insert(left_cols, '..', '..') overlap_qty = len(up_pt) + len(down_pt) - len(df) down_pt = down_pt.drop(down_pt.index[range(overlap_qty)]) # remove overlap rows dt_str_list = down_pt.to_string().split('
') # transfer down_pt to string list # Display up part data print up_pt start_row = (1 if df.index.names[0] is None else 2) # start from 1 if without index # Display omit line if screen height is not enought to display all rows if overlap_qty < 0: print "." * len(dt_str_list[start_row]) # Display down part data row by row for line in dt_str_list[start_row:]: print line # Display foot note print "
" print "Index :",df.index.names print "Column:",",".join(list(df.columns.values)) print "row: %d col: %d"%(len(df), len(df.columns)) print "
" return (df if return_df else None) DataFrame.sw = _sw #add a method to DataFrame class

Voici l'échantillon:

>>> df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10]) >>> df.sw() A B C D .. H I J 0 -0.8166 0.0102 0.0215 -0.0307 .. -0.0820 1.2727 0.6395 1 1.0659 -1.0102 -1.3960 0.4700 .. 1.0999 1.1222 -1.2476 2 0.4347 1.5423 0.5710 -0.5439 .. 0.2491 -0.0725 2.0645 3 -1.5952 -1.4959 2.2697 -1.1004 .. -1.9614 0.6488 -0.6190 4 -1.4426 -0.8622 0.0942 -0.1977 .. -0.7802 -1.1774 1.9682 5 1.2526 -0.2694 0.4841 -0.7568 .. 0.2481 0.3608 -0.7342 6 0.2108 2.5181 1.3631 0.4375 .. -0.1266 1.0572 0.3654 7 -1.0617 -0.4743 -1.7399 -1.4123 .. -1.0398 -1.4703 -0.9466 8 -0.5682 -1.3323 -0.6992 1.7737 .. 0.6152 0.9269 2.1854 9 0.2361 0.4873 -1.1278 -0.2251 .. 1.4232 2.1212 2.9180 10 2.0034 0.5454 -2.6337 0.1556 .. 0.0016 -1.6128 -0.8093 .............................................................. 15 1.4091 0.3540 -1.3498 -1.0490 .. 0.9328 0.3668 1.3948 16 0.4528 -0.3183 0.4308 -0.1818 .. 0.1295 1.2268 0.1365 17 -0.7093 1.3991 0.9501 2.1227 .. -1.5296 1.1908 0.0318 18 1.7101 0.5962 0.8948 1.5606 .. -0.6862 0.9558 -0.5514 19 1.0329 -1.2308 -0.6896 -0.5112 .. 0.2719 1.1478 -0.1459 Index : [None] Column: A,B,C,D,E,F,G,H,I,J row: 20 col: 10 >>> df.sw(4,2,3,4) A B C .. G H I J 0 -0.8166 0.0102 0.0215 .. 0.3671 -0.0820 1.2727 0.6395 1 1.0659 -1.0102 -1.3960 .. 1.0984 1.0999 1.1222 -1.2476 2 0.4347 1.5423 0.5710 .. 1.6675 0.2491 -0.0725 2.0645 3 -1.5952 -1.4959 2.2697 .. 0.4856 -1.9614 0.6488 -0.6190 4 -1.4426 -0.8622 0.0942 .. -0.0947 -0.7802 -1.1774 1.9682 .............................................................. 18 1.7101 0.5962 0.8948 .. -0.8592 -0.6862 0.9558 -0.5514 19 1.0329 -1.2308 -0.6896 .. -0.3954 0.2719 1.1478 -0.1459 Index : [None] Column: A,B,C,D,E,F,G,H,I,J row: 20 col: 10

ajp619 · Answer

Voici un moyen rapide de prévisualiser une grande table sans la laisser trop large:

Fonction d'affichage:

# display large dataframes in an html iframe def ldf_display(df, lines=500): txt = ("<iframe " + "srcdoc='" + df.head(lines).to_html() + "' " + "width=1000 height=500>" + "</iframe>") return IPython.display.HTML(txt)

Maintenant, lancez ceci dans n'importe quelle cellule:

ldf_display(large_dataframe)

Cela convertira le dataframe en HTML, puis l'affichera dans un iframe. L'avantage est que vous pouvez contrôler la taille de la sortie et disposer de barres de défilement facilement accessibles.

Travaillé pour mes besoins, peut-être que cela aidera quelqu'un d'autre.

Bang Shen · Answer

Pour voir les n premières lignes de DataFrame:

df.head(n) # (n=5 by default)

Pour voir les n dernières lignes:

df.tail(n)

Bang Shen · Answer

Vous pouvez simplement utiliser nrows. Par exemple

pd.read_csv('data.csv',nrows=6)

montrera les 6 premières lignes de data.csv.

HeadAndTail · Answer

Dans Python pandas, fournissez head () et tail ()) pour imprimer les données de tête et de queue, respectivement.

import pandas as pd train = pd.read_csv('file_name') train.head() # it will print 5 head row data as default value is 5 train.head(n) # it will print n head row data train.tail() #it will print 5 tail row data as default value is 5 train.tail(n) #it will print n tail row data

bigbug · Answer

Mettez-en à jour une pour générer une chaîne et tenez-en compte pour Pandas0.13 +

def _sw2(df, up_rows=5, down_rows=3, left_cols=4, right_cols=2, return_df=False): """ return df data display string at four corners A,B (up_pt) C,D (down_pt) parameters : up_rows=10, down_rows=5, left_cols=4, right_cols=3 usage: df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10]) df.sw(5,2,3,2) df1 = df.set_index(['A','B'], drop=True, inplace=False) df1.sw(5,2,3,2) """ #pd.set_printoptions(max_columns = 80, max_rows = 40) nrow, ncol = df.shape #ncol, nrow = len(df.columns), len(df) # handle columns if ncol <= (left_cols + right_cols) : up_pt = df.ix[0:up_rows, :] # screen width can contain all columns down_pt = df.ix[-down_rows:, :] else: # screen width can not contain all columns pt_a = df.ix[0:up_rows, 0:left_cols] pt_b = df.ix[0:up_rows, -right_cols:] pt_c = df[-down_rows:].ix[:,0:left_cols] pt_d = df[-down_rows:].ix[:,-right_cols:] up_pt = pt_a.join(pt_b, how='inner') down_pt = pt_c.join(pt_d, how='inner') up_pt.insert(left_cols, '..', '..') down_pt.insert(left_cols, '..', '..') overlap_qty = len(up_pt) + len(down_pt) - len(df) down_pt = down_pt.drop(down_pt.index[range(overlap_qty)]) # remove overlap rows dt_str_list = down_pt.to_string().split('
') # transfer down_pt to string list # Display up part data ds = up_pt.__str__() #get rid of ending part of Pandas0.13+ display string by finding the last 3 '
', ugly though Display_str = ds[0:ds[0:ds[0:ds.rfind('
')].rfind('
')].rfind('
')] #refer to http://stackoverflow.com/questions/4664850/find-all-occurrences-of-a-substring-in-python start_row = (1 if df.index.names[0] is None else 2) # start from 1 if without index # Display omit line if screen height is not enought to display all rows if overlap_qty < 0: Display_str += "
" Display_str += "." * len(dt_str_list[start_row]) Display_str += "
" # Display down part data row by row for line in dt_str_list[start_row:]: Display_str += "
" Display_str += line # Display foot note Display_str += "

" Display_str += "Index : %s
"%str(df.index.names) col_name_list = list(df.columns.values) if ncol < 10: col_name_str = ", ".join(col_name_list) else: col_name_str = ", ".join(col_name_list[0:7]) + ' ... ' + ", ".join(col_name_list[-2:]) Display_str = Display_str + "Column: " + col_name_str + "
" Display_str = Display_str + "row: %d col: %d"%(nrow, ncol) + " " dty_dict={} #simulate defaultdict for k,g in itertools.groupby(list(df.dtypes.values)): #http://stackoverflow.com/questions/13565248/grouping-the-same-recurring-items-that-occur-in-a-row-from-list/13565414#13565414 try: dty_dict[k] = dty_dict[k] + len(list(g)) except: dty_dict[k] = len(list(g)) for key in dty_dict: Display_str += "{0}: {1} ".format(key, dty_dict[key]) Display_str += "

" return (df if return_df else Display_str)

Lov Verma · Answer

Pour ne voir que les premières entrées, vous pouvez utiliser la fonction pandas head qui est utilisée comme

dataframe.head(any number) // default is 5 dataframe.head(n=value)

ou vous pouvez aussi vous trancher à cet effet, ce qui peut également donner le même résultat,

dataframe[:n]

Pour voir les dernières entrées, vous pouvez utiliser pandas tail () de la même manière,

dataframe.tail(any number) // default is 5 dataframe.tail(n=value)

Paul Sochacki · Answer

J'ai trouvé l'approche suivante la plus efficace pour l'échantillonnage d'un DataFrame:

print(df[A:B]) ## 'A' and 'B' are the first and last records in range

Par exemple, print(df[10:15]) imprimera les lignes 10 à 15 - incluses - de votre jeu de données.

John Khalaf · Answer

Cette ligne vous permet de voir toutes les lignes (jusqu'au nombre que vous avez défini comme "max_rows") sans qu'aucune ligne ne soit masquée par les points (".....") qui apparaissent normalement entre la tête et la queue de la sortie imprimée. .

pd.options.display.max_rows = 500