J'apprends à utiliser Imputer sur Python.
Ceci est mon code:
df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])
df.columns=["size", "price", "color", "class", "boh"]
from sklearn.preprocessing import Imputer
imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df["price"])
df["price"]=imp.transform(df["price"])
Toutefois, l'erreur suivante s'affiche: ValueError: la longueur des valeurs ne correspond pas à la longueur de l'index
Qu'est ce qui ne va pas avec mon code???
Merci pour ton aide
Ceci est dû au fait que Imputer
utilise généralement DataFrames plutôt que Series. Une solution possible est:
imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])
df["price"]=imp.transform(df[["price"]]).ravel()
# Or even
imp=Imputer(missing_values="NaN", strategy="mean" )
df["price"]=imp.fit_transform(df[["price"]]).ravel()
Je pense que vous voulez spécifier l’axe de l’imputer, puis transposer le tableau qu’il renvoie:
import pandas as pd
import numpy as np
df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])
df.columns=["size", "price", "color", "class", "boh"]
from sklearn.preprocessing import Imputer
imp=Imputer(missing_values="NaN", strategy="mean",axis=1 ) #specify axis
q = imp.fit_transform(df["price"]).T #perform a transpose operation
df["price"]=q
print df
La solution simple est de fournir un tableau 2D
df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])
df.columns=["size", "price", "color", "class", "boh"]
from sklearn.preprocessing import Imputer
imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])
df["price"]=imp.transform(df[["price"]])
df['boh'] = imp.fit_transform(df[['price']])
Voici votre DataFrame