Getting a "No loop matching the specified signature and casting" error

Question

I'm a beginner in Python and machine learning. I get the error below when I try to fit data with statsmodels.formula.api OLS.fit():

```
Traceback (most recent call last):

  File "", line 47, in
    regressor_OLS = sm.OLS(y, X_opt).fit()

  File "E:\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 190, in fit
    self.pinv_wexog, singular_values = pinv_extended(self.wexog)

  File "E:\Anaconda\lib\site-packages\statsmodels\tools\tools.py", line 342, in pinv_extended
    u, s, vt = np.linalg.svd(X, 0)

  File "E:\Anaconda\lib\site-packages\numpy\linalg\linalg.py", line 1404, in svd
    u, s, vt = gufunc(a, signature=signature, extobj=extobj)

TypeError: No loop matching the specified signature and casting was found for ufunc svd_n_s
```
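The error can be reproduced without statsmodels. As the traceback shows, `OLS.fit()` ends up in `np.linalg.svd`, which has no loop for NumPy's generic `object` dtype. A minimal sketch, assuming the design matrix holds ordinary numbers but stored with `dtype=object`: the object version is rejected with a `TypeError`, while the same numbers as `float64` decompose fine. The helper `svd_raises_type_error` and the sample values are hypothetical.

```python
import numpy as np


def svd_raises_type_error(a):
    """Return True if np.linalg.svd rejects `a` with a TypeError (hypothetical helper)."""
    try:
        np.linalg.svd(a, full_matrices=False)
    except TypeError:
        return True
    return False


# The same numbers twice: once stored as generic Python objects, once as float64.
X_obj = np.array([[1.0, 8.0, 7.5],
                  [1.0, 6.0, 9.1],
                  [1.0, 7.0, 8.2]], dtype=object)
X_float = X_obj.astype(float)

print(X_obj.dtype, svd_raises_type_error(X_obj))
print(X_float.dtype, svd_raises_type_error(X_float))
```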

Code

```python
#Importing Libraries
import numpy as np # linear algebra
import pandas as pd # data processing
import matplotlib.pyplot as plt #Visualization

#Importing the dataset
dataset = pd.read_csv('Video_Games_Sales_as_at_22_Dec_2016.csv')
#dataset.head(10)

#Encoding categorical data using the pandas get_dummies function. Easier and more straightforward than OneHotEncoder in sklearn
#dataset = pd.get_dummies(data = dataset , columns=['Platform' , 'Genre' , 'Rating' ] , drop_first = True ) #drop_first used to avoid the dummy variable trap

dataset=dataset.replace('tbd',np.nan)

#Separating Independent & Dependent Variables
#X = pd.concat([dataset.iloc[:,[11,13]], dataset.iloc[:,13: ]] , axis=1).values
#Getting important variables
X = dataset.iloc[:,[10,12]].values
y = dataset.iloc[:,9].values #Dependent Variable (Global sales)

#Taking care of missing data
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN' , strategy = 'mean' , axis = 0)
imputer = imputer.fit(X[:,0:2])
X[:,0:2] = imputer.transform(X[:,0:2])

#Splitting the dataset into the Training set and Test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.2 , random_state = 0)

#Fitting Multiple Linear Regression to the Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,y_train)

#Predicting the Test set Result
y_pred = regressor.predict(X_test)

#Building the optimal model using Backward Elimination (p=0.050)
import statsmodels.formula.api as sm
X = np.append(arr = np.ones((16719,1)).astype(float) , values = X , axis = 1)
X_opt = X[:, [0,1,2]]
regressor_OLS = sm.OLS(y , X_opt).fit()
regressor_OLS.summary()
```
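A likely reading of the code above (an assumption, since the dataset is not reproduced here): after `replace('tbd', np.nan)` the score column still holds strings, so `dataset.iloc[:,[10,12]].values` is an `object` array, and writing the imputed values back into it keeps that dtype. In the backward-elimination step, `.astype(float)` is applied to the column of ones, which is already float, not to `X`, so `np.append` returns an `object` array again. A minimal NumPy sketch of that step with made-up numbers; `X.shape[0]` replaces the hard-coded `16719`:

```python
import numpy as np

# Stand-in for the imputed feature matrix: numeric values, but object dtype,
# as happens when a DataFrame column still held strings before imputation.
X = np.array([[76.0, 8.0],
              [82.0, 8.3],
              [80.0, 7.1]], dtype=object)

# The question's step: casting the column of ones does not change X itself.
X_const = np.append(arr=np.ones((X.shape[0], 1)).astype(float), values=X, axis=1)
print(X_const.dtype)  # object: concatenating float64 with object gives object

# Casting the whole design matrix gives a float64 array that np.linalg accepts.
X_opt = X_const[:, [0, 1, 2]].astype(float)
print(X_opt.dtype)  # float64
```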

Dataset

Link to the dataset

I couldn't find anything useful for solving this problem on Stack Overflow or Google.

Victor Sejas · Answer

Try specifying the dtype when the matrix is created. Example:

```python
import numpy as np

a = np.matrix([[1, 2], [3, 4]], dtype='float')
```

Hope this works!
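The same idea works with a plain `np.array`, which NumPy's documentation recommends over the `matrix` class. Requesting `dtype=float` at creation, or converting an existing array with `np.asarray(..., dtype=float)`, yields a `float64` array that `np.linalg.svd` accepts. A minimal sketch; the array `b_obj` is a hypothetical stand-in for data that came out of pandas with `object` dtype:

```python
import numpy as np

# Requesting float at creation time.
a = np.array([[1, 2], [3, 4]], dtype=float)

# Converting an array that already exists with object dtype.
b_obj = np.array([[1, 2], [3, 4]], dtype=object)
b = np.asarray(b_obj, dtype=float)

# The decomposition that failed in the traceback now runs.
u, s, vt = np.linalg.svd(a, full_matrices=False)
print(a.dtype, b.dtype, s.shape)  # float64 float64 (2,)
```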

Muke888 · Answer

As suggested above, you need to make sure that X_opt has a float dtype. In your code, for example, it would look like this:

```python
X_opt = X[:, [0,1,2]]
X_opt = X_opt.astype(float)
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()
```
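To check that the cast alone is enough, without depending on statsmodels, the sketch below fits ordinary least squares with `np.linalg.lstsq`. This is a swapped-in NumPy routine standing in for `sm.OLS(...).fit()`, not what statsmodels runs internally. The object-dtype design matrix (a constant column plus one feature) and the exact line y = 2 + 3x are made-up data; after `astype(float)` the solver runs and recovers the intercept and slope.

```python
import numpy as np

# Object-dtype design matrix (constant column + one feature), like X_opt in the question.
X_opt = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]], dtype=object)
y = np.array([2.0, 5.0, 8.0, 11.0])  # exactly y = 2 + 3 * x

# The fix from the answer: cast to float before fitting.
X_opt = X_opt.astype(float)

# Ordinary least squares via NumPy, standing in for sm.OLS(endog=y, exog=X_opt).fit().
beta, residuals, rank, singular_values = np.linalg.lstsq(X_opt, y, rcond=None)
print(beta)  # approximately [2. 3.]
```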