I have configured Hive (1.13.1) with Spark (1.4.0) and I am able to access all the Hive databases and tables; my warehouse directory is hdfs://192.168.1.17:8020/user/hive/warehouse.
But when I try to save a DataFrame to Hive from spark-shell (using the master) with df.saveAsTable("df"), I get this error:
15/07/03 14:48:59 INFO audit: ugi=user ip=unknown-ip-addr cmd=get_database: default
15/07/03 14:48:59 INFO HiveMetaStore: 0: get_table : db=default tbl=df
15/07/03 14:48:59 INFO audit: ugi=user ip=unknown-ip-addr cmd=get_table : db=default tbl=df
java.net.ConnectException: Call From bdiuser-Vostro-3800/127.0.1.1 to 192.168.1.19:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:699)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1762)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:78)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:332)
at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:239)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1517)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC.<init>(<console>:37)
at $iwC.<init>(<console>:39)
at <init>(<console>:41)
at .<init>(<console>:45)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 86 more
Stepping through this error, I found that the program was trying to connect to a different HDFS host (192.168.1.19, while my warehouse sits on 192.168.1.17) to save the table. I also tried a spark-shell from another worker and got the same error.
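One way to confirm the mismatch is to check which NameNode the driver's Hadoop configuration actually points at. Below is a minimal sketch for spark-shell; it assumes the NameNode really listens on 192.168.1.17:8020, as the warehouse URI suggests:
// In spark-shell, `sc` is the pre-built SparkContext.
// Print the default filesystem URI the driver resolves.
println(sc.hadoopConfiguration.get("fs.defaultFS")) // "fs.default.name" on older Hadoop configs
// If this prints hdfs://192.168.1.19:8020, the client-side core-site.xml is stale;
// update fs.defaultFS to hdfs://192.168.1.17:8020 on every node that runs a shell,
// then restart spark-shell.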
With saveAsTable, the default location that Spark saves to is controlled by the HiveMetastore (based on the docs). Another option would be to use saveAsParquetFile and specify the path, then register that path with your Hive metastore later, OR use the new DataFrameWriter interface and specify the path option: write.format(source).mode(mode).options(options).saveAsTable(tableName).
Please find an example below:
// Supply an explicit warehouse path so the table data lands where you expect.
val options = Map("path" -> hiveTablePath)
result.write.format("orc").partitionBy("partitiondate").options(options).mode(SaveMode.Append).saveAsTable(hiveTable)
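For the saveAsParquetFile route mentioned above, here is a minimal sketch; the path, table name, and column list are illustrative, not from the original setup:
// Write the DataFrame as Parquet files at an explicit HDFS location.
df.write.parquet("hdfs://192.168.1.17:8020/user/hive/warehouse/df_parquet")
// Register that location with the Hive metastore as an external table
// (sqlContext must be a HiveContext, which it is in a Hive-enabled spark-shell).
sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS df_parquet (id INT, name STRING) " +
  "STORED AS PARQUET LOCATION 'hdfs://192.168.1.17:8020/user/hive/warehouse/df_parquet'")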
You can also write a Spark DataFrame into an existing Hive table. Please find an example below:
df.write.mode("overwrite").saveAsTable("database.tableName")