
Why does DataFrame.saveAsTable("df") save the table to a different HDFS host?

I have set up Hive (1.13.1) with Spark (1.4.0) and I can access all the Hive databases and tables; my warehouse directory is hdfs://192.168.1.17:8020/user/Hive/warehouse.

But when I try to save a DataFrame to Hive from spark-shell (connected to the master) using df.saveAsTable("df"), I get this error:

15/07/03 14:48:59 INFO audit: ugi=user  ip=unknown-ip-addr  cmd=get_database: default   
15/07/03 14:48:59 INFO HiveMetaStore: 0: get_table : db=default tbl=df
15/07/03 14:48:59 INFO audit: ugi=user  ip=unknown-ip-addr  cmd=get_table : db=default tbl=df   
java.net.ConnectException: Call From bdiuser-Vostro-3800/127.0.1.1 to 192.168.1.19:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1414)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:699)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1762)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:78)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:332)
    at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:239)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:211)
    at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1517)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
    at $iwC$$iwC$$iwC.<init>(<console>:35)
    at $iwC$$iwC.<init>(<console>:37)
    at $iwC.<init>(<console>:39)
    at <init>(<console>:41)
    at .<init>(<console>:45)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
    at org.apache.hadoop.ipc.Client.call(Client.java:1381)
    ... 86 more

Reading through this error, I found that the program tried to connect to a different HDFS host in order to save the table.

I also tried the Spark shell from another worker and got the same error.
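
One thing worth checking (a diagnostic sketch, not part of the original question) is which namenode the spark-shell session is actually configured against: the Hive metastore records an absolute location URI per database and table, so a location captured while the namenode lived at 192.168.1.19 would explain the connection attempt above.

println(sc.hadoopConfiguration.get("fs.defaultFS"))  // default filesystem this Spark application resolves paths against
println(sqlContext.getConf("hive.metastore.warehouse.dir", "<not set via Spark>"))  // only shows a value if it was set through Spark; otherwise check hive-site.xml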

6
Kaushal

With saveAsTable, the default location that Spark saves to is controlled by the HiveMetastore (based on the docs). Another option would be to use saveAsParquetFile and specify the path, then register that path with your Hive metastore later, OR use the new DataFrameWriter interface and specify the path option: write.format(source).mode(mode).options(options).saveAsTable(tableName).
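
For example, a minimal sketch of the DataFrameWriter variant described above (Spark 1.4 API); the table name and path are illustrative and reuse the warehouse URI from the question:

import org.apache.spark.sql.SaveMode

// Write the data under an explicit HDFS path and register the table "df" in the Hive metastore.
df.write
  .format("parquet")
  .mode(SaveMode.Overwrite)
  .option("path", "hdfs://192.168.1.17:8020/user/Hive/warehouse/df")
  .saveAsTable("df")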

11
Holden

Please find the example below:

import org.apache.spark.sql.SaveMode  // required for SaveMode.Append
// `result` is the DataFrame to save; hiveTablePath and hiveTable are the target HDFS path and table name
val options = Map("path" -> hiveTablePath)
result.write.format("orc").partitionBy("partitiondate").options(options).mode(SaveMode.Append).saveAsTable(hiveTable)

I have explained this in a bit more detail on my blog.

21
Deepika Khera

You can write the Spark DataFrame into an existing Spark table.

Please find the example below:

df.write.mode("overwrite").saveAsTable("database.tableName")

4
Sareesh Krishnan