I have a Hadoop/YARN cluster set up on AWS, with one master and 3 slaves. I verified that 3 live nodes are running on ports 50070 and 8088. I tested a Spark job in client deploy mode and everything works fine.
When I try to submit a Spark job using ./spark-2.1.1-bin-hadoop2.7/bin/spark-submit --master yarn --deploy-mode cluster ip.py, I get the following error:
Diagnostics: File does not exist: hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip
java.io.FileNotFoundException: File does not exist:
hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip
17/05/28 18:58:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/28 18:58:33 INFO client.RMProxy: Connecting to ResourceManager at ec2-54-153-50-11.us-west-1.compute.amazonaws.com/172.31.5.235:8032
17/05/28 18:58:34 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
17/05/28 18:58:34 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/05/28 18:58:34 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
17/05/28 18:58:34 INFO yarn.Client: Setting up container launch context for our AM
17/05/28 18:58:34 INFO yarn.Client: Setting up the launch environment for our AM container
17/05/28 18:58:34 INFO yarn.Client: Preparing resources for our AM container
17/05/28 18:58:36 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/05/28 18:58:41 INFO yarn.Client: Uploading resource file:/tmp/spark-fbd6d435-9abe-4396-838e-60f19bc2dc14/__spark_libs__1200479165381142167.zip -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip
17/05/28 18:58:45 INFO yarn.Client: Uploading resource file:/home/ubuntu/ip.py -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/ip.py
17/05/28 18:58:45 INFO yarn.Client: Uploading resource file:/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/pyspark.zip
17/05/28 18:58:45 INFO yarn.Client: Uploading resource file:/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/py4j-0.10.4-src.zip
17/05/28 18:58:45 INFO yarn.Client: Uploading resource file:/tmp/spark-fbd6d435-9abe-4396-838e-60f19bc2dc14/__spark_conf__7895841687984145748.zip -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_conf__.zip
17/05/28 18:58:46 INFO spark.SecurityManager: Changing view acls to: ubuntu
17/05/28 18:58:46 INFO spark.SecurityManager: Changing modify acls to: ubuntu
17/05/28 18:58:46 INFO spark.SecurityManager: Changing view acls groups to:
17/05/28 18:58:46 INFO spark.SecurityManager: Changing modify acls groups to:
17/05/28 18:58:46 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
17/05/28 18:58:46 INFO yarn.Client: Submitting application application_1495996836198_0003 to ResourceManager
17/05/28 18:58:46 INFO impl.YarnClientImpl: Submitted application application_1495996836198_0003
17/05/28 18:58:47 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:47 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1495997926073
final status: UNDEFINED
tracking URL: http://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:8088/proxy/application_1495996836198_0003/
user: ubuntu
17/05/28 18:58:48 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:49 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:50 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:51 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:52 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:53 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:54 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:55 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:56 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:57 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:58 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:59 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:59:00 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:59:01 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:59:02 INFO yarn.Client: Application report for application_1495996836198_0003 (state: FAILED)
17/05/28 18:59:02 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1495996836198_0003 failed 2 times due to AM Container for appattempt_1495996836198_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:8088/cluster/app/application_1495996836198_0003Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip
java.io.FileNotFoundException: File does not exist: hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:421)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1495997926073
final status: FAILED
tracking URL: http://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:8088/cluster/app/application_1495996836198_0003
user: ubuntu
Exception in thread "main" org.apache.spark.SparkException: Application application_1495996836198_0003 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1180)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1226)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/05/28 18:59:02 INFO util.ShutdownHookManager: Shutdown hook called
17/05/28 18:59:02 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-fbd6d435-9abe-4396-838e-60f19bc2dc14
ubuntu@ip-172-31-5-235:~$
I had set the master to local (.setMaster('local')) in my source file. After removing that, everything works fine.
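For illustration, a minimal sketch of that fix, assuming a PySpark script (the app name is a placeholder): leave the master out of the source so that spark-submit --master yarn controls it.

    from pyspark import SparkConf, SparkContext

    # No .setMaster('local') here: in cluster mode the master must come
    # from spark-submit (--master yarn), not from the source file.
    conf = SparkConf().setAppName('my-app')  # placeholder app name
    sc = SparkContext(conf=conf)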
I also had this problem. I tried the solution of removing setMaster('local') from the source, but the error did not go away.
What finally solved my problem was actually quite simple: the SparkContext had to be initialized first (even before variables unrelated to Spark).
Based on the message above, here is an example in Python; the same logic worked for me in Scala.
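A minimal sketch of that ordering, assuming PySpark (the app name and input path are placeholders): create the SparkContext before any non-Spark variables.

    from pyspark import SparkContext
    import numpy as np

    # Initialize the SparkContext first, before any non-Spark work.
    sc = SparkContext(appName='example')  # master comes from spark-submit
    # Only afterwards load data or set up other variables.
    foo = np.genfromtxt('data.txt')  # placeholder input file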
Hi, if I follow your suggestions, it works.
Our code was like this:

    import numpy as np
    from pyspark import SparkContext
    foo = np.genfromtxt(xxxxx)
    sc = SparkContext(...)
    # compute

===> This fails ...

    import numpy as np
    from pyspark import SparkContext
    sc = SparkContext(...)
    foo = np.genfromtxt(xxxxx)
    # compute

===> This works perfectly ...

Note that I also removed the setMaster('local'), since it makes sense that it would interfere as well.