Where do I specify the master URL in Spark?

14/05/30 16:04:23 ERROR UserGroupInformation: PriviledgedActionException as:jnleec (auth:SIMPLE) cause:java.lang.reflect.InvocationTargetException
Exception in thread "Thread-3" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1504)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:159)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:165)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
... 2 more
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:113)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
at com.test.JavaSparkPi.main(JavaSparkPi.java:21)
... 12 more
Exception in thread "main" java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:105)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:443)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
14/05/30 16:04:33 INFO ApplicationMaster: AppMaster received a signal.

The stack trace above contains the line "org.apache.spark.SparkException: A master URL must be set in your configuration". So how do I specify a master URL?


Configuring conf/spark-env.sh, as @牟小峰 suggested, sets up Spark's standalone environment, much like configuring HDFS sets up Hadoop. But when you deploy an application you still have to tell it where the master is.

If you deploy in standalone mode onto the cluster you configured, you can specify MASTER=spark://ubuntu:7070 (using your own master's host and port).

Here is where the master URL can be specified in Spark:

1. Through spark-shell, which drops you into an interactive session after launching:

MASTER=spark://IP:PORT ./bin/spark-shell
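
Once you are in the shell you can double-check which master the pre-built context actually picked up; this is just a quick sanity check:

// spark-shell creates the SparkContext for you as `sc`;
// its `master` field echoes back whatever MASTER resolved to.
sc.master   // e.g. "spark://IP:PORT", or a local master if MASTER was not set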

2. Inside the program itself (the value can also be passed in as an argument):

val conf = new SparkConf()
.setMaster(...)
val sc = new SparkContext(conf)
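
As a minimal sketch of that idea, with the master taken from the first program argument (the object name, app name and toy job below are only illustrative, not from the original answer):

import org.apache.spark.{SparkConf, SparkContext}

object MasterFromArgs {
  def main(args: Array[String]): Unit = {
    // The first argument carries the master URL, e.g. "spark://ubuntu:7070" or "local[*]".
    val master = if (args.nonEmpty) args(0) else "local[*]"
    val conf = new SparkConf()
      .setAppName("MasterFromArgs")
      .setMaster(master)
    val sc = new SparkContext(conf)

    // Trivial job just to prove the context works.
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}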

The master URL passed to Spark can take any of the following forms (a short sketch follows this list):

local: run locally with a single worker thread

local[K]: run locally with K worker threads

local[*]: run locally with as many worker threads as there are available cores

spark://HOST:PORT: connect to the given Spark standalone cluster master; the port must be specified (7077 by default).

mesos://HOST:PORT: connect to the given Mesos cluster; the port must be specified (5050 by default).

yarn-client: connect to a YARN cluster in client mode; HADOOP_CONF_DIR must be configured.

yarn-cluster: connect to a YARN cluster in cluster mode; HADOOP_CONF_DIR must be configured.
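
To make those forms concrete, here is a small illustrative sketch (the host names and ports are placeholders, not values from the question above):

import org.apache.spark.SparkConf

// Each of these strings is a valid argument to SparkConf.setMaster:
new SparkConf().setMaster("local")                   // single local worker thread
new SparkConf().setMaster("local[4]")                // 4 local worker threads
new SparkConf().setMaster("local[*]")                // one thread per available core
new SparkConf().setMaster("spark://ubuntu:7077")     // standalone master (placeholder host)
new SparkConf().setMaster("mesos://mesos-host:5050") // Mesos master (placeholder host)
new SparkConf().setMaster("yarn-client")             // YARN, client mode (needs HADOOP_CONF_DIR)
new SparkConf().setMaster("yarn-cluster")            // YARN, cluster mode (needs HADOOP_CONF_DIR)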

Starting with Spark 1.0, submitting an application to a cluster works quite differently, so take note:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  ... # other options
  <application-jar> \
  [application-arguments]

For example:

# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
# (`yarn-cluster` can also be `yarn-client` for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

The original poster should spend more time with the official documentation; all of this is spelled out clearly there:

Cluster Mode Overview, Submitting Applications, Spark Standalone Mode, Running Spark on YARN


In the end I did not modify anything in conf/spark-env.sh; instead I changed the code as follows:

SparkConf sparkConf = new SparkConf().setMaster("yarn-standalone").setAppName("JavaSparkPi");

Adding setMaster fixed the problem for me; I hope this helps anyone who runs into the same issue.


If you are running inside an IDE you can simply do:

val sc = new SparkContext(new SparkConf().setAppName("xxx").setMaster("local[*]"))
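
Expanded into a small self-contained program you can run straight from the IDE (the toy job is illustrative only):

import org.apache.spark.{SparkConf, SparkContext}

object LocalIdeRun {
  def main(args: Array[String]): Unit = {
    // local[*] uses every core of the development machine; no cluster needed.
    val sc = new SparkContext(new SparkConf().setAppName("xxx").setMaster("local[*]"))
    println(sc.parallelize(1 to 10).reduce(_ + _))   // toy job to exercise the context
    sc.stop()
  }
}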


The master can be set inside the code:

val sparkConf = new SparkConf().setMaster("yarn-cluster")

It can also be set on the command line:

spark-submit --master yarn-cluster


bin/spark-submit --master spark://hostname:7077 <your .py script>   # 7077 is the master's port

