Where do I specify the master URL in Spark?
14/05/30 16:04:23 ERROR UserGroupInformation: PriviledgedActionException as:jnleec (auth:SIMPLE) cause:java.lang.reflect.InvocationTargetException
Exception in thread "Thread-3" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1504)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:159)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:165)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
... 2 more
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:113)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
at com.test.JavaSparkPi.main(JavaSparkPi.java:21)
... 12 more
Exception in thread "main" java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:105)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:443)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
14/05/30 16:04:33 INFO ApplicationMaster: AppMaster received a signal.
The error above includes the line "org.apache.spark.SparkException: A master URL must be set in your configuration". So how do I specify a master URL?
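The conf/spark-env.sh configuration that @牟小峰 mentioned sets up Spark's standalone environment, much like configuring the HDFS environment for Hadoop. You still need to specify the master's location when you deploy a program.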
If you choose the standalone deploy mode and deploy to the cluster you configured, you can specify MASTER=spark://ubuntu:7070
Now to the question of where the master URL is specified in Spark:
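1. Through the Spark shell. Running the following command opens the interactive prompt:
MASTER=spark://IP:PORT ./bin/spark-shell
2. Inside the program, through SparkConf:
val conf = new SparkConf()
  .setMaster(...)
val sc = new SparkContext(conf)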
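The master URL passed to Spark can take one of the following forms:
local  Run locally with a single thread.
local[K]  Run locally with multiple threads (K cores).
local[*]  Run locally with multiple threads (all available cores).
spark://HOST:PORT  Connect to the given Spark standalone cluster master; the port must be specified.
mesos://HOST:PORT  Connect to the given Mesos cluster; the port must be specified.
yarn-client  Connect to a YARN cluster in client mode; HADOOP_CONF_DIR must be configured.
yarn-cluster  Connect to a YARN cluster in cluster mode; HADOOP_CONF_DIR must be configured.
Note that from Spark 1.0 onward, the way a program is submitted to a cluster is very different:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  ... # other options
  <application-jar> \
  [application-arguments]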
For example:
# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100
# Run on a Spark standalone cluster
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
# Run on a YARN cluster
# (--master can also be `yarn-client` for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
# Run a Python application on a cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000
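The master values used in this thread (local, local[K], local[*], spark://HOST:PORT, mesos://HOST:PORT, yarn-client, yarn-cluster) are easy to tell apart by shape, which helps when checking a value before handing it to Spark. Below is a minimal sketch of such a check. The class name MasterUrlKind, the method describe, and both regular expressions are made up for illustration; this is not how Spark itself parses the master URL.

```java
import java.util.regex.Pattern;

public class MasterUrlKind {
    // Hypothetical patterns for illustration only; not Spark's own parser.
    private static final Pattern LOCAL_N =
        Pattern.compile("local\\[(\\d+|\\*)\\]");
    private static final Pattern HOST_PORT =
        Pattern.compile("(spark|mesos)://[^:/\\s]+:\\d+");

    /** Returns a short description of the master URL form, or "unknown". */
    static String describe(String master) {
        if (master.equals("local")) {
            return "local, single thread";
        }
        if (LOCAL_N.matcher(master).matches()) {
            return "local, multiple threads";
        }
        if (HOST_PORT.matcher(master).matches()) {
            // Both cluster forms require an explicit port.
            return master.startsWith("spark://")
                ? "Spark standalone cluster"
                : "Mesos cluster";
        }
        if (master.equals("yarn-client")) {
            return "YARN, client mode";
        }
        if (master.equals("yarn-cluster")) {
            return "YARN, cluster mode";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String[] samples = {
            "local", "local[8]", "local[*]",
            "spark://207.184.161.138:7077", "yarn-cluster",
            "spark://ubuntu" // no port, so it is reported as unknown
        };
        for (String m : samples) {
            System.out.println(m + " -> " + describe(m));
        }
    }
}
```

A value such as spark://ubuntu, with the port left off, falls through to "unknown", which matches the note in the list that the port must be specified.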
Further reading in the Spark documentation: Cluster Mode Overview, Submitting Applications, Spark Standalone Mode, Running Spark on YARN
In the end I did not change anything in conf/spark-env.sh; instead I changed the code as follows:
SparkConf sparkConf = new SparkConf().setMaster("yarn-standalone").setAppName("JavaSparkPi");
If you are running inside an IDE, you can simply use:
val sc = new SparkContext(new SparkConf().setAppName("xxx").setMaster("local[*]"))
The master can be set inside the code:
val sparkConf = new SparkConf().setMaster("yarn-cluster")
It can also be set on the command line:
spark-submit --master yarn-cluster
bin/spark-submit --master spark://hostname:7077 <your Python script>    (7077 is the port number)
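Since the master can come from the code or from the command line, one way to keep both options open is to avoid hard-coding it. The sketch below is only an illustration: the class MasterFallback and the method resolveMaster are hypothetical. It assumes that the value chosen with --master is visible to the driver as the spark.master property, which is how spark-submit normally hands settings to SparkConf. Keep in mind that a value passed to setMaster in code takes precedence over the command line, so hard-coding local[*] there would override whatever spark-submit was given.

```java
public class MasterFallback {
    /**
     * Picks a master in this order: first program argument,
     * then the spark.master system property, then local[*].
     * The ordering is an assumption made for this sketch.
     */
    static String resolveMaster(String[] args) {
        if (args.length > 0 && !args[0].isEmpty()) {
            return args[0];
        }
        return System.getProperty("spark.master", "local[*]");
    }

    public static void main(String[] args) {
        // In a real job this string would be passed to SparkConf.setMaster.
        System.out.println("master = " + resolveMaster(args));
    }
}
```

Run from an IDE with no arguments, this falls back to local[*]; started with the property set, it picks up the cluster master without a code change.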