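Modes:
- yarn-cluster:
The Spark driver runs inside an application master process launched by the YARN cluster, so the client can exit once it has initialized the application.
See: A Complete Walkthrough of Job Execution in Spark on YARN Cluster Mode
- yarn-client:
The Spark driver runs in the client process, and the application master is used only to request resources from YARN.
See: A Complete Walkthrough of Job Execution in Spark on YARN Client Mode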
Deployment Mode Summary
Mode | YARN Client Mode | YARN Cluster Mode |
---|---|---|
Driver runs in | Client | ApplicationMaster |
Requests resources | ApplicationMaster | ApplicationMaster |
Starts executor processes | YARN NodeManager | YARN NodeManager |
Persistent services | YARN ResourceManager and NodeManagers | YARN ResourceManager and NodeManagers |
Supports Spark Shell | Yes | No |
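
As a minimal sketch of how the two modes differ at submission time, the helper below only builds and prints a `spark-submit` command line and does not run it. `build_submit_cmd`, `com.example.Main`, and `app.jar` are hypothetical names used for illustration. It assumes the Spark 1.x convention used in this post, where the mode is encoded in the master URL (`yarn-client` or `yarn-cluster`); later Spark releases express the same choice as `--master yarn --deploy-mode client|cluster`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Spark): print the spark-submit command
# for a given YARN mode without executing it.
build_submit_cmd() {
  local mode="$1" main_class="$2" app_jar="$3"

  # Only the two modes from the table above are accepted.
  case "$mode" in
    client|cluster) ;;
    *)
      echo "unknown mode: $mode (expected client or cluster)" >&2
      return 1
      ;;
  esac

  # Spark 1.x style: the deploy mode is part of the master URL.
  echo "./bin/spark-submit --class $main_class --master yarn-$mode $app_jar"
}

# com.example.Main and app.jar are placeholder values.
build_submit_cmd cluster com.example.Main app.jar
build_submit_cmd client com.example.Main app.jar
```

Because the driver lives in the client process only in client mode, an interactive session such as `./bin/spark-shell --master yarn-client` fits that mode, which matches the "Supports Spark Shell" row of the table.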
References:
Spark: Differences and Connections Between yarn-cluster and yarn-client
Running Spark Applications on YARN
Starting an App
To start an application in yarn-cluster mode:
./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
For example:
SPARK_JAR=hdfs://hansight/libs/spark-assembly-1.0.2-hadoop2.4.0.2.1.4.0-632.jar \
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples*.jar \
10
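
To make the resource flags in the example command above concrete, here is a rough estimate of how much memory such a submission would ask YARN for. This is a sketch under stated assumptions, not Spark's exact accounting: `estimate_total_mb` is a hypothetical helper, and the 384 MB per-container overhead is an assumed value that depends on the Spark version and configuration. In yarn-cluster mode the driver runs inside the application master container, so its memory is counted as well.

```shell
#!/usr/bin/env bash

# Values taken from the example command above.
num_executors=3            # --num-executors 3
executor_memory_mb=2048    # --executor-memory 2g
driver_memory_mb=4096      # --driver-memory 4g
executor_cores=1           # --executor-cores 1

# ASSUMPTION: extra memory added per container on top of the JVM heap.
# The real value depends on the Spark version and its configuration.
overhead_mb=384

# Hypothetical helper: executors plus the driver (application master)
# container, each padded with the assumed overhead.
estimate_total_mb() {
  local executors="$1" exec_mb="$2" driver_mb="$3" overhead="$4"
  echo $(( executors * (exec_mb + overhead) + driver_mb + overhead ))
}

total_mb=$(estimate_total_mb "$num_executors" "$executor_memory_mb" \
  "$driver_memory_mb" "$overhead_mb")
total_executor_cores=$(( num_executors * executor_cores ))

echo "estimated memory request: ${total_mb} MB"
echo "executor cores requested: ${total_executor_cores}"
```

YARN may round each container request up to its configured allocation granularity, so the amount actually reserved can be higher than this estimate.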
YARN Resource Scheduling
Introduction to the YARN Capacity Scheduler
YARN Independent RM metrics: Weight, Virtual Cores, Min and Max Memory, Max Running Apps, and Scheduling Policy