Spark on YARN

Modes:

Deployment Mode Summary

Mode                       YARN Client Mode                       YARN Cluster Mode
Driver runs in             Client                                 ApplicationMaster
Requests resources         ApplicationMaster                      ApplicationMaster
Starts executor processes  YARN NodeManager                       YARN NodeManager
Persistent services        YARN ResourceManager and NodeManagers  YARN ResourceManager and NodeManagers
Supports Spark Shell       Yes                                    No
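
The practical consequence of the summary above is that interactive work (the Spark shell) needs the driver on the client, so it has to use yarn-client mode. A minimal sketch of that decision follows. The helper name yarn_master_for and the workload labels "interactive" and "batch" are hypothetical, introduced only for illustration. Sending batch jobs to yarn-cluster is a common convention rather than a requirement, since batch jobs can also run in yarn-client mode.

```shell
#!/bin/sh
# Sketch: pick a --master value based on the deployment mode summary.
# yarn_master_for is a hypothetical helper, not part of Spark.
yarn_master_for() {
    case "$1" in
        # The Spark shell needs the driver on the client machine.
        interactive) echo "yarn-client" ;;
        # Assumed convention: unattended jobs run the driver in the ApplicationMaster.
        batch)       echo "yarn-cluster" ;;
        *)           echo "unknown workload: $1" >&2; return 1 ;;
    esac
}

yarn_master_for interactive   # prints yarn-client
yarn_master_for batch         # prints yarn-cluster
```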

References:
Spark: Differences and Connections Between Yarn-cluster and Yarn-client
Running Spark Applications on YARN

Launching an App

To launch an application in yarn-cluster mode:

./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]

For example:

SPARK_JAR=hdfs://hansight/libs/spark-assembly-1.0.2-hadoop2.4.0.2.1.4.0-632.jar \
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples*.jar \
10
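
To get a feel for what this submission asks of the cluster, the memory flags can be added up. The sketch below is a rough estimate only. It assumes that each YARN container needs a fixed overhead on top of the JVM heap, and the 384 MB figure is an assumed illustrative value that depends on the Spark version and configuration. It also ignores YARN rounding container sizes up to its allocation increment. The function names container_mb and total_mb are hypothetical.

```shell
#!/bin/sh
# Rough memory estimate for a yarn-cluster submission.
# ASSUMPTION: a fixed per-container overhead (384 MB here) is added to the heap;
# the real value depends on the Spark version and settings.

# container_mb HEAP_MB OVERHEAD_MB -> memory of one container, in MB
container_mb() {
    echo $(( $1 + $2 ))
}

# total_mb NUM_EXECUTORS EXECUTOR_MB DRIVER_MB OVERHEAD_MB -> total, in MB
# In yarn-cluster mode the driver runs inside the ApplicationMaster,
# so it is counted as one additional container.
total_mb() {
    per_executor=$(container_mb "$2" "$4")
    driver=$(container_mb "$3" "$4")
    echo $(( $1 * per_executor + driver ))
}

# --num-executors 3, --executor-memory 2g, --driver-memory 4g
total_mb 3 2048 4096 384   # prints 11776
```

Under these assumptions the example needs roughly 11.5 GB across four containers, which is worth comparing with the queue limits before submitting.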

YARN Resource Scheduling

Introduction to the YARN Capacity Scheduler
YARN Independent RM metrics: Weight, Virtual Cores, Min and Max Memory, Max Running Apps, and Scheduling Policy

Setting Up the Environment

Spark on YARN Cluster Installation and Deployment