Spark-submit: Examples and Reference


Unless otherwise noted, examples reflect Spark 2.x running locally, in client mode.

Simplest possible example

Every command can be written on a single line. For presentation purposes I've ended lines with a backslash ("\"), which tells shells like bash that the command continues on the next line.

Running a Scala JAR file locally (no cluster) using 2 threads:

$ spark-submit  --class name.space.to.MyMainClass \
                --master local[2] \
                path/to/my-spark-fat-jar.jar \
                argument1 \
                argument2 \
                argument3
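
To see how the shell turns the continued lines above into a single argument list, here is a minimal sketch. It assumes a hypothetical helper, show_args, that stands in for spark-submit and is not part of Spark. It prints each argument it receives on its own line.

```shell
# Hypothetical helper (not part of Spark): prints every argument it
# receives on its own line, so the effect of "\" is visible.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Same layout as the spark-submit example, with show_args standing in
# for spark-submit. 'local[2]' is quoted here so the shell cannot treat
# the brackets as a filename pattern.
show_args --class name.space.to.MyMainClass \
          --master 'local[2]' \
          path/to/my-spark-fat-jar.jar \
          argument1
```

Each option and value arrives as a separate argument, exactly as if the command had been typed on one line. If a trailing backslash is missing, the shell ends the command at that line and tries to run the next line as a new command. Quoting local[2] is optional in bash as long as no file named local2 exists in the current directory, but zsh with its default options rejects the unquoted form.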

Setting memory options

Set total driver memory to 8 gigabytes:

$ spark-submit  --class name.space.to.MyMainClass \
                --driver-memory 8G \
                --master local[2] \
                path/to/my-spark-fat-jar.jar

Pass general config options

For instance, to let Spark use 70% of the available JVM heap space (after a reserved 300 MB) for execution and storage; the default value is 0.6:

$ spark-submit  --class name.space.to.MyMainClass \
                --conf "spark.memory.fraction=0.7" \
                --master local[2] \
                path/to/my-spark-fat-jar.jar
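
As a rough illustration of what this setting means in practice, the sketch below estimates the size of Spark's unified memory region (execution plus storage) for a given heap size. The function name unified_memory_mib is made up for this example. It assumes the documented 300 MiB reservation and treats the full --driver-memory value as usable heap, which real JVMs only approximate.

```shell
# Rough estimate of Spark's unified memory region in MiB.
# Assumption: region = (heap - 300 MiB reserved) * spark.memory.fraction.
# The fraction is passed as a percentage to stay within integer arithmetic.
unified_memory_mib() {
  heap_mib=$1        # e.g. 8192 for an 8G heap
  fraction_pct=$2    # e.g. 70 for spark.memory.fraction=0.7
  reserved_mib=300
  echo $(( (heap_mib - reserved_mib) * fraction_pct / 100 ))
}

unified_memory_mib 8192 70   # prints 5524
unified_memory_mib 8192 60   # prints 4735 (default fraction)
```

With an 8G heap, raising the fraction from 0.6 to 0.7 moves roughly 790 MiB from user memory to Spark's own execution and storage pool.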

Pass general JVM flags

JVM flags are passed to the driver JVM and to the executor JVMs separately. Heap size settings (such as -Xmx) are not allowed here; use other command-line parameters such as --driver-memory or --executor-memory instead.

To increase garbage collection verbosity on both the driver and the executors:

$ spark-submit  --class name.space.to.MyMainClass \
                --conf "spark.driver.extraJavaOptions=-XX:+PrintGCDetails" \
                --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails" \
                --master local[2] \
                path/to/my-spark-fat-jar.jar
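
Since heap size flags are not accepted in the extraJavaOptions settings, a small pre-flight check in a wrapper script can catch them before submitting. The following is only a sketch: has_heap_flag is a hypothetical helper, not a Spark feature, and it only looks for a maximum heap size flag (-Xmx) at the start of the string or after a space.

```shell
# Hypothetical pre-flight check (not part of Spark): succeeds when the
# options string contains a maximum heap size flag (-Xmx).
has_heap_flag() {
  case " $1" in
    *" -Xmx"*) return 0 ;;
    *) return 1 ;;
  esac
}

opts="-XX:+PrintGCDetails"
if has_heap_flag "$opts"; then
  echo "move -Xmx to --driver-memory or --executor-memory" >&2
else
  echo "ok: $opts"
fi
```

The same check can be run on the driver and executor option strings before they are placed into the --conf arguments.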
