# Spark Architecture Overview: Clusters, Jobs, Stages, Tasks, etc.

Last updated:

WIP Alert: This is a work in progress. Current information is correct, but more content may be added in the future.

## Cluster

TODO


## Driver

TODO mention client and cluster mode


## Executor

TODO mention cores


## Job

A job is a sequence of stages, triggered by an action such as .count(), .collect(), .foreach(), or a write like .save(). Transformations such as .map() and .filter() are lazy: they only describe the computation, and no work happens until an action runs.
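The lazy-transformation / eager-action split can be mimicked with plain Python iterators: building the pipeline does no work, and only consuming it (the analogue of an action) forces computation. This is a conceptual sketch, not PySpark code:

```python
data = range(10)

# "Transformations": build a lazy pipeline; nothing runs yet.
mapped = map(lambda x: x * 2, data)
filtered = filter(lambda x: x > 5, mapped)

# "Action": consuming the iterator forces the whole computation,
# analogous to .count() or .collect() kicking off a Spark job.
result = list(filtered)
print(result)  # [6, 8, 10, 12, 14, 16, 18]
```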

## Stage

A stage is a sequence of tasks that can all be run together, without a shuffle. For example: using .read to read a file from disk, then running .map and .filter, can all be done without a shuffle, so it fits in a single stage.
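Within a stage, these shuffle-free (narrow) operations are pipelined: each task applies the whole chain to its partition, record by record, in a single pass, with no intermediate materialization. A rough stdlib sketch of that idea (run_task is a hypothetical name, not a Spark API):

```python
def run_task(partition, pipeline):
    # Apply the stage's whole chain of narrow operations to one
    # partition in a single pass -- no shuffle, no intermediate copy.
    out = []
    for record in partition:
        value = record
        keep = True
        for kind, fn in pipeline:
            if kind == "map":
                value = fn(value)
            elif kind == "filter" and not fn(value):
                keep = False
                break
        if keep:
            out.append(value)
    return out

# One stage: a .map followed by a .filter, fused into one pass.
stage = [("map", lambda x: x * 2), ("filter", lambda x: x > 5)]
print(run_task([1, 2, 3, 4], stage))  # [6, 8]
```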

## Task

A task is the smallest unit of work in Spark: it runs the stage's computation (e.g. the chained .map and .filter) on a single partition of the data, on a single executor core.

## Shuffle

TODO


## Partition

A partition is a logical chunk of your input data. Each partition is processed by a single task, which occupies one executor core.

If you have 4 data partitions and you have 4 executor cores, you can process everything in parallel, in a single pass.
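The parallelism arithmetic is simple: one task per partition, at most one running task per core, so tasks run in "waves". A quick sketch (illustrative numbers, hypothetical function name):

```python
import math

def task_waves(num_partitions, executors, cores_per_executor):
    # One task per partition; at most executors * cores tasks can
    # run at the same time, so the rest queue up in later waves.
    total_cores = executors * cores_per_executor
    return math.ceil(num_partitions / total_cores)

print(task_waves(4, 1, 4))    # 4 partitions, 4 cores  -> 1 wave
print(task_waves(200, 5, 8))  # 200 partitions, 40 cores -> 5 waves
```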