Debugging NullPointerException in Apache Spark
WIP Alert: This is a work in progress. Current information is correct, but more content may be added in the future.
A variable may be null inside map, flatMap, etc. blocks
Code inside blocks like map and flatMap gets executed on worker/executor nodes. It may be the case (particularly when you're running on an actual multi-node cluster rather than a standalone setup) that variables or functions defined outside these blocks fail to be correctly serialized and sent across the network to the executor nodes where the code actually runs, causing a NullPointerException when you try to call a method on the resulting null reference.
Example (Spark 2.1):
import spark.implicits._ // needed for .toDS()

// pattern defined on the driver; the closure below captures it
val HTML_TAGS_PATTERN = """<[^>]+>""".r

spark
  .sparkContext
  .textFile(pathToInputFile, numPartitions)
  .toDS()
  .map { str =>
    var body: String = ""
    // NEXT LINE TRIGGERS NPE: HTML_TAGS_PATTERN is null on the executor
    body = HTML_TAGS_PATTERN.replaceAllIn(str, " ")
    // other code here
    body
  }
The example above causes a NullPointerException (NPE), while the code below doesn't:
spark
  .sparkContext
  .textFile(pathToInputFile, numPartitions)
  .toDS()
  .map { str =>
    var body: String = ""
    // NEXT LINE DOES NOT TRIGGER NPE: the pattern is built inside the
    // closure, on the executor itself
    body = """<[^>]+>""".r.replaceAllIn(str, " ")
    // other code here
    body
  }
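Another way to avoid the problem, if you don't want to rebuild the pattern on every record, is to move the value into a top-level object. This is a general Scala/Spark pattern rather than something from the example above, and the object name Patterns is hypothetical: because Scala objects are initialized lazily on each JVM, the pattern is created on the executor itself instead of being serialized and shipped from the driver.

import scala.util.matching.Regex

object Patterns {
  // initialized on first access, separately on each executor JVM
  val HTML_TAGS_PATTERN: Regex = """<[^>]+>""".r
}

spark
  .sparkContext
  .textFile(pathToInputFile, numPartitions)
  .toDS()
  .map { str =>
    // accessing the object member resolves to the executor-local singleton,
    // so there is no null reference to call methods on
    Patterns.HTML_TAGS_PATTERN.replaceAllIn(str, " ")
  }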