Spark SQL: Examples on pyspark

WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.

SparkContext

Skip this step if sc is already available to you.

from pyspark import SparkContext

sc = SparkContext("local", "Simple App")

Create SQLContext from SparkContext

The SQLContext is used for operations such as creating DataFrames.

from pyspark.sql import SQLContext

# sc is the existing SparkContext
sqlContext = SQLContext(sc)

Load JSON file into DataFrame

TODO

where and select

TODO

join example

TODO

groupBy example

TODO

RDD Operations

TODO
