Spark SQL: Examples on pyspark

Last updated:

WIP ALERT: This is a Work in Progress.

SparkContext

Skip this step if sc is already available to you (for example, in the pyspark shell, where sc is created automatically)

from pyspark import SparkContext

# create a local SparkContext; "Simple App" is just the application name
sc = SparkContext("local", "Simple App")

Creating a SQLContext from a regular SparkContext

The SQLContext is the entry point for DataFrame functionality, such as creating DataFrames from data sources.

from pyspark.sql import SQLContext

# sc is the SparkContext created above
sqlContext = SQLContext(sc)

Loading a json file into a DataFrame

TODO

Basic operations: where and select

TODO

More advanced operations: join, groupBy

TODO

RDD Operations

TODO

Dialogue & Discussion