Entries by tag: spark

Including child/synonym tags

Spark dataframe Examples: Reading and Writing Dataframes  23 Feb 2020    spark scala
Examples of how to read and write Spark DataFrames from sources such as S3 and the Databricks file system. Read More ›

Spark SQL Case/When Examples  09 Nov 2019    spark
Case/when clauses are useful for mimicking if/else behaviour in SQL, and also in Spark via when/otherwise clauses. Read More ›

Spark SQL Date/Datetime Function Examples  09 Nov 2019    spark scala
Examples of how to use date and datetime functions for commonly used transformations in Spark SQL DataFrames. Read More ›

Spark Dataframe Examples: Window Functions  22 Aug 2019    spark dataframes scala
Examples of common operations using window functions on Apache Spark DataFrames, written with the Spark Scala API. Read More ›

Spark Dataframe Examples: Pivot and Unpivot Data  10 Aug 2019    spark scala
Pivoting and unpivoting are very commonly used data transformation operations. Use them when you want to switch from a row-based to a column-based view and vice versa. Read More ›

Jupyter Notebook Kernels: How to Add, Change, Remove  28 Jul 2019    jupyter-notebooks scala spark
Add, remove and change kernels to use with Jupyter Notebook. Read More ›

Debugging NullPointerException in Apache Spark  04 Jun 2017    spark nullpointerexception
A lot of things can go wrong when writing distributed code in Spark. Here are a couple of ways to work around and debug NullPointerExceptions. Read More ›

Apache Spark Architecture Overview: Jobs, Stages, Tasks, etc  03 Jan 2017    spark architecture
A quick overview of the main architecture components involved in running Spark jobs, so you can better understand how to make the best possible use of resources. Read More ›

Spark Streaming: Common Pitfalls and Tips for Long-running Streaming Applications  11 Dec 2016    spark-streaming
Running Spark Streaming applications may introduce a couple of problems that you do not face when running Spark in batch mode. Here are a couple of things you may need to take into account to keep long-running Spark Streaming jobs running smoothly. Read More ›

ApplicationAttemptNotFoundException: Spark Application Stuck in ACCEPTED state on YARN  13 Nov 2016    emr yarn spark
ApplicationAttemptNotFoundException may be caused by the log directory having become too crowded with data. Read More ›

Using the AWS CLI to manage Spark Clusters on EMR: Examples and Reference  23 Mar 2016    emr cli spark
Update Java to JDK 8 on Amazon Elastic MapReduce  22 Mar 2016    emr spark java 8
Comparing Interactive Solutions for Running Scala and Spark: Zeppelin, Spark-notebook and Jupyter-scala  07 Mar 2016    notebook interactive scala spark zeppelin
Apache Zeppelin, Spark Streaming and Amazon Kinesis: Simple Guide and Examples  19 Feb 2016    emr spark zeppelin kinesis wip
Spark DataFrame UDFs: Examples using Scala and Python  11 Nov 2015    spark udf wip
Add an Apache Zeppelin UI to your Spark cluster on AWS EMR  10 Nov 2015    aws emr spark zeppelin
Creating a Spark Cluster on AWS EMR: a Tutorial  10 Nov 2015    aws emr spark
AWS now provides full support for Spark Clusters within Elastic MapReduce (EMR). It's very simple, and you need just a couple of minutes to learn how to do it. Read More ›

Creating Scala Fat Jars for Spark on SBT with sbt-assembly Plugin  18 Sep 2015    sbt hadoop spark sbt-assembly
Spark-submit: Examples and Reference  13 Sep 2015    spark scala