Add an Apache Zeppelin UI to your Spark cluster on AWS EMR
- Apache Zeppelin
- Step 1) Launch cluster with Zeppelin as add-on
- Step 2) Open SSH Tunnel to cluster
- Step 3) Install and configure FoxyProxy
- Step 4) Navigate to your cluster URL, port 8890
- More links
This is a small guide on how to add Apache Zeppelin to your Spark cluster on AWS Elastic MapReduce (EMR). It's the easiest way to get interactive access to Spark and be able to view results immediately.
TL;DR: You need to a) start a non-auto-terminating cluster with the Zeppelin add-on, using a private key, b) open an SSH tunnel to the cluster to enable web connections, c) enable a proxy in your browser (such as FoxyProxy), and d) navigate to http://your-cluster-url:8890
Apache Zeppelin is an incubating Apache project that aims to provide generic, notebook-like access to various backends. Spark is but one of them.
For info on how to launch a Spark cluster on AWS EMR, look at Creating a Spark Cluster on AWS EMR: a tutorial
A word of advice here: in order to be able to use Zeppelin on your cluster, you cannot have it automatically terminate after it is run.
This raises the possibility of your leaving your cluster active for an extended period of time if you're not careful. If you forget to manually terminate it after you're done using it, you can incur a large bill, denominated in U.S. dollars.
Don't forget to terminate your cluster after you're done using Zeppelin.
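As a reminder of what that cleanup might look like from the command line, here is a minimal sketch using the AWS CLI. The function names list_active_clusters and terminate_cluster are hypothetical helpers made up for this example, and it assumes the AWS CLI is installed and configured with credentials that can manage EMR.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes the AWS CLI is installed and configured.

# List the clusters that are still active (and still costing money).
list_active_clusters() {
  aws emr list-clusters --active \
    --query 'Clusters[].[Id,Name,Status.State]' \
    --output text
}

# Terminate one cluster by ID (IDs look like j-XXXXXXXXXXXXX).
terminate_cluster() {
  local cluster_id="$1"
  if [ -z "$cluster_id" ]; then
    echo "usage: terminate_cluster <cluster-id>" >&2
    return 1
  fi
  aws emr terminate-clusters --cluster-ids "$cluster_id"
}
```

After sourcing the file, you would run list_active_clusters to find the cluster ID and then pass that ID to terminate_cluster.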
Step 1) Launch cluster with Zeppelin as add-on
Replace YOUR_KEY_NAME with the name of your private key.
$ aws emr create-cluster \
    --name "1-node Zeppelin cluster (turn me off after use)" \
    --instance-type m3.xlarge \
    --release-label emr-4.1.0 \
    --instance-count 1 \
    --ec2-attributes KeyName=YOUR_KEY_NAME \
    --use-default-roles \
    --applications Name=Spark Name=Zeppelin-Sandbox \
    --no-auto-terminate
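The create-cluster command prints a ClusterId once the request is accepted, but the cluster takes a while to start. One way to wait for it and then look up the master node's address is sketched below. The function name master_dns is a hypothetical helper, and the sketch assumes the AWS CLI is configured for the same account and region as the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes the AWS CLI is installed and configured.

# Block until the cluster is running, then print the master node's
# public DNS name. Takes the ClusterId printed by create-cluster.
master_dns() {
  local cluster_id="$1"
  # The waiter polls the cluster state and returns non-zero on failure.
  aws emr wait cluster-running --cluster-id "$cluster_id" || return 1
  aws emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.MasterPublicDnsName' \
    --output text
}
```

Usage would be along the lines of master_dns j-XXXXXXXXXXXXX, where the ID is a placeholder for your own cluster's ID.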
Step 2) Open SSH Tunnel to cluster
After you find out your external IP, you can open an SSH Tunnel to it:
In the command below, <port> can be any local port that is not in use and is higher than 1024.
<cluster_address> is the address of your cluster; it will vary for you.
ssh -i /path/to/your/key.pem -ND <port> <cluster_address>
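If you open this tunnel often, a small wrapper that checks the port before calling ssh can save a confusing error. This is only a sketch: open_tunnel is a hypothetical name, and the port 8157 and the hadoop@ login in the usage note are assumptions for illustration, not values taken from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the ssh command shown above.
# -N: do not run a remote command; -D: open a local SOCKS proxy on <port>.
open_tunnel() {
  local key_file="$1"
  local port="$2"
  local address="$3"

  # Reject anything that is not a plain number.
  case "$port" in
    ''|*[!0-9]*) echo "port must be a number" >&2; return 2 ;;
  esac

  # Ports up to 1024 are privileged; 65535 is the highest valid port.
  if [ "$port" -le 1024 ] || [ "$port" -gt 65535 ]; then
    echo "port must be between 1025 and 65535" >&2
    return 2
  fi

  ssh -i "$key_file" -N -D "$port" "$address"
}
```

A call might look like open_tunnel /path/to/your/key.pem 8157 hadoop@<cluster_address>. The command keeps running in the foreground while the tunnel is open; press Ctrl-C to close it.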
Step 3) Install and configure FoxyProxy
Follow instructions on this AWS post
Step 4) Navigate to your cluster URL, port 8890
Make sure FoxyProxy (the little blue dot to the right of the address bar) is enabled!
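If the page does not load, it can help to test the tunnel without the browser. The sketch below asks curl to fetch the Zeppelin page through the SOCKS proxy opened in Step 2 and prints only the HTTP status code. The function name check_zeppelin is hypothetical, and it assumes the tunnel is already running on the local port you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes the SSH tunnel from Step 2 is already open.

# Request the Zeppelin page on port 8890 through the local SOCKS proxy
# and print the HTTP status code of the response.
check_zeppelin() {
  local port="$1"   # local port used for the SSH tunnel
  local host="$2"   # your cluster's address
  # --socks5-hostname also resolves the host name through the tunnel.
  curl --silent --output /dev/null --max-time 10 \
    --socks5-hostname "localhost:${port}" \
    --write-out '%{http_code}\n' \
    "http://${host}:8890/"
}
```

A status of 200 would suggest the tunnel and Zeppelin are both working, which points to the browser proxy settings as the likely problem. An error from curl suggests the tunnel itself is not up.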
More links
Using the AWS CLI to manage Spark Clusters on EMR: Examples and Reference
Set up an SSH tunnel to AWS so that you can access your cluster securely with a browser
Configure a proxy tool on your browser so that it uses the SSH Tunnel created in the previous step to connect to your cluster