When extracting features from a dataset, it is often useful to transform categorical features into vectors so that you can perform vector operations (such as calculating the cosine distance) on them.
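As a rough illustration of the idea, here is a minimal sketch that one-hot encodes a categorical value and computes the cosine distance between two such vectors. One-hot encoding is only one of several possible encodings and is assumed here for simplicity; the `colors` feature and the helper names `one_hot` and `cosine_distance` are hypothetical and used purely for illustration.

```python
import math


def one_hot(value, categories):
    """Return a one-hot vector for `value` over the ordered list `categories`."""
    return [1.0 if value == category else 0.0 for category in categories]


def cosine_distance(u, v):
    """Cosine distance, defined as 1 minus the cosine similarity.

    Assumes neither vector is all zeros (otherwise the norm is 0).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical categorical feature with three possible values.
colors = ["red", "green", "blue"]

red = one_hot("red", colors)
blue = one_hot("blue", colors)

same = cosine_distance(red, red)        # identical categories
different = cosine_distance(red, blue)  # distinct categories
```

With one-hot vectors, two identical categories are at distance 0 and any two distinct categories are at distance 1, which is why richer encodings are sometimes preferred when categories have varying degrees of similarity.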

Think about it for


WIP ALERT This is a work in progress

Read a CSV file into a DataFrame

TODO http://pandas.pydata.org/pandas-docs/version/0.16.2/generated/pandas.read_csv.html
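Until this section is filled in, here is a minimal sketch of reading CSV data into a pandas DataFrame with `pandas.read_csv`. The inline CSV text and its column names (`name`, `age`) are made up for illustration; in practice you would typically pass a file path instead of an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally this would live in a file on disk.
csv_text = "name,age\nalice,34\nbob,27\n"

# read_csv accepts a path or any file-like object; the first row is
# treated as the header by default.
df = pd.read_csv(io.StringIO(csv_text))

column_names = list(df.columns)
row_count = len(df)
```

Passing a path such as `pd.read_csv("data.csv")` works the same way, assuming a file with that (hypothetical) name exists.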

Select rows based on the value of a


WIP ALERT This is a work in progress

What are user-defined functions?

They are functions that operate on a DataFrame's column. For instance, if you have a column that represents an age feature, you could


WIP ALERT This is a work in progress

This is a small guide on how to add Apache Zeppelin to your Spark cluster on AWS Elastic MapReduce (EMR). It's the easiest way to get interactive access to Spark and be able to


AWS Elastic MapReduce is a way to remotely create and control Hadoop and Spark clusters on AWS.

You can think of it as something like Hadoop-as-a-service; you spin up a clu
