Toree (Spark Kernel) in OSX El-Capitan

Apache Spark is topping the charts as a reference for Big Data, Advanced Analytics and a “fast engine for large-scale computing”. In an earlier post, we saw how to use PySpark through the Jupyter notebook's interactive interface. Here we will see how to use the Apache Toree multi-interpreter with the Spark Kernel, SparkR and SparkQL as well. The GitHub docs for Toree are still in incubator mode and a work in progress.…

R with Jupyter Notebook in OSX El-Capitan

Jupyter Notebook is a perfect tool for combining code, text and visuals in one document. Here we will see how to set up Jupyter to use R on OS X; the same steps can be used for Linux and Windows as well. Installing Anaconda – it is a free Python distribution (including commercial use and redistribution!). You can download it here, then install as below. Installing…

Integrating IPython Notebook with Spark

1. To install Spark, download Apache Spark from here.
2. Extract Spark from the downloaded zip file and place it at the desired location.
3. Create an environment variable named ‘SPARK_HOME’ with a path value like ‘C:\spark’.
4. Download and install the Anaconda Python distribution from here.
5. Open a command prompt and enter the command. This should create a pyspark…
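One common way to connect a notebook to the Spark install pointed to by SPARK_HOME is to put Spark's Python directories on `sys.path` before importing `pyspark`. The snippet below is a minimal sketch of that idea, not necessarily what the post itself does: the helper names `spark_python_paths` and `add_spark_to_path` are hypothetical, and it assumes the usual Spark layout with a `python` directory and a bundled `py4j-*-src.zip` under `python/lib`.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the paths that need to be importable for pyspark (hypothetical helper)."""
    python_dir = os.path.join(spark_home, "python")
    # Assumption: Spark ships py4j as a source zip under python/lib. The version
    # in the file name differs between releases, so it is matched with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_path(spark_home=None):
    """Prepend Spark's Python paths to sys.path and return the paths used."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise EnvironmentError("SPARK_HOME is not set")
    paths = spark_python_paths(spark_home)
    # Insert in reverse so the final order on sys.path matches `paths`.
    for path in reversed(paths):
        if path not in sys.path:
            sys.path.insert(0, path)
    return paths
```

After calling `add_spark_to_path()` in a notebook cell, `import pyspark` should resolve against that Spark install, provided the layout assumption above holds for the downloaded version.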

PySpark/IPython in OSX El-Capitan

STEP 1: To use Spark on Hadoop, first install Hadoop: Installing Hadoop on OSX (El-Capitan). If not already installed, install HomeBrew.
STEP 2: Then install Spark. brew will install Spark to the directory /usr/local/Cellar/apache-spark/1.5.0/
STEP 3: Create an HDFS directory for the test dataset.
STEP 4: Download a sample book for Word Count.
STEP 5: Install Anaconda Python because it contains iPython and that will…
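The word count these steps build toward follows a simple pattern: split each line into words, then sum the occurrences of each word. The sketch below shows that logic in plain Python rather than Spark, so it runs without a cluster; the function name `word_count` and the tokenisation rule (lowercase letters and apostrophes) are assumptions made for illustration.

```python
import re
from collections import Counter


def word_count(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Assumed tokenisation: lowercase the line, keep runs of letters/apostrophes.
        words = re.findall(r"[a-z']+", line.lower())
        counts.update(words)
    return counts
```

Passing an open file object for the downloaded book as `lines` would give the per-word totals; in Spark the same split-then-sum idea is spread across the cluster instead of a single loop.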