Toree (Spark Kernel) on OS X El Capitan

Apache Spark is topping the charts as a reference for Big Data and Advanced Analytics, billed as a “fast engine for large-scale computing”. In an earlier post, we saw how to use PySpark through the Jupyter notebook's interactive interface. Here we will see how to use Apache Toree's multi-interpreter support to run the Spark Kernel (Scala), SparkR and Spark SQL as well. Toree is still an Apache incubator project, and its GitHub docs are a work in progress.

STEP 1: Install Toree package with pip

$ pip install --pre toree  # --pre lets pip pick up pre-release versions
# Running "jupyter toree install" on its own installs only the default Scala kernel; use the command below to install all kernels.

$ jupyter toree install --spark_opts='--master=local[2]' --kernel_name='Apache Toree' --interpreters=PySpark,SparkR,Scala,SQL

STEP 2: Cross-check that all the kernels are installed

$ jupyter kernelspec list
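
If you would rather run this check from a script, here is a minimal sketch in Python. It assumes the jupyter command is on your PATH and relies on the --json flag of "jupyter kernelspec list". The helper names (parse_kernel_names, missing_kernels, read_kernelspec_json) are hypothetical, and the kernel names you pass in are yours to fill in from the listing above.

```python
import json
import subprocess


def parse_kernel_names(kernelspec_json):
    """Return the sorted kernel names found in `jupyter kernelspec list --json` output."""
    data = json.loads(kernelspec_json)
    # The JSON output keeps one entry per kernel under the "kernelspecs" key.
    return sorted(data.get("kernelspecs", {}))


def missing_kernels(kernelspec_json, expected):
    """Return the expected kernel names that are not installed, in the order given."""
    installed = set(parse_kernel_names(kernelspec_json))
    return [name for name in expected if name not in installed]


def read_kernelspec_json():
    """Run the same listing command as above, asking for machine-readable output."""
    return subprocess.check_output(
        ["jupyter", "kernelspec", "list", "--json"], text=True
    )
```

For example, missing_kernels(read_kernelspec_json(), ["apache_toree_scala", "apache_toree_pyspark"]) returns an empty list when both kernels are present. Those two names are an assumption about how Toree names its kernels, so compare them with what the listing prints on your machine.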

STEP 3: Each kernel has a kernel.json file that you can customize further (for example, you could change the display name or, as shown below, the Spark options passed to the kernel).

"__TOREE_SPARK_OPTS__": "--packages mysql:mysql-connector-java:5.1.39 --master=local[2]"
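
Since kernel.json is plain JSON, the same customization can be scripted instead of edited by hand. The sketch below is a minimal example, assuming the Spark options sit under the "env" key and the label under "display_name" in your kernel.json. customize_kernel is a hypothetical helper name, and the path you pass in is whichever kernel directory STEP 2 reported.

```python
import json


def customize_kernel(path, display_name=None, spark_opts=None):
    """Update display_name and/or __TOREE_SPARK_OPTS__ in a kernel.json file.

    Assumption: Spark options are stored under the "env" key of kernel.json.
    All other keys in the file are left untouched.
    """
    with open(path) as f:
        spec = json.load(f)

    if display_name is not None:
        spec["display_name"] = display_name
    if spark_opts is not None:
        # Create the "env" section if it is missing, then set the options.
        spec.setdefault("env", {})["__TOREE_SPARK_OPTS__"] = spark_opts

    with open(path, "w") as f:
        json.dump(spec, f, indent=2)
    return spec
```

Keep a backup copy of kernel.json before running anything like this, and restart the kernel afterwards so the new options are picked up.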

STEP 4: Now simply launch the notebook

$ jupyter notebook