Apache Spark has become a standard reference for Big Data and advanced analytics, and is often described as a “fast engine for large-scale computing”. In an earlier post, we saw how to use PySpark through the Jupyter notebook's interactive interface. Here we will see how to use the Apache Toree multi-interpreter kernel to work with Spark-Kernel, SparkR and Spark SQL as well. Toree is still an Apache Incubator project, and its GitHub docs are a work in progress.
STEP 1: Install the Toree package with pip
$ pip install --pre toree
The --pre flag lets pip pick up the latest pre-release. Running jupyter toree install with no options installs only the default Scala kernel; use the command below to install all the kernels.
$ jupyter toree install --spark_opts='--master=local' --kernel_name='Apache Toree' --interpreters=PySpark,SparkR,Scala,SQL
STEP 2: Check that all the kernels are installed
$ jupyter kernelspec list
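If you would rather script this check than read the listing by eye, here is a minimal Python sketch. It relies on the --json flag of jupyter kernelspec list. The kernel directory names in EXPECTED are an assumption about how Toree names the kernels when installed as 'Apache Toree'; compare them with what the listing actually prints on your machine.

```python
import json
import subprocess


def missing_kernels(kernelspec_json, expected):
    """Return the expected kernel names that are absent from the
    JSON printed by `jupyter kernelspec list --json`."""
    installed = json.loads(kernelspec_json).get("kernelspecs", {})
    return [name for name in expected if name not in installed]


def list_kernelspecs_json():
    """Run `jupyter kernelspec list --json` and return its raw output."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Assumed names, not taken from the Toree docs: adjust to your listing.
EXPECTED = [
    "apache_toree_scala",
    "apache_toree_pyspark",
    "apache_toree_sparkr",
    "apache_toree_sql",
]
```

Calling missing_kernels(list_kernelspecs_json(), EXPECTED) returns an empty list when every expected kernel is present.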
STEP 3: Each kernel directory contains a kernel.json file that you can customize further, for example by changing the display name or, as shown below, the Spark options passed to Toree.
"__TOREE_SPARK_OPTS__": "--packages mysql:mysql-connector-java:5.1.39 --master=local"
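The same change can be made from a script instead of a text editor. The sketch below is only an illustration: customize_kernel is a hypothetical helper, and it assumes kernel.json follows the usual Jupyter layout with a top-level "display_name" and an "env" object holding __TOREE_SPARK_OPTS__.

```python
import json
from pathlib import Path


def customize_kernel(kernel_json, display_name, spark_opts):
    """Rewrite a kernel.json in place with a new display name and
    new Spark options, leaving every other setting untouched."""
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    spec["display_name"] = display_name
    # Assumption: Toree reads its Spark options from this env entry.
    spec.setdefault("env", {})["__TOREE_SPARK_OPTS__"] = spark_opts
    path.write_text(json.dumps(spec, indent=2) + "\n")
    return spec
```

Pass it the kernel.json found inside one of the directories printed by jupyter kernelspec list, and restart the kernel for the change to take effect.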
STEP 4: Now simply launch the notebook
$ jupyter notebook