Toree (Spark Kernel) in OSX El-Capitan

Apache Spark is topping the charts as a reference for Big Data, Advanced Analytics and a “fast engine for large-scale computing”. In an earlier post, we saw how to use PySpark through the Jupyter notebook's interactive interface. Here we will see how to use the Apache Toree multi-interpreter to get the Spark kernel, SparkR and SparkSQL as well. The GitHub docs for Toree are still in incubator mode and a work in progress.…
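
A minimal sketch of what that Toree setup looks like, assuming Spark is already installed and SPARK_HOME points at it; the --pre flag reflects the incubating release and the interpreter list simply mirrors the ones named above:

    # Install the Toree kernel package (pre-release while the project is in the incubator)
    pip install --pre toree

    # Register the kernels with Jupyter; --interpreters selects which ones to expose
    jupyter toree install --spark_home=$SPARK_HOME --interpreters=Scala,PySpark,SparkR,SQL

    # Launch Jupyter and pick one of the Apache Toree kernels when creating a notebook
    jupyter notebook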

Solr in OSX El-Capitan

STEP 1: On OS X, Solr can be installed from Homebrew. STEP 2: To launch Solr, run: STEP 3: Then open http://localhost:8983/solr in a browser and you will see the Solr admin UI. STEP 4: INDEXING DATA – now the Solr server is up and running, but it doesn’t contain any data. The solr/bin directory includes the post* tool in order to…
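
Condensed into shell form, the flow above looks roughly like this; it assumes a Homebrew install, and the core name gettingstarted and the docs/ path are placeholders:

    # STEP 1: install Solr via Homebrew
    brew install solr

    # STEP 2: launch Solr (standalone mode, default port 8983)
    solr start

    # STEP 3: the admin UI is now reachable at http://localhost:8983/solr

    # STEP 4: create a core and index sample documents with the bundled post tool
    solr create -c gettingstarted        # placeholder core name
    post -c gettingstarted docs/         # placeholder path; the post tool ships in solr/bin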

R with Jupyter Notebook in OSX El-Capitan

Jupyter Notebook is a perfect tool for combining code, text and visuals in one document. Here we will see how to set up Jupyter to use R on OS X; the same steps can be used for Linux and Windows as well. Installing Anaconda – it is a free Python distribution (free even for commercial use and redistribution!). You can download it here, then install as below. Installing…
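
As a rough sketch of the setup once Anaconda is on the PATH (the conda package names below follow the standard IRkernel route and are assumptions, not taken from the post):

    # Install R and the IRkernel packages from Anaconda's R channel
    conda install -c r r-essentials

    # Alternatively, from an existing R installation, register the kernel by hand:
    # R -e "install.packages('IRkernel'); IRkernel::installspec()"

    # Start Jupyter and choose the R kernel when creating a new notebook
    jupyter notebook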

Kafka in OSX El-Capitan

Apache Kafka is a highly scalable publish-subscribe messaging system that can serve as the data backbone in distributed applications. With Kafka’s producer-consumer model it becomes easy to implement multiple data consumers that do live monitoring as well as persistent data storage for later analysis. STEP 1: Installation – the best way to install the latest version of the Kafka…
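
For orientation, the install-and-smoke-test flow typically looks like the sketch below, assuming a Homebrew install; the config file paths are Homebrew defaults and the topic name test is a placeholder:

    # STEP 1: install Kafka (Homebrew pulls in ZooKeeper as a dependency)
    brew install kafka

    # Start ZooKeeper, then a Kafka broker
    zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties &
    kafka-server-start /usr/local/etc/kafka/server.properties &

    # Create a topic, then wire up a console producer and consumer to try it out
    kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
    kafka-console-producer --broker-list localhost:9092 --topic test
    kafka-console-consumer --zookeeper localhost:2181 --topic test --from-beginning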

PySpark/IPython in OSX El-Capitan

STEP 1: To use Spark on Hadoop, first install Hadoop (see Installing Hadoop on OSX (El-Capitan)); if Homebrew is not already installed, install it first. STEP 2: Then install Spark; brew will install Spark to the directory /usr/local/Cellar/apache-spark/1.5.0/. STEP 3: Create an HDFS directory for the test dataset. STEP 4: Download a sample book for Word Count. STEP 5: Install Anaconda Python because it contains IPython and that will…
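
A condensed sketch of those steps, assuming Hadoop and Homebrew are already in place; the sample file name is a placeholder and the IPYTHON_OPTS launch variable follows the Spark 1.x convention:

    # STEP 2: install Spark (lands under /usr/local/Cellar/apache-spark/<version>/)
    brew install apache-spark

    # STEP 3 and 4: create an HDFS directory and upload a sample book for Word Count
    hdfs dfs -mkdir -p /user/$(whoami)/wordcount
    hdfs dfs -put pg4300.txt /user/$(whoami)/wordcount/    # pg4300.txt is a placeholder file

    # STEP 5: with Anaconda's IPython installed, launch PySpark inside a notebook
    IPYTHON_OPTS="notebook" pyspark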

Hive in OSX El-Capitan

STEP 1: MySQL should be installed as a prerequisite. If you get the error “Error: The `brew link` step did not complete successfully. The formula built, but is not symlinked into /usr/local. Could not symlink include/mysql. /usr/local/include is not writable.”, make /usr/local writable and try again using: STEP 2: Install Hive. STEP 3: Add hadoop and hive to your path…
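
In shell form the prerequisites above look roughly like this; it is a sketch assuming Homebrew, and the chown fix for the symlink error plus the version numbers in the paths are assumptions:

    # STEP 1: install the MySQL prerequisite
    brew install mysql

    # If `brew link` complains that /usr/local/include is not writable, take ownership and retry
    sudo chown -R $(whoami) /usr/local/include
    brew link mysql

    # STEP 2: install Hive
    brew install hive

    # STEP 3: add Hadoop and Hive to your PATH (version numbers are illustrative)
    export HADOOP_HOME=/usr/local/Cellar/hadoop/2.6.0/libexec
    export HIVE_HOME=/usr/local/Cellar/hive/1.2.1/libexec
    export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin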

Hadoop in OSX El-Capitan

STEP 1: First install Homebrew; download it from http://brew.sh. STEP 2: Install Hadoop; Hadoop will be installed at the path /usr/local/Cellar/hadoop. STEP 3: Configure Hadoop: edit hadoop-env.sh, which can be found at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hadoop-env.sh, where 2.6.0 is the Hadoop version, and change the line to … Edit core-site.xml, which can be found at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/core-site.xml, and add the config below. Edit mapred-site.xml, the file can be…
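
The core of that configuration, sketched below using the Homebrew layout quoted above; the JAVA_HOME line and the hdfs://localhost:9000 address are the usual pseudo-distributed defaults, not values taken from the excerpt:

    # STEP 2: install Hadoop via Homebrew
    brew install hadoop

    # STEP 3: in hadoop-env.sh, point JAVA_HOME at the local JDK (a common OS X value)
    export JAVA_HOME=$(/usr/libexec/java_home)

    # core-site.xml: declare the default filesystem (typical single-node value)
    #   <property>
    #     <name>fs.defaultFS</name>
    #     <value>hdfs://localhost:9000</value>
    #   </property>

    # Then format the namenode and bring HDFS up
    hdfs namenode -format
    /usr/local/Cellar/hadoop/2.6.0/libexec/sbin/start-dfs.sh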