R with Jupyter Notebook in OSX El-Capitan

Jupyter Notebook is a perfect tool for combining code, text, and visuals in one document. Here we will see how to set up Jupyter to use R on OS X; the same steps can be used on Linux and Windows as well. Installing Anaconda: it is a free Python distribution (free even for commercial use and redistribution!). You can download it here, then install it as below. Installing…

Integrating IPython Notebook with Spark

1. To install Spark, download Apache Spark from here.
2. Extract Spark from the downloaded zip file and place it at the desired location.
3. Create an environment variable named ‘SPARK_HOME’ with a path value like ‘C:\spark’.
4. Download and install the Anaconda Python distribution from here.
5. Open a command prompt and enter the command. This should create a pyspark…
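Before going further, it can help to confirm that the SPARK_HOME variable from step 3 is actually visible to Python. The following is a minimal sketch, not part of the original post: the helper name `spark_python_dir` is hypothetical, and it only relies on the fact that a Spark distribution keeps its Python bindings in a `python` subdirectory.

```python
import os


def spark_python_dir(env=None):
    """Return the 'python' directory under SPARK_HOME.

    `env` defaults to the real process environment; passing a plain dict
    makes the helper easy to try out without touching system settings.
    """
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        # The variable from step 3 is missing or empty.
        raise RuntimeError("SPARK_HOME is not set")
    # Spark ships its Python bindings in <SPARK_HOME>/python.
    return os.path.join(spark_home, "python")
```

Calling `spark_python_dir()` with no arguments reads the real environment; if it raises, the variable has not been picked up by the current shell or session.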

Kafka in OSX El-Capitan

Apache Kafka is a highly scalable publish-subscribe messaging system that can serve as the data backbone in distributed applications. With Kafka’s Producer-Consumer model it becomes easy to implement multiple data consumers that do live monitoring as well as persistent data storage for later analysis. STEP 1: Installation. The best way to install the latest version of the Kafka…
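To make the Producer-Consumer idea concrete, here is a small in-memory sketch in plain Python. It does not use Kafka or any Kafka client library; the `Topic` and `Consumer` classes are hypothetical stand-ins that only illustrate the pattern of one append-only log read independently by several consumers, each tracking its own offset.

```python
class Topic:
    """Append-only in-memory log standing in for a Kafka topic (illustrative only)."""

    def __init__(self):
        self._log = []

    def publish(self, message):
        # Producers only ever append to the end of the log.
        self._log.append(message)

    def read_from(self, offset):
        # Return every message at or after the given offset.
        return self._log[offset:]


class Consumer:
    """Reads a topic at its own pace by remembering its own offset."""

    def __init__(self, topic, handler):
        self.topic = topic
        self.handler = handler
        self.offset = 0

    def poll(self):
        messages = self.topic.read_from(self.offset)
        for message in messages:
            self.handler(message)
        self.offset += len(messages)
        return len(messages)


readings = Topic()
alerts, archive = [], []

# One consumer does "live monitoring", the other "persistent storage".
monitor = Consumer(readings, lambda m: alerts.append(m) if m > 100 else None)
storage = Consumer(readings, archive.append)

for value in [42, 150, 7]:
    readings.publish(value)

monitor.poll()
storage.poll()
```

Because each consumer keeps its own offset, adding a third consumer later would not disturb the first two, which is the property the excerpt above describes.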

PySpark/IPython in OSX El-Capitan

STEP 1: To use Spark on Hadoop, first install Hadoop (Installing Hadoop on OSX (El-Capitan)). If you have not already done so, install Homebrew.
STEP 2: Then install Spark. brew will install Spark to the directory /usr/local/Cellar/apache-spark/1.5.0/
STEP 3: Create an HDFS directory for the test dataset.
STEP 4: Download a sample book for Word Count.
STEP 5: Install Anaconda Python because it contains IPython and that will…
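The word count that these steps build toward can be sketched locally before involving a cluster. The version below is plain Python and only an illustration of the computation, not the post's Spark job: the function name `word_count` and the tokenizing rule (lowercase letters and apostrophes) are assumptions. In PySpark the same logic is commonly written as a flatMap over lines, a map to (word, 1) pairs, and a reduceByKey.

```python
import re
from collections import Counter


def word_count(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Split each line into lowercase words (the "flatMap" step).
        words = re.findall(r"[a-z']+", line.lower())
        # Add one per occurrence for each word (the "map" + "reduceByKey" steps).
        counts.update(words)
    return counts


sample = ["The cat and the hat", "the end"]
counts = word_count(sample)
```

Running it on a few lines of the sample book first is a quick way to know what totals to expect from the distributed version.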

Hive in OSX El-Capitan

STEP 1: MySQL should be installed as a prerequisite. If you get the error
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink include/mysql
/usr/local/include is not writable.
make /usr/ writable and try again using:
STEP 2: Install Hive.
STEP 3: Add Hadoop and Hive to your path…