PySpark/IPython in OSX El-Capitan

STEP 1: To use Spark on Hadoop, first install Hadoop (see Installing Hadoop on OSX (El-Capitan)). If Homebrew is not already installed, install it first.
STEP 2: Then install Spark. brew will install Spark to the directory /usr/local/Cellar/apache-spark/1.5.0/
STEP 3: Create an HDFS directory for the test dataset.
STEP 4: Download a sample book for Word Count.
STEP 5: Install Anaconda Python, because it contains IPython and that will…
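The Word Count step follows the classic map/reduce pattern. Here is a rough local sketch of that logic in plain Python. It does not use Spark or HDFS. In PySpark, the same steps would typically be `sc.textFile(...)` followed by `flatMap`, `map` and `reduceByKey`. The sketch assumes the book is read as an iterable of text lines. The function name `word_count` and the tokenizing regex are illustrative choices, not part of the original post.

```python
import re


def word_count(lines):
    """Count words in an iterable of text lines (hypothetical helper)."""
    # flatMap step: split every line into lowercase words
    # (the [a-z']+ tokenizer is an assumption; adjust for your book)
    words = [w for line in lines for w in re.findall(r"[a-z']+", line.lower())]
    # map step: pair each word with a count of 1
    pairs = [(w, 1) for w in words]
    # reduceByKey step: sum the counts for each distinct word
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts


if __name__ == "__main__":
    sample = ["It was the best of times,", "it was the worst of times."]
    print(word_count(sample))
```

Because `word_count` accepts any iterable of lines, an open file handle works too, for example `with open("book.txt") as f: word_count(f)`. Here `book.txt` is a hypothetical local copy of the sample book.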

PyPlex

PyTrek (Learning path for Python)
1. Start with "Why Python in Data Science?" from Jeremy Achin, founder of DataRobot, at PyCon 2014.
2. Install Anaconda, Canopy, IPython or Zeppelin.
3. Start with the basics of Python from Codecademy or the two-day Google Class for Python.
4. RegEx in Python: we will use a lot of regex for data cleansing; refer to the Google class for Regular…
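Step 4 relies on regular expressions for data cleansing. Below is a small sketch of what that usually looks like with Python's standard `re` module. The helper name `clean_text` and the specific cleaning rules (strip tag-like markup, drop punctuation, collapse whitespace, lowercase) are assumptions chosen for illustration, not the course's own exercises.

```python
import re


def clean_text(raw):
    """Apply a few common regex cleansing steps (hypothetical helper)."""
    # Replace HTML-like tags such as <br> with a space
    text = re.sub(r"<[^>]+>", " ", raw)
    # Drop everything except letters, digits and whitespace
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # Collapse runs of whitespace into a single space
    text = re.sub(r"\s+", " ", text)
    # Trim the ends and normalize case
    return text.strip().lower()


if __name__ == "__main__":
    print(clean_text("Hello,<br>World!!  PyCon 2014"))
```

The order of the substitutions matters. Tags are removed before punctuation so that `<br>` does not leave a stray `br` behind.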