Hadoop on OS X El Capitan

STEP 1: Install Homebrew from http://brew.sh by running its installer:

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

STEP 2: Install Hadoop

$ brew search hadoop
$ brew install hadoop

Hadoop will be installed at path /usr/local/Cellar/hadoop
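
To confirm the install worked (assuming Homebrew linked the binaries into /usr/local/bin, which it does by default), print the version:

$ hadoop version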

STEP 3: Configure Hadoop

Edit hadoop-env.sh. The file is located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hadoop-env.sh, where 2.6.0 is the Hadoop version. Change the line

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

to

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
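
If Hadoop later complains that it cannot find Java, it can also help to set JAVA_HOME in the same file. This is an optional extra step, not part of the original instructions; /usr/libexec/java_home prints the path of the active JDK on OS X:

export JAVA_HOME="$(/usr/libexec/java_home)"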

Edit core-site.xml. The file is located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/core-site.xml. Add the configuration below between the <configuration> tags (fs.default.name is the older spelling of fs.defaultFS and still works in this release):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
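
The directory named in hadoop.tmp.dir is not created automatically; creating it up front avoids startup errors (an assumed extra step, using the path from the config above):

$ mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp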

Edit mapred-site.xml. The file is located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/mapred-site.xml; it may not exist yet (some releases ship only mapred-site.xml.template, which you can copy into place) and will be essentially blank by default. Add the configuration below:

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:9010</value>
 </property>
</configuration>

Edit hdfs-site.xml. The file is located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hdfs-site.xml. Add:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
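
As a quick sanity check that the setting is picked up (assuming hdfs is on the PATH), hdfs getconf can echo a key back:

$ hdfs getconf -confKey dfs.replication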

To simplify life, edit ~/.profile and add the following aliases (if your install has no top-level sbin directory, the start/stop scripts may live under libexec/sbin instead). By default ~/.profile might not exist.

alias hstart="/usr/local/Cellar/hadoop/2.6.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.6.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/stop-dfs.sh"

and source it

$ source ~/.profile

Before running Hadoop for the first time, format HDFS:

$ hdfs namenode -format
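
If hdfs is reported as command not found, the Homebrew-linked binaries may not be on your PATH; invoking the script via its full Cellar path is a reasonable fallback (path assumed from the install location above):

$ /usr/local/Cellar/hadoop/2.6.0/libexec/bin/hdfs namenode -format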

STEP 4: Verify that SSH to localhost works. Check for the files ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub. If they don’t exist, generate the keys with the command below.

$ ssh-keygen -t rsa
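
To generate the key non-interactively with an empty passphrase, which is what passwordless login needs, this variant can be used instead (an assumed convenience; -P sets the passphrase and -f the output file):

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa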

Enable Remote Login: “System Preferences” -> “Sharing”, then check “Remote Login”.
Authorize SSH keys: to allow your system to accept the login, make it aware of the keys that will be used:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
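
SSH will ignore keys whose files are too permissive, so tightening permissions is a common extra step (assumed, not from the original post):

$ chmod 0700 ~/.ssh
$ chmod 0600 ~/.ssh/authorized_keys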

Test the login:

$ ssh localhost
Last login: Fri Mar 6 20:30:53 2015
$ exit

STEP 5: Run Hadoop

$ hstart

and stop it using

$ hstop

STEP 6: Access the Hadoop web interfaces by connecting to

NameNode (HDFS overview): http://localhost:50070
ResourceManager (YARN): http://localhost:8088/
Node Specific Info (NodeManager): http://localhost:8042/

Check which daemons are running:
$ jps
7379 DataNode
7459 SecondaryNameNode
7316 NameNode
7636 NodeManager
7562 ResourceManager
7676 Jps 

$ yarn    # resource management; more information than the web interface
$ mapred  # detailed information about jobs
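
For example, a few subcommands from the stock YARN and MapReduce CLIs (a sketch; exact output depends on your setup):

$ yarn node -list         # NodeManagers registered with the ResourceManager
$ yarn application -list  # applications currently running
$ mapred job -list        # jobs known to the MapReduce framework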

STEP 7: WordCount: Create a file called WordCount.java with the contents below (it uses the old org.apache.hadoop.mapred API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

public class WordCount extends Configured implements Tool {

  // Mapper class. For each line, break the line into words
  // and emit each word as (word, 1).
  public static class MapClass extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output,
        Reporter reporter) throws IOException {

      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer class that emits the sum of the input values for each key.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output,
        Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  static int printUsage() {
    System.out.println("wordcount [-m #mappers] [-r #reducers] input_file output_file");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  public int run(String[] args) throws Exception {

    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");

    // The keys are words (strings).
    conf.setOutputKeyClass(Text.class);
    // The values are counts (ints).
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    // The reducer also serves as the combiner for local aggregation.
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    List<String> other_args = new ArrayList<String>();
    for (int i = 0; i < args.length; ++i) {
      try {
        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " + args[i - 1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " +
          other_args.size() + " instead of 2.");
      return printUsage();
    }
    FileInputFormat.setInputPaths(conf, other_args.get(0));
    FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}

Compile:

$ javac -cp $(hadoop classpath) WordCount.java

The hadoop classpath command provides the compiler with all the paths it needs to compile correctly, and you should see the resulting WordCount.class (along with the inner MapClass and Reduce classes) appear in the directory.
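
hadoop jar expects the classes packaged in a jar. A minimal sketch of that step, assuming the compiled .class files are in the current directory and using wordcount.jar as an arbitrary name; the WordCount class is passed explicitly because the jar has no Main-Class manifest, and the input/output paths reuse the examples from the upload step below:

$ jar cf wordcount.jar WordCount*.class
$ hadoop jar wordcount.jar WordCount /data/book.txt dataOutput1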

Execute: Run the job using hadoop jar. The example below is the original one, built by Maven (hence the ./target path); with the jar packaged above you would pass wordcount.jar and the WordCount class name instead.

$ hadoop jar ./target/bdp-1.3.jar dataSet3.txt dataOutput1

  • dataSet3.txt is the data file we uploaded using put
  • dataOutput1 will be the folder where the results are written

Uploading Data Files

Create the target directory if it does not already exist, upload the file with put, then list it:

$ hdfs dfs -mkdir /data
$ hdfs dfs -put book.txt /data
$ hdfs dfs -ls /
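
Once a job finishes, the results can be inspected directly. The part-file naming below is assumed from the old MapReduce API, and the output path reuses the Execute example above:

$ hdfs dfs -cat 'dataOutput1/part-*'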

14 thoughts on “Hadoop on OS X El Capitan”

  1. I have followed all the steps correctly till “hdfs namenode -format”. And when I execute this, it says command not found. Please help.

  2. Are we not setting HADOOP_HOME? Because the hadoop -version command doesn’t say anything.

  3. When I create .profile and edit it, this happens:

    MacBook-Pro-de-Hector:~ Hector$ source ~/.profile
    -bash: /Users/Hector/.profile: line 2: syntax error near unexpected token `newline'
    -bash: /Users/Hector/.profile: line 2: `alias hstart=<“/usr/local/Cellar/hadoop/2.6.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/start-yarn.sh”>’

    Do you know the problem?

  4. Thanks for the installation directions, they worked great! Do you know where the log files are stored, so that they are retrievable to see where executions go wrong when running a Hadoop job? Thanks!

    • In case anyone has the same question and needs an answer, I found the directory where my log files are at:
      /usr/local/Cellar/hadoop/2.7.3/libexec/logs

  5. Hi, I’m a noob, sorry for the question.

    hadoop jar ./target/bdp-1.3.jar dataSet3.txt dataOutput1

    Where will the jar file be created?

  6. Thank you for the tutorial! But when I restarted my machine, “hstart” is not working. Anything I’m missing?
