Friday, June 14, 2013

How to Build a Hadoop Cluster Using 2 Linux Machines

STEP 1) Install Java 6 or above on the Linux machine ( jdk1.6.0_12 )
I have 'jdk-6u12-linux-i586.bin' on my Red Hat machine.
To install it, run the following commands:
# chmod 744 jdk-6u12-linux-i586.bin
# ./jdk-6u12-linux-i586.bin
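To confirm the installation, you can check the version from the install directory (the path below is the same placeholder path used for JAVA_HOME later in this post):
# /java_installation_folder/jdk1.6.0_12/bin/java -version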

STEP 2) Download the JCE (Java Cryptography Extension) Unlimited Strength Jurisdiction Policy Files for Java 6 and
extract them; this gives a 'jce' folder containing the policy jars.
# cp -f jce/*.jar $JAVA_HOME/jre/lib/security/
# chmod 444 $JAVA_HOME/jre/lib/security/*.jar

STEP 3) Download hadoop-0.20.2.tar.gz or any later version,
extract it, and copy the 'hadoop-0.20.2' folder to the '/usr/local/' directory.
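A possible set of commands for this (the archive name is taken from above and the target path matches HADOOP_HOME below):
# tar -xzf hadoop-0.20.2.tar.gz
# cp -r hadoop-0.20.2 /usr/local/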

STEP 4) Set the environment variables:
# export JAVA_HOME=/java_installation_folder/jdk1.6.0_12
# export HADOOP_HOME=/usr/local/hadoop-0.20.2

STEP 5) Install the same software on the second Linux machine.
The two machines are then described as follows:

Server IP                             HostName                                Role

1)                                        hostmaster                               Master [ NameNode and JobTracker ]
2)                                        hostslave                                 Slave [ DataNode and TaskTracker ]

STEP 6) Now make the following settings on the Master:

# vim /etc/hosts
Make the changes as follows:
comment out all existing entries and add a line at the end with the master's IP address followed by hostmaster,
then save and exit.
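For example, if the master's IP address were 192.168.1.10 (a placeholder; use your actual address), the added line would look like:
192.168.1.10    hostmaster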

Changes to be made on the Slave machine:

# vim /etc/hosts
Make the changes as follows:
comment out all existing entries and add two lines at the end, one with the slave's IP address followed by hostslave and one with the master's IP address followed by hostmaster,
then save and exit.
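Again with placeholder addresses (192.168.1.11 for the slave, 192.168.1.10 for the master), the added lines would look like:
192.168.1.11    hostslave
192.168.1.10    hostmaster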

STEP 7) Set up SSH for communication:

Do these steps on the master as well as on the slave:
# ssh-keygen -t rsa
This generates an RSA public/private key pair.
This is needed because the Hadoop Master Node communicates with the Slave Node using SSH.
It creates the public key file 'id_rsa.pub' under the '/root/.ssh' directory. Now rename the Master's 'id_rsa.pub' (for example to 'master_id_rsa.pub') and copy it to the Slave Node under the same '/root/.ssh' path.
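One way to copy it over is scp (this assumes password-based root SSH login to the slave still works at this point; the target file name is just the example name from above):
# scp /root/.ssh/id_rsa.pub root@hostslave:/root/.ssh/master_id_rsa.pub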
Then execute the following command on the Slave to add the Master's public key to the Slave's authorized keys.

# cat /root/.ssh/master_id_rsa.pub >> /root/.ssh/authorized_keys

Now try to ssh to the Slave Node from the Master. It should connect without asking for a password.

# ssh hostslave

STEP 8) Setting up the MASTER NODE:
Set up Hadoop to work in fully distributed mode by editing the configuration files under the $HADOOP_HOME/conf/ directory.

Configuration Properties:
Property                                               Explanation
1) fs.default.name                           NameNode URI
2) mapred.job.tracker                       JobTracker URI
3) dfs.replication                                Replication factor
4) hadoop.tmp.dir (optional)              Temporary directory

Let us start with the configuration files:

1) $HADOOP_HOME/conf/hadoop-env.sh
Make the change as follows:
export JAVA_HOME=/java_installation_folder/jdk1.6.0_12

2) $HADOOP_HOME/conf/core-site.xml
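The contents should point fs.default.name at the NameNode on the master. The port 9000 and the hadoop.tmp.dir path below are conventional example values, not requirements:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hostmaster:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-0.20.2/tmp</value>
  </property>
</configuration>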


3) $HADOOP_HOME/conf/hdfs-site.xml
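Here dfs.replication is set; since this cluster has only one DataNode (hostslave), a replication factor of 1 is a sensible example value:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>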


4) $HADOOP_HOME/conf/mapred-site.xml
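This points mapred.job.tracker at the JobTracker on the master; port 9001 is again a common example choice:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hostmaster:9001</value>
  </property>
</configuration>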


5) $HADOOP_HOME/conf/masters
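This file just lists the master's hostname, one entry per line:

hostmaster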

6) $HADOOP_HOME/conf/slaves
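This file lists the machines that run the DataNode and TaskTracker, one hostname per line; here that is only the slave:

hostslave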

Now copy all these files to the $HADOOP_HOME/conf/ directory of the SLAVE machine.
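For example, with scp (assuming Hadoop is installed at the same path on both machines):
# scp $HADOOP_HOME/conf/* root@hostslave:$HADOOP_HOME/conf/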

STEP 9) Set up the Master and Slave Node (run on both machines):

# $HADOOP_HOME/bin/hadoop namenode -format
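This formats HDFS. After that, the daemons still need to be started from the master; a sketch using the scripts bundled with this Hadoop version, relying on the passwordless SSH set up above:
# $HADOOP_HOME/bin/start-all.sh
You can then run the JDK's 'jps' tool on each machine to verify that the NameNode/JobTracker (master) and DataNode/TaskTracker (slave) processes are up.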

Now your cluster is ready to run jobs.
