
Hadoop Cluster Configuration 2

Download and configure Hadoop

You can download Hadoop in two different ways:
          1. By using your web browser.
          2. By using the Terminal (command prompt).

I will choose the second option, i.e. the Terminal.

1. Go to http://hadoop.apache.org/releases.html
          Click on the ‘Download’ link
          Click on the ‘Download a release now!’ link
          Click on ‘http://download.nextag.com/apache/hadoop/common’
          Click on ‘Stable’
          Download hadoop-1.1.2-bin.tar.gz
OR
 You can copy the location of the file and download it through the terminal
          (Right click - Copy Link Location; the path will be like http://download.nextag.com/apache/hadoop/common/stable/hadoop-1.1.2-bin.tar.gz)

2. Open the master machine and open a Terminal using CTRL+ALT+T.
Type the command below:
          $ wget http://download.nextag.com/apache/hadoop/common/stable/hadoop-1.1.2-bin.tar.gz
          This command downloads the Hadoop archive; it takes some time.
It will be saved in the current directory (by default your home folder).
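
If you want to make sure the archive came down completely before extracting it, a small optional check (the md5sum is only useful if you also fetched the checksum file from the same mirror):
          $ ls -lh hadoop-1.1.2-bin.tar.gz
          $ md5sum hadoop-1.1.2-bin.tar.gz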

3. After downloading the Hadoop archive you can extract it in two ways:
one by using the TERMINAL, and the other by unzipping it with some archive software.
          I will extract it using the TERMINAL:
          $ tar xzf hadoop-1.1.2-bin.tar.gz    or    $ tar xzvf hadoop-1.1.2-bin.tar.gz
          It will extract the files into a hadoop-1.1.2 folder in the current (HOME) directory.
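
A quick way to confirm the extraction worked is to list the folder; you should see the bin and conf directories among the extracted files. Moving the folder somewhere else (for example /usr/local) is optional, and the rest of this tutorial assumes it stays in the home directory:
          $ ls hadoop-1.1.2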

4. Now go to hadoop-1.1.2/conf/
          You generally need to change three files.
 
          a. hadoop-env.sh
                   First you need to set the Java path (JAVA_HOME for the JVM); see the sketch after this list for how to find it.
                   eg: export JAVA_HOME=/usr/lib/jvm/java-6-sun

          b. core-site.xml
                   There are three modes in which you can run Hadoop:
1.      Standalone or Local Mode: you need not change anything - you just start working.
2.      Pseudo-Distributed Mode: NN (NameNode), SNN (Secondary NameNode), JT (JobTracker), TT (TaskTracker) and DN (DataNode) all run on the same machine.
3.      Fully Distributed or Cluster Mode: the NameNode runs on the master machine, the Secondary NameNode can run on some other machine, and the DataNodes and TaskTrackers run on the other (slave) machines.
          c. mapred-site.xml
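
If you are not sure what to put in JAVA_HOME, on Ubuntu you can usually find the installed JVM like this (the exact path depends on which JDK you installed, so treat java-6-sun above as an example only):
          $ readlink -f $(which java)
          JAVA_HOME is that path without the trailing /bin/java (and without /jre if it appears), e.g. /usr/lib/jvm/java-6-sun.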


How to configure Hadoop on a single system:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

5. Change the owner and permissions of the hadoop folder
          $ chown -R yash hadoop-1.1.2 -> the owner of this folder is changed to yash
          $ chmod -R 755 hadoop-1.1.2 -> the owner gets full access; everyone else gets read and execute
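
If chown complains about permissions you may need to run it with sudo. Afterwards you can verify the owner and mode of the folder with:
          $ ls -ld hadoop-1.1.2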

6. Now open core-site.xml and copy the code below inside the configuration tag
  <property>
          <name>hadoop.tmp.dir</name>
          <value>/home/yash/tempdir</value>
          <description>A base for other temporary directories.</description>
  </property>

  <property>
          <name>fs.default.name</name>
          <value>hdfs://yeshwanth1:9000</value>
          <description>The name of the default file system.  A URI whose
          scheme and authority determine the FileSystem implementation.  The
          uri's scheme determines the config property (fs.SCHEME.impl) naming
          the FileSystem implementation class.  The uri's authority is used to
          determine the host, port, etc. for a filesystem.</description>
  </property>

  **To run on the local machine, fs.default.name should point to localhost
   eg:
   <property>  
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000</value>
   </property>
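
Since core-site.xml points hadoop.tmp.dir at /home/yash/tempdir, it is a good idea to create that folder up front so Hadoop does not fail when it first tries to use it (adjust the path and user to match your own setup):
          $ mkdir -p /home/yash/tempdir
          $ chown -R yash /home/yash/tempdir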

 7. Now open the mapred-site.xml file
  <property>
          <name>mapred.job.tracker</name>
          <value>yeshwanth1:9001</value>
          <description>The host and port that the MapReduce job tracker runs
          at.  If "local", then jobs are run in-process as a single map
          and reduce task.
          </description>
  </property>

  **To run on the local machine, mapred.job.tracker should point to localhost
  <property>
          <name>mapred.job.tracker</name>
          <value>localhost:9001</value>
  </property>

8. Open hdfs-site.xml and add the configuration properties below (the replication factor "dfs.replication" should not be more than the number of DataNodes)
  <!-- Set replication factor -->
  <property>
          <name>dfs.replication</name>
          <value>1</value>
          <description>Default block replication.
          The actual number of replications can be specified when the file is created.
          The default is used if replication is not specified in create time.
          </description>
  </property>

  <!-- Here the NameNode data will be stored -->
  <property>
          <name>dfs.name.dir</name>
          <value>/home/yash/namenodeanddatanode</value>
  </property>

  <!-- Data will be stored here. If you do not specify this, by default a folder will be created inside the /tmp directory to store the data -->
  <property>
          <name>dfs.data.dir</name>
          <value>/home/yash/namenodeanddatanode</value>
  </property>
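
As with the temp directory, you can create the dfs.name.dir / dfs.data.dir folder before starting Hadoop; the NameNode is picky about permissions, so 755 is a safe choice (again, change the path and user if yours differ):
          $ mkdir -p /home/yash/namenodeanddatanode
          $ chown -R yash /home/yash/namenodeanddatanode
          $ chmod 755 /home/yash/namenodeanddatanode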

9. Open the masters file
          Add the text "yeshwanth1" -> because my master is running on yeshwanth1

10. Open the slaves file -> add the names below
          yeshwanth1
          yeshwanth2
 I will keep the master machine as a slave as well, so that a DataNode also runs on the same machine
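
If you prefer to edit these two files from the terminal instead of a text editor, something like the following works (run from inside the hadoop-1.1.2 folder; the hostnames are my machines, use your own):
          $ echo "yeshwanth1" > conf/masters
          $ printf "yeshwanth1\nyeshwanth2\n" > conf/slaves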

11. Now open the TERMINAL
          You can copy the hadoop folder to the slave machine (i.e. master to slave)
          $ scp -r hadoop-1.1.2 yash@yeshwanth2:/home/yash
          Or
you can use any other method to copy this folder (copy-paste)
          After running the above command you will be able to see hadoop-1.1.2 on the slave
          machine called yeshwanth2

          In the masters file - located in hadoop-1.1.2/conf
          you should see the text below:
          yeshwanth1

          In the slaves file
          you should see the text below:
          yeshwanth1
          yeshwanth2
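
scp and start-all.sh are much smoother if the master can SSH to every slave (and to itself) without a password. If you have not set that up yet, a minimal sketch, assuming the user yash and the hosts yeshwanth1/yeshwanth2:
          $ ssh-keygen -t rsa -P ""
          $ ssh-copy-id yash@yeshwanth1
          $ ssh-copy-id yash@yeshwanth2
          $ ssh yash@yeshwanth2     (should log in without asking for a password)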

12. Now format the NameNode using the commands below
          $ cd hadoop-1.1.2/
          $ bin/hadoop namenode -format

13. Now start all the daemons using the command below
          $ bin/start-all.sh

 $ jps (lists the running Java processes)
          It should show the daemons below running:
          a. JobTracker
          b. NameNode
          c. SecondaryNameNode
          d. TaskTracker
          In my case the DataNode was not running on the master; see below for how to investigate this.
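
To see which DataNodes the NameNode can actually reach, and to dig into why one is missing, two things usually help: the dfsadmin report and the DataNode log. A common cause after re-running the format command is a namespaceID mismatch; in that case clearing the dfs.data.dir on the affected node (this deletes its HDFS blocks) and restarting usually fixes it.
          $ bin/hadoop dfsadmin -report
          $ less logs/hadoop-yash-datanode-yeshwanth1.log     (the log file name depends on your user and hostname)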

14. Now go to the slave machine, i.e. yeshwanth2
          Open a TERMINAL and run the command below
          $ jps
          You should see the daemons below running on the slave:
          a. DataNode
          b. TaskTracker

15. Now go back to yeshwanth1
          $ jps
          $ bin/start-all.sh (it starts the processes on the master as well as on the slaves automatically)
          $ jps
          Now you can see all the daemons running:
          NN, DN, TT, JT and SNN

** If you have three nodes, make the second or third node the SNN, so that if the master goes down the checkpointed metadata held by the SNN on the other machine can be used for recovery.

16. Now go to yeshwanth2
          $ jps
          You should see the daemons below running on this machine:
          a. DataNode
          b. TaskTracker
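
When you are done, you can stop all the daemons from the master the same way they were started; running jps afterwards should show nothing but Jps itself:
          $ bin/stop-all.sh
          $ jps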
