HADOOP INSTALLATION ON SINGLE NODE UBUNTU 12.04

    INSTALLATION GUIDE FOR HADOOP IN UBUNTU 12.04 (SINGLE NODE)

 

Installing Java-7-oracle

 

sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update

sudo apt-get install oracle-java7-installer
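Once the installer finishes, you can verify that the JDK is active (the exact version string will vary):

$ java -version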

 

 

Create a separate user for hadoop

 

$ sudo addgroup hadoop

$ sudo adduser --ingroup hadoop hduser

 

Configure SSH

 

su - hduser

ssh-keygen -t rsa -P ""
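The generated public key also needs to be appended to hduser's authorized keys; otherwise the login test below will still prompt for a password (this step is easy to miss):

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys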

 

To be sure that the SSH installation went well, you can open a new terminal and try to create an SSH session as hduser with the following command:

$ssh localhost

 

If the connection to localhost fails, you need to (re)install the SSH server:

 

sudo apt-get install openssh-server
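On Ubuntu 12.04 the server should start automatically after installation; assuming the default upstart job name, you can check it with:

$ sudo service ssh status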

 

Edit Sudoers

pkexec visudo

 

Add the line below to add hduser to the sudoers file:

hduser ALL=(ALL) ALL

 

Press Ctrl+O to save the file in nano, then Ctrl+X to exit the editor.
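A quick way to confirm the new entry works: switch to hduser and run a trivial command through sudo; it should print root.

$ su - hduser
$ sudo whoami
root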

 

Disable IPv6

 

 

$sudo gedit /etc/sysctl.conf

This command will open sysctl.conf in a text editor; copy the following lines to the end of the file:

 

#disable ipv6

net.ipv6.conf.all.disable_ipv6 = 1

net.ipv6.conf.default.disable_ipv6 = 1

net.ipv6.conf.lo.disable_ipv6 = 1

 

If you face a problem telling you that you don't have permissions, remember to run the previous commands from your root account (or with sudo).

These changes normally require you to reboot your system, but alternatively you can run the following command to reload the configuration:

 

$sudo sysctl -p

To make sure that IPv6 is disabled, you can run the following command (it should print 1):

$cat /proc/sys/net/ipv6/conf/all/disable_ipv6

 

Configuration of Hadoop

Installing Hadoop

 

Now we can download Hadoop to begin the installation. Go to the Apache Downloads page and download Hadoop version 1.0.4 (the current stable version at the time of writing).

Then you need to extract the tar file and rename the extracted folder to 'hadoop'. Open a new terminal and run the following commands:

 

$ cd /home/hduser

$ sudo tar xzf hadoop-1.0.4.tar.gz

$ sudo mv hadoop-1.0.4 hadoop
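Because the archive was extracted with sudo, the files are owned by root. Hand them over to hduser so the Hadoop daemons can read and write them:

$ sudo chown -R hduser:hadoop /home/hduser/hadoop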

 

Update $HOME/.bashrc

You will need to update the .bashrc for hduser (and for every user who needs to administer Hadoop). To edit the .bashrc file, you will need to open it as root:

 

$sudo gedit /home/hduser/.bashrc

 

Then add the following configuration at the end of the .bashrc file:

 

# Set Hadoop-related environment variables

 

export HADOOP_HOME=/home/hduser/hadoop

 

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)

 

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

 

# Some convenient aliases and functions for running Hadoop-related commands

 

unalias fs &> /dev/null

alias fs="hadoop fs"

unalias hls &> /dev/null

alias hls="fs -ls"

 

# If you have LZO compression enabled in your Hadoop cluster and

# compress job outputs with LZOP (not covered in this tutorial):

# Conveniently inspect an LZOP compressed file from the command

# line; run via:

#

# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo

#

# Requires installed 'lzop' command.

#

lzohead () {

    hadoop fs -cat "$1" | lzop -dc | head -1000 | less

}

 

# Add Hadoop bin/ directory to PATH

export PATH=$PATH:$HADOOP_HOME/bin
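For the new variables and aliases to take effect in your current shell, reload the file (or simply open a new terminal):

$ source /home/hduser/.bashrc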

 

 

hadoop-env.sh

We only need to update the JAVA_HOME variable in this file. Open it in a text editor with one of the following commands:

 

$sudo gedit /home/hduser/hadoop/conf/hadoop-env.sh

 

or

 

nano /home/hduser/hadoop/conf/hadoop-env.sh

 

Then you will need to change the following line

 

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

 

To 

 

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

 

Note: if you face an "Error: JAVA_HOME is not set" error while starting the services, it most likely means you forgot to uncomment the previous line (just remove the #).
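If you are unsure where the Oracle JDK landed on your machine, you can confirm that the directory used above actually exists before starting the services:

$ ls -d /usr/lib/jvm/java-7-oracle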

 

core-site.xml

First, we need to create a temp directory for the Hadoop framework. If you need this environment only for testing or quick prototyping (e.g. developing simple Hadoop programs for personal tests), I suggest creating this folder under the /home/hduser/ directory; otherwise, you should create it in a shared location (like /usr/local), but then you may face some security issues. To avoid the exceptions such security restrictions may cause (like java.io.IOException), I have created the tmp folder under hduser's space.

 

To create this folder, type the following command:

 

$ sudo mkdir /home/hduser/tmp

 

Please note that if you want to make another admin user (e.g. hduser2 in the hadoop group), you should grant that user read and write permissions on this folder using the following commands (mode 755 gives write access to the owner only; use 775 if other members of the hadoop group also need to write):

 

 

$ sudo chown hduser:hadoop /home/hduser/tmp

 

$ sudo chmod 755 /home/hduser/tmp
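You can double-check the resulting owner and mode with:

$ ls -ld /home/hduser/tmp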

Now, we can open hadoop/conf/core-site.xml to edit the hadoop.tmp.dir entry.

We can open the core-site.xml using text editor:

 

$sudo gedit /home/hduser/hadoop/conf/core-site.xml

 

or

 

nano /home/hduser/hadoop/conf/core-site.xml

 

Then add the following configurations between the <configuration> … </configuration> elements:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

mapred-site.xml

We will open hadoop/conf/mapred-site.xml in a text editor and add the following configuration values between the <configuration> … </configuration> elements (as we did for core-site.xml):

nano /home/hduser/hadoop/conf/mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

 

hdfs-site.xml

Open hadoop/conf/hdfs-site.xml in a text editor and add the following configuration between the <configuration> … </configuration> elements:

nano /home/hduser/hadoop/conf/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>

 

Formatting NameNode


 

You need to format the NameNode of your HDFS. Do not do this step while the cluster is running; it is usually done only once, at installation time. Be aware that formatting erases any data already stored in HDFS.

Run the following command

 

$/home/hduser/hadoop/bin/hadoop namenode -format

 


(Screenshot: NameNode formatting output)

Starting Hadoop Cluster

You will need to navigate to the hadoop/bin directory and run the ./start-all.sh script.

cd /home/hduser/hadoop/bin/

./start-all.sh
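When you are finished working with the cluster, the companion script in the same directory shuts all the daemons down again:

./stop-all.sh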

 


(Screenshot: starting the Hadoop services with ./start-all.sh)

 

There is a nice tool called jps (it ships with the JDK). You can use it to ensure that all the services are up.
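On a healthy single-node setup, jps should list the five Hadoop daemons, similar to the following (the process IDs will differ on your machine):

$ jps
2287 NameNode
2422 DataNode
2561 SecondaryNameNode
2642 JobTracker
2778 TaskTracker
2887 Jps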
