
CDH3 Pseudo installation on Ubuntu

1) Do not create a user named hadoop; the CDH3 packages reserve that name (they create a hadoop group and dedicated hdfs and mapred users), and a pre-existing hadoop user will cause conflicts during installation.


2) Install Java
Download the JDK self-extracting binary (jdk-6u30-linux-x**, where ** depends on your architecture) to the desktop, then copy and run it:
$ cd ~/Desktop
$ sudo cp jdk-6u30-linux-x** /usr/local
$ cd /usr/local
$ sudo sh jdk-6u30-linux-x**
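The binary unpacks into /usr/local/jdk1.6.0_30, which is the path used throughout this guide. Verify the extraction with:
$ /usr/local/jdk1.6.0_30/bin/java -version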
3) Install the CDH3 repository package
Go to http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH3/CDH3u6/CDH3Installation-Guide/CDH3-Installation-Guide.html
Click on "Installing CDH3 on Ubuntu and Debian Systems", then click "this link for a Maverick system" to download the repository package.
Install it with the GDebi package installer, or save it and run:
$ sudo dpkg -i Downloads/cdh3-repository_1.0_all.deb
$ sudo apt-get update
4) Install Hadoop
$ apt-cache search hadoop    (must list all available Hadoop packages)
$ sudo apt-get install hadoop-0.20 hadoop-0.20-native
$ sudo apt-get install hadoop-0.20-<daemon type>    (repeat for each daemon type to install all daemons; see the loop below)
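A pseudo-distributed node needs all five daemons. As a shortcut, this loop installs each daemon package individually (package names as shown by apt-cache search above):
$ for d in namenode secondarynamenode datanode jobtracker tasktracker; do sudo apt-get install -y hadoop-0.20-$d; done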
5) Set Java and Hadoop Home
Edit ~/.bashrc:

$ gedit ~/.bashrc

# Set Hadoop-related environment variables


export HADOOP_HOME=/usr/lib/hadoop

export PATH=$PATH:/usr/lib/hadoop/bin

# Set JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.6.0_30
export PATH=$PATH:/usr/local/jdk1.6.0_30/bin
Close all terminals, open a new one (or run: source ~/.bashrc), and test:
$ echo $JAVA_HOME
$ echo $HADOOP_HOME
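If both variables print correctly, the hadoop command from HADOOP_HOME/bin should also resolve on the PATH:
$ hadoop version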
6) Add the dedicated users hdfs and mapred (created by the CDH3 packages) to the hadoop group

$ sudo gpasswd -a hdfs hadoop


$ sudo gpasswd -a mapred hadoop
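Verify the group membership took effect:
$ id hdfs     (should list hadoop among the groups)
$ id mapred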
7) Configuration
$ cd /usr/lib/hadoop/conf
Set Java Home in hadoop-env.sh
$ sudo gedit hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.6.0_30

8) core-site.xml
$ sudo gedit core-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/lib/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
$ sudo mkdir /usr/lib/hadoop/tmp
$ sudo chmod 750 /usr/lib/hadoop/tmp/
$ sudo chown hdfs:hadoop /usr/lib/hadoop/tmp/
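For reference, the complete core-site.xml after this step should look roughly as follows; the stock file already provides the <configuration> element, and hdfs-site.xml and mapred-site.xml in the next steps follow the same pattern:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/lib/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>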
9) hdfs-site.xml
$ sudo gedit hdfs-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/storage/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/storage/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
$ sudo mkdir /storage
$ sudo chmod 775 /storage/
$ sudo chown hdfs:hadoop /storage/
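Optionally pre-create the name and data directories so the hdfs user owns them outright; otherwise the namenode format and datanode startup will create them under the writable /storage:
$ sudo mkdir -p /storage/name /storage/data
$ sudo chown -R hdfs:hadoop /storage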

10) mapred-site.xml
$ sudo gedit mapred-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://localhost:8021</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/mapred/local</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/mapred/temp</value>
</property>
$ sudo mkdir /mapred

$ sudo chmod 775 /mapred


$ sudo chown mapred:hadoop /mapred
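Optionally pre-create the local and temp subdirectories; otherwise the MapReduce daemons will create them under the writable /mapred:
$ sudo mkdir -p /mapred/local /mapred/temp
$ sudo chown -R mapred:hadoop /mapred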
11) User Assignment
Set the daemon user variables (e.g. in ~/.bashrc, alongside the exports from step 5):

export HADOOP_NAMENODE_USER=hdfs
export HADOOP_SECONDARYNAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
export HADOOP_JOBTRACKER_USER=mapred
export HADOOP_TASKTRACKER_USER=mapred

12) Format namenode


$ cd /usr/lib/hadoop/bin/

$ sudo -u hdfs hadoop namenode -format


You should see a "successfully formatted" message; otherwise, read the error, correct the cause, and re-run the format.
13) Start Daemons
$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
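Note that mapred.system.dir from step 10 lives in HDFS, not on the local disk. If the JobTracker log shows it failing to create that directory, create it manually once HDFS is up and restart the JobTracker:
$ sudo -u hdfs hadoop fs -mkdir /mapred/system
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
$ sudo /etc/init.d/hadoop-0.20-jobtracker restart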

Check each daemon's log under /var/log/hadoop-0.20 for errors.

Check that all ports are listening:
$ sudo netstat -plten
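You can also list the running Java processes with jps, which ships with the JDK (path from step 2):
$ sudo /usr/local/jdk1.6.0_30/bin/jps
NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker should all appear.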
14) Check the Web UIs
http://localhost:50070 - NameNode (HDFS admin)
http://localhost:50030 - JobTracker (MapReduce)
