Data Visualization Lab


Practical: 1

Aim: Configure a Hadoop cluster in pseudo-distributed mode and run basic Hadoop commands.

Installation of Hadoop 3.3.2 on Ubuntu 18.04 LTS

1. Installing Java

$ sudo apt update


$ sudo apt install openjdk-8-jdk openjdk-8-jre
$ java -version

Set JAVA_HOME in .bashrc

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-8-openjdk-amd64/bin

Apply the .bashrc changes to the Ubuntu environment either by rebooting the system or by running:

$ source ~/.bashrc

2. Adding dedicated hadoop user

$ sudo addgroup hadoop


$ sudo adduser --ingroup hadoop hduser
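As a quick check that the account was created correctly (id is a standard Linux utility), the hadoop group should appear in the output:

$ id hduser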

3. Adding hduser to the sudoers file

$ sudo visudo

Add the following line to the file (visudo opens a temporary copy of /etc/sudoers, such as /etc/sudoers.tmp):

hduser ALL=(ALL:ALL) ALL

4. Now switch to hduser

$ su - hduser

5. Setting up SSH

Hadoop services such as the ResourceManager and NodeManager use SSH to share node status between slaves and the master, and between master daemons.

$ sudo apt-get install openssh-server openssh-client

After installing SSH, generate an SSH key pair and append the public key to ~/.ssh/authorized_keys.
Generate keys for secure communication:
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
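Before continuing, it is worth confirming that passwordless SSH to the local machine works; the login should not prompt for a password:

$ ssh localhost
$ exit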

6. Download the Hadoop 3.3.2 tar file and extract it into the /usr/local/hadoop folder.
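If the tarball is not already downloaded, one way to fetch it is from the Apache archive (the exact mirror URL is an assumption and may vary):

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz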

$ sudo tar xvzf hadoop-3.3.2.tar.gz

$ sudo mv hadoop-3.3.2 /usr/local/hadoop

7. Change ownership to hduser and the hadoop group, and give them full permissions.

$ sudo chown -R hduser:hadoop /usr/local/hadoop
$ sudo chmod -R 777 /usr/local/hadoop

8. Hadoop Setup

This setup, also called pseudo-distributed mode, allows each Hadoop daemon to run as
a single Java process. A Hadoop environment is configured by editing a set of
configuration files:

bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml

8.1 bashrc

$ sudo gedit ~/.bashrc


Add following lines at the end:

#Hadoop Related Options


export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

$ source ~/.bashrc
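With the new variables loaded, a quick check that the Hadoop binaries are on the PATH:

$ hadoop version

This should print a Hadoop 3.3.2 version banner.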

8.2 hadoop-env.sh

Let's change the working directory to the Hadoop configuration location:

$ cd /usr/local/hadoop/etc/hadoop/

$ sudo gedit hadoop-env.sh


Add this line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
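If the JDK lives elsewhere on your machine, one way to discover the correct path (javac is used because it ships only with the JDK, not the JRE):

$ dirname $(dirname $(readlink -f $(which javac)))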

8.3 yarn-site.xml

$ sudo gedit yarn-site.xml


Add the following lines inside the <configuration> element:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

8.4 hdfs-site.xml

$ sudo gedit hdfs-site.xml


Add the following lines inside the <configuration> element:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>

8.5 core-site.xml

$ sudo gedit core-site.xml


Add the following lines inside the <configuration> element (fs.default.name is deprecated in favour of fs.defaultFS, but both names still work):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

8.6 mapred-site.xml

$ sudo gedit mapred-site.xml


Add the following lines inside the <configuration> element (note that the correct property name is mapreduce.framework.name, not mapred.framework.name):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>localhost:10020</value>
</property>
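On Hadoop 3.x, MapReduce jobs submitted to YARN sometimes fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" unless the MapReduce home is also visible to the containers. A commonly used addition to mapred-site.xml (adjust the path if your installation directory differs) is:

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>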

9. Create temp directory, directory for datanode and namenode

$ sudo mkdir -p /home/hduser/hadoop/tmp


$ sudo chown -R hduser:hadoop /home/hduser/hadoop/tmp

$ sudo chmod -R 777 /home/hduser/hadoop/tmp

$ sudo mkdir -p /usr/local/hadoop/yarn_data/hdfs/namenode


$ sudo mkdir -p /usr/local/hadoop/yarn_data/hdfs/datanode
$ sudo chmod -R 777 /usr/local/hadoop/yarn_data/hdfs/namenode
$ sudo chmod -R 777 /usr/local/hadoop/yarn_data/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop/yarn_data/hdfs/namenode
$ sudo chown -R hduser:hadoop /usr/local/hadoop/yarn_data/hdfs/datanode

10. Format the Hadoop namenode for a fresh start


$ hdfs namenode -format
Start all Hadoop services by executing the commands one by one:

$ start-dfs.sh
$ start-yarn.sh

or
$ start-all.sh

Type this simple command to check if all the daemons are active and running as Java
processes:
$ jps

Following output is expected if all went well:

6960 SecondaryNameNode
7380 NodeManager
6632 NameNode
11066 Jps
7244 ResourceManager
6766 DataNode

Access Hadoop UI from Browser

The default port number 9870 gives you access to the Hadoop NameNode UI:

http://localhost:9870

The NameNode user interface provides a comprehensive overview of the entire cluster.

The default port 9864 is used to access individual DataNodes directly from your
browser:
http://localhost:9864
The YARN ResourceManager is accessible on port 8088:
http://localhost:8088
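With all daemons running, the basic Hadoop commands from the aim can be exercised against HDFS. The file and directory names below are only examples:

$ hdfs dfs -mkdir -p /user/hduser                  # create a home directory in HDFS
$ echo "hello hadoop" > sample.txt                 # make a small local test file
$ hdfs dfs -put sample.txt /user/hduser/           # copy it into HDFS
$ hdfs dfs -ls /user/hduser                        # list the directory
$ hdfs dfs -cat /user/hduser/sample.txt            # print the file contents from HDFS
$ hdfs dfs -get /user/hduser/sample.txt copy.txt   # copy it back to the local disk
$ hdfs dfs -rm /user/hduser/sample.txt             # delete it from HDFS

When finished, the daemons can be stopped with stop-dfs.sh and stop-yarn.sh (or stop-all.sh).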
