Data Visualization Lab
Aim: Configure a Hadoop cluster in pseudo-distributed mode and run basic Hadoop
commands.
1. Installing Java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-8-openjdk-amd64/bin
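Before moving on, it can help to confirm that JAVA_HOME really points at a JDK. The following is a minimal sketch, assuming a POSIX shell; the helper name check_java_home is hypothetical and is not part of Java or Hadoop.

```shell
# check_java_home DIR: succeed only if DIR/bin/java exists and is executable.
# (check_java_home is an illustrative helper, not a standard command.)
check_java_home() {
    dir="$1"
    if [ -x "$dir/bin/java" ]; then
        echo "OK: $dir/bin/java found"
    else
        echo "ERROR: no executable java under $dir/bin" >&2
        return 1
    fi
}

# Example: check the path used in this lab.
# Prints an ERROR line if Java 8 is not installed at that location.
check_java_home "${JAVA_HOME:-/usr/lib/jvm/java-8-openjdk-amd64}" || true
```

If the check fails, the two export lines above probably need a different JDK path for your machine.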
$ sudo visudo
$ su - hduser
5. Setting up SSH
Hadoop services such as the ResourceManager and NodeManager use SSH to share node
status between slave and master nodes and between master nodes.
After installing SSH, generate an SSH key pair and append the public key to ~/.ssh/authorized_keys.
Generate Keys for secure communication:
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
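Running the cat command more than once appends the same key again, and sshd may ignore authorized_keys if its permissions are too open. The sketch below shows one way to make the step repeatable; authorize_key is a hypothetical helper name, not an OpenSSH command, and the 700/600 permissions are the commonly used defaults rather than something this lab prescribes.

```shell
# authorize_key PUBKEY AUTHFILE: append PUBKEY's contents to AUTHFILE once,
# then restrict permissions on the file and its directory.
# (authorize_key is an illustrative helper, not an OpenSSH command.)
authorize_key() {
    pubkey="$1"
    authfile="$2"
    mkdir -p "$(dirname "$authfile")"
    touch "$authfile"
    # -x whole-line match, -F fixed string: skip the append if the key is already listed
    if ! grep -qxF "$(cat "$pubkey")" "$authfile"; then
        cat "$pubkey" >> "$authfile"
    fi
    chmod 700 "$(dirname "$authfile")"
    chmod 600 "$authfile"
}

# Usage with the key generated above:
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```

Afterwards, `ssh localhost` should normally log in without asking for a password, provided the SSH server is running.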
8. Hadoop Setup
This setup, also called pseudo-distributed mode, runs every Hadoop daemon on one
machine, each as its own Java process. A Hadoop environment is configured by editing
a set of configuration files:
8.1 bashrc
$ source ~/.bashrc
8.2 hadoop-env.sh
8.3 yarn-site.xml
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
8.4 hdfs-site.xml
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>
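The two directories named in dfs.namenode.name.dir and dfs.datanode.data.dir need to exist and be writable by the user that runs Hadoop. A minimal sketch of creating them follows; make_hdfs_dirs is a hypothetical helper, and the chown to hduser is an assumption based on the user name used earlier in this lab.

```shell
# make_hdfs_dirs BASE: create the NameNode and DataNode directories under BASE.
# (make_hdfs_dirs is an illustrative helper, not a Hadoop command.)
make_hdfs_dirs() {
    base="$1"
    mkdir -p "$base/namenode" "$base/datanode"
}

# For this lab the base would be the path from hdfs-site.xml, for example:
#   sudo mkdir -p /usr/local/hadoop/yarn_data/hdfs
#   sudo chown -R hduser /usr/local/hadoop/yarn_data   # assumes Hadoop runs as hduser
#   make_hdfs_dirs /usr/local/hadoop/yarn_data/hdfs
```

On a fresh install the NameNode directory is normally initialised once with `hdfs namenode -format` before the daemons are started.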
8.5 core-site.xml
8.6 mapred-site.xml
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
or
$ start-all.sh
Type this simple command to check if all the daemons are active and running as Java
processes:
$ jps
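Reading the jps listing by eye is easy to get wrong, so the check can be scripted. The sketch below is an assumption-laden example: missing_daemons is a hypothetical helper, and the list of expected daemons assumes a single-node setup that runs both HDFS and YARN.

```shell
# missing_daemons: read `jps` output on stdin and print each expected daemon
# that is not listed. Returns non-zero if any daemon is missing.
# (missing_daemons is an illustrative helper, not a Hadoop or JDK command.)
missing_daemons() {
    running=$(awk '{print $2}')   # second column of jps output is the class name
    status=0
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qx "$d"; then
            echo "missing: $d"
            status=1
        fi
    done
    return $status
}

# Usage on a machine with the JDK installed:
#   jps | missing_daemons && echo "all daemons running"
```

If a daemon is reported missing, its log file under the Hadoop logs directory is usually the first place to look.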
The default port number 9870 gives you access to the Hadoop NameNode UI:
http://localhost:9870
The NameNode user interface provides a comprehensive overview of the entire cluster.
The default port 9864 is used to access individual DataNodes directly from your
browser:
http://localhost:9864
The YARN ResourceManager is accessible on port 8088: http://localhost:8088
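The three web interfaces can also be probed from the terminal, which is handy on a machine without a browser. This is a rough sketch that assumes curl is installed; ui_url and check_ui are hypothetical helper names, not part of Hadoop.

```shell
# ui_url PORT: print the local web UI address for PORT.
ui_url() {
    echo "http://localhost:$1"
}

# check_ui NAME PORT: report whether something answers HTTP on PORT.
# (Both helpers are illustrative; they only test that the port responds.)
check_ui() {
    if curl -s -o /dev/null --max-time 3 "$(ui_url "$2")"; then
        echo "$1 UI reachable at $(ui_url "$2")"
    else
        echo "$1 UI not reachable at $(ui_url "$2")"
    fi
}

# Usage once the daemons are running:
#   check_ui NameNode 9870
#   check_ui DataNode 9864
#   check_ui ResourceManager 8088
```

A "not reachable" result only says the port did not answer; the daemon may still be starting or may be bound to a different port in your configuration.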