Hadoop 3 Installation on Ubuntu 22.04
By Rahul
Understanding unstructured data and analyzing massive amounts of data is a different ball game today.
And so, businesses have resorted to Apache Hadoop and other related technologies to manage their
unstructured data more efficiently. Not just businesses but also individuals are using Apache Hadoop
for various purposes, such as analyzing large datasets or creating a website that can process user
queries. However, installing Apache Hadoop on Ubuntu may seem like a difficult task for users new to
the world of Linux servers. Fortunately, you don’t need to be an experienced system administrator to
install Apache Hadoop on Ubuntu.
The following step-by-step installation guide will walk you through the entire process, from downloading the software to configuring the server. In this article, we will explain how to install Apache Hadoop on an Ubuntu 22.04 LTS system. The same steps can also be used on other Ubuntu versions.
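Step 1: Install Java
1. Apache Hadoop requires Java to run. The environment variables used later in this guide assume OpenJDK 11 (matching the /usr/lib/jvm/java-11-openjdk-amd64 path configured below), so install that package first:
sudo apt update
sudo apt install openjdk-11-jdk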
2. Once you have successfully installed it, check the current Java version:
java -version
3. You can find the location of the JAVA_HOME directory by running the following command.
That will be required later in this article.
dirname $(dirname $(readlink -f $(which java)))
Step 2: Create User for Hadoop
All the Hadoop components will run as the user that you create for Apache Hadoop, and that user will also be used for logging in to Hadoop’s web interface. Running Hadoop under a dedicated, unprivileged account is more secure than running it as root, and it keeps the installation self-contained.
1. Run the following command to create a new user with the name “hadoop”:
sudo adduser hadoop
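2. After the user is created, switch to the new account so the remaining steps run as the hadoop user (this step restores the numbering implied by step 3 below):
su - hadoop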
3. Now configure password-less SSH access for the newly created hadoop user. Generate an SSH
keypair first:
ssh-keygen -t rsa
4. Copy the generated public key to the authorized key file and set the proper permissions:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
5. Verify the password-less SSH access by connecting to the local machine:
ssh localhost
On the first connection you will be asked to authenticate the host by adding its RSA key to the known hosts. Type yes and hit Enter to authenticate the localhost.
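Step 3: Download Hadoop
1. Download the Hadoop 3.3.4 release archive. The mirror URL below follows the standard Apache download layout; adjust the version as needed, and note that older releases may only be available under https://archive.apache.org/dist/hadoop/common/:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz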
2. Once you’ve downloaded the file, extract it to a folder on your hard drive:
tar xzf hadoop-3.3.4.tar.gz
3. Rename the extracted folder to remove the version information. This step is optional, but if you skip it, adjust the paths in the remaining configuration accordingly.
mv hadoop-3.3.4 hadoop
4. Next, you will need to configure Hadoop and Java Environment Variables on your system.
Open the ~/.bashrc file in your favorite text editor:
nano ~/.bashrc
Append the lines below to the file. You can find your JAVA_HOME location by running dirname $(dirname $(readlink -f $(which java))) in the terminal.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Save the file and close it.
5. Load the above configuration in the current environment.
source ~/.bashrc
6. You also need to configure JAVA_HOME in the hadoop-env.sh file. Edit the Hadoop environment variable file in a text editor:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Search for the line beginning with “export JAVA_HOME” and set it to the JAVA_HOME value found in step 1:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
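Step 4: Configuring Hadoop
1. First, create the directories that will hold the HDFS namenode and datanode data. These paths match the values used in hdfs-site.xml below:
mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}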
2. Next, edit the core-site.xml file and update the fs.defaultFS value with your system hostname (you can keep localhost for a single-node setup):
nano $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
3. Then, edit the hdfs-site.xml file:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Set the replication factor and the namenode/datanode storage directories as follows:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
4. Next, edit the mapred-site.xml file:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Set YARN as the MapReduce framework:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5. Then, edit the yarn-site.xml file:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Configure the NodeManager auxiliary service:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
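Step 5: Start the Hadoop Cluster
1. Before starting the services for the first time, format the HDFS namenode as the hadoop user:
hdfs namenode -format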
Once the namenode directory is successfully formatted with the HDFS file system, you will see the message “Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted”.
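2. Next, start the HDFS and YARN services with the helper scripts shipped in $HADOOP_HOME/sbin (already on your PATH from the earlier ~/.bashrc changes):
start-dfs.sh
start-yarn.sh
3. You can confirm that the daemons (NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager) are running with:
jps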
Once all the services have started, you can access the Hadoop NameNode web interface at http://localhost:9870.
The Hadoop application (YARN ResourceManager) page is available at http://localhost:8088.
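As a quick smoke test, you can create a directory in HDFS and list the root of the file system (the /test directory name below is just an example):
hdfs dfs -mkdir /test
hdfs dfs -ls /
If the new directory appears in the listing, HDFS is accepting reads and writes.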
Conclusion
Installing Apache Hadoop on Ubuntu can be a tricky task for newbies, especially if they only follow
the instructions in the documentation. Thankfully, this article provides a step-by-step guide that will
help you install Apache Hadoop on Ubuntu with ease. All you have to do is follow the instructions
listed in this article, and you can be sure that your Hadoop installation will be up and running in no
time.