HDFS Installation Steps
These installation steps are shared in advance with those in the class, just to save time during the class itself.
Windows
Ideally, HDFS is not meant to be installed on Windows, but you can follow these steps to install it. You might face some issues along the way, and in a few cases the installation might not work at all. Don’t worry; we will try to resolve any problems on a best-effort basis.
1. First, check whether Java is already installed by opening a Command Prompt and running java -version. If you see something like java version "1.8.0_221", it’s already installed. You might have a different Java version installed; make sure it is JDK 8 or higher, as Hadoop works well with JDK 8. If Java is installed, you can skip Step 1 and proceed directly to Step 2 to download and configure Hadoop.
Hadoop requires Java to run, so we need to install Java first. Here's how to do it:
1. Open your web browser and go to the Oracle Java Downloads page or
AdoptOpenJDK.
2. Look for Java SE 8 (also called JDK 8). Select the version for Windows.
3. Download the JDK installer file for your system (likely Windows x64 Installer
if you're using a 64-bit version of Windows).
The JAVA_HOME environment variable tells the system where Java is installed.
1. Open the Start menu, search for "environment variables", open Edit the system environment variables, and click Environment Variables.
2. Under System Variables, click New and create a variable named JAVA_HOME whose value is your JDK installation path. Example:
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_221
3. Now, select the Path variable under System Variables and click Edit.
○ Click New and add %JAVA_HOME%\bin to the list of paths.
○ Click OK to save the changes.
1. Open the Command Prompt (type cmd in the Windows search bar and press
Enter).
2. In the Command Prompt, type the following command and press Enter:
java -version
If Java is installed correctly, you should see output like this:
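java version "1.8.0_221"
Java(TM) SE Runtime Environment (build 1.8.0_221-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.221-b11, mixed mode)
(The exact version and build numbers will differ depending on the JDK you installed.)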
1. Open your web browser and go to the official Apache Hadoop Releases page:
Hadoop Release Page.
2. Scroll down and download the latest stable release of Hadoop, such as
Hadoop 3.x.x. Click on the binary release link (e.g., hadoop-3.3.4.tar.gz).
1. Once the Hadoop archive (.tar.gz file) is downloaded, you need to extract it.
You can use a tool like 7-Zip to extract the archive:
○ Right-click on the file and select Extract to hadoop-3.3.x/.
2. After extraction, move the extracted folder to a directory where you want to
install Hadoop (for example, C:\hadoop).
Hadoop on Windows also needs the winutils.exe helper binary that matches your Hadoop version.
1. Locate (or create) the bin folder inside your Hadoop installation directory (C:\hadoop).
2. Place the winutils.exe file inside this bin folder (C:\hadoop\bin).
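Depending on your setup, you may also want a HADOOP_HOME environment variable so that the Hadoop commands used later resolve from any Command Prompt. Set it the same way you set JAVA_HOME, for example:
HADOOP_HOME = C:\hadoop
and add %HADOOP_HOME%\bin and %HADOOP_HOME%\sbin to the Path variable, just as you did for %JAVA_HOME%\bin.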
Now that we have Java, Hadoop, and WinUtils in place, let’s configure Hadoop.
1. Navigate to C:\hadoop\etc\hadoop.
2. Open the file core-site.xml in a text editor (such as Notepad) and add the following inside the <configuration> tags:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
3. Next, open hdfs-site.xml (in the same folder) and add the following inside the <configuration> tags:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop/data/datanode</value>
</property>
</configuration>
These folders will store the metadata and data blocks of HDFS.
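You can create these folders up front from a Command Prompt; the paths below simply mirror the values used in hdfs-site.xml above:
mkdir C:\hadoop\data\namenode
mkdir C:\hadoop\data\datanode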
Step 5: Format the NameNode
Before we start HDFS, we need to format the NameNode, which is the main node of
the HDFS cluster.
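1. Open a new Command Prompt and run the standard HDFS format command (this assumes Hadoop’s bin folder is on your Path, as described above):
hdfs namenode -format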
2. If the formatting is successful, you will see messages indicating that the
NameNode has been formatted.
Step 6: Start HDFS
1. In one Command Prompt window, run the following command to start the Hadoop Distributed File System (HDFS):
start-dfs.cmd
2. In a second Command Prompt window, verify that HDFS is running:
○ List the directories in the root directory of HDFS:
hdfs dfs -ls /
○ Create a test directory in HDFS:
hdfs dfs -mkdir /user
macOS
1. Open Terminal (you can find it via Spotlight or in Applications > Utilities).
2. Type the following command and press Enter:
java -version
If you don’t have JDK 8 or a higher version installed, follow these steps to install it.
1. Download the JDK 8 installer for macOS (a .dmg file) from the Oracle Java Downloads page or AdoptOpenJDK.
2. Double-click the downloaded .dmg file and follow the on-screen instructions to install JDK 8.
3. Once the installation is complete, you need to set up the JAVA_HOME environment variable.
After installing Java JDK 8, the JAVA_HOME variable needs to be set so Hadoop can
find it.
1. Open the Terminal and edit your shell configuration file, depending on your
shell:
2. For Zsh (default shell in macOS Catalina and later):
nano ~/.zshrc
3. For Bash (if you're using an older macOS version):
nano ~/.bash_profile
4. Add the following line to the bottom of the file:
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
5. Save the file by pressing CTRL + O (then Enter to confirm), and exit the editor with CTRL + X.
6. Apply the changes by running:
source ~/.zshrc # or `source ~/.bash_profile` for Bash users
7. Verify that JAVA_HOME is set correctly by typing:
echo $JAVA_HOME
8. This should return a path similar to
/Library/Java/JavaVirtualMachines/jdk1.8.0_301.jdk/Contents/Home.
9. Finally, check that Java 8 is the active version:
java -version
10. You should now see 1.8.x as the version, confirming that JDK 8 is properly set
up.
Step 4: Install Hadoop
Now that Java JDK 8 is installed and configured, you can install Hadoop.
1. Using Homebrew is the easiest way to install Hadoop. If you don’t have Homebrew, install it first by running this command in Terminal:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
2. Install Hadoop with Homebrew:
brew install hadoop
3. Verify the installation by checking the version:
hadoop version
You should see the Hadoop version number and other details.
Open core-site.xml in Hadoop’s configuration directory (for a Homebrew install this is typically under /usr/local/Cellar/hadoop/<version>/libexec/etc/hadoop) and add the following inside the <configuration> tags:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Then open hdfs-site.xml in the same directory and add:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/Cellar/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/Cellar/hadoop/hdfs/datanode</value>
</property>
</configuration>
In the Terminal, create the necessary directories for the NameNode and DataNode:
mkdir -p /usr/local/Cellar/hadoop/hdfs/namenode
mkdir -p /usr/local/Cellar/hadoop/hdfs/datanode
Optional step, if you are facing issues with SSH
Set up SSH
The first step is to generate an SSH key pair without a passphrase, for example:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
However, if you already have a key pair that is protected by a passphrase, you probably want to avoid overwriting the existing key. In that case, you can generate a separate key pair for accessing localhost:
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/<username>/.ssh/id_rsa):
/Users/<username>/.ssh/id_rsa_local
...
$ cat ~/.ssh/id_rsa_local.pub >> ~/.ssh/authorized_keys
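Then point SSH at the new key for local connections by adding an entry to ~/.ssh/config (create the file if it does not exist); a minimal entry looks like this:
Host localhost 0.0.0.0
  IdentityFile ~/.ssh/id_rsa_local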
This tells SSH to use the key id_rsa_local when accessing localhost, or 0.0.0.0. This is
necessary to start a one-node Hadoop cluster.
With the above, now try:
$ ssh localhost
It should succeed. If it does not, make sure Remote Login is enabled on your local system (System Preferences -> Sharing).
Finally, format the NameNode and start HDFS:
hdfs namenode -format
start-dfs.sh
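Once the daemons are up, you can confirm that the NameNode and DataNode processes are running with the jps tool that ships with the JDK:
jps
You should see NameNode, DataNode, and SecondaryNameNode among the listed Java processes.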
Linux
1. Open the Terminal (you can use the shortcut Ctrl + Alt + T) and check whether Java is already installed:
java -version
○ The important part is the version number. If it starts with 1.8, you
already have JDK 8, and you can skip the Java installation step.
○ If the version is higher than 1.8 (e.g., 11, 15), or if Java is not installed,
you need to install JDK 8.
2. Install OpenJDK 8:
sudo apt install openjdk-8-jdk -y
3. Verify the installation by checking the Java version again:
java -version
4. It should now display a version starting with 1.8.
1. Download the Hadoop 3.3.4 binary archive (hadoop-3.3.4.tar.gz) from the Apache Hadoop Releases page into your working directory. You can check that page for the latest version.
2. Extract the downloaded archive:
tar -xzvf hadoop-3.3.4.tar.gz
3. Move the extracted Hadoop folder to /usr/local:
sudo mv hadoop-3.3.4 /usr/local/hadoop
We need to set Hadoop environment variables so that the system can access
Hadoop commands globally.
1. Open your .bashrc file:
nano ~/.bashrc
2. Add the following lines at the bottom of the file to set the Hadoop environment variables:
# Set HADOOP_HOME and put the Hadoop binaries on the PATH
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
3. Apply the changes:
source ~/.bashrc
Next, open core-site.xml (in /usr/local/hadoop/etc/hadoop) and add the following inside the <configuration> tags:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
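You also need to set the replication factor and the NameNode/DataNode directories in hdfs-site.xml (in the same folder). A minimal example is shown below; the paths under /usr/local/hadoop/data are just a suggested layout, so adjust them if you prefer a different location:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/data/datanode</value>
</property>
</configuration>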
We need to create the directories where Hadoop will store HDFS metadata and data.
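Assuming the paths from the hdfs-site.xml example above:
mkdir -p /usr/local/hadoop/data/namenode
mkdir -p /usr/local/hadoop/data/datanode
(Prefix the commands with sudo, and chown the data directory to your user, if /usr/local/hadoop is not writable by your account.)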
1. Format the NameNode by running:
hdfs namenode -format
2. If successful, you will see output indicating that the NameNode has been formatted.
Step 7: Start Hadoop Services
Once Hadoop is configured, you can start its services.
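Since the Hadoop sbin directory is on your PATH from the .bashrc step above, a single command starts the HDFS daemons:
start-dfs.sh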
1. You can test HDFS by creating a directory and listing its contents.
2. Create a directory in HDFS:
hdfs dfs -mkdir /user
3. List the contents of the root directory in HDFS:
hdfs dfs -ls /
4. You should see the /user directory listed.
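As a slightly fuller check, you can copy a local file into the new directory and read it back; /etc/hosts here is just an arbitrary small file used as an example:
hdfs dfs -put /etc/hosts /user/
hdfs dfs -cat /user/hosts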