HDFS Installation Steps

Installation steps for HDFS on Windows, macOS, and Linux

You may not be familiar with the terminology yet; that's fine, we will discuss it in
class. This guide is just to save installation time during the class.

Windows
Ideally, HDFS is not meant to be installed on Windows, but you can follow these
steps to install it. You might run into issues during installation, and in a few cases the
installation might not work at all; don't worry, we will try to resolve problems on a
best-effort basis.

If Java is Already Installed


Verify Java Installation
Open Command Prompt and run the following command to check if Java is
installed:
java -version

1. If you see something like java version "1.8.0_221", Java is already installed.
You may have a different Java version; make sure it is JDK 8 or higher, as
Hadoop works well with JDK 8. If Java is installed, you can skip Step 1 and
proceed directly to Step 2 to download and configure Hadoop.

Java is Not Installed


If you don’t have Java installed, don’t worry! Follow these detailed steps to install Java
and set up Hadoop's HDFS.

Step 1: Install Java

Hadoop requires Java to run, so we need to install Java first. Here's how to do it:

1.1 Download Java Development Kit (JDK)

1. Open your web browser and go to the Oracle Java Downloads page or
AdoptOpenJDK.
2. Look for Java SE 8 (also called JDK 8). Select the version for Windows.
3. Download the JDK installer file for your system (likely Windows x64 Installer
if you're using a 64-bit version of Windows).

1.2 Install Java


1. Double-click the downloaded installer file (e.g., jdk-8u221-windows-x64.exe).
2. Follow the installation wizard:
○ Click Next and use the default installation path (usually C:\Program
Files\Java\jdk1.8.x_x).
○ Wait for the installation to complete.
3. Once installed, Java is ready for use. But we need to set up an environment
variable so that your system knows where Java is installed.

1.3 Set Up JAVA_HOME Environment Variable

The JAVA_HOME environment variable tells the system where Java is installed.

1. Right-click on This PC (or My Computer) and select Properties.


2. On the left side of the screen, click Advanced system settings.
3. In the System Properties window, click the Environment Variables button at
the bottom.
4. In the Environment Variables window, under System variables, click New to
create a new variable:
○ Variable name: JAVA_HOME
○ Variable value: The path to your Java installation folder. This should
look something like C:\Program Files\Java\jdk1.8.0_xxx (where
xxx is the specific version number of the JDK you installed).

Example:
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_221

5. Now, select the Path variable under System Variables and click Edit.
○ Click New and add %JAVA_HOME%\bin to the list of paths.
○ Click OK to save the changes.
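To confirm the variable is picked up, open a new Command Prompt (existing windows
won't see the change) and run:

echo %JAVA_HOME%

It should print the JDK path you configured, for example C:\Program Files\Java\jdk1.8.0_221.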

1.4 Verify Java Installation

1. Open the Command Prompt (type cmd in the Windows search bar and press
Enter).
2. In the Command Prompt, type the following command and press Enter:
java -version

If Java is installed correctly, you should see output like this:

java version "1.8.0_221"


Java(TM) SE Runtime Environment (build 1.8.0_221-b11)

3. Now that Java is installed, we can proceed to install Hadoop.

Step 2: Download and Install Hadoop


2.1 Download Hadoop

1. Open your web browser and go to the official Apache Hadoop Releases page:
Hadoop Release Page.
2. Scroll down and download the latest stable release of Hadoop, such as
Hadoop 3.x.x. Click on the binary release link (e.g., hadoop-3.3.4.tar.gz).
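Optionally, you can verify the download against the SHA-512 checksum published on
the same page (a sketch, assuming the archive is in your Downloads folder):

certutil -hashfile %USERPROFILE%\Downloads\hadoop-3.3.4.tar.gz SHA512

Compare the printed hash with the contents of the matching .sha512 file from the
Apache download page.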

2.2 Extract the Hadoop Archive

1. Once the Hadoop archive (.tar.gz file) is downloaded, you need to extract it.
You can use a tool like 7-Zip to extract the archive:
○ Right-click on the file and select Extract to hadoop-3.3.x/ (with 7-Zip you
may need to extract twice: first the .gz, then the inner .tar).
2. After extraction, move the extracted folder to a directory where you want to
install Hadoop (for example, C:\hadoop).

2.3 Set Up Hadoop Environment Variables

1. Open the Environment Variables window again by following the steps in
Step 1.3.
2. Under System variables, click New to create the following variable:
○ Variable name: HADOOP_HOME
○ Variable value: The path to your Hadoop folder (e.g., C:\hadoop).
3. Edit the Path variable again:
○ Click New and add the following to the path: %HADOOP_HOME%\bin.
4. Click OK to save the changes.
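As with JAVA_HOME, you can confirm the setup from a new Command Prompt:

echo %HADOOP_HOME%
hadoop version

The second command should print the Hadoop version; if it complains about
JAVA_HOME, re-check Step 1.3 (on some setups you may also need to set JAVA_HOME
inside %HADOOP_HOME%\etc\hadoop\hadoop-env.cmd).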

Step 3: Install WinUtils


Hadoop requires winutils.exe to work correctly on Windows systems. Here’s how to
set it up:
3.1 Download WinUtils

1. Open your browser and go to this GitHub repository.


2. Download the winutils.exe file corresponding to your Hadoop version (for
example, for Hadoop 3.x.x).

3.2 Place WinUtils in the Hadoop Directory

1. If it does not already exist, create a folder called bin inside your Hadoop
installation directory (C:\hadoop).
2. Place the winutils.exe file inside this bin folder (C:\hadoop\bin).
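To check that the binary is in place, you can run it with no arguments; a compatible
winutils.exe should print a usage message rather than a missing-DLL error:

%HADOOP_HOME%\bin\winutils.exe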

Now that we have Java, Hadoop, and WinUtils in place, let’s configure Hadoop.

Step 4: Configure Hadoop for HDFS


To get Hadoop’s HDFS working, we need to modify a couple of configuration files.

4.1 Edit core-site.xml

1. Navigate to C:\hadoop\etc\hadoop.
2. Open the file core-site.xml in a text editor (such as Notepad).

Replace the empty <configuration> block with the following (fs.defaultFS tells
clients where the NameNode listens):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

3. Save and close the file.

4.2 Edit hdfs-site.xml


1. In the same directory (C:\hadoop\etc\hadoop), open hdfs-site.xml.

Replace the empty <configuration> block with the following (replication is set to 1
because this is a single-node setup):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///C:/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///C:/hadoop/data/datanode</value>
  </property>
</configuration>

2. Save and close the file.

4.3 Create Directories for NameNode and DataNode

1. In File Explorer, go to C:\hadoop.


2. Create the following directories:
○ C:\hadoop\data\namenode
○ C:\hadoop\data\datanode

These folders will store the metadata and data blocks of HDFS.
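If you prefer the Command Prompt, the same directories can be created with:

mkdir C:\hadoop\data\namenode
mkdir C:\hadoop\data\datanode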
Step 5: Format the NameNode
Before we start HDFS, we need to format the NameNode, which is the main node of
the HDFS cluster.

5.1 Format NameNode

1. Open the Command Prompt as an Administrator (right-click the Command
Prompt icon and select Run as Administrator).
2. Run the following command to format the NameNode:

hdfs namenode -format

3. If the formatting is successful, you will see messages indicating that the
NameNode has been formatted.

Step 6: Start Hadoop Services


Now, it’s time to start the Hadoop services.

6.1 Start NameNode and DataNode

1. Open two separate Command Prompt windows as Administrator.
2. In the first window, navigate to the Hadoop sbin directory:

cd %HADOOP_HOME%\sbin

3. Run the following command to start the Hadoop distributed file system
(HDFS):
start-dfs.cmd
4. In the second Command Prompt window, run this command to check the
status of the HDFS directories:
hdfs dfs -ls /

Step 7: Verify the Installation


You can verify that HDFS is running correctly by:
1. Checking the NameNode UI:
○ Open your browser and go to http://localhost:9870 (the default
NameNode UI port for Hadoop 3.x). This should open the NameNode's
web interface.
2. Running HDFS Commands:
○ In the Command Prompt, try creating a directory in HDFS:
hdfs dfs -mkdir /user
○ List the directories in the root directory of HDFS:
hdfs dfs -ls /
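As an optional extra check, you can copy a local file into the new directory and list it
(this assumes some file test.txt exists in your current folder; any file will do):

hdfs dfs -put test.txt /user
hdfs dfs -ls /user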

macOS

Step 1: Check if Java is Already Installed


Before proceeding with the installation of Java, let’s first check if Java is already
installed on your system.

1.1 Check Java Version

1. Open Terminal (you can find it via Spotlight or in Applications > Utilities).
2. Type the following command and press Enter:
java -version

If Java is installed, you’ll see something like:

java version "1.8.0_221"


Java(TM) SE Runtime Environment (build 1.8.0_221-b11)

If Java is already installed, proceed to Step 4.

Step 2: Install Java JDK 8

If you don’t have JDK 8 or higher version installed, follow these steps to install JDK 8.

2.1 Download JDK 8


1. Open your web browser and go to the Oracle JDK 8 Downloads page.
Note: You may need to create an Oracle account (free) to download older
versions of Java.
2. Look for the Java SE Development Kit 8u (e.g., 8u241, 8u301, etc.).
○ Choose the version for macOS (e.g., jdk-8u301-macosx-x64.dmg).
○ Download the installer.

2.2 Install JDK 8

1. Double-click the downloaded .dmg file and follow the on-screen instructions
to install JDK 8.
2. Once the installation is complete, you need to set up the JAVA_HOME
environment variable.

Step 3: Set the JAVA_HOME Environment Variable

After installing Java JDK 8, the JAVA_HOME variable needs to be set so Hadoop can
find it.

1. Open the Terminal and edit your shell configuration file, depending on your
shell:
2. For Zsh (default shell in macOS Catalina and later):
nano ~/.zshrc
3. For Bash (if you're using an older macOS version):
nano ~/.bash_profile
4. Add the following line to the bottom of the file:
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
5. Save the file by pressing CTRL + O, then CTRL + X to exit the editor.
6. Apply the changes by running:
source ~/.zshrc # or `source ~/.bash_profile` for Bash users
7. Verify that JAVA_HOME is set correctly by typing:
echo $JAVA_HOME
8. This should return a path similar to
/Library/Java/JavaVirtualMachines/jdk1.8.0_301.jdk/Contents/Home.
9. Finally, check that Java 8 is the active version:
java -version
10. You should now see 1.8.x as the version, confirming that JDK 8 is properly set
up.
Step 4: Install Hadoop

Now that Java JDK 8 is installed and configured, you can install Hadoop.

4.1 Install Hadoop via Homebrew

1. Using Homebrew is the easiest way to install Hadoop. If you don’t have
Homebrew, install it by running this command in Terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

2. Once Homebrew is installed, run the following command to install Hadoop:

brew install hadoop

4.2 Verify Hadoop Installation

To check if Hadoop was installed successfully, run:

hadoop version

You should see the Hadoop version number and other details.

Step 5: Configure Hadoop for HDFS

Now we need to configure Hadoop for HDFS.

5.1 Edit core-site.xml


1. Navigate to the Hadoop configuration directory:
cd /usr/local/Cellar/hadoop/<version>/libexec/etc/hadoop
Replace <version> with your installed Hadoop version.
2. Open the core-site.xml file:
nano core-site.xml

Add the following configuration inside the <configuration> tags:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

3. Save and close the file.

5.2 Edit hdfs-site.xml


1. Open the hdfs-site.xml file:
nano hdfs-site.xml
2. Add the following configuration inside the <configuration> tags:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/Cellar/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/Cellar/hadoop/hdfs/datanode</value>
  </property>
</configuration>

3. Save and close the file.

5.3 Create Directories for NameNode and DataNode

In the Terminal, create the necessary directories for the NameNode and DataNode:

mkdir -p /usr/local/Cellar/hadoop/hdfs/namenode
mkdir -p /usr/local/Cellar/hadoop/hdfs/datanode
Optional Step, if facing issues with SSH

Setup SSH
The first step is to generate an SSH key pair without a passphrase (if you don't have
one yet, you can run, for example, ssh-keygen -t rsa -P "" and accept the defaults)
and append the public key to authorized_keys:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

However, if you already have a key pair that is protected by a passphrase, you
probably want to avoid overwriting the existing key. In that case, generate a
separate key pair for accessing localhost:

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/<username>/.ssh/id_rsa):
/Users/<username>/.ssh/id_rsa_local
...
$ cat ~/.ssh/id_rsa_local.pub >> ~/.ssh/authorized_keys

SSH then needs to be told to use the key id_rsa_local when accessing localhost (or
0.0.0.0); this is typically done in ~/.ssh/config, as sketched below. Passwordless SSH
to localhost is necessary to start a one-node Hadoop cluster.
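A minimal ~/.ssh/config entry for this (a sketch, assuming the key path generated
above):

Host localhost 0.0.0.0
  IdentityFile ~/.ssh/id_rsa_local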
With the above, now try:

$ ssh localhost
It should succeed without prompting for a password.
Note: make sure Remote Login is enabled on your system (System
Preferences -> Sharing).

Step 6: Format the NameNode

Before using HDFS, we need to format the NameNode.

Run the following command to format the NameNode:


hdfs namenode -format
You should see output indicating that the NameNode has been formatted
successfully.

Step 7: Start Hadoop Services

7.1 Start HDFS

Start the Hadoop Distributed File System (HDFS) by running:

start-dfs.sh

(With a Homebrew install, if the script is not on your PATH, you can look for it under
/usr/local/Cellar/hadoop/<version>/libexec/sbin/start-dfs.sh.)

7.2 Verify the Hadoop Setup

1. Open a web browser and go to http://localhost:9870 to view the Hadoop
NameNode web UI. This interface will show you the status of the HDFS cluster.
2. In Terminal, run the following commands to verify HDFS:

Create a directory in HDFS:


hdfs dfs -mkdir /user

List the root directory in HDFS:


hdfs dfs -ls /

Linux

Step 1: Check if Java is Already Installed


Before installing Hadoop, we need to verify if Java JDK 8 is already installed on your
Linux system.

1.1 Check Java Version

1. Open the Terminal (you can use the shortcut Ctrl + Alt + T).
2. Run the following command to check if Java is installed:
java -version
3. If Java is installed, you will see something like this:
java version "1.8.0_231"

Java(TM) SE Runtime Environment (build 1.8.0_231-b11)

○ The important part is the version number. If it starts with 1.8, you
already have JDK 8, and you can skip the Java installation step.
○ If the version is higher than 1.8 (e.g., 11, 15), or if Java is not installed,
you need to install JDK 8.

Step 2: Install Java JDK 8 (If Not Installed)


If you don’t have JDK 8, follow these steps to install it.

2.1 Install JDK 8 on Ubuntu/Debian-based Systems

1. Open the Terminal.
2. Update the package list:
sudo apt update
3. Install OpenJDK 8:
sudo apt install openjdk-8-jdk -y
4. Verify the installation by checking the Java version again:
java -version
5. It should now display a version starting with 1.8.
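If another Java version is still reported as the default after installing JDK 8, on
Debian/Ubuntu you can usually switch the active version with:

sudo update-alternatives --config java

and select the entry pointing at java-8-openjdk.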

2.2 Install JDK 8 on CentOS/RHEL-based Systems

1. Open the Terminal.


2. Install the OpenJDK 8 package:
sudo yum install java-1.8.0-openjdk-devel -y
3. Verify the installation:
java -version

Step 3: Set JAVA_HOME Environment Variable


Now that Java is installed, you need to set the JAVA_HOME environment variable,
which Hadoop needs to locate Java.

1. Open the Terminal and edit your shell configuration file:
nano ~/.bashrc
2. Add the following line to the end of the file:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
3. Save and close the file by pressing CTRL + O, then CTRL + X.
4. Apply the changes by running:
source ~/.bashrc
5. To check if JAVA_HOME is set correctly, run:
echo $JAVA_HOME
6. It should return a path similar to /usr/lib/jvm/java-8-openjdk-amd64.

Step 4: Download and Install Hadoop


Now that Java is installed and configured, we can move on to installing Hadoop.

4.1 Download Hadoop


Open a web browser or use wget to download Hadoop from the official Apache
website:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz

1. This downloads the Hadoop 3.3.4 version. You can check for the latest version
at Apache Hadoop Releases.
2. Extract the downloaded archive:
tar -xzvf hadoop-3.3.4.tar.gz
3. Move the extracted Hadoop folder to /usr/local:
sudo mv hadoop-3.3.4 /usr/local/hadoop

4.2 Set Hadoop Environment Variables

We need to set Hadoop environment variables so that the system can access
Hadoop commands globally.
1. Open your .bashrc file:
nano ~/.bashrc
2. Add the following lines to set the Hadoop environment variables:
# Set HADOOP_HOME
export HADOOP_HOME=/usr/local/hadoop

# Add Hadoop bin and sbin to PATH


export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

3. Save and exit the file (CTRL + O, then CTRL + X).


4. Apply the changes:
source ~/.bashrc

Step 5: Configure Hadoop for HDFS


Now we’ll configure Hadoop to work as a single-node HDFS system.

5.1 Configure core-site.xml


1. Navigate to the Hadoop configuration directory:
cd $HADOOP_HOME/etc/hadoop
2. Open core-site.xml for editing:
nano core-site.xml
3. Inside the <configuration> tags, add the following properties:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

4. Save and exit the file.

5.2 Configure hdfs-site.xml


1. Open hdfs-site.xml for editing:
nano hdfs-site.xml
2. Inside the <configuration> tags, add the following properties:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hdfs/datanode</value>
</property>
</configuration>

3. Save and exit the file.

5.3 Add Java Home in hadoop-env


1. vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
2. export JAVA_HOME=/path/to/your/java
3. Verify it by running $HADOOP_HOME/bin/hadoop version.

5.4 Create Directories for NameNode and DataNode

We need to create the directories where Hadoop will store HDFS metadata and data.

Create the necessary directories:


sudo mkdir -p /usr/local/hadoop/hdfs/namenode
sudo mkdir -p /usr/local/hadoop/hdfs/datanode
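Because these directories were created with sudo, they are owned by root, and the
HDFS daemons running as your regular user may not be able to write to them. A
common fix (assuming you run Hadoop as your login user) is:

sudo chown -R $USER /usr/local/hadoop/hdfs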

Step 6: Format the NameNode


Before starting Hadoop, we need to format the NameNode.

1. Run the following command to format the NameNode:
hdfs namenode -format
2. If successful, you will see output indicating that the NameNode has been
formatted.
Step 7: Start Hadoop Services
Once Hadoop is configured, you can start its services.

7.1 Start HDFS Daemons


1. Run the following command to start the NameNode and DataNode:
start-dfs.sh
2. Check that the NameNode and DataNode are running by using the following
command:
jps
3. You should see the NameNode and DataNode services listed.
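The exact process IDs will differ, but the jps output should look roughly like this:

12001 NameNode
12145 DataNode
12398 SecondaryNameNode
12460 Jps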

7.2 Access the HDFS Web UI

1. Hadoop provides a web interface to monitor HDFS.


2. Open a browser and go to:
http://localhost:9870
3. This will display the Hadoop NameNode web UI.

7.3 Test HDFS

1. You can test HDFS by creating a directory and listing its contents.
2. Create a directory in HDFS:
hdfs dfs -mkdir /user
3. List the contents of the root directory in HDFS:
hdfs dfs -ls /
4. You should see the /user directory listed.
