
Experiment no. 1
Installation and Configuration of Hadoop
Preparing To Install Hadoop:
The first thing you'll need to do in order to install Hadoop is to prepare a virgin Windows Server. While I suppose it's not strictly necessary, from my perspective the fewer things that are installed on the server, the better. In my case, I'm using Windows Server 2008 R2 with SP1 installed and nothing else. No roles are enabled and no extra software has been installed.
Once you have a server ready (multiple servers if you want to install an actual Hadoop cluster), you'll want to install Cygwin. If you aren't familiar with it, Cygwin is basically a Linux bash shell for Windows. Cygwin is open source and is available as a free download from http://cygwin.com/install.html. Keep in mind that this is just a web installer. To support Hadoop, we will need to ensure that we install the openssh package and its associated prerequisites. In order to do this, start the setup.exe program, select c:\cygwin as the root folder, and then click Next. When you get to the package selection screen, search for openssh and then click the "Skip" text (not exactly intuitive, but it works) to enable the checkbox for install as shown below:
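If you would rather script this step, the Cygwin setup program also accepts command-line switches for an unattended install; a minimal sketch, assuming the installer is the setup.exe downloaded above and that a download mirror has already been chosen (you can add -s followed by a mirror URL if one has not):
setup.exe -q -R c:\cygwin -P openssh
The -q switch runs setup unattended, -R sets the root folder, and -P selects the openssh package (its prerequisites are pulled in automatically).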

Once you have selected the openssh package, click Next and then answer Yes when asked if you want to install the package prerequisites. Click Next and then finish the wizard to complete the install. This will take some time, so be patient.
Once the install is complete, you'll want to start the Cygwin terminal as administrator (right-click on the icon and select "Run as administrator"). This will then set up your shell environment as follows:

Once Cygwin is installed and running properly, you'll need to configure the ssh components in order for the Hadoop scripts to execute properly.
Configuring OpenSSH
The first step in configuring the ssh server is to run the configuration wizard. Do this by executing ssh-host-config from the Cygwin terminal window, which will start the wizard. Answer the questions as follows:

("yes" to using a different name, "sshd" for the name, "yes" to creating a new privileged user, and a password you can remember)
Once the configuration is complete, open the Services control panel (Start / Administrative Tools / Services), right-click on the Cygwin sshd service, and select Start. It should start.
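If you prefer to stay in the terminal, the service can also be started from the Cygwin prompt. This assumes the service was registered under the default name sshd, which is what ssh-host-config uses:
net start sshd
Alternatively, cygrunsrv -S sshd does the same thing using Cygwin's own service manager.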

If the service doesn't start, the most likely cause is that the ssh user was not created properly. You can manually create the user, add it to the service startup, and it should work just fine.
Once the service is started, you can test it by entering the following command in the Cygwin terminal:
ssh localhost

Answer yes when prompted about the fingerprint, and you should be ready to go.
The next step in configuring ssh for use with Hadoop is to configure the authentication mechanisms (otherwise you'll be typing your password a lot when running Hadoop commands).

Since the Hadoop processes are all invoked via shell scripts and make use of ssh for all operations on the machine (including local operations), you'll want to set up key-based authentication so that ssh doesn't require a password every time it's invoked. In order to do this, execute the following command in the Cygwin terminal:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Once you have the key generated and saved, we'll need to append it to the authorized keys file with the following command:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

This will take the key you just generated and save it to the list of authorized keys for ssh.
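To confirm that the key is actually being used, run ssh localhost again; it should log you in without prompting for a password. If it still prompts, overly permissive file permissions are the usual culprit, so tighten them and retry (a quick check, not a required step in every install):
chmod 600 ~/.ssh/authorized_keys
ssh localhost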
Now that ssh is properly configured, you can move on to installing and configuring Hadoop.
Downloading and installing Hadoop
Apache Hadoop is available for download from one of many mirror sites linked from the following page: http://www.apache.org/dyn/closer.cgi/hadoop/common/
Basically, just choose one of the sites and it should bring you to a page that looks similar to the following figure. Note that I am going to use version 1.0.0, which as of this writing is the latest, but the release train for Hadoop sometimes moves pretty fast, so there will likely be more releases available very soon.

Click on the version you want to install (again, I am going to be using 1.0.0 for this post) and then
download the appropriate file. In my case, that will be hadoop-1.0.0-bin.tar.gz as shown in the following
figure:

Once you have the file downloaded, you'll want to use a program such as WinRAR, which can understand the .tar.gz file format.
The actual install process is very simple. You simply open the downloaded file in WinRAR (or another program that can understand the format) and extract all files to c:\cygwin\usr\local as shown below:

Once you are done, you'll want to rename the c:\cygwin\usr\local\hadoop-1.0.0 folder to hadoop. (This just makes things easier, as you'll see once we start configuring and testing Hadoop.)
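The rename can be done directly from the Cygwin terminal, since c:\cygwin\usr\local maps to /usr/local inside the shell; for example:
mv /usr/local/hadoop-1.0.0 /usr/local/hadoop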
You will also need the latest version of the Java SDK installed, which can be downloaded from http://www.java.com (make sure you download and install the SDK and not just the runtime; you'll need the server JVM). To make things easier, you can change the target folder for the Java install to c:\java (although this is not a requirement, I find it easier to use than the default path, as you'll have to escape all of the spaces and parens when adding this folder to the configuration files).
Configuring Hadoop
For the purposes of this post, I'm just going to configure a single-node Hadoop cluster. This may seem counter-intuitive, but the point here is that we get a single node up and running properly and then we can add additional nodes later.
One of the key configuration needs for Hadoop is the location where it can find the Java runtime. Note that the configuration files we're going to work with are Unix/Linux files and thus don't look very good (or work very well) when you use a standard Windows text editor like Notepad. To keep things simple, use a text editor that supports Unix formats, such as MetaPad. Assuming that you extracted the Hadoop files as described above and renamed the root folder to hadoop, open the C:\cygwin\usr\local\hadoop\etc\hadoop\hadoop-env.sh file. Locate the line that contains JAVA_HOME, remove the # in front of it, and replace the folder with the location where you installed the Java SDK (in my example, C:\java\jre). Note that this is a Unix file, so special characters must be escaped with a \, so in my case the path is c:\\java\\jre.
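After the edit, the JAVA_HOME line in hadoop-env.sh should look something like the following (using the c:\java install location from above):
export JAVA_HOME=c:\\java\\jre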

This is really all that is required to change in this file, but if you want to know more about its contents you can check out the Hadoop documentation here: http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html (note that this guide is based on the 0.20.2 release and *not* the 1.0.0 release I am detailing here; the docs haven't quite caught up with the release at the time of this posting).
Once you save and close that file, you can verify that Hadoop is properly running by executing the
following command inside of Cygwin. (Make sure you change to the /usr/local/hadoop directory first)
bin/hadoop version

You should see a Cygwin warning about MS-DOS file paths, and then the Hadoop version and the Subversion revision it was built from, as shown in the figure above. If you do not see similar output, you likely do not have the path to your Java home set correctly. If you installed Java in the default path, remember that all spaces, slashes, and parens must be escaped first. The default path would look something like C:\\Program\ Files\ \(x86\)\\Javaxxxx (you get the idea, and probably understand now why I said it would be better to put it in a simple folder).
Once you have the basic Hadoop configuration working, the next step is to configure the site
environment settings. This is done via the C:\cygwin\usr\local\hadoop\etc\hadoop\hdfs-site.xml file.
Again open this file with MetaPad or a similar editor, and add the following configuration items to the
file. Of course replace hadoop2 with your host name as appropriate:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop2:47110</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hadoop2:47111</value>
</property>

<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
This basically configures the local host (in my case it's named hadoop2) filesystem and job tracker, and sets the dfs replication factor to 2. You can read about this file and its values here: http://wiki.apache.org/hadoop/HowToConfigure (again, remember that the docs are outdated for my particular installation, but they still work).
We will also need to configure the mapred-site.xml file to specify the configuration for the mapreduce
service:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop2:8021</value>
</property>
</configuration>
This basically configures the mapreduce job tracker to use port 8021 on the local host. There is a LOT more to both of these configuration files; I'm only presenting the basics to get things up and running here.
Now that we have the basic environment set up, we need to format the HDFS filesystem that Hadoop will use. In the configuration file above, I did not specify a location for the DFS files. This means that they will be stored in /tmp. This is OK for our testing and for a small cluster, but for production systems you'll want to make sure that you specify the location by using the dfs.name.dir and dfs.data.dir configuration items (which you can read about in the link I provided above), as sketched below.
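For reference, pinning those locations down would mean adding properties along these lines to the hdfs-site.xml file shown earlier; the paths here are just placeholders, not values used in this walkthrough:
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/hadoop/dfs/data</value>
</property>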
To format the filesystem, enter the following command in Cygwin:
bin/hadoop namenode -format

If you have properly configured the hdfs-site.xml file, you should see output that is similar to the above. Note that in my case the default folder location is /tmp/hadoop-tmalone/dfs/name. Remember this directory name, as you will need it later.
Now that we have the configuration in place and the filesystem formatted, we can start the Hadoop subsystems. The first thing we'll want to do is start the DFS subsystem. Do this with the following command in the Cygwin terminal:
sbin/start-dfs.sh

This should take a few moments, and you should see output as shown in the figure above. Note that the logs are stored in /usr/local/hadoop/logs. You can verify that DFS is running by examining the namenode log:
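A quick way to look at the log from the terminal is shown below; the file name follows Hadoop's hadoop-<user>-namenode-<host>.log convention, so the wildcards are just a convenience:
tail -n 50 /usr/local/hadoop/logs/hadoop-*-namenode-*.log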

You can also verify that the DFS service is running by checking the monitor (assuming you used the ports as I described them in the hdfs-site.xml configuration above). To check the monitor, open a web browser and navigate to the following site:
http://localhost:50070

If the DFS service is properly running, you will see a status screen like the one above. If you get an error when attempting to open that page, it is likely that DFS is not running and you will need to check the log file to determine what has gone wrong. The most common problem is a misconfiguration of the hdfs-site.xml file, so double-check that the file is correct.
Once DFS is up and running, you can start the mapreduce process as follows:
sbin/start-mapred.sh

You can test that the Mapreduce process is running by checking the monitor. Open a web browser and
navigate to:
http://localhost:50030

Now we have success! We've installed and configured Hadoop, formatted the DFS filesystem, and started the basic processes necessary to use the power of Hadoop for processing!
Testing the installation
Each Hadoop distribution comes with a set of samples that can be used to verify that the system is functional. The samples are stored in Java jar files and are located in the hadoop/share/hadoop directory. One simplistic test would be to copy some text files into DFS and then use a sample mapreduce job to enumerate them. First, though, you will likely want to set up an alias to make entering the commands a little easier. In my case, I will alias the hadoop dfs command to simply hdfs. To accomplish this, type the following command in the Cygwin terminal window:
alias hdfs='bin/hadoop dfs'
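With the alias in place, a command such as the following expands to bin/hadoop dfs -ls / and lists the root of the distributed filesystem:
hdfs -ls /
Note that because the alias uses a relative path, it only works while you are in the /usr/local/hadoop directory (and only in the current terminal session).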

For the first part of our test, we will copy the configuration files from the hadoop directory into DFS. In order to do this, we will use the dfs -put command (for more information on the put command, see the docs here: http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html; again, remember that the docs are a little behind).
In the Cygwin terminal window (still in the /usr/local/hadoop directory) execute the following
command:
hdfs -put etc/hadoop conf

This will load all of the files in the /usr/local/hadoop/etc/hadoop directory into HDFS in the conf folder. Since the conf folder doesn't exist, the put command will create it. Note that you will receive a warning about the platform, but that is OK; the files will still copy.
You can verify that the files were copied by using the dfs -ls command as follows:
hdfs -ls conf
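As a further check, you can pull the contents of one of the copied files straight back out of HDFS; for example (hadoop-env.sh is one of the files that was just copied):
hdfs -cat conf/hadoop-env.sh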

Now that we are sure the files are stored in HDFS, we can use one of the examples that is shipped with Hadoop to analyze the text in the files for a certain pattern (see the sketch at the end of this section). The samples are located in the /usr/local/hadoop/share/hadoop folder, and it's easiest to change to that folder and execute the sample there. In the Cygwin terminal, execute the following command:
cd /usr/local/hadoop/share/hadoop
Once we're in the folder, we can run a simple IO test to determine how well our cluster's DFS IO will perform. In my case, since I'm running this on a VM with a slow disk, I don't expect much out of the cluster, but it's a very nice way to verify that DFS is indeed functioning as it should. Execute the following command to test DFS IO:
../../bin/hadoop jar hadoop-test-1.0.0.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

If you don't see any exceptions in the output, you have successfully configured Hadoop and the DFS cluster is operational.
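If you also want to try the pattern-matching sample mentioned at the start of this section, the examples jar that ships alongside the test jar contains the classic grep job. A sketch, assuming the jar is named hadoop-examples-1.0.0.jar and using grep-output as a placeholder name for the output folder:
../../bin/hadoop jar hadoop-examples-1.0.0.jar grep conf grep-output 'dfs[a-z.]+'
../../bin/hadoop dfs -cat grep-output/*
The job scans the conf files copied earlier for strings matching the regular expression and writes its results into the grep-output folder in HDFS; the -cat command then prints the matched strings and their counts.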
