Deploy Hadoop on Cluster
Install Hadoop in distributed mode
This document explains how to set up Hadoop on a real cluster. One node acts as the master and the remaining two act as slaves. A multi-node cluster is what gives Hadoop its real power and is what is used in production. In this document we use three machines to deploy the Hadoop cluster.
Contents
1. Recommended Platform
2. Prerequisites
3. Install Java 7 (Oracle Java recommended)
    3.1 Update the source list
    3.2 Install Java
4. Add entries for the master and slaves in the hosts file
5. Configure SSH
    5.1 Install the OpenSSH server and client
    5.2 Generate key pairs
    5.3 Configure password-less SSH
    5.4 Verify SSH access to the slaves
6. Download Hadoop
    6.1 Download Hadoop
7. Install Hadoop
    7.1 Untar the tarball
    7.2 Go to the Hadoop home directory
8. Setup Configuration
    8.1 Edit conf/hadoop-env.sh and set JAVA_HOME
    8.2 Edit conf/core-site.xml and add the following entries
    8.3 Edit conf/hdfs-site.xml and add the following entries
    8.4 Edit conf/mapred-site.xml and add the following entries
    8.5 Edit conf/masters and add the entry for the secondary master
    8.6 Edit conf/slaves and add the entries for the slaves
    8.7 Set environment variables
9. Setup Hadoop on the slaves
    9.1 Repeat steps 3 and 4 on all the slaves
    9.2 Create a tarball of the configured Hadoop setup and copy it to all the slaves
    9.3 Untar the configured Hadoop setup on all the slaves
10. Start the Cluster
    10.1 Format the NameNode
    10.2 Now start the Hadoop services
        10.2.1 Start HDFS services
        10.2.2 Start MapReduce services
    10.3 Check daemon status by running the jps command
        10.3.1 On the master
        10.3.2 On slave-01
        10.3.3 On slave-02
11. Stop the Cluster
    11.1 Stop MapReduce services
    11.2 Stop HDFS services
1. Recommended Platform
• OS: Ubuntu 12.04 or later (other distributions such as CentOS or Red Hat also work)
• Hadoop: Cloudera's Distribution including Apache Hadoop, CDH3u6 (Apache Hadoop 0.20.x / 1.x can also be used)
2. Prerequisites
• Java (Oracle Java is recommended for production)
• Password-less SSH (Hadoop needs password-less SSH from the master to all the slaves; it is required for remote script invocation)
Run the following commands on the master node of the Hadoop cluster.
3. Install Java 7 (Oracle Java recommended)
3.1 Update the source list
sudo apt-get update
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
3.2 Install Java:
sudo apt-get install oracle-java7-installer
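To confirm the installation, check the version that Java reports; the exact version string will vary with the installer release:
java -version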
4. Add entries for the master and slaves in the hosts file
Edit the hosts file and add the following entries:
sudo nano /etc/hosts

MASTER-IP    master
SLAVE01-IP   slave-01
SLAVE02-IP   slave-02
(Replace MASTER-IP, SLAVE01-IP, and SLAVE02-IP with the corresponding IP addresses.)
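For example, with hypothetical private addresses the entries would look like this (your addresses will differ):
192.168.1.10    master
192.168.1.11    slave-01
192.168.1.12    slave-02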
5. Configure SSH
5.1 Install the OpenSSH server and client
sudo apt-get install openssh-server openssh-client
5.2 Generate key pairs
ssh-keygen -t rsa -P ""
5.3 Configure password-less SSH
Copy the contents of “$HOME/.ssh/id_rsa.pub” from the master into “$HOME/.ssh/authorized_keys” on all the slaves.
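One convenient way to do this from the master is ssh-copy-id, which appends the public key to authorized_keys on the target host. This is a sketch that assumes the slaves still accept password logins at this point and that the same user account exists on every node:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave-01
ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave-02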
5.4 Verify SSH access to the slaves
ssh slave-01
ssh slave-02
6. Download Hadoop
6.1 Download Hadoop
http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
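For example, fetch the tarball on the master with wget (any download tool works):
wget https://huggingface.co//datasets/sam-paech/antislop-artefacts-claude-opus-4-6-analysis/viewer/default/http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz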
7. Install Hadoop
7.1 Untar the tarball
tar xzf hadoop-0.20.2-cdh3u6.tar.gz
7.2 Go to the Hadoop home directory
cd hadoop-0.20.2-cdh3u6/
8. Setup Configuration
8.1 Edit conf/hadoop-env.sh and set JAVA_HOME
export JAVA_HOME=<path to the root of your Java installation>   (e.g. /usr/lib/jvm/jdk1.7.0_65)
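If you installed Java with the oracle-java7-installer package from section 3, the JDK typically ends up in /usr/lib/jvm/java-7-oracle (verify the path on your machine), so the line would look like:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle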
8.2 Edit conf/core-site.xml and add the following entries:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop_admin/hdata/hadoop-${user.name}</value>
  </property>
</configuration>
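The directory referenced by hadoop.tmp.dir must be writable by the user that runs the Hadoop daemons, so it is safest to create it up front on the master and, later, on the slaves (the same configuration is copied to them in section 9). Assuming the path used above:
mkdir -p /home/hadoop_admin/hdata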
8.3 Edit conf/hdfs-site.xml and add the following entries:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
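A replication factor of 2 matches the two DataNodes in this cluster. Once the cluster is running (section 10), an optional sanity check from the master shows whether blocks reach the expected replication:
bin/hadoop fsck / -files -blocks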
8.4 Edit conf/mapred-site.xml and add the following entries:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
8.5 Edit conf/masters and add the entry for the secondary master
slave-01
(the IP or alias of the node where the secondary NameNode will run)
8.6 Edit conf/slaves and add the entries for the slaves
slave-01
slave-02
8.7 Set environment variables
Update ~/.bashrc and set or update the HADOOP_HOME and PATH shell variables as follows:
nano ~/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u6
export PATH=$PATH:$HADOOP_HOME/bin
Hadoop is now set up on the master.
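Reload the shell configuration and verify that the hadoop command is on the PATH; the version check should report 0.20.2-cdh3u6:
source ~/.bashrc
hadoop version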
9. Setup Hadoop on the slaves
9.1 Repeat steps 3 and 4 on all the slaves
Step 3: “Install Java”
Step 4: “Add entries for the master and slaves in the hosts file”
9.2 Create a tarball of the configured Hadoop setup and copy it to all the slaves:
tar czf hadoop.tar.gz hadoop-0.20.2-cdh3u6
scp hadoop.tar.gz slave-01:~
scp hadoop.tar.gz slave-02:~
9.3 Untar the configured Hadoop setup on all the slaves
tar xzf hadoop.tar.gz
Run this command on all the slaves
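Because password-less SSH from the master is already in place, one way to untar on both slaves without logging in to each is a small loop run from the master; this is just a convenience sketch:
for host in slave-01 slave-02; do ssh $host "tar xzf hadoop.tar.gz"; done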
10. Start the Cluster
10.1 Format the NameNode:
bin/hadoop namenode -format
Do this only once, when Hadoop is first installed; formatting an existing NameNode again will delete all data in HDFS.
10.2 Now start the Hadoop services
10.2.1 Start HDFS services
bin/start-dfs.sh
Run this command on the master
10.2.2 Start MapReduce services
bin/start-mapred.sh
Run this command on the master
10.3 Check daemon status by running the jps command:
10.3.1 On the master
jps
NameNode
JobTracker
10.3.2 On slave-01:
jps
TaskTracker
DataNode
SecondaryNameNode
10.3.3 On slave-02:
jps
TaskTracker
DataNode
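To collect the same information from the master in one go, run jps locally and then on each slave over SSH; a small convenience sketch:
jps
for host in slave-01 slave-02; do echo "== $host =="; ssh $host jps; done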
11. Stop the Cluster
11.1 Stop MapReduce services
bin/stop-mapred.sh
Run this command on the master
11.2 Stop HDFS services
bin/stop-dfs.sh
Run this command on the master
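Hadoop 0.20 also ships combined helper scripts: bin/stop-all.sh stops the MapReduce and HDFS daemons together, and bin/start-all.sh is the matching start script; both are run on the master.
bin/stop-all.sh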