This document outlines the key tasks and responsibilities of a Hadoop administrator. It discusses five top Hadoop admin tasks: 1) cluster planning which involves sizing hardware requirements, 2) setting up a fully distributed Hadoop cluster, 3) adding or removing nodes from the cluster, 4) upgrading Hadoop versions, and 5) providing high availability to the cluster. It provides guidance on hardware sizing, installing and configuring Hadoop daemons, and demos of setting up a cluster, adding nodes, and enabling high availability using NameNode redundancy. The goal is to help administrators understand how to plan, deploy, and manage Hadoop clusters effectively.
The Hadoop Cluster Administration course at Edureka starts with the fundamental concepts of Apache Hadoop and Hadoop Cluster. It covers topics to deploy, manage, monitor, and secure a Hadoop Cluster. You will learn to configure backup options, diagnose and recover node failures in a Hadoop Cluster. The course will also cover HBase Administration. There will be many challenging, practical and focused hands-on exercises for the learners. Software professionals new to Hadoop can quickly learn the cluster administration through technical sessions and hands-on labs. By the end of this six week Hadoop Cluster Administration training, you will be prepared to understand and solve real world problems that you may come across while working on Hadoop Cluster.
A Day in the Life of a Hadoop Administrator - Edureka!
The document outlines the daily tasks of a Hadoop administrator, which include:
- Monitoring the cluster using tools like Cloudera Manager and Nagios in the morning
- Planning the day and reviewing past tasks in a meeting
- Running regular utility tasks like files mergers and backups
- Scheduling and configuring jobs, analyzing failed tasks, and troubleshooting issues
- Upgrading and updating the Hadoop cluster as needed
This document discusses the Hadoop cluster configuration at InMobi. It includes details about the cluster hardware specifications with 450 nodes and 5PB of storage. It also describes the software stack including Hadoop, Falcon, Oozie, Kafka and monitoring tools like Nagios and Graphite. The document then outlines some common issues faced like tasks hogging CPU resources and solutions implemented like cgroups resource limits. It provides examples of NameNode HA failover challenges and approaches to address slow running jobs.
This document provides an overview and configuration instructions for Hadoop, Flume, Hive, and HBase. It begins with an introduction to each tool, including what problems they aim to solve and high-level descriptions of how they work. It then provides step-by-step instructions for downloading, configuring, and running each tool on a single node or small cluster. Specific configuration files and properties are outlined for core Hadoop components as well as integrating Flume, Hive, and HBase.
Hadoop Administrator online training course by Knowledgebee Trainings, covering Hadoop cluster planning and deployment, monitoring, performance tuning, security using Kerberos, HDFS high availability using the Quorum Journal Manager (QJM), and Oozie and HCatalog/Hive administration.
Contact : [email protected]
With the advent of Hadoop comes the need for professionals skilled in Hadoop administration, making Hadoop admin skills valuable for better career, salary, and job opportunities.
The following blogs will help you understand the significance of Hadoop Administration training:
http://www.edureka.co/blog/why-should-you-go-for-hadoop-administration-course/
http://www.edureka.co/blog/how-to-become-a-hadoop-administrator/
http://www.edureka.co/blog/hadoop-admin-responsibilities/
Bharath Mundlapudi presented on Disk Fail Inplace in Hadoop. He discussed how a single disk failure currently causes an entire node to be blacklisted. With newer hardware trends of more disks per node, this wastes significant resources. His team developed a Disk Fail Inplace approach where Hadoop can tolerate disk failures until a threshold. This included separating critical and user files, handling failures at startup and runtime in DataNode and TaskTracker, and rigorous testing of the new approach.
Introduction to Cloudera's Administrator Training for Apache Hadoop - Cloudera, Inc.
The document provides an overview of Cloudera's Administrator Training course for Apache Hadoop. The training covers topics such as planning and deploying Hadoop clusters, installing and configuring Hadoop components like HDFS, Hive and Impala, using Cloudera Manager for administration, configuring advanced cluster options and HDFS high availability, and Hadoop security. The hands-on course includes exercises for deploying Hadoop clusters, importing data, and troubleshooting issues.
Learn to Set Up a Hadoop Multi Node Cluster - Edureka!
This document provides an overview of key topics covered in Edureka's Hadoop Administration course, including Hadoop components and configurations, modes of a Hadoop cluster, setting up a multi-node cluster, and terminal commands. The course teaches students how to deploy, configure, manage, monitor, and secure an Apache Hadoop cluster over 24 hours of live online classes with assignments and a project.
Setting High Availability in a Hadoop Cluster - Edureka!
This document discusses achieving high availability in Hadoop clusters. It begins by introducing Hadoop and its core components like HDFS, YARN, and MapReduce. It then explains the single point of failure issue with the NameNode in Hadoop 1.x. Hadoop 2.0 introduced solutions like having an active and standby NameNode that log all filesystem edits to shared storage. ZooKeeper is used for failover detection and coordination. The document also discusses securing HDFS through access control lists and using Hadoop as a data warehouse with tools like Hive, Impala, and BI tools. Hands-on sections walk through setting up high availability for HDFS and YARN.
Improving Hadoop Cluster Performance via Linux Configuration - Alex Moundalexis
Administering a Hadoop cluster isn't easy. Many Hadoop clusters suffer from Linux configuration problems that can negatively impact performance. With vast and sometimes confusing config/tuning options, it can be tempting (and scary) for a cluster administrator to make changes to Hadoop when cluster performance isn't as expected. Learn how to improve Hadoop cluster performance and eliminate common problem areas, applicable across use cases, using a handful of simple Linux configuration changes.
A Day in the Life of a Hadoop Administrator - Edureka!
This document outlines the daily tasks of a Hadoop administrator, which include monitoring the cluster, planning maintenance tasks, executing regular utility tasks like backups and file merging, upgrading systems, assisting developers, and troubleshooting issues. It also provides demonstrations on achieving high availability in Hadoop and YARN clusters, and discusses tools for monitoring cluster resources, user permissions, and common error messages. The document promotes an online Hadoop administration certification course from Edureka that teaches skills for planning, deploying, monitoring, tuning and securing Hadoop clusters.
The document discusses data ingestion and storage in Hadoop. It covers topics like ingesting data into Hadoop, using Hadoop as a data warehouse, Pig scripting, using Flume to ingest Twitter and web server logs, Hive as a query layer, HBase as a NoSQL database, and setting up high availability for HBase. It also discusses differences between Hadoop 1.0 and 2.0, how to set up a Hadoop 2.0 cluster including configuration files, and demonstrates upgrading Hadoop.
This document provides an overview and instructions for using Hadoop including:
- Hadoop uses HDFS for distributed storage and divides files into 64MB chunks across data servers.
- The master node tracks the namespace and metadata while slave nodes store data blocks.
- Commands like start-all.sh and stop-all.sh are used to start and stop Hadoop across nodes.
- The hadoop dfs command is used to interact with files in HDFS using options like -ls, -put, -get. Configuration files allow customizing Hadoop.
This document provides an agenda and overview for a presentation on Hadoop 2.x configuration and MapReduce performance tuning. The presentation covers hardware selection and capacity planning for Hadoop clusters, key configuration parameters for operating systems, HDFS, and YARN, and performance tuning techniques for MapReduce applications. It also demonstrates the Hadoop Vaidya performance diagnostic tool.
This document provides guidance on sizing and configuring Apache Hadoop clusters. It recommends separating master nodes, which run processes like the NameNode and JobTracker, from slave nodes, which run DataNodes, TaskTrackers and RegionServers. For medium to large clusters it suggests 4 master nodes and the remaining nodes as slaves. The document outlines factors to consider for optimizing performance and cost like selecting balanced CPU, memory and disk configurations and using a "shared nothing" architecture with 1GbE or 10GbE networking. Redundancy is more important for master than slave nodes.
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It uses a programming model called MapReduce where developers write mapping and reducing functions that are automatically parallelized and executed on a large cluster. Hadoop also includes HDFS, a distributed file system that stores data across nodes providing high bandwidth. Major companies like Yahoo, Google and IBM use Hadoop to process large amounts of data from users and applications.
Hadoop Interview Questions and Answers by rohit kapa - kapa rohit
Hadoop Interview Questions and Answers - more than 130 real-time questions and answers covering Hadoop HDFS, MapReduce, and administrative concepts, by rohit kapa.
This document outlines the course content for a Hadoop Administration course. It covers topics such as introducing Big Data concepts, understanding Hadoop and HDFS, the MapReduce framework, planning and maintaining Hadoop clusters, installing Hadoop ecosystem tools, managing jobs, monitoring clusters, troubleshooting issues, and populating HDFS from external sources. Contact [email protected] for inquiries about hadoop development, administration, testing, or advanced Hadoop topics.
Hadoop Installation, Configuration, and MapReduce Program - Praveen Kumar Donta
This presentation contains a brief description of big data, along with Hadoop installation and configuration and a MapReduce word-count program with its explanation.
This document provides an introduction and overview of installing Hadoop 2.7.2 in pseudo-distributed mode. It discusses the core components of Hadoop including HDFS for distributed storage and MapReduce for distributed processing. It also covers prerequisites like Java and SSH setup. The document then describes downloading and extracting Hadoop, configuring files, and starting services to run Hadoop in pseudo-distributed mode on a single node.
Hadoop Operations for Production Systems (Strata NYC) - Kathleen Ting
Hadoop is emerging as the standard for big data processing and analytics. However, as usage of Hadoop clusters grows, so do the demands of managing and monitoring these systems.
In this full-day Strata Hadoop World tutorial, attendees will get an overview of all phases for successfully managing Hadoop clusters, with an emphasis on production systems — from installation, to configuration management, service monitoring, troubleshooting and support integration.
We will review tooling capabilities and highlight the ones that have been most helpful to users, and share some of the lessons learned and best practices from users who depend on Hadoop as a business-critical system.
The document describes the key limitations of Hadoop 1.x including single point of failure of the NameNode, lack of horizontal scalability, and the JobTracker being overburdened. It then discusses how Hadoop 2.0 addresses these issues through features like HDFS federation for multiple NameNodes, NameNode high availability, and YARN which replaces MapReduce and allows sharing of cluster resources for various workloads.
This document provides an overview of a Hadoop administration course offered on the edureka.in website. It describes the course topics which include understanding big data, Hadoop components, Hadoop configuration, different server roles, and data processing flows. It also outlines how the course works, with live classes, recordings, quizzes, assignments, and certification. The document then provides more detail on specific topics like what is big data, limitations of existing solutions, how Hadoop solves these problems, and introductions to Hadoop, MapReduce, and the roles of a Hadoop cluster administrator.
Big data is generated from a variety of sources at a massive scale and high velocity. Hadoop is an open source framework that allows processing and analyzing large datasets across clusters of commodity hardware. It uses a distributed file system called HDFS that stores multiple replicas of data blocks across nodes for reliability. Hadoop also uses a MapReduce processing model where mappers process data in parallel across nodes before reducers consolidate the outputs into final results. An example demonstrates how Hadoop would count word frequencies in a large text file by mapping word counts across nodes before reducing the results.
This document provides an introduction to Hadoop administration. It discusses key topics like understanding big data and Hadoop, Hadoop components, configuring and setting up a Hadoop cluster, commissioning and decommissioning data nodes, and includes demos of setting up a cluster and managing the secondary name node. The overall objectives are to help students understand Hadoop fundamentals, the responsibilities of an administrator, and how to manage a Hadoop cluster.
Presentation on Big Data Hadoop (Summer Training Demo) - Ashok Royal
This document summarizes a practical training presentation on Big Data Hadoop. It was presented by Ashutosh Tiwari and Ashok Rayal from Poornima Institute of Engineering & Technology, Jaipur under the guidance of Dr. E.S. Pilli from MNIT Jaipur. The training took place from May 28th to July 9th 2014 at MNIT Jaipur and consisted of studying Hadoop and related papers, building a Hadoop cluster, and implementing a near duplicate detection project using Hadoop MapReduce. The near duplicate detection project aimed to comparatively analyze documents to find similar ones based on a predefined threshold. Snapshots of the HDFS, MapReduce processing, and output of the project are included.
This document discusses SQL Server 2012 FileTables, which allow files to be stored and managed directly within a SQL Server database. FileTables represent both options of storing files and metadata together in the database or separately across file systems and databases. FileTables provide full Windows file system access to files stored in SQL Server tables while retaining relational properties and queries. They enable seamless access to files from applications without changes to client code.
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su... - Abhiraj Butala
The talk covers limitations of current Hadoop eco-system components in handling security (Authentication, Authorization, Auditing) in multi-tenant, multi-application environments. Then it proposes how we can use Apache Ranger and HDFS super-user connections to enforce correct HDFS authorization policies and achieve the required auditing.
This document outlines the agenda for a BigData Cloud Architects Meetup Group meeting on August 9, 2014 from 2:00-4:00pm. The agenda includes a discussion on securing Hadoop deployments to enterprise compliance regulations and a Q&A session. There will also be introductions, a topic presentation on the week's exciting topic, and a wrap-up. The meetup group was started in June 2013 and meets bi-weekly on Saturdays.
This document provides an overview of Hadoop and MapReduce concepts. It discusses:
- HDFS architecture with NameNode and DataNodes for metadata and data storage. HDFS provides reliability through block replication across nodes.
- MapReduce framework for distributed processing of large datasets across clusters. It consists of map and reduce phases with intermediate shuffling and sorting of data.
- Hadoop was developed based on Google's papers describing their distributed file system GFS and MapReduce processing model. It allows processing of data in parallel across large clusters of commodity hardware.
The document discusses fault tolerance in Apache Hadoop. It describes how Hadoop handles failures at different layers through replication and rapid recovery mechanisms. In HDFS, data nodes regularly heartbeat to the name node, and blocks are replicated across racks. The name node tracks block locations and initiates replication if a data node fails. HDFS also supports name node high availability. In MapReduce v1, task and task tracker failures cause re-execution of tasks. YARN improved fault tolerance by removing the job tracker single point of failure.
Apache Spark Introduction @ University College London - Vitthal Gogate
Spark is a fast and general engine for large-scale data processing. It uses resilient distributed datasets (RDDs) that can be operated on in parallel. Transformations on RDDs are lazy, while actions trigger their execution. Spark supports operations like map, filter, reduce, and join and can run on Hadoop clusters, standalone, or in cloud services like AWS.
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera, Inc.
Attend this session and walk away armed with solutions to the most common customer problems. Learn proactive configuration tweaks and best practices to keep your cluster free of fetch failures, job tracker hangs, and the like.
It introduces and illustrates use cases, benefits, and problems of Kerberos deployment on Hadoop, and how token support and TokenPreauth can help solve those problems. It also briefly introduces the Haox project, a Java client library for Kerberos.
Combining Machine Learning Frameworks with Apache Spark - Databricks
This document discusses combining machine learning frameworks with Apache Spark. It provides an overview of Apache Spark and MLlib, describes how to distribute TensorFlow computations using Spark, and discusses managing machine learning workflows with Spark through features like cross validation, persistence, and distributed data sources. The goal is to make machine learning easy, scalable, and integrate with existing workflows.
The document provides an overview of MongoDB administration including its data model, replication for high availability, sharding for scalability, deployment architectures, operations, security features, and resources for operations teams. The key topics covered are the flexible document data model, replication using replica sets for high availability, scaling out through sharding of data across multiple servers, and different deployment architectures including single/multi data center configurations.
The document contains screenshots and descriptions of the setup and configuration of a Hadoop cluster. It includes images showing the cluster with different numbers of live and dead nodes, replication settings across nodes, and outputs of commands like fsck and job execution information. The screenshots demonstrate how to view cluster health metrics, manage nodes, and run MapReduce jobs on the Hadoop cluster.
This document provides an overview of security topics related to Hadoop. It discusses what Hadoop is, common versions and distributions. It outlines some key security risks like default passwords, open ports, old versions with vulnerabilities. It also summarizes encryption options for data in motion and at rest, and security solutions like Knox and Ranger for centralized authorization policies.
Where is my next job in the age of Big Data and Automation - Trieu Nguyen
The document discusses how automation is impacting knowledge work jobs and proposes that the best approach is augmentation, where humans and machines work together. It provides examples of how different knowledge work jobs like teachers, lawyers, and financial advisors could take steps to augment their work with automation. The key steps include humans mastering automated systems, identifying new areas for automation, focusing on tasks they currently do best, finding niche roles, and building automated systems. The implications are that organizations should adopt an augmentation perspective, select the right technologies, design work for humans and machines, provide transition options for employees, and appoint a leader to manage workplace changes.
1) The document describes the steps to install a single node Hadoop cluster on a laptop or desktop.
2) It involves downloading and extracting required software like Hadoop, JDK, and configuring environment variables.
3) Key configuration files like core-site.xml, hdfs-site.xml and mapred-site.xml are edited to configure the HDFS, namenode and jobtracker.
4) The namenode is formatted and Hadoop daemons like datanode, secondary namenode and jobtracker are started.
This document discusses managing Hadoop clusters in a distribution-agnostic way using Bright Cluster Manager. It outlines the challenges of deploying and maintaining Hadoop, describes an architecture for a unified cluster and Hadoop manager, and highlights Bright Cluster Manager's key features for provisioning, configuring and monitoring Hadoop clusters across different distributions from a single interface. Bright provides a solution for setting up, managing and monitoring multi-purpose clusters running both HPC and Hadoop workloads.
The document discusses Hadoop infrastructure at TripAdvisor including:
1) TripAdvisor uses Hadoop across multiple clusters to analyze large amounts of data and power analytics jobs that were previously too large for a single machine.
2) They implement high availability for the Hadoop infrastructure including automatic failover of the NameNode using DRBD, Corosync and Pacemaker to replicate the NameNode across two servers.
3) Monitoring of the Hadoop clusters is done through Ganglia and Nagios to track hardware, jobs and identify issues. Regular backups of HDFS and Hive metadata are also performed for disaster recovery.
This document provides instructions for configuring a single node Hadoop deployment on Ubuntu. It describes installing Java, adding a dedicated Hadoop user, configuring SSH for key-based authentication, disabling IPv6, installing Hadoop, updating environment variables, and configuring Hadoop configuration files including core-site.xml, mapred-site.xml, and hdfs-site.xml. Key steps include setting JAVA_HOME, configuring HDFS directories and ports, and setting hadoop.tmp.dir to the local /app/hadoop/tmp directory.
This presentation provides an overview of Hadoop, including:
- A brief history of data and the rise of big data from various sources.
- An introduction to Hadoop as an open source framework used for distributed processing and storage of large datasets across clusters of computers.
- Descriptions of the key components of Hadoop - HDFS for storage, and MapReduce for processing - and how they work together in the Hadoop architecture.
- An explanation of how Hadoop can be installed and configured in standalone, pseudo-distributed and fully distributed modes.
- Examples of major companies that use Hadoop like Amazon, Facebook, Google and Yahoo to handle their large-scale data and analytics needs.
Facing enterprise specific challenges – utility programming in Hadoop - fann wu
This document discusses managing large Hadoop clusters through various automation tools like SaltStack, Puppet, and Chef. It describes how to use SaltStack to remotely control and manage a Hadoop cluster. Puppet can be used to easily deploy Hadoop on hundreds of servers within an hour through Hadooppet. The document also covers Hadoop security concepts like Kerberos and folder permissions. It provides examples of monitoring tools like Ganglia, Nagios, and Splunk that can be used to track cluster metrics and debug issues. Common processes like datanode decommissioning and tools like the HBase Canary tool are also summarized. Lastly, it discusses testing Hadoop on AWS using EMR and techniques to reduce EMR costs
These slides provide highlights of my book HDInsight Essentials. Book link is here: http://www.packtpub.com/establish-a-big-data-solution-using-hdinsight/book
HDInsight Essentials: Hadoop on the Microsoft Platform - nvvrajesh
This book gives a quick introduction to Hadoop-like problems, and gives a primer on the real value of HDInsight. Next, it will show how to set up your HDInsight cluster.
Then, it will take you through the four stages: collect, process, analyze, and report.
For each of these stages you will see a practical example with the working code.
The document outlines the goals and contents of a book about HDInsight, Microsoft's Hadoop distribution. The book aims to provide an overview of Hadoop, describe how to deploy HDInsight on-premise and on Azure, and provide examples of ingesting, transforming, and analyzing data with HDInsight. Each chapter is summarized briefly, covering topics like Hadoop concepts, installing HDInsight, administering HDInsight clusters, loading and processing data in HDInsight.
This document provides an overview and introduction to Hadoop, HDFS, and MapReduce. It covers the basic concepts of HDFS, including how files are stored in blocks across data nodes, and the role of the name node and data nodes. It also explains the MapReduce programming model, including the mapper, reducer, and how jobs are split into parallel tasks. The document discusses using Hadoop from the command line and writing MapReduce jobs in Java. It also mentions some other projects in the Hadoop ecosystem like Pig, Hive, HBase and Zookeeper.
Why is everyone interested in Big Data and Hadoop?
Why should you use Hadoop?
Read this and you too can quickly and easily become the proud owner of a Hadoop kit of your own, using the Cloudera Free Edition.
************************NOTE**********************
This presentation is still being edited and new slides added every day. Stay tuned...
****************************************************
This document discusses deploying and researching Hadoop in virtual machines. It provides definitions of Hadoop, MapReduce, and HDFS. It describes using CloudStack to deploy a Hadoop cluster across multiple virtual machines to enable distributed and parallel processing of large datasets. The proposed system is to deploy Hadoop applications on virtual machines from a CloudStack infrastructure for improved performance, reliability and reduced power consumption compared to a single virtual machine. It outlines the hardware, software, architecture, design, testing and outputs of the proposed system.
This document provides an overview of a 30-hour training on Apache Hadoop administration. The training aims to give participants a comprehensive understanding of installing, configuring, operating and maintaining an Apache Hadoop cluster. Participants will learn how to install Hadoop clusters, configure components like HDFS, MapReduce and YARN, load and manage data, configure security and high availability, monitor performance, and troubleshoot issues. The course covers both theoretical concepts and hands-on exercises using tools like Cloudera and Hortonworks distributions, and includes topics like planning hardware, basic administration, advanced configuration, and managing related projects like Hive and Pig.
Apache Hadoop, HDFS and MapReduce Overview - Nisanth Simon
This document provides an overview of Apache Hadoop, HDFS, and MapReduce. It describes how Hadoop uses a distributed file system (HDFS) to store large amounts of data across commodity hardware. It also explains how MapReduce allows distributed processing of that data by allocating map and reduce tasks across nodes. Key components discussed include the HDFS architecture with NameNodes and DataNodes, data replication for fault tolerance, and how the MapReduce engine works with a JobTracker and TaskTrackers to parallelize jobs.
Geek Trainings, started by a team of trainers and HR specialists, is a pioneer in training on different technologies, with a proven track record of successfully delivering corporate, classroom, and online trainings through brilliant, qualified professional trainers across the ever-expanding arena of information technology (IT) in India.
The document provides an introduction to Hadoop and big data concepts. It discusses key topics like what big data is characterized by the three V's of volume, velocity and variety. It then defines Hadoop as a framework for distributed storage and processing of large datasets using commodity hardware. The rest of the document outlines the main components of the Hadoop ecosystem including HDFS, YARN, MapReduce, Hive, Pig, Zookeeper, Flume and Sqoop and provides brief descriptions of each.
Hadoop is an open-source framework that allows distributed processing of large datasets across clusters of computers. It has two major components - the MapReduce programming model for processing large amounts of data in parallel, and the Hadoop Distributed File System (HDFS) for storing data across clusters of machines. Hadoop can scale from single servers to thousands of machines, with HDFS providing fault-tolerant storage and MapReduce enabling distributed computation and processing of data in parallel.
Deployment and Management of Hadoop Clusters
1. Deployment and Management of Hadoop Clusters
Amal G Jose
Big Data Analytics
http://www.coderfox.com/
http://amalgjose.wordpress.com/
in.linkedin.com/in/amalgjose/
2. Agenda
• Introduction
• Cluster design and deployment
• Backup and Recovery
• Hadoop Upgrade
• Routine Administration Tasks
3. Introduction
• What is Hadoop?
• What makes Hadoop different?
• Why do we need a Hadoop cluster?
4. Cluster Installation
This has four parts:
• Cluster planning
• OS installation & hardening
• Cluster software installation
• Cluster configuration
5. Cluster Planning
Configuration per Hadoop daemon:
• Namenode: dedicated servers; the OS is installed on the RAID device; dfs.name.dir resides on the same RAID device, with one additional copy configured on NFS (see the configuration sketch below).
• Secondary Namenode: dedicated server; OS installed on the RAID device.
• Jobtracker: dedicated server; OS installed in a JBOD configuration.
• Datanode/Tasktracker: individual servers; OS installed in a JBOD configuration.
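The metadata directories can be listed as a comma-separated value of dfs.name.dir in hdfs-site.xml, one copy on the local RAID device and one on the NFS mount. The snippet below is a minimal sketch using Hadoop 1.x property names; the paths are illustrative assumptions, not prescriptions.

<!-- merge into conf/hdfs-site.xml on the Namenode; example paths only -->
<property>
  <name>dfs.name.dir</name>
  <!-- one copy on the RAID device, one copy on the NFS mount -->
  <value>/data/raid/dfs/name,/mnt/nfs/dfs/name</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <!-- checkpoint directory used by the Secondary Namenode -->
  <value>/data/raid/dfs/namesecondary</value>
</property>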
6. Workload Patterns For Hadoop
• Balanced Workload
• Compute Intensive
• I/O Intensive
• Unknown or evolving workload patterns
8. Typical Hadoop Cluster Topology
[Diagram] Master nodes (MN) run the Name Node, Job Tracker and a Ganglia daemon; the client node (CN) hosts Hive, Pig, Oozie, Mahout and the Ganglia master; slave nodes (SN) run the Task Tracker, Data Node and a Ganglia daemon.
9. Creating Instances (in case of cloud)
• Create the instances based on the requirement.
10. Operating System Hardening
• We will be installing Hadoop on RHEL6 64-bit servers.
• The OS should be hardened based on the RHEL6 hardening document.
• Set the iptables rules necessary for the Hadoop services.
• In the case of Amazon EC2 instances, create key pairs for logging in.
• The GUI can be disabled to make more room for Hadoop.
• Time should be synchronized across all the servers.
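A hedged sketch of these host-preparation steps on RHEL6 follows; the port list uses common Hadoop 1.x defaults and the commands should be adapted to your own hardening standard.

# Keep time in sync across all servers
yum install -y ntp
chkconfig ntpd on && service ntpd start

# Boot to text mode so the GUI does not hold memory needed by Hadoop
sed -i 's/id:5:initdefault:/id:3:initdefault:/' /etc/inittab

# Open the Hadoop service ports (common 1.x defaults; match them to your *-site.xml settings)
for port in 8020 8021 50010 50020 50030 50060 50070 50075; do
  iptables -I INPUT -p tcp --dport "$port" -j ACCEPT
done
service iptables save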
11. Cluster Software Installation
• Choosing the distribution of Hadoop.
• Creating a local yum repository.
• Installing Java on all the machines.
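A minimal sketch of the local repository and Java steps; the repository URL and the package names are assumptions, so substitute those of your chosen distribution.

# Point every node at an internal yum mirror of the chosen Hadoop distribution
# (repo.example.com is a hypothetical host)
cat > /etc/yum.repos.d/hadoop-local.repo <<'EOF'
[hadoop-local]
name=Local Hadoop repository
baseurl=http://repo.example.com/hadoop/
enabled=1
gpgcheck=0
EOF

# Install the JDK on all machines (exact package depends on the Java version you standardise on)
yum install -y java-1.6.0-openjdk-devel

# Install Hadoop from the local repository (package names differ between distributions)
yum install -y hadoop-0.20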
13. Installation Methods
• Hadoop can be installed either manually or automatically using tools such as Cloudera Manager, Ambari, etc.
• One-click installation tools help users install Hadoop on clusters without any pain.
14. Manual Installation
• Install the Hadoop daemons on the nodes.
• We can use either a tarball or an RPM for installation.
• RPM installation is easier.
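For example, a manual installation of a node could follow either route below; the version numbers and paths are placeholders, not recommendations.

# Option 1: RPM - users, directory layout and init scripts are created for you
rpm -ivh hadoop-1.2.1-1.x86_64.rpm

# Option 2: tarball - everything is laid out and owned by hand
tar -xzf hadoop-1.2.1.tar.gz -C /usr/local
ln -s /usr/local/hadoop-1.2.1 /usr/local/hadoop
useradd hadoop && chown -R hadoop:hadoop /usr/local/hadoop-1.2.1
echo 'export HADOOP_HOME=/usr/local/hadoop' > /etc/profile.d/hadoop.sh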
15. Setting up a Client Node
• What is a client node?
• Why is a client node necessary?
• How do we configure a client node?
• What services are installed on it?
• Why do we need multi-user segregation?
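One common way to set up a client (gateway) node is sketched below: it runs no Hadoop daemons, only the client libraries, user tools and the same configuration files, and each user gets a separate account plus an HDFS home directory for segregation. Package names and hostnames here are assumptions.

# Install only client-side packages (names vary by distribution)
yum install -y hadoop-client hive pig

# Copy the cluster configuration from an existing node ("namenode" is a placeholder hostname)
scp namenode:/etc/hadoop/conf/{core-site.xml,hdfs-site.xml,mapred-site.xml} /etc/hadoop/conf/

# Give each analyst a Linux account and an HDFS home directory
# ("hdfs" is the HDFS superuser in packaged installs; use whichever user runs the Namenode)
useradd analyst1
sudo -u hdfs hadoop fs -mkdir /user/analyst1
sudo -u hdfs hadoop fs -chown analyst1 /user/analyst1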
16. Cluster Configuration
• Storage locations for the namenode, secondary namenode, and datanodes.
• Number of task slots (map/reduce slots): task slots per node = (memory available / child JVM size); see the example below.
• Backup location for the namenode.
• Configuring MySQL for Hive and Oozie.
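As an illustration of the slot formula (example numbers: roughly 24 GB of memory left for tasks and a 1 GB child JVM gives about 24 slots, split here 16/8), the relevant Hadoop 1.x properties look like the snippet below. The values and paths are examples to be derived per cluster, not recommendations.

<!-- mapred-site.xml on each slave node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>16</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

<!-- hdfs-site.xml: storage locations for the Datanode (paths are placeholders) -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>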
17. Namenode - Single Point of Failure
• Why is the namenode a single point of failure?
• How do we resolve this issue?
• How can a backup be achieved?
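Besides keeping a second copy of dfs.name.dir on NFS (slide 5), the namespace metadata can be backed up on a running Hadoop 1.x cluster roughly as follows; the paths and the backup host are placeholders.

# Freeze the namespace and flush a fresh fsimage to every dfs.name.dir
hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace

# Archive the metadata directory to another machine
tar -czf /tmp/nn-meta-$(date +%F).tar.gz -C /data/raid/dfs name
scp /tmp/nn-meta-$(date +%F).tar.gz backup-host:/backups/namenode/

hadoop dfsadmin -safemode leave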
19. Monitoring the Hadoop Cluster
• For a manual installation, we can use Ganglia.
• Automated installation tools have built-in monitoring mechanisms.
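Alongside Ganglia graphs, a few built-in commands give a quick health check and are easy to wrap into cron jobs or Nagios checks; a minimal sketch:

# Live/dead datanodes, configured and remaining capacity, per-node usage
hadoop dfsadmin -report

# Filesystem health: missing, corrupt and under-replicated blocks
hadoop fsck /

# Jobs currently known to the JobTracker
hadoop job -list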
24. Steps for Hadoop Upgrade
• Make sure that any previous upgrade is finalized before proceeding with another upgrade.
• Shut down MapReduce and kill any orphaned task processes on the tasktrackers.
• Shut down HDFS and back up the namenode directories.
• Install the new versions of Hadoop HDFS and MapReduce on the cluster and on the clients.
• Start HDFS with the -upgrade option.
• Wait until the upgrade is complete.
• Perform some sanity checks on HDFS.
• Start MapReduce.
• Roll back or finalize the upgrade (optional).
The same sequence is sketched as commands below.
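A hedged Hadoop 1.x sketch, run as the HDFS/MapReduce service users; paths are placeholders.

# 1. Confirm any previous upgrade is finalized, then stop the cluster
hadoop dfsadmin -upgradeProgress status
stop-mapred.sh
stop-dfs.sh

# 2. With HDFS down, back up the namenode directories
tar -czf /backup/nn-before-upgrade.tar.gz -C /data/raid/dfs name

# 3. Install the new HDFS and MapReduce packages on all nodes and clients, then start HDFS in upgrade mode
start-dfs.sh -upgrade

# 4. Wait for completion and run sanity checks
hadoop dfsadmin -upgradeProgress status
hadoop fsck / -files -blocks

# 5. Bring MapReduce back, then either finalize or roll back
start-mapred.sh
hadoop dfsadmin -finalizeUpgrade
# (or roll back: stop HDFS, reinstall the previous version, start-dfs.sh -rollback)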
26. Summary
• Hadoop Cluster design
• Hadoop Cluster Installation
• Back up and Recovery
• Hadoop Upgrade
• Routine Administration Procedures
27. For more info, visit:
http://amalgjose.wordpress.com
http://coderfox.com
http://in.linkedin.com/in/amalgjose
Additional Information