Hadoop operations-2014-strata-new-york-v5

Hadoop Operations –
Best Practices from the Field
October 17, 2014
Chris Nauroth
email: cnauroth@hortonworks.com
twitter: @cnauroth
Suresh Srinivas
email: suresh@hortonworks.com
twitter: @suresh_m_s

About Us
Chris Nauroth
• Member of Technical Staff, Hortonworks
– Apache Hadoop committer and PMC member
– Major contributor to HDFS ACLs, Windows compatibility, and operability improvements
• Hadoop user since 2010
– Prior employment experience deploying, maintaining and using Hadoop clusters
Suresh Srinivas
• Architect & Founder at Hortonworks
– Long time Apache Hadoop committer and PMC member
– Designed and developed many key Hadoop features
• Experience from supporting many clusters
– Including some of the world’s largest Hadoop clusters
© Hortonworks Inc. 2011
Page 2
Architecting the Future of Big Data

Agenda
• Analysis of Hadoop Support Cases
– Support case trends
– Configuration
– Documentation
– Software Improvements
• Key Learnings and Best Practices
– HDFS ACLs
– HDFS Snapshots
– YARN Application Timeline Server

Support Cases: Setting the Context
• Hortonworks Support
– Multiple tiers of support contacts
– Support engineers trained and knowledgeable across the entire Hadoop ecosystem
– Cases may escalate to subject matter experts for depth in one particular area
– Challenging cases may escalate to Apache committers at Hortonworks if additional expertise is required
• Apache Community Support
– user@hadoop.apache.org for user questions and support
– https://issues.apache.org/jira for reporting confirmed bugs
– Apache Hadoop users, contributors, committers and PMC members all participate actively in these forums to help resolve issues

Support Case Analysis Methodology
• Inspected over 2 years of support case history across hundreds of customers
• Broad inclusion of 29 Hadoop ecosystem and related projects
• Multiple versions of Hadoop in deployments
– 2 major versions: Hadoop 1.x and 2.x
– ~3 minor versions within each major version
– ~3 patch releases per minor version
– ~15 total releases and updates
• Distinct deployment environments
– Cluster sizes ranging from 10s to 1000s of nodes
– Different management environments and operational practices
– Various deployment techniques: Ambari, Chef, RPMs, etc.

Support Case Trends – Cases per Month
[Chart: support cases per month (0 to 140) over months 1 to 28, broken down by HDFS, MapReduce and YARN]

Support Case Trends – Cases per Month
• What is the spike in May 2014?
– More users
– More total users means more total support cases
– More features
– Many upgrades of existing clusters from Hadoop 1 to Hadoop 2
– Many conversions to HA deployments
– Many conversions to secured deployments
– More integration
– Many sites running separate Hadoop 1 and Hadoop 2 clusters simultaneously
– Questions around migrating data between clusters at 2 different versions (DistCp)

Support Case Trends – Proportional Cases per Month
[Chart: proportion of support cases per month (0 to 1) over months 1 to 31, broken down by HDFS, MapReduce, YARN and Other (26 components)]

Support Case Trends – Root Cause
[Chart: number of support cases (0 to 450) by root cause, broken down by HDFS, MapReduce and YARN. Root cause categories: Customer Environment (Non HDP), Documentation Defect, Documentation Gap, Documentation Not Utilized, Education - Configuration, Needs Training, Product Defect]

Support Case Trends
• Highlights
– Core Hadoop components (HDFS, YARN and MapReduce) are used across all deployments, and therefore receive proportionally more support cases than other ecosystem components.
– Misconfiguration is the dominant root cause.
– Documentation is a close second.
– We are constantly improving the code to eliminate operational issues, help with diagnosis and provide increased visibility.

Hardware and Cluster Sizing
• Considerations
– Larger clusters heal faster after node or disk failure
– Machines with huge storage take longer to recover
– More racks give more failure domains
• Recommendations
– Get good-quality commodity hardware
– Buy the sweet spot in pricing: 3TB disks, 96GB RAM, 8-12 cores
– More memory is better – real-time workloads are memory hungry!
– Before considering fatter machines (1U 6 disks vs. 2U 12 disks)
– Get to 30-40 machines or 3-4 racks
– Use a pilot cluster to learn about load patterns
– Balance hardware for I/O-, compute- or memory-bound workloads
– More details: http://tinyurl.com/hwx-hadoop-hw

Configuration
• Avoid JVM issues
– Use a 64-bit JVM for all daemons
– Compressed OOPs are enabled by default (Java 6u23 and later)
– Java heap size
– Set the same maximum and starting heap size: Xmx == Xms
– Avoid Java defaults: configure NewSize and MaxNewSize
– Use 1/8 to 1/6 of the maximum heap size for JVMs larger than 4G
– Configure -XX:PermSize=128m, -XX:MaxPermSize=256m
– Use a low-latency garbage collector
– -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads=<N>
– High <N> on NameNode and JobTracker or ResourceManager
– Important JVM configs to help debugging
– -verbose:gc -Xloggc:<file> -XX:+PrintGCDetails
– -XX:ErrorFile=<file>
– -XX:+HeapDumpOnOutOfMemoryError
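The heap and GC guidance above can be assembled into a single options string. The following is a minimal sketch, not a tuned configuration: the function name namenode_jvm_opts, the 1/8 young-generation ratio (the low end of the range above), the value 8 for ParallelGCThreads and the log paths are all illustrative assumptions. The PermSize flags only apply to the Java 6 and 7 JVMs this deck targets.

```shell
#!/bin/sh
# Build a JVM options string for a Hadoop master daemon from a heap size in MB.
# Hypothetical helper: the name, log directory and GC thread count are assumptions.
namenode_jvm_opts() {
  heap_mb=$1
  log_dir=${2:-/var/log/hadoop}
  # Young generation: 1/8 of the max heap (low end of the 1/8 to 1/6 range).
  new_mb=$((heap_mb / 8))
  echo "-Xms${heap_mb}m -Xmx${heap_mb}m" \
       "-XX:NewSize=${new_mb}m -XX:MaxNewSize=${new_mb}m" \
       "-XX:PermSize=128m -XX:MaxPermSize=256m" \
       "-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8" \
       "-verbose:gc -Xloggc:${log_dir}/gc.log -XX:+PrintGCDetails" \
       "-XX:ErrorFile=${log_dir}/hs_err_pid%p.log" \
       "-XX:+HeapDumpOnOutOfMemoryError"
}
```

In hadoop-env.sh this could be used as: export HADOOP_NAMENODE_OPTS="$(namenode_jvm_opts 8192) $HADOOP_NAMENODE_OPTS". Keeping the calculation in one function makes the Xms == Xmx rule hard to break by accident when the heap is resized.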

Configuration
• Use multiple redundant directories for NameNode metadata
– One of the dfs.namenode.name.dir directories should be on NFS
– NFS soft mount options: tcp,soft,intr,timeo=20,retrans=5
• Configure the open file descriptor ulimit
– The default of 1024 is too low
– 16K for DataNodes, 64K for master nodes
• Use version control for configuration!
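The file descriptor limit above is easy to verify before a daemon starts. This is a minimal sketch, assuming it runs as the same user and in the same shell environment that launches the daemon; the function name check_fd_limit is a hypothetical helper, not part of Hadoop.

```shell
#!/bin/sh
# Return success when the soft open-file limit of the current shell is at least
# the required value. The function name is a hypothetical helper.
check_fd_limit() {
  required=$1
  current=$(ulimit -n)
  # "unlimited" always satisfies the requirement.
  [ "$current" = "unlimited" ] && return 0
  [ "$current" -ge "$required" ]
}
```

For example, a DataNode start script could run: check_fd_limit 16384 || echo "open fd limit too low for a DataNode" >&2. On most Linux systems the limit itself is raised persistently through the PAM limits configuration (for example a nofile entry in /etc/security/limits.conf); the exact mechanism depends on the distribution.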

Configuration
• Use disk fail-in-place for DataNodes: dfs.datanode.failed.volumes.tolerated
– Disk failure is no longer DataNode failure
– Especially important for high-density nodes
• Set dfs.namenode.name.dir.restore to true
– Restores a failed NN storage directory during checkpointing
• Take periodic backups of NameNode metadata
– Make copies of the entire storage directory
• Set aside a lot of disk space for NN logs
– The NN log is verbose: set aside multiple GBs
– Many installs configure this too small
– NN logs then roll within minutes, which makes issues hard to debug
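One lightweight complement to copying the storage directory is to download the latest fsimage with hdfs dfsadmin -fetchImage. The sketch below is illustrative only: the function name backup_nn_image, the dated directory layout and the HDFS_CMD override (used for dry runs) are assumptions. It fetches the fsimage only, not the edit logs, so it does not replace a full copy of the storage directory.

```shell
#!/bin/sh
# Download the latest fsimage from the NameNode into a dated local directory.
# HDFS_CMD can be overridden (for example with "echo hdfs") for a dry run.
# The function name and the backup layout are illustrative assumptions.
backup_nn_image() {
  backup_root=$1
  day=${2:-$(date +%Y-%m-%d)}
  dest="${backup_root}/${day}"
  mkdir -p "$dest" || return 1
  ${HDFS_CMD:-hdfs} dfsadmin -fetchImage "$dest"
}
```

A daily cron entry could call backup_nn_image /backup/namenode, with the backup root placed on a different machine or mount than the NameNode's own storage directories.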

Monitor Usage
• Cluster storage, nodes, files and blocks grow over time
– Update NN heap, handler count and number of DN xceivers
– Tweak other related config periodically
• Monitor the hardware usage for your workload
– Disk I/O, network I/O, CPU and memory usage
– Use this information when expanding cluster capacity
• Monitor the usage with Hadoop metrics
– JVM metrics – GC times, memory used, thread status
– RPC metrics – especially latency to track slowdowns
– HDFS metrics
– Used storage, # of files and blocks, total load on the cluster
– File system operations
– MapReduce metrics
– Slot utilization and job status
• Tweak configurations during upgrades/maintenance on an ongoing basis

Documentation
• Continual Investment in Documentation
– Hortonworks Data Platform Documentation
– http://docs.hortonworks.com/
– Apache Hadoop Documentation
– http://hadoop.apache.org/docs/current/
• Apache Hadoop Documentation
– We welcome your requests in Apache jira for documentation improvements.
– Create issues with the “documentation” label.
– Getting the end user perspective is extremely valuable.
– We would be grateful to receive documentation patches.
– It’s a great way to get started in the Apache Hadoop open source process.
– Search for unresolved issues with the “documentation” label.
– https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20documentation

Software Improvements
Real Incidents and Software Improvements to Address Them

Don’t edit the metadata files!
• Editing can corrupt the cluster state
– Might result in loss of data
• Real incident
– NN misconfigured to point to another NN’s metadata
– DNs can’t register due to namespace ID mismatch
– System detected the problem correctly
– Safety net ignored by the admin!
– Admin edits the NameNode VERSION file to make the IDs match
What Happens Next?

Improvement
• Pause deletion of blocks when the namenode starts up
– https://issues.apache.org/jira/browse/HDFS-6186
– Supports configurable delay of block deletions after NameNode startup
– Gives an admin extra time to diagnose before deletions begin
• Show when block deletion will start after NameNode startup in WebUI
– The web UI already displays the number of pending block deletions
– This will enhance the display to indicate when actual deletion will begin

Guard Against Accidental Deletion
• rm -r deletes the data at the speed of Hadoop!
– Ctrl-C of the command does not stop deletion!
– Undeleting files on DataNodes is hard and time consuming
– Immediately shut down the NN and unmount disks on DataNodes
– Recover deleted files
– Start the NameNode without the delete operation in edits
• Enable Trash
• Real Incident
– Customer is running a distro of Hadoop with trash not enabled
– Deletes a large dir (100 TB) and shuts down NN immediately
– Support person asks NN to be restarted to see if trash is enabled!
What happens next?
• Now HDFS has Snapshots!
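To make the "Enable Trash" advice concrete: Trash is active when fs.trash.interval (in minutes) is greater than 0 in core-site.xml, and hdfs getconf -confKey fs.trash.interval shows the current value. With Trash enabled, a deleted path is moved under the deleting user's home directory rather than removed. The sketch below assumes the default /user/<name> home directory layout; the function names and the HDFS_CMD override (used for dry runs) are hypothetical.

```shell
#!/bin/sh
# Print where a deleted path lands when Trash is enabled: HDFS moves it under
# the deleting user's home directory instead of removing the blocks.
# Assumes the default /user/<name> home directory; hypothetical helper.
trash_location() {
  user=$1
  path=$2
  echo "/user/${user}/.Trash/Current${path}"
}

# Restore a path from Trash by moving it back into its original parent directory.
# HDFS_CMD can be overridden (for example with "echo hdfs") for a dry run.
restore_from_trash() {
  user=$1
  path=$2
  ${HDFS_CMD:-hdfs} dfs -mv "$(trash_location "$user" "$path")" "$(dirname "$path")"
}
```

Note that hdfs dfs -rm -skipTrash bypasses this safety net, and that trashed data is removed permanently once the configured interval expires.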

Improvement
• HDFS Snapshots
– A snapshot is a read-only point-in-time image of part of the file system
– A snapshot created before a deletion can be used to restore deleted data
– More coverage of snapshots later in the presentation
• HDFS ACLs
– Finer-grained control of file permissions can help prevent an accidental deletion
– More coverage of ACLs later in the presentation

Unexpected error during HA HDFS upgrade
• Background: HDFS HA Architecture
– http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
• Real Incident
– During upgrade, NameNode calls every JournalNode to request backup of metadata directory, which renames “current” directory to “previous.tmp”.
– Permissions incorrect on metadata directory for 1 out of 3 JournalNodes.
– The hdfs user is not authorized to rename. Backup fails for that JournalNode, so upgrade process aborts with error.
What happens next?

Improvement
• Improve diagnostics on storage directory rename operations by using native code.
– Logs additional root cause information for rename failure, for example EACCES
• Split error checks into separate conditions to improve diagnostics.
– Splits a log message about failure to delete or rename into separate log messages to clarify which specific action failed
• When aborting NameNode or JournalNode, write the contents of the metadata directories and permissions to logs.
– This is usually the first information asked of the user, so we can automate it
• For JournalNode operations that must succeed on all nodes, execute a pre-check to verify that the operation can succeed.
– Prevents the need for manual cleanup on the 2 out of 3 JournalNodes where backup succeeded

Support Case Trends
• Highlights Revisited
– Core Hadoop components (HDFS, YARN and MapReduce) are used across almost all deployments, and therefore receive proportionally more support cases than other ecosystem components.
– Action: Focus efforts on core Hadoop first to improve operability of the platform.
– Misconfiguration is the dominant root cause.
– Action: Publish configuration best practices and advise on the need for ongoing review of configuration as cluster usage patterns change over time.
– Documentation is a close second.
– Action: Contribute frequently to product documentation, both in open source Apache Hadoop and in the distro. End user documentation is a gating factor for launching new features. We welcome your requests in Apache jira for documentation improvements, and we welcome your patches!
– Code changes often can be implemented to eliminate an operational issue, help with diagnosis or provide increased visibility.
– Action: After resolution of each support case, consider potential product improvements. For example, can logging be improved? Small code changes can have a big impact.

Key Learnings and Best Practices
Features that Help Improve Production Operations

HDFS ACLs
• Existing HDFS POSIX permissions are good, but not flexible enough
– Permission requirements may differ from the natural organizational hierarchy of users and groups.
• HDFS ACLs augment the existing HDFS POSIX permissions model by implementing the POSIX ACL model.
– An ACL (Access Control List) provides a way to set different permissions for specific named users or named groups, not only the file’s owner and file’s group.

HDFS File Permissions Example
• Authorization requirements:
– In a sales department, a single user, Maya (Department Manager), should control all modifications to sales data.
– Other members of the sales department need to view the data, but can’t modify it.
– Everyone else in the company must not be allowed to view the data.
• Can be implemented via the following permissions on the file with sales data:
– User: read/write permission for user maya
– Group: read permission for group sales

HDFS ACLs
• Problem
– No longer feasible for Maya to control all modifications to the file
– New requirement: Maya, Diane and Clark are allowed to make modifications
– New requirement: a new group called executives should be able to read the sales data
– The current permissions model only allows permissions for 1 group and 1 user
• Solution: HDFS ACLs
– Now assign different permissions to different users and groups
[Figure: an HDFS directory with rwx entries for Owner, Group and Others, plus additional ACL entries for Group D, Group F and User Y]

HDFS ACLs
New Tools for ACL Management (setfacl, getfacl)
– hdfs dfs -setfacl -m group:execs:r-- /sales-data
– hdfs dfs -getfacl /sales-data
# file: /sales-data
# owner: maya
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
– How do you know if a file or directory has ACLs set? Look for the + after the permission bits:
– hdfs dfs -ls /sales-data
Found 1 items
-rw-r-----+ 3 maya sales 0 2014-03-04 16:31 /sales-data
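The new requirements from the earlier slide (Maya, Diane and Clark may modify; the executives group may read) could be applied in a single setfacl call, since -m accepts a comma-separated list of entries. This is a sketch under assumptions: the lowercase account names diane and clark, the function name grant_sales_acls and the HDFS_CMD override (used for dry runs) are illustrative. Maya needs no named entry because she is the file owner.

```shell
#!/bin/sh
# Grant read/write to two named users and read-only to a named group in one call.
# HDFS_CMD can be overridden (for example with "echo hdfs") for a dry run.
# The function name and the account names are illustrative assumptions.
grant_sales_acls() {
  target=$1
  spec="user:diane:rw-,user:clark:rw-,group:execs:r--"
  ${HDFS_CMD:-hdfs} dfs -setfacl -m "$spec" "$target"
}
```

After running grant_sales_acls /sales-data, it is worth checking the result with hdfs dfs -getfacl /sales-data, in particular the mask entry, which caps the effective permissions of named users and groups.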

HDFS ACLs
Default ACLs
– hdfs dfs -setfacl -m default:group:execs:r-x /monthly-sales-data
– hdfs dfs -mkdir /monthly-sales-data/JAN
– hdfs dfs -getfacl /monthly-sales-data/JAN
# file: /monthly-sales-data/JAN
# owner: maya
# group: sales
user::rwx
group::r-x
group:execs:r-x
mask::r-x
other::---
default:user::rwx
default:group::r-x
default:group:execs:r-x
default:mask::r-x
default:other::---

HDFS ACLs Best Practices
• Start with traditional HDFS permissions to implement most permission requirements.
• Define a smaller number of ACLs to handle exceptional cases.
• A file with an ACL incurs an additional cost in memory in the NameNode compared to a file that has only traditional permissions.

HDFS Snapshots
• HDFS Snapshots
– A snapshot is a read-only point-in-time image of part of the file system
– Performance: snapshot creation is instantaneous, regardless of data size or subtree depth
– Reliability: snapshot creation is atomic
– Scalability: snapshots do not create extra copies of data blocks
– Useful for protecting against accidental deletion of data
• Example: Daily Feeds
hdfs dfs -ls /daily-feeds
Found 5 items
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13

HDFS Snapshots
• Create a snapshot after each daily load
hdfs dfsadmin -allowSnapshot /daily-feeds
Allowing snaphot on /daily-feeds succeeded
hdfs dfs -createSnapshot /daily-feeds snapshot-to-2014-10-17
Created snapshot /daily-feeds/.snapshot/snapshot-to-2014-10-17
• User accidentally deletes data for 2014-10-16
hdfs dfs -ls /daily-feeds
Found 4 items

HDFS Snapshots
• Snapshots to the rescue: the data is still in the snapshot
hdfs dfs -ls /daily-feeds/.snapshot/snapshot-to-2014-10-17
Found 5 items
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-13
• Restore data from 2014-10-16
hdfs dfs -cp /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16 /daily-feeds
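The "snapshot after each daily load" routine can be scripted so that snapshot names stay consistent and old snapshots are dropped on a schedule. The following is a minimal sketch: the function names, the HDFS_CMD override (used for dry runs) and the idea of a retention window are assumptions layered on the snapshot-to-YYYY-MM-DD naming shown above. It relies only on hdfs dfs -createSnapshot and hdfs dfs -deleteSnapshot.

```shell
#!/bin/sh
# Name a snapshot after the day of the load it follows, matching the
# "snapshot-to-YYYY-MM-DD" convention used above (default: today).
snapshot_name() {
  echo "snapshot-to-${1:-$(date +%Y-%m-%d)}"
}

# Create the snapshot for a given day.
# HDFS_CMD can be overridden (for example with "echo hdfs") for a dry run.
create_daily_snapshot() {
  dir=$1
  ${HDFS_CMD:-hdfs} dfs -createSnapshot "$dir" "$(snapshot_name "$2")"
}

# Delete the snapshot for a given day once it falls outside the retention window.
delete_daily_snapshot() {
  dir=$1
  ${HDFS_CMD:-hdfs} dfs -deleteSnapshot "$dir" "$(snapshot_name "$2")"
}
```

Blocks referenced only by a snapshot keep consuming disk space until that snapshot is deleted, so some retention policy is usually needed alongside the daily creation step.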

YARN Application Timeline Server
• Stores data about YARN application execution
– Generic data
– YARN container utilization
– Metrics related to containers
– Application-specific data
– MapReduce jobs and their tasks
– Tez DAG execution
• Provides CLI for accessing data
– Useful for ad-hoc queries or scripted analysis
• Provides REST API for accessing data
– Consumed by UI front-ends such as Apache Ambari

Querying a Map Reduce Job Entity
curl http://127.0.0.1:8188/ws/v1/timeline/MAPREDUCE_JOB/job_1413405332088_0001
{
"entity": "job_1413405332088_0001",
"entitytype": "MAPREDUCE_JOB",
"events": [
{
"eventinfo": {
"FINISHED_MAPS": 2,
"FINISHED_REDUCES": 1,
"FINISH_TIME": 1413405349192,
"JOB_STATUS": "SUCCEEDED"
},
"eventtype": "JOB_FINISHED",
"timestamp": 1413405349194
}
],
"relatedentities": {
"MAPREDUCE_TASK": [
"task_1413405332088_0001_m_000000"
]
},
"starttime": 1413405339442
}

Querying a Map Task Entity
curl http://127.0.0.1:8188/ws/v1/timeline/MAPREDUCE_TASK/task_1413405332088_0001_m_000000
{
"entity": "task_1413405332088_0001_m_000000",
"entitytype": "MAPREDUCE_TASK",
"events": [
{
"eventtype": "TASK_FINISHED",
"timestamp": 1413405345253
},
{
"eventinfo": {
"SPLIT_LOCATIONS": "localhost",
"START_TIME": 1413405340255,
"TASK_TYPE": "MAP"
},
"eventtype": "TASK_STARTED",
"timestamp": 1413405340258
}
]
}
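As an example of scripted analysis on top of the REST API, the time between the first and last event of an entity can be derived from the timestamp fields in the response. This is a rough sketch: the function name event_span_ms is hypothetical, and it extracts timestamps with grep rather than a real JSON parser, so it assumes the field layout shown above.

```shell
#!/bin/sh
# Read a timeline entity as JSON on stdin and print the milliseconds between
# its earliest and latest event timestamps. Uses only grep, sort and shell
# arithmetic; a JSON-aware tool would be more robust. Hypothetical helper.
event_span_ms() {
  stamps=$(grep -o '"timestamp": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$' | sort -n)
  first=$(echo "$stamps" | head -n 1)
  last=$(echo "$stamps" | tail -n 1)
  echo $((last - first))
}
```

Piping the map task response above into event_span_ms (curl -s <url> | event_span_ms) prints 4995, the milliseconds between TASK_STARTED (1413405340258) and TASK_FINISHED (1413405345253).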

Summary
• Configuration
– Prevent garbage collection issues
– Configure for redundancy
– Retune configuration in response to metrics
• Documentation
– End user perspective is crucial
– Please consider contributing to Apache Hadoop documentation
• HDFS ACLs
– Implement fine-grained authorization rules on files
– Can protect against accidental file manipulations
• HDFS Snapshots
– Point-in-time image of part of the filesystem
– Useful for restoring to a prior state after accidental file manipulation
• YARN Application Timeline Server
– Provides generic and application-specific data about YARN application execution
– Useful for analyzing cluster usage patterns

Thank you, Q&A
Learn more
• Hardware Recommendations for Apache Hadoop
– http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
• Hadoop Documentation Issues
– https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20documentation
• HDFS operational and debuggability improvements
– https://issues.apache.org/jira/browse/HDFS-6185
• HDFS ACLs Blog Post
– http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/
• HDFS Snapshots Blog Post
– http://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/
• YARN Timeline Server Documentation
– http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServer.html
