Why is my Hadoop* job slow?
Rajesh Balamohan
@rajeshbalamohan
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive,
HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper,
Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the
Apache Software Foundation.
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Metrics and Monitoring
Logging and Correlation
Tracing and Analysis
Metrics and Monitoring
• Metrics as high level pointers
• Ambari Metrics System
• Ambari Grafana Integration
• HBase, HDFS, YARN Dashboards
• Metrics based alerting
Metrics as high level pointers
• Machine level metrics like CPU load
• Application level metrics like HDFS counters
• Metrics at a point in time
• Metric anomalies along a time series
• Correlated anomalies
• The problem: you need to know what to look for
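The idea of anomalies along a time series, and of correlating them across metrics, can be sketched in a few lines. This is only an illustration under stated assumptions: the trailing-window z-score rule, the window size, the threshold, and the sample values are all made up for the example and are not how Ambari Metrics detects anomalies.

```python
from statistics import mean, stdev


def find_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as an anomaly.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def correlated(anomalies_a, anomalies_b, tolerance=1):
    """Pair up anomaly indices from two series that occur within
    `tolerance` samples of each other."""
    return [(a, b) for a in anomalies_a for b in anomalies_b
            if abs(a - b) <= tolerance]


# Hypothetical samples: one machine level and one application level metric.
cpu_load = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 6.0, 1.1, 1.0]
hdfs_bytes_written = [50, 52, 49, 51, 50, 51, 400, 52, 50]

cpu_spikes = find_anomalies(cpu_load)
hdfs_spikes = find_anomalies(hdfs_bytes_written)
print(correlated(cpu_spikes, hdfs_spikes))
```

A spike that shows up in both series at the same time is a stronger pointer than either spike alone, which is the point of the "correlated anomalies" bullet.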
Ambari Metrics Service - Motivation
• Limited Ganglia capabilities
• OpenTSDB – GPL license and needs a Hadoop cluster
• Need service level aggregation as well as time based aggregation
• Alerts based on metrics system
• Ability to scale past 1,000 nodes
• Ability to perform analytics based on a use case
• Allow fine grained control over aspects like retention, collection intervals, and aggregation
• Pluggable and extensible
First version released with Ambari 2.0.0
Ambari Grafana Integration
• Open source dashboard builder integrated with AMS.
• Available from Ambari-2.2.2
• Pre-defined host level and service level (HDFS, HBase, YARN, etc.) dashboards.
• Added to Ambari through API after upgrade
HBase Dashboard
HDFS Dashboard
YARN Dashboard
Metrics based Alerting
• Top N support to quickly identify potential offenders
• Alerting based on time series
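As a rough illustration of what a Top N view gives you, the sketch below ranks hosts by a single metric value. The host names and numbers are hypothetical, and this is plain Python rather than the Ambari Metrics API.

```python
import heapq


def top_n(metric_by_host, n=3):
    """Return the n (host, value) pairs with the largest metric values."""
    return heapq.nlargest(n, metric_by_host.items(), key=lambda kv: kv[1])


# Hypothetical per-host CPU load at one point in time.
cpu_load_by_host = {
    "node01": 1.2,
    "node02": 7.9,
    "node03": 0.8,
    "node04": 5.4,
}

for host, load in top_n(cpu_load_by_host, n=2):
    print(host, load)
```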
Agenda
Metrics and Monitoring
Logging and Correlation
Tracing and Analysis
Logging and Correlation
• HDFS, YARN Audit logs
• Caller Context
• YARN Application Timeline Service
• Lineage tracking of operations across workloads
• Ambari Log Search
HDFS Audit Logs and Caller Context
FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create
src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995
dst=null perm=root:hdfs:rw-r--r-- proto=rpc
callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0
FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create
src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097
dst=null perm=root:hdfs:rw-r--r-- proto=rpc
callerContext=mr_attempt_1464484887407_0011_m_000097_0
FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create
src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095
dst=null perm=root:hdfs:rw-r--r-- proto=rpc
callerContext=mr_attempt_1464484887407_0011_m_000095_0
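The caller context makes it possible to attribute HDFS operations to the YARN application that issued them. A minimal sketch, assuming each audit record sits on a single line of space-separated key=value pairs as displayed above (a real log layout may differ, for example by using tabs): it parses the three sample records and counts create calls per application. The regular expressions and function names are illustrative, not part of Hadoop.

```python
import re
from collections import Counter

# key=value pairs; a value runs until the next " key=" or the end of the line.
FIELD = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")
# Task attempt IDs embed the cluster timestamp and application sequence number.
ATTEMPT = re.compile(r"attempt_(\d+)_(\d+)")


def parse_audit(line):
    """Split one audit record into a dict of its key=value fields."""
    return dict(FIELD.findall(line))


def application_of(caller_context):
    """Derive the YARN application ID from an attempt-style caller context."""
    m = ATTEMPT.search(caller_context)
    return f"application_{m.group(1)}_{m.group(2)}" if m else None


AUDIT_LINES = [
    "FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create "
    "src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995 "
    "dst=null perm=root:hdfs:rw-r--r-- proto=rpc "
    "callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0",
    "FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create "
    "src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097 "
    "dst=null perm=root:hdfs:rw-r--r-- proto=rpc "
    "callerContext=mr_attempt_1464484887407_0011_m_000097_0",
    "FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create "
    "src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095 "
    "dst=null perm=root:hdfs:rw-r--r-- proto=rpc "
    "callerContext=mr_attempt_1464484887407_0011_m_000095_0",
]

records = [parse_audit(line) for line in AUDIT_LINES]
creates_per_app = Counter(
    application_of(r["callerContext"]) for r in records if r["cmd"] == "create"
)
print(creates_per_app)
```

Grouping by application rather than by user is what exposes a single job that is flooding the NameNode, even when several jobs run under the same user.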
ResourceManager Audit Logs and Caller Context
resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.32 OPERATION=Submit Application
Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0001
CALLERCONTEXT=PIG-pigSmoke.sh-8a052588-0013-4e39-83b1-ebad699d8e2e
resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.30 OPERATION=Submit Application
Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0009
CALLERCONTEXT=CLI
resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.34 OPERATION=Submit Application
Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0008
CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0
resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.30 OPERATION=Submit Application
Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0012
CALLERCONTEXT=HIVE_SSN_ID:f3aadf99-9e36-494b-84a1-99b685ac344b
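The same caller context shows what submitted each application. In the records above, application_1464484887407_0008 was submitted from inside a MapReduce task of application 0007. A small sketch of that lookup, assuming one space-separated record per line as displayed on the slide (real ResourceManager audit logs may use a different separator); the helper names are illustrative.

```python
import re

# KEY=value pairs; values may contain spaces ("Submit Application Request").
FIELD = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")
ATTEMPT = re.compile(r"attempt_(\d+)_(\d+)")

RM_AUDIT_LINES = [
    "resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.32 OPERATION=Submit Application Request "
    "TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0001 "
    "CALLERCONTEXT=PIG-pigSmoke.sh-8a052588-0013-4e39-83b1-ebad699d8e2e",
    "resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.30 OPERATION=Submit Application Request "
    "TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0009 "
    "CALLERCONTEXT=CLI",
    "resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.34 OPERATION=Submit Application Request "
    "TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0008 "
    "CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0",
    "resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.30 OPERATION=Submit Application Request "
    "TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0012 "
    "CALLERCONTEXT=HIVE_SSN_ID:f3aadf99-9e36-494b-84a1-99b685ac344b",
]


def submitter_by_app(lines):
    """Map each submitted application ID to its caller context."""
    result = {}
    for line in lines:
        fields = dict(FIELD.findall(line))
        result[fields["APPID"]] = fields["CALLERCONTEXT"]
    return result


def parent_apps(submitters):
    """For applications submitted from a task attempt, name the parent application."""
    parents = {}
    for app_id, context in submitters.items():
        m = ATTEMPT.search(context)
        if m:
            parents[app_id] = f"application_{m.group(1)}_{m.group(2)}"
    return parents


submitters = submitter_by_app(RM_AUDIT_LINES)
print(parent_apps(submitters))
```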
YARN Application Timeline Service
• YARN service for fine grained application level tracing
• Enables complex metadata to be recorded as the YARN app makes progress
• Allows retrieval of this timeline data based on filters
• Can be used to drive limited online analytics and extensive post-hoc analysis
Lineage Tracking using YARN Timeline
• Timeline:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1464484887407_0013_1
dagContext: { callerId: "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2",
callerType: "HIVE_QUERY_ID",
context: "HIVE",
description: "select user, count(visit_id) as visits from users group by user order by visits" }
• Timeline:8188/ws/v1/timeline/HIVE_QUERY_ID/root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2
hiveContext: { callerId: "workflow_abcd",
callerType: "OOZIE_ID",
context: "OOZIE",
description: "Daily ETL Summary Job" }
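The two lookups above form a chain: each entity's context names the type and ID of its caller, which is the next entity to look up. A minimal sketch of walking that chain follows. It assumes a simplified context record with the four fields shown on the slide; the in-memory `SAMPLE_STORE` and the `fetch_context` callback are hypothetical stand-ins for HTTP calls to the Timeline Server, whose real JSON responses are structured differently.

```python
def timeline_path(entity_type, entity_id):
    """REST path for one timeline entity, following the URLs shown above."""
    return f"/ws/v1/timeline/{entity_type}/{entity_id}"


def trace_lineage(entity_type, entity_id, fetch_context, max_depth=10):
    """Follow callerType/callerId links upward until no context is found."""
    chain = []
    for _ in range(max_depth):
        ctx = fetch_context(entity_type, entity_id)
        if ctx is None:
            break
        chain.append((entity_type, entity_id, ctx["context"], ctx["description"]))
        entity_type, entity_id = ctx["callerType"], ctx["callerId"]
    return chain


HIVE_QUERY = "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2"

# Hypothetical stand-in for the Timeline Server, keyed by REST path.
SAMPLE_STORE = {
    timeline_path("TEZ_DAG_ID", "dag_1464484887407_0013_1"): {
        "callerId": HIVE_QUERY,
        "callerType": "HIVE_QUERY_ID",
        "context": "HIVE",
        "description": "select user, count(visit_id) as visits "
                       "from users group by user order by visits",
    },
    timeline_path("HIVE_QUERY_ID", HIVE_QUERY): {
        "callerId": "workflow_abcd",
        "callerType": "OOZIE_ID",
        "context": "OOZIE",
        "description": "Daily ETL Summary Job",
    },
}


def fetch_from_sample(entity_type, entity_id):
    return SAMPLE_STORE.get(timeline_path(entity_type, entity_id))


lineage = trace_lineage("TEZ_DAG_ID", "dag_1464484887407_0013_1", fetch_from_sample)
for step in lineage:
    print(step)
```

Starting from a slow Tez DAG, the walk reaches the Hive query that produced it and then the Oozie workflow that launched the query.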
Ambari Log Search
Ambari Log Search
Agenda
Metrics and Monitoring
Logging and Correlation
Tracing and Analysis
Tracing and Analysis
• Use Big Data methods to solve Big Data problems
• Apache Zeppelin as analytical tool
• Hive/Tez/YARN notebook for analysis
Zeppelin for Ad-hoc Analytics
YARN Analyzer
Tez Analyzer
Tez Analyzer
Tez Swimlane View
Tez UI to Download Timeline Data
Enable Task Level Debug Logs in Tez
• Enable debug logs for a specific class
• tez.task.log.level="INFO;org.apache.hadoop.hive.ql.io.orc=DEBUG;"
• For a specific task in a specific vertex
– hive --hiveconf tez.task-specific.launch.cmd-opts.list="Map 1[0]" --hiveconf tez.task-specific.log.level="INFO;org.apache=DEBUG;"
– Adds DEBUG logs for Task 0 in Map 1.
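The log level strings above follow a simple pattern: a default level, then logger=level overrides, separated by semicolons. A small sketch that assembles the same strings and the matching hive arguments is shown below. The helper names are hypothetical; only the property names and value formats come from the slide, and the trailing semicolon is kept simply because the slide's examples have one.

```python
def build_log_level(default_level, overrides):
    """Build a value such as "INFO;org.apache=DEBUG;" from a default
    level and a dict of logger name to level."""
    parts = [default_level]
    parts += [f"{logger}={level}" for logger, level in overrides.items()]
    return ";".join(parts) + ";"


def hive_args_for_task(vertex, task_index, log_level):
    """Argument list that limits the given log level to one task of one vertex.
    Passing a list (not a shell string) avoids quoting issues with "Map 1[0]"."""
    return [
        "hive",
        "--hiveconf", f"tez.task-specific.launch.cmd-opts.list={vertex}[{task_index}]",
        "--hiveconf", f"tez.task-specific.log.level={log_level}",
    ]


orc_debug = build_log_level("INFO", {"org.apache.hadoop.hive.ql.io.orc": "DEBUG"})
task_args = hive_args_for_task("Map 1", 0, build_log_level("INFO", {"org.apache": "DEBUG"}))
print(orc_debug)
print(task_args)
```

The argument list could be handed to `subprocess.run` on a machine where the hive CLI is installed; the sketch only builds it.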
Swimlanes
• TEZ-1332
Tez Analyzer
Thank You
Ad

More Related Content

What's hot (20)

Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?
DataWorks Summit/Hadoop Summit
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it
DataWorks Summit/Hadoop Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
DataWorks Summit/Hadoop Summit
 
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
Hortonworks
 
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
DataWorks Summit/Hadoop Summit
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
DataWorks Summit
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
DataWorks Summit
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Hortonworks
 
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
DataWorks Summit
 
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
Hortonworks
 
An Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present FutureAn Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present Future
DataWorks Summit/Hadoop Summit
 
IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning
DataWorks Summit/Hadoop Summit
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it
DataWorks Summit/Hadoop Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
DataWorks Summit/Hadoop Summit
 
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
DataWorks Summit/Hadoop Summit
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
Hortonworks
 
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
DataWorks Summit/Hadoop Summit
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
DataWorks Summit
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
DataWorks Summit
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Hortonworks
 
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
Hortonworks Open Connected Data Platforms for IoT and Predictive Big Data Ana...
DataWorks Summit
 
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
Hortonworks
 
An Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present FutureAn Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present Future
DataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
VMware Tanzu
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
DataWorks Summit/Hadoop Summit
 
Wall Street Derivative Risk Solutions Using Apache Geode
Wall Street Derivative Risk Solutions Using Apache GeodeWall Street Derivative Risk Solutions Using Apache Geode
Wall Street Derivative Risk Solutions Using Apache Geode
Andre Langevin
 
Driving Real Insights Through Data Science
Driving Real Insights Through Data ScienceDriving Real Insights Through Data Science
Driving Real Insights Through Data Science
VMware Tanzu
 
Troubleshooting App Health and Performance with PCF Metrics 1.2
Troubleshooting App Health and Performance with PCF Metrics 1.2Troubleshooting App Health and Performance with PCF Metrics 1.2
Troubleshooting App Health and Performance with PCF Metrics 1.2
VMware Tanzu
 
SpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
SpringCamp 2016 - Apache Geode 와 Spring Data GemfireSpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
SpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
Jay Lee
 
Enterprise Data Classification and Provenance
Enterprise Data Classification and ProvenanceEnterprise Data Classification and Provenance
Enterprise Data Classification and Provenance
DataWorks Summit/Hadoop Summit
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
DataWorks Summit/Hadoop Summit
 
Modernise your EDW - Data Lake
Modernise your EDW - Data LakeModernise your EDW - Data Lake
Modernise your EDW - Data Lake
DataWorks Summit/Hadoop Summit
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
Tyler Mitchell
 
Workload Automation + Hadoop?
Workload Automation + Hadoop?Workload Automation + Hadoop?
Workload Automation + Hadoop?
DataWorks Summit/Hadoop Summit
 
SQL and Search with Spark in your browser
SQL and Search with Spark in your browserSQL and Search with Spark in your browser
SQL and Search with Spark in your browser
DataWorks Summit/Hadoop Summit
 
Pivotal Big Data Roadshow
Pivotal Big Data Roadshow Pivotal Big Data Roadshow
Pivotal Big Data Roadshow
VMware Tanzu
 
Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?
VMware Tanzu
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
DataWorks Summit/Hadoop Summit
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
VMware Tanzu
 
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
DataWorks Summit/Hadoop Summit
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
Ensuring Cloud Native Success: Organization Transformation
Ensuring Cloud Native Success:  Organization TransformationEnsuring Cloud Native Success:  Organization Transformation
Ensuring Cloud Native Success: Organization Transformation
VMware Tanzu
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
DataWorks Summit/Hadoop Summit
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
VMware Tanzu
 
Wall Street Derivative Risk Solutions Using Apache Geode
Wall Street Derivative Risk Solutions Using Apache GeodeWall Street Derivative Risk Solutions Using Apache Geode
Wall Street Derivative Risk Solutions Using Apache Geode
Andre Langevin
 
Driving Real Insights Through Data Science
Driving Real Insights Through Data ScienceDriving Real Insights Through Data Science
Driving Real Insights Through Data Science
VMware Tanzu
 
Troubleshooting App Health and Performance with PCF Metrics 1.2
Troubleshooting App Health and Performance with PCF Metrics 1.2Troubleshooting App Health and Performance with PCF Metrics 1.2
Troubleshooting App Health and Performance with PCF Metrics 1.2
VMware Tanzu
 
SpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
SpringCamp 2016 - Apache Geode 와 Spring Data GemfireSpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
SpringCamp 2016 - Apache Geode 와 Spring Data Gemfire
Jay Lee
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
DataWorks Summit/Hadoop Summit
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
Tyler Mitchell
 
Pivotal Big Data Roadshow
Pivotal Big Data Roadshow Pivotal Big Data Roadshow
Pivotal Big Data Roadshow
VMware Tanzu
 
Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?Why Domain-Driven Design and Reactive Programming?
Why Domain-Driven Design and Reactive Programming?
VMware Tanzu
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
DataWorks Summit/Hadoop Summit
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
VMware Tanzu
 
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
DataWorks Summit/Hadoop Summit
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
Ensuring Cloud Native Success: Organization Transformation
Ensuring Cloud Native Success:  Organization TransformationEnsuring Cloud Native Success:  Organization Transformation
Ensuring Cloud Native Success: Organization Transformation
VMware Tanzu
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
DataWorks Summit/Hadoop Summit
 
Ad

Similar to Why is my Hadoop* job slow? (20)

Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?
Bikas Saha
 
Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?
Bikas Saha
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Data Con LA
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, Future
Hortonworks
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtech
Yuta Imai
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
Hortonworks
 
Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?
DataWorks Summit/Hadoop Summit
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
Madhan Neethiraj
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
DataWorks Summit/Hadoop Summit
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
ahortonworks
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
DataWorks Summit/Hadoop Summit
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
DataWorks Summit
 
Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018
alanfgates
 
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for EveryoneStreamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
DataWorks Summit/Hadoop Summit
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
 
Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?
Bikas Saha
 
Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?
Bikas Saha
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Data Con LA
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, Future
Hortonworks
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtech
Yuta Imai
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
Hortonworks
 
Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?
DataWorks Summit/Hadoop Summit
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
Madhan Neethiraj
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
ahortonworks
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
DataWorks Summit/Hadoop Summit
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
DataWorks Summit
 
Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018Standalone metastore-dws-sjc-june-2018
Standalone metastore-dws-sjc-june-2018
alanfgates
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
 
Why is my Hadoop* job slow?

  • 1. Why is my Hadoop* job slow? Rajesh Balamohan @rajeshbalamohan *Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
  • 2. Agenda (© Hortonworks Inc. 2011 – 2016. All Rights Reserved)
      - Metrics and Monitoring
      - Logging and Correlation
      - Tracing and Analysis
  • 3. Metrics and Monitoring
      - Metrics as high level pointers
      - Ambari Metrics System
      - Ambari Grafana Integration
      - HBase, HDFS, YARN Dashboards
      - Metrics based alerting
  • 4. Metrics as high level pointers
      - Machine level metrics like CPU load
      - Application level metrics like HDFS counters
      - Metrics at a point in time
      - Metrics anomalies along a time series
      - Correlated anomalies
      - The problem: you need to know what to look for
  • 5. Ambari Metrics Service - Motivation
      - Limited Ganglia capabilities
      - OpenTSDB – GPL license and needs a Hadoop cluster
      - Need service level aggregation as well as time based aggregation
      - Alerts based on the metrics system
      - Ability to scale past 1000 nodes
      - Ability to perform analytics based on a use case
      - Allow fine grained control over aspects like retention, collection intervals, aggregation
      - Pluggable and extensible
      - First version released with Ambari 2.0.0
  • 6. Ambari Grafana Integration
      - Open source dashboard builder integrated with AMS
      - Available from Ambari 2.2.2
      - Pre-defined host level and service level (HDFS, HBase, YARN etc.) dashboards
      - Added to Ambari through API after upgrade
  • 7. HBase Dashboard
  • 8. HDFS Dashboard
  • 9. YARN Dashboard
  • 10. Metrics based Alerting
      - Top N support to quickly identify potential offenders
      - Alerting based on time series
  • 11. Agenda
      - Metrics and Monitoring
      - Logging and Correlation
      - Tracing and Analysis
  • 12. Logging and Correlation
      - HDFS, YARN audit logs
      - Caller Context
      - YARN Application Timeline Service
      - Lineage tracking of operations across workloads
      - Ambari Log Search
  • 13. HDFS Audit Logs and Caller Context
      FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0
      FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000097_0
      FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000095_0
  • 14. ResourceManager Audit Logs and Caller Context
      resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.32 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0001 CALLERCONTEXT=PIG-pigSmoke.sh-8a052588-0013-4e39-83b1-ebad699d8e2e
      resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0009 CALLERCONTEXT=CLI
      resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.34 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0008 CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0
      resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0012 CALLERCONTEXT=HIVE_SSN_ID:f3aadf99-9e36-494b-84a1-99b685ac344b
  • 15. YARN Application Timeline Service
      - YARN service for fine grained application level tracing
      - Enables complex metadata to be recorded as the YARN app makes progress
      - Allows retrieval of this timeline data based on filters
      - Can be used to drive limited online analytics and extensive post-hoc analysis
  • 16. Lineage Tracking using YARN Timeline
      - Timeline:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1464484887407_0013_1
        dagContext: { callerId: "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2", callerType: "HIVE_QUERY_ID", context: "HIVE", description: "select user, count(visit_id) as visits from users group by user order by visits" }
      - Timeline:8188/ws/v1/timeline/HIVE_QUERY_ID/root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2
        hiveContext: { callerId: "workflow_abcd", callerType: "OOZIE_ID", context: "OOZIE", description: "Daily ETL Summary Job" }
  • 17. Ambari Log Search
  • 18. Ambari Log Search
  • 19. Agenda
      - Metrics and Monitoring
      - Logging and Correlation
      - Tracing and Analysis
  • 20. Tracing and Analysis
      - Use Big Data methods to solve Big Data problems
      - Apache Zeppelin as analytical tool
      - Hive/Tez/YARN notebook for analysis
  • 21. Zeppelin for Ad-hoc Analytics
  • 22. YARN Analyzer
  • 23. Tez Analyzer
  • 24. Tez Analyzer
  • 25. Tez Swimlane View
  • 26. Tez UI to Download Timeline Data
  • 27. Enable Task Level Debug Logs in Tez
      - Enable debug logs for a specific class:
        tez.task.log.level="INFO;org.apache.hadoop.hive.ql.io.orc=DEBUG;"
      - For a specific task in a specific vertex:
        hive --hiveconf tez.task-specific.launch.cmd-opts.list="Map 1[0]" --hiveconf tez.task-specific.log.level="INFO;org.apache=DEBUG;"
        Adds DEBUG logs for Task 0 in Map 1.
  • 28. Swimlanes
      - TEZ-1332
  • 29. Tez Analyzer
  • 30. Thank You
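
The HDFS audit lines on slide 13 are plain key=value records, so the "Top N offenders" idea from slide 10 can be tried on them with very little code. What follows is only a rough sketch in Python, not something shown in the talk: `parse_audit_line`, `application_of` and `top_callers` are hypothetical helper names, and the mapping from a task attempt ID to an application ID assumes the `attempt_<clusterTimestamp>_<appSequence>_...` naming visible in the sample lines.

```python
import re
from collections import Counter

# Sample lines copied from slide 13.
SAMPLE_LINES = [
    "FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create "
    "src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995 "
    "dst=null perm=root:hdfs:rw-r--r-- proto=rpc "
    "callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0",
    "FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create "
    "src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097 "
    "dst=null perm=root:hdfs:rw-r--r-- proto=rpc "
    "callerContext=mr_attempt_1464484887407_0011_m_000097_0",
    "FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create "
    "src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095 "
    "dst=null perm=root:hdfs:rw-r--r-- proto=rpc "
    "callerContext=mr_attempt_1464484887407_0011_m_000095_0",
]

# A value runs up to the next " key=" or the end of the line, which keeps
# "ugi=userA (auth:SIMPLE)" together as a single value.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_audit_line(line):
    """Return the key=value fields of one FSNamesystem.audit line as a dict."""
    _, _, payload = line.partition("FSNamesystem.audit:")
    return dict(_PAIR.findall(payload.strip()))


def application_of(caller_context):
    """Map a task-attempt caller context to an application ID (assumed naming)."""
    match = re.search(r"attempt_(\d+)_(\d+)_", caller_context)
    if match is None:
        return caller_context  # e.g. "CLI": keep the context as-is
    return f"application_{match.group(1)}_{match.group(2)}"


def top_callers(lines, n=5):
    """Count audit entries per application and return the n busiest."""
    counts = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if "callerContext" in fields:
            counts[application_of(fields["callerContext"])] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_callers(SAMPLE_LINES))
    # [('application_1464484887407_0011', 2), ('application_1464484887407_0009', 1)]
```

Grouping by application rather than by raw caller context is a design choice made here for illustration; grouping by `ugi`, `cmd` or `src` works the same way once the line is parsed into a dict.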

Editor's Notes

  • #3: We will look at three aspects: metrics and monitoring, logging and correlation, and tracing and analysis. Metrics sit at the bottom level: counters and system-level metrics gathered at runtime and exposed upward for processing. Logs are mainly at the service level, for example HDFS, YARN, Hive, HBase. Tracing and analysis is at the application level, where you are running a specific Hive job or Hadoop job.
  • #4: When we talk about metrics and monitoring, recall the saying "there is no smoke without fire"; metrics are the equivalent of the smoke. They indicate where the problem could be. Usually metrics do not go wrong unless something is badly broken.
  • #5: So what do I mean by metrics as high-level pointers? Let me start with a simple example of one of the issues we dealt with. <customer story>
  • #10: Similarly, there is a YARN dashboard. YARN is the resource management layer: you arrange a set of queues in the cluster, each with a certain amount of memory and CPU allocated to it, you run multiple jobs, and YARN arbitrates CPU and memory among those jobs.
  • #13: HDFS and YARN provide audit logs to keep track of the various state changes in the system.
  • #19: You can query Log Search directly. Here we are searching for the top 5 files being used.
  • #21: You have lots of data distributed across systems: metrics, audit logs, lineage and so on. It is possible to use big data tools themselves to mine these logs.
  • #22: You could fetch data from the Ambari Metrics System, from audit logs, from the YARN Application Timeline Service and so on, and start mining it via the notebooks.
  • #31: To summarize: many kinds of data are available for debugging. There are counters and system counters emitted at the stack level. There are audit logs that provide information about the kinds of operations happening in the system. And there is the Application Timeline Service, which captures metadata about events. Each of these can be correlated with the others to get meaningful information.
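
The correlation described in the summary can be sketched with the two timeline payloads from slide 16: each entity's context names its caller through `callerType` and `callerId`, so following those links walks from a Tez DAG up to the workflow that launched it. The Python sketch below is a hedged illustration, not the talk's tooling. `trace_lineage` and the in-memory `CONTEXTS` table are hypothetical; a real setup would presumably fetch each context from the Timeline endpoint shown on the slide rather than from a dict.

```python
# Contexts copied from slide 16, keyed by (entity type, entity id).
# The dict stands in for the Timeline service and is an assumption of this sketch.
CONTEXTS = {
    ("TEZ_DAG_ID", "dag_1464484887407_0013_1"): {
        "callerId": "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2",
        "callerType": "HIVE_QUERY_ID",
        "context": "HIVE",
        "description": "select user, count(visit_id) as visits from users "
                       "group by user order by visits",
    },
    ("HIVE_QUERY_ID", "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2"): {
        "callerId": "workflow_abcd",
        "callerType": "OOZIE_ID",
        "context": "OOZIE",
        "description": "Daily ETL Summary Job",
    },
}


def trace_lineage(entity_type, entity_id, fetch_context, max_hops=10):
    """Follow callerType/callerId links upward; return the visited entities.

    fetch_context(entity_type, entity_id) returns a context dict, or None
    when nothing is recorded for that entity. max_hops bounds the walk so a
    cyclic chain cannot loop forever.
    """
    chain = []
    for _ in range(max_hops):
        chain.append((entity_type, entity_id))
        context = fetch_context(entity_type, entity_id)
        if not context:
            break  # no recorded caller: top of the chain
        entity_type = context.get("callerType")
        entity_id = context.get("callerId")
        if not entity_type or not entity_id:
            break
    return chain


if __name__ == "__main__":
    lookup = lambda etype, eid: CONTEXTS.get((etype, eid))
    for hop in trace_lineage("TEZ_DAG_ID", "dag_1464484887407_0013_1", lookup):
        print(hop)
```

Passing the lookup in as a function keeps the walk independent of where the contexts come from, so the same loop could be pointed at saved timeline data downloaded from the Tez UI.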