Copyright ©2014 Treasure Data. All Rights Reserved.
Treasure Data on The YARN
Ryu Kobayashi
Hadoop Conference Japan 2014
8 July 2014
Who am I?
• Ryu Kobayashi
• @ryu_kobayashi
• https://github.com/ryukobayashi
• Treasure Data, Inc.
• Software Engineer
• Background
• Hadoop, Cassandra, Machine Learning, ...
• I developed the Huahin Framework (for Hadoop): http://huahinframework.org/
What is Treasure Data?
Our Service
• Data sources: Web Log, App Log, Sensor, RDBMS, CRM, ERP
• Data Collection: Open-Source Log Collector, Bulk Loader (CSV / TSV, MySQL, Postgres, Oracle, etc.)
• Upload: Streaming Upload, Bulk Upload, Parallel Upload
• Data Warehouse: Columnar Storage + Hadoop MapReduce (schema-less)
• Data Analysis: SQL (HiveQL) or Pig
• Access: TD command / Web Console, REST API, JDBC / ODBC
• BI Tools: Tableau, QlikView, Pentaho, Excel, etc.
• Result push: External Service / Storage (Custom App, RDBMS, FTP, etc.)
Our Query Language
• SQL (HiveQL) or Pig
Our System
[Diagram labels: Hadoop Cluster, PlazmaDB]
HDFS is not used
• Customized Hadoop
• Customized Hive
• Customized Pig
• Customized Impala
• Customized Presto
We have 4 production Hadoop clusters
• user1, user4, user5, …
• user2, user9, user34, …
• user10, user40, user102, …
• user50, user88, user1023, …
Our Scheduler and Queue
[Diagram labels: Queue, Scheduler, Hadoop Cluster, Hadoop Cluster]
We have 4 production Hadoop clusters
and a Hadoop (YARN) cluster
[Diagram label: YARN Cluster]
MRv1 and YARN Queue
[Diagram labels: Queue, Hadoop Cluster, Hadoop Cluster]
Our Service
• About 4700 users
• About 6 trillion records
• About 12 million jobs
• About 40,000 jobs per day
What is YARN?
YARN (Yet Another Resource Negotiator)
Architecture
• MRv1
• JobTracker
• TaskTracker
• YARN
• ResourceManager
• NodeManager
• ApplicationMaster
• Job History Server
  (If it is not installed, the job log history cannot be viewed)
Note!!!
Use Hadoop 2.4.0
or later!!!
• Versions that must not be used
• Apache Hadoop 2.2.0
• Apache Hadoop 2.3.0
• HDP 2.0 (2.2.0-based)
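As a rough illustration of the rule above, the check could be scripted as follows. This is only a sketch: the function names are hypothetical, only the Apache release number is compared, and the code cannot tell whether a vendor build (for example a 2.3.0-based distribution with backported patches) already contains the scheduler fixes.

```python
import re

# Minimum Apache Hadoop release recommended in this talk.
MIN_RECOMMENDED = (2, 4, 0)


def parse_version(text):
    """Extract (major, minor, patch) from strings such as '2.4.1' or 'Hadoop 2.3.0-cdh5.0.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def is_recommended_apache_version(text):
    """Return True when the Apache release number is 2.4.0 or later."""
    return parse_version(text) >= MIN_RECOMMENDED


if __name__ == "__main__":
    for version in ("2.2.0", "2.3.0", "2.4.0", "2.4.1"):
        print(version, is_recommended_apache_version(version))
```

A vendor string such as a 2.3.0-based build is reported as not recommended by this check, so for patched distributions the vendor's release notes remain the real source of truth.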
• Current releases
• Apache Hadoop 2.4.1
• CDH 5.0.2 (2.3.0-based, with patches)
• HDP 2.1 (2.4.0-based)
• Why should they not be used?
• Capacity Scheduler
• There is a bug
• Fair Scheduler
• There is a bug
• What kind of bug?
• Either scheduler can run into a deadlock
Distribution
• CDH 5.0.2
• Red Hat/CentOS/Oracle 5
• Red Hat/CentOS/Oracle 6
• Ubuntu/Debian
• HDP 2.1
• Red Hat/CentOS/SLES (64-bit)
• (Ubuntu 12 is already in the repository)
• Windows Server 2008 & 2012
Several configuration file settings have changed
(from MRv1 to YARN)
reference: http://goo.gl/vBIYQP
Deprecated Properties
Other notes on the configuration files
• hadoop-conf-pseudo does not work
• It contains some mistakes
e.g. yarn.nodemanager.aux-services:
mapreduce.shuffle -> mapreduce_shuffle
• 2.2.0 vs. 2.4.0
• There are some differences between them
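The aux-services fix noted above could be applied with a small script like the one below. It is a minimal sketch, not an official tool: the function name `fix_aux_services` and the sample XML are made up for illustration, and it assumes the usual Hadoop `<configuration>/<property>` file layout.

```python
import xml.etree.ElementTree as ET

OLD_VALUE = "mapreduce.shuffle"   # old service name from the slide
NEW_VALUE = "mapreduce_shuffle"   # corrected service name from the slide


def fix_aux_services(yarn_site_xml):
    """Return yarn-site.xml text with the old aux-services name replaced."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() != "yarn.nodemanager.aux-services":
            continue
        value = prop.find("value")
        if value is None or value.text is None:
            continue
        # The property holds a comma-separated list of service names.
        names = [name.strip() for name in value.text.split(",")]
        value.text = ",".join(NEW_VALUE if n == OLD_VALUE else n for n in names)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input file content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(fix_aux_services(SAMPLE))
```

Note that `xml.etree.ElementTree` drops XML comments and the XML declaration when re-serializing, so for a real file it may be safer to only report the offending value and edit it by hand.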
What should we do?
• Copy the configuration files from the
CDH VM or HDP VM
• Use Ambari or Cloudera
Manager
• Or work hard and do it on your own!
Slots have changed (from MRv1 to YARN)
• MRv1
• map slot, reduce slot
• YARN (MRv2)
• resource (container)
MRv1
mapred-site.xml
• mapred.tasktracker.map.tasks.maximum
• mapred.tasktracker.reduce.tasks.maximum
scheduler.xml
• maxMaps, minMaps
• maxReduces, minReduces
YARN (MRv2)
yarn-site.xml
• yarn.nodemanager.resource.memory-mb
• (yarn.nodemanager.vmem-pmem-ratio)
• (yarn.scheduler.minimum-allocation-mb)
mapred-site.xml
• yarn.app.mapreduce.am.resource.mb
• mapreduce.map.memory.mb
• mapreduce.reduce.memory.mb
fair-scheduler.xml
• maxResources, minResources
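To make the split between the files concrete, the settings can be rendered into the standard Hadoop `<configuration>` XML layout. The sketch below is illustrative only: `render_hadoop_config` is a hypothetical helper, and the memory values are the ones used in the resource example of this deck, not tuning advice.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties):
    """Render a dict of settings as Hadoop-style <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative values only (assumed for this sketch).
YARN_SITE = {
    "yarn.nodemanager.resource.memory-mb": 4096,
}
MAPRED_SITE = {
    "yarn.app.mapreduce.am.resource.mb": 1024,
    "mapreduce.map.memory.mb": 1024,
    "mapreduce.reduce.memory.mb": 2048,
}

if __name__ == "__main__":
    print(render_hadoop_config(YARN_SITE))    # would go into yarn-site.xml
    print(render_hadoop_config(MAPRED_SITE))  # would go into mapred-site.xml
```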
YARN Resource Management
yarn.nodemanager.resource.memory-mb =>
Memory that the NodeManager can allocate to containers

yarn.app.mapreduce.am.resource.mb =>
Memory that the ApplicationMaster uses

mapreduce.map.memory.mb =>
Memory that each Map task uses

mapreduce.reduce.memory.mb =>
Memory that each Reduce task uses
YARN Resource Example
yarn.nodemanager.resource.memory-mb = 4096
yarn.app.mapreduce.am.resource.mb = 1024
mapreduce.map.memory.mb = 1024
mapreduce.reduce.memory.mb = 2048

MRv2 Application
	ApplicationMaster => 1
	Mapper => 3
	Reducer => 1
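One way to read these numbers is as simple memory arithmetic on a single node: after the ApplicationMaster takes its 1024 MB, 3072 MB remain, which fits 3 mappers of 1024 MB or 1 reducer of 2048 MB. The sketch below reproduces that calculation. It assumes one node, one application, memory as the only resource, and no rounding to scheduler allocation increments; `max_concurrent_tasks` is a hypothetical helper name.

```python
NODE_MEMORY_MB = 4096     # yarn.nodemanager.resource.memory-mb
AM_MEMORY_MB = 1024       # yarn.app.mapreduce.am.resource.mb
MAP_MEMORY_MB = 1024      # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 2048   # mapreduce.reduce.memory.mb


def max_concurrent_tasks(node_mb, am_mb, task_mb, am_count=1):
    """Number of task containers of size task_mb that fit next to the ApplicationMaster(s)."""
    remaining = node_mb - am_count * am_mb
    if remaining < 0:
        raise ValueError("ApplicationMaster does not fit on the node")
    return remaining // task_mb


if __name__ == "__main__":
    mappers = max_concurrent_tasks(NODE_MEMORY_MB, AM_MEMORY_MB, MAP_MEMORY_MB)
    reducers = max_concurrent_tasks(NODE_MEMORY_MB, AM_MEMORY_MB, REDUCE_MEMORY_MB)
    print("mappers:", mappers)
    print("reducers:", reducers)
```

Under this reading the mapper and reducer counts are alternatives rather than a simultaneous total: three mappers plus one reducer plus the ApplicationMaster would need 6144 MB, more than the 4096 MB available.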
YARN Resource Example
In addition to this (e.g. Fair Scheduler):
	minResources
	maxResources
	maxRunningApps
	schedulingPolicy
Changes to the Fair Scheduler
In addition to this (e.g. Fair Scheduler):
	pool -> queue
	user.maxRunningJobs -> user.maxRunningApps
	userMaxJobsDefault -> userMaxAppsDefault
	etc…
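The renames listed above are mechanical, so an existing MRv1 allocation file could be converted with a short script. The following is a sketch under assumptions: the function name and the sample allocation file are invented for illustration, and only the three renames from the slide are handled. Slot-based settings such as minMaps are left untouched because they have no one-to-one resource equivalent.

```python
import xml.etree.ElementTree as ET


def rename_allocation_elements(allocations_xml):
    """Apply the MRv1 -> YARN Fair Scheduler element renames listed on the slide."""
    root = ET.fromstring(allocations_xml)
    for element in root.iter():
        if element.tag == "pool":
            element.tag = "queue"
        elif element.tag == "userMaxJobsDefault":
            element.tag = "userMaxAppsDefault"
        elif element.tag == "user":
            # Only the per-user limit is renamed, as on the slide.
            for child in element:
                if child.tag == "maxRunningJobs":
                    child.tag = "maxRunningApps"
    return ET.tostring(root, encoding="unicode")


# Hypothetical MRv1 allocation file, for illustration only.
SAMPLE = """<allocations>
  <pool name="sample_pool"><minMaps>5</minMaps></pool>
  <user name="sample_user"><maxRunningJobs>6</maxRunningJobs></user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>"""

if __name__ == "__main__":
    print(rename_allocation_elements(SAMPLE))
```

After the rename, the slot-based limits still have to be rewritten by hand as minResources and maxResources.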
yarn.nodemanager.resource.memory-mb
YARN Scheduler Management
YARN Resource Management
e.g.
	Use the hdp-configuration-utils.py script
		http://goo.gl/L2hxyq

	Use Ambari
		http://ambari.apache.org/
		(Ubuntu 12 is not supported yet;
		support is coming soon)
YARN Container Executor
DefaultContainerExecutor
• Process-based container launch
• Same as conventional MRv1

LinuxContainerExecutor
• Linux only
• Some restrictions
• cgroups, etc.
YARN Directory Structure
MRv1
• Initial setup is required

YARN
• Initial setup is required
• Some locations changed from MRv1 (e.g. /tmp/hadoop-yarn/)
What should we do?
• Refer to the HDFS directory layout of the
CDH VM and HDP VM
• Use Ambari or Cloudera
Manager
• Or work hard and do it on your own!
Enjoy the YARN!!!
We are hiring!!!
Thanks!!!
