Timeline Service v.2 (Hadoop Summit 2016)

(Big Data)2
How YARN Timeline Service v.2 Unlocks 360-Degree
Pla@orm Insights at Scale
Sangjin Lee @sjlee (Twi5er)
Li Lu (Hortonworks)
Vrushali Channapa5an @vrushalivc (Twi5er)

Outline
• Why v.2?
• Highlights
• Developing for Timeline Service v.2
• SeIng up Timeline Service v.2
• Milestones
• Demo

Why v.2?
• YARN Timeline Service v 1.x
• Gained good adopSon: Tez, HIVE, Pig, etc.
• Keeps improving with v 1.5 APIs and storage implementaSon
• SSll facing some fundamental challenges...

Why v.2?
• Scalability and reliability challenges
• Single instance of Timeline Server
• Storage (single local LevelDB instance)
• Usability
• Flow
• Metrics and conﬁguraSon as ﬁrst-class ciSzens
• Metrics aggregaSon up the enSty hierarchy

Highlights
v.1 v.2
Single writer/reader Timeline Server Distributed writer/collector architecture
Single local LevelDB storage* Scalable storage (HBase)
v.1 enSty model New v.2 enSty model
No aggregaSon Metrics aggregaSon
REST API Richer query REST API

Architecture
• SeparaSon of writers (“collectors”) and readers
• Distributed collectors: one collector for each app
• Dedicated RM collector for RM-generated data
• Collector discovery via RM
• Pluggable storage with HBase as default storage

Distributed collectors & readers
!meline
reader
!meline
reader
Storage
!meline
reader
AM
!meline
collector
NM
!meline reader pool
app metrics/events
container events/metrics
RM
!meline collector
app/container events
user queries
(worker node running AM)
(worker node running containers)
write ﬂow
read ﬂow

Collector discovery
RM
AM
app id => address
! start AM container
NM
3meline
collector
" node heartbeat
# allocate response
worker node
3meline
client

New enSty model
• Flows and ﬂow runs as parents of YARN applicaSon enSSes
• First-class conﬁguraSon (key-value pairs)
• First-class metrics (single-value or Sme series)
• Designed to handle mulS-cluster environment out of the box

What is a ﬂow?
• A ﬂow is a group of YARN
applicaSons that are launched as
parts of a logical app
• Oozie, Scalding, Pig, etc.
• name:
“frequent_visitor_stat”
• run id: 1466097809000
• version: “b9b9068”

ConfiguraSon and metrics
• Now explicit top-level a5ributes of
enSSes
• Fine-grained updates and queries
made possible
• “update metric A to value x”
• “query enMMes where config A = B”
container 1_1
metric: A = 10
metric: B = 100
config: "Foo" = "bar"

ConfiguraSon and metrics
• Now explicit top-level a5ributes of
enSSes
• Fine-grained updates and queries
made possible
• “update metric A to value x”
• “query enMMes where config A = B”
container 1_1
metric: A = 50
metric: B = 100
config: "Foo" = "bar"

HBase Storage
• Scalable backend
• Row Key structure
• efficient range scans
• KeyPrefixRegionSplitPolicy
• Filter pushdown
• Coprocessors for flow aggregaSon (“readless” aggregaSon)
• Cell tags for metadata (applicaSon id, aggregaSon operaSon)
• Cell Smestamps generated during put
• lei shiied with app id added to avoid overwrites

Tables in HBase
• flow run
• application
• entity
• flow activity
• app to flow

table: flow run
Row key:
clusterId!userName!
flowName!
inverted(flowRunId)
most recent flow run stored first
coprocessor enabled

table: applicaSon
Row key:
clusterId!userName!
flowName!
inverted(flowRunId)!
AppId
applicaSons within a flow run stored
together
most recent flow run stored first

table: enSty
Row key:
userName!clusterId!flowName!
inverted(flowRunId)!AppId!entityType!
entityId
enSSes within an applicaSon within a ﬂow run stored together per
type
• for example, all containers within a yarn applicaSon will be
stored together
pre-split table
stores information per entity run like info, relatesTo, relatedTo,
events, metrics, conﬁg

table: flow acSvity
Row key:
clusterId!
inverted(TopOfTheDay)!
userName!flowName
shows the flows that ran on that day
stores informaSon per flow like number of
runs, the run ids, versions

table: appToFlow
Row key:
clusterId!appId
- stores mapping of appId to
flowName and flowRunId

Metrics aggregaSon
• ApplicaSon level
• Rolls up sub-applicaSon metrics
• Performed in real Sme in the collectors in memory
• Flow run level
• Rolls up app level metrics
• Performed in HBase region servers via coprocessors
• Offline aggregaSon (TBD)
• Rolls up on user, queue, and flow offline periodically
• Phoenix tables
Container 1_1
“bytes” : 23
Container 1_2
“bytes” : 135
Container 2_1
“bytes” : 50
Container 3_1
“bytes” : 64
App1
“bytes”: 158
App2
“bytes”: 50
App3
“bytes”: 64
flow1
“bytes”: 208
flow2
“bytes”: 64
user1
“bytes”: 272
queue1
“bytes”: 272
App
aggregation
In collector
flow
aggregation
In hbase
offline
aggregation

FlowRun
Aggrega:on
via the HBase
Coprocessor
App
Metrics
Cells
in
HBase
FlowRun
Metric
Sum

App
Metrics
Cells
in
HBase
FlowRun
Metric
Sum
FlowRun
Aggrega:on
via the HBase
Coprocessor

Reader REST API: paths
• URLs under /ws/v2/Smeline
• Canonical REST style URLs: /ws/v2/Smeline/clusters/cluster_name/
users/user_name/flows/flow_name/runs/run_id
• Path elements may be omi5ed if they can be inferred
• flow context can be inferred by app id
• default cluster is assumed if cluster is omi5ed

Reader REST API: query params
• limit, createdTimeStart, createdTimeEnd: constrain the enSSes
• ﬁelds (ALL | EVENTS | INFO | CONFIGS | METRICS | RELATES_TO |
IS_RELATED_TO): limit the contents to return
• metricsToRetrieve, confsToRetrieve: further limit the contents to
return
• metricsLimit: limits the number of values in a Sme series

Reader REST API: query params
• relatesTo, isRelatedTo: filters by associaSon
• *Filters: filters by info, config, metric, event, …
• Supports complex filters including operators
• metricFilter=(((metric1 eq 50) AND (metric2 gt 40)) OR (metric1 lt
20))

Developing: TimelineClient
In your application master:
// create TimelineClient v.2 style
TimelineClient client = TimelineClient.createTimelineClient(appId);
client.init(conf);
client.start();
// bind it to AM/RM client to receive the collector address
amRMClient.registerTimelineClient(client);
// create and write timeline entities
TimelineEntity entity = new TimelineEntity();
client.putEntities(entity);
// when the app is complete, stop the timeline client
client.stop();

Developing: Flow context
In your app submitter:
ApplicationSubmissionContext appContext =
app.getApplicationSubmissionContext();
// set the flow context as YARN application tags
Set<String> tags = new HashSet<>();
tags.add(TimelineUtils.generateFlowNameTag("distributed grep"));
tags.add(TimelineUtils.generateFlowVersionTag(
"3df8b0d6100530080d2e0decf9e528e57c42a90a"));
tags.add(TimelineUtils.generateFlowRunIdTag(System.currentTimeMillis()));
appContext.setApplicationTags(tags);

SeIng up Timeline Service v.2
• Set up the HBase cluster (1.1.x)
• Add the Smeline service jar to HBase
• Install the ﬂow run coprocessor
• Create tables via TimelineSchemaCreator uSlity
• Conﬁgure the YARN cluster
• Enable Timeline Service v.2
• Add hbase-site.xml for the Smeline collector and readers
• Start the Smeline reader daemon

Milestone 1 ("Alpha 1")
• Merge discussion (YARN-2928) in progress as we speak!
✓ Complete end-to-end read/write
ﬂow
✓ Real Sme applicaSon and ﬂow
aggregaSon
✓ New enSty model
✓ HBase Storage
✓ Rich REST API
✓ IntegraSon with Distributed Shell
and MapReduce
✓ YARN generic events and system
metrics

Milestones - Future
• Milestone 2 (“Alpha 2”)
• IntegraSon with new YARN
UI
• IntegraSon with more
frameworks
• Beta
• Freeze API and storage schema
• Security
• Collectors as containers
• Storage fault tolerance
• ProducSon-ready
• MigraSon-ready

Contributors
• Li Lu, Junping Du, Vinod Kumar Vavilapalli (Hortonworks)
• Varun Saxena, Naganarasimha G. R. (Huawei)
• Sangjin Lee, Vrushali Channapa5an, Joep RoInghuis (Twi5er)
• Zhijie Shen (now at Facebook)
• The HBase and Phoenix community!

Timeline Service v.2 (Hadoop Summit 2016)

More Related Content

What's hot (18)

Viewers also liked (20)

Similar to Timeline Service v.2 (Hadoop Summit 2016) (20)

Recently uploaded (20)

Timeline Service v.2 (Hadoop Summit 2016)