Flexible and Extensible Big Data Processing

Onyx:
A Flexible and Extensible
Data Processing System
전병곤, 김주연, 송원욱
Software Platform Lab
Joint work with 양영석, 이산하, 서장호, 어정윤, 이계원, 엄태건, 이우연,
이윤성, 정주성, 하현민, 정은지, 김수정, 유경인, 신동진

Data Processing from 10,000 Feet
Data Processing Application
Data Processing Framework
Resource Environment
Spark, Flink,
Hadoop MR,
Dryad, Tez,
...

Spark, Flink,
Hadoop MR,
Dryad, Tez,
...
Existing frameworks perform poorly in new resource
environments (e.g., disaggregation, transient resources)

Disaggregation
Compute Storage
(Ref. OpenCompute)
Intermediate data generated from compute nodes
should be written to and read from storage nodes.

Transient Resources
Preemption!
Task preemption can cause expensive recomputation.

Cross Datacenter
Wide-area network bandwidth is scarce and expensive

Spark, Flink,
Hadoop MR,
Dryad, Tez,
...
It is hard to add new application optimization features
to existing frameworks.

Dynamic Optimization
Dynamic skew handling
Optimizing job execution based on its characteristics
Adapting execution to resource elasticity

Onyx
Key observation: current data processing frameworks
are not flexible and extensible.
=> Onyx: A new flexible and extensible data processing
system

Onyx Architecture
Dataflow Program
Onyx Compiler
Onyx Runtime
Cluster

Onyx Compiler
Beam Program → Beam Frontend → IR DAG
Spark Program → Spark Frontend → IR DAG
IR DAG → Onyx Backend → Execution Plan

IR (Intermediate Representation) DAG: Program-agnostic DAG with Annotations
Vertex Labels
● Type: Operator/Loop
● Placement: GPUNode/ReservedNode/TransientNode/Any
● Parallelism
Edge Labels
● Type: 1:1/Broadcast/Shuffle
● Mode: Push/Pull
● Storage: Memory/Disk/RemoteDisk
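
The label sets above can be pictured as plain data attached to vertices and edges. The following is a minimal sketch, not Onyx's actual classes (Onyx is written in Java); the names `Vertex`, `Edge`, `IRDag`, and `Placement`, and the default label values, are hypothetical and chosen only to mirror the labels on this slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Placement(Enum):
    GPU = "GPUNode"
    RESERVED = "ReservedNode"
    TRANSIENT = "TransientNode"
    ANY = "Any"


@dataclass
class Vertex:
    name: str
    type: str = "Operator"               # "Operator" or "Loop"
    placement: Placement = Placement.ANY
    parallelism: Optional[int] = None    # left unset until a pass fills it in


@dataclass
class Edge:
    src: str
    dst: str
    type: str                            # "1:1", "Broadcast", or "Shuffle"
    mode: str = "Pull"                   # "Push" or "Pull" (default is an assumption)
    storage: str = "Disk"                # "Memory", "Disk", or "RemoteDisk"


@dataclass
class IRDag:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, v: Vertex) -> None:
        self.vertices[v.name] = v

    def add_edge(self, e: Edge) -> None:
        # Both endpoints must already be part of the DAG.
        assert e.src in self.vertices and e.dst in self.vertices
        self.edges.append(e)

    def incoming(self, name: str) -> List[Edge]:
        return [e for e in self.edges if e.dst == name]


# A two-vertex MapReduce job with a disk-backed, pulled shuffle edge.
dag = IRDag()
dag.add_vertex(Vertex("Map", parallelism=100))
dag.add_vertex(Vertex("Reduce", parallelism=50))
dag.add_edge(Edge("Map", "Reduce", type="Shuffle", mode="Pull", storage="Disk"))
```

Keeping the labels as ordinary fields means a pass can change execution behavior by rewriting annotations without touching the operators themselves.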

MapReduce IR DAG Example
Classical MapReduce: Map → Reduce over an edge labeled Shuffle,Pull,Disk
Small-scale MapReduce: Map → Reduce over an edge labeled Shuffle,Push,Memory

Compiler Passes
Transform an IR DAG into an optimized IR DAG after a series of “passes”
Compile-time annotation pass examples
● Parallelism pass
● Executor placement pass
● Data flow model pass
● Stage partitioning pass

Compiler Passes
Transform an IR DAG into an optimized IR DAG after a series of “passes”
Compile-time reshaping pass examples
● Loop extraction pass
● Loop fusion pass (loop optimization)
● Common subexpression elimination pass
● Data skew reshaping pass
Runtime pass example
● Data skew runtime pass
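
A pass can be read as a function from an IR DAG to a new IR DAG, so a series of passes is function composition. The sketch below assumes a dictionary-based DAG and shows one annotation pass (fills in a label, structure unchanged) and one reshaping pass (changes structure). The function names, the default parallelism of 4, and the idea of splicing a metric vertex into shuffle edges are illustrative assumptions, not Onyx's actual pass implementations.

```python
import copy
from functools import reduce


def parallelism_pass(dag, default=4):
    """Annotation pass: fill in a missing parallelism label; structure is unchanged."""
    out = copy.deepcopy(dag)
    for labels in out["vertices"].values():
        labels.setdefault("parallelism", default)
    return out


def metric_collection_pass(dag):
    """Reshaping pass: splice a metric-collecting vertex into every shuffle edge."""
    out = copy.deepcopy(dag)
    new_edges = []
    for src, dst, labels in out["edges"]:
        if labels["type"] == "Shuffle":
            probe = f"Metric({src}->{dst})"          # hypothetical vertex name
            out["vertices"][probe] = {"type": "Operator"}
            new_edges.append((src, probe, {"type": "1:1"}))
            new_edges.append((probe, dst, labels))
        else:
            new_edges.append((src, dst, labels))
    out["edges"] = new_edges
    return out


def run_passes(dag, passes):
    """Apply passes left to right; each maps an IR DAG to a new IR DAG."""
    return reduce(lambda d, p: p(d), passes, dag)


ir = {
    "vertices": {
        "Map": {"type": "Operator"},
        "Reduce": {"type": "Operator", "parallelism": 50},
    },
    "edges": [("Map", "Reduce", {"type": "Shuffle"})],
}
optimized = run_passes(ir, [metric_collection_pass, parallelism_pass])
```

Because every pass returns a fresh DAG, the input `ir` is left untouched and passes can be reordered or dropped independently.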

Compiler to Runtime
Type: “Map” Operator
Placement: “Reserved” Node
Parallelism: 100
Shuffle,Pull,Disk
Type: “Reduce” Operator
Placement: “Reserved” Node
Parallelism: 50
Reduce Stage
Index
Map Stage
Index
Optimized IR DAG

Compiler to Runtime
Execution Plan
Map Stage: “Map” Tasks × 100
Reduce Stage: “Reduce” Tasks × 50
I/O channels for intermediate data flow between tasks
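
To make the conversion concrete, the sketch below expands each stage into as many tasks as its Parallelism label says and wires I/O channels between them. It assumes that Shuffle and Broadcast edges connect every source task to every destination task, while a 1:1 edge pairs tasks by index; `expand_stage` and `channels` are hypothetical helper names.

```python
def expand_stage(name, parallelism):
    """One task per unit of parallelism, e.g. Map-0 .. Map-99."""
    return [f"{name}-{i}" for i in range(parallelism)]


def channels(src_tasks, dst_tasks, edge_type):
    """Return the (source task, destination task) pairs that need an I/O channel."""
    if edge_type == "1:1":
        assert len(src_tasks) == len(dst_tasks), "1:1 needs equal parallelism"
        return list(zip(src_tasks, dst_tasks))
    # Shuffle and Broadcast: every source task talks to every destination task.
    return [(s, d) for s in src_tasks for d in dst_tasks]


map_tasks = expand_stage("Map", 100)
reduce_tasks = expand_stage("Reduce", 50)
shuffle_channels = channels(map_tasks, reduce_tasks, "Shuffle")
```

With the labels from the previous slide this yields 100 Map tasks, 50 Reduce tasks, and 100 × 50 = 5,000 shuffle channels.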

Distributed Execution in Onyx Runtime
Stage
Executor Executor Executor Executor
Master

Master Stage
TaskGroup(Tasks)

Master Stage

Onyx in Action
● Onyx implementation
● Onyx compiler and runtime components
● Onyx job execution
● Onyx dynamic optimization

Onyx Implementation
● Programming Models:
○ Apache Beam applications supported
○ Spark applications coming up shortly
● Implemented on Apache REEF
○ which uses YARN or Mesos for resource management
● Implemented using Java 8
○ makes good use of lambdas and streams

Job Execution Demo
Will show how:
1. Job execution can be controlled flexibly and
2. Job execution properties can be extended using:
a. Annotation Pass
b. Policy
3. An iterative part of a job can be represented using:
a. LoopExtraction Pass (a Reshaping Pass)
4. Status of a running job can be monitored using:
a. a Web UI
Demo workloads: MapReduce, ALS

MapReduce
We will show two executions of MapReduce using different
settings:
● Intermediate data is saved on disk, and pulled by the reducers
● Intermediate data is saved in memory, and pushed to the reducers

Demo
Map Data on Disk, Pulled
Map Stage → Reduce Stage over an edge labeled Shuffle,Pull,Disk

Demo
Map Data in Memory, Pushed
Map Stage → Reduce Stage over an edge labeled Shuffle,Push,Memory

Alternating Least Squares Example
● Alternating Least Squares (ALS) is an ML algorithm commonly used
in recommendation systems.
● Most ML algorithms are iterative processes
=> ALS is one of them!

Naively…
A set of operators that executes the ALS algorithm, repeated N times:
(Read input data) → Iteration 1 → Iteration 2 → ... → Iteration N → (Write output)
But what if we want to decide this “N” according to some condition
(e.g., model convergence in ML)?

Something special we have for the ALS example: Loops!
(Read input data) → LoopVertex with termination condition → (Write output)
(Read input data) → Iteration 1 → Iteration 2 → ... → Iteration N → (Write output)
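
The difference between a fixed chain of N iterations and a loop with a termination condition can be sketched as follows. This is only an illustration of the idea; `run_loop`, its parameters, and the toy "error halves every iteration" body are hypothetical and stand in for a real ALS iteration.

```python
def run_loop(body, state, should_terminate, max_iterations=100):
    """Repeat `body` until the termination condition holds or the cap is reached."""
    iterations = 0
    while iterations < max_iterations and not should_terminate(state):
        state = body(state)
        iterations += 1
    return state, iterations


# Toy stand-in for one ALS iteration: the model error halves each time.
final_error, n = run_loop(
    body=lambda error: error / 2,
    state=1.0,
    should_terminate=lambda error: error < 0.01,   # "model convergence"
)
```

Here the number of iterations is decided while the job runs (7 in this toy case) rather than being fixed when the program is written.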

Dynamic Optimization
Will show how Onyx achieves dynamic optimization using:
1. Reshaping Pass
=> for metric collection
2. Runtime Pass
=> for generating a dynamically optimized plan

Dynamic Data Partitioning Example
● What happens if there is a data skew while executing a job?
● How do we detect such a data skew and partition data appropriately?
Walk-through:
1. Onyx Compiler: AnnotationPass(es) annotate the IR DAG.
2. Onyx Compiler: a ReshapingPass reshapes the IR DAG.
3. Execution Plan Conversion: the optimized IR DAG becomes an execution plan of stages.
4. Onyx Runtime: Execute! The stages of the execution plan run.
5. Job Executing...: the Onyx Runtime reports a data size metric to the Onyx Compiler.
6. Onyx Compiler: RuntimePass(es) produce a new IR DAG.
7. Onyx Runtime: Execute! The new execution plan runs.

Lessons Learned
1. Dynamic Optimization: extensible to any job
a. A Reshaping Pass to define when a customizable metric should be
received from the Runtime
b. A Runtime Pass to define how to change the DAG using the received
metric
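
As an illustration of step b, the sketch below takes the received metric (the size of each hash bucket of intermediate data) and reassigns buckets to reducers so that no reducer is overloaded. The deck does not say which algorithm Onyx's data skew runtime pass uses; the greedy "largest bucket to the least-loaded reducer" heuristic and the names `assign_buckets` and `loads` are assumptions made for this example.

```python
import heapq


def assign_buckets(bucket_sizes, num_reducers):
    """Greedy heuristic: hand the largest remaining bucket to the least-loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]        # (current load, reducer id)
    assignment = {r: [] for r in range(num_reducers)}
    by_size = sorted(range(len(bucket_sizes)), key=lambda b: -bucket_sizes[b])
    for bucket in by_size:
        load, r = heapq.heappop(heap)
        assignment[r].append(bucket)
        heapq.heappush(heap, (load + bucket_sizes[bucket], r))
    return assignment


def loads(assignment, bucket_sizes):
    return {r: sum(bucket_sizes[b] for b in bs) for r, bs in assignment.items()}


# Metric received from the runtime: bucket 0 is heavily skewed.
sizes = [90, 10, 10, 10, 10, 10, 10, 10]
static = {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}             # bucket i -> reducer i % 2
dynamic = assign_buckets(sizes, num_reducers=2)
```

With the static modulo assignment one reducer receives 120 units and the other 40; the metric-driven assignment narrows this to 90 and 70.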

Lessons Learned
2. Extend the various options for execution properties by
a. Implementing new Compile-Time Passes (Annotation + Reshaping)
b. Adding new implementations of the interfaces of the configurable
components for Runtime

Lessons Learned
3. Flexibly control the execution properties by:
a. Pre-defined/newly implemented Compile-Time Passes
b. Using Composite Passes
c. Using Policies

Harnessing Transient Resources with Onyx

Harnessing Transient Resources with Onyx
Pado (EuroSys 2017): A Special Data Processing Engine for
Harnessing Transient Resources,
realized as a simple policy on
Onyx, a flexible and extensible data processing system.

Batch Engine
MapReduce
Flume
Spark
...
Transient Resources
?

Transient Resources
Resources borrowed from
over-provisioned latency-critical jobs
(search service, online mall, etc.)

Data Analytics with Transient Resources
[Figure sequence: a dataflow program is executed as tasks on Transient containers, which also hold the tasks' data]

Solution
[Figure sequence: analyze the dataflow program, then run Valuable Computations on Reserved containers and Other Computations on Transient containers]

Our definition of Valuable computations
Dependency types: One-to-One, One-to-Many, Many-to-One, Many-to-Many
[Figure: each dependency type rated between “Not so valuable” and “Valuable”]

Map: No dependency ⇒ Not so valuable ⇒ Transient
Reduce: Many-to-Many ⇒ Valuable ⇒ Reserved
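
The rule on this slide can be written as a small placement function over incoming dependency types. This is a minimal sketch, not the actual Onyx pass: the name `executor_placement_pass` is hypothetical, and treating Many-to-One as costly alongside Many-to-Many is an assumption taken from the "Many-to-One Costly!" label in the placement example later in the deck.

```python
# Dependency types whose inputs are expensive to recompute (assumption, see lead-in).
COSTLY = {"Many-to-One", "Many-to-Many"}


def executor_placement_pass(vertices, edges):
    """Return {vertex: 'Reserved' | 'Transient'} based on incoming dependency types."""
    placement = {}
    for v in vertices:
        incoming = [kind for src, dst, kind in edges if dst == v]
        costly = any(kind in COSTLY for kind in incoming)
        placement[v] = "Reserved" if costly else "Transient"
    return placement


placement = executor_placement_pass(
    vertices=["Map", "Reduce"],
    edges=[("Map", "Reduce", "Many-to-Many")],
)
```

Map has no incoming dependency and lands on Transient; Reduce has a Many-to-Many input and lands on Reserved, matching the slide.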

Batch Engines (e.g., Spark) vs. Our Approach
Setup: 2 Transient containers, 1 Reserved container
● Task placement
○ Batch engines: Map and Reduce tasks on each container
○ Our approach: Map tasks on Transient, Reduce task on Reserved
● Map outputs
○ Batch engines: maintain Map outputs on local disks; Reduce tasks pull Map outputs
○ Our approach: push Map outputs to destination Reserved containers; Reduce reads input data from local Reserved containers
● Eviction of Transient containers
○ Batch engines: Map outputs destroyed ⇒ cascading recomputation of 5 tasks
○ Our approach: Map outputs not destroyed ⇒ no recomputation

Step 1:
Transient/Reserved
Executor Placement Pass

Operator Placement Example with the
Transient Resource Policy
Multinomial Logistic Regression (MLR)
: Machine learning application for classifying
inputs, such as tumors as malignant or benign, and
ad clicks as profitable or not.
Gradients are used to update the regression
model, which is used for prediction.

Executor Placement Example
[Figure: MLR dataflow: Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....]
Edge types: One-to-One, One-to-Many, Many-to-One (Costly!)

[Figure: MLR dataflow (Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....) over Reserved and Transient executors; operators marked “No Dependency”]
Edge types: One-to-One, One-to-Many, Many-to-One (Costly!)

[Figure: MLR dataflow (Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....) over Reserved and Transient executors; an operator marked “No Costly Dependency with Parents”]
Edge types: One-to-One, One-to-Many, Many-to-One (Costly!)

[Figure: MLR dataflow (Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....) over Reserved and Transient executors; operators marked “Costly Dependency with Parent” and “Costly Dependency with Parent, Pipelined”]
Edge types: One-to-One, One-to-Many, Many-to-One (Costly!)

Step 2:
Data Flow Model Pass

Recall..
[Figure: MLR dataflow (Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....) over Reserved and Transient executors]
Reserved: Safe!
Transient: Prone to evictions :(

[Figure: MLR dataflow (Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....) over Reserved and Transient executors]
Must evacuate data out of transient executors ASAP

[Figure: MLR dataflow (Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....) over Reserved and Transient executors, with one edge labeled Push]
Push data out as soon as it is ready!

[Figure: MLR dataflow (Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....) over Reserved and Transient executors, with one edge labeled Push and four edges labeled Pull]
No need to hurry for data in Reserved containers
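
The push/pull decision described in this step can be derived from executor placement alone. The sketch below assumes a single rule, "an edge that leaves a Transient executor for a Reserved one is pushed, everything else is pulled"; the function name `data_flow_model_pass`, the example `Write` vertex, and the treatment of Transient-to-Transient edges as Pull are assumptions, not details confirmed by the deck.

```python
def data_flow_model_pass(edges, placement):
    """Label each (src, dst) edge Push or Pull from the placement of its endpoints."""
    modes = {}
    for src, dst in edges:
        # Data on Transient executors may vanish on eviction, so move it out eagerly.
        leaves_transient = (
            placement[src] == "Transient" and placement[dst] == "Reserved"
        )
        modes[(src, dst)] = "Push" if leaves_transient else "Pull"
    return modes


placement = {"Map": "Transient", "Reduce": "Reserved", "Write": "Reserved"}
edges = [("Map", "Reduce"), ("Reduce", "Write")]
modes = data_flow_model_pass(edges, placement)
```

Map outputs are pushed to Reserved as soon as they are ready, while data that already sits on Reserved executors is pulled when the consumer needs it.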

Step 3:
Stage Partitioning Pass

Stage Partitioning in Compiler
Execute subgraph-by-subgraph
⇒ Partition into subgraphs
⇒ Good abstraction for handling evictions/faults
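
One simple way to partition a DAG into subgraphs is to keep a vertex in its parent's stage when they are connected by a 1-to-1 edge and to start a new stage otherwise. This is a sketch of that idea only; the deck's demo marks "Not 1-to-1" edges at stage boundaries, but the exact Onyx rule is not given, and `partition_stages` plus the `Read` and `Write` vertices are hypothetical.

```python
def partition_stages(vertices, edges):
    """vertices: names in topological order; edges: (src, dst, kind) triples.

    A vertex joins the stage of its first 1-to-1 parent; otherwise it opens
    a new stage. Returns {vertex: stage number}, stages numbered from 1.
    """
    stage_of = {}
    next_stage = 1
    for v in vertices:
        parent_stages = [
            stage_of[src] for src, dst, kind in edges
            if dst == v and kind == "1-to-1"
        ]
        if parent_stages:
            stage_of[v] = parent_stages[0]
        else:
            stage_of[v] = next_stage
            next_stage += 1
    return stage_of


stages = partition_stages(
    vertices=["Read", "Map", "Reduce", "Write"],
    edges=[
        ("Read", "Map", "1-to-1"),
        ("Map", "Reduce", "Many-to-Many"),
        ("Reduce", "Write", "1-to-1"),
    ],
)
```

The Many-to-Many edge becomes the only stage boundary, so a fault or eviction can be handled by re-running one stage rather than the whole job.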

Stage Partitioning Example
[Figure: MLR dataflow (Read Training Data, Create 1st Model, Compute Gradient, Aggr Gradient, Compute 2nd Model, ....) over Reserved and Transient executors, with one edge labeled Push and four edges labeled Pull]

[Figure sequence: the same MLR dataflow over Reserved and Transient executors, shown in three successive stage partitioning steps]

Demo
Executor Placement Pass
DataFlowModel Pass
Stage Partitioning Pass
with MLR example

ExecutorPlacementPass
Edge types shown: 1-to-1, 1-to-1, 1-to-many, Many-to-Many

DataFlowModelPass
Edge modes shown: Pull, Push, Pull

StagePartitioningPass
Stages shown: Stage-1, Stage-2, Stage-3
Boundary edges marked: Not 1-to-1, Not 1-to-1

Batch Engines
Spark 2.0.0 vs. Onyx with suggested optimizations

Containers
● Amazon EC2 instances (with local SSDs) as containers
● 40 Transient Containers, 5 Reserved Containers
● All containers used for computation

Workloads
● Alternating Least Squares
Yahoo! Music User Ratings of Songs with Artist, Album, and Genre Meta
Information, v. 1.0. https://webscope.sandbox.yahoo.com/catalog.php?datatype=r
● Multinomial Logistic Regression
Synthetic
● Map-Reduce
Page view statistics for Wikimedia projects.
https://dumps.wikimedia.org/other/pagecounts-raw

Job Completion Time (Lower is Better)
[Figure: job completion time per workload; labels shown: 4.13x, 3.52x, 5.15x]

Summary
● Introduces a new data processing system that is flexible
and extensible
○ Compiler that represents various execution policies
○ Runtime that is modular and reconfigurable
● Adapts data processing seamlessly for new deployment
and application requirements

We are working on creating an Apache incubator
project. We look forward to contributions from many
developers!
We are hiring software developers!
Contact: onyx@spl.snu.ac.kr
Software platform lab site: http://spl.snu.ac.kr
