Towards Scalable Service Composition on Multicores

Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder
Faculty of Informatics
University of Lugano - USI
Switzerland
http://sosoa.inf.usi.ch
daniele.bonetta@usi.ch

Service Composition
Build Services by reusing existing Web services
[Figure: a client, a composite Web service, and three Web services]

Composition Engines
Focus: Service Composition Runtime Execution Environment
[Figure: a client, a composite Web service running on the service composition engine, and three Web services]

How to scale?
[Figure: many clients, a composite Web service running on the service composition engine, and three Web services]

How to scale?
[Figure: the same setup with an even larger number of clients]

Outline
1. Problem: Scalable Service Composition
2. Opportunity: Multicores
3. Scalable Composition Engine Architecture
4. Multicore-Aware Deployment
5. Preliminary Evaluation
6. Conclusion & Outlook

Scalability Constraints
• Service Level Agreement
  • Response Time
  • Throughput
• Portability
  • Heterogeneous environments

Existing solutions
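• Centralized:
  • Scale on a cluster of computers (e.g., Beowulf)
• Decentralized:
  • Scale on P2P networks
[Figure: a centralized deployment with a single service composition engine (s.c.e.) and a decentralized deployment with several engines (sce), each connecting clients (cli) and Web services (ws)]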

Scalability on the Cloud
Today’s challenge
[Figure: the Cloud, containing services, data, code, and the service composition engine]

Portability
[Figure: the service composition engine in the Cloud]
Before scaling out on the cloud, it is important to make efficient use of the hardware architectures available there.
The cloud is a very heterogeneous environment.

Scalability on Multicores
[Figure: multicore processors, including a Quad-Core AMD Opteron]

Scalability on Multicores
On top of today's heterogeneous hardware:
• Different numbers of cores
• Different types of cores (SMT = n)
• Different chip memory layouts (cache levels, cache size, NUMA)

Engine Architecture
Run a large number of concurrent compositions with a limited number of execution threads
[Figure: the engine stages: Request Handler, Kernel, Invoker]

Engine Architecture
[Figure: the engine stages: Request Handler, Kernel, Invoker]
• 3-stage Pipeline
• Thread Pools
• Non-blocking I/O
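
The pipeline structure listed above can be sketched as three thread pools that hand work from one stage to the next. This is only an illustration under stated assumptions: the class `PipelineEngine`, its method names, and the upper-casing "service call" are hypothetical and not taken from the engine presented here, and non-blocking I/O is not modeled.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class PipelineEngine:
    """Hypothetical three-stage engine: Request Handler -> Kernel -> Invoker."""

    def __init__(self, threads_per_pool=2):
        # One small, fixed-size thread pool per stage.
        self.request_pool = ThreadPoolExecutor(threads_per_pool, thread_name_prefix="request")
        self.kernel_pool = ThreadPoolExecutor(threads_per_pool, thread_name_prefix="kernel")
        self.invoker_pool = ThreadPoolExecutor(threads_per_pool, thread_name_prefix="invoker")

    def submit(self, request):
        """Accept a request and return a Future that completes after the last stage."""
        result = Future()
        self.request_pool.submit(self._handle_request, request, result)
        return result

    def _handle_request(self, request, result):
        # Stage 1: turn the incoming request into a composition instance.
        instance = {"input": request, "trace": ["request-handler"]}
        self.kernel_pool.submit(self._run_kernel, instance, result)

    def _run_kernel(self, instance, result):
        # Stage 2: advance the composition and decide what to invoke next.
        instance["trace"].append("kernel")
        self.invoker_pool.submit(self._invoke, instance, result)

    def _invoke(self, instance, result):
        # Stage 3: stand-in for a Web service invocation (assumption).
        instance["trace"].append("invoker")
        instance["output"] = instance["input"].upper()
        result.set_result(instance)

    def shutdown(self):
        # Drain the stages in pipeline order so no hand-off hits a closed pool.
        for pool in (self.request_pool, self.kernel_pool, self.invoker_pool):
            pool.shutdown(wait=True)
```

With two threads per pool, six threads in total can serve an arbitrary number of submitted requests, because no thread waits for a later stage to finish.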

Deployment on Multicores
[Figure: the Request Handler, Kernel, and Invoker stages deployed on processors offering 2, 4, 6, ..., n parallel threads]

Deployment on Multicores
[Figure: the Request Handler, Kernel, and Invoker stages deployed on processors offering 2, 4, 6, ..., n parallel threads]
How?

[Figure: hardware topology diagram]
• 4 P7 CPUs
• 32 cores
• 128 parallel threads

Deployment Challenge
How to scale on multicores?
Just increase the number of concurrent threads in the engine?

Experimental Results
Just increasing the number of threads...
[Figure: throughput (instances/sec, 200 to 1800) versus number of threads per pool (0 to 140) for the ForEach, Sequential, Parallel, and Loop workloads]

Our Proposal
Topology-Aware deployment

Our Proposal
• Replicate the architecture instead of just increasing the number of threads

Our Proposal
• Bind threads to specific affinity groups
• Distribute resources (memory/threads) among replicas proportionally to the hardware resources and the number of replicas
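
Binding a replica to an affinity group can be sketched as follows. This is a minimal illustration, not the engine's actual mechanism: the helper names `affinity_groups` and `bind_to_group` are hypothetical, the sketch assumes one process per replica, and it assumes that consecutive CPU ids share caches, which is not guaranteed on every machine. `os.sched_setaffinity` is only available on some platforms (notably Linux), so the binding helper reports when it is unsupported.

```python
import os


def affinity_groups(cpu_ids, n_replicas):
    """Split cpu_ids into n_replicas contiguous groups of near-equal size."""
    cpu_ids = sorted(cpu_ids)
    if not 1 <= n_replicas <= len(cpu_ids):
        raise ValueError("need between 1 and len(cpu_ids) replicas")
    base, extra = divmod(len(cpu_ids), n_replicas)
    groups, start = [], 0
    for i in range(n_replicas):
        # The first `extra` groups receive one additional CPU.
        size = base + (1 if i < extra else 0)
        groups.append(cpu_ids[start:start + size])
        start += size
    return groups


def bind_to_group(cpus):
    """Pin the calling process to the given CPUs; return False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform without this call, e.g. macOS or Windows
    os.sched_setaffinity(0, cpus)  # pid 0 means "the caller"
    return True
```

For a 4-core machine and two replicas, `affinity_groups(range(4), 2)` yields one group per pair of cores, and each replica process would call `bind_to_group` with its own group at startup.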

Example
• 4 cores, 4 L1 caches, 2 L2 caches
L2 cache L2 cache
L1 L1 L1 L1
C1 C2 C3 C4

Single Instance
This baseline deployment lets the OS thread scheduler map the engine threads onto all cores.
L2 cache L2 cache
L1 L1 L1 L1
C1 C2 C3 C4
Engine Instance (8 threads)

Two instances
The threads of each instance are bound to specific cores.
L2 cache L2 cache
L1 L1 L1 L1
C1 C2 C3 C4
Instance #1 (4 threads)   Instance #2 (4 threads)

Hardware Awareness
Self-configuration at startup:
1. Gather hardware topology information:
• #cores, #caches, #cache-levels, ...
2. Replicate the engine architecture:
• One instance per last-level shared cache
• Configure the thread pool sizes
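
The two steps above can be illustrated with a short sketch that derives one CPU group per last-level cache. It is not the engine's own discovery code: the function names are hypothetical, the reader relies on the cache entries that Linux exposes under `/sys/devices/system/cpu`, and the grouping assumes that every core has the same number of cache levels.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0-2,6' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cache_entries(sysfs_root="/sys/devices/system/cpu"):
    """Yield (cache level, shared CPU list text) for each cache entry in sysfs."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index[0-9]*")
    for index_dir in glob.glob(pattern):
        with open(os.path.join(index_dir, "level")) as f:
            level = int(f.read())
        with open(os.path.join(index_dir, "shared_cpu_list")) as f:
            yield level, f.read()


def llc_groups(entries):
    """Return one sorted CPU group per last-level cache."""
    entries = list(entries)
    if not entries:
        return []
    # Assumption: the highest level seen anywhere is the last-level cache.
    last_level = max(level for level, _ in entries)
    groups = {tuple(parse_cpu_list(text))
              for level, text in entries if level == last_level}
    return sorted(list(group) for group in groups)
```

On Linux, `llc_groups(read_cache_entries())` returns the CPU groups; the number of groups suggests the number of engine instances, and the size of each group can guide the thread pool sizes of the instance bound to it.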

[Figure: the four composition patterns]
(a) Sequential: a Sequence containing 6 Invoke activities
(b) Foreach: a Foreach executing Invoke 6 times
(c) Parallel: a Flow containing 6 Invoke activities
(d) Loop: a While executing Invoke 6 times

Just increasing the number of threads...
[Figure: throughput (res/s, 200 to 1800) versus number of threads per pool (0 to 140) for the ForEach, Sequential, Parallel, and Loop workloads]

Fixing the number of threads to the optimal value

# of Replicas   Request Handler   Kernel   Invoker   Total
1               12                12       12        36
2               6                 6        6         36
6               2                 2        2         36
12              1                 1        1         36
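
The thread counts above keep the total at 36 while the per-pool size shrinks as replicas are added. A minimal sketch of that arithmetic, assuming the budget is split evenly across replicas and across the three pools (the function name `pool_sizes` is hypothetical):

```python
def pool_sizes(total_threads, n_replicas,
               stages=("Request Handler", "Kernel", "Invoker")):
    """Return the thread pool size of each stage within one replica."""
    per_pool, remainder = divmod(total_threads, n_replicas * len(stages))
    if per_pool == 0 or remainder:
        raise ValueError("thread budget does not divide evenly across replicas and pools")
    return {stage: per_pool for stage in stages}


if __name__ == "__main__":
    for replicas in (1, 2, 6, 12):
        print(replicas, pool_sizes(36, replicas))
```

For 36 threads and 6 replicas this gives 2 threads per pool, i.e. 6 threads per replica.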

Scalability (Throughput up to 3300 clients)
2 x AMD Barcelona 6-core processors with 2 LLC
[Figure: throughput (res/s, 300 to 2100) versus number of clients (300 to 3300) for 1, 2, 6, and 12 replicas]
