Harnessing the Power of
YARN with Apache Twill
Terence Yim
terence@continuuity.com
@chtyim
ApacheCon 2014
A Distributed App
[Diagram: Mappers (split, split, split) -> shuffle -> Reducers (part, part, part)]
A Map/Reduce Cluster
[Diagram: many split/part pairs from several Map/Reduce jobs on one cluster]
What Next?
A Message Passing (MPI) App
[Diagram: nodes exchanging data with one another]
A Stream Processing App
[Diagram: a stream of events, processed and written to a Database]
A Distributed Load Test
[Diagram: many test clients sending requests to a Web Service]
A Multi-Purpose Cluster
Continuuity Reactor
• Developer-Centric Big Data Application Platform
• Many different types of jobs in Hadoop cluster
  • Real-time stream processing
  • Ad-hoc queries
  • Map/Reduce
  • Web Apps
The Answer is YARN
• Resource Manager of Hadoop 2.0
• Separates
  • Resource Management
  • Programming Paradigm
• (Almost) any distributed app in Hadoop cluster
• Application Master to negotiate resources
A YARN Application
[Diagram: tasks exchanging data, with an App Master and the YARN Resource Manager]
A Multi-Purpose Cluster
[Diagram: several applications sharing the cluster, each with its own AM, under one YARN Resource Manager]
YARN - How it works
[Diagram: YARN Client, YARN Resource Manager, and Node Managers hosting the AM and Tasks]
1. Submit App Master
2. Start App Master in a Container
3. Request Containers
4. Start Tasks in Containers
Starting the App Master
[Diagram: YARN Client, YARN Resource Manager, Node Managers with local file systems, and the HDFS distributed file system]
1) Copy AM.jar to HDFS
2) Copy AM.jar to local
3) Load
The YARN Client
1. Connect to the Resource Manager.
2. Request a new application ID.
3. Create a submission context and a container launch context.
4. Define the local resources for the AM.
5. Define the environment for the AM.
6. Define the command to run for the AM.
7. Define the resource limits for the AM.
8. Submit the request to start the app master.
Writing the YARN Client
1. Connect to the Resource Manager:

YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress =
    NetUtils.createSocketAddr(yarnConf.get(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS));
LOG.info("Connecting to ResourceManager at " + rmAddress);
Configuration rmServerConf = new Configuration(conf);
rmServerConf.setClass(
    YarnConfiguration.YARN_SECURITY_INFO,
    ClientRMSecurityInfo.class, SecurityInfo.class);
ClientRMProtocol resourceManager = ((ClientRMProtocol) rpc.getProxy(
    ClientRMProtocol.class, rmAddress, rmServerConf));
Writing the YARN Client
2. Request an application ID:

GetNewApplicationRequest request =
    Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
    resourceManager.getNewApplication(request);
// Keep the ID; the submission context below needs it
ApplicationId appId = response.getApplicationId();
LOG.info("Got new ApplicationId=" + appId);

3. Create a submission context and a launch context:

ApplicationSubmissionContext appContext =
    Records.newRecord(ApplicationSubmissionContext.class);
appContext.setApplicationId(appId);
appContext.setApplicationName(appName); // appName: name chosen by the client
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);
Writing the YARN Client
4. Define the local resources:

Map<String, LocalResource> localResources = Maps.newHashMap();
// assume the AM jar is here:
Path jarPath; // <- known path to jar file
// File system that holds the jar (assumed here to be the default one)
FileSystem fs = FileSystem.get(conf);

// Create a resource with location, time stamp and file length
LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
amJarRsrc.setType(LocalResourceType.FILE);
amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
FileStatus jarStatus = fs.getFileStatus(jarPath);
amJarRsrc.setTimestamp(jarStatus.getModificationTime());
amJarRsrc.setSize(jarStatus.getLen());
localResources.put("AppMaster.jar", amJarRsrc);

amContainer.setLocalResources(localResources);
Writing the YARN Client
5. Define the environment:

// Set up the environment needed for the launch context
Map<String, String> env = new HashMap<String, String>();

// Set up the classpath needed.
// Assuming our classes are available as local resources in the
// working directory, we need to append "." to the path.
String classPathEnv = "$CLASSPATH:./*:";
env.put("CLASSPATH", classPathEnv);

// set up more environment
env.put(...);

amContainer.setEnvironment(env);
Writing the YARN Client
6. Define the command to run for the AM:

// Construct the command to be executed on the launched container
String command =
    "${JAVA_HOME}" + "/bin/java" +
    " MyAppMaster" +
    " arg1 arg2 arg3" +
    " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
    " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";

List<String> commands = new ArrayList<String>();
commands.add(command);

// Set the commands into the container spec
amContainer.setCommands(commands);
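The launch command in step 6 is plain shell text that gets run inside the container. As a minimal sketch with no Hadoop dependency, the string can be assembled like this. The class name `LaunchCommand`, the method `build`, and the literal `<LOG_DIR>` placeholder (standing in for the log directory constant) are assumptions made for illustration, not part of the original slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LaunchCommand {

    // Assumed placeholder that is expanded to the container's log directory.
    public static final String LOG_DIR = "<LOG_DIR>";

    /** Builds one shell command line that starts mainClass and redirects its output. */
    public static String build(String mainClass, List<String> args) {
        List<String> parts = new ArrayList<>();
        parts.add("${JAVA_HOME}/bin/java");
        parts.add(mainClass);
        parts.addAll(args);
        parts.add("1>" + LOG_DIR + "/stdout");
        parts.add("2>" + LOG_DIR + "/stderr");
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(build("MyAppMaster", Arrays.asList("arg1", "arg2", "arg3")));
    }
}
```

Building the command from a list of parts avoids the missing-space and missing-quote mistakes that are easy to make with long string concatenations.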
Writing the YARN Client
7. Define the resource limits for the AM:

// Define the resource requirements for the container.
// For now, YARN only supports memory constraints.
// If the process takes more memory, it is killed by the framework.
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(amMemory);
amContainer.setResource(capability);
// Set the container launch context into the submission context
appContext.setAMContainerSpec(amContainer);
Writing the YARN Client
8. Submit the request to start the app master:

// Create the request to send to the Resource Manager
SubmitApplicationRequest appRequest =
    Records.newRecord(SubmitApplicationRequest.class);
appRequest.setApplicationSubmissionContext(appContext);

// Submit the application to the ApplicationsManager
resourceManager.submitApplication(appRequest);
YARN is complex
• Three different protocols to learn
  • Client -> RM, AM -> RM, AM -> NM
• Asynchronous protocols
• Full power at the expense of simplicity
• Duplication of code
  • App masters
  • YARN clients
Can it be easier?
YARN App vs Multi-threaded App
YARN Application | Multi-threaded Java Application
YARN Client | java command to launch application with options and arguments
Application Master | main() method preparing threads
Task in Container | Runnable implementation, each runs in its own Thread
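The right-hand column of this comparison can be sketched as an ordinary Java program. This is only an illustration of the analogy, using the standard library; the class `ThreadedApp` and its `Task` and `runTasks` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ThreadedApp {

    /** Plays the role of a task in a container: one Runnable per thread. */
    static class Task implements Runnable {
        private final int id;
        private final ConcurrentLinkedQueue<String> results;

        Task(int id, ConcurrentLinkedQueue<String> results) {
            this.id = id;
            this.results = results;
        }

        @Override
        public void run() {
            results.add("task-" + id + " done");
        }
    }

    /** Plays the role of the Application Master: prepares threads, starts them, waits. */
    public static List<String> runTasks(int count) {
        ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(new Task(i, results));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for tasks", e);
            }
        }
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted); // completion order is not deterministic
        return sorted;
    }

    /** Launching this main() with the java command plays the role of the YARN client. */
    public static void main(String[] args) {
        System.out.println(runTasks(3));
    }
}
```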
Apache Twill
• Adds simplicity to the power of YARN
• Java thread-like programming model
• Incubated at Apache Software Foundation (Nov 2013)
• Current release: 0.2.0-incubating
Hello World
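import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.twill.api.AbstractTwillRunnable;
import org.apache.twill.api.TwillController;
import org.apache.twill.api.TwillRunnerService;
import org.apache.twill.common.Services;
import org.apache.twill.yarn.YarnTwillRunnerService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorld {
  static Logger LOG = LoggerFactory.getLogger(HelloWorld.class);

  // The task: runs in its own container, like a Runnable in its own thread
  static class HelloWorldRunnable extends AbstractTwillRunnable {
    @Override
    public void run() {
      LOG.info("Hello World");
    }
  }

  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    // Second argument is the ZooKeeper connection string
    TwillRunnerService runner = new YarnTwillRunnerService(conf, "localhost:2181");
    runner.startAndWait();

    TwillController controller = runner.prepare(new HelloWorldRunnable())
        .start();
    // Block until the application completes
    Services.getCompletionFuture(controller).get();
  }
}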
Twill is easy.
Architecture
[Diagram: the Twill Client holds Twill Runnables and a Twill Runner; the YARN Resource Manager and Node Managers host the Twill AM and the Tasks. Callout: "This is the only programming interface you need"]
Twill Application
What if my app needs more than one type of task?
• Define a TwillApplication with multiple TwillRunnables inside:

public class MyTwillApplication implements TwillApplication {

  @Override
  public TwillSpecification configure() {
    return TwillSpecification.Builder.with()
        .setName("Search")
        .withRunnable()
        .add("crawler", new CrawlerTwillRunnable()).noLocalFiles()
        .add("indexer", new IndexerTwillRunnable()).noLocalFiles()
        .anyOrder()
        .build();
  }
}
Features
• Real-time logging
• Resource report
• State recovery
• Elastic scaling
• Command messages
• Service discovery
• Bundle jar execution
Real-Time Logging
[Diagram: each Task uses a log appender to send logs to Kafka, which runs with the AM; the log stream reaches the Twill Runner in the Twill Client, where a LogHandler receives logs from all runnables]
Real-time Logging
TwillController controller =
    runner.prepare(new HelloWorldRunnable())
        .addLogHandler(new PrinterLogHandler(
            new PrintWriter(System.out, true)))
        .start();

OR

controller.addLogHandler(new PrinterLogHandler(
    new PrintWriter(System.out, true)));
Resource Report
• Twill Application Master exposes an HTTP endpoint
• Resource information for each container (AM and Runnables)
  • Memory and virtual cores
  • Number of live instances for each Runnable
  • Hostname the container is running on
  • Container ID
• Registered as AM tracking URL
Resource Report
• Programmatic access to resource report:
    ResourceReport report = controller.getResourceReport();
    Collection<TwillRunResources> resources =
        report.getRunnableResources("MyRunnable");
State Recovery
• What happens to the Twill app if the client terminates?
• It keeps running
• Can a new client take over control?
State Recovery
[Diagram: the AM and Tasks keep state in ZooKeeper; the Twill Runner in a Twill Client recovers state from ZooKeeper]
State Recovery
• All live instances of an application
    Iterable<TwillController> controllers =
        runner.lookup("HelloWorld");
• A particular live instance
    TwillController controller =
        runner.lookup("HelloWorld",
            RunIds.fromString("lastRunId"));
• All live instances of all applications
    Iterable<LiveInfo> liveInfos = runner.lookupLive();
Command Messages
[Diagram: the Twill Runner in the Twill Client sends a command message to runnables; the message travels through ZooKeeper to the AM and Tasks]
Command Messages
• Send to all Runnables:
    ListenableFuture<Command> completion =
        controller.sendCommand(
            Command.Builder.of("gc").build());
• Send to the "indexer" Runnable:
    ListenableFuture<Command> completion =
        controller.sendCommand("indexer",
            Command.Builder.of("flush").build());
• The Runnable implementation defines how to handle it:
    void handleCommand(Command command) throws Exception;
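How a runnable might react to such commands can be sketched without any Twill dependency. In this minimal sketch the `Command` and `Indexer` classes are hypothetical stand-ins, not the Twill types; the point is only that the receiving side decides what each command name means.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandDemo {

    /** Hypothetical stand-in for a command message: only a name. */
    public static final class Command {
        final String name;

        public Command(String name) {
            this.name = name;
        }
    }

    /** Hypothetical "indexer" runnable that buffers documents until told to flush. */
    public static class Indexer {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> flushed = new ArrayList<>();

        public void index(String doc) {
            buffer.add(doc);
        }

        /** The runnable decides what each command means; unknown commands are ignored. */
        public void handleCommand(Command command) {
            switch (command.name) {
                case "flush":
                    flushed.addAll(buffer);
                    buffer.clear();
                    break;
                default:
                    break;
            }
        }

        public int bufferedCount() {
            return buffer.size();
        }

        public int flushedCount() {
            return flushed.size();
        }
    }

    public static void main(String[] args) {
        Indexer indexer = new Indexer();
        indexer.index("doc-1");
        indexer.index("doc-2");
        indexer.handleCommand(new Command("gc"));    // not handled by this runnable
        indexer.handleCommand(new Command("flush")); // moves buffered docs out
        System.out.println(indexer.flushedCount() + " flushed, "
            + indexer.bufferedCount() + " buffered");
    }
}
```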
Elastic Scaling
• Change the instance count of a live runnable
    ListenableFuture<Integer> completion =
        controller.changeInstances("crawler", 10);
    // Wait for the change to complete
    completion.get();
• Implemented by a command message to the AM.
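The asynchronous shape of this call (request a new count, get a future, wait on it) can be modeled locally. The sketch below is a toy model using only the standard library; `ScalingDemo`, its in-memory instance map, and the use of `CompletableFuture` in place of `ListenableFuture` are assumptions made for illustration and do not reflect how Twill implements scaling.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class ScalingDemo {

    // Runnable name -> current instance count (toy stand-in for the AM's view).
    private final Map<String, Integer> instances = new ConcurrentHashMap<>();

    public ScalingDemo(Map<String, Integer> initial) {
        instances.putAll(initial);
    }

    /** Asynchronously sets the instance count; the future completes with the new count. */
    public CompletableFuture<Integer> changeInstances(String runnable, int newCount) {
        return CompletableFuture.supplyAsync(() -> {
            if (newCount < 1) {
                throw new IllegalArgumentException("Instance count must be positive");
            }
            // replace() returns null when the runnable name is not known
            if (instances.replace(runnable, newCount) == null) {
                throw new IllegalArgumentException("Unknown runnable: " + runnable);
            }
            return newCount;
        });
    }

    public int instanceCount(String runnable) {
        Integer count = instances.get(runnable);
        return count == null ? 0 : count;
    }

    public static void main(String[] args) {
        ScalingDemo app = new ScalingDemo(Collections.singletonMap("crawler", 2));
        // Wait for the change to complete, as on the slide
        int count = app.changeInstances("crawler", 10).join();
        System.out.println("crawler instances: " + count);
    }
}
```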
Service Discovery
• Running a service in Twill
• What host/port should a client connect to?
Service Discovery
[Diagram: Tasks register their service in ZooKeeper; the Twill Runner in the Twill Client watches for changes in the discovery nodes and receives service changes]
Service Discovery
• In TwillRunnable, register as a named service:
    @Override
    public void initialize(TwillContext context) {
      // Starts server on random port
      int port = startServer();
      context.announce("service", port);
    }
• Discover by service name on client side:
    ServiceDiscovered serviceDiscovered =
        controller.discoverService("service");
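The announce/discover/watch pattern can be illustrated with an in-memory registry. This is a sketch under stated assumptions: `DiscoveryDemo` and its methods are hypothetical, and a plain map stands in for the ZooKeeper discovery nodes shown in the diagram.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class DiscoveryDemo {

    // Service name -> announced endpoints (stand-in for discovery nodes).
    private final Map<String, List<InetSocketAddress>> endpoints = new ConcurrentHashMap<>();
    // Service name -> listeners to notify when the endpoint list changes.
    private final Map<String, List<Consumer<List<InetSocketAddress>>>> watchers =
        new ConcurrentHashMap<>();

    /** Called by a service instance once its server is listening. */
    public void announce(String service, String host, int port) {
        List<InetSocketAddress> list =
            endpoints.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>());
        list.add(InetSocketAddress.createUnresolved(host, port));
        List<Consumer<List<InetSocketAddress>>> listeners = watchers.get(service);
        if (listeners != null) {
            for (Consumer<List<InetSocketAddress>> listener : listeners) {
                listener.accept(new ArrayList<>(list));
            }
        }
    }

    /** Returns a snapshot of the endpoints currently announced under the name. */
    public List<InetSocketAddress> discover(String service) {
        List<InetSocketAddress> list = endpoints.get(service);
        return list == null ? Collections.<InetSocketAddress>emptyList() : new ArrayList<>(list);
    }

    /** Registers a listener that is called on every later change for the name. */
    public void watch(String service, Consumer<List<InetSocketAddress>> listener) {
        watchers.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    public static void main(String[] args) {
        DiscoveryDemo registry = new DiscoveryDemo();
        registry.watch("service", eps -> System.out.println("now " + eps.size() + " endpoint(s)"));
        registry.announce("service", "host1", 8080);
        registry.announce("service", "host2", 8081);
        for (InetSocketAddress ep : registry.discover("service")) {
            System.out.println(ep.getHostString() + ":" + ep.getPort());
        }
    }
}
```

Watching rather than polling matters here because instances start on random ports and may come and go as the application is scaled.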
Bundle Jar Execution
• Different library dependencies than Twill
• Run existing application in YARN
Bundle Jar Execution
• A bundle jar packages the application classes and all the libraries they depend on inside a single jar:
    MyMain.class
    MyRecord.class
    lib/guava-16.0.1.jar
    lib/netty-all-4.0.17.jar
• Easy to create with build tools
  • Maven - maven-bundle-plugin
  • Gradle - apply plugin: 'osgi'
Bundle Jar Execution
• Execute with BundledJarRunnable
• See BundledJarExample in the source tree:
    java org.apache.twill.example.yarn.BundledJarExample
        <zkConnectStr> <bundleJarPath> <mainClass> <args…>
• Successfully ran Presto on YARN
  • Non-intrusive, no code modification for Presto
  • Simple Maven project to use Presto in an embedded way and create a bundle jar
The Road Ahead
• Scripts for easy life cycle management and scaling
• Distributed coordination within application
• Remote debugging
• Non-Java application
• Suspend and resume application
• Metrics
• Local runner service
• …
Summary
• YARN is powerful
  • Allows applications other than M/R in Hadoop cluster
• YARN is complex
  • Complex protocols, boilerplate code
• Twill makes YARN easy
  • Java Thread-like runnables
  • Add-on features required by many distributed applications
• Productivity Boost
  • Developers can focus on application logic
Thank You
• Twill is Open Source and needs your contributions
  • twill.incubator.apache.org
  • dev@twill.incubator.apache.org
• Continuuity is hiring
  • continuuity.com/careers
Ad

More Related Content

What's hot (20)

Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Spark Summit
 
Building production spark streaming applications
Building production spark streaming applicationsBuilding production spark streaming applications
Building production spark streaming applications
Joey Echeverria
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
Rahul Kumar
 
REEF: Towards a Big Data Stdlib
REEF: Towards a Big Data StdlibREEF: Towards a Big Data Stdlib
REEF: Towards a Big Data Stdlib
DataWorks Summit
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
Hari Shreedharan
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Evan Chan
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
DataWorks Summit/Hadoop Summit
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
DataWorks Summit/Hadoop Summit
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
spark-project
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
P. Taylor Goetz
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Michael Spector
 
Spark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve LoughranSpark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve Loughran
Spark Summit
 
Heat optimization
Heat optimizationHeat optimization
Heat optimization
Rico Lin
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Helena Edelson
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
Paco Nathan
 
Heat and its resources
Heat and its resourcesHeat and its resources
Heat and its resources
Sangeeth Kumar
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Helena Edelson
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
DataWorks Summit/Hadoop Summit
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Spark Summit
 
Building production spark streaming applications
Building production spark streaming applicationsBuilding production spark streaming applications
Building production spark streaming applications
Joey Echeverria
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
Rahul Kumar
 
REEF: Towards a Big Data Stdlib
REEF: Towards a Big Data StdlibREEF: Towards a Big Data Stdlib
REEF: Towards a Big Data Stdlib
DataWorks Summit
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Evan Chan
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
spark-project
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
P. Taylor Goetz
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Michael Spector
 
Spark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve LoughranSpark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve Loughran
Spark Summit
 
Heat optimization
Heat optimizationHeat optimization
Heat optimization
Rico Lin
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Helena Edelson
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
Paco Nathan
 
Heat and its resources
Heat and its resourcesHeat and its resources
Heat and its resources
Sangeeth Kumar
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Helena Edelson
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
DataWorks Summit/Hadoop Summit
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 

Similar to Harnessing the power of YARN with Apache Twill (20)

RichFaces - Testing on Mobile Devices
RichFaces - Testing on Mobile DevicesRichFaces - Testing on Mobile Devices
RichFaces - Testing on Mobile Devices
Pavol Pitoňák
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
Steve Loughran
 
Spring and Cloud Foundry; a Marriage Made in Heaven
Spring and Cloud Foundry; a Marriage Made in HeavenSpring and Cloud Foundry; a Marriage Made in Heaven
Spring and Cloud Foundry; a Marriage Made in Heaven
Joshua Long
 
Arquitecturas de microservicios - Medianet Software
Arquitecturas de microservicios   -  Medianet SoftwareArquitecturas de microservicios   -  Medianet Software
Arquitecturas de microservicios - Medianet Software
Ernesto Hernández Rodríguez
 
Continuuity Weave
Continuuity WeaveContinuuity Weave
Continuuity Weave
bigdatagurus_meetup
 
YARN Services
YARN ServicesYARN Services
YARN Services
Steve Loughran
 
Intro to Rack
Intro to RackIntro to Rack
Intro to Rack
Rubyc Slides
 
Serverless archtiectures
Serverless archtiecturesServerless archtiectures
Serverless archtiectures
Iegor Fadieiev
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
Sachin Aggarwal
 
What's New In Laravel 5
What's New In Laravel 5What's New In Laravel 5
What's New In Laravel 5
Darren Craig
 
Devfest 2023 - Service Weaver Introduction - Taipei.pdf
Devfest 2023 - Service Weaver Introduction - Taipei.pdfDevfest 2023 - Service Weaver Introduction - Taipei.pdf
Devfest 2023 - Service Weaver Introduction - Taipei.pdf
KAI CHU CHUNG
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
Steve Loughran
 
Passenger 6 generic language support presentation
Passenger 6 generic language support presentationPassenger 6 generic language support presentation
Passenger 6 generic language support presentation
Hongli Lai
 
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A GrzesikApache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
mfrancis
 
Html5 : stockage local & synchronisation
Html5 : stockage local & synchronisationHtml5 : stockage local & synchronisation
Html5 : stockage local & synchronisation
goldoraf
 
Kubernetes for the PHP developer
Kubernetes for the PHP developerKubernetes for the PHP developer
Kubernetes for the PHP developer
Paul Czarkowski
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
Pietro Michiardi
 
"Service Worker: Let Your Web App Feel Like a Native "
"Service Worker: Let Your Web App Feel Like a Native ""Service Worker: Let Your Web App Feel Like a Native "
"Service Worker: Let Your Web App Feel Like a Native "
FDConf
 
Node Interactive: Node.js Performance and Highly Scalable Micro-Services
Node Interactive: Node.js Performance and Highly Scalable Micro-ServicesNode Interactive: Node.js Performance and Highly Scalable Micro-Services
Node Interactive: Node.js Performance and Highly Scalable Micro-Services
Chris Bailey
 
Multi Client Development with Spring
Multi Client Development with SpringMulti Client Development with Spring
Multi Client Development with Spring
Joshua Long
 
RichFaces - Testing on Mobile Devices
RichFaces - Testing on Mobile DevicesRichFaces - Testing on Mobile Devices
RichFaces - Testing on Mobile Devices
Pavol Pitoňák
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
Steve Loughran
 
Spring and Cloud Foundry; a Marriage Made in Heaven
Spring and Cloud Foundry; a Marriage Made in HeavenSpring and Cloud Foundry; a Marriage Made in Heaven
Spring and Cloud Foundry; a Marriage Made in Heaven
Joshua Long
 
Serverless archtiectures
Serverless archtiecturesServerless archtiectures
Serverless archtiectures
Iegor Fadieiev
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
Sachin Aggarwal
 
What's New In Laravel 5
What's New In Laravel 5What's New In Laravel 5
What's New In Laravel 5
Darren Craig
 
Devfest 2023 - Service Weaver Introduction - Taipei.pdf
Devfest 2023 - Service Weaver Introduction - Taipei.pdfDevfest 2023 - Service Weaver Introduction - Taipei.pdf
Devfest 2023 - Service Weaver Introduction - Taipei.pdf
KAI CHU CHUNG
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
Steve Loughran
 
Passenger 6 generic language support presentation
Passenger 6 generic language support presentationPassenger 6 generic language support presentation
Passenger 6 generic language support presentation
Hongli Lai
 
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A GrzesikApache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
Apache Karaf - Building OSGi applications on Apache Karaf - T Frank & A Grzesik
mfrancis
 
Html5 : stockage local & synchronisation
Html5 : stockage local & synchronisationHtml5 : stockage local & synchronisation
Html5 : stockage local & synchronisation
goldoraf
 
Kubernetes for the PHP developer
Kubernetes for the PHP developerKubernetes for the PHP developer
Kubernetes for the PHP developer
Paul Czarkowski
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
Pietro Michiardi
 
"Service Worker: Let Your Web App Feel Like a Native "
"Service Worker: Let Your Web App Feel Like a Native ""Service Worker: Let Your Web App Feel Like a Native "
"Service Worker: Let Your Web App Feel Like a Native "
FDConf
 
Node Interactive: Node.js Performance and Highly Scalable Micro-Services
Node Interactive: Node.js Performance and Highly Scalable Micro-ServicesNode Interactive: Node.js Performance and Highly Scalable Micro-Services
Node Interactive: Node.js Performance and Highly Scalable Micro-Services
Chris Bailey
 
Multi Client Development with Spring
Multi Client Development with SpringMulti Client Development with Spring
Multi Client Development with Spring
Joshua Long
 
Ad

Harnessing the power of YARN with Apache Twill

  • 1. Harnessing the Power of YARN with Apache Twill. Terence Yim, terence@continuuity.com, @chtyim
  • 3. ApacheCon 2014: A Distributed App [diagram labels: split, Mappers, shuffle, Reducers, part]
  • 4. A Map/Reduce Cluster [diagram labels: several groups of splits and parts]
  • 6. A Message Passing (MPI) App [diagram labels: data]
  • 7. A Stream Processing App [diagram labels: events, Database]
  • 8. A Distributed Load Test [diagram labels: many test nodes, Web Service]
  • 10. Continuuity Reactor
      • Developer-Centric Big Data Application Platform
      • Many different types of jobs in Hadoop cluster
          • Real-time stream processing
          • Ad-hoc queries
          • Map/Reduce
          • Web Apps
  • 11. The Answer is YARN
      • Resource Manager of Hadoop 2.0
      • Separates
          • Resource Management
          • Programming Paradigm
      • (Almost) any distributed app in Hadoop cluster
      • Application Master to negotiate resources
  • 12. A YARN Application [diagram labels: data, YARN Resource Manager, App Master]
  • 13. A Multi-Purpose Cluster [diagram labels: five AMs, YARN Resource Manager]
  • 14. YARN - How it works [diagram labels: YARN Client, YARN Resource Manager, Node Mgr, AM, Task]
      1. Submit App Master
      2. Start App Master in a Container
      3. Request Containers
      4. Start Tasks in Containers
  • 15. Starting the App Master [diagram labels: YARN Client, YARN Resource Manager, Node Mgr, local file system, HDFS distributed file system, AM, AM.jar]
      1) copy AM.jar to HDFS
      2) copy AM.jar to local
      3) load
  • 16. The YARN Client
      1. Connect to the Resource Manager.
      2. Request a new application ID.
      3. Create a submission context and a container launch context.
      4. Define the local resources for the AM.
      5. Define the environment for the AM.
      6. Define the command to run for the AM.
      7. Define the resource limits for the AM.
      8. Submit the request to start the app master.
  • 17. Writing the YARN Client
      1. Connect to the Resource Manager:
          YarnConfiguration yarnConf = new YarnConfiguration(conf);
          InetSocketAddress rmAddress = NetUtils.createSocketAddr(yarnConf.get(
              YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS));
          LOG.info("Connecting to ResourceManager at " + rmAddress);
          Configuration rmServerConf = new Configuration(conf);
          rmServerConf.setClass(
              YarnConfiguration.YARN_SECURITY_INFO,
              ClientRMSecurityInfo.class, SecurityInfo.class);
          ClientRMProtocol resourceManager = ((ClientRMProtocol) rpc.getProxy(
              ClientRMProtocol.class, rmAddress, rmServerConf));
  • 18. Writing the YARN Client
      2. Request an application ID:
          GetNewApplicationRequest request =
              Records.newRecord(GetNewApplicationRequest.class);
          GetNewApplicationResponse response =
              resourceManager.getNewApplication(request);
          // Keep the ID; the submission context in step 3 needs it
          ApplicationId appId = response.getApplicationId();
          LOG.info("Got new ApplicationId=" + appId);
      3. Create a submission context and a launch context:
          ApplicationSubmissionContext appContext =
              Records.newRecord(ApplicationSubmissionContext.class);
          appContext.setApplicationId(appId);
          appContext.setApplicationName(appName);
          ContainerLaunchContext amContainer =
              Records.newRecord(ContainerLaunchContext.class);
  • 19. Writing the YARN Client
      4. Define the local resources:
          Map<String, LocalResource> localResources = Maps.newHashMap();
          // assume the AM jar is here:
          Path jarPath; // <- known path to jar file

          // Create a resource with location, time stamp and file length
          LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
          amJarRsrc.setType(LocalResourceType.FILE);
          amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
          FileStatus jarStatus = fs.getFileStatus(jarPath);
          amJarRsrc.setTimestamp(jarStatus.getModificationTime());
          amJarRsrc.setSize(jarStatus.getLen());
          localResources.put("AppMaster.jar", amJarRsrc);

          amContainer.setLocalResources(localResources);
  • 20. Writing the YARN Client
      5. Define the environment:
          // Set up the environment needed for the launch context
          Map<String, String> env = new HashMap<String, String>();

          // Setup the classpath needed.
          // Assuming our classes are available as local resources in the
          // working directory, we need to append "." to the path.
          String classPathEnv = "$CLASSPATH:./*:";
          env.put("CLASSPATH", classPathEnv);

          // setup more environment
          env.put(...);

          amContainer.setEnvironment(env);
  • 21. Writing the YARN Client
      6. Define the command to run for the AM:
          // Construct the command to be executed on the launched container
          String command =
              "${JAVA_HOME}" + "/bin/java" +
              " MyAppMaster" +
              " arg1 arg2 arg3" +
              " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
              " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";

          List<String> commands = new ArrayList<String>();
          commands.add(command);

          // Set the commands into the container spec
          amContainer.setCommands(commands);
  • 22. Writing the YARN Client
      7. Define the resource limits for the AM:
          // Define the resource requirements for the container.
          // For now, YARN only supports memory constraints.
          // If the process takes more memory, it is killed by the framework.
          Resource capability = Records.newRecord(Resource.class);
          capability.setMemory(amMemory);
          amContainer.setResource(capability);

          // Set the container launch context into the submission context
          appContext.setAMContainerSpec(amContainer);
  • 23. Writing the YARN Client
      8. Submit the request to start the app master:
          // Create the request to send to the Resource Manager
          SubmitApplicationRequest appRequest =
              Records.newRecord(SubmitApplicationRequest.class);
          appRequest.setApplicationSubmissionContext(appContext);

          // Submit the application to the ApplicationsManager
          resourceManager.submitApplication(appRequest);
  • 24. YARN - How it works [diagram labels: YARN Client, YARN Resource Manager, Node Mgr, AM, Task]
      1. Submit App Master
      2. Start App Master in a Container
      3. Request Containers
      4. Start Tasks in Containers
  • 25. YARN is complex
      • Three different protocols to learn
          • Client -> RM, AM -> RM, AM -> NM
          • Asynchronous protocols
      • Full Power at the expense of simplicity
      • Duplication of code
          • App masters
          • YARN clients
  • 26. Can it be easier?
  • 27. YARN App vs Multi-threaded App
      YARN Application      Multi-threaded Java Application
      YARN Client           java command to launch application with options and arguments
      Application Master    main() method preparing threads
      Task in Container     Runnable implementation, each runs in its own Thread
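The right-hand column of this comparison can be sketched in plain Java. This is a minimal illustration that uses only the standard library, not Twill or YARN code; the names `ThreadedApp`, `Worker`, and `runWorkers` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadedApp {

    /** Plays the role of a "task in a container": a Runnable run in its own thread. */
    static final class Worker implements Runnable {
        private final AtomicInteger completed;

        Worker(AtomicInteger completed) {
            this.completed = completed;
        }

        @Override
        public void run() {
            // The application logic would go here; the sketch only counts completions.
            completed.incrementAndGet();
        }
    }

    /** Plays the role of the "application master": prepares the threads and waits for them. */
    static int runWorkers(int count) {
        AtomicInteger completed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread thread = new Thread(new Worker(completed), "worker-" + i);
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return completed.get();
    }

    /** Plays the role of the "YARN client": the java command that launches the application. */
    public static void main(String[] args) {
        System.out.println("Completed workers: " + runWorkers(3));
    }
}
```

In this analogy, the launcher, the coordinator, and the workers all live in one JVM; YARN spreads the same three roles across a cluster.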
  • 28. Apache Twill
      • Adds simplicity to the power of YARN
          • Java thread-like programming model
      • Incubated at Apache Software Foundation (Nov 2013)
      • Current release: 0.2.0-incubating
  • 29. Hello World
      public class HelloWorld {
        static Logger LOG = LoggerFactory.getLogger(HelloWorld.class);

        static class HelloWorldRunnable extends AbstractTwillRunnable {
          @Override
          public void run() {
            LOG.info("Hello World");
          }
        }

        public static void main(String[] args) throws Exception {
          YarnConfiguration conf = new YarnConfiguration();
          TwillRunnerService runner = new YarnTwillRunnerService(conf, "localhost:2181");
          runner.startAndWait();

          TwillController controller = runner.prepare(new HelloWorldRunnable())
                                             .start();
          Services.getCompletionFuture(controller).get();
        }
      }
  • 34. Architecture [diagram labels: Twill Client containing a Twill Runner and Twill Runnables ("This is the only programming interface you need"); YARN Resource Manager; Twill AM and Tasks on Node Mgrs; steps 1. to 4.]
  • 35. Twill Application
      What if my app needs more than one type of task?
  • 36. Twill Application
      What if my app needs more than one type of task?
      • Define a TwillApplication with multiple TwillRunnables inside:
          public class MyTwillApplication implements TwillApplication {
            @Override
            public TwillSpecification configure() {
              return TwillSpecification.Builder.with()
                .setName("Search")
                .withRunnable()
                  .add("crawler", new CrawlerTwillRunnable()).noLocalFiles()
                  .add("indexer", new IndexerTwillRunnable()).noLocalFiles()
                .anyOrder()
                .build();
            }
          }
  • 37. Features
      • Real-time logging
      • Resource report
      • State recovery
      • Elastic scaling
      • Command messages
      • Service discovery
      • Bundle jar execution
  • 38. Real-Time Logging [diagram labels: Twill Client with Twill Runner; LogHandler to receive logs from all runnables; AM + Kafka on a Node Mgr; YARN Resource Manager; Tasks on Node Mgrs; log appender to send logs to Kafka; log stream]
  • 39. Real-time Logging
      TwillController controller = runner.prepare(new HelloWorldRunnable())
          .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out, true)))
          .start();

      OR

      controller.addLogHandler(new PrinterLogHandler(new PrintWriter(System.out, true)));
  • 40. Resource Report
      • Twill Application Master exposes HTTP endpoint
      • Resource information for each container (AM and Runnables)
          • Memory and virtual core
          • Number of live instances for each Runnable
          • Hostname the container is running on
          • Container ID
      • Registered as AM tracking URL
  • 41. Resource Report
      • Programmatic access to resource report:
          ResourceReport report = controller.getResourceReport();
          Collection<TwillRunResources> resources =
              report.getRunnableResource("MyRunnable");
  • 42. State Recovery
      • What happens to the Twill app if the client terminates?
          • It keeps running
      • Can a new client take over control?
  • 43. State Recovery [diagram labels: Twill Client with Twill Runner; Recover state from ZooKeeper; AM + Kafka on a Node Mgr; YARN Resource Manager; Tasks on Node Mgrs; three ZooKeeper nodes; state]
  • 44. State Recovery
      • All live instances of an application:
          Iterable<TwillController> controllers = runner.lookup("HelloWorld");
      • A particular live instance:
          TwillController controller = runner.lookup("HelloWorld", RunIds.fromString("lastRunId"));
      • All live instances of all applications:
          Iterable<LiveInfo> liveInfos = runner.lookupLive();
  • 45. Command Messages [diagram labels: Twill Client with Twill Runner; Send command message to runnables; AM + Kafka on a Node Mgr; YARN Resource Manager; Tasks on Node Mgrs; three ZooKeeper nodes; message]
  • 46. Command Messages
      • Send to all Runnables:
          ListenableFuture<Command> completion = controller.sendCommand(
              Command.Builder.of("gc").build());
      • Send to the "indexer" Runnable:
          ListenableFuture<Command> completion = controller.sendCommand("indexer",
              Command.Builder.of("flush").build());
      • The Runnable implementation defines how to handle it:
          void handleCommand(Command command) throws Exception;
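How a runnable reacts to a command is left to its implementation. The following is a minimal sketch of that dispatch idea in plain Java, not Twill code: `SimpleCommand` and `IndexerSketch` are hypothetical stand-ins for a command message and for an "indexer" task, and the "flush" and "gc" names are borrowed from the slide purely as labels.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandHandlingSketch {

    /** Hypothetical stand-in for a command message; it carries only a name. */
    static final class SimpleCommand {
        final String name;

        SimpleCommand(String name) {
            this.name = name;
        }
    }

    /** Hypothetical "indexer" task that buffers records until told to flush. */
    static final class IndexerSketch {
        private final List<String> buffer = new ArrayList<>();
        private int flushed = 0;

        void index(String record) {
            buffer.add(record);
        }

        // Same shape as the handleCommand method on the slide, with a stand-in type.
        void handleCommand(SimpleCommand command) {
            if ("flush".equals(command.name)) {
                flushed += buffer.size();
                buffer.clear();
            }
            // Any other command name is ignored in this sketch.
        }

        int flushedCount() {
            return flushed;
        }
    }

    /** Buffers the given number of records, sends two commands, returns the flushed count. */
    static int flushAfterIndexing(int records) {
        IndexerSketch indexer = new IndexerSketch();
        for (int i = 0; i < records; i++) {
            indexer.index("record-" + i);
        }
        indexer.handleCommand(new SimpleCommand("gc"));    // not handled, so ignored
        indexer.handleCommand(new SimpleCommand("flush")); // empties the buffer
        return indexer.flushedCount();
    }

    public static void main(String[] args) {
        System.out.println("Flushed records: " + flushAfterIndexing(3));
    }
}
```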
  • 47. Elastic Scaling
      • Change the instance count of a live runnable:
          ListenableFuture<Integer> completion = controller.changeInstances("crawler", 10);
          // Wait for the change to complete
          completion.get();
      • Implemented by a command message to the AM.
  • 48. Service Discovery
      • Running a service in Twill
      • What host/port should a client connect to?
  • 49. Service Discovery [diagram labels: Twill Client with Twill Runner; Watch for change in discovery nodes; AM + Kafka on a Node Mgr; YARN Resource Manager; Tasks on Node Mgrs; three ZooKeeper nodes; Register service; Service changes]
  • 50. Service Discovery
      • In TwillRunnable, register as a named service:
          @Override
          public void initialize(TwillContext context) {
            // Starts server on random port
            int port = startServer();
            context.announce("service", port);
          }
      • Discover by service name on client side:
          ServiceDiscovered serviceDiscovered = controller.discoverService("service");
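The announce-then-discover pattern can be sketched with the standard library alone. This is an illustration under stated assumptions, not Twill's implementation: the in-memory `REGISTRY` map is a hypothetical stand-in for the discovery nodes the slides place in ZooKeeper, and `startServer`, `announce`, and `discover` are example names. Binding a `ServerSocket` to port 0 is one common way to get the "random port" the slide mentions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DiscoverySketch {

    // Hypothetical in-memory registry standing in for the shared discovery nodes.
    private static final Map<String, Integer> REGISTRY = new ConcurrentHashMap<>();

    /** Binds to port 0 so the operating system picks a free port. */
    static ServerSocket startServer() {
        try {
            return new ServerSocket(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Service side: publish the port under a service name. */
    static void announce(String serviceName, int port) {
        REGISTRY.put(serviceName, port);
    }

    /** Client side: look the port up by service name (null if nothing is announced). */
    static Integer discover(String serviceName) {
        return REGISTRY.get(serviceName);
    }

    /** Starts a server, announces it, and checks the client sees the same port. */
    static boolean announceAndDiscover() {
        try (ServerSocket server = startServer()) {
            int port = server.getLocalPort();
            announce("service", port);
            Integer found = discover("service");
            return found != null && found == port && port > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Discovered announced port: " + announceAndDiscover());
    }
}
```

The point of the indirection is that clients never need the host or port in advance; they only need the service name and access to the shared registry.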
  • 51. Bundle Jar Execution
      • Different library dependencies than Twill
      • Run existing application in YARN
  • 52. Bundle Jar Execution
      • Bundle Jar contains classes and all libraries depended on inside a jar.
          MyMain.class
          MyRecord.class
          lib/guava-16.0.1.jar
          lib/netty-all-4.0.17.jar
      • Easy to create with build tools
          • Maven - maven-bundle-plugin
          • Gradle - apply plugin: 'osgi'
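The layout above can be inspected with `java.util.jar` from the standard library. The sketch below is only an illustration of that layout, not part of Twill: it builds a tiny in-memory jar with empty placeholder entries and then lists the entries under `lib/`. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

public class BundleJarLayoutSketch {

    /** Builds an in-memory jar whose entries are empty placeholders with the given names. */
    static byte[] buildBundle(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            for (String name : entryNames) {
                jar.putNextEntry(new JarEntry(name));
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the names of the library jars stored under lib/ inside the bundle. */
    static List<String> libraryEntries(byte[] bundle) {
        List<String> libraries = new ArrayList<>();
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(bundle))) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("lib/") && name.endsWith(".jar")) {
                    libraries.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return libraries;
    }

    public static void main(String[] args) {
        byte[] bundle = buildBundle(
            "MyMain.class", "MyRecord.class",
            "lib/guava-16.0.1.jar", "lib/netty-all-4.0.17.jar");
        System.out.println("Bundled libraries: " + libraryEntries(bundle));
    }
}
```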
  • 53. Bundle Jar Execution
      • Execute with BundledJarRunnable
      • See BundledJarExample in the source tree:
          java org.apache.twill.example.yarn.BundledJarExample <zkConnectStr> <bundleJarPath> <mainClass> <args…>
      • Successfully run Presto on YARN
          • Non-intrusive, no code modification for Presto
          • Simple maven project to use Presto in embedded way and create Bundle Jar
  • 54. The Road Ahead
      • Scripts for easy life cycle management and scaling
      • Distributed coordination within application
      • Remote debugging
      • Non-Java application
      • Suspend and resume application
      • Metrics
      • Local runner service
      • …
  • 55. Summary
      • YARN is powerful
          • Allows applications other than M/R in Hadoop cluster
      • YARN is complex
          • Complex protocols, boilerplate code
      • Twill makes YARN easy
          • Java Thread-like runnables
          • Add-on features required by many distributed applications
      • Productivity Boost
          • Developers can focus on application logic
  • 56. Thank You
      • Twill is Open Source and needs your contributions
          • twill.incubator.apache.org
          • [email protected]
      • Continuuity is hiring
          • continuuity.com/careers