SlideShare a Scribd company logo
Athena 
Streaming with Samza 

Goddess of Wisdom
The team
Brief Background
Stream Processing, Samza, Athena
Kafka Uber uses Kafka as logging/event system
Message streams
Near-real time
computation
Stream Processing Framework:
Apache Samza
Platform built on top of Samza:
Athena
Current use cases
Current use cases
Aggregation
Pricing
Assess supply and demand in real time to calculate
accurate surge multiples
Kafka Samza
location updates
Elasticsearch
S3 Spark
location updates
HTTP
Service
Realtime
Batch
Queries
Mapper
Job
Mapper
Job
Mapper
Job
Event	
  parsing,	
  filtering,	
  
classification	
  
Event	
  Aggregation	
  
Raw
events
Riak
Inserter
Job
Inserter
Job
Tiles
1m Agg
TilesReducer
Job
Reducer
Job
1m Agg
Tiles
Per	
  Contract	
   Common	
  
Artemis
Samza
Current use cases
Event driven update engine
Driver Activation
KAFKA SAMZA
Driver status
updates
Onboarding
Service
Fetch
driver
info
RocksDB
Retry queue
Update
status
Upcoming use cases
Fraud monitoring and alerting
KAFKA Partitioner
Real Time
Hourly
Alerting
Service
Monitoring
Service
Uberx metrics
Da Vinci: Streaming platform for data science
KAFKA
Aggregation job
(Model generation)
RocksDB
Historical state
Timeseries
data
Model evaluation
Database /
Elasticsearch
updates
Model
Updates
Samza Architecture Overview
Samza Architecture
Basic Structure of a Task
Task
Deployment in Uber
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
Athena Tooling
Tooling
●  Athena manager
●  Job configuration
●  Unit test framework
●  Graphite integration
●  Codahale library support
●  Maven archetype
●  Artifactory support
Athena Manager
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
Job configuration
reference.conf
Sandbox Staging ProductionDev
athena-core-lib
application.conf
Sandbox Staging ProductionDev
Samza job
Job configuration
projects {
artemis {
job_1 {
mapper {
envs.common {
task.inputs = "kafka.topic_1"
task.class = "com.uber.athena.SampleTaskClass"
}
envs.local = ${projects.artemis.job_1.mapper.envs.common}
envs.local = {
task.window.ms = 30000
}
envs.sandbox = ${projects.artemis.job_1.mapper.envs.local}
envs.sandbox = {
yarn.package.path = "http://artifactory..../artifactory/libs-snapshot-local/com/uber/athena/.../hello-athena.tar.gz"
}
envs.staging = ${projects.artemis.job_1.mapper.envs.sandbox}
envs.production = ${projects.artemis.job_1.mapper.envs.sandbox}
}
Job configuration
Unit test framework
Stream Job
StreamTask InitableTask WindowableTask
TaskUnitTestHarness
Message Listener
Inject data Custom
IncomingMessageEnvelope
Custom
MessageCollector
Unit test framework
String classTaskName = "com.uber.athena.test.TestProcessTask";
// Register the job to the test harness
TaskUnitTestHarness<String, Integer> testProcessTask = new
TaskUnitTestHarness<>(classTaskName, false, true);
// Register a listener for validating the output of every process function
testProcessTask.registerMessageListenerOnProcess(new KeyAsserterListener());
// Start the job
testProcessTask.start();
// Inject data
testProcessTask.inject(key,value);
// Get the full output
testWindowTask.getResult()
Tooling
●  Athena manager
●  Job configuration
●  Unit test framework
●  Graphite integration
●  Codahale library support
●  Maven archetype
●  Artifactory support
Observations
Observations
●  YARN is not bad !
●  Offset lag and Buffered messages
●  Kafka Appender for ELK
●  Checkpoint topic partition incorrect count
●  Config validation needs improvement
●  Job restarts are complicated
●  Built-in Metrics are insufficient
●  Seamless upgrades
●  Custom built-in serde support
●  Config validation enhancement
●  Auto benchmarking a Samza job
●  Unit test framework enhancement
Upcoming Samza improvements
Q&A
Ad

More Related Content

What's hot (17)

Surviving Hadoop on AWS
Surviving Hadoop on AWSSurviving Hadoop on AWS
Surviving Hadoop on AWS
Soren Macbeth
 
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Weaveworks
 
Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#
Tamir Dresher
 
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
Shengyou Fan
 
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
VMware Tanzu
 
"AWS Fargate: Containerization meets Serverless" at AWS User Group Cologne 20...
"AWS Fargate: Containerization meets Serverless" at AWS User Group Cologne 20..."AWS Fargate: Containerization meets Serverless" at AWS User Group Cologne 20...
"AWS Fargate: Containerization meets Serverless" at AWS User Group Cologne 20...
Vadym Kazulkin
 
PloneConf2017: serverless python for astronaut safety
PloneConf2017:  serverless python for astronaut safetyPloneConf2017:  serverless python for astronaut safety
PloneConf2017: serverless python for astronaut safety
Chris Shenton
 
Ecs gitlab runners
Ecs gitlab runnersEcs gitlab runners
Ecs gitlab runners
dynnamitt
 
Terraform day 2
Terraform day 2Terraform day 2
Terraform day 2
Kalkey
 
Creating asynchronous flows on AWS
Creating asynchronous flows on AWSCreating asynchronous flows on AWS
Creating asynchronous flows on AWS
Metin Kale
 
Backpressure? Resistance is not futile. RxJS Live 2019
Backpressure? Resistance is not futile. RxJS Live 2019Backpressure? Resistance is not futile. RxJS Live 2019
Backpressure? Resistance is not futile. RxJS Live 2019
Jay Phelps
 
Scalable Event Tracking
Scalable Event TrackingScalable Event Tracking
Scalable Event Tracking
schibstedpayment
 
Emberjs building-ambitious-web-applications
Emberjs building-ambitious-web-applicationsEmberjs building-ambitious-web-applications
Emberjs building-ambitious-web-applications
ColdFusionConference
 
Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3
JustinHolt20
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
Derrick Qin
 
A Crash Course on Serverless Applications in Python
A Crash Course on Serverless Applications in PythonA Crash Course on Serverless Applications in Python
A Crash Course on Serverless Applications in Python
James Saryerwinnie
 
Orbiter and how to extend Docker Swarm
Orbiter and how to extend Docker SwarmOrbiter and how to extend Docker Swarm
Orbiter and how to extend Docker Swarm
Gianluca Arbezzano
 
Surviving Hadoop on AWS
Surviving Hadoop on AWSSurviving Hadoop on AWS
Surviving Hadoop on AWS
Soren Macbeth
 
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Weaveworks
 
Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#
Tamir Dresher
 
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
Shengyou Fan
 
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
Cloud-Native Streaming Platform: Running Apache Kafka on PKS (Pivotal Contain...
VMware Tanzu
 
"AWS Fargate: Containerization meets Serverless" at AWS User Group Cologne 20...
"AWS Fargate: Containerization meets Serverless" at AWS User Group Cologne 20..."AWS Fargate: Containerization meets Serverless" at AWS User Group Cologne 20...
"AWS Fargate: Containerization meets Serverless" at AWS User Group Cologne 20...
Vadym Kazulkin
 
PloneConf2017: serverless python for astronaut safety
PloneConf2017:  serverless python for astronaut safetyPloneConf2017:  serverless python for astronaut safety
PloneConf2017: serverless python for astronaut safety
Chris Shenton
 
Ecs gitlab runners
Ecs gitlab runnersEcs gitlab runners
Ecs gitlab runners
dynnamitt
 
Terraform day 2
Terraform day 2Terraform day 2
Terraform day 2
Kalkey
 
Creating asynchronous flows on AWS
Creating asynchronous flows on AWSCreating asynchronous flows on AWS
Creating asynchronous flows on AWS
Metin Kale
 
Backpressure? Resistance is not futile. RxJS Live 2019
Backpressure? Resistance is not futile. RxJS Live 2019Backpressure? Resistance is not futile. RxJS Live 2019
Backpressure? Resistance is not futile. RxJS Live 2019
Jay Phelps
 
Emberjs building-ambitious-web-applications
Emberjs building-ambitious-web-applicationsEmberjs building-ambitious-web-applications
Emberjs building-ambitious-web-applications
ColdFusionConference
 
Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3Going Headless with Craft CMS 3.3
Going Headless with Craft CMS 3.3
JustinHolt20
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
Derrick Qin
 
A Crash Course on Serverless Applications in Python
A Crash Course on Serverless Applications in PythonA Crash Course on Serverless Applications in Python
A Crash Course on Serverless Applications in Python
James Saryerwinnie
 
Orbiter and how to extend Docker Swarm
Orbiter and how to extend Docker SwarmOrbiter and how to extend Docker Swarm
Orbiter and how to extend Docker Swarm
Gianluca Arbezzano
 

Viewers also liked (18)

Autobiography ppt
Autobiography pptAutobiography ppt
Autobiography ppt
MEXI13
 
A Generative Method for Infrastructure Emergence
A Generative Method for Infrastructure EmergenceA Generative Method for Infrastructure Emergence
A Generative Method for Infrastructure Emergence
whichlight
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
Jayush Luniya
 
A complete hadoop stack
A complete hadoop stackA complete hadoop stack
A complete hadoop stack
Abhra Pal
 
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to..."Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
Cask Data
 
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
Cask Data
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
Cask Data
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
DataWorks Summit/Hadoop Summit
 
Sinaunang Kabihasnan ng Gresya o Greece
Sinaunang Kabihasnan ng Gresya o GreeceSinaunang Kabihasnan ng Gresya o Greece
Sinaunang Kabihasnan ng Gresya o Greece
Romeline Magsino
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Cloudera, Inc.
 
Transactions Over Apache HBase
Transactions Over Apache HBaseTransactions Over Apache HBase
Transactions Over Apache HBase
Cask Data
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)
Mihnea Giurgea
 
Introduction to Thrift
Introduction to ThriftIntroduction to Thrift
Introduction to Thrift
Dvir Volk
 
Mga Diyos at diyosa ng Greece
Mga Diyos at diyosa ng GreeceMga Diyos at diyosa ng Greece
Mga Diyos at diyosa ng Greece
Marie Estelle Celestial
 
AWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesAWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL Queries
DoiT International
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Helena Edelson
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Helena Edelson
 
Mga Diyos at Diyosa ng Imperyong Griyego at Roman (The Gods and Goddesses of ...
Mga Diyos at Diyosa ng Imperyong Griyego at Roman (The Gods and Goddesses of ...Mga Diyos at Diyosa ng Imperyong Griyego at Roman (The Gods and Goddesses of ...
Mga Diyos at Diyosa ng Imperyong Griyego at Roman (The Gods and Goddesses of ...
Christine Joyce Javier
 
Autobiography ppt
Autobiography pptAutobiography ppt
Autobiography ppt
MEXI13
 
A Generative Method for Infrastructure Emergence
A Generative Method for Infrastructure EmergenceA Generative Method for Infrastructure Emergence
A Generative Method for Infrastructure Emergence
whichlight
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
Jayush Luniya
 
A complete hadoop stack
A complete hadoop stackA complete hadoop stack
A complete hadoop stack
Abhra Pal
 
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to..."Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
Cask Data
 
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
Cask Data
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
Cask Data
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
DataWorks Summit/Hadoop Summit
 
Sinaunang Kabihasnan ng Gresya o Greece
Sinaunang Kabihasnan ng Gresya o GreeceSinaunang Kabihasnan ng Gresya o Greece
Sinaunang Kabihasnan ng Gresya o Greece
Romeline Magsino
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Cloudera, Inc.
 
Transactions Over Apache HBase
Transactions Over Apache HBaseTransactions Over Apache HBase
Transactions Over Apache HBase
Cask Data
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)
Mihnea Giurgea
 
Introduction to Thrift
Introduction to ThriftIntroduction to Thrift
Introduction to Thrift
Dvir Volk
 
AWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesAWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL Queries
DoiT International
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Helena Edelson
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Helena Edelson
 
Mga Diyos at Diyosa ng Imperyong Griyego at Roman (The Gods and Goddesses of ...
Mga Diyos at Diyosa ng Imperyong Griyego at Roman (The Gods and Goddesses of ...Mga Diyos at Diyosa ng Imperyong Griyego at Roman (The Gods and Goddesses of ...
Mga Diyos at Diyosa ng Imperyong Griyego at Roman (The Gods and Goddesses of ...
Christine Joyce Javier
 
Ad

Similar to Introducing Athena: 08/19 Big Data Application Meetup, Talk #3 (20)

Building Stateful Microservices With Akka
Building Stateful Microservices With AkkaBuilding Stateful Microservices With Akka
Building Stateful Microservices With Akka
Yaroslav Tkachenko
 
A Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionA Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In Production
Lightbend
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and Spray
Jacob Park
 
Akka streams - Umeå java usergroup
Akka streams - Umeå java usergroupAkka streams - Umeå java usergroup
Akka streams - Umeå java usergroup
Johan Andrén
 
Fabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkFabric - Realtime stream processing framework
Fabric - Realtime stream processing framework
Shashank Gautam
 
Spark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard MaasSpark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Summit
 
Aprovisionamiento multi-proveedor con Terraform - Plain Concepts DevOps day
Aprovisionamiento multi-proveedor con Terraform  - Plain Concepts DevOps dayAprovisionamiento multi-proveedor con Terraform  - Plain Concepts DevOps day
Aprovisionamiento multi-proveedor con Terraform - Plain Concepts DevOps day
Plain Concepts
 
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Microsoft Tech Community
 
Reactive stream processing using Akka streams
Reactive stream processing using Akka streams Reactive stream processing using Akka streams
Reactive stream processing using Akka streams
Johan Andrén
 
Writing RESTful web services using Node.js
Writing RESTful web services using Node.jsWriting RESTful web services using Node.js
Writing RESTful web services using Node.js
FDConf
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 
Asynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka StreamsAsynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka Streams
Johan Andrén
 
Serverless Apps with AWS Step Functions
Serverless Apps with AWS Step FunctionsServerless Apps with AWS Step Functions
Serverless Apps with AWS Step Functions
Amanda Mackay (she/her)
 
Spark streaming with kafka
Spark streaming with kafkaSpark streaming with kafka
Spark streaming with kafka
Dori Waldman
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
Dori Waldman
 
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
Databricks
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Databricks
 
Reactive streams processing using Akka Streams
Reactive streams processing using Akka StreamsReactive streams processing using Akka Streams
Reactive streams processing using Akka Streams
Johan Andrén
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
Prateek Maheshwari
 
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
Provectus
 
Building Stateful Microservices With Akka
Building Stateful Microservices With AkkaBuilding Stateful Microservices With Akka
Building Stateful Microservices With Akka
Yaroslav Tkachenko
 
A Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionA Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In Production
Lightbend
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and Spray
Jacob Park
 
Akka streams - Umeå java usergroup
Akka streams - Umeå java usergroupAkka streams - Umeå java usergroup
Akka streams - Umeå java usergroup
Johan Andrén
 
Fabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkFabric - Realtime stream processing framework
Fabric - Realtime stream processing framework
Shashank Gautam
 
Spark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard MaasSpark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Summit
 
Aprovisionamiento multi-proveedor con Terraform - Plain Concepts DevOps day
Aprovisionamiento multi-proveedor con Terraform  - Plain Concepts DevOps dayAprovisionamiento multi-proveedor con Terraform  - Plain Concepts DevOps day
Aprovisionamiento multi-proveedor con Terraform - Plain Concepts DevOps day
Plain Concepts
 
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Microsoft Tech Community
 
Reactive stream processing using Akka streams
Reactive stream processing using Akka streams Reactive stream processing using Akka streams
Reactive stream processing using Akka streams
Johan Andrén
 
Writing RESTful web services using Node.js
Writing RESTful web services using Node.jsWriting RESTful web services using Node.js
Writing RESTful web services using Node.js
FDConf
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 
Asynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka StreamsAsynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka Streams
Johan Andrén
 
Spark streaming with kafka
Spark streaming with kafkaSpark streaming with kafka
Spark streaming with kafka
Dori Waldman
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
Dori Waldman
 
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
Databricks
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Databricks
 
Reactive streams processing using Akka Streams
Reactive streams processing using Akka StreamsReactive streams processing using Akka Streams
Reactive streams processing using Akka Streams
Johan Andrén
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
Prateek Maheshwari
 
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
Provectus
 
Ad

More from Cask Data (9)

Introducing a horizontally scalable, inference-based business Rules Engine fo...
Introducing a horizontally scalable, inference-based business Rules Engine fo...Introducing a horizontally scalable, inference-based business Rules Engine fo...
Introducing a horizontally scalable, inference-based business Rules Engine fo...
Cask Data
 
About CDAP
About CDAPAbout CDAP
About CDAP
Cask Data
 
Transaction in HBase, by Andreas Neumann, Cask
Transaction in HBase, by Andreas Neumann, CaskTransaction in HBase, by Andreas Neumann, Cask
Transaction in HBase, by Andreas Neumann, Cask
Cask Data
 
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask #BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
Cask Data
 
Building Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache TwillBuilding Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache Twill
Cask Data
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
Cask Data
 
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
Cask Data
 
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bagBrown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
Cask Data
 
HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25
Cask Data
 
Introducing a horizontally scalable, inference-based business Rules Engine fo...
Introducing a horizontally scalable, inference-based business Rules Engine fo...Introducing a horizontally scalable, inference-based business Rules Engine fo...
Introducing a horizontally scalable, inference-based business Rules Engine fo...
Cask Data
 
Transaction in HBase, by Andreas Neumann, Cask
Transaction in HBase, by Andreas Neumann, CaskTransaction in HBase, by Andreas Neumann, Cask
Transaction in HBase, by Andreas Neumann, Cask
Cask Data
 
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask #BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
Cask Data
 
Building Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache TwillBuilding Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache Twill
Cask Data
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
Cask Data
 
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
Cask Data
 
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bagBrown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
Cask Data
 
HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25
Cask Data
 

Recently uploaded (20)

Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 

Introducing Athena: 08/19 Big Data Application Meetup, Talk #3

  • 1. Athena Streaming with Samza Goddess of Wisdom
  • 3. Brief Background Stream Processing, Samza, Athena Kafka Uber uses Kafka as logging/event system Message streams Near-real time computation Stream Processing Framework: Apache Samza Platform built on top of Samza: Athena
  • 6. Pricing Assess supply and demand in real time to calculate accurate surge multiples Kafka Samza location updates Elasticsearch S3 Spark location updates HTTP Service Realtime Batch Queries
  • 7. Mapper Job Mapper Job Mapper Job Event  parsing,  filtering,   classification   Event  Aggregation   Raw events Riak Inserter Job Inserter Job Tiles 1m Agg TilesReducer Job Reducer Job 1m Agg Tiles Per  Contract   Common   Artemis Samza
  • 8. Current use cases Event driven update engine
  • 9. Driver Activation KAFKA SAMZA Driver status updates Onboarding Service Fetch driver info RocksDB Retry queue Update status
  • 11. Fraud monitoring and alerting KAFKA Partitioner Real Time Hourly Alerting Service Monitoring Service Uberx metrics
  • 12. Da Vinci: Streaming platform for data science KAFKA Aggregation job (Model generation) RocksDB Historical state Timeseries data Model evaluation Database / Elasticsearch updates Model Updates
  • 16. Task
  • 20. Tooling ●  Athena manager ●  Job configuration ●  Unit test framework ●  Graphite integration ●  Codahale library support ●  Maven archetype ●  Artifactory support
  • 24. Job configuration reference.conf Sandbox Staging ProductionDev athena-core-lib application.conf Sandbox Staging ProductionDev Samza job
  • 25. Job configuration projects { artemis { job_1 { mapper { envs.common { task.inputs = "kafka.topic_1" task.class = "com.uber.athena.SampleTaskClass" } envs.local = ${projects.artemis.job_1.mapper.envs.common} envs.local = { task.window.ms = 30000 } envs.sandbox = ${projects.artemis.job_1.mapper.envs.local} envs.sandbox = { yarn.package.path = "http://artifactory..../artifactory/libs-snapshot-local/com/uber/athena/.../hello-athena.tar.gz" } envs.staging = ${projects.artemis.job_1.mapper.envs.sandbox} envs.production = ${projects.artemis.job_1.mapper.envs.sandbox} }
  • 27. Unit test framework Stream Job StreamTask InitableTask WindowableTask TaskUnitTestHarness Message Listener Inject data Custom IncomingMessageEnvelope Custom MessageCollector
  • 28. Unit test framework String classTaskName = "com.uber.athena.test.TestProcessTask"; // Register the job to the test harness TaskUnitTestHarness<String, Integer> testProcessTask = new TaskUnitTestHarness<>(classTaskName, false, true); // Register a listener for validating the output of every process function testProcessTask.registerMessageListenerOnProcess(new KeyAsserterListener()); // Start the job testProcessTask.start(); // Inject data testProcessTask.inject(key,value); // Get the full output testWindowTask.getResult()
  • 29. Tooling ●  Athena manager ●  Job configuration ●  Unit test framework ●  Graphite integration ●  Codahale library support ●  Maven archetype ●  Artifactory support
  • 31. Observations ●  YARN is not bad ! ●  Offset lag and Buffered messages ●  Kafka Appender for ELK ●  Checkpoint topic partition incorrect count ●  Config validation needs improvement ●  Job restarts are complicated ●  Built-in Metrics are insufficient
  • 32. ●  Seamless upgrades ●  Custom built-in serde support ●  Config validation enhancement ●  Auto benchmarking a Samza job ●  Unit test framework enhancement Upcoming Samza improvements
  • 33. Q&A