SlideShare a Scribd company logo
Realtime Statistics based on Apache Storm and RocketMQ
- Xin Wang
Dec.16, 2017, Shenzhen, Apache RocketMQ Meetup
Xin Wang
• Apache Storm Committer & PMC member
• Five years distributed system experience
• Love open source & community
• Focus on distributed technologies, especially stream processing
• https://github.com/vesense
Streaming and batch which come from different worlds,
use different ways to solve different problems.
- Xin
Index
Part-1: Streaming Ecosystem
Part-2: Stateful Statistics based on Storm & RocketMQ
Part-3: Best Practices
01
Streaming Ecosystem
The Streaming Ecosystem
Collector
Messaging
SQL
Streaming-
Connector
Streaming-
Connector
Storage
APP
Streaming-
State
Schema-
Registry
CEP ML ...
Stream	API
Runtime
Deploy:	Local,	Cluster,	Cloud
Streaming-
Manager
Streaming
...
Messaging:	apache/kafka,rocketmq,pulsar
Streaming:	apache/storm,flink,spark-streaming,kafka-streams
Schema-Registry:	hortonworks/registry,	confluentinc/schema-registry
Streaming-Manager:	hortonworks/streamline
Which one is better for me?
• Simple API
• Fault-tolerant/Stable
• Scalable
• Performance(high throughput & low latency)
• Guarantees: at-least-once/exactly-once
• Mature
• Ecosystem
• Operation and Maintenance
• Support
• Code
Storm 2.0
• Port Clojure to Java
• Unified Stream API
• Storm-SQL Improvements
• Metrics V2
• Threading model Redesign
• Lambda Expression Support - bolt: `tuple -> System.out.println(tuple)`
• Apache Beam Runner
• Worker Classloader Isolation
• Dynamic Topology Updates
• ......
02
Stateful Statistics based on Storm & RocketMQ
Realtime Architecture
Apache	HBase
MySQL
S
S
P R
Source	
Topic
Retry
Topic
Sink	Topic
Apache	RocketMQ
Apache	RocketMQ
Apache	Storm
Stateful Statistics Challenges
1 2 3
1 2 3
3 2 1
time
1 2
3
~
loss
duplicating
out-of-order
mutex
Q:
complex, state machine + streaming
open source middleware?
A:
• loss -> compensating
• s1 -> s2 on condition when e3
• duplicating	->	idempotent
• exists(key)
• out-of-order -> compensating+idempotent
• mutex -> +/-
• s1++ && s2--
1 2 3 expected
Stateful Event Counting: Alien
Alien? a stateful event counting middleware.
• Support event loss, duplicating, out-of-order, mutex
• Support time/event Window API
• Support integrating with streaming systems
• Support dimension changing
• Support sync/async snapshot storage
• Support user defined State, Snapshot Serializer
• Support state REST interfaces
• ...
Alien	alien	=	Alien.createAlien()
.withState(new	LocalMemoryState())
.withWindow(new	TimeWindow(2){
@Override
public	void	accept(Map<String,View>	views)	{
View	view	=	views.get(“report”);
List<Row>	rows	=	view.getRows();
for	(Row	r :	rows)	{
println(r.getDimensions()	+	“->”	+	r.getMetrics());
}
}
});	
alien.putEvent(new	Event("name","key") );
03
Best Practices
Best Practices
• Worker heavy GCs:
• worker restart, heartbeat timeout, bad performance -> take care of your heap memory usage. Do you use local
caches? Reasonable JVM options? e.g. -XX:CMSInitiatingOccupancyFraction
• Topology design vs performance
• bad performance -> put the lightweight logics into the same bolt/operator
• Too many executors/tasks
• high cluster CPU load, bad performance -> tuning the number of threads:
for CPU-intensive task: task parallelism <= vcore,
for IO-intensive task: vcore <= task parallelism <= N*vcore.
Warn: runnable sun.nio.ch.EPollArrayWrapper.epollWait
CPU: user cpu or sys cpu? Load: runnable task or io task?
Amdahl law: Non-Parallelizable + Parallelizable
• Data hot point / data skew
• some nodes have bad performance -> choose the right hash key, two-phase aggregation, or use micro-batch
• Big objects serialization:
• bad performance -> reduce the size of objects, and enable kryo registry(from 55ms to 11ms after kryo registry)
• Too many logs:
• bad performance -> never log the logs unnecessary
Data Hot Point / Data Skew
Q:
partition = hash(key) % N
A:
• Choose the right hash key
• mapreduce from history?
• key == null?
• Two-phase aggregation
• Use micro-batch / local-reduce
S
P
P
S
P
P
S
P
P
G
k1
k2
k1+salt1
k1+salt2
k1
k2micro-batch
num(k1)
num(P)
num(S)
RocketMQ-Streaming Integration
RocketMQ-Storm: https://github.com/apache/storm/tree/master/external/storm-rocketmq
• RocketMQSpout - Now only RocketMQ push mode supported, pull mode is in the plan. The default Deserializer is
StringScheme, you can override the value by setting `RocketMQConfig.SCHEME`.
• RocketMQBolt - Async sending by default, or you can change the value by invoking `withAsync(boolean async)`
• RocketMQState - For users using Storm Trident API
• TopicSelector - Selecting a topic based on the input Storm tuple
• TupleToMessageMapper - Mapping a Storm tuple to a RocketMQ message, you can implement the
MessageBodySerializer interface to serialize the message body. The default implementation of MessageBodySerializer
is `body.toString().getBytes(StandardCharsets.UTF_8)`
• MessageRetryManager - Retry policy for failed messages
RocketMQ-Spark: https://github.com/apache/rocketmq-externals/tree/master/rocketmq-spark
RocketMQ-Flink: Coming soon
RocketMQ-Avro: Coming soon
OpenMessaging-Streaming
Ad

More Related Content

What's hot (20)

Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
HostedbyConfluent
 
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
StreamNative
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAXAlpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Richard Rabins
 
How to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams SafeHow to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams Safe
confluent
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
Ayyappadas Ravindran (Appu)
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
makker_nl
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
HostedbyConfluent
 
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
HostedbyConfluent
 
Webinar Slides: Real time Recommendations with Redis, Java and Websockets
Webinar Slides: Real time Recommendations with Redis, Java and WebsocketsWebinar Slides: Real time Recommendations with Redis, Java and Websockets
Webinar Slides: Real time Recommendations with Redis, Java and Websockets
Redis Labs
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
Apache Kafka TLV
 
Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer
confluent
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replication
Venu Ryali
 
LCA13: Web Server and Caching Technologies
LCA13: Web Server and Caching TechnologiesLCA13: Web Server and Caching Technologies
LCA13: Web Server and Caching Technologies
Linaro
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
Venu Ryali
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
Akash Vacher
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
confluent
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and Kubernetes
Venu Ryali
 
Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018
Ying Xu
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
the100rabh
 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
HostedbyConfluent
 
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
StreamNative
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAXAlpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Richard Rabins
 
How to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams SafeHow to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams Safe
confluent
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
makker_nl
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
HostedbyConfluent
 
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
HostedbyConfluent
 
Webinar Slides: Real time Recommendations with Redis, Java and Websockets
Webinar Slides: Real time Recommendations with Redis, Java and WebsocketsWebinar Slides: Real time Recommendations with Redis, Java and Websockets
Webinar Slides: Real time Recommendations with Redis, Java and Websockets
Redis Labs
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
Apache Kafka TLV
 
Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer
confluent
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replication
Venu Ryali
 
LCA13: Web Server and Caching Technologies
LCA13: Web Server and Caching TechnologiesLCA13: Web Server and Caching Technologies
LCA13: Web Server and Caching Technologies
Linaro
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
Venu Ryali
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
Akash Vacher
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
confluent
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and Kubernetes
Venu Ryali
 
Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018
Ying Xu
 

Similar to 3.2 Streaming and Messaging (20)

HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
rcastain
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
Stéphane Maldini
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and Messaging
Xin Wang
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
Xin Wang
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Scalable Web Apps
Scalable Web AppsScalable Web Apps
Scalable Web Apps
Piotr Pelczar
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
NETWAYS
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
Alex Moskvin
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Concurrency
ConcurrencyConcurrency
Concurrency
Biju Nair
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
Coburn Watson
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
Spark Summit
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
Stavros Kontopoulos
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Databricks
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
DataWorks Summit/Hadoop Summit
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
rcastain
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
Stéphane Maldini
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and Messaging
Xin Wang
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
Xin Wang
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
NETWAYS
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
Alex Moskvin
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
Coburn Watson
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
Spark Summit
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Databricks
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
DataWorks Summit/Hadoop Summit
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Ad

Recently uploaded (20)

Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Ad

3.2 Streaming and Messaging

  • 1. Realtime Statistics based on Apache Storm and RocketMQ - Xin Wang Dec.16, 2017, Shenzhen, Apache RocketMQ Meetup
  • 2. Xin Wang • Apache Storm Committer & PMC member • Five years distributed system experience • Love open source & community • Focus on distributed technologies, especially stream processing • https://github.com/vesense
  • 3. Streaming and batch which come from different worlds, use different ways to solve different problems. - Xin
  • 4. Index Part-1: Streaming Ecosystem Part-2: Stateful Statistics based on Storm & RocketMQ Part-3: Best Practices
  • 6. The Streaming Ecosystem Collector Messaging SQL Streaming- Connector Streaming- Connector Storage APP Streaming- State Schema- Registry CEP ML ... Stream API Runtime Deploy: Local, Cluster, Cloud Streaming- Manager Streaming ... Messaging: apache/kafka,rocketmq,pulsar Streaming: apache/storm,flink,spark-streaming,kafka-streams Schema-Registry: hortonworks/registry, confluentinc/schema-registry Streaming-Manager: hortonworks/streamline
  • 7. Which one is better for me? • Simple API • Fault-tolerant/Stable • Scalable • Performance(high throughput & low latency) • Guarantees: at-least-once/exactly-once • Mature • Ecosystem • Operation and Maintenance • Support • Code
  • 8. Storm 2.0 • Port Clojure to Java • Unified Stream API • Storm-SQL Improvements • Metrics V2 • Threading model Redesign • Lambda Expression Support - bolt: `tuple -> System.out.println(tuple)` • Apache Beam Runner • Worker Classloader Isolation • Dynamic Topology Updates • ......
  • 9. 02 Stateful Statistics based on Storm & RocketMQ
  • 11. Stateful Statistics Challenges 1 2 3 1 2 3 3 2 1 time 1 2 3 ~ loss duplicating out-of-order mutex Q: complex, state machine + streaming open source middleware? A: • loss -> compensating • s1 -> s2 on condition when e3 • duplicating -> idempotent • exists(key) • out-of-order -> compensating+idempotent • mutex -> +/- • s1++ && s2-- 1 2 3 expected
  • 12. Stateful Event Counting: Alien Alien? a stateful event counting middleware. • Support event loss, duplicating, out-of-order, mutex • Support time/event Window API • Support integrating with streaming systems • Support dimension changing • Support sync/async snapshot storage • Support user defined State, Snapshot Serializer • Support state REST interfaces • ... Alien alien = Alien.createAlien() .withState(new LocalMemoryState()) .withWindow(new TimeWindow(2){ @Override public void accept(Map<String,View> views) { View view = views.get(“report”); List<Row> rows = view.getRows(); for (Row r : rows) { println(r.getDimensions() + “->” + r.getMetrics()); } } }); alien.putEvent(new Event("name","key") );
  • 14. Best Practices • Worker heavy GCs: • worker restart, heartbeat timeout, bad performance -> take care of your heap memory usage. Do you use local caches? Reasonable JVM options? e.g. -XX:CMSInitiatingOccupancyFraction • Topology design vs performance • bad performance -> put the lightweight logics into the same bolt/operator • Too many executors/tasks • high cluster CPU load, bad performance -> tuning the number of threads: for CPU-intensive task: task parallelism <= vcore, for IO-intensive task: vcore <= task parallelism <= N*vcore. Warn: runnable sun.nio.ch.EPollArrayWrapper.epollWait CPU: user cpu or sys cpu? Load: runnable task or io task? Amdahl law: Non-Parallelizable + Parallelizable • Data hot point / data skew • some nodes have bad performance -> choose the right hash key, two-phase aggregation, or use micro-batch • Big objects serialization: • bad performance -> reduce the size of objects, and enable kryo registry(from 55ms to 11ms after kryo registry) • Too many logs: • bad performance -> never log the logs unnecessary
  • 15. Data Hot Point / Data Skew Q: partition = hash(key) % N A: • Choose the right hash key • mapreduce from history? • key == null? • Two-phase aggregation • Use micro-batch / local-reduce S P P S P P S P P G k1 k2 k1+salt1 k1+salt2 k1 k2micro-batch num(k1) num(P) num(S)
  • 16. RocketMQ-Streaming Integration RocketMQ-Storm: https://github.com/apache/storm/tree/master/external/storm-rocketmq • RocketMQSpout - Now only RocketMQ push mode supported, pull mode is in the plan. The default Deserializer is StringScheme, you can override the value by setting `RocketMQConfig.SCHEME`. • RocketMQBolt - Async sending by default, or you can change the value by invoking `withAsync(boolean async)` • RocketMQState - For users using Storm Trident API • TopicSelector - Selecting a topic based on the input Storm tuple • TupleToMessageMapper - Mapping a Storm tuple to a RocketMQ message, you can implement the MessageBodySerializer interface to serialize the message body. The default implementation of MessageBodySerializer is `body.toString().getBytes(StandardCharsets.UTF_8)` • MessageRetryManager - Retry policy for failed messages RocketMQ-Spark: https://github.com/apache/rocketmq-externals/tree/master/rocketmq-spark RocketMQ-Flink: Coming soon RocketMQ-Avro: Coming soon OpenMessaging-Streaming