Cloud-native Stream Processor
S. Suhothayan (Suho)
Director WSO2
@suhothayan
Adaptation of Microservices for Stream Processing
Organizations try to:
1. Port legacy big data solutions to the cloud.
- Not designed for a microservices architecture.
- Massive (needs 5–6 large nodes).
- Needs multiple tools for integration and analytics.
2. Build microservices using embeddable libraries such as Siddhi or Esper.
- These do not provide scalability and fault tolerance on their own.
Introducing A Cloud Native Stream Processor
● Lightweight (low memory footprint and quick startup).
● 100% open source (no commercial features), under Apache License v2.
● Native support for Docker and Kubernetes.
● Supports agile DevOps workflows and full CI/CD pipelines.
● Event processing logic can be written in an SQL-like query language or built via a graphical tool.
● A single tool for data collection, ingestion, processing, analysis, integration (with services and databases), and notification management.
Overview
Key features
● Native distributed deployment in Kubernetes
● Native CDC support for Oracle, MySQL, MSSQL, and Postgres (sketch below)
● Long-running aggregations, from seconds to years
● Complex pattern detection
● Online machine learning
● Synchronous decision making
● DB integration with caching
● Service integration with error handling
● Multiple built-in connectors (file, Kafka, NATS, gRPC, ...)
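For example, CDC can feed database changes straight into a stream. A minimal sketch based on the siddhi-io-cdc extension; the connection details, table, and attribute names are placeholders:

-- a sketch of a CDC source (siddhi-io-cdc); URL, credentials, and table are placeholders
@source(type = 'cdc', url = 'jdbc:mysql://localhost:3306/shop',
        username = 'user', password = 'pass',
        table.name = 'orders', operation = 'insert',
        @map(type = 'keyvalue'))
define stream OrderInsertStream (custId string, item string, amount int);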
Scenarios and Use Cases Supported by Siddhi
1. Realtime Policy Enforcement Engine
2. Notification Management
3. Streaming Data Integration
4. Fraud Detection
5. Stream Processing at Scale on Kubernetes
6. Embedded Decision Making
7. Monitoring and Time Series Data Analytics
8. IoT, Geo and Edge Analytics
9. Realtime Decision as a Service
10. Realtime Predictions With Machine Learning
Find out more about the supported Siddhi scenarios at https://siddhi.io.
Success Stories
Experian makes real-time marketing channel decisions in under 200 milliseconds using Siddhi.
Eurecat built their next-generation shopping experience by integrating iBeacons and IoT devices with WSO2.
Cleveland Clinic and Hospital Corporation of America detect critical patient conditions, alert nurses, and automate decisions during emergencies.
BNY Mellon uses Siddhi as a notification management engine.
Success Stories ...
TfL uses WSO2 real-time streaming to create next-generation transport systems.
Uber detected fraud in real time, processing over 400K events per second.
WSO2 uses Siddhi as the throttling engine of its API Manager and as the policy manager for its Identity and Access platform.
eBay and PayPal use Siddhi as part of Apache Eagle as a policy enforcement engine.
Working with Siddhi
● Develop apps using the Siddhi Editor.
● CI/CD with build integration and the Siddhi Test Framework.
● Running modes:
○ Embedded in Java/Python apps.
○ Microservice on bare metal/VM.
○ Microservice in Docker.
○ Microservice in Kubernetes (distributed deployment with NATS).
Streaming SQL
@app:name('Alert-Processor')
@source(type='kafka', ..., @map(type='json'))
define stream TemperatureStream (roomNo string, temp double);
@info(name='AlertQuery')
from TemperatureStream#window.time(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert into AvgTemperatureStream;
Source/Sink & Streams
Window Query with Rate Limiting
Web Based Source Editor
Web Based Graphical Editor
Siddhi Editor Introduction Video
Reference CI/CD Pipeline of Siddhi
https://medium.com/siddhi-io/building-an-efficient-ci-cd-pipeline-for-siddhi-c33150721b5d
Supported Data Processing Patterns
1. Consume and publish events with various data formats.
2. Data filtering and preprocessing.
3. Data transformation.
4. Database integration and caching.
5. Service integration and error handling.
6. Data summarization.
7. Rule processing.
8. Serving online and predefined ML models.
9. Scatter-gather and data pipelining.
10. Realtime decisions as a service (on-demand processing).
Scenario: Order Processing
Customers place orders.
Shipments are made.
Customers pay for the order.
Tasks:
● Process order fulfillment.
● Send alerts on abnormal conditions.
● Send recommendations.
● Throttle order requests when the limit is exceeded.
● Provide order analytics over time.
Consume and Publish Events With Various Data Formats
Supported transports
● NATS, Kafka, RabbitMQ, JMS, IBM MQ, MQTT
● Amazon SQS, Google Pub/Sub
● HTTP, gRPC, TCP, Email, WebSocket
● Change Data Capture (CDC)
● File, S3, Google Cloud Storage
Supported data formats
● JSON, XML, Avro, Protobuf, Text, Binary, Key-value, CSV
Default JSON mapping:
@source(type = 'mqtt', …, @map(type = 'json'))
define stream OrderStream (custId string, item string, amount int);
Incoming event: {"event":{"custId":"15","item":"shirt","amount":2}}

Custom JSON mapping:
@source(type = 'mqtt', …,
        @map(type = 'json', @attributes("$.id", "$.itm", "$.count")))
define stream OrderStream (custId string, item string, amount int);
Incoming event: {"id":"15","itm":"shirt","count":2}
Data Filtering and Preprocessing
Filtering
● Value ranges
● String matching
● Regex
Setting Defaults
● Null checks
● Default function
● If-then-else function
define stream OrderStream
(custId string, item string, amount int);

from OrderStream[item != "unknown"]
select default(custId, "internal") as custId,
       item,
       ifThenElse(amount < 0, 0, amount) as amount
insert into CleansedOrderStream;
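The list above also mentions regex filtering; a minimal sketch, assuming the siddhi-execution-string extension's regexp function (stream and output names reuse or extend those above):

-- keep only orders whose item name is alphabetic and whose amount is positive
from OrderStream[str:regexp(item, '^[A-Za-z ]+$') and amount > 0]
select custId, item, amount
insert into ValidOrderStream;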
Data Transformation
Data extraction
● JSON, Text
Reconstruct messages
● JSON, Text
Inline operations
● Math, Logical operations
Inbuilt functions
● 60+ extensions
Custom functions
● Java, JS
json:getDouble(json,"$.amount") as amount
str:concat('Hello ', name) as greeting
amount * price as cost
time:extract('DAY', datetime) as day
myFunction(item, price) as discount
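Combined in a single query, these operations might look like the following sketch; the stream and attribute names (payload, custName, orderDate) are illustrative:

-- a sketch combining extraction, string, and time functions in one projection
from OrderJsonStream
select json:getDouble(payload, '$.amount') as amount,
       str:concat('Hello ', custName) as greeting,
       time:extract('DAY', orderDate) as day
insert into TransformedOrderStream;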
Database Integration and Caching
Supported Databases:
● RDBMS (MySQL, Oracle, DB2, PostgreSQL, H2), Redis, Hazelcast
● MongoDB, HBase, Cassandra, Solr, Elasticsearch
In-memory Table
Joining a stream with an in-memory table.

define stream CleansedOrderStream
(custId string, item string, amount int);

@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable (name string, unitPrice double);

from CleansedOrderStream as O join ItemPriceTable as T
on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
Database Integration
Joining a stream with a database-backed table.

define stream CleansedOrderStream
(custId string, item string, amount int);

@store(type='rdbms', …)
@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable (name string, unitPrice double);

from CleansedOrderStream as O join ItemPriceTable as T
on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
Database Caching
Joining a stream with a cached table (preloads data for high read performance).

define stream CleansedOrderStream
(custId string, item string, amount int);

@store(type='rdbms', …, @cache(cache.policy='LRU', … ))
@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable (name string, unitPrice double);

from CleansedOrderStream as O join ItemPriceTable as T
on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
Service Integration and Error Handling
Enriching data with HTTP and gRPC service calls:
● Non-blocking
● Responses handled based on status codes (e.g., 200 vs. 4xx)
SQL for HTTP Service Integration
Calling an external HTTP service and consuming the response.

-- call the service
@sink(type='http-call', publisher.url="http://mystore.com/discount",
      sink.id="discount", @map(type='json'))
define stream EnrichedOrderStream (custId string, item string, price double);

-- consume the response
@source(type='http-call-response', http.status.code="200",
        sink.id="discount", @map(type='json',
        @attributes(custId="trp:custId", ..., price="$.discountedPrice")))
define stream DiscountedOrderStream (custId string, item string, price double);
Error Handling Options
Options when the endpoint is not available:
● Log and drop the events.
● Wait and apply back pressure until the service becomes available.
● Divert events to another stream for error handling.
In all cases, the system continuously retries to reconnect.
Events Diverted Into an Error Stream
Diverting connection-failure events into a table.

@onError(action='stream')
@sink(type='http', publisher.url='http://localhost:8080/logger',
      on.error='stream', @map(type='json'))
define stream DiscountedOrderStream (custId string, item string, price double);

from !DiscountedOrderStream
select custId, item, price, _error
insert into FailedEventsTable;
Data Summarization
Types of data summarization
● Time based
○ Sliding time window
○ Tumbling time window
○ On time granularities (seconds to years)
● Event count based
○ Sliding length window
○ Tumbling length window
● Session based (see the sketch below)
● Frequency based
Types of aggregations
● Sum, Count, Avg, Min, Max, DistinctCount, StdDev
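The session-based option has no example on the next slide; a minimal sketch, assuming Siddhi's session window with a 5-minute session gap keyed by custId (the output stream name is illustrative):

-- per-customer totals within 5-minute session gaps
from DiscountedOrderStream#window.session(5 min, custId)
select custId, sum(price) as totalPrice
group by custId
insert into SessionSummaryStream;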
Summarizing Data Over a Shorter Period of Time
Use a window query with aggregation and rate limiting to summarize orders over time for each customer.

define stream DiscountedOrderStream (custId string, item string, price double);

from DiscountedOrderStream#window.time(10 min)
select custId, sum(price) as totalPrice
group by custId
insert into AlertStream;
Aggregation Over Multiple Time Granularities
Aggregations on every second, minute, hour, …, year.
Built using a lambda (λ) architecture:
● In-memory real-time data (speed and serving layers)
● RDBMS-based historical data (batch layer)

define aggregation OrderAggregation
from OrderStream
select custId, itemId, sum(price) as total, avg(price) as avgPrice
group by custId, itemId
aggregate every sec ... year;
Data Retrieval from Aggregations
Queries retrieve data for the relevant time interval and granularity.
Data is retrieved from both memory and the DB with millisecond accuracy.
from OrderAggregation
within "2019-10-06 00:00:00",
"2019-10-30 00:00:00"
per "days"
select total as orders;
Rule Processing
Types of predefined rules
● Rules on a single event
○ Filter, if-then-else, match, etc.
● Rules on a collection of events
○ Summarization
○ Join with window or table
● Rules based on event occurrence order
○ Pattern detection
○ Trend (sequence) detection
○ Non-occurrence of events
Alert Based On Event Occurrence Order
Use a pattern query to detect event occurrence order and non-occurrence.

define stream OrderStream (custId string, orderId string, ...);
define stream PaymentStream (orderId string, ...);

from every (e1=OrderStream) ->
     not PaymentStream[e1.orderId == orderId] for 15 min
select e1.custId, e1.orderId, ...
insert into PaymentDelayedStream;

This query fires on the non-occurrence of a payment event within 15 minutes of an order.
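Trend (sequence) detection, listed earlier, chains consecutive events. A minimal sketch that flags three consecutive orders with strictly increasing prices; the output stream name is illustrative:

-- a sequence of three consecutive events with rising prices
from every e1=DiscountedOrderStream,
     e2=DiscountedOrderStream[price > e1.price],
     e3=DiscountedOrderStream[price > e2.price]
select e1.custId, e3.price as peakPrice
insert into RisingSpendStream;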
Serving Online and Predefined ML Models
Types of machine learning and artificial intelligence processing
● Anomaly detection
○ Markov models
● Serving pre-created ML models
○ PMML (built from Python, R, Spark, H2O.ai, etc.)
○ TensorFlow
● Online machine learning
○ Clustering
○ Classification
○ Regression

Finding recommendations with a pre-created PMML model:

from OrderStream
#pmml:predict('/home/user/ml.model', custId, itemId)
insert into RecommendationStream;
Scatter-gather and Data Pipelining
Divide a message into sub-elements, process each, and combine the results.
Example:
json:tokenize() -> process -> window.batch() -> json:group()
str:tokenize() -> process -> window.batch() -> str:groupConcat()
Data flow: {x,x,x} → {x},{x},{x} → {y},{y},{y} → {y,y,y}
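A minimal sketch of the JSON variant, assuming the siddhi-execution-json extension; the JSONPath '$.orders', the tokenize output attribute jsonElement, and the stream names are illustrative:

define stream OrderBatchStream (payload string);

-- scatter: emit one event per JSON element found under '$.orders'
from OrderBatchStream#json:tokenize(payload, '$.orders')
select jsonElement as order
insert into SingleOrderStream;

-- gather: batch the elements that arrive together and regroup them into one JSON array
from SingleOrderStream#window.batch()
select json:group(order) as orders
insert into GroupedOrderStream;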
Modularization
● Create a Siddhi app per use case (a collection of queries).
● Connect multiple Siddhi apps using in-memory sources and sinks (see the sketch below).
● Allow rules to be added and deleted at runtime.
A typical Siddhi runtime hosts one app for data capture and preprocessing, one app per use case, and one app for common data publishing logic.
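A minimal sketch of wiring two apps over an in-memory topic; the topic name is illustrative:

-- in the upstream Siddhi app
@sink(type='inMemory', topic='cleansed-orders', @map(type='passThrough'))
define stream CleansedOrderStream (custId string, item string, amount int);

-- in a downstream Siddhi app
@source(type='inMemory', topic='cleansed-orders', @map(type='passThrough'))
define stream CleansedOrderStream (custId string, item string, amount int);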
Periodically Trigger Events
Periodic events can be generated to initialize data pipelines
● Time interval
● Cron expression
● At start
define trigger FiveMinTrigger at every 5 min;
define trigger WorkStartTrigger at '0 15 10 ? * MON-FRI';
define trigger InitTrigger at 'start';
Realtime Decisions As A Service
Query data stores using REST APIs (sketch below)
● Database-backed stores (RDBMS, NoSQL)
● Named aggregations
● In-memory windows & tables
Call HTTP and gRPC services using REST APIs
● Use service and service-response loopbacks
● Process a Siddhi query chain and send the response synchronously
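On-demand (store) queries are plain Siddhi queries submitted over the runtime's REST API; a minimal sketch against the ItemPriceTable defined earlier (endpoint details omitted):

-- executed on demand via the store-query REST API
from ItemPriceTable
on name == 'shirt'
select name, unitPrice;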
Scalable Deployment
Deployment of Scalable Stateful Apps
● Data is kept in memory.
● Periodic state snapshots are taken, and data is replayed from NATS on failure.
● Scalability is achieved by partitioning data by key.
How Checkpointing Works
The system snapshots state periodically and replays data from the source upon failure.
Siddhi Deployment in Kubernetes
$ kubectl get siddhi
NAME         STATUS    READY   AGE
sample-app   Running   2/2     5m
@source(type = 'http', …, @map(type = 'json'))
define stream ProductionStream (name string, amount double, factoryId int);

@dist(parallel = '4', execGroup = 'gp1')
from ProductionStream[amount > 100]
select *
insert into HighProductionStream;

@dist(parallel = '2', execGroup = 'gp2')
partition with (factoryId of HighProductionStream)
begin
from HighProductionStream#window.timeBatch(1 min)
select factoryId, sum(amount) as amount
group by factoryId
insert into ProdRateStream;
end;
Sample Distributed Siddhi App
Thank You
For more information visit
https://siddhi.io