StreamING models
Realtime model deployment of ML capabilities
Erik de Nooij, IT Chapter Lead Fraud & Cybersecurity
Who Am I?
• IT Chapter Lead within the Fraud & Cybersecurity department, based in Amsterdam
• Before ING: implemented enterprise software, mainly knowledge management and CRM related
• Background in: Scala, Java, C# (MCSD), Tomcat, WebSphere, Oracle, Cassandra and now... Flink
https://www.linkedin.com/in/erik-de-nooij-93ab1a/
Erik.g.de.Nooij@ing.nl
About ING
Worldwide
• 35 million customers
• 51,000 employees
• Presence in over 40 countries
Netherlands
• 9 million customers
• Billion logins yearly on https://www.ing.nl
• 1 million transactions per day
Market leaders Benelux
Growth markets
Commercial Banking
Challengers
The Netherlands
Threats related to fraud & cybersecurity
[Figure: timeline 2008, 2010, 2012, 2014, 2017.
Criminal organization: Individuals, Small groups, Worldwide groups, Organized crime.
Threats: Fake ID, Skimming, Phishing, APT, ?
Response: Manual detection, Rule based detection, Model based detection, Scanomaly detection]
Carbanak APT (Advanced Persistent Threat)
• This started via a phishing email…
Goals
• Support various types of (ML) models
  • Tools to create models versus scoring models
• One codebase, SaaS deployment model
  • Pre-processor, decoupled architecture
• Make changes instantly (no downtime)
  • Use case
  • Feature extraction
  • Enriching streams
  • End user tooling
  • Demo
• Multiple domains
  • Examples
Support various types of models
Creating models offline, scoring online
[Diagram: Model creation (offline, HDFS) -> Portable model (<PMML />, {PFA}) -> Model execution (online, streaming platform)]
Predictive Model Markup Language (PMML)
• The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format
<SimpleRule score="Alert" weight="1.0">
<CompoundPredicate booleanOperator="and">
<SimplePredicate field="field1" operator="greaterThan" value="500"/>
<SimplePredicate field="field2" operator="equal" value="1"/>
<SimplePredicate field="field3" operator="greaterThan" value="1"/>
</CompoundPredicate>
</SimpleRule>
if field1 > 500 AND field2 == 1 AND field3 > 1
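To make the mapping from the XML rule to the boolean expression concrete, here is a minimal sketch that evaluates the rule from the slide against a feature set using only the Python standard library. It is an illustration, not the OpenScoring.io implementation: the function name `evaluate_rule` and the sample feature values are assumptions, only the two operators used on the slide are handled, and real PMML documents are namespaced and much richer than this fragment.

```python
import operator
import xml.etree.ElementTree as ET

# The rule from the slide, embedded as a string for this sketch.
RULE_XML = """
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
"""

# Only the two comparison operators used above are mapped here.
OPERATORS = {"greaterThan": operator.gt, "equal": operator.eq}


def evaluate_rule(rule_xml, features):
    """Return the rule's score if every predicate holds, else None (hypothetical helper)."""
    rule = ET.fromstring(rule_xml)
    compound = rule.find("CompoundPredicate")
    if compound.get("booleanOperator") != "and":
        raise ValueError("this sketch only handles booleanOperator='and'")
    for pred in compound.findall("SimplePredicate"):
        compare = OPERATORS[pred.get("operator")]
        if not compare(features[pred.get("field")], float(pred.get("value"))):
            return None
    return rule.get("score")


hit = evaluate_rule(RULE_XML, {"field1": 750, "field2": 1, "field3": 3})
miss = evaluate_rule(RULE_XML, {"field1": 750, "field2": 0, "field3": 3})
print(hit, miss)
```

The point of the portable format is visible here: the scoring side only needs a generic evaluator, and the rule itself is data that can be swapped out.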
Machine learning tools supporting PMML
Model scoring using the OpenScoring.io library
• Parse the PMML file(s)
• Pass on the feature set to the model(s)
• Run the ‘predict’ function, which returns the output of the model(s)
[Diagram: a Data stream of Feature sets and a Control stream both enter the model scoring operator, which emits a Score]
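The control stream / data stream split can be sketched as one operator with two inputs: control messages change which models are loaded, data messages are scored against whatever is loaded at that moment. This is a plain-Python illustration, not Flink code; the class and method names are hypothetical, and models are represented as simple callables rather than parsed PMML documents.

```python
class ModelScoringOperator:
    """Holds the current models and scores feature sets against them (hypothetical)."""

    def __init__(self):
        self.models = {}  # model name -> callable(feature_set) -> score

    def on_control(self, name, model):
        # A control message adds or replaces a model while the job keeps running.
        self.models[name] = model

    def on_data(self, feature_set):
        # A data message is scored by every model known at this moment.
        return {name: model(feature_set) for name, model in self.models.items()}


op = ModelScoringOperator()
before = op.on_data({"amount": 1000})
op.on_control("high_amount", lambda f: "Alert" if f["amount"] > 500 else "OK")
after = op.on_data({"amount": 1000})
print(before, after)
```

Because the model arrives as a message rather than as part of the deployed code, adding or replacing it needs no restart, which is what the "no downtime" goal asks for.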
Supported models (*)
Association rules       Regression
Cluster model           Rule set
General regression      Scorecard
Naive Bayes             Support Vector Machine
k-Nearest neighbours    Tree model
Neural network          Ensemble model
(*) Models supported by http://openscoring.io/
Goals
Use of various types of models
One codebase, SaaS Deployment model
Pre-processor, Decoupled architecture
Make changes instantly (no downtime)
Multiple domains
Market leaders Benelux
Growth markets
Commercial Banking
Challengers
One Bank Strategy
How flexible is this architecture?
[Diagram: events carrying Amount = “42,00”, Amountincents = 4200 and Amount = 42.00 go directly into Feature extraction & Model scoring]
Decoupled architecture
[Diagram: business events carrying Amount = “42.00”, Amountincents = 4200 and Amount = 42.00 pass through a Pre-Processor, which emits Amountincents = 4200 to Feature extraction & Model scoring]
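A pre-processor of this kind could look roughly like the sketch below: it accepts the amount encodings shown on the slides and always emits integer cents, so feature extraction and model scoring only ever see one format. The function name `to_amount_in_cents` and the rule that a comma in a string is a decimal separator are assumptions made for illustration.

```python
from decimal import Decimal


def to_amount_in_cents(event):
    """Normalize the amount encodings from the slides to integer cents (hypothetical helper)."""
    if "Amountincents" in event:
        return int(event["Amountincents"])
    raw = event["Amount"]
    if isinstance(raw, str):
        # Assumption: a comma in a string amount is a decimal separator.
        raw = raw.replace(",", ".")
    # Going through str and Decimal avoids binary floating point rounding surprises.
    return int(Decimal(str(raw)) * 100)


events = [{"Amount": "42,00"}, {"Amountincents": 4200}, {"Amount": 42.00}, {"Amount": "42.00"}]
normalized = [{"Amountincents": to_amount_in_cents(e)} for e in events]
print(normalized)
```

Keeping this step outside the scoring job means a new source system with yet another format only requires a change in the pre-processor.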
Goals
Use of various types of models
One codebase, SaaS Deployment model
Make changes instantly (no downtime)
• Use case
• Feature extraction
• Enriching streams
• End user tooling
• Demo
Multiple domains
• Your phone with the banking app installed is stolen
• Limit on the banking app is 1.000,-
• Funds are transferred from your account (A) to a mule account (B)
Use case
Model features and model output
[Diagram: the features "Amount > 500", "NrOf Trxs Last 1h" and "First Trx <24h ago" feed the Model, which outputs Alert || OK]
Stream with stateless operators
[Diagram: event Ev.1 (A to B, 1000) enters Feature extraction (FeX), which passes the feature set (Amount, Unknown, PrevTrxs) = (1000, ?, ?) to Model scoring (PMML)]
Stream with stateful operators
[Diagram: events Ev.1 and Ev.2 (each A to B, 1000) pass through Feature extraction (FeX), backed by STATE, and then Model scoring (PMML), which outputs Alert || OK for each event]
Key                      Value
(A,B, FirstTrx)          Ev.1
(A,B, HistoricalTrxs)    ev1 1000
Amount, Unknown, PrevTrxs: (1000, true, 1)
Key                      Value
(A,B, FirstTrx)          Ev.1
(A,B, HistoricalTrxs)    ev1 1000, ev2 1000
Amount, Unknown, PrevTrxs: (1000, true, 0)
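The difference between the stateless and the stateful variant can be sketched as follows: with per-key state, the feature extractor can fill in the fields that were "?" before. This is a plain-Python illustration with hypothetical names. The slides do not define the features precisely, so the meaning of "Unknown" (pair first seen less than 24 hours ago) and "PrevTrxs" (earlier transactions in the last hour) are assumptions, and the resulting values need not match the numbers on the slide.

```python
from collections import defaultdict

HOUR = 3600


class StatefulFeatureExtractor:
    """Keeps per-(source, destination) state, similar to the Key/Value table above."""

    def __init__(self):
        self.first_trx = {}                       # (src, dst) -> time of first transaction
        self.historical_trxs = defaultdict(list)  # (src, dst) -> [(time, amount), ...]

    def extract(self, src, dst, amount, event_time):
        key = (src, dst)
        # Assumption: "Unknown" means the pair was first seen less than 24h ago.
        first_seen = self.first_trx.setdefault(key, event_time)
        unknown = event_time - first_seen < 24 * HOUR
        # Assumption: "PrevTrxs" counts earlier transactions within the last hour.
        prev_trxs = sum(1 for t, _ in self.historical_trxs[key] if event_time - t <= HOUR)
        self.historical_trxs[key].append((event_time, amount))
        return (amount, unknown, prev_trxs)


fex = StatefulFeatureExtractor()
features_ev1 = fex.extract("A", "B", 1000, event_time=0)
features_ev2 = fex.extract("A", "B", 1000, event_time=600)
print(features_ev1, features_ev2)
```

In a streaming job this dictionary-based state would live in the platform's keyed state so that it is partitioned and checkpointed along with the operator.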
How to perform aggregate functions on a stream?
• Average amount last week: € 37,04
• Max amount last month: € 834,12
[Diagram: event Ev.1 (A, B, IP, 1000) goes through an Aggregation step and then Calculating features; the aggregation state holds lists of IP addresses per key (192.x.x.1, 192.x.x.5 / 192.x.x.2, 192.x.x.6 / 192.x.x.3, 192.x.x.7 / 192.x.x.4, …)]
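One way to answer the question on this slide is to keep, per key, only the events that still fall inside the time window and to compute the aggregate over what remains. The sketch below does that with a deque; the class name `WindowedAmounts`, the sample amounts and the assumption that events arrive in event-time order are all illustrative choices, not details taken from the talk.

```python
from collections import deque

DAY = 24 * 3600


class WindowedAmounts:
    """Amounts for one key, restricted to a sliding time window (hypothetical)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.entries = deque()  # (event_time, amount), oldest first

    def add(self, event_time, amount):
        self.entries.append((event_time, amount))
        # Evict entries that fell out of the window (assumes in-order event times).
        while self.entries and self.entries[0][0] <= event_time - self.window_seconds:
            self.entries.popleft()

    def average(self):
        return sum(a for _, a in self.entries) / len(self.entries) if self.entries else None

    def maximum(self):
        return max((a for _, a in self.entries), default=None)


last_week = WindowedAmounts(7 * DAY)
last_month = WindowedAmounts(30 * DAY)
for event_time, amount in [(0, 800.0), (25 * DAY, 30.0), (29 * DAY, 50.0)]:
    last_week.add(event_time, amount)
    last_month.add(event_time, amount)

print(last_week.average(), last_month.maximum())
```

The weekly window has already dropped the first event, while the monthly window still sees it, so the two aggregates are computed over different subsets of the same stream.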
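Enriching the stream based on multiple keys
[Diagram: a Split step breaks event Ev.1 (A, B, IP, 1000) into one message per key (A, B, A.B, IP); each message is enriched at the task manager that holds the state for its key, giving A’, B’, A.B’ and IP’, for example (A.B’, 1000). The number 3542321 appears on every message.]
Accounts are distributed across the task managers, for example (A, E, I, ..), (B, D, F, ..), (C, G, H, ..), (J, K, ..)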
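The split-and-route idea can be sketched in a few lines: one incoming event becomes one message per key, and a stable hash of the key decides which task manager handles it, so each key's state lives in exactly one place. The names `split` and `route`, the number of task managers, the sample identifiers, and the reading of 3542321 as an event id that ties the messages back together are assumptions made for this illustration. In Flink, key-based partitioning of this kind is what `keyBy` provides.

```python
import zlib

NUM_TASK_MANAGERS = 4  # hypothetical parallelism


def split(event):
    """Emit one (key, event_id, amount) message per key the event is enriched on."""
    a, b, ip = event["A"], event["B"], event["IP"]
    keys = [a, b, f"{a}.{b}", ip]
    return [(key, event["id"], event["amount"]) for key in keys]


def route(key):
    # A stable hash sends every message for a key to the same task manager,
    # so the state for that key is kept in exactly one place.
    return zlib.crc32(key.encode("utf-8")) % NUM_TASK_MANAGERS


event = {"id": 3542321, "A": "acct-A", "B": "acct-B", "IP": "192.0.2.1", "amount": 1000}
messages = split(event)
placement = {key: route(key) for key, _, _ in messages}
print(messages)
print(placement)
```

Since the four messages may land on different task managers, a later step has to join the enriched parts back together using the shared event id.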
Aggregating and model scoring
[Diagram: event Ev.1 (A, B, IP, 1000) enters Aggregation, where the enriched parts A.B’, B’ and IP’ are collected, followed by Model Scoring. Features listed: 1. Amount, 2. (A.B).FirstTrx, 3. (A.B).NrTrxs; (B’): 1. B’; 1. IP’, 2. ….]
Domain Specific Language (DSL)
A DSL is a domain specific language. We use it to define the behaviour of our operators:
• The persist rules (which data to store within state)
• Feature calculation rules
• Model definition rules
Definition instead of code - Persist rule
history[double, 4weeks,100] @(sourceAccntNr.destAccntNr).Trxs := $amount
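The slide does not spell out the semantics of the persist rule, so the following is only one plausible reading: for each (sourceAccntNr, destAccntNr) pair, store the event's amount as a double in a history that keeps at most 100 entries and nothing older than 4 weeks. The class `History`, the function `apply_persist_rule` and the event layout are hypothetical names introduced for this sketch.

```python
from collections import deque

WEEK = 7 * 24 * 3600


class History:
    """Bounded history: at most max_items values, none older than max_age seconds."""

    def __init__(self, max_age, max_items):
        self.max_age = max_age
        self.items = deque(maxlen=max_items)  # a full deque drops its oldest item on append

    def persist(self, event_time, value):
        self.items.append((event_time, float(value)))
        while self.items and self.items[0][0] < event_time - self.max_age:
            self.items.popleft()


state = {}  # (sourceAccntNr, destAccntNr) -> History


def apply_persist_rule(event):
    # Assumed reading of: history[double, 4weeks,100] @(src.dest).Trxs := $amount
    key = (event["sourceAccntNr"], event["destAccntNr"])
    if key not in state:
        state[key] = History(max_age=4 * WEEK, max_items=100)
    state[key].persist(event["eventtime"], event["amount"])


for i in range(150):
    apply_persist_rule({"sourceAccntNr": "A", "destAccntNr": "B", "eventtime": i, "amount": 10})
capped = len(state[("A", "B")].items)

apply_persist_rule({"sourceAccntNr": "A", "destAccntNr": "B", "eventtime": 5 * WEEK, "amount": 99})
kept = list(state[("A", "B")].items)
print(capped, kept)
```

Bounding the history by both age and count keeps the state size predictable per key, which matters when state is held for millions of account pairs.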
NrOf Trxs Last 1h
count(between @(sourceAccntNr.destAccntNr).Trxs, $eventtime,$eventtime-1hour));
First Trx A to B <24h
@(sourceAccntNr.destAccntNr).FirstUsed >= $eventtime-24hours;
Feature Calculation rules
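Read as ordinary code, the two feature calculation rules could be expressed as below. The function names, the sample history and the choice of inclusive interval bounds are assumptions; the DSL itself is not specified beyond what the slide shows.

```python
HOUR = 3600


def nr_of_trxs_last_1h(trxs, event_time):
    """Mirrors: count(between ...Trxs, $eventtime, $eventtime-1hour)."""
    return sum(1 for t, _ in trxs if event_time - HOUR <= t <= event_time)


def first_trx_less_than_24h_ago(first_used, event_time):
    """Mirrors: @(...).FirstUsed >= $eventtime-24hours."""
    return first_used >= event_time - 24 * HOUR


# State for one (sourceAccntNr, destAccntNr) pair: (event time in seconds, amount).
trxs = [(0, 20.0), (9_000, 35.0), (10_000, 1000.0)]
first_used = trxs[0][0]
event_time = 10_500

features = (
    nr_of_trxs_last_1h(trxs, event_time),
    first_trx_less_than_24h_ago(first_used, event_time),
)
print(features)
```

Both rules only read what the persist rule stored, which is why the two kinds of definition can be changed independently of the operator code.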
Creating models offline, scoring online
[Diagram: a data scientist with offline tooling performs Model creation (offline, HDFS) and hands a Portable model (<PMML />, {PFA}) plus DSL over to Model execution (online, streaming platform)]
Control streams
Streaming in the definitions
[Diagram: control streams broadcast the DSL files to the Split, FeX and Model scoring operators]
DSL files
• Model definitions
• Feature calculation rules
• Persist rules
Demo
Goals
Use of various types of models
One codebase, SaaS Deployment model
Make changes instantly (no downtime)
Multiple domains
Multiple domains – ponder on this
We have built a feature-extraction engine and used that to make a Fraud-Risk Engine.
Can we also build this?
• Customer notifications?
• Calculating RFQs for bond prices?
• Product fulfilment engine?
• Other?
Takeaways
• Decoupled architecture with preprocessor
• Enriching events with multiple keys
• End users making changes
• Multiple domains
Flink Forward SF 2017: Erik de Nooij - StreamING models, how ING adds models at runtime to catch fraudsters

Visual basic 6.0
Visual basic 6.0Visual basic 6.0
Visual basic 6.0
Aarti P
 
OneTeam Media Server
OneTeam Media ServerOneTeam Media Server
OneTeam Media Server
Mickaël Rémond
 
EclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
EclipseCon 2008: Fundamentals of the Eclipse Modeling FrameworkEclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
EclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
Dave Steinberg
 
Machine learning on streams of data
Machine learning on streams of dataMachine learning on streams of data
Machine learning on streams of data
Tomasz Sosiński
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
GoDataDriven
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Jim Dowling
 
Implementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoCImplementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoC
jimfuller2009
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Spark Summit
 
WCF and WF in Framework 3.5
WCF and WF in Framework 3.5WCF and WF in Framework 3.5
WCF and WF in Framework 3.5
ukdpe
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
Databricks
 
Attention mechanisms with tensorflow
Attention mechanisms with tensorflowAttention mechanisms with tensorflow
Attention mechanisms with tensorflow
Keon Kim
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
 
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
Publicis Sapient Engineering
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
Chester Chen
 
Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11
Matt Warren
 
Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Compiler Construction for DLX Processor
Compiler Construction for DLX Processor
Soham Kulkarni
 
Advanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Advanced Open IoT Platform for Prevention and Early Detection of Forest FiresAdvanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Advanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Ivo Andreev
 
Performance is a Feature!
Performance is a Feature!Performance is a Feature!
Performance is a Feature!
PostSharp Technologies
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
Mohamed MEJDOUBI
 
Understanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & ConfluentUnderstanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & Confluent
confluent
 
Visual basic 6.0
Visual basic 6.0Visual basic 6.0
Visual basic 6.0
Aarti P
 
EclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
EclipseCon 2008: Fundamentals of the Eclipse Modeling FrameworkEclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
EclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
Dave Steinberg
 
Machine learning on streams of data
Machine learning on streams of dataMachine learning on streams of data
Machine learning on streams of data
Tomasz Sosiński
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
GoDataDriven
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Jim Dowling
 
Implementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoCImplementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoC
jimfuller2009
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Spark Summit
 
WCF and WF in Framework 3.5
WCF and WF in Framework 3.5WCF and WF in Framework 3.5
WCF and WF in Framework 3.5
ukdpe
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
Databricks
 
Attention mechanisms with tensorflow
Attention mechanisms with tensorflowAttention mechanisms with tensorflow
Attention mechanisms with tensorflow
Keon Kim
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
 
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
Publicis Sapient Engineering
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
Chester Chen
 
Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11
Matt Warren
 
Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Compiler Construction for DLX Processor
Compiler Construction for DLX Processor
Soham Kulkarni
 
Advanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Advanced Open IoT Platform for Prevention and Early Detection of Forest FiresAdvanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Advanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Ivo Andreev
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
Mohamed MEJDOUBI
 
Understanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & ConfluentUnderstanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & Confluent
confluent
 
Ad

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward SF 2017: Erik de Nooij - StreamING models, how ING adds models at runtime to catch fraudsters

  • 1. StreamING models: Realtime model deployment of ML capabilities. Erik de Nooij, IT Chapter Lead Fraud & Cybersecurity.
  • 2. Who am I? IT Chapter Lead within the Fraud & Cybersecurity department, based in Amsterdam. Before ING: implemented enterprise software, mainly knowledge management and CRM related. Background in: Scala, Java, C# (MCSD), Tomcat, WebSphere, Oracle, Cassandra and now... Flink. https://www.linkedin.com/in/erik-de-nooij-93ab1a/ Erik.g.de.Nooij@ing.nl
  • 4. About ING. Worldwide: 35 million customers; 51.000 employees; presence in over 40 countries. Netherlands: 9 million customers; billion logins yearly on https://www.ing.nl; 1 million transactions per day. Figure labels: Market leaders Benelux, Growth markets, Commercial Banking, Challengers, The Netherlands.
  • 5. Threats related to fraud & cybersecurity. Timeline: 2008, 2010, 2012, 2014, 2017. Threat actors: individuals, small groups, worldwide groups, organized crime, criminal organization. Threats: fake ID, skimming, phishing, APT, ?. Response: manual detection, rule based detection, model based detection, anomaly detection.
  • 6. Carbanak APT (Advanced Persistent Threat): this started via a phishing email…
  • 7. Goals: support various types of (ML) models (tools to create models versus scoring models); one codebase, SaaS deployment model; make changes instantly (no downtime); multiple domains.
  • 8. Goals: support various types of (ML) models; one codebase, SaaS deployment model (pre-processor, decoupled architecture); make changes instantly (no downtime); multiple domains.
  • 9. Goals: support various types of (ML) models; one codebase, SaaS deployment model; make changes instantly (no downtime): use case, feature extraction, enriching streams, end user tooling, demo; multiple domains.
  • 10. Goals: support various types of (ML) models; one codebase, SaaS deployment model; make changes instantly (no downtime); multiple domains (examples).
  • 12. Creating models offline, scoring online. Model creation: offline, on HDFS. Model execution: online, on the streaming platform. Portable model: <PMML />, {PFA}.
  • 13. Predictive Model Markup Language (PMML): an XML-based predictive model interchange format. Example rule: <SimpleRule score="Alert" weight="1.0"> <CompoundPredicate booleanOperator="and"> <SimplePredicate field="field1" operator="greaterThan" value="500"/> <SimplePredicate field="field2" operator="equal" value="1"/> <SimplePredicate field="field3" operator="greaterThan" value="1"/> </CompoundPredicate> </SimpleRule>, which reads as: if field1 > 500 AND field2 == 1 AND field3 > 1.
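
To make the reading of the rule on slide 13 concrete, here is a minimal Python sketch that parses the same `SimpleRule` fragment and checks it against a record. It is not the OpenScoring.io evaluator used in the talk: the function name `rule_fires` and the sample records are assumptions for illustration, and only the two operators that appear on the slide are handled (a full PMML document also carries a namespace and a data dictionary, which this sketch ignores).

```python
import xml.etree.ElementTree as ET

# The rule fragment from the slide, unchanged.
RULE_XML = """
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
"""

# Only the two operators used on the slide are supported in this sketch.
OPERATORS = {
    "greaterThan": lambda left, right: left > right,
    "equal": lambda left, right: left == right,
}


def rule_fires(rule, record):
    """Return the rule's score if every predicate holds, otherwise None."""
    compound = rule.find("CompoundPredicate")
    if compound.get("booleanOperator") != "and":
        raise ValueError("this sketch only handles booleanOperator='and'")
    for pred in compound.findall("SimplePredicate"):
        compare = OPERATORS[pred.get("operator")]
        field_value = float(record[pred.get("field")])
        if not compare(field_value, float(pred.get("value"))):
            return None
    return rule.get("score")


rule = ET.fromstring(RULE_XML)
alert = rule_fires(rule, {"field1": 750, "field2": 1, "field3": 3})
no_alert = rule_fires(rule, {"field1": 750, "field2": 0, "field3": 3})
```

With these hypothetical records, the first satisfies all three predicates and yields the rule's score; the second fails on `field2` and yields `None`.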
  • 14. Predictive Model Markup Language (PMML): an XML-based predictive model interchange format.
  • 16. Model scoring using the OpenScoring.io library: parse the PMML file(s); pass on the feature set to the model(s); run the ‘predict’ function, which returns the output of the model(s). Diagram: a control stream and a data stream of feature sets feed model scoring, which emits a score.
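
The interplay of control stream and data stream on slide 16 can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: plain Python functions stand in for parsed PMML models, and the names `ModelScorer`, `on_control`, `on_data`, `high_amount` and `stricter` are hypothetical.

```python
class ModelScorer:
    """Scores feature sets with whatever models the control stream installed."""

    def __init__(self):
        self.models = {}  # model name -> callable(feature_set) -> score

    def on_control(self, name, model):
        # A control message adds or replaces a model while scoring continues.
        self.models[name] = model

    def on_data(self, feature_set):
        # Each feature set is scored by all models known at this moment.
        return {name: model(feature_set) for name, model in self.models.items()}


def high_amount(features):
    return "Alert" if features["amount"] > 500 else "OK"


def stricter(features):
    return "Alert" if features["amount"] > 100 else "OK"


scorer = ModelScorer()
# One interleaved stream: the same feature set is scored before and after
# the model is replaced through a control message.
stream = [
    ("control", ("amount_rule", high_amount)),
    ("data", {"amount": 300}),
    ("control", ("amount_rule", stricter)),
    ("data", {"amount": 300}),
]

scores = []
for kind, payload in stream:
    if kind == "control":
        scorer.on_control(*payload)
    else:
        scores.append(scorer.on_data(payload))
```

The second data message is scored by the replacement model without restarting anything, which is the "no downtime" property the goals slide asks for.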
  • 17. Supported models (*): association rules, regression, cluster model, rule set, general regression, scorecard, Naive Bayes, Support Vector Machine, k-nearest neighbours, tree model, neural network, ensemble model. (*) Models supported by http://openscoring.io/
  • 18. Goals: use of various types of models; one codebase, SaaS deployment model (pre-processor, decoupled architecture); make changes instantly (no downtime); multiple domains.
  • 19. One Bank Strategy. Figure labels: Market leaders Benelux, Growth markets, Commercial Banking, Challengers.
  • 20. How flexible is this architecture? Inputs reaching feature extraction & model scoring: Amount = “42,00”; Amountincents = 4200; Amount = 42.00.
  • 21. Decoupled architecture: business events (Amount = “42.00”; Amountincents = 4200; Amount = 42.00) pass through a pre-processor, which delivers Amountincents = 4200 to feature extraction & model scoring.
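
A pre-processor of the kind shown on slide 21 could look like the sketch below, which maps the three amount shapes from the slides onto a single `Amountincents` field. The function names `to_cents` and `preprocess` are assumptions, and so is the parsing rule: a comma is treated as a decimal separator, so inputs with thousands separators (such as "1.000,-") are out of scope here.

```python
from decimal import Decimal


def to_cents(event):
    """Normalize the amount shapes from the slides to integer cents."""
    if "Amountincents" in event:
        return int(event["Amountincents"])
    raw = event["Amount"]
    if isinstance(raw, str):
        raw = raw.replace(",", ".")  # "42,00" -> "42.00"
    # Decimal avoids binary floating point rounding when scaling to cents.
    return int(Decimal(str(raw)) * 100)


def preprocess(event):
    """Return the event with one canonical amount field."""
    rest = {k: v for k, v in event.items() if k not in ("Amount", "Amountincents")}
    return {**rest, "Amountincents": to_cents(event)}


normalized = [
    preprocess({"Amount": "42,00"}),
    preprocess({"Amountincents": 4200}),
    preprocess({"Amount": 42.00}),
]
```

Because every downstream operator sees only `Amountincents`, a new source format means changing the pre-processor alone, not feature extraction or model scoring.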
  • 22. Goals: use of various types of models; one codebase, SaaS deployment model; make changes instantly (no downtime): use case, feature extraction, enriching streams, end user tooling, demo; multiple domains.
  • 23. Use case: your phone with the banking app installed is stolen; the limit on the banking app is 1.000,-; funds are transferred from your account (A) to a mule account (B).
  • 24. Model features and model output. Features: Amount > 500; NrOf Trxs last 1h; First Trx < 24h ago. Model output: Alert || OK.
  • 25. Stream with stateless operators: event Ev.1 (A to B, 1000) goes through feature extraction (FeX); the features (Amount, Unknown, PrevTrxs) = (1000, ?, ?) are passed to model scoring (PMML).
  • 26. Stream with stateful operators: events Ev.1 (A to B, 1000) and Ev.2 (A to B, 1000) go through feature extraction (FeX) and model scoring (PMML), each yielding Alert || OK. State as key/value pairs: first (A,B, FirstTrx) = Ev.1 and (A,B, HistoricalTrxs) = ev1 1000, later (A,B, FirstTrx) = Ev.1 and (A,B, HistoricalTrxs) = ev1 1000, ev2 1000. Feature tuples (Amount, Unknown, PrevTrxs) shown: (1000, true, 1) and (1000, true, 0).
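
The difference between slides 25 and 26 is that the operator remembers earlier events per key. A minimal sketch of such a stateful feature extractor, assuming in-memory dictionaries instead of Flink keyed state, follows. The class name `FeatureExtractor`, the use of seconds for event time, and the interpretation of the middle feature as "first transaction from A to B was less than 24h ago" are all assumptions taken from the feature names on slide 24.

```python
from collections import defaultdict

DAY = 24 * 60 * 60  # seconds


class FeatureExtractor:
    """Keeps per (source, dest) state and derives features from it."""

    def __init__(self):
        self.first_seen = {}              # (source, dest) -> time of first transaction
        self.history = defaultdict(list)  # (source, dest) -> [(time, amount), ...]

    def extract(self, source, dest, amount, event_time):
        key = (source, dest)
        prev_trxs = len(self.history[key])  # counted before storing this event
        first = self.first_seen.setdefault(key, event_time)
        first_trx_recent = event_time - first < DAY
        self.history[key].append((event_time, amount))
        return (amount, first_trx_recent, prev_trxs)


fex = FeatureExtractor()
ev1 = fex.extract("A", "B", 1000, event_time=0)
ev2 = fex.extract("A", "B", 1000, event_time=60)
ev3 = fex.extract("A", "B", 1000, event_time=2 * DAY)
```

A stateless operator could only ever fill in the amount; the other two features exist because the state for key (A, B) survives between events.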
  • 27. How to perform aggregate functions on a stream? Examples: average amount last week: € 37,04; max amount last month: € 834,12.
  • 28. Enriching the stream based on multiple keys. Event Ev.1 (A, B, IP, 1000) is split by key (A, B, A.B, IP); each part goes through an aggregation step that calculates features (A’, B’, A.B’, IP’). Accounts are distributed across the task managers (for example A,E,I ..; B,D,F ..; C,G,H ..; J,K ..), as are IP addresses (192.x.x.1, 192.x.x.5; 192.x.x.2, 192.x.x.6; 192.x.x.3, 192.x.x.7; 192.x.x.4, …).
  • 29. Aggregating and model scoring. Aggregation: the partial results for Ev.1 (A, B, IP, 1000), such as (A.B’, 1000), B’ and IP’, are combined into one feature set: 1. Amount; 2. (A.B).FirstTrx; 3. (A.B).NrTrxs; plus the features from B’ and IP’. Model scoring follows.
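
The split, aggregate and join steps of slides 28 and 29 can be sketched in a single process as below. This only illustrates the data flow: in the talk the parts are distributed across Flink task managers, whereas here a dictionary per key stands in for that. The names `KeyedCounter`, `KEYS`, `enrich` and the event field names are hypothetical, and a simple event count stands in for the real aggregations.

```python
from collections import defaultdict


class KeyedCounter:
    """Stand-in for one keyed aggregation step: counts events per key value."""

    def __init__(self):
        self.counts = defaultdict(int)

    def update(self, key_value):
        self.counts[key_value] += 1
        return self.counts[key_value]


# One key selector per split branch: A, B, the pair A.B, and the IP address.
KEYS = {
    "A": lambda e: e["source"],
    "B": lambda e: e["dest"],
    "A.B": lambda e: (e["source"], e["dest"]),
    "IP": lambda e: e["ip"],
}
aggregators = {name: KeyedCounter() for name in KEYS}


def enrich(event):
    # Split: one partial per key, each aggregated against its own state.
    partials = {
        name: aggregators[name].update(key_of(event))
        for name, key_of in KEYS.items()
    }
    # Join: merge the partial results back onto the original event.
    return {**event, **{f"{name}.NrTrxs": n for name, n in partials.items()}}


e1 = enrich({"source": "A", "dest": "B", "ip": "192.0.2.1", "amount": 1000})
e2 = enrich({"source": "A", "dest": "C", "ip": "192.0.2.1", "amount": 50})
```

After the second event, the counts for account A and for the IP address are 2, while the counts for destination C and for the pair (A, C) are 1, so one event ends up enriched with features from several independently keyed states.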
  • 30. Domain Specific Language (DSL): a DSL is a domain-specific language. We use it to define the behaviour of our operators: the persist rules (which data to store within state); feature calculation rules; model definition rules.
  • 31. Definition instead of code, persist rule: history[double, 4weeks,100] @(sourceAccntNr.destAccntNr).Trxs := $amount
  • 32. Feature calculation rules. NrOf Trxs last 1h: count(between @(sourceAccntNr.destAccntNr).Trxs, $eventtime,$eventtime-1hour)); First Trx A to B <24h: @(sourceAccntNr.destAccntNr).FirstUsed >= $eventtime-24hours;
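
The slides do not spell out the DSL semantics, so the following Python sketch rests on an assumed reading: `history[double, 4weeks,100]` keeps at most 100 values no older than four weeks, and `count(between ...)` counts stored entries inside a time range. The class `History` and its method names are hypothetical, event times are seconds, and events are assumed to arrive in non-decreasing time order.

```python
from collections import deque

HOUR = 60 * 60
WEEK = 7 * 24 * HOUR


class History:
    """Keeps (time, value) pairs, bounded by age and by number of entries."""

    def __init__(self, max_age, max_items):
        self.max_age = max_age
        self.items = deque(maxlen=max_items)  # oldest entries drop off first

    def add(self, event_time, value):
        self.items.append((event_time, value))
        # Evict entries that fell out of the retention period.
        while self.items and self.items[0][0] < event_time - self.max_age:
            self.items.popleft()

    def count_between(self, start, end):
        return sum(1 for t, _ in self.items if start <= t <= end)


# Persist rule: store the amount of each transaction from A to B.
trxs = History(max_age=4 * WEEK, max_items=100)
for t in (0, 6000, 8000, 9000):
    trxs.add(t, 1000.0)

# Feature rule: number of transactions in the hour up to the event time.
event_time = 9000
nr_trxs_last_hour = trxs.count_between(event_time - HOUR, event_time)
```

Whether the current event counts towards its own feature is a design choice the slides leave open; in this sketch it does, because the value is stored before the count is taken.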
  • 33. Creating models offline, scoring online. Model creation: offline, on HDFS, by a data scientist with offline tooling. Model execution: online, on the streaming platform. Portable model: <PMML />, {PFA}, DSL.
  • 35. Streaming in the definitions: DSL files (model definitions, feature calculation rules, persist rules) are broadcast to the split, FeX and model scoring operators.
  • 36. Demo
  • 37. Goals: use of various types of models; one codebase, SaaS deployment model; make changes instantly (no downtime); multiple domains.
  • 38. Multiple domains: ponder on this. We have built a feature-extraction engine and used that to make a Fraud-Risk Engine. Can we also build this? Customer notifications? Calculating RFQs for bond prices? Product fulfilment engine? Other?
  • 39. Take aways: decoupled architecture with pre-processor; enriching events with multiple keys; end users making changes; multiple domains.