
Learning Framework

The document outlines a framework for Adversarial Bandit Control Learning, emphasizing its interpretability, flexibility, and robustness in handling sensitive data and multivariate interactions. It describes a novel supervised learning module utilizing a Bandit Control Tree/Forest approach for feature selection, anomaly detection, and optimization across multiple phases of data processing. The framework aims to enhance predictive modeling through systematic tracking of anomalies and the integration of various machine learning techniques, including tree-based algorithms and deep learning methods.

Uploaded by

Soumyajit Das

Adversarial Bandit Control Learning

Patent references:
https://register.epo.org/espacenet/regviewer?AP=21834552&CY=EP&LG=en&DB=REG
https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2022003733&_fid=US393708628
https://ppubs.uspto.gov/pubwebapp/external.html?q=(US-20230083762-A1).did.&db=US-PGPUB
Unique Objectives: Why we need this Model / Framework

• Interpretability & flexibility over the hypothesis space (open to hand-picked rules)
• Missing-value imputation & feature engineering tool (Band / NLP / Sequence / DL – Adversarial Forest)
• Robust to sensitive data & high stability
• Multivariate interaction extraction & stable operating conditions
• Combined feature selection (interaction – response index) & combined impact / stability analysis
• Single model for multi-phase data – process cycle optimization (applicable only for the same sensor across multiple phases)
• Controllability vs optimization – optimal operating condition (with both-side definition: min & max)
• Multi-criteria optimization – combined response optimization (multiple Y / responses)
• Anomaly tracking & detection – band/rule fluctuations (RCA)
• Systematic tracking of anomalies – sensitive variable subsets
• Auto EDA – key interactions highlighted w.r.t. the response

Topic References:
Response surface methodology
Information theory
Bandit algorithms for supervised learning
Tree, bagging & boosting algorithms
Active learning
Supervised Learning Module: Bandit Control Tree / Forest

Concepts:
An alternative to a supervised learning model: a rule engine (a similar algorithm to extract rules, as a decision tree does).

Starting from the raw input data, do bucket mapping from similarity pairs: bin/bucket each continuous variable at its own level (quartile / vector discretization). Run this for every continuous variable; the number of bins is chosen by a data-driven technique. Binning can also be done based on the response variable Y.

A rule is a combination of buckets/bins across variables. For example, based on Y, suppose var1 has 5 bins, var2 has 10 bins, and var3 has 15 bins:

var1:bin5, var2:bin9, var3:bin6 – #_match(1): 80, #_nomatch(0): 20, tot obs: 821
var1:bin3, var2:bin6, var3:bin7 – #_match(1): 85, #_nomatch(0): 15, tot obs: 678

%#_match(1): probability of getting 1 (match) at the rule level, as in a decision tree.
%#_nomatch(0): probability of getting 0 (nomatch) at the rule level.

The result is comparable to a decision tree algorithm, with one advantage: every rule includes all variables and gives fixed cutoffs. Rules can be ranked by %_match and total observations (higher is better), and then applied directly to data to predict match / nomatch.
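The bucket-mapping and rule-ranking idea above can be sketched in code. This is a minimal illustration, not the framework's implementation: `quantile_bins`, `to_bucket`, and `rule_table` are hypothetical helper names, and equal-frequency binning stands in for whichever data-driven binning technique is actually used.

```python
from collections import defaultdict

def quantile_bins(values, n_bins):
    """Data-driven cut points: equal-frequency (quantile) bin edges."""
    s = sorted(values)
    return [s[int(len(s) * k / n_bins)] for k in range(1, n_bins)]

def to_bucket(value, edges):
    """Index of the bucket a value falls into (0 .. len(edges))."""
    return sum(value > e for e in edges)

def rule_table(rows, labels, n_bins=4):
    """Map each row of continuous vars to a bucket combination and
    count matches (y=1) / no-matches (y=0) per combination."""
    n_vars = len(rows[0])
    edges = [quantile_bins([r[j] for r in rows], n_bins) for j in range(n_vars)]
    table = defaultdict(lambda: [0, 0])          # combo -> [match, nomatch]
    for r, y in zip(rows, labels):
        combo = tuple(to_bucket(r[j], edges[j]) for j in range(n_vars))
        table[combo][0 if y == 1 else 1] += 1
    # rank rules: higher match rate first, then more supporting observations
    ranked = sorted(table.items(),
                    key=lambda kv: (kv[1][0] / sum(kv[1]), sum(kv[1])),
                    reverse=True)
    return edges, ranked
```

Applying the ranked rules to new data is then a lookup of each row's bucket combination in the table.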


Bucket / Splitting example – raw input data (x1–x5), responses (y1–y3), and transformed features (y3a–y3c: one-hot encoding of y3):

x1 x2 x3 x4 x5 | y1 y2 y3 | y3a y3b y3c
1 8 46 a 7 92 1 a 1 0 0
4 6 35 a 7 95 1 a 1 0 0
5 9 28 a 7 90 1 a 1 0 0
4 6 33 a 1 92 0 a 1 0 0
2 10 30 a 1 89 1 b 0 1 0
1 9 22 b 1 89 1 b 0 1 0
4 8 44 b 1 91 1 b 0 1 0
1 5 39 b 1 94 0 c 0 0 1
5 5 25 b 1 99 0 c 0 0 1
3 5 39 b 4 100 0 c 0 0 1
3 6 50 c 4 99 0 c 0 0 1
4 6 38 c 4 95 0 c 0 0 1
4 5 29 c 4 95 0 c 0 0 1
1 6 38 c 6 98 1 c 0 0 1
3 8 21 c 6 90 1 c 0 0 1
1 5 37 a 6 97 1 a 1 0 0

Data transformation on direct input & transformed features:

Split criteria – independent & dependent (with max bucket size 10 to 100) -> bucket details.
Bucket size – based on cardinality & a few other factors from the input vars. For example: var1 with 2000 unique points -> 40 to 50 buckets; var2 with 10000 to 50000 unique points -> 100 buckets.
The same applies to a continuous response var Y -> percentile buckets (both-side definition: min & max).
Using these split definitions, transform the data [response-sensitive segmentation].
Band tree table formation follows after applying the different split criteria.
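The cardinality-driven bucket count can be sketched with a simple heuristic. The square-root rule below is an assumption chosen to reproduce the slide's examples (2000 unique points -> roughly 44 buckets, 10000+ unique points -> capped at 100); the actual framework also considers SNR, SD, and other factors.

```python
import math

def bucket_count(n_unique, lo=10, hi=100):
    """Cardinality-driven bucket count, capped to the 10-100 range
    given in the slides. sqrt is an illustrative assumption."""
    return max(lo, min(hi, int(math.sqrt(n_unique))))
```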

A) Split criteria – independent & dependent (with max bucket size 10 to 100)
B) Variable subsets -> N/2, log(N), nCk (k < N): continuous vars only (categorical vars fixed)

Bucket creation is based on A & B.
Band tree table extraction for every A x B combination.
Band tree table ensemble prediction – scoring / classification.
Using these trees, aggregate / group by to generate scores from the out-of-bag sample.
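The A x B enumeration might look like the following sketch. `var_subsets` and `tree_specs` are illustrative names, and a random draw stands in for the nCk subset selection; per the slides, categorical variables stay fixed in every subset.

```python
import math
import random

def var_subsets(continuous_vars, categorical_vars, n_subsets=5, seed=0):
    """Draw variable subsets of size ~N/2 or ~log2(N) over the
    continuous vars; categorical vars are kept in every subset."""
    rng = random.Random(seed)
    n = len(continuous_vars)
    sizes = [max(1, n // 2), max(1, int(math.log2(n)))]
    subsets = []
    for i in range(n_subsets):
        k = sizes[i % len(sizes)]
        subsets.append(sorted(rng.sample(continuous_vars, k)) + list(categorical_vars))
    return subsets

def tree_specs(split_criteria, subsets):
    """One band tree per (split criterion, variable subset) pair."""
    return [(c, s) for c in split_criteria for s in subsets]
```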
Basic EDA – UVA, correlation & BVA analysis, feature clusters.

1) Var type & granularity (unique points / row size) – identify continuous & split-eligible vars
2) Splitting criteria – independent, dependent
3) For all filtered sets of vars/features – form bands / rules / branches (x10, y25, z47 -> response 0.70; x30, y5, z3 -> response 0.80); split vars with bin range 10–100
4) Create multiple subsets of vars vs split combinations to create a band tree / band forest
5) For each band get the count per class and the average of the response: for regression – average, median; for classification – average of probability / ratio. Also add other UVA/BVA scores: mode, SD, WoE, SNR, etc.
One subset of vars & one splitting criterion -> one band tree; multiple different subsets of vars & different splitting criteria -> multiple band trees / a band forest.
6) For prediction – get the band-level map based on the band definition (x10: min x = 2.5, max x = 6.5; y25: min y = 3.65, max y = 50.45). Get the response score generated from the band tree table – the tree-level score – then average at the band forest level (aggregation – ratio).

Supported tasks: regression; classification – binary & multiclass; bandit control, optimal condition, risk-level scoring.
A) For classification get a weighted average; the weight is multivariate-IV based, from each band tree (scaled).
B) For regression – average / weighted central tendency.

Model Building Steps:
Data dimension: 10M x 200 input [Full Batch – 2000]; Response: unit count, yield, quality class.
Split data into 70% train [FB-1400] – 30% test [FB-600] [15% ITV, 15% OTV].
For 7M data points – 5M for generating trees, 2M for OOB validation.
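Steps 4–6 can be illustrated with a toy band-tree build and forest-level prediction. This is a hedged sketch under assumptions: the group-by aggregation, the function names, and the fallback for unseen bands are illustrative, not the framework's exact logic.

```python
from collections import defaultdict
from statistics import mean

def band_tree_table(rows, y, var_idx, edges):
    """Band tree table: bucket combination over a variable subset ->
    (observation count, mean response), via group-by aggregation."""
    groups = defaultdict(list)
    for r, yi in zip(rows, y):
        band = tuple(sum(r[j] > e for e in edges[j]) for j in var_idx)
        groups[band].append(yi)
    return {band: (len(v), mean(v)) for band, v in groups.items()}

def forest_predict(row, tables, var_sets, edge_sets, default=0.0):
    """Average the band-level scores of every tree whose band
    definition covers this row (unseen bands fall back to a default)."""
    scores = []
    for table, var_idx, edges in zip(tables, var_sets, edge_sets):
        band = tuple(sum(row[j] > e for e in edges[j]) for j in var_idx)
        if band in table:
            scores.append(table[band][1])
    return mean(scores) if scores else default
```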
Band Tree Table ensemble prediction – regularization & robustness check:

1) Band pruning: rule filtering – filter bands by the number of supporting observations / high residual (rule fluctuation).
2) Band tree table weight: the predictive power associated with a single tree – filter band tree tables by IV / information-content weight; validate with training-split rules to check PSI (surface / band tree fluctuation).

Prediction:
Filter band trees based on information content & predictive power – IV, AUC, R2.
For each data point / obs, apply the split details (bucket / band definition) to map it to the specific band of a band tree table and generate a response score: band 1 from band tree table 1, band 2 from band tree table 2, band 3 from band tree table 3, and so on.
Take the scaled scores for the ensemble model – a weighted score.
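The two regularization moves above – support-based band pruning and weighted ensemble scoring – might be sketched as follows. The minimum-support threshold and the weight normalization are assumptions; the slides only say weights are IV-based and scaled.

```python
def prune_bands(table, min_support=30):
    """Band pruning: drop rules backed by too few observations.
    table maps band -> (count, score)."""
    return {band: cs for band, cs in table.items() if cs[0] >= min_support}

def weighted_ensemble(scores, weights):
    """Weighted average of per-tree scores, with weights (e.g. scaled
    IV per band tree) normalised to sum to 1."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total
```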
Inference code / Hypothesis / Model objects:

Basic EDA – UVA, correlation & BVA analysis, feature clusters.
1) Extracted bands / rules from training – boundary details of the splitting criteria for all eligible vars (bucket / band definition)
2) Apply these rules & get the response count / average etc. from the validation sample
3) Column-join the training band tree table & validation band tree table – anomaly tracking & sensitivity analysis; anomaly detection as band / rule fluctuation
4) Band score: measure residual, CoV, cross entropy (relative), IV, conditional expectation for both training & validation responses (OOB validation & measurement)
5) Get band tree table scores – PSI, R2, MAE, CE, IV, squared error
6) Filter / prune / prioritize bands & band tree tables (based on normalized scores)

Algorithm Sequence:
1] Split & bucket generation from data
2] Data transformation on direct input & transformed features
3] Band tree table formation after applying different split & var combinations
4] Band tree table ensemble prediction – regression / classification / scoring
5] Band tree table ensemble prediction – regularization
6] Tune the hyperparameters of the learning model
7] Model diagnostics & validation

Once the model is ready & running we can use it for anomaly tracking & band update [same (split, vars & bucket) combination => band tree training vs new comparison].
Hyperparameters of the learning model:
Split range – 10 to 100 (continuous vars with specific granularity). [Higher split range – high-variance model, a stability issue in the long run; lower split range – low-variance model.]
Split / bucket size – based on cardinality & a few other factors (SNR, SD) from the input vars (unique points / total obs).
Var subset index – sqrt(total vars), n/2, etc.
Split criteria – binning / discretization: ChiMerge, MDLP, CAIM, Khiops, Adaptive Quantizer; mutual-information based: PMI, MIC; change point, cov / density / contour based, etc.
Band tree table weight score – IV, AUC, R2, etc.
Prediction – get count, average, weighted average, class, probability.
Out-of-bag sample size: 30%.
Fluctuation pruning: true / false (warm-up / incremental addition).

Model Diagnostics:
Summary table – list of tree tables with all scoring & contributions, residual diagnostic details for ensemble weighting.
Residual diagnostics – confusion matrix; AUC-ROC, R2, MSE, MAE, etc.
Score quality checking, rank-order checking.
Stability & sensitivity analysis – deviation in previous vs current scores & critical parameters.
Further diagnostics – IML / XAI (optional): feature importance (forest level) & combined interaction importance (tree level).
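For the IV-based band tree table weight, a standard Information Value computation over a tree's bands could look like this; the `eps` smoothing for empty cells is an assumption, not specified in the slides.

```python
import math

def information_value(good_counts, bad_counts, eps=0.5):
    """Information Value across bands: sum of (%good - %bad) * WoE,
    where WoE = log(%good / %bad). eps smooths empty cells."""
    g_tot = sum(good_counts)
    b_tot = sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        pg = (g + eps) / (g_tot + eps * len(good_counts))
        pb = (b + eps) / (b_tot + eps * len(bad_counts))
        iv += (pg - pb) * math.log(pg / pb)
    return iv
```

Tables with higher IV get larger (scaled) weights in the ensemble; near-zero IV indicates a tree with little predictive power.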
Anomaly Tracking & Band update:

A) Join the training band tree & the new band tree – anomaly tracking & sensitivity analysis; anomaly detection as band / rule fluctuation.
Some new rules might appear in the data & a few old rules might become insignificant due to fluctuation.
[Illustration: Hurricane Irma created a new island along the Georgia coast.]

New data with band tree A1:
x1 x2 x4 x5 y1 y2
3 5 a 7 96 2
3 5 a 1 92 0
1 9 b 2 50 1
1 9 b 1 86 1
3 7 c 4 46 0

Join with the training table; add new rules & also measure fluctuation.
Measure deviation: PSI, change in moments / distribution.
Once the model is ready & running we can use it for anomaly tracking & band update [same (split, vars & bucket) combination => band tree old vs new comparison].
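The PSI deviation measure for the old-vs-new band comparison can be sketched as the standard Population Stability Index over band frequencies. The `eps` floor for empty bands and the 0.25 "major shift" threshold are common conventions, not taken from this document.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-4):
    """Population Stability Index between training-period and
    new-period band distributions (same band definitions assumed)."""
    e_tot = sum(expected_counts)
    a_tot = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / e_tot, eps)   # floor empty bands to avoid log(0)
        pa = max(a / a_tot, eps)
        total += (pa - pe) * math.log(pa / pe)
    return total
```

A PSI near 0 means the band distribution is stable; large values flag rule fluctuation worth a root-cause look.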
Further Scope & Research

As a Feature Engineering Tool (NLP, sequence modeling, deep learning):
• Process script / process modelling
• Sequence attention – process-level response retrieval
• Each band as a sequence token / band scripts
• Scoring / classification – LSTM, CNN
• Apply NLP algorithms on sequence tokens
• Sequence2vec, surfaces2surface
• Summarization – extraction of critical settings / points / bands / paths
• GAN / adversarial model – surface generation / simulation
• Process similarity, process base, process hierarchy / segmentation, process simulation, process retrieval
Architecture for IoT Application
Thank You

Soumyajit Das
[email protected]
