© SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger
Explaining Online Reinforcement Learning Decisions
of Self-Adaptive Systems
Felix Feit, Andreas Metzger, Klaus Pohl
ACSOS 2022
SOFTWARE SYSTEMS ENGINEERING
Prof. Dr. K. Pohl
Agenda
A. Metzger, ACSOS 2022
1. Motivation
2. Explanation Approach “XRL-DINE”
3. Validation
4. Discussion and Outlook
Online Reinforcement Learning for SAS
[Figure: Combination of MAPE-K and RL into online RL for SAS. MAPE-K: self-adaptation logic with Monitor, Analyze, Plan, Execute, and Knowledge. RL: agent (action selection, policy, policy update) interacting with the environment via action A, state S, reward R, and next state S’. Online RL for SAS: action selection takes the role of Analyze + Plan (A + P), the policy takes the role of Knowledge (K), alongside Monitor, Execute, and policy update.]
• Online RL = emerging approach for addressing design time uncertainty [Palm et al. @ CAiSE 2020]
→ Leverage information only available at runtime (i.e., during live system execution)
• Since 2019, use of learning has been the most prominent strategy for SAS [Porter et al. @ ACSOS 2020]
• Conceptual model of online RL [Metzger et al. @ Computing 2022]
• Learning goal defined by reward function
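The conceptual model can be illustrated as a loop of action selection, execution, monitoring, and policy update. The following is a minimal sketch only: it assumes a tabular Q-learning agent, and `ToyEnvironment` with its load levels, actions, and reward formula is a made-up stand-in that does not come from the slides.

```python
import random
from collections import defaultdict


class ToyEnvironment:
    """Hypothetical stand-in for the managed system (not from the slides)."""
    ACTIONS = ("scale_down", "keep", "scale_up")

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.load = 1  # state S: discretized load level 0..2

    def step(self, action):
        # Reward R encodes the learning goal: match capacity to load at low cost.
        capacity = self.ACTIONS.index(action)  # 0, 1, or 2
        reward = -abs(capacity - self.load) - 0.1 * capacity
        self.load = self.rng.randint(0, 2)     # next state S'
        return self.load, reward


class QLearningAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)            # policy (Knowledge)
        self.rng = random.Random(seed)

    def select_action(self, state):
        # Action selection (Analyze + Plan): epsilon-greedy over Q-values.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Policy update from the observed transition (S, A, R, S').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def run_online(steps=5000):
    """Learn while the (toy) system is running; there is no separate training phase."""
    env = ToyEnvironment()
    agent = QLearningAgent(ToyEnvironment.ACTIONS)
    state = env.load
    for _ in range(steps):
        action = agent.select_action(state)              # Execute
        next_state, reward = env.step(action)            # Monitor
        agent.update(state, action, reward, next_state)
        state = next_state
    return agent
```

The Q-table here plays the role of the policy (K); a deep RL variant would replace it with a neural network.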
Online Reinforcement Learning for SAS
Policy (Knowledge) represented as deep neural network
Pro
• Handling of continuous states and actions
• Generalization over unseen, neighboring states
Con
• Deep RL = “black box”
  → Limited trustworthiness
  → Difficult to debug, e.g., reward function correctly defined?
Increasing use of Deep RL
The power of deep learning: “/imagine yellow Labrador in the style of…” [generated using Midjourney AI]
[Figure: Classical RL (Q-Learning) vs. Deep RL. Classical RL represents the policy as a table of Q-values with one row per state S and one column per action A (UP, RIGHT, DOWN, LEFT). Deep RL replaces the table with a deep neural network that maps state S to action A. Both interact with the environment.]
Explainable Reinforcement Learning (XRL) for SAS
State of the Art
Goal-based Models [Welsh et al. @ Trans. CCI 2014]
• Explanations in terms of the satisficement of softgoals
• Requires making assumptions about environment dynamics at design time (difficult due to design time uncertainty)
Provenance Graphs [Reynolds et al. @ MODELS Wkshp 2020]
• Graph history to determine if, how, and why the model changed
• Graph can become too complex to be meaningfully interpreted by humans
• Query language suggested for “pruning” graphs but not for explanations
Temporal Graph Models [Ullauri et al. @ SoSym 2022]
• Explicitly considers Deep RL
• Explanations via queries to model at runtime
• Interesting points of interaction extracted via complex event processing (CEP)
• No detailed, contrastive decomposition of explanations
[Figure labels: Example: Vacuum Cleaner; Example: Fibonacci; Example: Remote Data Mirroring]
XRL-DINE
Augment and Combine RL Explanation Techniques from AI Research
Reward Decomposition [Juozapaitis et al. @ IJCAI Wkshp 2019]
Decompose reward function to explain short-term goal orientation of RL (train sub-RL agents)
Pro
• Helpful in the presence of multiple, “competing” quality goals for learning
• Provides contrastive (counterfactual) explanations
Con
• No indication of explanation’s relevance
• Requires manually selecting relevant explanations → cognitive overhead
Interestingness Elements [Sequeira et al. @ Artif. Intell. 2020]
Identify relevant moments of interaction between agent and environment at runtime
Pro
• Facilitates automatically selecting relevant interactions to be explained
Con
• Does not explain whether RL behaves as expected and for the right reasons
Reward Decomposition + Interestingness Elements = Decomposed Interestingness Elements (DINEs)
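Reward decomposition can be illustrated with per-channel Q-values that are summed into a composed decision. This is a minimal sketch: the channel names, action names, and Q-values below are made up for illustration and are not taken from the slides or the paper.

```python
from typing import Dict

# Hypothetical Q-values of two sub-agents (reward channels) in one state.
Q_BY_CHANNEL: Dict[str, Dict[str, float]] = {
    "response_time": {"add_server": 4.0, "remove_server": -3.0, "no_op": 1.0},
    "cost": {"add_server": -2.0, "remove_server": 2.0, "no_op": 0.5},
}


def composed_q(q_by_channel: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the per-channel Q-values to obtain one composed Q-value per action."""
    actions = list(next(iter(q_by_channel.values())))
    return {a: sum(q[a] for q in q_by_channel.values()) for a in actions}


def greedy_action(q_by_channel: Dict[str, Dict[str, float]]) -> str:
    """Pick the action with the highest composed Q-value."""
    total = composed_q(q_by_channel)
    return max(total, key=total.get)


def contrast(q_by_channel: Dict[str, Dict[str, float]],
             chosen: str, alternative: str) -> Dict[str, float]:
    """Per-channel advantage of `chosen` over `alternative` (contrastive view)."""
    return {ch: q[chosen] - q[alternative] for ch, q in q_by_channel.items()}
```

With these numbers, `contrast(Q_BY_CHANNEL, "add_server", "no_op")` shows that the response-time channel favours adding a server while the cost channel argues against it, which is the kind of "why this action instead of that one" view that decomposition enables.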
XRL-DINE
Understanding RL without DINEs?
Internal Behaviour: Evolution of Reward (R)
External Behaviour: Evolution of States and Actions (S, A)
XRL-DINE
Three Types of DINEs
Important Interaction
Determine whether RL in given state is uncertain (wide range of actions) or certain (almost always same action)
• How much does relative importance of actions differ for each sub-agent?
• Number of DINEs shown can be tuned via threshold ρ (level of inequality)
Visualization in Dashboard: Certain / Uncertain
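One way to picture the "Important Interaction" DINE is to measure how unequal the action preferences of a sub-agent are in a given state. The sketch below is an assumption throughout: the slides do not state the inequality measure, so it uses a softmax for relative importance and a rescaled Gini coefficient for inequality, and it flags "certain" at or above ρ and "uncertain" at or below 1 − ρ (which only makes sense for ρ ≥ 0.5). The paper may define these differently.

```python
import math
from typing import Dict, List, Optional


def relative_importance(q_values: List[float]) -> List[float]:
    """Softmax over Q-values (assumed way to obtain relative importance)."""
    top = max(q_values)
    exps = [math.exp(q - top) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def inequality(shares: List[float]) -> float:
    """Gini coefficient rescaled to [0, 1]: 0 = all equal, 1 = one action only."""
    n = len(shares)
    if n < 2:
        return 0.0
    ordered = sorted(shares)
    weighted = sum((i + 1) * x for i, x in enumerate(ordered))
    gini = 2.0 * weighted / (n * sum(ordered)) - (n + 1) / n
    return gini * n / (n - 1)


def important_interaction(q_values: List[float], rho: float) -> Optional[str]:
    """Return 'certain', 'uncertain', or None if the state is not flagged."""
    level = inequality(relative_importance(q_values))
    if level >= rho:
        return "certain"
    if level <= 1.0 - rho:
        return "uncertain"
    return None


def important_interactions(q_by_channel: Dict[str, List[float]],
                           rho: float) -> Dict[str, Optional[str]]:
    """Apply the check separately to each sub-agent (reward channel)."""
    return {ch: important_interaction(q, rho) for ch, q in q_by_channel.items()}
```

Raising ρ in this sketch leaves fewer states flagged, which mirrors the tuning role the slide assigns to the threshold.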
XRL-DINE
Three Types of DINEs
Reward Channel Dominance
Influence that each sub-agent has on each possible action
• Influence of rewards of sub-agents on composed decision
Visualization in Dashboard: Relative
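A relative view of reward channel dominance can be sketched by normalizing each sub-agent's contribution per action. The normalization below (share of absolute Q-value magnitude) is an assumption chosen for illustration, as are the channel and action names; the slides do not give the formula.

```python
from typing import Dict


def relative_dominance(
    q_by_channel: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """For each action, return each channel's share of the total |Q| magnitude."""
    actions = list(next(iter(q_by_channel.values())))
    result: Dict[str, Dict[str, float]] = {}
    for action in actions:
        magnitudes = {ch: abs(q[action]) for ch, q in q_by_channel.items()}
        total = sum(magnitudes.values())
        if total == 0.0:
            # No channel has any influence: report equal shares.
            result[action] = {ch: 1.0 / len(magnitudes) for ch in magnitudes}
        else:
            result[action] = {ch: m / total for ch, m in magnitudes.items()}
    return result


# Hypothetical Q-values for one state.
example = {
    "response_time": {"add_server": 3.0, "remove_server": -1.0},
    "cost": {"add_server": -1.0, "remove_server": 1.0},
}
shares = relative_dominance(example)
```

Here the response-time channel accounts for three quarters of the influence on `add_server`, while both channels weigh equally on `remove_server`.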
XRL-DINE
Three Types of DINEs
Reward Channel Extremum
Points after local minimum/maximum of state-value → RL decisions in potentially critical states
• ExpectedReward(S) – ExpectedReward(S’) > ϕ → Maximum
• Number of DINEs shown can be tuned via threshold ϕ
Visualization in Dashboard: Minimum / Maximum
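The threshold check for reward channel extrema can be sketched over a sequence of state-values for one reward channel. The maximum condition follows the slide; the mirrored minimum condition and the function name are assumptions made for this illustration.

```python
from typing import List, Tuple


def reward_channel_extrema(values: List[float], phi: float) -> List[Tuple[int, str]]:
    """Flag the point after each jump in expected reward that exceeds phi.

    values[t] is the expected reward (state-value) of one channel at step t.
    Returns (index of S', kind) pairs.
    """
    flagged = []
    for t in range(len(values) - 1):
        drop = values[t] - values[t + 1]
        if drop > phi:
            # ExpectedReward(S) - ExpectedReward(S') > phi: S was a maximum.
            flagged.append((t + 1, "maximum"))
        elif -drop > phi:
            # Assumed mirror condition: S was a minimum.
            flagged.append((t + 1, "minimum"))
    return flagged


trace = [1.0, 1.2, 0.2, 0.3, 1.5]  # hypothetical state-values
```

On this trace, a threshold of 0.5 flags two points and a threshold of 1.1 flags only one, showing how a larger ϕ reduces the number of DINEs shown.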
Validation
Proof-of-Concept Implementation
• Double Deep Q-Networks with Experience Replay [van Hasselt et al. @ AAAI 2016]
• Approximation of environment model using supervised learning on contents of replay memory
• OpenAI Gym interface to connect RL and SAS
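The two ingredients named in the first bullet can be sketched without a deep learning library. In Double DQN, the online network selects the next action and the target network evaluates it; experience replay stores transitions and samples them for training. The sketch below uses plain lists in place of network outputs, and the class and function names are illustrative, not those of the proof-of-concept implementation.

```python
import random
from collections import deque
from typing import List, Sequence, Tuple

Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity: int, seed: int = 0):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped first
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), size)


def double_dqn_target(reward: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99,
                      done: bool = False) -> float:
    """Double DQN target: select with the online net, evaluate with the target net."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]
```

For comparison, standard DQN would use `max(next_q_target)` directly, which tends to overestimate values; decoupling selection from evaluation is what Double DQN changes.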
Experimental Setup
RL Problem Formulation
• Action Space = {Add / remove web servers, Change dimmer value}
• State Space = {Request arrival rate, Average throughput, Average response time}
• Decomposed Reward Function
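The problem formulation above can be written down as plain data types. This is a sketch under stated assumptions: the discrete dimmer steps, the three reward channels, and their formulas are hypothetical placeholders, since the slides name the state and action spaces but do not give the reward function. The Gym-style environment wrapper around SWIM is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Action(Enum):
    ADD_SERVER = 0
    REMOVE_SERVER = 1
    INCREASE_DIMMER = 2  # assumption: dimmer is changed in discrete steps
    DECREASE_DIMMER = 3


@dataclass(frozen=True)
class State:
    request_arrival_rate: float
    average_throughput: float
    average_response_time: float


def decomposed_reward(state: State, servers: int, dimmer: float) -> Dict[str, float]:
    """Hypothetical reward channels; each one would train its own sub-agent."""
    return {
        "response_time": -state.average_response_time,
        "cost": -float(servers),
        "content": dimmer * state.average_throughput,
    }


def total_reward(channels: Dict[str, float]) -> float:
    """The composed reward is the sum over all channels."""
    return sum(channels.values())


sample_state = State(request_arrival_rate=10.0,
                     average_throughput=8.0,
                     average_response_time=0.5)
channels = decomposed_reward(sample_state, servers=2, dimmer=0.5)
```

Keeping the channels separate until the final sum is what allows explanations to be phrased per quality goal.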
Self-Adaptive System
• “SWIM” Exemplar [Moreno et al. @ SEAMS 2018]
• Self-adaptive multi-tier web application
Validation
Qualitative Results
Validation
Quantitative Results
Cognitive load ~ number of DINEs shown to developers
[Figure panels: Important Interactions; Reward Channel Extrema]
Discussion
Limitations of XRL-DINE
May generate difficult-to-understand explanations
• Reason 1: Reward function was decomposed incorrectly or non-optimally
• Reason 2: Environment dynamics may delay the effects of adaptations and thus impair understandability
Not directly applicable to collaborative adaptive systems
• XRL-DINE does not consider decisions of other RL agents
• May lead to misleading explanations if XRL-DINE is directly applied in a collaborative setting
Only works for value-based deep RL
• DINEs are computed using the value function Q(S, A); see the paper for details
• Value-based deep RL: policy = Q(S, A) approximated by neural network
Outlook
Considering Explanation Requirements from Social Sciences [Miller @ Artif. Intell. 2019]
✓ Contrastive: “why P happened instead of Q?”
→ “Reward Channel Dominance” DINE
✓ Selective: “no need for complete course of events”
→ “Reward Channel Extrema” DINE & “Important Interactions” DINE
? Causal: “most likely explanation not necessarily the best”
→ Future work: e.g., check whether agent relies on spurious correlations (and not causality) [Gajcin et al. @ AAMAS Wkshp 2022]
? Social: “transfer of knowledge as part of a conversation”
→ Future work: e.g., Chatbot4XAI
Example dialog between Human (Explainee) and Chatbot4XAI (Explainer):
Prediction: Train will be delayed
Explanation: Train passed last light 5 min later than typical
Explainee: This also happened yesterday, but why is it a problem today?
Explainer: Because of attribute “number of trains behind current one” > 5
Explainee: What would have to change so that there is no delay? (counterfactual)
Explainer: “number of trains behind current one” < 2
Thank You!
Research leading to these results has received funding from the EU’s Horizon 2020 research and
innovation programme under grant agreements no. 780351 & 871493
Further Reading
• A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Realizing Self-Adaptive Systems via Online
Reinforcement Learning and Feature-Model-guided Exploration”, Computing, Springer, March, 2022
• A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, “Feature model-guided online reinforcement
learning for self-adaptive services,” in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020),
LNCS 12571, Springer, 2020
• A. Palm, A. Metzger, and K. Pohl, “Online reinforcement learning for self-adaptive information
systems,” in 32nd Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS
12127. Springer, 2020
www.enact-project.eu www.dataports-project.eu
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Ad

Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems

  • 1. © SSE, Prof. Dr. Klaus Pohl, Prof. Dr. Andreas Metzger Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems Felix Feit, Andreas Metzger, Klaus Pohl ACSOS 2022
  • 2. SOFTWARE SYSTEMS ENGINEERING, Prof. Dr. K. Pohl (A. Metzger, ACSOS 2022). Agenda: 1. Motivation 2. Explanation Approach “XRL-DINE” 3. Validation 4. Discussion and Outlook
  • 3. Online Reinforcement Learning for SAS
      • Online RL = emerging approach for addressing design-time uncertainty [Palm et al. @ CAiSE 2020] → leverages information only available at runtime (i.e., during live system execution)
      • Since 2019, the use of learning has been the most prominent strategy for SAS [Porter et al. @ ACSOS 2020]
      • Conceptual model of online RL [Metzger et al. @ Computing 2022]: the learning goal is defined by the reward function
      • [Figure: MAPE-K self-adaptation logic (Monitor, Analyze, Plan, Execute, Knowledge) combined with the RL agent-environment loop (action selection, policy, policy update; state S, action A, reward R, next state S’). In online RL for SAS, the policy serves as Knowledge (K) and action selection covers Analyze and Plan (A + P), alongside Monitor, Execute, and policy update.]
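The conceptual model on slide 3 (select action A in state S, execute it, observe reward R and next state S’, update the policy) can be sketched as a minimal tabular Q-learning loop. The toy environment `ToyServerEnv`, its reward, and all hyperparameters are illustrative assumptions and are not taken from the talk.

```python
import random


class ToyServerEnv:
    """Hypothetical environment: state = number of active servers (1..5)."""

    ACTIONS = (-1, 0, +1)  # remove a server, do nothing, add a server

    def __init__(self, target=3):
        self.target = target
        self.state = 1

    def step(self, action):
        # Execute: apply the adaptation and clamp to the valid range.
        self.state = min(5, max(1, self.state + action))
        # Monitor: the reward is highest (zero) at the target size.
        reward = -abs(self.state - self.target)
        return self.state, reward


def run_online_rl(steps=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = ToyServerEnv()
    q = {(s, a): 0.0 for s in range(1, 6) for a in ToyServerEnv.ACTIONS}
    state = env.state
    for _ in range(steps):
        # Action selection (Analyze + Plan): epsilon-greedy on the policy.
        if rng.random() < epsilon:
            action = rng.choice(ToyServerEnv.ACTIONS)
        else:
            action = max(ToyServerEnv.ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = env.step(action)
        # Policy update (Knowledge): one-step Q-learning.
        best_next = max(q[(next_state, a)] for a in ToyServerEnv.ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_action(q, state):
    """Action with the highest learned Q-value in the given state."""
    return max(ToyServerEnv.ACTIONS, key=lambda a: q[(state, a)])
```

Learning happens while the system runs: there is no separate training phase, and every executed adaptation also feeds the policy update.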
  • 4. Online Reinforcement Learning for SAS: Increasing use of Deep RL
      • Policy (Knowledge) represented as deep neural network
      • Pro: handling of continuous states and actions; generalization over unseen, neighboring states
      • Con: Deep RL = “black box” → limited trustworthiness → difficult to debug (e.g., is the reward function correctly defined?)
      • The power of deep learning: “/imagine yellow Labrador in the style of…” [generated using Midjourney AI]
      • [Figure: Classical RL (Q-Learning) keeps the policy as a table of Q-values (columns UP, RIGHT, DOWN, LEFT), whereas Deep RL replaces the table with a neural network; in both cases the policy maps State S to Action A in interaction with the environment.]
  • 5. Explainable Reinforcement Learning (XRL) for SAS: State of the Art
      • Goal-based Models [Welsh et al. @ Trans. CCI 2014]: explanations in terms of the satisficement of softgoals; requires making assumptions about environment dynamics at design time (difficult due to design-time uncertainty)
      • Provenance Graphs [Reynolds et al. @ MODELS Wkshp 2020]: graph history to determine if, how, and why the model changed; the graph can become too complex to be meaningfully interpreted by humans; a query language is suggested for “pruning” graphs, but not for explanations
      • Temporal Graph Models [Ullauri et al. @ SoSym 2022]: explicitly considers Deep RL; explanations via queries to the model at runtime; interesting points of interaction extracted via complex event processing (CEP); no detailed, contrastive decomposition of explanations
      • Examples shown: Vacuum Cleaner, Fibonacci, Remote Data Mirroring
  • 6. Agenda: 1. Motivation 2. Explanation Approach “XRL-DINE” 3. Validation 4. Discussion and Outlook
  • 7. XRL-DINE: Augment and Combine RL Explanation Techniques from AI Research
      • Reward Decomposition [Juozapaitis et al. @ IJCAI Wkshp 2019]: decompose the reward function to explain the short-term goal orientation of RL (train sub-RL agents)
          Pro: helpful in the presence of multiple, “competing” quality goals for learning; provides contrastive (counterfactual) explanations
          Con: no indication of an explanation’s relevance; requires manually selecting relevant explanations → cognitive overhead
      • Interestingness Elements [Sequeira et al. @ Artif. Intell. 2020]: identify relevant moments of interaction between agent and environment at runtime
          Pro: facilitates automatically selecting relevant interactions to be explained
          Con: does not explain whether RL behaves as expected and for the right reasons
      • Reward Decomposition + Interestingness Elements = Decomposed Interestingness Elements (DINEs)
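Reward decomposition, as summarized on slide 7, trains one sub-agent per reward channel and combines their action values into one decision. The sketch below assumes that the composed Q-value is the plain sum of the per-channel Q-values; the channel names, action names, and numbers are made up for illustration.

```python
def compose(q_channels):
    """Sum per-channel Q-values into one composed Q-value per action."""
    actions = next(iter(q_channels.values())).keys()
    return {a: sum(q[a] for q in q_channels.values()) for a in actions}


def contrast(q_channels, chosen, alternative):
    """Per-channel advantage of the chosen action over an alternative."""
    return {name: q[chosen] - q[alternative] for name, q in q_channels.items()}


# Hypothetical Q-values of two sub-agents in a single state.
q_channels = {
    "response_time": {"add_server": 4.0, "remove_server": 1.0},
    "cost": {"add_server": -2.0, "remove_server": 0.5},
}
composed = compose(q_channels)
chosen = max(composed, key=composed.get)
explanation = contrast(q_channels, chosen, "remove_server")
```

Here `explanation` answers the contrastive question "why this action instead of that one?" per reward channel: a positive entry means the channel favours the chosen action, a negative entry means it was outweighed by the others.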
  • 8. XRL-DINE: Understanding RL without DINEs?
      • Internal behaviour: evolution of reward (R)
      • External behaviour: evolution of states and actions (S, A)
  • 9. XRL-DINE: Three Types of DINEs
      • Important Interaction: determine whether RL in a given state is uncertain (wide range of actions) or certain (almost always the same action)
      • How much does the relative importance of actions differ for each sub-agent?
      • Number of DINEs shown can be tuned via threshold ρ (level of inequality)
      • Visualization in dashboard: Certain / Uncertain
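One way to picture the “Important Interaction” DINE is to measure how unequal the action values of a sub-agent are and compare the result with the threshold ρ. The slide only speaks of a “level of inequality”; using the Gini coefficient, and shifting Q-values so that the worst action has importance 0, are assumptions of this sketch and may differ from the measure used in the paper.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    weighted = sum((i + 1) * v for i, v in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


def classify_interaction(q_values, rho):
    """Label a state 'certain' if action importance is unequal enough."""
    low = min(q_values)
    importance = [q - low for q in q_values]  # shift so the worst action is 0
    return "certain" if gini(importance) > rho else "uncertain"
```

Applied to each sub-agent's Q-values in a state, a higher ρ flags fewer states and so reduces the number of DINEs shown.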
  • 10. XRL-DINE: Three Types of DINEs
      • Reward Channel Dominance: influence that each sub-agent has on each possible action
      • Influence of rewards of sub-agents on composed decision
      • Visualization in dashboard: Relative
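The “Reward Channel Dominance” DINE can be illustrated by computing, for each action, the relative share that each sub-agent contributes to the composed value. Normalizing by absolute Q-values is an assumption of this sketch, and the channel and action names are hypothetical.

```python
def channel_dominance(q_channels):
    """Relative share of each channel in the composed value of each action."""
    actions = next(iter(q_channels.values())).keys()
    shares = {}
    for a in actions:
        total = sum(abs(q[a]) for q in q_channels.values())
        shares[a] = {
            name: (abs(q[a]) / total if total else 0.0)
            for name, q in q_channels.items()
        }
    return shares


# Hypothetical per-channel Q-values in one state.
shares = channel_dominance({
    "response_time": {"add": 3.0, "remove": 1.0},
    "cost": {"add": -1.0, "remove": 1.0},
})
```

In this example the response-time channel dominates the value of `add`, while both channels contribute equally to `remove`.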
  • 11. XRL-DINE: Three Types of DINEs
      • Reward Channel Extremum: points after a local minimum/maximum of the state-value → RL decisions in potentially critical states
      • ExpectedReward(S) - ExpectedReward(S’) > ϕ → Maximum
      • Number of DINEs shown can be tuned via threshold ϕ
      • Visualization in dashboard: Minimum / Maximum
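The rule ExpectedReward(S) - ExpectedReward(S’) > ϕ can be applied to a sequence of state values as sketched below. The slide states only the condition for a maximum; the mirrored condition for a minimum is an assumption of this sketch, as is passing the expected rewards in as a plain list.

```python
def extremum_points(values, phi):
    """Return (index, label) pairs for value changes larger than phi."""
    points = []
    for t in range(len(values) - 1):
        drop = values[t] - values[t + 1]
        if drop > phi:
            points.append((t, "maximum"))  # value falls sharply after step t
        elif -drop > phi:
            points.append((t, "minimum"))  # value rises sharply after step t
    return points
```

For example, `extremum_points([1.0, 1.2, 3.0, 0.5, 0.6], 1.0)` flags steps 1 and 2, whereas raising ϕ to 2.0 leaves only step 2; run per reward channel, this is how ϕ controls the number of DINEs shown.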
  • 12. Agenda: 1. Motivation 2. Explanation Approach “XRL-DINE” 3. Validation 4. Discussion and Outlook
  • 13. Validation: Experimental Setup
      • Self-Adaptive System: “SWIM” exemplar [Moreno et al. @ SEAMS 2018], a self-adaptive multi-tier web application
      • RL Problem Formulation: action space = {add/remove web servers, change dimmer value}; state space = {request arrival rate, average throughput, average response time}; decomposed reward function
      • Proof-of-Concept Implementation: Double Deep Q-Networks with experience replay [Hasselt et al. @ AAAI 2016]; approximation of the environment model using supervised learning on the contents of the replay memory; OpenAI Gym interface to connect RL and SAS
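The proof-of-concept builds on Double Deep Q-Networks. As a reminder of what distinguishes Double DQN from plain DQN, the sketch below computes the Double DQN learning target, in which the online network selects the next action and the target network evaluates it. Plain Python lists stand in for network outputs; the function name and default discount factor are illustrative and do not come from the authors' implementation.

```python
def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Target y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    # The online network picks the action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network supplies its value.
    return reward + gamma * q_target_next[best_action]
```

Separating action selection from action evaluation is what reduces the overestimation of Q-values seen with a single network.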
  • 14. Validation: Qualitative Results
  • 15. Validation: Quantitative Results
      • Shown for Important Interactions and Reward Channel Extrema
      • Cognitive load ~ number of DINEs shown to developers
  • 16. Agenda: 1. Motivation 2. Explanation Approach “XRL-DINE” 3. Validation 4. Discussion and Outlook
  • 17. Discussion: Limitations of XRL-DINE
      • May generate explanations that are difficult to understand
          Reason 1: the reward function was decomposed incorrectly or non-optimally
          Reason 2: environment dynamics may delay the effects of adaptations and thus impair understandability
      • Not directly applicable to collaborative adaptive systems
          XRL-DINE does not consider decisions of other RL agents
          May lead to misleading explanations if XRL-DINE is directly applied in a collaborative setting
      • Only works for value-based deep RL
          DINEs are computed using the value function Q(S, A); see the paper for details
          Value-based deep RL: policy = Q(S, A) approximated by a neural network
  • 18. Outlook: Considering Explanation Requirements from Social Sciences [Miller @ Artif. Intell. 2019]
      • Contrastive: “why P happened instead of Q?” → “Reward Channel Dominance” DINE
      • Selective: “no need for complete course of events” → “Reward Channel Extrema” DINE & “Important Interactions” DINE
      • Causal: “most likely explanation not necessarily the best” → Future work: e.g., check whether the agent relies on spurious correlations (and not causality) [Gajcin et al. @ AAMAS Wkshp 2022]
      • Social: “transfer of knowledge as part of a conversation” → Future work: e.g., Chatbot4XAI
      • Example dialogue between Human (Explainee) and Chatbot4XAI (Explainer):
          Prediction: Train will be delayed
          Explanation: Train passed last light 5 min later than typical
          Human: This also happened yesterday, so why is it a problem today?
          Chatbot4XAI: Because of attribute “number of trains behind current one” > 5
          Human: What would have to change so that there is no delay (counterfactual)?
          Chatbot4XAI: “number of trains behind current one” < 2
  • 19. Thank You!
      • Research leading to these results has received funding from the EU’s Horizon 2020 research and innovation programme under grant agreements no. 780351 & 871493
      • Further Reading:
          A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, and K. Pohl, “Realizing Self-Adaptive Systems via Online Reinforcement Learning and Feature-Model-guided Exploration,” Computing, Springer, March 2022
          A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, and K. Pohl, “Feature model-guided online reinforcement learning for self-adaptive services,” in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020), LNCS 12571, Springer, 2020
          A. Palm, A. Metzger, and K. Pohl, “Online reinforcement learning for self-adaptive information systems,” in 32nd Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS 12127, Springer, 2020
      • www.enact-project.eu, www.dataports-project.eu

Editor's Notes

  • #5: Monet, Kandinsky, Marc