Extending the BDI Model with Q-learning in Uncertain Environment
Qian Wan†, Wei Liu, Longlong Xu and Jingzhi Guo
Hubei Province Key Laboratory of Intelligent Robot,
School of Computer Science and Engineering, Wuhan Institute of Technology
Wuhan, China
[email protected], [email protected], [email protected], [email protected]
ABSTRACT

The BDI model solves the problem of reasoning and decision-making for agents in a particular environment by procedural reasoning. But in an uncertain environment whose context is unknown, the BDI model is not applicable, because in the BDI model the context must be matched in the plan library. To address this issue, in this paper we propose a method that extends the BDI model with Q-learning, an algorithm of reinforcement learning, and improve the decision-making mechanism of ASL as an implementation model of BDI. Finally, we completed a maze simulation on the Jason simulation platform to verify the feasibility of the method.

KEYWORDS

BDI model, Agent, Q-learning, Jason, Plan Library

ACM Reference format:
Qian Wan, Wei Liu, Longlong Xu and Jingzhi Guo. 2018. Extending the BDI Model with Q-learning in Uncertain Environment. In Proceedings of 2018 International Conference on Algorithms, Computing and Artificial Intelligence (ACAI'18). Sanya, China, 6 pages. https://doi.org/10.1145/3302425.3302432

† Corresponding author: [email protected]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ACAI '18, December 21–23, 2018, Sanya, China
© 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-6625-0/18/12…$15.00
https://doi.org/10.1145/3302425.3302432

1 Introduction

Research on agents acting in an uncertain and dynamic environment is a challenge. BDI [1] agents were designed as an agent-oriented programming model for building multi-agent systems. The BDI model is concerned with an agent's rule description and logical reasoning in multi-agent systems. However, these two factors are based on context and must be designed in advance. Reinforcement learning [2] (RL) is applied to solve the problem that a BDI agent does not know the environmental model. RL assumes that an agent uses the rewards observed from the environment to measure the utility of its actions in an uncertain and dynamic environment. According to the reward value, the agent can determine a sequence of actions even in an uncertain environment.

BDI concepts were first used to describe people's behavior and intention and were later introduced into artificial intelligence; the earliest abstract BDI model was put forward by Georgeff. On the basis of this model, different Procedural Reasoning Systems (PRS) were designed for reasoning. Based on the PRS model, researchers developed multi-agent systems (MAS) built on the BDI model, including JACK [3], DECAF [4], IRMA [5], JADEX [6], ASL [7], etc. Because ASL, which adds a plan library to the foundational BDI model, has a simulation system and a better extension interface, we take ASL as the starting point for studying the BDI model. The planning problem of the agent in an uncertain environment remains, so our research is aimed at improving the planning part of ASL, which is the implementation model of the BDI agent.
RL is learning to map situations to actions so as to maximize a numerical reward. Without knowing which actions to take, the learner must discover which actions yield the most reward by trying them. Actions may affect not only the immediate reward but also the next situation and all subsequent rewards. An agent in RL has the ability to learn and plan, but lacks the logic and reasoning ability of a BDI agent. RL can be divided into two types: model-based and model-free. Model-free RL is mainly used to solve the planning of agents in situations where environmental information is not known, and Q-learning is a model-free algorithm with high learning efficiency. Therefore, this study puts forward a method that uses RL to solve the problem that a BDI agent cannot make decisions in a dynamic and uncertain environment.

The structure of this paper is as follows. After this introductory section, Section II gives a brief discussion of related works. Section III introduces the BDI agent and AgentSpeak(L), RL and the Q-learning algorithm. Section IV describes our decision-improvement algorithm based on Q-learning in the ASL system. Section V describes the simulation experiment and evaluates and analyzes its results. Section VI summarizes our research and points to possible future developments.
2 Related Works acquired by the agent from the environment, the information of its
The inference mechanism in the BDI model is based on a preset environment, so the BDI system lacks planning under an unknown environment. For a known environment model, many methods have been used to solve planning, including decision trees, self-aware neural networks, and rule-learning algorithms used to optimize the agent's decisions. Pereira applies the Markov decision process to generate the optimal strategy for a BDI plan, but in unknown environments it is difficult to build the Markov environment model, so these methods do not apply when the environment information is unknown.

For the unknown environment model, self-adaptive research on agents acting in an unknown environment under the BDI model has received more attention. Google recently proposed a method that builds the BDI model of an agent by deep reinforcement learning to understand the agent's current real intention in order to improve its planning [8]. J. L. Feliu proposes an offline training session for plan generation in an uncertain and dynamic environment: Q-learning over training sessions consisting of interactions between agent and environment generates plans for agents in ASL [9]. Although this method solves the problem of agents knowing how to act, considering the efficiency of achieving goals in every state under an uncertain environment, when the state set is too large the quantity of generated plans explodes, which deviates from the original BDI model's focus on internal logic. Joost Broekens proposes to solve the rule-selection problem by learning rule priorities with RL, and to use a state to represent independent heuristic rules on active targets [10]. However, it is difficult for agents to express the state in a dynamic environment. For the autonomous agent mentioned in the literature [11], the incentive signal for the agent's behavior is given by a person. Supervised learning is used to generate an action selector according to this excitation signal, which finally enables the agent to perform tasks efficiently in a complex and unknown environment; this method enables the agent to learn the human's perception and decision. But human resources are consumed by the human being acting as the trainer.

Compared with the above methods, this study avoids taking the state as the unit of plan: decision-making in the unknown environment becomes a plan that is added to the plan library after the Q-learning algorithm has explored the environment. The plan is the unit after learning, which avoids the problem of an oversized plan library, and the state of the agent is easy to define.

3 Background

3.1 BDI-Agent and AgentSpeak(L)

The agent-based programming paradigm gives computer software modules higher intelligence and adaptive ability. BDI is an agent-oriented programming model composed of Belief, Desire and Intention. The belief includes the environmental information acquired by the agent from the environment, the information of its own operation and the information received from other agents. The desire represents the possible states of the agent when performing a task, and is driven by intention in actual operation. The intention represents the state that the agent decides to achieve in its actual operation. There are two types of intention: one is the target to be executed by the agent, and the other is the plan based on context matching in the BDI model. The BDI conceptual model is described as follows: the agent initializes beliefs and intentions, then perceives environmental information. Beliefs are updated by the belief update function. According to the current intentions and beliefs, some desires are selected as candidates, and one of them can be selected as the execution plan through matching rules designed in advance. The agent finishes the task by executing the plan. The BDI model is a conceptual model in which the agent has basic logical reasoning ability.
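To make the deliberation cycle just described more concrete, the following minimal Python sketch walks through one pass of a BDI-style loop: beliefs are revised from percepts, candidate desires are filtered against the current beliefs, and one option is committed to as an intention. All names here (brf, options, select_plan) are illustrative placeholders, not the paper's implementation or Jason's actual API.

# Minimal, illustrative BDI deliberation loop (not Jason's implementation).
# Beliefs are a set of ground facts; desires pair a goal with a context test;
# the selected plan is just a list of action names.

def brf(beliefs, percepts):
    """Belief revision function: add newly perceived facts."""
    return beliefs | set(percepts)

def options(desires, beliefs):
    """Keep only the desires whose context holds in the current beliefs."""
    return [d for d in desires if d["context"] <= beliefs]

def select_plan(candidates):
    """Commit to one applicable option as the intention (here: the first)."""
    return candidates[0]["plan"] if candidates else []

beliefs = {"at(start)"}
desires = [
    {"goal": "reach(exit)", "context": {"at(start)"}, "plan": ["move_right", "move_up"]},
]

percepts = ["wall(left)"]               # what the agent senses this cycle
beliefs = brf(beliefs, percepts)        # 1. revise beliefs
candidates = options(desires, beliefs)  # 2. generate applicable desires
intention = select_plan(candidates)     # 3. commit to one plan as intention
for action in intention:                # 4. execute the plan's actions
    print("executing", action)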
ASL (Agent Speak Language) [12] is extended on the basis of the BDI model, and its specific system structure diagram is shown in figure 1.

1. The agent's target and belief library are initialized, and the belief library is updated by the BRF (Belief Revision Function) according to the information the agent perceives from the environment or receives from other agents.
2. The agent recognizes changes in the environment, updates the belief library, and verifies whether the change triggers an event in the plan library. When multiple events are triggered simultaneously, the sequence of event execution is determined by the SE function.
3. According to the trigger event, the predicate symbol is matched in the plan library. For example, suppose the default trigger event in the plan library is +color(Object,Color). When the agent perceives a blue box from the environment, the data is transformed into the expression +color(box1,blue)[source(percept)]; after SE (Select Event) matches the trigger event, box1 is bound to Object and blue to Color. The applicable plan is matched, and the values of Object and Color are instantiated to the entities box1 and blue in the plan (a sketch of this matching step follows the list).
4. A plan whose context is matched among the applicable plans is selected through the SO function. The plan structure is trigger_event : context <- body, where the body includes the action sequence, subgoals and child trigger events; variables in the body are matched in the same way. The plans that conform to the context are pushed onto the intention stack. The agent performs the actions in turn, popping the stack; when the stack is empty, the system enters the next loop.
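The matching step in item 3 can be pictured as a small unification of the perceived literal against a plan's triggering event. The Python sketch below is a simplified illustration under the assumption that literals are (functor, args) pairs and that capitalized names are variables; it is not Jason's actual event-selection machinery.

# Simplified trigger-event matching: bind variables in the plan's trigger
# (capitalized names) to the constants of the perceived event.

def match_trigger(trigger, event):
    """Return a variable binding if the two literals unify, else None."""
    t_functor, t_args = trigger
    e_functor, e_args = event
    if t_functor != e_functor or len(t_args) != len(e_args):
        return None
    bindings = {}
    for t_arg, e_arg in zip(t_args, e_args):
        if t_arg[0].isupper():          # a variable such as Object or Color
            bindings[t_arg] = e_arg
        elif t_arg != e_arg:            # a constant that must match exactly
            return None
    return bindings

# Plan trigger +color(Object, Color) against percept +color(box1, blue)
trigger = ("+color", ["Object", "Color"])
event = ("+color", ["box1", "blue"])
print(match_trigger(trigger, event))    # {'Object': 'box1', 'Color': 'blue'}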
AgentSpeak(L) builds a multi-agent system based on the BDI model, which facilitates the programming and study of agents. This study mainly focuses on the improvement of ASL, which is the implementation model of BDI.

In Q-learning, the maximum value of Q(s, a) is updated every time the state is passed. Its iterative update equation is given in equation (1):

$$Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + \alpha \bigl( r_{t+1} + \gamma \max_{a} Q_k(s_{t+1}, a) - Q_k(s_t, a_t) \bigr) \qquad (1)$$

$Q_{k+1}(s_t, a_t)$ represents the updated cumulative reward value, and $Q_k(s_t, a_t)$ is the cumulative value from the last time the agent passed the state. $\alpha$ is the learning rate. $\gamma \in (0, 1)$; the larger $\gamma$ is, the more the later rewards are considered. $r_{t+1}$ represents the reward obtained when state $s_t$ performs action $a_t$ and reaches $s_{t+1}$, and $\gamma \max_a Q_k(s_{t+1}, a)$ is the discounted cumulative reward, where $\max_a Q_k(s_{t+1}, a)$ denotes the maximum value at $s_{t+1}$ in the Q table. Q-learning is a greedy strategy algorithm. The value of Q(s, a) tends to be stable after finite iterations of formula (1); the maximum Q value of the current state is then selected from the Q table until the final state is reached, forming the optimal strategy.
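As a concrete reading of equation (1), the following Python sketch performs the tabular update on a toy state space. The environment interface (step), the reward values and the epsilon-greedy exploration are illustrative assumptions for the sketch, not the paper's maze setup.

import random
from collections import defaultdict

# Tabular Q-learning update of equation (1) on a toy 1-D corridor:
# states 0..4, actions -1/+1, reward 1.0 only when state 4 is reached.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
ACTIONS = (-1, +1)
Q = defaultdict(float)                      # Q[(state, action)]

def step(state, action):
    """Illustrative environment: move along the corridor, reward at the goal."""
    nxt = min(max(state + action, 0), 4)
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

for _ in range(200):                        # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection over the current Q table
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next, r, done = step(s, a)
        # equation (1): Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

print({k: round(v, 2) for k, v in Q.items() if k[1] == +1})

After enough episodes the Q values stabilize, and always taking the action with the largest Q value in the current state reproduces the greedy optimal strategy described above.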