
pomdp_py: A Framework to Build and Solve POMDP Problems

Kaiyu Zheng∗ Stefanie Tellex


Department of Computer Science,
Brown University
Providence, RI, USA
{kzheng10, stefie10}@cs.brown.edu

Abstract

In this paper, we present pomdp_py, a general purpose Partially Observable Markov Decision Process (POMDP) library written in Python and Cython. Existing POMDP libraries often hinder accessibility and efficient prototyping due to the underlying programming language or interfaces, and require extra complexity in the software toolchain to integrate with robotics systems. pomdp_py features simple and comprehensive interfaces capable of describing large discrete or continuous (PO)MDP problems. Here, we summarize the design principles and describe in detail the programming model and interfaces in pomdp_py. We also describe intuitive integration of this library with ROS (Robot Operating System), which enabled our torso-actuated robot to perform object search in 3D. Finally, we note directions to improve and extend this library for POMDP planning and beyond.

Figure 1: Example tasks implemented using pomdp_py. (a) Object search in 2D with a simulated drone in Unity controlled using ROS¹. (b) The light-dark domain with continuous spaces. (c) Object search in a 3D simulation with a frustum-shaped field of view. (d) Object search in 3D implemented on a torso-actuated robot controlled using ROS; (d1)-(d4) show a sequence of actions planned using the PO-UCT implementation in pomdp_py, where the robot decides to lower its torso to search, and finds the object on the table.

Introduction

Partially Observable Markov Decision Processes (POMDPs) are a sequential decision-making framework suitable for modeling many robotics problems, from localization and mapping (Ocaña et al. 2005) to human-robot interaction (Whitney et al. 2017). Early efforts in developing tools for POMDPs attempted to separate solvers from the domain description by creating specialized file formats to specify POMDPs (Cassandra 2003; APPL 2009), which are not designed for large and complex problems. Among libraries under active development, the Approximate POMDP Planning Toolkit (APPL) (Somani et al. 2013) and AI-Toolbox (Bargiacchi 2014) are implemented in C++ and contain numerous solvers. However, the learning curve for these libraries is steep, as C++ is in general less accessible to current researchers than Python (Virtanen et al. 2020). POMDPs.jl (Egorov et al. 2017) is a POMDP library with a suite of solvers and domains, written in Julia. Though promising, Julia has yet to achieve wide recognition and creates a language barrier for many researchers. POMDPy (Emami, Hamlet, and Crane 2015) is implemented purely in Python. Yet, with an original focus on its POMCP implementation, it assumes a blackbox world model in its POMDP interface, limiting its extensibility. Finally, a promising toolchain is to use the Relational Dynamic Influence Diagram Language (RDDL) (Sanner 2010) to describe factored POMDPs and solve them via ROSPlan (Cashmore et al. 2015), recently demonstrated for object fetching (Canal et al. 2019). Nevertheless, using this set of tools adds the overhead of a classical fluent-based planning paradigm, which is not required to describe and solve POMDPs in general.

This leads to our belief that there lacks a POMDP library with simple interfaces that brings together both accessibility and performance. We address this demand by presenting pomdp_py, a framework to build and solve POMDP problems, written in Python and Cython (Behnel et al. 2011). It features simple and comprehensive interfaces to describe POMDP or MDP problems, and can be integrated with ROS (Quigley et al. 2009) intuitively through rospy. In the rest of this paper, we first review POMDPs, then illustrate the design principles and key features of pomdp_py, including integration with ROS. Finally, we note directions to improve and extend this library, in the hope of cultivating an open-source community for POMDP-related research and development. The documentation of pomdp_py is available at https://h2r.github.io/pomdp-py/html/, and tutorials on example domains can be found in the documentation. Figure 1 shows several different domains that are implemented on top of the pomdp_py framework. The library is under active development as we continue our POMDP-related research.

∗ Corresponding author.
¹ We thank Rebecca Mathew for kindly providing this figure.

POMDPs

POMDPs (Kaelbling, Littman, and Cassandra 1998) model sequential decision-making problems where the agent must act under partial observability of the environment state (Figure 2). POMDPs consider uncertainty both in action effects (i.e. transitions) and in observations, which are usually incomplete and noisy information related to the state. A POMDP is defined as a tuple ⟨S, A, O, T, O, R, γ⟩. The problem domain is specified by S, A, O: the state, action, and observation spaces. At each time step, the agent decides to take an action a ∈ A, which may be sampled from a ∼ π(h_t, ·) according to a policy model π(h_t, a) = Pr(a | h_t). This leads to a state change from s to s′ ∼ T(s, a, s′) according to the transition model T. Then, the agent receives an observation o ∼ O(s′, a, o) according to the observation model O, and a reward r ∼ R(s, a, s′), r ∈ ℝ, according to the reward model R. Upon receiving o and r, the agent updates its history h_t and belief b_t to h_{t+1} and b_{t+1}. The goal of solving a POMDP is to find a policy π(h_t, ·) which maximizes the expectation of future discounted rewards:

    V^π(h_t) = E[ Σ_{k=t}^∞ γ^{k−t} R(s_k, a_k) | a_k = π(h_k, ·) ],

where γ is the discount factor. In pomdp_py, a few key interfaces are defined to help organize the definition of POMDPs in a simple and consistent manner.

Figure 2: POMDP model of agent-environment interaction. (1) Agent takes an action. (2) Environment state transitions. (3) Agent receives an observation and a reward signal. (4) Agent updates history and belief.

Solvers. Most recent POMDP solvers are anytime algorithms (Zilberstein 1996; Ross et al. 2008), due to the intractable computation required to solve POMDPs exactly (Madani, Hanks, and Condon 1999). There are currently two major camps of anytime solvers: point-based methods (Kurniawati, Hsu, and Lee 2008; Shani, Pineau, and Kaplow 2013), which approximate the belief space by a set of reachable α-vectors, and Monte-Carlo tree search-based methods (Silver and Veness 2010; Somani et al. 2013), which explore a subset of future action-observation sequences.

Currently, pomdp_py contains an implementation of POMCP and PO-UCT (Silver and Veness 2010), as well as a naive exact value iteration algorithm without pruning (Kaelbling, Littman, and Cassandra 1998). The interfaces of the library support implementation of other algorithms; we hope to cultivate a community to implement more solvers or create bridges between pomdp_py and other libraries.
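As a rough illustration, the PO-UCT planner could be constructed along the following lines. This is a sketch based on the Tiger tutorial in the documentation; the keyword arguments shown (max_depth, discount_factor, num_sims, exploration_const, rollout_policy) and their values should be treated as indicative rather than authoritative, and `agent` stands for the Agent object described in the Programming Model section below.

    import pomdp_py

    # Sketch: construct a PO-UCT planner for an already-built Agent object.
    # Parameter values are arbitrary illustrative choices.
    pouct = pomdp_py.POUCT(max_depth=10,            # search depth per simulation
                           discount_factor=0.95,
                           num_sims=1000,           # number of MCTS simulations per plan() call
                           exploration_const=50,    # UCB1 exploration constant
                           rollout_policy=agent.policy_model)
    action = pouct.plan(agent)   # returns the action chosen for the current belief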
Belief representation. The partial observability of the environment state implies that the agent has to maintain a posterior distribution over possible states (Thrun, Burgard, and Fox 2005). The agent should update this belief distribution through new actions and observations. The exact belief update is given by

    b_{t+1}(s′) = η Pr(o|s′, a) Σ_s Pr(s′|s, a) b_t(s),

where η is the normalizing factor. Hence, a naive tabular belief representation requires nested iterations over the state space to update the belief, which is computationally intractable in large domains. The particle belief representation is a simple and scalable belief representation which is updated through exactly matching simulated and real observations (Silver and Veness 2010). Different schemes of weighted particles have been proposed to handle large or continuous observation spaces, where exact matching results in particle depletion (Sunberg and Kochenderfer 2018; Garg, Hsu, and Lee 2019).

pomdp_py does not commit to any specific belief representation. It provides implementations for basic belief representations and update algorithms, including tabular, particles, and multi-variate Gaussians, but more importantly allows the user to create their own new or problem-specific representation, according to the interface of a generative probability distribution.
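To make the update above concrete, below is a minimal sketch of an exact tabular update over a dictionary-based belief. It assumes hypothetical transition_model and observation_model objects exposing probability functions in the spirit of the interfaces described in the Programming Model section; it is illustrative only, not the library's implementation.

    def exact_belief_update(belief, action, observation,
                            transition_model, observation_model, states):
        """Sketch of b_{t+1}(s') = eta * Pr(o|s',a) * sum_s Pr(s'|s,a) * b_t(s).
        `belief` maps each state to its probability. The nested loops over
        `states` are O(|S|^2), which is exactly why this becomes intractable
        in large domains."""
        new_belief = {}
        for sp in states:                      # sp is the next state s'
            pr_o = observation_model.probability(observation, sp, action)
            pr_sp = sum(transition_model.probability(sp, s, action) * belief[s]
                        for s in states)
            new_belief[sp] = pr_o * pr_sp
        eta = 1.0 / sum(new_belief.values())   # normalizing factor
        return {sp: eta * p for sp, p in new_belief.items()}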
Design Philosophy

Our goal is to design a framework that allows simple and intuitive ways of defining POMDPs at scale for both discrete and continuous domains, as well as solving them either through planning or through reinforcement learning. In addition, we implement this framework in Python and Cython to improve accessibility and prototyping efficiency without losing orders of magnitude in performance (Behnel et al. 2011; Smith 2015). We summarize the design principles behind pomdp_py below:

• Fundamentally, we view the POMDP scenario as the interaction between an agent and the environment, through a few important generative probability distributions (π, T, O, R, or a blackbox model G).

• The agent and the environment may carry different models to support learning, since for real-world problems, especially in robotics, the agent generally does not know the true transition or reward models underlying the environment, and only acts based on a simplified or estimated model.

• The POMDP domain could be very large or continuous, thus explicit enumeration of elements in the spaces should be optional.

• The representation of the belief distribution is decided by the user and can be customized, as long as it follows the interface of a generative distribution.

• Models can be reused across different POMDP problems. Extensions of the POMDP framework to, for example, decentralized POMDPs, should also be possible by building upon existing interfaces.
Programming Model and Features

The basis of pomdp_py is a set of simple interfaces that collectively form a framework for building and solving POMDPs. Figure 3 illustrates some of the key components of the framework and the control flow.

Figure 3: (1) Core interfaces in the pomdp_py framework; (2) POMDP control flow implemented through interaction between the core interfaces.

When defining a POMDP, one first defines the domain by implementing the State, Action, Observation interfaces. The only required functions for each interface are __eq__ and __hash__. For example, the interface for State is simply²:

    class State:
        def __eq__(self, other):
            raise NotImplementedError
        def __hash__(self):
            raise NotImplementedError

² Note that the code snippets here are modified or shortened slightly for display purposes. Please refer to the code on GitHub: https://github.com/h2r/pomdp-py/
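For instance, a concrete state for the Tiger domain could be written roughly as follows. This is a sketch for illustration; the tutorial's actual implementation may differ in its details.

    class TigerState(State):
        """Sketch: the state is simply which door the tiger is behind."""
        def __init__(self, name):
            self.name = name   # "tiger-left" or "tiger-right"
        def __eq__(self, other):
            return isinstance(other, TigerState) and self.name == other.name
        def __hash__(self):
            return hash(self.name)
        def __repr__(self):
            return "TigerState(%s)" % self.name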


Next, one defines the models by implementing the interfaces TransitionModel, ObservationModel, etc. (see Figure 3 for all). Note that one may define a different transition and reward model for the agent than for the environment (e.g. for learning). One also defines a PolicyModel which (1) determines the action space at a given history or state, and (2) samples an action from this space according to some probability distribution. Implementing these models involves implementing the probability, sample and argmax functions. For example, the interface for ObservationModel, modeling O(s′, a, o) = Pr(o|s′, a), is:

    class ObservationModel:
        def probability(self, observation, next_state, action, **kwargs):
            """Returns the probability Pr(o | s', a)."""
            raise NotImplementedError
        def sample(self, next_state, action, **kwargs):
            """Returns a sample o ~ Pr(o | s', a)."""
            raise NotImplementedError
        def argmax(self, next_state, action, **kwargs):
            """Returns o* = argmax_o Pr(o | s', a)."""
            raise NotImplementedError
        def get_all_observations(self, *args, **kwargs):
            """Returns a set of all possible observations, if feasible."""
            raise NotImplementedError

It is up to the user to choose which subset of these functions to implement, depending on the domain. These interfaces aim to remind users of the essence of models in POMDPs.
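As an illustration, a noisy-hearing observation model for the Tiger domain might look roughly like the following. This is a sketch: the Observation class, the attribute names, and the 0.85/0.15 noise level are assumptions made for the example, not the library's reference implementation.

    import random

    class TigerObservationModel(ObservationModel):
        """Sketch: when the agent takes the 'listen' action, it hears the
        tiger behind the correct door with probability 0.85; any other
        action yields an uninformative observation."""

        def probability(self, observation, next_state, action, **kwargs):
            if action.name != "listen":
                return 0.5                    # both observations equally likely
            if observation.name == next_state.name:
                return 0.85
            return 0.15

        def sample(self, next_state, action, **kwargs):
            other = ("tiger-right" if next_state.name == "tiger-left"
                     else "tiger-left")
            if action.name != "listen":
                # Uninformative: either observation is equally likely.
                return Observation(random.choice([next_state.name, other]))
            if random.random() < 0.85:
                return Observation(next_state.name)   # hear the correct door
            return Observation(other)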
To instantiate a POMDP, one provides parameters for the models, the initial state of the environment, and the initial belief of the agent. For the Tiger problem³ (Kaelbling, Littman, and Cassandra 1998), for example,

    s0 = random.choice(list(TigerProblem.STATES))
    b0 = pomdp_py.Histogram({State("tiger-left"): 0.5,
                             State("tiger-right"): 0.5})
    tiger_problem = TigerProblem(..., s0, b0)

Here, TigerProblem is a POMDP whose constructor takes care of initializing the Agent and Environment objects, and is instantiated with parameters (omitted), the initial state, and the initial belief. Note that it is entirely optional to explicitly define a problem class such as TigerProblem in order to program the POMDP control flow, discussed below.

³ https://h2r.github.io/pomdp-py/html/examples.tiger.html
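A rough sketch of what the TigerProblem constructor might do is shown below, bundling the user-defined models from the previous section into an Agent and an Environment. The pomdp_py.Agent, pomdp_py.Environment, and pomdp_py.POMDP constructors are used here with assumed argument orders, and the "(omitted)" parameters are dropped for brevity; treat this as illustrative rather than the library's documented signature.

    import pomdp_py

    class TigerProblem(pomdp_py.POMDP):
        """Sketch of a problem class: it bundles an Agent and an Environment
        built from user-defined models (argument order is an assumption)."""
        def __init__(self, init_true_state, init_belief):
            # In practice these would be the user's concrete model subclasses.
            agent = pomdp_py.Agent(init_belief,
                                   PolicyModel(),
                                   TransitionModel(),
                                   TigerObservationModel(),
                                   RewardModel())
            env = pomdp_py.Environment(init_true_state,
                                       TransitionModel(),
                                       RewardModel())
            super().__init__(agent, env, name="TigerProblem")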
To solve a POMDP with pomdp_py, here is the control flow one should implement that contains the basic steps (see also Figure 3 for an illustration):

1. Create a planner (Planner), i.e. a POMDP solver.
2. Agent plans an action a ∈ A through the planner.
3. Environment state transitions s_t → s_{t+1} according to its transition model.
4. Agent receives an observation o_t and reward r_t from the environment.
5. Agent updates history and belief, h_t, b_t → h_{t+1}, b_{t+1}, where h_{t+1} = h_t(a_t, o_t).
6. Unless the termination condition is true, repeat steps 2-5.

The Planner interface is as follows. The planner may be updated given a real action and a real observation, which is necessary for MCTS-based solvers.

    class Planner:
        def plan(self, agent):
            """The agent carries the information needed for planning:
            B_t, h_t, O, T, R (or G), and pi."""
            raise NotImplementedError
        def update(self, agent, action, observation):
            """Updates the planner based on the real action and observation.
            Updates the agent's belief accordingly if necessary."""
            pass
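Putting the six steps and the Planner interface together, the control flow can be sketched roughly as follows. Only plan and update come from the Planner interface above; state_transition, observation_model.sample, and update_history are assumptions based on the documented examples, and the loop bound is arbitrary.

    # Sketch of the six-step control flow for the hypothetical TigerProblem.
    planner = pomdp_py.POUCT(num_sims=1000,                        # step 1
                             rollout_policy=tiger_problem.agent.policy_model)

    for _ in range(100):                                           # step 6: loop
        action = planner.plan(tiger_problem.agent)                 # step 2
        reward = tiger_problem.env.state_transition(action,
                                                    execute=True)  # step 3
        observation = tiger_problem.agent.observation_model.sample(
            tiger_problem.env.state, action)                       # step 4
        tiger_problem.agent.update_history(action, observation)    # step 5
        planner.update(tiger_problem.agent, action, observation)   # belief/tree update
        # A problem-specific termination check would break out of the loop here.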
Code Organization. In a more complicated problem such as the Light-Dark domain (Platt Jr et al. 2010) or Multi-Object Search with fan-shaped sensors (Wandzel et al. 2019), it may be tricky to organize the code base and be consistent across different problems. Below we provide a recommendation of the package structure to use pomdp_py to guide the development and facilitate code sharing:

    - domain/
      - state.py              // State
      - action.py             // Action
      - observation.py        // Observation
      - ...
    - models/
      - transition_model.py   // TransitionModel
      - observation_model.py  // ObservationModel
      - reward_model.py       // RewardModel
      - policy_model.py       // PolicyModel
      - ...
    - agent/
      - agent.py              // Agent
      - ...
    - env/
      - env.py                // Environment
      - ...
    - problem.py              // POMDP

The recommendation is to separate code for the domain, models, agent, and environment, and to have simple, generic filenames. As in the above tree, files such as state.py or transition_model.py are self-evident in their role. The problem.py file is where the specific implementation of the POMDP class is defined, and where the logic of the control flow is implemented. Refer to the Multi-Object Search example in the documentation for more detail⁴.

⁴ https://h2r.github.io/pomdp-py/html/examples.mos.html
Object-Oriented POMDPs. OO-POMDP (Wandzel et al. 2019) is a particular kind of factored POMDP that factors the state and observation spaces into a set of n objects. For instance, Pr(s′|s, a) = ∏_i Pr(s′_i|s, a), i ∈ {1, · · · , n}. The belief space is also factored, which allows it to grow linearly instead of exponentially as the number of objects increases. Each object is of a certain class and has a set of attributes; the values of these attributes constitute the state of an object. In pomdp_py, we provide interfaces to implement OO-POMDPs, which serve as an example of extending the basic POMDP framework to create another class of models. These interfaces include OOState, OOBelief, OOTransitionModel, etc.
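The factored transition can be illustrated with the small sketch below. It assumes an OOState-like container exposing an object_states mapping and a dictionary of per-object transition models; it is not the library's OOTransitionModel implementation.

    class FactoredTransitionModel(TransitionModel):
        """Sketch: the joint transition probability is the product of
        per-object probabilities, Pr(s'|s,a) = prod_i Pr(s'_i|s,a)."""
        def __init__(self, object_transition_models):
            # Maps object id -> a transition model over that object's state.
            self._models = object_transition_models

        def probability(self, next_state, state, action, **kwargs):
            prob = 1.0
            for objid, model in self._models.items():
                # next_state.object_states[objid] is the assumed accessor
                # for the factored object state s'_i.
                prob *= model.probability(next_state.object_states[objid],
                                          state, action)
            return prob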
Integration with ROS. ROS (Quigley et al. 2009) is an open-source system that builds a network connecting computing stations and robots, where nodes interact with one another through publishing messages or making service requests. It is typical to separate the nodes that manage resources and control the robot from the nodes that run sophisticated algorithms; this is the case for pomdp_py as well. The POMDP-related computations can be done on a node that implements the POMDP control flow (see the six steps above). Inside this node, when an action is selected by the Planner (step 2), the node can publish a message to the robot-control nodes so that the robot executes the action; the environment state then automatically updates in the real world as a result of that action (step 3). The node receives sensor measurements or other forms of observations through subscribed topics (step 4) and performs the belief update (step 5). This process is repeated until the termination condition is met (step 6). ROS provides a package, rospy, which eases the integration of the POMDP control flow with the robot system.
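A skeleton of such a node might look as follows. The topic names, message types, and the parse_observation helper are placeholders, and the agent and planner objects are assumed to already exist; only the rospy calls themselves are standard. Synchronization between planning and the observation callback is omitted for brevity.

    import rospy
    from std_msgs.msg import String          # placeholder message type

    def on_observation(msg):
        """Steps 4-5: convert the ROS message into an observation (via a
        placeholder helper) and update the agent belief and the planner."""
        observation = parse_observation(msg)             # placeholder helper
        agent.update_history(last_action, observation)
        planner.update(agent, last_action, observation)

    rospy.init_node("pomdp_planner")
    action_pub = rospy.Publisher("/robot/action", String, queue_size=10)
    rospy.Subscriber("/robot/observation", String, on_observation)

    rate = rospy.Rate(1)                     # plan roughly once per second
    while not rospy.is_shutdown():
        last_action = planner.plan(agent)                   # step 2
        action_pub.publish(String(data=str(last_action)))   # robot executes (step 3)
        rate.sleep()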
3D Object Search with a Torso-Actuated Robot. We developed a novel approach to model and solve an OO-POMDP for the task of multi-object search in 3D. Using pomdp_py, we implemented this approach in a simulated environment and on Kinova MOVO, a torso-actuated mobile manipulator platform controlled with ROS. Figure 1 shows a sequence of actions that lead to object detection. In this demonstration, each planning step has a time budget of 3 seconds.

Conclusions & Future Work

We present a POMDP library, named pomdp_py, that brings together accessibility to programmers through Python as well as performance through Cython, with an intuitive design and straightforward integration with ROS. The programming model is designed to encourage organized development and code sharing within a community. We believe pomdp_py has the potential to facilitate research beyond POMDP planning, including reinforcement learning, transfer learning, and multi-agent systems. For example, multiple Agent objects could be instantiated, and different RewardModel classes can be created to represent different tasks. Finally, we call for support to create bridges between pomdp_py and other libraries to make use of existing algorithm implementations.
References

[APPL 2009] APPL, P. 2009. POMDPX file format (version 1.0). https://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/index.php?n=Main.PomdpXDocumentation.

[Bargiacchi 2014] Bargiacchi, E. 2014. AI-Toolbox: A C++ framework for MDPs and POMDPs with Python bindings. https://github.com/Svalorzen/AI-Toolbox.

[Behnel et al. 2011] Behnel, S.; Bradshaw, R.; Citro, C.; Dalcin, L.; Seljebotn, D. S.; and Smith, K. 2011. Cython: The best of both worlds. Computing in Science & Engineering 13(2):31–39.

[Canal et al. 2019] Canal, G.; Cashmore, M.; Krivić, S.; Alenyà, G.; Magazzeni, D.; and Torras, C. 2019. Probabilistic planning for robotics with ROSPlan. In Annual Conference Towards Autonomous Robotic Systems, 236–250. Springer.

[Cashmore et al. 2015] Cashmore, M.; Fox, M.; Long, D.; Magazzeni, D.; Ridder, B.; Carrera, A.; Palomeras, N.; Hurtos, N.; and Carreras, M. 2015. ROSPlan: Planning in the robot operating system. In Twenty-Fifth International Conference on Automated Planning and Scheduling.

[Cassandra 2003] Cassandra, A. R. 2003. POMDP file format. http://www.pomdp.org/code/pomdp-file-spec.html.

[Egorov et al. 2017] Egorov, M.; Sunberg, Z. N.; Balaban, E.; Wheeler, T. A.; Gupta, J. K.; and Kochenderfer, M. J. 2017. POMDPs.jl: A framework for sequential decision making under uncertainty. The Journal of Machine Learning Research 18(1):831–835.

[Emami, Hamlet, and Crane 2015] Emami, P.; Hamlet, A. J.; and Crane, C. 2015. POMDPy: An extensible framework for implementing POMDPs in Python.

[Garg, Hsu, and Lee 2019] Garg, N. P.; Hsu, D.; and Lee, W. S. 2019. DESPOT-α: Online POMDP planning with large state and observation spaces. In Robotics: Science and Systems.

[Kaelbling, Littman, and Cassandra 1998] Kaelbling, L. P.; Littman, M. L.; and Cassandra, A. R. 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence 101(1-2):99–134.

[Kurniawati, Hsu, and Lee 2008] Kurniawati, H.; Hsu, D.; and Lee, W. S. 2008. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems, volume 2008. Zurich, Switzerland.

[Madani, Hanks, and Condon 1999] Madani, O.; Hanks, S.; and Condon, A. 1999. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In AAAI/IAAI, 541–548.

[Ocaña et al. 2005] Ocaña, M.; Bergasa, L. M.; Sotelo, M.; and Flores, R. 2005. Indoor robot navigation using a POMDP based on WiFi and ultrasound observations. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2592–2597. IEEE.

[Platt Jr et al. 2010] Platt Jr, R.; Tedrake, R.; Kaelbling, L.; and Lozano-Perez, T. 2010. Belief space planning assuming maximum likelihood observations.

[Quigley et al. 2009] Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; and Ng, A. Y. 2009. ROS: An open-source Robot Operating System. In ICRA Workshop on Open Source Software, volume 3, 5. Kobe, Japan.

[Ross et al. 2008] Ross, S.; Pineau, J.; Paquet, S.; and Chaib-Draa, B. 2008. Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research 32:663–704.

[Sanner 2010] Sanner, S. 2010. Relational Dynamic Influence Diagram Language (RDDL): Language description. Unpublished ms., Australian National University 32.

[Shani, Pineau, and Kaplow 2013] Shani, G.; Pineau, J.; and Kaplow, R. 2013. A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems 27(1):1–51.

[Silver and Veness 2010] Silver, D., and Veness, J. 2010. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, 2164–2172.

[Smith 2015] Smith, K. W. 2015. Cython: A Guide for Python Programmers. O'Reilly Media, Inc.

[Somani et al. 2013] Somani, A.; Ye, N.; Hsu, D.; and Lee, W. S. 2013. DESPOT: Online POMDP planning with regularization. In Advances in Neural Information Processing Systems, 1772–1780.

[Sunberg and Kochenderfer 2018] Sunberg, Z. N., and Kochenderfer, M. J. 2018. Online algorithms for POMDPs with continuous state, action, and observation spaces. In Twenty-Eighth International Conference on Automated Planning and Scheduling.

[Thrun, Burgard, and Fox 2005] Thrun, S.; Burgard, W.; and Fox, D. 2005. Probabilistic Robotics.

[Virtanen et al. 2020] Virtanen, P.; Gommers, R.; Oliphant, T. E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods 1–12.

[Wandzel et al. 2019] Wandzel, A.; Oh, Y.; Fishman, M.; Kumar, N.; LS, W. L.; and Tellex, S. 2019. Multi-object search using object-oriented POMDPs. In 2019 International Conference on Robotics and Automation (ICRA), 7194–7200. IEEE.

[Whitney et al. 2017] Whitney, D.; Rosen, E.; MacGlashan, J.; Wong, L. L.; and Tellex, S. 2017. Reducing errors in object-fetching interactions through social feedback. In 2017 IEEE International Conference on Robotics and Automation (ICRA), 1006–1013. IEEE.

[Zilberstein 1996] Zilberstein, S. 1996. Using anytime algorithms in intelligent systems. AI Magazine 17(3):73–73.
