
PROC. OF THE 14th PYTHON IN SCIENCE CONF. (SCIPY 2015)
pgmpy: Probabilistic Graphical Models using Python
Ankur Ankan∗, Abinash Panda
https://www.youtube.com/watch?v=Vcmjqx7lht0
Abstract—Probabilistic Graphical Models (PGMs) are a technique for compactly
representing a joint distribution by exploiting dependencies between the random
variables. They also allow us to do inference on joint distributions in a
computationally cheaper way than traditional methods. PGMs are widely used in
speech recognition, information extraction, image segmentation, and modelling
gene regulatory networks.
pgmpy [pgmpy] is a Python library for working with graphical models. It allows
the user to create their own graphical models and answer inference or MAP
queries over them. pgmpy implements many inference algorithms, such as
Variable Elimination and Belief Propagation.
This paper first gives a short introduction to PGMs and to other Python
packages available for working with PGMs. We then discuss creating and
doing inference over Bayesian Networks and Markov Networks using pgmpy.
Index Terms—Graphical Models, Bayesian Networks, Markov Networks,
Variable Elimination
Introduction
A Probabilistic Graphical Model (PGM) is a technique for representing
joint distributions over random variables in a compact
way by exploiting the dependencies between them. PGMs use
a network structure to encode the relationships between the
random variables and a set of parameters to represent the joint
distribution.
There are two major types of Graphical Models: Bayesian
Networks and Markov Networks.
Bayesian Network: A Bayesian Network consists of a
directed graph and a conditional probability distribution associated
with each of the random variables. A Bayesian Network
is used mostly when there is a causal relationship between
the random variables. An example of a Bayesian Network
representing a student [student] taking a course is shown
in Fig. 1.
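To make the role of the conditional probability distributions concrete, the following is a minimal sketch in plain Python (it does not use pgmpy). It assumes a three-variable fragment in the spirit of the student example, with difficulty D and intelligence I as parents of grade G. The variable names, state names and all probability values are illustrative assumptions and are not taken from Fig. 1. The sketch shows that the joint distribution is the product of one distribution per node.

```python
from itertools import product

# Hypothetical network fragment: D -> G <- I.
# All numbers are illustrative assumptions, not values from the paper.
p_d = {"easy": 0.6, "hard": 0.4}   # P(D)
p_i = {"low": 0.7, "high": 0.3}    # P(I)
p_g = {                            # P(G | I, D), one distribution per parent state
    ("low", "easy"): {"A": 0.30, "B": 0.40, "C": 0.30},
    ("low", "hard"): {"A": 0.05, "B": 0.25, "C": 0.70},
    ("high", "easy"): {"A": 0.90, "B": 0.08, "C": 0.02},
    ("high", "hard"): {"A": 0.50, "B": 0.30, "C": 0.20},
}
grades = ["A", "B", "C"]


def joint(d, i, g):
    """Chain rule for this graph: P(d, i, g) = P(d) * P(i) * P(g | i, d)."""
    return p_d[d] * p_i[i] * p_g[(i, d)][g]


# The factorized joint is a proper distribution: it sums to one.
total = sum(joint(d, i, g) for d, i, g in product(p_d, p_i, grades))

# A marginal is obtained by summing the joint over the other variables.
p_grade_a = sum(joint(d, i, "A") for d, i in product(p_d, p_i))

print(f"sum of joint = {total:.3f}, P(G = A) = {p_grade_a:.3f}")
```

Enumerating the joint like this is only feasible for very small networks; it is meant to illustrate the factorization, not to serve as an inference method.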
Markov Network: A Markov Network consists of an undirected
graph and a set of Factors associated with it. Unlike
Conditional Probability Distributions, a Factor does not represent
the probabilities of variables in the network; instead it
represents the compatibility between random variables, that is,
how likely a particular state of one random variable is to
agree with a particular state of another random variable.
An example of a Markov Network [markov] over four friends A,
B, C, D agreeing on some concept is shown in Fig. 2.
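The difference between a Factor and a probability can be sketched in plain Python (without pgmpy). The example assumes four binary variables connected in a loop A-B-C-D-A with one pairwise factor per edge. The factor values are illustrative assumptions, not values from Fig. 2. Factor entries are arbitrary non-negative compatibility scores; probabilities only appear after multiplying the factors and dividing by the normalizing constant Z.

```python
from itertools import product

# Illustrative pairwise factors over binary states (0 or 1).
# Larger values mean the two states are more compatible.
# These numbers are assumptions for demonstration only.
phi_ab = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
phi_bc = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
phi_cd = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
phi_da = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}


def unnormalized(a, b, c, d):
    """Product of all factors for one joint assignment."""
    return phi_ab[(a, b)] * phi_bc[(b, c)] * phi_cd[(c, d)] * phi_da[(d, a)]


states = list(product((0, 1), repeat=4))

# Partition function: the sum of the factor product over all assignments.
Z = sum(unnormalized(*s) for s in states)

# Dividing by Z turns compatibility scores into a joint distribution.
joint = {s: unnormalized(*s) / Z for s in states}
most_likely = max(joint, key=joint.get)

print("Z =", Z)
print("most likely (A, B, C, D):", most_likely)
```

Note that no single factor sums to one; only the normalized product does.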
Copyright © 2015 Ankur Ankan et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium,
provided the original author and source are credited.
There are numerous open source packages available in
Python for working with graphical models. eBay's bayesian-belief-networks
[bbn] focuses mostly on Bayesian Models
and implements a limited number of inference
algorithms. Another package, pymc [pymc], focuses mainly
on Markov Chain Monte Carlo (MCMC) methods. libpgm
[libpgm] also focuses mainly on Bayesian Networks.
pgmpy tries to be a complete package for working with
graphical models and gives the user full control over designing
the model. The source code is documented with
docstrings and doctests for each method so that users
can quickly get up to speed. Furthermore, pgmpy is designed to be
easily extensible, allowing users to write their own inference
algorithms or elimination order algorithms without any additional
effort to get familiar with the source code.
Getting Source Code and Installing
pgmpy is released under the MIT License and is hosted on GitHub.
We can simply clone the repository and install it:
git clone https://github.com/pgmpy/pgmpy
cd pgmpy
[sudo] python3 setup.py install
Dependencies: pgmpy runs only on Python 3 and depends
on networkx, numpy, pandas and scipy, which can be installed
using pip or conda as:
pip install -r requirements.txt
or:
conda install --file requirements.txt
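After installing, it can be useful to confirm that the dependencies named above are visible to the interpreter. The following is a small sketch using only the standard library. The helper name `missing_modules` is hypothetical and is not part of pgmpy.

```python
from importlib.util import find_spec

# Dependencies listed in the text above.
REQUIRED = ["networkx", "numpy", "pandas", "scipy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only reports whether each module can be located; it does not verify version requirements.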
Creating Bayesian Models using pgmpy
A Bayesian Network consists of a directed graph where
nodes represent random variables and edges represent the
relations between them. It is parameterized using Conditional
Probability Distributions (CPDs). Each random variable in a
Bayesian Network has a CPD associated with it. If a random
variable has parents in the network, then the CPD represents
P(var | Par_var), i.e. the probability of that variable given its
parents. When the random variable has no parents,
it simply represents P(var), i.e. the probability of that variable.
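The structure of a CPD can be sketched in plain Python before turning to pgmpy. The sketch assumes a tabular layout with one row per state of the variable and one column per joint state of its parents, so that every column is a distribution and must sum to one. The helper names `cpd_shape` and `is_valid_cpd` and the table values are hypothetical illustrations.

```python
from math import isclose, prod


def cpd_shape(cardinality, parent_cardinalities):
    """Rows = states of the variable, columns = joint states of its parents.

    With no parents there is a single column, i.e. the table is just P(var).
    """
    return cardinality, prod(parent_cardinalities)  # prod([]) is 1


def is_valid_cpd(table):
    """Check that every column is a probability distribution."""
    return all(
        all(p >= 0 for p in column) and isclose(sum(column), 1.0, abs_tol=1e-9)
        for column in zip(*table)
    )


# Hypothetical CPD for a 3-state variable with two binary parents (3 x 4).
grade_cpd = [
    [0.30, 0.05, 0.90, 0.50],
    [0.40, 0.25, 0.08, 0.30],
    [0.30, 0.70, 0.02, 0.20],
]

print(cpd_shape(3, [2, 2]), is_valid_cpd(grade_cpd))
```

The number of columns grows as the product of the parents' cardinalities, which is why nodes with many parents become expensive to parameterize.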
For example, we can take the case of the student model represented
in Fig. 1. A possible CPD for the random variable
grade is shown in Table 1.
We can represent the CPD shown in Table 1 in pgmpy as
follows: