The document outlines a framework for machine learning, covering: 1) the key components of a learning system (input, knowledge base, learner, and output); 2) different perspectives on machine learning, such as optimization, concept formation, and pattern recognition; and 3) approaches to inductive learning, including decision trees, evolutionary algorithms, neural networks, and conceptual clustering. Examples illustrate different inductive learning systems and how they can generate rules from examples.
TL;DR: This tutorial was delivered at KDD 2021. Here we review recent developments to extend the capacity of neural networks to “learning to reason” from data, where the task is to determine if the data entails a conclusion.
The rise of big data and big compute has brought modern neural networks into many walks of digital life, thanks to the relative ease of constructing large models that scale to the real world. The current successes of Transformers and self-supervised pretraining on massive data have led some to believe that deep neural networks will be able to do almost anything given enough data and computational resources. However, this might not be the case. While neural networks are quick to exploit surface statistics, they fail to generalize to novel combinations. Current neural networks do not perform deliberate reasoning – the capacity to deduce new knowledge from contextualized data. This tutorial reviews recent developments that extend the capacity of neural networks to "learning to reason" from data, where the task is to determine whether the data entails a conclusion. This capacity opens up new ways to generate insights from data through arbitrary querying in natural language, without the need to predefine a narrow set of tasks.
Experimenting with eXtreme Design (EKAW2010) - evabl444
The document reports on an experiment evaluating the use of Content Ontology Design Patterns (ODPs) and the eXtreme Design (XD) methodology and tools. The experiment confirmed previous findings that Content ODPs improve ontology quality and reduce common mistakes. It also found that the XD tools support reuse of ODPs and that the XD methodology further decreases mistakes through its test-driven approach. Areas for future work include improving collaboration support and evaluating the methodology on other tasks.
International Journal of Engineering Research and Development (IJERD) - IJERD Editor
Secondary mathematics wednesday august 22 2012 - brearatliff
Using CSCOPE for Instructional Planning - a PowerPoint outlining how to use the YAG, TEKS Verification Document, IFD, VAD, and EITG for effective lesson planning.
1) The document proposes measuring human learning ability through complexity measures like Rademacher complexity and algorithmic stability, which are commonly used to analyze machine learning algorithms.
2) An experiment was designed to estimate average human Rademacher complexity and algorithmic stability for students on different types of tasks (shape and word problems).
3) The results showed that human algorithmic stability provided more useful insights into human learning than Rademacher complexity, as it does not require fixing a function class or assuming optimal performance like Rademacher complexity does.
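For readers unfamiliar with the machine-learning side of this comparison, here is a minimal, illustrative sketch (not from the paper, and with invented data and learner choices) of how empirical Rademacher complexity is typically estimated for a learning algorithm: generate random ±1 labels and measure how well the learner can fit them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def empirical_rademacher(X, learner_factory, trials=50, seed=0):
    """Monte-Carlo estimate of empirical Rademacher complexity for the
    hypotheses a learner can reach: average correlation with random labels."""
    rng = np.random.default_rng(seed)
    n = len(X)
    correlations = []
    for _ in range(trials):
        sigma = rng.choice([-1, 1], size=n)           # random +/-1 labels
        h = learner_factory().fit(X, sigma)           # try to fit the noise
        correlations.append(np.mean(sigma * h.predict(X)))
    return float(np.mean(correlations))

X = np.random.default_rng(1).random((40, 5))
print(empirical_rademacher(X, lambda: DecisionTreeClassifier(max_depth=2)))
print(empirical_rademacher(X, lambda: DecisionTreeClassifier()))  # richer class, higher complexity
```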
Current Approaches in Search Result Diversification - Mario Sangiorgio
The document discusses current approaches to search result diversification. It frames diversification as returning results that are both relevant and diverse for ambiguous queries, optimizing relevance and diversity through measures such as semantic distance, categorical distance, and novel information. The tradeoff between relevance and diversity makes the problem NP-hard. Common objective functions combine the two by maximizing their sum or product. Evaluation benchmarks adapt existing metrics or use datasets with ground truths. Open issues include defining new types of diversity and integrating diversity earlier in the ranking process.
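As an illustration of the relevance–diversity tradeoff described above, here is a minimal greedy re-ranking sketch in the spirit of maximal marginal relevance; it is not a specific system from the document, and the relevance scores and similarity matrix are invented.

```python
import numpy as np

def greedy_diversify(relevance, similarity, k, trade_off=0.7):
    """Greedy re-ranking: repeatedly pick the document that balances relevance
    against similarity to the documents already selected."""
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return trade_off * relevance[i] - (1 - trade_off) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

relevance = np.array([0.9, 0.85, 0.5])
similarity = np.array([[1.0, 0.95, 0.1],
                       [0.95, 1.0, 0.1],
                       [0.1, 0.1, 1.0]])
print(greedy_diversify(relevance, similarity, k=2))  # [0, 2]: doc 1 nearly duplicates doc 0
```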
3 D Project Based Learning Basics for the New Generation Science Standards - rekharajaseran
This presentation is a part of the workshop presented at Griffin RESA Drive-In STEM Conference on September 28, 2016. It provides an introduction to the basics of three dimensional project based learning for STEM Education and New Generation Science Standards.
The document discusses various topics related to decision making including:
- The rational decision making model and its assumptions
- Bounded rationality and satisficing behaviors
- Types of decisions and problems
- Decision making styles and aids
- Group decision making techniques
- The impact of culture on decision making practices
IRJET - Prediction of Autism Spectrum Disorder using Deep Learning: A Survey - IRJET Journal
This document summarizes research on using deep learning techniques to predict Autism Spectrum Disorder (ASD). It first provides background on ASD, describing it as a developmental disorder that impairs social communication and interaction. It then reviews related work applying machine learning to ASD prediction and diagnosis. The proposed system would use a deep learning model trained on an AQ10 dataset of behavioral questions to predict ASD severity. It would employ a multi-layer feedforward neural network optimized with the Adam gradient descent algorithm. The goal is to develop an accurate, fast and low-cost mobile application to help diagnose ASD at an early stage.
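The following is a minimal sketch of the kind of model the summary describes: a multi-layer feedforward network trained with the Adam optimizer. The AQ-10-style data here is synthetic, and the layer sizes are arbitrary assumptions rather than values from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for AQ-10-style data: 10 binary screening answers per subject.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 10))
y = (X.sum(axis=1) >= 6).astype(int)              # toy label, not a clinical rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), solver="adam",  # Adam-optimized MLP
                    max_iter=1000, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```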
Open 2013: Team-based Learning Pedagogy: Transforming classroom dialogue and... - the nciia
This document describes using team-based learning (TBL) pedagogy in a 1-year Masters of Engineering and Management program to develop students' critical thinking and problem-solving skills. Key aspects of TBL include assigning pre-work, using readiness assessments and application exercises in small groups, and conducting in-class discussions. Assessment data shows self-reported improvements in students' ability to summarize issues, identify assumptions, develop hypotheses, and use evidence-based reasoning after participating in TBL activities.
The document discusses a study that examined differences in motivation and computer proficiency between daily computer users. The study hypothesized that extrinsically motivated proficient users would have more difficulty with unfamiliar computer tasks compared to intrinsically motivated users. The study involved administering a motivation inventory to over 130 participants from various countries and ages. Based on inventory scores, 16 participants were observed performing unfamiliar computer tasks. The observations found that extrinsically motivated users stumbled, fell, persisted, quit, and resisted unfamiliar tasks significantly more than intrinsically motivated users. The study provides insights into how motivation styles impact adaptation to unfamiliar technologies.
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CELEBA images at 1024² resolution. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CELEBA dataset.
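To make the progressive-growing idea concrete, here is an illustrative PyTorch sketch of the fade-in mechanism described in the abstract; it is not the paper's implementation, and the layer sizes and class names are our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GrowingGenerator(nn.Module):
    """Start at 4x4 and add one upsampling block per resolution stage; the newest
    block is blended in with a fade-in coefficient alpha that ramps from 0 to 1."""
    def __init__(self, latent_dim=128, channels=64):
        super().__init__()
        self.latent_dim = latent_dim
        self.channels = channels
        self.initial = nn.Sequential(                      # 1x1 latent -> 4x4 feature map
            nn.ConvTranspose2d(latent_dim, channels, 4),
            nn.LeakyReLU(0.2),
        )
        self.blocks = nn.ModuleList()
        self.to_rgb = nn.ModuleList([nn.Conv2d(channels, 3, 1)])

    def grow(self):
        """Add a block that doubles the output resolution."""
        self.blocks.append(nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(self.channels, self.channels, 3, padding=1),
            nn.LeakyReLU(0.2),
        ))
        self.to_rgb.append(nn.Conv2d(self.channels, 3, 1))

    def forward(self, z, alpha=1.0):
        x = self.initial(z.view(-1, self.latent_dim, 1, 1))
        if not self.blocks:
            return self.to_rgb[0](x)
        for block in self.blocks[:-1]:
            x = block(x)
        new = self.to_rgb[-1](self.blocks[-1](x))
        old = self.to_rgb[-2](F.interpolate(x, scale_factor=2, mode="nearest"))
        return alpha * new + (1 - alpha) * old             # fade the new layers in

g = GrowingGenerator()
z = torch.randn(2, 128)
print(g(z).shape)              # torch.Size([2, 3, 4, 4])
g.grow()
print(g(z, alpha=0.3).shape)   # torch.Size([2, 3, 8, 8])
```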
IITSEC Presentation on Learning in Virtual Worlds - taoirene
Can learners improve their knowledge of accounting by using a 3D interactive model of the accounting equation in Second Life?
Will students learn more working in small groups or alone?
Will students experience greater anxiety reductions if they work in small groups or alone?
This lesson teaches students about the relationship between visual fraction models and equations when dividing fractions. Students will formally connect fraction models to multiplication through the use of multiplicative inverses. They will use fraction strips and tape diagrams to model division problems involving fractions. Students will learn that dividing a fraction by another fraction is the same as multiplying by the inverse or reciprocal of the divisor fraction. The lesson provides examples showing how to set up and solve word problems involving division of fractions using visual models and equations.
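The invert-and-multiply rule the lesson builds toward can be checked directly, for example with Python's exact fractions:

```python
from fractions import Fraction

# Dividing by a fraction equals multiplying by its reciprocal:
# (3/4) ÷ (1/2) == (3/4) × (2/1) == 3/2
a, b = Fraction(3, 4), Fraction(1, 2)
print(a / b)                                      # 3/2
print(a * Fraction(b.denominator, b.numerator))   # 3/2, via the reciprocal
```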
Elementary mathematics wednesday august 22 2012 - brearatliff
Using CSCOPE for Instructional Planning: a PowerPoint outlining how to use the YAG, TEKS Verification Document, IFD, VAD, and EITG for effective lesson planning.
The document discusses using statistical tests in trust models for services. It proposes calculating additional aspects not considered in other models, like general service characteristics, descriptions, and external data, to improve trust model accuracy and robustness. The document outlines applying various statistical tests to analyze agreement fulfillment, release improvements, and user opinions and feedback, including tests for variance, averages, and inter-rater agreement. The goal is to exploit diverse data sources to analyze service reliability more deeply.
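As a small illustration of one of the statistical tests mentioned, inter-rater agreement between two users rating the same services can be computed with Cohen's kappa (the ratings below are invented):

```python
from sklearn.metrics import cohen_kappa_score

# Agreement between two users rating the same eight services on a 1-5 scale.
ratings_user_a = [5, 4, 4, 2, 1, 3, 5, 2]
ratings_user_b = [5, 4, 3, 2, 1, 3, 4, 2]
print("Cohen's kappa:", cohen_kappa_score(ratings_user_a, ratings_user_b))
```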
Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote) - Natalia Juristo
This document discusses best practices for designing and analyzing software engineering experiments. It outlines key aspects of experiments such as definition, operationalization of variables, experimental design, implementation, analysis, and publication. Regarding design, it emphasizes controlling irrelevant variables through randomization and blocking. It provides an example experiment comparing model-driven development to traditional development. The document stresses iterative design to address validity threats and importance of statistical power.
This document discusses an experience-based approach to mathematics education called MEBA™. It focuses on both routine and nonroutine problem solving. Routine problems involve known procedures while nonroutine problems emphasize heuristics. The document also introduces the Mathematics Pentathlon® program which features strategic games to develop diverse mathematical thinking and active nonroutine problem solving skills.
This document presents a scalable heuristic called Maximum Influence Arborescence (MIA) for solving the influence maximization problem in large social networks. MIA finds maximum influence paths between nodes and uses them to construct local influence regions called arborescences. It selects seed nodes that provide the largest marginal increase in influence spread by efficiently updating activation probabilities in the arborescences. Experiments on real networks show MIA achieves 10³-10⁴× speedups compared to previous methods while maintaining similar influence spread, making it suitable for large networks with thousands to millions of nodes.
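The core building block, maximum influence paths, can be sketched as a shortest-path computation on -log(probability) edge weights. The snippet below shows only that core idea under an assumed adjacency-dictionary format; it is not the full MIA heuristic with arborescences and greedy seed selection.

```python
import heapq
import math

def max_influence_paths(graph, seed):
    """Maximum-influence paths from `seed`: maximize the product of edge
    propagation probabilities, i.e. run Dijkstra on -log(p) edge weights.
    `graph` maps u -> {v: p_uv} with 0 < p_uv <= 1 (assumed format)."""
    dist, parent = {seed: 0.0}, {seed: None}
    heap = [(0.0, seed)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue
        for v, p in graph.get(u, {}).items():
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return {v: math.exp(-d) for v, d in dist.items()}, parent

probs, parent = max_influence_paths({"a": {"b": 0.5, "c": 0.1}, "b": {"c": 0.4}}, "a")
print(probs["c"])  # ~0.2 via a->b->c (0.5*0.4), which beats the direct edge a->c (0.1)
```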
CC282 Decision trees Lecture 2 slides for CC282 Machine ... - butest
This document provides an overview of decision trees and their use in machine learning. It discusses key concepts like concept learning, hypothesis space, inductive learning, and overfitting. It also describes how decision trees work, including how they are constructed using a top-down approach to select the attribute that yields the highest information gain at each step, and how rules can be extracted from fully grown decision trees.
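A minimal sketch of the attribute-selection step described above, computing entropy and information gain for a candidate split (the toy data is invented, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting `rows` (list of dicts) on `attribute`."""
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attribute], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0: this split separates the classes perfectly
```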
This document discusses algorithm-independent machine learning techniques. It introduces concepts like bias and variance, which can quantify how well a learning algorithm matches a problem without depending on a specific algorithm. Methods like cross-validation, bootstrapping, and resampling can be used with different algorithms. While no algorithm is inherently superior, such techniques provide guidance on algorithm use and help integrate multiple classifiers.
Predictive uncertainty of deep models and its applications - NAVER Engineering
Presenter: Kimin Lee (Ph.D. student, KAIST)
Presentation date: April 2018
The predictive uncertainty (e.g., the entropy of the softmax distribution of a deep classifier) is indispensable, as it is useful in many machine learning applications (e.g., active learning and ensemble learning) as well as when deploying the trained model in real-world systems. In order to improve the quality of the predictive uncertainty, we proposed a novel loss function for training deep models (ICLR 2018). We showed that confidence-calibrated deep models trained with our method can be very useful in various machine learning applications such as novelty detection (CVPR 2018) and ensemble learning (ICML 2017).
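For concreteness, the basic uncertainty score mentioned in the first sentence, the entropy of a classifier's softmax distribution, can be computed as follows (illustrative only; this is not the proposed loss function):

```python
import numpy as np

def predictive_entropy(logits):
    """Entropy of the softmax distribution; higher means more uncertain."""
    z = logits - logits.max(axis=-1, keepdims=True)       # for numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

print(predictive_entropy(np.array([[5.0, 0.1, 0.1],       # confident prediction: low entropy
                                   [1.0, 1.0, 1.0]])))    # uniform: entropy = log(3) ≈ 1.10
```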
Chapter 4 credit assessment with neural networks - Quan Risk
This document provides an overview of credit assessment using neural networks. It begins with an outline that covers polynomial regression, multiple linear regression, monotonic neural networks, and shadow ratings. It then discusses monotonic neural networks in more detail, including their structure and how they are optimized. The document concludes with an example of using a neural network for credit assessment that normalizes and splits data before setting up, calibrating, and using the network to conduct predictions and analyze monotonicity.
A More Transparent Interpretation of Health Club Surveys (YMCA) - Salford Systems
The document summarizes Dean Abbott's presentation on interpreting health club member surveys. It describes two approaches: 1) A traditional statistical analysis using frequencies, tests, and measures of central tendency, and 2) Abbott's recommended solution using data mining techniques like factor analysis and predictive modeling. Factor analysis reduced the dimensionality of the survey data by identifying underlying key factors. Predictive models were then created to link these factors to membership satisfaction, renewal likelihood, and recommendations.
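A minimal sklearn sketch of the two-stage idea described above (compress correlated survey items into latent factors, then link the factors to an outcome with a predictive model). The survey matrix, item meanings, and renewal label are synthetic placeholders, and the specific estimators are our own choices.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression

# Synthetic survey matrix: 200 members x 12 Likert-scale items, plus a renewal flag.
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 12)).astype(float)
renewed = (X[:, :4].mean(axis=1) > 3).astype(int)

# Step 1: compress correlated items into a few latent factors.
# Step 2: link those factors to the outcome with a predictive model.
model = make_pipeline(FactorAnalysis(n_components=3, random_state=0),
                      LogisticRegression(max_iter=1000))
model.fit(X, renewed)
print("training accuracy:", model.score(X, renewed))
```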
Examining learner-computer interactions: advanced lab-based research methods
Jonathan P. San Diego of King's College London presented an approach to examining learner-computer interactions using strategy as a unit of analysis developed within his PhD. He showed some of the data collection and analysis techniques that include capturing attention via eye-tracking, capturing sketches via tablet computers, integrating the analysis of multiple video feeds, and using strategy as a unit of analysis. Jonathan also gave some of his reflections on potential future uses of these research techniques.
This document discusses different approaches to theorizing in design research. It outlines several types of theory, from lower-level theories like frameworks and methods to higher-level design theories. The document also discusses how design research can be used to both develop new design theories and modify existing kernel theories through approaches like Action Design Research. Finally, it emphasizes that theorizing is important for advancing design research and notes that the goal should be to develop design principles even if a full design theory is not achieved.
Estimation of Distribution Algorithms Tutorial - Martin Pelikan
Probabilistic model-building genetic algorithms (PMBGAs), also called estimation of distribution algorithms (EDAs) and iterated density estimation algorithms (IDEAs), replace the traditional variation operators of genetic and evolutionary algorithms by (1) building a probabilistic model of promising solutions and (2) sampling the built model to generate new candidate solutions.
Replacing traditional crossover and mutation operators by building and sampling a probabilistic model of promising solutions enables the use of machine learning techniques for automatic discovery of problem regularities and exploitation of these regularities for effective exploration of the search space. Using machine learning in optimization enables the design of optimization techniques that can automatically adapt to the given problem. There are many successful applications of PMBGAs, for example, Ising spin glasses in 2D and 3D, graph partitioning, MAXSAT, feature subset selection, forest management, groundwater remediation design, telecommunication network design, antenna design, and scheduling.
This tutorial provides a gentle introduction to PMBGAs with an overview of major research directions in this area. Strengths and weaknesses of different PMBGAs will be discussed and suggestions will be provided to help practitioners to choose the best PMBGA for their problem.
The video of this tutorial presented at GECCO-2008 can be found at
http://medal.cs.umsl.edu/blog/?p=293
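To make the build-and-sample loop described above concrete, here is a minimal univariate EDA (UMDA) on OneMax; it is a generic textbook example, not code from the tutorial, and the parameter values are arbitrary.

```python
import numpy as np

def umda_onemax(n=50, pop_size=200, generations=60, truncation=0.5, seed=0):
    """Univariate EDA (UMDA) on OneMax: (1) build a per-bit probability model
    from the selected solutions, (2) sample it to create the next population."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                                        # the univariate model
    for _ in range(generations):
        pop = (rng.random((pop_size, n)) < p).astype(int)      # sample the model
        fitness = pop.sum(axis=1)                              # OneMax: count of ones
        elite = pop[np.argsort(fitness)[-int(truncation * pop_size):]]  # selection
        p = elite.mean(axis=0).clip(1 / n, 1 - 1 / n)          # rebuild the model
    return p

print((umda_onemax() > 0.5).sum(), "of 50 bit frequencies converged toward 1")
```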
Towards billion bit optimization via parallel estimation of distribution algo... - kknsastry
The document describes research into using efficient estimation of distribution algorithms (EDAs) like the compact genetic algorithm (cGA) to solve optimization problems involving billions of variables. Key aspects discussed include the cGA's memory and computational efficiency through techniques like parallelization and vectorization. The researchers were able to solve a noisy OneMax problem involving over 33 million variables to optimality and a problem with 1.1 billion variables with relaxed convergence. The document argues this research is important because many real-world problems involving nanotechnology, biology, and information systems require solving optimization problems at massive scales.
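A simplified sketch of the compact GA mentioned above: it stores only a probability vector, which is why memory stays linear in the number of variables. The update step size 1/n below stands in for the simulated population size, and OneMax is used as the fitness; this is an illustration, not the paper's parallel/vectorized implementation.

```python
import numpy as np

def compact_ga_onemax(n=200, steps=40000, seed=0):
    """Compact GA sketch: only a length-n probability vector is stored."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)
    for _ in range(steps):
        a = (rng.random(n) < p).astype(int)                        # two competing samples
        b = (rng.random(n) < p).astype(int)
        winner, loser = (a, b) if a.sum() >= b.sum() else (b, a)   # OneMax tournament
        p += (winner - loser) / n                                  # shift disagreeing bits toward the winner
        np.clip(p, 0.0, 1.0, out=p)
    return p

print((compact_ga_onemax() > 0.9).mean(), "fraction of bits (nearly) converged to 1")
```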
Efficiency Enhancement of Estimation of Distribution Algorithms - Martin Pelikan
This document discusses enhancing the efficiency of estimation of distribution algorithms (EDAs). EDAs guide search by building and sampling an explicit probabilistic model of high-quality solutions and can solve hard problems scalably. However, for very large problems with complex evaluations, their performance may not be sufficient. The authors from the University of Missouri propose methods for improving the efficiency of EDAs.
Intelligent Bias of Network Structures in the Hierarchical BOA - Martin Pelikan
This document describes research on intelligently biasing the network structures learned by the hierarchical Bayesian Optimization Algorithm (hBOA) to improve its performance. The researchers developed a method called split probability matrix (SPM) biasing, which uses prior information stored in an SPM to bias hBOA's network building process towards structures with a certain number of splits. They tested SPM biasing on two problems: Trap-5 and 2D Ising spin glasses. The experiments showed that SPM biasing significantly reduced hBOA's execution time, number of evaluations, and bits examined on both problems, with the best speedups achieved around a tuning parameter κ value of 1.
Fitness inheritance in the Bayesian optimization algorithm - Martin Pelikan
This paper describes how fitness inheritance can be used to estimate fitness for a proportion of newly sampled candidate solutions in the Bayesian optimization algorithm (BOA). The goal of estimating fitness for some candidate solutions is to reduce the number of fitness evaluations for problems where fitness evaluation is expensive. Bayesian networks used in BOA to model promising solutions and generate the new ones are extended to allow not only for modeling and sampling candidate solutions, but also for estimating their fitness. The results indicate that fitness inheritance is a promising concept in BOA, because population-sizing requirements for building appropriate models of promising solutions lead to good fitness estimates even if only a small proportion of candidate solutions is evaluated using the actual fitness function. This can lead to a reduction of the number of actual fitness evaluations by a factor of 30 or more.
Using Previous Models to Bias Structural Learning in the Hierarchical BOA - Martin Pelikan
Estimation of distribution algorithms (EDAs) are stochastic optimization techniques that explore the space of potential solutions by building and sampling explicit probabilistic models of promising candidate solutions. While the primary goal of applying EDAs is to discover the global optimum or at least its accurate approximation, besides this, any EDA provides us with a sequence of probabilistic models, which in most cases hold a great deal of information about the problem. Although using problem-specific knowledge has been shown to significantly improve performance of EDAs and other evolutionary algorithms, this readily available source of problem-specific information has been practically ignored by the EDA community. This paper takes the first step towards the use of probabilistic models obtained by EDAs to speed up the solution of similar problems in future. More specifically, we propose two approaches to biasing model building in the hierarchical Bayesian optimization algorithm (hBOA) based on knowledge automatically learned from previous hBOA runs on similar problems. We show that the proposed methods lead to substantial speedups and argue that the methods should work well in other applications that require solving a large number of problems with similar structure.
Empirical Analysis of ideal recombination on random decomposable problems - kknsastry
This paper analyzes the behavior of a selectorecombinative genetic algorithm (GA) with an ideal crossover on a class of random additively decomposable problems (rADPs). Specifically, additively decomposable problems of order k whose subsolution fitnesses are sampled from the standard uniform distribution U[0,1] are analyzed. The scalability of the selectorecombinative GA is investigated for 10,000 rADP instances. The validity of facetwise models in bounding the population size, run duration, and the number of function evaluations required to successfully solve the problems is also verified. Finally, rADP instances that are easiest and most difficult are also investigated.
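A minimal sketch of how such a random additively decomposable problem instance can be generated, with each subproblem's 2^k subsolution fitnesses drawn from U[0,1]; the parameter values below are arbitrary and the code is illustrative, not the paper's generator.

```python
import numpy as np

def random_adp(num_subproblems=20, k=4, seed=0):
    """Random additively decomposable problem of order k: each subproblem's 2^k
    subsolution fitnesses are drawn from U[0, 1]; overall fitness is their sum."""
    rng = np.random.default_rng(seed)
    tables = rng.random((num_subproblems, 2 ** k))
    weights = 2 ** np.arange(k)[::-1]                       # binary substring -> table index

    def fitness(x):                                         # x: 0/1 array of length num_subproblems * k
        idx = x.reshape(num_subproblems, k) @ weights
        return tables[np.arange(num_subproblems), idx].sum()

    return fitness

f = random_adp()
x = np.random.default_rng(1).integers(0, 2, 20 * 4)
print(f(x))
```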
Analysis of Evolutionary Algorithms on the One-Dimensional Spin Glass with Po... - Martin Pelikan
The document analyzes evolutionary algorithms for solving instances of the one-dimensional spin glass problem with power-law interactions. It generates 610,000 problem instances with varying system size (n=20-150) and power-law decay parameter (σ=0-2). It compares the performance of genetic algorithms, hierarchical Bayesian optimization, and local search methods on these instances, measuring the number of evaluations required to find the optimal solution. The results show how performance depends on the power-law decay and effective interaction range.
iBOA: The Incremental Bayesian Optimization Algorithm - Martin Pelikan
This paper proposes the incremental Bayesian optimization algorithm (iBOA), which modifies standard BOA by removing the population of solutions and using incremental updates of the Bayesian network. iBOA is shown to be able to learn and exploit unrestricted Bayesian networks using incremental techniques for updating both the structure as well as the parameters of the probabilistic model. This represents an important step toward the design of competent incremental estimation of distribution algorithms that can solve difficult nearly decomposable problems scalably and reliably.
Effects of a Deterministic Hill climber on hBOA - Martin Pelikan
Hybridization of global and local search algorithms is a well-established technique for enhancing the efficiency of search algorithms. Hybridizing estimation of distribution algorithms (EDAs) has been repeatedly shown to produce better performance than either the global or local search algorithm alone. The hierarchical Bayesian optimization algorithm (hBOA) is an advanced EDA which has previously been shown to benefit from hybridization with a local searcher. This paper examines the effects of combining hBOA with a deterministic hill climber (DHC). Experiments reveal that allowing DHC to find the local optima makes model building and decision making much easier for hBOA. This reduces the minimum population size required to find the global optimum, which substantially improves overall performance.
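A minimal sketch of a single-bit-flip deterministic hill climber of the kind used as a local searcher; OneMax stands in for the real fitness function, and this is an illustration rather than the DHC implementation used in the paper.

```python
import numpy as np

def deterministic_hill_climber(x, fitness):
    """Apply the best improving single-bit flip until the solution is locally optimal."""
    x, best = x.copy(), fitness(x)
    while True:
        gains = []
        for i in range(len(x)):
            x[i] ^= 1                        # try flipping bit i
            gains.append(fitness(x) - best)
            x[i] ^= 1                        # undo the flip
        i = int(np.argmax(gains))
        if gains[i] <= 0:                    # no improving flip left: local optimum
            return x, best
        x[i] ^= 1
        best += gains[i]

onemax = lambda bits: int(bits.sum())        # stand-in fitness
x0 = np.random.default_rng(0).integers(0, 2, 30)
print(deterministic_hill_climber(x0, onemax)[1])   # 30: every bit climbed to 1
```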
Simplified Runtime Analysis of Estimation of Distribution Algorithms - Per Kristian Lehre
We demonstrate how to estimate the expected optimisation time of UMDA, an estimation of distribution algorithm, using the level-based theorem. The talk was given at the GECCO 2015 conference in Madrid, Spain.
Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA - Martin Pelikan
An automated technique has recently been proposed to transfer learning in the hierarchical Bayesian optimization algorithm (hBOA) based on distance-based statistics. The technique enables practitioners to improve hBOA efficiency by collecting statistics from probabilistic models obtained in previous hBOA runs and using the obtained statistics to bias future hBOA runs on similar problems. The purpose of this paper is threefold: (1) test the technique on several classes of NP-complete problems, including MAXSAT, spin glasses and minimum vertex cover; (2) demonstrate that the technique is effective even when previous runs were done on problems of different size; (3) provide empirical evidence that combining transfer learning with other efficiency enhancement techniques can often yield nearly multiplicative speedups.
Initial-Population Bias in the Univariate Estimation of Distribution Algorithm - Martin Pelikan
This document studies the effects of biasing the initial population in the Univariate Marginal Distribution Algorithm (UMDA) on the onemax and noisy onemax problems. Theoretical models are developed to predict the impact on population size, number of generations, and number of evaluations for different levels of initial bias. Experiments match the theoretical predictions, showing that a positively biased initial population improves performance while a negatively biased population harms performance. Introducing noise does not change these effects.
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses - Martin Pelikan
The hierarchical Bayesian optimization algorithm (hBOA) can solve nearly decomposable and hierarchical problems of bounded difficulty in a robust and scalable manner by building and sampling probabilistic models of promising solutions. This paper analyzes probabilistic models in hBOA on two common test problems: concatenated traps and 2D Ising spin glasses with periodic boundary conditions. We argue that although Bayesian networks with local structures can encode complex probability distributions, analyzing these models in hBOA is relatively straightforward and the results of such analyses may provide practitioners with useful information about their problems. The results show that the probabilistic models in hBOA closely correspond to the structure of the underlying optimization problem, the models do not change significantly in subsequent iterations of BOA, and creating adequate probabilistic models by hand is not straightforward even with complete knowledge of the optimization problem.
The Bayesian Optimization Algorithm with Substructural Local Search - Martin Pelikan
This work studies the utility of using substructural neighborhoods for local search in the Bayesian optimization algorithm (BOA). The probabilistic model of BOA, which automatically identifies important problem substructures, is used to define the structure of the neighborhoods used in local search. Additionally, a surrogate fitness model is considered to evaluate the improvement of the local search steps. The results show that performing substructural local search in BOA significantly reduces the number of generations necessary to converge to optimal solutions and thus provides substantial speedups.
Order Or Not: Does Parallelization of Model Building in hBOA Affect Its Scala... - Martin Pelikan
1. Martin Pelikan and James D. Laury Jr. study the effects of parallelizing the model building process in hBOA on its scalability.
2. They find that parallelizing model building provides nearly linear speedups but can slightly increase the number of function evaluations needed.
3. However, the negative effects of parallelizing model building are negligible compared to the performance gains, so parallelizing model building is beneficial when it is the bottleneck.
A Proposition on Memes and Meta-Memes in Computing for Higher ... - butest
This document proposes a framework for memetic computing with higher-order learning capabilities. It discusses how current computational intelligence techniques have limitations and how problem complexity is outpacing algorithm development. The framework is inspired by how the brain learns and generalizes solutions across different problems through a hierarchical architecture of processing units (memes and meta-memes) and memory. This brain-inspired approach is proposed as a new class of memetic computing that can autonomously learn patterns across multiple temporal and spatial scales to better solve complex problems.
The document summarizes key points from Lecture 3 of an introduction to machine learning course. It discusses desired characteristics of machine learning techniques, including the ability to generalize but not too much, being robust, learning high-quality models, being scalable and efficient, being explanatory, and being deterministic. It also provides an overview of machine learning paradigms like inductive learning, explanation-based learning, analogy-based learning, evolutionary learning, and connectionist learning. Finally, it outlines specific problems that will be studied in the course, such as data classification, statistical learning, association analysis, and clustering.
Finding local lessons in software engineering - CS, NcState
Tim Menzies, WVU, USA, Tsinghua University, China, Nov’09.
An observation: surprisingly few general SE results.
A requirement: need simple methods for finding local lessons.
Take home lesson: (1) finding useful local lessons is remarkably simple; (2) e.g. using “W” or “NOVA”
Xin Yao: "What can evolutionary computation do for you?"ieee_cis_cyprus
Evolutionary computation techniques like genetic programming and evolutionary algorithms can be used for adaptive optimization, data mining, and machine learning. They have been successfully applied to problems like modeling galaxy distributions, material modeling, constraint handling, dynamic optimization, multi-objective optimization, and ensemble learning. While evolutionary computation has had many real-world applications, challenges remain in improving theoretical foundations, scalability to large problems, dealing with dynamic and uncertain environments, and developing the ability to learn from previous optimization experiences.
John F. Elder presented the top 10 data mining mistakes at the 2005 Salford Systems Data Mining Conference. The mistakes included lacking sufficient data, focusing only on model training accuracy, relying on a single data mining technique, asking the wrong business questions, only considering the data and not domain expertise, allowing leaks from future data, discounting anomalous cases, extrapolating models too far, trying to answer every inquiry instead of acknowledging uncertainty, casual sampling methods, and believing that the single best model is correct. Elder emphasized the importance of experience, multiple techniques, asking the right questions, incorporating domain knowledge, careful sampling, and model bundling to avoid these mistakes.
This document discusses research into applying adaptive processes like evolutionary, individual, and social learning to embodied and situated agents. The researchers aimed to analyze how these agents could learn to categorize objects through simulated and real-world experiments. For individual learning, they implemented an algorithm based on simulated annealing that improved performance by replacing external stochasticity with internal stochasticity. For social learning, they modeled imitation between an expert agent and student, using a hybrid social-individual learning approach that helped students learn faster and more often acquire an adaptive behavior.
Modeling XCS in class imbalances: Population sizing and parameter settings - kknsastry
This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio—ratio between the number of instances of the majority class and the minority class that are sampled to XCS—on population initialization, and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with standard configuration scales exponentially.
The causes that are potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers’ parameters, mutation, and subsumption are analyzed, and improvements in XCS’s mechanisms are proposed to effectively and efficiently handle imbalanced problems. Once the recommendations are incorporated to XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.
Unexpected Challenges in Large Scale Machine Learning by Charles Parker - BigMine
Talk by Charles Parker (BigML) at BigMine12 at KDD12.
In machine learning, scale adds complexity. The most obvious consequence of scale is that data takes longer to process. At certain points, however, scale makes trivial operations costly, thus forcing us to re-evaluate algorithms in light of the complexity of those operations. Here, we will discuss one important way a general large-scale machine learning setting may differ from the standard supervised classification setting and show the results of some preliminary experiments highlighting this difference. The results suggest that there is potential for significant improvement beyond obvious solutions.
The document provides information about an exam, including admittance details, exam regulations, and seminar and assignment information. It then discusses using data mining to predict the most defect-prone source code entities by analyzing past bug and version control data, as well as source code metrics. The process involves defining the problem, preparing the data, exploring the data to understand relationships, building a prediction model using machine learning techniques, and validating the model on test data. The goal is to prioritize testing of the most defect-prone entities identified by the model.
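A hedged sketch of the modeling step described above: train a classifier on per-entity metrics and rank entities by predicted defect probability. The metrics and labels below are synthetic, and the random forest is just one reasonable model choice, not necessarily the one used in the lecture.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic per-file metrics (e.g., size, churn, past bug count) and a defect label.
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = (X[:, 2] + 0.2 * rng.standard_normal(500) > 0.7).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))

# Rank entities by predicted defect probability to prioritize testing effort.
ranking = model.predict_proba(X_te)[:, 1].argsort()[::-1]
print("most defect-prone test entities first:", ranking[:5])
```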
Assessing Problem-Solving Strategy Use By Engineering Undergraduates - Heather Strinden
This document summarizes a research study that assessed problem-solving strategies used by engineering undergraduates. The study examined which strategies students reported using, whether certain strategies clustered together, and if frequent strategy use correlated with better course performance. It also looked at how confidence, familiarity with course material, and interest affected performance. The researchers identified three types of strategies - Execution, Planning and Looking Back, and Low Confidence in Ability. Frequent use of strategies and higher confidence predicted better grades, while low confidence hindered performance. Strategy use differed between physics and thermodynamics courses.
Abstract:
Though in essence an engineering discipline, software engineering research has always been struggling to demonstrate impact. This is reflected in part by the funding challenges that the discipline faces in many countries, the difficulties we have to attract industrial participants to our conferences, and the scarcity of papers reporting industrial case studies.
There are clear historical reasons for this but we nevertheless need, as a community, to question our research paradigms and peer evaluation processes in order to improve the situation. From a personal standpoint, relevance and impact are concerns that I have been struggling with for a long time, which eventually led me to leave a comfortable academic position and a research chair to work in industry-driven research.
I will use some concrete research project examples to argue why we need more inductive research, that is, research working from specific observations in real settings to broader generalizations and theories. Among other things, the examples will show how a more thorough understanding of practice and closer interactions with practitioners can profoundly influence the definition of research problems, and the development and evaluation of solutions to these problems. Furthermore, these examples will illustrate why, to a large extent, useful research is necessarily multidisciplinary. I will also address issues regarding the implementation of such a research paradigm and show how our own bias as a research community worsens the situation and undermines our very own interests.
On a more humorous note, the title hints at the fact that being a scientist in software engineering and aiming to have an impact on practice often entails leading two parallel careers and impersonating different roles to different peers and partners.
Bio:
Lionel Briand is heading the Certus center on software verification and validation at Simula Research Laboratory, where he is leading research projects with industrial partners. He is also a professor at the University of Oslo (Norway). Before that, he was on the faculty of the department of Systems and Computer Engineering, Carleton University, Ottawa, Canada, where he was full professor and held the Canada Research Chair (Tier I) in Software Quality Engineering. He is the coeditor-in-chief of Empirical Software Engineering (Springer) and is a member of the editorial boards of Systems and Software Modeling (Springer) and Software Testing, Verification, and Reliability (Wiley). He was on the board of IEEE Transactions on Software Engineering from 2000 to 2004. Lionel was elevated to the grade of IEEE Fellow for his work on the testing of object-oriented systems. His research interests include: model-driven development, testing and verification, search-based software engineering, and empirical software engineering.
This document summarizes key points from a presentation on software estimating and its impact on successful missions. It discusses how software is a critical and increasingly complex component of systems that is prone to failures and costly overruns if not estimated properly. Estimation processes and metrics can help program managers reduce risk. The document provides examples of software failures and their high costs, emphasizes that software estimates should be treated as high risk and reviewed at each program review, and outlines best practices for software estimation, such as using a 10-step process and accounting for size, complexity, productivity, and risk.
An Artificial Intelligence-Based Distance Education System Artimat - Nicole Heredia
The document describes an artificial intelligence-based distance education system called ARTIMAT. It was developed to improve students' mathematical problem solving skills. The system was tested with 4 teachers and 59 students. Survey results found that the system was generally successful in helping students learn problem solving concepts and processes, but some interface changes were needed for students to adapt quickly. The system uses artificial intelligence techniques like forward and backward chaining to analyze problems and break them into sub-problems to guide students through the solution process step-by-step.
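For readers unfamiliar with the inference techniques mentioned, here is a minimal forward-chaining sketch; the rules are invented examples, not ARTIMAT's actual knowledge base.

```python
def forward_chain(facts, rules):
    """Fire any rule whose premises are all known facts until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [({"right_triangle", "legs_known"}, "apply_pythagorean_theorem"),
         ({"apply_pythagorean_theorem"}, "hypotenuse_known")]
print(forward_chain({"right_triangle", "legs_known"}, rules))
```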
Soft Computing is the fusion of methodologies that were designed to model and enable solutions to real-world problems that cannot be modeled, or are too difficult to model, mathematically. Soft computing is a consortium of methodologies that work synergistically and provide, in one form or another, flexible information processing capability for handling real-life ambiguous situations. Its aim is to exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth in order to achieve tractability, robustness and low-cost solutions. The guiding principle is to devise methods of computation that lead to an acceptable solution at low cost by seeking an approximate solution to an imprecisely or precisely formulated problem. Soft Computing (SC) represents a significant paradigm shift in the aims of computing, which reflects the fact that the human mind, unlike present-day computers, possesses a remarkable ability to store and process information which is pervasively imprecise, uncertain and lacking in categoricity. At this juncture, the principal constituents of Soft Computing (SC) are: Fuzzy Systems (FS), including Fuzzy Logic (FL); Evolutionary Computation (EC), including Genetic Algorithms (GA); Neural Networks (NN), including Neural Computing (NC); Machine Learning (ML); and Probabilistic Reasoning (PR). In this paper we focus on fuzzy methodologies and fuzzy systems, as they bring basic ideas to other SC methodologies.
The document summarizes a Bayesian webinar on general Bayesian methods for reliability data analysis. It provides an outline of the webinar covering traditional vs Bayesian reliability frameworks, examples of applying Bayesian methods to Weibull distribution, accelerated life test data and repeated measure degradation data. OpenBUGS code is presented for the examples. The webinar aims to illustrate how Bayesian methods allow incorporating prior knowledge and provide advantages over traditional methods in certain applications.
A Pragmatic Perspective on Software VisualizationArie van Deursen
Slides of the keynote presentation at the 5th International IEEE/ACM Symposium on Software Visualization, SoftVis 2010. Salt Lake City, USA, October 2010.
The document discusses what industry wants from software engineering research based on a presentation given by Lionel Briand. It notes that industry perceives a disconnect from research and issues with scalability and applicability of methods. It advocates for research to use more realistic conditions at scale and consider human factors and cost-benefit analyses. The document also recommends focusing research on important long-standing problems like requirements changes, large-scale testing, and product lines rather than what gets published. It provides examples of successful research partnerships and advocates for different research paradigms that engage more with industry problems.
Here are some key research questions around building adaptive agents:
- How to handle anonymization of private data when making data public
- How to handle large volumes of data, especially when working from raw project artifacts
- How to recognize different modes or situations within the data over time
- How to determine when new situations are truly new vs a repeat of old situations
- How to establish trust when data is crowd-sourced and the agents did not directly collect the data
- How to provide explanations for recommendations from complex models trained on large datasets
Population Dynamics in Conway’s Game of Life and its VariantsMartin Pelikan
The presentation for the project of high school students Yonatan Biel and David Hua made in the Students and Teachers As Research Scientists (STARS) program at the Missouri Estimation of Distribution Algorithms Laboratory (MEDAL). To see animations, please download the powerpoint presentation.
Image segmentation using a genetic algorithm and hierarchical local searchMartin Pelikan
This document proposes using a genetic algorithm and hierarchical local search to perform image segmentation. It maps image segmentation to an optimization problem using the Potts spin glass model. A steady-state genetic algorithm is used to find segmentations, applying crossover between parents and hierarchical local search to improve solutions. Experiments on house and dog images demonstrate the algorithm can efficiently segment images, with hierarchical local search necessary to avoid getting stuck in poor local optima.
Distance-based bias in model-directed optimization of additively decomposable...Martin Pelikan
For many optimization problems it is possible to define a distance metric between problem variables that correlates with the likelihood and strength of interactions between the variables. For example, one may define a metric so that the dependencies between variables that are closer to each other with respect to the metric are expected to be stronger than the dependencies between variables that are further apart. The purpose of this paper is to describe a method that combines such a problem-specific distance metric with information mined from probabilistic models obtained in previous runs of estimation of distribution algorithms with the goal of solving future problem instances of similar type with increased speed, accuracy and reliability. While the focus of the paper is on additively decomposable problems and the hierarchical Bayesian optimization algorithm, it should be straightforward to generalize the approach to other model-directed optimization techniques and other problem classes. Compared to other techniques for learning from experience put forward in the past, the proposed technique is both more practical and more broadly applicable.
Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Al...Martin Pelikan
1. The document proposes and analyzes two distance metrics for the linkage tree genetic algorithm (LTGA): a pairwise metric and a problem-specific metric.
2. Experiments on optimization problems show the pairwise metric significantly improves LTGA scalability. The problem-specific metric, informed by problem structure, yields further speedups on some problems but mixed results on others.
3. Future work aims to design more robust problem-specific metrics and methods to learn metrics from problem instances, improving LTGA performance on complex problems.
Performance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor...Martin Pelikan
This document summarizes a study that tests evolutionary algorithms on a class of NK landscapes with tunable overlap between subproblems. The landscapes have nearest neighbor interactions and a step parameter that controls the overlap between subproblem contributions to the overall fitness function. Large numbers of problem instances were generated with varying numbers of bits, neighbors per bit, and step values. Evolutionary algorithms like hBOA and genetic algorithms with different crossover operators were compared in terms of number of steps and evaluations required to find the optimal solution. Performance generally improved as overlap decreased, and hBOA outperformed the genetic algorithms on most instances.
Finding Ground States of Sherrington-Kirkpatrick Spin Glasses with Hierarchic...Martin Pelikan
This study focuses on the problem of finding ground states of random instances of the Sherrington-Kirkpatrick (SK) spin-glass model with Gaussian couplings. While the ground states of SK spin-glass instances can be obtained with branch and bound, the computational complexity of branch and bound yields instances of not more than about 90 spins. We describe several approaches based on the hierarchical Bayesian optimization algorithm (hBOA) to reliably identifying ground states of SK instances intractable with branch and bound, and present a broad range of empirical results on such problem instances. We argue that the proposed methodology holds a big promise for reliably solving large SK spin-glass instances to optimality with practical time complexity. The proposed approaches to identifying global optima reliably can also be applied to other problems and they can be used with many other evolutionary algorithms. Performance of hBOA is compared to that of the genetic algorithm with two common crossover operators.
Computational complexity and simulation of rare events of Ising spin glasses Martin Pelikan
We discuss the computational complexity of random 2D Ising spin glasses, which represent an interesting class of constraint satisfaction problems for black box optimization. Two extremal cases are considered: (1) the +/- J spin glass, and (2) the Gaussian spin glass. We also study a smooth transition between these two extremal cases. The computational complexity of all studied spin glass systems is found to be dominated by rare events of extremely hard spin glass samples. We show that complexity of all studied spin glass systems is closely related to Frechet extremal value distribution. In a hybrid algorithm that combines the hierarchical Bayesian optimization algorithm (hBOA) with a deterministic bit-flip hill climber, the number of steps performed by both the global searcher (hBOA) and the local searcher follow Frechet distributions. Nonetheless, unlike in methods based purely on local search, the parameters of these distributions confirm good scalability of hBOA with local search. We further argue that standard performance measures for optimization algorithms---such as the average number of evaluations until convergence---can be misleading. Finally, our results indicate that for highly multimodal constraint satisfaction problems, such as Ising spin glasses, recombination-based search can provide qualitatively better results than mutation-based search.
Hybrid Evolutionary Algorithms on Minimum Vertex Cover for Random GraphsMartin Pelikan
This work analyzes the hierarchical Bayesian optimization algorithm (hBOA) on minimum vertex cover for standard classes of random graphs and transformed SAT instances. The performance of hBOA is compared with that of the branch-and-bound problem solver (BB), the simple genetic algorithm (GA) and the parallel simulated annealing (PSA). The results indicate that BB is significantly outperformed by all the other tested methods, which is expected as BB is a complete search algorithm and minimum vertex cover is an NP-complete problem. The best performance is achieved by hBOA; nonetheless, the performance differences between hBOA and other evolutionary algorithms are relatively small, indicating that mutation-based search and recombination-based search lead to similar performance on the tested classes of minimum vertex cover problems.
UiPath Community Zurich: Release Management and Build PipelinesUiPathCommunity
Ensuring robust, reliable, and repeatable delivery processes is more critical than ever - it's a success factor for your automations and for automation programmes as a whole. In this session, we’ll dive into modern best practices for release management and explore how tools like the UiPathCLI can streamline your CI/CD pipelines. Whether you’re just starting with automation or scaling enterprise-grade deployments, our event promises to deliver helpful insights to you. This topic is relevant for both on-premise and cloud users - as well as for automation developers and software testers alike.
📕 Agenda:
- Best Practices for Release Management
- What it is and why it matters
- UiPath Build Pipelines Deep Dive
- Exploring CI/CD workflows, the UiPathCLI and showcasing scenarios for both on-premise and cloud
- Discussion, Q&A
👨🏫 Speakers
Roman Tobler, CEO@ Routinuum
Johans Brink, CTO@ MvR Digital Workforce
We look forward to bringing best practices and showcasing build pipelines to you - and to having interesting discussions on this important topic!
If you have any questions or inputs prior to the event, don't hesitate to reach out to us.
This event streamed live on May 27, 16:00 CET.
Check out all our upcoming UiPath Community sessions at:
👉 https://community.uipath.com/events/
Join UiPath Community Zurich chapter:
👉 https://community.uipath.com/zurich/
Exploring the advantages of on-premises Dell PowerEdge servers with AMD EPYC processors vs. the cloud for small to medium businesses’ AI workloads
AI initiatives can bring tremendous value to your business, but you need to support your new AI workloads effectively. That means choosing the best possible infrastructure for your needs—and many companies are finding that the cloud isn’t right for them. According to a recent Rackspace survey of IT executives, 69 percent of companies have moved some of their applications on-premises from the cloud, with half of those citing security and compliance as the reason and 44 percent citing cost.
On-premises solutions provide a number of advantages. With full control over your security infrastructure, you can be certain that all compliance requirements remain firmly in the hands of your IT team. Opting for on-premises also gives you the ability to design your infrastructure to the precise needs of that team and your new AI workloads. Depending on the workload, you may also see performance benefits, along with more predictable costs. As you start to build your next AI initiative, consider an on-premises solution utilizing AMD EPYC processor-powered Dell PowerEdge servers.
Adtran’s SDG 9000 Series brings high-performance, cloud-managed Wi-Fi 7 to homes, businesses and public spaces. Built on a unified SmartOS platform, the portfolio includes outdoor access points, ceiling-mount APs and a 10G PoE router. Intellifi and Mosaic One simplify deployment, deliver AI-driven insights and unlock powerful new revenue streams for service providers.
UiPath Community Berlin: Studio Tips & Tricks and UiPath InsightsUiPathCommunity
Join the UiPath Community Berlin (Virtual) meetup on May 27 to discover handy Studio Tips & Tricks and get introduced to UiPath Insights. Learn how to boost your development workflow, improve efficiency, and gain visibility into your automation performance.
📕 Agenda:
- Welcome & Introductions
- UiPath Studio Tips & Tricks for Efficient Development
- Best Practices for Workflow Design
- Introduction to UiPath Insights
- Creating Dashboards & Tracking KPIs (Demo)
- Q&A and Open Discussion
Perfect for developers, analysts, and automation enthusiasts!
This session streamed live on May 27, 18:00 CET.
Check out all our upcoming UiPath Community sessions at:
👉 https://community.uipath.com/events/
Join our UiPath Community Berlin chapter:
👉 https://community.uipath.com/berlin/
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025Lorenzo Miniero
Slides for my "Multistream support in the Janus SIP and NoSIP plugins" presentation at the OpenSIPS Summit 2025 event.
They describe my efforts refactoring the Janus SIP and NoSIP plugins to allow for the gatewaying of an arbitrary number of audio/video streams per call (thus breaking the current 1-audio/1-video limitation), plus some additional considerations on what this could mean when dealing with application protocols negotiated via SIP as well.
Co-Constructing Explanations for AI Systems using ProvenancePaul Groth
Explanation is not a one off - it's a process where people and systems work together to gain understanding. This idea of co-constructing explanations or explanation by exploration is powerful way to frame the problem of explanation. In this talk, I discuss our first experiments with this approach for explaining complex AI systems by using provenance. Importantly, I discuss the difficulty of evaluation and discuss some of our first approaches to evaluating these systems at scale. Finally, I touch on the importance of explanation to the comprehensive evaluation of AI systems.
Improving Developer Productivity With DORA, SPACE, and DevExJustin Reock
Ready to measure and improve developer productivity in your organization?
Join Justin Reock, Deputy CTO at DX, for an interactive session where you'll learn actionable strategies to measure and increase engineering performance.
Leave this session equipped with a comprehensive understanding of developer productivity and a roadmap to create a high-performing engineering team in your company.
Jira Administration Training – Day 1 : IntroductionRavi Teja
This presentation covers the basics of Jira for beginners. Learn how Jira works, its key features, project types, issue types, and user roles. Perfect for anyone new to Jira or preparing for Jira Admin roles.
AI Emotional Actors: “When Machines Learn to Feel and Perform"AkashKumar809858
Welcome to the era of AI Emotional Actors.
The entertainment landscape is undergoing a seismic transformation. What started as motion capture and CGI enhancements has evolved into a full-blown revolution: synthetic beings not only perform but express, emote, and adapt in real time.
For reading further follow this link -
https://akash97.gumroad.com/l/meioex
Introduction and Background:
Study Overview and Methodology: The study analyzes the IT market in Israel, covering over 160 markets and 760 companies/products/services. It includes vendor rankings, IT budgets, and trends from 2025-2029. Vendors participate in detailed briefings and surveys.
Vendor Listings: The presentation lists numerous vendors across various pages, detailing their names and services. These vendors are ranked based on their participation and market presence.
Market Insights and Trends: Key insights include IT market forecasts, economic factors affecting IT budgets, and the impact of AI on enterprise IT. The study highlights the importance of AI integration and the concept of creative destruction.
Agentic AI and Future Predictions: Agentic AI is expected to transform human-agent collaboration, with AI systems understanding context and orchestrating complex processes. Future predictions include AI's role in shopping and enterprise IT.
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)Peter Bittner
How do you onboard new colleagues in 2025? How long does it take? Would you love a standardized setup under version control that everyone can customize for themselves? A stable desktop setup, reinstalled in just minutes. It can be done.
This talk was given in Italian, 29 May 2025, at PyCon 25, Bologna, Italy. All slides are provided in English.
Original slides at https://slides.com/bittner/pycon25-nixos-for-python-developers
Jeremy Millul - A Talented Software DeveloperJeremy Millul
Jeremy Millul is a talented software developer based in NYC, known for leading impactful projects such as a Community Engagement Platform and a Hiking Trail Finder. Using React, MongoDB, and geolocation tools, Jeremy delivers intuitive applications that foster engagement and usability. A graduate of NYU’s Computer Science program, he brings creativity and technical expertise to every project, ensuring seamless user experiences and meaningful results in software development.
Measuring Microsoft 365 Copilot and Gen AI SuccessNikki Chapple
Session | Measuring Microsoft 365 Copilot and Gen AI Success with Viva Insights and Purview
Presenter | Nikki Chapple 2 x MVP and Principal Cloud Architect at CloudWay
Event | European Collaboration Conference 2025
Format | In person Germany
Date | 28 May 2025
📊 Measuring Copilot and Gen AI Success with Viva Insights and Purview
Presented by Nikki Chapple – Microsoft 365 MVP & Principal Cloud Architect, CloudWay
How do you measure the success—and manage the risks—of Microsoft 365 Copilot and Generative AI (Gen AI)? In this ECS 2025 session, Microsoft MVP and Principal Cloud Architect Nikki Chapple explores how to go beyond basic usage metrics to gain full-spectrum visibility into AI adoption, business impact, user sentiment, and data security.
🎯 Key Topics Covered:
Microsoft 365 Copilot usage and adoption metrics
Viva Insights Copilot Analytics and Dashboard
Microsoft Purview Data Security Posture Management (DSPM) for AI
Measuring AI readiness, impact, and sentiment
Identifying and mitigating risks from third-party Gen AI tools
Shadow IT, oversharing, and compliance risks
Microsoft 365 Admin Center reports and Copilot Readiness
Power BI-based Copilot Business Impact Report (Preview)
📊 Why AI Measurement Matters: Without meaningful measurement, organizations risk operating in the dark—unable to prove ROI, identify friction points, or detect compliance violations. Nikki presents a unified framework combining quantitative metrics, qualitative insights, and risk monitoring to help organizations:
Prove ROI on AI investments
Drive responsible adoption
Protect sensitive data
Ensure compliance and governance
🔍 Tools and Reports Highlighted:
Microsoft 365 Admin Center: Copilot Overview, Usage, Readiness, Agents, Chat, and Adoption Score
Viva Insights Copilot Dashboard: Readiness, Adoption, Impact, Sentiment
Copilot Business Impact Report: Power BI integration for business outcome mapping
Microsoft Purview DSPM for AI: Discover and govern Copilot and third-party Gen AI usage
🔐 Security and Compliance Insights: Learn how to detect unsanctioned Gen AI tools like ChatGPT, Gemini, and Claude, track oversharing, and apply eDLP and Insider Risk Management (IRM) policies. Understand how to use Microsoft Purview—even without E5 Compliance—to monitor Copilot usage and protect sensitive data.
📈 Who Should Watch: This session is ideal for IT leaders, security professionals, compliance officers, and Microsoft 365 admins looking to:
Maximize the value of Microsoft Copilot
Build a secure, measurable AI strategy
Align AI usage with business goals and compliance requirements
🔗 Read the blog https://nikkichapple.com/measuring-copilot-gen-ai/
Annual (33 years) study of the Israeli Enterprise / public IT market. Covering sections on Israeli Economy, IT trends 2026-28, several surveys (AI, CDOs, OCIO, CTO, staffing, cyber, operations and infra) plus rankings of 760 vendors on 160 markets (market sizes and trends) and comparison of products according to support and market penetration.
Protecting Your Sensitive Data with Microsoft Purview - IRMS 2025Nikki Chapple
Session | Protecting Your Sensitive Data with Microsoft Purview: Practical Information Protection and DLP Strategies
Presenter | Nikki Chapple (MVP| Principal Cloud Architect CloudWay) & Ryan John Murphy (Microsoft)
Event | IRMS Conference 2025
Format | Birmingham UK
Date | 18-20 May 2025
In this closing keynote session from the IRMS Conference 2025, Nikki Chapple and Ryan John Murphy deliver a compelling and practical guide to data protection, compliance, and information governance using Microsoft Purview. As organizations generate over 2 billion pieces of content daily in Microsoft 365, the need for robust data classification, sensitivity labeling, and Data Loss Prevention (DLP) has never been more urgent.
This session addresses the growing challenge of managing unstructured data, with 73% of sensitive content remaining undiscovered and unclassified. Using a mountaineering metaphor, the speakers introduce the “Secure by Default” blueprint—a four-phase maturity model designed to help organizations scale their data security journey with confidence, clarity, and control.
🔐 Key Topics and Microsoft 365 Security Features Covered:
Microsoft Purview Information Protection and DLP
Sensitivity labels, auto-labeling, and adaptive protection
Data discovery, classification, and content labeling
DLP for both labeled and unlabeled content
SharePoint Advanced Management for workspace governance
Microsoft 365 compliance center best practices
Real-world case study: reducing 42 sensitivity labels to 4 parent labels
Empowering users through training, change management, and adoption strategies
🧭 The Secure by Default Path – Microsoft Purview Maturity Model:
Foundational – Apply default sensitivity labels at content creation; train users to manage exceptions; implement DLP for labeled content.
Managed – Focus on crown jewel data; use client-side auto-labeling; apply DLP to unlabeled content; enable adaptive protection.
Optimized – Auto-label historical content; simulate and test policies; use advanced classifiers to identify sensitive data at scale.
Strategic – Conduct operational reviews; identify new labeling scenarios; implement workspace governance using SharePoint Advanced Management.
🎒 Top Takeaways for Information Management Professionals:
Start secure. Stay protected. Expand with purpose.
Simplify your sensitivity label taxonomy for better adoption.
Train your users—they are your first line of defense.
Don’t wait for perfection—start small and iterate fast.
Align your data protection strategy with business goals and regulatory requirements.
💡 Who Should Watch This Presentation?
This session is ideal for compliance officers, IT administrators, records managers, data protection officers (DPOs), security architects, and Microsoft 365 governance leads. Whether you're in the public sector, financial services, healthcare, or education.
🔗 Read the blog: https://nikkichapple.com/irms-conference-2025/
Create Your First AI Agent with UiPath Agent BuilderDianaGray10
Join us for an exciting virtual event where you'll learn how to create your first AI Agent using UiPath Agent Builder. This session will cover everything you need to know about what an agent is and how easy it is to create one using the powerful AI-driven UiPath platform. You'll also discover the steps to successfully publish your AI agent. This is a wonderful opportunity for beginners and enthusiasts to gain hands-on insights and kickstart their journey in AI-powered automation.
Microsoft Build 2025 takeaways in one presentationDigitalmara
Microsoft Build 2025 introduced significant updates. Everything revolves around AI. DigitalMara analyzed these announcements:
• AI enhancements for Windows 11
By embedding AI capabilities directly into the OS, Microsoft is lowering the barrier for users to benefit from intelligent automation without requiring third-party tools. It's a practical step toward improving user experience, such as streamlining workflows and enhancing productivity. However, attention should be paid to data privacy, user control, and transparency of AI behavior. The implementation policy should be clear and ethical.
• GitHub Copilot coding agent
The introduction of coding agents is a meaningful step in everyday AI assistance. However, it still brings challenges. Some people compare agents with junior developers. They noted that while the agent can handle certain tasks, it often requires supervision and can introduce new issues. This innovation holds both potential and limitations. Balancing automation with human oversight is crucial to ensure quality and reliability.
• Introduction of Natural Language Web
NLWeb is a significant step toward a more natural and intuitive web experience. It can help users access content more easily and reduce reliance on traditional navigation. The open-source foundation provides developers with the flexibility to implement AI-driven interactions without rebuilding their existing platforms. NLWeb is a promising level of web interaction that complements, rather than replaces, well-designed UI.
• Introduction of Model Context Protocol
MCP provides a standardized method for connecting AI models with diverse tools and data sources. This approach simplifies the development of AI-driven applications, enhancing efficiency and scalability. Its open-source nature encourages broader adoption and collaboration within the developer community. Nevertheless, MCP can face challenges in compatibility across vendors and security in context sharing. Clear guidelines are crucial.
• Windows Subsystem for Linux is open-sourced
It's a positive step toward greater transparency and collaboration in the developer ecosystem. The community can now contribute to its evolution, helping identify issues and expand functionality faster. However, open-source software in a core system also introduces concerns around security, code quality management, and long-term maintenance. Microsoft’s continued involvement will be key to ensuring WSL remains stable and secure.
• Azure AI Foundry platform hosts Grok 3 AI models
Adding new models is a valuable expansion of AI development resources available at Azure. This provides developers with more flexibility in choosing language models that suit a range of application sizes and needs. Hosting on Azure makes access and integration easier when using Microsoft infrastructure.
Neural representations have shown the potential to accelerate ray casting in a conventional ray-tracing-based rendering pipeline. We introduce a novel approach called Locally-Subdivided Neural Intersection Function (LSNIF) that replaces bottom-level BVHs used as traditional geometric representations with a neural network. Our method introduces a sparse hash grid encoding scheme incorporating geometry voxelization, a scene-agnostic training data collection, and a tailored loss function. It enables the network to output not only visibility but also hit-point information and material indices. LSNIF can be trained offline for a single object, allowing us to use LSNIF as a replacement for its corresponding BVH. With these designs, the network can handle hit-point queries from any arbitrary viewpoint, supporting all types of rays in the rendering pipeline. We demonstrate that LSNIF can render a variety of scenes, including real-world scenes designed for other path tracers, while achieving a memory footprint reduction of up to 106.2x compared to a compressed BVH.
https://arxiv.org/abs/2504.21627
Using Problem-Specific Knowledge and Learning from Experience in Estimation of Distribution Algorithms
1. Using Problem-Specific Knowledge and Learning
from Experience in Estimation of Distribution
Algorithms
Martin Pelikan and Mark W. Hauschild
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
[email protected], [email protected]
http://medal.cs.umsl.edu/
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
2. Motivation
Two key questions
Can we use past EDA runs to solve future problems faster?
EDAs do more than solve a problem.
EDAs provide us with a lot of information about the landscape.
Why throw out this information?
Can we use problem-specific knowledge to speed up EDAs?
EDAs are able to adapt exploration operators to the problem.
We do not have to know much about the problem to solve it.
But why throw out prior problem-specific information if available?
This presentation
Reviews some of the approaches that attempt to do this.
Focus is on two areas:
Using prior problem-specific knowledge.
Learning from experience (past EDA runs).
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
3. Outline
1. EDA bottlenecks.
2. Prior problem-specific knowledge.
3. Learning from experience.
4. Summary and conclusions.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
4. Estimation of Distribution Algorithms
Estimation of distribution algorithms (EDAs)
Work with a population of candidate solutions.
Learn probabilistic model of promising solutions.
Sample the model to generate new solutions.
Probabilistic Model-Building GAs
[Diagram: current population → selected population → probabilistic model → new population]
…replace crossover+mutation with learning in EDAs
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
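To make this loop concrete, here is a minimal sketch in Python, assuming a binary representation and the simplest possible model (independent bit probabilities, UMDA-style); the function name, parameters, and the onemax objective are illustrative assumptions, not part of the original slides.

```python
# Minimal EDA loop sketch (UMDA-style): select, learn a model, sample.
import numpy as np

def umda(fitness, n_vars, pop_size=100, n_select=50, n_generations=50, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    # Start from a uniformly random population of binary strings.
    population = rng.integers(0, 2, size=(pop_size, n_vars))
    for _ in range(n_generations):
        # Select promising solutions (truncation selection).
        scores = np.array([fitness(x) for x in population])
        selected = population[np.argsort(scores)[-n_select:]]
        # Learn a probabilistic model of the selected solutions:
        # here just the marginal probability of a 1 in each position.
        p = selected.mean(axis=0).clip(1.0 / n_vars, 1.0 - 1.0 / n_vars)
        # Sample the model to generate the new population.
        population = (rng.random((pop_size, n_vars)) < p).astype(int)
    scores = np.array([fitness(x) for x in population])
    return population[np.argmax(scores)]

# Illustrative objective: onemax (maximize the number of ones).
best = umda(fitness=lambda x: int(x.sum()), n_vars=30)
```

Structured EDAs such as hBOA follow the same loop but learn and sample a Bayesian network instead of independent marginals.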
5. Efficiency Enhancement of EDAs
Main EDA bottlenecks
Evaluation.
Model building.
Model sampling.
Memory complexity (models, candidate solutions).
Efficiency enhancement techniques
Address one or more bottlenecks.
Can adopt much from standard evolutionary algorithms.
But EDAs provide opportunities to do more than that!
Many approaches, we focus on a few.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
6. What Comes Next?
1. Using problem-specific knowledge.
2. Learning from experience.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
7. Problem-Specific Knowledge in EDAs
Basic idea
We don’t have to know much about the problem to use EDAs.
But what if we do know something about it?
Can we use prior problem-specific knowledge in EDAs?
Bias populations
Inject high quality solutions into population.
Modify solutions using a problem-specific procedure.
Bias model building
How to bias
Bias model structure (e.g. Bayesian network structure).
Bias model parameters (e.g. conditional probabilities).
Types of bias
Hard bias: Restrict admissible models/parameters.
Soft bias: Some models/parameters given preference over others.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
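A minimal sketch of how these two kinds of bias could enter structure learning; base_score stands in for whatever scoring metric the model builder already uses (e.g. a Bayesian scoring metric) and is an assumption of this sketch rather than something defined on the slide. For graph bipartitioning (next slides), the allowed or preferred set would simply be the edge set E of the input graph.

```python
# Sketch of hard vs. soft structural bias when scoring a candidate dependency.
import math

def biased_edge_score(base_score, edge, allowed_edges=None, preferred_edges=None,
                      preference_strength=2.0):
    """Score the candidate dependency `edge` = (i, j) with optional bias."""
    # Hard bias: strictly disallow dependencies outside the admissible set.
    if allowed_edges is not None and edge not in allowed_edges:
        return float("-inf")
    score = base_score(edge)
    # Soft bias: add a log-prior bonus for preferred dependencies, so they are
    # favored by default but can still be overridden by strong data evidence.
    if preferred_edges is not None and edge in preferred_edges:
        score += math.log(preference_strength)
    return score
```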
8. Example: Biasing Model Structure in Graph Bipartitioning
Graph bipartitioning
Input
Graph G = (V, E).
V are nodes.
E are edges.
Task
Split V into equally sized subsets so that the number of edges
between these subsets is minimized.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
9. Example: Biasing Model Structure in Graph Bipartitioning
Biasing models in graph bipartitioning
Soft bias (Schwarz & Ocenasek, 2000)
Increase prior probability of models with dependencies included in E.
Decrease prior probability of models with dependencies not included in E.
Hard bias (Mühlenbein and Mahnig, 2002)
Strictly disallow model dependencies that disagree with edges in E.
In both cases performance of EDAs was substantially improved.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
10. Important Challenges
Challenges in the use of prior knowledge in EDAs
Parameter bias using prior probabilities not explored much.
Structural bias introduced only rarely.
Model bias often studied only on surface.
Theory missing.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
11. Learning from Experience
Basic idea
Consider solving many instances of the same problem class.
Can we learn from past EDA runs to solve future instances of
this problem type faster?
Similar to the use of prior knowledge, but in this case we
automate the discovery of problem properties (instead of
relying on expert knowledge).
What features to learn?
Model structure.
Promising candidate solutions or partial solutions.
Algorithm parameters.
How to use the learned features?
Modify/restrict algorithm parameters.
Bias populations.
Bias models.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
12. Example: Probability Coincidence Matrix
Probability coincidence matrix (PCM)
Hauschild, Pelikan, Sastry, Goldberg (2008).
Each model may contain a dependency between X_i and X_j.
PCM stores observed probabilities of dependencies.
PCM = {p_ij} where i, j ∈ {1, 2, . . . , n}.
p_ij = proportion of models with a dependency between X_i and X_j.
Example PCM [matrix figure omitted]
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
13. Example: Probability Coincidence Matrix
Using PCM for hard bias
Hauschild et al. (2008).
Set threshold for the minimum proportion of a dependency.
Only accept dependencies occurring at least that often.
Strictly disallow other dependencies.
Using PCM for soft bias
Hauschild and Pelikan (2009).
Introduce prior probability of a model structure.
Dependencies that were more likely in the past are given
preference.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
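A minimal sketch of how a PCM could be assembled from models stored in past runs and turned into the hard bias just described; the model representation (a set of dependency pairs per model) and the threshold value are illustrative assumptions.

```python
# Build a probability coincidence matrix from past models and threshold it.
import numpy as np

def build_pcm(models, n_vars):
    """p[i, j] = proportion of past models containing a dependency (i, j)."""
    pcm = np.zeros((n_vars, n_vars))
    for deps in models:                      # one set of dependency pairs per model
        for i, j in deps:
            pcm[i, j] += 1.0
            pcm[j, i] += 1.0                 # treat dependencies as undirected
    return pcm / max(len(models), 1)

def admissible_edges(pcm, p_min):
    """Hard bias: keep only dependencies seen in at least p_min of past models."""
    n = pcm.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n) if pcm[i, j] >= p_min}

# Example: two past runs, each contributing one model's dependency set.
past_models = [{(0, 1), (1, 2)}, {(0, 1)}]
pcm = build_pcm(past_models, n_vars=4)
allowed = admissible_edges(pcm, p_min=0.9)   # keeps only (0, 1) here
```

The soft-bias variant would instead feed the PCM entries into a prior over model structures, along the lines of the log-prior sketch shown earlier.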
14. Results: PCM for 32 × 32 2D Spin Glass
[Figure: execution-time speedup vs. minimum edge percentage allowed, shown for 24 × 24 and 32 × 32 instances.]
(Hauschild, Pelikan, Sastry, Goldberg; 2008)
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
15. Results: PCM for 32 × 32 2D Spin Glass
Size Execution-time speedup pmin % Total Dep.
256 (16 × 16) 3.89 0.020 6.4%
324 (18 × 18) 4.37 0.011 8.7%
400 (20 × 20) 4.34 0.020 7.0%
484 (22 × 22) 4.61 0.010 6.3%
576 (24 × 24) 4.63 0.013 4.6%
676 (26 × 26) 4.62 0.011 4.7%
784 (28 × 28) 4.45 0.009 5.4%
900 (30 × 30) 4.93 0.005 8.1%
1024 (32 × 32) 4.14 0.007 5.5%
Table 2: Optimal speedup and the corresponding PCM threshold pmin as well as the
percentage of total possible dependencies that were considered for the 2D Ising spin
glass.
(Hauschild, Pelikan, Sastry, Goldberg; 2008)
Choosing the maximum distance of dependencies remains a challenge. If the distances are restricted too severely, the bias on the model building may be too strong to allow for sufficiently complex models; this was supported also with results in Hauschild, Pelikan, Lima, and Sastry (2007). On the other hand, if the distances are not restricted sufficiently, the benefits of this approach may be negligible.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
16. Example: Distance Restrictions
PCM limitations
Can only be applied when variables have a fixed “function”.
Dependencies between specific variables are either more likely
or less likely across many problem instances.
The concept is difficult to scale with the number of variables.
Distance restrictions
Hauschild, Pelikan, Sastry, Goldberg (2008).
Introduce a distance metric over problem variables such that
variables at shorter distances are more likely to interact.
Gather statistics of dependencies at particular distances.
Decide on distance threshold to disallow some dependencies.
Use distances to provide soft bias via prior distributions.
Distance metrics are often straightforward, especially for
additively decomposable problems.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
17. Example: Distance Restrictions for Graph Bipartitioning
Example for graph bipartitioning
Given graph G = (V, E).
Assign weight 1 for all edges in E.
Distance given as shortest path between vertices.
Unconnected vertices given distance |V |.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
18. Example: Distance Restrictions for ADFs
Distance metric for additively decomposable function
Additively decomposable function (ADF):
f(X_1, . . . , X_n) = Σ_{i=1}^{m} f_i(S_i)
f_i is the ith subfunction
S_i is a subset of variables from {X_1, . . . , X_n}
Connect variables in the same subset Si for some i.
Distance is shortest path between variables (if connected).
Distance is n if path doesn’t exist.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
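A minimal sketch of this distance metric, assuming each subfunction is given simply as a set of variable indices; shortest paths are computed with breadth-first search and unconnected pairs receive distance n, as stated above. For graph bipartitioning the interaction graph is just the input graph G itself.

```python
# Distance metric for an additively decomposable function (ADF).
from collections import deque

def adf_distances(subsets, n_vars):
    """subsets: list of variable-index sets S_i, one per subfunction f_i."""
    # Connect variables that appear together in some subset S_i.
    neighbors = [set() for _ in range(n_vars)]
    for s in subsets:
        for i in s:
            neighbors[i].update(v for v in s if v != i)
    # BFS from every variable gives shortest-path distances; n_vars = "no path".
    dist = [[n_vars] * n_vars for _ in range(n_vars)]
    for start in range(n_vars):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[start][v] == n_vars:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist

def within_cutoff(dist, i, j, max_distance):
    """Distance-based hard bias: allow a dependency only within the cutoff."""
    return dist[i][j] <= max_distance
```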
19. Results: Distance Restrictions on 28 × 28 2D Spin Glass
[Figure: execution-time speedup vs. original ratio of total dependencies, shown for 20 × 20 and 28 × 28 instances; arrows mark the maximum distance allowed.]
(Hauschild, Pelikan; 2009)
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
20. Results: Distance Restrictions on 2D Spin Glass
Biasing models in hBOA using prior knowledge
Size Execution-time speedup Max Dist Allowed qmin % Total Dep.
256 (16 × 16) 4.2901 2 0.62 4.7%
400 (20 × 20) 4.9288 3 0.64 6.0%
576 (24 × 24) 5.2156 3 0.60 4.1%
784 (28 × 28) 4.9007 5 0.63 7.6%
Table 3: Distance cutoff runs with their best speedups by distance as well as the percentage of total possible dependencies that were considered for 2D Ising spin glass (Hauschild, Pelikan; 2009).
We ran experiments with dependencies restricted by the maximum distance, which was varied from 1 to the maximum distance found between any two propositions (for example, for p = 2^-4 we ran experiments using a maximum distance from 1 to 9). For some instances with p = 1 the maximum distance was 500, indicating that there was no path between some pairs of propositions. On the tested problems, small distance restrictions (restricting to only distance 1 or 2) were sometimes too restrictive and some instances would not be solved even with extremely large population sizes (N = 512000); in these cases the results were omitted (such restrictions were not used).
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
21. Important Challenges
Challenges in learning from experience
The process of selecting the threshold is manual and difficult.
The ideas must be applied and tested on more problem types.
Theory is missing.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
22. Another Related Idea: Model-Directed Hybridization
Model-directed hybridization
EDA models reveal a lot about the problem landscape.
Use this information to design advanced neighborhood structures (operators).
Use this information to design problem-specific operators.
A lot of successes, a lot of work to be done.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
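One possible reading of this idea, sketched below under the assumption that the model yields linkage groups (clusters of strongly dependent variables): a model-directed local search flips whole groups at a time instead of single bits. This is only an illustration, not the specific hybrid operator used in hBOA.

```python
# Sketch of a model-directed neighborhood for local search.
def model_directed_neighbors(solution, linkage_groups):
    """Yield neighbors obtained by flipping one linkage group at a time."""
    for group in linkage_groups:
        neighbor = list(solution)
        for i in group:
            neighbor[i] = 1 - neighbor[i]   # flip every bit in the group together
        yield neighbor

# Example: groups {0, 1} and {2} give a coarser neighborhood than single-bit flips.
for nb in model_directed_neighbors([0, 1, 0, 1], [{0, 1}, {2}]):
    print(nb)
```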
23. Conclusions and Future Work
Conclusions
EDAs do a lot more than just solve the problem.
EDAs give us a lot of information about the problem.
EDAs allow use of prior knowledge of various forms.
Yet, most EDA researchers focus on the design of new EDAs and only a few look at the use of EDAs beyond solving an isolated problem instance.
Future work
Some of the key challenges were mentioned throughout the talk.
If you are interested in collaboration, talk to us.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
24. Acknowledgments
Acknowledgments
NSF; NSF CAREER grant ECS-0547013.
University of Missouri; High Performance Computing
Collaboratory sponsored by Information Technology Services;
Research Award; Research Board.
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs