World-GAN - A Generative Model For Minecraft Worlds

Maren Awiszus∗, Frederik Schubert∗, Bodo Rosenhahn
Institut für Informationsverarbeitung, Leibniz University Hannover
Hannover, Germany
[email protected] [email protected] [email protected]
Abstract—This work introduces World-GAN, the first method to perform data-driven Procedural Content Generation via Machine Learning in Minecraft from a single example. Based on a 3D Generative Adversarial Network (GAN) architecture, we are able to create arbitrarily sized world snippets from a given sample. We evaluate our approach on creations from the community as well as structures generated with the Minecraft World Generator. Our method is motivated by the dense representations used in Natural Language Processing (NLP).

Fig. 1: From one 3D level, World-GAN generates random 3D levels of arbitrary sizes using a cascade of generator-discriminator pairs (G_0, D_0), ..., (G_{N-1}, D_{N-1}), (G_N, D_N) trained in progression.
II. RELATED WORK

The field of PCGML has seen many advances in recent years, due to the growing capabilities of Machine Learning algorithms. Besides classical methods like Markov Random Fields and GANs [11], its methods have been used for level generation in Candy Crush Saga and Super Mario Bros. (SMB) [5]. A recent approach framed PCG as a Reinforcement Learning (RL) problem and generated Zelda and Sokoban levels [12] using a Deep RL agent. Our method is inspired by our previous work TOAD-GAN [10], which extended SinGAN [13] to token-based games by using a hierarchical downsampling operation.

In contrast to these existing studies, we propose a method for 3D level generation in Minecraft that adapts the idea of token embeddings from Natural Language Processing (NLP) to overcome memory bottlenecks and manually-defined token hierarchies. Embeddings of game entities have not been used in PCG, but were posed as a future research direction in [14].

A game where PCG plays an essential role is Minecraft [7]. The complex 3D structures in this game pose a problem for PCGML methods. The AI Settlement Generation Challenge [15]–[17] was recently created to spur research in this direction. The submitted algorithms generate villages in a given world and are evaluated using subjective measures (adaptability, functionality, narrative, aesthetics). The creators of the challenge mention data-driven approaches as a future direction of PCG in Minecraft, which was one motivation for our work. One method to increase the diversity of the generated content was published by Green et al. [18], which generates floor plans using a constrained-growth algorithm and Cellular Automata. Several simplified Minecraft-inspired simulators were proposed [19], [20] to study the creative space of 3D structures. Grbic et al. [21] introduce the problem of open-ended procedural content generation in Minecraft. Sudhakaran et al. [22] use the Neural Cellular Automata architecture to produce a fixed structure in Minecraft given a seed or partial structure. Yoon et al. [23] classify Minecraft villages into several themes (e.g. medieval, futurist, asian) but do not perform PCG. There have been experiments to generate Minecraft structures based on user-defined content [24], but the results were not satisfying. Our proposed World-GAN is one of the first practical PCGML applications for Minecraft.

III. METHOD

Our method builds upon several existing techniques, which are briefly described in this section before we introduce World-GAN and our block2vec algorithm.

A. Generative Adversarial Networks

World-GAN is based on the GAN [25] architecture. Given a dataset, these networks are able to generate new samples that are similar to the provided examples. They are trained by using two adversaries, a generator G and a discriminator D. The generator is fed a random noise vector z and produces an output x̃. Then, the discriminator is either given a real sample x or the generated one and has to predict whether the sample is from the real dataset or not. By learning to fool the discriminator, the generator gradually produces more and more samples that look as if they belong to the training distribution. One problem with this architecture is that it requires a lot of data. Otherwise, it is too easy for the discriminator to distinguish between real and fake samples and the generator is not able to improve its output.

B. SinGAN

SinGAN [13] enables the generation of images from only one example by using a fully-convolutional generator and
discriminator architecture. Thus, the discriminator only sees
one part of the sample and can more easily be fooled by
the generator. Because the field of view in this architecture
is limited, long-range correlations can only be modeled by
introducing a cascade of generators and discriminators that
operate at N different scales. The samples for each scale are
downsampled and the GANs are trained beginning from the
smallest scale N
x̃_N = G_N(z_N).   (1)
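To make this coarsest step concrete, the sketch below builds a small fully-convolutional 3D generator and draws a sample x̃_N = G_N(z_N). The layer sizes, channel count and the name ScaleGenerator are illustrative assumptions rather than the authors' released implementation; the point is only that a purely convolutional network accepts noise maps of any spatial size, which is what later allows arbitrarily sized outputs.

import torch
import torch.nn as nn

class ScaleGenerator(nn.Module):
    # Hypothetical fully-convolutional 3D generator for a single scale.
    # It contains only convolutions, so it can be applied to noise maps
    # of arbitrary spatial size at sampling time.
    def __init__(self, channels, hidden=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, hidden, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, z):
        return self.body(z)

# Coarsest scale N: x_tilde_N = G_N(z_N), Eq. (1).
k = 32                                # channels per voxel (e.g. an embedding dimension)
G_N = ScaleGenerator(k)
z_N = torch.randn(1, k, 4, 6, 6)      # a small noise map; any D x H x W works
x_tilde_N = G_N(z_N)                  # output has the same spatial size as z_N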
This coarsest scale N defines the global structure of the generated
sample, which will be refined in the subsequent scales. At
scales 0 ≤ n < N , the output from the previous scale
is upsampled (↑) and passed to the scale’s generator after
disturbing it with a noise map z_n ∼ N(0, σ_n²). The variance
of the noise determines the amount of detail that will be added
at the current scale by the generator to produce
x̃_n = x̃_{n+1}↑ + G_n(z_n + x̃_{n+1}↑).   (2)

Fig. 3: Embeddings learned by block2vec of the ruins structure. The embeddings have 32 dimensions but are transformed to two dimensions for this visualization using the Minimum Distortion Embedding method [28].

At each scale, the discriminator either receives a downsampled real sample x_n or the output of the generator with
equal probability. The gradient of the discrimination loss is
then propagated to the discriminator and the generator, which
creates the Minimax problem

min_{G_n} max_{D_n} L_adv(G_n, D_n) + α L_rec(G_n).   (3)

The loss L_adv is the widely-used Wasserstein GAN with Gradient Penalty (WGAN-GP) [26], [27] loss and L_rec is a reconstruction loss weighted by α which ensures that the GAN's latent space contains the real sample². After training on one scale has converged, the parameters of the generator and discriminator are copied to the next scale as an initialization.

²For a more detailed description see [13] and [10].

C. TOAD-GAN

As SinGAN is designed for modeling natural images, its application to token-based games requires some modifications. TOAD-GAN [10] introduces several changes to SinGAN's architecture. Small structures that consist of only a few or a single token would be missing at lower scales due to aliasing by the downsampling operation. The bilinear downsampling is thus replaced by a special downsampling operation that considers the importance of a token in comparison with its neighbors. The importance is determined using a hierarchy that is constructed by a heuristic motivated by the TF-IDF metric from NLP. These extensions allow TOAD-GAN to be applied to SMB and several other 2D token-based games. However, the generation of 3D content requires some changes to the network architecture of TOAD-GAN. The jump from 2D to 3D means the size of samples will be significantly bigger, and since TOAD-GAN uses one-hot encodings of tokens, the required GPU memory grows substantially. This shortcoming is especially apparent in Minecraft, where the high number of tokens can drastically limit the volume that TOAD-GAN is able to generate. To put this difference into perspective, a one-hot encoded tensor of the original SMB level 1-1 has a shape of 202 × 16 with 12 (out of 28 possible) different tokens. Taking only the actually present tokens into account, this results in 38,784 floating point numbers, which take up 0.16 MB. The village example by comparison has a shape of 121 × 136 × 33 with 71 (out of 300+ possible) different tokens, resulting in 38,556,408 numbers that require 154.23 MB to store. If the data is not preprocessed so that only present tokens are taken into account, the difference becomes even steeper.

D. World-GAN

While the overall architecture of World-GAN in Fig. 2a is similar to TOAD-GAN, the 3D structure of Minecraft levels requires several modifications. The generator and discriminator now use 3D convolutional filters that can process the k × D × H × W sized slices from the input level. Here, k is the number of tokens in a level and D, H and W are the depth, height and width of the slice. Fig. 2b shows a visualization of the 3D convolution operation.

Another difficulty is the number of tokens in Minecraft and their long-tailed distribution, i.e. some of the tokens only appear a few times in a given sample whereas others (such as air) take up half of the map. To make World-GAN independent of the number of tokens, we turn to a technique from NLP.

E. block2vec

Previous works on GANs [5], [10] for PCGML use a one-hot encoding of each token in a level. The downsampling in TOAD-GAN's architecture requires a hierarchy of tokens to enable the generation of small structures at lower scales. This
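The one-hot memory figures from Sec. III-C can be recomputed directly; the short check below assumes 32-bit floats (4 bytes per value) and reproduces the 0.16 MB vs. 154.23 MB comparison.

BYTES_PER_FLOAT = 4  # float32

def one_hot_size(*dims):
    # Number of values and size in MB of a dense tensor with the given shape.
    values = 1
    for d in dims:
        values *= d
    return values, values * BYTES_PER_FLOAT / 1e6

# SMB level 1-1: 202 x 16 tiles, one-hot over the 12 tokens actually present.
smb_values, smb_mb = one_hot_size(202, 16, 12)               # 38,784 values, ~0.16 MB

# Minecraft village: 121 x 136 x 33 blocks, one-hot over the 71 present tokens.
village_values, village_mb = one_hot_size(121, 136, 33, 71)  # 38,556,408 values, ~154.23 MB

print(f"SMB 1-1: {smb_values:,} values, {smb_mb:.2f} MB")
print(f"village: {village_values:,} values, {village_mb:.2f} MB")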
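block2vec replaces the k one-hot channels with dense, low-dimensional block embeddings. The following is only a minimal sketch in the spirit of word2vec's skip-gram objective, where each block's embedding learns to predict the blocks in its immediate 3D neighborhood; the context definition (six face-adjacent neighbors), the training loop and all names are our assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def neighborhood_pairs(level):
    # Yield (center_id, context_id) pairs from a 3D tensor of token ids,
    # using the six face-adjacent neighbors as context (an assumption).
    D, H, W = level.shape
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for d in range(D):
        for h in range(H):
            for w in range(W):
                for dd, dh, dw in offsets:
                    nd, nh, nw = d + dd, h + dh, w + dw
                    if 0 <= nd < D and 0 <= nh < H and 0 <= nw < W:
                        yield level[d, h, w].item(), level[nd, nh, nw].item()

# Skip-gram style model: a center block embedding predicts its neighbors.
num_tokens, dim = 71, 32              # e.g. 71 village tokens, 32-dim embeddings as in Fig. 3
embed_in = nn.Embedding(num_tokens, dim)
embed_out = nn.Embedding(num_tokens, dim)
optimizer = torch.optim.Adam(list(embed_in.parameters()) + list(embed_out.parameters()), lr=1e-3)

level = torch.randint(0, num_tokens, (8, 8, 8))   # placeholder for a real token-id volume
pairs = torch.tensor(list(neighborhood_pairs(level)))
centers, contexts = pairs[:, 0], pairs[:, 1]

for _ in range(10):                                # a few passes over all pairs
    logits = embed_in(centers) @ embed_out.weight.T   # score every token as a possible neighbor
    loss = F.cross_entropy(logits, contexts)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

block_vectors = embed_in.weight.detach()  # dense representations replacing the one-hot channels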
TABLE I: Structure coordinates of our example areas in
DREHMAL:PRIMΩRDIAL [29] (visualizations are shown in
Fig. 4).
Structure x y z Volume
desert [-3219, -3132] [2628, 2717] [116, 128] 92 916
plains [1082, 1167] [1110, 1186] [65, 103] 245 480
ruins [1026, 1077] [1088, 1152] [63, 73] 32 640
beach [606, 695] [-688, -629] [39, 64] 131 275
swamp [-2753, -2702] [3242, 3296] [56, 86] 82 620
mine shaft [24987, 25029] [-799, -754] [20, 38] 34 020
village [25165, 25286] [-770, -634] [55, 88] 543 048
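As a sanity check, the volumes listed in Table I equal the product of the per-axis coordinate extents (end minus start); the snippet below reproduces the Volume column from the coordinate ranges.

# Recompute the Volume column of Table I as the product of the coordinate extents.
structures = {
    "desert":     ([-3219, -3132], [2628, 2717], [116, 128]),
    "plains":     ([1082, 1167],   [1110, 1186], [65, 103]),
    "ruins":      ([1026, 1077],   [1088, 1152], [63, 73]),
    "beach":      ([606, 695],     [-688, -629], [39, 64]),
    "swamp":      ([-2753, -2702], [3242, 3296], [56, 86]),
    "mine shaft": ([24987, 25029], [-799, -754], [20, 38]),
    "village":    ([25165, 25286], [-770, -634], [55, 88]),
}

for name, ranges in structures.items():
    volume = 1
    for start, end in ranges:
        volume *= end - start
    print(f"{name:10s} {volume:7d}")   # matches the listed values, e.g. village -> 543048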
TABLE II: Average Tile-Pattern KL-Divergence between the real structure and 20 generated levels. A lower TPKL-Div implies that the patterns of the original level are matched better.

TABLE III: Average Levenshtein distance between the generated levels. A larger distance implies a larger variability in the generated output.

World-GAN    TOAD-GAN 3D    TOAD-GAN 3D*
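For reference, the Tile-Pattern KL-Divergence used in Table II compares the distributions of small block patterns extracted from the real and the generated levels. The sketch below is an illustrative approximation under our own assumptions (2 x 2 x 2 patterns, simple additive smoothing, one-directional KL), not the evaluation code used for the paper.

import math
from collections import Counter

def pattern_distribution(level, size=2):
    # Count all size x size x size block patterns in a 3D token-id grid
    # given as a nested list indexed as level[d][h][w].
    D, H, W = len(level), len(level[0]), len(level[0][0])
    counts = Counter()
    for d in range(D - size + 1):
        for h in range(H - size + 1):
            for w in range(W - size + 1):
                pattern = tuple(
                    level[d + i][h + j][w + k]
                    for i in range(size) for j in range(size) for k in range(size)
                )
                counts[pattern] += 1
    return counts

def tpkl_div(real, generated, size=2, eps=1e-5):
    # KL divergence between the pattern distributions of a real and a generated level.
    # Pattern size, smoothing and divergence direction are assumptions.
    p, q = pattern_distribution(real, size), pattern_distribution(generated, size)
    patterns = set(p) | set(q)
    p_total, q_total = sum(p.values()), sum(q.values())
    kl = 0.0
    for pat in patterns:
        p_prob = (p[pat] + eps) / (p_total + eps * len(patterns))
        q_prob = (q[pat] + eps) / (q_total + eps * len(patterns))
        kl += p_prob * math.log(p_prob / q_prob)
    return kl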