Voxel Terrain Engine
Starting with the obvious: what is a voxel? A voxel is short for volume element, in the same way that pixel is a contraction of picture element. OK, that doesn't really make sense, but they just wanted it to sound the same. There are different kinds of voxels. One that we won't consider here treats voxels as a way to represent volumetric objects as three-dimensional bitmaps instead of vectors (by opposition, polygonal objects are made of vectors). In that situation voxels are like Legos: small bricks that constitute a bigger object, similar to the relation pixels have with the bigger image.
But here we'll only discuss the voxel terrain as featured in games like Outcast or Comanche. The voxel terrain engine is a 2.5D engine; it doesn't have all the degrees of freedom that a regular 3D engine offers. It has the same degrees of freedom as ray casting engines (popularized by Wolfenstein 3D), and the two are so close that you could probably implement a voxel terrain engine as a ray caster. We won't do that; instead we'll use rasterization with a few tricks. Why is it called voxel if this isn't the same as the first definition? Because our terrain is made of small 2D bricks that expand vertically. So this is a voxel, only under a more restrictive definition of a voxel.
Introduction
This project was started when I discovered screenshots of Outcast in 1999 and decided to do something similar. It was a rather large project given that I didn't know anything about 3D engines or C++, but the math seemed easy enough. What follows is the newest version, using Win32 for the interface (sorry if I use a lot of Win32 notation, but I guess the algorithms explained below are portable enough).
For some laughs, here are the original versions, with a bad case of programmer art (and an underpowered Pentium).
Why would you need a voxel engine nowadays? Short answer: you don't. If you've got a 3D accelerator you can have a very detailed terrain with multitexturing, occlusion with a Z-buffer, and powerful shading languages, and most of the work (clipping, rasterization, transformation, etc.) is done for you, faster than you could ever hope to match. You should see what follows as a mere exercise in fun. There are better tools, so use them.
The authoring can be tricky. There is real-world data available as elevation maps, but what if I want to represent something that doesn't exist anywhere else? Asking an artist to do a simple terrain with faults, mountains, crevices and plains can be a lot of pain. There are tools that help you author randomly generated height maps; for this article I used the free versions of World Machine and Terragen. With some amount of work, or by using pregenerated scripts, you can get realistic enough terrains (go see their galleries for more).
For each position on the map I have a corresponding height encoded as an 8-bit number. 0 to 255 might not be enough for what you envision, but you can of course vary the elevation range contained in those 8 bits (compression), and if that is not enough you can go for 16-bit integers or 32-bit floating point numbers. It's really a decision that depends on your data, and there are ways to compress those bits better to achieve a similar result with a smaller memory footprint; I won't detail them here. Let's go back to our design.
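As part of that design, here is a minimal sketch of the height fetch; the names and the scale factor are my assumptions, not the program's:

// Decode an 8-bit height sample into a world-space height. HEIGHT_SCALE
// is an assumed tuning constant mapping 0..255 onto the scene's
// elevation range.
const float HEIGHT_SCALE = 0.5f;

float worldHeight(const unsigned char* heightMap, int mapWidth, int x, int y)
{
    unsigned char raw = heightMap[y * mapWidth + x];   // 0..255 sample
    return raw * HEIGHT_SCALE;
}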
Make sure that the map you've created uses the full range of 0 to 255; the range is small enough that you don't want to lose precious bits of information in the process. For this engine's purpose I also had to make the map tileable. That means that if I go beyond the left side of the map, I can magically wrap to the right side of the map and not fall off a cliff, because the aspect of the right border is a perfect fit for the left border. I won't detail that too much, but there are ways to make sure of it. One is taking an untileable map and transforming it into a copy of the same map duplicated four times and mirrored horizontally or vertically; this way the left border is perfectly equal to the right border. The inconvenience is that you get a map that is highly redundant (well, you were going to make it tiled anyway...). Another solution is to generate a slightly larger version of the map, crop the upper border, and carry the cropped strip over to the lower part of the image with a smooth blend into the underlying image. That way the lower border will match the upper border and it will look seamless enough. In this example I did this manually, and it is far from perfect, but with a bit of practice you'll have to look closely to find defects. There are also advanced algorithms that try to preserve the features of the original image and minimize repetition artifacts, but that's overkill for this little project.
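A nice side effect of tiling: if the map size is a power of two, the wrap-around can be a simple bit mask instead of a modulo. A sketch, assuming a 1024x1024 map (the actual size is up to you):

// Wrapped height map lookup; MAP_SIZE is an assumed power-of-two size.
const int MAP_SIZE = 1024;
const int MAP_MASK = MAP_SIZE - 1;

unsigned char sampleHeight(const unsigned char* heightMap, int x, int y)
{
    // going past any border magically wraps to the opposite side
    return heightMap[(y & MAP_MASK) * MAP_SIZE + (x & MAP_MASK)];
}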
Second there was the visibility
Before drawing every single terrain element, you have to determine what is visible and what is not. This is both a performance and a correctness problem. If a terrain element is partially visible or totally occluded and you draw it anyway, then you have a correctness issue. If you draw terrain elements that are situated behind your camera or outside of your view, then you have spent numerous CPU cycles for basically nothing.
The above image represents what you can actually see from a given point of view (in red). Most of the black areas are outside of the screen, and some parts that should be on the screen, inside the view frustum, are in fact occluded by terrain features in front of them. I cannot realistically go through every terrain element surrounding the camera and determine whether it is clipped outside of the frustum. Instead I will use a hierarchical representation with a coarse level and a detailed level, and only go through the detailed clipping if the coarse level has determined visibility or ambiguity.
I'll only use two levels in my hierarchy, so deciding where to put the coarse level is strategic: it must be detailed enough to avoid unnecessary clipping work at the lower level (limiting the ambiguity cases), yet high level enough to avoid unnecessary work at the coarse level itself. I've decided to go for 256x256 zones of terrain that can be quickly culled.
In pure red, you have zones that lie 100% within the frustum (the frustum is the cone, or the zone delimited by those two green lines, that limits what we can potentially see in this world at a given rotation and position). In pink are the zones that are not entirely inside the frustum but that we have to include anyway because *some* of each is visible. In the worst case, that means a zone is included when only one terrain element out of its 65536 is visible. That's bad, but we have to live with it or else go for a coarser grid.
Clipping is a pretty straightforward operation: you have to test each corner of the quadrangle that delimits a zone against three lines that I call left, right and near. I put near at a short distance from the camera position, because I'll later have to divide by Z, the distance along the view axis, and I want to avoid dividing by zero. Note that the farther we push near, the better performance we get, but the more we increase the risk of causing artifacts.
In order to clip, you have to determine the equation of each of those lines. Finding the equation is equivalent to finding the normal of each line in the 2D plane, and the normals are defined in terms of fov and rho, the two relevant angles of this configuration.
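As a hedged sketch (all names here are illustrative assumptions, not the original code), the three lines could be built like this:

// A line is stored by its equation nx*x + ny*y + d; a point is inside
// when the expression is positive.
#include <math.h>

struct ClipLine { float nx, ny, d; };

ClipLine makeLine(float normalAngle, float px, float py)
{
    ClipLine l;
    l.nx = cosf(normalAngle);            // normal points toward the inside
    l.ny = sinf(normalAngle);
    l.d  = -(l.nx * px + l.ny * py);     // the line passes through (px, py)
    return l;
}

void buildClipLines(float rho, float fov, float camX, float camY,
                    ClipLine& left, ClipLine& right, ClipLine& nearL)
{
    const float PI = 3.14159265f;
    const float NEAR_DIST = 1.0f;        // assumed near distance
    // the frustum edges lie at rho +/- fov/2; their inward normals are
    // those directions rotated a quarter turn toward the view axis
    left  = makeLine(rho + fov * 0.5f - PI * 0.5f, camX, camY);
    right = makeLine(rho - fov * 0.5f + PI * 0.5f, camX, camY);
    // the near line faces the view direction, pushed slightly forward
    nearL = makeLine(rho, camX + cosf(rho) * NEAR_DIST,
                          camY + sinf(rho) * NEAR_DIST);
}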
So suppose x1, y1, x2, y2 are the four coordinates that define an axis-aligned bounding box. Then, with (nx, ny, d) being the equation of the near line, we have the following code to clip against the near plane:

// signed distance of each corner of the box to the near line
float dist1 = nx * x1 + ny * y1 + d, dist2 = nx * x2 + ny * y1 + d;
float dist3 = nx * x1 + ny * y2 + d, dist4 = nx * x2 + ny * y2 + d;
if (dist1 <= 0 && dist2 <= 0 && dist3 <= 0 && dist4 <= 0)
{
    // all four corners are behind the near line: the zone is clipped
}
We do that for each of the lines (planes). We have handled everything as a 2D problem until now; we'll get the third dimension when we start projecting things on the screen.
Reverse painter
We have a simple algorithm to draw a set of elements in 3D that occlude each other; it is called the painter's algorithm. It consists of sorting everything and drawing the world from back to front. Each time you draw an element you don't care about what is already on the screen; you draw it anyway, because you know that the last element drawn is the one on top. One major inconvenience of this algorithm is that it is rather expensive to draw everything, even what may be occluded. This phenomenon is called overdraw. The rare occasion where overdraw is good is when you are drawing transparent objects, in which case this isn't really overdraw because nothing is occluded. Another major inconvenience, which makes this impractical in real life (even if overdraw were free), is that the sorting pass is not always possible. Most of the time you can't sort a scene from back to front in a simple way (it might require dividing polygons or elements to sort them out).
We don't want overdraw; in fact we want zero overdraw, if only because we can get it without having to sacrifice anything else. What I call reverse painter is a pun on the previous algorithm: we do everything in reverse. First we draw our elements front to back. Then, when touching a pixel, we first look whether something has already been drawn on this pixel; if so, we don't write our color, because whatever was drawn first must occlude the new pixel.
Sorting is easy. We have big blocks of data (the zones) that we can sort relative to each other because they don't intersect. At the beginning of a frame, I take all potential zones (in a circle around the camera) and put them in a structure where they are sorted by their distance to the camera. Then I take the zones in order (front to back) and draw them, using four simple cases depending on which side of them I see first.
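A minimal sketch of that ordering step; the Zone structure and the container are illustrative assumptions:

#include <algorithm>
#include <vector>

struct Zone { float centerX, centerY; /* height/color data... */ };

void sortFrontToBack(std::vector<Zone>& zones, float camX, float camY)
{
    std::sort(zones.begin(), zones.end(),
        [&](const Zone& a, const Zone& b)
        {
            // squared distances are enough for ordering, no sqrt needed
            float da = (a.centerX - camX) * (a.centerX - camX)
                     + (a.centerY - camY) * (a.centerY - camY);
            float db = (b.centerX - camX) * (b.centerX - camX)
                     + (b.centerY - camY) * (b.centerY - camY);
            return da < db;              // nearest zone is drawn first
        });
}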
When drawing an individual zone, we have to decide on a pattern to traverse the 256x256 rectangle. Once again this is trivial; we have four cases, sketched below. The arrow indicates in which direction the viewer looks at the zone. You may wonder how well this front-to-back sorting in 2D holds up when you're going to read displacement information from a texture, which would seem to mess with your nice ordering. Actually it won't: because the displacement is in a constant direction, the relative position of each terrain element stays exactly the same.
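Picking one of the four traversal orders can be as simple as choosing the loop directions from the sign of the view direction; a sketch with names of my own:

// Traverse a 256x256 zone so the row and column nearest to the viewer
// come first; (dirX, dirY) is the view direction.
void traverseZone(float dirX, float dirY)
{
    int xStep  = (dirX >= 0) ? 1 : -1;
    int yStep  = (dirY >= 0) ? 1 : -1;
    int xBegin = (xStep > 0) ? 0 : 255, xEnd = (xStep > 0) ? 256 : -1;
    int yBegin = (yStep > 0) ? 0 : 255, yEnd = (yStep > 0) ? 256 : -1;

    for (int y = yBegin; y != yEnd; y += yStep)
        for (int x = xBegin; x != xEnd; x += xStep)
        {
            // process terrain element (x, y) here
        }
}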
If you're familiar with 3D accelerators, then you probably know the Z-buffer algorithm, which allows you to draw your geometry in any order and lets the Z-buffer sort out the visible pixels from the occluded ones. This would work of course, but we don't really need it. The thing to remember with a Z-buffer is that if you don't sort, there may be situations where you have as much overdraw as with the painter's algorithm (if by accident everything is drawn back to front). You are also limited by the resolution of your Z data: in some cases, two objects drawn near each other will overlap badly because we've hit the resolution limit of the Z-buffer. Also, you have to access the Z data for each pixel to determine whether your new pixel is occluded. You can accelerate that access with a hierarchical Z, but once again you don't need that if you draw everything as described here. If you need the Z data for some frankenrender (that is, combining the output of the voxel engine with a more general purpose engine), then you can write the Z data optimally, once per pixel, exactly as you write the color data. But you don't need this Z data to do any occlusion in the voxel engine.
Y-Buffer
How do we determine optimally whether a pixel has already been drawn? Generally speaking you would need one bit of data per pixel: on if something has been drawn, off otherwise. But we can do better, because we only draw vertical, screen-aligned blades. The algorithm is called Y-buffer as a reference to the more general Z-buffer algorithm. We'll maintain a single line of integer data containing, for each vertical column of pixels, the last y that was written by our engine.
How does that work? Each blade of terrain occupies the zone of the screen below its vertical spot; the only case where it doesn't go all the way down to the bottom of the screen is when it is partially occluded by a previous drawing. That means that when we draw a blade, we get a new Y position on the screen, then compare it to the existing Y for this column: if the previous Y is bigger than the current Y, we skip the drawing (occluded); otherwise we draw a single color from the current Y down to the older Y. This way we draw each pixel only once and the occlusion of individual pixels is correct.
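A minimal sketch of this, with y counted upward from the bottom of the screen to match the description above; the resolution and putPixel are assumptions of mine:

const int SCREEN_WIDTH = 640, SCREEN_HEIGHT = 480;    // assumed resolution
extern void putPixel(int x, int y, unsigned color);   // assumed framebuffer write

int yBuffer[SCREEN_WIDTH];        // highest y drawn so far in each column

void clearYBuffer()
{
    for (int x = 0; x < SCREEN_WIDTH; ++x)
        yBuffer[x] = 0;           // nothing drawn yet this frame
}

// Draw one blade whose top projects to 'topY' in screen column 'x'.
void drawBlade(int x, int topY, unsigned color)
{
    if (topY <= yBuffer[x])
        return;                   // fully occluded by earlier blades
    for (int y = yBuffer[x]; y < topY; ++y)
        putPixel(x, SCREEN_HEIGHT - 1 - y, color);    // flip to top-down rows
    yBuffer[x] = topY;            // remember the new highest point
}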
Level of detail
Because of the perspective we use, zones that are far away from our point of view will only cover a small area on the screen. There is no need to go through the pain of evaluating a map of 65536 elements when only one or two pixels might be drawn in the end.
We use a basic LOD system; this is also called MIP mapping (multum in parvo, "much in little"). Instead of considering the whole 256x256 zone, we grossly estimate how much space the zone will cover horizontally on the screen, then pick the smaller representation that is roughly equivalent to one terrain element per pixel (we can go for multiple pixels per element if we want faster but lower quality rendering). So if the terrain zone covers 16 pixels on the screen, we decide we can go for a 16x16 representation. Each NxN representation is a filtered version of the larger 2Nx2N representation.
That means that for each terrain zone of 256x256, we have 8 other representations of smaller sizes: 128x128, 64x64, 32x32, 16x16, and so on down to 1x1.
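The selection itself can be a simple loop; a sketch with names of my choosing:

// Pick the mip level from the zone's projected width in pixels.
// Level 0 is the full 256x256 map; level n is a (256 >> n) square map.
int selectLod(float projectedWidthPx)
{
    int   level    = 0;
    float elements = 256.0f;
    // halve the resolution until one element covers about one pixel
    while (level < 8 && elements > projectedWidthPx)
    {
        elements *= 0.5f;
        ++level;
    }
    return level;
}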
Texture mapping
Usually texture mapping in 3D is tricky, because we have to do perspective correction of the UVs, filtering and so on. We're lucky here, because our algorithm handles texture mapping nicely: when we traverse our 256x256 square we do it in texture space, and we don't take any shortcut when projecting terrain elements onto the screen. For each terrain element we read a single height from our height map and a single diffuse color from our color map, and the position on the screen is correct as part of the projection process.
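For illustration, projecting one element could look like the sketch below, reusing drawBlade and SCREEN_WIDTH from the Y-buffer sketch above; FOCAL, HORIZON and the parameter names are my assumptions, not the author's code:

const float FOCAL   = 300.0f;     // assumed projection scale in pixels
const float HORIZON = 240.0f;     // assumed screen y of the horizon (from bottom)

// xCam is the element's sideways offset from the view axis, z its distance
// along the view axis (z >= near after clipping), height its decoded height.
void projectElement(float xCam, float z, float height, float camHeight,
                    unsigned color)
{
    float invZ    = 1.0f / z;     // the division we pushed 'near' out for
    int   screenX = (int)(SCREEN_WIDTH * 0.5f + xCam * FOCAL * invZ);
    int   topY    = (int)(HORIZON + (height - camHeight) * FOCAL * invZ);
    if (screenX >= 0 && screenX < SCREEN_WIDTH)
        drawBlade(screenX, topY, color);    // the Y-buffer does the occlusion
}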
Summary
Here's what comes out of our engine so far. We added a bit of aerial perspective (the blue coloring in the distance); I'll detail that later. The quality is still low because I use a LOD selection where one terrain element roughly covers two pixels horizontally.
Now how can we improve the quality? First we should have one terrain element covering one pixel or less horizontally (we don't have that fine a grain for the vertical resolution, because it's hard to determine beforehand how an element will be expanded on the screen). Then we can use supersampling and a better sky model; I'll detail those below.
With all other settings at their highest levels, enabling four-time supersampling (horizontally) means a 50% increase in cycles. Luckily we don't see a four-time increase, because we don't spend 100% of our time drawing pixels and the other costs don't increase.
The resolve pass uses a box filter: four samples are added and their color value is divided by four, which is good enough for now. There are better filters that minimize roping artifacts, at the cost of reading more than four samples per pixel, but we don't need that now (you can browse my ray tracing articles for a use of a better antialiasing filter).
We abuse the Y-buffer algorithm described above: instead of storing pure integers in the Y buffer, we store fractions of pixels. So when we touch a pixel, we can determine through the fractional Y value how much of the pixel is covered.
The final pixel is written only once all the samples (four in this case) are covered, by taking the mean color value. So the resolve is incremental; there is no need for an extra pass at the end like we did for the horizontal supersampling above. This is guaranteed to work, because the variation of Y along a single column of pixels is monotonic.
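A sketch of this fractional Y-buffer with 4 vertical subsamples per pixel, reusing SCREEN_WIDTH from above; accumulate() is an assumed helper that adds partial coverage to a pixel and writes it out once its total coverage reaches one:

const int SUB = 4;                        // vertical subsamples per pixel
extern void accumulate(int x, int y, unsigned color, float coverage);

int subYBuffer[SCREEN_WIDTH];             // last written y, in subsample units

void drawBladeAA(int x, float topY, unsigned color)
{
    int top = (int)(topY * SUB);          // blade top in subsample units
    for (int y = subYBuffer[x]; y < top; ++y)
        accumulate(x, y / SUB, color, 1.0f / SUB);   // 1/4 pixel per sample
    if (top > subYBuffer[x])
        subYBuffer[x] = top;              // same occlusion rule as before
}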
The cost is modest: with 4 vertical samples per pixel and everything else enabled, we end up spending 15% more cycles than with regular rendering.
Linear interpolation
We can represent our blades of terrain as single rectangular blocks aligned to screen space, but in order to get a smooth result we need to interpolate between terrain elements. What I used here is linear interpolation, which gives a "polygonal" look at a distance. I only interpolate height between two adjacent terrain elements, but I could also interpolate the diffuse color or other attributes.
Aerial perspective
There are several ways to implement the aerial perspective in code. The easiest is to take the computed Z, divide it by a maximum Z value, and linearly interpolate between the "real" color of the terrain and a light blue fog color. It is in no way physically correct, but it looks good enough for what we want to do and, more importantly, it is fast.
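A minimal sketch of that blend, assuming colors packed as 0x00RRGGBB (the packing and the names are mine):

unsigned char lerpChannel(unsigned char a, unsigned char b, float t)
{
    return (unsigned char)(a + (b - a) * t);
}

unsigned fogBlend(unsigned terrain, unsigned fog, float z, float maxZ)
{
    float t = z / maxZ;                   // 0 near the camera, 1 at max range
    if (t > 1.0f) t = 1.0f;               // clamp to the pure fog color
    unsigned char r = lerpChannel((terrain >> 16) & 0xFF, (fog >> 16) & 0xFF, t);
    unsigned char g = lerpChannel((terrain >> 8)  & 0xFF, (fog >> 8)  & 0xFF, t);
    unsigned char b = lerpChannel( terrain        & 0xFF,  fog        & 0xFF, t);
    return ((unsigned)r << 16) | ((unsigned)g << 8) | b;
}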
Water
We also have some areas covered by water: those are the areas that lie below a certain level. Water should reflect the environment, but let's forget about that; we'll just make it reflect the sky color and combine that with the part of the terrain that is below the water level. If you want to see how to model more interesting water, with refraction and reflection angles, go look at my ray tracing articles, where everything is explained (Fresnel, Descartes, etc.).
Sky
Well, in short, the sky looks blue: saturated blue as you look up, then more and more unsaturated, sometimes white or grayish, as you look down near the horizon. It also looks whiter in the direction of the sun.
This is for daytime without clouds; at dawn or dusk, the sunlight has crossed so many layers of air that it has lost most of its blue and starts to tint everything orange or red. During the day the sunlight is almost pure white or light yellow, and what is not in direct sunlight is almost blue, because its main source of light is the blue sky.
What I did in this program is simply take the saturated blue (1) and the unsaturated blue (2) and blend them in a gradient that runs vertically from the top to the bottom of the screen (the horizon being blue number 2). The sun is not physically present, but I keep track of the sun angle. The saturated blue (1) is blended with more and more white as the view angle comes near to where the sun should be (the Nth power of the cosine of the view angle minus the sun angle).
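Put together, the sky shading could look like this sketch; the two blues, the exponent and lerpColor (a per-channel blend like the fog one above) are assumptions of mine:

#include <math.h>

const unsigned ZENITH_BLUE  = 0x1040C0;   // saturated blue (1)
const unsigned HORIZON_BLUE = 0xA0C0E0;   // unsaturated blue (2)
const unsigned SUN_WHITE    = 0xFFFFFF;
const float    SUN_EXPONENT = 64.0f;      // tightness of the sun glow

extern unsigned lerpColor(unsigned a, unsigned b, float t);  // per-channel lerp

// t runs from 0 at the top of the screen to 1 at the horizon line
unsigned skyColor(float t, float viewAngle, float sunAngle)
{
    unsigned base = lerpColor(ZENITH_BLUE, HORIZON_BLUE, t);
    float c    = cosf(viewAngle - sunAngle);
    float glow = (c > 0.0f) ? powf(c, SUN_EXPONENT) : 0.0f;
    return lerpColor(base, SUN_WHITE, glow);  // whiter toward the sun
}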
Conclusion and future work
Here's the final output of the program. We started with a height map and a diffuse map that we cut into small pieces so that we could quickly cull the parts that are not visible. We use a fast occlusion detection that allows us to update each pixel only once and gives us vertical antialiasing at the same time. The background is a simple procedural model of the sky and sun.
What can we do to improve this engine?
At this level of quality it is not really fast and almost not interactive (3 fps on my machine in extra high quality, more than 20 fps in the low quality version). There is definitely room for performance improvement: new and more efficient algorithms, profiling the code to eliminate pitfalls, and reworking some of the critical paths in assembly (using the latest CPU extensions).
Also, the terrain is entirely loaded in memory, which is OK for a simple technical demo, but what if we want to represent a real looking world with gigabytes of data? Then we would have to take the LOD algorithm and push it further, so that we load only the parts that are visible, and at the lowest level of detail possible, to reduce the memory footprint.
The lighting and shading are static, because they are done during the authoring process by an offline renderer that isn't very fast, even if its results are top quality. We could surrender some of this quality to make an interactive dynamic lighting engine with light conditions updated in real time. We could also implement a more complex sky model, to get realistic looking skies for various times of the day or seasons, with procedural clouds and so on.
I'll finish this article here, with this list of improvements left as an exercise for the reader. I hope to have interested you in digging into this kind of engine a bit more.